Search
The search is based on four aspects: indexing, the search query, the filter, and rescoring.
1. Indexing
For a search to be possible at all, documents and product fields must first be indexed. You can think of this process as creating a keyword index for a book: the search is performed in this index, and the matching pages are returned. To make the results as general as possible, documents pass through several analysis steps during indexing that help the computer match search terms:
- Removing special characters like "" - /
- Lowercasing all letters
- Replacing umlauts and letters from other alphabets with their base letters
- Decomposing compound words
- Reducing words to their stems
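The first three analysis steps can be sketched in a few lines of Python. This is only an illustration of the idea, not the analyzer configuration that is actually used. The function name `normalize_terms` is hypothetical, and compound splitting and stemming are left out because they depend on language-specific dictionaries and rules.

```python
import re
import unicodedata


def normalize_terms(text):
    """Illustrative normalization: fold accents, drop special characters, lowercase."""
    # Decompose letters such as "å" into a base letter plus a combining mark,
    # then drop the combining marks so that only the base letter remains.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace special characters such as quotes, "-" and "/" with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", folded)
    # Lowercase everything and split into individual terms.
    return cleaned.lower().split()


print(normalize_terms('Fjällräven "Kånken" - Laptop/17'))
```

Because the same steps are applied to the indexed text and to the search text, a search for "kanken" and a search for "Kånken" end up looking for the same term.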
2. Search query
The search query is the central part of the search. This step defines the maximum set of hits for the search. It also provides a sensible basic sorting of the results using the standard Elasticsearch (ES) mechanisms.
The search query consists of several sub-queries that are combined additively, each with an appropriate weighting. Each sub-query can expand the hit set. It is therefore important that each one targets as narrow a range of hits as possible, so that the total hit set remains accurate. The individual components and their weighting are explained below.
a) Multimatch
This is a good basic search that covers the widest range of hits and consequently receives the greatest weight. Only fields that contain all search terms ("and") are considered. Compared with an "or" search, this favors higher precision and a smaller number of hits. For each product, the field with the highest ES score is selected, and all other fields that also contain hits contribute with a lower factor. On its own, part a) should already deliver good hits for the most frequent search queries.
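The behavior described here resembles an Elasticsearch `multi_match` clause of type `best_fields` with the `and` operator and a `tie_breaker`. The following is a minimal sketch under that assumption, not the query Makaira actually generates. The helper name, the field names, and the numeric values are hypothetical.

```python
import json


def multimatch_query(search_text, fields, boost=10.0, tie_breaker=0.3):
    """Build a multi_match clause (field names and numbers are assumptions)."""
    return {
        "multi_match": {
            "query": search_text,
            "fields": fields,
            # best_fields scores a document by its best-matching field ...
            "type": "best_fields",
            # ... and adds the scores of other matching fields scaled by tie_breaker.
            "tie_breaker": tie_breaker,
            # "and" requires every search term to occur in a matching field.
            "operator": "and",
            # Largest weight of all sub-queries.
            "boost": boost,
        }
    }


query = multimatch_query("telemark ski binding", ["title", "shortdesc", "longdesc"])
print(json.dumps(query, indent=2))
```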
b) Titlematch
This query is designed for search queries with several terms, of which at least two must be found in the title field of the product. The search can thus find relevant hits even if the search text does not describe the product entirely correctly.
Ex. "Acer Inconia 5": no product has exactly this title, but all "Acer Inconia" products are returned.
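One way to express "at least two terms in the title" in Elasticsearch is a `match` clause with `minimum_should_match`. The sketch below assumes a field called `title` and a hypothetical helper name and boost value. It illustrates the technique rather than reproducing the actual query.

```python
def titlematch_query(search_text, title_field="title", boost=5.0):
    """Require at least two of the search terms in the title (illustrative only)."""
    terms = search_text.split()
    # The clause only makes sense for queries with several terms.
    if len(terms) < 2:
        return None
    return {
        "match": {
            title_field: {
                "query": search_text,
                # At least two terms must match; the remaining ones are optional.
                "minimum_should_match": 2,
                "boost": boost,
            }
        }
    }


print(titlematch_query("Acer Inconia 5"))
```

For the example query, a title containing "Acer" and "Inconia" satisfies the clause even though "5" is missing.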
c) Crossfields
With this query, the contents of all search fields are treated as one combined field. This covers search queries whose terms are distributed over several fields.
Ex. "black bicycle shorts": for most products, the two terms "black" and "bicycle shorts" do not occur together in a single field.
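In Elasticsearch terms, this corresponds to a `multi_match` clause of type `cross_fields`. The sketch below assumes hypothetical field names (`title`, `color`) and a hypothetical boost value.

```python
def crossfields_query(search_text, fields, boost=3.0):
    """Treat several fields as one combined field (illustrative only)."""
    return {
        "multi_match": {
            "query": search_text,
            "fields": fields,
            # cross_fields looks for each term in any of the listed fields,
            # as if they were a single large field.
            "type": "cross_fields",
            # Every term must be found somewhere across the fields.
            "operator": "and",
            "boost": boost,
        }
    }


print(crossfields_query("black bicycle shorts", ["title", "color"]))
```

With these assumed fields, a product with "bicycle shorts" in the title and "black" in a color attribute matches, although neither field contains all terms.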
d) Prefix
This query is needed for the auto-completion of search terms. As soon as three letters of a word have been typed, the first results are delivered. Accuracy increases with the number of letters entered.
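A minimal sketch of such a prefix clause follows, assuming the Elasticsearch `match_phrase_prefix` query on a hypothetical `title` field. Whether the three-letter threshold is enforced in the query or in the calling code is not stated here; the sketch checks it in the (hypothetical) helper.

```python
def prefix_query(typed_text, field="title", min_prefix_length=3):
    """Return a prefix clause once the word being typed has enough letters."""
    words = typed_text.split()
    # No suggestion before the started word has at least three letters.
    if not words or len(words[-1]) < min_prefix_length:
        return None
    # match_phrase_prefix treats the last term of the text as a prefix.
    return {"match_phrase_prefix": {field: {"query": typed_text}}}


print(prefix_query("te"))    # too short, no clause yet
print(prefix_query("tele"))  # clause that matches titles with a word starting with "tele"
```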
e) Fuzzy
Queries a-d are executed again as a fuzzy search (if fuzzy search is active). Because terms that only approximately match are then included in the search, these results are weighted lower than the non-fuzzy hits. For more information on fuzzy search, see the section Error tolerant search.
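The combination of an exact clause with a lower-weighted fuzzy copy can be sketched with an Elasticsearch `bool` query whose `should` clauses add up their scores. The helper name, the boost values, and the use of `"fuzziness": "AUTO"` are assumptions for illustration.

```python
def with_fuzzy_fallback(search_text, fields, exact_boost=10.0, fuzzy_boost=2.0):
    """Combine an exact clause with a lower-weighted fuzzy copy (illustrative only)."""

    def clause(boost, fuzziness=None):
        body = {
            "query": search_text,
            "fields": fields,
            "operator": "and",
            "boost": boost,
        }
        if fuzziness is not None:
            # Allow small spelling deviations in the search terms.
            body["fuzziness"] = fuzziness
        return {"multi_match": body}

    # The scores of all matching "should" clauses are added up, so an exact
    # hit (which matches both clauses) ranks above a fuzzy-only hit.
    return {"bool": {"should": [clause(exact_boost), clause(fuzzy_boost, "AUTO")]}}


print(with_fuzzy_fallback("telemrk ski binding", ["title", "longdesc"]))
```

Note that Elasticsearch does not accept the `fuzziness` parameter for `multi_match` clauses of type `cross_fields` or for phrase-type clauses, so a fuzzy copy of those sub-queries would need a different formulation.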
f) Variants
Optionally, queries a-e can be applied again to all variants. This costs performance depending on the number of variants and only yields different results if the variants differ from their parent articles. Under certain circumstances, it can also make the results less precise.
Ex. "big love" finds not only "favorite bag" but also the "Kånken Laptop 17", because one of its child variants contains "favorite backpack" in one description and "big brother" in the long text description.
3. Filter
Filters restrict the hit list selected in step 2. By default, criteria such as "article still active", "in stock", ... are applied here.
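In Elasticsearch, such restrictions are typically placed in the `filter` part of a `bool` query, where they narrow the hit list without influencing the score. The sketch below assumes hypothetical field names (`active`, `stock`) and a hypothetical helper name.

```python
def filtered_query(scoring_query, only_active=True, only_in_stock=True):
    """Wrap a scoring query with non-scoring filters (field names are assumptions)."""
    filters = []
    if only_active:
        filters.append({"term": {"active": True}})
    if only_in_stock:
        filters.append({"range": {"stock": {"gt": 0}}})
    # Clauses under "filter" only include or exclude documents;
    # the ranking still comes from the clauses under "must".
    return {"bool": {"must": [scoring_query], "filter": filters}}


print(filtered_query({"match": {"title": "ski binding"}}))
```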
4. Rescoring
This part of the search was refined in Makaira Query - 2018.3.1 and re-sorts the results based on the following factors:
a) Rescoring of partial title/attribute hits, based on heuristic assumptions. Using many search fields, and long text fields in particular, naturally produces a large result set. To move better hits to the first positions, the results are re-sorted so that partial hits ("or") in the title or in an attribute are weighted higher than hits in other fields. Rescoring a) uses a constant factor rather than the variable score that ES search queries usually produce. Ex. from 2 c): if "black bicycle shorts" is searched for, all results with "bicycle shorts" or "black" in the title are rescored. The same applies to attribute values.
Ex. "telemark ski binding".
b) Dynamic boosting based on general user data. This data includes, e.g., the number of purchases, the number of times an article was added to the cart, price, ... Before it is used for dynamic boosting, the data is transformed into an approximately normal distribution and scaled according to the maximum desired impact. Transforming and scaling the data appropriately requires knowledge of its distribution and of its maximum or minimum value. For performance reasons, it is advisable to perform the scaling during import into the ES index.
Example: "Quantity in the shopping cart" transformed to the normal distribution and scaled to the range of values [0,0.25] to set the maximum influence (see images below).
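The transformation and scaling can be sketched as follows. Which transformation is actually applied is not specified here; the sketch assumes a logarithm, a common choice for strongly skewed counts, followed by linear scaling to the range [0, 0.25]. The function name and sample values are hypothetical.

```python
import math


def boost_values(raw_counts, max_influence=0.25):
    """Log-transform skewed counts and scale them linearly to [0, max_influence]."""
    if not raw_counts:
        return []
    # Assumption: a log transform pulls in the long tail of the raw counts.
    transformed = [math.log1p(count) for count in raw_counts]
    lowest, highest = min(transformed), max(transformed)
    if highest == lowest:
        # All articles are equal, so none of them should be boosted.
        return [0.0 for _ in transformed]
    # Linear scaling needs the minimum and maximum of the transformed data.
    return [max_influence * (t - lowest) / (highest - lowest) for t in transformed]


cart_counts = [0, 3, 10, 250]  # hypothetical "quantity in the shopping cart" values
print(boost_values(cart_counts))
```

Computing these values once during import, as recommended above, means the search only has to read a prepared number per article.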
c) User-specific data that assigns the user to a class. This data is determined as part of the Operational Intelligence / Machine Learning module; more details can be found in the Machine Learning documentation. In principle, the procedure is the same as in b).
Performance: The rescoring query supports the option "window_size". The additional, more complex rescoring operations can thus be limited from the outset to the most important hits, e.g. the top 50.
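A minimal sketch of a rescore section with `window_size`, using the constant-factor title rescoring from a) as the rescore query. The helper name, the `title` field, and the numeric values are assumptions; the structure follows the Elasticsearch rescore syntax.

```python
def with_title_rescore(search_body, search_text, window_size=50, factor=2.0):
    """Add a rescore section that only touches the top hits (illustrative only)."""
    rescored = dict(search_body)  # shallow copy, the input stays unchanged
    rescored["rescore"] = {
        # Only the best `window_size` hits per shard are rescored.
        "window_size": window_size,
        "query": {
            "rescore_query": {
                # constant_score adds the same fixed value to every partial
                # title hit instead of a relevance-dependent score.
                "constant_score": {
                    "filter": {
                        "match": {"title": {"query": search_text, "operator": "or"}}
                    },
                    "boost": factor,
                }
            },
            "query_weight": 1.0,
            "rescore_query_weight": 1.0,
        },
    }
    return rescored


body = {"query": {"match": {"longdesc": "black bicycle shorts"}}}
print(with_title_rescore(body, "black bicycle shorts"))
```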