E-commerce Searching Engines Architecture

By Aryan October 30, 2024 5 min read

In e-commerce applications, one of the major goals is to match a user's query against the products in a large catalog of items. Users generally type short, ambiguous queries, so matching them to the right products requires efficient and accurate algorithms. The topic has been widely researched, and this post summarizes some of the approaches that can be used for this purpose.


a. Term-based methods such as BM25 and TF-IDF are widely used, but because they score items by term frequency, they tend to produce many false positives. For example, in the image below, the query “Gym Weight” is matched to the weight lifting hooks shown in option b. These methods also cannot capture semantic meaning, so they won’t match the relevant item in option c, since it shares no word with the query.

b. Neural methods have also been used. They map the query and the items into a dense semantic space and measure their similarity. Given a user query, they assign a relevance score to each candidate product, which is then used to filter for the related products.

c. Transformer-based methods such as BERT can also be used for relevance matching, in either a bi-encoder or a cross-encoder setup. A bi-encoder encodes the query and the items separately, which makes matching faster. A cross-encoder builds a joint representation of each query-item pair using the attention mechanism, which makes it more accurate than a bi-encoder but comparatively slower (higher latency). The drawback of the neural and transformer-based methods is that their results are not explainable, and identifying the wrong results they produce is very labour-intensive.
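To make the latency difference between the two setups concrete, here is a minimal sketch that only counts model forward passes. The `CountingEncoder` class is a hypothetical stand-in for a transformer, not a real model, and the item titles and queries are made up for illustration.

```python
class CountingEncoder:
    """Hypothetical stand-in for a transformer forward pass; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        return len(text)  # placeholder output, not a real embedding or score


# Made-up catalog and queries, for illustration only.
items = ["dumbbell set", "weight lifting hooks", "kettlebell"]
queries = ["gym weight", "yoga mat"]

# Bi-encoder: items are encoded once, ahead of time, independently of any query.
bi_encoder = CountingEncoder()
item_index = [bi_encoder(title) for title in items]
offline_calls = bi_encoder.calls

# At query time only the query is encoded; matching is then a vector comparison.
for query in queries:
    bi_encoder(query)
bi_online_calls = bi_encoder.calls - offline_calls

# Cross-encoder: every (query, item) pair needs its own joint forward pass.
cross_encoder = CountingEncoder()
for query in queries:
    for title in items:
        cross_encoder(query + " [SEP] " + title)
cross_online_calls = cross_encoder.calls

print("bi-encoder forward passes at query time:", bi_online_calls)
print("cross-encoder forward passes at query time:", cross_online_calls)
```

With a bi-encoder the query-time cost grows with the number of queries, while with a cross-encoder it grows with the number of queries times the number of candidate items, which is where the extra latency comes from.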


Image Source: https://arxiv.org/pdf/2307.00370.pdf
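The lexical mismatch in this example can be reproduced with a small sketch that scores the same query with BM25 and with cosine similarity over dense vectors. Everything here is an assumption for illustration: the item titles are hypothetical and only loosely modeled on the figure, the BM25 parameters `k1` and `b` are common defaults, and the 3-dimensional vectors are hand-picked stand-ins for what a trained neural encoder would produce.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a standard BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    doc_freq = Counter()
    for d in tokenized:
        doc_freq.update(set(d))  # number of documents containing each term
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # no shared word, no contribution
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical item titles, not taken from the paper.
titles = [
    "adjustable dumbbell set for home gym",     # relevant, shares "gym"
    "weight lifting hooks with wrist support",  # accessory, shares "weight"
    "cast iron kettlebell 16 kg",               # relevant, shares no word
]
# Hand-picked toy vectors standing in for learned embeddings.
item_vectors = [[0.9, 0.1, 0.0], [0.4, 0.8, 0.1], [0.8, 0.2, 0.1]]
query_vector = [1.0, 0.1, 0.0]

bm25 = bm25_scores("gym weight", titles)
dense = [cosine(query_vector, v) for v in item_vectors]

for title, s_bm25, s_dense in zip(titles, bm25, dense):
    print(f"{title:42s} bm25={s_bm25:.3f} dense={s_dense:.3f}")
```

In this toy setup BM25 gives the lifting hooks a positive score because of the shared word "weight" and gives the kettlebell zero, while the dense similarity ranks the kettlebell above the hooks. The dense ranking is only as good as the vectors, which here are chosen by hand to make the point.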


Entity matching based approach: In entity matching based approaches, entities such as product types are extracted using NER-based models, and the query and each entity are then jointly encoded with a cross-encoder. For every QE (query, entity) pair in a QI (query, item) pair, the relevance probability is computed with an MLP scoring layer followed by a sigmoid function. A soft logic operator is then applied to these probabilities to get the final score.
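The scoring step described above can be sketched roughly as follows. This is not the paper's implementation: the pair features stand in for cross-encoder outputs, the MLP weights are arbitrary fixed numbers rather than trained parameters, the entity names are hypothetical, and the product of probabilities is assumed here as one possible soft logical AND.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_relevance(features, w1, b1, w2, b2):
    """One hidden ReLU layer, then a scalar logit squashed by a sigmoid."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)


def soft_and(probabilities):
    """Assumed soft logic operator: product of the QE probabilities."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


# Arbitrary illustrative weights, not trained parameters.
W1 = [[1.0, -1.0], [0.5, 0.5]]
B1 = [0.0, 0.0]
W2 = [2.0, 1.0]
B2 = -1.0

# Hypothetical joint encodings of (query, entity) pairs for one item.
qe_features = {
    "product type": [1.5, 0.2],
    "brand": [0.3, 0.1],
}

qe_probs = {
    entity: mlp_relevance(features, W1, B1, W2, B2)
    for entity, features in qe_features.items()
}
item_score = soft_and(qe_probs.values())

for entity, p in qe_probs.items():
    print(f"P(relevant | query, {entity}) = {p:.3f}")
print(f"QI score = {item_score:.3f}")
```

Because each QE probability is visible before it is combined, a low item score can be traced back to the entity that caused it, which is the kind of explainability that the purely neural methods lack.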


Image Source: https://arxiv.org/pdf/2307.00370.pdf

These are some of the ways the underlying algorithms work when someone searches an e-commerce website like Amazon or Flipkart.

About the Author

Aryan

Machine Learning Expert