When shoppers come to eBay, our goal is to help them easily find a product they’ll love.
Over the past year, our teams have focused on making item recommendations more relevant and better aligned with buyers' shopping interests.
Under our previous recommendations model, products on the eBay marketplace, including those in the "Similar Sponsored Items" strip on an item listing page, were displayed to customers based on their sales popularity.
This post describes how we developed a relevance cascade model to help connect shoppers with more relevant recommendations on eBay. Scraping tests show that the relevance cascade model recommends items that are more similar to the seed item and improves the overall shopping experience. A subsequent A/B test showed a 2.14% increase in purchase-through rate, validating our approach.
Background
As the name suggests, “Similar Sponsored Items” recommends similar items with respect to the main item on the page, which we call the “seed” item.
Customers received recommendations powered by a machine learning model; however, there were occasional cases where the recommended items didn't quite match the seed item.
For instance, in the screenshot below, the top three recommended watches for a $14,500 luxury wristwatch differ from it in item title, appearance, brand and price range. Such recommendations are most likely not aligned with the interests of shoppers who land on the luxury wristwatch page.
Recommendation Ranking Model
To improve the user experience in this recommendation strip, we investigated the previous machine learning ranking model.
The ranking model was formulated as a learning-to-rank problem and trained as a Gradient Boosted Tree (GBT) model. Model features can be roughly categorized into three groups: similarity features (computed between the seed item and the candidate items), candidate item popularity features and candidate item quality features.
Among all the features, we noticed that many of the most important features belong to the popularity group, which means popular items are likely to rank higher than relevant but less popular items (e.g. new items without a sales history).
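To make the setup concrete, here is a minimal sketch of how such a GBT learning-to-rank model could be trained, using XGBoost's pairwise ranking objective as a stand-in. The feature names and groupings are illustrative only, not our production feature set.

```python
import pandas as pd
import xgboost as xgb

# Illustrative feature groups (hypothetical names, not the production feature set):
SIMILARITY_FEATURES = ["title_similarity", "image_similarity", "price_ratio", "same_brand"]
POPULARITY_FEATURES = ["sales_30d", "view_count_30d", "watch_count"]
QUALITY_FEATURES = ["seller_rating", "free_shipping", "return_policy_score"]
ALL_FEATURES = SIMILARITY_FEATURES + POPULARITY_FEATURES + QUALITY_FEATURES

def train_ranker(df: pd.DataFrame) -> xgb.XGBRanker:
    """Train a GBT learning-to-rank model: each seed item defines one query group,
    and the candidates shown with that seed are ranked against each other."""
    df = df.sort_values("seed_item_id")
    group_sizes = df.groupby("seed_item_id", sort=False).size().to_list()

    ranker = xgb.XGBRanker(objective="rank:pairwise", n_estimators=300, max_depth=6)
    ranker.fit(df[ALL_FEATURES], df["purchase_label"], group=group_sizes)
    return ranker
```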
Relevance Cascade Model
To improve recommendation relevance, we built a relevance model and applied it as a filter to remove less relevant candidates and only funnel the relevant candidates into the ranking model. Below is the high-level model inference flow chart.
Customized Objective Function
The crux is how to select the “relevant” candidates. First, we removed popularity-based features from the relevance model. Then, to promote relevance, we created a customized objective function in the model by incorporating an item similarity score into the purchase probability prediction.
Customizing an objective function means we can add additional terms on top of the original loss function, which in our case is the cross entropy loss.
Our customized objective function is designed to boost training weight for relevant purchase examples and discount the less relevant purchase examples. A similarity score is used as an approximation for relevance.
In this objective function, y is the purchase label: 1 for a purchase and 0 for a non-purchase. p is the predicted purchase probability. h is the similarity score between the seed item and each recommendation candidate (it is also a feature in the model) and ranges from 0 to 1. α and β are hyperparameters that range from 0 to 1, and we take α = β in our model.
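As an illustration, here is one way such a relevance-weighted cross entropy could be plugged into a GBT library as a custom objective. The weighting form (α + β·h) on the purchase term is an assumption made for this sketch, not our exact formula.

```python
import numpy as np

def relevance_weighted_logloss(h, alpha=0.5, beta=0.5):
    """Cross entropy where the purchase (y = 1) term is weighted by (alpha + beta * h):
    purchases of highly similar items get more training weight, purchases of less
    similar items are discounted.
    NOTE: the (alpha + beta * h) form is an assumption made for this sketch.

    h: per-example similarity score between seed and candidate, in [0, 1].
    Returns a callable producing the gradient and Hessian with respect to the raw
    model score, the shape GBT libraries expect for custom objectives."""
    def objective(y_true, raw_score):
        p = 1.0 / (1.0 + np.exp(-raw_score))   # predicted purchase probability
        w = alpha + beta * h                    # relevance-based weight on the purchase term
        grad = (1.0 - y_true) * p - w * y_true * (1.0 - p)
        hess = p * (1.0 - p) * ((1.0 - y_true) + w * y_true)
        return grad, hess
    return objective
```

Under this assumed form, setting w = 1 recovers the standard cross entropy gradient (p - y), and with α = β a purchase with h = 1 receives twice the training weight of a purchase with h = 0.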
Based on the new objective function, we trained a classification model to predict purchase probability for each item. We then selected the top-k items with the highest purchase probabilities in the relevance model as the “relevant” candidates to flow into the original ranker.
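Putting the two stages together, the inference flow can be sketched roughly as follows. The feature-building helpers are hypothetical, and the models are assumed to expose scikit-learn-style predict methods.

```python
import numpy as np

def recommend(seed_item, candidates, relevance_model, ranker, k):
    """Two-stage relevance cascade (illustrative sketch):
    1) the relevance model (no popularity features) keeps the top-k candidates,
    2) the original conversion-optimized ranker orders that shortlist."""
    # Stage 1: relevance filter.
    rel_features = build_relevance_features(seed_item, candidates)   # hypothetical helper
    rel_scores = relevance_model.predict_proba(rel_features)[:, 1]   # purchase probability
    shortlist = [candidates[i] for i in np.argsort(rel_scores)[::-1][:k]]

    # Stage 2: rank only the relevant shortlist.
    rank_features = build_ranking_features(seed_item, shortlist)     # hypothetical helper
    rank_scores = ranker.predict(rank_features)
    return [shortlist[i] for i in np.argsort(rank_scores)[::-1]]
```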
How to Pick k for Different Categories
Since different categories have different catalog sizes, the relevance of the top-k items varies by category. The more relevant the candidates that flow into the downstream ranker, the more easily relevant items are surfaced, the more likely customers are to convert, and the higher the purchase-through rate (PTR). The value of k therefore controls how many top relevant items we select for each category.
But how do we pick the "correct" k for each category? We gathered online PTR differences between the model without the relevance filter and models using different k values ({35, 40, 45, 50, 55, 60}) as the filter.
At the marketplace level, k = 60 works best for the U.S., yielding the highest PTR. For the other three marketplaces (U.K., Australia and Germany), none of the k values outperforms the model without the filter.
We ran the same analysis at the category level for categories with enough traffic. Here are some examples where the optimal k differs across categories.
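The selection logic itself is simple. A sketch of it, with hypothetical inputs and a placeholder traffic threshold, might look like this:

```python
GLOBAL_OPTIMAL_K = 60   # e.g. the best marketplace-level k from the U.S. comparison

def pick_k_per_category(ptr_lift_by_category, traffic_by_category, min_traffic=10_000):
    """Choose k per category from the online PTR comparison.
    ptr_lift_by_category: {category: {k: PTR lift (%) vs. the no-filter model}}
    traffic_by_category:  {category: traffic volume}; min_traffic is a placeholder threshold."""
    chosen = {}
    for category, lifts in ptr_lift_by_category.items():
        if traffic_by_category.get(category, 0) < min_traffic:
            chosen[category] = GLOBAL_OPTIMAL_K       # too little traffic: use the global k
            continue
        best_k = max(lifts, key=lifts.get)
        # Keep the relevance filter only if some k actually beats the no-filter baseline.
        chosen[category] = best_k if lifts[best_k] > 0 else None
    return chosen
```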
Visual Examples
To implement the relevance cascade model, we picked the optimal k for each high-traffic category and applied the global optimal k to the remaining categories. Now, let's revisit the case of wristwatches. The items recommended by the relevance cascade model are much more similar to the seed item in title, appearance, brand and price range. This clearly demonstrates the importance of first selecting relevant candidate items and only then applying a ranking model optimized for conversion, which is exactly what our relevance cascade model does.
Here are more examples, showing results without a relevance filter in the first row and with the optimal-k relevance filter in the second row. The first item in each row is the seed item.
Example 1 - women’s boots
Example 2 - headphones
Our Experiment’s Results
We tested the relevance cascade model on real eBay traffic through an online A/B test. It showed a 2.14% increase in PTR, validating that the relevance cascade model improves the shopping experience and drives better conversion.