Diversity in Search

By: David Goldberg

This is a post about how to rank search results, specifically the list of items you get after doing a search on eBay.

Diversity

One simple approach to ranking is to give a score to each item. For concreteness, imagine the score to be an estimate of the probability that a user will buy the item. Each item is scored individually, and then they are presented in score order, with the item most likely to be purchased at the top.

This simple approach makes a lot of sense. But there are some situations where it can work poorly. For example suppose that there are forty results, 20 are Sony and 20 are Panasonic. And suppose that all the Sony items have a score that is just a tiny bit better than any of the Panasonic items. Then the first 20 results will be Sony, giving the incorrect impression that we only have Sony. It would be better to show at least one Panasonic near the top, preferably high enough to be “above the fold”. In other words in this case I want to take diversity into account. An ultra-simple method to handle this case would be to add a small amount of random noise to each score. But I think you can see that this is somewhat of a kludge, and doesn’t address the problem of balancing score and diversity head-on.

Solutions in the Literature

One of the first principled methods for diversity goes back fifteen years to the paper of Jaime Carbonell and Jade Goldstein “The use of MMR, diversity-based reranking for reordering documents and producing summarization” in Proceedings of SIGIR ’98. Their idea is a simple one. Find a way to measure the similarity of items, where presumably two Sony items would be more similar than a Sony item and a Panasonic one. Then trade off the score and the similarity. In more detail, they propose placing items on the results page one step at a time. The first step places the item with the highest score, and records its index in the set of placed items I. The second step examines every unplaced item and compares their scores with their similarity to the item i (where I={i}). If P(j) is the score of j (P for probability) then each item j is evaluated using what they call marginal relevance: λP(j)−(1−λ)S(i,j). The marginal relevance is higher when the score P is better, and it is higher when the similarity to i is less. The parameter λ determines the relative importance of these two numbers. The item chosen for the second slot is the one with maximum marginal relevance (MMR):

$MMR = \mbox{argmax}_{j \notin I}\left(\lambda P(j) - (1 - \lambda)S(i, j)\right)$

The MMR item is placed on the results page, and its index is added to I. In general the algorithm computes

$MMR = \mbox{argmax}_{j \notin I}\left(\lambda P(j) - (1 - \lambda)\max_{i \in I}S(i, j)\right)$

and places the resulting item on the result page as well as adding its index in I.

This seems an improvement over adding random noise, but is unsatisfying for several reasons. Once it places an item, it is reluctant to place a second similar item. This may be fine for web search where you only want to see one instance of each kind of page, but is not so good for e-commerce where you’d like to compare multiple items. Also, there are almost certainly several dimensions of similarity (brand, cost, new vs used, etc), and it’s not clear how to combine them all into a single similarity function.

Rakesh Agrawal et. al. have a paper “Diversifying search results” from WSDM ’09 that doesn’t address any of those objections, but does address a different problem: how do you pick λ? They propose an algorithm that doesn’t have an arbitrary tuning parameter. They suppose that each item is in a category and that the demand (for a fixed query) of each category is known. This maps well with eBay search, since we know the category demand for our top queries. How to use this info? They imagine that users who issue query Q are diverse—that some are looking for items in category C₁ while others want an item in a different category C₂, and that this diversity is captured in the category demand table. The paper gives an algorithm that maximizes the chance that a user will find an item that in their desired category. Let’s call this the Agrawal algorithm.

The Agrawal algorithm is a step above MMR in elegance because it doesn’t require fiddling with an arbitrary constant like . It works well for categories, but what about other forms of diversity like brand or item condition (new vs used). Just as we record category demand, we could record demand for brands, item condition, etc. But then we would need a way to soup-up the Agrawal algorithm to combine these different demands, and presumably would need to prioritize them. And like MMR, it heavily penalizes an item if a similar one already exists, which is not appropriate for e-commerce.

A paper by Michael Welch et. al. “Search result diversity for informational queries”, WWW ’11 addresses one of the objections to Agrawal. They assume that users want to see more than one document per category. Specifically, they introduce a random variable J representing the number of items a users wants to see. You have to provide an estimate for the probability distribution of J (i.e. find numbers p_j = Pr(J = j)) and then the algorithm uses that in its ranking. But it still doesn’t address our requirement to gracefully handle multiple dimensions of diversity.

There’s another factor to take into account. With multiple dimensions, we are unlikely to have a good estimate of the demand. For example if the dimensions are category, brand and condition, we would ideally want the demands for each tuple: for example (category=TV, Brand=Sony, condition=Used). For all these reasons, we abandoned the train of thought Carbonell → Agrawal → Welch, and took a different approach.

Our Solution: Agents

For each query we specify which dimensions are important, together constraints in the form of a max and/or min. For example for the query “flat screen TV”, we might want at least 10% new items, and at most 50% with brand Samsung. This fits in nicely with our search architecture, which lets us customize the search behavior for a specific query by putting info into DSBE, a database indexed by query. Also we expect that in the common case the min/max constraints aren’t violated, and so the exact value of the constraints aren’t critical.

You can think of this as a very simple agent system. Each dimension has an agent with constraints (max/min). Each agent monitors the construction of the results page, and tries to keep their constraint satisfied. They won’t always succeed, so think of the requirements as being soft constraints.

Whenever you have multiple constraints, your first question should be how to prioritize them. This falls out almost automatically from the agent algorithm, which I now describe.

Each agent monitors the construction of the result set. At each stage it checks on its constraints. If any are violated, it scans through the unplaced items looking for an item that will get the result set closer to satisfying its constraints. This is the candidate.

If the agent’s constraint is violated, it is unhappy and it quantifies its unhappiness by summing two terms. The first term measures how much the placed items deviate from the agent’s constraints. The more deviation, the more unhappiness. The second term involves the candidate. There will be a penalty for placing the candidate on the SRP instead of the default item that would otherwise be placed, because the candidate has a lower score (P) then the default. Summarizing the terminology:

Candidate: An item proposed by the agent for the next spot on the results page. If no agent proposes a candidate, the default item is the unplaced item of highest score.
Deviance: How far the placed items deviate from the agent’s constraints. Larger means more deviation.
Penalty: The penalty (in reduced score) for placing the candidate instead of the default. Larger means more of a penalty, i.e. more gap between the scores.
Unhappiness: The agent’s unhappiness. If unhappiness>0 then the agent’s candidate will be a contender for the next placed item.

Now back to question of prioritizing. It falls out automatically. If multiple agents are unhappy, we simply pick the one with the largest unhappiness score, and use its candidate as the next item to be placed.

This approach isn’t hassle-free. You need to pick constraints (max/min) for each query. And you need to quantify deviance and penalty. Or in other words, find how to weight them, which is reminiscent of the parameter λ in MMR. But we prefer this because it is efficient, handles multiple constraints, and is not too sensitive to the exact values of max and min. For most queries we expect the constraints to hold with the default ranking, and so placing an agent’s candidate is the exception rather than the rule.

A Sample Formula for Unhappiness

The description in the previous section was abstract in that it talked about deviance and penalty without offering a way to compute them. Here is one way.

A typical constraint requires that the fraction items with property P be at least f (this is a min constraint). P might be that an item is used, or has brand Sony, or is newly listed. The straightforward way to compute deviation from a constraint is to look at the number of items with P placed so far and compare it to the number needed to meet the constraint. Suppose there are n items placed so far, and that k of them are in P. Since the constraint is min = f we’d like at least n items. If k < nf the constraint isn’t met so set the deviance to nf − k . Or more precisely deviance is (nf − k)⁺ where x⁺= x when x > 0 and 0 otherwise.

A very simple way to compute the penalty is to compare the score of the candidate item I_C with the item that would otherwise be placed, I_default. The penalty is S(I_C ) − S(I_default). Since items are sorted in score order, the penalty is always nonnegative.

Unhappiness is computed using the formula unhappiness = deviance − λ × penalty. Imagine deviance > 0 but the only way to fix it is by placing an item with a huge penalty. The large − term makes unhappiness negative, so the agent isn’t unhappy after all. In other words, agents are altruistic and don’t insist on their constraint if it results in placing a poor item. The parameter $\lambda$ controls the tradeoff between score and diversity.

As a preliminary formula, unhappiness is

$\text{unhappiness} = \text{deviance} - \lambda \times \text{penalty} = (nf - k)^+ - \lambda ( S(I_C) - S(I_{\scriptsize{\text{default}}}))$

But there’s one issue remaining. Suppose the constraint asks for at least 10%, and suppose the first item placed does not have the desired property. Then n = 1, f = 0.1, and k = 0 so (nf − k )⁺ =(0.1)⁺ =0.1 and the constraint is already unhappy and will be in the running to have its proposed items placed next. These seems like jumping the gun, since the goal is 10% but only one items has been placed. There are at least two ways to account for this.

The first would be rounding: replace (nf − k)⁺ with round(nf − k)⁺. But I think of constraints as something only triggered in exceptional circumstances, and so prefer to have unhappiness only go negative at the last possible minute. In other words, if it is still possible to satisfy nf ≤ k on the next round, then don’t trigger the constraint.

Writing this as an equation, suppose I pass on this constraint and the next item turns out to not have P. Then k stays the same and n becomes n + 1. But on the next round I pick an item with P. Then k becomes k + 1 and n becomes n + 2. So I can satisfy the constraint in the future if (n + 2)f ≤ k + 1. So replace (nf−k)⁺ with ((n+2)f − (k+1))⁺. The formula becomes

$\boxed{ \footnotesize{\mbox{unhappiness}} = \footnotesize{\mbox{deviance}} - \lambda \times \mbox{penalty} = ((n+2)f - k - 1)^+ - \lambda (S(I_C) - S(I_{\scriptsize{\text{default}}}))}$

Constraints

There are two types of constraints based on a property P. The first is a min constraint, which asks that the fraction of items with P be greater than f_min. The second is max constraint, asking for the fraction to be less than f_max. Most properties are something very specific, like item=used. But there is also an any property. This would be used (for example) to keep any one seller from dominating the results. A max constraint with property seller, and using any asks that there be no seller id with more than of the results. The any property doesn’t make sense with the min constraint.

The any property is also a nice way to do duplicate (or fuzzy) dup removal. We assign a hash to each item, so that duplicate (or similar) items have the same hash. Then using a max with f = 1/50 means (roughly) that each item will appear only once on each 50-item page. If λ = 0 the requirement is absolute (unless there are competing agents). But by setting λ to a small number, we can allow dups if the alternate is to show very low quality items.

The Algorithm

I close with a more formal description of the agent-based diversity algorithm. It can be implemented quite efficiently. In MMR (as well as Welch and Agrawal), at each step you must scan all unplaced items (think O(n²)). In our algorithm, you scan each item once for the entire algorithm (although this is done separately for each constraint). Scanning is implemented by having a pointer per constraint. The pointer moves down the unplaced list as the algorithm proceeds.

Input: a list of unplaced items, sorted by score
Output: items in their final arrangement for display on the search results page (SRP)
Remove the top item from unplaced and append it to SRP
For every constraint C, set C.ptr := first unplaced item
While unplaced is not empty
 initialize the set candidates to be empty
 ForEach C in set of soft constraints
 If C.unhappiness() > 0
 add C to the candidates
 // this has the side effect of updating C.ptr
 end
 EndForEach
 If candidates = empty then
 set item to be the top (highest score) item on unplaced
 else
 take the element C in candidates with the largest unhappiness score
 set item := C.proposed_item()
 end
 if there are any C.ptr pointing at item, move them down one
 remove item from unplaced and append it to SRP
EndWhile

Method C.unhappiness() =
 n := number of items placed in SRP so far
 f := C.f // the bound for this constraint
 k := the number of items placed so far that satisfy the constraint C
 // For the "any" property, k is the max number of items with the
 // same property value that satisfy the constraint
 if C.op = min then
 deviance := ((n+2)f - k - 1)+
 else
 deviance := (k + 1 - (n+2)f)+
 end
 Starting at C.ptr, scan down unplaced until reaching an item I that will reduce deviance
 C.ptr := I
 Idefault := highest scoring unplaced item
 penalty := S(I) - S(Idefault)
 unhappiness := deviance - lambda * penalty
 // set proposed_item field and return
 C.proposed_item := I
 return unhappiness
end

Tag: Search Science