Building a Product Catalog: What We Learned from Our University Machine Learning Competition

We challenged more than 100 college students at seven universities to structure listing data using AI and machine learning.

At eBay, we use state-of-the-art machine learning (ML), statistical modeling and inference, knowledge graphs, and other advanced technologies to solve business problems associated with massive amounts of data, much of which enters our system unstructured, incomplete, and sometimes incorrect. To surface fresh ideas on how to solve this problem, we partnered with university students at institutions across the country to host an ML competition. The goal was to spur more research in the ecommerce domain using our very own dataset: selected public data from 1 million unlabeled listings. What we didn’t expect was how much we would learn from the submissions. Here are the key takeaways that stood out to us.

1. Students are interested in solving these problems.

When we started outreach to universities, we were unsure about the dataset we had selected and about whether an ecommerce challenge would attract students. Academic curiosity and competitions of this nature typically skew toward vision and language, while ecommerce gets comparatively little attention, so we were thrilled by the excitement and response. Our original plan was to onboard two university teams, and we surpassed that goal with more than 100 participants from seven universities, spread across 37 teams. Thanks to word of mouth and the uniqueness of the dataset, we realized that there is genuine interest in this topic among students and researchers alike.

2. A scalable platform and streamlined evaluation criteria are key to a successful ML competition. 

We assessed various platforms to host the competition, and EvalAI proved to be an ideal choice. EvalAI is open source and offered the architectural flexibility we needed to scale efficiently.

The challenge is most naturally addressed with an unsupervised learning method. However, to evaluate the submissions, we needed a Golden Set of correctly clustered listings, which proved difficult to obtain. Even when we sent the same pair of listings to several human reviewers and asked the apparently simple question of whether the two listings were the same, we often received conflicting answers, which in turn required several rounds of review.

We chose the overall Rand Index as the evaluation criterion. It is an objective measure of overall accuracy: the share of listing pairs on which a submission agrees with the Golden Set about whether the two listings belong together. While the Rand Index served its purpose in evaluating the submissions, in future challenges we would opt for a metric that gives more weight to pairs of listings that should be identified as identical.
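To make the limitation concrete, here is a minimal sketch of the Rand Index computed from pair counts, next to a pairwise F1 score. The pairwise F1 is only one possible alternative that ignores pairs both sides keep apart; it is our illustration, not the metric used in the competition or one chosen for future challenges. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def pair_counts(truth, pred):
    """Classify every pair of listings by how truth and pred treat it.

    Returns (tp, fp, fn, tn), where a "positive" pair is one that the
    clustering places in the same cluster.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must label the same listings")
    total = comb(len(truth), 2)
    # Pairs grouped together by both, by truth only, and by pred only.
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    tp = same_both
    fn = same_truth - tp
    fp = same_pred - tp
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def rand_index(truth, pred):
    """Share of all pairs on which truth and pred agree."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


def pairwise_f1(truth, pred):
    """F1 over 'same product' pairs; true negatives do not count."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy Golden Set: eight listings forming four products of two listings each.
golden = [0, 0, 1, 1, 2, 2, 3, 3]
# A submission that never groups anything: every listing is its own cluster.
all_singletons = list(range(len(golden)))

print(rand_index(golden, all_singletons))   # 24 of 28 pairs agree
print(pairwise_f1(golden, all_singletons))  # no identical pair was found
```

In this toy case the submission finds none of the four identical pairs, yet it agrees with the Golden Set on 24 of the 28 pairs because most pairs are different products. The Rand Index rewards that agreement, while a metric focused on identical pairs does not.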

3. eBay’s unstructured data poses an ongoing challenge. 

eBay is a platform that allows sellers to enter listing data in an unstructured way. As a result, listings are sometimes missing information or contain redundant information.

While the results obtained by the winning team are promising, the problem is far from being completely solved, and this competition has only affirmed its difficulty. However, the winning method provides a solid baseline, which eBay product teams will continue to build and iterate on.

The Winners

Following a thorough evaluation of the models, methodologies, code, and more, we are excited to announce the winners of eBay's 2019 ML Challenge. The student winner, competing as a one-person team, was Yang Zhao from Stanford University. In addition, eBay awarded an internship to Rabiraj Bandopadhyay, who was part of a runner-up team from the State University of New York-Buffalo. Yang and Rabiraj will join our virtual summer internship program and have the opportunity to work with the eBay product team that is using ML and AI to solve a challenge unique to eBay: making sense of more than 1.5 billion listings.

Yang Zhao is a Ph.D. student at Stanford University, majoring in Civil Engineering. He is a member of the computational geomechanics group, developing numerical models for anisotropic rocks. Outside of academics, Yang is interested in basketball and finance, especially in figuring out how the macroeconomy works. He has a dual bachelor’s degree in Economics from Tsinghua University and has passed the first two levels of the CFA exam.

Rabiraj Bandopadhyay is a Master's student at SUNY – Buffalo, majoring in Computer Science and Engineering. His key areas of interest are theoretical machine learning, matrix decompositions for neural networks, and linear algebra techniques in unsupervised and supervised learning. His hobbies include listening to classic rock music and reading nonfiction.

The eBay Internship Program

Our interns help us reimagine eBay’s marketplace for millions of customers around the world. While today’s climate poses many unknowns for students entering the workforce, we are more committed than ever to providing our interns the best possible learning experience. 

In response to the COVID-19 pandemic, eBay’s 12-week summer internship will be held virtually to promote the health and safety of our interns. Combining real work experience and programming, the internship will give interns a unique opportunity to get a view into various business verticals, meet our executives, and network with like-minded peers.

Throughout the internship, students will be challenged to propose solutions to complex problems that have a positive impact on customers and sellers alike. 

Congratulations to our winners and huge thanks to all the participants for their enthusiasm and support.