Training a large-scale visual search model is an extremely challenging task. Using our on-premise hardware, eBay engineers and researchers spent months training a single model to recognize more than 10,000 product categories. We want to iterate much faster, even when we are working with datasets of tens or hundreds of millions of product images.
Machine learning hardware is evolving very rapidly, and we faced an important choice when planning our next-generation visual search effort: should we purchase and deploy new ML hardware in-house, or should we move to the cloud?
Building cutting-edge shared distributed computing infrastructure in-house is complicated and expensive, and can mean waiting months for certain components to arrive. Worse, each generation of hardware is soon surpassed by the next.
We decided to evaluate Google Cloud Platform, which makes a wide range of powerful ML hardware accelerators available as scalable infrastructure. Because even our smallest datasets contain tens of millions of images, we were especially interested in Cloud TPU Pods, which can deliver up to 11.5 petaflops while providing the experience of programming a single machine.
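That "single machine" programming model is the key point: the same training code runs unchanged whether it targets one accelerator or hundreds of TPU cores. As a rough illustration (not our actual training code), here is a minimal JAX sketch using `pmap`, which replicates a function across however many accelerator cores are attached; the shard count is the only thing that changes between a laptop and a TPU Pod slice:

```python
import jax
import jax.numpy as jnp

# pmap compiles the function once and runs one copy per attached
# accelerator core (SPMD). On a CPU-only machine n is typically 1;
# on a TPU Pod slice it would be hundreds -- the code is identical.
n = jax.local_device_count()

@jax.pmap
def scaled_sum(x):
    # Each device receives one shard of the leading axis.
    return jnp.sum(x) * 2.0

# One batch shard per device: shape (n, 4) -> output shape (n,).
shards = jnp.ones((n, 4))
out = scaled_sum(shards)
```

This is a toy stand-in for a real training step, but it shows why scaling out on TPU Pods does not require rewriting the model: only the data sharding and device count change.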
Our results are very promising: an important ML task that took more than 40 days to run on our in-house systems completed in just four days on a fraction of a TPUv2 Pod, a 10X reduction in training time. This is a game changer: the dramatic increase in training speed lets us iterate much faster, and moving to the cloud lets us avoid large up-front capital expenditures.
We believe ML hardware accelerators such as Cloud TPUs and TPU Pods will become the norm for business AI workloads. With such resources available at public cloud scale, enterprises large and small will have the capability to innovate with AI. By adopting GCP's Cloud TPUs as one of our strategic assets, eBay can ensure that our customers see the freshest possible product listings and find what they want every time.
We would like to acknowledge the help of the Google Brain and Google Cloud Platform teams.