Streamlining Language Technology from Idea to Deployment

In recent eBay Tech Blog articles, we presented the Unified AI platform called Krylov and our pythonic tool to interact with the platform, PyKrylov. In this article, we introduce our Natural Language Processing framework built on top of the AI platform.

“I propose to consider the question, can machines think?”

-Alan Turing, 1950


The birth of Natural Language Processing (NLP) is often accredited to Alan Turing’s famous test: can a human distinguish a human conversation from a computer conversation? Seven decades later, the NLP field continues to see vast progress. Attempts to analyze, structure and generate human languages have been tackled by rule-based, statistical and, most recently, neural-net based approaches.

eBay is unique with its unmatched inventory in many languages and categories, as well as user-generated input and queries. For a global ecommerce company, NLP continues to be at the core of our business such as:

  • Language representation models that improve search relevance and ranking
  • Machine translation that connects buyers and sellers, even if they speak different languages
  • Named entity recognition (NER) that extracts product aspects and brand names in a query
  • Spelling correction that guides our customers to their intended items

Within both eBay and NLP research communities, these tasks have been approached in several different programming languages and toolkits. Of note, Python-based toolkits having found most traction in the last couple of years.

PyNLP Overview

To streamline and facilitate NLP usage at eBay, we built an internal framework called PyNLP. It grafts into the AI platform Krylov (see Figure 1). 

26Figure 1: Simple layered representation of PyNLP and the “Krylov” AI platform components they interact with.

Several existing pain points led to these efforts, which we group into Exploration, Customization and Deployment.  


Out-of-the-box NLP approaches almost never work when applied to ecommerce data. When looking at inventory items such as, Hello Kitty Cheer Leader Plush Doll TY 6” !!! PINK & WHITE, it seems clear that a standard parts-of-speech analyzer optimized on belles-lettres will not detect the brand “Hello Kitty.” Instead, it may assume a greeting interjection and struggle to detect that 6” is the product size. PyNLP combines models trained on e-commerce aware data and implementations that make best use of eBay’s infrastructure and rich data to handle above input.

Another exploration hurdle is discovering and utilizing NLP models trained by other teams across eBay. PyNLP aims to serve as the single source of truth for NLP models. Onboarding new models and accessing existing models are streamlined via a pythonic interface which are designed to work smoothly on top of the existing eBay AI infrastructure. Standardized interfaces allow for easy and intuitive access for exploration as well as for batch processing. This ensures that our users only need to think through more details when they actively want to switch the implementations themselves.


The sheer number of publicly available NLP solutions is already overwhelming. Each toolkit typically comes with its own software environment requirements and often, also includes sparse documentation. While this is true for many software products out there, it’s especially true in machine learning. This is because open-source software often stems from a research perspective and the code can accrue debt.

If a user wants to train on a downstream task, it is easy to pass the training data through all available models and directly compare the performance to find the best matching solution. For the models itself, PyNLP shows how to customize many of them via QuickStart training and fine-tuning recipes on Jupyter notebooks. The requirements for each model are encapsulated and maintained in docker build files.


New language modelling paradigms, such as neural embeddings (e.g., BERT[1]), can take several days to retrain with suitable data on a significantly double-digit GPU rack so they should be handled with prudence. It is crucial that these efforts, already during development state, can be shared across all teams rapidly. PyNLP makes onboarding easy for researchers to explore and for model developers to share by providing templates and libraries which help connect to PyNLP’s core. With a vast collection of best practices and helper tools, we eliminate roadblocks from a PyNLP microservice prototype towards a production-ready service in a Kubernetes environment.

Below, we explore how we tackle these challenges by laying out three use cases: exploration of existing solutions, customizing specific models, and how to converge prototypes into deployed services.

Use Case: Exploration

This use case assumes that a tech-savvy applied researcher wants to explore existing solutions for a specific task, -here- a Named Entity Recognition (NER) service.

Screen Shot 2020 05 21 at 12.30.27 PM

The four lines above serve a powerful neural net trained on eBay-specific data. Under the hood, registry connects to Krylov Core (cf. Figure 1) to access the model management system. A global registry endpoint can tell an eBay researcher which models we already onboarded and which docker image has the matching implementation details to serve the model data. load_model will download the image from the internal docker container registry, mount the model data into this docker and start a microservice. NER is a stub that connects to this microservice via REST, providing a pythonic way of model interaction. If the model is already provided as a central microservice, PyNLP can also directly connect to it without having to host the microservice itself. However, we need to ensure that older model versions are still runnable and reproducible in case a downstream application needs this output to run properly.

All interfaces and result types are predefined by PyNLP. All NER microservices serve the same interface and with a simple Python for loop, you can access process specific test data through them. Since they are started via docker, the underlying implementation might use Python, Java, C++ or other arbitrary programming languages. The models paradigms might also range from simple rule-based solutions to modern neural paradigm. Comparing NLP task performance across updated model data, different implementations or even paradigms is made very easy through this process. For our example, the deep neural net model can finally tell us that Hello Kitty is a brand, Plush is a material, Doll is a type, 6” is a measurement and that PINK and WHITE are colors.

Screen Shot 2020 05 21 at 12.31.25 PM

Use Case: Customization

Assume that through exploration, a specific model caught the eye of an interested researcher. For most of the applied researcher tasks, a first assessment of the baseline performance is the beginning in a model development lifecycle. The next use case of PyNLP is to provide easy access to the model implementation itself and allow for customization of the algorithm or fine-tuning of the model.

For typical recipes such as fine-tuning a BERT embedding, you can start the images in Jupyter notebook[2] mode and work your way through provided notebooks. Since these notebooks can be started in the AI platform Krylov, researchers also have direct access to the Hadoop clusters and other data access points. This allows them to request heavier hardware support if needed. 


Figure 2: Sphinx-generated documentation page for PyNLP. The Jupyter notebooks depicted are executable.

A central design aspect of PyNLP is its living documentation. Our code does not speak for itself – it speaks with you. Through docker bases, interface templates and framework libraries for our microservices, we provide code slots for the concrete model implementation parts. This means that even when you are looking at a first prototype implementation that was just copied and pasted, it most likely already comes with a battery of onboarding functionality. This includes support for inline documentation, Sphinx[3] integration, a SwaggerUI[4] with examples to showcase their APIs and support for Prometheus[5] metrics.


Figure 3: SwaggerUI interface for an in-house BERT model. It interactively connects with the microservice so that you can run quick examples to test the API.

Use Case: Deployment

Assuming that a strong model has been identified, customized, and optimized on a specific task, ensuring business impact in a commercial environment is the next step in a lifecycle. Of course, security consideration and service-level agreement need to be tackled individually.

We can now run microservices in cloud-native environment as a containerized and dynamically orchestrated platform, as well as monitor logs and metrics. For the deployment into Tess, our cloud infrastructure based on Kubernetes[6], PyNLP facilitates as much as possible. is our event processing system that logs both the standard out/err of the microservices as well as the aforementioned Prometheus metrics. To simplify the above, we use Helm Chart[7], a template-based approach, to deploy PyNLP microservices into Tess and manage its lifecycle. Using this, users will be able to deploy the service in just one line of command (illustrated below), thereby shielding away the complexity of Kubernetes resource creation.

Screen Shot 2020 05 21 at 12.33.39 PM26

Figure 4: Grafana board reporting on metrics such as average response time. Individual metrics can be emitted by the microservice implementation, if needed.


PyNLP is built to significantly accelerate the NLP model development life cycle by reducing any obstacles in terms of software-specific requirements, data exchange, and interaction with the eBay infrastructure. We foster comparability and reproducibility through standardized interfaces and make use of living documentation to share (language-specific) considerations and best practices. While heaving NLP microservices into production can never be an automated process for security considerations, we facilitate the process as much as possible.