Scaling Large Language Models for e-Commerce: The Development of a Llama-Based Customized LLM

Third-party LLMs like Llama 3.1 allow us to adapt powerful models to the e-commerce domain by training on a mix of eBay and general-domain data, enabling our magical AI experiences.

eBay holds decades of insights we can use to enrich our AI technology and create personalized, magical experiences for our customers. At any given moment, our platform hosts billions of active listings from millions of active sellers across 190 global markets.

Training large language models (LLMs) in the e-commerce domain therefore presents unique opportunities and challenges, and because of the depth and breadth of our data, we have taken a hybrid approach to LLMs.

On one hand, we build e-commerce LLMs completely from scratch (including our LiLiuM family of LLMs), which gives us full control over every aspect of the models, including license, data, architecture, vocabulary and more. On the other hand, we adapt existing third-party models (like Meta’s Llama models) to the e-commerce domain by continuing their pretraining on a mix of eBay and general-domain data. This allows us to move faster and unlock more value, as we do not have to develop the models completely from scratch.

The rest of this article explains the development of our Llama-based customized LLMs for e-commerce: 8-billion- and 70-billion-parameter language models that we have adapted to the e-commerce domain. In short, “e-Llama.”


Background

LLMs like GPT-4 and Claude have revolutionized natural language processing (NLP) across multiple domains, including e-commerce. However, these services come with considerable costs, making them impractical for businesses like eBay that need fine-tuned, scalable and cost-effective solutions. Additionally, relying on third-party models introduces data security risks and limits our ability to fine-tune on proprietary data.

To address these challenges, we aimed to build an in-house solution. Combining open and proprietary models gives us the best of both worlds: scalable, cost-effective solutions fine-tuned for e-commerce applications, making it easier and faster to leverage AI to buy or sell the things people love.

Training a large-scale LLM from scratch is a very time- and resource-intensive process. To move fast, one can instead start from existing pretrained models, such as Llama 3.1. However, these models typically lack domain-specific knowledge, in our case about e-commerce.

As a solution, we continue training the Llama base models on a large amount of e-commerce data in order to infuse domain-specific knowledge into them. This technique is known as "continued pretraining," and the training setup has to be carefully balanced so that the model learns the new domain without degrading too much on general-domain tasks.
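At its core, continued pretraining uses the same next-token-prediction objective as the original pretraining; only the starting checkpoint and the data change. Below is a minimal sketch using Hugging Face Transformers. The checkpoint name and example text are illustrative, and the actual e-Llama training used a different, much larger-scale setup (described in the following sections).

```python
# Minimal sketch of the continued-pretraining objective: load the pretrained Llama 3.1
# base checkpoint and keep optimizing the standard causal language-modeling loss,
# now on the new (e-commerce-heavy) data. Checkpoint and text are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Llama-3.1-8B"               # gated checkpoint; requires access
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

batch = tokenizer(
    ["Vintage Omega Seamaster, automatic, stainless steel, excellent condition."],
    return_tensors="pt",
)
# With labels equal to the inputs, the model computes the shifted next-token loss.
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()                            # one illustrative step (optimizer not shown)
```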


Data Sources

The goal is to infuse e-commerce-specific knowledge into the Llama 3.1 base models without the models forgetting information they learned during their original pretraining (an effect sometimes called "catastrophic forgetting"). To achieve this, we include examples in our training data mixture that are close to the examples the models were originally pretrained on. This "replay" has been shown to help a model retain previously learned information. These examples are drawn from a mixture of curated, publicly available open-source datasets and smaller but higher-quality datasets. We also include 10% non-English general-domain data to further enhance the model’s multilingual capabilities.
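A simple way to build such a mixture is to sample from the different sources with fixed weights. The sketch below uses Hugging Face Datasets; the file names are hypothetical, and the weights are only one possible reading of the mixture described above and of the 1:1 general-to-e-commerce ratio reported in the training section.

```python
# Sketch of a replay-style data mixture: e-commerce data plus general-domain "replay"
# data, with a slice of non-English general-domain text. File names and weights are
# illustrative, not eBay's actual configuration.
from datasets import load_dataset, interleave_datasets

ecommerce = load_dataset("json", data_files="ebay_ecommerce.jsonl", split="train")
general_en = load_dataset("json", data_files="general_english.jsonl", split="train")
general_multi = load_dataset("json", data_files="general_multilingual.jsonl", split="train")

mixture = interleave_datasets(
    [ecommerce, general_en, general_multi],
    probabilities=[0.50, 0.40, 0.10],   # 1:1 e-commerce vs. general, ~10% non-English
    seed=42,
    stopping_strategy="all_exhausted",  # keep sampling until every source is used up
)
```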

For the e-commerce domain, we utilize several data sources. On the one hand, we gather data from public listings and product reviews on the eBay website; this data is then thoroughly filtered and serialized to fit the task of autoregressive language modeling. On the other hand, we train an e-commerce classifier and use it to extract e-commerce-specific examples from a larger open-source dataset, as sketched below.
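The article does not specify the classifier architecture; the following sketch assumes a lightweight linear classifier over TF-IDF features as one plausible way to score and filter documents.

```python
# Hypothetical classifier-based filter for extracting e-commerce text from a large
# open corpus. The toy training set, model choice and threshold are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Apple iPhone 13 128GB unlocked, very good condition, free shipping",    # e-commerce
    "Buyer asked about the return policy before purchasing the camera lens", # e-commerce
    "The committee met on Tuesday to discuss the annual budget",             # general
    "Rainfall in the region has declined steadily over the past decade",     # general
]
labels = [1, 1, 0, 0]  # 1 = e-commerce, 0 = general domain

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

def keep(document: str, threshold: float = 0.5) -> bool:
    """Keep a document if the classifier is confident it is e-commerce text."""
    return clf.predict_proba([document])[0, 1] >= threshold

corpus = ["Nike Air Max 90 size 10, brand new in box", "Minutes of the city council meeting"]
ecommerce_subset = [doc for doc in corpus if keep(doc)]
```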

Training Methodology

Training was conducted on 60 nodes, each with 8 NVIDIA H100 80GB GPUs (480 GPUs in total). The GPUs are connected via NVIDIA NVLink (intra-node) and InfiniBand (inter-node). The hardware is part of the eBay compute platform. Model training at this scale requires an efficient distribution of model and optimizer states across several GPUs and sometimes even across nodes. We use Megatron-LM, a highly optimized training framework that allows us to use 3D parallelism in training (data parallelism (DP), tensor parallelism (TP) and pipeline parallelism (PP)), as well as distributed optimizer states and FlashAttention-2, among other optimizations.
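The three parallelism degrees multiply up to the total GPU count. The split below is an assumed example; the article does not state the actual TP/PP/DP sizes used for e-Llama.

```python
# Illustration of how 3D parallelism degrees compose over the 480-GPU cluster.
# The specific TP/PP/DP values are assumptions for a 70B-scale model, not eBay's setup.
num_gpus = 60 * 8                # 60 nodes x 8 H100 GPUs

tensor_parallel = 8              # shard each layer across the 8 GPUs of a node (NVLink)
pipeline_parallel = 4            # split the layer stack into 4 stages across nodes
data_parallel = num_gpus // (tensor_parallel * pipeline_parallel)

assert tensor_parallel * pipeline_parallel * data_parallel == num_gpus
print(f"TP={tensor_parallel}, PP={pipeline_parallel}, DP={data_parallel}")  # TP=8, PP=4, DP=15
```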

We determine the optimal training setup through a series of experiments at a smaller scale. We find that, for our use case, a maximum learning rate of 10% of the original maximum learning rate and a general-to-e-commerce data sampling ratio of 1:1 give the best results. We use cosine learning rate scheduling with warmup, a batch size of approximately 11.8 million tokens and 85k total update steps, which means the models are trained on roughly 1 trillion tokens in total. Training the 70-billion-parameter model on 1 trillion tokens took around one month, or about 340k GPU-hours. Comparing these numbers to what has been reported for the Llama 2 base model training, we find our setup to be even more efficient.
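The token and compute numbers above can be sanity-checked with quick arithmetic, and the schedule is a standard cosine decay with linear warmup. In the sketch below, the original peak learning rate and the warmup length are assumptions for illustration; only the 10% scaling, batch size and step count come from the article.

```python
import math

# Sanity check: tokens per step x steps ≈ 1 trillion tokens (as stated above).
tokens_per_step = 11.8e6
total_steps = 85_000
print(f"total tokens ≈ {tokens_per_step * total_steps / 1e12:.2f}T")   # ≈ 1.00T

# Sanity check: 480 GPUs for ~30 days ≈ 345k GPU-hours (~340k as stated above).
print(f"GPU-hours ≈ {480 * 30 * 24:,}")

# Cosine schedule with warmup; the base peak LR and warmup length are assumed values.
base_peak_lr = 3e-4                 # assumed original Llama peak learning rate
peak_lr = 0.1 * base_peak_lr        # 10% of the original maximum learning rate
warmup_steps = 2_000                # assumed

def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay toward zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```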


e-Llama Performance

The final e-Llama models demonstrate an approximately 25% improvement on e-commerce-specific benchmarks for English and about a 30% improvement for non-English languages when compared to the corresponding Llama 3.1 base models. At the same time, we observe only a 1% degradation on general-domain NLU benchmarks for the larger e-Llama 70B model.

After pretraining, we further instruction-tuned the models, aligning them with human feedback to ensure they generated safe and contextually appropriate content. This tuning also helped the models learn guardrails and follow explicit instructions, enhancing their practical application.

By infusing domain-specific knowledge into the e-Llama models through continued pretraining, we achieve a more efficient training setup and overall improvements in benchmark performance, and this work enables eBay to leverage both proprietary and open LLMs to drive new AI initiatives across the company.

Read more about the development of our Llama-based customized LLMs for e-commerce in this paper.

Learn about our approach to responsible AI here.