Building and releasing a modern homepage for eBay’s hundreds of millions of annual users and tens of millions of daily users was the team’s greatest challenge to date. We were given the mission to develop a personalized buying experience by mashing merchandised best offers with tailored ads, topping it off with the hottest trends and relevant curated content. Any such virtual front door had to be robust to change, easily configurable for any site, and really, really fast to load!
There’s no second chance to make a first impression, so we had to make it count. But how quickly can you go from 0 to 100 (100% deployed for 100% of traffic to the homepage)?
The team commenced this project in late 2016, and we rushed to get an early flavor available for an internal beta in mid-February. Traffic was then ramped slowly until it reached 100% on the Big 4 sites (US, UK, DE, AU) less than three months after the beta, while we simultaneously added more scope and fresh content to the sites.
Our new homepage consists of three main components, spread across three large pools:
- Node Application – The homepage front-end web server handles incoming user requests and acts as the presentation layer for the application, boasting a responsive design.
- Experience Service – The experience service tier is the primary application server that interacts with the page template and customization service and orchestrates the request handling to various data providers. It performs response post-processing that includes tracking. This layer returns a render-ready format that is consumable by all screen types. The experience layer is invoked by Node for desktop and mobile web traffic and directly by the native applications.
- Data Provider Service – The data provider service is a facade that massages responses from datastores into a semantic format. Providers, such as Merchandise, Ads, Marketing, Collections, and Trends, interface with this service. JSON transformation, done with Jolt, produces the response datagrams. (A simplified sketch of the request flow across these tiers follows this list.)
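To make the division of responsibilities concrete, here is a minimal sketch of how the Node presentation tier might consume the render-ready payload produced by the experience service. The endpoint, field names, and types below are hypothetical and are for illustration only, not the actual contracts.

```typescript
// Hypothetical shapes and endpoint, illustrative only, not eBay's actual APIs.
interface HomepageModule {
  moduleId: string;          // e.g. "merchandise-carousel", "trending-items"
  title: string;
  items: Array<{ id: string; imageUrl: string; displayPrice: string }>;
}

interface ExperienceResponse {
  modules: HomepageModule[];          // render-ready, consumable by any screen type
  tracking: Record<string, string>;   // post-processed tracking payload
}

// The Node presentation tier fetches the render-ready payload and hands it to the
// responsive templates; all orchestration and provider calls live one tier down.
async function fetchHomepage(userId: string): Promise<ExperienceResponse> {
  const res = await fetch(
    `https://experience.example.internal/homepage?user=${encodeURIComponent(userId)}`
  );
  if (!res.ok) {
    throw new Error(`Experience service returned ${res.status}`);
  }
  return (await res.json()) as ExperienceResponse;
}
```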
Additional auxiliary components keep the system configurable and make it simple to reorganize the page or publish new content to the site within seconds.
The Experience Service was built on a shared platform called Answers, developed jointly with the Browse and Search domains.
Due to the hierarchical nature of the system, its complex composition (16 unique modules), and its multiple dependencies (9 different services, to be exact), special consideration had to be given to the rollout and deployment process.
We adopted a bottom-up strategy by deploying in the reverse order of dependencies. That is, we started with the datastores, then the data providers and the experience service, and lastly deployed the Node application server. This approach helped us surface issues early in the process and tackle misalignments effectively.
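As a rough illustration, that ordering can be captured in a simple rollout manifest; the pool names below are hypothetical stand-ins, not the actual ones.

```typescript
// Hypothetical rollout manifest; names are illustrative stand-ins for the actual pools.
// Tiers are deployed strictly in this order, each verified before the next one starts,
// so an incompatibility surfaces in the tier that introduced it.
const rolloutOrder: string[] = [
  "datastores",             // schema and content changes go out first
  "data-provider-service",  // facade over the datastores
  "experience-service",     // page composition and orchestration
  "node-web-server",        // presentation layer, deployed last
];
```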
All service-to-service calls are made via clearly defined, internally reviewed APIs. Interface changes are only allowed to be additive to maintain backwards compatibility. This is essential for the incremental bottom-up rollout scheme.
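As a sketch of what “additive only” means in practice, a provider response type may gain new optional fields, but existing fields are never removed or renamed. The interfaces below are hypothetical examples, not the real contracts.

```typescript
// Hypothetical provider contract; field names are illustrative.
// Version 1 of a data-provider response:
interface ItemCard {
  itemId: string;
  title: string;
  price: string;
}

// A later version may add optional fields but never removes or renames existing ones,
// so an experience-service build written against version 1 still parses newer responses.
interface ItemCardV2 extends ItemCard {
  discountBadge?: string;   // new and optional: additive only
  trendingRank?: number;    // new and optional: additive only
}
```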
When developing and deploying a new build, as when delicately changing a layer in a house of cards, one must act with great care and much thought about the inherent risks. A “mock-first” test strategy served as a working proof of concept and allowed testing to begin early on; in the final product, caching techniques quickly took the place of those very same mocks. Unit tests, hand-in-hand with functional and nonfunctional tests, had to be done with rigor, to make regression testing easy, fast, and resilient to change.
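The “mock-first” idea can be sketched as follows: callers are written against a provider interface that is first satisfied by a static mock and later by a cache-backed implementation, so tests written early keep working unchanged. The interface and class names here are illustrative, not the team’s actual code.

```typescript
// A sketch of the "mock-first" idea; interface and class names are illustrative.
interface TrendsProvider {
  getTrendingItems(siteId: string): Promise<string[]>;
}

// Early in development, the page is wired against a static mock...
const mockTrendsProvider: TrendsProvider = {
  async getTrendingItems(): Promise<string[]> {
    return ["mock-item-1", "mock-item-2", "mock-item-3"];
  },
};

// ...and later the same interface is satisfied by a cache-backed implementation,
// so callers and tests written against the mock keep working unchanged.
class CachedTrendsProvider implements TrendsProvider {
  private cache = new Map<string, { items: string[]; expires: number }>();

  constructor(private upstream: TrendsProvider, private ttlMs = 60_000) {}

  async getTrendingItems(siteId: string): Promise<string[]> {
    const hit = this.cache.get(siteId);
    if (hit && hit.expires > Date.now()) {
      return hit.items;
    }
    const items = await this.upstream.getTrendingItems(siteId);
    this.cache.set(siteId, { items, expires: Date.now() + this.ttlMs });
    return items;
  }
}
```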
The team always deployed to Staging and Pre-production before conducting any smoke-testing on three boxes in Production, one VM instance per Data Center. As a result, specific networking issues were exposed and quickly remediated.
Some of the engineering challenges the team faced had to do with Experimentation Platform complexities: a selective pool channel redirect exhibited unexpected traffic-rerouting behavior while we gradually diverted traffic from the existing Feed homepage to the new carousel-based one.
Internal beta testing by the Buyer Experience teams and external in-the-wild testing by a crowd-testing company took place under a zero-traffic experiment. This was followed by a slow 1% traffic ramp to keep the increments under control, with gradual 5% and 25% steps introduced to the pools and days to weeks between phases. Each ramp involved multiple daily Load & Performance cycles, during which we fine-tuned timeout and retry variables.
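The timeout and retry knobs tuned for each downstream provider can be pictured as a small per-provider policy table like the one below; the values, provider names, and fallback options are placeholders, not the settings we actually shipped.

```typescript
// Hypothetical per-provider tuning knobs; values are placeholders, not what we shipped.
interface ProviderCallPolicy {
  timeoutMs: number;                            // hard deadline per downstream call
  retries: number;                              // retries on timeout or 5xx before falling back
  fallback: "omit-module" | "cached-response";  // behavior when the provider cannot answer in time
}

const callPolicies: Record<string, ProviderCallPolicy> = {
  merchandise: { timeoutMs: 250, retries: 1, fallback: "cached-response" },
  ads:         { timeoutMs: 200, retries: 0, fallback: "omit-module" },
  trends:      { timeoutMs: 300, retries: 1, fallback: "cached-response" },
};
```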
To summarize our key learnings from the homepage experience:
- Relying on clearly-defined APIs was vital in achieving an iterative, incremental development and rollout strategy.
- Co-evolving our logging and monitoring infrastructure and the team’s debugging expertise was essential to achieving an aggressive rollout approach on a distributed stack.
- Tracking and Experimentation setups, as well as baking in Accessibility, proved to be major variables that should be well-scoped into a product’s development lifecycle and launch timelines.
- Introducing gradual functional increments, with a design for testability in mind, proved to be a valuable quality-control measure and allowed us to absorb change while moving forward.
- A meticulous development and release process kept the associated risks and mitigation actions under control, allowing the team to meet its milestones as defined and on time.