Move Fast and (Don’t) Break Things
When building rich user interfaces, validating the correctness of the UI across all form factors and variations has always been a challenge. How the UI looks on a single device is only one consideration. From text size to accessibility to different languages, engineers must think about – and test for – a seemingly endless number of possibilities.
This has been a critical consideration and a top priority as we continue to develop and update the eBay Motors App. For this project, we used Flutter, Google’s UI toolkit for building beautiful, natively compiled applications for mobile, web, and desktop from a single codebase.
As one of the earliest large companies to adopt Flutter in a production environment, we had to think through many different UI factors that could affect the experience for our diverse, global customer base.
We knew from experience that automation was the only scalable way to protect against regressions as we rapidly iterated on the eBay Motors App. When most engineers think about UI test automation, they think about behavior testing: does a button perform the right action, is the screen updated when a network call completes, and so on. With this approach, you can test some of your most important requirements and feel confident that they will continue to work as expected as the software evolves.
However, in a rich client application, many requirements are visual: ensuring that content overlaid on an image has enough contrast, that multiple elements are visually aligned, or that certain data elements are properly emphasized. These requirements are awkward, and often impossible, to test with behavioral approaches.
This is where screenshot testing comes in.
Flutter provides out-of-the-box screenshot testing in the form of “golden tests.” During a test run, a Flutter widget is rendered and captured as a screenshot, which is then compared to a known-good reference screenshot (the “golden”). If the two deviate, the test fails.
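For reference, a plain Flutter golden test looks roughly like the sketch below. `testWidgets`, `expectLater`, and `matchesGoldenFile` come from `flutter_test`; the `ListingBadge` widget and the `goldens/listing_badge.png` path are hypothetical placeholders chosen for illustration. Running `flutter test --update-goldens` writes the reference image, and later `flutter test` runs compare against it.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical widget, used only to illustrate a golden test.
class ListingBadge extends StatelessWidget {
  const ListingBadge({super.key, required this.label});

  final String label;

  @override
  Widget build(BuildContext context) {
    return Container(
      padding: const EdgeInsets.all(8),
      color: Colors.blue,
      child: Text(label, style: const TextStyle(color: Colors.white)),
    );
  }
}

void main() {
  testWidgets('ListingBadge matches its golden', (WidgetTester tester) async {
    // Render inside a minimal app shell so Directionality, MediaQuery,
    // and theming are available to the widget.
    await tester.pumpWidget(
      const MaterialApp(
        home: Center(child: ListingBadge(label: 'Auction')),
      ),
    );

    // Compare the rendered widget against the stored reference image.
    // Create or refresh the reference with: flutter test --update-goldens
    await expectLater(
      find.byType(ListingBadge),
      matchesGoldenFile('goldens/listing_badge.png'),
    );
  });
}
```

By default, widget tests render text with a placeholder test font that draws glyphs as rectangles, which is one reason text is hard to get right in goldens without extra setup.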
This is a powerful tool, and one that the Flutter team uses extensively in its own codebase. However, the out-of-the-box support is fairly basic. As we introduced golden tests into our test suite, we found ourselves repeatedly copying the same configuration code into every new test. We also ran into hurdles achieving fidelity in our goldens, such as getting fonts and image assets to load correctly. And as we added more tests and tried to test UIs that varied by data or by device, the number of goldens we created became difficult to manage.
Fortunately, we started to recognize some common patterns that simplified the problem space. As these patterns evolved, we packaged them up into an open source package called the Golden Toolkit. Let’s take a look at some examples.
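As one example of setup the package can centralize, golden_toolkit exposes a `loadAppFonts()` helper that loads the fonts bundled with the app so goldens show real text. A common pattern is to call it once from a `flutter_test_config.dart` file, which `flutter test` picks up automatically for tests in that directory and its subdirectories. This is a minimal sketch; check the package README for the version you use.

```dart
// test/flutter_test_config.dart
import 'dart:async';

import 'package:golden_toolkit/golden_toolkit.dart';

// flutter test calls this before running the tests in this directory
// (and its subdirectories), passing each test file's main as testMain.
Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  // Load the app's bundled fonts once, instead of repeating the setup
  // in every golden test, so text renders with real glyphs.
  await loadAppFonts();
  return testMain();
}
```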
A Dozen Reasons to Change
Consider an example from the Home screen of the eBay Motors app.
The home screen is filled with cards that represent active vehicle listings on eBay. Our first requirement was to be able to display an “auction” listing. Here is a golden of a VehicleCard representing an auction.
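A single-scenario golden for a card like this could be written with the toolkit's `testGoldens`, `pumpWidgetBuilder`, `materialAppWrapper`, and `screenMatchesGolden` helpers, as sketched below. The `VehicleCard` shown here, its fields, the sample values, and the `auction_vehicle_card` golden name are all hypothetical; the real widget's API is not described in this post.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';

// Hypothetical stand-in for the app's VehicleCard.
class VehicleCard extends StatelessWidget {
  const VehicleCard({
    super.key,
    required this.title,
    required this.price,
    required this.timeLeft,
  });

  final String title;
  final String price;
  final String timeLeft;

  @override
  Widget build(BuildContext context) {
    return Card(
      child: Padding(
        padding: const EdgeInsets.all(12),
        child: Column(
          crossAxisAlignment: CrossAxisAlignment.start,
          mainAxisSize: MainAxisSize.min,
          children: [
            Text(title, style: const TextStyle(fontWeight: FontWeight.bold)),
            Text(price),
            Text(timeLeft),
          ],
        ),
      ),
    );
  }
}

void main() {
  testGoldens('VehicleCard renders an auction listing', (tester) async {
    // Pump the card inside a MaterialApp on a surface sized for one card.
    await tester.pumpWidgetBuilder(
      const VehicleCard(
        title: 'Example vehicle',
        price: r'$12,500',
        timeLeft: '2d 4h left',
      ),
      wrapper: materialAppWrapper(),
      surfaceSize: const Size(300, 160),
    );
    // Compares against (or, with --update-goldens, writes) this golden.
    await screenMatchesGolden(tester, 'auction_vehicle_card');
  });
}
```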
This is pretty straightforward. The next feature was to emphasize the time remaining on listings that were about to expire. Then we were asked to show the current bid price once the auction’s reserve had been met.
We then added support for non-auction listings. Then we wanted confidence that the widget was responsive for different widths. Before long, we ended up with nearly a dozen variations of this “simple” card.
We started off capturing a separate golden for every one of these, and it quickly became unmanageable. Not only did it require a lot of redundant test code, it also obscured the requirements: automated tests are intended to serve as documentation for other developers, and having so many similar but nuanced variations made it difficult to see the big picture.
What we wanted was to see all of the variations at once.
Then we realized that, with Flutter, we could. After all, goldens are simply captured renderings of widgets. We just needed to build a new widget that could display all of the widgets we cared about at once.
Here is what we ended up with:
This approach makes it easier to document and visualize all the requirements, and it helps code reviewers understand the impact of a change in a pull request. It is also easy to extend, so future requirements can be captured with minimal effort.
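The toolkit's `GoldenBuilder` packages this idea: each variation is added as a named scenario, and `build()` returns a single widget that lays them out (here in a grid) so one golden captures them all. The sketch below uses a hypothetical, simplified `VehicleCard`; its fields, the scenario names, and the values are illustrative only.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';

// Hypothetical, simplified card used only to illustrate GoldenBuilder.
class VehicleCard extends StatelessWidget {
  const VehicleCard({
    super.key,
    required this.price,
    this.timeLeft,
    this.endingSoon = false,
  });

  final String price;
  final String? timeLeft; // null for non-auction (fixed-price) listings
  final bool endingSoon; // emphasize the time remaining when true

  @override
  Widget build(BuildContext context) {
    return Card(
      child: Padding(
        padding: const EdgeInsets.all(12),
        child: Column(
          crossAxisAlignment: CrossAxisAlignment.start,
          mainAxisSize: MainAxisSize.min,
          children: [
            Text(price, style: const TextStyle(fontWeight: FontWeight.bold)),
            if (timeLeft != null)
              Text(
                timeLeft!,
                style: TextStyle(color: endingSoon ? Colors.red : null),
              ),
          ],
        ),
      ),
    );
  }
}

void main() {
  testGoldens('VehicleCard variations', (tester) async {
    // Each scenario is rendered in its own labeled cell of the grid.
    final builder = GoldenBuilder.grid(columns: 2, widthToHeightRatio: 1.5)
      ..addScenario(
        'Auction',
        const VehicleCard(price: r'$12,500', timeLeft: '2d 4h left'),
      )
      ..addScenario(
        'Auction ending soon',
        const VehicleCard(
            price: r'$12,500', timeLeft: '12m left', endingSoon: true),
      )
      ..addScenario('Fixed price', const VehicleCard(price: r'$18,000'));

    await tester.pumpWidgetBuilder(
      builder.build(),
      wrapper: materialAppWrapper(),
      surfaceSize: const Size(600, 400),
    );
    // One golden now documents every variation side by side.
    await screenMatchesGolden(tester, 'vehicle_card_variations');
  });
}
```

Adding a new requirement then means adding one more `addScenario` call and regenerating a single golden.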
Putting It All Together
Now we had a viable strategy for ensuring that individual UI components looked great and didn’t break. But could we use goldens to test more complicated layouts, or even entire screens? One of our biggest pain points was ensuring that complex, full-screen layouts continued to look correct on both phones and tablets.
We arrived at a simple API for capturing goldens for full screens. Here is a multiScreenGolden test of the same screen on a phone versus a tablet.
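A minimal sketch of such a test follows. `multiScreenGolden` resizes the test surface for each `Device` in `devices` and records one golden per device; `Device.phone` and `Device.tabletLandscape` are presets shipped with golden_toolkit, while `HomeScreen` and its responsive grid are hypothetical placeholders for the screen under test.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';

// Hypothetical screen: a grid whose column count adapts to the width.
class HomeScreen extends StatelessWidget {
  const HomeScreen({super.key});

  @override
  Widget build(BuildContext context) {
    final width = MediaQuery.of(context).size.width;
    final columns = width > 600 ? 3 : 1;
    return Scaffold(
      appBar: AppBar(title: const Text('Home')),
      body: GridView.count(
        crossAxisCount: columns,
        children: List.generate(
          6,
          (i) => Card(child: Center(child: Text('Listing ${i + 1}'))),
        ),
      ),
    );
  }
}

void main() {
  testGoldens('HomeScreen on phone and tablet', (tester) async {
    await tester.pumpWidgetBuilder(
      const HomeScreen(),
      wrapper: materialAppWrapper(),
    );
    // Captures the same screen once per device.
    await multiScreenGolden(
      tester,
      'home_screen',
      devices: [Device.phone, Device.tabletLandscape],
    );
  });
}
```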
With multiScreenGolden, it is possible to simulate any device that is relevant for your test. Perhaps you want to make sure the content looks okay on a very small device, such as an iPhone SE, or you want to see how a screen looks in a different language, in dark mode, or with larger device fonts enabled. You can always test those variations manually during development, but it’s not practical to keep verifying by hand that they continue to work as expected as the software evolves.
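Devices are plain values, so variations like these can be described as custom `Device` instances, as sketched here. The constructor parameters used (`name`, `size`, `textScale`, `brightness`) reflect the `Device` class in recent golden_toolkit releases; verify them against your version. The 320×568 size approximates the original iPhone SE in logical pixels, and the dark variant only changes the rendering if the app's theme follows platform brightness, which the hypothetical wrapper below does via `darkTheme`. `ListingScreen` is likewise a hypothetical placeholder.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';

// Hypothetical screen under test.
class ListingScreen extends StatelessWidget {
  const ListingScreen({super.key});

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: const Text('Listing')),
      body: const Padding(
        padding: EdgeInsets.all(16),
        child: Text(r'Current bid: $12,500 (reserve met)'),
      ),
    );
  }
}

// Hypothetical device variations; sizes are in logical pixels.
const smallPhone = Device(name: 'small_phone', size: Size(320, 568));
const largeText =
    Device(name: 'phone_large_text', size: Size(375, 667), textScale: 2.0);
const darkPhone = Device(
    name: 'phone_dark', size: Size(375, 667), brightness: Brightness.dark);

void main() {
  testGoldens('ListingScreen across device variations', (tester) async {
    await tester.pumpWidgetBuilder(
      const ListingScreen(),
      // Custom wrapper so the app switches themes with platform brightness.
      wrapper: (child) => MaterialApp(
        debugShowCheckedModeBanner: false,
        theme: ThemeData.light(),
        darkTheme: ThemeData.dark(),
        home: child,
      ),
    );
    await multiScreenGolden(
      tester,
      'listing_screen_variations',
      devices: [smallPhone, largeText, darkPhone, Device.tabletPortrait],
    );
  });
}
```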
With GoldenBuilder, we had a strategy for writing visual unit tests for individual widgets, and with multiScreenGolden() we had a strategy for writing visual integration tests that verify the final end-user experience.
In Conclusion: Move Quickly With Confidence
The Golden Toolkit’s easy-to-use APIs have allowed us to write UI regression tests across our entire app. As a result, we have been able to move quickly without sacrificing quality, confident that the app will look great on all form factors as the code continues to change and evolve.
Developers who want more information about the Golden Toolkit can find it at https://pub.dev/packages/golden_toolkit