tl;dr
No.
Well, perhaps, but it can be the lesser of two evils.
It depends.
Disclaimer – these thoughts reflect my personal opinions and not a consensus of how the entire Shutl engineering team at eBay feels.
Full version
As a software engineer, I’ve always had it drilled into my head that duplication is a bad thing, and teams I’ve been a part of have always worked diligently to reduce all forms of duplication they see. This as a general rule makes a lot of sense; I can still remember reading through Martin Fowler’s Refactoring while I was at university and the excitement of discovering design patterns.
Duplication is bad, but…
Good developers are precise and careful and understand the risks and pitfalls of duplication in a code base—see the DRY (don’t repeat yourself) principle, from The Pragmatic Programmer:
“The alternative is to have the same thing expressed in two or more places. If you change one, you have to remember to change the others, or your program will be brought to its knees by a contradiction. It isn’t a question of whether you’ll remember: it’s a question of when you’ll forget.”
Which makes perfect sense. It’s time well spent when you try to make your code streamlined and readable. You’ll end up with a cleaner, easier-to-maintain, and more extensible code base as a result. I briefly mention this point here just to give some context, but it’s been covered many times and I’m assuming that we’re all on board: Duplication is a code smell, and removing it is usually a good thing.
However, like most things, I believe this advice starts getting you into trouble if you take it to an extreme. Advice, just like code, does not live in a vacuum; context is king. For example, when it comes to unit tests, sometimes the cost introduced by removing duplication can be greater than the benefits gained. What I plan to show you in this post are a few examples of where I would leave duplication in place and how this duplication has, in my mind, led to cleaner and more readable code.
Peacock Programmers
(image source: https://www.flickr.com/photos/pokerbrit/9455644074)
I was trying to think up an appropriate term for a thought I had in my head and settled on Peacock Programmers.
[Edit – I’ve discovered that an alternative name might already exist for this (and here I thought I was being original) – The Magpie Developer ]
A peacock programmer is someone who loves to use every tool in the arsenal and for various reasons attempts to decorate code with every possible feature that is available to him or her.
You can recognize the work of a peacock programmer through inelegant attempts at elegance. It’s wordy, flashy, complicated looking, and almost impossible to understand without deep concentrated effort and untangling. Sure it’s fun to write and a blast to experiment and learn all the different possible ways of solving a problem, but such use is as an academic tool. When you’re seeing it on a production code base, it can be frustrating.
Are Peacock Programmers a bad thing?
While talking through the concept of peacock programmers over lunch in the office, a colleague, Kai, raised an interesting point. The term sounds rather negative, and in truth my description in the previous section was slanted that way. However, Kai points out that It is only through the efforts of these peacock programmers and excessive use of all these features that the rest of us learn where the limits of reasonable use are. It’s a common pattern with many technologies, such as with meta-programming. The first stage is to learn the technology. The next stage is to get excited by this new knowledge and attempt to use it everywhere. It’s then, by seeing it used excessively and understanding the consequences of that, that you learn to regulate its use and ensure it’s appropriate.
As Kai sees it, peacock programmers are the early adopters that the rest of us ultimately learn moderation from, and their nature is an essential part of our learning.
A case study
The focus of this blog post is a real unit test from a live code base that I’ve worked on. I’ve copied the original test here below** as an example that the rest of this blog post is going to focus on. There were some more interesting examples to pick from, but they started to get a little unwieldy for blogging purposes, so I’ve picked a simpler example. I’ll add some notes at the end on other issues I had found in the larger tests. I’ll also not be concerning myself with actual details of what is being tested; whether the testing is appropriate or well-formed is not the focus here, but rather just the overall structure of the test class.
** Disclaimer – the code and the style have been preserved exactly as the were. I’ve merely renamed classes/methods for anonymity.
require File.expand_path('spec/spec_helper')
describe NewFarm do
let(:valid_params) {
{size: "large", colour: "#E0E0EB", breed: "Greyface Dartmoor", user_id: 1321}
}
let(:expected_attributes) do
{
size: "large",
colour: "#E0E0EB",
breed: "Greyface Dartmoor",
user_id: 1321,
fake_sheep: nil,
session_id: "a uuid",
value: 0
}
end
let(:service) do
double 'service'
end
let(:sheep_collection) { double("sheep collection", id: "an id", valid?: true) }
before do
service.stub(:generate_sheep).and_return(sheep_collection)
SecureRandom.stub(:uuid).and_return("a uuid")
end
let(:new_farm) { NewFarm.new service, valid_params }
subject { new_farm }
describe "#formatted_base_errors" do
let(:sheep_collection) {
double("sheep collection", id: "an id", valid?: false, errors: {base: errors})
}
before do
new_farm.save
end
context "when no errors are present on the sheep_collection" do
let(:errors) { [] }
its(:formatted_base_errors) { should == "" }
end
context "when errors are present on the sheep_collection" do
let(:errors) { ["no wool", "pretty sure it's a disguised wolf"] }
its(:formatted_base_errors) { should == "no wool, pretty sure it's a disguised wolf"}
end
end
describe "#save" do
context "success" do
it "generates sheep" do
service.should_receive(:generate_sheep).
with(expected_attributes).
and_return(sheep_collection)
new_farm.save
end
it "persists the farm" do
new_farm.save
Farm.last.uuid.should == "a uuid"
end
it "associates the sheep collection with the farm" do
SecureRandom.stub(:uuid).and_return("a uuid")
service.should_receive(:generate_sheep).
with(expected_attributes).
and_return(sheep_collection)
new_farm.save
Farm.last.sheep_collection_id.should == "an id"
end
its(:save) { should be_true }
end
context "when colour is wrong for breed" do
let(:params) {
{size: "newborn",
colour: "#ff69b4",
breed: "Badger Face Welsh Mountain"}.with_indifferent_access
}
let(:new_farm) { NewFarm.new service, params }
its(:save) { should == false}
it "puts an error on the colour" do
new_farm.save
new_farm.errors[:color].should == [
"Badger Face Welsh Mountain sheep are not #ff69b4"
]
end
end
context "when input is missing" do
let(:new_farm) { NewFarm.new service }
its(:save) { should == false }
it "doesn't call the service" do
service.should_not_receive(:generate_sheep)
new_farm.save
end
it "doesn't create a farm" do
expect {
new_farm.save
}.not_to change{ Farm.count }
end
it "adds errors for each field" do
new_farm.save
[:size, :colour, :breed].each do |attr|
new_farm.should have(1).error_on attr
end
end
end
context "when service returns errors" do
let(:sheep_collection) {
double 'sheep_collection', {"valid?" => false,
"errors" => {
"size" => ["oh noes"],
"base" => ["A base error", "Is something else"]
}
}
}
before do
service.should_receive(:generate_sheep).and_return(sheep_collection)
end
it "puts the errors on itself" do
new_farm.save
new_farm.errors[:size].should == ["oh noes"]
new_farm.errors[:base].should =~ ["A base error", "Is something else"]
end
it "returns false" do
new_farm.save.should be_false
end
end
end
context "#farm_id" do
it "returns the farm uuid" do
new_farm = NewFarm.new service, valid_params
new_farm.save
new_farm.farm_id.should == Farm.last.uuid
end
end
end
This test was typical for the code base, and was seen by many as a well,written one. In some ways it is; the tests are small, they’re contained, and they test one thing each.
Yet I’ve always been frustrated when I see code like this. I’m a fan of simplicity and try to avoid, as much as I can, the use of superfluous language features. Maybe I’m just stupid (this is quite possible) but I look at a test like this and I really struggle to understand what’s going on. Also important to note is that I’ve picked one of the simpler samples for this post.
Nested describes – like a boss
We ran an analysis on a small set of our tests (the tests for one of our services) to figure out the worst culprits for nested describes/contexts. Our record was seven. That’s right, seven layers of nested description before you finally get to the test body. Good luck trying to figure out what’s actually being tested and finding the relevant bits of setup!
Nested describes/contexts can potentially be a way of helping structure your tests, but give it a moment’s thought before you wrap that spec in a context and ask yourself “why am I doing this?” Especially when you’re creating a context with just one test inside it, why not just make the test name more descriptive? In isolation, the idea of being able to use contexts to help structure and add meaning to your tests is a good one, and I’ve made good use of contexts. But use them sparingly, lift your head regularly from the depths of your spec, and take a look at the whole test file and ask yourself “how’s it looking?” Once you start going more than a couple of levels deep, perhaps it’s time to rethink and ask yourself why your test is so complicated.
e.g., instead of this:
require "spec_helper"
context "when something interesting happens"
it "behaves like this" do
...
end
end
…how about this:
require "spec_helper"
it "behaves like this when something interesting happens" do
...
end
Bonus:
Here’s an example I found from a code base (again, with the names obfuscated; my only replacements are farm and sheep) that I’ve worked on with 6 layers (I couldn’t find the record-breaking seven-layer spec ? )
describe Farm::UpdateSheep do
...
describe '#from_farm' do
...
context 'saving from farm' do
...
describe 'formatting for the api' do
...
context 'with a nulled date' do
...
it 'submits an empty date' do
...
Now imagine the message you would see when this test fails…
Farm::UpdateSheep #from_farm saving from farm formatting for the api with a nulled date submits an empty data
‘Let’ it go
Another feature that I think is worryingly overused is ‘let’. Now I understand that let is lazily evaluated and can improve performance, but we’re talking about unit tests here; running a suite should be taking in the order of seconds. What I’m more concerned about is the time wasted trying to decipher a test class that has ‘let’ blocks scattered all over.
Again, my biggest issue with them comes from misuse and misdirection. I have seen many simple unit test classes that have been made so much more verbose and complex through the addition of unnecessary ‘let’ blocks. Their introduction led to an explosion in the scope of a test. No longer can I look at a self-contained ‘it’ block and understand the test—now I need to scroll all around, trying to piece together the nested contexts and follow the chain of ‘let’ blocks to see which data is being set up, and then see which of these many lets my test is overriding… I’m getting anxious just thinking about it (more thoughts on this, and alternative structure can be found in the When, Then, Given section below).
Subject
This is probably my least-favorite feature. I can’t think of a single time in my own experience where I’ve thought that using ‘subject’ was a good idea and I am really confused as to why people are propagating its use. It just further reduces readability and adds pointless misdirection to your tests. When I first experienced the joys of RSpec after a stint working in Java, I loved the expressiveness and the flow of the test structure. But ‘subject’ feels like a step backwards.
It gets even more confusing when your spec is testing something that is happening in the initialization of the class under test. Note: bonus point for combining subject with described_class.
subject {described_class.new}
...
# way way down in the file
it "does something" do
Sheep.should_receive(:is_disguised_wolf?)
...
subject
end
We came across a test like this by chance while working on a class, and my pairing partner saw the trailing subject and assumed it was some mistake as it looked like a pretty meaningless line accidentally added at the end of a long test. My partner deleted this line of code without giving it much thought, and then a few minutes later when we ran the test, it naturally started failing.
My advice: ditch the pointless subject—new is not always better. Make your tests explicit. I’d much rather see an appropriately named variable that is instantiated within the test body where it’s used as opposed to a cryptic ‘subject’ that was defined way up near the top of the test (or somewhere in the nest of contexts I had to traverse to reach the test).
When, Then, Given
I’m sure you’re starting to see a pattern here. Most of my annoyances are born from testing styles that increase the spread of focus required to understand what is happening. The idea of a unit test is that it tests a unit. I expect them to be concise and self-contained.
The way I try to achieve this structure is by following the “When, Then, Given” pattern. I’m assuming that you’re going to be familiar with the BDD given/when/then (or the synonymous arrange/act/assert) philosophy of test structure. When I write a unit test, I start by mentally picturing these as three distinct sections of my test. At times, when trying to make this pattern explicit or to share it with a new team. I explicitly express the sections through comments:
it "does something" do
# Given
...
# When
...
#Then
...
end
So far so good. However, the mistake people make from here is to start working through the sections, filling them in. I wouldn’t do that. Start with the When. This should be one line, and the most obvious/easiest to write. When I look at a test written in this format I can immediately recognize the trigger for your scenario.
Next, move on to the Then. This is the next easiest section to fill in. What is the expected outcome?
And only then should you go back and complete the Given. In order to achieve this result, what data is required? And keep everything in the test! Again, this advice is flexible; I can understand there are times when you’d want to pull things out. But I try to resist this urge as much as possible (in the rewritten version of the test at the end of this post, you’ll notice I did use one ‘let’ block). Just keep the exceptions to this practice under check, or else a few months later your test will have grown into a mess. Keep the discipline.
This point brings us back to another reason I hate ‘let’ blocks. You can no longer read a spec and understand why the class under test is behaving the way it does. The data required to achieve its result is now spread all around. In addition, while you’re following all these ‘let’ blocks to see what your test is doing, you’re chasing a load of red herrings, as much of this data is entirely irrelevant to the test’s specific scenario.
Shared examples = shared pain
I couldn’t find a small enough sample to include as a case study of this point, but shared examples are just a nightmare! Imagine all the trouble of going back and forth across a long test file trying to make heads or tails of it, and then multiply that by 10. Just avoid.
Less duplication = more lines of code…right?
Back to our case study: I’ve attempted to rewrite a number of tests using the style I’m recommending, and the first time I tried, I truly did expect there to be more lines of code. I was OK with that; I was willing to accept that as a trade-off for the increased readability and comprehensibility of the tests.
The original test was 162 lines of code.
The rewrite…101.
Technically speaking, there is more code, the lines are longer, but there are fewer of them, and they conveyed more meaning. I’ve seen the same result in almost every test I’ve rewritten, and I was surprised.
So here’s the finished result. This is the above test as I would have written it. It’s not perfect and perhaps it’s just me, but I find this style so much easier to comprehend and get my head around.
require "spec_helper"
describe NewFarm do
let(:valid_params) { {size: "large", colour: "#E0E0EB", breed: "Greyface Dartmoor", user_id: 1321} }
it 'only returns base errors when there are no other errors on the sheep' do
sheep_collection = double(Service::SheepCollection, valid?: false, errors: {base: ["no wool", "pretty sure it's a disguised wolf"]})
service = double("service", generate_sheep: sheep_collection)
new_farm = NewFarm.new(service, valid_params)
new_farm.save.should be_false
new_farm.formatted_base_errors.should == "no wool, pretty sure it's a disguised wolf"
end
it 'has no formatted base errors when there are both errors on base as well as other errors on the farm' do
sheep_collection = double(Service::SheepCollection, valid?: false, errors: {base: ["farm errors should take priority"]})
service = double("service", generate_sheep: sheep_collection)
new_farm = NewFarm.new(service, {breed: nil})
new_farm.save.should be_false
new_farm.formatted_base_errors.should be_nil
end
it 'returns nil when trying to format base errors on a valid farm' do
sheep_collection = double(Service::SheepCollection, valid?: true, errors: {})
service = double("service", generate_sheep: sheep_collection)
new_farm = NewFarm.new(service, valid_params)
new_farm.save
new_farm.formatted_base_errors.should be_nil
end
context "#save" do
it 'generates sheep on successful save' do
expected_attributes = {size: "large", colour: "#E0E0EB", breed: "Greyface Dartmoor",
user_id: 1321, fake_sheep: nil, session_id: "a uuid", value: 0}
RandomGenerator.stub(:uuid).and_return("a uuid")
service = double("service")
service.should_receive(:generate_sheep).with(expected_attributes).and_return(double("sheep collection", id: "id", valid?: true))
NewFarm.new(service, valid_params).save
end
it "persists the sheep" do
service = double("service", generate_sheep: double("sheep collection", id: "id", valid?: true))
RandomGenerator.stub(:uuid).and_return("a uuid")
NewFarm.new(service, valid_params).save
Farm.last.uuid.should == "a uuid"
end
it "returns the sheep uuid" do
service = double("service", generate_sheep: double("sheep collection", id: "id", valid?: true))
new_farm = NewFarm.new service, valid_params
new_farm.save
new_farm.farm_id.should == Farm.last.uuid
end
it "associates the sheep collection with the farm" do
service = double("service", generate_sheep: double("sheep collection", id: "an id", valid?: true))
NewFarm.new(service, valid_params).save
Farm.last.sheep_collection_id.should == "an id"
end
it "puts an error on the colour when it doesn't make sense for the breed" do
service = double('service', generate_sheep: double("sheep colletion", id: "an id", valid?: true))
farm = NewFarm.new(service, {size: "newborn", colour: "#ff69b4", breed: "Badger Face Welsh Mountain"})
farm.save
farm.errors[:colour].should == ["Badger Face Welsh Mountain sheep are not #ff69b4"]
end
it "adds errors and doesn't create a farm when input is missing" do
service = double("service", generate_sheep: double("sheep collection", id: "id", valid?: true))
new_farm = NewFarm.new service
expect{ new_farm.save }.not_to change{ Farm.count }
[:size, :colour, :breed].each{ |attr| new_farm.should have(1).error_on attr }
end
it 'puts sheep collection errors onto self when there are errors' do
sheep_collection = double("sheep collection", valid?: false, errors: {base: ["Error"], size: ["oh noes"]})
service = double("service", generate_sheep: sheep_collection)
new_farm = NewFarm.new(service, valid_params)
new_farm.save
new_farm.errors[:size].should == ["oh noes"]
new_farm.errors[:base].should =~ ["Error"]
end
end
end