But code is sensitive. A single bit error in just the right place can crash the program and corrupt the data it handles. Most systems aren’t this sensitive. Bridges can withstand the removal of bolts and supports, skyscrapers can withstand significant fires, airliners can still fly when an engine fails. But the very nature of software makes it vulnerable to single point failures.
For the Space Shuttle, NASA solved this problem by writing the control software twice. They gave the exact same specification to two different teams and had both teams implement that specification. During a flight they run both versions of the software in different computers on board the orbiter, and compare the results. Disagreements between the two programs cause humans to get involved. This method is effective, but not perfect because it’s possible that the two development teams may have made similar errors in judgement. But the odds against that are reasonably high.
Accountants also manipulate the life’s blood of their companies. They too create structures that are sensitive to single point failures. The wrong digit on a spreadsheet in just the wrong place at just the wrong time can bring the company down and send the executives to jail. To manage this sensitivity, accountants emulate NASA. They do everything twice using a practice called double–entry bookkeeping. Each transaction gets entered once in the debit accounts and again in the credit accounts. From there they follow separate mathematical pathways until they meet at a grand subtraction on the balance sheet that must yield a zero. This method is effective but not perfect because it’s possible that complementary errors might hide an imbalance. But the odds against that are reasonably high.
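The balancing idea can be sketched in a few lines of Python. This is only an illustration: the Ledger class, its method names, and the account names are hypothetical, and real bookkeeping systems are far more involved.

```python
from decimal import Decimal


class Ledger:
    """Toy double-entry ledger: every transaction is recorded twice."""

    def __init__(self):
        self.debits = []   # (account, amount) pairs
        self.credits = []  # (account, amount) pairs

    def record(self, debit_account, credit_account, amount):
        # One transaction, entered once as a debit and once as a credit.
        amount = Decimal(amount)
        self.debits.append((debit_account, amount))
        self.credits.append((credit_account, amount))

    def imbalance(self):
        # The two columns are totalled separately and then subtracted.
        # A correct ledger yields zero.
        total_debits = sum((amt for _, amt in self.debits), Decimal(0))
        total_credits = sum((amt for _, amt in self.credits), Decimal(0))
        return total_debits - total_credits


ledger = Ledger()
ledger.record("inventory", "cash", "250.00")
ledger.record("cash", "sales", "400.00")
assert ledger.imbalance() == 0

# A wrong digit in only one column is exposed by the subtraction.
ledger.credits[0] = ("cash", Decimal("205.00"))
assert ledger.imbalance() != 0
```

Note that if the same wrong digit were entered in both columns, the subtraction would still yield zero, which is the kind of complementary error that can hide an imbalance.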
Accountants have gone so far as to make this practice part of the GAAP (Generally Accepted Accounting Principles) that define professional behavior amongst accountants.
How can we, as programmers, treat our code with less respect than accountants treat their spreadsheets? Is our code less important? Are errors in our code less costly? Are we under more time pressure than accountants? Why haven’t we defined GAPP (Generally Accepted Programming Principles)? Why haven’t we defined professional behavior for programmers?
The answer to that, of course, is that our “profession” is young. It’s barely fifty years old. Accountants started dabbling with double–entry bookkeeping nearly a millennium ago, and only formally adopted it in the 1500s. Programmers simply haven’t had much time to understand what professionalism is.
But the clock is ticking, and the stakes are high. We cannot continue to treat the life’s blood of our companies so recklessly. It is time for us to define our profession. It’s time we adopted the practices and principles that define good programming.
What programming practice corresponds to double–entry bookkeeping? Double–entry bookkeeping is not simply a matter of doing everything twice. There is a method and process involved. When you follow it, you are indeed doing everything twice, but you are doing it in a certain, well accepted way. You use the accepted nomenclatures, and the accepted procedures, and you produce the accepted reports in the accepted formats. Whatever practice we, as programmers, adopt, it should have a similar formalism.
The best candidate for such a practice is called Test Driven Development (TDD). It has a formalism, a method, a set of accepted nomenclatures and procedures, and it definitely causes everything to be done twice. The rules of TDD are simple, though if you’ve not seen them before you may find them somewhat startling.
1. You must not write any production code until you have first written a failing unit test.
2. You must not write any more of a unit test than is sufficient to fail. (And compilation errors are failures.)
3. You must not write any more production code than is sufficient to pass the currently failing unit test.
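One pass through these three laws might look like the following minimal Python sketch. The Stack class and the test names are hypothetical, chosen only for illustration, and the tests are plain functions with assert statements so that no particular test framework is assumed.

```python
# Step 1 (first law): write a failing test before any production code.
# Before Stack exists, calling this test raises a NameError, which in
# Python plays the role of the compilation error.
def test_new_stack_is_empty():
    assert Stack().is_empty()


# Step 2 (third law): write only enough production code to pass it.
class Stack:
    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    # Step 4: added only after the test below was written and seen to fail.
    def push(self, item):
        self._items.append(item)


# Step 3 (second law): return to the tests and add just enough to fail.
def test_stack_is_not_empty_after_push():
    stack = Stack()
    stack.push(42)
    assert not stack.is_empty()


# Every test is run after each step, so the code is never far from a
# state in which everything passed.
test_new_stack_is_empty()
test_stack_is_not_empty_after_push()
```

The listing shows the end state; in practice the file would have grown one step at a time, with the tests run between steps.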
If you follow these three laws you will find yourself locked in a cycle that is a minute or so long. You must begin by writing a unit test for the desired functionality. But that unit test will quickly fail to compile because it will instantiate classes and call functions that have not been written yet. So you must stop writing the unit test and start writing production code. But fixing the compiler or logic errors does not take long and then you must stop writing production code and return to the test.
At first, most programmers reject these rules out of hand. They complain that it will double their coding time. They worry that the constant switching between tests and code will break their train of thought. And they don’t believe that the result will be any better.
Complaints like this might explain why it took five hundred years for accountants to adopt double–entry bookkeeping. In a time when all accounting was done by hand using expensive and difficult writing materials, one can only imagine the resistance such a bold new idea would have encountered. But TDD has been around for over ten years now, and the results are in. It definitely does not double the cost of software. Indeed, it seems to halve it. It does not slow the programmers down; it seems to double their speed. It does not break their train of thought; in fact, it seems to lead to better designs. And the results, in terms of deployed defects, have been measured to be ten times better*.
But why should this be? Let’s engage in a simple thought experiment. Imagine a team of developers following the three laws of TDD as described above. Pick one of those developers. It doesn’t matter who you pick, and it doesn’t matter when you pick them. A minute or so ago, all the code they were working on executed and passed all its tests.
Let me repeat that. A minute or so ago, everything worked. That’s an astonishing statement. What would programming be like if you were never more than a minute or so away from seeing everything work? How much debugging do you think you would do? The answer to that is: Not Much.
How much time do you spend chasing bugs? What if we could shrink that time by a factor of two or three (or ten!)? How much time would that save you? That savings alone might make it worth adopting TDD, but in fact there are better reasons.
Have you ever been significantly slowed down by bad code? Of course you have. Have you ever looked at a module and thought: “Somebody ought to clean this up!” Certainly. So why hasn’t anyone cleaned it up? Why don’t you clean it up? There’s one simple answer. Fear.
You know that if you try to clean up the code, you are liable to break it. And if you break it, it will become yours. So you back away from the idea of cleaning up the code. You don’t need the headache. You don’t need the risk. You are safer leaving the code in its messy state.
That decision is made over and over again, by every programmer on the team. Every time they encounter an opportunity to clean something up, they forgo the chance because the risk is too high. And so the code slowly degrades. Bit by bit it rots like a piece of bad meat. As the months and years go by the code becomes a tangled quagmire that slows everyone down and causes estimates to climb to the sky.
Now imagine you had a button you could push that would run thousands of tests over your code base in a minute or so. Imagine that these tests covered nearly every line and every branch in your code. Imagine that you trusted these tests. What would that do to your fear? In short, it would eliminate it. You could see a messy module and make one small change to clean it just a little bit. Then you could run the tests and see that they all pass. You could make another small cleanup, and run the tests again, and you could repeat this simple process until the module was much cleaner.
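As a small, hedged illustration of that process in Python, the sketch below shows a messy function, a cleaned-up version, and one set of checks that pins down the behaviour of both. The function names and the pricing rules are hypothetical, invented only for this example.

```python
def shipping_cost_messy(weight_kg, express):
    # The tangled original that nobody dared to touch.
    if express == True:
        if weight_kg > 10:
            return 20 + 15
        else:
            return 10 + 15
    else:
        if weight_kg > 10:
            return 20
        else:
            return 10


def shipping_cost(weight_kg, express):
    # The result of a few small cleanups, each followed by a test run.
    base = 20 if weight_kg > 10 else 10
    surcharge = 15 if express else 0
    return base + surcharge


def check(cost):
    # The same tests guard the behaviour before and after the cleanup.
    assert cost(5, False) == 10
    assert cost(5, True) == 25
    assert cost(10, False) == 10
    assert cost(11, False) == 20
    assert cost(11, True) == 35


check(shipping_cost_messy)
check(shipping_cost)
```

If a cleanup step had changed the behaviour, one of the assertions in check would have failed immediately, pointing at the change made a moment earlier.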
We put a lot of emphasis on good design because we want our software to be flexible and maintainable. That’s a good thing. But nothing makes software more flexible and maintainable than a good suite of tests –– by a huge order of magnitude. Good design is important, but good tests are much more effective. With good tests, you can fearlessly improve the design. Without those tests you are afraid to make any but the most necessary changes.
The elimination of fear, and the freedom that creates to clean and improve code may be the most powerful benefit of TDD. The tests provide the means by which code rot can be slowed, stopped, and reversed. And code rot is the single biggest reason for loss of productivity in software teams. When the code is bad, it’s hard to make progress.
Have you ever integrated a third party package? The vendor sends you a DVD with the code and documentation on it. The documentation is often a nicely formatted PDF written by a tech writer. At the end there’s an ugly appendix with all the code examples. Where’s the first place you go? Of course you go to the code examples, because that’s where the truth is. You don’t want to read what the tech writer wrote. You want to see the code. You understand code. Code is your language.
When we use TDD to write unit tests, each one of those tests is a code example for the system. These tests are small and focussed little snippets of code that describe how some small part of the system works. If you want to know how to create an object, there is a test that creates it every way it can be created. If you want to know how to call a function, there is a test that calls it every way it can be called. In short the tests are documents that describe the inner workings of the system. Those documents are written in a language you understand. They are utterly unambiguous, they are so formal that they execute, and they cannot get out of sync with the application. In short, they are the perfect kind of low level design document.
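To illustrate how tests can read as documentation, here is a minimal Python sketch. The Temperature class and its from_fahrenheit constructor are hypothetical; the point is only that each test shows one way the object can be created.

```python
class Temperature:
    """Hypothetical value object used only for this illustration."""

    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        return cls((fahrenheit - 32) * 5 / 9)


# Each test doubles as a usage example for one way of creating the object.
def test_create_from_celsius():
    assert Temperature(100).celsius == 100


def test_create_from_fahrenheit():
    assert Temperature.from_fahrenheit(212).celsius == 100


def test_freezing_point_converts_to_zero():
    assert Temperature.from_fahrenheit(32).celsius == 0


test_create_from_celsius()
test_create_from_fahrenheit()
test_freezing_point_converts_to_zero()
```

A reader who wants to know how to build a Temperature can read the test names and bodies, and because the tests execute, they fail as soon as they stop matching the code.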
And if that weren’t enough, what do you think you have to do to get every little bit of your code to be testable? The simple answer is you have to decouple it. The only way to test the inner workings of a function is to decouple those inner workings from the rest of the function. So the act of writing tests first drives you to a high degree of decoupling, which improves your designs.
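One common form this decoupling takes is sketched below in Python, under the assumption of a hypothetical greeting feature that depends on the time of day. The function names are invented for illustration. Writing the test first pushes the decision logic out of the code that reads the system clock.

```python
from datetime import datetime


def greeting_for(hour):
    # Pure decision logic, decoupled from the system clock.
    return "Good morning" if hour < 12 else "Good afternoon"


def greet(clock=datetime.now):
    # Thin wrapper: the only part that touches the real clock.
    # Accepting the clock as a parameter keeps it testable too.
    return greeting_for(clock().hour)


def test_morning_greeting():
    assert greeting_for(9) == "Good morning"


def test_afternoon_greeting():
    assert greeting_for(15) == "Good afternoon"


def test_greet_uses_the_supplied_clock():
    fixed_clock = lambda: datetime(2020, 1, 1, 9, 0)
    assert greet(fixed_clock) == "Good morning"


test_morning_greeting()
test_afternoon_greeting()
test_greet_uses_the_supplied_clock()
```

Had the logic been written first as a single function that called datetime.now() directly, the tests would only pass at certain times of day, and the pressure to fix that is what drives the split.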
The benefits go on and on. Those three simple little laws have a profoundly positive effect on the teams that adopt them, and the software they create.
What about writing tests after you write the code? Many programmers feel more comfortable with that approach. They don’t like the formalism and discipline involved with TDD. They don’t want to be caught in that tight one–minute cycle. They want the freedom to write their code first and then write unit tests after the fact.
This would be like accountants who enter all the credit transactions first and then decide to enter all the debit transactions last. Imagine trying to track the balance sheet! It goes out of balance as soon as you enter the first credit transaction and you hope it comes back into balance when you enter the final debit. What if it doesn’t? Which transaction had the error? No, it’s better to enter each transaction on both the credit and debit accounts and ensure that everything balances before you go to the next transaction.
Let’s go through the thought experiment again, but this time with the test–after approach. Is debug time shortened? No, because the code is not kept executing every minute or so. The time between error entry and error discovery grows, and debugging time is as usual. After all, writing unit tests after the fact is no more effective than manual testing after the fact. You still detect bugs written hours or days ago, and you still have to chase them down.
Will test–after produce a test suite that eliminates fear? Unlikely, because in order to eliminate fear you must trust that test suite. Without those three laws you have no real guarantee that the code is covered sufficiently. Tests, as an afterthought, are not likely to be as good as tests that drive the development. If you don’t trust the test suite, the fear never goes away.
Will test–after produce nice short little documents that describe the inner workings of the system? To some extent, yes, but again you need to trust your documentation. And tests as an afterthought are not likely to cover all the cases.
Will test–after force decoupling? Almost certainly not. Indeed, the author is more likely to forgo a test that forces him to decouple, rather than to refactor the module so that the test can be written.
Besides, test–after is an ad–hoc approach. It has none of the formalism, accepted nomenclature and procedures, and rigor of TDD. Test–after is really no different from business–as–usual. And, in most cases, test–after really means test–sometimes, or test–if–I–have–time, or test–never. Test–after is not a professional discipline.
I hope I’ve made my point. We, programmers, hold the fate of our companies in our hands. We manipulate life’s blood. That’s a huge responsibility. It is time for us to stop behaving recklessly with that responsibility and to start behaving like professionals. It’s time for us to adopt TDD as one of our core disciplines.
*These results come from a number of studies at various large companies like IBM, Microsoft, Sabre, and Symantec. Readers are encouraged to google for “TDD Case Studies” and read through the dozens of different reports.