The most important argument is a practical one: Test-First doesn’t work. I don’t care what you’ve heard. I don’t care how much your suit wants it. I don’t care how much the stress-spattered eyes of your coworkers gleam in their endorsement. It. Doesn’t. Work.
A big part of the problem is that there's very little basis for comparison. Most programmers learned the Test-First strategy in their early programming years and took it to be the more rigorous approach before they had any practical experience to judge it against. Most never looked back, dismissing everything else as a frivolous habit of their uneducated past. It doesn't help that this is how kids are being taught in the first place these days.
I was once one of these newly educated kids. So, for a few years, I worked exclusively with the Test-First strategy. When it was over, the results were undeniable. The code—all of it—was horrible. Class projects, research code, contract programs, indie games, everything I’d written during that time—it was all slow, hard to read, and so buggy I probably should have cried. It passed the tests, but not much else.
It was also the slowest code to write. I was astounded by how much slower it was: a factor of five, at least.
And we shipped it! All of the horrible stuff I’d written, just like every other cog in the machine, got packaged up into a “.jar” or a “.exe” or a “.pdf” or a “.app” and got sent off to customers. To buy.
A big part of the standard argument for TDD is that complaints like these are anecdotal. That's a valid concern, but it's not hard to look at the situation as an outsider.
How often have you seen a program crash? If it was developed by a large software company, chances are it was written using TDD. Clearly TDD is not a magic bullet, and it does not "prove your code works".
How long does development take? On occasion, I have singlehandedly written medium-sized, feature-complete programs in a single day. I regularly write entire modules (a few data structures and classes, some functions, assorted miscellany) in the same amount of time. That is a week or a month of work for a team of engineers using TDD. And I'd say my code is of equivalent or better reliability (indeed, without naming names, I have written custom software to replace TDD-built third-party code because the latter was too buggy). So TDD takes far longer to achieve the same result.
I invite the industry reader to verify these two observations for themselves.
If you'd like to know why I believe these things, or still aren't convinced, what follows are some more concrete reasons why TDD fails.
The largest problem is that TDD restricts the elegance of finished designs. No one can magick a perfect design into existence, no matter how many design sessions you do on paper. As the Jargon File aptly notes:
[E]xperience has shown repeatedly that good designs arise only from evolutionary, exploratory interaction between one (or at most a small handful of) exceptionally able designer(s) and an active user population—and that the first try at a big new idea is always wrong.
Anyone who has any experience with software knows this is true. It has been true from day one of computer programming. COBOL programmers knew it. So did the people coding the ENIAC; they iterated too. This is how laws get written. This is how engineering gets done. This is how every single thing humans build has a chance of working. To say otherwise is denialism; it's spitting in the face of millennia of experience. Designs are unicorns. You should court them, get to know them. Develop a relationship. People who want to design everything first are looking for a shotgun wedding to cover up a one-night stand. It's wrong.
So this is the first reason TDD fails: You’re trying to make a design before you learn anything about it.
What does this look like in software engineering?
If you’re working on your section of a module, it behooves you to constantly be changing your design around to make it perfect. Even in my “established” codebases, I still make occasional—nay, frequent—large-scale changes, since increased experience demonstrates faults in the original design, or new knowledge suggests a better one. This is healthy because it decouples current code from failed design decisions of the past.
Despite my codebase's fairly large scope, even wide-reaching changes can be made quickly, since there isn't a battery of tests to delete and rewrite for each project affected. I have restructured my core graphics library massively, on multiple occasions, as I learned new things and as the old designs' shortcomings became apparent. That I could do this by myself, to such a large volume of code, should underscore that the same could not be achieved using TDD. I can't even imagine how terrible my codebase would be if I had kept adding features without ever restructuring its original design.
The Test-First design strategy discourages these frequent changes by increasing the amount of work it takes to modify anything. If you want to make a new function, you have to also make three new functions to test it. If you want to change what a function does, you have to change all the tests you wrote for it. It gets intractable. Eris help you if you want to refactor a class—let alone a class hierarchy. That’s getting to be a full day’s worth of error-ridden, painstaking work for something that should have taken fifteen minutes at worst.
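To make the arithmetic concrete, here is a minimal sketch; the function and its tests are hypothetical, not from any real project. One small function drags a battery of tests behind it, and a one-line change to its contract invalidates every one of them:

```python
# A trivial function under Test-First: one function, three tests.
def parse_color(s: str) -> tuple[int, int, int]:
    """Parse an '#rrggbb' hex string into an (r, g, b) tuple."""
    s = s.lstrip('#')
    return (int(s[0:2], 16), int(s[2:4], 16), int(s[4:6], 16))

def test_parse_color_black():
    assert parse_color('#000000') == (0, 0, 0)

def test_parse_color_white():
    assert parse_color('#ffffff') == (255, 255, 255)

def test_parse_color_no_hash():
    assert parse_color('ff8000') == (255, 128, 0)

# Now the design improves: parse_color should return a Color object
# with an alpha channel instead of a bare tuple. That one-line change
# to the return type breaks all three tests above, plus every other
# test that ever compared against a tuple.
```

Multiply that by a class hierarchy and you get the full day of painstaking work.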
So maybe you write tests at a coarser granularity. This is almost worse, since it means you're writing code to test algorithms rather than implementations. That requires more careful thought, rewards partial success less, and actually means more churn, since algorithms change more often than backends do.
In practice, this leads to suboptimal designs. People write something one way, then are afraid to change it because they'd have to rewrite all the testing code that goes with it. These poor designs are often buggier, precisely because they don't approach the problem the right way! So tests get added to compatibility layers along with the algorithm. Then the cycle repeats: another boilerplate layer is merged haphazardly with the first, along with a whole new battery of tests. The problematic code stays as it is, growing more and more entrenched, until it finally has to be thrown out and rewritten anyway.
Software engineers using the test-driven process get used to this kind of thing—so much so that they don’t even realize it’s happening. It’s only the new hire with no credibility who realizes that having three layers of semantically-void indirection to do something simple is inefficient and idiotic.
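Before the real example, here's the shape of the pattern in miniature. This is a hypothetical sketch (the names are invented), showing how each "redesign" gets bolted on as a wrapper so the old tests stay green:

```python
# Layer 0: the original code, frozen because dozens of tests pin
# its exact behavior.
def _get_user_record_v1(user_id):
    return {"id": user_id, "name": "..."}  # imagine a real lookup here

# Layer 1: a "compatibility" wrapper added during the first redesign,
# so the v1 tests keep passing untouched.
class UserRecordAdapter:
    def fetch(self, user_id):
        return _get_user_record_v1(user_id)

# Layer 2: the second redesign wraps the first wrapper, with its own
# battery of tests asserting that it forwards correctly.
class UserService:
    def __init__(self):
        self._adapter = UserRecordAdapter()

    def get_user(self, user_id):
        return self._adapter.fetch(user_id)

# Three layers of semantically-void indirection; the actual work is
# still the one dictionary lookup at the bottom.
```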
As proof, here’s a real example from a TDD project I had the misfortune to be affiliated with. The authors have requested anonymity. Just stew in the long, gory horror:
At the start, function A is small and self-contained; its only guard is a bare assert(...). Except for a blip in early 2003, function A, now 152 lines long, is unchanged until mid-2006. Then the wrapping begins, culminating in the _Implementation of yet another new class. Function A now has two helpers, three layers and two classes of wrapper functions, and almost 50 test cases holding the thing together. The implementation spans six files and almost 6000 lines.

All this came to my attention in early 2010, when I called the project leader's attention to it. I questioned the choice of TDD, and he replied that it led to cleaner code. I then showed him an annotated SVN log, to which he took exception and said he would investigate. I stopped using the project soon after, and then most of the developers moved to a closed-source fork, so I don't know whether it ever got resolved. I frankly don't care.
You would be very hard-pressed to convince me that the example above, a ten-thousand-line monstrosity full of redundant code that no one understands, is more reliable than a refactored function A would have been. This is an extreme example, of course, but it's hard not to see lesser versions of the same evil in other TDD open-source projects. Pick your favorite and look.
I dare you.
The lesson here, again, is that the authors and maintainers were so afraid to change a bad design that they just plain didn't.
Many advocates claim TDD gives them more confidence that their code is right. As the above section demonstrates, perhaps this confidence is misplaced.
How confident are TDD developers? Very. In my experience, TDD too often produces code that works very nicely on the test cases and simply doesn't on real data. Frequently, my bug reports to such projects are met automatically with "no, we tested for something like that". I have even pointed out exactly where a bug is, only to have its existence denied. The arrogance TDD seems to cultivate, the notion that TDD code meets some higher standard of robustness, shields its developers from the plain fact that it is more often than not the opposite.
The problem is that TDD encourages programmers to write programs that fit the test cases, and not necessarily anything else. Ideally, if the tests are written "well", then any program that passes them is satisfactory. But in almost every case it is impossible to specify the desired output for every possible input, and even when it is possible, it is certainly not practical.
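Here's a toy illustration of the failure mode (hypothetical, but I've seen its exact shape in the wild): the suite is green, and the first piece of real data kills it:

```python
def average_latency(samples):
    """Mean of a list of latency samples, in milliseconds."""
    return sum(samples) / len(samples)

# The tests the author happened to write -- all passing:
def test_average_single():
    assert average_latency([10.0]) == 10.0

def test_average_pair():
    assert average_latency([10.0, 20.0]) == 15.0

# The first real input: a probe that recorded nothing that hour.
# average_latency([]) raises ZeroDivisionError -- a case no test
# specified, because nobody can enumerate every possible input.
```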
Moreover, the process itself encourages laziness. Writing the tests should be the most carefully focused part of TDD, but it's comparatively mindless next to writing new code. From personal experience, I know that writing tests is wearying; the attitude becomes "let's just get to the next test". That laziness carries over into the target code itself: after hours of writing tests, you cruft some implementation together and hope it holds.
It has become a running joke among software consumers how badly software development deadlines slip. Why does this happen?
It happens because the initial outlay for the project planned only for the cost of writing the code plus its tests. It didn't take into account the discovery that the code wouldn't actually work. I watched this play out on a real 12-week TDD project: the schedule that came out of the planning meeting budgeted every week for writing tests and implementation, and nothing else. What actually happened is that the fully tested code shipped with a bug the tests had never exercised.
In a larger environment, the quality control team would have caught the bug, and ordered management to extend the project.
After I gave up TDD, I started getting emails asking how I could possibly get anything done without it. How much can I get done without your TDD? Turns out: for the same quality... more than you.
I don't think that's because I'm special, though. I think it's because I don't use TDD.
My code is constructed loosely: at each stage of the process, I envision the key features of a design, and then build downwards and outwards. I check each function carefully, but I don’t let that distract me from visualizing how it all fits together. I put preliminary thought into my design, but by and large I improvise—and I do not fear radical changes halfway through. Moreover, I embrace them. If I think up a good architecture change, I make it immediately.
It is my belief that the complexity and dynamism of modern programs are so staggering that they can only be appreciated intuitively. Writing tests forces one to formalize that complexity, presupposing that a modern program can be reduced to two or three simple cases. Writing test cases for a program is, more often than not, like trying to describe the Mona Lisa by sketching it in crayon. The beautiful state and delicate control flow balanced so carefully in the programmer's mind aren't aided by the romp-tromp of test-driven boots; such things aren't reducible to a few lines of description in a test file.
Even if you somehow succeed, TDD prevents incremental drafts by effectively requiring all tests for a module to pass before you get any real results. It's test this, test that, and before you know it, a week has gone by and your algorithm isn't even half done; and when you do finish it, you find out it doesn't work. I'd have figured that out on day one, even if the implementation I used to find out were crap.
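That day-one discovery looks something like this. It's a sketch, and rough_compress stands in for whatever half-finished algorithm is on the bench:

```python
# Day one: a deliberately rough draft of the algorithm...
def rough_compress(data: bytes) -> bytes:
    # Placeholder logic: wrong constants, no edge cases, who cares.
    return data[::2]

# ...driven immediately against real data, end to end.
if __name__ == "__main__":
    with open("/var/log/syslog", "rb") as f:  # any big real file will do
        raw = f.read()
    out = rough_compress(raw)
    ratio = 100.0 * len(out) / len(raw) if raw else 0.0
    print(f"{len(raw)} -> {len(out)} bytes ({ratio:.1f}%)")
    # Ten minutes in, the numbers already say whether the whole
    # approach is worth pursuing. No test battery required.
```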
All the criticisms in my critique of test cases apply here as well, but with Test-Driven Development the problems are especially preventable. The real bottom line is the same: software development doesn't work at all if the people doing it are incompetent. No amount of unit testing or code review or whatever buzzphrase du jour comes next can guarantee that a program will work as intended. TDD is a band-aid over a larger problem: the failure to write good code in the first place.
There is literally no substitute for competence. If your coders don't have it, TDD won't fix that; if they do have it, TDD will undermine it. The Test-First strategy discourages careful thought by offering false security in the form of a passed test suite. It leads to broken code in broken designs, and it lets people feel proud of themselves anyway.
TDD hampers good design through code bloat and fragmentation. It arrogantly presupposes that designs can be built all at once, and it doesn’t even give better results. It should never be used.