Myth 0: Unit tests are unnecessary if we have integration tests.

One thing I’ve run into a few times is people who have the idea that unit testing is useless or unnecessary if you’re running integration tests. In some cases people even go so far as to say they’re unnecessary if you have functional testing. This is a mistake.

It is true that unit testing is unnecessary. In fact, testing period is rather optional and unnecessary. You can write code without doing any testing at all, package it up, and sell it. People might even buy it. If you’re really lucky it might even work some of the time. You’d be pretty stupid to try it, but speaking strictly of necessity, testing is not necessary.

The reason you test at all is because you want to ensure that you have developed a good product. There are all kinds of stages to testing though and even if you decide you will test, strictly speaking no single form of testing is necessary either. The reason you perform any given kind of test is because you want the benefits that it offers.

Here are the kinds of testing that most people engage in:

Acceptance Testing
Verifying that the product does what the customer wants. An acceptance test might be, “remains stable under extreme loads,” though this is rather vague (you’d want to negotiate with your customer what “stable” means and what “extreme load” means).
Functional Testing
Verifying that some aspect of the product does what you think it’s supposed to do. An example of this might be verifying that a user login form pops up when a user tries to access a page that requires authentication and they haven’t yet.
Integration Testing
Ensuring that objects work correctly when tied together. This is more of a development task in my opinion. An example might be testing that a user authentication library’s components work as a user authentication system when used together. Integration testing often encompasses an entire library or component but will often not encompass an entire product.
Unit Testing
Unit testing is a development task. It involves ensuring that specific functions and units within a component work the way they are supposed to.

Many people also differentiate between “regression testing” and normal testing. The only significance of regression testing is that it is about old behavior. Regression tests are simply tests that you wrote for previous releases. Strictly speaking it is unnecessary to run these tests; the only reason you do is to ensure that when you added new behavior and/or fixed bugs, you did not break old behavior or reintroduce bugs that you’d previously fixed (because of course you’re writing tests for all your bugs, right?).

As you can see, these classifications build up in layers. A unit test tests one single entity in a software component. Integration testing tests the integration of many software entities into a component. Functional testing tests that a set of components implements specific functionality. Acceptance tests verify that that functionality implements the expected product. Omit any one of these layers and you lose the benefits it offers: you’re simply not testing that level of your software product, and of course the argument against doing that is the argument for doing testing period.

Let’s consider an example to bring into focus what it is that unit testing offers that the higher layers do not. We don’t need to go all the way up right now because we want to examine what unit testing itself gives us, so we’ll assume that integration testing is being done. This example will use a user authentication system.

A user authentication system is going to have all sorts of entities that implement it. There may be a persistence entity or set of entities that makes sure user data is kept around for a while. There may be entities like “User”, “Role”, “Group”, “Credential”, etc… There should almost certainly be at least one encryption scheme for relaying passwords from the user to the system in order to ensure security.

An integration test of this component might be implemented by writing a scriptable program that logs in a user given a specific account name and password and then verifies that when a specific permission is requested that it is either given or denied. A complete set of these tests will very likely exercise every aspect of the code written to implement the behavior, just as a complete set of functional tests will exercise much or all of the behavior tested by this integration testing system.

Let’s say now that you are writing this component. If you’re not doing unit testing, then of course you are writing a significant amount of code before you have any idea how much of it works the way it should. You will not know if the permission checking system works until you’ve implemented the entire system and can run an integration test that tries to gain permissions. If you were writing unit tests, on the other hand, you could check that the permission checking system works as expected long before you have to create other things like persistence. In fact you might not even need real users, groups, or roles, because you can create fakes of these things to test the checking algorithm.
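As a sketch of that idea (all names here are hypothetical, not from any real system), a permission-checking unit can be exercised with fake users and roles long before any persistence layer exists:

```python
# Hypothetical permission-checking unit under test.
class PermissionChecker:
    def has_permission(self, user, permission):
        # A user has a permission if any of its roles grants it.
        return any(permission in role.permissions for role in user.roles)

# Fakes standing in for the real User/Role entities: no database needed.
class FakeRole:
    def __init__(self, permissions):
        self.permissions = set(permissions)

class FakeUser:
    def __init__(self, roles):
        self.roles = roles

checker = PermissionChecker()
admin = FakeUser([FakeRole({"read", "write"})])
guest = FakeUser([FakeRole({"read"})])

assert checker.has_permission(admin, "write")
assert not checker.has_permission(guest, "write")
```

The checking algorithm is fully tested here even though persistence, encryption, and the rest of the system don’t exist yet.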

Suppose you are working on this system: it’s already in place, it has integration tests but no unit tests, and you want to swap out the encryption algorithm for something more secure. There are two things here that may very well be missing:

  1. The design may not be well suited to this change.
  2. You have to go through an entire login process just to check that encryption and decryption work correctly.

These issues are actually related. Unit testing offers an inherently valuable service beyond simple testing: unit testing encourages a decoupled design because you are required to make your objects reusable in order to be testable as individual units. Integration tests, operating at a much higher level, do not require a well-thought-out design that allows dependency injection, overriding, and use of individual aspects, because all of the units that implement these things are available. Writing a unit test for one, solitary unit makes sure that you are keeping everything that is not directly related to the specific behavior implemented by that unit separate from other stuff.
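A minimal sketch of what that dependency injection looks like, with hypothetical names and a deliberately toy cipher: because the credential checker receives its encryption scheme through its constructor, a new scheme can be dropped in and tested alone, without touching the login flow.

```python
class ReverseCipher:
    """Toy 'encryption' used only to illustrate the seam; a real scheme
    would plug into the same two-method interface."""
    def encrypt(self, text):
        return text[::-1]

    def decrypt(self, text):
        return text[::-1]

class CredentialChecker:
    def __init__(self, cipher):
        self.cipher = cipher  # injected dependency, not hard-coded

    def matches(self, stored_ciphertext, password):
        return self.cipher.decrypt(stored_ciphertext) == password

cipher = ReverseCipher()
checker = CredentialChecker(cipher)

# The cipher is testable as a unit, with no login process anywhere in sight.
assert cipher.decrypt(cipher.encrypt("hunter2")) == "hunter2"
assert checker.matches(cipher.encrypt("hunter2"), "hunter2")
```

Swapping in a stronger algorithm then means writing one new class that satisfies the same interface, plus unit tests for it alone.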

A good design lets you plug and play things together to implement functionality rather than requiring you to track changes across multiple units. When your components become deeply coupled internally it becomes much harder to implement even the most basic changes without causing failures in ways you had no expectation of. Since testing units as units requires a decoupled design of those units, changing one becomes much less risky. What you gain here then by unit testing, and what is not offered at any other level, is a decoupled product with less rot that allows you to respond to change much quicker and cleaner.

The second part of the idea is that you can more quickly develop a single entity because you can test it more quickly. Not having to perform a large change set in order to verify the code you’re currently working on is a huge benefit to productivity. You can work on things that are not ready to integrate due to interface changes or even non-existence of pieces of a component and have some confidence that you are moving forward. Integrating and then debugging is a much slower method due not only to the fact that more must be accomplished before testing is possible, but also because the brain of the developer has to track a whole lot more that way and no matter how brilliant a developer is, there is an upper level to the amount of disparate things the human mind can be thinking about at the same time. When this barrier is broken a developer begins spending a lot more time rediscovering rather than holding the entire thing they are working on in memory. If we were talking about software performance we’d be talking about cache misses and page faults.

In short, if a developer can work on one, specific entity and test it, they can then forget utterly about the details of that entity and move on to another entity or to a higher level of abstraction. Lacking unit tests, and only being able to test once integration is possible, reduces this ability, especially if the design has grown coupled because the entity reuse that unit tests require was never there to actively discourage that coupling.

Furthermore, debugging itself is a larger task when faced with an entire integration or more, for the simple reason that there’s a lot more to debug. While the act of logging in a user, attempting to gain a particular permission, and checking whether you have it can fail in any number of ways, the act of encrypting or decrypting a password is a much smaller thing to debug. If the fault is in the encryption functionality, then debugging all the way down to that point to discover that a perfectly valid password failed encryption or decryption, and then working out why, is a much larger task than testing and debugging that behavior directly. If you fail more than once to implement your encryption scheme, but have to step down to it each and every time you check, or pass through a whole list of successes before the one failing point, then you are adding a whole lot of unnecessary work and time to what is already a complicated endeavor.

Finally, when you do integration testing, the combinatorial nature of the problem you are testing explodes quite rapidly. Testing an entire user authentication and permission system is several orders of magnitude larger a problem than testing an encryption scheme or account search function. This creates a problem similar to the developer having to track more information in order to accomplish something before he can test it. Exercising an entire component means you have to work each possible branch in the system. Are you so sure you can even accomplish this? Unit tests, on the other hand, being focused on one particular entity, have many fewer paths of execution to check. You can fully test 5 units much more easily than you can fully test a component built of those same 5 units. Remember that this is an exponentially growing problem; 3 branch points create 2^3 = 8 paths, not 2*3 = 6.
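That arithmetic is easy to verify for yourself; this little sketch just enumerates the combinations of independent two-way branches:

```python
from itertools import product

def path_count(branch_points):
    """Number of distinct paths through code with the given number of
    independent two-way branch points: each branch doubles the total."""
    return len(list(product([True, False], repeat=branch_points)))

assert path_count(3) == 8      # 2**3, not 2*3 = 6
assert path_count(10) == 1024  # ten branches already exceed a thousand paths
```

Five units of 3 branches each need 5 * 8 = 40 unit-test paths, while the combined component has up to 2**15 = 32768.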

Yes, this means that you are relying on the lower levels to do some of the testing at the higher levels. Testing every path of execution in an entire product is almost certainly outside of practicable reality; it would be akin to solving chess, which is said to require more computational states than there are atoms in the universe. This means that yes, sometimes bugs get through that are caused by the inter-operation of several entities or components, because you didn’t execute that exact path in the upper layer tests. When this happens, what you do is take the hard path and debug. When you find the reason for the failure, the component and/or unit that is not working the way it needs to, you write new tests at that layer to check the behavior.

Generally speaking a bug is either caused by an entity that is not obeying the contract it is supposed to, the misuse of an entity that is obeying its contract, or a misunderstanding about what the requirements of the entity were. Debugging then becomes an act of discovery about which of these issues you are facing, where, and how. Then you can discover what you missed in your unit test that failed to account for the complete interface of the entity, what caused your entity under test to misuse another interface, or what needs to change in the requirements for an entity in order to correctly perform the task that is required of it.

In conclusion, by not testing at the unit level, which is just about the lowest level of abstraction it is reasonable to test at, you miss a great many benefits offered by unit testing that are not offered by higher layers of testing. You increase the amount of work that must be done without a harness, thereby decreasing the safety of your code changes. You vastly increase the complexity that your testing must reach in order to fully test your product. You lose the forced encapsulation and decoupling created by the unique requirements unit testing imposes. Finally, in so doing you greatly reduce the short and long term value of your product and increase the amount of technical debt it will accumulate over time; you vastly decrease the maintainable lifetime of your product, that span of time in which the continued maintenance of a code-base costs less than throwing it away and starting over. In these respects, then, every developer, development team, and manager should consider unit testing absolutely necessary no matter how much other testing is being done.

What is unit testing?

One day I had a job interview over the phone.  The person on the other end asked me what I thought of unit testing.  I explained how I had, on my own, adopted the practice and tried to get my fellow team members to as well but had met with resistance.  I further explained how I tried to write them before coding, but lacking a rigorous sort of process or environment I sometimes got lazy…but I always regretted it later.  I told him how I thought doing the tests before the code was pretty important, made sure you were thinking ahead, and that when I actually did it I tended to write better code faster.  I also explained how even insisting on there being unit tests in projects I lead tended to cause developers to write them after the fact as just another thing they had to do rather than actually getting any use out of the practice.  I also explained that it was the first practice to go out the window when the time crunch happened because management didn’t understand or appreciate the practice even though it gained them so much.

The person on the other end agreed with me on everything, said he liked to write unit tests first, etc…  Later I was offered the job, I accepted the offer, and a couple of weeks later I was in the office working on this guy’s team.

There wasn’t a unit test to be found anywhere on a brand new codebase.  There weren’t even any automated integration tests.  One developer, desperate for something, had made a client program to connect with this service program, take keyboard input, and spit output to the console.  We would log into the development server, shut everything down, copy a bunch of files over, restart stuff, run the client program, copy a line out of a Word document we had personally made for testing, feed it in as input, manually look at the result and say, “Yeah, that makes sense,” and then log into the database to make sure it had done its thing there as well.  There were a couple of source files that had an “#ifdef UNIT_TEST” check that would generate a main function that would take input, call some functions, and put the output on the screen.  The architecture itself suffered as well and was primarily a large collection of globals that were manipulated by functions that could be in any module; it was essentially untestable for the most part, which is the number one reason why almost no unit tests existed, or so I thought.

As I continued my adjustment phase I came to know that this guy actually hated unit testing.  What I just described is exactly what he meant by that term, and what most of the world calls “unit testing” was new and strange to him.  There was a clear disconnect between what the two of us had been talking about during the interview process.

As we continued down this path for a few months, there came a time when we were told to implement a particular version of a scrum-like, lean-like process.  The company that was consulting for us gave us some training on their “unit tests”.  These were built around a large program they’d developed which would perform socket connections and communicate with services.  You’d program it with a scripting language and activate it with a directory-searching makefile system that would find these scripts and run them.  They would link libraries into executables that could be driven by this program, with the scripts consisting of “thin” layers on top of the library to make it drivable in this manner.  The engine had regex capabilities and could take the output received from network sockets and check that it matched specific patterns.  The same was done with the logging system.

Neither of these is a bad practice, and in fact the latter is actually a GOOD practice; it just isn’t unit testing.  I’ve been rather surprised how few people seem to know what unit testing is, what it gives them in return, and how to do it effectively.  Of course these are things I continue to work on myself, but the first stage that must be reached is knowing what unit tests are and are not.  So without further ado, here’s what they are not:

Unit tests are not interactive 

A unit test should not be getting input from the user nor providing output that is checked by one.  This includes the developer as a user.  A unit test should only be printing results, if it even does that much.  There is some utility in having intermediate data printed to the screen for when the test fails, especially if you’re using a crappy framework that can’t print the values of asserts but only the expression used in them, but you should be able to turn this off and you really should be using a debugger instead.

A unit test should be a program that runs without interaction, does its tests, checks the results, and expresses PASS or FAIL.  That is it.  The most useful way for it to express this result isn’t even on the screen in any way but in the return value to the operating system: 0 for pass, non-0 for fail.
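A minimal sketch of that shape, with a hypothetical unit and deliberately trivial checks:

```python
import sys

# Hypothetical unit under test.
def add(a, b):
    return a + b

def run_tests():
    """Run every check; return the number of failures."""
    failures = 0
    if add(2, 2) != 4:
        failures += 1
    if add(-1, 1) != 0:
        failures += 1
    return failures

# The only output that really matters is the process exit status:
# 0 for PASS, non-0 for FAIL, so shells and CI systems can act on the
# result without a human reading anything.  A real test binary would
# end with:
#     sys.exit(run_tests())
assert run_tests() == 0
```

No prompts, no eyeballing output: the operating system’s return value carries the verdict.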

Unit tests test a single unit

A test program that has to link to an entire library is not a unit test.  Sometimes it may be easier to do so, but this should certainly not be required by the framework you’re using.  This is especially true in languages that lack virtual dispatch, or hybrids like C++ where much of your work might be static.  In these cases it’s very important that you be able to override and mock behavior by linking with stub versions of functions and classes rather than the real thing.

A unit test tests a single, solitary unit.  In a large library, composed of many units, you should have many unit tests that compile and run as part of the process of building that library.  A library that implements a user, role, and security session, for example, is going to have several components in it, probably not limited to user, role, and session.  A unit test doesn’t check all of these together; it checks the behavior of only one of the units.  Generally you want to mock or stub out the rest to provide as much control and simplicity as possible.  This focuses the testing effort on the key piece, whereas actually using the rest of the library means that failures could be caused by units that you’re not intending to test.
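To sketch that with hypothetical names: here a Session unit is tested against a stub user store that the test controls completely, so a failure can only come from Session itself, not from the real user component or its persistence.

```python
# Hypothetical Session unit from a user/role/session library.
class Session:
    def __init__(self, user_store):
        self.user_store = user_store
        self.current_user = None

    def log_in(self, name, password):
        user = self.user_store.find(name)
        if user is not None and user["password"] == password:
            self.current_user = user
            return True
        return False

# Stub replacing the real user store: no database, no User unit,
# just canned data under the test's complete control.
class StubUserStore:
    def __init__(self, users):
        self.users = users

    def find(self, name):
        return self.users.get(name)

store = StubUserStore({"alice": {"password": "s3cret"}})

assert Session(store).log_in("alice", "s3cret")
assert not Session(store).log_in("alice", "wrong")
assert not Session(store).log_in("bob", "s3cret")
```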

Further, having a whole bunch of components that need to be used together in order to build and run a test means that when it comes time to change the behavior of the larger entity you’re working on, you will need to make changes across the board before being able to run your test.  This is much harder, more prone to error, and encourages hacking.  It adds to the decay of a project, further ingraining coupling rather than alleviating it.  It is much easier and much faster to perform your changes incrementally, testing as you go, and this can only be done if you don’t have to implement EVERYTHING before you test.  This is one of the main benefits of unit tests in my opinion, so if you have to link it all together to do your tests you’re severely missing the whole point.

Unit testing should be easy, or your lazy developers (like me) won’t be encouraged to do it and will do silly things like write the tests after the fact, or just throw something together that sort of looks like a test but doesn’t really test anything.  They’ll work to get the requirement done and that’s it.  Do it right, though, and your developers may actually come to realize that it’s much easier to work with unit tests than to write everything without a harness, that they’re faster and better that way, and go home less frustrated and suicidal or homicidal.

Unit tests are written in the language the unit is in

If you’re writing some complex, or even simple, program to drive another program by feeding it scripts, what you’re doing isn’t unit testing.  Although many problems can be solved by adding an extra layer of indirection, the use of the unit you’re testing should not be so encapsulated.  You want to be directly using this unit, feeding it parameters exactly how it’s normally done and checking the results exactly as they are returned.  The only way to do this is to use the language the code you’re testing is written in, so that you can call its functions directly.

If you are not writing in the language that your code is in, then there are numerous modes of failure that can result.  Your test can fail not because the unit is broken, but because all the crap you added to use it is broken.  Maybe your script program has a bug in it and is feeding your unit a bunch of bullshit.  Maybe the code you used to turn your unit into a program drivable by your scripting engine is full of bugs.  You just don’t know.  The whole purpose of a *unit* test is to focus on that single unit of code and check its functionality against expectations.  Anything extra is just added fluff that gets in the way.

In fact, you should be pulling stuff OUT of your test suite.  You should be mocking functions that your unit uses and injecting them into the system through virtual dispatch or linkage.  This way you’re simplifying the system even further to the point where JUST the behavior of the unit is under scrutiny.
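One small sketch of that kind of injection, with hypothetical names: a lockout rule that consults a clock gets the clock passed in as a plain function, so the test substitutes a mock clock instead of depending on when the test happens to run.

```python
# Hypothetical unit: account lockout logic that consults a clock.
def is_locked_out(failed_attempts, last_failure, now_fn, window=60):
    # Locked out after 3 or more failures within `window` seconds.
    return failed_attempts >= 3 and (now_fn() - last_failure) < window

# In production, now_fn would be time.time; the tests inject mock
# clocks so that ONLY the lockout rule itself is under scrutiny.
assert is_locked_out(3, last_failure=100, now_fn=lambda: 130)
assert not is_locked_out(3, last_failure=100, now_fn=lambda: 500)
assert not is_locked_out(1, last_failure=100, now_fn=lambda: 130)
```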

If you’re writing some gigantic engine that reads scripts and runs a program then what you’re doing is integration testing.  It can temporarily take the place of real unit testing in legacy systems that are too coupled to do it right, but should not form a permanent framework.

Unit tests are written first

I’ve very often broken this rule, but doing so is really a lazy practice that in the end only causes problems.  Unless you are writing your tests first, making them fail before writing the code, you don’t know if the code you’re testing is even BEING tested.  You thus don’t know if your tests are giving you anything at all.  One of the largest benefits of having unit tests is the ability to refactor implementation details while being fairly confident that you’re not breaking anything.  If the unit test checking the unit you’re altering is complete, then when you’re done and the test still passes, the rest of the system should behave exactly as it did before you started changing things.  This is very, very important to the practice of agile development.
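The fail-first cycle can be sketched in a few lines (names are made up for illustration): the test exists before the unit does, the stub proves the test actually fails, and only then does the real implementation make it pass.

```python
# Step 1: write the test first, against a unit that doesn't exist yet.
def test_slugify():
    assert slugify("Hello World") == "hello-world"

# Step 2: a stub.  Running the test now FAILS, which is the point:
# it proves the test really exercises the code.
def slugify(title):
    raise NotImplementedError

failed_first = False
try:
    test_slugify()
except NotImplementedError:
    failed_first = True
assert failed_first  # red: the test demonstrably tests something

# Step 3: the real implementation makes the very same test pass.
def slugify(title):
    return title.lower().replace(" ", "-")

test_slugify()  # green
```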

One thing I have done to try to alleviate this issue with post-coding unit tests is to use a code coverage tool.  This doesn’t really give an accurate picture either, though, because simply executing a block of code makes it “tested” according to the code coverage reporter, even if you haven’t checked anything that block has done.  It’s very difficult and time consuming to write tests post facto.

Furthermore, another good thing about unit test writing is the effect it has on the design of your code.  Having to make your code testable also makes it reusable, decoupled, and simpler.  These things are important for maintainability, of course, and when code starts getting all tangled up it also gets complicated and fundamentally unreusable…and thus untestable.  Writing code without a harness means you have to be very careful about accidentally introducing coupling or you’ll be tearing your hair out trying to write the tests.  This is probably the primary reason I’ve seriously regretted it when I got lazy and didn’t write my tests, or didn’t write them first.

Of course, if you’re not inclined to like the SOLID design principles to begin with, then you’re not likely to enjoy unit testing much.  Adopt unit testing, though, and many of these principles will simply force themselves upon you.

So those are some of the things I think unit tests are not, which I’ve nonetheless seen go by the term.