PSD: Duncan White's Practical Software Development Pages

Welcome to Duncan White's Practical Software Development (PSD) Pages.

I'm Duncan White, an experienced professional programmer. I have been programming for well over 30 years, mainly in C and Perl, although I know many other languages. In that time, despite my best intentions :-), I couldn't help learning a thing or two about the practical matters of designing, programming, testing, debugging, running projects etc. Back in 2007, I thought I'd start writing an occasional series of articles, book reviews, more general thoughts etc, all focussing on software development without all the guff.

See all my Practical Software Development (PSD) Pages

Testing and Development - the Siamese Twins of PSD.

Testing your programs - during or after development - has always been an important part of programming, but is sometimes regarded as dull. In particular, Regression Testing is a long-established principle, used on all large projects (awk, perl etc) to check that a recent change has not broken the system. Fred Brooks gave the classic definition of this (in The Mythical Man-Month, p 122):
Theoretically, after each fix one must run the entire batch of test cases previously run against the system, to ensure that it has not been damaged in an obscure way.
In practice, such regression testing must indeed approximate this theoretical idea, and is very costly.

However, in recent years, testing has become cool and sexy, largely due to the strong emphasis on testing in Extreme Programming (XP) and its spin-off Test Driven Development (TDD), with their associated concepts of Code Refactoring and automatic Unit Testing.

I regard the core XP/TDD Testing concept (sometimes referred to as code a little, test a little) as an extremely sound idea. This quote from extremeprogramming.org gives the idea:
XP emphasizes not just testing, but testing well.
Tests are automated and provide a safety net for programmers and customers alike.
Tests are created before the code is written, while the code is written, and after the code is written.
As bugs are found new tests are added.
A [tight] safety net .. is created. Bugs don't get through twice.

TDD goes even further, and uses test cases to drive development, which is a very interesting idea to think about. I think it's quite appealing - so later in this article, I'll give it a try!

Let's just consider for a moment - what is the purpose of testing? It's scaffolding - there to significantly increase our confidence in the code we write, although never to 100% (remember Dijkstra's observation: program testing can be used very effectively to show the presence of bugs but never to show their (complete) absence). We try to flush out as many bugs as possible as quickly as possible, prevent our changes reintroducing bugs, and find out if our changes in one area expose hidden bugs in another.

Principles of Testing
Based on the common ground between my experience, XP/TDD concepts and the 3 "PP" books I keep mentioning:

The Pragmatic Programmer (H&T) - see my Review here.

The Practice of Programming (K&P) - see my Review here.

Programming Pearls (Bentley) - see my Review here.

we can make a stab at the core principles of testing:

Develop and Test incrementally: As K&P say Testing should go hand in hand with program construction (p145), or as H&T say The earlier a bug is found, the cheaper it is to remedy. "Code a little, test a little" is a popular saying in the Smalltalk world, and we can adopt that mantra as our own by writing test code at the same time (or even before) we write the production code (p237). Start with simple tests, test the boundaries of what you've just written, and move on to more complex test cases as you write more complex bits of code.

Tussock-hopping is a particular incremental development technique I use all the time - it refers to making a series of small atomic changes, where each change makes a single logical addition and fixes the immediate consequences of that change (eg. if you add a parameter to a function, you must update all the calls to the function). I call it tussock-hopping by analogy with crossing a dangerous marsh by jumping from tussock to tussock and trying not to fall into the boggy bits. (BTW, I like walking in mountainous and often boggy areas; this analogy just came to me one day while trying to cross a boggy patch:-))

Each tussock-hop takes your code from working -> broken -> working. How do you know it's working again? By recompiling and testing it again! It's easy to get carried away and start making multiple unrelated changes all at once. Do this, and - trust me - one day you'll screw up badly, get very confused, and perhaps have to revert to your previous version and start again. (You are using version control, now, aren't you? No? Why the hell not? There are so many excellent version control tools out there, for example git or subversion).
Aside: I routinely apply the "tussock hopping" principle at a micro-level, in terms of structurally editing a single function - see this aside for more details of tussock hopping.

Test automatically: This is the core principle from which everything else flows, as in the K&P quote: It's tedious and unreliable to do much testing by hand; proper testing involves lots of tests, lots of inputs, and lots of comparison of outputs. Testing should therefore be done by programs, which don't get tired or careless. It's worth taking the time to write a script or trivial program that encapsulates all the tests, so a complete test suite can be run by (literally or figuratively) pushing a single button. (p149), and also the H&T quote: Most testing should be done automatically. It's important to note that by "automatically" we mean that the test results are interpreted automatically as well! (p246).
A related point that both K&P and H&T emphasize is that you can generate repetitive boilerplate code itself automatically, in a way that guarantees its correctness (eg. parser generators, custom tools). H&T discuss this in sections on simple code generators and the DRY (Don't Repeat Yourself) principle. Over the years, I've written many Programs that write Programs as H&T suggest. A future article will show examples of this.
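As a rough illustration of the "single button" idea, here is a minimal sketch of a test runner. It is written in Python purely as an assumption for illustration (the same shape works as a shell or Perl script), and the names run_test, run_suite and TESTS, and the two inline test programs, are all hypothetical. Each test runs in its own process, and the runner interprets the results itself by looking at the exit status and stderr, so no human has to read the output to decide pass or fail.

```python
import subprocess
import sys


def run_test(name, source):
    """Run one test program in a fresh interpreter; return (name, passed, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    # Exit status 0 means every assertion in the test program held.
    return name, result.returncode == 0, result.stderr.strip()


def run_suite(tests):
    """Run every test, print one line per test, and return the number of failures."""
    failures = 0
    for name, source in tests.items():
        name, passed, stderr = run_test(name, source)
        if passed:
            print(f"ok      {name}")
        else:
            failures += 1
            # Show only the last stderr line, which is usually the assertion message.
            last_line = stderr.splitlines()[-1] if stderr else "(no message)"
            print(f"FAILED  {name}: {last_line}")
    return failures


# Hypothetical test programs, inlined as strings to keep the sketch self-contained.
# In a real project these would be separate test programs on disk.
TESTS = {
    "addition": "assert 1 + 1 == 2",
    "broken": "assert 1 + 1 == 3, 'arithmetic is broken'",
}
```

Calling run_suite(TESTS) returns 1 here, because the deliberately broken test fails. A real runner would probably turn that count into its own exit status, so that it can itself be driven by a Makefile or a larger script.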

Test thoroughly - in fact, ruthlessly: as in H&T's quote: Most developers hate testing. They tend to test gently, subconsciously knowing where the code will break and avoiding the weak spots. Pragmatic Programmers are different. We are driven to find our bugs now, so we don't have to endure the shame of others finding our bugs later (p237). Similarly here's an extremeprogramming.org quote: XP emphasizes not just testing, but testing well and a K&P quote: It's important to test a program systematically (p145).
I would add - from teaching experience - that it's important to make your tests as clear as possible. Assume that every test you write will be run hundreds of times (in automated regression testing and unit testing), but even so, whenever a test fails, you'll need to look at the output to understand what went wrong. So, be kind to your future self - spend a few minutes making a test program produce simple and clear output. I've seen test output that is scruffy or even actively confusing, making sense to no one but the person who wrote it, and giving an impression of disorganised desperation. This is another of H&T's Broken Windows needing fixing.

Test the Interface, and the Boundary Conditions: What to test? Test the interface (thinking of the module as a black box), the properties that the interface guarantees (aka H&T's Design by Contract), and also test the boundary conditions. Also: think about the boundary conditions as you write code - as K&P say (p140) most bugs occur at boundaries. If a piece of code is going to fail, it will likely fail at a boundary. Thinking about boundaries can help you prevent - or at least spot and fix - many off-by-one errors before your program is ever run.
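To make the boundary idea concrete, here is a small sketch in Python (chosen only for illustration). The function find_index and its test are hypothetical examples, not taken from any of the books. Binary search is a classic home for off-by-one errors, so the test concentrates on the edges: empty input, a single element, the first and last positions, and values just outside the range.

```python
def find_index(sorted_items, target):
    """Binary search: return the index of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def test_boundaries():
    assert find_index([], 5) == -1        # empty input
    assert find_index([5], 5) == 0        # single element, present
    assert find_index([5], 4) == -1       # single element, absent
    data = [10, 20, 30, 40]
    assert find_index(data, 10) == 0      # first element
    assert find_index(data, 40) == 3      # last element
    assert find_index(data, 5) == -1      # below the smallest
    assert find_index(data, 45) == -1     # above the largest
    assert find_index(data, 25) == -1     # in a gap between elements
```

Note how few of these cases are "typical" inputs: almost all the value is at the edges.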

Program defensively: Design your functions with clear preconditions, postconditions, invariants and expected properties (Design by Contract again) and turn implicit assumptions into explicit assertions which abort the program as soon as an expected property no longer holds. Your assertions then form an essential part of the testing regime.
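A minimal sketch of what this can look like, again assuming Python for illustration and using a hypothetical function integer_sqrt: the precondition, a loop invariant and the postcondition are each written as an explicit assertion. In Python an assertion that fails raises AssertionError, which ends the program with a non-zero exit status unless something catches it. Be aware that running Python with the -O option strips assert statements, much as defining NDEBUG does for C's assert.

```python
def integer_sqrt(n):
    """Return the largest integer r such that r * r <= n."""
    # Precondition: the caller must supply a non-negative integer.
    assert isinstance(n, int) and n >= 0, f"integer_sqrt: bad argument {n!r}"
    r = 0
    while (r + 1) * (r + 1) <= n:
        # Loop invariant: r * r <= n holds at the start of every iteration.
        assert r * r <= n
        r += 1
    # Postcondition: r is the floor of the square root of n.
    assert r * r <= n < (r + 1) * (r + 1), f"integer_sqrt: wrong result {r} for {n}"
    return r
```

A call such as integer_sqrt(-1) now fails immediately at the precondition, rather than quietly returning a meaningless answer that causes trouble somewhere else later.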

Know what output to expect (K&P, p146): In order to automate testing, your test framework or your individual test cases need to know the correct output for each input, and flag up the errors. Don't rely on human eyeballs to detect test failures - that's not what Eric Raymond meant in his famous saying given enough eyeballs, all bugs are shallow.

But how does your test know the correct answer? There are only a few ways of generating this knowledge:

Manually embed an assertion for every checkable property into the test program, so that if your code operates correctly, the test program will run silently and exit with status 0 - success. But if some part of your code is broken, one of the assertions will fail, a message will be printed (to stderr) and the program will abort with a non-zero exit status. A simple framework script can then run all tests, retrieving the exit status and stderr for each one. This works well for single-input tests, i.e. where the input values are embedded in the program.

Manually prepare, for each input to each test program, a "correct output" file - then use this to check the generated output of the test program. Problem: this doesn't scale to lots of tests and lots of inputs, but it works well for a small number of early stage tests, and is particularly appropriate when testing ADTs. We'll see examples of this later.

Write an independent prototype implementation to test yours against, using simple but inefficient algorithms, and probably written in a high-level scripting language eg. perl. Problems: how do you know the prototype is correct, and do you really want to write two versions?

Test the current version of your code against the previous version, as in regression testing. This is one of K&P's main strategies, and they give a nice example of a framework script on p149. Problem: how to test while still writing the first version?

Autogenerate many such tests, and their correct answers: design a Bentley/H&T/K&P style little language and code-generate both the test and the correct answer from the little language instruction stream. This is a lovely idea which we'll end this article with!
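As one illustration of the options above, here is a minimal sketch of the independent prototype approach, assuming Python and two hypothetical functions: merge_sort stands in for the code under test, and prototype_sort is a deliberately naive selection sort that is slow but easy to believe. The random inputs come from a generator with a fixed seed, so that any failure can be reproduced exactly.

```python
import random


def merge_sort(items):
    """The code under test: a top-down merge sort returning a new list."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def prototype_sort(items):
    """The independent prototype: a simple but inefficient selection sort."""
    remaining, result = list(items), []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)
        result.append(smallest)
    return result


def compare_with_prototype(trials=200, seed=1):
    """Check merge_sort against prototype_sort on many small random inputs."""
    rng = random.Random(seed)  # fixed seed so that any failure is repeatable
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        expected = prototype_sort(data)
        actual = merge_sort(data)
        assert actual == expected, f"input {data}: got {actual}, expected {expected}"
    return trials
```

The small value range (-5 to 5) is deliberate: it forces plenty of duplicates and empty or one-element lists, which is where sorting code tends to go wrong. The problem noted above still stands: this only raises confidence as far as you trust the prototype.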

GUIs are difficult to test: Follow good modularity and decoupling design principles and make sure the GUI is a thin wrapper on top of a solid system functionality API. That allows you to thoroughly test the system without the GUI - and vice versa. It is often worthwhile to implement a minimal non-GUI user interface to the system, both for ease of testing and for power users who will often appreciate command line tools that can inject their data into pipelines and be scripted into convenient collections. This may involve designing a little language to encode what operations you want to apply to what operands.
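Here is a minimal sketch of that layering, assuming Python and an invented example system: the Inventory class and the three-command little language (add, remove, count) are hypothetical. All the real functionality lives in the API class. The command layer only parses lines and calls the API, which is exactly the role a GUI would play.

```python
class Inventory:
    """The system API: all real functionality lives here, with no user interface."""

    def __init__(self):
        self.stock = {}

    def add(self, item, count):
        self.stock[item] = self.stock.get(item, 0) + count

    def remove(self, item, count):
        if self.stock.get(item, 0) < count:
            raise ValueError(f"not enough {item}")
        self.stock[item] -= count

    def count(self, item):
        return self.stock.get(item, 0)


def run_commands(inventory, lines):
    """A thin non-GUI layer: one 'operation operand...' command per line."""
    output = []
    for line in lines:
        words = line.split()
        if not words:
            continue  # ignore blank lines
        op, args = words[0], words[1:]
        if op == "add" and len(args) == 2:
            inventory.add(args[0], int(args[1]))
        elif op == "remove" and len(args) == 2:
            inventory.remove(args[0], int(args[1]))
        elif op == "count" and len(args) == 1:
            output.append(f"{args[0]} {inventory.count(args[0])}")
        else:
            raise ValueError(f"bad command: {line!r}")
    return output
```

For example, run_commands(Inventory(), ["add bolt 10", "remove bolt 3", "count bolt"]) returns ["bolt 7"]. A real tool would read the lines from standard input, which is what makes it scriptable and usable in pipelines, and a test script is then just a file of commands plus the expected output.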

Different Kinds of Tests
Of course, there are many different types of tests; here are some of them:

Syntax checking after each small tussock-hop is the most obvious first test. Scripting languages (eg. perl) often have a syntax checker (eg. perl -cw), and for compiled languages, the compiler itself is a superb syntax checker. Set warnings to maximum, and fix all warnings immediately. Often, bugs lurk in precisely the unclear sections of code that compilers warn about. Ignoring warnings is simply idiotic: fix them (as in H&T's injunction to Fix Broken Windows).

An empty import test is the simplest possible module test harness: it imports the module under test, but does nothing. Syntax check/compile and link this test program, checking again for no linker warnings or errors. Run it and make sure it doesn't crash.

Use simple test harnesses to test the interface: call each individual operation (function), for a variety of inputs. Test the boundary conditions, and the error cases - in the latter case, the correct result is that the program aborts with a particular error message and exit status.

For individual functions with particularly complex algorithms, a combination of manual dry run techniques, single stepping with a debugger and adding print statements to the code can be used to convince yourself that the basic algorithm works, at least for a particular input value.

More complex tests can test that short sequences of operations compose together correctly, verifying all expected properties of each result at each stage - eg. mutually inverse operations (eg. addition and subtraction) that should cancel out. Use assertions to verify all these properties at each stage of the test, so the program aborts immediately if it departs from your expectations.
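A small sketch of such a test, assuming Python and a hypothetical minimal Stack ADT: push and pop are mutually inverse, so a push followed by a pop should leave the stack exactly as it was. The test asserts the expected properties after every step rather than only at the end, so a failure points at the operation that broke.

```python
class Stack:
    """A minimal stack ADT, used here only as something to test."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        assert self._items, "pop from empty stack"
        return self._items.pop()

    def size(self):
        return len(self._items)

    def contents(self):
        return list(self._items)  # a copy, so callers cannot alter the stack


def test_push_pop_cancel():
    stack = Stack()
    assert stack.size() == 0
    for value in (1, 2, 3):
        stack.push(value)
    before = stack.contents()

    stack.push(99)
    assert stack.size() == len(before) + 1   # push grows the stack by one
    assert stack.contents()[-1] == 99        # and the new value is on top

    assert stack.pop() == 99                 # pop returns what push added
    assert stack.contents() == before        # and restores the earlier state
```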

Test collections of modules using bottom-up testing: given a higher level module H that uses facilities from several lower-level modules L1, L2 and L3, obviously you'd unit test the lower-level modules first, then test module H assuming the correctness of L1..L3.

Try to decouple H's testing from its lower-level helper modules L1..L3 - by substituting highly simplified stub implementations of required functions in the lower modules. These stubs can hardcode desired answers, look answers up in a lookup table, or even ask you - the user - to provide the answer. Even that can be scripted as long as the answers are taken from standard input. Another good method is to test H against simple and inefficient versions of L1..L3 - perhaps the initial versions of them which you wrote, tested, and then kept for later testing!

Of course, should you change the module interface, you'll find yourself updating two parallel modules. Despite these clever tricks, there is often no realistic alternative to thorough bottom up unit testing.
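A minimal sketch of the lookup-table stub, assuming Python and invented names: order_total plays the part of the higher-level module H, and the price lookup it depends on stands in for a lower-level module. In this sketch the dependency is passed in as a parameter. In C you would more likely link the test harness against a stub object file instead, but the idea is the same.

```python
def order_total(order, price_of):
    """Module H: total cost of an order, given a price lookup function."""
    return sum(quantity * price_of(item) for item, quantity in order.items())


def make_table_stub(table):
    """Build a stub for the lower-level price lookup from a fixed table."""
    calls = []

    def stub(item):
        calls.append(item)   # record the call so that the test can inspect it
        return table[item]   # a KeyError here means H asked for something unexpected

    stub.calls = calls
    return stub


def test_order_total_with_stub():
    price_of = make_table_stub({"bolt": 3, "nut": 2})
    assert order_total({"bolt": 4, "nut": 5}, price_of) == 22   # 4*3 + 5*2
    assert sorted(price_of.calls) == ["bolt", "nut"]
    assert order_total({}, price_of) == 0                       # boundary: empty order
```

Because the stub's answers are fixed, any failure here must lie in order_total itself and not in the real price lookup, which is the whole point of the decoupling.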

Following K&P as quoted above (on Test automation) you can test whole programs - or modules and their harnesses - by writing simple scripts (shell/perl etc) which test a given program for a wide variety of inputs. Whole collections of programs forming a system can be tested - system testing. You can even gather non-functional information, eg. finding out memory use and runtime performance via benchmarking and profiling.

Finally, some subset of all the above tests (as long as they do not require human interaction) can then be grouped together via scripting frameworks to comprise your regression test suite, as in Fred Brooks' classic definition we saw earlier.

Enough Theory! Show me an example!
Ok, that's more than enough theory! In the Second Part, we'll develop a worked example of TDD.

d.white@imperial.ac.uk
Written: March-June 2013