PSD: Duncan White's Practical Software Development Pages

Welcome to Duncan White's Practical Software Development (PSD) Pages.

I'm Duncan White, an experienced and professional programmer, and have been programming for well over 20 years, mainly in C and Perl, although I know many other languages. In that time, despite my best intentions:-), I just can't help learning a thing or two about the practical matters of designing, programming, testing, debugging, running projects etc. Back in 2007, I thought I'd start writing an occasional series of articles, book reviews, more general thoughts etc, all focussing on software development without all the guff.

See all my Practical Software Development (PSD) Pages

Testing and Development part 3

In the First Part of this article I discussed testing your programs, an old field newly reinvigorated by the recent strong emphasis on testing in Extreme Programming (XP) and it's spin-off Test Driven Development (TDD)

In the Second Part of this article, we started working through an example of development and testing a stack of elements, in C, and developed a stack that could successively push a single integer onto a stack, leaving a non-empty depth 1 stack whose top value was the value we pushed.

Now, in part 3, let's continue developing our stack.

TDD Phase IV: Get Test3 Working - via a bug!
We now need our next test program - test3.c: the push 1, pop 1 test, whose body is:
   stack s = empty_stack();
   CHECK_BOOL( isempty(s), true );
   CHECK_INT( depth(s), 0 );

   push( 20, s );
   CHECK_BOOL( isempty(s), false );
   CHECK_INT( depth(s), 1 );
   CHECK_INT( top(s), 20 );

   CHECK_INT( pop(s), 20 );
   CHECK_BOOL( isempty(s), true );
   CHECK_INT( depth(s), 0 );

   destroy_stack(s);
i.e test2.c with the pop() test stanza added. As before, we add test3 to our Makefile (add test3 to the tests variable and add test3 compilation/linking rules). Download this version here, now make fails on the pop() stub:
    make
    gcc -Wall -g   -c -o test3.o test3.c
    gcc -o test3 test3.o stack.o
    ./checker test1 test2 test3
    test1 ok
    test2 ok
    test3.c:23 - pop(s): was -1, expected 20
    test3 failed
So, implement pop(), decrementing the number of elements:
    element pop(stack s)             /* pop the top element off stack */
    {
	assert( s->nel > 0 );        /* not empty */
	s->nel--;
	int n = s->data[s->nel];
	return n;
    }
Download this version here. Now that we've added pop(), we would expect the whole make test suite to work. But, contrary to our expectations, test3 is still failing, despite having implemented pop() it doesn't deliver the correct value. However, note that before we had implemented pop() it delivered the stub value -1, now it delivers 0:
    make
    gcc -Wall -g   -c -o stack.o stack.c
    gcc -o test1 test1.o stack.o
    gcc -o test2 test2.o stack.o
    gcc -o test3 test3.o stack.o
    ./checker test1 test2 test3
    test1 ok
    test2 ok
    test3.c:23 - pop(s): was 0, expected 20
    test3 failed
So, this would appear to be our first honest-to-goodness bug. You've probably already spotted the problem, but let's assume we can't see it, so we need to debug the problem. But how? Run ./test3 a few times to check it fails reliably - yes, it does, so at least the bug seems deterministic.

I find it always helps to ask myself - what value would I most like to know to help debugging this problem? We know that pop() returned 0 unexpectedly, so I'd like to be able to check the stack state at the end of push() and at the beginning of pop().
We have a choice of techniques - dry run the program on paper, add debugging print statements to the test program or stack module, or use a debugger and set breakpoints on entry to push() and pop(). Let's choose the debugger gdb. First, let's change MAXDEPTH to something very small - say 5 (edit stack.h and then make). Having changed the source code - rerun ./test3 to make sure the behaviour is unchanged. Then run gdb test3, set breakpoints in push() and pop(), run the program until it hits a breakpoint, then single step through each operation. Note that the statement gdb shows is the one that is about to run - not the statement that has just run:
    gdb test3
    Copyright 2004 Free Software Foundation, Inc.
    (gdb) b push
    Breakpoint 1 at 0x8048631: file stack.c, line 38.
    (gdb) b pop
    Breakpoint 2 at 0x804869d: file stack.c, line 51.
    (gdb) r
    Starting program: test3 

    Breakpoint 1, push (n=20, s=0x804a008) at stack.c:38
    38              assert( s->nel < MAXDEPTH ); /* check not full */
    (gdb) p *s
    $1 = {nel = 0, data = {0, 0, 0, 0, 0}}
    (gdb) s
    39              s->nel++;
    (gdb) s
    40              s->data[s->nel] = n;
    (gdb) p s->nel
    $3 = 1
    (gdb) s
    41      }
    (gdb) p *s
    $4 = {nel = 1, data = {0, 20, 0, 0, 0}}
Hmm.. at this point, we notice that 20 is stored in element 1 of the array, not element 0 - element 0 is never used!
This seems slightly suspicious - perhaps this is the bug - but let's carry on debugging a little further, onto the next breakpoint at least:
    (gdb) c
    Continuing.

    Breakpoint 2, pop (s=0x804a008) at stack.c:51
    51              assert( s->nel > 0 );    /* check not empty */
    (gdb) p *s
    $5 = {nel = 1, data = {0, 20, 0, 0, 0}}
This shows that the stack has not been corrupted between the end of push() and the start of pop(). Now let's see what pop() itself does:
    (gdb) s
    52              s->nel--;
    (gdb) s
    53              int n = s->data[s->nel];
    (gdb) p s->nel
    $6 = 0
    (gdb) s
    54              return n;
    (gdb) p n
    $7 = 0
    (gdb) q
Aha! There's the problem - push() stores the first stack element (20) in data[1] whereas pop() retrieves the value of the top stack element from data[0] which we initialized to 0 and never modified!
So, push() has one interpretation of the array - that the stack elements are in data[1..nel] - and pop() uses a different interpretation - the stack elements are in data[0..nel-1]. We need to decide which interpretation is correct, and either modify push() to use pop()'s interpretation, or vice versa.
Sometimes it can be a matter of judgement as to which interpretation is correct, which can be a bit tricky. Fortunately, this time, there's a definitive tie-break: looking back at our original stack type definition we see:
    element data[MAXDEPTH]; /* elements in data[0..nel-1]      */
According to this definition, the stack elements should have been in data[0..nel-1] all along. By that definition, and our intuitive feel for more idiomatic C code, let's conclude that pop() is using the array correctly, and push() is wrong. So, modify push() from:
    s->nel++;
    s->data[s->nel] = n;
to the following:
    s->data[s->nel] = n;
    s->nel++;
or, better yet, the following more idiomatic code:
    s->data[s->nel++] = n;
Download this version here. Secure in the knowledge of a bug-fix correctly diagnosed and fixed, we recompile and recheck it, expecting everything to work now, and find:
    ./checker test1 test2 test3 
    test1 ok
    test2.c:21 - top(s): was 0, expected 20
    test2 failed
    test3.c:21 - top(s): was 0, expected 20
    test3 failed
Somehow, we've broken top() - without even changing it! Oh no! Bugs multiplying out of control.. We're doomed, Captain Mannering, we're doooooomed!.. Ah, no, hang on - the top() body is:
    assert( s->nel > 0 );    /* check not empty */
    return s->data[s->nel];
This is only correct according to the old data[1..nel] interpretation. According to the new data[0..nel-1] interpretation, the top element is data[nel-1].
Change top()'s body to read:
    assert( s->nel > 0 );    /* check not empty */
    return s->data[s->nel-1];
By the way, the assertion convinces us that we're not going to underrun the lower bound of the array - we only compute data[nel-1] if nel>0, hence nel-1>=0.
Download this version here. Recompile and retest:
    make
    gcc -Wall -g   -c -o stack.o stack.c
    gcc -o test1 test1.o stack.o
    gcc -Wall -g   -c -o test2.o test2.c
    gcc -o test2 test2.o stack.o
    gcc -Wall -g   -c -o test3.o test3.c
    gcc -o test3 test3.o stack.o
    ./checker test1 test2 test3 
    test1 ok
    test2 ok
    test3 ok
Phew! For the first time, we have a working phase IV stack!
TDD Interregnum: Are we there yet?
Are we anywhere near finished? Well, let's see. Let's ask our black hat testing team (a la IBM's legendary black team) to look over our test-suite. Failing that, let's at least put on a slightly grey hat and think about the strength of our test suite ourselves! Hmm. total of three tests. Not good. Maximum of one element pushed onto the stack, and popped off again. We haven't even tested our original example use scenario of pushing two items on the stack and popping them! I don't think we should be too impressed yet!

A graphic way of expressing our lack of confidence in the thoroughness of our test suite is to turn it around: put on a black hat, and think - could we develop a stack implementation that passes our entire testsuite, while being (obviously) completely broken to human eyes? If so, this proves that our testsuite is wholly inadequate!
In this case, the answer's yes - we could easily develop a "joke stack" implementation that can push and pop a single value from a stack: First, take a safe copy of stack.c and stack.h using commands:
   cp stack.c real-stack.c
   cp stack.h real-stack.h
and then create a stack implementation based on the following principles:
    typedef struct {	      /* stack == malloc()d struct containing 0/1 elements */
	bool    empty;	      /* is the stack empty */
	element data;         /* joke stack: one element */
    } *stack;
and, as a sample of the operations:
    void push(element n, stack s)    /* push element n onto stack s */
    {
	s->data = n;
	s->empty = false;
    }

    element top(stack s)             /* look at the top element of stack */
    {
	assert( ! s->empty );    /* check not empty */
	return s->data;
    }
Hopefully, this makes the point pretty forcefully! It's just like UK Government targets in the NHS (the National Health Service for non-Brits) or schools - if you specify that you'll only measure X, Y and Z, and allocate resources based on how well people achieve X, Y and Z, then smart people will optimize delivery of X, Y and Z - and nothing else! In economics, this is called satisficing, another example is the urban legend about Henry T. Ford and the Model T Kingpin. Anyway, you can download the joke stack version here.

So, we're certainly not there yet. We need tests to distinguish a real stack from any joke stack implementations. This is one thing that concerns me about TDD - if the tests are actually driving development, it's arguable that the joke stack is a correct solution at this point. However, I hope that noone would ever write such a stupid implementation, even as a stepping stone - except to make a point.
Anyway, now we've seen the joke stack, let's immediately construct a test case to expose it's limitations. Enter test4 - the push 2, pop 2 test:
    #include <stdio.h>
    #include <stdlib.h>
    #include "bool.h"
    #include "check.h"
    #include "stack.h"

    int main( void )
    {
	stack s = empty_stack();
	CHECK_BOOL( isempty(s), true );
	CHECK_INT( depth(s), 0 );

	push( 20, s );
	CHECK_BOOL( isempty(s), false );
	CHECK_INT( depth(s), 1 );
	CHECK_INT( top(s), 20 );

	push( 30, s );
	CHECK_BOOL( isempty(s), false );
	CHECK_INT( depth(s), 2 );
	CHECK_INT( top(s), 30 );

	CHECK_INT( pop(s), 30 );
	CHECK_BOOL( isempty(s), false );
	CHECK_INT( depth(s), 1 );
	CHECK_INT( top(s), 20 );

	CHECK_INT( pop(s), 20 );
	CHECK_BOOL( isempty(s), true );
	CHECK_INT( depth(s), 0 );

	destroy_stack(s);
	exit(0);
    }
Having added the obvious extra Makefile rules to build and test test4 as well (download this version here) our test suite now exposes the limitations of the joke stack:
    test1 ok
    test2 ok
    test3 ok
    test4.c:25 - depth(s): was 1, expected 2
    test4 failed
Before going back to our proper stack implementation, note that test4.c above is getting very long and repetitive. Let's refactor two repeated sequences of tests to get:
    #define CHECK_ELEMENT CHECK_INT

    void checkempty( stack s )
    {
	CHECK_BOOL( isempty(s), true );
	CHECK_INT( depth(s), 0 );
    }

    void checknonempty( stack s, int expectd, element expectt )
    {
	CHECK_BOOL( isempty(s), false );
	CHECK_INT( depth(s), expectd );
	CHECK_ELEMENT( top(s), expectt );
    }

    int main( void )
    {
	stack s = empty_stack();     checkempty(s);
	push( 20, s );               checknonempty( s, 1, 20 );
	push( 30, s );               checknonempty( s, 2, 30 );
	CHECK_ELEMENT( pop(s), 30 ); checknonempty( s, 1, 20 );
	CHECK_ELEMENT( pop(s), 20 ); checkempty(s);
	destroy_stack(s);
	exit(0);
    }
Checking again, the results are (as they should be) unchanged - test4 is still exposing the joke stack's shallowness. Download this version here.
Ok, now let's replace the joke stack with our previous proper version of the stack (you did make sure not to destroy the previous proper version, didn't you?) and remake:
    cp stack.c joke-stack.c
    cp stack.h joke-stack.h
    cp real-stack.c stack.c
    cp real-stack.h stack.h
    make clean check
The results are the heartwarming:
    ./checker test1 test2 test3 test4
    test1 ok
    test2 ok
    test3 ok
    test4 ok
Download this version here.
As a final flourish in this section, if we now add the obvious test5 whose body is:
        stack s = empty_stack();     checkempty(s);
        push( 20, s );               checknonempty( s, 1, 20 );
        push( 30, s );               checknonempty( s, 2, 30 );
        push( 45, s );               checknonempty( s, 3, 45 );
        CHECK_ELEMENT( pop(s), 45 ); checknonempty( s, 2, 30 );
        CHECK_ELEMENT( pop(s), 30 ); checknonempty( s, 1, 20 );
        CHECK_ELEMENT( pop(s), 20 ); checkempty(s);
        destroy_stack(s);
        exit(0);
Having created test5.c and modified the Makefile as usual, compiling and running it again gives:
    ./checker test1 test2 test3 test4 test5
    test1 ok
    test2 ok
    test3 ok
    test4 ok
    test5 ok
You can download this version here.

At this point, we could obviously carry on generating many more test cases, test6, test7 and test8 etc, pushing and popping more items, possibly having several push/pop cycles in a single test. Manually inserting the correct check calls into these tests at the correct places now becomes an increasing chore.

Also, when would we stop testing? TDD doesn't seem to answer this!

In the Final Part of this article, we'll try a more sophisticated approach instead.
d.white@imperial.ac.uk
Back to part 2 of this article Back to PSD Top
Written: March-June 2013