PSD: Duncan White's Practical Software Development Pages

Welcome to Duncan White's Practical Software Development (PSD) Pages.

I'm Duncan White, an experienced and professional programmer, and have been programming for well over 30 years, mainly in C and Perl, although I know many other languages. In that time, despite my best intentions:-), I just can't help learning a thing or two about the practical matters of designing, programming, testing, debugging, running projects etc. Back in 2007, I thought I'd start writing an occasional series of articles, book reviews, more general thoughts etc, all focussing on software development without all the guff.


Simulating OOP in ANSI C, part 2

In the first part of this article I discussed how to simulate Object Oriented Programming (OOP) in plain old ANSI C, starting with a discussion of OO terminology, an OO pseudo-code, and presenting a 3-class example system.

Now, here in the second part, I'm going to specify a particular simple but general variant of OOP, then I'll show how to translate a single class into C, leaving inheritance for later.

In the third and final part, I'll show how to translate single inheritance into C, translate the rest of the classes, and finally summarise the technique.

Our Chosen Variant of OOP
Now we have our example to implement, let's describe the particular variant of OOP that is - Goldilocks like - just right for our purposes: general enough to cope, but simple enough to allow us to focus on the important stuff. Here I've chosen to throw out all the C++ and Java junk in favour of the rather pure and simple OO model in Simula-67, the original OO language - even older than Smalltalk, often wrongly claimed to be the first:
Each object belongs to a single concrete class. No friend classes. No abstract classes or interfaces that can't be instantiated. No private classes, no nested classes, every class is (what C++ and Java programmers would call) public.

An object encapsulates instance attributes and methods (operations) that operate on them, giving a little package of "smart data". However..

All instance attributes are fully public, i.e. they can be read (and altered) from outside the object. No private/protected rubbish to control access. The programmer can choose not to directly access instance attributes from outside the object - if they want to - in order to give themselves more freedom in future to change what representation the object has (i.e. what attributes the class has and how they represent the object's state). Or they could decide that readonly access to attributes from outside is ok, but altering attributes from outside is bad. It's their choice.

A class is a form of module (a compilation unit/source file), split into two pieces - the class definition (an interface) and the class implementation. This corresponds nicely to how we build modules in many languages - even C's header file and C source file split.

Single-inheritance is the only form of inheritance supported.

No roles, traits or Java-style interfaces to fake up multiple inheritance by another route.

All instance methods are virtual, i.e. every call to every instance method uses dynamic dispatch to pick the most appropriate implementation (that matches what the object actually is, its dynamic type not its static type). No "final" or "static" methods that can't be overridden, or "final" classes that can't be subclassed.

No class attributes or class methods (apart from the constructor). Class attributes and methods could easily be implemented if desired; this omission is just for simplicity.

Object initialization can be chained, because it often helps to initialize a derived object first as if it were an instance of its parent class, and then modify it.

No method chaining: our idea is that OO's job is to choose (via dynamic dispatch) which single method function to call - it is not compulsory to allow a particular method in a derived class to "wrap" the parental/ancestral class's version of the same method function. Method chaining could be implemented relatively easily if desired; again, this omission is for simplicity.
There is a single way of creating an object - by a constructor call like:
  Car c = new Car_or_subclass_of_Car( parameters );
This is unlike C++, where (IMHO) there are far too many ways of creating an object (global variable, local variable on stack, pointer to object on the heap, pointer to object in local variable, pointer to object in global variable, reference to another object; did I miss any:-)?).
Such a constructor call is roughly equivalent to:
  Car *c     = new Car_or_subclass_of_Car( parameters );  // C++
  ref(Car) c = new Car_or_subclass_of_Car( parameters );  // Simula-67
As we'll be translating our "OO pseudocode" into ANSI C, and C doesn't have any form of automatic garbage collection, we're going to need a deconstructor method for each object, and we'll need to call it explicitly whenever we've finished with an object; otherwise we'll leak memory. That will be treated as a normal method call:
  object->dispose()
Of course, dispose() had better be the Very Last Method ever called on that object:-).
Objects behave like pointers, in that assigning an object to a variable, or passing an object to a function/method, does not clone the object, it passes the original object in all its read/write glory. If you want to be able to clone an object then you'd better write an explicit object->clone() method. There are a considerable number of problems associated with the question: how deeply should we clone?

No generics as in Java (aka templates in C++): these are not a core OO feature. In particular, they're an entirely different form of polymorphism from subtype polymorphism - i.e. the combination of dynamic dispatch and subclass compatibility.
Following our "isa" discussion in part 1, we will give every object an "isa" attribute which returns a string - the class name. Note that we could implement "isa" as a method which, when called, returns the class name as a string, but an attribute seems simpler. So, given our earlier Vehicle and Car variables, the following assertions should succeed:
 assert v1.isa == "Vehicle";
 assert c1.isa == "Car";
 assert v2.isa == "Car";
Most important, not everything has to be an object:

Making an integer an object is (IMHO) the first sign of zealots over-generalising an idea;-) An integer fits in a machine word, and is manipulated by builtin binary operators like +, -, * and /, which map onto single machine instructions. It's not an object; + is not a method that asks one numeric object to add another one onto it!

Similarly, insisting that the main function is a static public void main method in a class that is never instantiated (eg in Java, C# etc) is simply daft: main is a plain non-OO function, why wrap it up for goodness sake!

Finally, in my first PSD article I argued that every ADT needs a display_as_str operation, so we'll apply that rule to every class as well.
Right, so let's get started.
Implementing the Vector2 class in ANSI C
Let's start with our Vector2 class, and try to translate it into C, working out some general principles as we go. The obvious place to start is to consider how we might represent classes, objects and method calls in C (leaving inheritance until part 3).
Refreshing our memory - from part 1 of this article - Vector2's class interface was:
 class Vector2
 {
     double x, y default 0, 0;
     double method magnitude();              // find the magnitude
     double method distance_to( Vector2 p ); // distance to a point
     void   method add( Vector2 v );         // this += v
     void   method scale( double n );        // this *= n
     void   method normalize();              // make this a unit vector
     void   method print();                  // display this vector
 }
A class often seems like an enhanced form of a C struct (record in Pascal terms), so that seems like the place to start. Let's define a structure to represent our Vector2 class, with fields representing both the attributes (such as x and y) and the methods (some kind of function pointers). In this first attempt, let's delay the decision about which type each function pointer is, by assuming we have a placeholder type called FUNCPTR available. We'll expand this later:
 struct Vector2
 {
         char *   isa;                            // attributes
         double   x, y;
         FUNCPTR  magnitude;                      // methods
         FUNCPTR  distance_to;
         FUNCPTR  add;
         FUNCPTR  scale;
         FUNCPTR  normalize;
         FUNCPTR  print;
 };
Aside: a variant on this technique involves defining a separate struct type for the function pointers, usually called a vtable (virtual or vector table), and then embedding a vtable pointer into the above Vector2 structure instead. (This is roughly what most C++ implementations do, in fact). This has some advantages (most obviously, space: if you create many Vector2 objects, sharing a single vtable, this uses rather less space per object), and disadvantages (extra complexity, when should you allocate the Vector2 vtable, and when should you free it in order not to leak memory?). For our purposes, the separate vtable is unnecessary complexity, so we'll stick to directly embedded function pointers.
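To make the aside concrete, here is a minimal sketch of what the vtable variant might look like. The names Vec2, Vec2VT, vec2_vtable and new_Vec2 are hypothetical, chosen only for this illustration, and the sketch assumes a single statically allocated table per class, which is one way of sidestepping the "when do we allocate and free the vtable" question:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Vec2 *Vec2;              /* hypothetical cut-down class */

/* one table of method pointers, shared by every Vec2 object */
struct Vec2VT
{
        void (*add)( Vec2 this, Vec2 v );
        void (*scale)( Vec2 this, double n );
};

struct Vec2
{
        const struct Vec2VT *vt;        /* one pointer per object */
        double               x, y;
};

static void add( Vec2 this, Vec2 v )
{
        this->x += v->x;
        this->y += v->y;
}
static void scale( Vec2 this, double n )
{
        this->x *= n;
        this->y *= n;
}

/* static storage: never malloc'd, so never needs freeing */
static const struct Vec2VT vec2_vtable = { add, scale };

Vec2 new_Vec2( double x, double y )
{
        Vec2 v = malloc( sizeof(struct Vec2) );
        assert( v != NULL );
        v->vt = &vec2_vtable;
        v->x  = x;
        v->y  = y;
        return v;
}
```

A method call then becomes a->vt->add( a, b ): one extra indirection per call, in exchange for storing one pointer per object instead of one pointer per method.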
Now, given that we've decided all objects behave like pointers, let's decide that, in our C translation, every object is a pointer. So, the Vector2 type that we'll actually want to deal with most frequently is a struct Vector2 * (a pointer to a struct Vector2), so let's typedef that for our convenience:
 typedef struct Vector2 *Vector2;
This declaration defines a type Vector2 that is a pointer to a struct Vector2. Although this declaration may seem confusing, or even self-referential, C keeps the namespaces of struct XXX and XXX completely separate. It's my recommended C style - your mileage may vary, feel free to rename the structure as struct Something_Else, for example struct Vector2_struct.
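The namespace separation can be checked with a tiny sketch. The names Point and new_Point are hypothetical, used here only to keep the example independent of the Vector2 code:

```c
#include <assert.h>
#include <stdlib.h>

struct Point { double x, y; };          /* "Point" as a struct tag */
typedef struct Point *Point;            /* "Point" as an ordinary type name */

Point new_Point( double x, double y )
{
        /* sizeof(struct Point) is the whole record,  */
        /* whereas sizeof(Point) is only a pointer    */
        Point p = malloc( sizeof(struct Point) );
        assert( p != NULL );
        p->x = x;
        p->y = y;
        return p;
}
```

Note that this relies on C's rules: as far as I know a C++ compiler rejects this pair of declarations, because in C++ a struct tag is itself a type name.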
Now, let's think about methods. Refreshing our memory again, our methods (in our C++ like pseudo-code) were:
 implementation class Vector2;

 double method magnitude() {                   // find the magnitude
     return sqrt(x*x+y*y);                     // (distance to origin)
 }
 double method distance_to( Vector2 p ) {      // distance to a point
     double a = x - p.x;
     double b = y - p.y;
     return sqrt(a*a+b*b);
 }
 void method add( Vector2 v ) {                // this += v
     x += v.x;
     y += v.y;
 }
 void method scale( double n ) {               // this *= n
     x *= n;
     y *= n;
 }
 void method normalize() {                     // make a unit vector
     double m = magnitude();
     scale( 1/m ) unless m == 0 || m == 1;
 }
 void method print() {                         // display this vector
     printf( "[%.3f,%.3f]", x, y );
 }
A fundamental part of OOP is that a method call needs to know which object it is being called on. Consider our magnitude method (in OO pseudo-code):
 double method magnitude()
 {
     return sqrt(x*x+y*y);
 }
When magnitude() squares x - which x is that? Ditto for y.
At the OOP level, the answer is obvious: it's the x (or y) that lives "inside" the specific object that magnitude() is being invoked on, often known as "self" or "this" (as in "me, myself, this object"). If we like, we can make "this" more explicit, still in our OO pseudocode:
 double method magnitude()
 {
     double x = this.x;
     double y = this.y;
     return sqrt(x*x+y*y);
 }
So if we invoke the method as:
 v = new Vector2( 10, 20 );
 v.magnitude();
It is v.x (i.e. 10), and v.y (i.e. 20), that magnitude() is squaring and summing, before square rooting the sum. In OOP, this association of a method and its object happens automatically, as if by magic.
To implement it in C, we have to explain how the magic works, even if this gets us expelled from the Magic Circle. There is only one real way of doing this: make the implicit "this" parameter explicit, giving our C method function (and later, all C method functions) an extra parameter:
 // our first method implemented in ANSI C
 static double magnitude( Vector2 this )
 {
     double x = this->x;
     double y = this->y;
     return sqrt(x*x+y*y);
 }
Note that we have made our method "static" because methods should be private to the class - only accessible via method calls on an object. This is part of encapsulating the object, its attributes and its methods together. However, we will see a case later where this is unhelpful.
Let's see the rest of the methods translated into C, to convince us that any method can be translated in this way:
 static double distance_to( Vector2 this, Vector2 v ) {
         double a = this->x - v->x;
         double b = this->y - v->y;
         return sqrt(a*a+b*b);
 }
 static void add( Vector2 this, Vector2 v ) {
         this->x += v->x;
         this->y += v->y;
 }
 static void scale( Vector2 this, double n ) {
         this->x *= n;
         this->y *= n;
 }
 static void normalize( Vector2 this ) {
         double m = magnitude(this);          // m = this.magnitude()
         if( m != 0.0 && m != 1.0 )
         {
                 scale( this, 1.0/m );        // this.scale( 1.0/m );
         }
 }
 static void print( Vector2 this ) {
         printf( "[%.3f,%.3f]", this->x, this->y );
 }
One subtlety: one method may need to call another method (like the calls to magnitude() and scale() inside normalize()) - naturally enough we call these intra-class calls. For now, we can directly call the method functions (giving them their extra "this" parameter). Note that these are normal function calls - not proper virtual method calls. For now, this doesn't matter; we'll revisit this later when we consider inheritance.
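Without pre-empting the inheritance discussion, the difference between the two kinds of call can be seen in a small sketch. Everything here is hypothetical (the class Thing, its value slot, and the functions doubled_value, report_direct and report_dispatch); it only assumes that something may later store a different function in an object's method slot:

```c
#include <assert.h>

typedef struct Thing *Thing;            /* hypothetical one-method class */
struct Thing
{
        int   n;
        int (*value)( Thing this );     /* method slot */
};

static int value( Thing this )
{
        return this->n;
}

/* a replacement that might later be stored in the "value" slot */
static int doubled_value( Thing this )
{
        return 2 * this->n;
}

/* intra-class call made directly: always runs value() above */
int report_direct( Thing this )
{
        return value( this );
}

/* intra-class call made through the slot: runs whatever is stored there */
int report_dispatch( Thing this )
{
        return this->value( this );
}
```

While the slot still holds value, both functions return the same answer; once the slot holds doubled_value, only report_dispatch notices.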
Now that we have all our methods translated to static C functions, how do we "wire them up" into the appropriate slots in our struct? We need a constructor to allocate a chunk of memory and initialize all the fields. Later we'll also need a deconstructor to deallocate that chunk.
One way of writing our constructor is as follows:
 Vector2 new_Vector2( double x, double y ) {
         Vector2 v = (Vector2) malloc( sizeof(struct Vector2) );
         assert( v != NULL );
         v->isa          = "Vector2";
         v->x            = x==-1 ? 0 : x;        // implement default
         v->y            = y==-1 ? 0 : y;        // implement default
         v->magnitude    = &magnitude;
         v->distance_to  = &distance_to;
         v->add          = &add;
         v->scale        = &scale;
         v->normalize    = &normalize;
         v->print        = &print;
         return v;
 }
See how the initializations set every function pointer in the structure to &the_function_implementing_the_method, and also implement the "default" values of attributes. The idea is that a NULL pointer or a -1 numeric value in the parameter list means: use the default value for that field, if one was defined. In this particular case, using -1 to represent 0 seems mildly futile, given that we could simply pass 0 instead, but the intention is to have a general mechanism for implementing default values. Of course, supporting default values is a mere convenience, not a core part of OOP.
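One consequence of the sentinel approach is worth seeing in isolation. The sketch below pulls the default rule out into a hypothetical helper, with_default (not part of the vector2 module), purely to show what the expression does with each kind of input:

```c
#include <assert.h>

/* hypothetical helper: the "x==-1 ? 0 : x" rule with the default */
/* value made a parameter                                         */
double with_default( double given, double dflt )
{
        return given == -1 ? dflt : given;
}
```

So with_default( 7, 0 ) gives 7 and with_default( -1, 0 ) gives 0, which also means that a caller can never store a genuine -1 in that field: the price of reserving a sentinel value.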
However, having written the constructor in one piece above, it's good practice to split it into 3 pieces - this may seem like overkill now, but it'll pay dividends later:
First, the allocator which does the malloc:
 Vector2 alloc_Vector2( void ) {
         Vector2 v = (Vector2) malloc( sizeof(struct Vector2) );
         assert( v != NULL );
         return v;
 }
Second, the initializer that sets all the fields (and implements any default values):
 void init_Vector2( Vector2 v, double x, double y ) {
         v->isa          = "Vector2";
         v->x            = x==-1 ? 0 : x;        // implement default
         v->y            = y==-1 ? 0 : y;        // implement default
         v->magnitude    = &magnitude;
         v->distance_to  = &distance_to;
         v->add          = &add;
         v->scale        = &scale;
         v->normalize    = &normalize;
         v->print        = &print;
 }
Finally, here's our constructor:
 Vector2 new_Vector2( double x, double y ) {
         Vector2 v = alloc_Vector2();
         init_Vector2( v, x, y );
         return v;
 }
Having the initializer separate enables us to call it to reset an object's values - if we want to reuse an object completely.
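As a rough sketch of that reuse, here is a cut-down Vector2 with only the add method, and with the -1 default handling left out for brevity. The function reuse_example is hypothetical, written only for this illustration:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Vector2 *Vector2;
struct Vector2
{
        char  *isa;
        double x, y;
        void (*add)( Vector2 this, Vector2 v );
};

static void add( Vector2 this, Vector2 v )
{
        this->x += v->x;
        this->y += v->y;
}

Vector2 alloc_Vector2( void )
{
        Vector2 v = malloc( sizeof(struct Vector2) );
        assert( v != NULL );
        return v;
}

void init_Vector2( Vector2 v, double x, double y )
{
        v->isa = "Vector2";
        v->x   = x;
        v->y   = y;
        v->add = &add;
}

/* hypothetical demo: one malloc, two unrelated uses of the object */
double reuse_example( void )
{
        double  result;
        Vector2 v = alloc_Vector2();
        init_Vector2( v, 3, 4 );
        v->add( v, v );                 /* v is now [6,8] */
        init_Vector2( v, 1, 1 );        /* reset: same memory, fresh state */
        v->add( v, v );                 /* v is now [2,2] */
        result = v->x + v->y;
        free( v );
        return result;
}
```

The second init_Vector2 call overwrites every field, so nothing of the [6,8] state survives and no extra malloc or free is needed.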
We'll see the reason for having the allocator separate when we come to inheritance in part 3.
Given that our object allocator allocates memory, and memory allocation is our responsibility in C, we'll need a deconstructor to free the object when we're done. Although many OO languages also happen to implement automatic garbage collection, it's not a core part of OOP, it's merely a convenience - in C we'll have to do it manually. We could implement our deconstructor as a simple function taking a Vector2:
 void dispose_Vector2( Vector2 v ) {                    // deconstructor
         free(v);
 }
Or even, as it's so simple, make it a macro:
 #define dispose_Vector2(v) free(v)                    // deconstructor
But the most general way of implementing this is as an extra method: this (combined with dynamic dispatch) allows a future subclass - containing attributes that themselves need to be destroyed properly - to redefine the dispose() method. To make dispose() a method, we add a field to our structure:
 FUNCPTR  dispose;                       // deconstructor
Then write the method:
 static void dispose( Vector2 this ) {
         free( this );
 }
Then wire it in, in the initializer:
 v->dispose              = &dispose;
Note that dispose() is a special method, in that it must be the last method we ever call on each object: when called, it destroys "this" object.
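To see why a class might want its own dispose(), here is a sketch of a hypothetical class, Label, whose object owns a heap-allocated copy of a string. Neither Label nor new_Label appears in the vector2 module; they simply illustrate an attribute that itself needs destroying:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Label *Label;            /* hypothetical class */
struct Label
{
        char  *text;                    /* heap copy, owned by the object */
        void (*dispose)( Label this );  /* deconstructor */
};

static void dispose( Label this )
{
        free( this->text );             /* owned attribute first */
        free( this );                   /* then the object itself */
}

Label new_Label( const char *text )
{
        Label l = malloc( sizeof(struct Label) );
        assert( l != NULL );
        l->text = malloc( strlen(text) + 1 );
        assert( l->text != NULL );
        strcpy( l->text, text );
        l->dispose = &dispose;
        return l;
}
```

A plain free(l) would leak the text copy here, whereas l->dispose(l) releases both pieces. Setting l = NULL straight afterwards is a cheap way of making any later use of the dead object fail fast.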
Now, we're nearly there - but we're still using our FUNCPTR placeholders in our "struct Vector2". We must either define that type or replace each use of it with a different defined type.
In fact, no single definition for FUNCPTR is possible - because a C function pointer type includes all the parameter types and the return type. So instead, let's define a function pointer type for each method, remembering that every method, and hence method function signature, takes "this" first:
 typedef double (*Vector2_magnitude   )( Vector2 this );
 typedef double (*Vector2_distance_to )( Vector2 this, Vector2 other );
 typedef void   (*Vector2_add         )( Vector2 this, Vector2 other );
 typedef void   (*Vector2_normalize   )( Vector2 this );
 typedef void   (*Vector2_print       )( Vector2 this );
 typedef void   (*Vector2_scale       )( Vector2 this, double d );
 typedef void   (*Vector2_dispose     )( Vector2 this );
Note that several of these function pointer types have the same parameter and return type signature, e.g. Vector2_normalize, Vector2_print and Vector2_dispose are all pointers to functions mapping Vector2 -> void. If you prefer writing slightly fewer type definitions, you could define one function pointer type per distinct method signature. However, the above is clearer and simpler - and more machine-generatable.
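For comparison, the one-type-per-signature alternative might look like the following sketch, using a cut-down Vector2 with only print and dispose. The type name Vector2_void_method is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Vector2 *Vector2;

/* hypothetical: one type shared by every "Vector2 -> void" method */
typedef void (*Vector2_void_method)( Vector2 this );

struct Vector2
{
        double               x, y;
        Vector2_void_method  print;
        Vector2_void_method  dispose;
};

static void print( Vector2 this )
{
        printf( "[%.3f,%.3f]", this->x, this->y );
}
static void dispose( Vector2 this )
{
        free( this );
}

Vector2 new_Vector2( double x, double y )
{
        Vector2 v = malloc( sizeof(struct Vector2) );
        assert( v != NULL );
        v->x       = x;
        v->y       = y;
        v->print   = &print;
        v->dispose = &dispose;
        return v;
}
```

Either way the compiler sees the same function pointer type in both slots, so it cannot catch print being wired into the dispose slot by mistake; the per-method type names mainly help the human reader.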
Using those types, we rewrite our struct definition giving:
 struct Vector2
 {
         char *               isa;               // attributes
         double               x, y;
         Vector2_magnitude    magnitude;         // methods
         Vector2_distance_to  distance_to;
         Vector2_add          add;
         Vector2_scale        scale;
         Vector2_normalize    normalize;
         Vector2_print        print;
         Vector2_dispose      dispose;           // deconstructor
 };
As to where all this code goes, we'll end up with vector2.c and vector2.h comprising a standard C module. Let's show both files (they're in a tarball mentioned later for you to examine more closely):
vector2.h reads:
 typedef struct Vector2 *Vector2;                           // all objects are pointers

 // foreach method, declare a type definition to make life easier
 // note that every method, and hence method signature, takes "this" first.
 typedef double (*Vector2_magnitude   )( Vector2 this );
 typedef double (*Vector2_distance_to )( Vector2 this, Vector2 other );
 typedef void   (*Vector2_add         )( Vector2 this, Vector2 other );
 typedef void   (*Vector2_normalize   )( Vector2 this );
 typedef void   (*Vector2_print       )( Vector2 this );
 typedef void   (*Vector2_scale       )( Vector2 this, double d );
 typedef void   (*Vector2_dispose     )( Vector2 this );

 struct Vector2
 {
         char *               isa;                          // attributes
         double               x, y;
         Vector2_magnitude    magnitude;                    // methods
         Vector2_distance_to  distance_to;
         Vector2_add          add;
         Vector2_scale        scale;
         Vector2_normalize    normalize;
         Vector2_print        print;
         Vector2_dispose      dispose;                      // deconstructor
 };

 extern Vector2 new_Vector2( double x, double y );          // constructor
 extern Vector2 alloc_Vector2( void );                      // allocator
 extern void init_Vector2( Vector2 v, double x, double y ); // initializer

 // OO call helpers to add "implicit this object first param"
 #define OM0(o,m)      (*((o)->m))(o)
 #define OM1(o,m,p)    (*((o)->m))((o),(p))
vector2.c reads:
 #include <stdio.h>
 #include <stdlib.h>
 #include <assert.h>
 #include <math.h>
 
 #include "vector2.h"
 
 static double magnitude( Vector2 this )
 {
         double x = this->x;
         double y = this->y;
         return sqrt(x*x+y*y);
 }
 static double distance_to( Vector2 this, Vector2 v )
 {
         double a = this->x - v->x;
         double b = this->y - v->y;
         return sqrt(a*a+b*b);
 }
 static void add( Vector2 this, Vector2 v ) {
         this->x += v->x;
         this->y += v->y;
 }
 static void scale( Vector2 this, double n ) {
         this->x *= n;
         this->y *= n;
 }
 static void normalize( Vector2 this ) {
         double m = magnitude(this);         // m = magnitude()
         if( m != 0.0 && m != 1.0 )
         {
                 scale( this, 1.0/m);        // scale(1.0/m)
         }
 }
 static void print( Vector2 this )
 {
         printf( "[%.3f,%.3f]", this->x, this->y );
 }
 static void dispose( Vector2 this ) {
         free( this );
 }
 Vector2 alloc_Vector2( void ) {
         Vector2 v = (Vector2) malloc( sizeof(struct Vector2) );
         assert( v != NULL );
         return v;
 }
 void init_Vector2( Vector2 v, double x, double y ) {
         v->isa          = "Vector2";
         v->x            = x==-1 ? 0 : x;        // implement default
         v->y            = y==-1 ? 0 : y;        // implement default
         v->magnitude    = &magnitude;
         v->distance_to  = &distance_to;
         v->add          = &add;
         v->scale        = &scale;
         v->normalize    = &normalize;
         v->print        = &print;
         v->dispose      = &dispose;
 }
 Vector2 new_Vector2( double x, double y ) {
         Vector2 v = alloc_Vector2();
         init_Vector2( v, x, y );
         return v;
 }
Next, we'll need a test program, let's show it in our OO pseudocode first (using a made-up printf extension %OBJ to mean "print this object using its own print method", a good example of my argument that every ADT needs a display_as_str operation):
 Vector2 a  = new Vector2( 10, 0 );
 Vector2 b  = new Vector2( 0, 10 );

 double d = a.magnitude;
 printf( "magnitude of %OBJ is %g\n", a, d );

 d = a.distance_to(a);
 printf( "distance from %OBJ to itself is %g\n", a, d ); 

 d = a.distance_to(b);
 printf( "distance from %OBJ to %OBJ is %g\n", a, b, d ); 

 printf( "normalizing a: %OBJ ", a );
 a.normalize;
 d = a.magnitude;
 printf( " gives %OBJ with magnitude %g\n", a, d );

 printf( "a isa %s, b isa %s\n", a.isa, b.isa );
Our C translation simpletestvector2.c starts obviously enough:
 #include <stdio.h>
 #include <stdlib.h>
 #include "vector2.h"
 
 int main( void )
 {
         Vector2 a  = new_Vector2( 10, 0 );
         Vector2 b  = new_Vector2( 0, 10 );
         .......
Next, we have to work out how to translate a method call such as double d = a.magnitude into our C code: We must dereference the function pointer in the structure and call it, remembering to pass the object as an extra parameter:
         double d = (*(a->magnitude))(a);
Ok, this looks pretty horrible, but will do the job.
In fact, ANSI C does not require the explicit dereference: a function pointer can be called directly, and the compiler dereferences it for you. Using this syntax, we can write our magnitude() call as the much simpler:
         double d = a->magnitude(a);
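The different spellings of the call can be compared side by side in a small sketch. The class Counter, its next method and the function three_calls are all hypothetical, chosen so that each call has a visible effect:

```c
#include <assert.h>

typedef struct Counter *Counter;        /* hypothetical class */
struct Counter
{
        int   n;
        int (*next)( Counter this );
};

static int next( Counter this )
{
        return ++this->n;
}

/* call the same method three times, spelt three different ways */
int three_calls( Counter c )
{
        int a = (*(c->next))( c );      /* explicit dereference */
        int b = (*c->next)( c );        /* same, fewer brackets */
        int d = c->next( c );           /* implicit dereference */
        return a*100 + b*10 + d;
}
```

All three forms call the same function, so on a counter starting at 0 they return 1, 2 and 3 in turn.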
However, we must still remember to pass the "this" parameter to every method call, and it's very easy to forget to do this. To make this a fraction easier, I suggest defining "Object Method" helper macros as follows:
 // OO call helpers to add "implicit this object first param"
 #define OM0(o,m)        (*((o)->m))(o)
 #define OM1(o,m,p)      (*((o)->m))((o),(p))
 #define OM2(o,m,p,q)    (*((o)->m))((o),(p),(q))
Here, OM0 should be used when you want to write an object-method-with-no-parameters call, OM1 for method calls with a single parameter etc. For example:
         OM0(a,print);               // a.print;
         d = OM0(a,magnitude);       // d = a.magnitude;
         d = OM1(a,distance_to,b);   // d = a.distance_to(b);
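One caveat with these helpers: the object argument o appears twice in each macro body, so it is evaluated twice. The sketch below makes that visible using a hypothetical Item class and a hypothetical lookup() function that counts how often it is called:

```c
#include <assert.h>

#define OM0(o,m)        (*((o)->m))(o)

typedef struct Item *Item;              /* hypothetical class */
struct Item
{
        int   id;
        int (*get_id)( Item this );
};

static int get_id( Item this )
{
        return this->id;
}

static int lookups = 0;                 /* counts calls to lookup() */

/* hypothetical accessor with a visible side effect */
Item lookup( Item table, int i )
{
        lookups++;
        return &table[i];
}
```

Writing OM0( lookup(table,1), get_id ) expands to two lookup() calls, so lookups goes up by 2, not 1. When the object expression is anything other than a plain variable, it is safer to store it in a variable first and pass that to the macro.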
We can now translate the rest of our test program as (expanding our %OBJ pseudo-printf capability as we go):
         .......
         // d = a->magnitude() using OM0 (object-method-call) helper
         double d = OM0(a,magnitude);
 
         printf( "magnitude of " );
         OM0(a,print);
         printf( " is %g\n", d );

         // d = a->distance_to(a),
         //     using OM1 (object-method-1param-call) helper
         d = OM1(a,distance_to,a);

         printf( "distance from " );
         OM0(a,print);                       // a->print()
         printf( " to itself is %g\n", d );

         // d = a->distance_to(b), using OM1 helper
         d     = OM1(a,distance_to,b);

         printf( "distance from " );
         OM0(a,print);                       // a->print()
         printf( " to " );
         OM0(b,print);                       // b->print()
         printf( " is %g\n", d );

         printf( "normalizing a: " );
         OM0(a,print);
         OM0(a,normalize);
         d = OM0(a,magnitude);
         printf( " gives " );
         OM0(a,print);
         printf( " with magnitude %g\n", d );

         printf( "a isa %s, b isa %s\n", a->isa, b->isa );
Finally, we'll need to dispose our objects and return from main:
         OM0( b, dispose );
         OM0( a, dispose );
         return 0;
 }
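The test program only prints isa. If you want to check it programmatically, remember that the pseudo-code's isa == "Vector2" does not carry over to C, where == on two char * values compares addresses, not contents. A minimal sketch, using a cut-down struct and a hypothetical helper called isa_is:

```c
#include <assert.h>
#include <string.h>

typedef struct Vector2 *Vector2;
struct Vector2
{
        char  *isa;
        double x, y;
};

/* hypothetical helper: is this object's class name classname? */
int isa_is( Vector2 v, const char *classname )
{
        return strcmp( v->isa, classname ) == 0;
}
```

With this, assert( isa_is( a, "Vector2" ) ) is the C equivalent of the pseudo-code assertion assert a.isa == "Vector2".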
Now we really have finished implementing our Vector2 class. I've put this together into a neat collection of well-structured C files (comprising a Makefile, the vector2 module (.c and .h file), an expanded version of the simple test program shown above, as well as some unit tests) as a tarball, ready for you to download, extract and compile.
Having compiled everything via make, run the above simple test program via ./simpletestvector2 and you'll see the following output:
 magnitude of [10,0] is 10
 distance from [10,0] to itself is 0
 distance from [10,0] to [0,10] is 14.1421
 normalizing a: [10,0] gives [1,0] with magnitude 1
 scaling a: [1,0] by 5  gives [5,0] with magnitude 5
 adding b: [0,10] to a: [5,0] gives [5,10] with magnitude 11.1803
 a isa Vector2, b isa Vector2
I suggest that you take a few minutes downloading the tarball, extracting it, compiling the code, running the code, examining the code and its output, and convincing yourself that it really works.
Onto Part 3

Ok, now proceed to the third and final part, in which I show how to translate single inheritance into C, translate the rest of the classes, and summarise the technique.

d.white@imperial.ac.uk
Written: July 2014