PSD: Duncan White's Practical Software Development Pages

Welcome to Duncan White's Practical Software Development (PSD) Pages.

I'm Duncan White, an experienced and professional programmer, and have been programming for well over 30 years, mainly in C and Perl, although I know many other languages. In that time, despite my best intentions:-), I just can't help learning a thing or two about the practical matters of designing, programming, testing, debugging, running projects etc. Back in 2007, I thought I'd start writing an occasional series of articles, book reviews, more general thoughts etc, all focussing on software development without all the guff.

See all my Practical Software Development (PSD) Pages

Using features from one Language in a language without them:
case study, Object Oriented Programming (OOP) in ANSI C

An excellent general principle is to transfer cool features from one language into another. There are many examples of this. For instance, Perl teaches us the importance of hashes (aka dictionaries or maps: collections of key/value pairs indexed on the key). You can give yourself this ability in C by implementing a generic hash ADT (see below), in which every key is an arbitrary string and every value is a void * pointer, and then reusing it at every opportunity.
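To make the hash idea concrete, here is a minimal sketch of such an ADT: string keys, void * values, separate chaining. It is an illustration only, not the hash module referred to above; the names hash_create, hash_set, hash_get and hash_free, and the fixed bucket count, are assumptions made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64                 /* fixed bucket count, chosen arbitrarily */

typedef struct hash_entry {
    char              *key;         /* private copy of the key string */
    void              *value;       /* caller-owned value, never freed here */
    struct hash_entry *next;        /* next entry in the same bucket */
} hash_entry;

typedef struct hash {
    hash_entry *bucket[NBUCKETS];
} hash;

/* djb2-style string hash, reduced to a bucket index */
static unsigned hash_index(const char *key) {
    unsigned long h = 5381;
    while (*key != '\0') {
        h = h * 33 + (unsigned char)*key++;
    }
    return (unsigned)(h % NBUCKETS);
}

/* create an empty hash; aborts (via assert) if allocation fails */
hash *hash_create(void) {
    int i;
    hash *h = malloc(sizeof *h);
    assert(h != NULL);
    for (i = 0; i < NBUCKETS; i++) {
        h->bucket[i] = NULL;
    }
    return h;
}

/* set key -> value, replacing any existing value stored under key */
void hash_set(hash *h, const char *key, void *value) {
    unsigned i = hash_index(key);
    hash_entry *e;
    for (e = h->bucket[i]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;
            return;
        }
    }
    e = malloc(sizeof *e);
    assert(e != NULL);
    e->key = malloc(strlen(key) + 1);
    assert(e->key != NULL);
    strcpy(e->key, key);
    e->value = value;
    e->next = h->bucket[i];         /* push onto the front of the chain */
    h->bucket[i] = e;
}

/* look up key; returns NULL if the key is absent */
void *hash_get(const hash *h, const char *key) {
    const hash_entry *e;
    for (e = h->bucket[hash_index(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            return e->value;
        }
    }
    return NULL;
}

/* free the hash and its key copies; the values belong to the caller */
void hash_free(hash *h) {
    int i;
    for (i = 0; i < NBUCKETS; i++) {
        hash_entry *e = h->bucket[i];
        while (e != NULL) {
            hash_entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(h);
}
```

Because values are plain void * pointers, the same module can hold ints, structs or other hashes; the price is that the caller must remember what each pointer really points at.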

Let's look at another example: suppose (unlike me) you can't live without Object Oriented Programming (OOP) but have to build a large system in C - which does not natively support OOP. It's easy to say "use C++" or "use C#" or "use Objective-C" but what if we must use plain old ANSI C?

It's pretty easy (and an age old technique) to do OOP in C. But a surprisingly large number of people don't seem to know this technique (or any of its variants) and dispute that you can do OOP in C at all.

One problem with OOP is that everyone uses slightly variable terminology and definitions, and it is very hard even to define a common agreed set of components that every OO system implements. This isn't just my opinion: see c2.com/cgi/wiki?NobodyAgreesOnWhatOoIs (thanks to Clayton Weimer of LinkedIn for this reference).

In the first part of this article, I'll look at some of the important components of OOP, introduce some OOP pseudo-code, show the example we're going to translate to C, and tackle a few of the subtleties of OOP - especially subclass compatibility, aka subtype polymorphism, and the related concept of dynamic dispatch.

In the second part, I'll specify the simple form of OOP that I'm going to translate to C. Then I'll show how to translate a single class into C, leaving inheritance for later.

In the third and final part, I'll show how to translate single inheritance into C, translate the rest of the classes, and finally summarise the technique.

Attempting to define OOP (punch mist?)

Despite the difficulties, let's try to define OOP, from the Wikipedia OOP entry:
An object is an abstract data type with the addition of polymorphism and inheritance. ... An object has state (data) and behavior (code).

Unpacking those terms a little:

Abstract data types (ADTs)

An important case of modular programming: building a self-contained module implementing a particular data type (often a container type such as a list, stack, queue, or indeed a dictionary/map/hash as mentioned above) in terms of a series of operations (such as pushing a value onto a stack or popping the top value off a stack). Crucially, an ADT hides most of the implementation details from the future users of the ADT (i.e. abstracting away the critical properties - the operations - and information hiding all non-critical details). Modularity and ADTs were among the first main encapsulation and decoupling techniques widely used to structure large programs. They also enabled significant amounts of code reuse before OOP.

Inheritance

In most forms of OOP each object is created as a member of a specific class, and a new class can be defined as a specialised variety of one (or more) parent classes, inheriting all the parent classes' data and behaviours. Note that multiple inheritance is widely thought to be ill-defined, but more recently many commentators have started to criticise single inheritance as well. Personally I dislike all forms of inheritance more and more as time goes by!
An alternative to inheritance is the idea of roles, traits or interfaces, but that would take us too far afield. Google's Go language has a very unconventional and interesting approach - no inheritance (type hierarchy) at all. Instead you first define several unrelated classes (types with methods), and then much later you notice some commonality - that several unrelated classes all have a set of methods in common. You can then choose to define an interface that matches all classes with (say) an object.write(string) and an object.close() method. Next you can write a function that takes an element of such an interface type as a parameter; that function can now be called with an object of any class matching the interface. This gives OO polymorphism (aka dynamic dispatch, aka virtual methods), and a single class can satisfy any number of interfaces, designed long after the class has been built. Some call this duck typing, as in if it behaves like a duck, swims like a duck, quacks like a duck; maybe it's a duck?

Polymorphism

As used in OOP, this usually means subtype polymorphism (which I prefer to call subclass compatibility): a subclass object may be stored in a parent class variable, without forgetting that it is a member of the subclass. It's easier to explain this with a concrete example, so we'll come back to this shortly.
Separately, some OO languages (including Java and C++) use a second unrelated form of polymorphism: parametric polymorphism - aka generics or templates, the ability to define a type-parameterised class, for example List<T> where T represents any type, to be specified when declaring a variable, as in: List<Car> carlist; - which declares a variable called carlist, a list of cars. This form of polymorphism is absolutely not required as part of OO, it's an optional extra.

Behaviours or methods:

These are the OO equivalents of an ADT's operations, functions that have privileged access into the internals of an object's implementation, and which collectively enforce valuable properties of the ADT. Behaviour is probably more of a modelling term, whereas method is more of a programming term.

Another important aspect of OOP is discussed later in the Wikipedia page:

Dynamic dispatch

"When a method is invoked on an object, the object itself determines what code gets executed by looking up the method at run time in a table associated with the object"
i.e. this is the fundamental tendency of methods to be dynamic, "virtual" in C++ terms, and is perhaps the single most important capability that OOP adds (compared to non-OO ADTs).
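The "table associated with the object" can be sketched in a few lines of C. This is only an illustration of the lookup itself, under my own assumptions: the names Object, MethodTable, describe, new_base and new_derived are invented for this example, and the two "classes" have no data at all.

```c
#include <assert.h>
#include <stddef.h>

struct Object;                      /* forward declaration */

/* The per-class method table: one function pointer per method. */
typedef struct MethodTable {
    const char *(*describe)(const struct Object *self);
} MethodTable;

/* Every object carries a pointer to the table of the class that made it. */
typedef struct Object {
    const MethodTable *methods;
} Object;

/* Two implementations of the same method, one per (hypothetical) class. */
static const char *base_describe(const Object *self) {
    (void)self;                     /* unused: these classes have no data */
    return "Base";
}
static const char *derived_describe(const Object *self) {
    (void)self;
    return "Derived";
}

static const MethodTable base_methods    = { base_describe };
static const MethodTable derived_methods = { derived_describe };

/* Constructors: the only place where the class of an object is chosen. */
Object new_base(void) {
    Object o;
    o.methods = &base_methods;
    return o;
}
Object new_derived(void) {
    Object o;
    o.methods = &derived_methods;
    return o;
}

/* Invoking a method: look it up in the object's table at run time. */
const char *describe(const Object *obj) {
    assert(obj != NULL && obj->methods != NULL);
    return obj->methods->describe(obj);
}
```

The caller of describe() never names base_describe or derived_describe; which one runs is decided by the object it is handed.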

Some Example Classes
For concreteness, let's see an example. Suppose we wanted to model Vehicles and Cars moving around in a 2-D space, for some traffic management application.

We start with a simple Vector2 class that defines a single geometrical (X,Y) point or vector in 2-D space. What methods (operations) should a Vector2 have? It should know how to:

Find its own magnitude (distance from the origin).

Find the distance to any other point on the plane.

Perform a translation or addition: add a second Vector2 onto itself.

Scale itself by a constant factor.

Normalize itself (make its magnitude 1, i.e. form a unit vector).

Print itself out in a convenient textual display format. Why? Back in my first PSD article I argued that every ADT needs a display_as_str operation, and we saw above that objects are ADTs with extras.
Written in some vaguely C++-like OO pseudocode that's:
 class Vector2
 {
     double x, y default 0, 0;
     double method magnitude();              // find the magnitude
     double method distance_to( Vector2 p ); // distance to a point
     void   method add( Vector2 v );         // this += v
     void   method scale( double n );        // this *= n
     void   method normalize();              // make this a unit vector
     void   method print();                  // display this vector
 }
Vector2's methods might be implemented as:
 implementation class Vector2;

 double method magnitude() {                   // find the magnitude
     return sqrt(x*x+y*y);                     // (distance to origin)
 }
 double method distance_to( Vector2 p ) {      // distance to a point
     double a = x - p.x;
     double b = y - p.y;
     return sqrt(a*a+b*b);
 }
 void method add( Vector2 v ) {                // this += v
     x += v.x;
     y += v.y;
 }
 void method scale( double n ) {               // this *= n
     x *= n;
     y *= n;
 }
 void method normalize() {                     // make a unit vector
     double m = magnitude();
     scale( 1/m ) unless m == 0 || m == 1;
 }
 void method print() {                         // display this vector
     printf( "[%.3f,%.3f]", x, y );
 }
Note that I'm assuming that a class should be presented in two parts: definition and implementation. In this I'm following standard modular programming techniques, but not all OO languages do this - Java is the obvious example which requires a single part definition of a class. Even the original OO language, Simula-67, did not support two part implementation of classes, but it had a rather good excuse: the entire programmatic system was placed in a single file, because Simula had no separate compilation units (modules)! Observe that the two-part class convention will map neatly into C's standard "two-file" modular technique: a header file and matching C source file collectively implementing a module.
Next, we define a Vehicle class describing the common properties of all vehicles (such as current position, speed, carrying capacity and the ability to move around the plane):
 class Vehicle
 {
     string      name;                 // each vehicle has a name for clarity
     Vector2     pos;                  // where this vehicle is at present
     Vector2     dir;                  // what direction this vehicle is going
     double      speed default 0;      // speed of this vehicle in metres per second
     int         people_cur default 1; // how many people the vehicle is carrying
     int         people_max default 4; // how many people the vehicle can carry
     void method move(double T);       // move in the current direction for T seconds
     void method print();              // display this vehicle
 }
Note that we represent both positions and directions by Vector2's.
We might implement Vehicle's methods by:
 implementation class Vehicle;

 void method move(double T) {          // move in the current direction for T seconds
     dir.normalize unless dir.magnitude == 1;
     Vector2 delta = new Vector2( dir.x, dir.y );
     delta.scale( T * speed );
     pos.add( delta );
 }
 void method print() {                 // display this vehicle
     printf( "Vehicle( " );
     printf( "%s: pos:%OBJ, dir:%OBJ", name, pos, dir );
     printf( ", speed:%.3f, people: %d (max:%d)", speed, people_cur, people_max );
     // %OBJ is a made up printf extension meaning "print
     // the corresponding object via its print method"
     printf( " )" );
 }
Then define a Car as a subclass of a Vehicle, overriding existing methods - such as print() to describe itself as a Car - or adding new car-specific methods. Unfortunately, I can't think of any compelling Car-specific methods. Let's say every Car has a number of wheels, and a fuel tank with an amount of fuel and a maximum capacity for now:
 class Car subclassof Vehicle
 {
     int         wheels default 4;          // only cars have wheels(?)
     double      fuel_cur default 0;        // in litres
     double      fuel_max;                  // in litres
     void method addfuel( double fuel );    // add some fuel
 }
Implement Car's methods as:
 implementation class Car;

 void method addfuel( double fuel ) {       // add fuel litres to the fuel tank.
     assert fuel >= 0;
     assert fuel_cur >= 0 && fuel_cur <= fuel_max;
     fuel_cur += fuel;
     fuel_cur = fuel_max if fuel_cur > fuel_max;
 }
 void method print() {                      // redefine how to display this car
     printf( "Car( " );
     printf( "%s: pos:%OBJ, dir:%OBJ", name, pos, dir );
     printf( ", speed:%.3f, people: %d (max:%d)", speed, people_cur, people_max );
     printf( ", wheels:%d, fuel:%.3f (max %.3f)", wheels, fuel_cur, fuel_max );
     printf( " )" );
 }
Note that there is no single correct inheritance hierarchy when designing an OO system - for example should we have placed a WheeledVehicle class between a Vehicle and a Car, to express the common wheely properties of all vehicles with wheels? It's entirely your choice, as system designer.
Aside: the internals of Car's print() method look very similar to Vehicle's print() method, but with more key/value pairs printed on the last line. We're Repeating Ourselves (breaking the Pragmatic Programmer's Don't Repeat Yourself tip!). To solve this, we could refactor the print() methods and then method chain. This refactoring involves splitting out the key/value printing from Car's print() method into a helper method called innerprint(), and then doing exactly the same in Vehicle, giving:
 implementation class Vehicle;
 void method print() {                      // display this vehicle
     printf( "Vehicle( " );
     this.innerprint();
     printf( " )" );
 }
 void method innerprint() {                 // display the key/value innards of a vehicle
     printf( "%s: pos:%OBJ, dir:%OBJ", name, pos, dir );
     printf( ", speed:%.3f, people: %d (max:%d)", speed, people_cur, people_max );
     // %OBJ is a made up printf extension meaning "print
     // the corresponding object via its print method"
 }
 ...
 implementation class Car;
 void method print()                        // display this car
 {
     printf( "Car( " );
     this.innerprint();
     printf( " )" );
 }
 void method innerprint() {                 // display the key/value innards of a car
     printf( "%s: pos:%OBJ, dir:%OBJ", name, pos, dir );
     printf( ", speed:%.3f, people: %d (max:%d)", speed, people_cur, people_max );
     printf( ", wheels:%d, fuel:%.3f (max %.3f)", wheels, fuel_cur, fuel_max );
 }
Of course, we're still Repeating Ourselves as much as before - the Car innerprint() method contains (as the first two lines) the whole of Vehicle's innerprint() method. All we need is a method chaining syntax to allow Car's innerprint() to invoke Vehicle's innerprint() method on the Car. For example:
 implementation class Car;
 void method innerprint() {                 // display the key/value innards of a car
     this.Vehicle.innerprint();             // method chain: call Vehicle's innerprint()
     printf( ", wheels:%d, fuel:%.3f (max %.3f)", wheels, fuel_cur, fuel_max );
 }
While method chaining is well understood, extremely useful - many OO languages have it - and does reduce repetition (generally a Good Thing(TM)), for the purposes of this article let's leave method chaining as an optional feature. We'll discuss in part three how we might implement it. So, for now, let's do without innerprint() as that was mainly to enable the method chaining - we said this whole bullet point was just an aside:-)
Given these class definitions, we can write OO pseudocode using them, inventing convenient name/value (hash-like) syntax for constructor arguments, many of which are optional:
 Vehicle v1 = new Vehicle( name: "v1",             // at origin (default)
                           dir: new Vector2(0,1),  // north (y increasing)
                           speed: 1.3,
                           people_max: 6 );
 v1.print;

 Car c1     = new Car(     name: "c1",
                           pos: new Vector2(10,0),
                           dir: new Vector2(1,1),  // northeast
                           speed: 2,
                           people_max: 4,
                           people_cur: 2,
                           fuel_max: 100);
 c1.print;
If we could run our OO pseudocode, this should print:
 Vehicle( v1: pos:[0,0], dir:[0,1], speed:1.3, people:1 (max:6) )
 Car( c1: pos:[10,0], dir:[1,1], speed:2, people:2 (max:4), wheels:4, fuel:0 (max 100) )
Subclass Compatibility explained
Using our example classes, we can see and explain subclass compatibility (subtype polymorphism if you must) and dynamic dispatch in more detail:
A Car is a subclass (or specialisation) of a Vehicle, so every Car is also logically a Vehicle, but a Vehicle may or may not be a Car - so some Vehicles are Cars, some are not. In programming terms, that means that a Vehicle variable can store a Vehicle object (as above) or an object of any subclass of Vehicle, such as a Car. For example:
 Vehicle v2 = new Car(     name: "c2",
                           pos: new Vector2(20,0),
                           dir: new Vector2(1,0),  // east (x increasing)
                           speed: 6,
                           people_max: 3,
                           fuel_cur: 50,
                           fuel_max: 200);
(More scarily, a Vehicle variable can even store a future subclass of Car that we haven't written yet!).
In this case, v2 is storing a Car (the one named "c2"), and we have the weird situation that the object stored in v2 is really a Car (with all its extra Car-shaped attributes and methods), but v2 is a Vehicle variable. We say that the static type of v2 is Vehicle, while the dynamic type of v2 is Car. In most languages, static type == dynamic type, but in OOP this is often not true.
More formally, a subclass (like Car) is subclass compatible with an ancestral class (like Vehicle) - so we can store a Car into a Vehicle as above. However, the opposite is not true: an ancestral class (like Vehicle) is not subclass compatible with a subclass (like Car). So trying to assign:
 Car badcar = new Vehicle(...);
is a compile time error in most OO languages, because a Car is more specialised than a Vehicle. Counterfactual: if we allowed it, how should the Vehicle, once stored in badcar, respond to a Car-specific method like addfuel() which it doesn't have? Make it up? Crash horribly? There's no good answer - so it cannot be allowed.
So, what can we safely do with v2, this Car-Vehicle hybrid? Its static type primarily defines what it is and what it can do - it's a Vehicle. If we attempt to call a Car-specific method via v2, as in:
 v2.addfuel( 100 );
it must generate a compile-time error (because an OO language compiler cannot guarantee that a Vehicle variable would contain a Car under all circumstances, even though we can see intuitively that it's safe this time). Note that some dynamic OO languages, like Smalltalk, support something called extreme late binding that would allow any method to be called via any object, and would sort out at runtime what happened - with the fallback action being a fatal error.
Aside: an advanced OO compiler might generate code to check at run-time that v2 contains a Car-compatible object, aborting if the check fails, and allowing the addfuel call if the check succeeds. Simula-67 allowed you to write some code to explicitly allow this; from memory (it's been a while!) you wrote:
 (v2 qua Car).addfuel( 100 );
to express "I assert that v2 stores a Car, and want to treat (qualify) it as a Car right now"; but that's too advanced for us to worry about. I believe this is similar to a Type Assertion in Google's Go language.
However, under the hood the underlying object still knows that it's a Car, so any common Vehicle-and-Car methods we call, such as:
 v2.print;
will use dynamic dispatch to find which implementation of the print() method to call - depending on the dynamic type of the object stored in v2. So our "Vehicle" v2 will print out as a Car, which makes sense given that it is one!
 Car( c2: pos:[20,0], dir:[1,0], speed:6, people:1 (max:3), wheels:4, fuel:50 (max 200) )
Hence, dynamic dispatch is the key enabling technique that makes subclass compatibility work.
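The static type versus dynamic type distinction can be sketched in C by embedding a Vehicle as the first member of a Car and giving every object a pointer to its class's method table. This is a cut-down illustration under my own assumptions (only a few fields are kept, and names such as VehicleClass, vehicle_describe_any and vehicle_as_car are invented here), not the scheme the later parts of this article settle on. The vehicle_as_car() helper is a rough analogue of the "qua" check, except that it returns NULL rather than aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct Vehicle;

/* Per-class method table; each class supplies its own instance. */
typedef struct VehicleClass {
    void (*describe)(const struct Vehicle *self, char *buf, size_t len);
} VehicleClass;

typedef struct Vehicle {
    const VehicleClass *cls;        /* records the dynamic type */
    const char         *name;
    double              speed;
} Vehicle;

typedef struct Car {
    Vehicle base;                   /* must be first: a Car * is a valid Vehicle * */
    int     wheels;
    double  fuel_cur, fuel_max;
} Car;

static void vehicle_describe(const Vehicle *self, char *buf, size_t len) {
    snprintf(buf, len, "Vehicle( %s: speed:%.3f )", self->name, self->speed);
}

static void car_describe(const Vehicle *self, char *buf, size_t len) {
    const Car *car = (const Car *)self;   /* safe: only reached via car_class */
    snprintf(buf, len, "Car( %s: speed:%.3f, wheels:%d )",
             self->name, self->speed, car->wheels);
}

static const VehicleClass vehicle_class = { vehicle_describe };
static const VehicleClass car_class     = { car_describe };

void vehicle_init(Vehicle *v, const char *name, double speed) {
    v->cls = &vehicle_class;
    v->name = name;
    v->speed = speed;
}

void car_init(Car *c, const char *name, double speed, double fuel_max) {
    vehicle_init(&c->base, name, speed);
    c->base.cls = &car_class;       /* override the dynamic type */
    c->wheels = 4;
    c->fuel_cur = 0;
    c->fuel_max = fuel_max;
}

/* Dynamic dispatch: the code run depends on the object, not the pointer type. */
void vehicle_describe_any(const Vehicle *v, char *buf, size_t len) {
    v->cls->describe(v, buf, len);
}

/* Checked downcast: the Car if v really is one, else NULL.
   (An exact-class test only; a subclass of Car would need a parent chain.) */
Car *vehicle_as_car(Vehicle *v) {
    return v->cls == &car_class ? (Car *)v : NULL;
}

void car_addfuel(Car *c, double fuel) {
    assert(c != NULL && fuel >= 0);
    c->fuel_cur += fuel;
    if (c->fuel_cur > c->fuel_max) {
        c->fuel_cur = c->fuel_max;
    }
}
```

Here a Vehicle * plays the role of the Vehicle variable v2: calling vehicle_describe_any() through it produces the Car text when it points at a Car, while car_addfuel() is only reachable after an explicit vehicle_as_car() check.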
One more useful OO Feature: isa
Some OO languages give every object a method that enables you to check the dynamic type of an object directly at run-time. This is often called isa as in:
 assert v2.isa == "Car";
This is a useful OO feature - and easy to implement, so we will.
Aside: another thing we can use "isa" for, is to save having to redefine print() in Car. In our innerprint() refactored version, Vehicle and Car's print() methods were:
 implementation class Vehicle;
 void method print() {         // display this vehicle
     printf( "Vehicle( " );
     this.innerprint();
     printf( " )" );
 }
 ...
 implementation class Car;
 void method print()           // display this car
 {
     printf( "Car( " );
     this.innerprint();
     printf( " )" );
 }
Note that the pattern is exactly the same in both methods: first we print the classname, then we innerprint() the object wrapped in brackets. If we alter Vehicle's print() method to read:
 void method print()           // display this vehicle or whatever it is
 {
     printf( "%s( ", this.isa );
     this.innerprint();
     printf( " )" );
 }
Then we can delete Car's print() method entirely - it's no longer needed. The generalised print() method in Vehicle will now describe a Vehicle as a Vehicle, a Car as a Car, a Future-Subclass-of-Car as a Future-Subclass-of-Car etc.
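A small C sketch of this isa-based print, again under assumptions of my own: each class record stores its name alongside its innerprint function, and a single vehicle_print() serves every class. The names (VehicleClass, vehicle_isa, vehicle_isa_is, vehicle_print) and the reduced field set are made up for this illustration, and output goes into a caller-supplied buffer so that it is easy to check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct Vehicle;

/* Per-class data: the class name plus the one method that varies by class. */
typedef struct VehicleClass {
    const char *isa;
    void (*innerprint)(const struct Vehicle *self, char *buf, size_t len);
} VehicleClass;

typedef struct Vehicle {
    const VehicleClass *cls;
    const char         *name;
    double              speed;
} Vehicle;

typedef struct Car {
    Vehicle base;                   /* first member, so a Car * is a valid Vehicle * */
    int     wheels;
} Car;

static void vehicle_innerprint(const Vehicle *self, char *buf, size_t len) {
    snprintf(buf, len, "%s: speed:%.3f", self->name, self->speed);
}

static void car_innerprint(const Vehicle *self, char *buf, size_t len) {
    const Car *car = (const Car *)self;
    snprintf(buf, len, "%s: speed:%.3f, wheels:%d",
             self->name, self->speed, car->wheels);
}

static const VehicleClass vehicle_class = { "Vehicle", vehicle_innerprint };
static const VehicleClass car_class     = { "Car",     car_innerprint };

void vehicle_init(Vehicle *v, const char *name, double speed) {
    v->cls = &vehicle_class;
    v->name = name;
    v->speed = speed;
}

void car_init(Car *c, const char *name, double speed) {
    vehicle_init(&c->base, name, speed);
    c->base.cls = &car_class;
    c->wheels = 4;
}

/* isa: report the dynamic type of whatever v points at */
const char *vehicle_isa(const Vehicle *v) {
    assert(v != NULL);
    return v->cls->isa;
}

/* convenience test, in the spirit of: assert v2.isa == "Car" */
int vehicle_isa_is(const Vehicle *v, const char *classname) {
    return strcmp(vehicle_isa(v), classname) == 0;
}

/* The single print routine shared by Vehicle and all its subclasses. */
void vehicle_print(const Vehicle *v, char *buf, size_t len) {
    char inner[128];
    v->cls->innerprint(v, inner, sizeof inner);
    snprintf(buf, len, "%s( %s )", vehicle_isa(v), inner);
}
```

Car supplies no print routine of its own here; it only contributes a class name and an innerprint function, and vehicle_print() does the rest.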
That's enough for one page: follow onto the second part of this article, in which I will specify a particular simple but general variant of OOP, then show how to translate a single class into C, leaving inheritance for later.
d.white@imperial.ac.uk
Written: July 2014
Slightly revised: March 2017
