Practical Software Development: TEFEL 2: Go-Style Interfaces in C, Part 2:

Welcome to Duncan White's Practical Software Development (PSD) Pages.

I'm Duncan White, an experienced and professional programmer, and have been programming for well over 30 years, mainly in C and Perl, although I know many other languages. In that time, despite my best intentions:-), I just can't help learning a thing or two about the practical matters of designing, programming, testing, debugging, running projects etc. Back in 2007, I thought I'd start writing an occasional series of articles, book reviews, more general thoughts etc, all focussing on software development without all the guff.


TEFEL #2: Another Example of a TEFEL tool: Go-style Interfaces in C, Part 2

Recap of Part 1:
In Part 1 of this article I explained how we might use Unix's libdl dynamic linking and introspection library to simulate a Go-style interface, in which you "bind" a package to an interface. The bind either fails (if the package is not compatible with that interface) or succeeds, giving you an interface value. That interface value may be passed around as a parameter, and calls "through it" to any of the interface functions can be made at any later time.
An example interface in our hypothetical TEFEL form is f12.interface:
  %func void f1( void );
  %func int f2( void );
The corresponding output is a plain C f12.[ch] module, which implements an f12 structure type:
  // void_void_f: a pointer to a void->void function
  typedef void (*f12_void_void_f)( void );

  // int_void_f: a pointer to a void->int function
  typedef int (*f12_int_void_f)( void );

  // This represents the "interface f12" at run-time.
  // It's a container of SLOTS for the f12 functions..
  typedef struct
  {
	f12_void_void_f   f1;
	f12_int_void_f    f2;
  } f12;
and an f12_bind() function. In the .h file this appears as:
  /*
   * f12 *in = f12_bind( char *module, char *errmsg );
   *	Attempt to "bind" lib<module>.so to the f12 interface:
   *	Load "lib<module>.so" into memory, and attempt to locate the
   *	required symbols f1 and f2 (or <module>_f1 and <module>_f2...)
   *	within its namespace.  For now, we just check for existence
   *	of those function symbols, later on we'll try to check the
   *	compatibility of the function signatures with the interface.
   *
   *	If we fail: strcpy an error message into errmsg and return NULL
   *	If we succeed: return a newly malloc()d f12 object with the
   *	slot function pointers bound to the corresponding functions in
   *	lib<module>.so (now in memory)
   */
  extern f12 *f12_bind( char *module, char *errmsg );
and in the .c file this appears as:
  #include <dlfcn.h>
  ..
  #include "lookup.h"

  /*
   * f12 *in = f12_bind( char *module, char *errmsg );
   *	[same comment]
   */
  f12 *f12_bind( char *module, char *errmsg )
  {
	char libname[1024];
	sprintf( libname, "lib%s.so", module );
	void *dl = dlopen( libname, RTLD_NOW );
	if( dl == NULL )
	{
		sprintf( errmsg, "f12_bind: dlopen of %s failed", libname );
		return NULL;
	}

	f12 *in = malloc(sizeof(*in));
	if( in == NULL )
	{
		strcpy( errmsg, "f12_bind: malloc() failed" );
		return NULL;
	}

	pkg_info   info;
	info.dl        = dl;
	info.module    = module;
	info.interface = "f12";
	info.libname   = libname;
	info.errmsg    = errmsg;

	in->f1 = (f12_void_void_f) lookup_function( &info, "f1" );
	if( in->f1 == NULL )
	{
		free(in);
		return NULL;
	}

	in->f2 = (f12_int_void_f) lookup_function( &info, "f2" );
	if( in->f2 == NULL )
	{
		free(in);
		return NULL;
	}

	return in;
}
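To make the recap concrete, here is a minimal sketch of what calling "through" an interface value looks like from the caller's side. To keep it self-contained it fills the slots by hand rather than via f12_bind() and libdl; the names demo_f1(), demo_f2(), demo_interface() and call_through() are made up for illustration and are not part of the Part 1 code.

```c
#include <stdio.h>

// The two function pointer types and the f12 structure, as in f12.h
typedef void (*f12_void_void_f)( void );
typedef int (*f12_int_void_f)( void );

typedef struct
{
	f12_void_void_f   f1;
	f12_int_void_f    f2;
} f12;

// Hypothetical stand-ins for a package's f1 and f2 functions
void demo_f1( void )
{
	printf( "demo::f1\n" );
}

int demo_f2( void )
{
	printf( "demo::f2, returning 42\n" );
	return 42;
}

// Hypothetical helper: fill the slots by hand (f12_bind() would do this
// via dlopen() and lookup_function() instead)
f12 demo_interface( void )
{
	f12 in;
	in.f1 = demo_f1;
	in.f2 = demo_f2;
	return in;
}

// A caller that only knows about the interface, not the package:
// call f1, then call f2 and return whatever f2 returned
int call_through( const f12 *in )
{
	in->f1();
	return in->f2();
}
```

A caller would write call_through( &in ) and never mention the package's own function names; with the real f12_bind(), the f12 * it returns would be passed along in the same way.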
So, in this second part, our goal is to actually build a TEFEL-style pre-processor to translate f12.interface into f12.[ch] as shown above. As in my previous TEFEL examples, I'll build this tool in Perl, my go-to language for string processing and code generation. Of course, you can build a TEFEL-style tool in whatever language you prefer; there's nothing forcing you to use Perl.
Our First Goal: Generate the .h file from the .interface file
Let's set our initial goal to build a Perl TEFEL processor called cint (as our input language can be described as C with Interfaces), which can parse our f12.interface file, and generate the corresponding f12.h file. We'll produce the corresponding f12.c file later.
One thing we observe in passing is that our .interface format appears to contain nothing but our %func declarations, so in this particular case we don't care so much about letting all unmarked lines through the TEFEL processor assuming they are valid C. However, let's assume that our interface can also contain comments, constants (defines) and perhaps some types, which we will simply copy unaltered into the generated .h file.
This time, in order to enhance reuse, let's split out most of the TEFEL support code (low-level parsing) into Support.pm. This code is virtually unchanged from Article 11 so should need little explanation. Support.pm starts:
  # Support: general utility functions..

  use strict;
  use warnings;
  use 5.010;
  use Data::Dumper;
  use Function::Parameters;

  my $infh;      	# fd of current C+Int file we're translating
  my $lineno;  	# current line no inside $infh
  my $currline;	# current line from file (set by nextline(), used in fatal())

  #
  # my $ok = openfile( $inputfilename );
  #	Open $inputfilename, as the source of lines via nextline().
  #	Return 1 if it opens ok, 0 otherwise.
  #
  fun openfile( $inputfilename )
  {
	$lineno = 0;
	open( $infh, '<', $inputfilename ) || return 0;
	return 1;
  }

  #
  # my $line = nextline();
  #	Read the next line from $infh, incrementing $lineno afterwards,
  #	and return it.
  #
  fun nextline()
  {
	my $line = <$infh>;
	$currline = $line;
	$lineno++;
	return $line;
  }
 
  #
  # fatal( $whatsleft, $msg );
  #	Given $whatsleft (a suffix of $currline) and a message $msg, print
  #	a standard-formatted fatal error and die.
  #
  fun fatal( $whatsleft, $msg )
  {
	$currline =~ s/^\t/        /;
	$whatsleft =~ s/^\t/        /;
	my $err = $currline;
	my $pos = length($currline) - length($whatsleft) - 1;
	$pos = 0 if $pos < 0;
	my $indent = ' ' x $pos;
	$err .= "$indent^ Error at line $lineno: $msg\n";
	die "\n$err\n";
  }
Support.pm then continues with the actual parsing routines needed to parse the %func lines. Note that we're reusing the exact same function syntax, and hence can reuse the same function parse routines that we wrote in Article 11. First, we'll parse the parameters:
  #
  # my( $ok, @params ) = parse_params( $str );
  #	Parse 'void' or a non-empty comma-separated list of "simpletype id"
  #	parameters from $str, which should not contain any suffix after the
  #	parameters, eg. ')' etc, which must already have been removed.
  #	A simpletype is an identifier followed by zero-or-more '*'s.
  #
  #	if parsing is successful, build an array of "simpletype id" strings,
  #	and return ( 1, array ),
  #	else return ( 0, errmsg )
  #
  fun parse_params( $str )
  {
	#print "debug: parsing params from $str\n";
	if( $str =~ /^void\s*$/ )
	{
		return ( 1 );		# empty array
	}
	my @result;
	# a param is something like int p, int *p, int ****p etc.
	while( $str =~ s/^(\w+)\s*([*]*)\s*(\w+)\s*,?\s*// )
	{
		push @result, "$1 $2$3";
	}
	if( $str )
	{
		my $n = @result;
		return (0, "params: junk <$str> after parsing $n params");
	}
	return (1, @result);
  }
Next, we'll parse the whole function line. Note that this code has changed slightly: in particular, we now pass in a checker callback function, which is invoked in order to check whether the function named in the function line has already been declared. Hence the voluminous comment block that is nearly as long as the code!
  #
  # my( $ok, $info ) = parse_func_line( $line, $checker );
  #	Given a line $line, that should be a C-like function defn in which
  #	the return type, and each of the parameter types, is a "Duncan's
  #	Simple Type" (ie. an identifier followed by zero or more '*'s),
  #	attempt to parse the whole line.
  #
  #	If the line parses successfully as a function definition, then the
  #	$checker coderef is called with the function name (to check that the
  #	function has not already been defined) - the checker coderef should
  #	return an error message if the function is already defined, or undef
  #	otherwise.
  #
  #	A successfully parsed, previously undefined, function
  #	causes us to return ( 1, function info hashref ),
  #	otherwise we return ( 0, errmsg )
  #
  #	The fields in a function info hashref are:
  #		FUNCNAME     => $funcname,
  #		RETURNTYPE   => $returntype,
  #		PARAMS	     => \@params,
  #		PARAMSTR     => $paramstr,
  #		ORIGLINE     => $origline,
  #	(PARAMSTR is the C-format parameter string, eg "char *x, int y")
  #
  fun parse_func_line( $line, $checker )
  {
	chomp $line;
	my $origline = $line;

	$line =~ s/^\s+//;
	my $returntype;
	if( $line =~ /^void\s+\w/ )
	{
		$returntype = "void";
		$line =~ s/^void\s+//;
	} elsif( $line =~ s/^(\w+)\s*([*]*)\s*// )
	{
		my( $t, $stars ) = ($1,$2);
		$returntype = $stars?"$t $stars":$t;
	} else
	{
		return( 0, "<void | simpletype> expected at <$line>" );
	}

	$line =~ s/^(\w+)\s*//;
	my $funcname = $1;

	# check whether $funcname is already defined?
	my $error = $checker->( $funcname );
	return( 0, $error ) if defined $error;

	$line =~ s/^\(\s*//;
	$line =~ s/\s*\)\s*;?$//;

	# line now contains only the parameters.
	my $params = $line;
	my( $ok, @params ) = parse_params( $params );
	return( 0, @params ) unless $ok;

	my $paramstr = @params ? join( ', ', @params ) : 'void';
	my $info = {
		FUNCNAME     => $funcname,
		RETURNTYPE   => $returntype,
		PARAMS	     => \@params,
		PARAMSTR     => $paramstr,
		ORIGLINE     => $origline,
	};
	#print Dumper $info;

	return (1, $info);
  }

  1;
Now that we have Support.pm, let's start to write cint, our TEFEL processor. cint starts:
  #!/usr/bin/perl
  #
  #	cint:	a prototype "C with interfaces" to C translator..
  #
  #		This tool contains an experimental "C with go-style interfaces"
  #		to C translator based on some thoughts I had about how we could
  #		implement Go-style interfaces to packages in C.  It's a side
  #		effect of a LinkedIn discussion that I started in the Plain
  #		Ordinary C group, to discuss my TEFEL idea - Nigel Evans
  #		suggested some form of lightweight OO for C, on the lines of
  #		JavaScript's prototype-based model, and I said "or what
  #		about Go-style interfaces.."  and then started thinking:-)
  #
  #		This is the result.  It translates a single F.interface
  #		"C+Interfaces" source file to the corresponding F.h
  #		file implementing that interface - the F.c will be produced
  #		by a later version of this tool
  #
  #	(C) August 2018, Duncan C. White
  #
  
  use strict;
  use warnings;
  use 5.010;
  use Data::Dumper;
  use Getopt::Long;
  use Function::Parameters;
  use FindBin qw($Bin);
  
  use lib "$Bin";
  use Support;
  use Sig;
Note here that we are using Perl's FindBin module in order to write Position Independent Code: it allows Perl to find our Support.pm and Sig.pm modules in the same directory as cint itself. This allows you to run /absolute/path/of/cint and have it find Support.pm in /absolute/path/of/Support.pm.
Next, cint will need various global data structures to represent the current interface we are translating to C:
  my $interface;	# name of current interface
  my %isfunc;		# set of all marked functions in the interface
  my @func;		# the marked functions in the order we saw them
  my @structfield;	# the structure fields
  my %seensig;		# the set of signatures we've already seen
  my %funcsig;		# function -> signature mapping
(Where these are not obvious, for example what we mean by a "signature", they will be explained later.)
Next, let's look at cint's main code which reads the interface file:
  die "Usage: cint filename\n" unless @ARGV == 1;
  
  my $inputfilename = shift;
  $interface = $inputfilename;
  $interface =~ s/\.interface$//;
  my $cfilename = "$interface.c";
  my $hfilename = "$interface.h";
  
  openfile( $inputfilename ) || die "cint: can't open $inputfilename\n";
  
  unlink( $cfilename );
  unlink( $hfilename );
Then, cint parses the input, handling each line - translating each %func declaration into a valid C prototype, and accumulating the desired .h file contents in the string variable $htext:
  my $htext = "";		# generated .h file contents
  while( defined( $_ = nextline() ) )
  {
  	$htext .= handle_line( $_ );
  }
  
  # TODO: make the struct definition for the interface,
  # and make the bind heading for the interface..

  # build the .h file
  open( my $hfh, '>', $hfilename ) || die "cint: can't create $hfilename\n";
  print $hfh $htext;
  close( $hfh );
Next, of course we'll need to write handle_line(), very similar to the corresponding function in Article 11. The first half of this function is:
  #
  # my $text = handle_line( $line );
  #	handle $line, modifying @structfield and %seensig, returning any
  #	text (in plain C format) that should go straight into the .h file.
  #
  fun handle_line( $line )
  {
  	return $line unless $line =~ /^%/;		# copy non-% lines
  
  	if( $line =~ s/^(\s*)%func\s*// )		# found %func?
  	{
  		#print "found %func\n";
  	        my $origindent = $1;
  		my( $ok, $info ) = parse_func_line( $line, \&checkfunc );
  		fatal( $line, $info ) unless $ok;
  
  		my $funcname = $info->{FUNCNAME};
  		my $rettype  = $info->{RETURNTYPE};
  		my $origline = $info->{ORIGLINE};
  		my $params   = $info->{PARAMS};
  
  		print "debug: found func $origline\n";
  
  		my $htext = "";

		# TODO: generate a typedef for this function in $htext
		# and record that the structure (to be generated later)
		# contains one more field.
  
  		return $htext;
  	}
  	fatal( $line, "Unhandled % line" );
  }
The call to parse_func_line() takes, as its second parameter, a reference to checkfunc(), a function that checks whether a function name has already been defined:
  #
  # my $error = checkfunc( $funcname );
  #	Check whether function $funcname is already defined - if so, return
  #	a sensible error message, otherwise return undef.  Also marks the
  #	function as defined..
  #
  fun checkfunc( $funcname )
  {
  	return "function $funcname already defined" if $isfunc{$funcname}++;
  	return undef;
  }
Now, to fill in the TODO section, let's review what a single %func line in our .interface input file looks like, and the corresponding function pointer typedef that we wish to construct in the .h file. A single input line might read:
  %func void *f1( char *s, int n );
and the corresponding function pointer type that we wish to construct reads:
  typedef void *(*f12_voidstar_charstar_int_f)( char *, int );
Of that, the interesting part is the name of the type itself:
  f12_voidstar_charstar_int_f
That may be described symbolically as:
  ${interface}_${functionsignature}_f
Here, the interface name is f12. But what do we mean by a function signature? It is a representation of the unique combination of our function's return type (voidstar) and parameter types:
  ${returntype}_${paramtypes}
Specifically, for our example function we get the signature:
  voidstar_charstar_int
The signature string comprises an underscore-separated list of type names, each turned into an alphanumeric word - voidstar (the return type), charstar (the type of the first parameter) and int (the type of the second parameter). Note that, as usual in our TEFEL tools, we are only dealing with Duncan's Simple Types, both for the return type and for all parameter types - recall that a Duncan's Simple Type (a DST) is a typename followed by zero or more '*'s (symbolising "pointer to").
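As a quick check of how the type name lines up with the signature, here is a small self-contained C sketch that declares that function pointer type by hand and points a slot of that type at a compatible function. The skip_chars() function and the f1_slot variable are purely hypothetical, standing in for whatever a package might supply as f1.

```c
#include <stddef.h>
#include <string.h>

// return type   void *  -> voidstar
// 1st parameter char *  -> charstar
// 2nd parameter int     -> int
typedef void *(*f12_voidstar_charstar_int_f)( char *, int );

// Hypothetical package function with a compatible signature:
// returns a pointer n characters into s, or NULL if n is out of range
void *skip_chars( char *s, int n )
{
	if( n < 0 || (size_t)n > strlen( s ) )
	{
		return NULL;
	}
	return s + n;
}

// A slot of that type, as it would appear inside the interface structure
f12_voidstar_charstar_int_f f1_slot = skip_chars;
```

The compiler accepts the assignment to f1_slot only because skip_chars() has exactly the return type and parameter types that the type name spells out.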
Given a DST, we turn each '*' into the word star and remove all spaces, concatenating the basic type name and zero or more stars to form the word representation of the DST. So void generates void, void * generates voidstar, char * generates charstar, int generates int, and char *** would generate charstarstarstar. This transformation can be written in Perl as:
  $type =~ s/\*/star/g;
  $type =~ s/\s+//g;
  return $type;
This code is seen a few lines below inside a function called dst2word(). Then we concatenate all the DST type words together with underscores to form the function signature, giving us the following signature creation function which we call makesig():
  my $sig = makesig( $rettype, @params );
That, given $rettype="void *" and @params=("char *s", "int n"), will generate $sig="voidstar_charstar_int". I've put makesig() and dst2word() in a separate module Sig.pm:
  # Sig: signature functions

  use strict;
  use warnings;
  use 5.010;
  use Data::Dumper;
  use Function::Parameters;

  #
  # my $word = dst2word( $type );
  #	Transform a Duncan Simple Type ($type) - a typename followed by
  #	zero or more '*'s - into a single word description, eg. if type
  #	is "char **", the word is "charstarstar".
  #
  fun dst2word( $type )
  {
	$type =~ s/\*/star/g;
	$type =~ s/\s+//g;
	return $type;
  }

  #
  # my $sigstr = makesig( $returntype, @params );
  #	Build a signature string, eg. if $returntype is "char *", and @params
  #	are ( "void *x", "int y", "char **z" ), the signature string is
  #	"charstar_voidstar_int_charstarstar".
  #      ^ returntype      ^ 2nd param
  #	          ^ 1st param  ^ 3rd param
  #
  fun makesig( $returntype, @params )
  {
	$returntype = dst2word( $returntype );
	@params      = ("void x") if @params == 0;
	my $argtypes = join( '_',
		map {
			s/\w+$//;		# remove the parameter name
			dst2word( $_ )
		} @params );
	return "${returntype}_${argtypes}";
  }

  1;
This infrastructure now enables us to fill in the missing half of handle_line(), replacing the TODO comment with:
  my $sig      = makesig( $rettype, @$params );
  my $typename = "${interface}_${sig}_f";

  print "debug: found func $origline with sig $sig\n";

  my $htext = "";
  unless( $seensig{$sig}++ )
  {
	my $args = join( ', ', map {
		my $type = $_;		# copy: don't alter @$params itself
		$type =~ s/\s*\w+$//;	# remove the parameter name
		$type;
	} @$params );
	$args = "void" unless $args;
	$htext = "typedef $rettype (*$typename)( $args );\n";
  }
  push @structfield, "$typename $funcname;";
  push @func, $funcname;
  $funcsig{$funcname} = $sig;
In particular, note that we must only generate a typedef for each unique function signature - hence our use of %seensig to store the set of all signatures that we've already seen, and the Perlish idiom unless( $seensig{$sig}++ ) to mean "if this signature has not been seen before".

By contrast, we unconditionally add a $typename $funcname entry to the array of structure fields for the interface structure (that we will generate shortly) whether or not the function signature is unique. Similarly, we unconditionally append the function name to @func, the list of all functions seen, and unconditionally associate the function signature with the function name in the %funcsig hash.
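To illustrate the difference, consider a hypothetical interface g12 (not one of the article's examples) whose two functions g1 and g2 share the signature void_void. Following the rules above, I'd expect the generated header to contain one typedef but two structure fields, roughly as in this sketch; hello() and goodbye() are made-up package functions used only to exercise the slots.

```c
#include <stdio.h>

// One typedef: both g1 and g2 have the signature void_void
typedef void (*g12_void_void_f)( void );

// Two fields: one per function, whether or not its signature is unique
typedef struct
{
	g12_void_void_f g1;
	g12_void_void_f g2;
} g12;

// Hypothetical package functions, counting calls so the effect is visible
int calls = 0;

void hello( void )
{
	calls++;
	printf( "hello\n" );
}

void goodbye( void )
{
	calls++;
	printf( "goodbye\n" );
}
```

Binding by hand with g12 in = { hello, goodbye }; and then calling in.g1() and in.g2() goes through the same pointer type twice, which is why a second typedef would be redundant (and, in C, a redefinition).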
Having processed all the input, generated a typedef for each distinct function signature, and recorded information about all the functions we've seen, the next step is to generate the interface structure from the @structfield array we built up while handling %func lines:
  #
  # my $text = makestruct( $interface, @structfield );
  #	Generate the structure type for the interface..
  #
  fun makestruct( $interface, @structfield )
  {
  	my $struct = "typedef struct\n{\n";
  	$struct .= join( '', map { s/^/\t/; "$_\n" } @structfield );
  	$struct .= qq(} $interface;\n);
  
  	my $str = qq(
  // This represents the "interface $interface" at run-time.
  // It's a container of SLOTS for the $interface functions..
  $struct
  	);
  	return $str;
  }
Next, we want to generate the interface bind function. Initially we'll want to generate a function declaration (an extern declaration) suitable for inserting into the .h file, but we will want to produce an extremely similar true function definition later. So let's write a function which can do either, including the usage comment in both declaration and definition:
  #
  # my $str = makebind( $extern, $interface );
  #	Generate the bind function definition or declaration for the
  #	current interface
  #
  fun makebind( $extern, $interface )
  {
  	my $str = qq[
  /*
   * $interface *in = ${interface}_bind( char *module, char *errmsg );
   *	Attempt to "bind" lib<module>.so to the $interface interface:
   *	Load "lib<module>.so" into memory, and attempt to locate all the
   *	required function symbols inside the library.  For each function
   *	called <fname>, we look first for a symbol "<fname>", then if that
   *	fails, for a symbol "<module>_<fname>".
   *
   *	If we fail to find even one of the required functions: strcpy
   *	an error message into errmsg and return NULL
   *
   *	If we succeed then we say we have "bound" the module to the interface:
   *	we return a newly malloc()d $interface object with the slot function
   *	pointers bound to the corresponding functions in lib<module>.so
   */
  $extern$interface *${interface}_bind( char *module, char *errmsg )];
  	return $str;
  }
Back in the main code of cint, we initially opened the files and handled all the lines, generating an incomplete .h file. Now we can use makestruct() and makebind() to complete the .h file by replacing another TODO comment with:
  # make the struct definition for the interface..
  $htext .= makestruct( $interface, @structfield );
  
  # make the bind heading for the interface..
  $htext .= makebind( "extern ", $interface ).";\n";
For clarity, here's the whole of the main code of cint again:
  die "Usage: cint filename\n" unless @ARGV == 1;

  my $inputfilename = shift;
  $interface = $inputfilename;
  $interface =~ s/\.interface$//;
  my $cfilename = "$interface.c";
  my $hfilename = "$interface.h";

  openfile( $inputfilename ) || die "cint: can't open $inputfilename\n";

  unlink( $cfilename );
  unlink( $hfilename );

  my $htext = "";		# generated .h file contents
  while( defined( $_ = nextline() ) )
  {
	$htext .= handle_line( $_ );
  }

  # make the struct definition for the interface..
  $htext .= makestruct( $interface, @structfield );

  # make the bind heading for the interface..
  $htext .= makebind( "extern ", $interface ).";\n";

  # build the .h file
  open( my $hfh, '>', $hfilename ) || die "cint: can't create $hfilename\n";
  print $hfh $htext;
  close( $hfh );
At this point, we have our first version of cint that should be able to generate the .h file corresponding to the given interface.
You'll find this version ready for download in the tarball 08cint1.tgz. Using that version, run:
  ./cint f12.interface 
and then examine the newly generated f12.h which should read exactly like the version we built by hand in part 1.
Our Second Goal: Generate the .c file from the .interface file
Ok, our next step is to extend cint to generate the .c file as well as the .h file.
In the main body of cint we append:
  my $ctext = "";         # generate the .c file..

  $ctext .= c_preamble( $interface );

  # make the bind heading for the interface..
  $ctext .= makebind( "", $interface ) . "\n";

  $ctext .= makebindbody( $interface );

  open( my $cfh, '>', $cfilename ) || die "cint: can't create $cfilename\n";
  print $cfh $ctext;
  close( $cfh );
Now all we need to do is implement those functions. Let's start with c_preamble(), which is largely boilerplate:
  #
  # my $str = c_preamble( $interface );
  #	Produce the preamble for the .c file, for $interface.
  #	This mainly comprises the headers etc.
  #
  fun c_preamble( $interface )
  {
	my $str =
  qq(#include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <assert.h>
  #include <dlfcn.h>

  #include "$interface.h"
  #include "lookup.h"


  );
	return $str;
  }
makebind() has already been implemented - here we call it with "" as the extern value, to make it produce a function definition header.
That leaves makebindbody(), that generates the entire body of the bind function. Let's build that piecemeal from a top section (boilerplate), a middle section containing the lookup of each function and the assignment of the result to the corresponding field, and a bottom section (the return statement). Leaving the middle section for a moment, let's write the top and bottom parts:
  #
  # my $str = makebindbody( $interface );
  #	Construct the body of the bind function for $interface.
  #	(Also uses @func and %funcsig)
  #
  fun makebindbody( $interface )
  {
	my $top =
  qq({
	char libname[1024];
	assert( strlen(module) < 1000 );
	sprintf( libname, "lib%s.so", module );
	void *dl = dlopen( libname, RTLD_NOW );
	if( dl == NULL )
	{
		sprintf( errmsg, "${interface}_bind: dlopen of %s failed", libname );
		return NULL;
	}

	${interface} *in = malloc(sizeof(*in));
	if( in == NULL )
	{
		strcpy( errmsg, "${interface}_bind: malloc() failed" );
		return NULL;
	}

	pkg_info   info;
	info.dl        = dl;
	info.module    = module;
	info.interface = "$interface";
	info.libname   = libname;
	info.errmsg    = errmsg;
  );

	my $middle = "";

	# TODO: build the middle section

	my $bottom =
  qq(
	return in;
  }
  );

	return $top.$middle.$bottom;
  }
Here's the missing middle section, replacing the obvious TODO: it iterates over the data structures (@func and %funcsig) that we built earlier to produce one call to lookup_function() per function, and then tests the result:
	foreach my $f (@func)
	{
		my $sig = $funcsig{$f};
		#print "func $f, sig $sig\n";
		$middle .=
  qq(
	in->$f = (${interface}_${sig}_f) lookup_function( &info, "$f" );
	if( in->$f == NULL )
	{
		free(in);
		return NULL;
	}
  );
	}
You'll find this version ready for download in the tarball 09cint2.tgz. Using that version, run:
  ./cint f12.interface 
and then examine the newly generated f12.c which should read exactly like the version we built by hand in part 1. Most importantly, if you then run:
  make
it should compile f12.c successfully, with no warnings, and compile all the test programs and sample packages, ready for you to test.
In particular, you should be able to rerun various tests, eg:
  make
  export LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"
    (or setenv LD_LIBRARY_PATH ".:$LD_LIBRARY_PATH" for csh users)
  ./f12_any_pkg pkg1 pkg2
and see:
  pkg1::f1
  pkg1::f2, returning 1
  f2 returned 1

  pkg2::f1
  pkg2::f2, returning 42
  f2 returned 42
just as we did using our hand built f12.[ch] module in part 1.
So at this point, we have our Minimum Viable Product, ie. our first functionally complete version of the cint tool. How much code have we written? Sig.pm comprises 45 lines of new code. Support.pm comprises 167 lines of mostly reused code. cint itself comprises 273 lines of mostly new code. In total we've written 485 lines of Perl code.
Our Third Goal: Better checking of package functions
At present, all we are checking for each required function F is that a public symbol (which we assume to be a function) exists in a package, that is either called F or PKG_F (where PKG is the package name). We make no attempt at all to check that the return type and/or numbers and types of parameters are compatible - in other words that the signature is compatible.
I thought it might be worth trying to improve on this. Many C++ compilers essentially rename functions to include information about their signature (name mangling). We could do that (ideally via a TEFEL-style tool), but that makes the package functions almost impossible to call except via the interface system.
Instead, each package file (pkg[1-3].c) can optionally decide to define a FLAG VARIABLE per function (only the existence of the flag variable is used), and a PKG_useflagvars variable to indicate that lookup_function() should try to check them.
So, for instance, pkg1.c could be annotated to include:
  char pkg1_useflagvars;		// check for existence of the flag variables
  char pkg1_f1_void_void;
  char pkg1_f2_int_void;
  char pkg1_f3_void_charstar_int;
  char pkg1_f4_voidstar_int;
Initially, we'll define these flag variables ourselves manually, which is obviously a pain. But then we'll automate that too.
So, most of the changes are plain C code in lookup.[ch]. First, we modify lookup_function() to take an additional parameter char *sig - the function signature to check for (or NULL if we don't want to check). Here's the new heading, with a revised comment:
  //
  // void *p = lookup_function( pkg_info *info, char *funcname, char *sig );
  //	look within info->dl, a dynamic library opened by dlopen(), for
  //	either the global symbol <funcname> or the global symbol
  //	<modulename_funcname>.
  //	If we find either, return a pointer to the first one we found.
  //	If not, return NULL.
  //	If sig is not NULL, and the package defines the flag variable
  //	PKG_useflagvars, we also check that a flag variable corresponding
  //	to the signature (sig) exists in the package.  If not, return NULL.
  //
  void *lookup_function( pkg_info *info, char *funcname, char *sig )
The body of lookup_function() needs to be rearranged. Previously we found a symbol (either F or PKG_F) and immediately returned a pointer to it, delivering an error message if we fell off the end. Now we invert that, generating and returning the error message if no candidate symbol is found:
  // can we look up the unadorned symbol funcname inside the dl?
  void *p = dlsym( info->dl, funcname );
  if( p == NULL )
  {
	// can we look up the module-qualified symbol inside the dl?
	char fullname[1024];
	sprintf( fullname, "%s_%s", info->module, funcname );
	p = dlsym( info->dl, fullname );
	if( p == NULL )
	{
		sprintf( info->errmsg,
			"No symbol '%s' or '%s' in %s",
			funcname, fullname, info->libname );
		return NULL;
	}
  }
Then, now that we know that a candidate symbol has been found, we check the signature:
  char useflags[1024];
  sprintf( useflags, "%s_useflagvars", info->module );

  // Should we check for a signature flag variable?
  if( sig != NULL && dlsym( info->dl, useflags ) != NULL )
  {
	// check the flag variable for the signature
	char sigsym[1024];
	sprintf( sigsym, "%s_%s", info->module, sig );

	// if the signature symbol doesn't exist, fail..
	if( dlsym( info->dl, sigsym ) == NULL )
	{
		sprintf( info->errmsg,
			"%s_bind: No sig symbol '%s' in %s",
			info->interface, sigsym, info->libname );
		return NULL;
	}
  }

  return p;
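The decision being made in that second half can be tried out without libdl at all. The following is only a sketch of the logic, not the code in lookup.c: has_symbol() is a hypothetical stand-in for dlsym() that searches a NULL-terminated list of symbol names, check_signature() is a made-up name, and sig is assumed to be a string such as "f1_void_void".

```c
#include <stdio.h>
#include <string.h>

// Hypothetical stand-in for dlsym(): a "package" is just a list of symbol
// names, terminated by NULL.  Returns 1 if name is in the list, else 0.
int has_symbol( const char **symbols, const char *name )
{
	for( int i = 0; symbols[i] != NULL; i++ )
	{
		if( strcmp( symbols[i], name ) == 0 )
		{
			return 1;
		}
	}
	return 0;
}

// Returns 1 if the signature check passes (or is skipped), 0 if it fails.
// module is the package name; sig is eg "f1_void_void", or NULL to skip.
int check_signature( const char **symbols, const char *module, const char *sig )
{
	char useflags[1024];
	snprintf( useflags, sizeof(useflags), "%s_useflagvars", module );

	// no signature wanted, or package has not opted in: nothing to check
	if( sig == NULL || ! has_symbol( symbols, useflags ) )
	{
		return 1;
	}

	// the package opted in, so its signature flag variable must exist
	char sigsym[1024];
	snprintf( sigsym, sizeof(sigsym), "%s_%s", module, sig );
	return has_symbol( symbols, sigsym );
}
```

With this sketch, a package that defines pkg1_useflagvars and pkg1_f1_void_void passes the check for "f1_void_void", one that defines pkg1_useflagvars but only pkg1_f1_int_void fails it, and a package with no useflagvars symbol always passes. The sketch uses snprintf() rather than sprintf() purely as a defensive habit.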
The changes in cint are very minor: simply generating calls to lookup_function() with the extra signature parameter:
	$middle .=
  qq(
	in->$f = (${interface}_${sig}_f) lookup_function( &info, "$f", "${f}_${sig}" );
	....
  );
and revising the generated bind function's comment.
You'll find this version ready for download in the tarball 10cint3.tgz. Using that version, run:
  ./cint f12.interface 
and then examine the newly generated f12.c which passes the extra signature parameter to each call to lookup_function(), and the larger modifications in lookup_function() (in lookup.c) itself.
In that version you'll also find that pkg1.c has been modified to include the flag variables, whereas pkg[23].c have not.
If you then run:
  make
  export LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"
    (or setenv LD_LIBRARY_PATH ".:$LD_LIBRARY_PATH" for csh users)
  ./f12_any_pkg pkg1 pkg2
you should see it work exactly like it did before.
However, if you hand edit pkg1.c and change (say) f1()'s return type from void to int:
  int pkg1_f1( void )
  {
	printf( "pkg1::f1\n" );
  }
and modify its corresponding signature flag variable name to:
  char pkg1_f1_int_void;
then running:
  make
  ./f12_any_pkg pkg1
reports:
  f12_bind: No sig symbol 'pkg1_f1_void_void' in libpkg1.so
which indicates that it correctly checked for the signature and failed to find it.
Our Fourth Goal: Automatic generation of flag variables
Creating those flag variables manually is a pain, is tremendously error prone, and breaks the DRY (Don't Repeat Yourself) principle.
We can't help but notice that all the information is present in the package source file, contained (of course) in each package function we declare. So instead, let's autogenerate them from marked up package files. Instead of writing pkg1.c in plain C, let's replace it with pkg1.pkg in which each package function is annotated with %func:
  /*
   *	pkg1: a collection of C functions, that may accidentally satisfy
   *	      one or more interfaces that don't yet exist..
   */

  #include <stdio.h>
  #include <stdlib.h>

  %func void pkg1_f1( void )
  {
	printf( "pkg1::f1\n" );
  }

  %func int pkg1_f2( void )
  {
	printf( "pkg1::f2, returning 1\n" );
	return 1;
  }

  %func void pkg1_f3( char *s, int x )
  {
	printf( "pkg1::f3, s='%s', x=%d\n", s, x );
  }

  %func void *pkg1_f4( int n )
  {
	printf( "pkg1::f4, x=%d, returning NULL\n", n );
	return NULL;
  }
Now our goal becomes: implement a small companion TEFEL tool called cpkg which reads the .pkg file as shown above, and translates each %func entry to plain C, adding the corresponding flag variable as it goes. At some point, cpkg must also emit the PKG_useflagvars flag variable itself.

You will of course notice that our %func syntax above is EXACTLY THE SAME as we used in the .interface files - and hence the new cpkg tool can use all the TEFEL infrastructure that we painstakingly built up for cint. In particular it uses the unmodified Support.pm parsing module, and the unmodified Sig.pm signature module, and cpkg itself is basically a cloned and simplified version of cint. cpkg only produces a .c file, so we remove the code that writes out a .h file.

Several data structures are no longer needed, so are removed. Most of cint's code generation logic (functions makestruct(), makebind(), c_preamble() and makebindbody()) is removed. The main change is that all the results of handle_line() are accumulated in a $text Perl string and written out to the .c file at the end.
Within handle_line(), for every parsed %func line, we build the signature immediately, and then run the following action code:
	print "debug: found func $origline with sig $sig\n";

	my $name = $funcname;
	$name =~ s/^${package}_//;	# remove the package name
	my $sigvar = "${package}_${name}_${sig}";

	my $text = "char $sigvar;\t// $funcname signature variable:\n".
		   "$origline\t// $funcname function:\n";
	return $text;
Right at the end, we add the useflagvars flag as follows:
	$text .= "\nchar ${package}_useflagvars;          // check the existence of the flag variables\n";
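For readers more at home in C than Perl, the naming rule in that action code can be sketched as a small C function. This is only an illustration: make_sigvar() is a hypothetical name, not part of cpkg, and the signature string is simply passed in rather than computed. The point it shows is that a function name gives the same flag variable name whether or not it already carries the package prefix:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<package>_<name>_<sig>" into buf, where <name> is funcname with
 * any leading "<package>_" removed (mirroring the Perl s/^${package}_//).
 * Returns 1 if the result fitted into buf, 0 if it was truncated. */
int make_sigvar( const char *package, const char *funcname,
	const char *sig, char *buf, size_t buflen )
{
	const char *name = funcname;
	size_t plen;
	int n;

	assert( package != NULL && funcname != NULL && sig != NULL && buf != NULL );
	plen = strlen( package );

	/* skip a leading "<package>_" prefix, if present */
	if( strncmp( funcname, package, plen ) == 0 && funcname[plen] == '_' )
	{
		name = funcname + plen + 1;
	}

	n = snprintf( buf, buflen, "%s_%s_%s", package, name, sig );
	return n >= 0 && (size_t) n < buflen;
}
```

So make_sigvar( "pkg1", "pkg1_f1", "void_void", ... ) and make_sigvar( "pkg1", "f1", "void_void", ... ) both produce pkg1_f1_void_void, which is the symbol that lookup_function() goes looking for.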
Here is the complete code of cpkg: it is just over 100 lines long, barely more than a third of the length of cint:
  #!/usr/bin/perl
  #
  #	cpkg:	a prototype "C with packages" to C translator
  #
  #		This tool is the other side (the "package" side) of an
  #		experimental "C with go-style interfaces" to C translator.
  #
  #		This is the result.  It translates a single F.pkg
  #		"C+Package" source file to the corresponding F.c
  #		file implementing that interface.  A .pkg file marks
  #		certain functions with %func, and for each of these,
  #		a signature checking variable is automatically generated.
  #
  #	(C) August 2018, Duncan C. White
  #
  use strict;
  use warnings;
  use 5.010;
  use Data::Dumper;
  use Getopt::Long;
  use Function::Parameters;
  use FindBin qw($Bin);

  use lib "$Bin";
  use Support;
  use Sig;

  my $package;		# name of current package
  my %isfunc;		# set of all functions in the package

  #
  # my $error = checkfunc( $funcname );
  #	Check whether function $funcname is already defined - if so, return
  #	a sensible error message, otherwise return undef.  Also marks the
  #	function as defined..
  #
  fun checkfunc( $funcname )
  {
	return "function $funcname already defined" if $isfunc{$funcname}++;
	return undef;
  }

  #
  # my $text = handle_line( $line );
  #	handle $line, returning any text (in plain C format) that should go
  #	straight into the .c file.
  #
  fun handle_line( $line )
  {
	return $line unless $line =~ /^%/;		# copy non-% lines

	if( $line =~ s/^(\s*)%func\s*// )		# found %func?
	{
		#print "found %func\n";
		my( $ok, $info ) = parse_func_line( $line, \&checkfunc );
		fatal( $line, $info ) unless $ok;

		my $funcname = $info->{FUNCNAME};
		my $rettype  = $info->{RETURNTYPE};
		my $origline = $info->{ORIGLINE};
		my $params   = $info->{PARAMS};
		my $sig      = makesig( $rettype, @$params );

		print "debug: found func $origline with sig $sig\n";

		my $name = $funcname;
		$name =~ s/^${package}_//;	# remove the package name
		my $sigvar = "${package}_${name}_${sig}";

		my $text = "char $sigvar;\t// $funcname signature variable:\n".
			   "$origline\t// $funcname function:\n";
		return $text;
	}
	fatal( $line, "Unhandled % line" );
  }

  die "Usage: cpkg filename\n" unless @ARGV == 1;

  my $inputfilename = shift;
  my $basename = $inputfilename;
  $basename =~ s/\.pkg$//;
  my $cfilename = "$basename.c";

  $package = $basename;

  openfile( $inputfilename ) || die "cpkg: can't open $inputfilename\n";

  unlink( $cfilename );

  my $text = "";		# generated .c file contents
  while( defined( $_ = nextline() ) )
  {
	$text .= handle_line( $_ );
  }

  # add the "useflagvars" flag.
  $text .= "\nchar ${package}_useflagvars;          // check the existence of the flag variables\n";

  open( my $cfh, '>', $cfilename ) || die "cpkg: can't create $cfilename\n";
  print $cfh $text;
  close( $cfh );
You'll find this version ready for download in the tarball 11cint4.tgz. Using that version, run:
  ./cpkg pkg1.pkg
and then examine the newly generated pkg1.c. You'll see that it defines all the signature flag variables, interspersed with the functions themselves.
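As a rough guide to what to expect, here is a sketch of the generated code for the first two functions, reconstructed by hand from the cpkg source above rather than copied from a real run. It assumes a pkg1_f1 declared as void->void (so that it satisfies the f12 interface), and that makesig() yields void_void and int_void for these two functions, as the typedef names and the error message seen earlier suggest:

```c
#include <stdio.h>
#include <stdlib.h>

char pkg1_f1_void_void;	// pkg1_f1 signature variable:
void pkg1_f1( void )	// pkg1_f1 function:
{
	printf( "pkg1::f1\n" );
}

char pkg1_f2_int_void;	// pkg1_f2 signature variable:
int pkg1_f2( void )	// pkg1_f2 function:
{
	printf( "pkg1::f2, returning 1\n" );
	return 1;
}

char pkg1_useflagvars;          // check the existence of the flag variables
```

Each flag variable is a single char that is never read or written: only the existence of its name in the shared library's symbol table matters.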
In that version you'll also find that the Makefile has been modified to autogenerate pkg1.c from pkg1.pkg by invoking cpkg, but the other pkg[23].c modules have been left alone. Converting these to .pkg files and modifying the Makefile to generate them is left for you.
If you then run:
  make
  export LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"
    (or setenv LD_LIBRARY_PATH ".:$LD_LIBRARY_PATH" for csh users)
  ./f12_any_pkg pkg1 pkg2
you should see it work exactly like it did before.
However, if you hand edit pkg1.pkg and change (say) f1()'s return type from void to int:
  %func int pkg1_f1( void )
  {
	printf( "pkg1::f1\n" );
  }
(no need now to modify its corresponding signature flag variable, that's autogenerated!) then running:
  make
  ./f12_any_pkg pkg1
reports:
  f12_bind: No sig symbol 'pkg1_f1_void_void' in libpkg1.so
which indicates that cpkg generated the correct altered signature flag variable, and then when we ran f12_any_pkg its lookup_function() correctly checked for the expected signature and failed to find it.
There are other changes we could make, but we have to stop somewhere. So, left as exercises for you are the following observations:
Once all packages are built via cpkg, we can rely on signature flag variables always existing, so would no longer need to generate PKG_useflagvars and test for its existence; remove it.
Looking at the .pkg file above, the package name is repeated an awful lot: violating the DRY principle. cpkg knows the package name, and now that we know which public functions exist in the package (because they're all marked with %func), we could write functions like:
  %func int f1( void )
  {
	printf( "pkg1::f1\n" );
  }
and have cpkg automatically qualify each marked function with its package name. Even better, it could redefine a C pre-processor macro FUNCNAME containing the string pkg1::f1 in exactly the format our function's printf() happens to need, allowing us to write:
  %func int f1( void )
  {
	printf( FUNCNAME "\n" );
  }
Implement this change. Technically, if cpkg has arranged to qualify every marked function's name, then lookup_function() no longer needs to check for unadorned function name symbols, so you could remove that code too.
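If you'd like a target to aim at, here is one guess at the plain C that such a revised cpkg might emit for a two-function package. It is a sketch, not output from any real tool: the #undef/#define placement is my assumption, and it relies on the C compiler concatenating adjacent string literals, so FUNCNAME "\n" becomes a single format string:

```c
#include <stdio.h>
#include <stdlib.h>

/* from: %func void f1( void ) */
#undef FUNCNAME
#define FUNCNAME "pkg1::f1"
char pkg1_f1_void_void;
void pkg1_f1( void )
{
	printf( FUNCNAME "\n" );	/* adjacent literals are concatenated */
}

/* from: %func int f2( void ) */
#undef FUNCNAME
#define FUNCNAME "pkg1::f2"
char pkg1_f2_int_void;
int pkg1_f2( void )
{
	printf( FUNCNAME ", returning 1\n" );
	return 1;
}
```

Because FUNCNAME is redefined before every marked function, each function body sees its own name, at the small cost that the macro is only trustworthy inside %func functions.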
If you have implemented the above suggestion, then you may notice that your %func definitions are now exactly the same as those we find in an interface file - previously, they were syntactically the same, but the function names in the interface were always unqualified, whereas those in the .pkg file might have been qualified or unqualified. But now that they are all guaranteed to be unqualified, this gives you a shorthand way of creating an interface file in the first place: write a .pkg file first, and then build the abstract interface via:
  grep %func P.pkg > I.interface
Build a tiny tool (perhaps a Perl script, perhaps just a shell script) that takes P and I as parameters and invokes the above command to generate the corresponding interface. Call it mkinterface. Can you add any useful features to such a utility? For example, how about a set of functions to exclude from the interface, given as command line arguments? You might invoke it as:
  ./mkinterface --exclude 'this_function' --exclude 'that_function' PKG INT
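I suggested Perl or shell for this, but to keep to one language here is a sketch in C of just the filtering decision: which lines of the .pkg file belong in the interface. All three function names are hypothetical, and unlike the plain grep above it only accepts lines that start with %func (after optional whitespace), and it takes the function name to be the identifier immediately before the first open bracket:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Is this a "%func ..." line (leading whitespace allowed)? */
int is_func_line( const char *line )
{
	assert( line != NULL );
	while( *line == ' ' || *line == '\t' ) line++;
	return strncmp( line, "%func", 5 ) == 0 &&
	       ( line[5] == ' ' || line[5] == '\t' );
}

/* Copy the function name (the identifier just before the first '(')
 * into name.  Returns 1 on success, 0 if none found or it won't fit. */
int func_name( const char *line, char *name, size_t namelen )
{
	const char *paren, *end, *start;
	size_t len;

	assert( line != NULL && name != NULL );
	paren = strchr( line, '(' );
	if( paren == NULL ) return 0;

	end = paren;
	while( end > line && isspace( (unsigned char) end[-1] ) ) end--;
	start = end;
	while( start > line &&
	       ( isalnum( (unsigned char) start[-1] ) || start[-1] == '_' ) )
	{
		start--;
	}

	len = (size_t)( end - start );
	if( len == 0 || len >= namelen ) return 0;
	memcpy( name, start, len );
	name[len] = '\0';
	return 1;
}

/* Should this line be copied into the interface file? */
int keep_line( const char *line, const char **exclude, size_t nexclude )
{
	char name[128];
	size_t i;

	if( ! is_func_line( line ) ) return 0;
	if( ! func_name( line, name, sizeof name ) ) return 0;
	for( i = 0; i < nexclude; i++ )
	{
		if( strcmp( name, exclude[i] ) == 0 ) return 0;
	}
	return 1;
}
```

A real mkinterface would wrap keep_line() in a loop that reads P.pkg line by line and writes the surviving lines to I.interface, with the exclusion list collected from the --exclude arguments.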
Currently Support.pm and Sig.pm are entirely independent - largely because I wrote Sig.pm after writing Support.pm. But it occurs to me that cpkg and cint contain some repeated code: specifically every time we call parse_func_line(), we write the following code sequence which almost immediately calls makesig() to generate the function's signature:
  my( $ok, $info ) = parse_func_line( $line, \&checkfunc );
  fatal( $line, $info ) unless $ok;

  my $funcname = $info->{FUNCNAME};
  my $rettype  = $info->{RETURNTYPE};
  my $origline = $info->{ORIGLINE};
  my $params   = $info->{PARAMS};
  my $sig      = makesig( $rettype, @$params );

  print "debug: found func $origline with sig $sig\n";
This repeated code extracts several fields from the returned info hash (in particular: the return type and the parameter array reference), solely in order to pass them into makesig().
If we moved the call to makesig() inside parse_func_line(), and stored the signature in the %info hash it returns, the above code sequence in both cpkg and cint would become:
  my( $ok, $info ) = parse_func_line( $line, \&checkfunc );
  fatal( $line, $info ) unless $ok;

  my $funcname = $info->{FUNCNAME};
  my $origline = $info->{ORIGLINE};
  my $sig      = $info->{SIG};

  print "debug: found func $origline with sig $sig\n";
Think about it, and decide whether or not to couple the modules together in this fashion. Might there ever be a case where pre-computing the function signature during parsing is a bad idea?
Try these tools - and mkinterface if you've built it - out in a larger example. Construct a realistic .pkg file with around a dozen functions, make an interface from (say) 8 of them that cohere sensibly, then build a second package implementing the same set of functions a different way, and check that the two packages are compatible. Build a second interface with a couple of different implementations. If anything about using these tools at this slightly bigger scale irritates you - fix it! Please let me know the results if you do try this - I might well learn something that wouldn't be obvious to me from someone else using these tools at scale.

Finally, Go gets much of its expressive power not just from using interfaces, but from passing interface-A values themselves as parameters in other interface-B functions: we might call this Higher Order Interfaces. Can you think up - and try using cpkg, cint and mkinterface - one or more examples of this, and see how powerful you find the technique?
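To give a flavour of what I mean, here is a minimal hand-written sketch in plain C. The f12 structure is the one cint generates from f12.interface; everything else is hypothetical: a second interface called runner, its typedef name (I haven't checked how Sig.pm would encode an f12 * parameter), the twice_run() implementation, and the demo functions. The f12 value would normally come from f12_bind(), but here it is filled in by hand so the example needs no shared libraries:

```c
#include <assert.h>
#include <stddef.h>

/* The f12 interface value, as generated by cint from f12.interface */
typedef void (*f12_void_void_f)( void );
typedef int  (*f12_int_void_f)( void );
typedef struct
{
	f12_void_void_f   f1;
	f12_int_void_f    f2;
} f12;

/* Hypothetical second interface "runner": its one function takes an
 * f12 interface value as a parameter. */
typedef int (*runner_int_f12_f)( f12 *in );
typedef struct
{
	runner_int_f12_f  run;
} runner;

/* One possible implementation of run(): call f1 once for its side
 * effect, then f2 twice, returning the sum of f2's two results. */
int twice_run( f12 *in )
{
	assert( in != NULL && in->f1 != NULL && in->f2 != NULL );
	in->f1();
	return in->f2() + in->f2();
}

/* Stand-in package functions, used to fill an f12 by hand */
int demo_calls = 0;
void demo_f1( void ) { demo_calls++; }
int  demo_f2( void ) { return 21; }
```

With f12 in = { demo_f1, demo_f2 } and runner r = { twice_run }, the call r.run( &in ) goes through one interface value in order to call through another, which is the essence of the idea.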
Summary

I have now shown how to implement two TEFEL-style tools (cint and cpkg) to transform C+Interfaces and C+Packages input syntax into plain C. For the first time, to enhance reuse, these tools have much of the shared support code broken out into Support.pm (for most of the parsing code, generalised) and Sig.pm (for code to generate function signatures). In particular, this has shown that the shared support code is highly reusable, and we saw how easy it made building cpkg as - frankly - an afterthought. Writing 100 lines of Perl code to implement cpkg, the vast majority of which were already present in cint, was no more than half an hour's work. This shows that TEFEL can be almost an experimental "what if we do this?" technique.
Of course the implementation technique (libdl) is currently platform dependent, which is not ideal. If anyone can think of a way of making this more portable, please tell me.
Update: since writing this, I've discovered: http://jamesgregson.blogspot.com/2010/01/portable-dynamic-libraries.html that addresses this question for Linux, Windows and MacOS. His examples are C++ but I'm sure one can easily turn them into C.
If anyone would like to take James's technique, adapt it for C, and check that it still works reliably on Linux, MacOS and Windows, and then adapt cint and lookup_function() to replace libdl calls with calls via James's portability layer, please let me know the results.

d.white@imperial.ac.uk
Written: August 2019