Optimal Code Selection in Component-based Scientific Programs

-------------------------

Motivation

One of the recent research efforts in the Software Performance Optimisation Research Group is a DESO (Delayed Evaluation, Self-Optimising) library of parallel scientific software components. This library comes with both C and C++ bindings. An example of the C++ interface is the following simple numerical algorithm:

/* 
 * A is an N-by-N matrix
 * r, z, p, q, x are N-element vectors
 * rho, rho_o, beta, alpha and err are scalars 
 * 
 * Uses the Conjugate Gradient algorithm to solve
 * the equation Ax = b for x. 
 */
for ( i = 1; i <= max_iter; i++ ) {
  /* ... */
  rho = r * z;
  if (i == 1) { 
    p = z;
  } else { 
    z = z + (rho / rho_o) * p;
    p = z;
  }

  q = A * p;
  alpha = rho / (p * q);
  x = x + alpha * p;
  r = r - alpha * q;

  err = nrm2(r); /* Check convergence */ 
  /* ... */ 
}

See our EuroPar paper for an illustration for how a simple high-level piece of code like the above can become a working parallel program.

The Challenge of Cross-component Optimisation. It is sadly not true that the composition of optimised software components is optimal. For example: Suppose we have two large optimised numerical functions f and g, then f(); g() is not necessarily optimal, in particular if these are parallel functions. One possible solution is of course to make sure that we recompile f and g each time we want to call them and then use a compiler with powerful interprocedural optimisations. However, in the case of libraries, we have no information available at compile-time about how library functions will be used: we lose all context information. In our library of parallel scientific components, we overcome this problem of lost context information by using delayed evaluation to capture the data-flow graph of library operations before they are executed at program runtime. We then have the opportunity to perform runtime cross-component optimisations.

Code (Component) Selection. For a vector update statement such as x = x + alpha * p, the data-flow graph nodes created are as follows:

       +------------------+
       | temp = alpha * p |
       +------------------+
              /
             /
            |
           \|/
            |
    +--------------+
    | x = x + temp |
    +--------------+
The reason why two nodes are created is the manner in which DESO's C++ interface "parses" the statement x = x + alpha * p. This approach is clearly sub-optimal, because
  1. We should fuse the two loops temp = alpha * p and x = x + temp.

  2. On many platforms, the hardware vendor will have supplied an optimised implementation of the operation y <- alpha × x + y. In our current implementation, we lose the opportunity to use that optimised implementation. The problem of selecting the optimal set of vendor-supplied functions for a particular computation is a code selection problem.

For a larger illustration of this problem, consider the following data flow graph, which is the actual DAG generated when running the first iteration of the above sample program.

Picture of First Iteration DAG

The aim of this project is to use runtime techniques to optimise these kind of DAGs. It would be a real bonus if we could also know whether our optimisation has just resulted in an improvement or found the actual optimal code selection for this DAG.

Similar problems have to be solved by optimising compilers for sequential code, and several published algorithms are available for that context.

There are ways in which both the code selection problem and the loop fusion problem could be addressed by using compile-time methods, specifically C++ Template Metaprogramming, see for example Blitz++ ( http://www.oonumerics.org/blitz/). There should be interesting trade-offs between the runtime and compile-time approaches.

The Task

Extend our existing parallel linear algebra library to perform runtime code selection, exploiting available optimised library routines. Evaluate by comparing with loop fusion and compile-time approaches.

Is this project for you?

This is a research-level project. To do this project, you must not be scared to pick up and work with an existing large software system. You should enjoy the challenge of getting software to work and do what you want. If you would like to discuss this project, please feel free to talk to either Paul Kelly or Olav Beckmann. Our EuroPar 2002 paper gives an overview of the software framework which you would be working with, whilst the THEMIS paper gives are more big-picture overview of the aims of our research project.

-----------------------------

Olav Beckmann
Last modified: Tue Oct 4 15:28:10 BST 2005