332 Advanced Computer Architecture

Exercise 8 (Assessed): the CFD Challenge


In 1759 Leonhard Euler published equations of motion for a fluid, applying Newton's second law, i.e., the product of mass and acceleration of a body equals the external forces acting on it. Euler's idea to express knowledge about fluid dynamics in the form of partial differential equations meant a major breakthrough. A practical shortcoming of his flow model, however, was that it did not consider friction forces. In 1845 George Stokes introduced general equations, which in the case of an incompressible fluid reduce to equations already proposed by Claude Navier in 1822. These general equations are now called Navier-Stokes equations. Understanding and controlling a large class of fluid flows was reduced to the mathematical solution of a few basic differential equations. Unfortunately, analytical solutions are very rarely available. Numerical solution of the Navier-Stokes equations has developed into a major industry, and a vital support to diverse areas of engineering, medicine and science1.

For further information see for example:

(The latter is an Imperial College spin-out company).

This exercise is based on a CFD application called NaSt3DGP, written by Michael Griebel, Roberto Croce, Frank Koster and Michael Meyer, based on the book Griebel/Dornseifer/Neunhoefer, Numerical Simulation in Fluid Dynamics, SIAM Philadelphia (1998). The code is freely available (although not open source), and can be downloaded from

You must abide by the licence published there. You are encouraged to visit their web site, which includes images and movies illustrating various applications of the tool.

All-out performance

This exercise can be done in groups of up to three, deadline Tuesday 4th March. Submit your work at the DoC Student General Office in a suitable labelled folder.

Basically, your job is to figure out how to run the computation specified in the next section as fast as you possibly can, and to write a brief report explaining how you did it.


  1. You can choose any hardware platform you wish. You are encouraged to find interesting and diverse machines to experiment with. The goal is high performance on your chosen platform so it is OK to choose an interesting machine even if it's not the fastest available.

  2. The numerical results reported by the application must be absolutely identical.

  3. You can achieve full marks even if you do not achieve the maximum performance.

  4. Marks are awarded for

  5. You should produce a compact report in the style of an academic paper for presentation at an international conference such as Supercomputing (www.sc2000.org). The report must not be more than 7 pages in length.

Tools and techniques

You may find it useful to find out about:

The application - getting started

Begin by making a copy of the files you will need:

cp -r /homes/phjk/ToyPrograms/Nast3dgp-2003 ./
cd Nast3dgp-2003
setenv NASTROOT `pwd`
The final line sets the environment variable NASTROOT to the current directory path. (if you use bash you will need to adjust this final line).

Next, build the application:

cd NaSt3DGP/src/
make clean
make serial
Now you can run the application. Various data files have been provided; start with a smallish one:
$NASTROOT/Scripts/RunNavcalc.sh Cavity2D-48x48x4x0.003
This command should produce (after some initial data) the following output:
Step:       1 Time: 0.000000 TimeStep: 0.00208333333333  Time:   0h  0m       0.00s
  it:       1  res: 2.753807e+00    itmax: 200
  it:      51  res: 4.851704e-02    itmax: 200
  it:     101  res: 1.143685e-03    itmax: 200
  it:     151  res: 1.100218e-05    itmax: 200
Iter:     200  Res: 1.63466965e-07     all It:      600
Mass Balance (rhs): -4.70586777e-16

Step:       2 Time: 0.002083 TimeStep: 0.00208333333333  Time:   0h  0m       1.00s
  it:       1  res: 1.907481e-02    itmax: 200
  it:      51  res: 5.579380e-04    itmax: 200
  it:     101  res: 3.942419e-06    itmax: 200
Iter:     138  Res: 5.59304312e-08     all It:     1014
Read the script ``$NASTROOT/Scripts/RunNavcalc.sh'' to see what this did. An input file ``DataFiles/Cavity2D-32x32x4-0.003/p.nav'' describes the 3d environment in which the fluid will flow, and other details of the simulation to be run. This is pre-processed by a helper program, ``NaSt3DGP/bin/navsetup'', to produce an initial state for the mesh on which the calculation is done, which is stored in ``Junk/t.bin''. The CFD solver itself, ``NaSt3DGP/bin/navcalc'', reads this, simulates the fluid flow for a specified number of timesteps, and writes the fluid state back to ``Junk/t.bin''.

You should concentrate on the Cavity problem, which is a standard CFD test example:

The fluid is in a box, whose lid is sliding to one side, dragging the top surface of the fluid sideways. Interesting fluid behaviour only begins to evolve after a fairly large number of timesteps. A variety of instances of this problem have been provided, of varying size and run-time, the smallest (above) being intended for use with simulators. In practice, problems of interest can involve $512^3$ mesh elements or more, over hundreds of timesteps.

The second problem you might consider is more visually appealing:

./Scripts/RunNavcalc.sh FlowPastObstacle3D-51x51x13x0.4
For details see

Validating your output

It is essential that any optimised program computes exactly the same result. It is usually sufficient to check that the ``residuals'' (the ``res:'' column in the output listed above) match.

Using Cachegrind

Try the following:

$NASTROOT/Scripts/RunNavcalc-cachegrind.sh Cavity2D-32x32x4-0.003
This runs the application under the CacheGrind simulator, then uses the ``cg_annotate'' tool to annotate the source code. The output can be found in the file ``SourceCodeAnnotatedByCacheGrind''.

Cachegrind annotates each line of code with the following events:

Note that D1 total accesses is given by D1mr + D1mw, and that L2 total accesses is given by I2mr + D2mr + D2mw. See the documentation (http://developer.kde.org/ sewardj/docs/manual.html) for details. You need to read the annotated listing carefully as some C++ functions are inlined (notably the array subscript operator ``[]'').

How to get started

Here are some hints:

How to finish

The main criterion for assessment is this: you should have a reasonably sensible hypothesis for how to improve performance, and you should evaluate your hypothesis in a systematic way, using experiments together, if possible, with analysis.

What to hand in

Hand in a concise report which Please do not write more than seven pages.

Paul Kelly, Imperial College, 2003


... science1
Historical background from http://www.cwi.nl/research/2001/Koren_Eng/.

next_inactive up previous