This is the first of two equally-weighted assessed coursework
exercises. Working individually, work through the exercise and write
up a short report presenting and explaining your results. Submit the
pdf electronically via CATE (https://sparrow.doc.ic.ac.uk/~
cate/).
The CATE system will also indicate the deadline for the exercise.
Carl Gustav Jacobi (1804-1851, see
http://www-groups.dcs.st-and.ac.uk/~
history/Mathematicians/Jacobi.html)
gives his name to a particularly simple algorithm which is widely used
in solving partial differential equations. For an example, see
http://www.psc.edu/training/Drexel/Code_Development/Code_Development.html.
The idea is very simple indeed; here is a C implementation:
typedef double gridtype[MAXiDIM][MAXjDIM]; gridtype grid1; gridtype grid2; int it,i,j; double onequarter = 1.0/4.0; InitGrid(grid1, iDim, jDim); InitGrid(grid2, iDim, jDim); for (it=0; it<Niters; it+=1) { for (i=1; i<iDim-1; ++i) for (j=1; j<jDim-1; ++j) grid2[i][j] = onequarter * (grid1[i-1][j] + grid1[i+1][j] + grid1[i][j-1] + grid1[i][j+1]); it++; if (it >= Niters) break; for (i=1; i<iDim-1; ++i) for (j=1; j<jDim-1; ++j) grid1[i][j] = onequarter * (grid2[i-1][j] + grid2[i+1][j] + grid2[i][j-1] + grid2[i][j+1]); } #ifdef PRINT_GRID /* the grid to print depends on whether Niters was odd or even */ PrintGrid(Niters%2==1 ? grid2 : grid1, iDim, jDim); #endifAs you can see, the loop simply smooths the initial grid by replacing each element grid[i][j] with the average of its four neighbours, north, east, west and south. This is often called a ``four-point stencil''. Note that:
You can find the source code in
/homes/phjk/ToyPrograms/C/Jacobi2d-2007/Jacobi2d.cCopy this file to your own directory1. You should compile the application for the x86 platform using the following commands:
x=5000 gcc -O3 -DMAXiDIM=$x -DMAXjDIM=$x -o Jacobi2d.x86 Jacobi2d.cwhere
$x
is a shell variable set to the array size to be used. Good values lie in the range
200-1000 for the simulator, and up to about 7000 for native execution (if you use ``csh'' or ``tcsh'' instead of the ``bash'' shell
you set $x
with set x=200
).
Now run the application as follows:
./Jacobi2d.x86 $x $x 5Compile the application for the Simplscalar simulator using the following command:
x=200 ~phjk/simplescalar/bin/gcc -O3 -DMAXiDIM=$x -DMAXjDIM=$x -o Jacobi2d Jacobi2d.cNow run the application as follows:
~phjk/simplescalar/simplesim-2.0/sim-outorder ./Jacobi2d $x $x 2Here the value 2 is the number of iterations of the Jacobi sweep to be repeated. Choose this value to keep the simulation time (or execution time) under control while ensuring the simulated machine does a significant amount of work - 2 is the minimum sensible value.
Your first job is to study how the performance varies with array size.
Note that you need to recompile the application for each problem size so that the arrays are declared with the right dimensions.
Try to make sure no other processes are active as they will interfere with your results. How many times do you need to repeat your experiments to get a reliable result?
Your next job is to study the effect of various architectural features on the performance of the Jacobi2d application running under the SimpleScalar simulator:
To do this you need to write a shell script. You might find the following Bash script useful:
#!/bin/sh -f SIZE=200 ROOT=/homes/phjk/ToyPrograms/C/Jacobi2d-2007 for x in 2 4 8 16 32 64 128 256 do echo -n $x "\t" ~phjk/simplescalar/simplesim-2.0/sim-outorder -ruu:size $x \ $ROOT/simplescalar/Jacobi2d-512-O3 $SIZE $SIZE 2 \ 2>&1 | grep sim_IPC doneThe output is delivered to the file ``results''. You can find this script in
~phjk/ToyPrograms/C/Jacobi2d-2007/scripts/Jacobi2d-ss-ruu-script.shSee also ``Jacobi2d-ss-vary.sh''.
Try using the gnuplot program. Run the script above, and save the output in a file ``table''. Type ``gnuplot''. Then, at its prompt type:
set logscale x 2 plot [][0:2] 'table2' using 1:3 with linespointsTo save the plot as a postscript file, try:
set term postscript eps set output "psfile.ps" plot [][0:2] 'table2' using 1:3 with linespointsTry ``help postscript'', ``help plot'' etc for further details.
Compile the application using the ``-S'' flag.
~phjk/simplescalar/bin/gcc -O3 -S -DMAXiDIM=$x -DMAXjDIM=$x Jacobi2d.cThe assembler code generated by the compiler is delivered in ``Jacobi2d.s''.
The compiler accepts many parameters to control optimisation. Try ``man gcc'' to read about them. For example, ``-funroll-loops'' encourages the compiler to unroll loops:
~phjk/simplescalar/bin/gcc -O3 -funroll-loops -DMAXiDIM=$x -DMAXjDIM=$x -o Jacobi2d Jacobi2d.cDoes this improve the performance of the simulated machine? What happens to the IPC?
Paul Kelly, Imperial College London, 2007