This is the second of two equally-weighted assessed coursework exercises. You may work in groups of two or three if you wish, but your report must include an explicit statement of who did what. Submit your work in a pdf file electronically via CATE.1 The CATE system will also indicate the deadline for this exercise.
This exercise concerns the same 3D stencil problem that we studied in simulation in Exercise 4. The goal this time is to achieve high performance on real hardware.
Copy the benchmark code to your own directory, e.g.
prompt> mkdir /homes/yourid/ACA12 (you will probably have already done this) prompt> cd /homes/yourid/ACA11 prompt> cp -r /homes/phjk/ToyPrograms/ACA12/StencilChallenge ./
(The ./ above is the destination of the copy - your current working directory).
List the contents of the benchmark directory:
prompt> cd StencilChallenge prompt> ls common.h main.test.c probe_heat_blocked.c probe_heat_timeskew.c cycle.h Makefile probe_heat.c run.h find_best_block.sh min.pl probe_heat_circqueue.c util.c main.c probe probe_heat_oblivious.c util.h
prompt> make clean; make probe ; ./probe 602 602 602 0 0 0 5This builds the most straightforward implementation of the program. You may wish to explore some of the optimised versions:
prompt> make clean; make circqueue_probe ; ./probe 602 602 602 75 75 75 5 prompt> make clean ; make oblivious_probe ; ./probe 602 602 602 0 0 0 5The parameters of the probe program are as follows:
./probe <gridx> <gridy> <gridz> <tilex> <tiley> <tilez> <iterations>In the simplest version, the tile size parameters are ignored. The more sophisticated versions use them in various ways, and they may have to divide into the grid size (-2) exactly.
Basically, your job is to figure out how to run this program as fast as you possibly can, and to write a brief report explaining how you did it.
You are encouraged to look at graphics processors, mobile devices, etc. If in doubt please ask.
Changing the rules
If you want to bend any of these rules just ask.
Hints, tools and techniques
Performance analysis tools: You may find it useful to find out about:
Compilers You could investigate the potential benefits of using more sophisticated compilers:
Source code modifications
You are strongly invited to modify the source code to investigate performance optimisation opportunities.
How to finish The main criterion for assessment is this: you should have a reasonably sensible hypothesis for how to improve performance, and you should evaluate your hypothesis in a systematic way, using experiments together, if possible, with analysis.
What to hand in Hand in a concise report which
Please do not write more than seven pages.
The benchmark code we are using comes from this website:
http://www.cs.berkeley.edu/~skamil/projects/stencilprobe/It was developed by Shoaib Kamil at Berkeley. Some small changes have been made for the purposes of this exercise, notably the addition of the checksum correctness check.
Paul H.J. Kelly, Imperial College London, 2012