This is the second of two equally-weighted assessed coursework exercises.
You may work in groups of two or three if you wish, but your report must include an explicit statement of who did what.
Submit your work electronically as a PDF file via the CATE system, which will indicate the deadline for this exercise.
This exercise uses the same Sum of Absolute Differences (SAD) benchmark that we studied under simulation in the first assessed exercise. This time, however, the challenge is to make it run as fast as you can. You are encouraged to modify the source code, up to and including using different algorithms and data structures.
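As a reminder of what the benchmark measures, the following is a minimal sketch of a sum of absolute differences over one pair of pixel blocks. It is not the Parboil code: the function name `sad_block`, the 8-bit pixel type, the row-major layout and the 16x16 block size are all assumptions made for illustration, so check the benchmark source for the real data types and block sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 16 /* assumed block edge; not taken from the benchmark source */

/* Sum of absolute differences between two BLOCK x BLOCK pixel blocks.
 * cur and ref point at the top-left pixel of each block; stride is the
 * number of pixels per image row (assumed row-major layout). */
unsigned sad_block(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    assert(stride >= BLOCK);
    unsigned sum = 0;
    for (size_t y = 0; y < BLOCK; y++) {
        for (size_t x = 0; x < BLOCK; x++) {
            int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            sum += (unsigned)abs(d);
        }
    }
    return sum;
}
```

A straightforward loop nest like this is a useful reference point: any faster version you write should produce the same sums.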
Copy the benchmark code to your own directory:
prompt> mkdir /path/to/your/dir
prompt> cd !$
prompt> cp -r /homes/phjk/ToyPrograms/ACA09/SAD ./
prompt> cd SAD && ls
Make the Parboil suite's test harness:
prompt> cd common/src
prompt> make PARBOIL_ROOT=/absolute/path/to/your/dir
prompt> cd ../..
Compile and run the fast CPU version cpu:
prompt> ./parboil run sad cpu default
Parboil parallel benchmark suite, version 0.1
IO: 0.375603
GPU: 0.000000
Copy: 0.000000
Compute: 0.061069
Pass
You can start by copying this CPU version and modifying the copy, e.g.:
prompt> cp -r benchmarks/sad/src/cpu benchmarks/sad/src/mycpu
Compile and run mycpu similarly:
prompt> ./parboil run sad mycpu default
The Parboil test harness should tell you if your output does not match the reference output.
default is the default data set of input image frames. You may wish to scale the default data set for evaluating your version. For example, to add a scaled data set of frames, type:
prompt> ./scripts/add_dataset 64 32
Each parameter must be an integral multiple of 16. This will create subdirectories under benchmarks/sad, placing the scaled input frames into input/64x32 and the reference output (from running the cpu version) into a separate subdirectory.
To remove this data set, type:
prompt> ./scripts/rm_dataset 64 32
Basically, your job is to figure out how to run this program as fast as you possibly can, and to write a brief report explaining how you did it.
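When reporting speedups it helps to time each version several times and quote the best (or median) run, since a single measurement can be disturbed by other activity on the machine. The sketch below shows one way to do this in standard C. The names `best_cpu_seconds` and `count_call` are hypothetical and are not part of Parboil, which prints its own timings as shown earlier. Note also that `clock()` measures processor time, so for a multi-threaded version you would want a wall-clock timer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Run work(arg) reps times and return the shortest processor time
 * observed, in seconds. Taking the minimum filters out runs that were
 * slowed down by unrelated system activity. */
double best_cpu_seconds(void (*work)(void *), void *arg, int reps)
{
    assert(work != NULL && reps > 0);
    double best = -1.0;
    for (int i = 0; i < reps; i++) {
        clock_t start = clock();
        work(arg);
        clock_t stop = clock();
        double elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical stand-in workload: counts how many times it was called.
 * Replace it with a wrapper around the kernel you want to measure. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

In your report, state which timer you used, how many repetitions you ran, and which data set the numbers refer to.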
Try the Apple G5s, ICT supercomputer resources (Itaniums, Opterons), graphics co-processors (NVIDIA, ATI), PDAs, DSP processors, or FPGAs. Please ask if you would like a suggestion.
If you want to bend any of these rules, just ask.
You are strongly invited to modify the source code to investigate performance optimisation opportunities.
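One kind of opportunity worth looking for is the reuse of partial results. As an illustration only, the sketch below builds the SAD of a 16x16 block from the SADs of its four 8x8 quadrants, so that both block sizes are obtained from a single pass over the pixels. The function names, the 8-bit pixel type and the row-major layout are assumptions; whether the benchmark code already does something similar is for you to check in the source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD over an n x n block stored row-major with `stride` pixels per row. */
static unsigned sad_nxn(const uint8_t *cur, const uint8_t *ref,
                        size_t stride, size_t n)
{
    unsigned sum = 0;
    for (size_t y = 0; y < n; y++)
        for (size_t x = 0; x < n; x++)
            sum += (unsigned)abs((int)cur[y * stride + x] -
                                 (int)ref[y * stride + x]);
    return sum;
}

/* Store the four 8x8 quadrant SADs of a 16x16 block in quad[0..3]
 * (top-left, top-right, bottom-left, bottom-right) and return their
 * sum, which equals the SAD of the whole 16x16 block. */
unsigned sad_16_from_8(const uint8_t *cur, const uint8_t *ref,
                       size_t stride, unsigned quad[4])
{
    assert(stride >= 16);
    unsigned total = 0;
    for (size_t q = 0; q < 4; q++) {
        /* Offset of quadrant q: 8 rows down for q >= 2, 8 columns right for odd q. */
        size_t off = (q / 2) * 8 * stride + (q % 2) * 8;
        quad[q] = sad_nxn(cur + off, ref + off, stride, 8);
        total += quad[q];
    }
    return total;
}
```

The same idea extends to other block sizes: each pixel difference is computed once, and larger sums are assembled from smaller ones.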
Paul H.J. Kelly & Anton Lokhmotov, Imperial College London, 2009