332 Advanced Computer Architecture

Exercise 6 (Assessed): the Splay Tree Challenge

This is the second of two assessed coursework exercises.

You may work in groups of two or three if you wish, but your report must include an explicit statement of who did what.

Submit your work electronically via CATE.

Background

This exercise is about the same splay tree benchmark program as we studied under simulation in Exercise 4. However, this time the challenge is to make it go as fast as you can.

You are encouraged to modify the source code - up to and including using a completely different data structure.

Although the main interest is the splay tree, you are also invited to investigate performance optimisations in the random workload generation if you wish.

All-out performance

Basically, your job is to figure out how to run this program as fast as you possibly can, and to write a brief report explaining how you did it.

Rules

  1. The goal is to reduce the execution time for the entire run (not just the time per query), as follows:
    time ./splaytest.x86 3 $SIZE $SIZE $SIZE 5
    
    where $SIZE is a parameter chosen so that the runtime and memory requirement on the platform you choose is reasonable. On a PC I suggest 100000000 (100 million).

  2. You can choose any hardware platform you wish. You are encouraged to find interesting and diverse machines to experiment with. The goal is high performance on your chosen platform so it is OK to choose an interesting machine even if it's not the fastest available. On linux type ``cat /proc/cpuinfo''.

    Try the Apple G5s, ICT supercomputer resources (Itanium, Opteron) possibly PDAs, DSP processors, graphics co-processor or FPGAs. Please ask if you would like a suggestion.

  3. Make sure the machine is quiescent before doing timing experiments. Always repeat experiments for statistical significance.

  4. Choose a problem size which suits the performance of the machine you choose - the runtime must be large enough for an improvements to be evident. The really interesting problems are, of course, the long-running ones.

  5. The program prints ``number_of_matches'' when it finishes. If you modify the program, make sure that it still prints the same value as the unmodified program.

  6. You can achieve full marks even if you do not achieve the maximum performance.

  7. Marks are awarded for

  8. You should produce a compact report in the style of an academic paper for presentation at an international conference such as Supercomputing (www.sc2000.org). The report must not be more than 7 pages in length.

Changing the rules

If you want to bend any of these rules just ask.

For example:

Hints, tools and techniques

Performance analysis tools:

You may find it useful to find out about:

Compilers

You could investigate the potential benefits of more sophisticated compiler techniques:

Source code modifications

You are strongly invited to modify the source code to investigate performance optimisation opportunities.

How to finish

The main criterion for assessment is this: you should have a reasonably sensible hypothesis for how to improve performance, and you should evaluate your hypothesis in a systematic way, using experiments together, if possible, with analysis.

What to hand in

Hand in a concise report which Please do not write more than seven pages.


Paul Kelly, Imperial College, 2008


Footnotes

... machine)1
To do this you will need to build the code using a native Windows compiler.