On-line lecture notes etc for similar/related courses
Information on particular computer systems
Research projects and papers
The most important tool for studying issues in computer architecture is
a simulator. It's boring to write one's own, but fortunately there
are some good ones freely available on the web. Here's a selection:
- SIMICS - whole-system simulator.
- The SimOS project at Stanford.
A timing-accurate simulator for SGI multiprocessors running the standard
operating system, including modelling of operating system, disk, network and
- The SimpleScalar toolset
A detailed timing simulator for a CPU based on DLX, incorporating multiple
issue, dynamic instruction scheduling, speculative execution,
branch prediction, and a sophisticated multi-level memory hierarchy.
All these architecture details can be controlled for simulation
experiments evaluating their importance. Includes compiler and debugger.
- The Benchmark Gateway
- SPEC publicly available
figures for SPEC95 CPU benchmarks (as featured in H&P)
plus some fileserver and graphics benchmarks. Lots of
useful background as well.
A commercial company specialising in benchmarks
involving a mix of typical jobs. Some publicly available
results but very little on methodology.
- McCalpin's STREAM benchmark
Measures memory system throughput. Although a very artificial
benchmark, this exposes an important performance factor
which is not properly reflected by SPEC figures (esp. SPECInt).
The figures show that supercomputers really are in a
different league, but because of vector startup delays
they should be taken with an even bigger pinch of salt.
- Jack Dongarra's LINPACK performance collection.
This is based on solving a large dense system
of linear equations using Gaussian elimination.
There are three sets of figures: execution time
of a given Fortran program solving a small, 100x100
matrix; execution time, with as much hacking as
necessary, on a 1000x1000 problem; and performance
in MFlops on any chosen problem size. For the last of
these, the problem size needed to achieve even half of
this figure must also be reported.
LINPACK is quite good fun but is really only a guide
to peak performance; real applications have richer
data structures with more pointers and more
- The Transaction Processing Performance Council
Collection of results for large database systems. Interesting
because this is an important market which motivates vendors
strongly, but where I/O system and memory throughput are
the real issues, and SPEC CPU benchmarks don't predict much.
The benchmarking methodology is also worth studying.
Note, in particular, how the problem size is scaled
to match the performance of the system under test.
After all, it's not very interesting to be able to handle
100,000 cashpoint machine transactions a
second if you only maintain data on 1,000 customers.
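To make the LINPACK entry above a little more concrete, here is a minimal sketch of the kind of measurement it describes: solve a dense system by Gaussian elimination with partial pivoting, time it, and convert the time to MFlops using the nominal operation count 2n^3/3 + 2n^2. This is not the LINPACK code itself; the function names (`solve_dense`, `linpack_flops`, `measure_mflops`) are invented for this example, and a pure-Python loop mostly measures interpreter overhead, so the numbers it produces say nothing about the hardware's peak.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows (each a list of n floats), b a list of n floats.
    Both are overwritten. Returns the solution as a list of floats.
    """
    n = len(a)
    for k in range(n):
        # Partial pivoting: move the largest entry in column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        row_k = a[k]
        for i in range(k + 1, n):
            row_i = a[i]
            m = row_i[k] / row_k[k]
            for j in range(k + 1, n):
                row_i[j] -= m * row_k[j]
            b[i] -= m * b[k]
    # Back substitution on the upper triangle.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flops(n):
    """Nominal floating-point operation count for an n x n solve."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def measure_mflops(n, seed=0):
    """Time one n x n solve; return (MFlops, max error in the solution)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    err = max(abs(xi - 1.0) for xi in x)
    return linpack_flops(n) / elapsed / 1e6, err


if __name__ == "__main__":
    for n in (50, 100, 200):
        rate, err = measure_mflops(n)
        print(f"n={n:4d}  {rate:8.2f} MFlops  max error {err:.2e}")
```

Running it for several values of n gives a rough feel for how the reported rate depends on problem size, which is the point of asking how big a problem is needed to reach half of the best figure.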
Microprocessor Report's speculations on the Intel/HP IA-64 (Merced)
architecture, based on recent Intel patents.
Intel's tour of the Pentium Pro/Pentium II processor
architecture. The Pentium II and Pentium Pro
share the same processor core. The course covers
almost all of its key features, including
branch prediction, dynamic instruction scheduling,
multiple issue, speculative execution, memory
disambiguation, multiple levels of cache and
cache coherency protocols.
This is a Pentium-compatible processor designed for
low cost and low power. It basically uses the simple
static pipeline approach covered at the very start
of the course. The argument is that a big fraction of
execution time is actually dominated by off-chip accesses
so a very simple processor core can compete with
Intel, AMD and Cyrix's sophisticated, dynamic and speculative
instruction scheduling - provided the spare transistors
are used to improve average memory access time (AMAT).
Almost 4/5 of the transistors are devoted to cache.
Since the cache SRAM is extremely dense, the chip area
(which determines cost) is very small compared with
the competition even though the transistor count is
See also Byte, October 1997, p. 51.
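The AMAT argument in the entry above can be made concrete with the usual textbook formula, AMAT = hit time + miss rate x miss penalty. The sketch below is only an illustration: the function names are invented for the example, and every number in it is a made-up assumption, not a figure for any real processor.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, memory_penalty):
    """AMAT with a second-level cache: an L1 miss costs the AMAT of the L2."""
    return amat(l1_hit, l1_miss_rate, amat(l2_hit, l2_miss_rate, memory_penalty))


if __name__ == "__main__":
    # Hypothetical numbers, in processor cycles, chosen only for illustration.
    small_cache = amat(hit_time=1, miss_rate=0.05, miss_penalty=50)
    large_cache = amat(hit_time=1, miss_rate=0.02, miss_penalty=50)
    print("small on-chip cache:", small_cache, "cycles per access")
    print("large on-chip cache:", large_cache, "cycles per access")
    print("with a second level:", amat_two_level(1, 0.05, 8, 0.2, 50))
```

With a large miss penalty, even a modest drop in miss rate cuts the average access time substantially, which is why spending spare transistors on cache rather than on scheduling logic can pay off.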
HP-PA 8000 Architecture presentation at Hot Chips
96. In particular note the vast effort devoted
to speculative execution (850K transistors).
Article on the HP 8000 processor architecture.
This is another example of a
4-way superscalar with dynamic instruction
scheduling (similar to the MIPS R10000 and
the PowerPC 620). It's unusual in
having no on-chip cache - the article
explains this interesting decision.
Articles on the HP 7100 processor architecture
In particular, the instruction set extensions
aimed at supporting multimedia applications.
Great Microprocessors of the Past and Present
a potted history of microprocessors and their design
since the beginning.
Brief Introduction to Alpha Systems and Processors
by Neal Crook, Digital Equipment.