Parallelism is exploited at three levels: pipelined arithmetic, multiple
floating-point units (FPUs) and multiple processors (CPUs):
Level                   Limiting factor       Degree of parallelism
Arithmetic pipelining   FP pipeline depth                         5
Multiple FPUs           Control complexity                        3
Multiple CPUs           Interconnection                          16
Total parallelism                                               240
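The total is the product of the three levels, assuming every CPU has its own FPUs and every FPU its own pipeline, so that the levels multiply rather than add. Writing P for the degree of parallelism at each level (symbols introduced here for illustration only):

```latex
P_{\text{total}} = P_{\text{pipe}} \times P_{\text{FPU}} \times P_{\text{CPU}}
                 = 5 \times 3 \times 16 = 240
```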
Parallelism in the memory system is also needed to sustain this performance
on applications whose data do not fit in cache.
For peak performance we must write programs that are optimal at all three
levels, but sub-optimal programs can still run at a large fraction of peak.
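As a minimal sketch (not from the original slides), here is one way a simple kernel can expose parallelism at all three levels at once. The function name `dot`, the choice of four independent accumulators and the use of OpenMP for the CPU level are all assumptions for illustration; the number of accumulators that actually keeps a given machine's pipelines and FPUs busy is machine-dependent.

```c
#include <stddef.h>

/* Hypothetical example: dot product of x and y (length n) written so that
   independent operations are available at every level of parallelism. */
double dot(const double *x, const double *y, size_t n)
{
    /* Four independent partial sums (an assumed, illustrative count). */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    long m = (long)(n / 4);            /* number of complete groups of four */

    /* Multiple CPUs: groups are shared among threads, each thread keeps
       private partial sums that OpenMP adds together at the end.
       Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:s0,s1,s2,s3)
    for (long k = 0; k < m; k++) {
        size_t i = 4 * (size_t)k;
        /* Pipelining and multiple FPUs: these four multiply-adds do not
           depend on each other, so they can overlap in the FP pipeline
           and be issued to different FPUs. */
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Leftover elements when n is not a multiple of four. */
    for (size_t i = 4 * (size_t)m; i < n; i++)
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}
```

With GCC, compiling with `-fopenmp` enables the thread-level split. Note that splitting the sum into partial sums changes the order of floating-point additions, so results may differ from a strictly sequential loop in the last bits.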
The moral:
High performance computing relies on achieving the
maximum cost-effective advantage at every level.