Performance
The initial version runs in 10.3 seconds.
The matrix multiplication takes $512^3$ steps, each involving two floating-point operations, an add and a multiply, i.e. $2 \times 512^3 = 268{,}435{,}456 \approx 268 \times 10^6$ floating-point operations in total.
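To make the operation count concrete, here is a minimal sketch of a naive triple-loop multiply that counts its own floating-point operations. It assumes the initial version is the textbook i-j-k loop over square matrices; the function name `matmul_naive` is hypothetical and not taken from the original program. Each innermost iteration does one multiply and one add, so an $n \times n$ product performs $n^3$ iterations and $2n^3$ operations.

```python
# Hypothetical naive i-j-k matrix multiply C = A * B for n x n matrices,
# instrumented to count floating-point operations (one multiply + one add
# per innermost iteration, so 2 * n**3 in total).

def matmul_naive(a, b):
    """Return (C, flops) where C = A*B and flops counts multiplies and adds."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]  # one multiply, one add
                flops += 2
            c[i][j] = s
    return c, flops


if __name__ == "__main__":
    n = 8  # small size so pure Python finishes instantly
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c, flops = matmul_naive(a, b)
    print(flops, 2 * n**3)  # 1024 1024
    print(2 * 512**3)       # 268435456 for the n = 512 case in the text
```

Pure Python is far too slow to time the $n = 512$ case meaningfully; the sketch only checks the counting argument, not the performance figures.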
This loop achieves a computation rate of $2.684 \times 10^8 / 10.3\,\mathrm{s} \approx 26.1$ MFLOP/s.
That is, on average only one floating-point operation completes every 30 clock cycles.
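The rate arithmetic can be sketched as two small helpers. The run time of 10.3 s and the 30 cycles per operation come from the text; the clock frequency of the machine is not stated in this section, so the sketch only derives the clock that those two figures imply (about 782 MHz), and that derived number should be read as an inference, not a measured specification.

```python
# Sketch of the MFLOP/s and cycles-per-flop arithmetic.
# Only the 10.3 s run time and the "30 cycles" figure come from the text;
# the implied clock rate below is derived from them, not a known spec.

def mflops(flop_count, seconds):
    """Sustained rate in MFLOP/s (millions of floating-point operations per second)."""
    return flop_count / seconds / 1e6


def cycles_per_flop(clock_hz, flop_count, seconds):
    """Average clock cycles elapsed per completed floating-point operation."""
    return clock_hz * seconds / flop_count


if __name__ == "__main__":
    n = 512
    flops = 2 * n**3  # 268,435,456 operations
    t = 10.3          # measured run time in seconds (from the text)
    print(f"{mflops(flops, t):.1f} MFLOP/s")  # 26.1 MFLOP/s

    # Clock rate implied by "one operation every 30 cycles" (derived, hypothetical).
    implied_clock_hz = 30 * flops / t
    print(f"{implied_clock_hz / 1e6:.0f} MHz")  # 782 MHz
    print(f"{cycles_per_flop(implied_clock_hz, flops, t):.1f} cycles/flop")  # 30.0
```

Expressing performance as cycles per operation, rather than raw MFLOP/s, makes it easy to compare against the processor's peak of roughly one or more operations per cycle.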
How are we going to get value for money?