for (i = 0; i < 512; i++)
    for (k = 0; k < 512; k++) {
        r = A[i][k];
        for (j = 0; j < 512; j++)
            C[i][j] += r * B[k][j];
    }
- Execution time: 5.2 seconds (51.6 MFLOPS).
- Why is this such a good idea?
- How might a compiler perform this transformation?
- Can we do better still?
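To experiment with the first question, the sketch below puts the interchanged loop next to a baseline for comparison. It assumes the untransformed version used the conventional i, j, k order, which is not shown here; the function names `matmul_ijk` and `matmul_ikj` and the macro `N` are illustrative choices, not part of the original material. Because C stores arrays row by row, the inner loop of the i, k, j version walks along rows of both B and C, while the inner loop of the i, j, k version steps down a column of B.

```c
#include <stddef.h>

#ifndef N
#define N 512   /* matrix dimension, matching the loop bounds above */
#endif

/* Assumed baseline (i, j, k) order: the inner loop reads B[k][j] with k
 * varying, so successive accesses are N doubles apart in memory. */
void matmul_ijk(double C[N][N], double A[N][N], double B[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = C[i][j];
            for (size_t k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

/* Interchanged (i, k, j) order, as in the loop above: A[i][k] is held in
 * a scalar, and the inner loop reads B[k][j] and updates C[i][j] with j
 * varying, so successive accesses are adjacent in memory. */
void matmul_ikj(double C[N][N], double A[N][N], double B[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t k = 0; k < N; k++) {
            double r = A[i][k];
            for (size_t j = 0; j < N; j++)
                C[i][j] += r * B[k][j];
        }
}
```

Both functions accumulate into C, so C should be zeroed before each call. Wrapping each call in `clock()` from `<time.h>` gives a rough comparison; the timings will depend on the machine and compiler flags and need not match the figure quoted above.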