Next: Performance of blocked version
Up: Motivation: an example
Previous: Blocking (a.k.a. ``tiling'')
Now interchange so blocked loops are outermost:
for (kk = 0; kk < 500; kk += BLKSZ)
for (jj = 0; jj < 500; jj += BLKSZ)
for (i = 0; i < 500; i++)
for (k = kk; k < min(kk+BLKSZ,500); k++){
r = A[i][k];
for (j = jj; j < min(jj+BLKSZ, 500); j++)
C[i][j] += r * B[k][j];
}
- The inner i,k,j loops perform a multiplication of a pair of
partial matrices. - BLKSZ is chosen so that a
submatrix of B and a row of length BLKSZ of
C can fit in the cache. - What is the right value for BLKSZ?
Paul H J Kelly
Thu Dec 4 18:15:31 GMT 1997