Next: Performance of blocked version
Up: Motivation: an example
Previous: Blocking (a.k.a. ``tiling'')
Now interchange so blocked loops are outermost:
for (kk = 0; kk < 512; kk += BLKSZ)
for (jj = 0; jj < 512; jj += BLKSZ)
for (i = 0; i < 512; i++)
for (k = kk; k < min(kk+BLKSZ,512); k++){
r = A[i][k];
for (j = jj; j < min(jj+BLKSZ, 512); j++)
C[i][j] += r * B[k][j];
}
- The inner i,k,j loops perform a multiplication of a pair of
partial matrices.
- BLKSZ is chosen so that a
submatrix of B and a row of length BLKSZ of C can fit in the cache.
- What is the right value for BLKSZ?