If you did course 332, Parallel Architectures, you should remember the examples in Question 1 -- look over them quickly and move onto Question 2.
Make sure that you have a look at Question 3.
For each of the following loops, draw the dependence graph, and if possible write down a vectorised version with the same input-output behaviour:
for i := 1 to N do A[i+1] := A[i] + 1;
for i := 1 to 10 do A[10*i] := A[11*i]
for i := 1 to N do A[i] := B[i]; C[i] := A[i] + B[i]; E[i] := C[i+1];
This question is intended to prepare for the section of the course on using unimodular matrices to capture loop transformations. It concerns the following example:
for i1 := 0 to 3 do for i2 := 0 to 3 do A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
for k1 := 0 to 3 do for k2 := k1 to k1+3 do A[k1,k2-k1] := A[k1-1,k2-k1] + A[k1,k2-k1-1]
For each of the following loops, draw the dependence graph, and if possible write down a vectorised version with the same input-output behaviour:
for i := 1 to 10 do A[i*2-1] := A[i*2-N*4]where N is not known at compile-time.
for i := 1 to 10 do for j := 1 to 11 do A[i*10+j] := A[i*10+j-1]
for i := 1 to 100 do for j := 1 to i-1 do A[i*100+j] := A[j*100+i](Note that this loop is ``triangular'').
Paul Kelly, Imperial College, February 1997.
This question is intended to prepare for the section of the course on using unimodular matrices to capture loop transformations. It concerns the following example:
for i1 := 0 to 3 do for i2 := 0 to 3 do A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
Interchange is clearly valid since the execution order of the interchanged loop nest would respect the dependences of the original loop.
There is a loop carried data dependence in the inner loop which blocks vectorisation. Similarly, the loop-carried data dependence in the outer loop prevents parallel execution of i1 iterations. Interchanging the loop nest does not help.
Yes. To see this consider larger loop limits, strip-mine then interchange:
for i1 := 0 to 29 do
for i2 := 0 to 29 do
A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
for ii1 := 0 to 29 step 10 do
for i1 := ii1 to ii1+10 do
for ii2 := 0 to 29 step 10 do
for i2 := ii2 to ii2+10 do
A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
for ii1 := 0 to 29 step 10 do
for ii2 := 0 to 29 step 10 do
for i1 := ii1 to ii1+10 do
for i2 := ii2 to ii2+10 do
A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
To see how this works, think about the order in which the iteration space is traversed.
for k1 := 0 to 3 do for k2 := k1 to k1+3 do A[k1,k2-k1] := A[k1-1,k2-k1] + A[k1,k2-k1-1]Solutions for the remaining questions are not available but should become apparent from the lecture notes. See also Banerjee's paper ``Unimodular Transformations of Double Loops''.
Paul Kelly, Imperial College, February 1997.