419 Compilers II

Tutorial exercise 1

If you did course 332, Parallel Architectures, you should remember the examples in Question 1 -- look over them quickly and move onto Question 2.

Make sure that you have a look at Question 3.

Question 1: Basic dependence analysis

For each of the following loops, draw the dependence graph, and if possible write down a vectorised version with the same input-output behaviour:

  1. for i := 1 to N do
      A[i+1] := A[i] + 1;
  2. for i := 1 to 10 do
      A[10*i] := A[11*i]
  3. for i := 1 to N do
      A[i] := B[i];
      C[i] := A[i] + B[i];
      E[i] := C[i+1];

Question 2: Wavefronts

This question is intended to prepare for the section of the course on using unimodular matrices to capture loop transformations. It concerns the following example:

for i1 := 0 to 3 do
  for i2 := 0 to 3 do
    A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
    1. Draw the iteration space for this loop and mark on it the dependences.
    2. Write down the dependence direction vector for each dependence.
    3. Write down the dependence distance vector for each dependence. Recall that the dependence distance is the number of iterations which separate a pair of references to the same location, while the dependence directions is its sign.
    4. Can this loop nest be interchanged?
    5. Can this loop be vectorised or parallelised?
    6. Can this loop be tiled (i.e. executed block-wise to improve cache behaviour)?
  1. Consider this loop:
    for k1 := 0 to 3 do
      for k2 := k1 to k1+3 do
        A[k1,k2-k1] := A[k1-1,k2-k1] + A[k1,k2-k1-1]
    1. Draw the iteration space for this loop and mark on it the dependences.
    2. Write down the dependence direction vector for each dependence.
    3. This loop has the same effect as the earlier loop, but has been skewed. What does this mean -- what is the relationship between the two loops?
    4. Can this loop nest be interchanged?
    5. Can this loop be vectorised or parallelised?
    6. Can this loop be tiled (i.e. executed block-wise to improve cache behaviour)?

Question 3: Less basic dependence analysis

For each of the following loops, draw the dependence graph, and if possible write down a vectorised version with the same input-output behaviour:

  1. for i := 1 to 10 do
      A[i*2-1] := A[i*2-N*4]
    where N is not known at compile-time.
  2. for i := 1 to 10 do
      for j := 1 to 11 do
        A[i*10+j] := A[i*10+j-1]
  3. for i := 1 to 100 do
      for j := 1 to i-1 do
        A[i*100+j] := A[j*100+i]
    (Note that this loop is ``triangular'').

Paul Kelly, Imperial College, February 1997.

419 Compilers II Tutorial 1

Sample Solutions

Question 2: Wavefronts

This question is intended to prepare for the section of the course on using unimodular matrices to capture loop transformations. It concerns the following example:

for i1 := 0 to 3 do
  for i2 := 0 to 3 do
    A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
    1. Draw the iteration space for this loop and mark on it the dependences.

      tabular42

    2. Write down the dependence direction vector for each dependence.
      • S tex2html_wrap_inline324 S due to the first reference to A.
      • S tex2html_wrap_inline326 S due to the second reference to A.
    3. Write down the dependence distance vector for each dependence. Recall that the dependence distance is the number of iterations which separate a pair of references to the same location, while the dependence directions is its sign.
      • (1,0) for the first reference to A.
      • (0,1) for the second reference to A.
    4. Can this loop nest be interchanged?

      Interchange is clearly valid since the execution order of the interchanged loop nest would respect the dependences of the original loop.

    5. Can this loop be vectorised or parallelised?

      There is a loop carried data dependence in the inner loop which blocks vectorisation. Similarly, the loop-carried data dependence in the outer loop prevents parallel execution of i1 iterations. Interchanging the loop nest does not help.

    6. Can this loop be tiled (i.e. executed block-wise to improve cache behaviour)?

      Yes. To see this consider larger loop limits, strip-mine then interchange:

      for i1 := 0 to 29 do
        for i2 := 0 to 29 do
          A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
      for ii1 := 0 to 29 step 10 do
        for i1 := ii1 to ii1+10 do
          for ii2 := 0 to 29 step 10 do
            for i2 := ii2 to ii2+10 do
              A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
      for ii1 := 0 to 29 step 10 do
        for ii2 := 0 to 29 step 10 do
          for i1 := ii1 to ii1+10 do
            for i2 := ii2 to ii2+10 do
              A[i1,i2] := A[i1-1,i2] + A[i1,i2-1]
      To see how this works, think about the order in which the tex2html_wrap_inline328 iteration space is traversed.
  1. Consider this loop:
    for k1 := 0 to 3 do
      for k2 := k1 to k1+3 do
        A[k1,k2-k1] := A[k1-1,k2-k1] + A[k1,k2-k1-1]
    Solutions for the remaining questions are not available but should become apparent from the lecture notes. See also Banerjee's paper ``Unimodular Transformations of Double Loops''.

Paul Kelly, Imperial College, February 1997.



Paul H J Kelly Thu Feb 6 22:20:09 GMT 1997