332 Advanced Computer Architecture

Exercise 5.1. Software pipelining

In this question, assume a static-pipeline machine with the following characteristics:

Consider the following loop:
LOOP: SD   0(R1),F4     ; stores into M[i]
      ADDD F4,F0,F2     ; adds to M[i-1]
      LD   F0,-16(R1)   ; loads M[i-2]
      BNEZ R1,LOOP      ; delayed branch
      SUB  R1,R1,#8     ; executed whether branch taken or not
Using your paper sideways, draw a diagram showing the timing for the execution of the loop. You will need to look at more than one iteration.

What is the average CPI when the loop is executed many times?
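As a reminder of the quantity being asked for, one possible formulation is given below. The symbol T(n), the total number of cycles taken to execute n iterations, is introduced here for illustration and is not part of the original question. Each iteration issues the five instructions shown in the loop above, so start-up and drain cycles stop mattering as n grows:

```latex
\mathrm{CPI}_{\mathrm{avg}} \;=\; \lim_{n \to \infty} \frac{T(n)}{5\,n}
```

In practice this means reading the steady-state cycle count per iteration off the timing diagram and dividing by five.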

Explain what this loop does, and describe how it is likely to be used to perform a straightforward useful function. Show any initialisation needed. What is going on? How does the performance of this loop compare with a straightforward implementation?
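To experiment with the loop over several iterations, a small trace harness can help. The following is a minimal sketch, not a model of the pipeline: it mirrors the five instructions one for one in program order and ignores timing entirely. The function name `run_loop`, the dictionary-based memory keyed by byte address, and the choice that reads from unmapped addresses return 0.0 are all assumptions made for illustration.

```python
def run_loop(mem, r1, f0, f2, f4):
    """Run the five-instruction loop until the delayed branch falls through.

    mem is a dict from byte address to float (an assumed memory model);
    r1, f0, f2, f4 are the initial register values.
    """
    mem = dict(mem)          # work on a copy so the caller's memory is untouched
    trace = []               # register state recorded after each iteration
    while True:
        mem[r1] = f4                   # SD   0(R1),F4
        f4 = f0 + f2                   # ADDD F4,F0,F2   (uses F0 from before the LD)
        f0 = mem.get(r1 - 16, 0.0)     # LD   F0,-16(R1) (unmapped read gives 0.0: assumption)
        taken = (r1 != 0)              # BNEZ R1,LOOP    (tests R1 before the delay slot)
        r1 = r1 - 8                    # SUB  R1,R1,#8   (delay slot, always executed)
        trace.append((r1, f0, f4))
        if not taken:
            break
    return mem, r1, f0, f4, trace


if __name__ == "__main__":
    # Hypothetical initial state: five doubles at byte addresses 0..32.
    initial = {0: 1.0, 8: 2.0, 16: 3.0, 24: 4.0, 32: 5.0}
    final_mem, r1, f0, f4, trace = run_loop(initial, r1=32, f0=10.0, f2=100.0, f4=20.0)
    for address in sorted(final_mem):
        print(address, initial[address], "->", final_mem[address])
```

Comparing the memory contents before and after the run, and varying the initial values of F0 and F4, is one way to see which locations depend on the initialisation and which do not.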

Paul Kelly, Imperial College, 2000
