332 Advanced Computer Architecture

Unassessed tutorial exercise 2

Exercise 2.1. Instruction set issues in pipeline design

During the gruelling interview for the post of Deputy Technical President, IC Inside $^{\rm TM}$ Computers Incorporated, you have been asked to investigate a new addressing mode for a DLX-like architecture. The idea is to allow one of the operands of an ALU instruction to be in memory. To offset this increase in complexity, you restrict all memory addressing (in Load, Store and ALU instructions) to be register-indirect only (i.e. no displacement addressing).

For example, suppose R1 is an index into an array starting at address 100, and we need to multiply the R1 $^{\rm th}$ element by 123. On DLX you could write:

LD   R2, 100(R1)
MULI R3, R2, 123

Here we load R2 with the element of the array, using displacement addressing. With the modified architecture, you would write

ADDI R2, R1, 100
MULI R3, (R2), 123

Here, the first instruction does the displacement calculation explicitly, leaving a pointer to the array element in R2. The second instruction then reads the value from memory.

1.: Propose a change to the DLX pipeline (as shown in the notes and in Chapter 3 of Hennessy and Patterson) to accommodate this addressing mode. Draw the new pipeline and explain the changes.
2.: What new data hazards are created by this addressing mode using your modified pipeline? Give an example of each of the hazards you find. Explain where in the pipeline the hazard occurs.
3.: List all changes or additions which must be made to the hardware to accommodate this new addressing mode.
4.: List all the questions which must be answered to estimate the impact of this change on overall performance.

Paul Kelly, Imperial College, 2000

Notes on the solutions to unassessed exercise 2

Exercise 2.1. Instruction set issues in pipeline design

1.

Propose a change to the DLX pipeline (as shown in the notes and in Chapter 3 of Hennessy and Patterson) to accommodate this addressing mode. Draw the new pipeline and explain the changes.

Notice that the changes do not force us to have two ALU stages: arithmetic instructions need an ALU to do the arithmetic, but do not need to do any effective address calculation (this is why it was so important to disallow displacement addressing).

However, in processing an arithmetic instruction with a memory operand, it is necessary to access memory before performing the arithmetic, so we must have the MEM cycle before the EX cycle.

So it looks like a solution is simply to switch the MEM and EX stages in the DLX pipeline, giving a picture like this (it's a good idea to draw it this way to show up the relative timings):

Clock:

Instruction 1

MEM

Instruction 2

MEM

Instruction 3

MEM

Instruction 4

MEM

Instruction 5

MEM

The question is, can this handle all the instructions in the instruction set. For example, notice that displacement addressing has to be disallowed not only in arithmetic instructions, but also in loads and stores. (Why?)

2.

What new data hazards are created by this addressing mode using your modified pipeline? Give an example of each of the hazards you find. Explain where in the pipeline the hazard occurs.

With the pipeline above, there is essentially one hazard, although you have to consider two cases. Consider initially the following:

ADD R1, R2, R3
NOP              ; (no operation)
ADD R4, (R1), R5

The MEM stage of the last instruction must use the value generated at the end of the EX stage two instructions previously. This problem can be dealt with using the forwarding (sometimes called bypassing) technique: special hardware catches the case, and routes the operand direct from EX to MEM without going through the register file. Thus a stall is avoided.

Now consider the case where R1 is used immediately after the instruction in which it is computed:

ADD R1, R2, R3
ADD R4, (R1), R5

Again, the MEM stage of the last instruction must use the value generated by the first one, but if we refer to the timing we see that the MEM stage for the second instruction is supposed to start in the same cycle as the EX stage in which R1 is calculated. Thus there is no alternative but to stall.

Referring to the textbook (pg. 264), observe that both of these hazards are read-after-write (RAW) hazards: a stage tried to read a value before it was written. A write-after-read (WAR) hazard occurs if a register or memory cell should be overwritten after it is read, but because of the pipeline, it could be overwritten before it is read. A write-after-write (WAW) hazard occurs if the pipeline changes the order in which writes occur, thereby leaving the wrong values in the end.

Why aren't there any WAR or WAW hazards in our pipeline?

3.

List all changes or additions which must be made to the hardware to accommodate this new addressing mode.

The number of memory ports needed is not changed. Similarly, the number of simultaneous accesses to the register file is unchanged (how many do we need to handle?). We do need to modify the data path so addresses can be routed direct from the register file to the memory address register (MAR).

We need new hardware to handle forwarding: essentially a multiplexer on the MAR so that data can be routed to it either from the register file or from the ALU. We also need additional control hardware to handle this, and to organise stalling on the data hazard noted above.

4.

List all the questions which must be answered to estimate the impact of this change on overall performance.

Impact on clock time?

Percentage of loads and stores which use displacement addressing (since each of these would now involve an additional instruction to compute the effective address).

Percentage of loads which load a value which is only used once by an ALU operation -- since it is in this case that the new addressing mode saves us something.

After the compilers have been changed so that displacement addressing is calculated in a register, what percentage of instructions that calculate an address are immediately followed by a load instruction which uses it? (since this is the case where a stall would occur).

(Paul Kelly, Imperial College, October 2000)