In the notes I show a single loop with two backedges. Show an example in C or Java of two source-code loops that lead to two backedges with the same target node.

The immediate dominator of a node n is a node that dominates but does not dominate any other dominator of . Show that immediate dominators form a tree.

A Depth-first spanning tree for a graph arises when the edges are traversed in a depth-first search. Categorizing edges in graph:

- Advancing edges: from ancestor to proper descendant.
- Retreating edges: from descendant to ancestor (not necessarily proper)
- Cross edges: all other edges

Construct a graph containing a retreating edge which is not a backedge (Thanks to Monica Lam for this one).

Structured control flow constructs enforce the property that loops have a single entry node. Can you write a loop with multiple entry nodes in Java without using goto? Can you see a way to do it in C? (it's called ``Duff's Device'').

The following questions develop some of the ideas presented in the lectures, connects them to textbook material and introduce some more advanced topics. They are not primarily intended to be a guide to likely exam questions.

If we use the graph colouring scheme for register allocation, we can simplify the code for transExp. TransExp walks the AST, producing instructions which use ``temporaries'' -- that is, pseudo-registers. The pseudo-registers are assigned to real registers by the graph-colouring register allocator. TransExp doesn't have to handle running out of pseudo-registers.

In Exercise 3, you were encouraged to use instructions which allow immediate and absolute operands for arithmetic instructions. The 68000's instruction set is much more powerful, and there are further extensions in the 68020 and later versions.

(Look up ``maximal munch'' in Appel's book).

Does your pattern-matching approach lead to the minimum possible number of instructions? Look up ``optimum'' versus ``optimal'' in Appel's book.

(Incidentally, Motorola have also developed ColdFire, a processor for system-on-a-chip applications which executes only a simplified subset of the 68000 instruction set, omitting some of these difficult-to-implement (and difficult-to-use?) features).

In the Intel IA-32 instruction set, the 32-bit (signed and unsigned) integer divide instructions divide a 64-bit operand by a 32-bit number, producing a 32-bit quotient and a 32-bit remainder:

idiv operand ; 2's complement division ; does edx||eax / operand ; eax <- quotient ; edx <- remainderThe quotient is delivered to register eax, the remainder to edx.

Incidentally, unsigned multiply has a similar problem:

mul eax, operand ; unsigned multiplication ; edx||eax <- eax * operandHowever, the IA-32 designers took pity on compiler writers and provided three variants of signed integer multiply:

imul operand ; 2's comp. multiplication ; edx||eax <- eax * operand reg, operand ; reg <- reg * operand reg, immed ; reg <- reg * immed

Consider the following loop:

for (i=0; i<=98; i++) { if (i == a) { b = 1; } }If you implement the algorithms presented so far in this course, you get code like this:

.data ; Variable a has been allocated to D4, b has been allocated to D0 .text move.l #98, D0 clr.l D1 bra L5 L1: move.l D1, D2 move.l D4, D3 cmp.l D2, D3 beq L2 bra L3 L2: move.l #1, D0 bra L4 L3: L4: add.l #1, D1 L5: cmp.l D0, D1 bgt L6 bra L1 L6: move.l D1, iThis code has lots of redundant branches. They result from the simple, syntax-directed translation scheme we used - being syntax-directed, it fails to take advantage of the context in which the code occurs.

There are two ways to improve this. One way might be to modify transStat to take account of the context better. Another way, which promises better results, is to start from the control flow graph for the code above.

See Appel Section 8.2 (``Traces'').

The code above includes a number of redundant register-register moves.

Again, you can try to avoid introducing them (in transExp, transStat etc). Alternatively, you can try to eliminate them later. Can the graph colouring scheme be extended to do this?

The Intel IA-64 instruction set is explicitly parallel, an idea called ``EPIC'' -- which stands for 'explicit parallel instruction computing'. Each 128-bit ``packet'' holds three IA-64 instructions, each 41 bits long. The Itanium implementation of IA-64 can issue up to six instructions per cycle - provided they are independent. The main purpose of the extra five bits () is to indicate whether there is a ``stop'' present in this packet, and if so, where. Obviously, we get better performance if we can arrange the code so there are six instructions between each stop (a group of parallel instructions, which may span several packets, is called a ``bundle'').

The challenge with IA-64 is to schedule instructions to maximise parallelism.

See Appel Chapter 20 (instruction scheduling).

Many recent architectures include special instructions which operate on several short-word operands packed into a single register. Examples include Intel's MMX and SSE, AMD's 3DNow!, and Motorola's PowerPC Altivec. They are often called ``SIMD'' -- single instruction, multiple data.

For example, consider this C code:

typedef struct {float f [4];} VECTOR4F; VECTOR4F add4f (VECTOR4F a, VECTOR4F b) { VECTOR4F r; r.f [0] = a.f [0] * b.f [0]; r.f [1] = a.f [1] * b.f [1]; r.f [2] = a.f [2] * b.f [2]; r.f [3] = a.f [3] * b.f [3]; return r; }This can be packed into just one SSE instruction:

addps xmm0,xmm1Now consider a loop:

void addv(size, float A[], float B[], float C[]) { int i; for (i=0;i<size;i++) C[i] = A[i]+B[i]; }}

*Paul Kelly, November 2009*

In the notes I show a single loop with two backedges. Show an example in C or Java of two source-code loops that lead to two backedges with the same target node.

*It's actually easier with just one loop:
*

while (a < b) { if (a == 0) { a = b; } else { a = -a; } }

do { do { a += 1; } while (a<c); } while (a<b);

The immediate dominator of a node n is a node that dominates but does not dominate any other dominator of . Show that immediate dominators form a tree.

*We have to show that each node has a unique immediate dominator
(except for the start node). Suppose instead there are two: dominates
, and dominates . So
all paths from start to pass through both and . Then
there must be a path from start to then (or vice
versa). But all paths from pass through , so
dominates .
*

A Depth-first spanning tree for a graph arises when the edges are traversed in a depth-first search. Categorizing edges in graph:

- Advancing edges: from ancestor to proper descendant.
- Retreating edges: from descendant to ancestor (not necessarily proper)
- Cross edges: all other edges

Construct a graph containing a retreating edge which is not a backedge (Thanks to Monica Lam for this one).

Structured control flow constructs enforce the property that loops have a single entry node. Can you write a loop with multiple entry nodes in C without using goto?

*Consider this:
*

switch(i) do { case 1: printf(``(%d) One\n'',i); i+=5; continue; default: printf(``(%d) More!\n'',i); i+=1; continue; } while (i<10);

*Paul Kelly, Imperial College November 2009*