We could avoid the stall due to the first load by executing it earlier -- i.e. during the previous iteration. What we want to do is this:
        ldd     [%o0],%f2       ! load f2 for first itern
L313:
        fmuld   %f6,%f2,%f2
        ldd     [%o1],%f4
        ldd     [%o0+8],%f2     ! load f2 for next itern
        add     %o1,8,%o1
        faddd   %f2,%f4,%f2
        add     %o0,8,%o0
        cmp     %o0,%o2
        blu     L313
        std     %f2,[%o1-8]

This doesn't quite work, because the load for the next iteration clobbers f2 while it still holds the product that the faddd needs. To fix this we must modify the register allocation: use f8 for the result of the load.
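The same idea can be sketched in C, reading the loop as y[i] = a*x[i] + y[i] (an interpretation of the assembly, with %f6 taken to be a loop-invariant scalar). The function name daxpy_pipelined and the variables cur, prod and next are hypothetical, chosen only for illustration. The separate variable next plays the role of f8: it receives the early load so that the product is not overwritten.

```c
#include <stddef.h>

/* Sketch only: y[i] = a * x[i] + y[i], with the load of x[i+1]
 * hoisted into iteration i. All names here are illustrative. */
void daxpy_pipelined(size_t n, double a, const double *x, double *y)
{
    if (n == 0)
        return;

    double cur = x[0];                 /* load for the first iteration */

    for (size_t i = 0; i < n; i++) {
        double prod = a * cur;         /* multiply uses the value loaded earlier */

        /* Load for the next iteration into its own temporary, so that
         * prod is still intact when the add below reads it.
         * The bounds guard is an addition of this sketch. */
        double next = (i + 1 < n) ? x[i + 1] : 0.0;

        y[i] = prod + y[i];            /* add and store */
        cur = next;                    /* the early load feeds the next multiply */
    }
}
```

The guard on the last iteration is specific to this sketch; the assembly above issues the extra load unconditionally. Whether a C compiler actually schedules the load early is up to the compiler, so this only illustrates the data flow, not the timing.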