We can reorder the instructions so that useful work is done while the floating-point operations and loads are still in progress:
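```
L313:
        ldd     [%o0],%f2
        fmuld   %f6,%f2,%f2
        ldd     [%o1],%f4
        add     %o1,8,%o1       ! fwd 2 to fill ldd delay
        faddd   %f2,%f4,%f2
        add     %o0,8,%o0       ! fwd 3 to fill faddd delay
        cmp     %o0,%o2
        blu     L313
        std     %f2,[%o1-8]     ! bwd 4 to avoid faddd delay
```

Because add %o1,8,%o1 now executes before the store, the store address is written as [%o1-8]. The std sits in the delay slot of blu, so it still executes on every iteration, including the last one.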
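For comparison, here is a C sketch of what the loop computes. It assumes, from the register usage, that %o0 walks an array x, %o1 walks an array y, %o2 marks the end of x, and %f6 holds a scalar a, so each iteration performs y[i] = a*x[i] + y[i]. The names daxpy_plain and daxpy_scheduled are hypothetical. The second function orders its statements like the scheduled assembly: it bumps the y pointer before the store and writes through yp[-1], just as std %f2,[%o1-8] does. A C compiler is free to reorder these statements, so this mirrors the data flow only, not the actual instruction schedule. The n == 0 guard is likewise an assumption standing in for whatever check precedes L313, since the assembly tests at the bottom of the loop.

```c
#include <stddef.h>

/* Hypothetical C equivalent of the SPARC loop. Assumed register mapping:
   %o0 -> x, %o1 -> y, %o2 -> x + n, %f6 -> a. */

/* Straightforward form: y[i] = a * x[i] + y[i]. */
void daxpy_plain(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Statements ordered like the scheduled loop at L313. The compiler may
   still reorder them; only the data flow is mirrored here. */
void daxpy_scheduled(double a, const double *x, double *y, size_t n)
{
    const double *xp = x;
    const double *end = x + n;
    double *yp = y;

    if (n == 0)               /* assumed guard: the loop tests at the bottom */
        return;
    do {
        double t = a * xp[0]; /* ldd [%o0],%f2 ; fmuld %f6,%f2,%f2 */
        double u = yp[0];     /* ldd [%o1],%f4 */
        yp++;                 /* add %o1,8,%o1 */
        t = t + u;            /* faddd %f2,%f4,%f2 */
        xp++;                 /* add %o0,8,%o0 */
        yp[-1] = t;           /* std %f2,[%o1-8] (branch delay slot) */
    } while (xp < end);       /* cmp %o0,%o2 ; blu L313 */
}
```

Both functions produce the same results; the second simply makes the pointer-adjusted store visible at the source level.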