PERFORMANCE

Without out-of-order issue (covered in the next section), this static-pipeline approach has somewhat disappointing performance.

EXAMPLE 1: Consider the SPICE circuit-simulation benchmark (which is heavily floating-point), running on the MIPS R3000.
35% of the total number of clock cycles are stalls:

load delays: 3% (assume perfect cache)

branch delays: 2%

FP structural stalls: 3%

FP data hazard stalls: 27%

EXAMPLE 2: H&P p. 208 shows a similar breakdown for the R4000; pipelining the FP Add and Multiply doesn't help.

We will therefore focus on reducing data hazard stalls; we will examine hardware techniques first, then consider compile-time approaches.
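
To see why data hazard stalls are the natural target, the EXAMPLE 1 breakdown can be turned into a rough bound on the speedup available from each category. The following is a minimal sketch, not a measurement: the function name speedup_if_removed is made up for illustration, and it assumes that removing one category of stall leaves all other cycles unchanged and that every non-stall cycle completes exactly one instruction.

```python
# Stall breakdown from EXAMPLE 1, as percentages of total clock cycles.
STALL_PCT = {
    "load delays": 3,
    "branch delays": 2,
    "FP structural stalls": 3,
    "FP data hazard stalls": 27,
}


def speedup_if_removed(category, stall_pct=STALL_PCT):
    """Upper bound on speedup if every stall cycle in `category` vanished.

    Assumption: the remaining cycles are unaffected, which real
    hardware does not guarantee.
    """
    return 100 / (100 - stall_pct[category])


# The four categories together account for the 35% quoted in the text.
total_stall_pct = sum(STALL_PCT.values())

# Assumption: each non-stall cycle completes exactly one instruction,
# so cycles per instruction = total cycles / non-stall cycles.
effective_cpi = 100 / (100 - total_stall_pct)

if __name__ == "__main__":
    print(f"total stalls: {total_stall_pct}% of cycles")
    print(f"effective CPI: {effective_cpi:.2f}")
    for name in STALL_PCT:
        print(f"remove {name}: at most {speedup_if_removed(name):.2f}x")
```

Under these assumptions, eliminating all FP data hazard stalls gives at most about a 1.37x speedup, while eliminating any one of the other three categories gives about 1.03x or less.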
