Without out-of-order issue (covered in the next section), this static-pipeline approach has somewhat disappointing performance.
Stalls account for 35% of all clock cycles:
  load delays:             3%  (assuming a perfect cache)
  branch delays:           2%
  FP structural stalls:    3%
  FP data hazard stalls:  27%
FP data hazards are by far the largest contributor. We will therefore focus on
reducing data hazard stalls; we will examine hardware techniques first, then
consider compile-time approaches.
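
As a rough check on the stall breakdown above, the short Python sketch below sums the four categories and estimates what they mean for CPI. It assumes (this is not stated in the text) that every non-stall cycle issues exactly one instruction, i.e. an ideal CPI of 1. The names `stall_pct`, `IDEAL_CPI`, and `cpi_without` are hypothetical and chosen only for illustration.

```python
# Stall breakdown from the text, as percentages of total clock cycles.
stall_pct = {
    "load delays": 3,
    "branch delays": 2,
    "FP structural stalls": 3,
    "FP data hazard stalls": 27,
}

# Assumption (not from the text): one instruction issues in every
# non-stall cycle, so the ideal CPI is 1.
IDEAL_CPI = 1.0

total_stall_pct = sum(stall_pct.values())   # all stall cycles, in percent
useful_pct = 100 - total_stall_pct          # cycles that issue an instruction

# Actual CPI = total cycles / instructions issued.
cpi = IDEAL_CPI * 100 / useful_pct


def cpi_without(category):
    """CPI if every stall in one category were eliminated,
    with the instruction count and all other stalls unchanged."""
    remaining_cycles = 100 - stall_pct[category]
    return IDEAL_CPI * remaining_cycles / useful_pct


if __name__ == "__main__":
    print(f"total stalls: {total_stall_pct}% of cycles")
    print(f"CPI with all stalls: {cpi:.2f}")
    for name in stall_pct:
        print(f"CPI without {name}: {cpi_without(name):.2f}")
```

Under that assumption the CPI is about 1.54, and removing the FP data hazard stalls alone would bring it down to about 1.12, which is why that category is the natural first target.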