- Original version: 10.3 seconds (26.1 MFLOPS).
- That was using a "good" optimising compiler!
- Factor of five performance improvement over the original version.
- No reduction in the amount of arithmetic performed.
(The CodePlay VectorC compiler generates slightly better code for this inner, blocked loop, giving 155.0 MFLOPS.)
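Because the amount of arithmetic does not change, the speedup in run time and the speedup in MFLOPS are the same ratio. The short Python calculation below illustrates this using only the figures on this slide. The derived run times are implied by those figures, not measured, and the "factor of five" rate is treated as exact purely for illustration.

```python
# Figures taken from the slide: original run time and achieved rate.
original_seconds = 10.3
original_mflops = 26.1

# Total work in millions of floating-point operations (time x rate).
# The optimised versions perform the same arithmetic, so this stays fixed.
total_mflop = original_seconds * original_mflops  # about 268.8


def run_time(mflops: float) -> float:
    """Run time implied by a given rate when the operation count is fixed."""
    return total_mflop / mflops


# Rate implied by the "factor of five" claim (approximate, not a measured figure).
five_x_mflops = 5 * original_mflops        # about 130.5 MFLOPS
five_x_seconds = run_time(five_x_mflops)   # about 2.06 s

# VectorC rate quoted on the slide, and the run time it implies.
vectorc_mflops = 155.0
vectorc_seconds = run_time(vectorc_mflops)            # about 1.73 s
vectorc_speedup = original_seconds / vectorc_seconds  # equals 155.0 / 26.1

print(f"total work: {total_mflop:.1f} Mflop")
print(f"5x version: {five_x_mflops:.1f} MFLOPS, {five_x_seconds:.2f} s")
print(f"VectorC:    {vectorc_seconds:.2f} s, speedup {vectorc_speedup:.1f}x")
```

Since the operation count cancels out, the VectorC speedup is simply 155.0 / 26.1, or roughly 5.9 times the original.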