Idea: instead of trying to exploit more
instruction-level parallelism by building a
bigger CPU, build two or more CPUs
This only makes sense if the application
has parallelism to exploit…
Why might it be better?
No need for a heavily multiported register file
No need for long-range forwarding
CPUs can take independent control paths
Still need to synchronise and communicate
Program has to be structured appropriately…
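A minimal sketch of what "structured appropriately" can mean, assuming Python threads stand in for the CPUs. The names count_evens and parallel_count_evens are hypothetical and chosen only for illustration. Each worker follows its own data-dependent control path over its own chunk, then synchronises on a lock to communicate its result. In standard CPython builds the global interpreter lock serialises the threads, so this shows program structure only, not a speedup.

```python
import threading


def count_evens(chunk, totals, lock):
    """Worker (hypothetical): count even numbers in its own chunk."""
    local = 0
    for x in chunk:
        # Data-dependent branch: each worker takes its own control path.
        if x % 2 == 0:
            local += 1
    # Synchronise: only one worker at a time may touch the shared total.
    with lock:
        # Communicate: publish this worker's partial result.
        totals["evens"] += local


def parallel_count_evens(data, n_workers=2):
    """Split data into up to n_workers chunks and count evens across them."""
    totals = {"evens": 0}
    lock = threading.Lock()

    # The program is restructured so each worker gets independent work.
    size = max(1, (len(data) + n_workers - 1) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    threads = [
        threading.Thread(target=count_evens, args=(chunk, totals, lock))
        for chunk in chunks
    ]
    for t in threads:
        t.start()
    # Synchronise again: wait until every worker has finished.
    for t in threads:
        t.join()
    return totals["evens"]
```

Calling parallel_count_evens(list(range(10))) splits the ten values between two workers and combines their partial counts. If the work could not be split into independent chunks like this, adding a second CPU would not help.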