Advanced Computer Architecture Chapter 7.33
Large-Scale Shared-Memory Multiprocessors
• A bus inevitably becomes a bottleneck when many processors are used
  • Use a more general interconnection network
  • Snooping relies on every cache observing every transaction on a shared bus, so it no longer works
• DRAM is also distributed
  • Each node allocates space from its local DRAM
  • Copies of remote data are held in the cache
Major design issues:
• How to find and represent the “directory” of each line?
• How to find a copy of a line?
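To make the two questions concrete, here is a minimal sketch of one common answer: a full-map (bit-vector) directory kept at each line's home node. It is a generic illustration, not the S3.MP scheme. The constants, the block-partitioned address mapping in `home_node`, and the names `DirEntry` and `HomeDirectory` are all assumptions made for the example.

```python
from dataclasses import dataclass

NUM_NODES = 4            # assumed number of nodes
MEM_PER_NODE = 1 << 20   # assumed bytes of local DRAM per node
LINE_BYTES = 64          # assumed cache line size


def home_node(addr: int) -> int:
    """Finding the directory: the home is the node whose DRAM holds addr
    (assumes physical memory is block-partitioned across nodes)."""
    return addr // MEM_PER_NODE


@dataclass
class DirEntry:
    presence: int = 0    # bit i set => node i caches a copy of the line
    dirty: bool = False  # True => the single sharer holds a modified copy


class HomeDirectory:
    """Hypothetical per-node directory covering lines homed at that node."""

    def __init__(self) -> None:
        self.entries: dict[int, DirEntry] = {}

    def _entry(self, addr: int) -> DirEntry:
        return self.entries.setdefault(addr // LINE_BYTES, DirEntry())

    def sharers(self, addr: int) -> list[int]:
        """Finding copies: decode the presence bits into node numbers."""
        e = self._entry(addr)
        return [n for n in range(NUM_NODES) if (e.presence >> n) & 1]

    def read_miss(self, addr: int, node: int) -> None:
        e = self._entry(addr)
        e.dirty = False              # models the owner writing the line back
        e.presence |= 1 << node      # requester becomes a sharer

    def write_miss(self, addr: int, node: int) -> list[int]:
        """Return the nodes that must be sent invalidations."""
        e = self._entry(addr)
        to_invalidate = [n for n in self.sharers(addr) if n != node]
        e.presence = 1 << node       # writer is now the only holder
        e.dirty = True
        return to_invalidate


directories = [HomeDirectory() for _ in range(NUM_NODES)]
addr = MEM_PER_NODE + 128            # an address homed at node 1
home = directories[home_node(addr)]
home.read_miss(addr, 0)
home.read_miss(addr, 2)
print(home.sharers(addr))            # [0, 2]
print(home.write_miss(addr, 3))      # [0, 2]
print(home.sharers(addr))            # [3]
```

The point of the sketch is that a write sends invalidations only to the nodes named in the presence bits, so no broadcast is needed. The cost is one bit per node per line, which is why larger machines often use more compact directory representations.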
As a case study, we will look at S3.MP (Sun's Scalable Shared memory Multi-Processor), a CC-NUMA (cache-coherent non-uniform memory access) architecture.