• A bus inevitably becomes a bottleneck when many processors are used
– Use a more general interconnection network
– So snooping, which relies on every cache observing every bus transaction, does not work
• DRAM memory is also distributed
– Each node allocates space from local DRAM
– Copies of remote data are made in cache
•Major design issues:
– How to find and represent the “directory” of each line?
– How to find a copy of a line?
• As a case study, we will look at S3.MP (Sun's Scalable Shared memory Multi-Processor), a CC-NUMA (cache-coherent non-uniform memory access) architecture
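
The two design questions above (how to represent the directory of a line, and how to find a copy of it) can be illustrated with a minimal, generic sketch. It is not S3.MP's actual scheme: the names `home_node`, `DirEntry`, and `Directory`, the sizes `LINE_SIZE` and `NODE_MEM`, and the three states are all assumptions chosen for illustration. Each address has a home node that owns its DRAM, and that node keeps one entry per line recording which nodes hold a cached copy.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64        # bytes per cache line (assumed value)
NODE_MEM = 1 << 20    # bytes of local DRAM per node (assumed value)


def home_node(addr: int) -> int:
    """Node whose local DRAM holds this address (simple contiguous split)."""
    return addr // NODE_MEM


@dataclass
class DirEntry:
    state: str = "UNCACHED"                    # UNCACHED, SHARED, or EXCLUSIVE
    sharers: set = field(default_factory=set)  # nodes holding a cached copy


class Directory:
    """Directory kept at one home node: one entry per memory line."""

    def __init__(self):
        self.entries = {}

    def entry(self, addr: int) -> DirEntry:
        return self.entries.setdefault(addr // LINE_SIZE, DirEntry())

    def read(self, addr: int, requester: int) -> str:
        """Record a read and report where the data comes from."""
        e = self.entry(addr)
        if e.state == "EXCLUSIVE":
            owner = next(iter(e.sharers))
            if owner == requester:
                return "local cache"           # requester already owns the line
            source = f"node {owner}"           # owner must supply the latest data
        else:
            source = "memory"
        e.sharers.add(requester)
        e.state = "SHARED"
        return source

    def write(self, addr: int, requester: int) -> set:
        """Record a write and return the nodes whose copies must be invalidated."""
        e = self.entry(addr)
        to_invalidate = e.sharers - {requester}
        e.sharers = {requester}
        e.state = "EXCLUSIVE"
        return to_invalidate
```

Because the directory lists the sharers explicitly, the home node can send point-to-point invalidations to just those nodes instead of broadcasting, which is what lets the scheme run over a general interconnection network.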