|
|
|
March 2003 |
|
Paul H J Kelly |
|
|
|
These lecture notes are partly based on the
course text, Hennessy and Patterson’s Computer Architecture, a quantitative
approach (3rd ed), and on the lecture slides of David Patterson,
John Kubiatowicz and Yujia Jin at Berkeley (CS252, Jan 2001) |
|
|
|
|
|
|
|
|
Why add another processor? |
|
How should they be connected – I/O, memory bus,
L2 cache, registers? |
|
Cache coherency – the problem |
|
Coherency – what are the rules? |
|
Coherency using broadcast – update/invalidate |
|
Coherency using multicast – SMP vs ccNUMA |
|
Distributed directories, home nodes, ownership;
the ccNUMA design space |
|
Beyond ccNUMA; COMA and Simple COMA |
|
Hybrids and clusters |
|
|
|
|
|
|
64 nodes, Compaq Alpha EV67 21264A processor |
|
High Performance Interconnect via Quadrics Elan
III NICs |
|
Linux |
|
Programmed using MPI (“Message passing
interface”); see the minimal sketch below |
|
Small configuration installed at Imperial’s
Parallel Computing Centre for fluid dynamics research |
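|
To give a flavour of what “programmed using MPI” means, here is a minimal
sketch in C (illustrative only, not taken from the machine’s application
codes): each process computes a partial result and the results are combined by
explicit message passing |
|
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime          */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's identity        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes      */

    double partial = (double)rank;         /* stand-in for some local work   */
    double total = 0.0;

    /* combine the partial results on rank 0 by explicit message passing */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %f\n", size, total);

    MPI_Finalize();
    return 0;
} |
|
All communication between nodes goes through such MPI calls over the
interconnect; there is no shared memory between nodes |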
|
|
|
|
|
|
144 eight-processor SMP High Nodes perform the
primary compute functions. |
|
12 two-processor SMP High Nodes perform service
functions. |
|
1,152 Power3 processors in the compute nodes run
at 375 MHz. Each processor has a peak performance of 1,500 Mflops (millions
of floating-point operations per second) |
|
576 gigabytes of total system memory, at 4 GB
per compute node |
|
5.1 terabytes -- 5,100 gigabytes -- of
associated disk |
|
|
|
|
1024-CPU system consisting of two 512-CPU SGI
Origin 3800 systems. Peak performance of 1 TFlops (10^12 floating-point
operations per second).
500 MHz R14000 CPUs organized in 256 4-CPU nodes |
|
1 TByte of RAM. 10 TByte of on-line storage
& 100 TByte near-line storage |
|
45 racks, 32 racks containing CPUs &
routers, 8 I/O racks & 5 racks for disks |
|
Each 512-CPU machine offers application program
a single, shared memory image |
|
|
|
|
5,120 (640 8-way nodes) 500 MHz NEC CPUs |
|
8 GFLOPS per CPU (41 TFLOPS total) |
|
2 GB (4 × 512 MB FPLRAM modules) per CPU (10 TB
total) |
|
shared memory inside the node |
|
640 × 640 crossbar switch between the nodes |
|
16 GB/s inter-node bandwidth |
|
20 kVA power consumption per node |
|
|
|
|
|
|
|
Executing loops in parallel |
|
Improve performance of single application |
|
Barrier synchronisation at end of loop |
|
Iteration i of loop 2 may read data produced by
iteration i of loop 1 – but perhaps also from other iterations (sketched below) |
|
Example: NaSt3DGP |
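|
A minimal sketch of this pattern in C with OpenMP (the loops are made up, not
taken from NaSt3DGP): each parallel loop ends with an implicit barrier, so
iteration i of the second loop can safely read values produced by iteration i
of the first loop, and by neighbouring iterations |
|
#include <stdio.h>
#define N 1000

int main(void)
{
    static double a[N], b[N];

    /* loop 1: iterations are independent and run on different threads */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;

    /* implicit barrier here: every iteration of loop 1 has completed */

    /* loop 2: iteration i reads a[i], and also its neighbours' results */
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++)
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0;

    printf("b[1] = %f\n", b[1]);
    return 0;
} |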
|
High-throughput servers |
|
Eg. database, transaction processing, web
server, e-commerce |
|
Improve performance of single application |
|
Consists of many mostly-independent transactions |
|
Sharing data |
|
Synchronising to ensure consistency |
|
Transaction roll-back |
|
Mixed, multiprocessing workloads |
|
|
|
|
|
Increasing the complexity of a single CPU leads
to diminishing returns |
|
Due to lack of instruction-level parallelism |
|
Too many simultaneous accesses to one register
file |
|
Forwarding wires between functional units too
long - inter-cluster communication takes >1 cycle |
|
|
|
|
Architectural effectiveness shows performance
gained through architecture rather than clock rate |
|
Extra transistors largely devoted to cache,
which of course is essential in allowing high clock rate |
|
|
|
|
|
|
|
|
Idea: instead of trying to exploit more
instruction-level parallelism by building a bigger CPU, build two - or more |
|
This only makes sense if the application
parallelism exists… |
|
Why might it be better? |
|
No need for multiported register file |
|
No need for long-range forwarding |
|
CPUs can take independent control paths |
|
Still need to synchronise and communicate |
|
Program has to be structured appropriately… |
|
|
|
|
|
|
|
|
|
How should the CPUs be connected? |
|
Idea: systems linked by network connected via
I/O bus |
|
Eg Fujitsu AP3000, Myrinet, Quadrics |
|
Idea: CPU/memory packages linked by network
connecting main memory units |
|
Eg SGI Origin |
|
Idea: CPUs share main memory |
|
Eg Intel Xeon SMP |
|
Idea: CPUs share L2/L3 cache |
|
Eg IBM Power4 |
|
Idea: CPUs share L1 cache |
|
Idea: CPUs share registers, functional units |
|
Cray/Tera MTA (multithreaded architecture),
Symmetric multithreading (SMT), as in Hyperthreaded Pentium 4, Alpha 21464,
etc |
|
|
|
|
|
Tradeoffs: |
|
close coupling to minimise delays incurred when
processors interact |
|
separation to avoid contention for shared
resources |
|
Result: |
|
spectrum of alternative approaches based on
application requirements, cost, and packaging/integration issues |
|
Currently: |
|
just possible to integrate 2 full-scale CPUs on
one chip together with large shared L2 cache |
|
common to link multiple CPUs on same motherboard
with shared bus connecting to main memory |
|
more aggressive designs use richer
interconnection network, perhaps with cache-to-cache transfer capability |
|
|
|
|
Suppose processor 0 loads memory location x |
|
x is fetched from main memory and allocated into
processor 0’s cache(s) |
|
|
|
|
Suppose processor 1 loads memory location x |
|
x is fetched from main memory and allocated into
processor 1’s cache(s) as well |
|
|
|
|
Suppose processor 0 stores to memory location x |
|
Processor 0’s cached copy of x is updated |
|
Processor 1 continues to use the old value of x |
|
|
|
|
Suppose processor 2 loads memory location x |
|
How does it know whether to get x from main
memory, processor 0 or processor 1? |
|
|
|
|
Two issues: |
|
How do you know where to find the latest version
of the cache line? |
|
How do you know when you can use your cached
copy – and when you have to look for a more up-to-date version? |
|
|
|
We will find answers to this after first
thinking about what a distributed shared memory implementation is supposed
to do… |
|
|
|
|
|
|
|
|
Goal (?): |
|
“Processors should not continue to use
out-of-date data indefinitely” |
|
Goal (?): |
|
“Every load instruction should yield the result
of the most recent store to that address” |
|
Goal (?):
(definition: Sequential Consistency) |
|
“the result of any execution is the same as if
the operations of all the processors were executed in some sequential
order, and the operations of each individual processor appear in this
sequence in the order specified by its program” |
|
(Leslie Lamport, “How to make a multiprocessor
computer that correctly executes multiprocess programs”, IEEE Trans.
Computers, Vol. C-28(9), Sept 1979) |
|
|
|
|
|
Idea #1: when a store to address x occurs, update
all the remote cached copies |
|
To do this we need either: |
|
To broadcast every store to every remote cache |
|
Or to keep a list of which remote caches hold
the cache line |
|
Or at least keep a note of whether there are any
remote cached copies of this line |
|
But first…how well does this update idea work? |
|
|
|
|
|
Problems with update |
|
What about if the cache line is several words
long? |
|
Each update to each word in the line leads to a
broadcast |
|
What about old data which other processors are
no longer interested in? |
|
We’ll keep broadcasting updates indefinitely… |
|
Do we really have to broadcast every store? |
|
It would be nice to know that we have exclusive
access to the cacheline so we don’t have to broadcast updates… |
|
|
|
|
|
Suppose instead of updating remote cache lines,
we invalidate them all when a store occurs? |
|
After the first write to a cache line we know
there are no remote copies – so subsequent writes don’t lead to
communication |
|
Is invalidate always better than update? |
|
Often |
|
But not if the other processors really need the
new data as soon as possible |
|
Note that to exploit this, we need a bit on each
cache line indicating its sharing state |
|
(analogous to write-back vs write-through) |
|
|
|
|
Update: |
|
May reduce latency of subsequent remote reads |
|
if any are made |
|
Invalidate: |
|
May reduce network traffic |
|
e.g. if same CPU writes to the line before
remote nodes take copies of it |
|
|
|
|
Four cache line states: |
|
|
|
Broadcast invalidations on bus unless cache line
is exclusively “owned” (DIRTY) |
|
|
|
|
1. INVALID |
|
2. VALID: clean, potentially shared, unowned |
|
3. SHARED-DIRTY: modified, possibly shared, owned |
|
4. DIRTY: modified, only copy, owned |
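|
As a sketch (illustrative C, not the hardware implementation), the per-line
state and the action taken on a local write might look like this |
|
/* Per-cache-line states of the write-invalidate protocol */
enum line_state {
    INVALID,        /* no valid data in this cache               */
    VALID,          /* clean, potentially shared, unowned        */
    SHARED_DIRTY,   /* modified, possibly shared, owned          */
    DIRTY           /* modified, only copy, owned                */
};

struct cache_line {
    unsigned long tag;
    enum line_state state;
    /* ... data ... */
};

/* Hypothetical action on a CPU write hit: broadcast an invalidation on the
 * bus unless this cache already owns the line exclusively (DIRTY) */
void cpu_write_hit(struct cache_line *line,
                   void (*bus_invalidate)(unsigned long tag))
{
    if (line->state != DIRTY)
        bus_invalidate(line->tag);  /* remote copies must be invalidated  */
    line->state = DIRTY;            /* now the only, modified copy        */
} |
|
The test against DIRTY is the point of the fourth state: once a line is
exclusively owned, repeated writes by the same CPU cause no further bus
traffic |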
|
|
|
|
|
The protocol state transitions are implemented
by the cache controller – which “snoops” all the bus traffic |
|
Transitions are triggered either by |
|
the bus (Bus invalidate, Bus write miss, Bus
read miss) |
|
The CPU (Read hit, Read miss, Write hit, Write
miss) |
|
|
|
For every bus transaction, it looks up the
directory (cache line state) information for the specified address |
|
If this processor holds the only valid data
(DIRTY), it responds to a “Bus read miss” by providing the data to the
requesting CPU |
|
If the memory copy is out of date, one of the
CPUs will have the cache line in the SHARED-DIRTY state (because it updated
it last) – so must provide data to requesting CPU |
|
State transition diagram doesn’t show what
happens when a cache line is displaced… |
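|
Continuing the sketch above (again illustrative: the real controller is
hardware), the snooping side might respond to bus traffic roughly as follows |
|
#include <stdbool.h>

/* Called for every "bus read miss" seen while snooping.  Returns true if
 * this cache must supply the data instead of main memory. */
bool snoop_bus_read_miss(struct cache_line *line)
{
    switch (line->state) {
    case DIRTY:           /* only valid copy is here: we must respond       */
    case SHARED_DIRTY:    /* memory is stale and we updated the line last   */
        line->state = SHARED_DIRTY;   /* still owned, but now shared        */
        return true;                  /* put the data on the bus            */
    case VALID:
    case INVALID:
    default:
        return false;                 /* memory (or another cache) responds */
    }
}

/* A bus invalidate or bus write miss for a line we hold simply kills our copy */
void snoop_bus_invalidate(struct cache_line *line)
{
    line->state = INVALID;
} |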
|
|
|
|
|
|
|
|
|
Invalidate is usually better than update |
|
Cache line state “DIRTY” bit records whether
remote copies exist |
|
If so, remote copies are invalidated by
broadcasting message on bus – cache controllers snoop all traffic |
|
Where to get the up-to-date data from? |
|
Broadcast read miss request on the bus |
|
If this CPU’s copy is DIRTY, it responds |
|
If no cache copies exist, main memory responds |
|
If several copies exist, the CPU which holds it
in “SHARED-DIRTY” state responds |
|
If a SHARED-DIRTY cache line is displaced, … we
need a plan (it must be written back, since memory is stale) |
|
How well does it work? |
|
See extensive analysis in Hennessy and Patterson |
|
|
|
|
|
Extensions: |
|
Fourth State: Ownership |
|
|
|
|
|
|
|
|
|
Write Races: |
|
Cannot update cache until bus is obtained |
|
Otherwise, another processor may get bus first,
and then write the same cache block! |
|
Two step process: |
|
Arbitrate for bus |
|
Place miss on bus and complete operation |
|
If miss occurs to block while waiting for bus,
handle miss (invalidate may be needed) and then restart. |
|
Split transaction bus: |
|
Bus transaction is not atomic:
can have multiple outstanding transactions for a block |
|
Multiple misses can interleave,
allowing two caches to grab block in the Exclusive state |
|
Must track and prevent multiple misses for one
block |
|
Must support interventions and invalidations |
|
|
|
|
|
|
Multiple processors must be on the bus, with access
to both addresses and data |
|
Add a few new commands to perform coherency,
in addition to read and write |
|
Processors continuously snoop on address bus |
|
If address matches tag, either invalidate or
update |
|
Since every bus transaction checks cache tags,
could interfere with CPU just to check: |
|
solution 1: duplicate set of tags for L1 caches just
to allow checks in parallel with CPU |
|
solution 2: use the L2 cache tags, which already
duplicate the L1 tags, provided L2 obeys inclusion
with the L1 cache |
|
block size, associativity of L2 affects L1 |
|
|
|
|
Bus serializes writes, getting bus ensures no
one else can perform memory operation |
|
On a miss in a write back cache, may have the
desired copy and it’s dirty, so must reply |
|
Add extra state bit to cache to determine shared
or not |
|
Add 4th state (MESI) |
|
|
|
|
|
Bus inevitably becomes a bottleneck when many processors
are used |
|
Use a more general interconnection network |
|
So snooping does not work |
|
DRAM
memory is also distributed |
|
Each
node allocates space from local DRAM |
|
Copies
of remote data are made in cache |
|
Major design issues: |
|
How to find and represent the “directory”
of each line? |
|
How to find a copy of a line? |
|
As a case study, we will look at S3.MP (Sun's Scalable
Shared memory Multi-Processor), a CC-NUMA (cache-coherent non-uniform memory
access) architecture |
|
|
|
|
|
Separate Memory per Processor |
|
Local or Remote access via memory controller |
|
1st Cache Coherency solution: non-cached pages |
|
Alternative: directory per cache that tracks
state of every block in every cache |
|
Which caches have copies of the block, dirty vs.
clean, ... |
|
Info per memory block vs. per cache block? |
|
PLUS: In memory => simpler protocol
(centralized/one location) |
|
MINUS: In memory => directory is ƒ(memory
size) vs. ƒ(cache size) |
|
Prevent directory as bottleneck?
distribute directory entries with memory, each keeping track of which
Procs have copies of their blocks |
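|
A sketch of what one directory entry might hold (field names are illustrative):
the block’s state plus a bit vector with one bit per processor |
|
#include <stdint.h>

#define MAX_PROCS 64

enum dir_state { DIR_UNCACHED, DIR_SHARED, DIR_EXCLUSIVE };

/* One entry per memory block, stored alongside the block in its home DRAM.
 * Storage is therefore proportional to memory size, not cache size. */
struct dir_entry {
    enum dir_state state;
    uint64_t sharers;      /* bit i set => processor i holds a cached copy */
};

static inline void add_sharer(struct dir_entry *e, int proc)
{
    e->sharers |= (uint64_t)1 << proc;
}

static inline int is_sharer(const struct dir_entry *e, int proc)
{
    return (int)((e->sharers >> proc) & 1);
} |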
|
|
|
|
|
|
Similar to Snoopy Protocol: Three states |
|
Shared: ≥ 1 processors have the data; memory
up-to-date |
|
Uncached: no processor has it; not valid in any
cache |
|
Exclusive: 1 processor (owner) has data;
memory out-of-date |
|
In addition to cache state, must track which
processors have data when in the shared state (usually bit vector, 1 if
processor has copy) |
|
Keep it simple(r): |
|
Writes to non-exclusive data
=> write miss |
|
Processor blocks until access completes |
|
Assume messages received
and acted upon in order sent |
|
|
|
|
|
No bus and don’t want to broadcast: |
|
interconnect no longer single arbitration point |
|
all messages have explicit responses |
|
Terms: typically 3 processors involved |
|
Local node where a request originates |
|
Home node where the memory location
of an address resides |
|
Remote node has a copy of a cache
block, whether exclusive or shared |
|
Example messages on next slide:
P = processor number, A = address |
|
|
|
|
|
Message type: Source → Destination (message contents) |
|
Read miss: Local cache → Home directory (P, A) |
|
Processor P reads data at address A;
make P a read sharer and arrange to send data back |
|
Write miss: Local cache → Home directory (P, A) |
|
Processor P writes data at address A;
make P the exclusive owner and arrange to send data back |
|
Invalidate: Home directory → Remote caches (A) |
|
Invalidate a shared copy at address A. |
|
Fetch: Home directory → Remote cache (A) |
|
Fetch the block at address A and send it to its
home directory |
|
Fetch/Invalidate: Home directory → Remote cache (A) |
|
Fetch the block at address A and send it to its
home directory; invalidate the block in the cache |
|
Data value reply: Home directory → Local cache (Data) |
|
Return a data value from the home memory (read
miss response) |
|
Data write-back: Remote cache → Home directory (A, Data) |
|
Write-back a data value for address A
(invalidate response) |
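|
Written as code (a sketch matching the table above), the message types and
their fields are simply |
|
/* Directory-protocol message types, matching the table above */
enum dir_msg_type {
    MSG_READ_MISS,         /* local cache  -> home directory : P, A        */
    MSG_WRITE_MISS,        /* local cache  -> home directory : P, A        */
    MSG_INVALIDATE,        /* home dir     -> remote caches  : A           */
    MSG_FETCH,             /* home dir     -> remote cache   : A           */
    MSG_FETCH_INVALIDATE,  /* home dir     -> remote cache   : A           */
    MSG_DATA_VALUE_REPLY,  /* home dir     -> local cache    : data        */
    MSG_DATA_WRITE_BACK    /* remote cache -> home directory : A, data     */
};

struct dir_msg {
    enum dir_msg_type type;
    int proc;              /* P: requesting processor, where relevant      */
    unsigned long addr;    /* A: block address, where relevant             */
    /* ... data payload for replies and write-backs ... */
}; |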
|
|
|
|
|
|
States identical to snoopy case; transactions
very similar. |
|
Transitions caused by read misses, write misses,
invalidates, data fetch requests |
|
Generates read miss & write miss msg to home
directory. |
|
Write misses that were broadcast on the bus for
snooping => explicit invalidate & data fetch requests. |
|
Note: on a write, the data written is smaller than a
cache block, so the full cache block must be read first |
|
|
|
|
State machine for CPU requests, for each memory
block |
|
A block is in the Invalid state if it is held
only in memory |
|
|
|
|
Same states & structure as the transition
diagram for an individual cache |
|
2 actions: update of directory state & send
msgs to satisfy requests |
|
Tracks all copies of memory block. |
|
Also indicates an action that updates the
sharing set, Sharers, as well as sending a message. |
|
|
|
|
State machine for Directory requests, for each
memory block |
|
A block is in the Uncached state if it is held
only in memory |
|
|
|
|
|
Message sent to directory causes two actions: |
|
Update the directory |
|
More messages to satisfy request |
|
Block is in Uncached state: the copy in memory
is the current value; only possible requests for that block are: |
|
Read miss: requesting processor is sent the data from
memory & the requestor is made the only sharing node; state of block made Shared. |
|
Write miss: requesting processor is sent the
value & becomes the Sharing node. The block is made Exclusive to
indicate that the only valid copy is cached. Sharers indicates the identity
of the owner. |
|
Block is Shared => the memory value is
up-to-date: |
|
Read miss: requesting processor is sent back the
data from memory & requesting processor is added to the sharing set. |
|
Write miss: requesting processor is sent the
value. All processors in the set Sharers are sent invalidate messages,
& Sharers is set to identity of requesting processor. The state of the
block is made Exclusive. |
|
|
|
|
|
Block is Exclusive: current value of the block
is held in the cache of the processor identified by the set Sharers (the
owner) => three possible directory requests: |
|
Read miss: the owner processor is sent a data fetch
message, causing the state of the block in the owner’s cache to transition to
Shared and causing the owner to send the data to the directory, where it is
written to memory & sent back to the requesting processor.
The identity of the requesting
processor is added to the set Sharers, which still contains the identity of the
processor that was the owner (since it still has a readable copy). State is made Shared. |
|
Data write-back: owner processor is replacing
the block and hence must write it back, making memory copy up-to-date
(the home directory essentially becomes the owner), the block is now
Uncached, and the Sharer set is empty. |
|
Write miss: block has a new owner. A message is
sent to old owner causing the cache to send the value of the block to the
directory from which it is sent to the requesting processor, which becomes
the new owner. Sharers is set to identity of new owner, and state of block
is made Exclusive. |
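|
Putting the three cases together, here is a compact sketch of the home node’s
handling of read and write misses, reusing the dir_entry and dir_msg sketches
above (illustrative only: it ignores the non-atomicity and buffering problems
discussed below, and assumes the data reply in the Exclusive read-miss case is
forwarded once the fetch returns) |
|
/* Handle a read or write miss arriving at the home directory for one block.
 * send(dst, msg) stands for placing a message on the interconnect. */
void home_handle_miss(struct dir_entry *e, struct dir_msg m,
                      void (*send)(int dst, struct dir_msg msg))
{
    switch (e->state) {
    case DIR_UNCACHED:                       /* memory copy is current       */
        send(m.proc, (struct dir_msg){ MSG_DATA_VALUE_REPLY, m.proc, m.addr });
        e->sharers = (uint64_t)1 << m.proc;  /* requester is the only sharer */
        e->state = (m.type == MSG_READ_MISS) ? DIR_SHARED : DIR_EXCLUSIVE;
        break;

    case DIR_SHARED:                         /* memory copy is up to date    */
        if (m.type == MSG_READ_MISS) {
            send(m.proc, (struct dir_msg){ MSG_DATA_VALUE_REPLY, m.proc, m.addr });
            add_sharer(e, m.proc);
        } else {                             /* write miss                   */
            for (int p = 0; p < MAX_PROCS; p++)  /* invalidate all sharers   */
                if (is_sharer(e, p) && p != m.proc)
                    send(p, (struct dir_msg){ MSG_INVALIDATE, p, m.addr });
            send(m.proc, (struct dir_msg){ MSG_DATA_VALUE_REPLY, m.proc, m.addr });
            e->sharers = (uint64_t)1 << m.proc;
            e->state = DIR_EXCLUSIVE;
        }
        break;

    case DIR_EXCLUSIVE: {                    /* owner's cache holds the data */
        int owner = 0;
        while (!is_sharer(e, owner))
            owner++;                         /* Sharers names the owner      */
        if (m.type == MSG_READ_MISS) {
            send(owner, (struct dir_msg){ MSG_FETCH, owner, m.addr });
            add_sharer(e, m.proc);           /* owner keeps a readable copy  */
            e->state = DIR_SHARED;           /* data goes back via the home  */
        } else {                             /* write miss: ownership moves  */
            send(owner, (struct dir_msg){ MSG_FETCH_INVALIDATE, owner, m.addr });
            e->sharers = (uint64_t)1 << m.proc;
            e->state = DIR_EXCLUSIVE;        /* requester becomes new owner  */
        }
        break;
    }
    }
} |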
|
|
|
|
|
|
|
|
|
|
|
We assume operations are atomic, but they are not;
reality is much harder; must avoid deadlock when we run out of buffers in the
network (see Appendix E) |
|
|
|
Optimizations: |
|
read miss or write miss in Exclusive: send data
directly to requestor from owner vs. 1st to memory and then from memory to
requestor |
|
|
|
|
|
Why Synchronize? Need to know when it is safe
for different processes to use shared data |
|
|
|
Issues for Synchronization: |
|
Uninterruptable instruction to fetch and update
memory (atomic operation); |
|
User level synchronization operation using this
primitive; |
|
For large scale MPs, synchronization can be a
bottleneck; techniques to reduce contention and latency of synchronization |
|
|
|
|
|
|
Atomic exchange: interchange a value in a
register for a value in memory |
|
0 => synchronization variable is free |
|
1 => synchronization variable is locked and
unavailable |
|
Set register to 1 & swap |
|
New value in register determines success in
getting lock |
|
0 if you succeeded in setting the lock (you were
first) |
|
1 if other processor had already claimed access |
|
Key is that exchange operation is indivisible |
|
Test-and-set: tests a value and sets it if the
value passes the test |
|
Fetch-and-increment: it returns the value of a
memory location and atomically increments it |
|
0 => synchronization variable is free |
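|
At user level these primitives are reached through the language’s atomic
operations; a small sketch using C11 <stdatomic.h> (not the hardware itself) |
|
#include <stdatomic.h>
#include <stdio.h>

atomic_int lock_var = 0;   /* 0 => synchronization variable is free        */
atomic_int counter  = 0;

int try_lock(void)
{
    /* atomic exchange: write 1, get the previous value back;
     * 0 back means we set the lock first, 1 means it was already claimed  */
    return atomic_exchange(&lock_var, 1) == 0;
}

int fetch_and_increment(atomic_int *p)
{
    /* returns the old value and atomically adds 1                          */
    return atomic_fetch_add(p, 1);
}

int main(void)
{
    printf("got lock: %d\n", try_lock());   /* 1: we were first             */
    printf("got lock: %d\n", try_lock());   /* 0: already locked            */
    printf("old counter value: %d\n", fetch_and_increment(&counter));
    return 0;
} |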
|
|
|
|
|
Hard to have read & write in 1 instruction:
use 2 instead |
|
Load linked (or load locked) + store conditional |
|
Load linked returns the initial value |
|
Store conditional returns 1 if it succeeds (no
other store to the same memory location since the preceding load) and 0 otherwise |
|
Example doing atomic swap with LL & SC: |
|
try:  mov  R3,R4     ; mov exchange value
      ll   R2,0(R1)  ; load linked
      sc   R3,0(R1)  ; store conditional
      beqz R3,try    ; branch store fails (R3 = 0)
      mov  R4,R2     ; put load value in R4 |
|
Example doing fetch & increment with LL
& SC: |
|
try:  ll   R2,0(R1)  ; load linked
      addi R2,R2,#1  ; increment (OK if reg–reg)
      sc   R2,0(R1)  ; store conditional
      beqz R2,try    ; branch store fails (R2 = 0) |
|
|
|
|
|
Spin locks: processor continuously tries to
acquire, spinning around a loop trying to get the lock |
|
        li   R2,#1
lockit: exch R2,0(R1)   ;atomic exchange
        bnez R2,lockit  ;already locked? |
|
|
|
What about MP with cache coherency? |
|
Want to spin on cache copy to avoid full memory
latency |
|
Likely to get cache hits for such variables |
|
Problem: exchange includes a write, which
invalidates all other copies; this generates considerable bus traffic |
|
Solution: start by simply repeatedly reading the
variable; when it changes, then try exchange (“test and test&set”): |
|
try:    li   R2,#1
lockit: lw   R3,0(R1)   ;load var
        bnez R3,lockit  ;not free=>spin
        exch R2,0(R1)   ;atomic exchange
        bnez R2,try     ;already locked? |
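|
The same “test and test&set” structure at source level, as a sketch using C11
atomics: spin on ordinary loads of the (cached) variable, and only attempt the
invalidating exchange once the lock looks free |
|
#include <stdatomic.h>

void spin_lock(atomic_int *lock)
{
    for (;;) {
        /* spin on a plain load: hits in our own cache, no bus traffic      */
        while (atomic_load(lock) != 0)
            ;                               /* not free => keep spinning    */
        /* looked free: now try the exchange (a write, so it invalidates)   */
        if (atomic_exchange(lock, 1) == 0)
            return;                         /* we got the lock              */
    }
}

void spin_unlock(atomic_int *lock)
{
    atomic_store(lock, 0);
} |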
|
|
|
|
|
What is consistency? When must a processor see
the new value? e.g., seems that |
|
P1:  A = 0;              P2:  B = 0;
     .....                    .....
     A = 1;                   B = 1;
L1:  if (B == 0) ...     L2:  if (A == 0) ... |
|
Impossible
for both if statements L1 & L2 to be true? |
|
What if write invalidate is delayed &
processor continues? |
|
Memory consistency models:
what are the rules for such cases? |
|
Sequential consistency: result of any execution
is the same as if the accesses of each processor were kept in order and the
accesses among different processors were interleaved => assignments
before ifs above |
|
SC: delay all memory accesses until all
invalidates done |
|
|
|
|
|
Weak consistency schemes offer faster execution
than sequential consistency |
|
Not really an issue for most programs;
they are synchronized |
|
A program is synchronized if all access to
shared data are ordered by synchronization operations |
|
write(x)
  ...
release(s)   {unlock}
  ...
acquire(s)   {lock}
  ...
read(x) |
|
Only those programs willing to be
nondeterministic are not synchronized: “data race”: outcome f(proc. speed) |
|
Several Relaxed Models for Memory Consistency
since most programs are synchronized; characterized by their attitude
towards: RAR, WAR, RAW, WAW
to different addresses |
|
|
|
|
Caches contain all information on state of
cached memory blocks |
|
Snooping and Directory Protocols similar; bus
makes snooping easier because of broadcast (snooping => uniform memory
access) |
|
Directory has extra data structure to keep track
of state of all cache blocks |
|
Distributing directory => scalable shared
address multiprocessor
=> Cache coherent, Non uniform memory access |
|
|
|
|
Protocol Basics |
|
S3.MP uses distributed singly-linked sharing
lists, with static homes |
|
|
|
Each line has a “home” node, which stores
the root of the directory |
|
|
|
Requests are sent to the home node |
|
|
|
Home either has a copy of the line, or knows a node
which does |
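|
A sketch of the data structures this implies (names and field widths are
illustrative, not S3.MP’s actual formats): the home keeps the list head for
each of its lines, and every cached copy carries a link to the next node in
the sharing list |
|
#include <stdint.h>

#define NO_NODE (-1)

/* Directory root, held at the line's (static) home node */
struct home_entry {
    int16_t head;    /* node id of the first copy in the sharing list, or NO_NODE */
    uint8_t dirty;   /* set if the head node holds the line exclusively           */
};

/* Per-copy state at each node currently holding the line */
struct remote_entry {
    int16_t next;    /* next node in the singly-linked sharing list, or NO_NODE   */
    uint8_t valid;
}; |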
|
|
|
|
Simple case: initially only the home has the
data: |
|
|
|
|
More interesting case: some other processor has
the data |
|
|
|
|
|
|
|
|
|
|
|
|
|
Home passes request to first processor in chain,
adding requester into the sharing list |
|
|
|
|
|
If the line is exclusive (i.e. dirty bit is set)
no message is required |
|
Else send a write-request to the home |
|
Home sends an invalidation message down the chain |
|
Each copy is invalidated (other than that of the
requester) |
|
Final node in chain acknowledges the requester and
the home |
|
Chain is locked for the duration of the invalidation |
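|
In outline (a sketch reusing the structures above, with hypothetical
send_invalidate/send_ack callbacks standing in for interconnect messages), the
invalidation is forwarded node to node down the sharing chain and the last
node acknowledges |
|
/* Invalidation step executed at one node of the sharing chain.  Each node
 * discards its copy (unless it is the requester), hands the message to its
 * successor, and the final node sends the acknowledgement.  The home then
 * records the requester as the exclusive holder. */
void forward_invalidate(int node, int requester,
                        struct remote_entry table[],  /* per-node state for this
                                                         line (distributed in
                                                         reality)             */
                        void (*send_invalidate)(int next_node, int requester),
                        void (*send_ack)(int requester))
{
    if (node != requester)
        table[node].valid = 0;            /* discard this copy                 */

    int next = table[node].next;
    table[node].next = NO_NODE;           /* node leaves the sharing list      */

    if (next != NO_NODE)
        send_invalidate(next, requester); /* pass the message down the chain   */
    else
        send_ack(requester);              /* final node in chain acknowledges  */
} |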
|
|
|
|
When a read or write requires a line to be
copied into the cache from another node, an existing line may need to be replaced |
|
Must remove it from the sharing list |
|
Must not lose last copy of the line |
|
|
|
|
|
How does a CPU find a valid copy of a specified
address’s data? |
|
Translate virtual address to physical |
|
Physical address includes bits which identify
“home” node |
|
Home node is where DRAM for this address resides |
|
But current valid copy may not be there – may be
in another CPU’s cache |
|
Home node holds pointer to sharing chain, so
always knows where valid copy can be found |
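|
As an illustration (the field widths here are invented, not S3.MP’s actual
address layout), recovering the home node from a physical address is just a
shift and a mask |
|
#include <stdio.h>
#include <stdint.h>

/* Hypothetical layout: [ home node : 6 bits ][ offset within node : 30 bits ] */
#define NODE_BITS   6
#define OFFSET_BITS 30

static unsigned home_node(uint64_t paddr)
{
    return (unsigned)((paddr >> OFFSET_BITS) & ((1u << NODE_BITS) - 1));
}

int main(void)
{
    uint64_t paddr = ((uint64_t)5 << OFFSET_BITS) | 0x1234;  /* node 5 example */
    printf("address 0x%llx has home node %u\n",
           (unsigned long long)paddr, home_node(paddr));
    return 0;
} |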
|
|
|
|
|
|
S3.MP’s cache coherency protocol implements
strong consistency |
|
Many recent designs implement a weaker
consistency model… |
|
S3.MP uses a singly-linked sharing chain |
|
Widely-shared data – long chains – long
invalidations, nasty replacements |
|
“Widely shared data is rare” |
|
In real life: |
|
IEEE Scalable Coherent Interconnect (SCI):
doubly-linked sharing list |
|
SGI Origin 2000: bit vector sharing list |
|
Real Origin 2000 systems in service with 256
CPUs |
|
Sun E10000: hybrid multiple buses for
invalidations, separate switched network for data transfers |
|
Many E10000s in service, often with 64 CPUs |
|
|
|
|
|
COMA: cache-only memory architecture |
|
Data migrates into local DRAM of CPUs where it
is being used |
|
Handles very large working sets cleanly |
|
Replacement from DRAM is messy: have to make
sure someone still has a copy |
|
Scope for interesting OS/architecture hybrid |
|
System slows down if total memory requirement
approaches RAM available, since space for replication is reduced |
|
Examples: DDM, KSR-1/2, rumours from IBM… |
|
|
|
|
|
L1 cache is already very busy with CPU traffic |
|
L2 cache also very busy… |
|
L3 cache doesn’t always have the current value
for a cache line |
|
Although L1 cache is normally write-through, L2
is normally write-back |
|
Some data may bypass L3 (and perhaps L2) cache
(eg when stream-prefetched) |
|
In Power4, cache controller manages L2 cache –
all external invalidations/requests |
|
L3 cache improves access to DRAM for accesses
both from CPU and from network |
|
|
|
|
Caches are essential to gain the maximum performance
from modern microprocessors |
|
A cached memory system delivers performance close to that
of SRAM at a cost per byte close to that of DRAM |
|
Caches can be used to form the basis of a parallel
computer |
|
Bus-based multiprocessors do not scale well: max
< 10 nodes |
|
Larger-scale shared-memory multiprocessors require
more complicated networks and protocols |
|
CC-NUMA is becoming popular since systems can be
built from commodity components (chips, boards, OSs) and use existing
software |
|
e.g. HP/Convex, Sequent, Data General, SGI, Sun,
IBM |
|