The Semantics of Shared Memory in Intel CPU/FPGASystems

Abstract

Heterogeneous CPU/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory,are becoming popular in several computing sectors. In this paper, we study the shared-memory semanticsof these devices, with a view to providing a firm foundation for reasoning about the programs that run onthem. We focus on Intel platforms that combine an Intel FPGA with a multicore Xeon CPU. We describe theweak-memory behaviours that are allowed (and observable) on these devices when CPU threads and an FPGAthread access common memory locations in a fine-grained manner through multiple channels. Some of thesebehaviours are familiar from well-studied CPU and GPU concurrency; others are weaker still. We encodethese behaviours in two formal memory models: one operational, one axiomatic. We develop executableimplementations of both models, using the CBMC bounded model-checking tool for our operational modeland the Alloy modelling language for our axiomatic model. Using these, we cross-check our models againsteach other via a translator that converts Alloy-generated executions into queries for the CBMC model. Wealso validate our models against actual hardware by translating 583 Alloy-generated executions into litmustests that we run on CPU/FPGA devices; when doing this, we avoid the prohibitive cost of synthesising ahardware design per litmus test by creating our own ‘litmus-test processor’ in hardware. We expect that ourmodels, one of which has been deemed ‘definitive’ by a Senior Principal Engineer at Intel, will be useful forlow-level programmers, compiler writers, and designers of analysis tools. Indeed, as a demonstration of theutility of our work, we use our operational model to reason about a producer/consumer buffer implementedacross the CPU and the FPGA. When the buffer uses insufficient synchronisation – a situation that our modelis able to detect – we observe that its performance improves at the cost of occasional data corruption.

Date