Graham Markall

PhD student in the Department of Computing, Imperial College London.
Member of the Software Performance Optimisation Group.
Supervisors: Paul Kelly and David Ham.

Research Statement:

Optimal implementations of finite element solvers for different targets (multicore CPUs, GPUs, etc.) require data structures and algorithms suited to the characteristics of the target hardware. This complicates development and maintenance: each time a new architecture is targeted, much of the solver must be rewritten from scratch. One solution to this problem is to use a domain-specific language (DSL), which enables the generation of high-performance code from maintainable sources.

The Unified Form Language (UFL) (from the FEniCS project) is one such DSL, which provides a high level of abstraction and eliminates many of the time-consuming and error-prone tasks required when developing a solver in a low-level language. Since the UFL representation of finite element assembly is independent of low-level implementation details, it allows the generation of high-performance code for multiple targets from a single source representation.
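To give a sense of the low-level detail that a form language hides, the following is a minimal hand-written sketch of finite element assembly for the 1D Laplace operator with piecewise-linear elements on a uniform mesh. In UFL the same operator is written as a single expression along the lines of `inner(grad(u), grad(v))*dx`. The function name `assemble_laplace_1d` and the dense list-of-lists storage are assumptions made for illustration only; a real solver would use a sparse format chosen to suit the target hardware.

```python
def assemble_laplace_1d(num_elements, length=1.0):
    """Assemble the global stiffness matrix for the 1D Laplace operator.

    Hypothetical helper for illustration: uniform mesh on [0, length],
    piecewise-linear elements, dense storage for readability.
    """
    h = length / num_elements
    num_nodes = num_elements + 1

    # Local stiffness matrix of one linear element: (1/h) * [[1, -1], [-1, 1]]
    local = [[1.0 / h, -1.0 / h],
             [-1.0 / h, 1.0 / h]]

    K = [[0.0] * num_nodes for _ in range(num_nodes)]
    for e in range(num_elements):
        nodes = (e, e + 1)  # global node numbers of element e
        # Scatter-add the local contributions into the global matrix
        for a in range(2):
            for b in range(2):
                K[nodes[a]][nodes[b]] += local[a][b]
    return K
```

The element loop, the local-to-global node map and the storage format are exactly the implementation choices that differ between targets, which is why keeping them out of the source representation is useful.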

My research involves using UFL as a starting point for generating optimised GPU implementations of finite element solvers, by transforming UFL code into CUDA and/or OpenCL code. I am investigating the effects of different choices of data layouts and implementation algorithms on the performance of the resulting code. The goal of these investigations is to produce a set of tools that can automatically generate high-performance code by choosing the optimal data format and algorithms based on a model of the target hardware, and the parameters of the given problem. These tools will be used to integrate UFL sources into Fluidity, and will have an impact on its development by:


Publications:


Presentations:

Experiments in Unstructured Mesh Finite Element CFD Using CUDA
Presented at the 1st CUDA Developers' Conference. Oxford, UK. December 2009.
Award: Best student presentation.
GPU Acceleration of Finite Element Computations
Presented at Imperial College's Accelerated Computing meeting, June 29 2009.
A sparse conjugate gradient solver implemented in CUDA is benchmarked on a finite element test problem and compared with PETSc solving the same problem. The results show that the solver matches the accuracy of PETSc and converges an order of magnitude faster on typical configurations. A description of the implementation of the solver and how it may be integrated into Fortran and C programs will be given. The source code for the solver will be made available and can be used as a drop-in replacement for other conjugate gradient solvers, to facilitate exploitation of GPUs in suitable applications.
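For readers unfamiliar with the method named in the abstract above, here is a minimal CPU-side sketch of an unpreconditioned conjugate gradient iteration on a sparse matrix. It is not the CUDA solver described in the talk. The compressed sparse row (CSR) storage, the function names and the default tolerance are all assumptions chosen for illustration.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    if rs_old ** 0.5 < tol:
        return x, 0
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, it
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter
```

The sparse matrix-vector product and the dot products dominate each iteration, so they are the natural candidates for GPU parallelisation.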

Reports:

Making Faster FEM Solvers, Faster
My MPhil transfer report, submitted in July 2010.
MSc Project
Supervisor: Paul Kelly. Fluidity (developed by the Applied Modelling & Computation Group in the Department of Earth Science & Engineering) is a general-purpose computational fluid dynamics code that uses the finite element method to solve the Navier-Stokes equations on adapting unstructured meshes. My MSc project was a pilot study into accelerating the assembly of large, sparse systems of equations using multicore architectures. CUDA versions of the assembly phase of two test problems were produced, resulting in almost an order of magnitude speedup over a multicore CPU. Since rewriting code for each multicore architecture is a labour-intensive process, a compiler was produced that generates CUDA code from a high-level specification of the method written in the Unified Form Language. Targeting a new architecture then only requires writing a new backend for the UFL compiler and recompiling existing code.
1st ISO
Supervisor: Paul Kelly. I worked on a sparse conjugate gradient solver for NVIDIA GPUs. This solver was compared against the PETSc solver in a finite element test problem which solves a Laplacian equation. The NVIDIA GeForce GTX 280 GPU showed a speedup of up to 10 times over one core of a 3 GHz Intel Core 2 Duo when solving systems generated by the test problem.
Undergraduate Project
Supervisor: Andy Nisbet. I collected value profile data for a subset of the MiBench benchmark suite running inside the LLVM interpreter and on the x86 architecture. The inputs and outputs of instructions and the values transferred across the data bus were recorded. These profiles were used to guide the design of a cache that stores the outputs of computations, and an encoding scheme to reduce switching activity on the data bus.
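As a small illustration of what reducing switching activity on a data bus means in the undergraduate project above, the sketch below counts bit transitions between consecutive bus words and applies bus-invert coding, a well-known textbook technique. It is not necessarily the encoding scheme designed in the project. The function names, the 8-bit default width and the assumed all-zero initial bus state are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def switching_activity(words):
    """Total bit transitions between consecutive words on the bus."""
    return sum(hamming(p, c) for p, c in zip(words, words[1:]))


def bus_invert_encode(words, width=8):
    """Return (sent_word, invert_bit) pairs using bus-invert coding.

    A word is sent inverted when that flips fewer data lines than
    sending it unchanged. The bus is assumed to start at all zeros.
    """
    mask = (1 << width) - 1
    encoded = []
    prev = 0
    for w in words:
        if hamming(prev, w) > width // 2:
            sent, inv = (~w) & mask, 1
        else:
            sent, inv = w, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (sent_word, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(s ^ mask) if inv else s for s, inv in encoded]


def encoded_switching_activity(encoded):
    """Transitions on the data lines plus the extra invert line."""
    data = switching_activity([s for s, _ in encoded])
    invert_line = switching_activity([inv for _, inv in encoded])
    return data + invert_line
```

Value profiles such as those collected in the project indicate which bit patterns are common on the bus, which is the information needed to judge whether a scheme of this kind pays for its extra control line.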

ClearSpeed CSX600 Driver for Linux Kernel >=2.6.24

Changes in the API for the management of scatterlists in kernel 2.6.24 (see this page for details) prevent the ClearSpeed drivers from compiling with these kernel versions. I have modified the driver to reflect these changes, and have made it available here.


Previous Study:


Other Interests:


Email: grm08 A doc.ic.ac.uk (Replace A with "at").
