Graham Markall
Research Statement:
Optimal finite element solvers for different targets (multicore CPUs, GPUs, etc.) require different data structures and algorithms, each chosen to suit the characteristics of the target hardware. This complicates development and maintenance: each time a new architecture is targeted, much of the solver must be rewritten from scratch. One solution to this problem is to use a domain-specific language (DSL), which enables high-performance code to be generated from maintainable sources.
The Unified Form Language (UFL) (from the FEniCS project) is one such DSL, which provides a high level of abstraction and eliminates many of the time-consuming and error-prone tasks required when developing a solver in a low-level language. Since the UFL representation of finite element assembly is independent of low-level implementation details, it allows the generation of high-performance code for multiple targets from a single source representation.
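To make concrete the kind of low-level detail that a UFL description hides, here is a minimal sketch of finite element assembly for the simplest case, assumed here purely for illustration: the 1D Laplace operator discretised with linear elements, with boundary conditions omitted. Assembly computes a small local matrix for every element and then adds each local matrix into the global sparse matrix. The function names and the dictionary-of-keys storage are illustrative choices, not taken from UFL or Fluidity.

```python
from collections import defaultdict

def local_stiffness_p1(x0: float, x1: float) -> list:
    """Element stiffness matrix of -u'' for linear (P1) elements on [x0, x1]."""
    h = x1 - x0
    return [[1.0 / h, -1.0 / h],
            [-1.0 / h, 1.0 / h]]

def assemble(coords: list, elements: list) -> dict:
    """Assemble the global stiffness matrix as a {(row, col): value} mapping.

    coords   -- x coordinate of each mesh node
    elements -- pair of node indices for each element
    """
    # Local assembly: every element is independent of the others.
    local = [local_stiffness_p1(coords[a], coords[b]) for a, b in elements]

    # Global assembly: scatter-add each local matrix into the global matrix.
    # Elements that share a node contribute to the same global entries.
    A = defaultdict(float)
    for nodes, Ke in zip(elements, local):
        for i, row in enumerate(nodes):
            for j, col in enumerate(nodes):
                A[(row, col)] += Ke[i][j]
    return dict(A)

# Two elements of width 0.5 on [0, 1]: node 1 receives contributions from both.
A = assemble([0.0, 0.5, 1.0], [(0, 1), (1, 2)])
for key in sorted(A):
    print(key, A[key])
```

Local assembly is independent per element and maps naturally onto one GPU thread per element. The global scatter-add is where neighbouring elements write to the same entries, so a parallel implementation needs something like atomic updates or a colouring of the elements to avoid write conflicts.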
My research involves using UFL as a starting point for generating optimised GPU implementations of finite element solvers, by transforming UFL code into CUDA and/or OpenCL code. I am investigating the effects of different choices of data layouts and implementation algorithms on the performance of the resulting code. The goal of these investigations is to produce a set of tools that can automatically generate high-performance code by choosing the optimal data format and algorithms based on a model of the target hardware and the parameters of the given problem. These tools will be used to integrate UFL sources into Fluidity, a finite element computational fluid dynamics code, and will benefit its development by:
- making development easier and faster by allowing developers to implement new discretisations quickly,
- allowing aggressive exploitation of future architectures,
- and increasing the efficiency of development by separating the concerns of computational scientists and software engineers.
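The idea of transforming one high-level description into CUDA and OpenCL can be pictured with a deliberately tiny example. Everything below is hypothetical: the expression classes and backend functions (`Var`, `Const`, `BinOp`, `cuda_backend`, `opencl_backend`) are invented for illustration and are not UFL or part of the actual tools. The sketch only shows a single source expression being lowered to kernel source for two targets, so that supporting a new target means adding a backend.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical miniature expression language standing in for a high-level
# specification. It is NOT UFL; it only has enough structure to show one
# source representation being lowered to two different targets.

@dataclass(frozen=True)
class Var:
    name: str            # an input array, read at the current index

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class BinOp:
    op: str              # "+", "-", "*" or "/"
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Var, Const, BinOp]

def emit_expr(e: Expr, idx: str) -> str:
    """Render an expression in C syntax (shared by CUDA C and OpenCL C)."""
    if isinstance(e, Var):
        return f"{e.name}[{idx}]"
    if isinstance(e, Const):
        return repr(float(e.value))
    return f"({emit_expr(e.lhs, idx)} {e.op} {emit_expr(e.rhs, idx)})"

def input_names(e: Expr) -> list:
    """Sorted names of the arrays an expression reads."""
    if isinstance(e, Var):
        return [e.name]
    if isinstance(e, Const):
        return []
    return sorted(set(input_names(e.lhs)) | set(input_names(e.rhs)))

def cuda_backend(name: str, out: str, e: Expr) -> str:
    """Emit a CUDA kernel with one thread per array entry."""
    params = ["int n", f"double *{out}"] + [f"const double *{v}" for v in input_names(e)]
    return (f"__global__ void {name}({', '.join(params)}) {{\n"
            "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
            f"  if (i < n) {out}[i] = {emit_expr(e, 'i')};\n"
            "}\n")

def opencl_backend(name: str, out: str, e: Expr) -> str:
    """Emit an OpenCL kernel (assumes a device with double-precision support)."""
    params = (["int n", f"__global double *{out}"]
              + [f"__global const double *{v}" for v in input_names(e)])
    return (f"__kernel void {name}({', '.join(params)}) {{\n"
            "  int i = get_global_id(0);\n"
            f"  if (i < n) {out}[i] = {emit_expr(e, 'i')};\n"
            "}\n")

# One source expression, r = 2*u + f, lowered to both targets.
expr = BinOp("+", BinOp("*", Const(2), Var("u")), Var("f"))
print(cuda_backend("update", "r", expr))
print(opencl_backend("update", "r", expr))
```

Both backends share `emit_expr`; only the kernel signature and the thread-index computation differ, which is the part a new backend has to supply. A realistic generator would also have to choose data layouts and loop structure for each target, which this sketch ignores.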
Publications:
- M. B. Giles, P. H. J. Kelly, G. R. Markall, G. R. Mudalige and Z. Sharif. Performance Analysis of the OP2 Multi-Layer Abstraction Framework on Many-Core Architectures. 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS '10), held as part of SC'10. November 2010, New Orleans, LA, USA.
- G. R. Markall, D. A. Ham and P. H. J. Kelly. Towards generating optimised finite element solvers for GPUs from high-level specifications, to appear in Proceedings of the 10th International Conference on Computational Science (ICCS 2010). Amsterdam, Netherlands. June 2010.
Presentations:
- Experiments in Unstructured Mesh Finite Element CFD Using CUDA
- Presented at the 1st CUDA Developers' Conference. Oxford, UK. December 2009.
- Award: Best student presentation.
- GPU Acceleration of Finite Element Computations
- Presented at Imperial College's Accelerated Computing meeting, June 29 2009.
- A sparse conjugate gradient solver implemented in CUDA is benchmarked in a finite element test problem and evaluated against PETSc solving the same problem. The results show that the solver is as accurate as PETSc and converges an order of magnitude faster on typical configurations. The talk will describe how the solver is implemented and how it may be integrated into Fortran and C programs. The source code of the solver will be made available; it can be used as a drop-in replacement for other conjugate gradient solvers, making it easier to exploit GPUs in suitable applications.
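The solver described above was written in CUDA and its source is not reproduced here. As a hedged sketch of the algorithm itself, the following is plain, unpreconditioned conjugate gradient on a matrix stored in compressed sparse row (CSR) form. Whether the original solver used CSR storage or a preconditioner is not stated, so both are assumptions. The test matrix is a small 1D Laplacian, chosen because its solution is easy to check.

```python
from math import sqrt

def csr_matvec(indptr, indices, data, x):
    """Return y = A x, with A stored in compressed sparse row (CSR) form."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix.

    Returns (x, iterations). Stops when the 2-norm of the residual is below tol.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = dot(r, r)
    for it in range(max_iter):
        if sqrt(rs_old) < tol:
            return x, it
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # makes the next direction A-conjugate to p
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

# 4x4 1D Laplacian: 2 on the diagonal, -1 on the off-diagonals.
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
b = [1.0, 0.0, 0.0, 1.0]             # chosen so that the exact solution is all ones
x, iterations = conjugate_gradient(indptr, indices, data, b)
print(x, iterations)
```

Each iteration is dominated by one sparse matrix-vector product and two dot products, so these are the operations a GPU version would implement as parallel kernels; the scalar updates of `alpha` and `beta` are trivial by comparison.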
Reports:
- Making Faster FEM Solvers, Faster
- My MPhil transfer report, submitted in July 2010.
- MSc Project
- Supervisor: Paul Kelly. Fluidity (developed by the Applied Modelling & Computation Group in the Department of Earth Science & Engineering) is a general-purpose computational fluid dynamics code that uses the finite element method to solve the Navier-Stokes equations on adaptive unstructured meshes. My MSc project was a pilot study into accelerating the assembly of large, sparse systems of equations using multicore architectures. I produced CUDA versions of the assembly phase of two test problems, giving a speedup of almost an order of magnitude over a multicore CPU. Since rewriting code for each new multicore architecture is labour-intensive, I also produced a compiler that generates CUDA code from a high-level specification of the method written in the Unified Form Language. Targeting a new architecture then only requires writing a new backend for the UFL compiler and recompiling existing code.
- 1st ISO
- Supervisor: Paul Kelly. I worked on a sparse conjugate gradient solver for NVIDIA GPUs. This solver was compared against the PETSc solver in a finite element test problem that solves Laplace's equation. An NVIDIA GeForce GTX 280 GPU showed a speedup of up to 10 times over one core of a 3 GHz Intel Core 2 Duo when solving the systems generated by the test problem.
- Undergraduate Project
- Supervisor: Andy Nisbet. I collected value profiles for a subset of the MiBench benchmark suite, executing both inside the LLVM interpreter and natively on the x86 architecture. The inputs and outputs of instructions and the values transferred across the data bus were recorded. These profiles were used to guide the design of a cache that stores the outputs of computations, and of an encoding scheme that reduces switching activity on the data bus.
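The encoding scheme designed in the project is not described here. To illustrate the general idea of reducing switching activity, the sketch below implements bus-invert coding, a well-known technique swapped in for this example rather than the project's own scheme: when sending a word would toggle more than half of the bus lines, the inverted word is sent instead and an extra invert line is raised so the receiver can undo it. The 8-bit bus width and the all-zeros initial bus state are assumptions.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def bus_invert_encode(values, width=8):
    """Bus-invert coding: returns (words driven on the bus, invert-line values)."""
    mask = (1 << width) - 1
    prev = 0                          # assumed initial bus state: all zeros
    words, flags = [], []
    for v in values:
        v &= mask
        if hamming(v, prev) > width // 2:
            word, inv = ~v & mask, 1  # inverting toggles fewer data lines
        else:
            word, inv = v, 0
        words.append(word)
        flags.append(inv)
        prev = word
    return words, flags

def bus_invert_decode(words, flags, width=8):
    """Recover the original values by undoing the inversion where flagged."""
    mask = (1 << width) - 1
    return [~w & mask if f else w for w, f in zip(words, flags)]

def count_toggles(words, flags=None):
    """Total line toggles from all-zero lines; includes the invert line if given."""
    total, prev_w, prev_f = 0, 0, 0
    for k, w in enumerate(words):
        total += hamming(w, prev_w)
        prev_w = w
        if flags is not None:
            total += flags[k] ^ prev_f
            prev_f = flags[k]
    return total

values = [0x00, 0xFF, 0x00, 0xFF]     # worst case for an unencoded bus
words, flags = bus_invert_encode(values)
assert bus_invert_decode(words, flags) == values
print(count_toggles(values), count_toggles(words, flags))  # prints: 24 3
```

Given value profiles of bus traffic like those described above, one could count toggles with and without such an encoding to estimate the saving on real workloads.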
ClearSpeed CSX600 Driver for Linux Kernel >=2.6.24
Changes to the API for managing scatterlists in kernel 2.6.24 prevent the ClearSpeed drivers from compiling with 2.6.24 and later kernels. I have modified the driver to reflect these changes and made the modified version available.
Previous Study:
- MSc. Advanced Computing, Imperial College London, 2008-2009
- BSc. (Hons) Computer Science, Manchester Metropolitan University, 2005-2008
Other Interests:
- I am a trustee of Ath Welak, a charity which helped to rebuild homes and businesses in Sri Lanka following the 2004 tsunami.
- I have been elected chair of the Imperial College Linux Users' Society for the academic years 2009-2010 and 2010-2011.
Email: grm08 A doc.ic.ac.uk (Replace A with "at").