A Lazy, Self-Optimising Parallel Matrix Library MSc Thesis Department of Computing, Imperial College of Science, Technology and Medicine, London, U.K. Olav Beckmann September 1996 This project describes a methodology for parallelising numerical programs by means of calls to a library of parallel implementations for a number of common computationally expensive operations. Such a library can provide an attractive alternative to a full-scale parallelising compiler, since users can keep both the structure of their program and the compiler they are using unchanged, and instead merely have to link with the parallel library. A naive implementation of such a library would have a number of significant problems: since we cannot, from the library implementor's point of view, analyse the user program, we would normally for each library call have to distribute the operands and collect the result, with no scope for optimising data distributions over a sequence of operations. This project uses lazy evaluation to address these problems. Calls to library functions are initially suspended and a recipe for the result is stored instead. When a value is forced, e.g. because of output or a conditional test, we can optimise the execution of the DAG which has been built up. Work has been done on this topic in a previous project. The main work of this project has been in the development of a methodology for representing data distributions which has the following key characteristics: * A wide range of regular data distributions, including all those available in HPF, can be represented. * Redistributions have essentially the same format as distributions, we can calculate the required redistribution from any two distributions, and the composition of a distribution with any redistribution is always a valid new distribution. * A method is presented for deriving the code for a contention-free implementation of distributions and redistributions from their mathematical representations. The design of the `lazy' runtime machine is described, and an algorithm for optimising data placements is given. An experimental implementation demonstrates the practical working of the data distributions method described and shows the performance benefit of lazy data movement.