Index of Project Proposals

  1. Compile-time virtual memory by Dr. Paul Kelly

    Compile-time virtual memory

    Idea: Modify the GNU C compiler to handle address translation in software. Then look at ways of optimising this overhead out.

    Motivation: Address translation is an extremely complicated part of CPU design. It is a common source of errors, and it has a pernicious effect on performance.

    The performance impact comes firstly from the time taken to look every address up in the TLB (translation lookaside buffer), a hardware cache of page-table entries. This cost is normally small, and often zero, since the lookup can be done in parallel with access to the primary cache.

    However, there are other performance penalties. For example, when a page fault occurs, a conventional operating system traps to a generic fault handler and saves the faulting program's state. If we handle translation in software, the compiler can instead generate optimised code for the specific point where the fault occurs. There are also penalties arising from the way address translation interacts with the caches.

    Another interesting aspect of this is user-specified interrupt handlers. In a conventional OS this is very hard indeed to do fast: you need to switch into the user's address space in the handler, and then you run the risk of getting a page fault. With software address translation both these problems disappear.

    Finally, a system with software-managed virtual memory can give the user program complete control over its memory resource allocation. For example, it can choose appropriate page sizes (not necessarily constant), appropriate page table data structures, and it can schedule its work to use RAM-resident data while waiting for a disk access to be satisfied.

    The problems:

    The job: Modify the GNU C compiler so that every time a pointer is used, address translation is done first (this has already been done in an earlier project, see below). Establish a suite of benchmark programs, and evaluate the performance of the resulting code. Go back and look at ways of improving this performance by improving gcc's optimisation of your code. Study applications.
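
    To make the job concrete, here is a minimal sketch, entirely my own illustration, of the kind of check the compiler could emit at every pointer dereference; the flat page table, the page size and all the names are assumptions:

        /* Hypothetical sketch of software address translation inserted at
           every pointer use.  Page size, table layout and names are all
           illustrative assumptions, not taken from the earlier project. */
        #include <stdint.h>

        #define PAGE_BITS 12                      /* assume 4KB pages */
        #define PAGE_SIZE ((uintptr_t)1 << PAGE_BITS)

        extern char *page_table[];           /* virtual page number -> frame base */
        void handle_fault(uintptr_t vaddr);  /* generated per fault site */

        static inline char *translate(uintptr_t vaddr)
        {
            char *frame = page_table[vaddr >> PAGE_BITS];
            if (frame == 0) {                /* page not resident */
                handle_fault(vaddr);         /* site-specific, optimised */
                frame = page_table[vaddr >> PAGE_BITS];
            }
            return frame + (vaddr & (PAGE_SIZE - 1));
        }

        /* The compiler would rewrite      *p = x;
           into something with the effect of
                                           *(int *)translate((uintptr_t)p) = x;  */

    The interesting work is then teaching gcc to hoist and combine such translate calls, for example across loop iterations, so that most of the overhead disappears.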

    Reading:

    Richard Jones's project report.

    Various research papers, e.g. Application-level virtual memory.

    Equipment: Unix workstation.

    Recommended tools: Unix, C, GNU C compiler.

    Suitability: This is a research-level project with enormous potential scope. The basic prerequisites are 1) insight into performance, architecture and applications issues, 2) the practical ability to get complicated software to do what you want, and 3) the imagination and clarity of thought to design good experiments.

  2. C Compiler back-end generating Java Byte Code by Dr. Paul Kelly

    C Compiler back-end generating Java Byte Code

    Idea: Modify the Gnu C compiler (which is designed to be easily retargeted) to generate code for the Java Virtual Machine.

    Motivation: To run as an applet in a Java-enabled web browser, a program must be compiled into Java byte code; at the moment, the only way to get Java byte code is to write your program in Java and compile it. The idea of this project is to translate C into Java byte code, so that all kinds of ordinary C applications can be downloaded as Java applets.
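
    Since the JVM has no pointers or pointer arithmetic, one plausible model (my assumption, not part of the proposal) is to compile C's memory into a single flat array, with pointers becoming integer indices. In C terms, the generated code would behave like this:

        /* Sketch of a memory model a C-to-JVM back end might target: the
           whole C address space is one int array and a pointer is an index.
           Names and layout are illustrative only. */
        static int mem[1 << 20];      /* the simulated C memory            */
        static int sp = 1 << 20;      /* stack pointer, growing downwards  */

        /* C source:   int x = 5;  int *p = &x;  *p += 1;
           compiles to code whose effect is: */
        int example(void)
        {
            int x_addr = --sp;        /* allocate x on the simulated stack  */
            mem[x_addr] = 5;          /* x = 5                              */
            int p = x_addr;           /* p = &x: a pointer is just an index */
            mem[p] = mem[p] + 1;      /* *p += 1                            */
            return mem[x_addr];      /* yields 6 */
        }

    On the JVM each of these operations maps onto array bytecodes such as iaload and iastore, so once a model like this is fixed the translation is fairly mechanical.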

    The tricky bits:

    The job:

    Reading:

    See the Java Virtual Machine specification, and Tim Wilkinson's Kaffe interpreter.

    Equipment: Unix workstation.

    Recommended tools: Unix, C/C++, gcc, kaffe.

    Suitability: This is a research-level project with enormous potential scope. The basic prerequisites are 1) insight into performance, architecture and applications issues, 2) the practical ability to get complicated software to do what you want, and 3) the imagination and clarity of thought to design good experiments.

  3. Compiler for Skeleton-based Parallel Programming by Dr. Paul Kelly

    Compiler for Skeleton-based Parallel Programming

    The goal of this project is to write a compiler which automatically re-tunes a parallel program when it is ported to a new parallel machine.

    "Skeletons" are what make this possible. Skeletons encourage structured parallel programming, in which a parallel program is built from ready-made components for which efficient implementations are known to exist. These components may be ordinary parallel operations, or, more interestingly, they may be "skeletons" which can be filled in with user-code. The main issue in building a compiler is to use analytical performance models of the different skeletons used in a program in order to find the most efficient allocation of resources.

    Background: A fundamental problem in parallel programming is that the way one part of the program should be parallelised can depend on the parallelism in the rest of the program.

    The idea of skeletons is to get the programmer to write the program so that as much of the application's parallelism as possible is exposed in the form of calls to standard parallel operations/skeletons. Then the compiler's job is to decide how each parallel operation should be implemented (in parallel, sequentially, using which parallel implementation "template"), and, if in parallel, how data structures should be distributed and how processors should be allocated to the different parallel activities.

    To do this, the compiler must have an estimate of the execution time for each user-provided code fragment (written in C for example). It should also have a performance model of how each skeleton's implementation template performs on the given target architecture.

    The job:

    1. Design/select an input language. Although we should discuss it, I think the "SCL" language proposed by John Darlington and colleagues (see "Reading" below) would be more-or-less suitable, perhaps in simplified form. This is a data-parallel functional language which allows fragments of C to be included (actually I think I prefer an imperative language with expressions which can use the full power of a functional language).
    2. Write/find a small set of example programs to use to evaluate your work. Identify the minimum set of SCL operations needed to run them.
    3. For each of your SCL operations, design a parallel implementation scheme (called a "template") and write down a simple mathematical model of its expected performance. For some skeletons, it may make sense to have several different templates for different situations, but don't worry about that yet.
    4. Using the MPI (Message Passing Interface) standard for portable parallel programming, build quick and simple implementations of your templates (a sketch follows this list). Do some performance analysis to get parameters for your performance models.
    5. You should now be able to translate your example programs into implementations by hand, so you can get some results.
    6. Write an automatic translator to translate SCL into a C program which calls templates and user code fragments appropriately (note that you need to add monitoring code to gather execution times for C fragments).
    7. Add a resource allocation phase to your compiler. It should build a representation of the combined performance of the complete system, and should then use simple heuristics to make resource allocations which minimise the expected execution time.
    8. Evaluate your work by studying the performance of your system on your chosen benchmarks.
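
    To give a flavour of steps 3 and 4, here is my own sketch of what a simple "map" template and its (assumed) performance model might look like:

        /* Hypothetical "map" template: apply f to n doubles spread over p
           processes.  Assumed cost model, to be calibrated by measurement:
               T(n, p) ~ t_setup + 2 * t_comm(n) + (n / p) * t_f          */
        #include <mpi.h>
        #include <stdlib.h>

        void map_template(double (*f)(double), double *data, int n, MPI_Comm comm)
        {
            int rank, p, i, chunk;
            double *local;

            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &p);
            chunk = n / p;                     /* assume p divides n */
            local = malloc(chunk * sizeof *local);

            MPI_Scatter(data, chunk, MPI_DOUBLE,
                        local, chunk, MPI_DOUBLE, 0, comm);
            for (i = 0; i < chunk; i++)        /* the user's code fragment */
                local[i] = f(local[i]);
            MPI_Gather(local, chunk, MPI_DOUBLE,
                       data, chunk, MPI_DOUBLE, 0, comm);
            free(local);
        }

    The resource-allocation phase (step 7) then plugs measured values of t_f, t_comm and t_setup into such formulae and compares the predicted execution times of the candidate implementations.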

    Reading:

    Equipment: Unix workstation, Fujitsu AP1000.

    Recommended tools: Unix, C or C++, MPI (Message Passing Interface).

    Suitability: This is a challenging research-level project. The basic prerequisites are 1) basic compiler-writing skills, 2) insight into performance, architecture and applications issues, 3) some aptitude for manipulating analytical performance models (we're not talking about complicated maths here).

  4. Compiling short-word arithmetic by Dr. Paul Kelly

    Compiling short-word arithmetic

    Idea: A number of important performance-critical applications are characterised by the need for short-word arithmetic. These applications are poorly-served by existing compiler technology and, to some extent, CPU architectures. For example, although processors with 64-bit data paths and arithmetic units are now commonly available, it is very hard to use them to process eight 1-byte fixed-point numbers in parallel. Even when it is possible, the source code is very complex and machine-dependent.
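
    For instance, adding eight packed bytes using ordinary 64-bit arithmetic needs careful masking to stop carries leaking between lanes. A sketch of the well-known trick, assuming unsigned wrap-around lanes:

        /* SWAR sketch: add eight unsigned bytes packed into one 64-bit word,
           with no carry propagation between byte lanes. */
        #include <stdint.h>

        uint64_t add_8x8(uint64_t a, uint64_t b)
        {
            const uint64_t H = 0x8080808080808080ULL;  /* top bit of each byte */
            /* add the low 7 bits of each lane, then patch in the top bits */
            return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
        }

    Getting a compiler to generate this kind of code automatically from simple, portable source is exactly the gap the project addresses.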

    The idea of this project is to study compilation techniques, language design issues, and architectural features to support simple, portable programming of such applications which achieves the performance potential of wide data paths.

    Motivation: Examples of application areas where high-performance short-word operations are important include:

    The job:

    There are several ways to address this project. One way forward would be to base the work on the GNU gcc C compiler. This has the advantage of providing a freely-available, state-of-the-art compiler as a starting point.

    Reading:

    Various research papers, e.g.

    Equipment: Unix workstation.

    Recommended tools: GNU C compiler gcc, the SPIM simulator, the SPEC'95 benchmark suite.

    Suitability: This is a research-level project with enormous potential scope. The main prerequisites are 1) some intuition and experience thinking about computer systems performance, and 2) proven ability to make complicated software do what you want. Once you've got to grips with the practical work, the real opportunity is to invent clever mechanisms, to design clever, easy-to-do experiments to evaluate your ideas, and to interpret the results.

  5. Infinite precision integer arithmetic by Susan Eisenbach

    Infinite precision integer arithmetic

    Description: We currently teach the functional language Miranda, whose numbers are implemented in such a way that people doing work requiring accuracy can rely on the Miranda number system. Gofer is another functional language, whose number system is nowhere near as good (try (sqrt 2.0)^2 - 2, for example, on the Miranda system and on the Gofer system). There are many reasons why we might decide to change from Miranda to Gofer as a teaching language, and it would be a real pity to lose the Miranda precision. The aim of this project is to investigate the different ways of representing numbers and to implement your choice. The implementation can either be in Gofer itself or can alter the compiler itself.
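
    As a hint at the design space, one common representation stores an integer as a sign plus an array of "digits" in a large base. A sketch in C for concreteness, though the project itself would work in Gofer or inside its compiler:

        /* One arbitrary-precision representation: base-10000 digits,
           least significant first.  Layout and names are illustrative. */
        #include <stdlib.h>

        typedef struct {
            int  sign;      /* -1, 0 or +1     */
            int  ndigits;
            int *digit;     /* each in 0..9999 */
        } bignum;

        /* Schoolbook addition of magnitudes (a->ndigits >= b->ndigits). */
        void add_mag(const bignum *a, const bignum *b, bignum *r)
        {
            int i, s, carry = 0;
            r->ndigits = a->ndigits + 1;
            r->digit = calloc(r->ndigits, sizeof *r->digit);
            for (i = 0; i < a->ndigits; i++) {
                s = a->digit[i] + (i < b->ndigits ? b->digit[i] : 0) + carry;
                r->digit[i] = s % 10000;
                carry = s / 10000;
            }
            r->digit[i] = carry;
            r->sign = a->sign;
        }

    The investigation would weigh representations like this (and exact rationals built from them, or other schemes entirely) against the effort of implementing them in Gofer or its compiler.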

  6. Partial Evaluation for C by Dr. Paul Kelly

    Partial Evaluation for C

    Build a partial evaluator for C programs, i.e. a compiler (or preprocessor) which uses partial knowledge of the input to produce a more efficient program for repeated use with varying instances of the remaining input.

    Partial evaluation is the idea of a compiler which takes a program together with some of its inputs, and produces an optimised program ready to be applied to the remaining inputs.

    The classic application is to specialise an interpreter with respect to a fixed program: the residual program is, in effect, a compiled version of that program. In the interpreter case, you avoid parsing, building the abstract syntax tree, and repeatedly testing AST nodes (or bytecodes).
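
    The standard toy example gives the flavour. Suppose the known input is the exponent n: the partial evaluator specialises the general loop into straight-line code (a sketch, not the output of any particular tool):

        /* The general program: */
        int power(int x, int n)
        {
            int r = 1;
            while (n > 0) { r = r * x; n = n - 1; }
            return r;
        }

        /* The residual program after specialising with n = 3:
           the loop has been unrolled and n has disappeared. */
        int power_3(int x)
        {
            return x * x * x;
        }

    The hard part for C is deciding which computations depend only on the known inputs (binding-time analysis) in the presence of pointers and side effects.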

    We would have to discuss how to go about this project. One option would be to take an existing C compiler as a starting point (e.g. GNU gcc). Another would be to build a small and simple stand-alone prototype, perhaps in a very high level language such as SML.

    Equipment: Unix.

    Recommended tools: C, gcc.

    Suitability: This is a research-level project with enormous potential scope. The basic prerequisites are 1) insight into the theoretical background, 2) the practical ability to get complicated software to do what you want, and 3) the imagination and clarity of thought to design a simple and effective scheme.

  7. Wysiwyg LaTeX using Checkpointing by Dr. Paul Kelly

    Wysiwyg LaTeX using Checkpointing

    A project to build a What-You-See-Is-What-You-Get phototypesetting system based on the LaTeX package.

    LaTeX is a document-preparation and typesetting package which is widely used in academia. It produces exceptionally nice-looking results (especially for maths), is extremely reliable, and can handle full-size books as well as letters and articles.

    Unfortunately it is a little inconvenient to use: you prepare a "source" file in ASCII which describes the structure of your document, then "compile" it to produce an output file which you can view or print.

    This means there is a delay between typing the material in and seeing the final result. For a short document the delay is small, if annoying; for a book it can be frustratingly long.

    The idea of this project is to build a clever front-end which allows you to view the page you're working on practically instantly.

    How can this be done easily?

    The goal of this project is to reuse the LaTeX program more-or-less unchanged. This has the advantage that you inherit the reliability and quality of the existing LaTeX code.

    The trick is to use checkpointing so that the LaTeX program doesn't have to rerun from the beginning each time you want to view the output.

    That is:

    1. When you run LaTeX, stop every few pages and take a checkpoint of the program's state.
    2. When you edit the source code, make a note of the earliest byte of the input file that is changed.
    3. Find the latest checkpoint from the previous run, taken just before this changed byte was read (sketched in the code after this list).
    4. Rerun the program from there. (You may run it to completion, or stop it as soon as enough output has been generated.)
    5. Signal the viewer that it should reread the output file to display the current page with the changes.
    The upshot is that you can see the effect of a change to the input file on the current page with a limited amount of work, no matter how large the input file.
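
    Step 3 is a simple search over the recorded checkpoints. A sketch, assuming each checkpoint records how far into the source file LaTeX had read when it was taken:

        /* Choose the restart point: the latest checkpoint taken before the
           first changed byte of the input was read.  The input_offset field
           is assumed checkpoint metadata. */
        struct checkpoint { long input_offset; /* ... plus saved state ... */ };

        int find_restart(const struct checkpoint *cp, int n, long first_change)
        {
            int i, best = -1;
            for (i = 0; i < n; i++)
                if (cp[i].input_offset <= first_change &&
                    (best < 0 || cp[i].input_offset > cp[best].input_offset))
                    best = i;
            return best;   /* -1 means rerun from the very beginning */
        }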

    How can this be done efficiently?

    The feasibility of the scheme depends on collecting checkpoints lazily. Fortunately there are well-established techniques for checkpointing computations with minimal cost using virtual memory techniques. The basic idea is that when a checkpointing event occurs, the current data pages are protected against writing, and their identifiers are remembered in case the snapshot has to be reinstated. The client computation is then continued, but when a page fault occurs due to a write to a checkpointed page, a true copy of the page is made. The client process's memory map is modified so that the copy supersedes the read-only copy, and the client is allowed to continue with read-write access to the page.

    When the next checkpoint event occurs, many but not all of the read-only pages may have been copied. As before, the current pages are made read-only and their identifiers are remembered for restarting.

    The effect of this scheme is that time is spent copying only those pages which are actually changed between checkpoints. Furthermore, space is occupied only by those pages which are changed.
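
    On Unix this lazy copying can be driven by the memory-protection hardware. A rough sketch of the mechanism, with error handling and the actual page-saving omitted, and assuming the region is page-aligned (e.g. allocated with mmap):

        /* Copy-on-write checkpointing via mprotect and SIGSEGV (POSIX). */
        #include <signal.h>
        #include <stdint.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        static char  *region;         /* the data region being checkpointed */
        static size_t region_len;
        static long   pagesize;

        static void on_write_fault(int sig, siginfo_t *si, void *ctx)
        {
            char *page = (char *)((uintptr_t)si->si_addr
                                  & ~(uintptr_t)(pagesize - 1));
            /* ... save a private copy of this page into the checkpoint ... */
            mprotect(page, pagesize, PROT_READ | PROT_WRITE); /* resume the write */
        }

        void take_checkpoint(void)
        {
            /* Write-protect everything: the first write to each page faults,
               and only then is that page copied. */
            mprotect(region, region_len, PROT_READ);
        }

        void install_fault_handler(void)
        {
            struct sigaction sa;
            memset(&sa, 0, sizeof sa);
            sa.sa_flags = SA_SIGINFO;
            sa.sa_sigaction = on_write_fault;
            sigaction(SIGSEGV, &sa, NULL);
            pagesize = sysconf(_SC_PAGESIZE);
        }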

    A complete system

    I am sure that many user-interface designs are possible, but the basic service we can offer is as follows:

    A nice touch would be synchronised scrolling of the two windows --- so when you move around in the previewer, the text editor is adjusted to refer to the right area. This is easily done using the correlation between pages of the DVI file read by the previewer, pages output by TeX, and pages read by TeX.

    Recomputation should probably be performed only when some specific command is given: if it were triggered on every edit, the source would usually contain a syntax error mid-edit, and LaTeX would produce no output.

    Note that when starting the system up, snapshots need be collected relatively infrequently, perhaps every five or ten pages. We can thus keep the time invested in snapshotting small. Once recomputation is requested, more frequent snapshots should be kept to reduce latency.

    Equipment: Unix workstation.

    Recommended tools: Unix, C or C++, Tcl/TK, GNU Emacs, xdvi (sources), latex (sources), checkpointing package (e.g. as produced by project student Paul Walmsley in 1994).

  8. Extending the Java language by Susan Eisenbach and Sophia Drossopoulou

    Extending the Java language

    Description: Many expressive programming languages have a facility to generalise code for reuse: C++ has its templates, Ada has its generics, and functional programming languages have polymorphism. Java lacks any such feature, and the language designers at Sun are thinking of adding one to the language. We've heard a talk from Guy Steele, one of the Java language designers, entitled "Should Java have Parameterized Types?", given at a workshop on language design. What should they add? The project is to investigate the candidate features, covering how the code you would write looks, implementation issues, and possible interactions with the rest of the language.

  9. Is it safe to run Java applets? by Susan Eisenbach and Sophia Drossopoulou

    Is it safe to run Java applets?

    Description: When we click on a Java applet on a web page a byte code version of the Java program starts executing. This project is to look at what types of things can be sent down to your machine that might cause you problems.

  10. What is the millennium problem? by Susan Eisenbach

    What is the millennium problem?

    Description: There's lots of talk in the media about the disasters that will befall us when the year 2000 comes. What actually is likely to happen? Where are the problems? What software can we produce now to improve matters?

  11. JAVA Cryptographic Designer by Dr. Naranker Dulay

    JAVA Cryptographic Designer

    The aim of this project is to design a graphical application that can be used to (i) learn about cryptographic protocols, (ii) design and analyse cryptographic protocols.

  12. Compiling Declarative Languages to JAVA Bytecode by Dr. Naranker Dulay

    Compiling Declarative Languages to JAVA Bytecode

    The aim of this project is to implement a functional or logic language by compiling it to JAVA bytecode.

  13. Generating Prolog++ code from specifications by Dr. Chris Moss

    Generating Prolog++ code from specifications

    Details: please see the linked project page.

  14. Language Design, Semantics, Types by Dr. Sophia Drossopoulou

    Language Design, Semantics, Types

    Proposals:

    Language Design, Semantics, Types

    Java Applications

    Language Design and Compilers

    Tools