Simulating a PC running Linux
Idea:
Write a program that runs under a standard PC operating system and
simulates the execution of a complete, bare PC and its
peripherals. Your simulation should be good enough to run a
Unix-like operating system such as Linux (or perhaps another
OS such as Windows, Solaris or Plan 9).
Motivation:
There are several reasons why it would be useful to be able to run a complete
operating system as a process within a standard (presumably Unix)
environment:
- One reason is, of course, so that one can experiment
with non-standard operating systems without having to re-configure
one's machine.
- Another reason (the main reason from the point of view of my
research) is that one can instrument the
simulation and study how the simulated system spends its time.
Design problems/issues:
- You need to be able to simulate the effect of privileged
instructions, but the simulator will be running in user mode.
A simple way to solve this is to use an instruction set
simulator, instead of directly executing the code of the
client operating system or its processes.
A related problem is that the client OS will manipulate
its address translation hardware, which is not easily done
in the simulator.
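To make the interpretation idea concrete, here is a minimal fetch-decode-execute loop for an invented two-register toy ISA (not x86 — the opcodes, register file and the "privileged" HALT instruction are all made up for this sketch). The point is that every instruction, privileged or not, passes through the simulator, so a privileged operation executed in user mode can simply be turned into a trap for the client OS rather than needing host privilege:

```c
#include <stdint.h>

enum { OP_LOADI, OP_ADD, OP_HALT };   /* HALT stands in for a privileged op */

typedef struct {
    uint32_t reg[2];
    int      pc;
    int      kernel_mode;             /* the client OS kernel runs with this set */
    int      trapped;                 /* set when user-mode code hits HALT */
    int      halted;
} Cpu;

/* One interpreted step: fetch the instruction at pc, decode, execute. */
void step(Cpu *c, uint32_t prog[][3])
{
    uint32_t *insn = prog[c->pc++];
    switch (insn[0]) {
    case OP_LOADI: c->reg[insn[1]] = insn[2]; break;
    case OP_ADD:   c->reg[insn[1]] += c->reg[insn[2]]; break;
    case OP_HALT:
        if (c->kernel_mode) c->halted = 1;
        else                c->trapped = 1;  /* deliver a trap to the client OS */
        break;
    }
}
```

A real x86 simulator has vastly more decode work, but the shape of the loop is the same.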
- Using an instruction set simulator would mean that your system
would be portable to non-Intel platforms, but would be slow.
You may be able to improve the speed by reverting to direct
execution in certain circumstances (e.g. when executing the
client OS's subprocesses), although this is potentially
tricky (e.g. a trap will enter the host's handler, not the
client OS's).
Another complicated solution is to translate the Intel binary
into the host machine's instruction set (which may be Intel again,
but without privileged instructions).
- Another advantage of using an instruction set simulator is that
you can monitor how many instructions have been executed, how much
time is spent computing, etc. You can also simulate the cache
and monitor its effectiveness, and you can simulate the address
translation mechanism (TLB etc).
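As a sketch of the kind of cache model such instrumentation needs, here is a direct-mapped cache that just counts hits and misses (the sizes and the byte-address interface are arbitrary choices for this sketch; the simulator would call cache_access() on every simulated memory reference):

```c
#include <stdint.h>

#define LINES      256
#define LINE_BYTES 32

typedef struct {
    uint32_t tag[LINES];
    int      valid[LINES];
    unsigned long hits, misses;
} Cache;

/* Look up a byte address; returns 1 on a hit, 0 on a miss,
   updating the statistics counters either way. */
int cache_access(Cache *c, uint32_t addr)
{
    uint32_t line = (addr / LINE_BYTES) % LINES;
    uint32_t tag  = addr / (LINE_BYTES * LINES);
    if (c->valid[line] && c->tag[line] == tag) {
        c->hits++;
        return 1;
    }
    c->valid[line] = 1;     /* fill the line on a miss */
    c->tag[line]   = tag;
    c->misses++;
    return 0;
}
```

A TLB model is structurally similar: an array of (virtual page, physical page) entries looked up on every translation, with a miss handled by walking the simulated page tables.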
- You should expect the simulated OS to run more slowly. Since
this makes every boot expensive, it's worth investigating checkpoints
to avoid having to simulate the bootstrap sequence repeatedly.
The idea of checkpointing is to take a complete copy of the
state of the simulator, so that it can be restarted from where
it left off.
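Checkpointing is straightforward precisely because the whole machine state lives in ordinary simulator data structures: a checkpoint is just a dump of those structures to a file. A sketch, assuming an invented Machine layout holding the registers and simulated RAM:

```c
#include <stdio.h>
#include <stdint.h>

#define MEM_BYTES (1 << 20)   /* 1 MB of simulated RAM, for the sketch */

typedef struct {
    uint32_t reg[8];
    uint32_t pc;
    uint8_t  mem[MEM_BYTES];
} Machine;

/* Write the entire machine state to a file; returns 0 on success. */
int save_checkpoint(const Machine *m, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(m, sizeof *m, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Restore machine state from a checkpoint file; returns 0 on success. */
int load_checkpoint(Machine *m, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(m, sizeof *m, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

In a real simulator the device models' state (pending interrupts, disk position, etc.) must be included too, or the restored system will misbehave.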
- You need to build simulations of peripherals as well as
the CPU. You can, of course, choose the simplest peripherals
you can find - perhaps just a serial terminal line and a disk
drive to start with. Graphics and network devices can come
later. Note that the disk device can get the actual data
from the host filesystem.
It is desirable to simulate real devices, so that you can
install any client OS which can drive them.
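The disk-backed-by-host-file idea can be sketched in a few lines: a client read or write of sector n becomes a seek plus read/write on a backing file. The 512-byte sector size matches a real PC disk; the function names are invented for this sketch:

```c
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

#define SECTOR 512

/* Read one simulated disk sector from the host backing file.
   Returns 0 on success, -1 on error. */
int disk_read(int backing_fd, uint32_t sector, uint8_t buf[SECTOR])
{
    off_t off = (off_t)sector * SECTOR;
    if (lseek(backing_fd, off, SEEK_SET) != off) return -1;
    return read(backing_fd, buf, SECTOR) == SECTOR ? 0 : -1;
}

/* Write one simulated disk sector to the host backing file. */
int disk_write(int backing_fd, uint32_t sector, const uint8_t buf[SECTOR])
{
    off_t off = (off_t)sector * SECTOR;
    if (lseek(backing_fd, off, SEEK_SET) != off) return -1;
    return write(backing_fd, buf, SECTOR) == SECTOR ? 0 : -1;
}
```

The hard part is not this data path but faking the device's register-level interface convincingly enough for an unmodified driver.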
- For some applications, it is important to estimate the time
the client system would take if it were running on real
hardware of some configuration. There are many interesting
research projects which would be made possible if this were
available.
Examples include
- Data caches; most research into the design of caches for
high-performance microprocessors has assumed a single,
CPU-intensive application (cf. the SPEC benchmark suite).
This doesn't predict the performance of the system when
running a mixture of jobs, or an application which involves
interaction with the OS and with other processes.
- Instruction caches; operating systems have unusual instruction
cache behaviour compared with typical CPU-intensive
applications, and the instruction cache design options need to
be re-evaluated (e.g. interrupts, and especially system
calls, can lead to many instructions being executed with
very little re-use of recently-executed instructions,
trashing the background application's cache to no purpose).
- Network interfaces; traditionally, networks have been slow
compared with CPUs and memory systems. Now,
commonly-available LAN technology pushes the limits of the
CPU's responsiveness and its memory system's bandwidth.
It's vital to eliminate software overheads from network
interfacing, but it's unclear how to integrate this into
existing protocol stacks.
- Non-volatile RAM, parallel disk systems, etc;
There are many ways of improving the performance of file
access while retaining data stability, but it's unclear
which approach is suitable in which situation.
- Power management is a neglected but crucial issue in many
interesting application areas. It is primarily an operating
system function, but its impact on the system's perceived
responsiveness must be controlled.
The job:
- Decide how Intel instructions are to be executed in the client OS
kernel. Unless you can think of a really clever scheme, this
is going to have to be done in a software simulator. Implement
one and test it carefully.
- Figure out the bootstrap file format and fix your simulator
to work with it. Test by compiling a simple standalone
application and running it on the real hardware and on the
simulator (this requires some trivial form of output mechanism).
- Develop a simulation of the serial line controller, and test it
against the real hardware. When this works you should be able to
load a Linux kernel and get an error message when it fails to
contact any other peripherals. You should be able to test it
fully by modifying the Linux kernel.
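A sketch of the transmit side of such a controller, mimicking the two registers a polling driver touches on a 16550-style PC UART: a line-status register that says "transmitter ready" and a data register that sends a byte. The register offsets 0 and 5 and the 0x20 status bit match the real 16550; everything else is simplified for the sketch:

```c
#include <stdio.h>
#include <stdint.h>

#define UART_DATA 0       /* transmit/receive data register */
#define UART_LSR  5       /* line status register */
#define LSR_THRE  0x20    /* "transmit holding register empty" bit */

typedef struct {
    FILE *out;            /* host-side end of the simulated serial line */
} Uart;

/* Client OS reads a UART register (an "in" instruction hits this). */
uint8_t uart_in(Uart *u, int reg)
{
    (void)u;
    if (reg == UART_LSR) return LSR_THRE;  /* always ready to transmit */
    return 0;
}

/* Client OS writes a UART register: a byte written to the data
   register goes straight out to the host side. */
void uart_out(Uart *u, int reg, uint8_t val)
{
    if (reg == UART_DATA) fputc(val, u->out);
}
```

The simulator's I/O-instruction dispatch would route port accesses at the UART's base address to uart_in/uart_out; receive, interrupts and baud-rate divisor registers are the obvious next steps.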
- Identify the other peripherals that need to work, such as timers
etc, and develop simulations for them. Test similarly. The most
complex will probably be the disk drive (you may wish to start
with a floppy drive, since you ought to be able to map the basic
operations to raw I/O on the host OS and access a standard
boot disk).
- At this point, you ought to be able to run a Linux kernel,
although it is likely that problems will be uncovered
with the simulation of interrupt, trap and address translation
mechanisms. Once you fix them, the system should actually boot,
albeit slowly.
- Once the system works, you will want to spend some time sorting
out more devices
(e.g. a hard disk importing files from the host), perhaps
a network, checkpointing, and some monitoring.
- As soon as the system is stable, it would be really nice to use
it to collect some statistics as a start towards using it
as a platform for experimental research.
- Also, you ought to be able to boot some other OS's. Windows,
Solaris, BSD, etc - if you get the basics right they should
all just run.
Reading:
Check out the SimOS
project; they describe the results from a similar project for
SGI hardware.
Equipment:
PC running Linux; stacks of disk space.
Recommended tools:
Unix, C/C++.
Suitability:
This is a demanding research-level project with enormous potential
scope. The basic prerequisites are 1) insight into performance, architecture and
applications issues, 2) the practical ability to get complicated software to do what you
want, and 3) the imagination and clarity of thought to design good experiments.