28 February
Noon, LT308 Huxley
Title: | Debugging large-scale operating systems |
---|---|
Abstract: | Operating systems are a medium through which hardware resources are managed and exported as services to applications. When they stop working, so does the business that the tower of software and hardware supports. Finding the root cause of a system failure in an operating system built from a code base of millions of lines, possibly with third-party kernel modules, is a seriously challenging task. When the systems in question have thousands of CPUs and hundreds of terabytes of physical memory, and are running hundreds of thousands of processes, the complexity steps up a gear. We examine some of the practical issues involved in postmortem failure analysis and the future challenges in scaling up diagnosis support to deal with the large systems of tomorrow. |
Speaker Details: | Jim Moore is the EMEA Solaris Revenue Product Engineering Director, responsible for Solaris engineering staff in the UK and Prague. The role attempts to reconcile the holy trinity of product quality, customer satisfaction and making R.P.E. an enjoyable place to work. Previously, Jim was a senior kernel engineer for many years, working across the spectrum of Solaris subsystems. |