


DISSP: Dependable Internet-Scale Stream Processing

(EPSRC Grant EP/F035217/1 — July 2008 to June 2011)

Project Motivation

Real-time stream data plays an increasingly important role on the Internet. One cause is the proliferation of geographically distributed stream data sources connected to the Internet, such as sensor networks, scientific instruments, pervasive computing environments and web feeds. Potentially millions of users worldwide want to take advantage of this data. They therefore need a convenient way to process real-time stream data at a global scale through applications that perform Internet-scale stream processing (ISSP). Much as a DBMS makes relational queries easy to express, stream-processing systems allow users to access and manipulate distributed data streams through declarative queries. However, the scale of an Internet-wide system poses substantial challenges for providing a dependable service. Any such system must gracefully handle the failure of network links and processing hosts while managing a large pool of CPU and network resources.

For example, astronomers want to detect transient sky events, such as gamma-ray bursts, in real time. To detect such events, they must correlate real-time image streams from geographically distributed radio telescopes. These events last only minutes and, once an event has been detected, instruments need to be re-aligned to focus on ongoing occurrences. The left figure below shows an ISSP system executing a query that takes images from radio telescopes, processes them in real time and delivers data about transient anomalies to two astronomers. The processing is done by a distributed set of hosts (or data centres). The logical structure of the query implementing the application is shown on the right. The system must ensure that image data is transported reliably between processing sites. However, some data loss is acceptable, as long as no transient sky events are missed as a consequence.
[Figures: DISSP example (left), DISSP query (right)]
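
The telescope query described above can be sketched in a few lines of Python. This is only an illustration, not the project's query language or implementation: the tuple format, the function name `detect_transients`, the fixed time windows and the brightness threshold are all assumptions made for this example. It shows how a correlation query might tolerate lost tuples while still reporting events that enough sites confirm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical tuple format (an assumption for this sketch):
# (timestamp in seconds, telescope id, brightness, or None if the tuple was lost)
Reading = Tuple[int, str, Optional[float]]


def detect_transients(readings: Iterable[Reading], window: int = 60,
                      threshold: float = 5.0, min_sites: int = 2) -> List[int]:
    """Return the start times of windows in which at least `min_sites`
    distinct telescopes report a brightness above `threshold`."""
    sites_per_window: Dict[int, Set[str]] = defaultdict(set)
    for ts, site, brightness in readings:
        if brightness is None:
            # Tuple lost in transit: skip it rather than stall the query.
            continue
        if brightness > threshold:
            # Assign the reading to the window that contains its timestamp.
            sites_per_window[ts - ts % window].add(site)
    return sorted(w for w, sites in sites_per_window.items()
                  if len(sites) >= min_sites)


readings = [
    (0, "A", 1.0), (10, "B", 6.0), (20, "A", 7.0),     # window 0: A and B confirm
    (70, "A", 9.0), (80, "B", 2.0),                    # window 60: only A is high
    (130, "A", 6.0), (140, "C", 8.0), (150, "B", None) # window 120: B's tuple lost
]
events = detect_transients(readings)
print(events)
```

In this made-up input, the event in the third window is still reported although the tuple from telescope B was lost, because two other sites confirm it. With fewer redundant sites the same loss could hide an event, which is the kind of trade-off the project targets.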

Project Overview

The DISSP project investigates how to build dependable Internet-scale stream-processing systems for interconnecting tomorrow's pervasive sensor systems and global scientific experiments. We argue that Internet-scale stream processing needs new models for achieving dependability. Achieving dependability in this context is a significant challenge for several reasons: (1) failure will be the common case, because at this scale a fraction of Internet paths and hosts will be unavailable at any time; (2) the real-time nature of the data means that there is little time for recovery from failure; (3) a shared infrastructure, such as an ISSP system, will experience high utilisation, so the additional resource demand during recovery can overload the system. The traditional approach of substantially over-provisioning a system to compensate for failure is infeasible in such a shared, federated platform.

We therefore believe that we need to depart from the hard dependability guarantees of traditional DBMSs and today's stream-processing systems. Guaranteeing zero tuple loss at all times may be feasible within a single data centre, but we cannot hope to achieve it at Internet scale. Instead, we explore dependability guarantees that are driven by application requirements. Many sensing applications can cope with a controlled degradation of result quality. While result quality is reduced, the system provides constant feedback to users on the achieved level of service. Feedback is expressed in a domain-specific way, e.g., by notifying a scientific user about the reduction in detection confidence for events of interest. This feedback also drives an adaptive fault-tolerance mechanism that allows the DISSP system to plan resource allocation so as to minimise the reduction in service quality for as many users as possible.
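
To make the idea of quality feedback concrete, here is a minimal Python sketch. It is not the DISSP mechanism: the quality metric (fraction of expected tuples delivered), the `Query` record and the greedy allocation rule are all assumptions chosen for illustration. The sketch reports a per-query quality figure and then spends a limited recovery budget on whichever query currently has the lowest quality, which is one possible reading of "minimising the reduction in service quality" across users.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Query:
    name: str
    delivered: int  # tuples that actually reached the query
    expected: int   # tuples that would have arrived without failures


def result_quality(q: Query) -> float:
    """Hypothetical metric: fraction of expected tuples delivered, in [0, 1]."""
    if q.expected == 0:
        return 1.0
    return min(1.0, q.delivered / q.expected)


def feedback(q: Query) -> str:
    """User-facing feedback on the achieved level of service."""
    return f"{q.name}: result quality {result_quality(q):.0%}"


def allocate_recovery(queries: List[Query], budget: int) -> Dict[str, int]:
    """Greedy sketch: hand out `budget` recoverable tuples one at a time,
    always to the query whose delivered/expected ratio is currently lowest."""
    delivered = {q.name: q.delivered for q in queries}
    expected = {q.name: q.expected for q in queries}
    grants = {q.name: 0 for q in queries}
    for _ in range(budget):
        # Only queries that are still missing tuples can benefit.
        candidates = [n for n in delivered if delivered[n] < expected[n]]
        if not candidates:
            break
        worst = min(candidates, key=lambda n: delivered[n] / expected[n])
        delivered[worst] += 1
        grants[worst] += 1
    return grants


queries = [Query("astronomer-1", 50, 100), Query("astronomer-2", 90, 100)]
for q in queries:
    print(feedback(q))
grants = allocate_recovery(queries, budget=30)
print(grants)
```

A domain-specific system would replace the tuple-count metric with something meaningful to the user, such as detection confidence, and the greedy rule with whatever policy the application's utility function implies.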

Project Objectives

The aims of the DISSP project are to:
  • Investigate and develop a novel reliability model for DISSP that includes user-perceived quality of query results and provides feedback to users on quality degradation due to unmaskable network and host failures.
  • Provide adaptive fault-tolerance mechanisms for DISSP systems that select appropriate strategies for maximising result quality over the lifetime of potentially long-running queries (e.g., over several months).
  • Design, implement and evaluate a scalable prototype system for DISSP using controlled experiments on the Emulab network testbed.
  • Deploy an open, global and shared platform for DISSP as a public service on the PlanetLab research network, and thus facilitate and encourage the use of DISSP across research communities.

The specific research issues addressed by the project include:

  • How to choose a realistic failure model for a DISSP system in terms of correlated and uncorrelated host and network failures over long periods of time?
  • What are appropriate data and query models for DISSP systems that may suffer from failure of the infrastructure? How to specify queries that can be dynamically adapted to failures of query operators and data sources?
  • How to relate low-level failures to application-level degradation of result quality? How to design application-specific and -independent utility functions for expressing user-perceived benefit of long-running queries in DISSP?
  • What is a suitable set of proactive and reactive techniques for a DISSP system to compensate for or mask short- and long-term host and network failures?
  • What are light-weight and scalable mechanisms to reconcile inconsistent state of stream processing operators after failure recovery?
  • How to leverage approaches from autonomous computing, control theory and machine learning to design scalable and adaptive feedback mechanisms for selecting between different fault-tolerance techniques?
  • What level of functionality and interfaces are needed when providing a shared DISSP system for a global, inter-disciplinary user base?

Project Members


Last modified on: 05-03-2009 22:34:30 — (C) Peter Pietzuch — Email:prp(at)doc(dot)ic(dot)ac(dot)uk