


Scalable Distributed Systems Design (16/17)

Lecturer: Dr Peter Pietzuch
Email: prp@doc (you know the rest)
Office: Huxley 442

Course Schedule

Introduction

Lectures 1+2 (Jan 18): Overview of scalable distributed system design, data centres, design principles (slides PDF)

Scalable Data

Lectures 3+4 (Jan 25)

Google BigTable (slides PDF)

"Bigtable: A Distributed Storage System for Structured Data", Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Seventh Symposium on Operating System Design and Implementation (OSDI), Seattle, WA, November, 2006.

What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
What is good about the paper? What is not good about the paper?
How does the design of BigTable compare to that of a parallel relational database management system (RDBMS)?
What limits the scalability of the BigTable design?

Amazon Dynamo (slides PDF)

"Dynamo: Amazon's Highly Available Key-Value Store", Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall, and Werner Vogels, ACM Symposium on Operating Systems Principles (SOSP), Stevenson, WA, October 2007.

What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?

What is good about the paper? What is not good about the paper?

To what extent is the design of Dynamo inspired by Distributed Hash Tables (DHTs)? What are the advantages and disadvantages of such a design?

How does the design of Dynamo compare to that of BigTable?

Lectures 5+6 (Feb 1)

Google Spanner (slides PDF)

"Spanner: Google's Globally-Distributed Database", James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, Tenth Symposium on Operating System Design and Implementation (OSDI), Hollywood, CA, October, 2012.

What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
What is good about the paper? What is not good about the paper?
How does the performance of Spanner depend on the workload?
What other applications could TrueTime have?

Apache Kafka - Eno Thereska, Confluent, Inc. and Imperial College London (guest lecture) (slides PDF)

Scalable Computation

Lectures 7+8 (Feb 8)

Google MapReduce (slides PDF)

"MapReduce: Simplified Data Processing on Large Clusters", Jeffrey Dean and Sanjay Ghemawat, Sixth Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA, December, 2004.

What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
What is good about the paper? What is not good about the paper?
What algorithms cannot be easily expressed in the MapReduce model?
Can you think of other techniques for handling stragglers?

Apache Spark/RDD (slides PDF)

"Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing", Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI), San Jose, CA, April 2012.

What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
What is good about the paper? What is not good about the paper?
Is the comparison with Hadoop fair?
How well can Spark be used to process graph data?

No lectures (Feb 15)

Scalable Services

Lectures 9+10 (Feb 22)

Apache ZooKeeper (slides PDF)

"ZooKeeper: Wait-Free Coordination for Internet-Scale Systems", Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed, USENIX Annual Technical Conference (ATC), Boston, MA, 2010.

What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
What is good about the paper? What is not good about the paper?
How is ZooKeeper different from a distributed database such as Google Spanner?
What extensions would you propose for ZooKeeper?

No lectures (Mar 1)

Lecture 11 (Mar 8) - 10am

"LogDevice: A Log-Oriented Data Store", Lovro Puzar, Facebook (guest lecture)

Lectures 12+13 (Mar 15)

Revision lecture (slides PDF)

"Azure SQL Data Warehouse", Robin Lester, Microsoft Azure (guest lecture) (slides PDF)

Optional Reading

Google MillWheel

"MillWheel: Fault-Tolerant Stream Processing at Internet Scale", Tyler Akidau, Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, Sam Whittle, Very Large Data Bases (VLDB), 2013.

Microsoft Dryad

"Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks", Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly, European Conference on Computer Systems (EuroSys), Lisbon, Portugal, 2007.

Last modified on: 14-01-2018 11:13:15 — (C) Peter Pietzuch — Email: prp(at)doc(dot)ic(dot)ac(dot)uk