The Distributed Algorithms course is concerned with the algorithmic
aspects of distributed computing. In particular, we focus on Distributed
Systems which are prone to hardware and/or software failures. Two classes
of subjects are discussed:
|
|
Models of distributed computing (Introduction)
Lecture Notes: HTML
(requires a browser that can handle frames); 4-up printable in PDF
and Postscript
format.
References:
J. Turek and D. Shasha, “The Many Faces of Consensus in Distributed Systems”,
IEEE Computer, June 1992.
R. Schlichting and F. Schneider, "Fail-Stop processors: an approach to
designing fault-tolerant computing", ACM Transactions on Computer Systems,
1(3), August 1983.
F. Cristian, "Understanding Fault-Tolerant Distributed Systems", Communications
of the ACM, 34(2), February 1991.
|
|
|
Synchronous message-passing distributed systems
Leader Election and Bredth-First Search algorithms
Lecture Notes: HTML
(requires a browser that can handle frames); 4-up printable in PDF
and Postscript
format.
Demos: A zip
file containing models for the following algorithms: LCR,
LCR
+ halt, FloodMax, FloodMaxOpt,
SyncBFS,
SyncBFS
+ termination.
Tutorial Questions:
Tutorial 1 (21/01/00): Questions (PDF,
Postscript);
solutions (PDF,
Postscript).
Tutorial 2 (24/01/00): Questions (PDF,
Postscript);
solutions (PDF,
Postscript).
References:
-
“Distributed Algorithms”, Nancy Lynch, Morgan Kaufmann, 1996. Chapters
2, 3 and 4.
The Atomic Commitment Problem
Lecture Notes: HTML
with animations which also includes a PowerPoint show (works well only
with MS Internet
Explorer 5 - blaim Microsoft for that); 4-up handouts in PDF
and Postscript
format. The handouts do not include the animated slides of the PowerPoint
presentation.
Errata: In slide 14 of the handouts, in the very last line of
the slide, the process must block and not abort! See also the corrected
handouts above.
The material in this section is based on earlier notes by Vassos Hadzilacos.
Additional notes: Impossibility of Atomic Commitment in the presence
of link failures -- proof for 2 processes. (PDF,
Postscript)
Tutorial 3 (31/01/00): Questions (PDF,
Postscript);
solutions (PDF,
Postscript).
References:
-
J.N. Gray, "Notes on Database Operating Systems: An Advanced Course", Lecture
Notes in Computer Science 60:393-481, Springer-Verlag, Berlin, 1978.
-
D. Skeen, "Nonblocking Commit Protocols", In Proc. ACM SIGMOD Conf. on
Management of Data, pp. 133-147, Orlando FL, June, 1982.
-
D. Cheung and T. Kameda, "Site-Optimal Termination Protocols for a Distibuted
Database under Networking Partitioning", In Proc. 4th ACM SIGACT-SIGOPS
Symp. on Principles of Distributed Computing, pp. 111-121, Minaki, Ontario,
August, 1985.
The Byzantine Generals Problem
Lecture Notes: 4-up handouts in PDF.
Tutorial 4 (07/02/00): Questions (PDF,
Postscript);
solutions (PDF,
Postscript).
References:
L. Lamport, R. Shostak, M. Pease, "The Byzantine Generals Problem", ACM
TOPLAS 4(3), pp. 382-401, July 1982.
D. Dolev, H. R. Strong, "Authenticated Algorithms for Byzantine Agreement",
SIAM Journal of Computer, 12(4), pp. 656-666, 1983.
|
|
|
Asynchronous message-passing distributed systems
Impossibility of Consensus
Lecture Notes: HTML
with animations which also includes a PowerPoint show; 4-up handouts
in PDF
and Postscript
format.
Additional notes:
Top-level induction of the FLP impossibility proof (PDF,
Postscript):
Inductive construction of a schedule S*, such that S* is infinite, admissible,
applicable to an initial bivalent configuration C", and for every prefix
S' of S*, S'(C") is also bivalent.
The material in this section is based on earlier notes by Vassos Hadzilacos.
Tutorial 5 (14/02/00):
Assessed coursework (PDF,
Postscript).
Hand out date: Monday 14 February 2000
Hand in deadline: Monday 6 March 2000
Nominal time: 1.5 hours
Sample solutions: PDF,
Postscript.
References:
-
M. Fischer, N. Lynch, M. Peterson, "Impossibility of Distributed Consensus
with one Faulty Process", Journal of the ACM, Vol. 32, No. 2, April 1985.
-
J. Turek and D. Shasha, “The Many Faces of Consensus in Distributed Systems”,
IEEE Computer, June 1992.
Logical Time
Lecture Notes: HTML
with animations which also includes a PowerPoint show; 4-up handouts
in PDF
and Postscript
format.
Tutorial 6 (21/02/00): Questions (PDF,
Postscript).
References:
-
L. Lamport, "Time, Clocks and Ordering of Events in a Distributed System",
Communications of the ACM, Vol. 21, No. 7, July 1978, pp. 558-565.
-
G. Ricart, A.K. Agrawala, "An Optimal Algorithm for Mutual Exclusion in
computer Networks", Communications of the ACM, Vol. 24, No. 1, January
1981.
-
F. Mattern, "Virtual Time and Global States of Distributed Systems", in
Proceedings of the International Conference on Parallel and Distributed
Algorithms, pp. 215-216, 1989.
-
C. Fidge, "Tmestamps in message-passing systems that preserve the partial
ordering", in Proceedings of the 11th Australian Computer Science Conference,
February 1988.
Fault-Tolerant Broadcasts
Lecture Notes: HTML
with animations which also includes a PowerPoint show; 4-up handouts
in PDF
and Postscript
format.
Tutorial 7 (06/03/00): Questions (PDF,
Postscript);
solutions (PDF,
Postscript).
References:
-
Vassos Hadzilacos and Sam Toueg, "A modular approach to fault tolerant
broadcasts and related problems", Technical Report # 94-1425, Dept of Computer
Science, University of Toronto and Dept of Computer Science, Cornell University,
May 1994. Available as http://www.cs.utoronto.ca/~vassos/research/publications/HT94/paper.ps.gz
-
Ken Birman, "The process group approach to reliable distributed computing",
Communications of the ACM, 36(12), pp.37-53, 1993.
-
K. Birman, A. Schiper and P. Stephenson, "Lightweight causal and atomic
group multicast", ACM Transactions on Computer Systems, 9(3), August 1991.
|
|
|
Partially synchronous message-passing distributed systems
Failure Detectors
Lecture Notes: HTML
with animations which also includes a PowerPoint show; 4-up handouts
in Postscript
format.
Additional notes:
Informal correctness arguments (PDF,
Postscript)
for the algorithm that solves Consensus using the Perfect Failure Detector
(P). The algorithm is presented in the Lecture Notes, page 14.
Tutorial 8 (13/03/00): Questions (PDF,
Postscript);
solutions (PDF,
Postscript).
References:
-
T.D. Chandra and S. Toueg “Unreliable failure detectors for reliable distributed
systems”, Journal of the ACM, 43(2), March 1996.
-
T.D. Chandra, V. Hadzilacos and S. Toueg “The weakest failure detector
for solving consensus”, Journal of the ACM, 43(4), July 1996.
|