Extracting Algorithms from Code

   Software development rarely follows a nice top-down development
order, starting from a well-defined (simple, verified) abstract
algorithm and refining it into effective code. More often the
algorithm evolves as the code takes shape, under many, often critical,
constraints (e.g., time/memory/bandwidth usage, integration with other
code bases) that simply do not appear at the algorithm abstraction level.

   This means that any formal description of the underlying algorithm
has to be extracted from the code. Even though it cannot be used to
drive the development process, such a post facto model is still quite
useful, especially for concurrent algorithms:

   - The model can be used to systematically generate assertions and test
 sequences that check that the code works correctly in all corner cases;
 thus filling in the blind spot of traditional testing procedures have for
 nondeterminism.

   - The model will provide some insight as to why the code works, how it
 can be modified without compromising correctness, and how those modifications
 should be tested.

   - The model can be used to identify modes of operation of the code, which
 in turn may be useful to predict performance issues.

  In this lecture we will present a methodology we developed for doing
model extraction of this kind, using Lamport's TLA+ language and its associated
tools. The main steps of the method are as follows:

  - Using a small TLA+ library that provides a mathematical model of the
  programming language used (in particular, its call stack), write a first cut
  model that closely follows the code.

  - Use the TLC model checker to verify (on small problem sizes) basic
  properties of the model, starting with type invariants, then moving on to
  representation and algorithmic invariants, and a few safety and liveness
  properties.

  - Using the representation invariants, write a second, abstract, and
  language-independent model for which larger instances can be
  checked.

  - Find the inductive invariant of the abstract model, and check it using
  bounded model checking.

  - Find the variants that guarantee the temporal (liveness) properties of
  the abstract model.

Thus each element of the model can be related back to the original code,
and to the software engineering process described above.