
Methodology

Data-flow Analysis

On a concrete level, the idea is to incorporate into the data-flow analysis not only information about the possible values of variables, but also information about the probability that a certain value is actually assigned to a variable.

This implies the use of probability distributions to characterise the evolution of variable assignments. Therefore, the accumulating semantics will be based on (probability) distributions rather than on sets. This can be obtained as a fairly straightforward generalisation of existing methodologies.
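
The program fragment serving as running example is not reproduced in this section; the following Python sketch is a hypothetical reconstruction, consistent with the sets listed at points A-D below (the names fragment and prim are ours):

    def prim(n):
        # Trial-division primality test.
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def fragment(n):
        # point A: n is one of the first N natural numbers
        if prim(n):
            # point B: n is prime
            if n % 2 == 1:
                pass  # point C: n is an odd prime
            else:
                pass  # point D: n is an even prime, i.e. n == 2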

As an example, consider the program fragment above. In the classical abstract interpretation framework one associates with each program point the set of values which the variable n might assume at that point. In our case, we would get:

at A:
the set of, say, the first N natural numbers, {1, 2, 3, …, N};
at B:
the set of all prime numbers at most N, {p | prim(p) and p ≤ N};
at C:
all odd prime numbers at most N, {p | prim(p) and odd(p) and p ≤ N};
at D:
all even prime numbers at most N, i.e. just the set {2}.

Suppose we have information about the probability distribution on the values of n. We can then incorporate this information in the analysis. For example, assume a uniform distribution of the input data, i.e. each of the first N natural numbers occurs with the same probability 1/N. Then the probabilities of reaching the various program points are

at B:
P = p/N, where p is the number of prime numbers at most N;
at C:
P = (p-1)/N, as there are p-1 odd prime numbers at most N;
at D:
P = 1/N.

Note that this is a probabilistic analysis of a deterministic program.
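
For illustration, the following minimal Python sketch (the helper names prim and reach_probabilities are ours) enumerates the uniformly distributed inputs 1, …, N and counts how often each program point is reached; the resulting frequencies are exactly p/N, (p-1)/N and 1/N:

    from fractions import Fraction

    def prim(n):
        # Trial-division primality test.
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def reach_probabilities(N):
        # Probability of reaching points B, C and D when the input n
        # is drawn uniformly from {1, ..., N}.
        hits = {"B": 0, "C": 0, "D": 0}
        for n in range(1, N + 1):      # point A: reached by every input
            if prim(n):                # point B: n is prime
                hits["B"] += 1
                if n % 2 == 1:         # point C: n is an odd prime
                    hits["C"] += 1
                else:                  # point D: n == 2
                    hits["D"] += 1
        return {pt: Fraction(c, N) for pt, c in hits.items()}

    print(reach_probabilities(10))
    # {'B': Fraction(2, 5), 'C': Fraction(3, 10), 'D': Fraction(1, 10)}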

The situation becomes more complicated if we assume a non-uniform input distribution. The aim of our research is to overcome such difficulties by replacing heuristic arguments with a formal system combining classical data-flow analysis with probabilistic inference. A particular approach is to integrate the collecting semantics approach with Bayesian network inference [56, 6, 28]. Such inference algorithms take exponential time in the worst case, but are reasonably fast in most practical applications [37].
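
As a toy illustration of the intended direction (a sketch of ours, not the formal system itself), each test can be viewed as a transformer that splits an input distribution, represented as a value-to-probability map, between its branches; the probability of reaching a program point is then the total mass surviving the tests guarding it:

    from fractions import Fraction as F

    def prim(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def split(dist, pred):
        # Split a distribution {value: probability} along a test; the two
        # halves carry the probabilities of the true and false branches.
        return ({v: p for v, p in dist.items() if pred(v)},
                {v: p for v, p in dist.items() if not pred(v)})

    # A non-uniform input distribution on {1, ..., 5}:
    at_A = {1: F(2, 5), 2: F(3, 10), 3: F(1, 10), 4: F(1, 10), 5: F(1, 10)}
    at_B, _ = split(at_A, prim)                  # primes 2, 3, 5
    at_C, at_D = split(at_B, lambda n: n % 2)    # odd vs even primes
    print(sum(at_B.values()), sum(at_C.values()), sum(at_D.values()))
    # 1/2 1/5 3/10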

Cousot Framework

Abstract interpretation in the Cousot framework is based on the comparison of the concrete semantics of a program with a so-called abstract semantics, which represents certain properties of the program. In the Cousot approach, both the concrete and the abstract domain are partially ordered sets (posets). These two semantic domains are related by two functions, an abstraction function α and a concretisation function γ. In order to define a correct abstract interpretation, this pair of functions must form a Galois insertion of the abstract domain into the concrete one.
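
Spelled out, with (C, ≤_C) the concrete and (A, ≤_A) the abstract poset, the standard requirements are:

    α : C → A and γ : A → C monotone,
    α(c) ≤_A a  ⇔  c ≤_C γ(a)   for all c ∈ C, a ∈ A   (Galois connection)
    α(γ(a)) = a   for all a ∈ A                         (Galois insertion)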

The idea is to transfer this general framework to a probabilistic setting by replacing one or both of the (concrete and abstract) domains with probabilistic domains. This is done by moving from order-theoretic to linear structures (e.g. Banach lattices).

The main difficulty we expect is to find an appropriate notion of "probabilistic Galois connection". This is necessary because the classical notion is defined for domains where only the information order is relevant, whereas in our case the domains possess a richer structure (e.g. that of a Banach space). Therefore, we have to guarantee that the abstraction and concretisation functions not only form a Galois connection (wrt the information order), but also fulfil some additional structure-preserving conditions (e.g. wrt the Banach space structure).
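
Purely as an illustration of the kind of conditions meant here (candidate requirements, not a settled definition), one could ask that, on top of the adjunction, α and γ be positive bounded linear operators between the Banach lattices involved, e.g. for α:

    α(λ·x + μ·y) = λ·α(x) + μ·α(y)        (linearity)
    ‖α(x)‖ ≤ K·‖x‖ for some constant K    (boundedness)
    x ≥ 0  ⇒  α(x) ≥ 0                    (positivity)

and correspondingly for γ.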

To our knowledge, this is the first attempt to base abstract interpretation theory on a probabilistic denotational semantics.

Such a project fits very well with ongoing research in randomised and approximation algorithms, which are becoming very popular (in particular in complexity theory, e.g. [64]). Furthermore, the quantification of program properties (e.g. running time, resource consumption, costs) is commonly recognised as being of great importance, especially in areas like distributed computing.

