Department of Computing, Imperial College

Machine Intelligence 19 Workshop

Withersdane Conference Centre

Imperial College at Wye

18-20 September 2002

This workshop is the 19th in the Machine Intelligence series, which was founded by Professor Donald Michie in 1965. The workshop programme chair is Professor Stephen Muggleton, with local arrangements support by Janice Lonsdale. The theme of this year's workshop is ``Reasoning and Uncertainty: methods and applications'': uncertainty reasoning on the one hand, and machine learning on the other. The theme was chosen by the Machine Intelligence board as a natural outcome of discussions at Machine Intelligence 18, which identified the increasing need for explicit representation of uncertainty in machine-learned models of biological processes. The workshop will bring together a number of the most respected figures in these related fields of Artificial Intelligence.

Papers at the workshop will be refereed for a special issue of the Machine Intelligence area of the ETAI (Electronic Transactions on Artificial Intelligence) Journal. Referee comments for inclusion of papers will be posted on the MI19 ETAI web page.

Programme for Machine Intelligence 19 (PDF - includes information about breaks and session chairs)

Contributors

Nicos Angelopoulos
James Cussens
Peter Flach
Koichi Furukawa
Hector Geffner
Lise Getoor
Katsumi Inoue
Manfred Jaeger
David Jensen
Kristian Kersting
Stephen Muggleton
Henri Prade
Stuart Russell
Claude Sammut
Erik Sandewall
Taisuke Sato
Hiroaki Watanabe
Jon Williamson

Paper Formatting Information

Venue Information

Travel directions

Railtrack Information

Accommodation Information - Single rooms have been assigned to all contributors. Any questions concerning accommodation should be directed to Janice Lonsdale (jll@doc.ic.ac.uk).


Titles and Abstracts

Speaker: Nicos Angelopoulos
Title: Machine Learning Metabolic Pathway descriptions using a Probabilistic Relational Representation
Abstract: A variety of approaches aimed at machine learning probabilistic relational representations have been explored in recent years. These include learning algorithms for Probabilistic Relational Models, Bayesian Logic Programs and Stochastic Logic Programs. In this paper we investigate the question of whether there exist important applications which require the machine learning of such representations. In particular, applications of this kind must necessitate a) machine learning, b) probabilistic representation and c) relations. We argue in this paper that the automated extension of partial metabolic pathway descriptions fits all these criteria. As usual with such representations the learning task can be divided into sub-tasks involving model building and parameter estimation. In order to gain understanding of the sample size requirements for estimating the parameters of medium-sized metabolic networks we conducted experiments in which RMS error is estimated with increasing sample size. We separately investigated the effects of multiple pathway contributions to the concentrations of metabolites within the networks. For non-branching pathways the results indicate that parameter estimation is feasible from limited sources of data. However, the introduction of branches increases sample requirements considerably.
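
A minimal sketch in Python of the kind of sampling experiment described, with a made-up single branch point and illustrative probabilities (none of the numbers come from the paper): estimate the branch parameters by relative frequency and track RMS error as the sample size grows.

    import numpy as np

    rng = np.random.default_rng(0)
    true_theta = np.array([0.5, 0.3, 0.2])   # hypothetical branch probabilities at one node

    for n in [10, 100, 1000, 10000]:
        counts = rng.multinomial(n, true_theta)       # observed flow down each branch
        theta_hat = counts / n                        # maximum-likelihood estimate
        rms = np.sqrt(np.mean((theta_hat - true_theta) ** 2))
        print(f"n={n:6d}  RMS error = {rms:.4f}")

For a single multinomial node the error shrinks roughly as 1/sqrt(n); as the abstract notes, adding branches multiplies the parameters that must share the same data, which is why sample requirements grow considerably.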


Speaker: James Cussens
Title: Inductive logic versus the logic of probability
Abstract: First-order logic is a formalism whose representational flexibility and well-developed theory motivate its adoption for representing complex probabilistic models. Probability has often been presented as a replacement for logic-based methods, but much of the new work in AI is both logical and probabilistic, seeking to exploit the strengths of both approaches in a hybrid framework. Logic-based approaches (often based on logic programming) also provide a method of implementing probabilistic programming languages. Combined logical-probabilistic frameworks are very old, often the result of grand attempts to formulate a calculus of general reasoning. A careful account of the assumptions and achievements of these systems provides understanding of what is significant and novel in contemporary logical-probabilistic frameworks. This paper aims to provide such an account, drawing particularly on the pioneering work of Boole. We find two major strands in this work: attempts to address Hume's problem of induction by devising an inductive logic, and work with the more modest aim of formalising probabilistic reasoning using a logic of probability.


Speaker: Peter Flach
Title: Probabilistic reasoning with terms
Abstract: Many problems in artificial intelligence can be naturally approached by generating and manipulating probability distributions over structured objects. First-order terms such as lists, trees and tuples and nestings thereof can represent individuals with complex structure in the underlying domain, such as sequences or molecules. Higher-order terms such as sets and multisets provide additional representational flexibility. I will present two Bayesian approaches that employ such probability distributions over structured objects: the first is an upgrade of the well-known naive Bayesian classifier to deal with first-order and higher-order terms, and the second is an upgrade of propositional Bayesian networks to deal with nested tuples.
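
A rough illustration of the first approach (my own toy construction, not the system presented in the talk): a naive Bayes model over lists can decompose P(list | class) into a geometric length model times independent per-element probabilities.

    import math
    from collections import Counter

    def fit(examples):
        """examples: (list_of_symbols, class_label) pairs -> per-class statistics."""
        models = {}
        for xs, c in examples:
            m = models.setdefault(c, {"n_lists": 0, "n_items": 0, "counts": Counter()})
            m["n_lists"] += 1
            m["n_items"] += len(xs)
            m["counts"].update(xs)
        return models

    def log_p(xs, m, vocab, alpha=1.0):
        """log P(xs | class): geometric length model times per-element probabilities."""
        p_stop = m["n_lists"] / (m["n_lists"] + m["n_items"])
        ll = len(xs) * math.log(1.0 - p_stop) + math.log(p_stop)
        denom = m["n_items"] + alpha * len(vocab)
        for x in xs:
            ll += math.log((m["counts"][x] + alpha) / denom)   # Laplace smoothing
        return ll

    # classify xs as argmax over classes c of: log prior(c) + log_p(xs, models[c], vocab)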


Speaker: Koichi Furukawa
Title: Modelling Human Skill in Bayesian Networks
Abstract: In this paper, we discuss the problem of modelling human skill with Bayesian networks. The purpose of modelling human skill is to use the model to improve performance in such activities as playing instruments, dancing, and playing various kinds of sports. The difficulty with human skill is its tacitness: even professional violinists or cellists do not know how they are playing. The purpose of this research is to model human skill in terms of Bayesian networks. This paper gives the basic framework of the research by proposing possible representations and structures of the Bayesian networks for human skill, and by defining the purpose of model usage.


Speaker: Hector Geffner
Title: Planning as Branch and Bound
Abstract: Branching and lower bounds are two key notions in heuristic search and combinatorial optimization. Branching refers to the way the space of solutions is searched, while lower bounds refer to approximate cost measures used for focusing and pruning the search. In AI Planning, admissible heuristics or lower bounds have received considerable attention recently and most current optimal planners use them, either explicitly or implicitly (e.g., by relying on a plan graph). Branching, on the other hand, has received less attention and is seldom discussed explicitly. In this work, we make the notion of branching in planning explicit and relate it to branching schemes used in combinatorial optimization. We analyze from this perspective the relationship between heuristic-search, constraint-based and partial-order planning, and between planning and scheduling. We also introduce a branch-and-bound formulation for planning that handles actions with durations and unary resources, and consider a range of theories that stand halfway between planning and scheduling. The goals are twofold: to make sense, within a unified framework, of various techniques that have been found useful in planning and scheduling, and to lay the ground for systems that can effectively combine both.
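
A minimal generic sketch of the scheme (the problem encoding and names are mine, not the paper's): depth-first branch and bound over action sequences, pruning any node whose accumulated cost plus an admissible lower bound already matches the best plan found so far.

    def branch_and_bound(start, is_goal, successors, h, best=float("inf")):
        """successors(s) yields (action, cost, next_state) triples; h is an
        admissible lower bound with h(goal) == 0. Assumes positive costs; for
        cyclic state spaces a finite initial upper bound `best` should be given."""
        best_plan = None

        def dfs(state, g, plan):
            nonlocal best, best_plan
            if g + h(state) >= best:       # prune: lower bound cannot beat incumbent
                return
            if is_goal(state):
                best, best_plan = g, list(plan)
                return
            for action, cost, nxt in successors(state):   # the branching rule
                plan.append(action)
                dfs(nxt, g + cost, plan)
                plan.pop()

        dfs(start, 0, [])
        return best_plan, best

On my reading, the `successors` function is where the branching scheme lives: branching on applicable actions gives state-space planning, while branching on other decisions (orderings, open conditions) yields the alternative schemes the abstract compares.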


Speaker: Lise Getoor
Title: Learning Structured Statistical Models from Relational Data
Abstract: Here we describe tools for constructing statistical models from relational data. Our goal is to learn structured probabilistic models that represent statistical correlations both between the properties of an entity and between the properties of related entities. These statistical models can then be used for a variety of tasks including knowledge discovery, exploratory data analysis, data summarization and anomaly detection. Unfortunately, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert the data into a flat form, thereby not only losing its compact representation and structure but also potentially introducing statistical skew. These drawbacks severely limit the ability of current statistical methods to model relational databases. Here we describe two complementary approaches: one approach suited to making probabilistic statements about individuals and the second approach suited to making statements about frequencies in relational data. We describe algorithms for both learning and making inferences in these models, and give experimental results.
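
A toy illustration (mine, not the paper's) of the statistical skew that flattening can introduce: joining authors to their papers produces one row per paper, so any statistic computed over the flat table is biased toward prolific authors.

    authors = {"ann": "uk", "bob": "us"}                       # author -> country
    papers = [("ann", "p1"), ("ann", "p2"), ("ann", "p3"), ("bob", "p4")]

    flat = [(a, authors[a]) for a, _ in papers]                # one row per (author, paper)
    freq_flat = sum(c == "uk" for _, c in flat) / len(flat)    # 0.75
    freq_rel = sum(c == "uk" for c in authors.values()) / len(authors)  # 0.50
    print(freq_flat, freq_rel)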


Speaker: Katsumi Inoue
Title: Disjunctive Explanations in Inductive and Abductive Logic Programming
Abstract: Given some evidence or observation, abduction infers candidate hypotheses to explain such an observation. In this paper, we consider the case in which multiple explanations exist but we cannot select one of them. Our approach leaves the decision indefinite, yet offers a consistent treatment of theory change. A disjunctive explanation, which is a disjunction of possible alternative explanations, is introduced for this purpose. For abduction, such a disjunction is useful when obtained explanations are incorporated or assimilated into the current knowledge base. The assimilated knowledge base preserves the intended semantics of the collection of all possible updated knowledge bases. The merit of disjunctive explanations is that we need only one current knowledge base at a time, while still keeping every possible change in a single state. This method is extended to accomplish an update when there are multiple ways to remove old, unnecessary hypotheses from the knowledge base. The proposed framework also applies well to view updates in disjunctive databases. Finally, disjunctive explanations are considered in induction, where we take the disjunction of multiple rules. In this case, disjunctive explanations correspond to taking the greatest specialization of rules under implication.
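
A toy propositional rendering (my own, far simpler than the logic-programming setting of the paper): when several abductive explanations exist, keep their disjunction rather than committing to one.

    # Horn rules mapping each head to its alternative bodies; literals in
    # `abducibles` may be assumed. Assumes the rule set is acyclic.
    rules = {"wet": [["rain"], ["sprinkler"]]}
    abducibles = {"rain", "sprinkler"}

    def explanations(goal):
        if goal in abducibles:
            return [{goal}]
        expls = []
        for body in rules.get(goal, []):
            partial = [set()]
            for lit in body:
                partial = [e | e2 for e in partial for e2 in explanations(lit)]
            expls.extend(partial)
        return expls

    print(explanations("wet"))    # [{'rain'}, {'sprinkler'}]
    # the disjunctive explanation assimilates both: rain OR sprinkler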


Speaker: Manfred Jaeger
Title: Relational Bayesian Networks, a Survey
Abstract: Relational Bayesian networks are a highly expressive representation language for probability distributions on relational structures. Distinguishing features of this framework are syntactic simplicity (the language is defined by only four elementary syntactic constructs, loosely corresponding to the constructs of predicate logic) and semantic clarity. In this paper I give an overview of relational Bayesian networks. First, syntax, semantics and inference methods for elementary queries are recapitulated; then I highlight some of the more challenging topics that arise when one asks non-elementary queries about probabilities for infinite structures, or limit probabilities for finite structures of increasing size. Finally, I briefly discuss the question of learning relational Bayesian networks from data.
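
For flavour, a sketch (toy numbers of my own) of one common ingredient of such languages, a combination function, here noisy-or applied over an object's relational neighbourhood:

    def noisy_or(ps):
        """Probability that at least one independent cause fires."""
        out = 1.0
        for p in ps:
            out *= 1.0 - p
        return 1.0 - out

    # e.g. P(infected(x)) combines 0.3 * P(infected(y)) over the contacts y of x
    contacts = {"x": ["y1", "y2"]}
    p_infected = {"y1": 0.8, "y2": 0.5}
    print(noisy_or([0.3 * p_infected[y] for y in contacts["x"]]))   # 0.354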


Speaker: David Jensen
Title: Feature Selection Biases in Relational Learning
Abstract: Recently, we showed how two common structural characteristics of relational data can cause learning algorithms to mistakenly conclude that correlation exists between a class label and a relational feature (Jensen & Neville 2002). This paper extends that work to three additional cases where structural characteristics of relational data can cause an inappropriate bias toward certain classes of features. Collectively, these cases can affect any learning algorithm that uses aggregation functions (e.g., maximum, minimum, average, mode, count, or exists) to construct relational features. Such algorithms include probabilistic relational models and many algorithms for learning models in first-order logic. We provide proofs of these biases and discuss their effect on algorithms for learning probabilistic models from relational data. We also summarize and discuss all our recent work on the feature selection biases introduced by the relational structure of training data.
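
A quick simulation (mine, not from the paper) of one such bias: under a null model with no real correlation, the "max" aggregate of a random feature still grows with the number of related objects, so entities with many neighbours look systematically different from entities with few.

    import numpy as np

    rng = np.random.default_rng(1)
    for degree in [1, 5, 25]:
        maxes = rng.random((10000, degree)).max(axis=1)   # max over `degree` neighbours
        print(degree, round(maxes.mean(), 3))             # 0.5, ~0.833, ~0.962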


Speaker: Kristian Kersting
Title: Logical Hidden Markov Models
Abstract: Logical hidden Markov models (LOHMMs) are a generalization of hidden Markov models (HMMs) for analyzing sequences of logical atoms. In LOHMMs, abstract states summarize sets of states and are represented by logical atoms. Transitions are defined between abstract states to summarize sets of transitions between states. Unification is used to share information among states, and between states and observations. By design, the number of parameters of a LOHMM can be an order of magnitude smaller than that of an equivalent HMM. We devised adaptations of the classical HMM procedures such as the forward and backward procedures. Our experiments show that LOHMMs have good generalization performance, and that it is easy to extract characteristic patterns from a trained LOHMM.
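
For reference, the classical forward procedure that the LOHMM machinery adapts, in its plain propositional form (the logical version additionally unifies abstract states against ground observations):

    import numpy as np

    def forward(pi, A, B, obs):
        """pi[i]: initial; A[i, j]: transition; B[i, o]: emission. Returns P(obs)."""
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()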


Speaker: Stephen Muggleton
Title: Learning structure and parameters of Stochastic Logic Programs
Abstract: Previous papers have studied the learning of Stochastic Logic Programs (SLPs) either as a purely parametric estimation problem or by separating structure learning and parameter estimation into distinct phases. In this paper we consider ways in which both the structure and the parameters of an SLP can be learned simultaneously. The paper assumes an ILP algorithm, such as Progol or FOIL, in which clauses are constructed independently. We derive analytical and numerical methods for efficient computation of the optimal probability parameters for a single clause choice within such a search. An implementation of this approach in Progol4.5 is demonstrated.
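
A much-simplified sketch (mine, not the paper's derivation) of the special case where the candidate clause competes only with the other clauses of the same predicate and every example has a single non-failing derivation; the optimal label then has a closed form:

    import numpy as np

    n = np.array([1, 2, 0, 1])   # uses of the candidate clause in each example's derivation
    m = np.array([2, 1, 3, 1])   # uses of the competing clauses in the same derivations
    # the likelihood prod_i p**n[i] * (1 - p)**m[i] is maximised by relative frequency
    p_opt = n.sum() / (n.sum() + m.sum())
    print(p_opt)   # 4/11, about 0.364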


Speaker: Henri Prade
Title: Modeling positive and negative information in possibility theory
Abstract: From a representation point of view, it may be interesting to distinguish between i) what is possible because it is consistent with the available knowledge, and ii) what is guaranteed to be possible because it is reported from observations. Such a distinction also makes sense when expressing preferences, in order to point out choices which are positively desired among those which are not rejected. Possibility theory provides a representation framework in which this distinction can be made in a graded way. In logical terms, the two types of information can be encoded by two types of constraints, expressed in terms of necessity measures and in terms of guaranteed possibility functions. These two set functions are min-decomposable with respect to conjunction and disjunction respectively. This gives rise to two forms of possibilistic logic bases, which can be viewed as generalized CNF and DNF respectively. By application of a minimal commitment principle, they induce a pair of possibility distributions at the semantic level, for which a consistency condition should hold in order to ensure that what is claimed to be guaranteed as possible is indeed not impossible. The bilattice structure underlying the framework is pointed out. The paper provides a survey of this bipolar representation framework, including the use of conditional measures and the handling of comparative context-dependent constraints. The interest of the framework is stressed for expressing preferences and for representing if-then rules in terms of examples and counter-examples.
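
In the standard notation of possibility theory (textbook definitions, not anything specific to this paper), the two min-decomposability properties and the consistency condition read:

    N(\varphi \wedge \psi) = \min(N(\varphi), N(\psi)), \qquad
    \Delta(\varphi \vee \psi) = \min(\Delta(\varphi), \Delta(\psi)), \qquad
    \delta(\omega) \le \pi(\omega) \ \text{for all } \omega,

where \pi is the distribution induced by the necessity-based (negative) constraints and \delta the one induced by the guaranteed-possibility (positive) constraints.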


Speaker: Stuart Russell
Title: Identity Uncertainty
Abstract: We are often uncertain about the identity of objects. This phenomenon appears in theories of object persistence in early childhood; in the well-known Morning Star/Evening Star example; in tracking and data association systems for radar; in security systems based on personal identification; in the content analysis of web pages; and in many aspects of our everyday lives. The talk will describe a formal probabilistic approach to reasoning about identity under uncertainty in the framework of first-order logics of probability, with application to wide-area freeway traffic monitoring and citation matching.


Speaker: Claude Sammut
Title: Localisation in an uncertain world with multiple agents
Abstract: The information delivered by a robot's sensors always contains a high degree of uncertainty. When there are multiple communicating robots, there is an opportunity to combine sensor information to obtain a more complete model of the world. However, combining uncertain information in noisy and dynamic environments is fraught with many difficulties and can often make reasoning about the world more difficult. In this paper, we describe our attempts to combine world models of multiple robots in the RoboCup robot soccer domain. We work with Sony's Aibo legged robot. A team consists of four such robots, each of which operates autonomously, sensing its environment primarily through an onboard colour camera. Each robot is equipped with a wireless LAN card enabling it to communicate with its team mates. The robots attempt to localise themselves and other objects and construct a world model which may be shared. The problem we discuss is how the world models from other robots can be incorporated into a single model. We also discuss the dangers in attempting to do so.
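
One standard ingredient of such model merging, sketched under the simplifying assumption that the two estimates are independent Gaussians (my illustration, not the team's actual method): inverse-variance weighting.

    def fuse(mu1, var1, mu2, var2):
        """Product of two independent Gaussian estimates."""
        w1, w2 = 1.0 / var1, 1.0 / var2
        return (w1 * mu1 + w2 * mu2) / (w1 + w2), 1.0 / (w1 + w2)

    # robot A sees the ball at x = 3.0 (var 0.5); robot B at x = 3.4 (var 2.0)
    print(fuse(3.0, 0.5, 3.4, 2.0))   # (3.08, 0.4): leans toward the surer robot

The independence assumption is one place where this becomes dangerous: two robots whose vision errors are correlated will produce a fused variance that is misleadingly small.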


Speaker: Erik Sandewall
Title: What's in a Decision?
Abstract: We address knowledge representation issues that arise when designing systems with cognitive autonomy and, in particular, the representational framework for deliberation about courses of action and for monitoring the execution of actions, as well as the momentary decisions that mark the transition from deliberation to action.


Speaker: Taisuke Sato
Title: Computing Kikuchi approximations by cluster BP
Abstract: Learning and logical reasoning are notable functionalities of our intelligence, and PRISM has been developed as a programming language for implementing such intelligent functionalities. It is a logic programming language augmented with an EM learning capability for statistical parameters embedded in programs. It not only encompasses known symbolic-statistical formalisms from Bayesian networks to hidden Markov models to probabilistic CFGs, but also subsumes the specific EM learning algorithms proposed in each area. In this paper, we raise the issue of approximate probability computations in the hope of mitigating some limitations of the PRISM formalism. We pay particular attention to "loopy BP", which lies at the crossroads of Bayesian networks, statistical physics and error-correction decoding. It is a way of approximating intractable probability computations based on asynchronous message passing. We re-examine its theoretical background, the Kikuchi approximation, and propose "cluster BP", a mild generalization of loopy BP that realizes better yet simple Kikuchi approximations. We also propose "cluster CCCP", a convergent version of cluster BP that obtains a local minimum of the Kikuchi approximation.
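
For reference, the standard loopy BP message update on a pairwise model (cluster BP, as proposed in the paper, passes messages between larger clusters of variables rather than single nodes):

    import numpy as np

    def update_message(i, j, node_pot, edge_pot, msgs, nbrs):
        """m_{i->j}(x_j) ~ sum_{x_i} psi_i(x_i) psi_ij(x_i, x_j)
                           prod_{k in N(i) minus j} m_{k->i}(x_i)"""
        belief = node_pot[i].copy()
        for k in nbrs[i]:
            if k != j:
                belief *= msgs[(k, i)]
        m = edge_pot[(i, j)].T @ belief    # marginalise out x_i
        return m / m.sum()                 # normalise for numerical stability

Messages are updated, typically asynchronously, until they settle; on a tree this is exact, and on loopy graphs the fixed points correspond to stationary points of the Bethe free energy, the simplest member of the Kikuchi family.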


Speaker: Hiroaki Watanabe
Title: A First-order Stochastic Action Language
Abstract: A first-order stochastic action language, Datalog Stochastic Action Language (Datalog-SAL), is introduced for describing stochastic, dynamically changing worlds. We define the syntax and semantics of Datalog-SAL in order to show that a model of Datalog-SAL is given as a set of Hidden Markov Models (HMMs). Probability distributions over actions, state transitions, and initial states can thus be learned by HMM parameter-learning algorithms such as the Baum-Welch algorithm. We investigate the learning complexity of Datalog-SAL with the Baum-Welch algorithm. We show two simple examples, from a nuclear power station domain and a metabolic pathway reaction domain. Datalog-SAL provides a simple and smooth combination of Datalog and probability in the action-and-change framework.
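
The Baum-Welch re-estimation step mentioned here is the textbook one (standard HMM notation, nothing specific to Datalog-SAL): with \gamma_t(i) = P(q_t = i \mid O, \lambda) and \xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) obtained from the forward-backward pass,

    \hat{\pi}_i = \gamma_1(i), \qquad
    \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
    \hat{b}_j(o) = \frac{\sum_{t \,:\, O_t = o} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.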


Speaker: Jon Williamson
Title: Maximising Entropy Efficiently
Abstract: Determining a prior probability function via the maximum entropy principle can be a computationally intractable task. However one can easily determine - in advance of entropy maximisation - a list of conditional independencies that the maximum entropy function will satisfy. These independencies can be used to reduce the complexity of the entropy maximisation task. In particular, one can use these independencies to construct a direct acyclic graph in a Bayesian network, and then maximise entropy with respect to the numerical parameters of this network. This can result in an efficient representation of a prior probability function, and one that may allow efficient updating and marginalisation. Furthermore this simplification of the entropy maximisation task may be exploited to construct a proof theory for probabilistic logic.