Papers at the workshop will be refereed for a special issue of the Machine Intelligence area of the ETAI (Electronic Transactions on Artificial Intelligence) Journal. Referee comments for inclusion of papers will be posted on the MI19 ETAI web page.

**Programme for Machine Intelligence 19 (PDF - includes information about breaks and session chairs)**

**Contributors**

Nicos Angelopoulos, James Cussens, Peter Flach, Koichi Furukawa, Hector Geffner, Lise Getoor, Katsumi Inoue, Manfred Jaeger, David Jensen, Kristian Kersting, Stephen Muggleton, Henri Prade, Stuart Russell, Claude Sammut, Erik Sandewall, Taisuke Sato, Hiroaki Watanabe, Jon Williamson

**Accommodation Information** - Single rooms have been assigned to all contributors. Any questions concerning accommodation should be directed to Janice Lonsdale (jll@doc.ic.ac.uk).

**Speaker: Nicos Angelopoulos**
**Title: Machine Learning Metabolic Pathway descriptions using a Probabilistic Relational Representation**
**Abstract: **
A variety of approaches aimed at machine learning probabilistic relational
representations have been explored in recent years. These include learning
algorithms for Probabilistic Relational Models, Bayesian Logic Programs and
Stochastic Logic Programs. In this paper we investigate the question
of whether there exist important applications which require the machine
learning of such representations. In particular,
applications of this kind must require a) machine learning, b) a probabilistic
representation and c) relations. We argue in this paper that the automated
extension of partial metabolic pathway descriptions fits all these
criteria. As usual with such representations the learning task can be
divided into sub-tasks involving model building and parameter estimation.
In order to gain understanding of the sample size requirements for
estimating the parameters of medium-sized metabolic networks
we conducted experiments in which RMS error is estimated with
increasing sample size. We separately investigated the effects of
multiple pathway contributions to the concentrations of metabolites
within the networks. For non-branching pathways the results indicate
that parameter estimation is feasible from limited sources of data.
However, the introduction of branches increases sample requirements
considerably.
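The sample-size question can be illustrated with a small simulation (a hypothetical sketch, not the paper's experiment): estimate a single, non-branching pathway parameter as a Bernoulli probability by maximum likelihood, and watch the RMS error shrink as the sample grows.

```python
import math
import random

def rms_error(true_p, n_samples, n_trials=2000, seed=0):
    """Estimate the RMS error of the maximum-likelihood estimate of a
    probability true_p from n_samples Bernoulli observations,
    averaged over n_trials repetitions."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(n_trials):
        hits = sum(rng.random() < true_p for _ in range(n_samples))
        sq_err += (hits / n_samples - true_p) ** 2
    return math.sqrt(sq_err / n_trials)

# RMS error shrinks roughly as 1/sqrt(n): more data, better estimates.
for n in (10, 100, 1000):
    print(n, round(rms_error(0.3, n), 3))
```

Branching multiplies the number of parameters to disentangle, which is one intuition for why the abstract reports considerably larger sample requirements in that case.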

**Speaker: James Cussens**
**Title: Inductive logic versus the logic of probability**
**Abstract: **
First-order logic is a formalism whose representational flexibility
and well-developed theory motivate its adoption for representing
complex probabilistic models. Probability has often been presented as
a replacement for logic-based methods, but much of the new work
in AI is both logical and probabilistic: seeking to exploit the
strengths of both approaches in a hybrid framework. Logic-based
approaches (often based on logic programming) also provide a method
of implementing probabilistic programming languages.
Combined logical-probabilistic frameworks are very old; often the
result of grand attempts to formulate a calculus of general
reasoning. A careful account of the assumptions and achievements of
these systems provides understanding of what is significant and novel
in contemporary logical-probabilistic frameworks.
This paper aims to provide such an account drawing particularly on the
pioneering work of Boole. We find two major strands in this
work. There are attempts to address Hume's problem of induction by
devising an inductive logic and then there is work with the more
modest aim of formalising probabilistic reasoning using a logic of
probability.

**Speaker: Peter Flach**
**Title: Probabilistic reasoning with terms**
**Abstract: **
Many problems in artificial intelligence can be naturally approached
by generating and manipulating probability distributions over
structured objects. First-order terms such as lists, trees and tuples
and nestings thereof can represent individuals with complex structure
in the underlying domain, such as sequences or molecules.
Higher-order terms such as sets and multisets provide additional
representational flexibility. I will present two Bayesian approaches
that employ such probability distributions over structured objects:
the first is an upgrade of the well-known naive Bayesian classifier
to deal with first-order and higher-order terms, and the second is an
upgrade of propositional Bayesian networks to deal with nested tuples.
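A minimal illustration of the first idea, under the simplifying (and hypothetical) assumption that a term's elements are conditionally independent given the class — essentially naive Bayes lifted from attribute vectors to lists:

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (label, sequence). Learn per-class element
    distributions with Laplace smoothing; a list term is scored as the
    product of its elements' class-conditional probabilities."""
    counts, totals, priors, vocab = {}, Counter(), Counter(), set()
    for label, seq in examples:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(seq)
        totals[label] += len(seq)
        vocab.update(seq)
    return counts, totals, priors, vocab

def classify(model, seq):
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    def log_score(label):
        s = math.log(priors[label] / n)
        for x in seq:
            s += math.log((counts[label][x] + 1) / (totals[label] + len(vocab)))
        return s
    return max(priors, key=log_score)

# Hypothetical toy data: classify a sequence as DNA-like or protein-like.
model = train([("dna", list("acgtacgt")), ("protein", list("mkvlaw"))])
prediction = classify(model, list("acgt"))   # → "dna"
```

Real structured objects (nested tuples, sets, multisets) need recursive distributions over the term structure; this sketch only shows the flat-list base case.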

**Speaker: Koichi Furukawa**
**Title: Modelling Human Skill in Bayesian Network**
**Abstract: **
In this paper, we discuss the problem of modelling human skill in a
Bayesian network. The purpose of modelling human skill is to use the
model to improve performance in such activities as playing
instruments, dancing, and playing various kinds of sports. The problem
of human skill is its tacitness: even professional violinists or
cellists do not know how they are playing. The purpose of this
research is to model human skill in terms of Bayesian networks. This
paper gives the basic framework of the research by proposing possible
representations and structures of the Bayesian networks for human
skill, and by defining the purpose of model usage.

**Speaker: Hector Geffner**
**Title: Planning as Branch and Bound**
**Abstract: **
Branching and lower bounds are two key notions
in heuristic search and combinatorial optimization.
Branching refers to the way the space of
solutions is searched, while lower bounds
refer to approximate cost measures
used for focusing and pruning the search.
In AI Planning, admissible heuristics
or lower bounds have received considerable
attention recently and most current optimal
planners use them, either explicitly or implicitly
(e.g., by relying on a plan graph).
Branching, on the other hand,
has received less attention and is seldom
discussed explicitly.
In this work, we make the notion of branching
in planning explicit and relate it to branching schemes
used in combinatorial optimization.
We analyze from this perspective
the relationship between heuristic-search, constraint-based
and partial-order planning, and between planning and scheduling.
We also introduce a branch-and-bound formulation for planning
that handles actions with durations and unary
resources, and consider a range of theories
that stand halfway between planning and scheduling.
The goals are twofold: to make sense
of various techniques that have been found
useful in planning and scheduling
in a unified framework,
and to lay the ground for systems
that can effectively combine both.
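The branch-and-bound scheme the abstract describes can be sketched on a toy planning domain (hypothetical, not from the paper): branching enumerates applicable actions, and an admissible lower bound — here the number of unsatisfied goal atoms — prunes branches that cannot beat the best plan found so far.

```python
def branch_and_bound(init_state, goal, actions, heuristic):
    """Depth-first branch and bound over action sequences. A branch is
    pruned when cost-so-far + lower bound >= cost of the best plan."""
    best = {"cost": float("inf"), "plan": None}
    def search(state, plan, cost):
        if cost + heuristic(state, goal) >= best["cost"]:
            return                          # lower bound prunes this branch
        if goal <= state:                   # all goal atoms achieved
            best["cost"], best["plan"] = cost, list(plan)
            return
        for name, pre, add, delete in actions:
            if pre <= state:                # action applicable: branch on it
                nxt = (state - delete) | add
                if nxt == state:
                    continue                # no progress; a real planner would detect cycles
                search(nxt, plan + [name], cost + 1)
    search(frozenset(init_state), [], 0)
    return best["plan"], best["cost"]

# Hypothetical unit-cost domain: make coffee. The unsatisfied-goal-count
# heuristic is admissible since each action adds at most one goal atom.
actions = [
    ("boil",  frozenset({"water"}),          frozenset({"hot"}),    frozenset()),
    ("grind", frozenset({"beans"}),          frozenset({"ground"}), frozenset()),
    ("brew",  frozenset({"hot", "ground"}),  frozenset({"coffee"}), frozenset()),
]
h = lambda s, g: len(g - s)
plan, cost = branch_and_bound({"water", "beans"}, frozenset({"coffee"}), actions, h)
```

Different branching rules (which action, which open condition, which time point) give the different planner families the abstract compares, while the bound plays the same pruning role throughout.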

**Speaker: Lise Getoor**
**Title: Learning Structured Statistical Models from Relational Data**
**Abstract: **
Here we describe tools for constructing statistical models from relational
data. Our goal is to learn structured probabilistic models
that represent statistical correlations both between the properties of
an entity and between the properties of related entities. These statistical
models can then be used for a variety of tasks including knowledge discovery,
exploratory data analysis, data summarization and anomaly detection.
Unfortunately, most statistical learning methods work only with "flat"
data representations. Thus, to apply these methods, we
are forced to convert the data into a flat form, thereby not only
losing its compact representation and structure but also potentially
introducing statistical skew. These drawbacks severely limit the ability
of current statistical methods to model relational databases. Here we describe
two complementary approaches: one approach suited to making probabilistic
statements about individuals and the second approach suited to making
statements about frequencies in relational data. We describe algorithms for
both learning and making inferences in these models, and give experimental
results.

**Speaker: Katsumi Inoue**
**Title: Disjunctive Explanations in Inductive and Abductive Logic Programming**
**Abstract: **
Given some evidence or observation, abduction infers candidate
hypotheses to explain such an observation. In this paper, we consider
the case in which multiple explanations exist but we cannot select one
from them. Our approach leaves the decision indefinite, but offers a
consistent way to handle theory change. A disjunctive explanation, which is
a disjunction of possible alternative explanations, is introduced for
this purpose. For abduction, such a disjunction is useful when
obtained explanations are incorporated or assimilated into the current
knowledge base. The assimilated knowledge base preserves the intended
semantics from the collection of all possible updated knowledge bases.
The merit of disjunctive explanations is that we only need one current
knowledge base at a time, still keeping every possible change in a
single state. This method is extended to accomplish an update when
there are multiple ways to remove old unnecessary hypotheses from the
knowledge base. The proposed framework is also well applied to view
updates in disjunctive databases. Finally, disjunctive explanations
are considered in induction, where we take the disjunction of multiple
rules. In this case, disjunctive explanations correspond to taking the
greatest specialization of rules under implication.
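A toy illustration of the idea (a propositional sketch, not the paper's logic-programming framework): when several abducibles each explain the observation, keep their disjunction rather than committing to one.

```python
def explanations(rules, abducibles, observation):
    """Enumerate single-literal abductive explanations: abducible h
    explains the observation if the rules derive it from {h}.
    rules: list of (body_set, head) Horn clauses."""
    def closure(facts):
        facts, changed = set(facts), True
        while changed:
            changed = False
            for body, head in rules:
                if body <= facts and head not in facts:
                    facts.add(head)
                    changed = True
        return facts
    return [h for h in abducibles if observation in closure({h})]

# wet_grass is explained by rain or by sprinkler; rather than selecting
# one explanation, retain the disjunctive explanation.
rules = [({"rain"}, "wet_grass"), ({"sprinkler"}, "wet_grass")]
expl = explanations(rules, ["rain", "sprinkler", "wind"], "wet_grass")
disjunctive = " ∨ ".join(expl)
```

Assimilating the single disjunction keeps one knowledge base that still covers every possible update, which is the merit the abstract emphasizes.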

**Speaker: Manfred Jaeger**
**Title: Relational Bayesian Networks, a Survey**
**Abstract: **
Relational Bayesian networks are a highly expressive
representation language for probability distributions
on relational structures. Distinguishing features of
this framework are syntactic simplicity (the language is defined
by only four elementary syntactic constructs loosely corresponding
to the constructs of predicate logic) and semantic clarity.
In this paper I give an overview of relational Bayesian
networks. First, syntax, semantics and inference methods for
elementary queries are recapitulated; then I will highlight
some of the more challenging topics that arise when one
asks non-elementary queries about probabilities for
infinite structures, or limit probabilities for finite structures
of increasing size. Finally, I will briefly discuss the
question of learning relational Bayesian networks from data.

**Speaker: David Jensen**
**Title: Feature Selection Biases in Relational Learning**
**Abstract: **
Recently, we showed how two common structural characteristics of
relational data can cause learning algorithms to mistakenly conclude
that correlation exists between a class label and a relational feature
(Jensen & Neville 2002). This paper extends that work to three
additional cases where structural characteristics of relational data can
cause an inappropriate bias toward certain classes of features.
Collectively, these cases can affect any learning algorithm that uses
aggregation functions (e.g., maximum, minimum, average, mode, count, or
exists) to construct relational features. Such algorithms include
probabilistic relational models and many algorithms for learning models
in first-order logic. We provide proofs of these biases and discuss
their effect on algorithms for learning probabilistic models from
relational data. We also summarize and discuss all our recent work on
the feature selection biases introduced by the relational structure of
training data.
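The aggregation functions listed above can be sketched as follows (an illustrative stand-in, not the authors' code); the final comment notes one way such features can bias selection.

```python
def relational_features(values):
    """Flatten a one-to-many relation into propositional features using
    the common aggregates: max, min, average, mode, count, exists."""
    if not values:
        return {"count": 0, "exists": False}
    return {
        "max": max(values),
        "min": min(values),
        "avg": sum(values) / len(values),
        "mode": max(set(values), key=values.count),
        "count": len(values),
        "exists": True,
    }

# Entities with more related records tend to yield more extreme "max"
# values even when the underlying distribution is identical, so degree
# differences can masquerade as class correlation.
few  = relational_features([3, 5])
many = relational_features([3, 5, 2, 4, 6, 1, 7])
```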

**Speaker: Kristian Kersting**
**Title: Logical Hidden Markov Models**
**Abstract: **
Logical hidden Markov models (LOHMMs) are a generalization of hidden
Markov models (HMMs) to analyze sequences of logical atoms. In LOHMMs,
abstract states summarize sets of states and are represented by
logical atoms. Transitions are defined between abstract states
to summarize sets of transitions between states. Unification is used
to share information among states, and between states and
observations.
By design, the number of parameters of a LOHMM can be an order of
magnitude smaller than that of an equivalent HMM. We devised
adaptations of the classical HMM procedures such as the forward and
backward procedures. Our experiments show that LOHMMs have a good
generalization performance, and that it is easy to extract
characteristic patterns from the trained LOHMM.
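For reference, the classical forward procedure that LOHMMs adapt, shown on a hypothetical two-state toy HMM with ground states rather than abstract logical atoms:

```python
def forward(init, trans, emit, observations):
    """Classical HMM forward procedure: returns P(observations) by
    dynamic programming over alpha[state] at each time step."""
    states = list(init)
    alpha = {s: init[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s2: emit[s2][obs] * sum(alpha[s1] * trans[s1][s2] for s1 in states)
            for s2 in states
        }
    return sum(alpha.values())

# Hypothetical toy model: two weather states emitting activity symbols.
init  = {"rain": 0.6, "sun": 0.4}
trans = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.4, "sun": 0.6}}
emit  = {"rain": {"walk": 0.1, "shop": 0.9}, "sun": {"walk": 0.6, "shop": 0.4}}
p = forward(init, trans, emit, ["walk", "shop"])
```

In a LOHMM the enumerated ground states would be replaced by logical atoms, with unification deciding which abstract transitions apply; this sketch shows only the propositional baseline.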

**Speaker: Stephen Muggleton**
**Title: Learning structure and parameters of Stochastic Logic Programs**
**Abstract: **
Previous papers have studied learning of Stochastic Logic Programs (SLPs)
either as a purely parametric estimation problem or by separating structure
learning and parameter estimation into distinct phases. In this paper we
consider ways in which both the structure and the parameters
of an SLP can be learned simultaneously. The paper assumes an ILP algorithm,
such as Progol or FOIL, in which clauses are constructed independently.
We derive analytical and numerical methods for efficient computation
of the optimal probability parameters for a single clause choice within
such a search. An implementation of this approach in Progol4.5 is demonstrated.

**Speaker: Henri Prade**
**Title: Modeling positive and negative information in possibility theory**
**Abstract: **
From a representation point of view, it may be interesting to
distinguish between i) what is possible because it is consistent with
the available knowledge on the one hand, and ii) what is guaranteed
to be possible because it is reported from observations on the other
hand. Such a distinction also makes sense when expressing
preferences, in order to point out choices which are positively
desired among those which are not rejected. Possibility theory
provides a representation framework where this distinction can be
made in a graded way. In logical terms, the two types of information
can be encoded by two types of constraints expressed in terms of
necessity measures and in terms of guaranteed possibility functions.
These two set functions are min-decomposable with respect to
conjunction and disjunction respectively. This gives birth to two
forms of possibilistic logic bases, which can be viewed as generalized
CNF and DNF respectively. By application of a minimal commitment
principle, they induce a pair of possibility distributions at the
semantic level, for which a consistency condition should hold in
order to ensure that what is claimed to be guaranteed as possible is
indeed not impossible. The bilattice structure underlying the
framework is pointed out. The paper provides a survey of this bipolar
representation framework, including the use of conditional measures,
or the handling of comparative context-dependent constraints. The
interest of the framework is stressed for expressing preferences, or
in the representation of if-then rules in terms of examples and
counter-examples.

**Speaker: Stuart Russell**
**Title: Identity Uncertainty**
**Abstract: **
We are often uncertain about the identity of objects. This phenomenon
appears in theories of object persistence in early childhood; in the
well-known Morning Star/Evening Star example; in tracking and data
association systems for radar; in security systems based on personal
identification; in the content analysis of web pages; and in many
aspects of our everyday lives. The talk will describe a formal
probabilistic approach to reasoning about identity under uncertainty
in the framework of first-order logics of probability, with
application to wide-area freeway traffic monitoring and citation matching.

**Speaker: Claude Sammut**
**Title: Localisation in an uncertain world with multiple agents**
**Abstract: **
The information delivered by a robot's sensors always contains a high
degree of uncertainty. When there are multiple communicating robots,
there is an opportunity to combine sensor information to obtain a more
complete model of the world. However, combining uncertain information in
noisy and dynamic environments is fraught with many difficulties and can
often make reasoning about the world more difficult.
In this paper, we describe our attempts to combine world models of
multiple robots in the RoboCup robot soccer domain. We work with Sony's
Aibo legged robot. A team consists of four such robots, each of which
operates autonomously, sensing its environment primarily through an
onboard colour camera. Each robot is equipped with a wireless LAN card
enabling it to communicate with its team mates. The robots attempt to
localise themselves and other objects and construct a world model which
may be shared. The problem we discuss is how the world models from other
robots can be incorporated into a single model. We also discuss the
dangers in attempting to do so.
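One standard way to combine two independent, uncertain position estimates is inverse-variance weighting (a textbook rule offered here as a sketch, not necessarily the method used on the Aibos); the abstract's caution applies, since the rule assumes independent errors and well-calibrated variances.

```python
def fuse(est_a, var_a, est_b, var_b):
    """Inverse-variance weighted fusion of two independent estimates;
    the fused variance is never larger than either input's."""
    w_a, w_b = 1 / var_a, 1 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1 / (w_a + w_b)
    return fused, fused_var

# Robot A sees the ball at x = 2.0 (variance 0.5); robot B, further
# away, reports x = 3.0 (variance 2.0). The fusion trusts A more.
pos, var = fuse(2.0, 0.5, 3.0, 2.0)
```

If one robot is mislocalised, its confident but wrong report drags the fused estimate off — one concrete form of the danger the abstract discusses.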

**Speaker: Erik Sandewall**
**Title: What's in a Decision?**
**Abstract: **
We address knowledge representation issues that arise when designing
systems with cognitive autonomy and, in particular, the representational
framework for deliberation about courses of action and for
monitoring the execution of actions, as well as the momentary
decisions that mark the transition from deliberation to action.

**Speaker: Taisuke Sato**
**Title: Computing Kikuchi approximations by cluster BP**
**Abstract: **
Learning and logical reasoning are notable
functionalities of our intelligence, and PRISM
has been developed as a programming language for
implementing such intelligent functionalities.
It is a logic programming language augmented
with an EM learning capability for statistical
parameters embedded in programs. It not only
encompasses known symbolic-statistical formalisms
from Bayesian networks to hidden Markov models
to probabilistic CFGs but also subsumes the specific
EM learning algorithms proposed in each area.
In this paper, we raise the issue of approximate
probability computations in the hope of mitigating
some limitations of the PRISM formalism. We
especially pay attention to "loopy BP" which is
at the crossroads of Bayesian networks,
statistical physics and error correction
decoding. It is a way of approximating
intractable probability computations based on
asynchronous message passing. We re-examine its
theoretical background, the Kikuchi
approximation, and propose "cluster BP," a mild
generalization of loopy BP to realize better
yet simple Kikuchi approximations. We also
propose "cluster CCCP" which is a convergent
version of cluster BP to obtain a local minimum
of Kikuchi approximations.

**Speaker: Hiroaki Watanabe**
**Title: A First-order Stochastic Action Language**
**Abstract: **
A first-order stochastic action language, Datalog Stochastic Action Language
(Datalog-SAL), is introduced for describing stochastic dynamic changing
worlds. We define the syntax and semantics of Datalog-SAL in order to show that
a model of Datalog-SAL is given as a set of Hidden Markov Models (HMMs).
Probability distributions over actions, state transitions, and initial states
can be learned by HMM parameter-learning algorithms such as the Baum-Welch
algorithm. We investigate the learning complexity of Datalog-SAL with the
Baum-Welch algorithm. We show two simple examples from a nuclear power station
domain and the metabolic pathway reaction domain. Datalog-SAL provides a
simple and smooth combination of Datalog and probability in the action and
change framework.

**Speaker: Jon Williamson**
**Title: Maximising Entropy Efficiently**
**Abstract: **
Determining a prior probability function via the maximum entropy principle
can be a computationally intractable task. However one can easily
determine - in advance of entropy maximisation - a list of conditional
independencies that the maximum entropy function will satisfy. These
independencies can be used to reduce the complexity of the entropy
maximisation task. In particular, one can use these independencies to
construct a directed acyclic graph for a Bayesian network, and then maximise
entropy with respect to the numerical parameters of this network. This can
result in an efficient representation of a prior probability function, and
one that may allow efficient updating and marginalisation. Furthermore this
simplification of the entropy maximisation task may be exploited to
construct a proof theory for probabilistic logic.
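The principle can be seen on Jaynes's classic dice example (a standard illustration, not the paper's construction): the maximum-entropy distribution on {1..6} under a mean constraint is an exponential family, and the single Lagrange multiplier can be found by bisection.

```python
import math

def maxent_die(mean, lo=-10.0, hi=10.0):
    """Maximum-entropy distribution on {1..6} subject to a mean
    constraint: p(k) ∝ exp(lam * k), with lam found by bisection."""
    def dist(lam):
        w = [math.exp(lam * k) for k in range(1, 7)]
        z = sum(w)
        return [x / z for x in w]
    def m(lam):  # mean of the tilted distribution; increasing in lam
        return sum(k * p for k, p in zip(range(1, 7), dist(lam)))
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if m(mid) < mean else (lo, mid)
    return dist((lo + hi) / 2)

p = maxent_die(4.5)   # entropy-maximising die with mean 4.5
```

With many variables the search is over all joint assignments; exploiting the conditional independencies the abstract describes replaces this global optimisation with local optimisations over Bayesian-network parameters.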