|
Simon Colton Division of Informatics, University of
Edinburgh 80 South Bridge, Edinburgh EH1 1HN, United Kingdom
simonco@dai.ed.ac.uk
Abstract
We discuss the HR program [colton:ijcai99] which is designed to
perform automated theory formation in domains of pure mathematics. We
overview the recent application of theory formation to discovery,
learning and problem solving tasks. We compare how theory formation is
used for these tasks and discuss how to compare HR to the machine
learning program Progol [muggleton:progol].
The HR Program
The HR program [colton:ijcai99], named after mathematicians Hardy
and Ramanujan, is designed to form theories in domains of mathematics
such as group theory, graph theory and number theory. HR starts with
background information such as the axioms of a finite algebra, or some
concepts in number theory such as the divisors of integers,
multiplication and addition. Each concept is supplied with a
definition in both a Prolog format and a format acceptable as input to
the Otter theorem prover [mccune:ottermanual], and a finite set
of examples. HR uses one of seven general production rules to base a
new concept on either one old concept (in which case we say the
production rule is unary ) or two old concepts (a binary
production rule). This produces a set of concepts which form the core
of the theory. Each production rule generates a definition in both
formats and a set of examples for the new concept.
While producing concepts, HR makes conjectures supported by empirical
evidence. If it notices that the examples of a new concept are exactly
the same as an old concept (for the data available), it will
conjecture that the definitions of the two concepts are logically
equivalent --- producing an `if and only if' conjecture. In finite
algebras such as finite group theory, HR invokes Otter to prove the
conjectures it makes. Whenever Otter is unsuccessful, HR invokes the
MACE model generator [mccune:macemanual] to find a counterexample
to disprove the conjecture. In this way, HR forms a theory which
contains concepts, examples, open conjectures, theorems and proofs.
To improve the quality of the theories, HR uses heuristic measures to
estimate the worth of concepts. It then performs a best first search
by using the more interesting concepts as the basis for new concepts
before the less interesting ones. The user sets weights for a weighted
sum of all the measures and this sum is taken as an estimate of the
worth of each concept. The measures include intrinsic properties of
the concept such as the number of examples it has, as well as
relational measures such as the novelty of the categorisation it
produces, as discussed in [colton:ijcai99].
The quantity and quality of conjectures that a concept appears in is
also assessed, with concepts appearing in interesting conjectures
assessed as more interesting than those appearing in dull
conjectures. The worth of a theorem is assessed by the length of the
proof produced by Otter, with longer proofs indicating a more
interesting conjecture statement. Conjectures are also assessed by
their surprisingness: the syntactic difference between two definitions
conjectured to be equivalent is taken as a measure of the
surprisingness, with a greater difference indicating a more surprising
conjecture. HR completes a cycle of mathematical activity where
concept formation drives conjecture making which in turns improves
concept formation.
HR improves on previous theory formation programs such as AM
[lenat:kbsiai] and GT [epstein:otdomt] by incorporating
theorem proving (AM could not prove theorems) and by being able to
work in different domains (GT could only work in graph theory). We
have recently applied HR to discovery, learning and problem solving
tasks. We provide enough detail of these tasks below to enable a
comparison of the techniques involved.
Discovery Tasks
In less than an hour, HR can produce more than 2000 concepts in number
theory and if the user sets different starting parameters, the yield
of concepts differs each time HR is run. There is therefore the
possibility of HR producing new and genuinely interesting concepts.
We explored this possibility in [colton:aaai00] and we summarise
this work below.
It is difficult to tell in general whether a concept is either new or
interesting. In number theory, however, there is an Encyclopedia of
Integer Sequences, [sloane:oeois] which contains over 55,000
sequences collected over 35 years by Neil Sloane, with contributions
from many mathematicians. If a concept HR produces in number theory
can be interpreted as an integer sequence which is missing from the
Encyclopedia, this gives some indication --- but by no means a
guarantee --- that the concept may be novel.
We also used the Encyclopedia to give some indication as to whether
the new integer sequences HR produces are interesting. To do this for
a chosen sequence , we enabled HR to find sequences in the
Encyclopedia which are empirically related to . This relation is
then stated as a conjecture about . As a trivial example, given the
sequence of prime numbers, HR makes the conjecture that they are never
square numbers. It does this by noticing that none of the prime
numbers in its sequence were in the Encyclopedia entry for square
numbers. As well as finding disjoint sequences, HR is able to find
subsequences and supersequences of the chosen sequences.
Due to the large number of sequences in the Encyclopedia, many
sequences related to the chosen one are output and we had to implement
pruning techniques to discard dull conjectures. For example, it is
desirable that a sequence conjectured to be disjoint with the chosen
sequence has its terms distributed over roughly the same part of the
number line as the chosen sequence. If so, the two sequences occupy
roughly the same part of the number line yet do not share any terms
--- which increases the possibility of the conjecture being true
and/or interesting. Therefore, HR discards conjectures about disjoint
sequences if the overlap of their ranges falls below a minimum
specified by the user.
By finding conjectures relating the sequence HR has invented to the
sequences already in the Encyclopedia, HR provides some evidence that
the sequence is of interest. This invent and investigate approach has
successfully led to 17 sequences invented by HR being added to the
Encyclopedia, all supplied with interesting conjectures. A good
example of this is the sequence of integers where the number of
divisors is prime, which HR invented (in as much as it was produced by
HR and not present in the Encyclopedia). When asked to find
subsequences of this sequence, the first answer produced was the
sequence of integers where the sum of divisors is prime
(submitted to the Encyclopedia by someone else). Interpreted as a
conjecture, this result states that, given an integer, , if the sum
of divisors of is prime, then the number of divisors of will
also be prime. We have subsequently proved this result, and while we
do not know for certain whether it is a new result, it certainly adds
interest to the sequence HR invented.
Learning Tasks
We have also explored the possibility of using HR as a machine
learning program, where the user supplies examples and non-examples of
a concept and HR finds a definition for the concept. So far, we have
used HR to learn definitions for integer sequences, as discussed in
[colton:icml00]. We have also applied HR to Michalski-style train
problems [michalski:trains_original] where the program is asked
to find a reason why a certain set of trains are going East, based on
certain characteristics of the train, for example the shape of the
carriages.
The naive way to use theory formation for learning tasks is to supply
HR with background knowledge and ask it to form a theory, stopping
when it has found a concept which matches the data supplied. To focus
theory formation, we adapted HR's heuristic search to favour building
on concepts which achieved a categorisation closer to the one achieved
by the target concept. For reasons given in [colton:icml00], we
found that this approach often failed to learn integer sequences
because it concentrated on the wrong concepts.
Instead of the heuristic search, we used an exhaustive search enhanced
with a look ahead mechanism. As an example, given the sequence (prime numbers) we have enabled HR to notice that, when it
forms the concept of number of divisors, the prime numbers all have
two divisors, a fact which is true of none of the other integers in
HR's dataset. Each production rule has an algorithm for spotting a
pattern which is true only for the positive examples, and when this
happens a suitable theory formation step involving that production
rule is added to the top of the agenda. Execution of the step will
produce a concept which fits the data. The pattern spotting mechanism
is faster than actually performing a theory formation step because (a)
for the majority of the time, it is possible to quickly tell that
there is no pattern and (b) there are overheads involved in performing
a theory formation step.
We also implemented a scoring system for the look ahead mechanism
whereby the largest number of positive examples for which a pattern
could be found was output by the production rule. However, for the
examples we used --- well known number theory concepts such as prime
numbers, squarefree numbers and odious numbers --- we found that the
scoring system performed worse due to increased overheads and the
suggestion of theory formation steps which do not lead to the desired
concept. However, we did implement a `unary-first' search by combining
a depth first and breadth first search: the unary production rules are
used exhaustively for each new concept before returning to the binary
production rules with old concepts. In this way, each new concept
receives some preliminary development, but is not combined with
previous concepts until later.
The look ahead mechanism has been successful with both the problems
about trains and integer sequences, and we supply results in
[colton:icml00]. It is particularly effective when the concept to
be learned is a combination of two old concepts, for example odd prime
numbers which combines the concepts of odd numbers and prime
numbers. Depth first, breadth first and unary first searches do not
find this concept quickly without the look ahead mechanism. However,
with the look ahead mechanism, odd numbers are invented and as soon as
prime numbers are introduced, HR notices that the positive examples
are both odd and prime (and the negative examples are not). HR then
combines these concepts and reaches the solution much quicker --- in
fact the time taken to learn the concept reduces from 384 to just 5
seconds. We are planning to compare HR and the Progol machine learning
program [muggleton:progol] as discussed in the conclusions section
below.
Problem Solving
In his book on mathematical problem solving [zeitz:taacops] Paul
Zeitz suggests a `plug-and-chug' method, whereby calculations are
performed and the results analysed to see if any pattern emerges which
might provide insight into the problem. Zeitz supplies the following
problem --- taken from a 1930s Hungarian mathematics contest --- as an
example where this approach leads to the solution: Show that the product of four consecutive
integers is never a square number.
Following the plug and chug method, Zeitz calculates examples of the
product of four consecutive integers:

The sequence of calculations continues: 24, 120, 360,
840 and a eureka moment occurs with the realisation that these are all
one less than a square. Zeitz then makes the conjecture that all such
numbers are one less than a square and hence not square numbers. Zeitz
states that:
Getting to the conjecture was the crux move. At this point the problem
metamorphosed into an exercise!
To finish the problem, it is necessary to show that the product of
four consecutive integers can be written as a square minus 1:

We have applied HR to plug-and-chug problems of this nature, by
getting it to make suggestions which might lead to the eureka
moment. To do this, HR is given a set of numbers which are related to
the problem and asked to suggest properties in the hope that one of
the suggestions will provide insight into the problem. This occurs
through theory formation: for every new concept HR introduces, if the
examples it is given have the property prescribed by the concept, then
the definition is output. For example, when used for the Hungarian
contest problem above, HR is given the numbers 24, 120, 360 and
840. As it forms a theory, it invents types of number and when the
numbers 24, 120, 360 and 840 all satisfy the definition of a
particular number type, the definition is output. Of course, some
suggestions do not provide insight (for example that they are all even
numbers). However, HR eventually invents the concept of
squares-minus-one and so finds the conjecture which metamorphosed the
problem. In this case, only 15 suggestions are made before the correct
one, but we may have to introduce pruning measures if too many
suggestions are produced.
The application of HR to problem solving is very recent and has not
been reported anywhere other than here. We are presently experimenting
with this application of theory formation, in particular we are
compiling a corpus of problems where the plug-and-chug approach would
help. We intend to attach this functionality to a computer algebra
package such as Maple or Mathematica.
Conclusions and Further Work
It was not our intention here to discuss whether HR is better or worse
than other programs at discovery, learning or problem solving
tasks. Rather, we wanted to compare the use of theory formation in
these three tasks. In particular, we note that all three problems are
effectively learning tasks. For discovery, HR is asked to learn about
the top level concept of number. In doing so, it re-invents many well
known concepts such as prime numbers, but also invents new concepts
and supplies interesting conjectures about them. For problem solving,
HR is asked to learn about a particular concept which the problem
discusses. This highlights that the problem of learning a concept
definition is a sub-task of the problem of learning about a
concept, which involves forming a theory about a concept (whether it
be the concept of number or a more specialised concept, such as the
product of four consecutive integers).
In [colton:icml00] we started the process of comparing HR with
the machine learning program Progol [muggleton:progol]. We showed
that various aspects of the inductive logic programming algorithm,
including inverse resolution, the background knowledge supplied and
the mode parameters, correspond directly with the production rules
that HR uses, in terms of the types of concepts which Progol and HR
can learn. We also pointed out the limitations in HR as a machine
learning program. We intend to fully compare the programs by enabling
Progol to work in a domain that HR works in (number theory) and by
enabling HR to work in a domain that Progol works in (train
theory). We have enabled HR to perform machine learning tasks such as
those performed by Progol. Similarly, we intend to use Progol to form
theories containing concepts and conjectures in a similar way to HR.
One way to do this is in number theory would be to supply Progol with
a set of categorisations of the numbers 1 to, say, 6 and ask it to
learn a concept achieving the categorisation. There are 203 such
categorisations, so the theory would contain at most 203 concepts ---
fewer if Progol could not learn a definition for some
categorisations. Conjectures could also be formed by changing the
settings and/or the background predicates in Progol and asking it to
learn another definition for a particular concept. If it produced a
different concept definition, a conjecture stating the equivalence of
the two definitions could be made. Similarly, implication conjectures
could be stated using two concepts expressing a type of number (which
would be found for boolean categorisations). An implication conjecture
could be stated if all the numbers in the positive category for one
concept were in the positive category of the other concept (similarly
for the negative category).
We have showed in principle that theory formation can be applied to
learning, discovery and problem solving tasks. This has mainly been
achieved in number theory and we need to test the application of
theory formation in other domains, including non-mathematical ones ---
we have already started testing HR on train problems. We also hope to
apply theory formation to constraint satisfaction problems --- where
some of the conjectures HR makes are turned into constraints and
added to the problem specification. We also hope to apply theory
formation to the generation of puzzles, such as odd-one-out and
next-in-sequence puzzles.
Acknowledgments
This work is supported by EPSRC grant GR/M98012.
References
[colton:ijcai99]
S. Colton, A. Bundy, and T. Walsh.
HR: Automatic concept formation in pure mathematics.
In Proceedings of the 16th International Joint Conference
on Artificial Intelligence, 1999.
[colton:icml00]
S. Colton, A. Bundy, and T. Walsh.
Automatic identification of mathematical concepts.
In Machine Learning: Proceedings of the 17th International
Conference, 2000.
[colton:aaai00]
S. Colton, A. Bundy, and T. Walsh.
Automatic invention of integer sequences.
In Proceedings of the Seventeenth National Conference on
Artificial Intelligence, 2000.
[lenat:kbsiai]
R. Davis and D. Lenat.
Knowledge-Based Systems in Artificial Intelligence.
McGraw-Hill Advanced Computer Science Series, 1982.
[epstein:otdomt]
S. Epstein.
On the discovery of mathematical theorems.
In Proceedings of the 11th International Joint Conference
on Artificial Intelligence, 1987.
[mccune:ottermanual]
W. McCune.
The OTTER user's guide.
Technical Report ANL/90/9, Argonne National Laboratories, 1990.
[mccune:macemanual]
W. McCune.
A Davis-Putnam program and its application to finite first-order
model search.
Technical Report ANL/MCS-TM-194, Argonne National Laboratories, 1994.
[michalski:trains_original]
R. Michalski and J. Larson.
Inductive inference of VL decision rules.
In Proceedings of the Workshop in Pattern-Directed Inference
Systems
(Published in SIGART Newsletter ACM, No. 63), 1977.
[muggleton:progol]
S. Muggleton.
Inverse entailment and Progol.
New Generation Computing, 13:245--286, 1995.
[sloane:oeois]
N. Sloane.
The encyclopedia of integer sequences.
http://www.research.att.com/. njas /sequences, 2000.
[zeitz:taacops]
P. Zeitz.
The Art and Craft of Problem Solving.
John Wiley and Sons, 1999.
© 2002 Simon Colton
|