|
Simon Colton simonco@dai.ed.ac.uk
Alan Bundy bundy@dai.ed.ac.uk
Division of Informatics,
University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN
Toby Walsh tw@cs.york.ac.uk
Department of Computer Science, University of York,
Heslington, York Y010 5DD
Abstract
The HR program by [colton:ijcai99] performs theory formation in
mathematics by exploring a space of mathematical concepts. By enabling
HR to determine when it has found a particular concept, and by adding
a forward looking mechanism, we have applied HR to the problem of
identifying mathematical concepts. We illustrate this by using HR to
identify and extrapolate integer sequences and by performing a
qualitative comparison with the machine learning program Progol.
Introduction
Extrapolating integer sequences such as is an
intelligent activity requiring both understanding and
creativity. While there have been attempts in Artificial Intelligence
to automatically extrapolate sequences, presently the state of the art
is to use a large online database, the
Encyclopedia of Integer Sequences. Extrapolating integer sequences
generalises to the problem of automatically identifying a property of
a set of mathematical objects (the example set) which distinguishes
the objects from a larger class.
The HR program [colton:ijcai99] performs theory formation by
exploring a space of mathematical concepts in domains such as finite
group theory, graph theory and number theory. The space is
characterised by the initial set of concepts supplied by the user,
from which all subsequent concepts are built, and by the seven
production rules which turn old concepts into new ones. To derive the
production rules, we determined some general properties common to many
concepts across domains and implemented production rules which can
turn old concepts into new ones which have these properties. Given
some initial concepts as background information, HR uses a heuristic
search to choose which old concepts to apply which production rules to
and explores the space in this manner.
If the user supplies positive and negative examples for a concept, and
HR invents a definition which all the positive examples satisfy, but
the negative examples do not, we say HR has identified the concept. In
number theory, we have restricted HR's abilities to only identifying
sequences which are types of number, such as prime numbers. We hope to
cover other sequence types such as functions in the near future. If
the user supplies an increasing sequence: then
HR assumes that the negative examples are those integers between
and which do not appear in the sequence. For example, when HR is
given this sequence extract:
it re-invents the concept of integers with exactly two divisors (prime
numbers). All the integers in the sequence have this property but the
integers do not, so HR has learned the
concept. To supply the next number in the sequence HR simply generates
integers larger than until the next one which satisfies the
definition is found, in this case 37.
HR is designed to perform machine discovery and has successfully
invented new integer sequences [colton:aaai00]. The purpose of
this paper is to detail the application of HR to a different problem:
the identification of user supplied concepts. Exploratory search is
not well suited to such learning problems, so we have implemented a
forward looking mechanism to improve efficiency. This works by looking
up to three steps ahead in the space to decide whether a path will
lead to the required concept. We discuss how concepts are represented,
the production rules HR employs and the forward looking mechanism. To
assess the system, we present some initial results from integer
sequence extrapolation and compare HR to other systems, including the
machine learning program Progol.
Representation of Concepts
HR is supplied with background knowledge in terms of some initial
concepts upon which all new concepts are based. These concepts are
either objects of interest, such as graphs, groups or integers, ways
to finitely decompose the objects into subobjects, or relations
between the subobjects. Table 1 details the subobjects and relations
usually provided for each domain.

Table 1. Initial concepts supplied by the user
Each concept describes a property of tuples of objects. The tuple is
always an object of interest followed by subobjects. An example is
prime divisors of an integer: a property of pairs of integers,
stating that is a prime divisor of . Additionally, there may be
numerical values calculated, for example the concept of tuples
where is a graph and is the number of nodes. Every concept is
represented with:
[1] A data table of tuples of objects which
satisfy the definition of the concept. For example, for the integers 1
to 5, the concept of multiplication has these tuples:

where the integers in the final two columns multiply to give the
integer in the first column.
[2] A set of predicates which are
true of the objects in each row of the data table. For example, the
concept of multiplication discusses triples of integers,
and the user supplies 6 predicates:
These predicates specify the multiplication property of this
concept and also highlight that the integers being multiplied
together are divisors of their product.
Given a new concept, , HR identifies its sub-concepts, which
are those other concepts with a subset of the predicates of . For
instance, the concept of divisors in number theory is a sub-concept of
multiplication, as the integers being multiplied are divisors of their
product. We call sub-concepts where each variable of appears in a
predicate of the sub-concept generalisations. For example,
divisors are a generalisation of prime divisors.
Production Rules
Instead of studying how mathematicians invent new concepts, we have
looked at the concepts themselves to extract shared general
properties. We have turned each observed property into a production
rule which takes a concept without the property and produces a concept
with it. We have implemented seven rules, each of which generates a
data table and predicates for the new concept. Each rule also
generates a set of concept-dependent parameterisations, with each
parameterisation specifying a different construction using the
rule. We motivate each rule below with examples from mathematics, and
detail how it is parameterised and how it produces new data tables and
predicates. The seven rules are not meant to be exhaustive, and we
suggest two more in Section 7.
The Negate Production Rule
The negation of one concept is often incorporated into the definition
of another, e.g., odd numbers are those not divisible by
2. Similarly, groups which are not soluble are very
important in group theory. Given a concept, , there is one way to
negate it for each of its generalisations, , and the set of
generalisations comprises the parameterisations for this rule. To
construct the predicates of the new concept, the negate rule finds any
predicates of which aren't predicates of , and negates them. It
then adds the negated predicates to those of to produce the new
set. For example, the concept of pairs of divisors of an
integer :
is a generalisation of the multiplication concept:
Hence to negate multiplication, HR negates just the multiplication
predicate and adds it to the predicates from the generalisation,
giving:
The data table for the new concept is constructed by taking those
tuples in the data table for that are not present in the data
table for . In our example, the new data table will contain tuples
such as and where the last two numbers are
divisors of the first integer, but their product is not the integer.
The Forall and Exists Production Rules
Looking at Abelian groups, where all elements commute,
complete graphs, where all nodes are adjacent and repdigit
integers, where all digits are equal, we see that objects
where every pair of subobjects satisfies a relation are often
extracted into a new concept. In general, objects where every tuple of
subobjects have a particular property are of interest.
To turn this observation into the forall production rule, we noted
that information about which specific subobjects have the property is
replaced by a quantification, ie., all objects have the
property. The rule is then parameterised by the set of variables to
quantify over. For instance, the parameters indicate that the
second and third variables should be quantified over. Given a concept,
, the predicates for the new concept are the predicates not
involving the quantified variables, and an additional quantified
predicate. The quantified predicate is constructed by first finding a
sub-concept, , which involves all the variables to be
quantified. It is then constructed in the form: , where is the conjunction of
the predicates of and is the conjunction of the predicates of
which are not predicates of .
This is better explained by example: suppose we use parameters [2] and
the concept of prime digits:
The only sub-concept of this is the concept of digits. Hence the new
concept will have a quantification over the digits of , stating
that they are all prime:
This is the concept of integers with all prime digits. To produce a
data table for the new concept, the data table of the sub-concept
is found and HR looks at every tuple in it for a particular object of
interest. If all the tuples are also found in the data table for ,
a new tuple is constructed containing just the object and added to the
data table for the new concept.
Sometimes objects with all subobjects satisfying a relation are rare
and it may still be interesting to find an object which has at least
one such subobject. The exists production rule generates concepts with
this kind of existential quantification. The construction of new
concepts by the exists production rule is similar to that performed by
the forall rule.
The Match Production Rule
Square numbers, self inverse elements in groups and loops in graphs
from a node to itself are all concepts where some objects or
subobjects relate to themselves. The match production rule implements
a method to construct concepts with this nature. Each parameterisation
for this rule details how to find tuples where some of the objects are
equal, with the proviso that they are not different types of objects,
such as nodes and edges of graphs. For instance, the parameters
specify that tuples where the last two subobjects are equal
are to be extracted into a new concept. As the objects will be equal,
the corresponding variables in the predicates must also
match. Therefore predicates for the new concept are constructed by
taking the predicates for the old concept and matching the
variables. For example, given parameters and the
multiplication concept:
if we extract all those tuples for which the last 2 objects are equal,
we get this concept:
which is a construction made on the way to defining square
numbers. Note that the arity of the concept has been reduced, ie. the
predicates are now binary, as the third variable is the same as the
second. To construct a data table for the new concept, HR extracts all
those tuples where the objects in the columns match as prescribed by
the parameters. It then discards the repeated columns relating to the
repeated variables.
The Size and Split Production Rules
The number of divisors of a integer is the well known
-function and the size of the centre of groups is an
important group theory notion. Also, groups with one central
element and integers with two divisors are well known concepts
(symmetric groups and prime numbers respectively). This indicates that
taking the size of a set of subobjects, and identifying objects with a
set of a particular size are two general ways to construct
mathematical concepts.
Given a concept, , the size production rule calculates set
sizes. Similar to the forall rule, information about specific
subobjects is replaced by a summary of how many have a particular
property. So, this rule is parameterised by sets of variable numbers
such as which specify that subobjects appearing in the 2nd and
3rd columns of the tuples for are to be counted. The predicates
for the new concept are produced by first finding all predicates
involving no variables for the sub-objects to be counted. These are
added to the new set, along with an additional predicate involving a
new variable, say . The new predicate will be of the form: , where is the conjunction of the set of
predicates which involve the variables for the sub-objects being
counted.
For example, if we start with the concept of division: ,
we can count the number of divisors of . Removing the predicates
which involve divisors leaves only the sub-concept of integers. A new
letter is generated, , and the predicate to add will be . This produces the -function:
The new data table is constructed by finding the sub-concept, ,
with the predicates of involving variables not appearing in the
new set size predicate. Then, for each tuple in the data table of ,
the number of times it appears in the data table for is
counted. Finally, this coefficient is added to the tuple to produce a
tuple for the new data table.
The split production rule takes a concept generated by the size
production rule and finds those objects where the set size is either
0, 1 or 2 (or a higher number, depending on the user's
preferences). To produce new predicates, it simply replaces the
variable representing the set size with the value the set size must
be. For example, taking the function above, and replacing the
by 2, we get the concept of prime numbers:
This action reduces the arity of the concept, as one of the variables
has changed to a fixed number.
This rule can also fix subobjects to be particular values. For
example, if it extracts all those integers where is a divisor, it
produces even numbers. The parameterisations specify which columns of
the tuples to look for which values in. For example, to get prime
numbers, we need to look for the number in the second column of
the -function tuples. Here the parameterisation is written
, with the first list being the columns to look in, the
second list being the values to look for. The new data table is
constructed by finding those tuples in the data table of where the
columns specified have the values specified. Then the columns with
fixed values are discarded, and any repeated rows which result are
also discarded.
The Compose Production Rule
The compose production rule covers both the composition of functions,
and the specialisation of concepts, construction methods which are
ubiquitous in every mathematical domain. HR can compose a concept
with a partner by adding the predicates of to those of ,
effectively overlapping the concepts. For example, given two
function concepts:
and
the construction can be function composition:
or
in which case the arity of the concept increases. It can also be a
specialisation step:
[the are those special ones for which ]. The
parameterisations are sets of numbers which specify how the predicates
of are to be overlapped with those of . For example, if the
parameterisation was , this specifies that a new concept with
predicates over three variables is to be constructed. The new concept
will have the predicates of . It will also have the predicates from
, but the variables will have changed - the first variable will be
the first from the new concept, and the second variable will be the
third variable from the new concept (as the number 2 appears in the
third column of the parameters).
To build a data table for the new concept, HR runs through the
data table for and extracts any tuples, , which overlap in the
prescribed way with a tuple, , from the data table of . Then, for
every pair , a tuple for the new data table is
constructed by overlapping the tuples. For example, if the parameters
were , then we require a tuple and a tuple
for which . If such a pair is found, the
tuple is added to the new data table.

Fig. 1. Construction of the function
Example Construction
Figure 1 details the construction of the number theory function
which counts the number of integers less than or equal to which
are coprime to it (two integers are coprime if the only divisor they
share is 1). This construction only requires 2 initial concepts
(divisors and ), and 3 production rules (compose, size and
split). This demonstrates that complicated concepts are reachable from
fundamental concepts using just a few production rules. We define the
complexity of a concept to be the number of concepts in its
construction history. The complexity of the function is
therefore 7. Often to impose a depth limit on the search we set a
complexity limit of around 10.
A Forward Looking Mechanism
As each new concept can conceivably be combined with any other, and
each production rule has a set of parameterisations, HR's searches
lead to a combinatorial explosion. Asking HR to identify a concept of
complexity 7 or 8 means searching a space which is too large to cover
in a reasonable time. As discussed in [colton:ijhcs], HR
usually works by choosing the best concept to use in the next theory
formation step. HR has eight measures to decide which concept to use
and the heuristic search is intended to increase the yield of
interesting concepts.
We tried this approach for concept learning, using the invariance and
discrimination measures described in
[colton:ijcai99]. These measure how close the
categorisation produced by a new concept is to the categorisation
produced by the goal concept. For example, suppose the goal concept is
prime numbers, with this categorisation:
[1,4,6, ],[2,3,5, ]. Concepts producing similar
categorisations will score highly, and hence will be used first in the
heuristic search. Unfortunately, this actually made matters worse. We
noted a drawback which is common in heuristic searches - often to get
to the concepts of real interest, it is necessary to go via dull
concepts the search is designed to avoid. That is, certain concepts
necessary to the construction of the goal concept may produce very
different categorisations to that of the goal concept and be ignored
by the search mechanism.
We therefore adopted another way to use the production rules - an
exhaustive search enhanced by a mechanism for examination of new
concepts in order to suggest which production rules (if any) to apply
to them. Therefore, rather than identifying the present concepts which
are best, the mechanism identifies which future concepts are best. For
example, if trying to identify the prime numbers, 2, 3, 5, etc. given
the initial concept of divisors of an integer, we would hope HR would
notice that there were exactly two divisors for and so
on. We would also want it to notice that the non-examples did not have
this property, and suggest how to capitalise upon this. Similarly, if
trying to identify the integers ,
as soon as it invented prime numbers, we would want it to notice that
the sequence members have no prime digits, and to suggest we compose
the new concept of primes with the old concept of digits.
To do this, every new concept, , is passed through each production
rule, , which searches for a pattern. If a pattern is found, a
theory formation step stating that should be used with is
added to the top of the agenda. For each production rule we have
balanced the need to spot different patterns with the need to do this
efficiently - looking too hard slows the search considerably. The size
production rule, for instance, looks for 1 pattern: where the number
of tuples in the data table is the same for every positive example. As
soon as the pattern is broken, it stops looking, which improves
efficiency. Only if the pattern is true for all the positive
examples, will it look at the negative examples, in the hope that the
pattern is not true for them. Only if no negative example has the
pattern will the appropriate step be added to the agenda.
Some production rules effectively look two or even three steps
ahead. The match production rule finds tuples for which
. It can spot four patterns in this way: for every positive
example, (i) all tuples have the property (ii) at least one tuple has
the property (iii) the same number of tuples have the property and
(iv) no tuples have the property. If any of these patterns are found,
they not only suggest the use of the match production rule, but also
the use of another production rule after that. For example, if no
tuples are found, this suggests performing a match step followed by a
negate step, and the agenda is updated accordingly. Similarly, if the
same number of tuples with the property was found for each positive
example, this would suggest performing a match step followed by a size
step, followed by a split step. Hence in certain cases, this mechanism
looks three steps ahead, allowing deeper access to the search space.
The compose production rule is treated differently, as patterns for
this are sought by relating the newly formed concept to a partner
concept. For efficiency reasons, HR looks at every possible partner,
but only looks for a small number of common patterns. In particular,
it notices that (i) the output of two functions are the same for each
positive example [e.g., that the number of divisors equals the number
of digits] (ii) that all the positive examples have the property of
the concept and its partner [e.g., that an integer is both prime and
odd] and (iii) that a set of subobjects (or the output from a
function) has a property for each positive example [e.g., that each
digit is prime]. The look-ahead functionality of the compose rule is
very effective because HR often takes a long time to combine concepts
when performing an exhaustive search.
Integer Sequence Results
We have initially concentrated on HR's ability at integer sequence
extrapolation, due to the availability of the Encyclopedia to suggest
sequences to learn. HR has so far identified 125 sequences from the
Encyclopedia and 20 of the more well known ones are given in table 2,
with the (paraphrased) definitions that HR produced for them. We let
be the number of 1's in the binary representation of and
be the number of 0's. In most cases, HR finds an instantly
recognisable, correct definition. In some cases HR finds alternative
definitions, e.g., it notices that powers of two are the only integers
with exactly one odd divisor.

Table 2. Definitions for 20 well known number types
The average time to learn one of these 20 concepts is 373.05 seconds
with an exhaustive search, reducing to just 3.65 seconds when the
forward looking mechanism is employed. The most striking examples of
the efficiency gain are the cases where two simple concepts are
combined into a more complicated one. For example, odious numbers are
those with an odd number of 1's in their binary representation, e.g., 25
is odious because it is written 10011 in binary, with three 1's. HR
cannot learn this until it has invented both the concept of odd
numbers and the function which counts the number of ones in the binary
representation of an integer. These were the 26th and 54th concepts
produced respectively. When concept 54 was introduced, the forward
looking mechanism determined that it should compose with concept 26
and this led to the solution after only 7 seconds. Without the
heuristic, HR took 90 minutes to get around to composing concepts 26
and 54.
As well as learning well known sequences, we identified sequences
missing from the Encyclopedia of Integer Sequences. Given any integers
and such that , the Encyclopedia has a
sequence beginning , with just two exceptions: there are
no sequences starting with or . We used HR to
extrapolate these. The sequence was very easy, HR simply
invented the concept of primes + 2. The sequence was more
difficult. The solution HR found uses the binary representation of an
integer, , to write it as . For example . HR first used the exists
production rule to define the set:
e.g., . Then it composed this with the concept of
divisors and defined those divisors which appear in . For
example, so divisors 1 and 2 of 6 appear in
. Finally, HR negated this concept, looking at divisors which do
not appear in , and it spotted a pattern for the integers and : they have exactly 2 divisors which do not appear in
. ie. are the first four integers, , for which:

This sequence continues: 4, 5, 6, 9, 13, 14, 15, 17 and we see that HR
has intelligently extrapolated . However, this sequence
seems to have little value other than providing an answer to our
question.
Related Work
The AM program [lenat:kbsiai] worked in number theory,
re-inventing concepts like prime numbers and making conjectures such
as Goldbach's conjecture. Starting with 115 concepts, AM applied one
of 242 heuristics to determine which task to do next - necessary as it
often had over 2000 tasks on the agenda. Some heuristics were very
specialised, enabling AM to reach particular concepts and sometimes
the user guided AM. HR's theory formation is much simpler: it starts
with only a few initial concepts, uses only seven construction
techniques and has only eight measures of interestingness. AM was
never applied to machine learning tasks such as identifying particular
concepts.
The Graffiti program, [fajtlowicz:ocog], makes conjectures in
graph theory stating that one summation of numerical invariants is
always less than another summation. Its concept formation is limited
to summing invariants, and it is not used to identify concepts. The
simply stated but difficult conjectures have efficiency applications
and have been settled by many notable mathematicians. In
[colton:ijhcs], we compare HR, AM, Graffiti and similar discovery
systems.
Integer Sequence Extrapolation
The Encyclopedia of Integer Sequences comprises around 54,000
sequences collected by Neil Sloane, with contributions from many
mathematicians. The user supplies the first few terms of a sequence
and the Encyclopedia is searched until a sequence is found which
contains the given terms. There is also an email server called the
superseeker which works much harder to find a match for a given
sequence. Superseeker transforms the input sequence and searches for a
match for the transformed sequences. Such transformations include
taking the difference between successive terms and more complex
manipulations such as the Boustrophedon transformation by
[millar:boustrophedon]. Superseeker's transformations are
very good at identifying sequences related to one already in the
Encyclopedia. However, due to the size of the database, it has limited
ability to relate two sequences. For instance, even though there are
many sequences about the digits of an integer, and the sequence of
prime numbers is fundamental, the superseeker cannot determine the
nature of these numbers: 1,4,6,8,9,10,11,14, which are simply those
with no prime digits.
The SeekWhence program by [hofstadter:fcaca], was
designed from a cognitive science perspective to extrapolate integer
sequences. Early versions determined the nature of an integer sequence
by performing some mathematical transformations, such as taking
differences between terms, and some general pattern recognition
transformations, such as looking at every second term in the sequence,
or identifying repeating clusters. For example, to extrapolate the
sequence , SeekWhence would first transform it by
taking the difference between successive terms to give: It would then recognise this as a sequence in its database,
ie. odd numbers. As it knew how to extrapolate odd numbers, it could
derive a way to extrapolate the original sequence. The project
originally aimed to out-perform mathematical approaches to sequence
extrapolation, such as those described by
[persson:seq_ext]. However, they became more interested
in what Hofstadter calls `Haiku' sequences, which are independent of
the context of mathematics, and only require general pattern
recognition rules to extrapolate. In this way, SeekWhence could also
extrapolate non-mathematical sequences, in particular melodies.
HR occupies the middle ground between the Encyclopedia and
SeekWhence. The production rules are designed to be applicable to many
domains, so it does not use the domain specific transformations of the
Encyclopedia. Nor does it use general pattern finding templates and
heuristics like SeekWhence. This is because HR is primarily a machine
discovery program employed to invent new concepts - the patterns of
SeekWhence are so general they would produce a plethora of concepts,
many of which would be of little interest to mathematicians. By
identifying certain general properties of mathematical concepts, we
have given HR a mathematical pattern generating ability which it can
use to invent new concepts in an attempt to find a definition for a
given set of examples.
A Qualitative Comparison with Progol
The Progol program [muggleton:progol] uses inductive logic
programming (ILP) to learn a concept given a set of predicates as
background knowledge and a set of positive and negative examples for
the concept. There is a striking similarity between the concepts
Progol and HR can reach. Firstly, Progol uses inverse resolution to
invent predicates which could have been resolved to produce the
background and example predicates. This produces concepts with
conjunctions of predicates, predicates with repeated variables, and
conjunctions of predicates which contain the same variable. We have
found that this covers the concepts that HR can produce with its
compose, match and exists production rules. For example, HR produces
this definition for square numbers: integers such that
s.t. , and Progol produces this
definition:
square(N) :- integer(M), multiply(N,M,M).
Secondly, the user is allowed to set mode
declarations detailing where background predicates can appear in the
invented predicates. Mode declarations also specify whether variables
become instantiated and whether negation of predicates is allowed. The
ability to instantiate variables corresponds exactly with HR's split
production rule, and the ability to negate predicates corresponds with
the negate rule. A combination of negated and existentially quantified
predicates corresponds to concepts produced by HR's forall production
rule. For example HR produces the definition for even numbers as being
divisible by two. Given the background predicate of divisors and
allowed to instantiate variables, Progol produces this definition:
even(N) :- divides(N,2).
Finally, we found that if we supply two additional predicates as
background knowledge from set theory, namely the standard Prolog
predicates of setof and length, Progol can cover
concepts produced by the size production rule. For example, HR defines
prime numbers as having exactly two divisors, and Progol produces this
equivalent definition:
prime(N) :- setof(M,divides(N,M),L),
length(L,2).
Therefore, for each of HR's production rules, we have found a way for
Progol to produce concepts of a similar nature. Interestingly, to
cover all the production rules requires three different aspects of
Progol's functionality. Only one of HR's production rules corresponds
to additional background knowledge, which adds to our claim that the
production rules are very general. We are currently undertaking a
quantitative assessment of HR and Progol to enable us to better
compare and contrast issues such as coverage, efficiency and control.
Further Work and Conclusions
Our approach to the identification of mathematical concepts is
specific to mathematics, but uses no information specific to any
domain and provides a way to identify types of graphs, types of groups
and types of numbers with very little modification to the program. It
is likely that the best approach to identifying mathematical concepts
will combine programs with:
- A large database, e.g., the Encyclopedia.
- Domain specific transformations, e.g., Superseeker.
- General pattern recognition, e.g., SeekWhence.
- Inventive abilities, e.g., Progol, HR.
HR starts with the fundamental concepts of a domain, modelling the way
in which a concept can be learned completely from scratch. HR would
benefit from a knowledge base of interesting concepts in a
domain. Using concepts like prime numbers as the basis for theory
formation rather than more fundamental concepts like divisors, it
could progress further into the search space. It would also benefit
from some domain specific transformations, and we have begun
implementing some based on those used by superseeker.
We can also look at the concepts which Progol can but HR cannot learn,
and suggest new production rules to fill the gap. Progol can generate
recursive definitions, such as the factorial function. We plan a
`path' production rule to enable HR to construct concepts with
recursive definitions. Also, Progol is able to produce definitions
with disjunction of predicates, such as integers which are prime
or square. We have so far avoided a `disjunct' rule, worried that
the theories produced will contain too many dull disjunctions, but we
plan to implement it for machine learning purposes.
As discussed by [langley:mlad], there is much overlap
between machine learning and machine discovery, with machine learning
tools discovering new results in science, e.g, ILP in biology. We have
discussed the reverse problem here: how to apply a discovery program
to machine learning problems. Presenting HR's production rules as a
tool for machine learning, we have provided some justification for
each based on observations from the mathematical literature, and
detailed how each forms concepts. We have shown that the search space
defined by the rules can reach many complicated concepts in number
theory such as the -function and that our forward looking
mechanism greatly improves efficiency.
The project to apply HR to machine learning tasks is far from
complete. A qualitative comparison of HR and Progol has highlighted
that HR's production rules correspond with various aspects of Progol,
showing that HR can theoretically cover many (but not all) of the
concepts Progol can learn. We are presently undertaking a quantitative
comparison of the systems. In [colton:aaai00] we have shown that
HR can invent interesting integer sequences, in effect posing sequence
extrapolation problems. We hope to have shown here that it can also
solve them.
Acknowledgements
This work is supported by EPSRC grant GR/M98012. We would like to
thank the anonymous reviewers for their very helpful comments, and
Stephen Muggleton for many in-depth discussions about ILP and HR.
References
[colton:ijcai99]
Colton, S., Bundy, A., & Walsh, T. (1999).
HR: Automatic concept formation in pure mathematics.
Proceedings of the Sixteenth International Joint Conference on
Artificial Intelligence (pp.\/ 786--791).
[colton:aaai00]
Colton, S., Bundy, A., & Walsh, T. (in pressa).
Automatic invention of integer sequences.
Proceedings of the Seventeenth National Conference on Artificial
Intelligence.
[colton:ijhcs]
Colton, S., Bundy, A., & Walsh, T. (in pressb).
On the notion of interestingness in automated mathematical discovery.
International Journal of Human Computer Studies.
[lenat:kbsiai]
Davis, R., & Lenat, D. (1982).
Knowledge-based systems in artificial intelligence.
New York: McGraw-Hill.
[fajtlowicz:ocog]
Fajtlowicz, S. (1988).
On conjectures of Graffiti.
Discrete Mathematics, 72, 113--118.
[hofstadter:fcaca]
Hofstadter, D. (1995).
Fluid concepts and creative analogies.
New York: Basic Books.
[langley:mlad]
Langley, P., & Michalski, R. (1986).
Machine learning and discovery.
Machine Learning, 1, 363--366.
[millar:boustrophedon]
Millar, J., Sloane, N., & Young, N. (1997).
A new operation on sequences: the Boustrophedon transform.
Journal of Combinatorial Theory, 17A, 44--54.
[muggleton:progol]
Muggleton, S. (1995).
Inverse entailment and Progol.
New Generation Computing, 13, 245--286.
[persson:seq_ext]
Persson, S. (1966).
Some sequence extrapolating programs: A study of representation
and modelling in inquiring systems (Technical Report STAN-CS-66-050).
Department of Computer Science, Stanford University, Stanford, CA.
© 2002 Simon Colton
|