From Ashwin.Srinivasan@comlab Tue Jul  5 14:48:41 1994
From: Ashwin.Srinivasan@comlab
Message-Id: <9407051348.AA07319@scye.comlab.ox.ac.uk>
Subject: ml-chall.txt
To: Steve.Muggleton@comlab (Steve Muggleton)
Date: Tue, 5 Jul 94 14:48:23 BST


Recent developments in relational learning have intensified interest 
in automated theory formation as an aid to conjecturing predictive 
scientific laws from observations. How do today's inductive 
inference algorithms stack up against human brains in this particular 
skill? I plan shortly to publish the following article in the 
British weekly Computing, here reproduced as a Postscript file to 
facilitate recovery of the embedded graphics.

<start of article for Computing> 

The diagram below originated 20 years ago from Ryszard Michalski, 
pioneer of the branch of machine learning that digs theories out of 
data. 

<Postscript pic file here>

As in scientific discovery, it is required to conjecture some 
plausible Law, in this case governing what kinds of trains are 
Eastbound and what kinds Westbound. 


The newly discovered Law can be tested on fresh observations. If it 
predicts correctly, it is said to receive corroboration.  Otherwise it 
is scrapped, or alternatively patched up. After Galileo's 
introduction of the telescope, the Ptolemaic theory of the heavens was 
eventually scrapped and replaced by one in which the earth goes 
round the sun. 

I published the trains diagram in the computer press over ten years 
ago. My post-bag contained some neat conjectures from readers, such as: 

Theory A: If a train has a short closed car, then it is Eastbound and 
otherwise Westbound. 

Theory B: If a train has two cars, or has a car with a corrugated roof, 
then it is Westbound and otherwise Eastbound. 
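
For concreteness, Theories A and B can be written as predicates over a 
simple description of a train. The following is a minimal sketch in 
Python, not taken from the diagram or from any reader's entry: the Car 
record, its field names and the "East"/"West" return values are all 
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Car:
    # Hypothetical car description; field names are illustrative only.
    length: str          # "short" or "long"
    closed: bool         # True if the car has a roof
    roof: str = "none"   # e.g. "level", "peaked", "corrugated"; "none" if open


def theory_a(train: List[Car]) -> str:
    """Theory A: Eastbound if some car is both short and closed."""
    has_short_closed = any(c.length == "short" and c.closed for c in train)
    return "East" if has_short_closed else "West"


def theory_b(train: List[Car]) -> str:
    """Theory B: Westbound if the train has two cars or a corrugated roof."""
    westbound = len(train) == 2 or any(c.roof == "corrugated" for c in train)
    return "West" if westbound else "East"
```

Each theory is a single test on the whole train, which is what makes 
both of them short to state.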

One and the same set of observations can support several different 
theories. Pending new observations we generally take the simplest and 
hope for the best. Theory A is marginally simpler than B, and a good 
deal simpler than C, which came from one of my readers: 

Theory C: If a train has more than two different kinds of load, then it 
is Eastbound and otherwise Westbound. 
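
Theory C needs a count over the whole train rather than a test on a 
single car. A minimal Python sketch follows, assuming (for illustration 
only) that each car is reduced to a pair of load kind and load count, 
with an empty car written as (None, 0).

```python
from typing import List, Optional, Set, Tuple

# Hypothetical encoding: one (load_kind, load_count) pair per car.
Load = Tuple[Optional[str], int]


def load_kinds(train: List[Load]) -> Set[str]:
    """Collect the distinct kinds of load carried anywhere on the train."""
    return {kind for kind, count in train if kind is not None and count > 0}


def theory_c(train: List[Load]) -> str:
    """Theory C: Eastbound if more than two different kinds of load."""
    return "East" if len(load_kinds(train)) > 2 else "West"
```

The rule has to aggregate over all the cars before it can decide, 
which a description restricted to properties of individual cars cannot 
express directly.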

No learning system of those days was capable of coming up with a theory 
like C, still less one like the following from another reader: 
 
Theory D: For each train add up the total number of sides of loads 
(taking a circle to have one side). If the answer is a divisor of 60 
then the train is Westbound and otherwise Eastbound. 
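
The arithmetic in Theory D can be sketched in Python as follows. Only 
the one-sided circle is given above; the other side counts are the 
ordinary geometric ones and are assumptions of this sketch, as are the 
load names and the encoding of a train as a list of (load kind, load 
count) pairs. A train carrying no load totals zero, which is not a 
divisor of 60, so it is treated here as Eastbound.

```python
from typing import Dict, List, Tuple

# Sides per load. The circle's single side is stated in Theory D;
# the remaining values are assumed for illustration.
SIDES: Dict[str, int] = {
    "circle": 1,
    "triangle": 3,
    "inverted-triangle": 3,
    "rectangle": 4,
    "rotated-rectangle": 4,
    "hexagon": 6,
}


def total_sides(train: List[Tuple[str, int]]) -> int:
    """Sum the sides of every individual load on the train."""
    return sum(SIDES[kind] * count for kind, count in train)


def theory_d(train: List[Tuple[str, int]]) -> str:
    """Theory D: Westbound if the total number of sides divides 60."""
    total = total_sides(train)
    return "West" if total > 0 and 60 % total == 0 else "East"
```

For example, a train carrying one hexagon and one rectangle totals 
6 + 4 = 10 sides, and 10 divides 60, so this sketch calls it Westbound.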

Time has moved on and I have meanwhile observed ten more trains:

<Postscript pic file here>

Merging these with Michalski's original ten, we note that Theories 
A, B, C and D all fail. Can the mental ingenuity of Computing 
readers extract sense from the enlarged sample and give us a new Law, 
fitting all 20?  The best entry, judged on accuracy and simplicity, wins 
a free copy of Richard Gregory's handsome book The Oxford Companion to 
the Mind, by kind donation of the Oxford University Press. I will also 
publish any good Laws of machine authorship.  To be in line for a prize 
the programmers must submit their names and addresses, as must the other 
entrants. 

<end of article for Computing>

Returning now to this open letter to ML specialists, the additional 
ten trains were generated by Steve Muggleton, within constraints that 
seemed to be suggested by the original set of ten trains, as follows: 

1)  A train has two, three or four cars, each of which can either be long 
or short. 

2)  A long car can have either two or three axles. 

3)  In Michalski's original version a possible distinction between hollow
and solid wheels was ignored, as also here. 

4)  A car can also be either open or closed. 

5)  Long cars can only be rectangular, and if closed then their tops are 
either level or corrugated. 

6)  If a short car is rectangular then it can also be double-sided, and 
if closed then it has either a level or a peaked top. 

7)  The other allowable forms for a closed short car are rectangular, 
elliptical, and hexagonal. 

8)  For an open short car the possibilities are rectangular, slope-sided 
and curve-sided. 

9)  A car can be empty or it can contain one, two or three replicas of 
one of the following kinds of load: circle, triangle, 
inverted-triangle, hexagon, rectangle and rotated-rectangle. 

10) No sub-distinctions are drawn among rectangles, even though some 
are drawn square and others more or less oblong. The presumption is 
that they were drawn just as oblong as they needed to be in each case 
to fill the available container space. 
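
The constraints above can be read as a validity check on a candidate 
train. The Python sketch below is one such reading and is not the 
Prolog generator: the dictionary keys, the shape names and the function 
names are assumptions made for illustration. Where the list is silent 
(the axles of a short car, the roof of an elliptical or hexagonal car) 
nothing is checked, and "double-sided" is taken to be available only to 
short rectangular cars.

```python
LOAD_KINDS = {"circle", "triangle", "inverted-triangle",
              "hexagon", "rectangle", "rotated-rectangle"}
CLOSED_SHORT_SHAPES = {"rectangular", "elliptical", "hexagonal"}
OPEN_SHORT_SHAPES = {"rectangular", "slope-sided", "curve-sided"}


def car_is_valid(car: dict) -> bool:
    """Check one car (a dict with hypothetical keys) against 1) to 9)."""
    length = car.get("length")
    shape = car.get("shape")
    closed = car.get("closed", False)
    roof = car.get("roof")
    double_sided = car.get("double_sided", False)

    if length == "long":
        if car.get("axles") not in (2, 3):          # constraint 2
            return False
        if shape != "rectangular":                   # constraint 5
            return False
        if closed and roof not in ("level", "corrugated"):
            return False
        if double_sided:                             # assumed reading of 6
            return False
    elif length == "short":
        allowed = CLOSED_SHORT_SHAPES if closed else OPEN_SHORT_SHAPES
        if shape not in allowed:                     # constraints 7 and 8
            return False
        if double_sided and shape != "rectangular":  # constraint 6
            return False
        if closed and shape == "rectangular" and roof not in ("level", "peaked"):
            return False
    else:
        return False                                 # constraint 1

    count = car.get("load_count", 0)                 # constraint 9
    if count not in (0, 1, 2, 3):
        return False
    if count > 0 and car.get("load_kind") not in LOAD_KINDS:
        return False
    return True


def train_is_valid(train: list) -> bool:
    """A train has two, three or four cars, all individually valid."""
    return 2 <= len(train) <= 4 and all(car_is_valid(c) for c in train)
```

A check of this kind says only which trains are admissible. It says 
nothing about how often each kind of car or load should occur.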

Steve Muggleton's Prolog train generator embodies the above-sketched 
constraints together with certain distributional assumptions 
concerning values of descriptors so as to ensure statistical coherence 
with Michalski's original ten. 

The Prolog code is as follows. 

<Prolog text here>

Note that the generator assigns no class labels, this having been left 
to me to do in such a way as to exemplify "property X", i.e. the 
secret East-West rule at present known only to me. When all entries 
have been received from readers of Computing and from readers of this 
circular letter, the rule will be divulged, along with results and 
analysis. 

Donald Michie, SERC Visiting Fellow, OU Computing Laboratory, OXFORD. 
4th July 1994.

