From: Ashwin.Srinivasan@comlab
Subject: ml-chall.txt
To: Steve.Muggleton@comlab (Steve Muggleton)
Date: Tue, 5 Jul 94 14:48:23 BST

Recent developments in relational learning have intensified interest in automated theory formation as an aid to conjecturing predictive laws in science from observations. How do today's inductive inference algorithms stack up against human brains in this particular skill? I plan shortly to publish the following article in the British weekly Computing, here reproduced as a PostScript file to facilitate recovery of the embedded graphics.

The diagram below originated 20 years ago from Ryszard Michalski, pioneer of the branch of machine learning that digs theories out of data. As in scientific discovery, the task is to conjecture some plausible Law, in this case governing what kinds of trains are Eastbound and what kinds Westbound.

The newly discovered Law can be tested on fresh observations. If it predicts correctly, it is said to receive corroboration. Otherwise it is scrapped, or alternatively patched up. After Galileo's introduction of the telescope the Ptolemaic theory of the heavens was eventually scrapped, and replaced by one in which the earth goes round the sun.

I published the trains diagram in the computer press over ten years ago. My post-bag contained some neat conjectures from readers, such as:

Theory A: If a train has a short closed car, then it is Eastbound and otherwise Westbound.

Theory B: If a train has two cars, or has a car with a corrugated roof, then it is Westbound and otherwise Eastbound.

One and the same set of observations can support several different theories.
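
Theories A and B can each be read as a simple test applied to a description of a train. The following is a minimal sketch in Python, offered only as an illustration. The representation of a train as a list of dictionaries, the key names `length`, `closed` and `roof`, and the example train itself are all assumptions made here; none of them comes from the original diagram.

```python
def theory_a(train):
    """Theory A: Eastbound if some car is both short and closed, else Westbound."""
    has_short_closed = any(c["length"] == "short" and c["closed"] for c in train)
    return "east" if has_short_closed else "west"


def theory_b(train):
    """Theory B: Westbound if the train has two cars or any corrugated roof."""
    has_corrugated = any(c.get("roof") == "corrugated" for c in train)
    if len(train) == 2 or has_corrugated:
        return "west"
    return "east"


# An invented three-car train (hypothetical, not one of the twenty in the puzzle).
example = [
    {"length": "long", "closed": True, "roof": "corrugated"},
    {"length": "short", "closed": False, "roof": None},
    {"length": "short", "closed": True, "roof": "peaked"},
]

print(theory_a(example), theory_b(example))  # prints: east west
```

On this invented train the two theories give opposite answers, which is how a single fresh observation can separate theories that agree on every train seen so far.
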
Pending new observations we generally take the simplest and hope for the best. Theory A is marginally simpler than B and a good deal simpler than C, from one of my readers:

Theory C: If a train has more than two different kinds of load, then it is Eastbound and otherwise Westbound.

No learning system of those days was capable of coming up with a theory like C, still less one like the following from another reader:

Theory D: For each train add up the total number of sides of loads (taking a circle to have one side). If the answer is a divisor of 60 then the train is Westbound and otherwise Eastbound.

Time has moved on and I have meanwhile observed ten more trains:

Merging these with Michalski's original ten, we note that Theories A, B, C and D all fail. Can the mental ingenuity of Computing readers extract sense from the enlarged sample and give us a new Law, fitting all 20? The best entry, judged on accuracy and simplicity, wins a free copy of Richard Gregory's handsome book The Oxford Companion to the Mind, by kind donation of the Oxford University Press. I will also publish any good Laws of machine authorship. To be in line for a prize the programmers must submit their names and addresses, as must the other entrants.

Returning now to this open letter to ML specialists, the additional ten trains were generated by Steve Muggleton, within constraints that seemed to be suggested by the original set of ten trains, as follows:

1) A train has two, three or four cars, each of which can either be long or short.
2) A long car can have either two or three axles.
3) In Michalski's original version a possible distinction between hollow and solid wheels was ignored, as also here.
4) A car can also be either open or closed.
5) Long cars can only be rectangular, and if closed then their tops are either level or corrugated.
6) If a short car is rectangular then it can also be double-sided, and if closed then it has either a level or a peaked top.
7) The other allowable forms for a closed short car are rectangular, elliptical, and hexagonal.
8) For an open short car the possibilities are rectangular, slope-sided and curve-sided.
9) A car can be empty or it can contain one, two or three replicas of one of the following kinds of load: circle, triangle, inverted-triangle, hexagon, rectangle and rotated-rectangle.
10) No sub-distinctions are drawn among rectangles, even though some are drawn square and others more or less oblong. The presumption is that they were drawn just as oblong as they needed to be in each case to fill the available container space.

Steve Muggleton's Prolog train generator embodies the above-sketched constraints together with certain distributional assumptions concerning values of descriptors so as to ensure statistical coherence with Michalski's original ten. The Prolog code is as follows. Note that the generator assigns no class labels, this having been left to me to do in such a way as to exemplify "property X", i.e. the secret East-West rule at present known only to me. When all entries have been received from readers of Computing and from readers of this circular letter, the rule will be divulged, along with results and analysis.

Donald Michie,
SERC Visiting Fellow,
OU Computing Laboratory,
OXFORD.
4th July 1994.
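
As an illustration only, constraints 1) to 10) can be sketched as a small random generator. The sketch is written in Python and is not Steve Muggleton's Prolog generator. The function names, the dictionary keys, the uniform random choices, the fixed two axles for short cars and the empty roof value for non-rectangular cars are all assumptions made here. The distributional assumptions mentioned in the letter are not modelled, and no class labels are assigned.

```python
import random

LOAD_SHAPES = ["circle", "triangle", "inverted-triangle",
               "hexagon", "rectangle", "rotated-rectangle"]


def random_car(rng):
    """Draw one car consistent with constraints 2) and 4) to 9)."""
    car = {
        "length": rng.choice(["long", "short"]),   # constraint 1
        "closed": rng.choice([True, False]),       # constraint 4
        "double_sided": False,
        "roof": None,  # assumption: None for open and non-rectangular cars
    }
    if car["length"] == "long":
        car["axles"] = rng.choice([2, 3])          # constraint 2
        car["shape"] = "rectangular"               # constraint 5
        if car["closed"]:
            car["roof"] = rng.choice(["level", "corrugated"])
    else:
        car["axles"] = 2  # assumption: the list does not state this
        if car["closed"]:                          # constraint 7
            car["shape"] = rng.choice(["rectangular", "elliptical", "hexagonal"])
        else:                                      # constraint 8
            car["shape"] = rng.choice(["rectangular", "slope-sided", "curve-sided"])
        if car["shape"] == "rectangular":          # constraint 6
            car["double_sided"] = rng.choice([True, False])
            if car["closed"]:
                car["roof"] = rng.choice(["level", "peaked"])
    # Constraint 9: empty, or one to three replicas of a single load shape.
    count = rng.randint(0, 3)
    car["load"] = (rng.choice(LOAD_SHAPES), count) if count else None
    return car


def random_train(rng):
    """Constraint 1: a train has two, three or four cars."""
    return [random_car(rng) for _ in range(rng.randint(2, 4))]


rng = random.Random(0)  # fixed seed so that runs are repeatable
for car in random_train(rng):
    print(car)
```

Wheel type (constraint 3) and the squareness of rectangles (constraint 10) are simply left out of the representation, since no distinction is drawn on either.
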