ASE-Progol Version 1.0
This system is described in the paper:
Combining Inductive Logic Programming, Active Learning and Robotics to
Discover the Function of Genes
C.H. Bryant and S.H. Muggleton and S.G. Oliver and D.B. Kell and
P. Reiser and R.D. King
2001,
Electronic Transactions on Artificial Intelligence, 5(B), pp1-36
INSTALLATION
Download the following files from the directory containing this README
file and place them in a single directory.
ase_progol
ask_oracle.pl
classify.pl
fast_forward.pl
max_compressions.pl
trial_generator.pl
trial_selector.pl
one_line_clauses
Edit the file ase_progol as follows:
Change the line near the top of this file beginning
set code_dir =
to
set code_dir = X
where X is the full pathname of the directory containing the above code.
Change the line near the top of this file beginning
set progol =
to
set progol = Y
where Y is the full pathname of the directory containing source code
for CProgol.
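The two edits above can also be applied by a small script. The following Python sketch is an illustration only and is not part of the ASE-Progol distribution: the function name set_paths and the pathnames in the example are hypothetical, and it assumes that each setting occupies a single line beginning with "set code_dir =" or "set progol =".

```python
def set_paths(script_text, code_dir, progol_dir):
    """Return script_text with the first 'set code_dir =' and
    'set progol =' lines rewritten to use the given directories."""
    wanted = {"set code_dir =": code_dir, "set progol =": progol_dir}
    done = set()
    out = []
    for line in script_text.splitlines():
        for prefix, value in wanted.items():
            # Only the first matching line for each setting is changed.
            if prefix not in done and line.strip().startswith(prefix):
                line = f"{prefix} {value}"
                done.add(prefix)
                break
        out.append(line)
    missing = set(wanted) - done
    if missing:
        raise ValueError(f"setting line(s) not found: {sorted(missing)}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents and pathnames, for illustration only.
    original = "set code_dir = /old/code\nset progol = /old/progol\n"
    print(set_paths(original, "/home/user/ase_progol", "/home/user/cprogol/source"))
```

The function works on text rather than on a file so that the result can be inspected before the script is overwritten.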
To run ASE-Progol, execute the file "ase_progol". This is a Unix shell
script which makes calls to CProgol. The options described below may
be used by adding them to the Unix command line after "ase_progol".
OPTIONS
-d data_directory
where data_directory is the pathname of the directory
containing the domain files (see below). Either a full or
relative pathname may be used. If a relative pathname is used
then it should be relative to the directory from which
ASE-Progol is called. If this option is not used then, by
default, the data_directory is taken to be the directory from
which ASE-Progol is called.
-scroll
This option causes ASE-Progol to ask the user at the start of
each cycle whether to continue the execution. (This allows the
user to examine the detailed files /tmp/*log which are only
kept for the current loop.)
-l max_iterations_of_CLML_cycle
where max_iterations_of_CLML_cycle is a limit on the maximum
number of iterations of the CLML cycle. In other words, a
limit on the number of trials which may be physically
performed.
-c trial_costs_limit
where trial_costs_limit is a limit on the experimental resources
which may be consumed by ASE-Progol.
-robot This option causes ASE-Progol to call the robot to get the
results of trials. If this option is not used then ASE-Progol
asks an Oracle instead.
-s Size of unification stack used by CProgol.
-random
This option causes ASE-Progol to select trials at random.
-naive
This option causes ASE-Progol to naively select the cheapest
trial from the set of candidate trials.
Normally ASE-Progol will instruct the robot to perform the trial whose
outcome will provide the highest discrimination between the candidate
hypotheses. The random and naive options can be used to obtain
benchmarks against which the normal performance of ASE-Progol can be
measured.
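The two benchmark strategies are simple enough to sketch. The Python below is an illustration only and is not taken from trial_selector.pl: the function names and the example costs are hypothetical, and the normal discrimination-based strategy is deliberately not shown because this README does not define how discrimination is computed.

```python
import random


def select_naive(candidates, cost):
    """Return the cheapest candidate trial (the behaviour described for -naive).
    Ties go to the earliest candidate, which is a choice made for this sketch."""
    return min(candidates, key=lambda trial: cost[trial])


def select_random(candidates, rng=random):
    """Return a candidate chosen uniformly at random
    (the behaviour described for -random)."""
    return rng.choice(candidates)


# Hypothetical candidate trials and costs; trials are plain strings here.
candidates = [
    "phenotypic_effect(gene_d, [m1, m2])",
    "phenotypic_effect(gene_d, [m2])",
]
cost = {candidates[0]: 20, candidates[1]: 10}

print(select_naive(candidates, cost))
print(select_random(candidates, random.Random(0)))
```

Passing a seeded random.Random instance makes the random baseline repeatable from one benchmark run to the next.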
MODES
ASE-Progol can operate in one of three modes, namely ua (unaided),
hs (head-start) or ff (fast-forward).
Head-start mode allows ASE-Progol to utilise a set of laboratory
results obtained prior to execution without the use of ASE-Progol. In
Unaided mode there is no such set of results.
Fast-forward mode allows ASE-Progol to utilise the results from a
previous execution. In fast-forward mode execution recommences at a
specified iteration of the CLML cycle. ASE-Progol enters the loop at
the point where a new example has just been created. Fast-forward mode
requires a trace of the examples and hypotheses generated during a
previous execution.
The option -m is used to specify the desired mode of operation. It is
used by adding one of the following to the Unix command line after
"ase_progol".
-m ua
-m hs hs_lab_results_file
where hs_lab_results_file is a file which takes the same form
as lab_results.pl (see below). (All these results must be
assigned the loop number 0; this allows ASE-Progol to
distinguish these results from those that will be obtained
during the forthcoming execution).
-m ff start_loop ff_lab_results_file ff_hyps_file
where start_loop is an iteration of the previous CLML cycle
and ff_lab_results_file and ff_hyps_file are files which take the
same form as lab_results.pl and hypotheses.pl files
respectively (see below).
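To illustrate how the options and modes combine, the following Python sketch assembles the argument list for one run. It is not part of the distribution: the helper name build_command and every pathname and file name in the example are hypothetical, the order in which options are emitted is an assumption, and the script name is supplied by the caller. Only flags documented in this README are used; -s is left out because the form of its argument is not shown here.

```python
def build_command(script, mode, mode_args=(), data_dir=None,
                  max_iterations=None, cost_limit=None, flags=()):
    """Assemble an argument list for one ASE-Progol run."""
    expected = {"ua": 0, "hs": 1, "ff": 3}  # arguments that follow -m <mode>
    if mode not in expected:
        raise ValueError(f"unknown mode: {mode}")
    if len(mode_args) != expected[mode]:
        raise ValueError(f"mode {mode} takes {expected[mode]} argument(s)")
    allowed_flags = {"-scroll", "-robot", "-random", "-naive"}
    unknown = set(flags) - allowed_flags
    if unknown:
        raise ValueError(f"unknown flag(s): {sorted(unknown)}")
    cmd = [script]
    if data_dir is not None:
        cmd += ["-d", data_dir]
    if max_iterations is not None:
        cmd += ["-l", str(max_iterations)]
    if cost_limit is not None:
        cmd += ["-c", str(cost_limit)]
    cmd += list(flags)
    cmd += ["-m", mode, *map(str, mode_args)]
    return cmd


# Hypothetical head-start run limited to 10 iterations of the CLML cycle.
print(" ".join(build_command("./ase_progol", "hs", ["hs_results.pl"],
                             data_dir="my_domain", max_iterations=10)))
```

Returning a list rather than a single string means the result can be handed to subprocess.run without any shell quoting.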
DOMAIN FILES
All the data for a particular domain must be kept in a single separate
directory. This directory will contain the following files.
static_know.pl
Prolog code representing static knowledge of the domain,
including cost(Trial, Cost) definitions for the domain.
trials.pl
trial/1 facts, where the argument is a ground unit clause
representing a trial
e.g. trial(phenotypic_effect(gene_d, [m1, m2])).
slp.pl
A Prolog program containing a definition of the predicate
sample_trials(Quantity,List_of_trials,Trial_selection_method).
The Trial Generator component calls this program on each
iteration of the loop in order to generate a set of candidate
trials.
Values of the terms Quantity and Trial_selection_method are
input to the program and the value of the term List_of_trials
is output. Quantity is the number of elements in List_of_trials.
Trial_selection_method takes the value 'normal' or 'random'.
The purpose of the term Trial_selection_method is to give the
program the option of behaving differently according to which
method of selecting trials is selected when ASE-Progol is
executed. One possible use of this option would be to provide
a definition of sample_trials(Quantity, List_of_trials,
normal) which utilises domain knowledge and a definition of
sample_trials(Quantity, List_of_trials, random) which does
not.
The program may use Stochastic Logic Programming. If this is
so and the slp utilises the system predicate sample/3 then the
predicate randomseed/0 must be executed first. Otherwise each
time CProgol is restarted, each call to sample(trial_pred,
Quantity, List_of_trials) generates the same list of trials,
assuming that the definition of trial_pred remains constant.
lab_results.pl
example(Loop_no, Trial, Class) facts, where Trial is a ground
unit clause representing a trial, Loop_no is the iteration of
the CLML loop in which the example was generated and Class is
either positive or negative.
examples.pl
Positive and negative instances of a predicate representing a
trial.
e.g. phenotypic_effect(gene_d, [m1, m2]).
and
:- phenotypic_effect(gene_d, [m2]).
NB These are represented in the usual CProgol syntax.
learning.pl
CProgol mode and type declarations, constraints etc
hypotheses.pl
hypoth(Loop_no, [Head,Body], Compression), where the
second term is a list with two elements representing a
hypothesis, Loop_no is the iteration of the CLML loop
in which the hypothesis was generated and the third term is
the compression for that clause output by CProgol. e.g.
hypoth(1, [codes(gene_d,enzyme_d), (true)], 9).
classifications.pl
The output of the Classifier, i.e.
matrix_cell(H, Compression, Trial, Cell_value)
facts which record how each trial is classified by each
hypothesis.
e.g. matrix_cell([codes(gene_d,enzyme_d), (true)], 9,
phenotypic_effect(gene_d,[m1,m2]), 1)
oracle.pl
This file is only needed when ASE-Progol is to ask an Oracle,
as opposed to a robot, for the outcome of trials. The oracle
takes the form of a file which should contain either an
intensional or extensional definition of the observable
predicate. An extensional definition should represent the
outcome for every possible trial by a set of positive and
negative examples in the usual Progol syntax.
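The fact formats described above are regular enough to generate from a script. The Python sketch below is illustrative only: the helper names are hypothetical, trials are handled as plain strings already written in Prolog syntax, Class is assumed to be written as the atom positive or negative, and nothing here checks that a trial is a valid ground term.

```python
def trial_fact(trial):
    """Format one entry for trials.pl."""
    return f"trial({trial})."


def example_fact(loop_no, trial, cls):
    """Format one entry for lab_results.pl."""
    if cls not in ("positive", "negative"):
        raise ValueError("Class must be positive or negative")
    return f"example({loop_no}, {trial}, {cls})."


def head_start_facts(results):
    """Format prior laboratory results for head-start mode,
    in which every result is assigned loop number 0."""
    return [example_fact(0, trial, cls) for trial, cls in results]


# Trials taken from the examples in this README; the classes are illustrative.
print(trial_fact("phenotypic_effect(gene_d, [m1, m2])"))
for line in head_start_facts([
    ("phenotypic_effect(gene_d, [m1, m2])", "positive"),
    ("phenotypic_effect(gene_d, [m2])", "negative"),
]):
    print(line)
```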
LIMITATIONS
Neither the cost of an individual trial nor the limit on the
experimental resources which may be consumed by ASE-Progol may exceed
2^31 - 1: this is the largest integer which can be represented.
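A check of this bound before starting a run might look like the following Python sketch. It is an illustration under stated assumptions: the function name check_costs is hypothetical, and the costs are assumed to have been collected already into a dictionary from trial to integer cost.

```python
MAX_INT = 2**31 - 1  # 2147483647, the bound stated above


def check_costs(trial_costs, cost_limit):
    """Raise ValueError if any trial cost or the overall limit exceeds MAX_INT."""
    if cost_limit > MAX_INT:
        raise ValueError(f"trial_costs_limit {cost_limit} exceeds {MAX_INT}")
    for trial, trial_cost in trial_costs.items():
        if trial_cost > MAX_INT:
            raise ValueError(f"cost of {trial} ({trial_cost}) exceeds {MAX_INT}")


# Hypothetical costs and limit, both well inside the bound.
check_costs({"phenotypic_effect(gene_d, [m1, m2])": 20}, cost_limit=1000)
```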