The Effect of Input Knowledge on Creativity

Simon Colton, Alison Pease, Graeme Ritchie
Division of Informatics
University of Edinburgh
80 South Bridge
Edinburgh EH1 1HN
United Kingdom
simonco,alisonp,graeme@dai.ed.ac.uk

Abstract

Recently, many programs have been written to perform tasks which are usually regarded as requiring creativity in humans. We can derive some commonalities between these programs in order to build further creative programs. Key to this is the derivation of certain measures which assess how creative a program is. Starting from recent proposals by Ritchie, we define possible measures which describe the extent to which a program produces novel output. We discuss how this relates to the creativity of the program.

Introduction

There has been much debate about machine creativity in artificial intelligence, cognitive science and philosophy. However, only recently have sufficient numbers of creative programs become available for us to derive some commonalities between them. Such commonalities allow us to suggest ways in which creative programs can be designed, utilised and assessed. An important part of this process is the determination of a set of measures which can be used to estimate the creativity of a program.

Under certain circumstances, it may be possible to use such measures to determine whether one program is more creative than another. However, the circumstances would have to be very special, taking the design, input and output of both programs into consideration, and it is still likely that such a comparison would be deemed unfair. Similarly, it may be possible to use measures of creativity to determine whether a program is being creative at all, but this is also problematic, because creativity is such an overloaded and highly subjective word. For instance, Cohen's AARON program [cohen:aaron] (currently available at www.kurzweilcyberart.com) is cited as creative by many people, but is not thought of as creative by its author.

The worth of measures of creativity therefore lies in using them in the design of creative programs rather than the assessment of established programs. If we can agree upon a set of measures of creativity of programs, then someone writing a program can use these as a guideline for increasing the creativity of the program. If a new version of the program is assessed more favourably by some of the measures than the previous version, it is likely that progress has been made.

For any program which purports to be creative, an important question is the extent to which its design (including the algorithms and data it uses) is contrived to produce particular outputs. That is, has the program been `fine-tuned' to generate specific results? Evidence of fine-tuning can affect our perception of how creative a program has been. For instance, one of the reasons that Lenat's AM program [lenat:kbsiai] appears creative is that it began by working in set theory, but switched to number theory, with its best results arising in the latter domain. However, our willingness to accept AM as creative is lowered by the evidence provided in [hanna_ritchie:method] that AM's `decision' to regard bags as numbers --- which led it to investigate number theory --- was the result of certain carefully crafted knowledge.

In order to address the question of how the knowledge input to a program affects its creativity, we first discuss the nature of the kinds of program to which our measures of creativity apply and give some examples of such programs. We then draw on Popper's philosophy of science to motivate our investigation into fine-tuning. Following this, we give an overview of Ritchie's previous work on measuring creativity [ritchie:aisb01], and discuss how we will build on it. We then introduce the notion of a creative set and use this to derive some measures of fine-tuning which affect the assessment of creativity in programs. Finally, we perform a case study using the AM and HR mathematical theory formation programs [lenat:kbsiai], [colton:phd].

Background

We first discuss the types of programs for which our measures will be meaningful, both by identifying properties they share and by surveying some relevant programs. Following this, we motivate our study of fine-tuning by making an analogy between creative programs and theories as described in Popper's philosophy of science.

Creative Programs

Following [ritchie:aisb01], we restrict our discussion to programs which produce novel artefacts which can be qualitatively assessed by a human. The artefacts can be jokes, mathematical conjectures, poems, melodies, paintings, etc., but the restriction of qualitatively assessing them is important. To a large extent, this rules out programs such as computer algebra systems, where it is rare for the output from a calculation to be described as good, bad or anything in between. Occasionally, the output of a calculation might be surprising, but this was not the expectation beforehand. On the other hand, with the creative systems we discuss here, the reason for running the program is to produce something which is not only novel but which can be considered of high quality. For this reason, these programs often spend much of their time internally assessing the artefacts they produce in order to prune and order their output, and direct their search.

Recent creative programs of the type we are interested in include the JAPE joke generator [binsted:phd], the HR mathematical theory formation program [colton:ijcai99], the MuzaCazUza melody generator [ribeiro:aisb01] and the ASPERA poetry generator [gervas:aisb01]. We give a brief overview of these programs below, in order to derive some commonalities not only in their output, but also in the knowledge input to them.

The JAPE program [binsted:phd] produces simple punning riddles such as:


What do you get when you cross a monkey and a peach?
An ape-ricot.

Although Binsted carefully does not claim that her program is creative, JAPE is the type of program that we are interested in, since its output can be judged as to its quality (as a joke), and the aim of the exercise was to have the program create items of as high a quality as possible. Binsted tested the output in a controlled fashion by showing it to school children, who were asked to say, for each item, if it was a joke, and how funny it was. On average, the output items were deemed to be jokes, and fairly funny, though not as funny as those published in joke books for children [binsted:children].

The HR program [colton:ijcai99] [colton:phd] performs theory formation in domains of pure mathematics such as group theory, number theory and graph theory. Given a little information about the domain of interest comprising some fundamental concepts in the domain (such as divisors in number theory), HR performs concept formation, conjecture making and (in algebraic domains) theorem proving and counterexample finding. The process is driven by the concept formation. That is, concepts are formed and conjectures are proposed by looking for patterns in the examples of the concepts. The nature of the concepts formed therefore dictates the nature of the conjectures.

There are 7 general production rules which turn one (or two) old concepts into a new one. As discussed later, each of these was introduced in order for HR to re-invent a classically interesting concept which it was not able to re-invent before. HR uses a set of measures of interestingness to determine which is the most interesting concept at any one time. This enables HR to build new concepts from the more interesting ones before the less interesting ones. The measures of interestingness include those which determine some intrinsic property of the concept, such as the complexity of its definition, and some relative properties of the concept, including how novel the categorisation it achieves is with respect to the other concepts. Finally, there are measures which determine whether the conjectures about a concept add to the interestingness of that concept.

The MuzaCazUza program [ribeiro:aisb01] is a melody generating program which is the successor to the SICOM program [pereira:sicom]. MuzaCazUza uses case-based reasoning to generate a melody given a harmonic line over which to compose it. The cases comprise a chord, its rhythm and melody, and are taken from (at present) pieces from the Baroque period. Each case has attributes including pitch, duration of the chord and duration of the preceding and following chord.

When given a harmonic line, MuzaCazUza searches through the entire case base and scores each case by matching it with the input case. The score is taken as a weighted sum which assesses how close the input case is to the base case, using, amongst other techniques, Schoenberg's chart of the regions. The authors state in [ribeiro:aisb01] that this is a fairly limited approach, which often produces obvious and less interesting solutions. They plan to give MuzaCazUza some more of the properties of SICOM, such as choosing from the best cases randomly. After the melody has been generated, there is a more interactive phase where the user can choose to apply transformations of the melody, such as mirroring certain parts of it. The user can also choose to adapt the piece using a tonal correction algorithm.
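The retrieval step described above can be pictured with a minimal sketch. Everything concrete in it is an assumption made for illustration: the `Case` fields, the `WEIGHTS` values and the `closeness` function are hypothetical stand-ins, and the musically informed comparisons used in [ribeiro:aisb01] (such as the chart of the regions) are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    chord: str             # chord label, e.g. "C" or "G"
    duration: float        # duration of this chord, in beats
    prev_duration: float   # duration of the preceding chord
    next_duration: float   # duration of the following chord
    melody: tuple = ()     # stored melody fragment (payload, not scored)


# Hypothetical weights; they sum to 1 so that scores fall in [0, 1].
WEIGHTS = {"chord": 0.5, "duration": 0.3, "prev_duration": 0.1, "next_duration": 0.1}


def closeness(a, b):
    """Similarity in [0, 1]: exact match for labels, decaying with distance for numbers."""
    if isinstance(a, str):
        return 1.0 if a == b else 0.0
    return 1.0 / (1.0 + abs(a - b))


def score(query, stored):
    """Weighted sum of per-attribute closeness between the input case and a stored case."""
    return sum(w * closeness(getattr(query, name), getattr(stored, name))
               for name, w in WEIGHTS.items())


def best_case(query, case_base):
    """Search the entire case base and return the highest-scoring case."""
    return max(case_base, key=lambda case: score(query, case))
```

Always taking the single best case is deterministic, which is consistent with the authors' remark that the approach tends to produce obvious solutions; choosing at random among the top few cases would be one way to loosen it.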

The ASPERA poetry generator also uses case-based reasoning. ASPERA generates poetry from user-given prose which contains an intended message for the poem. It uses fragments of the text as keys for the case base which contains corresponding poem fragments. The collection of poem fragments comprises the skeleton poem and ASPERA refines this using metrical rules to adapt the best matching cases to the text provided. The adaptation takes place via an abstraction to the linguistic categories of the words in the poem fragments. The metric which performs the adaptation is chosen to be the most appropriate for the user's wishes. The user is then asked whether the resulting poem is valid. If so, then the lines of verse are analysed linguistically and incorporated into the system data files.

We see that there are some similarities between these programs. Firstly, each program produces artefacts and the quality of the artefacts is assessed either internally or with the help of a user. In HR, measures of interestingness have been derived to help it find the most interesting concepts and conjectures first. In JAPE, an external assessment of the jokes was undertaken using school children. In MuzaCazUza, after the initial generation, there is an interactive phase where the user can attempt to increase the quality of the melody by applying transformations and tonal corrections. In ASPERA, if the user validates a poem, then this information is added to the data which the program will use next time, in order to increase the quality of the output with cases from validated poems.

There is also a similarity in the input to the programs, with all the programs taking data as input and, to a large extent, having interchangeable algorithms for generation and assessment of artefacts. We address this in detail later when discussing the contributions to creativity.

Motivation from the Philosophy of Science

Following Tarski, Popper suggests that we divide the universal class of all statements into true and false, $T$ and $F$ [popper:ok]. He claims that the aim of science is to discover theories (explanations) whose content covers as much of $T$ and as little of $F$ as possible, where the content of a theory is the set of all statements logically entailed by it. This set may also be divided into true and false statements (the theory's truth and falsity content). A good theory should suggest where to look, i.e. new observations which we had not thought of making before.

This is comparable to a situation where the universal class of all basic items in a domain is divided into good and bad, $G$ and $B$. If we describe the content of a program as its output set which may be divided into good and bad artefacts, then we can claim that one aim of a creative program is to generate as much of $G$ (and as little of $B$) as possible. A good system should suggest new areas of the search space to explore, i.e. find artefacts which we had not thought of generating before. If we accept this analogy, then Popper's criteria for evaluating theories shed light on criteria for evaluating creative programs.

Popper sets out two criteria for a satisfactory theory (in addition to it logically entailing what it explains). Firstly it must not be ad hoc. That is, the theory (explicans) cannot itself be evidence for the phenomena to be explained (explicandum), or vice versa. For example if the explicandum is `this rat is dead', then it is not enough to suggest that `this rat ate poison' if the evidence for it having done so is it being now dead. There must be independent evidence, such as `the rat's stomach contains rat poison'. The opposite of an ad hoc explanation is therefore one which is independently testable. Secondly, a theory must be rich in content. For example, a theory which explains phenomena other than the specific phenomena it was designed to explain has a much richer content, and therefore has greater value than one which is less general (the principle of universality).

Applying these criteria to creative programs, if we see a program as the theory and the set of artefacts we wish to generate as the phenomena to be explained, then we are interested in the independent testability of the program and the richness of its content. A program which has been carefully tailored in order to produce very specific artefacts cannot be claimed to be a good program on the grounds that it produces those artefacts. There must be independent grounds for its value, such as also generating other valuable artefacts. Within the programming analogy, this is clearly connected to the richness of content criterion; the more valuable artefacts outside of the set it was designed to produce, and the fewer worthless artefacts, a program generates, the better that program is.

It is important to note that Popper's criteria are general for all scientific theories, applying to single statement explanations as well as all-encompassing theories. They therefore apply to any program (including subsets of larger programs) able to generate such artefacts. The conclusion of the analogy is that we should aim to make our programs as general as possible. That is, any creative program which re-invents already known artefacts should also generate a reasonable number of new, valuable, artefacts.

Contributions to Creativity

In [ritchie:aisb01], some measures of program creativity were formally introduced. Ritchie suggested that judgements of creativity take into account the fact that the program designer is typically guided by some set of (usually high-valued) artefacts called the \textit{inspiring set}. For instance, as discussed in chapter 6 of [colton:phd], in the development of the HR program, the concept of Abelian groups was an inspiring concept and HR was developed in order to re-invent that concept in the hope that it would also invent new concepts of a similar nature. Other inspiring artefacts for HR include the concept of self-inversing elements and prime numbers, with more details given in chapter 6 of [colton:phd]. Ritchie put forward a number of criteria for program creativity, all of which depended on a small set of factors, including the inspiring set $I$, the program's output set and the subset of that output set which was high-valued.

Ritchie suggested that, while an individual measure may not be adequate for assessing creativity alone, an overall measure of creativity could be formed by combining some of these formulae (or other similar criteria). We add to these measures by looking at the way particular knowledge may contribute to the creativity of a program. In particular, we derive measures which can be used to estimate how fine-tuned a program is. Such measures are motivated by instances of fine-tuning, such as the case with Lenat's AM described above, and also by the analogy with the philosophy of science given above.

Following Ritchie, we assume that the artefacts produced by the creative programs can be rated by humans according to their `value' (quality). While such a subjective judgement of value is difficult in general, it is not impossible; see for example, [binsted:phd], [colton:phd] or [steel:msc]. Specifically, following Ritchie, we assume that a value-set, $V$, of high valued artefacts can be extracted from the program's output set $O$. The output set can be determined either by single or multiple runs, but in either case, it is taken to be those artefacts actually produced by the program, rather than those it could plausibly produce.

The degree of creativity in a program is partly determined by the number of novel items of value it produces. Therefore we are interested in the set of valuable items produced by the program, excluding those in the inspiring set $I$. Writing $V$ for the value-set, we call this the creative set:

\[ C = V - I \]
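As a minimal illustration, the creative set is a plain set difference between the high-valued output and the inspiring set. The function name and the example artefacts below are hypothetical; only the set difference itself is taken from the text.

```python
def creative_set(value_set, inspiring_set):
    """High-valued artefacts that were not already in the inspiring set."""
    return set(value_set) - set(inspiring_set)


# Hypothetical artefacts: two re-inventions and one novel item of value.
inspiring_set = {"prime numbers", "square numbers"}
value_set = {"prime numbers", "square numbers", "novel concept"}

novel = creative_set(value_set, inspiring_set)
```

Here `novel` contains only the one artefact that the designer did not have in mind when building the program.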

A programmer may increase the size of the creative set by using general procedural methods as opposed to specific procedures (which are more likely to produce items in the inspiring set). A general procedure might consist of dropping or negating a constraint or altering a variable (for instance Kekulé's discovery of the benzene ring, in which he negated the constraint that a string-molecule is an open curve [boden:tcm]). Such heuristics have been gleaned from examining how previous creative items were produced but are general enough to produce valuable items not yet envisaged. Therefore, the generality of the procedures in a program contributes to the degree of creativity we attribute to it.

We need a formal account of the knowledge \emph{input} to such programs. This is not straightforward, as there is not a uniform set of items which comprise the input. Fortunately, there is often a natural classification of the kinds of information used by a program. For instance, the HR program uses a set of 7 production rules for generating new concepts, along with a set of measures of interestingness for the concepts and conjectures produced. Furthermore in HR, for each session, the user sets various parameters --- mostly to dictate the form of the heuristic search --- and provides a few initial concepts from which all others follow. Similarly, in AM there were 242 heuristics and 115 elementary concepts.

JAPE is primarily driven by three types of rule (schemata, description-rules, and templates), all of which assume the availability of a general purpose dictionary (in Binsted's main experiment, WordNet [miller:wordnet] was used). Although the rules may interact, in the sense that the operation of a particular schema may result in an intermediate structure which only certain description-rules can use, or the output of a description-rule may be suitable only for certain templates, each rule has a meaning independently of the other rules.

The data given to MuzaCazUza comprises the base cases which enable it to generate melodies and the harmonic line supplied by the user. In addition to this, there are techniques for assessing how close the input case is to the base case and (in SICOM, but suggested also for MuzaCazUza) different possibilities for choosing a case from the best-matching cases (e.g. randomly). Furthermore, the user is provided with various transformations which they can apply to the melodies produced, and these transformations also make up the knowledge given to MuzaCazUza.

With ASPERA, the data supplied also comprises the base cases and the user input, in this case some prose which encompasses an intended meaning. Also, the metrical rules for adapting the poem fragments retrieved from the case base to the prose, and the techniques for analysing validated poems, make up the knowledge given to ASPERA.

We see that the types of initial information include:

(i) procedures for generating artefacts

(ii) procedures for altering/adapting artefacts

(iii) calculations for evaluating artefacts

(iv) parameters for the search

(v) input data

We do not need to distinguish between these types here.

Some programs are designed so that, to allow experiments, certain items from the above subsets are optional. For example, the HR user can turn construction procedures and measures of interestingness on and off, change parameters and remove input data; in principle, the heuristics and initial concepts of AM include (we assume) some optional elements; JAPE's rules are independent in the sense that rules could be added or removed to alter the set of jokes generated. We therefore assume that the program can (at least theoretically) operate with different \textit{input sets}, and this variation may affect the output.

Measures of Fine-Tuning

Let us now consider the case where input knowledge causes a program to replicate known items to a greater extent than it causes the generation of novel high-valued items; we suggest that this captures the notion of `fine-tuning'.

We will write $O_K$ for the set of output artefacts corresponding to input knowledge $K$. Then we define $V_K$ as the set of high-valued items in $O_K$; $R_K$ as the `re-inventions', i.e.\ the artefacts in $V_K$ which were in the inspiring set $I$; and $C_K$ as the creative set -- those artefacts in $V_K$ which were not originally in $I$ (i.e.\ $V_K - I$).

If we remove a particular subset $K'$ from $K$, then $V_{K-K'}$ may differ from $V_K$. From this point, we shall assume a fixed set $K$ of available input knowledge, and consider the effects of removing subsets $K'$ of that overall knowledge base.

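We can define three convenient terms for a subset $K'$ of $K$, where $V_K$ is the value set obtained with knowledge $K$, $V_{K-K'}$ is the value set obtained once $K'$ is removed, and $\subset$ denotes strict inclusion:

  • $K'$ is \textit{creatively irrelevant} if $V_{K-K'} = V_K$

  • $K'$ is \textit{creatively useful} if $V_{K-K'} \subset V_K$

  • $K'$ is \textit{creatively destructive} if $V_K \subset V_{K-K'}$

We are interested in the creatively destructive and especially the creatively useful subsets of knowledge. For $K'$ which is creatively useful, we define the \textit{dependency set} of $K'$, $D_{K'}$, to be the set $V_K - V_{K-K'}$. This is the set of high valued artefacts which will be missing from the output if $K'$ is removed from the knowledge set.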
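The three terms and the dependency set can be sketched in code under some loudly stated assumptions: a program is modelled as a function `value_set_of` from a set of knowledge items to its set of high-valued artefacts, "useful" is read as the value set strictly shrinking when the subset is removed, and "destructive" as the value set strictly growing. The `"mixed"` outcome (some artefacts lost, others gained) is not named in the text and is added here only so the function is total. The toy program and its item names are hypothetical.

```python
def classify(value_set_of, knowledge, subset):
    """Classify `subset` by comparing the value sets with and without it."""
    full = value_set_of(knowledge)
    reduced = value_set_of(knowledge - subset)
    if reduced == full:
        return "creatively irrelevant"
    if reduced < full:       # strictly fewer high-valued artefacts
        return "creatively useful"
    if reduced > full:       # strictly more high-valued artefacts
        return "creatively destructive"
    return "mixed"           # assumption: case not covered by the three terms


def dependency_set(value_set_of, knowledge, subset):
    """High-valued artefacts that go missing when `subset` is removed."""
    return value_set_of(knowledge) - value_set_of(knowledge - subset)


def toy_value_set(knowledge):
    """Hypothetical program: knowledge items in, high-valued artefacts out."""
    found = set()
    if "rule_a" in knowledge:
        found.add("known 1")
    if "rule_b" in knowledge:
        found.add("novel 1")
        if "pruner" not in knowledge:   # the pruner suppresses a valuable artefact
            found.add("novel 2")
    return found


K = frozenset({"rule_a", "rule_b", "pruner", "spare"})
```

With this toy program, removing `spare` changes nothing, removing `rule_a` loses `known 1`, and removing `pruner` lets `novel 2` through, so the three subsets fall into the three categories respectively.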

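As above, we define $R_{K'} = D_{K'} \cap I$ (the re-inventions) and $C_{K'} = D_{K'} - I$ (the creative set), where $D_{K'}$ is the dependency set of $K'$ and $I$ is the inspiring set. For a particular creatively useful $K'$, we can say that $K'$ is fine-tuned if:

\[ C_{K'} = \emptyset \]

That is, the contribution of $K'$ to high-valued output is confined to replicating elements of the inspiring set, with no novel high-valued output being directly attributable to $K'$.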

For cases where $C_{K'} \neq \emptyset$ (i.e.\ there are at least some high-valued novel items contributed by $K'$), we can get an indication of how fine-tuned $K'$ is by defining:

\[ ft(K') = \frac{|R_{K'}|}{|C_{K'}|} \]

where $R_{K'}$ and $C_{K'}$ are the re-inventions and the novel high-valued items in the dependency set of $K'$. This returns a value greater than 1 if $K'$ is used to rediscover more artefacts than to find new ones of value, and 1 or less otherwise. Thus it captures the notion of a piece of knowledge being introduced only in order to find particular elements of the inspiring set, with nothing else of value being lost by removing that piece of knowledge.
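A small sketch of this ratio follows, assuming the dependency set of a knowledge subset and the inspiring set are available as Python sets. The function name is hypothetical, and returning infinity when there are no novel items (the outright fine-tuned case, where the ratio is undefined) is a convention chosen here, not something stated in the text.

```python
import math


def fine_tuning_ratio(dependency, inspiring):
    """Re-inventions divided by novel high-valued items in a dependency set."""
    if not dependency:
        raise ValueError("the ratio is only meaningful for creatively useful knowledge")
    reinvented = dependency & inspiring   # artefacts already in the inspiring set
    novel = dependency - inspiring        # artefacts that are new and of value
    if not novel:
        return math.inf                   # assumption: marks the fine-tuned case
    return len(reinvented) / len(novel)


# Hypothetical artefacts: two re-inventions against one novel item.
ratio = fine_tuning_ratio({"known 1", "known 2", "novel 1"}, {"known 1", "known 2"})
```

A ratio of 2 here says that removing the knowledge would cost twice as many re-inventions as novel artefacts, which is the pattern the measure is meant to flag.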

So far, we have a measure for the extent to which some individual subset $K'$ of knowledge contributes to the fine-tuning of the whole knowledge base $K$. In the case where $K' = K$, it follows from our definitions that a program which fails to produce any novel high-valued artefacts when using $K$ (i.e.\ $C_K = \emptyset$) but does replicate some of the inspiring set (i.e.\ $R_K \neq \emptyset$) is deemed to be fine-tuned.

It is also interesting to consider, in the more general case, the extent to which there is fine-tuning in the various subsets of $K$. For this we need to be more selective about the subsets we consider. Consider the case where a subset $K'$ corresponds to a dependency set $D_{K'}$. That is, removing $K'$ from $K$ will result in the set $D_{K'}$ not forming part of the program's output. Suppose there is also a subset $K''$ which is creatively irrelevant in the sense defined above, and consider $K' \cup K''$. If there is no interaction between $K'$ and $K''$, it is possible that $V_{K-(K' \cup K'')} = V_{K-K'}$, and hence $D_{K' \cup K''} = D_{K'}$. This would mean that the larger set, $K' \cup K''$, would be rated in the same way, with respect to fine-tuning, as $K'$, which is intuitively untidy. To avoid this, we introduce the following definition:

A subset $K'$ which is creatively useful, with dependency set $D_{K'}$, \textit{non-redundantly contributes to} $V_K$ if there is no proper subset $K''$ of $K'$ such that $D_{K''} = D_{K'}$. Then in the following two measures, which both give some assessment of the extent of fine-tuning amongst the subsets of $K$, we restrict attention to the subsets which \emph{non-redundantly} contribute to $V_K$. These measures describe how fine-tuned a program $P$ is when using knowledge $K$ (assuming $P$ was constructed using inspiring set $I$):

\[ m_1(P, K) = \frac{\left| \bigcup \{ K' \in N_K : K' \mbox{ is fine-tuned} \} \right|}{|K|} \qquad m_2(P, K) = M \]

where

  • $N_K$ = the set of subsets of $K$ which non-redundantly contribute to $V_K$

  • $M$ = maximum of $ft(K')$ over $\{ K' \in N_K : C_{K'} \neq \emptyset \}$

If $m_1$ is greater than 0 or $m_2$ greater than 1, we can claim that $P$ using $K$ has been fine-tuned to some extent. If $m_1$ is 1, $P$ using $K$ is completely fine-tuned, in the sense that every item of knowledge in $K$ contributes to some subset (which non-redundantly contributes to $V_K$) which is fine-tuned. If $m_2$ is greater than 1, then there is at least one such subset of $K$ which is used more to replicate known artefacts than to find new ones of value.
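To make the two program-level measures concrete, the sketch below computes them by brute force for a toy program. It rests on several assumptions that are not fixed by the text: the program is modelled as a function `value_set_of` from knowledge items to high-valued artefacts; the first measure is taken to be the fraction of knowledge items lying in some fine-tuned, non-redundantly contributing subset; the second is the largest re-invention to novelty ratio over the remaining non-redundant subsets, with 0.0 returned when there are none. All names, including `toy_value_set` and its rules, are hypothetical, and the enumeration is exponential in the size of the knowledge base, so it is only feasible for very small examples.

```python
from itertools import combinations


def subsets(items, proper=False):
    """All subsets of `items` as frozensets, optionally excluding `items` itself."""
    items = list(items)
    largest = len(items) - 1 if proper else len(items)
    for size in range(largest + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def dependency_set(value_set_of, K, S):
    """High-valued artefacts lost when subset S is removed from K."""
    return value_set_of(K) - value_set_of(K - S)


def non_redundant_contributors(value_set_of, K):
    """Creatively useful subsets with no proper subset sharing their dependency set."""
    result = []
    for S in subsets(K):
        if not S or not value_set_of(K - S) < value_set_of(K):
            continue                      # skip the empty set and non-useful subsets
        D = dependency_set(value_set_of, K, S)
        if all(dependency_set(value_set_of, K, T) != D for T in subsets(S, proper=True)):
            result.append(S)
    return result


def fine_tuning_measures(value_set_of, K, inspiring):
    """Return (fraction of K in fine-tuned subsets, maximum re-invention/novelty ratio)."""
    K = frozenset(K)
    tuned_items, ratios = set(), []
    for S in non_redundant_contributors(value_set_of, K):
        D = dependency_set(value_set_of, K, S)
        reinvented, novel = D & inspiring, D - inspiring
        if not novel:
            tuned_items |= S              # S only replicates the inspiring set
        else:
            ratios.append(len(reinvented) / len(novel))
    return len(tuned_items) / len(K), max(ratios, default=0.0)


def toy_value_set(knowledge):
    """Hypothetical program with three rules."""
    found = set()
    if "r1" in knowledge:
        found.add("a")                    # re-invention only
    if "r2" in knowledge:
        found |= {"b", "c", "n1"}         # two re-inventions, one novel artefact
    return found                          # "r3" contributes nothing


first, second = fine_tuning_measures(toy_value_set, {"r1", "r2", "r3"}, {"a", "b", "c"})
```

In this toy case only `r1` lies in a fine-tuned subset, so the first measure is one third, while the pair `{r1, r2}` accounts for three re-inventions and one novel artefact, so the second measure is 3. Both values indicate some fine-tuning under the reading sketched here.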

Case Studies

We pointed out earlier that it is not our intention to apply our measures to established programs. Hence, as these measures are new, and no programs have been developed with them in mind, our case studies are more qualitative than quantitative.

AM is one of very few programs to have been criticised for fine-tuning in the literature [hanna_ritchie:method]. Ritchie and Hanna state that:

'... it is possible to gain the impression that the successful ``discovery'' was the result of various specially designed pieces of information, aimed at achieving this effect.' (page 263)

This suggests that there were specialised heuristics in AM which had a disproportionate effect on its results. In fact, in [lenat:kbsiai] Lenat proposes this as a way for writing discovery programs:

`Suppose a large collection of these heuristic strategies has been assembled (e.g. by analyzing a great many discoveries, and writing down new heuristic rules whenever necessary) ... one can imagine starting from a basic core of knowledge and ``running'' the heuristics to generate new concepts. ... Such syntheses are precisely what AM does.' (p. 5)

This suggests that AM was written by Lenat looking at particular concepts or conjectures and adding in heuristics until AM successfully found the result. This is true of many creative programs, but unfortunately with AM, sometimes the heuristic was so fine-tuned it was introduced solely in order for AM to re-invent a single concept, e.g., the concept of number (by thinking of bags as numbers).

Returning to our measures, the situation in AM could be characterised as containing a small (possibly unary) subset $K'$ of its heuristics for which $ft(K')$, the ratio of re-inventions to novel high-valued artefacts, was particularly high. Hence, we could conclude that the maximum-ratio measure $m_2$ for AM would be greater than 1 and that AM was fine-tuned to a certain extent. In summary, we note that AM had more heuristics (242) than concepts it would ordinarily produce in a session (around 180).

In [colton:phd] we went to some lengths to argue that the HR program was not fine-tuned. For instance, we pointed out that the match production rule was inspired by the concept of self-inversing elements in groups (those for which $a^{-1} = a$), but when employed in number theory, it enabled HR to re-invent the concept of square numbers. We did similarly for all the production rules, but, more importantly, we showed that all production rules were used in forming concepts which were new to us (and sometimes new to mathematics, as discussed in [colton:aaai00]).

We can also use the measures derived above to compare two competing knowledge subsets. Suppose a program can be run either with the knowledge base or with , where and are in some sense alternative formulations that we wish to compare. Suppose that both and are creatively useful, and that the high-valued result set for the first of these is and for the second knowledge base it is . If , then it follows that and also . Hence , so . That is, the input knowledge variant which corresponds to the larger will at worst be as fine-tuned as the other input knowledge . If , will be less fine-tuned.

Such a situation occurred during the development of HR --- as discussed in chapter 6 of [colton:phd] --- when two production rules in HR were replaced by a new, more general, rule. In particular, the conjunct rule was employed to take two concepts with, for example, definitions and and produce a concept with definition [where and are predicates defining some relation over objects , and ]. The common rule was designed to take a single concept with definition and produce a concept with a definition . When writing a new rule, the compose rule, which took two functions and and produced the concept , we realised that this could be written in such a way as to incorporate the functionality of the previous two also. The details of this are given in [colton:phd], but are not relevant here. In addition to making HR more general and more comprehensible, the new rule covered all of the output from the previous two and more. So, in this case, the generalisation of the two previous rules \emph{increased} for HR so the new version was less fine-tuned than the previous version. We hope to perform a more detailed examination of this case study in future.

Conclusions and Further Work

Of course, there is definitely a need for simulations, whereby a program is written to reproduce discoveries made by humans. For example, the BACON programs [langley:sci_disc] were written to provide plausible ways for a computer to re-discover certain laws from the physical sciences. Such simulations often highlight general ways to proceed and more creative programs are built as a result. However, we believe that fine-tuning in programs purporting to be creative needs to be addressed. We have sketched a formalisation of what it means for a creative program to be fine-tuned and we have noted that the more fine-tuned a program is, the less creativity we attribute to it. This contributes to the question of which processes can be deemed creative and so enhances the approach to estimating the creativity of a program based on its input and output. We believe this approach is very important in the study of machine creativity.

Our approach may be applied to existing programs whose set of inspiring items is possible to identify. This would then support or undermine claims of creativity. However, since the inspiring set may be difficult to identify in retrospect (other than the set of all artefacts known to the programmer) the main value of our work lies in its role as a guideline for researchers currently writing creative programs (who can record their inspiring set). In particular, such measures of creativity are useful when writing improved versions of creative programs, as in the illustrative example above with HR.

More work is required on these measures. In particular, these and some of the measures in [ritchie:aisb01] depend on the notion of an inspiring set, which may not apply completely to certain creative programs. For example, Cohen may have improved the AARON program so that it could draw trees, but the artefacts it produces are not just drawings of trees, but of scenes with humans and trees in them, and there is no particular drawing of a tree which acts as an inspiring artefact. In this case, we would have to relax our definition of inspiring set to include parts of artefacts rather than entire artefacts. Similarly, to make these measures more general, it is likely that we will need to adopt a more fuzzy notion of good and bad artefacts.

The assessment of machine creativity is beginning to be recognised as an important area of this field, with general guidelines such as those in [pease:iccbr01] required as well as concrete measures, such as those presented in [ritchie:aisb01] and those derived here. Such measures are imperative if the study of machine creativity is to become a formal research programme. Furthermore, we hope that such notions will be very useful for researchers writing creative programs in the future.

Acknowledgments

This work is supported by EPSRC grants GR/M98012 and GR/M45030. The first author is also affiliated with the Department of Computer Science, University of York. We would like to thank Alan Smaill for some important input to this work and the anonymous reviewers for their comments on an earlier draft of this paper.

References

[binsted:children]
Binsted, K.; Pain, H.; and Ritchie, G.
Children's evaluation of computer-generated punning riddles.
Pragmatics and Cognition 5:2:309--358, 1997.

[binsted:phd]
Binsted, K.
Machine Humour: An Implemented Model of Puns.
Ph.D. Dissertation, Department of Artificial Intelligence, University of Edinburgh, 1996.

[boden:tcm]
Boden, M. A.
The Creative Mind.
Abacus, 1992.

[cohen:aaron]
Cohen, H.
The further exploits of AARON, painter.
Stanford Electronic Humanities Review 4:2, 1995.

[colton:ijcai99]
Colton, S.; Bundy, A.; and Walsh, T.
HR: Automatic concept formation in pure mathematics.
In Proceedings of the 16th IJCAI, 786--791, 1999.

[colton:aaai00]
Colton, S.; Bundy, A.; and Walsh, T.
Automatic invention of integer sequences.
In Proceedings of the Seventeenth National Conference on Artificial Intelligence, 558--563, 2000.

[colton:phd]
Colton, S.
Automated Theory Formation in Pure Mathematics.
Ph.D. Dissertation, Division of Informatics, University of Edinburgh, 2001.

[gervas:aisb01]
Gervás, P.
Generating poetry from a prose text: Creativity versus faithfulness.
In Wiggins, G., ed., Proceedings of the AISB'01 Symposium on Artificial Intelligence and Creativity in Arts and Science, 93--99, 2001.

[langley:sci_disc]
Langley, P.; Simon, H.; Bradshaw, G.; and Żytkow, J.
Scientific Discovery - Computational Explorations of the Creative Processes.
MIT Press, 1987.

[lenat:kbsiai]
Lenat, D.
AM: Discovery in mathematics as heuristic search.
In Lenat, D., and Davis, R., eds., Knowledge-Based Systems in Artificial Intelligence.
McGraw-Hill Advanced Computer Science Series, 1982.

[miller:wordnet]
Miller, G.; Beckwith, R.; Fellbaum, C.; Gross, D.; Miller, K.; and Tengi, R.
Five papers on WordNet.
International Journal of Lexicography 3:4.
Revised March 1993.

[pease:iccbr01]
Pease, A.; Winterstein, D.; and Colton, S.
Evaluating machine creativity.
In Workshop on Creative Systems, 4th International Conference on Case Based Reasoning, 2001.

[pereira:sicom]
Pereira, F.; Grilo, C.; Macedo, L.; and Cardoso, A.
Composing music with CBR.
In First International Conference on Computational Models of Creative Cognition, Dublin, MIND-II, 1997.

[popper:ok]
Popper, K.
Objective Knowledge.
OUP, 1972.

[ribeiro:aisb01]
Ribeiro, P.; Pereira, F. C.; Ferrand, M.; and Cardoso, A.
Case-based melody generation with MuzaCazUza.
In Wiggins, G., ed., Proceedings of the AISB'01 Symposium on Artificial Intelligence and Creativity in Arts and Science, 67--74, 2001.

[hanna_ritchie:method]
Ritchie, G., and Hanna, F.
AM: A case study in methodology.
Artificial Intelligence 23:249--268, 1984.

[ritchie:aisb01]
Ritchie, G.
Assessing creativity.
In Wiggins, G., ed., Proceedings of the AISB'01 Symposium on Artificial Intelligence and Creativity in Arts and Science, 3--11, 2001.

[steel:msc]
Steel, G.
Cross domain concept formation using HR.
Master's thesis, Division of Informatics, University of Edinburgh, 1999.



© 2002 Simon Colton