|lectures_ai(simon)||("simon lectures AI")||arity is 1 here|
|father(bob,bill)||("bob is bill's father")||arity is 2 here|
|lives_at(bryan, house_of(jack))||("bryan lives at jack's house")||arity is 2 here|
We can string predicates together into a sentence by using connectives in the same way that we did for propositional logic. We call a set of predicates strung together in the correct way a sentence. Note that a single predicate can be thought of as a sentence.
There are five connectives in first-order logic. First, we have "and", which we write ∧, and "or", which we write ∨. These connect predicates together in the obvious ways. So, if we wanted to say that "Simon lectures AI and Simon lectures bioinformatics", we could write:
lectures_ai(simon) ∧ lectures_bioinformatics(simon)
Note also that, now that we are talking about different lecture courses, it might be a good idea to change our choice of predicates and make ai and bioinformatics constants:
lectures(simon, ai) ∧ lectures(simon, bioinformatics)
The other connectives available to us in first-order logic are (a) "not", written ¬, which negates the truth of a predicate, (b) "implies", written →, which can be used to say that one sentence being true follows from another sentence being true, and (c) "if and only if" (also known as "equivalence"), written ↔, which can be used to state that the truth of one sentence is always the same as the truth of another sentence.
For instance, if we want to say that "if Simon isn't lecturing AI, then Bob must be lecturing AI", we could write it thus:
¬lectures(simon, ai) → lectures(bob, ai)
The things which predicates relate are terms: these may be constants, variables or the output from functions.
Constants are things which cannot be changed, such as england, black and barbara. They stand for one thing only, which can be confusing when the constant is something like blue, because we know there are different shades of blue. If we are going to talk about different shades of blue in our sentences, however, then we should not have made blue a constant, but rather used shade_of_blue as a predicate, in which we can specify some constants, such as navy_blue, aqua_marine and so on. When translating a sentence into first-order logic, one of the first things we must decide is what objects are to be the constants. One convention is to use lower-case letters for the constants in a sentence, which we also stick to.
Functions can be thought of as special predicates, where we think of all but one of the arguments as input and the final argument as the output. For each set of things which are classed as the input to a function, there is exactly one output to which they are related by the function. To make it clear that we are dealing with a function, we can use an equality sign. So, for example, if we wanted to say that the cost of an omelette at the Red Lion pub is five pounds, the normal way to express it in first-order logic would probably be:
cost_of(omelette, red_lion, five_pounds)
However, because we know this is a function, we can make this clearer:
cost_of(omelette, red_lion) = five_pounds
Because we know that there is only one output for every set of inputs to a function, we allow ourselves to use an abbreviation when it would make things clearer. That is, we can talk about the output from a function without explicitly writing it down, but rather replacing it with the left hand side of the equation. So, for example, if we wanted to say that the price of omelettes at the Red Lion is less than the price of pancakes at the House Of Pancakes, we would normally write something like this:
∃X, Y (cost_of(omelette, red_lion, X) ∧ cost_of(pancake, house_of_pancakes, Y) ∧ less_than(X, Y))
This is fairly messy, and involves variables (see next subsection). However, allowing ourselves the abbreviation, we can write it like this:
less_than(cost_of(omelette, red_lion), cost_of(pancake, house_of_pancakes))
which is somewhat easier to follow.
Suppose now that we wanted to say that there is a meal at the Red Lion which costs only 3 pounds. If we said that cost_of(meal, red_lion) = three_pounds, then this states that a particular meal (a constant, which we've labeled meal) costs 3 pounds. This does not exactly capture what we wanted to say. For a start, it implies that we know exactly which meal it is that costs 3 pounds, and moreover, the landlord at the Red Lion chose to give this the bizarre name of "meal". Also, it doesn't express the fact that there may be more than one meal which costs 3 pounds.
Instead of using constants in our translation of the sentence "there is a meal at the Red Lion costing 3 pounds", we should have used variables. If we had replaced meal with something which reflects the fact that we are talking about a generic, rather than a specific meal, then things would have been clearer. When a predicate relates something that could vary (like our meal), we call these things variables, and represent them with an upper-case word or letter.
So, we should have started with something like
meal(X) ∧ cost_of(red_lion, X) = three_pounds
which reflects the fact that we're talking about some meal at the Red Lion, rather than a particular one. However, this isn't quite specific enough. We need to tell the reader of our translated sentence something more about our beliefs concerning the variable X. In this case, we need to tell the reader that we believe there exists such an X. There is a specific symbol in predicate logic which we use for this purpose, called the 'exists symbol'. This is written: ∃. If we put it around our pair of predicates, then we get a fully formed sentence in first-order logic:
∃X (meal(X) ∧ cost_of(red_lion, X) = three_pounds)
This is read as "there is something called X, where X is a meal and X costs three pounds at the Red Lion".
But what now if we want to say that all meals at the Red Lion cost three pounds? In this case, we need to use a different symbol, which we call the 'forall' symbol, written ∀. This states that the predicates concerning the variable to which the symbol applies are true for all possible instances of that variable. So, what would happen if we replaced the exists symbol above by our new forall symbol? We would get this:
∀X (meal(X) ∧ cost_of(red_lion, X) = three_pounds)
Is this actually what we wanted to say? Aren't we saying something about all meals in the universe? Well, actually, we're saying something about every object in the Universe: everything is a meal which you can buy from the Red Lion. For three pounds! What we really wanted to say should have been expressed more like this:
∀X (meal(X) → cost_of(red_lion, X) = three_pounds)
This is read as: for all objects X, if X is a meal, then it costs three pounds in the Red Lion. We're still not there, though. This implies that every meal can be bought at the Red Lion. Perhaps we should throw in another predicate: serves(Pub, Meal), which states that Pub serves the Meal. We can now finally write what we wanted to say:
∀X ((meal(X) ∧ serves(red_lion, X)) → cost_of(red_lion, X) = three_pounds)
This can be read as: for all objects X, if X is a meal and X is served in the Red Lion, then X costs three pounds.
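On a finite set of facts, a sentence of this shape can be checked mechanically. The following is a minimal Prolog sketch (Prolog is discussed later in these notes) and rests on assumptions that are not in the text: a small made-up menu, cost_of written as a three-place relation rather than a function, and the hypothetical predicate name all_served_meals_cost/2. It relies on the fact that "for all X, P(X) implies Q(X)" says the same as "there is no X with P(X) and not Q(X)".

```prolog
% Hypothetical facts: which objects are meals, and what the Red Lion serves.
meal(omelette).
meal(pancake).
meal(lobster).
serves(red_lion, omelette).
serves(red_lion, pancake).

% cost_of/3 encodes the function cost_of(Pub, Meal) = Price as a relation.
cost_of(red_lion, omelette, three_pounds).
cost_of(red_lion, pancake, three_pounds).

% True if every meal served at Pub costs Price, i.e. there is no
% counterexample: a served meal whose cost is not Price.
% \+ Goal succeeds when Goal cannot be proved.
all_served_meals_cost(Pub, Price) :-
    \+ ( meal(X), serves(Pub, X), \+ cost_of(Pub, X, Price) ).
```

With these facts, the query ?- all_served_meals_cost(red_lion, three_pounds). succeeds. The lobster is a meal, but it is not served at the Red Lion, so it is not a counterexample; this is exactly what the serves predicate was added for.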
The act of making ourselves clear about a variable by introducing an exists or a forall sign is called quantifying the variable. The exists and forall signs are likewise called quantifiers in first-order logic.
Substituting a ground term for a variable is often called "grounding a variable", "applying a substitution" or "performing an instantiation". An example of instantiation is: turning the sentence "All meals are five pounds" into "Spaghetti is five pounds" - we have grounded the value of the variable meal to the constant spaghetti to give us an instance of the sentence.
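Instantiation can be seen directly in Prolog, where binding a variable to a constant is done by unification. This is only an illustrative sketch: the term costs/2 and the predicate name instance_of_sentence/1 are made up for the example.

```prolog
% A sentence with a variable: "Meal costs five pounds".
% Unifying Meal with the constant spaghetti grounds the variable.
instance_of_sentence(Instance) :-
    Sentence = costs(Meal, five_pounds),
    Meal = spaghetti,          % apply the substitution {Meal/spaghetti}
    Instance = Sentence.
```

The query ?- instance_of_sentence(I). gives I = costs(spaghetti, five_pounds), a term with no variables left in it.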
We have now seen some examples of first order sentences, and you should practice writing down English sentences in first-order logic, to get used to them.
There are many ways to translate things from English to Predicate Logic incorrectly, and we can highlight some pitfalls to avoid. Firstly, there is often a mix up between the "and" and "or" connectives. We saw in a previous lecture that the sentence "Every Monday and Wednesday I go to John's house for dinner" can be written in first-order logic as:
∀X ((day_of_week(X, monday) ∨ day_of_week(X, wednesday))
→ (go_to(me, house_of(john)) ∧ eat_meal(me, dinner)))
and it's important to note that the "and" in the English sentence has changed to an "or" sign in the first-order logic translation. Because we have turned this sentence into an implication, we need to make it clear that if the day of the week is Monday or Wednesday, then we go to John's house for dinner. Hence the disjunction sign (the "or" sign) is introduced. Note that we call the "and" sign the conjunction sign.
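The difference between the two readings can be tried out in Prolog (the language discussed later in these notes). This is a small sketch with made-up predicate names: the first predicate encodes the disjunction as two clauses, and the second encodes the literal "and" reading.

```prolog
% Correct reading: the day is Monday OR the day is Wednesday.
dinner_at_johns(Day) :- Day = monday.
dinner_at_johns(Day) :- Day = wednesday.

% Literal (wrong) reading: the day is Monday AND the day is Wednesday.
% No single day can be both, so this never succeeds.
dinner_at_johns_wrong(Day) :- Day = monday, Day = wednesday.
```

Asking ?- dinner_at_johns(D). gives D = monday and then D = wednesday, whereas ?- dinner_at_johns_wrong(D). fails, so the conjunctive version would never send us to John's house at all.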
Another common problem is getting the choice, placement and order of the quantifiers wrong. We saw this with the Red Lion meals example above. As another example, try translating the sentence: "Only red things are in the bag". Here are some incorrect answers:
∃X (in_bag(X) → red(X))
∀X (red(X) → in_bag(X))
∀X (∀Y (bag(X) ∧ in_bag(Y,X) → red(Y)))
Question: "Why are these incorrect, what are they actually saying, and what is the correct answer?"
Another common problem is using commonsense knowledge to introduce new predicates. While this may simplify things, the agent you're communicating with is unlikely to know the piece of commonsense knowledge you are expecting it to. For example, some people translate the sentence: "Any child of an elephant is an elephant" as:
∀X (∀Y ((parent(X,Y) ∧ elephant(X)) → elephant(Y)))
even though they're told to use the predicate child. What they have done here is use their knowledge about the world to substitute the predicate 'parent' for 'child'. It's important to never assume this kind of commonsense knowledge in an agent: unless you've specifically programmed it to, an agent will not know the relationship between the child predicate and the parent predicate.
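To see why this matters, here is a small Prolog sketch. The individuals nellie and dumbo are invented for the example, and child(C, P) is assumed to mean "C is a child of P". The agent can only connect child and parent if a clause says so.

```prolog
% What the agent has been told, using the predicate child/2.
child(dumbo, nellie).

% One known elephant, and the rule "any child of an elephant is an
% elephant", written with child/2 as requested.
elephant(nellie).
elephant(C) :- child(C, P), elephant(P).

% The link to parent/2 must be stated explicitly if it is wanted;
% without this clause the agent knows nothing about parent/2.
parent(P, C) :- child(C, P).
```

With these clauses ?- elephant(dumbo). succeeds. If the final clause were removed, a query such as ?- parent(nellie, X). could not be answered, however obvious the answer seems to us.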
There are tricks to compress what is written in logic into a succinct, understandable English sentence. For instance, look at this sentence from earlier:
∃X (meal(X) ∧ cost_of(red_lion, X) = three_pounds)
This is read as "there is something called X, where X is a meal and X costs three pounds at the Red Lion". We can abbreviate this to: "there is a meal, X, which costs three pounds at the Red Lion", and finally, we can ignore the X entirely: "there is a meal at the Red Lion which costs three pounds". In performing these abbreviations, we have interpreted the sentence.
Interpretation is fraught with danger. Remember that the main reason we will want to translate from first-order logic is so that we can read the output from a reasoning agent which has deduced something new for us. Hence it is important that we don't ruin the good work of our agent by mis-interpreting the information it provides us with.
Most programming languages are procedural: the programmer specifies exactly the right instructions (algorithms) required to get an agent to function correctly. It comes as a surprise to many people that there is another way to write programs. Declarative programming is when the user declares what the output to a function should look like given some information about the input. The agent then searches for an answer which fits the declaration, and returns any it finds.
As an example, imagine a parent asking their child to run to the shop and buy some groceries. To do this in a declarative fashion, the parent simply has to write down a shopping list. The parent has "programmed" the child to perform the task in the knowledge that the child has underlying search routines which will enable him or her to get to the shop, find and buy the groceries, and come home. To instruct the child in a procedural fashion, the parent would have to tell the child to go out of the front door, turn left, walk down the street, stop after 70 steps, and so on.
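The same idea can be seen in a very small Prolog sketch (Prolog is the declarative language introduced below). The predicate name split/3 is made up for the example; append/3 is the standard list predicate. We only declare what a split of a list is, and leave it to the language's search to find the splits.

```prolog
:- use_module(library(lists)).   % append/3 (already available in many systems)

% Declarative reading: Front and Back are a split of List
% if appending them gives List. Nothing says *how* to find them.
split(List, Front, Back) :- append(Front, Back, List).
```

The query ?- split([a,b], F, B). returns F = [] and B = [a,b], then F = [a] and B = [b], then F = [a,b] and B = []. A procedural version would have to spell out the loop over split positions.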
We see that declarative programming languages can have some advantages over procedural ones. In fact, it is often said that a Java program written to do the same as a Prolog program usually takes about 10 times the number of lines of code. Many AI researchers try out an idea in Prolog before implementing it more fully in other languages, because Prolog can be used to perform searches easily (see later).
A well-known declarative language which is used a lot by AI researchers is Prolog, which is based on first-order logic. For any declarative programming language, the two most important aspects are: how information is represented, and the underlying search routines upon which the language is based. Robert Kowalski put this in a most succinct way:
Algorithm = Logic + Control.
Robert Kowalski is an emeritus Professor in the Department of Computer Science, here at Imperial College. He is one of the world's leading researchers in logic and in particular logic programming. Prof. Kowalski has applied logic to various real world applications such as legal reasoning, and he is currently writing a book about Computational Logic for Human Affairs.
If we impose some additional constraints on first-order logic, then we get to a representation language known as logic programs. The main restriction we impose is that all the knowledge we want to encode is represented as Horn clauses. These are implications which comprise a body and a head, where the predicates in the body are conjoined and they imply the single predicate in the head. Horn clauses are universally quantified over all the variables appearing in them. So, an example Horn clause looks like this:
∀x, y, z (b1(x,y) ∧ b2(x) ∧ ... ∧ bn(x,y,z) → h(x,y))
We see that the body consists of predicates bi and the head is h(x,y). We can make this look a lot more like the Prolog programs you are used to writing by making a few syntactic changes: first, we turn the implication around and write it as :- thus:
∀x, y, z (h(x,y) :- b1(x,y) ∧ b2(x) ∧ ... ∧ bn(x,y,z))
Next, we change the ∧ symbols to commas:
∀x, y, z (h(x,y) :- b1(x,y), b2(x), ..., bn(x,y,z))
Finally, we remove the universal quantification (it is assumed in Prolog), make the variables capital letters (Prolog requires this), and put a full stop at the end:
h(X,Y) :- b1(X,Y), b2(X), ..., bn(X,Y,Z).
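As a concrete instance, the Red Lion sentence from earlier, "for all X, if X is a meal and X is served in the Red Lion, then X costs three pounds", is already in Horn clause form. The sketch below makes two assumptions: the function cost_of is written as a three-place predicate, and two made-up facts (clauses with no body) are added so that there is something to query.

```prolog
% Hypothetical facts: clauses with a head and no body.
meal(omelette).
serves(red_lion, omelette).

% Head: cost_of(red_lion, X, three_pounds).
% Body: the two conjoined predicates meal(X) and serves(red_lion, X).
cost_of(red_lion, X, three_pounds) :-
    meal(X),
    serves(red_lion, X).
```

The query ?- cost_of(red_lion, omelette, P). gives P = three_pounds.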
Note that we use the notation h/2 to indicate that predicate h has arity 2. Also, we call a set of Horn clauses like these a logic program. Representing knowledge with logic programs is less expressive than full first order logic, but it can still express lots of types of information. In particular, disjunction can be achieved by having different Horn clauses with the same head. So, this sentence in first-order logic:
∀x ((a(x) ∨ b(x)) → (c(x) ∧ d(x)))
can be written as the following logic program:
c(X) :- a(X).
c(X) :- b(X).
d(X) :- a(X).
d(X) :- b(X).
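To check that the four clauses behave like the disjunction, we can add two ground facts and query the program. The facts a(1) and b(2) are made up for the example, and the variables are capitalised as Prolog requires.

```prolog
% Hypothetical facts: a holds of 1, b holds of 2.
a(1).
b(2).

% Four Horn clauses: c and d each follow from a, and each follow from b.
c(X) :- a(X).
c(X) :- b(X).
d(X) :- a(X).
d(X) :- b(X).
```

Both ?- c(X). and ?- d(X). return X = 1 and then X = 2: each of c and d holds whenever a or b does.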
We also allow ourselves to represent facts as atomic ground predicates. So, for instance, we can state that:
parent(georgesenior, georgedubya).
colour(red).
and so on.
We can use this simple Prolog program to describe how Prolog searches:
president(X) :- first_name(X, georgedubya), second_name(X, bush).
prime_minister(X) :- first_name(X, maggie), second_name(X, thatcher).
prime_minister(X) :- first_name(X, tony), second_name(X, blair).
first_name(tonyblair, tony).
first_name(georgebush, georgedubya).
second_name(tonyblair, blair).
second_name(georgebush, bush).
If we loaded this into a Prolog implementation such as Sicstus, and queried the database:
?- prime_minister(P).
then Sicstus would search in the following manner: it would run through its database until it came across a Horn clause (or fact) for which the head was prime_minister and the arity of the predicate was 1. It would first look at the president clause, and reject this, because the name of the head doesn't match with the head in the query. However, next it would find that the clause:
prime_minister(X) :- first_name(X, maggie), second_name(X, thatcher).
fits the bill. It would then look at the predicates in the body of the clause and see if it could satisfy them. In this case, it would try to find a match for first_name(X, maggie). However, it would fail, because no such information can be found in the database. That means that the whole clause fails, and Sicstus would backtrack, i.e., it would go back to looking for a clause with the same head as the query. It would, of course, next find this clause:
prime_minister(X) :- first_name(X, tony), second_name(X, blair).
Then it would look at the body again, and try to find a match for first_name(X, tony). It would look through the database and find X=tonyblair a good assignment, because the fact first_name(tonyblair, tony) is found towards the end of the database. Likewise, having assigned X=tonyblair, it would then look for a match to: second_name(tonyblair, blair), and would succeed. Hence, the answer tonyblair would make the query succeed, and this would be reported back to us.
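The effect of clause order can be made visible by giving the first prime_minister clause something to match. In the sketch below, the two facts about maggiethatcher are hypothetical additions to the program; everything else is as in the program being discussed.

```prolog
president(X) :- first_name(X, georgedubya), second_name(X, bush).
prime_minister(X) :- first_name(X, maggie), second_name(X, thatcher).
prime_minister(X) :- first_name(X, tony), second_name(X, blair).

first_name(tonyblair, tony).
first_name(georgebush, georgedubya).
first_name(maggiethatcher, maggie).       % hypothetical extra fact

second_name(tonyblair, blair).
second_name(georgebush, bush).
second_name(maggiethatcher, thatcher).    % hypothetical extra fact
```

Now ?- prime_minister(P). answers P = maggiethatcher first, because that clause comes first in the database, and only gives P = tonyblair if we ask for another answer and force backtracking.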
The important thing to remember is that Prolog implementations search from the top to the bottom of the database, and try each term in the body of a clause in the order in which they appear. We say that Sicstus has proved the query prime_minister(P) by finding something which satisfied the declaration of what a prime minister is: Tony Blair. It is also worth remembering that Sicstus assumes negation as failure. This means that if it cannot prove a predicate, then the predicate is false. Hence the query:
?- \+ president(tonyblair).
returns an answer of 'true', because Sicstus cannot prove that Tony Blair is a president.
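A short sketch shows what negation as failure does and does not tell us. The helper not_president/1 and the constant someone_unknown are made up for the example; the facts are those of the program above.

```prolog
president(X) :- first_name(X, georgedubya), second_name(X, bush).

first_name(tonyblair, tony).
first_name(georgebush, georgedubya).

second_name(tonyblair, blair).
second_name(georgebush, bush).

% Succeeds if Person cannot be proved to be a president.
not_president(Person) :- \+ president(Person).
```

Here ?- not_president(tonyblair). succeeds and ?- not_president(georgebush). fails, as expected. Note that ?- not_president(someone_unknown). also succeeds, simply because the database says nothing about that constant. Note too that ?- not_president(X). with X unbound fails rather than listing non-presidents, because president(X) can be proved for some X.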
Note that, as part of its search, Prolog also makes inferences using the generalised Modus-Ponens rule of inference and unification of clauses. We will look in detail at these processes in the next lecture.
One of the most popular commercial implementations of Prolog is provided by the Swedish Institute of Computer Science (SICS), and is called Sicstus. A good free implementation of Prolog is SWI-Prolog.
Also, machine learning agents which represent information as logic programs often have built-in Prolog interpreters (unless they are built using Prolog themselves, of course). For instance, the Progol program, which is described later in the course, has a Prolog interpreter written in C.
To make Prolog into a usable programming language, certain implementations allow arithmetic to be carried out, whereby the search mechanism is substituted by specific code for carrying out arithmetical functions. To make us aware of this substitution, the word "is" is used. For example, if we wanted to calculate 10 + 4, we would write:
?- A is 10 + 4.
Then our Prolog implementation would know that it is not supposed to search for an answer which is 10 + 4, but rather to calculate this value using pre-compiled code.
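The difference between calculating and matching is easiest to see by comparing is with the ordinary unification operator =. The two predicate names below are made up for the comparison.

```prolog
% is/2 evaluates its right-hand side arithmetically, so A is the number 14.
sum_evaluated(A) :- A is 10 + 4.

% =/2 only unifies: no arithmetic is done, so A is the unevaluated term 10+4.
sum_unevaluated(A) :- A = 10 + 4.
```

So ?- sum_evaluated(A). gives A = 14, whereas ?- sum_unevaluated(A). gives A = 10+4, a structure built from the + symbol and two numbers.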
The speed of Prolog implementations is measured in Logical Inferences Per Second (LIPS). Logic programming languages used to be too slow for practical usage, until compilers were written for them, which meant that the number of LIPS approached 50,000. One way in which compilation is achieved is by translating Prolog programs into procedural programs written in a language like C, and then compiling the procedural programs. An alternative approach is to translate the Prolog program into an intermediate language, the best known of which is the Warren Abstract Machine (WAM), named after David Warren, its inventor. With modern compilers on modern computers, the number of LIPS is usually in the millions, which makes Prolog competitive with procedural languages such as C and Java. Moreover, Prolog can take advantage of parallelism. There are two obvious types of parallelism available, namely OR-parallelism and AND-parallelism.
With OR-parallelism, a different processor is used for each possible Horn clause with the head the same as a query. For instance, if we had this Prolog program:
prime_minister(X) :- first_name(X, tony), second_name(X, blair). prime_minister(X) :- first_name(X, maggie), second_name(X, thatcher).
and we queried: ?- prime_minister(A), then a Prolog implementation with OR-parallelism would use two different processors. The first would try to satisfy the first prime_minister/1 clause and the second would try to satisfy the second prime_minister/1 clause. If they both reported an answer, the first would be taken.
AND-parallelism is when Prolog uses a separate processor to try to satisfy each conjunct in the body of a clause under consideration. So, when looking at the clause:
prime_minister(X) :- first_name(X, tony), second_name(X, blair).
one processor would start looking for an X to satisfy first_name(X, tony), and in parallel, another processor would look for an X to satisfy second_name(X, blair). This way of parallelising Prolog is more difficult to achieve, because the solutions for each of the conjuncts have to match. This means that the processors have to communicate with each other in order to coordinate their attack on the clause, ensuring that they end up with the same solution. Perhaps the most successful attempt at AND-parallelisation of Prolog programs was the PIM project (Parallel Inference Machine), which came out of the Japanese Fifth Generation Project.
The Fifth Generation Project in Japan was a source of major funding for AI research. The 10-year project, which started in 1981, aimed to produce ubiquitous computers running Prolog in much the same way that computers normally run machine code. It had very ambitious goals, such as full-scale natural language understanding, and some people used these to criticise it. For example, J. Marshall Unger wrote a book entitled: "The Fifth Generation Fallacy - Why Japan is Betting its Future on Artificial Intelligence".
An unexpected spin-off from the Fifth Generation Project was the number of similar projects around the world which were started in response. A paper from 1988 describes "the Sputnik effect" whereby other nations - not wanting to be left behind - started projects such as the European Esprit, the British Alvey, and the American MCC projects.
Prolog is often used for rapid prototyping of programs in research. I always think of writing a Prolog program as similar to how an architect makes a cardboard model of a building before construction of the real building begins. This analogy is a little unfair, however, as there are many full-scale applications which have been put together using Prolog. In particular, Prolog has been used to design commercial expert systems in many areas including medicine, business and finance, as discussed in the next section.
Expert systems are agents which are programmed to make decisions about real world situations. They are put together by using knowledge elicitation techniques to extract information from human experts. A particularly fruitful area is in diagnosis of diseases, where expert systems are used to decide (suggest) what disease a patient has, given their symptoms.
Expert systems are one of the major success stories of AI. Russell and Norvig give a very nice example from medicine:
"A leading expert on lymph-node pathology describes a fiendishly difficult case to the expert system, and examines the system's diagnosis. He scoffs at the system's response. Only slightly worried, the creators of the system suggest he ask the computer for an explanation of the diagnosis. The machine points out the major factors influencing its decision and explains the subtle interaction of several of the symptoms in this case. The expert admits his error, eventually."
Often, the rules from the expert are encoded as if-then rules in first-order logic and the implementation of the expert system can be fairly easily achieved in a programming language such as Prolog.
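As a rough sketch of what such if-then rules look like in Prolog, consider the following. Everything in it is invented for illustration (the patient, the symptoms, the rules and the predicate names); it is not taken from any real expert system and is not medical advice.

```prolog
% Hypothetical observations about one patient.
symptom(patient1, fever).
symptom(patient1, cough).
symptom(patient1, aching_muscles).

% If-then rules as they might be elicited from an imagined expert:
% "if the patient has a fever and aching muscles, then suggest flu".
suggests(Patient, flu) :-
    symptom(Patient, fever),
    symptom(Patient, aching_muscles).
% "if the patient has a cough but no recorded fever, suggest a common cold".
suggests(Patient, common_cold) :-
    symptom(Patient, cough),
    \+ symptom(Patient, fever).
```

The query ?- suggests(patient1, D). gives only D = flu: the second rule fails because a fever has been recorded. The body of the rule that succeeded is also the raw material for the kind of explanation described in the quotation above.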
We can take our card game from the previous lecture as a case study for the implementation of a logic-based expert system. The rules were: four cards are laid on the table face up. Player 1 takes the first card, and they take it in turns until they both have two cards each. To see who has won, they each add up their two card numbers, and the winner is the one with the highest even number. The winner scores the even number they have. If there's no even number, or both players achieve the same even number, then the game is drawn.
It could be argued that undertaking a minimax search is a little unnecessary for this game, because we could easily just specify a set of rules for each player, so that they choose cards rationally. To demonstrate this, we will write down some Prolog rules which specify how player one should choose the first card.
For example, suppose the cards dealt were: 4, 5, 6, 10. In this case, the best choice of action for player one is to choose the 10, followed presumably by the 4, because player two will pick the 6. We need to abstract from this particular example to the general case: we see that there were three even numbers and one odd one, so player one is guaranteed another even number to match the one they chose. This is also true if there are four even numbers. Hence we have our first rule:
If there are three or four even numbered cards, then player one should choose the highest even numbered card.
When there are three or four odd cards it's not difficult to see that the most rational action for player one is to choose the highest odd numbered card:
If there are three or four odd numbered cards, then player one should choose the highest odd numbered card.
The only other situation is when there are two even and two odd cards. In this case, I'll leave it as an exercise to convince yourselves that there are no rules governing the choice of player one's first card: they can simply choose randomly, because they're not going to win unless player two makes a mistake.
To write an expert system to decide which card to choose in a game, we will need to translate our rules into first-order logic, and then into a Prolog implementation. Our first rule states that, in a game, g:
The meaning of the predicates is as obvious as it seems. Similarly, our second rule can be written as:
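Both rules can be sketched in first-order logic as follows. The predicate names here are assumptions, chosen to mirror the Prolog predicates used in the implementation below; g stands for the game (the cards on the table) and c for the card chosen.

```latex
\begin{align*}
&\forall g, c\; \big( (\mathit{number\_of\_evens}(g, 3) \lor \mathit{number\_of\_evens}(g, 4))
  \land \mathit{biggest\_even}(g, c) \rightarrow \mathit{player\_one\_chooses}(g, c) \big) \\
&\forall g, c\; \big( (\mathit{number\_of\_odds}(g, 3) \lor \mathit{number\_of\_odds}(g, 4))
  \land \mathit{biggest\_odd}(g, c) \rightarrow \mathit{player\_one\_chooses}(g, c) \big)
\end{align*}
```

Each rule has a disjunction in its body, so, as described earlier for logic programs, each one becomes two Horn clauses with the same head when written in Prolog.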
There are many different ways to encode these rules as a Prolog program. Different implementations will differ in their execution time, but for our simple program, it doesn't really matter which predicates we choose to implement. We will make our top level predicate: player_one_chooses/2. This predicate will take a list of card numbers as the first argument, and it will choose a member of this list to put as the second argument. In this way, the same predicate can be used in order to make second choices.
Using our above logical representation, we can start by defining:
% Three or four even cards: choose the biggest even card.
player_one_chooses(CardList, CardToChoose) :-
    length(CardList, 4),
    number_of_evens(CardList, 3),
    biggest_even_in_list(CardList, CardToChoose).
player_one_chooses(CardList, CardToChoose) :-
    length(CardList, 4),
    number_of_evens(CardList, 4),
    biggest_even_in_list(CardList, CardToChoose).
% Three or four odd cards: choose the biggest odd card.
player_one_chooses(CardList, CardToChoose) :-
    length(CardList, 4),
    number_of_odds(CardList, 3),
    biggest_odd_in_list(CardList, CardToChoose).
player_one_chooses(CardList, CardToChoose) :-
    length(CardList, 4),
    number_of_odds(CardList, 4),
    biggest_odd_in_list(CardList, CardToChoose).
% Otherwise: simply take the first card in the list.
player_one_chooses([CardToChoose|_], CardToChoose).
We see that there are four rule-based choices, depending on the number of odds and evens in the CardList, followed by a final clause which simply takes the first card when none of the rules applies. Note that Prolog will also offer this final clause as an alternative answer if it is asked to backtrack. To make these predicates work, we need to fill in the details of the other predicates. Assuming that we have some basic list predicates: length/2 which calculates the size of a list, sort/2 which sorts a list, and last/2 which returns the last element in a list, then we can write down the required predicates:
% A number is even (odd) if its remainder on division by 2 is 0 (1).
iseven(A) :- 0 is A mod 2.
isodd(A) :- 1 is A mod 2.

% Collect the even (odd) cards of CardList, in their original order.
even_cards_in_list(CardList, EvenCards) :-
    findall(EvenCard, (member(EvenCard, CardList), iseven(EvenCard)), EvenCards).
odd_cards_in_list(CardList, OddCards) :-
    findall(OddCard, (member(OddCard, CardList), isodd(OddCard)), OddCards).

number_of_evens(CardList, NumberOfEvens) :-
    even_cards_in_list(CardList, EvenCards),
    length(EvenCards, NumberOfEvens).
number_of_odds(CardList, NumberOfOdds) :-
    odd_cards_in_list(CardList, OddCards),
    length(OddCards, NumberOfOdds).

% The biggest card is the last element of the sorted list.
biggest_odd_in_list(CardList, BiggestOdd) :-
    odd_cards_in_list(CardList, OddCards),
    sort(OddCards, SortedOddCards),
    last(SortedOddCards, BiggestOdd).
biggest_even_in_list(CardList, BiggestEven) :-
    even_cards_in_list(CardList, EvenCards),
    sort(EvenCards, SortedEvenCards),
    last(SortedEvenCards, BiggestEven).
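To try the program out, the sketch below gathers it into one self-contained file, written a little more compactly. Two things in it are assumptions rather than part of the program above: the library(lists) import, which some systems need for member/2 and last/2, and the hypothetical helper first_choice/2, which uses once/1 to commit to the first answer so that the final catch-all clause is not offered as a second answer on backtracking.

```prolog
:- use_module(library(lists)).   % member/2 and last/2 (assumed location)

iseven(A) :- 0 is A mod 2.
isodd(A)  :- 1 is A mod 2.

even_cards_in_list(CardList, EvenCards) :-
    findall(C, (member(C, CardList), iseven(C)), EvenCards).
odd_cards_in_list(CardList, OddCards) :-
    findall(C, (member(C, CardList), isodd(C)), OddCards).

number_of_evens(CardList, N) :-
    even_cards_in_list(CardList, Evens), length(Evens, N).
number_of_odds(CardList, N) :-
    odd_cards_in_list(CardList, Odds), length(Odds, N).

biggest_even_in_list(CardList, Biggest) :-
    even_cards_in_list(CardList, Evens), sort(Evens, Sorted),
    last(Sorted, Biggest).
biggest_odd_in_list(CardList, Biggest) :-
    odd_cards_in_list(CardList, Odds), sort(Odds, Sorted),
    last(Sorted, Biggest).

player_one_chooses(CardList, Card) :-
    length(CardList, 4), number_of_evens(CardList, 3),
    biggest_even_in_list(CardList, Card).
player_one_chooses(CardList, Card) :-
    length(CardList, 4), number_of_evens(CardList, 4),
    biggest_even_in_list(CardList, Card).
player_one_chooses(CardList, Card) :-
    length(CardList, 4), number_of_odds(CardList, 3),
    biggest_odd_in_list(CardList, Card).
player_one_chooses(CardList, Card) :-
    length(CardList, 4), number_of_odds(CardList, 4),
    biggest_odd_in_list(CardList, Card).
player_one_chooses([Card|_], Card).

% Hypothetical helper: commit to the first answer found.
first_choice(CardList, Card) :- once(player_one_chooses(CardList, Card)).
```

For the deal discussed earlier, ?- first_choice([4,5,6,10], C). gives C = 10. With three odd cards, ?- first_choice([3,9,7,2], C). gives C = 9, and with two of each, ?- first_choice([2,3,4,5], C). falls through to the last clause and gives C = 2, the first card in the list.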
It's left as an exercise to write down the rules for player one's next choice, and player two's choices.