Lecture 9 
X = Y,  A[T],  Unify(X, T) = θ

Subst(θ, A[Y])

That is, given an equation X = Y and a clause A containing a subterm T which unifies with X via θ, we can rewrite A, replacing T by Y and applying the substitution θ.
We can also allow demodulation to work the other way, i.e. replace Y with X, but as with rewriting we need to take care not to get stuck in any loops. Our presidential example would be a trivial case for the demodulation rule:
george_bush = g_bush, is_president(george_bush) 

is_president(g_bush) 
In this case, the unifying substitution is just the trivial identity substitution.
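The demodulation step can be sketched in a few lines of code. The following is a toy Python illustration (not how real provers represent terms): variables are uppercase strings, compound terms are tuples, and everything else is a constant.

```python
def is_var(t):
    """Toy convention: uppercase strings are variables."""
    return isinstance(t, str) and t[:1].isupper()

def unify(x, y, theta):
    """Return a substitution unifying x and y (extending theta), or None."""
    if theta is None or x == y:
        return theta
    if is_var(x):
        return unify_var(x, y, theta)
    if is_var(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            theta = unify(xi, yi, theta)
        return theta
    return None

def unify_var(v, t, theta):
    if v in theta:
        return unify(theta[v], t, theta)
    return {**theta, v: t}

def subst(theta, t):
    """Apply substitution theta to term t."""
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def demodulate(lhs, rhs, term):
    """Rewrite every outermost subterm of `term` that unifies with lhs."""
    theta = unify(lhs, term, {})
    if theta is not None:
        return subst(theta, rhs)
    if isinstance(term, tuple):
        return tuple(demodulate(lhs, rhs, a) for a in term)
    return term

# the presidential example: the unifier found is just the identity
print(demodulate('george_bush', 'g_bush', ('is_president', 'george_bush')))
# -> ('is_president', 'g_bush')
```

A real prover would orient the equation and use ordering constraints to avoid rewriting loops; this sketch simply rewrites left to right.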
Most state-of-the-art theorem provers have resolution at their heart, but use a number of other inference rules, including demodulation and its big brother, paramodulation, which deals with the case where we only know a disjunction such as x = y ∨ P(x).
It is important to remember that, for theorems of any interest, the initial knowledge base will be quite sizeable. Moreover, each resolution step adds another sentence to the knowledge base. Hence, a strategy such as breadth first search, which takes every pair of sentences in turn and tries to resolve them will often get nowhere near to proving even fairly trivial theorems.
Automated Reasoning is an AI task that can be represented as a search problem. Therefore, it is not surprising that people began to think about ways to improve performance using rules of thumb (heuristics). This was mainly motivated by the fact that early implementations failed to perform as expected: there were lots of simple-looking theorems that they could not prove. We look at four heuristics for choosing what to resolve at each stage, and what to leave out of resolution steps entirely.
The Unit Preference strategy is to be greedy when it comes to clauses consisting of just a single literal, known as unit clauses. That is, whenever possible, this strategy chooses steps which involve the resolution of a unit clause with something else. The idea is that we are aiming for a clause with no literals, so if we can keep the length of clauses down to a minimum, then perhaps this is a good strategy. Of course, continually resolving with unit clauses will reduce clause sizes. Unit preference doesn't actually reduce the branching rate enough to make an impact on medium-sized problems. However, the use of this strategy produced a speed-up in resolution provers for simpler problems, enabling implementations to prove many theorems which were previously out of their range.
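In implementation terms, unit preference amounts to nothing more than an ordering on the candidate resolution pairs. A minimal sketch, assuming a toy propositional representation in which clauses are frozensets of literals and '-' marks negation:

```python
def unit_preference_order(clauses):
    """Order candidate resolution pairs so that pairs involving a unit
    clause (a single literal) are tried first."""
    pairs = [(a, b) for i, a in enumerate(clauses) for b in clauses[i + 1:]]
    return sorted(pairs, key=lambda pair: min(len(pair[0]), len(pair[1])))

clauses = [frozenset({'-p', 'q', 'r'}), frozenset({'p'}), frozenset({'-q', 'r'})]
first_pair = unit_preference_order(clauses)[0]
assert any(len(c) == 1 for c in first_pair)  # a unit clause is resolved first
```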
The Set of Support (SOS) strategy restricts the resolutions which are allowed to occur to just a subset of those possible. Specifically, each resolution must involve a clause from a subset called the set of support. The idea is that the clauses excluded from the set of support are consistent among themselves, so any path to the solution must involve a resolution with one of the clauses in the SOS. This has an analogy with a heuristic we teach children to help them with problem solving: if you haven't used everything mentioned in the question, then you're probably not going in the right direction. In our situation, given that the axioms are consistent, we know that the path to the solution will involve at least one resolution of the negated theorem statement with something else. Of course, the result of that resolution might also lead to the solution, and so on. This indicates the heuristic most used in set of support strategies: start with just the negated theorem in the SOS and keep adding those clauses which are generated from the resolution of anything with an SOS member.
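The SOS strategy can be sketched as a simple given-clause loop. This is a propositional illustration only (clauses as frozensets of literals, '-' marking negation); real provers work with first-order clauses and far better data structures:

```python
def resolve(c1, c2):
    """All propositional resolvents of two clauses."""
    resolvents = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('-') else '-' + lit
        if comp in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return resolvents

def sos_refutation(axioms, negated_theorem):
    """Refutation search in which every resolution involves an SOS clause;
    resolvents are added back into the SOS."""
    usable = [frozenset(c) for c in axioms]
    sos = [frozenset(c) for c in negated_theorem]
    while sos:
        given = sos.pop(0)          # next set-of-support clause
        usable.append(given)
        for other in usable:
            for r in resolve(given, other):
                if not r:
                    return True     # empty clause derived: theorem proved
                if r not in usable and r not in sos:
                    sos.append(r)
    return False                    # SOS exhausted without a refutation

# axioms: p and p -> q (as clauses); negated theorem: -q
print(sos_refutation([{'p'}, {'-p', 'q'}], [{'-q'}]))  # -> True
```

Note how the loop never resolves two axioms with each other, which is exactly the restriction the strategy imposes.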
A special case of the SOS strategy is the Input Resolution strategy. This restricts the SOS to include only the clauses that were in the knowledge base before the search started, namely the axioms or negated theorems. Hence each resolution step involves at least one of the clauses provided as input. This will clearly reduce the size of the search space. This strategy is not complete for first order sentences in general, but is complete for Horn clause databases (such as Prolog Programs).
A final heuristic is called Subsumption and this checks when a newly formed clause is subsumed by one existing in the database already. One clause, C, is subsumed by another, D, if C is more specific than D. As the more specific clauses can be implied by the more general ones, then the more specific ones can be discarded without loss of generality, i.e., it will still be possible to prove the theorem. Of course, the reduction in the size of the knowledge base will produce a smaller search space.
In practice, a naive way to perform subsumption is to check whether one clause, C1, can be unified with (a subset of) the literals in another clause, C2, in such a way that the substitution does not affect C1 other than renaming its variables, i.e., no extra function or constant symbols are substituted into C1, the more specific clause. For example, the clause p(george) ∨ q(X) unifies with a subset of the literals in the clause p(A) ∨ q(B) ∨ r(C). The substitution is {A/george, X/B}, which instantiates A in the second (more general) clause, but only renames X to B, thus leaving the first (more specific) clause much the same as before.
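The same idea is often formalised as θ-subsumption: a clause C subsumes a clause D if some substitution maps every literal of C onto a literal of D (so the more general clause subsumes the more specific one, which can then be discarded). A sketch in a toy representation where literals are tuples and uppercase strings are variables:

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, target, theta):
    """One-way matching: only variables in `pattern` may be instantiated."""
    if theta is None:
        return None
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == target else None
        return {**theta, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple) \
            and len(pattern) == len(target):
        for p, t in zip(pattern, target):
            theta = match(p, t, theta)
        return theta
    return theta if pattern == target else None

def subsumes(c, d, theta=None):
    """True if some substitution maps every literal of clause c onto a
    literal of clause d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    for lit in d:
        t = match(c[0], lit, dict(theta))
        if t is not None and subsumes(c[1:], d, t):
            return True
    return False

# p(X) subsumes p(a) ∨ q(b), but the more specific p(a) does not subsume p(X)
print(subsumes([('p', 'X')], [('p', 'a'), ('q', 'b')]))  # -> True
print(subsumes([('p', 'a')], [('p', 'X')]))              # -> False
```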
There are less naive ways of performing subsumption checks which are much quicker, but subsumption checking of every clause against a new clause can be computationally expensive. Therefore, the reduction in search space will have to outweigh this expense.
The Logic Theorist

One of the first applications of automated theorem proving was the use of Newell, Shaw and Simon's Logic Theory Machine to prove theorems from Whitehead and Russell's Principia Mathematica. The program proved 38 of the 52 theorems they presented to it, and actually found a more elegant proof to theorem 2.85 than the one provided by Whitehead and Russell. On hearing of this, Russell wrote to Simon in November 1956: 'I am delighted to know that Principia Mathematica can now be done by machinery ... I am quite willing to believe that everything in deductive logic can be done by machinery.' Newell, Shaw and Simon submitted an article about theorem 2.85, co-authored by the Logic Theory Machine, to the Journal of Symbolic Logic. However, it was refused publication because it was co-authored by a program. An informative page on Automated Deduction, including the Logic Theorist, is here.
In the 1960s and 70s, when it was realised that resolution was a complete procedure and implementations started to proliferate, some people speculated that soon enough every mathematician would have an automated theorem prover on their desk, which they could turn to in times of need to prove their theorems. This has not turned out to be the case, and should be taken as an illustrative example of why we shouldn't make such overly optimistic claims about AI. It also points out just how much more difficult the problem of automating mathematical theorem proving has been in comparison to, for example, playing chess.
It's perhaps unfair to do so, but we can compare the impact on mathematics of automated reasoning with that of its sister subject, computer algebra systems (CAS), which perform symbolic manipulations and giant calculations. CAS itself grew out of AI research undertaken by mathematicians and computer scientists and has had a major impact on mathematics. Most undergraduate maths courses now offer modules in standard CAS applications such as Maple and Mathematica, and much research mathematics is facilitated at least in part by some CAS or another. In fact, Andrew Wiles, who in 1995 proved Fermat's Last Theorem, purportedly only using a computer to write the proof up, is very much the exception to the rule, and a member of a dying breed.
Back to the comparison: there is no comparison. Automated reasoning programs have barely scratched the surface of mathematics education or research. Partly due to this lack of success, and partly due to the success in other, well-funded areas such as hardware and software verification, active interest in automating mathematics has largely dwindled. Note that hardware and software verification are large areas of research which are beyond the scope of this lecture course.
There are, however, still a few of us who continue to pursue the dream of having an intelligent, creative AI agent help us make discoveries in mathematics, and below are a couple of the more interesting projects which have actually added to the mathematical literature, both of which use resolution theorem proving.
Automated theorem proving in general attempts to find proofs to theorems which are usually assumed to be true. This is, of course, not how mathematics proceeds in general. Only in rare cases is a theorem written down and then a concerted effort is made to prove it. Famous examples include Fermat's Last Theorem and Goldbach's conjecture (that every even number is the sum of two primes). In most cases, theorems are derived and are found at the end of lengthy reasoning processes including calculation, induction of patterns from observations and deduction of facts following from other known facts. Indeed, by the time most theorems are properly written down for the first time, they have been proved already. Having said that, there are cases where the "know it is true and then find the proof" approach can be useful, and other cases where an exhaustive search over a large set of theorems to find some with certain properties can lead to interesting discoveries.
In particular, Bill McCune and Larry Wos at the Argonne National Laboratories have been building and applying state-of-the-art resolution theorem provers to mathematical discovery for many years. Their biggest accomplishment was with a prover called EQP, which solved an open conjecture known as the Robbins Algebra Conjecture. Herbert Robbins conjectured that commutative, associative algebras with the extra condition that n(n(x + y) + n(x + n(y))) = x for some function n are Boolean algebras. The attempts to prove or disprove this conjecture outwitted mathematicians for 60 years. After many years of development of the problem (with the development also using automated deduction), it was finally in a form whereby they could run EQP to attempt to find a proof using resolution. EQP ran for around 8 days and used about 30 Mb of memory to find the proof in 1996. This remains the biggest mathematical achievement of any automated theorem prover.
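The easy direction of the conjecture can at least be sanity-checked mechanically: in a Boolean algebra, with + as disjunction and n as complement, the Robbins equation does hold. A quick check over the two-element Boolean algebra (an illustration only; EQP's achievement was proving the converse, for all algebras):

```python
# + is Boolean 'or', n is complement, over the two-element Boolean algebra
n = lambda x: 1 - x
plus = lambda x, y: x | y

for x in (0, 1):
    for y in (0, 1):
        # the Robbins equation: n(n(x + y) + n(x + n(y))) = x
        assert n(plus(n(plus(x, y)), n(plus(x, n(y))))) == x

print("Robbins equation holds in the 2-element Boolean algebra")
```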
Although the Robbins result has been the most high-profile, the Argonne team have worked on many projects which have added to mathematics. In particular, Bill McCune has worked with a mathematician called Padmanabhan. Padmanabhan sends McCune conjectures, and McCune uses his latest resolution prover, called Otter, to attempt to prove them, sending back any proofs he finds. This has been a very fruitful partnership, leading to enough results to fill a book about cubic curves.
Another particularly interesting project has been to determine the most succinct ways to axiomatise certain algebras, such as groups. Group theory is a domain of mathematics which involves symmetries. Normally a group is described as a set of objects with an operation, *, which takes a pair of elements of the set and returns a third element. This operation is subject to a set of rules, normally expressed as three axioms: inverse, identity and associativity. By performing a search over smaller representations for the axioms, and trying to prove using Otter that each was equivalent to the group theory axioms, they found new, very succinct representations for the axioms of group theory and a host of other algebras.
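The three axioms are easy to state computationally. The following sketch checks them on a finite multiplication table with elements 0..n-1 (hypothetical helper names; this is only the finite, model-checking side of what the Argonne provers do symbolically):

```python
from itertools import product

def is_group(table, e, n):
    """Check associativity, identity element e, and inverses on a finite
    multiplication table over elements 0..n-1."""
    op = lambda a, b: table[a][b]
    assoc = all(op(op(a, b), c) == op(a, op(b, c))
                for a, b, c in product(range(n), repeat=3))
    ident = all(op(e, a) == a and op(a, e) == a for a in range(n))
    inv = all(any(op(a, b) == e and op(b, a) == e for b in range(n))
              for a in range(n))
    return assoc and ident and inv

# the cyclic group Z_3 under addition mod 3 satisfies all three axioms
z3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]
print(is_group(z3, 0, 3))  # -> True
```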
A web page describing the work of the Argonne automated reasoning laboratory is here.
One of the course lecturers (Simon) has for a long time been involved in the development of a system called HR which performs automated theory formation in scientific domains. A theory consists of a set of concepts with examples and definitions, a set of conjectures about those concepts, and a set of proofs of the conjectures. HR is really a machine learning program, but it has had special application to the discovery of concepts and conjectures in domains of mathematics such as number theory, group theory and graph theory. More to the point of this lecture, HR uses Otter to do its theorem proving.
HR is often good at identifying non-obvious conjectures which we may not have thought of on our own. For instance, we looked at the domain of anti-associative algebras, mainly because we could not find any research done in this area. These are algebras over one operator with just one axiom: that no triple of elements is associative, that is, for all x, y and z, (x * y) * z is not equal to x * (y * z). We gave HR only this axiom to work with, and it used the MACE model generator (a sister program to Otter, which generates examples as counterexamples to false theorems) to generate examples of anti-associative algebras. HR conjectured and Otter proved many interesting results that we probably would never have stumbled across, including: that there must be two different elements on the diagonal of the multiplication table, that anti-associative algebras cannot be quasigroups, and that they cannot have an identity element. Of course, because these results are proved by Otter, and because Otter is a very highly respected theorem prover (i.e., if it says something is proved, then it is proved), we know that they are true in the general case.
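The model generation step is easy to mimic for small sizes. The following brute-force sketch (a crude stand-in for what MACE does far more cleverly) enumerates all binary operation tables on two elements, keeps the anti-associative ones, and confirms the diagonal conjecture against every model found:

```python
from itertools import product

def anti_associative(t, n):
    """True if (x*y)*z != x*(y*z) for every triple of elements 0..n-1."""
    return all(t[t[x][y]][z] != t[x][t[y][z]]
               for x in range(n) for y in range(n) for z in range(n))

n = 2
models = []
for cells in product(range(n), repeat=n * n):
    table = [list(cells[i * n:(i + 1) * n]) for i in range(n)]
    if anti_associative(table, n):
        models.append(table)

assert models                                   # such algebras do exist
assert all(t[0][0] != t[1][1] for t in models)  # the diagonal conjecture
print(len(models), "anti-associative tables of size 2")
```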
Perhaps the most interesting results have come in number theory. In this domain, we use Otter in a slightly different way, because resolution provers are not as effective in numerical domains as they are in algebraic ones such as group theory. Any conjecture that Otter can prove is unlikely to be at all difficult for a human to prove, so we get HR to discard such conjectures as uninteresting. Some nice results (try to prove them if you can) from HR in number theory include:
None of these theorems were known to the user when they were reported, and each one takes around 10 lines, and a fair bit of number-theoretic knowledge, to prove. In addition, HR has been responsible for around 30 new number types being added to the Encyclopedia of Integer Sequences, the only non-human to have done so. A nice example is the refactorable numbers (where the number of divisors is itself a divisor). HR also pointed out some nice results about these, for example that odd refactorable numbers must be squares.
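Refactorable numbers are easy to compute, and the observation about odd ones can be checked empirically (a naive sketch; the general proof is the part that takes number-theoretic work):

```python
import math

def num_divisors(n):
    """Count the divisors of n (naively)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def is_refactorable(n):
    """n is refactorable if the number of its divisors divides n."""
    return n % num_divisors(n) == 0

refactorables = [n for n in range(1, 50) if is_refactorable(n)]
print(refactorables)  # -> [1, 2, 8, 9, 12, 18, 24, 36, 40]

# HR's result: every odd refactorable number is a perfect square
for n in refactorables:
    if n % 2 == 1:
        assert math.isqrt(n) ** 2 == n
```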
Below are some other areas of automated theorem proving you may come across at some stage.
Given that, as discussed above, automated theorem provers are still far from being proficient at proving, it makes sense not to go for a fully automated approach, but rather to build more interactive theorem provers. Some systems, such as the Theorema package built by Bruno Buchberger's group in Linz, act like assistants to mathematicians, proving simple results as the user explores a theory. Other systems enable the user to guide the proof of more complicated theorems: the user steps in whenever there is an important choice point such as a case split. One important topic is the representation of a proof as it unfolds, and there are various techniques for graphically displaying and annotating proof trees so that the user knows what is going on (in order to correctly guide the proof at choice points).
Higher-order theorem proving is what you would expect it to be: the automation of theorem proving in higher-order logics, as discussed in lecture 4. For an example of a higher-order theorem prover (called HOL), see here. HOL has been used for a number of verification tasks, including type safety proofs for Java and the verification of cryptographic protocols.
Here 'inductive' is meant in the sense of mathematical induction and not inductive reasoning (machine learning). This involves deduction with induction rules of inference. For example, a standard argument by mathematical induction over the natural numbers 0, 1, 2, 3, ... is expressed in the following rule:
P(0),  ∀X. (P(X) → P(X + 1))

∀X. P(X)
Mathematical induction is possible over other structures apart from numbers, and is particularly useful for deductive reasoning about recursion or iteration over such structures. As a result it has considerable application to software and hardware verification, given the ubiquity of recursion/iteration over data structures in computer systems.
Proof planning is a technique developed in, amongst others, Alan Bundy's automated reasoning group in Edinburgh. A proof plan is an outline or plan of a proof, and proof planning is a technique for guiding the search for a proof in automated theorem proving. To prove a conjecture, proof planning first constructs a proof plan and then uses it to guide the construction of the proof itself. Common patterns in proofs are identified and represented in computational form as general-purpose tactics, i.e., programs for directing the proof search process. These tactics are then formally specified with methods, using a meta-language.
Standard patterns of proof failure, and appropriate patches to the failed proof attempts, are represented as critics. To form a proof plan for a conjecture, the proof planner reasons with these methods and critics. The proof plan consists of a customised tactic for the conjecture, whose primitive actions are the general-purpose tactics. This customised tactic directs the search of a tactic-based theorem prover. Proof planning has been particularly useful for proving theorems by mathematical induction.
For more information about proof planning, see a FAQ here.
The TPTP Library

Largely fuelled by the enthusiasm of Geoff Sutcliffe and others, the annual CASC competition is often the venue for much revelry in automated reasoning, and sometimes even a little controversy. This competition attempts to determine who has the fastest, most accurate first order (usually resolution) theorem prover on the planet. It draws unseen problems from the TPTP library (Thousands of Problems for Theorem Provers), and these are used to test the provers. The competition is split into various categories, with an overall "CNF" category being perhaps the most prestigious one to win. The winner in 2008 was a theorem prover called Vampire 10, written by Andrei Voronkov and Alexandre Riazanov in Manchester. CASC is similar to the CASP competition in bioinformatics, where protein structures are withheld so that computers (and humans) can try to predict the structures from the protein sequences. Such competitions are becoming very popular in AI, although there are many people who believe they are a waste of time which encourages people to fine-tune their systems to work only in competition situations. The website for CASC is here (in particular, have a look at the T-shirts!). The website for the TPTP library is here. The website for Vampire is here.