At present, much of this identification is done manually: the designer displays a small number of molecules using 3-D graphics and tries to fit ``equivalent'' atoms or groups which have matching chemical properties. The main automatic method is statistical correlation (using linear regression) of biological activity with bulk chemical properties such as acidity.
Both methods are inadequate. Manual matching can only be used on a few molecules at a time, because there is too much 3-D information for the designer to handle: molecular shapes are horrendously convoluted, and it is hard to comprehend 3-D plots.
Statistical correlation only works within a series of closely related or ``homologous'' molecules, where for example each member differs from the last in the length of a hydrocarbon chain or the basicity of an amine group. Better ways to automatically discover the chemical properties which affect the activity of drugs could greatly reduce pharmaceutical R&D costs: at present, the average cost of developing a new drug is $230 million, and the average development time is 10 years.
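To make the limitation of statistical correlation concrete, the following is a minimal sketch of a least-squares fit of activity against a single bulk property within one homologous series. The chain lengths and activity values are invented for illustration, and `fit_line` is a hypothetical helper, not part of any particular package.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical homologous series (invented data): each member differs
# from the last only in the length of a hydrocarbon chain.
chain_length = [1, 2, 3, 4, 5]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(chain_length, activity)

# Predict the activity of an untried member of the same series.
predicted = slope * 6 + intercept
print(f"activity ~ {slope:.2f} * chain_length + {intercept:.2f}")
print(f"predicted activity at chain length 6: {predicted:.2f}")
```

A fit of this kind has one numeric input per molecule, so it has nothing to say about a compound that lies outside the series and cannot be described by that single property.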
Our research, which includes collaboration with the Imperial Cancer Research Fund, has shown that ILP can construct rules which predict the activity of untried drugs, given examples of drugs whose medicinal activity is already known. We found these rules to be more accurate than statistical correlations. More importantly, because the examples are expressed in logic, it is possible to describe arbitrary properties of, and relations between, atoms and groups. This means that the examples need not be restricted to one homologous series. Finally, the logical nature of the rules makes them easy to understand and can provide key insights, allowing considerable reductions in the numbers of compounds that need to be tested.
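As a rough illustration of what describing relations between atoms and groups means, the sketch below represents each molecule as a set of logical facts and checks one relational rule against them. The molecules, predicate names and the rule itself are all invented for this example; the code only applies a rule that is already given, and does not show how an ILP system would induce it.

```python
# Illustrative only: molecule names, facts and the rule are invented.
# Each molecule is a set of ground facts, written as tuples.
molecules = {
    "m1": {("group", "g1", "amine"), ("group", "g2", "ring"),
           ("bonded", "g1", "g2"), ("h_donor", "g1")},
    "m2": {("group", "g1", "methyl"), ("group", "g2", "ring"),
           ("bonded", "g1", "g2")},
    "m3": {("group", "g1", "hydroxyl"), ("group", "g2", "chain"),
           ("bonded", "g1", "g2"), ("h_donor", "g1")},
}


def predicted_active(facts):
    """Apply one hypothetical rule, in Prolog-like notation:

        active(M) :- h_donor(M, G), bonded(M, G, R), group(M, R, ring).

    i.e. the molecule has a hydrogen-donor group bonded to a ring.
    """
    donors = {f[1] for f in facts if f[0] == "h_donor"}
    rings = {f[1] for f in facts if f[0] == "group" and f[2] == "ring"}
    return any(f[0] == "bonded" and f[1] in donors and f[2] in rings
               for f in facts)


predictions = {name: predicted_active(facts)
               for name, facts in molecules.items()}
print(predictions)
```

Because the rule refers to groups and the bonds between them rather than to one numeric property, it applies equally to molecules that do not belong to a common homologous series, and it can be read directly as a statement about chemical structure.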
For further information, please see the pages on our work with drugs against Alzheimer's disease, drugs for inhibition of E. coli dihydrofolate reductase, and suramin analogues.