Learning rules for comparing drug activities

Introduction to the drug structure-activity problem

Most pharmaceutical R&D is based on finding slightly improved variants of patented active drugs (292 out of 348 US drugs introduced between 1981 and 1988 were of this kind). Doing this requires an understanding of the relationships between chemical structure and activity. In most cases, these relationships cannot be derived solely from physical theory, so experimental evidence is essential. Such empirically derived relationships are called Structure Activity Relationships (SARs). In a typical SAR problem, a set of chemicals of known structure and activity is given, and the problem is to construct a predictive theory relating the structure of a compound to its activity. This relationship can then be used to select structures with high or low activity. Typically, knowledge of such relationships forms the basis for devising clinically effective, non-toxic drugs.

At present, much of the work of identifying these relationships is done manually: the designer displays a small number of molecules using 3-D graphics and tries to fit "equivalent" atoms or groups, that is, those with matching chemical properties. The main automatic method is statistical correlation (using linear regression) of biological activity with bulk chemical properties such as acidity.
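To make the statistical approach concrete, here is a minimal sketch of a least-squares fit of activity against a single bulk property. It is an illustration only: the function names `fit_line` and `predict` are hypothetical, and the acidity and activity values are invented placeholders rather than measurements from any real compound series.

```python
# Illustrative only: all numbers below are invented placeholders,
# not data from any real series of compounds.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predicted activity for a compound whose bulk property has value x."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical series: one acidity-like value per compound,
# paired with a made-up activity score.
acidity = [4.0, 4.5, 5.0, 5.5, 6.0]
activity = [1.0, 1.5, 2.0, 2.5, 3.0]

model = fit_line(acidity, activity)
print(predict(model, 5.25))
```

A fit of this kind summarises each molecule by a single number, so it carries no information about which atoms or groups are responsible for the activity.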

Both methods are inadequate. Manual matching can only be used on a few molecules at a time, because there is too much 3-D information for the designer to handle: molecular shapes are horrendously convoluted, and it is hard to comprehend 3-D plots.

Statistical correlation only works within a series of closely related or "homologous" molecules, where for example each member differs from the last in the length of a hydrocarbon chain or the basicity of an amine group. Better ways to automatically discover the chemical properties which affect the activity of drugs could greatly reduce pharmaceutical R&D costs. At present, the average cost of developing a new drug is $230 million, and the average development time is 10 years.

Our research, which includes collaboration with the Imperial Cancer Research Fund, has shown that Inductive Logic Programming (ILP) can construct rules which predict the activity of untried drugs, given examples of drugs whose medicinal activity is already known. We found these rules to be more accurate than statistical correlations. More importantly, because the examples are expressed in logic, it is possible to describe arbitrary properties of, and relations between, atoms and groups. This means that the examples need not be restricted to one homologous series. Finally, the logical nature of the rules also makes them easy to understand and can provide key insights, allowing considerable reductions in the numbers of compounds that need to be tested.
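As a rough illustration of what a rule over relational descriptions might look like, the sketch below encodes a few drugs as facts about substituents at numbered positions and tests one hand-written comparison rule against known activity orderings. Everything in it is an assumption made for illustration: the compounds `d1` to `d3`, the polarity scale, the rule `more_active` and the helper `covered` are all hypothetical, and the rule is not one learned in this work. It is written in Python rather than a logic programming language purely for convenience.

```python
# Illustrative only: compounds, property values and the rule are invented.

# Facts: for each drug, the substituent found at each ring position.
STRUCTURE = {
    "d1": {3: "cl", 4: "och3"},
    "d2": {3: "h", 4: "och3"},
    "d3": {3: "h", 4: "h"},
}

# Background knowledge: a made-up polarity score for each substituent.
POLARITY = {"h": 0, "cl": 3, "och3": 2}

def more_active(a, b):
    """Hypothetical rule, in logic-style notation:
    great(A, B) :- struc(A, 3, Sa), struc(B, 3, Sb),
                   polar(Sa, Pa), polar(Sb, Pb), Pa > Pb.
    """
    sub_a = STRUCTURE[a][3]
    sub_b = STRUCTURE[b][3]
    return POLARITY[sub_a] > POLARITY[sub_b]

def covered(rule, pairs):
    """Count the known orderings (x more active than y) the rule reproduces."""
    return sum(1 for x, y in pairs if rule(x, y))

# Made-up examples: the first drug of each pair is the more active one.
known_pairs = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]

print(covered(more_active, known_pairs), "of", len(known_pairs))
```

Because the rule refers to named positions and group properties rather than to one bulk measurement, it can be read directly as a statement about structure, and it applies to any compound that can be described with the same facts.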

For further information, please see the pages on our work with drugs against Alzheimer's disease, drugs for inhibition of E. coli Dihydrofolate Reductase, and suramin analogues.
