Introduction to secondary structure prediction

Predicting the three-dimensional shape of proteins from their amino acid sequence is widely believed to be one of the hardest unsolved problems in molecular biology. It is also of considerable interest to pharmaceutical companies, since a protein's shape generally determines its function as an enzyme.

The task, as studied in [Muggleton S., King R.D., and Sternberg M.J.E. (1992)], is to learn rules that identify whether a position in a protein is in an alpha-helix. Points of relevance are:

The Golem dataset

The data files we provide are as used in the original Golem experiments, and are downloadable as one compressed TAR file. Within this file, background knowledge files have a ".b" suffix, positive example files have a ".f" suffix, and negative example files have a ".n" suffix.
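The suffix convention above can be used to sort the contents of the archive by role. The following Python sketch is illustrative only and is not part of the original distribution. The function names are hypothetical, and it assumes the archive uses a compression format that Python's tarfile module can read (gzip, bzip2 or xz); an archive compressed some other way would need to be decompressed first.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath

# Suffix convention as described in the text above.
SUFFIX_ROLES = {
    ".b": "background",  # background knowledge files
    ".f": "positive",    # positive example files
    ".n": "negative",    # negative example files
}


def role_of(filename):
    """Return the role implied by a file's suffix, or None if unrecognised."""
    return SUFFIX_ROLES.get(PurePosixPath(filename).suffix)


def group_archive_members(archive_path):
    """Group the regular files in a TAR archive by their role.

    Assumption: the archive is uncompressed or uses gzip, bzip2 or xz,
    which is what the "r:*" mode of tarfile.open can detect.
    """
    groups = defaultdict(list)
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            role = role_of(member.name)
            if role is not None:
                groups[role].append(member.name)
    return dict(groups)
```

Calling group_archive_members on the downloaded archive should return a dictionary with up to three keys ("background", "positive", "negative"), each mapping to the member names with the corresponding suffix. Files with any other suffix are ignored.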


