Abstract
This paper describes a Bayesian learning approach to protein secondary structure prediction. Four secondary structure types are considered: α-helix, β-strand, β-turn, and coil. Protein sequences are represented using a six-letter exchange group, and training cases are expressed as sequence quaternions. The approach is implemented in a Java tool called Predictor. To evaluate the tool, we selected 623 known proteins with no pairwise sequence homology from the Protein Data Bank, following the one-protein-per-family principle with respect to the SCOP structure families. Several training/test data splits were tried. The results show that the proposed approach achieves prediction accuracy comparable to that of traditional prediction methods. Predictor provides an easy-to-use GUI and is of practical value to molecular biology researchers.