Abstract
We address the problem of producing highly representative descriptions in automated functional annotation. For an uncharacterized sequence, a common strategy is to infer annotations from those of its well-characterized homologues. In many cases, however, this strategy fails to produce meaningful annotations. Using the information encoded in the structured vocabularies of the Gene Ontology (GO), we propose a quantitative algorithm for assigning representative annotations. We established a confidence function that reflects both the precision and the coverage of a candidate annotation, and derived the function's parameters by analyzing significant patterns in the distribution of candidate annotations on the GO graph. We tested the algorithm with BIO101 (http://BIO101.iis.sinica.edu.tw), an automated annotation system we designed to support functional annotation workflows for expressed sequence tags (ESTs). Our experimental results indicate that the algorithm can produce representative and meaningful functional annotations.
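The abstract does not give the form of the confidence function, so the following is only an illustrative sketch of the general idea of scoring a candidate GO term by both coverage and precision. It is not the authors' function or BIO101's implementation. Everything in it is an assumption: the toy graph `PARENTS` and the placeholder identifiers `GO:A` to `GO:E` (not real GO terms); coverage, taken as the fraction of the homologues' annotations that fall at or below the candidate term; precision, taken as the fraction of the candidate's subtree actually supported by those annotations; and the F-beta combination with a hypothetical weight `beta`.

```python
# Hypothetical toy GO fragment: child term -> list of is_a parents.
# Placeholder IDs, not real Gene Ontology terms.
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B"],
    "GO:E": ["GO:B"],
}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def all_terms(parents):
    """Collect every term that appears as a child or a parent."""
    terms = set(parents)
    for ps in parents.values():
        terms.update(ps)
    return terms


def descendants(term, parents):
    """Return the term itself plus every term that has it as an ancestor."""
    return {t for t in all_terms(parents) if term in ancestors(t, parents)}


def confidence(term, annotations, parents, beta=1.0):
    """Hypothetical score: F-beta combination of coverage and precision."""
    # Annotations (from homologues) that lie at or below the candidate term.
    covered = [a for a in annotations if term in ancestors(a, parents)]
    if not covered:
        return 0.0
    # Coverage: share of the homologues' annotations the term summarizes.
    coverage = len(covered) / len(annotations)
    # Precision: share of the term's subtree that is actually supported.
    precision = len(set(covered)) / len(descendants(term, parents))
    b2 = beta * beta
    return (1 + b2) * precision * coverage / (b2 * precision + coverage)


# Usage: two homologues annotated with sibling terms GO:D and GO:E.
annotations = ["GO:D", "GO:E"]
scores = {t: confidence(t, annotations, PARENTS) for t in all_terms(PARENTS)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # GO:B 0.8
```

In this toy case, the shared parent `GO:B` outscores the leaf terms, each of which covers only half of the annotations, and the root `GO:A`, whose broader subtree lowers its precision. This mirrors the intuition that a representative annotation should be neither too narrow nor too general. How the real function weighs the two factors is determined by the parameters the authors derived from the GO graph.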