Abstract
We address the problem of estimating the sensitivity of seed-based similarity search algorithms. In contrast to approaches based on Markov models [Faster and more sensitive homology search, Designing seeds for similarity search in genomic DNA, Optimal spaced seeds for Hidden Markov Models, with application to homologous coding regions, Vector seeds: an extension to spaced seeds allows substantial improvements in sensitivity and specificity, Sensitivity analysis and efficient method for identifying optimal spaced seeds], we study the estimation based on homogeneous alignments. We describe an algorithm for counting and random generation of those alignments and an algorithm for exact computation of the sensitivity for a broad class of seed strategies. We provide experimental results demonstrating a bias introduced by ignoring the homogeneousness condition.