<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss xmlns:pheedo="http://www.pheedo.com/namespace/pheedo" version="2.0">
	<channel>
		<title>IEEE/ACM Transactions on Computational Biology and Bioinformatics</title>
		<link>http://www.computer.org/tcbb</link>
		<description>The IEEE/ACM Transactions on Computational Biology and Bioinformatics is a new quarterly that will publish archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development and optimization of biological databases; and important biological results that are obtained from the use of these methods, programs, and databases.	</description>
		<language>en-us</language>
		<pubDate>Tue, 7 Oct 2008 10:00:03 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tcbb.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tcbb</link>
		</image>
		<item>
			<title>PrePrint: A Study of Hierarchical and Flat Classification of Proteins</title>
			<link>http://www.pheedo.com/click.phdo?i=204a337dade84565789943ceb520f7e4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.104</pheedo:origLink>
			<description>Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=204a337dade84565789943ceb520f7e4&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=204a337dade84565789943ceb520f7e4&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.104</guid>
		</item>
		<item>
			<title>PrePrint: A Monte Carlo EM Algorithm for de novo Motif Discovery in Biomolecular Sequences</title>
			<link>http://www.pheedo.com/click.phdo?i=f59a4e43fd31d1f2b2f7c19e64fef591</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.103</pheedo:origLink>
			<description>Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes in genome as well as in locating conserved domains in protein sequences. The Expectation Maximization (EM) motif-finding algorithm is one of the most popular de novo motif discovery methods. Based on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif algorithm The newly implemented algorithm is named as Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model, and then it iteratively performs Monte Carlo simulation and parameter update until convergence. A log-likelihood profiling technique together with the top-k strategy is introduced to cope with the phase shifts and multiple modal issues in motif discovery problem. A novel grouping motif alignments (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments and successfully applied to subtle motif discovery. MCEMDA compares favorably to other popular PWM-based and word enumerative motif algorithms tested using simulated motif cases, documented prokaryotic and eukaryotic DNA motif sequences. Finally MCEMDA is applied to detect large blocks of conserved domains in protein benchmarks and compared with other multiple sequence alignment methods.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=f59a4e43fd31d1f2b2f7c19e64fef591&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=f59a4e43fd31d1f2b2f7c19e64fef591&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=f59a4e43fd31d1f2b2f7c19e64fef591&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.103</guid>
		</item>
		<item>
			<title>PrePrint: Refining Phylogenetic Trees Given Additional Data: An Algorithm Based on Parsimony</title>
			<link>http://www.pheedo.com/click.phdo?i=e51a82a755635e4b9fbe42a4df9b94d5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.102</pheedo:origLink>
			<description>Given a set X of taxa, a phylogenetic X-tree T that is only partially resolved, and a collection of characters on X, we consider the problem of finding a resolution (refinement) of T that minimizes the parsimony score of the given characters. Previous work has shown that this problem has a polynomial time solution provided certain strong constraints are imposed on the input. In this paper we provide a new algorithm for this problem, and show that it is fixed parameter tractable under more general conditions.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e51a82a755635e4b9fbe42a4df9b94d5&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e51a82a755635e4b9fbe42a4df9b94d5&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.102</guid>
		</item>
		<item>
			<title>PrePrint: Combinatorial Analysis for Sequence And Spatial Motif Discovery in Short Sequence Fragments</title>
			<link>http://www.pheedo.com/click.phdo?i=e9f6cbe1b070b1a78c1e95a5ccbaef88</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.101</pheedo:origLink>
			<description>Motifs are over-represented sequence or spatial patterns appearing in proteins. They often play important roles in maintaining protein stability and in facilitating protein function. When motifs are located in short sequence fragments, as in transmembrane domains that are only 6-20 residues in length, and when there is only very limited data, it is difficult to identify motifs. In this study, we introduce combinatorial models based on permutation for assessing statistically significant sequence and spatial patterns in short sequences. We show our method can uncover previously unknown sequence and spatial motifs in beta-barrel membrane proteins, and that our method outperforms existing methods in detecting statistically significant motifs in this dataset. Lastly, we discuss implications of motif analysis for problems involving short sequences in other families of proteins.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e9f6cbe1b070b1a78c1e95a5ccbaef88&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e9f6cbe1b070b1a78c1e95a5ccbaef88&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.101</guid>
		</item>
		<item>
			<title>PrePrint: Refining Phylogenetic Trees Given Additional Data: An Algorithm Based on Parsimony</title>
			<link>http://www.pheedo.com/click.phdo?i=6a3ce90d86bec877d0d37c1cd7334abc</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.100</pheedo:origLink>
			<description>Given a set X of taxa, a phylogenetic X-tree T that is only partially resolved, and a collection of characters on X, we consider the problem of finding a resolution (refinement) of T that minimizes the parsimony score of the given characters. Previous work has shown that this problem has a polynomial time solution provided certain strong constraints are imposed on the input. In this paper we provide a new algorithm for this problem, and show that it is fixed parameter tractable under more general conditions.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=6a3ce90d86bec877d0d37c1cd7334abc&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=6a3ce90d86bec877d0d37c1cd7334abc&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.100</guid>
		</item>
		<item>
			<title>PrePrint: Finding the Nearest Neighbors in Biological Databases Using Less Distance Computations</title>
			<link>http://www.pheedo.com/click.phdo?i=219e2bc4043569d1247d659f5b13afd5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.99</pheedo:origLink>
			<description>Modern biological applications usually involve the similarity comparison between two objects, which is often computationally very expensive, such as whole genome pairwise alignment and protein three dimensional structure alignment. Nevertheless, being able to quickly identify the closest neighboring objects from very large databases for a newly obtained sequence or structure can provide timely hints to its functions and more. This paper presents a substantial speedup technique for the well studied k-nearest neighbor (k-nn) search, based on novel concepts of virtual pivots and partial pivots, such that a significant number of the expensive distance computations can be avoided. The new method is able to dynamically locate virtual pivots, according to the query, with increasing pruning ability. Using the same or less amount of database preprocessing effort, the new method outperformed the second best method by using no more than 40% distance computations per query, on a database of 10,000 gene sequecnes, compared to several best known k-nn search methods including M-Tree, OMNI, SA-Tree, and LAESA. We demonstrated the use of this method on two biological sequence datasets, one of which is for HIV-1 viral strain computational genotyping.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=219e2bc4043569d1247d659f5b13afd5&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=219e2bc4043569d1247d659f5b13afd5&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=219e2bc4043569d1247d659f5b13afd5&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.99</guid>
		</item>
		<item>
			<title>PrePrint: BioExtract Server - An Integrated Workflow-Enabling System to Access and Analyze Heterogeneous, Distributed Biomolecular Data</title>
			<link>http://www.pheedo.com/click.phdo?i=429de2a56fa94fc8a06602ff97ae4487</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.98</pheedo:origLink>
			<description>Many in silico investigations in bioinformatics require access to multiple, distributed data sources and analytic tools. The requisite data sources may include large public data repositories, community databases, and project databases for use in domain-specific research. Different data sources frequently utilize distinct query languages and return results in unique formats, and therefore researchers must either rely upon a small number of primary data sources or become familiar with multiple query languages and formats. Similarly, the associated analytic tools often require specific input formats and produce unique outputs which make it difficult to utilize the output from one tool as input to another. The BioExtract Server (http://bioextract.org) is a Web-based data integration application designed to consolidate, analyze, and serve data from heterogeneous biomolecular databases in the form of a mash-up. The basic operations of the BioExtract Server allow researchers, via their Web browsers, to: specify data sources, flexibly query data sources, apply analytic tools, download result sets, and store query results for later reuse. As a researcher works with the system, their "steps" are saved in the background. At any time these steps can be preserved long-term as a workflow simply by providing a workflow name and description.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=429de2a56fa94fc8a06602ff97ae4487&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=429de2a56fa94fc8a06602ff97ae4487&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.98</guid>
		</item>
		<item>
			<title>PrePrint: SARNA-Predict: Accuracy Improvement of RNA Secondary Structure Prediction Using Permutation Based Simulated Annealing</title>
			<link>http://www.pheedo.com/click.phdo?i=30b889305b977ac12edca888fa2680f0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.97</pheedo:origLink>
			<description>Ribonucleic acid (RNA), a single stranded linear molecule, is essential to all biological systems. Since the structure of RNA molecules is key to their function, algorithms for the prediction of RNA structure are of great value. In this article we demonstrate the usefulness of SARNA-Predict, an RNA secondary structure prediction algorithm based on Simulated Annealing (SA). A performance valuation of SARNA-Predict in terms of prediction accuracy is made via comparison with eight state-of-the-art RNA prediction algorithms: mfold, Pseudoknot (pknotsRE), NUPACK, pknotsRG-mfe, Sfold, HotKnots, ILM, and STAR. These algorithms are from three different classes: heuristic, dynamic programming, and statistical sampling techniques. An evaluation for the performance of SARNA-Predict in terms of prediction accuracy was verified with native structures. Experiments on 33 individual known structures from eleven RNA classes (tRNA, viral RNA, anti-genomic HDV, telomerase RNA, tmRNA, rRNA, RNaseP, 5S rRNA, Group I intron 23S rRNA, Group I intron 16S rRNA, and 16S rRNA) were performed. The results presented in this paper demonstrate that SARNA-Predict can out-perform other state-of-the-art algorithms in terms of prediction accuracy. Furthermore, there is substantial improvement of prediction accuracy by incorporating a more sophisticated thermodynamic model (efn2).&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=30b889305b977ac12edca888fa2680f0&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=30b889305b977ac12edca888fa2680f0&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.97</guid>
		</item>
		<item>
			<title>PrePrint: Efficient Algorithms to Explore Conformation Spaces of Flexible Protein Loops</title>
			<link>http://www.pheedo.com/click.phdo?i=3e4f70133223bedb79c460ece89d9f9c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.96</pheedo:origLink>
			<description>Several applications in biology - e.g., incorporation of protein flexibility in ligand docking algorithms, interpretation of fuzzy X-ray crystallographic data, and homology modeling - require computing the internal parameters of a flexible fragment (usually, a loop) of a protein in order to connect its termini to the rest of the protein without causing any steric clash. One must often sample many such conformations in order to explore and adequately represent the conformational range of the studied loop. While sampling must be fast, it is made difficult by the fact that two conflicting constraints - kinematic closure and clash avoidance - must be satisfied concurrently. This paper describes two efficient and complementary sampling algorithms to explore the space of closed clash-free conformations of a flexible protein loop. The "seed sampling" algorithm samples broadly from this space, while the "deformation sampling" algorithm uses seed conformations as starting points to explore the conformation space around them at a finer grain. Computational results are presented for various loops ranging from 5 to 25 residues. More specific results also show that the combination of the sampling algorithms with a functional site prediction software (FEATURE) makes it possible to compute and recognize calcium-binding loop conformations. The sampling algorithms are implemented in a toolkit (LoopTK), which is available at https://simtk.org/home/looptk.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=3e4f70133223bedb79c460ece89d9f9c&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=3e4f70133223bedb79c460ece89d9f9c&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.96</guid>
		</item>
		<item>
			<title>PrePrint: SCS: Signal, Context and Structure Features for Genome-Wide Human Promoter Recognition</title>
			<link>http://www.pheedo.com/click.phdo?i=052326e41f1d0edf7e8127349af72e0b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.95</pheedo:origLink>
			<description>This paper integrates the signal, context and structure features for genome-wide promoter recognition, which is critical in many DNA sequence analysis tasks. First, CpG islands are salient biological signals associated with approximately $50\%$ mammalian promoters associated with transcription start sites. Second, the genetic context of promoters may have biological significance, which is based on $n$-mers (sequences of $n$ bases long) and their statistics estimated from training samples. Third, sequence-dependent DNA flexibility originates from DNA $3$D structures, and plays an important role in guiding TFs to the target site in promoters. Employing decision trees, we combine above signal, context and structure features to build a hierarchical promoter recognition system called SCS. Experimental results on controlled datasets and the entire human genome demonstrate that SCS is significantly superior in terms of sensitivity and specificity as compared to other state-of-the-art methods. The SCS promoter recognition system is available upon request from the authors.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=052326e41f1d0edf7e8127349af72e0b&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=052326e41f1d0edf7e8127349af72e0b&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=052326e41f1d0edf7e8127349af72e0b&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.95</guid>
		</item>
		<item>
			<title>PrePrint: Cache-Oblivious Dynamic Programming for Bioinformatics</title>
			<link>http://www.pheedo.com/click.phdo?i=efb44f179b7768d65f4c1339b413fe61</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.94</pheedo:origLink>
			<description>We present efficient cache-oblivious algorithms for some well-studied string problems in bioinformatics including {\em the longest common subsequence}, {\em global pairwise sequence alignment} and {\em 3-way sequence alignment (or median)}, both with affine gap costs, and {\em RNA secondary structure prediction with simple pseudoknots}. For each of these problems we present cache-oblivious algorithms that match the best-known time complexity, match or improve the best-known space complexity, and improve significantly over the cache-efficiency of earlier algorithms. We present experimental results which show that our cache-oblivious algorithms run faster than software and implementations based on previous best algorithms for these problems.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=efb44f179b7768d65f4c1339b413fe61&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=efb44f179b7768d65f4c1339b413fe61&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.94</guid>
		</item>
		<item>
			<title>PrePrint: Fixed-Parameter Tractability of the Maximum Agreement Supertree Problem</title>
			<link>http://www.pheedo.com/click.phdo?i=460c5712e61c908bb5fadcbf24f04f63</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.93</pheedo:origLink>
			<description>Given a set $L$ of labels and a collection of rooted trees whose leaves are bijectively labelled by some elements of $L$, the Maximum Agreement Supertree problem (\pSMAST) is as follows: find a tree $T$ on a largest label set $L'$ in $L$ that contains every input tree restricted to $L'$. The problem has phylogenetic applications to infer supertrees and perform tree congruence analyses. We focus on the parameterized complexity of this $\np$-hard problem, considering different combinations of parameters as well as particular cases. We show that \pSMAST on $k$ rooted binary trees on a label set of size $n$ can be solved in $O((8n)^k)$ time, which is an improvement with respect to the previously known $O(n^{3k^2})$ time algorithm. In this case, we also give an $O((2k)^p kn^2)$ time algorithm, where $p$ is an upper bound on the number of leaves missing in a \pSMAST solution. This shows that \pSMAST can be solved efficiently when the input trees are mostly congruent. Then for the particular case where any triple of leaves is contained in at least one input tree, we give $O(4^p n^3)$ and $O(3.12^p+n^4)$ time algorithms, obtaining the first fixed-parameter tractable algorithms on a single parameter for this problem. We also obtain intractability results for several combinations of parameters.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=460c5712e61c908bb5fadcbf24f04f63&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=460c5712e61c908bb5fadcbf24f04f63&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.93</guid>
		</item>
		<item>
			<title>PrePrint: Defining and Computing Optimum RMSD for Gapped and Weighted Multiple Structure Alignment</title>
			<link>http://www.pheedo.com/click.phdo?i=d5329d80d2ac2555898b2f192bb14b77</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.92</pheedo:origLink>
			<description>Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure the structural similarity, and methods for optimizing RMSD are well established. We extend RMSD to weighted RMSD for multiple structures. By using multiplicative weights, we show that weighted RMSD for all pairs is the same as weighted RMSD to an average of the structures. Thus, using RMSD or weighted RMSD implies that the average is a consensus structure. Although we show that in general, the two tasks of finding the optimal translations and rotations for minimizing weighted RMSD cannot be separated for multiple structures like they can for pairs, an inherent difficulty and a fact ignored by previous work, we develop a near-linear iterative algorithm to converge weighted RMSD to a local minimum. 10,000 experiments of gapped alignment done on each of 23 protein families from HOMSTRAD (where each structure starts with a random translation and rotation) converge rapidly to the same minimum. Finally we propose a heuristic method to iteratively remove the effect of outliers and find well-aligned positions that determine the structural conserved region by modeling B-factors and deviations from the average positions as weights and iteratively assigning higher weights to better aligned atoms.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=d5329d80d2ac2555898b2f192bb14b77&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=d5329d80d2ac2555898b2f192bb14b77&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.92</guid>
		</item>
		<item>
			<title>PrePrint: Linear Separability of Gene Expression Datasets</title>
			<link>http://www.pheedo.com/click.phdo?i=00366da1f9dc91965846738220a8a20e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.90</pheedo:origLink>
			<description>We study simple geometric properties of gene expression datasets. Samples are taken from two distinct classes (e.g. two types of cancer). Specifically, the problem of linear separability for pairs of genes is investigated. If a pair of genes exhibits linear separation with respect to the two classes, then the joint expression level of the two genes is strongly correlated to the phenomena of the sample being taken from one class or the other. This may indicate an underlying molecular mechanism relating the two genes and the phenomena (e.g. a specific cancer). We developed and implemented novel, efficient algorithmic tools for finding all pairs of genes that induce a linear separation of the two sample classes. These tools are based on computational geometric properties, and were applied to ten publicly available cancer datasets. For each dataset, we computed the number of actual separating pairs, and compared it to an upper bound on the number expected by chance, and to the numbers resulting from shuffling the labels of the data at random empirically. Seven out of these ten datasets are highly separable. Statistically, this phenomenon is highly significant, very unlikely to occur at random. It is therefore reasonable that it manifests a functional association between separating genes and the underlying phenotypic classes.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;hr /&gt;
&lt;div style=&quot;font-size:xx-small;color:gray;padding-bottom:.5em&quot;&gt;Presented By:&lt;/div&gt;
&lt;div&gt;&lt;a href=&quot;http://www.pheedo.com/feeds/ht.php?t=c&amp;amp;i=00366da1f9dc91965846738220a8a20e&quot;&gt;&lt;/a&gt;&lt;/div&gt;&lt;table border=&quot;0&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot;&gt;
&lt;tr&gt;&lt;td valign=&quot;top&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;div style=&quot;font-size:xx-small; padding-top: 1em;&quot;&gt;&lt;span style=&quot;border-top: 1px solid&quot;&gt;
&lt;br style=&quot;display:none&quot;/&gt;
&lt;a href=&quot;http://www.pheedo.com/&quot;&gt;Ads by Pheedo&lt;/a&gt;
&lt;/span&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/feeds/ht.php?t=v&amp;amp;i=00366da1f9dc91965846738220a8a20e&quot;/&gt;
&lt;br/&gt;
&lt;/div&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=00366da1f9dc91965846738220a8a20e&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.90</guid>
		</item>
		<item>
			<title>PrePrint: Improving Strand Pairing Prediction Through Exploring Folding Cooperativity</title>
			<link>http://www.pheedo.com/click.phdo?i=8867c66043c4dfe895214262ba761eb1</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.88</pheedo:origLink>
			<description>The topology of $\beta$-sheets is defined by the pattern of hydrogen-bonded strand pairing. Therefore, predicting hydrogen bonded strand partners is a fundamental step towards predicting $\beta$-sheet topology. At the same time, finding the correct partners is very difficult due to long range interactions involved in strand pairing. Additionally, patterns of aminoacids involved, in $\beta$-sheet formations are very general and therefore difficult to use for computational recognition of specific contacts between strands. In this work, we report a new strand pairing algorithm. To address above mentioned difficulties, our algorithm attempts to mimic elements of the folding process. Namely, in addition to ensuring that the predicted hydrogen bonded strand pairs satisfy basic global consistency constraints, it takes into account hypothetical folding pathways. Consistently with this view, introducing hydrogen bonds between a pair of strands changes the probabilities of forming hydrogen bonds between other pairs of strand. We demonstrate that this approach provides an improvement over previously proposed algorithms. We also compare the performance of this method to that of a global optimization algorithm that poses the problem as integer linear programming optimization problem and solves it using ILOG CPLEX\texttrademark\ package.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=8867c66043c4dfe895214262ba761eb1&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8867c66043c4dfe895214262ba761eb1&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.88</guid>
		</item>
		<item>
			<title>PrePrint: Gene Association Networks from Microarray Data Using a Regularized Estimation of Partial Correlation Based on PLS Regression</title>
			<link>http://www.pheedo.com/click.phdo?i=d3b9c349450f0f6c5b202801676f4963</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.87</pheedo:origLink>
			<description>Reconstruction of gene-gene interactions from large scale data such as microarray is a first step toward better understanding the mechanisms at work in the cell. Two main issues have to be managed in such a context: (i) to choose which measures have to be used to distinguish between direct and indirect interactions from high dimensional microarray data and (ii) to construct networks with a low proportion of false positive edges. We present an efficient methodology for the reconstruction of gene interaction networks in a small sample size setting. The strength of independence of any two genes is measured, in such "high dimensional network", by a regularized estimation of the partial correlation based on the Partial Least Squares Regression. We finally emphasize specific properties of the proposed method. To assess the sensitivity and specificity of the method, we carried out the reconstruction of networks from simulated data. We also tested PLS-based partial correlation network on static and dynamic real microarray data. An R implementation of the proposed algorithm is available from http://plmnetwork.seeclic.com/.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=d3b9c349450f0f6c5b202801676f4963&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=d3b9c349450f0f6c5b202801676f4963&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.87</guid>
		</item>
		<item>
			<title>PrePrint: Hybridization in Non-Binary Trees</title>
			<link>http://www.pheedo.com/click.phdo?i=ea970fa4470911d0c24bbb2e9e1bfefd</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.86</pheedo:origLink>
			<description>Reticulate evolution&amp;#x2014;the umbrella term for processes like hybridization, horizontal gene transfer, and recombination&amp;#x2014;plays an important role in the history of life of many species. Although the occurrence of such events is widely accepted, approaches to calculate the extent to which reticulation has influenced evolution are relatively rare. In this paper, we show that the NP-hard problem of calculating the minimum number of reticulation events for two (arbitrary) rooted phylogenetic trees parameterized by this minimum number is fixed-parameter tractable.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=ea970fa4470911d0c24bbb2e9e1bfefd&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ea970fa4470911d0c24bbb2e9e1bfefd&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.86</guid>
		</item>
		<item>
			<title>IEEE/ACM Transactions on Computational Biology and Bioinformatics - July-September 2008 (Vol. 5, No. 3)</title>
			<link>http://opac.ieeecomputersociety.org/opac?year=2008&amp;volume=5&amp;issue=03&amp;acronym=tcbb</link>
			<description>IEEE/ACM Transactions on Computational Biology and Bioinformatics</description>
			<guid isPermaLink="true">http://www.computer.org/portal/site/tcbb/</guid>
		</item>
		<item>
			<title>PrePrint: An Introduction to Metabolic Networks and Their Structural Analysis</title>
			<link>http://www.pheedo.com/click.phdo?i=7040a7a0b42c15ea3777700f3985171d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.79</pheedo:origLink>
			<description>There has been a renewed interest for metabolism in the computational biology community, leading to an avalanche of papers coming from methodological network analysis as well as experimental and theoretical biology. This paper is meant to serve as an initial guide for both the biologists interested in formal approaches and the mathematicians or computer scientists wishing to inject more realism into their models. The paper is focused on the structural aspects of metabolism only. The literature is vast enough already, and the thread through it difficult to follow even for the more experienced worker in the field. We explain methods for acquiring data and reconstructing metabolic networks, and review the various models that have been used for their structural analysis. Several concepts such as modularity are introduced, as are the controversies that have beset the field these past few years, for instance, on whether metabolic networks are small-world or scale-free, and on which model better explains the evolution of metabolism. Clarifying the work that has been done also helps in identifying open questions and in proposing relevant future directions in the field, which we do along the paper and in the conclusion.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=7040a7a0b42c15ea3777700f3985171d&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=7040a7a0b42c15ea3777700f3985171d&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=7040a7a0b42c15ea3777700f3985171d&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.79</guid>
		</item>
		<item>
			<title>PrePrint: Protein Structure Classification Based on Conserved Hydrophobic Residues</title>
			<link>http://www.pheedo.com/click.phdo?i=7d598ccecb482f26a6bc80aa3523f588</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.77</pheedo:origLink>
			<description>Protein folding is frequently guided by local residue interactions that form clusters in the protein core. The interactions between residue clusters serve as potential nucleation sites in the folding process. Evidence postulates that the residue interactions are governed by the hydrophobic propensities that the residues possess. An array of hydrophobicity scales has been developed to determine the hydrophobic propensities of residues under different environmental conditions. In this work, we propose a graph theory based data mining framework to extract and isolate protein structural features that sustain invariance in evolutionary related proteins, through the integrated analysis of five well-known hydrophobicity scales over the 3-D structure of proteins. We hypothesize that proteins of the same homology contain conserved hydrophobic residues and exhibit analogous residue interaction patterns in the folded state. The results obtained demonstrate that discriminatory residue interaction patterns shared among proteins of the same family can be employed for both the structural and the functional annotation of proteins. We obtained at average 90% accuracy in protein classification with a significantly small feature vector compared to previous results in the area. This work presents an elaborate study as well as validation evidence to illustrate the efficacy of the method and the correctness of results reported.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=7d598ccecb482f26a6bc80aa3523f588&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=7d598ccecb482f26a6bc80aa3523f588&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.77</guid>
		</item>
		<item>
			<title>PrePrint: Maximum Parsimony for Tree Mixtures</title>
			<link>http://www.pheedo.com/click.phdo?i=a65f78ae523c42a948c99b383a504528</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.75</pheedo:origLink>
			<description>With the number of sequenced genomes growing ever larger, it is now common practice to concatenate sequence alignments from several genomic loci as a first step to phylogenetic tree inference. However, as different loci may support different trees due to processes such as gene duplication and lineage sorting, it is important to better understand how commonly used phylogenetic inference methods behave on such "phylogenetic mixtures". Here we shall focus on how parsimony, one of the most popular methods for reconstructing phylogenetic trees, behaves for mixtures of two trees. In particular, we show that (i) the parsimony problem is NP-complete for mixtures of two trees, (ii) there are mixtures of two trees that have exponentially many (in the number of leaves) most parsimonious trees, and (iii) give an explicit description of the most parsimonious tree(s) and scores corresponding to the mixture of a pair of trees related by a single TBR operation.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=a65f78ae523c42a948c99b383a504528&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a65f78ae523c42a948c99b383a504528&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.75</guid>
		</item>
		<item>
			<title>PrePrint: Reassortment Networks for Investigating the Evolution of Segmented Viruses</title>
			<link>http://www.pheedo.com/click.phdo?i=04aa892cb724bda625809ac18ec6d5e3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.73</pheedo:origLink>
			<description>Many viruses, such as influenza A, have distinct segments in their genome. Their evolution involves mutation as well as reassortment, where segments are interchanged betweenviruses that co-infect a host. Phylogenetic trees can be constructed to investigate the mutation-driven evolution of individual viral segments, but cannot depict reassortment events. We propose reassortment networks to analyze the evolution of segmented viruses. These are layered graphs in which the layers represent evolutionary stages such as a temporal series of seasons in which influenza viruses are isolated. Nodes represent viral isolates and reassortment events between pairs of isolates. Edges represent evolutionary steps while weights on edges represent edit costs of reassortment and mutation events. Paths represent possible transformation series among viruses. The length of each path is the sum edit cost of the events required to transform one virus into another. To analyze &amp;#x03C4; stages of evolution of n viruses with segments of maximum length m, we compute the pairwise distances between all corresponding segments in Om2n2 time using dynamic programming. The reassortment network, with O&amp;#x03C4;n2 nodes, is constructed using these distances. The ancestors and descendents of a specific virus can be traced via shortest paths in this network, which can be found in O&amp;#x03C4;n3 time.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=04aa892cb724bda625809ac18ec6d5e3&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=04aa892cb724bda625809ac18ec6d5e3&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.73</guid>
		</item>
		<item>
			<title>PrePrint: Signal Quality Measurements for cDNA Microarray Data</title>
			<link>http://www.pheedo.com/click.phdo?i=67c2ce464abf4016e41d46201d96171d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.72</pheedo:origLink>
			<description>Concerns about the reliability of transcription rates derived from microarray data inspires ongoing research into measurement error in these experiments. Error happens at both the technical level within the lab and the experimental level. In this paper, we are concerned with measuring spot-specific error. This paper outlines two different approaches to quantify the reliability of spot-specific intensity estimates. In both cases, the spatial correlation between pixels and its impact on spot quality is accounted for. The first method is a straightforward parametric estimate of within-spot variance that assumes a Gaussian distribution and accounts for spatial correlation via an overdispersion factor. The second method employs a non-parametric quality estimate referred to throughout as the mean square prediction error (MSPE). The MSPE first smooths a pixel region and then measures the difference between actual pixel values and the smoother. Both methods herein are compared for real and simulated data to assess numerical characteristics and the ability to describe poor spot quality. We conclude that both approaches capture noise in the microarray platform and highlight situations where one method or the other is superior.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=67c2ce464abf4016e41d46201d96171d&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=67c2ce464abf4016e41d46201d96171d&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=67c2ce464abf4016e41d46201d96171d&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.72</guid>
		</item>
		<item>
			<title>PrePrint: Metrics for Phylogenetic Networks I: Generalizations of the Robinson-Foulds Metric</title>
			<link>http://www.pheedo.com/click.phdo?i=b8caf6f7116500be5fdd78ad991296bc</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.70</pheedo:origLink>
			<description>The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the first in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we study three metrics that have already been introduced in the literature: the Robinson-Foulds distance, the tripartitions distance and the $\mu$-distance. They generalize to networks the classical Robinson-Foulds or partition distance for phylogenetic trees. We analyze the behavior of these metrics by studying their least and largest values and when they achieve them. As a by-product of this study, we obtain tight bounds on the size of a tree-child time consistent phylogenetic network.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=b8caf6f7116500be5fdd78ad991296bc&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=b8caf6f7116500be5fdd78ad991296bc&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.70</guid>
		</item>
		<item>
			<title>PrePrint: An &#x03a9;(n^2/ log n) Speed-Up of TBR Heuristics for the Gene-Duplication Problem</title>
			<link>http://www.pheedo.com/click.phdo?i=3c6bf5f7159486abc304d36d78acf094</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.69</pheedo:origLink>
			<description>The gene-duplication problem is to infer a species supertree from gene trees that are confounded by complex histories of gene duplications. This problem is NP-hard and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. We improve on the time complexity of the local search problem by a factor of n2= log n, where n is the size of the resulting species supertree. Typically, several thousand instances of the local search problem are solved throughout a stepwise heuristic search. Hence, our improvement makes the gene-duplication problem much more tractable for large-scale phylogenetic analyses.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=3c6bf5f7159486abc304d36d78acf094&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=3c6bf5f7159486abc304d36d78acf094&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.69</guid>
		</item>
		<item>
			<title>PrePrint: Fourier Transform Inequalities for Phylogenetic Trees</title>
			<link>http://www.pheedo.com/click.phdo?i=c9cc4acd3225150c291a4643ed6245f4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.68</pheedo:origLink>
			<description>Phylogenetic invariants are not the only constraints on site-pattern frequency vectors for phylogenetic trees. A mutation matrix, by its definition, is the exponential of a matrix with non-negative off-diagonal entries; this positivity requirement implies non-trivial constraints on the site-pattern frequency vectors. We call these additional constraints "edge-parameter inequalities." In this paper, we first motivate the edge-parameter inequalities by considering a pathological site-pattern frequency vector corresponding to a quartet tree with a negative internal edge. This site-pattern frequency vector nevertheless satisfies all of the constraints described up to now in the literature. We next describe two complete sets of edge-parameter inequalities for the group-based models; these constraints are square-free monomial inequalities in the Fourier transformed coordinates. These inequalities, along with the phylogenetic invariants, form a complete description of the set of site-pattern frequency vectors corresponding to \emph{bona fide} trees. Said in mathematical language, this paper explicitly presents two finite lists of inequalities in Fourier coordinates of the form "monomial $\leq 1$," each list characterizing the phylogenetically relevant semialgebraic subsets of the phylogenetic varieties.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=c9cc4acd3225150c291a4643ed6245f4&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=c9cc4acd3225150c291a4643ed6245f4&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.68</guid>
		</item>
		<item>
			<title>PrePrint: SpeedHap: An Accurate Heuristic for the Single Individual SNP Haplotyping Problem with Many Gaps, High Reading Error Rate and  Low Coverage</title>
			<link>http://www.pheedo.com/click.phdo?i=75ed9c6cfc4fa3e81a2ebfdcff3aee9a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.67</pheedo:origLink>
			<description>Single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNP's present in a chromosome (called the em haplotype) is of interest in a wide area of applications in molecular biology and biomedicine, including diagnostic and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in literature and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. When gaps are considered no exact method of polynomial complexity is known. The problem is also hard to approximate with guarantees. Therefore fast heuristics have been proposed. In this paper we describe SpeedHap, a new heuristic method that is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have high rate of reading errors (up to 20%) and low coverage (as low as 3). We test SpeedHap on real data from the HapMap Project.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=75ed9c6cfc4fa3e81a2ebfdcff3aee9a&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=75ed9c6cfc4fa3e81a2ebfdcff3aee9a&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=75ed9c6cfc4fa3e81a2ebfdcff3aee9a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.67</guid>
		</item>
		<item>
			<title>PrePrint: Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci</title>
			<link>http://www.pheedo.com/click.phdo?i=419385d0af32c0d75826fa62dffdae13</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.66</pheedo:origLink>
			<description>We introduce a simple, computationally efficient algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochastic assumptions, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that it can tolerate moderate estimation errors.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=419385d0af32c0d75826fa62dffdae13&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=419385d0af32c0d75826fa62dffdae13&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.66</guid>
		</item>
		<item>
			<title>PrePrint: Automated Isolation of Translational Efficiency Bias that Resists the Confounding Effect of GC(AT)-Content</title>
			<link>http://www.pheedo.com/click.phdo?i=c9dbc24bec5ae25e6ca62d7b0d4a5693</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.65</pheedo:origLink>
			<description>Genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale; however, much of the information present may yet be hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of these synonymous codons is preferred. Detection of this bias is an important tool in the analysis of genomic data, particularly as a predictor of gene expressivity. Methods for identifying codon usage bias in genomic data that rely solely on genomic sequence data are susceptible to being confounded by the presence of several factors simultaneously influencing codon selection. Presented here is a new technique for removing the effects of one of the more common confounding factors, GC(AT)-content, and of visualizing the search-space for codon usage bias through the use of a solution landscape. This technique successfully isolates expressivity-related codon usage trends, using only genomic sequence information, where other techniques fail due to the presence of GC(AT)-content confounding influences.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=c9dbc24bec5ae25e6ca62d7b0d4a5693&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=c9dbc24bec5ae25e6ca62d7b0d4a5693&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.65</guid>
		</item>
		<item>
			<title>PrePrint: Model Composition for Macromolecular Regulatory Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=ad7dfd109a77ff7ffa955a9d983d103a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.64</pheedo:origLink>
			<description>Models of regulatory networks become more difficult to construct and understand as they grow in size and complexity. Large models are usually built up from smaller models, representing subsets of reactions within the larger network. To assist modelers in this composition process, we present a formal approach for model composition, a wizard-style program for implementing the approach, and suggested language extensions to the Systems Biology Markup Language (SBML) to support model composition. To illustrate the features of our approach and how to use the JigCell Composition Wizard, we build up a model of the eukaryotic cell cycle ``engine'' from smaller pieces.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=ad7dfd109a77ff7ffa955a9d983d103a&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ad7dfd109a77ff7ffa955a9d983d103a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.64</guid>
		</item>
		<item>
			<title>PrePrint: Barking Up The Wrong Treelength: The Impact of Gap Penalty on Alignment and Tree Accuracy</title>
			<link>http://www.pheedo.com/click.phdo?i=626d756da391bfb1a3e2ddc4f305b873</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.63</pheedo:origLink>
			<description>Several methods have been developed for simultaneous estimation of alignment and tree, of which POY is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to estimating the alignment using ClustalW and then analyzing the resultant alignment using maximum parsimony. They found that ClustalW+MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques are not competitive with two-phase techniques. Our paper presents a simulation study in which we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Our study considers the impact of the gap penalty and suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs. Our study suggests that optimizing under an affine gap penalty might produce alignments that are better than ClustalW alignments, and competitive with those produced by the best current alignment methods. We also show that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW+MP, and competitive with the current best two-phase methods.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=626d756da391bfb1a3e2ddc4f305b873&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=626d756da391bfb1a3e2ddc4f305b873&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=626d756da391bfb1a3e2ddc4f305b873&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.63</guid>
		</item>
		<item>
			<title>PrePrint: Fast Hinge Detection Algorithms for Flexible Protein Structures</title>
			<link>http://www.pheedo.com/click.phdo?i=a8215bcb8319cf517ded7441110e444a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.62</pheedo:origLink>
			<description>Analysis of conformational changes is one of the keys to the understanding of protein functions and interactions. For the analysis, we often compare two protein structures, taking flexible regions like hinge regions into consideration. We propose a new measure called RMSDh (Root Mean Square Deviation considering hinges) and its variant RMSD(k) for comparing two flexible proteins with hinge regions. We also propose novel linear-time algorithms for computing them, which can detect the hinge positions at the same time. The RMSDh is suitable for cases where there is one small hinge region in each of the two target structures. The RMSDh(k) is designed for comparing structures with more than one hinge region. The RMSDh(k) measure considers at most k small hinge region, i.e., the RMSDh(k) value should be small if the two structures are similar except for at most k hinge regions. To compute the value, we propose an O(kn^2)-time and O(n)-space algorithm based on a new dynamic programming technique. With the same computational time and space, we can enumerate the predicted hinge positions. We also test our algorithms against actual flexible protein structures, and show that the hinge positions can be correctly detected by our algorithms.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=a8215bcb8319cf517ded7441110e444a&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a8215bcb8319cf517ded7441110e444a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.62</guid>
		</item>
		<item>
			<title>PrePrint: Modeling Protein Interacting Groups by Quasi-bicliques: Complexity, Algorithm and Application</title>
			<link>http://www.pheedo.com/click.phdo?i=e99994f79ff1862cb790ccaf3b04139b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.61</pheedo:origLink>
			<description>Protein-protein interactions (PPIs) are one of the most important mechanisms in cellular processes. To model protein interaction sites, recent studies have suggested to find interacting protein group pairs from large PPI networks at the first step, and then to search conserved motifs within the protein groups to form interacting motif pairs. To consider noise effect and incompleteness of biological data, we propose to use quasi-bicliques for finding interacting protein group pairs. We investigate two new problems which arise from finding interacting protein group pairs: the maximum vertex quasi-biclique problem and the maximum balanced quasi-biclique problem. We prove that both problems are NP-complete. This is a surprising result as the widely known maximum vertex biclique problem is polynomial time solvable. We then propose a heuristic algorithm which uses the greedy method to find the quasi-bicliques from PPI networks. Our experiment results on real data show that this algorithm has a better performance than a benchmark algorithm for identifying highly matched BLOCKS and PRINTS motifs. We also report results of two case studies on interacting motif pairs which map well with two interacting domain pairs in iPfam. Supplementary information: http://www.cs.cityu.edu.hk/~liuxw/ppi&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e99994f79ff1862cb790ccaf3b04139b&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e99994f79ff1862cb790ccaf3b04139b&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.61</guid>
		</item>
		<item>
			<title>PrePrint: Seeded Tree Alignment</title>
			<link>http://www.pheedo.com/click.phdo?i=acb4755e4b2a01262f848e120b7247d3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.59</pheedo:origLink>
			<description>The optimal transformation of one tree into another by means of elementary edit operations is an important algorithmic problem that has several interesting applications to computational biology. Here we introduce a constrained form of this problem in which a partial mapping of a set of nodes (the "seeds") in one tree to a corresponding set of nodes in the other tree is given, and present efficient algorithms for both ordered and unordered trees. Whereas ordered tree matching based on seeded nodes has applications in pattern matching of RNA structures, unordered tree matching based on seeded nodes has applications in co-speciation and phylogeny reconciliation. The latter involves the solution of the planar tanglegram layout problem, for which a polynomial-time algorithm is given here.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=acb4755e4b2a01262f848e120b7247d3&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=acb4755e4b2a01262f848e120b7247d3&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.59</guid>
		</item>
		<item>
			<title>PrePrint: Drawing Rooted Phylogenetic Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=094323611ba14411f0fec0aae2de85f8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.58</pheedo:origLink>
			<description>The evolutionary history of a collection of species is usually represented by a phylogenetic tree. Sometimes, phylogenetic networks are used as a means of representing reticulate evolution or of showing uncertainty and incompatibilities in evolutionary datasets. This is often done using unrooted phylogenetic networks such as split networks, due in part, to the availability of software (SplitsTree) for their computation and visualization. In this paper we discuss the problem of drawing rooted phylogenetic networks as cladograms or phylograms in a number of different views that are commonly used for rooted trees. Implementations of the algorithms are available in new releases of the Dendroscope and SplitsTree programs.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=094323611ba14411f0fec0aae2de85f8&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=094323611ba14411f0fec0aae2de85f8&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=094323611ba14411f0fec0aae2de85f8&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.58</guid>
		</item>
		<item>
			<title>PrePrint: Learning Scoring Schemes for Sequence Alignment from Partial Examples</title>
			<link>http://www.pheedo.com/click.phdo?i=e31dcf32310d2a5d2fc6c254ae44cae6</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.57</pheedo:origLink>
			<description>When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e31dcf32310d2a5d2fc6c254ae44cae6&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e31dcf32310d2a5d2fc6c254ae44cae6&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.57</guid>
		</item>
		<item>
			<title>PrePrint: Heuristic Bayesian segmentation for discovery of co-expressed genes within genomic regions</title>
			<link>http://www.pheedo.com/click.phdo?i=5629084db2e9c22ec5b0ed8633137bc3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.56</pheedo:origLink>
			<description>Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks, such as genome data analysis, are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with co-expressed genes. We aim at automated discovery of such regions without requirement for user given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like Bayesian information criterion. Such methods are based on some simplification which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified dirichlet prior for modelling multinomial data. We show with various artificial datasets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell cycle gene expression data reveals potential active and passive regions of the genome.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=5629084db2e9c22ec5b0ed8633137bc3&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=5629084db2e9c22ec5b0ed8633137bc3&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.56</guid>
		</item>
		<item>
			<title>PrePrint: Budgeted Phylogenetic Diversity on Circular Split Systems</title>
			<link>http://www.pheedo.com/click.phdo?i=94320002bf02b3d32b7310ba7661d5da</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.54</pheedo:origLink>
			<description>In the last 15 years, Phylogenetic Diversity (PD) has gained interest in the community of conservation biologists as a surrogate measure for assessing biodiversity. We have recently proposed two approaches to select taxa for maximizing PD, namely PD with budget constraints and PD on split systems. In this paper, we will unify these two strategies and present a dynamic programming algorithm to solve the unified framework of selecting taxa with maximal PD under budget constraints on circular split systems. An improved algorithm will also be given if the underlying split system is a tree.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=94320002bf02b3d32b7310ba7661d5da&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=94320002bf02b3d32b7310ba7661d5da&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.54</guid>
		</item>
		<item>
			<title>PrePrint: Sorting genomes by reciprocal translocations, insertions and deletions</title>
			<link>http://www.pheedo.com/click.phdo?i=98aac34dada47a5c962266a141b93542</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.53</pheedo:origLink>
			<description>The problem of sorting by reciprocal translocations (abbreviated as SBT) is a major computational problem in the field of comparative genomics, which is to find a shortest sequence of reciprocal translocations that transforms one genome $\Pi$ into another genome $\Gamma$, with the restriction that $\Pi$ and $\Gamma$ contain the same genes. SBT has been proved to be polynomial-time solvable, and several polynomial algorithms have been developed. In this paper, we show how to extend Bergeron's SBT algorithm to include insertions and deletions, allowing to compare genomes containing different genes. In particular, if the gene set of $\Pi$ is a subset (or superset, respectively) of the gene set of $\Gamma$, we present an approximation algorithm for transforming $\Pi$ into $\Gamma$ by reciprocal translocations and deletions (insertions, respectively), providing a sorting sequence with length at most $OPT+2$, where $OPT$ is the minimum number of translocations and deletions (insertions, respectively) needed to transform $\Pi$ into $\Gamma$; if $\Pi$ and $\Gamma$ have different genes but not containing each other, we give a heuristic to transform $\Pi$ into $\Gamma$ by a shortest sequence of reciprocal translocations, insertions and deletions, with bounds for the length of the sorting sequence it outputs.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=98aac34dada47a5c962266a141b93542&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=98aac34dada47a5c962266a141b93542&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=98aac34dada47a5c962266a141b93542&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.53</guid>
		</item>
		<item>
			<title>PrePrint: The Identifiability of Covarion Models in Phylogenetics</title>
			<link>http://www.pheedo.com/click.phdo?i=e3681dc4169a969d14d044fc3da09b4d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.52</pheedo:origLink>
			<description>Covarion models of character evolution describe inhomogeneities in substitution processes through time. In phylogenetics, such models are used to describe changing functional constraints or selection regimes during the evolution of biological sequences. In this work the identifiability of such models for generic parameters on a known phylogenetic tree is established, provided the number of covarion classes does not exceed the size of the observable state space. `Generic parameters' as used here means all parameters except possibly those in a set of measure zero within the parameter space. Combined with earlier results, this implies both the tree and generic numerical parameters are identifiable if the number of classes is strictly smaller than the number of observable states.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e3681dc4169a969d14d044fc3da09b4d&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e3681dc4169a969d14d044fc3da09b4d&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.52</guid>
		</item>
		<item>
			<title>PrePrint: Robustness of topological supertree methods for reconciling dense incompatible data</title>
			<link>http://www.pheedo.com/click.phdo?i=1a005e3d4a51b4fc5cfd256e4a51f056</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.51</pheedo:origLink>
			<description>Given a collection of rooted phylogenetic trees with overlapping sets of leaves, a compatible supertree $S$ is a single tree whose set of leaves is the union of the input sets of leaves and such that $S$ agrees with each input tree when restricted to the leaves of the input tree. Typically with trees from real data, no compatible supertree exists, and various methods may be utilized to reconcile the incompatibilities in the input trees. This paper focuses on a measure of robustness of a supertree method called its ``radius" $R$. The larger the value of $R$ is, the further the data set can be from a natural correct tree $T$ and yet the method will still output $T$. It is shown that the maximal possible radius for a method is $R = 1/2$. Many familiar methods, both for supertrees and consensus trees, are shown to have $R = 0$, indicating that they need not output a tree $T$ that would seem to be the natural correct answer. A polynomial-time method Normalized Triplet Supertree (NTS) with the maximal possible $R = 1/2$ is defined. A geometric interpretion is given, and NTS is shown to solve an optimization problem. Additional properties of NTS are described.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=1a005e3d4a51b4fc5cfd256e4a51f056&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=1a005e3d4a51b4fc5cfd256e4a51f056&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.51</guid>
		</item>
		<item>
			<title>PrePrint: Efficient algorithms for the computational design of optimal tiling arrays</title>
			<link>http://www.pheedo.com/click.phdo?i=a7e6ecd9541c213df2c41d83c2371dfb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.50</pheedo:origLink>
			<description>The representation of a genome by oligonucleotide probes is a prerequisite for the analysis of many of its basic properties, such as transcription factor binding sites, chromosomal breakpoints, gene expression of known genes and detection of novel genes, in particular those coding for small RNAs. An ideal representation would consist of a high density set of oligonucleotides with similar melting temperatures that do not cross-hybridize with other regions of the genome and are equidistantly spaced. The implementation of such design is typically called a tiling array or genome array. We formulate the minimal cost tiling path problem for the selection of oligonucleotides from a set of candidates. Computing the selection of probes requires multi-criterion optimization, which we cast into a shortest path problem. Standard algorithms running in linear time allow us to compute globally optimal tiling paths from millions of candidate oligonucleotides on a standard desktop computer for most problem variants. The solutions to this multi-criterion optimization are spatially adaptive to the problem instance. Our formulation incorporates experimental constraints with respect to specific regions of interest and trade offs between hybridization parameters, probe quality and tiling density easily.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=a7e6ecd9541c213df2c41d83c2371dfb&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a7e6ecd9541c213df2c41d83c2371dfb&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.50</guid>
		</item>
		<item>
			<title>PrePrint: Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data</title>
			<link>http://www.pheedo.com/click.phdo?i=45f89254375c9c1631d813254be748a3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.49</pheedo:origLink>
			<description>The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=45f89254375c9c1631d813254be748a3&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=45f89254375c9c1631d813254be748a3&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=45f89254375c9c1631d813254be748a3&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.49</guid>
		</item>
		<item>
			<title>PrePrint: On the Importance of Comprehensible Classification Models for Protein Function Prediction</title>
			<link>http://www.pheedo.com/click.phdo?i=245a4bfecb5bf9b4f5d7a57f1cbdc6d0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.47</pheedo:origLink>
			<description>The literature on protein function prediction is currently dominated by works aimed at maximizing predictive accuracy, ignoring the important issues of validation and interpretation of discovered knowledge which can lead to new insights and hypotheses which are biologically meaningful and advance the understanding of protein functions by biologists. The overall goal of this paper is to critically evaluate this approach, offering a refreshing new perspective on this issue, focusing not only on predictive accuracy but also on the comprehensibility of the induced protein function prediction models. More specifically, this paper aims to offer two main contributions to the area of protein function prediction. First, it presents the case for discovering comprehensible protein function prediction models from data, discussing in detail the advantages of such models, namely increasing the confidence of the biologist in the system's predictions, leading to new insights about the data and the formulation of new biological hypotheses, and detecting errors in the data. Second, it presents a critical review of the pros and cons of several different knowledge representations that can be used in order to support the discovery of comprehensible protein function prediction models.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=245a4bfecb5bf9b4f5d7a57f1cbdc6d0&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=245a4bfecb5bf9b4f5d7a57f1cbdc6d0&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.47</guid>
		</item>
		<item>
			<title>PrePrint: A Multiple-Filter-Multiple-Wrapper Approach to Gene Selection and Microarray Data Classification</title>
			<link>http://www.pheedo.com/click.phdo?i=3b83695417909da4e4157719ea9dbf4a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.46</pheedo:origLink>
			<description>Filters and wrappers are two prevailing approaches for gene selection in microarray data analysis. Filters make use of statistical properties of each gene to represent its discriminating power between different classes. The computation is fast but the predictions are inaccurate. Wrappers make use of a chosen classifier to select genes by maximizing classification accuracy, but the computation burden is formidable. Filters and wrappers have been combined in previous studies to maximize the classification accuracy for a chosen classifier with respect to a filtered set of genes. The drawback of this single-filter-single-wrapper (SFSW) approach is that the classification accuracy is dependent on the choice of specific filter and wrapper. In this paper, a multiple-filter-multiple-wrapper (MFMW) approach is proposed that makes use of multiple filters and multiple wrappers to improve the accuracy and robustness of the classification, and to identify potential biomarker genes. Experiments based on six benchmark datasets show that the MFMW approach outperforms SFSW models (generated by all combinations of filters and wrappers used in the corresponding MFMW model) in all cases and for all six datasets. Some of MFMW selected genes have been confirmed to be biomarkers or contribute to the development of particular cancers by other studies.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=3b83695417909da4e4157719ea9dbf4a&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=3b83695417909da4e4157719ea9dbf4a&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.46</guid>
		</item>
		<item>
			<title>PrePrint: A Trade-off Between Sample Complexity and Computational Complexity in Learning Boolean Networks from Time Series Data</title>
			<link>http://www.pheedo.com/click.phdo?i=c53b1324c3600c008ec06e3bb1da4030</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.38</pheedo:origLink>
			<description>A key problem in molecular biology is to infer regulatory relationships between genes from expression data. This paper studies a simplified model of such inference problems in which one or more Boolean variables, modeling, for example, the expression levels of genes, each depend deterministically on a small but unknown subset of a large number of Boolean input variables. Our model assumes that the expression data comprises a time series, in which successive samples may be correlated. We provide bounds on the expected amount of data needed to infer the correct relationships between output and input variables. These bounds improve and generalize previous results for Boolean network inference and continuous-time switching network inference. Although the computational problem is intractable in general, we describe a fixed-parameter tractable algorithm that is guaranteed to provide at least a partial solution to the problem. Most interestingly, both the sample complexity and computational complexity of the problem depend on the strength of correlations between successive samples in the time series, but in opposing ways. Uncorrelated samples minimize the total number of samples needed while maximizing computational complexity; a strong correlation between successive samples has the opposite effect. This observation has implications for the design of experiments for measuring gene expression.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=c53b1324c3600c008ec06e3bb1da4030&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=c53b1324c3600c008ec06e3bb1da4030&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=c53b1324c3600c008ec06e3bb1da4030&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.38</guid>
		</item>
		<item>
			<title>PrePrint: Consistency of Topological Moves Based on the Balanced Minimum Evolution Principle of Phylogenetic Inference</title>
			<link>http://www.pheedo.com/click.phdo?i=a46d9c69ff0a3eb75674723ec0a417f8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.37</pheedo:origLink>
			<description>Many phylogenetic algorithms search the space of possible trees using topological rearrangements and some optimality criterion. FastME is such an approach that uses the {\em balanced minimum evolution (BME)} principle, which computer studies have demonstrated to have high accuracy. FastME includes two variants: {\em balanced subtree prune and regraft (BSPR)} and {\em balanced nearest neighbor interchange (BNNI)}. These algorithms take as input a distance matrix and a putative phylogenetic tree. The tree is modified using SPR or NNI operations, respectively, to reduce the BME length relative to the distance matrix, until a tree with (locally) shortest BME length is found. Following computer simulations, it has been conjectured that BSPR and BNNI are consistent, i.e. for an input distance that is a tree-metric, they converge to the corresponding tree. We prove that the BSPR algorithm is consistent. Moreover, even if the input contains small errors relative to a tree-metric, we show that the BSPR algorithm still returns the corresponding tree. Whether BNNI is consistent remains open.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=a46d9c69ff0a3eb75674723ec0a417f8&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=a46d9c69ff0a3eb75674723ec0a417f8&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.37</guid>
		</item>
		<item>
			<title>PrePrint: Feature Selection for Gene Expression using Model-based Entropy</title>
			<link>http://www.pheedo.com/click.phdo?i=9aa5995c069fd10f45585c7efd538c13</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.35</pheedo:origLink>
			<description>Gene expression data usually contain a large number of genes, but a small number of samples. Feature selection for gene expression data aims at finding a set of genes that best discriminate biological samples of different types. Traditional gene selection based on empirical mutual information suffers the data sparseness issue due to the small number of samples. To overcome the sparseness issue, we propose a model-based approach to estimate the entropy of class variables on the model, instead of on the data itself. We use multivariate normal distributions to fit the data, because multivariate normal distributions have maximum entropy among all real-valued distributions with specified mean and standard deviation, and are widely used to approximate various distributions. Because of the large number of genes, we propose several algorithms to largely reduce the computational cost. The experiments on seven gene datasets and the comparison with other five approaches show the accuracy of the multivariate Gaussian generative model for feature selection, and the efficiency of our algorithms.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=9aa5995c069fd10f45585c7efd538c13&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=9aa5995c069fd10f45585c7efd538c13&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.35</guid>
		</item>
		<item>
			<title>PrePrint: Identification of Regulatory Modules in Time Series Gene Expression Data using a Linear Time Biclustering Algorithm</title>
			<link>http://www.pheedo.com/click.phdo?i=ddfeac48d203f39e9106c85d048caab9</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.34</pheedo:origLink>
			<description>Although most biclustering formulations are NP-hard, in time series expression data analysis, it is reasonable to restrict the problem to the identification of maximal biclusters with contiguous columns, that correspond to coherent expression patterns shared by a group of genes in consecutive time-points. This restriction leads to a tractable problem. We propose an algorithm that finds and reports all maximal contiguous column coherent biclusters in time linear in the size of the expression matrix. The linear time complexity of CCC-Biclustering relies on the use of a discretized matrix and efficient string processing techniques based on suffix trees. We also propose a method for ranking biclusters based on their statistical significance and a methodology for filtering highly overlapping and, therefore, redundant, biclusters. We report results in synthetic and real data showing the effectiveness of the approach and its relevance in the discovery of regulatory modules. Results obtained using the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress, show not only the ability of the proposed methodology to extract relevant information compatible with documented biological knowledge, but also the utility of using this algorithm in the study of other environmental stresses, and of regulatory modules, in general.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=ddfeac48d203f39e9106c85d048caab9&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=ddfeac48d203f39e9106c85d048caab9&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=ddfeac48d203f39e9106c85d048caab9&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.34</guid>
		</item>
		<item>
			<title>PrePrint: Exploratory Consensus of Hierarchical Clusterings for Melanoma and Breast Cancer</title>
			<link>http://www.pheedo.com/click.phdo?i=5e57e70f93b77697984979fb5b9a08ed</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.33</pheedo:origLink>
			<description>Finding subtypes of heterogeneous diseases is the biggest challenge in the area of biology. Often, clustering is used to provide a hypothesis for the subtypes of a heterogeneous disease. However, there are usually discrepancies between the clusterings produced by different algorithms. This work introduces a simple method which provides the most consistent clusters across three different clustering algorithms for a melanoma and a breast cancer dataset. The method is validated by showing that the Silhouette, Dunne's and Davies-Bouldin's cluster validation indices are better for the proposed algorithm than those obtained by $k$-means and another consensus clustering algorithm. The hypotheses of the consensus clusters on both the datasets are corroborated by clear genetic markers and 100% classification accuracy. In Bittner, et.al's melanoma dataset, a previously hypothesized primary cluster is recognized as the largest consensus cluster and a new partition of this cluster into two sub-clusters is proposed. In Veer, et.al's breast cancer dataset, previously proposed "basal" and "luminal A" subtypes are clearly recognized as the two predominant clusters. Furthermore, a new hypothesis is provided about the existence of two subgroups within the "basal" subtype in this dataset. The clusters of Veer's dataset is also validated by high classification accuracy obtained in the dataset of Vijver, et.al.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=5e57e70f93b77697984979fb5b9a08ed&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=5e57e70f93b77697984979fb5b9a08ed&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.33</guid>
		</item>
	</channel>
</rss>