Abstract
Microarray technologies are emerging as a promising tool for genomic studies. A huge body of time-course gene expression data has been, and will continue to be, produced by microarray experiments. Such data contains important information and has proven useful in medical diagnosis, treatment, and drug design. The challenge now is to analyze such data so as to extract this information. Cluster analysis has played an important role in analyzing time-course gene expression data. However, most clustering techniques do not take into account the inherent time dependence (dynamics) of time-course gene expression patterns. Accounting for these dynamics in cluster analysis should lead to higher-quality clustering. This paper presents a model-based clustering method for time-course gene expression data. The method uses Markov chain models (MCMs) to account for the inherent dynamics of time-course gene expression patterns and assumes that expression patterns in the same cluster were generated by the same MCM. For a given number of clusters, the method uses an EM algorithm to compute the cluster models, together with an assignment of genes to these models that maximizes their posterior probabilities. Further, this study employs the average adjusted Rand index (AARI) to evaluate the quality of clustering. The improved performance of the presented method is demonstrated by comparison with the k-means method on a publicly available dataset.
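
To make the clustering idea above concrete, the following Python sketch fits a mixture of Markov chains by EM and then assigns each sequence to the cluster with the highest posterior probability. It is an illustration under stated assumptions, not the implementation used in this paper: it assumes that expression profiles have already been discretized into a finite set of states, that each cluster is a first-order, time-homogeneous chain, and that counts are smoothed by a constant alpha. The function name fit_markov_mixture and all of its parameters are hypothetical.

```python
import math
import random


def fit_markov_mixture(seqs, n_states, n_clusters, n_iter=50, seed=0, alpha=1.0):
    """EM for a mixture of first-order Markov chains over discrete states.

    Assumptions (not taken from the paper): every sequence is a non-empty
    list of ints in [0, n_states), and counts are smoothed by `alpha`.
    Returns (labels, models); each model is (weight, init_probs, trans_probs).
    """
    rng = random.Random(seed)
    n = len(seqs)

    # Random soft initialisation of the responsibilities resp[i][k].
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    models = []
    for _ in range(n_iter):
        # M-step: re-estimate each chain from responsibility-weighted counts.
        models = []
        for k in range(n_clusters):
            weight = sum(resp[i][k] for i in range(n)) / n
            init = [alpha] * n_states
            trans = [[alpha] * n_states for _ in range(n_states)]
            for i, s in enumerate(seqs):
                init[s[0]] += resp[i][k]
                for a, b in zip(s, s[1:]):
                    trans[a][b] += resp[i][k]
            init_total = sum(init)
            init = [v / init_total for v in init]
            trans = [[v / sum(row) for v in row] for row in trans]
            models.append((weight, init, trans))

        # E-step: posterior probability of each cluster given each sequence.
        for i, s in enumerate(seqs):
            logp = []
            for weight, init, trans in models:
                lp = math.log(weight + 1e-300) + math.log(init[s[0]])
                for a, b in zip(s, s[1:]):
                    lp += math.log(trans[a][b])
                logp.append(lp)
            top = max(logp)  # subtract the maximum for numerical stability
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            resp[i] = [v / total for v in w]

    # Assign each sequence to the cluster with the highest posterior.
    labels = [max(range(n_clusters), key=lambda k: resp[i][k]) for i in range(n)]
    return labels, models


if __name__ == "__main__":
    # Toy input: two "constant" and two "alternating" discretized profiles.
    toy = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1],
           [0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0]]
    labels, _ = fit_markov_mixture(toy, n_states=2, n_clusters=2)
    print(labels)
```

Because EM converges only to a local optimum, a practical run would typically repeat the fit from several random seeds and keep the solution with the highest likelihood; the sketch omits this for brevity.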