15th International Conference on Pattern Recognition (ICPR'00) - Volume 2, p. 2076
Clustering Very Large Databases Using EM Mixture Models

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPR.2000.906021

Abstract
Clustering very large databases is a challenge for traditional pattern recognition algorithms, e.g. the Expectation-Maximization (EM) algorithm for fitting mixture models, because of their high memory and iteration requirements. Over large databases, the cost of the numerous scans required for convergence and the large memory requirement of the algorithm become prohibitive. We present a decomposition of the EM algorithm that requires a small amount of memory by limiting iterations to small data subsets. The scalable EM approach requires at most one database scan and is based on identifying regions of the data that are discardable, regions that are compressible, and regions that must be maintained in memory. Data resolution is preserved to the extent possible, based on the size of the memory buffer and the fit of the current model to the data. Computational tests demonstrate that the scalable scheme outperforms similarly constrained EM approaches.
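
The single-scan idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a one-dimensional Gaussian mixture, a fixed-size buffer, and a simple rule (hypothetical here) that folds a point into its cluster's sufficient statistics once its posterior probability exceeds a threshold, while uncertain points stay in memory at full resolution. The function names, the `threshold` parameter, and the quantile initialization are all illustrative assumptions, and the sketch merges the "discardable" and "compressible" regions into a single compression step.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each component for point x (the E-step)."""
    p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(p)
    if total == 0.0:  # numerical underflow: fall back to uniform
        return [1.0 / len(p)] * len(p)
    return [pi / total for pi in p]


def scalable_em_1d(stream, k, buffer_size=200, em_iters=10, threshold=0.95):
    """Single-pass EM sketch: summaries stand in for compressed points."""
    weights, means, variances = [1.0 / k] * k, None, None
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]  # per cluster: n, sum x, sum x^2
    retained = []  # uncertain points kept at full resolution

    def process(buffer):
        nonlocal means, variances, retained
        points = retained + buffer
        if means is None:  # assumed init: quantiles of the first buffer
            ordered = sorted(points)
            means = [ordered[int((j + 0.5) * len(ordered) / k)] for j in range(k)]
            mu = sum(ordered) / len(ordered)
            var0 = max(sum((x - mu) ** 2 for x in ordered) / len(ordered), 1e-6)
            variances = [var0] * k
        for _ in range(em_iters):
            acc = [list(row) for row in stats]  # start from the compressed summaries
            for x in points:
                r = responsibilities(x, weights, means, variances)
                for j in range(k):
                    acc[j][0] += r[j]
                    acc[j][1] += r[j] * x
                    acc[j][2] += r[j] * x * x
            total = sum(a[0] for a in acc)
            for j, (n, sx, sxx) in enumerate(acc):
                if n > 1e-9:  # M-step from summaries plus in-memory points
                    weights[j] = n / total
                    means[j] = sx / n
                    variances[j] = max(sxx / n - means[j] ** 2, 1e-6)
        retained = []
        for x in points:
            r = responsibilities(x, weights, means, variances)
            j = max(range(k), key=lambda c: r[c])
            if r[j] >= threshold or len(retained) >= buffer_size:
                stats[j][0] += 1.0  # compress: fold the point into its cluster
                stats[j][1] += x
                stats[j][2] += x * x
            else:
                retained.append(x)  # keep ambiguous points in memory

    buffer = []
    for x in stream:  # the data source is read exactly once
        buffer.append(x)
        if len(buffer) == buffer_size:
            process(buffer)
            buffer = []
    if buffer:
        process(buffer)
    return weights, means, variances


# Synthetic example: two well-separated clusters, read as a one-pass stream.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
data += [rng.gauss(10.0, 1.0) for _ in range(1000)]
rng.shuffle(data)
weights, means, variances = scalable_em_1d(iter(data), k=2)
```

Memory use in this sketch is bounded by the buffer, the retained points (capped at `buffer_size`), and three numbers per cluster. Hard-assigning compressed points to one cluster is a simplification that discards their soft memberships, so it is only reasonable when the threshold is high.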
Additional Information

Citation: P.S. Bradley, C.A. Reina, U.M. Fayyad, "Clustering Very Large Databases Using EM Mixture Models," 15th International Conference on Pattern Recognition (ICPR'00), Volume 2, p. 2076, 2000
