Proceedings. The Fourth International Conference on Computer and Information Technology

Abstract

Sequential pattern mining is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the potentially large number of candidate patterns demand efficient and scalable algorithms. In this paper we present an efficient parallel algorithm named Pre-Clustering based Sequential Pattern Mining (PCSPM). The algorithm groups sequence data into clusters according to a similarity definition, distributes the clusters to the nodes of a distributed-memory parallel computer, and forms node sets corresponding to the clusters. By confining most communication to within each node set, it greatly reduces unnecessary communication among the parallel computing nodes and therefore saves considerable communication time. Experimental results and the accompanying analysis show that the PCSPM algorithm is efficient and feasible.
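
The pre-clustering and node-set idea described above can be illustrated with a minimal sketch. The abstract does not state the similarity definition, the clustering procedure, or how nodes are assigned, so everything below is an assumption made for illustration: Jaccard similarity over the items of each sequence, a greedy "leader" clustering with a hypothetical `threshold` parameter, and a round-robin split of node ids into one node set per cluster. The function names are hypothetical and this is not the PCSPM implementation.

```python
from typing import Dict, List, Sequence, Set

# A sequence is an ordered list of itemsets, e.g. [["a", "b"], ["c"]].
ItemSequence = Sequence[Sequence[str]]


def item_set(seq: ItemSequence) -> Set[str]:
    """Flatten a sequence into the set of items it contains."""
    return {item for itemset in seq for item in itemset}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity (an assumed stand-in for the paper's definition)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pre_cluster(sequences: List[ItemSequence], threshold: float = 0.5) -> List[List[int]]:
    """Greedy leader clustering: each sequence joins the first cluster
    whose leader is at least `threshold` similar, else starts a new one.
    Returns clusters as lists of sequence indices."""
    leaders: List[Set[str]] = []
    clusters: List[List[int]] = []
    for idx, seq in enumerate(sequences):
        items = item_set(seq)
        for c, leader in enumerate(leaders):
            if jaccard(items, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            # No sufficiently similar leader: this sequence leads a new cluster.
            leaders.append(items)
            clusters.append([idx])
    return clusters


def assign_node_sets(clusters: List[List[int]], num_nodes: int) -> Dict[int, List[int]]:
    """Split node ids round-robin into one node set per cluster.
    Assumes at least as many nodes as clusters (an illustrative restriction)."""
    if not clusters or num_nodes < len(clusters):
        raise ValueError("need at least one node per cluster")
    node_sets: Dict[int, List[int]] = {c: [] for c in range(len(clusters))}
    for node in range(num_nodes):
        node_sets[node % len(clusters)].append(node)
    return node_sets


if __name__ == "__main__":
    data = [
        [["a", "b"], ["c"]],
        [["a"], ["b", "c"]],
        [["x"], ["y", "z"]],
        [["x", "y"], ["w"]],
    ]
    clusters = pre_cluster(data, threshold=0.5)
    print(clusters)
    print(assign_node_sets(clusters, num_nodes=4))
```

In a design like this, the nodes of one node set would exchange candidate patterns and support counts only among themselves for the sequences of their cluster, which is the sense in which most communication stays inside a node set.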