Advanced Search
CS Search Google Search
Subscribers, please login

Published Articles >> Table of Contents >> Abstract

International Conference on Information Technology: Computers and Communications   p. 199
A Method for Calculating Term Similarity on Large Document Collections

Full Article Text: Download PDF of full textBuy this articleGet full text from IEEE Xplore

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ITCC.2003.1197526
Send link to a friend

Abstract
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is defined using the Expected Mutual Information Measure (EMIM). Since our aim for defining the similarity lists is to improve information retrieval (IR), we present the outcome of an experiment comparing the performance of an IR engine designed to use the similarity lists. Two methods were used to generate similarity lists: a brute-force technique and the Quadtree Heuristic. The performance of the list generated by the Quadtree Heuristic was commensurate with the brute force list.
Additional Information

Citation:  Wolfgang W. Bein, Jeffrey S. Coombs, Kazem Taghva, "A Method for Calculating Term Similarity on Large Document Collections," itcc, p. 199,  International Conference on Information Technology: Computers and Communications,  2003

Similar Articles

Abstract Contents
Abstract
Citation




Free access to

  • Abstracts
  • Selected PDFs

Electronic subscribers login to:

  • Access HTML/PDFs of full text articles

Subscription information

Get a Web account

PDFs require Adobe Acrobat Reader.

Peer Review Notice

Give us Feedback