Advanced Search
CS Search Google Search
Subscribers, please login

Published Articles >> Table of Contents >> Abstract

Publication Home Page
July/August 2003 (Vol. 15, No. 4)   pp. 855-870
Searching with Numbers

Full Article Text: View linked HTML of full textDownload PDF of full textBuy this articleGet full text from IEEE Xplore

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2003.1209004
Send link to a friend

Abstract
A large fraction of the useful web is comprised of specification documents that largely consist of \langleattribute name, numeric value\rangle pairs embedded in text. Examples include product information, classified advertisements, resumes, etc. The approach taken in the past to search these documents by first establishing correspondences between values and their names has achieved limited success because of the difficulty of extracting this information from free text. We propose a new approach that does not require this correspondence to be accurately established. Provided the data has “low reflectivity”, we can do effective search even if the values in the data have not been assigned attribute names and the user has omitted attribute names in the query. We give algorithms and indexing structures for implementing the search. We also show how hints (i.e., imprecise, partial correspondences) from automatic data extraction techniques can be incorporated into our approach for better accuracy on high reflectivity data sets. Finally, we validate our approach by showing that we get high precision in our answers on real data sets from a variety of domains.
References
[1] S. Berchtold, C. Bohm, D.A. Keim, F. Krebs, and H.-P. Kriegel, On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces Proc. Eighth Int'l Conf. Database Theory, Jan. 2001.
[2] A. Broder and M. Henzinger, Information Retrieval on the Web: Tools and Algorithmic Issues Foundations of Computer Science, invited tutorial, 1998.
[3] S. Chakrabarti, Data Mining for Hypertext: A Tutorial Survey SIGKDD Explorations, vol. 1, no. 2, 2000.
[4] B. Cherkassky, A. Goldberg, P. Martin, J. Setubal, and J. Stolfi, Augment or Push? A Computational Study of Bipartite Matching and Unit Capacity Flow Algorithms Technical Report 97-127, NEC Research Inst., 1997.
[5] J. Cho and S. Rajagopalan, A Fast Regular Expression Indexing Engine Proc. Int'l Conf. Data Eng., Feb. 2002.
[6] D. Comer, The Ubiquitous B-tree ACM Computing, Surveys, vol. 11, no. 2, pp. 121-138, June 1979.
[7] A. Crespo, J. Jannink, E. Neuhold, M. Rys, and R. Studer, A Survey of Semi-Automatic Extraction and Transformation http://www-db.stanford.edu/crespopublications /, 1994.
[8] U. Dayal and H.-Y. Hwang, View Definition and Generalization for Database Integration in a Multidatabase System IEEE Trans. Software Eng., vol. 10, no. 6, pp. 628-645, 1984.
[9] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, Indexing by Latent Semantic Analysis J. Am. Soc. Information Science, vol. 41, no. 6, pp. 391-407, 1990.
[10] C. Dwork, M. Naor, R. Kumar, and D. Sivakumar, Rank Aggregation Methods for the Web Proc. 10th Int'l World Wide Web Conf., May 2001.
[11] R. Fagin, Combining Fuzzy Information from Multiple Systems (extended abstract) Proc. Symp. Principles of Database Systems, 1996.
[12] R. Fagin, A. Lotem, and M. Naor, Optimal Aggregation Algorithms for Middleware Proc. Symp. Principles of Database Systems, 2001.
[13] T. Feder and R. Motwani, Clique Partitions, Graph Compression and Speeding-Up Algorithms J. Computer and System Sciences, vol. 51, pp. 261-272, 1995.
[14] U. Guntzer, W.-T. Balke, and W. Kiessling, Optimizing Multi-Feature Queries for Image Databases Proc. 26th Int'l Conf. Very Large Databases, 2000.
[15] P. Indyk and R. Motwani, Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality Proc. ACM Symp. Theory of Computing, pp. 604-613, 1998.
[16] V. Kashyap and A.P. Sheth, Semantic and Schematic Similarities between Database Objects: A Context-Based Approach VLDB J., vol. 5, no. 4, pp. 276-304, 1996.
[17] W. Kim and J. Seo, "Classifying Schematic and Data Heterogeneity in Multidatabase Systems," Computer, Dec. 1991.
[18] Physics and Astronomy of the Moon, Z. Kopal, ed. Academic Press, 1962.
[19] I. Muslea, Extraction Patterns for Information Extraction Tasks: A Survey Proc. AAAI-99 Workshop Machine Learning for Information Extraction, 1999.
[20] S. Nepal and M.V. Ramakrishna, Query Processing Issues in Image (Multimedia) Databases Proc. Int'l Conf. Data Eng., pp. 22-29, 1999.
[21] N. Roussopoulos, S. Kelley, and F. Vincent, Nearest Neighbor Queries Proc. 1995 ACM SIGMOD Int'l Conf. Management of Data, pp. 71-79, 1995.
[22] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. New York: Addison-Wesley, 1989.
Additional Information
Index Terms- Specification documents, search engines, queries, numbers, hints, data extraction, heterogeneous databases.

Citation:  Rakesh Agrawal, Ramakrishnan Srikant, "Searching with Numbers," IEEE Transactions on Knowledge and Data Engineering, vol. 15,  no. 4,  pp. 855-870,  Jul/Aug,  2003

RSS Feed

Similar Articles

Abstract Contents
Abstract
References
Index Terms
Citation




Free access to

  • Abstracts
  • Selected PDFs

Electronic subscribers login to:

  • Access HTML/PDFs of full text articles

Subscription information

Get a Web account

Peer Review Notice

Give us Feedback