Advanced Search
CS Search Google Search
Subscribers, please login

Published Articles >> Table of Contents >> Abstract

Publication Home Page
May 2004 (Vol. 16, No. 5)   pp. 635-639
A Selectivity Model for Fragmented Relations: Applied in Information Retrieval

Full Article Text: View linked HTML of full textDownload PDF of full textBuy this articleGet full text from IEEE Xplore

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2004.1277824
Send link to a friend

Abstract
New application domains cause today's database sizes to grow rapidly, posing great demands on technology. Data fragmentation facilitates techniques (like distribution, parallelization, and main-memory computing) meeting these demands. Also, fragmentation might help to improve efficient processing of query types such as top {\rm{N}}. Database design and query optimization require a good notion of the costs resulting from a certain fragmentation. Our mathematically derived selectivity model facilitates this. Once its two parameters have been computed based on the fragmentation, after each (though usually infrequent) update, our model can forget the data distribution, resulting in fast and quite good selectivity estimation. We show experimental verification for Zipfian distributed IR databases.
References
[1] Proc. ACM SIGMOD, 1996.
[2] Proc. Very Large Data Base Conf., M.P. Atkinson, M.E. Orlowska, P. Valduriez, S.B. Zdonik, and M.L. Brodie, eds., 1999.
[3] H.E. Blok, S. Choenni, H.M. Blanken, and P.M.G. Apers, A Selectivity Model for Fragmented Relations in Information Retrieval, Technical report, no. TR-CTIT-01-02, Centre for Telematics and Information Technology, University of Twente, The Netherlands, Jan. 2001.
[4] H.E. Blok, D. Hiemstra, S. Choenni, F. de Jong, H.M. Blanken, and P.M.G. Apers, Predicting the Cost-Quality Trade-Off for Information Retrieval Queries: Facilitating Database Design and Query Optimization Proc. ACM Conf. Information and Knowledge Management, 2001.
[5] H.E. Blok, Database Optimization Aspects for Information Retrieval PhD thesis, Univ. of Twente, Enschede, The Netherlands, 2002.
[6] A.F. Cardenas, Analysis and Performance of Inverted Data Base Structures Comm. ACM, vol. 18, no. 5, pp. 253-263, 1975.
[7] S. Chaudhuri and L. Gravano, Evaluating Top-$k$Selection Queries Proc. Very Large Data Base Conf., pp. 397-410, 1999.
[8] M.J. Carey and D. Kossmann, Reducing the Braking Distance of an SQL Query Engine Proc. Very Large Data Base Conf., pp. 158-169, 1998.
[9] S. Choenni, M. Kersten, A. Saad, and J. v.d. Akker, A Framework for Multi-Query Optimization Report CS-R9638, CWI, Centre for Math. and Computer Science, Amsterdam, 1996.
[10] S. Chaudhuri, R. Motwani, and V. Narasayya, Random Sampling for Histogram Construction: How Much Is Enough? Proc. ACM SIGMOD, pp. 436-447, 1998.
[11] S. Chaudhuri, R. Motwani, and V. Narasayya, On Random Sampling over Joins Proc. ACM SIGMOD, pp. 263-274, 1999.
[12] C.M. Chen and N. Roussopoulos, Adaptive Selectivity Estimation Using Query Feedback Proc. ACM SIGMOD, pp. 161-172, 1994.
[13] D. Donjerkovic and R. Ramakrishnan, Probabilistic Optimization of Top N Queries Proc. Very Large Data Base Conf., pp. 411-422, 1999.
[14] M. Fang, N. Shivakumar, H Garcia-Molina, R. Motwani, and J.D. Ullman, Computing Iceberg Queries Efficiently Proc. Very Large Data Base Conf., pp. 299-310, 1998.
[15] S. Ganguly, P.B. Gibbons, Y. Matias, and A. Silberschatz, Bifocal Sampling for Skew-Resistant Join Size Estimation Proc. ACM SIGMOD, pp. 271-281, 1996.
[16] Proc. Very Large Data Base Conf., A. Gupta, O. Shmueli, and J. Widom, eds., 1998.
[17] L.M. Haas, D. Kossmann, E.L. Wimmers, and J. Yang, Optimizing Queries across Diverse Data Sources Proc. Very Large Data Base Conf., 1997.
[18] A. Ijbema and H.M. Blanken, Estimating Bucket Accesses: A Practical Approach Proc. Int'l Conf. Data Eng., pp. 30-37, 1986.
[19] Y.E. Ioannidis and V. Poosala, Balancing Histogram Optimality and Practicality for Query Result Size Estimation Proc. ACM SIGMOD, pp. 233-244, 1995.
[20] R.J. Lipton, J.F. Naughton, and D.A. Schneider, Practical Selectivity Estimation through Adaptive Sampling Proc. ACM SIGMOD, H. Garcia-Molina and H.V. Jagadish, eds., pp. 1-11, 1990.
[21] V. Poosala, Y.E. Ioannidi, P.J. Haas, and E.J. Shekita, Improved Histograms for Selectivity Estimation of Range Predicates Proc. ACM SIGMOD, pp. 294-305, 1996.
[22] A.P. Sheth and J.A. Larson, Federated Database Systems for Managing Distributed, Heterogeneaous, and Autonomous Databases ACM Computing Surveys, vol. 22, no. 3, pp. 183-236, 1990.
[23] A.P. de Vries and D. Hiemstra, The miRRor DBMS at TREC Proc. Text REtrieval Conf., E.M. Voorhees and D.K. Harman, eds., pp. 725-734, NIST special publications, 1999.
[24] S.B. Yao, Approximating Block Accesses in Database Organizations Comm. ACM, vol. 20, no. 4, 260-261, 1977.
[25] G.K. Zipf, Human Behavior and the Principle of Least Effort. Reading, Mass.: Addison-Wesley, Reading, 1949.
Additional Information
Index Terms- Selectivity, fragmentation, Zipf, information retrieval, databases.

Citation:  Henk Ernst Blok, Sunil Choenni, Henk M. Blanken, Peter M.G. Apers, "A Selectivity Model for Fragmented Relations: Applied in Information Retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 16,  no. 5,  pp. 635-639,  May,  2004

RSS Feed

Similar Articles

Abstract Contents
Abstract
References
Index Terms
Citation




Free access to

  • Abstracts
  • Selected PDFs

Electronic subscribers login to:

  • Access HTML/PDFs of full text articles

Subscription information

Get a Web account

Peer Review Notice

Give us Feedback