21st International Conference on Data Engineering (ICDE'05)
pp. 606-617
Modeling and Managing Content Changes in Text Databases
Panagiotis G. Ipeirotis, New York University
Alexandros Ntoulas, University of California at Los Angeles
Junghoo Cho, University of California at Los Angeles
Luis Gravano, Columbia University
DOI: http://doi.ieeecomputersociety.org/10.1109/ICDE.2005.91
Abstract
Large amounts of (often valuable) information are stored
in web-accessible text databases. "Metasearchers" provide
unified interfaces to query multiple such databases at
once. For efficiency, metasearchers rely on succinct statistical
summaries of the database contents to select the best databases
for each query. So far, database selection research
has largely assumed that databases are static, so the associated
statistical summaries do not need to change over time.
However, databases are rarely static, and the statistical summaries
that describe their contents need to be updated periodically
to reflect content changes. In this paper, we first
report the results of a study showing how the content summaries
of 152 real web databases evolved over a period of
52 weeks. Then, we show how to use "survival analysis"
techniques in general, and Cox’s proportional hazards regression
in particular, to model database changes over time
and predict when we should update each content summary.
Finally, we exploit our change model to devise update schedules
that keep the summaries up to date by contacting databases
only when needed, and then we evaluate the quality of
our schedules experimentally over real web databases.
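The abstract names Cox's proportional hazards regression as the change model and uses its predictions to decide when to refresh each summary, but it does not state the baseline hazard, the covariates, or the exact update rule. The sketch below is therefore only an illustration under stated assumptions: it assumes a Weibull baseline, so the cumulative hazard is H(t | x) = (t / scale)^shape · exp(β·x), and it assumes a hypothetical rule that schedules the next update at the smallest t for which the predicted probability of change, 1 − exp(−H(t | x)), reaches a threshold tau. The names `DatabaseProfile`, `next_update_time`, `schedule`, and the covariate values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DatabaseProfile:
    """Hypothetical per-database record: a name plus covariates for the model."""
    name: str
    covariates: List[float]  # hypothetical features, e.g. a size or domain indicator


def relative_risk(x: Sequence[float], beta: Sequence[float]) -> float:
    """Cox proportional-hazards multiplier exp(beta . x)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def prob_changed(t: float, x: Sequence[float], beta: Sequence[float],
                 scale: float, shape: float) -> float:
    """P(summary has changed by time t) = 1 - S(t | x).

    Assumes a Weibull baseline: H(t | x) = (t / scale) ** shape * exp(beta . x).
    """
    cumulative_hazard = (t / scale) ** shape * relative_risk(x, beta)
    return 1.0 - math.exp(-cumulative_hazard)


def next_update_time(x: Sequence[float], beta: Sequence[float],
                     scale: float, shape: float, tau: float) -> float:
    """Smallest t with prob_changed(t) >= tau (closed form for the Weibull baseline).

    Solving 1 - exp(-H(t)) = tau gives H(t) = -ln(1 - tau), hence
    t = scale * (-ln(1 - tau) / exp(beta . x)) ** (1 / shape).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    return scale * (-math.log(1.0 - tau) / relative_risk(x, beta)) ** (1.0 / shape)


def schedule(dbs: Sequence[DatabaseProfile], beta: Sequence[float],
             scale: float, shape: float, tau: float) -> List[Tuple[str, float]]:
    """Return (database name, time of next update) pairs, soonest first."""
    plan = [(db.name, next_update_time(db.covariates, beta, scale, shape, tau))
            for db in dbs]
    return sorted(plan, key=lambda pair: pair[1])
```

For example, with made-up coefficients `beta = [0.8, -0.3]`, `scale = 20.0` (weeks), `shape = 1.2`, and `tau = 0.5`, a database with covariates `[1.0, 0.0]` has a higher relative risk than one with `[0.0, 1.0]`, so `schedule` lists it first. A database whose predicted change probability grows slowly is contacted less often, which matches the goal of contacting databases only when needed.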
Citation:
Panagiotis G. Ipeirotis, Alexandros Ntoulas, Junghoo Cho, and Luis Gravano, "Modeling and Managing Content Changes in Text Databases," 21st International Conference on Data Engineering (ICDE'05), pp. 606-617, 2005.