Advanced Search
CS Search Google Search
Subscribers, please login

Published Articles >> Table of Contents >> Abstract

18th International Conference on Data Engineering (ICDE'02)   p. 0041
Detecting Changes in XML Documents

Full Article Text: Download PDF of full textBuy this articleGet full text from IEEE Xplore

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2002.994696
Send link to a friend

Abstract
We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volume of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of ``quality''. Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs in average in linear time vs. quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the ``optimal'' in terms of quality. Finally we present experiments on a small sample of XML pages found on the Web.
Additional Information
Index Terms- Semi-Structured Data, XML, Diff, Tree Pattern Matching, Change Detection, Versioning

Citation:  Gregory Cobena, Serge Abiteboul, Amelie Marian, "Detecting Changes in XML Documents," icde, p. 0041,  18th International Conference on Data Engineering (ICDE'02),  2002

Similar Articles

Abstract Contents
Abstract
Index Terms
Citation




Free access to

  • Abstracts
  • Selected PDFs

Electronic subscribers login to:

  • Access HTML/PDFs of full text articles

Subscription information

Get a Web account

Peer Review Notice

Give us Feedback