2003 IEEE International Conference on E-Commerce Technology (CEC'03)
p. 381
Page Digest for Large-Scale Web Services
Daniel Rocco, College of Computing
David Buttler, College of Computing
Ling Liu, College of Computing
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/COEC.2003.1210274
Abstract
We introduce Page Digest, a mechanism for efficient storage
and processing of Web documents. The Page Digest
design encourages a clean separation of the structural elements
of Web documents from their content. Its encoding
transformation produces many of the advantages of traditional
string digest schemes yet remains invertible without
introducing significant additional cost or complexity. Using
the Page Digest encoding can provide at least an order
of magnitude speedup when traversing a Web document
as compared to using a standard Document Object Model
implementation. Our experiments show that change detection
using Page Digest operates in linear time, offering a 75%
improvement in execution performance compared with existing
systems. In addition, the Page Digest encoding can
reduce the tag name redundancy found in Web documents,
allowing a 30% to 50% reduction in document size.
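The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea it describes: separating a document's structure from its content, storing each tag name once, and keeping the transformation invertible. The tree representation, the `encode` and `decode` functions, and the array layout are all assumptions made for illustration and should not be read as the authors' actual Page Digest format.

```python
from typing import List, Tuple, Union

# Hypothetical document model: a text leaf is a str,
# an element is a (tag_name, children) tuple.
Node = Union[str, Tuple[str, list]]

TEXT = -1  # marker used in tag_ids for a text leaf (an assumption of this sketch)


def encode(root: Node):
    """Flatten a tree into parallel preorder arrays plus a content list."""
    tag_names: List[str] = []     # each distinct tag name is stored once
    tag_ids: List[int] = []       # per node: index into tag_names, or TEXT
    child_counts: List[int] = []  # per node: number of children
    content: List[str] = []       # text leaves in document order

    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            tag_ids.append(TEXT)
            child_counts.append(0)
            content.append(node)
            continue
        tag, children = node
        if tag not in tag_names:
            tag_names.append(tag)
        tag_ids.append(tag_names.index(tag))
        child_counts.append(len(children))
        stack.extend(reversed(children))  # preserves left-to-right preorder
    return tag_names, tag_ids, child_counts, content


def decode(tag_names, tag_ids, child_counts, content) -> Node:
    """Rebuild the original tree, showing that the encoding is invertible."""
    texts = iter(content)
    pos = 0

    def build():
        nonlocal pos
        tid, n = tag_ids[pos], child_counts[pos]
        pos += 1
        if tid == TEXT:
            return next(texts)
        return (tag_names[tid], [build() for _ in range(n)])

    return build()


doc = ("html", [("body", [("p", ["hello"]), ("p", ["world"])])])
encoded = encode(doc)
restored = decode(*encoded)
```

In this sketch the repeated `p` tag appears only once in `tag_names`, and a structural traversal needs only the flat integer arrays rather than a tree of objects, which is the kind of property the abstract attributes to the encoding.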
Additional Information
Citation:
Daniel Rocco, David Buttler, Ling Liu, "Page Digest for Large-Scale Web Services," 2003 IEEE International Conference on E-Commerce Technology (CEC'03), p. 381, 2003.