Proceedings. 20th International Conference on Data Engineering
Download PDF

Abstract

Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, web log mining for access prediction, and user click stream mining for personalization. Among various statistics, computing quantile summary is probably most challenging because of its complexity. In this paper, we study the problem of continuously maintaining quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of ∊N. We developed a space efficient algorithm for pre-defined N that requires only one scan of the input data stream and O({{\log ( \in ^2 N)} \over \in } + {1 \over { \in ^2 }}) space in the worst cases. We also developed an algorithm that maintains quantile summaries for most recent N elements so that quantile queries on any most recent n elements (n ≤ N) can be answered with a guaranteed precision of ∊n. The worst case space requirement for this algorithm is only O({{\log ^2 ( \in N)} \over { \in ^2 }}). Our performance study indicated that not only the actual quantile estimation error is far below the guaranteed precision but the space requirement is also much less than the given theoretical bound.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles