Abstract
We study the problem of identifying items with heavy weights in the sliding window of a weighted data stream. We give a deterministic algorithm that solves the problem within error bound ε, uses \(O(\frac{R}{\epsilon})\) space and supports \(O(\frac{R}{\epsilon})\) query and update times. Here, R is the maximum item weight. We also show that the space can be reduced substantially in practice by showing for any c > 0, we can construct an \(O(\frac{c \log R}{\epsilon^2})\)-space algorithm, which returns correct answers provided that the ratio between the total weights of any two adjacent sliding windows is not greater than c. We also give a randomized algorithm that solves the problem with success probability 1 − δ using \(O(\frac{1}{\epsilon^2}\log R \log D \log \frac{\log D}{\delta\epsilon})\) space where D is the number of distinct items in the data stream.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows. In: Symposium on Principles of Database Systems (PODS), pp. 286–296 (2004)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms 55(1), 58–75 (2005)
Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 635–644 (2002)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: European Symposium on Algorithms (ESA), pp. 348–360 (2002)
Ganguly, S., Majumder, A.: CR-precis: A deterministic summary structure for update data streams. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614, pp. 48–59. Springer, Heidelberg (2007)
Gibbons, P.B., Tirthapura, S.: Distributed streams algorithms for sliding windows. In: ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 63–72 (2002)
Golab, L., DeHaan, D., López-Ortiz, A., Demaine, E.D.: Finding frequent items in sliding windows with multinomially-distributed item frequencies. In: International Conference on Scientific and Statistical Database Management (SSDBM), pp. 425–426 (2004)
Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS) 28(1), 51–55 (2003)
Lee, L.K., Ting, H.F.: Maintaining significant stream statistics over sliding windows. In: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms (SODA), pp. 724–732 (2006)
Lee, L.K., Ting, H.F.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Symposium on Principles of Database Systems (PODS), pp. 290–297 (2006)
Lin, X., Lu, H., Xu, J., Yu, J.X.: Continuously maintaining quantile summaries of the most recent N elements over a data stream. In: International Conference on Data Engineering (ICDE), pp. 362–374 (2004)
Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Very Large Data Bases (VLDB) Conference, pp. 346–357 (2002)
Misra, J., Gries, D.: Finding repeated elements. Science of Computer Programming 2, 143–152 (1982)
Xu, J., Lin, X., Zhou, X.: Space Efficient Quantile Summary for Constrained Sliding Windows on a Data Stream. In: Li, Q., Wang, G., Feng, L. (eds.) WAIM 2004. LNCS, vol. 3129, pp. 34–44. Springer, Heidelberg (2004)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hung, R.Y.S., Ting, H.F. (2008). Finding Heavy Hitters over the Sliding Window of a Weighted Data Stream. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds) LATIN 2008: Theoretical Informatics. LATIN 2008. Lecture Notes in Computer Science, vol 4957. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78773-0_60
Download citation
DOI: https://doi.org/10.1007/978-3-540-78773-0_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78772-3
Online ISBN: 978-3-540-78773-0
eBook Packages: Computer ScienceComputer Science (R0)