Abstract
Detecting anomaly in data streams attracts great attention in both academic and industry communities due to its wide range application in venture analysis, network monitoring, trend analysis and so on. However, existing methods on anomaly detection suffer three problems. 1) A large number of false positive results are generated. 2) Training data are needed to build the detection model, and an appropriate time window size along with corresponding threshold has to be set empirically. 3) Both time and space overhead is usually very high. To address these limitations. We propose a fractal-model-based approach to detection of anomalies that change underlying data distribution in this paper. Both a history-based algorithm and a parameter-free algorithm are introduced. We show that the later method consumes only limited memory and does not involve any training process. Theoretical analyses of the algorithm are presented. The experimental results on real life data sets indicate that, compared with existing anomaly detection methods, our algorithm can achieve higher precision with less space and time complexity.
Similar content being viewed by others
References
Arasu, A, Babcock, B, Babu, S, Datar, M, Ito, K, Nishizawa, I, Rosenstein, J, Widom, J: Stream: The stanford stream data manager. IEEE Data Eng. Bull. 26 (1), 19–26 (2003)
Barnsley, M: Fractals Everywhere. Academic Press, New York (1988)
Barnsley, M F: Fractal functions and interpolation. Constr. Approx. 2, 303–329 (1986)
Barnsley, M F, Demko, S G: Iterated function schemes and the global construction of fractals. In: Proceedings of the Royal Society (1985)
Barnsley, M F, Elton, J H, Hardin, DP: Recurrent iterated function systems. In: Constructive Approximation (1989)
Ben-David, S, Gehrke, J, Kifer, D: Detecting change in data streams. In: Proceedings of VLDB (2004)
Borgnat, P, Flandrin, P, Amblard, P O: Stochastic discrete scale invariance. IEEE Signal Process. Lett. 9 (6), 181–184 (2002)
Bulut, A, Singh, AK: A unified framework for monitoring data streams in real time. In: Proceedings of ICDE (2005)
Cai, Y D, Clutter, D, Pape, G, Han, J, Welge, M, Auvil, L: Maids: Mining alarming incidents from data streams. In: Proceedings of SIGMOD (2004)
Carney, D, Cetintemel, U, Cherniack, M, Convey, C, Lee, S, Seidman, G, Stonebraker, M, Tatbul, N, Zdonik, SB: Monitoring streams - a new class of data management applications. In: Proceedings of VLDB (2002)
Chandrasekaran, S, Deshpande, A, Franklin, M, Hellerstein, J, Hong, W, Krishnamurthy, S, Madden, S, Raman, V, Reiss, F, Shah, M: Telegraphcq: Continuous dataflow processing for an uncertain world. In: Proceedings of CIDR (2003)
Chen, Y, Dong, G, Han, J, Wah, B W, Wang, J: Multi-dimensional regression analysis of time-series data streams. In: Proceedings of VLDB (2002)
Cormode, G, Muthukrishnan, S: What’s new: Finding significant differences in network data streams. In: Proceedings of INFOCOM (2004)
Forte, B, Vrscay, E R: Solving the inverse problem for measures using iterated function systems: A new approach. Adv. Appl. Probab. 27 (3), 800–820 (1995)
Hart, J C: Fractal image compression and the inverse problem of recurrent iterated function systems. In: Proceedings of SIGGRAPH (1994)
Hart, J C: Fractal image compression and recurrent iterated function systems. IEEE Comput. Graph. Appl. 16 (4), 25–33 (1996)
Hartenstein, H, Ruhl, M, Saupe, D, Vrscay, E R: Book chapters: On the inverse problem of fractal compression. Ergodic Theory Anal. Efficient Simul. Dyn. Syst., 617–647 (2001)
Kleinberg, J: Bursty and hierarchical structure in streams. In: Proceedings of SIGKDD (2002)
Krishnamurthy, B, Sen, S, Zhang, Y, Chen, Y: Sketch-based change detection: Methods, evaluation and applications. In: Proceedings of IMC (2003)
Mandlebrot, B B: The Fractal Geometry of Nature. Freeman, New York (1982)
Mazel, D S, Hayes, M H: Using iterated function systems to model discrete sequences. IEEE Trans. Signal Process. 40 (7), 1724–1734 (1992)
Mitra, S K, Murthy, C A, Kundu, M K, Bhattacharya, B B: Fractal image compression using iterated function system with probabilities. In: IEEE International Conference on Information Technology: Coding and Computing (2001)
Muthukrishnan, S, Shah, R, Vitter, JS: Mining deviants in time series data streams. In: Proceedings of SSDBM (2004)
O’Rourke, J: An on-line algorithm for fitting straight lines between data ranges. Commun. ACM 24 (9), 574–578 (1981). doi:10.1145/358746.358758
Qin, S, Qian, W, Zhou, A: Approximately processing multi-granularity aggregate queries over data streams. In: Proceedings of ICDE (2006)
Shahabi, C, Tian, X, Zhao, W: Tsa-tree: A wavelet-based approach to improve the efficiency of multi-level surprise and tend queries on time-series data. In: Proceedings of SSDBM (2000)
Sullivan, M, Heybey, A: Tribeca: A system for mananging large database of network traffic. In: USENIX (1998)
Vlachos, M, Meek, C, Vagena, Z, Gunopulos, D: Identifying similarities periodicities and bursts for online search queries. In: Proceedings of SIGMOD (2004)
Wadstromer, N: An automatization of barnsley’s algorithm for the inverse problem of iterated function systems. IEEE Trans. Image Process. 12 (11), 1388–1397 (2003)
Website: Internet traffic archive. http://ita.ee.lbl.gov/ (1989)
Wu, X, Barbara, D: Using fractals to compress real data sets: Is it feasible? In: Proceedings of SIGKDD (2003)
Zhou, A, Qin, S, Qian, W: Adaptively detecting aggregation bursts in data streams. In: Proceedings of DASFAA (2005)
Zhu, Y, Shasha, D: Statstream: Statistical monitoring of thousands of data streams in real time. In: Proceedings of VLDB (2002)
Zhu, Y, Shasha, D: Efficient elastic burst detection in data streams. In: Proceedings of SIGKDD (2003)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zhang, R., Zhou, M., Gong, X. et al. Detecting anomaly in data streams by fractal model. World Wide Web 18, 1419–1441 (2015). https://doi.org/10.1007/s11280-014-0296-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-014-0296-y