Abstract
In this paper, we consider the problem of tracking multiple quantiles of dynamically varying data stream distributions. The method is based on making incremental updates of the quantile estimates every time a new sample is received. The method is memory and computationally efficient since it only stores one value for each quantile estimate and only performs one operation per quantile estimate when a new sample is received from the data stream. The estimates are realistic in the sense that the monotone property of quantiles is satisfied in every iteration. Experiments show that the method efficiently tracks multiple quantiles and outperforms state-of-the-art methods.
Similar content being viewed by others
References
Abbasi B, Guillen M (2013) Bootstrap control charts in monitoring value at risk in insurance. Expert Syst Appl 40(15):6125–6135
Alkhamees N, Fasli M (2016) Event detection from social network streams using frequent pattern mining with dynamic support values. In: 2016 IEEE international conference on big data (big data). IEEE, pp 1670–1679
Arasu A, Manku GS (2004) Approximate counts and quantiles over sliding windows. In: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 286–296
Cao J, Li L, Chen A, Bu T (2010) Tracking quantiles of network data streams with dynamic operations. In: INFOCOM, 2010 proceedings IEEE. IEEE, pp 1–5
Cao J, Li LE, Chen A, Bu T (2009) Incremental tracking of multiple quantiles for network monitoring in cellular networks. In: Proceedings of the 1st ACM workshop on mobile internet through cellular networks. ACM, pp 7–12
Chen F, Lambert D, Pinheiro JC (2000) Incremental quantile estimation for massive tracking. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 516–522
Choi B-Y, Moon S, Cruz R, Zhang Z-L, Diot C (2007) Quantile sampling for practical delay monitoring in internet backbone networks. Comput Netw 51(10):2701–2716
Datar M, Gionis A, Indyk P, Motwani R (2002) Maintaining stream statistics over sliding windows. SIAM J Comput 31(6):1794–1813
Fang Y, Zhang H, Ye Y, Li X (2014) Detecting hot topics from twitter: a multiview approach. J Inf Sci 40(5):578–593
Gilbert AC, Guha S, Indyk P, Kotidis Y, Muthukrishnan S, Strauss MJ (2002) Fast, small-space algorithms for approximate histogram maintenance. In: Proceedings of the thiry-fourth annual ACM symposium on theory of computing. ACM, pp 389–398
Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss MJ (2002) How to summarize the universe: dynamic maintenance of quantiles. In: Proceedings of the 28th Kotidisinternational conference on very large data bases. VLDB Endowment, pp 454–465
Gilli M et al (2006) An application of extreme value theory for measuring financial risk. Comput Econ 27(2–3):207–228
Hammer HL, Yazidi A (2017) Incremental quantiles estimators for tracking multiple quantiles. In: International conference on industrial, engineering and other applications of applied intelligent systems. Springer, pp 202–210
Hammer HL, Yazidi A (2018) A novel incremental quantile estimator using the magnitude of the observations. In: 2018 26th Mediterranean conference on control and automation (MED). IEEE, pp 290–295
Hammer HL, Yazidi A, John OB (2018) On the classification of dynamical data streams using novel anti-Bayesian techniques. Pattern Recognit 76:108–124
Hammer HL, Yazidi A, Rue H (2018) A new quantile tracking algorithm using a generalized exponentially weighted average of observations. Appl Intell. https://doi.org/10.1007/s10489-018-1335-7
Lin X, Lu H, Xu J, Yu JX (2004) Continuously maintaining quantile summaries of the most recent n elements over a data stream. In: Proceedings of the 20th international conference on data engineering, 2004. IEEE, pp 362–373
Luo G, Wang L, Yi K, Cormode G (2016) Quantiles over data streams: experimental comparisons, new analyses, and further improvements. VLDB J 25(4):1–24
Ma Q, Muthukrishnan S, Sandler M (2013) Frugal streaming for estimating quantiles. In: Brodnik A, López-Ortiz A, Raman V, Viola A (eds) Space-efficient data structures, streams, and algorithms. Springer, Berlin, pp 77–96
Norman MF (1972) Markov processes and learning models, vol 84. Academic Press, New York
Sommers J, Barford P, Duffield N, Ron A (2007) Accurate and efficient SLA compliance monitoring. In: ACM SIGCOMM computer communication review, vol 37. ACM, pp 109–120
Sommers J, Barford P, Duffield N, Ron A (2010) Multiobjective monitoring for SLA compliance. IEEE/ACM Trans Netw (TON) 18(2):652–665
Song G, Ye Y, Zhang H, Xu X, Lau RYK, Liu F (2016) Dynamic clustering forest: an ensemble framework to efficiently classify textual data stream with concept drift. Inf Sci 357:125–143
Stahl V, Fischer A, Bippus R (2000) Quantile based noise estimation for spectral subtraction and wiener filtering. In: 2000 IEEE international conference on acoustics, speech, and signal processing, 2000. ICASSP’00, vol 3. IEEE, pp 1875–1878
Tierney L (1983) A space-efficient recursive procedure for estimating a quantile of an unknown distribution. SIAM J Sci Stat Comput 4(4):706–711
Tiwari N, Pandey PC (2018) A technique with low memory and computational requirements for dynamic tracking of quantiles. J Signal Process Syst. https://doi.org/10.1007/s11265-017-1327-6
Yazidi A, Hammer HL (2017) Multiplicative update methods for incremental quantile estimation. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2017.2779140
Zhang L, Guan Y (2008) Detecting click fraud in pay-per-click streams of online advertising networks. In: The 28th international conference on distributed computing systems, 2008. ICDCS’08. IEEE, pp 77–84
Zhang X, Alexander L, Hegerl GC, Jones P, Tank AK, Peterson TC, Trewin B, Zwiers FW (2011) Indices for monitoring changes in extremes based on daily temperature and precipitation data. Wiley Interdiscip Rev Clim Change 2(6):851–870
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proof of Theorem 1
Appendix: Proof of Theorem 1
We will first present a theorem due to Norman [20] that will be used to prove Theorem 1. Norman [20] studied distance “diminishing models”. The convergence of \({\widehat{Q}}_n(q_k)\) to \(Q(q_k)\) is a consequence of this theorem.
Theorem 2
Letx(t) be a stationary Markov process dependent on a constant parameter\(\theta \in [0,1]\). Each\(x(t) \in I\), whereIis a subset of the real line. Let\(\delta x(t)=x(t+1)-x(t)\). The following are assumed to hold:
- 1.
I is compact
- 2.
\(E [\delta x(t) | x(t)=y]= \theta w(y)+ O(\theta ^2)\)
- 3.
\(Var [\delta x(t) | x(t)=y]= \theta ^2 s(y)+ O(\theta ^2)\)
- 4.
\(E [\delta x(t)^3 | x(t)=y]= O(\theta ^3)\)where\(sup_{y \in I} \frac{O(\theta ^k)}{\theta ^k}< \infty\)for\(k=2,3\)and\(sup_{y \in I} \frac{o(\theta ^2)}{\theta ^2} \rightarrow 0\) as \(\theta \rightarrow 0\)
- 5.
w(y) has a Lipschitz derivative inI
- 6.
s(y) is LipschitzI.
If Assumptions 1 to 6 above hold,w(y) has a unique root\(y^*\)inIand\(\frac{d w}{d y} \bigg |_{y=y^*} \le 0\)then
- 1.
\(var [\delta x(t) | x(0)=x]=O(\theta )\)uniformly for all\(x \in I\)and\(t \ge 0\). For any\(x \in I\), the differential equation\(\frac{d y(\tau )}{d \tau }=w(y(t))\)has a unique solution\(y(\tau )=y(\tau ,x)\)with\(y(0)=x\)and\(E [\delta x(t) | x(0)=x]=y( t \theta )+O(\theta )\)uniformly for all\(x \in I\) and \(t \ge 0\).
- 2.
\(\frac{x(t)-y(t \theta )}{\sqrt{\theta }}\)has a normal distribution with zero mean and finite variance as\(\theta \rightarrow 0\)and\(t \theta \rightarrow \infty\).
Having presented Theorem 2, we are now ready to prove Theorem 1 by resorting to Theorem 2. This is the main result of this paper. We will prove the convergence of \({\widehat{Q}}_n(q_k)\) for \(k=2,3,\ldots ,K-1\) below. The proof for \({\widehat{Q}}_n(q_1)\) and \({\widehat{Q}}_n(q_K)\) can be done in the same manner and are not shown in this paper for the sake of brevity.
Proof
We now start by showing that the Markov process based on the updating rules (9)–(11) satisfies the assumptions 1 to 6 in Theorem 2.
We now let \(\theta = \beta\), \(y = {\widehat{Q}}_{n}(q_{k})\) and \(w\left( {\widehat{Q}}_{n}(q_{k})\right)\) be equal to “everything” in (13) except \(\beta\). It is easy to see that assumption 2 in Theorem 2 is satisfied. Next, we turn to assumption 5 which requires that \(w\left( {\widehat{Q}}_{n}(q_{k})\right)\) has a Lipschitz derivative with respect to \({\widehat{Q}}_{n}(q_{k})\). Unfortunately it is not obvious that this is satisfied since H has a discontinuous derivative with respect to \({\widehat{Q}}_{n}(q_{k})\) due to the min-function in (6). To show that both assumptions 2 and 5 are satisfied, we need to perform a subtle modification of (13) as follows. A typical example of H as a function of \({\widehat{Q}}_{n}(q_{k})\) is shown as the black curve in Fig. 7. We define a function \(H^{*}\) that is equal to H except for an the interval \([Q^{*} - \delta , Q^{*} + \delta ]\) (see Fig. 7). In the interval \([Q^{*} - \delta , Q^{*} + \delta ]\), \(H^{*}\) is a function that is smaller then H (to satisfy the monotone property) and has a Lipschitz derivative. This requires that \(H^{*}\) satisfy the following
i.e. that the function value and the derivative must be equal for H and \(H^{*}\) in \(Q^{*} - \delta\) and \(Q^{*} + \delta\). It is straightforward to satisfy these criteria, e.g. by fitting a polynomial. \(H^{*}\) is illustrated as the grey curve in Fig. 7 (and is equal to H outside the interval). By reducing the value of \(\delta\), \(H^{*}\) will be more and more similar to H. In other words, there exists always a \(\delta\) such that
which means that we can write
Substituting (14) into (13) we obtain
Since \(H^{*}\) has a Lipschitz derivative, we see that with
both assumptions 2 and 5 are satisfied.
Next we turn to assumption 3.
We see that assumption 3 is satisfied with \(s\left( {\widehat{Q}}_{n}(q_{k})\right)\) equal to everything in (16) except \(\beta ^2\). Since H is Lipschitz (has a bounded derivative), \(s\left( {\widehat{Q}}_{n}(q_{k})\right)\) is Lipschitz and assumption 6 is also satisfied. Assumption 4 can now be proved in the same manner.
We will use the results of Norman to prove the convergence. It is easy to see that \(w\left( {\widehat{Q}}_{n}(q_{k})\right)\) in (15) admits two roots \({\widehat{Q}}_n(q_k) = {F_X}^{-1}(q_k) = Q(q_k)\) and \({\widehat{Q}}_n(q_k) = 0\). By introducing an arbitrarily small lower bound \(Q_{\text {min}}>0\) on estimate \({\widehat{Q}}_{n}(q_{k})\), we can avoid the \({\widehat{Q}}_n(q_k)=0\). This is easily implemented by modifying the update rules and adding \(Q_{\text {min}}\) to the right term of Eqs. (9)–(11). Therefore, the unique root becomes \({\widehat{Q}}_n(q_k)={F_X}^{-1}(q_k) = Q(q_k)\).
We now differentiate to get
We substitute the unique root \(Q(q_k)\) for \({\widehat{Q}}_{n}(q_{k})\) and get
This gives
and
Consequently
□
Rights and permissions
About this article
Cite this article
Hammer, H.L., Yazidi, A. & Rue, H. Tracking of multiple quantiles in dynamically varying data streams. Pattern Anal Applic 23, 225–237 (2020). https://doi.org/10.1007/s10044-019-00778-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-019-00778-3