Abstract
MPdist is a distance measure that considers two time series similar if they share many similar subsequences. However, computing MPdist can be slow, especially for large time series. We propose a technique for approximately computing MPdist that uses the SAX representation of the time series to quickly estimate the Nearest Neighbor (NN) distance of each subsequence, and then applies a Machine Learning model to compensate for the resulting loss of accuracy. Our method is orders of magnitude faster than the exact computation of MPdist, while our best approximation estimates the NN distances of a time series with high accuracy. We provide a thorough evaluation of our technique.
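The abstract does not say exactly how the SAX estimate or the correction model is built, so the following is only a minimal pure-Python sketch of the general idea, under stated assumptions. Exact MPdist follows the usual definition: z-normalized Euclidean distance, NN distances from A to B and from B to A joined into one profile `P_ABBA`, and the k-th smallest value selected with a 5% threshold (the exact k-selection convention here is an assumption). The SAX-based NN estimate uses the classic SAX `MINDIST` lower bound as a hypothetical stand-in for the authors' estimator, and `fit_linear` is a hypothetical least-squares stand-in for whichever Machine Learning model is actually used. All function names are illustrative, not the authors' code.

```python
import bisect
import math
import random
from statistics import NormalDist


def znorm(x):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]


def subsequences(t, m):
    """All z-normalized subsequences of length m (sliding window, step 1)."""
    return [znorm(t[i:i + m]) for i in range(len(t) - m + 1)]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_mpdist(p_ab, p_ba, n_a, n_b, threshold=0.05):
    """Return the k-th smallest value of the joined profile P_ABBA (assumed k convention)."""
    p_abba = sorted(p_ab + p_ba)
    k = min(math.ceil(threshold * (n_a + n_b)), len(p_abba) - 1)
    return p_abba[k]


def mpdist_exact(ta, tb, m):
    """Brute-force MPdist; also returns the joined NN-distance profile."""
    sa, sb = subsequences(ta, m), subsequences(tb, m)
    p_ab = [min(euclid(s, q) for q in sb) for s in sa]  # NN distance in B of each A subsequence
    p_ba = [min(euclid(s, q) for q in sa) for s in sb]  # NN distance in A of each B subsequence
    return select_mpdist(p_ab, p_ba, len(ta), len(tb)), p_ab + p_ba


def sax(z, w, cuts):
    """PAA with w equal segments (w must divide len(z)), then one symbol per segment."""
    seg = len(z) // w
    return [bisect.bisect_right(cuts, sum(z[i * seg:(i + 1) * seg]) / seg) for i in range(w)]


def mindist(p, q, m, cuts):
    """SAX MINDIST; lower-bounds the Euclidean distance of the underlying subsequences."""
    cell = [0.0 if abs(r - c) <= 1 else cuts[max(r, c) - 1] - cuts[min(r, c)]
            for r, c in zip(p, q)]
    return math.sqrt(m / len(p)) * math.sqrt(sum(d * d for d in cell))


def mpdist_sax(ta, tb, m, w=4, a=8):
    """Hypothetical SAX-based estimate: NN distances from MINDIST instead of raw values."""
    cuts = [NormalDist().inv_cdf(j / a) for j in range(1, a)]  # Gaussian breakpoints
    wa = [sax(s, w, cuts) for s in subsequences(ta, m)]
    wb = [sax(s, w, cuts) for s in subsequences(tb, m)]
    p_ab = [min(mindist(p, q, m, cuts) for q in wb) for p in wa]
    p_ba = [min(mindist(p, q, m, cuts) for q in wa) for p in wb]
    return select_mpdist(p_ab, p_ba, len(ta), len(tb)), p_ab + p_ba


def fit_linear(xs, ys):
    """Least-squares fit y ~ slope * x + intercept; a stand-in for the correction model."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: slope * x + (my - slope * mx)


if __name__ == "__main__":
    random.seed(0)
    ta = [math.sin(i / 5) + random.gauss(0, 0.1) for i in range(120)]
    tb = [math.sin(i / 5 + 1) + random.gauss(0, 0.1) for i in range(120)]
    exact, p_exact = mpdist_exact(ta, tb, 16)
    approx, p_sax = mpdist_sax(ta, tb, 16)
    correct = fit_linear(p_sax, p_exact)  # illustration only: trained on the same pair
    print(exact, approx, correct(approx))
```

Because `MINDIST` never exceeds the true Euclidean distance, every SAX-based NN estimate is a lower bound on the exact one, and so is the resulting MPdist estimate. This systematic underestimation is the kind of accuracy loss a learned correction can compensate for. In practice such a model would be trained on pairs whose exact profiles are known and then applied to new pairs, rather than fitted on the same pair as in the usage lines above.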
Notes
1. The source code and test data are available upon request.
Acknowledgments
This research was partly funded by the SODASENSE project (https://sodasense.uop.gr/) under grant agreement No. MIS 5060275 (co-financed by Greece and the EU through the European Regional Development Fund).
Ethics declarations
Disclosure of Interests
The authors have no conflicts of interest to declare.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tsoukalos, M., Chronis, P., Platis, N., Vassilakis, C. (2025). Estimating MPdist with SAX and Machine Learning. In: Tekli, J., et al. New Trends in Database and Information Systems. ADBIS 2024. Communications in Computer and Information Science, vol 2186. Springer, Cham. https://doi.org/10.1007/978-3-031-70421-5_5
DOI: https://doi.org/10.1007/978-3-031-70421-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70420-8
Online ISBN: 978-3-031-70421-5
eBook Packages: Computer Science, Computer Science (R0)