Abstract
Time series are sequences of data indexed by time. Such data are collected in various domains, often in massive amounts, such that storing them proves challenging. Thus, time series are commonly stored in a compressed format. An important compression approach is piecewise linear approximation (PLA), which only keeps a small set of time points and interpolates the remainder linearly. Picking a subset of time points such that the PLA minimizes the mean squared error to the original time series is a challenging task, naturally lending itself to heuristics. We propose the piecewise linear approximation genetic algorithm (PLA-GA) for compressing time series by PLA. The PLA-GA is a memetic \((\mu + \lambda )\) GA that makes use of two distinct operators tailored to time series compression. First, we add special individuals to the initial population that are derived using established PLA heuristics. Second, we propose a novel local search operator that greedily improves a compressed time series. We compare the PLA-GA empirically with existing evolutionary approaches and with a deterministic PLA algorithm, known as Bellman’s algorithm, that is optimal for the restricted setting of sampling. In both cases, the PLA-GA approximates the original time series better and quicker. Further, it drastically outperforms Bellman’s algorithm with increasing instance size with respect to run time until finding a solution of equal or better quality – we observe speed-up factors between 7 and 100 for instances of 90, 000 to 100, 000 data points.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
Bellman’s algorithm was modified to calculate the segment errors incrementally on the fly. This reduced the memory complexity from \(\mathcal {O}(n^2)\) to \(\mathcal {O}(mn)\), allowing us to run larger experiments. This modification does not affect the asymptotic run time.
References
Agrawal, R., Faloutsos, C., Swami, A.: Efficient similarity search in sequence databases. In: Proceedings of FODO 1993, pp. 69–84 (1993)
Aruoba, S.B., Fernández-Villaverde, J.: A comparison of programming languages in economics. Working Paper 20263, National Bureau of Economic Research (2014)
Baragona, R., Battaglia, F., Poli, I.: Evolutionary Statistical Procedures. An evolutionary computation approach to statistical procedures designs and applications. Springer (2011).https://doi.org/10.1007/978-3-6SD42-16218-3
Bellman, R., Kotkin, B.: On the approximation of curves by line segments using dynamic programming. RAND Corporation, II. Technical report (1962)
Cucina, D., Rizzo, M., Ursu, E.: Multiple changepoint detection for periodic autoregressive models with an application to river flow analysis. Stoch. Environ. Res. Risk Assess. 33(4–6), 1137–1157 (2019)
Doerr, B., Doerr, C., Kötzing, T.: Static and self-adjusting mutation strengths for multi-valued decision variables. Algorithmica 80(5), 1732–1768 (2018)
Doerr, B., Fischer, P., Hilbert, A., Witt, C.: Detecting structural breaks in time series via genetic algorithms. Soft. Comput. 21(16), 4707–4720 (2017)
Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartograph. Int. J. Geograph. Inf. Geovis. 10(2), 112–122 (1973)
Dunning, T., Friedman, E.: Time Series Databases: New Ways to Store and Access Data. O’Reilly and Associates, Sebastopol (2014)
Durán-Rosal, A.M., Gutiérrez, P.A., Poyato, Á.C., Hervás-Martínez, C.: A hybrid dynamic exploitation barebones particle swarm optimisation algorithm for time series segmentation. Neurocomputing 353, 45–55 (2019)
Durán-Rosal, A.M., Gutiérrez, P.A., Salcedo-Sanz, S., Hervás-Martínez, C.: Dynamical memetization in coral reef optimization algorithms for optimal time series approximation. Progress in AI 8(2), 253–262 (2019)
Durán-Rosal, A.M., Peáa, P.A.G., Martínez-Estudillo, F.J., Hervás-Martínez, C.: Time series representation by a novel hybrid segmentation algorithm. In: Proceedings of HAIS 2016, vol. 9648, pp. 163–173 (2016)
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. NCS. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-44874-8
Elmeleegy, H., Elmagarmid, A.K., Cecchet, E., Aref, W.G., Zwaenepoel, W.: Online piecewise linear approximation of numerical streams with precision guarantees. PVLDB 2(1), 145–156 (2009)
Esling, P., Agon, C.: Time-series data mining. ACM Comput. Surveys (CSUR) 45(1), 1–34 (2012)
Fan, J., Yao, Q.: Nonlinear Time Series. Nonparametric and parametric methods. Springer (2008). https://doi.org/10.1007/978-0-387-69395-8
Fu, T.C.: A review on time series data mining. Eng. Appl. Artif. Intell. 24(1), 164–181 (2011)
Hollmig, G., et al.: An evaluation of combinations of lossy compression and change-detection approaches for time-series data. Inf. Syst. 65, 65–77 (2017)
Hung, N.Q.V., Jeung, H., Aberer, K.: An evaluation of model-based approaches to sensor data compression. IEEE Trans. Knowl. Data Eng. 25(11), 2434–2447 (2013)
Hurvitz, P.M.: GPS and accelerometer time stamps: proper data handling and avoiding pitfalls. In: Proceedings of ACM SIGSPATIAL 2015, pp. 94–100 (2015)
Keogh, E., Chu, S., Hart, D., Pazzani, M.: Segmenting time series: a survey and novel approach. In: Data Mining in Time Series Databases, pp. 1–21. World Scientific (2004)
Last, M., Horst, B., Abraham, K.: Data mining in time series databases. World Scientific (2004)
Lin, J.W., Liao, S.W., Leu, F.Y.: Sensor data compression using bounded error piecewise linear approximation with resolution reduction. Energies 12(13), 2523 (2019)
Lovrić, M., Milanović, M., Stamenković, M.: Algoritmic methods for segmentation of time series: an overview. JCBI 1(1), 31–53 (2014)
Marmarelis, M.G.: Efficient and robust polylinear analysis of noisy time series. arXiv preprint, CoRR abs/1704.02577 (2017)
Ramer, U.: An iterative procedure for the polygonal approximation of plane curves. Comput. Graph. Image Process. 1(3), 244–256 (1972)
Ratanamahatana, C.A., Lin, J., Gunopulos, D., Keogh, E., Vlachos, M., Das, G.: Mining time series data. In: Data Mining and Knowledge Discovery Handbook, pp. 1069–1103. Springer, Boston (2005). https://doi.org/10.1007/0-387-25465-X_51
Reiss, A., Stricker, D.: Creating and benchmarking a new dataset for physical activity monitoring. In: Proceedings of PETRA 2012, pp. 1–8 (2012)
Reiss, A., Stricker, D.: Introducing a new benchmarked dataset for activity monitoring. In: Proceedings of ISWC 2012, pp. 108–109 (2012)
Sinaie, S., Nguyen, V.P., Nguyen, C.T., Bordas, S.: Programming the material point method in Julia. Adv. Eng. Softw. 105, 17–29 (2017)
Struzik, Z.R., Siebes, A.: Wavelet transform in similarity paradigm. In: Proceedings of PAKDD, pp. 295–309 (1998)
The UCR time series classification archive. http://www.springer.com/lncs. Accessed 27 June 2020
UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 27 June 2020
Visvalingam, M., Whyatt, J.D.: Line generalisation by repeated elimination of points. Cartograph. J. 30(1), 46–51 (1993)
Welch, W.: Algorithmic complexity: three NP-hard problems in computational statistics. J. Stat. Comput. Simul. 15, 17–25 (1982)
Zhao, H., Dong, Z., Li, T., Wang, X., Pang, C.: Segmenting time series with connected lines under maximum error bound. Inf. Sci. 345, 1–8 (2016)
Zordan, D., Martínez, B., Vilajosana, I., Rossi, M.: On the performance of Lossy compression schemes for energy constrained sensor networking. ACM Trans. Sens. Netw. 11(1), 15:1–15:34 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Friedrich, T., Krejca, M.S., Lagodzinski, J.A.G., Rizzo, M., Zahn, A. (2020). Memetic Genetic Algorithms for Time Series Compression by Piecewise Linear Approximation. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science(), vol 12534. Springer, Cham. https://doi.org/10.1007/978-3-030-63836-8_49
Download citation
DOI: https://doi.org/10.1007/978-3-030-63836-8_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63835-1
Online ISBN: 978-3-030-63836-8
eBook Packages: Computer ScienceComputer Science (R0)