Abstract
Much of the world’s supply of data is in the form of time series. In the last decade, there has been an explosion of interest in mining time series data. A number of new algorithms have been introduced to classify, cluster, segment, index, discover rules in, and detect anomalies/novelties in time series. While the approaches to these problems rely on a multitude of different techniques, they all have one factor in common: they require some high-level representation of the data rather than the original raw data. These high-level representations are necessary as a feature extraction step, or simply to make the storage, transmission, and computation of massive datasets feasible. A multitude of representations have been proposed in the literature, including spectral transforms, wavelet transforms, piecewise polynomials, eigenfunctions, and symbolic mappings. This chapter gives a high-level survey of time series data mining tasks, with an emphasis on time series representations.
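As a rough illustration of what a high-level representation can look like, the sketch below reduces a raw series to the mean of each equal-width segment. This is a piecewise constant approximation, the simplest case of the piecewise polynomial family named above. It is not taken from the chapter: the function name `paa`, its parameters, and the sample data are assumptions made for this example.

```python
from typing import List, Sequence


def paa(series: Sequence[float], n_segments: int) -> List[float]:
    """Approximate a series by the mean of each of n_segments segments.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not an API defined in the chapter.
    """
    n = len(series)
    if n_segments <= 0 or n_segments > n:
        raise ValueError("n_segments must be between 1 and len(series)")
    means = []
    for i in range(n_segments):
        # Integer boundaries spread any remainder across the segments,
        # so every segment holds at least one point.
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        segment = series[start:end]
        means.append(sum(segment) / len(segment))
    return means


# Eight raw points reduced to two coefficients (made-up sample data).
raw = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 11.0, 13.0]
print(paa(raw, 2))  # [2.5, 11.5]
```

The reduced vector is far smaller than the raw series, which is what makes storing, transmitting, and computing over massive datasets feasible. The number of segments sets the trade-off between compression and fidelity.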
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Ratanamahatana, C.A., Lin, J., Gunopulos, D., Keogh, E., Vlachos, M., Das, G. (2005). Mining Time Series Data. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_51
DOI: https://doi.org/10.1007/0-387-25465-X_51
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer Science, Computer Science (R0)