Abstract
Shapelets are discriminative local patterns in time series that maximally distinguish among different classes. Instead of considering the full series, the shapelet transformation considers the presence or absence of local shapelets, which leads to high classification accuracy, easy visualization and interpretability. One limitation of existing methods is their lack of robustness. For example, search-based approaches select sampled subsequences as shapelets, and such methods intuitively may not be accurate and robust enough. Learning-based approaches learn shapelets by maximizing their discriminative ability; however, they may not preserve the basic shapes needed for visualization. In practice, shapelets are subject to various geometric transformations, such as translation, scaling, and stretching, which may confound shapelet judgement. In this paper, robust shapelet learning is proposed to solve the above problems. By learning transform-invariant representative prototypes from all training time series, rather than merely selecting samples from the sequences, each time series sample can be approximated by a combination of transformations of those prototypes, and based on this combination, samples can easily be classified into different classes. Experiments on 16 UCR time series datasets show that the performance of the proposed framework is comparable to state-of-the-art methods, while learning more representative shapelets for complex scenarios.
1 Introduction
Time series classification has attracted much attention in many applications, such as finance (e.g. stock markets), medical diagnosis (e.g. EEG and ECG), motion capture and speech recognition. Since the order of the series and the dependence between close time stamps are crucial for finding the most discriminative features and patterns, classifying time series data is difficult for algorithms. Recently, a new primitive named Shapelets has been generating increasing interest in time series classification (TSC). Shapelets are discriminative continuous snippets of the full series that maximally distinguish among different classes. Hence, shapelets can be treated as representatives of a class, and time series classification reduces to detecting the presence or absence of certain shapelets in the data.
Shapelets for time series have attracted many researchers' interest for two main reasons. First, shapelets focus on local variation rather than the global variation used by traditional algorithms, which makes them more robust to noise and suitable for early time series prediction. Second, shapelets reveal the inherent attention mechanism of the data, allowing for easier summarization and visualization and providing explanatory insights into the classification problem.
Despite the above advantages, current shapelet methods remain somewhat clumsy. We observe that samples in the same class always share a basic shape, while in practice the basic shape is subject to individual differences and undergoes various deformations, such as shift, translation, scaling and stretching. As shown in Fig. 1, the cosine curve is actually similar to the transformed cosine curve in terms of shape, despite individual noise and slight differences in phase (shift), amplitude (scaling), offset (translation) and uniform scaling (stretching).
In this paper, we introduce a new concept, the "shapelet prototype", and propose a transform-invariant robust shapelet learning framework based on dictionary learning theory. The proposed framework learns representative basic shapes from the transformed subsequences by alternating iteration. Each iteration of robust shapelet learning performs two steps: (a) in the alignment step, the best transformation operators are obtained automatically by minimizing the average least-squares error between the transformed subsequences and the current shapelet prototypes; (b) in the refinement step, the dictionaries are updated to reflect the basic shapes of the transformed training series. Figure 2 shows a real-world example from the GunPoint dataset. S1 and S2 are learned shapelet prototypes, T1–T3 are three time series from the gun class, and T4–T6 are examples from the no-gun class. Here S1 represents the action of drawing the gun from the holster, and S2 represents returning the gun to the holster, which are critical patterns for separating the two classes. Such actions are subject-dependent, however, and the corresponding patterns vary strongly across samples due to individual factors. As seen in Fig. 2, our proposed method can align probe samples to the learned prototypes and reveal the inherent knowledge of the data. Our contributions can be summarized as:
-
We propose a robust shapelet learning framework based on dictionary learning theory, which is invariant to various deformations;
-
The discovered shapelet prototypes capture intrinsic shapes that are more general and expressive;
-
Shapelet prototypes well preserve the basic shapes and hence have better interpretability.
2 Related Work
Shapelets are discriminative shapes that can be used to classify time series data effectively. Shapelet learning algorithms make classification decisions based on the presence or absence of shapelets in the representation of a time series. Existing shapelet learning methods can be categorized into search-based algorithms [7, 8, 10, 13, 14] and learning-based algorithms [4, 12]. The original search-based algorithm adopts a brute-force strategy that selects the most expressive subsequences from a large pool of candidate segments by various quality criteria, which is time-consuming [14]. To reduce the search time, several algorithms speed up the procedure by skipping similar segments, greatly reducing the number of candidates [7, 8, 10, 13]. Instead of learning the shapelets first, learning-based algorithms learn shapelets and classify time series simultaneously. Grabocka et al. proposed a classification logistic loss function to jointly learn the shapelets and a logistic regression classifier via stochastic gradient descent [4]. Search-based approaches choose existing subsequences from the training series, while learning-based approaches, since they mainly optimize classification ability, may not preserve the basic shape of the shapelets and thus may lack interpretability. In other words, the above methods lack the ability to learn transform-invariant prototypes from the training series.
Invariance to transformations is important in the time-series domain because sequences are easily distorted and show high variability due to geometric transformations. For example, scaling (amplitude) and translation (offset) invariance can benefit seasonal variations of markets under inflation, and motion capture [9], while shift invariance [15] is essential when time series share similar patterns in different phases. In addition, uniform scaling invariance is required for heartbeats measured at different sampling frequencies.
Dictionary learning has been proposed to learn a set of bases for compact representation. In dictionary learning frameworks, each sample T can be approximated by a sparse linear combination of the learned bases \(\{D_k\}_{k=1}^K\).
Such models typically employ the reconstruction error as the objective loss function, with the bases and representation coefficients optimized by alternating iteration. Dictionary learning has proven feasible and effective in image classification [6] and in signal reconstruction and representation [1, 11]. Furthermore, it has shown great power in scalable data mining, with strong interpretability and generalization capability [16].
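To make the representation concrete, the following minimal sketch (not the method of this paper; the toy data and parameter values are placeholders) learns a dictionary and sparse codes with scikit-learn and reconstructs each sample as a sparse linear combination of atoms:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy data: 50 noisy sine-like series of length 64 (placeholder values).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 64)
X = np.stack([np.sin(t + rng.normal(0, 0.1)) + 0.05 * rng.normal(size=64)
              for _ in range(50)])

# Learn K atoms D_k and sparse codes alpha such that X ~= alpha @ D.
dl = DictionaryLearning(n_components=5, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, random_state=0)
alpha = dl.fit_transform(X)   # sparse codes, shape (N, K)
D = dl.components_            # learned bases, shape (K, series length)

print("mean reconstruction error:", np.mean((X - alpha @ D) ** 2))
```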
3 Learning Transform-Invariant Shapelet Prototypes
Formulations. In this section we adopt the concept of a "prototype" D, a representative shape learned from the training samples. For a set of subsequences without transformation, the prototype is usually computed in Euclidean space:

$$\begin{aligned} D = \arg \min _{D} \sum _{i=1}^{N} \Vert T^i - D \Vert _2^2 \end{aligned}$$
While it’s not appropriate to utilize Euclidean distance by L2 norm in many cases, as we often pay more attention to the shape similarity. Even tiny operation in scaling, translation, and stretching may rapidly swamp shape similarity measured by Euclidean distance.
For subsequences with various transformations, we introduce the concept of a transformation operator \(\tau \), where \(\tau \) may be a compound of multiple transformations. An intuitive observation is that each sample subsequence is generated from the shared "prototype" dictionary bases D by a corresponding transformation operator:

$$\begin{aligned} T^i \approx \tau ^i \Big ( \sum _{k=1}^{K} \alpha ^i_k D_k \Big ) = \tau ^i ( D \alpha ^i ) \end{aligned}$$
where each sample has its specific transformation operator \(\tau ^i\). Therefore, the learned dictionary, with the effect of the transformations removed, is defined as the "prototypes" and can be estimated by minimizing the reconstruction error between the sample subsequences and the aligned reconstructions. The reformulation is as follows:

$$\begin{aligned} \min _{D,\, \{\alpha ^i\},\, \{\tau ^i\}} \sum _{i=1}^{N} \big \Vert T^i - \tau ^i ( D \alpha ^i ) \big \Vert _2^2 + \lambda \sum _{i=1}^{N} \Vert \alpha ^i \Vert _1 \end{aligned}$$
Here the prototypes D shared by all series, and the transformation operator \(\tau ^i\) and codes \(\alpha ^i\) of each sample, need to be optimized.
Transform Definition
The proposed robust shapelet learning framework aims to find the intrinsic local patterns of time series. For a prototype \(w \in R^p\), a vector ordered by time stamps, most classical linear transformations can be represented as:

$$\begin{aligned} T_i(t) = a_i\, w\Big ( \Big \lceil \frac{t}{\mu _i} \Big \rceil \Big ) + c_i \quad t = 1,\ldots ,\lceil \mu _i p \rceil \end{aligned}$$(5)

where \(a_i\) is a scaling factor, \(c_i\) is a translation factor and \(\mu _i\) is a stretching factor. Here \(\tau _i\) defines a general transformation, of which the scaling, translation and stretching operators are special cases:
-
Scaling operator: Scaling transformation describes differences in amplitude between w(t) and transformed sequence \(T_i(t)\):
$$\begin{aligned} T_i(t) = a_i w(t) \quad t = 1,\ldots ,p \end{aligned}$$(6)
-
Translation operator: Translation transformation describes a translation along y axis from w(t) to transformed sequence \(T_i(t)\):
$$\begin{aligned} T_i(t) = w(t)+c_i \quad t = 1, \ldots , p \end{aligned}$$(7)
-
Stretch operator [3]: The uniform scaling transformation is used to match sequences of different lengths. Subsequences of different lengths may require a stretching or shrinking operation due to different tempos or sampling frequencies:
$$\begin{aligned} T_i(t) = w(\left\lceil \frac{t}{\mu } \right\rceil ) \quad t = 1,\ldots ,\left\lceil \mu p \right\rceil \end{aligned}$$(8)
Note that the stretching operator defined above is motivated by the uniform scaling operation through the ceiling function [3].
Figure 3 illustrates the transformations defined above; the inverse operators can be defined analogously. A minimal sketch of the operators follows.
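The following NumPy sketch implements the three operators of Eqs. (6)–(8); the composition order in the compound operator (stretch, then scale, then translate) is our assumption, chosen to be consistent with the general form in Eq. (5):

```python
import numpy as np

def scale(w, a):
    # Scaling (Eq. 6): change the amplitude by factor a.
    return a * w

def translate(w, c):
    # Translation (Eq. 7): shift along the y axis by offset c.
    return w + c

def stretch(w, mu):
    # Stretching (Eq. 8): resample w (a NumPy array of length p) to length
    # ceil(mu * p) via the ceiling-function index T(t) = w(ceil(t / mu)).
    p = len(w)
    t = np.arange(1, int(np.ceil(mu * p)) + 1)        # 1-based time stamps
    idx = np.minimum(np.ceil(t / mu).astype(int), p)  # clamp to valid range
    return w[idx - 1]                                 # back to 0-based indexing

def transform(w, a=1.0, c=0.0, mu=1.0):
    # Compound operator tau = {a, mu, c} (assumed order: stretch, scale, translate).
    return a * stretch(w, mu) + c
```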
4 Model Inference
The optimization objective is non-convex with respect to the prototypes and the transformation operators. We adopt a coordinate descent approach that alternates between two steps in each iteration, as shown in Algorithm 1: (i) Alignment step: given the current dictionary, update the corresponding transformation operator for each sample subsequence. (ii) Refinement step: given all transformation operators, update the dictionary to reflect the basic shapes learned from the training series.
Alignment Step. Fixing the dictionary and sparse codes, i.e., D and \(\alpha \), the reconstruction sequence is determined, and we seek the optimal alignment (deformation) between the original series \(T^i\) and the reconstruction. To learn the transform parameters, we minimize the mean square error between the reconstruction sequence and the subsequence of the time series over all admissible transformations. Note that an input segment \(T^i_j\) with stretching factor \(\mu \) has a length unequal to that of the dictionary atoms, so the inverse operation \(\mu ^{-1}\) must be applied first. For a given sparse code and dictionary, the problem can be solved as:

$$\begin{aligned} \min _{\tau = \{a, \mu , c\}} \big \Vert \mu ^{-1}(T) - \big ( a\, D\alpha + c \big ) \big \Vert _2^2 \end{aligned}$$
where \(\tau = \{a,\mu ,c \}\) and the superscript and subscript are dropped for clarity. Note that a and c are continuous-valued scaling and translation factors, while \(\mu \) is a discrete stretching factor. Hence, with \(\mu \) fixed, the optimal a and c can be derived in closed form by least squares, and a grid search is then conducted over all possible \(\mu \). With the obtained parameters, the original subsequence at the transformed location is extracted, yielding the aligned sequence \(S = \tau (T) \in R^q\); a sketch of this step is given below.
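The alignment step can be sketched as follows (the grid of stretch factors, the inverse-stretch resampling, and the simple-regression solution for a and c are our illustrative assumptions):

```python
import numpy as np

def align_segment(T, recon, mu_grid=(0.5, 0.75, 1.0, 1.25, 1.5)):
    # Grid-search the discrete stretch factor mu; for each mu, inverse-stretch
    # T to the dictionary length q, then solve a, c in closed form.
    q = len(recon)
    best_err, best_tau = np.inf, None
    for mu in mu_grid:
        # Inverse stretch: sample T at positions ceil(mu * t), t = 1..q.
        t = np.arange(1, q + 1)
        idx = np.minimum(np.ceil(mu * t).astype(int), len(T)) - 1
        x = T[idx]
        # Closed-form least squares for x ~= a * recon + c.
        a = np.cov(recon, x, bias=True)[0, 1] / (np.var(recon) + 1e-12)
        c = x.mean() - a * recon.mean()
        err = np.mean((x - (a * recon + c)) ** 2)
        if err < best_err:
            best_err, best_tau = err, {"a": a, "mu": mu, "c": c}
    return best_tau
```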
Refinement Step. Fixing the transform parameters, i.e., \(\tau = \{a,\mu ,c\}\), we optimize \(\alpha \) and D. With the transform parameters \(\tau \) known, it is easy to obtain the updated segments \(\{\tau ^i(T^i)\}_{i=1}^N\), which are transformed to a fixed length. The problem then reduces to a traditional dictionary learning problem:

$$\begin{aligned} \min _{D,\, \{\alpha ^i\}} \sum _{i=1}^{N} \big \Vert \tau ^i(T^i) - D \alpha ^i \big \Vert _2^2 + \lambda \sum _{i=1}^{N} \Vert \alpha ^i \Vert _1 \end{aligned}$$
Coordinate descent is used to solve for the dictionary D and the sparse codes \(\alpha \); a sketch follows.
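A minimal sketch of this coordinate descent on the aligned segments (lam and n_iters are placeholder hyper-parameters; any standard sparse coder would do):

```python
import numpy as np
from sklearn.linear_model import Lasso

def refine(S, D, lam=0.1, n_iters=10):
    # S: aligned segments, shape (N, q); D: current dictionary, shape (K, q).
    for _ in range(n_iters):
        # Sparse coding: alpha_i = argmin ||s_i - D^T a||^2 + lam * ||a||_1.
        coder = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        A = np.stack([coder.fit(D.T, s).coef_ for s in S])  # codes, (N, K)
        # Dictionary update: least squares D = argmin ||S - A D||_F^2.
        D, *_ = np.linalg.lstsq(A, S, rcond=None)
        # Normalize atoms to unit norm to remove scale ambiguity.
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
    return D, A
```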
5 Experiment
5.1 Experiment Setting
Since there are an enormous number of shapelet candidates, we computed the average sequence and assessed discriminative ability by information gain. Shapes with low discriminative power were removed so that the pool of candidates became much smaller (similar, redundant candidates were discarded), and only valuable candidates were retained as input segments; a sketch of the information-gain criterion is shown below. In the training phase, the algorithm was initialized by subsequence matching, where we aligned each initial candidate to all possible segments and extracted the most similar projection. These segments \(T_i\) were then fed into our transform-invariant prototype learning framework, and after several iterations a transform-invariant dictionary was derived. Finally, in the testing phase, sparse representation was performed on the learned transform-invariant dictionary.
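For reference, the standard information-gain quality measure for a candidate (cf. [14]) can be computed from the candidate's distance to every training series; this sketch assumes integer class labels:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a label multiset, in bits.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(dists, labels):
    # Best split of the training set by a distance threshold on this candidate.
    order = np.argsort(dists)
    y = np.asarray(labels)[order]
    base, best = entropy(y), 0.0
    for i in range(1, len(y)):  # threshold between sorted positions i-1 and i
        weighted = (i * entropy(y[:i]) + (len(y) - i) * entropy(y[i:])) / len(y)
        best = max(best, base - weighted)
    return best
```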
The candidate selection process requires tuning several hyper-parameters, which were found by a grid search with cross-validation over the training data. The initial number of shapes was searched over \( K_1 \in \{0.05, 0.1\}*Q\) and the shapelet length over \(q \in \{0.125,0.25,0.375,0.5\}*Q\), where Q is the full series length. For the dictionary learning process, the number of atoms was \(K_2 \in \{5,10,15\}\) and the sparsity parameter \(\lambda \in \{0.1,1\}\). For efficiency, the set of deformation operators described above can be selected flexibly.
5.2 Complexity Analysis
The time complexity of shapelet prototype learning consists of two parts: initialization segment extraction and transform-invariant dictionary learning. For initialization segment extraction, we need to compute a robust shape similarity (rather than optimal transformation operators) between length-q shapelet candidates and all possible segments. We therefore adopt a Z-normalized distance over the full range of stretching factors (sketched below), at a cost of O(Cq), where C is the number of stretching factors. This complexity matches that of the Euclidean distance computation up to a constant factor.
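For concreteness, the Z-normalized distance underlying this cost is simply the Euclidean distance between z-normalized sequences; the stretching grid repeats this computation C times:

```python
import numpy as np

def znorm_dist(x, y):
    # Euclidean distance after z-normalizing both sequences, so the
    # comparison is invariant to amplitude (scaling) and offset (translation).
    zx = (x - x.mean()) / (x.std() + 1e-12)
    zy = (y - y.mean()) / (y.std() + 1e-12)
    return float(np.linalg.norm(zx - zy))
```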
For transform-invariant dictionary learning, the dominant cost of the alignment step is computing \(\tau ^i\), which takes O(CNq), covering the least-squares solution for the scaling and translation factors and the grid search over the stretching factors. The refinement step takes O(KNq), covering sparse coding and the dictionary update. Therefore, the total complexity of one call to Algorithm 1 is \(O(M((C+K)Nq))\), where M is the maximum number of iterations allowed in Algorithm 1. M can be set quite small; in practice, 20 iterations are sufficient.
5.3 Classification Accuracy
Experiments were performed on 16 commonly used UCR time series benchmarks, which can be downloaded from the UCR website [2]; information on these datasets is listed in Table 1. For a fair comparison, we used the same training and testing split for all methods, and SVM was chosen as the classifier in all experiments.
Many shapelet learning algorithms have been proposed in the last decade; Ref. [5] compared the performance of several popular ones. In this work, we compared our algorithm with three state-of-the-art baselines, IGSVM, LTS and FLAG:
-
IGSVM [7]: a shapelet-transform algorithm that uses a linear SVM as the classifier and information gain as the shapelet quality measure.
-
LTS [4]: the learning time-series shapelets algorithm, which learns the shapelets and a logistic classifier jointly and automatically.
-
FLAG [5]: learns the positions of shapelets, efficiently maximizing the ratio of projected data variances between classes via fused lasso constraints.
Table 2 shows the classification accuracy of the baselines and the proposed method. Our method is superior to IGSVM and FLAG and comparable to LTS, with even better prediction accuracy on several datasets. In addition, dictionary learning is desirable for scalable analysis as well as interpretability.
5.4 Exploratory Data Analysis
One of the strengths of using shapelets as a classification tool is that they provide an easy interpretation and summarization of the data that other classification approaches simply do not, which helps in mining its inherent structure. A key motivation of our work is to capture the intrinsic structure and knowledge behind the data in a way that is interpretable and unified.
To verify the power of transform-invariant shapelet learning, we briefly analyze a classical problem in the time series data mining domain: the Gun/NoGun motion capture dataset, which contains 100 instances per class. In the Gun class, the actors have their hands by their sides, draw a gun from a hip-mounted holster, point it at a target for approximately one second, and then return the gun to the holster and their hands to their sides. In contrast, in the NoGun class, the actors perform the same hands-down, point, hold and return motion without a gun, pointing at the target with the index finger instead. The classification problem is to distinguish between these two very similar actions. Moreover, the dataset contains instances from two actors, who differ in baseline height by about 12 in. (translation) and in motion "style", including movement range (scaling) and tempo (stretching). In summary, the challenges lie in the similarity between the two actions and the intra-class diversity caused by geometric transformations.
As shown in Fig. 4, the original top shapelet trained by [14] represents an "overshoot" phenomenon at the end of the series, corresponding to the arm being lowered back into position. However, at the beginning of the series the Gun class has a specific shape, found by the proposed method, preceding the consistent arm-raising action, because actors in the Gun class must first draw the gun from the holster. Intuitively, this is an intrinsic feature of the Gun class that distinguishes it from the NoGun class. However, owing to intra-class diversity, instances differ in offset, scale and stretching factor. DTW and Z-normalization fail to handle such variance, and thus fail to exploit the latent discriminative power of S1 under these complex transformations, whereas our transform-invariant framework achieves meaningful shapelet detection. The graphs in Fig. 4 show the best alignment of S1, obtained by projecting it through the optimal deformation factors.
Therefore, the shapelets found by the proposed method reflect the essential difference between the classes. Interestingly, as soon as an instance draws the gun from the holster, we can make an earlier judgement or prediction for the Gun/NoGun classification problem.
References
Chen, X., Du, Z., Li, J., Li, X., Zhang, H.: Compressed sensing based on dictionary learning for extracting impulse components. Sign. Process. 96, 94–109 (2014)
Chen, Y., et al.: The UCR time series classification archive, July 2015
Fu, W.C., Keogh, E., Lau, L.Y., Ratanamahatana, C.A., Wong, C.W.: Scaling and time warping in time series querying. VLDB J. 17(4), 899–921 (2008)
Grabocka, J., Schilling, N., Wistuba, M., Schmidt-Thieme, L.: Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 392–401. ACM (2014)
Hou, L., Kwok, J.T., Zurada, J.M.: Efficient learning of timeseries shapelets. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
Kong, S., Wang, D.: A dictionary learning approach for classification: separating the particularity and the commonality. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 186–199. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33718-5_14
Lines, J., Davis, L.M., Hills, J., Bagnall, A.: A shapelet transform for time series classification. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 289–297. ACM (2012)
Mueen, A., Keogh, E., Young, N.: Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1154–1162. ACM (2011)
Paparrizos, J., Gravano, L.: K-shape: efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1855–1870. ACM (2015)
Rakthanmanon, T., Keogh, E.: Fast shapelets: a scalable algorithm for discovering time series shapelets. In: Proceedings of the 2013 SIAM International Conference on Data Mining, pp. 668–676. SIAM (2013)
Rubinstein, R., Zibulevsky, M., Elad, M.: Double sparsity: learning sparse dictionaries for sparse signal approximation. IEEE Trans. Sig. Process. 58(3), 1553–1564 (2010)
Shah, M., Grabocka, J., Schilling, N., Wistuba, M., Schmidt-Thieme, L.: Learning DTW-shapelets for time-series classification. In: IKDD Conference on Data Science, p. 3 (2016)
Wistuba, M., Grabocka, J., Schmidt-Thieme, L.: Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015)
Ye, L., Keogh, E.: Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 947–956. ACM (2009)
Zhao, R., Schalk, G., Ji, Q.: Temporal pattern localization using mixed integer linear programming
Zheng, G., Yang, Y., Carbonell, J.G.: Efficient shift-invariant dictionary learning. In: KDD, pp. 2095–2104 (2016)
Acknowledgement
This work is partially supported by the NSFC under grant Nos. 61673018, 61272338 and 61703443, the Guangzhou Science and Technology Funding Committee under grant No. 201804010255, and the Guangdong Province Key Laboratory of Computer Science.