Abstract
Naïve extensions of uni-variate prediction techniques lead to an unwelcome increase in the cost of multi-variate model learning and a significant deterioration in model performance. In this paper, we first argue that (a) one can learn a more accurate forecasting model by leveraging temporal alignments among variates to quantify the importance of the recorded variates with respect to a target variate. We further argue that (b) for this purpose we need to quantify temporal correlation, not in terms of series similarity, but in terms of temporal alignments of the key “events” impacting these series. Finally, we argue that (c) while learning a temporal model using recurrence-based techniques (such as RNNs and LSTMs, even when leveraging attention strategies) is difficult and costly, we can achieve better performance by coupling simpler CNNs with an adaptive variate selection strategy. Relying on these arguments, we propose the Selego framework (Selego is a word of Latin origin meaning “selection”) for variate selection, and we experimentally evaluate the performance of the proposed approach on various forecasting models, such as LSTMs, RNNs, and CNNs, for different top-X% variate selections and different forecasting leads (how far into the future we predict) on multiple real-world datasets. Experiments show that the proposed framework can offer significant (\(90-98\%\)) drops in the number of recorded variates needed to train predictive models, while simultaneously boosting accuracy.
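To make the selection idea concrete, the following is a minimal, illustrative sketch, not the framework's actual feature extraction or alignment machinery: each candidate variate is scored by how well its detected “events” align temporally with the target's events, and only the top-X% are retained for model training. The peak-based event detector and the tolerance-windowed matching score below are placeholder choices made purely for illustration.

```python
# Illustrative sketch only: variate selection by temporal alignment of events.
# The event detector (local peaks) and the alignment score (windowed matching)
# are stand-ins, not the paper's actual feature/alignment computation.
import numpy as np
from scipy.signal import find_peaks

def events(series: np.ndarray) -> np.ndarray:
    """Detect candidate 'events' as salient local peaks of the z-normalized series."""
    z = (series - series.mean()) / (series.std() + 1e-9)
    peaks, _ = find_peaks(np.abs(z), height=1.0)
    return peaks

def alignment_score(target_ev: np.ndarray, cand_ev: np.ndarray, tol: int = 3) -> float:
    """Fraction of target events matched by some candidate event within +/- tol steps."""
    if len(target_ev) == 0 or len(cand_ev) == 0:
        return 0.0
    matched = sum(np.min(np.abs(cand_ev - t)) <= tol for t in target_ev)
    return matched / len(target_ev)

def select_top_variates(X: np.ndarray, y: np.ndarray, top_pct: float = 0.05) -> np.ndarray:
    """Rank the columns of X (one variate per column) by event alignment
    with the target series y and keep the top `top_pct` fraction."""
    target_ev = events(y)
    scores = [alignment_score(target_ev, events(X[:, j])) for j in range(X.shape[1])]
    k = max(1, int(top_pct * X.shape[1]))
    return np.argsort(scores)[::-1][:k]
```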






Notes
While the terms “variate” and “feature” are often used interchangeably, in this paper, we make a clear distinction: A “variate” is an input time series describing a time-varying property of the system being observed, whereas a “feature” is a temporal pattern extracted from a given time series and can be used to characterize that series.
Without loss of generality, in the experiments reported in Sect. 3, we consider target sets each with a single variate (i.e., \(|{\mathbb {Y}}|\) = 1).
Our source code and the public data sets used in these experiments are available.
Results presented in this paper were obtained using the NSF testbed “Chameleon: A Large-Scale Re-configurable Experimental Environment for Cloud Research”.
Since the components of the FRESH feature vector are potentially of very different scales, each component has been re-scaled to between 0 and 1 to prevent large-valued components from having undue bias in the final ranking (a minimal rescaling sketch follows these notes).
We report the best model performance across 200 epochs.
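The sketch below illustrates the per-component rescaling mentioned in the notes, assuming `F` is an (n_variates × n_components) matrix of FRESH feature vectors, one row per variate; the variable names and layout are illustrative.

```python
# Minimal sketch of the [0, 1] rescaling described in the notes above.
import numpy as np

def minmax_rescale(F: np.ndarray) -> np.ndarray:
    """Rescale each column (feature-vector component) of F to the [0, 1] range."""
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (F - lo) / span
```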
Acknowledgements
This work is partially supported by NSF#1827757 “Building Doctor’s Medicine Cabinet (BDMC): Data-Driven Services for High Performance and Sustainable Buildings”, NSF#1610282 “DataStorm: A Data Enabled System for End-to-End Disaster Planning and Response”, NSF#1633381 “BIGDATA: Discovering Context-Sensitive Impact in Complex Systems”, NSF#1909555 “pCAR: Discovering and Leveraging Plausibly Causal (p-causal) Relationships to Understand Complex Dynamic Systems”, and DOE grant “Securing Grid-interactive Efficient Buildings (GEB) through Cyber Defense and Resilient System (CYDRES)”. Part of the research was carried out using the Chameleon testbed supported by the NSF.
Responsible editor: Annalisa Appice, Sergio Escalera, Jose A. Gamez, Heike Trautmann.
Appendix—sample series and feature distributions
Fig. 7: a The target variable NDX (the NASDAQ index); b the six series best aligned with it (note that alignment of series does not necessarily imply that the series are globally similar; it only means that they show evidence of the same underlying events); c a poorly aligned series; d–k temporal distributions (time and length) of the features identified in these series (here the X-axis denotes time and the Y-axis the length of the feature identified at a particular point in time)
Fig. 8: a The target variable AAPL (the symbol for the Apple stock); b the six series best aligned with it (note that alignment of series does not necessarily imply that the series are globally similar; it only means that they show evidence of the same underlying events); c a poorly aligned series; d–k temporal distributions (time and length) of the features identified in these series (here the X-axis denotes time and the Y-axis the length of the feature identified at a particular point in time)
Fig. 9: a The target variable fuel consumption; b the six series best aligned with it (note that alignment of series does not necessarily imply that the series are globally similar; it only means that they show evidence of the same underlying events); c a poorly aligned series; d–k temporal distributions (time and length) of the features identified in these series (here the X-axis denotes time and the Y-axis the length of the feature identified at a particular point in time)
Figures 7 through 9 provide examples of target variables and the best series aligned with them based on feature distributions, along with a sample poorly aligned series. In order to better visualize the feature alignments, consecutive series (e.g., consecutive days in NASDAQ) have been concatenated, and the number of feature layers considered in these charts has been raised relative to the number of layers considered in the experiments. As we see in these figures, temporal alignment of variates does not mean that they must look similar: instead, alignment only means that the two series show evidence of being impacted by the same underlying events. In Fig. 9b, for example, we see six variates that, together, predict the fuel consumption series in Fig. 9a well. We also see that these series used for model training are temporally aligned with the target series but are not necessarily similar to it.
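As a hedged sketch of how panels d–k could be produced, each detected feature can be drawn as a point at (time of occurrence, feature length); the `features` list of (time, length) pairs is assumed to come from the multi-scale feature detector, whose construction is not shown here.

```python
# Illustrative only: scatter each detected feature at (time, length),
# matching the axes described in the captions of Figs. 7-9.
import matplotlib.pyplot as plt

def plot_feature_distribution(features, title=""):
    """Scatter plot of feature occurrence times (X) vs. feature lengths (Y)."""
    if features:
        times, lengths = zip(*features)
        plt.scatter(times, lengths, s=10)
    plt.xlabel("time")
    plt.ylabel("feature length")
    plt.title(title)
    plt.show()
```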
Cite this article
Tiwaskar, M., Garg, Y., Li, X. et al. Selego: robust variate selection for accurate time series forecasting. Data Min Knowl Disc 35, 2141–2167 (2021). https://doi.org/10.1007/s10618-021-00777-1