Abstract
Long time series forecasting (LTSF), which models relationships within long time series to predict future values, has extensive applications in domains such as weather forecasting, financial analysis, and traffic prediction. Recently, numerous transformer-based models have been developed to address the challenges of LTSF. These models employ methods such as sparse attention to alleviate the inefficiency of the attention mechanism and use decomposition architectures to enhance the predictability of the series. However, these complexity-reduction methods require additional computation, and existing series decomposition architectures overlook the random component. To overcome these limitations, this paper proposes the Frequency Enhanced Decomposed Attention Free Transformer (FEDAF). FEDAF introduces two variants of the Frequency Enhanced Attention Free Mechanism (FEAFM), namely FEAFM-s and FEAFM-c, which serve as drop-in replacements for self-attention and cross-attention. Both variants perform their calculations in the frequency domain without incurring additional cost, and the time and space complexity of FEAFM-s is \(\mathcal{O}(L \log L)\). In addition, FEDAF incorporates a time series decomposition architecture that accounts for the random component: unlike models that decompose the series only into trend and seasonal components, FEDAF also removes the random term by applying Fourier denoising. Our study quantifies data drift and shows that the proposed decomposition structure mitigates the adverse effects of such drift. Overall, FEDAF achieves superior forecasting performance compared with state-of-the-art models across various domains, including a 19.49% improvement on the Traffic dataset. Furthermore, an efficiency analysis shows that FEAFM improves space efficiency by 12.8% over the vanilla attention mechanism and time efficiency by 43.63% over other attention mechanism variants.
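As a hedged illustration of the decomposition idea described in the abstract, the Python snippet below splits a series into trend, seasonal, and random parts using a moving-average trend and Fourier denoising of the detrended series. This is a minimal sketch under our own assumptions, not the authors' FEDAF implementation; the function name `fourier_denoise_decompose` and the `window` and `keep_ratio` parameters are hypothetical choices made only for this example.

```python
import numpy as np

def fourier_denoise_decompose(x, window=25, keep_ratio=0.1):
    """Hypothetical sketch of a trend/seasonal/random decomposition with Fourier denoising.

    x          : 1-D numpy array, the input series
    window     : moving-average window used for the trend (assumed value)
    keep_ratio : fraction of largest-magnitude frequency components kept
                 as the seasonal part (assumed value)
    """
    # Trend: centered moving average with edge padding.
    pad = window // 2
    padded = np.pad(x, (pad, pad), mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")[: len(x)]

    # The detrended series still mixes seasonal and random components.
    detrended = x - trend

    # Fourier denoising: keep only the strongest frequency components as
    # "seasonal"; the remaining spectrum is treated as the random term.
    spectrum = np.fft.rfft(detrended)
    n_keep = max(1, int(len(spectrum) * keep_ratio))
    threshold = np.sort(np.abs(spectrum))[-n_keep]
    denoised_spectrum = np.where(np.abs(spectrum) >= threshold, spectrum, 0)
    seasonal = np.fft.irfft(denoised_spectrum, n=len(x))

    # Whatever remains after removing trend and seasonal is the random term.
    random_part = x - trend - seasonal
    return trend, seasonal, random_part

if __name__ == "__main__":
    t = np.arange(512)
    series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(len(t))
    trend, seasonal, noise = fourier_denoise_decompose(series)
    print(trend.shape, seasonal.shape, noise.shape)
```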
Data availability
Publicly available datasets were analyzed in this study: ETT-small can be found at: https://github.com/zhouhaoyi/ETDataset. Weather can be found at: https://www.bgc-jena.mpg.de/wetter/. Traffic can be found at: http://pems.dot.ca.gov. Exchange can be found at: https://github.com/laiguokun/multivariate-time-series-data. Electricity can be found at: https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014.
Acknowledgements
We would like to express our gratitude to Grace LI for her meticulous and diligent efforts in refining the English language in this paper. Her careful approach has made a valuable contribution to the writing of this manuscript.
Author information
Contributions
XY and HL contributed to the conception and design of the study. XH and XF organized the datasets and performed the statistical analysis. XY and XF conducted all experiments and code programming. XY wrote the first draft of the manuscript, and HL made the manuscript revision. All authors contributed to the article and approved the submitted version.
Ethics declarations
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, X., Li, H., Huang, X. et al. FEDAF: frequency enhanced decomposed attention free transformer for long time series forecasting. Neural Comput & Applic 36, 16271–16288 (2024). https://doi.org/10.1007/s00521-024-09937-y