
FEDAF: frequency enhanced decomposed attention free transformer for long time series forecasting

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Long time series forecasting (LTSF), which involves modeling relationships within long time series to predict future values, has extensive applications in domains such as weather forecasting, financial analysis, and traffic prediction. Recently, numerous Transformer-based models have been developed to address the challenges of LTSF. These models employ methods such as sparse attention to alleviate the inefficiency of the attention mechanism and adopt decomposition architectures to enhance the predictability of the series. However, these complexity-reduction methods require additional computation, and existing series decomposition architectures overlook the random components. To overcome these limitations, this paper proposes the Frequency Enhanced Decomposed Attention Free Transformer (FEDAF). FEDAF introduces two variants of the Frequency Enhanced Attention Free Mechanism (FEAFM), FEAFM-s and FEAFM-c, which serve as drop-in replacements for self-attention and cross-attention, respectively. Both variants perform their calculations in the frequency domain without incurring additional cost, and the time and space complexity of FEAFM-s is \(\mathcal{O}(L\log L)\). Additionally, FEDAF incorporates a time series decomposition architecture that accounts for random components: unlike models that decompose the series only into trend and seasonal components, FEDAF also eliminates random terms by applying Fourier denoising. Our study quantifies data drift and validates that the proposed decomposition structure mitigates its adverse effects. Overall, FEDAF delivers superior forecasting performance compared to state-of-the-art models across various domains, achieving a 19.49% improvement on the Traffic dataset in particular. Furthermore, an efficiency analysis shows that FEAFM improves space efficiency by 12.8% over the vanilla attention mechanism and time efficiency by 43.63% over other attention-mechanism variants.
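The article body is behind the paywall here, so the following is only a minimal sketch of the decomposition idea summarized in the abstract: suppress the random component with Fourier denoising (the FFT/inverse-FFT pair is what gives the \(\mathcal{O}(L\log L)\) scaling), then split the remainder into trend and seasonal parts with a moving average, as decomposition Transformers do. The function names, the keep_ratio threshold, and the moving-average kernel size are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fourier_denoise(x, keep_ratio=0.1):
    """Suppress the 'random' component by keeping only the strongest Fourier
    modes. keep_ratio is a hypothetical hyperparameter, not the paper's
    mode-selection rule; the rfft/irfft pair costs O(L log L)."""
    spec = np.fft.rfft(x)
    k = max(1, int(len(spec) * keep_ratio))   # number of modes to keep
    weak = np.argsort(np.abs(spec))[:-k]      # indices of the weakest modes
    spec[weak] = 0.0                          # zero them out (denoising)
    return np.fft.irfft(spec, n=len(x))

def decompose(x, kernel=25, keep_ratio=0.1):
    """Trend/seasonal split in the style of decomposition Transformers,
    applied to the denoised series so the random term is discarded."""
    clean = fourier_denoise(x, keep_ratio)
    pad = kernel // 2
    padded = np.pad(clean, (pad, pad), mode="edge")
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")  # moving average
    seasonal = clean - trend
    return trend, seasonal

# Toy usage: a noisy daily pattern riding on a slow upward trend.
t = np.arange(336)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(t.size)
trend, seasonal = decompose(series)
```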




Data availability

Publicly available datasets were analyzed in this study: ETT-small can be found at: https://github.com/zhouhaoyi/ETDataset. Weather can be found at: https://www.bgc-jena.mpg.de/wetter/. Traffic can be found at: http://pems.dot.ca.gov. Exchange can be found at: https://github.com/laiguokun/multivariate-time-series-data. Electricity can be found at: https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014.


Acknowledgements

We would like to express our gratitude to Grace LI for her meticulous and diligent efforts in refining the English language in this paper. Her careful approach has made a valuable contribution to the writing of this manuscript.

Author information


Contributions

XY and HL contributed to the conception and design of the study. XH and XF organized the datasets and performed the statistical analysis. XY and XF conducted all experiments and code programming. XY wrote the first draft of the manuscript, and HL made the manuscript revision. All authors contributed to the article and approved the submitted version.

Corresponding author

Correspondence to Hui Li.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, X., Li, H., Huang, X. et al. FEDAF: frequency enhanced decomposed attention free transformer for long time series forecasting. Neural Comput & Applic 36, 16271–16288 (2024). https://doi.org/10.1007/s00521-024-09937-y

