
FEDAF: frequency enhanced decomposed attention free transformer for long time series forecasting

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Long time series forecasting (LTSF), which involves modeling relationships within long time series to predict future values, has extensive applications in domains such as weather forecasting, financial analysis, and traffic prediction. Recently, numerous Transformer-based models have been developed to address the challenges of LTSF. These models employ methods such as sparse attention to alleviate the inefficiency of the attention mechanism and adopt decomposition architectures to enhance the predictability of the series. However, these complexity-reduction methods require additional computation, and existing series decomposition architectures overlook the random components. To overcome these limitations, this paper proposes the Frequency Enhanced Decomposed Attention Free Transformer (FEDAF). FEDAF introduces two variants of the Frequency Enhanced Attention Free Mechanism (FEAFM), FEAFM-s and FEAFM-c, which serve as drop-in replacements for self-attention and cross-attention, respectively. Both variants perform their calculations in the frequency domain without incurring additional cost, and the time and space complexity of FEAFM-s is \(\mathcal{O}(L\log L)\). Additionally, FEDAF incorporates a time series decomposition architecture that accounts for random components: unlike models that decompose the series only into trend and seasonal components, FEDAF also eliminates random terms by applying Fourier denoising. Our study quantifies data drift and validates that the proposed decomposition structure mitigates its adverse effects. Overall, FEDAF delivers superior forecasting performance compared to state-of-the-art models across various domains, achieving a 19.49% improvement on the Traffic dataset in particular. Furthermore, an efficiency analysis shows that FEAFM improves space efficiency by 12.8% over the vanilla attention mechanism and time efficiency by 43.63% over other attention-mechanism variants.
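The article body is behind the paywall here, so the following is only a minimal sketch of the decomposition idea summarized in the abstract: suppress the random component with Fourier denoising (the FFT/inverse-FFT pair is what gives the \(\mathcal{O}(L\log L)\) scaling), then split the remainder into trend and seasonal parts with a moving average, as decomposition Transformers do. The function names, the keep_ratio threshold, and the moving-average kernel size are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fourier_denoise(x, keep_ratio=0.1):
    """Suppress the 'random' component by keeping only the strongest Fourier
    modes. keep_ratio is a hypothetical hyperparameter, not the paper's
    mode-selection rule; the rfft/irfft pair costs O(L log L)."""
    spec = np.fft.rfft(x)
    k = max(1, int(len(spec) * keep_ratio))   # number of modes to keep
    weak = np.argsort(np.abs(spec))[:-k]      # indices of the weakest modes
    spec[weak] = 0.0                          # zero them out (denoising)
    return np.fft.irfft(spec, n=len(x))

def decompose(x, kernel=25, keep_ratio=0.1):
    """Trend/seasonal split in the style of decomposition Transformers,
    applied to the denoised series so the random term is discarded."""
    clean = fourier_denoise(x, keep_ratio)
    pad = kernel // 2
    padded = np.pad(clean, (pad, pad), mode="edge")
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")  # moving average
    seasonal = clean - trend
    return trend, seasonal

# Toy usage: a noisy daily pattern riding on a slow upward trend.
t = np.arange(336)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(t.size)
trend, seasonal = decompose(series)
```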




Data availability

Publicly available datasets were analyzed in this study: ETT-small can be found at: https://github.com/zhouhaoyi/ETDataset. Weather can be found at: https://www.bgc-jena.mpg.de/wetter/. Traffic can be found at: http://pems.dot.ca.gov. Exchange can be found at: https://github.com/laiguokun/multivariate-time-series-data. Electricity can be found at: https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014.


Acknowledgements

We would like to express our gratitude to Grace LI for her meticulous and diligent efforts in refining the English language in this paper. Her careful approach has made a valuable contribution to the writing of this manuscript.

Author information


Contributions

XY and HL contributed to the conception and design of the study. XH and XF organized the datasets and performed the statistical analysis. XY and XF conducted all experiments and code programming. XY wrote the first draft of the manuscript, and HL made the manuscript revision. All authors contributed to the article and approved the submitted version.

Corresponding author

Correspondence to Hui Li.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, X., Li, H., Huang, X. et al. FEDAF: frequency enhanced decomposed attention free transformer for long time series forecasting. Neural Comput & Applic 36, 16271–16288 (2024). https://doi.org/10.1007/s00521-024-09937-y

