Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
@article{Zhou2020InformerBE, title={Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting}, author={Haoyi Zhou and Shanghang Zhang and Jieqi Peng and Shuai Zhang and Jianxin Li and Hui Xiong and Wancai Zhang}, journal={ArXiv}, year={2020}, volume={abs/2012.07436}, url={https://api.semanticscholar.org/CorpusID:229156802} }
An efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: a ProbSparse self-attention mechanism, which achieves O(L log L) time complexity and memory usage while maintaining comparable performance on sequence dependency alignment; a self-attention distilling operation that halves cascading layer inputs to handle extremely long input sequences efficiently; and a generative-style decoder that predicts long time-series sequences in a single forward operation rather than step by step, drastically improving inference speed.
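Below is a minimal PyTorch sketch of the ProbSparse idea summarized above, not the authors' implementation: a max-minus-mean sparsity score selects roughly u = c·ln(L) "active" queries that attend to all keys, while the remaining queries fall back to the mean of the values. The constant c, the tensor shapes, and computing the score on the full score matrix (the paper estimates it from a sampled subset of keys to reach O(L log L)) are illustrative assumptions.

import math
import torch

def probsparse_attention(Q, K, V, c=5):
    # Sketch of ProbSparse self-attention (Zhou et al., 2020).
    # Q, K, V: (batch, L, d). Only the top-u queries (u ~ c * ln L), ranked by a
    # max-minus-mean sparsity score, attend to all keys; the remaining ("lazy")
    # queries are approximated by the mean of V.
    B, L, d = Q.shape
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)            # (B, L, L)
    # Sparsity measure M(q_i, K) = max_j s_ij - mean_j s_ij; the paper estimates
    # this from randomly sampled keys, it is computed exactly here for clarity.
    sparsity = scores.max(dim=-1).values - scores.mean(dim=-1)  # (B, L)
    u = max(1, min(L, int(c * math.log(L))))                   # number of active queries
    top_idx = sparsity.topk(u, dim=-1).indices                 # (B, u)
    # Lazy queries: output defaults to the mean of the values.
    out = V.mean(dim=1, keepdim=True).expand(B, L, d).clone()
    # Active queries: ordinary scaled dot-product attention over all keys.
    active_scores = torch.gather(scores, 1, top_idx.unsqueeze(-1).expand(B, u, L))
    active_out = torch.softmax(active_scores, dim=-1) @ V      # (B, u, d)
    out.scatter_(1, top_idx.unsqueeze(-1).expand(B, u, d), active_out)
    return out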
3,142 Citations
AGCNT: Adaptive Graph Convolutional Network for Transformer-based Long Sequence Time-Series Forecasting
- 2021
Computer Science, Engineering
A transformer-based model, named AGCNT, that efficiently captures correlations between sequences in the multivariate LSTF task without causing a memory bottleneck, and outperforms state-of-the-art baselines on large-scale datasets.
Halveformer: A Novel Architecture Combined with Linear Models for Long Sequences Time Series Forecasting
- 2024
Computer Science, Engineering
This paper proposes a novel architecture named Halveformer, which combines linear models with the encoder to enhance both model performance and efficiency and demonstrates that Halveformer significantly outperforms existing advanced methods.
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution
- 2023
Engineering, Computer Science
An efficient Transformer-based model, named Conformer, is proposed that differentiates itself from existing LTTF methods in three aspects, outperforms state-of-the-art methods on LTTF, and generates reliable prediction results with uncertainty quantification.
Knowledge-enhanced Transformer for Multivariate Long Sequence Time-series Forecasting
- 2024
Computer Science
A novel approach is introduced that encapsulates conceptual relationships among variables within a well-defined knowledge graph, forming dynamic and learnable knowledge graph embeddings (KGEs) for seamless integration into the transformer architecture, which improves the accuracy of multivariate LSTF by capturing complex temporal and relational dynamics across multiple domains.
InParformer: Evolutionary Decomposition Transformers with Interactive Parallel Attention for Long-Term Time Series Forecasting
- 2023
Computer Science
A novel Transformer-based forecasting model named InParformer with an Interactive Parallel Attention (InPar Attention) mechanism is proposed to learn long-range dependencies comprehensively in both frequency and time domains.
Enformer: Encoder-Based Sparse Periodic Self-Attention Time-Series Forecasting
- 2023
Computer Science
It is shown that a reasonable improvement of the Transformer structure for time-series prediction can reduce the amount of computation while maintaining accuracy.
Long Sequence Time-Series Forecasting via Gated Convolution and Temporal Attention Mechanism
- 2022
Computer Science, Engineering
This work improves Informer with a gated convolution and temporal attention mechanism, called GCTAM, and demonstrates that the method outperforms Informer on multiple real-world datasets.
Segformer: Segment-Based Transformer with Decomposition for Long-Term Series Forecasting
- 2023
Computer Science, Engineering
A Transformer-based model, Segformer, which extracts multiple components with obvious dependencies and coordinates the modeling process with the help of multi-component decomposition blocks and collaboration blocks, and offers an efficient solution to the long-term dependency modeling problem of time series.
Does Long-Term Series Forecasting Need Complex Attention and Extra Long Inputs?
- 2023
Computer Science
A lightweight Period-Attention mechanism (Periodformer), which renovates the aggregation of long-term subseries via explicit periodicity and short-term subseries via built-in proximity, and reduces the average search time while finding better hyperparameters.
Grouped self-attention mechanism for a memory-efficient Transformer
- 2022
Computer Science, Engineering
The two proposed novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA), achieve a computational space and time complexity of order $O(l)$ for a sequence length $l$ under small hyperparameter limitations, and can capture locality while considering global information.
57 References
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
- 2019
Computer Science, Engineering
First, convolutional self-attention is proposed, producing queries and keys with causal convolution so that local context can be better incorporated into the attention mechanism; second, the LogSparse Transformer is proposed, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget.
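The convolutional self-attention idea in this entry lends itself to a short sketch: queries and keys come from a causal 1-D convolution (left padding only) instead of a pointwise projection, so each position's query/key summarizes a local window of past inputs. The PyTorch module below is a hedged illustration; the kernel size and keeping a plain linear map for the values are assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class CausalQKProjection(nn.Module):
    # Queries/keys from causal convolution (sketch of convolutional
    # self-attention, Li et al., 2019); values keep the usual pointwise map.
    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, L, d_model)
        z = x.transpose(1, 2)                  # (batch, d_model, L) for Conv1d
        z = nn.functional.pad(z, (self.kernel_size - 1, 0))   # left-pad -> causal
        Q = self.q_conv(z).transpose(1, 2)     # (batch, L, d_model)
        K = self.k_conv(z).transpose(1, 2)
        V = self.v_proj(x)
        return Q, K, V                         # feed into standard attention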
A Memory-Network Based Solution for Multivariate Time-Series Forecasting
- 2018
Computer Science
A deep learning based model named Memory Time-series network (MTNet) for time series forecasting, inspired by the Memory Network proposed for solving the question-answering task, which consists of a large memory component, three separate encoders, and an autoregressive component trained jointly.
Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
- 2018
Computer Science, Engineering
A novel deep learning framework, namely the Long- and Short-term Time-series network (LSTNet), to address the open challenge of multivariate time series forecasting, using a Convolutional Neural Network to extract short-term local dependency patterns among variables and a Recurrent Neural Network to discover long-term patterns in time series trends.
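The LSTNet recipe in this entry (convolution for short-term local patterns, recurrence for long-term trends) can be sketched compactly. The PyTorch module below is an illustrative simplification that omits LSTNet's recurrent-skip and autoregressive components; the layer sizes and the one-step-ahead head are assumptions.

import torch
import torch.nn as nn

class LSTNetLite(nn.Module):
    # Simplified LSTNet-style model: Conv1d captures short-term local patterns
    # across variables, a GRU captures longer-term dynamics. The recurrent-skip
    # and autoregressive parts of the full model are omitted.
    def __init__(self, n_vars, conv_channels=32, kernel_size=6, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(n_vars, conv_channels, kernel_size)
        self.gru = nn.GRU(conv_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_vars)           # one-step-ahead forecast

    def forward(self, x):                                # x: (batch, L, n_vars)
        z = torch.relu(self.conv(x.transpose(1, 2)))     # (batch, C, L - k + 1)
        _, h = self.gru(z.transpose(1, 2))               # h: (1, batch, hidden)
        return self.head(h[-1])                          # (batch, n_vars)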
A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
- 2017
Computer Science
A dual-stage attention-based recurrent neural network (DA-RNN) that addresses the long-term temporal dependencies of the nonlinear autoregressive exogenous (NARX) model and can outperform state-of-the-art methods for time series prediction.
ARMDN: Associative and Recurrent Mixture Density Networks for eRetail Demand Forecasting
- 2018
Business, Computer Science
A neural network architecture called AR-MDN is proposed that simultaneously models associative factors, time-series trends, and the variance in demand, yielding a significant improvement in forecasting accuracy compared with existing alternatives.
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
- 2018
Computer Science
This paper proposes a simple method that improves the ability to capture long-term dependencies in RNNs by adding an unsupervised auxiliary loss to the original objective, making truncated backpropagation feasible for long sequences and also improving full BPTT.
DSTP-RNN: a dual-stage two-phase attention-based recurrent neural networks for long-term and multivariate time series prediction
- 2020
Computer Science
Long-term Forecasting using Higher Order Tensor RNNs
- 2017
Computer Science
This work theoretically establishes the approximation guarantees and the variance bound for HOT-RNN for general sequence inputs, and demonstrates 5%~12% improvements for long-term prediction over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well as on real-world time series data.
Long-term Forecasting using Tensor-Train RNNs
- 2017
Computer Science
Tensor-Train RNN (TT-RNN) is a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics, which decomposes the higher-order structure using the tensor-train (TT) decomposition to reduce the number of parameters while preserving model performance.
CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation
- 2019
Computer Science, Environmental Science
This paper is the first to adapt the self-attention mechanism to multivariate, geo-tagged time series data, with a novel approach called Cross-Dimensional Self-Attention (CDSA) that processes each dimension sequentially, yet in an order-independent manner.