Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
@article{Zhou2020InformerBE, title={Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting}, author={Haoyi Zhou and Shanghang Zhang and Jieqi Peng and Shuai Zhang and Jianxin Li and Hui Xiong and Wancai Zhang}, journal={ArXiv}, year={2020}, volume={abs/2012.07436}, url={https://api.semanticscholar.org/CorpusID:229156802} }
An efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: a ProbSparse self-attention mechanism, which achieves O(L log L) time complexity and memory usage while maintaining comparable performance on sequence dependency alignment; a self-attention distilling operation that halves cascading layer inputs to handle extremely long input sequences efficiently; and a generative-style decoder that predicts long time-series sequences in a single forward operation rather than step by step, drastically improving inference speed.
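Below is a minimal PyTorch sketch of the ProbSparse idea summarized above, not the authors' implementation: a max-minus-mean sparsity score selects roughly u = c·ln(L) "active" queries that attend to all keys, while the remaining queries fall back to the mean of the values. The constant c, the tensor shapes, and computing the score on the full score matrix (the paper estimates it from a sampled subset of keys to reach O(L log L)) are illustrative assumptions.

import math
import torch

def probsparse_attention(Q, K, V, c=5):
    # Sketch of ProbSparse self-attention (Zhou et al., 2020).
    # Q, K, V: (batch, L, d). Only the top-u queries (u ~ c * ln L), ranked by a
    # max-minus-mean sparsity score, attend to all keys; the remaining ("lazy")
    # queries are approximated by the mean of V.
    B, L, d = Q.shape
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)            # (B, L, L)
    # Sparsity measure M(q_i, K) = max_j s_ij - mean_j s_ij; the paper estimates
    # this from randomly sampled keys, it is computed exactly here for clarity.
    sparsity = scores.max(dim=-1).values - scores.mean(dim=-1)  # (B, L)
    u = max(1, min(L, int(c * math.log(L))))                   # number of active queries
    top_idx = sparsity.topk(u, dim=-1).indices                 # (B, u)
    # Lazy queries: output defaults to the mean of the values.
    out = V.mean(dim=1, keepdim=True).expand(B, L, d).clone()
    # Active queries: ordinary scaled dot-product attention over all keys.
    active_scores = torch.gather(scores, 1, top_idx.unsqueeze(-1).expand(B, u, L))
    active_out = torch.softmax(active_scores, dim=-1) @ V      # (B, u, d)
    out.scatter_(1, top_idx.unsqueeze(-1).expand(B, u, d), active_out)
    return out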
3,142 Citations
AGCNT: Adaptive Graph Convolutional Network for Transformer-based Long Sequence Time-Series Forecasting
- 2021
Computer Science, Engineering
A transformer-based model, named AGCNT, that efficiently captures correlations between sequences in the multivariate LSTF task without causing a memory bottleneck, and outperforms state-of-the-art baselines on large-scale datasets.
Halveformer: A Novel Architecture Combined with Linear Models for Long Sequences Time Series Forecasting
- 2024
Computer Science, Engineering
This paper proposes a novel architecture named Halveformer, which combines linear models with the encoder to enhance both model performance and efficiency and demonstrates that Halveformer significantly outperforms existing advanced methods.
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution
- 2023
Engineering, Computer Science
An efficient Transformer-based model, named Conformer, is proposed that differentiates itself from existing LTTF methods in three aspects, outperforms state-of-the-art methods on LTTF, and generates reliable prediction results with uncertainty quantification.
Knowledge-enhanced Transformer for Multivariate Long Sequence Time-series Forecasting
- 2024
Computer Science
A novel approach is introduced that encapsulates conceptual relationships among variables within a well-defined knowledge graph, forming dynamic and learnable knowledge graph embeddings (KGEs) for seamless integration into the transformer architecture, which improves the accuracy of multivariate LSTF by capturing complex temporal and relational dynamics across multiple domains.
InParformer: Evolutionary Decomposition Transformers with Interactive Parallel Attention for Long-Term Time Series Forecasting
- 2023
Computer Science
A novel Transformer-based forecasting model named InParformer with an Interactive Parallel Attention (InPar Attention) mechanism is proposed to learn long-range dependencies comprehensively in both frequency and time domains.
Enformer: Encoder-Based Sparse Periodic Self-Attention Time-Series Forecasting
- 2023
Computer Science
It is shown that a reasonable improvement of the Transformer structure for time-series prediction can reduce the amount of computation while maintaining accuracy.
Long Sequence Time-Series Forecasting via Gated Convolution and Temporal Attention Mechanism
- 2022
Computer Science, Engineering
This work improves Informer with a gated convolution and temporal attention mechanism, called GCTAM, and demonstrates that the method outperforms Informer on multiple real-world datasets.
Segformer: Segment-Based Transformer with Decomposition for Long-Term Series Forecasting
- 2023
Computer Science, Engineering
A Transformer-based model, Segformer, which extracts multiple components with obvious dependencies and coordinates the modeling process with the help of multi-component decomposition blocks and collaboration blocks, and offers an efficient solution to the long-term dependency modeling problem of time series.
Does Long-Term Series Forecasting Need Complex Attention and Extra Long Inputs?
- 2023
Computer Science
A lightweight Period-Attention mechanism (Periodformer), which renovates the aggregation of long-term subseries via explicit periodicity and short-term subseries via built-in proximity, and reduces the average search time while finding better hyperparameters.
Grouped self-attention mechanism for a memory-efficient Transformer
- 2022
Computer Science, Engineering
The two proposed novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA), achieve a computational space and time complexity of order $O(l)$ for a sequence length $l$ under small hyperparameter limitations, and can capture locality while considering global information.
57 References
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
- 2019
Computer Science, Engineering
First, convolutional self-attention is proposed, producing queries and keys with causal convolution so that local context can be better incorporated into the attention mechanism; second, the LogSparse Transformer is proposed, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget.
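The convolutional self-attention idea in this entry lends itself to a short sketch: queries and keys come from a causal 1-D convolution (left padding only) instead of a pointwise projection, so each position's query/key summarizes a local window of past inputs. The PyTorch module below is a hedged illustration; the kernel size and keeping a plain linear map for the values are assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class CausalQKProjection(nn.Module):
    # Queries/keys from causal convolution (sketch of convolutional
    # self-attention, Li et al., 2019); values keep the usual pointwise map.
    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, L, d_model)
        z = x.transpose(1, 2)                  # (batch, d_model, L) for Conv1d
        z = nn.functional.pad(z, (self.kernel_size - 1, 0))   # left-pad -> causal
        Q = self.q_conv(z).transpose(1, 2)     # (batch, L, d_model)
        K = self.k_conv(z).transpose(1, 2)
        V = self.v_proj(x)
        return Q, K, V                         # feed into standard attention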
A Memory-Network Based Solution for Multivariate Time-Series Forecasting
- 2018
Computer Science
A deep learning based model named Memory Time-series network (MTNet) for time series forecasting, inspired by the Memory Network proposed for solving the question-answering task, which consists of a large memory component, three separate encoders, and an autoregressive component trained jointly.
Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
- 2018
Computer Science, Engineering
A novel deep learning framework, namely the Long- and Short-term Time-series network (LSTNet), to address the open challenge of multivariate time series forecasting, using a Convolutional Neural Network to extract short-term local dependency patterns among variables and a Recurrent Neural Network to discover long-term patterns in time series trends.
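The LSTNet recipe in this entry (convolution for short-term local patterns, recurrence for long-term trends) can be sketched compactly. The PyTorch module below is an illustrative simplification that omits LSTNet's recurrent-skip and autoregressive components; the layer sizes and the one-step-ahead head are assumptions.

import torch
import torch.nn as nn

class LSTNetLite(nn.Module):
    # Simplified LSTNet-style model: Conv1d captures short-term local patterns
    # across variables, a GRU captures longer-term dynamics. The recurrent-skip
    # and autoregressive parts of the full model are omitted.
    def __init__(self, n_vars, conv_channels=32, kernel_size=6, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(n_vars, conv_channels, kernel_size)
        self.gru = nn.GRU(conv_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_vars)           # one-step-ahead forecast

    def forward(self, x):                                # x: (batch, L, n_vars)
        z = torch.relu(self.conv(x.transpose(1, 2)))     # (batch, C, L - k + 1)
        _, h = self.gru(z.transpose(1, 2))               # h: (1, batch, hidden)
        return self.head(h[-1])                          # (batch, n_vars)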
A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
- 2017
Computer Science
A dual-stage attention-based recurrent neural network (DA-RNN) that addresses the long-term temporal dependencies of the nonlinear autoregressive exogenous (NARX) model and can outperform state-of-the-art methods for time series prediction.
ARMDN: Associative and Recurrent Mixture Density Networks for eRetail Demand Forecasting
- 2018
Business, Computer Science
A neural network architecture called AR-MDN is proposed that simultaneously models associative factors, time-series trends, and the variance in demand, yielding a significant improvement in forecasting accuracy compared with existing alternatives.
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
- 2018
Computer Science
This paper proposes a simple method that improves the ability to capture long-term dependencies in RNNs by adding an unsupervised auxiliary loss to the original objective, making truncated backpropagation feasible for long sequences and also improving full BPTT.
DSTP-RNN: a dual-stage two-phase attention-based recurrent neural networks for long-term and multivariate time series prediction
- 2020
Computer Science
Long-term Forecasting using Higher Order Tensor RNNs
- 2017
Computer Science
This work theoretically establishes the approximation guarantees and the variance bound for HOT-RNN for general sequence inputs, and demonstrates 5%~12% improvements for long-term prediction over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well as on real-world time series data.
Long-term Forecasting using Tensor-Train RNNs
- 2017
Computer Science
Tensor-Train RNN (TT-RNN) is a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics, which decomposes the higher-order structure using the tensor-train (TT) decomposition to reduce the number of parameters while preserving model performance.
CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation
- 2019
Computer Science, Environmental Science
This paper is the first to adapt the self-attention mechanism to multivariate, geo-tagged time series data, with a novel approach called Cross-Dimensional Self-Attention (CDSA) that processes each dimension sequentially, yet in an order-independent manner.