The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions
@article{Hochreiter1998TheVG,
  title   = {The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions},
  author  = {Sepp Hochreiter},
  journal = {Int. J. Uncertain. Fuzziness Knowl. Based Syst.},
  year    = {1998},
  volume  = {6},
  pages   = {107-116},
  url     = {https://api.semanticscholar.org/CorpusID:18452318}
}
The decaying error flow is theoretically analyzed, methods trying to overcome vanishing gradients are briefly discussed, and experiments comparing conventional algorithms and alternative methods are presented.
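The decay the abstract refers to can be reproduced numerically. The sketch below is an illustration only, not an experiment from the paper: a tanh RNN with arbitrarily chosen size, sequence length, and weight scale, through which a unit-norm error is backpropagated so the shrinking gradient norm can be observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simple tanh RNN: h_t = tanh(W h_{t-1} + U x_t). The recurrent weight
# matrix is rescaled so its spectral norm is 0.9, the regime in which
# backpropagated error decays exponentially with the time lag.
n, T = 32, 100
W = rng.standard_normal((n, n))
W *= 0.9 / np.linalg.norm(W, 2)
U = rng.standard_normal((n, n)) / np.sqrt(n)

# Forward pass with random inputs, storing pre-activations for backprop.
h = np.zeros(n)
pre = []
for t in range(T):
    a = W @ h + U @ rng.standard_normal(n)
    pre.append(a)
    h = np.tanh(a)

# Backpropagate a unit-norm error from the last step toward the first:
# delta_{t-1} = W^T (delta_t * (1 - tanh(a_t)^2)).
delta = np.ones(n) / np.sqrt(n)
for t in reversed(range(T)):
    delta = W.T @ (delta * (1.0 - np.tanh(pre[t]) ** 2))
    if t % 20 == 0:
        print(f"step {t:3d}: |delta| = {np.linalg.norm(delta):.3e}")
```

Running this prints gradient norms that shrink by several orders of magnitude over the 100 steps, which is the behaviour the paper analyzes theoretically.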
2,356 Citations
Linear Antisymmetric Recurrent Neural Networks
- 2020
Computer Science, Mathematics
This paper suggests a new recurrent network structure called Linear Antisymmetric RNN (LARNN), based on the numerical solution of an Ordinary Differential Equation (ODE) whose stability properties yield a stable solution corresponding to long-term memory.
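As a hedged illustration of the stability idea (a toy linear recurrence, not the paper's LARNN formulation or training procedure): for an antisymmetric matrix A = M - M^T, the transition matrix exp(dt*A) of the linear ODE h' = A h is orthogonal, so the state norm, and hence the norm of error backpropagated through the recurrence, is preserved exactly.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# Antisymmetric matrix: A^T = -A, so its eigenvalues are purely imaginary
# and the one-step transition matrix expm(dt * A) is orthogonal.
n = 16
M = rng.standard_normal((n, n))
A = M - M.T

dt = 0.1
Phi = expm(dt * A)  # exact one-step transition of h' = A h

h = rng.standard_normal(n)
for steps in (0, 100, 1000):
    state = np.linalg.matrix_power(Phi, steps) @ h
    print(f"after {steps:4d} steps: |h| = {np.linalg.norm(state):.6f}")

# Backprop through this linear recurrence multiplies the error by Phi^T at
# each step; since Phi is orthogonal, the gradient norm neither vanishes
# nor explodes, regardless of the number of steps.
```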
Learning Long Term Dependencies with Recurrent Neural Networks
- 2006
Computer Science
It is shown that RNNs, and especially normalised recurrent neural networks (NRNNs), unfolded in time are indeed very capable of learning time lags of at least a hundred time steps, and it is demonstrated that the vanishing gradient problem does not apply to these networks.
Learning Longer Memory in Recurrent Neural Networks
- 2015
Computer Science
This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.
Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization
- 2020
Computer Science
An initialization scheme is introduced that pretrains the weights of a recurrent neural network to approximate the linear autoencoder of the input sequences, and it is shown how such pretraining can better support solving hard classification tasks with long sequences.
Reinforcement learning with recurrent neural networks
- 2008
Computer Science, Engineering
RNNs can accurately map and reconstruct (partially observable) Markov decision processes, and the resulting inner state of the network can be used as a basis for standard RL algorithms; this forms a novel connection between recurrent neural networks (RNN) and reinforcement learning (RL) techniques.
Backpropagation-decorrelation: online recurrent learning with O(N) complexity
- 2004
Computer Science
A new learning rule for fully recurrent neural networks is introduced which combines two important principles: one-step backpropagation of errors and the use of temporal memory in the network dynamics by means of decorrelation of activations.
Learning from Predictions: Fusing Training and Autoregressive Inference for Long-Term Spatiotemporal Forecasts
- 2023
Computer Science
The results show that BPTT-SA effectively reduces iterative error propagation in convolutional RNNs and convolutional autoencoder RNNs, and demonstrate its capabilities in long-term prediction of high-dimensional fluid flows.
Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses
- 2019
Computer Science
It is empirically shown that in (deep) HRNNs, propagating gradients back from higher to lower levels can be replaced by locally computable losses, without harming the learning capability of the network, over a wide range of tasks.
Using recurrent networks for non-temporal classification tasks
- 2014
Biology, Computer Science
This paper investigates the use of recurrent neural networks as an alternative to deep architectures and shows that, for a comparable number of parameters or complexity, replacing depth with recurrence can result in improved performance.
32 References
Learning long-term dependencies with gradient descent is difficult
- 1994
Computer Science
This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
Learning State Space Trajectories in Recurrent Neural Networks
- 1989
Computer Science
A procedure is presented for finding ∂E/∂w_ij, where E is an error functional of the temporal trajectory of the states of a continuous recurrent network and w_ij are the weights of that network; the approach seems particularly suited for temporally continuous domains.
Gradient calculations for dynamic recurrent neural networks: a survey
- 1995
Computer Science, Mathematics
The author discusses advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones and presents some "tricks of the trade" for training, using, and simulating continuous-time and recurrent neural networks.
Learning Complex, Extended Sequences Using the Principle of History Compression
- 1992
Computer Science
A simple principle for reducing the descriptions of event sequences without loss of information is introduced and this insight leads to the construction of neural architectures that learn to divide and conquer by recursively decomposing sequences.
Learning long-term dependencies in NARX recurrent neural networks
- 1996
Computer Science
It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous (NARX) recurrent neural networks, which have powerful representational capabilities.
Credit Assignment through Time: Alternatives to Backpropagation
- 1993
Computer Science, Mathematics
This work considers and compares alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled and shows performance qualitatively superior to that obtained with backpropagation.
Long Short-Term Memory
- 1997
Computer Science
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
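A minimal sketch of the mechanism named here, using the now-standard gate equations (including the forget gate, which was added after the 1997 paper) rather than the exact original formulation: the cell state is updated additively, so error flowing back along the cell path is rescaled by gate activations instead of being repeatedly squashed through weight-matrix products. All sizes and parameter values below are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. The additive update of c is the 'constant error
    carousel': error flows back through c_t = f*c_prev + i*g scaled only
    by the gates, not by repeated nonlinear weight products."""
    Wi, Wf, Wo, Wg, Ui, Uf, Uo, Ug, bi, bf, bo, bg = params
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)   # input gate
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)   # forget gate (post-1997 addition)
    o = sigmoid(Wo @ x + Uo @ h_prev + bo)   # output gate
    g = np.tanh(Wg @ x + Ug @ h_prev + bg)   # candidate cell input
    c = f * c_prev + i * g                   # additive carousel update
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random parameters (hypothetical sizes).
rng = np.random.default_rng(0)
nx, nh = 4, 8
params = ([rng.standard_normal((nh, nx)) * 0.1 for _ in range(4)]
          + [rng.standard_normal((nh, nh)) * 0.1 for _ in range(4)]
          + [np.zeros(nh) for _ in range(4)])
h, c = np.zeros(nh), np.zeros(nh)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(nx), h, c, params)
print("final |c| =", np.linalg.norm(c))
```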
Learning Sequential Structure with the Real-Time Recurrent Learning Algorithm
- 1989
Computer Science
A more powerful recurrent learning procedure, called real-time recurrent learning (RTRL), is applied to some of the same problems studied by Servan-Schreiber, Cleeremans, and McClelland; analysis of the internal representations developed by RTRL networks reveals that they learn a rich set of internal states that represent more about the past than is required by the underlying grammar.
LSTM can Solve Hard Long Time Lag Problems
- 1996
Computer Science, Mathematics
This work shows that problems used to promote various previous algorithms can be solved more quickly by random weight guessing than by the proposed algorithms, and uses LSTM, its own recent algorithm, to solve a hard problem.
Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks
- 1994
Computer Science, Engineering
These simulations suggest that recurrent controller networks trained by Kalman filter methods can combine the traditional features of state-space controllers and observers in a homogeneous architecture for nonlinear dynamical systems, while simultaneously exhibiting less sensitivity than do purely feedforward controller networks to changes in plant parameters and measurement noise.