The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions

@article{Hochreiter1998TheVG,
  title={The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions},
  author={Sepp Hochreiter},
  journal={Int. J. Uncertain. Fuzziness Knowl. Based Syst.},
  year={1998},
  volume={6},
  pages={107-116},
  url={https://api.semanticscholar.org/CorpusID:18452318}
}
  • Sepp Hochreiter
  • Published 1 April 1998
  • Computer Science
  • Int. J. Uncertain. Fuzziness Knowl. Based Syst.
The decaying error flow is theoretically analyzed, methods trying to overcome vanishing gradients are briefly discussed, and experiments comparing conventional algorithms and alternative methods are presented.
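The decay the abstract refers to can be made concrete with a short numerical sketch (a plain tanh RNN with small random weights; this is an illustration, not code from the paper): the gradient of a late hidden state with respect to an earlier one is a product of per-step Jacobians, and its norm shrinks rapidly as the time lag grows.

```python
# Minimal sketch (assumed setup, not the paper's): measure how the
# backpropagated signal dh_T/dh_(T-q) shrinks in a simple tanh RNN.
import numpy as np

rng = np.random.default_rng(0)
n, T = 20, 100
W = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)  # small recurrent weights

h = np.zeros(n)
jacobians = []
for t in range(T):
    x = rng.standard_normal(n)
    pre = W @ h + x
    h = np.tanh(pre)
    # Jacobian of h_t w.r.t. h_{t-1}: diag(1 - tanh(pre)^2) @ W
    jacobians.append(np.diag(1.0 - h**2) @ W)

# ||dh_T / dh_(T-q)|| for growing lag q: product of per-step Jacobians.
grad = np.eye(n)
for q, J in enumerate(reversed(jacobians), start=1):
    grad = grad @ J
    if q in (1, 10, 50, 100):
        print(f"q={q:3d}  ||dh_T/dh_(T-q)|| ~ {np.linalg.norm(grad):.3e}")
```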

Linear Antisymmetric Recurrent Neural Networks

This paper suggests a new recurrent network structure called Linear Antisymmetric RNN (LARNN), based on the numerical solution of an Ordinary Differential Equation (ODE) whose stability properties yield a stable solution that corresponds to long-term memory.
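One way such a structure can be realized (a sketch under generic assumptions about step size and damping, not necessarily the paper's exact formulation) is a forward-Euler step of a linear ODE whose recurrent matrix is antisymmetric, so its eigenvalues are purely imaginary and the state neither explodes nor decays.

```python
# Hedged sketch: explicit-Euler step of a linear ODE with an antisymmetric
# recurrent matrix; eps and gamma (a small damping term) are illustrative.
import numpy as np

def antisymmetric_step(h, x, M, V, b, eps=0.1, gamma=0.01):
    W = M - M.T - gamma * np.eye(M.shape[0])  # antisymmetric part + damping
    return h + eps * (W @ h + V @ x + b)      # one Euler update of the state
```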

Learning Long Term Dependencies with Recurrent Neural Networks

It is shown that RNNs and especially normalised recurrent neural networks (NRNNs) unfolded in time are indeed very capable of learning time lags of at least a hundred time steps and it is demonstrated that the problem of a vanishing gradient does not apply to these networks.

Learning Longer Memory in Recurrent Neural Networks

This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.
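A hedged sketch of the kind of modification described: alongside the ordinary hidden state, a slowly changing context state is kept whose recurrence is close to the identity, so its contribution to the gradient does not shrink over long spans. The exact parameterization below is illustrative, not the paper's code.

```python
# Sketch (assumed parameterization): simple RNN augmented with slow context units.
import numpy as np

def slow_context_step(h, s, x, Wh, Wx, Ws, B, alpha=0.95):
    s_new = (1.0 - alpha) * (B @ x) + alpha * s        # near-identity recurrence
    h_new = np.tanh(Wx @ x + Wh @ h + Ws @ s_new)      # ordinary hidden units
    return h_new, s_new
```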

Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization

An initialization schema that pretrains the weights of a recurrent neural network to approximate the linear autoencoder of the input sequences is introduced and it is shown how such pretraining can better support solving hard classification tasks with long sequences.

Reinforcement learning with recurrent neural networks

RNNs can accurately map and reconstruct (partially observable) Markov decision processes, and the resulting inner state of the network can be used as a basis for standard RL algorithms; this forms a novel connection between recurrent neural networks (RNN) and reinforcement learning (RL) techniques.

Backpropagation-decorrelation: online recurrent learning with O(N) complexity

  • Jochen J. Steil
  • Computer Science
  • 2004
A new learning rule for fully recurrent neural networks is introduced which combines important principles: one-step backpropagation of errors and the usage of temporal memory in the network dynamics by means of decorrelation of activations.

Learning from Predictions: Fusing Training and Autoregressive Inference for Long-Term Spatiotemporal Forecasts

The results show that BPTT-SA effectively reduces iterative error propagation in convolutional RNNs and convolutional autoencoder RNNs, and demonstrate its capabilities in long-term prediction of high-dimensional fluid flows.

Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses

It is empirically shown that in (deep) HRNNs, propagating gradients back from higher to lower levels can be replaced by locally computable losses, without harming the learning capability of the network, over a wide range of tasks.

Using recurrent networks for non-temporal classification tasks

This paper investigates the use of recurrent neural networks as an alternative to deep architectures and shows that, for a comparable number of parameters or complexity, replacing depth with recurrence can result in improved performance.
...

Learning long-term dependencies with gradient descent is difficult

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

Learning State Space Trajectories in Recurrent Neural Networks

A procedure is presented for finding ∂E/∂w_ij, where E is an error functional of the temporal trajectory of the states of a continuous recurrent network and w_ij are the weights of that network; it seems particularly suited for temporally continuous domains.

Gradient calculations for dynamic recurrent neural networks: a survey

The author discusses advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones and presents some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks.

Learning Complex, Extended Sequences Using the Principle of History Compression

A simple principle for reducing the descriptions of event sequences without loss of information is introduced and this insight leads to the construction of neural architectures that learn to divide and conquer by recursively decomposing sequences.
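A toy sketch of the principle (illustrative only, with a generic predictor standing in for the lower-level network): only the symbols the predictor fails to predict, together with their time stamps, are passed to the next level, giving a shorter but lossless description of the sequence.

```python
# Hedged sketch of history compression: drop predictable symbols, keep the rest
# with their positions so the original sequence can be reconstructed.
def compress(sequence, predict):
    # predict(prefix) -> guess for the next symbol; any model can play this role
    unexpected = []
    for t, x in enumerate(sequence):
        if t == 0 or predict(sequence[:t]) != x:
            unexpected.append((t, x))  # unpredicted symbol + time stamp: lossless
    return unexpected
```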

Learning long-term dependencies in NARX recurrent neural networks

It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous inputs (NARX) recurrent neural networks, which have powerful representational capabilities.
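For context, a NARX-style recurrence lets the output depend directly on several delayed inputs and outputs (generic form; the delay orders n_u and n_y are illustrative):

    y(t) = f(u(t), u(t-1), ..., u(t-n_u), y(t-1), ..., y(t-n_y))

These direct delayed connections shorten the paths along which gradients must propagate.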

Credit Assignment through Time: Alternatives to Backpropagation

This work considers and compares alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled and shows performance qualitatively superior to that obtained with backpropagation.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
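A minimal numpy sketch of an LSTM-style cell follows (modern gate layout including a forget gate, which the original formulation did not have; naming and weight stacking are assumptions, not the paper's notation). The additive cell-state update is the constant error carousel: gradients pass through it via elementwise gate products rather than repeated matrix multiplications.

```python
# Hedged sketch of one LSTM-style step with stacked gate weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # W: (4H, X), U: (4H, H), b: (4H,) stacked for the i, f, o, g blocks
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # constant error carousel
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new
```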

Learning Sequential Structure with the Real-Time Recurrent Learning Algorithm

A more powerful recurrent learning procedure, called real-time recurrent learning (RTRL), is applied to some of the same problems studied by Servan-Schreiber, Cleeremans, and McClelland; the internal representations developed by RTRL networks reveal that they learn a rich set of internal states representing more about the past than is required by the underlying grammar.

LSTM can Solve Hard Long Time Lag Problems

This work shows that problems used to promote various previous algorithms can be solved more quickly by random weight guessing than by the algorithms proposed for them, and then uses LSTM, the authors' own recent algorithm, to solve a hard problem.

Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks

These simulations suggest that recurrent controller networks trained by Kalman filter methods can combine the traditional features of state-space controllers and observers in a homogeneous architecture for nonlinear dynamical systems, while simultaneously exhibiting less sensitivity than do purely feedforward controller networks to changes in plant parameters and measurement noise.