Abstract
For maximum likelihood estimation, the Fisher information matrix defines a Riemannian metric in weight space and, as shown by Amari and coworkers, the resulting natural gradient greatly accelerates on-line multilayer perceptron (MLP) training. Its batch gradient descent counterpart also improves on standard gradient descent (it amounts to a Gauss–Newton approximation to mean squared error minimization), but it may no longer be competitive with more advanced gradient-based function minimization procedures. In this work we show how to introduce natural gradients in a conjugate gradient (CG) setting, and we demonstrate numerically that, when applied to batch MLP learning, they lead to faster convergence to better minima than standard Euclidean CG descent. Since a drawback of the full natural gradient is its larger computational cost, we also consider some cost-reducing variants and show that one of them, diagonal natural CG, also reaches better minima than standard CG at a comparable complexity.
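The abstract describes preconditioning the Euclidean gradient with the inverse Fisher matrix (or, in the cheaper variant, with its diagonal) before forming conjugate directions. The sketch below is a minimal NumPy illustration of that idea under stated assumptions, not the paper's implementation: the empirical Fisher estimate, the damping term, the Polak–Ribière coefficient and all function names are assumptions introduced here.

```python
# Minimal sketch of a natural conjugate gradient direction update.
# Assumptions (not from the paper): the Fisher matrix is estimated as the
# average outer product of per-sample gradients, and the CG coefficient
# follows the Polak-Ribiere rule. All names are illustrative.
import numpy as np

def estimate_fisher(per_sample_grads, damping=1e-4):
    """Empirical Fisher: mean outer product of per-sample gradients,
    damped for numerical stability."""
    G = np.asarray(per_sample_grads)          # shape (n_samples, n_params)
    F = G.T @ G / G.shape[0]
    return F + damping * np.eye(F.shape[0])

def natural_gradient(F, grad, diagonal_only=False):
    """Precondition the Euclidean gradient with the inverse Fisher matrix,
    or just with its diagonal (the cheap variant)."""
    if diagonal_only:
        return grad / np.diag(F)
    return np.linalg.solve(F, grad)

def natural_cg_direction(nat_grad, prev_nat_grad, prev_dir):
    """Polak-Ribiere conjugate direction built from natural gradients."""
    y = nat_grad - prev_nat_grad
    beta = max(0.0, nat_grad @ y / (prev_nat_grad @ prev_nat_grad))
    return -nat_grad + beta * prev_dir

# Toy usage with random per-sample gradients (illustrative only).
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 5))              # 32 samples, 5 parameters
F = estimate_fisher(grads)
g = grads.mean(axis=0)                        # batch gradient
ng = natural_gradient(F, g)                   # full natural gradient
# First step: previous quantities initialized to the current ones, so beta = 0.
d = natural_cg_direction(ng, prev_nat_grad=ng, prev_dir=-ng)
```

In an actual training loop the line search along `d` and the gradient evaluations come from the MLP error function; the sketch only shows how the Fisher preconditioning slots into the CG direction update.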
References
Amari, S.: Natural Gradient Works Efficiently in Learning. Neural Computation 10, 251–276 (1998)
Amari, S., Nagaoka, H.: Methods of information geometry. American Mathematical Society (2000)
Amari, S., Park, H., Fukumizu, K.: Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons. Neural Computation 12, 1399–1409 (2000)
Duda, R., Hart, P., Stork, D.: Pattern classification. Wiley, Chichester (2000)
Heskes, T.: On Natural Learning and Pruning in Multilayered Perceptrons. Neural Computation 12, 1037–1057 (2000)
Igel, C., Toussaint, M., Weishui, W.: Rprop Using the Natural Gradient. In: Trends and Applications in Constructive Approximation. International Series of Numerical Mathematics, vol. 151, Birkhäuser, Basel (2005)
LeCun, Y., Bottou, L., Orr, G., Müller, K.R.: Efficient BackProp. In: Neural Networks: Tricks of the Trade, pp. 9–50. Springer, Heidelberg (1998)
Murray, M., Rice, J.W.: Differential Geometry and Statistics. Chapman & Hall, Boca Raton (1993)
Murphy, P., Aha, D.: UCI Repository of Machine Learning Databases, Tech. Report, University of California, Irvine (1994)
Polak, E.: Computational Methods in Optimization. Academic Press, London (1971)
Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes in C. Cambridge U. Press, New York (1988)
Rao, C.R.: Information and accuracy attainable in estimation of statistical parameters. Bull. Cal. Math. Soc. 37, 81–91 (1945)
Rattray, M., Saad, D., Amari, S.: Natural gradient descent for on–line learning. Physical Review Letters 81, 5461–5464 (1998)
Yang, H., Amari, S.: Complexity Issues in Natural Gradient Descent Method for Training Multi-Layer Perceptrons. Neural Computation 10, 2137–2157 (1998)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
González, A., Dorronsoro, J.R. (2006). Natural Conjugate Gradient Training of Multilayer Perceptrons. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_18
DOI: https://doi.org/10.1007/11840817_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38625-4
Online ISBN: 978-3-540-38627-8