Abstract
For maximum likelihood estimation, the Fisher information matrix defines a Riemannian metric in weight space and, as shown by Amari and coworkers, the resulting natural gradient greatly accelerates on-line multilayer perceptron (MLP) training. Its batch gradient descent counterpart also improves on standard gradient descent (it amounts to a Gauss–Newton approximation to mean squared error minimization), but it may no longer be competitive with more advanced gradient-based function minimization procedures. In this work we show how to introduce natural gradients in a conjugate gradient (CG) setting, and we demonstrate numerically that, when applied to batch MLP learning, they lead to faster convergence to better minima than standard Euclidean CG descent. Since a drawback of the full natural gradient is its larger computational cost, we also consider some cost-reducing variants and show that one of them, diagonal natural CG, also reaches better minima than standard CG at a comparable complexity.
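The abstract describes preconditioning the Euclidean gradient with the inverse Fisher matrix (or, in the cheaper variant, with its diagonal) before forming conjugate directions. The sketch below is a minimal NumPy illustration of that idea under stated assumptions, not the paper's implementation: the empirical Fisher estimate, the damping term, the Polak–Ribière coefficient and all function names are assumptions introduced here.

```python
# Minimal sketch of a natural conjugate gradient direction update.
# Assumptions (not from the paper): the Fisher matrix is estimated as the
# average outer product of per-sample gradients, and the CG coefficient
# follows the Polak-Ribiere rule. All names are illustrative.
import numpy as np

def estimate_fisher(per_sample_grads, damping=1e-4):
    """Empirical Fisher: mean outer product of per-sample gradients,
    damped for numerical stability."""
    G = np.asarray(per_sample_grads)          # shape (n_samples, n_params)
    F = G.T @ G / G.shape[0]
    return F + damping * np.eye(F.shape[0])

def natural_gradient(F, grad, diagonal_only=False):
    """Precondition the Euclidean gradient with the inverse Fisher matrix,
    or just with its diagonal (the cheap variant)."""
    if diagonal_only:
        return grad / np.diag(F)
    return np.linalg.solve(F, grad)

def natural_cg_direction(nat_grad, prev_nat_grad, prev_dir):
    """Polak-Ribiere conjugate direction built from natural gradients."""
    y = nat_grad - prev_nat_grad
    beta = max(0.0, nat_grad @ y / (prev_nat_grad @ prev_nat_grad))
    return -nat_grad + beta * prev_dir

# Toy usage with random per-sample gradients (illustrative only).
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 5))              # 32 samples, 5 parameters
F = estimate_fisher(grads)
g = grads.mean(axis=0)                        # batch gradient
ng = natural_gradient(F, g)                   # full natural gradient
# First step: previous quantities initialized to the current ones, so beta = 0.
d = natural_cg_direction(ng, prev_nat_grad=ng, prev_dir=-ng)
```

In an actual training loop the line search along `d` and the gradient evaluations come from the MLP error function; the sketch only shows how the Fisher preconditioning slots into the CG direction update.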
References
Amari, S.: Natural Gradient Works Efficiently in Learning. Neural Computation 10, 251–276 (1998)
Amari, S., Nagaoka, H.: Methods of information geometry. American Mathematical Society (2000)
Amari, S., Park, H., Fukumizu, K.: Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons. Neural Computation 12, 1399–1409 (2000)
Duda, R., Hart, P., Stork, D.: Pattern classification. Wiley, Chichester (2000)
Heskes, T.: On Natural Learning and Pruning in Multilayered Perceptrons. Neural Computation 12, 1037–1057 (2000)
Igel, C., Toussaint, M., Weishui, W.: Rprop Using the Natural Gradient. In: Trends and Applications in Constructive Approximation. International Series of Numerical Mathematics, vol. 151, Birkhäuser, Basel (2005)
LeCun, Y., Bottou, L., Orr, G., Müller, K.R.: Efficient BackProp. In: Neural Networks: Tricks of the Trade, pp. 9–50. Springer, Heidelberg (1998)
Murray, M., Rice, J.W.: Differential Geometry and Statistics. Chapman & Hall, Boca Raton (1993)
Murphy, P., Aha, D.: UCI Repository of Machine Learning Databases, Tech. Report, University of California, Irvine (1994)
Polak, E.: Computational Methods in Optimization. Academic Press, London (1971)
Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes in C. Cambridge U. Press, New York (1988)
Rao, C.R.: Information and accuracy attainable in estimation of statistical parameters. Bull. Cal. Math. Soc. 37, 81–91 (1945)
Rattray, M., Saad, D., Amari, S.: Natural gradient descent for on–line learning. Physical Review Letters 81, 5461–5464 (1998)
Yang, H., Amari, S.: Complexity Issues in Natural Gradient Descent Method for Training Multi-Layer Perceptrons. Neural Computation 10, 2137–2157 (1998)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
González, A., Dorronsoro, J.R. (2006). Natural Conjugate Gradient Training of Multilayer Perceptrons. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_18
DOI: https://doi.org/10.1007/11840817_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38625-4
Online ISBN: 978-3-540-38627-8