Revisiting Natural Actor-Critics with Value Function Approximation

Abstract
Actor-critic architectures have become popular in the field of reinforcement learning over the last decade, largely thanks to the policy gradient theorem with function approximation. This theorem allows actor-critic architectures to be combined in a principled way with value function approximation, and therefore to address large-scale problems. Recent research has replaced the policy gradient with a natural policy gradient, improving the efficiency of the corresponding algorithms. However, a common drawback of these approaches is that they require manipulating the so-called advantage function, which does not satisfy any Bellman equation. Consequently, deriving actor-critic algorithms is not straightforward. In this paper, we re-derive these theorems in a way that allows reasoning directly with the state-action value function (or Q-function), and thus relying on the Bellman equation again. As a result, new forms of critics can easily be integrated into the actor-critic framework.
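To fix notation, here is a hedged sketch of the quantities the abstract refers to, written in standard conventions rather than taken verbatim from the paper. The policy gradient theorem expresses the gradient of the expected return J with respect to the policy parameters \theta as

\nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta}\big[ \nabla_\theta \log \pi_\theta(a \mid s)\, A^{\pi_\theta}(s,a) \big],

where the advantage function A^\pi(s,a) = Q^\pi(s,a) - V^\pi(s) admits no Bellman equation of its own, whereas the Q-function satisfies

Q^\pi(s,a) = \mathbb{E}_{s' \sim p(\cdot \mid s,a)}\big[ r(s,a,s') + \gamma\, \mathbb{E}_{a' \sim \pi(\cdot \mid s')}[ Q^\pi(s',a') ] \big].

The natural policy gradient mentioned above preconditions the ordinary gradient with the inverse Fisher information matrix of the policy, \tilde{\nabla}_\theta J(\theta) = F(\theta)^{-1} \nabla_\theta J(\theta).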