Abstract
Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. The best one can hope for from such an approach, in general, is a local optimum of this criterion. The first contribution of this article is the following surprising result: if the policy space is convex, any (approximate) local optimum enjoys a global performance guarantee. Unfortunately, the convexity assumption is strong: it is not satisfied by commonly used parameterizations, and designing a parameterization that induces this property seems hard. A natural way to alleviate this issue is to derive an algorithm that solves the local policy search problem with a boosting approach, constrained to the convex hull of the policy space. The resulting algorithm turns out to be a slight generalization of conservative policy iteration; our second contribution is thus to highlight an original connection between local policy search and approximate dynamic programming.
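To make the convex-hull constraint concrete, here is a minimal sketch, not taken from the abstract itself, of the conservative mixture update at the core of conservative policy iteration, written with standard notation ($\alpha_k$ a step size, $Q^{\pi_k}$ the action-value function of the current policy):

\[
\pi_{k+1} \;=\; (1 - \alpha_k)\,\pi_k \;+\; \alpha_k\,\pi'_{k+1},
\qquad \alpha_k \in (0, 1],
\]

where $\pi'_{k+1}$ is an (approximately) greedy policy with respect to $Q^{\pi_k}$ chosen from the base policy space. Because each iterate is a convex mixture of the previous iterate and a base policy, the sequence stays in the convex hull of the policy space, which is precisely the constraint under which the boosting view described above applies.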
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Scherrer, B., Geist, M. (2014). Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44845-8_3
DOI: https://doi.org/10.1007/978-3-662-44845-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44844-1
Online ISBN: 978-3-662-44845-8