Abstract
We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine, so as to control model complexity. The algorithm is presented in detail for the case when the function space is the reproducing-kernel Hilbert space of a user-chosen kernel function. We derive bounds on the quality of the resulting solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example illustrates the benefits of the penalized procedure.
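The abstract describes the algorithm only at a high level, so the following is a minimal, self-contained sketch of regularized fitted Q-iteration under stated assumptions, not the paper's implementation. Because the regression subroutine is penalized least squares in an RKHS, the representer theorem reduces each fit to kernel ridge regression: minimizing sum_i (f(s_i) - y_i)^2 + lam * ||f||_H^2 gives f(.) = sum_i alpha_i k(., s_i) with alpha = (K + lam I)^{-1} y. The Gaussian kernel, the generative-model interface `sample(s, a) -> (reward, next_state)`, and all names (`KernelRidgeQ`, `regularized_fqi`) are illustrative assumptions; in particular, `lam` is fixed by hand here, whereas the paper analyzes data-dependent penalties.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Gram matrix: K[i, j] = exp(-||X_i - Y_j||^2 / (2 * bandwidth^2))
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

class KernelRidgeQ:
    """One penalized least-squares fit in the RKHS of the kernel:
    minimizes sum_i (f(s_i) - y_i)^2 + lam * ||f||_H^2.  By the
    representer theorem, the solution is f(.) = sum_i alpha_i k(., s_i)
    with alpha = (K + lam * I)^{-1} y."""
    def __init__(self, states, targets, lam, bandwidth):
        self.states, self.bandwidth = states, bandwidth
        K = gaussian_kernel(states, states, bandwidth)
        self.alpha = np.linalg.solve(K + lam * np.eye(len(states)), targets)

    def __call__(self, query_states):
        return gaussian_kernel(query_states, self.states, self.bandwidth) @ self.alpha

def regularized_fqi(sample, states, actions, gamma=0.9, n_iters=50,
                    lam=1e-2, bandwidth=1.0):
    """Fitted Q-iteration with one kernel ridge regressor per action.
    `sample(s, a)` is the generative model: returns (reward, next_state)."""
    Q = None  # Q_0 is taken to be identically zero
    for _ in range(n_iters):
        new_Q = {}
        for a in actions:
            rewards, next_states = zip(*(sample(s, a) for s in states))
            next_states = np.asarray(next_states)
            if Q is None:
                v_next = np.zeros(len(states))
            else:
                # Empirical Bellman optimality backup: max over actions
                v_next = np.max([Q[b](next_states) for b in actions], axis=0)
            targets = np.asarray(rewards) + gamma * v_next
            new_Q[a] = KernelRidgeQ(states, targets, lam, bandwidth)
        Q = new_Q
    return Q
```

A greedy policy is then pi(s) = argmax_a Q[a](s). On a purely illustrative toy problem (a noisy 1-D chain with reward near the origin):

```python
rng = np.random.default_rng(0)
def sample(s, a):
    # Hypothetical dynamics: drift by the action plus small Gaussian noise.
    s_next = s + 0.1 * a + 0.01 * rng.normal(size=1)
    return float(np.exp(-s_next[0] ** 2)), s_next

states = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
Q = regularized_fqi(sample, states, actions=(-1, 1))
greedy = [max((-1, 1), key=lambda a: Q[a](s.reshape(1, -1))[0]) for s in states]
```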
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Farahmand, A.M., Ghavamzadeh, M., Szepesvári, C., Mannor, S. (2008). Regularized Fitted Q-Iteration: Application to Planning. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol. 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_5
DOI: https://doi.org/10.1007/978-3-540-89722-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89721-7
Online ISBN: 978-3-540-89722-4