New Error Bounds for Approximations from Projected Linear Equations

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)

Abstract

We consider linear fixed point equations and their approximations by projection on a low-dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low-dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markovian decision processes (MDP), one of our bounds is always sharper than the standard worst-case bounds, and another one is often sharper. Our bounds also apply to the non-contraction case, including policy evaluation in MDP with nonstandard projections that enhance exploration. To our knowledge, no error bounds are currently available for this case.
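For orientation, the setting in the abstract can be sketched numerically. The block below is a minimal illustration under assumptions that are not taken from the paper: a made-up 3-state Markov chain with transition matrix P, a fixed point equation x = Ax + b with A = gamma * P, a one-dimensional subspace spanned by a hypothetical basis vector phi, and projection with respect to the norm weighted by the stationary distribution xi of P. It checks only one commonly cited form of the standard worst-case bound for the contraction case, namely error <= best / sqrt(1 - gamma^2), where best is the distance from the exact solution to the subspace. It does not implement the new bounds derived in the paper.

```python
# Sketch: projected equation for x = A x + b with A = gamma * P.
# All numbers are made up for illustration; nothing here comes from the paper.
gamma = 0.9                      # contraction modulus (assumed)
P = [[0.50, 0.50, 0.00],         # transition matrix of a small Markov chain
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
b = [1.0, 0.0, 2.0]              # constant term of the fixed point equation
phi = [1.0, 2.0, 3.0]            # single basis vector spanning the subspace
n = len(b)


def apply_A(v):
    """Return A v, where A = gamma * P."""
    return [gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]


def wdot(u, v, w):
    """Weighted inner product sum_i w_i u_i v_i."""
    return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))


def wnorm(u, w):
    """Weighted Euclidean norm induced by wdot."""
    return wdot(u, u, w) ** 0.5


# Stationary distribution xi of P by power iteration (xi <- xi P).
xi = [1.0 / n] * n
for _ in range(1000):
    xi = [sum(xi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Exact solution x* of x = A x + b by fixed point iteration
# (converges because A is a contraction with modulus gamma).
x_star = [0.0] * n
for _ in range(2000):
    x_star = [ax + bi for ax, bi in zip(apply_A(x_star), b)]

# Projected equation phi*r = Pi(A*phi*r + b): the residual must be
# xi-orthogonal to phi, which gives a scalar equation for r.
A_phi = apply_A(phi)
r = wdot(phi, b, xi) / wdot(phi, [p - a for p, a in zip(phi, A_phi)], xi)
x_bar = [r * p for p in phi]

# Projection of x* on the subspace (the best possible approximation).
c = wdot(phi, x_star, xi) / wdot(phi, phi, xi)
proj = [c * p for p in phi]

err = wnorm([x - y for x, y in zip(x_star, x_bar)], xi)   # ||x* - x_bar||
best = wnorm([x - y for x, y in zip(x_star, proj)], xi)   # ||x* - Pi x*||
bound = best / (1.0 - gamma ** 2) ** 0.5                  # worst-case bound

print("projected-equation error:", err)
print("best approximation error:", best)
print("standard worst-case bound:", bound)
```

In this toy instance the error of the projected solution lies between the best approximation error and the worst-case bound, and the gap to the bound is the kind of slack that sharper, problem-dependent bounds aim to reduce.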

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, H., Bertsekas, D.P. (2008). New Error Bounds for Approximations from Projected Linear Equations. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_20

  • DOI: https://doi.org/10.1007/978-3-540-89722-4_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89721-7

  • Online ISBN: 978-3-540-89722-4

  • eBook Packages: Computer Science, Computer Science (R0)
