Tracking in Reinforcement Learning | SpringerLink
Skip to main content

Tracking in Reinforcement Learning

  • Conference paper
Neural Information Processing (ICONIP 2009)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5863))

Included in the following conference series:

  • 1622 Accesses

Abstract

Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is of course a desired feature of a fair RL algorithm. Yet, even if the environment of the learning agent can be considered as stationary, generalized policy iteration frameworks, because of the interleaving of learning and control, will produce non-stationarity of the evaluated policy and so of its value function. Tracking the optimal solution instead of trying to converge to it is therefore preferable. In this paper, we propose to handle this tracking issue with a Kalman-based temporal difference framework. Complexity and convergence analysis are studied. Empirical investigations of its ability to handle non-stationarity is finally provided.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1996)

    Google Scholar 

  2. Phua, C.W., Fitch, R.: Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation. In: International Conference on Machine Learning, ICML 2007 (2007)

    Google Scholar 

  3. Sutton, R.S., Koop, A., Silver, D.: On the role of tracking in stationary environments. In: Proceedings of the 24th international conference on Machine learning, pp. 871–878 (2007)

    Google Scholar 

  4. Geist, M., Pietquin, O., Fricout, G.: Kalman Temporal Differences: the deterministic case. In: Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA (April 2009)

    Google Scholar 

  5. Kalman, R.E.: A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME–Journal of Basic Engineering 82(Series D), 35–45 (1960)

    Google Scholar 

  6. Julier, S.J., Uhlmann, J.K.: Unscented filtering and nonlinear estimation. Proceedings of the IEEE 92(3), 401–422 (2004)

    Article  Google Scholar 

  7. van der Merwe, R.: Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models. PhD thesis, Oregon Health&Science University, Portland, USA (2004)

    Google Scholar 

  8. Bradtke, S.J., Barto, A.G.: Linear Least-Squares Algorithms for Temporal Difference Learning. Machine Learning 22(1-3), 33–57 (1996)

    Article  MATH  Google Scholar 

  9. Baird, L.C.: Residual Algorithms: Reinforcement Learning with Function Approximation. In: Proceedings of the International Conference on Machine Learning, pp. 30–37 (1995)

    Google Scholar 

  10. Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71(1), 89–129 (2008)

    Article  Google Scholar 

  11. Kakade, S.: A natural policy gradient. In: Advances in Neural Information Processing Systems 14 (NIPS 2001), Vancouver, British Columbia, Canada, pp. 1531–1538 (2001)

    Google Scholar 

  12. Peters, J., Vijayakumar, S., Schaal, S.: Natural actor-critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280–291. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  13. Boyan, J.A.: Technical Update: Least-Squares Temporal Difference Learning. Machine Learning 49(2-3), 233–246 (1999)

    Google Scholar 

  14. Lagoudakis, M.G., Parr, R.: Least-Squares Policy Iteration. Journal of Machine Learning Research 4, 1107–1149 (2003)

    Article  MathSciNet  Google Scholar 

  15. Jo, S., Kim, S.W.: Consistent Normalized Least Mean Square Filtering with Noisy Data Matrix. IEEE Transactions on Signal Processing 53(6), 2112–2123 (2005)

    Article  MathSciNet  Google Scholar 

  16. Engel, Y., Mannor, S., Meir, R.: Reinforcement Learning with Gaussian Processes. In: Proceedings of Internation Conference on Machine Learning, ICML 2005 (2005)

    Google Scholar 

  17. Bhatnagar, S., Sutton, R.S., Ghavamzadeh, M., Lee, M.: Incremental Natural Actor-Critic Algorithms. In: Advances in Neural Information Processing Systems, Vancouver, vol. 21 (2008)

    Google Scholar 

  18. Geist, M., Pietquin, O., Fricout, G.: Bayesian Reward Filtering. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds.) EWRL 2008. LNCS (LNAI), vol. 5323, pp. 96–109. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geist, M., Pietquin, O., Fricout, G. (2009). Tracking in Reinforcement Learning. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10677-4_57

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-10677-4_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10676-7

  • Online ISBN: 978-3-642-10677-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics