Batch, Off-Policy and Model-Free Apprenticeship Learning

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)

Abstract

This paper addresses the problem of apprenticeship learning, that is, learning control policies from demonstrations by an expert. An efficient framework for this problem is inverse reinforcement learning (IRL). Based on the assumption that the expert maximizes a utility function, IRL aims to learn the underlying reward from example trajectories. Many IRL algorithms assume that the reward function is linearly parameterized and rely on computing the associated feature expectations, which is done through Monte Carlo simulation. However, this requires full trajectories for the expert policy as well as at least a generative model for intermediate policies. In this paper, we introduce a temporal difference method, namely LSTD-μ, to compute these feature expectations. This allows apprenticeship learning to be extended to a batch and off-policy setting.
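
The idea of estimating feature expectations with a temporal difference method can be illustrated with a small sketch. It is not the authors' published algorithm; it is a minimal LSTD-style estimator written under several assumptions: the discounted feature expectation is taken to be mu(s) = E[sum_t gamma^t phi(s_t) | s_0 = s], each component of mu is treated as a value function whose "reward" is the corresponding component of phi, and the transitions come from the policy being evaluated (the off-policy, state-action variant is not shown). The names `lstd_feature_expectations`, `feature_expectation`, `psi`, `phi`, and the two-state chain are hypothetical and chosen only for illustration. Plain Python is used to keep the example dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ X = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: more data or regularization needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def lstd_feature_expectations(
    transitions: Sequence[Tuple[int, int]],
    psi: Callable[[int], Vector],  # assumed basis used to approximate mu
    phi: Callable[[int], Vector],  # assumed reward features
    gamma: float,
) -> Matrix:
    """Return Xi (p x k) such that mu(s) is approximated by Xi^T psi(s)."""
    p = len(psi(transitions[0][0]))
    k = len(phi(transitions[0][0]))
    a = [[0.0] * p for _ in range(p)]
    b = [[0.0] * k for _ in range(p)]
    for s, s_next in transitions:
        f, f_next, r = psi(s), psi(s_next), phi(s)
        for i in range(p):
            for j in range(p):
                # A accumulates psi(s) (psi(s) - gamma * psi(s'))^T
                a[i][j] += f[i] * (f[j] - gamma * f_next[j])
            for j in range(k):
                # B accumulates psi(s) phi(s)^T: phi plays the role of the reward
                b[i][j] += f[i] * r[j]
    return solve(a, b)


def feature_expectation(xi: Matrix, psi_s: Vector) -> Vector:
    """Evaluate the approximation Xi^T psi(s) for one state."""
    return [sum(w * row[j] for w, row in zip(psi_s, xi)) for j in range(len(xi[0]))]


def one_hot(s: int) -> Vector:
    """Tabular features for a hypothetical two-state chain."""
    return [1.0 if i == s else 0.0 for i in range(2)]


# Hypothetical batch: state 0 moves to state 1, and state 1 loops on itself.
transitions = [(0, 1), (1, 1)]
xi = lstd_feature_expectations(transitions, one_hot, one_hot, gamma=0.9)
mu_0 = feature_expectation(xi, one_hot(0))
mu_1 = feature_expectation(xi, one_hot(1))
print(mu_0, mu_1)
```

For this toy chain the estimate can be checked by hand: from state 1 the agent stays in state 1 forever, so mu(1) = (0, 1/(1 - 0.9)) = (0, 10), and mu(0) = phi(0) + 0.9 mu(1) = (1, 9). The sketch recovers these values up to floating-point error from two sampled transitions, with no full trajectories and no simulator, which is the property the abstract highlights.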

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klein, E., Geist, M., Pietquin, O. (2012). Batch, Off-Policy and Model-Free Apprenticeship Learning. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science, vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_28

  • DOI: https://doi.org/10.1007/978-3-642-29946-9_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer Science, Computer Science (R0)
