Batch, Off-Policy and Model-Free Apprenticeship Learning

Klein, Edouard; Geist, Matthieu; Pietquin, Olivier

doi:10.1007/978-3-642-29946-9_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)

Included in the following conference series:

European Workshop on Reinforcement Learning

Abstract

This paper addresses apprenticeship learning, that is, learning control policies from demonstrations by an expert. Inverse reinforcement learning (IRL) is an efficient framework for this problem. Assuming that the expert maximizes a utility function, IRL aims to learn the underlying reward from example trajectories. Many IRL algorithms assume that the reward function is linearly parameterized and rely on computing the associated feature expectations, which is done through Monte Carlo simulation. However, this requires full trajectories for the expert policy as well as at least a generative model for intermediate policies. In this paper, we introduce a temporal difference method, namely LSTD-μ, to compute these feature expectations. This allows extending apprenticeship learning to a batch and off-policy setting.
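
To make the contrast in the abstract concrete, the following is a minimal Python sketch rather than the authors' formulation. It assumes a simplified on-policy, state-only setting in which the feature expectation is μ(s) = E[Σ_t γ^t φ(s_t) | s_0 = s], and it applies the standard LSTD solution with each component of φ playing the role of a reward. The function names, the basis `psi`, the `ridge` parameter, and the two-state toy chain are all hypothetical and chosen for illustration only.

```python
import numpy as np

def mc_feature_expectation(trajectories, phi, gamma):
    """Monte Carlo estimate: average discounted feature sum over full trajectories."""
    totals = [sum(gamma**t * phi(s) for t, s in enumerate(traj))
              for traj in trajectories]
    return np.mean(totals, axis=0)

def lstd_feature_expectation(transitions, phi, psi, gamma, ridge=0.0):
    """LSTD-style estimate from individual (s, s_next) transitions.

    phi: reward features whose discounted expectation is wanted (k-dim).
    psi: basis used to approximate that expectation (p-dim); an assumption here.
    ridge: optional regularisation added to A (hypothetical convenience).
    """
    p = len(psi(transitions[0][0]))
    k = len(phi(transitions[0][0]))
    A = ridge * np.eye(p)
    B = np.zeros((p, k))
    for s, s_next in transitions:
        # Standard LSTD statistics, with phi(s) in place of a scalar reward.
        A += np.outer(psi(s), psi(s) - gamma * psi(s_next))
        B += np.outer(psi(s), phi(s))
    theta = np.linalg.solve(A, B)       # one column per component of phi
    return lambda s: theta.T @ psi(s)   # estimated mu(s)

# Toy example: deterministic two-state cycle 0 -> 1 -> 0 -> ...
gamma = 0.9
one_hot = lambda s: np.eye(2)[s]

# Batch of isolated transitions: no full trajectory, no simulator.
transitions = [(0, 1), (1, 0)]
mu_hat = lstd_feature_expectation(transitions, one_hot, one_hot, gamma)

# Monte Carlo needs a (long) full trajectory starting from state 0.
trajectory = [t % 2 for t in range(200)]
mu_mc = mc_feature_expectation([trajectory], one_hot, gamma)

print(mu_hat(0), mu_mc)
```

In this toy chain the two estimates agree up to truncation error, but the temporal difference estimate only needed two isolated transitions, whereas the Monte Carlo estimate needed a long rollout. An off-policy variant would typically work with state-action features, which this sketch does not attempt.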

Author information

Authors and Affiliations

Supélec, IMS Research Group, France
Edouard Klein, Matthieu Geist & Olivier Pietquin
UMI 2958, GeorgiaTech-CNRS, France
Olivier Pietquin
Equipe ABC, LORIA-CNRS, France
Edouard Klein

Editor information

Editors and Affiliations

NICTA and the Australian National University, 7 London Circuit, ACT 2601, Canberra, Australia
Scott Sanner
Research School of Computer Science, Australian National University, ACT 0200, Canberra, Australia
Marcus Hutter

About this paper

Cite this paper

Klein, E., Geist, M., Pietquin, O. (2012). Batch, Off-Policy and Model-Free Apprenticeship Learning. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science, vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_28

DOI: https://doi.org/10.1007/978-3-642-29946-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9
eBook Packages: Computer Science (R0)
