Abstract
This paper provides a comparative study of Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). IRL and AL are two frameworks, based on Markov Decision Processes (MDPs), for the imitation learning problem, in which an agent tries to learn from demonstrations of an expert. In the AL framework, the agent tries to learn the expert policy directly, whereas in the IRL framework, the agent tries to learn a reward function that explains the behavior of the expert; this reward is then optimized to imitate the expert. One can wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has not really been addressed in the literature so far. We provide partial answers, from both a theoretical and an empirical point of view.
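For readers unfamiliar with the two settings, the following is a minimal sketch of the distinction drawn in the abstract; the notation (MDP M, expert policy pi_E, demonstration set D, value function V) is assumed here for illustration and is not taken from the paper itself.

```latex
% Sketch of the two imitation-learning problems; notation assumed, not the paper's own.
% An MDP is M = (S, A, P, R, \gamma); the expert follows an unknown policy \pi_E,
% observed only through demonstrations D = \{(s_i, a_i)\}_{i=1}^{N}.

% Apprenticeship Learning: estimate the expert policy directly,
% e.g. by treating D as a classification data set (states -> actions).
\[
  \hat{\pi}_{\mathrm{AL}} \in \operatorname*{arg\,min}_{\pi}
    \; \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{\pi(s_i) \neq a_i\}.
\]

% Inverse Reinforcement Learning: estimate a reward \hat{R} under which the
% expert is (near-)optimal, then optimize that reward to obtain the imitating policy.
\[
  \hat{R} \;\text{such that}\; \pi_E \in \operatorname*{arg\,max}_{\pi} V^{\pi}_{\hat{R}},
  \qquad
  \hat{\pi}_{\mathrm{IRL}} \in \operatorname*{arg\,max}_{\pi} V^{\pi}_{\hat{R}}.
\]
```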
Keywords
- Markov Decision Process
- Reward Function
- Neural Information Processing System
- Expert Policy
- Approximate Dynamic Program
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piot, B., Geist, M., Pietquin, O. (2013). Learning from Demonstrations: Is It Worth Estimating a Reward Function?. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_2
DOI: https://doi.org/10.1007/978-3-642-40988-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40987-5
Online ISBN: 978-3-642-40988-2