Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives. While successful applications to date have relied on imitation learning, most of the interesting motor learning problems are high-dimensional reinforcement learning problems that are often beyond the reach of current reinforcement learning methods. In this paper, we study parametrized policy search methods and apply them to benchmark problems of motor primitive learning in robotics. We show that many well-known parametrized policy search methods can be derived from a general, common framework that yields both policy gradient methods and expectation-maximization (EM)-inspired algorithms. We introduce a novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives. We compare this algorithm to several well-known parametrized policy search methods, such as episodic REINFORCE, ‘Vanilla’ Policy Gradients with optimal baselines, the episodic Natural Actor Critic, and episodic Reward-Weighted Regression, and show that it outperforms them on an empirical benchmark of learning dynamical system motor primitives, both in simulation and on a real robot. Finally, we apply it to motor learning and show that it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
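To make the class of methods discussed in the abstract concrete, the following is a minimal, illustrative sketch of episodic, reward-weighted policy search over motor-primitive parameters: the parameter vector is perturbed in parameter space, each perturbation is evaluated in one episode, and the update is a reward-weighted average of the explorations. This is only a sketch of the general EM-inspired, reward-weighted idea, not the algorithm proposed in the paper; the toy return function, the exponential reward transformation, and all names (rollout_return, reward_weighted_update, sigma, beta) are assumptions made for this example.

# Sketch of episodic, reward-weighted parameter search (illustrative only;
# NOT the paper's algorithm). Toy task and all names are assumptions.
import numpy as np

def rollout_return(theta, target=np.array([0.8, -0.3, 0.5])):
    """Toy episodic 'return': higher when the parameters are close to a
    hidden target vector (stand-in for executing a motor primitive)."""
    return -np.sum((theta - target) ** 2)

def reward_weighted_update(theta, n_rollouts=50, sigma=0.1, beta=5.0, rng=None):
    """One update: perturb the parameters, weight each sample by its
    transformed return, and take the reward-weighted mean perturbation."""
    rng = np.random.default_rng() if rng is None else rng
    eps = sigma * rng.standard_normal((n_rollouts, theta.size))   # exploration in parameter space
    returns = np.array([rollout_return(theta + e) for e in eps])  # one episode per sample
    w = np.exp(beta * (returns - returns.max()))                  # assumed exponential reward transformation
    w /= w.sum()
    return theta + w @ eps                                        # reward-weighted average of explorations

theta = np.zeros(3)
for _ in range(30):
    theta = reward_weighted_update(theta)
print("final parameters:", theta, "return:", rollout_return(theta))

A design trait shared by this family of updates is that exploration happens directly in the space of motor-primitive parameters rather than by perturbing actions at every time step, which keeps the executed trajectories smooth and is one reason such updates are attractive for dynamical system motor primitives.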
Editors: S. Whiteson and M. Littman.
Kober, J., Peters, J. Policy search for motor primitives in robotics. Mach Learn 84, 171–203 (2011). https://doi.org/10.1007/s10994-010-5223-6