Abstract
Despite the success of reinforcement learning methods in various simulated robotic applications, end-to-end training suffers from extensive training times due to high sample complexity and does not scale well to realistic systems. In this work, we speed up reinforcement learning by incorporating domain knowledge into policy learning. We revisit an architecture based on the mean of multiple computations (MMC) principle known from computational biology and adapt it to solve a “reacher task”. We approximate the policy using a simple MMC network, experimentally compare this idea to end-to-end deep learning architectures, and show that our approach reduces the number of interactions required to approximate a suitable policy by a factor of ten.
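To make the MMC principle concrete, below is a minimal sketch of MMC-style relaxation for a planar two-segment arm, in the spirit of the computational-biology formulation the abstract refers to: each variable is repeatedly replaced by a damped mean of the computations that express it through the other variables. The names mmc_step, SEG_LEN, and DAMPING, and all numeric values, are our illustrative assumptions; the paper's network is a learned policy approximator, not the fixed relaxation shown here.

```python
import numpy as np

SEG_LEN = 1.0   # assumed length of both arm segments (hypothetical value)
DAMPING = 2.0   # recurrent self-connection weight (hypothetical value)

def mmc_step(l1, l2, target):
    """One MMC relaxation step: each segment vector becomes the damped
    mean of itself and the value implied by the geometric constraint
    r = l1 + l2, with the end-effector vector r clamped to the target."""
    r = target
    l1_new = (DAMPING * l1 + (r - l2)) / (DAMPING + 1.0)
    l2_new = (DAMPING * l2 + (r - l1)) / (DAMPING + 1.0)
    # re-impose the fixed segment lengths after averaging
    l1_new *= SEG_LEN / (np.linalg.norm(l1_new) + 1e-9)
    l2_new *= SEG_LEN / (np.linalg.norm(l2_new) + 1e-9)
    return l1_new, l2_new

# start from a bent configuration so the two segments are distinguishable
l1, l2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
target = np.array([0.5, 1.2])   # reachable, since |target| <= 2 * SEG_LEN
for _ in range(100):
    l1, l2 = mmc_step(l1, l2, target)
print(l1 + l2, np.linalg.norm(l1 + l2 - target))  # endpoint approaches target
```

Clamping r to the target and iterating drives the segment vectors toward a kinematically consistent arm configuration whose endpoint meets the target. In the paper, a simple MMC network of this flavor approximates the reacher policy itself, with its parameters learned from interaction.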
Cite this paper
Ramamurthy, R., Bauckhage, C., Sifa, R., Schücker, J., Wrobel, S. (2019). Leveraging Domain Knowledge for Reinforcement Learning Using MMC Architectures. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_48