
Leveraging Domain Knowledge for Reinforcement Learning Using MMC Architectures

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning (ICANN 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11728)


Abstract

Despite the success of reinforcement learning methods in various simulated robotic applications, end-to-end training suffers from extensive training times due to high sample complexity and does not scale well to realistic systems. In this work, we speed up reinforcement learning by incorporating domain knowledge into policy learning. We revisit an architecture based on the mean of multiple computations (MMC) principle known from computational biology and adapt it to solve a “reacher task”. We approximate the policy using a simple MMC network, experimentally compare this idea to end-to-end deep learning architectures, and show that our approach reduces the number of interactions required to approximate a suitable policy by a factor of ten.
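The MMC principle referenced in the abstract is straightforward to sketch: every variable of a redundant kinematic chain is repeatedly recomputed as a damped mean of all the ways the chain's equations express it, and iterating this relaxation drives the arm toward a consistent configuration that reaches the target. The following is a minimal illustration for a two-link planar arm, not the authors' implementation; the function names, the damping value, and the per-step length normalization are assumptions made for this sketch.

```python
import numpy as np

def mmc_step(l1, l2, r, target, damping=2.0):
    """One MMC relaxation step for a two-link chain with r = l1 + l2.

    Each vector is recomputed as the mean of every expression the
    kinematic equation gives for it, plus damped copies of its own
    previous value; the target enters as one more "computation" of r.
    """
    l1_new = (damping * l1 + (r - l2)) / (damping + 1.0)
    l2_new = (damping * l2 + (r - l1)) / (damping + 1.0)
    r_new = (damping * r + (l1 + l2) + target) / (damping + 2.0)
    return l1_new, l2_new, r_new

def normalize(v, length):
    # Geometric constraint: segment lengths stay constant.
    n = np.linalg.norm(v)
    return v * (length / n) if n > 1e-9 else v

def reach(target, len1=1.0, len2=1.0, steps=300):
    """Relax the chain so the endpoint l1 + l2 approaches the target."""
    target = np.asarray(target, dtype=float)
    l1 = np.array([len1, 0.0])
    l2 = np.array([0.0, len2])
    r = l1 + l2
    for _ in range(steps):
        l1, l2, r = mmc_step(l1, l2, r, target)
        l1 = normalize(l1, len1)
        l2 = normalize(l2, len2)
    return l1, l2
```

For a reachable target (norm at most len1 + len2), the relaxation settles on limb vectors whose sum lies at the target without ever solving inverse kinematics explicitly, which is what makes this representation attractive as a structural prior for a policy.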




Author information


Corresponding author

Correspondence to Rajkumar Ramamurthy.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Ramamurthy, R., Bauckhage, C., Sifa, R., Schücker, J., Wrobel, S. (2019). Leveraging Domain Knowledge for Reinforcement Learning Using MMC Architectures. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_48


  • DOI: https://doi.org/10.1007/978-3-030-30484-3_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30483-6

  • Online ISBN: 978-3-030-30484-3

  • eBook Packages: Computer Science (R0)
