Abstract
Despite the success of reinforcement learning methods in various simulated robotic applications, end-to-end training suffers from extensive training times due to high sample complexity and does not scale well to realistic systems. In this work, we speed up reinforcement learning by incorporating domain knowledge into policy learning. We revisit an architecture based on the mean of multiple computations (MMC) principle known from computational biology and adapt it to solve a “reacher task”. We approximate the policy using a simple MMC network, experimentally compare this idea to end-to-end deep learning architectures, and show that our approach reduces the number of interactions required to approximate a suitable policy by a factor of ten.
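To make the MMC principle concrete, below is a minimal sketch of MMC-style relaxation for a planar two-segment arm, in the spirit of the computational-biology formulation the abstract refers to: each variable is repeatedly replaced by a damped mean of the computations that express it through the other variables. The names mmc_step, SEG_LEN, and DAMPING, and all numeric values, are our illustrative assumptions; the paper's network is a learned policy approximator, not the fixed relaxation shown here.

```python
import numpy as np

SEG_LEN = 1.0   # assumed length of both arm segments (hypothetical value)
DAMPING = 2.0   # recurrent self-connection weight (hypothetical value)

def mmc_step(l1, l2, target):
    """One MMC relaxation step: each segment vector becomes the damped
    mean of itself and the value implied by the geometric constraint
    r = l1 + l2, with the end-effector vector r clamped to the target."""
    r = target
    l1_new = (DAMPING * l1 + (r - l2)) / (DAMPING + 1.0)
    l2_new = (DAMPING * l2 + (r - l1)) / (DAMPING + 1.0)
    # re-impose the fixed segment lengths after averaging
    l1_new *= SEG_LEN / (np.linalg.norm(l1_new) + 1e-9)
    l2_new *= SEG_LEN / (np.linalg.norm(l2_new) + 1e-9)
    return l1_new, l2_new

# start from a bent configuration so the two segments are distinguishable
l1, l2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
target = np.array([0.5, 1.2])   # reachable, since |target| <= 2 * SEG_LEN
for _ in range(100):
    l1, l2 = mmc_step(l1, l2, target)
print(l1 + l2, np.linalg.norm(l1 + l2 - target))  # endpoint approaches target
```

Clamping r to the target and iterating drives the segment vectors toward a kinematically consistent arm configuration whose endpoint meets the target. In the paper, a simple MMC network of this flavor approximates the reacher policy itself, with its parameters learned from interaction.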
Cite this paper
Ramamurthy, R., Bauckhage, C., Sifa, R., Schücker, J., Wrobel, S. (2019). Leveraging Domain Knowledge for Reinforcement Learning Using MMC Architectures. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_48