Abstract
Embodied AI agents learn by interacting with a physical environment, and typically require large amounts of interaction in order to learn how to solve new tasks. Training can be parallelized using simulated environments. However, once an agent is deployed, e.g., in a real-world setting, it remains unclear how it can quickly adapt its knowledge to solve new tasks.
In this paper, we propose a novel Hierarchical Reinforcement Learning (HRL) method that allows an agent, when confronted with a novel task, to switch between exploiting prior knowledge through temporally extended actions and exploring the environment. We resolve this trade-off by utilizing the disagreement between the action distributions of selected previously acquired policies. Relevant prior tasks are selected by measuring the cosine similarity of their attached natural language goals in a pre-trained word embedding.
We analyze the resulting temporal abstractions, and experimentally demonstrate their effectiveness in different environments. We show that our method is capable of solving new tasks using only a fraction of the environment interactions required when learning the task from scratch.
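To make the two mechanisms from the abstract concrete, the following is a minimal Python sketch of (1) selecting prior policies by the cosine similarity of their goal embeddings, and (2) switching between exploitation and exploration based on the disagreement of the selected policies' action distributions. All names (`select_prior_policies`, `disagreement`, the `library` structure, the `threshold`) and the total-variation disagreement measure are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two goal embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_prior_policies(goal_vec: np.ndarray, library: list, k: int = 3) -> list:
    """Rank previously acquired policies by how similar their
    natural-language goal embeddings are to the new goal."""
    ranked = sorted(
        library,
        key=lambda p: cosine_similarity(goal_vec, p["goal_vec"]),
        reverse=True,
    )
    return ranked[:k]


def disagreement(action_dists: np.ndarray) -> float:
    """Disagreement of the selected policies' action distributions,
    measured here as the mean total-variation distance from their
    mean distribution (one plausible choice among several)."""
    mean_dist = action_dists.mean(axis=0)
    return float(np.abs(action_dists - mean_dist).sum(axis=1).mean() / 2)


def act(obs, goal_vec, library, n_actions, threshold=0.3, rng=np.random):
    """If the selected prior policies agree on what to do, exploit
    their consensus action (prior knowledge as a temporally extended
    action); otherwise, fall back to exploration."""
    selected = select_prior_policies(goal_vec, library)
    dists = np.stack([p["policy"](obs) for p in selected])  # shape (k, n_actions)
    if disagreement(dists) < threshold:
        return int(dists.mean(axis=0).argmax())  # exploit prior knowledge
    return int(rng.randint(n_actions))           # explore
```

Each `library` entry is assumed to hold a `"goal_vec"` (the embedded goal description) and a `"policy"` callable mapping an observation to an action distribution; in practice the switching rule would operate over temporally extended actions rather than single steps.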
Acknowledgements
This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.