
Disagreement Options: Task Adaptation Through Temporally Extended Actions

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12975)


Abstract

Embodied AI, in which an agent learns through interaction with a physical environment, typically requires large amounts of environment interaction to learn how to solve new tasks. Training can be parallelized using simulated environments. However, once the agent is deployed in, e.g., a real-world setting, it is not yet clear how it can quickly adapt its knowledge to solve new tasks.

In this paper, we propose a novel Hierarchical Reinforcement Learning (HRL) method that allows an agent, when confronted with a novel task, to switch between exploiting prior knowledge through temporally extended actions and exploring the environment. We resolve this trade-off by utilizing the disagreement between the action distributions of selected previously acquired policies. Relevant prior tasks are selected by measuring the cosine similarity of their attached natural-language goals in a pre-trained word embedding.
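The abstract only outlines this mechanism; below is a minimal sketch of how it could work, assuming each stored policy is a callable that maps a state to a vector of action probabilities. The disagreement measure (total variance of action probabilities across policies), the `threshold` parameter, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity of two goal embeddings in a pre-trained word-embedding space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_prior_policies(goal_embedding, library, k=3):
    # `library` holds (goal_embedding, policy) pairs from previously solved
    # tasks; keep the k policies whose goals are closest to the new goal.
    ranked = sorted(library,
                    key=lambda item: cosine_similarity(goal_embedding, item[0]),
                    reverse=True)
    return [policy for _, policy in ranked[:k]]

def disagreement(action_dists: np.ndarray) -> float:
    # One plausible disagreement measure (an assumption, not necessarily the
    # paper's choice): total variance of action probabilities across policies.
    return float(np.var(action_dists, axis=0).sum())

def act(state, prior_policies, explore_policy, threshold=0.05, rng=None):
    # If the selected prior policies agree on what to do in `state`, exploit
    # their consensus; otherwise fall back to environment exploration.
    rng = rng or np.random.default_rng()
    dists = np.stack([pi(state) for pi in prior_policies])  # (n_policies, n_actions)
    if disagreement(dists) < threshold:
        consensus = dists.mean(axis=0)  # average of distributions still sums to 1
        return int(rng.choice(len(consensus), p=consensus))
    return explore_policy(state)
```

Under such a scheme, sustained low disagreement lets the agent chain consecutive consensus actions into a temporally extended action, while high disagreement signals that prior knowledge does not cover the current situation and exploration is needed.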

We analyze the resulting temporal abstractions and experimentally demonstrate their effectiveness in different environments. We show that our method can solve new tasks using only a fraction of the environment interactions required when learning the task from scratch.



Acknowledgements

This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.

Author information

Corresponding author

Correspondence to Matthias Hutsebaut-Buysse.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Hutsebaut-Buysse, M., De Schepper, T., Mets, K., Latré, S. (2021). Disagreement Options: Task Adaptation Through Temporally Extended Actions. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12975. Springer, Cham. https://doi.org/10.1007/978-3-030-86486-6_12

  • DOI: https://doi.org/10.1007/978-3-030-86486-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86485-9

  • Online ISBN: 978-3-030-86486-6

  • eBook Packages: Computer Science, Computer Science (R0)
