
Disagreement Options: Task Adaptation Through Temporally Extended Actions

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12975)


Abstract

Embodied AI, in which an agent learns through interaction with a physical environment, typically requires large amounts of environment interaction to learn how to solve new tasks. Training can be parallelized using simulated environments. However, once the agent is deployed in, e.g., a real-world setting, it is not yet clear how it can quickly adapt its knowledge to solve new tasks.

In this paper, we propose a novel Hierarchical Reinforcement Learning (HRL) method that allows an agent, when confronted with a novel task, to switch between exploiting prior knowledge through temporally extended actions and exploring the environment. We resolve this trade-off by utilizing the disagreement between the action distributions of selected previously acquired policies. Relevant prior tasks are selected by measuring the cosine similarity of their attached natural-language goals in a pre-trained word embedding.
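The abstract only outlines this mechanism; below is a minimal sketch of how it could work, assuming each stored policy is a callable that maps a state to a vector of action probabilities. The disagreement measure (total variance of action probabilities across policies), the `threshold` parameter, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity of two goal embeddings in a pre-trained word-embedding space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_prior_policies(goal_embedding, library, k=3):
    # `library` holds (goal_embedding, policy) pairs from previously solved
    # tasks; keep the k policies whose goals are closest to the new goal.
    ranked = sorted(library,
                    key=lambda item: cosine_similarity(goal_embedding, item[0]),
                    reverse=True)
    return [policy for _, policy in ranked[:k]]

def disagreement(action_dists: np.ndarray) -> float:
    # One plausible disagreement measure (an assumption, not necessarily the
    # paper's choice): total variance of action probabilities across policies.
    return float(np.var(action_dists, axis=0).sum())

def act(state, prior_policies, explore_policy, threshold=0.05, rng=None):
    # If the selected prior policies agree on what to do in `state`, exploit
    # their consensus; otherwise fall back to environment exploration.
    rng = rng or np.random.default_rng()
    dists = np.stack([pi(state) for pi in prior_policies])  # (n_policies, n_actions)
    if disagreement(dists) < threshold:
        consensus = dists.mean(axis=0)  # average of distributions still sums to 1
        return int(rng.choice(len(consensus), p=consensus))
    return explore_policy(state)
```

Under such a scheme, sustained low disagreement lets the agent chain consecutive consensus actions into a temporally extended action, while high disagreement signals that prior knowledge does not cover the current situation and exploration is needed.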

We analyze the resulting temporal abstractions and experimentally demonstrate their effectiveness in different environments. We show that our method can solve new tasks using only a fraction of the environment interactions required when learning the task from scratch.



Acknowledgements

This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.

Author information

Corresponding author

Correspondence to Matthias Hutsebaut-Buysse.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Hutsebaut-Buysse, M., De Schepper, T., Mets, K., Latré, S. (2021). Disagreement Options: Task Adaptation Through Temporally Extended Actions. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12975. Springer, Cham. https://doi.org/10.1007/978-3-030-86486-6_12

  • DOI: https://doi.org/10.1007/978-3-030-86486-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86485-9

  • Online ISBN: 978-3-030-86486-6

  • eBook Packages: Computer Science, Computer Science (R0)
