
A Cordial Sync: Going Beyond Marginal Policies for Multi-agent Embodied Tasks

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12350)


Abstract

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task’s difficulty outpaces a single agent’s abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync.

U. Jain and L. Weihs—Equal contribution.
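The two ideas named in the abstract lend themselves to a short illustration. Below is a minimal NumPy sketch, not the authors' implementation: the arrays `alpha`, `marginals`, and `incompatible` are hypothetical stand-ins for quantities that would be learned or task-specified. It shows (a) SYNC-style sampling, in which agents pick the same mixture component via a shared source of randomness and then sample independently from their own per-component marginals, so the induced joint policy is a mixture of products of marginals rather than a single product; and (b) a CORDIAL-style penalty on the probability mass the joint policy places on incompatible action pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

num_agents, num_actions, K = 2, 4, 3  # 2 agents, 4 actions each, K mixture components

# Mixture weights over components (in SYNC these would be computed identically
# by every agent, e.g. from communicated messages); fixed here for illustration.
alpha = np.array([0.5, 0.3, 0.2])

# Per-agent, per-component marginals over actions; shape (agents, K, actions).
marginals = rng.dirichlet(np.ones(num_actions), size=(num_agents, K))

# SYNC-style sampling: one shared random draw selects the SAME component k for
# all agents; each agent then samples its own action from its k-th marginal.
shared_u = rng.random()
k = int(np.searchsorted(np.cumsum(alpha), shared_u))
actions = [int(rng.choice(num_actions, p=marginals[i, k]))
           for i in range(num_agents)]

# The implied joint policy: joint[a, b] = sum_k alpha[k] * p0[k, a] * p1[k, b].
joint = np.einsum('k,ka,kb->ab', alpha, marginals[0], marginals[1])

# CORDIAL-style coordination penalty (sketch): `incompatible` marks action
# pairs that can never jointly succeed; the loss is the mass placed on them.
incompatible = np.zeros((num_actions, num_actions))
incompatible[0, 1] = incompatible[1, 0] = 1.0  # hypothetical clashing pair
coord_loss = float((joint * incompatible).sum())

print(actions, coord_loss)
```

Drawing the component index from shared randomness is what lets decentralized agents realize correlated joint behavior beyond a product of marginals, while the penalty term pushes the joint distribution away from action pairs that can never jointly succeed (e.g., pushing the furniture in opposite directions).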



Notes

  1. For FurnMove, each location of the lifted furniture corresponds to 404,480 states, making shortest path computation intractable (more details in the supplement).


Acknowledgements

This material is based upon work supported in part by the National Science Foundation under Grants No. 1563727, 1718221, 1637479, 165205, 1703166, Samsung, 3M, Sloan Fellowship, NVIDIA Artificial Intelligence Lab, Allen Institute for AI, Amazon, AWS Research Awards, and Siebel Scholars Award. We thank M. Wortsman and K.-H. Zeng for their insightful comments.

Author information


Corresponding author

Correspondence to Unnat Jain.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4155 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Jain, U. et al. (2020). A Cordial Sync: Going Beyond Marginal Policies for Multi-agent Embodied Tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12350. Springer, Cham. https://doi.org/10.1007/978-3-030-58558-7_28


  • DOI: https://doi.org/10.1007/978-3-030-58558-7_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58557-0

  • Online ISBN: 978-3-030-58558-7

  • eBook Packages: Computer Science, Computer Science (R0)
