
A Cordial Sync: Going Beyond Marginal Policies for Multi-agent Embodied Tasks

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12350)


Abstract

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task’s difficulty outpaces a single agent’s abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync.

U. Jain and L. Weihs—Equal contribution.
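The two ideas named in the abstract lend themselves to a short illustration. Below is a minimal NumPy sketch, not the authors' implementation: the arrays `alpha`, `marginals`, and `incompatible` are hypothetical stand-ins for quantities that would be learned or task-specified. It shows (a) SYNC-style sampling, in which agents pick the same mixture component via a shared source of randomness and then sample independently from their own per-component marginals, so the induced joint policy is a mixture of products of marginals rather than a single product; and (b) a CORDIAL-style penalty on the probability mass the joint policy places on incompatible action pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

num_agents, num_actions, K = 2, 4, 3  # 2 agents, 4 actions each, K mixture components

# Mixture weights over components (in SYNC these would be computed identically
# by every agent, e.g. from communicated messages); fixed here for illustration.
alpha = np.array([0.5, 0.3, 0.2])

# Per-agent, per-component marginals over actions; shape (agents, K, actions).
marginals = rng.dirichlet(np.ones(num_actions), size=(num_agents, K))

# SYNC-style sampling: one shared random draw selects the SAME component k for
# all agents; each agent then samples its own action from its k-th marginal.
shared_u = rng.random()
k = int(np.searchsorted(np.cumsum(alpha), shared_u))
actions = [int(rng.choice(num_actions, p=marginals[i, k]))
           for i in range(num_agents)]

# The implied joint policy: joint[a, b] = sum_k alpha[k] * p0[k, a] * p1[k, b].
joint = np.einsum('k,ka,kb->ab', alpha, marginals[0], marginals[1])

# CORDIAL-style coordination penalty (sketch): `incompatible` marks action
# pairs that can never jointly succeed; the loss is the mass placed on them.
incompatible = np.zeros((num_actions, num_actions))
incompatible[0, 1] = incompatible[1, 0] = 1.0  # hypothetical clashing pair
coord_loss = float((joint * incompatible).sum())

print(actions, coord_loss)
```

Drawing the component index from shared randomness is what lets decentralized agents realize correlated joint behavior beyond a product of marginals, while the penalty term pushes the joint distribution away from action pairs that can never jointly succeed (e.g., pushing the furniture in opposite directions).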



Notes

  1. For FurnMove, each location of the lifted furniture corresponds to 404,480 states, making shortest path computation intractable (more details in the supplement).


Acknowledgements

This material is based upon work supported in part by the National Science Foundation under Grants No. 1563727, 1718221, 1637479, 165205, 1703166, Samsung, 3M, Sloan Fellowship, NVIDIA Artificial Intelligence Lab, Allen Institute for AI, Amazon, AWS Research Awards, and Siebel Scholars Award. We thank M. Wortsman and K.-H. Zeng for their insightful comments.

Author information


Corresponding author

Correspondence to Unnat Jain.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4155 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Jain, U. et al. (2020). A Cordial Sync: Going Beyond Marginal Policies for Multi-agent Embodied Tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12350. Springer, Cham. https://doi.org/10.1007/978-3-030-58558-7_28


  • DOI: https://doi.org/10.1007/978-3-030-58558-7_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58557-0

  • Online ISBN: 978-3-030-58558-7

  • eBook Packages: Computer Science, Computer Science (R0)
