{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,12,10]],"date-time":"2024-12-10T07:43:10Z","timestamp":1733816590360,"version":"3.30.1"},"reference-count":71,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AIC"],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.<\/jats:p>","DOI":"10.3233\/aic-220113","type":"journal-article","created":{"date-parts":[[2022,9,6]],"date-time":"2022-09-06T15:24:40Z","timestamp":1662477880000},"page":"271-284","source":"Crossref","is-referenced-by-count":0,"title":["Developing, evaluating and scaling learning agents in multi-agent environments"],"prefix":"10.1177","volume":"35","author":[{"given":"Ian","family":"Gemp","sequence":"first","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Thomas","family":"Anthony","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Yoram","family":"Bachrach","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Avishkar","family":"Bhoopchand","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Kalesha","family":"Bullard","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Jerome","family":"Connor","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Vibhavari","family":"Dasagi","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Bart","family":"De Vylder","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Edgar\u00a0A.","family":"Du\u00e9\u00f1ez-Guzm\u00e1n","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Romuald","family":"Elie","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Richard","family":"Everett","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Daniel","family":"Hennes","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Edward","family":"Hughes","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Mina","family":"Khan","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Marc","family":"Lanctot","sequence":"additional","affiliation":[{"name":"Game 
Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Kate","family":"Larson","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Guy","family":"Lever","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Siqi","family":"Liu","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Luke","family":"Marris","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Kevin R.","family":"McKee","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Paul","family":"Muller","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Julien","family":"P\u00e9rolat","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Florian","family":"Strub","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Andrea","family":"Tacchetti","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Eugene","family":"Tarassov","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Zhe","family":"Wang","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]},{"given":"Karl","family":"Tuyls","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}]}],"member":"179","reference":[{"key":"10.3233\/AIC-220113_ref1","first-page":"17987","article-title":"Learning to play no-press diplomacy with best response policy iteration","volume":"33","author":"Anthony","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/AIC-220113_ref3","first-page":"187","article-title":"Some mathematical models of race discrimination in the labor market","author":"Arrow","year":"1972","journal-title":"Racial discrimination in economic life"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref4","DOI":"10.1016\/j.artint.2020.103356"},{"doi-asserted-by":"crossref","unstructured":"Y. Bachrach, I. Gemp, M. Garnelo, J. Kramar, T. Eccles, D. Rosenbaum and T. Graepel, A neural network auction for group decision making over a continuous space, in: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI) Demonstrations Track, 2021.","key":"10.3233\/AIC-220113_ref5","DOI":"10.24963\/ijcai.2021\/706"},{"unstructured":"A. Bakhtin, D. Wu, A. Lerer and N. Brown, No-press diplomacy from scratch, Advances in Neural Information Processing Systems 34 (2021).","key":"10.3233\/AIC-220113_ref6"},{"unstructured":"J. Balaguer, R. K\u00f6ster, C. Summerfield and A. Tacchetti, The good shepherd: An oracle agent for mechanism design, in: ICLR Workshop on Gamification and Multiagent Solutions, 2022.","key":"10.3233\/AIC-220113_ref8"},{"unstructured":"J. Balaguer, R. K\u00f6ster, A. Weinstein, L. Campbell-Gillingham, C. Summerfield, M. Botvinick and A. Tacchetti, HCMD-zero: Learning value aligned mechanisms from data, in: ICLR Workshop on Gamification and Multiagent Solutions, 2022.","key":"10.3233\/AIC-220113_ref9"},{"unstructured":"D. Balduzzi, K. Tuyls, J. Perolat and T. 
Graepel, Re-evaluating evaluation, Advances in Neural Information Processing Systems 31 (2018).","key":"10.3233\/AIC-220113_ref10"},{"key":"10.3233\/AIC-220113_ref11","doi-asserted-by":"publisher","first-page":"659","DOI":"10.1613\/jair.4818","article-title":"Evolutionary dynamics of multi-agent learning: A survey","volume":"53","author":"Bloembergen","year":"2015","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"1","key":"10.3233\/AIC-220113_ref12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1006\/jeth.1997.2319","article-title":"Learning through reinforcement and replicator dynamics","volume":"77","author":"B\u00f6rgers","year":"1997","journal-title":"Journal of Economic Theory"},{"issue":"1","key":"10.3233\/AIC-220113_ref13","first-page":"374","article-title":"Iterative solution of games by fictitious play","volume":"13","author":"Brown","year":"1951","journal-title":"Activity Analysis of Production and Allocation"},{"issue":"4","key":"10.3233\/AIC-220113_ref16","doi-asserted-by":"publisher","first-page":"911","DOI":"10.4310\/CMS.2015.v13.n4.a4","article-title":"Mean field games and systemic risk","volume":"13","author":"Carmona","year":"2015","journal-title":"Communications in Mathematical Sciences"},{"unstructured":"M. Carroll, R. Shah, M.K. Ho, T. Griffiths, S. Seshia, P. Abbeel and A. Dragan, On the utility of learning about humans for human-AI coordination, Advances in Neural Information Processing Systems 32 (2019).","key":"10.3233\/AIC-220113_ref17"},{"unstructured":"R. Chaabouni, F. Strub, F. Altch\u00e9, E. Tarassov, C. Tallec, E. Davoodi, K.W. Mathewson, O. Tieleman, A. Lazaridou and B. Piot, Emergent Communication at Scale, International Conference on Learning Representations, 2021.","key":"10.3233\/AIC-220113_ref18"},{"key":"10.3233\/AIC-220113_ref19","first-page":"17443","article-title":"Real world games look like spinning tops","volume":"33","author":"Czarnecki","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"doi-asserted-by":"crossref","unstructured":"A. Dafoe, Y. Bachrach, G. Hadfield, E. Horvitz, K. Larson and T. Graepel, Cooperative AI: Machines Must Learn to Find Common Ground, Nature Publishing Group, 2021.","key":"10.3233\/AIC-220113_ref20","DOI":"10.1038\/d41586-021-01170-0"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref22","DOI":"10.1109\/CVPR.2009.5206848"},{"issue":"9\u201310","key":"10.3233\/AIC-220113_ref23","doi-asserted-by":"publisher","first-page":"1506","DOI":"10.1016\/j.mcm.2010.06.012","article-title":"Modeling crowd dynamics by the mean-field limit approach","volume":"52","author":"Dogb\u00e9","year":"2010","journal-title":"Mathematical and Computer Modelling"},{"unstructured":"A.M. Donati, G. Quispe, C. Ollion, S.L. Corff, F. Strub and O. Pietquin, Learning natural language generation from scratch, in: Conference of the North American Chapter of the Association for Computational Linguistics, 2022.","key":"10.3233\/AIC-220113_ref24"},{"unstructured":"T. Eccles, Y. Bachrach, G. Lever, A. Lazaridou and T. 
Graepel, Biases for emergent communication in multi-agent reinforcement learning, Advances in Neural Information Processing Systems 32 (2019).","key":"10.3233\/AIC-220113_ref26"},{"issue":"1","key":"10.3233\/AIC-220113_ref27","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1111\/mafi.12291","article-title":"Mean\u2013field moral hazard for optimal energy demand response management","volume":"31","author":"\u00c9lie","year":"2021","journal-title":"Mathematical Finance"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref28","DOI":"10.1051\/mmnp\/2020022"},{"doi-asserted-by":"crossref","unstructured":"R. Elie, J. Perolat, M. Lauri\u00e8re, M. Geist and O. Pietquin, On the convergence of model free learning in mean field games, in: Proc. of AAAI, 2020.","key":"10.3233\/AIC-220113_ref29","DOI":"10.1609\/aaai.v34i05.6203"},{"unstructured":"M. Geist, J. P\u00e9rolat, M. Lauri\u00e8re, R. Elie, S. Perrin, O. Bachem, R. Munos and O. Pietquin, Concave utility reinforcement learning: The mean-field game viewpoint, in: Proc. of AAMAS, 2022.","key":"10.3233\/AIC-220113_ref30"},{"unstructured":"I. Gemp, R. Savani, M. Lanctot, Y. Bachrach, T. Anthony, R. Everett, A. Tacchetti, T. Eccles and J. Kram\u00e1r, Sample-based approximation of Nash in large many-player games via gradient descent, in: Proceedings of the 21st International Conference on Autonomous Agents and MultiAgent Systems, AAMAS \u201922, International Foundation for Autonomous Agents and Multiagent Systems, 2022.","key":"10.3233\/AIC-220113_ref34"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref35","DOI":"10.1098\/rstb.2019.0766"},{"unstructured":"A. Gruslys, M. Lanctot, R. Munos, F. Timbers, M. Schmid, J. Perolat, D. Morrill, V. Zambaldi, J.-B. Lespiau, J. Schultz, M.G. Azar, M. Bowling and K. Tuyls, The Advantage Regret-Matching Actor-Critic, 2020.","key":"10.3233\/AIC-220113_ref37"},{"unstructured":"J. Heinrich, M. Lanctot and D. Silver, Fictitious self-play in extensive-form games, in: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 2015.","key":"10.3233\/AIC-220113_ref38"},{"unstructured":"D. Hennes, D. Morrill, S. Omidshafiei, R. Munos, J. Perolat, M. Lanctot, A. Gruslys, J.-B. Lespiau, P. Parmas, E. Duenez-Guzman and K. Tuyls, Neural replicator dynamics, in: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2020.","key":"10.3233\/AIC-220113_ref40"},{"doi-asserted-by":"crossref","unstructured":"M. Huang, R.P. Malham\u00e9 and P.E. Caines, Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle, Communications in Information & Systems 6 (2006).","key":"10.3233\/AIC-220113_ref41","DOI":"10.4310\/CIS.2006.v6.n3.a5"},{"unstructured":"E. Hughes, J.Z. Leibo, M. Phillips, K. Tuyls, E. Due\u00f1ez-Guzman, A. Garc\u00eda Casta\u00f1eda, I. Dunning, T. Zhu, K. McKee, R. Koster et al., Inequity aversion improves cooperation in intertemporal social dilemmas, Advances in Neural Information Processing Systems 31 (2018).","key":"10.3233\/AIC-220113_ref43"},{"unstructured":"N. Jaques, A. Lazaridou, E. Hughes, C. Gulcehre, P. Ortega, D. Strouse, J.Z. Leibo and N. De Freitas, Social influence as intrinsic motivation for multi-agent deep reinforcement learning, in: International Conference on Machine Learning, PMLR, 2019, pp. 3040\u20133049.","key":"10.3233\/AIC-220113_ref44"},{"unstructured":"A. Kalinowska, E. Davoodi, F. Strub, K. Mathewson, T. Murphey and P. 
Pilarski, Situated Communication: A Solution to Over-Communication Between Artificial Agents, Emergent Communication Workshop at ICLR 2022, 2022.","key":"10.3233\/AIC-220113_ref45"},{"doi-asserted-by":"crossref","unstructured":"R. K\u00f6ster, D. Hadfield-Menell, R. Everett, L. Weidinger, G.K. Hadfield and J.Z. Leibo, Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents, Proceedings of the National Academy of Sciences 119(3) (2022).","key":"10.3233\/AIC-220113_ref47","DOI":"10.1073\/pnas.2106028118"},{"unstructured":"A. Krizhevsky, I. Sutskever and G.E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25 (2012).","key":"10.3233\/AIC-220113_ref48"},{"key":"10.3233\/AIC-220113_ref49","first-page":"1107","article-title":"Least-squares policy iteration","volume":"4","author":"Lagoudakis","year":"2003","journal-title":"The Journal of Machine Learning Research"},{"unstructured":"M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver and T. Graepel, A unified game-theoretic approach to multiagent reinforcement learning, in: Neural Information Processing Systems (NIPS), 2017.","key":"10.3233\/AIC-220113_ref51"},{"unstructured":"M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver and T. Graepel, A unified game-theoretic approach to multiagent reinforcement learning, in: Advances in Neural Information Processing Systems, 2017.","key":"10.3233\/AIC-220113_ref52"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref53","DOI":"10.1007\/s11537-007-0657-8"},{"unstructured":"J.Z. Leibo, E.A. Due\u00f1ez-Guzman, A. Vezhnevets, J.P. Agapiou, P. Sunehag, R. Koster, J. Matyas, C. Beattie, I. Mordatch and T. Graepel, Scalable evaluation of multi-agent reinforcement learning with Melting Pot, in: International Conference on Machine Learning, PMLR, 2021, pp. 6187\u20136199.","key":"10.3233\/AIC-220113_ref55"},{"unstructured":"J.Z. Leibo, J. Perolat, E. Hughes, S. Wheelwright, A.H. Marblestone, E. Du\u00e9\u00f1ez-Guzm\u00e1n, P. Sunehag, I. Dunning and T. Graepel, Malthusian reinforcement learning, in: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019, pp. 1099\u20131107.","key":"10.3233\/AIC-220113_ref57"},{"unstructured":"G. Lever, J. Merel, N. Heess, S. Tunyasuvunakool, S. Liu and T. Graepel, Emergent Coordination Through Competition, in: International Conference on Learning Representations, 2019.","key":"10.3233\/AIC-220113_ref59"},{"unstructured":"S.\u00a0Liu, L.\u00a0Marris, D.\u00a0Hennes, J.\u00a0Merel, N.\u00a0Heess and T.\u00a0Graepel, NeuPL: Neural population learning, in: International Conference on Learning Representations, 2022, https:\/\/openreview.net\/forum?id=MIX3fJkl_1.","key":"10.3233\/AIC-220113_ref61"},{"doi-asserted-by":"crossref","unstructured":"E. Lockhart, M. Lanctot, J. P\u00e9rolat, J.-B. Lespiau, D. Morrill, F. Timbers and K. Tuyls, Computing approximate equilibria in sequential adversarial games by exploitability descent, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019.","key":"10.3233\/AIC-220113_ref62","DOI":"10.24963\/ijcai.2019\/66"},{"unstructured":"L. Marris, P. Muller, M. Lanctot, K. Tuyls and T. Graepel, Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers, in: Proceedings of the 38th International Conference on Machine Learning, M. Meila and T. Zhang, eds, Proceedings of Machine Learning Research, Vol. 139, PMLR, 2021, pp. 
7480\u20137491, http:\/\/proceedings.mlr.press\/v139\/marris21a.html.","key":"10.3233\/AIC-220113_ref63"},{"unstructured":"H.B. McMahan, G.J. Gordon and A. Blum, Planning in the presence of cost functions controlled by an adversary, in: Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 536\u2013543.","key":"10.3233\/AIC-220113_ref65"},{"doi-asserted-by":"crossref","unstructured":"P.R. Milgrom, Putting Auction Theory to Work, Cambridge University Press, 2004.","key":"10.3233\/AIC-220113_ref66","DOI":"10.1017\/CBO9780511813825"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref67","DOI":"10.48550\/ARXIV.2102.02274"},{"unstructured":"P. Muller, S. Omidshafiei, M. Rowland, K. Tuyls, J. Perolat, S. Liu, D. Hennes, L. Marris, M. Lanctot, E. Hughes, Z. Wang, G. Lever, N. Heess, T. Graepel and R. Munos, A generalized training approach for multiagent learning, in: Proceedings of the Eighth International Conference on Learning Representations (ICLR), 2020.","key":"10.3233\/AIC-220113_ref68"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref69","DOI":"10.48550\/ARXIV.2111.08350"},{"unstructured":"R. Munos, J. Perolat, J.-B. Lespiau, M. Rowland, B.D. Vylder, M. Lanctot, F. Timbers, D. Hennes, S. Omidshafiei, A. Gruslys, M.G. Azar, E. Lockhart and K. Tuyls, Fast computation of Nash equilibria in imperfect information games, in: Proceedings of the International Conference on Machine Learning (ICML), 2020.","key":"10.3233\/AIC-220113_ref70"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref71","DOI":"10.1038\/s41598-019-45619-9"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref72","DOI":"10.3390\/e20100782"},{"unstructured":"P. Paquette, Y. Lu, S.S. Bocco, M. Smith, S. Ortiz-Gagn\u00e9, J.K. Kummerfeld, J. Pineau, S. Singh and A.C. Courville, No-press diplomacy: Modeling multi-agent gameplay, Advances in Neural Information Processing Systems 32 (2019).","key":"10.3233\/AIC-220113_ref73"},{"doi-asserted-by":"crossref","unstructured":"R. Patel, M. Garnelo, I. Gemp, C. Dyer and Y. Bachrach, Game-theoretic vocabulary selection via the Shapley value and Banzhaf index, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 2789\u20132798.","key":"10.3233\/AIC-220113_ref74","DOI":"10.18653\/v1\/2021.naacl-main.223"},{"unstructured":"J. Perolat, J.Z. Leibo, V. Zambaldi, C. Beattie, K. Tuyls and T. Graepel, A multi-agent reinforcement learning model of common-pool resource appropriation, Advances in Neural Information Processing Systems 30 (2017).","key":"10.3233\/AIC-220113_ref75"},{"unstructured":"J. Perolat, R. Munos, J.-B. Lespiau, S. Omidshafiei, M. Rowland, P. Ortega, N. Burch, T. Anthony, D. Balduzzi, B.D. Vylder, G. Piliouras, M. Lanctot and K. Tuyls, From Poincar\u00e9 recurrence to convergence in imperfect information games: Finding equilibrium via regularization, in: Proceedings of the Thirty-Eighth International Conference on Machine Learning (ICML), 2021.","key":"10.3233\/AIC-220113_ref76"},{"unstructured":"S. Perrin, J. P\u00e9rolat, M. Lauri\u00e8re, M. Geist, R. Elie and O. Pietquin, Fictitious play for mean field games: Continuous time analysis and applications, in: Proc. of NeurIPS, 2020.","key":"10.3233\/AIC-220113_ref80"},{"unstructured":"M. Rita, F. Strub, J.-B. Grill, O. Pietquin and E. 
Dupoux, On the role of population heterogeneity in emergent communication, in: International Conference on Learning Representations, 2021.","key":"10.3233\/AIC-220113_ref82"},{"issue":"7676","key":"10.3233\/AIC-220113_ref83","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of Go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nat."},{"unstructured":"S. Srinivasan, M. Lanctot, V. Zambaldi, J. P\u00e9rolat, K. Tuyls, R. Munos and M. Bowling, Actor-critic policy optimization in partially observable multiagent environments, in: Advances in Neural Information Processing Systems (NeurIPS), 2018.","key":"10.3233\/AIC-220113_ref84"},{"unstructured":"D. Strouse, K. McKee, M. Botvinick, E. Hughes and R. Everett, Collaborating with humans without human data, Advances in Neural Information Processing Systems 34 (2021).","key":"10.3233\/AIC-220113_ref85"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref86","DOI":"10.1162\/isal_a_00148"},{"unstructured":"E. Szathm\u00e1ry and J.M. Smith, The Major Transitions in Evolution, WH Freeman Spektrum, Oxford, UK, 1995.","key":"10.3233\/AIC-220113_ref87"},{"unstructured":"A. Tacchetti, D. Strouse, M. Garnelo, T. Graepel and Y. Bachrach, Learning truthful, efficient, and welfare maximizing auction rules, in: ICLR Workshop on Gamification and Multiagent Solutions, 2022.","key":"10.3233\/AIC-220113_ref88"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref90","DOI":"10.1007\/978-3-540-39857-8_38"},{"key":"10.3233\/AIC-220113_ref91","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1613\/jair.1.12505","article-title":"Game plan: What AI can do for football, and what football can do for AI","volume":"71","author":"Tuyls","year":"2021","journal-title":"Journal of Artificial Intelligence Research"},{"doi-asserted-by":"publisher","key":"10.3233\/AIC-220113_ref92","DOI":"10.1145\/860575.860687"},{"unstructured":"A. Vezhnevets, Y. Wu, M. Eckstein, R. Leblond and J.Z. Leibo, Options as responses: Grounding behavioural hierarchies in multi-agent reinforcement learning, in: International Conference on Machine Learning, PMLR, 2020, pp. 
9733\u20139742.","key":"10.3233\/AIC-220113_ref93"},{"issue":"7782","key":"10.3233\/AIC-220113_ref95","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nat."},{"key":"10.3233\/AIC-220113_ref97","first-page":"15208","article-title":"Learning to incentivize other learning agents","volume":"33","author":"Yang","year":"2020","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["AI Communications"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/AIC-220113","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,9]],"date-time":"2024-12-09T10:08:05Z","timestamp":1733738885000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/AIC-220113"}},"subtitle":[],"editor":[{"given":"Stefano V.","family":"Albrecht","sequence":"additional","affiliation":[]},{"given":"Michael","family":"Wooldridge","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,9,20]]},"references-count":71,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/aic-220113","relation":{},"ISSN":["1875-8452","0921-7126"],"issn-type":[{"type":"electronic","value":"1875-8452"},{"type":"print","value":"0921-7126"}],"subject":[],"published":{"date-parts":[[2022,9,20]]}}}