{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T18:00:24Z","timestamp":1720461624479},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2022,4]]},"abstract":"Abstract<\/jats:title>Generalization is a major challenge for multi-agent reinforcement learning. How well does an agent perform when placed in novel environments and in interactions with new co-players? In this paper, we investigate and quantify the relationship between generalization and diversity<\/jats:italic> in the multi-agent domain. Across the range of multi-agent environments considered here, procedurally generating training levels significantly improves agent performance on held-out levels. However, agent performance on the specific levels used in training sometimes declines as a result. To better understand the effects of co-player variation, our experiments introduce a new environment-agnostic measure of behavioral diversity. Results demonstrate that population size and intrinsic motivation are both effective methods of generating greater population diversity. In turn, training with a diverse set of co-players strengthens agent performance in some (but not all) cases.<\/jats:p>","DOI":"10.1007\/s10458-022-09548-8","type":"journal-article","created":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T06:02:54Z","timestamp":1647583374000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Quantifying the effects of environment and population diversity in multi-agent reinforcement learning"],"prefix":"10.1007","volume":"36","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-4412-1686","authenticated-orcid":false,"given":"Kevin R.","family":"McKee","sequence":"first","affiliation":[]},{"given":"Joel Z.","family":"Leibo","sequence":"additional","affiliation":[]},{"given":"Charlie","family":"Beattie","sequence":"additional","affiliation":[]},{"given":"Richard","family":"Everett","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,18]]},"reference":[{"key":"9548_CR1","unstructured":"Balduzzi, D., Garnelo, M., Bachrach, Y., Czarnecki, W., Perolat, J., Jaderberg, M., Graepel, T. (2019). Open-ended learning in symmetric zero-sum games. In: International Conference on Machine Learning, pp. 434\u2013443. PMLR."},{"key":"9548_CR2","unstructured":"Beattie, C., K\u00f6ppe, T., Du\u00e9\u00f1ez-Guzm\u00e1n, E.A., Leibo, J.Z. (2020). DeepMind Lab2D. arXiv preprint arXiv:2011.07027."},{"key":"9548_CR3","unstructured":"Carroll, M., Shah, R., Ho, M.K., Griffiths, T., Seshia, S., Abbeel, P., Dragan, A. (2019). On the utility of learning about humans for human-AI coordination. In: Advances in Neural Information Processing Systems, pp. 
5175\u20135186."},{"key":"9548_CR4","doi-asserted-by":"crossref","unstructured":"Charakorn, R., Manoonpong, P., Dilokthanakul, N. (2020). Investigating partner diversification methods in cooperative multi-agent deep reinforcement learning. In: International Conference on Neural Information Processing, pp. 395\u2013402. Springer.","DOI":"10.1007\/978-3-030-63823-8_46"},{"key":"9548_CR5","unstructured":"Cobbe, K., Hesse, C., Hilton, J., Schulman, J. (2019). Leveraging procedural generation to benchmark reinforcement learning. arXiv preprint arXiv:1912.01588."},{"key":"9548_CR6","unstructured":"Cobbe, K., Klimov, O., Hesse, C., Kim, T., Schulman, J. (2019). Quantifying generalization in reinforcement learning. In: International Conference on Machine Learning, pp. 1282\u20131289."},{"issue":"11","key":"9548_CR7","doi-asserted-by":"publisher","first-page":"671","DOI":"10.1037\/h0043943","volume":"12","author":"LJ Cronbach","year":"1957","unstructured":"Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12(11), 671.","journal-title":"American Psychologist"},{"key":"9548_CR8","first-page":"17443","volume":"33","author":"WM Czarnecki","year":"2020","unstructured":"Czarnecki, W. M., Gidel, G., Tracey, B., Tuyls, K., Omidshafiei, S., Balduzzi, D., & Jaderberg, M. (2020). Real world games look like spinning tops. Advances in Neural Information Processing Systems, 33, 17443\u201317454.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"9548_CR9","unstructured":"Dafoe, A., Hughes, E., Bachrach, Y., Collins, T., McKee, K.R., Leibo, J.Z., Larson, K., Graepel, T. (2020). Open problems in cooperative AI. arXiv preprint arXiv:2012.08630."},{"key":"9548_CR10","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1016\/j.neucom.2021.10.040","volume":"468","author":"T Dai","year":"2022","unstructured":"Dai, T., Du, Y., Fang, M., & Bharath, A. A. (2022). Diversity-augmented intrinsic motivation for deep reinforcement learning. Neurocomputing, 468, 396\u2013406.","journal-title":"Neurocomputing"},{"key":"9548_CR11","first-page":"543","volume-title":"Individual differences in human-computer interaction. In: Handbook of Human-Computer Interaction,","author":"DE Egan","year":"1988","unstructured":"Egan, D. E. (1988). Individual differences in human-computer interaction. In: Handbook of Human-Computer Interaction, (pp. 543\u2013568). Netherlands: Elsevier."},{"issue":"4","key":"9548_CR12","doi-asserted-by":"publisher","first-page":"662","DOI":"10.1037\/0022-3514.76.4.662","volume":"76","author":"M Eid","year":"1999","unstructured":"Eid, M., & Diener, E. (1999). Intraindividual variability in affect: Reliability, validity, and personality correlates. Journal of Personality and Social Psychology, 76(4), 662.","journal-title":"Journal of Personality and Social Psychology"},{"key":"9548_CR13","volume-title":"The Rating of Chessplayers Past and Present","author":"AE Elo","year":"1978","unstructured":"Elo, A. E. (1978). The Rating of Chessplayers Past and Present. New York: Arco Publishing."},{"key":"9548_CR14","unstructured":"Everett, R., Cobb, A., Markham, A., Roberts, S. (2019). Optimising worlds to evaluate and influence reinforcement learning agents. In: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, pp. 1943\u20131945. International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"9548_CR15","unstructured":"Eysenbach, B., Gupta, A., Ibarz, J., Levine, S. (2019). 
Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations."},{"key":"9548_CR16","volume-title":"Statistical Methods for Research Workers","author":"RA Fisher","year":"1928","unstructured":"Fisher, R. A. (1928). Statistical Methods for Research Workers. United Kingdom: Oliver & Boyd."},{"issue":"4027\u20134030","key":"9548_CR17","first-page":"1","volume":"6","author":"DA Freedman","year":"1999","unstructured":"Freedman, D. A. (1999). Ecological inference and the ecological fallacy. International Encyclopedia of the Social and Behavioral Sciences, 6(4027\u20134030), 1\u20137.","journal-title":"International Encyclopedia of the Social and Behavioral Sciences"},{"key":"9548_CR18","unstructured":"Haarnoja, T., Tang, H., Abbeel, P., Levine, S. (2017). Reinforcement learning with deep energy-based policies. In: International Conference on Machine Learning, pp. 1352\u20131361. PMLR."},{"key":"9548_CR19","doi-asserted-by":"crossref","unstructured":"Hessel, M., Soyer, H., Espeholt, L., Czarnecki, W., Schmitt, S., van Hasselt, H. (2019). Multi-task deep reinforcement learning with PopArt. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol.\u00a033, pp. 3796\u20133803.","DOI":"10.1609\/aaai.v33i01.33013796"},{"key":"9548_CR20","unstructured":"Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics pp. 65\u201370."},{"key":"9548_CR21","unstructured":"Hu, H., Lerer, A., Peysakhovich, A., Foerster, J. (2020). \u2018Other-play\u2019 for zero-shot coordination. arXiv preprint arXiv:2003.02979."},{"key":"9548_CR22","unstructured":"Hughes, E., Leibo, J.Z., Phillips, M., Tuyls, K., Due\u00f1ez-Guzman, E., Casta\u00f1eda, A.G., Dunning, I., Zhu, T., McKee, K.R., Koster, R., Roff, H., Graepel, T. (2018). Inequity aversion improves cooperation in intertemporal social dilemmas. In: Advances in Neural Information Processing Systems, pp. 3326\u20133336."},{"key":"9548_CR23","unstructured":"Ibrahim, A., Jitani, A., Piracha, D., Precup, D. (2020). Reward redistribution mechanisms in multi-agent reinforcement learning. In: Adaptive Learning Agents Workshop at the International Conference on Autonomous Agents and Multiagent Systems."},{"issue":"6443","key":"9548_CR24","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1126\/science.aau6249","volume":"364","author":"M Jaderberg","year":"2019","unstructured":"Jaderberg, M., Czarnecki, W. M., Dunning, I., Marris, L., Lever, G., Casta\u00f1eda, A. G., Beattie, C., Rabinowitz, N. C., Morcos, A. S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J. Z., Silver, D., Hassabis, D., Kavukcuoglu, K., & Graepel, T. (2019). Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443), 859\u2013865.","journal-title":"Science"},{"key":"9548_CR25","unstructured":"Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P.A., Strouse, D., Leibo, J.Z., De\u00a0Freitas, N. (2019). Intrinsic social motivation via causal influence in multi-agent RL. In: International Conference on Learning Representations."},{"key":"9548_CR26","doi-asserted-by":"crossref","unstructured":"Juliani, A., Khalifa, A., Berges, V.P., Harper, J., Teng, E., Henry, H., Crespi, A., Togelius, J., Lange, D. (2019). Obstacle tower: A generalization challenge in vision, control, and planning. 
arXiv preprint arXiv:1902.01378.","DOI":"10.24963\/ijcai.2019\/373"},{"key":"9548_CR27","unstructured":"Justesen, N., Torrado, R.R., Bontrager, P., Khalifa, A., Togelius, J., Risi, S. (2018). Illuminating generalization in deep reinforcement learning through procedural level generation. arXiv preprint arXiv:1806.10729."},{"key":"9548_CR28","unstructured":"Kingma, D.P., Ba, J., Adam (2014). A method for stochastic optimization. arXiv preprint arXiv:1412.6980."},{"key":"9548_CR29","unstructured":"Knott, P., Carroll, M., Devlin, S., Ciosek, K., Hofmann, K., Dragan, A., Shah, R. (2021). Evaluating the robustness of collaborative agents. arXiv preprint arXiv:2101.05507."},{"key":"9548_CR30","doi-asserted-by":"crossref","unstructured":"Kram\u00e1r, J., Rabinowitz, N., Eccles, T., Tacchetti, A. (2020). Should I tear down this wall? Optimizing social metrics by evaluating novel actions. arXiv preprint arXiv:2004.07625.","DOI":"10.1007\/978-3-030-72376-7_7"},{"key":"9548_CR31","unstructured":"Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., P\u00e9rolat, J., Silver, D., Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 4190\u20134203."},{"key":"9548_CR32","unstructured":"Leibo, J.Z., Perolat, J., Hughes, E., Wheelwright, S., Marblestone, A.H., Du\u00e9\u00f1ez-Guzm\u00e1n, E., Sunehag, P., Dunning, I., Graepel, T. (2019). Malthusian reinforcement learning. In: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, pp. 1099\u20131107. International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"9548_CR33","doi-asserted-by":"crossref","unstructured":"Lerer, A., Peysakhovich, A. (2019). Learning existing social conventions via observationally augmented self-play. In: Proceedings of the 2019 AAAI\/ACM Conference on AI, Ethics, and Society, pp. 107\u2013114.","DOI":"10.1145\/3306618.3314268"},{"key":"9548_CR34","doi-asserted-by":"crossref","unstructured":"Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In: Machine Learning Proceedings 1994 (pp. 157\u2013163). Elsevier.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"9548_CR35","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems, pp. 6382\u20136393."},{"key":"9548_CR36","unstructured":"McKee, K.R., Gemp, I., McWilliams, B., Du\u00e9\u00f1ez-Guzm\u00e1n, E.A., Hughes, E., Leibo, J.Z. (2020). Social diversity and social preferences in mixed-motive reinforcement learning. In: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"9548_CR37","unstructured":"Nieves, N.P., Yang, Y., Slumbers, O., Mguni, D.H., Wen, Y., Wang, J. (2021). Modelling behavioural diversity for learning in open-ended games. arXiv preprint arXiv:2103.07927."},{"key":"9548_CR38","unstructured":"Perolat, J., Leibo, J.Z., Zambaldi, V., Beattie, C., Tuyls, K., Graepel, T. (2017). A multi-agent reinforcement learning model of common-pool resource appropriation. In: Advances in Neural Information Processing Systems, pp. 3643\u20133652."},{"key":"9548_CR39","unstructured":"Sanjaya, R., Wang, J., Yang, Y. (2021). Measuring the non-transitivity in chess. 
arXiv preprint arXiv:2110.11737."},{"key":"9548_CR40","doi-asserted-by":"crossref","unstructured":"Singh, S.P., Barto, A.G., Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In: Advances in Neural Information Processing Systems.","DOI":"10.21236\/ADA440280"},{"key":"9548_CR41","unstructured":"Song, H.F., Abdolmaleki, A., Springenberg, J.T., Clark, A., Soyer, H., Rae, J.W., Noury, S., Ahuja, A., Liu, S., Tirumala, D., Heess, N., Belov, D., Riedmiller, M., Botvinick, M.M. (2019). V-MPO: On-policy maximum a posteriori policy optimization for discrete and continuous control. arXiv preprint arXiv:1909.12238."},{"issue":"1\u20132","key":"9548_CR42","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"RS Sutton","year":"1999","unstructured":"Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1\u20132), 181\u2013211.","journal-title":"Artificial Intelligence"},{"key":"9548_CR43","doi-asserted-by":"crossref","unstructured":"Tukey, J.W. (1949). Comparing individual means in the analysis of variance. Biometrics pp. 99\u2013114.","DOI":"10.2307\/3001913"},{"issue":"7782","key":"9548_CR44","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., \u2026 Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350\u2013354.","journal-title":"Nature"},{"key":"9548_CR45","unstructured":"Wang, R., Lehman, J., Clune, J., Stanley, K.O. (2019). Paired open-ended trailblazer (POET): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753."},{"key":"9548_CR46","doi-asserted-by":"crossref","unstructured":"Wang, R., Lehman, J., Rawal, A., Zhi, J., Li, Y., Clune, J., Stanley, K. (2020). Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. In: International Conference on Machine Learning, pp. 9940\u20139951. PMLR.","DOI":"10.1145\/3321707.3321799"},{"key":"9548_CR47","doi-asserted-by":"crossref","unstructured":"Wang, R.E., Wu, S.A., Evans, J.A., Tenenbaum, J.B., Parkes, D.C., Kleiman-Weiner, M. (2020). Too many cooks: Bayesian inference for coordinating multi-agent collaboration. In: Cooperative AI Workshop at the Conference on Neural Information Processing Systems.","DOI":"10.1093\/oso\/9780198862536.003.0008"},{"key":"9548_CR48","unstructured":"Zhang, C., Vinyals, O., Munos, R., Bengio, S. (2018). A study on overfitting in deep reinforcement learning. 
arXiv preprint arXiv:1804.06893."}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09548-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-022-09548-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09548-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,28]],"date-time":"2022-04-28T17:45:18Z","timestamp":1651167918000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-022-09548-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["9548"],"URL":"https:\/\/doi.org\/10.1007\/s10458-022-09548-8","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"value":"1387-2532","type":"print"},{"value":"1573-7454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,18]]},"assertion":[{"value":"5 February 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 March 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"21"}}
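
The record above is a single response from the Crossref REST API (GET https://api.crossref.org/works/{DOI}). For anyone who wants to re-fetch or inspect it programmatically, here is a minimal Python sketch; it assumes the third-party requests package is installed, and "mailto:you@example.org" is a placeholder contact address for Crossref's polite pool, not a real endpoint requirement beyond etiquette. Field names follow the record itself.

```python
# Minimal sketch: fetch and inspect the Crossref record shown above.
# Assumptions: `requests` is installed; the mailto address is a placeholder
# identifying your client to Crossref's "polite pool".
import requests

DOI = "10.1007/s10458-022-09548-8"

resp = requests.get(
    f"https://api.crossref.org/works/{DOI}",
    headers={"User-Agent": "doc-example/0.1 (mailto:you@example.org)"},
    timeout=30,
)
resp.raise_for_status()
msg = resp.json()["message"]

print(msg["title"][0])                # article title
print(msg["container-title"][0])      # Autonomous Agents and Multi-Agent Systems
print(msg["is-referenced-by-count"])  # citation count at indexing time
print(len(msg.get("reference", [])))  # deposited references (48 here)
```

Note that fields such as is-referenced-by-count and indexed change as Crossref re-indexes, so a fresh fetch may differ from the values recorded above.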