Abstract
Multi-agent reinforcement learning (MARL) is a prevalent learning paradigm for solving stochastic games. In most MARL studies, agents in a game are defined as teammates or enemies beforehand, and the relationships among the agents (i.e., their identities) remain fixed throughout the game. However, in real-world problems, the agent relationships are commonly unknown in advance or change dynamically. Many multi-party interactions start off by asking: who is on my team? This question arises whether it is the first day at the stock exchange or in kindergarten. Training policies for such situations, in the face of imperfect information and ambiguous identities, is therefore an important problem that needs to be addressed. In this work, we develop a novel identity detection reinforcement learning (IDRL) framework that allows an agent to dynamically infer the identities of nearby agents and select an appropriate policy to accomplish the task. In the IDRL framework, a relation network is constructed to deduce the identities of other agents by observing their behaviors, and a danger network is optimized to estimate the risk of false-positive identifications. Beyond that, we propose an intrinsic reward that balances maximizing external rewards against accurate identification. After identifying the cooperation-competition pattern among the agents, IDRL applies an off-the-shelf MARL method to learn the policy. To evaluate the proposed method, we conduct experiments on the Red-10 card-shedding game, and the results show that IDRL achieves superior performance over other state-of-the-art MARL methods. Notably, the relation network identifies the identities of agents on par with top human players, and the danger network reasonably avoids the risk of imperfect identification. The code to reproduce all the reported results is available online at https://github.com/MR-BENjie/IDRL.
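As a reading aid, the following is a minimal PyTorch sketch of the decision loop outlined above: a relation network infers identities from observed behavior, a danger network estimates the false-positive risk, and an intrinsic reward blends the external reward with identification confidence. All module, function, and parameter names here (RelationNet, DangerNet, intrinsic_reward, beta, obs_dim) are illustrative assumptions, not the interfaces of the released repository.

```python
# Minimal sketch of the IDRL decision loop; names are illustrative assumptions.
import torch
import torch.nn as nn


class RelationNet(nn.Module):
    """Infers, from observed behavior, whether each other agent is a teammate."""

    def __init__(self, obs_dim: int, n_others: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_others),  # one teammate logit per other agent
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(obs))  # P(teammate) for each other agent


class DangerNet(nn.Module):
    """Estimates the risk that the current identification is a false positive."""

    def __init__(self, obs_dim: int, n_others: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_others, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, identity: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(torch.cat([obs, identity], dim=-1)))


def intrinsic_reward(extrinsic: float, id_confidence: torch.Tensor,
                     beta: float = 0.1) -> float:
    """Blend the environment reward with a bonus for confident identification."""
    return extrinsic + beta * id_confidence.mean().item()


# Usage sketch: infer identities, gate them by estimated risk, then hand the
# resulting cooperation-competition pattern to an off-the-shelf MARL policy.
obs_dim, n_others = 64, 3
relation, danger = RelationNet(obs_dim, n_others), DangerNet(obs_dim, n_others)
obs = torch.randn(1, obs_dim)
p_teammate = relation(obs)                      # inferred identities
risk = danger(obs, p_teammate)                  # estimated false-positive risk
teammates = (p_teammate > 0.5) & (risk < 0.5)   # act only on low-risk identifications
reward = intrinsic_reward(extrinsic=1.0, id_confidence=p_teammate)
```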
Acknowledgements
We acknowledge funding support for this work from the Key Program of the National Natural Science Foundation of China (Grant No. 51935005), the Basic Research Project (Grant No. JCKY20200603C010), the China Academy of Launch Vehicle Technology (CALT2022-18), the Natural Science Foundation of Heilongjiang Province of China (Grant No. LH2021F023), and the Science and Technology Planning Project of Heilongjiang Province of China (Grant No. GA21C031).
Appendices
Appendix A: Red-10 game rules
Deck. The Red-10 game is played with a standard 52-card deck comprising 13 ranks in each of the four suits: clubs, diamonds, hearts, and spades. Within each suit, the ranks from highest to lowest are 2, A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3.
Card combination categories. Similar to Doudizhu, Red-10 has a rich set of card combination categories, listed below (an illustrative comparison sketch follows the list).
- Solo: any individual card, ranked by its face rank.
- Pair: two identically ranked cards, ranked by their face rank.
- Trio: three identically ranked cards, ranked by their face rank.
- Trio with solo: a trio plus any single card, ranked by the trio.
- Trio with pair: a trio plus a pair, ranked by the trio.
- Solo chain: no fewer than five cards of consecutive ranks, ranked by the lowest rank in the chain.
- Pairs chain: no fewer than three consecutive pairs, ranked by the lowest rank in the chain.
- Airplane: no fewer than two consecutive trios, ranked by the lowest rank in the combination.
- Airplane with small wings: no fewer than two consecutive trios plus as many extra single cards as trios, ranked by the lowest rank in the chain of trios.
- Airplane with large wings: no fewer than two consecutive trios plus as many extra pairs as trios, ranked by the lowest rank in the chain of trios.
- Four with two single cards: four cards of equal rank plus two single cards, ranked by the four cards.
- Four with two pairs: four cards of equal rank plus two pairs, ranked by the four cards.
- Bomb: four cards of equal rank.
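The sketch below illustrates the rank order and the "same category, higher rank, or bomb" comparison rule used in the card-playing phase. The names (RANK_VALUE, beats, key_rank) are illustrative assumptions, the code does not reflect the environment's internal representation, and chain-length matching is omitted for brevity.

```python
# Illustrative helper for the rank order and the comparison rule above.
RANKS = ["3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A", "2"]
RANK_VALUE = {r: i for i, r in enumerate(RANKS)}  # "3" lowest, "2" highest


def beats(prev_category: str, prev_key_rank: str,
          next_category: str, next_key_rank: str) -> bool:
    """Can the next play be made on top of the previous one?

    key_rank is the rank that orders a combination: the face rank for a
    solo/pair/trio, the lowest rank for a chain or airplane, the quad rank
    for a bomb or four-with-kickers. Chain-length matching is not modeled.
    """
    if next_category == "bomb":
        # A bomb beats any non-bomb; a higher bomb beats a lower one.
        return (prev_category != "bomb"
                or RANK_VALUE[next_key_rank] > RANK_VALUE[prev_key_rank])
    # Otherwise the category must match and the key rank must be strictly higher.
    return (next_category == prev_category
            and RANK_VALUE[next_key_rank] > RANK_VALUE[prev_key_rank])


assert beats("pair", "9", "pair", "K")        # K-K beats 9-9
assert beats("solo_chain", "3", "bomb", "7")  # a bomb beats any non-bomb play
assert not beats("trio", "A", "trio", "A")    # an equal rank does not beat
```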
A Red-10 game consists of two phases:

1. Dealing: a shuffled deck of 52 cards is dealt evenly to the four players in turn.
2. Card-playing: the four players play cards in turn; the first player may play any category. Each subsequent player must play cards of the same category with a higher rank, or a bomb; otherwise, they pass. If three consecutive players pass, the next player may again play any category. The game ends when any player runs out of cards.
Winner. Players holding a red 10 are on the "Landlord" team, and the others are on the "Peasant" team. The first team in which a player runs out of cards wins. A minimal sketch of this game flow is given below.
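The following sketch covers the dealing and card-playing phases and the winner rule under simplifying assumptions: the function names (deal, landlord_team, play_game, choose_move) are hypothetical, and move legality is delegated to a choose_move callback rather than enforced.

```python
# Minimal sketch of the two game phases and the winner rule; names are
# illustrative and not part of the released environment.
import random

RANKS = ["2", "A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3"]
SUITS = "CDHS"  # clubs, diamonds, hearts, spades


def deal():
    """Dealing phase: shuffle the 52-card deck and deal 13 cards to each player."""
    deck = [rank + suit for rank in RANKS for suit in SUITS]
    random.shuffle(deck)
    return [deck[i::4] for i in range(4)]


def landlord_team(hands):
    """Players holding a red 10 (hearts or diamonds) form the Landlord team."""
    return {i for i, hand in enumerate(hands) if "10H" in hand or "10D" in hand}


def play_game(hands, choose_move):
    """Card-playing phase: players shed cards in turn until one hand is empty.

    choose_move(player, hand) returns the cards to shed (an empty list means
    "pass"); a full implementation would also enforce that each move beats the
    previous one and that three consecutive passes let the next player lead.
    """
    landlords = landlord_team(hands)
    player = 0
    while True:
        for card in choose_move(player, hands[player]):
            hands[player].remove(card)
        if not hands[player]:
            return "Landlord" if player in landlords else "Peasant"
        player = (player + 1) % 4


# Example run: every player naively sheds one card per turn.
print(play_game(deal(), lambda p, hand: hand[:1]))
```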
Appendix B: Detailed input data
In the Red-10 game environment, the detailed input features of the Q action-value function, the relation network, and the danger network are listed in Tables 8 and 9.
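Since Tables 8 and 9 are not reproduced here, the snippet below shows only one illustrative way to encode a hand of cards as a binary suit-by-rank matrix; it is an assumption for exposition, not the feature layout actually used by IDRL.

```python
# Illustrative hand encoding only; the actual input features of the Q network,
# relation network, and danger network are those specified in Tables 8 and 9.
import numpy as np

RANKS = ["3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A", "2"]
SUITS = ["C", "D", "H", "S"]


def encode_hand(hand):
    """Encode a list of cards such as ["10H", "AS"] into a 4 x 13 binary matrix."""
    x = np.zeros((len(SUITS), len(RANKS)), dtype=np.float32)
    for card in hand:
        rank, suit = card[:-1], card[-1]
        x[SUITS.index(suit), RANKS.index(rank)] = 1.0
    return x


print(encode_hand(["10H", "10D", "AS", "3C"]).sum())  # -> 4.0 (four cards set)
```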
Appendix C: Experiment hyper-parameters
We list the hyper-parameters of IDRL in the Red-10 experiments in Table 10 and the hyper-parameters of the baseline algorithms in Table 11.
About this article
Cite this article
Han, S., Li, S., An, B. et al. Classifying ambiguous identities in hidden-role Stochastic games with multi-agent reinforcement learning. Auton Agent Multi-Agent Syst 37, 35 (2023). https://doi.org/10.1007/s10458-023-09620-x