
Classifying ambiguous identities in hidden-role Stochastic games with multi-agent reinforcement learning


Abstract

Multi-agent reinforcement learning (MARL) is a prevalent learning paradigm for solving stochastic games. In most MARL studies, agents in a game are defined as teammates or enemies beforehand, and the relationships among the agents (i.e., their identities) remain fixed throughout the game. However, in real-world problems, the agent relationships are commonly unknown in advance or dynamically changing. Many multi-party interactions start off by asking: who is on my team? This question arises whether it is the first day at the stock exchange or in kindergarten. Therefore, training policies for such situations in the face of imperfect information and ambiguous identities is an important problem that needs to be addressed. In this work, we develop a novel identity detection reinforcement learning (IDRL) framework that allows an agent to dynamically infer the identities of nearby agents and select an appropriate policy to accomplish the task. In the IDRL framework, a relation network is constructed to deduce the identities of other agents by observing their behaviors. A danger network is optimized to estimate the risk of false-positive identifications. Beyond that, we propose an intrinsic reward that balances the need to maximize external rewards with the need for accurate identification. After identifying the cooperation-competition pattern among the agents, IDRL applies one of the off-the-shelf MARL methods to learn the policy. To evaluate the proposed method, we conduct experiments on the Red-10 card-shedding game, and the results show that IDRL achieves superior performance over other state-of-the-art MARL methods. Impressively, the relation network identifies the agents' identities on par with top human players, and the danger network reasonably avoids the risk of imperfect identification. The code to reproduce all the reported results is available online at https://github.com/MR-BENjie/IDRL.
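The interplay of the components named in the abstract (relation network, danger network, policy selection, intrinsic reward) can be pictured with the minimal Python sketch below. It is an illustration only: the names relation_net, danger_net, policy_bank, and the 0.5 thresholds are assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of the IDRL decision loop described above. All names and
# thresholds are hypothetical, chosen only to illustrate the idea.

def idrl_step(observation, behavior_history, relation_net, danger_net, policy_bank):
    # Relation network: infer whether each nearby agent is a teammate,
    # based on the behaviors observed so far.
    teammate_probs = relation_net(observation, behavior_history)  # one probability per agent
    identities = tuple(p > 0.5 for p in teammate_probs)           # True = teammate, False = enemy

    # Danger network: estimate the risk that this identification is a false positive.
    risk = danger_net(observation, behavior_history)

    # If identification looks unreliable, fall back to a conservative policy;
    # otherwise act with the policy trained for this cooperation-competition pattern.
    key = "conservative" if risk > 0.5 else identities
    policy = policy_bank.get(key, policy_bank["conservative"])
    return policy(observation)

# During training, the intrinsic reward mentioned in the abstract would combine
# the external game reward with an identification-accuracy term, e.g.
#   r_total = r_external + beta * r_identify
# where beta is an assumed weighting coefficient, not a value from the paper.
```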





Acknowledgements

This work was supported by the Key Program of the National Natural Science Foundation of China (Grant No. 51935005), the Basic Research Project (Grant No. JCKY20200603C010), the China Academy of Launch Vehicle Technology (CALT2022-18), the Natural Science Foundation of Heilongjiang Province of China (Grant No. LH2021F023), and the Science and Technology Planning Project of Heilongjiang Province of China (Grant No. GA21C031).

Author information


Corresponding author

Correspondence to Siyuan Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Red-10 game rules

Deck The Red-10 game is played with a standard 52-card deck comprising 13 ranks in each of the four suits: clubs, diamonds, hearts, and spades. Within each suit, cards rank from highest to lowest as 2, A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3.

Card combination categories Similar to Doudizhu, Red-10 admits a rich set of card combination categories, listed below (a code sketch of the rank order and the simplest categories follows the list).

  • Solo: Any individual card, ranked according to its face rank.

  • Pair: Any pair of identically ranked cards, ranked according to its face rank.

  • Trio: Any three identically ranked cards, ranked according to its face rank.

  • Trio with solo: Any three identically ranked cards with a solo, ranked according to the trio.

  • Trio with pair: Any three identically ranked cards with a pair, ranked according to the trio.

  • Solo chain: No fewer than five consecutive card ranks, ranked by the lowest rank in the chain.

  • Pairs chain: No fewer than three consecutive pairs, ranked by the lowest rank in the chain.

  • Airplane: No fewer than two consecutive trios, ranked by the lowest rank in the combination.

  • Airplane with small wings: No fewer than two consecutive trios, each carrying one extra solo card (i.e., as many solo cards as trios), ranked by the lowest rank in the chain of trios.

  • Airplane with large wings: No fewer than two consecutive trios, each carrying one extra pair (i.e., as many pairs as trios), ranked by the lowest rank in the chain of trios.

  • Four with two single cards: Four cards of equal rank together with two individual cards, ranked by the four cards.

  • Four with two pairs: Four cards of equal rank together with two pairs, ranked by the four cards.

  • Bomb: Four cards of equal rank.
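To make the definitions above concrete, here is a small Python sketch (an illustration under assumed names, not code from the paper or its repository) of the Red-10 rank order and a classifier for the categories that do not involve chains.

```python
# Sketch of the Red-10 rank order and a classifier for the non-chain categories.
from collections import Counter

# Ranks from highest to lowest, as described in Appendix A.
RANK_ORDER = ["2", "A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3"]
RANK_VALUE = {r: len(RANK_ORDER) - i for i, r in enumerate(RANK_ORDER)}  # higher = stronger

def classify(cards):
    """Classify a play such as ["9", "9", "9", "4"] into a category name.
    Chains and airplanes are omitted for brevity."""
    counts = sorted(Counter(cards).values(), reverse=True)
    if counts == [1]:
        return "solo"
    if counts == [2]:
        return "pair"
    if counts == [3]:
        return "trio"
    if counts == [3, 1]:
        return "trio with solo"
    if counts == [3, 2]:
        return "trio with pair"
    if counts == [4]:
        return "bomb"
    if counts == [4, 1, 1]:
        return "four with two single cards"
    if counts == [4, 2, 2]:
        return "four with two pairs"
    return "unrecognized"  # chains and airplanes would need extra checks

# Example: a trio of nines beats a trio of sixes, since RANK_VALUE["9"] > RANK_VALUE["6"].
```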

A game of Red-10 consists of the following two phases.

  1. Dealing: A shuffled 52-card deck is dealt evenly to the four players in turn, so each player holds 13 cards.

  2. Card-playing: The four players play cards in turn; the first player may play any category. Each subsequent player must play a combination of the same category with a higher rank, or a bomb; otherwise, they pass. If three consecutive players pass, the fourth player may again play any category. The game ends when any player runs out of cards.

Winner Players holding a red 10 (the 10 of hearts or the 10 of diamonds) form the "Landlord" team, and the others form the "Peasant" team. The team whose player first runs out of cards wins; a sketch of this team assignment is given below.
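The following Python sketch illustrates the team assignment and winning condition; hands, assign_teams, and winning_team are hypothetical names used only for illustration, not the paper's environment API.

```python
# Sketch of Red-10 team assignment and winner determination.

def assign_teams(hands):
    """hands: dict mapping player id -> list of (rank, suit) tuples.
    Players holding a red 10 (hearts or diamonds) form the Landlord team;
    everyone else is on the Peasant team."""
    landlords = {p for p, hand in hands.items()
                 if any(rank == "10" and suit in ("hearts", "diamonds")
                        for rank, suit in hand)}
    peasants = set(hands) - landlords
    return landlords, peasants

def winning_team(first_player_out, landlords):
    """The team of the first player to empty their hand wins."""
    return "Landlord" if first_player_out in landlords else "Peasant"
```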

Appendix B: Detailed input data

In the Red-10 game environment, the detailed input data of the Q action-value function, the relation network, and the danger network are listed in Tables 8 and 9.

Table 8 Input Data of the Q Action-Value Function
Table 9 Input data of the relation and danger networks

Appendix C: Experiments hyper-parameters

We list the hyper-parameters of IDRL used in the Red-10 experiments in Table 10 and the hyper-parameters of the baseline algorithms in Table 11.

Table 10 Hyper-parameters of IDRL experiments
Table 11 Hyper-parameters of baseline algorithms

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Han, S., Li, S., An, B. et al. Classifying ambiguous identities in hidden-role Stochastic games with multi-agent reinforcement learning. Auton Agent Multi-Agent Syst 37, 35 (2023). https://doi.org/10.1007/s10458-023-09620-x

