
Heterogeneous Multi-unit Control with Curriculum Learning for Multi-agent Reinforcement Learning

  • Conference paper
Data Mining and Big Data (DMBD 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1744)


Abstract

Heterogeneous multi-unit control, which focuses on controlling agents with different types of functions, is one of the most actively studied topics in multi-agent systems. Methods that utilize parameter or replay-buffer sharing can address the problem of combinatorial explosion under the isomorphism assumption, but may diverge in heterogeneous settings. This work uses curriculum learning to bypass the needle-in-a-haystack barrier faced by both joint-action learners and independent learners. In experiments on heterogeneous force combat engagements, the independent learner with curriculum learning outperforms the baseline learner by 10% on the evaluation metrics, which empirically shows that curriculum learning can discover a novel learning trajectory that conventional multi-agent learners do not follow.
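As a rough illustration of the scheme the abstract describes, the minimal Python sketch below trains independent learners (one value table per unit, with no parameter or replay-buffer sharing) on a sequence of progressively harder heterogeneous engagements, carrying their policies from stage to stage. It is not the authors' implementation: the environment interface (make_env, env.reset, env.step), the per-stage episode budget, and the promotion threshold are all hypothetical placeholders.

```python
import random

class IndependentLearner:
    """One tabular Q-learner per unit; no parameter or buffer sharing."""

    def __init__(self, n_actions, lr=0.1, gamma=0.95, eps=0.1):
        self.q = {}  # state -> list of per-action values
        self.n_actions = n_actions
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, state):
        # epsilon-greedy over this unit's own value table
        if random.random() < self.eps or state not in self.q:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[state][a])

    def update(self, s, a, r, s2):
        qs = self.q.setdefault(s, [0.0] * self.n_actions)
        target = r + self.gamma * max(self.q.get(s2, [0.0] * self.n_actions))
        qs[a] += self.lr * (target - qs[a])


def train_stage(env, learners, episodes):
    """Train all learners on one curriculum stage; return the recent win rate."""
    wins = []
    for _ in range(episodes):
        states = env.reset()
        done, won = False, False
        while not done:
            actions = [lrn.act(s) for lrn, s in zip(learners, states)]
            next_states, rewards, done, won = env.step(actions)
            for lrn, s, a, r, s2 in zip(learners, states, actions,
                                        rewards, next_states):
                lrn.update(s, a, r, s2)  # each unit learns independently
            states = next_states
        wins.append(won)
    window = wins[-100:]
    return sum(window) / len(window)


def curriculum_train(make_env, learners, difficulties, threshold=0.8):
    """Advance to a harder engagement only after the current one is solved,
    reusing the same heterogeneous learners across all stages."""
    for d in difficulties:
        env = make_env(difficulty=d)
        while train_stage(env, learners, episodes=500) < threshold:
            pass  # repeat this stage until the win rate clears the threshold
```

Gating promotion on the current stage's win rate is the point of the curriculum: each learner reaches a competent joint behavior on an easy engagement before facing the harder, sparser-reward one, instead of searching for the needle in the haystack directly.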

This work is supported by the National Natural Science Foundation of China (Grant Nos. 62250037, 62276008, and 62076010), and partially supported by the Science and Technology Innovation 2030 ‘New Generation Artificial Intelligence’ Major Project (Grant Nos. 2018AAA0102301 and 2018AAA0100302).



Author information


Corresponding authors

Correspondence to Shaoqiu Zheng or Ying Tan.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Chen, J., Jiang, K., Liang, R., Wang, J., Zheng, S., Tan, Y. (2022). Heterogeneous Multi-unit Control with Curriculum Learning for Multi-agent Reinforcement Learning. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_1


  • DOI: https://doi.org/10.1007/978-981-19-9297-1_1


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-9296-4

  • Online ISBN: 978-981-19-9297-1

  • eBook Packages: Computer Science, Computer Science (R0)
