Abstract
Heterogeneous multi-unit control is one of the most studied topics in multi-agent systems; it focuses on controlling agents with different types of functions. Methods that share parameters or replay buffers can address the problem of combinatorial explosion under an isomorphism assumption, but may diverge in heterogeneous settings. This work uses curriculum learning to bypass the needle-in-a-haystack barrier faced by both joint-action learners and independent learners. In experiments on heterogeneous force combat engagements, the independent learner with curriculum learning outperforms the baseline learner by 10% on the evaluation metrics, which empirically shows that curriculum learning can discover a novel learning trajectory that conventional multi-agent learners do not follow.
This work is supported by the National Natural Science Foundation of China (Grant Nos. 62250037, 62276008 and 62076010), and partially supported by the Science and Technology Innovation 2030 ‘New Generation Artificial Intelligence’ Major Project (Grant Nos. 2018AAA0102301 and 2018AAA0100302).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, J., Jiang, K., Liang, R., Wang, J., Zheng, S., Tan, Y. (2022). Heterogeneous Multi-unit Control with Curriculum Learning for Multi-agent Reinforcement Learning. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_1
DOI: https://doi.org/10.1007/978-981-19-9297-1_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer Science (R0)