JRM Vol.36 p.508 (2024) | Fuji Technology Press: academic journal publisher


JRM Vol.36 No.3 pp. 508-516 (2024)
doi: 10.20965/jrm.2024.p0508

Review:

Learning Agents in Robot Navigation: Trends and Next Challenges

Fumito Uwano

Okayama University
3-1-1 Tsushima-naka, Kita-ku, Okayama 700-8530, Japan

Received: February 15, 2024
Accepted: April 12, 2024
Published: June 20, 2024
Keywords: multi-agent system, reinforcement learning, robotics, navigation
Abstract

Multiagent reinforcement learning performs well in many settings, such as social simulation and data mining, and it stands out particularly in robot control. In this approach, artificial agents act within a system and learn policies that satisfy both themselves and the other agents. Robots encode policies to simulate the performance. Therefore, learning should maintain and improve system performance. Previous studies have attempted various approaches to outperform control robots. This paper provides an overview of multiagent reinforcement learning work, primarily on navigation. Specifically, we discuss current achievements and limitations, followed by future challenges.

Multi-robot navigation with path finding


Cite this article as:
F. Uwano, “Learning Agents in Robot Navigation: Trends and Next Challenges,” J. Robot. Mechatron., Vol.36 No.3, pp. 508-516, 2024.


Last updated on Jan. 08, 2025