
Double Critic Deep Reinforcement Learning for Mapless 3D Navigation of Unmanned Aerial Vehicles

  • Regular paper
  • Journal of Intelligent & Robotic Systems

Abstract

This paper presents a novel deep reinforcement learning-based system for 3D mapless navigation of Unmanned Aerial Vehicles (UAVs). Instead of using an image-based sensing approach, we propose a simple learning system that uses only a few sparse range readings from a distance sensor to train a learning agent. We base our approaches on two state-of-the-art double-critic Deep-RL models: Twin Delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC). We show that both of our approaches outperform an approach based on the Deep Deterministic Policy Gradient (DDPG) technique and the BUG2 algorithm. Moreover, our new Deep-RL structure based on Recurrent Neural Networks (RNNs) outperforms the structure currently used to perform mapless navigation of mobile robots. Overall, we conclude that double-critic Deep-RL approaches with RNNs are better suited to perform mapless navigation and obstacle avoidance of UAVs.
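For context on the "double critic" mechanism the abstract refers to, the sketch below illustrates the clipped double-Q critic update at the core of TD3 (Fujimoto et al., 2018) in PyTorch. It is a minimal illustration under stated assumptions, not the authors' code: the state and action dimensions, network sizes, and hyperparameters are hypothetical, and the paper's own networks are recurrent, which this feed-forward sketch omits.

```python
# Minimal sketch of the clipped double-Q ("double critic") update behind TD3.
# NOT the authors' implementation: the state layout (a few sparse range
# readings plus goal information), the action space, the network sizes, and
# all hyperparameters below are illustrative assumptions; the paper's actual
# networks are recurrent rather than the feed-forward ones used here.
import torch
import torch.nn as nn

STATE_DIM = 26   # assumed: sparse range readings + relative goal distance/angles
ACTION_DIM = 3   # assumed: linear velocity, altitude rate, yaw rate

def mlp(in_dim, out_dim):
    """Small feed-forward network (illustrative stand-in for the paper's RNNs)."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor, actor_t = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
critic1, critic2 = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
critic1_t, critic2_t = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
for tgt, src in [(actor_t, actor), (critic1_t, critic1), (critic2_t, critic2)]:
    tgt.load_state_dict(src.state_dict())  # target networks start as copies

critic_opt = torch.optim.Adam([*critic1.parameters(), *critic2.parameters()], lr=3e-4)

def critic_update(state, action, reward, next_state, done,
                  gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """One critic step: the target uses the MINIMUM of two target critics,
    the double-critic trick that curbs Q-value overestimation."""
    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action.
        noise = (torch.randn_like(action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (torch.tanh(actor_t(next_state)) + noise).clamp(-1.0, 1.0)
        sa_next = torch.cat([next_state, next_action], dim=1)
        target = reward + gamma * (1.0 - done) * torch.min(critic1_t(sa_next),
                                                           critic2_t(sa_next))
    sa = torch.cat([state, action], dim=1)
    loss = (nn.functional.mse_loss(critic1(sa), target)
            + nn.functional.mse_loss(critic2(sa), target))
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
    return loss.item()

# Toy batch showing the update runs (random data, for illustration only):
B = 32
loss = critic_update(torch.randn(B, STATE_DIM), torch.rand(B, ACTION_DIM) * 2 - 1,
                     torch.randn(B, 1), torch.randn(B, STATE_DIM), torch.zeros(B, 1))
print(f"critic loss: {loss:.4f}")
```

SAC, the other double-critic model the paper evaluates, uses the same two-critic minimum in its soft Bellman target, with an added entropy term in the objective.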



Acknowledgement

We want to thank the National Council for Scientific and Technological Development (CNPq), the Coordination for the Improvement of Higher Education Personnel (CAPES) - Finance Code 001, PRH-ANP, and all participants of VersusAI. We also want to acknowledge that the data used in this work differs completely from our previous work; a new evaluation was carried out for all the statistics presented.

Funding

This work was mainly funded by the Coordination for the Improvement of Higher Education Personnel (CAPES). It also had support from the National Council for Scientific and Technological Development (CNPq) and from the National Agency of Petroleum, Natural Gas, and Biofuels (PRH-ANP).

Author information


Contributions

- Ricardo Bedin Grando conceptualized the study, wrote the article, developed and programmed the experiments, and collected and analyzed the test data.

- Junior Costa de Jesus wrote the article and gathered and analyzed the test data.

- Victor Augusto Kich wrote the article, programmed the experiments, and gathered and analyzed the test data.

- Alisson Henrique Kolling wrote the article, programmed the experiments, and gathered and analyzed the test data.

- Paulo Lilles Jorge Drews Jr. conceptualized the research, wrote the article, and led the debate on the article’s major topics.

Corresponding author

Correspondence to Ricardo Bedin Grando.

Ethics declarations

Ethics approval

All authors have ethically approved this work.

Consent for Publication

This paper can be published; permission was given by the author and all the coauthors.

Competing interests

The authors declare that this work presents no competing interests.

Additional information

Availability of data and material

GitHub repository: https://github.com/ricardoGrando/hydrone_deep_rl_jint.

Consent to Participate

All authors gave their consent to participate in this article.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Grando, R.B., de Jesus, J., Kich, V.A. et al. Double Critic Deep Reinforcement Learning for Mapless 3D Navigation of Unmanned Aerial Vehicles. J Intell Robot Syst 104, 29 (2022). https://doi.org/10.1007/s10846-021-01568-y


Keywords

Navigation