
A Review of Image and Point Cloud Fusion in Autonomous Driving

  • Conference paper
  • First Online:
6GN for Future Wireless Networks (6GN 2023)

Abstract

In autonomous driving perception tasks, multi-sensor fusion has gradually become the mainstream approach. Researchers currently use multimodal fusion to leverage complementary information and ultimately improve target detection performance. Most current research focuses on the fusion of cameras and LiDAR. In this paper, we summarize multimodal deep learning approaches for autonomous driving perception tasks from the last five years, and we provide a detailed analysis of several papers on target detection using LiDAR and cameras. Unlike the traditional way of classifying fusion models, this paper classifies them into three types of structures, according to the stage at which features are fused in the model: data fusion, feature fusion, and result fusion. Finally, we propose that future work should clarify evaluation metrics, improve data augmentation methods across modalities, and use multiple fusion methods in parallel.
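The three-stage taxonomy in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: real detectors use learned backbones and calibrated LiDAR-to-image projection, not the toy index-based "projection" and averaging shown here. The function names and array shapes are illustrative only, chosen to show *where* in the pipeline each fusion type combines the modalities.

```python
import numpy as np

def data_fusion(pixels, points):
    """Data (early) fusion: combine raw modalities before any feature
    extraction, e.g. 'painting' each LiDAR point with image information.
    Here we naively pair point i with pixel i as a stand-in for a real
    calibrated projection."""
    painted = np.concatenate([points, pixels[: len(points)]], axis=1)
    return painted

def feature_fusion(img_feat, pc_feat):
    """Feature (mid-level) fusion: each modality is encoded separately,
    and the intermediate feature vectors are merged inside the network."""
    return np.concatenate([img_feat, pc_feat], axis=-1)

def result_fusion(img_scores, pc_scores, w=0.5):
    """Result (late) fusion: each branch runs a full detector on its own;
    only the outputs (here, per-class scores) are combined."""
    return w * img_scores + (1 - w) * pc_scores
```

For example, painting 10 LiDAR points (x, y, z, intensity) with RGB values yields a 7-dimensional input per point, while late fusion leaves both branches untouched and only blends their score vectors.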


Notes

  1. Supported by the National Natural Science Foundation of China Youth Fund (No. 61802247), the Natural Science Foundation of Shanghai (No. 22ZR1425300), and other projects of the Shanghai Science and Technology Commission (No. 21010501000).



Correspondence to Xiaoya Wang.


Copyright information

© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Wang, X.Y., Zhang, P.P., Dou, M.S., Tian, S.H. (2024). A Review of Image and Point Cloud Fusion in Autonomous Driving. In: Li, J., Zhang, B., Ying, Y. (eds) 6GN for Future Wireless Networks. 6GN 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 554. Springer, Cham. https://doi.org/10.1007/978-3-031-53404-1_6


  • DOI: https://doi.org/10.1007/978-3-031-53404-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53403-4

  • Online ISBN: 978-3-031-53404-1

  • eBook Packages: Computer Science, Computer Science (R0)
