Abstract
Multi-sensor fusion is becoming the mainstream approach in autonomous driving perception. Researchers currently use multimodal fusion to combine complementary information and ultimately improve object detection performance, and most existing work focuses on fusing camera and LiDAR data. This paper surveys deep-learning-based multimodal approaches to autonomous driving perception tasks from the last five years and provides a detailed analysis of several papers on object detection with LiDAR and cameras. Unlike traditional taxonomies of fusion models, this paper classifies them by the stage at which fusion occurs into three structures: data fusion, feature fusion, and result fusion. Finally, we suggest that future work should clarify evaluation metrics, improve data augmentation methods for the different modalities, and apply multiple fusion methods in parallel.
Notes
1. Supported by the National Natural Science Foundation of China Youth Fund (No. 61802247), the Natural Science Foundation of Shanghai (No. 22ZR1425300), and other projects of the Shanghai Science and Technology Commission (No. 21010501000).
© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Wang, X.Y., Zhang, P.P., Dou, M.S., Tian, S.H. (2024). A Review of Image and Point Cloud Fusion in Autonomous Driving. In: Li, J., Zhang, B., Ying, Y. (eds.) 6GN for Future Wireless Networks. 6GN 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 554. Springer, Cham. https://doi.org/10.1007/978-3-031-53404-1_6
DOI: https://doi.org/10.1007/978-3-031-53404-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53403-4
Online ISBN: 978-3-031-53404-1