{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T14:52:48Z","timestamp":1740149568419,"version":"3.37.3"},"reference-count":52,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2022,12,1]],"date-time":"2022-12-01T00:00:00Z","timestamp":1669852800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"In this review, we provide a detailed coverage of multi-sensor fusion techniques that use RGB stereo images and a sparse LiDAR-projected depth map as input data to output a dense depth map prediction. We cover state-of-the-art fusion techniques which, in recent years, have been deep learning-based methods that are end-to-end trainable. We then conduct a comparative evaluation of the state-of-the-art techniques and provide a detailed analysis of their strengths and limitations as well as the applications they are best suited for.","DOI":"10.3390\/s22239364","type":"journal-article","created":{"date-parts":[[2022,12,1]],"date-time":"2022-12-01T09:28:57Z","timestamp":1669886937000},"page":"9364","source":"Crossref","is-referenced-by-count":11,"title":["A Critical Review of Deep Learning-Based Multi-Sensor Fusion Techniques"],"prefix":"10.3390","volume":"22","author":[{"given":"Benedict","family":"Marsh","sequence":"first","affiliation":[{"name":"Institute of Digital Futures, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9825-5911","authenticated-orcid":false,"given":"Abdul Hamid","family":"Sadka","sequence":"additional","affiliation":[{"name":"Institute of Digital Futures, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, 
UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3476-9104","authenticated-orcid":false,"given":"Hamid","family":"Bahai","sequence":"additional","affiliation":[{"name":"Institute of Materials and Manufacturing, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Chang, J.R., and Chen, Y.S. (2018, June 18\u201323). Pyramid stereo matching network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00567"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Guo, X., Yang, K., Yang, W., Wang, X., and Li, H. (2019, June 15\u201320). Group-wise correlation stereo network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00339"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., and Bry, A. (2017, October 22\u201329). End-to-end learning of geometry and context for deep stereo regression. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.17"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Yao, Y., Luo, Z., Li, S., Fang, T., and Quan, L. (2018, September 8\u201314). Mvsnet: Depth inference for unstructured multi-view stereo. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01237-3_47"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhang, F., Prisacariu, V., Yang, R., and Torr, P.H. (2019, June 15\u201320). Ga-net: Guided aggregation net for end-to-end stereo matching. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00027"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., and Torr, P. (2020, August 23\u201328). Domain-invariant stereo matching networks. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58536-5_25"},{"key":"ref_7","unstructured":"Zhong, Y., Dai, Y., and Li, H. (2017). Self-supervised learning for stereo matching with self-improving ability. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Tankovich, V., Hane, C., Zhang, Y., Kowdle, A., Fanello, S., and Bouaziz, S. (2021, June 20\u201325). Hitnet: Hierarchical iterative tile refinement network for real-time stereo matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01413"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Li, Z., Liu, X., Drenkow, N., Ding, A., Creighton, F.X., Taylor, R.H., and Unberath, M. (2021, October 10\u201317). Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00614"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Khamis, S., Fanello, S., Rhemann, C., Kowdle, A., Valentin, J., and Izadi, S. (2018, September 8\u201314). Stereonet: Guided hierarchical refinement for real-time edge-aware depth prediction. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01267-0_35"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yang, G., Manela, J., Happold, M., and Ramanan, D. (2019, June 15\u201320). Hierarchical deep stereo matching on high-resolution images. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00566"},{"key":"ref_12","first-page":"2287","article-title":"Stereo matching by training a convolutional neural network to compare image patches","volume":"17","author":"Zbontar","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (2016, June 27\u201330). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Cheng, X., Wang, P., Guan, C., and Yang, R. (2019). CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion. arXiv.","DOI":"10.1609\/aaai.v34i07.6635"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Qiu, J., Cui, Z., Zhang, Y., Zhang, X., Liu, S., Zeng, B., and Pollefeys, M. (2019, June 15\u201320). Deeplidar: Deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00343"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yan, Z., Wang, K., Li, X., Zhang, Z., Xu, B., Li, J., and Yang, J. (2021). RigNet: Repetitive image guided network for depth completion. arXiv.","DOI":"10.1007\/978-3-031-19812-0_13"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lee, B.U., Lee, K., and Kweon, I.S. (2021, June 20\u201325). Depth completion using plane-residual representation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01370"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"5404","DOI":"10.1109\/TNNLS.2021.3072883","article-title":"Multitask gans for semantic segmentation and depth completion with cycle consistency","volume":"32","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"5264","DOI":"10.1109\/TIP.2021.3079821","article-title":"Adaptive context-aware multi-modal network for depth completion","volume":"30","author":"Zhao","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, L., Song, X., Lyu, X., Diao, J., Wang, M., Liu, Y., and Zhang, L. (2020). FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Depth Completion. arXiv.","DOI":"10.1609\/aaai.v35i3.16311"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, Y., Yang, B., Liang, M., and Urtasun, R. (2019, October 27\u2013November 2). Learning joint 2d-3d representations for depth completion. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.01012"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Cheng, X., Wang, P., and Yang, R. (2018, September 8\u201314). Depth estimation via affinity learned with convolutional spatial propagation network. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_7"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Park, K., Kim, S., and Sohn, K. (2018, May 21\u201325). High-precision depth estimation with the 3d lidar and stereo fusion. 
Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia.","DOI":"10.1109\/ICRA.2018.8461048"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhang, J., Ramanagopal, M.S., Vasudevan, R., and Johnson-Roberson, M. (2020, May 31\u2013August 31). Listereo: Generate dense depth maps from lidar and stereo imagery. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196628"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Maddern, W., and Newman, P. (2016, October 9\u201314). Real-time probabilistic fusion of sparse 3d lidar and dense stereo. Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea.","DOI":"10.1109\/IROS.2016.7759342"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lipson, L., Teed, Z., and Deng, J. (2021, December 1\u20133). Raft-stereo: Multilevel recurrent field transforms for stereo matching. Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK.","DOI":"10.1109\/3DV53792.2021.00032"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, J., Wang, P., Xiong, P., Cai, T., Yan, Z., Yang, L., Liu, J., Fan, H., and Liu, S. (2022, June 19\u201324). Practical stereo matching via cascaded recurrent network with adaptive correlation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01578"},{"key":"ref_28","first-page":"22158","article-title":"Hierarchical neural architecture search for deep stereo matching","volume":"33","author":"Cheng","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hu, M., Wang, S., Li, B., Ning, S., Fan, L., and Gong, X. (2021, May 30\u2013June 5). 
Penet: Towards precise and efficient image guided depth completion. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561035"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Nazir, D., Liwicki, M., Stricker, D., and Afzal, M.Z. (2022). SemAttNet: Towards Attention-based Semantic Aware Guided Depth Completion. arXiv.","DOI":"10.1109\/ACCESS.2022.3214316"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, Y., Cheng, T., Zhong, Q., Zhou, W., and Yang, H. (2022). Dynamic spatial propagation network for depth completion. arXiv.","DOI":"10.1609\/aaai.v36i2.20055"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"4672","DOI":"10.1109\/LRA.2021.3068712","article-title":"Volumetric propagation network: Stereo-lidar fusion for long-range depth estimation","volume":"6","author":"Choe","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, T.H., Hu, H.N., Lin, C.H., Tsai, Y.H., Chiu, W.C., and Sun, M. (2019, November 3\u20138). 3D lidar and stereo fusion using stereo matching network with conditional cost volume normalization. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8968170"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Cheng, X., Zhong, Y., Dai, Y., Ji, P., and Li, H. (2019, June 15\u201320). Noise-aware unsupervised deep lidar-stereo fusion. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00650"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Teed, Z., and Deng, J. (2020, August 23\u201328). Raft: Recurrent all-pairs field transforms for optical flow. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58536-5_24"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Menze, M., and Geiger, A. (2015, June 7\u201312). Object scene flow for autonomous vehicles. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298925"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhu, X., Hu, H., Lin, S., and Dai, J. (2019, June 15\u201320). Deformable convnets v2: More deformable, better results. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00953"},{"key":"ref_38","unstructured":"Liu, H., Simonyan, K., and Yang, Y. (2018). Darts: Differentiable architecture search. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Liu, C., Chen, L.C., Schroff, F., Adam, H., Hua, W., Yuille, A.L., and Fei-Fei, L. (2019, June 15\u201320). Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00017"},{"key":"ref_40","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1109\/TIP.2020.3040528","article-title":"Learning guided convolutional network for depth completion","volume":"30","author":"Tang","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_42","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_43","unstructured":"Fooladgar, F., and Kasaei, S. (2019). 
Multi-modal attention-based fusion model for semantic segmentation of rgb-depth images. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.patcog.2019.01.006","article-title":"Wider or deeper: Revisiting the resnet model for visual recognition","volume":"90","author":"Wu","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Van Gansbeke, W., Neven, D., De Brabandere, B., and Van Gool, L. (2019, May 27\u201331). Sparse and noisy lidar completion with rgb guidance and uncertainty. Proceedings of the 2019 16th International Conference on Machine Vision Applications (MVA), Tokyo, Japan.","DOI":"10.23919\/MVA.2019.8757939"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, October 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Cheng, X., Wang, P., Zhou, Y., Guan, C., and Yang, R. (2020, May 31\u2013August 31). Omnidirectional depth extension networks. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9197123"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Xu, Z., Yin, H., and Yao, J. (2020, October 25\u201328). Deformable spatial propagation networks for depth completion. Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/ICIP40778.2020.9191138"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016, October 11\u201314). Stacked hourglass networks for human pose estimation. 
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_50","unstructured":"Perez, E., De Vries, H., Strub, F., Dumoulin, V., and Courville, A. (2017). Learning visual reasoning without strong priors. arXiv."},{"key":"ref_51","first-page":"6597","article-title":"Modulating early visual processing by language","volume":"30","author":"Strub","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., and Geiger, A. (2017, October 10\u201312). Sparsity invariant CNNs. Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China.","DOI":"10.1109\/3DV.2017.00012"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/23\/9364\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,10]],"date-time":"2024-08-10T14:01:24Z","timestamp":1723298484000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/23\/9364"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,1]]},"references-count":52,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["s22239364"],"URL":"https:\/\/doi.org\/10.3390\/s22239364","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,12,1]]}}}