{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,4,18]],"date-time":"2023-04-18T05:14:07Z","timestamp":1681794847657},"reference-count":40,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T00:00:00Z","timestamp":1681689600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61976227","62176096"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003819","name":"Natural Science Foundation of Hubei Province","doi-asserted-by":"publisher","award":["2020CFA025"],"id":[{"id":"10.13039\/501100003819","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"Recently, self-supervised multi-view stereo (MVS) methods, which are dependent primarily on optimizing networks using photometric consistency, have made clear progress. However, the difference in lighting between different views and reflective objects in the scene can make photometric consistency unreliable. To address this issue, a geometric prior-guided multi-view stereo (GP-MVS) for self-supervised learning is proposed, which exploits the geometric prior from the input data to obtain high-quality depth pseudo-labels. Specifically, two types of pseudo-labels for self-supervised MVS are proposed, based on the structure-from-motion (SfM) and traditional MVS methods. One converts the sparse points of SfM into sparse depth maps and combines the depth maps with spatial smoothness constraints to obtain a sparse prior loss. 
The other generates initial depth maps for semi-dense depth pseudo-labels using the traditional MVS, and applies a geometric consistency check to filter the wrong depth in the initial depth maps. We conducted extensive experiments on the DTU and Tanks and Temples datasets, which demonstrate that our method achieves state-of-the-art performance compared to existing unsupervised\/self-supervised approaches, and even performs on par with traditional and supervised approaches.","DOI":"10.3390\/rs15082109","type":"journal-article","created":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T09:51:41Z","timestamp":1681725101000},"page":"2109","source":"Crossref","is-referenced-by-count":0,"title":["Geometric Prior-Guided Self-Supervised Learning for Multi-View Stereo"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-3775-8571","authenticated-orcid":false,"given":"Liman","family":"Liu","sequence":"first","affiliation":[{"name":"School of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-5405-590X","authenticated-orcid":false,"given":"Fenghao","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-5497-4682","authenticated-orcid":false,"given":"Wanjuan","family":"Su","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Science and Technology on Multi-Spectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China"}]},{"given":"Yuhang","family":"Qi","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Science and Technology on Multi-Spectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, 
China"}]},{"given":"Wenbing","family":"Tao","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Science and Technology on Multi-Spectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Cheng, S., Xu, Z., Zhu, S., Li, Z., Li, L.E., Ramamoorthi, R., and Su, H. (2020, January 13\u201319). Deep stereo using adaptive thin volume representation with uncertainty awareness. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00260"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Gon\u00e7alves, G., Gon\u00e7alves, D., G\u00f3mez-Guti\u00e9rrez, \u00c1., Andriolo, U., and P\u00e9rez-Alv\u00e1rez, J.A. (2021). 3D reconstruction of coastal cliffs from fixed-wing and multi-rotor uas: Impact of sfm-mvs processing parameters, image redundancy and acquisition geometry. Remote Sens., 13.","DOI":"10.3390\/rs13061222"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1016\/j.cag.2015.09.003","article-title":"MVE\u2014An image-based reconstruction environment","volume":"53","author":"Fuhrmann","year":"2015","journal-title":"Comput. Graph."},{"key":"ref_4","unstructured":"Cernea, D. (2015, May 20). OpenMVS: Multi-View Stereo Reconstruction Library. Available online: https:\/\/cdcseacave.github.io\/openMVS."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., and Frahm, J.M. (2016, January 27\u201330). Structure-from-motion revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sch\u00f6nberger, J.L., Zheng, E., Frahm, J.M., and Pollefeys, M. 
(2016, January 11\u201314). Pixelwise view selection for unstructured multi-view stereo. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_31"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ji, M., Gall, J., Zheng, H., Liu, Y., and Fang, L. (2017, January 22\u201329). Surfacenet: An end-to-end 3d neural network for multiview stereopsis. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.253"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhong, Y., Li, H., and Dai, Y. (2018, January 8\u201314). Open-world stereo video matching with deep rnn. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_7"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1016\/j.neucom.2022.04.095","article-title":"End-to-end learning of self-rectification and self-supervised disparity prediction for stereo vision","volume":"494","author":"Zhang","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_10","unstructured":"Khot, T., Agrawal, S., Tulsiani, S., Mertz, C., Lucey, S., and Hebert, M. (2019). Learning unsupervised multi-view stereopsis via robust photometric consistency. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Dai, Y., Zhu, Z., Rao, Z., and Li, B. (2019, January 16\u201319). Mvs2: Deep unsupervised multi-view stereo with multi-view symmetry. Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec, QC, Canada.","DOI":"10.1109\/3DV.2019.00010"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Huang, B., Yi, H., Huang, C., He, Y., Liu, J., and Liu, X. (2021, January 19\u201322). M3VSNet: Unsupervised multi-metric multi-view stereo network. 
Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA.","DOI":"10.1109\/ICIP42928.2021.9506469"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Xu, H., Zhou, Z., Qiao, Y., Kang, W., and Wu, Q. (2021, January 2\u20139). Self-supervised multi-view stereo via effective co-segmentation and data-augmentation. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event.","DOI":"10.1609\/aaai.v35i4.16411"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yang, J., Alvarez, J.M., and Liu, M. (2021, January 20\u201325). Self-supervised learning of depth inference for multi-view stereo. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00744"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xu, H., Zhou, Z., Wang, Y., Kang, W., Sun, B., Li, H., and Qiao, Y. (2021, January 11\u201317). Digging into uncertainty in self-supervised multi-view stereo. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00602"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1109\/TPAMI.2005.44","article-title":"A quasi-dense approach to surface reconstruction from uncalibrated images","volume":"27","author":"Lhuillier","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1362","DOI":"10.1109\/TPAMI.2009.161","article-title":"Accurate, dense, and robust multiview stereopsis","volume":"32","author":"Furukawa","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Hane, C., Zach, C., Cohen, A., Angst, R., and Pollefeys, M. (2013, January 23\u201328). Joint 3D scene reconstruction and class segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.20"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1901","DOI":"10.1109\/TIP.2013.2237921","article-title":"Accurate multiple view 3d reconstruction using patch-based stereo for large-scale scenes","volume":"22","author":"Shen","year":"2013","journal-title":"IEEE Trans. Image Process."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zheng, E., Dunn, E., Jojic, V., and Frahm, J.M. (2014, January 23\u201328). Patchmatch based joint view selection and depthmap estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.196"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Galliani, S., Lasinger, K., and Schindler, K. (2015, January 7\u201313). Massively parallel multiview stereopsis by surface normal diffusion. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.106"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.neucom.2015.09.109","article-title":"Multi-view stereo via depth map fusion: A coordinate decent optimization method","volume":"178","author":"Li","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xu, Q., and Tao, W. (2019, January 15\u201320). Multi-scale geometric consistency guided multi-view stereo. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00563"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhou, L., Zhang, Z., Jiang, H., Sun, H., Bao, H., and Zhang, G. (2021). DP-MVS: Detail Preserving Multi-View Surface Reconstruction of Large-Scale Scenes. 
Remote Sens., 13.","DOI":"10.3390\/rs13224569"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Stathopoulou, E.K., Battisti, R., Cernea, D., Remondino, F., and Georgopoulos, A. (2021). Semantically derived geometric constraints for MVS reconstruction of textureless areas. Remote Sens., 13.","DOI":"10.3390\/rs13061053"},{"key":"ref_26","unstructured":"Bleyer, M., Rhemann, C., and Rother, C. (September, January 29). Patchmatch stereo-stereo matching with slanted support windows. Proceedings of the BMVC, Dundee, UK."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yao, Y., Luo, Z., Li, S., Fang, T., and Quan, L. (2018, January 8\u201314). Mvsnet: Depth inference for unstructured multi-view stereo. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01237-3_47"},{"key":"ref_28","unstructured":"Xue, Y., Chen, J., Wan, W., Huang, Y., Yu, C., Li, T., and Bao, J. (November, January 27). Mvscrf: Learning multi-view stereo with conditional random fields. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_29","unstructured":"Luo, K., Guan, T., Ju, L., Huang, H., and Luo, Y. (November, January 27). P-mvsnet: Learning patch-wise matching confidence aggregation for multi-view stereo. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Xu, Q., and Tao, W. (2020, January 7\u201312). Learning inverse depth regression for multi-view stereo with correlation cost volume. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6939"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yao, Y., Luo, Z., Li, S., Shen, T., Fang, T., and Quan, L. (2019, January 15\u201320). Recurrent mvsnet for high-resolution multi-view stereo depth inference. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00567"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yan, J., Wei, Z., Yi, H., Ding, M., Zhang, R., Chen, Y., Wang, G., and Tai, Y.W. (2020, January 23\u201328). Dense hybrid recurrent multi-view stereo net with dynamic consistency checking. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58548-8_39"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wei, Z., Zhu, Q., Min, C., Chen, Y., and Wang, G. (2021, January 11\u201317). Aa-rmvsnet: Adaptive aggregation recurrent multi-view stereo network. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00613"},{"key":"ref_34","unstructured":"Chen, R., Han, S., Xu, J., and Su, H. (November, January 27). Point-based multi-view stereo network. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., and Tan, P. (2020, January 13\u201319). Cascade cost volume for high-resolution multi-view stereo and stereo matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00257"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Yang, J., Mao, W., Alvarez, J.M., and Liu, M. (2020, January 13\u201319). Cost volume pyramid based depth inference for multi-view stereo. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00493"},{"key":"ref_37","unstructured":"Cao, C., Ren, X., and Fu, Y. (2023). MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth. 
arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/s11263-016-0902-9","article-title":"Large-scale data for multiple-view stereopsis","volume":"120","author":"Jensen","year":"2016","journal-title":"Int. J. Comput. Vis."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3072959.3073599","article-title":"Tanks and temples: Benchmarking large-scale scene reconstruction","volume":"36","author":"Knapitsch","year":"2017","journal-title":"ACM Trans. Graph. (ToG)"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/8\/2109\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T10:18:28Z","timestamp":1681726708000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/8\/2109"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,17]]},"references-count":40,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["rs15082109"],"URL":"https:\/\/doi.org\/10.3390\/rs15082109","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,17]]}}}