Abstract
Deep learning has substantially improved multi-view stereo. Recent approaches typically take raw images as input and estimate depth with deep networks. However, edge information, a primary geometric cue that captures scene structure well, is ignored by existing multi-view stereo networks. To address this, we present an Edge-aware Spatial Propagation Network, named ESPDepth, a novel depth estimation network that uses edges to assist in understanding scene structure. Specifically, we first generate a coarse initial depth map with a shallow network. We then design an Edge Information Encoding (EIE) module to encode edge-aware features from the initial depth. Next, we apply the proposed Edge-Aware spatial Propagation (EAP) module to guide iterative propagation on the cost volumes. Finally, the edge-optimized cost volumes are used to obtain the final depth map, which serves as a refinement step. By introducing edge information into the propagation of cost volumes, the proposed method captures geometric shapes well, alleviating the negative effects of abrupt depth changes at edges in real scenes. Experiments on the ScanNet and 7-Scenes datasets demonstrate that our method produces precise depth estimates, with improvements in both global structure and detailed regions.
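The idea of letting edges gate propagation on a cost volume can be illustrated with a minimal NumPy sketch. This is not the ESPDepth implementation: the abstract describes learned EIE and EAP modules, whereas the sketch below assumes a hand-crafted edge map (thresholded gradients of the coarse depth) and a fixed 4-neighbour averaging rule. All function names, the threshold, and the iteration count are hypothetical choices made for illustration.

```python
import numpy as np

def edge_map_from_depth(depth, threshold=0.1):
    """Binary edge map from a coarse depth map (assumed stand-in for learned edge features)."""
    gy, gx = np.gradient(depth)
    return (np.hypot(gx, gy) > threshold).astype(np.float32)

def shift(a, dy, dx):
    """Return the neighbour at offset (dy, dx) for every pixel, replicating border values."""
    pad = [(0, 0)] * (a.ndim - 2) + [(1, 1), (1, 1)]
    p = np.pad(a, pad, mode="edge")
    h, w = a.shape[-2:]
    return p[..., 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

def edge_aware_propagate(cost, edges, iterations=3):
    """Average each cost slice with its 4-neighbours, blocking links that touch an edge.

    cost:  (D, H, W) matching cost per depth hypothesis
    edges: (H, W) values in [0, 1], where 1 marks an edge pixel
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    keep = 1.0 - edges
    # A link is open only if neither endpoint lies on an edge.
    weights = [keep * shift(keep, dy, dx) for dy, dx in offsets]
    total = 1.0 + sum(weights)  # the pixel itself always has weight 1
    out = cost.astype(np.float32)
    for _ in range(iterations):
        acc = out.copy()
        for (dy, dx), w in zip(offsets, weights):
            acc = acc + w * shift(out, dy, dx)
        out = acc / total
    return out

def depth_from_cost(cost, depth_values):
    """Pick the depth hypothesis with the lowest cost at every pixel."""
    return depth_values[np.argmin(cost, axis=0)]

# Toy scene: a depth step from 1.0 to 2.0 halfway across the image.
depth = np.ones((4, 6))
depth[:, 3:] = 2.0
hypotheses = np.array([1.0, 2.0])
cost = np.stack([np.abs(depth - d) for d in hypotheses])

edges = edge_map_from_depth(depth)
refined = edge_aware_propagate(cost, edges)
blurred = edge_aware_propagate(cost, np.zeros_like(edges))
```

In this toy case the edge-gated version leaves the step in the cost volume intact, while the ungated version (`blurred`) mixes costs across the depth discontinuity, which is the failure mode that edge guidance is meant to reduce.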
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Xu, S., Xu, Q., Su, W. et al. Edge-Aware Spatial Propagation Network for Multi-view Depth Estimation. Neural Process Lett 55, 10905–10923 (2023). https://doi.org/10.1007/s11063-023-11356-4
DOI: https://doi.org/10.1007/s11063-023-11356-4