Learning Inverse Depth Regression for Pixelwise Visibility-Aware Multi-View Stereo Networks

International Journal of Computer Vision

Abstract

Learning-based multi-view stereo methods have recently achieved promising results. However, most of them overlook visibility differences among views, which leads to an indiscriminate multi-view similarity definition and greatly limits their performance on datasets with strong viewpoint variations. To address this problem, we propose a pixelwise visibility-aware multi-view stereo network for robust dense 3D reconstruction. We present a pixelwise visibility estimation network that learns the visibility information for different neighboring images before computing the multi-view similarity, and then construct an adaptively weighted cost volume from this visibility information. Unlike previous methods that treat multi-view depth inference as a depth regression problem or an inverse depth classification problem, we recast it as an inverse depth regression task. This allows our network to achieve sub-pixel estimation and to be applicable to large-scale scenes. To achieve scalable high-resolution depth map estimation, we construct cost volumes by group-wise correlation and design an ordinal-based uncertainty estimation to progressively refine depth maps. Through extensive experiments on the DTU dataset, the Tanks and Temples dataset, and the ETH3D benchmark, we show that our method generalizes well across datasets and achieves promising results, demonstrating its superior performance on robust dense 3D reconstruction.
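
The three ingredients named in the abstract (group-wise correlation, a visibility-weighted cost volume, and regression in inverse depth) can be illustrated with a minimal NumPy sketch. This is not the authors' network: every function name, tensor layout, and the softmax-based regression below is an assumption made for illustration, and the source features are assumed to be already warped to each depth hypothesis.

```python
import numpy as np

def inverse_depth_hypotheses(d_min, d_max, num):
    """Sample `num` depths spaced uniformly in inverse depth (hypothetical helper)."""
    inv = np.linspace(1.0 / d_min, 1.0 / d_max, num)
    return 1.0 / inv

def groupwise_correlation(ref, src, groups):
    """Mean dot product per channel group.

    ref: (C, H, W) reference features.
    src: (C, D, H, W) source features, assumed pre-warped per depth hypothesis.
    Returns a (G, D, H, W) similarity volume.
    """
    c, d, h, w = src.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    prod = ref[:, None, :, :] * src
    return prod.reshape(groups, c // groups, d, h, w).mean(axis=1)

def visibility_weighted_cost(per_view_cost, visibility, eps=1e-6):
    """Fuse per-view volumes with per-pixel visibility weights.

    per_view_cost: (V, G, D, H, W); visibility: (V, H, W) with values in [0, 1].
    """
    w = visibility[:, None, None, :, :]
    return (w * per_view_cost).sum(axis=0) / (w.sum(axis=0) + eps)

def regress_inverse_depth(score, depths):
    """Softmax over hypotheses, expectation taken in inverse depth (assumed scheme).

    score: (D, H, W), higher means a better match. Returns depth of shape (H, W).
    """
    score = score - score.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(score)
    prob /= prob.sum(axis=0, keepdims=True)
    inv = (prob * (1.0 / depths)[:, None, None]).sum(axis=0)
    return 1.0 / inv
```

With depth hypotheses `[1, 4/3, 2, 4]` (four samples between 1 and 4), a uniform score volume regresses to a depth of about 1.6, a value that lies between the sampled hypotheses. That continuous output is what distinguishes regression from picking the single best hypothesis by classification.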


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 62176096 and 61991412.

Author information

Corresponding author

Correspondence to Wenbing Tao.

Additional information

Communicated by D. Scharstein.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, Q., Su, W., Qi, Y. et al. Learning Inverse Depth Regression for Pixelwise Visibility-Aware Multi-View Stereo Networks. Int J Comput Vis 130, 2040–2059 (2022). https://doi.org/10.1007/s11263-022-01628-2

  • DOI: https://doi.org/10.1007/s11263-022-01628-2
