Abstract
Recently, learning-based multi-view stereo methods have achieved promising results. However, most of them overlook the visibility differences among views, which leads to an indiscriminate multi-view similarity definition and greatly limits their performance on datasets with strong viewpoint variations. To address this problem, we propose a pixelwise visibility-aware multi-view stereo network for robust dense 3D reconstruction. A pixelwise visibility estimation network learns the visibility information of each neighboring image before the multi-view similarity is computed, and this information is then used to construct an adaptively weighted cost volume. Unlike previous methods that treat multi-view depth inference as a depth regression problem or an inverse depth classification problem, we recast it as an inverse depth regression task, which allows our network to achieve sub-pixel estimates and remain applicable to large-scale scenes. To achieve scalable high-resolution depth map estimation, we construct cost volumes by group-wise correlation and design an ordinal-based uncertainty estimation to progressively refine the depth maps. Extensive experiments on the DTU dataset, the Tanks and Temples dataset, and the ETH3D benchmark show that our method generalizes well to various datasets and achieves promising results, demonstrating its superior performance for robust dense 3D reconstruction.
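To make the pipeline concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it illustrates sampling depth hypotheses uniformly in inverse depth, fusing per-view group-wise correlation cost volumes with pixelwise visibility weights, and regressing depth with a soft-argmin over the inverse-depth samples. All tensor shapes, helper names, and the random visibility maps are assumptions made for illustration; the paper's visibility estimation network and ordinal-based uncertainty refinement are not reproduced here.

```python
# Illustrative sketch only, under the assumptions stated above.
import torch

def inverse_depth_samples(d_min, d_max, num_samples):
    """Depth hypotheses spaced uniformly in inverse depth (1/d)."""
    inv = torch.linspace(1.0 / d_max, 1.0 / d_min, num_samples)
    return 1.0 / inv                                              # (D,)

def groupwise_correlation(ref, src, groups):
    """Group-wise correlation of (B, C, D, H, W) feature volumes -> (B, G, D, H, W)."""
    B, C, D, H, W = ref.shape
    ref = ref.view(B, groups, C // groups, D, H, W)
    src = src.view(B, groups, C // groups, D, H, W)
    return (ref * src).mean(dim=2)

def fuse_with_visibility(ref_feat, warped_src_feats, vis_weights, groups=8):
    """Weight each two-view cost volume by its pixelwise visibility map and
    normalize over the source views (vis_weights[i]: (B, 1, H, W) in [0, 1])."""
    fused, norm = 0.0, 1e-6
    for warped, vis in zip(warped_src_feats, vis_weights):
        cost = groupwise_correlation(ref_feat, warped, groups)    # (B, G, D, H, W)
        w = vis.unsqueeze(2)                                      # (B, 1, 1, H, W)
        fused = fused + w * cost
        norm = norm + w
    return fused / norm

def regress_depth(prob_volume, depth_samples):
    """Soft-argmin in inverse-depth space for sub-pixel estimates.
    prob_volume: (B, D, H, W) softmax scores over the D hypotheses."""
    inv_samples = (1.0 / depth_samples).view(1, -1, 1, 1)         # (1, D, 1, 1)
    inv_depth = (prob_volume * inv_samples).sum(dim=1)            # (B, H, W)
    return 1.0 / inv_depth

# Toy usage with random tensors (B=1, C=16, D=32, H=W=8, two source views).
B, C, D, H, W = 1, 16, 32, 8, 8
depths = inverse_depth_samples(0.5, 10.0, D)
ref = torch.randn(B, C, D, H, W)
srcs = [torch.randn(B, C, D, H, W) for _ in range(2)]
vis = [torch.rand(B, 1, H, W) for _ in range(2)]
cost = fuse_with_visibility(ref, srcs, vis)                       # (B, 8, D, H, W)
prob = torch.softmax(cost.mean(dim=1), dim=1)                     # stand-in for a 3D regularization CNN
depth_map = regress_depth(prob, depths)                           # (B, H, W)
```

Sampling hypotheses uniformly in inverse depth keeps their spacing roughly uniform in image space, which is what makes the regression formulation usable on large-scale scenes with wide depth ranges.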
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 62176096 and 61991412.
Additional information
Communicated by D. Scharstein.
Cite this article
Xu, Q., Su, W., Qi, Y. et al. Learning Inverse Depth Regression for Pixelwise Visibility-Aware Multi-View Stereo Networks. Int J Comput Vis 130, 2040–2059 (2022). https://doi.org/10.1007/s11263-022-01628-2