Abstract
Semantic understanding of 3D objects is crucial in many applications such as object manipulation. However, it is hard to give a universal definition of point-level semantics that everyone would agree on. We observe that people reach a consensus on semantic correspondences between two areas from different objects, but are less certain about the exact semantic meaning of each area. Therefore, we argue that by providing human-labeled correspondences between different objects from the same category, instead of explicit semantic labels, one can recover rich semantic information of an object. In this paper, we introduce a new dataset named CorresPondenceNet. Based on this dataset, we are able to learn dense semantic embeddings with a novel geodesic consistency loss. We then evaluate several state-of-the-art networks on this correspondence benchmark. We further show that CorresPondenceNet could boost not only fine-grained understanding of heterogeneous objects but also cross-object registration and partial object matching.
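The abstract does not spell out the geodesic consistency loss, so the following is only a hedged sketch of what such a loss might look like: given annotated correspondences between two objects, an embedding's soft matches on the target object are penalized by their geodesic distance to the human-annotated corresponding point. The function name, arguments, and exact formulation here are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def geodesic_consistency_loss(emb_a, emb_b, corr, geo_b):
    """Illustrative geodesic-consistency-style loss (not the paper's exact form).

    emb_a: (N, D) point embeddings on object A
    emb_b: (M, D) point embeddings on object B
    corr:  list of (i_a, i_b) human-annotated correspondence index pairs
    geo_b: (M, M) pairwise geodesic distance matrix on object B
    """
    loss = 0.0
    for i_a, i_b in corr:
        # Similarity of the annotated point on A to every point on B,
        # turned into a soft assignment (softmax) over B.
        sims = emb_b @ emb_a[i_a]
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()
        # Penalize probability mass that lands geodesically far from the
        # annotated corresponding point on B.
        loss += float(weights @ geo_b[i_b])
    return loss / len(corr)
```

With sharply discriminative embeddings and correct correspondences the loss is near zero; shuffling the correspondences makes it grow in proportion to the geodesic error, which is the intuition behind weighting match errors by geodesic distance rather than treating all mismatches equally.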
Y. Lou, Y. You and C. Li—These authors contributed equally.
C. Lu—who is also a member of the Qing Yuan Research Institute and the MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China.
Acknowledgements
This work is supported in part by the National Key R&D Program of China, No. 2017YFA0700800, National Natural Science Foundation of China under Grants 61772332, SHEITC (2018-RGZN-02046) and Shanghai Qi Zhi Institute.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lou, Y. et al. (2020). Human Correspondence Consensus for 3D Object Semantic Understanding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_30
DOI: https://doi.org/10.1007/978-3-030-58542-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58541-9
Online ISBN: 978-3-030-58542-6