CATRE: Iterative Point Clouds Alignment for Category-Level Object Pose Refinement

Liu, Xingyu; Wang, Gu; Li, Yi; Ji, Xiangyang

doi:10.1007/978-3-031-20086-1_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13662)

Included in the following conference series:

European Conference on Computer Vision

Abstract

While category-level 9DoF object pose estimation has emerged recently, previous correspondence-based and direct-regression methods are both limited in accuracy by the large intra-category variations in object shape, color, and other attributes. Orthogonal to these approaches, this work presents CATRE, a category-level object pose and size refiner that iteratively improves a pose estimate from point clouds to produce accurate results. Given an initial pose estimate, CATRE predicts the relative transformation between the initial pose and the ground truth by aligning the partially observed point cloud with an abstract shape prior. Specifically, we propose a novel disentangled architecture that accounts for the inherent differences between rotation estimation and translation/size estimation. Extensive experiments show that our approach significantly outperforms state-of-the-art methods on the REAL275, CAMERA25, and LM benchmarks while running at up to ≈85.32 Hz, and achieves competitive results on category-level tracking. We further demonstrate that CATRE can perform pose refinement on unseen categories. Code and trained models are available (https://github.com/THU-DA-6D-Pose-Group/CATRE.git).
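
The abstract describes refinement as repeatedly predicting a relative transformation between the current estimate and the ground truth. The Python sketch below illustrates only that update loop, under loudly stated assumptions: the learned CATRE network is replaced by a closed-form Umeyama similarity fit (a swapped-in classical technique, listed among the references), the observation is noise-free with known point correspondences to the prior, and size is simplified to a single uniform scale rather than per-axis 3D size. The names `umeyama`, `refine`, `predictor`, and `rot_z` are hypothetical and are not taken from the CATRE code base.

```python
import numpy as np


def umeyama(src, dst):
    """Closed-form similarity fit (Umeyama, 1991): dst ≈ c * R @ x + t for each row x of src.

    Assumes known one-to-one correspondences (row i of src matches row i of dst).
    """
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, yd = src - mu_s, dst - mu_d
    cov = yd.T @ xs / n                      # 3x3 cross-covariance Sigma_xy
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                       # avoid returning a reflection
    R = U @ S @ Vt
    c = np.trace(np.diag(D) @ S) / ((xs ** 2).sum() / n)
    t = mu_d - c * R @ mu_s
    return R, t, c


def refine(prior, observed, R, t, s, num_iters=4, predictor=umeyama):
    """Iteratively refine rotation R, translation t, and uniform scale s so that
    observed ≈ s * R @ p + t for each prior point p.

    `predictor` is a hypothetical stand-in for a learned network: it receives the
    shape prior and the observation mapped into the current canonical frame and
    returns a relative update (dR, dt, ds).
    """
    for _ in range(num_iters):
        # Map observed points into the canonical frame implied by the current
        # estimate: each row becomes R^T (o - t) / s.
        canon = (observed - t) @ R / s
        dR, dt, ds = predictor(prior, canon)   # canon ≈ ds * dR @ p + dt
        # Compose: o = s R (ds dR p + dt) + t, so update t first (uses old R and s).
        t = t + s * R @ dt
        R = R @ dR
        s = s * ds
    return R, t, s


def rot_z(angle):
    """Rotation matrix about the z-axis (helper for the toy example)."""
    co, sn = np.cos(angle), np.sin(angle)
    return np.array([[co, -sn, 0.0], [sn, co, 0.0], [0.0, 0.0, 1.0]])


# Toy example: a random "shape prior" posed by a known similarity transform.
rng = np.random.default_rng(0)
prior = rng.normal(size=(200, 3))
R_gt, t_gt, s_gt = rot_z(0.7), np.array([0.1, -0.2, 0.5]), 1.3
observed = s_gt * prior @ R_gt.T + t_gt

R, t, s = refine(prior, observed, np.eye(3), np.zeros(3), 1.0)
print(np.allclose(R, R_gt), np.allclose(t, t_gt), np.isclose(s, s_gt))  # True True True
```

Because the stand-in predictor is exact, the first iteration already recovers the pose and later iterations return near-identity updates. A learned predictor operating on partial, noisy point clouds would typically make only incremental corrections, which is why an iterative loop is useful in that setting.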

X. Liu and G. Wang contributed equally.

Notes

1.
Note that there is a small mistake in the original IoU evaluation code of [55]; we therefore recalculated the IoU metrics as in [39].

References

1. Aoki, Y., Goforth, H., Srivatsan, R.A., Lucey, S.: PointNetLK: robust & efficient point cloud registration using PointNet. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7163–7172 (2019)
2. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 14(2), 239–256 (1992)
3. Bouaziz, S., Tagliasacchi, A., Pauly, M.: Sparse iterative closest point. In: Computer Graphics Forum, vol. 32, pp. 113–123. Wiley Online Library (2013)
4. Brachmann, E., Michel, F., Krull, A., Ying Yang, M., Gumhold, S., Rother, C.: Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3364–3372 (2016)
5. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
6. Chen, D., Li, J., Wang, Z., Xu, K.: Learning canonical shape space for category-level 6D object pose and size estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11970–11979 (2020). https://doi.org/10.1109/CVPR42600.2020.01199
7. Chen, K., Dou, Q.: SGPA: structure-guided prior adaptation for category-level 6D object pose estimation. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2773–2782 (2021)
8. Chen, W., Jia, X., Chang, H.J., Duan, J., Leonardis, A.: G2L-Net: global to local network for real-time 6D pose estimation with embedding vector features. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4233–4242 (2020)
9. Chen, W., Jia, X., Chang, H.J., Duan, J., Shen, L., Leonardis, A.: FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1581–1590, June 2021
10. Choy, C., Dong, W., Koltun, V.: Deep global registration. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2514–2523 (2020)
11. Collins, J., et al.: ABO: dataset and benchmarks for real-world 3D object understanding. arXiv preprint arXiv:2110.06199 (2021)
12. Deng, X., Geng, J., Bretl, T., Xiang, Y., Fox, D.: iCaps: iterative category-level object pose and shape estimation. IEEE Robot. Autom. Lett. (RAL) 7, 1784–1791 (2022)
13. Du, G., Wang, K., Lian, S., Zhao, K.: Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review. Artif. Intell. Rev. 54(3), 1677–1734 (2021)
14. Fan, Z., et al.: ACR-Pose: adversarial canonical representation reconstruction network for category-level 6D object pose estimation. arXiv preprint arXiv:2111.10524 (2021)
15. Gao, G., Lauri, M., Hu, X., Zhang, J., Frintrop, S.: CloudAAE: learning 6D object pose regression with on-line data synthesis on point clouds. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 11081–11087 (2021)
16. Groueix, T., Fisher, M., Kim, V.G., Russell, B.C., Aubry, M.: 3D-CODED: 3D correspondences by deep deformation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 235–251. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_15
17. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2961–2969 (2017)
18. Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016)
19. Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Asian Conference on Computer Vision (ACCV) (2012)
20. Hodaň, T., Matas, J., Obdržálek, Š.: On evaluation of 6D object pose estimation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 606–619. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_52
21. Hodaň, T., et al.: BOP challenge 2020 on 6D object localization. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12536, pp. 577–594. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66096-3_39
22. Huang, S., Qi, S., Xiao, Y., Zhu, Y., Wu, Y.N., Zhu, S.C.: Cooperative holistic scene understanding: unifying 3D object, layout, and camera pose estimation. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 31 (2018)
23. Huynh, D.Q.: Metrics for 3D rotations: comparison and analysis. J. Math. Imag. Vis. 35(2), 155–164 (2009)
24. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: International Conference on Learning Representations (ICLR) (2017)
25. Iwase, S., Liu, X., Khirodkar, R., Yokota, R., Kitani, K.M.: RePOSE: fast 6D object pose refinement via deep texture rendering. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3303–3312 (2021)
26. Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: SSD-6D: making RGB-based 3D detection and 6D pose estimation great again. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1521–1529 (2017)
27. Labbé, Y., Carpentier, J., Aubry, M., Sivic, J.: CosyPose: consistent multi-view multi-object 6D pose estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 574–591. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_34
28. Lee, D., Hamsici, O.C., Feng, S., Sharma, P., Gernoth, T.: DeepPRO: deep partial point cloud registration of objects. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5683–5692 (2021)
29. Li, Y., Wang, G., Ji, X., Xiang, Y., Fox, D.: DeepIM: deep iterative matching for 6D pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 695–711. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_42
30. Li, Y., Wang, G., Ji, X., Xiang, Y., Fox, D.: DeepIM: deep iterative matching for 6D pose estimation. Int. J. Comput. Vis. (IJCV) 128(3), 657–678 (2020)
31. Li, Z., Wang, G., Ji, X.: CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7678–7687 (2019)
32. Lin, J., Wei, Z., Li, Z., Xu, S., Jia, K., Li, Y.: DualPoseNet: category-level 6D object pose and size estimation using dual pose network with refined learning of pose consistency. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3560–3569, October 2021
33. Lin, Z.H., Huang, S.Y., Wang, Y.C.F.: Convolution in the cloud: learning deformable kernels in 3D graph convolution networks for point cloud analysis. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1809 (2020)
34. Liu, L., et al.: On the variance of the adaptive learning rate and beyond. In: International Conference on Learning Representations (ICLR) (2019)
35. Marchand, E., Uchiyama, H., Spindler, F.: Pose estimation for augmented reality: a hands-on survey. IEEE Trans. Vis. Comput. Graph. (TVCG) 22(12), 2633–2651 (2015)
36. Nie, Y., Han, X., Guo, S., Zheng, Y., Chang, J., Zhang, J.J.: Total3DUnderstanding: joint layout, object pose and mesh reconstruction for indoor scenes from a single image. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 55–64 (2020)
37. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 8026–8037 (2019)
38. Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4561–4570 (2019)
39. Peng, W., Yan, J., Wen, H., Sun, Y.: Self-supervised category-level 6D object pose estimation with deep implicit shape representation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 36, no. 2, pp. 2082–2090 (2022). https://doi.org/10.1609/aaai.v36i2.20104
40. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, no. 2, p. 4 (2017)
41. Reizenstein, J., Shapovalov, R., Henzler, P., Sbordone, L., Labatut, P., Novotny, D.: Common objects in 3D: large-scale learning and evaluation of real-life 3D category reconstruction. In: IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
42. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152. IEEE (2001)
43. Sarode, V., et al.: PCRNet: point cloud registration network using PointNet encoding. arXiv preprint arXiv:1908.07906 (2019)
44. Segal, A., Haehnel, D., Thrun, S.: Generalized-ICP. In: Robotics: Science and Systems, Seattle, WA, vol. 2, p. 435 (2009)
45. Song, C., Song, J., Huang, Q.: HybridPose: 6D object pose estimation under hybrid representations. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 431–440 (2020)
46. Su, Y., Rambach, J., Minaskan, N., Lesur, P., Pagani, A., Stricker, D.: Deep multi-state object pose estimation for augmented reality assembly. In: 2019 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 222–227 (2019)
47. Tian, M., Ang, M.H., Lee, G.H.: Shape prior deformation for categorical 6D object pose and size estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 530–546. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_32
48. Trappolini, G., Cosmo, L., Moschella, L., Marin, R., Melzi, S., Rodolà, E.: Shape registration in the time of transformers. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 5731–5744 (2021)
49. Tremblay, J., To, T., Sundaralingam, B., Xiang, Y., Fox, D., Birchfield, S.: Deep object pose estimation for semantic robotic grasping of household objects. In: Conference on Robot Learning (CoRL), pp. 306–316 (2018)
50. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 13(4), 376–380 (1991). https://doi.org/10.1109/34.88573
51. Wang, C., et al.: 6-PACK: category-level 6D pose tracker with anchor-based keypoints. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 10059–10066 (2020)
52. Wang, C., et al.: DenseFusion: 6D object pose estimation by iterative dense fusion. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3343–3352 (2019)
53. Wang, G., Manhardt, F., Liu, X., Ji, X., Tombari, F.: Occlusion-aware self-supervised monocular 6D object pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) (2021). https://doi.org/10.1109/TPAMI.2021.3136301
54. Wang, G., Manhardt, F., Tombari, F., Ji, X.: GDR-Net: geometry-guided direct regression network for monocular 6D object pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16611–16621 (2021)
55. Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6D object pose and size estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2651 (2019)
56. Wang, J., Chen, K., Dou, Q.: Category-level 6D object pose estimation via cascaded relation and recurrent reconstruction networks. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2021)
57. Wang, Y., Solomon, J.: PRNet: self-supervised learning for partial-to-partial registration. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 8814–8826 (2019)
58. Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3523–3532 (2019)
59. Wen, B., Mitash, C., Ren, B., Bekris, K.E.: se(3)-TrackNet: data-driven 6D pose tracking by calibrating image residuals in synthetic domains. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10367–10373 (2020)
60. Weng, Y., et al.: CAPTRA: category-level pose tracking for rigid and articulated objects from point clouds. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13209–13218 (2021)
61. Wu, Y., He, K.: Group normalization. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_1
62. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. In: Robotics: Science and Systems Conference (RSS) (2018)
63. Yong, H., Huang, J., Hua, X., Zhang, L.: Gradient centralization: a new optimization technique for deep neural networks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 635–652. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_37
64. Zakharov, S., Shugurov, I., Ilic, S.: DPOD: dense 6D pose object detector in RGB images. In: IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
65. Zhang, M., Lucas, J., Ba, J., Hinton, G.E.: Lookahead optimizer: k steps forward, 1 step back. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems (NeurIPS), vol. 32. Curran Associates, Inc. (2019)
66. Zhou, Y., Barnes, C., Lu, J., Yang, J., Li, H.: On the continuity of rotation representations in neural networks. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5745–5753 (2019)

Acknowledgments

We thank Yansong Tang at Tsinghua-Berkeley Shenzhen Institute, Ruida Zhang and Haotian Xu at Tsinghua University for their helpful suggestions. This work was supported by the National Key R&D Program of China under Grant 2018AAA0102801 and National Natural Science Foundation of China under Grant 61620106005.

Author information

Authors and Affiliations

Tsinghua University, BNRist, Beijing, China
Xingyu Liu & Xiangyang Ji
JD.com, Beijing, China
Gu Wang
University of Washington, Seattle, USA
Yi Li

Corresponding author

Correspondence to Xiangyang Ji.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3569 KB)

About this paper

Cite this paper

Liu, X., Wang, G., Li, Y., Ji, X. (2022). CATRE: Iterative Point Clouds Alignment for Category-Level Object Pose Refinement. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13662. Springer, Cham. https://doi.org/10.1007/978-3-031-20086-1_29

DOI: https://doi.org/10.1007/978-3-031-20086-1_29
Published: 11 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20085-4
Online ISBN: 978-3-031-20086-1
eBook Packages: Computer Science, Computer Science (R0)