While category-level 9DoF object pose estimation has emerged recently, previous correspondence-based and direct regression methods are both limited in accuracy by the large intra-category variations in object shape, color, and other attributes. Orthogonal to them, this work presents CATRE, a category-level object pose and size refiner that iteratively enhances a pose estimate from point clouds to produce accurate results. Given an initial pose estimate, CATRE predicts the relative transformation between the initial pose and the ground truth by aligning the partially observed point cloud with an abstract shape prior. Specifically, we propose a novel disentangled architecture that is aware of the inherent distinctions between rotation and translation/size estimation. Extensive experiments show that our approach remarkably outperforms state-of-the-art methods on the REAL275, CAMERA25, and LM benchmarks while running at up to \(\approx 85.32\,\text{Hz}\), and achieves competitive results on category-level tracking. We further demonstrate that CATRE can perform pose refinement on unseen categories. Code and trained models are available at https://github.com/THU-DA-6D-Pose-Group/CATRE.git.
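The iterative refinement described above can be illustrated with a minimal sketch. It is not the released CATRE implementation: the update rule (left-multiplied rotation, additive translation and size), the function names, and the toy predictor that stands in for the learned network are all assumptions made for illustration.

```python
import numpy as np


def rot_z(angle_rad):
    """Rotation matrix about the z-axis (helper for the toy example)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def apply_relative_update(R, t, s, dR, dt, ds):
    """Compose a predicted relative transform with the current 9DoF estimate.

    Assumed convention (not confirmed by the abstract): the rotation delta is
    left-multiplied, translation and size deltas are added.
    """
    return dR @ R, t + dt, s + ds


def refine(R, t, s, predict_delta, num_iters=4):
    """Repeatedly predict a relative transform and apply it to the estimate."""
    for _ in range(num_iters):
        dR, dt, ds = predict_delta(R, t, s)
        R, t, s = apply_relative_update(R, t, s, dR, dt, ds)
    return R, t, s


def make_toy_predictor(R_gt, t_gt, s_gt, damping=0.5):
    """Hypothetical stand-in for the learned refiner.

    It reads the ground truth directly, which a real network cannot do; it
    only serves to exercise the refinement loop. Rotation is corrected
    exactly, translation and size errors are halved per step by default.
    """
    def predict_delta(R, t, s):
        dR = R_gt @ R.T
        dt = damping * (t_gt - t)
        ds = damping * (s_gt - s)
        return dR, dt, ds
    return predict_delta


# Toy ground truth and a perturbed initial estimate.
R_gt = rot_z(0.3)
t_gt = np.array([0.1, -0.2, 0.9])
s_gt = np.array([0.2, 0.1, 0.3])
R0, t0, s0 = rot_z(0.5), t_gt + 0.04, s_gt * 1.2

R_ref, t_ref, s_ref = refine(R0, t0, s0, make_toy_predictor(R_gt, t_gt, s_gt))
print("translation error after refinement:", np.abs(t_ref - t_gt).max())
```

In this toy setting the translation and size errors shrink geometrically with the number of iterations, which mirrors the idea of feeding each refined pose back in as the next initial estimate.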
X. Liu and G. Wang—Equal contribution.
We thank Yansong Tang at the Tsinghua-Berkeley Shenzhen Institute, and Ruida Zhang and Haotian Xu at Tsinghua University, for their helpful suggestions. This work was supported by the National Key R&D Program of China under Grant 2018AAA0102801 and the National Natural Science Foundation of China under Grant 61620106005.
