• Corpus ID: 256903232

CroCPS: Addressing Photometric Challenges in Self-Supervised Category-Level 6D Object Poses with Cross-Modal Learning

@inproceedings{Wang2022CroCPSAP,
  title={CroCPS: Addressing Photometric Challenges in Self-Supervised Category-Level 6D Object Poses with Cross-Modal Learning},
  author={Pengyuan Wang and Lorenzo Garattoni and Sven Meier and Nassir Navab and Benjamin Busam},
  booktitle={British Machine Vision Conference},
  year={2022},
  url={https://api.semanticscholar.org/CorpusID:256903232}
}
This work constitutes the first solution for self-supervision on challenging reflective objects and explores the usage of polarization images.

Improving Self-Supervised Learning of Transparent Category Poses With Language Guidance and Implicit Physical Constraints

This work proposes a novel pipeline that uses language guidance and implicit physical constraints for 2D and 3D self-supervision, and utilizes language guidance to obtain accurate 2D object segmentation that is robust to background changes.

Colibri5: Real-Time Monocular 5-DoF Trocar Pose Tracking for Robot-Assisted Vitreoretinal Surgery

A real-time marker-less method for 3D pose tracking of a trocar using only a single monocular camera is presented, which could serve as a foundation for improving the automation, integration, and efficiency of robotic systems for retinal surgery.

Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation

This work proposes a novel monocular 6D pose estimation approach by means of self-supervised learning, removing the need for real annotations, and demonstrates that this proposed self-supervision outperforms all other methods relying on synthetic data or employing elaborate techniques from the domain adaptation realm.

CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning

This work proposes a novel method for class-level monocular 6D pose estimation, coupled with metric shape retrieval, and leverages recent advances in differentiable rendering to self-supervise the model with unannotated real RGB-D data to improve subsequent inference.

Self6D: Self-Supervised Monocular 6D Object Pose Estimation

This work proposes the idea of monocular 6D pose estimation by means of self-supervised learning, removing the need for real annotations, and demonstrates that the proposed self-supervision model is able to significantly enhance the model's original performance, outperforming all other methods relying on synthetic data or employing elaborate techniques from the domain adaptation realm.

PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects

A novel robot-supported multi-modal (RGB, depth, polarisation) data acquisition and annotation process is developed that ensures sub-millimeter accuracy of the pose for opaque textured, shiny and transparent objects, no motion blur and perfect camera synchronisation.

WS-OPE: Weakly Supervised 6D Object Pose Regression using Relative Multi-Camera Pose Constraints

The proposed scalable, end-to-end 6D pose regression with weak supervision needs no subsequent refinement stage, which ensures real-time performance, and it predicts poses of good quality in spite of being trained with only weak labels.

Self-Supervised Category-Level 6D Object Pose Estimation with Deep Implicit Shape Representation

A self-supervised framework for category-level 6D pose estimation that leverages DeepSDF as a 3D object representation and designs several novel loss functions based on DeepSDF to help the self-supervised model predict unseen object poses without any 6D object pose labels or explicit 3D models in real scenarios.

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

DenseFusion is a generic framework for estimating 6D pose of a set of known objects from RGB-D images that processes the two data sources individually and uses a novel dense fusion network to extract pixel-wise dense feature embedding, from which the pose is estimated.

UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose Estimation

This work proposes an unsupervised domain adaptation (UDA) for category-level object pose estimation, called UDA-COPE, which exploits a teacher-student self-supervised learning scheme to train a pose estimation network without using target domain pose labels.

CroMo: Cross-Modal Learning for Monocular Depth Estimation

This paper proposes a novel pipeline capable of estimating depth from monocular polarisation, together with the inversion of differentiable analytic models, which connects scene geometry with polarisation and ToF signals and enables self-supervised and cross-modal learning.

NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation

A weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D bounding boxes and relative camera poses during training and has state-of-the-art accuracy in comparison to the best 6D pose estimation methods in spite of being trained only with weak labels.