Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects

@article{Zhao2022FusingLS,
  title={Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects},
  author={Chen Zhao and Yinlin Hu and Mathieu Salzmann},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.08472},
  url={https://api.semanticscholar.org/CorpusID:247475898}
}
This paper tackles the task of estimating the 3D orientation of previously-unseen objects from monocular images and introduces an adaptive fusion module that robustly aggregates the local similarities into a global similarity score of pairwise images.
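
The idea of fusing local similarities into one global score for retrieval can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the function names, the temperature parameter, and the softmax weighting are hypothetical. The paper's fusion module is learned, whereas here the weights are derived from the similarities themselves.

```python
import numpy as np


def local_cosine_similarities(feat_a, feat_b, eps=1e-8):
    """Per-location cosine similarity between two (C, H, W) feature maps."""
    dot = (feat_a * feat_b).sum(axis=0)
    norm = np.linalg.norm(feat_a, axis=0) * np.linalg.norm(feat_b, axis=0)
    return dot / (norm + eps)


def fuse_similarities(local_sim, logits):
    """Softmax-weighted average of local similarities, giving one global score."""
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()
    return float((weights * local_sim).sum())


def retrieve_orientation(query_feat, ref_feats, ref_rotations, temperature=0.1):
    """Return the rotation label of the reference most similar to the query."""
    scores = []
    for ref_feat in ref_feats:
        sim = local_cosine_similarities(query_feat, ref_feat)
        # Assumption: weights come from the similarities themselves;
        # a learned module would predict these logits instead.
        scores.append(fuse_similarities(sim, sim / temperature))
    best = int(np.argmax(scores))
    return ref_rotations[best], scores
```

In this sketch, a query is compared against every reference view, and the orientation label attached to the best-scoring reference is returned as the estimate.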

Object Pose Estimation via the Aggregation of Diffusion Features

This work proposes three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation, and outperforms the state-of-the-art methods by a considerable margin on three popular benchmark datasets.

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

GigaPose first leverages discriminative “templates”, rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters, resulting in a speedup factor of 35x compared to the state of the art.

3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation

This work presents a new hypothesis-and-verification framework, in which the goal is to estimate the relative object pose between a reference view and a query image that depicts the object in a different pose.

LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation

This paper develops a method, LocPoseNet, able to robustly learn location prior for unseen objects, and introduces a novel translation estimator, which decouples scale-aware and scale-robust features to predict different object location parameters.

Finer-Grained Correlations: Location Priors for Unseen Object Pose Estimation

This work introduces a finer-grained correlation estimation module, which handles object scale mismatches by computing correlations with adjustable receptive fields, and proposes to decouple the correlations into scale-robust and scale-aware representations to estimate the object location and size.

Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching

A novel generalizable object pose estimation method to determine the object pose using only one RGB image, which operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.

Open-Vocabulary Category-Level Object Pose and Size Estimation

Comprehensive quantitative and qualitative experiments demonstrate that the proposed open-vocabulary method, trained on large-scale synthesized data, significantly outperforms the baseline and can effectively generalize to real-world images of unseen categories.

Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

This survey discusses the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, and identifies key challenges, reviews the prevailing trends along with their pros and cons, and identifies promising directions for future research.

DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses

This work presents a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass, and introduces a weighted closest voxel algorithm capable of mitigating the impact of noisy voxels.

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

A simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations, which remarkably outperforms state-of-the-art methods on LM, LM-O and YCB-V datasets.

LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation

A novel framework for 6D pose estimation of unseen objects is proposed, which presents a network that reconstructs a latent 3D representation of an object using a small number of reference views at inference time and is able to render the latent 3D representation from arbitrary views.

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects

Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion.

Histograms of oriented gradients for human detection

It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning

This work trains a direct pose estimator in a class-agnostic way by sharing weights across all object classes, and introduces a contrastive learning method that has three main ingredients: the use of pre-trained, self-supervised, contrast-based features; pose-aware data augmentations; and a pose-aware contrastive loss.

A Simple Framework for Contrastive Learning of Visual Representations

It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.

Multi-Path Learning for Object Pose Estimation Across Domains

This work introduces a scalable approach for object pose estimation that is trained on simulated RGB views of multiple 3D models together and achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches.

Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects

This work proposes a completely generic deep pose estimation approach, which does not require the network to have been trained on relevant categories, nor objects in a category to have a canonical pose, and demonstrates that this method boosts performances for supervised category pose estimation on standard benchmarks.

Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

This work proposes a real-time RGB-based pipeline for object detection and 6D pose estimation based on a variant of the Denoising Autoencoder trained on simulated views of a 3D model using Domain Randomization.

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition

A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task, and significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks.