SUN RGB-D: A RGB-D scene understanding benchmark suite

@inproceedings{Song2015SUNRA,
  title={SUN RGB-D: A RGB-D scene understanding benchmark suite},
  author={Shuran Song and Samuel P. Lichtenberg and Jianxiong Xiao},
  booktitle={2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2015},
  pages={567-576},
  url={https://api.semanticscholar.org/CorpusID:6242669}
}

This paper introduces an RGB-D benchmark suite aimed at advancing the state of the art in all major scene understanding tasks, and presents a dataset that makes it possible to train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.
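Here "3D metrics" refers to measures computed directly on 3D bounding boxes rather than their 2D projections. As a minimal sketch (assuming axis-aligned boxes for simplicity; SUN RGB-D's gravity-aligned boxes may also rotate about the vertical axis), 3D intersection-over-union can be computed as:

import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    # Boxes are (xmin, ymin, zmin, xmax, ymax, zmax).
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    # Overlap extent along each axis, clipped to zero for disjoint boxes.
    overlap = np.clip(np.minimum(a[3:], b[3:]) - np.maximum(a[:3], b[:3]), 0.0, None)
    inter = np.prod(overlap)
    vol_a, vol_b = np.prod(a[3:] - a[:3]), np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter)

# Two unit cubes overlapping in half their volume: IoU = 0.5 / 1.5 ≈ 0.33.
print(iou_3d_axis_aligned((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))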

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large-scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection.

Matterport3D: Learning from RGB-D Data in Indoor Environments

Matterport3D is introduced, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes, enabling a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.

Feature learning for RGB-D scene understanding

Inspired by the success of unsupervised feature learning, an existing unsupervised feature learning technique is adapted to directly learn features from RGB-D images for indoor scene understanding problems.

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.

Multi-Modal RGB-D Scene Recognition Across Domains

This work spotlights a possibly severe domain shift issue within multi-modality scene recognition datasets and introduces a novel adaptive scene recognition approach that leverages self-supervised translation between modalities.

Multiview RGB-D Dataset for Object Instance Detection

A new multi-view RGB-D dataset of nine kitchen scenes is presented, each containing several objects in realistic cluttered environments, including a subset of objects from the BigBird dataset, along with an approach for object detection and recognition.

SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?

Analysis of SceneNet RGB-D suggests that large-scale, high-quality synthetic RGB datasets with task-specific labels can be more useful for pre-training than generic real-world pre-training such as ImageNet.

Learning 3D Scene Synthesis from Annotated RGB-D Images

While the algorithm inserts objects one at a time, it attains holistic plausibility of the whole current scene and, compared to previous works on probabilistic learning for object placement, offers controllability through progressive synthesis.
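As a hedged toy illustration of progressive synthesis (not the paper's learned model), the sketch below places objects one at a time, sampling candidate positions and keeping the one that maximizes a holistic score over the whole current scene; scene_score here is a hypothetical stand-in for a plausibility model learned from annotated RGB-D images.

import random

def scene_score(scene):
    # Hypothetical holistic plausibility: penalize overlapping objects.
    # A real system would score the scene with a learned density.
    penalty = sum(1 for i, (xa, ya) in enumerate(scene)
                  for xb, yb in scene[i + 1:]
                  if abs(xa - xb) < 1.0 and abs(ya - yb) < 1.0)
    return -penalty

def synthesize(num_objects, num_candidates=50, room=10.0):
    # Insert objects one at a time, greedily keeping the placement that
    # makes the whole current scene most plausible.
    scene = []
    for _ in range(num_objects):
        candidates = [(random.uniform(0, room), random.uniform(0, room))
                      for _ in range(num_candidates)]
        best = max(candidates, key=lambda p: scene_score(scene + [p]))
        scene.append(best)
    return scene

print(synthesize(5))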

Learning Effective RGB-D Representations for Scene Recognition

This paper proposes an architecture and a two-step training approach that directly learns effective depth-specific features using weak supervision via patches, and obtains state-of-the-art performance on RGB-D image and video scene recognition.

ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes

A large-scale dataset is presented that couples high-quality and commodity-level capture of the geometry and color of indoor scenes, together with a new benchmark for 3D semantic scene understanding that comprehensively encapsulates diverse and ambiguous semantic labeling scenarios.
...

RGB-(D) scene labeling: Features and algorithms

The main objective is to empirically understand the promises and challenges of scene labeling with RGB-D, and to adapt the framework of kernel descriptors, which converts local similarities (kernels) into patch descriptors, in order to capture appearance (RGB) and shape (D) similarities.

Semantic Labeling of 3D Point Clouds for Indoor Scenes

This paper proposes a graphical model that captures various features and contextual relations, including local visual appearance and shape cues, object co-occurrence relationships, and geometric relationships, and applies the resulting algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.

A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM

This work introduces the Imperial College London and National University of Ireland Maynooth (ICL-NUIM) dataset, a collection of handheld RGB-D camera sequences within synthetically generated environments, providing a means to benchmark surface reconstruction accuracy.
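A minimal sketch of how surface reconstruction accuracy can be benchmarked against such synthetic ground truth, assuming both surfaces are available as point samples (ICL-NUIM's exact evaluation protocol may differ):

import numpy as np
from scipy.spatial import cKDTree

def reconstruction_error(recon_pts, gt_pts):
    # Distance from each reconstructed point to its nearest ground-truth
    # point, summarized as mean and median error in the model's units.
    dists, _ = cKDTree(gt_pts).query(recon_pts)
    return dists.mean(), np.median(dists)

# Toy example: a noisy copy of a random surface sampling.
gt = np.random.rand(1000, 3)
recon = gt + np.random.normal(scale=0.01, size=gt.shape)
print(reconstruction_error(recon, gt))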

Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel's correspondence to 3D points in the scene's world coordinate frame.
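Given such per-pixel scene-coordinate predictions, the camera pose follows from rigidly aligning camera-space points to their predicted world coordinates; the paper does this robustly with a RANSAC-style optimization, and the closed-form Kabsch alignment below is a minimal sketch of just that inner alignment step.

import numpy as np

def kabsch(cam_pts, scene_pts):
    # Least-squares rigid transform (R, t) with scene_pts ≈ cam_pts @ R.T + t.
    # Inputs are (N, 3) arrays of corresponding 3D points.
    mu_c, mu_s = cam_pts.mean(axis=0), scene_pts.mean(axis=0)
    H = (cam_pts - mu_c).T @ (scene_pts - mu_s)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_c
    return R, t

# Sanity check: recover a rotation about z and a known translation.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
cam = np.random.rand(100, 3)
scene = cam @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = kabsch(cam, scene)
print(np.allclose(R, R_true), np.allclose(t, [1.0, 2.0, 3.0]))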

A large-scale hierarchical multi-view RGB-D object dataset

A large-scale, hierarchical multi-view object dataset collected using an RGB-D camera is introduced, along with techniques for RGB-D based object recognition and detection, demonstrating that combining color and depth information substantially improves the quality of results.

SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels

SUN3D, a large-scale RGB-D video database with camera poses and object labels that captures the full 3D extent of many places, is introduced, along with a generalization of bundle adjustment that incorporates object-to-object correspondences.

Contextually guided semantic labeling and search for three-dimensional point clouds

This paper addresses the task of detecting commonly found objects in the three-dimensional point cloud of indoor scenes obtained from RGB-D cameras by using a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships.

Intrinsic Scene Properties from a Single RGB-D Image

In this paper we extend the “shape, illumination and reflectance from shading” (SIRFS) model [3, 4], which recovers intrinsic scene properties from a single image. Though SIRFS performs well on images of segmented objects, it performs poorly on images of natural scenes, which contain occlusion and spatially-varying illumination.

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

A new geocentric embedding for depth images is proposed that encodes, at each pixel, the height above ground and the angle with gravity in addition to the horizontal disparity, facilitating the use of perception in fields like robotics.
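A minimal sketch of such a geocentric encoding, assuming points and surface normals have already been back-projected from the depth image and the gravity direction is known (the paper's full HHA encoding also estimates gravity from the data and rescales the channels):

import numpy as np

def geocentric_channels(points, normals, disparity,
                        gravity=np.array([0.0, -1.0, 0.0])):
    # Per-pixel geocentric features: horizontal disparity, height above the
    # lowest point along gravity, and angle between normal and gravity.
    # points, normals: (H, W, 3); disparity: (H, W); gravity points downward.
    g = gravity / np.linalg.norm(gravity)
    # Height measured along the "up" direction -g, offset so the lowest
    # observed point sits at height zero.
    h = -(points @ g)
    height = h - h.min()
    # Angle (degrees) between each surface normal and the gravity direction.
    cosang = np.clip(normals @ g, -1.0, 1.0)
    angle = np.degrees(np.arccos(cosang))
    return np.stack([disparity, height, angle], axis=-1)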

Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

A holistic approach is proposed that exploits 2D segmentation, 3D geometry, and contextual relations between scenes and objects, and develops a conditional random field to integrate information from these different sources to classify candidate cuboids.
...