Unsupervised Learning of Monocular Depth and Ego-Motion using Conditional PatchGANs

Madhu Vankadari, Swagat Kumar, Anima Majumder, Kaushik Das

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 5677-5684. https://doi.org/10.24963/ijcai.2019/787

This paper presents a new GAN-based deep learning framework for estimating absolute-scale-aware depth and ego-motion from monocular images using a completely unsupervised mode of learning. The proposed architecture uses two separate generators to learn the distribution of depth and pose data for a given input image sequence. The depth and pose data thus generated are then evaluated by a patch-based discriminator using the reconstructed image and its corresponding actual image. The patch-based GAN (or PatchGAN) is shown to detect high-frequency local structural defects in the reconstructed image, thereby improving the accuracy of overall depth and pose estimation. Unlike conventional GANs, the proposed architecture uses a conditioned version of the input and output of the generator for training the whole network. The resulting framework is shown to outperform all existing deep networks in this field, beating the current state-of-the-art method by 8.7% in absolute error and 5.2% in the RMSE metric. To the best of our knowledge, this is the first deep-network-based model to estimate both depth and pose simultaneously using a conditional patch-based GAN paradigm. The efficacy of the proposed approach is demonstrated through rigorous ablation studies and an exhaustive performance comparison on the popular KITTI outdoor driving dataset.
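For readers unfamiliar with the two error measures quoted above, the following is a minimal sketch of how such depth errors are typically computed. It assumes that "absolute error" refers to the absolute relative error commonly reported on KITTI, and that the quoted percentages are relative reductions with respect to a baseline; neither assumption is confirmed by the abstract. The function names and toy values are illustrative only and are not taken from the paper.

```python
import math


def _valid_pairs(pred, gt):
    # Keep only pixels with a positive ground-truth depth
    # (assumption: non-positive values mark missing measurements).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    return pairs


def abs_rel_error(pred, gt):
    """Mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def rmse(pred, gt):
    """Root of the mean squared difference over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def relative_improvement(baseline, ours):
    """Percentage reduction of an error value relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    # Toy depths in metres; these are made-up values, not results from the paper.
    predicted = [9.5, 21.0, 4.8, 33.0]
    ground_truth = [10.0, 20.0, 5.0, 0.0]  # last pixel has no measurement
    print("abs rel:", abs_rel_error(predicted, ground_truth))
    print("rmse:", rmse(predicted, ground_truth))
```

Both measures are lower-is-better, so under the stated assumption an improvement of 8.7% would mean the error value is 8.7% smaller than that of the compared method.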
Keywords:
Robotics: Localization, Mapping, State Estimation
Robotics: Robotics and Vision
Computer Vision: 2D and 3D Computer Vision