Patch-Wise Attention Network for Monocular Depth Estimation

Authors

  • Sihaeng Lee, KAIST
  • Janghyeon Lee, KAIST
  • Byungju Kim, KAIST; Mathpresso Inc.
  • Eojindl Yi, KAIST
  • Junmo Kim, KAIST

DOI:

https://doi.org/10.1609/aaai.v35i3.16282

Keywords:

Vision for Robotics & Autonomous Driving

Abstract

In computer vision, monocular depth estimation is the problem of obtaining a high-quality depth map from a single two-dimensional image. Such a map provides information on three-dimensional scene geometry, which is necessary for various applications in academia and industry, such as robotics and autonomous driving. Recent studies based on convolutional neural networks have achieved impressive results for this task. However, most previous studies did not consider the relationships between neighboring pixels in a local area of the scene. To overcome this drawback of existing methods, we propose a patch-wise attention method that focuses on each local area. After extracting patches from an input feature map, our module generates an attention map for each local patch, using two attention modules per patch along the channel and spatial dimensions. Subsequently, the attention maps return to their initial positions and are merged into one attention feature. Our method is straightforward but effective. Experimental results on two challenging datasets, KITTI and NYU Depth V2, demonstrate that the proposed method achieves significant performance gains. Furthermore, our method outperforms other state-of-the-art methods on the KITTI depth estimation benchmark.
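The patch-wise procedure described in the abstract (extract local patches, gate each patch along the channel and spatial dimensions, then write the patches back to their original positions) can be sketched in a few lines. The following is a minimal NumPy illustration only: the paper's learned attention modules are replaced here by hypothetical sigmoid gates computed from per-patch averages, so the gating functions are stand-ins, not the authors' architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def patch_wise_attention(feat, patch=4):
    """Gate each local patch of a (C, H, W) feature map independently.

    Illustrative stand-in for the paper's patch-wise attention: the
    learned channel/spatial attention modules are replaced by simple
    sigmoid gates over per-patch statistics. H and W must be divisible
    by `patch`.
    """
    C, H, W = feat.shape
    assert H % patch == 0 and W % patch == 0
    out = np.empty_like(feat)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            p = feat[:, i:i+patch, j:j+patch]            # one local patch
            # channel attention: one weight per channel from the spatial mean
            ch = sigmoid(p.mean(axis=(1, 2)))[:, None, None]
            # spatial attention: one weight per pixel from the channel mean
            sp = sigmoid(p.mean(axis=0))[None, :, :]
            # apply both gates and return the patch to its initial position
            out[:, i:i+patch, j:j+patch] = p * ch * sp
    return out
```

Because each patch is gated independently before reassembly, the attention statistics are computed over a local neighborhood rather than the whole scene, which is the key distinction from global channel/spatial attention.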

Published

2021-05-18

How to Cite

Lee, S., Lee, J., Kim, B., Yi, E., & Kim, J. (2021). Patch-Wise Attention Network for Monocular Depth Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 1873-1881. https://doi.org/10.1609/aaai.v35i3.16282

Section

AAAI Technical Track on Computer Vision II