Depth-assisted rectification for real-time object detection and pose estimation

  • Original Paper
Machine Vision and Applications

Abstract

In recent years, RGB-D sensors have become readily available to general users. They provide both a color image and a depth image of the scene and, besides being used for object modeling, can also offer important cues for real-time object detection and tracking. In this context, this paper investigates the use of consumer RGB-D sensors for object detection and pose estimation from natural features. Two methods based on depth-assisted rectification are proposed; both use depth data to transform features extracted from the color image to a canonical view, yielding a representation invariant to rotation, scale and perspective distortions. One method is suitable for textured objects, either planar or non-planar, while the other focuses on texture-less planar objects. Qualitative and quantitative evaluations show that the proposed methods can obtain better results than some existing methods for object detection and pose estimation, especially when dealing with oblique poses.
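To make the idea of rectifying to a canonical view more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera with known intrinsics K, fits a plane to the back-projected depth points of a patch with a covariance eigen-decomposition (one possible choice of normal estimator), and builds the pure-rotation homography K R K^-1 that makes the patch fronto-parallel. It handles only the perspective part; scale and in-plane rotation normalization are left out. All function names and the intrinsic values are hypothetical.

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift pixels (u, v) with metric depth to 3D points in the camera frame."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

def patch_normal(points):
    """Unit normal of the least-squares plane through an (N, 3) point set."""
    centered = points - points.mean(axis=0)
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # smallest one is the direction of least variance, i.e. the plane normal.
    _, eigvecs = np.linalg.eigh(centered.T @ centered)
    n = eigvecs[:, 0]
    return -n if n[2] > 0 else n  # orient the normal toward the camera

def rotation_aligning(a, b):
    """Rotation R with R @ a == b for unit vectors a, b (not antiparallel)."""
    v = np.cross(a, b)
    c = float(a @ b)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def rectifying_homography(K, n):
    """Homography of a virtual camera rotation that faces the plane head-on."""
    R = rotation_aligning(n, np.array([0.0, 0.0, -1.0]))
    return K @ R @ np.linalg.inv(K), R

# Synthetic example: a tilted plane through (0, 0, 1) seen by an assumed camera.
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
n_true = np.array([0.3, -0.2, -1.0])
n_true /= np.linalg.norm(n_true)
u, v = np.meshgrid(np.arange(300.0, 340.0), np.arange(220.0, 260.0))
u, v = u.ravel(), v.ravel()
rays = backproject(u, v, np.ones_like(u), K)  # points at unit depth act as rays
depth = n_true[2] / (rays @ n_true)           # depth of each ray-plane intersection
points = backproject(u, v, depth, K)
H, R = rectifying_homography(K, patch_normal(points))
```

In a real pipeline, H would be applied to the color patch around each keypoint (for example with an image-warping routine) before computing a descriptor, so that patches of the same surface seen from oblique viewpoints look alike.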

Acknowledgments

The authors would like to thank Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)/Institut National de Recherche en Informatique et en Automatique (INRIA)/Comisión Nacional de Investigación Científica y Tecnológica (CONICYT) STIC-AmSud project ARVS and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (process 141705/2010-8) for partially funding this research.

Author information

Corresponding author

Correspondence to João Paulo Silva do Monte Lima.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mpg 62920 KB)

Supplementary material 2 (mpg 42922 KB)

About this article

Cite this article

do Monte Lima, J.P.S., Simões, F.P.M., Uchiyama, H. et al. Depth-assisted rectification for real-time object detection and pose estimation. Machine Vision and Applications 27, 193–219 (2016). https://doi.org/10.1007/s00138-015-0740-8

  • DOI: https://doi.org/10.1007/s00138-015-0740-8
