Using surfaces and surface relations in an Early Cognitive Vision system | Machine Vision and Applications

Using surfaces and surface relations in an Early Cognitive Vision system

  • Original Paper
Machine Vision and Applications

Abstract

We present a deep hierarchical visual system with two parallel hierarchies for edge and surface information. In the two hierarchies, complementary visual information is represented at different levels of granularity, together with the associated uncertainties and confidences. At all levels, geometric and appearance information is coded explicitly in 2D and 3D, which allows this information to be accessed separately and linked across the different levels. We demonstrate the advantages of such hierarchies in three applications covering grasping, viewpoint-independent object representation, and pose estimation.

Notes

  1. Note that several of these early processing stages are collapsed into one level in Fig. 2; they are described in more detail in, e.g., [67].

  2. The parameter m is typically in the range [2,4].

References

  1. Başeski, E., Pugeault, N., Kalkan, S., Bodenhagen, L., Piater, J.H., Krüger, N.: Using multi-modal 3D contours and their relations for vision and robotics. J. Vis. Commun. Image Represent. 21(8), 850–864 (2010)

  2. Bay, H., Ess, A., Tuytelaars, T., Gool, L.V.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008). doi:10.1016/j.cviu.2007.09.014. http://www.sciencedirect.com/science/article/pii/S1077314207001555

  3. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4), 509–522 (2002). doi:10.1109/34.993558

  4. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2, 1–127 (2009)

  5. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. Adv. Neural Inf. Process. Syst. 19, 153–160 (2007)

  6. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  7. Buch, A.G., Jessen, J.B., Kraft, D., Savarimuthu, T.R., Krüger, N.: Extended 3D line segments from RGB-D data for pose estimation. In: Kämäräinen, J.-K., Koskela, M. (eds.) Image Analysis, pp. 54–65. Springer, Berlin (2013)

  8. Buch, A.G., Kraft, D., Kämäräinen, J.K., Krüger, N.: Pose estimation using a hierarchical 3D representation of contours and surfaces. VISAPP 1, 105–111 (2013)

  9. Buch, A.G., Kraft, D., Kämäräinen, J.K., Petersen, H.G., Krüger, N.: Pose estimation using local structure-specific shape and appearance context. In: IEEE International Conference on Robotics and Automation (ICRA), 2013, pp. 2080–2087. IEEE (2013)

  10. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 8(6), 679–698 (1986)

  11. Csurka, G., Dance, C.R., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision, pp. 1–22, ECCV (2004)

  12. Detry, R., Pugeault, N., Piater, J.: A probabilistic framework for 3D visual object representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1790–1803 (2009)

  13. Dickinson, S.: The evolution of object categorization and the challenge of image abstraction. In: Dickinson, S., Leonardis, A., Schiele, B., Tarr, M. (eds.) Object Categorization: Computer and Human Vision Perspectives, pp. 1–37. Cambridge University Press, Cambridge (2009)

  14. Felleman, D., Essen, D.V.: Distributed hierarchical processing in primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991)

  15. Felsberg, M., Kalkan, S., Krüger, N.: Continuous dimensionality characterization of image structures. Image Vis. Comput. 27, 628–636 (2009)

  16. Felsberg, M., Sommer, G.: The monogenic signal. IEEE Trans. Signal Process. 49(12), 3136–3144 (2001)

  17. Fidler, S., Boben, M., Leonardis, A.: Learning hierarchical compositional representations of object structure. In: Dickinson, S., Leonardis, A., Schiele, B., Tarr, M. (eds.) Object Categorization: Computer and Human Vision Perspectives, pp. 196–215. Cambridge University Press, Cambridge (2009)

  18. Fidler, S., Boben, M., Leonardis, A.: A coarse-to-fine taxonomy of constellations for fast multi-class object detection. ECCV 5, 687–700 (2010)

  19. Fischler, M., Bolles, R.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

  20. Fukushima, K., Miyake, S., Ito, T.: Neocognitron: a neural network model for a mechanism of visual pattern recognition. IEEE Syst. Man Cybern. 13(3), 826–834 (1983)

  21. Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Comput. 4, 1–58 (1992)

  22. Geman, S., Potter, D., Chi, Z.: Composition systems. Q. Appl. Math. 60(4), 707–736 (2002)

  23. Gilbert, A., Illingworth, J., Bowden, R.: Action recognition using mined hierarchical compound features. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 883–897 (2011)

  24. Granlund, G.H., Knutsson, H.: Signal Processing for Computer Vision. Kluwer Academic, Dordrecht (1995)

  25. Hetzel, G., Leibe, B., Levi, P., Schiele, B.: 3D object recognition from range images using local feature histograms. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. II-394–II-399. IEEE Computer Society, Los Alamitos, CA, USA (2001). doi:10.1109/CVPR.2001.990988

  26. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)

  27. Huang, F.J., LeCun, Y.: Large-scale learning with SVM and convolutional nets for generic object categorization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 284–291 (2006)

  28. Hubel, D., Wiesel, T.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962)

  29. Hubel, D., Wiesel, T.: Anatomical demonstration of columns in the monkey striate cortex. Nature 221, 747–750 (1969)

  30. Hummel, J., Biederman, I.: Dynamic binding in a neural network for shape recognition. Psychol. Rev. 99, 480–517 (1992)

  31. Hunt, R.: Measuring Colour, 3rd edn. Fountain Press, Kingston-upon-Thames (1998)

  32. Jensen, L.B.W., Kjær-Nielsen, A., Pauwels, K., Jessen, J.B., Hulle, M.V., Krüger, N.: A two-level real-time vision machine combining coarse and fine grained parallelism. J. Real-Time Image Process. 5(4), 291–304 (2010)

  33. Jessen, J.B., Pilz, F., Kraft, D., Pugeault, N., Krüger, N.: Accumulation of different visual feature descriptors in a coherent framework. In: Scandinavian Conference on Image Analysis (SCIA), pp. 79–90 (2011)

  34. Johnson, A.E., Hebert, M.: Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 21(5), 433–449 (1999). doi:10.1109/34.765655

  35. Kalkan, S., Wörgötter, F., Krüger, N.: Statistical analysis of local 3D structure in 2D images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 1114–1121 (2006)

  36. Kandel, E., Schwartz, J., Jessell, T.: Principles of Neural Science, 4th edn. McGraw Hill, New York (2000)

  37. Kasper, A., Xue, Z., Dillmann, R.: The KIT object models database: an object model database for object recognition, localization and manipulation in service robotics. Int. J. Robot. Res. (IJRR) 31(8), 927–934 (2012). doi:10.1177/0278364912445831

  38. Kavukcuoglu, K., Sermanet, P., Boureau, Y., Gregor, K., Mathieu, M., LeCun, Y.: Learning convolutional feature hierarchies for visual recognition. In: Advances in Neural Information Processing Systems (NIPS 2010), vol. 23, pp. 1090–1098 (2010)

  39. Kellman, P., Arterberry, M.: The Cradle of Knowledge. MIT-Press, Cambridge (1998)

  40. Kjær-Nielsen, A., Buch, A.G., Jensen, A.E.K., Ellekilde, L.P., Petersen, H.G., Krüger, N., Kraft, D., Møller, B.: Ring on the hook: placing a ring on a moving and pendulating hook based on visual input. Ind. Robot Int. J. 28(3), 301–314 (2010)

  41. Kootstra, G., Popović, M., Jørgensen, J., Kuklinski, K., Miatliuk, K., Kragic, D., Krüger, N.: Enabling grasping of unknown objects through a synergistic use of edge and surface information. Int. J. Robot. Res. 31(10), 1190–1213 (2012). doi:10.1177/0278364912452621. http://ijr.sagepub.com/content/31/10/1190.abstract

  42. Kovesi, P.: Image features from phase congruency. Videre J. Comput. Vis. Res. 1(3), 1–26 (1999)

  43. Kraft, D., Detry, R., Pugeault, N., Başeski, E., Guerin, F., Piater, J., Krüger, N.: Development of object and grasping knowledge by robot exploration. IEEE Trans. Auton. Ment. Dev. 2(4), 368–383 (2010)

  44. Kraft, D., Pugeault, N., Başeski, E., Popović, M., Kragic, D., Kalkan, S., Wörgötter, F., Krüger, N.: Birth of the object: detection of objectness and extraction of object shape through object action complexes. Int. J. Hum. Robot. (Special Issue on “Cognitive Humanoid Robots”) 5, 247–265 (2009)

  45. Krüger, N., Janssen, P., Kalkan, S., Lappe, M., Leonardis, A., Piater, J., Rodríguez-Sánchez, A.J., Wiskott, L.: Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE PAMI 35(8), 1847–1871 (2013)

  46. Krüger, N., Pugeault, N., Başeski, E., Jensen, L.B.W., Kalkan, S., Kraft, D., Jessen, J.B., Pilz, F., Nielsen, A.K., Popović, M., Asfour, T., Piater, J., Kragic, D., Wörgötter, F.: Early cognitive vision as a front-end for cognitive systems. In: ECCV 2010 Workshop on “Vision for Cognitive Tasks” (2010)

  47. Krüger, N., Wörgötter, F.: Different degree of genetical prestructuring in the ontogenesis of visual abilities based on deterministic and statistical regularities. In: Proceedings of the Workshop on Growing up Artifacts that Live (SAB 2002), pp. 5–14 (2002)

  48. Krüger, N., Wörgötter, F.: Multi-modal primitives as functional models of hyper-columns and their use for contextual integration. In: Proceedings of the 1st International Symposium on Brain, Vision and Artificial Intelligence, Lecture Notes in Computer Science, LNCS 3704, pp. 157–156 (2005)

  49. Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 1817–1824 (2011)

  50. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2169–2178 (2006)

  51. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  52. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (IJCV) 60(2), 91–110 (2004)

  53. Marr, D.: Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, San Francisco (1982)

  54. Mel, B.W., Fiser, J.: Minimizing binding errors using learned conjunctive features. Neural Comput. 12(4), 731–762 (2000)

  55. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 27(10), 1615–1630 (2005)

  56. Milner, A., Goodale, M.: Separate visual pathways for perception and action. Trends Neurosci. 15, 20–25 (1992)

  57. Murray, D., Little, J.: Patchlets: representing stereo vision data with surface elements. In: Seventh IEEE Workshops on Application of Computer Vision. WACV/MOTIONS vol 1., pp. 192–199 (2005)

  58. Mustafa, W., Pugeault, N., Krüger, N.: Multi-view object recognition using view-point invariant shape relations and appearance information. In: IEEE International Conference on Robotics and Automation (ICRA) (2013)

  59. Niebles, J., Fei-Fei, L.: A hierarchical model of shape and appearance for human action classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2007)

  60. Olesen, S.M., Lyder, S., Kraft, D., Krüger, N., Jessen, J.B.: Real-time extraction of surface patches with associated uncertainties by means of Kinect cameras. J. Real-Time Image Process. 1–14 (2012). doi:10.1007/s11554-012-0261-x

  61. Ommer, B., Buhmann, J.M.: Learning the compositional nature of visual objects. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2007)

  62. O’Neill, B.: Elementary Differential Geometry. Elsevier Academic Press, Amsterdam (2006). http://books.google.dk/books?id=OtbNXAIve_AC

  63. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2007)

  64. Pinto, N., Barhomi, Y., Cox, D., DiCarlo, J.: Comparing state-of-the-art visual features on invariant object recognition tasks. In: IEEE Workshop on Applications of Computer Vision (WACV 2011), pp. 463–470 (2011)

  65. Pinto, N., DiCarlo, J., Cox, D.: How far can you get with a modern face recognition test set using only simple features? In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2591–2598 (2009)

  66. Popović, M., Kraft, D., Bodenhagen, L., Başeski, E., Pugeault, N., Kragic, D., Asfour, T., Krüger, N.: A strategy for grasping unknown objects based on co-planarity and colour information. Robot. Auton. Syst. 58(5), 551–565 (2010). doi:10.1016/j.robot.2010.01.003

  67. Pugeault, N., Wörgötter, F., Krüger, N.: Visual primitives: local, condensed, and semantically rich visual descriptors and their applications in robotics. Int. J. Hum. Robot. (Special Issue on Cognitive Humanoid Vision) 7(3), 379–405 (2010)

  68. Quack, T., Ferrari, V., Leibe, B., Gool, L.V.: Efficient mining of frequent and distinctive feature configurations. In: Proceedings of the International Conference in Computer Vision (ICCV), pp. 1–8 (2007)

  69. Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: IEEE CVPR Workshop on DeepVision (2014)

  70. Riesenhuber, M., Poggio, T.: Hierarchical models of object recognition in cortex. Nature Neurosci. 2(11), 1019–1025 (1999)

  71. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)

  72. Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)

  73. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (FPFH) for 3D registration. In: IEEE International Conference on Robotics and Automation, 2009. ICRA’09, pp. 3212–3217. IEEE (2009)

  74. Rusu, R.B., Blodow, N., Marton, Z.C., Beetz, M.: Aligning point cloud views using persistent feature histograms. In: Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, pp. 3384–3391 (2008)

  75. Savarese, S., Winn, J., Criminisi, A.: Discriminative object class models of appearance and shape by correlations. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2033–2040 (2006)

  76. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: Integrated recognition, localization and detection using convolutional networks. In: International Conference on Learning Representations (2014)

  77. Sermanet, P., LeCun, Y.: Traffic sign recognition with multi-scale convolutional networks. In: Proceedings of International Joint Conference on Neural Networks (IJCNN’11), pp. 2809–2813 (2011)

  78. Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T.: Object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 411–426 (2007)

  79. Sutskever, I., Hinton, G.E.: Learning multilevel distributed representations for high-dimensional sequences. In: AI and Statistics, pp. 544–551 (2007)

  80. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., Fergus, R.: Intriguing properties of neural networks. International Conference on Learning Representations (2014)

  81. Tenenbaum, J.B., Kemp, C., Griffiths, T.L., Goodman, N.D.: How to grow a mind: statistics, structure, and abstraction. Science 331, 1279–1285 (2011)

  82. Tsotsos, J.K.: Analyzing vision at the complexity level. Behav. Brain Sci. 13(3), 423–469 (1990)

  83. Tsotsos, J.K.: A Computational Perspective on Visual Attention, 1st edn. MIT Press, Cambridge (2011)

  84. Ullman, S., Epshtein, B.: Visual classification by a hierarchy of extended fragments. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds.) Towards Category-Level Object Recognition, pp. 321–344. Springer, Berlin (2006)

  85. Wahl, E., Hillenbrand, U., Hirzinger, G.: Surflet-pair-relation histograms: a statistical 3D-shape representation for rapid classification. In: Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings, pp. 474–481. IEEE (2003)

  86. Yang, Y., Newsam, S.: Bag-of-visual-words and spatial extensions for land-use classification. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 270–279 (2010)

  87. Zhang, J., Marszalek, M., Lazebnik, S., Schmid, C.: Local features and kernels for classification of texture and object categories: a comprehensive study. Int. J. Comput. Vis. 73(2), 213–238 (2007)

Acknowledgments

This work has been supported by the European Community’s Seventh Framework Programme FP7/ICT under grant agreement no. 270273, Xperience. We would like to thank Antonio Rodriguez Sanchez for providing an initial version of Fig. 1.

Author information

Corresponding author

Correspondence to Dirk Kraft.

About this article

Cite this article

Kraft, D., Mustafa, W., Popović, M. et al. Using surfaces and surface relations in an Early Cognitive Vision system. Machine Vision and Applications 26, 933–954 (2015). https://doi.org/10.1007/s00138-015-0705-y

  • DOI: https://doi.org/10.1007/s00138-015-0705-y
