Abstract
We present a deep hierarchical visual system with two parallel hierarchies for edge and surface information. In the two hierarchies, complementary visual information is represented at different levels of granularity, together with the associated uncertainties and confidences. At all levels, geometric and appearance information is coded explicitly in 2D and 3D, which allows this information to be accessed separately and linked across the different levels. We demonstrate the advantages of such hierarchies in three applications covering grasping, viewpoint-independent object representation, and pose estimation.
Acknowledgments
This work has been supported by the European Community’s Seventh Framework Programme FP7/ICT under grant agreement no. 270273, Xperience. We would like to thank Antonio Rodriguez Sanchez for providing an initial version of Fig. 1.
Cite this article
Kraft, D., Mustafa, W., Popović, M. et al. Using surfaces and surface relations in an Early Cognitive Vision system. Machine Vision and Applications 26, 933–954 (2015). https://doi.org/10.1007/s00138-015-0705-y
DOI: https://doi.org/10.1007/s00138-015-0705-y