What We Can Learn From the Primate’s Visual System

  • Technical Contribution
KI - Künstliche Intelligenz

Abstract

In this review, we discuss the impact (or lack thereof) that biologically motivated vision has had on computer vision over the last decades. We then summarize a number of computer vision and robotics problems for which biological models can indicate how they might be addressed. Finally, we summarize important findings about the primate’s visual system and draw from them a number of conclusions for the development of algorithms.

Notes

  1. Strictly speaking, most neurophysiological investigations have been based on the macaque’s visual system, which, however, shows a large degree of similarity to the human visual system.

  2. An area is called retinotopically organized when it preserves the neighbourhood relations of the retina, i.e., the general arrangement of 2D positions. In particular, the cortical areas at the lower levels of the visual hierarchy are retinotopic.

  3. The receptive field of a neuron is the part of the visual field that directly influences the neuron’s response.

References

  1. Agarwal S, Snavely N, Simon I, Seitz SM, Szeliski R (2009) Building Rome in a day. In: International Conference on Computer Vision (ICCV). pp 72–79

  2. Aldoma A, Fäulhammer T, Vincze M (2014) Automation of “ground truth” annotation for multi-view RGB-D object instance recognition datasets. In: International Conference on Robotics and Automation (ICRA). pp 5016–5023

  3. Ambrus R, Bore N, Folkesson J, Jensfelt P (2014) Meta-rooms: building and maintaining long term spatial models in a dynamic world. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp 1854–1861

  4. Andreopoulos A, Tsotsos JK (2013) 50 Years of object recognition: directions forward. Comput Vision Image Underst 117(8):827–891

  5. Arterberry ME, Yonas A, Bensen AS (1989) Self-produced locomotion and the development of responsiveness to linear perspective and texture gradients. Dev Psychol 25:976–982

  6. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) SURF: speeded up robust features. Comput Vision Image Underst 110(3):346–359

  7. Bengio S, Deng L, Larochelle H, Lee H, Salakhutdinov R (guest eds) (2013) Special section on learning deep architectures. Pattern Anal Mach Intell IEEE Trans 35(8)

  8. Berkes P, Wiskott L (2005) Slow feature analysis yields a rich repertoire of complex cell properties. J Vision 5(6):579–602

  9. Borji A, Itti L (2013) State-of-the-art in visual attention modeling. Pattern Anal Mach Intell IEEE Trans 35(1):185–207

  10. Borji A, Sihite DN, Itti L (2012) Probabilistic learning of task-specific visual attention. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 470–477

  11. Boyer KL, Sarkar S (guest eds) (1999) Perceptual organisation in computer vision: status, challenges and potential. Comput Vision Image Underst 76(1):1–5

  12. Canny JF (1986) A computational approach to edge detection. Pattern Anal Mach Intell IEEE Trans 8(6):679–698

  13. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: delving deep into convolutional nets. In: British Machine Vision Conference (BMVC)

  14. Chuang AT, Margo CE, Greenberg PB (2014) Retinal implants: a systematic review. Br J Ophthalmol 98:852–856

  15. Criminisi A, Blake A, Rother C, Shotton J, Torr P (2007) Efficient dense stereo with occlusions for new view-synthesis by four-state dynamic programming. Int J Comput Vision 71(1):89–110

  16. Cummins M, Newman P (2010) Appearance-only SLAM at large scale with FAB-MAP 2.0. Int J Robot Res

  17. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol 2. pp 886–893

  18. Dame A, Prisacariu VA, Ren CY, Reid I (2013) Dense reconstruction using 3D object shape priors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1288–1295

  19. Dementhon DF, Davis LS (1995) Model-based object pose in 25 lines of code. Int J Comput Vision 15(1–2):123–141

  20. Dickinson S (2009) The evolution of object categorization and the challenge of image abstraction. In: Dickinson S, Leonardis A, Schiele B, Tarr M (eds) Object categorization: computer and human vision perspectives. Cambridge University Press, pp 1–37

  21. Dickinson S, Levinshtein A, Sala P, Sminchisescu C (2013) The role of mid-level shape priors in perceptual grouping and image abstraction. In: Dickinson S, Pizlo Z (eds) Shape perception in human and computer vision: an interdisciplinary perspective. Springer

  22. Fang F, Boyaci H, Kersten D (2009) Border ownership selectivity in human early visual cortex and its modulation by attention. J Neurosci 29(2):460–465

  23. Faugeras OD (1993) Three-dimensional computer vision. MIT press, Cambridge

  24. Fidler S, Boben M, Leonardis A (2010) A coarse-to-fine taxonomy of constellations for fast multi-class object detection. In: European Conference on Computer Vision (ECCV)

  25. Freedman DJ, Assad JA (2006) Experience-dependent representation of visual categories in parietal cortex. Nature 443:85–88

  26. Frintrop S, Rome E, Christensen H (2010) Computational visual attention systems and their cognitive foundations: a survey. ACM Trans Appl Percept (TAP) 7(1):1–46

  27. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4:1–58

  28. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  29. Gordon I, Lowe DG (2006) What and where: 3D object recognition with accurate pose. In: Ponce J, Hebert M, Schmid C, Zisserman A (eds) Toward category-level object recognition, chapter what and w. Springer, pp 67–82

  30. Hager GD, Wegbreit B (2011) Scene parsing using a prior world model. Int J Robot Res

  31. Hartley RI, Zisserman A (2000) Multiple view geometry in computer vision. Cambridge University Press, Cambridge

  32. Herbst E, Ren X, Fox D (2011) RGB-D object discovery via multi-scene analysis. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  33. Hinterstoisser S, Cagniart C, Ilic S, Sturm P, Navab N, Fua P, Lepetit V (2012) Gradient response maps for real-time detection of textureless objects. Pattern Anal Mach Intell IEEE Trans 34(5):876–888

  34. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

  35. Hochberg LR, Bacher D, Jarosiewicz B, Masse NY, Simeral JD, Vogel J, Haddadin JS, Liu J, Cash SS, van der Smagt P, Donoghue JP (2012) Reach and grasp by people with tetraplegia using a neurally controlled robotic arm. Nature 485:372–375

  36. Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160:106–154

  37. Hubel DH, Wiesel TN (1969) Anatomical demonstration of columns in the monkey striate cortex. Nature 221:747–750

  38. Hummel J, Biederman I (1992) Dynamic binding in a neural network for shape recognition. Psychol Rev 99:480–517

  39. Johnson AE, Hebert M (1999) Using spin images for efficient object recognition in cluttered 3D scenes. Pattern Anal Mach Intell IEEE Trans 21(5):433–449

  40. Kayser C, Körding KP, König P (2004) Processing of complex stimuli and natural scenes in the visual cortex. Curr Opin Neurobiol 14(4):468–473

  41. Kellman PJ, Arterberry ME (1998) The cradle of knowledge. MIT Press, Cambridge

  42. Klein G, Murray D (2007) Parallel tracking and mapping for small AR workspaces. In: Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR). Nara, Japan, pp 225–234

  43. König P, Krüger N (2006) Perspectives: symbols as self-emergent entities in an optimization process of feature extraction and predictions. Biol Cybern 94(4):325–334

  44. Kraft D, Pugeault N, Başeski M, Popović M, Kragic D, Kalkan S, Wörgötter F, Krüger N (2009) Birth of the object: detection of objectness and extraction of object shape through object action complexes. Int J Humanoid Robot 5:247–265

  45. Krainin M, Henry P, Ren X, Fox D (2010) Manipulator and object tracking for in-hand model acquisition. In: IEEE International Conference on Robotics and Automation (ICRA)

  46. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp 1–9

  47. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25 (NIPS 2012). Curran Associates Inc, pp 1097–1105

  48. Krüger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodríguez-Sánchez AJ, Wiskott L (2013) Deep hierarchies in the primate visual cortex: what can we learn for computer vision? Pattern Anal Mach Intell IEEE Trans 35(8):1847–1871

  49. Krüger N, von der Malsburg C (2015) A required paradigm shift in today’s vision research: interview with Prof. Christoph von der Malsburg. Künstliche Intelligenz, special issue on bio-inspired vision systems

  50. Krüger N, Wörgötter F (2004) Statistical and deterministic regularities: utilisation of motion and grouping in biological and artificial visual systems. Adv Imaging Electron Phys 131:82–147

  51. Leung T, Malik J (1998) Contour continuity in region based image segmentation. In: European Conference on Computer Vision (ECCV). pp 544–559

  52. Lowe DG (1987) Three-dimensional object recognition from single two-dimensional images. Artif Intell 31(3):355–395

  53. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91–110

  54. Marr D (1982) Vision: a computational investigation into the human representation and processing of visual information. WH Freeman

  55. Mian AS, Bennamoun M, Owens R (2006) Three-dimensional model-based object recognition and segmentation in cluttered scenes. Pattern Anal Mach Intell IEEE Trans 28(10):1584–1601

  56. Navalpakkam V, Itti L (2006) An integrated model of top-down and bottom-up attention for optimal object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, pp 2049–2056

  57. Newcombe RA, Davison AJ (2010) Live dense reconstruction with a single moving camera. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1498–1505

  58. Newcombe RA, Lovegrove SJ, Davison AJ (2011) DTAM: dense tracking and mapping in real-time. In: IEEE International Conference on Computer Vision (ICCV)

  59. Olshausen BA, Field D (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609

  60. Osada R, Funkhouser T, Chazelle B, Dobkin D (2002) Matching 3D models with shape distributions. In: International Conference on Shape Modeling and Applications (SMI)

  61. Pizzoli M, Forster C, Scaramuzza D (2014) REMODE: probabilistic, monocular dense reconstruction in real time. In: IEEE International Conference on Robotics and Automation (ICRA)

  62. Pugeault N, Wörgötter F, Krüger N (2010) Visual primitives: local, condensed, and semantically rich visual descriptors and their applications in robotics. Int J Humanoid Robot 7(3):379–405

  63. Richtsfeld A, Mörwald T, Prankl J, Zillich M, Vincze M (2014) Learning of perceptual grouping for object segmentation on RGB-D data. J Vis Commun Image Represent 25(1):64–73

  64. Rosten E, Porter R, Drummond T (2010) Faster and better: a machine learning approach to corner detection. Pattern Anal Mach Intell IEEE Trans 32(1):105–119

  65. Rumelhart D, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536

  66. Russakovsky O, Deng J, Huang Z, Berg AC, Fei-Fei L (2013) Detecting avocados to zucchinis: what have we done, and where are we going? In: IEEE International Conference on Computer Vision (ICCV). pp 2064–2071

  67. Rusu RB, Bradski G, Thibaux R, Hsu J (2010) Fast 3D recognition and pose using the Viewpoint Feature Histogram. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  68. Sala P, Dickinson S (2010) Contour grouping and abstraction using simple part models. In: European Conference on Computer Vision (ECCV). pp 603–616

  69. Salti S, Tombari F, Di Stefano L (2014) SHOT: unique signatures of histograms for surface and texture description. Comput Vision Image Underst 125:251–264

  70. Sarkar S, Boyer KL (1993) Perceptual organization in computer vision: a review and a proposal for a classificatory structure. IEEE Trans Syst Man Cybern 23(2):382–399

  71. Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from RGBD images. In: European Conference on Computer Vision (ECCV). pp 746–760

  72. Sinha SN, Scharstein D, Szeliski R (2014) Efficient high-resolution stereo matching using local plane sweeps. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1582–1589

  73. Stuehmer J, Gumhold S, Cremers D (2010) Real-time dense geometry from a handheld camera. In: Proceedings of the DAGM Symposium on Pattern Recognition

  74. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In: International Conference on Learning Representations (ICLR)

  75. Tanaka K (1993) Neuronal mechanisms of object recognition. Science 262:685–688

  76. Tola E, Lepetit V, Fua P (2010) DAISY: an efficient dense descriptor applied to wide-baseline stereo. Pattern Anal Mach Intell IEEE Trans 32(5):815–830

  77. Tombari F, Salti S, Di Stefano L (2010) Unique signatures of histograms for local surface description. In: European Conference on Computer Vision (ECCV). Springer, pp 356–369

  78. Tuytelaars T, Mikolajczyk K (2008) Local invariant feature detectors: a survey. Found Trends Comput Graph Vision 3(3):1–104

  79. Ückermann A, Elbrechter C, Haschke R, Ritter H (2014) Real-time hierarchical scene segmentation and classification. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids)

  80. Vapnik VN (1998) Statistical learning theory. Adaptive and learning systems for signal processing. Wiley, New York

  81. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vision 57(2):137–154

  82. von der Heydt R, Peterhans E, Baumgartner G (1984) Illusory contours and cortical neuron responses. Science 224:1260–1262

  83. Wagemans J, Elder JH, Kubovy M, Palmer SE, Peterson MA, Singh M, von der Heydt R (2012) A century of Gestalt psychology in visual perception: I. perceptual grouping and figure-ground organization. Psychol Bull 138(6)

  84. Wendel A, Maurer M, Graber G, Pock T, Bischof H (2012) Dense reconstruction on-the-fly. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1450–1457

Acknowledgments

Norbert Krüger was supported by the EU Cognitive Systems project XPERIENCE (FP7-ICT-270273) and the DSF project patient@home. Michael Zillich was supported by the EU projects SQUIRREL (FP7-ICT-610532) and STRANDS (FP7-ICT-600623) and by Austrian Science Fund (FWF) grant No. TRP 139-N23 InSitu. Many thanks to Antonio Rodriguez Sanchez for his work on Fig. 1 and to Laurenz Wiskott for his work on Fig. 2. Many thanks also to IEEE for allowing us to reuse these figures from [48].

Corresponding author

Correspondence to Michael Zillich.

Cite this article

Krüger, N., Zillich, M., Janssen, P. et al. What We Can Learn From the Primate’s Visual System. Künstl Intell 29, 9–18 (2015). https://doi.org/10.1007/s13218-014-0345-9
