A computational framework for attentional object discovery in RGB-D videos

  • Research Report
  • Cognitive Processing

Abstract

We present a computational framework for attention-guided visual scene exploration in sequences of RGB-D data. To this end, we propose a visual object candidate generation method that produces hypotheses about the objects in the scene. An attention system prioritises the processing of visual information by (1) localising candidate objects and (2) integrating an inhibition of return (IOR) mechanism grounded in spatial coordinates. This spatial IOR mechanism naturally copes with camera motion and inhibits objects that have already been the target of attention. Our approach provides object candidates that can be processed by higher cognitive modules such as object recognition. Since objects are basic elements of many higher-level tasks, our architecture can serve as a first layer in any cognitive system that aims to interpret a stream of images. Our evaluation shows that the framework finds most of the objects in challenging real-world scenes.
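
To make the idea of a spatially grounded IOR concrete, the following Python sketch shows one possible reading of it. It is an illustration under stated assumptions, not the implementation described in the article: we assume that each object candidate comes with a saliency score and a 3D centroid in the camera frame, that a camera-to-world pose is available for every frame, and that inhibition is a fixed-radius test around previously attended world positions. The names `SpatialIOR`, `to_world`, and the `radius` parameter are hypothetical.

```python
import math
from dataclasses import dataclass, field


def to_world(pose, p_cam):
    """Apply a 4x4 camera-to-world pose (nested lists, row-major) to a 3D point."""
    x, y, z = p_cam
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )


@dataclass
class SpatialIOR:
    """Inhibition of return stored in world coordinates (illustrative assumption)."""

    radius: float = 0.10  # hypothetical inhibition radius, in metres
    attended: list = field(default_factory=list)  # world positions already attended

    def is_inhibited(self, p_world):
        """True if p_world lies within `radius` of any attended position."""
        return any(math.dist(p_world, q) < self.radius for q in self.attended)

    def select(self, candidates, pose):
        """Pick the most salient candidate that is not inhibited.

        candidates: iterable of (saliency, centroid_in_camera_frame).
        pose: 4x4 camera-to-world transform for the current frame.
        Returns the attended world position, or None if all are inhibited.
        """
        best = None
        for saliency, p_cam in candidates:
            p_world = to_world(pose, p_cam)
            if self.is_inhibited(p_world):
                continue
            if best is None or saliency > best[0]:
                best = (saliency, p_world)
        if best is None:
            return None
        self.attended.append(best[1])  # inhibit this location from now on
        return best[1]
```

Because the attended positions are stored in the world frame, a candidate that was attended in one frame stays inhibited after the camera moves, as long as the pose estimate maps it back to roughly the same world position. In this sketch that property comes entirely from the pose transform; how the article obtains camera poses and represents inhibited regions is not stated in the abstract.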

Author information

Corresponding author

Correspondence to Germán Martín García.

Additional information

Handling Editor: John K. Tsotsos (York University).

Reviewers: Markus Vincze (Vienna University of Technology), Neil Bruce (University of Manitoba).

About this article

Cite this article

Martín García, G., Pavel, M. & Frintrop, S. A computational framework for attentional object discovery in RGB-D videos. Cogn Process 18, 169–182 (2017). https://doi.org/10.1007/s10339-017-0791-z
