Abstract
We present a computational framework for attention-guided visual scene exploration in sequences of RGB-D data. To this end, we propose a method that generates visual object candidates, i.e. hypotheses about the objects in the scene. An attention system prioritises the processing of visual information by (1) localising candidate objects and (2) integrating an inhibition of return (IOR) mechanism grounded in spatial coordinates. This spatial IOR mechanism naturally copes with camera motion and inhibits objects that have already been the target of attention. Our approach provides object candidates that can be processed by higher cognitive modules such as object recognition. Since objects are basic elements of many higher-level tasks, our architecture can serve as a first layer in any cognitive system that aims to interpret a stream of images. Our evaluation shows that the framework finds most of the objects in challenging real-world scenes.
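The idea of an IOR mechanism grounded in spatial coordinates can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SpatialIOR`, the helper `to_world`, the 3x4 camera-to-world pose format, and the 0.10 m inhibition radius are all assumptions made for illustration. The sketch stores attended locations in world coordinates, so a candidate that was already attended stays inhibited even after the camera moves.

```python
import math


def to_world(point_cam, pose):
    """Map a camera-frame 3D point to world coordinates.

    `pose` is a hypothetical 3x4 camera-to-world matrix [R | t], given as rows.
    """
    x, y, z = point_cam
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in pose)


class SpatialIOR:
    """Inhibition of return stored in world coordinates (illustrative only)."""

    def __init__(self, radius=0.10):
        self.radius = radius   # assumed inhibition radius in metres
        self.attended = []     # world positions that were already attended

    def is_inhibited(self, point_world):
        # A point is inhibited if it lies close to any attended position.
        return any(math.dist(point_world, p) < self.radius for p in self.attended)

    def select(self, candidates, pose):
        """Return the most salient non-inhibited candidate, or None.

        `candidates` is a list of (saliency, camera-frame point) pairs.
        The selected candidate is recorded so it is inhibited afterwards.
        """
        best = None
        for saliency, point_cam in candidates:
            point_world = to_world(point_cam, pose)
            if self.is_inhibited(point_world):
                continue
            if best is None or saliency > best[0]:
                best = (saliency, point_world)
        if best is None:
            return None
        self.attended.append(best[1])
        return best
```

Calling `select` repeatedly on the same candidates visits them in order of decreasing saliency and returns `None` once every candidate has been attended. Keeping the inhibition memory in world coordinates is the design choice that makes it independent of the current camera pose.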














Additional information
Handling Editor: John K. Tsotsos (York University).
Reviewers: Markus Vincze (Vienna University of Technology), Neil Bruce (University of Manitoba).
Cite this article
Martín García, G., Pavel, M. & Frintrop, S. A computational framework for attentional object discovery in RGB-D videos. Cogn Process 18, 169–182 (2017). https://doi.org/10.1007/s10339-017-0791-z
DOI: https://doi.org/10.1007/s10339-017-0791-z