Abstract
Human visual attention can be modulated not only by visual stimuli but also by stimuli from other modalities, such as audition. Incorporating auditory information into a model of human visual attention is therefore a key step toward building more sophisticated models. However, how to integrate information arising from the auditory and visual domains remains a challenging problem. This paper proposes a novel computational model of human visual attention driven by auditory cues. Building on the Bayesian surprise model, which is considered promising in the literature, our model uses surprising auditory events as a cue for selecting synchronized visual features, and then emphasizes the selected features to form the final surprise map. Our approach to audio-visual integration uses only the visual features that are effective for simulating visual attention, rather than all available features, guided by the auditory information. Experiments on several video clips show that the proposed model simulates the eye movements of human subjects better than existing models, even though it uses fewer visual features.
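The pipeline described in the abstract can be summarized in two steps: (1) compute Bayesian surprise, i.e. the KL divergence between posterior and prior beliefs, independently for the audio track and for each visual feature channel; (2) when a surprising auditory event occurs, boost the weight of the visual feature whose surprise is synchronized with it before combining the per-feature maps. The sketch below illustrates this flow. It is a minimal illustration under stated assumptions, not the paper's specification: the Gaussian surprise model, the correlation-based synchrony test, and all names and parameters (`SurpriseDetector`, `fuse_maps`, `boost`, `thresh`) are ours (Itti and Baldi's original surprise model, for instance, uses Poisson data with Gamma priors).

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two Gaussians; the 'surprise' of a belief update."""
    return 0.5 * (np.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

class SurpriseDetector:
    """Per-unit Bayesian surprise under a Gaussian observation model with
    known variance and a conjugate Gaussian prior on the mean (an assumed
    simplification of the Poisson/Gamma units in Itti & Baldi, 2009)."""

    def __init__(self, shape=(), obs_var=1.0, prior_var=1.0):
        self.mu = np.zeros(shape)              # prior mean
        self.var = np.full(shape, prior_var)   # prior variance of the mean
        self.obs_var = obs_var

    def update(self, x):
        # Conjugate update: combine prior and observation by precision.
        post_var = 1.0 / (1.0 / self.var + 1.0 / self.obs_var)
        post_mu = post_var * (self.mu / self.var + x / self.obs_var)
        surprise = gaussian_kl(post_mu, post_var, self.mu, self.var)
        self.mu, self.var = post_mu, post_var  # posterior becomes next prior
        return surprise

def fuse_maps(visual_maps, audio_surprise, audio_hist, visual_hists,
              thresh=1.0, boost=2.0):
    """Average per-feature visual surprise maps; when the current auditory
    surprise is high, up-weight the feature whose recent surprise time
    course correlates best with the audio (an assumed synchrony criterion)."""
    weights = np.ones(len(visual_maps))
    if audio_surprise > thresh and len(audio_hist) > 1:
        corrs = [np.nan_to_num(np.corrcoef(h, audio_hist)[0, 1])
                 for h in visual_hists]
        weights[int(np.argmax(corrs))] = boost
    maps = np.stack(visual_maps)               # (n_features, H, W)
    return np.tensordot(weights / weights.sum(), maps, axes=1)
```

In a per-frame loop one would call `update` on the audio detector and on one detector per visual feature map, append the (spatially pooled) surprise values to the running histories, and call `fuse_maps` to obtain that frame's attention map. The selection rule here merely stands in for whatever synchrony measure the paper actually employs.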
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nakajima, J., Kimura, A., Sugimoto, A., Kashino, K. (2015). Visual Attention Driven by Auditory Cues. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8936. Springer, Cham. https://doi.org/10.1007/978-3-319-14442-9_7
DOI: https://doi.org/10.1007/978-3-319-14442-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14441-2
Online ISBN: 978-3-319-14442-9