Abstract
The Augmented Multi-party Interaction (AMI) project is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms, and with the component technologies these require, organized into the R&D themes of group dynamics; audio, visual, and multimodal processing; content abstraction; and human-computer interaction. The audio-visual processing workpackage within AMI addresses automatic recognition from the audio, video, and combined audio-video streams recorded during meetings. In this article we describe the progress made in the first two years of the project. We show how the large problem of audio-visual processing in meetings can be split into seven questions, such as "Who is acting during the meeting?". We then present the algorithms and methods that have been developed and evaluated to answer these questions automatically.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Al-Hames, M. et al. (2006). Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_3
DOI: https://doi.org/10.1007/11965152_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)