It has been suggested that combining content-based indexing with automatically generated temporal metadata might help improve search and browsing of recordings of computer-mediated collaborative activities such as on-line meetings, which are characterised by extensive multimodal communication. This paper presents an analytical evaluation of the effectiveness of these techniques as implemented through automatic speech recognition and temporal mapping. In particular, it assesses the extent to which this strategy can help uncover contextual relationships between audio and text segments in recorded remote meetings. Results show that even simple temporal mapping can effectively support retrieval of recorded audio segments, improve retrieval performance in situations where speech recognition alone would have exhibited prohibitively high word error rates, and provide a basic form of semantic adaptation.
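As a rough illustration of the temporal mapping idea, the sketch below links text segments to the audio segments that overlap them in time. It is only a minimal sketch under stated assumptions, not the method evaluated in the paper: the `Segment` type, the `temporal_map` function, the overlap criterion, and all timestamps are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A timestamped unit of meeting activity (hypothetical type; times in seconds)."""
    label: str
    start: float
    end: float


def overlaps(a: Segment, b: Segment) -> bool:
    """Return True when the two intervals share a stretch of time of non-zero length."""
    return a.start < b.end and b.start < a.end


def temporal_map(audio, text):
    """Map each text segment label to the labels of the audio segments overlapping it."""
    return {
        t.label: [a.label for a in audio if overlaps(a, t)]
        for t in text
    }


# Illustrative data only: made-up speech turns and text-editing intervals.
audio_segments = [
    Segment("speech-1", 0.0, 12.5),
    Segment("speech-2", 12.5, 30.0),
    Segment("speech-3", 41.0, 55.0),
]
text_segments = [
    Segment("paragraph-A", 10.0, 20.0),
    Segment("paragraph-B", 35.0, 40.0),
]

links = temporal_map(audio_segments, text_segments)
print(links)
```

Under this assumption, a text segment that matches a query could also surface the speech recorded while it was being edited, even where the transcript of that speech is unreliable.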
Additional information
The authors are listed in alphabetical order.
Cite this article
Bouamrane, MM., Luz, S. An analytical evaluation of search by content and interaction patterns on multimodal meeting records. Multimedia Systems 13, 89–102 (2007). https://doi.org/10.1007/s00530-007-0087-8
DOI: https://doi.org/10.1007/s00530-007-0087-8