Abstract
This paper presents a framework, called Table of Content-Analytical Index (ToCAI), for describing the content of multimedia material. The idea for this description scheme (DS) comes from the structures used to index technical books: a table of contents, typically placed at the beginning of the book, where topics are organized hierarchically into chapters and sections, and an analytical index, typically placed at the end of the book, where keywords are listed alphabetically. Similarly, the ToCAI description scheme provides a hierarchical description of the time-sequential structure of a multimedia document (the ToC), suitable for browsing, and an "Analytical Index" (AI) of the document's audio-visual key items, suitable for effective retrieval. In addition, two further sub-description schemes are proposed within the general DS: one specifies the program category, and the other describes the remaining metadata associated with the multimedia document. The detailed structure of the DS is presented by means of a UML diagram. Automatic extraction methods for identifying the values of the descriptors that compose the ToCAI are also presented and discussed. Finally, an example browsing application is proposed.
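To make the two-part structure more concrete, the following is a minimal Python sketch of a ToCAI-like container. It is an illustration only: the class names `Segment` and `ToCAI`, their fields, and the helper `outline` are assumptions introduced here, not the descriptors defined in the paper's UML diagram. The sketch shows a hierarchical ToC of time segments for browsing, an index from key items to segments for retrieval, and two plain fields standing in for the program category and metadata sub-description schemes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """One node of the hierarchical ToC: a titled time interval with sub-segments."""
    title: str
    start: float  # seconds from the start of the document (assumed unit)
    end: float
    children: list[Segment] = field(default_factory=list)


@dataclass
class ToCAI:
    """Hypothetical container mirroring the four parts named in the abstract."""
    toc: Segment                                                   # ToC, for browsing
    index: dict[str, list[Segment]] = field(default_factory=dict)  # AI, for retrieval
    category: str = ""                                             # program category
    metadata: dict[str, str] = field(default_factory=dict)         # other metadata

    def add_key_item(self, key: str, segment: Segment) -> None:
        """Register a segment under an audio-visual key item."""
        self.index.setdefault(key, []).append(segment)

    def lookup(self, key: str) -> list[tuple[float, float]]:
        """Return the (start, end) intervals in which a key item occurs."""
        return [(s.start, s.end) for s in self.index.get(key, [])]


def outline(segment: Segment, depth: int = 0) -> list[str]:
    """Flatten the ToC into indented lines, as a browsing view might show it."""
    lines = [f"{'  ' * depth}{segment.title} [{segment.start:g}-{segment.end:g}s]"]
    for child in segment.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Toy document: one scene made of two shots (invented values).
shot1 = Segment("Shot 1", 0.0, 12.5)
shot2 = Segment("Shot 2", 12.5, 30.0)
scene = Segment("Scene 1", 0.0, 30.0, [shot1, shot2])
doc = ToCAI(toc=Segment("Programme", 0.0, 30.0, [scene]), category="news")
doc.add_key_item("anchorperson", shot1)
doc.add_key_item("anchorperson", shot2)

print("\n".join(outline(doc.toc)))
print(doc.lookup("anchorperson"))
```

The index entries point at the same `Segment` objects that the ToC holds, so a retrieval hit can be located directly in the browsing hierarchy. This is a design choice of the sketch, not something the abstract states.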
Cite this article
Adami, N., Bugatti, A., Leonardi, R. et al. The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents. Multimedia Tools and Applications 14, 153–173 (2001). https://doi.org/10.1023/A:1011347200133