Abstract
This paper describes a topic segmentation and indexation system for TV broadcast news programs spoken in European Portuguese. The system is integrated in an alert system for selective dissemination of multimedia information, developed in the scope of a European project. The goal of this work is to enhance the retrieval of specific spoken documents that have been automatically transcribed using speech recognition. Our segmentation algorithm is based on simple heuristics related to anchor detection. The indexation is based on hierarchical concept trees (a thesaurus) containing 22 main thematic domains, for which hidden Markov models and topic language models were created. Ongoing experiments on multiple topic indexing are also described, in which a confidence measure based on the likelihood ratio test is used as the hypothesis test.
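As an illustration of the multiple topic indexing idea mentioned above, the following is a minimal sketch, not the system described in the paper. It assumes smoothed unigram topic language models in place of the hidden Markov models and topic language models of the actual system, and it assumes that the alternative hypothesis in the likelihood ratio is a background model trained on all stories. The function names, the toy data, and the threshold values are hypothetical.

```python
import math
from collections import Counter


def train_unigram(docs, vocab, alpha=1.0):
    """Return add-alpha smoothed unigram log-probabilities over vocab."""
    counts = Counter(w for doc in docs for w in doc if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def log_likelihood(model, words):
    """Sum the log-probabilities of in-vocabulary words."""
    return sum(model[w] for w in words if w in model)


def index_story(words, topic_models, background, threshold=0.0):
    """Return every topic whose per-word log-likelihood ratio against
    the background model exceeds the threshold, best topic first.

    Assumption: all models share the same vocabulary."""
    n = sum(1 for w in words if w in background)
    if n == 0:
        return []
    bg = log_likelihood(background, words)
    scores = {
        topic: (log_likelihood(model, words) - bg) / n
        for topic, model in topic_models.items()
    }
    accepted = [t for t, s in scores.items() if s > threshold]
    return sorted(accepted, key=lambda t: -scores[t])


# Toy training data (hypothetical): two thematic domains.
training = {
    "sports": [["goal", "match", "team"], ["team", "match", "win"]],
    "economy": [["bank", "market", "tax"], ["market", "tax", "budget"]],
}
vocab = {w for docs in training.values() for doc in docs for w in doc}
topic_models = {t: train_unigram(docs, vocab) for t, docs in training.items()}
all_docs = [doc for docs in training.values() for doc in docs]
background = train_unigram(all_docs, vocab)

single = index_story(["team", "goal", "match"], topic_models, background)
mixed = index_story(["team", "match", "tax", "market"], topic_models,
                    background, threshold=-0.5)
```

In this sketch a story can receive several topics, because each topic is tested independently against the background model rather than competing with the other topics. Lowering the threshold trades precision for recall.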
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amaral, R., Trancoso, I. (2003). Topic Indexing of TV Broadcast News Programs. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_35
DOI: https://doi.org/10.1007/3-540-45011-4_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40436-1
Online ISBN: 978-3-540-45011-5
eBook Packages: Springer Book Archive