Abstract
This paper describes the part of the text preprocessing module in our text-to-speech synthesis system that converts numerals written as figures into a readable full-length form that can be processed by a phonetic transcription module. Numeral conversion is a significant issue in inflectional languages such as Czech, Russian, or Slovak, because morphological and semantic information is needed to make the conversion unambiguous. Three part-of-speech tagging methods are compared. Furthermore, a method that reduces the tagset to increase the accuracy of numeral conversion is presented.
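To illustrate why morphological information is needed, the following minimal Python sketch expands the figure "2" into a Czech word form chosen from the tag of the counted noun. It is an illustration only, not the method used in the paper. It assumes a positional tag whose third character encodes gender and whose fifth character encodes case (in the style of Prague Dependency Treebank tags); the names FORMS_OF_TWO, reduce_tag, and expand_two are hypothetical. Projecting the full tag onto just gender and case is one plausible reading of a reduced tagset, not necessarily the reduction the authors propose.

```python
# Hypothetical table: forms of the Czech cardinal "2" keyed by (gender, case).
# Gender codes (assumed): M = masculine animate, I = masculine inanimate,
# F = feminine, N = neuter. Case codes: 1 nominative, 2 genitive, 3 dative,
# 4 accusative, 6 locative, 7 instrumental.
FORMS_OF_TWO = {}
for gender in "MIFN":
    direct = "dva" if gender in "MI" else "dvě"
    FORMS_OF_TWO[(gender, 1)] = direct   # nominative depends on gender
    FORMS_OF_TWO[(gender, 4)] = direct   # accusative matches nominative
    FORMS_OF_TWO[(gender, 2)] = "dvou"   # genitive
    FORMS_OF_TWO[(gender, 6)] = "dvou"   # locative
    FORMS_OF_TWO[(gender, 3)] = "dvěma"  # dative
    FORMS_OF_TWO[(gender, 7)] = "dvěma"  # instrumental


def reduce_tag(tag):
    """Keep only gender (3rd character) and case (5th character) of a tag."""
    return tag[2], int(tag[4])


def expand_two(noun_tag):
    """Return the word form of "2" that agrees with the counted noun's tag."""
    return FORMS_OF_TWO[reduce_tag(noun_tag)]


# Example tags are assumed, hand-written inputs, not tagger output.
print(expand_two("NNMP1-----A----"))  # masculine animate, nominative
print(expand_two("NNFP2-----A----"))  # feminine, genitive
```

The same figure therefore yields different readings depending on the tag assigned to its context, which is why tagging accuracy directly affects conversion accuracy.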
Support for this work was provided by the Ministry of Education of the Czech Republic (MŠMT LC536).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zelinka, J., Kanis, J., Müller, L. (2005). Automatic Transcription of Numerals in Inflectional Languages. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_42
DOI: https://doi.org/10.1007/11551874_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)