Abstract
In this work, we propose and compare two approaches to a two-level language model. Both are based on phrase classes, but they differ in how phrases are integrated into the classes. We provide a complete formulation that is consistent with both approaches. The proposed language models were integrated into an Automatic Speech Recognition (ASR) system and evaluated in terms of Word Error Rate. Several series of experiments were carried out on a spontaneous human–machine dialogue corpus in Spanish, in which users asked for information about long-distance trains by telephone. The results show that integrating phrases into classes with the proposed language models improves the performance of the ASR system. Moreover, the results suggest that the history length giving the best performance depends on the features of the model itself: not all models achieve their best results with the same history length.
Notes
Ametzagaiña R&D group, member of the Basque Technologic Network, http://www.ametza.com.
Acknowledgments
We would like to thank the anonymous reviewers for their constructive comments and suggestions. We are also very grateful to Professor J. M. Benedí for his helpful comments in the first stage of this work. Finally, we would like to thank the Ametzagaiña group and, in particular, Josu Landa, for providing us with the linguistic classes and segmentation of the DIHANA corpus. This work has been partially supported by the University of the Basque Country under grant GIU07/57 and by CICYT under grant TIN2005-08660-C04-03.
Cite this article
Justo, R., Torres, M.I. Phrase classes in two-level language models for ASR. Pattern Anal Applic 12, 427–437 (2009). https://doi.org/10.1007/s10044-009-0165-y
DOI: https://doi.org/10.1007/s10044-009-0165-y