Abstract
Corpus-based stochastic language models have achieved significant success in speech recognition, but constructing a corpus for a specific application is difficult. This paper introduces a Case-Based Reasoning system that generates natural language corpora. Compared with traditional natural language generation approaches, the system overcomes the inflexibility of template-based methods while avoiding the linguistic sophistication required by rule-based packages. The evaluation indicates that the approach is effective in generating users’ specifications or queries: 98% of the generated sentences are grammatically correct. The results also show that the language model derived from the generated corpus can significantly outperform a general language model or a dictation grammar.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, Y., Kendall, E. (2005). A Case-Based Reasoning Approach for Speech Corpus Generation. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_86
DOI: https://doi.org/10.1007/11562214_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science (R0)