A Case-Based Reasoning Approach for Speech Corpus Generation

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

Corpus-based stochastic language models have achieved significant success in speech recognition, but constructing a corpus for a specific application is difficult. This paper introduces a Case-Based Reasoning system that generates natural language corpora. Compared with traditional natural language generation approaches, the system overcomes the inflexibility of template-based methods while avoiding the linguistic sophistication required by rule-based packages. The evaluation indicates that the approach is effective in generating users’ specifications or queries: 98% of the generated sentences are grammatically correct. The results also show that a language model derived from the generated corpus can significantly outperform a general language model or a dictation grammar.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, Y., Kendall, E. (2005). A Case-Based Reasoning Approach for Speech Corpus Generation. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_86

  • DOI: https://doi.org/10.1007/11562214_86

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
