Abstract
Many term extraction methods have been evaluated in terms of their accuracy on huge corpora. However, when frequency-based methods are applied to a small corpus, they may not achieve sufficient accuracy because too little statistical information on frequency is available. This paper reports a new term extraction method that is tuned for a very small corpus. It focuses on the structure of compound terms and calculates perplexity on the left and right sides of each term unit. Our experiments showed that the proposed method on its own offered little advantage in accuracy. However, a method combining perplexity with frequency information achieved the highest average precision among the methods compared.
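As a rough illustration of the idea of perplexity on the left and right sides of a term unit, the sketch below counts which units appear immediately before and after each unit inside a set of compound terms, then converts each neighbour distribution into a perplexity. It is only a minimal sketch under stated assumptions: perplexity is taken as 2 raised to the Shannon entropy of the neighbour distribution, and the function name `side_perplexities` and the toy compounds are hypothetical. The abstract does not say how these values are turned into a term score or combined with frequency, so that step is left out.

```python
import math
from collections import Counter, defaultdict


def side_perplexities(compounds):
    """Return {unit: (left_perplexity, right_perplexity)}.

    Assumption (not taken from the paper): perplexity is 2**H, where H is
    the Shannon entropy in bits of the units seen immediately to the left
    (or right) of `unit` inside the given compounds.
    """
    left = defaultdict(Counter)   # unit -> counts of left neighbours
    right = defaultdict(Counter)  # unit -> counts of right neighbours
    for units in compounds:
        for i, unit in enumerate(units):
            if i > 0:
                left[unit][units[i - 1]] += 1
            if i + 1 < len(units):
                right[unit][units[i + 1]] += 1

    def perplexity(counts):
        total = sum(counts.values())
        if total == 0:
            return 1.0  # no neighbours observed: treat as a single outcome
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        return 2.0 ** entropy

    all_units = set(left) | set(right)
    return {u: (perplexity(left[u]), perplexity(right[u])) for u in all_units}


# Hypothetical toy corpus: each compound is a list of term units.
toy_compounds = [
    ["term", "extraction"],
    ["information", "extraction"],
    ["term", "recognition"],
    ["term", "extraction", "method"],
]
toy_result = side_perplexities(toy_compounds)
# "term" never has a left neighbour here, so its left perplexity is 1.0;
# it is followed by two different units, so its right perplexity exceeds 1.
print(toy_result["term"])
```

A unit that combines with many different neighbours gets a high perplexity on that side, and one that always appears next to the same unit gets a perplexity of 1. Because only distinct neighbour patterns are needed, this kind of measure can still be computed when raw frequencies are sparse.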
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yoshida, M., Nakagawa, H. (2005). Automatic Term Extraction Based on Perplexity of Compound Words. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_24
DOI: https://doi.org/10.1007/11562214_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)