Abstract
This paper proposes a methodology for extracting terminologically relevant collocations for translation purposes. The basic idea is a hybrid method that combines statistical measures with linguistic rules. The extraction system works in three steps: (1) tokenization and POS tagging of the corpus; (2) extraction of multi-word units using a statistical measure; (3) linguistic filtering based on syntactic patterns and a stop-word list. The hybrid method with linguistic filters proved suitable for selecting terminological collocations: it considerably improved the precision of the extraction, which was much higher than that of the purely statistical method. In our test, the hybrid method combining the log-likelihood ratio with linguistic rules performed best. We believe that terminological collocations and phrases extracted in this way can be used effectively either to supplement existing terminological collections or alongside traditional reference works.
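The three steps can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes step (1) has already been done by an external tokenizer and POS tagger, so the input is a list of (word, tag) pairs. It scores adjacent word pairs with Dunning's log-likelihood ratio for step (2), and applies a POS-pattern filter and a stop-word filter for step (3). The function names, the tag labels, the patterns, the stop-word list and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: count of (w1, w2); k12: w1 followed by another word;
    k21: another word followed by w2; k22: all remaining bigrams.
    """
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22      # row totals
    c1, c2 = k11 + k21, k12 + k22      # column totals
    g = 0.0
    for k, r, c in ((k11, r1, c1), (k12, r1, c2), (k21, r2, c1), (k22, r2, c2)):
        if k > 0:                       # a zero cell contributes nothing
            g += k * math.log(k * n / (r * c))
    return 2.0 * g


def extract_collocations(tagged, patterns, stop_words, min_score):
    """Return [((w1, w2), score), ...] sorted by descending score.

    tagged:     list of (word, tag) pairs (output of step 1, assumed given)
    patterns:   set of allowed (tag1, tag2) pairs (hypothetical filter)
    stop_words: set of words that may not appear in a candidate
    min_score:  hypothetical LLR threshold
    """
    bigrams = list(zip(tagged, tagged[1:]))
    n = len(bigrams)
    pair = Counter(bigrams)
    left = Counter(a for a, _ in bigrams)
    right = Counter(b for _, b in bigrams)

    results = []
    for (a, b), k11 in pair.items():
        # Step 2: statistical score of the candidate
        k12 = left[a] - k11
        k21 = right[b] - k11
        k22 = n - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        if score < min_score:
            continue
        # Step 3: linguistic filtering
        if (a[1], b[1]) not in patterns:
            continue
        if a[0] in stop_words or b[0] in stop_words:
            continue
        results.append(((a[0], b[0]), score))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy pre-tagged input; tags "n", "a", "v", "p", "d" are illustrative only.
    corpus = [("machine", "n"), ("translation", "n"), ("of", "p"), ("the", "d"),
              ("text", "n"), ("is", "v"), ("hard", "a"), ("machine", "n"),
              ("translation", "n"), ("helps", "v"), ("the", "d"), ("reader", "n"),
              ("of", "p"), ("the", "d"), ("text", "n")]
    for term, score in extract_collocations(
            corpus, {("n", "n"), ("a", "n")}, {"of", "the", "is"}, 10.0):
        print(term, round(score, 2))
```

Filtering after scoring keeps the statistics computed over the whole corpus, so the linguistic rules only affect which candidates are kept and not how they are scored. Whether the original system ordered the steps exactly this way within its implementation is not stated in the abstract.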
This work has been supported by the National Basic Research Program of China (973 Program, No. 2004CB318102) and the 863 Program (No. 2001AA114210, 2002AA117010).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, BK., Chang, BB., Chen, YR., Yu, SW. (2005). Extracting Terminologically Relevant Collocations in the Translation of Chinese Monograph. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_88
DOI: https://doi.org/10.1007/11562214_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science (R0)