Extracting Terminologically Relevant Collocations in the Translation of Chinese Monograph

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

This paper proposes a methodology for extracting terminologically relevant collocations for translation purposes. The basic idea is a hybrid method that combines a statistical measure with linguistic rules. The extraction system used in our work operates in three steps: (1) tokenization and POS tagging of the corpus; (2) extraction of multi-word units using a statistical measure; (3) linguistic filtering based on syntactic patterns and a stop-word list. The hybrid method with linguistic filters proved suitable for selecting terminological collocations: its extraction precision was considerably higher than that of the purely statistical method. In our test, the hybrid method combining the log-likelihood ratio with linguistic rules performed best. We believe that terminological collocations and phrases extracted in this way could be used effectively either to supplement existing terminological collections or alongside traditional reference works.
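
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the corpus is already tokenized and POS-tagged (step 1), restricts candidates to adjacent word pairs, and scores them with Dunning's log-likelihood ratio. The function names `llr` and `extract_candidates`, the toy tag set, the POS patterns, the stop-word list, and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def extract_candidates(tagged, patterns, stop_words, threshold):
    """Score adjacent (token, pos) pairs, then apply the linguistic filters.

    tagged     -- list of (token, pos) tuples; step 1 is assumed done
    patterns   -- set of allowed (pos1, pos2) pairs (assumed patterns)
    stop_words -- set of tokens that may not appear in a candidate
    threshold  -- minimum score to keep a candidate
    """
    n = len(tagged) - 1                      # number of adjacent pairs
    bigrams = Counter(zip(tagged, tagged[1:]))
    first = Counter(tagged[:-1])             # counts in first position
    second = Counter(tagged[1:])             # counts in second position

    results = []
    for (a, b), k11 in bigrams.items():
        # Step 2: statistical score from the contingency table.
        k12 = first[a] - k11
        k21 = second[b] - k11
        k22 = n - first[a] - second[b] + k11
        score = llr(k11, k12, k21, k22)
        if score < threshold:
            continue
        # Step 3: syntactic pattern and stop-word filters.
        if (a[1], b[1]) not in patterns:
            continue
        if a[0] in stop_words or b[0] in stop_words:
            continue
        results.append(((a[0], b[0]), score))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy English example with a made-up tag set (n = noun, v = verb, ...).
corpus = [("machine", "n"), ("translation", "n"), ("is", "v"),
          ("hard", "a"), ("and", "c"), ("machine", "n"),
          ("translation", "n"), ("helps", "v")]
candidates = extract_candidates(
    corpus,
    patterns={("n", "n"), ("a", "n")},
    stop_words={"is", "and"},
    threshold=3.84,  # assumed cut-off, not a value from the paper
)
for pair, score in candidates:
    print(pair, round(score, 2))
```

Filtering after scoring keeps the statistical counts independent of the linguistic rules, so the same scores can be reused with different pattern sets or stop-word lists.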

This work has been supported by the National Basic Research Program of China (973 Program, No. 2004CB318102) and the 863 Program (No. 2001AA114210, 2002AA117010).

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kang, BK., Chang, BB., Chen, YR., Yu, SW. (2005). Extracting Terminologically Relevant Collocations in the Translation of Chinese Monograph. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_88

  • DOI: https://doi.org/10.1007/11562214_88

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
