A Multi-stage Chinese Collocation Extraction System

  • Conference paper
Advances in Machine Learning and Cybernetics

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3930)

Abstract

Most existing collocation extraction systems rely on globally significant statistical behaviors and have no mechanism for handling different types of collocations. In this work, collocations are categorized into four types by considering compositionality, substitutability, modifiability and internal associations. Based on an analysis of each type, a multi-stage extraction system is designed that uses different combinations of discriminative features to identify different types of collocations at different stages. Perceptron training is employed to optimize the consolidation of discriminative features from different sources. Experimental results show that the achieved performance is much better than that of most reported work.
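
To make the idea of consolidating discriminative features with perceptron training more concrete, the following is a minimal sketch of the classic perceptron learning rule applied to candidate word pairs. It is not the system described in the paper: the three feature names (association strength, substitutability, modifiability), the toy feature values, the labels, and the function names `train_perceptron` and `is_collocation` are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature order for each candidate word pair (illustrative only):
#   [association strength, substitutability, modifiability], each scaled to [0, 1].
SAMPLES: List[List[float]] = [
    [0.9, 0.1, 0.2],  # assumed collocation
    [0.8, 0.2, 0.1],  # assumed collocation
    [0.7, 0.3, 0.3],  # assumed collocation
    [0.2, 0.8, 0.9],  # assumed free combination
    [0.1, 0.9, 0.7],  # assumed free combination
    [0.3, 0.7, 0.8],  # assumed free combination
]
LABELS: List[int] = [1, 1, 1, -1, -1, -1]  # +1 = collocation, -1 = not


def score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Weighted sum of the feature values plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_perceptron(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 20,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Learn feature weights with the standard perceptron update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            predicted = 1 if score(x, weights, bias) > 0 else -1
            if predicted != y:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias


def is_collocation(features: Sequence[float], weights: Sequence[float], bias: float) -> bool:
    """Classify a candidate by the sign of its consolidated score."""
    return score(features, weights, bias) > 0


weights, bias = train_perceptron(SAMPLES, LABELS)
print("learned weights:", weights, "bias:", bias)
print("new candidate accepted:", is_collocation([0.85, 0.15, 0.2], weights, bias))
```

In this sketch the learned weights express how much each feature contributes to the final decision, which is one simple way that evidence from different sources can be combined into a single score. A multi-stage system could, under the same assumption, train a separate weight vector for each stage over a different feature subset.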

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, R., Lu, Q. (2006). A Multi-stage Chinese Collocation Extraction System. In: Yeung, D.S., Liu, ZQ., Wang, XZ., Yan, H. (eds) Advances in Machine Learning and Cybernetics. Lecture Notes in Computer Science (LNAI), vol 3930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11739685_77

  • DOI: https://doi.org/10.1007/11739685_77

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33584-9

  • Online ISBN: 978-3-540-33585-6

  • eBook Packages: Computer Science, Computer Science (R0)
