Abstract
Most existing collocation extraction systems rely on globally significant statistical behavior and have no mechanism for handling different types of collocations. This work categorizes collocations into four types according to their compositionality, substitutability, modifiability, and internal association. Based on an analysis of each type, a multi-stage extraction system is designed that uses a different combination of discriminative features at each stage to identify a different type of collocation. Perceptron training is employed to optimize how discriminative features from different sources are combined. Experimental results show that the system performs much better than most previously reported work.
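As a rough illustration of how perceptron training can learn weights for combining several discriminative features into one decision, a minimal sketch follows. The abstract does not specify the paper's actual features, data, or training setup, so the three feature names (association strength, substitutability, modifiability), the toy values, and the function names are all hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {+1, -1})


def train_perceptron(samples: List[Sample], epochs: int = 20,
                     lr: float = 1.0) -> Tuple[List[float], float]:
    """Classic perceptron rule: update the weights only on misclassified samples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for features, label in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != label:
                weights = [w + lr * label * x for w, x in zip(weights, features)]
                bias += lr * label
                errors += 1
        if errors == 0:  # every sample classified correctly, stop early
            break
    return weights, bias


def is_collocation(features: Sequence[float], weights: Sequence[float],
                   bias: float) -> bool:
    """Combine the feature scores with the learned weights and threshold at zero."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


# Hypothetical toy data: (association strength, substitutability, modifiability).
# Label +1 marks a collocation, -1 a free word combination.
toy_samples: List[Sample] = [
    ((0.9, 0.1, 0.2), 1),
    ((0.8, 0.2, 0.1), 1),
    ((0.2, 0.8, 0.9), -1),
    ((0.3, 0.7, 0.8), -1),
]

weights, bias = train_perceptron(toy_samples)
```

In this toy setting the learned weights end up positive for association strength and negative for substitutability and modifiability, so a candidate with strong internal association and little room for substitution or modification is accepted. A multi-stage system could train one such combiner per stage with a different feature subset, though that design is an assumption here rather than something the abstract details.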
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, R., Lu, Q. (2006). A Multi-stage Chinese Collocation Extraction System. In: Yeung, D.S., Liu, ZQ., Wang, XZ., Yan, H. (eds) Advances in Machine Learning and Cybernetics. Lecture Notes in Computer Science, vol 3930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11739685_77
DOI: https://doi.org/10.1007/11739685_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33584-9
Online ISBN: 978-3-540-33585-6
eBook Packages: Computer Science, Computer Science (R0)