Abstract
Recognizing Multi-word Expressions (MWEs) and determining their relative compositionality are crucial to Natural Language Processing. Various statistical techniques have been proposed to recognize MWEs. In this paper, we integrate all the existing statistical features and investigate how well a range of classifiers recognizes non-compositional Verb-Noun (V-N) collocations. In the task of ranking the V-N collocations by their relative compositionality, we show that the ranks computed by the classifier correlate significantly better with human ranking than the ranks produced by any individual feature. We also show that the properties ‘Distributed frequency of object’ (as defined in [27]) and ‘Nearest Mutual Information’ (as adapted from [18]) contribute greatly to the recognition of non-compositional MWEs of the V-N type and to the ranking of the V-N collocations by their relative compositionality.
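The evaluation described above compares a system ranking against a human ranking. The sketch below illustrates one common way to make such a comparison, Spearman's rank correlation; the abstract does not say which correlation measure the paper uses, so this choice is an assumption. All scores are invented for illustration, and the names `ranks`, `pearson`, `spearman`, `human`, `classifier_scores`, and `single_feature_scores` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks (highest value gets rank 1), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data for five V-N collocations (not taken from the paper).
human = [6, 5, 4, 2, 1]                            # invented human compositionality ratings
classifier_scores = [0.9, 0.7, 0.8, 0.3, 0.1]      # invented combined-classifier scores
single_feature_scores = [0.2, 0.9, 0.1, 0.8, 0.5]  # invented scores from one feature

print(round(spearman(human, classifier_scores), 3))
print(round(spearman(human, single_feature_scores), 3))
```

On this toy data the combined score tracks the human ranking more closely than the single feature does, which is the kind of gap the abstract reports; the numbers themselves carry no evidential weight.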
References
Abeille, A.: Light verb constructions and extraction out of NP in a tree adjoining grammar. In: Papers of the 24th Regional Meeting of the Chicago Linguistics Society (1988)
Akimoto, M.: Papers of the 24th Regional Meeting of the Chicago Linguistics Society. Shinozaki Shorin (1989)
Baldwin, T., Bannard, C., Tanaka, T., Widdows, D.: An Empirical Model of Multiword Expression. In: Proceedings of the ACL-2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment (2003)
Bannard, C., Baldwin, T., Lascarides, A.: A Statistical Approach to the Semantics of Verb-Particles. In: Proceedings of the ACL-2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment (2003)
Bikel, D.M.: A Distributional Analysis of a Lexicalized Statistical Parsing Model. In: Proceedings of EMNLP (2004)
Becker, J.D.: The Phrasal Lexicon. In: Theoretical Issues of NLP, Workshop in CL, Linguistics, Psychology and AI, Cambridge, MA (1975)
Breidt, E.: Extraction of V-N-Collocations from Text Corpora: A Feasibility Study for German. In: CoRR-1996 (1995)
Church, K., Gale, W., Hanks, P., Hindle, D.: Parsing, word associations and typical predicate-argument relations. In: Current Issues in Parsing Technology. Kluwer Academic, Dordrecht (1991)
Church, K., Hanks, P.: Word Association Norms, Mutual Information, and Lexicography. In: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics 1990 (1989)
Dunning, T.: Accurate Methods for the Statistics of Surprise and Coincidence. In: Computational Linguistics - 1993 (1993)
Evert, S., Krenn, B.: Methods for the Qualitative Evaluation of Lexical Association Measures. In: Proceedings of the ACL - 2001 (2001)
Fillmore, C.: An extremist approach to multi-word expressions. A talk given at IRCS, University of Pennsylvania, 2003 (2003)
Fontenelle, Bruls, T.W., Thomas, L., Vanallemeersch, T., Jansen, J.: Survey of collocation extraction tools. Deliverable D-1a, MLAP-Project 93-19 DECIDE, University of Liege, Belgium (1994)
Diaz-Galiano, M.C., Martin-Valdivia, M.T., Martinez-Santiago, F., Urena-Lopez, L.A.: Multi-word Expressions Recognition with the LVQ Algorithm. In: Proceedings of Methodologies and Evaluation of Multiword Unit in Real-world Applications, LREC 2004 (2004)
Joachims, T.: Making Large-Scale SVM Learning Practical. In: Advances in Kernel Methods - Support Vector Learning (1999)
Joachims, T.: Optimizing Search Engines Using Clickthrough Data. In: Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York (2002)
Kilgarriff, A., Rosenzweig, J.: Framework and Results for English Senseval. Computers and the Humanities 2000 (2000)
Lin, D.: Automatic Identification of Non-compositional Phrases. In: Proceedings of ACL-1999, College Park, USA (1999)
McCarthy, D., Keller, B., Carroll, J.: Detecting a Continuum of Compositionality in Phrasal Verbs. In: Proceedings of the ACL-2003 Workshop on Multi-word Expressions: Analysis, Acquisition and Treatment 2003 (2003)
Mitchell, T.: Instance-Based Learning. In: Machine Learning. McGraw-Hill Series in Computer Science, New York (1997)
Moore, A.W., Lee, M.S.: Proceedings of the 11th International Conference on Machine Learning (1994)
Nunberg, G., Sag, I.A., Wasow, T.: Idioms. Language 1994 (1994)
Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D.: Multi-word Expressions: A Pain in the Neck for NLP. In: Proceedings of CICLing 2002 (2002)
Schone, P., Jurafsky, D.: Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem? In: Proceedings of EMNLP 2001 (2001)
Schuler, W., Joshi, A.K.: Relevance of tree rewriting systems for multi-word expressions (2005) (to be published)
Smadja, F.: Retrieving Collocations from Text: Xtract. In: Computational Linguistics - 1993 (1993)
Tapanainen, P., Piitulainen, J., Jarvinen, T.: Idiomatic object usage and support verbs. In: 36th Annual Meeting of the Association for Computational Linguistics (1998)
Venkatapathy, S., Joshi, A.K.: Recognition of Multi-word Expressions: A Study of Verb-Noun (V-N) Collocations. In: Proceedings of the International Conference on Natural Language Processing 2004 (2004)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Venkatapathy, S., Joshi, A.K. (2005). Relative Compositionality of Multi-word Expressions: A Study of Verb-Noun (V-N) Collocations. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_49
DOI: https://doi.org/10.1007/11562214_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science (R0)