Abstract
Multiword expressions (MWEs) have long been a bottleneck in NLP. In particular, a resource of fixed MWEs can improve the performance of NLP tasks and applications. Because MWEs have complex characteristics, it is hard to distinguish fixed MWEs from non-fixed ones. This paper puts forward an approach for extracting fixed MWEs rapidly. First, a definition of fixed MWEs is given. The features that help determine whether an expression is a fixed MWE are drawn from both statistical measures and linguistic information. We extract fixed MWEs within this multi-feature framework and evaluate the results manually. Experiments show that the approach is effective. Our work can provide a list of fixed MWEs for use in NLP applications.
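As a rough illustration of combining a statistical measure with a linguistic filter, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI) and discards pairs that contain a stop word. The abstract does not say which measures or rules the paper uses, so PMI, the stop-word rule, the function name `extract_candidates`, and all thresholds are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def extract_candidates(tokens, min_count=2, min_pmi=1.0, stop_words=frozenset()):
    """Rank adjacent word pairs by PMI after a simple stop-word filter.

    Hypothetical helper: PMI stands in for the paper's (unspecified)
    statistical measures, and the stop-word check stands in for its
    (unspecified) linguistic rules.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        # Statistical filter: ignore pairs seen too rarely to score reliably.
        if count < min_count:
            continue
        # Linguistic filter (assumed): drop pairs containing a stop word.
        if w1 in stop_words or w2 in stop_words:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            scored.append(((w1, w2), pmi))

    # Highest-scoring candidates first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


tokens = "new york is big and new york is old and the city is big".split()
result = extract_candidates(tokens, stop_words=frozenset({"is", "and", "the"}))
print(result)
```

On this toy input only the pair ("new", "york") survives both filters. A real system would likely need a larger corpus, part-of-speech patterns rather than a stop-word list, and candidates longer than two words.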
This research has been funded by Taiyuan University of Technology (Item numbers: 900103-03010255 and 900103-03020632).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Liu, R. (2011). A Rapid Method to Extract Multiword Expressions with Statistic Measures and Linguistic Rules. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_30
DOI: https://doi.org/10.1007/978-3-642-23982-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science (R0)