Abstract
Multiword expression (MWE) identification can be handled with a sequence tagging approach that combines stochastic models with variants of the IOB tagging scheme. In this paper, we introduce a new tagging scheme, called bigappy-unicrossy, to address the challenge of overlapping MWEs. We compare the bigappy-unicrossy tagging scheme with two well-known tagging schemes, IOB2 and gappy 1-level, on the verbal multiword expression (VMWE) identification task, using a bidirectional Long Short-Term Memory model with a Conditional Random Field layer on top (bidirectional LSTM-CRF). Both the bigappy-unicrossy and the gappy 1-level tagging schemes outperform the IOB2 tagging scheme, and bigappy-unicrossy performs competitively with gappy 1-level. We expect our tagging scheme to perform better on corpora with a higher frequency of overlapping MWEs.
G. Berk and B. Erden contributed equally to this work.
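To make the contrast between the tagging schemes concrete, the following is a minimal sketch of how IOB2 and gappy 1-level tags could be derived from MWE annotations, and of why neither can encode overlapping MWEs. It assumes each MWE is given as a list of 0-based token positions; the helper names (`iob2_tags`, `gappy_1level_tags`, `find_overlaps`) are hypothetical. The gappy 1-level rules follow our reading of Schneider et al. (2014): uppercase O/B/I outside gaps, lowercase o/b/i inside a gap, and only contiguous MWEs allowed inside a gap. The bigappy-unicrossy tag set itself is not reproduced here.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical helpers (illustrative only): an MWE is a list of 0-based token positions.


def find_overlaps(mwes: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Index pairs of MWEs that share at least one token (overlapping MWEs)."""
    return [(a, b) for a, b in combinations(range(len(mwes)), 2)
            if set(mwes[a]) & set(mwes[b])]


def _is_contiguous(mwe: Sequence[int]) -> bool:
    toks = sorted(mwe)
    return toks == list(range(toks[0], toks[-1] + 1))


def _in_gap(inner: Sequence[int], outer: Sequence[int]) -> bool:
    """True if all of `inner` lies strictly between two consecutive tokens of `outer`."""
    lo, hi = min(inner), max(inner)
    o = sorted(outer)
    return any(a < lo and hi < b for a, b in zip(o, o[1:]))


def iob2_tags(n: int, mwes: Sequence[Sequence[int]]) -> List[str]:
    """IOB2: each MWE must be one contiguous chunk sharing no token with another."""
    tags = ["O"] * n
    for mwe in mwes:
        if not _is_contiguous(mwe):
            raise ValueError(f"IOB2 cannot encode discontinuous MWE {sorted(mwe)}")
        for k, t in enumerate(sorted(mwe)):
            if tags[t] != "O":
                raise ValueError(f"IOB2 cannot encode the overlap at token {t}")
            tags[t] = "B" if k == 0 else "I"
    return tags


def gappy_1level_tags(n: int, mwes: Sequence[Sequence[int]]) -> List[str]:
    """Gappy 1-level (assumed rules): B/I/O outside gaps, o/b/i inside a gap."""
    if find_overlaps(mwes):
        # Shared tokens: the case neither IOB2 nor gappy 1-level can represent.
        raise ValueError("gappy 1-level cannot encode overlapping MWEs")
    nested = [any(_in_gap(m, mwes[j]) for j in range(len(mwes)) if j != i)
              for i, m in enumerate(mwes)]
    for a, b in combinations(range(len(mwes)), 2):
        spans_meet = min(mwes[a]) <= max(mwes[b]) and min(mwes[b]) <= max(mwes[a])
        if spans_meet and not (_in_gap(mwes[a], mwes[b]) or _in_gap(mwes[b], mwes[a])):
            raise ValueError(f"MWEs {a} and {b} cross each other")
    tags = ["O"] * n
    # First pass: top-level MWEs get B/I; non-member tokens in their gaps get "o".
    for i, mwe in enumerate(mwes):
        toks = sorted(mwe)
        if nested[i]:
            if not _is_contiguous(toks):
                raise ValueError("a gap may only contain contiguous MWEs (one level)")
            continue
        tags[toks[0]] = "B"
        for t in toks[1:]:
            tags[t] = "I"
        for t in range(toks[0] + 1, toks[-1]):
            if t not in toks:
                tags[t] = "o"
    # Second pass: MWEs nested inside a gap get lowercase b/i.
    for i, mwe in enumerate(mwes):
        if nested[i]:
            toks = sorted(mwe)
            tags[toks[0]] = "b"
            for t in toks[1:]:
                tags[t] = "i"
    return tags
```

For a sentence such as "He made a quick decision", with the VMWE made ... decision at positions 1 and 4, `gappy_1level_tags(5, [[1, 4]])` returns `['O', 'B', 'o', 'o', 'I']`, whereas `iob2_tags` rejects the discontinuous MWE. For an illustrative coordination such as "made a decision and a promise", where two MWEs share "made" (`[[0, 2], [0, 5]]`), both functions raise `ValueError`; overlaps of this kind are the ones the bigappy-unicrossy scheme is designed to represent.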
Acknowledgements
This research was supported by Boğaziçi University Research Fund Grant Number 14420.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Berk, G., Erden, B., Güngör, T. (2023). Representing Overlaps in Sequence Labeling Tasks with a Novel Tagging Scheme: Bigappy-Unicrossy. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_44
DOI: https://doi.org/10.1007/978-3-031-24337-0_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)