Representing Overlaps in Sequence Labeling Tasks with a Novel Tagging Scheme: Bigappy-Unicrossy

  • Conference paper
  • First Online:
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Abstract

Multiword expression (MWE) identification can be handled with a sequence tagging approach that combines stochastic models with variants of the IOB tagging scheme. In this paper, we introduce a new tagging scheme called bigappy-unicrossy to address the challenge of overlapping MWEs. We compare the bigappy-unicrossy tagging scheme with two well-known tagging schemes, IOB2 and gappy 1-level, on the verbal multiword expression (VMWE) identification task, using a bidirectional Long Short-Term Memory model with a Conditional Random Field layer on top (bidirectional LSTM-CRF). Both the bigappy-unicrossy and the gappy 1-level tagging schemes outperform the IOB2 tagging scheme, and bigappy-unicrossy is competitive with gappy 1-level. We expect our tagging scheme to perform better on corpora with a higher frequency of overlapping MWEs.
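The abstract compares tagging schemes without showing what the tags look like. The following is a minimal Python sketch, not the authors' code, that encodes VMWEs given as sets of token indices (a hypothetical input format) with plain IOB2 and with the gappy 1-level scheme as we understand it from Schneider, Danchik, Dyer and Smith (2014): uppercase B/I for top-level MWE tokens, lowercase o for tokens in a gap, and lowercase b/i for a contiguous MWE nested inside a gap. The function names are illustrative only. Both encoders raise ValueError on the overlapping configurations they cannot express.

```python
from typing import List, Set


def iob2_tags(n_tokens: int, mwes: List[Set[int]]) -> List[str]:
    """Plain IOB2: B on an MWE's first token, I on its later tokens, O elsewhere.

    Gap tokens stay O, so a discontinuous MWE yields an I that follows an O,
    and a token shared by two MWEs cannot carry both labels.
    """
    tags = ["O"] * n_tokens
    for mwe in mwes:
        for pos, idx in enumerate(sorted(mwe)):
            if tags[idx] != "O":
                raise ValueError(f"token {idx} is in two MWEs; IOB2 cannot encode this")
            tags[idx] = "B" if pos == 0 else "I"
    return tags


def is_contiguous(mwe: Set[int]) -> bool:
    """True if the MWE's tokens form one unbroken span."""
    return max(mwe) - min(mwe) + 1 == len(mwe)


def inside_gap(inner: Set[int], outer: Set[int]) -> bool:
    """True if all of `inner` lies between two consecutive tokens of `outer`."""
    lo, hi = min(inner), max(inner)
    return min(outer) < lo and hi < max(outer) and not any(lo <= t <= hi for t in outer)


def gappy_1level_tags(n_tokens: int, mwes: List[Set[int]]) -> List[str]:
    """Gappy 1-level (assumed reading): B/I top level, o in gaps, b/i for an MWE in a gap."""
    # Reject configurations that a single level of gaps cannot express.
    for i, a in enumerate(mwes):
        for b in mwes[i + 1:]:
            if a & b:
                raise ValueError("MWEs share a token; gappy 1-level cannot encode this")
            spans_overlap = min(a) <= max(b) and min(b) <= max(a)
            if spans_overlap and not (inside_gap(a, b) or inside_gap(b, a)):
                raise ValueError("MWEs cross each other; gappy 1-level cannot encode this")

    # An MWE is nested if it sits inside the gap of some other MWE.
    nested = [
        any(inside_gap(m, other) for j, other in enumerate(mwes) if j != i)
        for i, m in enumerate(mwes)
    ]

    tags = ["O"] * n_tokens
    # Top-level MWEs: uppercase B/I on their own tokens, lowercase o on their gap tokens.
    for m, is_nested in zip(mwes, nested):
        if is_nested:
            continue
        ordered = sorted(m)
        for pos, idx in enumerate(ordered):
            tags[idx] = "B" if pos == 0 else "I"
        for idx in range(ordered[0] + 1, ordered[-1]):
            if idx not in m:
                tags[idx] = "o"

    # Nested MWEs overwrite the o tags with b/i; they may not contain a gap themselves.
    for m, is_nested in zip(mwes, nested):
        if not is_nested:
            continue
        if not is_contiguous(m):
            raise ValueError("an MWE inside a gap must be contiguous in the 1-level scheme")
        for pos, idx in enumerate(sorted(m)):
            tags[idx] = "b" if pos == 0 else "i"
    return tags


# Hypothetical sentence "He picked the book up" with the discontinuous VMWE picked ... up.
print(iob2_tags(5, [{1, 4}]))          # ['O', 'B', 'O', 'O', 'I']
print(gappy_1level_tags(5, [{1, 4}]))  # ['O', 'B', 'o', 'o', 'I']
```

A shared token, as in a hypothetical "take a walk and a shower" annotated as {0, 2} and {0, 5}, makes both encoders raise ValueError, and the crossing pair {0, 2} and {1, 3} is likewise rejected by gappy_1level_tags. These are the overlap cases the abstract names as the motivation for bigappy-unicrossy; that scheme's tag definitions are not given on this page, so it is not sketched here.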

G. Berk and B. Erden contributed equally to this work.

Notes

  1. http://universaldependencies.org/format.html.

  2. https://github.com/deep-bgt/Deep-BGT.

Acknowledgements

This research was supported by Boğaziçi University Research Fund Grant Number 14420.

Author information

Corresponding author

Correspondence to Berna Erden.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Berk, G., Erden, B., Güngör, T. (2023). Representing Overlaps in Sequence Labeling Tasks with a Novel Tagging Scheme: Bigappy-Unicrossy. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_44

  • DOI: https://doi.org/10.1007/978-3-031-24337-0_44

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24336-3

  • Online ISBN: 978-3-031-24337-0

  • eBook Packages: Computer Science, Computer Science (R0)