MG-BERT: A Multi-glosses BERT Model for Word Sense Disambiguation

Guo, Ping; Hu, Yue; Li, Yunpeng

doi:10.1007/978-3-030-55393-7_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12275)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Word Sense Disambiguation (WSD) is a core task in NLP and has many potential applications. Traditional supervised methods still face obstacles, such as the variable number of label candidates per word and the lack of annotated corpora. Although attempts have been made to integrate gloss information into models, no existing model has paid attention to the divergences among glosses. In this paper, we propose a Multi-Glosses BERT (MG-BERT) model with two main advantages for the WSD task. Our model jointly encodes the context and the multiple glosses of the target word. We show that our Context with Multi-Glosses mechanism can identify and emphasize the divergences among glosses and generate nearly orthogonal gloss embeddings, which makes it more accurate to match the context with the correct gloss. We design three classification algorithms, Gloss Matrix Classifier (GMC), General Gloss Matrix Classifier (GGMC) and Space Transforming Classifier (STC), all of which can disambiguate words with full coverage of WordNet. In GMC and GGMC, we use gloss embeddings as the weight matrix. For STC, we transform different label spaces into the same label space. Experiments show that our MG-BERT model achieves new state-of-the-art performance on all WSD benchmarks.
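
The idea of using gloss embeddings as a weight matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the dot-product scoring, the softmax, the function name gloss_matrix_classify, and the toy vectors are all assumptions made for illustration. The sketch shows how one scoring rule handles a variable number of candidate senses, because the weight matrix has one row per gloss of the target word.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gloss_matrix_classify(context_vec, gloss_vecs):
    """Hypothetical helper: treat the gloss embeddings as the rows of a
    weight matrix, score each gloss against the context embedding, and
    return the index of the best gloss plus the sense probabilities."""
    logits = [dot(g, context_vec) for g in gloss_vecs]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Toy, hand-made embeddings (assumed, not produced by BERT).
# The gloss vectors are orthogonal, mimicking the "nearly orthogonal
# gloss embeddings" described in the abstract.
context = [0.9, 0.1, 0.0]
two_sense_glosses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
three_sense_glosses = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

best_two, probs_two = gloss_matrix_classify(context, two_sense_glosses)
best_three, probs_three = gloss_matrix_classify(context, three_sense_glosses)
```

Because the number of rows follows the number of glosses, the same function serves a word with two senses and a word with three senses without a fixed-size output layer.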

Acknowledgements

We thank the reviewers for their insightful comments. We also thank Effyic Intelligent Technology (Beijing) for their computing resource support. This work was supported in part by the National Key Research and Development Program of China under Grant No. 2016YFB0801003.

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Ping Guo, Yue Hu & Yunpeng Li
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Ping Guo, Yue Hu & Yunpeng Li

Corresponding author

Correspondence to Yue Hu.

Editor information

Editors and Affiliations

Deakin University, Geelong, VIC, Australia
Gang Li
University of Electronic Science and Technology of China, Chengdu, China
Heng Tao Shen
Beijing Institute of Technology, Beijing, China
Ye Yuan
Zhejiang Gongshang University, Hangzhou, China
Xiaoyang Wang
Zhejiang Normal University, Jinhua, China
Huawen Liu
National University of Defense Technology, Changsha, China
Xiang Zhao

Cite this paper

Guo, P., Hu, Y., Li, Y. (2020). MG-BERT: A Multi-glosses BERT Model for Word Sense Disambiguation. In: Li, G., Shen, H., Yuan, Y., Wang, X., Liu, H., Zhao, X. (eds) Knowledge Science, Engineering and Management. KSEM 2020. Lecture Notes in Computer Science, vol 12275. Springer, Cham. https://doi.org/10.1007/978-3-030-55393-7_24

DOI: https://doi.org/10.1007/978-3-030-55393-7_24
Published: 20 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55392-0
Online ISBN: 978-3-030-55393-7
eBook Packages: Computer Science, Computer Science (R0)
