Abstract
A sememe is the minimum unambiguous semantic unit in human language. In a sememe knowledge base, the semantics of each word sense are encoded and expressed as a sememe tree. Sememe knowledge benefits many NLP tasks, but constructing a sememe knowledge base manually is time-consuming. One existing work touches on sememe tree prediction, but it has two limitations: it uses the word rather than the word sense as its unit, and its method handles only words that have dictionary definitions, not all words. In this article, we use English-Chinese bilingual information to help disambiguate word senses. We propose the task of sememe tree prediction for English-Chinese word pairs, which can automatically extend the well-known knowledge base HowNet, and we propose two methods for it. Given a word pair and its categorial sememe, the first method starts from the root node and uses neural networks to generate edges and nodes step by step in depth-first order. The second is a recommendation-based method: given a word pair and its categorial sememe, we find word pairs that share the categorial sememe and are semantically similar to it, and construct a propagation function that transfers the sememe tree information of these word pairs to the word pair to be predicted. Experiments show that our method achieves an F1 score of 84.0%. Further, using the Oxford English-Chinese Bilingual Dictionary as data, we add about 90,000 word pairs to HowNet, expanding it by nearly half.
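As a rough illustration of the second, recommendation-based method, the sketch below scores candidate sememe tree edges by similarity-weighted voting over the most similar word pairs. The abstract does not specify the propagation function, so everything here is an assumption made for illustration: the representation of a sememe tree as a set of (parent, relation, child) triples, the use of cosine similarity, the `top_k` and `threshold` parameters, and the sememe and relation names in the example.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate(query_vec, neighbours, top_k=2, threshold=0.5):
    """Hypothetical propagation function (not the authors' formulation).

    neighbours: list of (vector, triples) for word pairs that already have a
    sememe tree and share the query's categorial sememe; `triples` is a set of
    (parent, relation, child) edges. Returns the edges whose normalised,
    similarity-weighted vote reaches `threshold`.
    """
    ranked = sorted(neighbours, key=lambda n: cosine(query_vec, n[0]), reverse=True)
    scores = defaultdict(float)
    total = 0.0
    for vec, triples in ranked[:top_k]:
        weight = max(cosine(query_vec, vec), 0.0)  # ignore negative similarity
        total += weight
        for triple in triples:
            scores[triple] += weight
    if total == 0.0:
        return set()
    return {t for t, s in scores.items() if s / total >= threshold}


# Illustrative data only: vectors and edges are made up for the example.
neighbours = [
    ([1.0, 0.0], {("human", "agent", "teach")}),
    ([1.0, 1.0], {("human", "agent", "teach"), ("teach", "domain", "education")}),
    ([0.0, 1.0], {("human", "HostOf", "occupation")}),
]
predicted = propagate([1.0, 0.0], neighbours)
print(predicted)
```

In this toy run only the edge shared by both of the two nearest word pairs survives the threshold; a lower threshold would trade precision for recall.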
Notes
- 1. \(P_{s_i}+r_i=[s_1, r_1, \ldots , s_{i-1}, r_{i-1}, s_i, r_i]\).
- 2. \(P_{s_i}+r_i+s_{i+1}=[s_1, r_1, \ldots , s_{i-1}, r_{i-1}, s_i, r_i, s_{i+1}]\).
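The path notation in notes 1 and 2 can be read as plain list concatenation along a root-to-node path. The sketch below walks a small tree in depth-first order and, for each edge, emits \(P_{s_i}+r_i\) and \(P_{s_i}+r_i+s_{i+1}\) as Python lists. It assumes that \(P_{s_i}\) denotes the alternating sememe/relation sequence from the root to \(s_i\); the nested-tuple tree encoding, the function name, and the example sememes and relations are hypothetical, and no neural prediction step is modelled.

```python
def depth_first_steps(node, prefix=()):
    """Yield (P_s + r, P_s + r + s_child) for every edge, in depth-first order.

    node: (sememe, [(relation, child_node), ...]) -- an assumed encoding.
    prefix: the path ending in the relation that leads to `node`
            (empty for the root).
    """
    sememe, children = node
    path = list(prefix) + [sememe]                # P_{s_i}
    for relation, child in children:
        with_relation = path + [relation]         # P_{s_i} + r_i
        with_child = with_relation + [child[0]]   # P_{s_i} + r_i + s_{i+1}
        yield with_relation, with_child
        # Descend into the child before visiting the next sibling.
        yield from depth_first_steps(child, with_relation)


# Illustrative tree only; not taken from HowNet.
tree = ("human", [
    ("HostOf", ("occupation", [])),
    ("agent", ("teach", [("domain", ("education", []))])),
])

steps = list(depth_first_steps(tree))
for with_relation, with_child in steps:
    print(with_relation, "->", with_child)
```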
Acknowledgements
This work is supported by the Key Technology Develop and Research Project (SGTJDK00DWJS1900242) of the State Grid Corporation of China.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, B., Shang, X., Liu, L., Tan, Y., Hou, L., Li, J. (2021). Sememe Tree Prediction for English-Chinese Word Pairs. In: Chen, H., Liu, K., Sun, Y., Wang, S., Hou, L. (eds) Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence. CCKS 2020. Communications in Computer and Information Science, vol 1356. Springer, Singapore. https://doi.org/10.1007/978-981-16-1964-9_2
DOI: https://doi.org/10.1007/978-981-16-1964-9_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1963-2
Online ISBN: 978-981-16-1964-9
eBook Packages: Computer Science, Computer Science (R0)