Improved Learning of Chinese Word Embeddings with Semantic Knowledge

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL 2015, NLP-NABD 2015)

Abstract

While previous studies show that modeling the minimum meaning-bearing units (characters or morphemes) benefits learning vector representations of words, they ignore the semantic dependencies across these units when deriving word vectors. In this work, we propose to improve the learning of Chinese word embeddings by exploiting semantic knowledge. The basic idea is to take the semantic knowledge about words and their component characters into account when designing composition functions. Experiments show that our approach outperforms two strong baselines on word similarity, word analogy, and document classification tasks.
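To make the idea of a knowledge-aware composition function concrete, here is a minimal sketch in Python. It is not the authors' model: the abstract does not give the formula, so the function name `compose`, the additive form, and the per-character weights `char_weights` are all assumptions made for illustration. The weights stand in for whatever a semantic resource says about how much each character contributes to the meaning of the word.

```python
def compose(word_vec, char_vecs, char_weights):
    """Return word_vec plus a weighted average of the character vectors.

    Hypothetical illustration: char_weights[i] is an assumed score for how
    strongly character i contributes to the word's meaning (for example,
    derived from a thesaurus). It is not taken from the paper.
    """
    total = sum(char_weights)
    if total == 0:
        # No character is judged to contribute: fall back to the word vector.
        return list(word_vec)
    dim = len(word_vec)
    # Weighted average of the character vectors, one dimension at a time.
    char_part = [
        sum(w * vec[i] for w, vec in zip(char_weights, char_vecs)) / total
        for i in range(dim)
    ]
    return [word_vec[i] + char_part[i] for i in range(dim)]


# Toy two-dimensional vectors for a two-character word.
word = [1.0, 0.0]
chars = [[2.0, 0.0], [0.0, 4.0]]

# Equal weights treat both characters as equally meaningful.
uniform = compose(word, chars, [1.0, 1.0])

# A zero weight drops a character judged semantically unrelated to the word.
selective = compose(word, chars, [1.0, 0.0])
```

With equal weights this reduces to adding a plain average of the character vectors; unequal weights are one simple way in which external semantic knowledge could shape the composed representation.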

Notes

  1. Like the vLBL model [14], we use tag-specific weight vectors rather than weight matrices, which makes training significantly faster. This choice is discussed by Mnih and Teh [15].

  2. http://www.csie.ntu.edu.tw/~cjlin/liblinear/

References

  1. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. JMLR 3, 1137–1155 (2003)

  2. Botha, J.A., Blunsom, P.: Compositional morphology for word representations and language modelling. In: Proceedings of ICML (2014)

  3. Chen, X., Liu, Z., Sun, M.: A unified model for word sense representation and disambiguation. In: Proceedings of EMNLP (2014)

  4. Chen, X., Xu, L., Liu, Z., Sun, M., Luan, H.: Joint learning of character and word embeddings. In: Proceedings of IJCAI (2015)

  5. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. JMLR 12, 2493–2537 (2011)

  6. Faruqui, M., Dodge, J., Jauhar, S.K., Dyer, C., Hovy, E., Smith, N.A.: Retrofitting word vectors to semantic lexicons. In: Proceedings of NAACL (2015)

  7. Jin, P., Wu, Y.: SemEval-2012 task 4: evaluating Chinese word similarity. In: Proceedings of SemEval (2012)

  8. Li, J., Sun, M.: Scalable term selection for text categorization. In: Proceedings of EMNLP (2007)

  9. Li, Z.: Parsing the internal structure of words: a new paradigm for Chinese word segmentation. In: Proceedings of ACL (2011)

  10. Luong, T., Socher, R., Manning, C.D.: Better word representations with recursive neural networks for morphology. In: Proceedings of CoNLL (2013)

  11. Mei, J., Zhu, Y., Gao, Y., Yin, H.: TongYiCi CiLin. Shanghai Cishu Publisher, Shanghai (1983)

  12. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR (2013)

  13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS (2013)

  14. Mnih, A., Kavukcuoglu, K.: Learning word embeddings efficiently with noise-contrastive estimation. In: Proceedings of NIPS (2013)

  15. Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. In: Proceedings of ICML (2012)

  16. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP (2014)

  17. Socher, R., Lin, C.C., Ng, A.Y., Manning, C.D.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of ICML (2011)

  18. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of EMNLP (2013)

  19. Yu, M., Dredze, M.: Improving lexical embeddings with semantic knowledge. In: Proceedings of ACL (2014)

  20. Zhang, M., Zhang, Y., Che, W., Liu, T.: Chinese parsing exploiting characters. In: Proceedings of ACL (2013)

  21. Zhao, H.: Character-level dependencies in Chinese: usefulness and learning. In: Proceedings of EACL (2009)

Acknowledgments

The authors thank Yang Liu, Xinxiong Chen, Lei Xu, Yu Zhao and Zhiyuan Liu for helpful discussions and three anonymous reviewers for their valuable comments. This research is supported by the Key Project of National Social Science Foundation of China under Grant No. 13&ZD190 and the Project of National Natural Science Foundation of China under Grant No. 61170196.

Corresponding author

Correspondence to Liner Yang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, L., Sun, M. (2015). Improved Learning of Chinese Word Embeddings with Semantic Knowledge. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2015, NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_2

  • DOI: https://doi.org/10.1007/978-3-319-25816-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25815-7

  • Online ISBN: 978-3-319-25816-4

  • eBook Packages: Computer Science, Computer Science (R0)
