Light Pre-Trained Chinese Language Model for NLP Tasks

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12431)

Abstract

We present the results of Shared Task 1 of the 2020 Conference on Natural Language Processing and Chinese Computing (NLPCC): Light Pre-Trained Chinese Language Model for NLP Tasks. The shared task examines the performance of light language models on four common NLP tasks: Text Classification, Named Entity Recognition, Anaphora Resolution and Machine Reading Comprehension. To ensure that the models are lightweight, we restricted the number of parameters and the inference speed of the participating models. In total, 30 teams registered for the task. Each submission was evaluated through our online benchmark system (https://www.cluebenchmarks.com/nlpcc2020.html), with the average score over the four tasks as the final score. Participants explored a variety of ideas and frameworks, including data enhancement, knowledge distillation and quantization. The best model achieved an average score of 75.949, close to that of BERT-base (76.460). We believe this shared task highlights the potential of lightweight models and calls for further research on their development.
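The scoring rule described in the abstract (the final score is the average over the four tasks) can be sketched as follows. This is a minimal illustration, not the benchmark system's actual code: it assumes an unweighted mean, and the function name, dictionary keys and example values are hypothetical.

```python
from statistics import mean

# Task names follow the abstract; the key spellings are illustrative labels.
TASKS = (
    "text_classification",
    "named_entity_recognition",
    "anaphora_resolution",
    "machine_reading_comprehension",
)


def final_score(task_scores: dict[str, float]) -> float:
    """Return the unweighted mean of the four per-task scores (assumed rule)."""
    missing = [task for task in TASKS if task not in task_scores]
    if missing:
        raise ValueError(f"missing task scores: {missing}")
    return mean(task_scores[task] for task in TASKS)


# Placeholder values for illustration only; NOT results from the shared task.
example_scores = {
    "text_classification": 60.0,
    "named_entity_recognition": 70.0,
    "anaphora_resolution": 80.0,
    "machine_reading_comprehension": 90.0,
}

print(final_score(example_scores))  # 75.0
```

Requiring all four scores before averaging reflects the assumption that a submission is ranked only when it covers every task.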

Notes

  1. http://ai.chuangxin.com/.

  2. https://tianchi.aliyun.com/competition/gameList/activeList.

  3. https://www.kesci.com/.

  4. https://ai.baidu.com/.

  5. http://tcci.ccf.org.cn/conference/2020/cfpt.php.

  6. See most recent results at https://www.cluebenchmarks.com/nlpcc2020.html.

  7. https://github.com/CLUEbenchmark/LightLM.

  8. https://www.cluebenchmarks.com/nlpcc2020.html.

  9. https://github.com/CLUEbenchmark/LightLM/tree/master/baselines.

  10. The most recent best score of our baseline is 69.289.

  11. Huawei Cloud & Noah’s Ark lab submitted Rank 3 instead of the best one.

  12. Thanks to Xiaomi AI Lab, who submitted this BERT-base model, although it was not fully fine-tuned.

Acknowledgements

Many thanks to NLPCC for giving us the opportunity to organize this task, and to everyone who took part in it.

Author information

Correspondence to Junyi Li.

A Appendix

A.1 List of Literary Works Selected in CLUEWSC2020

[Figure c]

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, J., Hu, H., Zhang, X., Li, M., Li, L., Xu, L. (2020). Light Pre-Trained Chinese Language Model for NLP Tasks. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_47

  • DOI: https://doi.org/10.1007/978-3-030-60457-8_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60456-1

  • Online ISBN: 978-3-030-60457-8

  • eBook Packages: Computer Science, Computer Science (R0)
