Abstract
We present the results of Shared Task 1 held at the 2020 Conference on Natural Language Processing and Chinese Computing (NLPCC): Light Pre-Trained Chinese Language Model for NLP Tasks. This shared task examines the performance of lightweight language models on four common NLP tasks: Text Classification, Named Entity Recognition, Anaphora Resolution, and Machine Reading Comprehension. To ensure that the models are lightweight, we placed restrictions on the number of parameters and the inference speed of the participating models. In total, 30 teams registered for the task. Each submission was evaluated through our online benchmark system (https://www.cluebenchmarks.com/nlpcc2020.html), with the average score over the four tasks taken as the final score. Participants explored a variety of ideas and frameworks, including data augmentation, knowledge distillation, and quantization. The best model achieved an average score of 75.949, which is very close to BERT-base (76.460). We believe this shared task highlights the potential of lightweight models and calls for further research on their development.
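Knowledge distillation is one of the compression techniques the abstract lists. As a generic, hedged illustration only, and not the training code of any participating team, the sketch below shows the standard soft-target distillation objective in PyTorch; the temperature T and mixing weight alpha are illustrative assumptions rather than values used in the shared task.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.9) -> torch.Tensor:
    """Soft-target distillation loss mixed with hard-label cross-entropy.

    T and alpha are illustrative hyperparameters, not values from any submission.
    """
    # KL divergence between temperature-softened teacher and student distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In a typical setup, the teacher's logits are computed under torch.no_grad() and the lightweight student is trained on this combined loss; quantization, also mentioned above, would then be applied to the trained student.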
Notes
- 6. See the most recent results at https://www.cluebenchmarks.com/nlpcc2020.html.
- 10. At the time of writing, 69.289 is the best score achieved by our baseline.
- 11. Huawei Cloud & Noah's Ark Lab submitted their Rank 3 model instead of their best one.
- 12. Thanks to Xiaomi AI Lab, who submitted this BERT-base model, although it was not fully fine-tuned.
Acknowledgements
Many thanks to NLPCC for giving us the opportunity to organize this task, and to everyone who took part in it.
A Appendix
A.1 List of Literary Works Selected in CLUEWSC2020
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Li, J., Hu, H., Zhang, X., Li, M., Li, L., Xu, L. (2020). Light Pre-Trained Chinese Language Model for NLP Tasks. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol. 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_47