计算机科学 ›› 2022, Vol. 49 ›› Issue (7): 148-163. doi: 10.11896/jsjkx.211200018
侯钰涛, 阿布都克力木·阿布力孜, 哈里旦木·阿布都克里木
HOU Yu-tao, ABULIZI Abudukelimu, ABUDUKELIMU Halidanmu
Abstract: In recent years, pre-trained models have flourished in natural language processing, aiming to model and represent the knowledge implicit in natural language, yet most mainstream pre-trained models target English. Work on Chinese started relatively late; given the importance of Chinese in natural language processing, both academia and industry have conducted extensive research and proposed numerous Chinese pre-trained models. This paper presents a fairly comprehensive review of research on Chinese pre-trained models. It first introduces the basic concepts and development history of pre-trained models and reviews Transformer and BERT, the two classic models on which most Chinese pre-trained models are built. It then proposes a taxonomy of Chinese pre-trained models according to the categories the models belong to, and summarizes the evaluation benchmarks available for Chinese. Finally, it discusses future development trends of Chinese pre-trained models. The aim is to help researchers gain a more complete picture of how Chinese pre-trained models have evolved and thereby provide ideas for the design of new models.
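To make the surveyed model family concrete, the sketch below (not taken from the paper) shows how a BERT-style Chinese pre-trained model can be loaded and queried for masked-token prediction, the pre-training objective shared by many of the models the review covers. The checkpoint name "bert-base-chinese" and the use of the Hugging Face transformers library with PyTorch are illustrative assumptions only, not models or tools introduced by this survey.

```python
# Minimal sketch (assumptions: Hugging Face transformers + PyTorch installed;
# the public "bert-base-chinese" checkpoint is used purely for illustration).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

# Masked-language-modeling example: ask the model to fill in the [MASK] character.
text = "预训练模型在自然语言处理领域发展迅速,[MASK]型规模不断增大。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the [MASK] position and report the top-5 candidate tokens.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top5_ids = torch.topk(logits[0, mask_positions[0]], k=5).indices
print(tokenizer.convert_ids_to_tokens(top5_ids.tolist()))
```

The predicted candidates for the masked position illustrate the kind of implicit linguistic knowledge, captured during pre-training, that the abstract refers to; other Chinese pre-trained models discussed in the survey could be substituted by changing the checkpoint name, provided their weights are publicly released.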