Computer Science, 2022, Vol. 49, Issue (6): 305-312. doi: 10.11896/jsjkx.210500117
DONG Zhen-heng (董振恒)1, REN Wei-ping (任维平)2, YOU Xin-dong (游新冬)1, LYU Xue-qiang (吕学强)1
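Abstract: In domain-specific machine translation, whether domain terms are translated correctly is decisive for translation quality, so effectively integrating domain terminology into neural machine translation (NMT) models to improve term translation is of practical significance. This paper proposes a method for incorporating new-energy domain terminology into NMT as prior knowledge. Using a term dictionary built from a bilingual new-energy terminology knowledge base as the medium, it proposes and compares two ways of integrating that knowledge: 1) term replacement, in which each source-side term in the source sentence is replaced by its target-side term; 2) term addition, in which each source-side term in the source sentence is concatenated with its target-side term, and an identifier serving as a special external-knowledge marker is used on both the source and target sides to mark the beginning and end of the target-side term. Experiments were conducted on Chinese-English aligned bilingual corpora from the new-energy domain together with the constructed Chinese-English aligned term base. The results show that, on the test set, the BLEU scores of the two methods exceed the baseline by 6.38 and 6.55, respectively, demonstrating that the proposed methods effectively integrate domain terminology knowledge into the translation model and improve the translation quality of domain terms.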
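The two integration strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the marker strings `<term>` and `</term>`, the function names, the greedy longest-match lookup, and the sample dictionary entry are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical boundary markers; the abstract does not name the actual tags.
TERM_START = "<term>"
TERM_END = "</term>"

TermDict = Dict[Tuple[str, ...], List[str]]
Span = Tuple[int, int, List[str]]


def find_terms(tokens: List[str], term_dict: TermDict) -> List[Span]:
    """Greedy longest-match lookup (an assumed matching strategy).

    Returns non-overlapping (start, end, mapped_tokens) spans, left to right.
    """
    max_len = max((len(key) for key in term_dict), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so multi-token terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in term_dict:
                spans.append((i, i + n, term_dict[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def replace_terms(tokens: List[str], term_dict: TermDict) -> List[str]:
    """Term replacement: swap each source term for its target-side term."""
    out: List[str] = []
    prev = 0
    for start, end, target in find_terms(tokens, term_dict):
        out.extend(tokens[prev:start])
        out.extend(target)
        prev = end
    out.extend(tokens[prev:])
    return out


def add_terms(tokens: List[str], term_dict: TermDict) -> List[str]:
    """Term addition: keep the source term, then append the marked target term."""
    out: List[str] = []
    prev = 0
    for start, end, target in find_terms(tokens, term_dict):
        out.extend(tokens[prev:end])  # text before the term plus the term itself
        out.extend([TERM_START, *target, TERM_END])
        prev = end
    out.extend(tokens[prev:])
    return out


def mark_target_terms(tokens: List[str], term_dict: TermDict) -> List[str]:
    """Wrap dictionary target terms in a target sentence with the same markers."""
    target_dict: TermDict = {tuple(v): v for v in term_dict.values()}
    out: List[str] = []
    prev = 0
    for start, end, target in find_terms(tokens, target_dict):
        out.extend(tokens[prev:start])
        out.extend([TERM_START, *target, TERM_END])
        prev = end
    out.extend(tokens[prev:])
    return out


if __name__ == "__main__":
    # Illustrative entry only; not taken from the paper's term base.
    demo_dict: TermDict = {("光伏", "电池"): ["photovoltaic", "cell"]}
    source = ["光伏", "电池", "的", "效率"]
    print(replace_terms(source, demo_dict))
    print(add_terms(source, demo_dict))
    print(mark_target_terms(["the", "photovoltaic", "cell", "efficiency"], demo_dict))
```

In this sketch the preprocessing runs on tokenized sentences before they reach the NMT model, so the model architecture itself is untouched. Whether the paper matches terms greedily, or applies markers before or after subword segmentation, is not stated in the abstract and would need to be checked against the full text.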