Machine Translation Method Integrating New Energy Terminology Knowledge

Computer Science, 2022, Vol. 49, Issue (6): 305-312. doi: 10.11896/jsjkx.210500117

• Artificial Intelligence

  • Corresponding author: REN Wei-ping (renweiping@bistu.edu.cn)
  • About author: (dongzhenheng1@163.com)

Machine Translation Method Integrating New Energy Terminology Knowledge

DONG Zhen-heng¹, REN Wei-ping², YOU Xin-dong¹, LYU Xue-qiang¹

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
  2. School of Foreign Languages, Beijing Information Science and Technology University, Beijing 100192, China
  • Received: 2021-05-17 Revised: 2021-12-10 Online: 2022-06-15 Published: 2022-06-08
  • About author: DONG Zhen-heng, born in 1995, postgraduate. His main research interests include natural language processing and machine translation.
    REN Wei-ping, born in 1962, professor. Her main research interests include applied linguistics, among other areas.
  • Supported by:
    Natural Science Foundation of Beijing, China (4212020), National Natural Science Foundation of China (61671070), Qin Xin Talents Cultivation Program of Beijing Information Science & Technology University (QXTCPB201908), and Research Planning of Beijing Municipal Commission of Education (KM202111232001).

Abstract: In domain-specific machine translation, whether domain terms are translated correctly plays a decisive role in translation quality, so effectively integrating domain terms into a neural machine translation (NMT) model to improve their translation is of practical significance. This paper proposes a method that integrates term information from the new energy domain into NMT as prior knowledge. Using a term dictionary built from a bilingual new energy term knowledge base as the medium, it proposes and compares two ways of integrating this knowledge: 1) term replacement, in which each source term in the source sentence is replaced with its target-language term; and 2) term addition, in which each source term in the source sentence is concatenated with its target term, and identifiers that act as special external knowledge mark the beginning and end of the target term on both the source and target sides. Experiments are conducted on a Chinese-English aligned bilingual corpus in the new energy domain and a constructed Chinese-English aligned term base. On the test set, the BLEU scores of the two approaches are 6.38 and 6.55 points higher than the baseline, respectively, which shows that the proposed method can effectively integrate domain term knowledge into the translation model and improve the translation quality of domain terms.
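
The two integration strategies described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy dictionary entries, the marker tokens `<term>` and `</term>`, the function names, and the simple longest-match lookup are not taken from the paper, which does not specify them in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical toy term dictionary (source term -> target term); not from the paper.
TERM_DICT: Dict[str, str] = {
    "太阳能电池": "solar cell",
    "风力发电机": "wind turbine",
}

# Hypothetical marker tokens; the abstract does not give the actual symbols.
TERM_START, TERM_END = "<term>", "</term>"


def _squeeze(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def _find_terms(sentence: str, term_dict: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, term) matches, preferring longer terms."""
    taken = [False] * len(sentence)
    matches: List[Tuple[int, int, str]] = []
    for term in sorted(term_dict, key=len, reverse=True):
        start = sentence.find(term)
        while start != -1:
            end = start + len(term)
            if not any(taken[start:end]):
                matches.append((start, end, term))
                for i in range(start, end):
                    taken[i] = True
            start = sentence.find(term, start + 1)
    return sorted(matches)


def term_replacement(source: str, term_dict: Dict[str, str]) -> str:
    """Strategy 1: replace each source term with its target-language term."""
    out, pos = [], 0
    for start, end, term in _find_terms(source, term_dict):
        out.append(source[pos:start])
        out.append(f" {term_dict[term]} ")
        pos = end
    out.append(source[pos:])
    return _squeeze("".join(out))


def term_addition(source: str, target: str, term_dict: Dict[str, str]) -> Tuple[str, str]:
    """Strategy 2: append the tagged target term after each source term,
    and tag the same target term in the target sentence."""
    src_out, pos = [], 0
    translations: List[str] = []
    for start, end, term in _find_terms(source, term_dict):
        translation = term_dict[term]
        src_out.append(source[pos:start])
        src_out.append(f" {term} {TERM_START} {translation} {TERM_END} ")
        pos = end
        if translation not in translations:
            translations.append(translation)
    src_out.append(source[pos:])
    tagged_target = target
    # Simplification: plain substring replacement; nested translations are not handled.
    for translation in translations:
        tagged_target = tagged_target.replace(
            translation, f"{TERM_START} {translation} {TERM_END}"
        )
    return _squeeze("".join(src_out)), _squeeze(tagged_target)


if __name__ == "__main__":
    src = "太阳能电池的效率很高"
    tgt = "The efficiency of solar cell is high."
    print(term_replacement(src, TERM_DICT))
    print(term_addition(src, tgt, TERM_DICT))
```

In this sketch the preprocessed sentence pairs would then be fed to an ordinary NMT training pipeline; how the marker tokens are added to the vocabulary and handled at inference time is left open, since the abstract does not describe it.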

Key words: Domain machine translation, Domain terms, Term replacement, Term addition, Special identifier, Prior knowledge

CLC Number: TP391