Abstract
Neural Machine Translation (NMT) performs well for high-resource languages, for which large parallel corpora are available, but it still struggles with low-resource languages. Despite many proposed solutions, low-resource translation remains a difficult challenge, and it becomes even harder when the few available resources cover only a single domain. To address this issue, we propose a new approach, based on mixed training, to improve NMT for low-resource languages. Using the Transformer model, the proposed approach improves BLEU scores by 5.3, 5.0, and 3.7 points for the Gamo-English, Gofa-English, and Dawuro-English language pairs, respectively, where Gamo, Gofa, and Dawuro are related low-resource Ethiopian languages. We discuss our contributions and outline future directions in this challenging research area.
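The results above are reported as BLEU-point gains. As a minimal sketch of what such a gain measures, the function below computes standard corpus-level BLEU (clipped n-gram precision with a brevity penalty) under assumed settings: whitespace tokenization, one reference per sentence, uniform weights up to 4-grams, and no smoothing. The abstract does not state which BLEU implementation or tokenization the authors used, so scores from this sketch are illustrative and may differ from those produced by standard tools such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumed settings (illustrative only): whitespace tokenization,
    uniform weights for n = 1..max_n, standard brevity penalty, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

Called on a baseline system's outputs and on an improved system's outputs for the same test set, the difference between the two returned values is the kind of BLEU-point gain reported above.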
Acknowledgments
The work was done with partial support from the Mexican Government through grant A1S-47854 of CONACYT, Mexico, and grants 20220852, 20220859, and 20221627 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tonja, A.L., Kolesnikova, O., Arif, M., Gelbukh, A., Sidorov, G. (2022). Improving Neural Machine Translation for Low Resource Languages Using Mixed Training: The Case of Ethiopian Languages. In: Pichardo Lagunas, O., Martínez-Miranda, J., Martínez Seis, B. (eds) Advances in Computational Intelligence. MICAI 2022. Lecture Notes in Computer Science, vol 13613. Springer, Cham. https://doi.org/10.1007/978-3-031-19496-2_3
DOI: https://doi.org/10.1007/978-3-031-19496-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19495-5
Online ISBN: 978-3-031-19496-2
eBook Packages: Computer Science (R0)