Improving Neural Machine Translation for Low Resource Languages Using Mixed Training: The Case of Ethiopian Languages

Tonja, Atnafu Lambebo; Kolesnikova, Olga; Arif, Muhammad; Gelbukh, Alexander; Sidorov, Grigori

doi:10.1007/978-3-031-19496-2_3

Atnafu Lambebo Tonja (ORCID: orcid.org/0000-0002-3501-5136), Olga Kolesnikova, Muhammad Arif, Alexander Gelbukh & Grigori Sidorov

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13613)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence


Abstract

Neural Machine Translation (NMT) has shown improvement for high-resource languages, but low-resource languages remain a problem because NMT performs well with the large parallel corpora available for high-resource languages. Despite many proposals to solve the problem of low-resource languages, it continues to be a difficult challenge. The issue becomes even more complicated when the few available resources cover only one domain. To address this issue, we propose a new approach to improve NMT for low-resource languages. The proposed approach, using the transformer model, shows BLEU score improvements of 5.3, 5.0, and 3.7 for the Gamo-English, Gofa-English, and Dawuro-English language pairs, respectively, where Gamo, Gofa, and Dawuro are related low-resource Ethiopian languages. We discuss our contributions and envisage future steps in this challenging research area.




Acknowledgments

The work was done with partial support from the Mexican Government through grant A1S-47854 of CONACYT, Mexico, and grants 20220852, 20220859, and 20221627 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided to them through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.

Author information

Authors and Affiliations

Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico
Atnafu Lambebo Tonja, Olga Kolesnikova, Muhammad Arif, Alexander Gelbukh & Grigori Sidorov


Corresponding author

Correspondence to Atnafu Lambebo Tonja.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico, Mexico
Obdulia Pichardo Lagunas
Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, Mexico
Juan Martínez-Miranda
Instituto Politécnico Nacional, Mexico, Mexico
Bella Martínez Seis


About this paper

Cite this paper

Tonja, A.L., Kolesnikova, O., Arif, M., Gelbukh, A., Sidorov, G. (2022). Improving Neural Machine Translation for Low Resource Languages Using Mixed Training: The Case of Ethiopian Languages. In: Pichardo Lagunas, O., Martínez-Miranda, J., Martínez Seis, B. (eds) Advances in Computational Intelligence. MICAI 2022. Lecture Notes in Computer Science, vol 13613. Springer, Cham. https://doi.org/10.1007/978-3-031-19496-2_3


DOI: https://doi.org/10.1007/978-3-031-19496-2_3
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19495-5
Online ISBN: 978-3-031-19496-2
eBook Packages: Computer Science, Computer Science (R0)
