Abstract
In multilingual neural machine translation, a notable challenge is zero-shot translation: translating between language pairs the model has never seen during training. Zero-shot output quality is often poor, largely because the model's internal representations remain specific to its training languages. We show that the positional correspondence to input tokens is a primary factor behind these language-specific representations. We address this by modifying the model's architecture, removing the residual connections in an encoder layer so that the encoder produces location-agnostic representations. This simple change significantly improves zero-shot translation quality, by up to 11.1 BLEU points (a standard measure of translation accuracy), without degrading quality in the supervised translation directions. Moreover, our method facilitates the seamless incorporation of new languages, substantially broadening translation coverage.
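As a rough illustration of the kind of architectural change described above, the following is a minimal sketch (in PyTorch) of a Transformer encoder layer whose self-attention sub-layer omits the residual connection. The module name, layer sizes, and the choice of which sub-layer and which encoder layer to modify are illustrative assumptions, not the authors' exact configuration.

```python
from typing import Optional

import torch
import torch.nn as nn


class ResidualFreeEncoderLayer(nn.Module):
    """Transformer encoder layer whose self-attention sub-layer has no residual
    connection, so its output is not tied position-by-position to the input."""

    def __init__(self, d_model: int = 512, n_heads: int = 8,
                 d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor,
                pad_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=pad_mask)
        # A standard layer would compute norm1(x + dropout(attn_out)); here the
        # "x +" term (the residual connection) is deliberately dropped.
        h = self.norm1(self.dropout(attn_out))
        # The feed-forward sub-layer keeps its usual residual connection.
        return self.norm2(h + self.dropout(self.ffn(h)))
```

In a full model, one such layer could stand in for a single layer of an otherwise standard encoder stack; the point of the sketch is only that, without the residual addition, each output position is no longer forced to stay aligned with its input token.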
Notes
- 1.
- 2. We also experimented with (1) extending the removal of residual connections to additional layers, which, however, hurt the model's convergence, and (2) replacing the residual connections with mean-pooled sentence embeddings; the latter yielded smaller gains in zero-shot translation directions than simply removing the residual connections (a sketch of this variant follows below).
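For the variant mentioned in (2), the following hedged sketch shows one plausible way to replace the residual connection with a mean-pooled sentence embedding: the sub-layer input is averaged over the token dimension and that single, position-agnostic vector is added back at every position. The function name, masking convention, and placement are assumptions for illustration, not the paper's exact formulation.

```python
from typing import Optional

import torch


def mean_pooled_residual(x: torch.Tensor, sublayer_out: torch.Tensor,
                         pad_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """x, sublayer_out: (batch, seq_len, d_model); pad_mask: (batch, seq_len),
    True marks padding positions excluded from the pooling."""
    if pad_mask is not None:
        keep = (~pad_mask).unsqueeze(-1).float()                  # 1.0 at real tokens
        pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
    else:
        pooled = x.mean(dim=1)                                    # (batch, d_model)
    # Add the same sentence-level vector to every position instead of the
    # usual token-wise residual "x + sublayer_out".
    return sublayer_out + pooled.unsqueeze(1)
```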
Acknowledgements
We thank the anonymous reviewers for their constructive comments, which we have incorporated into this version. This work is supported by the National Natural Science Foundation of China (No. U21B2009).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, J., Huang, H., Hu, Y., Guo, P. (2024). Enhancing Zero-Shot Translation in Multilingual Neural Machine Translation: Focusing on Obtaining Location-Agnostic Representations. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15022. Springer, Cham. https://doi.org/10.1007/978-3-031-72350-6_13