Abstract
In multilingual neural machine translation, a notable challenge is zero-shot translation: translating between language pairs the model has never seen during training. Zero-shot output quality is often poor, largely because the model's internal representations remain specific to its training languages. We show that the positional correspondence to input tokens is a primary factor behind these language-specific representations. We address this by modifying the model's architecture, removing the residual connections in an encoder layer so that the encoder produces location-agnostic representations. This simple change significantly improves zero-shot translation quality, by up to 11.1 BLEU points (a standard measure of translation accuracy), without degrading quality in the supervised translation directions. Moreover, our method facilitates the seamless incorporation of new languages, substantially broadening translation coverage.
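As a rough illustration of the kind of architectural change described above, the following is a minimal sketch (in PyTorch) of a Transformer encoder layer whose self-attention sub-layer omits the residual connection. The module name, layer sizes, and the choice of which sub-layer and which encoder layer to modify are illustrative assumptions, not the authors' exact configuration.

```python
from typing import Optional

import torch
import torch.nn as nn


class ResidualFreeEncoderLayer(nn.Module):
    """Transformer encoder layer whose self-attention sub-layer has no residual
    connection, so its output is not tied position-by-position to the input."""

    def __init__(self, d_model: int = 512, n_heads: int = 8,
                 d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor,
                pad_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=pad_mask)
        # A standard layer would compute norm1(x + dropout(attn_out)); here the
        # "x +" term (the residual connection) is deliberately dropped.
        h = self.norm1(self.dropout(attn_out))
        # The feed-forward sub-layer keeps its usual residual connection.
        return self.norm2(h + self.dropout(self.ffn(h)))
```

In a full model, one such layer could stand in for a single layer of an otherwise standard encoder stack; the point of the sketch is only that, without the residual addition, each output position is no longer forced to stay aligned with its input token.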
Notes
- 1.
- 2. We also experimented with (1) extending the removal of residual connections to additional layers, which, however, hurt the model's convergence, and (2) replacing the residual connections with mean-pooled sentence embeddings; the latter yielded smaller gains in zero-shot translation directions than simply removing the residual connections (a sketch of this variant follows below).
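For the variant mentioned in (2), the following hedged sketch shows one plausible way to replace the residual connection with a mean-pooled sentence embedding: the sub-layer input is averaged over the token dimension and that single, position-agnostic vector is added back at every position. The function name, masking convention, and placement are assumptions for illustration, not the paper's exact formulation.

```python
from typing import Optional

import torch


def mean_pooled_residual(x: torch.Tensor, sublayer_out: torch.Tensor,
                         pad_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """x, sublayer_out: (batch, seq_len, d_model); pad_mask: (batch, seq_len),
    True marks padding positions excluded from the pooling."""
    if pad_mask is not None:
        keep = (~pad_mask).unsqueeze(-1).float()                  # 1.0 at real tokens
        pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
    else:
        pooled = x.mean(dim=1)                                    # (batch, d_model)
    # Add the same sentence-level vector to every position instead of the
    # usual token-wise residual "x + sublayer_out".
    return sublayer_out + pooled.unsqueeze(1)
```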
Acknowledgements
We thank the anonymous reviewers for their constructive comments, which we have incorporated into this version. This work is supported by the National Natural Science Foundation of China (No. U21B2009).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, J., Huang, H., Hu, Y., Guo, P. (2024). Enhancing Zero-Shot Translation in Multilingual Neural Machine Translation: Focusing on Obtaining Location-Agnostic Representations. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15022. Springer, Cham. https://doi.org/10.1007/978-3-031-72350-6_13