Enhancing Zero-Shot Translation in Multilingual Neural Machine Translation: Focusing on Obtaining Location-Agnostic Representations

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2024 (ICANN 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15022)


Abstract

In multilingual neural machine translation, a notable challenge is zero-shot translation, in which a model translates between language pairs it has never been trained on. Translation quality in these directions is often poor, mainly because the model’s internal language representations are too specific to its training languages. We show that the positional correspondence to input tokens is a primary factor behind these language-specific representations, and we address it with a simple structural change: removing the residual connections in an encoder layer. This modification improves zero-shot translation quality by up to 11.1 BLEU points (a standard measure of translation accuracy) without degrading translation quality in the supervised directions. Moreover, our method facilitates the seamless incorporation of new languages, significantly broadening translation coverage.
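To make the structural change concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors’ released code: a standard post-norm Transformer encoder layer with a flag that disables the residual connection around self-attention. Every name and hyperparameter here is an illustrative assumption; the paper specifies only that residual connections are removed in an encoder layer.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Post-norm Transformer encoder layer with an optional attention residual."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1,
                 keep_attn_residual=True):
        super().__init__()
        # keep_attn_residual=False removes the residual connection around
        # self-attention (the kind of connection the paper removes).
        self.keep_attn_residual = keep_attn_residual
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        if self.keep_attn_residual:
            h = self.norm1(x + self.drop(attn_out))
        else:
            # Without the residual copy of x, each output position is a mixture
            # over the whole sequence rather than being tied one-to-one to the
            # input token at the same position.
            h = self.norm1(self.drop(attn_out))
        return self.norm2(h + self.drop(self.ffn(h)))

In an encoder stack, most layers would presumably keep keep_attn_residual=True, with the residual disabled only in selected layers; the authors note (see Notes below) that removing residual connections in additional layers hurt convergence.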


Notes

  1. http://www.statmt.org/moses/.

  2. We also experimented with: 1) extending the removal of residual connections to additional layers, which, however, adversely affected the model’s convergence; and 2) replacing the residual connections with mean-pooled sentence embeddings (sketched below). The latter approach yielded smaller gains in zero-shot translation directions than simply removing the residual connections.
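For concreteness, here is a hypothetical sketch of this note’s second variant, an assumed implementation rather than code from the paper: the token-wise residual input is replaced by a mean-pooled sentence embedding broadcast back to every position, so the residual path carries a sentence-level signal with no per-position information. The function name and mask convention are illustrative.

import torch

def mean_pooled_residual(x, attn_out, padding_mask=None):
    # x, attn_out: (batch, seq_len, d_model)
    # padding_mask: (batch, seq_len), True at padded positions
    if padding_mask is not None:
        keep = (~padding_mask).unsqueeze(-1).float()   # 1.0 at real tokens
        pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
    else:
        pooled = x.mean(dim=1)                          # (batch, d_model)
    # Add the same position-free sentence vector at every time step, in place
    # of the usual token-wise residual x.
    return attn_out + pooled.unsqueeze(1)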


Acknowledgements

We thank all anonymous reviewers for their constructive comments, which we have incorporated into the paper. This work is supported by the National Natural Science Foundation of China (No. U21B2009).

Author information

Corresponding authors

Correspondence to Jiarui Zhang or Heyan Huang.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, J., Huang, H., Hu, Y., Guo, P. (2024). Enhancing Zero-Shot Translation in Multilingual Neural Machine Translation: Focusing on Obtaining Location-Agnostic Representations. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15022. Springer, Cham. https://doi.org/10.1007/978-3-031-72350-6_13

  • DOI: https://doi.org/10.1007/978-3-031-72350-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72349-0

  • Online ISBN: 978-3-031-72350-6

  • eBook Packages: Computer Science, Computer Science (R0)
