Low-Resource Machine Translation with Different Granularity Image Features

Tayir, Turghun; Li, Lin; Maimaiti, Mieradilijiang; Muhtar, Yusnur

doi:10.1007/978-981-97-8620-6_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15035)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Visual content improves alignment between language latent spaces, since physical visual perception is similar for people speaking different languages. Some researchers have therefore recently proposed an unsupervised multimodal machine translation (UMMT) method for low-resource settings, which uses images as pseudo-pivots to facilitate latent space alignment. However, that work considers only region or grid image features on high-resource close language pairs (CLP), e.g., English-German (En-De) and English-French (En-Fr), and ignores the effect of applying more informative features to UMMT on low-resource distant language pairs (DLP), e.g., Chinese-Uyghur (Zh-Uy) and English-Uyghur (En-Uy). In this paper, we exploit a pre-training language model and a UMMT model with image features of different granularity, and study the influence of image features on DLP and CLP translation. Experimental results on the CLP dataset Multi30k and the DLP dataset Multi30K-Zh-Uy show that the proposed approach significantly improves over state-of-the-art methods. The code is available at https://github.com/Turghuns/UMMT-DGIF.

Acknowledgement

This work is partially supported by NSFC, China (No. 62276196).

Author information

Authors and Affiliations

Wuhan University of Technology, Wuhan, China
Turghun Tayir & Lin Li
Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
Mieradilijiang Maimaiti
South China University of Technology, Guangzhou, China
Yusnur Muhtar

Corresponding author

Correspondence to Lin Li.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Zhouchen Lin
Nankai University, Tianjin, China
Ming-Ming Cheng
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ran He
Xinjiang University, Ürümqi, Xinjiang, China
Kurban Ubul
Xinjiang University, Ürümqi, China
Wushouer Silamu
Peking University, Beijing, China
Hongbin Zha
Tsinghua University, Beijing, China
Jie Zhou
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu

About this paper

Cite this paper

Tayir, T., Li, L., Maimaiti, M., Muhtar, Y. (2025). Low-Resource Machine Translation with Different Granularity Image Features. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15035. Springer, Singapore. https://doi.org/10.1007/978-981-97-8620-6_18

DOI: https://doi.org/10.1007/978-981-97-8620-6_18
Published: 20 October 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8619-0
Online ISBN: 978-981-97-8620-6
eBook Packages: Computer Science, Computer Science (R0)
