Abstract
Visual content can improve alignment between language latent spaces, because physical visual perception is similar for speakers of different languages. Building on this observation, researchers have recently proposed unsupervised multimodal machine translation (UMMT) methods for low-resource settings, which use images as pseudo-pivots to facilitate latent space alignment. However, these methods only consider region or grid image features on high-resource close language pairs (CLP), e.g., English-German (En-De) and English-French (En-Fr), and so overlook the effect of applying more informative features to UMMT on low-resource distant language pairs (DLP), e.g., Chinese-Uyghur (Zh-Uy) and English-Uyghur (En-Uy). In this paper, we exploit a pre-training language model and a UMMT model with image features of different granularities, and study how these features influence DLP and CLP translation. Experimental results on the CLP dataset Multi30k and the DLP dataset Multi30K-Zh-Uy show that the proposed approach significantly improves over state-of-the-art methods. The code is available at https://github.com/Turghuns/UMMT-DGIF.
Acknowledgement
This work is partially supported by NSFC, China (No. 62276196).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tayir, T., Li, L., Maimaiti, M., Muhtar, Y. (2025). Low-Resource Machine Translation with Different Granularity Image Features. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15035. Springer, Singapore. https://doi.org/10.1007/978-981-97-8620-6_18
DOI: https://doi.org/10.1007/978-981-97-8620-6_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8619-0
Online ISBN: 978-981-97-8620-6
eBook Packages: Computer Science, Computer Science (R0)