Abstract
With the rapid growth of diverse media data (e.g., text, image, video, and audio), fine-grained cross-media retrieval, which aims to provide flexible and accurate query services, has attracted significant attention. Unlike traditional keyword-based retrieval, the queries and results in fine-grained cross-media retrieval may be of different media types. In this work, we demonstrate that the quality of retrieval results can be further improved by additionally exploiting media-specific information, and that the feature-extraction process should be tailored to the media type of each query. To this end, we propose a novel network architecture, the Double Branch Fine-grained Cross-media Net (DBFC-Net), which is the first work to exploit media-specific information when constructing common features within a uniform framework. Furthermore, we devise an effective distance metric, cosine+, for fine-grained cross-media retrieval. Compared with commonly used metrics (e.g., the cosine function), the proposed cosine+ metric adapts well to fine-grained retrieval scenarios. Extensive experiments and ablation studies on publicly available datasets demonstrate the effectiveness of the proposed approach.
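The abstract does not reproduce the definition of the cosine+ metric, so it is not sketched here. For context, the sketch below shows only the standard cosine-similarity ranking that such metrics build on: features from any media branch are assumed to have been projected into one common space, and gallery items of any media type are ranked against the query. All function and variable names are illustrative, not from the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two feature vectors in the common space.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

def retrieve(query_feat: np.ndarray, gallery_feats: list, top_k: int = 5):
    """Rank gallery items (any media type) by similarity to the query.

    Returns the indices of the top_k most similar items and their scores.
    """
    scores = [cosine_similarity(query_feat, g) for g in gallery_feats]
    order = np.argsort(scores)[::-1][:top_k]  # descending similarity
    return list(order), [scores[i] for i in order]
```

In a cross-media setting, `query_feat` might come from the image branch while `gallery_feats` holds text or video embeddings; the point of a shared common space is that one scalar similarity suffices regardless of the media pairing.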






Wang, Q., Guo, Y. & Yao, Y. DBFC-Net: a uniform framework for fine-grained cross-media retrieval. Multimedia Systems 28, 423–432 (2022). https://doi.org/10.1007/s00530-021-00825-2