Abstract
A challenging problem in content-based image retrieval (CBIR) is distinguishing the target object from cluttered backgrounds. Doing so yields more discriminative image embeddings than when feature extraction is distracted by irrelevant objects. To address this problem, we propose a saliency-guided model with deep image features. The model is built entirely on convolutional neural networks (CNNs) and incorporates a visual saliency detection module, making saliency detection a step that precedes feature extraction. The resulting saliency maps are used to refine the original inputs, and image features suitable for ranking are then extracted from the refined inputs. The model suggests a workable scheme for bringing saliency information into existing CNN-based CBIR systems with minimal impact on them. Some works assist image retrieval with other methods such as object detection or semantic segmentation, but these are not as fine-grained as saliency detection, and some of them require additional annotations for training. In contrast, we train the saliency module in a weakly supervised, end-to-end manner and do not need saliency ground truth. Extensive experiments are conducted on standard image retrieval benchmarks, and our model shows competitive retrieval results.
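The pipeline described above (saliency map, refined input, features for ranking) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the saliency map is a hand-made binary mask standing in for the learned saliency module, refinement is assumed to be element-wise multiplication, and the hypothetical `embed` function uses global average pooling in place of a CNN feature extractor.

```python
import numpy as np

def make_image(obj_color, bg_color, size=8):
    """Toy RGB image: uniform background with a 4x4 object in the center."""
    img = np.tile(np.array(bg_color, dtype=float), (size, size, 1))
    img[2:6, 2:6] = obj_color
    return img

def refine_input(image, saliency):
    """Assumed refinement: weight each pixel by its saliency value in [0, 1]."""
    return image * saliency[..., None]

def embed(image):
    """Stand-in for a CNN: per-channel global average pooling, L2-normalized."""
    feat = image.mean(axis=(0, 1))
    return feat / (np.linalg.norm(feat) + 1e-12)

def rank(query_vec, gallery_vecs):
    """Gallery indices sorted by descending cosine similarity to the query."""
    scores = gallery_vecs @ query_vec
    return np.argsort(-scores).tolist()

RED, GREEN, BLUE = [1, 0, 0], [0, 1, 0], [0, 0, 1]

query = make_image(RED, GREEN)
gallery = [
    make_image(RED, BLUE),    # index 0: same object, different background
    make_image(BLUE, GREEN),  # index 1: different object, same background
]

# Hand-made saliency mask covering the object (a learned module would predict this).
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

# Retrieval on raw inputs: the large background dominates the embedding.
plain_order = rank(embed(query), np.stack([embed(g) for g in gallery]))

# Retrieval on saliency-refined inputs: the background is suppressed first.
guided_order = rank(
    embed(refine_input(query, mask)),
    np.stack([embed(refine_input(g, mask)) for g in gallery]),
)

print("without saliency:", plain_order)
print("with saliency:   ", guided_order)
```

In this toy setting the unrefined query ranks the image that shares its background first, while the refined query ranks the image that shares its object first, which is the kind of distraction the abstract refers to.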
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61771145.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, J., Gu, X. (2020). End-to-end Saliency-Guided Deep Image Retrieval. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_46
DOI: https://doi.org/10.1007/978-3-030-63820-7_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)