End-to-end Saliency-Guided Deep Image Retrieval

  • Conference paper
Neural Information Processing (ICONIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)

Abstract

A challenging issue in content-based image retrieval (CBIR) is distinguishing the target object from cluttered backgrounds; doing so yields more discriminative image embeddings than when feature extraction is distracted by irrelevant objects. To handle this issue, we propose a saliency-guided model with deep image features. The model is fully based on convolutional neural networks (CNNs) and incorporates a visual saliency detection module, making saliency detection a preceding step of feature extraction. The resulting saliency maps are used to refine the original inputs, and compatible image features suitable for ranking are then extracted from the refined inputs. The model suggests a working scheme for incorporating saliency information into existing CNN-based CBIR systems with minimal impact on them. Some works assist image retrieval with other methods such as object detection or semantic segmentation, but these are not as fine-grained as saliency detection, and some of them require additional annotations to train. In contrast, we train the saliency module in a weakly supervised, end-to-end manner and do not need saliency ground truth. Extensive experiments are conducted on standard image retrieval benchmarks, and our model shows competitive retrieval results.
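
The data flow described in the abstract (saliency map, then input refinement, then feature extraction, then ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_saliency` and `toy_embed` are hypothetical stand-ins for the CNN saliency module and the CNN feature extractor, and both the element-wise refinement and the cosine-similarity ranking are assumptions, since the abstract does not specify them.

```python
import math

def toy_saliency(image):
    """Hypothetical stand-in for the CNN saliency module:
    min-max normalised pixel intensity, giving scores in [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[(p - lo) / span for p in row] for row in image]

def refine(image, saliency):
    """Assumed refinement: weight each pixel by its saliency score."""
    return [[p * s for p, s in zip(img_row, sal_row)]
            for img_row, sal_row in zip(image, saliency)]

def toy_embed(image):
    """Hypothetical stand-in for the CNN feature extractor:
    L2-normalised vector of row means."""
    feats = [sum(row) / len(row) for row in image]
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]

def embed_with_saliency(image):
    """Saliency detection precedes feature extraction, as in the abstract."""
    return toy_embed(refine(image, toy_saliency(image)))

def rank(query, gallery):
    """Return gallery indices sorted by descending cosine similarity
    (a dot product, because the embeddings are L2-normalised)."""
    q = embed_with_saliency(query)
    scores = [sum(a * b for a, b in zip(q, embed_with_saliency(g)))
              for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

In this sketch the saliency step wraps an otherwise unchanged embedding function, which mirrors the abstract's claim that saliency information can be added to an existing CBIR system with minimal impact on it.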

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under grant 61771145.

Author information

Corresponding author

Correspondence to Xiaodong Gu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, J., Gu, X. (2020). End-to-end Saliency-Guided Deep Image Retrieval. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_46

  • DOI: https://doi.org/10.1007/978-3-030-63820-7_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63819-1

  • Online ISBN: 978-3-030-63820-7

  • eBook Packages: Computer Science, Computer Science (R0)
