Scalable Bag of Selected Deep Features for Visual Instance Retrieval

  • Conference paper
MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10705)

Abstract

Recent studies show that aggregating the activations of convolutional layers from CNN models into a global descriptor leads to promising performance for instance retrieval. However, because of the global pooling strategy adopted, the resulting feature representation lacks discriminative local structure information and is degraded by irrelevant image patterns and background clutter. In this paper, we propose a novel Bag-of-Deep-Visual-Words (BoDVW) model for instance retrieval. Activations of convolutional feature maps are extracted as a set of individual semantic-aware local features. An energy-based feature selection step filters out poorly distinctive features that lie on homogeneous background. To make local feature-level cross matching scalable, the local deep CNN features are quantized so that they fit an inverted index structure. A new cross-matching metric is defined to measure image similarity. Our approach achieves respectable performance in comparison with other state-of-the-art methods. In particular, it is shown to be more effective and efficient on large-scale datasets.
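
The pipeline outlined in the abstract (local features, energy-based selection, quantization, inverted index, similarity scoring) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: energy is taken to be the sum of squared activations, selection keeps a fixed fraction `keep_ratio` of the highest-energy features, the codebook is given rather than learned, and the score is a simple normalized count of shared visual words standing in for the paper's cross-matching metric, which the abstract does not define. The names `select_by_energy`, `quantize`, and `InvertedIndex` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_by_energy(features, keep_ratio=0.5):
    """Keep the highest-energy local features.

    Assumption: energy = sum of squared activations of one local feature.
    """
    ranked = sorted(features, key=lambda f: sum(x * x for x in f), reverse=True)
    keep = max(1, int(round(len(ranked) * keep_ratio)))
    return ranked[:keep]


def quantize(feature, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(feature, codebook[k])),
    )


class InvertedIndex:
    """Maps each visual word to the images that contain it."""

    def __init__(self, codebook):
        self.codebook = codebook
        self.postings = defaultdict(dict)  # word id -> {image id: count}
        self.sizes = {}                    # image id -> number of indexed features

    def _words(self, features, keep_ratio):
        selected = select_by_energy(features, keep_ratio)
        return Counter(quantize(f, self.codebook) for f in selected)

    def add(self, image_id, features, keep_ratio=0.5):
        words = self._words(features, keep_ratio)
        for word, count in words.items():
            self.postings[word][image_id] = count
        self.sizes[image_id] = sum(words.values())

    def query(self, features, keep_ratio=0.5):
        """Rank indexed images by a stand-in score: shared word counts,
        normalized by the geometric mean of the two feature-set sizes."""
        words = self._words(features, keep_ratio)
        q_size = sum(words.values())
        scores = defaultdict(float)
        for word, q_count in words.items():
            for image_id, d_count in self.postings.get(word, {}).items():
                scores[image_id] += min(q_count, d_count)
        ranked = [
            (image_id, s / math.sqrt(q_size * self.sizes[image_id]))
            for image_id, s in scores.items()
        ]
        return sorted(ranked, key=lambda kv: kv[1], reverse=True)


# Toy data: 2-D "local features"; low-energy vectors mimic flat background.
codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index = InvertedIndex(codebook)
index.add("a", [[0.9, 0.1], [1.1, 0.0], [0.01, 0.02], [0.0, 0.03]])
index.add("b", [[0.0, 1.2], [0.1, 0.9], [0.02, 0.0], [0.01, 0.01]])
results = index.query([[1.0, 0.05], [0.95, 0.0], [0.0, 0.01], [0.02, 0.0]])
print(results)
```

In this toy run only image "a" shares visual words with the query, so image "b" is never touched during scoring. That is the scalability argument for an inverted index: the query visits only the posting lists of its own words instead of comparing against every database image.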

Acknowledgement

This work was supported in part to Prof. Houqiang Li by the 973 Program under contract No. 2015CB351803 and by NSFC under contracts No. 61325009 and No. 61390514; in part to Dr. Wengang Zhou by NSFC under contracts No. 61472378 and No. 61632019, the Young Elite Scientists Sponsorship Program by CAST under Grant 2016QNRC001, and the Fundamental Research Funds for the Central Universities; and in part to Dr. Qi Tian by ARO grant W911NF-15-1-0290 and Faculty Research Gift Awards by NEC Laboratories of America and Blippar. This work was also supported in part by NSFC under contract No. 61429201.

Author information

Corresponding author

Correspondence to Wengang Zhou.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Lv, Y., Zhou, W., Tian, Q., Li, H. (2018). Scalable Bag of Selected Deep Features for Visual Instance Retrieval. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_21

  • DOI: https://doi.org/10.1007/978-3-319-73600-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73599-3

  • Online ISBN: 978-3-319-73600-6

  • eBook Packages: Computer Science (R0)
