Scalable Bag of Selected Deep Features for Visual Instance Retrieval

Lv, Yue; Zhou, Wengang; Tian, Qi; Li, Houqiang

doi:10.1007/978-3-319-73600-6_21


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10705)

Included in the following conference series:

International Conference on Multimedia Modeling


Abstract

Recent studies show that aggregating the activations of convolutional layers in CNN models into a global descriptor leads to promising performance for instance retrieval. However, because of the global pooling strategy adopted, the resulting feature representation lacks discriminative local structure information and is degraded by irrelevant image patterns or background clutter. In this paper, we propose a novel Bag-of-Deep-Visual-Words (BoDVW) model for instance retrieval. Activations of convolutional feature maps are extracted as a set of individual semantic-aware local features. An energy-based feature selection is adopted to filter out poorly distinctive features on homogeneous background. To make local feature-level cross matching scalable, the local deep CNN features are quantized to fit an inverted index structure. A new cross-matching metric is defined to measure image similarity. Our approach achieves respectable performance in comparison with other state-of-the-art methods. In particular, it proves more effective and efficient on large-scale datasets.
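The pipeline outlined in the abstract (local features from a feature map, energy-based selection, quantization, inverted index, matching) can be illustrated with a minimal sketch. The abstract does not give the definitions, so everything concrete here is an assumption made for illustration: the energy is taken as the squared L2 norm of a local feature, selection keeps a fixed top fraction (`keep_ratio`), quantization is nearest-codeword assignment against a given `codebook`, and the score simply counts matched visual words. This is not the paper's cross-matching metric, and all function names are hypothetical.

```python
from collections import defaultdict


def local_features(feature_map):
    """Flatten an H x W grid of C-dimensional activations into a list of local features."""
    return [vec for row in feature_map for vec in row]


def energy(vec):
    """Squared L2 norm, used as a stand-in energy score (assumption, not from the paper)."""
    return sum(x * x for x in vec)


def select_features(features, keep_ratio=0.5):
    """Keep the highest-energy fraction of features (hypothetical selection rule)."""
    ranked = sorted(features, key=energy, reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]


def quantize(vec, codebook):
    """Return the index of the nearest codeword under squared Euclidean distance."""
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(vec, codebook[k]))
    return min(range(len(codebook)), key=dist)


def build_index(images, codebook, keep_ratio=0.5):
    """Build an inverted index: visual word id -> {image id: occurrence count}."""
    index = defaultdict(lambda: defaultdict(int))
    for image_id, fmap in images.items():
        for vec in select_features(local_features(fmap), keep_ratio):
            index[quantize(vec, codebook)][image_id] += 1
    return index


def query(index, fmap, codebook, keep_ratio=0.5):
    """Rank database images by the number of matched visual words (placeholder metric)."""
    scores = defaultdict(int)
    for vec in select_features(local_features(fmap), keep_ratio):
        # .get avoids creating empty posting lists for unseen words
        for image_id, count in index.get(quantize(vec, codebook), {}).items():
            scores[image_id] += count
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because only the posting lists of the query's visual words are visited, the cost of a query depends on the length of those lists rather than on the total number of indexed images, which is the usual reason for pairing quantized local features with an inverted index.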



Acknowledgement

This work was supported in part to Prof. Houqiang Li by the 973 Program under contract No. 2015CB351803 and by NSFC under contracts No. 61325009 and No. 61390514; in part to Dr. Wengang Zhou by NSFC under contracts No. 61472378 and No. 61632019, the Young Elite Scientists Sponsorship Program by CAST under Grant 2016QNRC001, and the Fundamental Research Funds for the Central Universities; and in part to Dr. Qi Tian by ARO grant W911NF-15-1-0290 and Faculty Research Gift Awards by NEC Laboratories of America and Blippar. This work was also supported in part by NSFC under contract No. 61429201.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, Anhui, People’s Republic of China
Yue Lv, Wengang Zhou & Houqiang Li
University of Texas at San Antonio, San Antonio, USA
Qi Tian


Corresponding author

Correspondence to Wengang Zhou.

Editor information

Editors and Affiliations

Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Klaus Schoeffmann
Chulalongkorn University, Bangkok, Thailand
Thanarat H. Chalidabhongse
City University of Hong Kong, Hong Kong, China
Chong Wah Ngo
Chulalongkorn University, Bangkok, Thailand
Supavadee Aramvith
Dublin City University, Dublin, Ireland
Noel E. O’Connor
Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Tampere University of Technology, Tampere, Finland
Moncef Gabbouj
Rutgers University, Piscataway, New Jersey, USA
Ahmed Elgammal

About this paper

Cite this paper

Lv, Y., Zhou, W., Tian, Q., Li, H. (2018). Scalable Bag of Selected Deep Features for Visual Instance Retrieval. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_21

DOI: https://doi.org/10.1007/978-3-319-73600-6_21
Published: 13 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer Science, Computer Science (R0)
