Abstract
Recent studies show that aggregating the activations of convolutional layers from CNN models into a global descriptor yields promising performance for instance retrieval. However, because of the global pooling strategy, the resulting feature representation lacks discriminative local structure information and is degraded by irrelevant image patterns and background clutter. In this paper, we propose a novel Bag-of-Deep-Visual-Words (BoDVW) model for instance retrieval. Activations of convolutional feature maps are extracted as a set of individual semantic-aware local features. An energy-based feature selection step filters out poorly distinctive features that lie on homogeneous background. To make cross matching at the local feature level scalable, the local deep CNN features are quantized so that they fit an inverted index structure. A new cross-matching metric is defined to measure image similarity. Our approach achieves competitive performance in comparison with other state-of-the-art methods. In particular, it proves more effective and efficient on large-scale datasets.
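The pipeline described in the abstract (select local features by energy, quantize them to visual words, store them in an inverted index, then match) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the authors' method: the energy is taken as the squared L2 norm, the threshold is a fraction of the mean energy per image, the codebook is supplied by the caller, and a simple count of shared visual words stands in for the paper's cross-matching metric, which the abstract does not define. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def energy(feature: Vector) -> float:
    """Squared L2 norm of one local feature (assumed energy measure)."""
    return sum(x * x for x in feature)


def select_features(features: List[Vector], ratio: float = 1.0) -> List[Vector]:
    """Keep features whose energy reaches ratio * mean energy (assumed rule)."""
    energies = [energy(f) for f in features]
    threshold = ratio * sum(energies) / len(energies)
    return [f for f, e in zip(features, energies) if e >= threshold]


def quantize(feature: Vector, codebook: List[Vector]) -> int:
    """Return the index of the nearest codebook centroid (the visual word)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))


def build_index(images: Dict[str, List[Vector]],
                codebook: List[Vector]) -> Dict[int, Set[str]]:
    """Inverted index: visual word -> ids of images that contain it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for image_id, features in images.items():
        for f in select_features(features):
            index[quantize(f, codebook)].add(image_id)
    return index


def query(index: Dict[int, Set[str]], features: List[Vector],
          codebook: List[Vector]) -> List[Tuple[str, int]]:
    """Rank database images by the number of visual words shared with the query.

    This vote count is a placeholder similarity, not the paper's metric.
    """
    words = {quantize(f, codebook) for f in select_features(features)}
    votes: Dict[str, int] = defaultdict(int)
    for w in words:
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch each image is a list of local descriptors, one per spatial position of a convolutional feature map. Only the posting lists of the query's visual words are visited at query time, which is what makes an inverted index attractive for large databases.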
Acknowledgement
The work of Prof. Houqiang Li was supported in part by the 973 Program under contract No. 2015CB351803 and by NSFC under contracts No. 61325009 and No. 61390514. The work of Dr. Wengang Zhou was supported in part by NSFC under contracts No. 61472378 and No. 61632019, by the Young Elite Scientists Sponsorship Program by CAST under Grant 2016QNRC001, and by the Fundamental Research Funds for the Central Universities. The work of Dr. Qi Tian was supported in part by ARO grant W911NF-15-1-0290 and by Faculty Research Gift Awards from NEC Laboratories of America and Blippar. This work was also supported in part by NSFC under contract No. 61429201.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Lv, Y., Zhou, W., Tian, Q., Li, H. (2018). Scalable Bag of Selected Deep Features for Visual Instance Retrieval. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_21
DOI: https://doi.org/10.1007/978-3-319-73600-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer Science, Computer Science (R0)