Abstract
Object localization is a basic computer vision task and plays an important role in many vision-based applications. Supervised methods learn to localize objects directly from manual location labels, but incomplete or incorrectly assigned labels reduce localization accuracy, and manual labelling is very costly. This paper proposes a weakly-supervised localization method based on a multi-scale gradient-pyramid feature, which combines weighted gradient features from multiple convolutional layers into a single gradient-pyramid feature for object localization. First, pairs of gradients and features are extracted from different layers and used to compute the gradient features. Then, the gradient features are fused through a pyramid model in which the larger value is kept as the fusion result, rather than concatenating the features. Finally, the resulting multi-scale gradient-pyramid feature is used together with a region scaling operation to localize objects more accurately. The proposed method can be integrated directly into a pre-trained classification model to perform object localization without additional training. Experimental results on the ILSVRC 2016 and CUB-200-2011 datasets show that the proposed method achieves better object localization performance.
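The pipeline described above can be illustrated with a minimal NumPy sketch. The abstract does not give the formulas, so several parts are assumptions made here for illustration: the gradient feature uses a Grad-CAM-style weighting (each channel weighted by its mean gradient), coarse maps are upsampled by nearest-neighbour repetition with an integer scale factor, and the bounding box comes from a fixed threshold. The function names and the threshold value are hypothetical, and the region scaling operation is not sketched because the abstract does not define it.

```python
import numpy as np


def gradient_feature(features, grads):
    """Weighted gradient feature for one layer (assumed Grad-CAM-style weighting).

    features, grads: arrays of shape (C, H, W) from the same layer.
    """
    weights = grads.mean(axis=(1, 2))              # one weight per channel, shape (C,)
    cam = np.tensordot(weights, features, axes=1)  # weighted channel sum, shape (H, W)
    cam = np.maximum(cam, 0.0)                     # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam         # normalise to [0, 1]


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling by an integer factor (an assumption)."""
    return np.repeat(np.repeat(cam, factor, axis=0), factor, axis=1)


def fuse_pyramid(maps):
    """Fuse maps ordered coarse to fine by keeping the element-wise maximum."""
    fused = maps[0]
    for finer in maps[1:]:
        factor = finer.shape[0] // fused.shape[0]
        fused = np.maximum(upsample_nearest(fused, factor), finer)
    return fused


def bounding_box(cam, threshold=0.5):
    """Tightest (x_min, y_min, x_max, y_max) box around values >= threshold."""
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy example: a coarse 2x2 layer and a fine 4x4 layer, one channel each.
coarse_f = np.zeros((1, 2, 2))
coarse_f[0, 0, 0] = 1.0
fine_f = np.zeros((1, 4, 4))
fine_f[0, 3, 3] = 2.0
coarse_map = gradient_feature(coarse_f, np.ones((1, 2, 2)))
fine_map = gradient_feature(fine_f, np.ones((1, 4, 4)))
fused = fuse_pyramid([coarse_map, fine_map])
box = bounding_box(fused)
```

Taking the element-wise maximum keeps the fused map at a single channel, whereas concatenation would grow the channel count with every pyramid level and need a further reduction step before a box could be extracted.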






Data availability
The data that support the findings of this study are available in:
1) https://image-net.org/update-mar-11-2021.php.
2) http://www.vision.caltech.edu/datasets/cub_200_2011/.
Acknowledgements
The authors would like to thank the editor and the reviewers for their critical and constructive comments and suggestions. We would also like to thank Dr. Vasile Palade for proofreading the whole manuscript and providing valuable comments. This work was supported in part by the National Natural Science Foundation of China (Project Numbers: 61673194, 61672263, 61672265), and in part by the National First-Class Discipline Program of Light Industry Technology and Engineering (Project Number: LITE2018-25).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Mao, Z., Zhou, Y., Sun, J. et al. Weakly-supervised object localization with gradient-pyramid feature. Appl Intell 53, 2923–2935 (2023). https://doi.org/10.1007/s10489-022-03686-y
DOI: https://doi.org/10.1007/s10489-022-03686-y