Abstract
Benefiting from multi-scale feature pyramids, single-stage object detectors have recently achieved promising accuracy and fast inference. However, most existing feature pyramid detectors model the complex contextual relationships across scales only in a simple way. They lack effective modules that adaptively propagate appropriate semantic information from deeper layers, and they often ignore the finer spatial localization cues in lower layers. In this paper, we present a Local Enhancement and Bidirectional Feature Refinement Network (LFBFR), which includes two optimization methods that markedly improve detection accuracy. First, to make the backbone better suited to the detection task, we modify the pre-trained classification backbone to mitigate the loss of small-object details caused by the successive reduction of image resolution. Second, we propose a Bidirectional Feature Refinement Pyramid, which uses an attention residual refinement module and a feature reuse module to effectively exploit the inter-channel relationships of higher-level features and the fine appearance cues of lower-level features. Finally, to assess the performance of LFBFR, we embed it into the SSD framework to build a powerful end-to-end single-stage detector called LFBFR-SSD. Extensive experiments on PASCAL VOC and MS COCO show that LFBFR-SSD outperforms many state-of-the-art detectors while maintaining real-time speed.
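The abstract does not specify the internals of the attention residual refinement module or the feature reuse module. The pure-Python sketch below illustrates one plausible reading of a bidirectional refinement step between two adjacent pyramid levels. It rests on several assumptions that are not the authors' design. The top-down step uses squeeze-and-excitation style channel attention on the deeper map, then nearest-neighbour upsampling and a residual addition. The bottom-up step uses 2x2 max pooling of the finer map followed by element-wise addition. Both levels are assumed to have equal channel counts and a 2x resolution ratio. All function names are hypothetical.

```python
import math

# Feature maps are nested lists shaped C x H x W (channels, rows, columns).

def global_avg_pool(fmap):
    """Squeeze: average each channel to a single scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap, w1, w2):
    """SE-style excitation (assumed): FC -> ReLU -> FC -> sigmoid, one weight per channel."""
    s = global_avg_pool(fmap)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]

def upsample_nearest2x(ch):
    """Nearest-neighbour 2x upsampling of one H x W channel."""
    out = []
    for row in ch:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def maxpool2x(ch):
    """2x2 max pooling with stride 2 (assumes even H and W)."""
    return [[max(ch[i][j], ch[i][j + 1], ch[i + 1][j], ch[i + 1][j + 1])
             for j in range(0, len(ch[0]), 2)]
            for i in range(0, len(ch), 2)]

def top_down_refine(lower, higher, w1, w2):
    """Hypothetical attention residual step: reweight the deeper map's channels,
    upsample them to the finer resolution, and add them to the finer map."""
    weights = channel_attention(higher, w1, w2)
    refined = []
    for c, (lo_ch, hi_ch) in enumerate(zip(lower, higher)):
        up = upsample_nearest2x(hi_ch)
        refined.append([[l + weights[c] * u for l, u in zip(lr, ur)]
                        for lr, ur in zip(lo_ch, up)])
    return refined

def bottom_up_reuse(lower, higher):
    """Hypothetical feature reuse step: pool the finer map down and add it to the deeper map."""
    return [[[h + p for h, p in zip(hr, pr)]
             for hr, pr in zip(hi_ch, maxpool2x(lo_ch))]
            for lo_ch, hi_ch in zip(lower, higher)]

# Toy example: a 2 x 4 x 4 finer map and a 2 x 2 x 2 deeper map.
lower = [[[1.0, 2.0, 3.0, 4.0] for _ in range(4)] for _ in range(2)]
higher = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(2)]
w1 = [[0.0, 0.0]]    # 2 channels -> 1 hidden unit
w2 = [[0.0], [0.0]]  # 1 hidden unit -> 2 channel weights
print(top_down_refine(lower, higher, w1, w2)[0][0])  # [1.25, 2.25, 3.25, 4.25]
print(bottom_up_reuse(lower, higher)[1])             # [[2.5, 4.5], [2.5, 4.5]]
```

With all-zero excitation weights, each channel weight is sigmoid(0) = 0.5, so the top-down step adds 0.5 × 0.5 = 0.25 to every position of the finer map. In a real detector these operations would be learned convolutional layers in a deep learning framework. Plain lists are used here only to keep the sketch self-contained.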
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61371156 and the Key R&D Program of Anhui Province under Grant No. 201904d07020118. The authors would like to thank the anonymous reviewers for their helpful and constructive comments and suggestions on this manuscript.
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies that used human participants or animals.
About this article
Cite this article
Ouyang, P., Zhu, J., Fan, C. et al. Local Enhancement and Bidirectional Feature Refinement Network for Single-Shot Detector. Cogn Comput 14, 1107–1122 (2022). https://doi.org/10.1007/s12559-020-09814-5