Stacked Pyramid Attention Network for Object Detection

Hao, Shijie; Wang, Zhonghao; Sun, Fuming

doi:10.1007/s11063-021-10505-x

Published: 07 April 2021

Volume 54, pages 2759–2782, (2022)
Neural Processing Letters

Shijie Hao¹,
Zhonghao Wang¹ &
Fuming Sun²

Abstract

Scale variation is one of the primary challenges in object detection. Recent detectors address it with a variety of strategies and achieve promising performance, but limitations remain. On the one hand, features from the large-scale deep layers have relatively low localizing power. On the other hand, features from the small-scale shallow layers have relatively weak categorizing ability. These two limitations are complementary: each kind of feature can supply what the other lacks. We therefore propose the Stacked Pyramid Attention Network (SPANet) to bridge the gap between different scales. SPANet introduces two lightweight modules, the top-down feature map attention module (TDFAM) and the bottom-up feature map attention module (BUFAM). By learning channel attention and spatial attention, each module builds connections between features from adjacent scales. Progressively integrating BUFAM and TDFAM into two encoder–decoder structures yields two novel feature aggregating branches. These branches combine the localizing power of small-scale features with the categorizing power of large-scale features, improving detection accuracy while keeping the model lightweight. Extensive experiments on two challenging benchmarks, PASCAL VOC and MS COCO, demonstrate the effectiveness of SPANet and show that it reaches a competitive trade-off between accuracy and speed.
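
The idea of connecting adjacent scales through channel and spatial attention can be illustrated with a minimal NumPy sketch. This is not the authors' TDFAM: the abstract does not specify the module's layers, so the learned components are replaced here by parameter-free pooling and a sigmoid, and the function names (`upsample2x`, `top_down_attention_fuse`) and tensor shapes are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down_attention_fuse(shallow, deep):
    """Refine a shallow (C, 2H, 2W) map with a deep (C, H, W) map.

    Hypothetical, parameter-free stand-in for a learned attention module.
    """
    deep_up = upsample2x(deep)
    # Channel attention: one weight per channel from global average pooling.
    channel_att = sigmoid(deep.mean(axis=(1, 2)))[:, None, None]
    # Spatial attention: one weight per location from the channel-wise mean.
    spatial_att = sigmoid(deep_up.mean(axis=0, keepdims=True))
    # Reweight the shallow features, then add the upsampled deep features.
    return shallow * channel_att * spatial_att + deep_up


rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))  # finer-resolution features
deep = rng.standard_normal((8, 8, 8))       # coarser-resolution features
fused = top_down_attention_fuse(shallow, deep)
print(fused.shape)  # (8, 16, 16)
```

A bottom-up counterpart would presumably run in the opposite direction, downsampling the shallow map and using it to reweight the deep one. In a trained network the pooling-plus-sigmoid steps would be replaced by learned layers.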

Acknowledgements

The research was supported in part by the National Key Research and Development Program under Grant No. 2017YFC0820604, in part by the National Natural Science Foundation of China under Grant Nos. 61772171 and 62072152, and in part by the Fundamental Research Funds for the Central Universities under Grant Nos. PA2020GDKC0023 and PA2019GDZC0095.

Author information

Authors and Affiliations

Hefei University of Technology, Hefei, 230009, China
Shijie Hao & Zhonghao Wang
Dalian Minzu University, Dalian, 116600, China
Fuming Sun

Corresponding author

Correspondence to Shijie Hao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hao, S., Wang, Z. & Sun, F. Stacked Pyramid Attention Network for Object Detection. Neural Process Lett 54, 2759–2782 (2022). https://doi.org/10.1007/s11063-021-10505-x

Accepted: 21 March 2021
Published: 07 April 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11063-021-10505-x
