Abstract
Underwater object detection is a prerequisite for underwater robots to perform autonomous operation and ocean exploration. However, poor imaging quality, harsh underwater environments, and well-concealed targets greatly increase the difficulty of underwater object detection. To reduce underwater background interference and improve underwater object perception, we propose a multiple information perception-based attention module (MIPAM), which consists of five stages. In information preprocessing, spatial downsampling and channel splitting reduce the feature dimensions, bounding the module's parameters and computation. In information collection, channel-level and spatial-level information collection enrich semantic expression by perceiving multi-dimensional dependency, structure, and global information. In information interaction, channel-driven and spatial-driven information interaction exploit the intrinsic interaction potential by further perceiving multi-dimensional diversity information, and adaptive feature fusion with learnable weights further improves the interaction quality. In attention activation, a multi-branch structure improves calibration efficiency by generating multiple attention maps. In information postprocessing, channel concatenation and spatial upsampling restore the original feature dimensions, making the module plug-and-play. To meet the high-precision and real-time requirements of underwater object detection, we integrate MIPAM into YOLO detectors. Experimental results show that our method brings significant performance gains on underwater detection tasks. It also improves other detection tasks, demonstrating good generalization ability.
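The abstract describes the module at the level of its five stages; the shape bookkeeping is easiest to see in code. Below is a minimal PyTorch sketch of that pipeline. It is an illustration under stated assumptions, not the paper's implementation: the class MIPAMSketch and all of its operators (average pooling, the 1x1 and 7x7 convolutions, sigmoid gating, softmax-weighted fusion) are hypothetical stand-ins; only the stage ordering and the shape restoration follow the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MIPAMSketch(nn.Module):
    """Illustrative five-stage skeleton; the operators are assumptions,
    not the operators used in the paper."""

    def __init__(self, channels: int, groups: int = 2):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        split = channels // groups
        # Stage 3 (interaction): learnable weights for adaptive feature fusion.
        self.fuse = nn.Parameter(torch.ones(2))
        # Stages 2-4: one lightweight branch per information path.
        self.channel_fc = nn.Conv2d(split, split, kernel_size=1)
        self.spatial_conv = nn.Conv2d(split, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stage 1, preprocessing: spatial downsampling + channel splitting
        # shrink the dimensions that drive parameter and FLOP counts.
        y = F.avg_pool2d(x, kernel_size=2)  # assumes even H, W
        outs = []
        for chunk in torch.chunk(y, self.groups, dim=1):
            # Stage 2, collection: a channel-level descriptor via global
            # pooling; spatial-level context is gathered by the 7x7
            # convolution in the spatial path below.
            chan_desc = F.adaptive_avg_pool2d(chunk, 1)  # (B, C/g, 1, 1)
            # Stage 3, interaction: channel-driven and spatial-driven paths.
            chan_path = chunk * torch.sigmoid(self.channel_fc(chan_desc))
            spat_path = chunk * torch.sigmoid(self.spatial_conv(chunk))
            # Stage 4, activation: each branch contributes its own attention
            # map; adaptive fusion weights combine the recalibrated features.
            w = torch.softmax(self.fuse, dim=0)
            outs.append(w[0] * chan_path + w[1] * spat_path)
        # Stage 5, postprocessing: concatenate channels and upsample back,
        # so the output shape equals the input shape (plug-and-play).
        out = torch.cat(outs, dim=1)
        return F.interpolate(out, scale_factor=2, mode="nearest")
```

Because the output shape matches the input shape, such a module can be dropped between backbone or neck blocks of a YOLO detector without other changes; for example, MIPAMSketch(256)(torch.randn(1, 256, 40, 40)) returns a tensor of shape (1, 256, 40, 40).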
Data availability statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Change history
23 June 2023
A Correction to this paper has been published: https://doi.org/10.1007/s00371-023-02928-5
Acknowledgements
The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China under Grant 61370142, Grant 61802043, Grant 61272368, Grant 62176037 and Grant 62002041, in part by the Fundamental Research Funds for the Central Universities under Grant 3132016352 and Grant 3132021238, in part by the Dalian Science and Technology Innovation Fund under Grant 2018J12GX037, Grant 2019J11CY001 and Grant 2021JJ12GX028, in part by the Liaoning Revitalization Talents Program under Grant XLYC1908007, in part by the Liaoning Doctoral Research Start-up Fund Project under Grant 2021-BS-075, and in part by the China Postdoctoral Science Foundation under Grant 3620080307.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the author's corrections had not been carried out.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, X., Wang, H., Cui, T. et al. Multiple information perception-based attention in YOLO for underwater object detection. Vis Comput 40, 1415–1438 (2024). https://doi.org/10.1007/s00371-023-02858-2
DOI: https://doi.org/10.1007/s00371-023-02858-2