Multiple information perception-based attention in YOLO for underwater object detection

  • Original article
The Visual Computer

A Publisher Correction to this article was published on 23 June 2023

This article has been updated

Abstract

Underwater object detection is a prerequisite for underwater robots to operate autonomously and explore the ocean. However, poor imaging quality, harsh underwater environments, and concealed targets make underwater object detection considerably more difficult. To reduce background interference and improve object perception, we propose a multiple information perception-based attention module (MIPAM), which consists mainly of five processes. In information preprocessing, spatial downsampling and channel splitting limit the parameters and computation of the attention module by reducing dimension sizes. In information collection, channel-level and spatial-level information collection enhance the expression of semantic information by perceiving multi-dimensional dependency, structure, and global information. In information interaction, channel-driven and spatial-driven information interaction stimulate the intrinsic interaction potential by further perceiving multi-dimensional diversity information, and adaptive feature fusion further improves the interaction quality by allocating learnable parameters. In attention activation, a multi-branch structure improves the efficiency of attention calibration by generating multiple attentions. In information postprocessing, channel concatenation and spatial upsampling make the attention module plug-and-play by restoring the original feature states. To meet the high-precision and real-time requirements of underwater object detection, we integrate MIPAM into YOLO detectors. The experimental results indicate that our work brings significant performance gains on underwater detection tasks. It also yields some performance improvements on other detection tasks, which suggests good generalization ability.
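
The five processes named in the abstract can be pictured with a shape-level sketch. The abstract does not specify the operators inside MIPAM, so everything below is an assumption made for illustration: the function name `toy_attention`, average pooling as the downsampling step, mean-based descriptors for information collection, a softmax-weighted sum standing in for adaptive feature fusion with learnable parameters, a sigmoid for attention activation, and nearest-neighbour repetition for upsampling. It is not the authors' module; it only shows how such a pipeline can return a tensor with the same shape as its input, which is what makes an attention block plug-and-play.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def toy_attention(x, groups=2, stride=2, fusion_logits=(0.0, 0.0)):
    """Illustrative five-stage attention on a (C, H, W) feature map.

    Hypothetical stand-in for the stages named in the abstract;
    the real MIPAM operators are not given there.
    """
    c, h, w = x.shape
    assert c % groups == 0 and h % stride == 0 and w % stride == 0

    # 1. Preprocessing: spatial downsampling (average pooling) and channel splitting.
    pooled = x.reshape(c, h // stride, stride, w // stride, stride).mean(axis=(2, 4))
    chunks = np.split(pooled, groups, axis=0)

    # Fusion weights: softmax over two scalars that would be learned in practice.
    logits = np.asarray(fusion_logits, dtype=float)
    alpha = np.exp(logits - logits.max())
    alpha = alpha / alpha.sum()

    masks = []
    for g in chunks:
        # 2. Collection: one descriptor per channel and one per spatial position.
        channel_desc = g.mean(axis=(1, 2), keepdims=True)  # shape (C/groups, 1, 1)
        spatial_desc = g.mean(axis=0, keepdims=True)       # shape (1, H', W')

        # 3. Interaction: weighted sum of the two descriptors (broadcast to full size).
        fused = alpha[0] * channel_desc + alpha[1] * spatial_desc

        # 4. Activation: one attention mask per branch, values in (0, 1).
        masks.append(sigmoid(fused))

    # 5. Postprocessing: channel concatenation and nearest-neighbour upsampling.
    mask = np.concatenate(masks, axis=0)
    mask = mask.repeat(stride, axis=1).repeat(stride, axis=2)

    # Recalibrate the original features; the output shape equals the input shape.
    return x * mask


if __name__ == "__main__":
    features = np.random.default_rng(0).standard_normal((4, 8, 8))
    out = toy_attention(features)
    print(features.shape, out.shape)
```

Because the mask is computed on downsampled, split features and only expanded at the end, the cost of the attention branch in this sketch scales with the reduced size rather than the full feature map, which is the role the abstract assigns to the preprocessing step.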

Data availability statement

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

The authors gratefully acknowledge financial support from the National Natural Science Foundation of China under Grants 61370142, 61802043, 61272368, 62176037 and 62002041; the Fundamental Research Funds for the Central Universities under Grants 3132016352 and 3132021238; the Dalian Science and Technology Innovation Fund under Grants 2018J12GX037, 2019J11CY001 and 2021JJ12GX028; the Liaoning Revitalization Talents Program under Grant XLYC1908007; the Liaoning Doctoral Research Start-up Fund Project under Grant 2021-BS-075; and the China Postdoctoral Science Foundation under Grant 3620080307.

Author information

Corresponding author

Correspondence to Xianping Fu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the corrections of the author were not carried out.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shen, X., Wang, H., Cui, T. et al. Multiple information perception-based attention in YOLO for underwater object detection. Vis Comput 40, 1415–1438 (2024). https://doi.org/10.1007/s00371-023-02858-2

  • DOI: https://doi.org/10.1007/s00371-023-02858-2
