Abstract
Underwater object detection is a prerequisite for underwater robots to perform autonomous operation and ocean exploration. However, poor imaging quality, harsh underwater environments, and well-concealed targets greatly increase the difficulty of underwater object detection. To reduce underwater background interference and improve underwater object perception, we propose a multiple information perception-based attention module (MIPAM), which consists of five stages. In information preprocessing, spatial downsampling and channel splitting reduce the feature dimensions, bounding the module's parameters and computation. In information collection, channel-level and spatial-level information collection enrich semantic expression by perceiving multi-dimensional dependency, structure, and global information. In information interaction, channel-driven and spatial-driven information interaction exploit the intrinsic interaction potential by further perceiving multi-dimensional diversity information, and adaptive feature fusion with learnable weights further improves the interaction quality. In attention activation, a multi-branch structure improves calibration efficiency by generating multiple attention maps. In information postprocessing, channel concatenation and spatial upsampling restore the original feature dimensions, making the module plug-and-play. To meet the high-precision and real-time requirements of underwater object detection, we integrate MIPAM into YOLO detectors. Experimental results show that our method brings significant performance gains on underwater detection tasks. It also improves other detection tasks, demonstrating good generalization ability.
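The abstract describes the module at the level of its five stages; the shape bookkeeping is easiest to see in code. Below is a minimal PyTorch sketch of that pipeline. It is an illustration under stated assumptions, not the paper's implementation: the class MIPAMSketch and all of its operators (average pooling, the 1x1 and 7x7 convolutions, sigmoid gating, softmax-weighted fusion) are hypothetical stand-ins; only the stage ordering and the shape restoration follow the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MIPAMSketch(nn.Module):
    """Illustrative five-stage skeleton; the operators are assumptions,
    not the operators used in the paper."""

    def __init__(self, channels: int, groups: int = 2):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        split = channels // groups
        # Stage 3 (interaction): learnable weights for adaptive feature fusion.
        self.fuse = nn.Parameter(torch.ones(2))
        # Stages 2-4: one lightweight branch per information path.
        self.channel_fc = nn.Conv2d(split, split, kernel_size=1)
        self.spatial_conv = nn.Conv2d(split, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stage 1, preprocessing: spatial downsampling + channel splitting
        # shrink the dimensions that drive parameter and FLOP counts.
        y = F.avg_pool2d(x, kernel_size=2)  # assumes even H, W
        outs = []
        for chunk in torch.chunk(y, self.groups, dim=1):
            # Stage 2, collection: a channel-level descriptor via global
            # pooling; spatial-level context is gathered by the 7x7
            # convolution in the spatial path below.
            chan_desc = F.adaptive_avg_pool2d(chunk, 1)  # (B, C/g, 1, 1)
            # Stage 3, interaction: channel-driven and spatial-driven paths.
            chan_path = chunk * torch.sigmoid(self.channel_fc(chan_desc))
            spat_path = chunk * torch.sigmoid(self.spatial_conv(chunk))
            # Stage 4, activation: each branch contributes its own attention
            # map; adaptive fusion weights combine the recalibrated features.
            w = torch.softmax(self.fuse, dim=0)
            outs.append(w[0] * chan_path + w[1] * spat_path)
        # Stage 5, postprocessing: concatenate channels and upsample back,
        # so the output shape equals the input shape (plug-and-play).
        out = torch.cat(outs, dim=1)
        return F.interpolate(out, scale_factor=2, mode="nearest")
```

Because the output shape matches the input shape, such a module can be dropped between backbone or neck blocks of a YOLO detector without other changes; for example, MIPAMSketch(256)(torch.randn(1, 256, 40, 40)) returns a tensor of shape (1, 256, 40, 40).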
Data availability statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Change history
23 June 2023
A Correction to this paper has been published: https://doi.org/10.1007/s00371-023-02928-5
Acknowledgements
The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China under Grant 61370142, Grant 61802043, Grant 61272368, Grant 62176037 and Grant 62002041, in part by the Fundamental Research Funds for the Central Universities under Grant 3132016352 and Grant 3132021238, in part by the Dalian Science and Technology Innovation Fund under Grant 2018J12GX037, Grant 2019J11CY001 and Grant 2021JJ12GX028, in part by the Liaoning Revitalization Talents Program under Grant XLYC1908007, in part by the Liaoning Doctoral Research Start-up Fund Project under Grant 2021-BS-075, and in part by the China Postdoctoral Science Foundation under Grant 3620080307.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the author's corrections had not been carried out.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, X., Wang, H., Cui, T. et al. Multiple information perception-based attention in YOLO for underwater object detection. Vis Comput 40, 1415–1438 (2024). https://doi.org/10.1007/s00371-023-02858-2
DOI: https://doi.org/10.1007/s00371-023-02858-2