Abstract
Successful salient object detection depends largely on large-scale datasets with fine-grained annotations. However, pixel-level annotation is far more laborious than weak labelling, and little research has addressed high-resolution images. To mitigate these drawbacks, we propose a distinctive network that explores salient objects in high-resolution images under scribble supervision, and we relabel an existing high-resolution dataset with scribbles to form Scr-HRSOD, in which each image is labelled in a few seconds. Since scribble labels lack structural information about objects, a boundary structure maintenance branch with shallow layers is introduced to capture low-level spatial details. Under the constraint of the boundary branch, a lightweight contextual semantic branch processes compressed inputs to obtain high-level semantic context and iteratively propagates the partially annotated pixels to surrounding similar regions, which are then employed as pseudo-labels to supervise the network. Extensive evaluations on five datasets demonstrate the effectiveness of the proposed method. On the HRSOD datasets, we achieve an Fmax of 0.861 and an Sm of 0.887, outperforming the leading existing weakly supervised methods and even fully supervised methods.
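The idea of spreading sparse scribble labels to surrounding similar regions can be illustrated with a minimal sketch. It is not the network described in the abstract: it assumes raw pixel intensity stands in for learned features, uses a fixed distance threshold, and grows labels over 4-connected neighbours only. The names `propagate_scribbles`, `UNKNOWN`, `tau` and `iters` are hypothetical and chosen for illustration.

```python
import numpy as np

UNKNOWN = -1  # hypothetical marker for pixels that carry no scribble


def propagate_scribbles(image, labels, tau=0.1, iters=10):
    """Grow scribble labels into similar 4-connected neighbours.

    image:  (H, W) or (H, W, C) float array, used here as the feature map
    labels: (H, W) int array, 1 = salient, 0 = background, UNKNOWN elsewhere
    tau:    largest feature distance across which a label may spread (assumed)
    iters:  maximum number of propagation rounds (assumed)
    """
    h, w = labels.shape
    feats = image.reshape(h, w, -1).astype(np.float64)
    out = labels.copy()

    # Each pair is (receiving pixels, donor neighbours) for one direction.
    full = slice(None)
    shifts = [
        ((slice(1, None), full), (slice(None, -1), full)),  # donor above
        ((slice(None, -1), full), (slice(1, None), full)),  # donor below
        ((full, slice(1, None)), (full, slice(None, -1))),  # donor to the left
        ((full, slice(None, -1)), (full, slice(1, None))),  # donor to the right
    ]

    for _ in range(iters):
        new = out.copy()
        for dst, src in shifts:
            dist = np.linalg.norm(feats[dst] - feats[src], axis=-1)
            # A pixel takes a label only if it is still unlabelled, its
            # neighbour was labelled in the previous round, and the two
            # pixels are similar. The first matching direction wins.
            take = (new[dst] == UNKNOWN) & (out[src] != UNKNOWN) & (dist < tau)
            view = new[dst]              # basic slicing returns a view of `new`
            view[take] = out[src][take]
        if np.array_equal(new, out):     # nothing changed, so stop early
            break
        out = new
    return out
```

Pixels still marked `UNKNOWN` after propagation would simply be left out of the loss when the grown map is used as a pseudo-label. In the abstract's setting the similarity would come from the network's own semantic context under the boundary constraint rather than from raw intensity.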
Notes
The annotation tool we employed is the scribble annotation tool in Image Labeler in Matlab R2019b.
The Scr-HRSOD datasets are publicly available at https://github.com/YQP-CV/Scribble-Supervised-HRSOD, and our code will be released as open source.
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Nos. 62002100, 61802111 and 62176088) and the Science and Technology Foundation of Henan Province of China (No. 212102210156).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Yang, Q., Zhou, Y., Chai, X. et al. Exploring class-agnostic pixels for scribble-supervised high-resolution salient object detection. Neural Comput & Applic 35, 3469–3482 (2023). https://doi.org/10.1007/s00521-022-07915-w
DOI: https://doi.org/10.1007/s00521-022-07915-w