Abstract
Research on the Salient Object Detection (SOD) task has made many advances based on deep learning methods. However, most methods have focused on predicting a fine mask rather than finding the most salient objects. Most datasets for the SOD task likewise emphasize pixel-wise accuracy rather than "saliency". In this study, we used the Salient Objects in Clutter (SOC) dataset to conduct research that focuses more on the saliency of objects. We propose an architecture that extends the cross-attention mechanism of the Transformer to the DETR architecture to learn the relationship between global image semantics and the objects. We further extend the network with a Saliency Attention (SA) module, namely SA-DETR, to detect salient objects based on object-level saliency. Our proposed method, combining cross- and saliency-attention, shows superior results in detecting salient objects among multiple objects compared with other methods. We demonstrate the effectiveness of our proposed method by showing that it outperforms the state-of-the-art SOD method by 4.7% in MAE and 0.2% in mean E-measure.
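To make the architectural idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of a DETR-style decoder layer in which object queries cross-attend to global image features, followed by a hypothetical saliency-scoring head that assigns an object-level saliency score to each query. All module names, dimensions, and the specific form of the saliency head are assumptions made for illustration only.

    # Minimal sketch in PyTorch: a DETR-style decoder layer with cross-attention
    # between object queries and image features, plus a hypothetical
    # saliency-scoring head. Names and dimensions are illustrative assumptions.
    import torch
    import torch.nn as nn

    class CrossAttentionDecoderLayer(nn.Module):
        def __init__(self, d_model=256, n_heads=8):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                     nn.Linear(4 * d_model, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.norm3 = nn.LayerNorm(d_model)

        def forward(self, queries, memory):
            # queries: (B, N, d) object queries; memory: (B, HW, d) flattened image features
            q = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
            # Cross-attention lets each object query attend to global image semantics
            q = self.norm2(q + self.cross_attn(q, memory, memory)[0])
            return self.norm3(q + self.ffn(q))

    class SaliencyAttentionHead(nn.Module):
        # Hypothetical head: predicts a per-query saliency score in [0, 1]
        def __init__(self, d_model=256):
            super().__init__()
            self.score = nn.Linear(d_model, 1)

        def forward(self, queries):
            return torch.sigmoid(self.score(queries))  # (B, N, 1) object-level saliency

    # Toy usage
    layer = CrossAttentionDecoderLayer()
    head = SaliencyAttentionHead()
    queries = torch.randn(2, 100, 256)     # 100 object queries
    memory = torch.randn(2, 32 * 32, 256)  # flattened encoder features
    saliency = head(layer(queries, memory))
    print(saliency.shape)  # torch.Size([2, 100, 1])

In such a design, the per-query saliency scores could be used to rank or weight the predicted object masks, so that the network selects the most salient objects rather than only producing fine masks; the exact mechanism used in SA-DETR is described in the full paper.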









Data availability
All data generated or analysed during this study are included in the published article [9], and the authors confirm that the datasets are indicated in the reference list.
References
Achanta R, Hemami S, Estrada F et al (2009) Frequency-tuned salient region detection. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 1597–1604
Brahim K, Kalboussi R, Abdellaoui M et al (2019) Spatio-temporal saliency detection using objectness measure. Signal Image Video Process 13:1055–1062
Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer, pp 213–229
Chen Q, Wang J, Han C et al (2022) Group detr v2: Strong object detector with encoder-decoder pretraining. arXiv preprint arXiv:2211.03594
Cheng MM, Zhang Z, Lin WY et al (2014) Bing: Binarized normed gradients for objectness estimation at 300fps. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3286–3293
Dosovitskiy A, Beyer L, Kolesnikov A et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Fan DP, Cheng MM, Liu Y et al (2017) Structure-measure: a new way to evaluate foreground maps. In: Proceedings of the IEEE international conference on computer vision, pp 4548–4557
Fan DP, Gong C, Cao Y et al (2018) Enhanced-alignment measure for binary foreground map evaluation. arXiv preprint arXiv:1805.10421
Fan DP, Zhang J, Xu G et al (2022) Salient objects in clutter. IEEE Trans Pattern Anal Mach Intell 45(2):2344–2366
Fang Y, Wang W, Xie B et al (2022) Eva: Exploring the limits of masked visual representation learning at scale. arXiv preprint arXiv:2211.07636
Harel J, Koch C, Perona P (2006) Graph-based visual saliency. Advances in neural information processing systems 19
Hou Q, Cheng MM, Hu X et al (2019) Deeply supervised salient object detection with short connections. IEEE Trans Pattern Anal Mach Intell 41(4):815–828. https://doi.org/10.1109/TPAMI.2018.2815688
Hou X, Zhang L (2007) Saliency detection: A spectral residual approach. In: 2007 IEEE Conference on computer vision and pattern recognition. IEEE, pp 1–8
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259
Li G, Yu Y (2016) Deep contrast learning for salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 478–487
Li Y, Hou X, Koch C et al (2014) The secrets of salient object segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 280–287
Liu JJ, Hou Q, Cheng MM et al (2019) A simple pooling-based design for real-time salient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3917–3926
Liu N, Zhang N, Wan K et al (2021) Visual saliency transformer. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4722–4732
Liu Y, Cheng MM, Hu X et al (2017) Richer convolutional features for edge detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3000–3009
Liu Z, Lin Y, Cao Y et al (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022
Luo Z, Mishra A, Achkar A et al (2017) Non-local deep features for salient object detection. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 6609–6617
Nguyen T (2015) Salient object detection via objectness proposals. In: Proceedings of the AAAI Conference on Artificial Intelligence
Pan J, Sayrol E, Nieto XG et al (2017) Salgan: Visual saliency prediction with adversarial networks. In: CVPR scene understanding workshop (SUNw)
Perazzi F, Krähenbühl P, Pritch Y et al (2012) Saliency filters: contrast based filtering for salient region detection. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 733–740
Qin X, Zhang Z, Huang C et al (2019) Basnet: Boundary-aware salient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7479–7489
Qin X, Zhang Z, Huang C et al (2020) U2-net: Going deeper with nested u-structure for salient object detection. Pattern Recognit 106:107404
Srivatsa RS, Babu RV (2015) Salient object detection via objectness measure. In: 2015 IEEE international conference on image processing (ICIP). IEEE, pp 4481–4485
Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Advances in neural information processing systems 30
Wang L, Lu H, Wang Y et al (2017) Learning to detect salient objects with image-level supervision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 136–145
Wei J, Wang S, Huang Q (2020) F³Net: fusion, feedback and focus for salient object detection. In: Proceedings of the AAAI conference on artificial intelligence, pp 12321–12328
Wu Z, Su L, Huang Q (2019) Stacked cross refinement network for edge-aware salient object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 7264–7273
Yang C, Zhang L, Lu H et al (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3166–3173
Zaidi SSA, Ansari MS, Aslam A et al (2022) A survey of modern deep learning based object detection models. Digit Signal Process 126:103514
Zhang J, Fan DP, Dai Y et al (2021) Uncertainty inspired rgb-d saliency detection. IEEE Trans Pattern Anal Mach Intell 44(9):5761–5779
Zhang P, Wang D, Lu H et al (2017) Learning uncertain convolutional features for accurate saliency detection. In: Proceedings of the IEEE International Conference on computer vision, pp 212–221
Zhao JX, Liu JJ, Fan DP et al (2019) Egnet: Edge guidance network for salient object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8779–8788
Zhuge M, Fan DP, Liu N et al (2022) Salient object detection via integrity learning. IEEE Trans Pattern Anal Mach Intell 45(3):3738–3752
Zong Z, Song G, Liu Y (2022) Detrs with collaborative hybrid assignments training. arXiv preprint arXiv:2211.12860
Acknowledgements
This work was supported by the Soongsil University Research Fund (New Professor Support Research) of 2021.
Funding
Soongsil University, New Professor Support Research of 2021, Minyoung Chung.
Author information
Authors and Affiliations
Contributions
Kwangwoon Nam: Methodology, Software, Investigation, Data curation, Writing - original draft. Jeeheon Kim: Conceptualization, Supervision, Writing - review. Heeyeon Kim: Experiments. Minyoung Chung: Conceptualization, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
Not applicable.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nam, K., Kim, J., Kim, H. et al. SA-DETR: Saliency Attention-based DETR for salient object detection. Pattern Anal Applic 28, 5 (2025). https://doi.org/10.1007/s10044-024-01379-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10044-024-01379-5