A Self-Supervised Few-Shot Semantic Segmentation Method Based on Multi-Task Learning and Dense Attention Computation
Abstract
1. Introduction
- A self-supervised few-shot segmentation method based on a multi-task learning paradigm is proposed. The salient part of each image, obtained without supervision, is split into two parts: one part serves as the support-image mask for few-shot segmentation, while the other part and the entire image are each used to calculate a cross-entropy loss against the prediction results. Training on both targets realizes multi-task learning and improves generalization ability;
- An efficient few-shot segmentation network based on dense attention computation is proposed. A Swin Transformer performs multi-scale feature extraction so that pixel-level correlations at multiple scales are fully exploited.
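
To make the first contribution more concrete, the following is a minimal, dependency-free sketch of the mask-splitting and multi-task loss idea. Several details are assumptions made for illustration and are not taken from the paper: the vertical cut through the middle of the salient columns, the reading of "the entire image" as the full saliency mask, the `weight` balancing factor, and all function names.

```python
import math


def split_saliency_mask(mask):
    """Split a binary saliency mask (list of rows) into two disjoint parts.

    Assumption: the cut is vertical, through the middle of the salient
    columns. The source only states that the salient part is split in two.
    """
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not cols:
        raise ValueError("mask has no salient pixels")
    cut = (min(cols) + max(cols) + 1) // 2
    part_a = [[v if x < cut else 0 for x, v in enumerate(row)] for row in mask]
    part_b = [[v if x >= cut else 0 for x, v in enumerate(row)] for row in mask]
    return part_a, part_b


def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between foreground
    probabilities and a 0/1 target mask of the same shape."""
    total, count = 0.0, 0
    for p_row, t_row in zip(probs, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def multi_task_loss(pred_part, pred_whole, part_b, full_mask, weight=1.0):
    """Sum of two cross-entropy terms: one against the held-out part,
    one against the whole mask. `weight` is a hypothetical balance factor."""
    part_term = binary_cross_entropy(pred_part, part_b)
    whole_term = binary_cross_entropy(pred_whole, full_mask)
    return part_term + weight * whole_term


if __name__ == "__main__":
    saliency = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
    support_mask, query_target = split_saliency_mask(saliency)
    uniform = [[0.5] * 4 for _ in range(2)]  # an uninformative prediction
    print(multi_task_loss(uniform, uniform, query_target, saliency))
```

In this sketch `support_mask` would play the role of the support-image mask, while `query_target` and the full `saliency` mask supply the two supervision signals.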
2. Related Works
2.1. Few-Shot Semantic Segmentation with Fully Supervised Learning
2.2. Self-Supervised Learning for Image Semantic Segmentation
2.3. FSS Vision Transformers
3. Method
3.1. Problem Definition
3.2. Framework
Algorithm 1. FSS self-supervised framework based on multi-task learning
3.3. MLDAC Network Architecture
3.3.1. Feature Extraction and Masking
3.3.2. Dense Attention Computation Block (DACB)
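
As a rough illustration of what dense attention between query and support pixels can look like, the sketch below lets every query pixel attend over all support pixels with scaled dot-product attention and aggregates the support mask labels with the resulting weights. This is a generic formulation in the spirit of the dense cross-query-and-support attention work by Shi et al. listed in the references; it is not the DACB design itself, and the function name, the flattened per-pixel feature layout, and the use of mask labels as attention values are all assumptions.

```python
import math


def dense_attention_mask(query_feats, support_feats, support_mask):
    """For each query pixel, softmax-attend over every support pixel and
    return the attention-weighted average of the support mask labels.

    query_feats:   list of per-pixel feature vectors from the query image
    support_feats: list of per-pixel feature vectors from the support image
    support_mask:  one foreground label (0.0 or 1.0) per support pixel
    """
    dim = len(query_feats[0])
    scale = 1.0 / math.sqrt(dim)
    predictions = []
    for q in query_feats:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in support_feats]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        predictions.append(sum(w * m for w, m in zip(weights, support_mask)) / norm)
    return predictions


if __name__ == "__main__":
    support = [[10.0, 0.0], [0.0, 10.0]]  # two support pixels
    labels = [1.0, 0.0]                   # first is foreground
    query = [[10.0, 0.0], [0.0, 10.0]]    # each query pixel matches one support pixel
    print(dense_attention_mask(query, support, labels))
```

The cost grows with the product of the number of query pixels and support pixels, which is one reason such correlations are usually computed on down-sampled feature maps at several scales rather than at full resolution.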
3.3.3. Inter-Scale Mixing and Up-Sampling Module
4. Experiments and Results
4.1. Implementation Details
4.2. Comparison with Other Popular Methods
4.3. Analysis of the Computational Complexity
4.4. Visualization Results
4.5. Ablation Study
4.5.1. Multi-Task Learning Parameter Settings
4.5.2. The Architecture of MLDAC
4.5.3. Configuration of Learnable Absolute PE and Dense Skip Connections
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Correction Statement
References
1. Kim, M.Y.; Kim, S.; Lee, B.; Kim, J. Enhancing Deep Learning-Based Segmentation Accuracy through Intensity Rendering and 3D Point Interpolation Techniques to Mitigate Sensor Variability. Sensors 2024, 24, 4475.
2. Jun, W.; Yoo, J.; Lee, S. Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System. Sensors 2024, 24, 4205.
3. You, L.; Zhu, R.; Kwan, M.; Chen, M.; Zhang, F.; Yang, B.; Wong, M.; Qin, Z. Unraveling adaptive changes in electric vehicle charging behavior toward the postpandemic era by federated meta-learning. Innovation 2024, 5.
4. Liu, S.; You, L.; Zhu, R.; Liu, B.; Liu, R.; Yu, H.; Yuen, C. AFM3D: An Asynchronous Federated Meta-Learning Framework for Driver Distraction Detection. In IEEE Transactions on Intelligent Transportation Systems; IEEE: Piscataway, NJ, USA, 2024.
5. Wang, K.; Liew, J.; Zou, Y.; Zhou, D.; Feng, J. Panet: Few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9197–9206.
6. Zhang, C.; Lin, G.; Liu, F.; Yao, R.; Shen, C. Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5217–5226.
7. Min, J.; Kang, D.; Cho, M. Hypercorrelation squeeze for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6941–6952.
8. Zhou, T.; Wang, W.; Konukoglu, E.; Van Gool, L. Rethinking semantic segmentation: A prototype view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2582–2593.
9. Yang, B.; Wan, F.; Liu, C.; Li, B.; Ji, X.; Ye, Q. Part-based semantic transform for few-shot semantic segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 7141–7152.
10. Van Gansbeke, W.; Vandenhende, S.; Georgoulis, S.; Van Gool, L. Unsupervised semantic segmentation by contrasting object mask proposals. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10052–10062.
11. Amac, M.; Sencan, A.; Baran, B.; Ikizler-Cinbis, N.; Cinbis, R. MaskSplit: Self-supervised meta-learning for few-shot semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 1067–1077.
12. Karimijafarbigloo, S.; Azad, R.; Merhof, D. Self-supervised few-shot learning for semantic segmentation: An annotation-free approach. arXiv 2023, arXiv:2307.14446.
13. Shaban, A.; Bansal, S.; Liu, Z.; Essa, I.; Boots, B. One-shot learning for semantic segmentation. arXiv 2017, arXiv:1709.03410.
14. Zhuge, Y.; Shen, C. Deep reasoning network for few-shot semantic segmentation. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 5344–5352.
15. Liu, L.; Cao, J.; Liu, M.; Guo, Y.; Chen, Q.; Tan, M. Dynamic extension nets for few-shot semantic segmentation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1441–1449.
16. Wang, W.; Zhou, T.; Yu, F.; Dai, J.; Konukoglu, E.; Van Gool, L. Exploring cross-image pixel contrast for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7303–7313.
17. Dong, N.; Xing, E. Few-shot semantic segmentation with prototype learning. BMVC 2018, 3, 4.
18. Lang, C.; Cheng, G.; Tu, B.; Han, J. Learning what not to segment: A new perspective on few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8057–8067.
19. Zhang, X.; Wei, Y.; Li, Z.; Yan, C.; Yang, Y. Rich embedding features for one-shot semantic segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6484–6493.
20. Zhang, C.; Lin, G.; Liu, F.; Guo, J.; Wu, Q.; Yao, R. Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9587–9595.
21. Wang, H.; Zhang, X.; Hu, Y.; Yang, Y.; Cao, X.; Zhen, X. Few-shot semantic segmentation with democratic attention networks. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part XIII; Springer: Cham, Switzerland, 2020; pp. 730–746.
22. Liu, B.; Jiao, J.; Ye, Q. Harmonic feature activation for few-shot semantic segmentation. IEEE Trans. Image Process. 2021, 30, 3142–3153.
23. Yang, X.; Wang, B.; Chen, K.; Zhou, X.; Yi, S.; Ouyang, W.; Zhou, L. Brinet: Towards bridging the intra-class and inter-class gaps in one-shot segmentation. arXiv 2020, arXiv:2008.06226.
24. Tian, P.; Wu, Z.; Qi, L.; Wang, L.; Shi, Y.; Gao, Y. Differentiable meta-learning model for few-shot semantic segmentation. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12087–12094.
25. Boudiaf, M.; Kervadec, H.; Masud, Z.; Piantanida, P.; Ben Ayed, I.; Dolz, J. Few-shot segmentation without meta-learning: A good transductive inference is all you need? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13979–13988.
26. Wu, Z.; Shi, X.; Lin, G.; Cai, J. Learning meta-class memory for few-shot semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 517–526.
27. Xie, G.; Xiong, H.; Liu, J.; Yao, Y.; Shao, L. Few-shot semantic segmentation with cyclic memory network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7293–7302.
28. Li, G.; Kang, G.; Liu, W.; Wei, Y.; Yang, Y. Content-consistent matching for domain adaptive semantic segmentation. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2020; pp. 440–456.
29. Subhani, M.; Ali, M. Learning from scale-invariant examples for domain adaptation in semantic segmentation. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part XXII; Springer: Cham, Switzerland, 2020; pp. 290–306.
30. Wen, X.; Zhao, B.; Zheng, A.; Zhang, X.; Qi, X. Self-supervised visual representation learning with semantic grouping. Adv. Neural Inf. Process. Syst. 2022, 35, 16423–16438.
31. Araslanov, N.; Roth, S. Self-supervised augmentation consistency for adapting semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15384–15394.
32. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
33. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
34. Lu, Z.; He, S.; Zhu, X.; Zhang, L.; Song, Y.; Xiang, T. Simpler is better: Few-shot semantic segmentation with classifier weight transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 8741–8750.
35. Zhang, G.; Kang, G.; Yang, Y.; Wei, Y. Few-shot segmentation via cycle-consistent transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 21984–21996.
36. Shi, X.; Wei, D.; Zhang, Y.; Lu, D.; Ning, M.; Chen, J.; Ma, K.; Zheng, Y. Dense cross-query-and-support attention weighted mask aggregation for few-shot segmentation. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 151–168.
37. Everingham, M.; Van Gool, L.; Williams, C.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
38. Lin, T.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C. Microsoft coco: Common objects in context. In Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014, Proceedings, Part V; Springer: Cham, Switzerland, 2014; pp. 740–755.
39. Li, X.; Wei, T.; Chen, Y.; Tai, Y.; Tang, C. Fss-1000: A 1000-class dataset for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2869–2878.
40. Yang, L.; Zhuo, W.; Qi, L.; Shi, Y.; Gao, Y. Mining latent classes for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 8721–8730.
41. Liu, Y.; Liu, N.; Yao, X.; Han, J. Intermediate prototype mining transformer for few-shot semantic segmentation. Adv. Neural Inf. Process. Syst. 2022, 35, 38020–38031.
42. Yang, Y.; Chen, Q.; Feng, Y.; Huang, T. MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7131–7140.
43. Tian, Z.; Zhao, H.; Shu, M.; Yang, Z.; Li, R.; Jia, J. Prior guided feature enrichment network for few-shot segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1050–1065.
44. Codella, N.; Rotemberg, V.; Tschandl, P.; Celebi, M.; Dusza, S.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.; et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic). arXiv 2019, arXiv:1902.03368.
45. Lei, S.; Zhang, X.; He, J.; Chen, F.; Du, B.; Lu, C. Cross-domain few-shot semantic segmentation. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 73–90.
46. Chen, H.; Dong, Y.; Lu, Z.; Yu, Y.; Han, J. Pixel Matching Network for Cross-Domain Few-Shot Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 978–987.
Method | Fold 0 | Fold 1 | Fold 2 | Fold 3 | avg |
---|---|---|---|---|---|
Supervised approaches | | | | | |
CWT [34] | 56.9 | 65.2 | 61.2 | 48.8 | 58.0 |
DAN [21] | 54.7 | 68.6 | 57.8 | 51.6 | 58.2 |
MLC [40] | 60.8 | 71.3 | 61.5 | 56.9 | 62.6 |
HSNet [7] | 67.3 | 72.3 | 62.0 | 63.1 | 66.2 |
CyCTR [35] | 69.3 | 72.7 | 56.5 | 58.6 | 64.3 |
IPMT [41] | 71.6 | 73.5 | 58.0 | 61.2 | 66.1 |
MIANet [42] | 68.5 | 75.8 | 67.5 | 63.2 | 68.7 |
Self-supervised approaches | | | | | |
Saliency * [10] | 51.5 | 49.1 | 48.1 | 39.0 | 46.9 |
MaskContrast * [10] | 53.6 | 50.7 | 50.7 | 39.9 | 48.7 |
IPMT * [41] | 57.9 | 57.2 | 55.4 | 43.9 | 53.6 |
MIANet * [42] | 57.2 | 56.8 | 55.9 | 45.2 | 53.8 |
MaskSplit [11] | 54.1 | 57.1 | 54.8 | 46.1 | 53.0 |
Ours | 58.4 | 57.9 | 58.7 | 46.0 | 55.1 |
Method | Fold 0 | Fold 1 | Fold 2 | Fold 3 | avg | FSS-1000 |
---|---|---|---|---|---|---|
Supervised approaches | | | | | | |
CWT [34] | 30.3 | 36.6 | 30.5 | 32.2 | 32.4 | |
DAN [21] | - | - | - | - | 24.4 | 85.2 |
MLC [40] | 50.2 | 37.8 | 27.1 | 30.4 | 36.4 | |
HSNet [7] | 37.2 | 44.1 | 42.4 | 41.3 | 41.2 | 86.5 |
PFENet [43] | 36.8 | 41.8 | 38.7 | 36.7 | 38.5 | |
MIANet [42] | 42.5 | 53.0 | 47.8 | 47.4 | 47.7 | |
Self-supervised approaches | | | | | | |
Saliency * [10] | 22.7 | 24.3 | 20.4 | 22.2 | 22.4 | |
HSNet [7] | 29.3 | 25.6 | 20.5 | 23.0 | 24.6 | 76.1 |
MIANet * [42] | 26.7 | 27.2 | 20.9 | 21.9 | 24.2 | 75.0 |
MaskSplit [11] | 22.3 | 26.1 | 20.6 | 24.3 | 23.3 | 72.1 * |
Ours | 37.4 | 26.2 | 21.3 | 22.3 | 26.8 | 78.1 |
Method | FLOPs | Params | Number of Iterations | Time in Each Iteration |
---|---|---|---|---|
HSNet [7] | 103.8 G | 86.7 M | 90 | 15 m |
MLDAC (Ours) | 112.0 G | 96.1 M | 18 | 15 m |
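
The trade-off in the table above can be checked with a few lines of arithmetic. This is only a reading aid: it assumes that "m" stands for minutes and that total training time is the number of iterations multiplied by the time per iteration, neither of which the table states explicitly. The variable names are illustrative.

```python
# Values copied from the complexity table; "m" is ASSUMED to mean minutes.
runs = {
    "HSNet": {"iterations": 90, "minutes_each": 15, "gflops": 103.8, "params_m": 86.7},
    "MLDAC": {"iterations": 18, "minutes_each": 15, "gflops": 112.0, "params_m": 96.1},
}

# Total training time under the stated assumption.
total_minutes = {name: r["iterations"] * r["minutes_each"] for name, r in runs.items()}
total_hours = {name: minutes / 60 for name, minutes in total_minutes.items()}

# Relative cost of MLDAC compared with HSNet.
time_ratio = total_minutes["HSNet"] / total_minutes["MLDAC"]
flops_overhead = runs["MLDAC"]["gflops"] / runs["HSNet"]["gflops"] - 1.0
params_overhead = runs["MLDAC"]["params_m"] / runs["HSNet"]["params_m"] - 1.0

print(total_hours, time_ratio, round(flops_overhead, 3), round(params_overhead, 3))
```

Under these assumptions MLDAC needs roughly 8% more FLOPs and 11% more parameters per forward pass than HSNet, but about one fifth of the total training time.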
Results | |||
---|---|---|---|
✓ | 49.2 | ||
✓ | ✓ | 53.5 | |
✓ | ✓ | ✓ | 55.1 |
Fixed Learnable PE | Connection | Connection | + Connection | Results |
---|---|---|---|---|
✓ | ✓ | 54.6 | ||
✓ | ✓ | 54.3 | ||
✓ | ✓ | 54.1 | ||
✓ | ✓ | 53.9 | ||
✓ | ✓ | 54.5 | ||
✓ | ✓ | ✓ | 55.1 | |
✓ | ✓ | ✓ | 54.8 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yi, K.; Wang, W.; Zhang, Y. A Self-Supervised Few-Shot Semantic Segmentation Method Based on Multi-Task Learning and Dense Attention Computation. Sensors 2024, 24, 4975. https://doi.org/10.3390/s24154975