{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,2]],"date-time":"2024-08-02T00:35:26Z","timestamp":1722558926792},"reference-count":46,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2024,7,31]],"date-time":"2024-07-31T00:00:00Z","timestamp":1722384000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Intelligent Policing Key Laboratory of Sichuan Province","award":["ZNJW2024FKMS004"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"Nowadays, autonomous driving technology has become widely prevalent. The intelligent vehicles have been equipped with various sensors (e.g., vision sensors, LiDAR, depth cameras etc.). Among them, the vision systems with tailored semantic segmentation and perception algorithms play critical roles in scene understanding. However, the traditional supervised semantic segmentation needs a large number of pixel-level manual annotations to complete model training. Although few-shot methods reduce the annotation work to some extent, they are still labor intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts; one of them serves as the support mask for few-shot segmentation, while cross-entropy losses are calculated between the other part and the entire region with the predicted results separately as multi-task learning so as to improve the model\u2019s generalization ability. Swin Transformer is used as our backbone to extract feature maps at different scales. These feature maps are then input to multiple levels of dense attention computation blocks to enhance pixel-level correspondence. 
The final prediction results are obtained through inter-scale mixing and feature skip connections. The experimental results indicate that MLDAC achieves 55.1% and 26.8% one-shot mIoU for self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, proving its efficacy.","DOI":"10.3390\/s24154975","type":"journal-article","created":{"date-parts":[[2024,7,31]],"date-time":"2024-07-31T21:16:49Z","timestamp":1722460609000},"page":"4975","source":"Crossref","is-referenced-by-count":0,"title":["A Self-Supervised Few-Shot Semantic Segmentation Method Based on Multi-Task Learning and Dense Attention Computation"],"prefix":"10.3390","volume":"24","author":[{"given":"Kai","family":"Yi","sequence":"first","affiliation":[{"name":"Intelligent Policing Key Laboratory of Sichuan Province, Luzhou 646099, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2199-1048","authenticated-orcid":false,"given":"Weihang","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University, Chengdu 610042, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2028-048X","authenticated-orcid":false,"given":"Yi","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University, Chengdu 610042, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kim, M.Y., Kim, S., Lee, B., and Kim, J. (2024). Enhancing Deep Learning-Based Segmentation Accuracy through Intensity Rendering and 3D Point Interpolation Techniques to Mitigate Sensor Variability. Sensors, 24.","DOI":"10.3390\/s24144475"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Jun, W., Yoo, J., and Lee, S. (2024). 
Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System. Sensors, 24.","DOI":"10.3390\/s24134205"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"You, L., Zhu, R., Kwan, M., Chen, M., Zhang, F., Yang, B., Wong, M., and Qin, Z. (2024). Unraveling adaptive changes in electric vehicle charging behavior toward the postpandemic era by federated meta-learning. Innovation, 5.","DOI":"10.1016\/j.xinn.2024.100587"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Liu, S., You, L., Zhu, R., Liu, B., Liu, R., Yu, H., and Yuen, C. (2024). AFM3D: An Asynchronous Federated Meta-Learning Framework for Driver Distraction Detection. IEEE Transactions on Intelligent Transportation Systems, IEEE.","DOI":"10.1109\/TITS.2024.3357138"},{"key":"ref_5","unstructured":"Wang, K., Liew, J., Zou, Y., Zhou, D., and Feng, J. (November, January 27). Panet: Few-shot image semantic segmentation with prototype alignment. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhang, C., Lin, G., Liu, F., Yao, R., and Shen, C. (2019, January 15\u201320). Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00536"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Min, J., Kang, D., and Cho, M. (2021, January 11\u201317). Hypercorrelation squeeze for few-shot segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00686"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhou, T., Wang, W., Konukoglu, E., and Van Gool, L. (2022, January 18\u201324). Rethinking semantic segmentation: A prototype view. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00261"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"7141","DOI":"10.1109\/TNNLS.2021.3084252","article-title":"Part-based semantic transform for few-shot semantic segmentation","volume":"33","author":"Yang","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Van Gansbeke, W., Vandenhende, S., Georgoulis, S., and Van Gool, L. (2021, January 11\u201317). Unsupervised semantic segmentation by contrasting object mask proposals. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00990"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Amac, M., Sencan, A., Baran, B., Ikizler-Cinbis, N., and Cinbis, R. (2022, January 3\u20138). MaskSplit: Self-supervised meta-learning for few-shot semantic segmentation. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV51458.2022.00050"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Karimijafarbigloo, S., Azad, R., and Merhof, D. (2023). Self-supervised few-shot learning for semantic segmentation: An annotation-free approach. arXiv.","DOI":"10.1007\/978-3-031-46005-0_14"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shaban, A., Bansal, S., Liu, Z., Essa, I., and Boots, B. (2017). One-shot learning for semantic segmentation. arXiv.","DOI":"10.5244\/C.31.167"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhuge, Y., and Shen, C. (2021, January 20\u201324). Deep reasoning network for few-shot semantic segmentation. 
Proceedings of the 29th ACM International Conference on Multimedia, Virtual.","DOI":"10.1145\/3474085.3475658"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, L., Cao, J., Liu, M., Guo, Y., Chen, Q., and Tan, M. (2020, January 12\u201316). Dynamic extension nets for few-shot semantic segmentation. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413915"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, W., Zhou, T., Yu, F., Dai, J., Konukoglu, E., and Van Gool, L. (2021, January 11\u201317). Exploring cross-image pixel contrast for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00721"},{"key":"ref_17","first-page":"4","article-title":"Few-shot semantic segmentation with prototype learning","volume":"3","author":"Dong","year":"2018","journal-title":"BMVC"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lang, C., Cheng, G., Tu, B., and Han, J. (2022, January 18\u201324). Learning what not to segment: A new perspective on few-shot segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00789"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"6484","DOI":"10.1109\/TNNLS.2021.3081693","article-title":"Rich embedding features for one-shot semantic segmentation","volume":"33","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_20","unstructured":"Zhang, C., Lin, G., Liu, F., Guo, J., Wu, Q., and Yao, R. (November, January 27). Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhang, X., Hu, Y., Yang, Y., Cao, X., and Zhen, X. (2020). Few-shot semantic segmentation with democratic attention networks. Proceedings, Part XIII 16, Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23\u201328 August 2020, Springer.","DOI":"10.1007\/978-3-030-58601-0_43"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3142","DOI":"10.1109\/TIP.2021.3058512","article-title":"Harmonic feature activation for few-shot semantic segmentation","volume":"30","author":"Liu","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_23","unstructured":"Yang, X., Wang, B., Chen, K., Zhou, X., Yi, S., Ouyang, W., and Zhou, L. (2020). BriNet: Towards bridging the intra-class and inter-class gaps in one-shot segmentation. arXiv."},{"key":"ref_24","first-page":"12087","article-title":"Differentiable meta-learning model for few-shot semantic segmentation","volume":"34","author":"Tian","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Boudiaf, M., Kervadec, H., Masud, Z., Piantanida, P., Ben Ayed, I., and Dolz, J. (2021, January 20\u201325). Few-shot segmentation without meta-learning: A good transductive inference is all you need?. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01376"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wu, Z., Shi, X., Lin, G., and Cai, J. (2021, January 11\u201317). Learning meta-class memory for few-shot semantic segmentation. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00056"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xie, G., Xiong, H., Liu, J., Yao, Y., and Shao, L. (2021, January 11\u201317). Few-shot semantic segmentation with cyclic memory network. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00720"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, G., Kang, G., Liu, W., Wei, Y., and Yang, Y. (2020). Content-consistent matching for domain adaptive semantic segmentation. European Conference on Computer Vision, Springer International Publishing.","DOI":"10.1007\/978-3-030-58568-6_26"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Subhani, M., and Ali, M. (2020). Learning from scale-invariant examples for domain adaptation in semantic segmentation. Proceedings, Part XXII 16, Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23\u201328 August 2020, Springer.","DOI":"10.1007\/978-3-030-58542-6_18"},{"key":"ref_30","first-page":"16423","article-title":"Self-supervised visual representation learning with semantic grouping","volume":"35","author":"Wen","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Araslanov, N., and Roth, S. (2021, January 20\u201325). Self-supervised augmentation consistency for adapting semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01513"},{"key":"ref_32","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. 
arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lu, Z., He, S., Zhu, X., Zhang, L., Song, Y., and Xiang, T. (2021, January 11\u201317). Simpler is better: Few-shot semantic segmentation with classifier weight transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00862"},{"key":"ref_35","first-page":"21984","article-title":"Few-shot segmentation via cycle-consistent transformer","volume":"34","author":"Zhang","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Shi, X., Wei, D., Zhang, Y., Lu, D., Ning, M., Chen, J., Ma, K., and Zheng, Y. (2022). Dense cross-query-and-support attention weighted mask aggregation for few-shot segmentation. European Conference on Computer Vision, Springer Nature.","DOI":"10.1007\/978-3-031-20044-1_9"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., and Zitnick, C. (2014). Microsoft coco: Common objects in context. 
Proceedings, Part V 13, Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, 6\u201312 September 2014, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, X., Wei, T., Chen, Y., Tai, Y., and Tang, C. (2020, January 13\u201319). Fss-1000: A 1000-class dataset for few-shot segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00294"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Yang, L., Zhuo, W., Qi, L., Shi, Y., and Gao, Y. (2021, January 11\u201317). Mining latent classes for few-shot segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00860"},{"key":"ref_41","first-page":"38020","article-title":"Intermediate prototype mining transformer for few-shot semantic segmentation","volume":"35","author":"Liu","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Yang, Y., Chen, Q., Feng, Y., and Huang, T. (2023, January 17\u201324). MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00689"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1109\/TPAMI.2020.3013717","article-title":"Prior guided feature enrichment network for few-shot segmentation","volume":"44","author":"Tian","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","unstructured":"Codella, N., Rotemberg, V., Tschandl, P., Celebi, M., Dusza, S., Gutman, D., Helba, B., Kalloo, A., Liopyris, K., and Marchetti, M. (2019). 
Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lei, S., Zhang, X., He, J., Chen, F., Du, B., and Lu, C. (2022). Cross-domain few-shot semantic segmentation. European Conference on Computer Vision, Springer Nature.","DOI":"10.1007\/978-3-031-20056-4_5"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Chen, H., Dong, Y., Lu, Z., Yu, Y., and Han, J. (2024, January 3\u20138). Pixel Matching Network for Cross-Domain Few-Shot Segmentation. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV57701.2024.00102"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/15\/4975\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T15:41:05Z","timestamp":1722526865000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/15\/4975"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,31]]},"references-count":46,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["s24154975"],"URL":"https:\/\/doi.org\/10.3390\/s24154975","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2024,7,31]]}}}