{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T05:40:40Z","timestamp":1731649240081,"version":"3.28.0"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"11","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62021001"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"Shadow detection is a fundamental and challenging task in many computer vision applications. Intuitively, most shadows come from the occlusion of light by the object itself, resulting in the object and its shadow being contiguous (referred to as the adjacent shadow in this article). In this case, when the color of the object is similar to that of the shadow, existing methods struggle to achieve accurate detection. To address this problem, we present SwinShadow, a transformer-based architecture that fully utilizes the powerful shifted window mechanism for detecting adjacent shadows. The mechanism operates in two steps. Initially, it applies local self-attention within a single window, enabling the network to focus on local details. Subsequently, it shifts the attention windows to facilitate inter-window attention, enabling the capture of a broader range of adjacent information. These combined steps significantly improve the network\u2019s capacity to distinguish shadows from nearby objects. The whole process can be divided into three parts: encoder, decoder, and feature integration. During encoding, we adopt Swin Transformer to acquire hierarchical features. 
Then during decoding, for shallow layers, we propose a deep supervision (DS) module to suppress false positives and boost the representation capability of shadow features for subsequent processing, while for deep layers, we leverage a double attention (DA) module to integrate local and shifted window attention in one stage to achieve a larger receptive field and enhance the continuity of information. Ultimately, a new multi-level aggregation (MLA) mechanism is applied to fuse the decoded features for mask prediction. Extensive experiments on three shadow detection benchmark datasets, SBU, UCF, and ISTD, demonstrate that our network achieves good performance in terms of balance error rate (BER). The source code and results are now publicly available at https:\/\/github.com\/harrytea\/SwinShadow.","DOI":"10.1145\/3688803","type":"journal-article","created":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T14:39:52Z","timestamp":1724769592000},"page":"1-20","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-4741-8231","authenticated-orcid":false,"given":"Yonghui","family":"Wang","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"http:\/\/orcid.org\/0009-0004-6095-6132","authenticated-orcid":false,"given":"Shaokai","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-7163-6263","authenticated-orcid":false,"given":"Li","family":"Li","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, 
China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1690-9836","authenticated-orcid":false,"given":"Wengang","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-2188-3028","authenticated-orcid":false,"given":"Houqiang","family":"Li","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]}],"member":"320","published-online":{"date-parts":[[2024,11,14]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2011.2132728"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2008.916989"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01212"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01240-3_15"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00565"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC.2001.948679"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1285"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01181"},{"key":"e_1_3_1_12_2","first-page":"1","volume-title":"ICLR","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. 
In ICLR, 1\u201310."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475199"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0243-z"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.18"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2003.811620"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3131342"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995725"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2018.8594050"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.563"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2919616"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3049331"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00778"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126331"},{"issue":"1","key":"e_1_3_1_25_2","first-page":"106","article-title":"Receptive fields, binocular interaction and functional architecture in the cat\u2019s visual cortex","volume":"160","author":"Hubel David H.","year":"1962","unstructured":"David H. Hubel and Torsten N. Wiesel. 1962. Receptive fields, binocular interaction and functional architecture in the cat\u2019s visual cortex. JP 160, 1 (1962), 106.","journal-title":"JP"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME52920.2022.9860013"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2009.2012924"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2070781.2024191"},{"key":"e_1_3_1_29_2","first-page":"4171","volume-title":"NAACL","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. 
BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 4171\u20134186."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.249"},{"key":"e_1_3_1_31_2","first-page":"109","volume-title":"NeurIPS","author":"Kr\u00e4henb\u00fchl Philipp","year":"2011","unstructured":"Philipp Kr\u00e4henb\u00fchl and Vladlen Koltun. 2011. Efficient inference in fully connected CRFs with gaussian edge potentials. In NeurIPS, 109\u2013117."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15552-9_24"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-011-0501-8"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_41"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-46805-6_19"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00312"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2000.905341"},{"issue":"8","key":"e_1_3_1_40_2","first-page":"4117","article-title":"Shadow detection in single RGB images using a context preserver convolutional neural network trained by multiple adversarial examples","volume":"28","author":"Mohajerani Sorour","year":"2019","unstructured":"Sorour Mohajerani and Parvaneh Saeedi. 2019. Shadow detection in single RGB images using a context preserver convolutional neural network trained by multiple adversarial examples. 
TIP 28, 8 (2019), 4117\u20134129.","journal-title":"TIP"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/1413862.1413863"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.483"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3214422"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459381"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206665"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2009.2012944"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(98)00116-6"},{"issue":"3","key":"e_1_3_1_48_2","first-page":"379","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon Claude Elwood","year":"1948","unstructured":"Claude Elwood Shannon. 1948. A mathematical theory of communication. BELLTJ 27, 3 (1948), 379\u2013423.","journal-title":"BELLTJ"},{"key":"e_1_3_1_49_2","first-page":"2067","volume-title":"CVPR","author":"Shen Li","year":"2015","unstructured":"Li Shen, Teck Wee Chua, and Karianto Leman. 2015. Shadow optimization from structured deep edge detection. In CVPR, 2067\u20132074."},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077140"},{"key":"e_1_3_1_51_2","first-page":"85","article-title":"New spectrum ratio properties and features for shadow detection","volume":"51","author":"Tian Jiandong","year":"2016","unstructured":"Jiandong Tian, Xiaojun Qi, Liangqiong Qu, and Yandong Tang. 2016. New spectrum ratio properties and features for shadow detection. 
PR 51 (2016), 85\u201396.","journal-title":"PR"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2009.2026682"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00175"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW59228.2023.00179"},{"key":"e_1_3_1_55_2","first-page":"5998","volume-title":"NeurIPS","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NeurIPS, 5998\u20136008."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.387"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_49"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00192"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.433"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00863"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/140"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01716"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01165"},{"key":"e_1_3_1_65_2","first-page":"5753","volume-title":"NeurIPS","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. 
In NeurIPS, 5753\u20135763."},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243217"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01458"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01102"},{"issue":"3","key":"e_1_3_1_69_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3571745","article-title":"Exploiting Residual and Illumination with GANs for Shadow Detection and Shadow Removal","volume":"19","author":"Zhang Ling","year":"2023","unstructured":"Ling Zhang, Chengjiang Long, Xiaolong Zhang, and Chunxia Xiao. 2023. Exploiting Residual and Illumination with GANs for Shadow Detection and Shadow Removal. TOMM 19, 3 (2023), 1\u201322.","journal-title":"TOMM"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2007.902842"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.660"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00887"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00531"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00916"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540209"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_8"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00466"},{"key":"e_1_3_1_79_2","first-page":"1","volume-title":"ICLR","author":"Zhu Xizhou","year":"2020","unstructured":"Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. 2020. Deformable DETR: Deformable transformers for end-to-end object detection. 
In ICLR, 1\u201310."},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547904"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3688803","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,14]],"date-time":"2024-11-14T16:47:21Z","timestamp":1731602841000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3688803"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,14]]},"references-count":79,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3688803"],"URL":"https:\/\/doi.org\/10.1145\/3688803","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2024,11,14]]},"assertion":[{"value":"2024-01-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}