{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,23]],"date-time":"2024-09-23T04:26:53Z","timestamp":1727065613227},"reference-count":72,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2021,11,5]],"date-time":"2021-11-05T00:00:00Z","timestamp":1636070400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the National Key R&D Program of China","award":["2019YFC1510905","4192034"]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62125102"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"Deep learning methods have achieved considerable progress in remote sensing image building extraction. Most building extraction methods are based on Convolutional Neural Networks (CNN). Recently, vision transformers have provided a better perspective for modeling long-range context in images, but usually suffer from high computational complexity and memory usage. In this paper, we explored the potential of using transformers for efficient building extraction. We design an efficient dual-pathway transformer structure that learns the long-term dependency of tokens in both their spatial and channel dimensions and achieves state-of-the-art accuracy on benchmark building extraction datasets. Since single buildings in remote sensing images usually only occupy a very small part of the image pixels, we represent buildings as a set of \u201csparse\u201d feature vectors in their feature space by introducing a new module called \u201csparse token sampler\u201d. With such a design, the computational complexity in transformers can be greatly reduced over an order of magnitude. 
We refer to our method as Sparse Token Transformers (STT). Experiments conducted on the Wuhan University Aerial Building Dataset (WHU) and the Inria Aerial Image Labeling Dataset (INRIA) suggest the effectiveness and efficiency of our method. Compared with some widely used segmentation methods and some state-of-the-art building extraction methods, STT has achieved the best performance with low time cost.","DOI":"10.3390\/rs13214441","type":"journal-article","created":{"date-parts":[[2021,11,5]],"date-time":"2021-11-05T02:25:54Z","timestamp":1636079154000},"page":"4441","source":"Crossref","is-referenced-by-count":86,"title":["Building Extraction from Remote Sensing Images with Sparse Token Transformers"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-0483-1306","authenticated-orcid":false,"given":"Keyan","family":"Chen","sequence":"first","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing 100191, China"},{"name":"Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100191, China"},{"name":"State Key Laboratory of Virtual Reality Technology and Systems, School of Astronautics, Beihang University, Beijing 100191, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1774-552X","authenticated-orcid":false,"given":"Zhengxia","family":"Zou","sequence":"additional","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"given":"Zhenwei","family":"Shi","sequence":"additional","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing 100191, China"},{"name":"Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100191, China"},{"name":"State Key Laboratory of Virtual Reality Technology and Systems, School of Astronautics, Beihang University, Beijing 100191, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Guo, M., Liu, H., Xu, Y., and Huang, Y. (2020). Building extraction based on U-Net with an attention block and multiple losses. Remote Sens., 12.","DOI":"10.3390\/rs12091400"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhou, D., Wang, G., He, G., Long, T., Yin, R., Zhang, Z., Chen, S., and Luo, B. (2020). Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network. Sensors, 20.","DOI":"10.3390\/s20247241"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set","volume":"57","author":"Ji","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Chen, K., Fu, K., Gao, X., Yan, M., Sun, X., and Zhang, H. (2017, January 23\u201328). Building extraction from remote sensing images with deep learning in a supervised manner. Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127295"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Chen, M., Wu, J., Liu, L., Zhao, W., Tian, F., Shen, Q., Zhao, B., and Du, R. (2021). DR-Net: An Improved Network for Building Extraction from High Resolution Remote Sensing Image. Remote Sens., 13.","DOI":"10.3390\/rs13020294"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chen, H., Qi, Z., and Shi, Z. (2021). Remote Sensing Image Change Detection With Transformers. IEEE Trans. Geosci. Remote Sens., 1\u201314.","DOI":"10.1109\/TGRS.2021.3095166"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Chen, H., Li, W., and Shi, Z. (2021). 
Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens., 1\u201316.","DOI":"10.1109\/TGRS.2021.3066802"},{"key":"ref_8","unstructured":"Zhang, H., Liao, Y., Yang, H., Yang, G., and Zhang, L. (2020). A Local-Global Dual-Stream Network for Building Extraction From Very-High-Resolution Remote Sensing Images. IEEE Trans. Neural Networks Learn. Syst., 1\u201315."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2611","DOI":"10.1109\/JSTARS.2021.3058097","article-title":"Attention-Gate-Based Encoder\u2013Decoder Network for Automatical Building Extraction","volume":"14","author":"Deng","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4595","DOI":"10.1109\/JSTARS.2021.3073994","article-title":"ED-Net: Automatic Building Extraction From High-Resolution Aerial Images With Boundary Information","volume":"14","author":"Zhu","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"7313","DOI":"10.1109\/ACCESS.2020.2964043","article-title":"Automatic building extraction from high-resolution aerial imagery via fully convolutional encoder-decoder network with non-local block","volume":"8","author":"Wang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Shao, Z., Tang, P., Wang, Z., Saleem, N., Yam, S., and Sommai, C. (2020). BRRNet: A fully convolutional neural network for automatic building extraction from high-resolution remote sensing images. Remote Sens., 12.","DOI":"10.3390\/rs12061050"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"8490","DOI":"10.1109\/TGRS.2020.2988265","article-title":"Deep Matting for Cloud Detection in Remote Sensing Images","volume":"58","author":"Li","year":"2020","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_14","unstructured":"Zou, Z., Li, W., Shi, T., Shi, Z., and Ye, J. (November, January 27). Generative adversarial training for weakly supervised cloud matting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3633","DOI":"10.1109\/TGRS.2019.2959020","article-title":"Coupled adversarial training for remote sensing image super-resolution","volume":"58","author":"Lei","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lei, S., and Shi, Z. (2021). Hybrid-Scale Self-Similarity Exploitation for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens., 1\u201310.","DOI":"10.1109\/TGRS.2021.3069889"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/j.isprsjprs.2021.01.023","article-title":"A geographic information-driven method and a new large scale dataset for remote sensing cloud\/snow detection","volume":"174","author":"Wu","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_18","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bazi, Y., Bashmal, L., Rahhal, M.M.A., Dayil, R.A., and Ajlan, N.A. (2021). Vision Transformers for Remote Sensing Image Classification. Remote Sens., 13.","DOI":"10.3390\/rs13030516"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, X., Chen, Y., and Lin, Z. (2021). Spatial-Spectral Transformer for Hyperspectral Image Classification. 
Remote Sens., 13.","DOI":"10.3390\/rs13030498"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1109\/JSTARS.2020.2971763","article-title":"A CNN-Transformer Hybrid Approach for Crop Classification Using Multitemporal Multisensor Images","volume":"13","author":"Li","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Qing, Y., Liu, W., Feng, L., and Gao, W. (2021). Improved Transformer Net for Hyperspectral Image Classification. Remote Sens., 13.","DOI":"10.3390\/rs13112216"},{"key":"ref_23","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., and Vaswani, A. (2021, January 19\u201325). Bottleneck transformers for visual recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01625"},{"key":"ref_25","unstructured":"Beal, J., Kim, E., Tzeng, E., Park, D.H., Zhai, A., and Kislyuk, D. (2020). Toward Transformer-Based Object Detection. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 19\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Sirmacek, B., and Unsalan, C. (2008, January 27\u201329). Building detection from aerial images using invariant color features and shadow information. Proceedings of the 2008 23rd International Symposium on Computer and Information Sciences, Istanbul, Turkey.","DOI":"10.1109\/ISCIS.2008.4717854"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/S0924-2716(98)00027-6","article-title":"Optimisation of building detection in satellite images by combining multispectral classification and texture filtering","volume":"54","author":"Zhang","year":"1999","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhong, S.h., Huang, J.j., and Xie, W.x. (2008, January 26\u201329). A new method of building detection from a single aerial photograph. Proceedings of the 2008 9th International Conference on Signal Processing, Beijing, China.","DOI":"10.1109\/ICOSP.2008.4697350"},{"key":"ref_31","first-page":"197","article-title":"Adaptive building edge detection by combining LiDAR data and aerial images","volume":"37","author":"Li","year":"2008","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1224","DOI":"10.1109\/TGRS.2009.2029338","article-title":"Multichannel InSAR building edge detection","volume":"48","author":"Ferraioli","year":"2009","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1007\/s12524-008-0023-1","article-title":"Use of laser range and height texture cues for building identification","volume":"36","author":"Tiwari","year":"2008","journal-title":"J. Indian Soc. 
Remote Sens."},{"key":"ref_34","first-page":"143","article-title":"Improved building detection using texture information","volume":"38","author":"Awrangjeb","year":"2011","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1016\/0734-189X(90)90139-M","article-title":"Use of shadows for extracting buildings in aerial images","volume":"49","author":"Liow","year":"1990","journal-title":"Comput. Vision Graph. Image Process."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"181","DOI":"10.4304\/jmm.9.1.181-188","article-title":"Shadow-Based Building Detection and Segmentation in High-Resolution Remote Sensing Image","volume":"9","author":"Chen","year":"2014","journal-title":"J. Multimed."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Liu, P., Liu, X., Liu, M., Shi, Q., Yang, J., Xu, X., and Zhang, Y. (2019). Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network. Remote Sens., 11.","DOI":"10.3390\/rs11070830"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Liu, H., Luo, J., Huang, B., Hu, X., Sun, Y., Yang, Y., Xu, N., and Zhou, N. (2019). DE-Net: Deep Encoding Network for Building Extraction from High-Resolution Remote Sensing Imagery. Remote Sens., 11.","DOI":"10.3390\/rs11202380"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zuo, T., Feng, J., and Chen, X. (2016, January 20\u201324). HF-FCN: Hierarchically fused fully convolutional network for robust building extraction. 
Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan.","DOI":"10.1007\/978-3-319-54181-5_19"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8-14). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"6169","DOI":"10.1109\/TGRS.2020.3026051","article-title":"MAP-Net: Multiple Attending Path Neural Network for Building Footprint Extraction From Remote Sensed Imagery","volume":"59","author":"Zhu","year":"2020","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11432-019-2791-7","article-title":"Hybrid first and second order attention Unet for building segmentation in remote sensing images","volume":"63","author":"He","year":"2020","journal-title":"Sci. China Inf. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"154997","DOI":"10.1109\/ACCESS.2020.3015701","article-title":"ARC-Net: An Efficient Network for Building Extraction From High-Resolution Aerial Images","volume":"8","author":"Liu","year":"2020","journal-title":"IEEE Access"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Gong, W., Sun, J., and Li, W. (2019). Web-Net: A novel nest networks with ultra-hierarchical sampling for building extraction from aerial imageries. Remote Sens., 11.","DOI":"10.3390\/rs11161897"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Sun, G., Huang, H., Zhang, A., Li, F., Zhao, H., and Fu, H. (2019). Fusion of multiscale convolutional neural networks for building extraction in very high-resolution images. Remote Sens., 11.","DOI":"10.3390\/rs11030227"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"128774","DOI":"10.1109\/ACCESS.2019.2940527","article-title":"Automatic building extraction on high-resolution remote sensing imagery using deep convolutional encoder-decoder with spatial pyramid pooling","volume":"7","author":"Liu","year":"2019","journal-title":"IEEE Access"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Ma, J., Wu, L., Tang, X., Liu, F., Zhang, X., and Jiao, L. (2020). Building extraction of aerial images by a global and multi-scale encoder-decoder network. Remote Sens., 12.","DOI":"10.3390\/rs12152350"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Zhu, Q., Li, Z., Zhang, Y., and Guan, Q. (2020). Building Extraction from High Spatial Resolution Remote Sensing Images via Multiscale-Aware and Segmentation-Prior Conditional Random Fields. 
Remote Sens., 12.","DOI":"10.3390\/rs12233983"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Kang, W., Xiang, Y., Wang, F., and You, H. (2019). EU-net: An efficient fully convolutional network for building extraction from optical remote sensing images. Remote Sens., 11.","DOI":"10.3390\/rs11232813"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Wang, Y. (2019). JointNet: A common neural network for road and building extraction. Remote Sens., 11.","DOI":"10.3390\/rs11060696"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"54285","DOI":"10.1109\/ACCESS.2019.2912822","article-title":"ESFNet: Efficient network for building extraction from high-resolution aerial images","volume":"7","author":"Lin","year":"2019","journal-title":"IEEE Access"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Yi, Y., Zhang, Z., Zhang, W., Zhang, C., Li, W., and Zhao, T. (2019). Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network. Remote Sens., 11.","DOI":"10.3390\/rs11151774"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Ye, Z., Fu, Y., Gan, M., Deng, J., Comber, A., and Wang, K. (2019). Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network. Remote Sens., 11.","DOI":"10.3390\/rs11242970"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Lu, K., Sun, Y., and Ong, S.H. (2018, January 20\u201324). Dual-resolution u-net: Building extraction from aerial images. 
Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China.","DOI":"10.1109\/ICPR.2018.8545190"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"6106","DOI":"10.1109\/TGRS.2020.3022410","article-title":"Multiscale U-shaped CNN building instance extraction framework with edge constraint for high-spatial-resolution remote sensing imagery","volume":"59","author":"Liu","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"4287","DOI":"10.1109\/TGRS.2020.3014312","article-title":"Scene-driven multitask parallel attention network for building extraction in high-resolution remote sensing images","volume":"59","author":"Guo","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Yang, H., Wu, P., Yao, X., Wu, Y., Wang, B., and Xu, Y. (2018). Building extraction in very high resolution imagery by dense-attention networks. Remote Sens., 10.","DOI":"10.3390\/rs10111768"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"26661","DOI":"10.1007\/s11042-020-09294-7","article-title":"Remote sensing image caption generation via transformer and reinforcement learning","volume":"79","author":"Shen","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"1884","DOI":"10.1109\/LGRS.2019.2911322","article-title":"Optimized input for CNN-based hyperspectral image classification using spatial transformer network","volume":"16","author":"He","year":"2019","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Wang, L., Li, R., Duan, C., and Fang, S. (2021). Transformer Meets DCFAM: A Novel Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images. arXiv.","DOI":"10.1109\/LGRS.2022.3143368"},{"key":"ref_67","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Shi, W., Caballero, J., Husz\u00e1r, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., and Wang, Z. (2016, January 27\u201330). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.207"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_70","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2017, January 23\u201328). Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127684"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Xu, Y., Wu, L., Xie, Z., and Chen, Z. (2018). Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. 
Remote Sens., 10.","DOI":"10.3390\/rs10010144"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/21\/4441\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T14:45:04Z","timestamp":1726065904000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/21\/4441"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,5]]},"references-count":72,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["rs13214441"],"URL":"https:\/\/doi.org\/10.3390\/rs13214441","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2021,11,5]]}}}