Abstract
Building segmentation (BS) has recently drawn significant attention in remote sensing applications. Convolutional neural networks (CNNs) have become the mainstream approach in this field owing to their powerful representational ability. However, because building appearance varies widely, designing an effective CNN architecture for BS remains challenging. Most CNN-based BS methods focus on deeper or wider network architectures and neglect the correlation among intermediate features. To address this problem, we propose a hybrid first and second order attention network (HFSA) that explores both the global mean and the inner product among different channels to adaptively rescale intermediate features. As a result, the HFSA not only makes full use of first order feature statistics but also incorporates second order feature statistics, which leads to more representative features. We conduct comprehensive experiments on three widely used aerial building segmentation data sets and one satellite building segmentation data set. The experimental results show that the proposed model achieves better segmentation performance than state-of-the-art models, both quantitatively and qualitatively.
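The channel rescaling idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract states only that the global mean (first order) and the inner product among channels (second order) are used to rescale intermediate features. The function name `hybrid_channel_attention`, the projection matrices `w_first` and `w_second`, the row-mean reduction of the inner-product matrix, and the sigmoid gate are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def hybrid_channel_attention(features, w_first, w_second):
    """Rescale channels using first and second order statistics.

    features : array of shape (C, H, W), an intermediate feature map.
    w_first, w_second : arrays of shape (C, C); hypothetical learned
        projections, not taken from the paper.
    Returns the rescaled features and the per-channel gate.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)

    # First order statistic: global mean of each channel.
    first = flat.mean(axis=1)                      # shape (C,)

    # Second order statistic: inner products between all channel pairs,
    # averaged over spatial positions.
    gram = flat @ flat.T / (h * w)                 # shape (C, C)
    # Assumed reduction: summarize each channel by the mean of its row.
    second = gram.mean(axis=1)                     # shape (C,)

    # Assumed fusion: project both descriptors and squash into (0, 1).
    gate = sigmoid(w_first @ first + w_second @ second)

    # Rescale every channel by its gate value.
    return features * gate[:, None, None], gate


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = 0.1 * rng.standard_normal((8, 8))
w2 = 0.1 * rng.standard_normal((8, 8))
y, gate = hybrid_channel_attention(x, w1, w2)
```

In this sketch the gate holds one value per channel, so the output keeps the input shape and each channel is uniformly strengthened or weakened according to both its mean activation and its correlation with the other channels.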
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61922029, 61771192), the National Natural Science Foundation of China for International Cooperation and Exchanges (Grant No. 61520106001), and the Huxiang Young Talents Plan Project of Hunan Province (Grant No. 2019RS2016).
Cite this article
He, N., Fang, L. & Plaza, A. Hybrid first and second order attention Unet for building segmentation in remote sensing images. Sci. China Inf. Sci. 63, 140305 (2020). https://doi.org/10.1007/s11432-019-2791-7