Big Map R-CNN for object detection in large-scale remote sensing images

Linfei Wang; Dapeng Tao; Ruonan Wang; Ruxin Wang; Hao Li

doi:10.3934/mfc.2019019

2019, Volume 2, Issue 4: 299-314. Doi: 10.3934/mfc.2019019

a. FIST LAB, School of Information Science and Engineering, Yunnan University, Kunming, 650091, Yunnan, China
b. Yunnan Union Vision Technology Co. Ltd., Kunming, 650091, Yunnan, China
c. School of Software, Yunnan University, Kunming, 650091, Yunnan, China

* Corresponding author: Dapeng Tao

Published: December 2019

Abstract

Detecting sparse, multi-sized objects in very high resolution (VHR) remote sensing images remains a significant challenge in satellite imagery applications and analytics. The difficulties include broad geographical scene distributions and high pixel counts: a large-scale satellite image contains tens to hundreds of millions of pixels and dozens of complex backgrounds. Furthermore, objects of the same category can vary widely in scale (e.g., ships can measure from several to thousands of pixels). To address these issues, we propose the Big Map R-CNN method to improve object detection in VHR satellite imagery. Big Map R-CNN builds on the existing Mask R-CNN architecture and introduces mean shift clustering for quadric detecting, that is, a second detection pass over uncertain regions. The method has four main steps: 1) cropping the big map into small sub-images; 2) detecting objects in these sub-images with the typical Mask R-CNN network; 3) screening out fragmented, low-confidence targets and collecting uncertain image regions by clustering; and 4) quadric detecting to generate prediction boxes. We also introduce a new large-scale VHR remote sensing imagery dataset containing two categories (RSI LS-VHR-2) for verifying detection performance. Comprehensive evaluations on the RSI LS-VHR-2 dataset demonstrate the effectiveness of the proposed Big Map R-CNN algorithm for object detection in large-scale remote sensing images.
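
Step 3 of the abstract (collecting uncertain regions by clustering) can be sketched as follows. This is only an illustrative flat-kernel mean shift written in plain Python. It assumes that the clustered features are the centres of low-confidence boxes in big-map coordinates, and that a fixed-size window is re-cropped around each cluster centre for the second pass. The names `mean_shift` and `window_around`, the bandwidth of 50 pixels, and the window size of 600 pixels are assumptions for illustration, not values taken from the paper.

```python
import math

def mean_shift(points, bandwidth, max_iter=50, tol=1e-3):
    """Flat-kernel mean shift on 2-D points; returns (centres, labels)."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            # neighbours of the current estimate within the bandwidth
            near = [(px, py) for px, py in points
                    if math.hypot(px - x, py - y) <= bandwidth]
            if not near:
                break
            nx = sum(p[0] for p in near) / len(near)
            ny = sum(p[1] for p in near) / len(near)
            moved = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if moved < tol:
                break
        modes.append((x, y))
    # merge modes that converged to (almost) the same location
    centres, labels = [], []
    for mx, my in modes:
        for i, (cx, cy) in enumerate(centres):
            if math.hypot(mx - cx, my - cy) <= bandwidth:
                labels.append(i)
                break
        else:
            centres.append((mx, my))
            labels.append(len(centres) - 1)
    return centres, labels

def window_around(centre, size, width, height):
    """Square crop of side `size` centred on `centre`, clamped to the image."""
    cx, cy = centre
    x0 = int(min(max(cx - size / 2, 0), max(width - size, 0)))
    y0 = int(min(max(cy - size / 2, 0), max(height - size, 0)))
    return (x0, y0, min(x0 + size, width), min(y0 + size, height))

# Hypothetical centres of fragmented, low-confidence detections (pixels).
uncertain = [(100, 100), (110, 105), (95, 98), (900, 900), (910, 905)]
centres, labels = mean_shift(uncertain, bandwidth=50)
windows = [window_around(c, 600, 8000, 8000) for c in centres]
```

Each window in `windows` would then be passed to the detector again. How the second-pass results are merged with the first-pass boxes is not covered by this sketch.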

Mathematics Subject Classification: Primary: 68T10, 68T45.

Figure 1. Motivation for the proposed method. (a) Remote sensing scene of Madrid Airport. (b) Remote sensing scene of the South China Sea. These examples are from the RSI LS-VHR-2 dataset. The targets in the images are indicated by red circles. The remote sensing scenes are large in scale and high in resolution, with a relatively sparse target distribution, which means that existing methods are suboptimal for detection

Figure 2. The scheme of Big Map R-CNN, containing three main components: 1) cropping the input big map in the form of a sliding window; 2) detecting each sub-image sequentially and filtering possible object areas; 3) using mean shift clustering to precisely locate candidate object areas, cropping the new sub-images containing possible objects, and using quadric-detecting to judge whether there is an object or not

Figure 3. Large-scale image cropping
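
A minimal sketch of sliding-window cropping is given below. It assumes square tiles, a stride equal to the tile size (no overlap), and a last window shifted back so that it stays inside the image. The function names and the default of 600 pixels are illustrative choices, not the authors' implementation.

```python
def tile_origins(length, tile, stride):
    """Start offsets of the windows along one axis, covering the full length."""
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last window is flush with the border
    return origins

def crop_boxes(width, height, tile=600, stride=600):
    """Crop rectangles (x0, y0, x1, y1) in row-major order."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_origins(height, tile, stride)
            for x in tile_origins(width, tile, stride)]

boxes = crop_boxes(8000, 8000)
tiles_per_image = len(boxes)
```

Under these assumptions an 8000 × 8000 image yields 14 × 14 = 196 sub-images, so five test images give 980, which is consistent with the sub-image count in Table Ⅱ. A different stride that happens to produce the same count cannot be ruled out.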

Figure 5. PRCs of the proposed Big Map R-CNN method and three other state-of-the-art detection methods (YOLOv3, Faster R-CNN, and Mask R-CNN). (a) is the PRC of the four methods for aircraft when IoU = 0.5; (b) is the PRC of the four methods for aircraft when IoU = 0.75; (c) is the PRC of the four methods for ships when IoU = 0.5; (d) is the PRC of the four methods for ships when IoU = 0.75
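
The two IoU thresholds used for the PRCs can be made concrete with a short sketch. It assumes axis-aligned boxes given as (x0, y0, x1, y1) in pixels. The function name `iou` and the example boxes are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (0, 0, 10, 10)
prediction = (2, 0, 12, 10)          # same size, shifted by 2 pixels
overlap = iou(ground_truth, prediction)
match_at_050 = overlap >= 0.5        # counted as a true positive at IoU = 0.5
match_at_075 = overlap >= 0.75       # rejected at the stricter IoU = 0.75
```

Here the overlap is 80 / 120, so the same prediction is a true positive at IoU = 0.5 and a false positive at IoU = 0.75, which is why every method scores lower under the stricter threshold.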

Figure 4. Some examples from the RSI LS-VHR-2 dataset

Figure 6. Detection comparisons of the different methods. (a) Typical Mask R-CNN for aircraft; (b) Big Map R-CNN for aircraft; (c) typical Mask R-CNN for ships; (d) Big Map R-CNN for ships. The true positives are indicated by green rectangles, the false negatives are indicated by red circles, and the bounding boxes that deviate from the ground truth are indicated by red rectangles

Table Ⅰ. DESCRIPTION OF THE RSI LS-VHR-2 DATASET

Label	Name	Total instances	Complete instances	Fragmentary instances	Scene class	Images	Image width	Sub-images
1	aircraft	103917	85975	17942	203	2858	6000-15000	62129
2	ship	68436	54386	14050	30	397	5000-18000	53860

Table Ⅱ. DETAILS OF THE TEST IMAGES

Label	Scale (pixels)	Images	Instances	Sub-images
aircraft	$ 8000\times8000 $	5	272	980
ship	$ 8000\times8000 $	5	225	980

Table Ⅵ. PARAMETER SETTINGS OF Mask R-CNN AND Big Map R-CNN

Input Size	Per Batch Size	Max Iteration	Anchor Stride	Base Learning Rate	Steps	Weight Decay	NMS Threshold	Momentum
600	8	90000	(4, 8, 16, 32, 64)	0.01	(60000, 80000)	0.0001	0.7	0.9

Table Ⅲ. PERFORMANCE COMPARISONS OF THREE DIFFERENT CROPPING SIZES IN THE Faster R-CNN NETWORK

Cropping Size	AP	Cost time (s)
C300	0.430	45.82
C600	0.651	13.20
C800	0.647	8.79

Table Ⅳ. PERFORMANCE COMPARISONS OF THE FOUR METHODS ON AIRCRAFT

Method	IoU=0.5						IoU=0.75
Method	TP	FP	FN	Recall	Precision	AP	TP	FP	FN	Recall	Precision	AP
YOLOv3	213	25	59	0.783	0.895	0.727	166	72	106	0.610	0.697	0.494
Faster R-CNN	242	55	30	0.890	0.815	0.830	189	108	83	0.695	0.636	0.618
Mask R-CNN	245	38	27	0.901	0.866	0.843	184	99	88	0.676	0.650	0.570
Big Map R-CNN	261	4	11	0.960	0.985	0.959	241	24	31	0.886	0.909	0.850
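
The Recall and Precision columns follow directly from the TP, FP and FN counts. The sketch below recomputes them for the IoU = 0.5 half of the table using the standard definitions. The function name and the dictionary layout are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# (TP, FP, FN) for aircraft at IoU = 0.5, copied from the table above.
aircraft_iou50 = {
    "YOLOv3": (213, 25, 59),
    "Faster R-CNN": (242, 55, 30),
    "Mask R-CNN": (245, 38, 27),
    "Big Map R-CNN": (261, 4, 11),
}

scores = {name: precision_recall(*counts)
          for name, counts in aircraft_iou50.items()}
for name, (p, r) in scores.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```

For every method TP + FN equals 272, the number of aircraft instances in the test images listed in Table Ⅱ.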

Table Ⅴ. PERFORMANCE COMPARISONS OF THE FOUR METHODS ON SHIP

Method	IoU=0.5						IoU=0.75
Method	TP	FP	FN	Recall	Precision	AP	TP	FP	FN	Recall	Precision	AP
YOLOv3	128	53	97	0.569	0.707	0.513	66	115	159	0.293	0.365	0.213
Faster R-CNN	164	185	61	0.729	0.470	0.651	78	271	147	0.347	0.223	0.259
Mask R-CNN	166	121	59	0.738	0.578	0.661	78	209	147	0.347	0.272	0.273
Big Map R-CNN	191	49	34	0.849	0.796	0.826	133	107	92	0.591	0.554	0.546

Table Ⅶ. THE AVERAGE PRECISION OF Mask R-CNN AND Big Map R-CNN ON THE RSI LS-VHR-2 DATASET

Method	Backbone	AP (%)
Mask R-CNN	ResNet50	75.2
Big Map R-CNN	ResNet50	89.2

Table Ⅷ. COMPREHENSIVE PERFORMANCE COMPARISONS OF FOUR METHODS

Method	mAP (IoU=0.5)	mAP (IoU=0.75)	Inference time (s/image)
YOLOv3	0.620	0.354	3.310
Faster R-CNN	0.741	0.439	13.254
Mask R-CNN	0.752	0.422	13.310
Big Map R-CNN	0.892	0.700	16.005
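
The mAP columns can be cross-checked against the per-class AP values in Tables Ⅳ and Ⅴ. The sketch assumes that mAP is the unweighted mean of the aircraft and ship APs, which the page does not state explicitly. All variable names are illustrative.

```python
# Per-class AP as (IoU=0.5, IoU=0.75), copied from Tables IV and V.
per_class_ap = {
    "YOLOv3":        {"aircraft": (0.727, 0.494), "ship": (0.513, 0.213)},
    "Faster R-CNN":  {"aircraft": (0.830, 0.618), "ship": (0.651, 0.259)},
    "Mask R-CNN":    {"aircraft": (0.843, 0.570), "ship": (0.661, 0.273)},
    "Big Map R-CNN": {"aircraft": (0.959, 0.850), "ship": (0.826, 0.546)},
}

# mAP as reported in the table above, in the same (IoU=0.5, IoU=0.75) order.
reported_map = {
    "YOLOv3": (0.620, 0.354),
    "Faster R-CNN": (0.741, 0.439),
    "Mask R-CNN": (0.752, 0.422),
    "Big Map R-CNN": (0.892, 0.700),
}

def mean_ap(method, idx):
    """Unweighted mean over classes; idx 0 is IoU=0.5, idx 1 is IoU=0.75."""
    aps = [ap[idx] for ap in per_class_ap[method].values()]
    return sum(aps) / len(aps)

# Absolute difference between recomputed and reported mAP.
gaps = {m: tuple(abs(mean_ap(m, i) - reported_map[m][i]) for i in (0, 1))
        for m in per_class_ap}
```

Under this assumption every recomputed value agrees with the reported mAP to within about 0.002, a gap that is plausibly due to rounding of the per-class APs.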
