Efficiency of Extreme Gradient Boosting for Imbalanced Land Cover Classification Using an Extended Margin and Disagreement Performance
Abstract
1. Introduction
2. Data and Methods
2.1. Study Areas and Data
2.2. Methods
2.2.1. Models and Parameters Optimization
2.2.2. Experiments: Analysis of Class Imbalance and Spectral Separability
2.3. Accuracy Assessment
2.3.1. CM Based Accuracy Metrics and Disagreement Performance
2.3.2. The Extended Margin and Margin Based Confidence Measures
3. Results
3.1. Experiment 1: Minority Proportion Influence on XGB
3.2. Experiment 2: Influence of Minority Class Spectral Separability on XGB’s Performance
4. Discussion
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A
1. Brief introduction of margin-based confidence measures.
2. Brief introduction of disagreement performance.
3. Reference map and classification maps of Area 8 with different imbalanced samples.
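As a rough illustration of what a margin-based confidence measure builds on, the sketch below computes the classic supervised ensemble margin in the sense of Schapire et al.: the score of the true class minus the highest score of any other class, normalised by the total score. This is not the extended margin proposed in the article, whose definition is not reproduced here. The function name `ensemble_margin` and the vote counts are hypothetical.

```python
import numpy as np

def ensemble_margin(scores, true_labels):
    """Classic supervised margin per sample, in [-1, 1].

    scores: (n_samples, n_classes) non-negative votes or class probabilities.
    true_labels: (n_samples,) integer index of the true class.
    NOTE: illustrative only; this is not the article's extended margin.
    """
    scores = np.asarray(scores, dtype=float)
    true_labels = np.asarray(true_labels)
    rows = np.arange(scores.shape[0])
    true_score = scores[rows, true_labels]
    # Mask the true class so the maximum is taken over the other classes only.
    others = scores.copy()
    others[rows, true_labels] = -np.inf
    best_other = others.max(axis=1)
    return (true_score - best_other) / scores.sum(axis=1)

# Hypothetical vote counts from a 10-member ensemble over 3 classes.
votes = np.array([[8, 1, 1],
                  [4, 5, 1],
                  [5, 5, 0]])
labels = np.array([0, 0, 1])
margins = ensemble_margin(votes, labels)
print(margins)
```

A positive margin means the sample is classified correctly, and a value close to 1 means it is classified with high confidence. A negative margin means another class received more support than the true one.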
Area | SHDI | Dataset | Resolution | Land cover classes |
---|---|---|---|---|
1 | 0.83 | ADS 40 | 0.2 m | Farmland 1, Farmland 2, Soil |
2 | 0.94 | ADS 40 | 0.2 m | House, Tree, Farmland 1, Farmland 2, Others |
3 | 1.02 | ADS 40 | 0.2 m | Tree, Farmland 1, Farmland 2, Soil, Water, Others |
4 | 1.19 | ADS 40 | 0.2 m | Tree, Farmland 1, Farmland 2, Soil, Grass |
5 | 1.21 | ADS 40 | 0.2 m | House, Tree, Farmland 1, Farmland 2, Soil, Others |
6 | 1.43 | ADS 40 | 0.2 m | House, Tree, Soil, Road, Grass, Others |
7 | 1.67 | ADS 40 | 0.2 m | House, Tree, Farmland 1, Farmland 2, Soil, Grass, Others |
8 | 2.22 | GeoEye-1 | 0.5 m | Water, Road, Tree, Building 1, Building 2, Building 3, Grass, Water-Weeds, Highlight Objects, Soil, Others |
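The SHDI column presumably denotes Shannon's diversity index of the land cover composition, which is commonly defined as the negative sum of p_i ln(p_i) over the class proportions p_i. The sketch below assumes that definition with the natural logarithm. The function name `shannon_diversity` and the example areas are hypothetical and are not taken from the study areas.

```python
import math

def shannon_diversity(class_areas):
    """Shannon's diversity index: -sum(p_i * ln(p_i)) over class proportions.

    class_areas: per-class area or pixel count (hypothetical inputs).
    Classes with zero area are skipped, since p * ln(p) tends to 0 as p -> 0.
    """
    total = sum(class_areas)
    proportions = [a / total for a in class_areas if a > 0]
    return -sum(p * math.log(p) for p in proportions)

# Three equally sized classes give the maximum for three classes, ln(3).
equal_three = shannon_diversity([1, 1, 1])
# A landscape dominated by one class scores much lower.
dominated = shannon_diversity([90, 5, 5])
print(round(equal_three, 3), round(dominated, 3))
```

Under this definition the index is bounded above by ln(m) for m classes, so a larger class count and a more even class distribution both raise the value.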
Pair Separation | Water | Road | Tree | Building 1 | Grass | Water-Weeds | Building 2 | Building 3 | Highlight Objects | Soil | Others |
---|---|---|---|---|---|---|---|---|---|---|---|
Tree | 2.00 | 2.00 | - | 1.84 | 1.82 | 1.98 | 2.00 | 2.00 | 2.00 | 2.00 | 1.97 |
Water | - | 2.00 | 2.00 | 2.00 | 2.00 | 1.99 | 2.00 | 2.00 | 2.00 | 2.00 | 1.98 |
Minority proportion in training data set: 40% of (per) majority. CM = confusion matrix; MW = margin-weighted confusion matrix.

Model | Class | CM House | CM Tree | CM Soil | CM Road | CM Grass | CM Others | MW House | MW Tree | MW Soil | MW Road | MW Grass | MW Others |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
XGB | House | 293 | 8 | 33 | 33 | 1 | 11 | 0.80 | 0.67 | 0.86 | 0.78 | 0.58 | 0.84 |
XGB | Tree | 1 | 498 | 2 | 3 | 65 | 35 | 0.81 | 0.78 | 0.95 | 0.78 | 0.66 | 0.83 |
XGB | Soil | 1 | 0 | 636 | 9 | 1 | 1 | 0.68 | 0.00 | 0.92 | 0.84 | 0.77 | 0.84 |
XGB | Road | 3 | 1 | 4 | 589 | 56 | 2 | 0.76 | 0.82 | 0.94 | 0.86 | 0.69 | 0.89 |
XGB | Grass | 4 | 46 | 1 | 30 | 392 | 2 | 0.75 | 0.72 | 0.96 | 0.80 | 0.69 | 0.76 |
XGB | Others | 2 | 29 | 2 | 3 | 13 | 631 | 0.86 | 0.68 | 0.95 | 0.79 | 0.66 | 0.88 |
RF | House | 265 | 8 | 56 | 37 | 2 | 11 | 0.53 | 0.29 | 0.47 | 0.43 | 0.25 | 0.38 |
RF | Tree | 1 | 494 | 2 | 3 | 66 | 38 | 0.41 | 0.48 | 0.76 | 0.46 | 0.35 | 0.66 |
RF | Soil | 0 | 0 | 635 | 10 | 2 | 1 | 0.00 | 0.00 | 0.67 | 0.37 | 0.22 | 0.11 |
RF | Road | 3 | 3 | 4 | 587 | 56 | 2 | 0.41 | 0.22 | 0.39 | 0.52 | 0.46 | 0.78 |
RF | Grass | 4 | 51 | 2 | 31 | 385 | 2 | 0.39 | 0.44 | 0.45 | 0.42 | 0.45 | 0.69 |
RF | Others | 2 | 37 | 2 | 3 | 12 | 624 | 0.52 | 0.37 | 0.64 | 0.39 | 0.32 | 0.72 |
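To show how a confusion matrix such as the XGB one above can be turned into summary metrics, the sketch below computes overall accuracy together with quantity and allocation disagreement, following the decomposition of Pontius and Millones. It assumes the raw sample counts can be used directly as population proportions, with no stratification weights. Whether the article applies exactly this variant is not stated in this section. The function name `disagreement` is hypothetical.

```python
import numpy as np

def disagreement(cm):
    """Overall accuracy, quantity disagreement and allocation disagreement.

    cm: square confusion matrix of counts. All three results are unchanged
    if the matrix is transposed, so the row/column convention does not matter.
    ASSUMPTION: counts are treated as population proportions.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    row_tot = cm.sum(axis=1)
    col_tot = cm.sum(axis=0)
    overall_accuracy = diag.sum() / n
    # Quantity: mismatch between class totals, halved to avoid double counting.
    quantity = np.abs(row_tot - col_tot).sum() / (2 * n)
    # Allocation: per class 2 * min(omission, commission), halved over classes.
    allocation = np.minimum(row_tot - diag, col_tot - diag).sum() / n
    return overall_accuracy, quantity, allocation

# XGB confusion matrix from the table above (House, Tree, Soil, Road, Grass, Others).
xgb_cm = [
    [293,   8,  33,  33,   1,  11],
    [  1, 498,   2,   3,  65,  35],
    [  1,   0, 636,   9,   1,   1],
    [  3,   1,   4, 589,  56,   2],
    [  4,  46,   1,  30, 392,   2],
    [  2,  29,   2,   3,  13, 631],
]
oa, q, a = disagreement(xgb_cm)
print(f"OA={oa:.4f}  quantity={q:.4f}  allocation={a:.4f}")
```

Under these assumptions the two disagreement components sum to one minus the overall accuracy, which is a quick consistency check on any implementation.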
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sun, F.; Wang, R.; Wan, B.; Su, Y.; Guo, Q.; Huang, Y.; Wu, X. Efficiency of Extreme Gradient Boosting for Imbalanced Land Cover Classification Using an Extended Margin and Disagreement Performance. ISPRS Int. J. Geo-Inf. 2019, 8, 315. https://doi.org/10.3390/ijgi8070315