Abstract
This study examined the efficacy of three ensemble machine learning classifiers, namely random forest, rotation forest and AdaBoost, in assessing flood susceptibility in an arid region of southern Iraq. A dataset was created from flooded and non-flooded areas to train and validate the classifiers using a binary classification scheme (1 = flood, 0 = non-flood). The dataset was then partitioned into two sets: 70% (2478 pixels) for training and 30% (1062 pixels) for testing. Ten influential flood factors were selected and prepared based on data availability and a literature review: surface elevation, slope, plan curvature, topographic wetness index, stream power index, distance to rivers, drainage density, lithology, soil and land use/land cover. The information gain ratio was first used to explore the predictive abilities of the factors. The predictive performances of the three ensemble models were then compared using six statistical measures: sensitivity, specificity, accuracy, kappa, root mean square error and area under the receiver operating characteristic curve. The results revealed that the AdaBoost classifier performed best on these measures, followed by the random forest and rotation forest models. A flood susceptibility map was prepared from the result of each classifier and classified into five zones: very low, low, moderate, high and very high. For the best-performing model, AdaBoost, these zones covered 6002 km² (44%) for the very low–low zones, 2477 km² (18%) for the moderate zone and 5048 km² (40%) for the high–very high zones. This study demonstrated the high capability of ensemble machine learning classifiers to delineate flood susceptibility zones in an arid region.
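The six comparison measures named in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the function names `split_counts` and `evaluate`, the 0.5 decision threshold, the toy labels and scores, and the choice to compute the root mean square error between predicted probabilities and the 0/1 labels are all assumptions made for illustration. The sketch also assumes that both classes are present in the test set.

```python
from math import sqrt


def split_counts(n_pixels, train_fraction=0.7):
    """Hypothetical helper: sizes of the training and testing sets."""
    n_train = round(n_pixels * train_fraction)
    return n_train, n_pixels - n_train


def evaluate(y_true, y_prob, threshold=0.5):
    """Six measures for binary labels (1 = flood, 0 = non-flood).

    y_prob holds predicted flood probabilities; the threshold is an assumption.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    sensitivity = tp / (tp + fn)   # share of flood pixels detected
    specificity = tn / (tn + fp)   # share of non-flood pixels detected
    accuracy = (tp + tn) / n

    # Cohen's kappa: agreement beyond what chance alone would give.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # RMSE between predicted probabilities and 0/1 labels (assumed definition).
    rmse = sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n)

    # AUC as the probability that a flood pixel outscores a non-flood pixel,
    # counting ties as one half (Mann-Whitney formulation).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa, "rmse": rmse, "auc": auc}


# Toy data, not taken from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(split_counts(3540))   # the 2478/1062 pixel split reported above
print(evaluate(labels, scores))
```

Under these assumptions, the same function could be applied to the test-set probabilities of each classifier, so that the three models are ranked on identical measures.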
Cite this article
Al-Abadi, A.M. Mapping flood susceptibility in an arid region of southern Iraq using ensemble machine learning classifiers: a comparative study. Arab J Geosci 11, 218 (2018). https://doi.org/10.1007/s12517-018-3584-5
DOI: https://doi.org/10.1007/s12517-018-3584-5