The Influence of Region of Interest Heterogeneity on Classification Accuracy in Wetland Systems
Article

The Influence of Region of Interest Heterogeneity on Classification Accuracy in Wetland Systems

1 Pegasus Technical Services, Inc., c/o U.S. Environmental Protection Agency, Cincinnati, OH 45219, USA
2 Direção-Geral do Território, 1099-052 Lisbon, Portugal
3 Office of Research and Development, U.S. Environmental Protection Agency, Cincinnati, OH 45268, USA
4 Laboratory of Floristics and Geobotany, Institute of General and Experimental Biology SB RAS, Ulan-Ude 670047, Russia
5 Laboratory of Physical Geography and Biogeography, V.B. Sochava Institute of Geography SB RAS, Irkutsk 664033, Russia
6 Department of Botany, Irkutsk State University, Irkutsk 664003, Russia
* Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(5), 551; https://doi.org/10.3390/rs11050551
Submission received: 22 January 2019 / Revised: 19 February 2019 / Accepted: 20 February 2019 / Published: 6 March 2019

Abstract

Classifying and mapping natural systems such as wetlands using remote sensing frequently relies on data derived from regions of interest (ROIs), often acquired during field campaigns. ROIs tend to be heterogeneous in complex systems with a variety of land cover classes. However, traditional supervised image classification is predicated on pure single-class observations to train a classifier. This ultimately encourages end-users to create single-class ROIs, nudging ROIs away from field-based points or gerrymandering the ROI, which may produce ROIs unrepresentative of the landscape and potentially insert error into the classification. In this study, we explored WorldView-2 images and 228 field-based data points to define ROIs of varying heterogeneity levels in terms of class membership to classify and map 22 discrete classes in a large and complex wetland system. The goal was to include rather than avoid ROI heterogeneity and assess its impact on classification accuracy. Parametric and nonparametric classifiers were tested with ROI heterogeneity that varied from 7% to 100%. Heterogeneity was governed by ROI area, which we increased from the field-sampling frame of ~100 m2 nearly 19-fold to ~2124 m2. In general, overall accuracy (OA) tended downwards with increasing heterogeneity but stayed relatively high until extreme heterogeneity levels were reached. Moreover, the differences in OA were not statistically significant across several small-to-large heterogeneity levels. Per-class user’s and producer’s accuracies behaved similarly. Our findings suggest that ROI heterogeneity did not harm classification accuracy unless heterogeneity became extreme, and thus there are substantial practical advantages to accommodating heterogeneous ROIs in image classification. Rather than attempting to avoid ROI heterogeneity by gerrymandering, classification in wetland environments, as well as analyses of other complex environments, should embrace ROI heterogeneity.


1. Introduction

Remote sensing plays a significant role in environmental monitoring and management applications, helping to quantify the effect of both natural and anthropogenic factors on the spatiotemporal dynamics of wetland systems [1]. Fundamental remote sensing data-processing workflows include the acquisition of applicable remote sensing data, implementation of a systematic and appropriate field data collection procedure, and selection and calculation of data-analysis algorithms that meet the current state of knowledge and the scope of the research being undertaken. The effective integration of this process creates highly accurate, repeatable, and relevant geospatial products [2].
Myriad possible combinations of spatial scales and analytical methods make effective execution of any remote sensing project a daunting task [3,4,5,6]. For example, the increasing spatial and spectral resolution of economically affordable remote sensing data challenges fundamentals of remote sensing, such as delineation and sampling of representative plots that are spectrally homogeneous for field data collection [7,8]. Yet higher heterogeneity of spectral responses is often expected in natural environments, particularly with the advance in sensor capabilities [9]. This leads to a paradox: increased spatial and spectral resolution promising new opportunities for remote landscape classification can discern an abundance of unique end-product classes requiring highly demanding and costly field efforts (i.e., an abundance of field sampling points) and the derivation of spectrally homogeneous regions of interest (ROIs) such that many of the advantages of using remotely sensed data could be lost.
Feature extraction and classification of natural objects with heterogeneous spectral signatures (e.g., mixed ROIs, composed of pixels with differing class membership) often result in erroneous classification in the final thematic map if not handled appropriately. For example, probabilistic functions underpinning traditional supervised classifiers assume ROI purity [10,11], and large samples of pure ROIs are suggested for better classification accuracy outcomes in training supervised classifiers [7,12,13]. Consequently, end-users frequently opt to create pure or homogeneous ROIs (e.g., moving ROIs away from field-based points, or gerrymandering the ROI), potentially inserting error into their classification. The feasibility of creating pure ROIs in the field, especially in light of the aforementioned satellite resolution and spectral bandwidth paradox, is likely to be limited by the required amount of financial, logistical, and time resources [13,14]. This is particularly true when trying to collect large and spectrally homogeneous representative data in large wetland systems due to the high diversity of wetland classes and plant species composition per unit surveying area [15], driven by manifest differences in hydropatterning (at coarser spatial scales) and hydroperiod (at finer spatial scales). For example, the hydrogeomorphology of deltaic wetland ecosystems can be unique and complex with rivers meandering through the delta creating both depressions (e.g., pools) and elevated terraces (e.g., river banks and islands), with great heterogeneity of edaphic conditions reflected in high local and landscape-scale diversity.
Nevertheless, researchers have pushed to establish baseline data on wetland systems across the globe, such as Russia’s Selenga River Delta into Lake Baikal [16,17] or Botswana’s Okavango Delta [18] through the judicious use of emerging classification approaches (e.g., [19]) and technological advances (e.g., [20,21], see also [2,22] for the current state-of-the-science in wetland satellite remote sensing).
It is against this backdrop that there is increasing utility and emerging need to conduct classifications using heterogeneous training ROIs. Classification of ROIs with mixed spectral response involves computation of the wetland class end member proportions by ROIs rather than through probabilistic determination of the spectral response of a pixel to a particular wetland class end member. Many studies have shown the potential advantages of using mixed ROIs in training (e.g., [4,13]). Brown et al. [12] stated class mixture modeling is fundamental and suggested a discrimination approach when discrete class densities do not overlap in feature vector space and statistical pattern recognition methods when they do overlap. Amancio et al. [23] noted that despite the long tradition of using pattern recognition algorithms in applications such as industry, commerce, and academic research, there is no single method that yields the best accuracy results. Despite these admonitions, classification with mixed-ROIs remains uncommon.
With the increasing number of high-resolution satellite sensors creating a paradox for effectively characterizing ecosystems, we sought to answer the following research question: Do mixed ROIs yield better classification accuracy in complex remote sensing classification approaches? In addressing this research question, we used four classifiers often trained with samples of only single-class observations, but also allowing for heterogeneous multi-class observations. Specifically, we used and contrasted both parametric and nonparametric approaches: (1) Multinomial Logistic Model (MLM), (2) Generalized Linear Model (GLM), (3) Support Vector Machine (SVM), and (4) Random Forest (RF). The literature we assessed (e.g., [4]) and the paradox we observed suggested that mixed ROI approaches would outperform pure ROI approaches. We conducted our research in a large freshwater deltaic wetland with high complexity (i.e., with ~22 discrete classes of wetland and open-water habitats [17,24]). Our goal in conducting these unique and novel analyses was to assess and characterize the benefits and detriments of using mixed-ROIs to provide useful information for end-users in selecting and parametrizing wetland classification models.

2. Materials and Methods

2.1. Study Area

The study area is the Russian Federation’s Selenga River Delta located in southern Siberia (Figure 1). The hydro-climate of the region is characterized by high-amplitude daily and seasonal air temperature fluctuations, cold and long winters, short springs and short but warm and relatively rainy summers [25,26]. The delta covers an area of ~1100 km2 at the terminus of the Selenga River, the major fluvial contributor of water, sediment and contaminant inflows to Lake Baikal [27]. The Selenga River contributes 50–60% of the total inflow to Lake Baikal [28,29,30,31], the oldest (20–25 million years) and deepest (>1600 m) lake in the world. Lake Baikal contains 26% of the water volume of the world’s freshwater lakes, and ~6% of all global fresh water resources, including lakes, rivers, glaciers, etc. [32]. Recognized as a World Heritage Site by the United Nations Educational, Scientific and Cultural Organization (UNESCO), the Selenga River Delta into Lake Baikal is home to a wide variety of flora and fauna [31]. The delta acts as a buffer for attenuating and removing anthropogenic contaminants originating from the Selenga River Basin, an area of ~450,000 km2, before reaching Lake Baikal. The density of wetlands and small channels of the delta have shown steady and significant contaminant (e.g., metals and persistent toxic organic compounds) removal under various flow regime conditions [28,33,34]. With an increasing pressure on the eco-hydrology of the Selenga River in general and the flora and the fauna of the river delta in particular, studying the delta’s wetland systems and aquatic habitats is paramount in an effort to preserve its vital ecosystem services and functions for the future [35,36].

2.2. Remote Sensing Data Acquisitions and Pre-Processing

Two overlapping cloud-free WorldView-2 images (WV2; DigitalGlobe, Westminster, CO, USA) were acquired in 2011 (images taken on 25 June and 3 July). WV2 has eight multispectral bands and one panchromatic band with 2.0-m and 0.5-m spatial resolutions, respectively. The eight multispectral WV2 bands include four newer bands (i.e., coastal, yellow, red-edge, and near infrared-2) in addition to the four “traditional” bands (i.e., blue, green, red, near infrared-1). These additional bands have been found to improve wetland vegetation and habitat discrimination and classification (e.g., [24]). The spatial extent of the study area covered by the two images is 215 km2, focusing on a central portion of the Selenga River Delta. Absolute radiometric calibration factor and effective band-width values provided with the imagery metadata were used to evaluate the radiometric compatibility of the two images before the digital number (DN) values were converted to top-of-the-atmosphere reflectance values in ENVI (v. 5.3, Exelis Visual Information Solutions, Inc., Harris Corporation, Broomfield, CO, USA). Ortho-rectification was not necessary since the two images are Ortho-Ready Standard (OR2A) with geo-accuracy error of <5 m based on 21 ground control points (GCPs) that were collected during the field data collection season (described below). The two images were mosaicked in ENVI and then classified into an initial 22 unsupervised classes as described in detail by Lane et al. [24] using the Iterative Self-Organizing Data Analysis (ISODATA) clustering technique in ENVI. The unsupervised ISODATA wetland class thematic map produced was used for initial field data collection. The vector layer polygons created from the ISODATA classified image were loaded into a Trimble Nomad and/or a Trimble Yuma GPS receiver (Sunnyvale, CA, USA) with 2- to 5-m real-time accuracy for field data collection.

2.3. Field Data Collection and Processing

A total of 228 field sites and 21 GCPs in the focal area of Figure 2 were visited by boat, vehicle and foot in 2011 and 2012. A 100-m2 area typical of the target ISODATA class was chosen by the field team, and vegetation (i.e., species abundance occurring >10%) and corresponding habitat data were collected (see, e.g., [17]). Between three and 17 unique polygons were visited for each of the 22 classes (average: 10 field sites per class). These botanical data were subsequently collapsed to the genus level and used to both train and validate the classifiers. Information on the vegetation composition and structure of the classes is available in Table 1. Circular ROI polygons of increasing diameters (Figure 3A–C) were delineated around each field site in ArcGIS (v. 10.4.1, ESRI Inc., Redlands, CA, USA). Differing diameters were arbitrarily chosen (from 12 m to 52 m; area ranged from 113 to 2124 m2), as there is no set distance for assessment, and end-users may choose to use any given diameter or shape, as dictated by the application at hand, such as field data collection designs. As expected, increasing the diameter increased the potential range of controlling factors affecting vegetation structure and type (e.g., increased the likelihood of different hydrologic regimes affecting vegetation composition). Ergo, increasing ROI heterogeneity resulted from increased ROI diameter (Table 2). The relative purity of the ROI was also a function of the field site location. Field data collected from the center of large uniform features (e.g., a pond or a large river course) would be expected to maintain ROI purity with increasing ROI diameter (Figure 3A–C; see also Table 2).
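The relationship between ROI diameter and area quoted above follows from the area of a circle. The short Python sketch below is an illustration only (the authors delineated ROIs in ArcGIS); the function name `roi_area_m2` is hypothetical, and the diameters listed are those named in the article text.

```python
import math

def roi_area_m2(diameter_m: float) -> float:
    """Area of a circular ROI in square metres, given its diameter in metres."""
    return math.pi * (diameter_m / 2.0) ** 2

# Diameters mentioned in the article (ROI-IDs D12, D15, D17, D32, D52).
for d in (12, 15, 17, 32, 52):
    print(f"D{d}: {roi_area_m2(d):.0f} m2")

# Ratio of the largest to the smallest ROI area ("nearly 19-fold").
fold_increase = roi_area_m2(52) / roi_area_m2(12)
```

Because area grows with the square of the diameter, a roughly four-fold increase in diameter (12 m to 52 m) produces the nearly 19-fold increase in area.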

2.4. Mixed-ROI Image Classification

Four classification algorithms (i.e., MLM, GLM, SVM, and RF) were implemented in R (R Core Team 2016) using the following packages: nnet [37], glmnet [38], gmum.r [39], and ranger [40]. Each classification algorithm was iteratively analyzed, using ROIs with 11 increasing diameters and hence increased heterogeneity (see Table 2). Heterogeneity of the ROIs was used to weight their contribution to the classification learning. That is, ROIs were used to provide information on the classes (among the 22 classes) whose membership was larger than zero (i.e., covering >0% of the ROI’s area). These membership degrees (i.e., cover percentages) were provided as observation weights to the classifiers, following [4]. For example, in R’s function ‘ranger’, the membership degrees were passed to RF via argument ‘case.weights’. To focus on the influence of ROI heterogeneity, the classification parameters other than heterogeneity were not modified between ROI sizes. WV2 has eight spectral bands and the WV2 spectral band mean and standard deviation from all pixels within each ROI were calculated, resulting in a total of 16 predictor variables.
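As a rough illustration of the two steps described above (building 16 predictors per ROI and turning class cover percentages into observation weights), the following Python sketch may help. It is not the authors' R code: the function names, the pixel values, and the class cover fractions are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
from statistics import mean, stdev

def roi_predictors(pixels):
    """Band means followed by band standard deviations for one ROI.

    pixels: list of per-pixel tuples, one reflectance value per band.
    With eight bands this yields 16 predictor values.
    """
    bands = list(zip(*pixels))  # regroup values band by band
    return [mean(b) for b in bands] + [stdev(b) for b in bands]

def weighted_rows(predictors, memberships):
    """Expand one ROI into one training row per class with cover > 0.

    memberships: dict mapping class id -> fraction of the ROI area.
    Each row is (predictors, class id, observation weight).
    """
    return [(predictors, class_id, cover)
            for class_id, cover in sorted(memberships.items())
            if cover > 0]

# Hypothetical ROI: three pixels with eight band values each.
pixels = [
    (0.10, 0.12, 0.15, 0.14, 0.13, 0.30, 0.45, 0.40),
    (0.11, 0.13, 0.16, 0.15, 0.12, 0.32, 0.47, 0.42),
    (0.09, 0.11, 0.14, 0.13, 0.14, 0.28, 0.43, 0.38),
]
# Hypothetical cover fractions: 70% class 12, 30% class 14, no class 5.
memberships = {12: 0.7, 14: 0.3, 5: 0.0}

x = roi_predictors(pixels)
rows = weighted_rows(x, memberships)
```

In this sketch a mixed ROI contributes one row per class it contains, with the cover fraction as the weight, which is the kind of input an argument such as `case.weights` in `ranger` accepts; a pure ROI contributes a single row with weight 1.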

2.4.1. Multinomial Logistic Model

MLM is an extension of the binary logistic model to handle cases in which the variable of interest such as land cover can take multiple classes. In both MLM and binary logistic model approaches, the goal of the models is to describe the assumed linear relationship between the response variable and the predictors. MLM has been used in statistical analysis for decades and can be seen here as a conventional classifier normally trained only with known pure responses. However, the model can also accommodate mixed responses, for example through the use of an Artificial Neural Network (ANN). ANNs are a machine-learning approach initially developed to study biological functions; they mimic the flow of signals along neural networks in the human brain and predict outputs by processing non-linear and complex interactions among input-predictor variables [41]. A simple network without hidden layering can fit multinomial logistic models [37]. This specific net architecture has the advantage of accommodating mixed training units (i.e., mixed ROIs) in training, which is typically not possible in traditional statistical packages.
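One way to see why a softmax output without hidden layers can take mixed ROIs is to write the loss for a single ROI. The Python sketch below is an assumption-laden illustration, not the nnet implementation: the function names and the three-class example scores are hypothetical. It shows that a cross-entropy loss with fractional class memberships equals the sum of the per-class losses weighted by those memberships.

```python
import math

def softmax(scores):
    """Convert linear class scores into class probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_loss(probs, memberships):
    """Cross-entropy of predicted probabilities against fractional memberships."""
    return -sum(m * math.log(p) for m, p in zip(memberships, probs) if m > 0)

# Hypothetical linear scores for three classes from one ROI's predictors.
probs = softmax([1.0, 2.0, 3.0])

# A pure ROI (100% class 3) and a mixed ROI (70% class 3, 30% class 2).
pure = mixed_loss(probs, [0.0, 0.0, 1.0])
mixed = mixed_loss(probs, [0.0, 0.3, 0.7])
```

Under these assumptions, a pure ROI is simply the special case in which one membership equals 1 and the others equal 0.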

2.4.2. Generalized Linear Model

GLMs are extensions of linear models by using a link function that relates the expected value of the response to a linear combination of the predictors. GLMs are suitable for modeling response variables of arbitrary distributions, including binary and count data. Here, GLMs were used to fit multinomial models similar to Section 2.4.1 above, but also with regularization and feature selection [38]. This strategy promotes robustness of modeling in classification, especially when the dimension of the data is large relative to the sample size, which is the case of the paradoxical scenario of increased spatial and spectral resolutions used in remote sensing applications.
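The regularization mentioned above can be illustrated with the elastic-net penalty that glmnet adds to the model's negative log-likelihood. The Python sketch below shows only the penalty term; the function name is hypothetical, and the values of the mixing parameter `alpha` and the strength `lam` are assumptions, since the article does not report the settings used.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic-net penalty: lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).

    alpha = 1 gives the lasso (which can set coefficients exactly to zero,
    acting as feature selection); alpha = 0 gives ridge shrinkage.
    """
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * ((1 - alpha) / 2 * l2 + alpha * l1)

# Hypothetical coefficients for two predictors.
lasso_penalty = elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=1.0)
ridge_penalty = elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=0.0)
```

Larger penalties shrink coefficients more strongly, which is what stabilizes the fit when there are few ROIs per class relative to the 16 predictors.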

2.4.3. Support Vector Machine

SVM supervised nonparametric techniques produce high classification accuracy results with limited ground-truthed data by determining an optimum hyperplane separating the training dataset into discrete user-defined class end-members [13,42,43,44,45]. The SVM algorithms are particularly well informed by mixed ROIs and can be particularly robust when the hyperplane is located close to the center of class end-members ascribed by pure- and mixed-ROI training samples divided in feature space [13].

2.4.4. Random Forest

RF is based on ensemble machine-learning and is increasingly being used as a classifier of choice for remote sensing analyses of different habitats (e.g., [16,46,47,48]). RF is nonparametric, and can be used for both classifications and regressions, as well as for determining variable importance [49,50]. In RF, a user-defined number of trees (ntree), each split at a node using input-predictor variables (mtry), contributes a single vote, where class end-membership assignment of the input vector is based on the majority of the votes. RFs are constructed using a bootstrap aggregation approach where the input data are randomly selected with replacement to respectively train the trees (in-bag samples) and to perform internal accuracy assessment (out-of-bag samples) [49,51].
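The bootstrap and voting steps described above can be sketched in a few lines of Python. This is a simplified illustration rather than the ranger implementation: tree growing is omitted, the function names are hypothetical, and the sample size of 171 is borrowed from the training set size reported in Section 2.5.

```python
import random
from collections import Counter

def bootstrap_split(n, rng):
    """Draw n indices with replacement (in-bag); the rest are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag

def majority_vote(votes):
    """Return the class that received the most tree votes."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(171, rng)

# Hypothetical votes from five trees for one ROI.
predicted_class = majority_vote([14, 9, 14, 14, 5])
```

Each tree sees its own in-bag sample, so the out-of-bag ROIs for that tree act as an internal test set without requiring a separate hold-out.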

2.5. Accuracy Assessment

Overall accuracy (OA), producer’s accuracy (PA), and user’s accuracy (UA) were assessed via Monte Carlo cross-validation (mean of 100 iterations), in which 75% of the total number of ROIs available (n = 171) were used for training the models while the remaining 25% (n = 57) were used for performing an independent classification accuracy assessment. The three accuracy measures are reported as the mean of the 100 iterations. We quantitatively assessed if the observed differences among the overall accuracies were statistically significant using 95% confidence intervals [52].
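A single iteration of the procedure above can be sketched as follows. This Python example is illustrative only: the function names and the toy reference and predicted labels are hypothetical, classifier training is omitted, and the confidence-interval calculation of [52] is not reproduced.

```python
import random

def monte_carlo_split(n_rois, train_fraction, rng):
    """Randomly partition ROI indices into training and test sets."""
    idx = list(range(n_rois))
    rng.shuffle(idx)
    n_train = round(n_rois * train_fraction)
    return idx[:n_train], idx[n_train:]

def accuracies(reference, predicted, classes):
    """Overall accuracy plus per-class producer's and user's accuracy."""
    pairs = list(zip(reference, predicted))
    oa = sum(r == p for r, p in pairs) / len(pairs)
    pa, ua = {}, {}
    for c in classes:
        hits = sum(r == c and p == c for r, p in pairs)
        n_ref = sum(r == c for r, _ in pairs)   # reference ROIs of class c
        n_pred = sum(p == c for _, p in pairs)  # ROIs predicted as class c
        pa[c] = hits / n_ref if n_ref else float("nan")
        ua[c] = hits / n_pred if n_pred else float("nan")
    return oa, pa, ua

rng = random.Random(0)
train_idx, test_idx = monte_carlo_split(228, 0.75, rng)

# Toy example with two classes and four test ROIs.
oa, pa, ua = accuracies([1, 1, 2, 2], [1, 2, 2, 2], classes=[1, 2])
```

Repeating the split 100 times with different random draws and averaging OA, PA, and UA gives the Monte Carlo estimates reported in the Results.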

3. Results

The highest and lowest OA (87.8% and 48.4%) were achieved using RF and MLM, respectively, in both cases with the smallest ROI-size (Table 3; Figure 4). RF outperformed the other classifiers (except for ROI-ID D14), followed closely by SVM, before the performance of SVM decreased well below that of RF for extreme heterogeneity. The GLM performed moderately well, with OA ranging from 62–79%, with better performance at smaller ROIs. Interestingly, unlike the other classifiers, MLM performed poorly at smaller ROIs and increased in performance through ROI-ID D24 (64%) until OA began decreasing again. An example of the differences in classification application between the algorithms may be seen in Figure 5, in which the map of the MLM presents a highly pixelated structure (Figure 5C) compared to the remaining maps, relatively more consistent among them (Figure 5D–F).
The OA changes vacillated with ROI size and, implicitly, by quantified heterogeneity (see Table 2) in the range of |0.9%–1.5%| for the three best-performing classifiers (with the exception of the largest ROI size). Significant differences arose once ROI-ID D15 (area = 177 m2) was exceeded (Table 3). Further increasing ROI heterogeneity to ROI-ID D17 (227 m2) afforded an improvement in RF and GLM OA that again made the differences between those approaches statistically insignificant. Exceeding D17 (227 m2) made the differences statistically significant for the three classifiers. However, OA decreased smoothly until ROI heterogeneity was at a maximum. As for MLM, its performance lagged as measured by the OA yet MLM had the least change in OA across the ROIs (averaging |1.0%|). The stability of MLM is suggestive of potential utility, but that is belied at this point by the low OA, particularly marked at small ROI sizes.
Per-class accuracy estimates were also produced, and the producer’s and user’s accuracy for each class by the best-performing classifier (RF) are provided in Figure 6. The classes mapped with larger accuracy, both in terms of PA and UA, were classes 12 (Dense floating vascular (Nymphoides)), 13 (Very dense floating vascular), 16 (Persistent emergent (Equisetum)) and 21 (Persistent terrestrial (Amoria)). These and most of the classes followed a decreasing trend of accuracy as ROI size increased. The largest PA and UA values were obtained across several ROI heterogeneity levels (e.g., PA of class 13 reached 100% up to ROI-ID D32). The smallest PA and UA values (22% and 26%) were obtained with the largest ROI size (D52) for classes 9 (Submerged floating vascular (Nymphoides)) and 5 (Shallow water with sand bottom), respectively. Presenting an inverse trend was class 14, a monoculture-forming type of Persistent emergent (Phragmites). This class apparently benefited from ROI heterogeneity for enhanced per-class classification accuracy. Class 9 (Nymphoides) was somewhat insensitive to ROI heterogeneity, and hence its PA and UA were relatively flat across all ROI sizes (except ROI-ID D52).

4. Discussion

Lewis Carroll’s Red Queen notes, “Now, here you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” [53]. Advances in remote sensing imagery analysis and increased availability of sensed electromagnetic bands require us to “move faster” (i.e., more creatively analyze the data) to effectively classify landscapes. The recent development and application of machine-learning classifiers such as ANNs, SVM, and RF approaches have quickly (pun intended) advanced our efforts to understand landscape patterning using spectral data. The science has moved from early supervised classification efforts (e.g., using Maximum Likelihood (ML); [54,55]) to using RF, SVM, ANN, and other approaches. However, the utility of the advanced methods when confronted with non-homogeneous ROIs in wetland classification has remained problematic (resulting in gerrymandered ROIs or data point nudging), insofar as our literature analyses have discerned.
Some authors have included wetlands as a discrete class when contrasting among methods, though none have wetland systems as diverse (e.g., with 22 wetland classes) as in our analyses. For instance, classifying seven land cover types (citrus, pasture, sod, timber, urban, water and wetland) using Landsat-5 TM data, Dixon and Candade [56] reported OA of 51%, 78% and 79% using ML, ANNs and SVM, respectively, concluding SVM is faster and easier to implement than ANNs. Moreover, Foody and Mathur [13] found comparable OA using conventional pure-pixels (92.6%) and mixed-pixels (91.1%) classification approaches using unbalanced training datasets, 3-band SPOT HRV multispectral data and SVM to classify three agricultural cover types.
In our analyses, we increased ROI size up to 2124 m2. As the ROI size increased, the number of different wetland class end-members defined by the boundaries of the ROI also increased (e.g., Figure 3). As we note, mixed-ROIs will become increasingly commonplace with technological advances. Across our analyses, though, ROI sizes increased nearly 19-fold, yet accuracies with RF (and to a lesser degree, SVM) stayed remarkably consistent, even as the ROI heterogeneity increased. To wit, we achieved OA of nearly 80% with RF (ROI-ID D32, 804 m2, eight times our field-based quadrat size of 100 m2) even though 96% of the ROIs were heterogeneous. We developed our approach using 22 classes, and it is likely that fewer wetland classes would have resulted in even greater OA across the different classifiers (e.g., [19]).
Across all ROI sizes and with increasing heterogeneity, both SVM and RF were superior to MLM and GLM (though we note the choice of classifier can be contextually dependent). The poor performance of MLM for small ROI sizes is perhaps explained in part by the small representativeness of some classes in the training samples. Since 228 ROIs were used to train 22 classes, and most of the ROIs were pure, there were on average only 228/22 ROIs per class (~10). This is a small sample size for training multinomial logistic models given the number of predictors (16). Note that ANNs are able to perform well with small sample sizes, but in this case they were used to fit multinomial logistic models. Increasing the size of the ROI increases heterogeneity and thus each individual ROI provides information on a larger number of classes (as many classes as it encloses). Increasing ROI size and heterogeneity can be seen as an indirect way of increasing sample size. Possibly for this reason, the OA of MLM increased with increasing diameter up to 24 m. That is, the particular behavior of MLM increasing OA from 48.4% to 60.6% was possibly caused by the richer information held in the mixed ROIs, which increases the number of ROIs available for training individual classes.
GLM, also used to fit multinomial models, was less sensitive to the small representativeness of the classes when the ROIs were small and mostly pure, possibly due to regularization and feature selection [38]. Therefore, traditional classifiers such as those with a statistical basis, which commonly require large samples, can perform relatively well in sub-optimal conditions, as long as some refinement is implemented. However, nonparametric algorithms are by design not constrained to use linear combinations of the predictors, and this may be advantageous. Thus, SVM and RF achieved higher OA.
While both SVM and RF allow integration of ancillary data, we concur with others (e.g., [42,57,58,59]), and recommend RF due to the need for lower parametrization, informative generation of variable importance (i.e., Mean Decrease in Gini values), detection of outliers, and construction of simple decision boundaries. When contrasted with RF, SVM is computationally costlier and less accurate, particularly when an abundance of predictor variables (i.e., hyperspectral or multi-source data) is utilized [49]. On the other hand, RF requires a veritable forest of trees (i.e., a large number) to train the model, and SVM demands abundant support vectors to build its model. Furthermore, perhaps based on the limited number of parameters to select with RF or our familiarity with the approach (e.g., [16]), we may have optimized the RF results relative to SVM.
Irrespective of the classifier used, OA was stable across several ROI sizes. For example, the 95% CI of the largest OA for RF, obtained with ROI-ID D12, overlaps that of larger sizes (see Table 3). This means that the differences observed between the OA values for a range of ROI diameters are not statistically significant. Table 2 shows that the percentage of pure ROIs decreased from 93.4% to 21.1%. Despite this big difference in ROI heterogeneity as a function of their size, the differences in OA are statistically insignificant. Similar situations are observed for GLM and SVM. Per-class classification accuracy also suggests a strong ability of the classifiers to handle varying degrees of ROI heterogeneity. The producer’s and user’s accuracies of most of the classes stayed relatively high across several ROI sizes. There was only one clear exception (class 14), which benefited from ROI heterogeneity. Possible reasons for this distinct behavior of a monoculture-forming vegetative class (Phragmites) are not clear, which merits further research. Nevertheless, none of the classes seemed to suffer from ROI heterogeneity until extreme levels were used. Our findings, then, suggest that there are substantial practical advantages to creating mixed-ROIs rather than gerrymandering. In classification, the only thing that needs to be done is to recognize the heterogeneous nature of the ROIs rather than assuming they are necessarily pure, and use their heterogeneity directly in training the classifiers. The suggested way of using heterogeneity in training is to find the relative proportion of the classes found in the individual ROIs and use it as class membership (training weights), which is included in the settings passed to the classifiers [4].
The OA of GLM, SVM, and RF decreased significantly (i.e., the confidence intervals ceased overlapping) with increasing ROI size and hence heterogeneity, as compared to their respective highest OA (at D14, D14, D12, respectively; see Table 3). Where the OAs ceased overlapping, for ROI-ID D16 (for SVM) and D18 (for GLM and RF) the percentage of pure ROIs decreased to 29.4% and 17.1%, respectively. Therefore, we found classification accuracy starts decreasing significantly when the percentage of pure ROIs crosses a certain threshold. This suggests that there is a limit of the heterogeneity the classifiers can accommodate in training before their performance degrades. Recent research has shown that both mixed and pure training responses are needed. Classifiers can learn from pure responses to describe the classes and from mixed responses to separate the classes [4,13]. Ma et al. [60] found that the ratio of pure to mixed training units should lie between 0.2 and 0.6 for their dataset, which matches the ratios defined by ROI-ID D15 to D17. However, the specific threshold should fluctuate as a function of the application, including study area, number of classes, classifier, etc.

5. Conclusions

The world’s wetlands are highly diverse in vegetative and habitat structure, and that diversity creates hot spots of biodiversity and functioning. Yet wetland losses around the world continue (e.g., [61]), and these will be exacerbated by expected changes in precipitation patterning and timing (e.g., [62]). Satellite remote sensing is a cost-effective approach to creating baseline understanding of wetland structure. However, increasing spatial resolution and sensed spectral bands have created a paradox requiring increasing amounts of expensively collected field data and/or operational time to create homogeneous ROIs and thus meet the requirements of traditional classification approaches. Recently, Costa et al. [4] reiterated the utility of landscape classification using mixed-ROIs as defined in image segmentation. In this study, we contrasted the ability of four different classifiers to characterize a diverse wetland landscape with increasing ROI heterogeneity. The evidence is conclusive that following the Costa et al. [4] approach and using RF provides the highest OA among the four classifiers we explored. We furthermore conclude that the Costa et al. [4] approach with RF provides adequate OA, as defined here by OA of approximately 80% or higher, with ROI sizes of up to 804 m2, eight-fold higher than our field-based assessment area. Natural wetland landscapes like the Selenga River Delta tend to be complex, and hence collecting an adequate number of homogeneous ROIs that are representative of the entire wetland landscape can be difficult. The ROI size and heterogeneity increased concurrently in our study area, as we expect it would elsewhere. We therefore do not establish particular thresholds for ROI size as long as there is a fraction of pure ROIs. We do, however, conclude that remote sensing analyses should explore ROIs of several sizes, as thresholds and change points may emerge.
ROI size is likely situationally determined as well given that end users could be focused on broad vegetation classes (e.g., open waters, submerged macrophytes, forested wetlands, etc.) or on specific communities (e.g., Potamogeton-dominated waters, Phragmites patches). These concerns should inform final ROI-size decisions. However, we conclusively state that wetland analyses should embrace ROI heterogeneity rather than attempting to avoid it.

Author Contributions

All authors made significant contributions to the manuscript. T.M.B. co-developed the study, processed the spatial data, applied remote sensing methodologies, analyzed the results, and co-wrote the manuscript. H.C. provided statistical analysis and interpretation and co-wrote the manuscript. C.R.L. co-developed and supervised the study, collected field data, analyzed the results, and co-wrote the manuscript. O.A.A. and V.V.C. co-developed the study’s field component, provided botanical expertise in collecting and interpreting field data, and contributed to results interpretation and manuscript writing and revision. B.C.A. co-developed the field component of the study, collected, processed, and analyzed field data, and contributed to results interpretation and manuscript editing and revision.

Funding

The research of Oleg A. Anenkhonov and Victor V. Chepinoga was funded by the Russian Federal Budget, projects 0337-2017-0006 and 0347-2016-003, respectively.

Acknowledgments

This paper has been reviewed in accordance with the U.S. Environmental Protection Agency’s peer-review policies and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. Statements in this publication reflect the authors’ personal views and opinions and should not be construed to represent any determination or policy of the U.S. Environmental Protection Agency. We appreciate receiving constructive criticism on earlier drafts from Melanie Vanderhoof (USGS) and Sean Woznicki (US EPA).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Robertson, L.; King, D.; Davies, C. Object-based image analysis of optical and radar variables for wetland evaluation. Int. J. Remote Sens. 2015, 36, 5811–5841.
  2. Wu, Q. GIS and remote sensing applications in wetland mapping and monitoring. In Comprehensive Geographic Information Systems; Elsevier: Oxford, UK, 2018; pp. 140–157.
  3. Burnett, C.; Blaschke, T. A multi-scale segmentation/object relationship modelling methodology for landscape analysis. Ecol. Model. 2003, 168, 233–249.
  4. Costa, H.; Foody, G.M.; Boyd, D.S. Using mixed objects in the training of object-based image classifications. Remote Sens. Environ. 2017, 190, 188–197.
  5. Du, P.; Xia, J.; Zhang, W.; Tan, K.; Liu, Y.; Liu, S. Multiple classifier system for remote sensing image classification: A review. Sensors 2012, 12, 4764–4792.
  6. Woodcock, C.E.; Strahler, A.H. The factor of scale in remote sensing. Remote Sens. Environ. 1987, 21, 311–332.
  7. Foody, G.M. Relating the land-cover composition of mixed pixels to artificial neural network classification output. Photogramm. Eng. Remote Sens. 1996, 62, 491–499.
  8. Foody, G.M. Fully fuzzy supervised classification of land cover from remotely sensed imagery with an artificial neural network. Neural Comput. Appl. 1997, 5, 238–247.
  9. Dalponte, M.; Bruzzone, L.; Vescovo, L.; Gianelle, D. The role of spectral resolution and classifier complexity in the analysis of hyperspectral images of forest areas. Remote Sens. Environ. 2009, 113, 2345–2355.
  10. Foody, G.M.; Arora, M.K. Incorporating mixed pixels in the training, allocation and testing stages of supervised classifications. Pattern Recognit. Lett. 1996, 17, 1389–1398.
  11. Foody, G.M.; McCulloch, M.; Yates, W. Classification of remotely sensed data by an artificial neural network: Issues related to training data characteristics. Photogramm. Eng. Remote Sens. 1995, 61, 391–401.
  12. Brown, M.; Gunn, S.R.; Lewis, H.G. Support vector machines for optimal classification and spectral unmixing. Ecol. Model. 1999, 120, 167–179.
  13. Foody, G.M.; Mathur, A. The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM. Remote Sens. Environ. 2006, 103, 179–189.
  14. Foody, G.M.; Mathur, A.; Sanchez-Hernandez, C.; Boyd, D.S. Training set size requirements for the classification of a specific class. Remote Sens. Environ. 2006, 104.
  15. Ozesmi, S.L.; Bauer, M.E. Satellite remote sensing of wetlands. Wetlands Ecol. Manag. 2002, 10, 381–402.
  16. Berhane, T.; Lane, C.; Wu, Q.; Autrey, B.; Anenkhonov, O.; Chepinoga, V.; Liu, H. Decision-tree, rule-based, and random forest classification of high-resolution multispectral imagery for wetland mapping and inventory. Remote Sens. 2018, 10, 580.
  17. Lane, C.R.; Anenkhonov, O.; Liu, H.; Autrey, B.C.; Chepinoga, V. Classification and inventory of freshwater wetlands and aquatic habitats in the Selenga River Delta of Lake Baikal, Russia, using high-resolution satellite imagery. Wetlands Ecol. Manag. 2015, 23, 195–214.
  18. Murray-Hudson, M.; Wolski, P.; Cassidy, L.; Brown, M.T.; Thito, K.; Kashe, K.; Mosimanyana, E. Remote Sensing-derived hydroperiod as a predictor of floodplain vegetation composition. Wetlands Ecol. Manag. 2015, 23, 603–616.
  19. Dronova, I. Object-based image analysis in wetland research: A review. Remote Sens. 2015, 7, 6380–6413.
  20. White, L.; Brisco, B.; Dabboor, M.; Schmitt, A.; Pratt, A. A collection of SAR methodologies for monitoring wetlands. Remote Sens. 2015, 7, 7615–7645.
  21. Wohlfart, C.; Winkler, K.; Wendleder, A.; Roth, A. TerraSAR-X and wetlands: A review. Remote Sens. 2018, 10, 916–939.
  22. Guo, M.; Li, J.; Sheng, C.; Xu, J.; Wu, L. A Review of wetland remote sensing. Sensors 2017, 17, 777.
  23. Amancio, D.R.; Comin, C.H.; Casanova, D.; Travieso, G.; Bruno, O.M.; Rodrigues, F.A.; da Fontoura Costa, L. A systematic comparison of supervised classifiers. PLoS ONE 2014, 9, e94137.
  24. Lane, C.; Liu, H.; Autrey, B.; Anenkhonov, O.; Chepinoga, V.; Wu, Q. Improved wetland classification using eight-band high resolution satellite imagery and a hybrid approach. Remote Sens. 2014, 6, 12187–12216.
  25. Chebykin, E.; Sorokovikova, L.; Vtomberg, I.; Rasskazov, S.; Khodzher, T.; Grachev, M. Current state of the Selenga River waters in the Russian territory concerning major components and trace elements. Chem. Sustain. Dev. 2012, 20, 561–580.
  26. Plyusnin, A.M.; Kislitsina, L.B.; Zhambalova, D.I.; Peryazeva, E.G.; Udodov, Y.N. Development of the chemical characteristics of ground water at the delta of the Selenga River. Geochem. Int. 2008, 46, 288–295.
  27. Balkhanov, V.K.; Bashkuev, Y.V. Fractal dimension of the channel network structure of Selenga River Delta. Water Resour. 2004, 31, 148–151.
  28. Chalov, S.; Jarsjo, J.; Kasimov, N.; Romanchenko, A.; Pietron, J.; Thorslund, J.; Promakhova, E. Spatio-temporal variation of sediment transport in the Selenga River Basin, Mongolia and Russia. Environ. Earth Sci. 2015, 73, 663–680.
  29. Lychagin, M.; Chalov, S.; Kasimov, N.; Shinkareva, G.; Jarsjö, J.; Thorslund, J. Surface water pathways and fluxes of metals under changing environmental conditions and human interventions in the Selenga River system. Environ. Earth Sci. 2016, 76.
  30. Törnqvist, R.; Jarsjö, J.; Pietroń, J.; Bring, A.; Rogberg, P.; Asokan, S.M.; Destouni, G. Evolution of the hydro-climate system in the Lake Baikal basin. J. Hydrol. 2014, 519, 1953–1962.
  31. Tulochonov, A.K.; Plusnin, A.M. (Eds.) The Selenga River Delta: Natural Biofilter and Indicator of the Condition of Lake Baikal; Publishing House of the Siberian Branch of the Russian Academy of Sciences: Novosibirsk, Russia, 2008; ISBN 978-5-7692-0669-6. (In Russian)
  32. Ivanov, O. Clarification of the actual portion of Lake Baikal’s water in the world freshwater supply. Bull. Utiliz. Prot. Nat. Resour. Russia 2009, 3, 14–15.
  33. Berezhnykh, T.V.; Marchenko, O.Y.; Abasov, N.V.; Mordvinov, V.I. Changes in the summertime atmospheric circulation over East Asia and formation of long-lasting low-water periods within the Selenga river basin. Geogr. Nat. Resour. 2012, 33, 223–229.
  34. Chalov, S.; Thorslund, J.; Kasimov, N.; Aybullatov, D.; Ilyicheva, E.; Karthe, D.; Kositsky, A.; Lychagin, M.; Nittrouer, J.; Pavlov, M.; et al. The Selenga River delta: A geochemical barrier protecting Lake Baikal waters. Reg. Environ. Chang. 2016, 17, 2039–2053.
  35. Khazheeva, Z.I.; Tulokhonov, A.K.; Yao, R.; Hu, W. Seasonal and spatial distribution of heavy metals in the Selenga River Delta. J. Geogr. Sci. 2008, 18, 319–327.
  36. Ubugunov, L.; Borisova, N.; Anenkhonov, O.; Baldanova, D.; Bazova, N.; Ubugunov, V. Probable biological and ecological consequences of hydropower plant constructing on the Selenga River and its tributaries in Mongolia. Bull. Irkutsk State Univ. Ser Biol. Ecol. 2015, 12, 58–72.
  37. Venables, W.N.; Ripley, B.D. Time series analysis. In Modern Applied Statistics with S; Springer: New York, NY, USA, 2002; pp. 387–418.
  38. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33.
  39. Czarnecki, W.; Jastrzebski, S.; Data, M.; Sieradzki, I.; Bruno-Kaminski, M.; Jurek, K.; Kowenzowski, P.; Pletty, M.; Talik, K.; Zgliczynski, M. gmum.r: GMUM Machine Learning Group Package. 2015. Available online: https://github.com/gmum/gmum.r (accessed on 1 July 2018).
  40. Wright, M.N.; Ziegler, A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 2017, 77.
  41. Lek, S.; Guégan, J.F. Artificial neural networks as a tool in ecological modelling, an introduction. Ecol. Model. 1999, 120, 65–73.
  42. Abdel-Rahman, E.M.; Mutanga, O.; Adam, E.; Ismail, R. Detecting Sirex noctilio grey-attacked and lightning-struck pine trees using airborne hyperspectral data, random forest and support vector machines classifiers. ISPRS J. Photogramm. Remote Sens. 2014, 88, 48–59.
  43. Foody, G.M.; Mathur, A. A relative evaluation of multiclass image classification by support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1335–1343.
  44. Heumann, B.W. An object-based classification of mangroves using a hybrid decision tree-support vector machine approach. Remote Sens. 2011, 3, 2440–2460.
  45. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
  46. Berhane, T.; Lane, C.; Wu, Q.; Anenkhonov, O.; Chepinoga, V.; Autrey, B.; Liu, H. Comparing pixel- and object-based approaches in effectively classifying wetland-dominated landscapes. Remote Sens. 2017, 10, 46.
  47. Corcoran, J.; Knight, J.; Gallant, A. Influence of multi-source and multi-temporal remotely sensed and ancillary data on the accuracy of random forest classification of wetlands in Northern Minnesota. Remote Sens. 2013, 5, 3212–3238.
  48. Tian, S.; Zhang, X.; Tian, J.; Sun, Q. Random forest classification of wetland landcovers from multi-sensor data in the arid region of Xinjiang, China. Remote Sens. 2016, 8, 954.
  49. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  50. Kotsiantis, S. Combining bagging, boosting, rotation forest and random subspace methods. Artif. Intell. Rev. 2010, 35, 223–240.
  51. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  52. Foody, G.M. Classification accuracy comparison: Hypothesis tests and the use of confidence intervals in evaluations of difference, equivalence and non-inferiority. Remote Sens. Environ. 2009, 113, 1658–1663.
  53. Carroll, L. Through the Looking-Glass, and What Alice Found There; MacMillan: London, UK, 1871.
  54. Foody, G.M.; Campbell, N.; Trodd, N.; Wood, T. Derivation and applications of probabilistic measures of class membership from the maximum-likelihood classification. Photogramm. Eng. Remote Sens. 1992, 58, 1335–1341.
  55. Jensen, J.R. Remote Sensing of the Environment: An Earth Resource Perspective; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 2007.
  56. Dixon, B.; Candade, N. Multispectral landuse classification using neural networks and support vector machines: One or the other, or both? Int. J. Remote Sens. 2008, 29, 1185–1206.
  57. Adam, E.; Mutanga, O.; Odindi, J.; Abdel-Rahman, E.M. Land-use/cover classification in a heterogeneous coastal landscape using RapidEye imagery: Evaluating the performance of random forest and support vector machines classifiers. Int. J. Remote Sens. 2014, 35, 3440–3458.
  58. Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
  59. Sesnie, S.E.; Finegan, B.; Gessler, P.E.; Thessler, S.; Ramos Bendana, Z.; Smith, A.M.S. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. Int. J. Remote Sens. 2010, 31, 2885–2909.
  60. Ma, L.; Fu, T.; Li, M. Active learning for object-based image classification using predefined training objects. Int. J. Remote Sens. 2018, 39, 2746–2765.
  61. Creed, I.F.; Lane, C.R.; Serran, J.N.; Alexander, L.C.; Basu, N.B.; Calhoun, A.J.K.; Christensen, J.R.; Cohen, M.J.; Craft, C.; D’Amico, E.; et al. Enhancing protection for vulnerable waters. Nat. Geosci. 2017, 10, 809–815.
  62. Giosan, L.; Syvitski, J.; Constantinescu, S.; Day, J. Climate change: Protect the world’s deltas. Nature 2014, 516, 31–33.
Figure 1. The location of the study area of the Selenga River Delta, a large, freshwater deltaic wetland extending into Russia’s Lake Baikal.
Figure 2. False color composite (NIR-1, Red and Green bands) of the focal study area within the Selenga River Delta and the location of both the field collection points and ground truthing sites. The white-colored box indicates the location of an area of interest analyzed to generate wetland class and aquatic habitats classification thematic maps using the four classifiers.
Figure 3. (A–C) Spatial delineation of ROIs with sample wetland classes and aquatic habitats selected from the Selenga River Delta landscape (Top panels are true-color images for A: predominantly terrestrial habitats; B: predominantly lotic habitats, and C: predominantly lentic habitats; see bottom-panel class descriptions in Table 1). The ROI diameters range from 12 to 52 m (see Table 2).
Figure 4. Overall accuracy and 95% confidence interval results by classifier type, by ROI size and the associated proportions of pure and mixed ROIs for the 228 sites in this study. Note: ROI = region of interest, GLM = Generalized Linear Model, MLM = Multinomial Logistic Model, SVM = Support Vector Machine, and RF = Random Forest.
Figure 5. Wetland classification thematic maps of a select study area generated using the smallest ROI-size training data (ROI-ID D12; see Figure 2): (A) true-color WV2 bands 532, and (B) false-color WV2 bands 753 composite, (C) MLM, (D) GLM, (E) SVM, and (F) RF models. Colors depicted follow the wetland classes given in Table 1.
Figure 6. Producer’s accuracy (PA) and user’s accuracy (UA) of the 22 classes by Random Forest and ROI size. PA and UA values are colored in blue as a function of their magnitude (darker colors represent higher values). The class columns are colored as in Table 1.
Table 1. Description of wetland classes and aquatic habitats from ISODATA unsupervised classifier and subsequently informed by field-based sampling. The habitat data parenthetically noted represents botanical characteristics of the wetland classes with adequate specificity and fidelity to be considered class indicators (see [17] for additional information).

| Class | Description | Class | Description |
|---|---|---|---|
| 1 | Deep water with sand bottom | 12 | Dense floating vascular (Nymphoides) |
| 2 | Shallow water with sediment | 13 | Very dense floating vascular (Nymphoides) |
| 3 | Shallow water with mud bottom | 14 | Persistent emergent (Phragmites) |
| 4 | Very shallow water with sand bottom | 15 | Persistent emergent (Bare Soil/Carex) |
| 5 | Shallow water with sand bottom | 16 | Persistent emergent (Equisetum) |
| 6 | Submerged aquatic vascular (Lemna) | 17 | Persistent emergent (Thatch) |
| 7 | Submerged aquatic vascular (Sparganium) | 18 | Persistent emergent (Carex) |
| 8 | Submerged aquatic vascular (Ceratophyllum) | 19 | Persistent emergent (Calamagrostis) |
| 9 | Submerged floating vascular (Nymphoides) | 20 | Persistent emergent (Scolochloa) |
| 10 | Very sparse floating vascular (Nymphoides) | 21 | Persistent terrestrial (Amoria) |
| 11 | Sparse floating vascular (Nymphoides) | 22 | Shrub/scrub (Salix) |
Table 2. Description of training and validation datasets by region of interest (ROI) size (D is diameter followed by the associated value in m, e.g., D12 is a ROI with a 12-m diameter). There are 228 ROIs.

| ROI-ID | Area (m²) | % Areal Increase | # of Pure-ROIs | % of Pure-ROIs | # of Mixed-ROIs | % of Mixed-ROIs |
|---|---|---|---|---|---|---|
| D12 | 113 | 0 | 213 | 93.4 | 15 | 6.6 |
| D14 | 154 | 36 | 211 | 92.5 | 17 | 7.5 |
| D15 | 177 | 56 | 126 | 55.3 | 102 | 44.7 |
| D16 | 201 | 78 | 67 | 29.4 | 161 | 70.6 |
| D17 | 227 | 101 | 48 | 21.1 | 180 | 78.9 |
| D18 | 254 | 125 | 39 | 17.1 | 189 | 82.9 |
| D20 | 314 | 178 | 31 | 13.6 | 197 | 86.4 |
| D24 | 452 | 300 | 21 | 9.2 | 207 | 90.8 |
| D28 | 616 | 444 | 13 | 5.7 | 215 | 94.3 |
| D32 | 804 | 611 | 9 | 3.9 | 219 | 96.1 |
| D52 | 2124 | 1778 | 1 | 0.4 | 227 | 99.6 |
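The area, areal-increase, and pure-ROI percentage columns of Table 2 follow from the ROI diameters and pure-ROI counts. The sketch below reproduces them under the assumption that each ROI is a circle of the stated diameter and that increases are expressed relative to D12; the variable and function names are illustrative and not taken from the study.

```python
import math

# ROI diameters (m) and pure-ROI counts transcribed from Table 2.
DIAMETERS_M = [12, 14, 15, 16, 17, 18, 20, 24, 28, 32, 52]
N_PURE = [213, 211, 126, 67, 48, 39, 31, 21, 13, 9, 1]
N_SITES = 228  # total number of ROIs


def roi_area(diameter_m):
    """Area (m^2) of a circular ROI; the circular shape is an assumption."""
    return math.pi * (diameter_m / 2) ** 2


base_area = roi_area(DIAMETERS_M[0])  # D12 is the reference size

areas = [round(roi_area(d)) for d in DIAMETERS_M]
increase_pct = [round((roi_area(d) / base_area - 1) * 100) for d in DIAMETERS_M]
pure_pct = [round(100 * n / N_SITES, 1) for n in N_PURE]

for d, a, inc, p in zip(DIAMETERS_M, areas, increase_pct, pure_pct):
    print(f"D{d}: area={a} m^2, increase={inc}%, pure={p}%")
```

Because area scales with the square of the diameter, the increase from D12 to D32 is (32/12)² − 1, or about 611%, even though the diameter grows by less than a factor of three.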
Table 3. Classification accuracy (%) by classifier type and region of interest (ROI) size. The highest overall accuracy (OA) for each diameter is in bold and underlined. White text on a dark gray box identifies the highest OA within a given classifier. ROI-IDs that overlap the 95% confidence intervals (CI) for the highest OA for each classifier are highlighted in light grey.

| ROI-ID | MLM OA | MLM CI | GLM OA | GLM CI | SVM OA | SVM CI | RF OA | RF CI |
|---|---|---|---|---|---|---|---|---|
| D12 | 48.4 | 47.2–49.6 | 77.3 | 76.1–78.4 | 86.5 | 85.6–87.4 | 87.8 | 87.0–88.6 |
| D14 | 49.6 | 48.3–50.8 | 78.7 | 77.7–79.7 | 87.3 | 86.3–88.3 | 87.1 | 86.2–87.9 |
| D15 | 55.6 | 54.2–57.0 | 77.7 | 76.6–78.8 | 86.4 | 85.4–87.4 | 87.2 | 86.3–88.2 |
| D16 | 56.8 | 55.6–58.0 | 75.7 | 74.4–77.0 | 83.5 | 82.4–84.7 | 85.6 | 84.8–86.5 |
| D17 | 60.5 | 59.2–61.8 | 77.0 | 75.8–78.1 | 84.4 | 83.3–85.5 | 86.2 | 85.4–87.0 |
| D18 | 60.6 | 59.3–61.9 | 75.0 | 73.8–76.1 | 82.6 | 81.6–83.6 | 85.9 | 85.1–86.8 |
| D20 | 62.7 | 61.4–63.9 | 74.7 | 73.6–75.8 | 81.2 | 80.0–82.4 | 84.9 | 84.0–85.8 |
| D24 | 63.7 | 62.5–65.0 | 72.3 | 71.2–73.3 | 78.0 | 76.9–79.2 | 84.0 | 83.1–84.9 |
| D28 | 62.2 | 60.9–63.6 | 70.3 | 69.2–71.4 | 77.1 | 76.0–78.2 | 81.7 | 80.6–82.8 |
| D32 | 62.1 | 60.8–63.3 | 68.9 | 67.7–70.1 | 73.1 | 71.9–74.2 | 79.4 | 78.3–80.4 |
| D52 | 58.5 | 57.3–59.6 | 62.2 | 61.1–63.3 | 60.2 | 58.9–61.4 | 71.5 | 70.4–72.7 |
Notes: MLM = Multinomial Logistic Model, GLM = Generalized Linear Model, SVM = Support Vector Machine, RF = Random Forest.
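The two highlights described in the Table 3 caption (the highest OA for each diameter and the highest OA within each classifier) can be recomputed from the OA columns alone. The following is a minimal sketch with illustrative variable names, using the OA values transcribed from Table 3.

```python
# OA (%) by classifier, ordered by ROI diameter, transcribed from Table 3.
ROI_IDS = ["D12", "D14", "D15", "D16", "D17", "D18", "D20", "D24", "D28", "D32", "D52"]
OA = {
    "MLM": [48.4, 49.6, 55.6, 56.8, 60.5, 60.6, 62.7, 63.7, 62.2, 62.1, 58.5],
    "GLM": [77.3, 78.7, 77.7, 75.7, 77.0, 75.0, 74.7, 72.3, 70.3, 68.9, 62.2],
    "SVM": [86.5, 87.3, 86.4, 83.5, 84.4, 82.6, 81.2, 78.0, 77.1, 73.1, 60.2],
    "RF":  [87.8, 87.1, 87.2, 85.6, 86.2, 85.9, 84.9, 84.0, 81.7, 79.4, 71.5],
}

# Classifier with the highest OA at each ROI size.
best_per_size = {
    roi: max(OA, key=lambda model: OA[model][i])
    for i, roi in enumerate(ROI_IDS)
}

# ROI size at which each classifier reaches its own highest OA.
best_size_per_model = {
    model: ROI_IDS[max(range(len(values)), key=values.__getitem__)]
    for model, values in OA.items()
}

print(best_per_size)
print(best_size_per_model)
```

With these values, RF has the highest OA at every ROI size except D14, where SVM is marginally higher (87.3% versus 87.1%), and MLM is the only classifier whose highest OA occurs at a large ROI size (D24).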

Share and Cite

MDPI and ACS Style

Berhane, T.M.; Costa, H.; Lane, C.R.; Anenkhonov, O.A.; Chepinoga, V.V.; Autrey, B.C. The Influence of Region of Interest Heterogeneity on Classification Accuracy in Wetland Systems. Remote Sens. 2019, 11, 551. https://doi.org/10.3390/rs11050551
