This section introduces a case study illustrating the application of the proposed approach for defining and mapping an ESI for standing water presences (e.g., inland water bodies and flooded areas by human activities or natural hazards).
2.1. Study Area, Data Sources and Data Transform
The case study concerns the monitoring of standing water in a territory in Northern Italy, where standing water can result from controlled inundations (irrigation), floods caused by extreme events and natural water reservoirs. Specifically, the three sites shown in Figure 1 were selected as ROIs to cover different conditions of standing water and thus capture variable spectral characteristics: an area flooded by extremely heavy rainfall (ROI_1, Emilia area), a river bed (ROI_2, Po Valley) and flooded rice fields (ROI_3, rice paddies) (Table 1). The latter site was selected, although flooding was not due to a natural event, to train and validate the algorithm over heterogeneous conditions of a shallow water surface (<50 cm) mixed with soil patches and vegetation (emerging rice plants).
In the three sites, VGI elements were available, obtained by both in situ observations (in ROI_3) and photointerpretation (in both ROI_1 and ROI_2). In situ observations had been created by agronomists by means of a mobile application within the Space4Agri Project. Agronomists tagged agricultural parcels with the observed crop, stage of growth and tillage practice. In the case of the rice crop, they indicated whether rice paddies were inundated or not (i.e., water or no water) [1]. This VGI was partitioned into two subsets, used as a training set for learning and as a reference set for validation, respectively.
Furthermore, other VGI was created by volunteers through photointerpretation, by interacting with a GIS overlaying OpenStreetMap (OSM) layers, RGB images and other background layers. This VGI consisted of georeferenced observations classified into five classes: “natural flooding”, “flooded fields”, “rivers”, “shadows over water” and “not flooded”. Since assuming VGI as the ground truth is delicate and risky, as discussed in the extensive literature on VGI quality [31], we applied a quality assurance and assessment approach [32]. Specifically, for quality assurance, photointerpreters were selected ex-ante and trained to identify the kinds of standing water. Furthermore, since we also reuse VGI created in situ for a different purpose, it was necessary to evaluate its fitness-for-use by applying an ex-post quality assessment approach [32]. Specifically, only in situ VGI elements created close to the acquisition dates of the multispectral images used for the ESI mapping were selected as reliable VGI. This VGI was assumed as the ground truth and partitioned into three distinct subsets for three distinct objectives: (i) calibrating the definitions of soft constraints, (ii) learning the OWA aggregation and (iii) validating the algorithm performance.
Table 2 reports, for each site, the EO satellite data, the acquisition dates and the number of ground truth pixels (w/nw stand for water/not water) selected from the available reliable VGI and used for the soft constraints definition (S) in the preliminary phase, for learning the OWA operator (L) in phase two of the algorithm and for validation (V) of the computed ESI maps. At each validation run, 90% (10%) of the ground truth pixels not used for (S) were randomly selected for (L), and the remaining 10% (90%) were used for (V) in the typical (atypical) validation settings, respectively.
Specifically, the remote sensing data source used in all sites is Sentinel 2 (S2). The S2 mission operates as a two-satellite system (A and B) providing high-resolution multispectral optical imagery since June 2015 (A) and March 2017 (B). The S2 multispectral instrument (MSI) measures the Earth’s reflected radiance in 13 spectral bands from VIS/NIR to SWIR with a spatial resolution ranging from 10 m to 60 m. The case study was built on S2 data collected for post-event assessments (after flooding occurrences at ROI_1 and ROI_2 and immediately after the rice field survey for ROI_3). Level-2A S2 images were downloaded and preprocessed with the sen2r toolbox [33]. The details of the preprocessing operations are described in [29]. For ROI_1 and ROI_2, Level-2A S2 imagery was downloaded as bottom of atmosphere (BOA) reflectance through the Copernicus Open Access Hub. Preprocessing consisted of clipping images to the area of interest and masking clouds using the scene classification (SC) product; pixels classified as high and medium cloud probability were masked out, while pixels belonging to all other classes were retained to avoid masking out water pixels. For ROI_3, a BOA image was not available in the Copernicus archive for the desired dates of the field survey, so it was necessary to download the top of atmosphere Level-1C products and apply atmospheric correction by using the Sen2Cor algorithm through the sen2r toolbox [
The processed multispectral products are used to compute the most suitable spectral indexes (SIs) reported in Table 3 [34], which were identified by two experts as indicators that enhance the presence of standing water areas. They were used as contributing factors from which to derive partial evidence of standing water. The aggregation of partial evidence is performed through an OWA operator [24] generated by applying a machine-learning approach that exploits the limited ground truth obtained by either in situ observations or photointerpretation.
2.2. Theoretical Aspects
In the following subsections, we define the main concepts of fuzzy set theory that are the basic materials and methods used to model the process of environmental status indicator (ESI) mapping.
2.2.1. Soft Constraints
Fuzzy sets were introduced by Zadeh in 1965 [6] to represent concepts characterized by unsharp boundaries, where the transition between membership and non-membership is gradual rather than abrupt. A fuzzy set A on a universe D is characterized by a membership function μA: D → [0, 1], assigning a membership degree, μA(d) ∈ [0, 1], to each element d of the domain D. μA(d) provides an estimation of the belonging of d to A.
An elastic or soft constraint C on a domain D of a variable v (i.e., a contributing factor) is defined by a membership function of a fuzzy subset C of D. When we apply the soft constraint to a value d ∈ D of the variable v, its membership degree μC(d) indicates the degree of satisfaction of C: μC(d) = 1 means that d fully satisfies C; μC(d) = 0 means that d does not satisfy C at all and 0 < μC(d) < 1 means that d partially satisfies C.
In the case of an expert’s interpretation of a phenomenon, which is possibly incomplete and imprecise, a soft constraint can be defined by the domain expert to specify a criterion for computing partial evidence of the phenomenon given the value of a variable v, which is selected as a contributing factor of the phenomenon. In this case, the expert cannot state precisely which subset of values of the domain D of the contributing factor v provides evidence of the phenomenon but can state imprecise/fuzzy subsets of D.
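Soft constraints can be defined by membership functions with shapes of variable complexity: triangular, trapezoidal, Gaussian-shaped, bell-shaped, sigmoid-shaped, etc. A simple definition of a soft constraint μC can be specified with a flexible shape by a tuple (a, b, c, d, e, f), with a, b, c, d ∈ [0, 1] and e, f > 0, as follows:
$$\mu_C(x) = \begin{cases} 0 & \text{if } x < a \text{ or } x > d \\ \left(\dfrac{x - a}{b - a}\right)^{e} & \text{if } a \le x < b \\ 1 & \text{if } b \le x \le c \\ \left(\dfrac{d - x}{d - c}\right)^{f} & \text{if } c < x \le d \end{cases} \qquad (1)$$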
By setting a = b = −∞ or c = d = +∞, we obtain the special cases of L-functions (not increasing) and R-functions (not decreasing), such as the ones depicted in Figure 2. By specifying b = c and e = f = 1, we obtain triangular membership functions. By setting e ≠ 1 and f ≠ 1, we can obtain nonlinear functions.
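The tuple-based definition can be illustrated with a short Python sketch. It assumes the piecewise form in which the membership rises from a to b with exponent e, equals 1 on [b, c] and falls from c to d with exponent f; the function name `soft_constraint` and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def soft_constraint(x, a, b, c, d, e=1.0, f=1.0):
    """Membership degree of x for the tuple (a, b, c, d, e, f).

    Assumed shape: 0 outside (a, d), 1 on [b, c], ((x - a) / (b - a)) ** e on
    the rising edge and ((d - x) / (d - c)) ** f on the falling edge.
    """
    if b <= x <= c:                    # core: full satisfaction
        return 1.0
    if x <= a or x >= d:               # outside the support
        return 0.0
    if x < b:                          # rising edge, a < x < b
        return ((x - a) / (b - a)) ** e
    return ((d - x) / (d - c)) ** f    # falling edge, c < x < d

# R-function (not decreasing): c = d = +inf
print(soft_constraint(0.25, 0.0, 0.5, math.inf, math.inf))
# Triangular function: b = c and e = f = 1
print(soft_constraint(0.75, 0.0, 0.5, 0.5, 1.0))
# Nonlinear rising edge: e = 2
print(soft_constraint(0.25, 0.0, 0.5, math.inf, math.inf, e=2.0))
```

Checking the core interval first keeps the degenerate cases (a = b or c = d, including infinite bounds) free of divisions by zero.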
Complex soft constraints can be defined by combining soft constraints by conjunction (“C1 and C2” is defined by min(μC1(x), μC2(y)), ∀x ∈ X and ∀y ∈ Y), by disjunction (“C1 or C2” is defined by max(μC1(x), μC2(y)), ∀x ∈ X and ∀y ∈ Y) and by negation (“Not C” is defined by the complement 1 − μC(x)).
Finally, when μC1(x) ≤ μC2(x), ∀x ∈ X, C1 is included in C2 (i.e., C1 is stricter than C2). When defining a soft constraint to compute the partial evidence degree of a critical phenomenon, stricter soft constraints favor omissions (false negatives) and, conversely, relaxed soft constraints favor commissions (false positives).
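A minimal sketch of these combinations and of the inclusion test follows. The two ramp-shaped constraints `strict` and `relaxed` are hypothetical examples, and inclusion is checked only on a finite grid of sample values, which is an approximation of the condition over the whole domain.

```python
def conjunction(mu1, mu2):
    """Degree of satisfaction of "C1 and C2"."""
    return min(mu1, mu2)

def disjunction(mu1, mu2):
    """Degree of satisfaction of "C1 or C2"."""
    return max(mu1, mu2)

def negation(mu):
    """Degree of satisfaction of "Not C"."""
    return 1.0 - mu

def is_stricter(mu_c1, mu_c2, samples):
    """True if mu_c1(x) <= mu_c2(x) on every sampled x (C1 included in C2)."""
    return all(mu_c1(x) <= mu_c2(x) for x in samples)

# Two hypothetical non-decreasing ramps on [0, 1]
def strict(x):
    return min(1.0, max(0.0, (x - 0.4) / 0.4))

def relaxed(x):
    return min(1.0, max(0.0, (x - 0.2) / 0.4))

grid = [i / 20 for i in range(21)]
print(is_stricter(strict, relaxed, grid))   # strict is included in relaxed
```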
2.2.2. Ordered Weighted Averaging (OWA) Operators
The seminal paper [35], stemming from the consideration that “the efficient use of decision support systems (DSSs) is to assist and help humans arrive at a proper decision, but by no means, to replace humans”, proposes to introduce some synergy between human and machine. To this end, the author defines the fuzzy logic-based calculi of linguistically quantified propositions as a viable means for expressing human interpretable decisions.
Linguistic quantifiers were first introduced in [36] as fuzzy subsets of the positive real numbers or of the unit interval [0, 1], depending on whether they express an absolute quantity, such as many, or a relative quantity, such as most.
Ordered weighted averaging (OWA) operators were first proposed to define an overall decision function aggregating degrees of satisfaction of multiple criteria (in our context, partial evidence degrees computed by soft constraints defined on the domain of some variables) [24]. OWA operators allow us to define fusion strategies with distinct mean-like semantics ranging from the minimum to the maximum of the values they aggregate.
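An OWA of dimension N and weighting vector W = [w1, …, wN], with ∑i=1,…,N wi = 1, aggregates N values [d1, …, dN] and computes an aggregated value a in [0, 1], as follows [
$$\mathrm{OWA}\colon [0, 1]^{N} \rightarrow [0, 1]$$
such that
$$a = \mathrm{OWA}([d_1, \dots, d_N]) = \sum_{i=1}^{N} w_i \, g_i \qquad (2)$$
in which gi is the ith largest value of the d1, …, dN.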
A fundamental aspect of the OWA is the reordering of its arguments so that the weight wi is not associated with an argument di but rather with a particular rank of the arguments in decreasing order. A known property of the OWA operators is that they include the max, min and arithmetic mean operators for the appropriate selection of the weighting vector W:
For W = [1, 0, …, 0], OWA([d1, …, dN]) = max([d1, …, dN]).
For W = [0, …, 0, 1], OWA([d1, …, dN]) = min([d1, …, dN]).
For W = [1/N, …, 1/N], OWA([d1, …, dN]) = (1/N) ∑i=1,…,N di (the arithmetic mean).
It can be proved that OWA operators satisfy commutativity, monotonicity and idempotency and are bounded by the min and max operators [
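The reordering step and the three special cases can be checked with a few lines of Python. This is a sketch of the standard OWA definition (weights applied to the arguments sorted in decreasing order); the function name `owa` and the sample values are illustrative.

```python
def owa(values, weights):
    """Ordered weighted average: weights are applied to ranks, not to sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)      # g_1 >= g_2 >= ... >= g_N
    return sum(w * g for w, g in zip(weights, ordered))

d = [0.2, 0.9, 0.5]
print(owa(d, [1.0, 0.0, 0.0]))        # behaves like max(d)
print(owa(d, [0.0, 0.0, 1.0]))        # behaves like min(d)
print(owa(d, [1 / 3, 1 / 3, 1 / 3]))  # behaves like the arithmetic mean
```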
2.3. Proposed Approach
A preliminary phase of the proposed approach is the selection of contributing factors that can influence the phenomenon; these contributing factors are physical variables whose values are computed in all spatial units of a ROI to create thematic maps. They are identified by experts based on domain knowledge; the most suitable shapes of the soft constraint membership functions are then selected by exploiting a statistical analysis of the values of the contributing factors on a classified data set. This is done by defining the membership functions that best discriminate the class of interest from the others. Soft constraint satisfaction degrees are interpreted as degrees of partial evidence of the phenomenon due to a specific contributing factor. In this phase, an importance degree can be computed for each factor, proportional to the degree of separability between the classes. This can be determined by applying the soft constraints to the classified data set. Alternatively, a degree of reliability or trust can be assigned to each factor, depending on the knowledge of the phenomenon or the reliability of the data source.
This preliminary step does not need to be performed each time the algorithm is applied to map the ESI on a new ROI. It is done once and for all, using classified data from one or more ROIs. Then the automatic algorithm adapts the ESI mapping to a new ROI by exploiting local ground truth.
The automatic algorithm depicted in Figure 3 is structured into two phases. The first phase is mainly knowledge-driven, while the second phase is data-driven.
In the first phase, after computing the contributing factors on the input map, the input soft constraints are evaluated. This phase produces partial evidence (PE) maps, in which each unit element, a pixel in the illustrated implementation, is associated with a degree in [0, 1].
The second phase exploits reliable VGI in a ROI to learn the best operator, namely, an ordered weighted averaging (OWA) operator [24], for aggregating the PE maps in order to compute the ESI map synthesizing the phenomenon. The choice of OWA operators to model the fusion strategy is due to their mean-like nature, which is recognized by many authors as particularly useful in the context of spatial decision-making [44]. Furthermore, the semantics of the learned aggregation can be expressed linguistically to describe a decision attitude, either optimistic or pessimistic and monarchical or democratic, with blends of these extremes. This aspect confers human understandability to our approach.
Finally, the approach is scalable and suited to implementation in a distributed processing framework.
2.3.1. Characterizing the OWA Semantics
To characterize the decision attitude modeled by an OWA operator with weighting vector W, two measures have been introduced in [24]: ORness and dispersion. The ORness(W) ∈ [0, 1] measure is defined as follows:
$$\mathrm{ORness}(W) = \frac{1}{N - 1} \sum_{i=1}^{N} (N - i) \, w_i \qquad (3)$$
This measure characterizes the degree to which the aggregation is like an OR (max) operator. It can be shown that, when the argument values d1, …, dN are degrees of partial evidence of an anomaly of an environmental phenomenon from N distinct sources (i.e., the greater they are, the more severe the anomaly), we have the following interpretations [
ORness[1, …, 0] = 1 indicates a pessimistic attitude averse to risks (i.e., nothing is disregarded, and any single source alone is trusted and taken into consideration to plan preparedness and mitigation interventions so as to minimize the occurrence of risky events);
ORness[0, …, 1] = 0 indicates an optimistic attitude towards tolerating risks (i.e., prioritizing preparedness and mitigation interventions only for anomaly situations pointed out by all sources) and
ORness[1/N, …, 1/N] = 0.5 indicates a balanced and neutral attitude between risk-prone and risk-averse.
Another measure used to qualify the semantics of an OWA operator is the dispersion. This measure represents how much of the information in all the arguments is used by an OWA with weighting vector W. The idea behind its definition is that, the greater the dispersion, the more democratic is the aggregation of the corresponding OWA, since it uses information from more sources [46]. Several dispersion measures have been proposed, the first of which is based on the concept of entropy of W. We adopted the dispersion(W) ∈ [0, 1] measure of an OWA operator, as proposed in [
We see that dispersion(W) is clearly symmetric, and, when N is large, it is defined in [0, 1]. When dispersion(W) = 0, it means that only one source is considered; in this case, the aggregation is named monarchical, since the decision is taken just by one. The larger its value, the more the result is determined by additional sources, and, thus, we have a more democratic aggregation.
To express linguistically the semantics modeled by an OWA operator with weighting vector W, one first computes its ORness(W) and dispersion(W), as defined in Formulas (3) and (4), respectively. Then, by locating the point (ORness(W), dispersion(W)) in the 2D space defined by ORness and dispersion shown in Table 4, one can easily select the label representing the decision attitude modeled by the OWA operator. Notice that Table 4 has been defined by considering that high/small arguments of the OWA are pessimistic/optimistic interpretations of the occurrence of a phenomenon that is regarded as undesired, critical, negative or that should not happen. For example, high/small evidence degrees of flood/wildfire/drought occurrences have a negative/positive flavor. One is then pessimistic/optimistic if the evidence is high/small. Thus, the interpretations of optimism and pessimism reported in Table 4 are complemented with respect to the context of multicriteria decision-making, in which high/small values are generally regarded as optimistic/pessimistic evaluations.
When ORness(W) > 0.5 and dispersion(W) is close to 0, the decision is risk-averse, since one mostly trusts the most pessimistic (or nearly most pessimistic) sources and almost disregards the optimistic ones. Nevertheless, in doing this, one may obtain many false positives.
When ORness(W) < 0.5 and dispersion(W) is close to 0, the decision attitude is risk-prone, since one mostly trusts the few sources that are optimistic. In this case, one may miss potential alerting sources and may thus generate many false negatives.
A balanced decision attitude, characterized by ORness(W) = 0.5 and dispersion(W) = (N − 1)/N, takes all sources equally into account and is therefore both neutral and democratic. Intermediate values of ORness and dispersion characterize different blends of both pessimism/optimism and democracy/monarchy.
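The two measures can be computed as in the sketch below. The ORness follows the classical definition, (1/(N − 1)) ∑ (N − i) wi. The dispersion formula is not reproduced in this text, so the code assumes 1 − ∑ wi², which is only one form consistent with the stated properties (0 when a single source is used, (N − 1)/N for uniform weights); the adopted measure may differ.

```python
def orness(weights):
    """Classical ORness of an OWA weighting vector (1 = max-like, 0 = min-like)."""
    n = len(weights)
    if n < 2:
        raise ValueError("ORness needs at least two weights")
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)

def dispersion(weights):
    """ASSUMED dispersion: 1 - sum of squared weights (not taken from the text)."""
    return 1.0 - sum(w * w for w in weights)

for w in ([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.25] * 4):
    print(w, orness(w), dispersion(w))
```

Under this assumption, the uniform vector of dimension 4 has dispersion 0.75 = (N − 1)/N, matching the balanced attitude described above, while the two extreme vectors have dispersion 0 (monarchical aggregation).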
2.3.2. Learning OWA Semantics from Observations
One important issue in the domain of partial-evidence aggregation is the determination of the OWA operator modeling the aggregation. If ground truth data are available (e.g., georeferenced observations on the occurrence of a phenomenon at certain locations of the ROI), they can be used to learn the weighting vector of the OWA operator.
To this end, we propose the application of a machine-learning approach [47], exploiting VGI assumed as ground truth to learn the best OWA operator for a given ROI by iteratively minimizing the error between the OWA results at epoch t and the observations described by VGI. Notice that VGI used for this purpose must be quality assessed.
Given K georeferenced observations a1, …, aK assumed as ground truth (for example, VGI elements), by knowing their geographic coordinates, we can associate with each observation ai the partial evidence values [ai1, …, aiN] having the same coordinates, such that we obtain the following antecedent-consequent rules that must be satisfied:
$$[a_{i1}, \dots, a_{iN}] \rightarrow a_i, \qquad i = 1, \dots, K \qquad (5)$$
In principle, the observations a1, …, aK can be specified on a continuous scale [0, 1] to quantify the extent of the phenomenon in the specific location; nevertheless, in practical situations, a discrete scale such as {0, 0.5, 1}, or even a binary scale {0, 1}, is used where 0 means absence of the phenomenon and 1 is presence.
The learning mechanism starts at epoch L = 0 by assuming as initial operator OWA0 the arithmetic mean (balanced and neutral attitude), which is defined with weighting vector W0 = [1/N, …, 1/N]. Then, at each epoch L, it iteratively determines the weighting vector WL = [w1L, …, wNL] of OWAL that minimizes the error between the results of its application to all the antecedents of the rules in (5) and the georeferenced observations (i.e., the consequents of the rules).
Formally, this is equivalent to applying the following rule:
in which β ∈ (0, 1] is a learning rate parameter and the ith weighting vector element at epoch L is defined as follows:
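Since the update formulas are not reproduced here, the following sketch uses one common scheme for learning OWA weights from observations: a softmax parametrization of the weights with a gradient step on the squared error, in the style of the approach usually attributed to Filev and Yager. Whether it coincides with the rule referenced above is an assumption; the function names, the default `beta` and the toy samples are hypothetical.

```python
import math

def softmax(lams):
    """Map free parameters to weights that are positive and sum to 1."""
    m = max(lams)
    exps = [math.exp(lam - m) for lam in lams]
    total = sum(exps)
    return [x / total for x in exps]

def learn_owa_weights(samples, beta=0.35, epochs=200):
    """samples: list of (partial_evidence_values, observed_value) pairs."""
    n = len(samples[0][0])
    lams = [0.0] * n                       # softmax of zeros gives W0 = [1/N, ..., 1/N]
    for _ in range(epochs):
        for values, target in samples:
            g = sorted(values, reverse=True)          # reorder the arguments
            w = softmax(lams)
            estimate = sum(wi * gi for wi, gi in zip(w, g))
            err = estimate - target
            # gradient step on 0.5 * err**2 with respect to each parameter
            lams = [lam - beta * wi * (gi - estimate) * err
                    for lam, wi, gi in zip(lams, w, g)]
    return softmax(lams)

# Toy ground truth in which the observation always equals the largest evidence
toy = [([0.9, 0.1, 0.2], 0.9), ([0.3, 0.8, 0.1], 0.8),
       ([0.2, 0.4, 0.7], 0.7), ([0.1, 0.0, 0.2], 0.2)]
print(learn_owa_weights(toy))   # the first (max-like) weight should dominate
```

The softmax parametrization keeps the learned weights in [0, 1] and summing to 1 without any explicit projection step.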
2.3.3. Scalability of the Approach
The ESI computation described in the previous section can be implemented in a distributed processing framework represented by the schema depicted in Figure 4.
Since the ESI computation is performed independently for each spatial unit and is organized in two subsequent phases, we can implement it in a single round of a map-reduce framework [
The map-reduce framework is inspired by the “map” and “reduce” functions used in functional programming. The computation operates on data stored in a distributed file system or within a database; it takes a set of input key-value pairs and produces a set of output key-value pairs [
A mapper M is a Turing machine M (<k, v>) → (<k1′, v1′>, …, <ks′, vs′>), which accepts as input a single key-value pair <k, v> and produces a list of key-value pairs <k1′, v1′>, …, <ks’, vs’>.
A shuffle is performed on the outputs of the mappers so as to group the values with the same key: <k1′, v1′, …, vr1′>, …., <kR′, v1′, …, vrR′>.
A reducer R is a Turing machine R: <k′, v1′, …, vr′> → <k′, v″>, which accepts as input a pair <k′, v1′, …, vr′> and produces as output the same key k′ and a new value v″.
A mapper can be instructed by its input parameters to compute several contributing factors and to evaluate several soft constraints on the same chunk; the input key k identifies either a single pixel or a spatial unit in a multispectral image chunk. The associated value v is the information associated with the input chunk (e.g., the bands and theme values such as VGI), plus the parameters (the names and definitions of the contributing factors the mapper has to compute) and the tuples (a, b, c, d, e, f) defining the soft constraint membership functions according to definition (1).
A mapper can compute for each pixel in the input chunk the key-value pairs <k1′, v1′>, …, <ks′, vs′>, where ki′ identifies the chunk and vi′ are the computed degrees of partial evidence of the SIs in the chunk.
Subsequently, the reducers execute the second phase by aggregating the partial evidence maps v1′, …, vr1′ of the same chunk ki′ in parallel so as to compute the ESI map v″ for the chunk.
Chunks are finally recombined by mosaicing at the end of the process.
The values v″ are computed by applying, in each pixel or spatial unit of the chunk, the OWA operator learned by leveraging VGI in the ROI covered by the chunk. This way, each reducer can learn a distinct OWA operator, thus adapting the ESI computation to the local context and observations. Notice that the learning process is performed within each reducer module, which applies to its input chunk the OWA operator learned at epoch L based on the subset of VGI included in the input chunk. There is no need to reload the input at each epoch, since the evidence maps do not change from epoch to epoch; once the optimal OWA has been determined, the ESI map can be computed and stored on disk.
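The single map-reduce round can be simulated in plain Python to make the data flow concrete. This is a sketch under simplifying assumptions: chunks are small in-memory lists of pixels, the soft constraints are hypothetical ramps, and the OWA weights are passed to the reducer instead of being learned inside it from the VGI of the chunk.

```python
from collections import defaultdict

def mapper(chunk_id, pixels, constraints):
    """Phase 1: emit one (chunk_id, (factor, PE map)) pair per contributing factor."""
    pairs = []
    for factor, mu in constraints.items():
        pe_map = [mu(px[factor]) for px in pixels]
        pairs.append((chunk_id, (factor, pe_map)))
    return pairs

def shuffle(pairs):
    """Group mapper outputs that share the same key (chunk)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(chunk_id, pe_maps, weights):
    """Phase 2: aggregate the PE maps of a chunk pixel by pixel with an OWA."""
    esi = []
    for pixel_values in zip(*(pe for _factor, pe in pe_maps)):
        ordered = sorted(pixel_values, reverse=True)
        esi.append(sum(w * g for w, g in zip(weights, ordered)))
    return chunk_id, esi

def ramp(v):
    """Hypothetical soft constraint: clamp the index value to [0, 1]."""
    return min(1.0, max(0.0, v))

constraints = {"ndwi": ramp, "mndwi": ramp}
chunks = {
    "c0": [{"ndwi": 0.8, "mndwi": 0.2}, {"ndwi": -0.1, "mndwi": 0.4}],
    "c1": [{"ndwi": 0.9, "mndwi": 0.7}],
}
pairs = [p for cid, px in chunks.items() for p in mapper(cid, px, constraints)]
esi_maps = dict(reducer(cid, maps, [0.5, 0.5]) for cid, maps in shuffle(pairs).items())
print(esi_maps)
```

Because each chunk is reduced independently, the per-chunk ESI maps can be mosaicked afterwards without any further communication between reducers.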
2.3.4. Contributions from Expert’s Knowledge
In order to exploit the extensive literature based on single spectral indexes (SIs) for mapping water surfaces and vegetation cover, seven SIs have been selected as contributing factors from which partial evidence of standing water can be computed (see Table 3). Besides the SIs, the hue (H) and value (V) dimensions of the HSV color space, derived by transforming the components SWIR2, NIR and RED, were also selected to define the reduced hue-value (HV) space as a further contributing factor; in this transformed space, standing water surfaces can be separated from land surfaces by means of empirical thresholds, as defined in [
The transformation function f: SWIR2 × NIR × RED → H × V is a standardized colorimetric transformation from RGB to HV components of the HSV color space, where SWIR2 = R, NIR = G and RED = B respectively, defined as in [
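As an illustration, the H and V components can be obtained with the standard RGB-to-HSV conversion available in Python's `colorsys` module, feeding SWIR2, NIR and RED as the R, G and B channels. The sketch assumes reflectances already scaled to [0, 1]; the function name `hue_value` and the sample reflectances are hypothetical.

```python
import colorsys

def hue_value(swir2, nir, red):
    """Return (H, V) for one pixel, with SWIR2 as R, NIR as G and RED as B.

    Inputs are assumed to be reflectances scaled to [0, 1]; H is returned
    as a fraction of the color circle in [0, 1).
    """
    h, _s, v = colorsys.rgb_to_hsv(swir2, nir, red)
    return h, v

# Vegetation-like pixel: NIR dominates, so the hue falls in the "green" sector
print(hue_value(0.1, 0.4, 0.1))
# Dark, water-like pixel: all bands are low, so V is small
print(hue_value(0.02, 0.03, 0.05))
```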
For each contributing factor/spectral index, a soft constraint is defined on its domain by the expert by analyzing the statistical distribution of each SI value for the pixels corresponding to standing water with respect to the ones of nonwater surfaces, according to a classified data set. The soft constraints are defined with a shape, basically L and R functions, as defined in Formula (1). In the case of the contributing factor HV, a single bi-dimensional soft constraint on the domain H × V has been defined as a fuzzy relation combining by minimum the soft constraints on the two dimensions. The details of this activity preliminary to the execution of the algorithm phase 1 are reported in [
In order to set up a validation experiment aimed at testing the stability of the approach when changing experts, we performed phase 1 twice by exploiting interpretations provided by two experts, hereafter named A and B, respectively. They defined different soft constraints on the same set of contributing factors by interpreting available classified data, as illustrated in Figure 5. The classified data used were VGI created by photointerpretation. The two experts have distinct decision attitudes: A, who defined piece-wise linear membership functions, was generally more optimistic than B, who also defined nonlinear functions in order to better discriminate “not flooded” areas. In fact, it can be noticed that the soft constraints of expert A are generally stricter than those defined by expert B on the same SI (i.e., the membership functions defined by expert A are generally included in those of expert B). It follows that expert A (Figure 5) has a more optimistic attitude towards mapping standing water areas (considered as an undesired phenomenon); he/she accepts the risk of generating omission errors by partially disregarding “shadows over water areas”. Conversely, expert B (Figure 5) takes a more pessimistic attitude by defining soft constraints so as not to miss “shadows over water areas”, which belong to the support of the membership functions (i.e., have a non-null membership degree).
2.4. Validation Experiments
The validation experiment was designed with the following objectives:
- (a) to compare the accuracy of the proposal with respect to traditional approaches based on a single SI,
- (b) to investigate the stability of results with respect to changing the ROI,
- (c) to investigate the stability of results with respect to changing experts (A and B),
- (d) to investigate the adaptability of the learning to local context (ROI) by changing experts (A and B) and
- (e) to investigate the accuracy when downscaling the dimension of the training set.
In phase 1 of the algorithm, partial evidence (PE) maps are computed for each contributing factor using as input the preprocessed multispectral images, the definitions of contributing factors and the soft constraints defined by either expert A or B. The PE maps are subsequently used by phase 2 with the aim of learning the OWA operator and then computing the overall ESI map.
Phase 1 was executed twice: the first execution by using the soft constraints by expert A and the second by expert B, respectively. Thus, we obtained two distinct sets of PE maps, indicated hereafter by PE_A and PE_B.
Figure 6 shows an example of PE_A maps derived from four different contributing factors on the three ROIs. It can be seen that soft constraints on the H and V contributing factors generate maps (l), (m) and (n), characterized by high contrast in all ROIs. Pixel values are mostly distributed close to the extremes of the domain [0, 1]; this indicates that the classification of standing water by using the soft constraints on the H and V components is less affected by doubts. This also happens for the other contributing factors in ROI_1 (Emilia area). Conversely, in ROI_2 and ROI_3, the soft constraints defined on AWEI (b, c), mNDWI (e, f) and NDFI (h, i) yield more gradual PE_A maps, thus bearing more uncertainty.
Phase 2 of the algorithm takes as input one set of PE maps generated by a run of phase 1, either PE_A or PE_B, and a subset of VGI and computes an ESI map. This consists of aggregating PE maps by applying the OWA operator learned by the iterative process exploiting VGI. The outputs of this phase are the ESI map, the weighting vector of the OWA operator, its ORness and dispersion measures and the corresponding label (as defined in Table 4) representing the decision attitude modeled by the OWA operator.
By changing either PE maps or VGI, different ESI maps can be computed for the same ROI; specifically, phase 2 was executed several times with distinct VGI subsets, as described in the following subsection.
Phase 1 of the algorithm was executed twice on the three ROIs with the aim of testing stability and adaptability to the ROI when changing experts (objectives c and d).
The accuracy of each single contributing factor in mapping standing water was evaluated by computing, from the confusion matrix, the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), the commission (CE = FP/(FP + TP)) and omission (OE = FN/(FN + TP)) errors and the F-score, defined as follows:
$$\text{F-score} = \frac{2\,TP}{2\,TP + FP + FN}$$
Figure 7 reports the variation of the F-score measure in the three ROIs (shown in Figure 1) obtained by using single contributing factors. Ground truth for the validation is composed of around 1000 independent VGI elements in each ROI, as reported in the fourth column of Table 2. Values of F-scores were computed by defining increasing thresholds on the SI domains, normalized in [0, 1], with a 0.1 step; pixels with SI values exceeding the threshold are considered as “standing water”. It can be noticed that F-score curves are not increasing. This is because, by increasing the threshold, we are stricter on the selection of standing water pixels; thus, while commission errors remain stable, we may increase omission errors by missing true standing water areas.
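The accuracy metrics and the threshold sweep can be sketched as follows, assuming the usual F-score, 2TP/(2TP + FP + FN), and SI values that have already been normalized to [0, 1] (the normalization itself is not shown). Function names and sample data are illustrative, not taken from the experiments.

```python
def accuracy_metrics(predicted, reference):
    """Return (CE, OE, F-score) from boolean predictions and ground truth."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    ce = fp / (fp + tp) if fp + tp else 0.0     # commission error
    oe = fn / (fn + tp) if fn + tp else 0.0     # omission error
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0
    return ce, oe, f_score

def f_score_curve(normalized_si, is_water, step=0.1):
    """F-score for thresholds 0, step, 2*step, ..., 1 on a normalized index."""
    n_steps = int(round(1.0 / step))
    curve = []
    for k in range(n_steps + 1):
        threshold = k * step
        predicted = [v > threshold for v in normalized_si]   # exceeding means water
        curve.append((threshold, accuracy_metrics(predicted, is_water)[2]))
    return curve

si = [0.05, 0.15, 0.35, 0.55, 0.75, 0.95]
water = [False, False, False, True, True, True]
for threshold, f in f_score_curve(si, water):
    print(f"{threshold:.1f} {f:.3f}")
```

On this toy sample the F-score peaks at intermediate thresholds and then drops as omissions accumulate, which mirrors the behavior described for stricter thresholds.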
It can be observed that in the three ROIs which are characterized by distinct land covers and water conditions (water depth, color, fractional cover, plant/soil patches presence, etc.), a different SI presents the best performance (greatest F-score) for given values of the thresholds. This confirms our intuition that a single SI cannot capture all types of standing water conditions.
In the Emilia area, HV, AWEI and NDFI have the best comparable performance for all thresholds; in the Po Valley area, the best index is NDWI, followed by HV. Finally, in the rice paddies area, AWEI and HV are the best indices for threshold values below and above 0.3, respectively.
These results confirm the need for an aggregation phase capable of automatically selecting the best-performing contributing factor for each pixel in each ROI.
This is achieved in phase 2, which applies an adaptation of the algorithm to a specific ROI by exploiting available ground truth.
In order to pursue the validation with a traditional setting of the training set and with a downscaled training set, we designed two kinds of k-fold cross-validation experiments.
We recall that a k-fold cross validation is a statistical method aimed at evaluating the performance of a learning algorithm by changing the training set; in doing so, it is possible to compute both average performance metrics and the standard deviation to assess its sensitivity.
In each experiment, using either expert A or B, phase 2 was executed 10 times (k = 10), thus generating ten weighting vectors of the OWA and, consequently, ten distinct ESI maps for the site. At each run, different subsets of ground truth data for learning and testing were selected by applying stratified random sampling. In the first kind of validation experiment, we used 90% of the ground truth VGI elements for learning the OWA aggregation and 10% for testing, as in the standard validation methods of machine learning. These experiments are named typical (T) k-fold cross validations.
To test the algorithm with a downscaled training set, we performed two other 10-fold cross validations by using A and B but with a different proportion of the learning and testing sets. Unlike in the typical validations, this time we used a small subset of VGI elements for learning (only 10% of the available ground truth pixels), while we used the remaining 90% for testing. Stratified random sampling was applied to select the two subsets. This validation is called atypical (AT) and was aimed at investigating the stability of the results when simulating a realistic situation with a small set of ground truth data.
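The two validation settings can be reproduced with repeated stratified random splits, as sketched below using only the standard library. The function names, the fixed seed and the balanced toy labels are assumptions made for illustration; the sketch draws an independent stratified split at each of the ten runs, as described above.

```python
import random

def stratified_split(labels, learn_fraction, rng):
    """Split indices into (learning, testing), keeping class proportions."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    learn, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        cut = round(learn_fraction * len(shuffled))
        learn.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return learn, test

def repeated_splits(labels, learn_fraction, runs=10, seed=0):
    """One independent stratified split per run."""
    rng = random.Random(seed)
    return [stratified_split(labels, learn_fraction, rng) for _ in range(runs)]

labels = [1] * 50 + [0] * 50                   # toy water / not water ground truth
typical = repeated_splits(labels, 0.9)         # 90% learning, 10% testing
atypical = repeated_splits(labels, 0.1)        # 10% learning, 90% testing
print(len(typical[0][0]), len(typical[0][1]))
print(len(atypical[0][0]), len(atypical[0][1]))
```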
Performance achieved on each ROI by the typical and atypical 10-fold cross validations is shown in Figure 8; the ten F-score diagrams in each area are relative to the ten ESI maps produced as a result of the ten executions of phase 2 of the algorithm using either A or B.
Table 5 summarizes average performances of the algorithm over all runs and all thresholds in both the typical and the atypical validations when using A and B and when using the single-best SI in each ROI.
Table 6 reports the learned OWA operator, averaged over the 10 runs when using both A and B in both the typical (T) and atypical (AT) validation settings.
Finally, Figure 10 illustrates for each ROI two ESI maps highlighting in blue “standing water” areas identified by values of ESI > 0.5 computed based on either A or B.
In the following, we discuss the results reported in the figures and tables.