Estimating Regional PM2.5 Concentrations in China Using a Global-Local Regression Model Considering Global Spatial Autocorrelation and Local Spatial Heterogeneity
Article

School of Resource and Environment Science, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(18), 4545; https://doi.org/10.3390/rs14184545
Submission received: 12 August 2022 / Revised: 2 September 2022 / Accepted: 8 September 2022 / Published: 11 September 2022
(This article belongs to the Special Issue Stereoscopic Remote Sensing of Air Pollutants and Applications)

Abstract

Linear regression models are commonly used to estimate ground-level PM2.5 concentrations, but the commonly used models either ignore or only partially account for the global spatial autocorrelation and local spatial heterogeneity of the PM2.5 distribution. Therefore, to take both into consideration, a global-local regression (GLR) model is proposed for estimating ground-level PM2.5 concentrations in the Yangtze River Delta (YRD) and Beijing-Tianjin-Hebei (BTH) regions of China, based on aerosol optical depth data, meteorological data, remote sensing data, and pollution source data. To account for global spatial autocorrelation, the GLR model extracts global factors by the eigenvector spatial filtering (ESF) method; the factors that pass a further filtering step are then combined with the geographically weighted regression (GWR) method to address local spatial heterogeneity. The results show that the GLR model outperforms the ordinary GWR and ESF models and performs best at the monthly, seasonal, and annual levels. The average adjusted R2 of the monthly GLR model in the YRD region (BTH region) is 0.620 (0.853), which is 8.0% and 7.4% (6.8% and 7.0%) higher than that of the monthly ESF and GWR models, respectively. The average cross-validation root mean square error of the monthly GLR model is 7.024 μg/m3 in the YRD region and 9.499 μg/m3 in the BTH region, lower than those of the ESF and GWR models. The GLR model effectively addresses spatial autocorrelation and spatial heterogeneity, overcoming both the tendency of the ordinary GWR model to overfocus on local features and the poor local performance of the ordinary ESF model. Overall, the GLR model, with good spatial and temporal applicability, is a promising method for estimating PM2.5 concentrations.

1. Introduction

PM2.5, particulate matter with an aerodynamic diameter ≤2.5 μm, has adverse effects on human health [1,2,3]. The Global Burden of Disease (GBD) study reported that 2.9 million deaths worldwide in 2013 were attributable to PM2.5 [4]. Numerous studies have implicated PM2.5 as a key risk factor for cancer [5,6,7]. Combined with water vapor, PM2.5 also produces haze, which reduces visibility and disrupts daily life and work [8]. In recent years, with rapid urbanization and industrialization, PM2.5 pollution in China has become increasingly severe [9,10,11], which makes PM2.5 monitoring especially important. However, although China has been building PM2.5 monitoring networks since 2013, the monitoring stations are sparse and unevenly distributed and are insufficient to support accurate, large-scale PM2.5 monitoring.
To fill the gaps in ground measurements, large-scale remote sensing products have been applied to estimate ground-level PM2.5 concentrations [12,13,14]. Aerosol optical depth (AOD) is widely recognized as closely correlated with PM2.5 concentration [15,16,17]. AOD data and auxiliary factors, such as pollution source, meteorological, and elevation data, are commonly used to estimate PM2.5 concentrations [18,19,20]. Spatially continuous ground-level PM2.5 concentrations can be obtained from continuous remote sensing data by many methods, including machine learning methods, chemical transport models, and deep learning methods [21,22,23].
Linear regression models commonly used to estimate PM2.5 concentrations include land use regression (LUR), multiple linear regression (MLR), and ordinary least squares (OLS) regression [24,25,26,27]. Researchers have applied these models to estimate PM2.5 and to explore the linear relationships between the explanatory variables and PM2.5. However, the distribution of PM2.5 is spatially autocorrelated and spatially heterogeneous, and the above models do not perform well in estimating PM2.5 concentrations, possibly because they do not account for these spatial effects [28].
To address spatial autocorrelation, the spatial error model (SEM), the spatial lag model (SLM), and the spatial Durbin model (SDM) [29,30,31] are typically applied. In particular, the eigenvector spatial filtering (ESF) method proposed by Griffith et al. [32] is widely used to filter out spatial autocorrelation: the eigenvectors of a spatial weight matrix are calculated and filtered, and the retained eigenvectors are introduced as independent variables together with the other variables to construct the model. An ESF regression model has been developed to estimate PM2.5 concentrations by introducing spatial eigenvectors selected from a spatial weight matrix constructed from the PM2.5 monitoring stations [33]. However, although ESF effectively handles spatial autocorrelation, the ESF model has constant regression coefficients, fits poorly at the local level, and ignores the spatial heterogeneity of the PM2.5 distribution.
To account for spatial heterogeneity in the PM2.5 distribution, geographically weighted regression (GWR) [34,35,36], Bayesian maximum entropy (BME) [20,37], and Bayesian spatially varying coefficient (SVC) [38] models have been applied to improve the accuracy of PM2.5 concentration estimation. The GWR model, a local model proposed by Brunsdon et al. [39], is among the most commonly used local models; its regression coefficients vary with spatial location, which greatly improves the accuracy of local fits. Researchers have continued to extend the GWR model for PM2.5 concentration estimation, for example, the multiscale GWR (MGWR) [40], geographically and temporally weighted regression (GTWR) [41,42], geographically neural network weighted regression (GNNWR) [43], and principal component analysis-GWR (PCA-GWR) [44] models. These models effectively address local spatial heterogeneity in PM2.5 regression modeling, and some extend the model along the time dimension to further improve estimation accuracy; however, they ignore global spatial autocorrelation, overfocus on local features, and suffer from local overfitting. In short, the above regression models consider either spatial autocorrelation or spatial heterogeneity, but not both simultaneously.
This study aims to address both global spatial autocorrelation and local spatial heterogeneity by proposing a global-local regression (GLR) model for estimating ground-level PM2.5 concentrations in the Yangtze River Delta (YRD) and Beijing-Tianjin-Hebei (BTH) regions. The GLR model addresses both effects, and it overcomes both the tendency of the ordinary GWR model to overfocus on local features and the poor local performance of the ordinary ESF model. To verify the spatial applicability of the GLR model and to rule out accuracy inflation caused by region-specific spatial factors, we selected two regions as study areas. In addition, to verify its temporal applicability, we constructed not only monthly models but also seasonal and annual models.

2. Materials and Methods

2.1. Study Areas

The YRD region (Figure 1a) is a flat alluvial plain covering an area of over 350,000 square kilometers on the east coast of China. The YRD region consists of Shanghai, Anhui province, Jiangsu province, and Zhejiang province. It is an important economic and cultural center in China, with a total population of over 150 million, and is an important international gateway to the Asia-Pacific region. However, along with its rapid economic growth, the YRD region has suffered from severe environmental pollution, especially air pollution, and people’s health is under serious threat [45]. There are 234 national monitoring stations distributed throughout the region, but most of them are located at low altitudes.
The BTH region (Figure 1b), consisting of the two municipalities of Beijing and Tianjin, and Hebei Province, is the most dynamic and largest economic region in northern China. With a population of more than 110 million and an area of over 210,000 square kilometers, it is among the most economically promising regions in China, and also the most polluted region in China [46]. Despite the high level of air pollution, there are only 79 national monitoring stations here.

2.2. Data Integration

In addition to the ground PM2.5 data, the datasets used in this study include AOD; meteorological data, namely planetary boundary layer height (PBLH), surface pressure (PS), surface temperature (TS), and relative humidity (RH); the normalized difference vegetation index (NDVI); elevation (DEM); and the kernel densities of roads (ROAD) and factories (FACT). All data cover the period from 1 December 2015 to 30 November 2016 over the YRD and BTH regions.
Ground PM2.5 data were obtained from the China Environmental Monitoring Center (http://106.37.208.233:20035 (accessed on 21 December 2021)). The dataset includes 24 h concentrations of all air pollutants from all monitoring stations in China. The monthly, seasonal, and annual average PM2.5 data were computed by averaging the hourly PM2.5 concentrations.
Details of the remote sensing and meteorological data are given in Table 1. The meteorological data (PS, PBLH, RH, TS) have a spatial resolution of 0.25° latitude × 0.25° longitude, which is coarser than that of the AOD data (1 km in the BTH region and 3 km in the YRD region). For consistency with the AOD data, they were interpolated by ordinary kriging to 1 km in the BTH region and 3 km in the YRD region [47,48]. Daily AOD data and hourly meteorological data were averaged to obtain monthly, seasonal, and annual data. Likewise, the DEM data were resampled to the spatial resolution of the AOD data.
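Ordinary kriging, used above to bring the 0.25° meteorological fields onto the AOD grid, predicts each target value as a weighted sum of the observations, with weights obtained from a variogram-based linear system under a sum-to-one constraint. The sketch below is a minimal NumPy version under stated assumptions: the exponential variogram and its `nugget`, `sill`, and `rng` (range) values are hypothetical placeholders that would normally be fitted to an empirical variogram, and the function names are illustrative rather than taken from the authors' workflow.

```python
import numpy as np

def exponential_variogram(h, nugget=0.0, sill=1.0, rng=1.0):
    """Exponential semivariogram; parameter values are assumptions, normally fitted to data."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rng))
    return np.where(h > 0.0, gamma, 0.0)  # gamma(0) = 0 by definition

def ordinary_kriging(xy, z, xy_new, **vparams):
    """Predict values at xy_new from observations z located at xy."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    n = len(z)
    # Left-hand side: semivariances between observations, bordered by the
    # Lagrange-multiplier row/column that forces the weights to sum to 1.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exponential_variogram(d, **vparams)
    lhs[n, n] = 0.0
    # Right-hand side: semivariances between each target and the observations.
    d0 = np.linalg.norm(xy_new[:, None, :] - xy[None, :, :], axis=-1)  # (m, n)
    rhs = np.ones((n + 1, len(xy_new)))
    rhs[:n, :] = exponential_variogram(d0, **vparams).T
    weights = np.linalg.solve(lhs, rhs)[:n]  # drop the Lagrange multiplier row
    return weights.T @ z
```

With a zero nugget the predictor is exact at the observation points, and because the weights sum to one, a constant field is reproduced unchanged.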
Pollution source data, namely road networks and factory locations, were derived from BaiduMap (https://map.baidu.com (accessed on 14 March 2022)) and OpenStreetMap (https://www.openstreetmap.org/ (accessed on 14 March 2022)), respectively. The point-based factory locations and line-based road networks were converted into kernel density rasters in ArcGIS, with the search radius (buffer range) set to 24 km [33] and the spatial resolution set to 1 km (3 km in the YRD region).
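The kernel density step turns discrete pollution sources into a continuous surface. A minimal sketch for point features (the factories) follows. It assumes the quartic (biweight) kernel that ArcGIS documents for its point Kernel Density tool, the 24 km search radius stated above, and planar coordinates in kilometers; the function name is hypothetical. Line features (the road network) use an analogous line-kernel formulation that is not reproduced here.

```python
import numpy as np

def quartic_kernel_density(points, targets, radius):
    """Density of point features at target locations using a quartic kernel.

    Each point contributes (3 / pi) * (1 - (d / radius)**2)**2 / radius**2
    when d < radius and 0 otherwise, so each point's kernel integrates to 1.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    d = np.linalg.norm(targets[:, None, :] - points[None, :, :], axis=-1)
    u = d / radius
    k = np.where(u < 1.0, (3.0 / np.pi) * (1.0 - u**2) ** 2, 0.0)
    return k.sum(axis=1) / radius**2

# Hypothetical usage: factory coordinates (km) evaluated on 1 km cell centres.
factories = np.array([[10.0, 12.0], [15.0, 20.0], [40.0, 5.0]])
xs, ys = np.meshgrid(np.arange(0.5, 50.0, 1.0), np.arange(0.5, 30.0, 1.0))
cells = np.column_stack([xs.ravel(), ys.ravel()])
density = quartic_kernel_density(factories, cells, radius=24.0).reshape(xs.shape)
```

The result is a count per square kilometer; a larger radius produces a smoother surface that spreads each factory's influence further.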
Subsequently, the values of all independent variables at the monitoring stations were extracted from the corresponding rasters, and all records with null values were removed. This yielded a total of 1257 records in the BTH region and 3651 records in the YRD region across the monthly, seasonal, and annual time scales (17 periods: 12 months, 4 seasons, and 1 year), that is, approximately 74 records per period in the BTH region and approximately 215 records per period in the YRD region.

2.3. Global-Local Regression Modeling Method

Our proposed GLR modeling method for PM2.5 concentration estimation considering global spatial autocorrelation and local spatial heterogeneity can be divided into three steps: (1) the extraction of global spatial factors; (2) the construction of the GLR model; and (3) model assessment, comparison, and validation.

2.3.1. The Extraction of Global Spatial Factors

Global spatial factors are extracted to eliminate the effect of the global spatial autocorrelation of the PM2.5 spatial distribution. The ESF method extracts these factors by calculating and filtering the spatial eigenvectors of the global spatial weight matrix; introducing the spatial eigenvectors into the regression can effectively remove the effect of spatial autocorrelation [49].
The extraction of global spatial factors consists of three steps: (1) constructing a centered global spatial weight matrix W0; (2) calculating the spatial eigenvectors of W0; and (3) filtering the spatial eigenvectors.
Well-known schemes for defining the spatial weight matrix include spatially contiguous neighbors, ranked distances, all centroids within distance d, inverse distances raised to some power, the “tri-cube” distance decline function, and Gaussian distance decline [50,51]. In this study, the global spatial weight matrix of the stations is constructed with a Gaussian kernel function and then centered to obtain W0:
$$W_{i,j} = \exp\left[-\left(\frac{d_{i,j}}{r}\right)^{2}\right] \tag{1}$$
where $r$ is the bandwidth and $d_{i,j}$ is the Euclidean distance between station $i$ and station $j$.
After the n × n centered spatial weight matrix W0 is obtained, where n is the number of monitoring stations, its eigenvectors are calculated, and only eigenvectors with $\lambda_i > 0$ and $\lambda_i/\lambda_{\max} \ge 0.25$ are retained [52], where $\lambda_i$ is the eigenvalue of the spatial eigenvector to be filtered and $\lambda_{\max}$ is the largest eigenvalue.
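Steps (1) to (3) can be sketched with NumPy as follows. This is an illustrative implementation under assumptions: the diagonal of the Gaussian weight matrix is set to zero (a common ESF convention that the text does not state), centering uses the projector M = I − 11ᵀ/n so that W0 = MWM, and the function name `candidate_eigenvectors` is hypothetical.

```python
import numpy as np

def candidate_eigenvectors(coords, bandwidth, ratio=0.25):
    """Return eigenvalues/eigenvectors of the centred Gaussian weight matrix that
    satisfy lambda_i > 0 and lambda_i / lambda_max >= ratio."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    W = np.exp(-(d / bandwidth) ** 2)         # Gaussian kernel weights
    np.fill_diagonal(W, 0.0)                  # assumption: no self-weights
    M = np.eye(n) - np.ones((n, n)) / n       # centring projector
    W0 = M @ W @ M
    vals, vecs = np.linalg.eigh(W0)           # W0 is symmetric
    order = np.argsort(vals)[::-1]            # sort in descending order
    vals, vecs = vals[order], vecs[:, order]
    if vals[0] <= 0.0:                        # no positive autocorrelation patterns
        return vals[:0], vecs[:, :0]
    keep = (vals > 0.0) & (vals / vals[0] >= ratio)
    return vals[keep], vecs[:, keep]
```

Because W0 is centered, every retained eigenvector is orthogonal to the constant vector and can be added to a regression that already contains an intercept without collinearity.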
The formula for the ESF model is as follows [53]:
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{E}\boldsymbol{\alpha} + \boldsymbol{\varepsilon} \tag{2}$$
where Y is the PM2.5 column vector, X is the matrix of independent variables, β is the corresponding coefficient vector, E is the matrix of filtered spatial eigenvectors, α is the corresponding coefficient vector, and ε is the random error. E is selected from the candidate eigenvectors by stepwise regression [54].
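The text does not state which criterion drives the stepwise regression. The sketch below assumes forward selection by the Gaussian AIC of an ordinary least squares fit: at each step it adds the candidate eigenvector that lowers AIC the most and stops when no candidate improves it. A full stepwise procedure would also test removals. Function names are hypothetical.

```python
import numpy as np

def ols_aic(y, X):
    """Gaussian AIC of an OLS fit with intercept (up to an additive constant)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1                 # coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

def forward_select_eigenvectors(y, X, E):
    """Greedily add columns of E to the ESF model while AIC decreases."""
    selected, remaining = [], list(range(E.shape[1]))
    best = ols_aic(y, X)
    while remaining:
        scores = {j: ols_aic(y, np.column_stack([X, E[:, selected + [j]]]))
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:     # no candidate improves AIC: stop
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best
```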

2.3.2. The Construction of the GLR Model

The local spatial heterogeneity problem can be solved by the local regression model with spatially variable coefficients. The GWR method is used to obtain the spatially variable coefficients needed to construct the GLR model. Figure 2 shows the workflow of this step.
The construction of the GLR model involves (1) the introduction of global spatial factors; (2) the construction of the local spatial weight matrix; (3) the construction and judgment of the GLR model; and (4) obtaining the final GLR model.
The set of spatial eigenvectors (global spatial factors) obtained in the previous step (Section 2.3.1) needs to be filtered further. In step (1), each candidate spatial eigenvector is introduced in turn as an additional independent variable, and in step (3), a temporary model is fitted for each candidate. The temporary model with the minimum Akaike information criterion (AIC) value is taken as the current GLR model, and the corresponding eigenvector is selected. The AIC value of the current GLR model is then compared with that of the previous GLR model (for the first GLR model, with that of the GWR model). If the AIC value decreases, the above steps are repeated to select one more eigenvector from the remaining unselected eigenvectors and update the GLR model, until the AIC value no longer decreases or all eigenvectors have been selected. Once the AIC value no longer decreases, the previously selected eigenvectors are the final selection and the corresponding GLR model is the final GLR model.
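The selection loop can be sketched independently of the regression engine. In the sketch, `fit_aic` is a hypothetical callable that fits the GWR model with the listed eigenvector columns appended to the covariates and returns its AIC; an empty list corresponds to the ordinary GWR baseline. When adding the best remaining eigenvector no longer lowers AIC, the loop stops and keeps the previous selection.

```python
def greedy_aic_selection(n_candidates, fit_aic):
    """Forward selection of eigenvectors driven by the AIC of the local model.

    fit_aic(indices) -> AIC of the model using those eigenvector columns
    (hypothetical interface; fit_aic([]) is the ordinary GWR model).
    """
    selected = []
    current = fit_aic([])                   # baseline: ordinary GWR
    remaining = list(range(n_candidates))
    while remaining:
        trials = {j: fit_aic(selected + [j]) for j in remaining}
        j_best = min(trials, key=trials.get)
        if trials[j_best] >= current:       # AIC no longer decreases: stop
            break
        selected.append(j_best)
        remaining.remove(j_best)
        current = trials[j_best]
    return selected, current

# Hypothetical AIC surface: eigenvector 0 helps most, 2 helps, 1 hurts.
gains = {0: 5.0, 1: -2.0, 2: 3.0}
chosen, aic = greedy_aic_selection(3, lambda idx: 100.0 - sum(gains[j] for j in idx))
```

Here the loop selects eigenvector 0 (AIC 95), then eigenvector 2 (AIC 92), and stops because adding eigenvector 1 would raise AIC to 94.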
Because the weight function, which converts the distances between stations within the influence range into weights, is very sensitive to the bandwidth (the local spatial influence range), the bandwidth is adjusted iteratively to find the optimal value before the local spatial weight matrix is constructed in step (2). The optimal bandwidth is determined by the commonly used cross-validation (CV) method [55,56].
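A minimal sketch of CV bandwidth selection is shown below. It assumes a fixed Gaussian kernel (the text does not state which kernel the local weight matrix uses) and a simple grid search over candidate bandwidths; GWR software often uses a golden-section search instead. Each station is predicted from a local weighted least squares fit in which its own weight is set to zero, and the bandwidth with the smallest sum of squared prediction errors is chosen. Function names are hypothetical.

```python
import numpy as np

def loo_cv_score(coords, X, y, bandwidth):
    """Sum of squared leave-one-out prediction errors of a GWR fit."""
    coords, X, y = (np.asarray(a, dtype=float) for a in (coords, X, y))
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sse = 0.0
    for i in range(n):
        w = np.exp(-(d[i] / bandwidth) ** 2)   # assumed Gaussian kernel
        w[i] = 0.0                             # station i is left out of its own fit
        Aw = A * w[:, None]
        beta = np.linalg.solve(A.T @ Aw, Aw.T @ y)
        sse += float((y[i] - A[i] @ beta) ** 2)
    return sse

def select_bandwidth(coords, X, y, candidates):
    """Grid search: return the candidate bandwidth with the lowest CV score."""
    scores = [loo_cv_score(coords, X, y, b) for b in candidates]
    return candidates[int(np.argmin(scores))], scores
```

A very large bandwidth makes every weight close to 1, so the fit collapses to a global OLS model. The CV score therefore favors a smaller bandwidth only when the relationship actually varies in space.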
The formula for the GWR can be written as:
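$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{3}$$
where $y_i$ is the value of the dependent variable at sample point $i$; $x_{ik}$ is the value of the $k$-th independent variable at point $i$; $p$ is the number of independent variables; $\beta_0$ and $\beta_k$ are the estimated coefficients; $\varepsilon_i$ is the random error; and $(u_i, v_i)$ is the coordinate of sample point $i$.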
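The GWR formula above is typically estimated by solving one weighted least squares problem per sample point. The sketch below assumes a fixed Gaussian kernel and returns the local coefficients, the fitted values, the trace of the hat matrix S, and the corrected AIC (AICc) in the form commonly used for GWR. The GLR model has the same structure with the selected eigenvector columns appended to X. The function name is hypothetical and this is not the authors' code.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Fit GWR with a fixed Gaussian kernel; return local coefficients, fitted
    values, trace of the hat matrix S, and AICc."""
    coords, X, y = (np.asarray(a, dtype=float) for a in (coords, X, y))
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # intercept + covariates
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    betas = np.empty((n, A.shape[1]))
    trace_s = 0.0
    for i in range(n):
        w = np.exp(-(d[i] / bandwidth) ** 2)       # local weights around point i
        Aw = A * w[:, None]
        C = np.linalg.solve(A.T @ Aw, Aw.T)        # beta(u_i, v_i) = C @ y
        betas[i] = C @ y
        trace_s += A[i] @ C[:, i]                  # diagonal element S_ii
    fitted = np.sum(A * betas, axis=1)
    rss = float(np.sum((y - fitted) ** 2))
    sigma = np.sqrt(rss / n)
    aicc = (2 * n * np.log(sigma) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))
    return betas, fitted, trace_s, aicc
```

Row i of `betas` holds the intercept and slopes at station i. With a very large bandwidth all rows coincide with the OLS coefficients and tr(S) equals the number of coefficients.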
The selected global spatial factors E1, E2, …, Eq (where q is the number of selected spatial eigenvectors), which are filtered from the set E calculated in Section 2.3.1, are introduced as independent variables to construct the GLR model, which can be expressed as:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \sum_{k=1}^{q} \alpha_k(u_i, v_i)\, E_{ik} + \varepsilon_i \tag{4}$$
where $y_i$ is the value of the dependent variable at the $i$-th sampling point; $(u_i, v_i)$ is the coordinate of the $i$-th sampling point; $\beta_0(u_i, v_i)$ is the intercept at this point; $x_{ik}$ is the value of the $k$-th independent variable at this point and $\beta_k(u_i, v_i)$ is its coefficient; $p$ is the number of independent variables; $E_{ik}$ is the value at this point of the $k$-th spatial eigenvector further selected from E (the E in Formula (2)) and $\alpha_k(u_i, v_i)$ is its coefficient; $q$ is the number of spatial eigenvectors further selected from E; and $\varepsilon_i$ is the random error at this point.
In this study, the GLR model for estimating regional PM2.5 concentrations in the YRD and BTH regions can be expressed as:
$$\mathrm{PM}_{2.5,i} = \beta_{0i} + \beta_{1i}\,\mathrm{AOD}_i + \beta_{2i}\,\mathrm{DEM}_i + \beta_{3i}\,\mathrm{TS}_i + \beta_{4i}\,\mathrm{PS}_i + \beta_{5i}\,\mathrm{RH}_i + \beta_{6i}\,\mathrm{PBLH}_i + \beta_{7i}\,\mathrm{NDVI}_i + \beta_{8i}\,\mathrm{ROAD}_i + \beta_{9i}\,\mathrm{FACT}_i + \sum_{k=1}^{q} \alpha_{ki} E_{ik} + \varepsilon_i \tag{5}$$
where $\beta_{0i}$ is the intercept at point $i$; $\beta_{ki}$ ($k = 1, 2, \ldots, 9$) are the regression coefficients at this point; $E_{ik}$ is the value at this point of the $k$-th spatial eigenvector further selected from E (the E in Formula (2)) and $\alpha_{ki}$ is its coefficient; $q$ is the number of spatial eigenvectors further selected from E; and $\varepsilon_i$ is the random error at this point.

2.3.3. Model Assessment, Comparison, and Validation

To assess model performance and goodness of fit to the observed PM2.5 concentrations, the adjusted R2 (Adj.R2), AIC, and root mean square error (RMSE) are used as evaluation metrics. The Moran coefficient (MC) is used to detect spatial autocorrelation:
$$MC = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij} Z_i Z_j}{\sum_{i=1}^{n} Z_i^{2}} \tag{6}$$
where $Z_i$ is the deviation of the value of geographical object $i$ from the mean; $n$ is the number of geographical objects; $w_{ij}$ is the spatial weight between geographical objects $i$ and $j$; and $S_0$ is the sum of all spatial weights, $S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}$.
The higher the MC of PM2.5, the stronger its spatial clustering. In addition, if the MC of the residuals is statistically insignificant or close to zero, the residuals can be regarded as random errors.
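The Moran coefficient and the residual check can be sketched as follows. The text does not say how significance was assessed. The sketch assumes a one-sided permutation test that randomly reassigns values to locations, which is one common choice alongside the normal approximation. Function names are hypothetical.

```python
import numpy as np

def morans_i(z, W):
    """Moran coefficient MC = (n / S0) * sum_ij w_ij Z_i Z_j / sum_i Z_i^2."""
    z = np.asarray(z, dtype=float)
    W = np.asarray(W, dtype=float)
    dev = z - z.mean()                  # Z_i: deviations from the mean
    n, s0 = len(z), W.sum()             # S0: sum of all spatial weights
    return (n / s0) * (dev @ W @ dev) / (dev @ dev)

def moran_permutation_test(z, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    observed = morans_i(z, W)
    sims = np.array([morans_i(rng.permutation(z), W) for _ in range(n_perm)])
    p_value = (1 + np.sum(sims >= observed)) / (n_perm + 1)
    return observed, p_value
```

Applied to model residuals, a small MC with a large p-value supports treating the residuals as spatially random.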
A 10-fold CV is applied to validate the model's ability to predict PM2.5 concentrations in areas without monitoring stations. The PM2.5 dataset is randomly divided into 10 subsets; in each round, nine subsets are used for training and the remaining one for validation, so that each subset serves as the validation set once. The RMSE averaged over the 10 rounds is used as the metric of each model's predictive power.
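The 10-fold procedure can be sketched generically. `fit_predict(train_idx, test_idx)` is a hypothetical callable that fits a model (ESF, GWR, or GLR) on the training stations and returns predictions for the held-out stations. An OLS stand-in is used here only to make the example runnable. The returned score is the mean of the ten fold RMSEs, as described above.

```python
import numpy as np

def kfold_cv_rmse(y, fit_predict, k=10, seed=0):
    """Mean RMSE over k random folds; every record is held out exactly once."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rmses = []
    for f, test in enumerate(folds):
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        pred = fit_predict(train, test)
        rmses.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(rmses))

# Hypothetical stand-in model: OLS on covariates X (replace with ESF/GWR/GLR).
def make_ols_fit_predict(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    def fit_predict(train, test):
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        return A[test] @ beta
    return fit_predict
```

For a local model such as GWR or GLR, `fit_predict` would estimate coefficients at the held-out station locations from the training stations only, which is what makes the score a test of prediction where no monitoring station exists.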

3. Results

3.1. Descriptive Statistics for PM2.5 and Its Variables

Table 2 summarizes the statistics of the aforementioned variables and PM2.5 concentrations at the monitoring stations in the YRD and BTH regions from December 2015 to November 2016. The Pearson correlation coefficients in Tables S1 and S2 indicate that, in both regions, TS, PS, AOD, ROAD, and FACT were significantly and positively correlated with PM2.5, whereas NDVI, PBLH, and DEM were significantly and negatively correlated with PM2.5; the relationship between RH and PM2.5 was complex.
Figure 3a,b shows the distribution of the annual PM2.5 observations in the YRD and BTH regions; the observations ranged from 22.62 to 66.75 μg/m3 in the YRD region and from 28.06 to 112.48 μg/m3 in the BTH region. The annual average PM2.5 concentrations at the monitoring stations in the two regions were 49.61 and 75.99 μg/m3, respectively.
According to Table S3, the MCs of annual PM2.5 were 0.605 and 0.968 in the YRD and BTH regions, respectively, indicating that PM2.5 was spatially clustered in both regions throughout the year, with strong spatial autocorrelation.

3.2. Model Results and Validation

3.2.1. The Assessment and Comparison of Models

Table 3 shows the performance metrics of the monthly average, seasonal, and annual models (ESF, GWR, and GLR) in the YRD and BTH regions. In the YRD region, the average Adj.R2 of the monthly GLR model was 0.620, which was 8.0% and 7.4% higher than that of the ordinary monthly ESF (0.574) and GWR (0.578) models, respectively, and the average RMSE of the GLR (ESF, GWR) model was 5.401 μg/m3 (7.101 and 5.854 μg/m3). The seasonal average Adj.R2 of the GLR model was 0.634, which was 16.9% and 5.7% higher than that of the ESF (0.542) and GWR (0.600) models, respectively, and the seasonal average RMSEs of the ESF and GWR models (5.808 and 5.023 μg/m3) were both higher than that of the GLR model (4.792 μg/m3). The annual Adj.R2 and RMSE of the GLR model were 0.748 and 3.427 μg/m3, both better than those of the ESF and GWR models. The GLR model had the lowest AIC value of the three models at all time scales.
In the BTH region, the monthly average Adj.R2 of the GLR model was 0.853, which was 6.8% and 7.0% higher than that of the ESF (0.799) and GWR (0.797) models, respectively, and the average RMSEs of the ESF, GWR, and GLR models were 7.733, 6.907, and 4.885 μg/m3, respectively. The seasonal average Adj.R2 of the GLR model was 0.873, which was 5.8% and 5.3% higher than that of the ESF (0.826) and GWR (0.829) models, respectively, and the seasonal average RMSE of the GLR (ESF, GWR) model was 4.540 μg/m3 (6.846 and 6.251 μg/m3). The annual Adj.R2 and RMSE of the GLR model were 0.959 and 3.128 μg/m3, both better than those of the ESF and GWR models. Likewise, the GLR model had the lowest AIC value of the three models at all time scales.
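The relative improvements quoted above are computed as 100 × (GLR − baseline)/baseline. Recomputing them from the rounded averages in the text reproduces most of the reported figures exactly. Three comparisons differ by 0.1 percentage point: YRD monthly GLR vs. GWR (7.3% vs. 7.4%), YRD seasonal GLR vs. ESF (17.0% vs. 16.9%), and BTH seasonal GLR vs. ESF (5.7% vs. 5.8%). This is presumably because the reported percentages were computed from unrounded averages.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

# Average Adj.R2 (GLR, ESF, GWR) as reported in the text (Table 3).
adj_r2 = {
    ("YRD", "monthly"): (0.620, 0.574, 0.578),
    ("YRD", "seasonal"): (0.634, 0.542, 0.600),
    ("BTH", "monthly"): (0.853, 0.799, 0.797),
    ("BTH", "seasonal"): (0.873, 0.826, 0.829),
}
for (region, scale), (glr, esf, gwr) in adj_r2.items():
    print(region, scale,
          round(relative_gain(glr, esf), 1),   # GLR vs ESF
          round(relative_gain(glr, gwr), 1))   # GLR vs GWR
# YRD monthly 8.0 7.3
# YRD seasonal 17.0 5.7
# BTH monthly 6.8 7.0
# BTH seasonal 5.7 5.3
```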
Tables S4 and S5 show the performance metrics of each monthly model in the YRD and BTH regions. The Adj.R2 of the GLR model ranged from 0.393 to 0.719 in the YRD region and from 0.668 to 0.964 in the BTH region. The RMSE of the GLR model ranged from 2.893 to 8.237 μg/m3 in the YRD region and from 2.863 to 7.089 μg/m3 in the BTH region. The performance of all models in both regions decreased in summer.
In summary, the higher average Adj.R2 and the lower average RMSE and AIC indicate that the GLR model was superior to the ordinary GWR and ESF models in both regions at the monthly, seasonal, and annual scales. Additionally, the GLR model showed good spatial and temporal applicability.

3.2.2. The MCs for Residuals

The MCs of GLR model residuals are shown in Table 4. The MCs of the GLR models were low and insignificant for most months and seasons in both regions. Overall, the residuals of the GLR model could be considered almost as random errors, indicating that the GLR model effectively weakens or even eliminates the effect of spatial autocorrelation on regression modeling.

3.2.3. Cross-Validation

A 10-fold CV was applied to evaluate the prediction accuracy of the models. The CV RMSEs of the monthly average, seasonal, and annual models are shown in Table 5, and Table S6 shows the CV RMSE of each monthly model. The monthly average CV RMSEs of the ESF, GWR, and GLR models in the YRD region were 7.650, 7.286, and 7.024 μg/m3, respectively. In the BTH region, the monthly average CV RMSEs of the ESF, GWR, and GLR models were 10.526, 11.001, and 9.499 μg/m3, respectively. Meanwhile, the seasonal average CV RMSE of the GLR (ESF, GWR) model was 5.63 μg/m3 (6.124 and 5.904 μg/m3) in the YRD region and 7.932 μg/m3 (8.205 and 8.443 μg/m3) in the BTH region. Except in summer, the GLR model had the lowest RMSE of the three models. Overall, the monthly, seasonal, and annual GLR models had lower RMSEs and thus higher prediction accuracy than the other two models.

3.3. PM2.5 Distribution Maps and Spatiotemporal Characteristics

3.3.1. Continuous PM2.5 Distribution Maps

The local coefficients estimated by the GLR model at the stations were interpolated by ordinary kriging to obtain coefficients for the entire YRD and BTH regions. The filtered spatial eigenvectors were likewise interpolated by ordinary kriging into raster images covering the entire regions. From these, PM2.5 concentration maps were produced for the YRD and BTH regions in different periods. The estimated monthly PM2.5 results for the YRD and BTH regions are shown in Figures S1 and S2, respectively, and the seasonal and annual PM2.5 results for the two regions are shown in Figures 4 and 5. In addition, for more realistic results, negative values were set to 0 μg/m3 and all outliers were removed. Parts of the PM2.5 distribution maps missing because of missing values in the AOD images were filled by inverse distance-weighted interpolation.
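The post-processing described above can be sketched for a single estimated raster held as a 2-D NumPy array, with NaN marking cells lost to missing AOD. The sketch clips negative estimates to 0 μg/m3 and fills NaN cells by inverse distance weighting from all valid cells. It assumes cell indices serve as coordinates and uses an assumed power of 2. The outlier-removal rule is not specified in the text and is therefore not reproduced. Function names are hypothetical.

```python
import numpy as np

def fill_gaps_idw(grid, power=2.0):
    """Fill NaN cells of a 2-D grid by inverse distance weighting from all valid cells."""
    grid = np.asarray(grid, dtype=float).copy()
    rows, cols = np.indices(grid.shape)
    valid = ~np.isnan(grid)
    known_rc = np.column_stack([rows[valid], cols[valid]])
    known_v = grid[valid]
    for r, c in zip(*np.nonzero(~valid)):
        d = np.hypot(known_rc[:, 0] - r, known_rc[:, 1] - c)  # d > 0 for gap cells
        w = 1.0 / d**power
        grid[r, c] = np.sum(w * known_v) / np.sum(w)
    return grid

def postprocess(pm25):
    """Clip negative estimates to 0 ug/m3 (NaN is preserved), then fill gaps."""
    clipped = np.clip(np.asarray(pm25, dtype=float), 0.0, None)
    return fill_gaps_idw(clipped)
```

Clipping before filling keeps negative estimates from propagating into the interpolated cells. For large rasters, a local search radius or a k-d tree would replace the all-cells loop.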

3.3.2. Spatiotemporal Distribution Based on the PM2.5 Distribution Maps

Figure 6 shows the monthly PM2.5 concentrations in the two regions derived from the estimated PM2.5 distribution maps. Monthly PM2.5 concentrations in both regions followed a U-shaped pattern, higher in the winter months and lower in the summer months. The most and least polluted months in the YRD region were December 2015 and July 2016, with average PM2.5 concentrations of 71.95 and 25.34 μg/m3, respectively. In the BTH region, the most polluted month was December 2015, with an average PM2.5 concentration of 139.64 μg/m3, and the cleanest month was August 2016, with an average PM2.5 concentration of 37.79 μg/m3.
As shown in Table 6, the annual average PM2.5 concentrations in the YRD and BTH regions were 44.85 and 80.21 μg/m3, respectively, so the annual average in the BTH region was about 1.8 times that in the YRD region. These annual averages exceed the WHO interim target-1 (35 μg/m3) by 28% and 129%, respectively.
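The exceedance percentages and the ratio between the two regions follow directly from the annual averages in Table 6. The short check below reproduces them, with 35 μg/m3 as the WHO interim target-1 for annual mean PM2.5.

```python
WHO_IT1 = 35.0                              # μg/m3, WHO interim target-1 (annual mean)
annual_mean = {"YRD": 44.85, "BTH": 80.21}  # μg/m3, from Table 6

exceedance = {r: 100.0 * (c - WHO_IT1) / WHO_IT1 for r, c in annual_mean.items()}
ratio = annual_mean["BTH"] / annual_mean["YRD"]

for region, pct in exceedance.items():
    print(f"{region}: {pct:.0f}% above IT-1")   # YRD: 28%, BTH: 129%
print(f"BTH/YRD ratio: {ratio:.2f}")             # 1.79
```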
According to the seasonal and annual maps, PM2.5 pollution in both regions is markedly worse in winter than in the rest of the year. PM2.5 pollution levels also varied by province and municipality. From high to low, PM2.5 pollution in the YRD region ranks as Anhui Province > Zhejiang Province > Jiangsu Province > Shanghai, and in the BTH region as Hebei Province > Tianjin ≈ Beijing. The annual maps show that the most polluted areas in the YRD region were mainly located in central Hefei in Anhui Province and in western Wuxi and eastern Changzhou in Jiangsu Province, while the most polluted areas in the BTH region were mainly located in western Hengshui, central Langfang, and eastern Handan and Shijiazhuang in Hebei Province. Annual PM2.5 concentrations in the YRD region were higher in the north than in the south and gradually decreased from northwest to southeast, while annual PM2.5 concentrations in the southern part of the BTH region were higher than those in the northern part.

4. Discussion

4.1. Method Improvement and Accuracy Enhancement

In this study, we proposed a GLR model that considers global spatial autocorrelation and local spatial heterogeneity for PM2.5 concentration estimation. The results show that the GLR model outperformed the ordinary ESF and GWR models at all time scales and in both regions, indicating that the proposed model is a feasible and reasonable improvement on the ESF and GWR models. Compared with other models used in the YRD [49,57] and BTH [44] regions, the GLR model also performs well: its average Adj.R2 in the two regions is comparable to, or higher than, that reported in previous studies. Further, the PM2.5 distribution maps obtained from the GLR model are similar to the maps in those studies. Overall, the GLR model is an effective model for estimating regional PM2.5 concentrations, with good spatial and temporal applicability.
The spatial distribution of PM2.5 exhibits global spatial autocorrelation and local spatial heterogeneity, which the ESF and GWR models do not fully consider. The ESF model addresses only spatial autocorrelation and yields globally optimal coefficients with low local accuracy, while the GWR model, which considers spatial heterogeneity, overfocuses on local features and suffers from local overfitting. The spatial distribution of PM2.5 is influenced by both global (spatial autocorrelation) and local (spatial heterogeneity) factors, and the local spatial influence can be seen partly as the local expression of the global spatial influence and partly as the influence of the local environment itself. By introducing the spatial eigenvectors (global factors) into local modeling, the GLR model addresses both spatial autocorrelation and spatial heterogeneity, making the model more consistent with the spatial pattern of PM2.5. Therefore, the GLR model achieves higher accuracy than the ESF and GWR models and better predictive ability in areas without stations.
Summer climatic conditions in both regions are very complex, leading to more complex non-linear relationships between PM2.5 and its factors, which may explain the reduced and fluctuating performance of all models in summer in both regions; this result is consistent with previous research [33]. The BTH region has fewer PM2.5 monitoring stations, only about one-third as many as the YRD region, and they are very unevenly distributed. Therefore, in the 10-fold CV, withholding a few stations from model fitting had a greater impact on the CV results, resulting in higher CV RMSEs in the BTH region than in the YRD region. Additionally, the stations in the BTH region are more concentrated and the spatial autocorrelation is stronger, resulting in higher MCs of the residuals than in the YRD region.

4.2. Comparison of PM2.5 in the YRD and BTH Regions

The monthly, seasonal, and annual results show that model accuracy is lower in the YRD region than in the BTH region, for several possible reasons. First, the spatial resolution of the AOD product used in the YRD region is 3 km, whereas that used in the BTH region is 1 km; coarser products may introduce errors between the modeling data and the true values, reducing model performance. Second, the YRD region is larger than the BTH region and its spatial and climatic conditions are much more complex, which leads to a more complex non-linear relationship between PM2.5 and the various factors; a linear model therefore cannot estimate PM2.5 concentrations as well there. Third, PM2.5 pollution in the BTH region is more severe than in the YRD region, and its distribution is more aggregated and spatially stable, which makes PM2.5 concentrations easier to estimate.
Both the YRD and BTH regions have hot, rainy summers with humid air; particles readily absorb the moisture, then aggregate and settle. High temperatures improve atmospheric diffusion and accelerate the dispersion of PM2.5, while heavier rainfall intensifies the washout of PM2.5 [58]. Moreover, the warm, humid summer conditions favor vegetation growth, which accelerates the absorption of PM2.5. Therefore, in both regions, summer has the best air quality, while the cold, dry winter is the period of most severe PM2.5 pollution, consistent with previous studies [59,60]. PM2.5 pollution in the BTH region is much more severe than in the YRD region. Located at a higher latitude, the BTH region has a lower annual temperature and less precipitation than the YRD region. In addition, the BTH region is more industrialized, with more factories (37,135) than the YRD region (23,299), and coal dust from winter heating in the BTH region further degrades air quality [61].
According to the Pearson correlation coefficients (Tables S1 and S2), the relationship between RH and PM2.5 is complex in both regions, which is consistent with previous studies [33,49]. At relatively low RH, increasing humidity may raise PM2.5 concentrations through hygroscopic growth, whereas high RH may promote the settling of fine particles and decrease ground-level PM2.5 concentrations [60,62]. Comparing the DEM (Figure S3) and factory data (Figure S4) with the PM2.5 distribution maps of the two regions, we find that PM2.5 pollution is worse at lower altitudes than at higher altitudes. Although atmospheric pressure and temperature are lower in high-altitude areas than in low-altitude areas, strong winds at high altitudes accelerate PM2.5 diffusion [63]. Moreover, factories and road networks are more likely to be built on flat terrain (plains), resulting in higher PM2.5 concentrations at low altitudes.
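Because the Pearson coefficient measures only linear association, a relationship that rises and then falls (as the RH mechanisms above suggest) can yield a coefficient near zero even when the dependence is strong. The sketch below computes r from its definition and applies it to a hypothetical hump-shaped RH-PM2.5 relation; the data are synthetic, not from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hump-shaped relation: PM2.5 peaks at RH = 0.6 and falls on either side.
rh = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
pm25 = [80.0 - 200.0 * (h - 0.6) ** 2 for h in rh]
print(round(pearson_r(rh, pm25), 6))  # close to zero despite a strong dependence
```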

4.3. Limitations and Future Works

The data used in this experiment cover only one year, from December 2015 to November 2016, so the validation of the proposed method at the annual time scale is relatively weak. More data of better quality are needed; in particular, AOD data are seriously incomplete, and a better method for processing them is needed. Additionally, PM2.5 monitoring stations with any missing data were removed, but omitting these observations affects the construction of the global and local spatial weight matrices, which strongly influences the extraction of spatial eigenvectors and thus the model's accuracy [57].
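The dependence of the spatial eigenvectors on the station set can be illustrated with a minimal sketch of the standard ESF construction: candidate spatial filters are eigenvectors of the doubly centered weight matrix MCM, with M = I - 11ᵀ/n, and the Moran coefficient of such an eigenvector equals (n/S0)·λ, where S0 is the sum of all weights. The binary distance-threshold weights, the power-iteration solver, the five-station layout, and all helper names are hypothetical choices for illustration, not the paper's configuration. Dropping one station changes the matrix size and connectivity, and hence the leading filter and its MC.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def binary_weights(coords: Sequence[Tuple[float, float]], threshold: float) -> Matrix:
    """Hypothetical weights: c_ij = 1 if distinct stations i, j lie within `threshold`."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0.0
             for j in range(n)] for i in range(n)]


def double_center(c: Matrix) -> Matrix:
    """Return M C M with M = I - 11^T/n (subtract row/column means, add grand mean)."""
    n = len(c)
    row = [sum(r) / n for r in c]
    col = [sum(c[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[c[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def leading_eigenpair(a: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Most positive eigenpair of symmetric `a` by power iteration on a + s*I.

    The shift s (largest absolute row sum, an upper bound on |eigenvalue|) makes
    every eigenvalue non-negative, so the iteration targets the most positive one.
    """
    n = len(a)
    s = max(sum(abs(x) for x in row) for row in a)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) + s * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))  # Rayleigh quotient
    return lam, v


def leading_filter(coords: Sequence[Tuple[float, float]], threshold: float):
    """Leading candidate eigenvector, its eigenvalue, and MC = (n / S0) * lambda."""
    c = binary_weights(coords, threshold)
    lam, v = leading_eigenpair(double_center(c))
    s0 = sum(map(sum, c))
    return lam, v, len(c) * lam / s0


full = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
dropped = full[:2] + full[3:]  # middle station removed, e.g. because of missing data
print(leading_filter(full, 1.0)[2], leading_filter(dropped, 1.0)[2])  # MC ≈ 0.625 vs ≈ 1.0
```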
We will collect more data and address the above problems in follow-up research. In addition, although deep learning methods are less interpretable than linear models, they can achieve higher accuracy and can capture the non-linear relationships between the factors and PM2.5 [64,65]. In future work, we will try to combine linear models with deep learning methods to improve accuracy while retaining interpretability.

5. Conclusions

In this paper, a GLR model that considers global spatial autocorrelation and local spatial heterogeneity has been proposed for estimating ground PM2.5 concentrations in the BTH and YRD regions of China. Comprehensive results show that the GLR model outperforms the ordinary GWR and ESF models in terms of Adj.R2, RMSE, and AIC at monthly, seasonal, and annual scales, and achieves the lowest CV RMSE in most cases. The GLR model can address the effects of spatial autocorrelation and spatial heterogeneity on modeling, and to some extent it overcomes both the tendency of the ordinary GWR model to overfocus on local features and the poor local performance of the ordinary ESF model. Further, its generally lower CV RMSE indicates that the GLR model has higher prediction accuracy in areas without monitoring stations. According to the annual map, the annual average PM2.5 concentrations in the YRD and BTH regions exceed the WHO interim target-1 (35 μg/m3) by 28% and 129%, respectively. Overall, the GLR model, with good spatial and temporal applicability, is a promising method for estimating ground PM2.5 concentrations.
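The local half of the GLR idea can be illustrated with a generic GWR fit at one location. The sketch below estimates β(u) = (XᵀW(u)X)⁻¹XᵀW(u)y with a Gaussian distance kernel and appends one hypothetical eigenvector-like filter as an extra design column to mimic a global spatial filter. The kernel, the bandwidth, the single filter column, the synthetic grid, and the helper names are assumptions; for simplicity the filter's coefficient is estimated locally like the others, whereas a mixed GWR would hold it fixed. The paper's exact GLR specification (how eigenvectors are filtered and how the bandwidth is chosen) is given in its methods section and may differ.

```python
import math
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def gwr_local_coefficients(
    coords: Sequence[Tuple[float, float]],
    design: Sequence[Sequence[float]],
    y: Sequence[float],
    at: Tuple[float, float],
    bandwidth: float,
) -> List[float]:
    """Local WLS estimate beta(at) = (X^T W X)^-1 X^T W y with a Gaussian kernel.

    W = diag(exp(-0.5 * (d_i / bandwidth) ** 2)), d_i = distance from station i to `at`.
    """
    w = [math.exp(-0.5 * (math.dist(c, at) / bandwidth) ** 2) for c in coords]
    p = len(design[0])
    xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, design)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * row[i] * yi for wi, row, yi in zip(w, design, y)) for i in range(p)]
    return solve(xtwx, xtwy)


# Synthetic 4 x 4 station grid (not the study's data). Design columns: intercept,
# AOD, and one hypothetical eigenvector-like filter value per station.
coords = [(float(i), float(j)) for i in range(4) for j in range(4)]
aod = [0.3 + 0.1 * i + 0.02 * j for i, j in coords]
filt = [math.sin(i + 2.0 * j) for i, j in coords]
design = [[1.0, a, e] for a, e in zip(aod, filt)]
pm25 = [5.0 + 30.0 * a + 4.0 * e for a, e in zip(aod, filt)]
# ≈ [5.0, 30.0, 4.0], since this synthetic relation does not vary in space
print(gwr_local_coefficients(coords, design, pm25, at=(1.5, 1.5), bandwidth=2.0))
```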
As discussed in Section 4.3 “Limitations and Future Works”, future work will use more and better-quality data and will take both the linear and non-linear relationships between PM2.5 and the explanatory factors into account.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs14184545/s1, Figure S1: Estimated monthly PM2.5 concentration distribution maps in the YRD region: (a) December; (b) January; (c) February; (d) March; (e) April; (f) May; (g) June; (h) July; (i) August; (j) September; (k) October; (l) November. Figure S2: Estimated monthly PM2.5 concentration distribution maps in the BTH region: (a) December; (b) January; (c) February; (d) March; (e) April; (f) May; (g) June; (h) July; (i) August; (j) September; (k) October; (l) November. Figure S3: The DEM data of the YRD region (a) and the BTH region (b). Figure S4: Factories in the YRD region (a) and the BTH region (b). Table S1: The Pearson correlation coefficients in the YRD region. Table S2: The Pearson correlation coefficients in the BTH region. Table S3: The MCs for PM2.5 concentrations at monthly, seasonal, and annual time scales. Table S4: Model performance of the monthly ESF, GWR, and GLR models in the YRD region. Table S5: Model performance of the monthly ESF, GWR, and GLR models in the BTH region. Table S6: The CV RMSE of the monthly ESF, GWR, and GLR models in both regions.

Author Contributions

Conceptualization, H.S. and Y.C. (Yumin Chen); methodology, H.S., Y.C. (Yumin Chen) and H.T.; software, H.S. and H.T.; supervision, Y.C. (Yumin Chen); funding acquisition, Y.C. (Yumin Chen); writing—original draft preparation, H.S.; data curation, H.S., H.T., A.Z., G.C. and Y.C. (Yuejun Chen). All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Key Research and Development Program of China [Grant No. 2018YFB0505302].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used in this paper are publicly available and the URLs are provided in the data section.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bonyadi, Z.; Ehrampoush, M.H.; Ghaneian, M.T.; Mokhtari, M.; Sadeghi, A. Cardiovascular, respiratory, and total mortality attributed to PM2.5 in Mashhad, Iran. Environ. Monit. Assess. 2016, 188, 570.
  2. Li, T.; Guo, Y.; Liu, Y.; Wang, J.; Wang, Q.; Sun, Z.; He, M.Z.; Shi, X. Estimating mortality burden attributable to short-term PM2.5 exposure: A national observational study in China. Environ. Int. 2019, 125, 245–251.
  3. Xiao, Q.; Liang, F.; Ning, M.; Zhang, Q.; Bi, J.; He, K.; Lei, Y.; Liu, Y. The long-term trend of PM2.5-related mortality in China: The effects of source data selection. Chemosphere 2021, 263, 127894.
  4. Brauer, M.; Freedman, G.; Frostad, J.; Van Donkelaar, A.; Martin, R.V.; Dentener, F.; Dingenen, R.V.; Estep, K.; Amini, H.; Apte, J.S.; et al. Ambient Air Pollution Exposure Estimation for the Global Burden of Disease 2013. Environ. Sci. Technol. 2016, 50, 79–88.
  5. Wang, N.; Mengersen, K.; Kimlin, M.; Zhou, M.; Tong, S.; Fang, L.; Wang, B.; Hu, W. Lung cancer and particulate pollution: A critical review of spatial and temporal analysis evidence. Environ. Res. 2018, 164, 585–596.
  6. Cao, Q.; Rui, G.; Liang, Y. Study on PM2.5 pollution and the mortality due to lung cancer in China based on geographic weighted regression model. BMC Public Health 2018, 18, 925.
  7. Pun, V.C.; Kazemiparkouhi, F.; Manjourides, J.; Suh, H.H. Long-Term PM2.5 Exposure and Respiratory, Cancer, and Cardiovascular Mortality in Older US Adults. Am. J. Epidemiol. 2017, 186, 961–969.
  8. Hyslop, N.P. Impaired visibility: The air pollution people see. Atmos. Environ. 2009, 43, 182–195.
  9. Guo, J.P.; Zhang, X.Y.; Che, H.Z.; Gong, S.L.; An, X.; Cao, C.X.; Guang, J.; Zhang, H.; Wang, Y.Q.; Zhang, X.C.; et al. Correlation between PM concentrations and aerosol optical depth in eastern China. Atmos. Environ. 2009, 43, 5876–5886.
  10. Lin, C.; Li, Y.; Yuan, Z.; Lau, A.K.H.; Li, C.; Fung, J.C.H. Using satellite remote sensing data to estimate the high-resolution distribution of ground-level PM2.5. Remote Sens. Environ. 2015, 156, 117–128.
  11. He, Q.; Huang, B. Satellite-based mapping of daily high-resolution ground PM2.5 in China via space-time regression modeling. Remote Sens. Environ. 2018, 206, 72–83.
  12. Lee, S.; Serre, M.L.; Van Donkelaar, A.; Martin, R.V.; Burnett, R.T.; Jerrett, M. Comparison of Geostatistical Interpolation and Remote Sensing Techniques for Estimating Long-Term Exposure to Ambient PM2.5 Concentrations across the Continental United States. Environ. Health Perspect. 2012, 120, 1727–1732.
  13. Pan, Q.; Wen, X.; Lu, Z.; Li, L.; Jing, W. Dynamic speed control of unmanned aerial vehicles for data collection under internet of things. Sensors 2018, 18, 3951.
  14. Bai, K.; Li, K.; Chang, N.B.; Gao, W. Advancing the prediction accuracy of satellite-based PM2.5 concentration mapping: A perspective of data mining through in situ PM2.5 measurements. Environ. Pollut. 2019, 254, 113047.
  15. Chen, G.; Li, Y.; Zhou, Y.; Shi, C.; Guo, Y.; Liu, Y. The comparison of AOD-based and non-AOD prediction models for daily PM2.5 estimation in Guangdong province, China with poor AOD coverage. Environ. Res. 2021, 195, 110735.
  16. Carmona, J.M.; Gupta, P.; Lozano-García, D.F.; Vanoye, A.Y.; Hernández-Paniagua, I.Y.; Mendoza, A. Evaluation of MODIS aerosol optical depth and surface data using an ensemble modeling approach to assess PM2.5 temporal and spatial distributions. Remote Sens. 2021, 13, 3102.
  17. Zhang, Y.; Li, Z. Remote sensing of atmospheric fine particulate matter (PM2.5) mass concentration near the ground from satellite observation. Remote Sens. Environ. 2015, 160, 252–262.
  18. Wang, Z.; Chen, L.; Tao, J.; Zhang, Y.; Su, L. Satellite-based estimation of regional particulate matter (PM) in Beijing using vertical-and-RH correcting method. Remote Sens. Environ. 2010, 114, 50–63.
  19. Bai, Y.; Wu, L.; Qin, K.; Zhang, Y.; Shen, Y.; Zhou, Y. A geographically and temporally weighted regression model for ground-level PM2.5 estimation from satellite-derived 500 m resolution AOD. Remote Sens. 2016, 8, 262.
  20. Reyes, J.M.; Serre, M.L. An LUR/BME framework to estimate PM2.5 explained by on road mobile and stationary sources. Environ. Sci. Technol. 2014, 48, 1736–1744.
  21. Zheng, C.; Zhao, C.; Zhu, Y.; Wang, Y.; Shi, X.; Wu, X.; Chen, T.; Wu, F.; Qiu, Y. Analysis of influential factors for the relationship between PM2.5 and AOD in Beijing. Atmos. Chem. Phys. 2017, 17, 13473–13489.
  22. Chen, X.; Li, H.; Zhang, S.; Chen, Y.; Fan, Q. High Spatial Resolution PM2.5 Retrieval Using MODIS and Ground Observation Station Data Based on Ensemble Random Forest. IEEE Access 2019, 7, 44416–44430.
  23. Shen, H.; Li, T.; Yuan, Q.; Zhang, L. Estimating Regional Ground-Level PM2.5 Directly From Satellite Top-Of-Atmosphere Reflectance Using Deep Belief Networks. J. Geophys. Res. Atmos. 2018, 123, 13875–13886.
  24. Zhang, J.J.Y.; Sun, L.; Rainham, D.; Dummer, T.J.B.; Wheeler, A.J.; Anastasopolos, A.; Gibson, M.; Johnson, M. Predicting intraurban airborne PM1.0-trace elements in a port city: Land use regression by ordinary least squares and a machine learning algorithm. Sci. Total Environ. 2022, 806, 150149.
  25. Wong, P.Y.; Lee, H.Y.; Zeng, Y.T.; Chern, Y.R.; Chen, N.T.; Candice Lung, S.C.; Su, H.J.; Wu, C.-D. Using a land use regression model with machine learning to estimate ground level PM2.5. Environ. Pollut. 2021, 277, 116846.
  26. Wei, J.; Huang, W.; Li, Z.; Xue, W.; Peng, Y.; Sun, L.; Cribb, M. Estimating 1-km-resolution PM2.5 concentrations across China using the space-time random forest approach. Remote Sens. Environ. 2019, 231, 111221.
  27. Son, Y.; Osornio-Vargas, Á.R.; O’Neill, M.S.; Hystad, P.; Texcalac-Sangrador, J.L.; Ohman-Strickland, P.; Meng, Q.; Schwander, S. Land use regression models to assess air pollution exposure in Mexico City using finer spatial and temporal input parameters. Sci. Total Environ. 2018, 639, 40–48.
  28. Chu, Y.; Liu, Y.; Li, X.; Liu, Z.; Lu, H.; Lu, Y.; Mao, Z.; Chen, X.; Li, N.; Ren, M.; et al. A review on predicting ground PM2.5 concentration using satellite aerosol optical depth. Atmosphere 2016, 7, 129.
  29. Zhou, C.; Chen, J.; Wang, S. Examining the effects of socioeconomic development on fine particulate matter (PM2.5) in China’s cities using spatial regression and the geographical detector technique. Sci. Total Environ. 2018, 619–620, 436–445.
  30. Hao, Y.; Liu, Y.M. The influential factors of urban PM2.5 concentrations in China: A spatial econometric analysis. J. Clean. Prod. 2016, 112, 1443–1453.
  31. Cheng, L.; Zhang, T.; Chen, L.; Li, L.; Wang, S.; Hu, S.; Yuan, L.; Wang, J.; Wen, M. Investigating the impacts of urbanization on PM2.5 pollution in the Yangtze River Delta of China: A spatial panel data approach. Atmosphere 2020, 11, 1058.
  32. Griffith, D.A. Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Can. Geogr.-Geogr. Can. 1996, 40, 351–367.
  33. Zhang, J.; Li, B.; Chen, Y.; Chen, M.; Fang, T.; Liu, Y. Eigenvector spatial filtering regression modeling of ground PM2.5 concentrations using remotely sensed data. Int. J. Environ. Res. Public Health 2018, 15, 1228.
  34. Lin, G.; Fu, J.; Jiang, D.; Hu, W.; Dong, D.; Huang, Y.; Zhao, M. Spatio-temporal variation of PM2.5 concentrations and their relationship with geographic and socioeconomic factors in China. Int. J. Environ. Res. Public Health 2013, 11, 173–186.
  35. Ma, Z.; Hu, X.; Huang, L.; Bi, J.; Liu, Y. Estimating ground-level PM2.5 in China using satellite remote sensing. Environ. Sci. Technol. 2014, 48, 7436–7444.
  36. Jiang, M.; Sun, W.; Yang, G.; Zhang, D. Modelling seasonal GWR of daily PM2.5 with proper auxiliary variables for the Yangtze River Delta. Remote Sens. 2017, 9, 346.
  37. Christakos, G.; Serre, M.L. BME analysis of spatiotemporal particulate matter distributions in North Carolina. Atmos. Environ. 2000, 34, 3393–3406.
  38. Wang, W.; Sun, Y. Penalized local polynomial regression for spatial data. Biometrics 2019, 75, 1179–1190.
  39. Brunsdon, C.; Fotheringham, A.S.; Charlton, M.E. Geographically weighted regression: A method for exploring spatial nonstationarity. Geogr. Anal. 1996, 28, 281–298.
  40. Wu, T.; Zhou, L.; Jiang, G.; Meadows, M.E.; Zhang, J.; Pu, L.; Wu, C.; Xie, X. Modelling spatial heterogeneity in the effects of natural and socioeconomic factors, and their interactions, on atmospheric PM2.5 concentrations in China from 2000–2015. Remote Sens. 2021, 13, 2152.
  41. Mirzaei, M.; Amanollahi, J.; Tzanis, C.G. Evaluation of linear, nonlinear, and hybrid models for predicting PM2.5 based on a GTWR model and MODIS AOD data. Air Qual. Atmos. Health 2019, 12, 1215–1224.
  42. Guo, Y.; Tang, Q.; Gong, D.Y.; Zhang, Z. Estimating ground-level PM2.5 concentrations in Beijing using a satellite-based geographically and temporally weighted regression model. Remote Sens. Environ. 2017, 198, 140–149.
  43. Wu, S.; Du, Z.; Wang, Y.; Lin, T.; Zhang, F.; Liu, R. Modeling spatially anisotropic nonstationary processes in coastal environments based on a directional geographically neural network weighted regression. Sci. Total Environ. 2020, 709, 136097.
  44. Zhai, L.; Li, S.; Zou, B.; Sang, H.; Fang, X.; Xu, S. An improved geographically weighted regression model for PM2.5 concentration estimation in large areas. Atmos. Environ. 2018, 181, 145–154.
  45. Hu, J.; Wang, Y.; Ying, Q.; Zhang, H. Spatial and temporal variability of PM2.5 and PM10 over the North China Plain and the Yangtze River Delta, China. Atmos. Environ. 2014, 95, 598–609.
  46. Fang, C.; Wang, Z.; Xu, G. Spatial-temporal characteristics of PM2.5 in China: A city-level perspective analysis. J. Geogr. Sci. 2016, 26, 1519–1532.
  47. Li, T.; Shen, H.; Yuan, Q.; Zhang, L. A Locally Weighted Neural Network Constrained by Global Training for Remote Sensing Estimation of PM2.5. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
  48. Liu, J.; Chen, W. First satellite-based regional hourly NO2 estimations using a space-time ensemble learning model: A case study for Beijing-Tianjin-Hebei Region, China. Sci. Total Environ. 2022, 820, 153289.
  49. Tan, H.; Chen, Y.; Wilson, J.P.; Zhang, J.; Cao, J.; Chu, T. An eigenvector spatial filtering based spatially varying coefficient model for PM2.5 concentration estimation: A case study in Yangtze River Delta region of China. Atmos. Environ. 2020, 223, 117205.
  50. Getis, A.; Aldstadt, J. Constructing the spatial weights matrix using a local statistic. Adv. Spat. Sci. 2004, 61, 147–163.
  51. Aldstadt, J.; Getis, A. Using AMOEBA to create a spatial weights matrix and identify spatial clusters. Geogr. Anal. 2006, 38, 327–343.
  52. Griffith, D.A.; Paelinck, J.H.P. Non-standard Spatial Statistics and Spatial Econometrics. In Advances in Geographic Information Science; 2011; Volume 1, pp. 1–256. ISBN 9783642160424.
  53. Chun, Y.; Griffith, D.A. A quality assessment of eigenvector spatial filtering based parameter estimates for the normal probability model. Spat. Stat. 2014, 10, 1–11.
  54. Chun, Y.; Griffith, D.A.; Lee, M.; Sinha, P. Eigenvector selection with stepwise regression techniques to construct eigenvector spatial filters. J. Geogr. Syst. 2016, 18, 67–85.
  55. Fan, J.; Gijbels, I. Variable Bandwidth and Local Linear Regression Smoothers. Ann. Stat. 1992, 20, 2008–2036.
  56. Imbens, G.; Kalyanaraman, K. Optimal bandwidth choice for the regression discontinuity estimator. Rev. Econ. Stud. 2012, 79, 933–959.
  57. Tan, H.; Chen, Y.; Wilson, J.P.; Zhou, A.; Chu, T. Self-adaptive bandwidth eigenvector spatial filtering model for estimating PM2.5 concentrations in the Yangtze River Delta region of China. Environ. Sci. Pollut. Res. 2021, 28, 67800–67813.
  58. Wang, J.; Ogawa, S. Effects of meteorological conditions on PM2.5 concentrations in Nagasaki, Japan. Int. J. Environ. Res. Public Health 2015, 12, 9089–9101.
  59. Hu, H.; Hu, Z.; Zhong, K.; Xu, J.; Zhang, F.; Zhao, Y.; Wu, P. Satellite-based high-resolution mapping of ground-level PM2.5 concentrations over East China using a spatiotemporal regression kriging model. Sci. Total Environ. 2019, 672, 479–490.
  60. Yang, Q.; Yuan, Q.; Li, T.; Shen, H.; Zhang, L. The relationships between PM2.5 and meteorological factors in China: Seasonal and regional variations. Int. J. Environ. Res. Public Health 2017, 14, 1510.
  61. Liang, Y.; Fang, L.; Pan, H.; Zhang, K.; Kan, H.; Brook, J.R.; Sun, Q. PM2.5 in Beijing-temporal pattern and its association with influenza. Environ. Health A Glob. Access Sci. Source 2014, 13, 102.
  62. Lin, G.; Fu, J.; Jiang, D.; Wang, J.; Wang, Q.; Dong, D. Spatial variation of the relationship between PM2.5 concentrations and meteorological parameters in China. Biomed Res. Int. 2015, 2015, 684618.
  63. Fang, X.; Zou, B.; Liu, X.; Sternberg, T.; Zhai, L. Satellite-based ground PM2.5 estimation using timely structure adaptive modeling. Remote Sens. Environ. 2016, 186, 152–163.
  64. Wang, Z.; Hu, B.; Huang, B.; Ma, Z.; Biswas, A.; Jiang, Y.; Shi, Z. Predicting annual PM2.5 in mainland China from 2014 to 2020 using multi temporal satellite product: An improved deep learning approach with spatial generalization ability. ISPRS J. Photogramm. Remote Sens. 2022, 187, 141–158.
  65. Pak, U.; Ma, J.; Ryu, U.; Ryom, K.; Juhyok, U.; Pak, K.; Pak, C. Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Sci. Total Environ. 2020, 699, 133561.
Figure 1. The YRD region (a) and the BTH region (b).
Figure 2. The workflow of the construction of the GLR model.
Figure 3. Distribution map of the annual average PM2.5 observations: (a) the YRD region; (b) the BTH region.
Figure 4. Estimated seasonal and annual PM2.5 concentration distribution maps in the YRD region: (a) winter; (b) spring; (c) summer; (d) autumn; (e) annual.
Figure 5. Estimated seasonal and annual PM2.5 concentration distribution maps in the BTH region: (a) winter; (b) spring; (c) summer; (d) autumn; (e) annual.
Figure 6. The estimated monthly PM2.5 concentrations in the YRD and BTH regions.
Table 1. The data sources of remote sensing and meteorological data.
| Factors | Source | Spatial Resolution | Temporal Resolution |
|---|---|---|---|
| AOD | MCD19A2 (https://ladsweb.modaps.eosdis.nasa.gov/search/, accessed on 10 March 2022) | 1 km (BTH); 3 km (YRD) | Daily |
| DEM | Shuttle Radar Topography Mission (SRTM) DEM (http://srtm.csi.cgiar.org, accessed on 9 March 2022) | 90 m | / |
| Meteorological data: TS, PBLH, PS, RH | European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 (https://cds.climate.copernicus.eu, accessed on 9 March 2022) | 0.25° latitude × 0.25° longitude | Hourly |
| NDVI | MOD13A3 (https://ladsweb.modaps.eosdis.nasa.gov/search/, accessed on 10 March 2022) | 1 km | Monthly |
Table 2. Summary statistics of PM2.5 concentrations and all variables in the YRD and BTH regions (Std. Dev: Standard Deviation).
| Factors | YRD Max | YRD Min | YRD Mean | YRD Std. Dev | BTH Max | BTH Min | BTH Mean | BTH Std. Dev |
|---|---|---|---|---|---|---|---|---|
| PM2.5 (μg/m3) | 134.80 | 7.00 | 49.78 | 22.11 | 264.96 | 17.99 | 76.04 | 42.34 |
| AOD (/) | 2.25 | 0.06 | 0.62 | 0.30 | 1.77 | 0.06 | 0.50 | 0.26 |
| NDVI (/) | 0.88 | 0.16 | 0.39 | 0.13 | 0.85 | 0.08 | 0.27 | 0.14 |
| TS (K) | 327.72 | 271.13 | 292.41 | 13.55 | 301.01 | 253.80 | 283.42 | 11.99 |
| RH (/) | 0.89 | 0.48 | 0.70 | 0.08 | 0.84 | 0.29 | 0.53 | 14.08 |
| PS (hPa) | 1029.50 | 955.83 | 1010.78 | 13.04 | 1029.88 | 850.92 | 968.99 | 65.88 |
| PBLH (m) | 896.68 | 135.24 | 399.65 | 143.39 | 1135.44 | 174.19 | 529.17 | 187.07 |
| DEM (m) | 256.00 | 1.00 | 24.57 | 31.33 | 816.00 | 0.00 | 108.46 | 191.56 |
| ROAD (km/km2) | 3.56 | 0.217 | 1.30 | 0.729 | 5.63 | 0.45 | 1.75 | 1.37 |
| FACT (count/km2) | 0.69 | 0.00 | 0.21 | 0.14 | 0.54 | 0.03 | 0.26 | 0.13 |
Table 3. Model performance of the ESF, GWR, and GLR models in both regions at monthly, seasonal, and annual time scales.
| Region | Time | ESF Adj.R2 | ESF AIC | ESF RMSE | GWR Adj.R2 | GWR AIC | GWR RMSE | GLR Adj.R2 | GLR AIC | GLR RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| YRD | Monthly average | 0.574 | 1447.911 | 7.101 | 0.578 | 1404.906 | 5.854 | 0.620 | 1378.899 | 5.401 |
| YRD | Winter | 0.686 | 1495.584 | 7.463 | 0.698 | 1466.581 | 7.155 | 0.703 | 1463.351 | 7.086 |
| YRD | Spring | 0.449 | 1431.397 | 6.356 | 0.548 | 1354.051 | 5.08 | 0.564 | 1345.9 | 4.975 |
| YRD | Summer | 0.382 | 1299.773 | 4.637 | 0.445 | 1240.45 | 3.912 | 0.534 | 1189.853 | 3.309 |
| YRD | Autumn | 0.652 | 1262.926 | 4.777 | 0.709 | 1189.827 | 3.947 | 0.734 | 1173.29 | 3.796 |
| YRD | Annual | 0.687 | 1193.754 | 4.334 | 0.733 | 1174.324 | 3.798 | 0.748 | 1114.99 | 3.427 |
| BTH | Monthly average | 0.799 | 528.971 | 7.733 | 0.797 | 504.516 | 6.907 | 0.853 | 467.312 | 4.885 |
| BTH | Winter | 0.897 | 584.789 | 10.309 | 0.902 | 564.62 | 9.585 | 0.938 | 520.819 | 6.533 |
| BTH | Spring | 0.77 | 513.331 | 5.973 | 0.758 | 496.641 | 5.685 | 0.809 | 475.005 | 4.748 |
| BTH | Summer | 0.711 | 482.871 | 5.231 | 0.745 | 450.135 | 4.389 | 0.786 | 431.469 | 3.705 |
| BTH | Autumn | 0.925 | 501.961 | 5.871 | 0.912 | 492.028 | 6.029 | 0.96 | 416.32 | 3.174 |
| BTH | Annual | 0.926 | 486.544 | 5.362 | 0.943 | 451.56 | 4.437 | 0.959 | 410.66 | 3.138 |
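Adj.R2 penalizes R2 for the number of predictors, which makes models of different complexity comparable. The sketch below gives the standard formula and recomputes the relative Adj.R2 gains of GLR over ESF and GWR from the monthly-average rows of Table 3. For GWR-type models, p is commonly replaced by the effective number of parameters (the trace of the hat matrix); whether the paper does so is not stated here, so that is an assumption, and the helper names are illustrative.

```python
def adjusted_r2(r2: float, n: int, p: float) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept excluded).

    p is a float so that an effective number of parameters can be passed for
    GWR-type models.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def relative_gain_pct(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Monthly-average Adj.R2 values transcribed from Table 3.
monthly_adj_r2 = {
    "YRD": {"ESF": 0.574, "GWR": 0.578, "GLR": 0.620},
    "BTH": {"ESF": 0.799, "GWR": 0.797, "GLR": 0.853},
}
for region, scores in monthly_adj_r2.items():
    for baseline in ("ESF", "GWR"):
        gain = relative_gain_pct(scores["GLR"], scores[baseline])
        print(f"{region}: GLR vs {baseline}: {gain:+.1f}%")
```

Because the inputs are the rounded table entries, the recomputed gains can differ from the abstract's percentages in the last digit (for example, about 7.3% rather than 7.4% for YRD against GWR), possibly because the abstract used unrounded values.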
Table 4. The MCs for residuals of the GLR model at monthly, seasonal, and annual time scales.
| Time | YRD | BTH |
|---|---|---|
| 15_Dec | 0.283 ** | −0.371 ** |
| 16_Jan | / | / |
| 16_Feb | / | −0.298 ** |
| 16_Mar | / | / |
| 16_Apr | / | / |
| 16_May | 0.268 ** | / |
| 16_Jun | / | / |
| 16_Jul | / | −0.244 * |
| 16_Aug | / | / |
| 16_Sep | 0.132 ** | −0.217 * |
| 16_Oct | 0.122 ** | / |
| 16_Nov | / | / |
| Winter | 0.238 ** | −0.239 * |
| Spring | / | / |
| Summer | / | / |
| Autumn | / | / |
| Annual | / | −0.218 * |
/ not significant; * significant at the α = 0.05 level; ** significant at the α = 0.01 level.
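The MC in Table 4 is a global Moran's I of the model residuals. A minimal sketch of the statistic and of a permutation-based pseudo p-value follows; the binary chain weights, the permutation test, and the helper names are assumptions for illustration, since the table does not state which weight matrix or significance test produced the flags. Positive values (as in the significant YRD cases) mean that neighboring residuals are similar, and negative values (as in BTH) mean that they tend to alternate.

```python
import random
from typing import Sequence


def morans_i(values: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = x - mean(x)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(t * t for t in z)


def permutation_p_value(values: Sequence[float], w: Sequence[Sequence[float]],
                        n_perm: int = 999, seed: int = 0) -> float:
    """Two-sided pseudo p-value: share of random relabelings of the locations whose
    statistic lies at least as far from E[I] = -1 / (n - 1) as the observed one."""
    rng = random.Random(seed)
    expected = -1.0 / (len(values) - 1)
    observed = abs(morans_i(values, w) - expected)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, w) - expected) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical residuals at five stations along a line; neighbors are adjacent stations.
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
smooth = [1.0, 2.0, 3.0, 4.0, 5.0]         # similar neighbors -> positive I
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]  # dissimilar neighbors -> negative I
print(round(morans_i(smooth, chain), 6), round(morans_i(alternating, chain), 6))  # 0.5 -1.0
```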
Table 5. The 10-fold CV RMSE (μg/m3) of models at monthly, seasonal, and annual time scales.
| Time | YRD ESF | YRD GWR | YRD GLR | BTH ESF | BTH GWR | BTH GLR |
|---|---|---|---|---|---|---|
| Monthly average | 7.650 | 7.286 | 7.024 | 10.526 | 11.001 | 9.499 |
| Winter | 7.789 | 7.665 | 6.932 | 8.037 | 8.984 | 8.261 |
| Spring | 6.680 | 6.419 | 6.073 | 7.229 | 6.868 | 6.848 |
| Summer | 4.840 | 4.669 | 4.790 | 5.968 | 5.719 | 5.662 |
| Autumn | 5.188 | 4.861 | 4.731 | 11.584 | 12.201 | 10.955 |
| Annual | 4.701 | 4.865 | 4.599 | 5.711 | 5.485 | 5.302 |
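To see at a glance which model attains the lowest CV RMSE in each row of Table 5, the values can be tabulated programmatically. The sketch below is a plain transcription of the table with a hypothetical helper `best_model`; it produces no new results.

```python
from typing import Dict, Tuple

MODELS = ("ESF", "GWR", "GLR")

# 10-fold CV RMSE (μg/m3) transcribed from Table 5, ordered as in MODELS.
CV_RMSE: Dict[str, Dict[str, Tuple[float, float, float]]] = {
    "YRD": {
        "Monthly average": (7.650, 7.286, 7.024),
        "Winter": (7.789, 7.665, 6.932),
        "Spring": (6.680, 6.419, 6.073),
        "Summer": (4.840, 4.669, 4.790),
        "Autumn": (5.188, 4.861, 4.731),
        "Annual": (4.701, 4.865, 4.599),
    },
    "BTH": {
        "Monthly average": (10.526, 11.001, 9.499),
        "Winter": (8.037, 8.984, 8.261),
        "Spring": (7.229, 6.868, 6.848),
        "Summer": (5.968, 5.719, 5.662),
        "Autumn": (11.584, 12.201, 10.955),
        "Annual": (5.711, 5.485, 5.302),
    },
}


def best_model(rmses: Tuple[float, float, float]) -> str:
    """Name of the model with the lowest CV RMSE in one table row."""
    return MODELS[min(range(len(MODELS)), key=lambda k: rmses[k])]


for region, rows in CV_RMSE.items():
    for period, rmses in rows.items():
        print(f"{region} {period:<15} lowest CV RMSE: {best_model(rmses)}")
```

On the transcribed values, GLR has the lowest CV RMSE in 10 of the 12 rows; the exceptions are YRD summer (GWR) and BTH winter (ESF).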
Table 6. Estimated seasonal and annual PM2.5 concentrations in the YRD and BTH regions.
| Time | YRD Max (μg/m3) | YRD Mean (μg/m3) | BTH Max (μg/m3) | BTH Mean (μg/m3) |
|---|---|---|---|---|
| Winter | 96.27 | 66.86 | 153.36 | 94.91 |
| Spring | 74.03 | 45.68 | 76.96 | 61.21 |
| Summer | 48.83 | 26.77 | 83.55 | 48.94 |
| Autumn | 60.82 | 36.08 | 110.34 | 75.68 |
| Annual | 62.08 | 44.85 | 114.74 | 80.21 |
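The exceedance percentages quoted in the Conclusions follow directly from the annual means in Table 6. The short calculation below reproduces them; the 35 μg/m3 threshold is the WHO interim target-1 value stated in the text, and the helper name is illustrative.

```python
WHO_IT1_ANNUAL = 35.0  # μg/m3, WHO interim target-1 as stated in the Conclusions


def exceedance_pct(concentration: float, limit: float = WHO_IT1_ANNUAL) -> float:
    """Percentage by which `concentration` exceeds `limit` (negative if below it)."""
    return 100.0 * (concentration - limit) / limit


annual_mean = {"YRD": 44.85, "BTH": 80.21}  # annual means from Table 6, μg/m3
for region, c in annual_mean.items():
    print(f"{region}: {exceedance_pct(c):.0f}% above IT-1")
```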
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Su, H.; Chen, Y.; Tan, H.; Zhou, A.; Chen, G.; Chen, Y. Estimating Regional PM2.5 Concentrations in China Using a Global-Local Regression Model Considering Global Spatial Autocorrelation and Local Spatial Heterogeneity. Remote Sens. 2022, 14, 4545. https://doi.org/10.3390/rs14184545

AMA Style

Su H, Chen Y, Tan H, Zhou A, Chen G, Chen Y. Estimating Regional PM2.5 Concentrations in China Using a Global-Local Regression Model Considering Global Spatial Autocorrelation and Local Spatial Heterogeneity. Remote Sensing. 2022; 14(18):4545. https://doi.org/10.3390/rs14184545

Chicago/Turabian Style

Su, Heng, Yumin Chen, Huangyuan Tan, Annan Zhou, Guodong Chen, and Yuejun Chen. 2022. "Estimating Regional PM2.5 Concentrations in China Using a Global-Local Regression Model Considering Global Spatial Autocorrelation and Local Spatial Heterogeneity" Remote Sensing 14, no. 18: 4545. https://doi.org/10.3390/rs14184545

APA Style

Su, H., Chen, Y., Tan, H., Zhou, A., Chen, G., & Chen, Y. (2022). Estimating Regional PM2.5 Concentrations in China Using a Global-Local Regression Model Considering Global Spatial Autocorrelation and Local Spatial Heterogeneity. Remote Sensing, 14(18), 4545. https://doi.org/10.3390/rs14184545
