Probable nexus between Methane and Air Pollution in Bangladesh using Machine Learning and Geographically Weighted Regression Modeling

This paper investigates the probable nexus between methane (CH4) and air pollutants, a public health hazard in Bangladesh. The hypothesis is that the concentration of CH4 depends on ten air pollutants found in five districts of Dhaka Division, a major urban and industrial area of Bangladesh. These pollutants are: Particulate matter (PM2.5), Nitrogen dioxide (NO2), Nitrogen oxides (NOx), Aerosol optical thickness (AOT), Sulfur dioxide (SO2), Carbon monoxide (CO), Ozone (O3), Black carbon (BC), Formaldehyde (HCHO), and Dust. The study applies Machine Learning (ML) techniques and Geographically Weighted Regression (GWR) modeling. Temporal CH4 datasets from the Sentinel-5P sensor are classified to estimate the annual CH4 concentration during 2019-2021. Seven supervised ML classifiers coupled with the GWR model are used to predict the statistical and spatial relationships. CH4 increases gradually during 2019-2021 in Dhaka, Gazipur, and Munshiganj Districts. It relates differently to the various air pollutants: positively with BC, Dust, NO2, PM2.5, O3, and AOT, and negatively with NOx, CO, HCHO, and SO2. The results indicate that Rational Quadratic (RMSE 0.001, MAE 0.001, R2 0.96), Random Forest (RMSE 0.004, MAE 0.003, R2 0.91), and Stepwise regression (RMSE 0.002, MAE 0.002, R2 0.87) are suitable ML methods. The highest goodness-of-fit (R2) of 82%-96% is found in Dhaka and Narsingdi Districts. The key findings may help formulate an appropriate action plan to mitigate ongoing and future air pollution in Bangladesh. In addition, the methodology of this research may be applicable elsewhere, nationally and internationally, for air pollution research.


Introduction
Methane (CH4) is an important greenhouse gas (GHG) that not only contributes to global warming but also reduces air quality (Lattanzio, 2020; Rashid et al., 2020). The global emission rate of CH4 has risen by 10% in the last two decades due to anthropogenic development (Schiermeier, 2020). In Asia, CH4 emissions increased by about 86% from 1970 to 2021 (World Bank, 2021). Bangladesh consumes about 105,14 tons of CH4 and 84.25 tons of carbon dioxide (CO2) annually, resulting in greenhouse gas emissions of 1.16 tons/person/year (Knoema, 2021). CH4 in Bangladesh is mainly emitted from sources including paddy fields, livestock, landfills, leaky natural gas pipelines, and coal stockpiles (Begum et al., 2018; Das et al., 2020; Clark et al., 2021).
An array of research has been conducted on the estimation of CH4 from animals, the health benefits of CH4 mitigation, the effects of CH4 on coal dust, and the impact of CH4 and Black Carbon (BC) on temperature (Anenberg et al., 2012; Haque et al., 2014; Ajrash et al., 2015; Smith et al., 2020). In addition, Yusuf et al. (2016) suggest that the chemical reaction between CH4 and oxygen (O2) would reduce O2 and increase CO2, which in turn increases CH4 in the air by 10% in Indonesia. On the other hand, West et al. (2006) show that a higher level of CH4 reduces the concentration of surface Ozone (O3) in 95 cities in the United States of America (USA).
CH4 contributes to global warming and also affects the local environment and atmospheric health. Therefore, it is essential to understand its spatial concentration and its exposed or latent relationship with air pollution. So far, air pollution has been considered one of the significant environmental and public health issues in Bangladesh (Iqbal et al., 2020; Mo et al., 2020; Siddiqui et al., 2020). However, few studies have examined the relationship between CH4 and air pollution. Therefore, this research investigates the relationship between CH4 and ten air pollutants in Bangladesh using Machine Learning (ML) and Geographically Weighted Regression (GWR) modeling. The specific objectives are (i) to conduct a temporal concentration mapping of CH4 using satellite data during 2019-2021, and (ii) to execute ML and GWR modeling as a methodological development to understand the relationship between CH4 and air pollutants.

Study area
Five districts of Dhaka Division in Bangladesh, namely Dhaka, Narayanganj, Munshiganj, Narsingdi, and Gazipur, are considered for this study (Figure 1).
The study area covers around 7,036 km2 and lies between latitudes 23°20'N-24°20'N and longitudes 90°00'E-91°00'E. The entire area has nearly 15,158,400 inhabitants (BBS, 2011), and the average density is 2,170 people per km2. It has a tropical climate that is sunny and dry in the winter and rainy in the monsoon. The rainy season, which lasts from May to September, accounts for 1,854 mm of rainfall, nearly 80% of the annual average (Rabby et al., 2015). The hot and dry seasons have an average yearly temperature of 25 °C, while monthly mean temperatures range from 18 to 29 °C (BMD, 2015).

Datasets used
This study considered CH4 as the dependent variable and 10 types of air pollutants as the independent variables (Table 1). All 11 spatial datasets were downloaded and extracted from several satellite sensors. The air pollutant data, i.e., Particulate matter (PM2.5), Nitrogen dioxide (NO2), Nitrogen oxides (NOx), Aerosol optical thickness (AOT), Sulfur dioxide (SO2), Carbon monoxide (CO), Ozone (O3), Black carbon (BC), Formaldehyde (HCHO), and Dust, were collected in December 2020 because this period in Bangladesh maintains similar environmental conditions, preserves the highest air pollution signature (Hossain et al., 2019), and supports a data normalization procedure in map prediction (Liang et al., 2020). In addition, three multi-date remotely sensed CH4 datasets (Table 1) were collected from the Sentinel-5P sensor. The dependent and independent variables and their characteristics are listed in Table 1.

Pre-processing
For standardization, all the air pollutant datasets were normalized by resampling into a 1-km resolution map (Joharestani et al., 2019). A total of 194 sample points for each dependent and independent variable were generated randomly for ML and GWR modeling (Figure 1). Further, a Pearson's correlation was computed to derive a heatmap for understanding the statistical relationship between the independent and dependent variables (Watcharavitoon et al., 2013). The pre- and post-processing, statistical analysis, and final map layout for analyzing CH4 data were conducted using ArcGIS 10.8. In addition, JASP 0.14 (JASP, 2021), a free statistical software package, was used to calculate the descriptive statistics and the Pearson's correlation.
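As an illustration of the correlation step, the following minimal Python sketch computes Pearson's correlation coefficient between CH4 and two pollutants at 194 sample points. The study itself used JASP for this calculation; the function name `pearson_r`, the synthetic values, and the chosen pollutants are assumptions made only for this example and do not reproduce the study's data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Synthetic stand-in data (NOT the study's values): 194 sample points, with
# CH4 constructed to rise with BC and fall with CO, plus a little noise.
rng = random.Random(0)
N_POINTS = 194
bc = [rng.uniform(0.0, 1.0) for _ in range(N_POINTS)]
co = [rng.uniform(0.0, 1.0) for _ in range(N_POINTS)]
ch4 = [0.6 + 0.05 * b - 0.03 * c + rng.gauss(0.0, 0.005) for b, c in zip(bc, co)]

# One row of the correlation heatmap: CH4 against each pollutant.
correlations = {"BC": pearson_r(ch4, bc), "CO": pearson_r(ch4, co)}
for name, r in correlations.items():
    print(f"CH4 vs {name}: r = {r:+.3f}")
```

Repeating the same call for every pair of variables would give the full matrix that a heatmap displays.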

Machine Learning (ML) for prediction
ML tools have been used widely as suitable methodological algorithms for predicting the pattern and statistical relationship of air pollution and other atmospheric variables (Hempel et al., 2020; Wang et al., 2020). The typical ML steps, such as data preparation, data exploration, feature selection, training and testing on samples, model evaluation, and model improvement, were followed in this study. Seven supervised classifiers were applied: Stepwise regression, Quadratic Support Vector Machine (SVM), Rational Quadratic Gaussian Process Regression (GPR), Ensemble Bagged Trees, Random Forest, Linear Regression, and Gradient Boosting (Suárez Sánchez et al., 2011; Syafei et al., 2015; Choudhary et al., 2017; Kamińska, 2018). After executing these classifiers, the key results were summarized by comparing the Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and goodness of fit (R2). The classifiers were run in MATLAB using its ML tool.
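The four comparison metrics can be sketched in a few lines of Python. This is only an illustration of the standard definitions, not the MATLAB workflow used in the study; the function name `regression_metrics` and the observed and predicted values below are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        # R2 is undefined when the observed values do not vary.
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Hypothetical observed and predicted CH4 values at five test points.
observed = [0.63, 0.65, 0.67, 0.64, 0.66]
predicted = [0.63, 0.66, 0.66, 0.64, 0.67]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MSE, RMSE, and MAE together with an R2 closer to 1 indicate a better-fitting model, which is the basis on which the classifiers are ranked.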

Spatial Distribution of CH4
The spatial and statistical distributions of CH4 are presented in Figure 2. A gradually increasing trend of CH4 was found in different parts of the study area (Figure 2d). The estimated mean values of CH4 were 0.63 ppb in 2019, 0.65 ppb in 2020, and 0.67 ppb in 2021 (Table 2). The maximum values of CH4 were 0.64 ppb in 2019, 0.67 ppb in 2020, and 0.71 ppb in 2021 (Table 2). The highest concentration of CH4 was observed in Dhaka, Munshiganj, and Gazipur Districts (Figure 2, Table 2). Along with paddy fields (Khan and Saleh, 2015; Begum et al., 2018b), landfills, wetlands, leaky natural gas pipelines, and coal supply contribute to generating CH4 in Bangladesh (Knoema, 2021). Peters et al. (2017) concluded that the annual concentration of CH4 increased significantly in Bangladesh from 2003 to 2015, at an emission rate similar to that reported in previous relevant studies. In the USA, CH4 emissions increase by 2.7 ± 0.5 Tg a-1 due to coalbeds, coal mining, and leaking natural gas (Zhang et al., 2020), similar to Bangladesh. On the other hand, Bachelet and Neue (1993) mentioned that Asian paddy fields release 82 Tg a-1 of CH4. Using inverse modeling and the Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY), Bergamaschi et al. (2009) indicated that anthropogenic sources, paddy fields, wetlands, and biomass burning enhance the concentration of CH4 in South Asia.

Correlations between CH4 and air pollutants
The result of the Pearson's correlation, shown as a heat map of the linear relationships between CH4 and the ten air pollutants, is presented in Figure 3. CH4 correlated positively with six air pollutants (BC, Dust, NO2, PM2.5, O3, and AOT), of which the correlations with BC (0.70), PM2.5 (0.509), O3 (0.491), and AOT (0.677) were statistically significant (Figure 3). On the other hand, it correlated negatively with four pollutants (NOx, CO, HCHO, and SO2), of which the correlations with CO (-0.543) and SO2 (-0.122) were statistically significant (Figure 3).
The study area is located in an urban area where BC is increasing (Begum et al., 2015); thus, BC may contribute significantly to rising CH4. Cofala et al. (2007) found a similar relationship between BC and CH4 in Asian urban and urban residential areas. They estimated that the share of BC emission in residential regions ranges from 60% to 80%. However, measures to control BC and CH4 may improve air quality (Anenberg et al., 2012).
This study also found that CH4 emission increases as CO and NOx decrease (Figure 3). However, Rahman et al. (2019) suggested that CO and NOx emissions are increasing in Dhaka and its surroundings due to traffic and automobile sources, industrial releases, and fossil fuels. These components correlated negatively with CH4 emission. The study area is densely populated, and the concentration of NO2 is increasing in Dhaka City over time (Zahangeer Alam et al., 2018), which has a positive correlation with CH4. Also, CH4 correlates positively with O3 (Figure 3), which agrees well with Alva et al. (2015), who showed that increasing O3 impacts CH4 significantly. However, Mohajan (2012) showed that CH4 mitigation reduces O3 concentration. Similarly, West et al. (2006) and Mohajan (2012) concluded that decreasing CH4 helps to reduce greenhouse and condensation of O3.
This study suggested that Rational Quadratic has the best outcome (Figure 4(b), Table 3). In contrast, Hempel et al. (2020) found the best result using Gradient Boosting in Northern Germany. Wang et al. (2020) reported that a random forest classifier performed best, with the highest accuracy of 73%, in the USA.
The performance of classifiers depends on the distribution and amount of the training data (Joharestani et al., 2019), which this study confirmed by randomly selecting suitable training data sets.
The GWR model yielded an Akaike Information Criterion (AIC) of 41.86, a corrected AIC (AICc) of 70.81, and a Bayesian Information Criterion (BIC) of 191.13. Furthermore, the model achieved a high coefficient of determination, R2 (95%), representing a good prediction of the relationship between air pollutants and CH4 (Figure 5). The minimum and maximum R2 values of the study were 67% and 96%, respectively. The highest goodness-of-fit ranged from 82% to 96% and was found in Dhaka and Narsingdi Districts, covering 25 Upazilas (subdistricts) and 220 unions (local government units in rural areas). Significantly lower R2 values were observed in the middle of Gazipur and the middle to southern parts of Narayanganj Districts, the major industrial sectors (Figure 5). This may be caused by extensive vegetation and good hydrological conditions in this section of the study area. Forests and water bodies have the natural capacity to mitigate emissions (Bachelet and Neue, 1993; Khan and Saleh, 2015). However, in Bangladesh, the emission and concentration of CH4 are increasing (Figure 2d), similar to global CH4 emission, which is projected to be 50% higher in 2030 than in 2000 (Cofala et al., 2007).
Figure 5 - The spatial distribution of R2, derived from the Geographically Weighted Regression (GWR) model in the study area.
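To illustrate how GWR produces a separate coefficient and R2 at each location, the following minimal Python sketch fits a distance-weighted least-squares regression around two target points. It is a simplification under stated assumptions: a single predictor instead of ten, a Gaussian kernel, and a fixed 1.5 km bandwidth, none of which are specified in the source. The function names and the synthetic grid are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian distance-decay kernel; nearer samples get weights closer to 1."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, coords, x, y, bandwidth):
    """Weighted least-squares fit of y = b0 + b1 * x centred on `target`.

    Returns the local intercept, local slope, and local (weighted) R2.
    """
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    ss_res = sum(wi * (yi - (b0 + b1 * xi)) ** 2 for wi, xi, yi in zip(w, x, y))
    ss_tot = sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, y))
    return b0, b1, 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the study's values): a 6 x 6 grid of sample
# points (coordinates in km) where the true slope grows from west to east.
coords = [(float(i), float(j)) for i in range(6) for j in range(6)]
x = [float((7 * i + 3 * j) % 5) for i in range(6) for j in range(6)]
y = [1.0 + (0.5 + 0.1 * easting) * xi for (easting, _), xi in zip(coords, x)]

BANDWIDTH_KM = 1.5  # hypothetical fixed bandwidth, chosen for illustration
fit_west = local_fit((0.0, 2.5), coords, x, y, BANDWIDTH_KM)
fit_east = local_fit((5.0, 2.5), coords, x, y, BANDWIDTH_KM)
print(f"west: slope = {fit_west[1]:.3f}, local R2 = {fit_west[2]:.3f}")
print(f"east: slope = {fit_east[1]:.3f}, local R2 = {fit_east[2]:.3f}")
```

Evaluating `local_fit` at every sample point and mapping the third return value would give a surface comparable in kind to the local R2 map, although a full GWR implementation would also select the bandwidth and handle multiple predictors.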

Conclusions
The annual spatial concentration of CH4 has been increasing in the study area. It has a strong statistical relationship with ten air pollutants, which opens a new avenue for understanding future air pollution and its vulnerability. Along with other variables, CH4 may be a new pollutant contributing to increasing air pollution in Bangladesh. The key results of this paper can be summarized as follows:
• CH4 in Bangladesh's air increases gradually during the study period (2019-2021) in different parts of the study area, especially in Dhaka, Gazipur, and Munshiganj Districts.
• The mean and highest values of CH4 concentration have increased by 3% and 4%, respectively.
• BC, Dust, NO2, PM2.5, O3, and AOT correlate positively with CH4, while NOx, CO, HCHO, and SO2 correlate negatively.
• Rational Quadratic (RMSE 0.001, MAE 0.001, R2 0.96), Random Forest (RMSE 0.004, MAE 0.003, R2 0.91), and Stepwise regression (RMSE 0.002, MAE 0.002, R2 0.87) are found to be suitable methods for predicting CH4 emission.
• The highest goodness-of-fit (R2) ranges from 82% to 96% in Dhaka and Narsingdi Districts.
• Significantly lower R2 values are observed in the middle of Gazipur and the middle/southern part of Narayanganj District.
However, the inclusion of other types of datasets, e.g., high-resolution (both spatially and temporally) meteorological, climatic, atmospheric, and public health impact data, may provide a deeper understanding of the role of CH4 in air pollution and public health hazards. Moreover, the results of this study may help the Government of Bangladesh and its concerned ministries and authorities to formulate a new air pollution strategy and action plan that considers CH4 and its relationship with air pollutants. Furthermore, this model may be replicable elsewhere, nationally and internationally, to understand the exposed or latent connection between CH4 and air pollution for long-term air pollution control and protective measures.