Figure 19 - uploaded by Ayman Yafouz
Predicted vs Actual - LSTM - S3.


Source publication
Article
Full-text available
To accurately predict tropospheric ozone concentration (O3), the performance of a variety of artificial intelligence techniques, such as machine learning, deep learning, and hybrid models, needs to be investigated. This research aims to effectively predict the hourly ozone trend via fewer input variables. This ozone prediction attempt is performed on d...
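
For illustration only, the sketch below shows how a predicted-vs-actual comparison like Figure 19 can be produced with a simple LSTM on hourly data; the window length, layer sizes, and synthetic inputs are assumptions for demonstration, not the article's configuration or data.

```python
# Minimal sketch of an LSTM predicting hourly O3 and the predicted-vs-actual
# comparison shown in figures such as Figure 19. Window length, layer sizes,
# and the synthetic data are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import layers, Sequential

window, n_inputs = 24, 4                     # 24 past hours of 4 predictors
rng = np.random.default_rng(0)
X = rng.random((2000, window, n_inputs)).astype("float32")
y = X[:, -1, 0] * 50 + rng.normal(0, 2, 2000).astype("float32")  # synthetic O3 (ppb)

model = Sequential([
    layers.Input(shape=(window, n_inputs)),
    layers.LSTM(32),
    layers.Dense(1),                          # next-hour ozone concentration
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:1600], y[:1600], epochs=5, batch_size=32, verbose=0)

pred = model.predict(X[1600:], verbose=0).ravel()
plt.scatter(y[1600:], pred, s=5)
plt.xlabel("Actual O3 (ppb)")
plt.ylabel("Predicted O3 (ppb)")
plt.title("Predicted vs Actual - LSTM")
plt.show()
```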

Similar publications

Article
Full-text available
The COVID-19 pandemic has significantly affected economic activities all around the world. Although it claimed a huge number of human lives and increased unemployment, it had a positive effect on the environment. To stop the rapid spread of the disease, most governments imposed strict lockdowns on their citizens, which creates...
Article
Full-text available
Up-to-date and accurate emission inventories for air pollutants are essential for understanding their role in the formation of tropospheric ozone and particulate matter at various temporal scales, for anticipating pollution peaks and for identifying the key drivers that could help mitigate their concentrations. This paper describes the Bayesian var...

Citations

... Generally, these techniques are used to build outstanding prediction models (LeCun et al. 2015). Studies have shown that ML and DL models are more promising than traditional models since they do not rely on the correlations with the independent variables to make a prediction (Yafouz et al. 2021). ...
Article
Full-text available
Nowadays, recycled aggregate concrete (RAC) has been most extensively applied in the construction industry as a sustainable resource to decrease carbon dioxide emissions and construction waste. Predicting the compressive strength (CS) of RAC is crucial to understanding the behavior and performance of this environment-friendly (EF) concrete. This paper developed models for forecasting the CS of RAC materials using hybrid machine learning (ML) models and ML with hyperparameter tuning techniques. The RAC experimental datasets were collected from the research literature, where the datasets were utilized for the 70% training and 30% testing phases of the models. This study used some renowned AI models such as XGBoost (Extreme Gradient-Boosting), GBM (Gradient Boosting Machine), RF (Random Forest), and the hybrid GBM–XGBoost model. The ensemble GBM–XGBoost algorithm showed the highest level of accuracy for CS prediction, with R2 = 0.982 for the training stage and R2 = 0.793 for the testing stage. The evaluation of the statistical indicators of AI algorithms revealed that the ensemble GBM–XBR had a more accurate prediction. The SHapley Additive exPlanations (SHAP) analysis showed that the effective water–cement ratio (We/C), nominal maximum RCA size, and replacement ratio positively correlated with the CS of RAC, which were the most significant parameters. The partial dependence plots (PDP) study displayed the optimal quantity of each parameter, which could help in mix design to achieve a targeted CS. Furthermore, the output of both the SHAP and PDP analyses could assist researchers and the industry in determining the quality of raw ingredients when preparing RAC.
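
As a rough illustration of the boosted-tree-plus-SHAP workflow this abstract describes, the sketch below trains an XGBoost regressor on a synthetic stand-in for the RAC dataset and computes SHAP values; the feature names (e.g. we_c_ratio) and data are hypothetical placeholders, not the authors' code or dataset.

```python
# Minimal sketch of the XGBoost + SHAP workflow described above.
# The feature names and synthetic data are illustrative placeholders.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "we_c_ratio": rng.uniform(0.3, 0.7, n),        # effective water-cement ratio
    "rca_max_size_mm": rng.uniform(10, 32, n),     # nominal maximum RCA size
    "replacement_ratio": rng.uniform(0.0, 1.0, n), # recycled aggregate replacement
})
# Synthetic target standing in for measured compressive strength (MPa)
y = 60 - 40 * X["we_c_ratio"] - 5 * X["replacement_ratio"] + rng.normal(0, 2, n)

# 70/30 split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X_tr, y_tr)
print("train R2:", r2_score(y_tr, model.predict(X_tr)))
print("test  R2:", r2_score(y_te, model.predict(X_te)))

# SHAP values quantify each feature's contribution to the prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te, show=False)
```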
... Recently, researchers have studied a number of deep learning networks. A CNN-LSTM combination model has been applied to predict wind speed, PM2.5, and ozone concentration [16][17][18][19]. Therefore, it is evident that deep learning network models have been applied for simulations and predictions in the atmospheric environment and meteorological fields. ...
Preprint
Full-text available
Considering that ozone is essential to understanding air quality and climate change, this study introduces a deep learning method for predicting atmospheric ozone concentrations. The method combines an attention mechanism with a convolutional neural network (CNN) and long short-term memory (LSTM) to address the nonlinear nature of multivariate time-series data. It employs CNN and LSTM to extract features from short series, enhanced by the attention mechanism for improved short-term prediction accuracy. The model uses eight meteorological and environmental parameters from 16,806 records (2018–2019) as input, selected through principal component analysis (PCA). It features a hybrid attention-CNN-LSTM model with specific settings: a time step of 5, a batch size of 25, 15 units in the LSTM layer, the ReLU activation function, 25 epoch iterations, and an overfitting avoidance strategy at 0.15. Experimental results demonstrate that this hybrid model outperforms independent models and the CNN-LSTM model, especially in forward prediction with a multi-hour time lag. The model exhibits a high prediction determination coefficient (R² = 0.971) and a root mean square error of 3.59 for a 1-hour time lag. It also shows consistent accuracy across different seasons, highlighting its robustness and superior time-series prediction capabilities for ozone concentration.
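
A minimal sketch of an attention-augmented CNN-LSTM along the lines described above, reusing the quoted settings (time step 5, 15 LSTM units, ReLU, batch size 25, 25 epochs, dropout 0.15); the remaining architecture choices and the synthetic data are assumptions, not the preprint's implementation.

```python
# Minimal sketch of an attention-CNN-LSTM for hourly ozone prediction.
# Only the hyperparameters quoted in the abstract are reused; everything
# else (layer arrangement, synthetic data) is an assumption.
import numpy as np
from tensorflow.keras import layers, Model

time_step, n_features = 5, 8   # 8 meteorological/environmental inputs

inputs = layers.Input(shape=(time_step, n_features))
x = layers.Conv1D(32, kernel_size=2, padding="same", activation="relu")(inputs)
x = layers.LSTM(15, return_sequences=True, activation="relu")(x)
x = layers.Dropout(0.15)(x)                       # overfitting-avoidance strategy
attn = layers.Attention()([x, x])                 # self-attention over time steps
x = layers.GlobalAveragePooling1D()(attn)
outputs = layers.Dense(1)(x)                      # next-hour ozone concentration

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# Synthetic data standing in for the 16,806-record dataset
X = np.random.rand(1000, time_step, n_features).astype("float32")
y = np.random.rand(1000, 1).astype("float32")
model.fit(X, y, batch_size=25, epochs=25, validation_split=0.2, verbose=0)
```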
... Conversely, machine learning techniques, such as extreme gradient boosting (XGBoost, e.g., Jumin et al., 2020; Liu et al., 2020), random forest (RF, e.g., Stafoggia et al., 2020; Zhan et al., 2018), and multi-layer perceptron (MLP, e.g., Wang & Lu, 2006), excel in addressing nonlinear challenges. They have the capacity to surpass the constraints associated with deterministic and statistical regression approaches, offering superior accuracy, robust generalization capabilities, and computational efficiency (Yafouz et al., 2021). Nonetheless, it is worth noting that many machine learning models are considered black box models, lacking the ability to elucidate the relationship between input variables and model outputs. ...
... It is imperative to emphasize that deep learning algorithms, specifically Long Short-Term Memory (Hochreiter & Schmidhuber, 1997) and Convolutional Neural Network (LeCun et al., 1989), were deliberately omitted from the scope of this study. These sophisticated algorithms necessitate substantial data volumes and are associated with significant time investments (Yafouz et al., 2021). Furthermore, previous studies have indicated that deep learning algorithms frequently yield comparable or even suboptimal outcomes when confronted with limited data samples, as exemplified by the 2,913 days of data in this study. ...
... Furthermore, previous studies have indicated that deep learning algorithms frequently yield comparable or even suboptimal outcomes when confronted with limited data samples, as exemplified by the 2,913 days of data in this study. Additionally, they exhibit diminished stability and a proclivity to converge toward local optima (Ma et al., 2022; Yafouz et al., 2021). In congruence with the findings presented in the subsequent section of this study, it is evident that basic machine learning algorithms outperformed their deep learning counterparts. ...
Article
Full-text available
Accurate estimation of ozone (O3) concentrations and quantitative meteorological contribution are crucial for effective control of O3 pollution. In recent years, there has been a growing interest in leveraging machine learning for O3 pollution research due to its advantages, such as high accuracy, strong generalization, and ease of use. In this study, we utilized meteorological parameters obtained from European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 data as input and employed five distinct machine learning methods to estimate values of maximum daily 8‐hr average (MDA8) O3 concentrations and analyze meteorological contributions. To improve the accuracy and generalization capabilities of the estimation, we employed GridSearchCV techniques to select optimal parameters and mitigate the risk of overfitting. Additionally, we incorporated meteorological normalization and the SHAP model to quantify the influence of various parameters. Among the models evaluated, the Extreme Gradient Boosting model exhibited exceptional performance from 2015 to 2022, yielding determination coefficients of 0.85 and 0.80 for the training and test data sets, respectively. The outcomes of meteorological normalization revealed that meteorological parameters accounted for 87.7% of the impacts in 2018, while emission‐related factors constituted 80.8% of the impacts in 2021. Over the period spanning 2015–2022, 2 m temperature emerged as the most influential parameter affecting daily MDA8 O3 concentration, with an average contribution of 9.4 μg m⁻³.
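
A hedged sketch of the grid-search-tuned XGBoost workflow the abstract outlines, using scikit-learn's GridSearchCV; the predictor names, parameter grid, and synthetic data are illustrative assumptions rather than the study's configuration.

```python
# Minimal sketch of grid-search-tuned XGBoost for estimating MDA8 O3 from
# reanalysis meteorology. Feature names, grid, and data are illustrative.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "t2m": rng.uniform(260, 310, n),      # 2 m temperature (K)
    "rh": rng.uniform(10, 100, n),        # relative humidity (%)
    "ssrd": rng.uniform(0, 30, n),        # surface solar radiation
    "wind10": rng.uniform(0, 15, n),      # 10 m wind speed
})
y = 0.9 * (X["t2m"] - 260) + 0.5 * X["ssrd"] + rng.normal(0, 5, n)  # synthetic MDA8 O3

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

param_grid = {"n_estimators": [200, 400], "max_depth": [3, 5], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(XGBRegressor(), param_grid, cv=5, scoring="r2")
search.fit(X_tr, y_tr)

print("best params:", search.best_params_)
print("train R2:", search.score(X_tr, y_tr))
print("test  R2:", search.score(X_te, y_te))
```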
... In comparison to both single and hybrid models, the results of DM tests provided robust evidence of the superiority of the proposed model, achieving a 99% confidence level over the comparison models. Such investigations have allowed researchers to understand the relationships between variables contributing to PM2.5 concentrations [32][33][34]. ...
Article
Full-text available
Fine particulate matter (PM2.5) is a significant air pollutant that drives major chronic health problems and premature mortality in big metropolitan areas such as Delhi. In such a context, accurate prediction of PM2.5 concentration is critical for raising public awareness, allowing sensitive populations to plan ahead, and providing governments with information for public health alerts. This study applies a novel hybridization of extreme learning machine (ELM) with a snake optimization algorithm, called the ELM-SO model, to forecast PM2.5 concentrations. The model has been developed on air quality inputs and meteorological parameters. Furthermore, the ELM-SO hybrid model is compared with individual machine learning models, such as Support Vector Regression (SVR), Random Forest (RF), Extreme Learning Machines (ELM), Gradient Boosting Regressor (GBR), XGBoost, and a deep learning model known as Long Short-Term Memory networks (LSTM), in forecasting PM2.5 concentrations. The study results suggested that ELM-SO exhibited the highest level of predictive performance among the compared models, with a testing squared correlation coefficient (R²) of 0.928 and a root mean square error of 30.325 µg/m³. The study's findings suggest that the ELM-SO technique is a valuable tool for accurately forecasting PM2.5 concentrations and could help advance the field of air quality forecasting. By developing state-of-the-art air pollution prediction models that incorporate ELM-SO, it may be possible to better understand and anticipate the effects of air pollution on human health and the environment.
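
For readers unfamiliar with extreme learning machines, the sketch below shows a plain ELM regressor of the kind hybridized with the snake optimizer above; the metaheuristic tuning step is omitted, and the data are synthetic stand-ins for air-quality and meteorological predictors.

```python
# Minimal sketch of an extreme learning machine (ELM) regressor. The snake
# optimization of the hidden-layer weights used in ELM-SO is not shown here;
# this is only the plain ELM with randomly drawn hidden weights.
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random input weights and biases (these are what ELM-SO would tune)
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # hidden-layer activations
        self.beta = np.linalg.pinv(H) @ y           # output weights by least squares
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Synthetic stand-in for predictors of PM2.5
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = X[:, 0] * 3 + np.sin(X[:, 1]) + rng.normal(0, 0.1, 500)

model = ELMRegressor(n_hidden=50).fit(X[:400], y[:400])
rmse = np.sqrt(np.mean((model.predict(X[400:]) - y[400:]) ** 2))
print("test RMSE:", rmse)
```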
... On the macro scale, landscape type usually refers to land cover type, while on the micro scale, it is often defined by the landscape pattern index, which represents its structural composition and spatial configuration. At present, research on the factors affecting GOCs mainly focuses on meteorological conditions [16,17], human activities [18,19] and social development status [20,21]; few studies consider landscape types at different scales. With the development of remote sensing and geographic information systems, the relationship between land cover conditions and GOCs has gradually become a research hotspot. ...
Article
Full-text available
Scientifically configuring landscape patterns based on their relationship with ground-level ozone concentrations (GOCs) is an effective way to prevent and control ground-level ozone pollution. In this paper, a GOC variation trend prediction model (hybrid model) combining a generalized linear model (GLM) and a logistic regression model (LRM) was established to analyze the spatiotemporal variation patterns in GOCs as well as their responses to landscape patterns. The model exhibited satisfactory performance, with a percent of samples correctly predicted (PCP) value of 82.33% and an area under the receiver operating characteristic curve (AUC) value of 0.70. Using the hybrid model, the per-pixel rise probability of annual average GOCs at a spatial resolution of 1 km in Shenzhen was generated. The results showed that (1) annual average GOCs were increasing in Shenzhen from 2015 to 2020 and had obvious spatial differences, with higher values in the west and lower values in the east; (2) the variation trend in GOCs was significantly positively correlated with landscape heterogeneity (HET), while significantly negatively correlated with dominance (DMG) and contagion (CON); (3) GOCs in Shenzhen have a great risk of rising, especially in GuangMing, PingShan, LongGang, LuoHu and BaoAn. The results provide not only a preliminary index for estimating the GOC variation trend in the absence of air quality monitoring data but also guidance for landscape optimization design from the perspective of controlling ground-level ozone pollution.
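
A minimal sketch of the logistic-regression component of such a hybrid model, scored with the same PCP and AUC metrics; the landscape indices below are generated synthetically with the correlation signs reported in the abstract and are not the study's data.

```python
# Minimal sketch: logistic regression predicting whether annual average GOCs
# rise (1) or not (0) from landscape-pattern indices. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
HET = rng.uniform(0, 1, n)   # landscape heterogeneity
DMG = rng.uniform(0, 1, n)   # dominance
CON = rng.uniform(0, 1, n)   # contagion
X = np.column_stack([HET, DMG, CON])
# Rise probability increases with HET and decreases with DMG and CON,
# mirroring the signs reported above (synthetic data).
p = 1 / (1 + np.exp(-(2.0 * HET - 1.5 * DMG - 1.0 * CON)))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)

# PCP = percent of samples correctly predicted; AUC from predicted probabilities
pcp = 100 * accuracy_score(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"PCP = {pcp:.2f}%, AUC = {auc:.2f}")
```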
... In this sense, the creation of ML models to estimate or forecast O3 concentration levels is a topic of major interest, as shown by the large number of existing studies, both from an academic and an applied perspective. Many types of algorithms have been used for this purpose (Yafouz et al., 2021b, 2022; Subramaniam et al., 2022; De Marco et al., 2022; Masih, 2019; Juarez and Petersen, 2021), which undoubtedly influence the performance results of each study. Despite the growing number of papers employing ML methods to estimate or predict ground-level O3, most of them provide average results covering annual study periods. ...
... One of them is the temporal resolution. Two common output formats used in the literature include models with an hourly time step (Do et al., 2023; Juarez and Petersen, 2021; Lu et al., 2021; Yafouz et al., 2021b; AlOmar et al., 2020; Eslami et al., 2020; Sayeed et al., 2020; Su et al., 2020; Alves et al., 2019; Biancofiore et al., 2015), and models with a daily-based indicator, typically represented by the O3,MDA8 (Fan et al., 2022; Gao et al., 2022; Lyu et al., 2022; Meng et al., 2022; Sadeghi et al., 2022; Feng et al., 2019; Pernak et al., 2019; Watson et al., 2019; Freeman et al., 2018; Zhan et al., 2018) and, to a lesser extent, by the maximum daily 1-h concentration (López-Chacón, 2021; Malinović-Milićević et al., 2021) or the daily averaged ozone concentration (Wang et al., 2022a; Mo et al., 2021). ...
... For these approaches, a materialistic model is initially created, and statistical equations are applied to the data. However, these approaches have drawbacks: they offer limited precision since the extreme points (i.e., maximum and minimum pollution levels) are not predictable, they use inefficient approaches for output prediction, they treat old and new data equally, and they rely on complicated mathematical equations (Yafouz et al., 2021b). In contrast, soft computing models have recently been used extensively to solve environmental issues. ...
... The exploration of spatiotemporal interdependencies inherent to air quality attributes in imagery has been incorporated (Elbaz et al., 2023a; Utku et al., 2023). These adaptations serve to optimize the model extraction process, ultimately elevating the generalization capacity of these data-driven models (Dai et al., 2021; Yafouz et al., 2021). It has been established that diverse ensemble models significantly contribute to enhancing predictive accuracy while mitigating the loss of spatial features within these models (Bhat et al., 2023; Elbaz et al., 2023a; Elbaz et al., 2023b; Wang et al., 2023). In contrast, though less powerful in image recognition, tree-based models are simpler, more interpretable, and often used to investigate air pollution problems (Guo et al., 2021; Wei et al., 2019). ...
Article
Full-text available
In the quest to reconcile public perception of air pollution with scientific measurements, our study introduced a pioneering method involving a gradient boost-regression tree model integrating PM2.5 concentration, visibility, and image-based data. Traditional stationary monitoring often falls short of accurately capturing public air quality perceptions, prompting the need for alternative strategies. Leveraging an extensive dataset of over 20,000 public visibility perception evaluations and over 8,000 stationary images, our models effectively quantify diverse air quality perceptions. The predictive prowess of our models was validated by strong performance metrics for perceived visibility (R=0.98, RMSE=0.19), all-day PM2.5 concentrations (R: 0.77-0.78, RMSE: 8.31-9.40), and Central Weather Bureau visibility records (R=0.82, RMSE=9.00). Interestingly, image contrast and light intensity hold greater importance than scenery clarity in the visibility perception model. However, clarity is prioritized in PM2.5 and Central Weather Bureau models. Our research also unveiled spatial limitations in stationary monitoring and outlined the variations in predictive image features between near and far stations. Crucially, all models benefit from the characterization of atmospheric light sources through defogging techniques. The image-based insights highlight the disparity between public perception of air pollution and current policy implementation. In other words, policymakers should shift from solely emphasizing the reduction of PM2.5 levels to also incorporating the public's perception of visibility into their strategies. Our findings have broad implications for air quality evaluation, image mining in specific areas, and formulating air quality management strategies that account for public perception.
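
As a simplified illustration of a gradient-boosted regression tree relating image-derived features to perceived visibility, the sketch below fits such a model on synthetic features (contrast, light intensity, clarity, PM2.5) and prints their importances; the feature definitions and data are assumptions, not the study's pipeline.

```python
# Minimal sketch of a gradient-boosted regression tree for perceived
# visibility from image-derived features. Feature names and data are
# illustrative placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "image_contrast": rng.uniform(0, 1, n),
    "light_intensity": rng.uniform(0, 1, n),
    "scenery_clarity": rng.uniform(0, 1, n),
    "pm25": rng.uniform(5, 80, n),
})
# Synthetic perceived-visibility score driven mainly by contrast and light
y = (3 * X["image_contrast"] + 2 * X["light_intensity"]
     + 0.5 * X["scenery_clarity"] - 0.01 * X["pm25"] + rng.normal(0, 0.1, n))

model = GradientBoostingRegressor(n_estimators=300, max_depth=3).fit(X, y)
for name, imp in zip(X.columns, model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```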
... Complementary to field studies, Chemical Transport Models (CTMs) and data-based machine learning methods [11,12] have been widely used to enhance understanding of ozone formation and transport processes. Despite significant advancements in models, the prediction of ozone, especially during high ozone episodes, remains a challenging problem, particularly in complex coastal regions where local emissions, transport processes, and meteorological conditions interact in complex ways [13][14][15][16]. ...
Article
Full-text available
This study investigates the influence of meteorology initialization on surface ozone prediction in the Great Lakes region using Canada’s operational air quality model (GEM-MACH) at a 2.5 km horizontal resolution. Two different initialization techniques are compared, and it is found that the four-dimensional incremental analysis updating (IAU) method yields improved model performance for surface ozone prediction. The IAU run shows better ozone regression line statistics (y = 0.7x + 14.9, R2 = 0.2) compared to the non-IAU run (y = 0.6x + 23.1, R2 = 0.1), with improved MB and NMB values (3.9 ppb and 8.9%, respectively) compared to the non-IAU run (4.1 ppb and 9.3%). Furthermore, analyzing ozone prediction sensitivity to model initialization time reveals that the 18z initialization leads to enhanced performance, particularly during high ozone exceedance days, with an improved regression slope of 0.9 compared to 0.7 for the 00z and 12z runs. The MB also improves to −0.2 ppb in the 18z run compared to −2.8 ppb and −3.9 ppb for the 00z and 12z runs, respectively. The analysis of meteorological fields reveals that the improved ozone predictions at 18z are linked to a more accurate representation of afternoon wind speed. This improvement enhances the transport of ozone, contributing to the overall improvement in ozone predictions.
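
For reference, the evaluation statistics quoted above (regression slope and intercept, R², mean bias MB, normalized mean bias NMB) can be computed from paired observed and modelled surface-ozone series as in the short sketch below; the formulas are standard, and the data are synthetic placeholders.

```python
# Minimal sketch of computing regression-line statistics, MB, and NMB for
# modelled vs observed surface ozone. Data are synthetic placeholders.
import numpy as np
from scipy import stats

obs = np.random.default_rng(0).uniform(20, 80, 500)                    # observed O3 (ppb)
mod = 0.7 * obs + 14.9 + np.random.default_rng(1).normal(0, 10, 500)   # modelled O3 (ppb)

slope, intercept, r, _, _ = stats.linregress(obs, mod)
mb = np.mean(mod - obs)                       # mean bias (ppb)
nmb = 100 * np.sum(mod - obs) / np.sum(obs)   # normalized mean bias (%)

print(f"y = {slope:.1f}x + {intercept:.1f}, R2 = {r**2:.1f}")
print(f"MB = {mb:.1f} ppb, NMB = {nmb:.1f}%")
```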
... This can result in lower accuracy and reduced performance when compared to more advanced machine learning models (Ehteram et al. 2020, 2018). To overcome these limitations, optimization techniques were introduced (Pham et al. 2020; Yafouz et al. 2021b). ...
Article
Full-text available
Accurate prediction of short-term water demand, especially in the case of extreme weather conditions such as floods, droughts and storms, is crucial information for policy makers to manage the availability of freshwater. This study develops a hybrid model for the prediction of monthly water demand using the database of monthly urban water consumption in Melbourne, Australia. The dataset consisted of minimum, maximum, and mean temperature (°C), evaporation (mm), rainfall (mm), solar radiation (MJ/m²), maximum relative humidity (%), vapor pressure (hPa), and potential evapotranspiration (mm). The dataset was normalized using the natural logarithm and then denoised by employing the discrete wavelet transform. Principal component analysis was used to determine which predictors were most reliable. Hybrid model development included the optimization of ANN coefficients (its weights and biases) using the adaptive guided differential evolution algorithm. The post-optimization ANN model was trained using eleven different learning algorithms. Models were trained several times with different configurations (nodes in hidden layers) to achieve better accuracy. The final optimum learning algorithm was selected based on the performance values (regression; mean absolute, relative and maximum error) and the Taylor diagram.
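
A rough sketch of the idea of tuning a small ANN's weights and biases with differential evolution; SciPy's standard differential_evolution is used here as a stand-in for the adaptive guided variant named in the abstract, and the network size, bounds, and synthetic data are assumptions.

```python
# Minimal sketch: tune a tiny ANN's weights and biases with differential
# evolution (a stand-in for the adaptive guided variant in the study).
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # e.g. scaled temperature, rainfall, evaporation
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.05, 200)  # synthetic demand

n_in, n_hid = 3, 4                    # 3 inputs, 4 hidden units, 1 output

def unpack(theta):
    i = 0
    W1 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_hid]; i += n_hid
    b2 = theta[i]
    return W1, b1, W2, b2

def mse(theta):
    W1, b1, W2, b2 = unpack(theta)
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.mean((pred - y) ** 2)

n_params = n_in * n_hid + n_hid + n_hid + 1
result = differential_evolution(mse, bounds=[(-3, 3)] * n_params,
                                maxiter=100, seed=0, polish=True)
print("optimized MSE:", result.fun)
```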