Figure 5 - available via license: CC BY
Individuals-moving range (IMR, X/MR) chart of hourly NO2 concentrations from the Blanchardstown air quality monitoring station. Software: R [56].


Source publication
Article
Ground-level concentrations of nitrogen oxides (NOx) can act as an indicator of air quality in the urban environment. In cities with relatively good air quality, and where NOx concentrations rarely exceed legal limits, adverse health effects on the population may still occur. Therefore, detecting small deviations in air quality and deriving methods...

Context in source publication

Context 1
... analyse the data using the SPC method, an individuals-moving range (IMR) chart of hourly NO2 concentrations was constructed. Examination of the results shown in Figure 5 reveals that the number of false alarms, i.e., outliers, is significant. This problem is attributable to: ...
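As a rough illustration of the IMR construction discussed in this context, the base-R sketch below computes individuals and moving-range control limits using the standard constants for a moving range of two observations (3/d2 ≈ 2.66 and D4 ≈ 3.267); the `no2` vector is simulated placeholder data, not the Blanchardstown series, and the code in [56] is not reproduced here.

```r
# Sketch of an individuals-moving range (IMR) chart in base R.
# `no2` is a simulated stand-in for the hourly NO2 series (ug/m3).
set.seed(1)
no2 <- rnorm(500, mean = 25, sd = 6)

mr     <- abs(diff(no2))      # moving range of consecutive observations
x_bar  <- mean(no2)
mr_bar <- mean(mr)

# Standard IMR constants for a moving range of two: 3/d2 = 2.66, D4 = 3.267
ucl_x  <- x_bar + 2.66 * mr_bar
lcl_x  <- x_bar - 2.66 * mr_bar
ucl_mr <- 3.267 * mr_bar

alarms <- which(no2 > ucl_x | no2 < lcl_x)   # points outside the individuals limits

op <- par(mfrow = c(2, 1))
plot(no2, type = "l", main = "Individuals (X) chart", ylab = "NO2")
abline(h = c(lcl_x, x_bar, ucl_x), lty = c(2, 1, 2))
points(alarms, no2[alarms], col = "red", pch = 19)
plot(mr, type = "l", main = "Moving range (MR) chart", ylab = "MR")
abline(h = c(mr_bar, ucl_mr), lty = c(1, 2))
par(op)
```

With strongly autocorrelated hourly data, limits of this form tend to flag many individual points, which is consistent with the large number of apparent false alarms noted above.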

Similar publications

Article
In this work, the air pollution index in three cities (Seoul, Busan, and Daegu) in South Korea was studied using multifractal detrended fluctuation analysis (MF-DFA). Hurst, Renyi, and Holder exponents were used to analyze the characteristics of the concentration time series of PM2.5 and NO2. The results showed that multifractality exists in each s...

Citations

... The PV system significantly contributed to reducing carbon emissions, aligning with global efforts to combat climate change. By replacing conventional fossil fuels with solar energy, the project also reduced air pollution, promoting better public health in the rural community [60]. Economically, the project demonstrated viability through a favorable payback period, enhanced by government subsidies. ...
Chapter
This chapter presents a comprehensive analysis of the planning, design, and implementation of photovoltaic (PV) systems, emphasizing their role in sustainable rural electrification and renewable energy integration. The chapter begins by examining the integration of solar energy into the electricity market, highlighting its contribution to energy security and climate change mitigation. It deals with the challenges and dynamics of incorporating distributed energy resources, with a special focus on solar PV systems. The chapter methodically explores the planning and design aspects of PV systems, considering factors like site location, climatic conditions, and grid connectivity. A case study on electrifying a rural community provides practical insights into the application of these principles. This chapter further details the components, specifications, and costs of PV systems, presenting exhaustive tables and guidelines for implementation. It also includes calculations and estimations essential for system balance and optimization, covering environmental, technical, and economic aspects. The chapter concludes with a discussion of lessons learned and provides a comprehensive conclusion, synthesizing the key findings and implications of the study for future renewable energy projects.
... In model construction, fundamental statistics are pivotal in identifying potential outliers within the dataset. An outlier is a data point in a dataset that significantly deviates from other observations, often due to variability in the data or measurement errors [28]. It is essential to meticulously verify key statistical parameters of each variable, encompassing minima, maxima, means, and standard deviations, as depicted in Fig. 8. ...
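A minimal base-R sketch of the screening step described in this excerpt (checking minima, maxima, means, and standard deviations of each variable); the data frame `df` and its columns are hypothetical placeholders, not the cited chapter's dataset.

```r
# Quick per-variable screening of summary statistics before model construction.
# `df` is a placeholder data frame of numeric monitoring variables.
set.seed(1)
df <- data.frame(no2 = rnorm(100, 25, 6), pm10 = rnorm(100, 18, 5))

screen <- t(sapply(df, function(x) c(min  = min(x,  na.rm = TRUE),
                                     max  = max(x,  na.rm = TRUE),
                                     mean = mean(x, na.rm = TRUE),
                                     sd   = sd(x,   na.rm = TRUE))))
print(round(screen, 2))   # values far outside plausible ranges flag potential outliers
```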
Chapter
Adopting an innovative framework, this study transcends traditional weak bus identification, exploring the interplay and causality among buses beyond direct connections. This multidimensional approach enhances system planning and operation by facilitating a comprehensive understanding of system load changes and elucidating the impact of a single-bus load alteration across the entire system. This methodology could underpin optimal renewable technology allocation in diverse contexts, promoting holistic system analysis. The research employed a combination of sensitivity and causality analysis to identify the most critical buses in the system, extending the analysis to the entire system rather than just neighboring buses. A web-based simulator was developed to predict the future values of the system’s most critical bus, “B03”, by considering the influence of other impactful buses under two conditions: their value in a previous time period (t-1) and their steady-state value before the simulation. Furthermore, an optimization process was performed to minimize the load on the critical B03 bus. By optimally distributing the load across the system based on the loadability of the entire system, the load at B03 was reduced from an initial 11.12 kWh to 10.83 kWh. The neural network model, with a lower error rate of 3.85%, was more accurate than the baseline model in predicting the load on bus B03. The optimization process further enhanced the system’s ability to integrate renewable energy sources, contributing to a balanced and resilient power system. The proposed methodology’s superiority has been confirmed through experimental analysis of a sizable dataset from Iowa’s 240-bus power system. An adaptable framework, strengthened by various tools and techniques, can be successfully customized for a wide range of applications. This approach offers a promising pathway for the optimal allocation of renewable technologies, contributing to the development of more sustainable and resilient power systems.
... The importance of multiple modeling approaches was emphasized in practical recommendations for government policy on air quality regulation. Additionally, Torres et al. [2] evaluated several analytical strategies for detecting air pollution events and outliers. The study demonstrated the effectiveness of functional data analysis in detecting patterns and outliers in nitrogen dioxide concentrations, highlighting the limits of traditional approaches as well as the advantages of functional data analysis for comprehensive air pollution control. ...
... As in the study by Wang et al. [8], we also applied a data smoothing technique (using a Fourier basis, as done by Torres et al. [2]) and tested the significance of differences among regions using ANOVA. We also performed outlier detection, since similar outlier analyses were carried out by Torres et al. [2] and by Rigueira et al. [5]. ...
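The smoothing-plus-ANOVA workflow quoted above can be sketched with a plain Fourier regression in base R; the daily series, the number of harmonics, and the two illustrative regions are assumptions for demonstration, not the cited authors' code or data.

```r
# Hedged sketch: Fourier-basis smoothing of a pollutant series, followed by a
# one-way ANOVA comparing (placeholder) smoothed levels across regions.
set.seed(1)
t_idx <- 1:365
y     <- 20 + 5 * sin(2 * pi * t_idx / 365) + rnorm(365, 0, 3)   # placeholder daily NO2

K <- 3                                   # number of Fourier harmonics (annual period assumed)
X <- do.call(cbind, lapply(1:K, function(k)
        cbind(sin(2 * pi * k * t_idx / 365), cos(2 * pi * k * t_idx / 365))))
fit      <- lm(y ~ X)                    # least-squares fit on the Fourier basis
y_smooth <- fitted(fit)

# Stack two illustrative regional curves long-form and test for a region effect.
regions <- data.frame(
  value  = c(y_smooth, y_smooth + rnorm(365, 1, 0.5)),
  region = factor(rep(c("A", "B"), each = 365))
)
summary(aov(value ~ region, data = regions))
```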
Article
In this research paper, a comprehensive analysis of particulate matter (PM10) and nitrogen dioxide (NO2) pollution concentrations in six different Lithuanian regions is presented. The analysis employs data smoothing, principal component analysis (PCA), exploratory data analysis, hypothesis testing, and time series analysis to provide a thorough examination. Functional data analysis approaches were used to find the origins and effects of these air pollutants by revealing their data patterns. The functional data analysis techniques demonstrate their effectiveness in revealing deep links within large datasets, assisting in the control of air quality problems. This research provides valuable insights into air quality challenges in Lithuanian regions. The study, aimed at comparing air quality across different regions, indicates that there are no significant differences in PM10 and NO2 between the two groups. Notably, reliable forecasts for 2023 data are attainable for PM10 in regions such as Vilnius Old Town, Vilnius Lazdynai, Šiauliai, and Klaipėda. For NO2, successful forecasting can be applied to Vilnius Old Town, Vilnius Lazdynai, and Šiauliai.
... The FDA approach was inspired by classical data mining techniques for handling vectorial data. The applications of FDA have also extended to environmental research [14][15][16][17][18][19][20], medical research [21], and the manufacturing sector [22]. This functional model provides two important features: first, the correlation of the data structure with time is taken into account, and second, comparisons are made from a global view of the problem. ...
... Traditional statistical analysis seeks to estimate the empirical frequency distribution, which gives the absolute frequency of occurrence of each of the several possible outcomes of a discrete event [19]. If there is only a finite number of distinct outcomes (the discrete case), and the distribution function is applied to an indefinitely repeated, random and reliable measurement in which every outcome is different, the resulting relative frequency will not be very informative. ...
... The method identifies special-cause variation so as to avoid inhomogeneous subgroups, while the control chart's limits are determined from the variability within each subgroup. If this principle is violated, only subgroups that reproduce the process's common-cause variation should be collected [19]. ...
Article
Air pollution is prevalent throughout the world due to the release of various gases such as NOx, PM, SO2, tropospheric ozone (O3), etc. Ground-level ozone is the predominant component of smog and is the product of the interplay between sunlight and emissions. Adverse health effects on the population may still occur in cities with noticeably clean air, where ozone levels hardly ever exceed safe limits. Detecting small variations in air quality and devising techniques for regulating air contamination is therefore challenging. The study employs various techniques to observe and assess strategies for detecting and eliminating outliers in ozone concentrations during pollution episodes, which helps to describe the sources and exceedance values and enhances the value of the monitoring data. In this study, the data contain some missing observations. Imputation, classical statistical analysis, the statistical process control (SPC) technique, functional data analysis (FDA), and functional process control help to fill in the data and detect outliers, trend deviations, and changes in ground-level ozone concentration. A comparison study is carried out using three techniques: classical analysis, SPC, and FDA. The results show that the statistical process control and functional data methods performed better than the classical technique for the detection of outliers, and indicate how this methodology can enable an additional, comprehensive way of defining air pollution and water pollution control measures.
... By leveraging advanced analytical techniques, we can extract invaluable insights into pollution trends, pinpoint areas of concern, and devise effective strategies to mitigate its impact, thereby promoting sustainable environmental management. This analytical endeavor assumes that facilitating well-informed decision-making processes, policy formulation, and the protection of public health is of utmost importance [1,2]. ...
Article
In this work, we introduce an innovative Markov Chain Monte Carlo (MCMC) classifier, a synergistic combination of Bayesian machine learning and Apache Spark, highlighting the novel use of this methodology in the spectrum of big data management and environmental analysis. By employing a large dataset of air pollutant concentrations in Madrid from 2001 to 2018, we developed a Bayesian Logistic Regression model, capable of accurately classifying the Air Quality Index (AQI) as safe or hazardous. This mathematical formulation adeptly synthesizes prior beliefs and observed data into robust posterior distributions, enabling superior management of overfitting, enhancing the predictive accuracy, and demonstrating a scalable approach for large-scale data processing. Notably, the proposed model achieved a maximum accuracy of 87.91% and an exceptional recall value of 99.58% at a decision threshold of 0.505, reflecting its proficiency in accurately identifying true negatives and mitigating misclassification, even though it slightly underperformed in comparison to the traditional Frequentist Logistic Regression in terms of accuracy and the AUC score. Ultimately, this research underscores the efficacy of Bayesian machine learning for big data management and environmental analysis, while signifying the pivotal role of the first-ever MCMC Classifier and Apache Spark in dealing with the challenges posed by large datasets and high-dimensional data with broader implications not only in sectors such as statistics, mathematics, physics but also in practical, real-world applications.
... In this study, the fraction of super-emitters was determined for each vehicle type using the distribution of box plots (see Fig. 7). Vehicles whose emission factors (EFCO and EFNOx) exceeded the "third quartile (Q3) plus 1.5 times the Inter-Quartile Range (IQR)" (O'Leary et al., 2016; Torres et al., 2020) were classified as super-emitters. The results are shown in Table S1. ...
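The Q3 + 1.5·IQR rule quoted above is straightforward to reproduce; in the base-R sketch below, `ef_co` is a hypothetical vector of per-vehicle CO emission factors, not the study's measurements.

```r
# Flag "super-emitters": vehicles whose emission factor exceeds Q3 + 1.5 * IQR.
set.seed(1)
ef_co <- c(rlnorm(200, meanlog = 1.5, sdlog = 0.4), 60, 75)   # placeholder EF_CO (g/kg)

q3        <- quantile(ef_co, 0.75)
threshold <- q3 + 1.5 * IQR(ef_co)

super_emitters <- which(ef_co > threshold)
length(super_emitters) / length(ef_co)    # fraction of super-emitters in the fleet
```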
Article
Over the years, vehicular emissions have been the main contributor to the deterioration of urban air quality. However, quantification of real-world vehicular emissions is quite limited in low- and middle-income countries like India. Developing real-world vehicle emission factors (EFs) using reference-grade instruments requires a significant amount of resources. This study aims to develop the individual and fleet vehicle EFs and the fraction of high-emitting vehicles using high-time-resolution, low-cost sensors from near-road measurements – a first-of-its-kind study in India. Traffic and air pollutant measurements were conducted at the kerbside of a street canyon in Mumbai. The individual vehicle fuel-based EFCO and EFNOx were estimated using the plume identification technique coupled with the information obtained from the vehicle registration number plates. The fleet mean (±SD) EFCO, EFNO, and EFNO2 were 6.30 (±3.16), 1.38 (±1.17), and 0.43 (±0.32) g/kg, respectively, while EFPM1, EFPM2.5, EFPM2.5-10, and EFPM10 were 0.70 (±0.34), 1.19 (±0.57), 0.90 (±0.65), and 2.09 (±1.05) g/kg, respectively. The developed individual vehicle EFCO and EFNOx varied greatly within each vehicle type due to differences in emission control technology, engine size, and the prevalence of “super-emitters”. There was no substantial difference in EFCO and EFNOx among the different BS emission standards across almost all vehicle types. The reconstructed fleet EFCO and EFNOx from the developed single-vehicle EFs were 1.4 and 1.9 times higher than the recorded fleet EFs. Approximately 14% of vehicles in the fleet were identified as super-emitters, responsible for 37–54% of total emissions, primarily from private passenger vehicles such as cars and two-wheelers. The EFCO and EFNOx from these high-emitters were 3–30 times greater than the laboratory-reported emissions. Our study suggests that improving emission standards alone is not enough to decrease tailpipe emissions from vehicles. Proper vehicle inspection and maintenance programs are crucial in controlling these emissions.
... Bayesian machine learning models are statistical methods based on uncertainty quantification focused, in this case, on behavioral analysis and inference in complex multivariable systems [17][18][19] such as freshwater systems. On the other hand, functional data analysis [20] is a branch of statistics implemented, in this case, for treating continuous data series and detecting anomalies in different variables [21][22][23][24][25][26][27]. ...
... , x_n. Therefore, the concept of depth makes it possible to work with observations, defined in a given time interval, in the form of curves, instead of having to summarize the information contained in these curves into a single value, such as the mean [25]. In this case, the Modified Band Depth (MBD) [45] has been selected, as it has demonstrated better performance in the analysis of environmental data with this approach [46]. ...
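As a hedged illustration of the Modified Band Depth mentioned in this excerpt, the base-R sketch below computes MBD with bands formed by pairs of curves (the usual J = 2 case); it is a didactic implementation on simulated curves, not the routine used in the cited work.

```r
# Didactic Modified Band Depth (MBD, J = 2): for each curve, the average
# proportion of time points at which it lies inside the band spanned by
# each pair of sample curves. Low depth marks candidate outlying curves.
mbd <- function(curves) {                  # curves: n x T matrix (rows = curves)
  n     <- nrow(curves)
  pairs <- combn(n, 2)
  sapply(1:n, function(i) {
    mean(apply(pairs, 2, function(p) {
      lo <- pmin(curves[p[1], ], curves[p[2], ])
      hi <- pmax(curves[p[1], ], curves[p[2], ])
      mean(curves[i, ] >= lo & curves[i, ] <= hi)
    }))
  })
}

set.seed(1)
curves <- matrix(rnorm(20 * 50), nrow = 20)   # 20 placeholder curves, 50 time points
order(mbd(curves))[1:3]                       # indices of the three least-deep curves
```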
Article
Acid mine drainage events have a negative influence on the water quality of fluvial systems affected by coal mining activities. This research focuses on the analysis of these events, revealing hidden correlations among potential factors that contribute to the occurrence of atypical measures and ultimately proposing the basis of an analytical tool capable of automatically capturing the overall behavior of the fluvial system. For this purpose, the hydrological and water quality data collected by an automated station located in a coal mining region in the NW of Spain (Fabero) were analyzed with advanced mathematical methods: statistical Bayesian machine learning (BML) and functional data analysis (FDA). The Bayesian analysis describes a structure fully dedicated to explaining the behavior of the fluvial system and the characterization of the pH, delving into its statistical association with the rest of the variables in the model. FDA allows the definition of several time-dependent correlations between the functional outliers of different variables, namely, the inverse relationship between pH, rainfall, and flow. The results demonstrate that an analytical tool structured around a Bayesian model and functional analysis automatically captures different patterns of the pH in the fluvial system and identifies the underlying anomalies.
... The desire to interpret many curvilinear data together has increased the requirements for functional data analysis methods, and the development of these methods aims to reveal the underlying structures of functional data, called curves or surfaces. With the development of functional equivalents of multivariate statistical analysis techniques, functional data analysis (FDA) techniques have found a wide range of applications, such as financial data (Wang et al., 2021), medical data (Ullah and Finch, 2013), climate change (Ghumman et al., 2020), air quality (Martinez Torres et al., 2020), management science (Dass and Shropshire, 2012), pandemics (Tang et al., 2020), etc. A detailed review of applications of functional data analysis can be found in Ullah and Finch (2013). ...
Article
The coefficient of variation function is a useful descriptive statistic, especially when comparing the variability of more than two groups of curves, even when they have significantly different mean curves. Since the coefficient of variation function is the ratio of the standard deviation function to the mean function, its particular property is that it shows the acceleration more explicitly than the standard deviation function. The aim of the study is twofold: to show that the functional coefficient of variation is more sensitive to abrupt changes than the functional standard deviation and to propose the utilisation of the functional coefficient of variation as an outlier detection tool. Several simulation trials have shown that the coefficient of variation function allows the effects of outliers to be seen explicitly.
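A hedged base-R sketch of the pointwise coefficient-of-variation function this abstract describes; the simulated curve matrix, the injected anomaly, and the flagging threshold are illustrative assumptions.

```r
# Pointwise coefficient of variation across a sample of curves:
# cv(t) = sd(t) / mean(t). Sharp spikes in cv(t) point to abrupt changes/outliers.
set.seed(1)
curves <- matrix(rnorm(30 * 100, mean = 20, sd = 2), nrow = 30)   # 30 placeholder curves
curves[5, 60:65] <- curves[5, 60:65] + 15                         # injected anomaly

mean_f <- colMeans(curves)
sd_f   <- apply(curves, 2, sd)
cv_f   <- sd_f / mean_f

plot(cv_f, type = "l", xlab = "t", ylab = "CV(t)",
     main = "Functional coefficient of variation")
abline(h = median(cv_f) + 3 * mad(cv_f), lty = 2)   # illustrative flagging threshold
```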
... In the environmental pollution framework, an unusually high concentration of air pollutants, formally known as an anomaly, may cause problems in the air quality index. Martínez et al. (2014), Sancho et al. (2014), and Torres et al. (2020) implemented a model relying on functional analysis to identify outlier samples, with the overall goal of achieving a better air quality monitoring solution. In another related paper, Shaadan et al. (2015) conducted a study to detect anomalies in daily PM10 functional data, investigate behaviour patterns, and identify potential factors determining PM10 abnormalities at three selected air quality monitoring stations. ...
Article
The application of spatiotemporal functional analysis techniques in environmental pollution research remains limited. As a result, this paper suggests spatiotemporal functional data clustering and visualization tools for identifying temporal dynamic patterns and spatial dependence of multiple air pollutants. The study uses concentrations of four major pollutants, namely particulate matter (PM2.5), ground-level ozone (O3), carbon monoxide (CO), and sulfur oxides (SO2), measured over 37 cities in Yemen from 1980 to 2022. The proposed tools include Fourier transformation, B-spline functions, and generalized cross-validation for data smoothing, as well as static and dynamic visualization methods. Innovatively, a functional mixture model was used to capture the underlying dynamic patterns of spatiotemporal air pollutant concentrations. According to the results, CO levels increased 25% from 1990 to 1996, peaking in the cities of Taiz, Sana’a, and Ibb before decreasing. Also, PM2.5 pollution reached a peak in 2018, increasing 30%, with severe concentrations in Hodeidah, Marib, and Mocha. Moreover, O3 pollution fluctuated, peaking in 2014–2015 with a 2% increase and a level of 265 Dobson units. Besides, SO2 pollution rose from 1997 to 2010, reaching a peak before stabilizing. Thus, these findings provide insights into the structure of the spatiotemporal air pollutant cycle and can assist policymakers in identifying sources and suggesting measures to reduce them. As a result, the study’s findings are promising and may guide future research on predicting multivariate air pollution statistics over the analyzed area.
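The GCV-tuned smoothing step described above has a close analogue in base R's smooth.spline, which selects its smoothing parameter by generalized cross-validation by default; the yearly series below is simulated, not the Yemen dataset.

```r
# Smooth a pollutant series with a cubic smoothing spline whose smoothing
# parameter is chosen by generalized cross-validation (GCV).
set.seed(1)
years <- 1980:2022
pm25  <- 30 + 0.3 * (years - 1980) + rnorm(length(years), 0, 4)   # placeholder series

fit <- smooth.spline(years, pm25)        # cv = FALSE (default) -> GCV criterion
plot(years, pm25, pch = 19, ylab = "PM2.5")
lines(predict(fit, years), col = "blue", lwd = 2)
```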
... The seminal works of Ramsay and Silverman [25] and Ferraty and Vieu [26], among others, have significantly helped to popularize FDA for solving problems in many domains beyond statistics. In this way, FDA is currently applied in domains ranging from material science, chemometrics [16,28,29], and engineering [20,21,30,31] to geosciences [32], medicine [33,34], genetics [35], and environmental sciences [20,36–38], among others. ...
Article
The present work develops a methodology for the detection of outliers in functional data, taking into account both their shape and magnitude. Specifically, the multivariate method of anomaly detection called Local Correlation Integral (LOCI) has been extended and adapted to be applied to the particular case of functional data, using the calculation of distances in Hilbert spaces. This methodology has been validated with a simulation study and its application to real data. The simulation study has taken into account scenarios with functional data or curves with different degrees of dependence, as is usual in cases of continuously monitored data versus time. The results of the simulation study show that the functional approach of the LOCI method performs well in scenarios with inter-curve dependence, especially when the outliers are due to the magnitude of the curves. These results are supported by applying the present procedure to the meteorological database of the Alternative Energy and Environment Group in Ecuador, specifically to the humidity curves, presenting better performance than other competitive methods.
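The functional LOCI extension described in this abstract rests on distances between curves in a Hilbert space; the base-R sketch below only approximates pairwise L2 distances by trapezoidal integration on simulated curves and does not implement the full LOCI algorithm from the article.

```r
# Approximate L2 (Hilbert-space) distances between curves via the trapezoid
# rule; a distance matrix like this is the input to neighbourhood-based
# detectors such as a functional LOCI.
l2_dist <- function(f, g, t) {
  d2 <- (f - g)^2
  sqrt(sum(diff(t) * (head(d2, -1) + tail(d2, -1)) / 2))
}

set.seed(1)
t_grid <- seq(0, 1, length.out = 100)
curves <- t(replicate(15, 60 + 10 * sin(2 * pi * t_grid) + rnorm(100, 0, 2)))  # 15 placeholder curves

D <- outer(1:nrow(curves), 1:nrow(curves),
           Vectorize(function(i, j) l2_dist(curves[i, ], curves[j, ], t_grid)))
round(D[1:5, 1:5], 2)    # pairwise curve distances (first few rows/columns)
```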