Table 3 - Potential sources of errors during analysis.

Source publication
Article
Full-text available
Model results are only as good as the data fed as input or used for calibration. Data reconciliation for wastewater treatment modeling is a demanding task, and standardized approaches are lacking. This paper suggests a procedure to obtain high-quality data sets for model-based studies. The proposed approach starts with the collection of existing hi...

Context in source publication

Context 1
... the use of control charts is strongly recommended (Montgomery, 2005). Table 3 shows typical sources of errors in the analysis of water samples. ...
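Since control charts are the recommended tool here, a minimal sketch of a Shewhart individuals chart may be useful. It follows the standard textbook formulation (centre line ± 3 sigma, with sigma estimated from the average moving range; Montgomery, 2005); the COD check-standard values are invented for illustration:

```python
import numpy as np

def shewhart_individuals(x, k=3.0):
    """Shewhart individuals chart: flag points outside mean +/- k*sigma,
    with sigma estimated from the average moving range (MRbar / 1.128)."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    mr = np.abs(np.diff(x))          # moving ranges of consecutive points
    sigma = mr.mean() / 1.128        # d2 constant for subgroups of size 2
    ucl, lcl = center + k * sigma, center - k * sigma
    return center, lcl, ucl, (x > ucl) | (x < lcl)

# Hypothetical repeated COD measurements of a 50 mg/L check standard
cod = [50.2, 49.8, 50.5, 50.1, 49.7, 55.9, 50.0]
center, lcl, ucl, flags = shewhart_individuals(cod)
print(f"CL={center:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}  flagged={flags}")
```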

Citations

... 3. Robust: Outliers are common in industrial systems. For example, in wastewater treatment systems, an outlier may be caused by an electrical glitch, an air bubble on a sensor, or a brief spike of contaminant in the influent (Rieger et al. 2010; Newhart et al. 2019). 4. Multivariate: In addition to being robust to univariate outliers, the method should also be robust to multivariate outliers. 5. Adaptive: The optimal window width may not be known in advance and may change over time as the behavior of the process changes. ...
Article
Full-text available
High-frequency, multivariate data collected in real-time and used to control or make decisions regarding a process’ operation often contain some noise and outliers. Thus, a method to extract the signal is needed in order to reduce the number and magnitude of control-based adjustments that are implemented. Such a method must be (i) online, depending only on past and current observations; (ii) fast, producing a smooth value more quickly than the measurement frequency; (iii) robust, ignoring brief bursts of erroneously measured values; (iv) multivariate, ignoring observations that are jointly unusual; (v) adaptive, adjusting to periods of rapid fluctuation in the signal versus periods of stability; and (vi) purely data-driven, not incorporating any information about the process from which the data are collected. Most existing methods are only able to address a subset of these six features. Furthermore, we also require the method to be nonlinear, providing a local nonlinear estimate of the signal. In this work, we propose a novel, real-time signal extraction method based on a local, robust polynomial fit. We demonstrate the performance of our method compared to a state-of-the-art competitor through simulation. For illustration, the methodology is applied to data collected from a reverse osmosis water treatment process.
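As a rough, self-contained illustration of what such an online robust smoother can look like (a sketch only, far simpler than the authors' local robust polynomial fit), the following fits a Theil-Sen line to a trailing window and evaluates it at the current time, so it uses only past and current observations:

```python
import numpy as np

def online_robust_level(y, width=15):
    """At each time t, fit a robust (Theil-Sen) line to the last `width`
    observations and evaluate it at t; only past/current data are used."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for t in range(len(y)):
        lo = max(0, t - width + 1)
        win, ts = y[lo:t + 1], np.arange(lo, t + 1)
        if len(win) < 3:
            out[t] = np.median(win)
            continue
        i, j = np.triu_indices(len(win), k=1)
        b = np.median((win[j] - win[i]) / (ts[j] - ts[i]))  # robust slope
        a = np.median(win - b * ts)                         # robust intercept
        out[t] = a + b * t              # level estimate at the current time
    return out

# Noisy ramp with a brief three-sample sensor glitch
rng = np.random.default_rng(0)
signal = np.linspace(0, 5, 200) + rng.normal(0, 0.2, 200)
signal[100:103] += 8.0
smoothed = online_robust_level(signal)  # glitch is largely ignored
```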
... Human intelligence and intelligent routines are still needed to categorize, structure, homogenize, and convert data into valuable information. In fact, this important step easily demands 40% of the costs in most consulting and data science projects, in the wastewater treatment sector as well as in others [10][11][12]. This cost is largely associated with the need to classify the available data (i.e., to separate data that are adequate and fit for the intended purpose from data that are not) in order to avoid the common Garbage-In Garbage-Out problem, which is now more evident than ever. ...
... This type of descriptive information is known as metadata and is an essential ingredient for converting large volumes of raw data into actionable information. In fact, detailed knowledge of the measurements performed is needed to carry out consistent and creative data analysis, so as to guarantee an impact on operational and design decisions [12]. Unfortunately, no specific guidelines are available in the water sector for the production, selection, prioritization, and management of metadata. ...
Article
Full-text available
ABSTRACT Metadata refers to descriptive information (such as sensor location, unit of measurement, measuring range, calibration date, cleaning date, whether an event such as a rain episode/operational failure/toxic spill occurred …) that is essential for turning the large volumes of raw data currently collected at water treatment facilities into useful information and resources. With the advance of digitalisation in the water sector, it is essential to avoid data graveyards and, conversely, to use the stored data to solve current and future problems. This article focuses on the crucial role that metadata play in responding to future and possibly unpredictable challenges. The aim of this paper is to present the 'metadata challenge' and to highlight the need to take metadata into account when information is collected, as part of good digitalisation practice.
... The filtered data set was then considered suitable for applying data reconciliation. Rieger et al. 10 suggested that an intensive measuring campaign should be started after several weeks (i.e., a duration of 2 to 3 times the solids retention time, SRT) of stable operation, i.e., operation without significant changes in flows, recycles, precipitant dosage, and so on, to obtain reasonable and more reliable data that reflect typical WWTP operating conditions. The data should be collected over a period of about one SRT to maintain consistency. ...
Article
This study deals with the application of data reconciliation to wastewater treatment processes which are subject to dynamic conditions and therefore do not reach a steady-state behaviour sensu stricto. The...
... Data quality is a legitimate concern for the use of big data in modelling. In traditional mechanistic modelling of water resource recovery facilities (WRRFs), data reconciliation can take up a third of the study time in order to prevent faulty conclusions from poor-quality data (Rieger et al. 2010). Many facilities have little or no established data validation or quality assessment methods, allowing unregulated collection of poor-quality data. ...
... Meijer et al. (2002) already pointed out that mass balance and DR techniques provide useful information for process evaluation and for WWTP design and benchmarking. Rieger et al. (2010) proposed a procedure based on DR to obtain high-quality data for WWTP simulation and to detect the typical sources of error. More recently, DR methods have been applied to data obtained from WWTPs (Villez et al. 2013a, b). ...
Article
Data reconciliation and mass balance analysis were conducted for the first time to improve the data obtained from a petrochemical wastewater treatment plant (WWTP), and the results were applied to evaluate the performance of the plant. Daily average values for 209 days from the inlet and outlet of the plant, obtained from the WWTP documentation center, along with the results of four sampling runs in this work, were used for data reconciliation and performance evaluation of the plant. Results showed that the standard deviation and relative errors in the balanced data of each measurement decreased, especially for the process wastewater: from 24.5 to 8.6 % for flow and from 24.5 to 1.5 % for chemical oxygen demand (COD). The errors of measured data were −137 m³/day (−4.41 %) and 281 kg/day (7.92 %) for flow and COD, respectively. According to the balanced data, the removal rates of COD and 5-day biological oxygen demand (BOD5) through the aeration unit were 37 and 46 %, respectively. In addition, the COD and BOD5 concentrations were reduced by about 61.9 % (2137 kg/day) and 78.1 % (1976 kg/day), respectively, prior to the biological process. At the same time, the removal rates of benzene, toluene, and styrene were 56, 38, and 69 %, respectively. The results revealed that about 40 % of influent benzene (75.5 kg/day) is emitted to the ambient air at the overhead of the equalization basin. It can be concluded that the volatilization of organic compounds is the basic mechanism for the removal of volatile organic compounds (VOCs) and that it corresponds to the main part of total COD removal from the WWTP.
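The mechanics of linear data reconciliation can be sketched compactly: measurements are adjusted as little as possible, in a weighted least-squares sense, so that the balances hold exactly. The closed-form solution below is the classic textbook result; the splitter flows and standard deviations are hypothetical, not data from the study:

```python
import numpy as np

def reconcile(x, sd, A):
    """Adjust measurements x (std devs sd) minimally, in the weighted
    least-squares sense, so that the linear balances A @ x_hat = 0 hold:
    x_hat = x - V A^T (A V A^T)^-1 A x, with V = diag(sd**2)."""
    x = np.asarray(x, dtype=float)
    V = np.diag(np.asarray(sd, dtype=float) ** 2)
    r = A @ x                         # balance residuals before adjustment
    x_hat = x - V @ A.T @ np.linalg.solve(A @ V @ A.T, r)
    return x_hat, r

# Hypothetical node: inflow Q1 splits into Q2 and Q3 (Q1 - Q2 - Q3 = 0)
A = np.array([[1.0, -1.0, -1.0]])
x = np.array([1000.0, 620.0, 350.0])  # measured flows, m3/d
sd = np.array([50.0, 20.0, 20.0])     # measurement standard deviations
x_hat, r = reconcile(x, sd, A)
print("residual:", r, "-> reconciled:", x_hat)  # reconciled flows balance
```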
... The generation of data in wastewater treatment plants arises from the need to monitor the quality of the water and the amount of pollutants reduced to meet environmental regulations (Rieger et al., 2010). Over the last decades, the application of data-driven methods, a.k.a. ...
Conference Paper
Full-text available
Mechanistic modelling of phosphorus removal in tertiary treatment through solubility equilibrium models, adsorption models, and chemical precipitation is highly dependent on the wastewater composition of dissolved and particulate water constituents. Moreover, modelling very low levels of phosphorus (< 0.2 mg L-1) through these models is challenging and depends strongly on the integration methods applied for solving them. In this work we propose data-driven methods for modelling low concentrations of phosphorus in tertiary treatment. Three phosphorus species were modelled: total phosphorus (TP), soluble TP (sTP), and soluble reactive phosphorus (sRP). The concentrations of these species fluctuated around 0.1 mg L-1 in the effluent, and the model accounted for the addition of different coagulants (AlCl3 and FeCl3) over the operational period. In a first stage, two data-driven methods were evaluated: convolutional neural networks (CNN) and support vector machines (SVM). Although SVM outperformed CNN (R2 > 0.80), we aimed to achieve higher accuracy, and for this purpose ensemble methods were built. In machine learning, ensemble methods are composed of a group of predictors which usually perform better than a single one: the wisdom of the crowd. In this work, boosting ensemble methods were evaluated. We built a gradient boosting algorithm with support vector machines, or GB-SVM. GB-SVM modelled the phosphorus species with high accuracy.
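A bare-bones sketch of the GB-SVM idea, gradient boosting with SVR base learners under squared-error loss, is given below. It uses scikit-learn's SVR and is an illustration under stated assumptions, not the authors' implementation; X_train and y_train in the usage comment are hypothetical plant records:

```python
import numpy as np
from sklearn.svm import SVR

class GBSVR:
    """Gradient boosting with SVR base learners (squared-error loss):
    each stage fits an SVR to the current residuals and is added to the
    ensemble with a shrinkage factor `lr`."""
    def __init__(self, n_stages=30, lr=0.1, **svr_kwargs):
        self.n_stages, self.lr, self.svr_kwargs = n_stages, lr, svr_kwargs
        self.models, self.init_ = [], 0.0

    def fit(self, X, y):
        self.init_ = float(np.mean(y))          # constant initial prediction
        resid = y - self.init_
        for _ in range(self.n_stages):
            m = SVR(**self.svr_kwargs).fit(X, resid)
            resid = resid - self.lr * m.predict(X)  # residuals shrink each stage
            self.models.append(m)
        return self

    def predict(self, X):
        pred = np.full(len(X), self.init_)
        for m in self.models:
            pred += self.lr * m.predict(X)
        return pred

# Hypothetical usage with plant records X_train (features) and y_train (TP):
# model = GBSVR(n_stages=30, lr=0.1, kernel="rbf", C=10.0).fit(X_train, y_train)
# tp_pred = model.predict(X_test)
```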
... Generation of data in wastewater treatment arose from the need to monitor and control the quality of water and the removal of pollutants in bWWTPs to meet environmental regulations (Hreiz et al., 2015; Rieger et al., 2010). In bWWTPs of different scales (lab, pilot, or full scale), large amounts of data are generated daily. ...
... Over the last decades, the application of data-driven methods to study different bWWTPs has gained great popularity due to their high adaptability and low computational demand in comparison to deterministic models such as the activated sludge models (ASM) (Corominas et al., 2018). The generation of data in WWTPs, resulting from monitoring the quality of the effluent to meet environmental regulations (Rieger et al., 2010), has created a source of databases from which ML can extract novel and valuable knowledge. In supervised ML, a model is provided with examples of data for training. ...
... For example, digitalisation may be based on online sensor water quality data (Blumensaat et al., 2019; Vanrolleghem and Lee, 2003; Yuan et al., 2019). Inaccurate data due to sensor fouling, drift, or lack of maintenance or calibration can lead to erroneous results and consequently fallacious decisions (Rieger et al., 2010). Sewer systems especially are very harsh environments for sensors (Campisano et al., 2013). ...
Article
The ongoing COVID-19 pandemic is, undeniably, a substantial shock to our civilization, one which has revealed the value of public services that relate to public health. Ensuring a safe and reliable water supply and maintaining water sanitation have become ever more critical during the pandemic. For this reason, researchers and practitioners have promptly investigated the impact associated with the spread of SARS-CoV-2 on water treatment processes, focusing specifically on water disinfection. However, the COVID-19 pandemic impacts multiple aspects of the urban water sector besides those related to the engineering processes, including sanitary, economic, and social consequences which can have significant effects in the near future. Furthermore, this outbreak comes at a time when the water sector was already experiencing a fourth revolution, transitioning toward the digitalisation of the sector, which redefines the Water-Human-Data Nexus. In this contribution, a product of collaboration between academics and practitioners from water utilities, we delve into the multiple impacts that the pandemic is currently causing and their possible consequences in the future. We show how the digitalisation of the water sector can provide useful approaches and tools to help address the impact of the pandemic. We expect this discussion to contribute not only to current challenges, but also to the conceptualization of new projects and the broader task of ameliorating climate change.
... One possible set of independent hydraulic balances can be found by computing the fundamental cutsets of the graph. An advantage is that this provides a precise definition for the concept of overlapping balances, which has only been defined loosely so far (see e.g., Rieger et al., 2010; Spindler, 2014; Le et al., 2018), namely: any two cutsets with a shared edge represent a pair of overlapping balances. ...
Article
The advent of affordable computing, low-cost sensor hardware, and high-speed and reliable communications has spurred the installation of ubiquitous sensors in complex engineered systems. However, ensuring reliable data quality remains a challenge. Exploitation of redundancy among sensor signals can help improve the precision of measured variables, detect the presence of gross errors, and identify faulty sensors. The cost of sensor ownership, maintenance efforts in particular, can still be prohibitive, however. Maximizing the ability to assess and control data quality while minimizing the cost of ownership thus requires careful sensor placement. To solve this challenge, we develop a generally applicable method to solve the multi-objective sensor placement problem in systems governed by linear and bilinear balance equations. Importantly, the method computes all Pareto-optimal sensor layouts with conventional computational resources and requires no information about the expected sensor quality.
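The fundamental-cutset construction mentioned in the citing context can be illustrated with networkx: every spanning-tree edge induces one cutset, i.e., one independent hydraulic balance, and two cutsets that share an edge are overlapping balances. A sketch on a made-up plant graph (node names hypothetical, not the authors' code):

```python
import networkx as nx

def fundamental_cutsets(G):
    """Each spanning-tree edge, when removed, splits the tree in two; the
    fundamental cutset is the set of graph edges crossing that split.
    Each cutset is one independent hydraulic balance; two cutsets sharing
    an edge are overlapping balances."""
    T = nx.minimum_spanning_tree(G)
    cutsets = []
    for e in list(T.edges()):
        T.remove_edge(*e)
        side = nx.node_connected_component(T, e[0])
        cutsets.append({frozenset((u, v)) for u, v in G.edges()
                        if (u in side) != (v in side)})
        T.add_edge(*e)
    return cutsets

# Hypothetical plant graph: influent -> splitter -> two lanes -> clarifier
G = nx.Graph([("inf", "split"), ("split", "lane1"), ("split", "lane2"),
              ("lane1", "clar"), ("lane2", "clar"), ("clar", "eff")])
for cs in fundamental_cutsets(G):
    print(sorted(tuple(e) for e in cs))
```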