FIGURE 11 - uploaded by Darong Liu
The R² and RMSE of 21 years of rolling predictions.


Source publication
Article
Full-text available
Over the past few decades, floods have severely damaged production and daily life, causing enormous economic losses. Streamflow forecasts prepare us to fight floods ahead of time and mitigate the disasters arising from them. Streamflow forecasting demands a high-capacity model that can make precise long-term predictions. Traditional physics-based h...

Citations

... Furthermore, Liu et al. (2022) developed a Transformer-based model for monthly streamflow prediction on the Yangtze River, demonstrating its ability to incorporate both historical water levels and the influence of ENSO patterns. Similarly, Castangia et al. (2023) applied a Transformer model for predicting daily water levels within a river network, with a focus on capturing upstream hydrological signals. ...
... For this purpose, this study employs four key metrics that are widely recognized in the field of hydrological modeling and streamflow forecasting: Nash-Sutcliffe Efficiency (NSE), Kling-Gupta Efficiency (KGE), Pearson's r and Normalized Root Mean Square Error (NRMSE). These metrics are chosen for their proven interpretability and comprehensive ability to assess various facets of model performance, as supported by previous research (Kratzert et al., 2018;Xiang and Demir, 2021;Liu et al., 2022). ...
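The four metrics named in the snippets above have standard closed-form definitions. The sketch below is an illustrative NumPy implementation, not the cited studies' exact code; note that NRMSE is normalized here by the observed range, which is only one of several conventions in use.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta Efficiency (Gupta et al., 2009 formulation)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def nrmse(obs, sim):
    """RMSE normalized by the observed range (conventions vary)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return rmse / (obs.max() - obs.min())

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
print(nse(obs, sim), kge(obs, sim), nrmse(obs, sim))  # all near their ideal values for a good fit
```

Pearson's r is simply `np.corrcoef(obs, sim)[0, 1]`, the same quantity reused inside KGE.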
Preprint
Full-text available
This study explores the efficacy of a Transformer model for 120-hour streamflow prediction across 125 diverse locations in Iowa, US. Utilizing data from the preceding 72 hours, including precipitation, evapotranspiration, and discharge values, we developed a generalized model to predict future streamflow. Our approach contrasts with traditional methods that typically rely on location-specific models. We benchmarked the Transformer model's performance against three deep learning models (LSTM, GRU, and Seq2Seq) and the Persistence approach, employing Nash-Sutcliffe Efficiency (NSE), Kling-Gupta Efficiency (KGE), Pearson's r, and Normalized Root Mean Square Error (NRMSE) as metrics. The study reveals the Transformer model's superior performance, maintaining higher median NSE and KGE scores and exhibiting the lowest NRMSE values. This indicates its capability to accurately simulate and predict streamflow, adapting effectively to varying hydrological conditions and geographical variances. Our findings underscore the Transformer model's potential as an advanced tool in hydrological modeling, offering significant improvements over traditional and contemporary approaches.
... Zhu et al. [16] proposed a multiscale domain-adaptive method based on Transformer-CNN for fault diagnosis when data are scarce. Liu et al. [3, 23] proposed a dual-encoder model based on Transformers to predict the monthly runoff of the Yangtze River. Since both the encoder and the decoder of the Transformer use a self-attention mechanism, the model incurs high computational and space complexity. ...
Article
Full-text available
Pipeline leakage detection is an integral part of pipeline integrity management. Combining AE (Acoustic Emission) with deep learning is currently the most commonly used method for pipeline leakage detection. However, this approach is usually applicable only to specific situations and requires powerful signal-analysis and computational capabilities. To address these issues, this paper proposes an improved Transformer network model for diagnosing faults associated with abnormal working conditions in acoustic emission pipelines. First, the method uses the temporal properties of the GRU and the positional encoding of the Transformer to capture and extract features from the positional information of the data-point sequence, suppressing redundant information, and introduces a max-pooling layer into the Transformer model to alleviate overfitting. Second, while retaining the original attention learning mechanism and identity path of the original DRSN, a new soft-threshold function is introduced to replace the ReLU activation function, and a new soft-threshold module and adaptive slope module are designed to construct the improved residual shrinkage unit (ASB-STRSBU), which is used to adaptively set the optimal threshold. Finally, pipeline leakage is classified. The experimental results show that the NDRSN model is able to make full use of global and local information when considering leakage signals and can automatically learn the important parameters of the input features in the spatial and channel domains. By optimizing the GRU-improved Transformer network recognition model, the method significantly reduces model training time and computational resource consumption while maintaining high leakage-recognition accuracy. The average accuracy reached 93.97%. This indicates that the method is robust for acoustic emission pipeline leakage detection.
... Transformers are a type of deep-learning neural network based on the attention mechanism introduced by Vaswani et al. (2017). In the case of time series, which is the scope of our data, this mechanism allows the algorithm to "learn" the temporal dependence of the data while enabling parallel processing, an advantage over Recurrent Neural Networks (RNNs) (Vaswani et al., 2017; Liu, Liu, and Mu, 2022; Saoud, Al-Marzouqi, and Hussein, 2022; Lapeyrolerie and Boettiger, 2022). ...
... In the case of RNNs, processing is done sequentially, making them slow and susceptible to memory-shortage problems, especially on massive datasets (Liu, Liu, and Mu, 2022). Although LSTM networks, a type of RNN, solved the memory problem (Agung et al., 2022), processing still proceeds sequentially. ...
... Finally, we add the weighted value vectors together, producing the self-attention output, a vector that is sent to the feed-forward neural network layer. Equation 1 describes the attention calculation formula (James et al., 2021; Yi et al., 2021; Liu, Liu, and Mu, 2022; Agung et al., 2022; Alammar, 2018). ...
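The attention calculation the snippet refers to is the scaled dot-product attention of Vaswani et al. (2017): Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch with toy shapes, omitting batching, masking, and multiple heads:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise query-key similarity
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted sum of value vectors

# Toy example: 3 time steps, d_k = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Each output row is a convex combination of the value vectors, weighted by how strongly the corresponding query matches each key.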
Preprint
Full-text available
Solar flares are violent and sudden eruptions that occur in the solar atmosphere and release energy in the form of radiation. They can affect technological systems on Earth and in its orbit, causing financial losses and damage to human life. Therefore, it is necessary to predict the occurrence of such flares to mitigate their effects. Specialized instruments gather data for solar activity monitoring. Hence, we can create prediction models using machine learning from this data. From an analysis of the literature, we noticed the prevalence of some algorithms, such as Multi-layer Perceptrons (MLP), Support Vector Machines (SVM), and Long Short-Term Memory (LSTM), which presented good results, mainly considering the True Skill Statistic (TSS) metric. In parallel, in 2017, a new deep-learning based neural network architecture called Transformers emerged. Researchers initially created it for natural language processing. However, Transformers were successfully employed in other domains, such as time series forecasting. Solar activity data is considered a time series due to its continuous capture over time. Consequently, we can employ Transformers to develop a solar flare forecast model. Considering a significant lack of work using Transformers for solar flare forecasting, we ran experiments to test the Transformers' viability and performance in solar flare forecast models. We created models using other algorithms (MLP, SVM, LSTM, Transformers) to investigate the Transformers' performance and compared them using accuracy, TSS, and Area Under the ROC Curve (AUC) metrics. We observed that the Transformers had superior performance compared to the other models. For instance, the Transformers' TSS metric average was 0.9, contrasting the other models' TSS = 0.4. The difference was slightly smaller in AUC, where Transformers reached 0.9, and the others reached no more than 0.7. 
Therefore, we can use Transformers to classify solar flare data and obtain superior results compared to other models. We also conducted experiments using different forms of data balancing, including unbalanced data and data balanced with undersampling, oversampling, and SMOTE techniques. The MLP, SVM, and LSTM models showed significant improvements with balancing, with the average TSS increasing from 0.1 to 0.4. On the other hand, Transformers were not sensitive to data balancing, presenting the most stable TSS in all cases.
... In order to better realize time-series prediction across different tasks, scholars in related fields have improved the Transformer in many ways. For example, the MCST-Transformer (multi-channel spatio-temporal Transformer) [18] is used to predict traffic flow; the XGB-Transformer (gradient-boosting decision tree Transformer) model [19] has been used for power-load prediction, addressing the Transformer's insensitivity to local information in time-series prediction tasks [20]; and a Transformer-based dual-encoder model is used to predict the monthly runoff of the Yangtze River [21]. ...
Article
Full-text available
Flood forecasting helps anticipate floods and evacuate people. However, with the deployment of large numbers of data-acquisition devices, the explosive growth of multidimensional data, and increasingly demanding prediction accuracy, classical parametric models and traditional machine-learning algorithms are unable to meet the high-efficiency and high-precision requirements of prediction tasks. In recent years, deep-learning algorithms represented by convolutional neural networks, recurrent neural networks, and Informer models have achieved fruitful results in time-series prediction tasks. Here, the Informer model is used to predict the flood flow of a reservoir. The prediction results are compared with those of a traditional method and an LSTM model, and how to apply the Informer model to flood prediction to improve accuracy is studied. Data from 28 floods in the Wan'an Reservoir control basin from May 2014 to June 2020 were used, with areal rainfall in five subzones and outflow from two reservoirs as inputs and flood processes of different sequence lengths as outputs. The results show that the Informer model has good accuracy and applicability in flood forecasting. For flood forecasting with sequence lengths of 4, 5, and 6, Informer has higher prediction accuracy than the other models under the same sequence length, although accuracy declines somewhat as sequence length increases. The Informer model also predicts the flood peak more stably, with the smallest average flood-peak difference and average maximum flood-peak difference. As sequence length increases, the number of events with a maximum flood-peak difference of less than 15% increases, and the maximum flood-peak difference decreases.
Therefore, the Informer model can serve as an effective flood-forecasting method, providing a new approach and a scientific basis for decision-making in reservoir flood control.
... Xu et al. (2022) proposed a Transformer-based Generative Adversarial Network (GAN) for anomaly detection in time series. Liu et al. (2022) found that a double-encoder transformer model outperformed others in predicting the Yangtze River's stream flow for flood control. Nandi et al. (2022) utilized the ALTF Net for long-term temperature forecasting, and Hu and Xiao (2022) employed a self-attention-based RNN for extracting more information from time series data. ...
Article
Full-text available
In the realm of Earth systems modelling, the forecasting of rainfall holds crucial significance. The accurate prediction of monthly rainfall in India is paramount due to its pivotal role in determining the country’s agricultural productivity. Due to this phenomenon's highly nonlinear dynamic nature, linear models are deemed inadequate. Parametric non-linear models also face limitations due to stringent assumptions. Consequently, there has been a notable surge in the adoption of machine learning approaches in recent times, owing to their data-driven nature. However, it is acknowledged that machine learning algorithms lack automatic feature extraction capabilities. This limitation has propelled the popularity of deep learning models, particularly in the domain of rainfall forecasting. Nevertheless, conventional deep learning architectures typically engage in the sequential processing of input data, a task that can prove challenging and time-consuming, especially when dealing with lengthy sequences. To address this concern, the present article proposes a rainfall modelling algorithm founded on a transformer-based deep learning architecture. The primary distinguishing feature of this approach lies in its capacity to parallelize sequential input data through an attention mechanism. This attribute facilitates expedited processing and training of larger datasets. The predictive performance of the transformer-based architecture was assessed using monthly rainfall data spanning 41 years, from 1980 to 2021, in India. Comparative evaluations were conducted with conventional recurrent neural networks, long short-term memory, and gated recurrent unit architectures. Experimental findings reveal that the transformer architecture outperforms other conventional deep learning architectures based on root mean square error and mean absolute percentage error. Furthermore, the accuracy of each architecture's predictions underwent testing using the Diebold–Mariano test. 
The conclusive findings highlight the discernible and noteworthy advantages of the transformer-based architecture in comparison to the sequential-based architectures.
... Furthermore, due to their computational intensity and high parameter counts, traditional physically-based hydrological models require substantial computing resources, leading to significant computational costs (Mosavi et al., 2018;Sharma and Machiwal, 2021;Liu et al., 2022;Castangia et al., 2023). As a result, recent research (Yaseen et al., 2015) has explored alternative approaches to streamflow forecasting, indicating that machine learning, especially deep learning models, can serve as viable alternatives and often outperform physically-based models in terms of accuracy. ...
... Despite attention from other fields, there is a limited number of studies that focus on the performance and usage of transformers in streamflow forecasting. Liu et al. (2022) introduced a Transformer neural network model for monthly streamflow prediction of the Yangtze River in China. Their approach utilized historical water levels and incorporated the El Niño-Southern Oscillation (ENSO) as additional input features. ...
... In this study, we utilized three widely accepted metrics: Nash-Sutcliffe Efficiency (NSE), Pearson's r, and Normalized Root Mean Square Error (NRMSE). These metrics have been extensively applied in hydrological modeling and streamflow forecasting research due to their interpretability and ability to capture different aspects of model performance (Kratzert et al., 2018;Liu et al., 2022). ...
Preprint
In this paper, we address the critical task of 24-hour streamflow forecasting using advanced deep-learning models, with a primary focus on the Transformer architecture which has seen limited application in this specific task. We compare the performance of five different models, including Persistence, LSTM, Seq2Seq, GRU, and Transformer, across four distinct regions. The evaluation is based on three performance metrics: Nash-Sutcliffe Efficiency (NSE), Pearson’s r, and Normalized Root Mean Square Error (NRMSE). Additionally, we investigate the impact of two data extension methods: zero-padding and persistence, on the model's predictive capabilities. Our findings highlight the Transformer's superiority in capturing complex temporal dependencies and patterns in the streamflow data, outperforming all other models in terms of both accuracy and reliability. The study's insights emphasize the significance of leveraging advanced deep learning techniques, such as the Transformer, in hydrological modeling and streamflow forecasting for effective water resource management and flood prediction.
... With the development of machine learning and the production of large amounts of data, people began to use more sophisticated black-box models, among which the common ones are RNN and CNN (convolutional neural network) models. However, when dealing with long sequence inputs, traditional RNN and CNN models suffer from exploding gradients and loss of long-term information as sequence length increases; to mitigate such problems, Sepp Hochreiter and others proposed the LSTM [29] model. LSTM models based on RNN architectures have been proposed to mitigate problems inherent in traditional RNN models and have been widely used for predicting water-quality sequences [30]. In 2019, Tao and others [31] realised air-pollution prediction using a one-dimensional convolution combined with a bidirectional gated recurrent unit, demonstrating the model's advantage in comparison with machine learning. ...
... For the MLP model, the hyperparameter optimization included learning rate (0.001, 0.01, 0.1), number of hidden layers (32, 64, 128), and maximum iteration number (200, 300, 400, 500, 600, 700, 800). The hyperparameters for the Classification and Regression Tree (CART) model included maximum tree depth (10, 15, 20, 25, 30), minimum samples required to split internal nodes (2, 3, 4, 5, 10), and minimum samples required at leaf nodes (2, 3, 4, 5, 6). The optimization hyperparameters for Random Forest were maximum tree depth (10, 20, 30, 40) and number of decision trees (20, 40, 50, 60, 70, 80, 90, 100, 120, 150). ...
... The hyperparameters for the Classification and Regression Tree (CART) model included maximum tree depth (10, 15, 20, 25, 30), minimum samples required to split internal nodes (2, 3, 4, 5, 10), and minimum samples required at leaf nodes (2, 3, 4, 5, 6). The optimization hyperparameters for Random Forest were maximum tree depth (10, 20, 30, 40) and number of decision trees (20, 40, 50, 60, 70, 80, 90, 100, 120, 150). XGBoost's optimization hyperparameters were learning rate (0.001, 0.01, 0.1), maximum tree depth (3, 5, 7, 9), and number of trees used by the model (100, 200, 300, 400, 500). ...
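Grids like those quoted above are typically swept with an exhaustive cross-validated search, e.g. scikit-learn's GridSearchCV. The sketch below uses synthetic data and a trimmed Random Forest grid purely for illustration; it is not the cited study's actual pipeline or data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the study's monitoring features.
X, y = make_regression(n_samples=200, n_features=8, noise=0.5, random_state=0)

# Grid mirroring the Random Forest ranges quoted above (tree counts trimmed to keep it fast).
param_grid = {
    "max_depth": [10, 20, 30, 40],
    "n_estimators": [20, 40, 60],
}

# 3-fold cross-validation over every combination; refits the best model on all data.
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Each combination is scored by the estimator's default metric (R² for regressors), so the reported best parameters depend on both the grid and the chosen scoring.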
Article
Full-text available
This paper focuses on water quality prediction in the presence of a large number of missing values in water quality monitoring data. Current water quality monitoring data mostly come from different monitoring stations in different water bodies. As the duration of water quality monitoring increases, the complexity of water quality data also increases, and missing data are a common and hard-to-avoid problem in water quality monitoring. In order to fully exploit the valuable features of the monitored data and improve the accuracy of water quality prediction models, we propose a long short-term memory (LSTM) encoder-decoder model that combines a Kalman filter (KF) with an attention mechanism. The Kalman filter in the model can quickly complete the reconstruction and pre-processing of hydrological data. The attention mechanism is added between the decoder and the encoder to solve the problem that traditional recursive neural network models lose long-range information, and to fully exploit the interaction information among high-dimensional covariate data. Using original data from the Haimen Bay water quality monitoring station in the Lianjiang River Basin, we trained and tested our model using detection data from 1 January 2019 to 30 June 2020 to predict future water quality. The results show that compared with traditional LSTM models, the KF-LSTM model reduces the mean absolute error (MAE) by 10%, the mean square error (MSE) by 21.2%, and the root mean square error (RMSE) by 13.2%, while increasing the coefficient of determination (R²) by 4.5%. This model is more suitable for situations where there are many missing values in water quality data, while providing new solutions for real-time management of urban aquatic environments.
... While some past studies have claimed some architectures' superior performance compared to LSTM, most of the time the conclusions were highly conditional on using a small dataset for benchmarking (Abed et al., 2022; Amanambu et al., 2022; Ghobadi & Kang, 2022), on using procedures and configurations, e.g., training and test periods, sites, and forcing data, different from published benchmarks (Yin et al., 2022, 2023), or on a case study that was not tested independently by other teams (Koya & Roy, 2023; Liu et al., 2022). In the interest of the reproducibility and comparability that underpin scientific progress, it is a good idea to benchmark under the same conditions, on the same (reasonably large) dataset. ...
Preprint
For a number of years since its introduction to hydrology, recurrent neural networks like long short-term memory (LSTM) have proven remarkably difficult to surpass in terms of daily hydrograph metrics on known, comparable benchmarks. Outside of hydrology, Transformers have now become the model of choice for sequential prediction tasks, making it a curious architecture to investigate. Here, we first show that a vanilla Transformer architecture is not competitive against LSTM on the widely benchmarked CAMELS dataset, and lagged especially for the high-flow metrics due to short-term processes. However, a recurrence-free variant of the Transformer can obtain mixed comparisons with LSTM, producing the same Kling-Gupta efficiency coefficient (KGE), along with other metrics. The lack of advantages for the Transformer is linked to the Markovian nature of the hydrologic prediction problem. Similar to LSTM, the Transformer can also merge multiple forcing datasets to improve model performance. While the Transformer results are not higher than current state-of-the-art, we still learned some valuable lessons: (1) the vanilla Transformer architecture is not suitable for hydrologic modeling; (2) the proposed recurrence-free modification can improve Transformer performance, so future work can continue to test more such modifications; and (3) the prediction limits on the dataset should be close to the current state-of-the-art model. As a non-recurrent model, the Transformer may bear scale advantages for learning from bigger datasets and storing knowledge. This work serves as a reference point for future modifications of the model.
... In addition, the attention-mechanism allows to tackle very different tasks involving both structured and unstructured data, motivating its application in diverse predictive domains (Jaegle et al., 2021). Liu et al. (2022b) proposed a Transformer neural network for predicting the monthly streamflow of the Yangtze River using both past water levels and the El Niño-Southern Oscillation (ENSO) as input. The Transformer architecture demonstrated superior performance with respect to several machine learning models, including convolutional and recurrent neural networks. ...
... In detail, we aim at predicting the water level of a target river by using the past water levels of its upstream branches as input. Differently from Liu et al. (2022b), we applied the Transformer directly to the raw streamflow data without applying any transformation to the input (e.g. variational mode decomposition). ...
Article
Floods are one of the most devastating natural hazards, causing several deaths and conspicuous damages all over the world. In this work, we explore the applicability of the Transformer neural network to the task of flood forecasting. Our goal consists in predicting the water level of a river one day ahead, by using the past water levels of its upstream branches as predictors. The methodology was validated on the severe flood that affected Southeast Europe in May 2014. The results show that the Transformer outperforms recurrent neural networks by more than 4% in terms of the Root Mean Squared Error (RMSE) and 7% in terms of the Mean Absolute Error (MAE). Furthermore, the Transformer requires lower computational costs with respect to recurrent networks. The forecasting errors obtained are considered acceptable according to the domain standards, demonstrating the applicability of the Transformer to the task of flood forecasting.