Figure - available from: Soft Computing
The process flow diagram of HDPE cascade reaction

Source publication
Article
Full-text available
The operational data of advanced process systems have grown explosively, but their fluctuations are so slight that the number of extracted representative samples is quite limited, making it difficult to reflect the nature of the process and to establish prediction models. In this study, inspired by the process of a fisherman repairing nets,...

Similar publications

Article
Full-text available
This study discusses the development of a prediction model for time-based rainfall classification in Java. The methods used in this research are naive Bayes and simple kriging. Naive Bayes is used for classification prediction, while simple kriging is an interpolation method used for mapping. There are two scenarios used, that is building a...
Article
Full-text available
Rainfall erosivity is an important factor to be considered when predicting soil erosion. Precipitation data for 1971–2010 from 39 stations located in the Loess Plateau of China were collected to calculate the spatiotemporal variability of rainfall erosivity, and the long-term tendency of the erosivity was predicted using data from the HadGEM2-ES mo...
Article
Full-text available
Prediction methods are important for many applications. In particular, an accurate prediction of the total number of cases for pandemics such as the Covid-19 pandemic could help medical preparedness by providing, in time, a sufficient supply of testing kits, hospital beds and medical personnel. This paper experimentally compares the accuracy of ten pre...
Article
Full-text available
Missing values in data series are a common problem in many research fields and applications. Most existing interpolation methods are based on spatial or temporal interpolation, without considering the spatiotemporal correlation of observation data, resulting in poor interpolation performance. In this paper, a Modified Spatiotemporal Mixed-Effects (MSTME) mod...

Citations

... However, data-based surface roughness prediction has rarely been investigated in UPM due to the small sample size problem and the requirement for prediction accuracy. A large number of studies have shown that the virtual sample generation (VSG) technique is an effective solution to the small sample problem, and it is widely used in petroleum, chemical, and other process industries [18][19][20][21][22]. ...
Article
Full-text available
Surface roughness is one of the main bases for measuring the surface quality of machined parts. A large amount of training data can effectively improve model prediction accuracy. However, obtaining a large and complete surface roughness sample dataset during the ultra-precision machining process is a challenging task. In this article, a novel virtual sample generation scheme (PSOVSGBLS) for surface roughness is designed to address the small sample problem in ultra-precision machining, which utilizes a particle swarm optimization algorithm combined with a broad learning system to generate virtual samples, enriching the diversity of samples by filling the information gaps between the original small samples. Finally, a set of ultra-precision micro-groove cutting experiments was carried out to verify the feasibility of the proposed virtual sample generation scheme, and the results show that the prediction error of the surface roughness prediction model was significantly reduced after adding virtual samples.
... Remote sensing data have the advantages of wide coverage and non-destructive estimation [12], which provide effective information for crop LAI estimation and monitoring at regional scale [13]. In recent years, many researchers have proposed many LAI estimation algorithms based on satellite remote sensing image data [10,11,14,15]. However, there are still some limitations in the application of satellite remote sensing data. ...
... The SAIL model treats vegetation as a mixed medium, assumes a uniform distribution of leaf azimuth, considers arbitrary leaf inclination, and simulates the bidirectional reflectance of the canopy. It includes 8 input parameters: leaf area index, LAI (unitless); mean leaf inclination angle, ALA (°); hot spot parameter, hspot (mm⁻¹); soil brightness parameter, psoil (unitless); sky diffuse ratio, skyl (unitless); solar zenith angle, sza (°); observed zenith angle, oza (°); and observed relative azimuth angle, psi (°). The weight and bias calculation formulas of the BP algorithm are as follows: ...
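The formulas themselves are truncated in the snippet; purely for reference, the standard gradient-descent weight and bias update used by BP (backpropagation) networks, which is not necessarily the cited paper's exact notation, can be written as
\[
w_{ij}^{(t+1)} = w_{ij}^{(t)} - \eta \,\frac{\partial E}{\partial w_{ij}}, \qquad
b_{j}^{(t+1)} = b_{j}^{(t)} - \eta \,\frac{\partial E}{\partial b_{j}},
\]
where \(\eta\) is the learning rate and \(E\) is the training error backpropagated through the network.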
... C. Model validation: The 28 measured LAI samples in the study area were divided into a training set (21 samples) and a test set (7 samples). Since there were few test samples available, the Kriging interpolation method was used to interpolate the original sample points into a map with 0.3 m spatial resolution, and data at trusted points around the original sample points were then selected as additional test samples [48]. Therefore, an extra 14 samples were obtained. They are defined as follows: ...
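A minimal sketch of the Kriging densification step described in this snippet, assuming the pykrige package and purely hypothetical sample coordinates and LAI values:

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Hypothetical sparse ground measurements: x, y coordinates (m) and LAI values.
x = np.array([2.1, 7.4, 12.9, 18.3, 25.0, 31.6, 38.2])
y = np.array([4.8, 11.2, 6.5, 19.7, 14.3, 27.1, 22.6])
lai = np.array([2.3, 3.1, 2.8, 3.6, 3.3, 4.0, 3.7])

# Fit an ordinary Kriging model with a spherical variogram.
ok = OrdinaryKriging(x, y, lai, variogram_model="spherical")

# Interpolate onto a 0.3 m resolution grid covering the plot.
grid_x = np.arange(0.0, 40.0, 0.3)
grid_y = np.arange(0.0, 30.0, 0.3)
lai_grid, variance = ok.execute("grid", grid_x, grid_y)

# Grid points with low Kriging variance near the original samples could then
# be taken as additional (virtual) test samples, as the snippet describes.
print(lai_grid.shape, variance.shape)
```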
Article
Full-text available
Leaf area index (LAI) is an important indicator for crop growth monitoring. Because of the small number of ground-measured samples, the hybrid method, which uses a radiative transfer model (RTM) to generate simulated samples and combines them with a regression model, is popular for LAI estimation. However, there are still differences between the simulated and measured spectra, which may affect the inversion results. In this study, an iterative hybrid method combining a BP neural network, the PROSAIL model and an optimal sample selection method was proposed for crop LAI estimation. A small number of ground-measured samples and unmanned aerial vehicle (UAV) hyperspectral data were first used to estimate LAI. The initial LAI result was then used as a parameter of the PROSAIL model to generate simulated spectra. Simulated spectra with high similarity to the UAV spectra, together with the corresponding LAI values, were treated as new samples for the BP neural network. After several iterations, a reasonable sample set was obtained to estimate winter wheat LAI. The proposed method is evaluated using ground-measured test samples and compared with common hybrid methods. Results indicate that, with an increasing number of training samples, the accuracy of the estimation model improves (RMSE/MAE decreased from 0.4685/0.0301 to 0.4377/0.0272, respectively, while R² increased from 0.5857 to 0.6384). The accuracy of the proposed iterative hybrid model is also higher than that of the commonly used hybrid model. The experiments demonstrate the relatively high accuracy of the proposed iterative hybrid method, which could be used for vegetation parameter estimation with only a small number of ground samples.
... Furthermore, deep forest classification (DFC), which is based on a non-neural-network model, has been proposed by combining the ideas of DNNs and ensemble forests to address the small-sample problem [15,16]. However, DFC cannot guarantee generalization performance given the limited samples available for KPI modeling [17,18]. Virtual sample generation (VSG) has been proposed to address this problem, and it is widely used in complex industrial processes such as petroleum, chemical, and mechanical manufacturing [19,20]. ...
Article
Full-text available
Key performance indicators of complex industrial processes, such as production quality and pollutant emission concentration, are difficult to measure online due to limited detection technology and high economic cost. Their modeling samples are high-dimensional, strongly uncertain, and few in number, which cannot satisfy the needs of traditional machine learning algorithms. A virtual sample generation method based on a generative adversarial fuzzy neural network (GAFNN) is proposed to address these problems. First, an adaptive feature selection algorithm based on random forest is used to reduce the input features of the original real samples. Second, candidate virtual samples are generated by the GAFNN to alleviate the problems of uncertainty and small sample size. Third, the virtual samples are screened by a multi-constraint selection mechanism to improve their quality. Finally, a deep forest classification model is constructed on the basis of the mixed samples, i.e., the original real samples and the selected virtual samples. The effectiveness of the proposed method is verified on benchmark and real industrial data.
... It should be noted that this topic has recently attracted the attention of a large number of data-processing specialists from the fields of mathematical statistics [8,9], statistical methods in medicine [10,11] and physiological studies [12], as well as the analysis of industrial processes [13,14]. Moreover, beyond statistical methods, this area requires the development of new algorithms and the application of graph-theory elements, particularly in the study of protein networks [15]. ...
Article
Full-text available
The interest in large or extreme outliers in arrays of empirical data arises from the needs of users (with whom the author has worked): specialists in medical geography and zoogeography, mining, the application of meteorology to fishing tasks, etc. The following motives are important for these specialists: the substantive significance of large outliers, the fear of errors when studying large outliers with standard and previously used methods, the speed of information processing, and the ease of interpretation of the results obtained. To meet these requirements, interval pattern recognition algorithms and the accompanying auxiliary computational procedures have been developed. These algorithms were designed for the specific samples provided by the users (short samples, the presence of rare events in them, or difficulties in constructing interpretation scenarios). They have the common property that either original optimization procedures are built for them or well-known optimization procedures are used. This paper presents a series of results on processing observations by isolating large outliers in time series as well as in planar and spatial observations. The algorithms presented here differ in speed and offer sufficient validity in terms of specially selected indicators. The proposed algorithms were previously tested on specific measurements and accompanied by meaningful interpretations. According to the author, this paper is more applied than theoretical. However, working with the proposed material requires a more diverse mathematical toolkit than the one traditionally used in the listed applications.
... In fact, the Bootstrap method produces relatively good results. However, the VSG strategy of the Bootstrap method is based on resampling the original samples, which cannot introduce new sample information to fill the information gaps in small samples [69]. Thus, the Bootstrap method's ability to solve the small sample problem is limited. ...
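The limitation noted here can be illustrated with a small sketch (hypothetical one-dimensional data, not any published method): bootstrap resampling only repeats existing points, whereas an interpolation-style VSG step places new points inside the gaps.

```python
import numpy as np

rng = np.random.default_rng(0)
original = np.array([1.0, 1.2, 3.5, 3.8, 7.9])  # hypothetical small sample

# Bootstrap: resample with replacement. Values are drawn only from the
# original points, so no new information enters the gaps (e.g. 3.8..7.9).
bootstrap = rng.choice(original, size=20, replace=True)
print(np.unique(bootstrap))          # always a subset of the original values

# Interpolation-style VSG: place virtual points between neighbouring samples,
# which does fill the information gaps that bootstrap cannot reach.
virtual = (original[:-1] + original[1:]) / 2.0
print(virtual)                       # e.g. 5.85 falls inside the 3.8..7.9 gap
```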
Article
Full-text available
Typhoon storm surge disaster is the most severe marine disaster in China. Accurate estimation of typhoon storm surge disaster loss (TSSDL) is significant for emergency decisions and sustainable economic development. However, TSSDL estimation is limited by small sample conditions, resulting in low accuracy of machine-learning-based TSSDL estimation models. To solve the problems of easy overfitting and poor generalization of machine learning models under small sample conditions, a high-accuracy TSSDL estimation method is proposed. First, this method combines Gaussian Noise with the Information Diffusion Model based on the Vibrating String equation (named GN-VSIDM) to generate virtual samples that augment the original training set. The augmented training set is then used to train three machine learning models. The results are as follows: the virtual sample generation method, GN-VSIDM, solves the small sample problem in the TSSDL estimation process and improves the accuracy and robustness of the machine learning models. Based on the GN-VSIDM and eXtreme Gradient Boosting (XGBoost) methods, the joint model GN-VSIDM-XGBoost is the optimal TSSDL estimation model. Compared with the original XGBoost model, the RMSE and R² of the GN-VSIDM-XGBoost model are 0.1089 and 0.8292, a reduction of 25.67% and an improvement of 19.88%, respectively. The GN-VSIDM-XGBoost model also possesses excellent robustness. GN-VSIDM overcomes the limitation on the performance of machine-learning-based TSSDL estimation models under small sample conditions. This study provides an effective case and method for solving the small sample problem in disaster loss assessment.
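As a rough, hedged illustration of the Gaussian-noise side of such augmentation (not the paper's GN-VSIDM itself), the following sketch jitters a hypothetical small training set and fits an XGBoost regressor; the feature matrix, noise scale, and hyperparameters are all assumptions:

```python
import numpy as np
from xgboost import XGBRegressor   # assumes the xgboost package is installed

rng = np.random.default_rng(42)

# Hypothetical small training set: typhoon features -> disaster loss.
X = rng.uniform(size=(30, 5))
y = X @ np.array([1.5, -0.8, 0.6, 2.0, 0.3]) + rng.normal(0.0, 0.05, 30)

# Gaussian-noise augmentation: jitter each original sample several times
# with noise scaled to a fraction of each feature's standard deviation.
n_copies, scale = 5, 0.05
noise = rng.normal(0.0, scale * X.std(axis=0), size=(n_copies * len(X), X.shape[1]))
X_virtual = np.repeat(X, n_copies, axis=0) + noise
y_virtual = np.repeat(y, n_copies)

X_aug = np.vstack([X, X_virtual])
y_aug = np.concatenate([y, y_virtual])

# Train on the augmented set; the real targets are reused for the jittered copies.
model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_aug, y_aug)
print(model.predict(X[:3]))
```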
... Finally, the study in [35] simulated the process of fishermen repairing nets; the resulting method, named Kriging-VSG, was put forward to produce feasible virtual samples in data-sparse zones. This method relies on a distance-based criterion applied to each dimension to recognize important samples with large data gaps. ...
... Among these, interpolation is well suited to dealing with unbalanced and small-sample datasets. For example, Zhu et al. [28] used a distance criterion to identify large information intervals and performed Kriging interpolation. Zhang et al. [29] employed the manifold-learning Isomap method to find sparse areas of the small sample for interpolation. ...
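A minimal sketch of the general idea summarized above (a distance criterion per dimension to locate large information intervals, followed by interpolation into them); the data and threshold are hypothetical, and a Kriging model could replace the simple midpoint used here:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(12, 3))   # hypothetical small sample, 3 features

virtual_points = []
for dim in range(X.shape[1]):
    order = np.argsort(X[:, dim])
    gaps = np.diff(X[order, dim])
    threshold = gaps.mean() + gaps.std()          # simple distance-based criterion
    for i in np.where(gaps > threshold)[0]:
        # Interpolate a new sample halfway across the large gap; a Kriging
        # interpolator could be used here instead of the plain midpoint.
        virtual_points.append((X[order[i]] + X[order[i + 1]]) / 2.0)

X_virtual = np.array(virtual_points)
print(len(X_virtual), "virtual samples generated in sparse regions")
```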
Article
Dioxin (DXN), which has been called a “century poison”, is emitted from municipal solid waste incineration (MSWI). The first step to effectively control and reduce DXN emissions is the application of soft sensors that utilize easy-to-detect process data. However, DXN samples for data-driven modeling are extremely scarce because of the high cost and long period of measurement. To address this issue, this work proposes a DXN emission prediction method based on expansion, interpolation, and selection for small-sample modeling (EIS-SSM), involving three main steps: domain expansion, hybrid interpolation, and virtual sample selection. First, the domain of samples is determined by domain expansion, a large number of virtual samples in this domain are generated through hybrid interpolation, and the optimal virtual samples are chosen by virtual sample selection. Afterward, a prediction model for DXN emission is constructed using the optimal virtual samples and the raw small samples. Two cases, a benchmark dataset and a DXN dataset from an actual MSWI plant, are used to evaluate the proposed method. Results show that, compared with the non-expansion and existing expansion methods, the proposed method improves performance by 48.22% and 13.68%, respectively, in the benchmark experiment and by 72.44% and 34.67%, respectively, in the DXN emission prediction experiment. Therefore, the proposed method can substantially improve the prediction of DXN emission from MSWI.
... For example, Yu and Zhang (2021) used a hybrid distribution method to generate virtual samples that handle different types of attributes in internet loan credit risk assessment, and obtained better performance than the other methods listed in that article. However, for high-dimensional data, fitting the data distribution is difficult for these methods (Zhu et al., 2020). ...
Article
Data scarcity is a serious issue in credit risk assessment for some emerging financial institutions. As a typical category of data scarcity, small sample with high dimensionality often leads to the failure to build an effective credit risk assessment model. To solve this issue, a Wasserstein generative adversarial networks (WGAN)-based data augmentation and hybrid feature selection method is proposed for small sample credit risk assessment with high dimensionality. In this methodology, WGAN is first used to produce the virtual samples to overcome the data instance scarcity issue, and then a kernel partial least square with quantum particle swarm optimization (KPLS-QPSO) algorithm is proposed to solve the high-dimensionality issue. For verification purposes, two small sample credit datasets with high dimensionality are used to demonstrate the effectiveness of the proposed methodology. Empirical results indicate that the proposed methodology can significantly improve the prediction performance and avoid possible economic losses in credit risk assessment. This implies that the proposed methodology is a competitive approach to small sample credit risk assessment with high dimensionality.
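A compact, hedged sketch of the WGAN augmentation idea (the original WGAN with weight clipping, assuming PyTorch); the network sizes and the hypothetical credit features are illustrative, and the paper's KPLS-QPSO feature selection step is not reproduced:

```python
import torch
import torch.nn as nn

# Hypothetical small credit dataset: 40 applicants x 12 normalized features.
torch.manual_seed(0)
real_data = torch.rand(40, 12)

latent_dim, n_features = 8, real_data.shape[1]
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, n_features))
critic = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

for step in range(1000):
    # Critic update: maximize E[critic(real)] - E[critic(fake)].
    for _ in range(5):
        z = torch.randn(real_data.shape[0], latent_dim)
        fake = generator(z).detach()
        loss_c = -(critic(real_data).mean() - critic(fake).mean())
        opt_c.zero_grad()
        loss_c.backward()
        opt_c.step()
        for p in critic.parameters():          # weight clipping (original WGAN)
            p.data.clamp_(-0.01, 0.01)

    # Generator update: maximize E[critic(fake)].
    z = torch.randn(real_data.shape[0], latent_dim)
    loss_g = -critic(generator(z)).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

# Virtual samples to augment the small credit dataset.
virtual_samples = generator(torch.randn(100, latent_dim)).detach()
print(virtual_samples.shape)
```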
... Also, other VSG-based methods were proposed and applied for different purposes, particularly in industrial fields, e.g. (Chang et al., 2014; He et al., 2018; Liu and Li, 2020; MacAllister et al., 2020; Olesen and Shaker, 2021; Zhu et al., 2020, 2021a, 2021b). Interestingly, all these methods showed a potential improvement of ML models for regression problems by generating virtual data from small original samples. ...
Article
Full-text available
Deep Neural Networks (DNN) are a powerful tool for predicting and monitoring water quality. However, their application is limited to well-monitored zones where data are available for the training and validation phases. In this study, we attempt to develop a novel framework based on a multivariate-distribution (MVD, elliptical copulas) Virtual Sample Generation (VSG) method to broaden the application of DNN for predicting water quality even with a small dataset. This framework is evaluated by predicting the Entropy Weighted Water Quality Index (EWQI) using a DNN with electrical conductivity, temperature, and pH as input variables in the Berrechid and Chaouia aquifer systems, Morocco. Validation results showed that the virtual samples generated from 400, 50, 30, and 20 original samples improved the NSE from 0.88 to 0.92, from 0.53 to 0.91, from 0.42 to 0.91, and from 0.24 to 0.87, respectively. Moreover, a sensitivity analysis with respect to virtual data size and original sample size showed that the RMSE and NSE of the DNN models approach limits as the virtual data size grows, following first-order exponential decay and logistic trends, respectively. These limits depend strongly on the original sample size. Such empirical trends are crucial for reproducing the proposed methodology at other sites to determine optimal virtual datasets. Overall, the proposed methodology provides new insights for improving DNN model performance in predicting water quality with small datasets. Hence, it is useful for managing water quality in order to supply clean water to the population in poorly monitored zones.
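A hedged sketch of the elliptical (Gaussian) copula idea behind such VSG: transform the small sample to normal scores, sample from the fitted correlation structure, and map back through the empirical marginals. The water-quality values below are hypothetical, and this is not the authors' exact implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical small sample: EC, temperature, pH for 20 wells.
original = np.column_stack([
    rng.normal(1200, 300, 20),   # electrical conductivity
    rng.normal(22, 3, 20),       # temperature
    rng.normal(7.2, 0.4, 20),    # pH
])
n, d = original.shape

# 1) Transform each margin to normal scores via empirical ranks.
ranks = stats.rankdata(original, axis=0) / (n + 1)
normal_scores = stats.norm.ppf(ranks)

# 2) Fit the Gaussian copula correlation and draw new latent samples.
corr = np.corrcoef(normal_scores, rowvar=False)
latent = rng.multivariate_normal(np.zeros(d), corr, size=200)

# 3) Map back to the data scale with the empirical quantile of each margin.
u = stats.norm.cdf(latent)
virtual = np.column_stack([np.quantile(original[:, j], u[:, j]) for j in range(d)])
print(virtual.shape)  # 200 virtual samples preserving the observed dependence
```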
... Furthermore, it has an admirable capability to provide an accurate surrogate model based only on limited HF samples, giving rise to a problem: can the MFS framework be used to reduce the computational budget of establishing a single-fidelity model? In other words, since the MFS framework has such outstanding characteristics, how can it be reasonably utilized to address small sample size problems prevalent in the process industry, computer science, biomedical engineering, and material science fields [33]? ...
Article
Full-text available
As an effective approximation tool, surrogate models have been extensively studied and play an increasingly important role in different areas of engineering. In this paper, a novel surrogate model, termed correlation mapping surrogate (CMS), is proposed based on the Rayleigh quotient and the multi-fidelity surrogate framework. The CMS model has a distinct hierarchical structure because of its step-by-step modeling process, enabling it to obtain accurate predictions relying on a small number of samples alone. To evaluate its prediction accuracy, a series of comparative experiments are conducted, and four popular surrogates, namely Kriging, polynomial response surface, radial basis function, and least-squares support vector regression, are selected as the benchmark models. The key issues of the CMS model, that is, its robustness and ability to handle practical problems, are also investigated. The results demonstrate that the CMS model shows a higher performance on both numerical and practical engineering problems than the other benchmark models, indicating its satisfactory feasibility, practicality, and stability.
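For orientation on the benchmark surrogates mentioned, a minimal sketch fitting two of them, Kriging via scikit-learn's Gaussian process and a radial basis function surrogate via SciPy, on a hypothetical test function with a small sample budget; the CMS model itself is not reproduced here:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def expensive_function(x):
    # Hypothetical expensive simulation: a simple 2-D test function.
    return np.sin(x[:, 0]) + 0.5 * np.cos(2.0 * x[:, 1])

# Small sample budget typical of surrogate modeling.
X_train = rng.uniform(-2.0, 2.0, size=(15, 2))
y_train = expensive_function(X_train)
X_test = rng.uniform(-2.0, 2.0, size=(200, 2))

# Kriging (Gaussian process) surrogate.
kriging = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
kriging.fit(X_train, y_train)

# Radial basis function surrogate.
rbf = RBFInterpolator(X_train, y_train)

y_true = expensive_function(X_test)
for name, pred in [("Kriging", kriging.predict(X_test)), ("RBF", rbf(X_test))]:
    rmse = np.sqrt(np.mean((pred - y_true) ** 2))
    print(f"{name} surrogate RMSE: {rmse:.4f}")
```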