Figure - available from: Soft Computing
Decision tree for credit card data when MIC ≤ 0.10 is pre-specified as the stopping criterion for tree growth

Source publication
Article
The copula is well known for learning scale-free measures of dependence among variables and has attracted much interest in recent years. At the very heart of the copula concept is the well-known theorem of Sklar. It states that any multivariate distribution function can be decomposed into the marginal distributions and a copula, whic...
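As a rough illustration of the scale-free property mentioned in the abstract above, the sketch below rank-transforms two samples to pseudo-observations on (0, 1). This is the usual empirical stand-in for applying the marginal distribution functions in Sklar's theorem, F(x1, ..., xd) = C(F1(x1), ..., Fd(xd)). The function names and the simulated data are assumptions made for illustration; nothing here is taken from the cited article.

```python
import numpy as np

def pseudo_observations(x):
    """Map a 1-D sample to (0, 1) via ranks: u_i = rank(x_i) / (n + 1)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n (assumes no ties)
    return ranks / (len(x) + 1)

def rank_correlation(x, y):
    """Pearson correlation of the pseudo-observations (Spearman's rho)."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    return float(np.corrcoef(u, v)[0, 1])

# Simulated, positively dependent data (hypothetical example).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.7 * x + rng.normal(scale=0.5, size=500)

# Strictly increasing transforms change the margins but not the ranks,
# so the rank-based dependence measure is unchanged.
rho_raw = rank_correlation(x, y)
rho_transformed = rank_correlation(np.exp(x), y ** 3)
```

Because `np.exp` and cubing are strictly increasing, `rho_raw` and `rho_transformed` agree: the dependence measure sees only the copula, not the marginal scales.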

Similar publications

Article
Lung cancer stands as a prevalent respiratory ailment worldwide, with its incidence intricately linked to air pollution. Investigating this relationship is pivotal for implementing effective preventive strategies. Traditionally, research has relied on a simplistic causal model, assuming uniform relationships between air pollutants and lung cancer o...

Citations

... (2) The Light-GBM has better measurement performance than the Light-GBDT and the LSTM. Due to the gradient-based one-side sampling method and the histogram algorithm (Khan et al., 2021; Wen et al., 2021), the optimized Light-GBM obtains better measurement performance than the Light-GBDT and the LSTM. Further, the Light-GBM/GBDT combined with other models needs to reduce the dimensional channel, which can obtain a low time complexity of loading measurement of industrial material. ...
Article
Dynamic measurement via deep learning can be applied in many industrial fields (e.g., electrical power load and fault diagnosis acquisition). Accurate and continuous loading measurement is essential in coal mine production. Existing methods are weak in loading measurement because they ignore the symbol characteristics of loading and adjusting features. To address this problem, we propose a hybrid learning method (called ScTCN-LightGBM) to realize the loading measurement of industrial material effectively. First, we provide an abnormal data processing method to guarantee raw data accuracy. Second, we design a sided-composited temporal convolutional network that combines a novel transposed dimensionality-reduction convolution residual block with the conventional residual block. This module can extract symbol characteristics and values of loading and adjusting features well. Finally, we utilize the light-gradient boosting machine to measure loading capacity. Experimental results show that the ScTCN-LightGBM outperforms existing measurement models with high metrics, notably a stability coefficient R² of 0.923. Compared to the conventional loading measurement method, the measurement performance via ScTCN-LightGBM improves by 40.2% and the continuous measurement time is 11.28 s. This study indicates that the proposed model can achieve the loading measurement of industrial material effectively.
... There are also studies in the literature in which copulas and decision trees are used together. Khan et al. [25] bring a joint approach to copulas and decision trees. They appraised a novel nonparametric copula-based decision tree organization using a measure of dependence and applied their proposed method to credit card records for Taiwan and coronary heart disease records of Pakistan and acquired the desirable outcomes. ...
Article
The random forest algorithm could be enhanced and produce better results with a well-designed and organized feature selection phase. The dependency structure between the variables is considered to be the most important criterion behind selecting the variables to be used in the algorithm during the feature selection phase. As the dependency structure is mostly nonlinear, making use of a tool that considers nonlinearity would be a more beneficial approach. Copula-Based Clustering technique (CoClust) clusters variables with copulas according to nonlinear dependency. We show that it is possible to achieve a remarkable improvement in CPU times and accuracy by adding the CoClust-based feature selection step to the random forest technique. We work with two different large datasets, namely, the MIMIC-III Sepsis Dataset and the SMS Spam Collection Dataset. The first dataset is large in terms of rows referring to individual IDs, while the latter is an example of longer column length data with many variables to be considered. In the proposed approach, first, random forest is employed without adding the CoClust step. Then, random forest is repeated in the clusters obtained with CoClust. The obtained results are compared in terms of CPU time, accuracy and ROC (receiver operating characteristic) curve. CoClust clustering results are compared with K-means and hierarchical clustering techniques. The Random Forest, Gradient Boosting and Logistic Regression results obtained with these clusters and the success of RF and CoClust working together are examined.
... Furthermore, the MARS, FFNN-BP, and DTR models also belong to nonparametric learning, and these models are used in those areas (Bengio et al. 2010; Al Iqbal et al. 2012; Genuer et al. 2017; Khaldi et al. 2019; Kohler et al. 2019; Yurochkin et al. 2019; Antoniadis et al. 2020; Devianto et al. 2020; Khan et al. 2020b; Zheng et al. 2020; Amiri-Ardakani & Najafzadeh 2021). Najafzadeh & Ghaemi (2019) implemented the LS-SVM and MARS models to estimate BOD5 and COD parameters through 200 samples collected from the Karoun River, in the southwest of Iran. ...
Article
This paper studies the performance of machine learning techniques, namely multivariate adaptive regression spline (MARS), feed forward neural network-back propagation (FFNN-BP), and decision tree regression (DTR), for estimating the physico-chemical properties of groundwater in the coastal plain area in Vinh Linh and Gio Linh districts of Quang Tri province of Vietnam. With 290 groundwater samples collected in the two districts, this study identified three main components, CO2, Ca, and CaCO3, for simulation. Quantitative analysis showed that CaCO3 ranged from 0 to 25.8 mg/l, Ca from 0 to 87.55 mg/l, and CO2 from 0 to 12 mg/l. In the present examination, groundwater quality index (GQI) values and their representative categories are taken from the Vietnam Groundwater Standard (QCVN01). Furthermore, statistical accuracy parameters were used to compare the models. To deploy FFNN-BP and DTR, different types of transfer and kernel functions were tested, respectively. The results of MARS, FFNN-BP, and DTR showed that all three models perform suitably for forecasting water quality components. Comparison of the MARS model with the FFNN-BP and DTR models indicated that MARS performs well in forecasting the water quality components, with slightly higher accuracy than the others. For the training phase, the measurement parameters ranked MARS best, followed by DTR and then FFNN-BP.
HIGHLIGHTS
Machine learning methods are used for spatial modeling of physico-chemical properties of groundwater.
MARS achieves suitable precision compared to the DTR and FFNN-BP models.
The total CaCO3 value in the experimental samples met the regular limit of QCVN01 with an ‘Excellent’ rating.
The quality of water parameters (i.e., CaCO3, Ca, and CO2) of the coastal plain area was predicted.
The study results show that the water quality in these two districts is usable for humans, livestock, and agricultural activities.
... areas [27][28][29][30][31][32][33][34][35][36]. ...
Preprint
This paper studies the performance of machine learning techniques, namely Multivariate Adaptive Regression Spline (MARS), Multilayer Perceptron (MLP), and Decision Tree Regression (DTR), for estimating the physico-chemical properties of groundwater in the coastal plain area in Vinhlinh and Giolinh districts of Quangtri province of Vietnam. To deploy the MLP and DTR, different types of transfer and kernel functions were tested, respectively. The results of MARS, MLP, and DTR showed that all three models perform suitably for forecasting water quality components. Comparison of the MARS model with the MLP and DTR models indicates that MARS performs well in forecasting the water quality components, with slightly higher accuracy than the others. According to the measurement parameters, the models ranked in the order MARS, DTR, and MLP.
... ii) Other possible future research directions will be to apply the proposed model and deep learning approaches such as LSTM-RNN and Phased LSTM-RNN and compare the results in the presence of missing values. Finally, one may consider the copula-based decision tree classification recently proposed by Khan et al. [36] in the classification stage and compare the accuracy with the existing method. There are many other possible research points that are difficult to explain here, but one should think them over and work on them in the future. ...
Article
Background and objectives: The ideal treatment of illnesses is of interest in every era. Over the last twenty years, information technology in health care has become very quick at diagnosing diverse diseases. In such diagnosis, past and current data play an essential role through the use of data mining techniques. We remain poor at accurately diagnosing emotional mental disturbance in its early stages. The early diagnosis of depression therefore poses a significant clinical and scientific research problem. This work is dedicated to tackling this problem using machine learning. People's dependence on emotional stages has been successfully classified into various groups in the information technology environment. Methods: A well-known machine learning multi-feature hybrid classifier is used to perform hybrid classification, labeling people's emotional stimulation as negative or positive. An ensemble learning algorithm helps to choose the more suitable features from the available classes of emotion data on social media to improve classification. We split the dataset into training and testing sets for the best predictive model. Results: The proposed system is checked through performance evaluation metrics. This research is carried out on the Class Labels MovieLens dataset. The experimental results show that the ensemble method used gives optimal classification performance by choosing the features with maximum separation. The results demonstrate the merit of the proposed system, which stems from the relevant features chosen by the integrated learning algorithm. Conclusion: The proposed approach is used to accurately and effectively diagnose depression in its early stage. It will assist in the recovery and treatment of depressed people. We conclude that the proposed method is highly suitable for all information technology-based e-healthcare for depression.
Preprint
In industrial systems, the signal of rotating machinery is usually non-stationary, non-linear, and subject to noise interference. To improve the accuracy of anomaly detection analysis and overcome the limitations of optimization methods, this article proposes a rolling bearing fault diagnosis method using Rényi entropy and the integrated sparrow search algorithm (ISSA) with flight strategy for optimizing support vector machines (SVM). First, wavelet packet analysis is used to decompose the original signal, and the optimal frequency band is selected from the decomposed bands for reconstruction. The reconstructed frequency band is then used to calculate the Rényi entropy and form the feature vector, which is input into the sparrow search algorithm with dynamically reverse learning factors for fault diagnosis. This algorithm improves the population diversity of the sparrow search algorithm and reduces its tendency to get stuck in local optima by initializing the population with a flight strategy and adjusting the step size factor. The improved algorithm is compared with the diagnostic results of the grey wolf optimization algorithm, the sparrow search algorithm, and the particle swarm optimization algorithm, and the ISSA-SVM shows faster convergence and higher accuracy.
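The feature-extraction step described in the abstract above relies on Rényi entropy. A minimal sketch of the standard definition, H_α(p) = log(Σ p_i^α) / (1 − α) for α ≠ 1, applied to normalized band energies follows. The function name, the default α = 2, and the example energies are assumptions for illustration; the abstract does not state the order or normalization the authors used.

```python
import numpy as np

def renyi_entropy(energies, alpha=2.0):
    """Rényi entropy (in nats) of non-negative energies normalized to sum to 1."""
    e = np.asarray(energies, dtype=float)
    if np.any(e < 0) or e.sum() <= 0:
        raise ValueError("energies must be non-negative with a positive sum")
    p = e / e.sum()  # turn band energies into a probability distribution
    if np.isclose(alpha, 1.0):
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))  # Shannon entropy as the alpha -> 1 limit
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

# Hypothetical band energies: evenly spread versus concentrated in one band.
uniform = renyi_entropy([1.0, 1.0, 1.0, 1.0])
peaked = renyi_entropy([10.0, 0.1, 0.1, 0.1])
```

Energy spread evenly across bands gives the maximum value, log(4) for four bands, while energy concentrated in one band gives a lower value. That contrast is what makes the entropy usable as a feature.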
Article
This paper addresses the prediction of heart diseases from risk factors using a decision tree. We introduce a data mining technique in public health to extract high-level knowledge from raw data, which facilitates predicting heart diseases from risk factors and their prevention. The present work intends to introduce a new risk factor in heart diseases using novel data mining strategies. Recent real-world patient data (e.g., smoking, area of residence, age, weight, blood pressure, chest pain, low-density lipoproteins (LDL), high-density lipoproteins (HDL), blocked arteries) were collected using a questionnaire through direct interviews with patients. Novel two-variable decision trees are constructed for coronary heart disease records based on risk factors and the ranking of risk factors. The results show a correct prediction of cardiovascular disease (CVD) from the risk factors if records on risk factors are available. As direct results of this study, tobacco, lack of physical exercise, and diet play a vital role in predicting heart diseases, which are the most important cause of mortality in developing countries, especially in my country.