Decision tree for credit card data when MIC ≤ 0.10 is pre-specified as the stopping criterion for tree growth

Source publication

A nonparametric copula-based decision tree for two random variables using MIC as a classification index

Article

Full-text available

Aug 2021

The copula is well known for learning scale-free measures of dependence among variables and has attracted much interest in recent years. At the very heart of the copula concept is the well-known theorem of Sklar. It states that any multivariate distribution function can be decomposed into the marginal distributions and a copula, whic...
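For the two-variable case named in the title, Sklar's theorem can be written out as below. This is the standard textbook statement, not a formula taken from the paper; the symbols $H$, $F$, $G$ and $C$ are notation assumed here for illustration.

```latex
% Sklar's theorem, bivariate case (notation assumed for illustration).
% H: joint distribution function of (X, Y); F, G: its marginal distribution functions.
\[
  H(x, y) = C\bigl(F(x),\, G(y)\bigr) \qquad \text{for all } x, y \in \mathbb{R},
\]
% where C : [0,1]^2 \to [0,1] is a copula. If F and G are continuous, C is unique and
\[
  C(u, v) = H\bigl(F^{-1}(u),\, G^{-1}(v)\bigr), \qquad (u, v) \in [0,1]^2 .
\]
```

Because $C$ acts only on the probability-transformed values $F(x)$ and $G(y)$, dependence measures derived from it do not change under strictly increasing transformations of either variable, which is the sense in which they are scale-free.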

Correlation pairs of the observation data

Tree structure for lung cancer prevalence in men

Tree structure of the lung cancer prevalence in women

Assessing the complex interplay of airborne pollutants and lung cancer prevalence via the improved decision tree-based vine copula modeling

Article

Full-text available

May 2024

Lung cancer stands as a prevalent respiratory ailment worldwide, with its incidence intricately linked to air pollution. Investigating this relationship is pivotal for implementing effective preventive strategies. Traditionally, research has relied on a simplistic causal model, assuming uniform relationships between air pollutants and lung cancer o...

ScTCN-LightGBM: a hybrid learning method via transposed dimensionality-reduction convolution for loading measurement of industrial material

Article

Full-text available

Nov 2023
CONNECT SCI

Dynamic measurement via deep learning is applicable in many industrial fields (e.g. electrical power load and fault diagnosis acquisition). Accurate and continuous loading measurement is essential in coal mine production. Existing methods perform poorly in loading measurement because they ignore the symbol characteristics of loading and adjusting features. To address this problem, we propose a hybrid learning method (called ScTCN-LightGBM) for measuring the loading of industrial material. First, we provide an abnormal data processing method to guarantee raw data accuracy. Second, we design a sided-composited temporal convolutional network that combines a novel transposed dimensionality-reduction convolution residual block with the conventional residual block. This module can extract the symbol characteristics and values of loading and adjusting features well. Finally, we use the light gradient boosting machine to measure loading capacity. Experimental results show that ScTCN-LightGBM outperforms existing measurement models, with a stability coefficient R² of 0.923. Compared to the conventional loading measurement method, the measurement performance of ScTCN-LightGBM improves by 40.2%, and the continuous measurement time is 11.28 s. This study indicates that the proposed model can measure the loading of industrial material effectively.

An enhanced random forest approach using CoClust clustering: MIMIC-III and SMS spam collection application

Article

Full-text available

Mar 2023

The random forest algorithm could be enhanced and produce better results with a well-designed and organized feature selection phase. The dependency structure between the variables is considered to be the most important criterion behind selecting the variables to be used in the algorithm during the feature selection phase. As the dependency structure is mostly nonlinear, making use of a tool that considers nonlinearity would be a more beneficial approach. Copula-Based Clustering technique (CoClust) clusters variables with copulas according to nonlinear dependency. We show that it is possible to achieve a remarkable improvement in CPU times and accuracy by adding the CoClust-based feature selection step to the random forest technique. We work with two different large datasets, namely, the MIMIC-III Sepsis Dataset and the SMS Spam Collection Dataset. The first dataset is large in terms of rows referring to individual IDs, while the latter is an example of longer column length data with many variables to be considered. In the proposed approach, first, random forest is employed without adding the CoClust step. Then, random forest is repeated in the clusters obtained with CoClust. The obtained results are compared in terms of CPU time, accuracy and ROC (receiver operating characteristic) curve. CoClust clustering results are compared with K-means and hierarchical clustering techniques. The Random Forest, Gradient Boosting and Logistic Regression results obtained with these clusters and the success of RF and CoClust working together are examined.

Predictive modelling physico-chemical properties groundwater in coastal plain area of Vinh Linh and Gio Linh districts of Quang Tri Province, Vietnam

Article

Full-text available

Oct 2022
Water Pract Tech

This paper studies the performance of three machine learning techniques, multivariate adaptive regression spline (MARS), feed forward neural network-back propagation (FFNN-BP), and decision tree regression (DTR), for estimating the physico-chemical properties of groundwater in the coastal plain area of Vinh Linh and Gio Linh districts of Quang Tri province, Vietnam. From 290 groundwater samples collected in the two districts, the study identified three main components, CO2, Ca and CaCO3, for simulation. Quantitative analysis showed that CaCO3 ranged from 0 to 25.8 mg/l, Ca from 0 to 87.55 mg/l and CO2 from 0 to 12 mg/l. Groundwater quality index (GQI) values and their representative categories were taken from the Vietnam Groundwater Standard (QCVN01). Statistical accuracy parameters were used to compare the models. To deploy FFNN-BP and DTR, different types of transfer and kernel functions were tested, respectively. The results showed that all three models are suitable for forecasting water quality components. Comparison of the MARS model with the FFNN-BP and DTR models indicated that MARS performs well in forecasting the elements of water quality, with accuracy slightly higher than the others. According to the measurement parameters for the training phase, MARS gave the best result, followed by DTR and finally FFNN-BP.
HIGHLIGHTS
Machine learning methods are used for spatial modeling of physico-chemical properties of groundwater.
MARS achieves suitable precision compared to the DTR and FFNN-BP models.
The total CaCO3 value in the experimental samples met the regular limit of QCVN01 with an ‘Excellent’ rating.
The quality of water parameters (i.e., CaCO3, Ca, and CO2) of the coastal plain area was predicted.
The study results show that the water quality in these two districts is usable for humans, livestock, and agricultural activities.

Predictive Modelling Physico-chemical Properties Groundwater in Coastal Plain Area of Vinhlinh and Giolinh Districts of Quangtri Province, Vietnam

Preprint

Full-text available

Oct 2021

This paper studies the performance of three machine learning techniques, Multivariate Adaptive Regression Spline (MARS), Multilayer Perceptron (MLP), and Decision Tree Regression (DTR), for estimating the physico-chemical properties of groundwater in the coastal plain area of Vinhlinh and Giolinh districts of Quangtri province, Vietnam. To deploy the MLP and DTR, different types of transfer and kernel functions were tested, respectively. The results showed that all three models are suitable for forecasting water quality components. Comparison of the MARS model with the MLP and DTR models indicates that MARS performs well in forecasting the elements of water quality, with accuracy slightly higher than the others. According to the measurement parameters, the models rank in the order MARS, DTR, and MLP.

A multi-feature hybrid classification data mining technique for human-emotion

Article

Full-text available

Mar 2021

Background and objectives: The ideal treatment of illnesses is the interest of every era. Over the last twenty years, information technology in medical care has advanced quickly in diagnosing diverse diseases. In such diagnosis, past and current data play an essential role through the use of data mining techniques. We remain poor at diagnosing emotional mental disturbance accurately in its early stages. The early diagnosis of depression is therefore a significant clinical and scientific research problem. This work is dedicated to tackling this problem using an AI technique. People's emotional states have been successfully classified into various groups in the information technology environment.
Methods: A multi-feature hybrid classifier is used to perform hybrid classification, labelling people's emotional stimulation as negative or positive. An ensemble learning algorithm helps to select the more appropriate features from the available emotion-class data on social media to improve classification. We split the dataset into training and testing sets to obtain the best predictive model.
Results: The proposed system is checked using performance evaluation metrics. This research is carried out on the Class Labels MovieLens dataset. The experimental results show that the ensemble method used gives optimal classification performance by selecting the features with the greatest separation. The results demonstrate the advantage of the proposed system, which originates from the relevant features chosen by the integrated learning algorithm.
Conclusion: The proposed approach is used to diagnose depression accurately and effectively at an early stage. It will assist in the recovery and treatment of depressed individuals. We conclude that the proposed technique is highly suitable for use in information technology-based e-healthcare for depressive stimulation.

Performing non-linear anomaly detection analysis using Renyi entropy and ISSA-SVM

Preprint

Full-text available

Mar 2023

In industrial systems, the signal of rotating machinery is usually non-stationary, non-linear, and subject to noise interference. To improve the accuracy of anomaly detection analysis and overcome the limitations of optimization methods, this article proposes a rolling bearing fault diagnosis method using Renyi entropy and the integrated sparrow search algorithm (ISSA) with flight strategy for optimizing support vector machines (SVM). First, wavelet packet analysis is used to decompose the original signal, and the optimal frequency band is selected from the decomposed bands for reconstruction. The reconstructed frequency band is then used to calculate the Renyi entropy and form the feature vector, which is input into the sparrow search algorithm with dynamically reverse learning factors for fault diagnosis. By initializing the population with a flight strategy and adjusting the step size factor, this algorithm improves population diversity and mitigates the sparrow search algorithm's tendency to get stuck in local optima. The improved algorithm is compared with the grey wolf optimization, sparrow search, and particle swarm optimization algorithms, and the results show that ISSA-SVM converges faster and achieves higher accuracy.
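The entropy feature described above can be sketched in a few lines. This is a minimal illustration of the standard Rényi entropy formula, H_α = log(Σ p_i^α) / (1 − α), and not the authors' implementation: the function names `renyi_entropy` and `band_entropy_feature`, the choice α = 2, the natural logarithm, and the use of normalized squared samples as the probability distribution are all assumptions made here, since the abstract does not state them.

```python
import math


def renyi_entropy(weights, alpha=2.0):
    """Renyi entropy (natural log) of non-negative weights, normalized to sum to 1."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Zero-weight entries contribute nothing for alpha > 0, so drop them.
    probs = [w / total for w in weights if w > 0]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def band_entropy_feature(samples, alpha=2.0):
    """Hypothetical feature: Renyi entropy of the energy distribution of one band.

    Assumption: each sample's squared amplitude is treated as its share of energy.
    """
    return renyi_entropy([s * s for s in samples], alpha)


# Made-up coefficients standing in for one reconstructed frequency band.
band = [0.2, -1.3, 0.7, 0.05, -0.4]
feature = band_entropy_feature(band)
print(f"Renyi entropy (alpha=2): {feature:.4f}")
```

A band whose energy is spread evenly gives the maximum value log(n), while a band dominated by a single coefficient gives a value near zero, which is what makes the quantity usable as one entry of a feature vector.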

A Machine learning-based prediction model for the heart diseases from chance factors through two-variable decision tree classifier

Article

Aug 2021
J INTELL FUZZY SYST

This paper addresses the prediction of heart diseases from risk factors through a decision tree. We introduce data mining techniques in public health to extract high-level knowledge from raw data, which facilitates predicting heart diseases from risk factors and their prevention. The present work intends to introduce a new risk factor in heart diseases using novel data mining strategies. Recent real-world patient data (e.g., smoking, area of residence, age, weight, blood pressure, chest pain, low-density lipoproteins (LDL), high-density lipoproteins (HDL), blocked arteries) were collected from patients using a questionnaire administered through direct interviews. Novel two-variable decision trees are constructed for the heart disease data based on risk factors and the ranking of risk factors. The results show a correct prediction of cardiovascular disease (CVD) from the risk factors if data on the risk factors are available. As direct results of this study, tobacco, lack of physical exercise, and diet play a vital role in predicting heart diseases, which are the most important cause of mortality in developing countries, especially in my country.
