Performance comparison on Statlog dataset

Source publication
Preprint
Full-text available
Heart disease is the leading cause of morbidity and mortality in the world. It encompasses numerous conditions and symptoms. Diagnosing heart disease is difficult because there are many factors to analyze; moreover, the misclassification cost can be very high. In this paper, I first propose a cost-sensitive ensemble model to improve the...
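The preprint's full text is not reproduced here, but the idea the abstract names, penalising the costly error (missing a diseased patient) more heavily than a false alarm, can be sketched with ordinary class weights. The weight values and the synthetic 13-feature stand-in for the Statlog data below are illustrative assumptions, not the author's actual cost matrix or model.

```python
# Illustrative sketch only: cost-sensitive classification via class weights.
# The weights (1 for "healthy", 5 for "disease") are assumed for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for a heart-disease dataset (Statlog has 270 samples, 13 features).
X, y = make_classification(n_samples=270, n_features=13, weights=[0.56, 0.44],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# class_weight makes a missed positive case roughly 5x as costly as a false alarm.
clf = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 5})
clf.fit(X_tr, y_tr)
print(confusion_matrix(y_te, clf.predict(X_te)))
```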

Similar publications

Article
Full-text available
Recently, the healthcare industry has started generating a large volume of data. If hospitals can leverage these data, they could predict outcomes and provide better treatment at early stages and at low cost. Here, data analytics (DA) was used to make correct decisions through proper analysis and prediction. However, inappropriate data may...
Article
Full-text available
Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target for cancer prediction and are also considered to b...
Conference Paper
Full-text available
At the moment, the most prevalent form of cancer diagnosed in women across the globe is breast cancer. It develops in the breast tissue and is one of the most frequent causes of women's death. This cancer can be cured if it is diagnosed at a preliminary stage. Malignant and benign are the two types of tumor found in breast cancer. Malignant tumors are...
Article
Full-text available
The correct classification of requirements has become an essential task within software engineering. This study presents a comparison among text feature extraction techniques and machine learning algorithms for the problem of requirements classification, to answer the two major questions "Which works best (Bag of Words (BoW) vs. Term Frequ...
Article
Full-text available
Breast cancer is one of the most common diseases among women, accounting for many deaths each year. Even though cancer can be treated and cured in its early stages, many patients are diagnosed at a late stage. Data mining is the method of finding or extracting information from massive databases or datasets, and it is a field of computer science wit...

Citations

... An issue arises when the magnitude of one feature surpasses that of others, resulting in its dominance over the remaining features. Consequently, raw data must be scaled to mitigate the influence of varying quantitative units [31], [32]. Normalization is a common method for rescaling feature values. ...
Article
Full-text available
Employee turnover poses a critical challenge that affects many organizations globally. Although advanced machine learning algorithms offer promising solutions for predicting turnover, their effectiveness in real-world scenarios is often limited because of their inability to fully utilize the relational structure within tabulated employee data. To address this gap, this study introduces a promising framework that converts traditional tabular employee data into a knowledge graph structure, harnessing the power of Graph Convolutional Networks (GCN) for more nuanced feature extraction. The proposed methodology extends beyond prediction and incorporates explainable artificial intelligence (XAI) techniques to unearth the pivotal factors influencing an employee’s decision to either remain with or depart from a particular organization. The empirical analysis was conducted using a comprehensive dataset from IBM that includes the records of 1,470 employees. We benchmarked the performance against five prevalent machine learning models and observed that our enhanced linear Support Vector Machine (L-SVM) model, combined with knowledge-graph-based features, achieved an impressive accuracy of 0.925. Moreover, the successful integration of XAI techniques for attribute evaluation sheds light on the significant impact of job environment, job satisfaction, and job involvement on turnover intentions. This study not only furthers the development of advanced predictive models for employee turnover but also provides organizations with actionable insights to strategically address and reduce turnover rates.
... In nature, the class imbalance problem arises when the number of instances belonging to one class overwhelms that of the other classes, which causes some traditional supervised learning algorithms to focus excessively on the majority class and depresses performance on the minority classes. Over the past two decades, a number of learning algorithms for addressing the imbalanced classification problem have been proposed; they can be roughly divided into the following four categories: sampling [13][14][15], cost-sensitive learning [16][17][18], decision threshold moving (DTM) [19][20][21][22], and ensemble learning [23][24][25][26]. Sampling can be regarded as a pre-processing technique for class imbalance learning, as it balances the data distribution of the different classes by either adding instances of the minority class or removing instances of the majority class. ...
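As a minimal illustration of the first category (sampling), the sketch below randomly oversamples a synthetic minority class until the two classes are balanced. The data sizes and feature count are assumptions made only for the example, not values taken from the cited works.

```python
# Minimal sketch of the "sampling" category: random oversampling of the
# minority class using scikit-learn's resample utility.
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X_maj = rng.randn(900, 5)          # majority-class instances (assumed sizes)
X_min = rng.randn(100, 5) + 2.0    # minority-class instances

# Duplicate minority instances (with replacement) until both classes match.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.hstack([np.zeros(len(X_maj)), np.ones(len(X_min_up))])
print(X_bal.shape, np.bincount(y_bal.astype(int)))  # (1800, 5) [900 900]
```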
Article
Full-text available
Class imbalance learning (CIL), which aims to address the performance degradation of traditional supervised learning algorithms under skewed data distributions, has become one of the research hotspots in machine learning, data mining, and artificial intelligence. As a post-processing CIL technique, decision threshold moving (DTM) has been verified to be an effective strategy for addressing the class imbalance problem. However, whether the threshold is designated randomly or optimally, the classification hyperplane can only be moved in parallel and cannot change its orientation, so its performance is restricted, especially on complex data with varying density. To further improve the existing DTM strategies, we propose an improved algorithm called CDTM that divides the majority training instances into multiple regions of different density and then conducts the DTM procedure on each region independently. Specifically, we adopt the well-known DBSCAN clustering algorithm to split the training set, as it adapts well to density variation. In the context of support vector machines (SVM) and extreme learning machines (ELM), we verified the effectiveness and superiority of the proposed CDTM algorithm. The experimental results on 40 benchmark class-imbalance datasets indicate that CDTM is superior to several other state-of-the-art DTM algorithms in terms of the G-mean performance metric.
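For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of plain decision threshold moving (not the clustered CDTM variant): the positive class is predicted whenever its posterior probability exceeds a lowered cut-off. The dataset, the classifier, and the 0.3 threshold are illustrative assumptions.

```python
# Sketch of plain decision threshold moving (DTM): lower the cut-off so more
# instances are assigned to the minority (positive) class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X)[:, 1]
default_pred = (proba >= 0.5).astype(int)   # standard decision rule
moved_pred = (proba >= 0.3).astype(int)     # threshold moved toward the minority class

print("minority recall @0.5:", recall_score(y, default_pred))
print("minority recall @0.3:", recall_score(y, moved_pred))
```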
... A problem arises when one feature's magnitude is higher than the rest, as it will then naturally dominate other features. As a consequence, raw data should be scaled to fit classification algorithms and eliminate the impact of various quantitative units [28]. Therefore, in this research, the MinMaxScaler technique was used to rescale the features between 0 and 1. ...
... A problem arises when one feature's magnitude is higher than the rest, as it will then naturally dominate other features. As a consequence, raw data should be scaled to fit classification algorithms and eliminate the impact of various quantitative units [28]. Therefore, in this research, the MinMaxScaler technique was used to rescale the features between 0 and 1. The benefit of this technique is that it is robust to outliers as it uses statistical techniques that do not affect the variance of the data (Equation (1)) [9]. ...
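Equation (1) referred to above is presumably the standard min-max rescaling, x' = (x - x_min) / (x_max - x_min). A minimal sketch with scikit-learn's MinMaxScaler on an assumed toy matrix:

```python
# Sketch of min-max rescaling to [0, 1]: x' = (x - x_min) / (x_max - x_min).
# The toy feature matrix is an illustrative assumption.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [4.0, 800.0]])

scaler = MinMaxScaler()          # defaults to the [0, 1] range
X_scaled = scaler.fit_transform(X)
print(X_scaled)                  # each column now spans exactly [0, 1]
```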
... According to the No Free Lunch theorem, no single model or algorithm can handle all classification problems [26,28]. Furthermore, each algorithm has its own advantages and disadvantages, as illustrated in Table 2 [16,[41][42][43][44]. Consequently, combining several algorithms offsets the weaknesses of any single one, such as overfitting. ...
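One common way to realise the combination this excerpt argues for is a voting ensemble; the sketch below averages the predicted probabilities of three dissimilar base learners. The specific learners and the synthetic data are assumptions for illustration, not the cited paper's configuration.

```python
# Sketch of combining several algorithms so that no single model's weaknesses
# dominate: a soft-voting ensemble over three different base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=13, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("nb", GaussianNB())],
    voting="soft",               # average the predicted class probabilities
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```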
Article
Full-text available
The negative effect of financial crimes on financial institutions has grown dramatically over the years. To detect crimes such as credit card fraud, several single and hybrid machine learning approaches have been used. However, these approaches have significant limitations, as different hybrid algorithms were not further investigated on a given dataset. This research proposes and investigates seven hybrid machine learning models to detect fraudulent activities with a real-world dataset. The developed hybrid models consisted of two phases: state-of-the-art machine learning algorithms were first used to detect credit card fraud; then, hybrid methods were constructed based on the best single algorithm from the first phase. Our findings indicate that the hybrid model AdaBoost + LGBM is the champion model, as it displayed the highest performance. Future studies should focus on different types of hybridization and algorithms in the credit card domain.
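The abstract does not specify how its two-phase hybrids are wired together; one plausible, purely illustrative reading is a stacking arrangement in which a strong phase-one learner is combined with a second boosting model. LightGBM is replaced by scikit-learn's GradientBoostingClassifier here only to keep the sketch dependency-free; all names and settings below are assumptions, not the paper's method.

```python
# Hedged sketch of a two-phase hybrid read as stacking: two boosting learners
# feed a simple meta-learner. Not the cited paper's actual construction.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Imbalanced synthetic data as a stand-in for a fraud dataset (assumed sizes).
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

hybrid = StackingClassifier(
    estimators=[("ada", AdaBoostClassifier(random_state=0)),
                ("gbm", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
print(cross_val_score(hybrid, X, y, cv=5, scoring="roc_auc").mean())
```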
Article
Full-text available
Heart disease (HD) is a major threat to human health, and the medical field generates vast amounts of data that doctors struggle to interpret and use effectively. Early prediction and classification of HD types are crucial for effective medical treatment. Researchers have found it important to use learning-based techniques from machine and deep learning, such as supervised models and deep neural networks, to develop automatic models for HD. These techniques have been used to simulate HD management and to extract important features from complex data sets. This survey examines various HD prediction models, classifying the learning-based techniques, datasets, and contexts used, and analyzing the performance metrics of each contribution. It also clarifies which method suits each type of HD. As data sets grow, researchers are increasingly utilizing these techniques to create more precise models. However, there is still much work to be done to improve the accuracy of HD predictions.