Figure 1 - uploaded by Dwi Wahyu Prabowo
10-fold cross validation 


Source publication
Article
Full-text available
The objective of this research is to investigate the randomization of data in computer-based feature selection for diagnosing coronary artery disease. Randomization was applied to the Cleveland dataset because the performance value differs from one experiment to the next. Assuming the performance values have a Gaussian probability distribution is a so...

Context in source publication

Context 1
... of the tools provided by Weka (CVParameter) is used in the train-test split. Figure 1 and 2 describe these processes. 3 Experiment result and analysis ...
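
The 10-fold cross-validation process that Figure 1 depicts, combined with a randomization step before splitting, can be sketched roughly as follows. This is a minimal standard-library Python illustration, not the Weka procedure used in the publication. The function names, the seed handling, and the default of 10 repeats are assumptions made for the example.

```python
import random


def shuffled_kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    The sample indices are shuffled once (the randomization step),
    then cut into k nearly equal folds; each fold is the test set once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


def repeated_cv_mean(score_fn, n_samples, k=10, repeats=10):
    """Average a per-fold score over several differently seeded shuffles.

    score_fn is a hypothetical callback: it receives the train and test
    index lists and returns one performance value for that fold.
    """
    scores = [
        score_fn(train_idx, test_idx)
        for seed in range(repeats)
        for train_idx, test_idx in shuffled_kfold_indices(n_samples, k, seed)
    ]
    return sum(scores) / len(scores)
```

In use, `score_fn` would train a classifier on `train_idx` and return its accuracy (or another measure) on `test_idx`. Changing the seed changes the fold membership, which is why a single run and an average over many runs can give different performance values.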

Similar publications

Article
Full-text available
Feature selection, which plays an important role in high-dimensional data analysis, has recently been drawing increasing attention. Finding the most relevant and important features for classification is one of the most important tasks of data mining and machine learning, since all of the datasets have irrelevant features that affect accuracy rate and s...

Citations

... Besides, the labelled amount of data is more negative than positive, which in turn led to a low value of True Positive (TP) and F-measure. The same results occurred in a study carried out by [26], which had a high accuracy, but low TP and F-measure. ...
Article
Full-text available
The social network allows individuals to create public and semi-public web-based profiles to communicate with other users in the network and online interaction sources. Social media sites such as Facebook, Twitter, etc., are prime examples of the social network, which enable people to express their ideas, suggestions, views, and opinions about a particular product, service, political entity, and affairs. This research introduces a Machine Learning-based (ML-based) classification scheme for analyzing the social network reviews of Yemeni people using data mining techniques. A constructed dataset consisting of 2000 Modern Standard Arabic (MSA) and Yemeni dialect records was used for training and testing purposes, along with a test dataset consisting of 300 MSA and Yemeni dialect records used to demonstrate the capacity of our scheme. Four supervised machine learning algorithms were applied, and their performance was compared based on Accuracy, Recall, Precision and F-measure. The results show that the Support Vector Machine algorithm outperformed the others in terms of Accuracy on both training and testing datasets, with 90.65% and 90.00%, respectively. It is further noted that the accuracy of the selected algorithms was influenced by noisy and sarcastic opinions.
... The study converted multiclass classifications to binary classifications. Similar to the research by Nahar et al. [8], Prabowo et al. [9] proposed a system of diagnosis of coronary heart disease that adds randomization before the classification process. The algorithm was tested together with those considered by Nahar et al. [8]. ...
... In the studies that have been done with non-black-box approaches to implement conversion into binary and multiclass, the average performance is still relatively low, especially for the true positive rate (TPR) [8][9][10] and F-measure. Low TPR and F-measure values indicate that the system has a poor ability to interpret the data. ...
... Unfortunately, as [4] showed, the process cannot be interpreted by the clinician. The approach of feature selection has also been widely used to address data imbalance in relation to the diagnosis of coronary heart disease, as was done in the studies by Nahar et al. [8] and Prabowo et al. [9]. ...
Article
Full-text available
Objectives: The interpretation of clinical data for the diagnosis of coronary heart disease can be done using algorithms in data mining. Most clinical data interpretation systems for diagnosis developed using data mining algorithms with a black-box approach cannot recognize the relationships between examination attributes and the incidence of coronary heart disease. Methods: This study proposes a system to interpret clinical examination results for the diagnosis of coronary heart disease based on the decision tree algorithm. This system comprises several stages. First, oversampling is carried out by a combination of the synthetic minority oversampling technique (SMOTE), feature selection, and the C4.5 classification algorithm. System testing is done using k-fold cross-validation. The performance parameters are sensitivity, specificity, positive prediction value (PPV), negative prediction value (NPV) and the area under the curve (AUC). Results: The results showed that the system has a sensitivity of 74.7%, a specificity of 93.7%, a PPV of 74.2%, an NPV of 93.7%, and an AUC of 84.2%. Conclusions: This study demonstrated that, by using the C4.5 algorithm, data can be interpreted in the form of a decision tree, to aid the understanding of the clinician. In addition, the proposed system can provide better performance by category.
... The t-test results showed that the best-performing classification method was SMO. The same approach was also adopted by Prabowo et al. [29]. The research also investigated the performance of computational intelligence algorithms. ...
... When compared with the proposed system, Akrami et al. [28] achieved better results, but the resulting performance was still as good as that of category classification. Similar to the work by Prabowo et al. [29], improved results were obtained for sensitivity and F-measure when the process is done with randomized variable selection for every 10-fold run and is performed 10 times. When compared with the case when no variable selection is carried out, the performance of the proposed system is still better in terms of sensitivity and F-measure. ...
... In the studies conducted by Nahar et al. [27], Akrami et al. [28], Prabowo et al. [29] and Setiawan et al. [30], multiclass classification was converted to binary classification. Such conversion makes those systems ineffective. ...
Article
Full-text available
Objectives: Coronary heart disease is the leading cause of death worldwide, and it is important to diagnose the level of the disease. Intelligence systems have been shown to support diagnosis of the disease. Unfortunately, most of the available data is imbalanced across the levels/types of coronary heart disease. As a result, system performance is low. Methods: This paper proposes an intelligence system for the diagnosis of the level of coronary heart disease, taking into account the problem of data imbalance. The first stage of this research was preprocessing, which included resampled non-stratified random sampling (R), the synthetic minority over-sampling technique (SMOTE), clean data out of range attribute (COR), and remove duplicate (RD). The second step was the division of data for training and testing using a k-fold cross-validation model and training multiclass classification by the K-star algorithm. The third step was performance evaluation. The proposed system was evaluated using the performance parameters of sensitivity, specificity, positive prediction value (PPV), negative prediction value (NPV), area under the curve (AUC) and F-measure. Results: The results showed that the proposed system provides an average performance with sensitivity of 80.1%, specificity of 95%, PPV of 80.1%, NPV of 95%, AUC of 87.5%, and F-measure of 80.1%. The system without consideration of data imbalance showed sensitivity of 53.1%, specificity of 88.3%, PPV of 53.1%, NPV of 88.3%, AUC of 70.7%, and F-measure of 53.1%. Conclusions: Based on these results it can be concluded that the proposed system is able to deliver good performance in the category of classification.
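
The performance parameters listed in the abstract above follow standard definitions in terms of confusion-matrix counts. The sketch below shows those definitions for the binary case only; the function name and the example counts are hypothetical, and the abstract's multiclass averaging and AUC computation are not reproduced here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from binary confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives. A ratio with a zero denominator
    is reported as 0.0 (a convention chosen for this sketch).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    ppv = ratio(tp, tp + fp)           # positive prediction value (precision)
    npv = ratio(tn, tn + fn)           # negative prediction value
    # F-measure: harmonic mean of PPV and sensitivity.
    f_measure = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_measure": f_measure,
    }


# Made-up counts for illustration only, not data from any cited study.
example = binary_metrics(tp=8, fp=2, tn=18, fn=2)
```

With few positive cases, sensitivity and F-measure can be low even when overall accuracy is high, which matches the pattern several of the citing excerpts on this page describe.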
... The test results showed that SMO gives better performance than the other algorithms. Other similar studies have also been conducted by [6]. This study is similar to that done before, using the binary classification approach. ...
... This study adds a randomization step prior to the 10-fold cross validation. The process is carried out 100 times, and the end result is the average over the 100 runs [6]. The approach of converting multiclass classification to binary classification was also taken by [7], whose research emphasizes benchmarking feature selection methods and the computational intelligence algorithms C4.5 and Naive Bayesian. ...
... The studies that have been conducted by [4], [5] and [6] showed that the SMO algorithm gives good performance. The downside is that these studies did not use the multiclass Support Vector Machine classification approach. ...
Article
Full-text available
Automatic diagnosis of coronary heart disease supports the doctor in making a diagnostic decision. Coronary heart disease has several types or levels. In the UCI Repository dataset, it is divided into 4 types or levels, labeled with the numbers 1-4 (low, medium, high and serious). Diagnosis models can therefore be analyzed with a multiclass classification approach. One such approach is the support vector machine (SVM), chosen because of the strong performance of SVM in binary classification. This research studies the performance of multiclass support vector machine classification in diagnosing the type or level of coronary heart disease. Coronary heart disease patient data were taken from the UCI Repository. The first stage of this study is preprocessing, which consists of normalizing the data and dividing the data into training and testing data. The next stages are multiclass classification and performance analysis. This study uses the following multiclass SVM algorithms: Binary Tree Support Vector Machine (BTSVM), One-Against-One (OAO), One-Against-All (OAA), Decision Direct Acyclic Graph (DDAG) and Exhaustive Output Error Correction Code (ECOC). The performance parameters used are recall, precision, F-measure and overall accuracy.
Article
Although coronary angiography is the gold standard diagnostic tool for coronary artery disease (CAD), it is associated with procedural risk: it is an invasive technique requiring arterial puncture, and it subjects the patient to radiation and iodinated contrast exposure. Artificial intelligence (AI) can provide a pretest probability of disease that can be used to triage patients for angiography. This review comprehensively investigates published papers in the domain of CAD detection using different AI techniques from 1991 to 2020, in order to discern broad trends and geographical differences. Moreover, key decision factors affecting CAD diagnosis are identified for different parts of the world by aggregating the results from different studies. In this study, all datasets that have been used in studies of CAD detection, their properties, and the performances achieved using various AI techniques are presented, compared, and analyzed. In particular, the effectiveness of machine learning (ML) and deep learning (DL) techniques to diagnose and predict CAD is reviewed. From PubMed, Scopus, Ovid MEDLINE, and Google Scholar search, 500 papers were selected to be investigated. Among these selected papers, 256 papers met our criteria and hence were included in this study. Our findings demonstrate that AI-based techniques have been increasingly applied for the detection of CAD since 2008. AI-based techniques that utilized electrocardiography (ECG), demographic characteristics, symptoms, physical examination findings, and heart rate signals reported high accuracy for the detection of CAD. In these papers, the authors ranked the features based on their assessed clinical importance with ML techniques. The results demonstrate that the attribution of the relative importance of ML features for CAD diagnosis is different among countries. More recently, DL methods have yielded high CAD detection performance using ECG signals, which drives their burgeoning adoption.
Article
Coronary artery disease (CAD) is the most common cardiovascular disease (CVD) and often leads to a heart attack. It annually causes millions of deaths and billions of dollars in financial losses worldwide. Angiography, which is invasive and risky, is the standard procedure for diagnosing CAD. Alternatively, machine learning (ML) techniques have been widely used in the literature as fast, affordable, and noninvasive approaches for CAD detection. The results that have been published on ML-based CAD diagnosis differ substantially in terms of the analyzed datasets, sample sizes, features, location of data collection, performance metrics, and applied ML techniques. Due to these fundamental differences, achievements in the literature cannot be generalized. This paper conducts a comprehensive and multifaceted review of all relevant studies that were published between 1992 and 2019 for ML-based CAD diagnosis. The impacts of various factors, such as dataset characteristics (geographical location, sample size, features, and the stenosis of each coronary artery) and applied ML techniques (feature selection, performance metrics, and method) are investigated in detail. Finally, the important challenges and shortcomings of ML-based CAD diagnosis are discussed.
Article
Full-text available
Coronary heart disease is among the diseases with the highest mortality rates in the world. This makes the development of diagnostic systems, which aim to detect whether a heart is normal or not, a very interesting topic in the field of biomedical informatics. In the literature there are diagnostic system models that combine dimension reduction and data mining techniques. Unfortunately, there are no review papers to date that discuss and analyze these themes. This study reviews articles within the period 2009-2016, with a focus on dimension reduction methods and data mining techniques, validated using a dataset from the UCI repository. Methods of dimension reduction use feature selection and feature extraction techniques, while data mining techniques include classification, prediction, clustering, and association rules. © 2017 Institute of Advanced Engineering and Science. All rights reserved.
Conference Paper
Full-text available
In 2005, 264 pupils per 1000 households in Indonesia suffered from mental disorders. Currently, that number tends to increase, possibly in relation to the economic crisis and social problems. Unfortunately, psychiatrists are still rare in Indonesia: in 2010 there were around 616. Therefore, Indonesia still requires many experts to address and overcome the problem of mental disorders. With an expert system, the problem can be addressed by general doctors, who numbered 71,307 in 2010. An Expert System (ES) is a computer-based information system that uses expert knowledge to attain high-level decision performance in a narrowly defined problem domain. This means that, by using an expert system, a general doctor can solve the problem as if he or she were a psychiatrist. The issue raised in this research is how to build an expert system for mental disorders with the Forward Chaining method. This expert system is built by acquiring experts' knowledge as its knowledge base and by using MINI ICD-10 as the instrument. The expert system was tested by co-assistants on 100 randomly checked patients who came to the Outpatient Clinic, Dr. Cipto Mangunkusumo National Referral Hospital. The test validation was performed directly by the experts and indirectly through the patients' medical records (the co-assistants did not know the patients' diagnoses and histories beforehand). The accuracy of this system is 96%, calculated by comparing the expert system's results with the diagnoses made by psychiatrists for the tested patients.