LOGISTIC REGRESSION AND SVM RESULTS

Source publication

Application of Support Vector Machine Modeling and Graph Theory Metrics for Disease Classification

Article

Jul 2017

Jessica Rudd

Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as Support Vector Machine (SVM) modeling, produce similar or improved predictive capabilities in comparison to the traditional method of Logistic Regression. In addition, it has been found that social network m...

Context 1

... this were a real network in the dataset (not simulated), this would indicate that people with shorter total paths to other people in the network would have increased risk of diabetes. Table 3 shows the model performance results for models with and without the social network graph characteristics. The models were evaluated based on the sensitivity, specificity, and ROC index for the validation data set. ...
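The three evaluation measures named above can be sketched as follows. This is a minimal illustration, not the source publication's code: the function names and the toy labels and scores are hypothetical, and the ROC index is computed here as the fraction of positive/negative pairs ranked correctly (ties count as half), which equals the area under the ROC curve.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def roc_index(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation data, for illustration only.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]

print(sensitivity_specificity(y_true, y_pred))
print(roc_index(y_true, scores))
```

The pairwise form is quadratic in the sample size, which is acceptable for a small validation set; a rank-based computation would be preferable for large ones.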

Novel graph-based machine-learning technique for viral infectious diseases: application to influenza and hepatitis diseases

Article

Dec 2023
ANN MED

Background: Most infectious diseases are caused by viruses, fungi, bacteria and parasites. Their ability to easily infect humans and trigger large-scale epidemics makes them a public health concern. Methods for early detection of these diseases have been developed; however, they are hindered by the absence of a unified, interoperable and reusable model. This study seeks to create a holistic and real-time model for swift, preliminary detection of infectious diseases using symptoms and additional clinical data. Materials and methods: In this study, we present a medical knowledge graph (MKG) that leverages multiple data sources to analyse connections between different nodes. Medical ontologies were used to enhance the MKG. We applied various graph algorithms to extract key features. The performance of multiple machine-learning (ML) techniques for influenza and hepatitis detection was assessed, and multi-layer perceptron (MLP) and random forest (RF) models were selected due to their superior outcomes. The hyperparameters of both graph-based ML models were automatically fine-tuned. Results: Both the graph-based MLP and RF models showed the lowest loss and error rates, along with the highest specificity, accuracy, recall, precision and F1 scores. Their Matthews correlation coefficients were also optimal. When compared with existing ML techniques and findings from the literature, these graph-based ML models showed superior detection accuracy. Conclusions: The graph-based MLP and RF models effectively diagnosed influenza and hepatitis, respectively. This underlines the potential of graph data science in enhancing ML model performance and uncovering concealed relationships in the MKG.

Application of fuzzy combined SVM & graph theory for agriculture productivity prediction

Article

Dec 2020
J Phys Conf

A fuzzy integrated support vector machine combined with graph theory concepts provides a data model for predicting production. Such models have been used in various domains, including agriculture, medicine, and engineering applications. Developing new computational methods for predicting productivity in terms of farming structure is therefore significant for agriculture. This method uses a fuzzy integrated support vector machine and graph theory to perform structural tasks suggested by crop-influencing factors. Finally, the results illustrate the advantage of predicting the rate of productivity, as well as the importance of recommendations for systems that fail to produce the expected output volume at the time of setup.

Coronary Artery Disease Detection by Machine Learning with Coronary Bifurcation Features

Article

Oct 2020

Background: Early and accurate detection of coronary artery disease (CAD) is one of the most important medical research areas. Researchers are motivated to use machine learning techniques for quick and accurate detection of CAD. Methods: To obtain high-quality features for machine learning, we extracted coronary bifurcation features from coronary computed tomography angiography (CCTA) images using the morphometric method. Machine learning classifiers, namely logistic regression (LR), decision tree (DT), linear discriminant analysis (LDA), k-nearest neighbors (k-NN), artificial neural network (ANN), and support vector machine (SVM), were applied to the measured features and their performance was estimated. Results: Compared with the other machine learning methods, the polynomial-SVM tuned with grid search optimization had the best performance for the detection of CAD and yielded a classification accuracy of 100.00%. Among the six coronary bifurcation features examined, the exponent of vessel diameter (n) and the area expansion ratio (AER) were the two key features in the detection of CAD. Conclusions: This study could help clinicians detect CAD accurately and may provide an alternative method for non-invasive diagnosis in clinical practice.

A Graph-Theoretic Approach for the Detection of Phishing Webpages

Article

May 2020
COMPUT SECUR

Over the years, various technical means have been developed to protect Internet users from phishing attacks. To enrich the anti-phishing efforts, we capitalise on concepts from graph theory and propose a set of novel graph features to improve phishing detection accuracy. The initial phase of the proposed technique involved extracting the hyperlinks in the webpage under scrutiny and fetching the corresponding neighbourhood webpages. During this process, the page linking data were collected and used to construct a web graph that models the overall hyperlink and network structure of the webpage. From the web graph, graph measures were computed and extracted as graph features to derive a classifier for detecting phishing webpages. Experimental results show that the proposed graph features achieve an improved overall accuracy of 97.8% when C4.5 was used as the classifier, outperforming the existing conventional features derived from the same data samples. Unlike conventional features, the proposed graph features leverage inherent phishing patterns that are only visible at a higher level of abstraction, making them robust and difficult to evade by direct manipulation of the webpage contents. Our proposed graph-based technique also shows promising results when benchmarked against a prominent phishing detection technique. Hence, the proposed technique is an important contribution to existing anti-phishing research towards improving detection performance.

Bayesian classification, anomaly detection, and survival analysis using network inputs with application to the microbiome

Preprint

Apr 2020

While the study of a single network is well-established, technological advances now allow for the collection of multiple networks with relative ease. Increasingly, anywhere from several to thousands of networks can be created from brain imaging, gene co-expression data, or microbiome measurements. And these networks, in turn, are being looked to as potentially powerful features to be used in modeling. However, with networks being non-Euclidean in nature, how best to incorporate networks into standard modeling tasks is not obvious. In this paper, we propose a Bayesian modeling framework that provides a unified approach to binary classification, anomaly detection, and survival analysis with network inputs. Our methodology exploits the theory of Gaussian processes and naturally requires the use of a kernel, which we obtain by modifying the well-known Hamming distance. Moreover, kernels provide a principled way to integrate data of mixed types. We motivate and demonstrate the whole of our methodology in the area of microbiome research, where network analysis is emerging as the standard approach for capturing the interconnectedness of microbial taxa across both time and space.
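As a rough illustration of a distance-based kernel on networks, the sketch below computes the plain Hamming distance between two undirected graphs on the same node set and turns it into a similarity with an exponential kernel. The abstract does not specify how the authors modify the Hamming distance, so this is not their kernel: the function names, the exponential form, and the `length_scale` parameter are all assumptions made for the example.

```python
import math


def hamming_distance(adj_a, adj_b):
    """Count node pairs whose edge status differs between two graphs.

    Both arguments are symmetric 0/1 adjacency matrices (lists of lists)
    over the same ordered node set; only the upper triangle is compared.
    """
    n = len(adj_a)
    return sum(1
               for i in range(n)
               for j in range(i + 1, n)
               if adj_a[i][j] != adj_b[i][j])


def hamming_kernel(adj_a, adj_b, length_scale=1.0):
    """Exponential kernel on the Hamming distance (assumed form)."""
    return math.exp(-hamming_distance(adj_a, adj_b) / length_scale)


# Hypothetical three-node networks: a triangle and a path.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]

print(hamming_distance(triangle, path))
print(hamming_kernel(triangle, path, length_scale=2.0))
```

A kernel value of 1 indicates identical networks, and the value decays toward 0 as more edges differ; the resulting kernel matrix over a collection of networks is what a Gaussian process or other kernel method would consume.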

Support vector machine

Chapter

Jan 2020

In this chapter, we explore Support Vector Machine (SVM), a machine learning method that has become exceedingly popular for neuroimaging analysis in recent years. Because of their relative simplicity and flexibility for addressing a range of classification problems, SVMs distinctively afford balanced predictive performance, even in studies where sample sizes may be limited. In brain disorders research, SVMs are typically employed using multivoxel pattern analysis (MVPA) because their relative simplicity carries a lower risk of overfitting even with high-dimensional imaging data. More recently, SVMs have been used in the context of precision psychiatry, particularly for applications that involve predicting diagnosis and prognosis of brain diseases such as Alzheimer’s disease, schizophrenia, and depression. In the last section of this chapter, we review a number of recent studies that use SVM for such applications.

An Empirical Study of Downstream Analysis Effects of Model Pre-Processing Choices

Article

Jan 2020
OJS

A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines

Article

Aug 2019
EXPERT SYST APPL

Diabetes mellitus is a medical condition examined by data miners for reasons such as significant health complications in affected people, the economic impact on healthcare networks, and so on. In order to find the main causes of this disease, researchers look into the patient's lifestyle, hereditary information, etc. The goal of data mining in this context is to find patterns that make early detection of the disease and proper treatment easier. Due to the high volume of data involved in therapeutic contexts and disease diagnosis, provision of the intended treatment method becomes almost impossible over a short period of time. This justifies the use of pre-processing techniques and data reduction methods in such contexts. In this regard, clustering and meta-heuristic algorithms play important roles. In this paper, a method based on the k-means clustering algorithm is first used to detect and delete outliers. Then, in order to select significant and effective features, four bi-objective meta-heuristic algorithms are employed to choose the smallest number of significant features with the highest classification accuracy using support vector machines (SVM). In addition, the 10-fold cross validation (CV) method is used to validate the constructed model. Using real case data, it is concluded that the multi-objective firefly algorithm (MOFA) and multi-objective imperialist competitive algorithm (MOICA), with a 100% classification accuracy, outperform the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO), with accuracies of 98.2% and 94.6%, respectively.

Detection and real-time analysis of influenza disease using graph data science and a multi-layer perceptron model

Article

Apr 2024
J INTELL FUZZY SYST

The influenza virus can spread easily, causing significant public health concern. Despite the existence of different techniques for rapid detection and prevention of influenza, their efficiency varies significantly. Additionally, there is currently a lack of a comprehensive, interoperable, and reusable real-time model for detecting influenza infection and predicting relationships within the field of influenza analysis. This study proposed a comprehensive, real-time model for rapid and early influenza detection using symptoms. Further, new relationships in the influenza field were discovered. Multiple data sources were used for the influenza knowledge graph (KG). Throughout this study, various graph algorithms were utilized to extract significant nodes and relationship features and multiple influenza detection machine learning (ML) models were compared. Node classification and link prediction methods were employed on a multi-layer perceptron (MLP) model. Furthermore, the hyperparameters of the model were automatically tuned. The proposed MLP model demonstrated the lowest rate of loss and the highest specificity, accuracy, recall, precision, and F1-score compared to state-of-the-art ML models. Moreover, the Matthews correlation coefficient was promising. This study shows that graph data science can improve MLP model detection and assist in discovering hidden connections in influenza KG.

Graph data science and machine learning for the detection of COVID-19 infection from symptoms

Article

Apr 2023

Background: COVID-19 is an infectious disease caused by SARS-CoV-2. The symptoms of COVID-19 vary from mild-to-moderate respiratory illnesses, and it sometimes requires urgent medication. Therefore, it is crucial to detect COVID-19 at an early stage through specific clinical tests, testing kits, and medical devices. However, these tests are not always available during the time of the pandemic. Therefore, this study developed an automatic, intelligent, rapid, and real-time diagnostic model for the early detection of COVID-19 based on its symptoms. Methods: The COVID-19 knowledge graph (KG), constructed based on literature from heterogeneous data, is imported to understand the different COVID-19 relations. We added human disease ontology to the COVID-19 KG and applied a node-embedding graph algorithm called fast random projection to extract an extra feature from the COVID-19 dataset. Subsequently, experiments were conducted using two machine learning (ML) pipelines to predict COVID-19 infection from its symptoms. Additionally, automatic tuning of the model hyperparameters was adopted. Results: We compared two graph-based ML models, logistic regression (LR) and random forest (RF) models. The proposed graph-based RF model achieved a small error rate of 0.0064 and the best scores on all performance metrics, including specificity = 98.71%, accuracy = 99.36%, precision = 99.65%, recall = 99.53%, and F1-score = 99.59%. Furthermore, the Matthews correlation coefficient achieved by the RF model was higher than that of the LR model. Comparative analysis with other ML algorithms and with studies from the literature showed that the proposed RF model exhibited the best detection accuracy. Conclusion: The graph-based RF model registered high performance in classifying the symptoms of COVID-19 infection, thereby indicating that graph data science, in conjunction with ML techniques, helps improve performance and accelerate innovations.
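For readers unfamiliar with the metrics reported above, the following sketch shows how precision, recall, F1-score, and the Matthews correlation coefficient are derived from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study; the formulas are the standard definitions.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1 and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why it is often reported alongside accuracy and F1.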
