TABLE 3 - uploaded by Jessica Rudd
LOGISTIC REGRESSION AND SVM RESULTS

Source publication
Article
Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as Support Vector Machine (SVM) modeling, produce similar or improved predictive capabilities in comparison to the traditional method of Logistic Regression. In addition, it has been found that social network m...

Context in source publication

Context 1
... this were a real network in the dataset (not simulated), this would indicate that people with shorter total paths to other people in the network would have increased risk of diabetes. Table 3 shows the model performance results for models with and without the social network graph characteristics. The models were evaluated based on the sensitivity, specificity, and ROC index for the validation data set. ...
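The evaluation described in this context (sensitivity, specificity, and ROC index on a validation set) can be illustrated with a minimal Python sketch. The labels, scores, and the 0.5 cutoff below are made-up illustrative values, not data from the source publication, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_index(y_true, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative validation data (assumed, not taken from the paper).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_index(y_true, scores)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen cutoff, whereas the ROC index summarizes ranking quality across all cutoffs, which is one reason the three are often reported together.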

Citations

... the graph ML method has been employed in numerous studies. For instance, it was utilized for the classification of diabetes [59] and the investigation of Alzheimer's disease [60]. In this study, we conducted a binary node classification on a training graph. The goal was to classify unlabelled nodes for influenza and hepatitis based on the features of their fellow nodes. ...
Article
Background: Most infectious diseases are caused by viruses, fungi, bacteria and parasites. Their ability to easily infect humans and trigger large-scale epidemics makes them a public health concern. Methods for early detection of these diseases have been developed; however, they are hindered by the absence of a unified, interoperable and reusable model. This study seeks to create a holistic and real-time model for swift, preliminary detection of infectious diseases using symptoms and additional clinical data. Materials and methods: In this study, we present a medical knowledge graph (MKG) that leverages multiple data sources to analyse connections between different nodes. Medical ontologies were used to enhance the MKG. We applied various graph algorithms to extract key features. The performance of multiple machine-learning (ML) techniques for influenza and hepatitis detection was assessed, selecting multi-layer perceptron (MLP) and random forest (RF) models due to their superior outcomes. The hyperparameters of both graph-based ML models were automatically fine-tuned. Results: Both the graph-based MLP and RF models showed the lowest loss and error rates, along with the highest specificity, accuracy, recall, precision and F1 scores. Their Matthews correlation coefficients were also optimal. When compared with existing ML techniques and findings from the literature, these graph-based ML models showed superior detection accuracy. Conclusions: The graph-based MLP and RF models effectively diagnosed influenza and hepatitis, respectively. This underlines the potential of graph data science in enhancing ML model performance and uncovering concealed relationships in the MKG.
... Beyond this, implementing a fuzzy combined support vector machine together with graph theory concepts could help advance all the core technical problems. To address this, combining SVM and graph theory concepts provides a large margin of improvement in output productivity prediction [17] [18]. In particular, the fuzzy combined support vector machine is a widely used and commercially available technique for predicting productivity in scientific domains including physics, chemistry, and biology [19] [20] [21]. ...
Article
A fuzzy integrated support vector machine and graph theory concepts represent data models for predicting production. Accordingly, they have been used in various domains such as agriculture, medicine, and engineering applications. Developing new computational methods for predicting the productivity of events in terms of farming structure is therefore very significant in agriculture. This method used a fuzzy integrated support vector machine and graph theory to perform structural tasks suggested by crop-influencing factors. Finally, the results obtained illustrate the advantage of predicting the rate of productivity, in addition to the importance of system recommendations that fail to produce the expected output volume at the time of setup or fail to produce the expected output quantum.
... Previous studies showed that features used for building the machine learning classifier can be extracted from diagnostic images (such as MRI, CTA and ultrasound) by automatic algorithms [31]. However, the clinical significance of these features was unknown and difficult to interpret. ...
... Our results further indicated that the volume of training data was another factor that can impact the classification performance (Table 5). A study by Rudd et al. indicated that splitting the whole data set into 80% (1652 of 2066) for training and the remaining 20% for testing achieved a high accuracy of 97.5% [31]. Our present strategy showed similarly high performance when no less than 50% of the morphometric data were used for training. ...
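The 80/20 split quoted above (1652 of 2066 records for training) can be reproduced arithmetically with a minimal Python sketch. The shuffling, the seed, and the function name are assumptions for illustration; the snippet does not say how the records were actually assigned.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle record indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n_samples * train_fraction)  # truncate to whole records
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(2066)
print(len(train_idx), len(test_idx))  # prints "1652 414"
```

Truncating 2066 × 0.8 = 1652.8 to whole records gives the 1652 training records quoted in the snippet, leaving 414 for testing.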
Article
Background: Early, accurate detection of coronary artery disease (CAD) is one of the most important medical research areas. Researchers are motivated to utilize machine learning techniques for quick and accurate detection of CAD. Methods: To obtain high-quality features for machine learning, we extracted the coronary bifurcation features from coronary computed tomography angiography (CCTA) images using the morphometric method. Machine learning classifier algorithms, such as logistic regression (LR), decision tree (DT), linear discriminant analysis (LDA), k-nearest neighbors (k-NN), artificial neural network (ANN), and support vector machine (SVM), were applied to the measured features and their performance was estimated. Results: The results showed that, in comparison with other machine learning methods, the polynomial-SVM with grid search optimization had the best performance for the detection of CAD and yielded a classification accuracy of 100.00%. Among six examined coronary bifurcation features, the exponent of vessel diameter (n) and the area expansion ratio (AER) were two key features in the detection of CAD. Conclusions: This study could help clinicians detect CAD accurately and may provide an alternative method for non-invasive clinical diagnosis.
... Thus, in this paper, we introduce graph features as a fuller representation of the website's hyperlink and network structure to enhance the phishing detection rate. In the past, graph-theoretic approaches have been utilised to model the features in other research domains such as disease classification (Bilgin et al., 2010; Rudd, 2018), e-commerce analytics (Baumann et al., 2018), anomaly detection (Sebastian et al., 2018), and web spam detection (Castillo et al., 2007). To the best of our knowledge, graph features are yet to be fully utilised for phishing detection. ...
Article
Over the years, various technical means have been developed to protect Internet users from phishing attacks. To enrich the anti-phishing efforts, we capitalise on concepts from graph theory, and propose a set of novel graph features to improve the phishing detection accuracy. The initial phase of the proposed technique involved the extraction of hyperlinks in the webpage under scrutiny and fetching the corresponding neighbourhood webpages. During this process, the page linking data were collected, and used to construct a web graph which models the overall hyperlink and network structure of the webpage. From the web graph, graph measures were computed and extracted as graph features to derive a classifier for detecting phishing webpages. Experimental results show that the proposed graph features achieve an improved overall accuracy of 97.8% when C4.5 was utilised as classifier, outperforming the existing conventional features derived from the same data samples. Unlike conventional features, the proposed graph features leverage inherent phishing patterns that are only visible at a higher level of abstraction, making them robust and difficult to evade by direct manipulation of the webpage contents. Our proposed graph-based technique also shows promising results when benchmarked against a prominent phishing detection technique. Hence, the proposed technique is an important contribution to the existing anti-phishing research towards improving the detection performance.
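The idea of computing graph measures per node and using them as classifier features can be sketched in Python as follows. The abstract does not list which measures were used, so degree and total shortest-path length are assumed stand-ins, the graph is treated as undirected, and the tiny `web_graph` and the function name are hypothetical.

```python
from collections import deque


def graph_features(adjacency):
    """Compute simple per-node graph measures from an adjacency dict.

    Returns, for every node, its degree and the sum of shortest-path
    lengths (in hops) to all nodes reachable from it.
    """
    features = {}
    for source in adjacency:
        # Breadth-first search gives hop distances from the source node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        features[source] = {
            "degree": len(adjacency[source]),
            "total_path_length": sum(dist.values()),
        }
    return features


# Hypothetical three-page link graph: a - b - c.
web_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = graph_features(web_graph)
print(features)
```

Each node's feature dictionary could then be appended to its conventional features before training a classifier.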
... However, kernel support vector machines (SVMs) have been a popular tool for classification with network inputs, and extensions exist to one-class classification and survival analysis. In particular, graph data has been used with SVMs to perform classification of protein function prediction (Borgwardt et al. (2005)), chemical informatics (Ralaivola et al. (2005)), and disease (Rudd (2018)), as well as one-class classification for media data (Mygdalis et al. (2016)). In Section 2.1, we will show the connection between the kernel SVM solution and our GP solution. ...
Preprint
While the study of a single network is well-established, technological advances now allow for the collection of multiple networks with relative ease. Increasingly, anywhere from several to thousands of networks can be created from brain imaging, gene co-expression data, or microbiome measurements. And these networks, in turn, are being looked to as potentially powerful features to be used in modeling. However, with networks being non-Euclidean in nature, how best to incorporate networks into standard modeling tasks is not obvious. In this paper, we propose a Bayesian modeling framework that provides a unified approach to binary classification, anomaly detection, and survival analysis with network inputs. Our methodology exploits the theory of Gaussian processes and naturally requires the use of a kernel, which we obtain by modifying the well-known Hamming distance. Moreover, kernels provide a principled way to integrate data of mixed types. We motivate and demonstrate the whole of our methodology in the area of microbiome research, where network analysis is emerging as the standard approach for capturing the interconnectedness of microbial taxa across both time and space.
... Adapted from Haynes and Rees, 2006. 6. Support vector machine such as global graph measures that are not available on a "voxel-wise" basis can also be used as inputs to SVM (Rudd, 2017; Wang, Zuo, He, Bullmore, & Fornito, 2010). ...
Chapter
In this chapter, we explore Support Vector Machine (SVM), a machine learning method that has become exceedingly popular for neuroimaging analysis in recent years. Because of their relative simplicity and flexibility for addressing a range of classification problems, SVMs distinctively afford balanced predictive performance, even in studies where sample sizes may be limited. In brain disorders research, SVMs are typically employed using multivoxel pattern analysis (MVPA) because their relative simplicity carries a lower risk of overfitting even using high-dimensional imaging data. More recently, SVMs have been used in the context of precision psychiatry, particularly for applications that involve predicting diagnosis and prognosis of brain diseases such as Alzheimer’s disease, schizophrenia, and depression. In the last section of this chapter, we review a number of recent studies that use SVM for such applications.
... Each data point is represented as an n-dimensional vector; SVM then constructs an (n-1)-dimensional separating hyperplane to discriminate the 2 classes, with maximized distance between the hyperplane and the data points on each side. SVM aims to find the best hyperplane for separation of both classes [16]. Data are represented as: ...
... In our work, the RBF kernel is used since it has fewer numerical difficulties and better performance in nonlinear cases. In addition, according to the previous studies carried out in this area, to classify the data, the support vector machine was used with the RBF kernel function having a radius of 0.2 (Rudd, 2018; Karatsiolis & Christos, 2012; Tambade et al., 2017; Abdillah & Suwarno, 2016; Kumari & Chitra, 2013). ...
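The quoted passage does not say how the "radius of 0.2" is parameterized. A minimal Python sketch, assuming it denotes the width σ in the common Gaussian form k(x, y) = exp(-||x - y||² / (2σ²)); the function name `rbf_kernel` is hypothetical, and other libraries may expose the same kernel through a different parameter.

```python
import math


def rbf_kernel(x, y, sigma=0.2):
    """Gaussian (RBF) kernel between two equal-length numeric vectors.

    Assumes the radius is the width sigma in
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Identical points have similarity 1; similarity decays with distance.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([0.0], [0.2]))
print(rbf_kernel([0.0], [1.0]))
```

With a width this small, points more than a few tenths apart have near-zero similarity, so such a setting is usually paired with features scaled to a narrow range.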
Article
Diabetes mellitus is a medical condition examined by data miners for reasons such as significant health complications in affected people, the economic impact on healthcare networks, and so on. In order to find the main causes of this disease, researchers look into the patient's lifestyle, hereditary information, etc. The goal of data mining in this context is to find patterns that make early detection of the disease and proper treatment easier. Due to the high volume of data involved in therapeutic contexts and disease diagnosis, provision of the intended treatment method becomes almost impossible over a short period of time. This justifies the use of pre-processing techniques and data reduction methods in such contexts. In this regard, clustering and meta-heuristic algorithms play important roles. In this paper, a method based on the k-means clustering algorithm is first utilized to detect and delete outliers. Then, in order to select significant and effective features, four bi-objective meta-heuristic algorithms are employed to choose the least number of significant features with the highest classification accuracy using support vector machines (SVM). In addition, the 10-fold cross validation (CV) method is used to validate the constructed model. Using real case data, it is concluded that the multi-objective firefly (MOFA) and multi-objective imperialist competitive algorithm (MOICA) with a 100% classification accuracy outperform the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO) with the accuracies of 98.2% and 94.6%, respectively.
Article
The influenza virus can spread easily, causing significant public health concern. Despite the existence of different techniques for rapid detection and prevention of influenza, their efficiency varies significantly. Additionally, there is currently a lack of a comprehensive, interoperable, and reusable real-time model for detecting influenza infection and predicting relationships within the field of influenza analysis. This study proposed a comprehensive, real-time model for rapid and early influenza detection using symptoms. Further, new relationships in the influenza field were discovered. Multiple data sources were used for the influenza knowledge graph (KG). Throughout this study, various graph algorithms were utilized to extract significant nodes and relationship features and multiple influenza detection machine learning (ML) models were compared. Node classification and link prediction methods were employed on a multi-layer perceptron (MLP) model. Furthermore, the hyperparameters of the model were automatically tuned. The proposed MLP model demonstrated the lowest rate of loss and the highest specificity, accuracy, recall, precision, and F1-score compared to state-of-the-art ML models. Moreover, the Matthews correlation coefficient was promising. This study shows that graph data science can improve MLP model detection and assist in discovering hidden connections in influenza KG.
Article
Background: COVID-19 is an infectious disease caused by SARS-CoV-2. The symptoms of COVID-19 vary from mild-to-moderate respiratory illnesses, and it sometimes requires urgent medication. Therefore, it is crucial to detect COVID-19 at an early stage through specific clinical tests, testing kits, and medical devices. However, these tests are not always available during the time of the pandemic. Therefore, this study developed an automatic, intelligent, rapid, and real-time diagnostic model for the early detection of COVID-19 based on its symptoms. Methods: A COVID-19 knowledge graph (KG), constructed from literature drawn from heterogeneous data, was imported to understand the different COVID-19 relations. We added human disease ontology to the COVID-19 KG and applied a node-embedding graph algorithm called fast random projection to extract an extra feature from the COVID-19 dataset. Subsequently, experiments were conducted using two machine learning (ML) pipelines to predict COVID-19 infection from its symptoms. Additionally, automatic tuning of the model hyperparameters was adopted. Results: We compared two graph-based ML models, logistic regression (LR) and random forest (RF) models. The proposed graph-based RF model achieved a small error rate = 0.0064 and the best scores on all performance metrics, including specificity = 98.71%, accuracy = 99.36%, precision = 99.65%, recall = 99.53%, and F1-score = 99.59%. Furthermore, the Matthews correlation coefficient achieved by the RF model was higher than that of the LR model. Comparative analysis with other ML algorithms and with studies from the literature showed that the proposed RF model exhibited the best detection accuracy. Conclusion: The graph-based RF model registered high performance in classifying the symptoms of COVID-19 infection, thereby indicating that graph data science, in conjunction with ML techniques, helps improve performance and accelerate innovations.
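The metrics listed in this abstract all derive from the four cells of a binary confusion matrix. The Python sketch below shows the standard definitions, including the Matthews correlation coefficient; the counts are invented for illustration and are not the study's data, and the function name is hypothetical. Note that the reported error rate of 0.0064 matches 1 minus the reported accuracy of 99.36%, which is the definition assumed here.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: ranges from -1 to 1.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,  # assumed definition of error rate
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells symmetrically, which makes it informative when the classes are imbalanced.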