Brain Sci. 2021, 11, 150. https://doi.org/10.3390/brainsci11020150 www.mdpi.com/journal/brainsci
Article
Bayesian Network as a Decision Tool for Predicting ALS Disease
Hasan Aykut Karaboga 1,2,*, Aslihan Gunel 3, Senay Vural Korkut 4, Ibrahim Demir 2 and Resit Celik 2
1 Department of Statistics, Amasya University, Amasya 05100, Turkey
2 Department of Statistics, Yildiz Technical University, Istanbul 34220, Turkey; karaboga@yildiz.edu.tr
(H.A.K.); idemir@yildiz.edu.tr (I.D.); rcelik@yildiz.edu.tr (R.C.)
3 Department of Chemistry, Ahi Evran University, Kirsehir 40200, Turkey; gunel.aslihan@gmail.com
4 Department of Molecular Biology and Genetics, Yildiz Technical University, Istanbul 34220, Turkey;
skorkut@yildiz.edu.tr
* Correspondence: karaboga@yildiz.edu.tr or h.aykut.karaboga@amasya.edu.tr; Tel.: +90-212-383-4427
Abstract: Clinical diagnosis of amyotrophic lateral sclerosis (ALS) is difficult in the early period of the disease. Blood tests, however, are less time-consuming and less costly than other diagnostic methods. ALS researchers have used machine learning methods to study the genetic architecture of the disease. In this study, we use Bayesian networks and other machine learning methods to predict ALS patients from blood plasma protein level and independent personal features. According to the comparison results, the Bayesian network produced the best results, with an accuracy of 0.887, an area under the curve (AUC) of 0.970, and the best values on the other comparison metrics. We confirmed that sex and age are influential variables in ALS. In addition, we found that the probability of onset involvement in ALS patients is very high. A person's other chronic or neurological diseases are also associated with ALS. Finally, we confirmed that the Parkin level may also have an effect on ALS: while this protein is at very low levels in Parkinson's patients, it is higher in ALS patients than in all control groups.
Keywords: motor neuron disease; amyotrophic lateral sclerosis; Parkinson’s disease; machine
learning; Bayesian networks; predictive model
1. Introduction
Amyotrophic lateral sclerosis (ALS) is a rare neurological disorder mainly caused by progressive degeneration of upper and lower motor neurons. Currently, it is not possible to cure or stop the progression of this disease [1]. ALS may initially affect only one hand or only one leg, making it difficult to walk in a straight line. As the disease progresses, patients develop severe muscle weakness, loss of muscle mass, impairment of speech, swallowing, and fine and gross motor function, and respiratory weakness. These lead to paralysis and death, usually within 2–5 years of diagnosis [2].
ALS is a multifactorial disease. Approximately 10% of ALS cases are familial (fALS) and 90% are sporadic (sALS) [3]. Although its etiology is largely unknown, mutations in various genes have been associated with ALS [4,5]. Several underlying biochemical mechanisms have also been proposed, such as protein aggregation, endoplasmic reticulum stress, oxidative stress, mitochondrial impairment, neuroinflammation, apoptotic cell death, glutamate excitotoxicity, abnormalities in RNA mechanisms, and abnormal function of the ubiquitin–proteasome system (UPS) [6].
ALS is typically an adult-onset disease, although juvenile forms exist. There are sex-dependent differences in disease development, with a slight male predominance [7,8]. ALS occurs all over the world and in people from all walks of life. Population-based studies have reported geographical variation in the incidence of ALS, which ranges from 0.6 to 11 cases per 100,000 per year. The prevalence of ALS is between 4.1 and 8.4 per 100,000 persons (reviewed in [9]).
Citation: Karaboga, H.A.; Gunel, A.;
Korkut, S.V.; Demir, I.; Celik, R.
Bayesian Network as a Decision Tool
for Predicting ALS Disease. Brain Sci.
2021, 11, 150. https://doi.org/10.3390/
brainsci11020150
Received: 21 December 2020
Accepted: 20 January 2021
Published: 23 January 2021
Publisher’s Note: MDPI stays
neutral with regard to jurisdictional
claims in published maps and
institutional affiliations.
Copyright: © 2021 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license
(http://creativecommons.org/licenses
/by/4.0/).
Clinical diagnosis of ALS is difficult in the early period because patients may not show any upper or lower motor neuron signs [10]. In addition, ALS symptoms can be quite heterogeneous and resemble those of many other neurological diseases. Currently, the diagnosis is made according to the El Escorial criteria of the World Federation of Neurology and is based on a complete neurological examination and on radiological and electrophysiological investigations [11]. These tests may take 3–6 months, which delays diagnosis after the emergence of early symptoms. More effective and earlier diagnosis of ALS would make it possible to prolong patients' survival and improve their quality of life.
Blood tests are less time-consuming and less costly than other diagnostic methods. In addition, the relationships between the values obtained from these analyses and other variables are very important. This study aims to develop a statistical machine learning model for predicting the risk of ALS using the Parkin protein concentration in blood plasma. For this purpose, data were obtained from an experimental study investigating the potential use of Parkin protein as a biomarker for the diagnosis of ALS. Patient records including age, gender, disease onset, and chronic disease information were also obtained from the same study. In this paper, we (1) developed a predictive model using Bayesian networks, (2) examined model performance by comparison with other machine learning methods, and (3) created queries based on patient type to evaluate the aforementioned variables. In the literature, machine learning methods have been used to examine the genetic architecture of ALS [12]. This study is the first in the literature with the features specified above.
2. Materials and Methods
In this section, we summarize the data used in the study and explain the basic steps of the experimental design of the machine learning methods used to classify and predict ALS from ALS-related feature interactions. The application process of the study is shown in Figure 1.
[Figure 1: flowchart. Recoverable box labels: Experimental Design; Clinical Trials; Sample Selection; Clinical Trials and Diagnosis; Patient Records; Informative Features Definition; Data Pre-processing; Data Selection and Missing Data Clean; Data Discretization; Data Preparation; Machine Learning Modeling; Bayesian Network Model; Other Machine Learning Models; Model Evaluation and Validation; Confusion Matrix; Comparison Metrics Calculation; ROC Analysis; Comparison of Obtained Results; Comparison and Prediction; Bayesian Network Model Queries; Evaluation and Prediction.]
Figure 1. Modeling process with machine learning methods.
Figure 1 shows the steps of the study. The first step was the clinical trials that produced the experimental data. The second step, after the properties related to ALS were determined, was data pre-processing. Next, the data were modeled with Bayesian networks and other machine learning algorithms, and the results were compared. The last step was evaluation based on the comparison results.
2.1. Participants
This data set was obtained from an experimental study investigating differences in the level of Parkin protein in blood plasma between ALS patients and other neurological cases, including multiple sclerosis, frontal dementia and Parkinson's disease. There are no missing data in the data set, as the patients' anamnesis was taken in detail.
The characteristics of the subjects are given in Table 1. We confirmed that sex, age, upper motor neuron (UMN), lower motor neuron (LMN) and bulbar onset types, total number of chronic diseases and Parkin level (ng/mL) are related to disease type. Of the subjects, 50.5% are ALS patients and 9.3% are Parkinson's patients. The Neurological Control (N-Control) group includes people with neurological diseases other than these two. The Control group consists of completely healthy individuals. In total, 204 individuals were included. All patients were diagnosed and treated by neurologists at Istanbul Medical University according to the El Escorial criteria [11].
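Table 1. Characteristics of patients.
Feature Name                        Feature Value          Freq.    %
SEX                                 Female                 79       38.7
SEX                                 Male                   125      61.3
AGE                                 Below 36               29       14.2
AGE                                 Between 36–52          70       34.3
AGE                                 Between 52–67          79       38.7
AGE                                 Above 67               26       12.7
UMN                                 No                     129      63.2
UMN                                 Yes                    75       36.8
LMN                                 No                     178      87.3
LMN                                 Yes                    26       12.7
BULBAR                              No                     182      89.2
BULBAR                              Yes                    22       10.8
Total Number of Chronic Diseases    Five                   1        0.5
Total Number of Chronic Diseases    Four                   1        0.5
Total Number of Chronic Diseases    Three                  12       5.9
Total Number of Chronic Diseases    Two                    20       9.8
Total Number of Chronic Diseases    One                    112      54.9
Total Number of Chronic Diseases    None                   58       28.4
PARKIN Level (ng/mL)                Above 3.74             31       15.2
PARKIN Level (ng/mL)                Between 2.79–3.74      17       8.3
PARKIN Level (ng/mL)                Between 2.06–2.79      36       17.6
PARKIN Level (ng/mL)                Between 1.36–2.06      52       25.5
PARKIN Level (ng/mL)                Below 1.36             68       33.3
Patient type                        ALS                    103      50.5
Patient type                        Control                42       20.6
Patient type                        N-Control              40       19.6
Patient type                        Parkinson              19       9.3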
2.2. Bayesian Networks
Bayesian networks are a graphical modeling approach that models the conditional
probabilistic relationships of certain independent variables. In a Bayes network model,
nodes correspond to variables, while arrows between nodes show the direct dependency
structure between these variables [13]. The direction of the arrow also indicates the di-
rection of the impact.
The probability table of any node X in the network gives the probabilities of the values X = x for each combination of states of the node's parents. The joint distribution of all nodes is the product of these tables:
$$P(X) = P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)) \qquad (1)$$
where $Pa(X_i)$ denotes the set of parents of node $X_i$.
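The factorization in Equation (1) can be illustrated with a minimal Python sketch. The three-node network (SEX and AGE as parents of TYPE) and all probability values below are hypothetical and chosen only for illustration; they are not taken from the study's network.

```python
from itertools import product

# Hypothetical network: SEX -> TYPE <- AGE (illustrative numbers only).
p_sex = {"F": 0.4, "M": 0.6}
p_age = {"young": 0.5, "old": 0.5}
# P(TYPE = "ALS" | SEX, AGE); P(TYPE = "other" | SEX, AGE) is the complement.
p_als = {("F", "young"): 0.1, ("F", "old"): 0.3,
         ("M", "young"): 0.2, ("M", "old"): 0.5}

def joint(sex, age, typ):
    """P(sex, age, typ) as the product of each node's table entry given its parents."""
    p_t = p_als[(sex, age)] if typ == "ALS" else 1.0 - p_als[(sex, age)]
    return p_sex[sex] * p_age[age] * p_t

# The joint distribution sums to 1 over all state combinations.
total = sum(joint(s, a, t) for s, a, t in product(p_sex, p_age, ["ALS", "other"]))
# Marginal probability of ALS, obtained by summing out the parents.
p_als_marginal = sum(joint(s, a, "ALS") for s, a in product(p_sex, p_age))
print(total, p_als_marginal)
```

Root nodes have no parents, so their tables are plain marginals; only TYPE needs a table indexed by its parents' states.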
These networks are widely used in medicine and biology [14–17]. Bayesian networks are particularly useful in risk assessment studies because posterior probabilities are easy to obtain and use [17,18]. The ability to refine the network as new information arrives makes it more useful and adaptive [19]. In addition, relationships and expert knowledge stated in the literature can be combined, as prior probabilities, with the probabilities obtained from the data. In this respect, Bayesian networks are superior to other machine learning methods [20]. They are statistically well founded because they are based on probability theory, and they are regarded as hybrid methods because they use both classical statistical techniques and heuristic algorithms [21].
2.3. Other Machine Learning Methods
Machine learning (ML) is a subfield of artificial intelligence (AI), and ML methods are becoming increasingly common in clinical research [12,22]. ML methods are usually grouped into three main categories: supervised, semi-supervised and unsupervised algorithms [23]. Supervised learning methods aim to predict an unknown outcome (e.g., disease type) from known features such as age, gender and type of onset [12,23]. Classification, similarity detection and regression are among the most common tasks of supervised machine learning methods [24].
In our study, we compared the Bayesian network with the following seven popular supervised machine learning techniques: Artificial Neural Networks, Logistic Regression, the Naïve Bayes algorithm, the J48 algorithm, Support Vector Machines, the KStar algorithm, and the k-Nearest Neighbor algorithm. For each method, we searched as extensively as possible for the settings that gave the best results.
An Artificial Neural Network (ANN) is a learning method that imitates the human brain in its ability to learn and generalize. These models basically consist of an input layer, a hidden layer and an output layer. One of their most important advantages is that they can handle nonlinear, complex relationships and missing data. During training, the models are optimized by backpropagation of errors. In addition, ANNs do not require the rigid assumptions of classical statistical methods, which is an advantage in modeling [25,26].
Logistic regression (LR) is one of the most widely used methods in biology and health science applications [27]. As in linear regression models, LR investigates the relationships between dependent and independent variables, and in terms of application it is similar to standard linear regression [28]. LR differs from standard regression models in the structure of the dependent variable: the most important difference is that the dependent variable in LR is dichotomous. When the dependent variable has more than two categories, LR can still be applied to estimate it [29].
The Naïve Bayes (NB) algorithm is one of the most important machine learning methods based on Bayes' rule. It is a classical Bayesian network built on an independence assumption: the features used to estimate the class are assumed to be independent of each other given the class [30]. It is a supervised learning algorithm and, despite being simple, it produces very successful results in medical applications [31,32].
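A minimal sketch of a Naïve Bayes classifier for discretized features may make the independence assumption concrete. The toy records, the Laplace smoothing parameter `alpha`, and the function names are assumptions made for illustration; this is not the WEKA implementation used in the study.

```python
from collections import Counter, defaultdict
import math

def train_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-feature value frequencies for each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    values = defaultdict(set)             # feature index -> observed values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, value_counts, values, alpha

def predict_nb(model, row):
    """Return the class with the highest log-posterior under the independence assumption."""
    class_counts, value_counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)          # log prior of class c
        for j, v in enumerate(row):
            # Laplace-smoothed estimate of P(feature j = v | class c)
            p = (value_counts[(j, c)][v] + alpha) / (nc + alpha * len(values[j]))
            score += math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy records: (sex, UMN involvement).
rows = [("M", "yes"), ("M", "yes"), ("F", "yes"),
        ("F", "no"), ("M", "no"), ("F", "no")]
labels = ["ALS", "ALS", "ALS", "Control", "Control", "Control"]
model = train_nb(rows, labels)
print(predict_nb(model, ("M", "yes")))
```

Because each feature contributes its own factor, the score is a sum of per-feature log-probabilities; no interaction between features is modeled.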
The J48 algorithm is one of the most important decision tree algorithms, and decision trees are among the most popular machine learning algorithms [33]. J48 is a modified version of the ID3 [34] and C4.5 algorithms [35,36]. The tree is built in the manner of the ID3, C4.5 and C5.0 algorithms, and criteria such as the Gini index, information gain or entropy reduction are used to choose the splits [33,36]. Another important feature is that it can make predictions with a smaller tree than other decision trees, which enables the J48 algorithm to produce more successful results than its counterparts [37].
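The entropy-based split criterion mentioned above can be sketched as follows. The toy feature and class values are hypothetical, and the sketch shows plain information gain only; C4.5 normalizes this quantity to a gain ratio, which is omitted here.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on `feature_values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: UMN involvement versus patient type.
umn = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
patient = ["ALS", "ALS", "ALS", "Control", "Control", "Control", "ALS", "Control"]
gain = information_gain(umn, patient)
print(entropy(patient), gain)
```

A tree learner of this family evaluates such a score for every candidate feature at a node and splits on the one with the highest value.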
Support Vector Machines (SVMs) are algorithms that use statistical learning theory to produce a consistent estimator from the available data [25]. An SVM tries to divide the data into two basic categories by constructing a separating hyperplane in n-dimensional space [38]. If the data are linearly separable, the optimization is carried out with a linear SVM; if not, quadratic optimization is carried out with a non-linear SVM [38–40]. Non-linear models use kernel functions for this purpose. The selected kernel function affects the performance of the system, and different kernel functions can give different results.
The KStar algorithm is one of the instance-based learning algorithms in the WEKA program [41]. It is described as a method that automatically reveals the number of clusters when that number is unknown [42]. The algorithm uses entropy as a measure of distance [43]. In this respect, it resembles a k-NN algorithm in which the distance between instances is measured by entropy [44].
The k-Nearest Neighbor (k-NN) algorithm classifies an instance according to its closest neighbors. It is one of the most popular algorithms in data mining [41] and is preferred for its simplicity and ease of understanding [45]. Its performance depends on the similarity function and on the value of the parameter k [46]. The algorithm estimates the probability that an instance belongs to a class from the classes of its nearest neighbors. In this respect, it is superior to the ANN, which is a completely black box. However, it can be difficult to determine the distance between neighbors [25].
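As a rough illustration of nearest-neighbor classification on discretized records, the following sketch uses the Hamming distance (the number of differing feature values) as the similarity function. Both the distance choice and the toy records are assumptions; the study does not state which distance was used.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two discretized records differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_rows, train_labels, query, k=1):
    """Majority vote among the k training records closest to `query`."""
    order = sorted(range(len(train_rows)),
                   key=lambda i: hamming(train_rows[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy records: (sex, age band, UMN involvement).
train_rows = [("M", "52-67", "yes"), ("F", "36-52", "no"),
              ("M", "36-52", "no"), ("F", "52-67", "yes")]
train_labels = ["ALS", "Control", "Control", "ALS"]
print(knn_predict(train_rows, train_labels, ("M", "52-67", "yes"), k=1))
```

With k = 1 the prediction is simply the label of the single closest record; larger k trades sensitivity to individual records for smoother decisions.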
2.4. Classification Criteria
A variety of criteria can be used to compare the performance of ML models; the choice depends on the structure of the data and the nature of the task [12,38,41]. In our study, the numbers of samples in the classes differ from each other. In addition, while ML studies generally involve two classes, we had four classes in this study. Increasing the number of classes can affect the results [47]. Since some evaluation measures are sensitive to unbalanced data, criteria such as the geometric mean and Youden's index were also used in the evaluation [48].
The criteria used in this section to determine which algorithms are effective are given in Table 2: Accuracy (ACC), Geometric Mean (GM), Error Rate (ERR), Precision (PREC), Sensitivity (SENS), Specificity (SPEC), F-Measure (FM), Matthews correlation coefficient (MCC), Youden's index (YI), Kappa (κ), False Positive Rate (FPR), and Receiver Operating Characteristic (ROC) Area. These criteria are calculated from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts, where TP is the number of correct positive predictions, FP the number of incorrect positive predictions, TN the number of correct negative predictions, and FN the number of incorrect negative predictions. These values are obtained from confusion matrices.
Accuracy is the proportion of true positive and true negative predictions among all model predictions. The geometric mean measures the balance between the results on the majority and minority classes [49]. Accuracy is affected by changes in the class distribution, but the geometric mean is not; for this reason the geometric mean is more suitable for imbalanced datasets [48]. The error rate is the complement of accuracy: it gives the proportion of misclassified samples over both the positive and negative classes. Precision is the proportion of positive predictions that are genuinely positive. Sensitivity and specificity, the true positive and true negative rates, complement each other. Sensitivity is the ratio of the number of correctly classified positive samples to the number of samples that are actually positive, while specificity is the same ratio for negative samples [50].
The F-Measure represents the balance between precision and sensitivity and is equal to their harmonic mean; a higher F-Measure indicates better classifier performance [51]. The Matthews correlation coefficient measures the correlation between observed and predicted classifications and is the comparison coefficient least affected by unbalanced data. Youden's index assesses a classifier's ability to avoid misclassification. Kappa measures the agreement between predicted and observed classes after correcting for the accuracy that could be obtained entirely by chance [52].
The receiver operating characteristic (ROC) curve plots sensitivity against 1 − specificity to help determine an appropriate balance between true and false positive rates. The ROC curve is one of the important comparison criteria in clinical studies. Classifiers are compared through the area under the curve (AUC) drawn for each class; a larger AUC indicates better classification results [53].
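The area under a ROC curve can be computed from classifier scores with the trapezoidal rule, as in the sketch below. The scores and labels are hypothetical, and treating one class as positive against the rest is an assumption about how per-class curves would be obtained in a four-class problem.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule; labels are 1 (positive) or 0."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]                 # (false positive rate, true positive rate)
    i = 0
    while i < len(pairs):
        # Process tied scores together so the curve does not depend on their order.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for one class versus the rest.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(auc)
```

The result equals the fraction of positive-negative pairs in which the positive sample receives the higher score, which is why an AUC of 0.5 corresponds to random ranking.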
Table 2. Evaluation criteria formulas.
Criteria                            Formula
Accuracy                            ACC = (TP + TN) / (P + N), with P = TP + FN and N = TN + FP
Geometric Mean                      GM = sqrt((TP / (TP + FN)) × (TN / (TN + FP)))
Error Rate                          ERR = (FP + FN) / (TP + TN + FP + FN)
Precision                           PREC = TP / (TP + FP)
Sensitivity                         SENS = TP / (TP + FN)
Specificity                         SPEC = TN / (FP + TN)
F-Measure                           F-Measure = 2 × TP / (2 × TP + FP + FN)
Matthews Correlation Coefficient    MCC = (TP × TN − FP × FN) / sqrt((TP + FP) × (TP + FN) × (TN + FP) × (TN + FN))
Youden's index                      YI = TPR + TNR − 1 = SENS + SPEC − 1
Kappa                               Kappa = 2 × (TP × TN − FN × FP) / (TP × FN + TP × FP + 2 × TP × TN + FN² + FN × TN + FP² + FP × TN)
Overall Kappa                       Kappa = (p0 − pe) / (1 − pe), where p0 = observed accuracy and pe = expected accuracy
False Positive Rate                 FPR = FP / (FP + TN)
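The formulas in Table 2 can be checked with a short Python sketch. The function name and the confusion counts are assumptions: the counts (TP = 16, TN = 179, FP = 6, FN = 3) are not reported in the paper but were back-calculated here so that the rounded outputs agree with the Bayesian Network row for the Parkinson class in Table 4 (19 Parkinson's patients out of 204 subjects).

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Evaluation criteria of Table 2 for one class treated as positive."""
    total = tp + tn + fp + fn
    sens = tp / (tp + fn)                 # true positive rate
    spec = tn / (tn + fp)                 # true negative rate
    return {
        "ACC": (tp + tn) / total,
        "GM": math.sqrt(sens * spec),
        "ERR": (fp + fn) / total,
        "PREC": tp / (tp + fp),
        "SENS": sens,
        "SPEC": spec,
        "FM": 2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "YI": sens + spec - 1,
        "Kappa": 2 * (tp * tn - fn * fp)
                 / ((tp + fp) * (fp + tn) + (tp + fn) * (fn + tn)),
        "FPR": fp / (fp + tn),
    }

# Hypothetical counts, back-calculated from the reported Parkinson rates.
m = binary_metrics(tp=16, tn=179, fp=6, fn=3)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The Kappa denominator is written in factored form; expanding it gives the same sum of products as the Kappa row of Table 2.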
In addition, 5-fold cross-validation was used to generate the estimation results in the analyses. The available data were divided into five parts; in each round, four parts were used for training and the remaining part was used for testing [51]. 5-fold cross-validation is one of the commonly used validation methods to increase model robustness [22].
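A 5-fold split of the 204 subjects can be sketched as below. The function name, the random seed and the shuffling strategy are assumptions; the paper does not state whether the folds were stratified by patient type, and this sketch does not stratify.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(204, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each subject appears in exactly one test fold, so the metrics aggregated over the five rounds cover the whole data set.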
3. Results
3.1. Bayesian Network Model
The Bayesian network model obtained from the data is given in Figure 2. Arrows show the relationships between variables in the network, and the direction of an arrow indicates the direction of the influence. The network was created with the academic version of GeNIe 2.1, a machine learning program based on Bayesian networks [54].
According to the Bayesian network model, the types of involvement, age, gender, Parkin protein level and the number of diseases directly affect the type of disease. In addition, the types of involvement affect the number of diseases. Since everyone outside the control group has at least one disease, it is expected that involvement will affect the number of diseases. UMN involvement is known to be one of the most important symptoms of ALS. In the model obtained from the analysis, the Parkin protein level was observed to affect UMN involvement.
Figure 2. Bayesian network model of dataset.
3.2. Comparison Results of Methods
The other machine learning models used for comparison were built with the WEKA program. WEKA is Java-based open source software created by the University of Waikato to facilitate the application of ML algorithms [41].
Classification performances of the algorithms according to the classification criteria
stated previously are given in Tables 3 and 4. The generalized results are shown in Table
3 and the results obtained for each class are shown in Table 4. The best classification re-
sults according to the criteria are marked in bold.
Overall, the Bayesian network produced more successful results than the other methods, and its classification results showed only small differences. The results of the other machine learning methods, on the other hand, were close to each other. A polynomial kernel (PolyKernel) was used for the SVM. For k-NN, the most successful result was obtained with the single nearest neighbor (k = 1).
Table 3 shows that the ACC of the Bayesian network is 88.7%, while the success rates of the other methods are approximately 80%. Since the Sensitivity and Precision values are identical in the general comparison, precision is not included in the table. Specificity, which expresses confidence in the results, is the proportion of negative samples that are correctly classified as negative [55]; it was high for all methods. However, the lowest false positive rate (0.024) was obtained with the Bayesian network. The same holds for the weighted ROC value, which was highest for the Bayesian network (0.970).
Table 3. Overall comparisons for methods.
Method ACC GM ERR SENS SPEC F-M MCC YI Kappa FPR ROC
Bayesian Network 0.887 0.882 0.113 0.887 0.976 0.887 0.862 0.863 0.828 0.024 0.970
Neural Network 0.828 0.826 0.172 0.828 0.963 0.828 0.787 0.791 0.741 0.037 0.953
Logistic Regression 0.819 0.817 0.181 0.819 0.960 0.819 0.772 0.778 0.727 0.040 0.951
Naive Bayes 0.799 0.800 0.201 0.799 0.940 0.799 0.736 0.739 0.693 0.060 0.951
J48 0.804 0.804 0.196 0.804 0.958 0.804 0.752 0.762 0.705 0.042 0.930
Support Vector Machine (SVM) 0.828 0.826 0.172 0.828 0.962 0.828 0.784 0.790 0.741 0.038 0.916
KStar 0.838 0.835 0.162 0.838 0.963 0.838 0.794 0.801 0.756 0.037 0.952
k-Nearest Neighbor (k-NN) 0.809 0.808 0.191 0.809 0.958 0.809 0.756 0.766 0.715 0.042 0.943
Table 4. Comparisons of Methods for ALS, Control, Neurological Control and Parkinson Disease.
Method ACC GM ERR PREC SENS SPEC F-M MCC YI Kappa
ALS
Bayesian Network 1.000 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Neural Network 0.985 0.985 0.015 1.000 0.971 1.000 0.985 0.971 0.971 0.971
Logistic Regression 0.975 0.975 0.025 1.000 0.951 1.000 0.975 0.952 0.951 0.951
Naive Bayes 0.971 0.970 0.029 0.962 0.981 0.960 0.971 0.941 0.941 0.941
J48 0.980 0.980 0.020 1.000 0.961 1.000 0.980 0.962 0.961 0.961
SVM 0.980 0.980 0.020 1.000 0.961 1.000 0.980 0.962 0.961 0.961
Kstar 0.975 0.976 0.025 0.990 0.961 0.990 0.975 0.951 0.951 0.951
k-NN 0.956 0.956 0.044 0.990 0.922 0.990 0.955 0.914 0.912 0.912
Control
Bayesian Network 0.917 0.874 0.083 0.791 0.810 0.944 0.800 0.747 0.754 0.747
Neural Network 0.882 0.813 0.118 0.714 0.714 0.926 0.714 0.640 0.640 0.640
Logistic Regression 0.868 0.845 0.132 0.642 0.810 0.883 0.716 0.638 0.692 0.631
Naive Bayes 0.882 0.854 0.118 0.680 0.810 0.901 0.739 0.668 0.711 0.664
J48 0.887 0.902 0.113 0.661 0.929 0.877 0.772 0.718 0.805 0.700
SVM 0.892 0.897 0.108 0.679 0.905 0.889 0.776 0.719 0.794 0.706
KStar 0.907 0.888 0.093 0.735 0.857 0.920 0.791 0.735 0.777 0.732
k-NN 0.912 0.891 0.088 0.750 0.857 0.926 0.800 0.746 0.783 0.744
Neurological Control
Bayesian Network 0.902 0.816 0.098 0.778 0.700 0.951 0.737 0.678 0.651 0.677
Neural Network 0.848 0.738 0.152 0.615 0.600 0.909 0.608 0.513 0.509 0.513
Logistic Regression 0.848 0.668 0.152 0.655 0.475 0.939 0.551 0.471 0.414 0.462
Naive Bayes 0.809 0.568 0.191 0.519 0.350 0.921 0.418 0.317 0.271 0.309
J48 0.819 0.532 0.181 0.571 0.300 0.945 0.393 0.320 0.245 0.299
SVM 0.843 0.650 0.157 0.643 0.450 0.939 0.529 0.449 0.389 0.439
KStar 0.853 0.670 0.147 0.679 0.475 0.945 0.559 0.485 0.420 0.474
k-NN 0.833 0.661 0.167 0.594 0.475 0.921 0.528 0.432 0.396 0.428
Parkinson
Bayesian Network 0.956 0.903 0.044 0.727 0.842 0.968 0.780 0.759 0.810 0.756
Neural Network 0.941 0.869 0.059 0.652 0.789 0.957 0.714 0.686 0.746 0.682
Logistic Regression 0.946 0.898 0.054 0.667 0.842 0.957 0.744 0.721 0.799 0.715
Naive Bayes 0.936 0.840 0.064 0.636 0.737 0.957 0.683 0.650 0.694 0.648
J48 0.922 0.832 0.078 0.560 0.737 0.941 0.636 0.600 0.677 0.593
SVM 0.941 0.842 0.059 0.667 0.737 0.962 0.700 0.669 0.699 0.667
KStar 0.941 0.920 0.059 0.630 0.895 0.946 0.739 0.721 0.841 0.707
k-NN 0.917 0.857 0.083 0.536 0.789 0.930 0.638 0.607 0.719 0.593
A graphical comparison of the results is given in Figure 3. The graph shows that the compared machine learning methods are close to each other and that the Bayesian network produces better results than all of them. The methods should be compared for each subclass as well as overall. The per-class results are given in Table 4. The Bayesian network produced more successful results in ALS estimation than the other methods: all individuals in the ALS patient group were classified correctly. The results obtained with the SVM and the Neural Network are also close to these values. All methods can therefore be said to predict ALS patients successfully. It is also very important, however, to estimate the individuals in the other classes correctly.
Figure 3. Overall comparison of methods.
For the control group, the Bayesian network gave the highest ACC value (0.917). The J48 algorithm, on the other hand, produced the best results according to the GM (0.902), SENS (0.929), and YI (0.805) criteria. However, the Bayesian network showed the best agreement between the data and the predictions according to the Kappa value (0.747) [56]. The Bayesian network also produced the best results on the remaining criteria for the control group.
As in the ALS group, the Bayesian network produced more successful results than the other methods on all comparison criteria for the Neurological Control group. For this group, the Bayesian network's ACC value was 0.902. The Kappa values of the other methods (0.299–0.513) indicate agreement much closer to chance, whereas the Kappa value of the Bayesian network was 0.677.
Results similar to those of the control group were obtained for the last group, Parkinson. The KStar algorithm produced the best results according to the GM (0.920), SENS (0.895) and YI (0.841) criteria. However, the results obtained for the Bayesian network are close to these values.
The ROC curves and the AUC values of the methods are given in Figure 4. According to these values, the AUC value of the Bayesian network for each class is higher than that of the other methods. This result supports the values given in Table 4.
[Figure 4, panels (a)–(h): ROC curves, True Positive Rate versus False Positive Rate. Recoverable legend of panel (c), ROC Curve for Logistic Regression: ALS (AUC = 0.984), Control (AUC = 0.920), Neurological Control (AUC = 0.891), Parkinson (AUC = 0.966).]
Figure 4. ROC Analysis Results of Methods; (a) Bayesian Network, (b) Neural Network, (c) Logistic Regression, (d) Naive
Bayes, (e) J48, (f) SVM, (g) KStar, (h) kNN.
3.3. Queries of Bayesian Network Model
One of the most important features of Bayesian networks is that predictions can be made by creating queries from the available information and data [20]. The known variables are entered as evidence, and the variables to be predicted are taken as target nodes. Table 5 gives the results of such queries: for a new person in one of the disease groups, it shows the probabilities of the states of the other variables.
Table 5. Bayesian network model queries for patient type. The four numeric columns give the evidence (Patient Type).
Target Node(s)                      Target Value           ALS      Control   N-Control   Parkinson
None                                None                   0.340    0.252     0.257       0.151
AGE                                 Below 36               0.119    0.249     0.106       0.078
AGE                                 Between 36–52          0.333    0.419     0.298       0.316
AGE                                 Between 52–67          0.437    0.232     0.449       0.430
AGE                                 Above 67               0.111    0.100     0.148       0.175
SEX                                 Female                 0.382    0.342     0.401       0.452
SEX                                 Male                   0.618    0.658     0.599       0.548
PARKIN Level (ng/mL)                Above 3.74             0.244    0.107     0.098       0.114
PARKIN Level (ng/mL)                Between 2.79–3.74      0.078    0.072     0.100       0.084
PARKIN Level (ng/mL)                Between 2.06–2.79      0.192    0.200     0.159       0.131
PARKIN Level (ng/mL)                Between 1.36–2.06      0.238    0.279     0.340       0.107
PARKIN Level (ng/mL)                Below 1.36             0.247    0.343     0.303       0.563
UMN                                 No                     0.273    0.840     0.844       0.733
UMN                                 Yes                    0.727    0.160     0.156       0.267
LMN                                 No                     0.822    0.912     0.913       0.852
LMN                                 Yes                    0.178    0.088     0.087       0.148
BULBAR                              No                     0.853    0.924     0.925       0.872
BULBAR                              Yes                    0.147    0.076     0.075       0.128
Total Number of Chronic Diseases    None                   0.040    0.696     0.288       0.075
Total Number of Chronic Diseases    One                    0.704    0.189     0.595       0.731
Total Number of Chronic Diseases    Two                    0.140    0.060     0.059       0.101
Total Number of Chronic Diseases    Three                  0.103    0.042     0.045       0.070
Total Number of Chronic Diseases    Four                   0.008    0.008     0.007       0.013
Total Number of Chronic Diseases    Five                   0.006    0.006     0.006       0.010
In Table 5, in the absence of any evidence, the prior probabilities of the patient types are P(Patient Type = ALS) = 0.340, P(Patient Type = Control) = 0.252, P(Patient Type = Neurological Control) = 0.257 and P(Patient Type = Parkinson) = 0.151. The conditional probabilities obtained for gender given ALS are shown in Equation (2).
$$\begin{aligned}
P(\text{SEX} = \text{Female} \mid \text{Patient Type} = \text{ALS}) &= 0.382 \\
P(\text{SEX} = \text{Male} \mid \text{Patient Type} = \text{ALS}) &= 0.618
\end{aligned} \qquad (2)$$
According to this result, about 62% of the ALS patients are men and 38% are women.
In addition, ALS is described in the literature largely as an adult-onset disease [9].
Consistent with this, 88.1% of the ALS patients were predicted to be older than 36 years.
Furthermore, 54.8% of the ALS patients and 60.5% of Parkinson’s patients were predicted
to be older than 52 years.
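Equation (2) conditions on the patient type. As an illustration only, the reverse (diagnostic) direction can be sketched with Bayes' rule using the priors and the SEX rows of Table 5. This query is not reported in the paper; the function name `posterior` and the dictionary layout are assumptions made for the example.

```python
# Values copied from Table 5. The inversion below is an illustrative sketch,
# not a query reported by the authors.
PRIOR = {"ALS": 0.340, "Control": 0.252, "N-Control": 0.257, "Parkinson": 0.151}
P_MALE_GIVEN_TYPE = {"ALS": 0.618, "Control": 0.658, "N-Control": 0.599, "Parkinson": 0.548}


def posterior(prior, likelihood):
    """Bayes' rule: P(type | evidence) is proportional to P(evidence | type) * P(type)."""
    joint = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(joint.values())  # P(evidence), by the law of total probability
    return {k: v / total for k, v in joint.items()}


post_male = posterior(PRIOR, P_MALE_GIVEN_TYPE)
for patient_type, p in post_male.items():
    print(f"P(Patient Type = {patient_type} | SEX = Male) = {p:.3f}")
```

With sex as the only evidence, the posterior for ALS stays close to its prior of 0.340, because the proportion of men is similar across the four groups in Table 5.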
 
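$$
\begin{aligned}
P(\mathrm{UMN} = \mathrm{No} \mid \text{Patient Type} = \mathrm{ALS}) &= 0.273 \\
P(\mathrm{UMN} = \mathrm{Yes} \mid \text{Patient Type} = \mathrm{ALS}) &= 0.727
\end{aligned}
\tag{3}
$$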
When the information given in Equation (3) is examined, it is predicted that 72.7% of the
ALS patients have UMN-type onset involvement. In addition, 82.2% of the ALS patients do
not have LMN onset involvement and 85.3% do not have bulbar onset involvement. However,
only 3.8% of the ALS patients were calculated to have no involvement of any type. In
summary, the probability of having at least one type of onset involvement in the ALS
patients was predicted to be 96.2%.
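The 96.2% figure follows from the complement rule applied to the reported 3.8%. The short check below reproduces it. As a hypothetical comparison that is not made in the paper, it also computes what the probability of no involvement would be if the three onset types were independent given ALS.

```python
# Marginal probabilities of "No" for each onset type in ALS patients (Table 5).
P_NO = {"UMN": 0.273, "LMN": 0.822, "BULBAR": 0.853}

# Joint probability of no involvement at all, as reported in the text (3.8%).
P_NO_INVOLVEMENT_REPORTED = 0.038

# Complement rule: P(at least one) = 1 - P(none).
p_at_least_one = 1.0 - P_NO_INVOLVEMENT_REPORTED

# Hypothetical comparison: if the onset types were independent given ALS,
# P(none) would be the product of the three marginals.
p_none_if_independent = 1.0
for p in P_NO.values():
    p_none_if_independent *= p

print(f"P(at least one onset type | ALS) = {p_at_least_one:.3f}")
print(f"P(none | ALS) under independence  = {p_none_if_independent:.3f}")
```

The independence product is about 0.191, well above the reported 0.038. This suggests that the reported value was obtained from the joint distribution encoded by the network and not by multiplying the marginals in Table 5.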
According to Table 5, 25.7% of the ALS patients have at least one disease other than
their own disease. This probability is 19.4% in Parkinson’s patients, 11.6% in the
control group and 11.7% in the neurological control group. Accordingly, additional
neurological or chronic diseases may be related to neurological diseases such as
Parkinson’s disease or ALS.
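The percentages in this paragraph match the sum of the rows "Two" through "Five" of the Total Number block in Table 5. The sketch below reproduces that arithmetic; reading "at least one other disease" as "two or more" is an inference from the numbers, and the helper name `prob_two_or_more` is hypothetical.

```python
# P(Total Number = k | Patient Type) from Table 5, ordered None, One, ..., Five.
TOTAL_NUMBER = {
    "ALS":       [0.040, 0.704, 0.140, 0.103, 0.008, 0.006],
    "Control":   [0.696, 0.189, 0.060, 0.042, 0.008, 0.006],
    "N-Control": [0.288, 0.595, 0.059, 0.045, 0.007, 0.006],
    "Parkinson": [0.075, 0.731, 0.101, 0.070, 0.013, 0.010],
}


def prob_two_or_more(dist):
    """Sum the probabilities of the categories Two, Three, Four and Five."""
    return sum(dist[2:])


multi = {group: round(prob_two_or_more(d), 3) for group, d in TOTAL_NUMBER.items()}
for group, p in multi.items():
    print(f"{group}: {p:.3f}")
```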
Moreover, the Parkin level is predicted to be higher than 1.36 ng/mL in 75.3% of the
ALS patients. This value is quite different in Parkinson’s patients: according to
Table 5, the Parkin level of 56.3% of Parkinson’s patients is lower than 1.36 ng/mL.
The Parkin level distribution is given in Figure 5, which also shows the differences in
protein level between the groups. The protein level is highest in the ALS patients and
lowest in Parkinson’s patients.
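To make the use of the Parkin intervals concrete, the following sketch maps a measured level to its Table 5 interval and looks up the probability of that interval in each group. The cut points and probabilities are taken from Table 5. The function names and the rule that a value exactly on a cut point falls into the higher interval are assumptions, since the paper does not state how boundaries are handled.

```python
from bisect import bisect_right

# Cut points (ng/mL) taken from the PARKIN rows of Table 5.
PARKIN_EDGES = [1.36, 2.06, 2.79, 3.74]

# P(Parkin interval | Patient Type) from Table 5, lowest interval first.
PARKIN_GIVEN_TYPE = {
    "ALS":       [0.247, 0.238, 0.192, 0.078, 0.244],
    "Control":   [0.343, 0.279, 0.200, 0.072, 0.107],
    "N-Control": [0.303, 0.340, 0.159, 0.100, 0.098],
    "Parkinson": [0.563, 0.107, 0.131, 0.084, 0.114],
}


def parkin_interval(level_ng_ml):
    """Return the interval index 0..4 (0 = lower than 1.36, 4 = upper than 3.74).

    Assumption: a value equal to a cut point is assigned to the higher interval.
    """
    return bisect_right(PARKIN_EDGES, level_ng_ml)


def interval_probabilities(level_ng_ml):
    """Look up P(observed interval | Patient Type) for every group."""
    idx = parkin_interval(level_ng_ml)
    return {group: probs[idx] for group, probs in PARKIN_GIVEN_TYPE.items()}


print(interval_probabilities(4.2))  # a level above 3.74 ng/mL
print(interval_probabilities(0.9))  # a level below 1.36 ng/mL
```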
Figure 5. Sex vs. Parkin level (ng/mL) by the type of disease.
4. Discussion and Conclusions
The use of machine learning (ML) methods with personal medical records in medical
decision-making processes is increasing. In this study, a Bayesian network, one of the
most beneficial ML methods in clinical decision-making, was used for the prediction of
ALS based on differences in the level of a plasma protein, onset, age, sex, and total
number of patience. The results were then compared with some popular ML algorithms. To
the best of our knowledge, this is the first study to compare the performance of a
Bayesian network model and ML models for predicting ALS using these variables.
Bayesian networks (BNs) are probabilistic expert systems that use probability as a
measure of uncertainty in order to obtain a graphical structure that best represents the
data [57,58]. Since a BN uses all the variables in the model, it can easily be used in
cases where there is missing data [13,59]. Diagnostic reasoning in a BN makes it possible
to reach a judgement about the patient and the disease by observing various symptoms [60].
Unlike ML methods such as NN, LR, and SVM, BN is a method of inference and reasoning.
These features allow queries that reveal cause-effect relationships between variables in
the model [13]. The posterior probability values of the network are updated with every
new piece of information acquired. Therefore, the use of BNs in prediction problems
produces more effective results [61]. The transparency of all relationships in the
network structure gives BN an advantage over other ML methods such as k-NN, NN and LR.
In addition, it can produce successful results in cases where the data set is small and
the number of variables is high [62]. Discretization, which causes loss of information,
is the main drawback of BNs [63]. However, working with discrete data increases the
power of accurate prediction regarding classes [64]. All these features have made BN a
preferred method in clinical studies [59,62,65–68].
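The idea that posterior probabilities are updated with each new piece of evidence, and that unobserved variables can simply be left out, can be sketched as follows. This is a simplified illustration and not the network learned by the authors: it assumes that the evidence variables are conditionally independent given the patient type (a naive Bayes structure), and it uses only the Table 5 values. The function name `update` and the example observations are hypothetical.

```python
# Priors and conditional probabilities copied from Table 5.
PRIOR = {"ALS": 0.340, "Control": 0.252, "N-Control": 0.257, "Parkinson": 0.151}
LIKELIHOOD = {
    ("UMN", "Yes"): {"ALS": 0.727, "Control": 0.160, "N-Control": 0.156, "Parkinson": 0.267},
    ("SEX", "Male"): {"ALS": 0.618, "Control": 0.658, "N-Control": 0.599, "Parkinson": 0.548},
}


def update(belief, node, value):
    """Update the belief over patient types with one observation.

    Assumption: observations are conditionally independent given the patient
    type. A missing observation (value is None) leaves the belief unchanged,
    which is what marginalizing over it gives under this assumption.
    """
    if value is None:
        return dict(belief)
    like = LIKELIHOOD[(node, value)]
    joint = {k: belief[k] * like[k] for k in belief}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


belief = dict(PRIOR)
observations = [("UMN", "Yes"), ("LMN", None), ("SEX", "Male")]  # LMN not recorded
history = []
for node, value in observations:
    belief = update(belief, node, value)
    history.append((node, value, round(belief["ALS"], 3)))
    print(f"after {node} = {value}: P(ALS) = {belief['ALS']:.3f}")
```

Under these assumptions, observing UMN involvement raises the probability of ALS from 0.340 to about 0.67, the missing LMN value leaves it unchanged, and sex changes it only slightly. A full Bayesian network would additionally account for dependencies among the evidence variables.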
Unlike previous studies in the literature, this study includes three control groups:
Parkinson’s disease patients, neurological controls and healthy controls. Comparing
results across different control groups containing a large number of subjects improves
the applicability of the study in practice.
According to the results of this study, ALS is more likely to be seen in men than in
women. Various studies have also indicated that gender is an independent variable
affecting ALS along with other demographic factors [5,69,70]. Gender was an influential
variable, and it was confirmed that ALS is more common in males [7,8,71–73]. There are
studies showing that the onset of the disease differs by sex in ALS patients with
different mutations [70,74]. Although the mutated gene is known for some of the ALS
patients in this study, this information was not taken into consideration. In the
future, a similar analysis can be applied to an ALS patient group that is more
homogeneous in terms of mutation.
With the algorithm used in this study, the probability of having ALS was found to
increase with age. This finding is consistent with the results of previous studies
[75,76]. UMN, LMN and bulbar are the onset types seen in ALS. The probability of having
at least one kind of onset involvement in the ALS patients was found to be 96.2%. UMN
was determined to be the most common type of involvement; LMN and bulbar are less
common. ALS patients can present with either an LMN- or a UMN-prevalent phenotype [77].
Particular clinical and demographic characteristics of ALS phenotypes have previously
been demonstrated in a population-based study with a large epidemiological setting. The
likelihood of a specific phenotype changes across age and gender groups. The bulbar
phenotype occurs mostly in elderly patients, with almost equal incidence rates in the
two genders [76].
In particular, ALS is considered a multifactorial disease that is influenced by
environmental and genetic factors. Other neurological diseases that a person has can
have a small effect on ALS. It is thought that brain damage and mutations [77] caused by
other diseases such as schizophrenia [78], Alzheimer’s disease, Parkinson’s disease, or
frontotemporal dementia [79,80] may be associated with ALS. In our study, the
probability of having multiple diseases was higher in the ALS patients than in the
control and neurological control groups. Similar results were seen in Parkinson’s
patients. Therefore, it will be beneficial to treat patients with multiple-disease
situations in mind.
According to all these results, the algorithms we used and the Bayesian network can
predict the correct classes with high accuracy when information such as the type of
involvement, Parkin protein level, age, and the number of various chronic diseases is
considered. Although other machine learning algorithms also produce highly successful
results, the most important advantage of the Bayesian network in this regard is that it
can be updated with new information, which increases its success. In this respect, it
provides more useful results than other machine learning methods such as artificial
neural networks, which behave as black boxes. In prospective studies, the change in the
results should be examined by increasing the number of samples and using more variables.
The results obtained from statistical and computational methods may be more useful in
combination with neuroimaging methods. There is such a study in the literature [81]. A
similar approach can be used to classify images of different brain networks as
alternative or additional views, and the entire multi-view (MV) framework can be further
extended to combine imaging with non-imaging views, such as clinical, behavioral, or
even genetic multidimensional data, when available from the same subjects.
Author Contributions: Conceptualization, A.G., S.V.K., R.C., and H.A.K.; methodology, H.A.K.
and I.D.; software, H.A.K. and I.D.; validation, A.G., R.C., and S.V.K.; formal analysis, H.A.K.;
investigation, A.G., S.V.K., and H.A.K.; resources, H.A.K., A.G., and S.V.K.; data curation, A.G.,
S.V.K., and R.C.; writing—original draft preparation, H.A.K., A.G. and S.V.K.; writing—review and
editing, H.A.K., A.G., and S.V.K.; visualization, H.A.K. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Ethical approval for this study was obtained from
Istanbul University, Faculty of Medicine (file number 2016/665, meeting 10 on 24 May 2016).
All volunteers participating in the study declared their consent and signed the related
ethical documents.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the
study.
Acknowledgments: We are deeply grateful to Atilla Halil Idrisoglu for his valuable support in the
collection of blood samples.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Rowland, L.P.; Shneider, N.A. Amyotrophic Lateral Sclerosis. N. Engl. J. Med. 2001, 344, 1688–1700,
doi:10.1056/NEJM200105313442207.
2. Hardiman, O.; van den Berg, L.H.; Kiernan, M.C. Clinical Diagnosis and Management of Amyotrophic Lateral Sclerosis. Nat.
Rev. Neurol. 2011, 7, 639–649, doi:10.1038/nrneurol.2011.153.
3. Swinnen, B.; Robberecht, W. The Phenotypic Variability of Amyotrophic Lateral Sclerosis. Nat. Rev. Neurol. 2014, 10, 661,
doi:10.1038/nrneurol.2014.184.
4. Al-Chalabi, A.; Hardiman, O. The Epidemiology of ALS: A Conspiracy of Genes, Environment and Time. Nat. Rev. Neurol.
2013, 9, 617, doi:10.1038/nrneurol.2013.203.
5. Filippini, T.; Fiore, M.; Tesauro, M.; Malagoli, C.; Consonni, M.; Violi, F.; Arcolin, E.; Iacuzio, L.; Oliveri Conti, G.; Cristaldi, A.;
et al. Clinical and Lifestyle Factors and Risk of Amyotrophic Lateral Sclerosis: A Population-Based Case-Control Study. Int. J.
Environ. Res. Public Health 2020, 17, 857, doi:10.3390/ijerph17030857.
6. Mendonça, D.M.F.; Pizzati, L.; Mostacada, K.; Martins, S.C.D.S.; Higashi, R.; Sá, L.A.; Neto, V.M.; Chimelli, L.; Martinez,
A.M.B. Neuroproteomics: an insight into ALS. Neurol. Res. 2012, 34, 937–943, doi:10.1179/1743132812y.0000000092.
7. Manjaly, Z.R.; Scott, K.M.; Abhinav, K.; Wijesekera, L.; Ganesalingam, J.; Goldstein, L.H.; Janssen, A.; Dougherty, A.; Willey, E.;
Stanton, B.R.; et al. The Sex Ratio in Amyotrophic Lateral Sclerosis: A Population Based Study. Amyotroph. Lateral Scler. 2010,
11, 439–442, doi:10.3109/17482961003610853.
8. Pape, J.A.; Grose, J.H. The Effects of Diet and Sex in Amyotrophic Lateral Sclerosis. Revue Neurol. 2020, 176, 301–315,
doi:10.1016/j.neurol.2019.09.008.
9. Longinetti, E.; Fang, F. Epidemiology of Amyotrophic Lateral Sclerosis: An Update of Recent Literature. Curr. Opin. Neurol.
2019, 32, 771, doi:10.1097/WCO.0000000000000730.
10. Le Gall, L.; Anakor, E.; Connolly, O.; Vijayakumar, U.G.; Duddy, W.J.; Duguez, S. Molecular and Cellular Mechanisms
Affected in ALS. J. Pers. Med. 2020, 10, 101, doi:10.3390/jpm10030101.
11. Brooks, B.R.; Miller, R.G.; Swash, M.; Munsat, T.L. El Escorial Revisited: Revised Criteria for the Diagnosis of Amyotrophic
Lateral Sclerosis. Amyotroph. Lateral Scler. Other Mot. Neuron Disord. 2000, 1, 293–299, doi:10.1080/146608200300079536.
12. Vasilopoulou, C.; Morris, A.P.; Giannakopoulos, G.; Duguez, S.; Duddy, W. What Can Machine Learning Approaches in
Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis? J. Pers. Med. 2020, 10, 247,
doi:10.3390/jpm10040247.
13. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Revised Second Printing; Morgan Kaufmann:
San Francisco (CA), 2014; ISBN 978-0-08-051489-5.
14. Bandyopadhyay, S.; Wolfson, J.; Vock, D.M.; Vazquez-Benitez, G.; Adomavicius, G.; Elidrisi, M.; Johnson, P.E.; OConnor, P.J.
Data Mining for Censored Time-to-Event Data: A Bayesian Network Model for Predicting Cardiovascular Risk from Electronic
Health Record Data. Data Min. Knowl. Discov. 2015, 29, 1033–1069, doi:10.1007/s10618-014-0386-6.
15. Kanwar, M.K.; Lohmueller, L.C.; Kormos, R.L.; Teuteberg, J.J.; Rogers, J.G.; Lindenfeld, J.; Bailey, S.H.; McIlvennan, C.K.;
Benza, R.; Murali, S.; et al. A Bayesian Model to Predict Survival after Left Ventricular Assist Device Implantation. JACC Heart
Fail. 2018, 6, 771–779, doi:10.1016/j.jchf.2018.03.016.
16. Kraisangka, J.; Druzdzel, M.J.; Benza, R.L. A Risk Calculator for the Pulmonary Arterial Hypertension Based on a Bayesian
Network. In Proceedings of the BMA@ UAI, New York, NY, USA, 29 June 2016; pp. 1–6.
17. Arora, P.; Boyne, D.; Slater, J.J.; Gupta, A.; Brenner, D.R.; Druzdzel, M.J. Bayesian Networks for Risk Prediction Using
Real-World Data: A Tool for Precision Medicine. Value Health 2019, 22, 439–445, doi:10.1016/j.jval.2019.01.006.
18. Gupta, A.; Slater, J.J.; Boyne, D.; Mitsakakis, N.; Béliveau, A.; Druzdzel, M.J.; Brenner, D.R.; Hussain, S.; Arora, P. Probabilistic
Graphical Modeling for Estimating Risk of Coronary Artery Disease: Applications of a Flexible Machine-Learning Method.
Med. Decis. Mak. 2019, 39, 1032–1044, doi:10.1177/0272989X19879095.
19. Lam, W.; Bacchus, F. Learning Bayesian Belief Networks: An Approach Based on the Mdl Principle. Comput. Intell. 1994, 10,
269–293, doi:10.1111/j.1467-8640.1994.tb00166.x.
20. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge,
MA, USA, 2009.
21. Probabilistic Modeling in Bioinformatics and Medical Informatics; Husmeier, D., Dybowski, R., Roberts, S., Eds.; Advanced
Information and Knowledge Processing; Springer-Verlag: London, 2005; ISBN 978-1-85233-778-0.
22. Senders, J.T.; Staples, P.C.; Karhade, A.V.; Zaki, M.M.; Gormley, W.B.; Broekman, M.L.D.; Smith, T.R.; Arnaout, O. Machine
Learning and Neurosurgical Outcome Prediction: A Systematic Review. World Neurosurg. 2018, 109, 476-486.e1,
doi:10.1016/j.wneu.2017.09.149.
23. Deo, R.C. Machine Learning in Medicine. Circulation 2015, 132, 1920–1930, doi:10.1161/CIRCULATIONAHA.115.001593.
24. Yu, K.-H.; Beam, A.L.; Kohane, I.S. Artificial Intelligence in Healthcare. Nat. Biomed. Eng. 2018, 2, 719–731,
doi:10.1038/s41551-018-0305-z.
25. Dreiseitl, S.; Ohno-Machado, L. Logistic Regression and Artificial Neural Network Classification Models: A Methodology
Review. J. Biomed. Inform. 2002, 35, 352–359.
26. Gevrey, M.; Dimopoulos, I.; Lek, S. Review and Comparison of Methods to Study the Contribution of Variables in Artificial
Neural Network Models. Ecol. Model. 2003, 160, 249–264, doi:10.1016/S0304-3800(02)00257-0.
27. Christodoulou, E.; Ma, J.; Collins, G.S.; Steyerberg, E.W.; Verbakel, J.Y.; Van Calster, B. A Systematic Review Shows No
Performance Benefit of Machine Learning over Logistic Regression for Clinical Prediction Models. J. Clin. Epidemiol. 2019, 110,
12–22, doi:10.1016/j.jclinepi.2019.02.004.
28. Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013;
Volume 398.
29. Kleinbaum, D.G.; Dietz, K.; Gail, M.; Klein, M.; Klein, M. Logistic Regression; Springer-Verlag: New York, NY, USA, 2002.
30. John, G.; Langley, P. Estimating Continuous Distributions in Bayesian Classifiers. In Proceedings of the Eleventh Conference on
Uncertainty in Artificial Intelligence; Morgan Kaufmann, San Mateo, 1995; pp. 338–345.
31. Wei, W.; Visweswaran, S.; Cooper, G.F. The Application of Naive Bayes Model Averaging to Predict Alzheimer’s Disease from
Genome-Wide Data. J. Am. Med. Inform. Assoc. 2011, 18, 370–375, doi:10.1136/amiajnl-2011-000101.
32. Jiang, W.; Shen, Y.; Ding, Y.; Ye, C.; Zheng, Y.; Zhao, P.; Liu, L.; Tong, Z.; Zhou, L.; Sun, S.; et al. A Naive Bayes Algorithm for
Tissue Origin Diagnosis (TOD-Bayes) of Synchronous Multifocal Tumors in the Hepatobiliary and Pancreatic System. Int. J.
Cancer 2018, 142, 357–368.
33. Rokach, L.; Maimon, O. Decision Trees. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.;
Springer US: Boston, MA, 2005; pp. 165–192 ISBN 978-0-387-25465-4.
34. Kaur, G.; Chhabra, A. Improved J48 Classification Algorithm for the Prediction of Diabetes. Int. J. Comput. Appl. 2014, 98, 13–17.
35. Quinlan, J.R. Improved Use of Continuous Attributes in C4.5. J. Artif. Intell. Res. 1996, 4, 77–90, doi:10.1613/jair.279.
36. Yadav, A.K.; Chandel, S. Solar Energy Potential Assessment of Western Himalayan Indian State of Himachal Pradesh Using
J48 Algorithm of WEKA in ANN Based Prediction Model. Renew. Energy 2015, 75, 675–693, doi:10.1016/j.renene.2014.10.046.
37. bin Othman, M.F.; Yau, T.M.S. Comparison of Different Classification Techniques Using WEKA for Breast Cancer. In
Proceedings of the 3rd Kuala Lumpur International Conference on Biomedical Engineering 2006, Kuala Lumpur, Malaysia, 11–14
December 2006; Ibrahim, F., Osman, N.A.A., Usman, J., Kadri, N.A., Eds.; Springer: Berlin, Heidelberg, 2007; pp. 520–523.
38. Alpaydin, E. Introduction to Machine Learning; MIT Press: Cambridge, MA, USA, 2004.
39. Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; Adaptive
Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2009; ISBN 0-262-19475-9.
40. Cristianini, N.; Shawe-Taylor, J.; others An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods;
Cambridge University Press: Cambridge, UK, 2000; ISBN 0-521-78019-5.
41. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations,
4th ed.; Morgan Kaufmann Publishers: Burlington, MA, USA, 2017; ISBN 1-55860-552-5.
42. Pinto, D.; Tovar, M.; Vilarino, D.; Beltrán, B.; Jiménez-Salazar, H.; Campos, B. BUAP: Performance of K-Star at the INEX’09
Clustering Task. In Proceedings of the International Workshop of the Initiative for the Evaluation of XML Retrieval, Brisbane,
QLD, Australia, 7–9 December 2009; pp. 434–440.
43. Painuli, S.; Elangovan, M.; Sugumaran, V. Tool Condition Monitoring Using K-Star Algorithm. Expert Syst. Appl. 2014, 41,
2638–2643, doi:10.1016/j.eswa.2013.11.005.
44. Wiharto, W.; Kusnanto, H.; Herianto, H. Intelligence System for Diagnosis Level of Coronary Heart Disease with K-Star
Algorithm. Healthc. Inform. Res. 2016, 22, 30–38, doi:10.4258/hir.2016.22.1.30.
45. Zhang, S.; Cheng, D.; Deng, Z.; Zong, M.; Deng, X. A Novel KNN Algorithm with Data-Driven k Parameter Computation.
Pattern Recognit. Lett. 2018, 109, 44–54, doi:10.1016/j.patrec.2017.09.036.
46. Filiz, E.; Öz, E. Educational Data Mining Methods For Timss 2015 Mathematics Success: Turkey Case. Sigma J. Eng. Nat. Sci.
/Mühendislik ve Fen Bilimleri Dergisi 2020, 38, 963-977.
47. Ballabio, D.; Grisoni, F.; Todeschini, R. Multivariate Comparison of Classification Performance Measures. Chemom. Intell. Lab.
Syst. 2018, 174, 33–44, doi:10.1016/j.chemolab.2017.12.004.
48. Tharwat, A. Classification Assessment Methods. Appl. Comput. Inform. 2020, doi:10.1016/j.aci.2018.08.003.
49. Kuncheva, L.I.; Arnaiz-González, Á.; Díez-Pastor, J.-F.; Gunn, I.A.D. Instance Selection Improves Geometric Mean Accuracy: A
Study on Imbalanced Data Classification. Prog. Artif. Intell. 2019, 8, 215–228, doi:10.1007/s13748-019-00172-4.
50. Marsland, S. Machine Learning: An Algorithmic Perspective, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2015; ISBN
978-1-4665-8333-7.
51. Sakr, S.; Elshawi, R.; Ahmed, A.M.; Qureshi, W.T.; Brawner, C.A.; Keteyian, S.J.; Blaha, M.J.; Al-Mallah, M.H. Comparison of
Machine Learning Techniques to Predict All-Cause Mortality Using Fitness Data: The Henry Ford ExercIse Testing (FIT)
Project. BMC Med. Inform. Decis. Mak. 2017, 17, 174, doi:10.1186/s12911-017-0566-6.
52. Akosa, J. Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data. In Proceedings of the SAS
Global Forum, Oklahoma State University, Orlando, FL, USA, 2-5 April 2017; pp. 2–5.
53. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874, doi:10.1016/j.patrec.2005.10.010.
54. BayesFusion, LLC. GeNIe Modeler User Manual; BayesFusion, LLC: Pittsburgh, PA, USA, 2017.
55. Powers, D. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. J. Mach. Learn.
Technol. 2011, 2, 37–63.
56. Viera, A.J.; Garrett, J.M. Understanding Interobserver Agreement: The Kappa Statistic. Fam. Med. 2005, 37, 360–363.
57. Spiegelhalter, D.J.; Dawid, A.P.; Lauritzen, S.L.; Cowell, R.G. Bayesian Analysis in Expert Systems. Stat. Sci. 1993, 8, 219–247.
58. Jensen, F.V.; Nielsen, T.D. Bayesian Networks and Decision Graphs, 2nd ed.; Information Science and Statistics; Springer: New
York, NY, USA, 2007; ISBN 978-0-387-68281-5.
59. Lin, J.-H.; Haug, P.J. Exploiting Missing Clinical Data in Bayesian Network Modeling for Predicting Medical Problems. J.
Biomed. Inform. 2008, 41, 1–14, doi:10.1016/j.jbi.2007.06.001.
60. Korb, K.B.; Nicholson, A.E. Bayesian Artificial Intelligence; CRC Press: Boca Raton, FL, USA, 479.
61. Chen, R.; Herskovits, E.H. Clinical Diagnosis Based on Bayesian Classification of Functional Magnetic-Resonance Data.
Neuroinformatics 2007, 5, 178–188, doi:10.1007/s12021-007-0007-2.
62. Luo, Y.; El Naqa, I.; McShan, D.L.; Ray, D.; Lohse, I.; Matuszak, M.M.; Owen, D.; Jolly, S.; Lawrence, T.S.; Kong, F.-M.; et al.
Unraveling Biophysical Interactions of Radiation Pneumonitis in Non-Small-Cell Lung Cancer via Bayesian Network Analysis.
Radiother. Oncol. 2017, 123, 85–92, doi:10.1016/j.radonc.2017.02.004.
63. Nojavan, A., F.; Qian, S.S.; Stow, C.A. Comparative Analysis of Discretization Methods in Bayesian Networks. Environ. Model.
Softw. 2017, 87, 64–71, doi:10.1016/j.envsoft.2016.10.007.
64. Yang, Y.; Webb, G.I. A Comparative Study of Discretization Methods for Naive-Bayes Classifiers. In Proceedings of the PKAW
2002: The 2002 Pacific Rim Knowledge Acquisition Workshop, Tokyo, Japan, 18-19 August 2002; pp. 159–173.
65. Rodríguez-López, V.; Cruz-Barbosa, R. Improving Bayesian Networks Breast Mass Diagnosis by Using Clinical Data. In
Proceedings of the Pattern Recognition; Mexico City, Mexico, 24–27 June 2015; Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F.,
Sossa-Azuela, J.H., Olvera López, J.A., Famili, F., Eds.; Springer International Publishing: Cham, Switzerland, 2015;
pp. 292–301.
66. Nagarajan, R.; Scutari, M.; Lèbre, S. Bayesian Networks in R: With Applications in Systems Biology; Use R!; Springer: New York, NY,
USA, 2013; ISBN 978-1-4614-6445-7.
67. Antal, P.; Fannes, G.; Timmerman, D.; Moreau, Y.; De Moor, B. Using Literature and Data to Learn Bayesian Networks as
Clinical Models of Ovarian Tumors. Artif. Intell. Med. 2004, 30, 257–281, doi:10.1016/j.artmed.2003.11.007.
68. Khanna, S.; Domingo-Fernández, D.; Iyappan, A.; Emon, M.A.; Hofmann-Apitius, M.; Fröhlich, H. Using Multi-Scale Genetic,
Neuroimaging and Clinical Data for Predicting Alzheimer’s Disease and Reconstruction of Relevant Biological Mechanisms.
Sci. Rep. 2018, 8, 11173, doi:10.1038/s41598-018-29433-3.
69. Palmieri, A.; Mento, G.; Calvo, V.; Querin, G.; D’Ascenzo, C.; Volpato, C.; Kleinbub, J.R.; Bisiacchi, P.S.; Sorarù, G. Female
Gender Doubles Executive Dysfunction Risk in ALS: A Case-Control Study in 165 Patients. J. Neurol. Neurosurg. Psychiatry.
2015, 86, 574–579, doi:10.1136/jnnp-2014-307654.
70. Trojsi, F.; D’Alvano, G.; Bonavita, S.; Tedeschi, G. Genetics and Sex in the Pathogenesis of Amyotrophic Lateral Sclerosis (ALS):
Is There a Link? Int. J. Mol. Sci. 2020, 21, 3647, doi:10.3390/ijms21103647.
71. Chiò, A.; Moglia, C.; Canosa, A.; Manera, U.; D’Ovidio, F.; Vasta, R.; Grassano, M.; Brunetti, M.; Barberis, M.; Corrado, L.; et al.
ALS Phenotype Is Influenced by Age, Sex, and Genetics: A Population-Based Study. Neurology 2020, 94, e802–e810,
doi:10.1212/WNL.0000000000008869.
72. Ingre, C.; Roos, P.M.; Piehl, F.; Kamel, F.; Fang, F. Risk Factors for Amyotrophic Lateral Sclerosis. Clin. Epidemiol. 2015, 7,
181–193, doi:10.2147/CLEP.S37505.
73. Trojsi, F.; Siciliano, M.; Femiano, C.; Santangelo, G.; Lunetta, C.; Calvo, A.; Moglia, C.; Marinou, K.; Ticozzi, N.; Ferro, C.; et al.
Comparative Analysis of C9orf72 and Sporadic Disease in a Large Multicenter ALS Population: The Effect of Male Sex on
Survival of C9orf72 Positive Patients. Front. Neurosci. 2019, 13, 485, doi:10.3389/fnins.2019.00485.
74. Rooney, J.; Fogh, I.; Westeneng, H.-J.; Vajda, A.; McLaughlin, R.; Heverin, M.; Jones, A.; van Eijk, R.; Calvo, A.; Mazzini, L.;
et al. C9orf72 Expansion Differentially Affects Males with Spinal Onset Amyotrophic Lateral Sclerosis. J. Neurol. Neurosurg.
Psychiatry 2017, 88, 281, doi:10.1136/jnnp-2016-314093.
75. Atsuta, N.; Watanabe, H.; Ito, M.; Tanaka, F.; Tamakoshi, A.; Nakano, I.; Aoki, M.; Tsuji, S.; Yuasa, T.; Takano, H.; et al. Age at
Onset Influences on Wide-Ranged Clinical Features of Sporadic Amyotrophic Lateral Sclerosis. J. Neurol. Sci. 2009, 276,
163–169, doi:10.1016/j.jns.2008.09.024.
76. Chiò, A.; Calvo, A.; Moglia, C.; Mazzini, L.; Mora, G.; PARALS Study Group. Phenotypic Heterogeneity of Amyotrophic
Lateral Sclerosis: A Population Based Study. J. Neurol. Neurosurg. Psychiatry 2011, 82, 740–746, doi:10.1136/jnnp.2010.235952.
77. Connolly, O.; Le Gall, L.; McCluskey, G.; Donaghy, C.G.; Duddy, W.J.; Duguez, S. A Systematic Review of
Genotype–Phenotype Correlation across Cohorts Having Causal Mutations of Different Genes in ALS. J. Pers. Med. 2020, 10, 58,
doi:10.3390/jpm10030058.
78. van Es, M.A.; Hardiman, O.; Chio, A.; Al-Chalabi, A.; Pasterkamp, R.J.; Veldink, J.H.; van den Berg, L.H. Amyotrophic Lateral
Sclerosis. Lancet 2017, 390, 2084–2098, doi:10.1016/S0140-6736(17)31287-4.
79. Nguyen, H.P.; Van Broeckhoven, C.; van der Zee, J. ALS Genes in the Genomic Era and Their Implications for FTD. Trends
Genet. 2018, 34, 404–423, doi:10.1016/j.tig.2018.03.001.
80. Andersen, P.M.; Al-Chalabi, A. Clinical Genetics of Amyotrophic Lateral Sclerosis: What Do We Really Know? Nat. Rev. Neurol.
2011, 7, 603–615, doi:10.1038/nrneurol.2011.150.
81. Fratello, M.; Caiazzo, G.; Trojsi, F.; Russo, A.; Tedeschi, G.; Tagliaferri, R.; Esposito, F. Multi-View Ensemble Classification of
Brain Connectivity Images for Neurodegeneration Type Discrimination. Neuroinform 2017, 15, 199–213,
doi:10.1007/s12021-017-9324-2.
... Common classification algorithms include Bayesian network [10], K-nearest neighbour algorithm [11] and decision tree [12]. Researchers take advantage of Bayesian network and machine learning methods to predict amyotrophic lateral sclerosis, and Bayesian network produced the best results [13]. However, by applying these models, it is difficult to reveal the potential information of neonatal pneumonia of gestational diabetes, which is a complex disease affected by multiple factors. ...
... As it is shown [13,15], BN is very useful for prediction and diagnosis, which is very important in disease interventions because they are usually expensive and their effects can only be observed in the long term. BN has the properties to be very useful in predicting the effectiveness of different strategies and selecting the best among them. ...
... Factual or reference status is introduced to show the counterfactual condition with or without a patient-specific factor regarding its impact on neonatal pneumonia (node "np") respectively. In the complex correlation analysis model (np ~ ptb + crp + nrds + ngr + afv + prom, R 2 ≧0.6), indicating the goodness of the model as well as np [13]. The relationship among patient-specific factors and neonatal pneumonia in an NB is fixed, and among them nodes represent these factors and neonatal pneumonia, directed arcs denote dependent relationship between each factor and the stage. ...
Article
Full-text available
Objective To predict the influencing factors of neonatal pneumonia in pregnant women with diabetes mellitus using a Bayesian network model. By examining the intricate network connections between the numerous variables given by Bayesian networks (BN), this study aims to compare the prediction effect of the Bayesian network model and to analyze the influencing factors directly associated to neonatal pneumonia. Method Through the structure learning algorithms of BN, Naive Bayesian (NB), Tree Augmented Naive Bayes (TAN), and k-Dependence Bayesian Classifier (KDB), complex networks connecting variables were presented and their predictive abilities were tested. The BN model and three machine learning models computed using the R bnlean package were also compared in the data set. Results In constraint-based algorithms, three algorithms had different presentation DAGs. KDB had a better prediction effect than NB and TAN, and it achieved higher AUC compared with TAN. Among three machine learning modes, Support Vector Machine showed a accuracy rate of 91.04% and 67.88% of precision, which was lower than TAN (92.70%; 72.10%). Conclusion KDB was applicable, and it can detect the dependencies between variables, identify more potential associations and track changes between variables and outcome.
... In one of the studies [141], the research group employed ML methods to predict the disease's genetic architecture. They utilized bayesian networks (BN) along with other ML algorithms to develop a diagnostic model. ...
... A significant barrier to ALS research is a lack of data. To address this setback in ML approaches, different approaches like resampling techniques, generating synthetic simulated data for the minority class, choosing suitable performance metrics like mean balanced classification accuracy [147], geometric mean, Youden's index [141], precision, recall and F1 score, and adding 'class-weights' during model training, can be used [132,142]. Papers also addressed data analysis methods to compensate for the difference in the age between the ALS patients and controls [132]. ...
... Hence, incorporating N-Control along with data from ALS mimic disorders into the model might further enhance the relevance of the model in clinical settings [132]. Control group consists of healthy individuals, while the N-Control group includes people with different neurological diseases other than these diseases [141]. ...
Article
Background Network medicine is an emerging area of research that focuses on delving into the molecular complexity of the disease, leading to the discovery of network biomarkers and therapeutic target discovery. Amyotrophic lateral sclerosis (ALS) is a complicated rare disease with unknown pathogenesis and no available treatment. In ALS, network properties appear to be potential biomarkers that can be beneficial in disease-related applications when explored independently or in tandem with machine learning (ML) techniques. Objective This systematic literature review explores recent trends in network medicine and implementations of network-based ML algorithms in ALS. We aim to provide an overview of the identified primary studies and gather details on identifying the potential biomarkers and delineated pathways. Methods The current study consists of searching for and investigating primary studies from PubMed and Dimensions.ai, published between 2018 and 2022 that reported network medicine perspectives and the coupling of ML techniques. Each abstract and full-text study was individually evaluated, and the relevant studies were finally included in the review for discussion once they met the inclusion and exclusion criteria. Results We identified 109 eligible publications from primary studies representing this systematic review. The data coalesced into two themes: application of network science to identify disease modules and promising biomarkers in ALS, along with network-based ML approaches. Conclusion This systematic review gives an overview of the network medicine approaches and implementations of network-based ML algorithms in ALS to determine new disease genes, and identify critical pathways and therapeutic target discovery for personalized treatment.
... Bayesian networks were used to understand the causal relationships in real-world probabilistic problems ([10]). Bayesian networks have been considered an efficient decision tool for predicting ALS in a previous study ([11]). Based on the medical big data system ([12,13]), we chose the ALS inpatients who underwent blood Pb testing as well as matched control inpatients admitted to the neurology department. ...
... Bayesian networks were built upon a strong foundation in causality and probability theory, regardless of the missing values ([24]). Compared with other machine learning methods, such as artificial neural networks, logistic regression, support vector machines, and the k-nearest neighbor algorithm, Bayesian networks produced the best results in predicting ALS from blood indices ([11]). Using this model, we took blood Pb concentration as a preferable biomarker of ALS, because of its properties as an exogenous substance. ...
Article
Background Environmental lead (Pb) exposure has been suggested as a causative factor for amyotrophic lateral sclerosis (ALS). However, the role of the Pb content of the human body in ALS outcomes has not been quantified clearly. The purpose of this study was to apply Bayesian networks to forecast the risk of Pb exposure on the disease occurrence. Methods We retrospectively collected medical records of ALS inpatients who underwent blood Pb testing, and matched control inpatients on age, gender, hospital ward and admission time at a ratio of 1:9. Tree Augmented Naïve Bayes (TAN), a semi-naïve Bayes classifier, was established to predict the probability of ALS or control status from risk factors. Results A total of 140 inpatients were included in this study. The whole blood Pb levels of ALS patients (57.00 μg/L) were more than twice as high as those of the controls (27.71 μg/L). Using the blood Pb concentrations to calculate the probability of ALS, TAN produced a total coincidence rate of 90.00%. The specificity and sensitivity of Pb for ALS prediction were 0.79 and 0.74, respectively. Conclusion Therefore, these results provide quantitative evidence that Pb exposure may contribute to the development of ALS. Bayesian networks may be used to predict the early onset of ALS from blood Pb levels.
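As a reading aid for the metrics quoted in this abstract, the following minimal sketch shows how sensitivity, specificity, and an overall agreement rate are derived from confusion-matrix counts. Interpreting "total coincidence rate" as overall accuracy is an assumption, and the counts below are hypothetical; they are not the study's data and are not meant to reproduce its figures.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: cases correctly / incorrectly classified
    tn/fp: controls correctly / incorrectly classified
    """
    sensitivity = tp / (tp + fn)                 # share of cases detected
    specificity = tn / (tn + fp)                 # share of controls correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall agreement
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen only for illustration.
sens, spec, acc = classification_metrics(tp=30, fn=10, tn=45, fp=15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

With strongly unbalanced groups, such as 1:9 matching, overall accuracy is dominated by the larger group, so it can differ considerably from both sensitivity and specificity.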
... Thirdly, our risk scores' construction relied on regression methods, which may not capture nonlinear combinations of variables influencing ALS prediction [43]. In the future, it may be worthwhile to explore the use of nonlinear methods to improve the accuracy of ALS prediction models [44,45]. Lastly, our study cohort was derived from the UK Biobank, which primarily represents a European population, and we have limited . ...
Preprint
Background and Objectives Amyotrophic lateral sclerosis (ALS) causes profound impairments in neurological function and a cure for this devastating disease remains elusive. Early detection and risk stratification are crucial for timely intervention and improving patient outcomes. This study aimed to identify predisposing genetic, phenotypic, and exposure-related factors for Amyotrophic lateral sclerosis using multi-modal data and assess their joint predictive potential. Methods Utilizing data from the UK Biobank, we analyzed an unrelated set of 292 ALS cases and 408,831 controls of European descent. Two polygenic risk scores (PRS) are constructed: “GWAS Hits PRS” and “PRS-CS,” reflecting oligogenic and polygenic ALS risk profiles, respectively. Time-restricted phenome-wide association studies (PheWAS) were performed to identify pre-existing conditions increasing ALS risk, integrated into phenotypic risk scores (PheRS). A poly-exposure score (“PXS”) captures the influence of environmental exposures measured through survey questionnaires. We evaluate the performance of these scores for predicting ALS incidence and stratifying risk, adjusting for baseline demographic covariates. Results Both PRSs modestly predicted ALS diagnosis, but with increased predictive power when combined (covariate-adjusted receiver operating characteristic [AAUC] = 0.584 [0.525, 0.639]). PheRS incorporated diagnoses 1 year before ALS onset (PheRS1) modestly discriminated cases from controls (AAUC = 0.515 [0.472, 0.564]). The “PXS” did not significantly predict ALS. However, a model incorporating PRSs and PheRS1 improved prediction of ALS (AAUC = 0.604 [0.547, 0.667]), outperforming a model combining all risk scores. This combined risk score identified the top 10% of risk score distribution with a 4-fold higher ALS risk (95% CI: [2.04, 7.73]) versus those in the 40%-60% range. 
Discussions By leveraging UK Biobank data, our study uncovers predisposing ALS factors, highlighting the improved effectiveness of multi-factorial prediction models to identify individuals at highest risk for ALS.
... In addition, in case of dichotomized variables, a BN can provide the conditional probability that a variable is present or absent, based on the presence or absence of other variables in the network. BNs have been widely used in medicine to predict outcomes such as diagnosis, functional outcome, quality of life and survival, based on patient and disease characteristics [10][11][12][13][14][15] . The advantage compared to other probabilistic modelling methods is that they do not need dedicated input and output variables and that they can be constructed in case of insufficient available evidence on associations between variables 15 . ...
Article
Although patients with advanced cancer often experience multiple symptoms simultaneously, clinicians usually focus on symptoms that are volunteered by patients during regular history-taking. We aimed to evaluate the feasibility of a Bayesian network (BN) model to predict the presence of simultaneous symptoms, based on the presence of other symptoms. Our goal is to help clinicians prioritize which symptoms to assess. Patient-reported severity of 11 symptoms (scale 0–10) was measured using an adapted Edmonton Symptom Assessment Scale (ESAS) in a national cross-sectional survey among advanced cancer patients. Scores were dichotomized (< 4 and ≥ 4). Using fourfold cross validation, the prediction error of 9 BN algorithms was estimated (Akaike information criterion (AIC)). The model with the highest AIC was evaluated. Model predictive performance was assessed per symptom; an area under curve (AUC) of ≥ 0.65 was considered satisfactory. Model calibration compared predicted and observed probabilities; > 10% difference was considered inaccurate. Symptom scores of 532 patients were collected. A symptom score ≥ 4 was most prevalent for fatigue (64.7%). AUCs varied between 0.60 and 0.78, with satisfactory AUCs for 8/11 symptoms. Calibration was accurate for 101/110 predicted conditional probabilities. Whether a patient experienced fatigue was directly associated with experiencing 7 other symptoms. For example, in the absence or presence of fatigue, the model predicted an 8.6% and 33.1% probability of experiencing anxiety, respectively. It is feasible to use BN development for prioritizing symptom assessment. Fatigue seems most eligible to serve as a starting symptom for predicting the probability of experiencing simultaneous symptoms.
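The dichotomization step and the conditional probabilities described in this abstract can be illustrated with a minimal sketch. It estimates the probability of one symptom given the presence or absence of another as an empirical frequency, which is how a single parent-child entry in a BN's conditional probability table could be estimated; it is not the authors' model. The function names and the six patient records are hypothetical.

```python
def dichotomize(score, threshold=4):
    """Map a 0-10 symptom score to present (score >= threshold) or absent."""
    return score >= threshold


def conditional_probability(records, target, given, given_present):
    """Empirical P(target present | given present/absent) on dichotomized scores."""
    subset = [r for r in records if dichotomize(r[given]) == given_present]
    if not subset:
        raise ValueError("no records match the conditioning state")
    hits = sum(dichotomize(r[target]) for r in subset)
    return hits / len(subset)


# Hypothetical patient-reported scores, for illustration only.
records = [
    {"fatigue": 7, "anxiety": 5},
    {"fatigue": 6, "anxiety": 2},
    {"fatigue": 2, "anxiety": 1},
    {"fatigue": 3, "anxiety": 0},
    {"fatigue": 8, "anxiety": 6},
    {"fatigue": 1, "anxiety": 4},
]

p_with_fatigue = conditional_probability(records, "anxiety", "fatigue", True)
p_without_fatigue = conditional_probability(records, "anxiety", "fatigue", False)
print(f"P(anxiety | fatigue)    = {p_with_fatigue:.3f}")
print(f"P(anxiety | no fatigue) = {p_without_fatigue:.3f}")
```

A full BN would combine many such tables along the learned graph; this sketch covers only one pair of variables.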
Article
Bayesian Networks (BNs) are probabilistic graphical statistical models that have been widely used in many fields over the last decade. This method, which can also be used for educational data mining (EDM) purposes, is a fairly new method in education literature. This study models students' science success using the BN approach. Science is one of the core areas in the PISA exam. To this end, we used a data set including the most successful 25% and the least successful 25% of students from Turkey based on their scores from the Program for International Student Assessment (PISA) survey. We also performed feature selection to determine the variables with the greatest effect on success. The accuracy of the BN model created with the variables determined by the feature selection is 86.2%. We classified the variables affecting success into three categories: individual, family-related and school-related. Based on the analysis, we found that family-related variables are very effective in science success, and gender is not a discriminant variable in this success. In addition, this is the first study in the literature on the evaluation of complex data made with the BN model. In this respect, it serves as a guide in the evaluation of international exams and in the use of the data obtained.
Chapter
The large-scale outbreaks of infectious pandemic diseases emerged regularly throughout history and created notable economic, social, and political disruptions. Major pandemics affect a wide geographic area significantly increasing morbidity and mortality. The world has come across numerous remarkable pandemics such as the Black Death, measles, smallpox, influenza, plague, cholera, Spanish flu, severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) and Ebola virus and is now combating the new coronavirus disease 2019 (COVID-19) pandemic affecting humanity greatly. Studies suggest that the likelihood of pandemic threats is due to the diversity of pathogens, changes in the dynamics of disease transmission and severity, human-pathogen interaction, increased globalization, urbanization, huge exploitation of land and natural resources, and global warming. The pandemic risk burden poses serious challenges to humanity and these trends will prolong and intensify over time. For the well-being of humanity, administration of public health measures, techniques to intercept and control infection, pharmaceutical intervention, global surveillance programs, novel technologies to identify disease biomarkers, and vaccine production prove to be effective beneficiary responses to identify and limit emerging outbreaks and to escalate preparedness and health capacity. The extensive amount of data produced during the pandemic has given a lot of chances to the researchers and healthcare providers to evaluate new trends, detect vulnerable groups, and solve long-standing issues in the healthcare industry. 
The healthcare industry has sought to use the most comprehensive data and predictive analytics software tools employing intelligent data technology, artificial intelligence (AI), machine learning (ML), and deep learning (DL) and has leveraged to gain insight, establish innovative ways to ease sustainable demand and supply, and pitch straight into the prospective benefits to foster the fight against the pandemic. Hence, these predictive models can support hospitals, healthcare settings, state health organizations, and government establishments to speculate the influence of COVID-19 and prepare for the future. In this chapter, a comprehensive investigation of various data analytic tools that are used in expert systems, proposed for pandemic and epidemic diseases, is discussed. The key issues, challenges, and opportunities of the existing and current methods are also discussed.
Chapter
Nature-inspired computing (NIC) computer optimization algorithms are an emerging approach that relies on the principles and inspiration of the biological development of nature to build new and strong competitive tactics. Given the success of NIC approaches and techniques in big data analytic applications, it is expected that they may also be effectively applied in health care. The application of NIC in the management of the ongoing COVID-19 pandemic is a beneficial tool that may be widely employed in clinical and public health decision-making. Recent developments in artificial intelligence, machine learning, and bio-inspired optimization algorithms have boosted the relevance of biomedical signal and image processing research. Biomedical image processing is comparable in theory to biomedical signal processing in many aspects. It comprises the analysis, enhancement, and display of photographs collected via X-rays, ultrasound, magnetic resonance imaging (MRI), nuclear medicine, and visual imaging technologies. NIC is presently quickly emerging in many scientific and technological research domains, including biomedical sciences. In this perspective, nature optimization algorithms may play a key role in addressing the multiple elements of health care. Researchers, healthcare policymakers, physicians, and other interested parties might use the insights of our chapter to better prioritize research and development for the operationalization of AI in the event of future pandemics.
Article
Background Frailty is a syndrome that is defined as an accumulation of deficits in physical, psychological, and social domains. On a global scale, there is an urgent need to create frailty-ready healthcare systems due to the healthcare burden that frailty confers on systems and the increased risk of falls, healthcare utilization, disability, and premature mortality. Several studies have been conducted to develop prediction models for predicting frailty. Most studies used logistic regression as a technique to develop a prediction model. One area that has experienced significant growth is the application of Bayesian techniques, partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. Objective We compared ten different Bayesian networks as proposed by ten experts in the field of frail elderly people to predict frailty with a choice from ten dichotomized determinants for frailty. Methods We used the opinion of ten experts who could indicate, using an empty Bayesian network graph, the important predictors for frailty and the interactions between the different predictors. The candidate predictors were age, sex, marital status, ethnicity, education, income, lifestyle, multimorbidity, life events, and home living environment. The ten Bayesian network models were evaluated in terms of their ability to predict frailty. For the evaluation, we used the data of 479 participants that filled in the Tilburg Frailty indicator (TFI) questionnaire for assessing frailty among community-dwelling older people. The data set contained the aforementioned variables and the outcome ”frail”. The model fit of each model was measured using the Akaike information criterion (AIC) and the predictive performance of the models was measured using the area under the curve (AUC) of the receiver operator characteristic (ROC). The AUCs of the models were validated using bootstrapping with 100 repetitions. 
The relative importance of the predictors in the models was calculated using the permutation feature importance algorithm (PFI). Results The ten Bayesian networks of the ten experts differed considerably regarding the predictors and the connections between the predictors and the outcome. However, all ten networks had corrected AUCs >0.700. Evaluating the importance of the predictors in each model, “diseases or chronic disorders” was the most important predictor in all models (10 times). The predictors “lifestyle” and “monthly income” were also often present in the models (both 6 times). One or more diseases or chronic disorders, an unhealthy lifestyle, and a monthly income below 1,800 euro increased the likelihood of frailty. Conclusions Although the ten experts all made different graphs, the predictive performance was always satisfying (AUCs >0.700). While it is true that the predictor importance varied all the time, the top three of the predictor importance consisted of “diseases or chronic disorders”, “lifestyle” and “monthly income”. All in all, asking for the opinion of experts in the field of frail elderly to predict frailty with Bayesian networks may be more rewarding than a data-driven forecast with Bayesian networks because they have expert knowledge regarding interactions between the different predictors.
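To make the AUC validation step concrete, here is a minimal sketch of a bootstrap over an AUC estimate with 100 repetitions. It is a plain resampling average, which is simpler than an optimism-corrected AUC and may differ from the procedure the authors used. The helper names, the seed, and the eight labelled scores are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, repetitions=100, seed=0):
    """Resample cases with replacement and return (mean AUC, list of AUCs)."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < repetitions:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        values.append(auc(ys, [scores[i] for i in idx]))
    return sum(values) / len(values), values


# Hypothetical outcome labels (1 = frail) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.7, 0.6]

mean_auc, samples = bootstrap_auc(labels, scores, repetitions=100)
print(f"apparent AUC = {auc(labels, scores):.3f}, bootstrap mean = {mean_auc:.3f}")
```

The spread of the resampled values gives a rough sense of how stable an AUC threshold such as 0.700 is for a given sample size.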
Article
Educational data mining (EDM) is an important research area that is capable of analyzing and modeling educational data. Outputs obtained from EDM help researchers and education planners understand and revise the systematic problems of current educational strategies. This study deals with an important international study, namely the Trends in International Mathematics and Science Study (TIMSS). EDM methods are applied to the most recently released TIMSS 2015 8th grade Turkish students' data. The aim of the study is twofold: to find the best-performing algorithm(s) for classifying students' mathematics success and to extract the features most important to success. Logistic regression is found to be the most appropriate algorithm, and support vector machines with a polynomial kernel and with a Pearson VII function-based universal kernel give performance similar to logistic regression. Different feature selection methods are used to extract the most effective features in classification among all features in the original dataset. "Home Educational Resources", "Student Confident in Mathematics" and "Mathematics Achievement Too Low for Estimation" are found to be the most important features across all feature selection methods.
Article
Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but the molecular mechanisms and pathways underlying this disease remain elusive. This review (1) systematically identifies machine learning studies aimed at the understanding of the genetic architecture of ALS, (2) outlines the main challenges faced and compares the different approaches that have been used to confront them, and (3) compares the experimental designs and results produced by those approaches and describes their reproducibility in terms of biological results and the performances of the machine learning models. The majority of the collected studies incorporated prior knowledge of ALS into their feature selection approaches, and trained their machine learning models using genomic data combined with other types of mined knowledge including functional associations, protein-protein interactions, disease/tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies is highlighted. Lastly, it is suggested that future advances in the genomic and machine learning fields will bring about a better understanding of ALS genetic architecture, and enable improved personalized approaches to this and other devastating and complex diseases.
Article
Amyotrophic lateral sclerosis (ALS) is a terminal late-onset condition characterized by the loss of upper and lower motor neurons. Mutations in more than 30 genes are associated with the disease, but these explain only ~20% of cases. The molecular functions of these genes implicate a wide range of cellular processes in ALS pathology, a cohesive understanding of which may provide clues to common molecular mechanisms across both familial (inherited) and sporadic cases and could be key to the development of effective therapeutic approaches. Here, the different pathways that have been investigated in ALS are summarized, discussing in detail: mitochondrial dysfunction, oxidative stress, axonal transport dysregulation, glutamate excitotoxicity, endosomal and vesicular transport impairment, impaired protein homeostasis, and aberrant RNA metabolism. This review considers the mechanistic roles of ALS-associated genes in pathology, viewed through the prism of shared molecular pathways.
Article
Amyotrophic lateral sclerosis is a rare and fatal neurodegenerative disease characterised by progressive deterioration of upper and lower motor neurons that eventually culminates in severe muscle atrophy, respiratory failure and death. There is a concerning lack of understanding regarding the mechanisms that lead to the onset of ALS and as a result there are no reliable biomarkers that aid in the early detection of the disease nor is there an effective treatment. This review first considers the clinical phenotypes associated with ALS, and discusses the broad categorisation of ALS and ALS-mimic diseases into upper and lower motor neuron diseases, before focusing on the genetic aetiology of ALS and considering the potential relationship of mutations of different genes to variations in phenotype. For this purpose, a systematic review is conducted collating data from 107 original published clinical studies on monogenic forms of the disease, surveying the age and site of onset, disease duration and motor neuron involvement. The collected data highlight the complexity of the disease’s genotype–phenotype relationship, and thus the need for a nuanced approach to the development of clinical assays and therapeutics.
Article
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with no known cure. Approximately 90% of ALS cases are sporadic, although multiple genetic risk factors have recently been revealed in sporadic ALS (SALS) as well. The pathological expansion of a hexanucleotide repeat in chromosome 9 open reading frame 72 (C9orf72) is the most common genetic mutation identified in familial ALS, and is also detected in 5–10% of SALS patients. The C9orf72-related ALS phenotype appears to be dependent on several modifiers, including demographic factors. Sex has been reported as an independent factor influencing ALS development, with men found to be more susceptible than women. Exposure to both female and male sex hormones has been shown to influence disease risk or progression. Moreover, interplay between genetics and sex has been widely investigated in ALS preclinical models and in large populations of ALS patients carrying the C9orf72 repeat expansion. In light of the current need for reclassifying ALS patients into pathologically homogenous subgroups potentially responsive to targeted personalized therapies, we aimed to review the recent literature on the role of genetics and sex as both independent and synergistic factors in the pathophysiology, clinical presentation, and prognosis of ALS. Sex-dependent outcomes may lead to optimizing clinical trials for developing patient-specific therapies for ALS.
Article
Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease with no known cure. Approximately 90% of ALS cases are sporadic, suggesting there are multiple contributing factors that influence the disease risk, onset, and progression. Diet and sex are two factors that have been reported to alter ALS risk, onset and progression in humans and in animal models, providing potential modifiers of disease. Several epidemiological studies have identified diets that positively affect ALS patients, including various high-calorie fat or sugar-based diets, while animal models have been developed to test how these diets are working on a molecular level. These diets may offset the metabolic alterations that occur in ALS, such as hypermetabolism, lowered body mass index (BMI), and hyperlipidemia. Sex-dependent differences have also come forth from large-scale epidemiological studies as well as mouse-model studies. In addition, sex hormones have been shown to affect disease risk or progression. Herein, studies on the effects of diet and sex on ALS risk, onset, and progression will be reviewed. Understanding these diet- and sex-dependent outcomes may lead to optimized patient-specific therapies for ALS.
Article
Background: Amyotrophic lateral sclerosis (ALS) is a progressive, fatal neurodegenerative disease of the motor neurons. The etiology of ALS remains largely unknown, particularly with reference to the potential environmental determinants. Methods: We performed a population-based case-control study in four provinces from both Northern and Southern Italy in order to assess non-genetic ALS risk factors by collecting information about clinical and lifestyle factors through tailored questionnaires. We estimated ALS risk by calculating the odds ratio (OR) with its 95% confidence interval (CI) using unconditional logistic regression models adjusted for sex, age and educational attainment. Results: We recruited 230 participants (95 cases and 135 controls). We found a possible positive association of ALS risk with trauma, particularly head trauma (OR = 2.61, 95% CI 1.19–5.72), electric shock (OR = 2.09, 95% CI 0.62–7.06), and some sports, although at a competitive level only. In addition, our results suggest an increased risk for subjects reporting use of private wells for drinking water (OR = 1.38, 95% CI 0.73–2.27) and for use of herbicides during gardening (OR = 1.95, 95% CI 0.88–2.27). Conversely, there was a suggestion of an inverse association with overall fish consumption (OR = 0.27, 95% CI 0.12–0.60), but with no dose-response relation. Consumption of some dietary supplements, namely those containing amino acids and, in the Southern Italy population, vitamins and minerals such as selenium, seemed associated with a statistically imprecise increased risk. Conclusions: Our results suggest a potential etiologic role for a number of clinical and lifestyle factors in ALS risk. However, caution is needed due to some study limitations. 
These include the small sample size and the low number of exposed subjects, which affect statistical precision of risk estimates, the potential for exposure misclassification, and the uncertainties about mechanisms underpinning the possible association between these factors and disease risk.
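For readers unfamiliar with the OR and CI notation in this abstract, the following minimal sketch computes a crude odds ratio and a Wald-type 95% CI on the log scale from a 2×2 table. It is unadjusted, whereas the study reports estimates from logistic regression adjusted for sex, age and educational attainment, so the two are not equivalent. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval on the log scale.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that includes 1 is compatible with no association, which is why several of the estimates above are described as statistically imprecise.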
Article
Purpose of review: The cause of amyotrophic lateral sclerosis (ALS) remains unknown for most of the patients with the disease. Epidemiologic studies can help describe disease burden and examine its potential risk factors, providing thereby evidence base for future mechanistic studies. With this review, we aimed to provide a summary of epidemiologic studies published during the past 18 months, which studied the incidence and risk factors for ALS. Recent findings: An increasing incidence and prevalence of ALS continue to be reported from different parts of the world. Several previously studied risk factors are confirmed as causally related to ALS by Mendelian randomization analysis. The previously known prognostic indicators for ALS appear to be the same across populations. Summary: Provided with the increasing number of patients diagnosed with ALS and the improved societal awareness of the disease, more resources should be allocated to the research and care of ALS. Population-based studies, especially population-based disease registers, should be the priorities in ALS research, and more data from outside Europe are needed in gaining a better global perspective of the disease.
Article
Objective: To assess the determinants of amyotrophic lateral sclerosis (ALS) phenotypes in a population-based cohort. Methods: The study population included 2,839 patients with ALS diagnosed in Piemonte, Italy (1995-2015). Patients were classified according to motor (classic, bulbar, flail arm, flail leg, predominantly upper motor neuron [PUMN], respiratory) and cognitive phenotypes (normal, ALS with cognitive impairment [ALSci], ALS with behavioral impairment [ALSbi], ALSci and ALSbi combined [ALScbi], ALS-frontotemporal dementia [FTD]). Binary logistic regression analysis was adjusted for sex, age, and genetics. Results: Bulbar phenotype correlated with older age (p < 0.0001), women were more affected than men at increasing age (p < 0.0001), classic with younger age (p = 0.029), men were more affected than women at increasing age (p < 0.0001), PUMN with younger age (p < 0.0001), flail arm with male sex (p < 0.0001) and younger age (p = 0.04), flail leg with male sex with increasing age (p = 0.008), and respiratory with male sex (p < 0.0001). C9orf72 expansions correlated with bulbar phenotype (p < 0.0001), and were less frequent in PUMN (p = 0.041); SOD1 mutations correlated with flail leg phenotype (p < 0.0001), and were less frequent in bulbar (p < 0.0001). ALS-FTD correlated with C9orf72 (p < 0.0001) and bulbar phenotype (p = 0.008), ALScbi with PUMN (p = 0.014), and ALSci with older age (p = 0.008). Conclusions: Our data suggest that the spatial-temporal combination of motor and cognitive events leading to the onset and progression of ALS is characterized by a differential susceptibility to the pathologic process of motor and prefrontal cortices and lower motor neurons, and is influenced by age, sex, and gene variants. The identification of those factors that regulate ALS phenotype will allow us to reclassify patients into pathologically homogenous subgroups, responsive to targeted personalized therapies.
Article
Objectives. Coronary artery disease (CAD) is the leading cause of death and disease burden worldwide, causing 1 in 7 deaths in the United States alone. Risk prediction models that can learn the complex causal relationships that give rise to CAD from data, instead of merely predicting the risk of disease, have the potential to improve transparency and efficacy of personalized CAD diagnosis and therapy selection for physicians, patients, and other decision makers. Methods. We use Bayesian networks (BNs) to model the risk of CAD using the Z-Alizadehsani data set—a published real-world observational data set of 303 Iranian patients at risk for CAD. We also describe how BNs can be used for incorporation of background knowledge, individual risk prediction, handling missing observations, and adaptive decision making under uncertainty. Results. BNs performed on par with machine-learning classifiers at predicting CAD and showed better probability calibration. They achieved a mean 10-fold area under the receiver-operating characteristic curve (AUC) of 0.93 ± 0.04, which was comparable with the performance of logistic regression with L1 or L2 regularization (AUC: 0.92 ± 0.06), support vector machine (AUC: 0.92 ± 0.06), and artificial neural network (AUC: 0.91 ± 0.05). We describe the use of BNs to predict with missing data and to adaptively calculate prognostic values of individual variables under uncertainty. Conclusion. BNs are powerful and versatile tools for risk prediction and health outcomes research that can complement traditional statistical techniques and are particularly useful in domains in which information is uncertain or incomplete and in which interpretability is important, such as medicine.
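The claim that BNs can predict with missing observations can be sketched with a very small discrete network in which a disease node is the only parent of two findings. An unobserved finding is marginalized out, which in this structure amounts to leaving its factor out of the product. The network structure, the finding names, and all probabilities are hypothetical; they are not taken from the Z-Alizadehsani data set or from the authors' model.

```python
# Hypothetical prior and conditional probabilities, for illustration only.
P_DISEASE = {True: 0.3, False: 0.7}
P_FINDING_PRESENT = {  # P(finding present | disease state)
    "chest_pain": {True: 0.8, False: 0.2},
    "abnormal_ecg": {True: 0.6, False: 0.1},
}


def posterior_disease(evidence):
    """Return P(disease | evidence) by enumeration.

    evidence maps a finding name to True/False. Findings missing from the
    dict are unobserved; summing over their states contributes a factor of 1,
    so they are simply skipped.
    """
    unnormalized = {}
    for disease in (True, False):
        p = P_DISEASE[disease]
        for finding, present in evidence.items():
            p_present = P_FINDING_PRESENT[finding][disease]
            p *= p_present if present else 1 - p_present
        unnormalized[disease] = p
    return unnormalized[True] / (unnormalized[True] + unnormalized[False])


print(f"no evidence:         {posterior_disease({}):.3f}")
print(f"chest pain only:     {posterior_disease({'chest_pain': True}):.3f}")
print(f"chest pain + ECG:    {posterior_disease({'chest_pain': True, 'abnormal_ecg': True}):.3f}")
```

The same query function serves any subset of observed findings, which is the property that makes BNs convenient when patient records are incomplete.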