Dementia and Geriatric Cognitive Disorders
Research Article
Dement Geriatr Cogn Disord
DOI: 10.1159/000531819
Received: April 18, 2023
Accepted: June 30, 2023
Published online: July 21, 2023
Predicting Alzheimer's Disease with Interpretable Machine Learning
Maoni Jia (a), Yafei Wu (a), Chaoyi Xiang (a), Ya Fang (a, b)
(a) Center for Aging and Health Research, School of Public Health, Xiamen University, Xiamen, China; (b) National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
Keywords
Alzheimer's disease · Prediction model · Machine learning · Interpretability analysis
Abstract
Introduction: This study aimed to develop novel machine learning models for predicting Alzheimer's disease (AD) and to identify key factors for targeted prevention. Methods: We included 1,219, 863, and 482 participants aged 60+ years from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database who had, respectively, sociodemographic information only; sociodemographic and self-reported health information; and sociodemographic, self-reported health, and blood biomarker information. Machine learning models were constructed to predict the risk of AD in each of these three populations. Model performance was evaluated by discrimination, calibration, and clinical usefulness. SHapley Additive exPlanation (SHAP) was applied to identify key predictors of the optimal models. Results: The mean age was 73.49, 74.52, and 74.29 years for the three populations, respectively. Models with sociodemographic information and models with both sociodemographic and self-reported health information showed modest performance. For models with sociodemographic, self-reported health, and blood biomarker information, overall performance improved substantially; specifically, logistic regression performed best, with an AUC of 0.818. The blood biomarkers ptau protein and plasma neurofilament light, age, blood tau protein, and education level were the top five predictors. In addition, taurine, inosine, xanthine, marital status, and L-glutamine also showed importance for AD prediction. Conclusion: Interpretable machine learning showed promise in screening individuals at high risk of AD and could further identify key predictors for targeted prevention.
© 2023 S. Karger AG, Basel
Introduction
Dementia has become a major threat to human health and quality of life. According to the World Health Organization, more than 55 million people are living with dementia, and this number is expected to reach 78 million by 2030 and 139 million by 2050 [1]. Notably, Alzheimer's disease (AD) is the most common cause of dementia [2]. The onset of AD is insidious and the disease is irreversible, leading to adverse outcomes such as disability and death [3]. Prediction models can therefore support early identification of AD and inform targeted prevention and medical decisions.
With the generation of huge volumes of medical and health data, data mining techniques have developed rapidly and are increasingly applied in healthcare [4]. Studies on data-driven diagnostic or prognostic models for diseases are on the rise, evolving from classical statistical methods to machine learning models [5]. Machine learning algorithms often outperform statistical models on high-dimensional and complex healthcare data and play an increasingly critical role in medical research [6]. Nevertheless, machine learning methods are considered less interpretable than statistical models [7]. Hence, interpretability techniques such as SHapley Additive exPlanation (SHAP) have emerged in recent years; they can shed light on a model's inner decision-making process and identify important variables [8].
Maoni Jia and Yafei Wu contributed equally to this work.
Correspondence to: Ya Fang, fangya@xmu.edu.cn
Previous studies have examined a range of risk factors for AD in the elderly population, including sociodemographic factors such as age, gender, and education; disease history (comorbidities) such as diabetes and cardiovascular disease; unhealthy lifestyles such as smoking and alcohol consumption; and blood biological markers [9]. Presently, cerebrospinal fluid amyloid-beta42 (Aβ42) and tau protein are recognized fluid markers [10], but the invasive nature of lumbar puncture limits their widespread use. Moreover, magnetic resonance imaging and positron emission tomography are more reliable for assessing cognitive impairment, but their high cost and limited availability constrain broader application [11].
In this context, we aimed to examine the following questions: First, are state-of-the-art machine learning models utilizing multimodal data (sociodemographic, self-reported health, and blood biomarker information) a suitable tool for AD prediction among older adults? Second, what key predictors can be identified for AD prediction using SHAP analysis? Third, how does blood biomarker information contribute to AD prediction?
Methods
Participants
Data were collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (https://adni.loni.usc.edu), which is open-access and of high quality for cognitive research. For more information, please refer to the official website at www.adni-info.org. ADNI began in 2004 (ADNI-1), with a study extension in 2009 (ADNI-GO) and renewals in 2011 (ADNI-2) and 2016 (ADNI-3). The ADNI study protocol was reviewed and approved by the Ethics Committee at each participating site. The full list of participating sites and Ethics Committees can be found in a previous publication [12]. All subjects provided written informed consent through the participating ADNI sites. We conducted sample selection in three scenarios: (1) participants who had only sociodemographic information (scenario 1); (2) participants who had both sociodemographic and self-reported health information (scenario 2); (3) participants who had sociodemographic, self-reported health, and routine blood biomarker information (scenario 3). In total, 1,219, 863, and 482 subjects were included, respectively. Detailed sample information is given in online supplementary Figure S1 (for all online suppl. material, see https://doi.org/10.1159/000531819). The variables selected for each scenario are presented in online supplementary Methods S1 and supplementary Table S1.
The ADNI study is designed to support early diagnosis and treatment of patients with cognitive impairment; therefore, a relatively large number of AD patients were included in the database. In this study, 400, 341, and 183 older adults had AD in scenarios 1, 2, and 3, respectively. The mean age was 73.49, 74.52, and 74.29 years for each scenario. The proportion of male participants was 48.1%, 51.2%, and 50.0%, respectively. More detailed descriptive information on participants in the 3 scenarios and comparisons between the AD and normal populations are shown in online supplementary Tables S2–S4.
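The class balance implied by these counts can be checked with a few lines of Python. The sketch below only restates the sample sizes and AD case counts reported above; the dictionary layout and variable names are illustrative and are not taken from the study's code.

```python
# Sample sizes and AD case counts reported in the text for scenarios 1-3,
# stored as (total participants, participants with AD).
scenarios = {
    "scenario 1": (1219, 400),
    "scenario 2": (863, 341),
    "scenario 3": (482, 183),
}

# AD prevalence = AD cases / total participants in that scenario.
prevalence = {name: cases / total for name, (total, cases) in scenarios.items()}

for name, p in prevalence.items():
    print(f"{name}: AD prevalence = {p:.1%}")
```

Roughly one third to two fifths of each sample has AD, which is worth keeping in mind when reading precision and net benefit, since both depend on prevalence.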
Outcome
In this study, we aimed to predict AD (normal control vs. AD) in older adults aged 60+ years. In the ADNI study, normal control was mainly defined by the following criteria: Mini-Mental State Examination (MMSE) scores between 24 and 30, a Clinical Dementia Rating (CDR) score and Memory Box score of 0, and an absence of significant impairment in cognitive functions or activities of daily living. AD was mainly defined by the following criteria: MMSE scores between 20 and 26, a CDR score of 0.5 or 1.0, and meeting the National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer's Disease and Related Disorders Association criteria for probable AD [13]. A more detailed description of normal control and AD can be found in online supplementary Table S5 and at http://adni.loni.usc.edu/methods/documents.
Candidate Variables and Preprocessing
In the current study, we collected 3 kinds of variables, including sociodemographic, self-reported health, and blood biomarker information, from ADNI-1, ADNI-GO, ADNI-2, and ADNI-3. Variables with a missing rate of more than 20% were removed. For variables with a missing rate of less than 20%, missing data were imputed with missForest, a random forest (RF)-based method that handles both categorical and continuous variables [14]. One-hot encoding was adopted to encode categorical variables [15]. All continuous variables were standardized for better model convergence.
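The preprocessing steps described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the toy records, the column names (`age`, `marital`, `marker_x`), and the helper functions are hypothetical; the cutoff is applied as "keep if missing rate <= 20%" (the text does not say how a rate of exactly 20% is handled); and the missForest imputation step is deliberately left out, so the toy data contain no missing values in the retained columns.

```python
from statistics import mean, pstdev

def drop_sparse_columns(rows, max_missing=0.20):
    """Keep only columns whose share of missing (None) values is <= max_missing."""
    n = len(rows)
    kept = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / n <= max_missing]
    return [{c: r[c] for c in kept} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per level."""
    levels = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for level in levels:
            new[f"{column}={level}"] = 1 if r[column] == level else 0
        encoded.append(new)
    return encoded

def standardize(rows, column):
    """Z-score a continuous column: (x - mean) / population standard deviation."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, column: (r[column] - mu) / sigma} for r in rows]

# Hypothetical toy records; marker_x is missing in 2 of 5 rows (40% > 20%).
rows = [
    {"age": 70.0, "marital": "married", "marker_x": None},
    {"age": 75.0, "marital": "widowed", "marker_x": 1.2},
    {"age": 80.0, "marital": "married", "marker_x": None},
    {"age": 65.0, "marital": "single", "marker_x": 0.9},
    {"age": 72.0, "marital": "married", "marker_x": 1.1},
]

processed = standardize(one_hot(drop_sparse_columns(rows), "marital"), "age")
print(sorted(processed[0]))
```

In a real analysis the imputation step would sit between the missing-rate filter and the encoding, and the scaling parameters would normally be estimated on the training split only.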
Model Construction and Evaluation
Based on the biomedical research guideline recommendations [16], we selected six commonly used models to test their ability to predict AD: logistic regression with penalty (LR) [17], support vector machine (SVM) [18], decision tree (DT) [19], random forest (RF) [20], extreme gradient boosting (XGB) [21], and artificial neural network (ANN) [22].
We used the six machine learning algorithms to construct AD prediction models in each scenario. Specifically, in scenario 1, only sociodemographic information was included. Self-reported health information was then added in scenario 2. Further, blood biomarkers were integrated in scenario 3. In addition, we further explored the value of blood biomarkers in AD prediction, with the aim of finding key blood biomarkers.
We followed a standard procedure of model construction, derived from the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [23]. The whole dataset was randomly split into training (70%) and testing (30%) datasets, preserving the distribution of AD versus non-AD participants. We then conducted 10-fold cross-validation in the training dataset for hyperparameter tuning. The main hyperparameters used for our prediction models are shown in online supplementary Table S6. Discrimination, calibration, and clinical usefulness metrics were used to evaluate model performance in the test set. Balanced accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver operating characteristic curve (AUROC) were considered for model discrimination. The Brier score was used to evaluate model calibration. The utility of the models in clinical practice was evaluated by decision curve analysis (DCA) over a range of threshold probabilities. For the above metrics, 95% confidence intervals were also calculated in the testing data with a bootstrap method repeated 1,000 times [24]. Python 3.7.6 was used for machine learning analyses. The detailed workflow, including model construction and evaluation, is shown in Figure 1.
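As an illustration of the evaluation metrics named above, the following sketch computes sensitivity, specificity, balanced accuracy, precision, F1, AUROC, and the Brier score on a toy test set, together with a bootstrap confidence interval. All data and function names are hypothetical; the 0.5 classification cutoff and the percentile form of the bootstrap interval are assumptions, since the text does not state which were used.

```python
import random

def discrimination_metrics(y_true, y_prob, cutoff=0.5):
    """Threshold-based metrics; predicted class is 1 when probability >= cutoff."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sensitivity": sens, "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2,
            "precision": prec, "f1": f1}

def auroc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def bootstrap_ci(y_true, y_prob, metric, n_boot=1000, seed=0):
    """Percentile 95% interval from n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:  # skip resamples containing a single class
            continue
        stats.append(metric(yt, [y_prob[i] for i in idx]))
    stats.sort()
    return stats[(25 * n_boot) // 1000], stats[(975 * n_boot) // 1000 - 1]

# Hypothetical test-set labels (1 = AD) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.35, 0.05]

metrics = discrimination_metrics(y_true, y_prob)
auc = auroc(y_true, y_prob)
bs = brier(y_true, y_prob)
auc_ci = bootstrap_ci(y_true, y_prob, auroc)
print(metrics, round(auc, 3), round(bs, 4), auc_ci)
```

Lower Brier scores indicate better calibrated probabilities, whereas the other metrics are better when higher.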
Interpretability Analysis
Although machine learning has made great breakthroughs in medicine, the complicated structure of its models makes them difficult for decision makers to understand [25]. SHAP, an interpretability technique, can open the black box of machine learning. SHAP can be used to interpret a model from both global and local perspectives. The global analysis of SHAP was presented by a summary plot. The horizontal axis of the summary plot denotes the average absolute SHAP value of each feature; a longer horizontal bar means a larger average absolute SHAP value, which implies a more important feature. Local interpretability provides the details of individual predictions through a force plot, focusing on explaining how each prediction is derived. Each feature has its own contribution, and the accumulated contributions of all features drive the prediction from the base value to the final model output. Features that push the prediction higher are shown in red, and those that push the prediction lower are shown in blue.
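For a linear model such as logistic regression, SHAP values have a simple closed form when features are treated as independent: each feature contributes its coefficient times its deviation from the background mean, on the log-odds scale. The sketch below uses that form to show the additive property behind the force plot and the mean absolute SHAP ranking behind the summary plot. The coefficients, intercept, and data are made up for illustration (only the feature names echo the paper), and this is a hand-written stand-in rather than the SHAP library or the authors' code.

```python
def linear_shap(weights, x, background_means):
    """SHAP values (log-odds scale) of one sample for a linear model,
    assuming independent features: phi_j = w_j * (x_j - mean_j)."""
    return [w * (xj - mu) for w, xj, mu in zip(weights, x, background_means)]

# Hypothetical standardized features, coefficients, and intercept.
feature_names = ["ptau", "NfL", "age"]
weights = [0.8, 0.5, -0.3]
intercept = -0.9
X = [
    [1.2, 0.4, -0.5],
    [-0.7, 1.1, 0.3],
    [0.1, -1.5, 0.2],
    [-0.6, 0.0, 0.0],
]

# Base value: the model's expected log-odds over the background data.
background_means = [sum(col) / len(col) for col in zip(*X)]
base_value = intercept + sum(w * mu for w, mu in zip(weights, background_means))

shap_values = [linear_shap(weights, x, background_means) for x in X]

# Local view (force plot): base value + sum of SHAP values = model output.
x0, phi0 = X[0], shap_values[0]
output0 = intercept + sum(w * xj for w, xj in zip(weights, x0))
print(round(base_value + sum(phi0), 6), round(output0, 6))

# Global view (summary plot): rank features by mean absolute SHAP value.
mean_abs_shap = {
    name: sum(abs(phi[j]) for phi in shap_values) / len(shap_values)
    for j, name in enumerate(feature_names)
}
ranking = sorted(mean_abs_shap, key=mean_abs_shap.get, reverse=True)
print(ranking)
```

A positive SHAP value corresponds to a red segment in the force plot (pushing the output above the base value) and a negative one to a blue segment.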
Statistical Analysis
For descriptive analysis, continuous variables were displayed as mean (standard deviation) for normal distributions or median (interquartile range) for skewed distributions. Categorical variables were presented as numbers (percentages). For comparisons of characteristics, the t test or Wilcoxon test was used for continuous variables and the χ² test or Fisher's exact test for categorical variables. A two-tailed p < 0.05 was regarded as statistically significant. All the above analyses were performed in SPSS 26.0.
Fig. 1. Flowchart of the workflow. LR, logistic regression; DT, decision tree; RF, random forest; SVM, support vector machine; ANN, artificial neural network; XGB, extreme gradient boosting; AUC, area under curve; DCA, decision curve analysis.
Table 1. Performance of machine learning algorithms in AD prediction

Metric             LR                    DT                    RF                    SVM                   XGB                   ANN
Scenario 1
Balanced accuracy  0.738 (0.714, 0.762)  0.722 (0.698, 0.746)  0.749 (0.726, 0.773)  0.717 (0.692, 0.741)  0.746 (0.723, 0.770)  0.644 (0.618, 0.670)
Sensitivity        0.304 (0.256, 0.352)  0.385 (0.336, 0.435)  0.385 (0.337, 0.433)  0.171 (0.132, 0.209)  0.339 (0.291, 0.388)  0.491 (0.440, 0.542)
Specificity        0.882 (0.852, 0.912)  0.832 (0.804, 0.850)  0.869 (0.833, 0.905)  0.823 (0.796, 0.838)  0.854 (0.820, 0.879)  0.776 (0.745, 0.786)
Precision          0.652 (0.582, 0.723)  0.565 (0.506, 0.625)  0.651 (0.591, 0.712)  0.633 (0.539, 0.727)  0.666 (0.598, 0.733)  0.428 (0.382, 0.473)
F1                 0.414 (0.361, 0.468)  0.458 (0.410, 0.506)  0.484 (0.436, 0.531)  0.268 (0.215, 0.321)  0.449 (0.398, 0.500)  0.457 (0.414, 0.499)
AUC                0.720 (0.691, 0.750)  0.713 (0.683, 0.743)  0.764 (0.736, 0.791)  0.767 (0.738, 0.795)  0.754 (0.726, 0.782)  0.648 (0.617, 0.679)
Brier score        0.184 (0.174, 0.195)  0.209 (0.194, 0.224)  0.175 (0.165, 0.184)  0.178 (0.168, 0.188)  0.179 (0.170, 0.188)  0.356 (0.330, 0.382)
Scenario 2
Balanced accuracy  0.621 (0.594, 0.648)  0.626 (0.600, 0.651)  0.614 (0.587, 0.641)  0.668 (0.642, 0.694)  0.660 (0.634, 0.686)  0.575 (0.548, 0.601)
Sensitivity        0.470 (0.427, 0.513)  0.412 (0.369, 0.455)  0.451 (0.407, 0.496)  0.353 (0.312, 0.395)  0.480 (0.436, 0.525)  0.529 (0.485, 0.573)
Specificity        0.776 (0.745, 0.786)  0.837 (0.802, 0.851)  0.824 (0.798, 0.844)  0.806 (0.785, 0.815)  0.806 (0.785, 0.815)  0.786 (0.735, 0.802)
Precision          0.521 (0.474, 0.568)  0.532 (0.481, 0.582)  0.511 (0.463, 0.559)  0.643 (0.586, 0.699)  0.582 (0.534, 0.631)  0.465 (0.423, 0.507)
F1                 0.494 (0.454, 0.533)  0.464 (0.423, 0.505)  0.479 (0.439, 0.520)  0.456 (0.413, 0.499)  0.526 (0.486, 0.567)  0.495 (0.457, 0.532)
AUC                0.682 (0.653, 0.711)  0.636 (0.605, 0.667)  0.672 (0.642, 0.702)  0.698 (0.670, 0.727)  0.694 (0.665, 0.723)  0.621 (0.592, 0.650)
Brier score        0.217 (0.207, 0.226)  0.284 (0.265, 0.303)  0.223 (0.212, 0.233)  0.213 (0.204, 0.223)  0.212 (0.203, 0.221)  0.421 (0.394, 0.447)
Scenario 3
Balanced accuracy  0.744 (0.673, 0.815)  0.684 (0.603, 0.765)  0.701 (0.625, 0.777)  0.556 (0.490, 0.622)  0.717 (0.640, 0.794)  0.648 (0.570, 0.725)
Sensitivity        0.801 (0.737, 0.864)  0.724 (0.649, 0.798)  0.745 (0.674, 0.816)  0.655 (0.576, 0.734)  0.766 (0.695, 0.836)  0.704 (0.630, 0.779)
Specificity        0.936 (0.900, 0.985)  0.819 (0.783, 0.899)  0.829 (0.793, 0.908)  0.893 (0.857, 0.957)  0.882 (0.846, 0.949)  0.841 (0.805, 0.917)
Precision          0.807 (0.742, 0.871)  0.721 (0.643, 0.799)  0.741 (0.667, 0.815)  0.628 (0.530, 0.726)  0.763 (0.690, 0.836)  0.696 (0.616, 0.776)
F1                 0.789 (0.718, 0.859)  0.719 (0.642, 0.796)  0.739 (0.665, 0.813)  0.607 (0.513, 0.701)  0.757 (0.682, 0.832)  0.692 (0.612, 0.772)
AUC                0.818 (0.745, 0.890)  0.702 (0.616, 0.788)  0.802 (0.729, 0.875)  0.382 (0.283, 0.482)  0.788 (0.710, 0.867)  0.730 (0.646, 0.813)
Brier score        0.161 (0.126, 0.195)  0.206 (0.157, 0.255)  0.172 (0.139, 0.205)  0.236 (0.216, 0.255)  0.174 (0.141, 0.207)  0.199 (0.163, 0.235)

All metrics were calculated in the 30% test data with a 1,000-time bootstrap and are shown as estimate (95% confidence interval). Scenario 1, only sociodemographic variables; Scenario 2, scenario 1 plus self-reported health information; Scenario 3, scenario 2 plus blood biomarkers. LR, logistic regression; DT, decision tree; RF, random forest; SVM, support vector machine; XGB, extreme gradient boosting; ANN, artificial neural network; AUC, area under the curve.
Results
Performance of Prediction Models
The performance of the prediction models in each scenario is shown in Table 1. In scenario 1, all models showed moderate performance in balanced accuracy (ranging from 0.644 to 0.749) and area under the curve (AUC; ranging from 0.648 to 0.767), and they tended to miss AD cases, with low sensitivity (less than 0.50). Compared with scenario 1, there was a slight improvement in sensitivity (maximum: 0.529) in scenario 2, but balanced accuracy, precision, and AUC declined slightly, and the Brier score was relatively higher. In scenario 3, the sensitivity of all models improved substantially compared with the other two scenarios; the highest sensitivity, F1, and AUC were 0.801, 0.789, and 0.818, all achieved by LR. Moreover, when only blood biomarkers were incorporated, the overall performance of all models was superior to that in scenario 1 and scenario 2 but slightly lower than that in scenario 3 (online suppl. Table S7); the sensitivity, F1, and AUC of LR were 0.745, 0.728, and 0.820, respectively.
In the DCA (Fig. 2), when the threshold probability was set between 0.2 and 0.8 in scenario 1 and scenario 2, a certain clinical benefit could be expected from implementing specific interventions for high-risk individuals. In scenario 3, the threshold was better set between 0.2 and 0.6, and LR showed higher and more stable net benefits than the other models.
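Net benefit in DCA is commonly computed as the true-positive count per subject minus the false-positive count per subject weighted by the odds of the threshold probability. The sketch below applies that standard definition to a toy test set and compares the model with the "treat all" strategy ("treat none" has a net benefit of 0). The data and function names are hypothetical and are not the study's predictions.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every subject regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical test-set labels (1 = AD) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.35, 0.05]

# One point of the decision curve per threshold probability.
for pt in [0.2, 0.3, 0.4, 0.5, 0.6]:
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f}  model={model_nb:.3f}  treat all={all_nb:.3f}")
```

A model is clinically useful at a given threshold when its curve lies above both the "treat all" curve and zero.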
Interpretability Analysis of Models
Given that the overall performance of the prediction models was much better in scenario 3, and that LR in particular had high sensitivity, F1 score, and AUC and a low Brier score, we selected the LR model in scenario 3 for interpretability analysis to explore the most important predictors, which can be regarded as indicators for tailoring interventions among the high-risk population. Figure 3 presents the top 10 key factors from the global interpretability analysis. The blood biomarkers ptau protein and plasma neurofilament light (NfL), age, blood tau protein, and education were the top five predictors (Fig. 3). In addition, taurine, inosine, xanthine, marital status, and L-glutamine also showed importance for AD prediction. When only blood biomarker variables were used for AD prediction, NfL was recognized as the most important factor, with the largest SHAP value (online suppl. Fig. S2).
Figure 4a, b displays the contribution of each predictor to individual predictions. For example, Figure 4a shows that male sex, taurine, age, and ptau protein pushed the model output upward from the base value (−0.8644), whereas blood tau protein and education pushed it downward. Ultimately, with all these factors combined, the model output was pushed from the base value (−0.8644) to 0.1, and the individual was predicted as positive (a high-risk AD individual). In Figure 4b, apart from tau protein, the factors ptau protein, plasma NfL, age, and taurine pushed the model output down from the base value (−0.8644) to a lower value (−2.24); this individual was therefore predicted as normal. Figure 4c, d shows the decision mechanisms for a false-negative and a false-positive subject, respectively.
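If the force-plot values are read on the log-odds scale (an assumption, although this is the usual scale for SHAP explanations of logistic regression) and the base value and the Figure 4b output are taken as negative (also an assumption, consistent with the outputs being described as higher and lower than the base value), the quoted numbers can be converted to probabilities with the logistic function. The labels in the sketch are illustrative.

```python
import math

def sigmoid(log_odds):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Values quoted in the text; the negative signs are an assumption (see lead-in).
quoted = {
    "base value": -0.8644,
    "Fig. 4a output": 0.1,
    "Fig. 4b output": -2.24,
}

probabilities = {label: sigmoid(value) for label, value in quoted.items()}
for label, prob in probabilities.items():
    print(f"{label}: log-odds {quoted[label]:+.4f} -> probability {prob:.3f}")
```

Under these assumptions the Figure 4a output lies just above a probability of 0.5 and the Figure 4b output well below it, which matches the positive and normal predictions described above.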
Discussion
In this study, we proposed machine learning prediction models for AD identification, used an interpretability method to explore the key predictors, and further discussed the value of blood biomarkers for AD identification. These models can be employed to optimize risk stratification, and the identified predictors could serve as important clues for targeted prevention.
All models in scenario 1 showed moderate discriminative ability. In scenario 2, models showed lower balanced accuracy, precision, and AUC but relatively higher sensitivity and F1 score compared with models in scenario 1. This is presumably because the additional variables may also be substantially associated with various other geriatric diseases, such as cancer and cardiovascular diseases, which undermined their ability to discriminate the onset of AD [26]. Consistent with the findings of previous studies [27], when blood biomarkers were further incorporated in scenario 3, model performance improved substantially, especially sensitivity. When only blood biomarkers were included, all models outperformed models with only epidemiological data (sociodemographic characteristics and self-reported health information). Compared with the performance of models in scenario 1 and scenario 2, our results suggest that blood biological markers were somewhat more valuable than epidemiological information, but there is still a long way to go in the search for minimally invasive yet more sensitive indicators for AD identification. In the current study, the blood biomarkers ptau protein and plasma NfL, age, blood tau protein, and education were the top five variables and could be priority indicators for AD prediction.
Based on the results of this study, when the models incorporated only epidemiological information, their sensitivity was low and they were unable to identify many high-risk AD individuals. When blood biological markers were further incorporated, the sensitivity of the models improved considerably, which can assist in identifying people at high risk of AD. Additionally, in contrast to cerebrospinal fluid and imaging biomarkers, blood biomarkers, detected from peripheral blood samples, contain a variety of metabolites that reflect the physiological activity of many organs, including the brain, which makes them suitable as population-level prediction tools [28]. Further, it has been confirmed that biochemical changes in metabolism, followed by energy metabolism, associated with N-acetylaspartate (NAA), inositol (MI), glucose (Glu), glutamate (Gln), and aspartate (Asp) precede structural changes in AD [29]. Therefore, we constructed models with only blood biomarkers to examine possible low-invasive biological indicators of cognitive decline to help early identification of AD. The interpretability analysis identified the NfL chain as an important predictor. NfL is a crucial subunit of neurofilament protein (NF). As the major cytoskeletal protein of neuronal axons, NF is highly expressed at neuronal axon sites, maintains the stability of axon morphology, and ensures nerve signaling. NfL in tissue fluid can diffuse into blood through cerebrospinal fluid, and the concentration of NfL in blood is lower than that in cerebrospinal fluid. Therefore, NfL is considered a promising marker of neurodegeneration that can be measured in blood [30]. However, the predictive power of plasma metabolites was still limited in the current study. One possible explanation lies in the assay methods used for the blood biomarkers: the ability of blood markers to predict AD may be improved by more precise tests in the future, so the clinical application of blood biomarkers needs to be further examined.
Fig. 2. Decision curve and net benefit of each model. Net benefit: the sum of the gain value of the intervention for the corresponding true-positive population at each threshold and the loss value of the intervention for the false-positive population. LR, logistic regression; DT, decision tree; RF, random forest; SVM, support vector machine; XGB, extreme gradient boosting; ANN, artificial neural network; all positive: the net benefit of providing intervention for all subjects (all subjects would be positive); all negative: the net benefit of providing no intervention for all subjects (all subjects would be negative). a Scenario 1 (only sociodemographic variables). b Scenario 2 (scenario 1 plus self-reported health information). c Scenario 3 (scenario 2 plus blood biomarker).
Fig. 3. Interpretability analysis with SHapley Additive exPlanation for logistic regression in scenario 3 (top 10 predictors).
Fig. 4. Individual aspect of interpretability analysis with SHapley Additive exPlanation in scenario 3 (logistic regression). a True positive. b True negative. c False negative. d False positive.
Our prediction models have several clinical implications. The machine learning algorithms not only showed potential for predicting AD with abundant data but also showed stable and high clinical value in the decision curves. The net benefit in scenario 3 was near 0.2 at a threshold of 0.3, indicating that the models could be used to identify a net 20 true-positive subjects per 100 individuals [31]. Additionally, by presenting the contribution of each predictor, the interpretability analyses made individual decisions more transparent.
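The reading of a net benefit of 0.2 can be made explicit with the standard net benefit formula, where N is the number of subjects evaluated and p_t is the threshold probability. The true-positive and false-positive fractions in the worked line are hypothetical, chosen only to reproduce a value of 0.2 at a threshold of 0.3; they are not results from this study.

```latex
\[
\mathrm{NB}(p_t) \;=\; \frac{\mathrm{TP}}{N} \;-\; \frac{\mathrm{FP}}{N}\cdot\frac{p_t}{1-p_t},
\qquad
\left.\frac{p_t}{1-p_t}\right|_{p_t=0.3} = \frac{0.3}{0.7} = \frac{3}{7} \approx 0.429 .
\]
\[
\text{Hypothetical case: }\;
\frac{\mathrm{TP}}{N} = 0.30,\quad \frac{\mathrm{FP}}{N} = \frac{7}{30} \approx 0.233
\;\;\Longrightarrow\;\;
\mathrm{NB}(0.3) = 0.30 - \frac{7}{30}\cdot\frac{3}{7} = 0.30 - 0.10 = 0.20 .
\]
```

In other words, a net benefit of 0.2 is equivalent to correctly identifying 20 AD cases per 100 individuals with no unnecessary interventions, after false positives have been discounted at the exchange rate implied by the threshold.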
There are still several limitations. First, this was a cross-sectional study; therefore, our results need to be validated in a prospective study. Second, the sample size was relatively small; we performed bootstrapping to give an objective model evaluation, but a larger sample is warranted to externally validate our results. Meanwhile, the more complex machine learning algorithms did not outperform LR in our study, which may also be due to the small sample size; more advanced methods such as convolutional neural networks deserve to be studied in a larger dataset. Third, the majority of the people recruited in ADNI were from North America, so the study population was relatively constrained, which may limit model application; the generalizability of the results to other populations needs to be tested. Fourth, this study only predicted the risk of AD without considering its subtypes because there was no more detailed information about other dementia types, which deserves study in the future. Finally, the positive predictive value was not high enough, indicating some misdiagnosed cases, which needs to be addressed in further investigations.
Conclusion
This study utilized advanced machine learning methods and abundant clinical data to predict AD with good discrimination (AUC of 0.818 for the best model), which is crucial for optimizing personalized prevention trials and individualized risk management. In addition, interpretability analysis highlighted that the blood biomarkers ptau protein, plasma NfL, and blood tau protein and the sociodemographic variables age and education were the top five key factors for the early identification of AD; taurine, inosine, xanthine, marital status, and L-glutamine also showed importance for AD prediction and may be used for targeted intervention.
Acknowledgments
We thank the staff and the participants of the ADNI study.
Statement of Ethics
The ADNI study is an open-access database and is publicly available to researchers worldwide. The ADNI study was approved by each of the participating sites' Institutional Review Boards (IRBs) and complied with the Declaration of Helsinki. Written informed consent was obtained from all participants after they had received a complete description of the study. A full list of participating sites and Ethics Committees can be found in the Methods section.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This study was supported by the National Natural Science
Foundation of China (No. 81973144).
Author Contributions
M.J., Y.W., C.X., and Y.F. worked together on this article. Specifically, Y.W. conceived and designed the study. Y.W. and M.J. contributed to the data analysis. M.J. and Y.W. drafted the manuscript. C.X. revised the article. Y.F. supervised and revised the article. All authors have read and approved the final manuscript.
Data Availability Statement
The ADNI study is an open-access database and is publicly available from the Alzheimer's Disease Neuroimaging Initiative (https://adni.loni.usc.edu/). Further inquiries can be directed to the corresponding author.
References
1 Wang CY, Song PP, Niu YH. The management of dementia worldwide: a review on policy practices, clinical guidelines, end-of-life care, and challenge along with aging population. Biosci Trends. 2022 Apr;16(2):119–29.
2 Zetterberg H, Schott JM. Blood biomarkers for Alzheimer's disease and related disorders. Acta Neurol Scand. 2022;146(1):51–5.
3 2022 Alzheimer's disease facts and figures. Alzheimers Dement. 2022;18(4):700–89.
4 Sarwar T, Seifollahi S, Chan J, Zhang X, Aksakalli V, Hudson I, et al. The secondary use of electronic health records for data mining: data characteristics and challenges. ACM Comput Surv. 2022;55(2):1–40.
5 Shehab M, Abualigah L, Shambour Q, Abu-
Hashem MA, Shambour MKY, Alsalibi AI,
et al. Machine learning in medical applica-
tions: a review of state-of-the-art methods.
Comput Biol Med. 2022 Jun;145:105458.
6 Chen SX, Xu C. Handling high-dimensional data with missing values by modern machine learning techniques. J Appl Stat. 2022;50(3):786-804.
7 Petch J, Di S, Nelson W. Opening the black box: the promise and limitations of explainable machine learning in cardiology. Can J Cardiol. 2022 Feb;38(2):204-13.
8 Kwon Y, Rivas MA, Zou JYJA. Efficient computation and analysis of distributional Shapley values. 2021. abs/2007.01357.
9 Serrano-Pozo A, Das S, Hyman BT. APOE and Alzheimer's disease: advances in genetics, pathophysiology, and therapeutic approaches. Lancet Neurol. 2021;20(1):68-80.
10 Wu C, Wu L, Wang J, Lin L, Li Y, Lu Q, et al. Systematic identification of risk factors and drug repurposing options for Alzheimer's disease. Alzheimers Dement. 2021;7(1):e12148.
11 Blennow KA-O, Zetterberg H. Biomarkers for Alzheimer's disease: current status and prospects for the future. J Intern Med. 2018;284(6):643-63.
12 Podhorna J, Krahnke T, Shear M, Harrison JE; Alzheimer's Disease Neuroimaging Initiative. Alzheimer's disease assessment scale-cognitive subscale variants in mild cognitive impairment and mild Alzheimer's disease: change over time and the effect of enrichment strategies. Alzheimers Res Ther. 2016 Feb 12;8:8.
13 McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM, et al. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA work group under the auspices of department of health and human services task force on Alzheimer's disease. Neurology. 1984;34(7):939-44.
14 Stekhoven DJ, Bühlmann P. MissForest--non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112-8.
15 Tokuyama Y, Miki R, Fukushima Y, Tarutani Y, Yokohira T. Performance evaluation of feature encoding methods in network traffic prediction using recurrent neural networks. Proceedings of the 2020 8th international conference on information and education technology. Okayama, Japan: Association for Computing Machinery; 2020. p. 279-83.
16 Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323.
17 Guan XC, Zhang JH, Chen SY. Logistic regression based on statistical learning model with linearized kernel for classification. Cai. 2021;40(2):298-317.
18 Chauhan VK, Dahiya K, Sharma A. Problem formulations and solvers in linear SVM: a review. Artif Intell Rev. 2019;52(2):803-55.
19 Myles AJ, Feudale RN, Liu Y, Woody NA, Brown SD. An introduction to decision tree modeling. J Chemom. 2004;18(6):275-85.
20 Sarica A, Cerasa A, Quattrone A. Random forest algorithm for the classification of neuroimaging data in Alzheimer's disease: a systematic review. Front Aging Neurosci. 2017;9:329.
21 Schapire RE. The strength of weak learnability. Mach Learn. 1990;5(2):197-227.
22 Han SH, Kim KW, Kim S, Youn YC. Artificial neural network: understanding the basic concepts without mathematics. Dement Neurocogn Disord. 2018;17(3):83-9.
23 Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(10):735-6.
24 Ötleş E, Seymour J, Wang H, Denton BT. Dynamic prediction of work status for workers with occupational injuries: assessing the value of longitudinal observations. J Am Med Inf Assoc. 2022;29(11):1931-40.
25 Zhou Q, Liao F, Mou C, Wang P. Measuring interpretability for different types of machine learning models. In: Ganji M, Rashidi L, Fung BCM, Wang C, editors. Trends and applications in knowledge discovery and data mining. Cham: Springer International Publishing; 2018. p. 295-308.
26 Licher S, Leening MJG, Yilmaz P, Wolters FJ, Heeringa J, Bindels PJE, et al. Development and validation of a dementia risk prediction model in the general population: an analysis of three longitudinal studies. Am J Psychiatry. 2019 Jul 1;176(7):543-51.
27 Battista P, Salvatore C, Castiglioni I. Optimizing neuropsychological assessments for cognitive, behavioral, and functional impairment classification: a machine learning study. Behav Neurol. 2017;2017:1850909.
28 O'Bryant SE, Xiao G, Barber R, Huebinger R, Wilhelmsen K, Edwards M, et al. A blood-based screening tool for Alzheimer's disease that spans serum and plasma: findings from TARC and ADNI. PLoS One. 2011;6(12):e28092.
29 Lin AP, Shic F, Enriquez C, Ross BD. Reduced glutamate neurotransmission in patients with Alzheimer's disease -- an in vivo (13)C magnetic resonance spectroscopy study. Magma. 2003;16(1):29-42.
30 Leuzy A, Cullen NC, Mattsson-Carlgren N, Hansson O. Current advances in plasma and cerebrospinal fluid biomarkers in Alzheimer's disease. Curr Opin Neurol. 2021 Apr 1;34(2):266-74.
31 Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3:18.