A survey on AI and Decision Support Systems in Psychiatry
Uncovering a Dilemma
Markus Bertl 1,a, Peeter Ross a, Dirk Draheim b
Abstract
Every year, healthcare specialists collect more and more data about patients but struggle to use it to
optimize disease prevention, diagnosis, or treatment processes. While a manual use of this medical data
is virtually impossible considering the vast growth rate, automation with artificial intelligence (AI) and
digital decision support systems (DDSS) has still not yielded any large-scale success in healthcare. We aim
to investigate possible obstacles, the trustworthiness based on potential biases, and the adoption of new
technology by AI and DDSS in psychiatry based on a systematic literature review. We screened 189 papers
about AI or DDSS in psychiatry. Since the literature on disease-specific systems may deviate from that on general decision support systems for a whole domain, we added the results of a literature screening of 30 articles about AI or DDSS for post-traumatic stress disorder as one specific psychiatric disease. From the total of 56 articles, we extracted the algorithms, the data collection method and sample size of the training data, and the testing process including accuracy metrics. The results show small sample sizes (median of 121), a focus on algorithm development without real-world interaction, and methodological shortcomings in the evaluation of DDSS. Based on our survey, we conclude that DDSSs in psychiatry are not ready for the often-promised "AI revolution in healthcare". Based on the identified obstacles, we suggest ways to improve the current state of the art.
Keywords: Medical information policy; Medical technology; Digital decision support systems (DDSS);
Clinical Decision Support Systems (CDSS); Artificial Intelligence (AI); Psychiatry
1. Introduction
Health data is growing steadily. According to an estimate by the International Data Corporation (IDC), 2,414 exabytes of health data were generated by the end of 2020. Given that time is scarce, it is already impossible for a doctor to read all of a patient's medical data before an appointment, stay up to date with treatment methods, or track drug-drug interactions manually. Especially in psychiatry, the "gold standard"
of human diagnosis has low accuracy (Aboraya et al., 2006; Al-Huthail, 2008; Kitamura et al., 1989). A study suggests that more than half of patient care in the U.S. is not administered according to medical guidelines (McGlynn et al., 2003). Furthermore, 52.7% of people with depression are not correctly diagnosed by their general practitioner (Mitchell et al., 2009). This evidence shows that we collect medical data but struggle to make use of it. Digital Decision Support Systems (DDSS) and artificial intelligence
(AI) could be one way to address diagnostic uncertainty by assisting medical professionals in making sense
of data. In this research, we understand DDSS, following Sauter (1997), as "computer-based systems that bring together information from a variety of sources, assist in the organization and analysis of information and facilitate the evaluation of assumptions underlying the use of specific models" (Sauter, 1997). Artificial Intelligence is defined by the Cambridge Dictionary as "the study of how to produce
1 corresponding author
a Technical University Tallinn - Department of Health Technologies, Ehitajate tee 5, 19086 Tallinn, Estonia
b Technical University Tallinn - Department of Software Science, Ehitajate tee 5, 19086 Tallinn, Estonia
computers that have some of the qualities of the human mind, such as the ability to understand language,
recognize pictures, solve problems, and learn” (“Artificial Intelligence,” 2014).
Mounting evidence suggests that there is demand for such systems. 56% of U.S. adults are willing to share their health data with tech companies like Google (Day et al., 2019). The big data market for health data is booming (Dash et al., 2019) and is estimated to reach 7 billion USD by 2021. 85.9% of office-based physicians use electronic health records in the U.S. (Office-Based Physician Electronic Health Record Adoption, 2019), and a similar trend can be observed in the European Union (eHealth, Well-being, and Ageing (Unit H.3), 2019). However, considerable doubt exists. In today's digitized world, even a single byte can have importance in health-related decisions. Digitized data shows what kind of treatment a person gets, what kind of medication is prescribed, whether a person is allowed to drive, or even whether someone is allowed to make decisions on their own. The security of these data is not guaranteed: breaches have happened, data has been manipulated, and health IT has even been targeted by terrorist activities (Bertl, 2019). These developments could be countered by increasing investments in security, data protection techniques, and zero-trust computing. But AI itself also introduces new challenges to clinical safety (Challen et al., 2019). For example, concerns are growing about whether AI can be trusted. Have possible reasons for bias been taken into account? Are DDSS tested enough to be used in such a sensitive area as health? What accuracy can we expect?
A wide variety of biases in scientific publications have been studied extensively. Most notably, scholars have argued that the false discovery rate of what researchers present as experimental research findings in scientific publications exceeds 50% (Ioannidis, 2005).
In DDSS research, some researchers have also begun to critique the datasets used for AI and the dataset culture in machine learning in general (Paullada et al., 2020). Since data is the foundation of AI and machine learning, the question arises whether currently used data is curated well enough. Furthermore, scholars increasingly doubt whether sampling methodologies are good enough to justify their use in the medical domain. Poorly curated datasets reflect human biases. Given their foundational role for computerized systems, these biases risk spreading flawed decisions at large scale, possibly with catastrophic consequences. These arguments raise the question of what the current state of the art in AI and decision support systems actually is.
The dilemma of growing data, medical complexity, and questionable trustworthiness shows that a thorough investigation of DDSS and AI in healthcare needs to be conducted. In this paper, we investigate the corresponding literature to find obstacles, indications about trustworthiness, and the use of emerging technologies in AI and DDSS in psychiatry. Our goal is to examine the state of the art and possible ways of improving DDSS.
2. Medical Background
Psychiatric disorders represent critical non-communicable diseases of the 21st century. In 2010, mental
disorders accounted for €461 billion in healthcare costs in Europe (Gustavsson et al., 2011) and ranked as
the leading cause of years lived with disabilities (Wittchen et al., 2011). However, diagnostic accuracy in
psychiatry is still low. For example, 69% of patients with bipolar disorder are initially misdiagnosed by
mental health specialists (Singh & Rajput, 2006). Such errors in diagnoses remain uncorrected for an
average of 5.7 years (Morselli & Elgie, 2003). Despite some achievements in the implementation of DDSS
and AI in clinical routines like drug-drug interaction databases (Metsallik et al., 2018), primary care or
hospital DDSS (Sutton et al., 2020), medical specialists have been waiting for a breakthrough of DDSS and
AI in healthcare settings for at least two decades without tangible success. Most notably, systems still
suffer from both low user acceptance and adoption rates (Bates et al., 2003; Gaube et al., 2021; Sittig et
al., 2006). While new software-supported medical devices, like diagnostic devices, digital imaging, or cardiovascular interventional equipment, incorporate new technology (Bettinger, 2018; Neuman et al., 2012), are well accepted by clinicians, and have quickly acquired substantial market shares (Schreyögg et al., 2009), DDSSs have not followed a similar trajectory. Innovative technology like AI mostly does not deliver value in clinically adopted DDSSs (Strickland, 2019). This may be because medical devices are well targeted at specific clinical professionals and lead to better performance, while DDSS developers strive to cover a wide range of clinical disciplines with one technological application. These tendencies are exacerbated in more specialized healthcare fields such as psychiatry. More precisely, the corresponding
situation in psychiatry differs from other medical domains because biomarkers and technical tools for
decision-making have not yet been validated. As a result, diagnoses and treatment decisions tend to
depend on clinical interviews, observations, and self-report measures (Maron et al., 2019). These currently do not deliver results as precise as biomarkers do. Despite the urgent need caused by increasing
data, increasing medical complexity, as well as limited staff and financial capacities in healthcare, DDSS
and AI still lead a niche existence. Software is still mainly used to store data rather than as a tool to
redesign care processes or improve decision quality and safety. Therefore, investigation of different
aspects that might hinder broader adoption of DDSS in medicine is of great interest among clinicians. We
contribute to this debate by analyzing current obstacles and ways to improve DDSS and AI in psychiatry.
In addition to psychiatry as a whole, post-traumatic stress disorder (PTSD) was taken as a specific clinical entity in psychiatry. The American Psychiatric Association defines PTSD as "a psychiatric disorder that can occur
in people who have experienced or witnessed a traumatic event such as a natural disaster, a serious
accident, a terrorist act, war/combat, rape or other violent personal assault”. People with PTSD
experience recurrent thoughts about their traumatic experience, which influence their daily life (What Is PTSD?, n.d.). The lifetime prevalence of PTSD is around 12.5% (Spottswood et al., 2017), which renders it all the more pressing to examine this disorder in greater depth. Moreover, people suffering from PTSD are often un- or misdiagnosed, resulting in wrong, incomplete, or missing treatment (Meltzer et al., 2012).
3. Methods
3.1 Reporting Standards
We follow Kitchenham & Charters' (2007) five stages for performing systematic literature reviews in software engineering:
(1) Search Strategy
(2) Study Selection
(3) Study Quality Assessment
(4) Data Extraction
(5) Data Synthesis
The process of conducting this literature review is visualized in Figure 1. Importantly, our research
methodology complies with the PRISMA checklist for transparent reporting of systematic reviews and
meta-analyses (Liberati et al., 2009; Moher et al., 2009).
We want to highlight that our study is a systematic literature review. A meta-analysis is not possible because most AI/ML research does not report effect sizes with confidence intervals, which would be needed for synthesis in a fixed- or random-effects model. Because of that, we do not assign weights to the extracted features of the studies based on their sample sizes.
3.2 Research Questions
For the literature search, we worked based on RQ1, RQ2, and RQ3 shown in Table 1. The results of our
survey were then used to answer RQ4 based on a narrative synthesis.
Table 1: research questions

RQ1: What are the current obstacles of research on AI and decision support systems in psychiatry?
RQ2: How trustworthy is the state of the art concerning AI and decision support systems in psychiatry?
RQ3: How do AI and decision support systems in psychiatry adopt new technology?
RQ4: What is needed to improve AI and decision support systems in psychiatry?
3.3 Search Strategy
We built a search string derived from the above research questions. This search string consists of the objects of interest (decision support or artificial intelligence) and the scope of our review (psychiatry). Although AI has many sub-categories (e.g., machine learning, deep learning), it is likely that studies still use either decision support or AI in their title, abstract, or tags. We restricted our search to articles published between 2000 and 2020 to only include modern technology. The resulting search string was ((decision AND support AND system) OR (artificial AND intelligence)) AND psychiatry.
In addition to psychiatry as a whole, we supplement our analysis of general DDSS and AI algorithms for psychiatry by also selecting systems designed for one specific disease. This reduces possible bias, since approaches for the whole domain of psychiatry might yield different results than approaches for one condition. We used the same objects of interest (decision support or artificial intelligence) to construct the search string, but with the scope post-traumatic stress disorder and its abbreviation PTSD. The
resulting search was ("decision support" OR "Artificial Intelligence") AND (ptsd OR (post AND traumatic
AND stress AND disorder)). We applied our search strings to the research papers’ titles, abstracts, and
tags in Scopus’ abstract and citation database. Scopus was chosen as the primary source because it is the
largest abstract and citation database of research literature with a 100% MEDLINE coverage (Falagas et
al., 2008).
Our Scopus search was carried out on 30th March 2020. We also conducted reference screening and a
manual search in Google Scholar and the web to find additional research.
3.4 Study Selection
Titles and abstracts of the queried articles were analyzed to identify relevant articles from our search results. Articles that fit the research questions and met the inclusion criteria (see 3.4.1) and the quality criteria (see 3.4.2) were included. To reduce bias, title and abstract screening as well as checking the inclusion and quality criteria were conducted independently by two researchers. The two sets were then merged and deviations were discussed among the authors. In the end, we selected 26 papers from our
psychiatry search and 30 papers from our PTSD search, a total of 56 articles for this review. Cohen’s Kappa
was calculated to assess interrater reliability (McHugh, 2012). The agreement score was 90% (Cohen’s
Kappa 0.718). All disagreements could be resolved; they mainly concerned different judgements about whether studies were at maturity level 1 or 2, i.e., whether their algorithms were computerized or paper-based (IC2).
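For illustration, the following minimal sketch (Python, assuming scikit-learn is available; the screening decisions are invented placeholders, not our actual screening data) shows how such an agreement score and Cohen's Kappa can be computed:

```python
# Minimal sketch of inter-rater agreement on inclusion decisions.
# 1 = include, 0 = exclude; one entry per screened article (hypothetical).
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Raw agreement: share of articles both raters judged identically.
raw_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
# Cohen's Kappa corrects the raw agreement for chance agreement.
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"raw agreement: {raw_agreement:.0%}, Cohen's kappa: {kappa:.3f}")
```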
3.4.1 Inclusion criteria
Table 2 presents the inclusion criteria used.
Table 2: inclusion criteria

IC1: Does the study deal with decision support systems (e.g., systems that help to diagnose, screen, predict, or treat)?
IC2: Does the study use computerized algorithms?
IC3: Does the article deal with psychiatric diseases?
IC4: Is the article related to at least one of our research questions?
3.4.2 Study Quality Assessment
Since uncovering possible research biases was one purpose of this review, we reduced study quality
assessments to a minimum to get a more holistic view of the published research. Quality criteria are
shown in Table 3. We added QC3 since, through reference search, we found two articles originating from journals without peer review.
Table 3: quality criteria

QC1:
QC2:
QC3: Is the article published in a peer-reviewed journal?
3.5 Data Extraction and Synthesis
To answer our research questions, clearly scoped questions for data extraction were formed (see Table 4) based on the DDSS framework described further in Bertl et al., 2020. The framework was created based on thematic analysis (Braun & Clarke, 2006) to define the different components of DDSSs in healthcare. We use the following dimensions for our extraction:
- Data used by the DDSS, to get insights into which data sources are used and what the average sample sizes are. This contributes to RQ2 by investigating whether data sources are trustworthy and sample sizes are appropriate. It also contributes to RQ3 by showing what technology is used for data collection.
- Technology for data collection, user interaction, and decision-making, contributing to RQ3.
- Validation around accuracy, user acceptance, efficacy, and legal/compliance aspects, to extract information around RQ2 (trustworthiness).
- Disease that DDSSs are applied to. Different diseases can be used for possible subgroup analysis.
- Decision type (prediction, diagnosis, screening, monitoring, or treatment) for possible subgroup analysis.
- Maturity level of DDSSs from 1 (idea of a DDSS) to 7 (worldwide adopted product). This indicates general DDSS adoption rates by showing how far they have advanced in the market. Our maturity levels contribute to RQ1 and RQ4.
Using a framework for data extraction helps us to make our research reproducible by highlighting which
features of DDSSs were used to answer our research questions. We omitted framework dimensions about
user groups during extraction as they do not contribute to our research questions.
We decided to reuse our own model since it was developed specifically for the analysis of research on DDSS in psychiatry and has already been applied successfully. Other existing frameworks were found to be either too complex (Boza et al., 2009; Sprague, 1980) or not to fit our research questions (Camacho et al., 2020; Sim & Berlin, 2003). Greenes et al. also highlight that the many different perspectives on studying DDSS make it challenging to create a single DDSS model that can be reused for different applications (Greenes et al., 2018).
Table 4: extraction questions (EQ)

EQ1: What data do existing decision support systems use?
  EQ1.2: How large is the used sample size?
EQ2: How are existing DDSS in mental health implemented?
  EQ2.1: Decision technology
  EQ2.2: User interaction technology
  EQ2.3: Data collection technology
EQ3: Which features were validated?
  EQ3.1: How high is accuracy?
EQ4: What diseases are currently targeted by the DDSS?
EQ5: What decisions are supported by the system?
EQ6: What maturity level does the DDSS have?
The extracted answers to the EQs were then combined into a feature matrix based on common agreement among the authors. The extracted features were then clustered into a common terminology that allows further analysis and makes it possible to compare results in a narrative synthesis. Figure 1 highlights this research process.
Figure 1: research strategy
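As an illustration of this step, the following minimal sketch (Python with pandas; the two studies and their attribute values are hypothetical, not our extracted results) shows how extraction answers can be combined into such a feature matrix and how clustered terminology enables simple frequency-based synthesis:

```python
# Minimal sketch of a feature matrix built from the extraction questions.
import pandas as pd

feature_matrix = pd.DataFrame([
    {"study": "Example study A", "data_source": "questionnaire",
     "sample_size": 150, "decision_technology": "SVM",
     "evaluation": "accuracy", "accuracy": 0.85, "maturity": 2},
    {"study": "Example study B", "data_source": "EHR",
     "sample_size": 320, "decision_technology": "logistic regression",
     "evaluation": "AUC", "accuracy": None, "maturity": 3},
])

# After clustering terminology (one label per algorithm family), a narrative
# synthesis can start from simple frequency counts:
print(feature_matrix["decision_technology"].value_counts())
```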
3.6 Risk of Bias
We used funnel plots based on sample size and accuracy to search for possible publication bias (Sterne & Harbord, 2004). Funnel plots plot the treatment effect (accuracy in our case) against the sample size. If studies with smaller sample sizes have equal or less variance than studies with larger sample sizes (i.e., the accuracy distribution is skewed), publication bias can be assumed (Kitchenham & Charters, 2007). Our empirical accuracy distribution presented in Figure 2 does not indicate publication bias since it is nearly symmetric with only a small left skew of -0.03. Studies with smaller sample sizes have more variance in accuracy than studies with larger sample sizes. However, these results should be interpreted with caution since not all studies mentioned their systems' accuracy values, and some studies used different metrics that could not be converted to accuracy (see 4.1.3). Since most articles did not report statistically significant results, alternative methods for detecting publication bias or data mining, like p-curve analysis, were not possible. However, Figure 2 shows that mostly articles with high accuracy scores have been published.
Figure 2: Funnel plot of accuracy vs. sample size (Finkelman et al., 2017; Gong et al., 2014; Suhasini et al., 2011)
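Such a funnel-plot check can be reproduced with the following minimal sketch (Python, assuming matplotlib and scipy; the sample-size/accuracy pairs are illustrative placeholders, not our extracted study data):

```python
# Minimal sketch of the funnel-plot and skewness check described above.
import matplotlib.pyplot as plt
from scipy.stats import skew

sample_sizes = [12, 30, 48, 75, 121, 240, 480, 900, 2500, 6000]
accuracies = [0.95, 0.70, 0.88, 0.79, 0.86, 0.83, 0.84, 0.82, 0.83, 0.84]

# A strongly skewed accuracy distribution would hint at publication bias.
print(f"skewness of accuracy distribution: {skew(accuracies):.2f}")

plt.scatter(accuracies, sample_sizes)
plt.xlabel("Accuracy")
plt.ylabel("Sample size")
plt.title("Funnel plot: accuracy vs. sample size")
plt.show()
```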
4. Results
4.1 Facts from the literature
This sub-section deals with the hard facts obtained from the selected literature based on the extraction
questions in Table 4. The results are then analyzed, synthesized, and discussed in section 4.2. In general,
article publication dates range from 2001 to 2020. About 55% of articles were published between 2014
and 2020 (31/56). The majority of articles were published in medical journals (34/56), 12 were published
in computer science journals and 10 in journals specific to digital healthcare or health informatics.
4.1.1 Data
As shown in Figure 3, the majority of DDSS use the results of questionnaires or checklists (25%). As of the time of this review, innovative technology like virtual reality or sensor data has not been adopted widely (6.2%).

Figure 3: Input data
Figure 4 shows the distribution of sample sizes. The mean sample size was μ = 5569 with a standard deviation of σ = 19194.28 and a median of η = 237. The smallest sample size observed was 4, the largest 89840. Outliers (5972, 11540, 45388, 89840, 89840) have been removed from the plot for better visibility. In addition to the total sample size, 17 articles listed information about the number of positive and negative cases in their datasets.
Figure 4: Distribution of the sample size of DDSS in Psychiatry
39 out of the 56 articles in our review did not mention possible biases of data collection, their data, or
the DDSS algorithm.
4.1.2 Technology
Figure 5 shows the different algorithms used for DDSS in psychiatry. 19 research projects used Support Vector Machines (SVM) as the decision algorithm of their system. The second most popular decision technology was logistic regression (16 articles). Together with decision trees and random forests (9 articles), these groups make up nearly two-thirds of all algorithms. Explainability or explainable AI was not mentioned by any of the papers in this review; two papers noted as a downside that their approach is a black box.
Figure 5: Decision technology
Because of the generally low maturity scores described in 4.1.5, no clear indication of user interaction or data collection technology could be found, since these technologies are typically not present when only dealing with datasets.
4.1.3 Validation
Figure 6 presents the evaluation methods of DDSS. Since the majority of the research found is based on algorithm development on datasets, the most dominant evaluation criterion was algorithmic accuracy, measured for instance by precision, recall, F1 score, or the area under the receiver operating characteristic (ROC) curve (AUC). Definitions of the mentioned performance measures, especially AUC, can be found in Bradley, 1997. 74.6% of the articles used algorithmic accuracy for their evaluations.
Figure 6: Evaluation of DDSS
[Figure 5 data: SVM (19), logistic regression (16), decision tree (9), machine learning (7), text mining (6), bayesian model (6), rule based (5), neural networks (5), correlation (3), gradient boosting (2), visual analytics (1), spatial deformation model (1), kernelized partial least squares regression (1)]
The most popular measurement was "accuracy", present in 19 out of the 56 studies. The second most popular performance measurement was the area under the curve, present in 16 studies. Since accuracy and AUC values cannot be converted into each other, we extracted all accuracy measurements present in each paper and aggregated each scale individually. The mean accuracy of the DDSS is μ = 84.3% with a median of η = 82.5% and a standard deviation of σ = 9.9% (Figure 7). The mean area under the curve value is μ = 0.806 with a median of η = 0.805 and a standard deviation of σ = 0.071. 27 papers listed a confusion matrix or precision/recall values in addition to other evaluation metrics like AUC, F1, or accuracy scores.
Figure 7: Accuracy Boxplot
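To illustrate why accuracy and AUC were aggregated separately, the following minimal sketch (Python, assuming scikit-learn; labels and scores are invented) computes both from the same predictions; accuracy depends on a classification threshold, while AUC summarizes all thresholds:

```python
# Minimal sketch: accuracy and AUC derive from the same predictions but
# cannot be converted into one another.
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [0, 0, 0, 1, 1, 0, 1, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

print("accuracy:", accuracy_score(y_true, y_pred))   # threshold-dependent
print("AUC:     ", roc_auc_score(y_true, y_score))   # threshold-independent
```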
Figure 8 shows the accuracy scores with the corresponding sample size of the different papers in our
analysis.
Figure 8: Scatter plot of sample size vs. accuracy (series: DDSS for psychiatry and DDSS for PTSD)
4.1.4 Supported diseases and Supported decisions
Table 5 lists the diseases classified by the DDSS in psychiatry.
Table 5: Diseases

depression: 4
schizophrenia: 4
psychotic disorder: 2
anxiety: 1
ADHD: 1
autism: 1
PTSD: 1
positive valence system symptom severity: 1
suicide: 1
31 papers studied PTSD. Additionally, four papers investigated depression, two anxiety, and one paranoia.
Extracted features for supported diseases and decisions did not yield any results that could be linked to
our research questions about problems, biases, and fairness of AI algorithms.
4.1.5 Maturity
Based on the digital decision support framework described in Bertl, Metsallik, and Ross, 2020, maturity was ranked on a scale from 1 (idea) to 7 (worldwide adopted product). The levels are described in Table 6. Scores based on these maturity levels indicate the advancement of AI and DDSS development in psychiatry.
Table 6: Maturity levels

1: Idea without implementation
2: Implementation without real-world interaction (algorithm development)
3: Implementation with real-world interaction but without patient intervention (no real intervention on a patient takes place based on the output of the DDSS)
4: Fully functioning prototype, system triggers real-world action (e.g., clinical trial)
5: Operational product (at least one adopter, certified if required)
6: Locally adopted product
7: Worldwide adopted product (transformational)
The maturity levels of the research in this review are shown in Figure 9. The majority of articles dealt with maturity level two (29) and level three (17). The average maturity level of the research in this survey was 2.4, which indicates that most research deals with algorithm development.
Figure 9: Maturity levels (level 1: 5 articles, level 2: 29, level 3: 17)
A Mann-Whitney U test (McKnight & Najab, 2010) indicated that the maturity levels of articles about DDSS and AI in psychiatry do not differ statistically significantly from the maturity levels of articles about DDSS and AI for PTSD (U(npsychiatry=26, nPTSD=30) = 383.00, z = -0.11, p = .9124).
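For reproducibility, such a comparison can be computed as in the following minimal sketch (Python, assuming scipy; the maturity-level lists are invented placeholders, not the review data):

```python
# Minimal sketch of a Mann-Whitney U test on two maturity-level samples.
from scipy.stats import mannwhitneyu

maturity_psychiatry = [2, 2, 3, 2, 1, 3, 2, 4, 2, 3]  # hypothetical
maturity_ptsd = [2, 3, 2, 2, 3, 1, 2, 2, 3, 2]        # hypothetical

u_stat, p_value = mannwhitneyu(maturity_psychiatry, maturity_ptsd,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```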
4.2 Findings & Discussion
Our research summarizes the current state of the art on decision support systems and artificial intelligence
in psychiatry based on a theoretical framework for DDSS.
4.2.1 Current Obstacles of Research on AI and Decision Support Systems in Psychiatry - RQ1
Concerning RQ1 "obstacles", we observed small sample sizes when examining the data dimension. Mostly, either datasets that were already available were used, or data was collected using local medical information systems. No data source covering the whole population at the state or national level was found. We assume that this is an indicator of the lack of standardized eHealth infrastructures, universal data access, interoperability, and data reuse capabilities in psychiatry. Psychiatry, in particular, is
still mainly based on unconnected, impractical or inefficient electronic record systems. Sometimes,
documentation is even paper-based. This can introduce selection bias to AI training data. Clinical notes in
EMR are mostly entered in free text. Coding and classification of findings are either based on very general,
artificial categories like ICD or DSM, or on locally used legacy taxonomies. It is questionable whether these artificial categories reflect the actual mental problem present in a patient. Aware of these problems, new frameworks like the Research Domain Criteria (RDoC), which take into account more dimensions than just patient symptoms, are currently being developed (Cuthbert, 2014). This is especially
important given that we found only a small correlation between sample size and accuracy, indicating that more data does not necessarily produce better outcomes. Instead, the quality and representativeness of the data remain the important factors. As written at the beginning of this section, such problems continue to bedevil research on healthcare. In addition to quality issues, the fragmentation of healthcare data makes
it difficult to successfully implement DDSSs and AI in a real-world scenario. This conclusion is shared by
Panch et al., 2019. The generally low maturity scores of the research also indicate this problem of bringing AI and DDSS into clinical practice. Besides health data's fragmentation, another explanation could be a lack of strategic development, resulting in difficulties in bringing research into clinical settings. Providing well-accepted user interaction is often more challenging than the AI algorithm powering the DDSS's
cognition. Nevertheless, the evaluation dimension indicates that most papers focus on evaluating the cognition exclusively through accuracy scores. The human interaction with those systems is often neglected. This demonstrates a significant research gap when it comes to the enrichment of clinical processes with IT. Research on AI and DDSS should perhaps focus more on the effects on clinical processes, similar to health technologies that are more successful in clinical settings, e.g., diagnostic imaging.
4.2.2 Trustworthiness of AI and Digital Decision Support Systems in Psychiatry - RQ2
We answer RQ2 "trustworthiness" by investigating evaluation methods, accuracy, possible reasons for biases, and other factors that could lead to wrong recommendations by AI and DDSS in psychiatry. We found that the majority of articles investigated neither statistical significance (present in 17/56 papers) nor possible reasons for biases (19/56). Ranking according to maturity levels revealed that research mainly dealt with algorithm development. In contrast, randomized controlled trials were rare and only two systems in production were found. The fact that neither the concept of decision support systems (Power, 2008, pp. 121–140) nor AI (Yu et al., 2018) is new indicates that the field is both stagnating and getting
increasingly complex. This fact can be observed in current research and is also present in commercial
products like IBM Watson (Strickland, 2019). One important factor that has held back AI in the past has
recently been overcome: the lack of computing power. This positive development has led to many AI-
related success stories in areas like finance, retail, or marketing. Notable examples include Google or
Amazon, which are heavily dependent on AI (Smith & Linden, 2017). However, the medical domain is
complex, the generalizability of diagnoses is questionable, while data collection and reuse are time-
consuming and expensive. Additionally, many requirements concerning data protection and
anonymization pose difficulties. Also, diagnosing patients is not a straightforward matter, especially in
mental health, and the reproducibility of a patient diagnosis by humans is low (Aboraya et al., 2006; Basco
et al., 2000; Mendel et al., 2011; Muller, 2013). Studies suggest that not only diagnostic but also administrative errors run rampant in modern-day diagnoses (Davis et al., 2016). This means that AI's training or labeling data is itself probably inconsistent, making it difficult to establish a ground truth for ML training. In this context, computerized systems function as catalysts for errors already present in the dataset by enabling the large-scale reproduction of already biased decisions. It is questionable whether simple algorithms like SVMs (used in 19/56 papers) or logistic regression (16/56) can model the complex neurophysiological processes present in the human brain to a satisfactory degree.
Evaluation is an important keyword when it comes to the question of trustworthiness. Data scientists and AI researchers focus on improving accuracy scores since the academic community has decided that this constitutes the main criterion for success. Other evaluation metrics are often neglected. According to our review, 47 out of the 56 papers focus on accuracy evaluations. A mean accuracy of 83.45% seems high. However, accuracy alone is insufficient to measure whether AI algorithms or DDSSs perform well. One crucial shortcoming of accuracy scores is that they are highly dependent on both the sampling and the number of positive and negative cases present in the evaluation sets. Hence, the class balance must be preserved to get realistic accuracy scores. By nature, classification problems concerning psychiatric diseases are highly unbalanced, meaning the number of negative cases in a random population sample is much higher than the number of positive ones. Since researchers make evaluations based on their personally collected data, they may reproduce their own biases. Due to the problems mentioned above, the resulting accuracy values make comparisons between different research results difficult. Benchmarking different approaches becomes even more complicated when only a single measure like accuracy or AUC is reported, as was the case in most of the investigated research.
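A minimal sketch (Python, assuming scikit-learn; the data is synthetic) illustrates this pitfall: on a sample with 95% negative cases, a trivial classifier that always predicts "healthy" reaches 95% accuracy while detecting no patient at all.

```python
# Minimal sketch of the class-imbalance pitfall of accuracy scores.
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 5 + [0] * 95   # 5 positive cases in a sample of 100
y_pred = [0] * 100            # majority-class "classifier": always negative

print("accuracy:", accuracy_score(y_true, y_pred))                  # 0.95
print("recall:  ", recall_score(y_true, y_pred, zero_division=0))   # 0.0
```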
4.2.3 Adoption of new Technology - RQ3
When looking at RQ3 "adoption of new technology", our literature review uncovered that DDSSs' input data is mostly based on checklists or questionnaires. Checklists and questionnaires as tools for diagnosis have been tested and validated over the past decades, and they are still the only approach for diagnosis mentioned in medical guidelines like NICE (NICE Guideline NG116, n.d.). New data sources that could be used for digital phenotyping have not been adopted widely yet. Wider use of internationally recognized taxonomies for clinical notes and standardization of data capture would also improve the performance and quality of artificial intelligence and DDSS. Since new technology has a short lifecycle, it is more difficult to find evidence supporting its clinical use, which might explain why there is less research on it. From an algorithmic perspective, the most common algorithm used in the articles of this review was the SVM. Compared to other algorithms, SVMs have less stringent assumptions for input data and are easy to implement. However, SVMs are not explainable by default. Explainable AI is still a neglected topic in the current state of the art in health informatics. From a legal as well as an ethical point of view, it is essential to understand why systems produce certain outputs (Safdar et al., 2020). Since cases are known where the application of AI algorithms has resulted in discrimination based on ethnicity or gender (Buolamwini & Gebru, 2018; Leavy, 2018), decision transparency is very much needed, especially in a sensitive domain like healthcare.
4.2.4 Ways to Improve AI and Digital Decision Support Systems in Psychiatry - RQ4
Advancements of AI algorithms and DDSS are highly dependent on data availability. As shown above, data
impacts AI models' training and is also the primary source of evaluation and benchmarking. We think that
many current problems are unrelated to algorithms’ cognition or intelligence but can be better explained
by a lack of high-quality data.
We propose that further research dedicates renewed attention to the use of unobtrusive data by DDSS to supplement diagnostic data and clinical questionnaires. This shifts the focus from a diagnostic perspective based on generalized artificial categorization back to the physiological problems caused by different diseases, giving better insights into the possible causes of mental illness and effective therapeutic interventions. Unobtrusive data is not impacted by current issues in healthcare concerning data and diagnosis standardization and data collection. Additionally, it helps to mitigate potential biases in available data sources like electronic health records or checklists/questionnaires.
To ensure that AI and DDSS in psychiatric research are less biased, we propose using a unified benchmark dataset. Such datasets should contain anonymized, open-access data from many different sources. A unified benchmark can help to overcome challenges in measuring the correctness of DDSS algorithms. It helps to obtain standardized benchmark results, which makes the comparison of different approaches possible. This is already common in other disciplines where AI is used; examples include the MNIST database of handwritten digits by the US National Institute of Standards and Technology (LeCun et al., 1999) or TweetEval for tweet classification (Barbieri et al., 2020). Further research needs to specify what such unified benchmark datasets for DDSS and AI in psychiatry could look like.
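The following sketch illustrates the benchmark pattern we have in mind (Python, assuming scikit-learn; load_digits serves here as a small built-in stand-in for a benchmark such as MNIST): two algorithms are scored on the same dataset with the same fixed split, making their results directly comparable.

```python
# Minimal sketch of benchmark-style evaluation on a shared dataset.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# A fixed random_state guarantees that every algorithm sees the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

for model in (SVC(), LogisticRegression(max_iter=5000)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```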
Apart from relying on more and better data, we want to highlight the importance of including confusion matrices in academic papers. A confusion matrix helps to make the performance of different algorithms comparable. It is impossible to produce aggregated meta-analyses of the performance of DDSS and AI when accuracy, AUC, or F1 values are the only measures that researchers report, since these values cannot be converted into a unified metric. Nevertheless, we found that only 27 papers listed confusion matrices.
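A short sketch (Python; the four cell counts are invented) shows why a reported confusion matrix suffices to recompute comparable metrics:

```python
# Minimal sketch: deriving standard metrics from confusion-matrix cells.
tp, fp, fn, tn = 40, 10, 15, 135  # hypothetical confusion-matrix cells

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, "
      f"recall={recall:.3f}, F1={f1:.3f}")
```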
Given that most studies in this review used available datasets that were not sampled individually for their research (maturity level two or lower), their high accuracy values do not necessarily reflect good performance of level four or higher DDSS. We suggest that DDSS need to be tested in clinical settings to evaluate their real-world performance and efficacy. Additionally, the high accuracy of the algorithms could be an indication of overfitting. Current research mostly neglects this.
It is not sufficient to focus exclusively on one dimension of our framework like data or decision technology.
In order to introduce DDSS and AI safely into clinical practice, there is a need for standardization and
unified evaluation criteria for every part of the decision support system framework by Bertl et al., 2020.
A standardized way of evaluating DDSS in clinical practice is needed to show the unbiased performance
of DDSS and AI in healthcare. At the moment, this has been neglected by the scientific community.
However, a standardized evaluation is the foundation of the trustworthiness of computerized decision-
making.
4.2.5 Limitations
This survey has several limitations. First, we only included peer-reviewed publications in English. Relevant DDSS might have been published as pre-prints or news reports. DDSS may also have been implemented in real-world clinical practice without these systems having been published in academic journals. We think that this is unlikely but still possible. One example is the product EBMeDS (Duodecim Medical Publications Ltd, 2020): several publications about EBMeDS exist, but none of them mentions psychiatry, although the product is used in this area in production. These points may partly explain why we found low maturity scores.
Not least, further research needs to be done to analyze the maturity scores of currently used DDSSs. Since research about DDSS in psychiatry is limited, our results are based on a small corpus of literature. This needs to be taken into account when interpreting the quantitative results of this survey.
Regarding our research question on trustworthiness, it needs to be noted that this is a highly subjective matter, and no unified measurement has evolved. We try to quantify trustworthiness by looking at maturity and accuracy scores. Further research could use other approaches, like surveys of stakeholders, to measure their opinions.
Another limitation is that we calculate statistical parameters over the whole study population. We argue that this is valid because all our papers still belong to the top-level category of DDSS in psychiatry, although they might follow different approaches and deal with various diseases. The overview provided by this survey is needed to establish a baseline for how well DDSSs in psychiatry work in general. Especially the aggregation of the sample sizes used for training and evaluation, as well as the accuracy scores, can be an indicator of trustworthiness and is therefore highly relevant for our research. We accept that our aggregations might hide important variations in the data. A more granular sub-group analysis would be desirable but is unlikely to find statistically significant results due to the even smaller corpus of literature. Here, we encourage further research. Nevertheless, such studies still could not draw conclusions about the general subject area of DDSSs in psychiatry.
5 Conclusion
There is no evidence of widespread usage of AI and DDSS applications in psychiatry or other clinical
specialties in everyday practice. Although the algorithms' high accuracy scores seem to support their use,
this systematic literature review indicated problems concerning small sample sizes, possibilities of bias,
lack of evaluation in production, and potential difficulties in establishing a ground truth. One reason that
could explain why collecting new and original data is difficult might be the absence of standardization,
centralized eHealth infrastructure, and data reuse capabilities in healthcare systems. Concepts to cope
with health data fragmentation are needed as well as concepts to ensure data quality. Additionally, we
are missing broad evidence of AI’s and DDSSs’ success confirmed by clinical studies to justify large-scale
adoption. We also advocate for introducing a standardized concept for evaluating DDSS and AI in
healthcare. One such component could be carefully assembled unified benchmark datasets to establish a consistent way of evaluating the algorithmic accuracy of DDSS and AI and to help
reduce bias. Algorithmic bias reflects human biases and culture, possibly with catastrophic consequences. On the other hand, a well-consolidated AI system that discovers real relationships in big data could amass a kind of knowledge of human mental disorders, their genetic origins, and their expressions in diagnoses and human behavior in a way that no other method can, potentially tapping into the 'source code of the mind' at the deepest neurocognitive levels. However, although much needed, we see the opportunity for AI and DDSSs to improve psychiatry at the moment to remain just that: an opportunity for the future.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-
for-profit sectors.
References
Aboraya, A., Rankin, E., France, C., El-Missiry, A., & John, C. (2006). The Reliability of Psychiatric
Diagnosis Revisited. Psychiatry (Edgmont), 3(1), 41–50.
Al-Huthail, Y. R. (2008). Accuracy of Referring Psychiatric Diagnosis. International Journal of Health
Sciences, 2(1), 35–38.
Alvarez-Conrad, J., Zoellner, L. A., & Foa, E. B. (2001). Linguistic predictors of trauma pathology and
physical health. Applied Cognitive Psychology, 15(7), S159–S170.
https://doi.org/10.1002/acp.839
Barbieri, F., Camacho-Collados, J., Neves, L., & Espinosa-Anke, L. (2020). TweetEval: Unified Benchmark
and Comparative Evaluation for Tweet Classification. ArXiv:2010.12421 [Cs].
http://arxiv.org/abs/2010.12421
Barish, G., Aralis, H., Elbogen, E., & Lester, P. (2019). A mobile app for patients and those who care about
them: A case study for veterans with PTSD + anger. ACM Int. Conf. Proc. Ser., 1–10. Scopus.
https://doi.org/10.1145/3329189.3329248
Basco, M. R., Bostic, J. Q., Davies, D., Rush, A. J., Witte, B., Hendrickse, W., & Barnett, V. (2000). Methods
to Improve Diagnostic Accuracy in a Community Mental Health Setting. American Journal of
Psychiatry, 157(10), 1599–1605. https://doi.org/10.1176/appi.ajp.157.10.1599
Bates, D. W., Kuperman, G. J., Wang, S., Gandhi, T., Kittler, A., Volk, L., Spurr, C., Khorasani, R.,
Tanasijevic, M., & Middleton, B. (2003). Ten Commandments for Effective Clinical Decision
Support: Making the Practice of Evidence-based Medicine a Reality. Journal of the American
Medical Informatics Association, 10(6), 523–530. https://doi.org/10.1197/jamia.M1370
Bergman, L. G., & Fors, U. G. H. (2005). Computer-aided DSM-IV-diagnostics – Acceptance, use and
perceived usefulness in relation to users’ learning styles. BMC Medical Informatics and Decision
Making, 5. Scopus. https://doi.org/10.1186/1472-6947-5-1
Berrouiguet, S., Barrigón, M. L., Castroman, J. L., Courtet, P., Artés-Rodríguez, A., & Baca-García, E.
(2019). Combining mobile-health (mHealth) and artificial intelligence (AI) methods to avoid
suicide attempts: The Smartcrises study protocol. BMC Psychiatry, 19(1). Scopus.
https://doi.org/10.1186/s12888-019-2260-y
Bertl, M. (2019). News analysis for the detection of cyber security issues in digital healthcare. Young Information Scientist, 4, 1–15. https://doi.org/10.25365/yis-2019-4-1
Bertl, M., Metsallik, J., & Ross, P. (2020). Digital Decision Support Systems for Post-Traumatic Stress Disorder – Implementing a novel framework for decision support systems based on a technology-focused, systematic literature review. https://doi.org/10.13140/RG.2.2.12571.28965/1
Bettinger, C. J. (2018). Advances in Materials and Structures for Ingestible Electromechanical Medical
Devices. Angewandte Chemie International Edition, 57(52), 16946–16958.
https://doi.org/10.1002/anie.201806470
Boza, A., Ortiz, A., Vicens, E., & Poler, R. (2009). A Framework for a Decision Support System in a
Hierarchical Extended Enterprise Decision Context. In R. Poler, M. van Sinderen, & R. Sanchis
(Eds.), Enterprise Interoperability (pp. 113–124). Springer. https://doi.org/10.1007/978-3-642-04750-3_10
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning
algorithms. Pattern Recognition, 30(7), 1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology,
3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial
Gender Classification. Conference on Fairness, Accountability and Transparency, 77–91.
http://proceedings.mlr.press/v81/buolamwini18a.html
Camacho, J., Zanoletti-Mannello, M., Landis-Lewis, Z., Kane-Gill, S. L., & Boyce, R. D. (2020). A
Conceptual Framework to Study the Implementation of Clinical Decision Support Systems
(BEAR): Literature Review and Concept Mapping. Journal of Medical Internet Research, 22(8),
e18388. https://doi.org/10.2196/18388
Caye, A., Agnew-Blais, J., Arseneault, L., Gonçalves, H., Kieling, C., Langley, K., Menezes, A. M. B., Moffitt,
T. E., Passos, I. C., Rocha, T. B., Sibley, M. H., Swanson, J. M., Thapar, A., Wehrmeister, F., &
Rohde, L. A. (2019). A risk calculator to predict adult attention-deficit/hyperactivity disorder:
Generation and external validation in three birth cohorts and one clinical sample. Epidemiology
and Psychiatric Sciences. Scopus. https://doi.org/10.1017/S2045796019000283
Challen, R., Denny, J., Pitt, M., Gompels, L., Edwards, T., & Tsaneva-Atanasova, K. (2019). Artificial
intelligence, bias and clinical safety. BMJ Quality & Safety, 28(3), 231–237.
https://doi.org/10.1136/bmjqs-2018-008370
Chyzhyk, D., Graña, M., Öngür, D., & Shinn, A. K. (2015). Discrimination of schizophrenia auditory
hallucinators by machine learning of resting-state functional MRI. International Journal of Neural
Systems, 25(3). Scopus. https://doi.org/10.1142/S0129065715500070
Constantinou, A. C., Fenton, N., Marsh, W., & Radlinski, L. (2016). From complex questionnaire and
interviewing data to intelligent Bayesian network models for medical decision support. Artificial
Intelligence in Medicine, 67, 75–93. Scopus. https://doi.org/10.1016/j.artmed.2016.01.002
Coppersmith, G., Harman, C., & Dredze, M. (2014). Measuring post traumatic stress disorder in twitter.
Proc. Int. Conf. Weblogs Soc. Media, ICWSM, 579–582. Scopus.
https://www.scopus.com/inward/record.uri?eid=2-s2.0-
84909982111&partnerID=40&md5=385e0d4e8cb2b52e387d55317b1ac262
Coronato, A., De Pietro, G., & Paragliola, G. (2014). A situation-aware system for the detection of motion
disorders of patients with Autism Spectrum Disorders. Expert Systems with Applications, 41(17),
7868–7877. Scopus. https://doi.org/10.1016/j.eswa.2014.05.011
Ćosić, K., Popović, S., Kukolja, D., Horvat, M., & Dropuljić, B. (2010). Physiology-Driven Adaptive Virtual
Reality Stimulation for Prevention and Treatment of Stress Related Disorders. Cyberpsychology,
Behavior, and Social Networking, 13(1), 73–78. https://doi.org/10.1089/cyber.2009.0260
Cuthbert, B. N. (2014). The RDoC framework: Facilitating transition from ICD/DSM to dimensional
approaches that integrate neuroscience and psychopathology. World Psychiatry, 13(1), 28–35.
https://doi.org/10.1002/wps.20087
Dabek, F., & Caban, J. J. (2015a). A neural network based model for predicting psychological conditions
(Vol. 9250). Springer Verlag; Scopus. https://doi.org/10.1007/978-3-319-23344-4_25
Dabek, F., & Caban, J. J. (2015b). Leveraging big data to model the likelihood of developing psychological
conditions after a concussion. In Roy A., Venayagamoorthy K., Alimi A., Angelov P., & Trafalis T.
(Eds.), Procedia Comput. Sci. (Vol. 53, pp. 265–273). Elsevier B.V.; Scopus.
https://doi.org/10.1016/j.procs.2015.07.303
Dash, S., Shakyawar, S. K., Sharma, M., & Kaushik, S. (2019). Big data in healthcare: Management,
analysis and future prospects. Journal of Big Data, 6(1), 54. https://doi.org/10.1186/s40537-
019-0217-0
Davis, K. A. S., Sudlow, C. L. M., & Hotopf, M. (2016). Can mental health diagnoses in administrative data
be used for research? A systematic review of the accuracy of routinely collected diagnoses. BMC
Psychiatry, 16(1), 263. https://doi.org/10.1186/s12888-016-0963-x
Day, S., Seninger, C., Fan, J., Pundi, K., Perino, A., & Turakhia, M. (2019). Digital Health Consumer
Adoption Report 2019. Stanford Medicine.
Duodecim Medical Publications Ltd. (2020). EBMEDS White Paper. https://www.ebmeds.org/wp-
content/uploads/sites/16/2020/10/WhitePaper_2020-1.pdf
eHealth, Well-being, and Ageing (Unit H.3). (2019). EHealth adoption in primary healthcare in the EU is
on the rise [Text]. European Commission. https://ec.europa.eu/digital-single-
market/en/news/ehealth-adoption-primary-healthcare-eu-rise
Falagas, M. E., Pitsouni, E. I., Malietzis, G. A., & Pappas, G. (2008). Comparison of PubMed, Scopus, Web
of Science, and Google Scholar: Strengths and weaknesses. The FASEB Journal, 22(2), 338–342.
https://doi.org/10.1096/fj.07-9492LSF
Finkelman, M. D., Lowe, S. R., Kim, W., Gruebner, O., Smits, N., & Galea, S. (2017). Customized
computer-based administration of the PCL-5 for the efficient assessment of PTSD: A proof-of-
principle study. Psychological Trauma: Theory, Research, Practice, and Policy, 9(3), 379–389.
https://doi.org/10.1037/tra0000226
Freeman, D., Antley, A., Ehlers, A., Dunn, G., Thompson, C., Vorontsova, N., Garety, P., Kuipers, E.,
Glucksman, E., & Slater, M. (2014). The use of immersive virtual reality (VR) to predict the
occurrence 6 months later of paranoid thinking and posttraumatic stress symptoms assessed by
self-report and interviewer methods: A study of individuals who have been physically assaulted.
Psychological Assessment, 26(3), 841–847. https://doi.org/10.1037/a0036240
Galatzer-Levy, I. R., Karstoft, K.-I., Statnikov, A., & Shalev, A. Y. (2014). Quantitative forecasting of PTSD
from early trauma responses: A Machine Learning application. Journal of Psychiatric Research,
59, 68–76. Scopus. https://doi.org/10.1016/j.jpsychires.2014.08.017
Gaube, S., Suresh, H., Raue, M., Merritt, A., Berkowitz, S. J., Lermer, E., Coughlin, J. F., Guttag, J. V.,
Colak, E., & Ghassemi, M. (2021). Do as AI say: Susceptibility in deployment of clinical decision-
aids. Npj Digital Medicine, 4(1), 1–8. https://doi.org/10.1038/s41746-021-00385-9
Gong, Q., Li, L., Tognin, S., Wu, Q., Pettersson-Yeo, W., Lui, S., Huang, X., Marquand, A. F., & Mechelli, A.
(2014). Using structural neuroanatomy to identify trauma survivors with and without post-
traumatic stress disorder at the individual level. Psychological Medicine, 44(1), 195–203.
https://doi.org/10.1017/S0033291713000561
Goodspeed, A., Kostman, N., Kriete, T. E., Longtine, J. W., Smith, S. M., Marshall, P., Williams, W., Clark,
C., & Blakeslee, W. W. (2019). Leveraging the utility of pharmacogenomics in psychiatry through
clinical decision support: A focus group study. Annals of General Psychiatry, 18(1). Scopus.
https://doi.org/10.1186/s12991-019-0237-3
Goodwin, T. R., Maldonado, R., & Harabagiu, S. M. (2017). Automatic recognition of symptom severity
from psychiatric evaluation records. Journal of Biomedical Informatics, 75, S71–S84. Scopus.
https://doi.org/10.1016/j.jbi.2017.05.020
Greenes, R. A., Bates, D. W., Kawamoto, K., Middleton, B., Osheroff, J., & Shahar, Y. (2018). Clinical
decision support models and frameworks: Seeking to address research issues underlying
implementation successes and failures. Journal of Biomedical Informatics, 78, 134–143.
https://doi.org/10.1016/j.jbi.2017.12.005
Gustavsson, A., Svensson, M., Jacobi, F., Allgulander, C., Alonso, J., Beghi, E., Dodel, R., Ekman, M.,
Faravelli, C., Fratiglioni, L., Gannon, B., Jones, D. H., Jennum, P., Jordanova, A., Jönsson, L.,
Karampampa, K., Knapp, M., Kobelt, G., Kurth, T., … Olesen, J. (2011). Cost of disorders of the
brain in Europe 2010. European Neuropsychopharmacology, 21(10), 718–779.
https://doi.org/10.1016/j.euroneuro.2011.08.008
Hatton, C. M., Paton, L. W., McMillan, D., Cussens, J., Gilbody, S., & Tiffin, P. A. (2019). Predicting
persistent depressive symptoms in older adults: A machine learning approach to personalised
mental healthcare. Journal of Affective Disorders, 246, 857–860. Scopus.
https://doi.org/10.1016/j.jad.2018.12.095
He, Q., Veldkamp, B. P., Glas, C. A. W., & de Vries, T. (2017). Automated Assessment of Patients’ Self-
Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and
Text Mining. Assessment, 24(2), 157–172. https://doi.org/10.1177/1073191115602551
Henshall, C., Cipriani, A., Ruvolo, D., Macdonald, O., Wolters, L., & Koychev, I. (2019). Implementing a
digital clinical decision support tool for side effects of antipsychotics: A focus group study.
Evidence-Based Mental Health, 22(2), 56–60. Scopus. https://doi.org/10.1136/ebmental-2019-
300086
Henshall, C., Marzano, L., Smith, K., Attenburrow, M.-J., Puntis, S., Zlodre, J., Kelly, K., Broome, M. R.,
Shaw, S., Barrera, A., Molodynski, A., Reid, A., Geddes, J. R., & Cipriani, A. (2017). A web-based
clinical decision tool to support treatment decision-making in psychiatry: A pilot focus group
study with clinicians, patients and carers. BMC Psychiatry, 17(1). Scopus.
https://doi.org/10.1186/s12888-017-1406-z
Hossain, M. F., George, O., Johnson, N., Madiraju, P., Flower, M., Franco, Z., Hooyer, K., Rein, L., Mazaba,
J. L., & Ahamed, S. I. (2019). Towards clinical decision support for veteran mental health crisis
events using tree algorithm. In Getov V., Gaudiot J.-L., Yamai N., Cimato S., Chang M., Teranishi
Y., Yang J.-J., Leong H.V., Shahriar H., Takemoto M., Towey D., Takakura H., Elci A., Takeuchi S., &
Puri S. (Eds.), Proc Int Comput Software Appl Conf (Vol. 2, pp. 386–390). IEEE Computer Society;
Scopus. https://doi.org/10.1109/COMPSAC.2019.10237
Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLOS Medicine, 2(8), e124.
https://doi.org/10.1371/journal.pmed.0020124
Jones, N. J., & Bennell, C. (2007). The development and validation of statistical prediction rules for
discriminating between genuine and simulated suicide notes. Archives of Suicide Research,
11(2), 219–233. Scopus. https://doi.org/10.1080/13811110701250176
Karstoft, Karen-Inge, Statnikov, A., Andersen, S. B., Madsen, T., & Galatzer-Levy, I. R. (2015). Early
identification of posttraumatic stress following military deployment: Application of machine
learning methods to a prospective study of Danish soldiers. Journal of Affective Disorders, 184,
170–175. https://doi.org/10.1016/j.jad.2015.05.057
Karstoft, K.-I., Galatzer-Levy, I. R., Statnikov, A., Li, Z., Shalev, A. Y., Ankri, Y., Freedman, S., Addesky, R.,
Israeli-Shalev, Y., Gilad, M., Roitman, P., & members of the Jerusalem Trauma Outreach
and Prevention Study (J-TOPS) group. (2015). Bridging a translational gap: Using machine
learning to improve the prediction of PTSD. BMC Psychiatry, 15(1). Scopus.
https://doi.org/10.1186/s12888-015-0399-8
Khodayari-Rostamabad, A., Hasey, G. M., MacCrimmon, D. J., Reilly, J. P., & Bruin, H. D. (2010). A pilot
study to determine whether machine learning methodologies using pre-treatment
electroencephalography can predict the symptomatic response to clozapine therapy. Clinical
Neurophysiology, 121(12), 1998–2006. Scopus. https://doi.org/10.1016/j.clinph.2010.05.009
Kitamura, T., Shima, S., Sakio, E., & Kato, M. (1989). Psychiatric Diagnosis in Japan. 2. Reliability of
Conventional Diagnosis and Discrepancies with Research Diagnostic Criteria Diagnosis.
Psychopathology, 22(5), 250–259. https://doi.org/10.1159/000284605
Kitchenham, B., & Charters, S. (2007). Guidelines for performing Systematic Literature Reviews in
Software Engineering.
Koutsouleris, N., Meisenzahl, E. M., Davatzikos, C., Bottlender, R., Frodl, T., Scheuerecker, J., Schmitt, G.,
Zetzsche, T., Decker, P., Reiser, M., Möller, H.-J., & Gaser, C. (2009). Use of neuroanatomical
pattern classification to identify subjects in at-risk mental states of psychosis and predict disease
transition. Archives of General Psychiatry, 66(7), 700–712. Scopus.
https://doi.org/10.1001/archgenpsychiatry.2009.62
Kuhn, E., Greene, C., Hoffman, J., Nguyen, T., Wald, L., Schmidt, J., Ramsey, K. M., & Ruzek, J. (2014).
Preliminary Evaluation of PTSD Coach, a Smartphone App for Post-Traumatic Stress Symptoms.
Military Medicine, 179(1), 12–18. https://doi.org/10.7205/MILMED-D-13-00271
Leavy, S. (2018). Gender Bias in Artificial Intelligence: The Need for Diversity and Gender Theory in
Machine Learning. 2018 IEEE/ACM 1st International Workshop on Gender Equality in Software
Engineering (GE), 14–16.
LeCun, Y., Cortes, C., & Burges, C. J. C. (1999). MNIST handwritten digit database.
http://yann.lecun.com/exdb/mnist/
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., Clarke, M.,
Devereaux, P. J., Kleijnen, J., & Moher, D. (2009). The PRISMA statement for reporting
systematic reviews and meta-analyses of studies that evaluate healthcare interventions:
Explanation and elaboration. BMJ, 339, b2700. https://doi.org/10.1136/bmj.b2700
Ma, S., Galatzer-Levy, I. R., Wang, X., Fenyö, D., & Shalev, A. Y. (2016). A First Step towards a Clinical
Decision Support System for Post-traumatic Stress Disorders. AMIA Annual Symposium
Proceedings, 2016, 837–843. Scopus.
Mallol-Ragolta, A., Dhamija, S., & Boult, T. E. (2018). A multimodal approach for predicting changes in
PTSD symptom severity. ICMI - Proc. Int. Conf. Multimodal Interact., 324–333. Scopus.
https://doi.org/10.1145/3242969.3242981
Mane, K. K., Bizon, C., Schmitt, C., Owen, P., Burchett, B., Pietrobon, R., & Gersing, K. (2012).
VisualDecisionLinc: A visual analytics approach for comparative effectiveness-based clinical
decision support in psychiatry. Journal of Biomedical Informatics, 45(1), 101–106. Scopus.
https://doi.org/10.1016/j.jbi.2011.09.003
Marinić, I., Supek, F., Kovačić, Z., Rukavina, L., Jendričko, T., & Kozarić-Kovačić, D. (2007). Posttraumatic
Stress Disorder: Diagnostic Data Analysis by Data Mining Methodology. Croatian Medical Journal, 48(2), 185–197.
Maron, E., Baldwin, D. S., Balõtšev, R., Fabbri, C., Gaur, V., Hidalgo-Mazzei, D., Hood, S., Juhola, M.,
Kampman, O., Kasper, S., Kärkkäinen, H., Látalová, K., Lähteenvuo, M., Mastellos, N., McTigue,
J., Metsallik, J., Metspalu, A., Nutt, D., Nykänen, P., … Eberhard, J. (2019). Manifesto for an
international digital mental health network. Digital Psychiatry, 2(1), 14–24.
https://doi.org/10.1080/2575517X.2019.1617575
McGlynn, E. A., Asch, S. M., Adams, J., Keesey, J., Hicks, J., DeCristofaro, A., & Kerr, E. A. (2003). The
Quality of Health Care Delivered to Adults in the United States. New England Journal of
Medicine, 348(26), 2635–2645. https://doi.org/10.1056/NEJMsa022615
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282.
McKnight, P. E., & Najab, J. (2010). Mann-Whitney U Test. In The Corsini Encyclopedia of Psychology (pp.
1–1). John Wiley & Sons. https://doi.org/10.1002/9780470479216.corpsy0524
McWhorter, J., Brown, L., & Khansa, L. (2017). A wearable health monitoring system for posttraumatic
stress disorder. Biologically Inspired Cognitive Architectures, 22, 44–50. Scopus.
https://doi.org/10.1016/j.bica.2017.09.004
Meltzer, E. C., Averbuch, T., Samet, J. H., Saitz, R., Jabbar, K., Lloyd-Travaglini, C., & Liebschutz, J. M.
(2012). Discrepancy in diagnosis and treatment of post-traumatic stress disorder (PTSD):
Treatment for the wrong reason. The Journal of Behavioral Health Services & Research, 39(2),
190–201. https://doi.org/10.1007/s11414-011-9263-x
Mendel, R., Traut-Mattausch, E., Jonas, E., Leucht, S., Kane, J. M., Maino, K., Kissling, W., & Hamann, J.
(2011). Confirmation bias: Why psychiatrists stick to wrong preliminary diagnoses. Psychological
Medicine, 41(12), 2651–2659.
Metsallik, J., Ross, P., Draheim, D., & Piho, G. (2018). Ten Years of the e-Health System in Estonia. CEUR
Workshop Proceedings, 10.
Miner, A., Kuhn, E., Hoffman, J. E., Owen, J. E., Ruzek, J. I., & Taylor, C. B. (2016). Feasibility,
acceptability, and potential efficacy of the PTSD Coach app: A pilot randomized controlled trial
with community trauma survivors. Psychological Trauma: Theory, Research, Practice, and Policy,
8(3), 384–392. https://doi.org/10.1037/tra0000092
Mitchell, A. J., Vaze, A., & Rao, S. (2009). Clinical diagnosis of depression in primary care: A meta-
analysis. Database of Abstracts of Reviews of Effects (DARE): Quality-Assessed Reviews.
https://www.ncbi.nlm.nih.gov/books/NBK77945/
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & The PRISMA Group. (2009). Preferred Reporting Items for
Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLOS Medicine, 6(7), e1000097.
https://doi.org/10.1371/journal.pmed.1000097
Morselli, P. L., & Elgie, R. (2003). GAMIAN-Europe/BEAM survey I: Global analysis of a patient
questionnaire circulated to 3450 members of 12 European advocacy groups operating in the
field of mood disorders. Bipolar Disorders, 5(4), 265–278. https://doi.org/10.1034/j.1399-
5618.2003.00037.x
Muller, R. J. (2013). Doing Psychiatry Wrong: A Critical and Prescriptive Look at a Faltering Profession.
Routledge.
Myers, C. E., Radell, M. L., Shind, C., Ebanks-Williams, Y., Beck, K. D., & Gilbertson, M. W. (2016). Beyond
symptom self-report: Use of a computer “avatar” to assess post-traumatic stress disorder (PTSD)
symptoms. Stress, 19(6), 593–598. https://doi.org/10.1080/10253890.2016.1232385
Neuman, M. R., Baura, G. D., Meldrum, S., Soykan, O., Valentinuzzi, M. E., Leder, R. S., Micera, S., &
Zhang, Y.-T. (2012). Advances in Medical Devices and Medical Electronics. Proceedings of the
IEEE, 100(Special Centennial Issue), 1537–1550. https://doi.org/10.1109/JPROC.2012.2190684
Office-based Physician Electronic Health Record Adoption. (2019). Office of the National Coordinator for
Health Information Technology. https://dashboard.healthit.gov/quickstats/pages/physician-ehr-
adoption-trends.php
Omurca, S. I., & Ekinci, E. (2015). An alternative evaluation of post traumatic stress disorder with
machine learning methods. INISTA - Int. Symp. Innov. Intell. Syst. Appl., Proc. Scopus.
https://doi.org/10.1109/INISTA.2015.7276754
Panch, T., Mattie, H., & Celi, L. A. (2019). The “inconvenient truth” about AI in healthcare. Npj Digital
Medicine, 2(1), 1–3. https://doi.org/10.1038/s41746-019-0155-4
Paullada, A., Raji, I. D., Bender, E. M., Denton, E., & Hanna, A. (2020). Data and its (dis)contents: A survey
of dataset development and use in machine learning research. ArXiv:2012.05345 [Cs].
http://arxiv.org/abs/2012.05345
Perlis, R. H. (2013). A clinical risk stratification tool for predicting treatment resistance in major
depressive disorder. Biological Psychiatry, 74(1), 7–14. Scopus.
https://doi.org/10.1016/j.biopsych.2012.12.007
Posada, J. D., Barda, A. J., Shi, L., Xue, D., Ruiz, V., Kuan, P.-H., Ryan, N. D., & Tsui, F. R. (2017). Predictive
modeling for classification of positive valence system symptom severity from initial psychiatric
evaluation records. Journal of Biomedical Informatics, 75, S94–S104. Scopus.
https://doi.org/10.1016/j.jbi.2017.05.019
Post-traumatic stress disorder: [D] Evidence reviews for psychological, psychosocial and other non-
pharmacological interventions for the treatment of PTSD in adults. (n.d.). National Institute for
Health and Care Excellence. Retrieved September 9, 2020, from
https://www.nice.org.uk/guidance/ng116/evidence/d-psychological-psychosocial-and-other-
nonpharmacological-interventions-for-the-treatment-of-ptsd-in-adults-pdf-6602621008
Power, D. J. (2008). Decision Support Systems: A Historical Overview. In F. Burstein & C. W. Holsapple
(Eds.), Handbook on Decision Support Systems 1: Basic Themes (pp. 121–140). Springer.
https://doi.org/10.1007/978-3-540-48713-5_7
Pyne, J. M., Constans, J. I., Wiederhold, M. D., Gibson, D. P., Kimbrell, T., Kramer, T. L., Pitcock, J. A., Han,
X., Williams, D. K., Chartrand, D., Gevirtz, R. N., Spira, J., Wiederhold, B. K., McCraty, R., &
McCune, T. R. (2016). Heart rate variability: Pre-deployment predictor of post-deployment PTSD
symptoms. Biological Psychology, 121, 91–98. https://doi.org/10.1016/j.biopsycho.2016.10.008
Safdar, N. M., Banja, J. D., & Meltzer, C. C. (2020). Ethical considerations in artificial intelligence.
European Journal of Radiology, 122, 108768. https://doi.org/10.1016/j.ejrad.2019.108768
Saxe, G. N., Ma, S., Ren, J., & Aliferis, C. (2017). Machine learning methods to predict child posttraumatic
stress: A proof of concept study. BMC Psychiatry, 17(1). Scopus.
https://doi.org/10.1186/s12888-017-1384-1
Scherer, S., Lucas, G. M., Gratch, J., Rizzo, A., & Morency, L.-P. (2016). Self-Reported Symptoms of
Depression and PTSD Are Associated with Reduced Vowel Space in Screening Interviews. IEEE
Transactions on Affective Computing, 7(1), 59–73. Scopus.
https://doi.org/10.1109/TAFFC.2015.2440264
Scherer, S., Stratou, G., Gratch, J., & Morency, L.-P. (2013). Investigating voice quality as a speaker-
independent indicator of depression and PTSD. Proc. Annu. Conf. Int. Speech. Commun. Assoc.,
INTERSPEECH, 847–851. Scopus. https://www.scopus.com/inward/record.uri?eid=2-s2.0-
84905232283&partnerID=40&md5=0511bf000f576d12ac1b8dbfcec3dac1
Schreyögg, J., Bäumler, M., & Busse, R. (2009). Balancing adoption and affordability of medical devices in
Europe. Health Policy, 92(2), 218–224. https://doi.org/10.1016/j.healthpol.2009.03.016
Schwarz, D., Kasparek, T., Provaznik, I., & Jarkovsky, J. (2007). A deformable registration method for
automated morphometry of MRI brain images in neuropsychiatric research. IEEE Transactions
on Medical Imaging, 26(4), 452–461. Scopus. https://doi.org/10.1109/TMI.2007.892512
Shaikh al arab, A., Guédon-Moreau, L., Ducrocq, F., Molenda, S., Duhem, S., Salleron, J., Chaudieu, I.,
Bert, D., Libersa, C., & Vaiva, G. (2012). Temporal analysis of heart rate variability as a predictor
of post traumatic stress disorder in road traffic accidents survivors. Journal of Psychiatric
Research, 46(6), 790–796. https://doi.org/10.1016/j.jpsychires.2012.02.006
Sim, I., & Berlin, A. (2003). A Framework for Classifying Decision Support Systems. AMIA Annual
Symposium Proceedings, 2003, 599–603.
Simoons, M., Ruhé, H. G., Van Roon, E. N., Schoevers, R. A., Bruggeman, R., Cath, D. C., Muis, D., Arends,
J., Doornbos, B., & Mulder, H. (2019). Design and methods of the “monitoring outcomes of
psychiatric pharmacotherapy” (MOPHAR) monitoring program—A study protocol. BMC Health
Services Research, 19(1). Scopus. https://doi.org/10.1186/s12913-019-3951-2
Singh, T., & Rajput, M. (2006). Misdiagnosis of Bipolar Disorder. Psychiatry (Edgmont), 3(10), 57–63.
Sittig, D. F., Krall, M. A., Dykstra, R. H., Russell, A., & Chin, H. L. (2006). A survey of factors affecting
clinician acceptance of clinical decision support. BMC Medical Informatics and Decision Making,
6(1), 6. https://doi.org/10.1186/1472-6947-6-6
Smith, B., & Linden, G. (2017). Two Decades of Recommender Systems at Amazon.com. IEEE Internet
Computing, 21(3), 12–18. https://doi.org/10.1109/MIC.2017.72
Sohn, S., Kocher, J.-P. A., Chute, C. G., & Savova, G. K. (2011). Drug side effect extraction from clinical
narratives of psychiatry and psychology patients. Journal of the American Medical Informatics
Association, 18(Suppl. 1), 144–149. Scopus. https://doi.org/10.1136/amiajnl-2011-000351
Spottswood, M., Davydow, D. S., & Huang, H. (2017). The Prevalence of Posttraumatic Stress Disorder in
Primary Care: A Systematic Review. Harvard Review of Psychiatry, 25(4), 159–169.
https://doi.org/10.1097/HRP.0000000000000136
Sprague, R. H. (1980). A Framework for the Development of Decision Support Systems. MIS Quarterly,
4(4), 1–26. https://doi.org/10.2307/248957
Stephan, K. E., Schlagenhauf, F., Huys, Q. J. M., Raman, S., Aponte, E. A., Brodersen, K. H., Rigoux, L.,
Moran, R. J., Daunizeau, J., Dolan, R. J., Friston, K. J., & Heinz, A. (2017). Computational
neuroimaging strategies for single patient predictions. NeuroImage, 145, 180–199. Scopus.
https://doi.org/10.1016/j.neuroimage.2016.06.038
Sterne, J. A. C., & Harbord, R. M. (2004). Funnel Plots in Meta-analysis. The Stata Journal, 4(2), 127–141.
https://doi.org/10.1177/1536867X0400400204
Strickland, E. (2019). IBM Watson, heal thyself: How IBM overpromised and underdelivered on AI health
care. IEEE Spectrum, 56(4), 24–31. https://doi.org/10.1109/MSPEC.2019.8678513
Suhasini, A., Palanivel, S., & Ramalingam, V. (2011). Multimodel decision support system for psychiatry
problem. Expert Systems with Applications, 38(5), 4990–4997. Scopus.
https://doi.org/10.1016/j.eswa.2010.09.152
Sutton, R. T., Pincock, D., Baumgart, D. C., Sadowski, D. C., Fedorak, R. N., & Kroeker, K. I. (2020). An
overview of clinical decision support systems: Benefits, risks, and strategies for success. Npj
Digital Medicine, 3(1), 1–10. https://doi.org/10.1038/s41746-020-0221-y
Tasma, M., Roebroek, L. O., Liemburg, E. J., Knegtering, H., Delespaul, P. A., Boonstra, A., Swart, M., &
Castelein, S. (2018). The development and evaluation of a computerized decision aid for the
treatment of psychotic disorders. BMC Psychiatry, 18(1). Scopus.
https://doi.org/10.1186/s12888-018-1750-7
What Is PTSD? (n.d.). Retrieved June 4, 2020, from https://www.psychiatry.org/patients-
families/ptsd/what-is-ptsd
Wittchen, H. U., Jacobi, F., Rehm, J., Gustavsson, A., Svensson, M., Jönsson, B., Olesen, J., Allgulander, C.,
Alonso, J., Faravelli, C., Fratiglioni, L., Jennum, P., Lieb, R., Maercker, A., van Os, J., Preisig, M.,
Salvador-Carulla, L., Simon, R., & Steinhausen, H.-C. (2011). The size and burden of mental
disorders and other disorders of the brain in Europe 2010. European
Neuropsychopharmacology, 21(9), 655–679. https://doi.org/10.1016/j.euroneuro.2011.07.018
Wolff, J., Gary, A., Jung, D., Normann, C., Kaier, K., Binder, H., Domschke, K., Klimke, A., & Franz, M.
(2020). Predicting patient outcomes in psychiatric hospitals with routine data: A machine
learning approach. BMC Medical Informatics and Decision Making, 20(1). Scopus.
https://doi.org/10.1186/s12911-020-1042-2
Xu, R., Mei, G., Zhang, G., Gao, P., Judkins, T., Cannizzaro, M., & Li, J. (2012). A voice-based automated
system for PTSD screening and monitoring. Studies in Health Technology and Informatics, 173,
552–558.
Yu, K.-H., Beam, A. L., & Kohane, I. S. (2018). Artificial intelligence in healthcare. Nature Biomedical
Engineering, 2(10), 719–731. https://doi.org/10.1038/s41551-018-0305-z
Zatzick, D., O’Connor, S. S., Russo, J., Wang, J., Bush, N., Love, J., Peterson, R., Ingraham, L., Darnell, D.,
Whiteside, L., & Van Eaton, E. (2015). Technology-Enhanced Stepped Collaborative Care
Targeting Posttraumatic Stress Disorder and Comorbidity After Injury: A Randomized Controlled
Trial. Journal of Traumatic Stress, 28(5), 391–400. Scopus. https://doi.org/10.1002/jts.22041
Zhao, K., & So, H.-C. (2019). Drug Repositioning for Schizophrenia and Depression/Anxiety Disorders: A
Machine Learning Approach Leveraging Expression Data. IEEE Journal of Biomedical and Health
Informatics, 23(3), 1304–1315. Scopus. https://doi.org/10.1109/JBHI.2018.2856535
Zhuang, X., Rozgić, V., Crystal, M., & Marx, B. P. (2014). Improving speech-based PTSD detection via
multi-view learning. 2014 IEEE Spoken Language Technology Workshop (SLT), 260–265.
https://doi.org/10.1109/SLT.2014.7078584
Appendix
The following two lists contain the literature included in this review:
Publication Title (Psychiatry) – Year

A clinical risk stratification tool for predicting treatment resistance in major depressive disorder (Perlis, 2013) – 2013
A deformable registration method for automated morphometry of MRI brain images in neuropsychiatric research (Schwarz et al., 2007) – 2007
A pilot study to determine whether machine learning methodologies using pre-treatment electroencephalography can predict the symptomatic response to clozapine therapy (Khodayari-Rostamabad et al., 2010) – 2010
A risk calculator to predict adult attention-deficit/hyperactivity disorder: Generation and external validation in three birth cohorts and one clinical sample (Caye et al., 2019) – 2019
A situation-aware system for the detection of motion disorders of patients with Autism Spectrum Disorders (Coronato et al., 2014) – 2014
A web-based clinical decision tool to support treatment decision-making in psychiatry: A pilot focus group study with clinicians, patients and carers (Henshall et al., 2017) – 2017
Automatic recognition of symptom severity from psychiatric evaluation records (Goodwin et al., 2017) – 2017
Combining mobile-health (mHealth) and artificial intelligence (AI) methods to avoid suicide attempts: The Smartcrises study protocol (Berrouiguet et al., 2019) – 2019
Computational neuroimaging strategies for single patient predictions (Stephan et al., 2017) – 2017
Computer-aided DSM-IV-diagnostics – Acceptance, use and perceived usefulness in relation to users' learning styles (Bergman & Fors, 2005) – 2005
Design and methods of the 'monitoring outcomes of psychiatric pharmacotherapy' (MOPHAR) monitoring program – A study protocol (Simoons et al., 2019) – 2019
Discrimination of schizophrenia auditory hallucinators by machine learning of resting-state functional MRI (Chyzhyk et al., 2015) – 2015
Drug Repositioning for Schizophrenia and Depression/Anxiety Disorders: A Machine Learning Approach Leveraging Expression Data (Zhao & So, 2019) – 2019
Drug side effect extraction from clinical narratives of psychiatry and psychology patients (Sohn et al., 2011) – 2011
From complex questionnaire and interviewing data to intelligent Bayesian network models for medical decision support (Constantinou et al., 2016) – 2016
Implementing a digital clinical decision support tool for side effects of antipsychotics: A focus group study (Henshall et al., 2019) – 2019
Leveraging the utility of pharmacogenomics in psychiatry through clinical decision support: A focus group study (Goodspeed et al., 2019) – 2019
Machine learning methods to predict child posttraumatic stress: A proof of concept study (Saxe et al., 2017) – 2017
Multimodel decision support system for psychiatry problem (Suhasini et al., 2011) – 2011
Predicting patient outcomes in psychiatric hospitals with routine data: A machine learning approach (Wolff et al., 2020) – 2020
Predicting persistent depressive symptoms in older adults: A machine learning approach to personalised mental healthcare (Hatton et al., 2019) – 2019
Predictive modeling for classification of positive valence system symptom severity from initial psychiatric evaluation records (Posada et al., 2017) – 2017
The development and evaluation of a computerized decision aid for the treatment of psychotic disorders (Tasma et al., 2018) – 2018
The development and validation of statistical prediction rules for discriminating between genuine and simulated suicide notes (Jones & Bennell, 2007) – 2007
Use of neuroanatomical pattern classification to identify subjects in at-risk mental states of psychosis and predict disease transition (Koutsouleris et al., 2009) – 2009
VisualDecisionLinc: A visual analytics approach for comparative effectiveness-based clinical decision support in psychiatry (Mane et al., 2012) – 2012
Publication Title (PTSD) – Year

A First Step towards a Clinical Decision Support System for Post-traumatic Stress Disorders (Ma et al., 2016) – 2016
A mobile app for patients and those who care about them: A case study for veterans with PTSD + anger (Barish et al., 2019) – 2019
A multimodal approach for predicting changes in PTSD symptom severity (Mallol-Ragolta et al., 2018) – 2018
A neural network based model for predicting psychological conditions (Dabek & Caban, 2015a) – 2015
A wearable health monitoring system for posttraumatic stress disorder (McWhorter et al., 2017) – 2017
An alternative evaluation of post traumatic stress disorder with machine learning methods (Omurca & Ekinci, 2015) – 2015
Bridging a translational gap: Using machine learning to improve the prediction of PTSD (K.-I. Karstoft et al., 2015) – 2015
Investigating voice quality as a speaker-independent indicator of depression and PTSD (Scherer et al., 2013) – 2013
Machine learning methods to predict child posttraumatic stress: A proof of concept study (Saxe et al., 2017) – 2017
Measuring post traumatic stress disorder in Twitter (Coppersmith et al., 2014) – 2014
Quantitative forecasting of PTSD from early trauma responses: A Machine Learning application (Galatzer-Levy et al., 2014) – 2014
Self-Reported Symptoms of Depression and PTSD Are Associated with Reduced Vowel Space in Screening Interviews (Scherer et al., 2016) – 2016
Technology-Enhanced Stepped Collaborative Care Targeting Posttraumatic Stress Disorder and Comorbidity After Injury: A Randomized Controlled Trial (Zatzick et al., 2015) – 2015
Towards clinical decision support for veteran mental health crisis events using tree algorithm (Hossain et al., 2019) – 2019
A voice-based automated system for PTSD screening and monitoring (Xu et al., 2012) – 2012
Automated Assessment of Patients’ Self-Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and Text Mining (He et al., 2017) – 2017
Beyond symptom self-report: Use of a computer “avatar” to assess post-traumatic stress disorder (PTSD) symptoms (Myers et al., 2016) – 2016
Customized computer-based administration of the PCL-5 for the efficient assessment of PTSD: A proof-of-principle study (Finkelman et al., 2017) – 2017
Early identification of posttraumatic stress following military deployment: Application of machine learning methods to a prospective study of Danish soldiers (Karen-Inge Karstoft et al., 2015) – 2015
Feasibility, acceptability, and potential efficacy of the PTSD Coach app: A pilot randomized controlled trial with community trauma survivors (Miner et al., 2016) – 2016
Heart rate variability: Pre-deployment predictor of post-deployment PTSD symptoms (Pyne et al., 2016) – 2016
Improving speech-based PTSD detection via multi-view learning (Zhuang et al., 2014) – 2014
Leveraging Big Data to Model the Likelihood of Developing Psychological Conditions After a Concussion (Dabek & Caban, 2015b) – 2015
Linguistic predictors of trauma pathology and physical health (Alvarez-Conrad et al., 2001) – 2001
Physiology-Driven Adaptive Virtual Reality Stimulation for Prevention and Treatment of Stress Related Disorders (Ćosić et al., 2010) – 2010
Posttraumatic Stress Disorder: Diagnostic Data Analysis by Data Mining Methodology (Marinić et al., 2007) – 2007
Preliminary Evaluation of PTSD Coach, a Smartphone App for Post-Traumatic Stress Symptoms (Kuhn et al., 2014) – 2014
Temporal analysis of heart rate variability as a predictor of post traumatic stress disorder in road traffic accidents survivors (Shaikh al arab et al., 2012) – 2012
The use of immersive virtual reality (VR) to predict the occurrence 6 months later of paranoid thinking and posttraumatic stress symptoms assessed by self-report and interviewer methods: A study of individuals who have been physically assaulted (Freeman et al., 2014) – 2014
Using structural neuroanatomy to identify trauma survivors with and without post-traumatic stress disorder at the individual level (Gong et al., 2014) – 2014