
Systematic reviews of machine learning in healthcare: a literature review

Taylor & Francis
Expert Review of Pharmacoeconomics & Outcomes Research
Authors:
  • Agency for Health Technology Assessment and Tariff System in Poland

Abstract

Introduction: The increasing availability of data and computing power has made machine learning (ML) a viable approach to faster, more efficient healthcare delivery. Methods: A systematic literature review (SLR) of published SLRs evaluating ML applications in healthcare settings, published between 1 January 2010 and 27 March 2023, was conducted. Results: In total, 220 SLRs covering 10,462 ML algorithms were reviewed. The main application of AI in medicine was clinical prediction and disease prognosis in oncology and neurology using imaging data. Accuracy, specificity, and sensitivity were provided in 56%, 28%, and 25% of SLRs, respectively. Internal and external validation were reported in 53% and less than 1% of cases, respectively. The most common modeling approach was neural networks (2,454 ML algorithms), followed by support vector machines and random forests/decision trees (1,578 and 1,522 ML algorithms, respectively). Expert opinion: The review indicated considerable reporting gaps regarding ML performance and both internal and external validation. Greater accessibility to healthcare data for developers can ensure the faster adoption of ML algorithms into clinical practice.
Katarzyna Kolasa, Bisrat Admassu, Malwina Hołownia-Voloskova, Katarzyna
J Kędzior, Jean-Etienne Poirrier & Stefano Perni
To cite this article: Katarzyna Kolasa, Bisrat Admassu, Malwina Hołownia-Voloskova, Katarzyna
J Kędzior, Jean-Etienne Poirrier & Stefano Perni (13 Nov 2023): Systematic reviews of machine
learning in healthcare: a literature review, Expert Review of Pharmacoeconomics & Outcomes
Research, DOI: 10.1080/14737167.2023.2279107
To link to this article: https://doi.org/10.1080/14737167.2023.2279107
© 2023 The Author(s). Published by Informa
UK Limited, trading as Taylor & Francis
Group.
Published online: 13 Nov 2023.
REVIEW
Katarzyna Kolasa (a), Bisrat Admassu (a), Malwina Hołownia-Voloskova (a), Katarzyna J Kędzior (b), Jean-Etienne Poirrier (b) and Stefano Perni (c)
(a) Division of Health Economics and Healthcare Management, Kozminski University, Warsaw, Poland; (b) Parexel International, Wavre, Belgium; (c) Parexel International Corp, London, UK
ARTICLE HISTORY
Received 17 July 2023
Accepted 31 October 2023
KEYWORDS
SLRs; machine learning;
artificial intelligence;
healthcare; healthcare data
1. Introduction
Along with many other sectors, medicine has become a prominent beneficiary of artificial intelligence (AI)-driven innovations, owing to the growing availability of data. The transformation of healthcare began with the widespread adoption of electronic health records (EHRs) in the early 1990s, with up to 93% of primary care doctors using EHRs across 24 OECD countries in 2021 [1].
The growing number of new data sources such as sensors, wearables, and mobile applications is transforming healthcare. The digital footprint of a patient’s journey produces new insights that inform decision-making processes and is readily available for developing machine learning (ML) algorithms. Therefore, the abundance of data can help healthcare organizations develop a holistic picture of a patient’s health over time and can also provide new insights into unmet medical needs through new data-driven technologies.
The potential for digital transformation to improve health outcomes and introduce efficiency gains has already been observed in recent developments. Numerous applications of AI, such as the diagnosis of cardiac diseases [2], neoplastic diseases [3], and pathologies of the voice [4] and, more recently, applications during the COVID-19 pandemic [5,6], have the potential to enhance diagnostic precision, throughput, and patient outcomes [2].
Several experts claim that medicine is already moving from the past decade, which focused on ML development, to a subsequent decade driven by the challenges of deploying ML algorithms in clinical settings [7].
Although the International Medical Device Regulators Forum (IMDRF) introduced the terms ‘software as a medical device’ (SaMD) and ‘software in a medical device’ in 2013, there have been limited efforts so far to develop a value assessment framework for ML algorithms in healthcare systems similar to those used for the pricing and reimbursement of medical devices and pharmaceuticals.
1.1. Aims
For the adoption of artificial intelligence in healthcare to become effective and implementable, however, more needs to be learned about the opportunities and challenges of applying AI in medicine. Therefore, our ultimate goal was to summarize the state of the art regarding the availability and performance of AI solutions in healthcare. We conducted a review of systematic literature reviews (SLRs) covering ML algorithms developed for medical purposes. The objective of our research was twofold: first, to describe the number of ML solutions already available in healthcare; and second, to assess the types of data commonly reported in scientific publications on ML algorithms. Based on our review results, we recommend actions for developers and healthcare payers to facilitate AI integration into medicine.
CONTACT Katarzyna Kolasa kkolasa@kozminski.edu.pl Division of Health Economics and Healthcare Management, Kozminski University, Warsaw, Poland
Supplemental data for this article can be accessed online at https://doi.org/10.1080/14737167.2023.2279107
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/),
which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.
2. Methodology
This review was carried out according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [8].
2.1. Literature search
This review had the following formal protocol.
2.2. Selection of studies
Systematic literature reviews (SLRs) reporting the use of ML in healthcare in any country, written in English and published in peer-reviewed journals between 1 January 2010 and 27 March 2023, were included.
Searches were conducted in PubMed, IEEE Xplore, Scopus (www.scopus.com), Web of Science, EBSCO, and the Cochrane Library (www.cochranelibrary.com). The following terms were searched in the titles and abstracts of published studies: ‘SLR,’ ‘ML’ and ‘Machine Learning,’ combined using the Boolean operator ‘AND’ and wildcard symbols as appropriate for each database, with additional key terms such as outcome prediction, diagnosis, screening and/or treatment of any disease.
Studies on animal, plant or in vitro investigations were excluded, as were studies assessing ML applications outside medicine. Furthermore, explorative articles without details about the performance of ML algorithms were excluded.
Four researchers (hereafter referred to as reviewers) per-
formed the initial review in the following steps:
(1) Identification: The titles, keywords, and abstracts of all identified publications were screened by two reviewers who independently evaluated whether each paper had the potential to be relevant. When the initial assessments differed, consensus was reached through discussion.
(2) Full-text screening: The full texts of all publications identified in the previous step were obtained and assessed independently by two reviewers for inclusion in the review and for data extraction against the inclusion/exclusion criteria and study objectives. When the initial assessments of the reviewers differed, a final decision was reached by consensus.
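The dual-reviewer screening described in the two steps above can be sketched in a few lines of Python. This is only an illustration: the record identifiers, the "include"/"exclude" labels, and the function name `split_by_agreement` are hypothetical and do not come from the review. The sketch separates records on which both reviewers agree from those that would be resolved by discussion.

```python
from typing import Dict, List, Tuple


def split_by_agreement(
    reviewer_a: Dict[str, str],
    reviewer_b: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (agreed decisions, record ids needing a consensus discussion).

    Both inputs map a record id to "include" or "exclude"
    (hypothetical labels chosen for this sketch).
    """
    agreed: Dict[str, str] = {}
    conflicts: List[str] = []
    # Only records assessed by both reviewers are compared.
    for record_id in sorted(reviewer_a.keys() & reviewer_b.keys()):
        if reviewer_a[record_id] == reviewer_b[record_id]:
            agreed[record_id] = reviewer_a[record_id]
        else:
            conflicts.append(record_id)
    return agreed, conflicts


# Hypothetical screening decisions for three records.
decisions_a = {"rec1": "include", "rec2": "exclude", "rec3": "include"}
decisions_b = {"rec1": "include", "rec2": "include", "rec3": "include"}
agreed, conflicts = split_by_agreement(decisions_a, decisions_b)
print(agreed)     # records both reviewers decided identically
print(conflicts)  # records to resolve by discussion
```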
2.3. Data extraction
Data were extracted from each SLR in two phases. First, two recent checklists were analyzed to define the set of review criteria [9,10]. Second, a random sample of 30 SLRs was analyzed to assess the most commonly reported information across the included publications and to develop an extraction grid capable of ensuring standard, rigorous and comprehensive data extraction.
All identified publications were entered into the Covidence systematic review software for the remainder of the review.
Data extraction was initiated after the initial process. Each SLR was reviewed for basic descriptive statistics, including quality assessment and reporting methods, along with an assignment to one of three categories:
(1) Categorization (classification of data into categories or clusters)
(2) Prediction (making predictions regarding outputs given historical data)
(3) Discovery (analysis of the structure of data)
2.4. Data analysis
ICD-10 codes were used to analyze the therapeutic area covered by the SLRs, and basic statistics such as the sources of data, accuracy, specificity, and sensitivity were extracted separately for each publication reported in each of the included SLRs.
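These performance parameters are commonly employed to assess the performance of categorization methods; they are defined as follows:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
where FN = false negative; FP = false positive; TN = true negative; TP = true positive.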
Accuracy represents the overall correctness of the model
prediction; sensitivity consists of the fraction of correctly iden-
tified positive cases while specificity is the fraction of correctly
identified negative cases.
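As a minimal sketch of how these three parameters follow from the counts of a binary confusion matrix, the Python function below applies the standard definitions described above. The function name `classification_metrics` and the example counts are illustrative assumptions, not values taken from the reviewed studies.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    positives = tp + fn  # all truly positive cases
    negatives = tn + fp  # all truly negative cases
    if total == 0 or positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,   # overall correctness
        "sensitivity": tp / positives,   # correctly identified positives
        "specificity": tn / negatives,   # correctly identified negatives
    }


# Hypothetical counts: 100 positive and 100 negative cases.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(metrics)
```

Reporting all three values together matters because accuracy alone can look high on imbalanced data even when sensitivity or specificity is poor.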
Methods of validation and handling missing data were also
extracted. Details regarding the external validation with
respect to the comparison of AI against humans were also
extracted and reviewed, not only from systematic literature
reviews but also from primary studies. The details of the types
of ML techniques were also extracted, and the number of
primary studies reporting the use of different ML algorithm
typologies was determined for each SLR included.
Article highlights
Artificial intelligence (AI) and machine learning (ML) have the potential to improve health outcomes and increase the efficiency of healthcare systems.
A systematic literature review (SLR) identified 220 published SLRs evaluating ML applications in healthcare settings, covering 10,462 ML algorithms.
The most common modeling approach was neural networks (2,454 ML algorithms), followed by support vector machines and random forests/decision trees (1,578 and 1,522 ML algorithms, respectively).
Internal validation was reported for 53% of the ML algorithms and external validation in less than 1% of cases. The lack of assessment of AI performance should be overcome to facilitate the application of AI/ML in healthcare.
3. Results
A total of 2,342 SLRs were identified, of which 686 were duplicates (Figure 1). Based on the title and abstract reviews, 1,233 hits were removed during the identification phase. The screening phase included 423 publications. After full-text analysis, 220 articles [11–231] covering 10,462 ML algorithms were finally included in the review (Figure 1). The number of studies covered by each SLR varied from 4 [166] to 921 [83] articles (Table 1). Approximately 88% of these articles were published in 2020–2021, and none before 2017 (Figure 2). About 67% (147) of the selected SLRs were published in 2021; SLRs published in 2020 and 2022 represented 9% (19) and 15% (33) of the total, respectively.
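The yearly shares quoted above are simple proportions of the 220 included SLRs. The short Python sketch below recomputes them from the counts given in the text; the variable names are illustrative, and only the three years with explicit counts are covered.

```python
# Counts of included SLRs per publication year, as stated in the text.
total_slrs = 220
slrs_by_year = {2020: 19, 2021: 147, 2022: 33}

# Share of the 220 SLRs for each year, rounded to whole percent.
shares = {year: round(100 * n / total_slrs) for year, n in slrs_by_year.items()}

# SLRs published in years that are not itemized in the text.
other_years = total_slrs - sum(slrs_by_year.values())

for year, share in shares.items():
    print(f"{year}: {slrs_by_year[year]} SLRs ({share}%)")
print(f"other years: {other_years} SLRs")
```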
In total, 74% of studies employed PRISMA or other methods to report their SLR. A quality assessment was not conducted in 117 of the 220 included studies (Table 1). A review of the ICD codes revealed that neoplasms (Chapter II) were the most frequently studied clinical area, followed by diseases of the nervous system (Chapter VI) (Table 2). Regarding data sources, imaging was used most frequently, followed by clinical notes and lab tests (Table 2).
Considerable variations were observed across the included publications in terms of ML accuracy, specificity, and sensitivity. ICD-10 Chapters III and XVIII reported the lowest results, while some ML algorithms for ICD-10 Chapters II and VI reported 100% for all three parameters (Appendix 1). In total, 231 of 10,963 studies (7 of 220 SLRs) provided information about the accuracy, specificity, and sensitivity of all included studies. A total of 3,164 studies (51 SLRs) did not report any results across the three dimensions (Table 3).
A total of 4,992 of the 10,963 studies (103 of 220 SLRs) conducted internal validation procedures. The most common approach was k-fold cross-validation (1,325 studies), followed by leave-one-out cross-validation (205 cases) (Appendix 2).
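For readers unfamiliar with these two internal validation schemes, the following Python sketch shows how k-fold cross-validation partitions a dataset and how leave-one-out cross-validation arises as the special case where k equals the number of samples. It is a simplified illustration, not the procedure used in any reviewed study: the function name `k_fold_indices` is hypothetical, and the folds are contiguous and unshuffled, whereas practical implementations usually shuffle or stratify the data first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 5-fold cross-validation on 10 samples: each sample is tested exactly once.
five_fold = list(k_fold_indices(n_samples=10, k=5))

# Leave-one-out cross-validation is k-fold with k equal to the sample count.
leave_one_out = list(k_fold_indices(n_samples=4, k=4))

for train, test in five_fold:
    print("train:", train, "test:", test)
```

In both schemes the model is refitted on each training split and evaluated on the held-out split, so every sample contributes to the performance estimate exactly once.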
Regarding external validation, a comparison of ML with
a human comparator was mentioned in 90 of the 10,963
studies (Table 4) [241–311,313–328]. In total, 50 cases pro-
vided evidence of comparable performance, 33 (four)
Figure 1. PRISMA flowchart for selection of publications [8].
Identification: Studies identified from PubMed (n=778), IEEE (n=430), Web of Science (n=486), Scopus (n=388), and EBSCO host (n=257). Studies removed before screening: duplicate records removed (n=686).
Screening: Studies screened (n=1,656); studies excluded based on title screening (n=1,233). Studies sought for retrieval (n=423); full text not available (n=0). Studies assessed for eligibility (n=423); studies excluded: wrong article type or design (n=198), not on humans (n=2).
Included: Studies included in review (n=220).
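The counts in a PRISMA flowchart can be checked by recomputing each stage from the one before it. The Python sketch below does this with the numbers shown in Figure 1 and the total of 2,342 identified SLRs given in the Results text; the dictionary keys and variable names are illustrative assumptions. A nonzero gap only indicates that the figure does not itemize everything needed to reproduce the reported count; it does not say which number is the correct one.

```python
# Per-database counts as listed in Figure 1.
identified_by_source = {
    "PubMed": 778,
    "IEEE": 430,
    "Web of Science": 486,
    "Scopus": 388,
    "EBSCO host": 257,
}

# Counts reported in Figure 1 and in the Results text.
reported = {
    "identified": 2342,             # Results text
    "duplicates": 686,
    "screened": 1656,
    "excluded_title": 1233,
    "assessed": 423,
    "excluded_full_text": 198 + 2,  # wrong type/design + not on humans
    "included": 220,
}

# Each stage recomputed from the reported numbers of the previous stage.
derived = {
    "identified": sum(identified_by_source.values()),
    "screened": reported["identified"] - reported["duplicates"],
    "assessed": reported["screened"] - reported["excluded_title"],
    "included": reported["assessed"] - reported["excluded_full_text"],
}

# Gap between the derived and the reported count at each stage.
gaps = {stage: derived[stage] - reported[stage] for stage in derived}

for stage in derived:
    print(f"{stage}: derived={derived[stage]}, reported={reported[stage]}, gap={gaps[stage]}")
```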
Table 1. Review of included systematic reviews.
Lead author & year | Title | Databases searched | Total of studies included | Quality assessment | Area of focus (objective) | Size of study included (min) | Size of study included (max) | Type of data | Reporting method
Abu Bakar 2021 [11] | The emergence of machine learning in auditory neural impairment: A systematic review. | Web of Science (WoS), Scopus, Science Direct, PubMed and Google Scholar | 11 | NA | predict, categorize | 8 | 95 | ECG | none stated
Adamidi E 2021 [12] | Artificial intelligence in clinical care amidst COVID-19 pandemic: A systematic review | PubMed, Nature, Science Direct, IEEE Xplore, Arxiv and medRxiv | 101 | Covidence | predict, categorize | 17 | 117,000 | Patients’ demographics and medical history, socioeconomic status, outcomes of HTAs, diagnostic images, lab results | PRISMA
Adeoye J 2021 [13] | Prediction models applying machine learning to oral cavity cancer outcomes: A systematic review | PubMed, Scopus, EMBASE, Cochrane Library, LILACS, SciELO, PsychINFO, and Web of Science | 27 | QUIPS tool | predict | 31 | 33,065 | no details provided | PRISMA
Ahsan MM 2022 [14] | Machine learning-based heart disease diagnosis: A systematic literature review | Scopus | 49 | NA | categorize, discover | NA | NA | NA | PRISMA
Akazawa M 2021 [15] | Artificial intelligence in gynecologic cancers: Current status and future challenges – A systematic review | PubMed, Web of Science, and Scopus | 71 | PROBAST | discover | 20 | 78,215 | MRI/CT/PET, clinical, cytology, ultrasound | PRISMA
Al Hinai G 2021 [16] | Deep learning analysis of resting electrocardiograms for the detection of myocardial dysfunction, hypertrophy, and ischemia: a systematic review | PubMed MEDLINE | 12 | NA | categorize | 47 | 52,870 | ECG | PRISMA
Alabi R 2021 [17] | Machine learning in oral squamous cell carcinoma: Current status, clinical concerns and prospects for future – A systematic review | OvidMedline, PubMed, Scopus, Web of Science, and Institute of Electrical and Electronics Engineers (IEEE) | 41 | PROBAST | predict | 31 | 33,065 | NA | PRISMA
Albahri A 2020 [18] | Role of biological Data Mining and Machine Learning Techniques in Detecting and Diagnosing the Novel Coronavirus (COVID-19) – A Systematic Review | IEEE Xplore; Web of Science; PubMed; ScienceDirect; Scopus | 8 | NA | predict | 322 | 0.2 million | Public health and government agency databases; body-worn sensors and mobile applications; media articles and social media; electronic patient records; synthetic data | PRISMA
Alballa N 2021 [19] | Machine learning approaches in COVID-19 diagnosis, mortality, and severity risk prediction: A review | PubMed, Scopus, IEEE Xplore, and Google Scholar | 52 | NA | predict, categorize | 47 | 22,095 | clinical test, demographic | PRISMA
Alharbi B 2022 [20] | Predictive models for personalized asthma attacks based on patient’s biosignals and environmental factors: a systematic review | PubMed, ScienceDirect, Springer, and IEEE | 15 | NA | predict | 1 | 500,000 | Telemonitoring data, randomly generated text, Historical data, ED visits, social media, sensors | PRISMA
Alhasan M 2021 [21] | Clinical Applications of Artificial Intelligence, Machine Learning, and Deep Learning in the Imaging of Gliomas: A Systematic Review | PubMed, Medline, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Web of Science, and Google Scholar | 20 | QUADAS-2 | discover | 30 | 496 | MR images | PRISMA
Alsolai H [22] | A Systematic Review of Literature on Automated Sleep Scoring | NA | 27 | NA | categorize, discover | NA | NA | ECG | PRISMA
Anteby R 2021 [23] | Deep learning for noninvasive liver fibrosis classification: A systematic review | Embase, MEDLINE, Web of Science and IEEE Xplore | 16 | QUADAS-2 | categorize | 34 | 8,352 | US, MRI and CT | PRISMA
Bang CS 2021a [24] | Computer-Aided Diagnosis of Gastrointestinal Ulcer and Hemorrhage Using Wireless Capsule Endoscopy: Systematic Review and Diagnostic Test Accuracy Meta-analysis | PubMed, Web of Science and Cochrane Library | 33 | QUADAS-2 | categorize | 10 | 33,009 | Still-cut images of endoscopy | PRISMA
Bang CS 2021b [25] | Computer-aided diagnosis of esophageal cancer and neoplasms in endoscopic images: a systematic review and meta-analysis of diagnostic test accuracy | MEDLINE-PubMed, Embase and the Cochrane Library | 21 | QUADAS-2 | categorize | 20 | 1,383 | white-light imaging; narrow-band imaging | PRISMA
Barrett L 2022 [26] | Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering | Medline, Springer, EMBASE, ISI Web of Knowledge/Science and the Institute of Electrical and Electronics Engineers (IEEE), Google Scholar, GitHub, OpenGrey and OpenDOAR | 27 | NA | categorize | 2 | 32,321 | Sound recordings | PRISMA
Bazoukis G 2021 [27] | Machine learning versus conventional clinical methods in guiding management of heart failure patients a systematic review. | MEDLINE and Cochrane library | 122 | Qiao score | discover | 30 | <100,000 | no details provided | PRISMA
Bedrikovetski S 2021a [28] | Artificial intelligence for pre-operative lymph node staging in colorectal cancer: a systematic review and meta-analysis | PubMed (MEDLINE), EMBASE, IEEE Xplore and the Cochrane Library | 17 | QUADAS-2 | predict | 17 | 414 | MRI and CT scans | PRISMA
Bedrikovetski S 2021b [29] | Artificial intelligence for the diagnosis of lymph node metastases in patients with abdominopelvic malignancy: A systematic review and meta-analysis. | PubMed, MEDLINE, Science Direct and IEEE Xplore | 21 | PROBAST | categorize | 17 | 1,689 | MRI and CT scans | PRISMA
Benoit J 2020 [30] | Systematic Review of Digital Phenotyping and Machine Learning in Psychosis Spectrum Illnesses | PubMed, Web of Science, PsycInfo, Embase, Cochrane Central Register of Controlled Trials | 51 (16 used ML) | NA | categorize | NA | NA | Physiological sensors and GPS, microphones, mobile- or computer-based applications | none listed
Bernert RA 2020 [31] | Artificial Intelligence and Suicide Prevention: A Systematic Review of Machine Learning Investigations | PubMed/MEDLINE, PsychInfo, Web-of-Science, EMBASE, Google Scholar, hand search | 87 | Oxford Center for Evidence-Based Medicine Protocol | categorize | 55 | 975,057 | clinical samples, emergency settings, epidemiologic surveys | EQUATOR/PRISMA
Bertl M 2022 [32] | A systematic literature review of AI-based digital decision support systems for post-traumatic stress disorder | Scopus | 30 | internal qualitative checklist | categorize, discover | 10 | 89,840 | Voice data, text data, checklists and questionnaires, bio signals, EMR | Kitchenham’s “Guidelines for performing Systematic Literature Reviews in Software Engineering”
Binvignat M 2022 [33] | Use of machine learning in osteoarthritis research: a systematic literature review | MEDLINE Pubmed | 46 | NA | categorize, discover | 18 | 5,749 | Imaging, clinical data, biological data | PRISMA
Boonstra A 2022 [34] | Influence of artificial intelligence on the work design of emergency department clinicians a systematic literature review | Smart-Cat, Web of Science, and PubMed | 34 | internal qualitative process | predict, categorize, discover | NA | NA | Not reported | PRISMA
Boyd C 2021 [35] | Machine Learning Quantitation of Cardiovascular and Cerebrovascular Disease: A Systematic Review of Clinical Applications | Ovid MEDLINE and Elsevier SCOP | 47 | NA | predict | 74 | 86,155 | CTA, ultrasound, MRI, Invasive Angiography | PRISMA
Bracher-Smith M 2021 [36] | Machine learning for genetic prediction of psychiatric disorders: a systematic review. | Medline via Ovid, PsychInfo, Web of Science and Scopus | 13 | PROBAST | predict | NA | NA | genome-wide association studies | PRISMA
Buchlak Q 2020 [37] | Machine learning applications to clinical decision support in neurosurgery: an artificial intelligence augmented systematic review | PubMed, Medline, Embase, Scopus | 70 | Prediction model Risk of Bias Assessment Tool (PROBAST) | categorize | 6 | 22,629 | Demographics data, clinical data/assessment, patient-reported outcomes, imaging data, lab tests, journals | PRISMA
Buisson M 2021 [38] | Deep learning versus ophthalmologists for screening for glaucoma on fundus examination: A systematic review and meta-analysis. | PubMed, Cochrane Library, Science Direct, Embase and ClinicalTrials.go | 6 | QUADAS-2 | categorize | 86 | 490 | glaucoma fundus images | PRISMA
Cabitza F 2018 [39] | Machine Learning in Orthopedics: A Literature Review | Scopus and Medline | 70 | NA | predict | NA | NA | 80% imaging data, 20% sensor & biomechanical data, 15% patients data | none stated
Castaldo R 2021 [40] | Radiomic and Genomic Machine Learning Method Performance for Prostate Cancer Diagnosis: Systematic Literature Review. | PubMed, Scopus, and OvidSP databases | 29 | QUADAS-2 | categorize | 11 | 699 | MRI and lab test results | PRISMA
Cavus N 2021 [41] | A Systematic Literature Review on the Application of Machine-Learning Models in Behavioral Assessment of Autism Spectrum Disorder. | Web of Science, PubMed, IEEE Xplore, and Scopus | 22 | NA | categorize | NA | NA | data collection instruments are AQ-10, Q-CHAT-10, ADOS, ADI-R, and Social Responsiveness Scale (SRS). Others include Autism Behavior Checklist, Aberrant Behavior Checklist, Clinical Global Impression, and MCHAT-based Pictorial Autism Assessment Schedule (PASS) | PRISMA
Celtikci E 2018 [42] | A Systematic Review on Machine Learning in Neurosurgery: The Future of Decision-Making in Patient Care | MEDLINE, Cochrane Database of Systematic Reviews | 51 | NA | predict | NA | NA | Lab tests, clinical data/assessment, sensor recordings, physiological measurements, imaging data | PRISMA
Chandra G 2022 [43] | Systematic Literature Review on Application of Artificial Intelligence in Cancer Detection Using Image Processing | Science Direct, IEEE Xplore, Radiology Society of North America Journals | 24 | NA | categorize | NA | NA | Not reported | None stated
Chee M 2021 [44] | Artificial Intelligence Applications for COVID-19 in Intensive Care and Emergency Settings: A Systematic Review. | PubMed, Embase, Scopus, CINAHL, IEEE Xplore, and ACM Digital Library | 14 | QUADAS-2 | predict, discover | 20 | 49,623 | ED, ICU, or Prehospital | PRISMA
Chiesa-Estomba C 2021 [45] | Machine Learning Algorithms as a Computer-Assisted Decision Tool for Oral Cancer Prognosis and Management Decisions: A Systematic Review | PubMed, Google Scholar, SciELO, and Scopus | 8 | ROBIN-I | predict | 31 | 33,065 | Social demographic, clinical, imaging, tissue genomic, and blood genomic | PRISMA
Cho S 2021 [46] | Brain metastasis detection using machine learning: a systematic review and meta-analysis. | MEDLINE and EMBASE | 12 | QUADAS-2 | categorize | 26 | 1,632 | MRI scans | PRISMA
Choudhury A 2020a [47] | Role of Artificial Intelligence in patient safety outcomes: systematic literature review | PubMed, WorldCat, MEDLINE, ProQuest, ScienceDirect, SpringerLink, Wiley, and ERIC | 35 | NA | predict | 27 | 3,377 | 55% public databases, 30% hospital databases, 15% clinical trials | PRISMA
Choudhury A 2020b [48] | Use of machine learning in geriatric clinical care for chronic diseases: a systematic literature review | PubMed, PubMed Central, Web of Science | 53 | NA | categorize | NA | NA | adverse drug events, drug safety reports, clinical alerts | PRISMA
da Silva Neto S 2022 [49] | Machine learning and deep learning techniques to support clinical diagnosis of arboviral diseases: A systematic review. | Google Scholar | 15 | NA | categorize | 20 | 14,019 | Demographic, epidemiological and clinical data | PRISMA
Dallora A 2017 [50] | Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review. | Pubmed, Scopus and Web of Science | 26 | Kitchenham’s guidelines | predict | 79 | 8,325 | imaging (MRI, Radiograph, CT) | PRISMA
Dallora A 2019 [51] | Bone age assessment with various machine learning techniques: A systematic literature review and meta-analysis | Pubmed, Web of Science, Scopus | 37 | Kitchenham’s guidelines | predict | NA | NA | 95% neuroimaging, 32% lab test, 20% demographics & cognitive measures (neuropsychological), 15% genetic data | PRISMA
Daniel 2023 [52] | A systematic literature review of machine learning application in COVID-19 medical image classification | Google Scholar | 19 | NA | categorize | NA | NA | Imaging | Maniah et al. 2021 [232]
D’Antoni F 2021 [53] | Artificial Intelligence and Computer Vision in Low Back Pain: A Systematic Review. | PubMed | 76 | QUADAS-2 | predict, categorize | 4 | 39,295 | MRI, ultrasound, X-rays, and Computed Tomography (CT) | PRISMA
Das PK 2022 [54] | A Systematic Review on Recent Advancements in Deep and Machine Learning Based Detection and Classification of Acute Lymphoblastic Leukemia | Not specified | 29 | NA | categorize | NA | NA | Imaging | None stated
Das T 2022 [55] | Intersection of network medicine and machine learning toward investigating the key biomarkers and pathways underlying amyotrophic lateral sclerosis: a systematic review | MEDLINE, PubMed Central, Bookshelf | 109 | NA | categorize | 5 | 26,898 | Demographic, epidemiological, clinical, signs, comorbidities | PRISMA
de Bardeci M 2021 [56] | Deep learning applied to electroencephalogram data in mental disorders: A systematic review. | PubMed | 30 | NA | predict, categorize | 8 | 178 | EEG | PRISMA
Decharatanachart
P 2021a [57]
Application of artificial
intelligence in chronic liver
diseases: a systematic review
and meta-analysis
MEDLINE, Scopus, Web of Science and
Google Scholar
19 QUADAS-2 categorize NA NA imaging (elastography, ultrasound, MRI, CT), lab
and clinical tests
PRISMA
Decharatanachart
P 2021b [58]
Application of artificial
intelligence in nonalcoholic
fatty liver disease and liver
fibrosis: a systematic review
and meta-analysis.
MEDLINE, Scopus, Web of Science, and
Google Scholar
25 QUADAS-2 categorize 30 13,030 Ultrasonography, Elastography, MRI, liver
biopsy
PRISMA
DelSole E 2021 [59] The State of Machine Learning in
Spine Surgery: A Systematic
Review.
PubMed, Ovid, and Web of Science 44 NA predict 27 1,106, 234 scans and patients data PRISMA
Dogan O 2021 [60] A systematic review on AI/ML
approaches against COVID-19
outbreak.
ACM Digital Library, ArXiv.org, Elsevier,
IEEE Xplore Digital Library, PubMed,
Springer, Wiley Online Library
264 NA predict
discover
NA NA CT, x-rays, case data NA
Dudchenko A 2020
[61]
Machine Learning Algorithms in
Cardiology Domain:
A Systematic Review
PubMed 27 NA categorize NA NA Private and public datasets and repositories PRISMA
Ebrahimi A 2021 [62] Predicting the Risk of Alcohol Use
Disorder Using Machine
Learning: A Systematic
Literature Review.
Medline, Embase, Inspec, ScienceDirect,
Web of Science, andIEEE Xplore
12 NA predict 106 43,545 demographic, alcohol behavior, EEC, family
history, clinical
PRISMA
Ebrahimighahnavieh
A 2020 [63]
Deep learning to detect
Alzheimer’s disease from
neuroimaging: A systematic
literature review
IEEE Xplore, ScienceDirect, SpringerLink,
ACM digital libraries Web of Science
Scopus Google Scholar
114 Own model based on
article type, scientific
impact, study size and
completeness
categorize NA NA imaging from MRI, PET, CSF none listed
El-Daw S 2021 [64] Role of machine learning in
management of degenerative
spondylolisthesis: A systematic
review.
PubMed, Medline, Cinahl, Allied and
Complementary Medical Database
(AMED), Cochrane Register of
Controlled trials, Embase, Elsevier
clinical key, Scopus
8 NA predict
categorize
NA NA scans, medical notes PRISMA
Falconer N 2021 [65] Systematic review of machine
learning models for
personalized dosing of
heparin.
PubMed, Embase, International
Pharmaceutical Databases (IPA),
CINAHL, Web of Science and IEEE
Xplore
8 CHAMS predict
discover
159 4,908 patients demographic, vitals, lab results PRISMA
Farook T 2021 [66] Machine Learning and Intelligent
Diagnostics in Dental and
Orofacial Pain Management:
A Systematic Review
Scopus, PubMed, and Web of Science 34 Cochrane GradePro (GRADE
approach)
categorize 45 11,198 Dental images, patient records PRISMA
Fernandes F 2021
[67]
Biomedical signals and machine
learning in amyotrophic lateral
sclerosis: a systematic review.
IEEE Xplore, Web of Science, Science
Direct, Springer, and PubMed
18 NA discover 5 265 EMG, EEG, GR, and MRI PRISMA
Fregoso-Aparicio
L 2021 [68]
Machine learning and deep
learning predictive models for
type 2 diabetes: a systematic
review.
PubMed, Web of Science 90 not defined as standard predict
discover
145 41,000,000 demographic, vitals, medical history PRISMA
Table 1. (Continued).
Columns: Lead author & year; Title; Databases searched; Total of studies included; Quality assessment; Area of focus (objective); Size of study included (min, max); Type of data; Reporting method
Frondelius T 2022
[69]
Diagnostic and prognostic
prediction models in
ventilator-associated
pneumonia: Systematic review
and meta-analysis of
prediction modeling studies.
ACM Digital Library/ACM Guide to
Computing Literature, Astrophysics
Data System, arXiv, IEEE Xplore Digital
Library, Academic Search Ultimate,
Cumulative Index to Nursing and Allied
Health Literature [CINAHL], CT.gov,
PubMed [Medline], Scopus, Web of
Science
20 PROBAST predict 12 1,438,035 Sensor array response data, Acoustic wave
based electronic nose system, GC – MS data
PRISMA
Fusco R 2021 [70] Artificial intelligence and COVID-
19 using chest ct scan and
chest x-ray images: Machine
learning and deep learning
approaches for diagnosis and
treatment.
PubMed, Scopus, Web of Science, Google
Scholar
23 NA predict
categorize
50 2,905 CXR, CT scans NA
Garrow C 2021 [71] Machine Learning for Surgical
Phase Recognition
PubMed, Web of Science, IEEE Xplore,
Google Scholar, and CiteSeerX
35 AMSTAR-2 discover 4 340 Intraoperative characteristics; instrument use-
manual annotation; instrument use-RF ID
tags; instrument use automatic detection
from video; Feature extraction from video;
feature learning from video
Cochrane
recommendations,
PRISMA
Ghaderzadeh M 2019
[72]
Deep Learning in the Detection
and Diagnosis of COVID-19
Using Radiology Modalities:
A Systematic Review
PubMed, Scopus, and Web of Science 37 NA categorize 227 11,302 Imaging (all) PRISMA
Grueso S 2021 [73] Machine learning methods for
predicting progression from
mild cognitive impairment to
Alzheimer’s disease dementia:
a systematic review.
PubMed, PsycINFO and Web of Science 116 Cochrane guidelines for
systematic reviews
predict 10 2,084 MRI was the most common kind of
neuroimaging used (in 76 out of 116
studies), followed by PET (11 studies), 26
studies included data from both techniques
(MRI and PET), two studies used
magnetoencephalography (MEG) data, and
one study used MRI and MEG data.
PRISMA
Gutiérrez-Tobal
G 2021 [74]
Reliability of machine learning to
diagnose pediatric obstructive
sleep apnea: Systematic review
and meta-analysis
WoS and Scopus 19 NA categorize 25 3,602 Clinical+ Anthropometrics+Demographics
+SpO2
PRISMA
Haggenmuller
S 2021 [75]
Skin cancer classification via
convolutional neural
networks: systematic review of
studies involving human
experts
PubMed, Medline and ScienceDirect 19 NA categorize NA NA dermoscopic images PRISMA
Hameed B 2021 [76] The Ascent of Artificial
Intelligence in Endourology:
a Systematic
Review Over the Last 2 Decades
MEDLINE,
Scopus, CINAHL, Clinicaltrials.gov, EMBASE,
Cochrane
library, Google Scholar, and Web of
Science
58 NA predict
categorize
31 46,891 Patients with kidney stone disease undergoing
imaging for the diagnosis of stone disease
PRISMA
Hasan N 2021 [77] Understanding current states of
machine learning approaches
in medical informatics:
a systematic literature review
Elsevier, IEEE Xplore, PubMed, and Google
Scholar.
PLOS ONE
51 Journals impact factors discover NA NA ML algorithms identified in the medical
informatic domain.
PRISMA
Hassan N 2021 [78] Preventing sepsis; how can
artificial intelligence inform
the clinical
decision-making process?
A systematic review
Medline, Cumulative Index of Nursing and
Allied Health
Literature, and Embase
17 NA predict NA NA Machine learning algorithms, predictive power
and sepsis definition used.
PRISMA
Henn J 2021 [79] Machine learning to guide clinical
decision-making in abdominal
surgery – a systematic literature
review
PubMed 47 NA discover 60 1,049,160 Surgical domain, predicted outcome, outcome
variable, patients, study period, ML,
predictors, cross-validation, and benchmark
PROSPERO
Hickman S 2021 [80] Machine Learning for Workflow
Applications in Screening
Mammography: Systematic
Review and Meta-Analysis
Ovid Embase, Ovid Medline, Cochrane
Central Register of Controlled Trials,
Scopus, and Web of Science literature
14 QUADAS-2 categorize 653 25,856 Number of cancer images, database used,
patient age, screening or diagnostic
mammograms, breast density
PRISMA
Hinterwimmer
F 2021 [81]
Machine learning in knee
arthroplasty: specific data are
key – a
systematic review
PubMed, Medline database and the
Cochrane Library
19 NA predict 6 1,049,160 complications, costs, functional outcome,
revision, postoperative satisfaction, surgical
technique and biomechanical properties
were investigated
none stated
Hoekstra O 2022 [82] Healthcare related event
prediction from textual data
with machine learning:
A Systematic Literature Review
PubMed, IEEE
Xplore and Web of Science
35 Internal qualitative process predict NA NA Electronic Medical Records (text) Kitchenham et al. 2009
[233]
Hoodbhoy Z 2021
[83]
Machine Learning for Child and
Adolescent Health: A
Systematic Review
Medline, the Cochrane Library,
the Cumulative Index to Nursing and
Allied Health Literature Plus, Web of
Science Library, and EBSCO Dentistry &
Oral Science Source.
363 NA discover 6 125,940 year of publication, geographical location, age
range, number of participant, disease or
condition under investigation, study
methodology, reference standard, type,
category, and performance of ML algorithms
PRISMA
Hosni M 2019 [84] Reviewing ensemble classification
methods in breast cancer
IEEE Xplore, ACM digital library, Scopus
and PubMed
193 NA predict NA NA imaging, clinical and non-clinical records,
genetic, lab tests, epidemiological data,
biomarkers from public and private
databases, registers
Kitchenham and
Charters; Peterson
guidelines for
systematic
mapping studies in
software
engineering
Hoyos W 2021 [85] Dengue models based on
machine learning techniques:
A systematic
literature review
ScienceDirect, IEEE
Xplore, Google Scholar, Emerald, Taylor &
Francis and PubMed
64 NA categorize NA NA diagnostic, epidemic and intervention modeling
of dengue.
PRISMA
Huang J 2021 [86] Deep Learning for Outcome
Prediction in
Neurosurgery: A Systematic
Review of Design,
Reporting, and Reproducibility
PubMed, Scopus, and Embase databases 35 PROBAST predict 17 101,654 characteristics of DL studies involving
neurosurgical outcome prediction and to
assess their bias and reporting quality (year,
country, disease, procedure, outcome,
predictors data source, data set size, model
type, key findings. Potential sources of bias)
PRISMA
Huang S 2021 [87] A systematic review of machine
learning and
automation in burn wound
evaluation: A promising
but developing frontier
PubMed and MEDLINE (OVID) 30 not stated discover 1 1,500 study design, method of data acquisition,
machine learning techniques, and machine
learning accuracy
PRISMA
Huang Z 2021 [88] Prediction of the obstruction sites
in the upper airway in sleep
disordered breathing based on
snoring sound parameters:
a systematic review
PubMed, Embase.com, CENTRAL, Web of
Science, and
Scopus in collaboration with a medical
librarian.
28 AXIS predict 9 90 Population, snoring sound parameter,
Identification of the obstruction site, main
finding
PRISMA
Ibrahim B 2021 [89] Diagnostic power of resting-state
fMRI for detection
of network connectivity in
Alzheimer’s disease and mild
cognitive impairment:
A systematic review
Scopus, PubMed, DOAJ, and Google Scholar 36 QUADAS categorize 15 291 number of subjects (patients and controls), age
of the subjects, MMSE scores, rs-fMRI
imaging protocol, and analysis method,
sensitivity scores, and specificity scores
PRISMA
Infante T 2021 [90] Radiogenomics and artificial
Intelligence approaches
applied to cardiac computed
tomography angiography and
cardiac magnetic resonance for
precision medicine in coronary
heart disease: A Systematic
Review
PubMed, Scopus database 60 NA discover 27 5,065 Imaging protocol, participants, extracted
feature categories, performance in terms of
AUC
none stated
Irgang L 2023 [91] Data-Driven Technologies as
Enablers for Value Creation in
the Prevention of Surgical Site
Infections: a Systematic
Review
Scopus, Web of Science, MEDLINE,
ProQuest, PubMed, ABI/Inform Global,
and Cochrane
59 NA categorize NA NA EMR PRISMA
Islam MN 2022 [92] Machine learning to predict
pregnancy outcomes:
a systematic review,
synthesizing framework and
future research agenda
Google Scholar, SpringerLink, IEEE
Xplore, ScienceDirect, etc.
26 NA predict NA NA Electronic Medical Records (text) Kitchenham et al. 2009
[233]
Jiang K 2021 [93] Current Evidence and Future
Perspective of Accuracy of
Artificial Intelligence
Application for Early Gastric
Cancer Diagnosis With
Endoscopy: A Systematic and
Meta-Analysis
PubMed, MEDLINE, Embase and the
Cochrane Library Databases
16 QUADAS categorize NA NA endoscopic detection of EGC, AUC PRISMA
Jiang M 2021 [94] Using Machine Learning
Technologies in Pressure Injury
Management: Systematic
Review
PubMed, EMBASE, Web of Science,
Cumulative Index to Nursing and
Allied Health Literature (CINAHL),
Cochrane Library, China National
Knowledge Infrastructure (CNKI),
the Wanfang database, the VIP
database, and the China Biomedical
Literature Database (CBM)
32 NA discover NA NA Study outcomes, performance of the algorithm,
and findings
PRISMA
Jones O 2020 [95] Artificial Intelligence Techniques
That May Be Applied to
Primary Care Data to Facilitate
Earlier Diagnosis of Cancer:
Systematic Review
MEDLINE, Embase, SCOPUS, and Web of
Science databases
16 QUADAS categorize
discover
NR NR positive predictive value (PPV), negative
predictive value (NPV), area under the
receiver operating characteristic (AUROC)
curve, types of AI used, the type of data
used to train and test the algorithms,
PRISMA
Kalhori S 2021 [96] Enhanced childhood diseases
treatment using
computational models:
Systematic review of
intelligent experiments
heading to precision medicine
PubMed, Web of Science, Scopus, and
EMBASE
62 NA discover 6 68,921 NA PRISMA
Kareemi H 2020 [97] Machine Learning Versus Usual
Care for
Diagnostic and Prognostic
Prediction in the
Emergency Department:
A Systematic
Review
MEDLINE, Embase, Central, and CINAHL 23 PROBAST predict
categorize
170 10,967,518 Prediction performance metric of model
discrimination (e.g. AUROC), calibration (e.g.
calibration plot slope), or classification (e.g.
sensitivity, specificity).
PRISMA
Karwath A 2021 [98] Redefining β-blocker response in
heart failure patients with
sinus rhythm and atrial fibrillation:
a machine learning cluster
analysis
PubMed NA predict NR NR all-cause mortality none stated
Kassem M 2021 [99] Machine Learning and Deep
Learning Methods for Skin
Lesion
Classification and Diagnosis:
A Systematic Review
MedNode, DermaIS, DermQuest,
the ISIC 2016, 2017, 2018, and 2019, Ph2
49 NA categorize NR NR MedNode, DermaIS, DermQuest, ISIC 2016,
2017, 2018, and 2019, Ph2, and Dermofit,
EDRA,
PRISMA
Kausch S 2021 [100] Physiological machine learning
models for prediction of sepsis
in
hospitalized adults: An integrative
review
PubMed, CINAHL, and Cochrane 14 NA predict 67 3,845 study population, outcome measure (i.e. sepsis-
1, sepsis-3, septic shock), statistical analysis
for model development, model
characteristics, and model validation
measures (i.e. AUROC)
PRISMA
Kawamoto A 2022
[101]
Systematic review of artificial
intelligence-based image
diagnosis for inflammatory
bowel disease
PubMed 27 NA predict NA NA Clinical data, laboratory data, imaging PRISMA
Kedra J 2019 [102] Current status of use of big data
and artificial
intelligence in RMDs: a systematic
literature review informing
EULAR
recommendations
PubMed MEDLINE 110 NA predict
categorize
discover
5 (Units of
observation)
140 mln (Units of
observation)
31% clinical, 31% biological and 29% imaging
sources
Cochrane
Collaboration
handbook
Kennedy EE 2022
[103]
Systematic review of prediction
models for postacute care
destination decision-making
PubMed,
CINAHL, and Embase
28 PROBAST predict NA NA EMR PRISMA + CHARMS
Khanagar S 2021
[104]
Application and Performance of
Artificial Intelligence
Technology in Oral Cancer
Diagnosis and Prediction of
Prognosis
PRISMA, PubMed, Google Scholar, Scopus,
Embase, Cochrane, Web of Science, and
the Saudi Digital Library for articles
16 QUADAS-2 categorize 10 33,065 neural networks PRISMA
Kim HR 2022 [105] Analyzing adverse drug reaction
using statistical and machine
learning methods:
A systematic review
EMBASE and PubMed 72 ROBIS discover NA NA EMR PRISMA
Kim S 2021 [106] Recent Trends of artificial
intelligence and machine
learning for insomnia research
PubMed; Systematic Reviews; meta
Analysis; PRISMA
17 NA discover 7 807 Brain volume cortical thickness, Hypnogra, EEG,
ECG
PRISMA
Kodama S 2021 [107] Ability of Current machine
learning algorithms to predict
and detect hypoglycemia in
patients with Diabetes
Mellitus: Meta-analysis
EMBASE, MEDLINE 33 QUADAS-2 predict 2 453,487 number of study participants, N-total, N-hypo,
mean or range of the patients’ age, time
of day of hypoglycemic events, place of
supposed hypoglycemic episode (ie,
experimental, in-hospital, and out-of-
hospital), ML algorithm used for
classification
Electronic Literature
searches
Komolafe T 2021
[108]
Diagnostic Test Accuracy of Deep
Learning Detection of COVID-19:
A Systematic Review and
Meta-Analysis
PubMed, Web of Science and Inspec 19 QUADAS categorize 51 3,342 data partitioning, training model, deep learning
techniques, training parameter, the total
number of positive (cohort) vs control
(negative) and other valuable information.
PRISMA
Kourou K 2021 [109] Applied machine learning in
cancer research: A systematic
review for
patient diagnosis, classification
and prognosis
PubMed and dblp 921 NA categorize NR NR disease diagnosis, patient classification and
cancer prognosis and survival
none stated
Kozikowski M 2021
[110]
Role of Radiomics in the
Prediction of Muscle-invasive
Bladder
Cancer: A Systematic Review and
Meta-analysis
PubMed; EMBASE; PRISMA-DTA, 8 QUADAS-2 predict 54 218 Number of lesions, surgical technique,
pathological stage, Imaging characteristics
PRISMA
Kumar S 2021 [111] Machine learning for modeling
the progression of
Alzheimer disease dementia using
clinical data:
a systematic literature review
PubMed, Scopus, ScienceDirect, IEEE
Xplore Digital Library, Association for
Computing Machinery Digital Library,
and arXiv.
64 NA predict NR NR laboratory results, vital signs, neurobehavioral
status exam scores, demographic
information, and comorbidities, along with
neuroimaging scans and CSF biomarkers
PRISMA
Kumar Y 2021 [112] A Systematic Review of Artificial
Intelligence Techniques in
Cancer
Prediction and Diagnosis
Web of Science, EBSCO, and EMBASE 185 NA categorize NA NA MRI images, CT images and other diagnostic
images of various cancer types
PRISMA
Kuntz S 2021 [113] Gastrointestinal cancer
classification and
prognostication
from histology using deep
learning: Systematic review
Pubmed and Medline 16 NA predict
discover
30 1,122 area under the receiver operating characteristic,
cancer type, study classification
PRISMA
La Greca Saint-
Esteven A 2021
[114]
Systematic Review on the
Association of Radiomics with
Tumor
Biological Endpoints
PubMed 104 NA predict
discover
45 260 biological endpoint and its alteration, e.g.
mutation on a specific exon, over-
expression, etc.; the imaging modality; the
origin of the dataset; the training set size
NA
PRISMA
Langarizadeh
M 2021 [115]
Machine Learning Techniques for
Diagnosis of Lower
Gastrointestinal Cancer: A
Systematic Review
Google Scholar, Scopus,
ProQuest, PubMed, Web of Science,
Cochrane, and SID
as a Persian database
44 NA categorize NR NR machine learning model and algorithm, sample
size, the type of data, and the results of the
model.
PRISMA
Le Glaz A 2021 [116] Machine Learning and Natural
Language Processing in
Mental Health: Systematic
Review
PubMed, Scopus, ScienceDirect,
and PsycINFO
58 NA discover NA NA precise topic of mental health (eg, autism,
psychotic spectrum disordered), population
characteristics, and types of recorded data
PRISMA
Lecointre L 2021
[117]
Artificial intelligence-based
radiomics models in
endometrial cancer:
A systematic review
Pubmed and Medline 17 NA predict
discover
24 622 imaging technique used, sample size (training,
validation and testing), model type
(Machine learning or Deep Learning),
classifier and diagnostic performance
metrics
none stated
Lequertier V 2021
[118]
Hospital Length of Stay Prediction
Methods
A Systematic Review
PubMed, ScienceDirect, and arXiv
databases.
74 TRIPOD predict 1065 3,517,950 sample with number of hospitals and inpatient
stays, data sources and input variables used
by the prediction models, reimputation
strategies for missing data, LOS modeling
format with potential transformations,
validation study design, employed LOS
prediction methods, and performance
evaluation metrics
PRISMA
Li J 2021 [119] Predicting breast cancer 5-year
survival using machine
learning: A systematic review
PubMed (including MEDLINE), Embase, and
Web of Science Core
31 PROBAST predict 200 202,932 Disease characters, and predicted outcome,
data source, data type, number of centers,
and number of samples, number of
candidate predictors used, ML algorithms,
model presentation
PRISMA
Li M 2021 [120] Artificial intelligence applied to
musculoskeletal oncology:
a systematic review
PubMed 252 NA discover NA NA machine learning techniques none stated
Li Y 2021 [121] Applications of artificial
intelligence (AI) in researches
on nonalcoholic fatty liver
disease (NAFLD): A systematic
review
22 NA predict
discover
NA NA Qualitative analysis for applications of AI in
NAFLD
none stated
Librenza-Garcia
D 2017 [122]
The impact of machine learning
techniques in the study of
bipolar disorder: A systematic
review
PubMed, Embase and Web of Science 51 NA predict
categorize
30 4,488 imaging, clinical tests and scales, lab data, PRISMA
Lima CLD 2022 [123] Temporal and Spatiotemporal
Arboviruses Forecasting by
Machine Learning:
A Systematic Review
IEEE Xplore, PubMed, Science Direct,
Springer Link, and Scopus
139 NA predict NA NA Epidemiological data none stated
Locquet M 2021
[124]
A systematic review of prediction
models
to diagnose COVID-19 in adults
admitted to
healthcare centers
MEDLINE and Scopus databases 13 PROBAST predict 8 71 characterization of predictors and outcome,
model development, model performance
PRISMA
Lopez C 2021 [125] Artificial Learning and Machine
Learning Decision Guidance
Applications in Total Hip and
Knee Arthroplasty:
A Systematic Review
EMBASE, Medline, and
PubMed
49 NA discover 25 262,290 AI/ML methods and clinical applications,
surgical domain, data sources, input
variables and output variables, sample size,
average patient age, percent female
patients,
PRISMA
Lubelski D 2021
[126]
Prediction Models in Degenerative
Spine
Surgery: A Systematic Review
PubMed/Medline and Embase databases 31 NA predict 40 8,435 functional/disability/pain scores or more
objective measures such as LOS,
reoperation, readmission, and complications
PRISMA
Maile H 2021 [127] Machine Learning Algorithms to
Detect Subclinical
Keratoconus: Systematic
Review
MEDLINE, Embase,
and Web of Science and Cochrane Library
26 QUADAS categorize 15 791 diagnosis details, validation details, input
details, input types, method, classification
groups, sensitivity, specificity, accuracy,
precision, area under the receiver-operating
characteristic curve (AUC), and source code
availability
PRISMA
Mangold C 2021
[128]
Machine Learning Models for
Predicting
Neonatal Mortality: A Systematic
Review
PubMed, Cochrane, OVID, and Google
Scholar
11 NA predict 506 481,058 mortality during the, neonatal period, ML data PRISMA
Mari 2022 [129] Systematic Review of the
Effectiveness of Machine
Learning Algorithms for
Classifying Pain Intensity
Phenotype or Treatment
Outcomes Using
Electroencephalogram Data.
MEDLINE, EMBASE, Web of Science,
PsycINFO and The Cochrane Library
44 PROBAST predict 13 342 EEG data none stated
Matsangidou M 2021
[130]
Machine Learning in Pain
Medicine: An Up-To-Date
Systematic Review.
PUBMED 26 NA categorize
discover
10 20,716 quantitative kinematic data, electromyographic
data, fMRI data, accelerometer data, EEG
data, X-rays, sleep scans, clinical data
none stated
Mawdslay E 2021
[131]
A systematic review of the
effectiveness of machine
learning for predicting
psychosocial outcomes in
acquired brain injury: Which
algorithms are used and why?
MEDLINE (PubMed), Web of Science,
EMBASE (OVID interface, 1990
onwards), CINAHL, and PsycINFO
9 PROBAST predict 100 17,132 time to symptom resolve, depression, post-
concussive symptoms, antidepressant use
none stated
Medic G 2019 [132] Evidence-based Clinical Decision
Support Systems for the
prediction and detection of
three disease states in critical
care: A systematic literature
review
PubMed, ClinicalTrials.gov and Cochrane
Database of Systematic Reviews
20 (shock), 22
(respiratory), 31
(sepsis)
NA predict 36 (shock), 22
(respiratory), 31
(sepsis)
359,350 (shock), 22
(respiratory), 31
(sepsis)
EHRs, clinical/sensor physiological data PRISMA
Mei J 2021 [133] Machine Learning for the
Diagnosis of Parkinson’s
Disease: A Review of
Literature.
IEEE Xplore, Pubmed 209 NA categorize 10 2,289 voice recordings, movement data, or
handwritten patterns, MRI, SPECT, PET, CSF
samples, electromyography, OCT, cardiac
scintigraphy, Patient Questionnaire of
Movement Disorder Society Unified
Parkinson’s Disease Rating
Scale, whole-blood gene expression profiles,
transcranial
sonography, eye movements,
electroencephalography (EEG), and
serum samples.
none stated
Mellia J 2021 [134] Natural Language Processing in
Surgery A Systematic Review
and Meta-analysis.
PubMed, MEDLINE, Web of Science, and
Embase
29 NA discover NA NA EHRs, clinical/sensor physiological data none stated
Miltiadous A 2023
[135]
Machine Learning Algorithms for
Epilepsy Detection Based on
Published EEG Databases:
A Systematic Review
Elsevier’s Scopus, IEEE
Xplore, Elsevier’s ScienceDirect and
MEDLINE PubMed
190 NA predict NA NA EEG PRISMA
Minissi M 2021 [136] Assessment of the Autism
Spectrum Disorder Based on
Machine Learning and Social
Visual Attention: A Systematic
Review.
PubMed Central and Scopus 11 NA discover 28 189 eye movements none stated
Miranda L 2021 [137] Systematic Review of Functional
MRI Applications for
Psychiatric Disease Subtyping.
PubMed 20 NA discover 20 1,872 functional magnetic resonance imaging (fMRI) none stated
Mirzania D 2021
[138]
Applications of deep learning in
detection of glaucoma:
A systematic review.
Medline, Web of Science, and Embase 29 NA categorize 551 (images) 197,085 (images) fundus images of the retina, optical coherence
tomography (OCT), visual field testing
none stated
Moezzi M 2021 [139] The diagnostic accuracy of
Artificial Intelligence-Assisted
CT imaging in COVID-19
disease: A systematic review
and meta-analysis.
ISI Web of Science, Cochrane Library,
PubMed, Scopus, CINAHL, Science
Direct, PROSPERO, and EMBASE
36 QUADAS-2 categorize 46 3,102 CT-SCANS none stated
Moglia A 2021 [140] A systematic review on artificial
intelligence in robot-assisted
surgery
PubMed,
Web of Science, Scopus, and IEEE Xplore
35 AMSTAR-2 discover 3 86 Kinematic data and video frames none stated
Mohan B 2021 [141] High pooled performance of
convolutional neural networks
in computer-aided diagnosis
of GI ulcers and/or
hemorrhage on wireless
capsule endoscopy images:
a systematic review and meta-
analysis
ClinicalTrials.gov, Ovid EBM Reviews, Ovid,
Embase, Ovid Medline, Scopus and
Web of Science
9 NA categorize 105 images 5,000 cases
(113,268,334
images)
studies that tested a deep CNN learning model
for the detection and/or diagnosis of GI
ulcers and/or hemorrhage on WCE
none stated
Mondal M 2021 [142] Diagnosis of COVID-19 Using
Machine Learning and Deep
Learning: A Review
IEEE, Science Direct, Springer Nature,
MDPI, Wiley, Bentham Science, arXiv,
medRxiv
52 NA categorize 1 5,644 X-rays, CT scans, ultrasounds none stated
Montazeri M 2021
[143]
Machine Learning Models for
Image-Based Diagnosis and
Prognosis of COVID-19:
Systematic Review.
PubMed, Web of Science, IEEE, ProQuest,
Scopus, bioRxiv, and medRxiv
44 PROBAST categorize 75 32,583 CT scans, CXR, CT, CXR, lung ultrasound, and
other information such as the patient’s age
and medical history
none stated
Moor M 2021 [144] Early Prediction of Sepsis in the
ICU Using Machine Learning:
A Systematic Review.
Embase, Google Scholar, PubMed/Medline,
Scopus, and Web of Science
22 criteria adapted from Qiao predict 25 36,176 Demographics, labs, vitals, comorbidities,
diagnoses, images, clinical parameters,
cytokine mRNA expression
none stated
Moshawrab M 2023
[145]
Smart Wearables for the Detection
of Cardiovascular Diseases:
A Systematic Literature Review
IEEE, PubMed, and Scopus 87 NA Discover
predict
NA NA Wearable data PRISMA
Moura F 2021 [146] Artificial intelligence in the
management and treatment of
burns: a systematic review.
MEDLINE, Embase and PubMed 46 NA discover 5 66,611 survival/mortality; assessment of burn depth;
estimation of body surface area; antibiotic
response/sepsis; other miscellaneous
applications
none stated
Mpanya D 2021 [147] Predicting mortality and
hospitalization in heart failure
using machine learning:
A systematic literature review.
MEDLINE, Google Scholar, Springer Link,
Scopus, and
Web of Science
30 NA predict 71 716,790 all-cause mortality, 30-day all-cause
readmission, cardiac death, heart
transplantation and HF-related
hospitalization, one year survival
none stated
Mughal H 2022 [148] Parkinson’s Disease Management
via Wearable Sensors:
A Systematic Review
IEEE Xplore, Multidisciplinary Digital
Publishing Institute
(MDPI), Springer, Elsevier, and other
Journals
60 NA Categorize 4 2,063 Wearable data none stated
Musa N 2022 [149] A systematic review and Meta-
data analysis on the
applications of Deep Learning
in Electrocardiogram
IEEE Xplore digital library, ACM
digital library, Science Direct, Springer
Links, DBLP,
PubMed,
Scopus and Web of Science
150 Kitchenham 2007 [234] categorize 16 2,322,513 ECG Kitchenham 2007
[234]
Musulin J 2021 [150] Application of Artificial
Intelligence-Based Regression
Methods in the Problem of
COVID-19 Spread Prediction:
A Systematic Review.
Google scholar, The Multidisciplinary
Preprint Platform, PubMed, Web of
Science, arXiv, and
medRxiv
127 All articles were evaluated
according to regression
measures used for
evaluation of AI-based
regressors (R2-score,
Accuracy, MAE, RMSE).
predict NA NA new cases of COVID-19 none stated
Nadarajah R 2021
[151]
Prediction of incident atrial
fibrillation in community-
based electronic health
records: a systematic review
with meta-analysis.
Medline and Embase 11 PROBAST predict 921 95,607 number of incident atrial fibrillation none stated
Naemi A 2021 [152] Machine learning techniques for
mortality prediction in
emergency departments:
a systematic review.
Medline (PubMed), Scopus and Embase
(Ovid)
15 PROBAST predict 100 799,522 vital signs: diastolic blood pressure, heart rate,
pulse rate, respiratory rate, systolic blood
pressure, oxygen saturation, body
temperature.
none stated
Nafea MS 2022 [153] Supervised Machine Learning and
Deep Learning Techniques for
Epileptic Seizure Recognition
Using EEG Signals
A Systematic Literature Review
Web of Science and Scopus 91 NA Predict
categorize
discover
NA NA ECG PRISMA
Nasser M 2023 [154] Deep Learning Based Methods for
Breast Cancer Diagnosis:
A Systematic Review and
Future Direction
Scopus,
Google Scholar, IEEE Xplore Library, Web of
Science, SpringerLink, ScienceDirect,
ACM
Digital Library and PubMed
98 NA predict 106 11,429 Imaging PRISMA
Nazarian S 2021
[155]
Diagnostic Accuracy of Artificial
Intelligence and Computer-
Aided Diagnosis for the
Detection and Characterization
of Colorectal Polyps:
Systematic Review and Meta-
analysis.
Embase, MEDLINE, and the Cochrane
Library
48 QUADAS-2 categorize 15 12,895 narrow band imaging, endoscopy none stated
Nguyen A 2018 [156] Machine learning applications for
the differentiation of primary
central nervous system
lymphoma from glioblastoma
on imaging: a systematic
review and meta-analysis
PubMed 8 QUADAS-2 categorize 10 (GBM),
8 (PCNSL)
70 (GBM),
54 (PCNSL)
Imaging (MRI, FLAIR) prior to treatment PRISMA
Ogink P 2021a [157] The use of machine learning
prediction models in spinal
surgical outcome: An overview
of current development and
external validation studies.
PubMed, Embase, and Cochrane Library 59, 77 models MINORS predict 635 26,364 Medical management, Survival, Complication,
PROMs,
Intraoperative complication
none stated
Ogink P 2021b [158] Wide range of applications for
machine-learning prediction
models in orthopedic surgical
outcome: a systematic review.
Cochrane library, Embase and PubMed 33 NA predict 176 1,104,233 cost, complication, survival, readmission, non-
home discharge, sustained opioid use
none stated
Ortiz-Barrios M 2021
[159]
Process Improvement Approaches
for Increasing the Response of
Emergency Departments
against the COVID-19
Pandemic: A Systematic
Review.
ISI Web of Science, Scopus, PubMed, IEEE,
Google Scholar, and Science Direct
65 EPOC discover 10 192,779 mean length of stay, left-without-being-seen
rate, average flow time
Median time to ED revisit
Median waiting time for consultation
Ossai C 2021 [160] Intelligent decision support with
machine learning for efficient
management of mechanical
ventilation in the intensive
care unit – A critical overview.
EBSCO, IEEEXplore, Google Scholar,
SCOPUS, and the Web of Science
26 Joanna Briggs Institute (JBI)
critical appraisal
checklist for cross-
sectional research
adapted to machine
learning (ML) decision
support strategies for
effective management
discover 15 3,602 Tidal Volume (TV), asynchrony, weaning, risk of
Prolonged Mechanical ventilation (PMV)
none stated
Paganelli AI 2022
[161]
Real-time data analysis in health
monitoring systems:
A comprehensive systematic
literature review
ACM digital library,
IEEE Xplore, Science Direct, SpringerLink,
and PubMed
36 NA categorize NA NA ECG Kitchenham 2009
[233]
Pahwa B 2021 [162] Applications of Machine Learning
in Pediatric Hydrocephalus:
A Systematic Review.
PubMed and Cochrane 15 TRIPOD predict
categorize
9 953 clinical features, CT, MRI, ultrasound
Patil S 2019 [163] Machine learning and its potential
applications to the genomic
study of head and neck
cancer – A systematic review
PubMed, EMBASE, Scopus, Web of Science,
gray literature (google scholar,
proquest, OpenGrey).
7 Newcastle-Ottawa Scale
(NOS)
predict 21 408 Biomarkers, genomic data, clinical data PRISMA
Peralta M 2021 [164] Machine learning in deep brain
stimulation: A systematic
review
PubMed, Google Scholar 73 NA discover 1 509 micro-electrodes Imaging, Local Field Potential
Ext. sensors Clinical EEG/ECoG
Stimulation Transcranial Magnetic Stimulation
none stated
Persad E 2021 [165] Neonatal sepsis prediction
through clinical decision
support algorithms:
A systematic review.
PubMed, CENTRAL and EMBASE 36 Risk of bias was assessed
with a tool relevant
for each study design
predict 24 2,989 electronic health records; events of apneas,
with bradycardia and desaturation; average
heart rate characteristics; blood pressure;
body temperature; diastolic blood pressure;
heart rate; Neonatal Therapeutic
Intervention Scoring System; respiratory
rate; Score for Acute Neonatal Physiology;
oxygen saturation
none stated
Popa S 2021 [166] Non-Alcoholic Fatty Liver Disease:
Implementing Complete
Automated Diagnosis and
Staging. A Systematic Review.
PubMed, EMBASE, Cochrane Library, and
WILEY
37 NA categorize 18 108,139 ultrasound images none stated
Prasoppokakorn
T 2021 [167]
Application of artificial
intelligence for diagnosis of
pancreatic ductal
adenocarcinoma by EUS:
A systematic review and meta-analysis.
Ovid MEDLINE, EMBASE, SCOPUS,
International Scientific Indexing,
Computer Sciences and Engineering
databases including Institute of Electrical
and
Electronics Engineers and Association for
Computing
Machinery
8 QUADAS-2 categorize 20 388 EUS images none stated
Quaak M 2021 [168] Deep learning applications for the
classification of psychiatric
disorders using neuroimaging
data: Systematic review and
meta-analysis.
PUBMED and IEEE Xplore 44 NA categorize 4 1,301 f-MRI, s-MRI, Autism Brain Imaging Data
Exchange
none stated
Quartuccio N 2021
[169]
The role of PET radiomic features
in prostate cancer:
a systematic review.
PubMed/MEDLINE 7 QUADAS-2 predict 52 94 PET none stated
Ramesh S 2021 [170] Applications of Artificial
Intelligence in Pediatric
Oncology: A Systematic
Review
Embase, Scopus, and MEDLINE 42 NA predict
discover
12 337 MRI, magnetic resonance imaging; MRS,
magnetic resonance spectroscopy, clinical
data, histology, Raman
histology, CT, computed tomography; FTIR,
Fourier-transform infrared spectroscopy; K
none stated
Ramos-Lima L 2020
[171]
The use of machine learning
techniques in trauma-related
disorders: a systematic review
PubMed, Embase, Web of Science 49 Proposed by the authors predict
categorize
16 (prognostic), 40
(classification),
69 (clustering
analysis)
89,840 (prognostic),
2,782
(classification),
2,782 (network
analysis)
imaging, surveys, lab tests, medical records,
clinical tests.
PRISMA
Ravegnini G 2021
[172]
Radiomics and Artificial
Intelligence in Uterine
Sarcomas: A Systematic
Review
PubMed, Scopus, and Cochrane
Library
6 QUADAS-2 predict 58 80 imaging PRISMA
Ren M 2021 [173] Artificial intelligence in orthopedic
implant model classification:
a systematic review
PubMed,
EMBASE, and the Cochrane Library
11 ad hoc for paper categorize 170 2,894 imaging PRISMA
Rice P 2021 [174] Machine Learning Models for
Predicting Stone-Free Status after
Shockwave Lithotripsy:
A Systematic
Review and Meta-Analysis
MEDLINE, EMBASE,
Scopus, ScienceDirect, CINAHL and
Cochrane Library + gray literature
8 QUADAS-2 predict 51 984 NA PRISMA
Rowe T 2021 [175] Machine learning for the life-time
risk prediction of Alzheimer’s
disease: a systematic review
Scopus, PubMed and Google Scholar 12 NA predict NA NA NA CHARMS
Safaei M 2021 [176] A systematic literature review on
obesity: Understanding the
causes & consequences of
obesity and reviewing various
machine learning approaches
used to predict obesity
Science Direct, Springer Link, IEEE Explorer,
Taylor and Francis Online, ACM Digital
Library, MDPI, NCBI
93 Nidhra et al predict NA NA NA Kitchenham and
Charters + Jula
Sajjadian M 2021
[177]
Machine learning in the
prediction of depression
treatment outcomes:
a systematic review and meta-
analysis
PubMed, Google Scholar,
ScienceDirect, and PsychINFO
54 but only 8
adequate-quality
papers
ad hoc for paper predict NA NA NA PRISMA
Salas-Zárate R 2022
[178]
Detecting Depression Signs on
Social Media: A Systematic
Literature Review
ACM Digital
Library, IEEE Xplore Digital Library,
SpringerLink, Science Direct, and
PubMed, Google Scholar
34 NA discover NA NA Social media Brereton 2007 [235]
Salem H 2021 [179] A systematic review of the
applications of Expert Systems
(ES) and machine learning
(ML) in clinical urology
WEB OF SCIENCE, EMBASE, BIOSIS
CITATION INDEX, SCOPUS, PUBMED,
Google Scholar and MEDLINE
138 NA predict
discover
NA NA NA PRISMA
Sanmarchi F 2023
[180,181]
Predict, diagnose, and treat
chronic kidney disease with
machine learning: a systematic
literature review
PubMed 68 PROBAST and Luo et al.
[236]
Predict
discover
categorize
30 550,000 Lab data PRISMA
Saputro S 2021 [182] Prognostic models of diabetic
microvascular complications:
a systematic review and meta
analysis
PubMed and Scopus 76 PROBAST predict NA NA age, sex, HbA1c, eGFR, BMI, SBP, diabetic
duration
PRISMA + CHARMS
Sardar SK 2022 [183] A Systematic Literature Review on
Machine Learning Algorithms
for Human Status Detection
ScienceDirect, IEEE Xplore, ACM 76 NA categorize NA NA Physiological data PRISMA
Scardoni A 2020
[184]
Artificial intelligence-based tools
to control healthcare
associated infections:
A systematic review of the
literature
Medline, Embase 27 Newcastle-Ottawa Scale
(NOS) for non-
randomized, Cochrane
tool for randomized.
categorize 100 191,014 clinical data, lab tests PRISMA
Segato A 2020 [185] Artificial intelligence for brain
diseases: A systematic review
PubMed, Scopus, Web of Science 154 NA categorize 5 1,576 Imaging (MRI, PET, CT, ultrasound, HSI),
Connectivity, Clinical recordings (EEG,
microelectrode), EHR, gene sequencing
PRISMA
Senanayake S 2019
[186]
Machine learning in predicting
graft failure following kidney
transplantation: A systematic
review of published predictive
models
Medline, CINAHL, EMBASE, PsycINFO,
Cochrane databases
18 Qiao 2019 predict NA NA Transplant records, patient registries, lab tests PRISMA
Senders J 2018 [187] Natural and Artificial Intelligence
in Neurosurgery: A Systematic
Review
PubMed, Embase 23 NA categorize Diagnosis (10); Pre-
op planning (9),
outcomes (101)
Diagnosis (3,785 +
1,225); Pre-op
planning (523),
outcomes
(7,769)
imaging, EEG, EHR, genetics and clinical data, PRISMA
Shi Z 2021 [188] Methodological quality of
machine learning-based
quantitative
imaging analysis studies in
esophageal cancer:
a systematic review
of clinical outcome prediction
after concurrent
chemoradiotherapy
PubMed and Embase Ovid 37 RQS + findings of other
radiomics
methodological
evaluations
predict 11 190 Imaging PRISMA
Shillan D 2019 [189] Use of machine learning to
analyze routinely collected
intensive care unit data
a systematic review
PubMed, Web of Science 258 NA predict median sample size
across all studies
was 488 (IQR
108–4,099)
Patient-reported outcomes PRISMA
Shin S 2021 [190] Machine learning vs. conventional
statistical models for
predicting heart failure
readmission and mortality
MEDLINE, EPUB, Cochrane
CENTRAL, EMBASE, INSPEC, ACM, and Web
of Science
20 CHARMS predict NA NA NA PRISMA
Shlobin N 2021 [191] Artificial Intelligence for Large-
Vessel Occlusion Stroke:
A Systematic Review
PubMed MEDLINE (National Library
of Medicine), Embase (Elsevier),
and Scopus (Elsevier)
40 GRADE + PROBAST predict
discover
NA NA NA PRISMA
Siddiqui S 2022 [192] Deep Learning Models for the
Diagnosis and Screening of
COVID-19: A Systematic
Review
Google Scholar, PubMed, IEEE 45 Zeng 2015 [237] Predict
discover
147 80,000 Imaging Kable 2012 [238]
Smets J 2021 [193] Machine Learning Solutions for
Osteoporosis – A Review
PubMed and Web of Science 89 NA predict
discover
NA NA NA PRISMA
Soffer S 2021 [194] Deep learning for pulmonary
embolism detection on computed
tomography pulmonary
angiogram: a systematic review
and meta-analysis
Medline, PubMed 5 QUADAS-2 categorize 121 29,465 images PRISMA
Song D 2021 [195] Machine learning with
neuroimaging data to identify
autism spectrum
disorder: a systematic review and
meta-analysis
PubMed, Scopus, and Embase 44 QUADAS-2 categorize NA NA MRI images PRISMA
Song X 2021 [196] Comparison of machine learning
and logistic regression models
in
predicting acute kidney injury:
A systematic review and meta-
analysis
Pubmed, Embase 24 NA predict 27 1,620,898 surgery, admission, hospitalization, transplant,
cancer, care
PRISMA
Sorin V 2020 [197] Deep Learning for Natural
Language Processing in
Radiology – Fundamentals and
a Systematic Review
Medline, Scopus, Google Scholar 10 NA categorize NA NA Patient journals, clinical reports, imaging PRISMA
Srivani M 2022 [198] Cognitive computing
technological trends and
future research directions in
healthcare – A systematic
literature review
Scopus, IEEEXplore, Google Scholar, DBLP,
PubMed, Springer, Science Direct
10 NA categorize NA NA EHR, genomic, sensors, eye images none stated
Stephens M 2022
[199]
Utility of machine learning
algorithms in degenerative
cervical and lumbar spine
disease: a systematic review
PubMed, Embase, Medline,
and Cochrane
35 GRADE + PROBAST predict
discover
NA NA NA PRISMA
Stewart J 2021 [200] Applications of machine learning
to
undifferentiated chest pain in the
emergency
department: A systematic review
PubMed (MEDLINE), Cochrane Library, Web
of Science, Embase, and Scopus
23 NA predict
categorize
228 85,254 demographics, PMHx, Sx, Exam, ECG, Meds,
Vitals, troponins, labs, estrogen status
(women), echo
PRISMA
Stokes K 2022 [201] The use of artificial intelligence
systems in diagnosis of
pneumonia via signs and
symptoms: A systematic
review
PubMed, Scopus, and Ovid
SP
16 STARD 2015 tool predict 249 48,449 Imaging PRISMA
Subramanian H 2021
[202]
Trends in Development of Novel
Machine Learning Methods for
the
Identification of Gliomas in
Datasets
That Include Non-Glioma Images:
A Systematic Review
Ovid
Embase, Ovid MEDLINE, Cochrane trials
(CENTRAL), and Web of Science-Core
Collection
12 TRIPOD predict
categorize
42 patients (but
many studies
unspecified)
717 Images PRISMA
Syeda H 2021 [203] Role of Machine Learning
Techniques to Tackle the
COVID-19 Crisis: Systematic
Review
PubMed, Web of Science, and the CINAHL
databases
130 NA predict
categorize
5 6,054 imaging (MRI, Radiograph, CT), social media
posts, lab tests
PRISMA
Syer T 2021 [204] Artificial Intelligence Compared to
Radiologists for the Initial
Diagnosis of Prostate Cancer on
Magnetic Resonance Imaging:
A Systematic Review and
Recommendations for Future
Studies
MEDLINE, EMBASE, and arXiv electronic
databases, and the OpenSIGLE repository
27 QUADAS-2 categorize NA NA Images PRISMA
Tabatabaei M 2021
[205]
Current Status and Quality of
Machine
Learning-Based Radiomics Studies
for
Glioma Grading: A Systematic
Review
PubMed, Scopus, and EMBASE 18 NA predict
discover
NA NA Images PRISMA
Tan K 2021 [206] Evaluation of Machine Learning
Methods Developed for
Prediction of Diabetes
Complications: A Systematic
Review
MEDLINE® (via PubMed®), Embase®,
the Cochrane®
Library, Web of Science®, and DBLP
Computer Science Bibliography
32 PROBAST predict
discover
52 805,867 NA PRISMA + CHARMS
Teo Y 2021 [207] Predicting Clinical Outcomes in
Acute Ischemic Stroke Patients
Undergoing Endovascular
Thrombectomy with Machine
Learning
Pubmed 4 NA predict 50 387 NA PRISMA
Tewarie I 2021 [208] Survival prediction of
glioblastoma patients – are we
there yet?
A systematic review of prognostic
modeling for glioblastoma
and its clinical potential
Embase, Medline Ovid
(PubMed), Web of science, Cochrane
CENTRAL, and
Google Scholar
27 PROBAST and CHARMS predict NA NA clinical parameters, genomics, MRI imaging,
combined clinical and genomics, combined
clinical and MRI imaging, combined clinical,
MRI imaging and genomics, histopathology,
combined clinical and pharmacokinetics
PRISMA
Triantafyllidis A 2020
[209]
Computerized decision support
and machine
learning applications for the
prevention and treatment of
childhood obesity: A
systematic review of the literature
PubMed and Scopus 9 NA predict
categorize
NA NA patients questionnaires PRISMA
Ugga L 2021 [210] Meningioma MRI radiomics and
machine learning: systematic
review,
quality score assessment, and
meta-analysis
PubMed, Scopus, and Web of
Science
8 RQS and QUADAS-2 discover NA NA NA PRISMA-DTA
van Kempen E 2021a
[211]
Accuracy of Machine Learning
Algorithms for the
Classification
of Molecular Features of Gliomas
on MRI: A Systematic
Literature Review and Meta-
Analysis
Medline
(accessed through PubMed), EMBASE, and
the Cochrane Library
17 TRIPOD categorize 23 381 Images PRISMA
van Kempen E 2021b
[212]
Performance of machine learning
algorithms for glioma
segmentation of brain MRI:
a systematic literature
review and meta-analysis
Medline
(accessed through PubMed), EMBASE, and
the Cochrane Library
8 TRIPOD predict
discover
11 287 Images PRISMA
Volpe S 2021 [213] Machine Learning for Head and
Neck
Cancer: A Safe Bet? – A Clinically
Oriented Systematic Review for
the
Radiation Oncologist
National Center for Biotechnology
Information PubMed, Elsevier EMBASE and
Elsevier Scopus
48 internal qualitative
checklist
predict
discover
NA NA NA PRISMA
Wang W 2020 [214] A systematic review of ML models
for predicting outcomes of
stroke with structured data
PubMed, Web of Science 18 TRIPOD predict 70 3,184 NA PRISMA
Wesselius F 2021
[215]
Digital biomarkers and algorithms
for detection of atrial
fibrillation using surface
electrocardiograms:
A systematic review
PubMed 130 NA categorize
discover
NA NA NA PRISMA
Wongkoblap A 2017
[216]
Researching Mental Health
Disorders in
the Era of Social Media:
Systematic Review
PubMed, IEEE, ACM Digital Library, Web of
Science, and Scopus
48 NA categorize NA NA patients reported outcomes none stated
Wu J 2021 [217] Performance and Limitation of
Machine Learning Algorithms
for
Diabetic Retinopathy Screening:
Meta-analysis
PubMed and EMBASE 60 QUADAS-2 categorize
discover
NA NA NA PRISMA
Xu L 2021 [218] Prognostic models for
amyotrophic lateral sclerosis:
a systematic
review
Medline, Embase, Web of Science,
and Cochrane library
28 PROBAST predict NA NA NA PRISMA
Yan M 2022 [219] Sepsis prediction, early detection,
and identification using
clinical text for machine learning:
a systematic review
PubMed
and Scopus, ACM DL, dblp, and
IEEE Xplore
9 NA predict
categorize
discover
NA NA NA PRISMA
Yeo M 2021 [220] Review of deep learning
algorithms for the automatic
detection of intracranial
hemorrhages on computed
tomography head imaging
MEDLINE and arXiv databases 11 NA Categorize 246 313,318 Images PRISMA
Yin J 2021 [221] Role of Artificial Intelligence
Applications in Real-Life
Clinical
Practice: Systematic Review
PubMed, Embase, Cochrane Central, and
CINAHL
51 NA Discover NA NA NA PRISMA
Yu K 2020 [222] Machine Learning Applications in
the Evaluation and
Management of Psoriasis:
A Systematic Review
MEDLINE, Google Scholar, ACM Digital
Library, IEEE Xplore
33 NA predict
categorize
NA NA imaging, lab tests, clinical data, patient journals,
literature databases, registries
PRISMA
Zakhem G 2020 [223] Characterizing the role of
dermatologists in developing
artificial intelligence for
assessment of skin cancer:
A systematic review
PubMed 51 NA Categorize NA NA Imaging none stated
Zaunseder E 2022
[224]
Opportunities and challenges in
machine learning-based
newborn screening-A
systematic literature review
ScienceDirect, IEEE, ACM, Sage, and
PubMed
17 NA categorize NA NA Physiological data PRISMA
Zeiser F 2021 [225] Breast cancer intelligent analysis
of histopathological data:
A systematic review
ACM Digital Library, IEEE Xplore Library,
PubMed, Google Scholar, ScienceDirect,
Scopus, SpringerLink, and Web of
Science
53 h5-index, SJR-index, CORE-
index
Discover 58 8,158 Images PRISMA
Zhang L 2021 [226] Development of artificial
intelligence in
epicardial and pericoronary
adipose tissue
imaging: a systematic review
PubMed and the Web of Science 19 NA predict
discover
NA NA Images PRISMA
Zhao Y 2021 [227] Social Determinants in Machine
Learning
Cardiovascular Disease Prediction
Models: A
Systematic Review
PubMed, Embase, Web of Science,
IEEE Xplore, and ACM Digital Library
48 NA Predict NA NA not reported PRISMA
Zheng Q 2021 [228] Artificial intelligence performance
in detecting tumor metastasis
from
medical radiology imaging:
A systematic review and meta-
analysis
PubMed and Web of Science 34 NA categorize
discover
NA NA Images PRISMA
Zheng Y 2022 [229] Identifying Patients With
Hypoglycemia Using Natural
Language Processing:
Systematic Literature Review
PubMed, Web of Science Core Collection,
CINAHL (EBSCO),
PsycINFO (Ovid), IEEE Xplore, Google
Scholar, and ACL Anthology
8 NA discover NA NA Physiological data NA
Zhou Y 2022 [230] Machine learning predictive
models for acute pancreatitis:
A
systematic review
PubMed, Web of Science, Scopus, and
Embase
24 NA Predict NA NA Images PRISMA
Zhu T 2020 [231] Deep Learning for Diabetes:
A Systematic Review
PubMed, DBLP Computer Science
Bibliography, and IEEEXplore
40 NA categorize
discover
37 199,116 imaging, EHR, lab tests (no details provided)
TOTAL 10,963
NA – not applicable.
Table 2. Review of sources of data used across included studies.
Chapter of ICD-10; Name of ICD-10 chapter; No of SLRs; Sources of data used: Imaging; Laboratory, testing; Clinical notes; Patient-reported data; Epidemiology & demographics; Other (mobile applications, genetic data etc.)
I Certain infectious and parasitic diseases 9 3 2 2 1 4
II Neoplasms 44 37 5 8 3 3 mobile applications, genetic data etc.
III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism 2 1 1 1
IV Endocrine, nutritional and metabolic diseases 7 2 4 2 1 3 mobile applications
V Mental and behavioral disorders 11 5 1 2 1 2 genetic data, voice data, text data, checklists and questionnaires, bio signals, social media
VI Diseases of the nervous system 30 14 8 9 8 3 mobile applications, genetic data, wearable data
VII Diseases of the eye and adnexa 3 3
VIII Diseases of the ear and mastoid process 1
IX Diseases of the circulatory system 18 5 2 2 1 2 mobile applications, genetic data, sound recording, wearable data, ECG
X Diseases of the respiratory system 21 12 5 7 3 5 mobile applications, genetic data
XI Diseases of the digestive system 10 9 3 2 2
XII Diseases of the skin and subcutaneous tissue 4 2 1 1 2
XIII Diseases of the musculoskeletal system and connective tissue 15 8 4 6 5 1 mobile applications, genetic data, muscle measurements
XIV Diseases of the genitourinary system 5 1 2 1
XV Pregnancy, childbirth and the puerperium 6 1 1 2 2
XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified 2 1 1
XIX Injury, poisoning and certain other consequences of external causes 0 3 3 4 4
XXI Factors influencing health status and contact with health services 19 7 5 5 8 3 genetic data etc.
Total 214* 112 48 53 42 28
*Boonstra 2022: emergency (whatever condition); Hoekstra 2022 [82]: medical events (whatever condition); Kennedy 2021 [103]: acute care discharge (whatever
condition); Kim 2022 [105]: adverse event reactions (whatever condition or drug); Paganelli 2022 [161]: just health monitoring and Sardar 2022 [183]: ‘Human
Status Detection’ (emotions) NOT COUNTED HERE.
Figure 2. Number of SLRs included in this review published over the years.
Table 3. Number of studies included in reviewed SLRs that reported accuracy, sensitivity and specificity.
No of studies that reported
Lead author & year No of studies Accuracy Sensitivity Specificity
Abu Bakar 2021 [11] 11 10 NA NA
Adamidi E 2021 [12] 101 48 51 47
Adeoye J 2021 [13] 27 19 19 19
Akazawa M 2021 [15] 71 41 25 23
Al Hinai G 2021 [16] 12 10 11 11
Alabi R 2021 [17] 41 28 23 26
Albahri A 2020 [18] 8 6 0 0
Alballa N 2021 [19] 52 11 23 22
Alharbi B 2022 [20] 15 8 3 3
Alhasan M 2021 [21] 20 20 5 5
Alsolai H [22] 27 27 NA NA
Anteby R 2021 [23] 16 8 NA NA
Bang CS 2021a [24] 33 33 33 33
Bang CS 2021b [25] 21 21 21 21
Bazoukis G 2021 [27] 122 48 35 31
Bedrikovetski S 2021a [28] 17 11 9 9
Bedrikovetski S 2021b [29] 17 11 10 10
Benoit J 2020 [30] 16 7 7 8
Bernert RA 2020 [31] 87 54 52 50
Binvignat M 2022 [33] 46 NA NA NA
Boyd C 2021 [35] 47 29 22 15
Bracher-Smith M 2021 [36] 13 7 5 5
Buchlak Q 2020 [37] 70 NA NA NA
Buisson M 2021 [38] 6 3 2 2
Cabitza F 2018 [39] 70 70 70 70
Castaldo R 2021 [40] 29 NA 7 7
Cavus N 2021 [41] 22 NA NA NA
Celtikci E 2018 [42] 51 1 4 4
Chee M 2021 [44] 14 11 0 0
Chiesa-Estomba C 2021 [45] 8 0 8 0
Cho S 2021 [46] 12 3 3 3
Choudhury A 2020a [47] 35 25 19 16
Choudhury A 2020b [48] 35 21 23 14
da Silva Neto S 2022 [49] 15 11 NA NA
Dallora A 2017 [50] 37 NA NA NA
Dallora A 2019 [51] 26 4 0 0
Daniel 2023 [54] 39 17 6 NA
D’Antoni F 2021 [53] 76 76 NA NA
Das PK 2022 [54] 29 29 24 23
de Bardeci M 2021 [56] 30 29 30 30
Decharatanachart P 2021 [57] 19 19 19 19
DelSole E 2021 [59] 44 14 9 8
Dogan O 2021 [60] 264 127 177 166
Dudchenko A 2020 [61] 27 14 9 9
Ebrahimighahnavieh A 2020 [63] 114 43 0 0
El-Daw S 2021 [64] 8 4 7 6
Falconer N 2021 [65] 8 7 5 5
Farook T 2021 [66] 34 21 16 12
Fernandes F 2021 [67] 18 9 9 7
Fregoso-Aparicio L 2021 [68] 90 63 NA 23
Frondelius T 2022 [69] 20 20 NA NA
Fusco R 2021 [70] 23 16 11 10
Garrow C 2021 [71] 35 23 18 15
Ghaderzadeh M 2019 [72] 37 35 0 0
Gutiérrez-Tobal G 2021 [74] 19 17 NA NA
Hasan N 2021 [77] 51 38 NA NA
Hassan N 2021 [78] 17 16 NA NA
Henn J 2021 [79] 47 42 NA NA
Hickman S 2021 [80] 14 12 NA NA
Hinterwimmer F 2021 [81] 19 14 9 6
Hoekstra O 2022 [82] 35 9 17 8
Hoodbhoy Z 2021 [83] 363 272 NA NA
Hosni M 2019 [84] 193 185 NA NA
Hoyos W 2021 [85] 64 58 NA NA
Huang J 2021 [86] 35 29 NA NA
Huang S 2021 [87] 30 24 NA NA
Huang Z 2021 [88] 28 27 NA NA
Ibrahim B 2021 [89] 36 34 NA NA
Infante T 2021 [90] 60 53 NA NA
Irgang L 2023 [91] 59 NA NA NA
Islam MN 2022 [92] 26 NA NA NA
Jiang M 2021 [94] 32 11 NA NA
Kalhori SR 2021 [96] 62 56 NA NA
Kausch S 2021 [100] 14 14 14 14
Kawamoto A 2022 [101] 27 8 3 2
Kedra J 2019 [102] 110 58 39 32
Kennedy EE 2022 [103] 28 3 8 10
Kim HR 2022 [239] 72 NA NA NA
Kim S 2021 [106] 17 16 NA NA
Kodama S 2021 [107] 33 30 NA NA
Komolafe T 2021 [108] 19 NA 16 15
Kozikowski M 2021 [110] 8 5 NA NA
Kumar S 2021 [111] 64 36 NA NA
Kumar Y 2021 [112] 185 94 135 130
Le Glaz A 2021 [116] 58 24 NA NA
Lecointre L 2021 [117] 17 3 NA NA
Lequertier V 2021 [118] 74 67 NA NA
Li M 2021 [120] 252 239 NA NA
Li Y 2021 [121] 22 18 16 15
Librenza-Garcia D 2017 [122] 51 48 NA NA
Lima CLD 2022 [123] 139 NA NA NA
Locquet M 2021 [124] 13 10 NA NA
Lubelski D 2021 [126] 31 22 NA NA
Mangold C 2021 [128] 11 9 4 4
Mari 2022 [129] 44 42 15 15
Matsangidou M 2021 [130] 26 3 3 3
Mawdslay E 2021 [131] 9 4 6 6
Medic G 2019 [132] 20 17 9 8
Mellia J 2021 [134] 29 24 13 13
Miltiadous A 2023 [135] 190 120 15 NA
Minissi M 2021 [136] 11 1 0 0
Miranda L 2021 [137] 20 6 15 12
Mirzania D 2021 [138] 29 19 23 22
Moezzi M 2021 [139] 36 9 4 2
Moglia A 2021 [140] 35 23 35 31
Mohan B 2021 [141] 9 5 1 4
Mondal M 2021 [142] 52 27 30 33
Moor M 2021 [144] 22 9 2 2
Moshawrab M 2023 [145] 87 69 16 13
Moura F 2021 [146] 46 11 8 NA
Musa N 2022 [149] 150 147 52 70
Nadarajah R 2021 [151] 11 4 6 6
Naemi A 2021 [152] 15 15 15 15
Nafea MS 2022 [153] 91 10 101 9
Nasser M 2023 [154] 98 42 18 14
Nazarian S 2021 [155] 48 42 30 30
Ogink P 2021a [157] 33 17 NA NA
Ogink P 2021b [158] 59 59 59 59
Ortiz-Barrios M 2021 [159] 65 11 16 15
Ossai C 2021 [160] 26 20 16 15
Paganelli AI 2022 [240] 36 20 10 10
Pahwa B 2021 [162] 15 3 2 3
Patil S 2019 [163] 7 5 0 0
Peralta M 2021 [164] 73 42 28 19
Persad E 2021 [165] 36 1 10 10
Popa S 2021 [166] 37 5 5 NA
Prasoppokakorn T 2021 [167] 8 NA 8 8
Quaak M 2021 [168] 44 43 29 29
Ramesh S 2021 [170] 42 7 3 3
Ramos-Lima L 2020 [171] 49 28 14 14
Ravegnini G 2021 [172] 6 1 1 1
Ren M 2021 [173] 11 9 NA NA
Rice P 2021 [174] 8 6 8 8
Sajjadian M 2021 [177] 54 but only 8 adequate-quality papers 30 NA NA
Salas-Zárate R 2022 [178] 34 1 NA NA
Salem H 2021 [179] 138 69 NA NA
Sanmarchi F 2023 [180] 68 30 29 34
Sardar SK 2022 [183] 76 NA NA NA
Scardoni A 2020 [184] 27 10 15 16
Segato A 2020 [185] 154 102 57 54
Senanayake S 2019 [186] 18 6 12 10
Senders J 2018 [187] 23 11 7 6
Shillan D 2019 [189] 258 168 168 168
Siddiqui S 2022 [192] 45 33 21 24
publications confirmed the superiority (inferiority) of ML over
clinicians and three did not indicate any results. The median
number of clinical experts included in the validation was six
(range 1–511).
The methodological approach to missing data was
discussed in 144 studies, the most common being imputation
(Table 5).
In total, over 10,000 ML algorithms were used in the studies
covered by the included SLRs (Table 6). The most common modeling
approach was neural networks (2,454 ML algorithms), followed by
SVM and RF/decision trees (1,578 and 1,522 ML algorithms,
respectively).
4. Discussion
To the best of our knowledge, this is the first attempt to
systematically study the integration of ML algorithms in healthcare.
The number of identified SLRs and AI algorithms, along with the
coverage of disease areas, demonstrates the level of interest and
effort dedicated to the application of ML in medical settings.
Key findings were:
• The relatively high frequency of reported ML applications in
oncology and neurology
• The reported high level of ML predictive ability in many
disease areas
• How the disease area affects the type of ML algorithms and
data sources used.
4.1. Disease areas of application and ML ability
The frequent occurrence of oncology and neurology (Table 2)
as the disease areas of the identified SLRs likely reflects the
prevalence of such diseases and their impact on patients’ lives.
A key finding of this review was the low reporting
quality of publications on the development and
adaptation of ML algorithms in clinical practice. A substantial
share of studies provided no data on accuracy (44%),
sensitivity (72%), or specificity (75%), or on internal
(65%) and external (99%) validation. Additionally, only 144
studies (2% of the total, consistent with the count given in the
Results) reported a methodological approach to
handling missing data.
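The shares of studies lacking performance metrics can be checked against the "Total" row of Table 3. The short Python sketch below is only an illustration of that arithmetic: the counts are copied from Table 3, while the function and variable names are hypothetical and not part of the review.

```python
# Column totals copied from the "Total" row of Table 3.
TOTAL_STUDIES = 7752
REPORTED = {"accuracy": 4311, "sensitivity": 2194, "specificity": 1964}


def reporting_shares(reported, total):
    """Return {metric: (percent reporting, percent not reporting)}.

    Percentages are rounded to whole numbers, as in the text.
    """
    shares = {}
    for metric, count in reported.items():
        pct = round(100 * count / total)
        shares[metric] = (pct, 100 - pct)
    return shares


SHARES = reporting_shares(REPORTED, TOTAL_STUDIES)
for metric, (reported_pct, missing_pct) in SHARES.items():
    print(f"{metric}: reported in {reported_pct}% of studies, "
          f"missing in {missing_pct}%")
```

Under these totals, the rounded shares of studies without accuracy, sensitivity, and specificity come out at 44%, 72%, and 75%, matching the figures quoted above.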
The most commonly used type of data was radiological
imaging, used to develop ML solutions for
clinical prediction and categorization as well as disease
prognosis in the fields of oncology and neurology.
4.2. Type of ML algorithms
The most frequently published type of ML was the
artificial neural network (ANN). Neural networks try to replicate
how neurons work: information is provided to each unit and processed
by an activation function. These methods can achieve
high accuracy but tend to be time-consuming. ANNs can
detect complex nonlinear relationships and interactions
between the dependent and independent variables (they are universal
approximators). Deep learning (DL) methods are primarily
used in oncological or respiratory disease studies. An increasing
use of DL was observed during the COVID-19 pandemic.
A systematic literature review of 34 studies indicated
that ML could enhance the sensitivity and specificity of
radiographic image assessment compared with radiologists’ diagnoses.
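To make the role of activation functions concrete, the following minimal Python sketch shows a single artificial neuron and a two-layer toy network. It is an illustration only: the sigmoid activation, the weights, and all function names are assumptions chosen for the example and are not taken from any of the reviewed studies.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def tiny_network(inputs):
    """Two hidden neurons feeding one output neuron (hypothetical weights)."""
    hidden = [
        neuron(inputs, [1.0, -1.0], 0.0),
        neuron(inputs, [-1.0, 1.0], 0.0),
    ]
    return neuron(hidden, [2.0, 2.0], -2.0)


print(tiny_network([1.0, 0.0]))
```

Stacking such nonlinear units is what lets a network represent nonlinear relationships between inputs and outputs; in practice the weights are learned from data rather than fixed by hand as here.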
Our review indicated that, after neural networks, SVM
was the most frequently used approach. SVMs
use hyperplanes to separate data; they can achieve high accuracy
but are generally slow to train. The highly similar
performance of SVM, particularly in terms of classification accuracy,
Table 3. (Continued).
No of studies that reported
Lead author & year No of studies Accuracy Sensitivity Specificity
Soffer S 2021 [194] 5 NA 4 3
Song D 2021 [195] 44 12 12 12
Sorin V 2020 [197] 10 5 1 1
Srivani M 2022 [198] 10 5 NA NA
Stephens M 2022 [199] 35 9 12 9
Stewart J 2021 [200] 23 2 7 7
Stokes K 2022 [201] 16 8 8 7
Subramanian H 2021 [202] 12 9 5 5
Syeda H 2021 [203] 130 NA NA NA
Syer T 2021 [204] 27 NA 16 14
Tabatabaei M 2021 [205] 18 NA 15 10
Tewarie I 2021 [208] 27 5 NA NA
Triantafyllidis A 2020 [209] 9 4 NA NA
Ugga L 2021 [210] 8 3 1 1
van Kempen E 2021a [211] 17 NA 11 11
van Kempen E 2021b [212] 17 15 12 12
Wang W 2020 [214] 18 9 8 7
Wongkoblap A 2017 [216] 48 NA NA NA
Yeo M 2021 [220] 11 5 10 8
Yu K 2020 [222] 33 14 3 2
Zakhem G 2020 [223] 51 NA NA NA
Zeiser F 2021 [225] 53 35 6 4
Zheng Y 2022 [229] 8 NA 5 NA
Zhu T 2020 [231] 40 12 32 14
Total 7,752 4,311 2,194 1,964
NA – not applicable.
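For readers less familiar with the three metrics counted in Table 3, the sketch below shows how accuracy, sensitivity, and specificity are conventionally derived from a binary confusion matrix. The function name and the example counts are hypothetical and serve only as an illustration; they are not data from the reviewed SLRs.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: diseased cases classified correctly/incorrectly.
    tn/fp: non-diseased cases classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # share of diseased cases detected
        "specificity": tn / (tn + fp),      # share of non-diseased cases ruled out
    }


# Hypothetical example: 100 diseased and 100 non-diseased patients.
METRICS = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
print(METRICS)
```

Because accuracy alone can hide poor detection of the rarer class, a study that reports accuracy without sensitivity and specificity gives an incomplete picture of performance.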
Table 4. Reviews with comparison of ML vs. clinical experts.
Columns (in order): Lead author & year; How many studies provided methodological approach to comparison; Lead author & year (primary study); Clinical experts; Total number of clinical experts [N]; Number of clinical experts with specialization [n]; Experience [years, junior-senior]; Other features of experience; Blinded; Clinic center; Results
Adeoye J 2021
[13]
2 out of 27 Ariji Y 2019
[241]
Radiologists N = 2 NA NA NA NA NA Comparable
performance
Ariji Y 2020
[242]
Radiologists N = 4 NA >20 years of experience (n = 1)
> 10 years of experience (n = 3)
NA NA Single (national) Superiority of the
ML
Al Hinai
G 2021 [16]
2 out of 12 Kwon JM
2020 [243]
Cardiologists N = 4 NA Over 10 years in clinical practice Y: Board-certified practicing
cardiologists
NA Single (national) Superiority of the
ML
Nakajima
K 2017
[244]
Nuclear cardiology
experts
N = 8 Nuclear cardiology experts
reconfirmed the
appropriateness of the
judgment without
clinical information (n
= 6)
experts interpreted the
data during this process
and modified the
judgment with referral
to other expert
opinions to reach
consensus (n = 2).
NA NA NA NA Comparable
performance
Buisson
M 2021
[38]
6 out of 6 Al-Aswad LA
2019 [245]
Ophthalmologists
Members in training
N = 6 Glaucoma fellows (n = 2)
Glaucoma specialist (n = 1)
Neuroophthalmologist (n
= 1)
Members in
training: second-year
and third-year resident
(n = 2)
NA NA NA Superiority of the
ML
Liu S 2018
[246]
Glaucoma specialists
General
ophthalmologists
N = 18 Glaucoma specialists (n =
11)
General ophthalmologists
(n = 7)
NA NA NA Multi
(international)
Superiority of the
ML
Phene S 2019
[247]
Ophthalmologists
Glaucoma specialists
N = 12 glaucoma
specialists
N = 6
ophthalmologists
N = 4 glaucoma
specialists
(Validation Dataset
A, B, C)
NA NA NA NA NA Superiority of the
ML
Shibata
N 2018
[248]
Residents in
Ophthalmology
N = 3 First year in
Ophthalmology
residency (n = 1)
Third year in
Ophthalmology
residency (n = 1)
Fourth year in
Ophthalmology
residency (n = 1)
NA NA NA NA Superiority of the
ML
Seo E 2018
[249]
Glaucoma specialists N = 2 NA NA NA NA NA Comparable
performance
Jammal AA
2019 [250]
Glaucoma specialists N = 2 NA NA NA NA NA Comparable
performance
Haggenmuller
S 2021 [75]
19 out of 19 Brinker TJ
2019 [251]
Dermatologists N = 157 NA Junior physicians (n = 88)
Attendings (n = 15)
Senior physicians (n = 45)
Chief physicians (n = 3)
Y: University hospital (n = 151)
Resident physicians (n = 6)
NA Multi (national) Superiority of the
ML
(CNN
outperformed
136 out of 157
dermatologists)
Brinker TJ
2019 [252]
Dermatologists N = 144 NA Junior clinicians (n = 92)
Board-certified dermatologists (n = 52)
NA NA Multi (national) Superiority of the
ML
Yu JS 2018
[253]
Dermatologists and
non-experts
N = 4 General physicians (n = 2)
Dermatologists (n = 2)
Expert group: five or more years of clinical
experience in dermoscopy
Non-expert group: non-trained general physicians.
NA NA NA Comparable
performance
Marchetti MA
2018 [254]
Dermatologists N = 8 NA The mean (range) number of years of
postresidency clinical experience was 13 years
(range, 3–31 years).
The mean (range) number of years of the use of
dermoscopy among readers was 13.5 years
(range, 6–27 years).
All had a primary clinical focus on skin cancer.
NA Y: blinded to diagnosis and
clinical images/metadata
Multi
(international)
Superiority of the
ML
Marchetti MA
2020 [255]
Dermatologists
Dermatology
residents
N = 17 Dermatologists (n = 8)
Dermatology residents (n
= 9)
The mean (range) number of years of post-
residency clinical experience and use of
dermoscopy of the dermatologists was 14 (4–
32) and 14.5 (7–28) years, respectively.
NA Y: blinded to diagnosis, clinical
images, and metadata
Multi
(international)
Superiority of the
ML
Haenssle HA
2018 [256]
Dermatologists N = 58 NA Beginners: <2 years of experience (n = 17)
Skilled: 2–5 years of experience (n = 11)
Expert: > 5 years of experience (n = 30)
(self-reported levels of experience with
dermoscopy)
NA NA Multi
(international)
Superiority of the
ML
Haenssle HA
2020 [257]
Dermatologists N = 96 NA Beginners: <2 years of experience (n = 17)
Skilled: 2–5 years of experience (n = 29)
Expert: > 5 years of experience (n = 40)
(self-reported levels of experience with
dermoscopy)
NA Y: blinded of authors to true
diagnoses
NA Comparable
performance
Haenssle 2021
[258]
Dermatologists N = 64 NA Beginners: <2 years of experience (n = 9)
Skilled: 2–5 years of experience (n = 20)
Expert: > 5 years of experience (n = 30)
No information provided (n = 5)
NA Y: blinding of authors to true
diagnoses
NA Superiority of the
ML
Tschandl
P 2019
[259]
Dermatologists
Dermatology
residents
General practitioners
N = 511 Board-certified
dermatologists (n =
283)
Dermatology residents (n
= 118)
General practitioners (n =
83)
No information provided
(n = 27)
<1 year experience (n = 80)
>1 year experience (n = 99)
>3 years experience (n = 194)
>5 years experience (n = 111)
Experts: >10 years experience (n = 27)
NA NA Multi
(international)
Superiority of the
ML
Maron RC
2019 [260]
Dermatologists N = 112 NA Resident physicians (n = 4)
Chief physicians (n = 1)
Senior physicians (n = 28)
Attendings (n = 12)
Junior physicians (n = 67)
Y: The median years of
dermatologic practice was 4
years,
47 (42.0%) dermatologists
possessing less than 3 years of
experience with dermoscopic
examinations
37 (33.0%) between three and 10
years with dermoscopic
examinations
and 28 (25.0%) with more than
10 years with dermoscopic
examinations
NA Multi (national) Superiority of the
ML
Tschandl
P 2019
[261]
Dermatologists
Dermatology
residents
Others
N = 95 Dermatology Specialist (n
= 62)
Dermatology residents (n
= 12)
General practitioners (n =
17)
Medical Student (n = 1)
Nurse (n = 2)
Oncologist (n = 2)
Beginner raters: <3 years (n = 31)
Intermediate raters: 3–10 years (n = 28)
Expert raters: >10 year (n = 36)
51.6% female mean age, 43.4
years (95% CI, 41.0–45.7
years);
board-certified dermatologists.
NA Multi
(international)
Comparable
performance
Fujisawa 2018
[262]
Dermatologists N = 22 Dermatologic trainees (n =
9)
Board-certified (n = 13)
NA NA NA NA Superiority of the
ML
Jinnai S 2020
[263]
Dermatologists N = 20 Dermatologic trainees (n =
10)
Board-certified (n = 10)
NA NA NA NA Superiority of the
ML
Han SS 2020
[264]
Dermatologists N = 47 (Binary)
N = 4 (Multiclass)
Board-certified (n = 21)
Dermatology residents (n
= 26)
Dermatologists (n = 4)
Board-certified (n = 2)
Dermatology residents (n
= 2)
NA NA Y: all clinical information NA Non-inferiority
(Binary)
Comparable
performance
(Multiclass)
Han SS 2018
[265]
Dermatologists
board members
N = 16 Clinicians (>10 years of
experience) (n = 6)
Professors (n = 10)
NA NA NA NA Non-inferiority
Brinker TJ
2019 [266]
Dermatologists N = 145 Junior clinicians (n = 88)
Attendings (n = 16)
Senior clinicians (n = 35)
Chief clinicians (n = 3)
Y: Fig. 1, Fig. 2. Y: University hospital (n = 142)
Resident physicians (n = 3)
NA Multi (national) Non-inferiority
Han SS 2020
[267]
Dermatologists N = 44 Attending clinicians (n =
65)
Board-certified
dermatologists (n = 44)
Y:
Dermatologists: 5.7 ± 5.2 (mean ± SD) years of
experience after board certification
Attending physicians: 7.1 ± 9.5 (mean ± SD) years
of experiences after board certification at the
time of biopsy request.
NA NA NA Non-inferiority
(Binary)
Significant
superiority of
the ML
(Multiclass)
Hekler A 2019
[252]
Pathologists N = 11 NA Y:
Junior physicians: with less than 3 years of practical
experience (n = 3)
Board-certified pathologists: more than 4 years of
practical experience (n = 8)
NA NA Multi (national) Superiority of the
ML
Brinker TJ
2022 [268]
Dermatopathologists N = 18 NA Each dermatopathologist with at least 5 years of
experience
NA NA Multi
(international)
Non-inferiority
Henn J 2021
[79]
2 out of 47 Brennan
M 2019
[269]
Physicians N = 20 Surgical intensivists
attending physicians (n
= 14)
Residents – trainees in
anesthesiology and
surgical fellowships (n
= 6)
Y: with an average of 13 years of experience. NA Y: blinded to the observed
outcomes of the cases.
Single (national) Superiority of the
ML
Kambakamba
P 2020
[270]
Radiologists N = 2 NA Y: experienced in cross-sectional image
interpretation
NA Y: blinded to the clinical and
histopathologic information
NA No results of direct
comparison
with experts
Hickman
S 2021 [80]
12 out of 14 Yala A 2019
[271]
Radiologists
(fellowship-
trained or
equivalent breast
imaging
radiologists)
N = 23 NA Y: 1–31 years of experience NA NA Single (national) Comparable
performance
McKinney SM
2020 [272]
Radiologists N = 57 UK reader study (n = 51)
USA reader study (n = 6)
Y: UK reader study: years of experience (12; 7; 4; 12;
15; 10), reads per year (2,000–5,500), Fellowship
trained: Yes by 2
USA reader study: years of experience (4 × 5–10;
5 × 10–15; 4 × 15–20; 4 × 20 + 5; 33 ×
unknown), reads per year (3,000–8,000+),
Fellowship trained: 8 × Consultant Radiologist;
6 × Consultant Radiographer; 4 × Advanced
Practitioner Radiographer; 33 × unknown).
Y: US-board-certified radiologists
who were compliant with the
requirements of the
Mammography Quality
Standards Act (MQSA)
Y: None of the readers who
interpreted the images (either
in the course of clinical
practice or in the context of
the reader study) had
knowledge of any aspect of
the AI system.
Double (UK) and
Single (US)
Comparable
performance
Balta C 2020
[273]
Radiologists N = 6 NA NA NA NA Double (national) Comparable
performance
Dembrower
K 2020
[274]
Radiologists N = 2 NA NA NA NA Double (national) Comparable
performance
Kyono T 2019
[275]
Radiologists N = 30 NA Y: at least 2 years of experience reading 5,000
mammograms or more per year
NA NA Single (national) Comparable
performance
Rodriguez-
Ruiz
A 2019
[276]
Radiologists N = 101 Wallis 2012 (n = 14)
Visser 2012 (n = 6)
Hupse 2013 (n = 9)
Gennaro 2013 (n = 6)
Siemens Medical Solutions
2015 (n = 22)
Siemens Medical Solutions
2015 (n = 31)
Garayoa 2018 (n = 3)
Rodriguez-Ruiz 2018 (n =
6)
Clauser 2018 (n = 4)
Wallis 2012 (3–525; avg. −10)
Visser 2012 (1–34)
Hupse 2013 (1–24; avg. −14)
Gennaro 2013 (5–30)
Siemens Medical Solutions 2015 (>5)
Siemens Medical Solutions 2015 (>5)
Garayoa 2018 (10–20)
Rodriguez-Ruiz 2018 (3–44 (avg. −22)
Clauser 2018 (>5)
NA NA Multi
(international)
Comparable
performance
Geras K 2018
[277]
Radiologists N = 4 NA experienced in reading breast cancer screening
exams
NA NA Single (national) Comparable
performance
Lotter W 2019
[278]
Radiologists N = 5 NA Y: practiced for an average of 5.6 years post-
fellowship (range 2–12 years); the readers read
an average of 6,969 mammograms over
the year preceding the reader study (range of
2,921–9,260)
Y: Board-certified and MQSA-
qualified, fellowship-trained in
breast imaging
Y: blinded to the ground truth of
the cases
Single (national) Superiority of the
ML
Rodriguez
Ruiz
A 2019
[279]
Radiologists N = 14 general radiologists (n = 3)
dedicated breast
radiologists (n = 11)
Y: The median experience with MQSA qualification
was 9.5 years (range, 3–25 years)
mean number of mammograms read per year
during the past 2 years was 5,900 (range,
1,200–10,000)
Y: Mammography Quality
Standard Act – qualified
radiologists
Y: blinded to any information
about the patient, including
previous radiology and
histopathology reports.
Single (national) Comparable
performance
Schaffter
T 2020
[280]
Radiologists NA single radiologist
consensus radiologist
NA NA NA Multi
(international)
Inferiority
Salim M 2020
[281]
Radiologists N = 45 first-reader (n = 25)
second-reader (n = 20)
NA NA NA Double (national) Comparable
performance
Kim HE 2020
[267]
Radiologists N = 14 breast specialists (n = 7)
general radiologists
(n = 7)
NA Y: Board-certified radiologists,
general radiologists had not
been specifically trained in
breast imaging; breast
specialists had been trained in
breast imaging for at least 6
months
Y: observer-blinded study Multi (national) Superiority of the
ML
Jones O 2020
[95]
1 out of 16 Cowley JB
2013 [282]
Specialists (primary
care physicians,
surgeons)
N = 4 practicing primary care
physicians (n = 2)
post CCT Colorectal
surgeons (n = 2)
NA NA NA Single (national) Superiority of the
ML
Kassem
M 2021
[99]
1 out of 49 Brinker TJ
2019 [252]
Dermatologists N = 157 NA Junior physicians (n = 88)
Attendings (n = 15)
Senior physicians (n = 45)
Chief physicians (n = 3)
Y: University hospital (n = 151)
Resident physicians (n = 6)
NA Multi (national) Superiority of the
ML
(CNN
outperformed
136 out of 157
dermatologists)
Kuntz S 2021
[113]
6 out of 16 Wei JW 2020
[283]
Gastrointestinal
pathologists
N = 5 NA Y: 3 with gastrointestinal pathology fellowship
training and 2 who gained gastrointestinal
pathology expertise through years of
gastrointestinal pathology service.
NA NA Single (national) Superiority of the
ML
Song Z 2020
[284]
Pathologists N = 5 NA NA NA NA Double (national) Comparable
performance
Bychkov
D 2018
[285]
Pathologists N = 3 NA NA NA Y: blinded to patient outcome
and the tissue cores were
chosen
NA Superiority of the
ML
Geessink
O 2019
[286]
Pathologists N = 2 NA Y: > 10 years of experience with TSR scoring NA NA Double (national) Comparable
performance
Kather JN
2019 [287]
Pathologists NA NA NA NA Y: blinded to all other
clinicopathological variables,
outcome data, or gene
expression data
NA No results of direct
comparison
with experts
Zhao K 2020
[288]
Pathologists N = 2 NA NA NA Y: blinded to patient clinical
information and outcome
Double (national) Comparable
performance
Nguyen
A 2018
[156]
4 out of 8 Suh HB 2018
[289]
Neuroradiologists N = 5 Review of the images (n =
3)
Draw the region of interest
(ROIs) (n = 2)
Y: Review: 5, 5 and 6 years’ experience,
respectively, in radiology
ROIs: 5, 9 years’ experience in radiology.
NA Y: blinded to clinical information Single (national) Superiority of the
ML
Kang D 2018
[290]
Neuroradiologist N = 2 External validation (n = 2)
Tumor segmentation (n =
1)
Y:
external validation: 5 years and 20 years of
experience
tumor segmentation: 4 years of experience in
neuro-oncological imaging
NA NA NA Superiority of the
ML
Alcaide-Leon
2017 [291]
Neuroradiologists N = 3 NA Y: with 3, 2, and 4 years of experience in
neuroradiology after residency
NA Y: blinded to clinical information
and pathology reports.
Multi (national) Non-inferiority
Yamashita
K 2008
[292]
Neuroradiologists
Residents
N = 11 Review of the images (n =
2)
Observer test:
neuroradiologists,
residents (n = 9)
13, 10, 8, 8, 6, 5, 3, 3, and 3 years of experience in
radiology practice. The first 3 were attending
neuroradiologists.
Y: The radiologists with 6 or more
years of experience were
board certified in Japan, and
the other radiologists,
including residents, had not
yet received board
certification.
NA NA Comparable
performance
Ossai C 2021
[160]
3 out of 26 Rehm GB
2018 [293]
Intensive care unit
physicians
N = 2 NA NA NA NA Single (national) Comparable
performance
Bakkes TH
2020 [294]
Expert N = 1 NA NA NA NA Single (national) Inferiority
Mulqueeny
Q 2009
[295]
Expert N = 1 NA NA NA Y: blinded to the clinical data of
the patient and was not
involved in their care
Single (national) Comparable
performance
Quaak M 2021
[168]
1 out of 44 Aghdam MA
2019 [296]
Expert (mixture of
CNN experts)
N = 3 NA NA NA NA NA Comparable
performance
Salem H 2021
[179]
14 out of 138 Petrucci
K 1991
[297]
Nurse N = 10 Nurse experts (nurse
practitioners, clinical
specialists) (n = 4)
Nurse evaluators (who
were also experts) (n =
6)
NA Y: 3 out of 4 nurse experts held
masters degrees in nursing.
All of the evaluators held masters
degrees in nursing and one
held a doctoral degree.
Y: blinded to the purpose of the
study
NA Non-inferiority
Gorman
R 1995
[298]
Nurse N = 5 Gerontological nurse
experts (n = 4)
Nurse knowledgeable in
expert systems (n = 1)
NA NA NA NA Superiority of the
ML
Chang P 1999
[299]
Urology attending
physicians
Urology residents
N = 9 Urology attending
physicians (n = 4)
Urology residents (chief
resident n = 1, second-
year urology residents
n = 2, first-year urology
residents n = 2)
NA NA NA Single (national) Superiority of the
ML
Koutsojannis
C 2004
[300]
Urology residents
Expert doctor
N = 4 Urology residents (n = 3)
Expert doctor (n = 1)
NA Y: Expert doctor was the director
of the ‘Andrology Lab’
NA NA Somewhat better
performance
than non-
expert
urologists;
inferiority to
clinical experts
Koutsojannis
C 2009
[301]
Urology doctor for
prostate cancer
N = 1 NA NA NA NA Single (national) Comparable
performance
Altunay
S 2009
[302]
Expert physician N = 1 NA NA NA NA Single (national) Comparable
performance
Koutsojannis
C 2012
[303]
Expert doctor N = 1 NA NA NA NA Single (national) Comparable
performance
Hassanien
A 2011
[304]
Expert (overlap
measure)
N = 1 NA NA NA NA Single (national) No results of direct
comparison
with experts
Torshizi AD
2014 [305]
Expert doctor N = 1 NA NA NA NA Single (national) Comparable
performance
Xiao D 2016
[306]
Radiological expert N = 1 NA NA NA NA Single (national) Comparable
performance
Hurst RE 1997
[307]
Human expert N = 1 NA NA NA NA Single (national) Inferiority
Hao AT 2013
[308]
Expert nurses N = 6 Expert nurses involved in
Delphi method (n = 9)
Expert nurses evaluate the
usability of the systems
(n = 6)
NA NA NA Single (national) Comparable
performance
Lopes MH
2013 [309]
Expert nurses N = 3 NA NA NA NA NA Comparable
performance
Koutsojannis
C 2008
[310]
Multiple experts NA NA NA NA NA NA Comparable
performance
Shillan D 2019
[189]
2 out of 258 primary studies not indicated
Shlobin
N 2021
[191]
1 out of 40 Qiu W 2020
[311]
Expert N = 2 NA NA NA NA Single (national) Comparable
performance
Siddiqui
S 2022
[192]
1 out of 45 Qianqian Ni
2020 [312]
Radiologist N = 3 cardiothoracic resident
radiologists (n = 3)
Y: 6, 5, and 2 years of experiences in chest imaging
interpretation
NA Y: blinded to clinical data and
previous imaging results
Multi Comparable
performance
(expert as gold
standard)
Smets J 2021
[193]
12 out of 89 Tomita
N 2018
[313]
Radiologist N = 1 NA Y: over 18 years of musculoskeletal (MSK) radiology
experience
Y: Board-certified Y: blinded to the original
diagnoses reported in
radiology reports and system’s
results
Single (national) Comparable
performance
Murata
K 2020
[314]
Orthopedic experts N = 53 orthopedic residents (n =
20)
board certified orthopedic
surgeons (n = 24)
board certified spine
surgeons (n = 9)
NA NA NA Single (national) Comparable
performance
Cheng CT
2019 [315]
Experts N = 21 Primary physicians (n = 15)
Radiologists (n = 2)
Orthopedic surgeons (n =
4)
NA NA NA Multi (national) Comparable
performance
Yu JS 2020
[316]
Radiologists N = 6 Musculoskeletal fellow (n
= 1)
Musculoskeletal radiologist
(n = 2)
Diagnostic radiology
residents (n = 3)
NA NA NA Comparable
performance
Yamada
Y 2020
[317]
Orthopedic surgeons N = 4 Orthopedic surgeons (n =
2)
Resident orthopedic
surgeons (n = 2)
Y: 22, 14 years of experience
4, 3 years of experience
Y: Board-certified Y: blinded to clinical information
such as the age of the patient
and the mechanism of injury
Single (national) Superiority of the
ML
Jimenez-
Sanchez
A 2020
[318]
Clinical experts N = 3 Trauma surgeon (n = 1)
Radiologist (n = 1)
Resident trauma surgeon
(under the supervision
of the radiologist) (n =
1)
Y: -, senior, 5 year resident NA NA Single (national) Comparable
performance
Adams
M 2019
[319]
Human experts NA Radiologists
Radiology residents
NA Y: Board-certified NA NA Comparable
performance
Mawatari
T 2020
[320]
Clinical experts N = 7 Radiologists (n = 3)
Orthopedist (n = 1)
Radiology specialist trainee
(n = 1)
General physician (n = 1)
Senior resident (n = 1)
Y: 9, 13, 24 years of experience respectively.
22 years of experience
4 years of experience
4 years of experience
senior
NA NA NA Superiority of the
ML
Urakawa
T 2019
[321]
Orthopedic surgeons N = 5 NA NA Y: 2 board-certified and 3 non-
board-certified surgeons
NA NA Superiority of the
ML
Chung SW
2018 [322]
Human experts N = 58 General physicians (n = 28)
General orthopedists (n =
11)
Orthopedists specialized in
the shoulder (n = 19)
NA NA NA Multi (national) Superiority of the
ML
Olczak J 2017
[323]
Orthopedic surgeons N = 2 NA Y: senior NA Y: blinded to the network’s and
the other reviewer’s labels
NA Comparable
performance
Lindsey
R 2018
[324]
Orthopedic surgeons N = 18 Y: senior, multiple orthopedic hand surgeons with
many years of experience
NA NA Comparable
performance
Volpe S 2021
[213]
1 out of 48 Qazi A 2011
[325]
Physician N = 1 NA NA NA NA Single (national) Comparable
performance
Zheng Y 2022
[229]
1 out of 8 Misra Hebert
2020 [326]
Practicing clinicians NA NA NA NA NA NA Comparable
performance
NA – not applicable.
Table 5. Reviews with missing data (MD) discussed.
Columns (in order): ICD-10 chapter; Lead author & year; Total of studies included; Is MD discussed?; How many studies provided methodological approach to MD; MD methods used
X Bracher-Smith
M 2021 [36]
13 Y: 46% 7 out of 13 (1) Complete-case analysis
(2) Code missingness as category in
predictor
(3) Imputation after excluding high
missingness
(4) Imputation using genetics server/
application,
(5) Imputation in-sample from binomial
distribution
V Dallora A 2017
[50]
37 Y: 5% 1 out of 37 (1) Censoring data using Cox
proportional hazard models (CPHM)
III Falconer
N 2021 [65]
8 Y: 75% 5 out of 8 (1) Imputation by K-nearest neighbor,
(2) Sample and hold interpolation,
(3) Sample-and-hold and if not applic-
able mean imputation.
X Frondelius
T 2022 [54]
20 Y:15% 3 out of 20 (1) Complete-case analysis (patients
with missing data were excluded)
XIII Hinterwimmer
F 2021 [81]
19 Y: Missing necessary details in the included evidence to
repeat the ML model were one of the methodological
deficiencies.
NA NA
I Hoyos W 2021
[85]
64 Y: According to the reviewed articles, none of the papers
used models to manage uncertainty related to data
quality. To solve this problem, following methods were
used: several machine learning alternatives: Bayesian
models and fuzzy approaches; approaches that use
robust estimators to deal with the problem of missing
values and outliers; generation complementary data to
the existing ones.
NA NA
XXI Huang J 2021
[86]
35 Y: Missing data were accounted for in 5 studies (14%) using
imputation.
5 out of 35 (1) Imputation
XXI Kareemi
H 2020 [97]
23 Y: Included studies often did not report on how they
handled continuous variables, missing data, or
complexities in the data such as patient censoring and
competing risks. Patients with missing data were often
excluded (i.e. ‘case – control analysis’), and a description
of which patients had missing data was infrequently
provided.
NA NA
XXI Kausch S 2021
[100]
14 Y: (Two studies, examined robustness against missing
data.)
NA NA
NA Kennedy EE
2021 [103]
28 Y: 54%; Missing data were not mentioned or handled
optimally in the majority of studies. Only 4 studies
imputed missing data (multiple imputation); 13 models
used complete case analysis (11 studies).
15 out of 28 (1) Multiple imputation
(2) Complete case analysis
II Kozikowski
M 2021
[110]
8 Y (Corresponding authors of selected publications were
contacted individually to complete missing values and
verify data extracted in contingency tables)
NA NA
VI Kumar S 2021
[111]
64 Y (Clinical databases which are collected for specific
research purposes are often relatively well-structured,
standardized, and clean, even though they may still have
a few missing values and outlier while raw EHR
(electronic health record) data sources often have data
quality issues and require significant effort for data
preprocessing and feature engineering. However, they
are a rich source of historical clinical data containing
patient-level elements which can be effectively
leveraged using ML-based computational techniques for
longitudinal analyses of their preclinical phase to
identify prognostic clinical phenotypes.)
NA NA
XXI Lequertier
V 2021 [118]
74 Y (Reimputation method was utilized for missing variable
in 18.9% of included studies.)
14 out of 74 (1) Carry over the last observation,
(2) Observed mean as a replacement
value,
(3) Machine learning technique,
(4) Regression estimate.
II Li J 2021 [119] 31 Y (31 studies conducted data preprocessing, among which
20 described missing value information and reported
missing value processing strategies, including deleting
directly, multiple imputation, and nearest neighbor
algorithm.)
20 out of 31 (1) Deleting directly,
(2) Multiple imputation,
(3) Nearest neighbor algorithm.
XV Mangold
C 2021 [128]
11 Y (Handling of missing values was reported in 81.8% of the
articles; the handling of missing data was done by
removal of incomplete cases or simple imputation of
missing values.)
9 out of 11 (1) Removal of incomplete cases,
(2) Imputed missing values by hybrid
ANN-CBR (artificial neural network
research framework) approach,
(3) Removed patients missing > 50% of
the attributes, model uses
a probabilistic algorithm to handle
missing data,
(4) Removed variables with > 50% of
missing data, replaced missing data
with mean value,
(5) Simple imputation of missing data,
(6) Imputation by K-nearest neighbors,
(7) Excluded infants with missing data,
(8) Excluded cases with missing data.
VI Mari 2022
[129]
44 Y: Pain intensity 0 (0%), pain phenotype 1 (7%), treatment
response 1 (14%), only 2 of the studies reported their
justification for the sample size and only around half of
the intensity and phenotyping studies reported the
presence and handling of missing data.
25 out of 44 (1) Primary MD not indicated (ROB
assessment for all studies)
XIX Mawdslay
E 2021 [131]
9 Y: 89%, Handling of missing data: a. The outcome variable
can be imputed, or otherwise if those with missing
outcome data are excluded, bias will be minimized
through exploration of whether data are missing at
random (e.g. significance testing of differences in
predictor
variables between those with and without the outcome of
interest). b. Predictor data: missing data should be
imputed rather than excluded when appropriate
quantities of complete data are available.
6 out of 9 (1) Imputation of missing predictor
information,
(2) Excluded cases with outcome data,
(3) Excluded cases with missing predic-
tor or outcome data.
XXI Moglia A 2021
[140]
35 Y: Three studies explained how missing data were treated:
two used only complete data, while another applied
imputation (a technique where missing data are
computed as the mean of the remaining one). There
might be some biases in the results of the reviewed
studies as almost all (32 out of 35) did not report how
missing data were handled.
3 out of 35 (1) Excluded cases with missing data
(used only complete data)
(2) Imputation missing values of any
feature by its median value
XIX Moura F 2021
[146]
46 Y: Extensive research into the use of statistical methods
and machine learning imputation has been made to
safely identify how to estimate these missing values in
data processing and, where appropriate, disregard these
values.
NA NA
XIX Nadarajah
R 2021 [151]
11 Y: 96% of model results were at high risk of bias
predominantly driven by high risk of bias in the analysis
domain (88%). This resulted from exclusion of
participants with missing data from analysis (72%) or not
mentioning missing data (16%)
8 out of 11 (1) Excluded participants with missing
data.
XXI Naemi A 2021
[152]
15 Y: 10 articles did not utilize proper techniques to impute
missing values and excluded up to 32% of incomplete
patients’ records in some studies. Different approaches,
including replacing missing values with median,
considering missing values as a special value and
imputation using GP technique were applied to impute
missing values.
5 out of 15 (1) Replacing missing values with
median,
(2) Considering missing values as
a special value (discrete categories
using K-means clustering),
(3) Imputation using GP technique –
Gaussian Process Regression (GPR).
X Ortiz-Barrios
M 2021
[159]
65 Y: All the selected studies completely reported the data of
interest.
NA Primary MD not indicated (ROB
assessment for all studies)
makes them rank among the most popular ML classifiers. In
addition to deep networks and SVM, decision trees (DT) and
random forests (RF) were the most often used. Decision trees
progressively segment data into smaller and smaller groups. They
are quick to implement but may not reach high
accuracy. Random forest is an ensemble bagging technique
whereby numerous decision trees are combined to obtain
the final prediction. This process combines both
Table 5. (Continued).
Columns (in order): ICD-10 chapter; Lead author & year; Total of studies included; Is MD discussed?; How many studies provided methodological approach to MD; MD methods used
VI Rowe T 2021
[175]
12 Y: Mitigation methods include exclude samples with >
threshold missing or imputation of missing data.
8 out of 12 (1) The formulation of a latent
representation learning method,
which used incomplete samples,
(2) Imputation missing values by
Expectation Maximization algorithm,
(3) Imputation missing genotypes
(according to the 1000 genome
haplotypes; using MaCH software; or
no method given),
(4) Missing values replaced with median
of nearest neighbors,
(5) Excluded samples with > 10% miss-
ing predictor values.
IX Wang W 2020
[214]
18 Y: 55% - Imputation or case analysis. 10 out of 18 (1) Single imputation,
(2) Complete case analysis,
(3) Cases discarded with missing
outcomes,
(4) Multiple imputation by chained
equations.
NA – not applicable.
Table 6. Types of machine learning in reviews depending on ICD chapters.
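| ICD-10 chapter | No. of SLRs | RF/DT | SVM | DL | Regression | kNN | Bayes | NLP | Neural networks | LDA/QDA/MDA | Boosted-bagged | GAM | PLS | PCA | Cox | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 9 | 33 | 18 | 28 | 53 | 3 | 7 | 0 | 53 | 0 | 11 | 2 | 0 | 0 | 2 | 15 |
| II | 43 | 368 | 264 | 222 | 127 | 59 | 58 | 1 | 440 | 15 | 55 | 0 | 5 | 5 | 18 | 196 |
| III | 2 | 2 | 7 | 13 | 0 | 2 | 1 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| IV | 7 | 69 | 53 | 2 | 25 | 20 | 26 | 8 | 58 | 4 | 21 | 0 | 1 | 0 | 3 | 112 |
| V | 12 | 44 | 115 | 10 | 41 | 21 | 26 | 0 | 216 | 13 | 1 | 0 | 0 | 3 | 0 | 86 |
| VI | 30 | 183 | 476 | 15 | 160 | 80 | 86 | 4 | 286 | 44 | 29 | 1 | 4 | 0 | 1 | 383 |
| VII | 3 | 8 | 7 | 6 | 7 | 1 | 2 | 0 | 8 | 7 | 1 | 0 | 0 | 0 | 0 | 1 |
| VIII | 1 | 1 | 6 | 0 | 0 | 3 | 2 | 0 | 4 | 3 | 2 | 0 | 0 | 0 | 0 | 4 |
| IX | 19 | 165 | 117 | 53 | 79 | 41 | 46 | 4 | 253 | 3 | 51 | 1 | 1 | 2 | 1 | 160 |
| X | 20 | 107 | 89 | 159 | 104 | 21 | 34 | 5 | 182 | 4 | 58 | 0 | 4 | 3 | 4 | 153 |
| XI | 10 | 21 | 46 | 24 | 25 | 7 | 3 | 1 | 84 | 5 | 9 | 0 | 0 | 1 | 0 | 69 |
| XII | 4 | 42 | 0 | 64 | 76 | 0 | 31 | 10 | 126 | 0 | 1 | 0 | 0 | 35 | 0 | 227 |
| XIII | 15 | 141 | 117 | 37 | 79 | 25 | 47 | 9 | 248 | 3 | 58 | 0 | 1 | 0 | 0 | 88 |
| XIV | 5 | 46 | 28 | 4 | 30 | 10 | 10 | 2 | 198 | 3 | 20 | 0 | 0 | 0 | 0 | 71 |
| XV | 6 | 51 | 32 | 0 | 48 | 13 | 12 | 0 | 22 | 3 | 9 | 0 | 0 | 0 | 0 | 46 |
| XVI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| XVII | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| XVIII | 2 | 17 | 17 | 13 | 12 | 2 | 10 | 0 | 0 | 1 | 4 | 0 | 0 | 1 | 1 | 18 |
| XIX | 7 | 31 | 57 | 3 | 8 | 9 | 12 | 0 | 45 | 9 | 2 | 0 | 0 | 1 | 0 | 29 |
| XX | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| XXI | 19 | 193 | 129 | 42 | 107 | 14 | 46 | 29 | 227 | 2 | 50 | 0 | 1 | 3 | 5 | 95 |
| XXII | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 214 | 1,522 | 1,578 | 695 | 981 | 331 | 459 | 73 | 2,454 | 119 | 382 | 4 | 17 | 54 | 35 | 1,758 |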
DT: Decision Tree; GAM: Generalised Additive Model; kNN: k-nearest neighbors; LDA: Linear Discriminant Analysis; MDA: Multiple Discriminant Analysis; NLP: Natural
Language Processing; PCA: Principal Component Analysis; PLS: Partial Least Squares; QDA: Quadratic Discriminant Analysis; RF: Random Forest; SVM: Support
Vector Machine.
bootstrapping and aggregation. The key advantage of this
approach is that it can be used for either classification or
regression problems; hence, this is likely one of the reasons
it is used in cardiology. Although RF can provide higher
diagnostic accuracy and reduce variance without increasing bias,
its running time might be too long for use in
clinical situations.
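As an illustration of the bagging idea described above (a sketch only, not code taken from any of the reviewed studies), the following Python snippet draws bootstrap samples, fits a very simple one-split tree (a "stump") on each sample, and aggregates the individual predictions by majority vote. The toy data, the function names, and the use of stumps in place of full decision trees are assumptions made for brevity; a real random forest would also subsample the features considered at each split.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bootstrapping step)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a one-split tree on (feature, label) pairs.

    Tries every observed feature value as a threshold and every pair of
    labels for the two sides, keeping the split with the fewest errors.
    Returns (threshold, label_if_at_or_below, label_if_above).
    """
    labels = sorted({label for _, label in rows})
    best = None
    for threshold, _ in rows:
        for below in labels:
            for above in labels:
                errors = sum(
                    (below if x <= threshold else above) != y for x, y in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    return best[1:]


def predict_stump(stump, x):
    threshold, below, above = stump
    return below if x <= threshold else above


def bagged_predict(stumps, x):
    """Majority vote over all fitted stumps (the aggregation step)."""
    votes = Counter(predict_stump(stump, x) for stump in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature dataset: low values are class 0, high values class 1.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
stumps = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagged_predict(stumps, 0.5), bagged_predict(stumps, 9.5))
```

Each stump sees a slightly different resampled dataset, so individual thresholds vary; averaging over the votes is what reduces the variance of the final prediction.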
Even though boosted methods are known to improve the
performance of the methods they build on, they were not
encountered extensively in this review.
Other common ML algorithms were also identified, though less
frequently. Among these were linear regression methods (including
logistic regression); kNN, which bases its predictions on the
proximity of the case under consideration to known data
(generally quick to implement but prone to the curse of
dimensionality); and linear discriminant analysis (LDA), which
segments data according to a hyperplane orthogonal to the
vector between the mean input values of the different classes.
LDA classifications are easy to implement but may not achieve
high accuracy.
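To make the proximity-based idea behind kNN concrete, a minimal sketch in Python is given below. It is illustrative only: the two-feature toy points, the class labels "A" and "B", and the function name `knn_predict` are assumptions, and Euclidean distance is just one possible choice of proximity measure.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.8, 3.0), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.0, 3.0)))
```

Because every prediction requires distances to all stored cases, and distances become less informative as the number of features grows, the sketch also hints at why kNN suffers from the curse of dimensionality.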
4.3. Strengths and weaknesses
This study has several limitations. First, our review was
limited to literature reviews; consequently, information that
was not presented in a given SLR might have been missed or
misinterpreted. There may have been some over-counting of
the number of ML algorithms identified. We did not have
sufficient detail to determine whether any of the
publications used the same data source. Second, we did not review
studies that were not captured by any SLR; hence, we could have
a biased picture of the utilization of ML in healthcare. Third,
we restricted the review to studies published in English, thus
potentially introducing a selection bias toward certain
countries. Fourth, publication bias cannot be excluded, as reports
of unsatisfactory or unsuccessful ML applications are rarely
encountered; thus, the actual performance of ML could be
overestimated. Moreover, several techniques may have been
reported differently and may have been missed or incorrectly
categorized. For example, principal component analysis (PCA) was
also reported in the included SLRs, despite being
a dimensionality reduction technique rather than, strictly
speaking, an ML algorithm.
Finally, our ML performance evaluation covered accuracy,
sensitivity and specificity. Such performance parameters are
relevant in classification problems. It should be added,
however, that the goodness of fit of predictive models for
quantitative variables such as length of stay (LoS) can be evaluated
through different parameters, such as the root mean square
error (RMSE) or the coefficient of determination (R²).
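For readers less familiar with these measures, the sketch below shows how they are commonly computed: accuracy, sensitivity, and specificity from binary labels, and RMSE and R² from continuous predictions. The function names and example values are hypothetical and are not drawn from the reviewed SLRs; the code also assumes that both classes are present and that the observed values are not all identical, since otherwise a denominator would be zero.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


def rmse(y_true, y_pred):
    """Root mean square error of continuous predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical classification example (8 cases, 3 of them positive).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(labels, predicted))

# Hypothetical regression example, e.g. length of stay in days.
observed = [2.0, 4.0, 6.0, 8.0]
estimated = [3.0, 4.0, 5.0, 8.0]
print(rmse(observed, estimated), r_squared(observed, estimated))
```

The classification measures depend on the chosen positive class and decision threshold, whereas RMSE is expressed in the units of the outcome and R² is unitless, which is why the two families of measures are not interchangeable.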
Despite these limitations, this review provides important
insights into the current state of AI integration in the
healthcare sector. It indicates that over 10,000 ML algorithms have
already been developed for medical use. This is not surprising,
considering that AI is becoming a major driver of innovation in
healthcare. For example, the number of patents granted in Europe
solely for digital communication or medical technologies was
projected to be almost double that for drugs by 2022 [329]. A rough
comparison indicates that fewer than 60 drugs and over 100
ML/AI-enabled medical devices have been approved annually by
the FDA since 2019 (up to 523 by the end of January 2023)
[330]. Additionally, our study indicated that the majority of
identified ML algorithms were developed from imaging
data using ANN methods. This implies that
artificial intelligence is used for medical purposes mainly to
support clinical decision making. This aligns with another study
that found that 189 of 222 FDA-approved medical devices
(85%) are designed for use by healthcare professionals, while
the remaining 33 (15%) are intended for use by patients [331].
5. Conclusions
There is still unrealized potential for AI in healthcare. Despite
the growing number of published ML algorithms, there is
limited evidence of their impact on clinical practice.
More evidence concerning external and internal validation
can drive the change toward a greater, more robust, and safer
adoption of AI. Consequently, it may allow payers, clinicians,
and patients to increase their trust in ML algorithms. The key is
ensuring that AI development is examined through the lens of
the health problems in question. Unmet medical needs are
heterogeneously shaped by patients and influenced by the
care setting, baseline characteristics, and cultural differences.
Thus, there is a need to prepare a landing field for ML
algorithms for healthcare applications. However, we are not there
yet, and as the field moves forward AI will only face more
challenges. Currently, we are in a different era. Let us be ready
with the right data at the appropriate time.
6. Expert opinion
Looking into the future, it is thought-provoking to ask how to ensure
greater adoption of ML algorithms in healthcare systems
while taking into consideration patients’ benefits, developers’
business needs, and the limitations of public budgets.
Our review was driven by a central question: How can we
bridge the divide between the development and implementa-
tion of ML algorithms in healthcare? From our findings, we can
extract recommendations for both developers and payers.
6.1. Recommendations for ML developers in healthcare
With respect to the performance of artificial intelligence, the
results of our review appear promising at first glance. It may
be perceived as an impressive finding that within 12
therapeutic areas (out of 22 ICD chapters) we already have
access to some ML algorithms with an accuracy approaching
100%. In addition, five other ICD chapters had scores above 88%
(Appendix 1). However, such bare numbers may not offer
a comprehensive perspective on the clinical applicability of AI. The
adoption of ML algorithms in clinical settings requires
further consideration. As far as internal validation is
concerned, the lack of testing of predictive power on separate
datasets may lead to overestimation of ML performance in practical
situations. It is cross-validation that leads to more accurate
estimates of the performance of an ML model on an unseen
dataset. Unfortunately, it was reported for only 15% of the
10,462 included ML algorithms (Appendix 1). Cross-validation divides
the sample into k subsets (folds); in each round, one subset is
used as the test/validation set and the remaining k-1 subsets
are used for training. The model is
trained on the training data and predictions are made using
the model on the testing data. The sensitivity and/or
specificity are then averaged over the k rounds.
Because most of the data are used for fitting in every round,
and every observation is also used once for validation, the
k-fold approach reduces both bias and variance. It thus reduces
the risk of undertraining when a large amount of noise is
introduced into the training data and, consequently, bias. It
also helps prevent overfitting, which occurs when the model
attempts to learn each detail and the noise of the data, leading
to poor model performance on test sets [332].
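The k-fold procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the approach of any reviewed study: the helper names, the toy one-feature dataset, and the deliberately simple "midpoint" classifier are all hypothetical, and in practice stratified splitting is often preferred so that each fold preserves the class balance.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, score, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat k times.

    Returns the mean score and the list of per-fold scores.
    """
    scores = []
    for test_fold in k_fold_indices(len(xs), k, seed):
        held_out = set(test_fold)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in test_fold], [ys[i] for i in test_fold])
        )
    return sum(scores) / k, scores


def fit_midpoint(xs, ys):
    """Toy classifier: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, xs, ys):
    """Share of cases where 'x above threshold' matches the positive label."""
    return sum((x > threshold) == (y == 1) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical, well-separated data: five negatives and five positives.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 11.0, 12.0, 13.0, 14.0, 15.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

mean_acc, fold_accs = cross_validate(xs, ys, k=5, fit=fit_midpoint, score=accuracy)
print(mean_acc, fold_accs)
```

The key property is that no observation is ever used for both fitting and scoring within the same round, so the averaged score approximates performance on unseen data drawn from the same source. It does not, however, replace external validation on data from a different source.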
Cross-validation performs better with larger datasets.
This is vital, particularly when one considers the
importance of ML for diagnosis, which initiates
a sequence of subsequent actions. Hence, correct
prediction is what allows the healthcare system to
be effective and efficient, and it helps to optimize treatment
pathways for previously diagnosed patients. More than ever,
the availability of data enables healthcare professionals to
use predictive modeling techniques for prevention and
prophylaxis.
Generally, the larger the dataset, the greater the statistical
power and the chances of better prediction. A negative relationship
between sample size and classification accuracy has already
been reported [332,333], and it is important to note that as
many as 83 of the 220 identified systematic literature reviews
did not provide information regarding the size of the datasets
used for ML algorithm development. At the same time, the
majority of the included SLRs reported a large variance between the
smallest and largest sample sizes, despite having similar clinical
objectives (Table 1). However, it is not only the size of the training
dataset that matters, but also the variability of
the available data. Lack of diversity in training datasets, often
driven by the use of data obtained from a localized patient
population, is a main source of inadequate generalizability of
model outputs to different patient populations. As such, it is
likely that the most effective approach to reducing bias and
expanding the applicability of models to multiple settings and/or
populations is to train models on multi-institutional datasets.
Some studies have pointed to a further source of
variability, driven by medical device manufacturers' software:
AI models trained on cardiac magnetic resonance imaging (MRI)
scans provided different accuracy results for different
scanners [334], and a more than two-fold difference in error
rate was found between two different optical
coherence tomography (OCT) scans [335]. The limited diversity of
the data used for ML is a problem: a scoping review of
AI-related publications that appeared in PubMed in 2019
revealed that over half of the datasets used for clinical AI
originated from either the US or China [336]. In addition, the US and
China contributed over 40% of the publications [336]. Barriers to
accessing data lead to the overutilization of available datasets.
For instance, there are only four major databases in
ophthalmology (MESSIDOR, DRIVE, EyePACS, and E-ophtha), with no known
publicly available datasets of ophthalmological images for 172
countries that constitute roughly 45% of the global
population [337].
Beyond the rigorous demands of cross-validation,
developers ought to test their technology more often
through external validation
as well. To date, the published comparative data between ML
algorithms and humans have shown favorable outcomes for the
former. For instance, across 12 studies using deep neural
networks for ECG analysis to detect structural cardiac pathologies,
the predictive accuracy of the DL models was
superior to that of interpretations by board-certified
cardiologists. The same was found in a comparison of
a computer-aided detection (CAD) system with 53 general
endoscopists for detecting early neoplasia in patients with Barrett’s
esophagus (BE): the CAD system achieved higher accuracy than any
of the clinicians, regardless of their level of endoscopic
expertise [338]. In both cases, details regarding the choice of the
clinical group were missing.
It should be noted, however, that less than 1% of the
included studies reported external validation. This is
a significant gap in the evidence, with considerable
consequences for the implementation of ML in clinical practice. For true
external validation, a tuned algorithm must be applied to
a new set of data from different sources. The ultimate
objective is to ensure the generalizability of the results when
ML is adopted across various care settings. As Bang and
colleagues mentioned in their systematic literature review:
‘CAD algorithms demonstrated high accuracy for the
automatic endoscopic diagnosis of oesophageal cancer and
neoplasms. The limitation of a lack of performance in external
validation and clinical applications should be overcome’ [25].
Considering the broader perspective on the necessity of external
validation of ML algorithms, it is crucial to emphasize that
a suitable methodology for translating efficacy (clinical
trial data) into effectiveness (real-world data) is regarded
as a minimum requirement within evidence-based
healthcare, a concept initially introduced for pharmaceuticals.
Our findings are similar to those of another review of DL
studies that focused on the comparison of ML against human
comparators covering the period from January 2010 to
June 2022. Only ten RCTs (including eight ongoing RCTs)
and 81 non-randomized clinical trials compared the performance
of diagnostic algorithms against that of clinicians [339]. In another
systematic literature review of 82 publications, only 14 studies
compared the diagnostic performance of DL models based on
medical imaging with that of healthcare professionals [340].
6.2. Recommendations for regulators and payers
Will improvements in both internal and external validation
make ML algorithms directly eligible for registration and
reimbursement? While the former is likely more about internal
validity, as its primary objective addresses the risk benefit
ratio, the latter may be more about external validity, as its
primary objective is to address the value for money. Therefore,
the next question is how regulators and public payers should
balance the requirements with respect to the evidence of the
usability of ML algorithms against the need to ensure safety
and treatment effectiveness. Given the existence of strict reg-
ulations for both pricing and reimbursement for pharmaceu-
ticals and medical devices, it is necessary to enquire whether
similar hurdles of evidence generation should also be intro-
duced for ML algorithms. To address this issue, it is important
to recall that only approximately 12% of drugs entering clin-
ical trials are ultimately approved by the FDA [341] and the
average time to reimbursement for innovative treatments in
Europe is 511 days [342]. Hence, some claim that overregula-
tion may harm innovation. However, the development of the
majority of AI-driven innovations may be relatively short com-
pared to other time-consuming research and development
technologies, and there is potential for greater disruption in
the healthcare sector by ML algorithms than what we have
witnessed thus far. Therefore, the types of regulations that
should be developed to support the adoption of ML algo-
rithms remain unclear. Overall, there is a need to establish
a matrix of criteria to assess the ability of AI solutions to be
integrated into healthcare systems. There are already several
recommendations in this respect, such as a scoping review of
72 guidelines that, among others, identified quality criteria
regarding the development, evaluation, and implementation
of ML in healthcare [343]. Other experts have suggested
grouping ML algorithms into one of the following categories:
assistive, augmentative, or autonomous [344].
Still, decision-makers (regulators and public payers) need
to agree on a common set of standards for the
assessment of AI-driven health technologies, as ML is rarely
jurisdiction-specific. The maximum accuracy varied from 27%
(ICD-10 Chapter XVIII) to 100% (ICD-10 Chapter II) across the
included studies. Therefore, the question is whether the
same rules should be applied, irrespective of the area under
consideration. This may require the involvement of clinical
experts and a clear understanding of the unmet medical
needs in each disease field. Therefore, our recommendations
focus on the interoperability in the journey toward unified
P&R regulations for ML algorithms. The underlying rationale
is to ensure the accessibility of data such as electronic med-
ical records (EMRs) to AI developers. Thus far, there have
been limited efforts related to the availability of real-world
data (RWD) for validation, as alluded to earlier. In the era of
digital transformation, we should move further and ensure
the integration of EMRs with unstructured data. Additionally,
healthcare decision-makers must prepare data repositories to
facilitate external validation and invest in local data analytics
capabilities to facilitate internal validation. Such efforts
should be welcomed by developers, as expressed by many
experts [345]. The overarching objective is to ensure that ML
algorithms have complete access to health-related data irre-
spective of geographical, demographic, or institutional com-
position. Without an appropriate understanding of the health
problems in question, ML algorithms can only be utilized for
the populations and medical conditions for which they were
trained, failing to provide any value for populations or con-
comitant medical conditions that were omitted or underre-
presented in the training set owing to racial, ethnic, or simple
misrepresentation. Such activities will inevitably place an
additional burden on both payers and developers; however,
AI is only as good as the data it is given, as demonstrated in this
study.
6.3. Five-year view
Machine learning is poised to revolutionize the healthcare system
to an extent not seen before. It will reshape decision-making
processes, with individuals playing a more significant role, thanks
to data delivered directly from the Internet of Things. The role of
clinicians will shift from decision-makers to consultants, support-
ing patients in interpreting the collected data. With the rapid
advancements in sensor technologies and the widespread avail-
ability of semiconductors, machine learning algorithms tailored to
mobile phones will empower patients and, most importantly,
provide numerous opportunities for preventive care. Significant
savings for public payers can be realized as data-driven trends lead
to human-centric healthcare ecosystems, provided they find
a solid framework in the legal structure. The digital revolution is
set to retire the healthcare system as we have known it so far.
Funding
This paper was not funded.
Declaration of interests
The authors have no relevant affiliations or financial
involvement with any organization or entity with a financial interest in or
financial conflict with the subject matter or materials discussed in the manu-
script. This includes employment, consultancies, honoraria, stock ownership or
options, expert testimony, grants or patents received or pending, or royalties.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other
relationships to disclose.
Author contributions
KK and SP participated in the design and execution of the SLR and oversaw
study selection and the synthesis of the results obtained; they also finalized
the discussion and the conclusions. JEP participated in the design of the SLR
and actively contributed to the selection of studies and data extraction. BA,
MHV, and KJK contributed to data extraction as well as to the drafting of the
manuscript.
All authors read and approved the final version of the manuscript for
publication.
References
1. OECD. Health at a glance 2021: OECD Indicators: Digital health. 2021.
2. Enad H, Mohammed M. A Review on Artificial Intelligence and
Quantum Machine Learning for Heart Disease Diagnosis: Current
Techniques, Challenges and Issues. Recent Dev Fut Directions.
2023;11(1):08–25. doi: 10.54216/FPA.110101
3. Mohammed MA, Lakhan A, Abdulkareem KH, et al. Federated
auto-encoder and XGBoost schemes for multi-omics cancer detec-
tion in distributed fog computing paradigm. Chemometr Intell Lab
Syst. 2023;241:104932. doi: 10.1016/j.chemolab.2023.104932
4. Abdulmajeed NQ, Al-Khateeb B, Mohammed MA. Voice pathology
identification system using a deep learning approach based on
unique feature selection sets. Expert Syst. 2023;3:e13327.
5. Arif ZH, Cengiz K. Severity classification for COVID-19 infections
based on lasso-logistic regression model. Int J Math Stat Comput
Sci. 2023;1:25–32. doi: 10.59543/ijmscs.v1i.7715
6. Salman AO, Geman O. Evaluating three Machine learning classifica-
tion methods for effective COVID-19 diagnosis. Int J Math Stat
Comput Sci. 2023;1:1–14. doi: 10.59543/ijmscs.v1i.7693
7. Zeng Z, Zhan J, Zhang K, et al. Global, regional, and national
burden of urinary tract infections from 1990 to 2019: an analysis
of the global burden of disease study 2019. World J Urol.
2022;40(3):755–763. doi: 10.1007/s00345-021-03913-0
8. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for
systematic reviews and meta-analyses: the PRISMA statement. BMJ.
2009;339:b2535. doi: 10.1136/bmj.b2535
9. Loftus TJ, Tighe PJ, Ozrazgat-Baslanti T, et al. Ideal algorithms in
healthcare: explainable, dynamic, precise, autonomous, fair, and
reproducible. PLOS Digit Health. 2022;1(1):e0000006.
doi: 10.1371/journal.pdig.0000006
10. Padula WV, Kreif N, Vanness DJ, et al. Machine learning methods in
Health Economics and outcomes research-the PALISADE checklist:
a good practices report of an ISPOR task force. Value Health.
2022;25(7):1063–1080. doi: 10.1016/j.jval.2022.03.022
11. Abu Bakar AR, Lai KW, Hamzaid NA. The emergence of machine
learning in auditory neural impairment: a systematic review.
Neurosci Lett. 2021;765:136250. doi: 10.1016/j.neulet.2021.136250
12. Adamidi ES, Mitsis K, Nikita KS. Artificial intelligence in clinical care
amidst COVID-19 pandemic: A systematic review. Comput Struct
Biotechnol J. 2021;19:2833–2850. doi: 10.1016/j.csbj.2021.05.010
13. Adeoye J, Tan JY, Choi SW, et al. Prediction models applying machine
learning to oral cavity cancer outcomes: a systematic review. Int J Med
Inform. 2021;154:104557. doi: 10.1016/j.ijmedinf.2021.104557
14. Ahsan MM, Siddique Z. Machine learning-based heart disease diag-
nosis: a systematic literature review. Artif Intell Med.
2022;128:102289. doi: 10.1016/j.artmed.2022.102289
15. Akazawa M, Hashimoto K. Artificial intelligence in gynecologic can-
cers: current status and future challenges – a systematic review. Artif
Intell Med. 2021;120:102164. doi: 10.1016/j.artmed.2021.102164
16. Al Hinai G, Jammoul S, Vajihi Z, et al. Deep learning analysis of resting
electrocardiograms for the detection of myocardial dysfunction,
hypertrophy, and ischaemia: a systematic review. Eur Heart J Digit
Health. 2021;2(3):416–423. doi: 10.1093/ehjdh/ztab048
17. Alabi RO, Youssef O, Pirinen M, et al. Machine learning in oral
squamous cell carcinoma: Current status, clinical concerns and
prospects for future—A systematic review. Artif Intell Med.
2021;115:102060. doi: 10.1016/j.artmed.2021.102060
18. Albahri AS, Hamid RA, Alwan J, et al. Role of biological data mining
and Machine learning techniques in detecting and diagnosing the
Novel coronavirus (COVID-19): a systematic review. J Med Syst.
2020;44(7). doi: 10.1007/s10916-020-01582-x
19. Alballa N, Al-Turaiki I. Machine learning approaches in COVID-19
diagnosis, mortality, and severity risk prediction: a review. IMU.
2021;24:100564. doi: 10.1016/j.imu.2021.100564
20. Alharbi ET, Nadeem F, Cherif A. Predictive models for personalized
asthma attacks based on patient’s biosignals and environmental
factors: a systematic review. BMC Med Inform Decis Mak.
2021;21(1). doi: 10.1186/s12911-021-01704-6
21. Alhasan AS. Clinical applications of Artificial intelligence, Machine
learning, and Deep learning in the imaging of Gliomas:
a systematic review. Cureus. 2021;13(11). doi: 10.7759/cureus.19580
22. Alsolai H, Qureshi S, Iqbal SMZ, et al. A systematic review of
literature on Automated sleep Scoring. IEEE Access.
2022;10:79419–79443. doi: 10.1109/ACCESS.2022.3194145
23. Anteby R, Klang E, Horesh N, et al. Deep learning for noninvasive
liver fibrosis classification: a systematic review. Liver Int.
2021;41(10):2269–2278. doi: 10.1111/liv.14966
24. Bang CS, Lee JJ, Baik GH. Computer-aided diagnosis of
Gastrointestinal ulcer and hemorrhage using wireless capsule
endoscopy: systematic review and diagnostic test accuracy
meta-analysis. J Med Internet Res. 2021;23(12):e33267.
doi: 10.2196/33267
25. Bang CS, Lee JJ, Baik GH. Computer-aided diagnosis of esophageal
cancer and neoplasms in endoscopic images: a systematic review
and meta-analysis of diagnostic test accuracy. Gastrointest Endosc.
2021;93(5):1006–1015.e1013. doi: 10.1016/j.gie.2020.11.025
26. Barrett L, Hu J, Howell P. Systematic review of Machine learning
approaches for detecting developmental stuttering. IEEE/ACM
Trans Audio Speech Lang Process. 2022;30:1160–1172.
doi: 10.1109/TASLP.2022.3155295
27. Bazoukis G, Stavrakis S, Zhou J, et al. Machine learning versus
conventional clinical methods in guiding management of heart
failure patients—a systematic review. Heart Fail Rev. 2021;26
(1):23–34. doi: 10.1007/s10741-020-10007-3
28. Bedrikovetski S, Dudi-Venkata NN, Kroon HM, et al. Artificial intelli-
gence for pre-operative lymph node staging in colorectal cancer:
a systematic review and meta-analysis. BMC Cancer.
2021;21(1):1058. doi: 10.1186/s12885-021-08773-w
29. Bedrikovetski S, Dudi-Venkata NN, Maicas G, et al. Artificial intelli-
gence for the diagnosis of lymph node metastases in patients with
abdominopelvic malignancy: a systematic review and
meta-analysis. Artif Intell Med. 2021;113:102022.
doi: 10.1016/j.artmed.2021.102022
30. Benoit J, Onyeaka H, Keshavan M, et al. Systematic review of digital
phenotyping and machine learning in psychosis spectrum illnesses.
Harv Rev Psychiatry. 2020;28(5):296–304.
doi: 10.1097/HRP.0000000000000268
31. Bernert RA, Hilberg AM, Melia R, et al. Artificial intelligence and
suicide prevention: a systematic review of Machine learning
investigations. Int J Environ Res Public Health. 2020;17(16):5929.
doi: 10.3390/ijerph17165929
32. Bertl M, Metsallik J, Ross P. A systematic literature review of
AI-based digital decision support systems for post-traumatic stress
disorder. Front Psychiatry. 2022;13:923613. doi: 10.3389/fpsyt.2022.923613
33. Binvignat M, Pedoia V, Butte AJ, et al. Use of machine learning in
osteoarthritis research: a systematic literature review. RMD Open.
2022;8(1):e001998. doi: 10.1136/rmdopen-2021-001998
34. Boonstra A, Laven M. Influence of artificial intelligence on the work
design of emergency department clinicians a systematic literature
review. BMC Health Serv Res. 2022;22(1).
doi: 10.1186/s12913-022-08070-7
35. Boyd C, Brown G, Kleinig T, et al. Machine learning quantitation of
cardiovascular and cerebrovascular disease: a systematic review of
clinical applications. Diagnostics. 2021;11(3):551.
doi: 10.3390/diagnostics11030551
36. Bracher-Smith M, Crawford K, Escott-Price V. Machine learning for
genetic prediction of psychiatric disorders: a systematic review. Mol
Psychiatry. 2021;26(1):70–79. doi: 10.1038/s41380-020-0825-2
37. Buchlak QD, Esmaili N, Leveque JC, et al. Machine learning applica-
tions to clinical decision support in neurosurgery: an artificial
intelligence augmented systematic review. Neurosurg Rev.
2020;43(5):1235–1253. doi: 10.1007/s10143-019-01163-8
38. Buisson M, Navel V, Labbe A, et al. Deep learning versus ophthal-
mologists for screening for glaucoma on fundus examination:
a systematic review and meta-analysis. Clin Exp Ophthalmol.
2021;49(9):1027–1038. doi: 10.1111/ceo.14000
39. Cabitza F, Locoro A, Banfi G. Machine learning in orthopedics:
a literature review. Front Bioeng Biotechnol. 2018;6:75.
doi: 10.3389/fbioe.2018.00075
40. Castaldo R, Cavaliere C, Soricelli A, et al. Radiomic and genomic
machine learning method performance for prostate cancer
diagnosis: systematic literature review. J Med Internet Res.
2021;23(4):e22394. doi: 10.2196/22394
41. Cavus N, Lawan AA, Ibrahim Z, et al. A systematic literature review
on the application of machine-learning models in behavioral
assessment of Autism spectrum disorder. J Pers Med.
2021;11(4):299. doi: 10.3390/jpm11040299
42. Celtikci E. A systematic review on machine learning in neurosur-
gery: the future of decision-making in patient care. Turk Neurosurg.
2018;28(2):167–173. doi: 10.5137/1019-5149.JTN.20059-17.1
43. Chandra G, Irisha KD, Vica VI, et al. Systematic literature review on
application of artificial intelligence in cancer detection using image
processing. In: 2022 3rd International Conference on Artificial
Intelligence and Data Sciences (AiDAS); Ipoh, Malaysia; 2022. p.
273–277.
44. Chee ML, Ong MEH, Siddiqui FJ, et al. Artificial intelligence applica-
tions for COVID-19 in intensive care and emergency settings:
a systematic review. Int J Environ Res Public Health.
2021;18(9):4749. doi: 10.3390/ijerph18094749
45. Chiesa-Estomba CM, Graña M, Medela A, et al. Machine learning
algorithms as a computer-assisted decision tool for Oral cancer
prognosis and management decisions: a systematic review. ORL
J Otorhinolaryngol Relat Spec. 2022;84(4):1–11.
doi: 10.1159/000520672
46. Cho SJ, Sunwoo L, Baik SH, et al. Brain metastasis detection using
machine learning: a systematic review and meta-analysis. Neuro
Oncol. 2021;23(2):214–225. doi: 10.1093/neuonc/noaa232
47. Choudhury A, Asan O. Role of Artificial intelligence in patient safety
outcomes: systematic literature review. JMIR Med Inform.
2020;8(7):e18599. doi: 10.2196/18599
48. Choudhury A, Renjilian E, Asan O. Use of machine learning in
geriatric clinical care for chronic diseases: a systematic literature
review. JAMIA Open. 2020;3(3):459–471.
doi: 10.1093/jamiaopen/ooaa034
49. da Silva Neto SR, Tabosa Oliveira T, Teixeira IV, et al. Machine
learning and deep learning techniques to support clinical diagnosis
of arboviral diseases: a systematic review. PLoS Negl Trop Dis.
2022;16(1):e0010061. doi: 10.1371/journal.pntd.0010061
50. Dallora AL, Eivazzadeh S, Mendes E, et al. Machine learning and
microsimulation techniques on the prognosis of dementia:
a systematic literature review. PLoS One. 2017;12(6):e0179804.
doi: 10.1371/journal.pone.0179804
51. Dallora AL, Anderberg P, Kvist O, et al. Bone age assessment with
various machine learning techniques: a systematic literature review
and meta-analysis. PLoS One. 2019;14(7):e0220242. doi: 10.1371/
journal.pone.0220242
52. Daniel, Cenggoro TW, Pardamean B. A systematic literature review
of machine learning application in COVID-19 medical image
classification. Procedia Comput Sci. 2023;216:749–756. doi: 10.
1016/j.procs.2022.12.192
53. D’Antoni F, Russo F, Ambrosio L, et al. Artificial intelligence and
Computer vision in low back Pain: a systematic review.
Int J Environ Res Public Health. 2021;18(20):10909. doi: 10.
3390/ijerph182010909
54. Das PK, V A D, Meher S, et al. A systematic review on recent
advancements in deep and machine learning based detection
and classification of acute lymphoblastic leukemia. IEEE Access.
2022;10:81741–81763. doi: 10.1109/ACCESS.2022.3196037
55. Das T, Kaur H, Gour P, et al. Intersection of network medicine
and machine learning towards investigating the key biomarkers
and pathways underlying amyotrophic lateral sclerosis:
a systematic review. Brief Bioinform. 2022;23(6). doi: 10.1093/
bib/bbac442
56. de Bardeci M, Ip CT, Olbrich S. Deep learning applied to electro-
encephalogram data in mental disorders: a systematic review. Biol
Psychol. 2021;162:108117. doi: 10.1016/j.biopsycho.2021.108117
57. Decharatanachart P, Chaiteerakij R, Tiyarattanachai T, et al.
Application of artificial intelligence in chronic liver diseases:
a systematic review and meta-analysis. BMC Gastroenterol.
2021;21(1):10. doi: 10.1186/s12876-020-01585-5
58. Decharatanachart P, Chaiteerakij R, Tiyarattanachai T, et al.
Application of artificial intelligence in non-alcoholic fatty liver dis-
ease and liver fibrosis: a systematic review and meta-analysis.
Therap Adv Gastroenterol. 2021;14:17562848211062807. doi: 10.
1177/17562848211062807
59. DelSole EM, Keck WL, Patel AA. The state of machine learning in
spine surgery: a systematic review. Clin Spine Surg. 2021;35
(2):80–89. doi: 10.1097/BSD.0000000000001208
60. Dogan O, Tiwari S, Jabbar MA, et al. A systematic review on AI/ML
approaches against COVID-19 outbreak. Complex Intell Systems.
2021;7(5):1–24. doi: 10.1007/s40747-021-00424-8
61. Dudchenko A, Kopanitsa G. Decision support systems in cardiology:
a systematic review. Stud Health Technol Inform.
2017;237:209–214.
62. Ebrahimi A, Wiil UK, Schmidt T, et al. Predicting the risk of alcohol
use disorder using machine learning: a systematic literature review.
IEEE Access. 2021;9:151697–151712. doi: 10.1109/ACCESS.2021.
3126777
63. Ebrahimighahnavieh MA, Luo S, Chiong R. Deep learning to detect
Alzheimer’s disease from neuroimaging: a systematic literature
review. Comput Methods Programs Biomed. 2020;187:105242.
doi: 10.1016/j.cmpb.2019.105242
64. El-Daw S, El-Tantawy A, Aly T, et al. Role of machine learning in
management of degenerative spondylolisthesis: a systematic
review. Curr Orthop Pract. 2021;32(3):302–308. doi: 10.1097/BCO.
0000000000000992
65. Falconer N, Abdel-Hafez A, Scott IA, et al. Systematic review of
machine learning models for personalised dosing of heparin. Br
J Clin Pharmacol. 2021;87(11):4124–4139. doi: 10.1111/bcp.14852
66. Farook TH, Jamayet NB, Abdullah JY, et al. Machine learning and
intelligent Diagnostics in Dental and orofacial pain management:
a systematic review. Pain Res Manag. 2021;2021:1–9. doi: 10.1155/
2021/6659133
67. Fernandes F, Barbalho I, Barros D, et al. Biomedical signals and
machine learning in amyotrophic lateral sclerosis: a systematic
review. Biomed Eng Online. 2021;20(1). doi: 10.1186/s12938-021-
00896-2
68. Fregoso-Aparicio L, Noguez J, Montesinos L, et al. Machine learning
and deep learning predictive models for type 2 diabetes:
a systematic review. Diabetol Metab Syndr. 2021;13(1):148. doi:
10.1186/s13098-021-00767-9
69. Frondelius T, Atkova I, Miettunen J, et al. Diagnostic and prognostic
prediction models in ventilator-associated pneumonia: systematic
review and meta-analysis of prediction modelling studies. J Crit
Care. 2022;67:44–56. doi: 10.1016/j.jcrc.2021.10.001
70. Fusco R, Grassi R, Granata V, et al. Artificial intelligence and
COVID-19 using chest CT scan and chest X-ray images: machine
learning and deep learning approaches for diagnosis and
treatment. J Pers Med. 2021;11(10):993. doi: 10.3390/jpm11100993
71. Garrow CR, Kowalewski KF, Li L, et al. Machine learning for surgical
phase recognition: a systematic review. Ann Surg. 2021;273
(4):684–693. doi: 10.1097/SLA.0000000000004425
72. Ghaderzadeh M, Asadi F. Deep learning in the detection and
diagnosis of COVID-19 using radiology modalities: a systematic
review. J Healthc Eng. 2021;2021:6677314. doi: 10.1155/2021/
6677314
73. Grueso S, Viejo-Sobera R. Machine learning methods for predicting
progression from mild cognitive impairment to Alzheimer’s disease
dementia: a systematic review. Alzheimers Res Ther. 2021;13(1):162.
doi: 10.1186/s13195-021-00900-w
74. Gutiérrez-Tobal GC, Álvarez D, Kheirandish-Gozal L, et al. Reliability
of machine learning to diagnose pediatric obstructive sleep apnea:
systematic review and meta-analysis. Pediatr Pulmonol. 2021;57
(8):1931–1943. doi: 10.1002/ppul.25423
75. Haggenmüller S, Maron RC, Hekler A, et al. Skin cancer classification
via convolutional neural networks: systematic review of studies
involving human experts. Eur J Cancer. 2021;156:202–216. doi: 10.
1016/j.ejca.2021.06.049
76. Hameed BMZ, Shah M, Naik N, et al. The ascent of artificial intelli-
gence in endourology: a systematic review over the last 2 decades.
Curr Urol Rep. 2021;22(10):53. doi: 10.1007/s11934-021-01069-3
77. Hasan N, Bao YKK. Understanding current states of machine learn-
ing approaches in medical informatics: a systematic literature
review. Health Technol. 2021;11(3):471–482. doi: 10.1007/s12553-
021-00538-6
78. Hassan N, Slight R, Weiand D, et al. Preventing sepsis; how can
artificial intelligence inform the clinical decision-making process?
a systematic review. Int J Med Inform. 2021;150:104457. doi: 10.
1016/j.ijmedinf.2021.104457
79. Henn J, Buness A, Schmid M, et al. Machine learning to guide
clinical decision-making in abdominal surgery—a systematic litera-
ture review. Langenbecks Arch Surg. 2021;407(1):51–61. doi: 10.
1007/s00423-021-02348-w
80. Hickman SE, Woitek R, Le EPV, et al. Machine learning for workflow
applications in screening Mammography: systematic review and
meta-analysis. Radiology. 2022;302(1):88–104. doi: 10.1148/radiol.
2021210391
81. Hinterwimmer F, Lazic I, Suren C, et al. Machine learning in knee
arthroplasty: specific data are key—a systematic review. Knee Surg
Sports Traumatol Arthrosc. 2022;30(2):376–388. doi: 10.1007/
s00167-021-06848-6
82. Hoekstra O, Hurst W, Tummers J. Healthcare related event prediction
from textual data with machine learning: a systematic literature review.
Healthc Anal. 2022;2:100107. doi: 10.1016/j.health.2022.100107
83. Hoodbhoy Z, Masroor Jeelani S, Aziz A, et al. Machine learning for
child and adolescent health: a systematic review. Pediatrics.
2021;147(1):e2020011833. doi: 10.1542/peds.2020-011833
84. Hosni M, Abnane I, Idri A, et al. Reviewing ensemble classification
methods in breast cancer. Comput Methods Programs Biomed.
2019;177:89–112. doi: 10.1016/j.cmpb.2019.05.019
85. Hoyos W, Aguilar J, Toro M. Dengue models based on machine
learning techniques: A systematic literature review. Artif Intell Med.
2021;119:102157. doi: 10.1016/j.artmed.2021.102157
86. Huang J, Shlobin NA, DeCuypere M, et al. Deep learning for out-
come prediction in Neurosurgery: a systematic review of design,
reporting, and reproducibility. Neurosurgery. 2022;90(1):16–38. doi:
10.1227/NEU.0000000000001736
87. Huang S, Dang J, Sheckter CC, et al. A systematic review of
machine learning and automation in burn wound evaluation:
a promising but developing frontier. Burns. 2021;47(8):1691–1704.
doi: 10.1016/j.burns.2021.07.007
88. Huang Z, Aarab G, Ravesloot MJL, et al. Prediction of the obstruc-
tion sites in the upper airway in sleep-disordered breathing based
on snoring sound parameters: a systematic review. Sleep Med.
2021;88:116–133. doi: 10.1016/j.sleep.2021.10.015
89. Ibrahim B, Suppiah S, Ibrahim N, et al. Diagnostic power of resting-
state fMRI for detection of network connectivity in Alzheimer’s
disease and mild cognitive impairment: a systematic review. Hum
Brain Mapp. 2021;42(9):2941–2968. doi: 10.1002/hbm.25369
90. Infante T, Cavaliere C, Punzo B, et al. Radiogenomics and artificial
intelligence approaches applied to cardiac computed tomography
angiography and cardiac magnetic resonance for precision medicine
in coronary heart disease: a systematic review. Circ Cardiovasc Imaging.
2021;14(12):1133–1146. doi: 10.1161/CIRCIMAGING.121.013025
91. Irgang L, Barth H, Holmén M. Data-driven technologies as enablers
for value Creation in the prevention of surgical site infections:
a systematic review. J Healthc Informatics Res. 2023;7(1):1–41. doi:
10.1007/s41666-023-00129-2
92. Islam MN, Mustafina SN, Mahmud T, et al. Machine learning to
predict pregnancy outcomes: a systematic review, synthesizing
framework and future research agenda. BMC Pregnancy
Childbirth. 2022;22(1). doi: 10.1186/s12884-022-04594-2
93. Jiang K, Jiang X, Pan J, et al. Current evidence and future perspec-
tive of accuracy of Artificial intelligence Application for early gastric
cancer diagnosis with endoscopy: a systematic and meta-analysis.
Front Med. 2021;8:629080. doi: 10.3389/fmed.2021.629080
94. Jiang MY, Ma YX, Guo SY, et al. Using Machine learning technolo-
gies in pressure injury management: systematic review. JMIR Med
Inform. 2021;9(3):e25704. doi: 10.2196/25704
95. Jones OT, Calanzani N, Saji S, et al. Artificial intelligence techniques
that may be applied to primary care data to facilitate earlier
diagnosis of cancer: systematic review. J Med Internet Res.
2021;23(3):N.PAG–N.PAG. doi: 10.2196/23483
96. Kalhori SRN, Tanhapour M, Gholamzadeh M. Enhanced childhood
diseases treatment using computational models: systematic review
of intelligent experiments heading to precision medicine. J Biomed
Informat. 2021;115:115. doi: 10.1016/j.jbi.2021.103687
97. Kareemi H, Vaillancourt C, Rosenberg H, et al. Machine learning
versus usual care for diagnostic and prognostic prediction in the
emergency department: a systematic review. Acad Emerg Med.
2021;28(2):184–196. doi: 10.1111/acem.14190
98. Karwath A, Bunting KV, Gill SK, et al. Redefining β-blocker response
in heart failure patients with sinus rhythm and atrial fibrillation:
a machine learning cluster analysis. Lancet. 2021;398
(10309):1427–1435. doi: 10.1016/S0140-6736(21)01638-X
99. Kassem MA, Hosny KM, Damaševičius R, et al. Machine learning and
Deep learning methods for skin lesion classification and diagnosis:
a systematic review. Diagnostics. 2021;11(8):1390. doi: 10.3390/
diagnostics11081390
100. Kausch SL, Moorman JR, Lake DE, et al. Physiological machine
learning models for prediction of sepsis in hospitalized adults: an
integrative review. Intensive Crit Care Nurs. 2021;65:103035. doi:
10.1016/j.iccn.2021.103035
101. Kawamoto A, Takenaka K, Okamoto R, et al. Systematic review of
artificial intelligence-based image diagnosis for inflammatory bowel
disease. Dig Endosc. 2022;34(7):1311–1319. doi: 10.1111/den.14334
102. Kedra J, Radstake T, Pandit A, et al. Current status of use of big data
and artificial intelligence in RMDs: a systematic literature review
informing EULAR recommendations. RMD Open. 2019;5(2):
e001004. doi: 10.1136/rmdopen-2019-001004
103. Kennedy EE, Bowles KH, Aryal S. Systematic review of prediction
models for postacute care destination decision-making. J Am Med
Informatics Assoc. 2022;29(1):176–186. doi: 10.1093/jamia/ocab197
104. Khanagar SB, Naik S, Al Kheraif AA, et al. Application and perfor-
mance of Artificial intelligence technology in Oral cancer diagnosis
and prediction of prognosis: a systematic review. Diagnostics.
2021;11(6):1004. doi: 10.3390/diagnostics11061004
105. Kim HR, Sung M, Park JA, et al. Analyzing adverse drug reaction
using statistical and machine learning methods: a systematic
review. Medicine. 2022;101(25):E29387. doi: 10.1097/MD.
0000000000029387
106. Kim SS. Recent trends of artificial intelligence and machine learning
for insomnia research. Chronobiol Med. 2021;3(1):16–19. doi: 10.
33069/cim.2021.0008
107. Kodama S, Fujihara K, Shiozaki H, et al. Ability of current machine
learning algorithms to predict and detect hypoglycemia in patients
with diabetes mellitus: meta-analysis. JMIR Diabetes. 2021;6(1):
e22458. doi: 10.2196/22458
108. Komolafe TE, Cao Y, Nguchu BA, et al. Diagnostic test accuracy of
deep learning detection of COVID-19: a systematic review and
meta-analysis. Acad Radiol. 2021;28(11):1507–1523. doi: 10.1016/j.
acra.2021.08.008
109. Kourou K, Exarchos KP, Papaloukas C, et al. Applied machine learn-
ing in cancer research: a systematic review for patient diagnosis,
classification and prognosis. Comput Struct Biotechnol J.
2021;19:5546–5555. doi: 10.1016/j.csbj.2021.10.006
110. Kozikowski M, Suarez-Ibarrola R, Osiecki R, et al. Role of radiomics
in the prediction of muscle-invasive bladder cancer: a systematic
review and meta-analysis. Eur Urol Focus. 2021;8(3):728–738. doi:
10.1016/j.euf.2021.05.005
111. Kumar S, Oh I, Schindler S, et al. Machine learning for modeling the
progression of Alzheimer disease dementia using clinical data:
a systematic literature review. JAMIA Open. 2021;4(3):ooab052.
doi: 10.1093/jamiaopen/ooab052
112. Kumar Y, Gupta S, Singla R, et al. A systematic review of artificial
intelligence techniques in cancer prediction and diagnosis. Archiv
Comput Methods Eng. 2022;29(4):2043–2070.
113. Kuntz S, Krieghoff-Henning E, Kather JN, et al. Gastrointestinal
cancer classification and prognostication from histology using
deep learning: systematic review. Eur J Cancer. 2021;155:200–215.
doi: 10.1016/j.ejca.2021.07.012
114. La Greca Saint-Esteven A, Vuong D, Tschanz F, et al. Systematic review
on the association of radiomics with tumor biological endpoints.
Cancers (Basel). 2021;13(12). doi: 10.3390/cancers13123015
115. Langarizadeh M, Sayadi M. Machine learning techniques for diag-
nosis of lower gastrointestinal cancer: a systematic review. Iran Red
Crescent Med J. 2021;23(7):e436.
116. Le Glaz A, Haralambous Y, Kim-Dufor DH, et al. Machine learning
and natural language processing in mental health: systematic
review. J Med Internet Res. 2021;23(5):e15708. doi: 10.2196/15708
117. Lecointre L, Dana J, Lodi M, et al. Artificial intelligence-based radio-
mics models in endometrial cancer: a systematic review. Eur J Surg
Oncol. 2021;47(11):2734–2741. doi: 10.1016/j.ejso.2021.06.023
118. Lequertier V, Wang T, Fondrevelle J, et al. Hospital length of stay
prediction methods: a systematic review. Med Care. 2021;59
(10):929–938. doi: 10.1097/MLR.0000000000001596
119. Li JX, Zhou ZJ, Dong JY, et al. Predicting breast cancer 5-year
survival using machine learning: a systematic review. PLoS One.
2021;16(4):e0250370. doi: 10.1371/journal.pone.0250370
120. Li MD, Ahmed SR, Choy E, et al. Artificial intelligence applied to
musculoskeletal oncology: a systematic review. Skeletal Radiol.
2022;51(2):245–256. doi: 10.1007/s00256-021-03820-w
121. Li Y, Wang X, Zhang J, et al. Applications of artificial intelligence
(AI) in researches on non-alcoholic fatty liver disease (NAFLD):
a systematic review. Rev Endocr Metab Disord. 2021;23
(3):387–400. doi: 10.1007/s11154-021-09681-x
122. Librenza-Garcia D, Kotzian BJ, Yang J, et al. The impact of machine
learning techniques in the study of bipolar disorder: a systematic
review. Neuroscience & Biobehavioral Reviews. 2017;80:538–554.
doi: 10.1016/j.neubiorev.2017.07.004
123. Lima CLD, da Silva ACG, Moreno GMM, et al. Temporal and spatio-
temporal arboviruses forecasting by machine learning: a systematic
review. Front Public Health. 2022;10: doi: 10.3389/fpubh.2022.
900077
124. Locquet M, Diep AN, Beaudart C, et al. A systematic review of
prediction models to diagnose COVID-19 in adults admitted to
healthcare centers. Arch Public Health. 2021;79(1):105. doi: 10.
1186/s13690-021-00630-3
125. Lopez CD, Gazgalis A, Boddapati V, et al. Artificial learning and
Machine learning decision guidance applications in total hip and
knee arthroplasty: a systematic review. Arthroplast Today.
2021;11:103–112. doi: 10.1016/j.artd.2021.07.012
126. Lubelski D, Hersh A, Azad TD, et al. Prediction models in degen-
erative spine surgery: a systematic review. Global Spine J. 2021;11
(1_suppl):79s–88s. doi: 10.1177/2192568220959037
127. Maile H, JPO L, Gore D, et al. Machine learning algorithms to detect
subclinical keratoconus: systematic review. JMIR Med Inform.
2021;9(12):e27363. doi: 10.2196/27363
128. Mangold C, Zoretic S, Thallapureddy K, et al. Machine learning
models for predicting neonatal mortality: a systematic review.
Neonatology. 2021;118(4):394–405. doi: 10.1159/000516891
129. Mari T, Henderson J, Maden M, et al. Systematic review of the effective-
ness of machine learning algorithms for classifying pain intensity, phe-
notype or treatment outcomes using electroencephalogram data.
J Pain. 2021;23(3):349–369. doi: 10.1016/j.jpain.2021.07.011
130. Matsangidou M, Liampas A, Pittara M, et al. Machine learning in
pain medicine: an up-to-date systematic review. Pain Ther. 2021;10
(2):1067–1084. doi: 10.1007/s40122-021-00324-2
131. Mawdsley E, Reynolds B, Cullen B. A systematic review of the
effectiveness of machine learning for predicting psychosocial out-
comes in acquired brain injury: which algorithms are used and
why? J Neuropsychol. 2021;15(3):319–339. doi: 10.1111/jnp.12244
132. Medic G, Kosaner Kließ M, Atallah L, et al. Evidence-based clinical
decision support systems for the prediction and detection of three
disease states in critical care: a systematic literature review.
F1000Res. 2019;8:1728. doi: 10.12688/f1000research.20498.2
133. Mei J, Desrosiers C, Frasnelli J. Machine learning for the diagnosis
of Parkinson’s disease: a review of literature. Front Aging Neurosci.
2021;13:633752. doi: 10.3389/fnagi.2021.633752
134. Mellia JA, Basta MN, Toyoda Y, et al. Natural language processing in
surgery: a systematic review and meta-analysis. Ann Surg. 2021;273
(5):900–908. doi: 10.1097/SLA.0000000000004419
135. Miltiadous A, Tzimourta KD, Giannakeas N, et al. Machine learning
algorithms for epilepsy detection based on published EEG data-
bases: a systematic review. IEEE Access. 2023;11:564–594. doi: 10.
1109/ACCESS.2022.3232563
136. Minissi ME, Chicchi Giglioli IA, Mantovani F, et al. Assessment of the
autism spectrum disorder based on machine learning and social
visual attention: a systematic review. J Autism Dev Disord. 2021;52
(5):2187–2202. doi: 10.1007/s10803-021-05106-5
137. Miranda L, Paul R, Pütz B, et al. Systematic review of functional MRI
applications for psychiatric disease subtyping. Front Psychiatry.
2021;12:665536. doi: 10.3389/fpsyt.2021.665536
138. Mirzania D, Thompson AC, Muir KW. Applications of deep learning
in detection of glaucoma: a systematic review. Eur J Ophthalmol.
2021;31(4):1618–1642. doi: 10.1177/1120672120977346
139. Moezzi M, Shirbandi K, Shahvandi HK, et al. The diagnostic accuracy
of Artificial intelligence-assisted CT imaging in COVID-19 disease:
a systematic review and meta-analysis. Inform Med Unlocked.
2021;24:100591. doi: 10.1016/j.imu.2021.100591
140. Moglia A, Georgiou K, Georgiou E, et al. A systematic review on
artificial intelligence in robot-assisted surgery. Int J Surg.
2021;95:106151. doi: 10.1016/j.ijsu.2021.106151
141. Mohan BP, Khan SR, Kassab LL, et al. High pooled performance of
convolutional neural networks in computer-aided diagnosis of GI
ulcers and/or hemorrhage on wireless capsule endoscopy images:
a systematic review and meta-analysis. Gastrointest Endosc.
2021;93(2):356–364.e354. doi: 10.1016/j.gie.2020.07.038
142. Mondal MRH, Bharati S, Podder P. Diagnosis of COVID-19 using
machine learning and deep learning: a review. Curr Med Imaging.
2021;17(12):1403–1418. doi: 10.2174/1573405617666210713113439
143. Montazeri M, ZahediNasab R, Farahani A, et al. Machine learning
models for image-based diagnosis and prognosis of COVID-19:
systematic review. JMIR Med Inform. 2021;9(4):e25181. doi: 10.
2196/25181
144. Moor M, Rieck B, Horn M, et al. Early prediction of sepsis in the ICU
using Machine learning: a systematic review. Front Med.
2021;8:607952. doi: 10.3389/fmed.2021.607952
145. Moshawrab M, Adda M, Bouzouane A, et al. Smart wearables for
the detection of Cardiovascular diseases: a systematic literature
review. Sensors. 2023;23(2):828. doi: 10.3390/s23020828
146. Moura FSE, Amin K, Ekwobi C. Artificial intelligence in the management
and treatment of burns: a systematic review. Burns Trauma.
2021;9: doi: 10.1093/burnst/tkab022
147. Mpanya D, Celik T, Klug E, et al. Predicting mortality and hospita-
lization in heart failure using machine learning: a systematic litera-
ture review. Int J Cardiol Heart Vasc. 2021;34:100773. doi: 10.1016/j.
ijcha.2021.100773
148. Mughal H, Javed AR, Rizwan M, et al. Parkinson’s disease manage-
ment via Wearable sensors: a systematic review. IEEE Access.
2022;10:35219–35237. doi: 10.1109/ACCESS.2022.3162844
149. Musa N, AYu G, Aljojo N, et al. A systematic review and meta-data
analysis on the applications of Deep learning in electrocardiogram.
J Ambient Intell Humaniz Comput. 2022;14(7):1–74.
150. Musulin J, Baressi Šegota S, Štifanić D, et al. Application of Artificial
intelligence-based regression methods in the problem of COVID-19
spread prediction: a systematic review. Int J Environ Res Public
Health. 2021;18(8):4287. doi: 10.3390/ijerph18084287
151. Nadarajah R, Alsaeed E, Hurdus B, et al. Prediction of incident atrial
fibrillation in community-based electronic health records:
a systematic review with meta-analysis. Heart. 2021;108
(13):1020–1029. doi: 10.1136/heartjnl-2021-320036
152. Naemi A, Schmidt T, Mansourvar M, et al. Machine learning tech-
niques for mortality prediction in emergency departments:
a systematic review. BMJ Open. 2021;11(11):e052663. doi: 10.
1136/bmjopen-2021-052663
153. Nafea MS, Ismail ZH. Supervised machine learning and Deep learn-
ing techniques for epileptic seizure recognition using EEG signals—
A systematic literature review. Bioeng. 2022;9(12):781. doi: 10.3390/
bioengineering9120781
154. Nasser M, Yusof UK. Deep learning based methods for breast
cancer diagnosis: a systematic review and future direction.
Diagnostics. 2023;13(1). doi: 10.3390/diagnostics13010161
155. Nazarian S, Glover B, Ashrafian H, et al. Diagnostic accuracy of
Artificial intelligence and Computer-aided diagnosis for the detec-
tion and characterization of colorectal polyps: systematic review
and meta-analysis. J Med Internet Res. 2021;23(7):e27370. doi: 10.
2196/27370
156. Nguyen AV, Blears EE, Ross E, et al. Machine learning applications
for the differentiation of primary central nervous system lymphoma
from glioblastoma on imaging: a systematic review and
meta-analysis. Neurosurg Focus. 2018;45(5):E5. doi: 10.3171/2018.
8.FOCUS18325
157. Ogink PT, Groot OQ, Bindels BJJ, et al. The use of machine learning
prediction models in spinal surgical outcome: an overview of cur-
rent development and external validation studies. Semin Spine
Surg. 2021;33(2):100872. doi: 10.1016/j.semss.2021.100872
158. Ogink PT, Groot OQ, Karhade AV, et al. Wide range of applications
for machine-learning prediction models in orthopedic surgical out-
come: a systematic review. Acta Orthop. 2021;92(5):526–531. doi:
10.1080/17453674.2021.1932928
159. Ortíz-Barrios MA, Coba-Blanco DM, Alfaro-Saíz JJ, et al. Process
improvement approaches for increasing the response of emer-
gency departments against the COVID-19 pandemic: a systematic
review. Int J Environ Res Public Health. 2021;18(16):8814. doi: 10.
3390/ijerph18168814
160. Ossai CI, Wickramasinghe N. Intelligent decision support with
machine learning for efficient management of mechanical ventilation
in the intensive care unit: a critical overview. Int J Med
Inform. 2021;150:104469. doi: 10.1016/j.ijmedinf.2021.104469
161. Paganelli AI, Mondéjar AG, da Silva AC, et al. Real-time data analysis
in health monitoring systems: a comprehensive systematic litera-
ture review. J Biomed Informat. 2022;127:104009. doi: 10.1016/j.jbi.
2022.104009
162. Pahwa B, Bali O, Goyal S, et al. Applications of machine learning in
pediatric hydrocephalus: a systematic review. Neurol India. 2021;69
(8):S568–S577. doi: 10.4103/0028-3886.332287
163. Patil S, Habib Awan K, Arakeri G, et al. Machine learning and its
potential applications to the genomic study of head and neck
cancer—A systematic review. J Oral Pathol Med. 2019;48
(9):773–779. doi: 10.1111/jop.12854
164. Peralta M, Jannin P, Baxter JSH. Machine learning in deep brain
stimulation: a systematic review. Artif Intell Med. 2021;122:122.
doi: 10.1016/j.artmed.2021.102198
165. Persad E, Jost K, Honoré A, et al. Neonatal sepsis prediction
through clinical decision support algorithms: a systematic review.
Acta Paediatr. 2021;110(12):3201–3226. doi: 10.1111/apa.16083
166. Popa SL, Ismaiel A, Cristina P, et al. Non-alcoholic fatty liver disease:
implementing complete automated diagnosis and staging.
a systematic review. Diagnostics. 2021;11(6):1078. doi: 10.3390/
diagnostics11061078
167. Prasoppokakorn T, Tiyarattanachai T, Chaiteerakij R, et al.
Application of artificial intelligence for diagnosis of pancreatic
ductal adenocarcinoma by EUS: a systematic review and
meta-analysis. Endosc Ultrasound. 2021;11(1):17–26.
168. Quaak M, van de Mortel L, Thomas RM, et al. Deep learning
applications for the classification of psychiatric disorders using
neuroimaging data: systematic review and meta-analysis.
NeuroImage Clin. 2021;30:102584. doi: 10.1016/j.nicl.2021.102584
169. Quartuccio N, Marrale M, Laudicella R, et al. The role of PET radio-
mic features in prostate cancer: a systematic review. Clin Transl
Imaging. 2021;9(6):579–588. doi: 10.1007/s40336-021-00436-x
170. Ramesh S, Chokkara S, Shen T, et al. Applications of artificial
intelligence in pediatric oncology: a systematic review. JCO Clin
Cancer Inform. 2021;5(5):1208–1219. doi: 10.1200/CCI.21.00102
171. Ramos-Lima LF, Waikamp V, Antonelli-Salgado T, et al. The use of
machine learning techniques in trauma-related disorders:
a systematic review. J Psychiatr Res. 2020;121:159–172. doi: 10.
1016/j.jpsychires.2019.12.001
172. Ravegnini G, Ferioli M, Morganti AG, et al. Radiomics and artificial
intelligence in uterine sarcomas: a systematic review. J Pers Med.
2021;11(11):1179. doi: 10.3390/jpm11111179
173. Ren M, Yi PH. Artificial intelligence in orthopedic implant model
classification: a systematic review. Skeletal Radiol. 2022;51
(2):407–416. doi: 10.1007/s00256-021-03884-8
174. Rice P, Pugh M, Geraghty R, et al. Machine learning models for
predicting stone-free status after shockwave lithotripsy:
a systematic review and meta-analysis. Urology. 2021;156:16–22.
doi: 10.1016/j.urology.2021.04.006
175. Rowe TW, Katzourou IK, Stevenson-Hoare JO, et al. Machine learn-
ing for the life-time risk prediction of Alzheimer’s disease:
a systematic review. Brain Commun. 2021;3(4):fcab246. doi: 10.
1093/braincomms/fcab246
176. Safaei M, Sundararajan EA, Driss M, et al. A systematic literature
review on obesity: understanding the causes & consequences of
obesity and reviewing various machine learning approaches used
to predict obesity. Comput Biol Med. 2021;136:104754. doi: 10.
1016/j.compbiomed.2021.104754
177. Sajjadian M, Lam RW, Milev R, et al. Machine learning in the
prediction of depression treatment outcomes: a systematic review
and meta-analysis. Psychol Med. 2021;51(16):2742–2751. doi: 10.
1017/S0033291721003871
178. Salas-Zárate R, Alor-Hernández G, Salas-Zárate MDP, et al.
Detecting depression signs on social media: a systematic literature
review. Healthcare. 2022;10(2):291. doi: 10.3390/
healthcare10020291
179. Salem H, Soria D, Lund JN, et al. A systematic review of the
applications of expert systems (ES) and machine learning (ML) in
clinical urology. BMC Med Inform Decis Mak. 2021;21(1):223. doi:
10.1186/s12911-021-01585-9
180. Sanmarchi F, Fanconi C, Golinelli D, et al. Predict, diagnose, and
treat chronic kidney disease with machine learning: a systematic
literature review. J Nephrol. 2023;36(4):1101–1117. doi: 10.1007/
s40620-023-01573-4
181. Sanmarchi F, Fanconi C, Golinelli D, et al. Correction to: predict,
diagnose, and treat chronic kidney disease with machine learning:
a systematic literature review. J Nephrol.
2023;2023(4):1219–1219. doi: 10.1007/s40620-023-01573-4
182. Saputro SA, Pattanaprateep O, Pattanateepapon A, et al. Prognostic
models of diabetic microvascular complications: a systematic
review and meta-analysis. Syst Rev. 2021;10(1):288. doi: 10.1186/
s13643-021-01841-z
183. Sardar SK, Kumar N, Lee SC. A systematic literature review on
machine learning algorithms for human status detection. IEEE
Access. 2022;10:74366–74382. doi: 10.1109/ACCESS.2022.3190967
184. Scardoni A, Balzarini F, Signorelli C, et al. Artificial
intelligence-based tools to control healthcare associated infections:
a systematic review of the literature. J Infect Public Health. 2020;13
(8):1061–1077. doi: 10.1016/j.jiph.2020.06.006
185. Segato A, Marzullo A, Calimeri F, et al. Artificial intelligence for
brain diseases: a systematic review. APL Bioeng. 2020;4(4):041503.
doi: 10.1063/5.0011697
186. Senanayake S, White N, Graves N, et al. Machine learning in pre-
dicting graft failure following kidney transplantation: a systematic
review of published predictive models. Int J Med Inform.
2019;130:130. doi: 10.1016/j.ijmedinf.2019.103957
187. Senders JT, Arnaout O, Karhade AV, et al. Natural and artificial
intelligence in neurosurgery: a systematic review. Neurosurgery.
2018;83(2):181–192. doi: 10.1093/neuros/nyx384
188. Shi Z, Zhang Z, Liu Z, et al. Methodological quality of machine
learning-based quantitative imaging analysis studies in esophageal
cancer: a systematic review of clinical outcome prediction after
concurrent chemoradiotherapy. Eur J Nucl Med Mol Imaging.
2021;49(8):2462–2481. doi: 10.1007/s00259-021-05658-9
189. Shillan D, Sterne JAC, Champneys A, et al. Use of machine learning
to analyse routinely collected intensive care unit data: a systematic
review. Crit Care. 2019;23(1):284. doi: 10.1186/s13054-019-2564-9
190. Shin S, Austin PC, Ross HJ, et al. Machine learning vs. conventional
statistical models for predicting heart failure readmission and
mortality. ESC Heart Fail. 2021;8(1):106–115. doi: 10.1002/ehf2.13073
191. Shlobin NA, Baig AA, Waqas M, et al. Artificial intelligence for
large-vessel occlusion stroke: a systematic review. World
Neurosurg. 2021 Mar;159:207–220.
192. Siddiqui S, Arifeen M, Hopgood A, et al. Deep learning models for
the diagnosis and screening of COVID-19: a systematic review. SN
Comput Sci. 2022;3(5):397. doi: 10.1007/s42979-022-01326-3
193. Smets J, Shevroja E, Hügle T, et al. Machine learning solutions for
osteoporosis—a review. J Bone Mineral Res. 2021;36(5):833–851.
doi: 10.1002/jbmr.4292
194. Soffer S, Klang E, Shimon O, et al. Deep learning for pulmonary
embolism detection on computed tomography pulmonary angio-
gram: a systematic review and meta-analysis. Sci Rep. 2021;11
(1):15814. doi: 10.1038/s41598-021-95249-3
195. Song DY, Topriceanu CC, Ilie-Ablachim DC, et al. Machine learning
with neuroimaging data to identify autism spectrum disorder:
a systematic review and meta-analysis. Neuroradiology. 2021;63
(12):2057–2072. doi: 10.1007/s00234-021-02774-z
196. Song X, Liu X, Liu F, et al. Comparison of machine learning and
logistic regression models in predicting acute kidney injury:
a systematic review and meta-analysis. Int J Med Inform.
2021;151:104484. doi: 10.1016/j.ijmedinf.2021.104484
197. Sorin V, Barash Y, Konen E, et al. Deep learning for natural language
processing in radiology-fundamentals and a systematic review. J Am
Coll Radiol. 2020;17(5):639–648. doi: 10.1016/j.jacr.2019.12.026
198. Srivani M, Murugappan A, Mala T. Cognitive computing technological
trends and future research directions in healthcare: a systematic
literature review. Artif Intell Med. 2023;138:102513.
doi: 10.1016/j.artmed.2023.102513
199. Stephens ME, O’Neal CM, Westrup AM, et al. Utility of machine
learning algorithms in degenerative cervical and lumbar spine
disease: a systematic review. Neurosurg Rev. 2021;45(2):965–978.
doi: 10.1007/s10143-021-01624-z
200. Stewart J, Lu J, Goudie A, et al. Applications of machine learning to
undifferentiated chest pain in the emergency department:
a systematic review. PLoS One. 2021;16(8):e0252612. doi: 10.1371/
journal.pone.0252612
201. Stokes K, Castaldo R, Federici C, et al. The use of artificial intelli-
gence systems in diagnosis of pneumonia via signs and symptoms:
a systematic review. Biomedical Signal Processing And Control.
2022;72:103325. doi: 10.1016/j.bspc.2021.103325
202. Subramanian H, Dey R, Brim WR, et al. Trends in development of
Novel Machine learning methods for the identification of Gliomas
in datasets that include non-glioma images: a systematic review.
Front Oncol. 2021;11:788819. doi: 10.3389/fonc.2021.788819
203. Syeda HB, Syed M, Sexton KW, et al. Role of machine learning
techniques to tackle the covid-19 crisis: systematic review. JMIR
Med Inform. 2021;9(1):e23811. doi: 10.2196/23811
204. Syer T, Mehta P, Antonelli M, et al. Artificial intelligence compared
to Radiologists for the initial diagnosis of prostate cancer on mag-
netic resonance imaging: a systematic review and recommenda-
tions for future studies. Cancers (Basel). 2021;13(13):3318. doi: 10.
3390/cancers13133318
205. Tabatabaei M, Razaei A, Sarrami AH, et al. Current status and
quality of machine learning-based radiomics studies for glioma
grading: a systematic review. Oncology. 2021;99(7):433–443. doi:
10.1159/000515597
206. Tan KR, Seng JJB, Kwan YH, et al. Evaluation of machine learning
methods developed for prediction of Diabetes complications:
a systematic review. J Diabetes Sci Technol. 2021;17(2):474–489.
doi: 10.1177/19322968211056917
207. Teo YH, Lim ICZY, Tseng FS, et al. Predicting clinical outcomes in
acute ischemic stroke patients undergoing endovascular throm-
bectomy with machine learning: a systematic review and
meta-analysis. Clin Neuroradiol. 2021;31(4):1121–1130. doi: 10.
1007/s00062-020-00990-3
208. Tewarie IA, Senders JT, Kremer S, et al. Survival prediction of
glioblastoma patients-are we there yet? a systematic review of
prognostic modeling for glioblastoma and its clinical potential.
Neurosurg Rev. 2021;44(4):2047–2057. doi: 10.1007/s10143-020-
01430-z
209. Triantafyllidis A, Polychronidou E, Alexiadis A, et al. Computerized
decision support and machine learning applications for the pre-
vention and treatment of childhood obesity: a systematic review of
the literature. Artif Intell Med. 2020;104:101844. doi: 10.1016/j.
artmed.2020.101844
210. Ugga L, Perillo T, Cuocolo R, et al. Meningioma MRI radiomics and
machine learning: systematic review, quality score assessment, and
meta-analysis. Neuroradiology. 2021;63(8):1293–1304. doi: 10.1007/
s00234-021-02668-0
211. van Kempen EJ, Post M, Mannil M, et al. Accuracy of machine
learning algorithms for the classification of molecular features of
gliomas on MRI: a systematic literature review and meta-analysis.
Cancers (Basel). 2021;13(11):2606. doi: 10.3390/cancers13112606
212. van Kempen EJ, Post M, Mannil M, et al. Performance of machine
learning algorithms for glioma segmentation of brain MRI:
a systematic literature review and meta-analysis. Eur Radiol.
2021;31(12):9638–9653. doi: 10.1007/s00330-021-08035-0
213. Volpe S, Pepa M, Zaffaroni M, et al. Machine learning for head and
neck cancer: a safe bet?-a clinically oriented systematic review for
the radiation oncologist. Front Oncol. 2021;11:772663. doi: 10.
3389/fonc.2021.772663
214. Wang W, Kiik M, Peek N, et al. A systematic review of machine learning
models for predicting outcomes of stroke with structured data. PLoS
One. 2020;15(6):e0234722. doi: 10.1371/journal.pone.0234722
215. Wesselius FJ, van Schie MS, De Groot NMS, et al. Digital biomarkers
and algorithms for detection of atrial fibrillation using surface
electrocardiograms: a systematic review. Comput Biol Med.
2021;133:104404. doi: 10.1016/j.compbiomed.2021.104404
216. Wongkoblap A, Vadillo MA, Curcin V. Researching mental Health
disorders in the era of social media: systematic review. J Med
Internet Res. 2017;19(6):e228. doi: 10.2196/jmir.7215
217. Wu JH, Liu TYA, Hsu WT, et al. Performance and limitation of machine
learning algorithms for diabetic retinopathy screening: meta-analysis.
J Med Internet Res. 2021;23(7):e23863. doi: 10.2196/23863
218. Xu L, He B, Zhang Y, et al. Prognostic models for amyotrophic
lateral sclerosis: a systematic review. J Neurol. 2021;268
(9):3361–3370. doi: 10.1007/s00415-021-10508-7
219. Yan MY, Gustad LT, Nytrø Ø. Sepsis prediction, early detection, and
identification using clinical text for machine learning: a systematic
review. J Am Med Inform Assoc. 2021;29(3):559–575. doi: 10.1093/
jamia/ocab236
220. Yeo M, Tahayori B, Kok HK, et al. Review of deep learning algo-
rithms for the automatic detection of intracranial hemorrhages on
computed tomography head imaging. J Neurointerv Surg. 2021;13
(4):369–378. doi: 10.1136/neurintsurg-2020-017099
221. Yin J, Ngiam KY, Teo HH. Role of Artificial intelligence applications
in Real-life clinical practice: systematic review. J Med Internet Res.
2021;23(4):e25759. doi: 10.2196/25759
222. Yu K, Syed MN, Bernardis E, et al. Machine learning applications in
the evaluation and management of Psoriasis: a systematic review.
J Psoriasis Psoriatic Arthritis. 2020;5(4):147–159. doi: 10.1177/
2475530320950267
223. Zakhem GA, Fakhoury JW, Motosko CC, et al. Characterizing the
role of dermatologists in developing artificial intelligence for
assessment of skin cancer. J Am Acad Dermatol. 2021;85
(6):1544–1556. doi: 10.1016/j.jaad.2020.01.028
224. Zaunseder E, Haupt S, Mütze U, et al. Opportunities and challenges in
machine learning-based newborn screening-a systematic literature
review. JIMD Rep. 2022;63(3):250–261. doi: 10.1002/jmd2.12285
225. Zeiser FA, da Costa CA, Roehe AV, et al. Breast cancer intelligent
analysis of histopathological data: a systematic review. Appl Soft
Comput. 2021;113:107886. doi: 10.1016/j.asoc.2021.107886
226. Zhang L, Sun JQ, Jiang BB, et al. Development of artificial intelli-
gence in epicardial and pericoronary adipose tissue imaging:
a systematic review. Eur J Hybrid Imaging. 2021;5(1). doi: 10.1186/
s41824-021-00107-0
227. Zhao Y, Wood EP, Mirin N, et al. Social determinants in machine
learning Cardiovascular disease prediction models: a systematic
review. Am J Prev Med. 2021;61(4):596–605. doi: 10.1016/j.
amepre.2021.04.016
228. Zheng QH, Yang L, Zeng B, et al. Artificial intelligence performance
in detecting tumor metastasis from medical radiology imaging:
a systematic review and meta-analysis. ECLINICALMEDICINE.
2021;31:100669. doi: 10.1016/j.eclinm.2020.100669
229. Zheng Y, Dickson VV, Blecker S, et al. Identifying patients with
hypoglycemia using natural language processing: systematic litera-
ture review. JMIR Diabetes. 2022;7(2):e34681. doi: 10.2196/34681
230. Zhou Y, Ge YT, Shi XL, et al. Machine learning predictive models for
acute pancreatitis: a systematic review. Int J Med Inform.
2022;157:104641. doi: 10.1016/j.ijmedinf.2021.104641
231. Zhu T, Li K, Herrero P, et al. Deep learning for diabetes: a systematic
review. IEEE J Biomed Health Inform. 2021;25(7):2744–2757. doi: 10.
1109/JBHI.2020.3040225
232. Maniah, Soewito B, Lumban Gaol F, et al. A systematic literature
review: Risk analysis in cloud migration. J King Saud Univ
Comput Inf Sci. 2022;34(6, Part B):3111–3120. doi: 10.1016/j.
jksuci.2021.01.008
233. Kitchenham B, Pearl Brereton O, Budgen D, et al. Systematic literature
reviews in software engineering: a systematic literature review.
Inf Software Technol. 2009;51(1):7–15.
doi: 10.1016/j.infsof.2008.09.009
234. Kitchenham B, Charters S. Guidelines for performing systematic
literature reviews in software engineering. Technical Report EBSE
2007-001. 2007.
235. Brereton P, Kitchenham BA, Budgen D, et al. Lessons from applying
the systematic literature review process within the software engi-
neering domain. J Syst Software. 2007;80(4):571–583. doi: 10.1016/
j.jss.2006.07.009
236. Luo W, Phung D, Tran T, et al. Guidelines for developing and
reporting machine learning predictive models in Biomedical
research: a Multidisciplinary View. J Med Internet Res. 2016;18(12):
e323. doi: 10.2196/jmir.5870
237. Zeng X, Zhang Y, Kwong JS, et al. The methodological quality
assessment tools for preclinical and clinical studies, systematic
review and meta-analysis, and clinical practice guideline:
a systematic review. J Evid Based Med. 2015;8(1):2–10. doi: 10.
1111/jebm.12141
238. Kable AK, Pich J, Maslin-Prothero SE. A structured approach to
documenting a search strategy for publication: a 12 step guideline
for authors. Nurse Educ Today. 2012;32(8):878–886. doi: 10.1016/j.
nedt.2012.02.022
239. Kim HR, Sung M, Park JA, et al. Analyzing adverse drug reaction
using statistical and machine learning methods: a systematic
review. Medicine. 2022;101(25):e29387. doi: 10.1097/MD.
0000000000029387
240. Paganelli AI, Mondéjar AG, da Silva AC, et al. Real-time data analysis
in health monitoring systems: a comprehensive systematic litera-
ture review. J Biomed Informatics. 2022;127:104009. doi: 10.1016/j.
jbi.2022.104009
241. Ariji Y, Fukuda M, Kise Y, et al. Contrast-enhanced computed
tomography image assessment of cervical lymph node metastasis
in patients with oral cancer by using a deep learning system of
artificial intelligence. Oral Surg Oral Med Oral Pathol Oral Radiol.
2019;127(5):458–463. doi: 10.1016/j.oooo.2018.10.002
242. Ariji Y, Sugita Y, Nagao T, et al. CT evaluation of extranodal exten-
sion of cervical lymph node metastases in patients with oral squa-
mous cell carcinoma using deep learning classification. Oral Radiol.
2020;36(2):148–155. doi: 10.1007/s11282-019-00391-4
243. Kwon JM, Jeon KH, Kim HM, et al. Comparing the performance of
artificial intelligence and conventional diagnosis criteria for detecting
left ventricular hypertrophy using electrocardiography. EP Europace.
2020;22(3):412–419. doi: 10.1093/europace/euz324
244. Nakajima K, Kudo T, Nakata T, et al. Diagnostic accuracy of an
artificial neural network compared with statistical quantitation of
myocardial perfusion images: a Japanese multicenter study. Eur
J Nucl Med Mol Imaging. 2017;44(13):2280–2289. doi: 10.1007/
s00259-017-3834-x
245. Al-Aswad LA, Kapoor R, Chu CK, et al. Evaluation of a deep learning
system for identifying glaucomatous optic neuropathy based on
color fundus photographs. J Glaucoma. 2019;28(12):1029–1034.
doi: 10.1097/IJG.0000000000001319
246. Liu S, Graham SL, Schulz A, et al. A Deep Learning-Based Algorithm
Identifies Glaucomatous Discs Using Monoscopic Fundus
Photographs. Ophthalmol Glaucoma. 2018;1(1):15–22. doi: 10.
1016/j.ogla.2018.04.002
247. Phene S, Dunn RC, Hammel N, et al. Deep learning and glaucoma
specialists: the relative importance of optic disc features to predict
glaucoma referral in fundus photographs. Ophthalmol. 2019;126
(12):1627–1639. doi: 10.1016/j.ophtha.2019.07.024
248. Shibata N, Tanito M, Mitsuhashi K, et al. Development of a deep
residual learning algorithm to screen for glaucoma from fundus
photography. Sci Rep. 2018;8(1):14665. doi: 10.1038/s41598-018-
33013-w
249. Seo E, Jaccard N, Trikha S, et al. Automated evaluation of optic disc
images for manifest glaucoma detection using a deep-learning,
neural network-based algorithm. Invest Ophthalmol Visual Sci.
2018;59(9):2080–2080.
250. Jammal AA, Thompson AC, Mariottoni EB, et al. Human versus
Machine: comparing a Deep learning algorithm to human gradings
for detecting glaucoma on fundus photographs. Am J Ophthalmol.
2020;211:123–131. doi: 10.1016/j.ajo.2019.11.006
251. Brinker TJ, Hekler A, Enk AH, et al. Deep learning outperformed 136
of 157 dermatologists in a head-to-head dermoscopic melanoma
image classification task. Eur J Cancer. 2019;113:47–54. doi: 10.
1016/j.ejca.2019.04.001
252. Brinker TJ, Hekler A, Enk AH, et al. Deep neural networks are
superior to dermatologists in melanoma image classification. Eur
J Cancer. 2019;119:11–17. doi: 10.1016/j.ejca.2019.05.023
253. Yu C, Yang S, Kim W, et al. Acral melanoma detection using
a convolutional neural network for dermoscopy images. PLoS
One. 2018;13(3):e0193321. doi: 10.1371/journal.pone.0193321
254. Marchetti MA, Codella NCF, Dusza SW, et al. Results of the 2016
international skin imaging collaboration international symposium
on biomedical imaging challenge: comparison of the accuracy of
computer algorithms to dermatologists for the diagnosis of mela-
noma from dermoscopic images. J Am Acad Dermatol. 2018;78
(2):270–277.e1. doi: 10.1016/j.jaad.2017.08.016
255. Marchetti MA, Liopyris K, Dusza SW, et al. Computer algorithms
show potential for improving dermatologists’ accuracy to diagnose
cutaneous melanoma: results of the international skin imaging
collaboration 2017. J Am Acad Dermatol. 2020;82(3):622–627. doi:
10.1016/j.jaad.2019.07.016
256. Haenssle HA, Fink C, Schneiderbauer R, et al. Man against machine:
diagnostic performance of a deep learning convolutional neural
network for dermoscopic melanoma recognition in comparison to
58 dermatologists. Ann Oncol. 2018;29(8):1836–1842. doi: 10.1093/
annonc/mdy166
257. Haenssle HA, Fink C, Toberer F, et al. Man against machine
reloaded: performance of a market-approved convolutional neural
network in classifying a broad spectrum of skin lesions in compar-
ison with 96 dermatologists working under less artificial conditions.
Ann Oncol. 2020;31(1):137–143. doi: 10.1016/j.annonc.2019.10.013
258. Haenssle HA, Winkler JK, Fink C, et al. Skin lesions of face and scalp
- classification by a market-approved convolutional neural network
in comparison with 64 dermatologists. Eur J Cancer.
2021;144:192–199. doi: 10.1016/j.ejca.2020.11.034
259. Tschandl P, Codella N, Akay BN, et al. Comparison of the accuracy
of human readers versus machine-learning algorithms for pigmen-
ted skin lesion classification: an open, web-based, international,
diagnostic study. Lancet Oncol. 2019;20(7):938–947. doi: 10.1016/
S1470-2045(19)30333-X
260. Maron RC, Weichenthal M, Utikal JS, et al. Systematic outperfor-
mance of 112 dermatologists in multiclass skin cancer image clas-
sification by convolutional neural networks. Eur J Cancer.
2019;119:57–65. doi: 10.1016/j.ejca.2019.06.013
261. Tschandl P, Rosendahl C, Akay BN, et al. Expert-level diagnosis of
nonpigmented skin cancer by combined convolutional neural
networks. JAMA Dermatol. 2019;155(1):58–65.
doi: 10.1001/jamadermatol.2018.4378
262. Fujisawa Y, Otomo Y, Ogata Y, et al. Deep-learning-based,
computer-aided classifier developed with a small dataset of clinical
images surpasses board-certified dermatologists in skin tumour
diagnosis. Br J Dermatol. 2019;180(2):373–381. doi: 10.1111/bjd.
16924
263. Jinnai S, Yamazaki N, Hirano Y, et al. The development of a skin
cancer classification system for pigmented skin lesions using Deep
learning. Biomolecules. 2020;10(8):1123. doi: 10.3390/
biom10081123
264. Han SS, Park I, Eun Chang S, et al. Augmented intelligence
Dermatology: deep neural networks empower medical profes-
sionals in diagnosing skin cancer and predicting treatment options
for 134 skin disorders. J Invest Dermatol. 2020;140(9):1753–1761.
doi: 10.1016/j.jid.2020.01.019
265. Han SS, Kim MS, Lim W, et al. Classification of the clinical images for
benign and malignant cutaneous tumors using a deep learning
algorithm. J Invest Dermatol. 2018;138(7):1529–1538. doi: 10.1016/
j.jid.2018.01.028
266. Brinker TJ, Hekler A, Enk AH, et al. A convolutional neural network
trained with dermoscopic images performed on par with 145
dermatologists in a clinical melanoma image classification task.
Eur J Cancer. 2019;111:148–154. doi: 10.1016/j.ejca.2019.02.005
267. Han SS, Moon IJ, Kim SH, et al. Assessment of deep neural networks for
the diagnosis of benign and malignant skin neoplasms in comparison
with dermatologists: a retrospective validation study. PLOS Med.
2020;17(11):e1003381. doi: 10.1371/journal.pmed.1003381
268. Brinker TJ, Schmitt M, Krieghoff-Henning EI, et al. Diagnostic per-
formance of artificial intelligence for histologic melanoma recogni-
tion compared to 18 international expert pathologists. J Am Acad
Dermatol. 2022;86(3):640–642. doi: 10.1016/j.jaad.2021.02.009
269. Brennan M, Puri S, Ozrazgat-Baslanti T, et al. Comparing clinical
judgment with the MySurgeryRisk algorithm for preoperative risk
assessment: a pilot usability study. Surgery. 2019;165(5):1035–1045.
doi: 10.1016/j.surg.2019.01.002
270. Kambakamba P, Mannil M, Herrera PE, et al. The potential of
machine learning to predict postoperative pancreatic fistula
based on preoperative, non-contrast-enhanced CT: a proof-of-
principle study. Surgery. 2020;167(2):448–454. doi: 10.1016/j.surg.
2019.09.019
271. Yala A, Schuster T, Miles R, et al. A Deep learning model to triage
screening mammograms: a simulation study. Radiology. 2019;293
(1):38–46. doi: 10.1148/radiol.2019182908
272. McKinney SM, Sieniek M, Godbole V, et al. International evaluation
of an AI system for breast cancer screening. Nature. 2020;577
(7788):89–94. doi: 10.1038/s41586-019-1799-6
273. Balta C, Rodríguez-Ruiz A, Mieskes C, et al. Going from double to
single reading for screening exams labeled as likely normal by AI:
what is the impact? In: 15th International Workshop on Breast
Imaging; Leuven, Belgium; 2020.
274. Dembrower K, Wåhlin E, Liu Y, et al. Effect of artificial
intelligence-based triaging of breast cancer screening mammo-
grams on cancer detection and radiologist workload:
a retrospective simulation study. Lancet Digit Health. 2020;2(9):
e468–e474. doi: 10.1016/S2589-7500(20)30185-0
275. Kyono T, Gilbert FJ, van der Schaar M. Improving workflow effi-
ciency for mammography using machine learning. J Am Coll
Radiol. 2020;17(1 Pt A):56–63. doi: 10.1016/j.jacr.2019.05.012
276. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Can we reduce
the workload of mammographic screening by automatic identifica-
tion of normal exams with artificial intelligence? A feasibility study.
Eur Radiol. 2019;29(9):4825–4832. doi: 10.1007/s00330-019-06186-9
277. Geras KJ, Wolfson S, Kim SG, et al. High-resolution breast cancer
screening with multi-view deep convolutional neural networks.
arXiv:1703.07047. 2017.
278. Lotter W, Diab AR, Haslam B, et al. Robust breast cancer detection
in mammography and digital breast tomosynthesis using an
annotation-efficient deep learning approach. Nat Med. 2021;27
(2):244–249. doi: 10.1038/s41591-020-01174-9
279. Rodríguez-Ruiz A, Krupinski E, Mordang JJ, et al. Detection of breast
cancer with Mammography: effect of an Artificial intelligence sup-
port system. Radiology. 2019;290(2):305–314. doi: 10.1148/radiol.
2018181371
280. Schaffter T, Buist DSM, Lee CI, et al. Evaluation of combined
Artificial intelligence and radiologist assessment to Interpret
screening mammograms. JAMA Netw Open. 2020;3(3):e200265.
doi: 10.1001/jamanetworkopen.2020.0265
281. Salim M, Wåhlin E, Dembrower K, et al. External evaluation of 3
commercial artificial intelligence algorithms for independent
assessment of screening mammograms. JAMA Oncol. 2020;6
(10):1581–1588. doi: 10.1001/jamaoncol.2020.3321
282. Cowley JB. The use of knowledge discovery databases in the
identification of patients with colorectal cancer. 2012.
283. Wei JW, Suriawinata AA, Vaickus LJ, et al. Evaluation of a deep
neural network for automated classification of colorectal polyps on
histopathologic slides. JAMA Netw Open. 2020;3(4):e203398. doi:
10.1001/jamanetworkopen.2020.3398
284. Song Z, Yu C, Zou S, et al. Automatic deep learning-based color-
ectal adenoma detection system and its similarities with
pathologists. BMJ Open. 2020;10(9):e036423.
doi: 10.1136/bmjopen-2019-036423
285. Bychkov D, Linder N, Turkki R, et al. Deep learning based tissue
analysis predicts outcome in colorectal cancer. Sci Rep. 2018;8
(1):3395. doi: 10.1038/s41598-018-21758-3
286. Geessink OGF, Baidoshvili A, Klaase JM, et al. Computer aided
quantification of intratumoral stroma yields an independent prog-
nosticator in rectal cancer. Cell Oncol. 2019;42(3):331–341. doi: 10.
1007/s13402-019-00429-z
287. Kather JN, Krisam J, Charoentong P, et al. Predicting survival from
colorectal cancer histology slides using deep learning:
a retrospective multicenter study. PLOS Med. 2019;16(1):
e1002730. doi: 10.1371/journal.pmed.1002730
288. Zhao K, Li Z, Yao S, et al. Artificial intelligence quantified
tumour-stroma ratio is an independent predictor for overall survi-
val in resectable colorectal cancer. EBioMedicine. 2020;61:103054.
doi: 10.1016/j.ebiom.2020.103054
289. Suh HB, Choi YS, Bae S, et al. Primary central nervous system
lymphoma and atypical glioblastoma: differentiation using radio-
mics approach. Eur Radiol. 2018;28(9):3832–3839. doi: 10.1007/
s00330-018-5368-4
290. Kang D, Park JE, Kim YH, et al. Diffusion radiomics as
a diagnostic model for atypical manifestation of primary central
nervous system lymphoma: development and multicenter exter-
nal validation. Neuro Oncol. 2018;20(9):1251–1261. doi: 10.1093/
neuonc/noy021
291. Alcaide-Leon P, Dufort P, Geraldo AF, et al. Differentiation of
enhancing glioma and primary Central nervous system lymphoma
by texture-based Machine learning. AJNR Am J Neuroradiol.
2017;38(6):1145–1150. doi: 10.3174/ajnr.A5173
292. Yamashita K, Yoshiura T, Arimura H, et al. Performance evaluation
of radiologists with artificial neural network for differential diagno-
sis of intra-axial cerebral tumors on MR images. AJNR Am
J Neuroradiol. 2008;29(6):1153–1158. doi: 10.3174/ajnr.A1037
293. Rehm GB, Han J, Kuhn BT, et al. Creation of a robust and general-
izable Machine learning classifier for patient ventilator asynchrony.
Methods Inf Med. 2018;57(4):208–219. doi: 10.3414/ME17-02-0012
294. Bakkes T, Montree RJH, Mischi M, et al. A machine learning method
for automatic detection and classification of patient-ventilator
asynchrony. Annu Int Conf IEEE Eng Med Biol Soc; Montreal, QC,
Canada; 2020. p. 150–153.
295. Mulqueeny Q, Redmond SJ, Tassaux D, et al. Automated detection
of asynchrony in patient-ventilator interaction. Annu Int Conf IEEE
Eng Med Biol Soc; Minneapolis, Minnesota; 2009. p. 5324–5327.
296. Aghdam MA, Sharifi A, Pedram MM. Diagnosis of Autism spectrum
disorders in young children based on resting-state functional magnetic
resonance imaging data using convolutional neural networks. J Digit
Imaging. 2019;32(6):899–918. doi: 10.1007/s10278-019-00196-1
297. Petrucci K, Petrucci P, Canfield K, et al. Evaluation of UNIS: urologi-
cal Nursing information systems. Proc Annu Symp Comput Appl
Med Care. 1991;1:43–47.
298. Gorman R. Expert system for management of urinary incontinence
in women. Proc Annu Symp Comput Appl Med Care.
1995;1:527–531.
299. Chang PL, Li YC, Wang TM, et al. Evaluation of a decision-support
system for preoperative staging of prostate cancer. Med Decis
Making. 1999;19(4):419–427. doi: 10.1177/0272989X9901900410
300. Koutsojannis C, Hatzilygeroudis I. FESMI: A Fuzzy Expert System for
Diagnosis and Treatment of Male Impotence. In: International
Conference on Knowledge-Based Intelligent Information &
Engineering Systems; Athens, Greece; 2004.
301. Koutsojannis C, Nabil E, Tsimara M, et al. Using machine learning
techniques to improve the behaviour of a medical decision support
system for prostate diseases (2009).
302. Altunay S, Telatar Z, Eroğul O, et al. A new approach to urinary
system dynamics problems: evaluation and classification of
uroflowmeter signals using artificial neural networks. Expert Syst
Appl. 2009;36(3):4891–4895. doi: 10.1016/j.eswa.2008.05.051
303. Koutsojannis C, Lithari C, Hatzilygeroudis I. Managing urinary
incontinence through hand-held real-time decision support aid.
Comput Methods Programs Biomed. 2012;107(1):84–89. doi: 10.
1016/j.cmpb.2012.02.012
304. Hassanien AE, Alqaheri H, El-Dahshan E-S. Prostate boundary detec-
tion in ultrasound images using biologically-inspired spiking neural
network. Appl Soft Comput. 2011;11(2):2035–2041. doi: 10.1016/j.
asoc.2010.07.001
305. Torshizi AD, Zarandi MH, Torshizi GD, et al. A hybrid fuzzy-ontology
based intelligent system to determine level of severity and treat-
ment recommendation for benign prostatic hyperplasia. Comput
Methods Programs Biomed. 2014;113(1):301–313. doi: 10.1016/j.
cmpb.2013.09.021
306. Xiao D, Zhang G, Liu Y, et al. 3D detection and extraction of
bladder tumors via MR virtual cystoscopy. Int J Comput Assist
Radiol Surg. 2016;11(1):89–97. doi: 10.1007/s11548-015-1234-x
307. Hurst RE, Bonner RB, Ashenayi K, et al. Neural net-based identifica-
tion of cells expressing the p300 tumor-related antigen using
fluorescence image analysis. Cytometry. 1997;27(1):36–42.
doi: 10.1002/(SICI)1097-0320(19970101)27:1<36:AID-CYTO5>3.0.CO;2-J
308. Hao AT, Wu LP, Kumar A, et al. Nursing process decision support
system for urology ward. Int J Med Inform. 2013;82(7):604–612. doi:
10.1016/j.ijmedinf.2013.02.006
309. Lopes MH, Ortega NR, Silveira PS, et al. Fuzzy cognitive map in
differential diagnosis of alterations in urinary elimination: a nursing
approach. Int J Med Inform. 2013;82(3):201–208. doi: 10.1016/j.
ijmedinf.2012.05.012
310. Koutsojannis C, Tsimara M, Nabil E. HIROFILOS: a medical expert
system for prostate diseases. 7th WSEAS international conference
on Computational intelligence, man-machine systems and cyber-
netics; 2007; Venice, Italy; 2008.
311. Qiu W, Kuang H, Teleg E, et al. Machine learning for detecting early
infarction in acute stroke with non-contrast-enhanced CT.
Radiology. 2020;294(3):638–644. doi: 10.1148/radiol.2020191193
312. Ni Q, Sun ZY, Qi L, et al. A deep learning approach to characterize
2019 coronavirus disease (COVID-19) pneumonia in chest CT
images. Eur Radiol. 2020;30(12):6517–6527. doi: 10.1007/s00330-
020-07044-9
313. Tomita N, Cheung YY, Hassanpour S. Deep neural networks for
automatic detection of osteoporotic vertebral fractures on CT
scans. Comput Biol Med. 2018;98:8–15.
doi: 10.1016/j.compbiomed.2018.05.011
314. Murata K, Endo K, Aihara T, et al. Artificial intelligence for the
detection of vertebral fractures on plain spinal radiography. Sci
Rep. 2020;10(1):20031. doi: 10.1038/s41598-020-76866-w
315. Cheng CT, Ho TY, Lee TY, et al. Application of a deep learning
algorithm for detection and visualization of hip fractures on plain
pelvic radiographs. Eur Radiol. 2019;29(10):5469–5477. doi: 10.
1007/s00330-019-06167-y
316. Yu JS, Yu SM, Erdal BS, et al. Detection and localisation of hip
fractures on anteroposterior radiographs with artificial intelligence:
proof of concept. Clin Radiol. 2020;75(3):237.e1–237.e9. doi:
10.1016/j.crad.2019.10.022
317. Yamada Y, Maki S, Kishida S, et al. Automated classification of hip
fractures using deep convolutional neural networks with orthope-
dic surgeon-level accuracy: ensemble decision-making with
antero-posterior and lateral radiographs. Acta Orthop. 2020;91
(6):699–704. doi: 10.1080/17453674.2020.1803664
318. Jiménez-Sánchez A, Kazi A, Albarqouni S, et al. Precise proximal
femur fracture classification for interactive training and surgical
planning. Int J Comput Assist Radiol Surg. 2020;15(5):847–857.
doi: 10.1007/s11548-020-02150-x
319. Adams M, Chen W, Holcdorf D, et al. Computer vs human: deep
learning versus perceptual training for the detection of neck of
femur fractures. J Med Imaging Radiat Oncol. 2019;63(1):27–32. doi:
10.1111/1754-9485.12828
320. Mawatari T, Hayashida Y, Katsuragawa S, et al. The effect of deep
convolutional neural networks on radiologists’ performance in the
detection of hip fractures on digital pelvic radiographs. Eur
J Radiol. 2020;130:109188. doi: 10.1016/j.ejrad.2020.109188
321. Urakawa T, Tanaka Y, Goto S, et al. Detecting intertrochanteric hip
fractures with orthopedist-level accuracy using a deep convolu-
tional neural network. Skeletal Radiol. 2019;48(2):239–244. doi: 10.
1007/s00256-018-3016-3
322. Chung SW, Han SS, Lee JW, et al. Automated detection and classi-
fication of the proximal humerus fracture by using deep learning
algorithm. Acta Orthop. 2018;89(4):468–473. doi: 10.1080/
17453674.2018.1453714
323. Olczak J, Fahlberg N, Maki A, et al. Artificial intelligence for analyz-
ing orthopedic trauma radiographs. Acta Orthop. 2017;88
(6):581–586. doi: 10.1080/17453674.2017.1344459
324. Lindsey R, Daluiski A, Chopra S, et al. Deep neural network
improves fracture detection by clinicians. Proc Natl Acad Sci
U S A. 2018;115(45):11591–11596. doi: 10.1073/pnas.1806905115
325. Qazi AA, Pekar V, Kim J, et al. Auto-segmentation of normal and
target structures in head and neck CT images: a feature-driven
model-based approach. Med Phys. 2011;38(11):6160–6170. doi:
10.1118/1.3654160
326. Misra-Hebert AD, Milinovich A, Zajichek A, et al. Natural language
processing improves detection of nonsevere hypoglycemia in med-
ical records versus coding alone in patients with type 2 diabetes
but does not improve prediction of severe hypoglycemia events:
an analysis using the electronic medical record in a large health
system. Diabetes Care. 2020;43(8):1937–1940. doi: 10.2337/dc19-
1791
327. Hekler A, Utikal JS, Enk AH, et al. Deep learning outperformed 11
pathologists in the classification of histopathological melanoma
images. Eur J Cancer. 2019;118:91–96. doi: 10.1016/j.ejca.2019.06.
012
328. Kim HE, Kim HH, Han BK, et al. Changes in cancer detection and
false-positive recall in mammography using artificial intelligence:
a retrospective, multireader study. Lancet Digit Health. 2020;2(3):
e138–e148. doi: 10.1016/S2589-7500(20)30003-0
329. European Patent Office.
330. FDA. Artificial intelligence and Machine learning (AI/ML)-enabled
medical devices. 2022.
331. Muehlematter UJ, Daniore P, Vokinger KN. Approval of artificial
intelligence and machine learning-based medical devices in the
USA and Europe (2015–20): a comparative analysis. Lancet Digital
Health. 2021;3(3):e195–e203. doi: 10.1016/S2589-7500(20)30292-2
332. Varoquaux G. Cross-validation failure: Small sample sizes lead to
large error bars. Neuroimage. 2018;180(Pt A):68–77. doi: 10.1016/j.
neuroimage.2017.06.061
333. Vabalas A, Gowen E, Poliakoff E, et al. Machine learning algorithm
validation with a limited sample size. PLoS One. 2019;14(11):
e0224365. doi: 10.1371/journal.pone.0224365
334. Campello VM, Gkontra P, Izquierdo C, et al. Multi-centre, multi-
vendor and multi-disease cardiac segmentation: the M&Ms chal-
lenge. IEEE Trans Med Imaging. 2021;40(12):3543–3554. doi: 10.
1109/TMI.2021.3090082
335. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable
deep learning for diagnosis and referral in retinal disease. Nat Med.
2018;24(9):1342–1350. doi: 10.1038/s41591-018-0107-6
336. Celi LA, Cellini J, Charpignon M-L, et al. Sources of bias in artificial
intelligence that perpetuate healthcare disparities—A global
review. PLOS Digital Health. 2022;1(3):e0000022.
doi: 10.1371/journal.pdig.0000022
337. Khan SM, Liu X, Nath S, et al. A global review of publicly available
datasets for ophthalmological imaging: barriers to access, usability,
and generalisability. Lancet Digit Health. 2021;3(1):e51–e66. doi: 10.
1016/S2589-7500(20)30240-5
338. de Groof AJ, Struyvenberg MR, van der Putten J, et al. Deep-learning
system detects neoplasia in patients with Barrett's esophagus
with higher accuracy than endoscopists in a multistep
training and validation study with benchmarking.
Gastroenterology. 2020;158(4):915–929.e914.
doi: 10.1053/j.gastro.2019.11.030
339. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence
versus clinicians: systematic review of design, reporting standards,
and claims of deep learning studies. BMJ. 2020;368:m689.
doi: 10.1136/bmj.m689
340. Liu X, Faes L, Kale AU, et al. A comparison of deep learning
performance against health-care professionals in detecting diseases
from medical imaging: a systematic review and
meta-analysis. Lancet Digital Health. 2019;1(6):e271–e297.
doi: 10.1016/S2589-7500(19)30123-2
341. Congressional Budget Office. Research and development in the
pharmaceutical industry. 2023.
342. European Federation of Pharmaceutical Industries and Associations
(EFPIA). The root cause of unavailability and delay to
innovative medicines: reducing the time before patients have
access to innovative medicines. 2022.
343. de Hond AAH, Leeuwenberg AM, Hooft L, et al. Guidelines and
quality criteria for artificial intelligence-based prediction models in
healthcare: a scoping review. NPJ Digit Med. 2022;5(1):2.
doi: 10.1038/s41746-021-00549-7
344. Parikh RB, Helmchen LA. Paying for artificial intelligence in
medicine. NPJ Digit Med. 2022;5(1):63.
doi: 10.1038/s41746-022-00609-6
345. Abràmoff MD, Roehrenbeck C, Trujillo S, et al. A reimbursement
framework for artificial intelligence in healthcare. NPJ Digit Med.
2022;5(1):72. doi: 10.1038/s41746-022-00621-w
... In addition, it has many applications, including diagnosis, treatment, and risk prediction [6][7][8]. Systematic literature reviews have identified over 10,000 ML algorithms used in healthcare, with the most common being neural networks, support vector machines, and random forests or decision trees [9]. The most frequent areas of ML application in healthcare are oncology and neurology, likely reflecting the prevalence of these diseases [9]. ML has demonstrated high predictive ability in many disease areas, but the reporting quality of these studies is often low, with many lacking data on accuracy, sensitivity, specificity, and validation [9,10]. The most used data source for ML in healthcare is radiological imaging, which has been used to develop ML solutions for clinical decision support [9]. Neural networks and deep learning are the most frequently published ML techniques, as they can detect complex nonlinear relationships in data [9]. ...
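The three performance measures named in the excerpt above (accuracy, sensitivity, specificity) all derive from the same four confusion-matrix counts. The following is a minimal sketch of that arithmetic; the class name, function names, and the example counts are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical container for illustration)."""
    tp: int  # true positives: diseased cases flagged as diseased
    fp: int  # false positives: healthy cases flagged as diseased
    tn: int  # true negatives: healthy cases flagged as healthy
    fn: int  # false negatives: diseased cases flagged as healthy


def accuracy(c: ConfusionCounts) -> float:
    # Share of all cases that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of truly positive cases that the model detected (recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of truly negative cases that the model correctly ruled out.
    return c.tn / (c.tn + c.fp)


# Made-up counts, used only to exercise the functions.
counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
print(accuracy(counts), sensitivity(counts), specificity(counts))
```

Reporting accuracy alone can hide a poor sensitivity or specificity when classes are imbalanced, which is one reason the absence of the latter two is treated as a reporting gap.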
Article
Background: The field of machine learning in health science is evolving exponentially, with a focus on accelerating scientific discoveries, improving holistic well-being, and advancing personalized healthcare. Aim: In this same spirit, this critical review article aims to provide a comprehensive understanding of the role, challenges, opportunities, and ethical considerations of integrating machine learning into health science, with an emphasis on healthcare research and practice. Methods: To base its critiques on previous literature, the elucidative survey considered specific criteria, such as the significance and contribution of each source to the field, methodology or approach, and argument, as well as the use of evidence. Results: The study results indicate that machine learning holds great promise to improve evidence-based health science, but significant work is needed to ensure the technology is developed and deployed in a way that is trustworthy and ethical. Conclusion: In conclusion, the literature review presents a balanced assessment of the strengths, weaknesses, and notable features of the current state of machine learning in health science. The key takeaway point is that while machine learning has demonstrated significant potential to improve health science outcomes and strategic management, there are still important challenges, limitations, and research gaps that need to be addressed to facilitate widespread adoption and trust in these technologies.
... Such properties have resulted in the growth of LITT across practices. ML analytics leveraging advanced algorithms on diverse patient data have shown early success in automating complex planning tasks, forecasting LITT outcomes, and tracking effectiveness across sites (Li et al., 2019; Kolasa et al., 2023). However, variability in techniques, patient selection, and study designs has led to calls for further research and guidance on best practices for LITT adoption and ML integration (Viozzi et al., 2021; Miranda de Souza et al., 2023). ...
... ML's role extends across the LITT treatment spectrum and includes preoperative planning, intraoperative guidance, and postoperative assessment. Techniques such as random forests, support vector machines, and neural networks have been instrumental in optimizing treatment parameters and modeling expected outcomes (Kolasa et al., 2023). ...
... During the LITT procedure, ML plays a pivotal role in intraoperative guidance. By processing real-time data, these algorithms can assist surgeons in making informed decisions (Kolasa et al., 2023), thereby enhancing intervention precision. This aspect of ML applications is particularly vital in navigating the intricate anatomy of the brain, where every millimeter counts. ...
Preprint
Background: The incorporation of Machine Learning (ML) into Laser Interstitial Thermal Therapy (LITT) represents a significant advancement in minimally invasive neurosurgery, particularly for treating brain tumors, vascular malformations, and epileptogenic foci. This systematic review focuses on evaluating the integration and impact of ML in enhancing the efficacy, precision, and outcomes of LITT in neurosurgical procedures. Methods: An exhaustive search was conducted in major scientific databases for studies from 2015 to 2023 that specifically focused on the application of ML in LITT. The review assessed the development and implementation of ML algorithms in surgical planning, outcome prediction, and postoperative evaluation in LITT. Rigorous inclusion criteria were applied to select studies, and a combination of meta-analysis and qualitative synthesis was used to analyze the data. Results: The review synthesizes findings from a range of studies, including retrospective analyses and initial clinical trials. It highlights the role of ML in enhancing the selection criteria for LITT, optimizing surgical approaches, and improving patient-specific outcome predictions. While LITT showed favorable results in treating non-resectable lesions, the integration of ML was found to potentially refine these outcomes further. However, challenges such as the need for larger sample sizes, standardization of ML algorithms, and validation of these methods in clinical settings were noted. Conclusions: The integration of ML into LITT procedures marks a promising frontier in neurosurgery, offering potential improvements in surgical accuracy and patient outcomes. The evidence suggests a need for continued development and rigorous testing of ML applications in LITT. Future research should focus on the refinement and validation of ML algorithms for wider clinical adoption, ensuring that technological advancements align with patient safety and treatment efficacy.
... The accumulated published papers have undergone thorough screening, enabling a centralized focus on the review objectives via the PRISMA methodology (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [17]. The filtration process involved discerning sets of literature integrated with machine learning for fault identification and localization within an optical fiber network. ...
... Shown in Figure 1 is the PRISMA-based assessment system [17] of the review. The evaluation of selected ML-based systems progressed with 350 studies, with 300 duplicate copies rejected from academic repositories (IEEE Xplore, ResearchGate, and Google Scholar) during the preliminary search. ...
Preprint
The review aims to assess fifteen (15) academic literature sources, highlighting the application of machine learning algorithms in the maintenance operations of optical fiber networks. Data were collected using the PRISMA methodology (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). The application, results, and performance metrics are discussed based on the collected observations, computations, and statistics in the studies, which revealed high accuracy, ranging from 86% to 98% on average, and quality ML models including Neural Networks (NNs), Support Vector Machines (SVMs), and LSTM, as well as deep learning models that were effective in identifying challenges and problems within the optical fiber lines. The review mainly centered on superior machine learning technologies that surpass traditional techniques in fault detection and localization for improved optical fiber networks' operations while providing insights into the limitations and challenges encountered in real-world applications of these models, offering a comprehensive perspective on the optical fiber network domain.
... This review utilized a hybrid approach, employing a systematic method to gather research on stingless bee products and machine learning advancements in accordance with the guidelines outlined by Munn et al., Peters et al., and Zulhendri et al. [8][9][10]. The review followed the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), incorporating the four-phase flow diagram suggested [11,12]. The guiding phrase of "Stingless Bee Products and Machine Learning Advancements" was applied in this review. ...
Article
This literature review aims to explore the many applications of stingless bee products and advancements in machine learning related to stingless bees. The review includes research conducted from 2019 to 2023, with a primary focus on two key topics: the application of bee products and machine learning advancements. The review examines the various applications of stingless bee products, such as honey and bee bread, highlighting their potential in areas like medicine and anti-cancer treatments. Additionally, it examines modern machine learning advancements, including IoT systems and neural networks. The review adopts a hybrid approach, utilizing a systematic method to gather research on both stingless bee products and machine learning advancements. A total of 65 studies contributed to this research, revealing that a majority were conducted in Asia, with the genus of stingless bee Meliponini serving as the primary subject. Findings from these studies offer valuable insights for researchers, policymakers, and stakeholders interested in the sustainable utilization of stingless bee products and the integration of machine learning technologies in this domain.
Article
A crucial area of medical study is the diagnosis of breast cancer, where managing the inherent complexity of high-dimensional information poses a challenge in addition to precise identification. In order to improve diagnostic accuracy, this research investigates dimensionality reduction strategies. This study's main goal was to improve the accuracy and interpretability of breast cancer diagnosis by using dimensionality reduction techniques. The goal of the study is to find significant patterns for useful diagnostic models by examining how preprocessing methods affect a high-dimensional dataset. Starting with a dataset including 569 observations and 30 attributes, careful examination reveals imbalances in the dataset (63% benign, 37% malignant). We used Pearson correlation coefficients to detect and eliminate highly correlated features in order to address multicollinearity. A subsequent adjustment of the data using min-max normalization guarantees consistent weighting. Then, for thorough dimensionality reduction, Principal Component Analysis (PCA) is employed. Scree plots and biplots are used to visually represent data, highlighting how well-suited early principal components are for separating benign from malignant instances. Our findings confirm the effectiveness of the procedure by showing a significant 24% decrease in data dimensionality. This work highlights the critical role that dimensionality reduction plays in improving breast cancer diagnosis for more precise, effective, and understandable models, and it calls for further investigation of the specific findings.
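The first two preprocessing steps described in this abstract (dropping highly correlated features, then min-max normalization) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the 0.9 correlation threshold, the function names, and the toy feature values are all hypothetical, the abstract does not state a threshold, and the PCA step is omitted. Columns are assumed to be non-constant so that no division by zero occurs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is below threshold.

    The threshold value is an assumption made for this sketch.
    """
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy data: "perimeter" is nearly a multiple of "radius", so it is dropped.
features = {
    "radius": [1.0, 2.0, 3.0, 4.0],
    "perimeter": [2.1, 4.0, 6.2, 8.0],
    "texture": [5.0, 1.0, 4.0, 2.0],
}
kept = drop_correlated(features)
scaled = {name: min_max(values) for name, values in kept.items()}
print(list(scaled))
```

Which column of a correlated pair is kept depends on iteration order here; a real analysis would usually choose on clinical or statistical grounds.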
Article
In hospital settings, effective risk management is critical to ensuring patient safety, regulatory compliance, and operational effectiveness. Conventional approaches to risk assessment and mitigation frequently rely on manual procedures and retroactive analysis, which might not be sufficient to recognize and respond to new risks as they arise. This study examines how artificial intelligence (AI) technologies can improve risk management procedures in healthcare facilities, fortifying patient safety precautions and guidelines while improving the standard of care overall. Hospitals can proactively identify and mitigate risks, optimize resource allocation, and improve clinical outcomes by utilizing AI-driven predictive analytics, natural language processing, and machine learning algorithms. The different applications of AI in risk management are discussed in this paper, along with opportunities, problems, and suggestions for their effective use in hospital settings.
Article
The tremendous growth of the COVID-19 epidemic in recent months is devastatingly affecting human civilization. Many different biomarkers are being studied to monitor the patient's health. This might mask the symptoms of various diseases, making it more challenging for a doctor to make a correct diagnosis or prognosis. Therefore, this study aimed to create several classes of prediction methods that can handle situations of varying severity (severe, moderate, and mild). Using machine learning, a Lasso-logistic regression model is developed. To create the COVID-19 clinical dataset, researchers enlisted the help of 78 patients from the Azizia main hospital sector, the Wasit Health Directorate, and the Ministry of Health. The results show that the proposed method achieves an overall accuracy of 85.9%. Deaths have been reduced thanks to the established prediction method that enables early detection of patients across three severity levels.
Article
SARS-CoV-2, which causes COVID-19, has spread worldwide. Since the number of patients is rising daily, it requires time to evaluate laboratory data, limiting treatment and discoveries. Such restrictions necessitate a clinical decision-making tool with predictive algorithms. Predictive algorithms help healthcare systems by spotting disorders. This study uses machine learning and laboratory data to predict COVID-19 patients. Recall, precision, accuracy, and AUC scores were used to assess our models' prediction performance. Models were verified with 10-fold cross-validation and train-test split methods using 18 laboratory findings from 600 patients. This research compared three different classification approaches: Support Vector Machines (SVM), artificial neural networks (ANN), and k-Nearest Neighbors (k-NN). According to the findings, SVM achieved the highest average accuracy (89.3%), followed by ANN (88.5%) and k-NN (86.6%). The accuracy rates of all three approaches were reasonable, with SVM being the best of the three. The results of this research indicate that classification using machine learning methods has the potential to be used in developing reliable COVID-19 diagnosis systems, thereby facilitating the fast and accurate diagnosis of COVID-19 cases and facilitating proper therapy and management of COVID-19 patients. More work might be done to refine these techniques and include them in usable diagnostic frameworks.
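The 10-fold cross-validation mentioned in this abstract can be sketched in plain Python as below. This is a generic illustration, not the study's code: the helper names and the toy one-dimensional data are hypothetical, a 1-nearest-neighbour rule stands in for the SVM, ANN, and k-NN models that the abstract compares, and the folds are contiguous without shuffling or stratification (which a real analysis would normally add).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (1-NN)."""
    closest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[closest]


def cross_val_accuracy(xs, ys, k):
    """Mean accuracy over k folds; each sample is tested exactly once."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(
            nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i] for i in test
        )
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy data: two well-separated classes, interleaved so every fold sees both.
xs, ys = [], []
for i in range(10):
    xs.append((i * 0.1,))
    ys.append(0)
    xs.append((5.0 + i * 0.1,))
    ys.append(1)

print(cross_val_accuracy(xs, ys, k=10))
```

Averaging the per-fold scores is what yields an "average accuracy" figure of the kind reported in the abstract; the per-fold spread is also worth reporting when samples are few.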
Article
Voice pathology diagnosis requires extracting significant features from voice signals, and classical machine learning models can overfit to the training data, which poses challenges. The study aimed to develop a reliable and efficient system for identifying voice pathologies utilizing the long short-term memory (LSTM) method. The study combined unique feature sets such as the mel frequency cepstral coefficients (MFCCs), zero crossing rate (ZCR), and mel spectrograms, which have not been used together in previous works. The LSTM approach improved the accuracy of voice pathology identification on the Saarbruecken voice database (SVD) samples. The best results achieved by the proposed system showed an accuracy rate of 99.3% for /u/ vowel samples in neutral pitch, 99.2% for /a/ vowel samples in high pitch, 99% for /i/ vowel samples in neutral pitch, and 99.2% for sentence samples. The experimental results were evaluated utilizing accuracy, precision, specificity, sensitivity, and F1 measures. Additionally, the study compared the performance of LSTM with that of artificial neural networks (ANNs) and found that LSTM achieved better outcomes.
Keywords: artificial neural networks, LSTM, mel frequency cepstral coefficients, mel spectrogram, SVD, unique feature selection sets, voice pathology, zero crossing rate
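Of the features listed in this abstract, the zero crossing rate (ZCR) is the simplest to compute: it is the fraction of adjacent sample pairs in a frame whose signs differ. The sketch below is illustrative only. The function name and sample frames are hypothetical, and the normalization (dividing by the number of adjacent pairs) is one common convention that the study may or may not have used.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative; other conventions exist.
    """
    if len(frame) < 2:
        raise ValueError("frame needs at least two samples")
    crossings = sum(
        1 for current, following in zip(frame, frame[1:])
        if (current >= 0) != (following >= 0)
    )
    return crossings / (len(frame) - 1)


# A rapidly alternating frame crosses zero at every step,
# while a frame that stays positive never does.
noisy_frame = [1.0, -1.0, 1.0, -1.0, 1.0]
smooth_frame = [1.0, 2.0, 3.0, 2.0, 1.0]
print(zero_crossing_rate(noisy_frame), zero_crossing_rate(smooth_frame))
```

In speech processing the rate is typically computed per short frame, giving a sequence of values that can be fed to a sequence model alongside other per-frame features.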
Article
This study presents a comprehensive analysis of the existing techniques and applications of artificial intelligence (AI) to cardiovascular disease diagnosis. The application of AI to the diagnosis of cardiac diseases can enhance diagnostic precision, diagnostic throughput, and patient outcomes. This literature survey analyzes state-of-the-art AI-based methods, rates their efficiency, examines potential future research and development avenues, and finds challenges and limitations, providing a foundational overview of main developments in AI, machine learning, deep learning, and quantum computing in relation to heart disease prevention. This study seeks to guide the use of AI-based techniques for heart disease detection, having an ultimate objective of enhancing patient outcomes through research and development. This review mainly emphasizes the significance of further studying and advancing AI for its ability to revolutionize the diagnosis and management of heart diseases.
Article
Despite the advances in modern medicine, the use of data-driven technologies (DDTs) to prevent surgical site infections (SSIs) remains a major challenge. Scholars recognise that data management is the next frontier in infection prevention, but many aspects related to the benefits and advantages of using DDTs to mitigate SSI risk factors remain unclear and underexplored in the literature. This study explores how DDTs enable value creation in the prevention of SSIs. This study follows a systematic literature review approach and the PRISMA statement to analyse peer-reviewed articles from seven databases. Fifty-nine articles were included in the review and were analysed through a descriptive and a thematic analysis. The findings suggest a growing interest in DDTs in SSI prevention in the last 5 years, and that machine learning and smartphone applications are widely used in SSI prevention. DDTs are mainly applied to prevent SSIs in clean and clean-contaminated surgeries and often used to manage patient-related data in the postoperative stage. DDTs enable the creation of nine categories of value that are classified in four dimensions: cost/sacrifice, functional/instrumental, experiential/hedonic, and symbolic/expressive. This study offers a unique and systematic overview of the value creation aspects enabled by DDT applications in SSI prevention and suggests that additional research is needed in four areas: value co-creation and product-service systems, DDTs in contaminated and dirty surgeries, data legitimation and explainability, and data-driven interventions. Supplementary information: The online version contains supplementary material available at 10.1007/s41666-023-00129-2.
Article
Objectives: In this systematic review we aimed at assessing how artificial intelligence (AI), including machine learning (ML) techniques, has been deployed to predict, diagnose, and treat chronic kidney disease (CKD). We systematically reviewed the available evidence on these innovative techniques to improve CKD diagnosis and patient management. Methods: We included English language studies retrieved from PubMed. The review is therefore to be classified as a "rapid review", since it includes one database only, and has language restrictions; the novelty and importance of the issue make missing relevant papers unlikely. We extracted 16 variables, including: main aim, studied population, data source, sample size, problem type (regression, classification), predictors used, and performance metrics. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach; all main steps were done in duplicate. Results: From a total of 648 studies initially retrieved, 68 articles met the inclusion criteria. Models, as reported by authors, performed well, but the reported metrics were not homogeneous across articles and therefore direct comparison was not feasible. The most common aim was prediction of prognosis, followed by diagnosis of CKD. Algorithm generalizability, and testing on diverse populations was rarely taken into account. Furthermore, the clinical evaluation and validation of the models/algorithms was perused; only a fraction of the included studies, 6 out of 68, were performed in a clinical context. Conclusions: Machine learning is a promising tool for the prediction of risk, diagnosis, and therapy management for CKD patients. Nonetheless, future work is needed to address the interpretability, generalizability, and fairness of the models to ensure the safe application of such technologies in routine clinical practice.
Article
Background: The advancement of information and communication technologies and the growing power of artificial intelligence are successfully transforming a number of concepts that are important to our daily lives. Many sectors, including education, healthcare, industry, and others, are benefiting greatly from the use of such resources. The healthcare sector, for example, was an early adopter of smart wearables, which primarily serve as diagnostic tools. In this context, smart wearables have demonstrated their effectiveness in detecting and predicting cardiovascular diseases (CVDs), the leading cause of death worldwide. Objective: In this study, a systematic literature review of smart wearable applications for cardiovascular disease detection and prediction is presented. After conducting the required search, the documents that met the criteria were analyzed to extract key criteria such as the publication year, vital signs recorded, diseases studied, hardware used, smart models used, datasets used, and performance metrics. Methods: This study followed the PRISMA guidelines by searching IEEE, PubMed, and Scopus for publications published between 2010 and 2022. Once records were located, they were reviewed to determine which ones should be included in the analysis. Finally, the analysis was completed, and the relevant data were included in the review along with the relevant articles. Results: As a result of the comprehensive search procedures, 87 papers were deemed relevant for further review. In addition, the results are discussed to evaluate the development and use of smart wearable devices for cardiovascular disease management, and the results demonstrate the high efficiency of such wearable devices. Conclusions: The results clearly show that interest in this topic has increased. Although the results show that smart wearables are quite accurate in detecting, predicting, and even treating cardiovascular disease, further research is needed to improve their use.
Article
Detecting COVID-19 as early and as quickly as possible is one way to stop its spread. Machine learning development can help to diagnose COVID-19 more quickly and accurately. This report aims to find out how far research has progressed and what lessons can be learned for future research in this sector. By filtering titles, abstracts, and content in the Google Scholar database, this literature review was able to find 19 related papers to answer two research questions, i.e. what medical images are commonly used for COVID-19 classification and what are the methods for COVID-19 classification. According to the findings, chest X-rays were the most commonly used data to categorize COVID-19 and transfer learning techniques were the method used in this study. Researchers also concluded that lung segmentation and use of multimodal data could improve performance.
Article
The digital healthcare paradigm has significantly improved based on distributed fog and cloud networks for cancer detection with multiple classes in recent years. The paradigm allows the collecting and training of cancer data on various computing nodes to make optimal cancer detection with their classes. In paradigm, multi-omics approaches (such as RNA, miRNA, and methylation) and machine learning techniques have achieved remarkable results in predicting cancer with different features. However, the digital healthcare paradigm for cancer detection concerning infrastructure still faces challenges related to security, execution delay, and improving the accuracy of cancer prediction in existing studies. These limitations affect the overall cancer detection results with more accuracy. In this paper, we are handling the research mentioned above limitations. We present a new paradigm for cancer detection with more accuracy, less processing delay, and more security based on fog cloud heterogeneous computing nodes. We present the Multi-Cancer Multi-Omics Clinical Dataset Laboratories (MCMOCL) Schemes to predict multi-cancers with multiple classes and consist of federated learning, auto-encoder, and XGBoost methods. The main objective of this study is to improve accuracy, reduce execution delay, and improve the security among heterogeneous cancer clinics in the architecture. Simulation results show that MCMOCL outperformed all existing machine models in terms of accuracy by 98%, processing delay by 61%, and security for the multi-classes types of cancers in heterogeneous fog cloud paradigm.
Article
Background and aim: Cognitive computing systems are intelligent systems that think, understand, and augment the capabilities of the human brain by blending the technologies of Artificial Intelligence, Machine Learning, and Natural Language Processing. In recent days, maintaining or enhancing health through the prevention, prognosis, and analysis of diseases has become a challenging task. The increasing number of diseases and their causes has become a major question for humanity. Limited risk analysis, a meticulous training process, and automated critical decision-making are some of the issues of cognitive computing. To overcome these issues, cognitive computing in healthcare works like a medical prodigy that anticipates a person's disease or illness and supports doctors with technological facts so that they can take timely action. The main aim of this survey article is to explore the present and futuristic technological trends of cognitive computing in healthcare. In this work, different cognitive computing applications are reviewed, and the best application is recommended to clinicians. Based on this recommendation, clinicians are able to monitor and analyze the physical health of patients. Methods: This article presents the systematic literature on the different aspects of cognitive computing in healthcare. Seven online databases (SCOPUS, IEEE Xplore, Google Scholar, DBLP, Web of Science, Springer, and PubMed) were screened, and published articles related to cognitive computing in healthcare were collected from 2014 to 2021. In total, 75 articles were selected and examined, and their pros and cons were analyzed. The analysis was done with respect to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Results: The basic findings of this review article and their significance for theory and practice are mindmaps portraying the cognitive computing platforms, cognitive applications in healthcare, and use cases of cognitive computing in healthcare. A detailed discussion section highlights the present issues, future research directions, and recent applications of cognitive computing in healthcare. Accuracy analysis of different cognitive systems concludes that the Medical Sieve achieves 0.95 and Watson For Oncology (WFO) achieves 0.93, and hence these prove to be the prominent computing systems for healthcare. Conclusions: Cognitive computing, an evolving technology in healthcare, augments the clinical thought process and enables doctors to make the right diagnosis and preserve the patient's health in good condition. These systems provide timely care and optimal, cost-effective treatment. This article provides an extensive survey of the importance of cognitive computing in the health sector by highlighting the platforms, techniques, tools, algorithms, applications, and use cases. This survey also explores works in the literature on present issues and proposes future research directions for applying cognitive systems in healthcare.