Received 2 March 2023, accepted 18 March 2023, date of publication 21 March 2023, date of current version 24 March 2023.
Digital Object Identifier 10.1109/ACCESS.2023.3260027
A Systematic Review on Federated Learning
in Medical Image Analysis
MD FAHIMUZZMAN SOHAN 1 AND ANAS BASALAMAH 2
1 Department of Software Engineering, Daffodil International University, Dhaka 1207, Bangladesh
2 Department of Computer Engineering, Umm Al-Qura University, Mecca 21955, Saudi Arabia
Corresponding author: Md Fahimuzzman Sohan (fahimsohan2@gmail.com)
ABSTRACT Federated Learning (FL) has attracted considerable attention from academic and industrial stakeholders since its inception. The distinguishing feature of FL is that it handles data in a decentralized manner, creating a privacy-preserving environment for Artificial Intelligence (AI) applications. Medical data contain sensitive personal information of patients and therefore demand strong protection against disclosure to unintended destinations. In this paper, we performed a Systematic Literature Review (SLR) of published research articles on FL based medical image analysis. We first collected articles from different databases following the PRISMA guidelines, then synthesized data from the selected articles, and finally provided a comprehensive overview of the topic. To do so, we extracted from the articles the core information associated with implementing FL in medical imaging. In our findings we briefly present the characteristics of the federated data and models, the performance achieved by the models, and, distinctively, a comparison of results with traditional ML models. In addition, we discuss the open issues and challenges of implementing FL and give our recommendations for future directions in this particular research field. We believe this SLR successfully summarizes the state-of-the-art FL methods for medical image analysis using deep learning.
INDEX TERMS Federated learning, machine learning, medical image analysis, data privacy, systematic
literature review.
I. INTRODUCTION
Image processing and image analysis are distinct tasks, yet they often depend on each other when classifying image data. The history of digital image processing reaches back to 1973, when an image of the Swedish model Lena became one of the most widely used standard test images for image processing.
Since then, image processing has been applied in dozens of research fields, and medical imaging is one of them. An image is essentially composed of 2D signals (vertical and horizontal) organized as a grid of pixels [1]. Different types of images have different pixel parameters, and during analysis these parameters help extract the relevant information from the image. The analysis part, in turn, aims to understand the processed images through techniques such as Machine Learning (ML), which encompasses a range of algorithms. At the beginning, classical
ML algorithms (e.g., SVM, naive Bayes, decision tree) were used broadly in image processing research. Later, the field shifted toward neural network based modeling with the introduction of deep learning, which is now an integral part of virtually every image analysis task, including medical imaging. Every year the use of medical imaging for diagnostics increases worldwide. The image data mainly consist of radiological images such as X-ray, computed tomography (CT), magnetic resonance imaging (MRI), and ophthalmology images. In addition, other data from the eye, skin, and cells make significant contributions to clinical imaging for detecting, diagnosing, and treating diseases [2]. Medical images taken by different devices increasingly need to be transferred from one system to another, which requires a computer network. Large collections of such images form datasets, which are typically stored and processed on cloud servers under the conventional ML approach.
In the era of AI, collaborative learning, and more specifically sharing data among different institutions and multiple sources,
TABLE 1. Contributions of available SLRs on FL-driven medical data analysis and our study.
can be very efficient for building robust AI models. Since models are trained at individual, centralized locations in traditional ML, collaboration between models is quite difficult. In contrast, X-ray, CT, and MRI scans are all personal data pertaining to individual patients, and this medical information must be protected from the risk of being disclosed or revealed to any unauthorized third party. In addition, even where data sharing is possible, storing, processing, and analyzing the data in a centralized manner remain difficult tasks. In such a scenario, data encryption and decryption could be a potential solution for exchanging information between participants; however, the process can be complex, time consuming, and unsustainable [3]. So, instead of bringing the data to the location where the model is trained, the model can be brought to the data (the institutions and hospitals) and trained directly there, in-house; this allows collaborative learning without centralizing the dataset itself and is called FL. FL was first introduced in 2016 [4] and has gained a great deal of attention in the healthcare domain within the last couple of years. It addresses the privacy and data protection concerns that are currently important problems in developing medical AI. In FL, participants train models locally and estimate the parameters of their respective models, then share the parameters with a centralized server that aggregates them. Therefore, the focus is not on which data are used or which algorithms are trained; the concept is about managing the data in a different way so that data privacy is preserved.
A. OBJECTIVE AND CONTRIBUTION
Since medical images are sensitive data, they need to be protected in a way that preserves users' rights over their personal information. As discussed above, FL emerged to solve the data privacy issue in collaborative ML, and within a short time the concept has been applied in different fields, including medical imaging. Many articles have already been published on FL oriented medical image analysis and have successfully applied this data management technique. At this stage, it is time to look back, review, and assess what has been done so far and what the impacts of FL on medical imaging are. Some SLRs have already been published on the topic; however, they covered healthcare applications in general rather than the medical image analysis context in particular. An SLR was presented in [5], which considered all articles that used any form of medical data to train FL models. Similarly, in [3], [9], and [6] the authors
have included the whole healthcare area to survey and review
the papers. Some review articles presented specific medical
domains, for example, Naeem et al. [10] worked particularly
on brain tumor diagnosis using MRI images. Since FL is
comparatively a new concept, most of the review articles emphasized design and implementation. Secondly, they discussed the privacy or security aspect, which is the fundamental characteristic of FL. Some of them [5], [7], and [10] were formulated around different research questions; a common question concerned the state-of-the-art FL methods, and besides this, data properties, impact, gaps, and future
research directions have been investigated. Furthermore, several survey articles have been published on FL for healthcare informatics. Xu et al. [11] surveyed papers that focus on FL in the biomedical area; their aim was to summarize the privacy, statistical, and system challenges that exist in this specific domain. A well-known article in this field is [12], in which the authors discussed the prime factors related to FL in digital health, along with challenges and solutions.
This study is an SLR in which we exclusively investigated FL in medical image analysis and examined every component of the considered articles, especially the performance analysis and comparison with conventional ML, which is the main distinction of our study from previously published review papers. Our study was built around several research questions, and by answering them we illustrate the current research landscape in the field of medical image processing using FL. In addition, several observations are discussed based on the findings extracted from the literature. Table 1
FIGURE 1. Two basic frameworks: the working and communication flow of decentralized federated learning (left) and conventional machine learning (right) for a hospital environment.
shows a comparative analysis of our contribution and related review articles; our study explored the demographic data, FL architecture, privacy-preserving concerns, federated data management, and performance of FL models. We did not find any review article that focused particularly on medical images. Consequently, this study can serve as an outline for future research on FL applications in medical imaging. The following are the key contributions of our paper:
We surveyed the insights of FL solely in medical image research in a systematic way.
We reported the latest implementations, advancements, and trends in medical image analysis research using FL from different aspects.
We presented and compared the performance of the different FL architectures used in the reviewed articles against traditional ML models, which is the first comparison of its kind.
For future contributors, we discussed the open issues, challenges, and future directions of the research field.
The rest of the article is structured into six sections. The basic FL concept is introduced in Section II. Section III describes the procedures of this review. The results of this investigation are presented through different research questions in Section IV. Open issues and challenges are discussed in Section V. Section VI covers the limitations of this study. Lastly, the conclusion and future directions are provided in Section VII.
II. FEDERATED LEARNING
In this section we give an overview of the FL architecture. The concept of FL is not tied directly to specific ML components; it is a data management process that lets multiple clients collaborate in a privacy-preserving manner. As a practical example, suppose a hospital produces some data and also has a model and some computing resources, and would like to tackle a specific problem with an AI system. However, the dataset at the institution is not sufficient to train a model able to address this problem. Another hospital dealing with a similar difficulty wants to work together on this premise: they have a common goal and can solve a common task. Both hospitals hold different data locally, and they need to benefit from each other's data without sharing it directly. This collaborative model training without sharing the data is exactly the purpose of FL.
In Fig. 1, we present FL on the left and the traditional ML framework on the right to illustrate the fundamentals of both for a hospital environment. As supplementary information, we list the necessary keywords and their explanations related to decentralized FL implementation in Table 2. Since FL involves multiple sources of data, we show four clients in the figure. Each client has a few common duties: it collects data from its hospital, trains a local ML model on those data, and estimates the model parameters. These parameters, not the data themselves, are sent to the central server from every client. Once the central server has received all of the local models' parameters, it aggregates them by taking a weighted average; the result is known as the global model and is sent back to all of the clients. This completes one learning round, which is then repeated for the next round.
A well-known federated averaging algorithm is FedAvg [13], proposed by Google in 2016; it calculates the
TABLE 2. Some commonly used components and their definitions in a federated network.
weighted average of the individual clients' models. Naturally, the data quantity is not the same across clients; sources with larger datasets contribute correspondingly larger weighted losses, and the individual client losses are combined into an overall global loss, which is this weighted average. Under FedAvg, every client trains a model for a defined number of epochs using the Stochastic Gradient Descent (SGD) algorithm and transmits the learned parameters to the central server, which performs aggregation in the form of averaging. The mathematical formulation of FedAvg is:
averaging. The mathematical presentation of FedAvg is:
f(w)=
k
X
k=1
nk
nFkw
In this function there are knumber of clients and each client
has its own loss function Fkw. Then weight each of the
losses by the size of the client’s dataset nk. Hence, the overall
objective is to minimize a global loss which is a weighted
combination of local losses and the local loss is computed
on private data which is never shared, only model updates
are shared. Apart from FedAvg, there are many ongoing research directions and varieties of FL, such as SecAgg [54]. Although different combinations exist in FL implementations, two characteristics are expected to be maintained: the datasets are distributed and remain local rather than centralized, and the participants train a collaborative model to work towards the same goal.
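To make the aggregation step concrete, the following is a minimal Python/NumPy sketch of FedAvg-style weighted averaging of client parameters. It is an illustration under assumed inputs (the layer name, client weight values, and dataset sizes are hypothetical), not code taken from [13] or from any of the reviewed articles.

import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    # Weighted average of client model parameters, as in f(w) above.
    # client_weights: list of dicts mapping layer name -> np.ndarray
    # client_sizes:   list of local dataset sizes n_k
    total = float(sum(client_sizes))
    global_weights = {}
    for layer in client_weights[0]:
        # sum over clients of (n_k / n) * w_k for this layer
        global_weights[layer] = sum(
            (n_k / total) * w[layer]
            for w, n_k in zip(client_weights, client_sizes)
        )
    return global_weights

# Illustrative round with two clients and a single "dense" layer.
clients = [{"dense": np.array([1.0, 2.0])}, {"dense": np.array([3.0, 4.0])}]
sizes = [100, 300]   # the client with more data receives a larger weight
print(fedavg_aggregate(clients, sizes))   # {'dense': array([2.5, 3.5])}

In a full FL round, the aggregated parameters would then be broadcast back to the clients as the new global model before the next round of local training.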
III. RESEARCH METHOD
There are several types of review articles in the literature for conducting deeper-level research, such as narrative reviews and systematic reviews. As mentioned at the beginning, a systematic review was conducted for our investigation. Two SLR methods are popular in practice: PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and Kitchenham's guidelines; the latter is mainly used in computer science and software engineering research [14]. To conduct this review we followed the PRISMA procedures, which is the most common way of performing an SLR in the healthcare sector [6]. For an SLR, we first need to identify
FIGURE 2. The steps taken to conduct this study.
relevant articles that focus on a very specific research area and question(s), secondly appraise the quality of the studies and the strength of the evidence in the papers, and lastly synthesize the findings to draw the respective conclusions. Fig. 2 shows all of the steps taken to conduct this review, in sequence.
TABLE 3. Formulated research questions of this review.
A. RESEARCH QUESTION
The first step of this review was to establish a group of questions that would describe the literature in the most effective way. Table 3 shows the five contexts and their associated 12 research questions. The first context is the overview, which addresses the applications and problems solved by FL; next, a broad explanation of the datasets is presented; the third concerns the ML framework; then the implementation of FL is discussed, including privacy methods and types of FL; and lastly the experimental aspects, especially the performance comparison, are presented.
B. SEARCH PROCESS
Since FL was first presented in 2016, the search was limited to the period from 1 January 2017 to 30 June 2022. We covered all of the common databases considered by previous researchers, for example, Science Direct, IEEE Xplore digital library, Springer Link, Wiley Online Library, SPIE digital library, ACM digital library, Multidisciplinary Digital Publishing Institute (MDPI), Nature Portfolio, Taylor & Francis, and Google Scholar. The search criteria differ across platforms, so we used the advanced options of each database to search for articles with Boolean "AND" and "OR" expressions. Our study focused on the implementation of FL in healthcare image processing, so we carefully excluded other applications. The search phrases were matched against the titles, abstracts, and keywords in each of the databases. Fig. 3 depicts the PRISMA flow diagram, which presents the full statistics of article consideration in this review. After the search operation, the initially collected articles went through a selection process, which we describe in the next sections.
C. INCLUSION AND EXCLUSION CRITERIA
A literature search strategy is a big challenge when it returns too many papers; in an SLR, this is handled by predefined inclusion and exclusion criteria. These might include limiting the search to only those results that contain certain types of studies. The process ensures that the task is achieved properly, reduces the possibility of bias, and protects the selection process from irrelevant research documents. We applied the inclusion and exclusion criteria to the articles collected from the databases to reach exactly the material that readers are seeking. We emphasized the following points when including articles for the final analysis:
Articles that studied medical image datasets.
ML models developed within an FL environment.
FL was the main focus of the findings (result analysis/comparison).
Since we performed a keyword search, the articles were collected based on the words present in a paper, even if a term was mentioned only once. Therefore, we excluded the articles that were not relevant or did not fulfill our scope, based on the following criteria:
Articles that used private dataset(s) for the ML model.
Studies not mainly focused on FL and medical image data.
Hybridization or modification of the FL theme, e.g., federated reinforcement learning.
Abstracts, short articles, preprints, books, or book chapters.
Articles that do not clearly present results using ML-based performance measures (e.g., [85], [86]).
The outcome of the inclusion and exclusion process is shown in Fig. 3. The number of initially collected articles from the different databases was 161. We removed the duplicate articles, leaving 138 articles for further steps. After that we screened the articles twice under two different conditions: first we examined the title and abstract, which removed 96 articles; then we investigated the full text of the remaining 42, where another 25 papers were disqualified. Finally, 17 of the 161 articles were retained for our review.
D. DATA EXTRACTION
Data collection was driven by the research questions of our study; we extracted information so as to cover the questions fully. First we created a spreadsheet with the respective information headers at the top. We worked on the 17 articles individually; each time, all of the information was recorded separately in the spreadsheet and used as our findings. The following data were extracted from every article:
1) Document title, publication year, and journal/
conference name.
2) Used datasets and their federated settings.
3) The security or privacy protocol used for FL.
4) The algorithms used to train ML models.
5) Performance of the FL model.
IV. RESULTS
We organized this section following the research questions described in Section III. In the upcoming subsections, we first present the demographic (numerical) analysis along with the key contributions and limitations of each reference work in Table 4; thereafter we answer the 12 questions in turn.
A. OVERVIEW
RQ1 What are possible applications of FL?
We found applications of FL in different research areas, such as Diabetic Retinopathy (DR), MRI classification, cancer, pneumonia, and COVID-19 detection, among a few others. These topics are popular in medical image processing research with conventional ML. Hence, FL also creates new research scope owing to its privacy protection capability, which is essential in this particular imaging research.
In 2019, coronavirus disease hit the whole world and created a crisis in identifying COVID-19 samples. The RT-PCR test is the most reliable diagnostic method for the disease, but owing to inadequate testing kits and some technical limitations, researchers tried to explore alternative ways of COVID-19 screening. Consequently, hundreds of ML based, automated, and time-saving COVID-19 detection models were presented within two years [33]. ML based COVID-19 analysis is mostly carried out on radiological chest images, i.e., X-ray and CT images. Among these contributions, FL has also been discussed and used to implement several detection models, as data privacy was a big concern. In this study, we found that six of the 17 articles worked specifically on COVID-19 detection. Feki et al. [18] proposed a collaborative FL framework for COVID-19 screening from chest X-ray images; they cooperated with multiple medical institutions
TABLE 4. Numerical data of the considered articles in this study, including publication year, publisher name, and data analysis method.
FIGURE 3. Article consideration process of this review according to the PRISMA flow diagram.
without sharing their data. Similarly, Zhang et al. [24] and Yan et al. [29] used X-ray and CT image data with different Convolutional Neural Network (CNN) architectures in FL settings. References [21], [25], and [32] also contributed to COVID-19 detection in a multinational setting. However, during the pandemic such artificial intelligence tools were not used clinically to any significant extent for diagnosing COVID-19; all of them were experimental efforts, and hopefully the contributions will help future initiatives.
Millions of patients worldwide suffer from fatal diseases, and cancer is at the top of the list. Researchers have shown that early detection of cancer can save a large number of lives [34]. Consequently, deep learning has emerged as a potential tool for early cancer detection with the help of medical images. It extracts features from raw images and provides decisions on cancer detection with notable performance. As part of the ML toolkit, FL has been considered in several cancer diagnosis techniques; Fig. 4 shows that 29.4% of the articles in this review (five out of 17) concerned cancer detection. Połap and their team published three research papers [17], [19], [22], all focused on skin cancer detection in an FL environment. They used seven different skin marks (classes) to train the detection models and successfully implemented privacy-protected FL. Moreover, Hashmani et al. [30] applied FL to a series of dermoscopy images to classify nine different skin diseases. Nowadays, cancers of important internal organs, such as lung and breast cancer, are among the leading causes of cancer death. An FL oriented lung cancer detection model was proposed by Adnan et al. [28]. They demonstrated that their model achieved acceptable performance when a decentralized data configuration was applied.
Another domain is Diabetic Retinopathy (DR) analysis. Diabetes is a chronic disease that affects millions of people globally, and uncontrolled diabetes can lead to serious damage to the body's systems, including the eyes. DR is a common diabetic eye disease and a leading cause of vision loss and blindness in the world. It occurs when diabetes damages the small blood vessels of the retina. In a primary care clinic, retinal images can be transmitted to an eye care specialist who examines the image and then provides a consultation. However, these days deep learning algorithms can detect DR within seconds with high accuracy. Lo et al. [16] analyzed retinal images to classify DR-positive and non-DR samples using the FL approach. In another article, Zhou et al. [31] introduced an FL framework that classifies five severity categories of DR, from 0 to 4 (No DR to Proliferative DR).
TABLE 5. List of datasets used in federated medical imaging, with references for quick access.
FIGURE 4. Percentage of articles applying FL in different disease diagnosis research.
Linardos et al. [26] considered FL for diagnosing Hypertrophic Cardiomyopathy (HCM), i.e., whether a subject suffers from HCM or is normal. In addition, a multi-label cardiac disease classification was proposed by Chakravarty et al. [23], where 14 classes were examined. Another application is Autism Spectrum Disorder (ASD) detection: Li et al. [20] applied deep learning in an FL environment to classify MRI images, and their model identified ASD using MRI analysis. We also found FL used in pneumonia detection; Kaissis et al. [27] proposed a model able to detect different pneumonia samples.
RQ2 What problems were solved?
Almost all of the articles considered in our investigation addressed a universal problem: ensuring the security of private data. Data is always a key factor when training an ML model, and it is a challenge to protect the data from potential security and privacy threats. These threats are even more critical in Electronic Health Record (EHR) data analysis: shared EHR data include patients' private information, and above all their identity could be at risk of public exposure. Similarly, in medical image analysis, maintaining the privacy of users' data such as X-ray, CT, and MRI images is difficult under a traditional ML layout. FL is a privacy-preserving way of training AI algorithms that moves the model to the data rather than moving the data to the model, which makes it very useful in cases where sensitive data cannot be shared. Since researchers have long been working on the application domains discussed in the previous section, they have now applied the same problems with privacy-preserving FL in their experimental research.
B. DATASET
RQ3 What types of datasets were used?
We divided the datasets used in the 17 investigated articles into several categories based on image type. The data type varies from model to model and depends on the domain in which the model is applied; for example, skin images are used to detect skin cancer. Fig. 5 displays eight different types of medical images collected from various datasets used in this research field. Chest X-ray images: As mentioned, severe cases of some diseases affect particular organs of the body, and the lung is one of them. The literature shows that COVID-19 and pneumonia complications include lung damage, which is the reason behind using chest X-ray and CT images in such disease detection models. Likewise, as we know, smoking
FIGURE 5. Different types of data samples taken from the respective datasets. (a) Tuberculosis-infected chest X-ray image, (b) COVID-19 positive chest X-ray
image, (c) COVID-19 positive CT image, (d) Brain MRI image for autism spectrum disorders identification, (e) Optical coherence tomography angiography
(OCT-A) image of eye, (f) Light-sensitive tissue data of retinal blood vessel, (g) Skin dermoscopy image, and (h) Lung tissue image for cancer detection.
is dangerous to health, particularly affecting our lungs, and is a key cause of lung cancer. According to our investigation, six of the 17 articles [18], [23], [25], [27], [29], [32] used chest X-ray images. The X-ray datasets considered in the articles are Cohen JP, TB X-ray, CheXpert, Mendeley data, COVIDx, Chest X-ray (CXR), and the COVID 2019 dataset. Moreover, Zhang et al. [24] proposed an FL oriented COVID-19 detection model in which chest X-ray and CT images were taken from three datasets: the Qatar-Dhaka data, COVID-CT, and the Figure 1 dataset. Skin image data: MNIST: HAM10000 is one of the leading datasets used in skin cancer detection research with deep learning techniques. This repository contains 10,015 dermatoscopic images divided into seven different classes. Połap and their group used the dataset in a series of articles [17], [19], [22] in an FL environment. Similar data were used in [30], released under a dataset challenge competition called ISIC 2019 and containing 25,331 dermoscopy images.
Retina image data: We found two articles that applied retinal images in their FL models. In [31], Zhou et al. used a DR dataset consisting of 3,662 images. The images are annotated into five severity categories, from no DR to the most severe level, and their goal was to classify the different levels of DR cases. In another article, Lo et al. [16] collected a total of 153 data samples from four different sources; their deep learning model performed binary classification to separate DR and non-DR samples. Others: Adnan et al. [28] used tissue image data; more specifically, they proposed a privacy-guaranteed ML model in which lung tissue images were used to classify cancer. In addition, Li et al. [20] and Linardos et al. [26] both used MRI images for their models, brain and heart MRI data respectively. We have included all of the dataset names with their references for easy access in Table 5.
RQ4 Are the numbers of data samples sufficient?
In ML research, it is well established that the more data we have for training, the better the predictions we can expect from the models. Also, the chance of model overfitting increases with a smaller dataset, so it is always advisable to use a larger dataset. For our study, we analyzed the 17 articles by the range of data samples used in the respective research papers. First, we discuss the articles that used fewer than 1,000 samples. As Table 5 shows, four papers [16], [18], [20], and [26] used very small amounts of data, with 153, 216, 370, and 180 samples respectively. Since a larger dataset offers better potential for in-depth analysis, 153 data samples are hardly technically sound. Next, within the 10,000-sample range, seven papers [21], [24], [25], [27], [28], [31], and [32] used between 2,109 and 6,284 samples, which is quite reasonable. Finally, we found six papers [17], [19], [22], [23], [29], [30] that each used more than 10,000 images
individually. CheXpert, the largest dataset found in our investigation (223,414 CXR images overall), was considered by Chakravarty et al. [23].
RQ5 Are non-IID data distributions considered?
Two forms of federated frameworks exist according to the data distribution: IID and non-IID. IID stands for independent and identically distributed. This can be split into two parts, independence and identical distribution; independence means that the value (data) of one example does not affect the value of another. This is commonly illustrated by a coin-flipping experiment: when a coin is flipped repeatedly, the result of one flip does not depend on any other flip. Identically distributed means that the probability of any specific outcome stays the same; for example, every coin flip has a 50% chance of heads and a 50% chance of tails, and that probability does not change from flip to flip. Non-IID data are technically the opposite on both counts. While the feature distribution of IID data is the same across clients, it differs across clients in the non-IID case. The problem is quite common in real life; for example, the appearance of medical image samples taken on different machines across different hospitals may not align due to different imaging protocols. Therefore, non-IID data settings mean that values are dependent on each other and there are overall trends between them. Generally in FL, local models are trained independently, each client's data distribution is hidden from the others, and as a result data types and features can vary from client to client [6], [53]; this variation makes non-IID data an important consideration in FL research. In this study, we investigated FL as used in medical image analysis and observed that the federated data structure is complicated, especially when the local clients' data differ significantly from each other. Our results show that only four papers considered non-IID data alongside IID data (we did not find a sufficient explanation in [26] and [21]), and the remaining 13 did not address the matter. In [18], Feki et al. divided the collected dataset into four parts as client data; for IID, they used an equal number of images per client and per class, while for non-IID data they allocated the samples among classes unequally at a ratio of 66% and 44%. Likewise, Adnan et al. [28] performed FL with IID and non-IID data separately, where the number of samples differed between clients under the non-IID scenario.
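To illustrate the difference between the two settings, the following toy Python sketch partitions a set of labelled image indices into IID shards (each client sees all classes in similar proportions) and label-skewed non-IID shards (each client is dominated by a few classes). The number of clients, the number of classes, and the labels themselves are assumptions made for illustration only, not configurations taken from the reviewed studies.

import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=1000)   # toy labels for 1,000 images, 4 classes
n_clients = 4

# IID: shuffle all indices and split evenly, so every client sees every class.
idx = rng.permutation(len(labels))
iid_shards = np.array_split(idx, n_clients)

# Non-IID (label skew): sort by label so each shard is dominated by few classes.
sorted_idx = np.argsort(labels)
non_iid_shards = np.array_split(sorted_idx, n_clients)

for c in range(n_clients):
    print("client", c,
          "IID class counts", np.bincount(labels[iid_shards[c]], minlength=4),
          "| non-IID class counts", np.bincount(labels[non_iid_shards[c]], minlength=4))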
C. ML FRAMEWORK
RQ6 Which ML algorithms are used to train local models?
Although FL is the main focus of this investigation, the ML techniques make the actual difference when it comes to the overall performance of the models. As usual in an FL framework, each client's data are trained using ML algorithms. Since our review is based on medical image data, the image analysis or computer vision tasks are mostly handled by CNN oriented deep learning models. To answer this question we searched each of the considered articles and found a variety of built-in
TABLE 6. ML and FL methods for medical image analysis.
CNN models, such as VGG16, Inception, ResNet18, and many more. VGG16 is a widely used, reliable, pre-trained model; five of the 17 surveyed papers employed this CNN. The model consists of 16 layers: 13 convolutional and 3 fully connected. Likewise, VGG19 is a 19-layer CNN model and was used by Lo et al. [16]. Residual Network (ResNet) is also a commonly used architecture that can be built with different numbers of layers, e.g., ResNet18 ([23], [26], [27], [29]), ResNet50 ([18], [24]), and ResNet101 ([24]). Other pre-trained CNN models are Inception ([17], [19], [22]) and AlexNet ([17], [19]). In addition, custom CNN-based deep learning models were used in several articles, as listed in Table 6. Li et al. [20] used a multi-layer perceptron (MLP) classifier, a deep neural network with one input, one hidden, and one output layer. Adnan et al. [28] performed image segmentation using a supervised learning approach called Multiple-Instance Learning (MIL) to train the local models.
D. FL IMPLEMENTATION
RQ7 Are any additional security methods implemented?
Data privacy and security are not the same in practice; privacy covers the use of data (control, access, and regulation), whereas security concerns the potential threats of unauthorized access and malicious attacks. FL mainly addresses the privacy concern, since stakeholders' trained models are shared instead of the data directly. Still, sharing models can be vulnerable while parameters are exchanged between clients and servers, and this can pose a threat to system security [28]. Several additional privacy-preserving methods have been described in a systematic review article [83]. However, we found only a few articles that
considered additional security measures in FL based medical imaging research. Most of these articles (three out of four) used Differential Privacy (DP), which allows organizations to collect information about their users without compromising any individual's privacy; the ultimate goal is to be able to share information about a dataset without revealing individuals' Personally Identifiable Information (PII) [9], [84]. Li et al. [20] used two different DP mechanisms, Gaussian and Laplace, and defined a noise level α that varied from 0.001 to 1. Similarly, Kaissis et al. [27] applied both techniques, and Adnan et al. [28] used only Gaussian noise in their experiments. In addition, Połap et al. [17] used encryption and blockchain techniques to make their FL model more secure. They proposed three different learning agents, with the blockchain technique applied in the Data Management Agent (DMA). According to their description, all patient data (images) are assigned unique IDs; once an analysis request arrives, the system checks whether the ID already exists in the database, and if not it creates a unique ID and a block in the blockchain, then transfers the ID to the database along with the image.
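As a rough illustration of the Gaussian mechanism mentioned above, the snippet below clips a client's model update and adds Gaussian noise before the update leaves the client. This is a sketch under assumed parameters (the clipping norm and noise multiplier are hypothetical values), not the implementation used in [20], [27], or [28].

import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=0.1, rng=None):
    # Clip the update to bound its sensitivity, then add Gaussian noise.
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw_update = np.array([0.8, -1.5, 0.3])   # hypothetical local model update
print(privatize_update(raw_update))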
RQ8 What types of federated data partitioning are used?
Three categories of FL are mainly described in the previous literature, based on how the training data are distributed across the models. Among the three types, Federated Transfer Learning (FTL) and Vertical FL (VFL) are rarely considered in medical research, whereas Horizontal FL (HFL) is used widely. In a horizontal partition, each client's database holds many different customers but collects the same type of data on those customers; in other words, "same features, different samples". In vertical FL, the clients hold an overlapping set of customers but collect different features on them; in other words, "same samples, different features" [3], [9], [84]. In this investigation we focused on medical image research and found that most of the articles were based on HFL. For example, Feki et al. [18] utilized HFL; they used a chest X-ray image dataset in which the features are the same for all clients but the samples are different. Interestingly, Kaissis et al. [27] used two different datasets for training and testing their FL models; both datasets contain X-ray images (same features) but different data. Only two articles did we classify as VFL: [24] took three datasets, two X-ray based and one CT based, and the authors combined both types of images to train and test the models. In [25], the authors used X-ray and ultrasound images for their federated models. X-ray versus CT or ultrasound images are technically different, so their features also differ, and using different data features in different clients creates a VFL scenario.
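To make the distinction concrete, the toy sketch below splits a single feature matrix horizontally (clients share the feature space but hold different patients) and vertically (clients hold different feature subsets of the same patients). The matrix shape is arbitrary, and the example is a simplification rather than code drawn from the reviewed papers.

import numpy as np

data = np.arange(6 * 4).reshape(6, 4)   # 6 patients (rows) x 4 features (columns)

# Horizontal FL: same features, different samples -> split by rows (patients).
hfl_client_a, hfl_client_b = data[:3, :], data[3:, :]

# Vertical FL: overlapping samples, different features -> split by columns.
vfl_client_a, vfl_client_b = data[:, :2], data[:, 2:]

print(hfl_client_a.shape, hfl_client_b.shape)   # (3, 4) (3, 4)
print(vfl_client_a.shape, vfl_client_b.shape)   # (6, 2) (6, 2)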
RQ9 What are the federated frameworks used?
Table 6 lists the deep learning architectures used to train the local models (discussed in RQ6) along with the federated framework, i.e., the aggregation approach applied to the collected local models at the central server. We observed that federated mechanisms are executed in two ways: some articles were driven by previously proposed built-in FL algorithms, and others used basic aggregation concepts. FedAvg (discussed in Section II) is a commonly used method for federated aggregation; as Table 6 shows, six articles considered this algorithm. Likewise, [20] and [27] used two different federated algorithms, named Fed and secure aggregation (SecAgg) respectively. SecAgg is a secure model aggregation protocol for FL, also proposed by Google in 2016. Połap and Woźniak [19] proposed a meta-heuristic search based federated model: first they calculated the average loss of all local models and then selected only the models that scored higher than the average loss for aggregation at the server. All of them essentially pursue the fundamental concepts of FL but implement them in different ways. However, the federated aggregation process described above has no impact on the model performance; it is all about engineering the data distribution in a decentralized and collaborative manner.
E. EXPERIMENTAL
RQ10 What are the performance measures used in the
studies?
The final and crucial step of any ML setting is to assess how good the model is through performance evaluation. The basic idea is to develop an ML model using some training samples and to test this trained model on other, unseen data. The training error is not very useful for actual evaluation, because it is easy to overfit the training data with complex models that do not generalize well to future samples. Conversely, the testing error is the key metric, since it gives a better approximation of the true performance of the model on future samples. We therefore only considered testing performance throughout our review. As we found in this investigation, both classification and segmentation tasks were used, which is why performance was also evaluated in different ways. In Fig. 6 we present the number of articles using different performance metrics. Most of the experiments (14 out of 17) were evaluated by accuracy. Recall was the second most commonly used measure, considered by five articles. The Area Under the Curve (AUC) score was used three times and precision twice.
FIGURE 6. Number of articles that applied each performance metric.
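For reference, the metrics reported in Fig. 6 can be computed as in the short scikit-learn sketch below; the labels, predictions, and probability scores are hypothetical and are not results from any reviewed study.

from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                    # hypothetical ground-truth labels
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                    # hypothetical predicted labels
y_score = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1, 0.7, 0.3]    # predicted probabilities for AUC

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))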
RQ11 How is the performance of the FL frameworks
reported?
This question gives an overview of the performance achieved by the FL based models in the 17 articles. Performance assessment is the final part of any ML model, in which the conducted experiment is evaluated using different metrics. Our investigation revealed that 14 articles worked on data classification (binary and multi-class), one article worked on data segmentation, and the remaining two considered both (listed in Table 4). Usually the performance of classification tasks is assessed by accuracy, which reports the proportion of correctly identified samples out of all the data [14]. We divided the performance into three categories according to the accuracy achieved in the 17 studies: high (>=90%), medium (80%-89%), and low (<80%). Table 7 summarizes the performance scores of all articles.
High: We found that eight articles reported an accuracy of 90% or more. Feki et al. [18] performed binary classification and achieved the highest accuracy scores: 94.4% for FL+VGG16 with data augmentation and 93.57% for FL+VGG16. Połap and Woźniak [19] used an Inception classifier for their FL model and obtained an accuracy of 91%. The exact scores of [25], [30], and [24] are not clear; they reported accuracies between 90% and 95%. Articles [32] and [27] achieved accuracies of 90.61% and 90% respectively. Yan et al. [29] presented their results using sensitivity; their highest score was 91.26%.
Medium: In [22], the authors classified images as diseased or not diseased; their proposed VGG based FL model achieved 89.82% accuracy. Lo et al. [16] performed both classification and segmentation on different datasets: the classification and segmentation accuracies for the SFU dataset were 88% and 85% respectively, and the classification accuracy for the OHSU dataset was 89%. In [26], Linardos et al. used AUC; the highest score achieved by their FL model was 89%. Adnan et al. [28] conducted binary classification with an accuracy of 85%.
Low: The remaining three articles performed multi-class classification, where Chakravarty et al. [23] considered 14 classes, Połap et al. [17] seven, and Li et al. [20] four; their reported performances were an AUC of 80%, and accuracies of 70% and 76%, respectively.
RQ12 How does the FL approach perform compared to conventional models?
The last research question explores the comparative performance between FL and traditional ML image processing research. This query is important when discussing the effects, contributions, and drawbacks of using FL in medical image analysis. To answer it, we intensively collected experimental results from both areas: the 17 FL articles and their relevant conventional counterparts. We already described the performance of the FL models in the previous question; here we present the results of the conventional ML models and then the comparative analysis. In Table 7 we summarize the performance of all articles in this review and place the results of one or more similar articles opposite each of them to form a comparison chart. To do
TABLE 7. Performance (accuracy) comparison: the results of each reviewed article and its respective comparable ML models, with references.
so, we extensively investigated dozens of research papers that analyzed medical images with traditional ML to find the best matching options, which was essential for a reliable comparison. Several conditions were applied, based on the structural and experimental similarity between the ML and FL papers; for example, we considered papers that used similar datasets, algorithms, and performance measures. We expect that maintaining these conditions ensures an accurate comparison between the two sides. As Table 7 shows, all of the ML models achieved higher accuracy than their respective FL counterparts in the existing literature; more specifically, we found better ML results against every FL article. For instance, Połap et al. [17] achieved 70% accuracy with federated VGG16 and 67% with Inception, whereas on the ML side Jain et al. [56] achieved 79.23% accuracy with Inception and Liu et al. [57] 87% with ResNet50; all three considered the MNIST: HAM10000 dataset. Similarly, Chakravarty et al. [23] reported an AUC of 80% in the FL environment, but with the same dataset and ML algorithm, articles [67] and [68] reported AUCs of 86% and 87% respectively.
V. OPEN ISSUES AND CHALLENGES
FL is still a young research field, so it is difficult to pass a verdict on its rejection or acceptance. Here we discuss the issues and challenges found in the reviewed articles regarding the application of FL in medical imaging. FL was invented to address the privacy concerns of private data, but unfortunately it does not cover all potential privacy threats [93]. Below we describe the privacy and security, data heterogeneity, model performance, and federated architecture issues found in the review:
A. PRIVACY AND SECURITY
Medical image data contain patients' personal information, and no one can share such data for AI applications without reliable data protection. FL enables data sharing between different institutions with some privacy guarantees through an advanced data management and model construction process, as described in Section II. Unlike conventional ML, in FL the training process is exposed to multiple parties; we do not know the motives of every participant, so there is an issue of trust among them, and this additional communication increases the risk of data leakage via reverse engineering. Meanwhile, we observed two further privacy measures used in federated medical image processing: differential privacy and secure aggregation. Differential privacy involves adding carefully selected noise to the outputs and can be applied by the individual clients or at the server level; secure aggregation is a cryptographic technique (e.g., blockchain technology) that ensures the server can only see the aggregate of many updates rather than individual model updates. The reality, however, is that every privacy mechanism comes with a significant computational cost to the federation.
B. DATA HETEROGENEITY
Our investigation shows that data heterogeneity can occur in two ways among the clients: the numbers of samples differ (non-IID data) and the data features differ (VFL). Usually, the amounts of data produced by hospitals are not identical, and in FL clients can have different data distributions; this uneven distribution on the client side may provide conflicting gradient updates to the server, which is challenging to handle. Furthermore, in practice the features of federated datasets often differ; for instance, X-ray and CT image data may be used at two different clients, which causes trouble when aggregating the model parameters centrally in an FL setting.
C. OVERALL MODEL PERFORMANCE
The first impression of an AI model is its performance, i.e., how accurately the model accomplishes the task. High accuracy makes a model more acceptable than one that achieves a lower score. We previously discussed the performance of the federated models and compared them with traditional ML models (RQ12). Our findings show that FL failed to perform better than ML with similar model structures; this drawback compels us to reevaluate the usefulness of FL in medical imaging.
D. FEDERATED ARCHITECTURE
Training a personalized model on each client is not difficult in FL; problems emerge when all of the model outputs are transferred to the central server and pass through an aggregation process. We observed that the federated models presented in the reviewed articles are mostly theoretical and rarely implemented in practice; few articles included open source code. Since research in the field only started a couple of years ago, the research methods and materials need to become more easily accessible to future researchers. Besides, research usually takes place in a very controlled setting, but questions arise when we aim at huge datasets to simulate real-world scenarios.
VI. LIMITATIONS
In this section we acknowledge the limitations of this study. First, we searched all prominent databases for article collection, but some journals and conference proceedings have subscription-based download policies, and in some such cases we could not obtain the papers from the sources. We tried an alternative route, emailing the corresponding authors to request the full text of the required article, but we still failed to reach some of them ([94] and [95]), which limits the coverage of this survey. In addition, our inclusion and exclusion process removed articles from the initial pool, preprint articles were not included, and we could not explore every searchable database, so it is possible that we missed relevant article(s) on the topic. We did not re-run the models used in the 17 articles ourselves; for a precise review, that would have been more effective. Overall, it is difficult to conclude this study with strong, tested historical evidence, because our review had very limited time and resources and FL was only recently introduced.
VII. CONCLUSION AND FUTURE DIRECTIONS
Imaging techniques are among the most popular and effective diagnostic methods in the medical sector. This practice is growing day by day and produces enormous amounts of image data. AI has many opportunities in medical imaging using these data, but the clinical use of AI and ML is very limited right now. From a research perspective, creating a publicly shareable image dataset is very difficult in the medical domain. The major hurdle to data sharing and collaboration is privacy, which is given little priority in typical centralized models. Federated or distributed learning takes a different approach: a data-driven learning model is shared rather than the data directly. In this study, we systematically reviewed the articles that considered FL in their ML based medical image research. We discussed them from every perspective, including demographic data, privacy aspects, datasets, FL characteristics, model implementation, and performance comparison. We noted in one of our previous articles [33] that deep learning oriented COVID-19 detection using X-ray and CT images achieves high accuracy, with most models exceeding 95%. We observed a similar trend in this study: COVID-19 detection articles are the top scorers under the FL mechanism, although the scores under FL are comparatively lower than those of conventional models, as listed in Table 7. The performance of other application domains with FL models was also unremarkable. Besides, previous articles point out that the implementation of federated models is relatively complex and requires extra communication and maintenance effort. However, it is encouraging that
TABLE 8. Quality questions and the scores achieved by the 17 articles.
the research field has received a lot of attention and publications within a very short time, which is why we can hope for promising progress of FL in medical image analysis in the future. At this stage, we summarize our findings below as future directions for researchers interested in contributing to the field:
The privacy concern is not fully solved in FL; however, we cannot deny the importance of decentralized concepts. They could be effective for collaborative ML in medical image research, so researchers should emphasize implementing additional privacy protection in a cost-effective way.
Datasets in this research are collected from various sources and for various purposes, so experimental results can differ enormously. There is no particular benchmark dataset available in federated medical imaging research; standard datasets need to be built to avoid biased data and data heterogeneity problems.
Similarly, no benchmark FL model has yet been presented in this field; such an initiative would help build robust AI models for further research.
In practice, data in collaborative models are prone to be heterogeneous, with various classes of data contributed by different parties. However, our results show that the accuracy of multi-class classification is very low (as described in RQ11), which needs to be addressed in future research.
Federated models achieved satisfactory performance in some cases, but they cannot yet be described as an alternative to ML models in terms of accuracy.
Many weaknesses were observed in the current publications of this field (the papers investigated in this review); we include the article quality checklist and results in the Appendix. Future research could use the quality analysis questionnaire to improve article quality.
There is no doubt that FL may be on the future horizon, but there are still technical problems and challenges that need to be tackled before FL can be applied widely. To the best of our knowledge this is the first SLR of its kind, and we believe this review is a reflection of FL research in the area of medical imaging.
APPENDIX
QUALITY ANALYSIS
Table 8 shows the 12 Quality Questions (QQ) and scores, mostly adapted from our previous article [14]. The goal of this inquiry was to check the basic quality of the articles published on FL oriented medical imaging. Each question carries one point per article, for a maximum total score of 12 per article. We considered the QQ answers in three forms of
VOLUME 11, 2023 28641
M. F. Sohan, A. Basalamah: Systematic Review on FL in Medical Image Analysis
scoring: "Yes (1)", "Partially Yes (0)", and "No (−1)".
An article that clearly supports a question is marked Yes; one that partially supports it, or where no clear answer is found, is marked Partially Yes; and one that clearly does not is marked No. We investigated each article to find the answers and assigned the scores in the respective columns. As the table indicates, most of the articles failed to fulfill the quality requirements. The highest score, 9 out of 12, was obtained by [27], followed by six each for [20] and [28]. The scores indicate that quality has been maintained poorly in some areas of the research papers; a reason could be that the surge of attention created a rush in FL research among contributors.
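For illustration, the per-article score in Table 8 can be thought of as a simple tally over the 12 question answers. The small Python sketch below is our own illustration; the answer-to-point mapping is an assumption based on the description above, not a reproduction of the original scoring sheet.

    def quality_score(answers, points=None):
        # Assumed answer-to-point mapping; the exact values should follow the
        # paper's own scoring scheme, which this sketch does not reproduce.
        points = points or {"yes": 1, "partially yes": 0, "no": -1}
        return sum(points[answer] for answer in answers)

    # Example: nine clear "yes" answers and three "partially yes" answers give 9 of 12.
    print(quality_score(["yes"] * 9 + ["partially yes"] * 3))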
REFERENCES
[1] A. Maier, S. Steidl, V. Christlein, and J. Hornegger, Medical Imaging
Systems: An Introductory Guide. Cham, Switzerland: Springer, 2018.
[2] Z. Zhang and E. Sejdic, ‘‘Radiological images and machine learning:
Trends, perspectives, and prospects,’’ Comput. Biol. Med., vol. 108,
pp. 354–370, May 2019.
[3] A. Rauniyar, D. Haileselassie Hagos, D. Jha, J. E. Håkegård, U. Bagci,
D. B. Rawat, and V. Vlassov, ‘‘Federated learning for medical applications:
A taxonomy, current trends, challenges, and future research directions,’’
2022, arXiv:2208.03392.
[4] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas,
‘‘Federated learning of Deep Networks using model averaging,’’ 2016,
arXiv:1602.05629.
[5] B. Pfitzner, N. Steckhan, and B. Arnrich, ‘Federated learning in a medical
context: A systematic literature review,’’ ACM Trans. Internet Technol.,
vol. 21, no. 2, pp. 1–31, Jun. 2021.
[6] Prayitno, C.-R. Shyu, K. T. Putra, H.-C. Chen, Y.-Y. Tsai, K. S. Hossain,
W. Jiang, and Z.-Y. Shae, ‘‘A systematic review of federated learning in the
healthcare area: From the perspective of data properties and applications,’’
Appl. Sci., vol. 11, no. 23, p. 11191, Nov. 2021.
[7] M. G. Crowson, D. Moukheiber, A. R. Arévalo, B. D. Lam, S. Mantena,
A. Rana, D. Goss, D. W. Bates, and L. A. Celi, ‘A systematic review of
federated learning applications for biomedical data,’ PLOS Digit. Health,
vol. 1, no. 5, May 2022, Art. no. e0000033.
[8] A. Chowdhury, H. Kassem, N. Padoy, R. Umeton, and A. Karargyris,
‘‘A review of medical federated learning: Applications in oncol-
ogy and cancer research,’ in Brainlesion: Glioma, Multiple Scle-
rosis, Stroke and Traumatic Brain Injuries. Springer, Jul. 2022,
doi: 10.1007/978-3-031-08999-2_1.
[9] D. C. Nguyen, Q.-V. Pham, P. N. Pathirana, M. Ding, A. Seneviratne,
Z. Lin, O. Dobre, and W.-J. Hwang, ‘Federated learning for smart health-
care: A survey,’’ ACM Comput. Surv., vol. 55, no. 3, pp. 1–37, Apr. 2023.
[10] A. Naeem, T. Anees, R. A. Naqvi, and W.-K. Loh, ‘‘A comprehensive anal-
ysis of recent deep and federated-learning-based methodologies for brain
tumor diagnosis,’ J. Personalized Med., vol. 12, no. 2, p. 275, Feb. 2022.
[11] J. Xu, B. S. Glicksberg, C. Su, P. Walker, and J. Bian, ‘Federated learning
for healthcare informatics,’ J. Healthc Inform. Res., vol. 5, pp. 1–19,
Dec. 2021.
[12] N. Rieke, J. Hancox, W. Li, F. Milletarì, H. R. Roth, S. Albarqouni,
S. Bakas, M. N. Galtier, B. A. Landman, K. Maier-Hein, S. Ourselin,
M. Sheller, R. M. Summers, A. Trask, D. Xu, M. Baust, and M. J. Cardoso,
‘‘The future of digital health with federated learning,’’ npj Digit. Med.,
vol. 3, no. 1, p. 119, Sep. 2020.
[13] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas,
‘‘Communication-efficient learning of deep networks from decentralized
data,’ 2016, arXiv:1602.05629.
[14] M. F. Sohan and A. Basalamah, ‘A systematic literature review and
quality analysis of Javascript malware detection,’’ IEEE Access, vol. 8,
pp. 190539–190552, 2020.
[15] D. Moher, A. Liberati, J. Tetzlaff, and D. G. Altman, ‘Preferred reporting
items for systematic reviews and meta-analyses: The PRISMA statement,’’
PLoS Med., vol. 6, no. 7, p. 264, 2009.
[16] J. Lo, T. T. Yu, D. Ma, P. Zang, J. P. Owen, Q. Zhang, R. K. Wang,
M. F. Beg, A. Y. Lee, Y. Jia, and M. V. Sarunic, ‘‘Federated learning for
microvasculature segmentation and diabetic retinopathy classification of
OCT data,’ Ophthalmol. Sci., vol. 1, no. 4, Dec. 2021, Art. no. 100069.
[17] D. Połap, G. Srivastava, and K. Yu, ‘‘Agent architecture of an intelligent
medical system based on federated learning and blockchain technology,’’
J. Inf. Secur. Appl., vol. 58, May 2021, Art. no. 102748.
[18] I. Feki, S. Ammar, Y. Kessentini, and K. Muhammad, ‘‘Federated learning
for COVID-19 screening from chest X-ray images,’’ Appl. Soft Comput.,
vol. 106, Jul. 2021, Art. no. 107330.
[19] D. Połap and M. Wozniak, ‘Meta-heuristic as manager in federated
learning approaches for image processing purposes,’ Appl. Soft Comput.,
vol. 113, Dec. 2021, Art. no. 107872.
[20] X. Li, Y. Gu, N. Dvornek, L. H. Staib, P. Ventola, and J. S. Duncan,
‘‘Multi-site fMRI analysis using privacy-preserving federated learning and
domain adaptation: ABIDE results,’ Med. Image Anal., vol. 65, Oct. 2020,
Art. no. 101765.
[21] D. Yang, Z. Xu, W. Li, A. Myronenko, H. R. Roth, S. Harmon, S. Xu,
B. Turkbey, E. Turkbey, X. Wang, W. Zhu, G. Carrafiello, F. Patella,
M. Cariati, H. Obinata, H. Mori, K. Tamura, P. An, B. J. Wood, and D. Xu,
‘‘Federated semi-supervised learning for COVID region segmentation in
chest CT using multi-national data from China, Italy, Japan,’’ Med. Image
Anal., vol. 70, May 2021, Art. no. 101992.
[22] D. Połap, ‘‘Fuzzy consensus with federated learning method in medical
systems,’ IEEE Access, vol. 9, pp. 150383–150392, 2021.
[23] A. Chakravarty, A. Kar, R. Sethuraman, and D. Sheet, ‘‘Federated learning
for site aware chest radiograph screening,’ in Proc. IEEE 18th Int. Symp.
Biomed. Imag. (ISBI), Apr. 2021, pp. 1077–1081.
[24] W. Zhang, T. Zhou, Q. Lu, X. Wang, C. Zhu, H. Sun, Z. Wang, S. K. Lo,
and F.-Y. Wang, ‘Dynamic-fusion-based federated learning for COVID-19
detection,’ IEEE Internet Things J., vol. 8, no. 21, pp. 15884–15891,
Nov. 2021.
[25] A. Qayyum, K. Ahmad, M. A. Ahsan, A. Al-Fuqaha, and J. Qadir,
‘‘Collaborative federated learning for healthcare: Multi-modal COVID-19
diagnosis at the edge,’ IEEE Open J. Comput. Soc., vol. 3, pp. 172–184,
2022.
[26] A. Linardos, K. Kushibar, S. Walsh, P. Gkontra, and K. Lekadir, ‘Feder-
ated learning for multi-center imaging diagnostics: A simulation study in
cardiovascular disease,’’ Sci. Rep., vol. 12, no. 1, p. 3551, Mar. 2022.
[27] G. Kaissis, A. Ziller, J. Passerat-Palmbach, T. Ryffel, D. Usynin,
A. Trask, I. Lima, J. Mancuso, F. Jungmann, M.-M. Steinborn, A. Saleh,
M. Makowski, D. Rueckert, and R. Braren, ‘End-to-end privacy preserv-
ing deep learning on multi-institutional medical imaging,’ Nature Mach.
Intell., vol. 3, pp. 473–484, Jun. 2021.
[28] M. Adnan, S. Kalra, J. C. Cresswell, G. W. Taylor, and H. R. Tizhoosh,
‘‘Federated learning and differential privacy for medical image analysis,’’
Sci. Rep., vol. 12, no. 1, p. 1953, Feb. 2022.
[29] B. Yan, J. Wang, J. Cheng, Y. Zhou, Y. Zhang, Y. Yang, L. Liu, H. Zhao,
C. Wang, and B. Liu, ‘‘Experiments of federated learning for COVID-19
chest X-ray images,’ Advances in Artificial Intelligence and Security.
Springer, 2021, pp. 41–53.
[30] M. A. Hashmani, S. M. Jameel, S. S. H. Rizvi, and S. Shukla, ‘‘An adaptive
federated machine learning-based intelligent system for skin disease detec-
tion: A step toward an intelligent dermoscopy device,’’ Appl. Sci., vol. 11,
no. 5, p. 2145, Feb. 2021.
[31] S. Zhou, B. Landman, Y. Huo, and A. Gokhale, ‘‘Communication-efficient
federated learning for multi-institutional medical image classification,’ in
Proc. SPIE, 2022, pp. 6–12.
[32] M. A. Salam, S. Taha, and M. Ramadan, ‘COVID-19 detection using
federated machine learning,’ PLoS ONE, vol. 16, no. 6, Jun. 2021,
Art. no. e0252573.
[33] M. F. Sohan, A. Basalamah, and M. Solaiman, ‘COVID-19 detection
using machine learning: A large scale assessment of X-ray and CT image
datasets,’ J. Electron. Imag., vol. 31, no. 4, Mar. 2022, Art. no. 041212.
[34] K. Munir, H. Elahi, A. Ayub, F. Frezza, and A. Rizzi, ‘‘Cancer diagnosis
using deep learning: A bibliographic review,’ Cancers, vol. 11, no. 9,
p. 1235, Aug. 2019.
[35] M. Heisler, F. Chan, Z. Mammo, C. Balaratnasingam, P. Prentasic,
G. Docherty, M. Ju, S. Rajapakse, S. Lee, A. Merkur, A. Kirker, D. Albiani,
D. Maberley, K. Bailey Freund, M. Faisal Beg, S. Loncaric, M. V. Sarunic,
and E. V. Navajas, ‘‘Deep learning vessel segmentation and quantification
of the foveal avascular zone using commercial and prototype OCT—A
platforms,’ 2019, arXiv:1909.11289.
[36] P. Tschandl, C. Rosendahl, and H. Kittler, ‘The HAM 10000 dataset, a
large collection of multi-source dermatoscopic images of common pig-
mented skin lesions,’ Sci. Data, vol. 5, no. 1, Aug. 2018.
[37] J. P. Cohen, P. Morrison, L. Dao, K. Roth, T. Q. Duong, and M. Ghassemi,
‘‘COVID-19 image data collection: Prospective predictions are the future,’’
2020, arXiv:2006.11988.
[38] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wáng, P.-X. Lu, and
G. Thoma, ‘Two public chest X-ray datasets for computer-aided screen-
ing of pulmonary diseases,’ Quant. Imag. Med. Surg., vol. 4, no. 6,
pp. 475–477, Nov. 2014, Accessed: Oct. 18, 2022. [Online]. Available:
https://qims.amegroups.com/article/view/5132/6030
[39] A. Di Martino et al., ‘‘The autism brain imaging data exchange: Towards
a large-scale evaluation of the intrinsic brain architecture in autism,’’ Mol.
Psychiatry, vol. 19, no. 6, pp. 659–667, Jun. 2013.
[40] J. Irvin, P. Rajpurkar, M. Ko,Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund,
B. Haghgoo, R. Ball, K. Shpanskaya, J. Seekins, D. A. Mong, S. S. Halabi,
J. K. Sandberg, R. Jones, D. B. Larson, C. P. Langlotz, B. N. Patel,
M. P. Lungren, and A. Y. Ng, ‘‘CheXpert: A large chest radiograph dataset
with uncertainty labels and expert comparison,’ in Proc. AAAI Conf. Artif.
Intell., 2019, pp. 590–597, Accessed: Oct. 18, 2022. [Online]. Available:
https://ojs.aaai.org/index.php/AAAI/article/view/3834
[41] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir,
Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. A. Emadi, M. B. Reaz,
and M. T. Islam, ‘‘Can AI help in screening viral and COVID-19 pneumo-
nia?’’ IEEE Access, vol. 8, pp. 132665–132676, 2020.
[42] X. Yang, X. He, J. Zhao, Y. Zhang, S. Zhang, and P. Xie, ‘COVID-CT-
dataset: A CT scan dataset about COVID-19,’’ 2020, arXiv:2003.13865.
[43] Agchung. Agchung/Figure1-COVID-Chestxray-Dataset: Figure 1
COVID-19 Chest X-Ray Dataset Initiative. GitHub. Accessed:
Oct. 18, 2022. [Online]. Available: https://github.com/agchung/Figure1-
COVID-chestxray-dataset
[44] V. M. Campello et al., ‘‘Multi-centre, multi-vendor and multi-disease
cardiac segmentation: The M&Ms challenge,’ IEEE Trans. Med. Imag.,
vol. 40, no. 12, pp. 3543–3554, Dec. 2021.
[45] O. Bernard et al., ‘‘Deep learning techniques for automatic MRI cardiac
multi-structures segmentation and diagnosis: Is the problem solved?’
IEEE Trans. Med. Imag., vol. 37, no. 11, pp. 2514–2525, Nov. 2018.
[46] D. Kermany, K. Zhang, and M. Goldbaum. (Jan. 1, 2018). Large Dataset
of Labeled Optical Coherence Tomography (OCT) and Chest X-Ray
Images. Mendeley Data. Accessed: Oct. 18, 2022. [Online]. Available:
https://data.mendeley.com/datasets/rscbjbr9sj/3
[47] Project-MONAI. Project-Monai/Monai: AI Toolkit for Healthcare
Imaging. GitHub. Accessed: Oct. 18, 2022. [Online]. Available:
https://github.com/Project-MONAI/MONAI/
[48] Medical Segmentation Decathlon. Accessed: Oct. 18, 2022. [Online].
Available: http://medicaldecathlon.com/
[49] K. Tomczak, P. Czerwinska, and M. Wiznerowicz, ‘Review the cancer
genome atlas (TCGA): An immeasurable source of knowledge,’ Wspól-
czesna Onkologia, vol. 1A, pp. 68–77, Jan. 2015.
[50] L. Wang, Z. Q. Lin, and A. Wong, ‘‘COVID-Net: A tailored deep convolu-
tional neural network design for detection of COVID-19 cases from chest
X-ray images,’ Sci. Rep., vol. 10, no. 1, pp. 1–12, Nov. 2020.
[51] ISIC Challenge. Accessed: Oct. 18, 2022. [Online]. Available:
https://challenge.isic-archive.com/landing/2019/
[52] J. Y. Choi, T. K. Yoo, J. G. Seo, J. Kwak, T. T. Um, and T. H. Rim, ‘‘Multi-
categorical deep learning neural network to classify retinal images: A pilot
study employing small database,’ PLoS ONE, vol. 12, no. 11, Nov. 2017,
Art. no. e0187336.
[53] H. Zhu, J. Xu, S. Liu, and Y. Jin, ‘‘Federated learning on non-IID
Data: A survey,’’ Neurocomputing, vol. 465, pp. 371–390, Sep. 2021,
Accessed: Oct. 18, 2022. [Online]. Available: https://www.sciencedirect.
com/science/article/abs/pii/S0925231221013254
[54] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan,
S. Patel, D. Ramage, A. Segal, and K. Seth, ‘Practical secure aggregation
for federated learning on user-held data,’ 2016, arXiv:1611.04482.
[55] M. Heisler, S. Karst, J. Lo, Z. Mammo, T. Yu, S. Warner, D. Maberley,
M. F. Beg, E. V. Navajas, and M. V. Sarunic, ‘‘Ensemble deep learning
for diabetic retinopathy detection using optical coherence tomography
angiography,’’ Transl. Vis. Sci. Technol., vol. 9, no. 2, p. 20, Apr. 2020.
[56] S. Jain, U. Singhania, B. Tripathy, E. A. Nasr, M. K. Aboudaif, and
A. K. Kamrani, ‘‘Deep learning-based transfer learning for classification
of skin cancer,’’ Sensors, vol. 21, no. 23, p. 8142, Dec. 2021.
[57] Y. Liu, Z. Wang, Z. Li, J. Li, T. Li, P. Chen, and R. Liang, ‘‘Multiscale
ensemble of convolutional neural networks for skin lesion classification,’’
IET Image Process., vol. 15, no. 10, pp. 2309–2318, Aug. 2021.
[58] D. Das, K. C. Santosh, and U. Pal, ‘‘Truncated inception net: COVID-19
outbreak screening using chest X-rays,’ Phys. Eng. Sci. Med., vol. 43,
no. 3, pp. 915–925, Sep. 2020.
[59] N. S. Punn and S. Agarwal, ‘‘Automated diagnosis of COVID-19 with
limited posteroanterior chest X-ray images using fine-tuned deep neu-
ral networks,’ Int. J. Speech Technol., vol. 51, no. 5, pp. 2689–2702,
May 2021.
[60] G. Chowdhary, N. K. Toppo, and D. Das, ‘‘Skin lesion diagnosis in
healthcare-cyber physical system,’ in Proc. IEEE Int. Conf. Innov.Technol.
(INOCON), Nov. 2020, pp. 1–6.
[61] M. Arshad, M. A. Khan, U. Tariq, A. Armghan, F. Alenezi, M. Younus
Javed, S. M. Aslam, and S. Kadry, ‘A computer-aided diagnosis system
using deep learning for multiclass skin lesion classification,’ Comput.
Intell. Neurosci., vol. 2021, pp. 1–15, Dec. 2021.
[62] M. R. Ahmed, Y. Zhang, Y. Liu, and H. Liao, ‘Single volume image
generator and deep learning-based ASD classification,’ IEEE J. Biomed.
Health Informat., vol. 24, no. 11, pp. 3044–3054, Nov. 2020.
[63] X. Li, N. C. Dvornek, J. Zhuang, P. Ventola, and J. S. Duncan, ‘‘Brain
biomarker interpretation in ASD using deep learning and fMRI,’ in Med-
ical Image Computing and Computer Assisted Intervention—MICCAI.
Springer, 2018, pp. 206–214.
[64] M. Kumar, M. Alshehri, R. AlGhamdi, P. Sharma, and V. Deep,
‘‘A DE-ANN inspired skin cancer detection approach using fuzzy C-means
clustering,’ Mobile Netw. Appl., vol. 25, no. 4, pp. 1319–1329, Aug. 2020.
[65] F. Afza, M. Sharif, M. A. Khan, U. Tariq, H.-S. Yong, and J. Cha, ‘Mul-
ticlass skin lesion classification using hybrid deep features selection and
extreme learning machine,’ Sensors, vol. 22, no. 3, p. 799, Jan. 2022.
[66] U. Bhimavarapu and G. Battineni, ‘‘Skin lesion analysis for melanoma
detection using the novel deep learning model fuzzy GC-SCNN,’’ Health-
care, vol. 10, no. 5, p. 962, May 2022.
[67] L. Seyyed-Kalantari, G. Liu, M. McDermott, I. Y. Chen, and M. Ghassemi,
‘‘CheXclusion: Fairness gaps in deep chest X-ray classifiers,’’ in Proc.
Biocomputing, vol. 2021, 2020, pp. 232–243.
[68] A. Mitra, A. Chakravarty, N. Ghosh, T. Sarkar, R. Sethuraman, and
D. Sheet, ‘‘A systematic search over deep convolutional neural network
architectures for screening chest radiographs,’ in Proc. 42nd Annu. Int.
Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jul. 2020, pp. 1225–1228.
[69] S. Thakur and A. Kumar, ‘X-ray and CT-scan-based automated detec-
tion and classification of covid-19 using convolutional neural net-
works (CNN),’ Biomed. Signal Process. Control, vol. 69, Aug. 2021,
Art. no. 102920.
[70] R. Kundu, R. Das, Z. W. Geem, G.-T. Han, and R. Sarkar, ‘‘Pneumonia
detection in chest X-ray images using an ensemble of deep learning mod-
els,’ PLoS ONE, vol. 16, no. 9, Sep. 2021, Art. no. e0256630.
[71] V. Chouhan, S. K. Singh, A. Khamparia, D. Gupta, P. Tiwari, C. Moreira,
R. Damasevicius, and V. H. C. de Albuquerque, ‘‘A novel transfer learning
based approach for pneumonia detection in chest X-ray images,’ Appl.
Sci., vol. 10, no. 2, p. 559, Jan. 2020.
[72] N. Dey, Y.-D. Zhang, V. Rajinikanth, R. Pugalenthi, and N. S. M. Raja,
‘‘Customized VGG19 architecture for pneumonia detection in chest
X-rays,’ Pattern Recognit. Lett., vol. 143, pp. 67–74, Mar. 2021.
[73] L. Girard, J. Rodriguez-Canales, C. Behrens, D. M. Thompson,
I. W. Botros, H. Tang, Y. Xie, N. Rekhtman, W. D. Travis, I. I. Wistuba,
J. D. Minna, and A. F. Gazdar, ‘An expression signature as an aid to the
histologic classification of non–small cell lung cancer,’’ Clin. Cancer Res.,
vol. 22, no. 19, pp. 4880–4889, 2016.
[74] S. Dong, Q. Yang, Y. Fu, M. Tian, and C. Zhuo, ‘RCoNet: Deformable
mutual information maximization and high-order uncertainty-aware learn-
ing for robust COVID-19 detection,’’ IEEE Trans. Neural Netw. Learn.
Syst., vol. 32, no. 8, pp. 3401–3411, Aug. 2021.
[75] A. Bar-El, D. Cohen, N. Cahan, and H. Greenspan, ‘‘Improved cycle-
gan with application to COVID-19 classification,’’ in Proc. SPIE, 2021,
pp. 296–305.
[76] F. Ucar and D. Korkmaz, ‘‘COVIDiagnosis-Net: Deep bayes-squeezenet
based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray
images,’ Med. Hypotheses, vol. 140, Jul. 2020, Art. no. 109761.
[77] H. El-Khatib, D. Popescu, and L. Ichim, ‘‘Deep Learning–Based methods
for automatic diagnosis of skin lesions,’ Sensors, vol. 20, no. 6, p. 1753,
Mar. 2020.
[78] H. Nahata and S. P. Singh, ‘‘Deep learning solutions for skin cancer
detection and diagnosis,’ in Learning and Analytics in Intelligent Systems.
Springer, 2020, pp. 159–182.
[79] H. Fu, Y. Xu, S. Lin, D. W. Kee Wong, and J. Liu, ‘‘DeepVessel: Retinal ves-
sel segmentation via deep learning and conditional random field,’ in Med-
ical Image Computing and Computer-Assisted Intervention—MICCAI.
Springer, 2016, pp. 132–139.
[80] S. S. M. Sheet, T.-S. Tan, M. A. As’ari, W. H. W. Hitam, and J. S. Y. Sia,
‘‘Retinal disease identification using upgraded CLAHE filter and trans-
fer convolution neural network,’’ ICT Exp., vol. 8, no. 1, pp. 142–150,
Mar. 2022.
[81] C. Luo, C. Shi, X. Li, and D. Gao, ‘‘Cardiac MR segmentation based
on sequence propagation by deep learning,’ PLoS ONE, vol. 15, no. 4,
Apr. 2020, Art. no. e0230415.
[82] S. Tripathi, T. S. Sharan, S. Sharma, and N. Sharma, ‘‘An augmented deep
learning network with noise suppression feature for efficient segmentation
of magnetic resonance images,’ IETE Tech. Rev., vol. 39, no. 4, pp. 1–14,
2021.
[83] L. Witt, M. Heyer, K. Toyoda, W. Samek, and D. Li, ‘Decentral and incen-
tivized federated learning frameworks: A systematic literature review,’’
2022, arXiv:2205.07855.
[84] T. R. Gadekallu, Q.-V. Pham, T. Huynh-The, S. Bhattacharya,
P. K. R. Maddikunta, and M. Liyanage, ‘‘Federated learning for big
data: A survey on opportunities, applications, and future directions,’’
2021, arXiv:2110.04160.
[85] H. R. Roth et al., ‘‘Federated learning for breast density classification: A
real-world implementation,’ in Domain Adaptation and Representation
Transfer, and Distributed and Collaborative Learning. Springer, 2020,
pp. 181–191.
[86] Q. Dou et al., ‘‘Federated deep learning for detecting COVID-19 lung
abnormalities in CT: A privacy-preserving multinational validation study,’
npj Digit. Med., vol. 4, no. 1, p. 60, Mar. 2021.
[87] N. N. Thilakarathne, G. Muneeswari, V. Parthasarathy, F. Alassery,
H. Hamam, R. Kumar Mahendran, and M. Shafiq, ‘‘Federated learning
for privacy-preserved medical Internet of Things,’’ Intell. Autom. Soft
Comput., vol. 33, no. 1, pp. 157–172, 2022.
[88] J. Born, N. Wiedemann, M. Cossio, C. Buhre, G. Brändle, K. Leidermann,
J. Goulet, A. Aujayeb, M. Moor, B. Rieck, and K. Borgwardt, ‘Accel-
erating detection of lung pathologies with explainable ultrasound image
analysis,’ Appl. Sci., vol. 11, no. 2, p. 672, 2021.
[89] P. Patel. (Sep. 17, 2020). Chest X-ray (COVID-19 & Pneumonia). Kag-
gle. Accessed: Oct. 30, 2022. [Online]. Available: https://www.kaggle.
com/datasets/prashant268/chest-xray-covid19-pneumonia
[90] Srk. (Jun. 24, 2021). Novel Corona Virus 2019 Dataset. Kaggle.
Accessed: Oct. 30, 2022. [Online]. Available: https://www.kaggle.com/
datasets/sudalairajkumar/novel-corona-virus-2019-dataset
[91] F. Sattler, K.-R. Müller, and W. Samek, ‘‘Clustered federated learn-
ing: Model-agnostic distributed multitask optimization under privacy
constraints,’ IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 8,
pp. 3710–3722, Aug. 2021.
[92] P. R. Bassi and R. Attux, ‘‘A deep convolutional neural network for
COVID-19 detection using chest X-rays,’’ Res. Biomed. Eng., vol. 38,
no. 1, pp. 139–148, Mar. 2022.
[93] K. M. J. Rahman, F. Ahmed, N. Akhter, M. Hasan, R. Amin, K. E. Aziz,
A. K. M. M. Islam, M. S. H. Mukta, and A. K. M. N. Islam, ‘‘Challenges,
applications and design aspects of federated learning: A survey,’’ IEEE
Access, vol. 9, pp. 124682–124700, 2021.
[94] K. Guo, T. Chen, S. Ren, N. Li, M. Hu, and J. Kang, ‘Federated learning
empowered real-time medical data processing method for smart health-
care,’ IEEE/ACM Trans. Comput. Biol. Bioinf., early access, Jun. 23, 2022,
doi: 10.1109/TCBB.2022.3185395.
[95] S. Sakib, M. M. Fouda, Z. Md Fadlullah, and N. Nasser, ‘On COVID-19
prediction using asynchronous federated learning-based agile radiograph
screening booths,’ in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2021,
pp. 1–6.
MD FAHIMUZZMAN SOHAN received the
B.Sc. degree in software engineering from
Daffodil International University, Bangladesh,
in 2019. He has published several papers in reputed
journals and conferences. His research interests
include machine learning, computer vision, and
image processing.
ANAS BASALAMAH received the M.Sc. and
Ph.D. degrees from Waseda University, Tokyo, in
2006 and 2009, respectively. He was a Postdoc-
toral Researcher with The University of Tokyo and
the University of Minnesota, in 2010 and 2011,
respectively. He is currently an Associate Profes-
sor with the Department of Computer Engineering,
Umm Al-Qura University. His research interests
include embedded networked sensing, smart cities,
ubiquitous computing, and participatory and urban sensing.