Federated Learning for Healthcare Applications
Ahmad Chaddad*, Yihang Wu, Christian Desrosiers
Abstract—Due to the rapid advancement of artificial intelligence (AI), centralized models have become critical for healthcare tasks such as medical image analysis and human behavior recognition. Although these models exhibit suitable performance, they are frequently constrained by privacy concerns: a centralized learning strategy cannot be used where there is a risk of a data privacy breach, particularly in healthcare centers. Federated learning (FL) is a technique that trains a global model without sharing data, by training distributed local models and aggregating them. By implementing FL throughout the training process, we can obtain a model whose generalization ability is comparable to that of centralized learning while maintaining data privacy. This survey introduces the fundamental concepts and categories of FL, highlights the limitations of centralized healthcare models, and discusses how FL can address these constraints. We also provide a detailed overview of healthcare applications of FL models, along with commonly used evaluation metrics and public datasets. In this context, we implement a case study to demonstrate how FL can be applied in the healthcare field. Furthermore, we outline the key challenges and future trends in FL.
Keywords: Federated Learning, Healthcare, Medical Imaging,
Data Privacy, Artificial Intelligence.
I. INTRODUCTION
In light of modern AI, various state-of-the-art techniques, including deep learning (DL) and the Internet of Medical Things (IoMT), have made their way into the healthcare industry. They have improved the diagnosis and treatment of various conditions such as COVID-19 [1] and autism spectrum disorder (ASD) [2].
However, existing healthcare AI models still fall short of being truly intelligent, and some have been criticized for providing ineffective and unsafe treatment recommendations [3]. Several factors may explain these deficiencies. A significant issue is the difficulty of obtaining sufficient data with features rich enough to adequately describe patients' symptoms.
In addition, with the implementation of rigorous laws that aim to safeguard individuals' privacy [4], such as the United States Consumer Privacy Bill of Rights and the European Union's General Data Protection Regulation (GDPR), AI models can no longer directly access source data for training purposes. Instead, training must adhere to strict limitations and regulatory requirements.
A. Chaddad and Y. Wu are with the School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, China.
Corresponding author: Ahmad Chaddad.
A. Chaddad and C. Desrosiers are with The Laboratory for Imagery, Vision and Artificial Intelligence, Ecole de Technologie Superieure, Montreal, Canada. Email: ahmad8chaddad@gmail.com, ahmadchaddad@guet.edu.cn
Manuscript received May 9, 2023; revised August -, 2023.
Fig. 1: This example illustrates the results of a search query conducted on the PubMed and Google Scholar databases. The search query was formulated using the keywords “Federated” AND ( (medical) OR (healthcare) ), and the bars show the number of publications indexed on each platform that matched the search criteria related to the topics discussed in this review.
FL, a novel distributed AI paradigm aimed at addressing concerns related to healthcare data privacy and management [5], has emerged as a popular subject of discussion in recent years [6]. Google first introduced FL in 2015 [7]. Essentially, FL is a distributed AI methodology that
involves training several local models and aggregating them
to derive a global model without the need for data sharing.
FL can be specifically applied in the following situations:
Non-IID data: In traditional machine learning, it is common practice to assume that data are independent and identically distributed (IID). However, in most practical scenarios this assumption does not hold. For instance, each client exhibits a unique set of behaviors, so the data it collects are biased and may differ from those of other participants [8]. This leads to non-IID, or heterogeneous, data, which poses a challenge for machine learning models.
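Non-IID client data is often simulated in FL experiments by spreading each class over the clients in proportions drawn from a Dirichlet distribution, where a smaller concentration parameter yields more skewed clients. The following is a minimal Python sketch of that generic simulation technique, not a procedure taken from [8]; the function names, the `alpha` parameter and the toy labels are illustrative assumptions.

```python
import random
from collections import Counter

def dirichlet_sample(alpha, k, rng):
    """Draw a k-dimensional Dirichlet(alpha, ..., alpha) vector via Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]

def dirichlet_label_split(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients so that each class is divided among
    clients according to Dirichlet-distributed proportions (label skew)."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        props = dirichlet_sample(alpha, num_clients, rng)
        # Turn cumulative proportions into cut points over this class's indices.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(round(acc * len(indices)))
        bounds = [0] + cuts + [len(indices)]
        for c in range(num_clients):
            clients[c].extend(indices[bounds[c]:bounds[c + 1]])
    return clients

labels = [0] * 50 + [1] * 50  # hypothetical binary labels
parts = dirichlet_label_split(labels, num_clients=3, alpha=0.5)
for c, part in enumerate(parts):
    print(c, Counter(labels[i] for i in part))  # per-client class counts differ
```

Every sample is assigned to exactly one client, but the class mix differs from client to client, which is the label-skew form of non-IID data described above.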
Unbalanced data distribution: An unbalanced data distribution occurs when certain participants in the training dataset possess a disproportionate amount of pertinent data. For example, in a scenario where the training participants include both hospitals and individuals, hospitals are likely to have significantly larger sample sizes than individuals. Additionally, data relevant to the same disease can vary substantially between hospitals due to differences in equipment, personnel, and other factors. This can create challenges for machine learning models, especially when attempting to generalize to new and diverse datasets.
Data privacy protection: In recent years, the enactment of multiple data privacy protection laws, especially in the medical domain, has made it extremely difficult to gather large amounts of data in one place for model training. Such data frequently contain confidential patient information, making it essential to restrict access to the actual data and only provide access to the model's parameters. However, this poses a significant challenge for machine learning models, which rely heavily on large and diverse datasets to learn patterns effectively and make accurate predictions.
FL has emerged as a particularly promising methodology
for smart healthcare, as it enables multiple hospitals to train
AI models collaboratively without compromising the confidentiality of their raw data [9]. The healthcare industry can take
advantage of FL for a wide range of applications, including
expert diagnosis, drug development, medical image analysis,
electronic health data collection, human activity recognition
(HAR), and remote health monitoring. The broad applicability
of FL in the healthcare sector is illustrated in Fig. 2. The
outstanding potential of FL in healthcare has generated a
growing interest in the research community. This can be
observed in Fig. 1, which shows the number of research
publications on FL in the healthcare sector on PubMed and
Google Scholar over the years.
The contribution of our work is summarized as follows:
- We provide a detailed definition of FL and present its various categories.
- We identify the limitations of existing traditional healthcare models and demonstrate how FL can address those issues.
- We offer a comprehensive explanation of how FL can be applied in healthcare, including an evaluation of the performance of different FL methods.
- We describe the commonly used evaluation metrics and datasets for FL in healthcare.
- We outline the key challenges and future directions of FL in the healthcare domain.
Our work offers a thorough examination of FL models used within the healthcare field, including a detailed analysis of their limitations and distinctive contributions. Furthermore, we present a case study demonstrating the application of FL to medical images to illustrate its relevance in healthcare. Additionally, we focus on FL challenges that are both common to FL in general and unique to this domain. The structure of this paper is as follows. In Section II, we provide a comprehensive definition of FL and explain the key differences between FL and traditional learning (TL). We also highlight the different applications of FL in various domains. The main steps involved in FL and its different categories are presented in Section III. In Section IV, we delve into the details of applying FL in the healthcare sector. In Section V, we provide information on the commonly used datasets and evaluation metrics for FL in healthcare. Section VI focuses on the critical challenges associated with FL. We then discuss the future trends of FL in healthcare in Section VII. Finally, we offer concluding remarks in Section VIII.
II. BACKGROUND
This section begins with a formal definition of FL, followed
by a comparison between FL and traditional machine learning
methodologies. The section concludes with a discussion of
the challenges faced by the healthcare field and the increasing
need for FL to overcome these challenges.
Fig. 2: Examples of federated learning applications in healthcare, including drug development, human activity recognition,
remote health monitoring, electronic health data recording,
medical image analysis, and assisted expert diagnosis. These
are some of the most common use cases for FL in the
healthcare field.
A. The definition of federated learning
Assuming a group of $N$ participants denoted by $\{F_i\}_{i=1}^{N}$, each with their own dataset $\{D_i\}_{i=1}^{N}$, traditional learning merges all data into a single dataset $D = D_1 \cup D_2 \cup \cdots \cup D_N$, which is then used to train a model $M_{SUM}$. In contrast, FL allows each participant to train their own local model on their own dataset, without sharing their data, and then obtain an aggregated model on a global server, denoted by $M_{FED}$.
If we use $V_{SUM}$ and $V_{FED}$ to represent the performance metrics (such as accuracy, recall, precision, etc.) of $M_{SUM}$ and $M_{FED}$, respectively, we say that the FL model $M_{FED}$ has a performance loss of $\delta$ if it satisfies the following inequality:
$$|V_{SUM} - V_{FED}| < \delta \tag{1}$$
where $\delta$ is a non-negative constant that represents the maximum amount of performance loss that can be tolerated in the FL model compared to the traditional approach.
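Based on the above statements, we can see that the performance of FL depends largely on the aggregation algorithm. Formally, the final goal of FL is to optimize the following objective function [10]:
$$\min_{\omega} \mathcal{L}(\omega) = \sum_{i=1}^{N} f_i \, \mathcal{L}_i(\omega) \tag{2}$$
where $\omega$ denotes the shared model weights, $\mathcal{L}(\omega)$ is the global loss, and $\mathcal{L}_i(\omega)$ is the local loss of the $i$th client. $f_i$ indicates the importance of the $i$th client participating in the training (for example, $f_i = n_i/n$, the fraction of all training samples held by that client).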
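As a worked illustration of (1) and (2), the sketch below assumes the common sample-size weighting $f_i = n_i/n$ and uses made-up per-client losses, sample counts and metric values; none of these numbers come from the surveyed studies, and the function names are our own.

```python
def global_loss(local_losses, sample_counts):
    """Objective (2) with f_i = n_i / n: the sample-weighted sum of client losses."""
    n = sum(sample_counts)
    return sum((n_i / n) * loss_i for loss_i, n_i in zip(local_losses, sample_counts))

def within_delta(v_sum, v_fed, delta):
    """Condition (1): the federated model loses less than delta w.r.t. the pooled model."""
    return abs(v_sum - v_fed) < delta

# Hypothetical values: three hospitals with different dataset sizes.
losses = [0.40, 0.55, 0.30]
counts = [1000, 250, 750]
print(global_loss(losses, counts))           # about 0.381 (0.5*0.40 + 0.125*0.55 + 0.375*0.30)
print(within_delta(0.91, 0.89, delta=0.05))  # True: a 2-point accuracy gap is tolerated
```

The largest client dominates the weighted objective, which is why unbalanced client sizes influence what the global model optimizes.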
B. Traditional Learning vs Federated Learning
When it comes to traditional learning (TL), data is typically
shared and used collaboratively to develop a model. However,
if the data contains personally identifiable information, there
may be privacy concerns and potential data leaks that could
disrupt the training process. In contrast, FL allows for the
training of local models and their subsequent aggregation into
a global model without directly accessing each other’s data.
This approach can lead to highly accurate models without
compromising data privacy. For further information on the
differences between TL and FL, please refer to Table I.
TABLE I: Traditional Learning vs Federated Learning
| Criteria | Traditional Learning | Federated Learning |
|---|---|---|
| Training method | Centralized learning | Distributed learning |
| Training process | Training together | Training on edge devices |
| Aggregation | No aggregation | Aggregation on server |
| Model | Shared model | Personalized model / Shared models |
| Sharing process | Data sharing | Raw data encryption |
| Iterations | One-time | Iterative process |
C. The limitations of the existing AI-based healthcare models
Most current AI-based healthcare applications, such as classification [11], [12] and segmentation [13], [14], rely on TL approaches. We summarize the limitations of TL as follows.
Privacy leakage issues: As discussed previously, traditional AI-based smart healthcare requires the sharing of raw data. Because some of these records contain private patient information, sharing them can compromise privacy. For instance, a third-party service can modify data patterns without user consent [15].
Data limitation: Although machine learning, especially DL, is becoming the primary approach in many industries, it requires large and diverse datasets for training [16]. Unfortunately, in reality, only some medical institutions have sufficient data for training a model. For example, a small research center may need to build an AI model from a limited dataset, which results in a trained model with poor generalizability.
Communication consumption for model training: The transfer of large amounts of data in TL-based smart healthcare models can lead to network latency issues, as noted in [17]. This places a significant burden on medical institutions and their network infrastructure, particularly with respect to energy consumption.
D. Federated learning: An appropriate approach to address
current challenges
Given the various challenges discussed previously, FL can
be viewed as a potential solution to the many issues present in
modern smart healthcare, particularly in the following aspects:
Protection of data privacy: FL protects data privacy by allowing clients to share only model parameters, not the raw data, as mentioned in the previous section. This greatly reduces the exposure of sensitive patient data. In [18], homomorphic encryption-based privacy-preserving strategies were used to address data privacy leakage issues. As data privacy laws continue to become more stringent, FL is expected to play a crucial role in the smart healthcare industry.
Fig. 3: Flowchart of federated learning model for healthcare. The process involves several steps, including model initialization and client selection (Left), local training and parameter upload (Middle), and model aggregation and parameter download (Right). 1) The global model is initialized, and clients are selected to participate in the federated learning process. 2) Each client trains the model on its local data and uploads the updated model parameters to the server. 3) The updated parameters from all clients are aggregated to create a new global model, which is transmitted back to the clients for the next round of training. This approach enables the training of models using decentralized data while preserving data privacy.
Reduce training consumption: Because FL distributes the training workload across edge servers and keeps data where it is generated, it reduces communication usage, network transmission latency, and costs. Sharing model parameters through FL typically requires much less energy than exchanging raw data; for example, parameter gradients are often significantly smaller than the data in the dataset, as stated in [19]. This makes FL an energy-efficient solution for distributed machine learning.
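The claim that exchanging parameters is cheaper than exchanging raw data can be checked with back-of-the-envelope arithmetic. The sketch below uses assumed numbers (10,000 uncompressed 1024×1024 8-bit radiographs at one site, a model of roughly 11.7 million float32 parameters, about the size of ResNet-18, and 100 communication rounds). It also shows that the total upload grows with the number of rounds, so the actual saving depends on model size, round count and any compression.

```python
def dataset_bytes(num_images, height, width, bytes_per_pixel=1):
    """Raw size of an uncompressed single-channel image dataset."""
    return num_images * height * width * bytes_per_pixel

def fl_upload_bytes(num_params, rounds, bytes_per_param=4):
    """Bytes one client uploads if it sends full float32 weights every round."""
    return num_params * bytes_per_param * rounds

raw = dataset_bytes(10_000, 1024, 1024)      # hypothetical local dataset
per_round = fl_upload_bytes(11_700_000, 1)   # roughly ResNet-18-sized model (assumed)
total = fl_upload_bytes(11_700_000, 100)     # 100 communication rounds (assumed)
print(f"raw data: {raw / 1e9:.2f} GB, one update: {per_round / 1e6:.1f} MB, "
      f"100 rounds: {total / 1e9:.2f} GB")
```

Under these assumptions a single update is about 47 MB against roughly 10.5 GB of raw images, while 100 rounds add up to about 4.7 GB per client.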
Large amount of training data: FL provides strategies, such as FedAvg [20], that merge the models of multiple clients. When the number of clients is sufficient, this effectively increases the amount of training data available and alleviates the need for a large centralized dataset to train AI models. FL is thus a powerful technique for distributed machine learning, especially when a large number of clients is available.
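The server-side step of FedAvg [20] replaces the global parameters with a sample-size-weighted average of the client parameters, $\omega \leftarrow \sum_k (n_k/n)\,\omega_k$. A minimal sketch, assuming each client's parameters are flattened into a plain list of floats (real FL frameworks operate on tensors); the function name `fedavg` is our own.

```python
from typing import List, Sequence

def fedavg(client_weights: Sequence[Sequence[float]],
           sample_counts: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors with weights n_k / n."""
    if not client_weights or len(client_weights) != len(sample_counts):
        raise ValueError("need one sample count per client")
    n = sum(sample_counts)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w_k, n_k in zip(client_weights, sample_counts):
        for j in range(dim):
            global_w[j] += (n_k / n) * w_k[j]
    return global_w

# Two hypothetical clients; the second holds three times as much data.
print(fedavg([[1.0, 2.0], [3.0, 6.0]], [100, 300]))  # [2.5, 5.0]
```

Weighting by sample count means clients with more data pull the global model further toward their local solution.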
III. CATEGORIES AND ESSENTIAL STAGES OF
FEDERATED LEARNING IN HEALTHCARE
In this section, we will provide an overview and explanation
of the essential phases of FL, followed by an explanation of
the categories of FL.
A. How the Principles of Federated Learning Apply to Healthcare
In Figure 3, we present a flowchart outlining the major steps
of FL in healthcare. Additionally, we provide pseudo-code for
the key steps of FL in Algorithm 1.
Model initialization and client selection: The process begins by defining a task in the healthcare domain, such as medical image classification, segmentation, or HAR. Next, the parameters of the global model on the server are initialized (in Algorithm 1, from pretrained parameters), and the global server selects the clients that participate in the training.
Algorithm 1 The key stages involved in federated learning, where ω represents the model's parameters, D denotes the local dataset held by individual clients, and method denotes the aggregation approach.
Initialization: Clients = {}, ω_global = 0, number of clients N, initial parameters ω_pretrained, number of communication rounds C.
procedure Initialization & Selection(ω_pretrained, N)
    Clients = Select_clients(N)
    ω_global = ω_pretrained
end procedure
procedure Local training & Upload(ω_global, Clients)
    ω_client = ω_global
    ω_local_new = Local_training(ω_client, Clients, D)
    UploadPara(Clients, ω_local_new)
end procedure
procedure Aggregation & Download(ω_local_new, Clients)
    ω_global = Aggregation(ω_local_new, method)
    DownloadPara(Clients, ω_global)
end procedure
Initialization & Selection(ω_pretrained, N)
for c = 1 to C do
    Local training & Upload(ω_global, Clients)
    Aggregation & Download(ω_local_new, Clients)
    if performance meets requirement then
        break
    end if
end for
Local training and parameters upload: Once the participating clients are identified, the global server distributes the initial model and its parameters to each client. In every subsequent communication round, each client trains the model on its own dataset and uploads the parameters of its local model to the global server for aggregation.
Model aggregation and parameters download: After all
participating clients have completed uploading their updated
parameters, the global server combines them to compute a
new global model. This updated model is then distributed to
each client for the next training session. The process of FL
continues until the loss function of the global server converges
or meets the performance requirements.
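The three stages above (and Algorithm 1) can be sketched end to end on a toy problem. This is an illustrative sketch under several assumptions: a one-parameter linear model $y \approx \omega x$, full-batch gradient descent as local training, all clients selected in every round, FedAvg-style aggregation, and a stopping rule based on the sample-weighted average of losses that clients compute locally. The client data, hyperparameters and function names are hypothetical.

```python
def local_training(w, data, lr=0.05, epochs=5):
    """Local stage: a few epochs of full-batch gradient descent on MSE for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def weighted_mean(values, sizes):
    """Sample-size-weighted mean, used both for aggregation and for reported losses."""
    n = sum(sizes)
    return sum(v * s / n for v, s in zip(values, sizes))

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def run_federated(client_data, w_pretrained=0.0, rounds=50, target=1e-4):
    # Initialization & selection: every client is selected (an assumption).
    clients = list(range(len(client_data)))
    sizes = [len(client_data[k]) for k in clients]
    w_global = w_pretrained
    for c in range(1, rounds + 1):
        # Local training & upload: each client starts from the current global weights.
        updates = [local_training(w_global, client_data[k]) for k in clients]
        # Aggregation & download: the server averages the uploaded weights.
        w_global = weighted_mean(updates, sizes)
        # Clients report only scalar losses; raw data never leaves the client.
        reported = [mse(w_global, client_data[k]) for k in clients]
        if weighted_mean(reported, sizes) < target:
            break
    return w_global, c

client_data = [
    [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5)],  # hypothetical client A (y = 3x)
    [(1.0, 3.0), (2.0, 6.0)],              # hypothetical client B (y = 3x)
]
w, rounds_used = run_federated(client_data)
print(round(w, 3), rounds_used)
```

Only model weights and scalar losses cross the client boundary, mirroring the upload and download steps described above.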
B. Federated learning approaches in healthcare
This section classifies FL into three distinct types, namely
Horizontal FL (HFL, [21]), Vertical FL (VFL), and Federated
Transfer Learning (FTL). Figure 4 provides a clear illustration
of these categories.
Fig. 4: The various types of federated learning utilized in the healthcare field can be illustrated through three categories. The first category, represented on the left, is Horizontal Federated Learning (HFL), which involves the same feature space but different sample spaces. The second category, shown in the middle, is Vertical Federated Learning (VFL), where there are distinct feature spaces but the same sample spaces. The third category, depicted on the right, is Federated Transfer Learning (FTL), which is characterized by disparate feature and sample spaces. The blue and green colors represent the different types of samples, while the gray circles indicate the feature types.
1) Horizontal federated learning: Sample-Partitioned FL, also referred to as Horizontal FL, involves healthcare clients with datasets that share the same feature space but have different sample spaces. In this scenario, each participant can use the same model to train on its data and then upload the trained model to the global server. HFL [22] thus makes it possible to integrate data that share the same feature space but are spread across multiple clients, a commonly used technique in privacy-sensitive fields such as healthcare and mobile services. Specifically, HFL can be defined as:
$$X_i = X_j, \; Y_i = Y_j, \; I_i \neq I_j, \quad \forall D_i, D_j, \; i \neq j \tag{3}$$
where $I$ denotes the sample space, while $X$ and $Y$ refer to the feature space and the label space, respectively. The datasets owned by the $i$th and $j$th clients are represented by $D_i$ and $D_j$, respectively.
2) Vertical federated learning: Feature-Partitioned FL, also referred to as Vertical FL [23], operates within the FL framework where the sample space remains the same but the feature space differs. The goal of VFL is to collaboratively create a shared machine learning model that uses all the features gathered by the participating clients. An instance of this is the Federated Data Network (FDN) [24], which integrates anonymous data from a prominent social network service, thus allowing the inclusion of a vast majority of user samples from other data holders, such as bank customers. Formally, VFL can be defined as follows:
$$X_i \neq X_j, \; Y_i \neq Y_j, \; I_i = I_j, \quad \forall D_i, D_j, \; i \neq j \tag{4}$$
where $X$ again represents the feature space and $Y$ the label space, $I$ is the sample space, and $D$ denotes the dataset owned by each healthcare client.
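To make the contrast between (3) and (4) concrete, the sketch below partitions a toy patient table both ways and checks the defining conditions: in the horizontal split the feature spaces match but the sample IDs are disjoint, while in the vertical split the sample IDs match but the feature columns differ (here only one party holds the label). The table, feature names and helper functions are hypothetical.

```python
# Hypothetical toy table: patient ID -> features and label (all values illustrative).
records = {
    "p1": {"age": 61, "bmi": 27.1, "hba1c": 6.9, "label": 1},
    "p2": {"age": 45, "bmi": 22.4, "hba1c": 5.4, "label": 0},
    "p3": {"age": 70, "bmi": 30.2, "hba1c": 7.8, "label": 1},
    "p4": {"age": 52, "bmi": 24.9, "hba1c": 5.9, "label": 0},
}

def horizontal_split(table, id_groups):
    """HFL: each client holds different patients (rows) with the same columns."""
    return [{pid: table[pid] for pid in ids} for ids in id_groups]

def vertical_split(table, feature_groups):
    """VFL: each client holds the same patients but different columns."""
    return [{pid: {f: row[f] for f in feats} for pid, row in table.items()}
            for feats in feature_groups]

h = horizontal_split(records, [["p1", "p2"], ["p3", "p4"]])
v = vertical_split(records, [["age", "bmi"], ["hba1c", "label"]])
print(sorted(h[0]), sorted(h[1]))            # disjoint sample IDs
print(sorted(v[0]["p1"]), sorted(v[1]["p1"]))  # disjoint feature columns
```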
3) Federated transfer learning: HFL and VFL require all clients to share the same feature space or sample space, but this assumption does not hold in many practical situations. Transfer learning is a technique that attempts to improve the performance of target learners on target domains by transferring knowledge from distinct but related source domains [25]. Thus, FTL addresses the case where both the sample space and the feature space differ, using a transfer learning method to minimize the data distribution discrepancy between local datasets. In healthcare, for example, FTL can assist in disease diagnosis with data from different patients (different sample spaces) in multiple hospitals with different therapeutic programs (different feature spaces) [26]. Hence, FTL can be defined as:
$$X_i \neq X_j, \; Y_i \neq Y_j, \; I_i \neq I_j, \quad \forall D_i, D_j, \; i \neq j \tag{5}$$
with $X_i$ being the $i$th feature space and $Y_i$ the $i$th label space. $I_i$ is the $i$th sample space, and $D_i$, $D_j$ are the datasets owned by the $i$th and $j$th healthcare clients, respectively.
IV. HEALTHCARE APPLICATIONS OF FEDERATED
LEARNING
As TL has many limitations, various studies have been
conducted to evaluate the usefulness of FL within the field
of healthcare. This section provides an overview of how FL
has been specifically applied in this field. The section covers
the application of FL in classification and segmentation, as
well as in other tasks. The healthcare applications of FL in 2022 are summarized in Table III.
A. Classification-based federated learning
Classification (and/or detection/prediction) is a very common task in FL. Integrating FL is an important step in enhancing the robustness of medical models, given the complexity of medical data. In healthcare, FL models have been proposed for a broad range of classification tasks, including cancer diagnosis [27]–[35], COVID-19 detection and pneumonia diagnosis [36]–[39], epileptic seizure detection [40], HAR [41]–[43], identifying functional connectivity biomarkers of major depressive disorder (MDD) [44], ASD prediction [45], and surgical phase recognition [46]. We report in Table II the performance of different FL methods proposed for classification in healthcare, and summarize these methods below.
Cancer diagnosis: Recent studies have shown the feasibility and benefits of applying FL technology to cancer diagnostic tasks [27]–[35], [39], [47]. For instance, [27] proposes a differentially private FL framework that employs bag preparation and Multiple Instance Learning (MIL) to perform a classification task on a lung cancer dataset. The authors conduct experiments on their hand-crafted dataset derived from The Cancer Genome Atlas (TCGA) [48] and demonstrate that their FL model performs better than non-FL models while also addressing medical data privacy concerns. However, the performance of their model degrades when the number of clients is high (32 clients), with an accuracy of less than 60% in this case. This limitation hinders the deployment of the model in large-scale collaborative environments.
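How [27] injects noise is not detailed here. As a generic illustration of differentially private FL, a client can clip its model update to a fixed L2 norm and add Gaussian noise before uploading it (the Gaussian mechanism). The sketch shows only this step: `clip_norm` and `noise_multiplier` are assumed hyperparameters, the privacy accounting needed to state a formal guarantee is omitted, and this is not claimed to be the mechanism used in [27].

```python
import math
import random

def clip_and_noise(update, clip_norm, noise_multiplier, rng):
    """Scale the update so its L2 norm is at most clip_norm, then add Gaussian
    noise with standard deviation noise_multiplier * clip_norm per coordinate."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [u * scale + rng.gauss(0.0, sigma) for u in update]

rng = random.Random(42)
raw_update = [3.0, 4.0]  # hypothetical update with L2 norm 5.0
# With noise disabled, only clipping applies (result is approximately [0.6, 0.8]).
private = clip_and_noise(raw_update, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
# With noise enabled, each coordinate is perturbed before upload.
noisy = clip_and_noise(raw_update, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(private, noisy)
```

Larger noise gives stronger privacy but degrades the aggregated model, which is consistent with accuracy dropping as the number of noisy client updates grows.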
Heterogeneous data is a common challenge in FL that can cause local and global drift, affecting the performance of the model [28]. To address this issue, the authors of [28] introduced an FL framework called HarmoFL, which aims to harmonize local and global drifts simultaneously using magnitude normalization. To address local drift, magnitudes are limited to a specific range to generate a coordinated feature space across local clients. The authors also used client weight perturbation based on the generated feature space to guide the local objective toward a globally optimal solution, which reduces global drift. In this way, the method accounts for both local and global update drifts in FL on heterogeneous medical images. They tested their approach on the Camelyon17 dataset [49], which consists of 450,000 breast cancer images.
Curriculum Learning (CL) has gained significant attention in the academic community [50], [51]. CL is a training approach that gradually introduces more challenging examples throughout training, following the pedagogical sequence observed in human education [52]. The work in [29] proposes a memory-aware CL approach for FL that re-scales the priority of training samples based on their scores to improve effectiveness. The authors use CL to classify medical images across multiple sites. This approach also employs unsupervised domain adaptation to maintain data privacy and minimize the data distribution discrepancy. The proposed approach was evaluated on a breast cancer dataset [49] and showed superior performance compared to traditional FL methods. However, the FL model was only tested on a single breast cancer classification dataset, which raises concerns about potential bias and the generalizability of this model to other tasks.
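The core idea of CL, presenting easier samples first and gradually adding harder ones, can be sketched in a few lines. This sketch assumes that a per-sample difficulty score (for example, the current training loss) is already available and uses hypothetical names; it is a generic easy-to-hard schedule, not the memory-aware re-scaling of [29].

```python
def curriculum_stages(samples, difficulty, num_stages):
    """Yield growing training subsets: start with the easiest samples and
    add harder ones at each stage until the full set is used."""
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    for stage in range(1, num_stages + 1):
        cutoff = max(1, round(len(order) * stage / num_stages))
        yield [samples[i] for i in order[:cutoff]]

samples = ["s0", "s1", "s2", "s3"]
difficulty = [0.9, 0.1, 0.5, 0.3]  # hypothetical per-sample scores (lower = easier)
stages = list(curriculum_stages(samples, difficulty, num_stages=2))
print(stages)  # [['s1', 's3'], ['s1', 's3', 's2', 's0']]
```

In an FL setting each client would apply such a schedule to its own local data during the local training stage.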
In [30], a solution was proposed to alleviate the instability arising from data diversity in a setup known as FL with Shared Label Distribution. This approach employs a weighted cross-entropy loss that adjusts the contribution of each sample to the local objective by taking into account the label distribution of each client. However, it assumes that clients can share the number of samples in each class, which may leak private information if these counts are sensitive. The proposed approach achieved improved test accuracy on the OrganMNIST dataset [53]. Yet, this study performed experiments on limited types of datasets, and further analyses on more varied and complex medical datasets are warranted.
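A generic form of label-distribution-aware weighting gives each class a weight inversely proportional to its frequency on the client and scales the cross-entropy accordingly. The sketch below is an illustration under assumptions, not the exact loss of [30]: it uses only the client's own label counts, plain Python lists for logits, and hypothetical function names.

```python
import math
from collections import Counter

def class_weights(local_labels, num_classes):
    """Inverse-frequency weights from a client's own label counts (absent classes get 0)."""
    counts = Counter(local_labels)
    total = len(local_labels)
    return [total / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]

def weighted_cross_entropy(logits, label, weights):
    """-w_y * log softmax(logits)[y], using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[label] * (logits[label] - log_sum_exp)

labels = [0, 0, 0, 1]  # a hypothetical client dominated by class 0
w = class_weights(labels, num_classes=2)
print(w)  # [0.6666666666666666, 2.0]
print(weighted_cross_entropy([0.0, 0.0], 1, w))  # rare class 1 is up-weighted
```

Up-weighting rare local classes keeps a skewed client from pushing its model toward the majority class before aggregation.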
The work in [31] introduces a self-supervised pre-training FL approach that uses the Vision Transformer (ViT) as the underlying network architecture. This approach performs local model pre-training on each client dataset to overcome data heterogeneity. Experiments conducted on a dermatology dataset related to skin cancer showed notable improvements in test accuracy [54]–[56]. In contrast to previous studies, the authors perform three classification tasks in both simulated and real-world scenarios, providing a more thorough assessment of reliability. However, their experiments only consider a limited number of clients (5 clients), which raises concerns regarding possible bias and the approach's ability to handle a larger number of clients.
To address the issue of non-IID data across clients, the approach in [32] trains personalized models using channel-wise assignment instead of the layer-wise personalization techniques of previous studies [57]–[60]. In this method, the global model is decoupled at the channel level to enable personalization. To further improve the decoupling effect, a cyclic distillation technique is introduced to reduce divergence. Experiments conducted on the colorectal cancer HISTO-FED dataset demonstrated the proposed approach's effectiveness in handling non-IID data. However, the approach was only tested using three clients, and it is unclear whether robust performance can be achieved in more challenging scenarios with more clients.
Most existing approaches focus on optimizing the average aggregation loss, which often results in bias: the global model performs well on many clients but poorly on others [33]. To address this issue and reduce the performance gap of the global model across clients, an FL paradigm called Proportionally Fair FL is proposed in [33]. This approach aims to improve model fairness by optimizing a new objective function that gives the global model better generalization ability across different clients. The primary objective of Proportionally Fair FL is to improve poorly performing clients by allowing the global model to dynamically adjust the network parameters based on the actual performance of the training clients. Experiments conducted on the TCGA dataset [48] show that this method achieves good results. However, its training loss is less stable than that of FedAvg, and the testing accuracy fluctuates noticeably during training.
Compared to previous methods, the approach presented in [34] focuses on improving the generalization ability of the local, client-specific model instead of the global model. In this approach, the global model only acts as a feature extractor to help the local model extract pertinent information. Evaluated on skin lesion classification using the HAM10000 dataset [61], the proposed approach achieved an accuracy of 62.8±2.0% when eight clients participated in the training. While improvements were also observed when increasing the number of clients to 32, the model's performance did not improve with 64 clients, where the accuracy was approximately 60%.
A semi-supervised FL method is introduced in [35] to handle scenarios where clients only have unlabeled data, whereas the global server holds a small amount of labeled data. This method employs a dynamic bank learning technique that supports client training by using class proportion information. The technique distills class scale information by establishing dynamic banks, which enables the model to learn the scale distribution knowledge via sub-banks. The method achieves an AUC of 77.47% for skin lesion diagnosis in dermoscopy images of the HAM10000 dataset. Nonetheless, the sensitivity of the proposed model is relatively low at approximately 37%, indicating that it correctly detects an insufficient proportion of the actual positive cases. The F1 score of the model is also poor, at approximately 33%. Furthermore, limited information was given on the training optimizer's parameters, such as learning rate and weight decay, and the number of clients was limited to 10.
COVID-19 Detection and Pneumonia diagnosis: Recent
studies have also investigated the use of FL for COVID-
19 detection and pneumonia diagnosis [36]–[39], [62]. Since
COVID-19 is a worldwide epidemic, incorporating more
clients to create a robust global model can be beneficial for pa-
tients and physicians. The study in [36] leverages customized
local models for healthcare personalization, employing distinct
local batch normalization to optimize model generalizability
while maintaining a high specificity for each patient. Experimental
results on the COVID-19 chest X-ray dataset [63]
showed promising performance and rapid convergence of the
method. In experiments involving 100 clients, the method
achieved an average classification accuracy of 75%, which
suggests robustness to a large number of clients.
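Keeping batch normalization local, as described for [36], can be sketched with a FedBN-style aggregation rule: the server averages all shared parameters weighted by client dataset size, but leaves every parameter whose name marks it as batch normalization untouched on each client. This is a minimal sketch, not the implementation of [36]; the dictionary-of-lists parameter format and the convention that BN parameter names contain "bn" are assumptions for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values (illustrative format)


def aggregate_except_bn(
    client_params: List[Params], client_sizes: List[int], bn_marker: str = "bn"
) -> List[Params]:
    """Weighted-average shared parameters; keep batch-norm parameters local."""
    total = sum(client_sizes)
    shared_keys = [k for k in client_params[0] if bn_marker not in k]

    # Size-weighted average of every shared (non-BN) parameter.
    global_shared = {}
    for key in shared_keys:
        length = len(client_params[0][key])
        global_shared[key] = [
            sum(p[key][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]

    # Each client receives the shared average but keeps its own BN values.
    updated = []
    for params in client_params:
        new_params = {k: list(v) for k, v in params.items()}
        for key, value in global_shared.items():
            new_params[key] = list(value)
        updated.append(new_params)
    return updated


clients = [
    {"conv.weight": [1.0, 2.0], "bn.weight": [10.0]},
    {"conv.weight": [5.0, 6.0], "bn.weight": [20.0]},
]
print(aggregate_except_bn(clients, client_sizes=[1, 3]))
```

Because BN statistics tend to capture site-specific intensity distributions, leaving them local is one common way to personalize models under feature shift.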
In [37], two FL techniques are proposed for different active
learning scenarios: Labeling Efficient Federated Active Learn-
ing (LEFAL) and Training Efficient Federated Active Learning
(TEFAL). LEFAL aims to enhance the effectiveness of feature
learning by taking into account data uncertainty and diversity,
while TEFAL improves client efficiency by employing a
discriminator to assess the amount of useful information a
client can provide. The authors conducted experiments on the
COVID-19 dataset [64] and showed their approach achieves
high accuracy and F1 scores in a limited number of iterations.
For example, their model obtained an average accuracy of
0.9 and an average F1 score of 0.95 with only 50 iterations.
Additionally, the experiments covered two scenarios, involving
a small hospital and a large hospital, providing a more practical
assessment of the performance of the FL model in complex
settings. However, the maximum number of clients was limited
to five in this study.
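LEFAL is described as combining data uncertainty and diversity when choosing samples to label. The uncertainty half of that idea can be sketched with entropy-based sampling: each client ranks its unlabeled pool by predictive entropy and sends the most uncertain samples for annotation. This Python sketch is a generic illustration under that assumption; it omits the diversity term and TEFAL's discriminator, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` unlabeled samples with the highest entropy."""
    scores = [predictive_entropy(p) for p in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Softmax outputs of a local model on four unlabeled samples (illustrative values).
pool = [[0.9, 0.1], [0.5, 0.5], [0.99, 0.01], [0.6, 0.4]]
print(select_most_uncertain(pool, budget=2))  # -> [1, 3]
```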
The work in [38] presents an FL approach utilizing Generative
Adversarial Networks (GANs) to mitigate the risk
of data privacy leakage. In this approach, a Convolutional
Neural Network (CNN) was used as a generator to pro-
duce synthetic COVID-19 images, enabling the discriminator
to learn and replicate the actual distribution of COVID-19
data. Additionally, a blockchain-based Differential Privacy
Protection technique was implemented to enhance the data
privacy protection. Experiments on the DarkCOVID dataset
[65] and the ChestCOVID dataset [66] indicated that this ap-
proach could outperform state-of-the-art FL methods on these
datasets. Results on the DarkCOVID dataset show a
classification accuracy of 99% for COVID and normal cases;
however, performance in predicting pneumonia is relatively
lower, with an accuracy of 80%. Furthermore, the proposed
method requires a large number of epochs, typically around
200, to achieve optimal results, which is time-consuming.
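The differential privacy component mentioned for [38] builds on a standard step that can be sketched independently of the blockchain layer: each client clips its model update to a maximum L2 norm and adds Gaussian noise scaled to that norm before sharing it (the Gaussian mechanism used in DP-FedAvg-style training). The Python sketch below is an illustrative assumption, not the protocol of [38]; the parameter values are hypothetical, and it does not compute the resulting privacy budget.

```python
import math
import random
from typing import List


def privatize_update(
    update: List[float], clip_norm: float, noise_multiplier: float, rng: random.Random
) -> List[float]:
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = math.sqrt(sum(x * x for x in update))
    # Scale down only if the update is longer than the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    # Noise standard deviation is proportional to the clipping bound (the sensitivity).
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


rng = random.Random(0)
noisy = privatize_update([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy)
```

Clipping bounds each client's influence on the aggregate, which is what allows the noise scale to be calibrated; larger `noise_multiplier` values give stronger privacy at the cost of accuracy.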
The authors of [62] use cyclic homomorphic encryption to
improve the privacy-preserving capabilities of their FL method
by encrypting the aggregation process. Adversarial attacks are
also simulated to evaluate the model’s resilience. However,
their privacy protection technique is only effective when there
are more than two clients. In other words, when there are fewer
than three participating clients, the model’s privacy-preserving
ability is almost nonexistent. Experimental results based on the
RAD-ChestCT dataset showed their approach to achieve an
average accuracy of 94%, which is similar to the performance
of TL (95%) [67]. However, the maximum number of clients
used in this work is limited to 5. Moreover, the GPU memory
usage of the method exceeds 26 GB, which may restrict the
choice of computational device. One advantage is that the training
time is shorter than that of centralized training, which points
toward more efficient FL training.
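The observation that aggregate-level protection breaks down with only two clients holds for any scheme in which participants learn only the sum of the updates. [62] uses cyclic homomorphic encryption; the Python sketch below swaps in a simpler technique, pairwise additive masking as used in secure aggregation protocols, purely to illustrate the point. Real protocols mask integers modulo a large number and derive masks through key agreement; floating-point masks and a shared seed are simplifying assumptions here.

```python
import random
from typing import List


def masked_updates(updates: List[List[float]], seed: int = 0) -> List[List[float]]:
    """Add pairwise masks that cancel when all masked updates are summed."""
    rng = random.Random(seed)
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] += mask[d]  # client i adds the shared mask
                masked[j][d] -= mask[d]  # client j subtracts it
    return masked


def aggregate(vectors: List[List[float]]) -> List[float]:
    """Server-side sum; individual masked vectors look random."""
    return [sum(col) for col in zip(*vectors)]


updates = [[1.0, 2.0], [3.0, 5.0]]  # two clients
total = aggregate(masked_updates(updates, seed=1))
# With two clients, client 0 recovers client 1's update from the sum:
leaked = [t - own for t, own in zip(total, updates[0])]
print(total, leaked)  # approximately [4.0, 7.0] and [3.0, 5.0]
```

With three or more clients, the sum alone no longer reveals any single other client's update to a participant, which is consistent with the limitation reported for [62].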
In [39], a practical FL scenario called intermittent client
participation is presented, where some clients are consistently
involved in the training while others leave due to internet
connectivity issues. The method in this work achieves an
accuracy of 80.29% for pneumonia diagnosis on the chest X-
ray image dataset [68]. However, this study only considers
whether there is one client leaving or not, which fails to
provide a comprehensive reflection of the overall impact of
clients leaving. Additionally, the maximum number of clients
is limited to 10.
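Intermittent participation changes the server's aggregation step only in which clients are averaged. Assuming plain FedAvg rather than the specific method of [39], a round can be sketched as a dataset-size-weighted average over the clients that actually returned an update, with dropped clients represented here (hypothetically) as `None`.

```python
import random
from typing import List, Optional


def fedavg_round(
    global_model: List[float],
    client_models: List[Optional[List[float]]],
    client_sizes: List[int],
) -> List[float]:
    """Size-weighted FedAvg over clients that reported; None marks a dropout."""
    active = [(m, n) for m, n in zip(client_models, client_sizes) if m is not None]
    if not active:
        return list(global_model)  # nobody reported: keep the previous global model
    total = sum(n for _, n in active)
    return [sum(m[d] * n for m, n in active) / total for d in range(len(global_model))]


# Simulate one round in which each client drops out with probability 0.3.
rng = random.Random(42)
local_models = [[1.0, 1.0], [2.0, 4.0], [3.0, 5.0]]
reported = [m if rng.random() > 0.3 else None for m in local_models]
print(fedavg_round([0.0, 0.0], reported, client_sizes=[1, 10, 3]))
```

Because the weights are renormalized over active clients only, a round in which a large client drops out can shift the global model noticeably, which is one reason single-dropout experiments give an incomplete picture.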
Epileptic Seizure detection: According to the World
Health Organization, epilepsy symptoms affect approximately
50 million individuals globally [69]. As a result, the detection
of epileptic seizures is critical for pre-operative evaluations
[70]. Detecting epileptic seizures usually involves accessing
sensitive patient data, such as EEG recordings. The diversity of
data obtained from different EEG devices further complicates
the training of a reliable model, which drives the need for
FL. A recent study proposed a real-time personalized FL
framework for detecting epileptic seizures on mobile plat-
forms, based on a deep neural network [40]. The authors
explored personalized FL, which enabled the model to learn
patient-specific seizure features from local data. The study
also showed this approach achieves greater energy efficiency
and performance using the EPILEPSIAE dataset [71]. Yet, the
model’s sensitivity is not robust, as it exhibits a substantial
decrease (8.5%) when compared to the centralized model.
In addition, experiments were limited to a total of four clients.
Human Activity Recognition: The development of IoT
technology has enabled Human Activity Recognition (HAR)
to play a critical role in assisting medical professionals with
collecting patient data for diagnosing chronic illnesses [72].
However, HAR is susceptible to privacy violations and data
heterogeneity issues. FL is a potential solution for building
robust models with numerous clients, as it can address both
issues. In a recent study [42], the authors concluded that privacy
regulations would not be violated if class labels were shared as
natural-language descriptions. The study considered the
classification problem as a
matching process between data and class representation, and
transformed the classifier into a data and category encoder to
facilitate this process. Additionally, it used the class names as
a reference point to ensure category representation in the label
encoder through natural language. Experiments conducted on
the PAMAP2 dataset [73] demonstrated that this method could
outperform most existing classification techniques based on
FL. Nevertheless, the experiments did not include the results
obtained using a centralized model. Instead, the authors only
compared their results with those of six recent FL methods.
Thus, this comparison does not adequately reflect the differ-
ences in performance between TL and FL.
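The matching formulation in [42] can be illustrated by classifying a sample as the class whose representation is most similar to the sample's embedding. In the Python sketch below, the class vectors stand in for the output of a label encoder applied to class names; the hand-written vectors, the use of cosine similarity, and the function names are all illustrative assumptions rather than the encoders of [42].

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify_by_matching(sample: Sequence[float], class_vectors: Dict[str, Sequence[float]]) -> str:
    """Pick the class name whose representation best matches the sample embedding."""
    return max(class_vectors, key=lambda name: cosine(sample, class_vectors[name]))


# Hypothetical embeddings of class names produced by a text (label) encoder.
class_vectors = {"walking": [1.0, 0.0, 0.0], "running": [0.0, 1.0, 0.0], "cycling": [0.0, 0.0, 1.0]}
print(classify_by_matching([0.2, 0.9, 0.1], class_vectors))  # -> running
```

One appeal of this design is that classes are represented by encoded names rather than by positions in a fixed output layer, so clients only need to agree on what the classes are called.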
In [41], limitations of existing wearable devices, such
as data privacy, service integrity, and network structure
adaptability, led the authors to create an adaptive network for
intelligent wearables based on the distributed structural features
of the fog-IoT network. The proposed FL platform integrates
blockchain technology to enhance data privacy protection.
When tested on a HAR task using smartphone data [74],
this approach achieved good performance in terms of privacy
preservation and classification accuracy. However, the maxi-
mum number of clients was set to 10 in this work, which is
not practical for device-based FL due to the larger number of
devices compared to institutions in real-life scenarios.
Another study proposed a transfer learning-based person-
alized FL framework to tackle issues of heterogeneous data
and data privacy [43]. This framework aims to enhance model
performance by reducing the need for localized training and
using multi-domain knowledge to lessen disparities between
the data. The performance of the framework was evaluated on
a custom dataset, with results showing it achieves more than
90% precision on a five-category HAR task. Unfortunately,
the optimizer parameters, such as the learning rate, choice of
the optimizer, and batch size, are not mentioned in the study,
making it difficult to reproduce its results. Furthermore,
because no publicly available benchmark was used, the reported
results may not adequately reflect the actual performance of the
proposed model.
Major depressive disorder diagnosis: Major depressive
disorder (MDD), a prevalent, severe, and costly
mental disorder worldwide, causes depressed mood, reduced
interest, and impaired cognitive function. Detecting functional
connectivity biomarkers and intervening early are important
for managing MDD. Privacy concerns related to patients'
information and data call for FL to train a
large global model. In a recent study [44], the authors developed
a federated joint estimator to detect these biomarkers by
training a multilayer Bayesian network based on continuous
optimization. To enhance personalized models, they used a
group fused lasso penalty during training and proposed an
alternating direction method of multipliers (ADMM) algorithm
to aid in processing neuroimaging data. The proposed method
incorporated information-sharing strategies to improve the
learning of local models. Experiments on an rs-fMRI dataset [75]
demonstrated the superior effectiveness and precision of this
method.
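A group fused lasso penalty of the kind used in [44] encourages related parameter groups (here, the parameters of different clients' local models) to become identical unless the data justify a difference, because penalizing the unsquared L2 norm of each difference can set entire difference vectors to zero at the optimum. The Python sketch below computes one common pairwise form, λ Σ_{k<l} ‖θ_k − θ_l‖₂; the exact grouping used in [44] may differ, and the ADMM solver is not shown.

```python
import math
from itertools import combinations
from typing import List


def group_fused_lasso_penalty(client_params: List[List[float]], lam: float) -> float:
    """lam * sum over client pairs of the L2 norm of their parameter difference."""
    penalty = 0.0
    for a, b in combinations(client_params, 2):
        # Unsquared L2 norm: the whole difference vector is treated as one group.
        penalty += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return lam * penalty


params = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]  # clients 2 and 3 already agree
print(group_fused_lasso_penalty(params, lam=0.5))  # -> 5.0
```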
Autism spectrum disorder prediction: Autism spectrum
disorder (ASD) contributes substantially to the prevalence of
mental illness and can harm a child's mental health
development [76]. CNNs [77], [78] and recurrent neural
networks (RNNs) [79], [80] are frequently employed for early
ASD prediction. Although these techniques have achieved
good results, they mostly disregarded the correlations and
connections between subjects in the population [45]. Recent
research has shown that graph neural networks can effectively
overcome this limitation [81]. The FL approach in [45] employs
graph generative adversarial networks to complete the missing
information in the local networks, and uses network inpainting
and inter-institutional data to enhance the edge predictor. The
method's effectiveness was demonstrated through experiments
on two neuroimaging datasets, ABIDE [82] and ADNI [83].
For the ADNI dataset, the performance of the FL model
remains the same when increasing the number of clients
beyond 8. However, performance continues to improve
for the ABIDE dataset, suggesting that the model has not yet
reached its full potential and could benefit from more clients.
Surgical phase recognition: Surgical phase recognition
serves a crucial clinical purpose by accurately identifying the
current phase without future information from the surgical
video [84]. Despite its importance, the field continues to
face challenges due to the sensitive nature of medical data.
This restricts collaborations between multiple institutions and
limits the deployment of traditional deep models in real-world
settings. In [46], the authors introduced the first FL strategy
that employs semi-supervised learning to enhance the gener-
alization capability of the surgical phase recognition model
using both labeled and unlabeled data present in the dataset.
The experimental results demonstrated that this approach can
learn better features and generalize reasonably well to
unseen domains. The MultiChole2022 dataset
used in this study was created from the Cholec80 dataset [85].
Summary: The existing FL classification models are
still restricted to a limited number of clients. Furthermore,
benchmark datasets are needed to compare performance
on the same tasks (e.g., COVID-19 diagnostics). To ensure an
objective evaluation of these FL (classification) models, future
collaboration is encouraged for expanding the datasets.
B. Applications of Federated Learning for Segmentation in
Healthcare Tasks
Medical imaging applications using FL may also involve
various segmentation tasks, for example, to delineate tumors
and other lesions in the prostate [28], [90], [92], [97], [98],
brain [104], [107], [108], [128], breast [100], skin [100] or
liver [110]. Table II summarizes FL-based segmentation
techniques and their reported performance.
Prostate tumor segmentation: Accurate segmentation of
prostate regions in MRI is a crucial step in numerous medical
imaging applications for detecting prostate cancer, characterizing
its aggressiveness, predicting its recurrence, and assessing
the effectiveness of treatment [129]. The work in [28] trains
an FL-based segmentation model using a multi-site prostate
dataset [89], which comprises 79 samples from six different
sites. Results showed this model to achieve an average Dice
of 94.28%. Compared to FedAvg and FedBN, the proposed
method shows enhanced stability with increased local training
epochs. However, this study did not evaluate how the
performance of FL compares with that of centralized approaches.
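Dice and IoU, used throughout this section and in Table II, both measure the overlap between a predicted mask P and the ground truth G: Dice = 2|P∩G| / (|P| + |G|) and IoU = |P∩G| / |P∪G|, so Dice = 2·IoU / (1 + IoU) for the same pair of masks. The Python sketch below computes both for flattened binary masks; treating two empty masks as a perfect match (score 1.0) is a convention assumed here, not something the surveyed papers specify.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice coefficient and IoU for flattened binary masks (1 = foreground)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    # Assumed convention: two empty masks count as perfect agreement.
    dice = 2 * intersection / (pred_sum + target_sum) if pred_sum + target_sum else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


print(dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0]))  # -> (0.5, 0.3333333333333333)
```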
Weakly supervised learning has emerged as a popular approach
to alleviate the burden of labeling data [130]. In this
approach, incomplete but easier-to-obtain annotations are used
instead of full image annotations. In [90], the authors proposed
the first federated weakly supervised segmentation (FedWSS)
method to learn a segmentation task from multiple data sources
while minimizing the impact of data drift. To address local and
global data drift, the authors introduced two strategies, based
on Cooperative Annotation Calibration (CAC) and Hierarchi-
cal Gradient De-confliction (HGD). CAC reduces local drift
using a Monte Carlo sampling technique that customizes a
distal peer and proximal peer for each client, and accurately
distinguishes between clean and noisy labels. Meanwhile, the
HGD strategy mitigates global data drift by using primary
gradient data to aid clients in subsequent training cycles [90].
TABLE II: Summary of the performance of each federated
learning algorithm.

Ref.   Clients  Performance                               Model              Datasets
CLASSIFICATION
[27]   32       ACC = 0.641 ±0.09                         DenseNet           [49]
[28]   5        ACC = 0.9548 ±0.0113                      U-Net              [49]
[29]   3        AUC = 0.79                                ResNet-22          [49]
[30]   12       ACC = 0.8475                              CCNN               [53]
[31]   5        ACC = 0.8996                              ViT                [54]–[56]
[39]   3        ACC = 0.8029, AUC = 0.9313                CNN                [68]
[47]   14       AUC = 0.83                                DenseNet121        [86], [87]
[32]   3        ACC = 0.6566                              ResNet-32          *
[88]            ACC >0.86                                 CCNN               *
[33]   4        ACC = 0.7954                              DenseNet121        [48]
[62]   3        ACC = 0.94                                U-Net              [67]
[34]   32       ACC = 0.596 ±0.03                         ResNet-18          [61]
[35]   10       ACC = 0.8894 ±0.015                       DenseNet121        [61]
[36]   20       ACC >0.94                                 AlexNet            [63]
[37]   6        ACC = 0.976, REC = 0.978                  U-Net              [64]
[38]   4        ACC = 0.967 / 0.973                       GAN, CCNN          [65], [66]
[40]   4        ACC = 0.8162, SPEC = 0.82                 Res1DCNN           [71]
[42]   9        ACC = 0.8814                              ViT                [73]
[41]   10       ACC = 0.9043                              1DCNN              [74]
[43]   10       ACC >0.90                                 CCNN               *
[44]   60       PREC >0.92                                Bayesian Networks  [75]
[45]   5        ACC = 0.758 ±0.0007                       GCN                [82], [83]
[46]   4        ACC = 0.5969 ±0.075                       ResNet-50          [85]
SEGMENTATION
[28]   6        Dice = 0.9428 ±0.08                       U-Net              [89]
[90]   10       Dice = 0.8778 ±0.064                      U-Net              [91]
[92]   6        Dice = 0.9028                             U-Net, VGG-11      [91], [93]–[96]
[97]   5        Dice = 0.7828, IoU = 0.7192               ResNet-18          [91], [95]
[98]   20       IoU = 0.671                               GAN                [99]
[100]  3        Dice = 0.8334 / 0.8693                    U-Net              [61], [101]–[103]
[104]  3        Dice = 0.888 / 0.898                      U-Net              [105], [106]
[107]  3        Dice = 0.7785                             ResNet-34          [105], [106]
[108]  10       Dice = 0.8460                             ResNet-34          [109]
[110]  30       Dice = 0.829 / 0.899                      U-Net              [93], [111]
[112]  2        Dice = 0.803 ±0.004                       U-Net              [113]
[114]  50       Dice = 0.8804                             U-Net              [115]
OTHER TASKS
[116]  3        PSNR = 0.351 ±0.014, SSIM = 0.954 ±0.01   GAN                [117]
[118]  8        PSNR = 0.3921, SSIM = 0.970               U-Net              [119]
[120]  50       F1 = 0.5882                               BERT               [121]
[122]  20       AUC >0.845                                LSTM               [123]
[124]  4        IBS = 23.6 ±0.9                           NN                 [125]
[126]  6        ROC = 0.86                                LSTM               [127]

ACC: Accuracy; PREC: Precision; SPEC: Specificity; REC: Recall; ROC: Receiver
Operating Characteristic Curve; AUC: Area under the ROC curve; Dice: Dice
coefficient; IoU: Intersection over Union; PSNR: Peak signal-to-noise ratio; SSIM:
Structural similarity index; IBS: Integrated Brier score; ‘*’ means private dataset;
a blank cell means not provided; GAN: Generative adversarial network; CNN:
Convolutional neural network; CCNN: Custom convolutional neural network; GCN: Graph
convolutional network; BERT: Bidirectional encoder representations from transformers;
LSTM: Long short-term memory; ViT: Vision Transformer; 1DCNN: One-dimensional
convolutional neural network; NN: Neural network.
Experimental results on the PROMISE12 dataset [91] showed
the method to outperform previous approaches for FL-based
prostate segmentation. Yet, this method primarily involves
sharing models between clients to detect noisy labels, which
may lead to increased data transmission costs. Moreover,
the sharing process may be vulnerable to malicious attacks,
which could reintroduce the privacy breaches that FL aims to
prevent. Additional encryption techniques should hence be
incorporated into the sharing process.
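Gradient de-confliction strategies such as HGD in [90] build on a simple geometric idea: when a client's gradient points against a reference direction (negative inner product), the conflicting component is removed before the update is applied. The Python sketch below shows that projection step in the style of PCGrad; it is a swapped-in generic technique with hypothetical names, not the hierarchical procedure of [90].

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def deconflict(local_grad: Sequence[float], reference_grad: Sequence[float]) -> List[float]:
    """Project out the part of local_grad that conflicts with reference_grad."""
    d = dot(local_grad, reference_grad)
    if d >= 0:
        return list(local_grad)  # no conflict (also covers a zero reference)
    ref_sq = dot(reference_grad, reference_grad)
    # Subtract the projection onto the reference, leaving a vector orthogonal to it.
    return [g - d / ref_sq * r for g, r in zip(local_grad, reference_grad)]


print(deconflict([1.0, -1.0], [0.0, 1.0]))  # -> [1.0, 0.0]
```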
The work in [92] addressed the challenge of client drift to
reduce the generalization gap between FL and TL models. It
proposed a new FL framework based on ensemble learning
and introduced a novel personalization technique that aims
to update model parameters by interpolating the local optima
of the current client with those of other clients. Experimental
results on three medical image segmentation tasks (retinal disc
and cup segmentation, 2D fundus image segmentation, and
prostate segmentation from 3D MRI) [91], [93]–[96] demon-
strated the effectiveness of the proposed method. Additionally,
the proposed approach demonstrated comparable outcomes to
the centralized model, indicating considerable potential when
compared to FedAvg and FedProx.
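The personalization idea in [92], interpolating a client's local optimum with those of other clients, can be sketched as a convex combination of parameter vectors. The uniform average over peers and the fixed mixing coefficient `alpha` in this Python sketch are simplifying assumptions; [92] may weight peers differently or tune the interpolation per client.

```python
from typing import List, Sequence


def interpolate_with_peers(
    local: Sequence[float], peers: List[Sequence[float]], alpha: float
) -> List[float]:
    """alpha * local + (1 - alpha) * mean(peers), element-wise."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    peer_mean = [sum(vals) / len(peers) for vals in zip(*peers)]
    return [alpha * loc + (1.0 - alpha) * m for loc, m in zip(local, peer_mean)]


print(interpolate_with_peers([1.0, 1.0], [[3.0, 3.0], [5.0, 5.0]], alpha=0.5))  # -> [2.5, 2.5]
```

Setting `alpha = 1` recovers the purely local model and `alpha = 0` the peer consensus; intermediate values trade personalization against generalization.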
In [97], the authors proposed a personalized FL paradigm
to address the challenges of performance degradation and
unbalanced label distribution. The proposed method leverages
progressive Fourier aggregation on the global server side and
enhanced transfer on the client side to learn the parameters of
individual client models and transfer local knowledge to the
global model more effectively. To address the problem of label
distribution imbalance, it also introduces a new loss function
called Conjoint Prototype Aligned (CPA) loss. This loss eval-
uates the global conjoint objective based on the global imbal-
ance and modifies the local client-side training via prototype-
aligned refinement to eliminate the imbalance gap with a bal-
anced objective. Experimental results on PROMISE12 dataset
[91] and ISBI dataset [95] showed the method’s superior
performance compared to recent approaches. However, this
method's local training time is twice that of standard FL,
which could increase the computational burden when
using edge devices. Moreover, the absence of a comparison
with the centralized model does not sufficiently explain the
potential of using FL for prostate tumor segmentation.
Breast tumor segmentation: Breast cancer, which is the
most prevalent type of cancer in women, can be fatal if not
detected early [131]. In a recent study [100], a novel
label-agnostic supervised FL method called FedMix was
proposed. FedMix trains each
client by utilizing both strong and weak labels with an adaptive
weight adjustment strategy, which allows for dynamic weight
adaptation during the FL training process to learn better feature
representations. This method breaks the restriction of only us-
ing one type of label for training. FedMix was tested on three
breast tumor segmentation datasets: BUS [101], BUSIS [102],
and UDIAT [103]. Experimental results showed it outperforms
most current approaches. Nevertheless, the performance of this
technique relies heavily on the choice of hyper-parameters,
which needs extensive fine-tuning to avoid degradation in
performance. Additionally, the method assumes that clients with
richer labels exhibit a higher training loss, indicating a greater
amount of information available for model training. However, the presence
of noisy or corrupted labels can lead to a substantial rise in
the loss, and their model is unable to effectively differentiate
these labels. Consequently, the performance of the model in
this particular scenario may be negatively impacted.
Skin tumor segmentation: Skin cancer is a common
disease that affects both men and women [132]. As in their
breast tumor segmentation experiments, the authors of [100]
also evaluated FedMix on a skin tumor
dataset [61]. This dataset includes 10,015 samples
from four different sources. According to the experimental
results, FedMix achieved an average Dice score of 86.93%. A
comparison with the centralized model is however missing.
Brain tumor segmentation: The accurate identification of
tumor regions in medical images is crucial for effective clinical
treatment of brain cancer, and brain tumor segmentation plays
a key role in this process [131], [133]. However, the need
for FL models has risen due to the huge workload associated
with annotating images for medical professionals. In [104],
a distributed network framework is proposed for real-time
FL using the Message Queuing
Telemetry Transport (MQTT) protocol. It uses a modified
version of the U-Net model trained with data from daily clinical
practice. Two commonly used datasets, BraTS 2018 [105] and
BraTS 2020 [106] are used to investigate the trade-off between
training accuracy and latency. The results show, for the first
time, the primary advantages of the MQTT protocol in terms
of reliability, bandwidth efficiency, and scalability.
The work in [107] presents an FL-based framework address-
ing the issues of non-IID data and privacy leakage. The pro-
posed solution utilizes unlabeled public data for offline, one-
way knowledge distillation to extract local knowledge through
ensemble attention distillation, enabling global model learning
while maintaining privacy. This approach was tested on the
BraTS 2018 dataset [105] and BraTS 2020 dataset [106].
Results showed highly competitive performance along with
more effective privacy protection. The experiments conducted
in this work covered a wide range of FL scenarios, including
local data from different institutions, local data of varying
sizes, public data from different domains combined with local
data, and public data with modalities different from the local
data. This comprehensive analysis allowed for a thorough
evaluation of the performance of FL in various situations.
However, a relatively small number of clients was employed
(i.e., 3 clients).
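The one-way distillation in [107] transfers knowledge from an ensemble of local models to the global model using unlabeled public data, so no private sample leaves a client. A logit-level version of this idea can be sketched as follows: average the clients' temperature-softened predictions on a public sample to form a soft target, then train the global model to minimize the KL divergence to it. This Python sketch is a simplification under stated assumptions; [107] distills attention rather than only output probabilities, and the names and temperature value are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_target(client_logits: List[Sequence[float]], temperature: float) -> List[float]:
    """Average of the clients' softened predictions on one public sample."""
    probs = [softmax(logits, temperature) for logits in client_logits]
    return [sum(col) / len(probs) for col in zip(*probs)]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q): the distillation loss for a student prediction q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Logits of three local models on one unlabeled public sample (illustrative).
target = ensemble_soft_target([[2.0, 0.5], [1.5, 1.0], [3.0, 0.0]], temperature=2.0)
student = softmax([1.0, 0.8], temperature=2.0)
print(target, kl_divergence(target, student))
```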
In [108], a straightforward FL method called heterogeneity-
aware FL is proposed, which improves the generalization of
the model over the target domain by splitting the network
and concatenating feature maps. Unlike other methods, this
approach does not require complex tuning and optimization
strategies. Experiments conducted on the BraTS 2017 dataset
[109] indicate that this method can achieve an average Dice
score of 84.60%. The robustness of the model was also exam-
ined, and the results of the assessments demonstrated that this
approach had the most favorable outcomes compared to CWT
[134] and FedAvg+SD [135] when the network architecture
was changed from ResNet-34 to MobileNet-v2. Although the
method effectively addresses statistical heterogeneity issues,
it neglects model heterogeneity [136], device heterogeneity
[137], behavior heterogeneity [138], and other factors. As a
result, the model’s potential for tackling heterogeneity prob-
lems is not fully demonstrated.
Neuroimaging anomaly labeling: A technique called
Federated Disentangled Representation Learning is proposed
for performing unsupervised brain disease segmentation [128].
This technique decomposes the parameter space into a global
space, allowing the model to take advantage of generic
anatomical features while also protecting client-specific con-
trast information. The approach was tested on three datasets,
namely OASIS-3 [139], ADNI [83], and an internal dataset
(KRI), and the results showed significant improvement in
anomaly segmentation when compared to locally trained mod-
els without annotations or sharing of private local data. Specif-
ically, the proposed method achieved a 99.74% improvement
for multiple sclerosis and a 40.45% improvement for tumors
[128].
Liver tumor segmentation: The segmentation of liver
cancer using computed tomography (CT) volumes is important
due to its high mortality rate [140]. Proper segmentation is nec-
essary for the accurate diagnosis of this common malignancy.
However, incomplete annotations in individual datasets, such
as those found in [47], can pose a challenge. FL offers an
interesting solution for tackling these problems. In [110], an FL
segmentation algorithm is introduced to address this issue. The
algorithm consolidates the acquired knowledge into a meta-global
model through learning to segment datasets with various
incomplete annotations. In experiments on the
MSD [93] and BTCV [111] datasets, this model achieved
impressive results on distributed datasets that have disjoint and
incomplete annotations. The model provides a prototype for
creating a unified multi-task segmentation model with clinical
relevance from fragmented datasets with incomplete annotations.
Yet, further experimentation is required to evaluate the
method's performance in practical FL settings (e.g., where clients
may leave and join the training procedure) and with more
challenging datasets.
Pneumothorax segmentation: The accuracy of clinical
diagnosis for Pneumothorax, a common lung disease, largely
depends on the precision of segmentation from chest X-ray
images [141]. In [112], a patch permutation approach was
employed by incorporating permutation into a patch embedder
layer and using ViT as the backbone for multi-task FL. This
approach led to a decrease in communication consumption
between clients and the global server, as well as an im-
provement in model performance. Experimental results on the
SIIM-ACR dataset [113] showed the proposed technique to
achieve an average Dice score of 80.8%, outperforming the
centralized model by 0.7%. However, the small number of
clients considered in this study (i.e., only two) constitutes a
limitation.
Vertebral body segmentation: In clinical
applications, manual segmentation is widely applied. However,
it is not practical for vertebral body segmentation due to time
limitations [142]. Reducing the manual segmentation burden and
increasing data availability require collaboration with a larger
number of clients.
In [114], a new FL-based framework was proposed for
vertebral body segmentation. The framework uses a local
attention mechanism based on Dual Attention Gates to improve
the performance of the model. This method enhanced the
performance of vertebral segmentation models
on the SpineSagT2Wdataset3 dataset [115]. Nevertheless,
there remains a difference in performance between FL and TL,
with FL experiencing fluctuations during training that make it
less stable compared to centralized approaches.
Summary: The same issue of having a very small number
of clients is also present in segmentation tasks. Because
several segmentation tasks (e.g., prostate cancer) have limited
high-quality data, increasing the number of clients leaves
each client with few or no samples, which significantly
decreases global performance. Acquiring more high-quality
data thus remains a challenge.
C. Applications of Federated Learning in Healthcare for
Various Tasks
FL can be applied beyond image classification and segmen-
tation, with potential applications including MRI reconstruc-
tion [116], [118], medical relation extraction [120], medical
knowledge graphs [143], mortality prediction [122], lifespan
prediction [124], and mental health detection [126]. For a
summary of each method’s performance, please refer to Table
II.
MRI reconstruction: Lengthy MRI acquisition times
have become problematic for
both patients and doctors, increasing interest in
reconstructing high-quality MR images [144]. However, previous
FL methods for MRI reconstruction were based on conditional
reconstruction models and had poor generalization ability,
making them unsuitable for a wide range of acceleration
rates [116]. To address this issue, a novel image recon-
struction method based on unconditional generative adver-
sarial networks was proposed in [116]. The method utilized
cross-site learning to generate images and included a new
mapper subnetwork to maintain specificity by creating site-specific
latents. This method improved performance on multi-institutional
datasets, including IXI [117], fastMRI [119], and
BRATS [145]. The model also has lower computational and
inference times compared with other reconstruction methods
[146], [147], which is more practical for real-life settings.
Nonetheless, it is critical to conduct further research in order
to systematically validate the method and assess its anatomical
accuracy on a wider range of patients.
In [118], it was noted that domain-specific information,
which can contain valuable information for local reconstruc-
tion, should not be ignored while other FL techniques focus on
improving the generalization of global models. To address this,
a specificity-preserving FL algorithm was proposed, consisting
of an encoder to learn a global generalization representation
and a client-specific decoder to retain domain-specific features.
Weight contrast regularization was also employed during the
training process. It achieved an average Peak Signal-to-Noise
Ratio (PSNR) of 39.21 dB and a Structural Similarity Index
(SSIM) of 0.970 on the fastMRI dataset [119]. In addition,
the model outperforms FedAvg, FedBN, and FedProx in terms
of achieving higher SSIM and PSNR values within a few
epochs. It also exhibits stable PSNR and SSIM curves
during training.
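PSNR is a logarithmic ratio measured in decibels (dB): PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible intensity and MSE is the mean squared error between the reference and the reconstruction. The Python sketch below computes it for flattened images; assuming intensities normalized to [0, 1] (MAX = 1) is a choice made here, and published values depend on that scaling.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; returns inf for identical inputs."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on [0, 1]-scaled pixels gives MSE = 0.01, i.e. about 20 dB.
print(psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1]))
```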
Medical relation extraction: Relation extraction, which
is a critical approach to acquiring knowledge in AI, is gaining
traction in the healthcare industry [148]. However, heterogene-
ity issues also exist in the texts from various institutions. In
[120], a new concept called major classification vectors is pro-
posed, which consists of a set of class vectors obtained through
an ensemble learning method. A contrastive learning method
is employed to facilitate local training and minimize the risk
of local models overfitting. Additionally, the proposed method
effectively prevents the leakage of original data, features, and
label distribution. The experiments conducted on the 2010
i2b2/VA challenge dataset [121], BioCreative VI: Chemical-
protein interaction dataset [149], and Phenotype-Gene Rela-
tions corpus dataset [150] show that the proposed method
yields decent results, especially in terms of a more efficient
convergence rate. The experimental results indicate that the
method exhibits reduced training fluctuations in comparison to
FedAvg and FedRS. It is important to note, however, that the
study does not include a comparison with non-FL techniques.
Mortality prediction: The lack of labeling in electronic
medical records and the distributed nature of the data make
it difficult to train an AI model that can achieve optimal
performance [122]. A model-agnostic meta-learning algorithm
called Reptile [151] has been proposed as one way to address
this challenge while preserving privacy. Despite the development
of such approaches, this field still suffers from distributed data
issues. In
[122], a dynamic variant of the neural graph based on the Rep-
tile algorithm is introduced, which enables semi-supervised
learning by integrating unlabeled data into the training phase
and simultaneously conducting metric learning on labeled and
unlabeled neighborhoods. Experiments carried out using the
MIMIC-III dataset [123] demonstrate the effectiveness of the
proposed method, particularly when constrained to limited
supervision. The method displayed comparable performance
to the centralized model, but had a slower loss convergence
rate compared to FedAvg. Additionally, this work lacked
a thorough investigation into privacy-preserving techniques.
This raises concerns about the robustness of the proposed
model’s performance against malicious attacks or when em-
ploying encryption methods such as differential privacy or
secure multi-party computation.
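Reptile [151] itself has a compact update rule: starting from shared parameters θ, run a few SGD steps on one task (in the FL reading, one client) to obtain φ, then move θ toward φ via θ ← θ + ε(φ − θ); a batched variant averages φ − θ over several tasks. The Python sketch below implements this batched form on toy one-dimensional quadratic losses; the losses, step sizes, and function names are illustrative assumptions and do not reproduce the graph-based method of [122].

```python
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def inner_sgd(theta: Vector, grad_fn: GradFn, lr: float, steps: int) -> Vector:
    """A few plain SGD steps on one task (client), starting from theta."""
    phi = list(theta)
    for _ in range(steps):
        g = grad_fn(phi)
        phi = [p - lr * gi for p, gi in zip(phi, g)]
    return phi


def reptile_step(theta: Vector, task_grads: List[GradFn], inner_lr: float,
                 inner_steps: int, outer_lr: float) -> Vector:
    """Batched Reptile: theta <- theta + outer_lr * mean_t(phi_t - theta)."""
    deltas = []
    for grad_fn in task_grads:
        phi = inner_sgd(theta, grad_fn, inner_lr, inner_steps)
        deltas.append([p - t for p, t in zip(phi, theta)])
    mean_delta = [sum(col) / len(deltas) for col in zip(*deltas)]
    return [t + outer_lr * d for t, d in zip(theta, mean_delta)]


def quadratic_grad(center: Vector) -> GradFn:
    """Gradient of 0.5 * ||theta - center||^2 (a toy per-client loss)."""
    return lambda th: [t - c for t, c in zip(th, center)]


theta = [0.0]
clients = [quadratic_grad([1.0]), quadratic_grad([3.0])]
for _ in range(100):
    theta = reptile_step(theta, clients, inner_lr=0.1, inner_steps=5, outer_lr=1.0)
print(theta)  # converges toward [2.0] for these two symmetric toy clients
```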
Lifespan prediction: There has been a recent increase
in studies related to predicting life expectancy [124], [152].
The Cox model is a well-known standard technique in this
field [153]. In [124], a federated Cox model is proposed. It
accounts for the effects of time-varying covariates by relaxing
the proportional hazards assumption, preserves data privacy,
and reduces upfront investment costs for organizations
compared to previous methods. Experiments on three clinical
datasets, METABRIC [154], SUPPORT [155], and GBSG [125],
show that the FL model performs as well as the traditional model.
However, this study mainly focuses on the heterogeneity
resulting from label stratification, while neglecting other forms
of heterogeneity, such as covariate shifts [156], which are
commonly observed in image-based survival prediction.
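To illustrate how a Cox model can be fitted without pooling patient records, the sketch below uses site-stratified Cox regression: each site keeps its own baseline hazard, so the global log partial likelihood is the sum of per-site terms and sites only need to share gradients. This is a simpler, well-known construction that we add for illustration. It does not reproduce the time-varying-covariate model of [124], it ignores tied event times, and all names, the learning rate, and the number of rounds are our assumptions.

```python
import numpy as np

def local_cox_gradient(beta, X, time, event):
    """Gradient of one site's Cox log partial likelihood (assumes no tied event times)."""
    risk = np.exp(X @ beta)
    grad = np.zeros_like(beta)
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]                          # risk set at this event time
        weighted_mean = (risk[at_risk, None] * X[at_risk]).sum(axis=0) / risk[at_risk].sum()
        grad += X[i] - weighted_mean
    return grad

def federated_stratified_cox(sites, dim, lr=0.1, rounds=150):
    """Gradient ascent where each site contributes only its local gradient."""
    beta = np.zeros(dim)
    n_total = sum(len(site[1]) for site in sites)
    for _ in range(rounds):
        # Raw (X, time, event) records never leave the sites; only gradients are summed.
        total = sum(local_cox_gradient(beta, *site) for site in sites)
        beta += lr * total / n_total
    return beta
```

Each entry of `sites` is an `(X, time, event)` tuple held by one institution.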
Medical knowledge graph: In the healthcare field, knowl-
edge graphs, which are data networks comprising entities rep-
resented by nodes and relationships represented by edges, have
become a popular topic of discussion [157]. To enhance
collaboration with additional institutions and experts, [143]
proposes a framework for building on-demand, task-specific
knowledge graphs for FL. The framework is designed
to be findable, accessible, interoperable, and reusable (FAIR)
for creating biological knowledge graphs while maintaining
the source data's provenance. It is among the
first to standardize the process of constructing knowledge
graphs, rather than their representation.
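A knowledge graph of the kind described above can be held as (head, relation, tail) triples. The sketch below is illustrative and is not the FAIR framework of [143]. It merges triples from several sources while recording which source contributed each triple (its provenance), then extracts an on-demand, task-specific subgraph by relation type. The source names and triples are hypothetical.

```python
from collections import defaultdict

def merge_knowledge_graphs(sources):
    """Merge {source_name: iterable of (head, relation, tail)} into {triple: set of sources}."""
    merged = defaultdict(set)
    for source, triples in sources.items():
        for triple in triples:
            merged[triple].add(source)   # keep provenance for every contributed triple
    return dict(merged)

def task_subgraph(kg, relations):
    """Keep only triples whose relation is needed for the task, with their provenance."""
    return {triple: srcs for triple, srcs in kg.items() if triple[1] in relations}

kg = merge_knowledge_graphs({
    "hospital_a": [("aspirin", "treats", "headache"),
                   ("BRCA1", "associated_with", "breast cancer")],
    "registry_b": [("aspirin", "treats", "headache")],
})
treatment_graph = task_subgraph(kg, {"treats"})
```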
Mental health assessment: Early detection of mental
illness is challenging due to its insidious nature and the lack
of available resources. Mental illness is a widespread health
issue globally. To address global mental health issues, one
potential approach is to use FL to train a large-scale model
that can be universally applied. In [126], an FL-based model
for mental health detection is proposed. The model uses a
hypergraph and a word-level sentiment vocabulary
representation to learn a low-dimensional vector representation
for detection while preserving semantic relevance to the
greatest extent possible. An attention-shifting mechanism is
also incorporated to assist training. Experiments conducted on
a dataset gathered from websites and forums [127] achieved
an AUC-ROC of 0.86 using a long short-term memory
architecture. However, the study has certain limitations. For
instance, it does not account for extra variables such as the
patient's geographical location, cultural and religious
background, and social context. Moreover, during training the
performance on a single class can be poor (e.g., AUC = 0.2):
although the overall performance is acceptable, the model
fails on particular categories. Additionally, there is no
comparison with a centralized model.
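The per-class failure noted above (e.g., AUC = 0.2 for one class) only becomes visible when one-vs-rest AUC is computed for each class rather than as a single aggregate figure. Below is a minimal sketch using the rank (Mann-Whitney) interpretation of AUC: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The function names and toy scores are illustrative, not taken from [126].

```python
def binary_auc(scores, labels):
    """AUC = P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_class_auc(prob_matrix, labels, num_classes):
    """One-vs-rest AUC for each class from per-class probability scores."""
    return [
        binary_auc([row[c] for row in prob_matrix], [int(y == c) for y in labels])
        for c in range(num_classes)
    ]

# Toy example: classes 0 and 1 are separated perfectly, class 2 is not.
probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.4, 0.5, 0.1], [0.1, 0.1, 0.8]]
labels = [0, 1, 2, 2]
class_aucs = per_class_auc(probs, labels, num_classes=3)
```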
Summary: The limited number of clients is not a major
issue when it comes to medical tasks based on FL, such as
identifying relationships between medical entities and knowl-
edge graphs. However, these tasks are highly valuable for
physicians in providing accurate diagnoses. Furthermore, MRI
reconstruction requires high-quality data, so FL is needed to
combine data from more sites to train a robust model. More
research is needed to address these issues.
V. FEDERATED LEARNING MODELS PERFORMANCE
This section initially outlines the metrics typically em-
ployed for FL tasks in healthcare. Subsequently, it presents an
overview of the commonly used datasets with their descrip-
tions. It’s important to mention that, due to the unavailability
of a standardized test dataset at present, a uniform performance
comparison is not provided.
TABLE III: Summary of recent studies on federated learning models in healthcare.
Reference | Purpose | Category
[27] | Employs bag preparation and MIL techniques to accomplish classification tasks | FL, MIC
[128] | Utilizes generic anatomical features by decomposing the parameter space into a global space while maintaining client-specific contrast information | FL, BAD, MIS
[28] | Builds a new harmonizing architecture called HarmoFL to address the data drift problem | FL, MIC, MIS
[116] | Proposes a new mapping subnetwork with cross-site learning to perform image reconstruction tasks | FL, MRIR
[46] | Uses a federated semi-supervised learning method for surgical phase recognition | FL, SPR
[100] | Proposes a label-agnostic unified FL framework that uses mixed labels and an adaptive weight assignment procedure for aggregation to perform segmentation tasks | FL, MIS
[122] | Uses a dynamic variant of neural graph network and meta-learning to tackle mortality prediction tasks | FL, ML, MIA
[30] | Mitigates the instability caused by data heterogeneity using knowledge of the client's label distribution | FL, MIC
[31] | Uses masked image encoding as a self-supervised task to learn effective representations | FL, MIC
[18] | Considers a weighted average assignment based on data quality instead of the amount of data to perform classification tasks | FL, MIC
[104] | Proposes a real-time distributed FL framework based on the MQTT protocol to improve the FL procedure | FL, MIS
[36] | Explores local batch normalization and personalized models for each client to address the domain shift problem and learn the similarities between clients | FL, HAR
[114] | Employs a novel local attention mechanism based on Dual Attention Gates for FL | FL, MIS
[37] | Utilizes active learning techniques to improve the performance of FL models | FL, AL, MIA
[97] | Employs PFA and a Conjoint Prototype-Aligned loss to address performance degradation and unbalanced label distribution in a dataset | FL, MIS
[118] | Optimizes the FL model by dividing it into a shared encoder and a client-specific decoder, and uses weighted contrastive regularization during training | FL, MRIR
[34] | Develops a customized FL framework to enhance the generalization ability of the local model, rather than the global model | FL, MIC
[108] | Addresses the performance reduction caused by data heterogeneity in FL | FL, MIA
[43] | Develops a cross-domain FL framework that leverages transfer learning techniques to mitigate differences in data distribution | FL, TR, HAR
[45] | Combines graph neural networks and intra-network mapping to enhance model performance | FL, GNN, DP
[107] | Increases the robustness of data privacy protection by employing a one-way offline knowledge distillation technique | FL, KD, MIA
[98] | Uses GANs for computational pathology to reduce the discrepancies between data | FL, CP, MIS
[35] | Proposes a semi-supervised FL approach with a dynamic bank learning method to solve the class distribution imbalance problem | FL, MIC
[38] | Builds a blockchain-based differential privacy protection strategy to enhance data privacy conservation | FL, MIC, GAN
[126] | Considers a word emotion representation while using an attention-shifting mechanism to assist training | FL, HG, NLP
[92] | Tackles the client drift problem and the generalization gap between TL and FL | FL, MIS
[40] | Employs an FL model using a CNN architecture to tackle real-time seizure detection tasks | FL, SD
[32] | Implements channel decoupling to provide personalized models and a new cyclic distillation scheme to control the training process | FL, MIC
[88] | Incorporates contribution awareness into FL to build a reliable healthcare system | FL, HS
[39] | Solves the intermittent client problem (some clients may leave the training) | FL, MIA
[33] | Aims to lessen the performance differences between local models to increase the "fairness" of the models for each client | FL, MIC
[44] | Uses Bayesian networks and a group fused lasso penalty to process the neuroimaging data at each local client before updating the global server | FL, MIC
[112] | Improves the model's performance without sacrificing privacy by utilizing random patch permutation for MTL | FL, ViT, MIA
[124] | Builds a federated Cox model to lower initial organizational expenses | FL, LP
[41] | Utilizes private blockchain technology to safeguard data within the IoT network with privacy-preserving features | FL, HAR
[29] | Proposes a new memory-aware curriculum learning strategy for FL to enhance the consistency of local models and penalize inconsistent prediction outcomes | FL, MIC
[42] | Better aligns the latent spaces across clients by using natural language to represent label classes | FL, KD, MIC
[90] | Builds a weakly supervised FL algorithm to efficiently learn segmentation models in the context of data drift mitigation | FL, WSL, MIS
[120] | Develops an ensemble approach and contrastive learning to prevent overfitting | FL, CL, MRE
[110] | Uses a knowledge aggregation strategy for handling medical datasets with different and incomplete annotations | FL, MIS
[47] | Utilizes an FL-based surgical aggregation method to handle multi-label classification problems | FL, SA, MIC
[143] | Builds on-demand knowledge graphs with specific tasks by using FL methods | FL, KG
[62] | Utilizes personalized cyclic homomorphic encryption to enhance privacy protection | FL, MIA
FL: Federated learning; MIC: Medical image classification; SPR: Surgical phase recognition; MIS: Medical image segmentation; HAR: Human activity recognition; MIA: Medical image analysis; AL: Active learning; TR: Transfer learning; GNN: Graph neural network; DP: Disease prediction; KD: Knowledge distillation; MRIR: MRI reconstruction; ML: Meta-learning; ViT: Vision Transformer; MQTT: Message queuing telemetry transport; CNN: Convolutional neural network; PFA: Progressive Fourier aggregation; BAD: Brain anomaly detection; MRE: Medical relation extraction; WSL: Weakly supervised learning; CP: Computational pathology; HG: Hypergraph; KG: Knowledge graph; SD: Seizure detection; SA: Surgical aggregation; HS: Healthcare system; LP: Lifespan prediction; CL: Contrastive learning; MTL: Multi-task learning; MIL: Multiple instance learning; GAN: Generative adversarial network; NLP: Natural language processing.
A. Commonly Used Evaluation Metrics
The metrics typically used for image segmentation tasks
include the Dice coefficient (Dice) and Intersection over
Union (IoU) [158]. For classification/prediction tasks in FL,
widely used metrics are Accuracy (ACC), Precision (PREC),
Recall (REC), Specificity (SPEC), F1-score, and the Area
Under the ROC Curve (AUC) [159]. For other healthcare-related
tasks, such as MRI reconstruction, common metrics include
PSNR and SSIM [160], while for lifespan prediction, the
Integrated Brier Score (IBS) is typically used [161].
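The segmentation, classification, and reconstruction metrics listed above can be computed as follows for the binary case. This is a minimal sketch on flat Python lists, and the function names are ours. SSIM and IBS are omitted because they need more machinery (a windowed image comparison and a censoring-aware survival estimate, respectively), and multi-class evaluation would repeat the binary computation per class.

```python
import math

def dice_and_iou(pred, target):
    """Dice = 2|P∩T| / (|P| + |T|); IoU = |P∩T| / |P∪T| for flat binary masks (0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    dice = 2 * inter / (p_sum + t_sum) if p_sum + t_sum else 1.0  # both masks empty: perfect match
    iou = inter / union if union else 1.0
    return dice, iou

def binary_classification_metrics(y_true, y_pred):
    """ACC, PREC, REC, SPEC, and F1 from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": prec,
        "recall": rec,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * prec * rec / (prec + rec) if prec + rec else 0.0,
    }

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
```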
B. The Benchmark Datasets Used in Federated Learning
In this section, we will discuss the healthcare benchmark
datasets that have been used in FL. As there is currently no
unified benchmark dataset for FL in healthcare, Table IV
summarizes the datasets used in the majority of published
works.
Retina: The Retina dataset comprises 35,126 images related
to diabetic retinopathy. It encompasses five classes, namely
normal, mild, moderate, severe, and proliferative. The images
were captured using various cameras and from different angles
[167].
MedMNIST: The MedMNIST dataset is composed of
718,067 images across 18 sub-datasets, including PathMNIST,
ChestMNIST, DermaMNIST, OCTMNIST, and BreastMNIST,
among others. The number of medical images in each
sub-dataset ranges from 100 to 100,000, and the collection
comprises 12 2D sub-datasets and 6 3D sub-datasets [53].
TABLE IV: Commonly used datasets for federated learning in healthcare
Dataset | Samples (n) | Application area
MIT BIH [162] | 109,446 | ECG-based prediction to identify arrhythmia
Premier healthcare [163] | 1,271,733 | PPR
Chest xray image [164] | 16,148 | COVID-19 diagnosis
Chest xray image 2 [68] | 207,130 | PD
Hologic and Siemens [49] | 1,870 | Breast cancer or tumor detection
COVID-19 [165] | 4,029 | Mortality prediction for patients with COVID-19
eICU synergetic [166] | >200,000 | Prediction of the likelihood of patient death
HAM10000 [61] | 10,015 | Skin cancer classification / segmentation
Cancer Genome Atlas [48] | >20,000 | Cancer genomics program
Camelyon17 [49] | 450,000 | Breast cancer classification
MedMNIST [53] | 718,067 | Medical image classification
Retina [167] | 35,126 | Diabetic retinopathy detection
BraTS series [105], [106], [109] | 285 | Brain tumor segmentation
ABIDE [82] | 1,112 | ASD diagnosis
ADNI [83] | 911 | AD diagnosis
PolypGen [168] | 6,282 | Polyp detection and segmentation
MIP [169]–[172] | 393 | Pancreas segmentation
MIL [173]–[177] | 428 | Liver tumor segmentation
MSP [89] | 79 | Prostate MRI segmentation
ECG: Electrocardiogram; ASD: Autism spectrum disorder; AD: Alzheimer's disease; MRI: Magnetic resonance imaging; PPR: Predict patient mortality; PD: Pneumonia detection; MIP: Multi-institutional pancreas; MIL: Multi-institutional livers; MSP: Multi-site prostate.
Camelyon17: The Camelyon17 dataset contains 450,000
breast cancer images. A total of five centers contributed
data to this dataset [49].
PolypGen: PolypGen is a multi-center polyp detection and
segmentation dataset. It incorporates more than 300
patients [168] and has both single-frame and sequence
data, containing about 6,282 samples.
HAM10000: The HAM10000 dataset consists of 10,015
dermatoscopic images of skin lesions from two different sites,
covering seven categories [61].
Premier healthcare: The Premier healthcare dataset is one
of the largest databases, collecting data from 415 hospitals in
the USA [163]. It consists of 1,271,733 electronic health
records (EHRs) for mortality prediction.
MIT BIH: The MIT BIH dataset includes 109,446 ECG
samples for predicting arrhythmia. It contains ECG data from
47 individuals recorded between 1975 and 1979 [162].
Multi-site prostate: The multi-site prostate dataset contains
79 prostate T2-weighted MRI scans from three different
centers [89]. It is mainly used for segmentation tasks.
Multi-institutional Pancreas: The multi-institutional pancreas
dataset consists of about 393 pancreas CT samples
[169]–[172]. It is used for segmentation tasks.
eICU synergetic: The eICU synergetic dataset is mainly used
for intensive care unit (ICU) mortality prediction and
contains more than 200,000 samples [166].
COVID-19: The COVID-19 dataset is used for EHR-based
mortality prediction of COVID-19 patients [165]. It includes
4,029 samples from five different hospitals.
Multi-institutional Livers: The multi-institutional livers
dataset consists of 428 CT images for liver tumor diagnosis
[173]–[177], derived from five different centers.
Chest Xray image: The Chest Xray image dataset has 16,148
cases (both positive and negative) from 20 client sites [164].
These data are used for predicting the future oxygen
requirements of patients. The Chest Xray image 2 dataset
contains about 207,130 OCT images from 4,686 patients with
four classes [68] and is mainly related to pneumonia diagnosis.
Cancer Genome Atlas: The Cancer Genome Atlas dataset is
a large cancer dataset that covers breast, lung, colon, and
rectal cancer, among others [48]. It has more than 20,000 samples.
BraTS: BraTS is a brain tumor segmentation dataset that
includes 285 brain tumor MRI scans with four MRI modalities
(T1, T1ce, T2, and FLAIR) for each scan [105], [106], [109].
In addition, the dataset provides complete brain tumor masks
with labels for edema/invasion, enhancing tumor, and
necrosis regions.
ABIDE: The ABIDE dataset includes 1,112 samples,
539 of which are from individuals with ASD and 573 from
typical (healthy) controls (ages 7-64 years, median 14.7 years
across groups) [82].
ADNI: The ADNI dataset contains 911 samples (378 AD
patients and 536 mild cognitive impairment subjects). It has
three domains: ADNI-1, ADNI-2, and ADNI-GO [83].
C. A case study of federated learning in healthcare
We use a supervised FL algorithm for medical image
classification as an example to demonstrate the application of
FL in healthcare classification tasks. The procedure of this FL
technique can be described as follows. We use a large-scale
dataset of COVID-19 and pneumonia diagnoses [178]
covering three classes: normal, pneumonia, and COVID-19.
The number of clients is set to 5. Each client receives 20% of
the 357,518 samples, and the testing set consists of 33,781
samples. After the data are allocated, local training is
conducted, followed by the aggregation of weights to generate
a global model. This global model is then evaluated on the
testing set. Training ends after 100 federated rounds.
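The data-allocation step above can be sketched as follows. The text only states that each of the 5 clients receives 20% of the samples; the random (IID) shuffle, the fixed seed, and the function name `iid_partition` are our assumptions. Because 357,518 is not divisible by 5, the sketch gives the 3 leftover samples to the last client.

```python
import random

def iid_partition(num_samples, num_clients=5, seed=0):
    """Shuffle sample indices and split them into near-equal, disjoint client shards."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    shard = num_samples // num_clients
    parts = [indices[i * shard:(i + 1) * shard] for i in range(num_clients)]
    parts[-1].extend(indices[num_clients * shard:])  # leftover samples go to the last client
    return parts

client_indices = iid_partition(357_518, num_clients=5)
```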
We employ the ResNet34 architecture for both the local and
global models, using the Adam optimizer [179], [180] with a
learning rate of 0.0002, a weight decay of 0.0005, and a batch
size of 16. We use FedAvg as the aggregation technique,
along with cross-entropy loss to optimize the local models
[20], [181]. Figure 5 plots the test performance as a function
of epochs, and Table V reports the performance of the
compared methods.
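The FedAvg aggregation used above reduces to a sample-size-weighted average of the client parameters. Below is a minimal sketch that assumes parameters are held as NumPy arrays in a dict keyed by layer name (as one would obtain by converting a PyTorch `state_dict`). `local_train` is a hypothetical callable standing in for the ResNet34/Adam local training described above. With equal 20% shares, the weighting reduces to a plain mean.

```python
import numpy as np

def fedavg(client_states, client_sizes):
    """FedAvg: average each parameter array, weighted by the client's local sample count."""
    total = float(sum(client_sizes))
    return {
        name: sum(state[name] * (n / total) for state, n in zip(client_states, client_sizes))
        for name in client_states[0]
    }

def run_federated_rounds(global_state, client_datasets, local_train, rounds=100):
    """Each round: broadcast the global weights, train locally, aggregate with FedAvg."""
    for _ in range(rounds):
        states = [local_train(dict(global_state), data) for data in client_datasets]
        global_state = fedavg(states, [len(data) for data in client_datasets])
    return global_state
```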
Our simulation indicates that the FL method can result in a
large drop of 10.96 percentage points in accuracy when
comparing the global model with the centralized model
(67.47% vs. 78.43%). Furthermore, the FL method suffers from
fluctuations during the testing phase, as illustrated in
Figure 5. It may be necessary to apply techniques such as
domain adaptation before implementing an FL model, as
suggested in [16].
TABLE V: Top performance metrics of Federated/Centralized
Learning models using the testing samples.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
Centralized | 78.43 | 82.10 | 83.75 | 82.30
Client1 | 69.65 | 77.99 | 74.81 | 75.06
Client2 | 70.11 | 78.26 | 74.63 | 76.29
Client3 | 67.86 | 77.93 | 75.17 | 74.51
Client4 | 69.20 | 77.15 | 76.92 | 75.74
Client5 | 70.02 | 77.35 | 76.70 | 75.39
Global | 67.47 | 77.78 | 73.93 | 75.25
The centralized model achieves the best value on every metric.
VI. KEY CHALLENGES IN FEDERATED LEARNING FOR HEALTHCARE
This section outlines the main challenges that FL faces in
the healthcare field. Figure 6 provides a visual representation
of some critical obstacles.
A. Potential malicious attacks
Although FL is effective in preserving data privacy, it faces
security risks when model updates are exchanged between local
and global servers via the Internet. As shown in [182], backdoor
attacks are a severe security vulnerability: the backdoored
global model misclassifies all backdoored inputs as the
attacker's target label while functioning correctly on regular
inputs. FL is also highly vulnerable to Byzantine attacks, in
which malicious users manipulate the learning process by
submitting fraudulent data, degrading the global model's
performance [183]. Moreover, [184] noted that local environment
poisoning attacks can impair the model's performance by
contaminating the local training environment. Furthermore,
[185] pointed out that the integrity of the learning model in FL
is susceptible to adversarial attacks.
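To see why a single Byzantine client is so damaging, the sketch below compares plain averaging with coordinate-wise median aggregation. The median rule is a standard robust-aggregation baseline that we introduce for illustration; it is not discussed in [183]. The update values are toy numbers.

```python
import numpy as np

def mean_aggregate(updates):
    """Plain (FedAvg-style, equal-weight) averaging of client updates."""
    return np.mean(np.stack(updates), axis=0)

def median_aggregate(updates):
    """Coordinate-wise median: each parameter takes the median across clients."""
    return np.median(np.stack(updates), axis=0)

honest = [np.array([1.0, 1.0])] * 4
malicious = np.array([100.0, -100.0])   # one Byzantine client sends an arbitrary update
updates = honest + [malicious]
```

Here the mean is dragged to `[20.8, -19.2]` by one of five clients, while the median stays at the honest value `[1.0, 1.0]`.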
B. Fairness of federated learning
FL may produce biased results because the amount of data
varies across participating clients. Therefore, the global model
must ensure fairness and not discriminate against any group
during training. Failure to achieve fairness can cause some
participants to drop out, leading to a decrease in model
performance. In [186], the Gini coefficient was used to
minimize the performance gap between clients and improve
the fairness of FL. An entropy-based aggregation method for
enhancing fairness during training has also been investigated
[187].
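The fairness measure used in [186] can be computed as follows. This sketch only measures the Gini coefficient of per-client performance, where 0 means all clients perform equally and larger values mean a larger gap. How [186] incorporates the coefficient into training is not reproduced here, and the function name is ours.

```python
def gini_coefficient(values):
    """Gini coefficient: mean absolute pairwise difference divided by twice the mean."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * n * mean)

# e.g., per-client test accuracies such as those in Table V
client_accuracies = [69.65, 70.11, 67.86, 69.20, 70.02]
fairness_gap = gini_coefficient(client_accuracies)
```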
C. Heterogeneity of data
The use of diverse scanning technologies across medical
institutions worldwide creates data heterogeneity for the same
symptom, and such non-IID data cause significant accuracy loss
in the distributed training of deep neural networks [188]. In
[189], an FL model uses a cross-correlation matrix to learn a
generalizable representation that addresses heterogeneous data.
Similarly, [28] introduced client weight perturbation and
magnitude normalization to resolve this problem.
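Non-IID label skew of the kind described above is often simulated in FL experiments by drawing each class's client proportions from a Dirichlet distribution, where a smaller `alpha` yields more skewed clients. This partitioning scheme is a common simulation convention that we add for illustration; it is not taken from [188], [189], or [28], and the function name is ours.

```python
import numpy as np

def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class shares drawn from Dir(alpha)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(part.tolist())
    return client_indices

labels = [0] * 100 + [1] * 100 + [2] * 100
skewed_clients = dirichlet_label_partition(labels, num_clients=5, alpha=0.5)
```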
Fig. 5: Performance metrics of the Federated/Centralized
Learning models on the testing samples across epochs. C1
through C5 denote Client 1 to Client 5, respectively.
D. Lack of unified benchmark datasets
While current FL models exhibit impressive performance,
they often employ inconsistent datasets, which hinders ob-
jective assessment of performance metrics. Even when using
the same database, some algorithms require a subset of the
data for their training, often involving human intervention
and subjective evaluation [92]. Therefore, the creation of a
standardized benchmark test set is crucial to ensure objective
evaluation of FL models.
Fig. 6: Examples of the main challenges facing FL in health-
care. (Left) Potential malicious attacks: a hacker infiltrates the
global model and adjusts its parameters so that the local
models learn erroneous parameters, ultimately reducing the
model's performance. (Middle) Fairness in FL: to ensure
fairness, the parameters are evaluated after each communication
cycle before being downloaded and uploaded. (Right) Data
heterogeneity in FL: because acquisition equipment varies from
hospital to hospital, data distributions differ, making the
learning process more challenging.
E. Data ownership management and allocation
The limited availability of certain medical data necessitates
making decisions regarding the selection of data for model
training and the distribution of data among participants [190].
Moreover, there are costs involved in managing a significant
amount of medical data, and it is imperative to protect the
integrity of the data [191].
F. Medical data quality
Poor data quality, small dataset size, and incomplete labeling
can hinder model training, especially in medical domains
where acquiring labeled data is challenging [192]. Moreover,
participants supplying false or harmful data to degrade the
model's performance is also a critical problem.
G. Federated learning under multi-client scenario
Our analysis revealed that most FL methods are evaluated with
a small number of clients [30], [31]. Such a small number of
clients may not reflect practical deployments. When a hospital
has limited data, particularly fewer than 10 samples, it may be
challenging to train a robust model. Robust models often need
access to thousands of data samples, which requires many
clients to participate, and it is worth considering whether
current techniques would still yield impressive results in such
scenarios. Furthermore, recent developments in IoT enable
numerous devices to collaborate in training a global model,
a scenario typically called cross-device FL [193]. This
scenario presents a challenge in the healthcare field.
VII. FUTURE TRENDS
Fig. 7: Expected advancements in FL within the healthcare
domain. The left panel displays blockchain-driven FL, which
relies on blockchain technology to enable cryptographic
operations that enhance data privacy. The middle panel depicts
the role of FL in the healthcare metaverse, demonstrating how
this approach can maintain data privacy and facilitate the
development of a large-scale healthcare metaverse without
leakage issues. The right panel illustrates FL driven by
next-generation networks, which capitalizes on their
high-bandwidth service to alleviate communication pressure
between local and global servers.
This section outlines the possible future directions of FL in
the healthcare sector. Some of these trends are illustrated in
Figure 7.
A. Federated learning with next generation networks
As previously mentioned, communication exchange be-
tween local and global servers is a key challenge in FL.
The emergence of 6G networks, which are projected to offer
wireless connection speeds up to 1,000 times faster than
5G, presents a potential solution to this challenge [194]. By
leveraging the increased network bandwidth of 6G technology,
FL implementations could achieve more efficient and effective
communication between local and global servers, improving
overall FL performance in the healthcare sector.
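A back-of-the-envelope estimate shows why bandwidth matters. The sketch assumes synchronous rounds in which every client downloads and uploads the full model once, 32-bit parameters, and no compression. These are simplifying assumptions of ours, not properties of any cited system, and the function names are hypothetical.

```python
def communication_per_round(num_params, num_clients, bytes_per_param=4):
    """Bytes exchanged in one synchronous round: each client downloads and uploads the model."""
    model_bytes = num_params * bytes_per_param
    return 2 * num_clients * model_bytes

def transfer_seconds(num_bytes, bandwidth_bits_per_s):
    """Idealized transfer time at a given link rate (ignores latency and protocol overhead)."""
    return num_bytes * 8 / bandwidth_bits_per_s
```

For example, ResNet34 (roughly 21.8 million parameters in its standard ImageNet configuration) with the 5 clients of the case study gives about 21.8e6 × 4 B × 2 × 5 ≈ 0.87 GB per round, or roughly 87 GB over 100 rounds. Any increase in link rate shortens the transfer time proportionally.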
B. Federated learning-driven healthcare metaverse
The metaverse, a virtual world concept seen as the successor
to the mobile internet [195], has gained significant attention
from both academia and industry. Researchers
in the healthcare sector are particularly interested in the meta-
verse due to its potential to enable remote assistance, medical
simulations, and virtual comparisons of scans [195]. However,
ensuring data privacy remains a challenge in the use of the
metaverse for clinical purposes. Adherence to appropriate pri-
vacy regulations, such as the United States’ Health Insurance
Portability and Accountability Act, is critical to guarantee
data security [195]. Additionally, the medical data obtained
from the metaverse may still suffer from heterogeneity and
data dispersion issues. By leveraging FL, there is potential
for unlocking new opportunities in the realm of intelligent
healthcare metaverse.
C. Blockchain-driven federated learning in healthcare
Blockchains can be described as a publicly accessible
ledger that documents all executed transactions, with the chain
expanding as additional blocks are added. Decentralization,
persistence, anonymity, and auditability are among the key
properties of blockchain technology. These advantages make
it an effective solution for addressing data privacy concerns.
The integration of blockchain technology into FL can enhance
data privacy protection, and this combination has already been
implemented in some instances [196].
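The hash-chained ledger described above can be illustrated with a toy, single-node sketch. Each block records a payload (here, a digest of a client's model update) together with the hash of the previous block, so tampering with any earlier block breaks verification. The sketch deliberately omits consensus, signatures, and networking, so it shows only the persistence and auditability properties, not a full blockchain or the system in [196]. The payload fields and class name are hypothetical.

```python
import hashlib
import json

def _digest(body):
    """SHA-256 over a canonical JSON encoding of the block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(index, payload, prev_hash):
    block = {"index": index, "payload": payload, "prev_hash": prev_hash}
    block["hash"] = _digest(block)
    return block

class UpdateLedger:
    """Append-only, hash-chained record of FL model updates (toy, single node)."""

    def __init__(self):
        self.chain = [make_block(0, "genesis", "0" * 64)]

    def append(self, payload):
        prev = self.chain[-1]
        self.chain.append(make_block(prev["index"] + 1, payload, prev["hash"]))

    def verify(self):
        for prev, block in zip(self.chain, self.chain[1:]):
            body = {k: block[k] for k in ("index", "payload", "prev_hash")}
            if block["prev_hash"] != prev["hash"] or block["hash"] != _digest(body):
                return False
        return True

ledger = UpdateLedger()
update_digest = hashlib.sha256(b"serialized client weights").hexdigest()
ledger.append({"round": 1, "client": "hospital_a", "update_sha256": update_digest})
ledger.append({"round": 1, "client": "hospital_b", "update_sha256": update_digest})
```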
D. Contrastive learning and federated learning: solving the unlabeled data problem
Recently, contrastive learning has become a popular research
topic for learning representations from unlabeled data [197].
It trains a model on unlabeled data to learn the similarities
and differences between samples, allowing the model to adapt
easily to data that lack labels [197]. Contrastive learning has
the potential to enhance the performance of FL models,
especially when data sources are widely dispersed and some
data lack labels. For instance, contrastive learning was used to
reduce the difference between representations of identical
images and increase the discrepancy between representations
of different images, resulting in improved model performance
[198].
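The "pull identical images together, push different images apart" objective described above is commonly written as an InfoNCE-style loss. Below is a minimal NumPy sketch, assuming row i of `z1` and row i of `z2` are embeddings of two views of the same image and all other rows of `z2` serve as negatives. This is a generic one-directional form, not the specific loss of [198]; in an FL setting each client would compute it on its own unlabeled data.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE: row i of z1 should be most similar to row i of z2 among all rows of z2."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # cosine similarity via unit vectors
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                    # positives lie on the diagonal

z = np.eye(4)
aligned = info_nce_loss(z, z)           # matching views: low loss
mismatched = info_nce_loss(z, z[::-1])  # positives paired with the wrong images: high loss
```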
E. Lifelong federated learning
While FL models have shown great success, they tend to
be task-specific, meaning that a model that works well for one
task may not work as well for another. Lifelong learning
aims to overcome this limitation by using a single model that
can continuously improve and adapt to multiple tasks. For
instance, a federated lifelong learning approach was proposed
for landmark localization in medical imaging, which achieved
high performance with a mean distance error of 7.81 on
the BraTS dataset [199]. This approach shows promise for
improving the versatility and efficiency of FL models in
medical applications.
F. Generative pre-trained language model-based smart healthcare under federated supervision
Researchers have recently shown great interest in using
ChatGPT in the healthcare sector to generate patient reports
and complementary diagnoses [200]. However, to train Chat-
GPT effectively, a large amount of data is required, which
can lead to potential privacy risks. Therefore, exploring ways
to train a general ChatGPT model using FL methods for
healthcare applications that are both reliable and feasible could
be a valuable avenue for further research.
G. Explainable federated learning in healthcare
The increasing attention given to the explainability of deep
models like CNNs is driven by concerns that if these models
correctly predict a disease but do not focus on abnormal
regions, it may result in a lack of trust in AI among doctors and
patients. FL also requires a global model that is explainable
and capable of providing reliable predictions, especially in the
healthcare sector [16].
VIII. CONCLUSION
In this paper, we discuss the limitations of TL in protecting
data privacy in the healthcare sector. FL is presented as a
potential solution to address privacy concerns by developing
a global model through local training and model aggregation
on decentralized datasets without sharing raw data. However,
FL in healthcare faces its own set of challenges such as
poor data quality, data heterogeneity, and data allocation and
management. We also compare FL with TL and highlight the
advantages of the former approach. The critical steps of FL
are explained in detail, and FL is categorized based on sample
and feature space. The applications of FL in healthcare are
summarized and categorized, along with typical evaluation
metrics and commonly used medical datasets. The reported
case study also sheds light on the importance of FL in
healthcare. It is expected that FL techniques will continue to
be widely used in both academia and hospitals in the near
future. With the aid of advances in science and technology,
we anticipate that FL can be further enhanced to provide more
effective support to patients in the healthcare sector.
ACKNOWLEDGMENTS
This research was funded by the National Natural Science
Foundation of China grant number 82260360, the Guilin Inno-
vation Platform and Talent Program 20222C264164, the Na-
tional Innovation Training Program for College Students under
Grant 202310595083, and the Guangxi Science and Technol-
ogy Base and Talent Project (2022AC18004, 2022AC21040).
REFERENCES
[1] Ahmad Chaddad and Camel Tanougast. CNN approach for predicting
survival outcome of patients with COVID-19. IEEE Internet of Things
Journal, 2023.
[2] Jonathan T Megerian, Sangeeta Dey, Raun D Melmed, Daniel L
Coury, Marc Lerner, Christopher J Nicholls, Kristin Sohl, Rambod
Rouhbakhsh, Anandhi Narasimhan, Jonathan Romain, et al. Evaluation
of an artificial intelligence-based medical device for diagnosis of autism
spectrum disorder. NPJ digital medicine, 5(1):57, 2022.
[3] Lucas Mearian. Did IBM overhype Watson Health's AI promise?
Computerworld, 2019.
[4] Jane Andrew and Max Baker. The general data protection regulation
in the age of surveillance capitalism. Journal of Business Ethics, 168,
01 2021.
[5] Ashish Rauniyar, Desta Haileselassie Hagos, Debesh Jha, Jan Erik
Håkegård, Ulas Bagci, Danda B Rawat, and Vladimir Vlassov.
Federated learning for medical applications: A taxonomy, current
trends, challenges, and future research directions. arXiv preprint
arXiv:2208.03392, 2022.
[6] Chen Zhang, Yu Xie, Hang Bai, Bin Yu, Weihong Li, and Yuan Gao. A
survey on federated learning. Knowledge-Based Systems, 216:106775,
2021.
[7] Jakub Konečný, H Brendan McMahan, Daniel Ramage, and Peter
Richtárik. Federated optimization: Distributed machine learning for
on-device intelligence. arXiv preprint arXiv:1610.02527, 2016.
[8] Marcos F Criado, Fernando E Casado, Roberto Iglesias, Carlos V
Regueiro, and Senén Barro. Non-IID data and continual learning
processes in federated learning: A long road ahead. Information Fusion,
88:263–280, 2022.
[9] Matthew G Crowson, Dana Moukheiber, Aldo Robles Arévalo, Barbara
D Lam, Sreekar Mantena, Aakanksha Rana, Deborah Goss,
David W Bates, and Leo Anthony Celi. A systematic review of
federated learning applications for biomedical data. PLOS Digital
Health, 1(5):e0000033, 2022.
[10] Bingyan Liu, Nuoyan Lv, Yuanchun Guo, and Yawen Li. Recent
advances on federated learning: A systematic survey. arXiv preprint
arXiv:2301.01299, 2023.
[11] Ahmad Chaddad, Lama Hassan, Yousef Katib, and Ahmed Bouridane.
Deep survival analysis with clinical variables for COVID-19. IEEE
Journal of Translational Engineering in Health and Medicine, 11:223–
231, 2023.
[12] Omid Nejati Manzari, Hamid Ahmadabadi, Hossein Kashiani,
Shahriar B Shokouhi, and Ahmad Ayatollahi. MedViT: A robust vision
transformer for generalized medical image classification. Computers
in Biology and Medicine, 157:106791, 2023.
[13] Junde Wu, Rao Fu, Huihui Fang, Yuanpei Liu, Zhaowei Wang, Yanwu
Xu, Yueming Jin, and Tal Arbel. Medical SAM adapter: Adapting
Segment Anything Model for medical image segmentation. arXiv
preprint arXiv:2304.12620, 2023.
[14] Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang, Xiaoguang Han, Lequan
Yu, Liansheng Wang, and Yizhou Yu. nnFormer: Volumetric medical
image segmentation via a 3D transformer. IEEE Transactions on Image
Processing, 2023.
[15] Ji-Jiang Yang, Jian-Qiang Li, and Yu Niu. A hybrid solution for privacy
preserving medical data sharing in the cloud environment. Future
Generation Computer Systems, 43–44:74–86, 2015.
[16] Ahmad Chaddad, Qizong Lu, Jiali Li, Yousef Katib, Reem Kateb,
Camel Tanougast, Ahmed Bouridane, and Ahmed Abdulkadir. Explain-
able, domain-adaptive, and federated artificial intelligence in medicine.
IEEE/CAA Journal of Automatica Sinica, 10(4):859–876, 2023.
[17] Dinh C. Nguyen, Pubudu N. Pathirana, Ming Ding, and Aruna Senevi-
ratne. BEdgeHealth: A decentralized architecture for edge-based
IoMT networks using blockchain. IEEE Internet of Things Journal,
8(14):11743–11757, 2021.
[18] Li Zhang, Jianbo Xu, Pandi Vijayakumar, Pradip Kumar Sharma, and
Uttam Ghosh. Homomorphic encryption-based privacy-preserving fed-
erated learning in IoT-enabled healthcare system. IEEE Transactions
on Network Science and Engineering, pages 1–17, 2022.
[19] Latif U. Khan, Shashi Raj Pandey, Nguyen H. Tran, Walid Saad, Zhu
Han, Minh N. H. Nguyen, and Choong Seon Hong. Federated learning
for edge networks: Resource optimization and incentive mechanism.
IEEE Communications Magazine, 58(10):88–93, 2020.
[20] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson,
and Blaise Aguera y Arcas. Communication-efficient learning of
deep networks from decentralized data. In Artificial Intelligence and
Statistics, pages 1273–1282. PMLR, 2017.
[21] Shuo Wan, Jiaxun Lu, Pingyi Fan, Yunfeng Shao, Chenghui Peng,
Khaled B. Letaief, and Jie Chuai. How global observation works in
federated learning: Integrating vertical training into horizontal federated
learning. IEEE Internet of Things Journal, pages 1–1, 2023.
[22] Jie Xu, Benjamin S Glicksberg, Chang Su, Peter Walker, Jiang Bian,
and Fei Wang. Federated learning for healthcare informatics. Journal
of Healthcare Informatics Research, 5:1–19, 2021.
[23] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien
Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary
Charles, Graham Cormode, Rachel Cummings, et al. Advances and
open problems in federated learning. Foundations and Trends® in
Machine Learning, 14(1–2):1–210, 2021.
[24] Yang Liu, Tao Fan, Tianjian Chen, Qian Xu, and Qiang Yang. FATE:
An industrial grade platform for collaborative learning with data
protection. The Journal of Machine Learning Research, 22(1):10320–
10325, 2021.
[25] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu,
Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on
transfer learning. Proceedings of the IEEE, 109(1):43–76, 2021.
[26] Yazan Otoum, Yue Wan, and Amiya Nayak. Federated transfer
learning-based IDS for the internet of medical things (IoMT). In 2021
IEEE Globecom Workshops (GC Wkshps), pages 1–6. IEEE, 2021.
[27] Mohammed Adnan, Shivam Kalra, Jesse C Cresswell, Graham W
Taylor, and Hamid R Tizhoosh. Federated learning and differential
privacy for medical image analysis. Scientific Reports, 12(1):1953,
2022.
[28] Meirui Jiang, Zirui Wang, and Qi Dou. HarmoFL: Harmonizing
local and global drifts in federated learning on heterogeneous medical
images. Proceedings of the AAAI Conference on Artificial Intelligence,
36(1):1087–1095, June 2022.
[29] Amelia Jiménez-Sánchez, Mickael Tardy, Miguel A. González
Ballester, Diana Mateus, and Gemma Piella. Memory-aware curriculum
federated learning for breast cancer classification. Computer Methods
and Programs in Biomedicine, 229:107318, 2023.
[30] Jun Luo and Shandong Wu. FedSLD: Federated learning with shared
label distribution for medical image classification. In 2022 IEEE 19th
International Symposium on Biomedical Imaging (ISBI), pages 1–5.
IEEE, 2022.
[31] Rui Yan, Liangqiong Qu, Qingyue Wei, Shih-Cheng Huang, Liyue
Shen, Daniel Rubin, Lei Xing, and Yuyin Zhou. Label-efficient self-
supervised federated learning for tackling data heterogeneity in medical
imaging. arXiv preprint arXiv:2205.08576, 2022.
[32] Yiqing Shen, Yuyin Zhou, and Lequan Yu. CD2-pFed: Cyclic
distillation-guided channel decoupling for model personalization in
federated learning. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR), pages 10041–
10050, June 2022.
[33] S. Maryam Hosseini, Milad Sikaroudi, Morteza Babaie, and H.R.
Tizhoosh. Proportionally fair hospital collaborations in federated
learning of histopathology images. IEEE Transactions on Medical
Imaging, pages 1–1, 2023.
[34] Jeffry Wicaksana, Zengqiang Yan, Xin Yang, Yang Liu, Lixin Fan, and
Kwang-Ting Cheng. Customized federated learning for multi-source
decentralized medical image classification. IEEE Journal of Biomedical
and Health Informatics, 26(11):5596–5607, 2022.
[35] Meirui Jiang, Hongzheng Yang, Xiaoxiao Li, Quande Liu, Pheng-
Ann Heng, and Qi Dou. Dynamic bank learning for semi-supervised
federated image diagnosis with class imbalance. In Medical Image
Computing and Computer Assisted Intervention–MICCAI 2022: 25th
International Conference, Singapore, September 18–22, 2022, Proceed-
ings, Part III, pages 196–206. Springer, 2022.
[36] Wang Lu, Jindong Wang, Yiqiang Chen, Xin Qin, Renjun Xu, Dimitrios
Dimitriadis, and Tao Qin. Personalized federated learning with adaptive
batchnorm for healthcare. IEEE Transactions on Big Data, pages 1–1,
2022.
[37] Xing Wu, Jie Pei, Cheng Chen, Yimin Zhu, Jianjia Wang, Quan Qian,
Jian Zhang, Qun Sun, and Yike Guo. Federated active learning for
multicenter collaborative disease diagnosis. IEEE Transactions on
Medical Imaging, pages 1–1, 2022.
[38] Dinh C. Nguyen, Ming Ding, Pubudu N. Pathirana, Aruna Seneviratne,
and Albert Y. Zomaya. Federated learning for COVID-19 detection
with generative adversarial networks in edge cloud computing. IEEE
Internet of Things Journal, 9(12):10257–10271, 2022.
[39] Judith Sáinz-Pardo Díaz and Álvaro López García. Study of the
performance and scalability of federated learning for medical imaging
with intermittent clients. Neurocomputing, 518:142–154, 2023.
[40] Saleh Baghersalimi, Tomas Teijeiro, David Atienza, and Amir Amini-
far. Personalized real-time federated learning for epileptic seizure
detection. IEEE Journal of Biomedical and Health Informatics,
26(2):898–909, 2022.
[41] Marc Jayson Baucas, Petros Spachos, and Konstantinos N Plataniotis.
Federated learning and blockchain-enabled Fog-IoT platform for wear-
ables in predictive healthcare. IEEE Transactions on Computational
Social Systems, 2023.
[42] Jiayun Zhang, Xiyuan Zhang, Xinyang Zhang, Dezhi Hong, Rajesh K
Gupta, and Jingbo Shang. Federated learning with client-exclusive
classes. arXiv preprint arXiv:2301.00489, 2023.
[43] Kaixuan Zhang, Xiulong Liu, Xin Xie, Jiuwu Zhang, Bingxin Niu, and
Keqiu Li. A cross-domain federated learning framework for wireless
human sensing. IEEE Network, 36(5):122–128, 2022.
[44] Shuai Liu, Xiao Guo, Shun Qi, Huaning Wang, and Xiangyu Chang.
Learning personalized brain functional connectivity of MDD patients
from multiple sites via federated Bayesian networks. arXiv preprint
arXiv:2301.02423, 2023.
[45] Liang Peng, Nan Wang, Nicha Dvornek, Xiaofeng Zhu, and Xiaoxiao
Li. FedNI: Federated graph learning with network inpainting for
population-based disease prediction. IEEE Transactions on Medical
Imaging, pages 1–1, 2022.
[46] Hasan Kassem, Deepak Alapatt, Pietro Mascagni, Consortium
AI4SafeChole, Alexandros Karargyris, and Nicolas Padoy. Federated
cycling (FedCy): Semi-supervised federated learning of surgical phases.
IEEE Transactions on Medical Imaging, 2022.
[47] Pranav Kulkarni, Adway Kanhere, Paul H Yi, and Vishwa S
Parekh. Surgical aggregation: A federated learning framework for
harmonizing distributed datasets with diverse tasks. arXiv preprint
arXiv:2301.06683, 2023.
[48] Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz.
The Cancer Genome Atlas (TCGA): an immeasurable source
of knowledge. Contemporary Oncology/Współczesna Onkologia,
2015(1):68–77, 2015.
[49] Peter Bandi, Oscar G. F. Geessink, Quirine F. Manson, Marcory CRF
van Dijk, Maschenka C. A. Balkenhol, Meyke Hermsen, Babak
Ehteshami Bejnordi, et al. From detection of individual metastases to
classification of lymph node status at the patient level: The
CAMELYON17 challenge. IEEE Transactions on Medical Imaging, 38:550–560, 2019.
[50] Heejo Kong, Gun-Hee Lee, Suneung Kim, and Seong-Whan Lee.
Pruning-guided curriculum learning for semi-supervised semantic seg-
mentation. In Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV), pages 5914–5923, January
2023.
[51] Yi Chang, Meiya Chen, Changfeng Yu, Yi Li, Liqun Chen, and Luxin
Yan. Direction and residual awareness curriculum learning network
for rain streaks removal. IEEE Transactions on Neural Networks and
Learning Systems, pages 1–15, 2023.
[52] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason
Weston. Curriculum learning. In International Conference on Machine
Learning, 2009.
[53] Jiancheng Yang, Rui Shi, and Bingbing Ni. MedMNIST classifica-
tion decathlon: A lightweight AutoML benchmark for medical image
analysis. In 2021 IEEE 18th International Symposium on Biomedical
Imaging (ISBI), pages 191–195. IEEE, 2021.
[54] Noel CF Codella, David Gutman, M Emre Celebi, Brian Helba,
Michael A Marchetti, Stephen W Dusza, Aadi Kalloo, Konstantinos
Liopyris, Nabin Mishra, Harald Kittler, et al. Skin lesion analysis
toward melanoma detection: A challenge at the 2017 international
symposium on biomedical imaging (ISBI), hosted by the international
skin imaging collaboration (ISIC). In 2018 IEEE 15th International
Symposium on Biomedical Imaging (ISBI 2018), pages 168–172. IEEE,
2018.
[55] Marc Combalia, Noel CF Codella, Veronica Rotemberg, Brian Helba,
Veronica Vilaplana, Ofer Reiter, Cristina Carrera, Alicia Barreiro,
Allan C Halpern, Susana Puig, et al. BCN20000: Dermoscopic lesions
in the wild. arXiv preprint arXiv:1908.02288, 2019.
[56] Veronica Rotemberg, Nicholas Kurtansky, Brigid Betz-Stablein, Liam
Caffery, Emmanouil Chousakos, Noel Codella, Marc Combalia,
Stephen Dusza, Pascale Guitera, David Gutman, et al. A patient-
centric dataset of images and metadata for identifying melanomas using
clinical context. Scientific Data, 8(1):34, 2021.
[57] Xiaosong Ma, Jie Zhang, Song Guo, and Wenchao Xu. Layer-
wised model aggregation for personalized federated learning. In
Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), pages 10092–10101, June 2022.
[58] Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopou-
los, and Yasaman Khazaeni. Federated learning with matched averag-
ing. arXiv preprint arXiv:2002.06440, 2020.
[59] Sannara Ek, François Portet, Philippe Lalanda, and German
Vega. A federated learning aggregation algorithm for pervasive
computing: Evaluation and comparison. In 2021 IEEE International
Conference on Pervasive Computing and Communications (PerCom),
pages 1–10, 2021.
[60] Sannara Ek, François Portet, Philippe Lalanda, and German Vega.
Evaluation and comparison of federated learning algorithms for human
activity recognition on smartphones. Pervasive and Mobile Computing,
87:101714, 2022.
[61] Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The HAM10000
dataset, a large collection of multi-source dermatoscopic images of
common pigmented skin lesions. Scientific Data, 5(1):1–9, 2018.
[62] Juexiao Zhou, Longxi Zhou, Di Wang, Xiaopeng Xu, Haoyang
Li, Yuetan Chu, Wenkai Han, and Xin Gao. Personalized and
privacy-preserving federated heterogeneous medical image analysis
with PPPML-HMI. medRxiv, pages 2023–02, 2023.
[63] Unais Sait, KG Lal, S Prajapati, Rahul Bhaumik, Tarun Kumar,
S Sanjana, and Kriti Bhalla. Curated dataset for COVID-19 posterior-
anterior chest radiography images (X-Rays). Mendeley Data, 1, 2020.
[64] Kang Zhang, Xiaohong Liu, Jun Shen, Zhihuan Li, Ye Sang, Xingwang
Wu, Yunfei Zha, Wenhua Liang, Chengdi Wang, Ke Wang, et al.
Clinically applicable AI system for accurate diagnosis, quantitative
measurements, and prognosis of COVID-19 pneumonia using computed
tomography. Cell, 181(6):1423–1433, 2020.
[65] Tulin Ozturk, Muhammed Talo, Eylul Azra Yildirim, Ulas Baran
Baloglu, Ozal Yildirim, and U. Rajendra Acharya. Automated detection
of COVID-19 cases using deep neural networks with X-ray images.
Computers in Biology and Medicine, 121:103792, 2020.
[66] Parnian Afshar, Shahin Heidarian, Farnoosh Naderkhani, Anastasia
Oikonomou, Konstantinos N. Plataniotis, and Arash Mohammadi.
COVID-CAPS: A capsule network-based framework for identification
of COVID-19 cases from X-ray images. Pattern Recognition Letters,
138:638–643, 2020.
[67] Rachel Lea Draelos, David Dov, Maciej A Mazurowski, Joseph Y Lo,
Ricardo Henao, Geoffrey D Rubin, and Lawrence Carin. Machine-
learning-based multiple abnormality prediction with large-scale chest
computed tomography volumes. Medical Image Analysis, 67:101857,
2021.
[68] Daniel S Kermany, Michael Goldbaum, Wenjia Cai, Carolina CS
Valentim, Huiying Liang, Sally L Baxter, Alex McKeown, Ge Yang,
Xiaokang Wu, Fangbing Yan, et al. Identifying medical diagnoses and
treatable diseases by image-based deep learning. Cell, 172(5):1122–1131,
2018.
[69] World Health Organization. Epilepsy: a public health imperative.
World Health Organization, 2019.
[70] Lijuan Duan, Zeyu Wang, Yuanhua Qiao, Yue Wang, Zhaoyang Huang,
and Baochang Zhang. An automatic method for epileptic seizure
detection based on deep metric learning. IEEE Journal of Biomedical
and Health Informatics, 26(5):2147–2157, 2022.
[71] Matthias Ihle, Hinnerk Feldwisch-Drentrup, César A. Teixeira, Adrien
Witon, Björn Schelter, Jens Timmer, and Andreas Schulze-Bonhage.
EPILEPSIAE – a European epilepsy database. Computer Methods and
Programs in Biomedicine, 106(3):127–138, 2012.
[72] Yang Li, Guanci Yang, Zhidong Su, Shaobo Li, and Yang Wang.
Human activity recognition based on multienvironment sensor data.
Information Fusion, 91:47–63, 2023.
[73] Attila Reiss and Didier Stricker. Introducing a new benchmarked
dataset for activity monitoring. In 2012 16th International Symposium
on Wearable Computers, pages 108–109, 2012.
[74] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra,
Jorge Luis Reyes-Ortiz, et al. A public domain dataset for human
activity recognition using smartphones. In ESANN, volume 3, page 3,
2013.
[75] Saori C Tanaka, Ayumu Yamashita, Noriaki Yahata, Takashi Itahashi,
Giuseppe Lisi, Takashi Yamada, Naho Ichikawa, Masahiro Takamura,
Yujiro Yoshihara, Akira Kunimatsu, et al. A multi-site, multi-disorder
resting-state magnetic resonance image database. Scientific Data,
8(1):227, 2021.
[76] Marco Solmi, Minjin Song, Dong Keon Yon, Seung Won Lee, Eric
Fombonne, Min Seo Kim, Seoyeon Park, Min Ho Lee, Jimin Hwang,
Roberto Keller, et al. Incidence, prevalence, and global burden of
autism spectrum disorder from 1990 to 2019 across 204 countries.
Molecular Psychiatry, pages 1–9, 2022.
[77] Ziqi Tang, Kangway V Chuang, Charles DeCarli, Lee-Way Jin, Laurel
Beckett, Michael J Keiser, and Brittany N Dugger. Interpretable
classification of Alzheimer’s disease pathologies with a convolutional
neural network pipeline. Nature Communications, 10(1):2173, 2019.
[78] Jingsheng Deng, Md Rakibul Hasan, Minhaz Mahmud, Md Mahbub
Hasan, Khandaker Asif Ahmed, and Md Zakir Hossain. Diagnosing
autism spectrum disorder using ensemble 3D-CNN: A preliminary
study. In 2022 IEEE International Conference on Image Processing
(ICIP), pages 3480–3484. IEEE, 2022.
[79] Hongming Li and Yong Fan. Brain decoding from functional MRI
using long short-term memory recurrent neural networks. In Medical
Image Computing and Computer Assisted Intervention–MICCAI 2018:
21st International Conference, Granada, Spain, September 16-20,
2018, Proceedings, Part III, pages 320–328. Springer, 2018.
[80] V Pream Sudha and MS Vijaya. Recurrent neural network based
model for autism spectrum disorder prediction using codon encoding.
Journal of The Institution of Engineers (India): Series B, pages 1–7,
2021.
[81] Hao Zhang, Ran Song, Liping Wang, Lin Zhang, Dawei Wang, Cong
Wang, and Wei Zhang. Classification of brain disorders in rs-fMRI via
local-to-global graph neural networks. IEEE Transactions on Medical
Imaging, pages 1–1, 2022.
[82] Adriana Di Martino, Chao-Gan Yan, Qingyang Li, Erin Denio, Fran-
cisco X Castellanos, Kaat Alaerts, Jeffrey S Anderson, Michal Assaf,
Susan Y Bookheimer, Mirella Dapretto, et al. The autism brain imaging
data exchange: towards a large-scale evaluation of the intrinsic brain
architecture in autism. Molecular Psychiatry, 19(6):659–667, 2014.
[83] Alzheimer’s Disease Neuroimaging Initiative (ADNI). https://adni.loni.usc.edu.
Accessed: March 18, 2023.
[84] Xinpeng Ding and Xiaomeng Li. Exploring segment-level semantics
for online phase recognition from surgical videos. IEEE Transactions
on Medical Imaging, 41(11):3309–3319, 2022.
[85] Andru P. Twinanda, Sherif Shehata, Didier Mutter, Jacques Marescaux,
Michel de Mathelin, and Nicolas Padoy. EndoNet: A deep architecture
for recognition tasks on laparoscopic videos. IEEE Transactions on
Medical Imaging, 36(1):86–97, 2017.
[86] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi
Bagheri, and Ronald M Summers. ChestX-ray8: Hospital-scale chest
X-ray database and benchmarks on weakly-supervised classification
and localization of common thorax diseases. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pages
2097–2106, 2017.
[87] Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-
Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball,
Katie Shpanskaya, et al. CheXpert: A large chest radiograph dataset
with uncertainty labels and expert comparison. In Proceedings of the
AAAI Conference on Artificial Intelligence, volume 33, pages 590–597,
2019.
[88] Zelei Liu, Yuanyuan Chen, Yansong Zhao, Han Yu, Yang Liu,
Renyi Bao, Jinpeng Jiang, Zaiqing Nie, Qian Xu, and Qiang Yang.
Contribution-aware federated learning for smart healthcare. Proceed-
ings of the AAAI Conference on Artificial Intelligence, 36(11):12396–
12404, June 2022.
[89] Quande Liu, Qi Dou, Lequan Yu, and Pheng Ann Heng. MS-Net: multi-
site network for improving prostate segmentation with heterogeneous
MRI data. IEEE Transactions on Medical Imaging, 39(9):2713–2724,
2020.
[90] Meilu Zhu, Zhen Chen, and Yixuan Yuan. FedDM: Federated weakly
supervised segmentation via annotation calibration and gradient de-
conflicting. IEEE Transactions on Medical Imaging, pages 1–1, 2023.
[91] Geert Litjens, Robert Toth, Wendy Van De Ven, Caroline Hoeks, Sjoerd
Kerkstra, Bram van Ginneken, Graham Vincent, Gwenael Guillard,
Neil Birbeck, Jindang Zhang, et al. Evaluation of prostate segmentation
algorithms for MRI: the PROMISE12 challenge. Medical Image
Analysis, 18(2):359–373, 2014.
[92] An Xu, Wenqi Li, Pengfei Guo, Dong Yang, Holger R. Roth, Ali
Hatamizadeh, Can Zhao, Daguang Xu, Heng Huang, and Ziyue Xu.
Closing the generalization gap of cross-silo federated medical image
segmentation. In Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages 20866–20875,
June 2022.
[93] Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani,
Annette Kopp-Schneider, Bennett A Landman, Geert Litjens, Bjoern
Menze, Olaf Ronneberger, Ronald M Summers, et al. The medical
segmentation decathlon. Nature Communications, 13(1):4128, 2022.
[94] Guillaume Lemaître, Robert Martí, Jordi Freixenet, Joan C. Vilanova,
Paul M. Walker, and Fabrice Meriaudeau. Computer-aided detection
and diagnosis for prostate cancer based on mono and multi-parametric
MRI: A review. Computers in Biology and Medicine, 60:8–31, 2015.
[95] NCI ISBI dataset. https://www.cancerimagingarchive.net/. Accessed:
March 18, 2023.
[96] PROSTATEx dataset. https://prostatex.grand-challenge.org/, 2022.
Accessed: March 18, 2023.
[97] Zhen Chen, Chen Yang, Meilu Zhu, Zhe Peng, and Yixuan Yuan.
Personalized retrogress-resilient federated learning toward imbalanced
medical data. IEEE Transactions on Medical Imaging, 41(12):3663–
3674, 2022.
[98] Nicolas Wagner, Moritz Fuchs, Yuri Tolkach, and Anirban Mukhopad-
hyay. Federated stain normalization for computational pathology.
In Medical Image Computing and Computer Assisted Intervention–
MICCAI 2022: 25th International Conference, Singapore, September
18–22, 2022, Proceedings, Part II, pages 14–23. Springer, 2022.
[99] Wouter Bulten, Péter Bándi, Jeffrey Hoven, Rob van de Loo, Johannes
Lotz, Nick Weiss, Jeroen van der Laak, Bram van Ginneken, Christina
Hulsbergen-van de Kaa, and Geert Litjens. Epithelium segmentation
using deep learning in H&E-stained prostate specimens with immuno-
histochemistry as reference standard. Scientific Reports, 9(1):864, 2019.
[100] Jeffry Wicaksana, Zengqiang Yan, Dong Zhang, Xijie Huang, Huimin
Wu, Xin Yang, and Kwang-Ting Cheng. FedMix: Mixed supervised
federated learning for medical image segmentation. IEEE Transactions
on Medical Imaging, 2022.
[101] Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly
Fahmy. Dataset of breast ultrasound images. Data in Brief, 28:104863,
2020.
[102] Yingtao Zhang, Min Xian, Heng-Da Cheng, Bryar Shareef, Jianrui
Ding, Fei Xu, Kuan Huang, Boyu Zhang, Chunping Ning, and Ying
Wang. BUSIS: a benchmark for breast ultrasound image segmentation.
In Healthcare, volume 10, page 729. MDPI, 2022.
[103] Moi Hoon Yap, Gerard Pons, Joan Martí, Sergi Ganau, Melcior Sentís,
Reyer Zwiggelaar, Adrian K. Davison, and Robert Martí. Automated
breast ultrasound lesions detection using convolutional neural networks.
IEEE Journal of Biomedical and Health Informatics, 22(4):1218–1226,
2018.
[104] Bernardo Camajori Tedeschini, Stefano Savazzi, Roman Stoklasa,
Luca Barbieri, Ioannis Stathopoulos, Monica Nicoli, and Luigi Serio.
Decentralized federated learning for healthcare networks: A case study
on tumor segmentation. IEEE Access, 10:8693–8708, 2022.
[105] Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello,
Martin Rozycki, Justin S Kirby, John B Freymann, Keyvan Farahani,
and Christos Davatzikos. Advancing the cancer genome atlas glioma
MRI collections with expert segmentation labels and radiomic features.
Scientific Data, 4(1):1–13, 2017.
[106] Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus
Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph
Berger, Sung Min Ha, Martin Rozycki, et al. Identifying the best
machine learning algorithms for brain tumor segmentation, progression
assessment, and overall survival prediction in the BRATS challenge.
arXiv preprint arXiv:1811.02629, 2018.
[107] Xuan Gong, Liangchen Song, Rishi Vedula, Abhishek Sharma, Meng
Zheng, Benjamin Planche, Arun Innanje, Terrence Chen, Junsong Yuan,
David Doermann, et al. Federated learning with privacy-preserving
ensemble attention distillation. IEEE Transactions on Medical Imaging,
2022.
[108] Miao Zhang, Liangqiong Qu, Praveer Singh, Jayashree Kalpathy-
Cramer, and Daniel L. Rubin. SplitAVG: A heterogeneity-aware
federated deep learning method for medical imaging. IEEE Journal
of Biomedical and Health Informatics, 26(9):4635–4644, 2022.
[109] MICCAI BRATS 2017. https://sites.google.com/site/braintumorsegmentation/,
2017. Accessed: March 18, 2023.
[110] Adway U Kanhere, Pranav Kulkarni, Paul H Yi, and Vishwa S
Parekh. SegViz: A federated learning framework for medical image
segmentation from distributed datasets with different and incomplete
annotations. arXiv preprint arXiv:2301.07074, 2023.
[111] Synapse, Sage Bionetworks. https://www.synapse.org. Accessed:
March 18, 2023.
[112] Sangjoon Park and Jong Chul Ye. Multi-task distributed learning using
vision transformer with random patch permutation. IEEE Transactions
on Medical Imaging, 42(7):2091–2105, 2023.
[113] Pneumothorax dataset. https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/overview,
2018. Accessed: March 18, 2023.
[114] Junxiu Liu, Xiuhao Liang, Rixing Yang, Yuling Luo, Hao Lu, Liangjia
Li, Shunsheng Zhang, and Su Yang. Federated learning-based vertebral
body segmentation. Engineering Applications of Artificial Intelligence,
116:105451, 2022.
[115] spinesagt2wdataset3 (Kaggle). https://www.kaggle.com/datasets/dutianze/mri-dataset,
2019. Accessed: March 18, 2023.
[116] Gokberk Elmas, Salman UH Dar, Yilmaz Korkmaz, Emir Ceyani,
Burak Susam, Muzaffer Ozbey, Salman Avestimehr, and Tolga Çukur.
Federated learning of generative image priors for MRI reconstruction.
IEEE Transactions on Medical Imaging, pages 1–1, 2022.
[117] IXI dataset. http://brain-development.org/ixi-dataset/. Accessed:
March 18, 2023.
[118] Chun-Mei Feng, Yunlu Yan, Shanshan Wang, Yong Xu, Ling Shao,
and Huazhu Fu. Specificity-preserving federated learning for MR image
reconstruction. IEEE Transactions on Medical Imaging, 42(7):2010–
2021, 2023.
[119] Florian Knoll, Jure Zbontar, Anuroop Sriram, Matthew J Muckley,
Mary Bruno, Aaron Defazio, Marc Parente, Krzysztof J Geras, Joe
Katsnelson, Hersh Chandarana, et al. fastMRI: A publicly available
raw k-space and DICOM dataset of knee images for accelerated MR
image reconstruction using machine learning. Radiology: Artificial
Intelligence, 2(1):e190007, 2020.
[120] Chunhui Du, Hao He, and Yaohui Jin. Contrast with major classifier
vectors for federated medical relation extraction with heterogeneous
label distribution. arXiv preprint arXiv:2301.05376, 2023.
[121] Özlem Uzuner, Brett R South, Shuying Shen, and Scott L DuVall. 2010
i2b2/VA challenge on concepts, assertions, and relations in clinical text.
Journal of the American Medical Informatics Association, 18(5):552–
556, 2011.
[122] Anshul Thakur, Pulkit Sharma, and David A. Clifton. Dynamic neural
graphs based federated reptile for semi-supervised multi-tasking in
healthcare applications. IEEE Journal of Biomedical and Health
Informatics, 26(4):1761–1772, 2022.
[123] Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman,
Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter
Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, a freely
accessible critical care database. Scientific Data, 3(1):1–9, 2016.
[124] D Kai Zhang, Francesca Toni, and Matthew Williams. A federated Cox
model with non-proportional hazards. In Multimodal AI in healthcare:
A paradigm shift in health intelligence, pages 171–185. Springer, 2022.
[125] GBSG dataset. https://www.gbg.de/en/. Accessed: March 18, 2023.
[126] Usman Ahmed, Jerry Chun-Wei Lin, and Gautam Srivastava. Hyper-
graph attention based federated learning method for mental health
detection. IEEE Journal of Biomedical and Health Informatics, 2022.
[127] Suresh Kumar Mukhiya, Usman Ahmed, Fazle Rabbi, Ka I Pun, and
Yngve Lamo. Adaptation of IDPT system based on patient-authored
text data using NLP. In 2020 IEEE 33rd International Symposium on
Computer-Based Medical Systems (CBMS), pages 226–232, 2020.
[128] Cosmin I Bercea, Benedikt Wiestler, Daniel Rueckert, and Shadi Albar-
qouni. Federated disentangled representation learning for unsupervised
brain anomaly detection. Nature Machine Intelligence, 4(8):685–695,
2022.
[129] Dimitrios I Zaridis, Eugenia Mylona, Nikolaos Tachos, Vasileios C
Pezoulas, Grigorios Grigoriadis, Nikos Tsiknakis, Kostas Marias,
Manolis Tsiknakis, and Dimitrios I Fotiadis. Region-adaptive magnetic
resonance image enhancement for improving CNN-based segmentation
of the prostate and prostatic zones. Scientific Reports, 13(1):714, 2023.
[130] Zhi-Hua Zhou. A brief introduction to weakly supervised learning.
National Science Review, 5(1):44–53, 2018.
[131] Francesco Piccialli, Vittorio Di Somma, Fabio Giampaolo, Salvatore
Cuomo, and Giancarlo Fortino. A survey on deep learning in medicine:
Why, how and when? Information Fusion, 66:111–137, 2021.
[132] Hatice Catal Reis, Veysel Turk, Kourosh Khoshelham, and Serhat Kaya.
InSiNet: a deep convolutional approach to skin cancer detection and
segmentation. Medical & Biological Engineering & Computing, pages
1–20, 2022.
[133] Ahmad Chaddad, Lama Hassan, and Yousef Katib. A texture-based
method for predicting molecular markers and survival outcome in lower
grade glioma. Applied Intelligence, pages 1–15, 2023.
[134] Ken Chang, Niranjan Balachandar, Carson Lam, Darvin Yi, James
Brown, Andrew Beers, Bruce Rosen, Daniel L Rubin, and Jayashree
Kalpathy-Cramer. Distributed deep learning networks among in-
stitutions for medical imaging. Journal of the American Medical
Informatics Association, 25(8):945–954, 2018.
[135] Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and
Vikas Chandra. Federated learning with non-IID data. arXiv preprint
arXiv:1806.00582, 2018.
[136] Yue Tan, Guodong Long, Lu Liu, Tianyi Zhou, Qinghua Lu, Jing
Jiang, and Chengqi Zhang. FedProto: Federated prototype learning
across heterogeneous clients. In Proceedings of the AAAI Conference
on Artificial Intelligence, volume 36, pages 8432–8440, 2022.
[137] Ahmed M Abdelmoniem and Marco Canini. Towards mitigating device
heterogeneity in federated learning via adaptive model quantization. In
Proceedings of the 1st Workshop on Machine Learning and Systems,
pages 96–103, 2021.
[138] Ahmed M Abdelmoniem, Chen-Yu Ho, Pantelis Papageorgiou, and
Marco Canini. A comprehensive empirical study of heterogeneity in
federated learning. IEEE Internet of Things Journal, 2023.
[139] Pamela J LaMontagne, Tammie LS Benzinger, John C Morris, Sarah
Keefe, Russ Hornbeck, Chengjie Xiong, Elizabeth Grant, Jason Hassen-
stab, Krista Moulder, Andrei G Vlassenko, et al. OASIS-3: longitudinal
neuroimaging, clinical, and cognitive dataset for normal aging and
Alzheimer disease. MedRxiv, pages 2019–12, 2019.
[140] Fei Lyu, Andy J. Ma, Terry Cheuk-Fung Yip, Grace Lai-Hung Wong,
and Pong C. Yuen. Weakly supervised liver tumor segmentation using
Couinaud segment annotation. IEEE Transactions on Medical Imaging,
41(5):1138–1149, 2022.
[141] Yunpeng Wang, Kang Wang, Xueqing Peng, Lili Shi, Jing Sun, Shibao
Zheng, Fei Shan, Weiya Shi, and Lei Liu. DeepSDM: Boundary-aware
pneumothorax segmentation in chest X-ray images. Neurocomputing,
454:201–211, 2021.
[142] Jiawei Huang, Haotian Shen, Jialong Wu, Xiaojian Hu, Zhiwei Zhu,
Xiaoqiang Lv, Yong Liu, and Yue Wang. Spine explorer: a deep
learning based fully automated program for efficient and reliable
quantifications of the vertebrae and discs on sagittal lumbar spine MR
images. The Spine Journal, 20(4):590–599, 2020.
[143] Sebastian Lobentanzer, Patrick Aloy, Jan Baumbach, Balazs Bohar,
Vincent J Carey, Pornpimol Charoentong, Katharina Danhauser, Tunca
Doğan, Johann Dreo, Ian Dunham, et al. Democratizing knowledge
representation with BioCypher. Nature Biotechnology, pages 1–4, 2023.
[144] Bo Zhou, Neel Dey, Jo Schlemper, Seyed Sadegh Mohseni Salehi,
Chi Liu, James S. Duncan, and Michal Sofka. DSFormer: A dual-
domain self-supervised transformer for accelerated multi-contrast MRI
reconstruction. In Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV), pages 4966–4975, January
2023.
[145] Bjoern H. Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-
Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz,
Johannes Slotboom, Roland Wiest, Levente Lanczi, Elizabeth Gerstner,
Marc-André Weber, Tal Arbel, Brian B. Avants, Nicholas Ayache, et al.
The multimodal brain tumor image segmentation benchmark
(BRATS). IEEE Transactions on Medical Imaging, 34(10):1993–2024,
2015.
[146] Pengfei Guo, Puyang Wang, Jinyuan Zhou, Shanshan Jiang, and
Vishal M Patel. Multi-institutional collaborations for improving deep
learning-based magnetic resonance image reconstruction using fed-
erated learning. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pages 2423–2432, 2021.
[147] Chun-Mei Feng, Yunlu Yan, Shanshan Wang, Yong Xu, Ling Shao,
and Huazhu Fu. Specificity-preserving federated learning for MR image
reconstruction. IEEE Transactions on Medical Imaging, 2022.
[148] Hailin Wang, Ke Qin, Rufai Yusuf Zakari, Guoming Lu, and Jin Yin.
Deep neural network-based relation extraction: an overview. Neural
Computing and Applications, pages 1–21, 2022.
[149] Martin Krallinger, Obdulia Rabal, Saber A Akhondi, Martín Pérez
Pérez, Jesús Santamaría, Gael Pérez Rodríguez, Georgios Tsatsaronis,
Ander Intxaurrondo, José Antonio López, Umesh Nandal, et al.
Overview of the BioCreative VI chemical-protein interaction track. In
Proceedings of the sixth BioCreative challenge evaluation workshop,
volume 1, pages 141–146, 2017.
[150] Diana Sousa, Andre Lamurias, and Francisco M. Couto. A silver
standard corpus of human phenotype-gene relations. In Proceedings
of the 2019 Conference of the North American Chapter of the Association
for Computational Linguistics: Human Language Technologies. Association
for Computational Linguistics, 2019.
[151] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-
learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
[152] Rouzbeh Talebi Zarinkamar and Rene V Mayorga. Lifespan prediction
for lung and bronchus cancer patients via machine learning techniques.
International Journal of Machine Learning and Computing, 12(5),
2022.
[153] David R Cox. Regression models and life-tables. Journal of the Royal
Statistical Society: Series B (Methodological), 34(2):187–202, 1972.
[154] Jared L Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates,
Tingting Jiang, and Yuval Kluger. DeepSurv: personalized treatment
recommender system using a Cox proportional hazards deep neural
network. BMC Medical Research Methodology, 18(1):1–12, 2018.
[155] Alfred F Connors, Neal V Dawson, Norman A Desbiens, William J
Fulkerson, Lee Goldman, William A Knaus, Joanne Lynn, Robert K
Oye, Marilyn Bergner, Anne Damiano, et al. A controlled trial
to improve care for seriously ill hospitalized patients: The study
to understand prognoses and preferences for outcomes and risks of
treatments (SUPPORT). JAMA, 274(20):1591–1598, 1995.
[156] Ali Ramezani-Kebrya, Fanghui Liu, Thomas Pethick, Grigorios
Chrysos, and Volkan Cevher. Federated learning under covariate shifts
with generalization guarantees. arXiv preprint arXiv:2306.05325, 2023.
[157] Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato,
Gerard de Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra
Gayo, Roberto Navigli, Sebastian Neumaier, et al. Knowledge graphs.
ACM Computing Surveys (CSUR), 54(4):1–37, 2021.
[158] Gabriela Csurka, Diane Larlus, Florent Perronnin, and France Meylan.
What is a good evaluation measure for semantic segmentation? In
BMVC, volume 27, pages 10–5244, Bristol, 2013.
[159] Metrics and scoring: quantifying the quality of predictions.
https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics,
2023.
[160] Alain Hore and Djemel Ziou. Image quality metrics: PSNR vs. SSIM.
In 2010 20th international conference on pattern recognition, pages
2366–2369. IEEE, 2010.
[161] Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher.
Assessment and comparison of prognostic classification schemes for
survival data. Statistics in medicine, 18(17-18):2529–2545, 1999.
[162] Ali Raza, Kim Phuc Tran, Ludovic Koehl, and Shujun Li. Designing
ECG monitoring healthcare system with federated transfer learning and
explainable AI. Knowledge-Based Systems, 236:107763, 2022.
[163] Raouf Kerkouche, Gergely Acs, Claude Castelluccia, and Pierre
Genevès. Privacy-preserving and bandwidth-efficient federated learning:
An application to in-hospital mortality prediction. In Proceedings
of the Conference on Health, Inference, and Learning, pages 25–35,
2021.
[164] Mona Flores, Ittai Dayan, Holger Roth, Aoxiao Zhong, Ahmed
Harouni, Amilcare Gentili, Anas Abidin, Andrew Liu, Anthony Costa,
Bradford Wood, et al. Federated learning used for predicting outcomes
in SARS-CoV-2 patients. Research Square, 2021.
[165] Akhil Vaid, Suraj K Jaladanki, Jie Xu, Shelly Teng, Arvind Kumar,
Samuel Lee, Sulaiman Somani, Ishan Paranjpe, Jessica K De Freitas,
Tingyi Wanyan, et al. Federated learning of electronic health records
improves mortality prediction in patients hospitalized with COVID-19.
MedRxiv, 2020.
[166] Trung Kien Dang, Kwan Chet Tan, Mark Choo, Nicholas Lim, Jianshu
Weng, and Mengling Feng. Building ICU in-hospital mortality predic-
tion model with federated learning. Federated Learning: Privacy and
Incentive, pages 255–268, 2020.
[167] Kaggle diabetic retinopathy detection.
https://www.kaggle.com/competitions/diabetic-retinopathy-detection/data,
2017. Accessed: March 18, 2023.
[168] Sharib Ali, Debesh Jha, Noha Ghatwary, Stefano Realdon, Renato
Cannizzaro, Osama E Salem, Dominique Lamarque, Christian Daul,
Michael A Riegler, Kim V Anonsen, et al. PolypGen: A multi-
center polyp detection and segmentation dataset for generalisability
assessment. arXiv preprint arXiv:2106.04463, 2021.
[169] Amber L Simpson, Michela Antonelli, Spyridon Bakas, Michel Bilello,
Keyvan Farahani, Bram Van Ginneken, Annette Kopp-Schneider, Ben-
nett A Landman, Geert Litjens, Bjoern Menze, et al. A large
annotated medical image dataset for the development and evaluation
of segmentation algorithms. arXiv preprint arXiv:1902.09063, 2019.
[170] Holger R Roth, Le Lu, Amal Farag, Hoo-Chang Shin, Jiamin Liu,
Evrim B Turkbey, and Ronald M Summers. DeepOrgan: Multi-level
deep convolutional networks for automated pancreas segmentation. In
International conference on medical image computing and computer-
assisted intervention, pages 556–564. Springer, 2015.
[171] Bennett Landman, Zhoubing Xu, J Igelsias, Martin Styner, T Langerak,
and Arno Klein. MICCAI multi-atlas labeling beyond the cranial
vault–workshop and challenge. In Proc. MICCAI Multi-Atlas Labeling
Beyond Cranial Vault—Workshop Challenge, volume 5, page 12, 2015.
[172] Yingda Xia, Dong Yang, Wenqi Li, Andriy Myronenko, Daguang
Xu, Hirofumi Obinata, Hitoshi Mori, Peng An, Stephanie Harmon,
Evrim Turkbey, et al. Auto-FedAvg: learnable federated averaging
for multi-institutional medical image segmentation. arXiv preprint
arXiv:2104.10195, 2021.
[173] Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov,
Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel
Efrain Humpire Mamani, Gabriel Chartrand, et al. The liver tumor
segmentation benchmark (LITS). Medical Image Analysis, 84:102680,
2023.
[174] Tobias Heimann, Bram van Ginneken, Martin A. Styner, Yulia
Arzhaeva, Volker Aurich, Christian Bauer, Andreas Beck, Christoph
Becker, Reinhard Beichel, György Bekes, Fernando Bello, Gerd Binnig,
Horst Bischof, et al. Comparison and evaluation of methods
for liver segmentation from CT datasets. IEEE Transactions on Medical
Imaging, 28(8):1251–1265, 2009.
[175] A Emre Kavur, N Sinem Gezer, Mustafa Barış, Sinem Aslan,
Pierre-Henri Conze, Vladimir Groza, Duc Duy Pham, Soumick Chatterjee,
Philipp Ernst, Savaş Özkan, et al. CHAOS challenge-combined (CT-MR)
healthy abdominal organ segmentation. Medical Image Analysis,
69:101950, 2021.
[176] Kooperative Gesundheitsforschung in der Region Augsburg (KORA).
https://www.helmholtz-muenchen.de/en/kora/index.html, 2005. Accessed:
March 18, 2023.
[177] Tobias Bernecker, Annette Peters, Christopher L Schlett, Fabian Bam-
berg, Fabian Theis, Daniel Rueckert, Jakob Weiß, and Shadi Albar-
qouni. FedNorm: Modality-based normalization in federated learning
for multi-modal liver segmentation. arXiv preprint arXiv:2205.11096,
2022.
[178] Hayden Gunraj, Linda Wang, and Alexander Wong. COVIDNet-CT: A
tailored deep convolutional neural network design for detection of
COVID-19 cases from chest CT images. Frontiers in Medicine, 7:1025,
2020.
[179] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep
residual learning for image recognition. In Proceedings of the IEEE
conference on computer vision and pattern recognition, pages 770–778,
2016.
[180] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic
optimization. arXiv preprint arXiv:1412.6980, 2014.
[181] Pieter-Tjerk De Boer, Dirk P Kroese, Shie Mannor, and Reuven Y
Rubinstein. A tutorial on the cross-entropy method. Annals of
operations research, 134:19–67, 2005.
[182] Xueluan Gong, Yanjiao Chen, Qian Wang, and Weihan Kong. Backdoor
attacks and defenses in federated learning: State-of-the-art, taxonomy,
and future directions. IEEE Wireless Communications, 2022.
[183] Shengshan Hu, Jianrong Lu, Wei Wan, and Leo Yu Zhang. Challenges
and approaches for mitigating byzantine attacks in federated learning.
arXiv preprint arXiv:2112.14468, 2021.
[184] Evelyn Ma, Rasoul Etesami, et al. Local environment poisoning attacks
on federated reinforcement learning. arXiv preprint arXiv:2303.02725,
2023.
[185] Nuria Rodríguez-Barroso, Daniel Jiménez-López, M Victoria Luzón,
Francisco Herrera, and Eugenio Martínez-Cámara. Survey on federated
learning threats: Concepts, taxonomy on attacks and defences,
experimental study and challenges. Information Fusion, 90:148–173,
2023.
[186] Xiaoli Li, Siran Zhao, Chuan Chen, and Zibin Zheng. Heterogeneity-
aware fair federated learning. Information Sciences, 619:968–986,
2023.
[187] Lin Wang, Zhichao Wang, and Xiaoying Tang. FedEBA+: Towards
fair and effective federated learning via entropy-based model. arXiv
preprint arXiv:2301.12407, 2023.
[188] Xiaodong Ma, Jia Zhu, Zhihao Lin, Shanxuan Chen, and Yangjie Qin.
A state-of-the-art survey on solving non-IID data in federated learning.
Future Generation Computer Systems, 135:244–258, 2022.
[189] Wenke Huang, Mang Ye, and Bo Du. Learn from others and be
yourself in heterogeneous federated learning. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), pages 10143–10153, June 2022.
[190] Tian Li, Maziar Sanjabi, Ahmad Beirami, and Virginia Smith.
Fair resource allocation in federated learning. arXiv preprint
arXiv:1905.10497, 2019.
[191] Christos L Stergiou, Konstantinos E Psannis, and Brij B Gupta. Infemo:
flexible big data management through a federated cloud system. ACM
Transactions on Internet Technology (TOIT), 22(2):1–22, 2021.
[192] Emily R Pfaff, Andrew T Girvin, Davera L Gabriel, Kristin Kostka,
Michele Morris, Matvey B Palchuk, Harold P Lehmann, Benjamin
Amor, Mark Bissell, Katie R Bradwell, et al. Synergies between
centralized and federated approaches to data quality: a report from the
National COVID Cohort Collaborative. Journal of the American Medical
Informatics Association, 29(4):609–618, 2022.
[193] Daoyuan Chen, Dawei Gao, Yuexiang Xie, Xuchen Pan, Zitao Li,
Yaliang Li, Bolin Ding, and Jingren Zhou. FS-Real: Towards real-world
cross-device federated learning. arXiv preprint arXiv:2303.13363,
2023.
[194] Xiaohu You, Cheng-Xiang Wang, Jie Huang, Xiqi Gao, Zaichen
Zhang, Mao Wang, Yongming Huang, Chuan Zhang, Yanxiang Jiang,
Jiaheng Wang, et al. Towards 6G wireless communication networks:
Vision, enabling technologies, and new paradigm shifts. Science China
Information Sciences, 64:1–74, 2021.
[195] Ge Wang, Andreu Badal, Xun Jia, Jonathan S Maltz, Klaus Mueller,
Kyle J Myers, Chuang Niu, Michael Vannier, Pingkun Yan, Zhou Yu,
et al. Development of metaverse for intelligent healthcare. Nature
Machine Intelligence, pages 1–8, 2022.
[196] Hu Xiong, Chuanjie Jin, Mamoun Alazab, Kuo-Hui Yeh, Hanxiao
Wang, Thippa Reddy Gadekallu, Weizheng Wang, and Chunhua Su.
On the design of blockchain-based ECDSA with fault-tolerant batch
verification protocol for blockchain-enabled IoMT. IEEE Journal of
Biomedical and Health Informatics, 26(5):1977–1986, 2022.
[197] Phuc H Le-Khac, Graham Healy, and Alan F Smeaton. Contrastive
representation learning: A framework and review. IEEE Access,
8:193907–193934, 2020.
[198] Fei Kong, Jinxi Xiang, Xiyue Wang, Xinran Wang, Meng Yue, Jun
Zhang, Sen Yang, Junhan Zhao, Xiao Han, Yuhan Dong, et al.
Federated contrastive learning models for prostate cancer diagnosis and
Gleason grading. arXiv preprint arXiv:2302.06089, 2023.
[199] Guangyao Zheng, Michael A Jacobs, Vladimir Braverman, and
Vishwa S Parekh. Asynchronous decentralized federated lifelong
learning for landmark localization in medical imaging. arXiv preprint
arXiv:2303.06783, 2023.
[200] Marco Cascella, Jonathan Montomoli, Valentina Bellini, and Elena
Bignami. Evaluating the feasibility of ChatGPT in healthcare: An
analysis of multiple clinical and research scenarios. Journal of Medical
Systems, 47(1):1–5, 2023.