
Federated Learning for Healthcare Applications

October 2023
IEEE Internet of Things Journal PP(99):1-1

DOI:10.1109/JIOT.2023.3325822

Authors:

Ahmad Chaddad

Guilin University of Electronic Technology

Yihang Wu

Guilin University of Electronic Technology

Christian Desrosiers

École de Technologie Supérieure

Due to the fast advancement of artificial intelligence (AI), centralized models have become critical for healthcare tasks such as medical image analysis and human behavior recognition. Although these models perform well, they are frequently constrained by privacy concerns: a centralized learning strategy cannot be used where there is a risk of a data privacy breach, particularly in healthcare centers. Federated learning (FL) is a technique that allows for training a global model without sharing data by training distributed local models and aggregating them. By implementing FL throughout the training process, we can obtain a model with generalization abilities comparable to those of centralized learning while maintaining data privacy. This survey provides an introduction to the fundamental concepts and categories of FL, highlights the limitations of the centralized healthcare model, and discusses how FL can address these constraints. We also provide a detailed overview of the healthcare applications using FL models, along with commonly used evaluation metrics and public datasets. In this context, we have implemented a case study to demonstrate how FL can be applied in the healthcare field. Furthermore, we outline the key challenges and future trends in FL.



Keywords: Federated Learning, Healthcare, Medical Imaging,

Data Privacy, Artiﬁcial Intelligence.

I. INTRODUCTION

In light of modern AI, various state-of-the-art AI techniques, including deep learning (DL) and the Internet of Medical Things (IoMT), have made their way into the healthcare industry. This has improved the diagnosis and treatment of various conditions such as COVID-19 [1] and autism spectrum disorder (ASD) [2].

However, existing intelligent healthcare AI models are not yet truly intelligent, and some have been criticized for providing ineffective and unsafe treatment recommendations [3]. Several factors may have caused deficiencies in existing systems. A significant issue is the difficulty of obtaining sufficient data with complex features that can adequately describe the patient's symptoms.

In addition, with the implementation of rigorous laws such as the United States Consumer Privacy Bill of Rights and the European Union's General Data Protection Regulation (GDPR), which aim to safeguard individuals' privacy [4], AI models are now unable to directly access source data for training purposes. Instead, they must adhere to strict limitations and regulatory requirements.

A. Chaddad and Y. Wu are with the School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, China.

Corresponding author: Ahmad Chaddad.

A. Chaddad and C. Desrosiers are with The Laboratory for Imagery, Vision and Artificial Intelligence, Ecole de Technologie Superieure, Montreal, Canada. Email: ahmad8chaddad@gmail.com, ahmadchaddad@guet.edu.cn

Manuscript received May 9, 2023; revised August -, 2023.

Fig. 1: This example illustrates the results of a search query conducted on the PubMed and Google Scholar databases. The search query was formulated using the keywords “Federated” AND ( (medical) OR (healthcare) ), and the bars show the number of publications indexed on each platform that matched the search criteria related to the topics discussed in this review.

FL, which offers a novel distributed AI paradigm aimed at addressing concerns related to healthcare data privacy and management [5], has emerged as a popular subject of discussion in recent years [6]. Google first introduced FL in 2015 [7]. Essentially, FL is a distributed AI methodology that involves training several local models and aggregating them to derive a global model without the need for data sharing.

FL can be speciﬁcally applied in the following situations:

Non-IID data: In the realm of traditional machine learning,

it is common practice to assume that data is independently and

identically distributed (IID). However, it is important to note

that in the majority of practical scenarios and circumstances,

this assumption is not met. For instance, each individual client

exhibits a unique set of behaviors, resulting in the collection

of biased data that may differ from that of other participants

[8]. This, in turn, can lead to the emergence of Non-IID or

Heterogeneous data, which can pose a challenge for machine

learning models.

Unbalanced data distribution: An unbalanced data distri-

bution occurs when certain participants in the training dataset

possess a disproportionate amount of pertinent data. For ex-

ample, in a scenario where the training participants include

both hospitals and individuals, hospitals are likely to have

signiﬁcantly larger sample sizes than individuals. Addition-

ally, data relevant to the same disease can vary substantially

between hospitals due to differences in equipment, personnel,

and other factors. This can create challenges for machine

learning models, especially when attempting to generalize to

new and diverse datasets.

Data privacy protection: In recent years, the enactment of multiple data privacy protection laws, especially in the medical domain, has made it extremely difficult to acquire large amounts of data in a single batch for model training. Such data frequently contain confidential patient information, making it essential to limit access to the actual data and only provide access to the model's parameters. However, this poses a significant challenge for machine learning models, which rely heavily on vast and diverse datasets to learn patterns effectively and make accurate predictions.

FL has emerged as a particularly promising methodology

for smart healthcare, as it enables multiple hospitals to train

AI models collaboratively without compromising the conﬁden-

tiality of their raw data [9]. The healthcare industry can take

advantage of FL for a wide range of applications, including

expert diagnosis, drug development, medical image analysis,

electronic health data collection, human activity recognition

(HAR), and remote health monitoring. The broad applicability

of FL in the healthcare sector is illustrated in Fig. 2. The

outstanding potential of FL in healthcare has generated a

growing interest in the research community. This can be

observed in Fig. 1, which shows the number of research

publications on FL in the healthcare sector on PubMed and

Google Scholar over the years.

The contribution of our work is summarized as follows:

• We provide a detailed definition of FL and present its various categories.
• We identify the limitations of existing traditional healthcare models and demonstrate how FL can address those issues.
• We offer a comprehensive explanation of how FL can be applied in healthcare, including an evaluation of the performance of different FL methods.
• We describe the commonly used evaluation metrics and datasets for FL in healthcare.
• We outline the key challenges and future directions of FL in the healthcare domain.

Our work offers a thorough examination of FL models used

within the healthcare ﬁeld. It includes a detailed analysis of

their limitations with distinctive contributions. Furthermore,

we present a case study demonstrating the application of FL

with medical images to illustrate its relevance in healthcare.

Additionally, we focus on FL challenges that are both common

and unique to this domain. The structure of this paper is as fol-

lows. In Section II, we provide a comprehensive deﬁnition of

FL and explain the key differences between FL and traditional

learning (TL). We also highlight the different applications of

FL in various domains. The main steps involved in FL and

its different categories are presented in Section III. In Section

IV, we delve into the details of applying FL in the healthcare

sector. Additionally, we provide information on the commonly

used datasets and evaluation metrics for FL in healthcare

in Section V. Section VI focuses on the critical challenges

associated with FL. We then discuss the future trends of FL in

healthcare in Section VII. Finally, we offer concluding remarks

in Section VIII.

II. BACKGROUND

This section begins with a formal deﬁnition of FL, followed

by a comparison between FL and traditional machine learning

methodologies. The section concludes with a discussion of

the challenges faced by the healthcare ﬁeld and the increasing

need for FL to overcome these challenges.

Fig. 2: Examples of federated learning applications in health-

care, including drug development, human activity recognition,

remote health monitoring, electronic health data recording,

medical image analysis, and assisted expert diagnosis. These

are some of the most common use cases for FL in the

healthcare ﬁeld.

A. The deﬁnition of federated learning

Assuming a group of $N$ participants denoted by $\{\mathcal{F}_i\}_{i=1}^{N}$, each with their own dataset $\{\mathcal{D}_i\}_{i=1}^{N}$, traditional learning merges all data into a single dataset $\mathcal{D} = \mathcal{D}_1 \cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_N$, which is then used to train a model $\mathcal{M}_{SUM}$. In contrast, FL allows each participant to train their own local model on their own dataset, without sharing their data, and then obtain an aggregation model in a global server, denoted by $\mathcal{M}_{FED}$.

If we use $\mathcal{V}_{SUM}$ and $\mathcal{V}_{FED}$ to represent the performance metrics (such as accuracy, recall, precision, etc.) of $\mathcal{M}_{SUM}$ and $\mathcal{M}_{FED}$, respectively, we say that the FL model $\mathcal{M}_{FED}$ has a performance loss of $\delta$ if it satisfies the following inequality:

$$\mathcal{V}_{SUM} - \mathcal{V}_{FED} < \delta \tag{1}$$

where $\delta$ is a non-negative constant that represents the maximum amount of performance loss that can be tolerated in the FL model compared to the traditional approach.

Based on the above statements, we can see that the perfor-

mance of FL depends largely on the aggregation algorithm.

Formally, the ﬁnal goal of FL is the optimization of the

following objective function [10]:

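$$\min_{\omega} L(\omega) = \sum_{i=1}^{N} f_i L_i(\omega) \tag{2}$$

where $\omega$ denotes the weights of the local models, $L(\omega)$ is the global loss, and $L_i(\omega)$ is the local loss of client $i$. The coefficient $f_i$ indicates the importance of the $i$th client participating in the training.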
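As a rough numerical illustration of Eq. (2), the sketch below evaluates the weighted sum of local losses for a toy linear model. The importance weights are assumed here to be each client's share of the total samples, $f_i = n_i / n$ (a common choice, but an assumption in this sketch), and the synthetic data, the mean-squared-error loss, and the function names are hypothetical. Under that assumption the federated objective coincides with the loss computed on the pooled dataset.

```python
import numpy as np

def local_loss(w, X, y):
    """Mean squared error of a linear model on one client's data (L_i in Eq. 2)."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))

def global_loss(w, client_data, importance):
    """Weighted sum of local losses: L(w) = sum_i f_i * L_i(w)."""
    return sum(f * local_loss(w, X, y) for f, (X, y) in zip(importance, client_data))

rng = np.random.default_rng(0)
# Three hypothetical clients with different sample counts.
sizes = [20, 50, 130]
client_data = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in sizes]

# Assumed importance weights: f_i = n_i / n (share of the total samples).
importance = [n / sum(sizes) for n in sizes]

w = np.array([0.5, -1.0, 2.0])
fed_value = global_loss(w, client_data, importance)

# With these weights, the federated objective equals the loss on the pooled data.
X_all = np.vstack([X for X, _ in client_data])
y_all = np.concatenate([y for _, y in client_data])
pooled_value = local_loss(w, X_all, y_all)
print(fed_value, pooled_value)
```

Other choices of $f_i$ (for example, uniform weights) would give small clients more influence and break this equality.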

B. Traditional Learning vs Federated Learning

When it comes to traditional learning (TL), data is typically

shared and used collaboratively to develop a model. However,

if the data contains personally identiﬁable information, there

may be privacy concerns and potential data leaks that could

disrupt the training process. In contrast, FL allows for the

training of local models and their subsequent aggregation into

a global model without directly accessing each other’s data.

This approach can lead to highly accurate models without

compromising data privacy. For further information on the

differences between TL and FL, please refer to Table I.

TABLE I: Traditional Learning vs Federated Learning

| Criteria | Traditional Learning | Federated Learning |
|---|---|---|
| Training method | Centralized learning | Distributed learning |
| Training process | Training together | Training on edge devices |
| Aggregation | No aggregation | Aggregation on server |
| Model | Shared model | Personalized model / Shared models |
| Sharing process | Data sharing | Raw data encryption |
| Iterations | One-time | Iterative process |

C. The limitations of the existing AI-based healthcare models

Most current AI-based healthcare applications such as clas-

siﬁcation [11], [12] and segmentation [13], [14] are related

to TL approaches. We summarize the limitations of TL as

follows.

Privacy leakage issues: As discussed previously, tra-

ditional AI-based smart healthcare requires the sharing of

raw data. As some of these records contain private patient

information, this leads to privacy compromise issues. For

instance, a third-party service can modify data patterns without

user consent [15].

Data limitation: Although machine learning, especially

deep learning (DL), is becoming the primary approach in many

industries, this approach requires large and diverse datasets

for training [16]. Unfortunately, in reality, only some medical

institutions have sufﬁcient data for training the model. For

example, a small research center may desire to build an AI

model using limited datasets. This leads to a trained model

with poor generalizability.

Communication consumption for model training: The

transfer of large amounts of data in TL-based smart healthcare

models can lead to network latency issues, as noted in [17].

This poses a signiﬁcant challenge for medical institutions

and network connectivity, particularly with respect to energy

consumption.

D. Federated learning: An appropriate approach to address

current challenges

Given the various challenges discussed previously, FL can

be viewed as a potential solution to the many issues present in

modern smart healthcare, particularly in the following aspects:

Protection of data privacy: FL ensures data privacy by

allowing clients to share only model parameters, not the raw

data, as mentioned in the previous section. This approach is

highly effective in protecting data privacy. In a study cited

in [18], homomorphic encryption-based privacy-preserving

strategies were used to address data privacy leakage issues. As

data privacy laws continue to become more stringent, FL is

expected to play a crucial role in the smart healthcare industry.

Fig. 3: Flowchart of the federated learning model for healthcare. The process involves several steps, including model initialization and client selection (Left), local training and parameter upload (Middle), and model aggregation and parameter download (Right). First, the global model is initialized, and clients are selected to participate in the federated learning process. Second, each client trains locally on its own data and uploads the updated model parameters to the server. Finally, the updated parameters from all clients are aggregated to create a new global model, which is transmitted back to the clients for the next round of training. This approach enables the training of models using decentralized data while preserving data privacy.

Reduce training consumption: The FL technique can distribute data efficiently to each edge server, leading to a reduction in communication usage, network transmission latency, and costs. Sharing model parameters through FL typically requires much less energy compared to exchanging raw data. For example, the size of parameter gradients is significantly smaller than the actual data in the dataset, as stated in [19]. This makes FL an energy-efficient solution for distributed machine learning.
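The communication argument can be made concrete with a back-of-the-envelope calculation. The sketch below compares the bytes one client would send when uploading its raw dataset once with the bytes it exchanges over all FL rounds. Every number (dataset size, model size, number of rounds) is hypothetical and chosen only for illustration, and the function names are not taken from any library.

```python
def fl_traffic_bytes(num_params, rounds, bytes_per_param=4):
    """Bytes one client exchanges in FL: one upload and one download per round."""
    return 2 * rounds * num_params * bytes_per_param

def raw_data_bytes(num_samples, bytes_per_sample):
    """Bytes one client would send if it uploaded its raw dataset once."""
    return num_samples * bytes_per_sample

# Hypothetical client: 2,000 imaging volumes of about 100 MB each.
raw = raw_data_bytes(num_samples=2_000, bytes_per_sample=100 * 10**6)
# Hypothetical model: 10 million float32 parameters trained for 100 rounds.
fl = fl_traffic_bytes(num_params=10_000_000, rounds=100)

print(f"raw upload: {raw / 1e9:.0f} GB, FL traffic: {fl / 1e9:.0f} GB")
```

The saving depends on the ratio between model size, number of rounds, and dataset size; with a very large model and a small local dataset, the comparison can go the other way.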

Large amount of training data: FL provides strategies, such as FedAvg [20], that allow the models of multiple clients to be merged when the number of clients is sufficient. This merging effectively increases the amount of training data that the global model benefits from and can alleviate the problem of requiring a large quantity of data to train AI models. Thus, FL is a powerful technique for distributed machine learning, especially when there is a large number of clients available.

III. CATEGORIES AND ESSENTIAL STAGES OF

FEDERATED LEARNING IN HEALTHCARE

In this section, we will provide an overview and explanation

of the essential phases of FL, followed by an explanation of

the categories of FL.

A. How the Principles of Federated Learning Apply to Health-

care

In Figure 3, we present a ﬂowchart outlining the major steps

of FL in healthcare. Additionally, we provide pseudo-code for

the key steps of FL in Algorithm 1.

Algorithm 1: The key stages involved in federated learning, where ω represents the model's parameters, D denotes the local dataset held by individual clients, and the method pertains to the aggregation approach.

Initialization: Clients = {}, ω_global = 0, the number of clients N, initial parameters ω_pretrained, communication rounds C.

procedure Initialization & Selection(ω_pretrained, N)
    Clients = Select clients(N)
    ω_global = ω_pretrained
end procedure

procedure Local training & Upload(ω_global, Clients)
    ω_client = ω_global
    ω_localnew = Local training(ω_client, Clients, D)
    UploadPara(Clients, ω_localnew)
end procedure

procedure Aggregation & Download(ω_localnew, Clients)
    ω_global = Aggregation(ω_localnew, method)
    DownloadPara(Clients, ω_global)
end procedure

Initialization & Selection(ω_pretrained, N)
for c = 1 to C do
    Local training & Upload(ω_global, Clients)
    Aggregation & Download(ω_localnew, Clients)
    if performance meets requirement then
        break
    end if
end for

Model initialization and client selection: The process begins by defining a task in the healthcare domain, such as medical image classification, segmentation, or HAR. Next, the parameters of the global server are artificially initialized, and clients are then selected by the global server to participate in the training.

Local training and parameters upload: Once the participating clients are identified, the global server distributes the initial model and its parameters to each client. In every subsequent communication round, each client trains the model on its own dataset and uploads the parameters of its local model to the global server for aggregation.

Model aggregation and parameters download: After all

participating clients have completed uploading their updated

parameters, the global server combines them to compute a

new global model. This updated model is then distributed to

each client for the next training session. The process of FL

continues until the loss function of the global server converges

or meets the performance requirements.
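The three stages of Algorithm 1 can be sketched end to end on a toy problem. The example below is a minimal simulation, not the implementation used in the case study: it assumes a linear regression model, synthetic client data, plain gradient descent for local training, and a sample-size-weighted average in the style of FedAvg as the aggregation method. All function names, hyperparameters, and the stopping threshold are illustrative assumptions.

```python
import numpy as np

def local_training(w_global, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's data, starting from the global weights."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

def aggregation(local_weights, sample_counts):
    """Sample-size-weighted average of the uploaded parameters (FedAvg-style)."""
    weights = np.asarray(sample_counts, dtype=float)
    return np.average(np.stack(local_weights), axis=0, weights=weights)

def global_mse(w, clients):
    """Mean squared error over all clients. Computed centrally here for brevity;
    in a real deployment each client would report its own metric."""
    errors = np.concatenate([X @ w - y for X, y in clients])
    return float(np.mean(errors ** 2))

rng = np.random.default_rng(42)
w_true = np.array([1.5, -2.0, 0.5])  # hypothetical ground truth for the synthetic data

# Initialization and selection: four synthetic clients with unequal dataset sizes.
clients = []
for n in (30, 60, 120, 240):
    X = rng.normal(size=(n, 3))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    clients.append((X, y))
w_global = np.zeros(3)

for round_idx in range(50):  # communication rounds
    # Local training and upload: only weights leave the client, never the raw data.
    uploads = [local_training(w_global, X, y) for X, y in clients]
    # Aggregation and download: the server averages and redistributes the weights.
    w_global = aggregation(uploads, [len(y) for _, y in clients])
    if global_mse(w_global, clients) < 1e-3:  # performance requirement met
        break

print(round_idx + 1, np.round(w_global, 2))
```

Weighting by sample count lets larger clients contribute proportionally more to the global model; replacing `aggregation` is the point at which other aggregation methods would plug in.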

B. Federated learning approaches in healthcare

This section classiﬁes FL into three distinct types, namely

Horizontal FL (HFL, [21]), Vertical FL (VFL), and Federated

Transfer Learning (FTL). Figure 4 provides a clear illustration

of these categories.

Fig. 4: The various types of federated learning utilized in the healthcare field can be illustrated through three categories. The first category, represented on the left, is Horizontal Federated Learning (HFL), which involves the same feature space but different sample spaces. The second category, shown in the middle, is Vertical Federated Learning (VFL), where there are distinct feature spaces but the same sample spaces. The third category, depicted on the right, is Federated Transfer Learning (FTL), which is characterized by disparate feature and sample spaces. The blue and green colors represent the different types of samples, while the gray circles indicate the feature types.

1) Horizontal federated learning: Sample-Partitioned FL, also referred to as Horizontal FL, involves healthcare clients with datasets that share the same feature space but have different sample spaces. In this scenario, each participant can use the same model to train on its data and then upload it to the global server. The integration of data from the same feature space that is spread across multiple clients is a commonly used technique in privacy-sensitive fields such as healthcare and mobile services. This technique is made possible through the use of HFL, as described in [22]. To be specific, HFL can be defined as:

$$\mathcal{X}_i = \mathcal{X}_j,\quad \mathcal{Y}_i = \mathcal{Y}_j,\quad \mathcal{I}_i \neq \mathcal{I}_j,\quad \forall\, \mathcal{D}_i, \mathcal{D}_j,\ i \neq j \tag{3}$$

where $\mathcal{I}$ denotes the sample space, while $\mathcal{X}$ and $\mathcal{Y}$ refer to the feature space and the label space, respectively. The datasets owned by the $i$th and $j$th clients are represented by $\mathcal{D}_i$ and $\mathcal{D}_j$, respectively.

2) Vertical federated learning: Feature-Partitioned FL, also referred to as Vertical FL [23], operates within the FL framework where the sample space remains the same, but the feature space differs. The goal of VFL is to create a shared machine learning model collaboratively, utilizing all the features gathered by the participating clients. An instance of this is the Federated Data Network (FDN) [24], which integrates anonymous data from a prominent social network service, thus allowing for the inclusion of a vast majority of user samples from other data holders, such as bank customers. Formally, VFL can be defined as follows:

$$\mathcal{X}_i \neq \mathcal{X}_j,\quad \mathcal{Y}_i \neq \mathcal{Y}_j,\quad \mathcal{I}_i = \mathcal{I}_j,\quad \forall\, \mathcal{D}_i, \mathcal{D}_j,\ i \neq j \tag{4}$$

where $\mathcal{X}$ again represents the feature space and $\mathcal{Y}$ the label space. $\mathcal{I}$ is the sample space and $\mathcal{D}$ denotes the datasets owned by each healthcare client.

3) Federated transfer learning: HFL and VFL require all clients to have the same feature space or sample space, but this assumption does not hold in more practical situations. Transfer learning is a technique that attempts to improve the performance of target learners on target domains by transferring knowledge from distinct, but related source domains [25]. Thus, FTL aims to solve the case where both the sample space and feature space are different while using a transfer learning method to minimize the data distribution discrepancy between each local dataset. In healthcare, for example, FTL can assist in disease diagnosis with data from different patients (different sample spaces) in multiple hospitals with different therapeutic programs (different feature spaces) [26]. Hence, FTL can be defined as:

$$\mathcal{X}_i \neq \mathcal{X}_j,\quad \mathcal{Y}_i \neq \mathcal{Y}_j,\quad \mathcal{I}_i \neq \mathcal{I}_j,\quad \forall\, \mathcal{D}_i, \mathcal{D}_j,\ i \neq j \tag{5}$$

$\mathcal{X}_i$ being the $i$th feature space and $\mathcal{Y}_i$ the $i$th label space. $\mathcal{I}_i$ is the $i$th sample space, and $\mathcal{D}_i$, $\mathcal{D}_j$ are the datasets owned by the $i$th and $j$th healthcare clients, respectively.
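The three definitions in Eqs. (3)-(5) can be read as a simple decision rule on whether two clients share their features and their samples. The sketch below illustrates that rule for two parties; the function name, the feature names, and the patient identifiers are hypothetical, and exact set equality is used for simplicity, whereas real deployments usually deal with partial overlap.

```python
def fl_category(features_a, features_b, samples_a, samples_b):
    """Classify a two-party setting by comparing feature names and sample IDs."""
    same_features = set(features_a) == set(features_b)
    same_samples = set(samples_a) == set(samples_b)
    if same_features and not same_samples:
        return "HFL"  # shared feature space, different samples
    if same_samples and not same_features:
        return "VFL"  # shared samples, different feature spaces
    if not same_features and not same_samples:
        return "FTL"  # both the feature space and the samples differ
    return "identical"  # same features and samples: nothing to federate

# Two hospitals recording the same variables for different patients.
hfl_case = fl_category(["age", "bp"], ["age", "bp"], ["p1", "p2"], ["p3", "p4"])
# A hospital and a pharmacy holding different variables for the same patients.
vfl_case = fl_category(["age", "bp"], ["drug", "dose"], ["p1", "p2"], ["p1", "p2"])
# Two hospitals with different protocols and different patients.
ftl_case = fl_category(["age", "bp"], ["mri", "ecg"], ["p1", "p2"], ["p3", "p4"])
print(hfl_case, vfl_case, ftl_case)
```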

IV. HEALTHCARE APPLICATIONS OF FEDERATED

LEARNING

As traditional learning (TL) has many limitations, various studies have been conducted to evaluate the usefulness of FL within the field of healthcare. This section provides an overview of how FL has been specifically applied in this field. The section covers the application of FL in classification and segmentation, as well as in other tasks. The healthcare applications of FL in 2022 are summarized in Table III.

A. Classiﬁcation-based federated learning

Classification (and/or detection/prediction) is a very common task in FL. Integrating FL is an important step in enhancing the robustness of medical models due to the complexity of medical data. In healthcare, FL models have been proposed for a broad range of classification tasks, including cancer diagnosis [27]–[35], COVID-19 detection and pneumonia diagnosis [36]–[39], epileptic seizure detection [40], HAR [41]–[43], identifying functional connectivity biomarkers of major depressive disorder (MDD) [44], autism spectrum disorder prediction [45], and surgical phase recognition [46]. We report in Table II the performance of different FL methods proposed for classification in healthcare, and summarize these methods below.

Cancer diagnosis: Recent studies have shown the feasibility and benefits of applying FL technology to cancer diagnostic tasks [27]–[35], [39], [47]. For instance, [27] proposes a differentially private FL framework that employs Bag Preparation and Multiple Instance Learning (MIL) to perform a classification task on a lung cancer dataset. The authors conduct experiments on their hand-crafted dataset derived from The Cancer Genome Atlas (TCGA) [48] and demonstrate that their FL model performs better than non-FL models while also addressing medical data privacy concerns. However, the performance of their model degrades when the number of clients is high (32 clients), with an accuracy of less than 60% in this case. This limitation hinders the deployment of the model in large-scale collaborative environments.

Heterogeneous data is a common challenge in FL that can cause local and global drift, affecting the performance of the model [28]. To address this issue, the authors of [28] introduced an FL framework called HarmoFL, which aims to harmonize local and global drifts simultaneously using magnitude normalization. To address local drift, magnitudes are limited to a specific range to generate a coordinated feature space across local clients. To reduce global drift, client weight perturbation based on the generated feature space guides the local target toward a globally optimal solution. The framework thus accounts for both local and global update drifts in FL on heterogeneous medical images. The approach was tested on the Camelyon17 dataset [49], which consists of 450,000 breast cancer images.

Curriculum Learning (CL) has gained signiﬁcant attention

in the academic community, as evidenced by recent references

[50], [51]. CL is a training approach that gradually introduces

more challenging examples throughout the training process,

following a proper pedagogical sequence observed in human

education [52]. The work in [29] proposes a novel memory-

aware CL approach for FL that re-scales the priority of train-

ing samples based on their scores to improve effectiveness.

Authors of this work use CL for the purpose of classifying

medical images across multiple sites. This approach also

employs unsupervised domain adaptation to maintain data

privacy and minimize data distribution discrepancy. The pro-

posed approach was evaluated on a breast cancer dataset [49]

and showed superior performance compared to traditional FL

methods. However, the FL model was only tested on a single

breast cancer classiﬁcation dataset, which raises concerns

about potential bias and the generalizability of this model to

other tasks.
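The general idea of curriculum learning, presenting easy samples first and harder ones later, can be sketched as follows. This is not the memory-aware method of [29]: the difficulty scores are hypothetical, and the linear pacing function that controls how quickly harder samples are admitted is an assumption made for this illustration.

```python
import numpy as np

def curriculum_subset(difficulty, epoch, total_epochs, start_fraction=0.2):
    """Return indices of the samples available at a given epoch, easiest first."""
    order = np.argsort(difficulty, kind="stable")  # ascending difficulty
    # Linear pacing: the usable fraction grows from start_fraction to 1.0.
    progress = epoch / max(total_epochs - 1, 1)
    fraction = start_fraction + (1.0 - start_fraction) * progress
    count = max(1, int(round(fraction * len(difficulty))))
    return order[:count]

# Hypothetical per-sample difficulty scores (for example, the current loss of each sample).
difficulty = np.array([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0])
first = curriculum_subset(difficulty, epoch=0, total_epochs=5)
middle = curriculum_subset(difficulty, epoch=2, total_epochs=5)
last = curriculum_subset(difficulty, epoch=4, total_epochs=5)
print(len(first), len(middle), len(last))
```

In a federated setting, each client would apply such a schedule to its own local data before uploading its parameters.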

In [30], a solution was proposed to alleviate the instability

arising from data diversity in a setup known as FL with Shared

Label Distribution. This approach employs a weighted cross-

entropy loss, which optimizes the relevance of each sample to

the local target by taking into account the label distribution in

each client. However, it is assumed that clients can share the

number of samples in each class, which may result in privacy

leakage if this information is valuable. The proposed approach

achieved improved test accuracy on the OrganMNIST dataset

[53]. Yet, these studies performed experiments on limited types

of datasets, and further analyses on more varied and complex

medical datasets are warranted.
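To illustrate how a cross-entropy loss can take a client's label distribution into account, the sketch below weights each sample by the inverse frequency of its class. The exact weighting used in [30] is not reproduced here; inverse-frequency weights, the normalization, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def class_weights(label_counts):
    """Inverse-frequency weights with mean 1 over the classes present on the client.
    Classes with no samples get weight 0. Assumes at least one class is present."""
    counts = np.asarray(label_counts, dtype=float)
    weights = np.zeros_like(counts)
    present = counts > 0
    weights[present] = 1.0 / counts[present]
    return weights * present.sum() / weights.sum()

def weighted_cross_entropy(logits, labels, weights):
    """Cross-entropy where each sample is scaled by the weight of its true class."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(weights[labels] * per_sample))

# Hypothetical client with 90 samples of class 0 and 10 samples of class 1.
w = class_weights([90, 10])

logits = np.array([[2.0, 0.5], [0.2, 1.5], [1.0, 1.0]])
labels = np.array([0, 1, 1])
loss = weighted_cross_entropy(logits, labels, w)
print(w, loss)
```

Note that computing such weights requires each client to know (and, in the setup described above, to share) its per-class sample counts, which is the source of the privacy concern mentioned in the text.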

The work in [31] introduces a novel self-supervised pre-

training FL approach which utilizes the Vision Transformer

(ViT) as the underlying network architecture. This approach

performs local model pre-training on each client dataset to

overcome data heterogeneity concerns. Experiments conducted

on a Dermatology dataset related to skin cancer showed the

method to achieve notable improvements in test accuracy [54]–

[56]. In contrast to previous studies, authors of this work

perform three classiﬁcation tasks in both simulated and real-

world scenarios, providing a more thorough assessment of

reliability. However, their experiments only consider a limited

number of clients (5 clients), which raises worries regarding

possible bias and the approach’s ability to effectively handle

a larger number of clients.

To address the issue of non-IID (not independent and identically distributed) data across different clients, the approach in [32] trains personalized models using channel-wise assignment instead of the layer-wise personalization techniques of previous studies [57]–[60]. In this method, the global model is decoupled at the channel level to enable personalization. To further improve the decoupling effect, a new cyclic distillation technique is introduced for reducing divergence. Experiments conducted on the colorectal cancer HISTO-FED dataset demonstrated the proposed approach's effectiveness in handling non-IID data. However, the approach was only tested using three clients, and it is unclear whether robust performance can be achieved in challenging scenarios with more clients.

Most existing approaches focus on optimizing the average

aggregation loss, which often results in bias, where the global

model performs well on many clients but poorly on others

[33]. To address this issue and reduce the performance gap of

the global model on different clients, a FL paradigm called

Proportionally Fair FL is proposed in [33]. This approach

aims to improve model fairness by optimizing a new objective

function that allows the global model to have better general-

ization ability across different clients. The primary objective

of Proportionally Fair FL is to enhance poorly performing

models by allowing the global model to dynamically adjust

the network parameters based on the actual performance

of the training clients. The experiments conducted on the

Cancer Genome Atlas (TCGA) dataset [48] demonstrate that

this method achieves good results. Yet, the stability of the

training loss is lower in comparison to FedAvg, and there

is a noticeable ﬂuctuation in the testing accuracy during the

training procedure.

Compared to previous methods, the approach presented in [34] focuses on improving the generalization ability of the local-specific model instead of the global model. In this approach, the global model only acts as a feature extractor to assist the local model in extracting pertinent information. Evaluated on the task of skin lesion classification using the HAM10000 dataset [61], the proposed approach achieved an accuracy of 62.8±2.0% when eight clients participated in the training. While improvements were also observed when increasing the number of clients to 32, the performance of the model did not improve for 64 clients, with an accuracy of approximately 60% in this case.

A semi-supervised FL method is introduced in [35] to

handle scenarios where clients solely have unlabeled data,

whereas the global server holds only a small amount of labeled

data. This method employs a dynamic bank learning tech-

nique to support client training by utilizing class proportion

information. The technique distils class scale information by

establishing dynamic banks, which enables the model to learn

the scale distribution knowledge via sub-banks. The result

of the experiments achieves an AUC of 77.47% for skin

lesion diagnosis in dermoscopy images of the HAM10000

dataset. Nonetheless, the sensitivity of the proposed models is

relatively low at approximately 37%, indicating that its ability

to accurately detect the proportion of true positive cases among

all positive cases is insufﬁcient. The F1 score of the model is

also poor, with a value of approximately 33%. Furthermore,

limited information was given on the parameters of training

optimizers, such as learning rate and weight decay, and the

number of clients was limited to 10.
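To make the sensitivity and F1 figures quoted above concrete, the following minimal sketch computes both from confusion-matrix counts. The counts are hypothetical and were chosen only so that the outputs fall near the reported values; they are not taken from [35].

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return sensitivity, specificity, precision and F1 from confusion counts."""
    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)   # recall: detected positives / all positives
    specificity = safe_div(tn, tn + fp)   # correctly rejected negatives / all negatives
    precision = safe_div(tp, tp + fp)     # detected positives that are truly positive
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (assumption for illustration only).
metrics = classification_metrics(tp=37, fp=87, fn=63, tn=813)
print({name: round(value, 4) for name, value in metrics.items()})
```

With these counts, 63 of the 100 positive cases are missed, which is why a model can reach a reasonable AUC while still being of limited use for screening.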

COVID-19 Detection and Pneumonia diagnosis: Recent

studies have also investigated the use of FL for COVID-

19 detection and pneumonia diagnosis [36]–[39], [62]. Since

COVID-19 is a worldwide epidemic, incorporating more

clients to create a robust global model can be beneﬁcial for pa-

tients and physicians. The study in [36] leverages customized

local models for healthcare personalization, employing distinct

local batch normalization to optimize model generalizability

while maintaining a high specificity for each patient. Experimental

results on the COVID-19 chest X-ray dataset [63]

showed promising performance and rapid convergence of the

method. Experiments involving 100 clients showed the method

achieves an average classiﬁcation accuracy of 75%, which

indicates its robustness under a large number of clients.
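One common way to keep batch normalization local is to average only the non-normalization parameters on the server and let each client retain its own normalization entries. The sketch below assumes this scheme for illustration; it is not confirmed to be the exact procedure of [36]. Parameters are plain lists of floats, and the key names and the `bn` marker are hypothetical.

```python
def fedavg_keep_local_bn(client_states, client_sizes, bn_marker="bn"):
    """Weighted average of shared parameters; entries whose key contains bn_marker are skipped."""
    total = sum(client_sizes)
    shared_keys = [key for key in client_states[0] if bn_marker not in key]
    global_state = {}
    for key in shared_keys:
        length = len(client_states[0][key])
        global_state[key] = [
            sum(state[key][i] * n for state, n in zip(client_states, client_sizes)) / total
            for i in range(length)
        ]
    return global_state


def load_global(local_state, global_state):
    """Overwrite shared parameters while keeping the client's own BN entries."""
    updated = dict(local_state)
    updated.update(global_state)
    return updated


# Two hypothetical clients with different data sizes and BN statistics.
client_states = [
    {"conv.weight": [1.0, 2.0], "bn.running_mean": [0.1]},
    {"conv.weight": [3.0, 4.0], "bn.running_mean": [0.9]},
]
client_sizes = [1, 3]

global_state = fedavg_keep_local_bn(client_states, client_sizes)
client0_new = load_global(client_states[0], global_state)
print(global_state, client0_new)
```

After the round, both clients share the averaged convolution weights, while each keeps the normalization statistics that reflect its own data distribution.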

In [37], two FL techniques are proposed for different active

learning scenarios: Labeling Efﬁcient Federated Active Learn-

ing (LEFAL) and Training Efﬁcient Federated Active Learning

(TEFAL). LEFAL aims to enhance the effectiveness of feature

learning by taking into account data uncertainty and diversity,

while TEFAL improves client efﬁciency by employing a

discriminator to assess the amount of useful information a

client can provide. The authors conducted experiments on the

COVID-19 dataset [64] and showed their approach achieves

high accuracy and F1 scores in a limited number of iterations.

For example, their model obtained an average accuracy of

0.9 and an average F1 score of 0.95 with only 50 iterations.

Additionally, the experiments covered two scenarios, involving

a small hospital and a large hospital, providing a more practical

assessment of the performance of the FL model in complex

settings. However, the maximum number of clients was limited

to ﬁve in this study.

The work in [38] presents a FL approach utilizing Gen-

erative Adversarial Networks (GANs) to mitigate the risk

of data privacy leakage. In this approach, a Convolutional

Neural Network (CNN) was used as a generator to pro-

duce synthetic COVID-19 images, enabling the discriminator

to learn and replicate the actual distribution of COVID-19

data. Additionally, a blockchain-based Differential Privacy

Protection technique was implemented to enhance the data

privacy protection. Experiments on the DarkCOVID dataset

[65] and the ChestCOVID dataset [66] indicated that this ap-

proach could outperform state-of-the-art FL methods on these

datasets. Results on the DarkCOVID dataset reveal that the

classification accuracy for COVID and normal cases is 99%;

however, the performance in predicting pneumonia is

lower, with an accuracy of 80%. Furthermore, the proposed

method requires a large number of epochs, typically around

200, to achieve optimal results, which is time-consuming.

The authors of [62] use cyclic homomorphic encryption to

improve the privacy-preserving capabilities of their FL method

by encrypting the aggregation process. Adversarial attacks are

also simulated to evaluate the model’s resilience. However,

their privacy protection technique is only effective when there

are more than two clients. In other words, when there are fewer

than three participating clients, the model’s privacy-preserving

ability is almost nonexistent. Experimental results based on the

RAD-ChestCT dataset showed their approach to achieve an

average accuracy of 94%, which is similar to the performance

of TL (95%) [67]. However, the maximum number of clients

used in this work is limited to 5. Moreover, the GPU memory

usage of the method exceeds 26 GB, which may restrict the

choice of computational device. One advantage is that the training

time is shorter than that of centralized training, shedding light

on how to train efficient FL models.
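The dependence of privacy on the number of clients can be illustrated without any cryptography. The sketch below is a general illustration and does not reproduce the encryption scheme of [62]: it assumes only that each client eventually learns the aggregate of all updates. With two clients, subtracting one's own update from the aggregate reveals the other client's update exactly; with three or more, only a mixture of the others is recovered. The update vectors are hypothetical.

```python
def aggregate(updates):
    """Element-wise sum, standing in for the result revealed after encrypted aggregation."""
    return [sum(values) for values in zip(*updates)]


def infer_others(aggregate_sum, own_update):
    """What a curious client can compute: the sum of all other clients' updates."""
    return [a - o for a, o in zip(aggregate_sum, own_update)]


# Hypothetical model updates from three clients.
alice = [0.5, -1.0]
bob = [2.0, 0.25]
carol = [-0.75, 1.5]

# Two clients: the remainder is exactly the other client's update.
leak_two = infer_others(aggregate([alice, bob]), alice)

# Three clients: the remainder only mixes the two other updates.
leak_three = infer_others(aggregate([alice, bob, carol]), alice)

print(leak_two, leak_three)
```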

In [39], a practical FL scenario called intermittent client

participation is presented, where some clients are consistently

involved in the training while others leave due to internet

connectivity issues. The method in this work achieves an

accuracy of 80.29% for pneumonia diagnosis on the chest X-

ray image dataset [68]. However, this study only considers

whether there is one client leaving or not, which fails to

provide a comprehensive reﬂection of the overall impact of

clients leaving. Additionally, the maximum number of clients

is limited to 10.
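Intermittent participation can be simulated with a small FedAvg loop in which some clients always take part and the remaining ones drop out at random. This is a toy sketch under stated assumptions, not the method of [39]: the names `make_client`, `run_round`, and `drop_prob` are hypothetical, and local "training" is a stand-in that moves the weights halfway toward a client-specific target.

```python
import random


def fedavg(updates, sizes):
    """Average client weight vectors, weighted by local dataset size."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(dim)]


def make_client(cid, n_samples, target):
    """Toy client whose local training pulls the weights halfway to its target."""
    def train(weights):
        return [w + 0.5 * (t - w) for w, t in zip(weights, target)]
    return {"id": cid, "n": n_samples, "train": train}


def run_round(global_w, clients, always_on, drop_prob, rng):
    """One round: always-on clients participate, the others drop with probability drop_prob."""
    active = [c for c in clients if c["id"] in always_on or rng.random() >= drop_prob]
    if not active:
        return global_w, []  # nobody reported, so the global model is unchanged
    updates = [c["train"](global_w) for c in active]
    sizes = [c["n"] for c in active]
    return fedavg(updates, sizes), [c["id"] for c in active]


clients = [make_client(0, 10, [1.0]), make_client(1, 30, [3.0]), make_client(2, 60, [5.0])]

# Client 2 always drops out versus never drops out.
w_drop, ids_drop = run_round([0.0], clients, always_on={0, 1}, drop_prob=1.0, rng=random.Random(0))
w_full, ids_full = run_round([0.0], clients, always_on={0, 1}, drop_prob=0.0, rng=random.Random(0))
print(ids_drop, w_drop, ids_full, w_full)
```

Because the departing client holds the largest share of the data in this toy setup, its absence shifts the aggregated weights noticeably, which is the kind of effect a study with several simultaneously departing clients would need to quantify.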

Epileptic Seizure detection: According to the World

Health Organization, epilepsy symptoms affect approximately

50 million individuals globally [69]. As a result, the detection

of epileptic seizures is critical for pre-operative evaluations

[70]. Detecting epileptic seizures usually involves accessing

sensitive patient data, such as EEG recordings. The diversity of

data obtained from different EEG devices further complicates

the training of a reliable model, which drives the need for

FL. A recent study proposed a real-time personalized FL

framework for detecting epileptic seizures on mobile plat-

forms, based on a deep neural network [40]. The authors

explored personalized FL, which enabled the model to learn

patient-speciﬁc seizure features from local data. The study

also showed this approach achieves greater energy efﬁciency

and performance using the EPILEPSIAE dataset [71]. Yet, the

model’s sensitivity is not robust, as it exhibits a substantial

decrease (−8.5%) when compared to the centralized model.

In addition, experiments were limited to a total of four clients.

Human Activity Recognition: The development of IoT

technology has enabled Human Activity Recognition (HAR)

to play a critical role in assisting medical professionals with

collecting patient data for diagnosing chronic illnesses [72].

However, HAR is susceptible to privacy violations and data

dissimilarity issues. FL is a potential solution for implement-

ing robust models with numerous clients, as it effectively

addresses the previous issues. In a recent study [42], the au-

thors concluded that privacy regulations would not be violated

if a label with natural language is speciﬁed when sharing

data. The study considered the classiﬁcation problem as a

matching process between data and class representation, and

transformed the classiﬁer into a data and category encoder to

facilitate this process. Additionally, it used the class names as

a reference point to ensure category representation in the label

encoder through natural language. Experiments conducted on

the PAMAP2 dataset [73] demonstrated that this method could

outperform most existing classiﬁcation techniques based on

FL. Nevertheless, the experiments did not include the results

obtained using a centralized model. Instead, the authors only

compared their results with those of six recent FL methods.

Thus, this comparison does not adequately reﬂect the differ-

ences in performance between TL and FL.

In [41], the limitations of existing wearable devices such

as data privacy, service integrity, and network structure adaptability

have led the authors to create an adaptive network for intelligent

wearables based on the distributed structural features

of the fog-IoT network. The proposed FL platform integrates

blockchain technology to enhance data privacy protection.

When tested on a HAR task using smartphone data [74],

this approach achieved good performance in terms of privacy

preservation and classiﬁcation accuracy. However, the maxi-

mum number of clients was set to 10 in this work, which is

not practical for device-based FL due to the larger number of

devices compared to institutions in real-life scenarios.

Another study proposed a transfer learning-based person-

alized FL framework to tackle issues of heterogeneous data

and data privacy [43]. This framework aims to enhance model

performance by reducing the need for localized training and

using multi-domain knowledge to lessen disparities between

the data. The performance of the framework was evaluated on

a custom dataset, with results showing it achieves more than

90% precision on a ﬁve-category HAR task. Unfortunately,

the optimizer parameters, such as the learning rate, choice of

the optimizer, and batch size, are not mentioned in the study,

making it difﬁcult to reproduce its results. Furthermore, the

absence of a publicly available benchmark may not adequately

reﬂect the actual performance of the proposed model.

Major Depressive Disorder disease diagnosis: Major De-

pressive Disorder (MDD), a prevalent, severe, and expensive

mental disorder worldwide, causes depressed mood, reduced

interest, and impaired cognitive function. Detecting functional

connectivity biomarkers and early intervention is important

for managing MDD. The privacy concerns related to patients’

information and data require the utilization of FL to train a

large global model. In a recent study [44], the authors devel-

oped a federated joint estimator to detect these biomarkers by

training a multilayer Bayesian network based on continuous

optimization. To enhance personalized models, they utilized

group fused lasso penalty during training and proposed an

alternating direction method of multipliers (ADMM) technique

to aid in processing neuroimaging data. The proposed method

incorporated information-sharing strategies to improve the

learning of local models. Experiments on rs-fMRI dataset [75]

demonstrated the superior effectiveness and precision of this

method.

Autism spectrum disorder prediction: Autism spectrum

disorder (ASD)

has a substantial impact on the prevalence of mental illnesses,

which can harm a child’s mental health development [76].

CNN [77], [78] and Recurrent Neural Network (RNN) [79],

[80] are frequently employed to detect ASD early on for

prediction purposes. Although these techniques have achieved

good results, they mostly disregarded the correlations and

connections between subjects in the population [45]. Recent

research has shown that graph neural networks can effectively

overcome this limitation [81]. This approach employs graph

generative adversarial networks to complete the missing information

in the local network and uses network inpainting and

inter-institutional data to enhance the edge predictor [45]. The

method’s effectiveness was demonstrated through experiments

on two neuroimaging datasets, ABIDE [82] and ADNI [83].

For the ADNI dataset, the performance of the FL model

remains the same when increasing the number of clients

beyond 8. However, this performance continues to improve

for the ABIDE dataset, suggesting that the model’s potential

may not be fully attained when faced with more clients.

Surgical Phases recognition: Surgical phase recognition

serves a crucial clinical purpose by accurately identifying the

current phase without future information from the surgical

video [84]. Despite its importance, the ﬁeld continues to

face challenges due to the sensitive nature of medical data.

This restricts collaborations between multiple institutions and

limits the deployment of traditional deep models in real-world

settings. In [46], the authors introduced the ﬁrst FL strategy

that employs semi-supervised learning to enhance the gener-

alization capability of the surgical phase recognition model

using both labeled and unlabeled data present in the dataset.

The experimental results demonstrated that this approach can

learn better features and exhibit a feasible generalization per-

formance in unknown domains. The MultiChole2022 dataset

used in this study was created from the Cholec80 dataset [85].

Summary: The existing FL classiﬁcation models are

still restricted to a limited number of clients. Furthermore,

benchmark datasets are needed to compare the performance

of the same tasks (e.g., COVID-19 diagnostics). To ensure an

objective evaluation of these FL (classiﬁcation) models, future

collaboration is encouraged for expanding the datasets.

B. Application of Federated Learning Segmentation in Health-

care Tasks

Medical imaging applications using FL may also involve

various segmentation tasks, for example, to delineate tumors

and other lesions in the prostate [28], [90], [92], [97], [98],

brain [104], [107], [108], [128], breast [100], skin [100] or

liver [110]. Table II presents a summary of FL-based techniques

for segmentation and their reported performance.

Prostate tumor segmentation: The accurate segmentation of

prostate regions in MRI is a crucial step in numerous medical

imaging applications for detecting prostate cancer, characterizing

its aggressiveness, predicting its recurrence, and assessing

the effectiveness of treatment [129]. The work in [28] trains

a FL-based segmentation model using a multi-site prostate

dataset [89], which comprises 79 samples from six different

sites. Results showed this model to achieve an average Dice

of 94.28%. Compared to FedAvg and FedBN, the proposed

method shows enhanced stability with increased local training

epochs. However, this study did not evaluate the performance

improvement or decrease brought by using FL, compared to

centralized approaches.
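For reference, the Dice coefficient and the Intersection over Union (IoU) used throughout this subsection can be computed from binary masks as sketched below. The masks are hypothetical flattened examples, and returning 1.0 when both masks are empty is a convention assumed here rather than one stated in the surveyed works.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two flat binary masks (sequences of 0/1 of equal length)."""
    inter = sum(p & t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - inter
    dice = 2 * inter / (pred_sum + target_sum) if (pred_sum + target_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


# Hypothetical prediction and ground truth over six pixels.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]

dice, iou = dice_and_iou(pred, target)
print(round(dice, 4), round(iou, 4))
```

The two metrics are monotonically related through Dice = 2 IoU / (1 + IoU), so a Dice score is never lower than the IoU of the same prediction.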

Weakly supervised learning has emerged as a popular approach

to alleviate the burden of labeling data [130]. In this

approach, incomplete but easier-to-obtain annotations are used

instead of full image annotations. In [90], the authors proposed

the first federated weakly supervised segmentation (FedWSS)

method to learn a segmentation task from multiple data sources

while minimizing the impact of data drift. To address local and

global data drift, the authors introduced two strategies, based

on Cooperative Annotation Calibration (CAC) and Hierarchi-

cal Gradient De-conﬂiction (HGD). CAC reduces local drift

using a Monte Carlo sampling technique that customizes a

distal peer and proximal peer for each client, and accurately

distinguishes between clean and noisy labels. Meanwhile, the

HGD strategy mitigates global data drift by using primary

gradient data to aid clients in subsequent training cycles [90].

TABLE II: Summary of the performance of each federated learning algorithm.

| Ref. | Clients | Performance | Model | Datasets |
|---|---|---|---|---|
| CLASSIFICATION | | | | |
| [27] | 32 | ACC = 0.641 ± 0.09 | DenseNet | [49] |
| [28] | 5 | ACC = 0.9548 ± 0.0113 | U-Net | [49] |
| [29] | 3 | AUC = 0.79 | ResNet-22 | [49] |
| [30] | 12 | ACC = 0.8475 | CCNN | [53] |
| [31] | 5 | ACC = 0.8996 | ViT | [54]–[56] |
| [39] | 3 | ACC = 0.8029, AUC = 0.9313 | CNN | [68] |
| [47] | 14 | AUC = 0.83 | DenseNet121 | [86], [87] |
| [32] | 3 | ACC = 0.6566 | ResNet-32 | * |
| [88] | † | ACC > 0.86 | CCNN | * |
| [33] | 4 | ACC = 0.7954 | DenseNet121 | [48] |
| [62] | 3 | ACC = 0.94 | U-Net | [67] |
| [34] | 32 | ACC = 0.596 ± 0.03 | ResNet-18 | [61] |
| [35] | 10 | ACC = 0.8894 ± 0.015 | DenseNet121 | [61] |
| [36] | 20 | ACC > 0.94 | Alexnet | [63] |
| [37] | 6 | ACC = 0.976, REC = 0.978 | U-Net | [64] |
| [38] | 4 | ACC = 0.967 – 0.973 | GAN, CCNN | [65], [66] |
| [40] | 4 | ACC = 0.8162, SPEC = 0.82 | Res1DCNN | [71] |
| [42] | 9 | ACC = 0.8814 | ViT | [73] |
| [41] | 10 | ACC = 0.9043 | 1DCNN | [74] |
| [43] | 10 | ACC > 0.90 | CCNN | * |
| [44] | 60 | PREC > 0.92 | Bayesian Networks | [75] |
| [45] | 5 | ACC = 0.758 ± 0.0007 | GCN | [82], [83] |
| [46] | 4 | ACC = 0.5969 ± 0.075 | ResNet-50 | [85] |
| SEGMENTATION | | | | |
| [28] | 6 | Dice = 0.9428 ± 0.08 | U-Net | [89] |
| [90] | 10 | Dice = 0.8778 ± 0.064 | U-Net | [91] |
| [92] | 6 | Dice = 0.9028 | U-Net, VGG-11 | [91], [93]–[96] |
| [97] | 5 | Dice = 0.7828, IoU = 0.7192 | ResNet-18 | [91], [95] |
| [98] | 20 | IoU = 0.671 | GAN | [99] |
| [100] | 3 | Dice = 0.8334 / 0.8693 | U-Net | [61], [101]–[103] |
| [104] | 3 | Dice = 0.888 – 0.898 | U-Net | [105], [106] |
| [107] | 3 | Dice = 0.7785 | ResNet-34 | [105], [106] |
| [108] | 10 | Dice = 0.8460 | ResNet-34 | [109] |
| [110] | 30 | Dice = 0.829 / 0.899 | U-Net | [93], [111] |
| [112] | 2 | Dice = 0.803 ± 0.004 | U-Net | [113] |
| [114] | 50 | Dice = 0.8804 | U-Net | [115] |
| OTHER TASKS | | | | |
| [116] | 3 | PSNR = 0.351 ± 0.014, SSIM = 0.954 ± 0.01 | GAN | [117] |
| [118] | 8 | PSNR = 0.3921, SSIM = 0.970 | U-Net | [119] |
| [120] | 50 | F1 = 0.5882 | BERT | [121] |
| [122] | 20 | AUC > 0.845 | LSTM | [123] |
| [124] | 4 | IBS = 23.6 ± 0.9 | NN | [125] |
| [126] | 6 | ROC = 0.86 | LSTM | [127] |

ACC: Accuracy; PREC: Precision; SPEC: Specificity; REC: Recall; ROC: Receiver operating characteristic curve; AUC: Area under the ROC curve; Dice: Dice coefficient; IoU: Intersection over Union; PSNR: Peak signal-to-noise ratio; SSIM: Structural similarity index; IBS: Integrated Brier score; '*' means private dataset, '†' means not provided; GAN: Generative adversarial network; CNN: Convolutional neural network; CCNN: Custom convolutional neural network; GCN: Graph convolutional network; BERT: Bidirectional encoder representations from transformers; LSTM: Long short-term memory; ViT: Vision Transformer; 1DCNN: 1-dimensional convolutional neural network; NN: Neural network.

Experimental results on the PROMISE12 dataset [91] showed

the method to outperform previous approaches for FL-based

prostate segmentation. Yet, this method primarily involves

sharing models between clients to detect noisy labels, which

may lead to increased data transmission costs. Moreover,

the sharing process may be vulnerable to malicious attacks,

which could potentially lead to the recurrence of privacy

breaches. Additional encryption techniques should hence be

incorporated into the sharing process.

The work in [92] addressed the challenge of client drift to

reduce the generalization gap between FL and TL models. It

proposed a new FL framework based on ensemble learning

and introduced a novel personalization technique that aims

to update model parameters by interpolating the local optima

of the current client with those of other clients. Experimental

results on three medical image segmentation tasks (retinal disc

and cup segmentation, 2D fundus image segmentation, and

prostate segmentation from 3D MRI) [91], [93]–[96] demon-

strated the effectiveness of the proposed method. Additionally,

the proposed approach demonstrated comparable outcomes to

the centralized model, indicating considerable potential when

compared to FedAvg and FedProx.

In [97], the authors proposed a personalized FL paradigm

to address the challenges of performance degradation and

unbalanced label distribution. The proposed method leverages

progressive Fourier aggregation on the global server side and

enhanced transfer on the client side to learn the parameters of

individual client models and transfer local knowledge to the

global model more effectively. To address the problem of label

distribution imbalance, it also introduces a new loss function

called Conjoint Prototype Aligned (CPA) loss. This loss eval-

uates the global conjoint objective based on the global imbal-

ance and modiﬁes the local client-side training via prototype-

aligned reﬁnement to eliminate the imbalance gap with a bal-

anced objective. Experimental results on PROMISE12 dataset

[91] and ISBI dataset [95] showed the method’s superior

performance compared to recent approaches. However, this

method has a local training time twice as long as that of standard FL,

which could potentially increase the communication load when

using edge devices. Moreover, the absence of a comparison

with the centralized model does not sufﬁciently explain the

potential of using FL for prostate tumor segmentation.

Breast tumor segmentation: Breast cancer, which is the

most prevalent type of cancer in women, can be fatal if not

detected early [131]. In

a recent study [100], a novel label-agnostic supervised FL

method called FedMix was proposed. FedMix trains each

client by utilizing both strong and weak labels with an adaptive

weight adjustment strategy, which allows for dynamic weight

adaptation during the FL training process to learn better feature

representations. This method breaks the restriction of only us-

ing one type of label for training. FedMix was tested on three

breast tumor segmentation datasets: BUS [101], BUSIS [102],

and UDIAT [103]. Experimental results showed it outperforms

most current approaches. Nevertheless, the performance of this

technique relies heavily on the choice of hyper-parameters,

which needs extensive ﬁne-tuning to avoid degradation in

performance. Additionally, it is assumed that rich label clients

exhibit higher training loss, indicating a greater amount of in-

formation available for model training. However, the presence

of noisy or corrupted labels can lead to a substantial rise in

the loss, and their model is unable to effectively differentiate

these labels. Consequently, the performance of the model in

this particular scenario may be negatively impacted.

Skin tumor segmentation: Skin cancer is a common

disease that affects both men and women [132]. Similarly to

their experiments on breast tumor segmentation, in [100], the

authors also evaluated their FedMix method on a skin tumor

dataset [61]. This dataset includes 10,015 samples

from four different sources. According to the experimental

results, FedMix achieved an average Dice score of 86.93%. However,

a comparison with the centralized model is missing.

Brain tumor segmentation: The accurate identiﬁcation of

tumor regions in medical images is crucial for effective clinical

treatment of brain cancer, and brain tumor segmentation plays

a key role in this process [131], [133]. However, the need

for FL models has risen due to the huge workload associated

with annotating images for medical professionals. In [104],

a distributed network framework is proposed for

FL that operates in real time using the Message Queuing

Telemetry Transport (MQTT) protocol. It uses a modified

version of the U-Net model trained with data from daily clinical

practice. Two commonly used datasets, BraTS 2018 [105] and

BraTS 2020 [106] are used to investigate the trade-off between

training accuracy and latency. The results show, for the ﬁrst

time, the primary advantages of the MQTT protocol in terms

of reliability, bandwidth efﬁciency, and scalability.

The work in [107] presents an FL-based framework address-

ing the issues of non-IID data and privacy leakage. The pro-

posed solution utilizes unlabeled public data for ofﬂine, one-

way knowledge distillation to extract local knowledge through

ensemble attention distillation, enabling global model learning

while maintaining privacy. This approach was tested on the

BraTS 2018 dataset [105] and BraTS 2020 dataset [106].

Results showed highly competitive performance along with

more effective privacy protection. The experiments conducted

in this work covered a wide range of FL scenarios including

local data from different institutions, local data of varying

sizes, public data from different domains combined with local

data, and public data with modalities different from the local

data. This comprehensive analysis allowed for a thorough

evaluation of the performance of FL in various situations.

However, a relatively small number of clients was employed

(i.e., 3 clients).

In [108], a straightforward FL method called heterogeneity-

aware FL is proposed, which improves the generalization of

the model over the target domain by splitting the network

and concatenating feature maps. Unlike other methods, this

approach does not require complex tuning and optimization

strategies. Experiments conducted on the BraTS 2017 dataset

[109] indicate that this method can achieve an average Dice

score of 84.60%. The robustness of the model was also exam-

ined, and the results of the assessments demonstrated that this

approach had the most favorable outcomes compared to CWT

[134] and FedAvg+SD [135] when the network architecture

was changed from Resnet34 to MobileNet-v2. Although the

method effectively addresses statistical heterogeneity issues,

it neglects model heterogeneity [136], device heterogeneity

[137], behavior heterogeneity [138], and other factors. As a

result, the model’s potential for tackling heterogeneity prob-

lems is not fully demonstrated.

Neuroimaging anomaly labeling: A technique called

Federated Disentangled Representation Learning is proposed

for performing unsupervised brain disease segmentation [128].

This technique decomposes the parameter space into a global

space, allowing the model to take advantage of generic

anatomical features while also protecting client-speciﬁc con-

trast information. The approach was tested on three datasets,

namely OASIS-3 [139], ADNI [83], and an internal dataset

(KRI), and the results showed signiﬁcant improvement in

anomaly segmentation when compared to locally trained mod-

els without annotations or sharing of private local data. Specif-

ically, the proposed method achieved a 99.74% improvement

for multiple sclerosis and a 40.45% improvement for tumors

[128].

Liver tumor segmentation: The segmentation of liver

cancer using computed tomography (CT) volumes is important

due to its high mortality rate [140]. Proper segmentation is nec-

essary for the accurate diagnosis of this common malignancy.

However, incomplete annotations in individual datasets, such

as those found in [47], can pose a challenge. FL offers an

interesting solution for tackling these problems. In [110], a FL

segmentation algorithm is introduced to address this issue. The

algorithm consolidates the acquired knowledge into a meta-

global model through learning to segment datasets with various

incomplete annotations. In experiments conducted on the

MSD dataset [93] and BTCV dataset [111], this model achieved

impressive results on distributed datasets that have disjoint and

incomplete annotations. The model sets up a prototype for

creating a uniﬁed multi-task segmentation model with clinical

relevance using fragmented datasets with incomplete annota-

tions. Yet, further experimentation is required to evaluate the

method's performance in practical FL settings (e.g., clients

may leave and join the training procedure) and with more

challenging datasets.

Pneumothorax segmentation: The accuracy of clinical

diagnosis for Pneumothorax, a common lung disease, largely

depends on the precision of segmentation from chest X-ray

images [141]. In [112], a patch permutation approach was

employed by incorporating permutation into a patch embedder

layer and using ViT as the backbone for multi-task FL. This

approach led to a decrease in communication consumption

between clients and the global server, as well as an im-

provement in model performance. Experimental results on the

SIIM-ACR dataset [113] showed the proposed technique to

achieve an average Dice score of 80.8%, outperforming the

centralized model by 0.7%. However, the small number of

clients considered in this study (i.e., only two) constitutes a

limitation.

Vertebral body segmentation: When it comes to clinical

applications, manual segmentation is widely applied. However,

it is not practical for spinal body segmentation due to time

limitations [142]. To address time constraints and improve

the availability of data for decreasing manual segmentation,

it is necessary to collaborate with a larger number of clients.

In [114], a new FL-based framework was proposed for ver-

tebral body segmentation. The framework utilized a local

Dual Attention Gates-based attention mechanism to improve

the performance of the model. This method was capable of

enhancing the performance of vertebral segmentation models

using the SpineSagT2Wdataset3 dataset [115]. Nevertheless,

there remains a difference in performance between FL and TL,

with FL experiencing ﬂuctuations during training that make it

less stable compared to centralized approaches.

Summary: The same issue of having a very small number

of clients is also present in segmentation tasks. Moreover,

several segmentation tasks (e.g., prostate

cancer) have limited high-quality data, so increasing the

number of clients leaves few or no samples

for each client, thus significantly decreasing the global performance.

Acquiring more high-quality data thus remains a challenge.
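The trade-off between client count and per-client data can be made concrete with a short calculation. The sketch below assumes an even split of the 79 samples of the multi-site prostate dataset [89] mentioned earlier; real sites hold unequal shares, so this is only an idealized best case, and `samples_per_client` is a hypothetical helper.

```python
def samples_per_client(total_samples, num_clients):
    """Split total_samples as evenly as possible over num_clients."""
    base, extra = divmod(total_samples, num_clients)
    return [base + 1] * extra + [base] * (num_clients - extra)


summary = {}
for k in (6, 32, 128):
    split = samples_per_client(79, k)
    # (smallest share, largest share, number of clients with no data)
    summary[k] = (min(split), max(split), split.count(0))
    print(k, "clients:", summary[k])
```

With 6 clients each site holds 13 or 14 samples; with 32 clients only 2 or 3 remain per client; and with 128 clients, 49 clients would have no data at all.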

C. Applications of Federated Learning in Healthcare for Var-

ious Tasks

FL can be applied beyond image classiﬁcation and segmen-

tation, with potential applications including MRI reconstruc-

tion [116], [118], medical relation extraction [120], medical

knowledge graphs [143], mortality prediction [122], lifespan

prediction [124], and mental health detection [126]. For a

summary of each method’s performance, please refer to Table

II.

MRI reconstruction: The lengthy MRI acquisition times

caused by modern technology have become problematic for

both patients and doctors, leading to an increase in popularity

for reconstructed high-quality MRI [144]. However, previous

FL methods for MRI reconstruction were based on conditional

reconstruction models and had poor generalization ability,

making them unsuitable for a wide range of acceleration

rates [116]. To address this issue, a novel image recon-

struction method based on unconditional generative adver-

sarial networks was proposed in [116]. The method utilized

cross-site learning to generate images and included a new

mapper subnetwork to maintain specificity by creating site-specific

latent variables. This method improved performance on multi-institutional

datasets, including IXI [117], fastMRI [119], and

BRATS [145]. The model also has lower computational and

inference times compared with other reconstruction methods

[146], [147], which is more practical for real-life settings.

Nonetheless, it is critical to conduct further research in order

to systematically validate the method and assess its anatomical

accuracy on a wider range of patients.

In [118], it was noted that domain-speciﬁc information,

which can contain valuable information for local reconstruc-

tion, should not be ignored while other FL techniques focus on

improving the generalization of global models. To address this,

a speciﬁcity-preserving FL algorithm was proposed, consisting

of an encoder to learn a global generalization representation

and a client-speciﬁc decoder to retain domain-speciﬁc features.

Weight contrast regularization was also employed during the

training process. It achieved an average Peak Signal-to-Noise

Ratio (PSNR) of 39.21% and Structural Similarity Index

(SSIM) of 0.970 using the fastMRI dataset [119]. In addition,

the model outperforms FedAvg, FedBN, and FedProx in terms

of achieving higher SSIM and PSNR values within a few

epochs. It also exhibits a consistent curve for PSNR and SSIM

during training.
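As a reminder of how reconstruction quality is scored, PSNR is derived from the mean squared error between a reference image and its reconstruction and is conventionally expressed in decibels. The sketch below is a minimal implementation for intensities normalized to [0, 1]; the pixel values and the `max_value` default are assumptions for illustration, and SSIM is omitted because it requires local window statistics.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two flat sequences of pixel intensities."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical four-pixel reference and reconstruction.
reference = [0.0, 0.5, 1.0, 0.5]
reconstruction = [0.0, 0.5, 0.9, 0.5]

value = psnr(reference, reconstruction)
print(round(value, 2), "dB")
```

Because the scale is logarithmic, each 10 dB gain corresponds to a tenfold reduction in mean squared error.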

Medical relation extraction: Relational extraction, which

is a critical approach to acquiring knowledge in AI, is gaining

traction in the healthcare industry [148]. However, heterogene-

ity issues also exist in the texts from various institutions. In

[120], a new concept called major classiﬁcation vectors is pro-

posed, which consists of a set of class vectors obtained through

an ensemble learning method. A contrastive learning method

is employed to facilitate local training and minimize the risk

of local models overﬁtting. Additionally, the proposed method

effectively prevents the leakage of original data, features, and

label distribution. The experiments conducted on the 2010

i2b2/VA challenge dataset [121], BioCreative VI: Chemical-

protein interaction dataset [149], and Phenotype-Gene Rela-

tions corpus dataset [150] show that the proposed method

yields decent results, especially in terms of a more efﬁcient

convergence rate. The experimental results indicate that the

method exhibits reduced training ﬂuctuations in comparison to

FedAvg and FedRS. It is important to note, however, that the

study does not include a comparison with non-FL techniques.

Mortality prediction: The lack of labeling in electronic

medical records and the distributed nature of the data make

it difﬁcult to train an AI model that can achieve optimal

performance [122]. To address the privacy challenge, which is difficult to resolve without additional assistance, a model-agnostic meta-learning algorithm called Reptile was recently proposed [151]. Despite the development of such an approach, this field still suffers from distributed data issues. In

[122], a dynamic variant of the neural graph based on the Rep-

tile algorithm is introduced, which enables semi-supervised

learning by integrating unlabeled data into the training phase

and simultaneously conducting metric learning on labeled and

unlabeled neighborhoods. Experiments carried out using the

MIMIC-III dataset [123] demonstrate the effectiveness of the

proposed method, particularly when constrained to limited

supervision. The method displayed comparable performance

to the centralized model, but had a slower loss convergence

rate compared to FedAvg. Additionally, this work lacked

a thorough investigation into privacy-preserving techniques.

This raises concerns about the robustness of the proposed model's performance against malicious attacks or when employing privacy mechanisms such as differential privacy or secure multi-party computation.

Lifespan prediction: There has been a recent increase

in studies related to predicting life expectancy [124], [152].

The Cox model is a well-known standard technique in this

field [153]. In [124], a federated extension of the Cox model is proposed. This model accounts for the effects of time-varying covariates by relaxing the proportional hazards assumption, while the federated setting preserves data privacy and reduces upfront investment costs for organizations compared to previous methods. Experiments done on three clinical datasets (METABRIC [154], SUPPORT [155], and GBSG [125]) show that the FL model can perform as well as the traditional model.

However, this study mainly focuses on the heterogeneity

resulting from label stratiﬁcation, while neglecting other forms

of heterogeneity, such as covariate shifts [156], which are

commonly observed in image-based survival predictions.
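For reference, the standard Cox model and one common way of relaxing its proportional hazards assumption can be written as follows. This is a textbook formulation given as a sketch; whether [124] uses exactly this time-varying parameterization is an assumption here, and the symbols ($h_0$, $\beta$, $\delta_i$, $R(t_i)$) are introduced only for illustration.

```latex
% Cox proportional hazards model: baseline hazard h_0(t), covariates x, coefficients \beta
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Partial likelihood over observed events (\delta_i = 1), with R(t_i) the set of
% subjects still at risk at time t_i
L(\beta) = \prod_{i:\,\delta_i = 1}
           \frac{\exp\!\left(\beta^{\top} x_i\right)}
                {\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right)}

% Relaxed form (assumed for illustration): coefficients may vary with time, so the
% hazard ratio between two subjects is no longer constant
h(t \mid x) = h_0(t)\,\exp\!\left(\beta(t)^{\top} x\right)
```

In the first form the hazard ratio between two patients does not depend on $t$, which is precisely the assumption that the relaxed form removes.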

Medical knowledge graph: In the healthcare ﬁeld, knowl-

edge graphs, which are data networks comprising entities rep-

resented by nodes and relationships represented by edges, have

become a popular topic of discussion [157]. In order to en-

hance collaboration with additional institutions and experts, in

[143], a framework is proposed to build on-demand knowledge

graphs with speciﬁc tasks for FL. The framework is designed

to be ﬁndable, accessible, interoperable, and reusable (FAIR)

for creating biological knowledge graphs while maintaining

the source data’s provenance. This framework is among the

ﬁrst to standardize the process of constructing knowledge

graphs, rather than their representation.

Mental health assessment: Early detection of mental

illness is challenging due to its insidious nature and the lack

of available resources. Mental illness is among the most widespread health issues globally. To address global mental health issues, a potential approach involves using FL to train a large-scale model that can be universally applied. In [126], an FL-based model for mental health detection is proposed.

The model utilizes a hypergraph and a sentiment vocabu-

lary approach, which is word-represented, to learn a low-

dimensional vector representation for detection while pre-

serving semantic relevance to the greatest extent possible.

An attention-shifting mechanism is also incorporated to
assist the training process. Experiments conducted on

a dataset gathered from websites and forums [127] achieved

an AUC-ROC of 0.86 using long short-term memory ar-

chitecture. However, the study has certain limitations. For

instance, it fails to take into account extra variables such

as the patient’s geographical location, cultural and religious

background, as well as social context. Moreover, the performance on a single class can be poor during training (e.g., AUC = 0.2), so the model may fail on particular categories even when the overall performance is acceptable. Additionally, there is a lack of comparison with the centralized model.

Summary: The limited number of clients is not a major issue for FL-based medical tasks such as identifying relationships between medical entities and building knowledge graphs, and these tasks are highly valuable for physicians in providing accurate diagnoses. Furthermore, the reconstruction of MRI data requires high-quality data, so FL is needed to combine more data and train a robust model. It is suggested that more research be conducted to address these issues.

V. FEDERATED LEARNING MODELS PERFORMANCE

This section initially outlines the metrics typically em-

ployed for FL tasks in healthcare. Subsequently, it presents an

overview of the commonly used datasets with their descrip-

tions. It’s important to mention that, due to the unavailability

of a standardized test dataset at present, a uniform performance

comparison is not provided.

TABLE III: Summary of recent studies on federated learning models in healthcare.

| Reference | Purpose | Category |
|---|---|---|
| [27] | Employs bag preparation and MIL techniques to accomplish classification tasks | FL, MIC |
| [128] | Utilizes generic anatomical features by decomposing the parameter space into a global space while maintaining client-specific contrast information | FL, BAD, MIS |
| [28] | Builds a new harmonizing architecture called HarmoFL to address the data drift problem | FL, MIC, MIS |
| [116] | A new mapping subnetwork with cross-site learning proposed to perform image reconstruction tasks | FL, MRIR |
| [46] | Federated semi-supervised learning method used for surgical phase recognition | FL, SPR |
| [100] | A label-agnostic unified FL using mixed labels and an adaptive weight assignment procedure for aggregation proposed to perform segmentation tasks | FL, MIS |
| [122] | A dynamic variant of neural graph network and meta-learning to tackle mortality prediction tasks | FL, ML, MIA |
| [30] | To mitigate the instability caused by data heterogeneity through knowledge of the client's label distribution | FL, MIC |
| [31] | Uses masked image encoding as a self-supervised task to learn effective representations | FL, MIC |
| [18] | Considers a weighted average assignment based on data quality instead of the amount of data to perform classification tasks | FL, MIC |
| [104] | A real-time distributed FL framework based on MQTT protocol proposed to improve the FL procedure | FL, MIS |
| [36] | The use of local batch normalization and personalized models for each client has been explored as a means to address the domain shift problem and learn the similarities between clients | FL, HAR |
| [114] | A novel local Dual Attention Gates-based attention mechanism has been employed for FL | FL, MIS |
| [37] | Active learning techniques have been utilized to improve the performance of FL models | FL, AL, MIA |
| [97] | To address the challenges of performance degradation and unbalanced label distribution in a dataset, a solution has been proposed that employs PFA and Conjoint Prototype Aligned loss | FL, MIS |
| [118] | Optimizes the FL model by dividing it into two parts, one as a shared encoder and the other as a client-specific decoder, and uses weighted contrast regularization in the training process | FL, MRIR |
| [34] | A customized FL has been developed to enhance the generalization ability of the local model, rather than the global model | FL, MIC |
| [108] | Aims to address the reduction in performance caused by data heterogeneity in FL | FL, MIA |
| [43] | A cross-domain FL framework has been developed that leverages transfer learning techniques to mitigate differences in data distribution | FL, TR, HAR |
| [45] | To enhance the performance of the model, a combination of graph neural networks and intra-network mapping has been utilized | FL, GNN, DP |
| [107] | The robustness of data privacy protection has been increased by employing a one-way offline knowledge distillation technique | FL, KD, MIA |
| [98] | Uses GANs for computational pathology to reduce the discrepancies between data | FL, CP, MIS |
| [35] | A semi-supervised FL approach with dynamic bank learning method proposed to solve the class distribution imbalance problem | FL, MIC |
| [38] | Builds a blockchain-based differential privacy protection strategy to enhance the effect of data privacy conservation | FL, MIC, GAN |
| [126] | Considers a word emotion representation while using an attention shifting mechanism to assist in training | FL, HG, NLP |
| [92] | To tackle the client drift problem and the generalization gap between TL and FL | FL, MIS |
| [40] | Employs an FL model using CNN architecture to tackle real-time seizure detection tasks | FL, SD |
| [32] | Implements channel decoupling to provide personalized models and a new cyclic distillation scheme to control the training process | FL, MIC |
| [88] | Incorporates contribution awareness into FL to build a reliable healthcare system | FL, HS |
| [39] | To solve the intermittent client problem (some clients may leave the training) | FL, MIA |
| [33] | Aims to lessen the performance differences between each local model in an effort to increase the "fairness" of the models for each client | FL, MIC |
| [44] | Uses Bayesian networks and group fused lasso penalty to process the neuroimaging data at each local client before update to the global server | FL, MIC |
| [112] | To improve the model's performance without sacrificing privacy by utilizing random patch permutation for MTL | FL, ViT, MIA |
| [124] | Builds a federated Cox model to lower initial organizational expenses | FL, LP |
| [41] | Utilizes private blockchain technology in order to safeguard data within the IoT network with privacy-preserving features | FL, HAR |
| [29] | A new memory-aware curriculum learning strategy for FL proposed in order to enhance the consistency of local models and penalize inconsistent prediction outcomes | FL, MIC |
| [42] | To better align the latent spaces across clients by using natural language to represent label classes | FL, KD, MIC |
| [90] | Builds a weakly supervised FL algorithm to efficiently learn segmentation models in the context of data drift mitigation | FL, WSL, MIS |
| [120] | An ensemble approach and contrastive learning have been developed to prevent overfitting issues | FL, CL, MRE |
| [110] | Uses a knowledge aggregation strategy for handling medical datasets with different and incomplete annotations | FL, MIS |
| [47] | An FL-based surgical aggregation method has been utilized to handle multi-label classification problems | FL, SA, MIC |
| [143] | To build on-demand knowledge graphs with specific tasks by using FL methods | FL, KG |
| [62] | Utilizes personalized cyclic homomorphic encryption to enhance the privacy protection effect | FL, MIA |

FL: Federated learning; MIC: Medical image classification; SPR: Surgical phase recognition; MIS: Medical image segmentation; HAR: Human activity recognition; MIA: Medical image analysis; AL: Active learning; TR: Transfer learning; GNN: Graph neural network; DP: Disease prediction; KD: Knowledge distillation; MRIR: MRI reconstruction; ML: Meta-learning; ViT: Vision Transformer; MQTT: Message queuing telemetry transport; CNN: Convolutional neural network; PFA: Progressive Fourier aggregation; BAD: Brain anomaly detection; MRE: Medical relation extraction; WSL: Weakly supervised learning; CP: Computational pathology; HG: Hyper graph; KG: Knowledge graph; SD: Seizure detection; SA: Surgical aggregation; HS: Healthcare system; LP: Lifespan prediction; CL: Contrastive learning; MTL: Multi-task learning.

A. Commonly Used Evaluation Metrics

The metrics that are typically utilized for image segmentation tasks include the Dice coefficient (Dice) and Intersection over Union (IoU) [158]. Similarly, for classification/prediction tasks in FL, widely-used metrics are the Accuracy (ACC), Precision (PREC), Recall (REC), Specificity (SPEC), F1-score, and the Area Under the ROC Curve (AUC) [159]. In the case of other healthcare-related tasks, such as MRI reconstruction, common metrics include PSNR and SSIM [160], while for lifespan prediction, Integrated Brier Scores (IBS) are typically used [161].
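As an illustration of three of the metrics named in this subsection, the following is a minimal NumPy sketch of Dice, IoU, and PSNR. The function names, the `eps` smoothing term, and the `max_value` argument are assumptions made for this example and are not taken from the surveyed works.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


def iou(pred, target, eps=1e-8):
    """IoU = |A ∩ B| / |A ∪ B| for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)


def psnr(reference, reconstruction, max_value=1.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy masks: one of the two predicted foreground pixels is correct.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
print("Dice:", dice_coefficient(pred_mask, true_mask))
print("IoU:", iou(pred_mask, true_mask))
```

Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank segmentation results identically, while PSNR is reported in decibels rather than as a percentage.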

B. The Benchmark Datasets Used in Federated Learning

In this section, we will discuss the healthcare benchmark

datasets that have been used in FL. As there is currently no

uniﬁed benchmark dataset for FL in healthcare, we present an

overview of the datasets that have been used in the majority

of published works in Table IV.

TABLE IV: Commonly used datasets for federated learning in healthcare

| Datasets | Sample (n) | Application area |
|---|---|---|
| MIT BIH [162] | 109,446 | ECG-based prediction to identify arrhythmia |
| Premier healthcare [163] | 1,271,733 | PPR |
| Chest xray image [164] | 16,148 | COVID-19 diagnosis |
| Chest xray image 2 [68] | 207,130 | PD |
| Hologic and Siemens [49] | 1,870 | To detect breast cancer or tumor |
| COVID-19 [165] | 4,029 | Mortality prediction for patients with COVID-19 |
| eICU synergetic [166] | >200,000 | Predict the likelihood of patient death |
| HAM10000 [61] | 10,015 | Skin cancer classification / segmentation |
| Cancer Genome Atlas [48] | >20,000 | Cancer genomics program |
| Camelyon17 [49] | 450,000 | Breast cancer classification |
| MedMNIST [53] | 718,067 | Medical image classification |
| Retina [167] | 35,126 | Diabetic retinopathy detection |
| BraTS series [105], [106], [109] | 285 | Brain tumor segmentation |
| ABIDE [82] | 1,112 | ASD diagnosis |
| ADNI [83] | 911 | AD diagnosis |
| PolypGen [168] | 6,282 | Polyp detection and segmentation |
| MIP [169]–[172] | 393 | Pancreas segmentation |
| MIL [173]–[177] | 428 | Liver tumor segmentation |
| MSP [89] | 79 | Prostate MRI segmentation |

ECG: Electrocardiogram; ASD: Autism spectrum disorder; AD: Alzheimer's disease; MRI: Magnetic resonance imaging; PPR: Predict patient mortality; PD: Pneumonia detection; MIP: Multi-institutional pancreas; MIL: Multi-institutional livers; MSP: Multi-site prostate.

Retina: The Retina dataset comprises 35,126 images related to diabetic retinopathy. The dataset encompasses five classes, namely normal, mild, moderate, severe, and proliferative. The images in the dataset have been captured using various cameras and from different angles [167].

MedMNIST: The MedMNIST dataset is composed of roughly 718,067 images related to ten sub-domains, which include PathMNIST, ChestMNIST, DermaMNIST, OCTMNIST, BreastMNIST, among others. The number of medical images available in each sub-dataset ranges from 100 to 100,000. Furthermore, the dataset comprises 12 2D sub-datasets and 6 3D sub-datasets [53].

Camelyon17: Camelyon17 dataset contains 450,000 im-

ages about breast cancer. A total of ﬁve centers contributed

data to this dataset [49].

PolypGen: The PolypGen dataset is a multi-center polyp detection and segmentation dataset. It incorporates more than 300 patients [168]. This dataset has both single frame and sequence data, containing about 6,282 samples.

HAM10000: HAM10000 dataset consists of 10,015 der-

matoscopic images about skin disease from two different sites

with seven categories [61].

Premier healthcare: The Premier healthcare dataset is one of the largest databases, collecting data from 415 hospitals in the USA [163]. This dataset consists of 1,271,733 Electronic Health Records (EHR) for mortality prediction.

MIT BIH: The MIT BIH dataset includes 109,446 ECG samples for predicting arrhythmia. This dataset contains ECG data for 47 individuals recorded between 1975 and 1979 [162].

Multi-site prostate: Multi-site prostate dataset is a dataset

which contains 79 prostate T2-weighted MRI from three

different centers [89]. It is mainly used for the segmentation

tasks.

Multi-institutional Pancreas: Multi-institutional Pancreas

dataset consists of about 393 CT samples about pancreas

[169]–[172]. It is used for the segmentation tasks.

eICU synergetic: eICU synergetic dataset is mainly for

Intensive Care Unit (ICU) based mortality prediction, which

contains more than 200,000 samples [166].

COVID-19: COVID-19 dataset is a dataset for EHR based

COVID-19 patients mortality prediction [165]. It includes

4,029 samples from ﬁve different hospitals.

Multi-institutional Livers: Multi-institutional Livers

dataset consists of 428 CT images about liver tumor diagnosis

[173]–[177]. These datasets are derived from ﬁve different

centers.

Chest Xray image: The Chest Xray image dataset has 16,148 cases (both positive and negative) from 20 client sites [164]. This dataset is used for predicting the future oxygen requirements of patients. The Chest Xray image 2 dataset contains about 207,130 OCT images from 4,686 patients with four classes [68]. It is mainly related to pneumonia diagnosis.

Cancer Genome Atlas: The Cancer Genome Atlas dataset is a large cancer dataset that covers breast, lung, colon, and rectal cancer, among others [48]. It has more than 20,000 samples.

BraTS: BraTS is a brain tumor segmentation dataset which includes 285 brain tumor MRI scans with four MRI modalities (T1, T1ce, T2, and FLAIR) for each scan [105], [106], [109]. In addition, the dataset provides complete masks for brain tumors with labels for edema/invasion, enhancing tumor, and necrosis regions.

ABIDE: The ABIDE dataset includes 1,112 samples, 539 of which are from individuals with ASD and 573 from normal/healthy controls (ages 7-64 years, median 14.7 years across groups) [82].

ADNI: The ADNI dataset contains 911 samples (378 AD patients and 536 mild cognitive impairment subjects). It has three different domains: ADNI-1, ADNI-2, and ADNI-GO [83].

C. A case study of federated learning in healthcare

We use a supervised FL algorithm for medical image

classiﬁcation as an example to demonstrate the application of

FL in healthcare classiﬁcation tasks. The procedure of this FL

technique can be described as follows. We use a large-scale

dataset consisting of COVID-19 and pneumonia diagnoses
[178], covering three classes, namely
normal, pneumonia, and COVID-19. The client number is set

to 5. Each client has 20% of the total 357,518 samples, and

the testing set consists of 33,781 samples. After successfully

allocating the data, local training is conducted, followed by

the aggregation of weights to generate a global model. This

global model is then evaluated on the testing set. The training

procedure ﬁnished after 100 federated rounds.

We employ the ResNet34 architecture for both the local and global models, using the Adam optimizer [179], [180] with a learning rate of 0.0002, a weight decay of 0.0005, and a batch size of 16. We also use FedAvg as the aggregation technique, along with Cross-Entropy loss to optimize the local model [20], [181]. Figure 5 illustrates the test performance as a function of epochs, and Table V reports the performance of the compared methods.
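The aggregation step used in this case study can be sketched as follows. This is a minimal NumPy illustration of FedAvg-style weighted averaging, assuming each local model is represented as a dictionary of parameter arrays. The function name `fedavg`, the layer names, and the toy shapes are hypothetical and do not reproduce the ResNet34 setup described above.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Average client parameters, weighting each client by its sample count.

    client_weights: list of dicts mapping a layer name to a NumPy array.
    client_sizes:   number of local training samples held by each client.
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for name in client_weights[0]:
        # Each client's contribution is proportional to its share of the data.
        global_weights[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return global_weights


# Five toy clients with equal data shares, mirroring the 20% split per client.
rng = np.random.default_rng(0)
clients = [
    {"fc.weight": rng.normal(size=(3, 4)), "fc.bias": rng.normal(size=3)}
    for _ in range(5)
]
sizes = [1, 1, 1, 1, 1]
global_model = fedavg(clients, sizes)
```

With equal client sizes, as in the 20% split used here, the weighted average reduces to a plain mean of the local parameters. In a full training loop, the aggregated weights would be sent back to the clients at the start of the next federated round.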

Our simulation indicates that the FL method can result in a large drop of 10.96 percentage points in accuracy (from 78.43% to 67.47%) when comparing the global model with the centralized model. Furthermore, the FL method suffers from fluctuations during the testing phase as illustrated in Figure 5. It may be necessary to apply techniques such as domain adaptation before implementing an FL model, as suggested in [16].

TABLE V: Top performance metrics of Federated/Centralized Learning models using the testing samples.

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Centralized | 78.43 | 82.10 | 83.75 | 82.30 |
| Client1 | 69.65 | 77.99 | 74.81 | 75.06 |
| Client2 | 70.11 | 78.26 | 74.63 | 76.29 |
| Client3 | 67.86 | 77.93 | 75.17 | 74.51 |
| Client4 | 69.20 | 77.15 | 76.92 | 75.74 |
| Client5 | 70.02 | 77.35 | 76.70 | 75.39 |
| Global | 67.47 | 77.78 | 73.93 | 75.25 |

The most favorable results are obtained by the centralized model in all four metrics.

VI. KEY CHALLENGES IN FEDERATED LEARNING FOR HEALTHCARE

This section outlines the main challenges that FL faces in

the healthcare ﬁeld. Figure 6 provides a visual representation

of some critical obstacles.

A. Potential malicious attacks

Although FL is effective in ensuring data privacy, it faces a security risk when transmitting updates between local and global servers via the Internet. As shown in [182], backdoor attacks can be a severe security vulnerability: the backdoored global model misclassifies all backdoored inputs as belonging to the targeted false label, while functioning correctly for regular inputs. FL is also highly vulnerable to Byzantine attacks, as malicious users can manipulate the learning process by creating fraudulent data, which degrades the global model's performance [183]. Moreover, in [184], it was noted that Local Environment Poisoning Attacks can impair the model's performance by contaminating the local training environment. Furthermore, the authors of [185] pointed out that the integrity of the learning model in FL is susceptible to adversarial attacks.

B. Fairness of federated learning

FL may produce biased results in the learning process due

to the varying amounts of data available for each participating

client. Therefore, it is crucial for the global model to ensure

fairness and not discriminate against any group during training.

Failure to achieve fairness can cause some participants to drop

out, leading to a decrease in model performance. In [186],

the Gini coefﬁcient was used to minimize the gap between

each client and improve the fairness of FL. An entropy-based aggregation method that enhances fairness during the training process has also been investigated [187].
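To make the Gini-based notion of fairness concrete, the following minimal sketch computes the Gini coefficient of per-client accuracies. It only measures the performance gap between clients. How [186] incorporates the coefficient into training is not detailed in this survey, so the function `gini` and its use here are illustrative assumptions. The input values are the per-client accuracies reported in Table V.

```python
import numpy as np


def gini(values):
    """Gini coefficient: mean absolute pairwise difference / (2 * mean).

    Returns 0 when all clients perform identically; larger values
    indicate a wider performance gap between clients.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    pairwise = np.abs(x[:, None] - x[None, :]).sum()  # sum over all ordered pairs
    return pairwise / (2.0 * n * n * x.mean())


# Accuracies of Client1..Client5 from Table V.
client_accuracy = [69.65, 70.11, 67.86, 69.20, 70.02]
print(f"Gini over client accuracies: {gini(client_accuracy):.4f}")
```

A value close to zero, as obtained for clients with similar accuracies, indicates that no client is markedly disadvantaged by the shared model.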

C. Heterogeneity of data

The use of diverse scanning technologies across medical

institutions worldwide creates data heterogeneity for the same

symptom, leading to signiﬁcant accuracy loss in many dis-

tributed training of deep neural networks due to non-IID data

[188]. In [189], an FL model uses a cross-correlation matrix to learn a generalizable representation to address heterogeneous data issues. Similarly, [28] introduced client weight perturbation and magnitude normalization to resolve this problem.

Fig. 5: Performance metrics of the Federated/Centralized

Learning model using the testing sample are measured across

different epochs. C1 through C5 denote Client 1 to Client 5,

respectively.

D. Lack of uniﬁed benchmark datasets

While current FL models exhibit impressive performance,

they often employ inconsistent datasets, which hinders ob-

jective assessment of performance metrics. Even when using

the same database, some algorithms require a subset of the

data for their training, often involving human intervention

and subjective evaluation [92]. Therefore, the creation of a

standardized benchmark test set is crucial to ensure objective evaluation of FL models.

Fig. 6: Examples of the main challenges facing FL in healthcare. (Left) The challenge relates to potential malicious attacks where a hacker infiltrates the global model to make parameter adjustments that cause the local model to learn erroneous parameters, ultimately reducing the model's performance. (Middle) The challenge pertains to fairness in FL. To ensure fairness, the parameters are evaluated after each communication cycle before being downloaded and uploaded. (Right) The challenge illustrates data heterogeneity in FL. As the acquisition equipment varies from hospital to hospital, it leads to different data distributions, making the learning process more challenging.

E. Data ownership management and allocation

The limited availability of certain medical data necessitates

making decisions regarding the selection of data for model

training and the distribution of data among participants [190].

Moreover, there are costs involved in managing a signiﬁcant

amount of medical data, and it is imperative to protect the

integrity of the data [191].

F. Medical data quality

Poor data quality, small dataset size, and incomplete label-

ing can prevent model training, especially in medical domains

where acquiring labeled data is challenging [192]. Moreover,

providing false or harmful data to decrease the model’s per-

formance is also a critical problem.

G. Federated learning under multi-client scenario

Our analysis revealed that the number of clients for most FL methods is generally small [30], [31]. Such a small number of clients may not reflect practical deployments. In cases in which a hospital has limited data, particularly fewer than 10 samples, it may be challenging to train a robust model. Robust models often need access to thousands of data samples, which requires many clients to participate. In such scenarios, it is worth considering whether these techniques would still yield impressive results. Furthermore, recent developments in IoT enable numerous devices to collaborate in training a global model, a scenario which is typically named cross-device FL [193]. This scenario presents a challenge in the healthcare field.

VII. FUTURE TRENDS

Fig. 7: Showcases some expected advancements in FL within

the healthcare domain. The left panel displays blockchain-

driven FL, which relies on blockchain technology to enable

cryptographic operations that enhance data privacy. The mid-

dle panel depicts the role of FL in the healthcare metaverse,

demonstrating how this approach can maintain data privacy

and facilitate the development of a large-scale healthcare

metaverse without any leakage issues. The right panel il-

lustrates next-generation-driven FL, which capitalizes on the

high-bandwidth service provided by next-generation networks

to alleviate communication pressure between local and global

servers.

This section outlines the possible future directions of FL in

the healthcare sector. Furthermore, a set of these trends are

illustrated in Figure 7.

A. Federated learning with next generation networks

As previously mentioned, communication exchange be-

tween local and global servers is a key challenge in FL.

The emergence of 6G networks, which are projected to offer

wireless connection speeds up to 1,000 times faster than

5G, presents a potential solution to this challenge [194]. By

leveraging the increased network bandwidth of 6G technology,

FL implementations could achieve more efﬁcient and effective

communication between local and global servers, improving

overall FL performance in the healthcare sector.

B. Federated learning-driven healthcare metaverse

The metaverse, a virtual world concept seen as the successor

to the mobile internet [195], has gained signiﬁcant attention

from both academic and industrial environments. Researchers

in the healthcare sector are particularly interested in the meta-

verse due to its potential to enable remote assistance, medical

simulations, and virtual comparisons of scans [195]. However,

ensuring data privacy remains a challenge in the use of the

metaverse for clinical purposes. Adherence to appropriate pri-

vacy regulations, such as the United States’ Health Insurance

Portability and Accountability Act, is critical to guarantee

data security [195]. Additionally, the medical data obtained

from the metaverse may still suffer from heterogeneity and

data dispersion issues. By leveraging FL, there is potential

for unlocking new opportunities in the realm of intelligent

healthcare metaverse.

C. Blockchain-driven federated learning in healthcare

A blockchain can be described as a publicly accessible ledger that documents all executed transactions, with the chain expanding as additional blocks are added. Decentralization, persistence, anonymity, and auditability are among the key

properties of blockchain technology. These advantages make

it an effective solution for addressing data privacy concerns.

The integration of blockchain technology into FL can enhance

data privacy protection, and this combination has already been

implemented in some instances [196].

D. Contrastive learning and federated learning: solving the data unlabeled problem

Recently, contrastive learning has become a popular re-

search topic for learning unlabeled data representations [197].

This involves training an FL model on unlabeled data to learn

the similarities and differences between samples, allowing the

model to adapt easily to data that lacks labels [197]. Con-

trastive learning has the potential to enhance the performance

of FL models, especially in cases where data sources are

widely dispersed, and some data lack labels. For instance,

contrastive learning was used to reduce the difference between

identical images and increase the discrepancy between differ-

ent images, resulting in improved model performance [198].
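The idea of pulling matching samples together and pushing different samples apart can be sketched with an InfoNCE-style loss. This is one common contrastive formulation, shown here in NumPy as an assumption for illustration. The loss actually used in [198] may differ, and the function name `info_nce` and the `temperature` value are hypothetical choices.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.5):
    """InfoNCE-style loss for a batch of embedding pairs.

    Row i of `anchors` and row i of `positives` are two views of the same
    sample; all other rows in the batch act as negatives.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                        # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_prob))


embeddings = np.eye(3)
aligned_loss = info_nce(embeddings, embeddings)
shuffled_loss = info_nce(embeddings, np.roll(embeddings, 1, axis=0))
print("aligned pairs:", aligned_loss, "mismatched pairs:", shuffled_loss)
```

The loss is lower when each anchor is most similar to its own positive than when the pairs are mismatched, which is the behavior a contrastive objective rewards. No labels are required, only the pairing between two views of the same sample.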

E. Lifelong federated learning

While FL models have shown great success, they tend to

be task-specific, meaning that a model that works well for one
task may not work as well for another. Lifelong learning

aims to overcome this limitation by using a single model that

can continuously improve and adapt to multiple tasks. For

instance, a federated lifelong learning approach was proposed

for landmark localization in medical imaging, which achieved

high performance with a mean distance error of 7.81 on

the BraTS dataset [199]. This approach shows promise for

improving the versatility and efﬁciency of FL models in

medical applications.

F. Generative pre-trained language model-based smart healthcare under federated supervision

Researchers have recently shown great interest in using

ChatGPT in the healthcare sector to generate patient reports

and complementary diagnoses [200]. However, to train Chat-

GPT effectively, a large amount of data is required, which

can lead to potential privacy risks. Therefore, exploring ways

to train a general ChatGPT model using FL methods for

healthcare applications that are both reliable and feasible could

be a valuable avenue for further research.

G. Explainable federated learning in healthcare

The increasing attention given to the explainability of deep

models like CNNs is driven by concerns that if these models

correctly predict a disease but do not focus on abnormal

regions, it may result in a lack of trust in AI among doctors and

patients. FL also requires a global model that is explainable

and capable of providing reliable predictions, especially in the

healthcare sector [16].

VIII. CONCLUSION

In this paper, we discuss the limitations of TL in protecting

data privacy in the healthcare sector. FL is presented as a

potential solution to address privacy concerns by developing

a global model through local training and model aggregation

on decentralized datasets without sharing raw data. However,

FL in healthcare faces its own set of challenges such as

poor data quality, data heterogeneity, and data allocation and

management. We also compare FL with TL and highlight the

advantages of the former approach. The critical steps of FL

are explained in detail, and FL is categorized based on sample

and feature space. The applications of FL in healthcare are

summarized and categorized, along with typical evaluation

metrics and commonly used medical datasets. The reported

case study also sheds light on the importance of FL in

healthcare. It is expected that FL techniques will continue to

be widely used in both academia and hospitals in the near

future. With the aid of advances in science and technology,

we anticipate that FL can be further enhanced to provide more

effective support to patients in the healthcare sector.

ACKNOWLEDGMENTS

This research was funded by the National Natural Science

Foundation of China grant number 82260360, the Guilin Inno-

vation Platform and Talent Program 20222C264164, the Na-

tional Innovation Training Program for College Students under

Grant 202310595083, and the Guangxi Science and Technol-

ogy Base and Talent Project (2022AC18004, 2022AC21040).

REFERENCES

[1] Ahmad Chaddad and Camel Tanougast. Cnn approach for predicting

survival outcome of patients with covid-19. IEEE Internet of Things

Journal, 2023.

[2] Jonathan T Megerian, Sangeeta Dey, Raun D Melmed, Daniel L

Coury, Marc Lerner, Christopher J Nicholls, Kristin Sohl, Rambod

Rouhbakhsh, Anandhi Narasimhan, Jonathan Romain, et al. Evaluation

of an artiﬁcial intelligence-based medical device for diagnosis of autism

spectrum disorder. NPJ digital medicine, 5(1):57, 2022.

[3] Lucas Mearian. Did IBM overhype watson health’s AI promise.

Computerworld, 2019.

[4] Jane Andrew and Max Baker. The general data protection regulation

in the age of surveillance capitalism. Journal of Business Ethics, 168,

01 2021.

[5] Ashish Rauniyar, Desta Haileselassie Hagos, Debesh Jha, Jan Erik

Håkegård, Ulas Bagci, Danda B Rawat, and Vladimir Vlassov.

Federated learning for medical applications: A taxonomy, current

trends, challenges, and future research directions. arXiv preprint

arXiv:2208.03392, 2022.

[6] Chen Zhang, Yu Xie, Hang Bai, Bin Yu, Weihong Li, and Yuan Gao. A

survey on federated learning. Knowledge-Based Systems, 216:106775,

2021.

[7] Jakub Konečný, H Brendan McMahan, Daniel Ramage, and Peter Richtárik.

Federated optimization: Distributed machine learning for

on-device intelligence. arXiv preprint arXiv:1610.02527, 2016.

[8] Marcos F Criado, Fernando E Casado, Roberto Iglesias, Carlos V

Regueiro, and Senén Barro. Non-IID data and continual learning

processes in federated learning: A long road ahead. Information Fusion,

88:263–280, 2022.

[9] Matthew G Crowson, Dana Moukheiber, Aldo Robles Arévalo,

Barbara D Lam, Sreekar Mantena, Aakanksha Rana, Deborah Goss,

David W Bates, and Leo Anthony Celi. A systematic review of

federated learning applications for biomedical data. PLOS Digital

Health, 1(5):e0000033, 2022.

[10] Bingyan Liu, Nuoyan Lv, Yuanchun Guo, and Yawen Li. Recent

advances on federated learning: A systematic survey. arXiv preprint

arXiv:2301.01299, 2023.

[11] Ahmad Chaddad, Lama Hassan, Yousef Katib, and Ahmed Bouridane.

Deep survival analysis with clinical variables for COVID-19. IEEE

Journal of Translational Engineering in Health and Medicine, 11:223–

231, 2023.

[12] Omid Nejati Manzari, Hamid Ahmadabadi, Hossein Kashiani,

Shahriar B Shokouhi, and Ahmad Ayatollahi. Medvit: a robust vision

transformer for generalized medical image classiﬁcation. Computers

in Biology and Medicine, 157:106791, 2023.

[13] Junde Wu, Rao Fu, Huihui Fang, Yuanpei Liu, Zhaowei Wang, Yanwu

Xu, Yueming Jin, and Tal Arbel. Medical sam adapter: Adapting

segment anything model for medical image segmentation. arXiv

preprint arXiv:2304.12620, 2023.

[14] Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang, Xiaoguang Han, Lequan

Yu, Liansheng Wang, and Yizhou Yu. nnformer: Volumetric medical

image segmentation via a 3d transformer. IEEE Transactions on Image

Processing, 2023.

[15] Ji-Jiang Yang, Jian-Qiang Li, and Yu Niu. A hybrid solution for privacy

preserving medical data sharing in the cloud environment. Future

Generation Computer Systems, 43-44:74–86, 2015.

[16] Ahmad Chaddad, Qizong Lu, Jiali Li, Yousef Katib, Reem Kateb,

Camel Tanougast, Ahmed Bouridane, and Ahmed Abdulkadir. Explain-

able, domain-adaptive, and federated artiﬁcial intelligence in medicine.

IEEE/CAA Journal of Automatica Sinica, 10(4):859–876, 2023.

[17] Dinh C. Nguyen, Pubudu N. Pathirana, Ming Ding, and Aruna Senevi-

ratne. BEdgeHealth: A decentralized architecture for edge-based

IoMT networks using blockchain. IEEE Internet of Things Journal,

8(14):11743–11757, 2021.

[18] Li Zhang, Jianbo Xu, Pandi Vijayakumar, Pradip Kumar Sharma, and

Uttam Ghosh. Homomorphic encryption-based privacy-preserving fed-

erated learning in IoT-enabled healthcare system. IEEE Transactions

on Network Science and Engineering, pages 1–17, 2022.

[19] Latif U. Khan, Shashi Raj Pandey, Nguyen H. Tran, Walid Saad, Zhu

Han, Minh N. H. Nguyen, and Choong Seon Hong. Federated learning

for edge networks: Resource optimization and incentive mechanism.

IEEE Communications Magazine, 58(10):88–93, 2020.

[20] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson,

and Blaise Aguera y Arcas. Communication-efﬁcient learning of

deep networks from decentralized data. In Artiﬁcial intelligence and

statistics, pages 1273–1282. PMLR, 2017.

[21] Shuo Wan, Jiaxun Lu, Pingyi Fan, Yunfeng Shao, Chenghui Peng,

Khaled B. Letaief, and Jie Chuai. How global observation works in

federated learning: Integrating vertical training into horizontal federated

learning. IEEE Internet of Things Journal, pages 1–1, 2023.

[22] Jie Xu, Benjamin S Glicksberg, Chang Su, Peter Walker, Jiang Bian,

and Fei Wang. Federated learning for healthcare informatics. Journal

of Healthcare Informatics Research, 5:1–19, 2021.

[23] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet,

Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary

Charles, Graham Cormode, Rachel Cummings, et al. Advances and

open problems in federated learning. Foundations and Trends® in

Machine Learning, 14(1–2):1–210, 2021.

[24] Yang Liu, Tao Fan, Tianjian Chen, Qian Xu, and Qiang Yang. FATE:

An industrial grade platform for collaborative learning with data

protection. The Journal of Machine Learning Research, 22(1):10320–

10325, 2021.

[25] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu,

Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on

transfer learning. Proceedings of the IEEE, 109(1):43–76, 2021.

[26] Yazan Otoum, Yue Wan, and Amiya Nayak. Federated transfer

learning-based IDS for the internet of medical things (IoMT). In 2021

IEEE Globecom Workshops (GC Wkshps), pages 1–6. IEEE, 2021.

[27] Mohammed Adnan, Shivam Kalra, Jesse C Cresswell, Graham W

Taylor, and Hamid R Tizhoosh. Federated learning and differential

privacy for medical image analysis. Scientiﬁc reports, 12(1):1953,

2022.

[28] Meirui Jiang, Zirui Wang, and Qi Dou. HarmoFL: Harmonizing

local and global drifts in federated learning on heterogeneous medical

images. Proceedings of the AAAI Conference on Artiﬁcial Intelligence,

36(1):1087–1095, 6 2022.

[29] Amelia Jiménez-Sánchez, Mickael Tardy, Miguel A. González

Ballester, Diana Mateus, and Gemma Piella. Memory-aware curriculum

federated learning for breast cancer classiﬁcation. Computer Methods

and Programs in Biomedicine, 229:107318, 2023.

[30] Jun Luo and Shandong Wu. FedSLD: Federated learning with shared

label distribution for medical image classiﬁcation. In 2022 IEEE 19th

International Symposium on Biomedical Imaging (ISBI), pages 1–5.

IEEE, 2022.

[31] Rui Yan, Liangqiong Qu, Qingyue Wei, Shih-Cheng Huang, Liyue

Shen, Daniel Rubin, Lei Xing, and Yuyin Zhou. Label-efﬁcient self-

supervised federated learning for tackling data heterogeneity in medical

imaging. arXiv preprint arXiv:2205.08576, 2022.

[32] Yiqing Shen, Yuyin Zhou, and Lequan Yu. CD2-pFed: Cyclic

distillation-guided channel decoupling for model personalization in

federated learning. In Proceedings of the IEEE/CVF Conference

on Computer Vision and Pattern Recognition (CVPR), pages 10041–

10050, June 2022.

[33] S. Maryam Hosseini, Milad Sikaroudi, Morteza Babaie, and H.R.

Tizhoosh. Proportionally fair hospital collaborations in federated

learning of histopathology images. IEEE Transactions on Medical

Imaging, pages 1–1, 2023.

[34] Jeffry Wicaksana, Zengqiang Yan, Xin Yang, Yang Liu, Lixin Fan, and

Kwang-Ting Cheng. Customized federated learning for multi-source

decentralized medical image classiﬁcation. IEEE Journal of Biomedical

and Health Informatics, 26(11):5596–5607, 2022.

[35] Meirui Jiang, Hongzheng Yang, Xiaoxiao Li, Quande Liu, Pheng-

Ann Heng, and Qi Dou. Dynamic bank learning for semi-supervised

federated image diagnosis with class imbalance. In Medical Image

Computing and Computer Assisted Intervention–MICCAI 2022: 25th

International Conference, Singapore, September 18–22, 2022, Proceed-

ings, Part III, pages 196–206. Springer, 2022.

[36] Wang Lu, Jindong Wang, Yiqiang Chen, Xin Qin, Renjun Xu, Dimitrios

Dimitriadis, and Tao Qin. Personalized federated learning with adaptive

batchnorm for healthcare. IEEE Transactions on Big Data, pages 1–1,

2022.

[37] Xing Wu, Jie Pei, Cheng Chen, Yimin Zhu, Jianjia Wang, Quan Qian,

Jian Zhang, Qun Sun, and Yike Guo. Federated active learning for

multicenter collaborative disease diagnosis. IEEE Transactions on

Medical Imaging, pages 1–1, 2022.

[38] Dinh C. Nguyen, Ming Ding, Pubudu N. Pathirana, Aruna Seneviratne,

and Albert Y. Zomaya. Federated learning for COVID-19 detection

with generative adversarial networks in edge cloud computing. IEEE

Internet of Things Journal, 9(12):10257–10271, 2022.

[39] Judith Sáinz-Pardo Díaz and Álvaro López García. Study of the

performance and scalability of federated learning for medical imaging

with intermittent clients. Neurocomputing, 518:142–154, 2023.

[40] Saleh Baghersalimi, Tomas Teijeiro, David Atienza, and Amir Amini-

far. Personalized real-time federated learning for epileptic seizure

detection. IEEE Journal of Biomedical and Health Informatics,

26(2):898–909, 2022.

[41] Marc Jayson Baucas, Petros Spachos, and Konstantinos N Plataniotis.

Federated learning and blockchain-enabled Fog-IoT platform for wear-

ables in predictive healthcare. IEEE Transactions on Computational

Social Systems, 2023.

[42] Jiayun Zhang, Xiyuan Zhang, Xinyang Zhang, Dezhi Hong, Rajesh K

Gupta, and Jingbo Shang. Federated learning with client-exclusive

classes. arXiv preprint arXiv:2301.00489, 2023.

[43] Kaixuan Zhang, Xiulong Liu, Xin Xie, Jiuwu Zhang, Bingxin Niu, and

Keqiu Li. A cross-domain federated learning framework for wireless

human sensing. IEEE Network, 36(5):122–128, 2022.

[44] Shuai Liu, Xiao Guo, Shun Qi, Huaning Wang, and Xiangyu Chang.

Learning personalized brain functional connectivity of MDD patients

from multiple sites via federated bayesian networks. arXiv preprint

arXiv:2301.02423, 2023.

[45] Liang Peng, Nan Wang, Nicha Dvornek, Xiaofeng Zhu, and Xiaoxiao

Li. FedNI: Federated graph learning with network inpainting for

population-based disease prediction. IEEE Transactions on Medical

Imaging, pages 1–1, 2022.

[46] Hasan Kassem, Deepak Alapatt, Pietro Mascagni, Consortium

AI4SafeChole, Alexandros Karargyris, and Nicolas Padoy. Federated

cycling (fedcy): Semi-supervised federated learning of surgical phases.

IEEE Transactions on Medical Imaging, 2022.

[47] Pranav Kulkarni, Adway Kanhere, Paul H Yi, and Vishwa S

Parekh. Surgical aggregation: A federated learning framework for

harmonizing distributed datasets with diverse tasks. arXiv preprint

arXiv:2301.06683, 2023.

[48] Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz.

Review the cancer genome atlas (TCGA): an immeasurable source

of knowledge. Contemporary Oncology/Współczesna Onkologia,

2015(1):68–77, 2015.

[49] Peter Bandi, Oscar G. F. Geessink, Quirine F. Manson, Marcory Crf

van Dijk, Maschenka C. A. Balkenhol, Meyke Hermsen, and Babak

Ehteshami Bejnordi et al. From detection of individual metastases to

classiﬁcation of lymph node status at the patient level: The CAME-

LYON17 challenge. IEEE Transactions on Medical Imaging, 38:550–

560, 2019.

[50] Heejo Kong, Gun-Hee Lee, Suneung Kim, and Seong-Whan Lee.

Pruning-guided curriculum learning for semi-supervised semantic seg-

mentation. In Proceedings of the IEEE/CVF Winter Conference on

Applications of Computer Vision (WACV), pages 5914–5923, January

2023.

[51] Yi Chang, Meiya Chen, Changfeng Yu, Yi Li, Liqun Chen, and Luxin

Yan. Direction and residual awareness curriculum learning network

for rain streaks removal. IEEE Transactions on Neural Networks and

Learning Systems, pages 1–15, 2023.

[52] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston.

Curriculum learning. In International Conference on Machine

Learning, 2009.

[53] Jiancheng Yang, Rui Shi, and Bingbing Ni. MedMNIST classiﬁca-

tion decathlon: A lightweight AutoML benchmark for medical image

analysis. In 2021 IEEE 18th International Symposium on Biomedical

Imaging (ISBI), pages 191–195. IEEE, 2021.

[54] Noel CF Codella, David Gutman, M Emre Celebi, Brian Helba,

Michael A Marchetti, Stephen W Dusza, Aadi Kalloo, Konstantinos

Liopyris, Nabin Mishra, Harald Kittler, et al. Skin lesion analysis

toward melanoma detection: A challenge at the 2017 international

symposium on biomedical imaging (ISBI), hosted by the international

skin imaging collaboration (ISIC). In 2018 IEEE 15th international

symposium on biomedical imaging (ISBI 2018), pages 168–172. IEEE,

2018.

[55] Marc Combalia, Noel CF Codella, Veronica Rotemberg, Brian Helba,

Veronica Vilaplana, Ofer Reiter, Cristina Carrera, Alicia Barreiro,

Allan C Halpern, Susana Puig, et al. BCN20000: Dermoscopic lesions

in the wild. arXiv preprint arXiv:1908.02288, 2019.

[56] Veronica Rotemberg, Nicholas Kurtansky, Brigid Betz-Stablein, Liam

Caffery, Emmanouil Chousakos, Noel Codella, Marc Combalia,

Stephen Dusza, Pascale Guitera, David Gutman, et al. A patient-

centric dataset of images and metadata for identifying melanomas using

clinical context. Scientiﬁc data, 8(1):34, 2021.

[57] Xiaosong Ma, Jie Zhang, Song Guo, and Wenchao Xu. Layer-

wised model aggregation for personalized federated learning. In

Proceedings of the IEEE/CVF Conference on Computer Vision and

Pattern Recognition (CVPR), pages 10092–10101, June 2022.

[58] Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopou-

los, and Yasaman Khazaeni. Federated learning with matched averag-

ing. arXiv preprint arXiv:2002.06440, 2020.

[59] Sannara Ek, François Portet, Philippe Lalanda, and German

Vega. A federated learning aggregation algorithm for pervasive

computing: Evaluation and comparison. In 2021 IEEE International

Conference on Pervasive Computing and Communications (PerCom),

pages 1–10, 2021.

[60] Sannara Ek, François Portet, Philippe Lalanda, and German Vega.

Evaluation and comparison of federated learning algorithms for human

activity recognition on smartphones. Pervasive and Mobile Computing,

87:101714, 2022.

[61] Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The HAM10000

dataset, a large collection of multi-source dermatoscopic images of

common pigmented skin lesions. Scientiﬁc data, 5(1):1–9, 2018.

[62] Juexiao Zhou, Longxi Zhou, Di Wang, Xiaopeng Xu, Haoyang

Li, Yuetan Chu, Wenkai Han, and Xin Gao. Personalized and

privacy-preserving federated heterogeneous medical image analysis

with PPPML-HMI. medRxiv, pages 2023–02, 2023.

[63] Unais Sait, KG Lal, S Prajapati, Rahul Bhaumik, Tarun Kumar,

S Sanjana, and Kriti Bhalla. Curated dataset for COVID-19 posterior-

anterior chest radiography images (X-Rays). Mendeley Data, 1, 2020.

[64] Kang Zhang, Xiaohong Liu, Jun Shen, Zhihuan Li, Ye Sang, Xingwang

Wu, Yunfei Zha, Wenhua Liang, Chengdi Wang, Ke Wang, et al.

Clinically applicable AI system for accurate diagnosis, quantitative

measurements, and prognosis of COVID-19 pneumonia using computed

tomography. Cell, 181(6):1423–1433, 2020.

[65] Tulin Ozturk, Muhammed Talo, Eylul Azra Yildirim, Ulas Baran

Baloglu, Ozal Yildirim, and U. Rajendra Acharya. Automated detection

of COVID-19 cases using deep neural networks with X-ray images.

Computers in Biology and Medicine, 121:103792, 2020.

[66] Parnian Afshar, Shahin Heidarian, Farnoosh Naderkhani, Anastasia

Oikonomou, Konstantinos N. Plataniotis, and Arash Mohammadi.

COVID-CAPS: A capsule network-based framework for identiﬁcation

of COVID-19 cases from X-ray images. Pattern Recognition Letters,

138:638–643, 2020.

[67] Rachel Lea Draelos, David Dov, Maciej A Mazurowski, Joseph Y Lo,

Ricardo Henao, Geoffrey D Rubin, and Lawrence Carin. Machine-

learning-based multiple abnormality prediction with large-scale chest

computed tomography volumes. Medical image analysis, 67:101857,

2021.

[68] Daniel S Kermany, Michael Goldbaum, Wenjia Cai, Carolina CS

Valentim, Huiying Liang, Sally L Baxter, Alex McKeown, Ge Yang,

Xiaokang Wu, Fangbing Yan, et al. Identifying medical diagnoses and

treatable diseases by image-based deep learning. Cell, 172(5):1122–

1131, 2018.

[69] World Health Organization et al. Epilepsy: a public health imperative.

2019. This report provides an overview of the challenges of epilepsy

diagnosis and treatment throughout the world, highlighting the gaps

between high-income and low-income countries, 2020.

[70] Lijuan Duan, Zeyu Wang, Yuanhua Qiao, Yue Wang, Zhaoyang Huang,

and Baochang Zhang. An automatic method for epileptic seizure

detection based on deep metric learning. IEEE Journal of Biomedical

and Health Informatics, 26(5):2147–2157, 2022.

[71] Matthias Ihle, Hinnerk Feldwisch-Drentrup, César A. Teixeira, Adrien

Witon, Björn Schelter, Jens Timmer, and Andreas Schulze-Bonhage.

EPILEPSIAE – a European epilepsy database. Computer Methods and

Programs in Biomedicine, 106(3):127–138, 2012.

[72] Yang Li, Guanci Yang, Zhidong Su, Shaobo Li, and Yang Wang.

Human activity recognition based on multienvironment sensor data.

Information Fusion, 91:47–63, 2023.

[73] Attila Reiss and Didier Stricker. Introducing a new benchmarked

dataset for activity monitoring. In 2012 16th International Symposium

on Wearable Computers, pages 108–109, 2012.

[74] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra,

Jorge Luis Reyes-Ortiz, et al. A public domain dataset for human

activity recognition using smartphones. In Esann, volume 3, page 3,

2013.

[75] Saori C Tanaka, Ayumu Yamashita, Noriaki Yahata, Takashi Itahashi,

Giuseppe Lisi, Takashi Yamada, Naho Ichikawa, Masahiro Takamura,

Yujiro Yoshihara, Akira Kunimatsu, et al. A multi-site, multi-disorder

resting-state magnetic resonance image database. Scientiﬁc data,

8(1):227, 2021.

[76] Marco Solmi, Minjin Song, Dong Keon Yon, Seung Won Lee, Eric

Fombonne, Min Seo Kim, Seoyeon Park, Min Ho Lee, Jimin Hwang,

Roberto Keller, et al. Incidence, prevalence, and global burden of

autism spectrum disorder from 1990 to 2019 across 204 countries.

Molecular Psychiatry, pages 1–9, 2022.

[77] Ziqi Tang, Kangway V Chuang, Charles DeCarli, Lee-Way Jin, Laurel

Beckett, Michael J Keiser, and Brittany N Dugger. Interpretable

classiﬁcation of Alzheimer’s disease pathologies with a convolutional

neural network pipeline. Nature communications, 10(1):2173, 2019.

[78] Jingsheng Deng, Md Rakibul Hasan, Minhaz Mahmud, Md Mahbub

Hasan, Khandaker Asif Ahmed, and Md Zakir Hossain. Diagnosing

autism spectrum disorder using ensemble 3D-CNN: A preliminary

study. In 2022 IEEE International Conference on Image Processing

(ICIP), pages 3480–3484. IEEE, 2022.

[79] Hongming Li and Yong Fan. Brain decoding from functional MRI

using long short-term memory recurrent neural networks. In Medical

Image Computing and Computer Assisted Intervention–MICCAI 2018:

21st International Conference, Granada, Spain, September 16-20,

2018, Proceedings, Part III 11, pages 320–328. Springer, 2018.

[80] V Pream Sudha and MS Vijaya. Recurrent neural network based

model for autism spectrum disorder prediction using codon encoding.

Journal of The Institution of Engineers (India): Series B, pages 1–7,

2021.

[81] Hao Zhang, Ran Song, Liping Wang, Lin Zhang, Dawei Wang, Cong

Wang, and Wei Zhang. Classiﬁcation of brain disorders in rs-fMRI via

local-to-global graph neural networks. IEEE Transactions on Medical

Imaging, pages 1–1, 2022.

[82] Adriana Di Martino, Chao-Gan Yan, Qingyang Li, Erin Denio, Fran-

cisco X Castellanos, Kaat Alaerts, Jeffrey S Anderson, Michal Assaf,

Susan Y Bookheimer, Mirella Dapretto, et al. The autism brain imaging

data exchange: towards a large-scale evaluation of the intrinsic brain

architecture in autism. Molecular psychiatry, 19(6):659–667, 2014.

[83] Alzheimer’s disease neuroimaging initiative (ADNI). https://adni.loni.usc.edu.

Accessed: March 18, 2023.

[84] Xinpeng Ding and Xiaomeng Li. Exploring segment-level semantics

for online phase recognition from surgical videos. IEEE Transactions

on Medical Imaging, 41(11):3309–3319, 2022.

[85] Andru P. Twinanda, Sherif Shehata, Didier Mutter, Jacques Marescaux,

Michel de Mathelin, and Nicolas Padoy. EndoNet: A deep architecture

for recognition tasks on laparoscopic videos. IEEE Transactions on

Medical Imaging, 36(1):86–97, 2017.

[86] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi

Bagheri, and Ronald M Summers. ChestX-ray8: Hospital-scale chest

X-ray database and benchmarks on weakly-supervised classiﬁcation

and localization of common thorax diseases. In Proceedings of the

IEEE conference on computer vision and pattern recognition, pages

2097–2106, 2017.

[87] Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-

Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball,

Katie Shpanskaya, et al. CheXpert: A large chest radiograph dataset

with uncertainty labels and expert comparison. In Proceedings of the

AAAI conference on artiﬁcial intelligence, volume 33, pages 590–597,

2019.

[88] Zelei Liu, Yuanyuan Chen, Yansong Zhao, Han Yu, Yang Liu,

Renyi Bao, Jinpeng Jiang, Zaiqing Nie, Qian Xu, and Qiang Yang.

Contribution-aware federated learning for smart healthcare. Proceed-

ings of the AAAI Conference on Artiﬁcial Intelligence, 36(11):12396–

12404, 6 2022.

[89] Quande Liu, Qi Dou, Lequan Yu, and Pheng Ann Heng. MS-Net: multi-

site network for improving prostate segmentation with heterogeneous

MRI data. IEEE Transactions on Medical Imaging, 39(9):2713–2724,

2020.

[90] Meilu Zhu, Zhen Chen, and Yixuan Yuan. FedDM: Federated weakly

supervised segmentation via annotation calibration and gradient de-

conﬂicting. IEEE Transactions on Medical Imaging, pages 1–1, 2023.

[91] Geert Litjens, Robert Toth, Wendy Van De Ven, Caroline Hoeks, Sjoerd

Kerkstra, Bram van Ginneken, Graham Vincent, Gwenael Guillard,

Neil Birbeck, Jindang Zhang, et al. Evaluation of prostate segmentation

algorithms for MRI: the PROMISE12 challenge. Medical image

analysis, 18(2):359–373, 2014.

[92] An Xu, Wenqi Li, Pengfei Guo, Dong Yang, Holger R. Roth, Ali

Hatamizadeh, Can Zhao, Daguang Xu, Heng Huang, and Ziyue Xu.

Closing the generalization gap of cross-silo federated medical image

segmentation. In Proceedings of the IEEE/CVF Conference on Com-

puter Vision and Pattern Recognition (CVPR), pages 20866–20875,

June 2022.

[93] Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani,

Annette Kopp-Schneider, Bennett A Landman, Geert Litjens, Bjoern

Menze, Olaf Ronneberger, Ronald M Summers, et al. The medical

segmentation decathlon. Nature communications, 13(1):4128, 2022.

[94] Guillaume Lemaître, Robert Martí, Jordi Freixenet, Joan C. Vilanova,

Paul M. Walker, and Fabrice Meriaudeau. Computer-aided detection

and diagnosis for prostate cancer based on mono and multi-parametric

MRI: A review. Computers in Biology and Medicine, 60:8–31, 2015.

[95] NCI ISBI dataset. https://www.cancerimagingarchive.net/. Accessed:

March 18, 2023.

[96] Prostatex dataset. https://prostatex.grand-challenge.org/, 2022.

Accessed: March 18, 2023.

[97] Zhen Chen, Chen Yang, Meilu Zhu, Zhe Peng, and Yixuan Yuan.

Personalized retrogress-resilient federated learning toward imbalanced

medical data. IEEE Transactions on Medical Imaging, 41(12):3663–

3674, 2022.

[98] Nicolas Wagner, Moritz Fuchs, Yuri Tolkach, and Anirban Mukhopad-

hyay. Federated stain normalization for computational pathology.

In Medical Image Computing and Computer Assisted Intervention–

MICCAI 2022: 25th International Conference, Singapore, September

18–22, 2022, Proceedings, Part II, pages 14–23. Springer, 2022.

[99] Wouter Bulten, Péter Bándi, Jeffrey Hoven, Rob van de Loo, Johannes

Lotz, Nick Weiss, Jeroen van der Laak, Bram van Ginneken, Christina

Hulsbergen-van de Kaa, and Geert Litjens. Epithelium segmentation

using deep learning in H&E-stained prostate specimens with immuno-

histochemistry as reference standard. Scientiﬁc reports, 9(1):864, 2019.

[100] Jeffry Wicaksana, Zengqiang Yan, Dong Zhang, Xijie Huang, Huimin

Wu, Xin Yang, and Kwang-Ting Cheng. FedMix: Mixed supervised

federated learning for medical image segmentation. IEEE Transactions

on Medical Imaging, 2022.

[101] Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly

Fahmy. Dataset of breast ultrasound images. Data in brief, 28:104863,

2020.

[102] Yingtao Zhang, Min Xian, Heng-Da Cheng, Bryar Shareef, Jianrui

Ding, Fei Xu, Kuan Huang, Boyu Zhang, Chunping Ning, and Ying

Wang. BUSIS: a benchmark for breast ultrasound image segmentation.

In Healthcare, volume 10, page 729. MDPI, 2022.

[103] Moi Hoon Yap, Gerard Pons, Joan Martí, Sergi Ganau, Melcior Sentís,

Reyer Zwiggelaar, Adrian K. Davison, and Robert Martí. Automated

breast ultrasound lesions detection using convolutional neural networks.

IEEE Journal of Biomedical and Health Informatics, 22(4):1218–1226,

2018.

[104] Bernardo Camajori Tedeschini, Stefano Savazzi, Roman Stoklasa,

Luca Barbieri, Ioannis Stathopoulos, Monica Nicoli, and Luigi Serio.

Decentralized federated learning for healthcare networks: A case study

on tumor segmentation. IEEE Access, 10:8693–8708, 2022.

[105] Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello,

Martin Rozycki, Justin S Kirby, John B Freymann, Keyvan Farahani,

and Christos Davatzikos. Advancing the cancer genome atlas glioma

MRI collections with expert segmentation labels and radiomic features.

Scientiﬁc data, 4(1):1–13, 2017.

[106] Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus

Rempﬂer, Alessandro Crimi, Russell Takeshi Shinohara, Christoph

Berger, Sung Min Ha, Martin Rozycki, et al. Identifying the best

machine learning algorithms for brain tumor segmentation, progression

assessment, and overall survival prediction in the BRATS challenge.

arXiv preprint arXiv:1811.02629, 2018.

[107] Xuan Gong, Liangchen Song, Rishi Vedula, Abhishek Sharma, Meng

Zheng, Benjamin Planche, Arun Innanje, Terrence Chen, Junsong Yuan,

David Doermann, et al. Federated learning with privacy-preserving

ensemble attention distillation. IEEE Transactions on Medical Imaging,

2022.

[108] Miao Zhang, Liangqiong Qu, Praveer Singh, Jayashree Kalpathy-

Cramer, and Daniel L. Rubin. SplitAVG: A heterogeneity-aware

federated deep learning method for medical imaging. IEEE Journal

of Biomedical and Health Informatics, 26(9):4635–4644, 2022.

[109] MICCAI BRATS 2017. https://sites.google.com/site/braintumorsegmentation/,

2017. Accessed: March 18, 2023.

[110] Adway U Kanhere, Pranav Kulkarni, Paul H Yi, and Vishwa S

Parekh. SegViz: A federated learning framework for medical image

segmentation from distributed datasets with different and incomplete

annotations. arXiv preprint arXiv:2301.07074, 2023.

[111] Synapse — sage bionetworks. https://www.synapse.org. Accessed:

March 18, 2023.

[112] Sangjoon Park and Jong Chul Ye. Multi-task distributed learning using

vision transformer with random patch permutation. IEEE Transactions

on Medical Imaging, 42(7):2091–2105, 2023.

[113] Pneumothorax dataset.

https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/overview, 2018. Accessed:

March 18, 2023.

[114] Junxiu Liu, Xiuhao Liang, Rixing Yang, Yuling Luo, Hao Lu, Liangjia

Li, Shunsheng Zhang, and Su Yang. Federated learning-based vertebral

body segmentation. Engineering Applications of Artiﬁcial Intelligence,

116:105451, 2022.

[115] 2019.kaggle:spinesagt2wdataset3.

https://www.kaggle.com/datasets/dutianze/mri-dataset, 2019. Accessed: March 18, 2023.

[116] Gokberk Elmas, Salman UH Dar, Yilmaz Korkmaz, Emir Ceyani,

Burak Susam, Muzaffer Ozbey, Salman Avestimehr, and Tolga Çukur.

Federated learning of generative image priors for MRI reconstruction.

IEEE Transactions on Medical Imaging, pages 1–1, 2022.

[117] IXI dataset. http://brain-development.org/ixi-dataset/. Accessed: March

18, 2023.

[118] Chun-Mei Feng, Yunlu Yan, Shanshan Wang, Yong Xu, Ling Shao,

and Huazhu Fu. Speciﬁcity-preserving federated learning for mr image

reconstruction. IEEE Transactions on Medical Imaging, 42(7):2010–

2021, 2023.

[119] Florian Knoll, Jure Zbontar, Anuroop Sriram, Matthew J Muckley,

Mary Bruno, Aaron Defazio, Marc Parente, Krzysztof J Geras, Joe

Katsnelson, Hersh Chandarana, et al. fastMRI: A publicly available

raw k-space and DICOM dataset of knee images for accelerated MR

image reconstruction using machine learning. Radiology: Artiﬁcial

Intelligence, 2(1):e190007, 2020.

[120] Chunhui Du, Hao He, and Yaohui Jin. Contrast with major classiﬁer

vectors for federated medical relation extraction with heterogeneous

label distribution. arXiv preprint arXiv:2301.05376, 2023.

[121] Özlem Uzuner, Brett R South, Shuying Shen, and Scott L DuVall. 2010

i2b2/VA challenge on concepts, assertions, and relations in clinical text.

Journal of the American Medical Informatics Association, 18(5):552–

556, 2011.

[122] Anshul Thakur, Pulkit Sharma, and David A. Clifton. Dynamic neural

graphs based federated reptile for semi-supervised multi-tasking in

healthcare applications. IEEE Journal of Biomedical and Health

Informatics, 26(4):1761–1772, 2022.

[123] Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman,

Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter

Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, a freely

accessible critical care database. Scientiﬁc data, 3(1):1–9, 2016.

[124] D Kai Zhang, Francesca Toni, and Matthew Williams. A federated cox

model with non-proportional hazards. In Multimodal AI in healthcare:

A paradigm shift in health intelligence, pages 171–185. Springer, 2022.

[125] GBSG dataset. https://www.gbg.de/en/. Accessed: March 18, 2023.

[126] Usman Ahmed, Jerry Chun-Wei Lin, and Gautam Srivastava. Hyper-

graph attention based federated learning method for mental health
