IFL-GAN: Improved Federated Learning Generative
Adversarial Network With Maximum Mean
Discrepancy Model Aggregation
Wei Li, Member, IEEE, Jinlin Chen, Zhenyu Wang, Zhidong Shen, Chao Ma, Member, IEEE, and Xiaohui Cui
Abstract—The generative adversarial network (GAN) is usually built from centralized, independent identically distributed (i.i.d.) training data to generate realistic-looking instances. In real-world applications, however, the data may be distributed over multiple clients and hard to gather due to bandwidth, departmental coordination, or storage concerns. Although existing works, such as federated learning GAN (FL-GAN), adopt different distributed strategies to train GAN models, they still have limitations when data are distributed in a non-i.i.d. manner: they suffer from convergence difficulty and produce generated data of low quality. We found that these challenges are often due to the use of a federated averaging strategy to aggregate local GAN models' updates. In this article, we propose an alternative approach that learns a globally shared GAN model by aggregating locally trained generators' updates with maximum mean discrepancy (MMD); we term this approach improved FL-GAN (IFL-GAN). The MMD score lets each local GAN carry a different weight, which makes the global GAN in IFL-GAN converge more rapidly than under federated averaging. Extensive experiments on the MNIST, CIFAR10, and SVHN datasets demonstrate the significant improvement of IFL-GAN in both achieving the highest inception score and producing high-quality instances.

Index Terms—Federated learning, generative adversarial network (GAN), maximum mean discrepancy (MMD), non-independent identically distributed (non-i.i.d.) training data.
Manuscript received January 2, 2020; revised November 30, 2020 and
January 2, 2022; accepted April 10, 2022. This work was supported in
part by the Fundamental Research Funds for the Central Universities under
Grant JUSRP121073 and Grant JUSRP521004, in part by the 2021 Jiangsu
Shuangchuang (Mass Innovation and Entrepreneurship) Talent Program under
Grant JSSCBS20210827, and in part by the Open Foundation of Engineering
Research Center of Cyberspace under Grant KJAQ202112014. (Corresponding authors: Jinlin Chen; Xiaohui Cui.)
Wei Li is with the Science Center for Future Foods, the School of Artificial
Intelligence and Computer Science, and the Jiangsu Key Laboratory of Media
Design and Software Technology, Jiangnan University, Wuxi, Jiangsu 214122,
China (e-mail: cs_weili@jiangnan.edu.cn).
Jinlin Chen is with the Department of Computing, The Hong Kong
Polytechnic University, Hong Kong (e-mail: csjlchen@comp.polyu.edu.hk).
Zhenyu Wang is with the School of Computer Science, Wuhan University,
Wuhan 430072, China, and also with the Jiaxing Institute of Future Food,
Jiaxing, Zhejiang 314005, China (e-mail: zhenyuwang@whu.edu.cn).
Zhidong Shen, Chao Ma, and Xiaohui Cui are with the School of Cyber
Science and Engineering, Wuhan University, Wuhan 430072, China (e-mail:
shenzd@whu.edu.cn; whmachao@ieee.org; xcui@whu.edu.cn).
Color versions of one or more figures in this article are available at
https://doi.org/10.1109/TNNLS.2022.3167482.
Digital Object Identifier 10.1109/TNNLS.2022.3167482
I. INTRODUCTION
THE generative adversarial network (GAN) [6] has been demonstrated to be a powerful generative model that casts generative modeling as a game between two networks: a discriminator D and a generator G. The discriminator D estimates the probability that a sample came from the training data rather than from the generator G. In the training of GAN, G is viewed as a forger that specializes in generating simulation data to fool D into accepting it as real. The GAN model sidesteps the difficulty of approximating many intractable probabilistic computations; it samples data from an easy-to-sample distribution, so Markov chains [25] are never needed. Its gradients are tuned using backpropagation, which makes the training computationally inexpensive. Although GAN has been successfully applied to many real-world applications [16]–[18], [28], training a GAN is still a challenge because a high-capacity GAN is usually built from centralized, independent identically distributed (i.i.d.) training data. Such a scenario is not often the case in practice. For example, different departments in the same domain (e.g., The State Food and Drug Bureau and The Animal Husbandry and Veterinary Bureau) collect data that are of concern to their own department. However, these data may be non-i.i.d. and hard to gather due to departmental coordination and bandwidth concerns, so the collected data are distributed over clients in a non-i.i.d. manner, which makes it difficult to train a GAN under such a scenario. A non-i.i.d. example is shown in Fig. 1.
GAN variants, such as MD-GAN [9] and federated learning GAN (FL-GAN) [9], have been proposed to address this problem. FL-GAN trains several local GAN models with the federated averaging algorithm to learn a shared model by aggregating locally computed updates. Although these methods employ multiple GAN models and deploy each GAN (GAN_i) to Client_i, they still have limitations. MD-GAN (one single generator and multiple discriminators) is trained over multiple datacenters through the wide-area network (WAN) [11]. However, such a training strategy is very expensive and complicated. Moreover, the server usually generates k batches at each global iteration. All the N clients receive and compute the feedback on the same training batch if k = 1, and no feedback has a conflict on concurrently processed data if k = N.
Fig. 1. Non-i.i.d. data are distributed over different clients. Each client Client_i holds few categories, while the quantity of data stored in Client_i may be different.
Thus, MD-GAN usually reduces the server workload by sacrificing the diversity of generated instances. FL-GAN, on the other hand, applies a federated learning strategy [21] to GAN models and trains all models with federated averaging. The federated averaging strategy produces promising results on i.i.d. training data [23]. However, it degrades the performance of models in the non-i.i.d. case [20]: it produces generated data of low quality and needs a longer training time to converge. A detailed discussion is given in Section V.
To address this challenge, we propose improved FL-GAN (IFL-GAN), which learns a globally shared GAN model (global GAN) by aggregating locally trained generators' updates with maximum mean discrepancy (MMD) [8]. When the federated averaging strategy is employed for learning a global GAN, all locally trained generators are treated as contributors with exactly the same weights, which is not always suitable in many real-world applications. For instance, suppose we have K local GAN models denoted as GAN_1, ..., GAN_K, and GAN_1, ..., GAN_i reach the Nash equilibrium much earlier than the rest (i.e., GAN_{i+1}, ..., GAN_K), mainly due to the non-i.i.d. property of the distributed training data. Under the federated averaging strategy, this obviously causes the global GAN to take a much longer time to converge. The reason is that local models' (GAN_1, ..., GAN_K) updates are triggered by the global GAN, i.e., $\mathrm{GAN}(x;\theta_{\mathrm{GAN}_i}) = \mathrm{GAN}(x;\theta_{\mathrm{GAN}_{global}})$ after updating the global GAN parameters with $\mathrm{GAN}(x;\theta_{\mathrm{GAN}_{global}}) = \frac{1}{K}\sum_{i=1}^{K}\mathrm{GAN}(x;\theta_{\mathrm{GAN}_i})$. This causes local GAN models that have already reached the Nash equilibrium to jump out of the equilibrium. Fortunately, this problem can be significantly alleviated by replacing the averaging strategy with MMD, because MMD tends to assign larger weights to the local GAN models that are still far from convergence. In this manner, MMD makes all locally trained GAN models converge more rapidly, which greatly reduces the training time of the global GAN. In recent GAN research, MMD is introduced to calculate the supremum of the difference between source samples and target samples (a.k.a. the two-sample test). It has been theoretically proved and empirically shown that the MMD score effectively reflects the performance of GAN models [15], [31]. A detailed comparison between MMD and the federated averaging method is presented in Section IV.
In summary, the major innovations and contributions of this study are described as follows.
1) This article proposes a novel approach, named IFL-GAN, to train GAN with MMD from decentralized, non-i.i.d. training data.
2) This article compares the federated averaging strategy with MMD from both theoretical and empirical perspectives, giving new insights into the success of IFL-GAN.
3) Through comprehensive experiments on three datasets with different resolutions, we demonstrate the effectiveness of IFL-GAN.
The rest of this article is organized as follows. In Section II, existing works are discussed. The GAN model and federated learning are reviewed in Section III. Our IFL-GAN is introduced in detail in Section IV. In Section V, the empirical evaluation is conducted. Section VI concludes this article.
II. RELATED WORK
Training neural networks in a federated fashion is a thriving direction that attracts many researchers. Here, we categorize the research works on federated learning according to the aspects they focus on.
A. Improving Communication Efficiency Among Models
Since the data are distributed over a set of clients and the networks held by these clients need to communicate and exchange information, researchers focus on reducing network latency and improving communication efficiency; unstable communication might cause devices to go offline, which makes federated learning more challenging. Smith et al. [27] showed that multitask learning is naturally suited to handling the statistical challenge and proposed a new system-aware optimization approach, MOCHA, whose goal is to achieve significant speedups when operating federated learning on separate devices. Bonawitz et al. [3] paid attention to device availability, unreliable device connectivity, and interrupted execution; they built a scalable production system for federated learning on tens of millions of real-world devices. Although these studies have improved the communication efficiency among models, they pay less attention to improving the performance of the models.
B. Integrating Federated Learning With Other Schemes
Some researchers try to integrate federated learning [19], [34] with other deep learning models and apply the integrated models to real-world applications. Zhuo et al. [34] integrated reinforcement learning with federated learning because different departments' decision policies are private and hard to share with each other, while building individual decision policies of high quality is a nontrivial task; it is therefore necessary to learn decision policies federatively. Liu et al. [19] applied federated learning to robots' communication and made robots fuse and transfer their experience so that they can quickly adapt to a new environment. They termed their approach
lifelong federated reinforcement learning (LFRL). LFRL is
consistent with human cognitive science and fits well in
cloud robotic systems. Different from those federated learning
approaches, the following studies explore the integration of
federated learning with GAN.
MD-GAN [9] is a recent approach that trains GAN models in a federated fashion. MD-GAN employs multiple discriminators and only one generator to reduce computing costs. Note that the generator resides on the server, while the discriminators reside on the local clients. The generator produces simulation data and sends them to each local client simultaneously. Each discriminator is still used to distinguish the simulation data from real data. Note that each discriminator has a 1/(n − 1) probability of being exchanged with another discriminator, which is randomly selected from the other n − 1 discriminators, given that there are n discriminators in total. The exchange adopts the gossip algorithm [10]. Although swapping the parameters between two discriminators can avoid overfitting problems, swapped models need to be retrained. We still take Fig. 1 as the example. Discriminator 1 has learned the knowledge of figures "0" and "1," and the knowledge of figures "2" and "3" has been learned by discriminator 2. If we swap the parameters of the two discriminators, both of them need to be retrained to understand which figures are "0" or "2," resulting in more training costs. In this way, MD-GAN needs a tradeoff between training complexity and data diversity, which reduces the diversity of the generated data.
FL-GAN [9] aims to train a set of GAN models with the
federated averaging method. Specifically, each client holds
a vanilla GAN model, and a shared model is learned by
aggregating locally computed updates with iterative federated
averaging. The federated averaging method shows promising
results when facing i.i.d. training data. However, it degrades
the performance of models in the non-i.i.d. case [20].
III. PRELIMINARIES
A. Generative Adversarial Network
The GAN was proposed by Goodfellow et al. [6] as a novel generative model that simultaneously trains a generator G and a discriminator D using the following function:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_r(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$

where x comes from the distribution p_r(x) underlying the original dataset and z comes from a predefined noise distribution p_z(z), which is usually an easy-to-sample distribution, e.g., a uniform distribution on (−1, 1) or a Gaussian distribution with mean 0 and variance 1. G starts by sampling input z from p_z(z) and then maps z to the data space G(z; θ_G) through a differentiable network. On the other hand, D aims to recognize whether an instance comes from the training data or from G. In general, D strives to minimize the score it assigns to the generated data G(z) by minimizing D(G(z)) and to maximize the score it assigns to the original data x by maximizing D(x). In this way, D and G are alternately optimized, and the Jensen–Shannon (JS) divergence is utilized to measure the difference between p_r(x) and p_G(x). The JS divergence reaches its lowest value as D and G reach the Nash equilibrium [4], where D(G(z)) = D(x) = 0.5. The GAN model converges under such a scenario.
B. Federated Learning
Both the centralized and decentralized architectures usually assume balanced and i.i.d. training data [20]. Federated learning [21] revolves around a scenario where the training data are stored locally on multiple clients (e.g., mobile devices). Hence, a particular local dataset may not be representative of the overall distribution. Usually, the machine learning models held by the clients are neural networks (e.g., convolutional neural networks). The algorithm adopted by federated learning is applicable to any finite-sum objective of the form

$$\min_{\omega \in \mathbb{R}^d} f(\omega) \quad \text{where} \quad f(\omega) \overset{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^{n} f_i(\omega) \tag{2}$$

where $f_i(\omega) = \ell(x_i, y_i; \omega)$ indicates the loss of the prediction on the sample $(x_i, y_i)$ with parameters ω. Assume that there are K clients and the data samples are partitioned into K subsets, with $P_k$ ($n_k = |P_k|$) the set of indexes of the data samples on client k. Thus, (2) can be rewritten as

$$f(\omega) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(\omega) \quad \text{where} \quad F_k(\omega) = \frac{1}{n_k} \sum_{i \in P_k} f_i(\omega) \tag{3}$$

where $P_k$ refers to a partition. Since $F_k$ could be an arbitrarily bad approximation to f, the setting of federated learning can be extrapolated to the non-i.i.d. case. However, Nilsson et al. [23] showed that federated averaging achieves the best performance for federated learning and is practically equivalent to the centralized architecture only when the training data are i.i.d. In the non-i.i.d. case, the centralized approach performs better than federated averaging [20].
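To make the aggregation behind (2) and (3) concrete, the following sketch averages client parameters weighted by n_k/n and broadcasts the result back, which is the essence of federated averaging. It assumes all clients share one architecture; the function and variable names are illustrative, not taken from [21].

```python
import copy

def fedavg(client_models, client_sizes):
    """Weighted parameter average per (3): weight n_k / n for client k."""
    n = float(sum(client_sizes))
    global_state = copy.deepcopy(client_models[0].state_dict())
    for key, val in global_state.items():
        if val.is_floating_point():                       # skip integer buffers
            global_state[key] = sum(
                (n_k / n) * m.state_dict()[key]
                for m, n_k in zip(client_models, client_sizes)
            )
    for m in client_models:                               # broadcast the shared model
        m.load_state_dict(global_state)
    return global_state
```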
C. Federated Learning GAN
FL-GAN focuses on integrating federated learning with GAN models [9]. We still take Fig. 1 as the example. Assume that there are K clients and that each client holds a regular GAN. Each GAN builds a mapping function that maps a random noise z from the randomized space Z into a data space $\chi_i$, with $\bigcup_{i=1}^{K} \chi_i = X$, given that $\chi_i$ is not representative of the overall data space X but only a local part of it. Therefore, the data distribution $p_{r_i}(x)$ within a client is a part of the overall distribution $p_r(x)$, and $\bigcup_{i=1}^{K} p_{r_i}(x) = p_r(x)$. We define a prior distribution on Z as $p_z(z)$ in each client and sample noise from $p_z(z)$. In this way, the objective function of FL-GAN can be defined as

$$\min_G \max_D V(G, D) = \frac{1}{K} \sum_{i=1}^{K} \Big( \mathbb{E}_{x \sim p_{r_i}(x)}[\log D_i(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_i(G_i(z)))] \Big) \tag{4}$$

where $p_z(z)$ follows an easy-to-sample distribution [e.g., the Gaussian distribution with mean 0 and variance 1], and i indicates the ith GAN in the ith client. From (4), we can see that FL-GAN adopts model averaging to aggregate local GAN models' updates and calculate the parameters of the global GAN, i.e., $G_{glb} = \frac{1}{K} \sum_{i=1}^{K} G(x; \theta_{g_i})$. After that, the parameters of each local GAN are replaced by the parameters of the global GAN, which is formulated as $G(x; \theta_{g_i}) = G_{glb}$, $i \in [1, K]$. This process is repeated until all local GAN models reach the equilibrium. However, such an aggregation may cause convergence difficulty or a longer time to converge, resulting in simulation data with low diversity [9]. This is because the federated averaging method implies that all local GAN models either simultaneously reach the equilibrium or are simultaneously far away from it. Even if a specific local GAN (GAN_i) reaches the equilibrium, it would be replaced by $G_{glb}$ in the next iteration, causing GAN_i to jump out of its own optimal status. In Section IV, we propose IFL-GAN and demonstrate how it addresses this issue.
IV. IMPROVED FEDERATED LEARNING GAN
In this section, we propose IFL-GAN, which aggregates GAN models with MMD. Specifically, we discuss two important issues in our new approach: 1) how to train IFL-GAN on decentralized and non-i.i.d. data with MMD and 2) how MMD compares with the federated averaging method.
A. Training IFL-GAN With MMD
Each client still holds one single GAN model, and the generator within each GAN model builds a mapping function that maps the noise code z from the randomized space Z into a data space $\chi_i$, where $\chi_i$ is a subspace of the data space X. However, the weight of each GAN in IFL-GAN is derived from the MMD score rather than from the equal-weight averaging used in federated averaging. In this way, the objective function of IFL-GAN can be defined as

$$\min_G \max_D V(G, D) = \alpha_1 \big( \mathbb{E}_{x \sim p_{r_1}(x)}[\log D_1(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_1(G_1(z)))] \big) + \alpha_2 \big( \mathbb{E}_{x \sim p_{r_2}(x)}[\log D_2(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_2(G_2(z)))] \big) + \cdots + \alpha_K \big( \mathbb{E}_{x \sim p_{r_K}(x)}[\log D_K(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_K(G_K(z)))] \big), \quad \text{s.t.} \ \sum_{i=1}^{K} \alpha_i = 1 \tag{5}$$

where $\alpha_i$ (i = 1, 2, ..., K) indicates the weight of the ith GAN model, and its value is derived from the MMD score [see (6)]. In (6), f usually refers to the Gaussian kernel function that maps data into the reproducing kernel Hilbert space (RKHS), x refers to a minibatch of training data, and G(z) indicates a minibatch of generated data. The MMD score is determined by calculating the supremum of the difference between the expectation of f(x) and that of f(G(z)):

$$\mathrm{mmd}_i = \sup_{\|f\| \le 1} \big\| \mathbb{E}[f(x)] - \mathbb{E}[f(G(z))] \big\|^2, \quad \text{s.t.} \ \alpha_i = \frac{e^{\mathrm{mmd}_i}}{\sum_{j=1}^{K} e^{\mathrm{mmd}_j}}. \tag{6}$$

Since the MMD score ($\mathrm{mmd}_i$) may be larger than 1, we utilize the Softmax function to normalize the MMD scores, obtaining the normalized MMD score $\alpha_i$ with $\sum_{i=1}^{K} \alpha_i = 1$. Similar to the vanilla GAN, each discriminator $D_i$ of IFL-GAN pursues maximizing the probability of assigning the correct label to both training and generated samples, and each generator $G_i$ tries to fool $D_i$ into accepting its outputs as real data by maximizing $D_i(G_i(z))$. Here, we assume that each discriminator $D_i$ and generator $G_i$ have enough capacity. Thus, for each generator $G_i$ fixed, the optimal discriminator $D_i^*$ is

$$D_i^*(x) = \frac{p_{r_i}}{p_{r_i} + p_{G_i}}. \tag{7}$$

For $p_{r_i} = p_{G_i}$, $D_i^*(x) = 1/2$. In this way, the criterion for each optimal generator $G_i$ is

$$V(G_i) = -2\log 2 + \mathrm{KL}\!\left(p_{r_i} \,\middle\|\, \frac{p_{r_i} + p_{G_i}}{2}\right) + \mathrm{KL}\!\left(p_{G_i} \,\middle\|\, \frac{p_{r_i} + p_{G_i}}{2}\right). \tag{8}$$
The proofs for the optimal discriminator and generator can be found in the vanilla GAN study [6]. When both the discriminator and the generator in a specific client reach the optimal status (i.e., $p_{r_i} = p_{G_i}$), the local generator $G_i$ within the ith client has successfully learned the knowledge of the data stored in that client. Our goal is that all local generators learn the knowledge of the data stored in the different clients, so that the global generator produces all modes of the data. Accordingly, the global generator $G_{glb}$ is given by

$$G_{glb}(x; \theta_{G_{glb}}) = \sum_{i=1}^{K} \alpha_i G_i(x; \theta_{G_i}) \tag{9}$$

where $\theta_{G_i}$ indicates the ith generator's parameters and $\theta_{G_{glb}}$ indicates the parameters of the global generator.
After aggregating the parameters of $G_{1:K}$, the global generator $G_{glb}$ has learned the knowledge of all data samples, and the global minimum of the virtual training criterion $V(G_{glb})$ is achieved if and only if $p_{G_{glb}} = p_r$, given $p_r = \bigcup_{i=1}^{K} p_{G_i}$ and $\bigcup_{i=1}^{K} p_{G_i} = p_{G_{glb}}$. At that point, $V(G_{glb})$ achieves the optimal value $-2\log 2$. After that, the parameters of each generator $G_i(x; \theta_{G_i})$ are replaced by $G_{glb}(x; \theta_{G_{glb}})$, which is formulated as

$$G_i(x; \theta_{G_i}) = G_{glb}(x; \theta_{G_{glb}}), \quad i = 1, 2, \ldots, K. \tag{10}$$
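A minimal sketch of the server and client steps in (9) and (10), assuming the normalized weights α_i and the per-client thresholds c_{a_i} have already been computed (the threshold-gated update is detailed in Algorithm 1 below); the helper names here are ours, not part of the method's formal definition.

```python
import copy

def aggregate_generators(generators, alphas):
    """Server side, (9): weighted sum of the K local generators' parameters."""
    global_state = copy.deepcopy(generators[0].state_dict())
    for key, val in global_state.items():
        if val.is_floating_point():
            global_state[key] = sum(a * g.state_dict()[key] for g, a in zip(generators, alphas))
    return global_state

def broadcast_if_needed(generators, global_state, mmd_scores, thresholds):
    """Client side, (10): only clients whose MMD score exceeds c_a_i accept G_glb."""
    for g, mmd_i, c_ai in zip(generators, mmd_scores, thresholds):
        if mmd_i > c_ai:                 # still far from its own equilibrium
            g.load_state_dict(global_state)
        # otherwise the local generator keeps its near-equilibrium parameters
```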
Note that our IFL-GAN is also better than FL-GAN in the imbalanced setting of distributed data. In federated learning, all local GAN models are treated as exactly equal contributors. This strategy reduces the impact on the global generator of a GAN model (e.g., GAN_1) that is trained on imbalanced data (a detailed explanation is given in Section V), given $G_{glb} = \frac{1}{K}\sum_{i=1}^{K} G(x; \theta_{g_i})$ and $G(x; \theta_{g_i}) = G_{glb}$, $i \in [1, K]$. If a GAN is trained on imbalanced data, its generator needs a longer time to learn the data distribution, so its MMD score is enlarged in this case. According to (9) and (10), IFL-GAN can then preserve the parameters of GAN_1 to a larger extent, reducing the training time.
One more point is worth noting.
Algorithm 1 IFL-GAN
Input: Original dataset; noise z ~ p_z(z).
Output: Simulation data.
K GAN models, each holding the same hyperparameters and the same functions, i.e., the Adam [14] optimizer with learning rate 0.0002 and the binary cross-entropy loss function. The client index is denoted by i.
for each epoch t = 1, 2, ... do
    Clients execute:
    for each client i ∈ [1, K] do
        Update the discriminator's (D_i) parameters by ascending its stochastic gradient
            $\nabla_{\theta_D} \frac{1}{m} \sum_{j=1}^{m} \big[ \log D_i(x^{(j)}) + \log(1 - D_i(G_i(z^{(j)}))) \big]$;
        Update the local generator's (G_i) parameters by descending its stochastic gradient
            $\nabla_{\theta_G} \frac{1}{m} \sum_{j=1}^{m} \log(1 - D_i(G_i(z^{(j)})))$, where m indicates the noise samples $z^{(1)}, \ldots, z^{(m)}$;
    end
    Calculate and normalize the MMD score mmd_i produced by x_i and G_i(z) with Eq. (6);
    Apply the Softmax function to all MMD scores to obtain α_i and set the threshold c_{a_i};
    Server executes:
        $G_{glb}(x; \theta_{G_{glb}}) = \sum_{i=1}^{K} \alpha_i G_i(x; \theta_{G_i})$;
    Clients update:
    for each client i ∈ [1, K] do
        if mmd_i > c_{a_i} then
            $G_i(x; \theta_{G_i}) = G_{glb}(x; \theta_{G_{glb}})$;
        else
            continue;
        end
    end
end
Each local GAN may reach or approach the Nash equilibrium at a different epoch due to model initialization and non-i.i.d. training data; directly replacing these local GAN models with $G_{glb}$ would make them jump from the Nash equilibrium into a nonoptimal status [29]. In our proposed IFL-GAN, the MMD score is regarded as an indicator. In general, we utilize finite samples from two distributions to measure the MMD score [15]. In our work, the two distributions are the generated data distribution and the original data distribution, and finite samples are drawn from both. Because of sampling variance, the MMD score may not be zero even when a GAN has reached the Nash equilibrium [15]. Since the MMD estimator implicitly involves a threshold, $c_\alpha$, for distinguishing distributions from finite samples, we usually test the null hypothesis $H_0: P = Q$ [8]. If the MMD score is greater than $c_\alpha$, we reject the hypothesis; otherwise, we accept it. In this study, we set the threshold to the minimal MMD score observed during training, because the smaller the MMD score, the better the two distributions match. A more detailed discussion is given in Section V.
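The per-client statistic can be sketched as follows, assuming a Gaussian-kernel (biased) MMD estimate between a real minibatch and a generated minibatch, followed by the Softmax of (6) that turns the K scores into the weights α_i; the bandwidth σ and the flattening of images into vectors are our simplifications, not settings prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(a, b, sigma=1.0):
    d2 = torch.cdist(a, b) ** 2                    # pairwise squared Euclidean distances
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_score(x_real, x_fake, sigma=1.0):
    """Biased Gaussian-kernel MMD^2 between a real and a generated minibatch."""
    x, g = x_real.flatten(1), x_fake.flatten(1)    # treat each image as a vector
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, g, sigma).mean()
            + gaussian_kernel(g, g, sigma).mean())

# K per-client scores -> aggregation weights alpha_i, as in (6), e.g.:
# alphas = F.softmax(torch.stack(list_of_K_mmd_scores), dim=0)
```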
The pseudocode of IFL-GAN is formally presented in Algorithm 1. The generator and discriminator architectures of each GAN model in Algorithm 1 are based on the regular DCGAN [1]. The components follow Conv-BatchNorm-ReLU [24] (generator G) or Conv-BatchNorm-LeakyReLU [30] (discriminator D).
In Algorithm 1, each client holds a specific GAN, indicated by the subscript i, i.e., $G_i$ with $i \in [1, K]$. Each client trains its local GAN model and sends its parameters ($\theta_{G_i}$) and the corresponding Softmax-normalized MMD score $\alpha_i$ to the server, which learns a global generator ($G_{glb}$) by aggregating the local GAN models with these MMD-based weights. After that, the generator of each local GAN is replaced with $G_{glb}$ if GAN_i has not reached the Nash equilibrium. This process is repeated until all GAN models reach the Nash equilibrium ($p_{G_1} = p_{r_1}, p_{G_2} = p_{r_2}, \ldots, p_{G_K} = p_{r_K}$). As a consequence, the global generator has learned the knowledge of all data samples, and the simulation data generated by the global generator hold all features of the decentralized data according to (9). Fig. 2 shows the architecture of our proposed IFL-GAN.
B. Avg Versus MMD
Fig. 2. Dotted arrows indicate sampling from a specific distribution. $\theta_{G_1}$ indicates the parameters of the first generator, which resides in the first client. The parameters of all local generators ($\theta_{G_1}, \theta_{G_2}, \ldots, \theta_{G_K}$) and their MMD scores normalized by the Softmax function ($\alpha_1, \alpha_2, \ldots, \alpha_K$) are uploaded to the global generator to calculate its parameters with the aggregation in (9). After that, each local GAN ($G_1$ to $G_K$) is replaced with the aggregated GAN ($\theta_{G_{glb}}$), which is indicated by the red line.

Since the generator in a GAN specializes in producing simulation data, we compare the federated averaging method with MMD by targeting the generator. In the MMD, we assume that the generated instances G(z) are sampled from the generated data distribution P, and the original samples x are sampled from the original data distribution Q. Those samples are mapped into the RKHS with the Gaussian kernel function. After that, we utilize the Euclidean distance to reexpress (6) in the RKHS. Hereby, the estimator of MMD(P, Q) is defined as

$$\mathrm{MMD}(P, Q) = \mathbb{E}_{P}\langle x, x' \rangle - 2\,\mathbb{E}_{P,Q}\langle x, G(z) \rangle + \mathbb{E}_{Q}\langle G(z), G(z') \rangle. \tag{11}$$
The detailed derivation of the MMD estimator is given in [8]. Note that the MMD estimator implicitly involves a threshold for distinguishing distributions from finite samples. In other words, we usually conduct a hypothesis test with the null hypothesis $H_0: P = Q$. In general, judging whether this equality holds relies on comparing the test statistic MMD[x, G(z)] with a particular threshold $c_a$. If $c_a$ is exceeded, the test rejects the hypothesis; otherwise, the test accepts it.
Since the data are distributed over different clients in a non-i.i.d. manner, it is a nontrivial task to know the data details, such as the sample quantity and the categories a client holds. This makes it hard to perceive the learning status of each local model, e.g., whether a local GAN model has reached the Nash equilibrium or not. The traditional federated averaging method cannot faithfully reflect such differences in status because it views all local models as equal contributors. In contrast, the MMD score can faithfully reflect such a difference: a smaller MMD score indicates a better status, while a larger MMD score indicates a worse status [8], given that the MMD score reflects the matching degree of two distributions. Hence, MMD is more suitable than the averaging method for aggregating locally computed updates in federated learning.
Next, we discuss the main advantages of MMD with respect to the learning status. For convenience of demonstration, we assume here that K = 2: two local GAN models, GAN_1 and GAN_2, and one global GAN model, GAN_glb:

$$G_{glb_{avg}} = \tfrac{1}{2} G_1(z; \theta_{G_1}) + \tfrac{1}{2} G_2(z; \theta_{G_2}), \qquad G_{glb_{mmd}} = \alpha_1 G_1(z; \theta_{G_1}) + \alpha_2 G_2(z; \theta_{G_2}) \tag{12}$$
where $\theta_{G_i}$ indicates the parameters of the generator in the ith GAN. $G_{glb_{avg}}$ indicates that the parameters of the global generator are calculated by the federated averaging method, while our approach adopts MMD to aggregate the parameters of the local generators, which we term $G_{glb_{mmd}}$. Given that the training data are distributed over different clients in the non-i.i.d. manner, we can observe $0 < \alpha_1 \neq \alpha_2 < 1$ after training for some epochs, where $\alpha_1$ and $\alpha_2$ refer to the normalized MMD scores. From (12), an intuitive observation is that $G_{glb_{mmd}}$ is a weighted combination, whereas $G_{glb_{avg}}$ is a plain arithmetic average. Compared with the averaging method, the benefits of the weighted method include: 1) averaging the averages breaks the fundamental rules of math [13], such that some studies insist that the averaging method is derived from the "wrong math," whereas the weighted method is based upon the "correct math" and should be applied accordingly [13]; and 2) the weighted method allows the final number to quantitatively reflect the relative importance of each local model that is being averaged in federated aggregation [32]. However, the federated averaging strategy only works well when all local models are equally important [33]. Yet, this is not often the case in practice, especially when the data are distributed over clients in the non-i.i.d. manner.

Fig. 3. Architectural details of IFL-GAN for the MNIST, CIFAR10, and SVHN datasets.
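For a concrete illustration (numbers chosen for exposition only, not taken from the experiments): suppose at some round mmd_1 = 0.1 (GAN_1 near its equilibrium) and mmd_2 = 0.9 (GAN_2 far from it). The Softmax in (6) then gives

$$\alpha_1 = \frac{e^{0.1}}{e^{0.1} + e^{0.9}} \approx 0.31, \qquad \alpha_2 = \frac{e^{0.9}}{e^{0.1} + e^{0.9}} \approx 0.69,$$

so $G_{glb_{mmd}} \approx 0.31\,G_1 + 0.69\,G_2$ keeps most of the parameters of the still-converging $G_2$, whereas $G_{glb_{avg}} = 0.5\,G_1 + 0.5\,G_2$ perturbs both generators equally.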
In general, we view the process of training a model as the process of optimizing the model's parameters [5]. As training proceeds, the updated parameters guide the model toward the optimal status through backpropagation. Here, we assume that $G_1$ is more optimal than $G_2$, which means that the distribution of synthetic data produced by $G_1$ and the distribution of the original data hold a smaller MMD score than those of $G_2$. Hence, we have $0 < \alpha_1 < \alpha_2 < 1$. With this in place, to make the global model converge rapidly, the intuition is that the parameters of $G_1$ need a smaller update, while the parameters of $G_2$ require a larger update. However, the averaging method views them as equal contributors, which easily suffers from the problem that $G_1$ has already passed the optimal status while $G_2$ is still far away from this goal. Hence, $G_{glb_{mmd}}$ converges more rapidly than $G_{glb_{avg}}$. We now state this properly in the following theorem.
Theorem: Assume that $M_1$ denotes a model that adopts the averaging method to aggregate locally computed updates, and $M_2$ denotes a model that adopts the MMD method to learn a shared model by aggregating locally computed updates. For each $M_i$, we set K = 2 (i.e., two local GAN models, GAN_1 and GAN_2, and a global GAN model, GAN_glb). Moreover, we assume that $\alpha_i$ represents the performance (the matching degree between the generated data distribution and the original data distribution) of $G_i$, given that the generator outputs realistic-like data; by optimizing the parameters of the model, the matching degree between the two distributions tends to be enhanced. We also utilize the same components and parameters to initialize both $M_1$ and $M_2$ and employ the same hyperparameters to train the two models. In addition, the training data are distributed over the different clients in the non-i.i.d. manner, which means that $\alpha_1 \neq \alpha_2$. If we suppose that $\alpha_1 < \alpha_2$, then $M_2$ converges more rapidly than $M_1$.
Proof: Let the current performance of $G_2$ be $\alpha_2$, let $\alpha_2'$ represent the performance of $G_2$ in $M_1$, and let $\alpha_2''$ indicate that of $G_2$ in $M_2$ after performing (13). Since $\alpha_1 < \alpha_2$ and $\sum_i \alpha_i = 1$ [see (5)], we have $\alpha_2 > \text{const}$ (i.e., 0.5). In this way, $G_{glb_{avg}}$ certainly reduces the performance of $G_2$ because the averaging method enforces the weight of $G_2$ (i.e., $\alpha_2$) to be small. Since the parameters of $G_2$ are kept to the greatest extent in $G_{glb_{mmd}}$ [see (12)], we obtain $\alpha_2'' < \alpha_2'$ according to (13). Hence, $G_2$ in $M_2$ gets a larger update than that in $M_1$. As to $G_1$, we naturally have $\alpha_1 < \text{const}$. Let $\alpha_1'$ represent the performance of $G_1$ in $M_1$, and let $\alpha_1''$ indicate that of $G_1$ in $M_2$ after performing (13); we get $\alpha_1'' < \alpha_1'$. In this way, $G_1$ in $M_2$ gets a smaller update than that in $M_1$. Since the updating strategy in $M_2$ is more practical than that in $M_1$, $M_2$ converges more rapidly than $M_1$. As illustrated above, this theorem can easily be extended to the cases where K > 2. In addition, the scenario of $\alpha_1 > \alpha_2$ is similar to that of $\alpha_1 < \alpha_2$.
Next, we discuss the specific implementations of $M_1$ and $M_2$ through FL-GAN and our proposed IFL-GAN

$$G_1(z; \theta_{G_1}) = G_2(z; \theta_{G_2}) = G_{glb_{avg}}, \qquad G_1(z; \theta_{G_1}) = G_2(z; \theta_{G_2}) = G_{glb_{mmd}}. \tag{13}$$

As the training of the two models proceeds, a scenario will arrive where GAN_1 reaches or nears the Nash equilibrium while GAN_2 is still far away from this status. According to the Clients update step of Algorithm 1, the local model GAN_i gets updated only when its MMD score is larger than the threshold $c_{a_i}$; otherwise, the local model GAN_i refuses the update if its MMD score is equal to or less than $c_{a_i}$. In this way, the parameters of GAN_1 are not changed, and only GAN_2 gets updated in IFL-GAN. Unfortunately, the operation $G_1(z; \theta_{G_1}) = G_{glb_{avg}}$ changes the parameters of GAN_1 in FL-GAN, resulting in GAN_1 jumping out of the Nash equilibrium. Therefore, our proposed approach enables the GAN model to converge faster than the averaging method.
V. EXPERIMENTS

For the experiments, the implementation details of our proposed IFL-GAN are shown in Fig. 3. Note that each generator and discriminator in IFL-GAN is initialized with the same hyperparameters (weights drawn from a normal distribution with mean 0.0 and standard deviation 0.02, and biases set to 0.0), and the noise is sampled from the standard Gaussian distribution with mean 0 and variance 1 [$p_z(z)$]. Two recently popular GAN variants are employed as baselines: MD-GAN [9] and FL-GAN [9], [21]. They are representatives of distributed GANs. For a fair comparison, we adopt the same components, hyperparameters (e.g., epochs = 200 and learning rate = 0.0002), and implementation details for the baselines and IFL-GAN. All models are implemented in the PyTorch framework.

Fig. 4. Synthetic images produced by the three candidate models on the balanced data setting of MNIST with K = 2. The first client holds figures "0"–"4" with 10 000 samples, while the second client holds figures "5"–"9" with 10 000 samples. MD-GAN does not capture figure "4," while FL-GAN does not capture figure "2." Only our IFL-GAN generates all figures from "0" to "9." (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.
Three commonly used public image datasets, MNIST, CIFAR-10, and SVHN, are studied. The MNIST dataset consists of gray-scale images of size 28 × 28. Both the CIFAR-10 dataset and the SVHN dataset contain RGB images, whose sizes are set to 3 × 32 × 32. We conduct experiments with the number of clients K ∈ {2, 5, 10}. For the model split, K = 2 (5 or 10) indicates two (five or ten) local GAN models plus one global GAN for FL-GAN and our proposed IFL-GAN, while K discriminators and one single generator are employed for MD-GAN. For the training data split (taking MNIST as the example) over K clients, K = 2 indicates that each client holds five categories: the first client holds figures "0"–"4," while the second client holds figures "5"–"9." When each client holds 10 000 training samples, we call it "balanced data." When the first client holds 10 000 data samples while the second client holds 1000 data samples, we call it "imbalanced data." When the second client holds only 100 data samples, we term it "extremely imbalanced data." For K = 5, each client holds two categories and has 10 000 data samples in the balanced case. In the imbalanced case, the first three clients hold 15 000 data samples (5000 samples each), while the last two clients hold 2000 data samples (1000 samples each). In the extremely imbalanced case, each of the last two clients holds 100 data samples. For K = 10, each client holds just one category and has 5000 data samples in the balanced case. In the imbalanced case, each of the first five clients holds 5000 samples, while each of the last five clients holds 1000 samples. In the extremely imbalanced case, each of the last five clients holds 100 samples.
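The label-based splits above can be reproduced with a short helper; the sketch below covers the K = 2 MNIST case and is illustrative only (the helper name, the seed handling, and the use of torchvision Subsets are our choices; the per-client sample counts can be changed to obtain the imbalanced settings).

```python
import torch
from torchvision import datasets, transforms

def split_mnist_two_clients(root="./data", samples_per_client=(10_000, 10_000), seed=0):
    """Client 1 gets digits 0-4, client 2 gets digits 5-9 (balanced: 10 000 each;
    imbalanced / extremely imbalanced: e.g., (10_000, 1_000) or (10_000, 100))."""
    mnist = datasets.MNIST(root, train=True, download=True, transform=transforms.ToTensor())
    g = torch.Generator().manual_seed(seed)
    clients = []
    for labels, n in zip(({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}), samples_per_client):
        idx = [i for i, y in enumerate(mnist.targets.tolist()) if y in labels]
        keep = torch.randperm(len(idx), generator=g)[:n]
        clients.append(torch.utils.data.Subset(mnist, [idx[k] for k in keep]))
    return clients     # one torch.utils.data.Subset per client
```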
A. MNIST
We first apply IFL-GAN and the baselines to the MNIST dataset in the case of K = 2 with balanced data. For FL-GAN and IFL-GAN, the generated images are produced by the global generator; for MD-GAN, there is only one single generator. Since our approach utilizes MMD to assign the weights of the corresponding local models, and determining whether the empirical MMD shows a statistically significant difference is achieved by comparing the test statistic with a particular threshold [8], we utilize the minimal MMD score to set the threshold for each $\mathrm{mmd}_i$ during training; the smaller the MMD score, the better one distribution matches the other. In this case, the threshold $c_{a_1}$ of $\mathrm{mmd}_1$ is 0.11027 at epoch 182, and the threshold $c_{a_2}$ of $\mathrm{mmd}_2$ is 0.11463 at epoch 184. In other words, if the $\mathrm{mmd}_1$ ($\mathrm{mmd}_2$) score is larger than 0.11027 (0.11463), we reject the hypothesis that the two distributions match each other; otherwise, we accept it. Once we accept the two hypotheses from the two clients, the global GAN generates the simulation data. The generated images of the three candidate models in this case are shown in Fig. 4, and the corresponding loss values of the two generators for FL-GAN and IFL-GAN are shown in Fig. 5, plotted with an exponential moving average. From Fig. 4, we can observe that MD-GAN does not capture figure "4," while FL-GAN does not capture figure "2." Only our proposed IFL-GAN generates all figures, which demonstrates the effectiveness of IFL-GAN. From Fig. 5, we can observe that IFL-GAN outperforms FL-GAN in terms of smaller fluctuation and faster convergence.
Furthermore, we quantitatively evaluate the performance of our proposed IFL-GAN in different cases (i.e., K = 2, 5, 10 on the "balanced data," "imbalanced data," and "extremely imbalanced data" settings of the MNIST dataset). The MNIST score [9] (the higher the better), similar to the inception score, is employed as the quantitative metric in our study. The MNIST scores achieved by our proposed IFL-GAN and the baselines are shown in Table I.

Fig. 5. In the case of K = 2 on the balanced data setting, (a) shows the first generator loss for our proposed IFL-GAN and FL-GAN, and (b) shows the second generator loss for the two models. In each subfigure, the red curve indicates the generator loss of FL-GAN, while the blue curve indicates that of IFL-GAN.

TABLE I
MNIST SCORES ON BALANCED, IMBALANCED, AND EXTREMELY IMBALANCED SETTINGS OF THE MNIST DATASET. IT IS OBVIOUS THAT OUR PROPOSED IFL-GAN ACHIEVES THE BEST PERFORMANCE AND OBTAINS THE HIGHEST MNIST SCORES IN DIFFERENT CASES

Fig. 6. Synthetic images produced by the three candidate models on the extremely imbalanced data setting of MNIST in the case of K = 2, in which the first client holds figures from "0" to "4" with 10 000 samples, while the second client holds figures from "5" to "9" with 100 samples. In this case, MD-GAN basically focuses on the figures "0"–"4," while FL-GAN has the worst quality among the three models. Only our proposed IFL-GAN produces diverse generated images. (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.

Fig. 6 shows the generated images from all three models in the case of K = 2 on the extremely imbalanced setting of MNIST. It is obvious that our proposed IFL-GAN produces more diverse simulation images than the other two GAN variants. For example, Fig. 6(c) shows the figures "6" (row 4, column 8), "7" (row 1, column 6), "8" (row 4, column 6), and "9" (row 1, column 7). Fig. 6(a) basically focuses on the figures "0"–"4," while Fig. 6(b) shows the worst quality of generated images among the three models. For the imbalanced case, the quality of the generated data is similar to Fig. 6. As is visually predictable from the outputs, the MNIST score of our proposed IFL-GAN is higher than that of the other GAN variants in the different cases, demonstrating the effectiveness of IFL-GAN.

Fig. 7. Synthetic images produced by the three candidate models in the case of K = 2 on the balanced data setting of CIFAR10, in which the first client holds the categories "airplane," "automobile," "bird," "cat," and "deer," while the second client holds the categories "dog," "frog," "horse," "ship," and "truck." (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.

Fig. 8. In the case of K = 2 on the balanced data setting of CIFAR10, (a) shows the first generator loss for our proposed IFL-GAN and FL-GAN, and (b) shows the second generator loss for the two models. In each subfigure, the red curve indicates the generator loss of FL-GAN, while the blue curve indicates that of IFL-GAN.
B. CIFAR10
We continue to apply our proposed IFL-GAN and the baselines to the CIFAR10 dataset in different cases: K = 2, 5, 10 on the "balanced data," "imbalanced data," and "extremely imbalanced data" settings. We still utilize the minimal MMD score to set the threshold during training. Here, we demonstrate the case of K = 2 with balanced data for convenient observation. In this case, the threshold $c_{a_1}$ of $\mathrm{mmd}_1$ is 0.09416 at epoch 171, and the threshold $c_{a_2}$ of $\mathrm{mmd}_2$ is 0.08366 at epoch 199. The threshold helps us reject the hypothesis (that the generated data distribution matches the original data distribution) if the MMD score is larger than the threshold; otherwise, we accept the hypothesis. When we accept the two hypotheses, the global GAN generates simulation data. The generated images in this case are shown in Fig. 7, and the corresponding generator loss values are shown in Fig. 8. This also indicates that our proposed IFL-GAN outperforms FL-GAN in terms of faster convergence and smaller fluctuation, especially for the first generator $G_1$. To further evaluate the performance of our proposed IFL-GAN, we adopt the inception score (the higher the better) [2], [26] as the metric. The inception score uses a pretrained Inception v3 image-classification network to predict the class probabilities for each generated image; these are conditional probabilities because images that contain meaningful objects should have a conditional label distribution p(y|x) with low entropy [26]. The inception scores of the generated images are reported in Table II. It is obvious that our proposed IFL-GAN still outperforms the baselines in the different cases, which further validates the effectiveness of IFL-GAN.

TABLE II
INCEPTION SCORES ON BALANCED, IMBALANCED, AND EXTREMELY IMBALANCED SETTINGS OF THE CIFAR10 DATASET. OUR PROPOSED IFL-GAN STILL ACHIEVES THE BEST PERFORMANCE AND OBTAINS THE HIGHEST INCEPTION SCORES IN DIFFERENT CASES

Fig. 9. Synthetic images produced by the three candidate models in the case of K = 2 on the balanced data setting of SVHN, in which the first client holds the categories from "0" to "4," while the second client holds the categories from "5" to "9." (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.
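A rough sketch of the inception-score computation just described, assuming a torchvision Inception v3 classifier and generated images already resized to 299 × 299 and normalized for that network; this illustrates the metric's definition and is not the exact evaluation code used for Table II.

```python
import torch
import torch.nn.functional as F
from torchvision.models import inception_v3

@torch.no_grad()
def inception_score(images):
    """images: float tensor (N, 3, 299, 299). Returns exp(E_x[KL(p(y|x) || p(y))])."""
    net = inception_v3(pretrained=True).eval()
    probs = F.softmax(net(images), dim=1)             # conditional label distribution p(y|x)
    marginal = probs.mean(dim=0, keepdim=True)        # marginal p(y) over the generated batch
    kl = (probs * (probs.log() - marginal.log())).sum(dim=1)
    return kl.mean().exp().item()
```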
C. SVHN
The street view house numbers (SVHN) dataset [7] is constructed from real-world house numbers in Google Street View images [22]. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits) but incorporates an order of magnitude more labeled data (over 600 000 digit images). We continue to apply our proposed IFL-GAN and the baselines to this dataset. We still set the threshold with the minimal MMD score during training. In the case of balanced data with K = 2, the threshold $c_{a_1}$ of $\mathrm{mmd}_1$ is 0.05169 at epoch 169, and the threshold $c_{a_2}$ of $\mathrm{mmd}_2$ is 0.0475 at epoch 152. We reject the hypothesis if the MMD score is larger than the threshold and accept it if the MMD score is smaller than or equal to the threshold. After accepting the two hypotheses, the global GAN generates the simulation data. The corresponding generated images for this case are shown in Fig. 9, and the generator loss values are shown in Fig. 10, which illustrates that IFL-GAN converges faster than FL-GAN. In addition, we develop the SVHN score (the higher the better), which is calculated in a similar way to the MNIST score, to evaluate the performance of the three candidate models on balanced data, imbalanced data, and extremely imbalanced data with K = 2, 5, 10. To obtain the SVHN score, the training set (i.e., 73 257 digits) is employed to train a classifier, and 100 generated images are viewed as the test set. The SVHN scores of the generated images from the three candidate models are reported in Table III. From Table III, we can observe that our proposed IFL-GAN still outperforms the baselines, achieving the highest SVHN score on the SVHN dataset.

Fig. 10. In the case of K = 2 on the balanced data setting of SVHN, (a) shows the first generator loss for our proposed IFL-GAN and FL-GAN, and (b) shows the second generator loss for the two models.

TABLE III
SVHN SCORES ON BALANCED, IMBALANCED, AND EXTREMELY IMBALANCED SETTINGS OF THE SVHN DATASET. THE EXPERIMENTAL RESULTS FURTHER DEMONSTRATE THE EFFECTIVENESS OF OUR PROPOSED IFL-GAN
D. Impact of Different Data Division Methods
Note that, although the data division in our study is artificial, it reflects real scenes to some extent. Here, we discuss the data division methods of both our study and [12], which is a representative data division method in federated learning. The study [12] splits data samples according to the Dirichlet distribution. It draws class proportions $q \sim \mathrm{Dir}(\alpha p)$ from a Dirichlet distribution in which p characterizes a prior class distribution and α denotes the concentration parameter. Note that the prior class distribution is a uniform distribution in [12]. When α is small (e.g., α ≪ 1), each client holds a few or even one category with q images, and it holds most categories when α has a larger value (e.g., α → ∞). In our study, the class distribution still follows a uniform distribution. Moreover, each client holds a few or just one category when K = 5 or K = 10 and holds more categories when K = 2. In other words, the division method in our study is similar to that in [12] to a certain degree. We also conduct experiments with α = 20.0 and α = 2.0 on CIFAR10. Here, we assume two clients and divide samples according to the Dirichlet distribution. For the case of α = 20.0, client 1 holds the categories "airplane," "automobile," "frog," "ship," and "truck," while client 2 holds the categories "bird," "cat," "deer," "dog," and "horse." This corresponds to the case of K = 2 in our study, where client 1 holds the categories "airplane," "automobile," "bird," "cat," and "deer," while client 2 holds the categories "dog," "frog," "horse," "ship," and "truck." The corresponding inception scores are similar to those in Table II. For the case of α = 2.0, client 1 holds the categories "automobile," "deer," and "dog" with 10 000 images, while client 2 holds the categories "airplane," "bird," "cat," "frog," "horse," "ship," and "truck" with 10 000 images (i.e., balanced data) or 100 images (i.e., extremely imbalanced data). Both FedAvg and MMD are utilized to update the parameters of the model, respectively. The inception scores are shown in Table IV. It is obvious that our approach still outperforms the traditional FedAvg method when utilizing the Dirichlet distribution to divide data samples.

TABLE IV
INCEPTION SCORES ON DIRICHLET DATA DIVISION. THE RESULTS STILL SHOW THE EFFECTIVENESS OF OUR IFL-GAN
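The Dirichlet split of [12] used above can be sketched as follows; the draw of per-client class proportions and the per-class sampling are standard, but the function name, the seed handling, and the fixed images-per-client budget are our illustrative choices.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=2, alpha=2.0, images_per_client=10_000, seed=0):
    """Draw q ~ Dir(alpha * p) with a uniform prior p for each client, then sample
    that client's images class by class according to q (after [12])."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    prior = np.full(len(classes), 1.0 / len(classes))        # uniform prior class distribution p
    client_indices = []
    for _ in range(num_clients):
        q = rng.dirichlet(alpha * prior)                     # per-client class proportions
        counts = rng.multinomial(images_per_client, q)       # images to draw from each class
        idx = []
        for c, n_c in zip(classes, counts):
            pool = np.flatnonzero(labels == c)
            idx.extend(rng.choice(pool, size=min(n_c, len(pool)), replace=False))
        client_indices.append(np.array(idx))
    return client_indices                                    # one index array per client
```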
E. Discussion
Although MD-GAN and FL-GAN can handle distributed data, they have limitations when data are distributed over multiple clients in a non-i.i.d. manner (see Fig. 6). This is because the federated averaging method gives each local model the same contribution to training the global model. Data being distributed over clients in the i.i.d. manner is not often the case in practice, so each local model inevitably makes a different contribution to the global model. To this end, we propose IFL-GAN, which takes the MMD score as each local model's contribution weight to drive the learning of a globally shared model by aggregating locally computed updates with those weights. Our IFL-GAN is more generalizable and achieves faster and more stable convergence (see Figs. 5, 8, and 10) than the baselines (i.e., FL-GAN and MD-GAN) on decentralized and non-i.i.d. data (see Figs. 4, 7, and 9), and it still produces diverse instances when facing extremely imbalanced data (see Fig. 6).
VI. CONCLUSION
In this study, to deal with the challenge of training GAN models on data distributed in a non-i.i.d. manner, we proposed IFL-GAN. Comprehensive experiments demonstrate the following capabilities of our IFL-GAN: 1) achieving higher MNIST scores, inception scores, and SVHN scores than FL-GAN and MD-GAN on decentralized and non-i.i.d. data; 2) generating more diverse, appealing, and recognizable images; and 3) converging faster and more stably than FL-GAN.
REFERENCES
[1] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2015, arXiv:1511.06434.
[2] S. Barratt and R. Sharma, “A note on the inception score,” 2018,
arXiv:1801.01973.
[3] K. Bonawitz et al., “Towards federated learning at scale: System design,”
2019, arXiv:1902.01046.
[4] C. Daskalakis, W. P. Goldberg, and H. C. Papadimitriou, “The complex-
ity of computing a Nash equilibrium,” SIAM J. Comput., vol. 39, no. 1,
pp. 195–259, 2009.
[5] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning,
vol. 1. Cambridge, MA, USA: MIT Press, 2016.
[6] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural
Inf. Process. Syst., 2014, pp. 2672–2680.
[7] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet, “Multi-
digit number recognition from street view imagery using deep convolu-
tional neural networks,” 2013, arXiv:1312.6082.
[8] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and
A. Smola, “A kernel two-sample test,” J.Mach.Learn.Res., vol. 13,
no. 1, pp. 723–773, Mar. 2012.
[9] C. Hardy, E. Le Merrer, and B. Sericola, “MD-GAN: Multi-
discriminator generative adversarial networks for distributed datasets,”
2018, arXiv:1811.03850.
[10] I. Hegedűs, G. Danner, and M. Jelasity, "Gossip learning as a decentralized alternative to federated learning," in Proc. IFIP Int. Conf. Distrib. Appl. Interoperable Syst. Cham, Switzerland: Springer, 2019, pp. 74–90.
[11] K. Hsieh et al., “Gaia: Geo-distributed machine learning approaching
LAN speeds,” in Proc. 14th USENIX Symp. Netw. Syst. Design Imple-
ment. (NSDI), 2017, pp. 629–647.
[12] T.-M. Harry Hsu, H. Qi, and M. Brown, “Measuring the effects of
non-identical data distribution for federated visual classification,” 2019,
arXiv:1909.06335.
[13] S.-P. Hu, “Simple mean, weighted mean, or geometric mean,” in Proc.
ISPA/SCEA Int. Conf., San Diego, CA, USA, 2010, pp. 1–24.
[14] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[15] C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. Póczos, "MMD GAN: Towards deeper understanding of moment matching network," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 2203–2213.
[16] W. Li, W. Ding, R. Sadasivam, X. Cui, and P. Chen, “His-GAN:
A histogram-based GAN model to improve data generation quality,”
Neural Netw., vol. 119, pp. 31–45, Nov. 2019.
[17] W. Li, L. Fan, Z. Wang, C. Ma, and X. Cui, “Tackling mode collapse
in multi-generator GANs with orthogonal vectors,” Pattern Recognit.,
vol. 110, Feb. 2021, Art. no. 107646.
[18] W. Li, Z. Liang, P. Ma, R. Wang, X. Cui, and P. Chen, “Haus-
dorff GAN: Improving GAN generation quality with Hausdorff
metric,” IEEE Trans. Cybern., early access, Mar. 18, 2021, doi:
10.1109/TCYB.2021.3062396.
[19] B. Liu, L. Wang, and M. Liu, “Lifelong federated reinforcement learn-
ing: A learning architecture for navigation in cloud robotic systems,”
2019, arXiv:1901.06455.
[20] R. Mayer and H.-A. Jacobsen, Scalable deep learning on dis-
tributed infrastructures: Challenges, techniques and tools, 2019,
arXiv:1903.11314.
[21] H. Brendan McMahan, E. Moore, D. Ramage, S. Hampson, and
B. Agüera y Arcas, “Communication-efficient learning of deep networks
from decentralized data,” 2016, arXiv:1602.05629.
[22] Y. Netzer, T. Wang, A. Coates, A. Bissacco, Bo Wu, and A. Y. Ng,
“Reading digits in natural images with unsupervised feature learning,” in
Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn., 2011.
[23] A. Nilsson, S. Smith, G. Ulm, E. Gustavsson, and M. Jirstrand,
“A performance evaluation of federated learning algorithms,” in Proc.
2nd Workshop Distrib. Infrastruct. Deep Learn., Dec. 2018, pp. 1–8.
[24] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating
deep network training by reducing internal covariate shift,” 2015,
arXiv:1502.03167.
[25] J. R. Norris, Markov chains, no. 2. Cambridge, U.K.: Cambridge Univ.
Press, 1998.
[26] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and
X. Chen, “Improved techniques for training GANs,” in Proc. Adv. Neural
Inf. Process. Syst., 2016, pp. 2234–2242.
[27] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated
multi-task learning,” in Proc. Adv. Neural Inf. Process. Syst., 2017,
pp. 4424–4434.
[28] X. Wang and A. Gupta, “Generative image modeling using style and
structure adversarial networks,” in Proc. Eur. Conf. Comput. Vis. Cham,
Switzerland: Springer, 2016, pp. 318–335.
[29] C. Xiao, P. Zhong, and C. Zheng, “BourGAN: Generative networks
with metric embeddings,” in Proc. Adv. Neural Inf. Process. Syst., 2018,
pp. 2269–2280.
[30] B. Xu, N. Wang, C. Naiyan, and T. Mu Li, “Empirical Evaluation of Rec-
tified Activations in Convolutional Network,” 2015, arXiv:1505.00853.
[31] Q. Xu et al., “An empirical study on evaluation metrics of generative
adversarial networks,” 2018, arXiv:1806.07755.
[32] R. R. Yager, “On ordered weighted averaging aggregation operators in
multicriteria decisionmaking,” IEEE Trans. Syst., Man, Cybern., vol. 18,
no. 1, pp. 183–190, Jan./Feb. 1988.
[33] R. R. Yager and J. Kacprzyk, The Ordered Weighted Averaging Oper-
ators: Theory and Applications. Berlin, Germany: Springer, Sci. Bus.
Media, 2012.
[34] H. Hankui Zhuo, W. Feng, Y. Lin, Q. Xu, and Q. Yang, “Federated deep
reinforcement learning,” 2019, arXiv:1901.08277.
Wei Li (Member, IEEE) received the bachelor’s degree from the School of Mathematics and Statistics, Wuhan University, Wuhan, China, in 2008, and the Ph.D. degree from the School of Cyber Science and Engineering, Wuhan University, in 2019.
He was a Visiting Student with the University of Massachusetts Boston, Boston, MA, USA, and a Research Assistant with The Hong Kong Polytechnic University, Hong Kong. He is currently an Associate Professor with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China. His research areas include data mining, machine learning, and artificial intelligence.
Jinlin Chen received the master’s degree from The Hong Kong Polytechnic University, Hong Kong, in 2016, where he is currently pursuing the Ph.D. degree with the Department of Computing.
From 2017 to 2018, he was a Machine Learning Engineer with The Hong Kong Applied Science and Technology Research Institute, Hong Kong. His research interests include machine learning security, robotics control, and multiagent reinforcement learning.
Zhenyu Wang received the bachelor’s degree in software engineering from Wuhan University, Wuhan, China, in 2010, and the Ph.D. degree from the School of Remote Sensing Information Engineering, Wuhan University, in 2018.
He held a post-doctoral research position at the School of Computer Science, Wuhan University. He is currently a Researcher with the Jiaxing Institute of Future Food, Jiaxing, China. His research areas include data mining, big data on food safety, and artificial intelligence.
Zhidong Shen received the M.A. and Ph.D. degrees in computer science from Wuhan University, Wuhan, China, in 2003 and 2006, respectively.
He conducted visiting research at the Queensland University of Technology, Brisbane, QLD, Australia, in 2010, and at the University of Victoria, Victoria, BC, Canada, in 2014. He is currently an Associate Professor with the School of Cyber Science and Engineering, Wuhan University. His research interests include cyberspace security, trusted computing, machine learning, and big data.
Dr. Shen is also a member of the China Computer Federation (CCF).
Chao Ma (Member, IEEE) is currently an Assistant Professor with the School of Cyber Science and Engineering, Wuhan University, Wuhan, China. He has published over 30 academic papers in major international journals and conference proceedings. His research interests include time-series analytics, representation learning, deep learning, explainable AI, and big data analytics.
Dr. Ma is also a Professional Member of the China Computer Federation (CCF).
Xiaohui Cui received the B.S. degree in photoelectric technology and the M.S. degree in computer science from Wuhan University, Wuhan, China, in 1996 and 2000, respectively, and the Ph.D. degree in computer science and engineering from the University of Louisville, Louisville, KY, USA, in 2004.
He is currently a Professor with the School of Cyber Science and Engineering, Wuhan University. He has published over 160 papers in major artificial intelligence journals and conferences. His research areas include data mining, machine learning, and artificial intelligence.
Dr. Cui has received multiple national grants.