IFL-GAN: Improved Federated Learning Generative
Adversarial Network With Maximum Mean
Discrepancy Model Aggregation
Wei Li, Member, IEEE, Jinlin Chen, Zhenyu Wang, Zhidong Shen, Chao Ma, Member, IEEE, and Xiaohui Cui
Abstract—The generative adversarial network (GAN) is usually built from centralized, independent identically distributed (i.i.d.) training data to generate realistic-looking instances. In real-world applications, however, the data may be distributed over multiple clients and hard to gather due to bandwidth, departmental coordination, or storage concerns. Although existing works, such as federated learning GAN (FL-GAN), adopt different distributed strategies to train GAN models, they still have limitations when data are distributed in a non-i.i.d. manner: they suffer from convergence difficulty and produce generated data of low quality. We found that these challenges are often due to the use of a federated averaging strategy to aggregate local GAN models' updates. In this article, we propose an alternative approach that learns a globally shared GAN model by aggregating locally trained generators' updates with maximum mean discrepancy (MMD); we term this approach improved FL-GAN (IFL-GAN). The MMD score lets each local GAN carry a different weight, which makes the global GAN in IFL-GAN converge more rapidly than under federated averaging. Extensive experiments on the MNIST, CIFAR10, and SVHN datasets demonstrate the significant improvement of IFL-GAN in both achieving the highest inception score and producing high-quality instances.

Index Terms—Federated learning, generative adversarial network (GAN), maximum mean discrepancy (MMD), non-independent identically distributed (non-i.i.d.) training data.
Manuscript received January 2, 2020; revised November 30, 2020 and
January 2, 2022; accepted April 10, 2022. This work was supported in
part by the Fundamental Research Funds for the Central Universities under
Grant JUSRP121073 and Grant JUSRP521004, in part by the 2021 Jiangsu
Shuangchuang (Mass Innovation and Entrepreneurship) Talent Program under
Grant JSSCBS20210827, and in part by the Open Foundation of Engineering
Research Center of Cyberspace under Grant KJAQ202112014. (Corresponding authors: Jinlin Chen; Xiaohui Cui.)
Wei Li is with the Science Center for Future Foods, the School of Artificial
Intelligence and Computer Science, and the Jiangsu Key Laboratory of Media
Design and Software Technology, Jiangnan University, Wuxi, Jiangsu 214122,
China (e-mail: cs_weili@jiangnan.edu.cn).
Jinlin Chen is with the Department of Computing, The Hong Kong
Polytechnic University, Hong Kong (e-mail: csjlchen@comp.polyu.edu.hk).
Zhenyu Wang is with the School of Computer Science, Wuhan University,
Wuhan 430072, China, and also with the Jiaxing Institute of Future Food,
Jiaxing, Zhejiang 314005, China (e-mail: zhenyuwang@whu.edu.cn).
Zhidong Shen, Chao Ma, and Xiaohui Cui are with the School of Cyber
Science and Engineering, Wuhan University, Wuhan 430072, China (e-mail:
shenzd@whu.edu.cn; whmachao@ieee.org; xcui@whu.edu.cn).
Color versions of one or more figures in this article are available at
https://doi.org/10.1109/TNNLS.2022.3167482.
Digital Object Identifier 10.1109/TNNLS.2022.3167482
I. INTRODUCTION
THE generative adversarial network (GAN) [6] has been demonstrated to be a powerful generative model that casts generative modeling as a game between two networks: a discriminator D and a generator G. The discriminator D estimates the probability that a sample came from the training data rather than from the generator G. In the training of GAN, G is viewed as a forger that specializes in generating simulation data to fool D into accepting it as real. The GAN model sidesteps the difficulty of approximating many intractable probabilistic computations; it samples data from an easy-to-sample distribution, so Markov chains [25] are never needed. Its gradients are tuned using backpropagation, which makes the training computationally inexpensive. Although GAN has been successfully applied to many real-world applications [16]–[18], [28], training a GAN is still a challenge because a high-capacity GAN is usually built from centralized, independent identically distributed (i.i.d.) training data. Such a scenario is not often the case in practice. For example, different departments in the same domain (e.g., The State Food and Drug Bureau and The Animal Husbandry and Veterinary Bureau) collect data that are of concern to their own department. However, these data may be non-i.i.d. and hard to gather due to departmental coordination and bandwidth concerns, so the collected data are distributed over clients in a non-i.i.d. manner, which makes it difficult to train a GAN under such a scenario. A non-i.i.d. example is shown in Fig. 1.
GAN variants, such as MD-GAN [9] and federated learning GAN (FL-GAN) [9], have been proposed to address this problem. FL-GAN trains several local GAN models with the federated averaging algorithm to learn a shared model by aggregating locally computed updates. Although these methods employ multiple GAN models and deploy each GAN (GAN_i) to Client_i, they still have limitations. MD-GAN (one single generator and multiple discriminators) is trained over multiple datacenters through the wide-area network (WAN) [11]. However, such a training strategy is very expensive and complicated. Moreover, the server usually generates k batches at each global iteration. All the N clients receive and compute the feedback on the same training batch if k = 1, and no feedback has a conflict on concurrently processed data if k = N.
Fig. 1. Non-i.i.d. data are distributed over different clients. Each client Client_i holds few categories, while the quantity of data stored in Client_i may be different.
Thus, MD-GAN usually reduces the server workload by sacrificing the diversity of generated instances. FL-GAN, on the other hand, applies a federated learning strategy [21] to GAN models and trains all models with federated averaging. The federated averaging strategy produces promising results on i.i.d. training data [23]. However, it degrades the performance of models in the non-i.i.d. case [20]: it produces generated data of low quality and needs a longer training time to converge. A detailed discussion is given in Section V.
To address this challenge, we propose improved FL-GAN (IFL-GAN), which learns a globally shared GAN model (global GAN) by aggregating locally trained generators' updates with maximum mean discrepancy (MMD) [8]. When the federated averaging strategy is employed for learning a global GAN, all locally trained generators are treated as contributors with exactly the same weights, which is not always suitable in many real-world applications. For instance, suppose we have K local GAN models denoted as GAN_1, ..., GAN_K, and GAN_1, ..., GAN_i reach the Nash equilibrium much earlier than the rest (i.e., GAN_{i+1}, ..., GAN_K), mainly due to the non-i.i.d. property of the distributed training data. Under the federated averaging strategy, this obviously causes the global GAN to take a much longer time to converge. The reason is that local models' (GAN_1, ..., GAN_K) updates are triggered by the global GAN, i.e., $\mathrm{GAN}(x;\theta_{\mathrm{GAN}_i}) = \mathrm{GAN}(x;\theta_{\mathrm{GAN}_{global}})$ after updating the global GAN parameters with $\mathrm{GAN}(x;\theta_{\mathrm{GAN}_{global}}) = \frac{1}{K}\sum_{i=1}^{K}\mathrm{GAN}(x;\theta_{\mathrm{GAN}_i})$. This causes local GAN models that have already reached the Nash equilibrium to jump out of the equilibrium. Fortunately, this problem can be significantly alleviated by replacing the averaging strategy with MMD, because MMD tends to assign larger weights to the local GAN models that are still far from convergence. In this manner, MMD makes all locally trained GAN models converge more rapidly, which greatly reduces the training time of the global GAN. In recent GAN research, MMD is introduced to calculate the supremum of the difference between source samples and target samples (a.k.a. the two-sample test). It has been theoretically proved and empirically shown that the MMD score effectively reflects the performance of GAN models [15], [31]. A detailed comparison between MMD and the federated averaging method is presented in Section IV.
In summary, the major innovations and contributions of this study are described as follows.
1) This article proposes a novel approach, named IFL-GAN, to train GAN with MMD from decentralized, non-i.i.d. training data.
2) This article compares the federated averaging strategy with MMD from both theoretical and empirical perspectives, giving new insights into the success of IFL-GAN.
3) Through comprehensive experiments on three datasets with different resolutions, we demonstrate the effectiveness of IFL-GAN.
The rest of this article is organized as follows. In Section II, existing works are discussed. The GAN model and federated learning are reviewed in Section III. Our IFL-GAN is introduced in detail in Section IV. In Section V, the empirical evaluation is conducted. Section VI concludes this article.
II. RELATED WORK
Training neural networks in a federated fashion is a thriving direction that attracts many researchers. Here, we categorize the research works on federated learning according to the aspects they focus on.
A. Improving Communication Efficiency Among Models
Since the data are distributed over a set of clients and the networks held by these clients need to communicate and exchange information, researchers focus on reducing network latency and improving communication efficiency; unstable communication might cause devices to go offline, which makes federated learning more challenging. Smith et al. [27] showed that multitask learning is naturally suited to handling the statistical challenge and proposed a new system-aware optimization approach, MOCHA, whose goal is to achieve significant speedups when operating federated learning on separate devices. Bonawitz et al. [3] paid attention to device availability, unreliable device connectivity, and interrupted execution; they built a scalable production system for federated learning on tens of millions of real-world devices. Although these studies have improved the communication efficiency among models, they pay less attention to improving the performance of the models.
B. Integrating Federated Learning With Other Schemes
Some researchers try to integrate federated learning [19], [34] with other deep learning models and apply the integrated models to real-world applications. Zhuo et al. [34] integrated reinforcement learning with federated learning because different departments' decision policies are private and hard to share with each other, while building individual decision policies of high quality is a nontrivial task; it is therefore necessary to learn decision policies federatively. Liu et al. [19] applied federated learning to robots' communication and made robots fuse and transfer their experience so that they can quickly adapt to a new environment. They termed their approach
lifelong federated reinforcement learning (LFRL). LFRL is
consistent with human cognitive science and fits well in
cloud robotic systems. Different from those federated learning
approaches, the following studies explore the integration of
federated learning with GAN.
MD-GAN [9] is a recent approach that trains GAN models in a federated fashion. MD-GAN employs multiple discriminators and only one generator to reduce computing costs. Note that the generator resides on the server, while the discriminators reside on the local clients. The generator produces simulation data and sends them to each local client simultaneously. Each discriminator is still used to distinguish the simulation data from real data. Note that each discriminator has a 1/(n − 1) probability of being exchanged with another discriminator, which is randomly selected from the other n − 1 discriminators, given that there are n discriminators in total. The exchange adopts the gossip algorithm [10]. Although swapping the parameters between two discriminators can avoid overfitting problems, swapped models need to be retrained. We still take Fig. 1 as the example. Discriminator 1 has learned the knowledge of figures "0" and "1," and the knowledge of figures "2" and "3" has been learned by discriminator 2. If we swap the parameters of the two discriminators, both of them need to be retrained to understand which figures are "0" or "2," resulting in more training costs. In this way, MD-GAN needs a tradeoff between training complexity and data diversity, which reduces the diversity of the generated data.
FL-GAN [9] aims to train a set of GAN models with the
federated averaging method. Specifically, each client holds
a vanilla GAN model, and a shared model is learned by
aggregating locally computed updates with iterative federated
averaging. The federated averaging method shows promising
results when facing i.i.d. training data. However, it degrades
the performance of models in the non-i.i.d. case [20].
III. PRELIMINARIES
A. Generative Adversarial Network
The GAN was proposed by Goodfellow et al. [6] as a novel generative model that simultaneously trains a generator G and a discriminator D using the following function:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_r(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$

where x comes from the distribution p_r(x) underlying the original dataset and z comes from a predefined noise distribution p_z(z), which is usually an easy-to-sample distribution, e.g., a uniform distribution on (−1, 1) or a Gaussian distribution with mean 0 and variance 1. G starts by sampling input z from p_z(z) and then maps z to the data space G(z; θ_G) through a differentiable network. On the other hand, D aims to recognize whether an instance comes from the training data or from G. In general, D strives to minimize the score it assigns to the generated data G(z) by minimizing D(G(z)) and to maximize the score it assigns to the original data x by maximizing D(x). In this way, D and G are alternately optimized, and the Jensen–Shannon (JS) divergence is utilized to measure the difference between p_r(x) and p_G(x). The JS divergence reaches its lowest value as D and G reach the Nash equilibrium [4], where D(G(z)) = D(x) = 0.5. The GAN model converges under such a scenario.
B. Federated Learning
Both the centralized and decentralized architectures usually assume balanced and i.i.d. training data [20]. Federated learning [21] revolves around a scenario where the training data are stored locally on multiple clients (e.g., mobile devices). Hence, a particular local dataset may not be representative of the overall distribution. Usually, the machine learning models held by the clients are neural networks (e.g., convolutional neural networks). The algorithm adopted by federated learning is applicable to any finite-sum objective of the form

$$\min_{\omega \in \mathbb{R}^d} f(\omega) \quad \text{where} \quad f(\omega) \overset{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^{n} f_i(\omega) \tag{2}$$

where $f_i(\omega) = \ell(x_i, y_i; \omega)$ indicates the loss of the prediction on the sample $(x_i, y_i)$ with parameters ω. Assume that there are K clients and the data samples are partitioned into K subsets, with $P_k$ ($n_k = |P_k|$) the set of indexes of the data samples on client k. Thus, (2) can be rewritten as

$$f(\omega) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(\omega) \quad \text{where} \quad F_k(\omega) = \frac{1}{n_k} \sum_{i \in P_k} f_i(\omega) \tag{3}$$

where $P_k$ refers to a partition. Since $F_k$ could be an arbitrarily bad approximation to f, the setting of federated learning can be extrapolated to the non-i.i.d. case. However, Nilsson et al. [23] showed that federated averaging achieves the best performance for federated learning and is practically equivalent to the centralized architecture only when the training data are i.i.d. In the non-i.i.d. case, the centralized approach performs better than federated averaging [20].
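To make the aggregation behind (2) and (3) concrete, the following sketch averages client parameters weighted by n_k/n and broadcasts the result back, which is the essence of federated averaging. It assumes all clients share one architecture; the function and variable names are illustrative, not taken from [21].

```python
import copy

def fedavg(client_models, client_sizes):
    """Weighted parameter average per (3): weight n_k / n for client k."""
    n = float(sum(client_sizes))
    global_state = copy.deepcopy(client_models[0].state_dict())
    for key, val in global_state.items():
        if val.is_floating_point():                       # skip integer buffers
            global_state[key] = sum(
                (n_k / n) * m.state_dict()[key]
                for m, n_k in zip(client_models, client_sizes)
            )
    for m in client_models:                               # broadcast the shared model
        m.load_state_dict(global_state)
    return global_state
```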
C. Federated Learning GAN
FL-GAN focuses on integrating federated learning with GAN models [9]. We still take Fig. 1 as the example. Assume that there are K clients and that each client holds a regular GAN. Each GAN builds a mapping function that maps a random noise z from the randomized space Z into a data space $\chi_i$, with $\bigcup_{i=1}^{K} \chi_i = X$, given that $\chi_i$ is not representative of the overall data space X but only a local part of it. Therefore, the data distribution $p_{r_i}(x)$ within a client is a part of the overall distribution $p_r(x)$, and $\bigcup_{i=1}^{K} p_{r_i}(x) = p_r(x)$. We define a prior distribution on Z as $p_z(z)$ in each client and sample noise from $p_z(z)$. In this way, the objective function of FL-GAN can be defined as

$$\min_G \max_D V(G, D) = \frac{1}{K} \sum_{i=1}^{K} \Big( \mathbb{E}_{x \sim p_{r_i}(x)}[\log D_i(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_i(G_i(z)))] \Big) \tag{4}$$

where $p_z(z)$ follows an easy-to-sample distribution [e.g., the Gaussian distribution with mean 0 and variance 1], and i indicates the ith GAN in the ith client. From (4), we can see that FL-GAN adopts model averaging to aggregate local GAN models' updates and calculate the parameters of the global GAN, i.e., $G_{glb} = \frac{1}{K} \sum_{i=1}^{K} G(x; \theta_{g_i})$. After that, the parameters of each local GAN are replaced by the parameters of the global GAN, which is formulated as $G(x; \theta_{g_i}) = G_{glb}$, $i \in [1, K]$. This process is repeated until all local GAN models reach the equilibrium. However, such an aggregation may cause convergence difficulty or a longer time to converge, resulting in simulation data with low diversity [9]. This is because the federated averaging method implies that all local GAN models either simultaneously reach the equilibrium or are simultaneously far away from it. Even if a specific local GAN (GAN_i) reaches the equilibrium, it would be replaced by $G_{glb}$ in the next iteration, causing GAN_i to jump out of its own optimal status. In Section IV, we propose IFL-GAN and demonstrate how it addresses this issue.
IV. IMPROVED FEDERATED LEARNING GAN
In this section, we propose IFL-GAN, which aggregates GAN models with MMD. Specifically, we discuss two important issues in our new approach: 1) how to train IFL-GAN on decentralized and non-i.i.d. data with MMD and 2) how MMD compares with the federated averaging method.
A. Training IFL-GAN With MMD
Each client still holds one single GAN model, and the generator within each GAN model builds a mapping function that maps the noise code z from the randomized space Z into a data space $\chi_i$, where $\chi_i$ is a subspace of the data space X. However, the weight of each GAN in IFL-GAN is derived from the MMD score rather than from the equal-weight averaging used in federated averaging. In this way, the objective function of IFL-GAN can be defined as

$$\min_G \max_D V(G, D) = \alpha_1 \big( \mathbb{E}_{x \sim p_{r_1}(x)}[\log D_1(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_1(G_1(z)))] \big) + \alpha_2 \big( \mathbb{E}_{x \sim p_{r_2}(x)}[\log D_2(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_2(G_2(z)))] \big) + \cdots + \alpha_K \big( \mathbb{E}_{x \sim p_{r_K}(x)}[\log D_K(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_K(G_K(z)))] \big), \quad \text{s.t.} \ \sum_{i=1}^{K} \alpha_i = 1 \tag{5}$$

where $\alpha_i$ (i = 1, 2, ..., K) indicates the weight of the ith GAN model, and its value is derived from the MMD score [see (6)]. In (6), f usually refers to the Gaussian kernel function that maps data into the reproducing kernel Hilbert space (RKHS), x refers to a minibatch of training data, and G(z) indicates a minibatch of generated data. The MMD score is determined by calculating the supremum of the difference between the expectation of f(x) and that of f(G(z)):

$$\mathrm{mmd}_i = \sup_{\|f\| \le 1} \big\| \mathbb{E}[f(x)] - \mathbb{E}[f(G(z))] \big\|^2, \quad \text{s.t.} \ \alpha_i = \frac{e^{\mathrm{mmd}_i}}{\sum_{j=1}^{K} e^{\mathrm{mmd}_j}}. \tag{6}$$

Since the MMD score ($\mathrm{mmd}_i$) may be larger than 1, we utilize the Softmax function to normalize the MMD scores, obtaining the normalized MMD score $\alpha_i$ with $\sum_{i=1}^{K} \alpha_i = 1$. Similar to the vanilla GAN, each discriminator $D_i$ of IFL-GAN pursues maximizing the probability of assigning the correct label to both training and generated samples, and each generator $G_i$ tries to fool $D_i$ into accepting its outputs as real data by maximizing $D_i(G_i(z))$. Here, we assume that each discriminator $D_i$ and generator $G_i$ have enough capacity. Thus, for each generator $G_i$ fixed, the optimal discriminator $D_i^*$ is

$$D_i^*(x) = \frac{p_{r_i}}{p_{r_i} + p_{G_i}}. \tag{7}$$

For $p_{r_i} = p_{G_i}$, $D_i^*(x) = 1/2$. In this way, the criterion for each optimal generator $G_i$ is

$$V(G_i) = -2\log 2 + \mathrm{KL}\!\left(p_{r_i} \,\middle\|\, \frac{p_{r_i} + p_{G_i}}{2}\right) + \mathrm{KL}\!\left(p_{G_i} \,\middle\|\, \frac{p_{r_i} + p_{G_i}}{2}\right). \tag{8}$$
The proofs for the optimal discriminator and generator can be found in the vanilla GAN study [6]. When both the discriminator and the generator in a specific client reach the optimal status (i.e., $p_{r_i} = p_{G_i}$), the local generator $G_i$ within the ith client has successfully learned the knowledge of the data stored in that client. Our goal is that all local generators learn the knowledge of the data stored in the different clients, so that the global generator produces all modes of the data. Accordingly, the global generator $G_{glb}$ is given by

$$G_{glb}(x; \theta_{G_{glb}}) = \sum_{i=1}^{K} \alpha_i G_i(x; \theta_{G_i}) \tag{9}$$

where $\theta_{G_i}$ indicates the ith generator's parameters and $\theta_{G_{glb}}$ indicates the parameters of the global generator.
After aggregating the parameters of $G_{1:K}$, the global generator $G_{glb}$ has learned the knowledge of all data samples, and the global minimum of the virtual training criterion $V(G_{glb})$ is achieved if and only if $p_{G_{glb}} = p_r$, given $p_r = \bigcup_{i=1}^{K} p_{G_i}$ and $\bigcup_{i=1}^{K} p_{G_i} = p_{G_{glb}}$. At that point, $V(G_{glb})$ achieves the optimal value $-2\log 2$. After that, the parameters of each generator $G_i(x; \theta_{G_i})$ are replaced by $G_{glb}(x; \theta_{G_{glb}})$, which is formulated as

$$G_i(x; \theta_{G_i}) = G_{glb}(x; \theta_{G_{glb}}), \quad i = 1, 2, \ldots, K. \tag{10}$$
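A minimal sketch of the server and client steps in (9) and (10), assuming the normalized weights α_i and the per-client thresholds c_{a_i} have already been computed (the threshold-gated update is detailed in Algorithm 1 below); the helper names here are ours, not part of the method's formal definition.

```python
import copy

def aggregate_generators(generators, alphas):
    """Server side, (9): weighted sum of the K local generators' parameters."""
    global_state = copy.deepcopy(generators[0].state_dict())
    for key, val in global_state.items():
        if val.is_floating_point():
            global_state[key] = sum(a * g.state_dict()[key] for g, a in zip(generators, alphas))
    return global_state

def broadcast_if_needed(generators, global_state, mmd_scores, thresholds):
    """Client side, (10): only clients whose MMD score exceeds c_a_i accept G_glb."""
    for g, mmd_i, c_ai in zip(generators, mmd_scores, thresholds):
        if mmd_i > c_ai:                 # still far from its own equilibrium
            g.load_state_dict(global_state)
        # otherwise the local generator keeps its near-equilibrium parameters
```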
Note that our IFL-GAN is also better than FL-GAN in the imbalanced setting of distributed data. In federated learning, all local GAN models are treated as exactly equal contributors. This strategy reduces the impact on the global generator of a GAN model (e.g., GAN_1) that is trained on imbalanced data (a detailed explanation is given in Section V), given $G_{glb} = \frac{1}{K}\sum_{i=1}^{K} G(x; \theta_{g_i})$ and $G(x; \theta_{g_i}) = G_{glb}$, $i \in [1, K]$. If a GAN is trained on imbalanced data, its generator needs a longer time to learn the data distribution, so its MMD score is enlarged in this case. According to (9) and (10), IFL-GAN can then preserve the parameters of GAN_1 to a larger extent, reducing the training time.
One more point is worth noting.
Algorithm 1 IFL-GAN
Input: Original dataset; noise z ~ p_z(z).
Output: Simulation data.
K GAN models, each holding the same hyperparameters and the same functions, i.e., the Adam [14] optimizer with learning rate 0.0002 and the binary cross-entropy loss function. The client index is denoted by i.
for each epoch t = 1, 2, ... do
    Clients execute:
    for each client i ∈ [1, K] do
        Update the discriminator's (D_i) parameters by ascending its stochastic gradient
            $\nabla_{\theta_D} \frac{1}{m} \sum_{j=1}^{m} \big[ \log D_i(x^{(j)}) + \log(1 - D_i(G_i(z^{(j)}))) \big]$;
        Update the local generator's (G_i) parameters by descending its stochastic gradient
            $\nabla_{\theta_G} \frac{1}{m} \sum_{j=1}^{m} \log(1 - D_i(G_i(z^{(j)})))$, where m indicates the noise samples $z^{(1)}, \ldots, z^{(m)}$;
    end
    Calculate and normalize the MMD score mmd_i produced by x_i and G_i(z) with Eq. (6);
    Apply the Softmax function to all MMD scores to obtain α_i and set the threshold c_{a_i};
    Server executes:
        $G_{glb}(x; \theta_{G_{glb}}) = \sum_{i=1}^{K} \alpha_i G_i(x; \theta_{G_i})$;
    Clients update:
    for each client i ∈ [1, K] do
        if mmd_i > c_{a_i} then
            $G_i(x; \theta_{G_i}) = G_{glb}(x; \theta_{G_{glb}})$;
        else
            continue;
        end
    end
end
Each local GAN may reach or approach the Nash equilibrium at a different epoch due to model initialization and non-i.i.d. training data; directly replacing these local GAN models with $G_{glb}$ would make them jump from the Nash equilibrium into a nonoptimal status [29]. In our proposed IFL-GAN, the MMD score is regarded as an indicator. In general, we utilize finite samples from two distributions to measure the MMD score [15]. In our work, the two distributions are the generated data distribution and the original data distribution, and finite samples are drawn from both. Because of sampling variance, the MMD score may not be zero even when a GAN has reached the Nash equilibrium [15]. Since the MMD estimator implicitly involves a threshold, $c_\alpha$, for distinguishing distributions from finite samples, we usually test the null hypothesis $H_0: P = Q$ [8]. If the MMD score is greater than $c_\alpha$, we reject the hypothesis; otherwise, we accept it. In this study, we set the threshold to the minimal MMD score observed during training, because the smaller the MMD score, the better the two distributions match. A more detailed discussion is given in Section V.
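The per-client statistic can be sketched as follows, assuming a Gaussian-kernel (biased) MMD estimate between a real minibatch and a generated minibatch, followed by the Softmax of (6) that turns the K scores into the weights α_i; the bandwidth σ and the flattening of images into vectors are our simplifications, not settings prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(a, b, sigma=1.0):
    d2 = torch.cdist(a, b) ** 2                    # pairwise squared Euclidean distances
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_score(x_real, x_fake, sigma=1.0):
    """Biased Gaussian-kernel MMD^2 between a real and a generated minibatch."""
    x, g = x_real.flatten(1), x_fake.flatten(1)    # treat each image as a vector
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, g, sigma).mean()
            + gaussian_kernel(g, g, sigma).mean())

# K per-client scores -> aggregation weights alpha_i, as in (6), e.g.:
# alphas = F.softmax(torch.stack(list_of_K_mmd_scores), dim=0)
```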
The pseudocode of IFL-GAN is formally presented in Algorithm 1. The generator and discriminator architectures of each GAN model in Algorithm 1 are based on the regular DCGAN [1]. The components follow Conv-BatchNorm-ReLU [24] (generator G) or Conv-BatchNorm-LeakyReLU [30] (discriminator D).
In Algorithm 1, each client holds a specific GAN, indicated by the subscript i, i.e., $G_i$ with $i \in [1, K]$. Each client trains its local GAN model and sends its parameters ($\theta_{G_i}$) and the corresponding Softmax-normalized MMD score $\alpha_i$ to the server, which learns a global generator ($G_{glb}$) by aggregating the local GAN models with these MMD-based weights. After that, the generator of each local GAN is replaced with $G_{glb}$ if GAN_i has not reached the Nash equilibrium. This process is repeated until all GAN models reach the Nash equilibrium ($p_{G_1} = p_{r_1}, p_{G_2} = p_{r_2}, \ldots, p_{G_K} = p_{r_K}$). As a consequence, the global generator has learned the knowledge of all data samples, and the simulation data generated by the global generator hold all features of the decentralized data according to (9). Fig. 2 shows the architecture of our proposed IFL-GAN.
B. Avg Versus MMD
Fig. 2. Dotted arrows indicate sampling from a specific distribution. $\theta_{G_1}$ indicates the parameters of the first generator, which resides in the first client. The parameters of all local generators ($\theta_{G_1}, \theta_{G_2}, \ldots, \theta_{G_K}$) and their MMD scores normalized by the Softmax function ($\alpha_1, \alpha_2, \ldots, \alpha_K$) are uploaded to the global generator to calculate its parameters with the aggregation in (9). After that, each local GAN ($G_1$ to $G_K$) is replaced with the aggregated GAN ($\theta_{G_{glb}}$), which is indicated by the red line.

Since the generator in a GAN specializes in producing simulation data, we compare the federated averaging method with MMD by targeting the generator. In the MMD, we assume that the generated instances G(z) are sampled from the generated data distribution P, and the original samples x are sampled from the original data distribution Q. Those samples are mapped into the RKHS with the Gaussian kernel function. After that, we utilize the Euclidean distance to reexpress (6) in the RKHS. Hereby, the estimator of MMD(P, Q) is defined as

$$\mathrm{MMD}(P, Q) = \mathbb{E}_{P}\langle x, x' \rangle - 2\,\mathbb{E}_{P,Q}\langle x, G(z) \rangle + \mathbb{E}_{Q}\langle G(z), G(z') \rangle. \tag{11}$$
The detailed derivation of the MMD estimator is given in [8]. Note that the MMD estimator implicitly involves a threshold for distinguishing distributions from finite samples. In other words, we usually conduct a hypothesis test with the null hypothesis $H_0: P = Q$. In general, judging whether this equality holds relies on comparing the test statistic MMD[x, G(z)] with a particular threshold $c_a$. If $c_a$ is exceeded, the test rejects the hypothesis; otherwise, the test accepts it.
Since the data are distributed over different clients in a non-i.i.d. manner, it is a nontrivial task to know the data details, such as the sample quantity and the categories a client holds. This makes it hard to perceive the learning status of each local model, e.g., whether a local GAN model has reached the Nash equilibrium or not. The traditional federated averaging method cannot faithfully reflect such differences in status because it views all local models as equal contributors. In contrast, the MMD score can faithfully reflect such a difference: a smaller MMD score indicates a better status, while a larger MMD score indicates a worse status [8], given that the MMD score reflects the matching degree of two distributions. Hence, MMD is more suitable than the averaging method for aggregating locally computed updates in federated learning.
Next, we discuss the main advantages of MMD with respect to the learning status. For convenience of demonstration, we assume here that K = 2: two local GAN models, GAN_1 and GAN_2, and one global GAN model, GAN_glb:

$$G_{glb_{avg}} = \tfrac{1}{2} G_1(z; \theta_{G_1}) + \tfrac{1}{2} G_2(z; \theta_{G_2}), \qquad G_{glb_{mmd}} = \alpha_1 G_1(z; \theta_{G_1}) + \alpha_2 G_2(z; \theta_{G_2}) \tag{12}$$
where $\theta_{G_i}$ indicates the parameters of the generator in the ith GAN. $G_{glb_{avg}}$ indicates that the parameters of the global generator are calculated by the federated averaging method, while our approach adopts MMD to aggregate the parameters of the local generators, which we term $G_{glb_{mmd}}$. Given that the training data are distributed over different clients in the non-i.i.d. manner, we can observe $0 < \alpha_1 \neq \alpha_2 < 1$ after training for some epochs, where $\alpha_1$ and $\alpha_2$ refer to the normalized MMD scores. From (12), an intuitive observation is that $G_{glb_{mmd}}$ is a weighted combination, whereas $G_{glb_{avg}}$ is a plain arithmetic average. Compared with the averaging method, the benefits of the weighted method include: 1) averaging the averages breaks the fundamental rules of math [13], such that some studies insist that the averaging method is derived from the "wrong math," whereas the weighted method is based upon the "correct math" and should be applied accordingly [13]; and 2) the weighted method allows the final number to quantitatively reflect the relative importance of each local model that is being averaged in federated aggregation [32]. However, the federated averaging strategy only works well when all local models are equally important [33]. Yet, this is not often the case in practice, especially when the data are distributed over clients in the non-i.i.d. manner.

Fig. 3. Architectural details of IFL-GAN for the MNIST, CIFAR10, and SVHN datasets.
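For a concrete illustration (numbers chosen for exposition only, not taken from the experiments): suppose at some round mmd_1 = 0.1 (GAN_1 near its equilibrium) and mmd_2 = 0.9 (GAN_2 far from it). The Softmax in (6) then gives

$$\alpha_1 = \frac{e^{0.1}}{e^{0.1} + e^{0.9}} \approx 0.31, \qquad \alpha_2 = \frac{e^{0.9}}{e^{0.1} + e^{0.9}} \approx 0.69,$$

so $G_{glb_{mmd}} \approx 0.31\,G_1 + 0.69\,G_2$ keeps most of the parameters of the still-converging $G_2$, whereas $G_{glb_{avg}} = 0.5\,G_1 + 0.5\,G_2$ perturbs both generators equally.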
In general, we view the process of training a model as the process of optimizing the model's parameters [5]. As training proceeds, the updated parameters guide the model toward the optimal status through backpropagation. Here, we assume that $G_1$ is more optimal than $G_2$, which means that the distribution of synthetic data produced by $G_1$ and the distribution of the original data hold a smaller MMD score than those of $G_2$. Hence, we have $0 < \alpha_1 < \alpha_2 < 1$. With this in place, to make the global model converge rapidly, the intuition is that the parameters of $G_1$ need a smaller update, while the parameters of $G_2$ require a larger update. However, the averaging method views them as equal contributors, which easily suffers from the problem that $G_1$ has already passed the optimal status while $G_2$ is still far away from this goal. Hence, $G_{glb_{mmd}}$ converges more rapidly than $G_{glb_{avg}}$. We now state this properly in the following theorem.
Theorem: Assume that $M_1$ denotes a model that adopts the averaging method to aggregate locally computed updates, and $M_2$ denotes a model that adopts the MMD method to learn a shared model by aggregating locally computed updates. For each $M_i$, we set K = 2 (i.e., two local GAN models, GAN_1 and GAN_2, and a global GAN model, GAN_glb). Moreover, we assume that $\alpha_i$ represents the performance (the matching degree between the generated data distribution and the original data distribution) of $G_i$, given that the generator outputs realistic-like data; by optimizing the parameters of the model, the matching degree between the two distributions tends to be enhanced. We also utilize the same components and parameters to initialize both $M_1$ and $M_2$ and employ the same hyperparameters to train the two models. In addition, the training data are distributed over the different clients in the non-i.i.d. manner, which means that $\alpha_1 \neq \alpha_2$. If we suppose that $\alpha_1 < \alpha_2$, then $M_2$ converges more rapidly than $M_1$.
Proof: Let the current performance of $G_2$ be $\alpha_2$, let $\alpha_2'$ represent the performance of $G_2$ in $M_1$, and let $\alpha_2''$ indicate that of $G_2$ in $M_2$ after performing (13). Since $\alpha_1 < \alpha_2$ and $\sum_i \alpha_i = 1$ [see (5)], we have $\alpha_2 > \text{const}$ (i.e., 0.5). In this way, $G_{glb_{avg}}$ certainly reduces the performance of $G_2$ because the averaging method enforces the weight of $G_2$ (i.e., $\alpha_2$) to be small. Since the parameters of $G_2$ are kept to the greatest extent in $G_{glb_{mmd}}$ [see (12)], we obtain $\alpha_2'' < \alpha_2'$ according to (13). Hence, $G_2$ in $M_2$ gets a larger update than that in $M_1$. As to $G_1$, we naturally have $\alpha_1 < \text{const}$. Let $\alpha_1'$ represent the performance of $G_1$ in $M_1$, and let $\alpha_1''$ indicate that of $G_1$ in $M_2$ after performing (13); we get $\alpha_1'' < \alpha_1'$. In this way, $G_1$ in $M_2$ gets a smaller update than that in $M_1$. Since the updating strategy in $M_2$ is more practical than that in $M_1$, $M_2$ converges more rapidly than $M_1$. As illustrated above, this theorem can easily be extended to the cases where K > 2. In addition, the scenario of $\alpha_1 > \alpha_2$ is similar to that of $\alpha_1 < \alpha_2$.
Next, we discuss the specific implementations of $M_1$ and $M_2$ through FL-GAN and our proposed IFL-GAN

$$G_1(z; \theta_{G_1}) = G_2(z; \theta_{G_2}) = G_{glb_{avg}}, \qquad G_1(z; \theta_{G_1}) = G_2(z; \theta_{G_2}) = G_{glb_{mmd}}. \tag{13}$$

As the training of the two models proceeds, a scenario will arrive where GAN_1 reaches or nears the Nash equilibrium while GAN_2 is still far away from this status. According to the Clients update step of Algorithm 1, the local model GAN_i gets updated only when its MMD score is larger than the threshold $c_{a_i}$; otherwise, the local model GAN_i refuses the update if its MMD score is equal to or less than $c_{a_i}$. In this way, the parameters of GAN_1 are not changed, and only GAN_2 gets updated in IFL-GAN. Unfortunately, the operation $G_1(z; \theta_{G_1}) = G_{glb_{avg}}$ changes the parameters of GAN_1 in FL-GAN, resulting in GAN_1 jumping out of the Nash equilibrium. Therefore, our proposed approach enables the GAN model to converge faster than the averaging method.
V. EXPERIMENTS

For the experiments, the implementation details of our proposed IFL-GAN are shown in Fig. 3. Note that each generator and discriminator in IFL-GAN is initialized with the same hyperparameters (weights drawn from a normal distribution with mean 0.0 and standard deviation 0.02, and biases set to 0.0), and the noise is sampled from the standard Gaussian distribution with mean 0 and variance 1 [$p_z(z)$]. Two recently popular GAN variants are employed as baselines: MD-GAN [9] and FL-GAN [9], [21]. They are representatives of distributed GANs. For a fair comparison, we adopt the same components, hyperparameters (e.g., epochs = 200 and learning rate = 0.0002), and implementation details for the baselines and IFL-GAN. All models are implemented in the PyTorch framework.

Fig. 4. Synthetic images produced by the three candidate models on the balanced data setting of MNIST with K = 2. The first client holds figures "0"–"4" with 10 000 samples, while the second client holds figures "5"–"9" with 10 000 samples. MD-GAN does not capture figure "4," while FL-GAN does not capture figure "2." Only our IFL-GAN generates all figures from "0" to "9." (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.
Three commonly used public image datasets, MNIST, CIFAR-10, and SVHN, are studied. The MNIST dataset consists of gray-scale images of size 28 × 28. Both the CIFAR-10 dataset and the SVHN dataset contain RGB images, whose sizes are set to 3 × 32 × 32. We conduct experiments with the number of clients K ∈ {2, 5, 10}. For the model split, K = 2 (5 or 10) indicates two (five or ten) local GAN models plus one global GAN for FL-GAN and our proposed IFL-GAN, while K discriminators and one single generator are employed for MD-GAN. For the training data split (taking MNIST as the example) over K clients, K = 2 indicates that each client holds five categories: the first client holds figures "0"–"4," while the second client holds figures "5"–"9." When each client holds 10 000 training samples, we call it "balanced data." When the first client holds 10 000 data samples while the second client holds 1000 data samples, we call it "imbalanced data." When the second client holds only 100 data samples, we term it "extremely imbalanced data." For K = 5, each client holds two categories and has 10 000 data samples in the balanced case. In the imbalanced case, the first three clients hold 15 000 data samples (5000 samples each), while the last two clients hold 2000 data samples (1000 samples each). In the extremely imbalanced case, each of the last two clients holds 100 data samples. For K = 10, each client holds just one category and has 5000 data samples in the balanced case. In the imbalanced case, each of the first five clients holds 5000 samples, while each of the last five clients holds 1000 samples. In the extremely imbalanced case, each of the last five clients holds 100 samples.
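The label-based splits above can be reproduced with a short helper; the sketch below covers the K = 2 MNIST case and is illustrative only (the helper name, the seed handling, and the use of torchvision Subsets are our choices; the per-client sample counts can be changed to obtain the imbalanced settings).

```python
import torch
from torchvision import datasets, transforms

def split_mnist_two_clients(root="./data", samples_per_client=(10_000, 10_000), seed=0):
    """Client 1 gets digits 0-4, client 2 gets digits 5-9 (balanced: 10 000 each;
    imbalanced / extremely imbalanced: e.g., (10_000, 1_000) or (10_000, 100))."""
    mnist = datasets.MNIST(root, train=True, download=True, transform=transforms.ToTensor())
    g = torch.Generator().manual_seed(seed)
    clients = []
    for labels, n in zip(({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}), samples_per_client):
        idx = [i for i, y in enumerate(mnist.targets.tolist()) if y in labels]
        keep = torch.randperm(len(idx), generator=g)[:n]
        clients.append(torch.utils.data.Subset(mnist, [idx[k] for k in keep]))
    return clients     # one torch.utils.data.Subset per client
```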
A. MNIST
We first apply IFL-GAN and the baselines to the MNIST dataset in the case of K = 2 with balanced data. For FL-GAN and IFL-GAN, the generated images are produced by the global generator; for MD-GAN, there is only one single generator. Since our approach utilizes MMD to assign the weights of the corresponding local models, and determining whether the empirical MMD shows a statistically significant difference is achieved by comparing the test statistic with a particular threshold [8], we utilize the minimal MMD score to set the threshold for each $\mathrm{mmd}_i$ during training; the smaller the MMD score, the better one distribution matches the other. In this case, the threshold $c_{a_1}$ of $\mathrm{mmd}_1$ is 0.11027 at epoch 182, and the threshold $c_{a_2}$ of $\mathrm{mmd}_2$ is 0.11463 at epoch 184. In other words, if the $\mathrm{mmd}_1$ ($\mathrm{mmd}_2$) score is larger than 0.11027 (0.11463), we reject the hypothesis that the two distributions match each other; otherwise, we accept it. Once we accept the two hypotheses from the two clients, the global GAN generates the simulation data. The generated images of the three candidate models in this case are shown in Fig. 4, and the corresponding loss values of the two generators for FL-GAN and IFL-GAN are shown in Fig. 5, plotted with an exponential moving average. From Fig. 4, we can observe that MD-GAN does not capture figure "4," while FL-GAN does not capture figure "2." Only our proposed IFL-GAN generates all figures, which demonstrates the effectiveness of IFL-GAN. From Fig. 5, we can observe that IFL-GAN outperforms FL-GAN in terms of smaller fluctuation and faster convergence.
Furthermore, we quantitatively evaluate the performance of our proposed IFL-GAN in different cases (i.e., K = 2, 5, 10 on the "balanced data," "imbalanced data," and "extremely imbalanced data" settings of the MNIST dataset). The MNIST score [9] (the higher the better), similar to the inception score, is employed as the quantitative metric in our study. The MNIST scores achieved by our proposed IFL-GAN and the baselines are shown in Table I.

Fig. 5. In the case of K = 2 on the balanced data setting, (a) shows the first generator loss for our proposed IFL-GAN and FL-GAN, and (b) shows the second generator loss for the two models. In each subfigure, the red curve indicates the generator loss of FL-GAN, while the blue curve indicates that of IFL-GAN.

TABLE I
MNIST SCORES ON BALANCED, IMBALANCED, AND EXTREMELY IMBALANCED SETTINGS OF THE MNIST DATASET. IT IS OBVIOUS THAT OUR PROPOSED IFL-GAN ACHIEVES THE BEST PERFORMANCE AND OBTAINS THE HIGHEST MNIST SCORES IN DIFFERENT CASES

Fig. 6. Synthetic images produced by the three candidate models on the extremely imbalanced data setting of MNIST in the case of K = 2, in which the first client holds figures from "0" to "4" with 10 000 samples, while the second client holds figures from "5" to "9" with 100 samples. In this case, MD-GAN basically focuses on the figures "0"–"4," while FL-GAN has the worst quality among the three models. Only our proposed IFL-GAN produces diverse generated images. (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.

Fig. 6 shows the generated images from all three models in the case of K = 2 on the extremely imbalanced setting of MNIST. It is obvious that our proposed IFL-GAN produces more diverse simulation images than the other two GAN variants. For example, Fig. 6(c) shows the figures "6" (row 4, column 8), "7" (row 1, column 6), "8" (row 4, column 6), and "9" (row 1, column 7). Fig. 6(a) basically focuses on the figures "0"–"4," while Fig. 6(b) shows the worst quality of generated images among the three models. For the imbalanced case, the quality of the generated data is similar to Fig. 6. As is visually predictable from the outputs, the MNIST score of our proposed IFL-GAN is higher than that of the other GAN variants in the different cases, demonstrating the effectiveness of IFL-GAN.

Fig. 7. Synthetic images produced by the three candidate models in the case of K = 2 on the balanced data setting of CIFAR10, in which the first client holds the categories "airplane," "automobile," "bird," "cat," and "deer," while the second client holds the categories "dog," "frog," "horse," "ship," and "truck." (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.

Fig. 8. In the case of K = 2 on the balanced data setting of CIFAR10, (a) shows the first generator loss for our proposed IFL-GAN and FL-GAN, and (b) shows the second generator loss for the two models. In each subfigure, the red curve indicates the generator loss of FL-GAN, while the blue curve indicates that of IFL-GAN.
B. CIFAR10
We continue to apply our proposed IFL-GAN and the baselines to the CIFAR10 dataset in different cases: K = 2, 5, 10 on the "balanced data," "imbalanced data," and "extremely imbalanced data" settings. We still utilize the minimal MMD score to set the threshold during training. Here, we demonstrate the case of K = 2 with balanced data for convenient observation. In this case, the threshold $c_{a_1}$ of $\mathrm{mmd}_1$ is 0.09416 at epoch 171, and the threshold $c_{a_2}$ of $\mathrm{mmd}_2$ is 0.08366 at epoch 199. The threshold helps us reject the hypothesis (that the generated data distribution matches the original data distribution) if the MMD score is larger than the threshold; otherwise, we accept the hypothesis. When we accept the two hypotheses, the global GAN generates simulation data. The generated images in this case are shown in Fig. 7, and the corresponding generator loss values are shown in Fig. 8. This also indicates that our proposed IFL-GAN outperforms FL-GAN in terms of faster convergence and smaller fluctuation, especially for the first generator $G_1$. To further evaluate the performance of our proposed IFL-GAN, we adopt the inception score (the higher the better) [2], [26] as the metric. The inception score uses a pretrained Inception v3 image-classification network to predict the class probabilities for each generated image; these are conditional probabilities because images that contain meaningful objects should have a conditional label distribution p(y|x) with low entropy [26]. The inception scores of the generated images are reported in Table II. It is obvious that our proposed IFL-GAN still outperforms the baselines in the different cases, which further validates the effectiveness of IFL-GAN.

TABLE II
INCEPTION SCORES ON BALANCED, IMBALANCED, AND EXTREMELY IMBALANCED SETTINGS OF THE CIFAR10 DATASET. OUR PROPOSED IFL-GAN STILL ACHIEVES THE BEST PERFORMANCE AND OBTAINS THE HIGHEST INCEPTION SCORES IN DIFFERENT CASES

Fig. 9. Synthetic images produced by the three candidate models in the case of K = 2 on the balanced data setting of SVHN, in which the first client holds the categories from "0" to "4," while the second client holds the categories from "5" to "9." (a) MD-GAN. (b) FL-GAN. (c) IFL-GAN.
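A rough sketch of the inception-score computation just described, assuming a torchvision Inception v3 classifier and generated images already resized to 299 × 299 and normalized for that network; this illustrates the metric's definition and is not the exact evaluation code used for Table II.

```python
import torch
import torch.nn.functional as F
from torchvision.models import inception_v3

@torch.no_grad()
def inception_score(images):
    """images: float tensor (N, 3, 299, 299). Returns exp(E_x[KL(p(y|x) || p(y))])."""
    net = inception_v3(pretrained=True).eval()
    probs = F.softmax(net(images), dim=1)             # conditional label distribution p(y|x)
    marginal = probs.mean(dim=0, keepdim=True)        # marginal p(y) over the generated batch
    kl = (probs * (probs.log() - marginal.log())).sum(dim=1)
    return kl.mean().exp().item()
```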
C. SVHN
The street view house numbers (SVHN) dataset [7] is constructed from real-world house numbers in Google Street View images [22]. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits) but incorporates an order of magnitude more labeled data (over 600 000 digit images). We continue to apply our proposed IFL-GAN and the baselines to this dataset. We still set the threshold with the minimal MMD score during training. In the case of balanced data with K = 2, the threshold $c_{a_1}$ of $\mathrm{mmd}_1$ is 0.05169 at epoch 169, and the threshold $c_{a_2}$ of $\mathrm{mmd}_2$ is 0.0475 at epoch 152. We reject the hypothesis if the MMD score is larger than the threshold and accept it if the MMD score is smaller than or equal to the threshold. After accepting the two hypotheses, the global GAN generates the simulation data. The corresponding generated images for this case are shown in Fig. 9, and the generator loss values are shown in Fig. 10, which illustrates that IFL-GAN converges faster than FL-GAN. In addition, we develop the SVHN score (the higher the better), which is calculated in a similar way to the MNIST score, to evaluate the performance of the three candidate models on balanced data, imbalanced data, and extremely imbalanced data with K = 2, 5, 10. To obtain the SVHN score, the training set (i.e., 73 257 digits) is employed to train a classifier, and 100 generated images are viewed as the test set. The SVHN scores of the generated images from the three candidate models are reported in Table III. From Table III, we can observe that our proposed IFL-GAN still outperforms the baselines, achieving the highest SVHN score on the SVHN dataset.

Fig. 10. In the case of K = 2 on the balanced data setting of SVHN, (a) shows the first generator loss for our proposed IFL-GAN and FL-GAN, and (b) shows the second generator loss for the two models.

TABLE III
SVHN SCORES ON BALANCED, IMBALANCED, AND EXTREMELY IMBALANCED SETTINGS OF THE SVHN DATASET. THE EXPERIMENTAL RESULTS FURTHER DEMONSTRATE THE EFFECTIVENESS OF OUR PROPOSED IFL-GAN
D. Impact of Different Data Division Methods
Note that, although the data division in our study is artificial, it reflects real scenes to some extent. Here, we discuss the data division methods of both our study and [12], which is a representative data division method in federated learning. The study [12] splits data samples according to the Dirichlet distribution. It draws class proportions $q \sim \mathrm{Dir}(\alpha p)$ from a Dirichlet distribution in which p characterizes a prior class distribution and α denotes the concentration parameter. Note that the prior class distribution is a uniform distribution in [12]. When α is small (e.g., α ≪ 1), each client holds a few or even one category with q images, and it holds most categories when α has a larger value (e.g., α → ∞). In our study, the class distribution still follows a uniform distribution. Moreover, each client holds a few or just one category when K = 5 or K = 10 and holds more categories when K = 2. In other words, the division method in our study is similar to that in [12] to a certain degree. We also conduct experiments with α = 20.0 and α = 2.0 on CIFAR10. Here, we assume two clients and divide samples according to the Dirichlet distribution. For the case of α = 20.0, client 1 holds the categories "airplane," "automobile," "frog," "ship," and "truck," while client 2 holds the categories "bird," "cat," "deer," "dog," and "horse." This corresponds to the case of K = 2 in our study, where client 1 holds the categories "airplane," "automobile," "bird," "cat," and "deer," while client 2 holds the categories "dog," "frog," "horse," "ship," and "truck." The corresponding inception scores are similar to those in Table II. For the case of α = 2.0, client 1 holds the categories "automobile," "deer," and "dog" with 10 000 images, while client 2 holds the categories "airplane," "bird," "cat," "frog," "horse," "ship," and "truck" with 10 000 images (i.e., balanced data) or 100 images (i.e., extremely imbalanced data). Both FedAvg and MMD are utilized to update the parameters of the model, respectively. The inception scores are shown in Table IV. It is obvious that our approach still outperforms the traditional FedAvg method when utilizing the Dirichlet distribution to divide data samples.

TABLE IV
INCEPTION SCORES ON DIRICHLET DATA DIVISION. THE RESULTS STILL SHOW THE EFFECTIVENESS OF OUR IFL-GAN
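The Dirichlet split of [12] used above can be sketched as follows; the draw of per-client class proportions and the per-class sampling are standard, but the function name, the seed handling, and the fixed images-per-client budget are our illustrative choices.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=2, alpha=2.0, images_per_client=10_000, seed=0):
    """Draw q ~ Dir(alpha * p) with a uniform prior p for each client, then sample
    that client's images class by class according to q (after [12])."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    prior = np.full(len(classes), 1.0 / len(classes))        # uniform prior class distribution p
    client_indices = []
    for _ in range(num_clients):
        q = rng.dirichlet(alpha * prior)                     # per-client class proportions
        counts = rng.multinomial(images_per_client, q)       # images to draw from each class
        idx = []
        for c, n_c in zip(classes, counts):
            pool = np.flatnonzero(labels == c)
            idx.extend(rng.choice(pool, size=min(n_c, len(pool)), replace=False))
        client_indices.append(np.array(idx))
    return client_indices                                    # one index array per client
```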
E. Discussion
Although MD-GAN and FL-GAN can handle distributed data, they have limitations when data are distributed over multiple clients in a non-i.i.d. manner (see Fig. 6). This is because the federated averaging method gives each local model the same contribution to training the global model. Data being distributed over clients in the i.i.d. manner is not often the case in practice, so each local model inevitably makes a different contribution to the global model. To this end, we propose IFL-GAN, which takes the MMD score as each local model's contribution weight to drive the learning of a globally shared model by aggregating locally computed updates with those weights. Our IFL-GAN is more generalizable and achieves faster and more stable convergence (see Figs. 5, 8, and 10) than the baselines (i.e., FL-GAN and MD-GAN) on decentralized and non-i.i.d. data (see Figs. 4, 7, and 9), and it still produces diverse instances when facing extremely imbalanced data (see Fig. 6).
VI. CONCLUSION
In this study, to deal with the challenge of training GAN models on data distributed in a non-i.i.d. manner, we proposed IFL-GAN. Comprehensive experiments demonstrate the following capabilities of our IFL-GAN: 1) achieving higher MNIST scores, inception scores, and SVHN scores than FL-GAN and MD-GAN on decentralized and non-i.i.d. data; 2) generating more diverse, appealing, and recognizable images; and 3) converging faster and more stably than FL-GAN.
REFERENCES
[1] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2015, arXiv:1511.06434.
[2] S. Barratt and R. Sharma, “A note on the inception score,” 2018,
arXiv:1801.01973.
[3] K. Bonawitz et al., “Towards federated learning at scale: System design,”
2019, arXiv:1902.01046.
[4] C. Daskalakis, W. P. Goldberg, and H. C. Papadimitriou, “The complex-
ity of computing a Nash equilibrium,” SIAM J. Comput., vol. 39, no. 1,
pp. 195–259, 2009.
[5] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning,
vol. 1. Cambridge, MA, USA: MIT Press, 2016.
[6] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural
Inf. Process. Syst., 2014, pp. 2672–2680.
[7] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet, “Multi-
digit number recognition from street view imagery using deep convolu-
tional neural networks,” 2013, arXiv:1312.6082.
[8] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and
A. Smola, “A kernel two-sample test,” J.Mach.Learn.Res., vol. 13,
no. 1, pp. 723–773, Mar. 2012.
[9] C. Hardy, E. Le Merrer, and B. Sericola, “MD-GAN: Multi-
discriminator generative adversarial networks for distributed datasets,”
2018, arXiv:1811.03850.
[10] I. Hegedűs, G. Danner, and M. Jelasity, "Gossip learning as a decentralized alternative to federated learning," in Proc. IFIP Int. Conf. Distrib. Appl. Interoperable Syst. Cham, Switzerland: Springer, 2019, pp. 74–90.
[11] K. Hsieh et al., “Gaia: Geo-distributed machine learning approaching
LAN speeds,” in Proc. 14th USENIX Symp. Netw. Syst. Design Imple-
ment. (NSDI), 2017, pp. 629–647.
[12] T.-M. Harry Hsu, H. Qi, and M. Brown, “Measuring the effects of
non-identical data distribution for federated visual classification,” 2019,
arXiv:1909.06335.
[13] S.-P. Hu, “Simple mean, weighted mean, or geometric mean,” in Proc.
ISPA/SCEA Int. Conf., San Diego, CA, USA, 2010, pp. 1–24.
[14] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[15] C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. Póczos, "MMD GAN: Towards deeper understanding of moment matching network," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 2203–2213.
[16] W. Li, W. Ding, R. Sadasivam, X. Cui, and P. Chen, “His-GAN:
A histogram-based GAN model to improve data generation quality,”
Neural Netw., vol. 119, pp. 31–45, Nov. 2019.
[17] W. Li, L. Fan, Z. Wang, C. Ma, and X. Cui, “Tackling mode collapse
in multi-generator GANs with orthogonal vectors,” Pattern Recognit.,
vol. 110, Feb. 2021, Art. no. 107646.
[18] W. Li, Z. Liang, P. Ma, R. Wang, X. Cui, and P. Chen, “Haus-
dorff GAN: Improving GAN generation quality with Hausdorff
metric,” IEEE Trans. Cybern., early access, Mar. 18, 2021, doi:
10.1109/TCYB.2021.3062396.
[19] B. Liu, L. Wang, and M. Liu, “Lifelong federated reinforcement learn-
ing: A learning architecture for navigation in cloud robotic systems,”
2019, arXiv:1901.06455.
[20] R. Mayer and H.-A. Jacobsen, Scalable deep learning on dis-
tributed infrastructures: Challenges, techniques and tools, 2019,
arXiv:1903.11314.
[21] H. Brendan McMahan, E. Moore, D. Ramage, S. Hampson, and
B. Agüera y Arcas, “Communication-efficient learning of deep networks
from decentralized data,” 2016, arXiv:1602.05629.
[22] Y. Netzer, T. Wang, A. Coates, A. Bissacco, Bo Wu, and A. Y. Ng,
“Reading digits in natural images with unsupervised feature learning,” in
Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn., 2011.
[23] A. Nilsson, S. Smith, G. Ulm, E. Gustavsson, and M. Jirstrand,
“A performance evaluation of federated learning algorithms,” in Proc.
2nd Workshop Distrib. Infrastruct. Deep Learn., Dec. 2018, pp. 1–8.
[24] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating
deep network training by reducing internal covariate shift,” 2015,
arXiv:1502.03167.
[25] J. R. Norris, Markov chains, no. 2. Cambridge, U.K.: Cambridge Univ.
Press, 1998.
[26] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and
X. Chen, “Improved techniques for training GANs,” in Proc. Adv. Neural
Inf. Process. Syst., 2016, pp. 2234–2242.
[27] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated
multi-task learning,” in Proc. Adv. Neural Inf. Process. Syst., 2017,
pp. 4424–4434.
[28] X. Wang and A. Gupta, “Generative image modeling using style and
structure adversarial networks,” in Proc. Eur. Conf. Comput. Vis. Cham,
Switzerland: Springer, 2016, pp. 318–335.
[29] C. Xiao, P. Zhong, and C. Zheng, “BourGAN: Generative networks
with metric embeddings,” in Proc. Adv. Neural Inf. Process. Syst., 2018,
pp. 2269–2280.
[30] B. Xu, N. Wang, C. Naiyan, and T. Mu Li, “Empirical Evaluation of Rec-
tified Activations in Convolutional Network,” 2015, arXiv:1505.00853.
[31] Q. Xu et al., “An empirical study on evaluation metrics of generative
adversarial networks,” 2018, arXiv:1806.07755.
[32] R. R. Yager, “On ordered weighted averaging aggregation operators in
multicriteria decisionmaking,” IEEE Trans. Syst., Man, Cybern., vol. 18,
no. 1, pp. 183–190, Jan./Feb. 1988.
[33] R. R. Yager and J. Kacprzyk, The Ordered Weighted Averaging Oper-
ators: Theory and Applications. Berlin, Germany: Springer, Sci. Bus.
Media, 2012.
[34] H. Hankui Zhuo, W. Feng, Y. Lin, Q. Xu, and Q. Yang, “Federated deep
reinforcement learning,” 2019, arXiv:1901.08277.
Wei Li (Member, IEEE) received the bachelor’s degree from the School of Mathematics and Statistics, Wuhan University, Wuhan, China, in 2008, and the Ph.D. degree from the School of Cyber Science and Engineering, Wuhan University, in 2019.
He was a Visiting Student with the University of Massachusetts Boston, Boston, MA, USA, and a Research Assistant with The Hong Kong Polytechnic University, Hong Kong. He is currently an Associate Professor with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China. His research areas include data mining, machine learning, and artificial intelligence.
Jinlin Chen received the master’s degree from The Hong Kong Polytechnic University, Hong Kong, in 2016, where he is currently pursuing the Ph.D. degree with the Department of Computing.
From 2017 to 2018, he was a Machine Learning Engineer with The Hong Kong Applied Science and Technology Research Institute, Hong Kong. His research interests include machine learning security, robotics control, and multiagent reinforcement learning.
Zhenyu Wang received the bachelor’s degree in software engineering from Wuhan University, Wuhan, China, in 2010, and the Ph.D. degree from the School of Remote Sensing Information Engineering, Wuhan University, in 2018.
He held a post-doctoral research position at the School of Computer Science, Wuhan University. He is currently a Researcher with the Jiaxing Institute of Future Food, Jiaxing, China. His research areas include data mining, big data on food safety, and artificial intelligence.
Zhidong Shen received the M.A. and Ph.D. degrees in computer science from Wuhan University, Wuhan, China, in 2003 and 2006, respectively.
He conducted visiting research at the Queensland University of Technology, Brisbane, QLD, Australia, in 2010, and at the University of Victoria, Victoria, BC, Canada, in 2014. He is currently an Associate Professor with the School of Cyber Science and Engineering, Wuhan University. His research interests include cyberspace security, trusted computing, machine learning, and big data.
Dr. Shen is also a member of the China Computer Federation (CCF).
Chao Ma (Member, IEEE) is currently an Assistant Professor with the School of Cyber Science and Engineering, Wuhan University, Wuhan, China. He has published over 30 academic papers in major international journals and conference proceedings. His research interests include time-series analytics, representation learning, deep learning, explainable AI, and big data analytics.
Dr. Ma is also a Professional Member of the China Computer Federation (CCF).
Xiaohui Cui received the B.S. degree in photoelectric technology and the M.S. degree in computer science from Wuhan University, Wuhan, China, in 1996 and 2000, respectively, and the Ph.D. degree in computer science and engineering from the University of Louisville, Louisville, KY, USA, in 2004.
He is currently a Professor with the School of Cyber Science and Engineering, Wuhan University. He has published over 160 papers in major artificial intelligence journals and conferences. His research areas include data mining, machine learning, and artificial intelligence.
Dr. Cui has received multiple national grants.