Quaternion Generative Adversarial Networks
Eleonora Grassucci, Edoardo Cicero and Danilo Comminiello
Abstract Latest Generative Adversarial Networks (GANs) are gathering outstand-
ing results through a large-scale training, thus employing models composed of mil-
lions of parameters requiring extensive computational capabilities. Building such
huge models undermines their replicability and increases the training instability.
Moreover, multi-channel data, such as images or audio, are usually processed by
real-valued convolutional networks that flatten and concatenate the input, losing any
intra-channel spatial relation. To address these issues, here we propose a family of
quaternion-valued generative adversarial networks (QGANs). QGANs exploit the
properties of quaternion algebra, e.g., the Hamilton product for convolutions. This
allows processing the channels as a single entity and capturing internal latent relations, while reducing the overall number of parameters by a factor of 4. We show how
to design QGANs and to extend the proposed approach even to advanced models.
We compare the proposed QGANs with real-valued counterparts on multiple image generation benchmarks. Results show that QGANs are able to generate visually pleasing images and to obtain better FID scores than their real-valued counterparts. Furthermore, QGANs save up to 75% of the training parameters. We believe these results may pave the way to novel, more accessible GANs capable of improving performance and saving computational resources.
1 Introduction
Generative models, including generative adversarial networks (GANs) [10] and variational autoencoders (VAEs) [23], have recently undergone an increasingly widespread development due to the massive availability of large datasets covering a wide range of applications. The demand to learn such complex data distributions
Authors are with the Department of Information Engineering, Electronics and Telecommuni-
cations (DIET), Sapienza University of Rome, Via Eudossiana 18, 00184 Rome, Italy, e-mail:
{eleonora.grassucci, danilo.comminiello}@uniroma1.it
arXiv:2104.09630v1 [cs.LG] 19 Apr 2021
leads to models far from the original approach of a simple GAN, which was characterized by fully connected layers and evaluated on benchmark datasets such as MNIST [10]. Multiple pathways have been explored to improve the generation ability of GANs. A first branch aims at stabilizing the training process, which is notoriously unstable and often leads to a lack of convergence. This includes constraining the discriminator network to be 1-Lipschitz by introducing a gradient penalty in the loss function, normalizing the spectral norm of the network weights, or adding a consistency regularization [1, 15, 25, 39]. Other significant improvements are gained by architectural innovations such as self-attention modules, flexible activation functions, or a style-based generator [21, 13, 38, 20]. A crucial improvement in the quality of image generation has been brought by broadly scaling up the networks and using wider batch sizes [2, 22, 33]. Indeed, BigGAN closed the visual quality gap between GAN-generated images and real-world samples on ImageNet [2]. Most of the latest GANs are somehow inspired by it.
However, these impressive results come at the cost of huge models with hundreds of millions of free parameters which require large computational resources. This drastically reduces the accessibility and the diffusion of this kind of model. Moreover, GANs are notoriously fragile models, so training with this number of parameters may result in an unstable or unwieldy process. Furthermore, when dealing with multidimensional inputs, such as images, 3D audio, multi-sensor signals or human-pose estimation, among others, real-valued networks break the original structure of the inputs. Channels are processed as independent entities and just concatenated in a tensor, without exploiting any correlation or intra-channel information.
In order to address these limitations, neural networks in hypercomplex domains
have been proposed. Among them, quaternion neural networks (QNNs) leverage the
properties of the non-commutative quaternion algebra to define lower-complexity
models and preserve relations among channels. Indeed, QNNs process channels
together as a single entity, thus maintaining the original input design and correlation.
Due to this feature, QNNs are able to capture internal relations while saving up to 75% of the free parameters thanks to hypercomplex-valued operations, including the
Hamilton product.
Encouraged by the promising results of other generative models in the quaternion
domain [12] and the need to make deep GANs more accessible, we introduce the
family of quaternion generative adversarial networks (QGANs). QGANs are com-
pletely defined in the quaternion domain and, among other properties, they exploit
the quaternion convolutions derived from the hypercomplex algebra [27, 28, 8, 6]
to improve the generation ability of the model while reducing the overall number
of parameters. We present the core-blocks to define a vanilla QGAN in the quater-
nion adversarial fashion and then explain how to derive more advanced QGANs
to prove the improved generation ability of the proposed approach on multiple image generation benchmarks. We show that the quaternion spectral normalized GAN (QSNGAN) achieves a better FID score and a more pleasing visual quality of the generated images than its real-valued counterpart, thanks to the quaternion inner operations. Moreover, the proposed QSNGAN has just 25% of the free parameters of the real-valued SNGAN.
We believe that these theoretical statements and empirical results lay the founda-
tions for novel deep GANs in hypercomplex domains capable of grasping internal
input relations while scaling down computational requirements, thus saving memory
and being more accessible. To the best of our knowledge, this is the first attempt to
define GANs in a hypercomplex domain.
The contribution of this chapter is threefold:
i) we introduce the family of quaternion generative adversarial networks (QGANs), proving their enhanced generation ability and lower complexity with respect to their real-valued counterparts on different benchmark datasets;
ii) we define the theoretically correct approach to apply the quaternion batch nor-
malization (QBN) and redefine existing approaches as its approximations;
iii) we propose and define the spectral normalization in the quaternion domain
(QSN) proving its efficacy on two image generation benchmarks.
The chapter is organized as follows. Section 2 presents the fundamental prop-
erties of quaternion algebra, while Section 3 describes the quaternion adversarial
framework and the quaternion-valued core blocks used in QGANs. Section 4 lays
the foundations for the quaternion generative adversarial networks and presents a
simple quaternion vanilla GAN and a more advanced and complex QGAN model.
Section 5 proves the effectiveness of the presented QGANs on a thorough empirical
evaluation, and, finally, conclusions are drawn in Section 6.
2 Quaternion Algebra
Quaternions are hypercomplex numbers of rank 4, being a direct non-commutative extension of complex numbers. The quaternion domain $\mathbb{H}$ is a four-dimensional associative normed division algebra over the real numbers, belonging to the class of Clifford algebras [36]. A quaternion is defined as the composition of one scalar element and three imaginary ones:

$$q = q_0 + q_1\hat{\imath} + q_2\hat{\jmath} + q_3\hat{\kappa} = q_0 + \mathbf{q}, \qquad (1)$$

with $q_0, q_1, q_2, q_3 \in \mathbb{R}$, and with $\hat{\imath}=(1,0,0)$, $\hat{\jmath}=(0,1,0)$, $\hat{\kappa}=(0,0,1)$ being unit axis vectors representing the orthonormal basis in $\mathbb{R}^3$. A pure quaternion is a quaternion without its scalar part $q_0$, resulting in the vector $\mathbf{q} = q_1\hat{\imath} + q_2\hat{\jmath} + q_3\hat{\kappa}$. As for complex numbers, the quaternion algebra also relies upon the relations among the imaginary components:

$$\hat{\imath}^2 = \hat{\jmath}^2 = \hat{\kappa}^2 = -1, \qquad (2)$$

$$\hat{\imath}\hat{\jmath} = \hat{\imath}\times\hat{\jmath} = \hat{\kappa}; \quad \hat{\jmath}\hat{\kappa} = \hat{\jmath}\times\hat{\kappa} = \hat{\imath}; \quad \hat{\kappa}\hat{\imath} = \hat{\kappa}\times\hat{\imath} = \hat{\jmath}. \qquad (3)$$
While the scalar product of two quaternions $q$ and $p$ is simply defined as the element-wise product $q \cdot p = q_0 p_0 + q_1 p_1 + q_2 p_2 + q_3 p_3$, quaternion vector multiplication, denoted with $\times$, is not commutative, i.e., $\hat{\imath}\hat{\jmath} \neq \hat{\jmath}\hat{\imath}$. In fact:

$$\hat{\imath}\hat{\jmath} = -\hat{\jmath}\hat{\imath}; \quad \hat{\jmath}\hat{\kappa} = -\hat{\kappa}\hat{\jmath}; \quad \hat{\kappa}\hat{\imath} = -\hat{\imath}\hat{\kappa}.$$

Due to the non-commutative property, we need to introduce the quaternion product, commonly known as the Hamilton product. We will see that the Hamilton product plays a crucial role in neural networks. It is defined as:

$$\begin{aligned}
q p &= (q_0 + q_1\hat{\imath} + q_2\hat{\jmath} + q_3\hat{\kappa})(p_0 + p_1\hat{\imath} + p_2\hat{\jmath} + p_3\hat{\kappa}) \\
&= (q_0 p_0 - q_1 p_1 - q_2 p_2 - q_3 p_3) \\
&\quad + (q_0 p_1 + q_1 p_0 + q_2 p_3 - q_3 p_2)\,\hat{\imath} \\
&\quad + (q_0 p_2 - q_1 p_3 + q_2 p_0 + q_3 p_1)\,\hat{\jmath} \\
&\quad + (q_0 p_3 + q_1 p_2 - q_2 p_1 + q_3 p_0)\,\hat{\kappa}.
\end{aligned} \qquad (4)$$
The above product can be rewritten in a more concise form as:

$$q p = q_0 p_0 - \mathbf{q}\cdot\mathbf{p} + q_0\mathbf{p} + p_0\mathbf{q} + \mathbf{q}\times\mathbf{p}, \qquad (5)$$

where $q_0 p_0 - \mathbf{q}\cdot\mathbf{p}$ is the scalar part of the output quaternion and $q_0\mathbf{p} + p_0\mathbf{q} + \mathbf{q}\times\mathbf{p}$ is instead its vector part. From (5) it is easy to define a concise form of the product for pure quaternions too:

$$\mathbf{q}\mathbf{p} = -\mathbf{q}\cdot\mathbf{p} + \mathbf{q}\times\mathbf{p}, \qquad (6)$$

where the scalar product is the same as before for full quaternions and the vector product is $\mathbf{q}\times\mathbf{p} = (q_2 p_3 - q_3 p_2)\hat{\imath} + (q_3 p_1 - q_1 p_3)\hat{\jmath} + (q_1 p_2 - q_2 p_1)\hat{\kappa}$.
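To make the Hamilton product of (4) concrete, here is a minimal NumPy sketch; the function name and the representation of a quaternion as a 4-vector of components are our own illustrative choices:

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product (Eq. 4) of quaternions stored as arrays (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = q
    p0, p1, p2, p3 = p
    return np.array([
        q0 * p0 - q1 * p1 - q2 * p2 - q3 * p3,   # scalar part
        q0 * p1 + q1 * p0 + q2 * p3 - q3 * p2,   # i component
        q0 * p2 - q1 * p3 + q2 * p0 + q3 * p1,   # j component
        q0 * p3 + q1 * p2 - q2 * p1 + q3 * p0,   # k component
    ])

i_hat = np.array([0., 1., 0., 0.])
j_hat = np.array([0., 0., 1., 0.])
print(hamilton_product(i_hat, j_hat))  # i j = k
print(hamilton_product(j_hat, i_hat))  # j i = -k, showing non-commutativity
```

The two prints return $(0,0,0,1)$ and $(0,0,0,-1)$, i.e., $\hat{\imath}\hat{\jmath} = \hat{\kappa}$ while $\hat{\jmath}\hat{\imath} = -\hat{\kappa}$, matching (3).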
Similarly to complex numbers, the conjugate of a quaternion can be defined as:

$$q^* = q_0 - q_1\hat{\imath} - q_2\hat{\jmath} - q_3\hat{\kappa} = q_0 - \mathbf{q}. \qquad (7)$$

The norm is also defined and is equal to $|q| = \sqrt{q q^*} = \sqrt{q_0^2 + q_1^2 + q_2^2 + q_3^2}$, that is, the Euclidean norm in $\mathbb{R}^4$. Indeed, $q$ is said to be a unit quaternion if $|q| = 1$, as well as a pure unit quaternion if $q^2 = -1$. Moreover, a quaternion $q$ is endowed with an inverse determined by:

$$q^{-1} = \frac{q^*}{|q|^2}.$$

Note that for unit quaternions the relation $q^* = q^{-1}$ holds.
A quaternion also has a polar form:

$$q = |q|\left(\cos(\theta) + \mathbf{v}\sin(\theta)\right) = |q|\, e^{\mathbf{v}\theta}, \qquad (8)$$

where $\theta \in \mathbb{R}$ is the argument of the quaternion, $\cos(\theta) = q_0/|q|$, $\sin(\theta) = \|\mathbf{q}\|/|q|$, and $\mathbf{v} = \mathbf{q}/\|\mathbf{q}\|$ is a pure unit quaternion.
Furthermore, quaternions show interesting properties when interpreted as points and hyperplanes in $\mathbb{R}^4$. Among them, we find involutions, which are generally defined as self-inverse mappings, i.e., mappings that are their own inverse. Quaternions have an infinite number of involutions [7] that can be generalized by the formula:

$$q^{\mathbf{v}} = -\mathbf{v} q \mathbf{v}, \qquad (9)$$

where $q$ is an arbitrary quaternion to be involved and $\mathbf{v}$ is any unit vector, which acts as the axis of the involution. Among the infinite involutions, the most relevant ones are the three perpendicular involutions defined as:

$$\begin{aligned}
q^{\hat{\imath}} &= -\hat{\imath} q \hat{\imath} = q_0 + q_1\hat{\imath} - q_2\hat{\jmath} - q_3\hat{\kappa} \\
q^{\hat{\jmath}} &= -\hat{\jmath} q \hat{\jmath} = q_0 - q_1\hat{\imath} + q_2\hat{\jmath} - q_3\hat{\kappa} \\
q^{\hat{\kappa}} &= -\hat{\kappa} q \hat{\kappa} = q_0 - q_1\hat{\imath} - q_2\hat{\jmath} + q_3\hat{\kappa}
\end{aligned} \qquad (10)$$

These were the first involutions identified [5] and they are crucial for the study of the second-order statistics of a quaternion signal, as we will see in the next section.
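The perpendicular involutions in (10) can be verified numerically through the Hamilton product of (4); the sketch below uses our own helper names:

```python
import numpy as np

def hamilton(q, p):
    """Hamilton product (Eq. 4) on 4-vectors (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = q
    p0, p1, p2, p3 = p
    return np.array([q0*p0 - q1*p1 - q2*p2 - q3*p3,
                     q0*p1 + q1*p0 + q2*p3 - q3*p2,
                     q0*p2 - q1*p3 + q2*p0 + q3*p1,
                     q0*p3 + q1*p2 - q2*p1 + q3*p0])

def involution(q, v):
    """q^v = -v q v (Eq. 9), with v a pure unit quaternion."""
    return -hamilton(hamilton(v, q), v)

q = np.array([1., 2., 3., 4.])
i_hat = np.array([0., 1., 0., 0.])
# The involution about i keeps q0, q1 and flips the signs of q2, q3 (Eq. 10):
print(involution(q, i_hat))  # [ 1.  2. -3. -4.]
```

Applying the same involution twice returns the original quaternion, which is exactly the self-inverse property mentioned above.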
3 Generative Learning in the Quaternion Domain
In this section, we introduce the quaternion adversarial approach as well as the fundamental quaternion-valued operations employed to define the family of QGANs in the next sections. It is worth noting that in a quaternion neural network each element is a quaternion, including inputs, weights, biases and outputs.
3.1 The Quaternion Adversarial Framework
Generative adversarial networks are built upon a minimax game between the generator network ($G$) and the discriminator one ($D$), as a special case of the concept initially proposed to implement artificial curiosity [31, 32]. They are trained in an adversarial fashion through the following objective function, introduced in [10]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left\{\log D(x)\right\} + \mathbb{E}_{z \sim p_z(z)}\left\{\log\left(1 - D(G(z))\right)\right\}, \qquad (11)$$

where $p_{\text{data}}$ is the real data distribution and $p_z$ is the noise distribution. The two terms in the objective are two cross-entropies [14]. Indeed, the first term is the cross-entropy between $[1\ 0]^{\mathsf{T}}$ and $[D(x)\ \ 1 - D(x)]^{\mathsf{T}}$, whereas the second term is the cross-entropy between $[0\ 1]^{\mathsf{T}}$ and $[D(G(z))\ \ 1 - D(G(z))]^{\mathsf{T}}$. In order to introduce the family of QGANs, we first need to delineate this adversarial approach in the quaternion domain. We therefore redefine the cross-entropy function, as suggested in [27], by replacing real numbers with hypercomplex numbers and computing the operations element-wise. Thus, the quaternion cross-entropy (QCE) between the target quaternion $q$ and the estimated one $\tilde{q}$ can be defined as follows:

$$\begin{aligned}
\text{QCE}(q, \tilde{q}) = -\frac{1}{N}\sum_{n=1}^{N} \Big[ &\ q_0 \log(\tilde{q}_0) + (1 - q_0)\log(1 - \tilde{q}_0) \\
+ &\ q_1 \log(\tilde{q}_1) + (1 - q_1)\log(1 - \tilde{q}_1) \\
+ &\ q_2 \log(\tilde{q}_2) + (1 - q_2)\log(1 - \tilde{q}_2) \\
+ &\ q_3 \log(\tilde{q}_3) + (1 - q_3)\log(1 - \tilde{q}_3) \Big]. \qquad (12)
\end{aligned}$$
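A split implementation of (12) simply applies the binary cross-entropy to each quaternion component and averages over the batch. The following sketch uses our own naming, and the clipping term is an addition of ours for numerical safety:

```python
import numpy as np

def qce(q, q_tilde, eps=1e-12):
    """Quaternion cross-entropy (Eq. 12) for batches of shape (N, 4).
    Components are assumed to lie in (0, 1), e.g. after a split sigmoid."""
    q_tilde = np.clip(q_tilde, eps, 1.0 - eps)   # avoid log(0)
    ce = q * np.log(q_tilde) + (1 - q) * np.log(1 - q_tilde)
    return -np.sum(ce) / q.shape[0]              # -1/N over the batch

target = np.full((8, 4), 0.9)
good = qce(target, np.full((8, 4), 0.9))   # near-perfect prediction, small loss
bad = qce(target, np.full((8, 4), 0.1))    # poor prediction, larger loss
print(good < bad)  # True
```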
More generally, several objective functions proposed to train GANs can be redefined in the quaternion domain. Among the most common ones, we find the Wasserstein distance with a gradient penalty that enforces the Lipschitz continuity of the discriminator, which is defined as follows [1, 15]:

$$V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left\{D(x)\right\} - \mathbb{E}_{z \sim p_z(z)}\left\{D(G(z))\right\} - \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\left\{\left(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\right)^2\right\}, \qquad (13)$$

where the last term is the gradient penalty, a regularization technique for the discriminator.

Other works [25, 3] consider instead the hinge loss, which is given, respectively for the discriminator and the generator, by:

$$V(D, \hat{G}) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left\{\min(0, -1 + D(x))\right\} + \mathbb{E}_{z \sim p_z(z)}\left\{\min(0, -1 - D(\hat{G}(z)))\right\}, \qquad (14)$$

$$V(\hat{D}, G) = -\mathbb{E}_{z \sim p_z(z)}\left\{\hat{D}(G(z))\right\}. \qquad (15)$$

Since (13) and (14) are compositions of expected values and cross-entropies, both the definition of the Wasserstein loss and that of the hinge loss in the quaternion domain are straightforwardly derived by following the procedure shown for the adversarial loss in (11).
3.2 Quaternion Fully Connected Layers
In real-valued neural networks, fully connected layers are generally defined as:
$$\mathbf{y}_r = \phi(\mathbf{W}_r \mathbf{x}_r + \mathbf{b}_r), \qquad (16)$$

where $\mathbf{W}_r \mathbf{x}_r$ performs the multiplication between the weight matrix $\mathbf{W}_r$ and the input $\mathbf{x}_r$, $\mathbf{b}_r$ is the bias and $\phi(\cdot)$ is any activation function. In order to define the same operation in the quaternion domain, we represent the quaternion weight matrix as $\mathbf{W} = \mathbf{W}_0 + \mathbf{W}_1\hat{\imath} + \mathbf{W}_2\hat{\jmath} + \mathbf{W}_3\hat{\kappa}$, the quaternion input as $\mathbf{x} = \mathbf{x}_0 + \mathbf{x}_1\hat{\imath} + \mathbf{x}_2\hat{\jmath} + \mathbf{x}_3\hat{\kappa}$ and the quaternion bias as $\mathbf{b} = \mathbf{b}_0 + \mathbf{b}_1\hat{\imath} + \mathbf{b}_2\hat{\jmath} + \mathbf{b}_3\hat{\kappa}$. Therefore, $\mathbf{W}\mathbf{x}$ in (16) is performed by a vector multiplication between two quaternions, i.e., by the Hamilton product:

$$\begin{aligned}
\mathbf{W}\mathbf{x} &= (\mathbf{W}_0\mathbf{x}_0 - \mathbf{W}_1\mathbf{x}_1 - \mathbf{W}_2\mathbf{x}_2 - \mathbf{W}_3\mathbf{x}_3) \\
&\quad + (\mathbf{W}_0\mathbf{x}_1 + \mathbf{W}_1\mathbf{x}_0 + \mathbf{W}_2\mathbf{x}_3 - \mathbf{W}_3\mathbf{x}_2)\,\hat{\imath} \\
&\quad + (\mathbf{W}_0\mathbf{x}_2 - \mathbf{W}_1\mathbf{x}_3 + \mathbf{W}_2\mathbf{x}_0 + \mathbf{W}_3\mathbf{x}_1)\,\hat{\jmath} \\
&\quad + (\mathbf{W}_0\mathbf{x}_3 + \mathbf{W}_1\mathbf{x}_2 - \mathbf{W}_2\mathbf{x}_1 + \mathbf{W}_3\mathbf{x}_0)\,\hat{\kappa}.
\end{aligned} \qquad (17)$$
Note that $\mathbf{W}$ has dimensionality $\frac{1}{4}|\mathbf{W}_r|$, since it is composed of four submatrices $\mathbf{W}_0, \mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3$, each one with $1/16$ the dimension of $\mathbf{W}_r$. This is a key feature of QNNs, since the quaternion layer built upon the product $\mathbf{W}\mathbf{x}$ has the same output dimension as the real-valued layer built upon $\mathbf{W}_r\mathbf{x}_r$, but with $1/4$ the number of parameters to train. Note also that the submatrices are shared over each component of the quaternion input. This sharing allows the weights to capture internal relations among quaternion elements, since each characteristic in one component influences the other components through the common weights. In this way, the relations among components are preserved and captured by the weights of the network, which is able to process inputs without losing intra-channel information. The bias $\mathbf{b}$ is then added component by component. Finally, in QNNs the activation functions are applied element-wise, resulting in the so-called split activation functions. That is, considering a common Rectified Linear Unit (ReLU) activation function and the quaternion $\mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b}$, the final output $\mathbf{y}$ of the layer will be:

$$\mathbf{y} = \text{ReLU}(\mathbf{z}_0) + \text{ReLU}(\mathbf{z}_1)\hat{\imath} + \text{ReLU}(\mathbf{z}_2)\hat{\jmath} + \text{ReLU}(\mathbf{z}_3)\hat{\kappa}. \qquad (18)$$
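The layer described by (17)-(18) can be sketched in a few lines of NumPy. Shapes and names below are our own illustrative choices, and the final ratio mirrors the 1/4 parameter saving discussed above:

```python
import numpy as np

def quaternion_linear(x, W, b):
    """Split-ReLU quaternion fully connected layer (Eqs. 17-18).
    x: (4, n_in) input components; W: (4, n_out, n_in) submatrices W0..W3,
    shared across the four input components; b: (4, n_out) quaternion bias."""
    x0, x1, x2, x3 = x
    W0, W1, W2, W3 = W
    z = np.stack([
        W0 @ x0 - W1 @ x1 - W2 @ x2 - W3 @ x3,   # scalar part
        W0 @ x1 + W1 @ x0 + W2 @ x3 - W3 @ x2,   # i component
        W0 @ x2 - W1 @ x3 + W2 @ x0 + W3 @ x1,   # j component
        W0 @ x3 + W1 @ x2 - W2 @ x1 + W3 @ x0,   # k component
    ]) + b
    return np.maximum(z, 0.0)                     # split ReLU (Eq. 18)

rng = np.random.default_rng(0)
n_in, n_out = 16, 8
y = quaternion_linear(rng.standard_normal((4, n_in)),
                      rng.standard_normal((4, n_out, n_in)),
                      rng.standard_normal((4, n_out)))
print(y.shape)  # (4, 8)

# Parameter saving: a real layer on the flattened 4*n_in channels would need
# (4*n_out) x (4*n_in) weights; the quaternion layer uses only 4 x n_out x n_in.
ratio = (4 * n_out * n_in) / ((4 * n_out) * (4 * n_in))
print(ratio)  # 0.25
```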
3.3 Quaternion Convolutional Layers
Convolutional layers are generally applied to multichannel inputs, such as images. Supposing we deal with color images, real-valued neural networks break the structure of the input and concatenate the red, green and blue (RGB) channels in a tensor. Quaternion-valued convolutions, instead, preserve the correlations among the channels and encapsulate the image in a quaternion as [27, 26, 37]:

$$\mathbf{x} = 0 + R\,\hat{\imath} + G\,\hat{\jmath} + B\,\hat{\kappa}. \qquad (19)$$

The image channels are the real coefficients of the imaginary units, while the scalar part is set to $0$. Encapsulating channels in a quaternion allows us to treat them as a single entity and thus to preserve intra-channel relations. A visual explanation of the quaternion representation of color images is depicted in Fig. 1.
Fig. 1 Visual explanation of an $R, G, B$ image processed by real- and quaternion-valued networks. On the left, the original three-channel image. The image can be processed in two ways: i) as a tensor of independent channels by a standard real-valued convolutional network, as at the top of the figure; ii) as a single entity, encapsulating it in a quaternion and considering internal relations among channels, as the quaternion-valued convolutional network does at the bottom of the figure. It is worth noting how the real-valued network does not consider any correlation among channels, while the quaternion one preserves the relations among channels.
Similarly to the definition of fully connected layers in the previous section, let us now consider a real-valued convolutional layer delineated by:

$$\mathbf{y} = \phi(\mathbf{W} * \mathbf{x} + \mathbf{b}), \qquad (20)$$

where $*$ is the convolution operator. Quaternion convolutional layers are built with the same procedure depicted for fully connected layers, thus considering the Hamilton product instead of the standard vector multiplication. That is, for quaternion weights and inputs, the convolution operator $\mathbf{W} * \mathbf{x}$ becomes:

$$\begin{aligned}
\mathbf{W} * \mathbf{x} &= (\mathbf{W}_0 * \mathbf{x}_0 - \mathbf{W}_1 * \mathbf{x}_1 - \mathbf{W}_2 * \mathbf{x}_2 - \mathbf{W}_3 * \mathbf{x}_3) \\
&\quad + (\mathbf{W}_0 * \mathbf{x}_1 + \mathbf{W}_1 * \mathbf{x}_0 + \mathbf{W}_2 * \mathbf{x}_3 - \mathbf{W}_3 * \mathbf{x}_2)\,\hat{\imath} \\
&\quad + (\mathbf{W}_0 * \mathbf{x}_2 - \mathbf{W}_1 * \mathbf{x}_3 + \mathbf{W}_2 * \mathbf{x}_0 + \mathbf{W}_3 * \mathbf{x}_1)\,\hat{\jmath} \\
&\quad + (\mathbf{W}_0 * \mathbf{x}_3 + \mathbf{W}_1 * \mathbf{x}_2 - \mathbf{W}_2 * \mathbf{x}_1 + \mathbf{W}_3 * \mathbf{x}_0)\,\hat{\kappa}.
\end{aligned} \qquad (21)$$
Note that in convolutional networks the weight sharing is crucial to properly process channels. Indeed, the RGB channels of an image interact with each other, resulting in combined colors, such as yellow or violet, through a representation of pixels in the color space. Nonetheless, real-valued networks are not able to capture these interactions, since they process input channels separately, while QCNNs not only preserve the input design but also capture these relations through the sharing of weights. In fact, QCNNs perform a twofold learning: the convolution operator has the task of learning external relations among the pixels of the image, while the Hamilton product accomplishes the learning among the channels. Furthermore, as for linear layers, QCNNs are built with $1/4$ the number of parameters with respect to their real-valued counterparts.
3.4 Quaternion Pooling Layers
Many neural networks make use of pooling layers, such as max pooling or average pooling, to extract high-level information and reduce input dimensions. As done for the previous layers, this set of operations can also be redefined in the quaternion domain.

The simplest examples of pooling in the hypercomplex domain are average and sum pooling. Indeed, applying these operations to each quaternion component, as done for the split activation functions, will not affect the final result [37]. A different approach must be defined, instead, for max pooling. Indeed, the position of the maximum in a single component is not guaranteed to coincide with that of the other components. In order to address this issue, a guidance matrix has to be introduced. As in [37], this matrix is built through the quaternion amplitude and keeps track of the position of the maximum, which is then mapped back to the original quaternion matrix in order to proceed with the pooling computation. However, max pooling operations are rarely employed in GANs, thus we only make use of average and sum pooling in our experiments.
3.5 Quaternion Batch Normalization
Introduced in [19], batch normalization (BN) immediately became an ever-present module in neural networks. The idea behind BN is to normalize inputs to have zero mean and unit variance. This normalization helps the generalization ability of the network among different batches of training data and between the train and test data distributions. Moreover, reducing the internal covariate shift remarkably improves the training speed, thus leading to a faster convergence of the model. For these reasons, QNNs are also endowed with batch normalization. However, different versions of this method have been proposed in the literature. An elegant whitening procedure based on the standard covariance matrix is introduced in [8]. Here, the Cholesky decomposition is used to compute the square root of the inverse of the covariance matrix, which is often intractable. The authors assert that this approach ensures zero mean, unit variance and decorrelation among components. However, the covariance matrix is not able to recover the complete second-order statistics in the quaternion domain [4], and the decomposition requires heavy matrix calculations and computational time [18]. Another remarkable approach is introduced in [34], where the input is standardized by computing the average of the variance of each component. Nevertheless, describing the second-order statistics of a signal in the quaternion domain requires meticulous computations, and the approach in [34] is an approximation of the complete variance. Notwithstanding the approximation, this method allows a notable reduction of the computational time.
The theoretically proper procedure to reach a centered, decorrelated and unit-variance quaternion signal would be a whitening procedure. Ideally, we should consider the covariance matrix and then decompose it to whiten the input, in order to avoid computing the square root of the inverse, which is often unfeasible. However, due to the interactions among components, second-order statistics for quaternion random variables are not completely described by the standard covariance matrix [4]. For this reason, the augmented covariance matrix should be considered instead. Such a matrix is augmented with the complementary covariance matrices $\mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}}, \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}}, \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}}$, i.e., the covariance matrices of the quaternion with its three perpendicular involutions $\mathbf{q}^{\hat{\imath}}, \mathbf{q}^{\hat{\jmath}}, \mathbf{q}^{\hat{\kappa}}$. Thus, the augmented covariance matrix, which completely characterizes the second-order information of the augmented quaternion vector $\tilde{\mathbf{q}}$, is defined as:

$$\tilde{\mathbf{C}}_{\mathbf{q}\mathbf{q}} = \mathbb{E}\left\{\tilde{\mathbf{q}}\tilde{\mathbf{q}}^{\mathsf{H}}\right\} =
\begin{bmatrix}
\mathbf{C}_{\mathbf{q}\mathbf{q}} & \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}} \\
\mathbf{C}^{\mathsf{H}}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\kappa}}} \\
\mathbf{C}^{\mathsf{H}}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\kappa}}} \\
\mathbf{C}^{\mathsf{H}}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\kappa}}}
\end{bmatrix} \qquad (22)$$
where $(\cdot)^{\mathsf{H}}$ is the conjugate transpose operator. The formulation in (22) recovers the complete statistical information of a general quaternion signal. Thus, the theoretically correct procedure would be delineated as:

$$\bar{\mathbf{x}} = \tilde{\mathbf{C}}^{-1/2}_{\mathbf{q}\mathbf{q}}\left(\mathbf{x} - \mathbb{E}\{\mathbf{x}\}\right), \qquad (23)$$

or substituting the inverse square root $\tilde{\mathbf{C}}^{-1/2}_{\mathbf{q}\mathbf{q}}$ with a decomposition of it. However, the construction of the augmented covariance matrix may be quite difficult and computationally expensive due to the computation of each sub-covariance matrix. Moreover, $\tilde{\mathbf{C}}^{-1/2}_{\mathbf{q}\mathbf{q}}$ includes skew-symmetric sub-matrices [4], which make the decomposition more difficult.
In order to simplify the calculation of (22) and make it more feasible for practical applications, a particular case can be considered by leveraging the Q-properness property [35, 4, 12]. Q-properness entails that the quaternion signal is not correlated with its involutions, implying vanishing complementary covariance matrices, i.e., $\mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}} = \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}} = \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}} = \mathbf{0}$. Also, for Q-proper random variables the following relation holds:

$$\text{var}\{q_c\} = \mathbb{E}\left\{q_c^2\right\} = \sigma^2, \qquad c = \{0, 1, 2, 3\}. \qquad (24)$$

Thus, considering a Q-proper quaternion, the covariance in (22) becomes:

$$\tilde{\mathbf{C}}_{\mathbf{q}\mathbf{q}} = \mathbb{E}\left\{\tilde{\mathbf{q}}\tilde{\mathbf{q}}^{\mathsf{H}}\right\} =
\begin{bmatrix}
\mathbf{C}_{\mathbf{q}\mathbf{q}} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\imath}}} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\kappa}}}
\end{bmatrix} = 4\sigma^2 \mathbf{I}. \qquad (25)$$
Assuming Q-properness for a random variable saves a lot of calculations and computational cost. Notwithstanding the theoretical correctness of the above-defined approach, the quaternion batch normalization (QBN) techniques adopted so far in the literature rely on some approximations.

We assume the input signal is Q-proper, thus we consider the covariance in (25) and build the normalization as follows:

$$\bar{\mathbf{x}} = \frac{\mathbf{x} - \mu_q}{\sqrt{\text{var}\{\mathbf{x}\} + \epsilon}} = \frac{\mathbf{x} - \mu_q}{\sqrt{4\sigma^2 + \epsilon}}, \qquad (26)$$

where $\mu_q$ is the quaternion input mean value, which is a quaternion itself, defined as:

$$\mu_q = \frac{1}{N}\sum_{n=1}^{N}\left(q_{0,n} + q_{1,n}\hat{\imath} + q_{2,n}\hat{\jmath} + q_{3,n}\hat{\kappa}\right) = \bar{q}_0 + \bar{q}_1\hat{\imath} + \bar{q}_2\hat{\jmath} + \bar{q}_3\hat{\kappa}. \qquad (27)$$

The final output is computed as follows:

$$\text{QBN}(\mathbf{x}) = \gamma\bar{\mathbf{x}} + \beta, \qquad (28)$$

where $\beta$ is a shifting quaternion parameter and $\gamma$ is a scalar parameter.

In conclusion, the QBN proposed by [8] is an elegant approximation; nevertheless, it is not able to catch the complete second-order statistics, while requiring heavy computations [18]. Thus, we believe that considering Q-proper signals, which are indeed very frequent, is a good approximation which also greatly reduces the computational requirements. For our experiments, we adopt the method represented by (28).
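Under the Q-properness assumption, (26)-(28) reduce to a few lines. The sketch below uses our own naming, with the batch statistic estimating the $4\sigma^2$ term:

```python
import numpy as np

def quaternion_batch_norm(x, gamma=1.0, beta=np.zeros(4), eps=1e-5):
    """QBN for Q-proper signals (Eqs. 26-28). x: (N, 4) batch of quaternions.
    gamma is a real scale parameter, beta a quaternion shift parameter."""
    mu = x.mean(axis=0)                             # quaternion mean (Eq. 27)
    centered = x - mu
    var4 = np.mean(np.sum(centered ** 2, axis=1))   # estimate of 4*sigma^2
    return gamma * centered / np.sqrt(var4 + eps) + beta

rng = np.random.default_rng(0)
x = 3.0 + 2.0 * rng.standard_normal((1024, 4))      # non-centered, non-unit batch
out = quaternion_batch_norm(x)
print(np.allclose(out.mean(axis=0), 0.0, atol=1e-7))        # zero mean
print(abs(np.mean(np.sum(out ** 2, axis=1)) - 1.0) < 1e-3)  # unit quaternion variance
```

Both checks print `True`: the output batch is centered and has (approximately) unit total variance, which is the goal stated above.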
3.6 Quaternion Spectral Normalization
Among the wide variety of techniques proposed to stabilize GAN training, spectral normalization (SN) [25] is one of the most widespread methods. Previously, the crucial importance of having a Lipschitz-bounded discriminator function was introduced in [1, 15]. Lately, it was proved that no restriction on the discriminator space leads to the gradient uninformativeness problem [40]. This means that the gradient of the optimal discriminative function carries no information about the real distribution, thus providing useless feedback to the generator. Forcing a function to be Lipschitz continuous means controlling how fast it increases and bounding its gradients, thus mitigating gradient explosion [40, 11]. In [1], a method based on weight clipping was proposed to force the discriminator to be 1-Lipschitz. Later, such an approach was improved by adding a gradient penalty (GP) that constrains the gradient norm to be at most 1 [15]. The latter method is reproposed in several state-of-the-art GANs and combined with other regularization techniques to improve performance, as suggested in [24]. However, being built on the gradients with respect to the inputs, the gradient penalty cannot impose a regularization outside the support of the fake and real data distributions. Moreover, it requires considerable computation. Spectral normalization, instead, directly operates on the weights of the network, being free of the support limitation, and its computation is faster than other methods [25]. It aims at controlling the Lipschitz constant of the discriminator by constraining the spectral norm of each layer.
A generic function $f$ is $K$-Lipschitz continuous if, for any two points $x_1, x_2$, the following property holds:

$$\frac{\|f(x_1) - f(x_2)\|}{\|x_1 - x_2\|} \leq K, \qquad (29)$$

with $\|\cdot\|$ being the $\ell_2$ norm. The Lipschitz norm $\|f\|_{\text{Lip}}$ of a function $f$ is equal to $\sup_x \sigma(\nabla f(x))$, where $\sigma(\cdot)$ is the spectral norm of the input matrix, that is, its largest singular value.

For a generic linear layer $f(\mathbf{x}) = \mathbf{W}\mathbf{x} + \mathbf{b}$, the Lipschitz norm is:

$$\|f\|_{\text{Lip}} = \sup_{\mathbf{x}} \sigma(\nabla f(\mathbf{x})) = \sup_{\mathbf{x}} \sigma(\mathbf{W}) = \sigma(\mathbf{W}). \qquad (30)$$

Assuming the Lipschitz norm of each layer activation is equal to 1, a constraint that is satisfied by many popular activation functions including ReLU and Leaky ReLU [25], we can apply the Lipschitz bound to the whole network by following $\|f_1 \circ f_2\|_{\text{Lip}} \leq \|f_1\|_{\text{Lip}} \cdot \|f_2\|_{\text{Lip}}$.

Finally, the SN is defined as:

$$\bar{\mathbf{W}}_{SN}(\mathbf{W}) = \frac{\mathbf{W}}{\sigma(\mathbf{W})}, \qquad (31)$$

and it ensures that the normalized weight matrix always satisfies the constraint $\sigma(\bar{\mathbf{W}}_{SN}(\mathbf{W})) = 1$. In [25], the authors underline that applying the original singular value decomposition algorithm to compute $\sigma(\mathbf{W})$ may be extremely heavy. To address this computational complexity, they suggest estimating the largest singular value via the power iteration method.
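The power iteration estimate of $\sigma(\mathbf{W})$ can be sketched as follows. This is a minimal NumPy version for illustration; practical SN implementations reuse a single persistent $u$ vector across training steps rather than restarting from scratch:

```python
import numpy as np

def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)     # right singular vector estimate
        u = W @ v
        u /= np.linalg.norm(u)     # left singular vector estimate
    return float(u @ W @ v)

W = np.diag([3.0, 1.0, 0.5])       # largest singular value is 3
sigma = spectral_norm(W)
print(round(sigma, 6))                         # 3.0
# After the normalization of Eq. (31), the spectral norm becomes 1:
print(round(np.linalg.norm(W / sigma, 2), 6))  # 1.0
```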
In order to control the Lipschitz constant in a QGAN, in this section we explore two methods to define the spectral normalization in the quaternion domain. A first approach normalizes the weights $\mathbf{W}$ by operating on each submatrix $\mathbf{W}_0, \mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3$ independently, computing the spectral norms separately. That is, through the power iteration method as above, we compute $\sigma_0(\mathbf{W}_0), \sigma_1(\mathbf{W}_1), \sigma_2(\mathbf{W}_2), \sigma_3(\mathbf{W}_3)$ and then normalize each submatrix with the corresponding norm. This method forces each submatrix to have spectral norm equal to 1. However, it never takes the whole weight matrix $\mathbf{W}$ into account. Moreover, the relations among the components of the quaternion matrix are not considered, losing the characteristic property of QNNs.

The second method, similarly to the real-valued SN, normalizes the whole matrix $\mathbf{W}$ together, imposing the constraint on the complete matrix rather than on the single submatrices. Therefore, the spectral norm is computed by taking the complete weight matrix into account, thus considering the relations among the quaternion components. However, while the spectral norm is computed as in (30), the normalization step is applied differently from the SN in (31). Instead of normalizing the whole matrix as in (31), since the weight matrix $\mathbf{W}$ is designed as a composition of the submatrices $\mathbf{W}_0, \mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3$, we can leverage this quaternion setup to save computational costs and normalize each submatrix. The normalized submatrices $\bar{\mathbf{W}}_{0,QSN}, \bar{\mathbf{W}}_{1,QSN}, \bar{\mathbf{W}}_{2,QSN}, \bar{\mathbf{W}}_{3,QSN}$ result in a normalized weight matrix $\bar{\mathbf{W}}_{QSN}(\mathbf{W})$ with a more efficient computation than normalizing the full matrix $\mathbf{W}$.
An empirical comparison between the two methods is reported in Section 5. We investigate the two techniques in a plain QGAN and show that the latter approach is more stable and achieves better performance on both datasets considered. We deem it more appropriate both theoretically and empirically, and we use it in our further experiments. From now on, we refer to this approach as the quaternion spectral normalization (QSN).
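A sketch of the second method follows. We assume here that "the complete matrix" is the real block matrix induced by the Hamilton product of (17); this composition, and all names, are our own illustrative choices rather than the paper's exact implementation:

```python
import numpy as np

def compose(W0, W1, W2, W3):
    """Real block matrix acting on stacked components [x0; x1; x2; x3]
    exactly as the Hamilton product Wx of Eq. (17)."""
    return np.block([[W0, -W1, -W2, -W3],
                     [W1,  W0, -W3,  W2],
                     [W2,  W3,  W0, -W1],
                     [W3, -W2,  W1,  W0]])

def qsn(W0, W1, W2, W3):
    """QSN sketch: one spectral norm of the complete matrix,
    applied to normalize each submatrix."""
    sigma = np.linalg.norm(compose(W0, W1, W2, W3), 2)  # largest singular value
    return W0 / sigma, W1 / sigma, W2 / sigma, W3 / sigma

rng = np.random.default_rng(0)
subs = rng.standard_normal((4, 8, 8))
normed = qsn(*subs)
# Dividing every submatrix by sigma divides the composed matrix by sigma,
# so the complete normalized matrix has spectral norm 1:
print(round(np.linalg.norm(compose(*normed), 2), 6))  # 1.0
```

Since the composed matrix is linear in the four submatrices, normalizing the submatrices is equivalent to normalizing the full matrix, which is the efficiency argument made above.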
3.7 Quaternion Weight Initialization
Weight initialization often has a crucial role in network convergence and in reducing the risk of vanishing or exploding gradients [9]. This procedure becomes even more important when dealing with quaternion weights. Indeed, due to the interactions among the elements of the quaternion, the initialization can be neither naively random nor performed independently for each component. For these reasons, an appropriate initialization has to be introduced.

First, consider a weight matrix $\mathbf{W}$ with $\mathbb{E}\{|\mathbf{W}|\} = 0$. The initialization is based on a normalized pure quaternion $u$ generated for each weight submatrix from a uniform distribution in $[0, 1]$. By using the polar form of a quaternion, we can define the initialization of the weight matrix as:

$$\mathbf{W} = |\mathbf{W}|\, e^{u\theta} = |\mathbf{W}|\left(\cos(\theta) + u\sin(\theta)\right), \qquad (32)$$
where each matrix component is initialized as

W0 = φ cos(θ),
W1 = φ u1 sin(θ),
W2 = φ u2 sin(θ),
W3 = φ u3 sin(θ),   (33)
where the angle θ is randomly generated in the interval [−π, π] and φ is randomly sampled in the interval of the standard deviation around zero, [−σ, σ]. The standard deviation is set according to the chosen initialization method, either [9] or [16]. In the first case, we set σ = 1/√(2(n_in + n_out)), whereas in the latter we set σ = 1/√(2 n_in). In both equations, n_in is the number of neurons in the input layer and n_out the number of neurons in the output layer. The variance of W can be written as:

var{W} = E{|W|²} − (E{|W|})².   (34)
However, similarly to the QBN in the previous section, in order to reduce the computations, the term (E{|W|})² can be considered equal to 0 [28, 27]. This is equivalent to considering a Q-proper quaternion signal whose augmented covariance matrix has off-diagonal elements equal to 0 and trace equal to 4σ². Consequently, the variance is computed by considering only the first term of (34) as:

var{W} = E{|W|²} = 4σ².   (35)
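The procedure of (32)-(33) can be sketched as follows. This is a minimal NumPy sketch: the function name is our own, and the two standard-deviation criteria mirror [9] and [16] as reconstructed above.

```python
import numpy as np

def quaternion_init(n_in, n_out, criterion="glorot", rng=None):
    """Quaternion weight init of Eqs. (32)-(33): each weight is
    phi * e^(u theta) with a unit pure quaternion u,
    theta ~ U[-pi, pi] and phi ~ U[-sigma, sigma]."""
    rng = rng or np.random.default_rng()
    if criterion == "glorot":              # sigma = 1 / sqrt(2 (n_in + n_out))
        sigma = 1.0 / np.sqrt(2 * (n_in + n_out))
    else:                                  # He: sigma = 1 / sqrt(2 n_in)
        sigma = 1.0 / np.sqrt(2 * n_in)
    shape = (n_out, n_in)
    u = rng.uniform(0, 1, size=(3,) + shape)       # pure quaternion direction
    u /= np.linalg.norm(u, axis=0, keepdims=True)  # normalize to unit norm
    theta = rng.uniform(-np.pi, np.pi, size=shape)
    phi = rng.uniform(-sigma, sigma, size=shape)
    W0 = phi * np.cos(theta)
    W1, W2, W3 = (phi * u[k] * np.sin(theta) for k in range(3))
    return W0, W1, W2, W3
```

Since u is a unit pure quaternion, the magnitude of each initialized weight equals |φ| and is therefore bounded by σ.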
3.8 Training
The forward phase of a QNN is the same as that of its real-valued counterpart: the input flows from the first to the last layer of the network. It may be interesting to note that in eq. (17) the order of the weight and the input can be inverted, thus changing the output of the product and resulting in an inverted QNN [27, 28]. Concerning the backward phase, it is worth mentioning that the gradient of a general quaternion loss function L is computed for each component of the quaternion weight matrix W as:
δL/δW = δL/δW0 + δL/δW1 î + δL/δW2 ĵ + δL/δW3 κ̂.   (36)
Then, the gradient is propagated back following the chain rule. Indeed, as defined in [27], the backpropagation of quaternion neural networks is just an extension of the method for their real-valued counterpart. Consequently, QNNs can be easily trained as real-valued networks via backpropagation.
4 GAN Architectures in the Quaternion Domain
The previous section described the main blocks and the framework needed to build and train a GAN in the quaternion domain. In this section we go further, presenting the complete definition of a plain QGAN in Subsection 4.1 and of an advanced state-of-the-art QGAN composed of complex blocks in Subsection 4.2. First, in order to set up a QGAN, each input, weight, bias and output has to be manipulated to become a quaternion. Therefore, weight matrices are initialized as compositions of the four submatrices, similarly to (17) and (21). Real-valued operations such as multiplications or convolutions in the networks are replaced with their quaternion counterparts, completing the redefinition of the layers in the quaternion domain. The input is handled as a quaternion and processed as a single entity. For images, a pure quaternion is considered as in (19), while for other kinds of multidimensional signals, the scalar part is considered too. The weights are then initialized following the description in Section 3.7. This careful definition of QGANs makes it possible to design a model with fewer free parameters than the same real-valued model and, consequently, to save memory and computational resources.
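As a concrete example of such a replacement, a quaternion fully connected layer can be sketched via the Hamilton product. This is a NumPy sketch with hypothetical names; the quaternion convolution follows the same pattern, with convolutions in place of matrix products.

```python
import numpy as np

def hamilton_linear(x, W0, W1, W2, W3):
    """Quaternion fully connected layer: the input x = (x0, x1, x2, x3) is
    multiplied by the weight quaternion via the Hamilton product, so all
    four components share the same four submatrices (a 4x parameter saving
    over a real layer of the same width)."""
    x0, x1, x2, x3 = x
    y0 = W0 @ x0 - W1 @ x1 - W2 @ x2 - W3 @ x3
    y1 = W1 @ x0 + W0 @ x1 - W3 @ x2 + W2 @ x3
    y2 = W2 @ x0 + W3 @ x1 + W0 @ x2 - W1 @ x3
    y3 = W3 @ x0 - W2 @ x1 + W1 @ x2 + W0 @ x3
    return y0, y1, y2, y3
```

The cross terms are what couple the channels: each output component mixes all four input components through the same shared submatrices.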
4.1 Vanilla QGAN
In the original GAN [10], both the generator (𝐺) and the discriminator (𝐷) are
defined by fully connected layers. Due to the limited expressivity of this design
with complex data such as images, in [29] the authors propose to replace dense
layers with more suitable operations for this kind of data and to build 𝐺and 𝐷by
stacking several convolutional layers. State-of-the-art GANs are based on the deep
convolutional GAN (DCGAN) [29]. In particular, the DCGAN increases the spatial
dimensionality by means of transposed convolutions in the generator and decreases it
in the discriminator with convolutions. Furthermore, this architecture applies batch normalization in every layer except for the last layer of 𝐺 and the first layer of 𝐷, in order to let the networks learn the correct statistics of the data distribution.
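The growth of the spatial dimensionality can be tracked with the standard output-size formulas; the 4 × 4 kernel, stride 2 and padding 1 defaults below are the typical DCGAN choice and are an assumption here.

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a transposed convolution (generator side)."""
    return (size - 1) * stride - 2 * pad + kernel

def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a strided convolution (discriminator side)."""
    return (size - kernel + 2 * pad) // stride + 1
```

Starting from a 4 × 4 feature map, three such transposed convolutions yield 8, 16 and 32, while the strided convolution in the discriminator reverses each step.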
By redefining the DCGAN in the quaternion domain (QDCGAN), it is possible to explore the potential of quaternion algebra in a simple GAN framework. The QDCGAN generator is defined by an initial quaternion fully connected layer, followed by quaternion transposed convolutions interleaved with quaternion batch normalization and split ReLU activation functions, except for the last layer, which ends with a split Tanh function. The discriminator has the same structure as the generator, but with quaternion transposed convolutions replaced by quaternion convolutions to decrease the dimensionality, and with a final fully connected quaternion layer that returns the real/fake decision by means of a split sigmoid activation. The QDCGAN, as its real-valued counterpart, optimizes the original loss in (11).
Fig. 2 Quaternion Vanilla GAN architecture. Every quantity, including inputs, weights and outputs, is a quaternion. The generator (green network) takes a quaternion noise signal and generates a batch of quaternion images with four channels. The discriminator tries to distinguish between fake and real quaternion samples by exploiting the properties of quaternion algebra.
4.2 Advanced QGAN
The Vanilla QGAN presented above is just a plain example to give a general idea of how to build GANs in the quaternion domain. In this section, we consider a more advanced model, the spectrally normalized GAN (SNGAN) [25], and we present the steps to define its quaternion counterpart.
The quaternion spectrally normalized GAN (QSNGAN) is trained in an adversarial fashion through the hinge loss defined in (14) and (15) for the discriminator and the generator, respectively, as suggested in [25, 3]. The overall architecture of the model is inspired by [3]. Both the generator and the discriminator networks are characterized by quaternion convolutional layers in order to leverage the properties of the Hamilton product. To mitigate the vanishing gradient problem and obtain better performance, a series of residual blocks with upsampling in the generator and downsampling in the discriminator can be adopted [25]. A scheme of the residual block of the proposed QSNGAN is depicted in Fig. 3. The discriminative network plays a crucial role in GAN training, thus it is more complex than the generator network. It
Fig. 3 Quaternion residual block (QResBlock) architecture inspired by [25] and redefined in the quaternion domain. QBN is omitted in the discriminator network and replaced by QSN. Grey blocks are used exclusively in the generator or in the discriminator. The generator employs the upsampling steps in the residual and in the shortcut path, while the discriminator employs the average pooling ones, except for the last residual block of the discriminator, which keeps the dimension invariant.
Fig. 4 First discriminator quaternion residual block (First QResBlock), with quaternion convolutions and average pooling layers to downsample the input.
takes as input the two sets of four-channel quaternion images through a first residual block, illustrated in Fig. 4. The final output of the network is the decision on whether the samples come from the real or the fake distribution.
In order to guarantee a fair comparison with the SNGAN, we consider a real-valued noise signal as input to the generator and handle it with an initial real-valued fully connected layer. The output of the first layer is then encapsulated in a quaternion signal with a procedure similar to the one considered in Subsection 3.3 to handle color images. The signal is then processed by the quaternion generator up to the last layer, which generates the four-channel fake image. The original SNGAN employs batch normalization in the generator and spectral normalization in the discriminator. We keep the same structure and consider the proposed QBN in (28) for the former and the QSN introduced in Section 3.6 for the latter. In particular, we exploit the QSN with the spectral norm computed over the whole weight matrix, which is theoretically preferable and ensures more stable results.
Defining the SNGAN in the quaternion domain allows saving parameters, as we will explore in the next section. Moreover, by processing the channels as a single entity through quaternion convolutions based on the Hamilton product, the QSNGAN is able to capture the relations among channels and preserve intra-channel information, which the SNGAN, conversely, loses. This property translates into an improved generation ability of the QSNGAN, which properly grasps the real data
Fig. 5 QSNGAN architecture scheme. The generator network (top) takes as input a real-valued signal, processes it with a fully connected layer and then encapsulates it in a quaternion signal. The residual blocks are depicted in Fig. 3. The generator outputs a quaternion-valued sample of images that, together with a sample from the real distribution, goes to the input of the discriminator network (bottom). The discriminator handles the samples through a series of residual blocks (the first one is illustrated in Fig. 4, the other ones in Fig. 3) up to the last layer, which outputs the real/fake decision.
distribution. The architecture of the proposed QSNGAN is reported in Fig. 5. In the
scheme, the forward phase flows from left to right for the top network (quaternion
generator) and from right to left for the second network (quaternion discriminator).
4.3 Evaluation metrics
In order to evaluate the performance of the generative networks, we consider two objective metrics: the Fréchet Inception Distance (FID) [17] as the main metric, as it is more consistent with human evaluation, and the Inception Score (IS) [30]. The FID embeds the generated and the real samples into the Inception convolutional features, models the two distributions as Gaussians by evaluating the empirical means μ_g, μ_data and covariances C_g, C_data, and then computes the Fréchet distance as:
FID(p_g, p_data) = ||μ_g − μ_data||² + Tr(C_g + C_data − 2(C_g C_data)^(1/2)),   (37)
where Tr(·) denotes the trace operation. Since the FID is a distance between the real and fake distributions, the lower the FID value, the better the generated samples.
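Given the empirical moments, (37) can be computed as in the following sketch (the helper name is our own; the trace of the matrix square root is obtained from the eigenvalues of C_g C_data, which are real and non-negative for valid covariances):

```python
import numpy as np

def frechet_distance(mu_g, cov_g, mu_d, cov_d):
    """FID between two Gaussians, Eq. (37):
    ||mu_g - mu_d||^2 + Tr(C_g + C_d - 2 (C_g C_d)^(1/2))."""
    diff = mu_g - mu_d
    # eigenvalues of C_g C_d are real and non-negative for PSD covariances,
    # so Tr((C_g C_d)^(1/2)) is the sum of their square roots
    eigvals = np.linalg.eigvals(cov_g @ cov_d)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return diff @ diff + np.trace(cov_g) + np.trace(cov_d) - 2 * tr_sqrt
```

In practice, the moments are estimated from the Inception features of the real and generated batches; identical distributions give a distance of zero.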
Instead, the IS uses the Inception model to obtain the conditional distribution p(y|x) of the generated samples. The IS expects the conditional distribution to have low entropy, since the images represent meaningful objects, while the marginal distribution p(y) should have high entropy due to the diversity among the samples. It is defined as:
IS(p_g) = exp( E_{x∼p_g} { KL[ p(y|x) || p(y) ] } ),   (38)
where KL is the Kullback-Leibler divergence. Conversely to the FID, higher IS values stand for better generated samples. However, the IS has some drawbacks: it does not consider the true data distribution and it is not able to detect mode collapse. We therefore consider the FID as the main metric and the IS as support.
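Given a matrix of class probabilities p(y|x) predicted by the Inception model for the generated samples, (38) reduces to a few lines (a sketch with a hypothetical helper name):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from a matrix of per-sample class probabilities p(y|x)
    (rows: generated samples). Implements Eq. (38):
    exp( E_x KL( p(y|x) || p(y) ) )."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y) over the batch
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

Confident predictions spread uniformly over C classes give the maximum score C, while identical predictions for every sample give the minimum score of 1, which illustrates why the IS rewards both sharpness and diversity.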
5 Experimental Evaluation
In order to evaluate the effectiveness of our proposed approach, we conduct a collection of experiments on the unsupervised image generation task. We take two datasets into account: CelebA-HQ [20], which contains 27k images for training and 3k images for testing, and 102 Oxford Flowers, which contains approximately 7k images for training and slightly fewer than 1k images for testing. We reshape the samples of both datasets to 128 × 128 and then test the real-valued SNGAN and the proposed QSNGAN. We use the Adam optimizer and keep the same hyper-parameters fixed as in [25], i.e., a learning rate equal to 0.0002 and optimizer parameters β1 = 0.0, β2 = 0.9. We only vary the number of critic iterations, considering experiments with critic iterations equal to 1 and to 5, in order to better investigate the behavior of our QSNGAN, which may have a different balance between the generator and discriminator networks with respect to the SNGAN. In every experiment, we fix the batch size to 64 and we perform 100k training iterations for CelebA-HQ and 50k for 102 Oxford Flowers. We have also considered endowing the SNGAN and the QSNGAN with a gradient penalty, as in (13), but we did not notice any improvement in the experiments, meaning that both the SN and the QSN adequately constrain the discriminator to be Lipschitz continuous.
The QSNGAN generator is a quaternion convolutional network as in Fig. 5. The
initial fully connected layer, which takes the noise of size 128 in input, is composed
of 4×4×1024 neurons. The following quaternion residual blocks illustrated in
Fig. 3 stack 1024, 512, 256, 128 and 64 filters. This means that, as an example,
the first residual block is built by interleaving QBNs, split ReLUs and quaternion
convolutions with 1024 kernels and an upsampling module with scale factor equal
to 2. Further, at the end of the last residual connection, we stack a QBN, a split ReLU activation function and a final quaternion convolutional layer with 64 filters to refine the output image, which is then passed to a split Tanh function to bound it in the interval [−1, 1]. For each quaternion convolution, we fix the kernel size to 3 and the stride and padding to 1. Conversely, the shortcut in the residual block is composed of an upsampling module and a quaternion convolution with kernel size equal to 1 and null padding. The network built through this procedure has fewer
Table 1 Summary of the number of network parameters and memory requirements for the real-valued SNGAN and its quaternion-valued counterpart (QSNGAN) on CelebA-HQ. The proposed method saves more than 70% of the total free parameters and of the disk memory for model checkpoints.
Model     #Params G     #Params D     #Params Tot    Disk Memory
SNGAN     32,150,787    29,022,213    61,173,000     115 GB
QSNGAN     9,631,204     7,264,901    16,896,105      35 GB

Generator checkpoint for inference.
than 10M free parameters, compared with the 32M parameters of its real-valued counterpart. This means that the checkpoint for inference saves more than 70% of disk memory, as shown in Table 1.
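The roughly 4× saving per layer can be verified with a quick parameter count (a sketch, bias terms omitted; in a quaternion convolution the channels are grouped into quaternions, so only four real kernels of a quarter size each are learned and shared via the Hamilton product):

```python
def conv_params(c_in, c_out, k):
    """Weights of a real 2D convolution (bias omitted)."""
    return c_in * c_out * k * k

def qconv_params(c_in, c_out, k):
    """Weights of a quaternion convolution: four shared real kernels of
    shape (c_out/4) x (c_in/4) x k x k, i.e. 4x fewer than the real layer."""
    return 4 * (c_in // 4) * (c_out // 4) * k * k
```

For instance, a 256-to-256-channel layer with 3 × 3 kernels drops from 589,824 real weights to a quarter of that in the quaternion domain.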
The QSNGAN discriminator is still a quaternion convolutional network as in Fig. 5, but slightly more complex. At the beginning, the real images are encapsulated in quaternions as depicted in Subsection 3.3, resulting in a batch of four-channel images. Obviously, the images produced by the generator network already comprise four channels and are defined as quaternions.
The first residual block of the discriminator in Fig. 4 is a spectrally-normalized quaternion convolution block with 64 3×3 filters and split ReLU activation functions. The shortcut, instead, as in the generator network, is a 1×1 quaternion convolution with padding equal to 0. In this case, however, both the residual and the shortcut paths end with a 2×2 split average pooling. The images then flow through a stack of five residual blocks built as in Fig. 3 with, respectively, 128, 256, 512, 1024 and 1024 filters. Here, the residual section of each block has a split average pooling to perform downsampling, while the shortcut comprises a quaternion convolution and another average pooling. The downsampling is applied in each residual block except for the last one, which acts as a refiner and leaves the dimensionality unchanged. Every weight is normalized through the QSN introduced in Subsection 3.6. The configurations for kernel size, stride and padding are the same as in the generator. At the end of the residual stack, we apply a split ReLU and a global sum pooling before passing the batch to the final spectrally-normalized fully connected layer which, by means of a sigmoid, returns the real/fake decision. As for the generator, the quaternion discriminator also allows saving parameters while learning the internal relations among channels. This saving is highlighted in Table 1, which reports the exact number of parameters for the quaternion model and the real-valued one. The quaternion GAN can obtain equal or better results when trained with fewer parameters since it leverages the properties of quaternion algebra, including the Hamilton product, which allow it to capture the relations among channels and catch more information from the input. Consequently, the training procedure needs fewer parameters to learn the real distribution and to generate images from it.
The objective evaluation is reported in Table 2. We compute the FID and IS on the test images (3k for CelebA-HQ and slightly fewer than 1k for 102 Oxford Flowers). As shown in Table 2, the proposed method stands out in the generation of samples from both datasets according to the metrics considered.
Table 2 Results summary for the 128 × 128 CelebA-HQ and 102 Oxford Flowers datasets. The proposed QSNGAN obtains a lower FID on each dataset considered. The values of the IS support the FID results. According to the objective metrics, the proposed QSNGAN generates more visually pleasing and diverse samples with respect to the real-valued baseline counterpart. The QSNGAN is also more robust to the choice of the hyper-parameter regarding the discriminator iterations (Critic iter), while the real-valued model fails when changing the original setting, which fixes this parameter to 5.
                        FID                               IS
Model    Critic iter    CelebA-HQ   102 Oxford Flowers    CelebA-HQ        102 Oxford Flowers
SNGAN    1              >200        >200                  <2.000           2.797 ± 0.196
         5              34.483      165.058               2.032 ± 0.062    2.977 ± 0.146
QSNGAN   1              29.417      175.484               2.249 ± 0.164    2.754 ± 0.256
         5              33.068      115.838               2.026 ± 0.082    3.000 ± 0.141

Discriminator collapses and training fails, thus metrics results are not comparable.
Moreover, the two QSNGANs with critic iterations 1 and 5 both score a lower FID than the best configuration of the SNGAN model. The proposed method performs best with one critic iteration per generator iteration, while the real-valued model fails with this configuration. Overall, the QSNGAN is more robust to the choice of the critic iterations than the SNGAN, which is more fragile. The IS strengthens the results obtained with the FID, as it reports higher scores for the proposed method on every dataset.
The visual inspection of the generated samples underlines the improved ability of our QSNGAN. Figures 6 and 7 show a randomly selected batch of generated 128 × 128 images for the real-valued SNGAN and the proposed QSNGAN, respectively. On one hand, the SNGAN seems to be quite unstable and sensitive to the input noise, thus alternating some good-quality images with badly generated ones. Overall, the SNGAN is not always able to distinguish the background from some parts of the subject, sometimes confusing attributes such as the neck or the hair with part of the environment and letting them vanish. On the other hand, the QSNGAN sample in Fig. 7 shows visually pleasing images, with a clear distinction between subject and background. It also shows a higher definition of face attributes, including the most difficult ones, such as eyebrows, beard or skin shades. In addition, colors seem to be more vivid and samples are diverse in terms of pose, gender, expression, and
hair color, among others. Concerning the second dataset, the generated samples for the SNGAN are shown in Fig. 8, while the batch from the QSNGAN is reported in Fig. 9. As is clear from Table 2, the results for this dataset are preliminary but encouraging. Even in this case, the proposed approach gains a lower FID and a higher IS than the real-valued model. Additionally, in the SNGAN samples, pixel artifacts are evident and often misleading, confusing the flower subject with the colored background. On the other hand, the images generated by our QSNGAN contain more distinct subjects. Furthermore, the proposed method better catches every color shade thanks to the quaternion algebra properties, which allow the network to learn the internal relations among channels without losing intra-channel information.
Fig. 6 Randomly generated samples from the real-valued SNGAN on the CelebA-HQ dataset after 100k training iterations. Sometimes this model fails to detect border attributes such as hair and neck, which may fade into the background. Indeed, only a few samples seem to be visually pleasing, while in some other cases the network fails to generate likable images.
Fig. 7 Randomly generated samples from our QSNGAN on the CelebA-HQ dataset after 100k training iterations. These images are part of the test samples, which gained a FID of 29.417 and an IS of 2.249 ± 0.164. The proposed method is able to generate visually pleasing images, well distinguishing the background from the face. Moreover, we do not observe mode collapse, as the samples show different attributes such as gender, hair color, pose and smile, among others.
In conclusion, the proposed quaternion-valued QSNGAN shows an improved
ability in capturing the real data distribution by leveraging the quaternion algebra
Fig. 8 Randomly generated samples from the SNGAN model on the 102 Oxford Flowers dataset after 50k training iterations. The SNGAN misleads some pixels in the images, and the depicted subjects are not always distinguishable.
Fig. 9 Randomly generated samples from the proposed QSNGAN on the 102 Oxford Flowers dataset after 50k training iterations. Flowers contain many different color shades, and most of the subjects are clearly defined. This set of figures shows the improved generation ability of our proposed method with respect to its real-valued counterpart.
properties in every experiment we conduct. It generates better and more vivid samples than its real-valued counterpart, according to both visual inspection and objective metrics. Furthermore, the proposed method has less than 30% of the free parameters of the SNGAN, which also shows worse generation performance.
5.1 Evaluation of Spectral Normalization Methods
This section reports the tests we conducted to evaluate the two quaternion spectral normalization methods described in Subsection 3.6. In order to investigate the performance of the normalization approaches, we validate two models, smaller than the ones introduced in the previous subsection, on the CIFAR10 and STL10 datasets. CIFAR10 contains 50k 32 × 32 images for training and 10k for testing, while STL10 has 105k 48 × 48 images in the train split and 8k in the test split.
We examine three different configurations. The first one does not involve any QSN method, thus the discriminator network is not constrained to be 1-Lipschitz; we run this experiment in order to check the effectiveness of the spectral normalization methods that we propose. The second configuration applies a split computation of the spectral norm for each quaternion component and normalizes each weight submatrix W0, W1, W2, W3 independently. The last approach computes the spectral norm of the whole weight matrix and uses it to normalize each component. We refer to these methods as No QSN, QSN Split and QSN Full, respectively.
To assess the performance, we build the same SNGANs presented in [25], redefining them in the quaternion domain. We adopt the quaternion core residual blocks defined in the previous section and in Fig. 3, while reducing the model dimension. For CIFAR10, we set up a generator with an initial linear layer of 4 × 4 × 256 and then pile up three quaternion residual blocks, each one with 256 filters. As before, we end with a stack of QBN, split ReLU and a quaternion convolution with a final split Tanh to generate the 32 × 32 images in the range [−1, 1]. The discriminator, in which the QSN methods act in each layer, begins with a first residual block (Fig. 4) with 128 filters and then proceeds with three blocks composed of 128 kernels. As in Fig. 5, the network ends with a global sum pooling and a fully connected layer with a sigmoid to output the decision probability. The QSNGAN so defined for CIFAR10 comprises less than 2M parameters. It is worth noting that the real-valued counterpart presented in [25] has more than 5M free parameters.
The model used to generate the 48 × 48 STL10 images is deeper than the previous one and is composed of 5,545,188 parameters. The structure is the same, but it contains an initial layer of 6 × 6 × 512 and then residual blocks with 256, 128 and 64 filters. The final refining quaternion convolutional layer has 64 kernels. The discriminator, instead, has one residual block more than the model for CIFAR10, and the filters are, respectively from the first to the last block, 64, 128, 256, 512 and 1024, with a final 512-unit fully connected layer with sigmoid.
As we can see in Table 3, the unbounded model with no QSN fails to generate images for both CIFAR10 and STL10. Indeed, its FID is much higher than that of the other approaches. This proves the effectiveness of the proposed QSN Full method, which computes the spectral norm of each layer taking all the components into
Table 3 Summary results for the comparison of the two quaternion spectral normalization methods depicted in Section 3. We consider the SNGAN proposed in [25] as the baseline to define two simple models in the quaternion domain and then test the different QSN approaches. QSN Split refers to the first method, which normalizes the submatrices independently, while QSN Full stands for the normalization of the whole weight matrix together. No QSN is a model without any spectral normalization method. While the latter fails, QSN Full generates better images according to the FID on both datasets.
             FID                  IS
Config       CIFAR10   STL10     CIFAR10          STL10
No QSN       70.312    91.567    4.031 ± 1.327    4.744 ± 0.643
QSN Split    35.417    75.112    4.7128 ± 1.270   4.455 ± 0.092
QSN Full     31.966    59.611    4.317 ± 0.951    4.987 ± 0.485
account. As a matter of fact, the proposed approach is capable of generating higher-quality images in every experiment we conduct.
6 Conclusions
In this paper, we have introduced the family of quaternion-valued GANs (QGANs), which leverage the properties of quaternion algebra. We have rigorously defined each core block employed to build the proposed QGANs, including the quaternion adversarial framework. Moreover, we have provided a meticulous experimental evaluation on different image generation benchmarks to prove the effectiveness of our method. We have shown that the proposed QGAN has an improved generation ability with respect to its real-valued counterpart, according to the FID and IS metrics and to visual inspection. Moreover, our method saves up to 75% of the free parameters. We reckon that these results lay the foundations for novel deep GANs capable of capturing higher levels of input information and better grasping the real data distribution, while significantly reducing the overall number of parameters.
References
1. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint: arXiv:1701.07875v3
(2017)
2. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image
synthesis. Int. Conf. on Learning Representations (ICLR) (2019)
3. Chen, T., Zhai, X., Ritter, M., Lucic, M., Houlsby, N.: Self-supervised GANs via auxiliary
rotation loss. In: IEEE/CVF Int. Conf. on Computer Vision and Pattern Recognition (CVPR),
pp. 12146–12155 (2019)
4. Cheong Took, C., Mandic, D.P.: Augmented second-order statistics of quaternion random
signals. Signal Process. 91(2), 214–224 (2011)
5. Chernov, V.: Discrete orthogonal transforms with data representation in composition algebras.
Proc. Scandinavian Conf. on Image Analysis pp. 357–364 (1995)
6. Comminiello, D., Lella, M., Scardapane, S., Uncini, A.: Quaternion convolutional neural
networks for detection and localization of 3D sound events. In: IEEE Int. Conf. on Acoust.,
Speech and Signal Process. (ICASSP), pp. 8533–8537. Brighton, UK (2019)
7. Ell, T.A., Sangwine, S.J.: Quaternion involutions and anti-involutions. Comput. Math. Appl.
53(1), 137–143 (2007)
8. Gaudet, C., Maida, A.: Deep quaternion networks. In: IEEE Int. Joint Conf. on Neural Netw.
(IJCNN). Rio de Janeiro, Brazil (2018)
9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural net-
works. In: Int. Conf. on artificial intelligence and statistics, pp. 249–256 (2010)
10. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, Y.: Generative adversarial nets. In: 27th Int. Conf. on Neural Information Process-
ing Systems (NIPS), vol. 2, pp. 2672–2680. MIT Press, Cambridge, MA, USA (2014)
11. Gouk, H., Frank, E., Pfahringer, B., Cree, M.J.: Regularisation of neural networks by enforcing
Lipschitz continuity. Mach. Learn. 110(2), 393–416 (2021)
12. Grassucci, E., Comminiello, D., Uncini, A.: A quaternion-valued variational autoencoder. In:
IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP) (2021)
13. Grassucci, E., Scardapane, S., Comminiello, D., Uncini, A.: Flexible generative adversarial
networks with non-parametric activation functions. In: Progress in Artificial Intelligence and
Neural Systems, vol. 184. Smart Innovation, Systems and Technologies, Springer (2021)
14. Gui, J., Sun, Z., Wen, Y., Tao, D., Ye, J.p.: A review on generative adversarial networks:
Algorithms, theory, and applications. arXiv preprint: arXiv:2001.06937v1 (2020)
15. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of
Wasserstein GANs. In: Advances in Neural Information Processing Systems (NIPS) (2017)
16. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level
performance on imagenet classification. In: IEEE/CVF Int. Conf. on Computer Vision and
Pattern Recognition (CVPR), pp. 1026–1034 (2015)
17. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two
time-scale update rule converge to a local Nash equilibrium. In: Neural Information Processing
Systems (NIPS), pp. 6626–6637 (2017)
18. Hoffmann, J., Schmitt, S., Osindero, S., Simonyan, K., Elsen, E.: AlgebraNets. arXiv preprint:
arXiv:2006.07360v2 (2020)
19. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing
internal covariate shift. In: Int. Conf. on Machine Learning (ICML), p. 448–456. JMLR.org
(2015)
20. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality,
stability, and variation. In: Int. Conf. on Learning Representations (ICLR) (2018)
21. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial
networks. In: IEEE Conf. on Computer Vision and Pattern Recognition, CVPR, pp. 4401–4410.
Computer Vision Foundation / IEEE (2019)
22. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving
the image quality of StyleGAN. In: 2020 IEEE/CVF Conf. on Computer Vision and Pattern
Recognition (CVPR), pp. 8107–8116. IEEE (2020)
23. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv Preprint:
arXiv:1312.6114v10 pp. 1–14 (2014)
24. Kurach, K., Lucic, M., Zhai, X., Michalski, M., Gelly, S.: A large-scale study on regularization
and normalization in GANs. In: Int. Conf. on Machine Learning (ICML) (2019)
25. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative
adversarial networks. arXiv preprint: arXiv:1802.05957v1 (2018)
26. Parcollet, T., Morchid, M., Linarès, G.: Quaternion convolutional neural networks for het-
erogeneous image processing. In: IEEE Int. Conf. on Acoust., Speech and Signal Process.
(ICASSP), pp. 8514–8518. Brighton, UK (2019)
27. Parcollet, T., Morchid, M., Linarès, G.: A survey of quaternion neural networks. Artif. Intell.
Rev. (2019)
28. Parcollet, T., Ravanelli, M., Morchid, M., Linarès, G., Trabelsi, C., De Mori, R., Bengio, Y.:
Quaternion recurrent neural networks. In: Int. Conf. on Learning Representations (ICLR), pp.
1–19. New Orleans, LA (2019)
29. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolu-
tional generative adversarial networks. arXiv preprint: arXiv:1511.06434v2 (2016)
30. Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved
techniques for training GANs. In: Neural Information Processing Systems (NIPS), pp. 2234–
2242 (2016)
31. Schmidhuber, J.: A possibility for implementing curiosity and boredom in model-building
neural controllers. In: Proc. of the First Int. Conf. on Simulation of Adaptive Behavior on
From Animals to Animats, pp. 222–227. MIT Press, Cambridge, MA, USA (1991)
32. Schmidhuber, J.: Generative adversarial networks are special cases of artificial curiosity (1990)
and also closely related to predictability minimization (1991). Neural Networks 127, 58–66
(2020)
33. Schönfeld, E., Schiele, B., Khoreva, A.: A U-Net based discriminator for generative adversarial
networks. In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp.
8207–8216 (2020)
34. Vecchi, R., Scardapane, S., Comminiello, D., Uncini, A.: Compressing deep-quaternion neural
networks with targeted regularisation. CAAI Trans. Intell. Technol. 5(3), 172–176 (2020)
35. Vía, J., Ramírez, D., Santamaría, I.: Proper and widely linear processing of quaternion random
vectors. IEEE Trans. Inf. Theory 56(7), 3502–3515 (2010)
36. Ward, J.P.: Quaternions and Cayley Numbers: Algebra and Applications. Mathematics and Its
Applications, vol. 403. Kluwer Academic Publishers (1997)
37. Yin, Q., Wang, J., Luo, X., Zhai, J., Jha, S.K., Shi, Y.: Quaternion convolutional neural network
for color image classification and forensics. IEEE Access 7, 20293–20301 (2019)
38. Zhang, H., Goodfellow, I.J., Metaxas, D.N., Odena, A.: Self-attention generative adversarial
networks. In: Int. Conf. on Machine Learning (ICML), Proceedings of Machine Learning
Research, vol. 97, pp. 7354–7363. PMLR (2019)
39. Zhang, H., Zhang, Z., Odena, A., Lee, H.: Consistency regularization for generative adversarial
networks. In: Int. Conf. on Machine Learning (ICML) (2020)
40. Zhou, Z., Liang, J., Song, Y., Yu, L., Wang, H., Zhang, W., Yu, Y., Zhang, Z.: Lipschitz
generative adversarial nets. In: Int. Conf. on Machine Learning (ICML), Proceedings of
Machine Learning Research, vol. 97, pp. 7584–7593. PMLR (2019)