Quaternion Generative Adversarial Networks
Eleonora Grassucci, Edoardo Cicero and Danilo Comminiello
Abstract Latest Generative Adversarial Networks (GANs) are gathering outstand-
ing results through a large-scale training, thus employing models composed of mil-
lions of parameters requiring extensive computational capabilities. Building such
huge models undermines their replicability and increases the training instability.
Moreover, multi-channel data, such as images or audio, are usually processed by
real-valued convolutional networks that flatten and concatenate the input, losing any
intra-channel spatial relation. To address these issues, here we propose a family of
quaternion-valued generative adversarial networks (QGANs). QGANs exploit the
properties of quaternion algebra, e.g., the Hamilton product for convolutions. This
allows processing the channels as a single entity and capturing internal latent relations, while reducing the overall number of parameters by a factor of 4. We show how
to design QGANs and to extend the proposed approach even to advanced models.
We compare the proposed QGANs with real-valued counterparts on multiple image generation benchmarks. Results show that QGANs are able to generate visually pleasing images and to obtain better FID scores than their real-valued counterparts. Furthermore, QGANs save up to 75% of the training parameters. We believe these results may pave the way to novel, more accessible GANs capable of improving performance and saving computational resources.
1 Introduction
Generative models, including generative adversarial networks (GANs) [10] and variational autoencoders (VAEs) [23], have recently undergone an increasingly widespread development due to the massive availability of large datasets covering a wide range of applications. The demand to learn such complex data distributions
Authors are with the Department of Information Engineering, Electronics and Telecommuni-
cations (DIET), Sapienza University of Rome, Via Eudossiana 18, 00184 Rome, Italy, e-mail:
{eleonora.grassucci, danilo.comminiello}@uniroma1.it
arXiv:2104.09630v1 [cs.LG] 19 Apr 2021
leads to models far from the original approach of a simple GAN, which was characterized by fully connected layers and evaluated on benchmark datasets such as MNIST [10]. Multiple pathways have been explored to improve the generation ability of GANs. A first branch aims at stabilizing the training process, which is notoriously unstable and often leads to a lack of convergence. This includes constraining the discriminator network to be 1-Lipschitz by introducing a gradient penalty in the loss function, normalizing the spectral norm of the network weights, or adding a consistency regularization [1, 15, 25, 39]. Other significant improvements are gained by architectural innovations such as self-attention modules, flexible activation functions, or a style-based generator [21, 13, 38, 20]. A crucial improvement in the quality of image generation has been brought by broadly scaling up the networks and using wider batch sizes [2, 22, 33]. Indeed, BigGAN closed the visual quality gap between GAN-generated images and real-world samples on ImageNet [2]. Most of the latest GANs are somehow inspired by it.
However, these impressive results come at the cost of huge models with hundreds of millions of free parameters which require large computational resources. This drastically reduces the accessibility and the diffusion of this kind of model. Moreover, GANs are notoriously fragile models, so training with this number of parameters may result in an unstable or unwieldy process. Furthermore, when dealing with multidimensional inputs, such as images, 3D audio, multi-sensor signals or human-pose estimation, among others, real-valued networks break the original structure of the inputs. Channels are processed as independent entities and just concatenated in a tensor, without exploiting any correlation or intra-channel information.
In order to address these limitations, neural networks in hypercomplex domains
have been proposed. Among them, quaternion neural networks (QNNs) leverage the
properties of the non-commutative quaternion algebra to define lower-complexity
models and preserve relations among channels. Indeed, QNNs process channels
together as a single entity, thus maintaining the original input design and correlation.
Due to this feature, QNNs are able to capture internal relations while saving up to 75% of the free parameters thanks to hypercomplex-valued operations, including the
Hamilton product.
Encouraged by the promising results of other generative models in the quaternion
domain [12] and the need to make deep GANs more accessible, we introduce the
family of quaternion generative adversarial networks (QGANs). QGANs are com-
pletely defined in the quaternion domain and, among other properties, they exploit
the quaternion convolutions derived from the hypercomplex algebra [27, 28, 8, 6]
to improve the generation ability of the model while reducing the overall number
of parameters. We present the core-blocks to define a vanilla QGAN in the quater-
nion adversarial fashion and then explain how to derive more advanced QGANs
to prove the improved generation ability of the proposed approach on multiple image generation benchmarks. We show that the quaternion spectral normalized GAN (QSNGAN) achieves a better FID score and a more pleasing visual quality of the generated images than its real-valued counterpart, thanks to the quaternion inner operations. Moreover, the proposed QSNGAN has just 25% of the free parameters of the real-valued SNGAN.
We believe that these theoretical statements and empirical results lay the founda-
tions for novel deep GANs in hypercomplex domains capable of grasping internal
input relations while scaling down computational requirements, thus saving memory
and being more accessible. To the best of our knowledge, this is the first attempt to
define GANs in a hypercomplex domain.
The contribution of this chapter is threefold:
i) we introduce the family of quaternion generative adversarial networks (QGANs), proving their enhanced generation ability and lower complexity with respect to their real-valued counterparts on different benchmark datasets;
ii) we define the theoretically correct approach to apply the quaternion batch nor-
malization (QBN) and redefine existing approaches as its approximations;
iii) we propose and define the spectral normalization in the quaternion domain
(QSN) proving its efficacy on two image generation benchmarks.
The chapter is organized as follows. Section 2 presents the fundamental prop-
erties of quaternion algebra, while Section 3 describes the quaternion adversarial
framework and the quaternion-valued core blocks used in QGANs. Section 4 lays
the foundations for the quaternion generative adversarial networks and presents a
simple quaternion vanilla GAN and a more advanced and complex QGAN model.
Section 5 proves the effectiveness of the presented QGANs on a thorough empirical
evaluation, and, finally, conclusions are drawn in Section 6.
2 Quaternion Algebra
Quaternions are hypercomplex numbers of rank 4, being a direct non-commutative extension of complex numbers. The quaternion domain $\mathbb{H}$ is a four-dimensional associative normed division algebra over the real numbers, belonging to the class of Clifford algebras [36]. A quaternion is defined as the composition of one scalar element and three imaginary ones:

$$q = q_0 + q_1\hat{\imath} + q_2\hat{\jmath} + q_3\hat{\kappa} = q_0 + \mathbf{q}, \qquad (1)$$

with $q_0, q_1, q_2, q_3 \in \mathbb{R}$, and with $\hat{\imath}=(1,0,0)$, $\hat{\jmath}=(0,1,0)$, $\hat{\kappa}=(0,0,1)$ being unit axis vectors representing the orthonormal basis in $\mathbb{R}^3$. A pure quaternion is a quaternion without its scalar part $q_0$, resulting in the vector $\mathbf{q} = q_1\hat{\imath} + q_2\hat{\jmath} + q_3\hat{\kappa}$. As for complex numbers, the quaternion algebra also relies upon the relations among the imaginary components:

$$\hat{\imath}^2 = \hat{\jmath}^2 = \hat{\kappa}^2 = -1, \qquad (2)$$

$$\hat{\imath}\hat{\jmath} = \hat{\imath}\times\hat{\jmath} = \hat{\kappa}; \quad \hat{\jmath}\hat{\kappa} = \hat{\jmath}\times\hat{\kappa} = \hat{\imath}; \quad \hat{\kappa}\hat{\imath} = \hat{\kappa}\times\hat{\imath} = \hat{\jmath}. \qquad (3)$$
While the scalar product of two quaternions $q$ and $p$ is simply defined as the element-wise product $q \cdot p = q_0 p_0 + q_1 p_1 + q_2 p_2 + q_3 p_3$, quaternion vector multiplication, denoted with $\times$, is not commutative, i.e., $\hat{\imath}\hat{\jmath} \neq \hat{\jmath}\hat{\imath}$. In fact:

$$\hat{\imath}\hat{\jmath} = -\hat{\jmath}\hat{\imath}; \quad \hat{\jmath}\hat{\kappa} = -\hat{\kappa}\hat{\jmath}; \quad \hat{\kappa}\hat{\imath} = -\hat{\imath}\hat{\kappa}.$$

Due to the non-commutative property, we need to introduce the quaternion product, commonly known as the Hamilton product. We will see that the Hamilton product plays a crucial role in neural networks. It is defined as:

$$\begin{aligned}
q p &= (q_0 + q_1\hat{\imath} + q_2\hat{\jmath} + q_3\hat{\kappa})(p_0 + p_1\hat{\imath} + p_2\hat{\jmath} + p_3\hat{\kappa}) \\
&= (q_0 p_0 - q_1 p_1 - q_2 p_2 - q_3 p_3) \\
&\quad + (q_0 p_1 + q_1 p_0 + q_2 p_3 - q_3 p_2)\,\hat{\imath} \\
&\quad + (q_0 p_2 - q_1 p_3 + q_2 p_0 + q_3 p_1)\,\hat{\jmath} \\
&\quad + (q_0 p_3 + q_1 p_2 - q_2 p_1 + q_3 p_0)\,\hat{\kappa}.
\end{aligned} \qquad (4)$$
The above product can be rewritten in a more concise form as:

$$q p = q_0 p_0 - \mathbf{q}\cdot\mathbf{p} + q_0\mathbf{p} + p_0\mathbf{q} + \mathbf{q}\times\mathbf{p}, \qquad (5)$$

where $q_0 p_0 - \mathbf{q}\cdot\mathbf{p}$ is the scalar part of the output quaternion and $q_0\mathbf{p} + p_0\mathbf{q} + \mathbf{q}\times\mathbf{p}$ is instead its vector part. From (5) it is easy to define a concise form of the product for pure quaternions too:

$$\mathbf{q}\mathbf{p} = -\mathbf{q}\cdot\mathbf{p} + \mathbf{q}\times\mathbf{p}, \qquad (6)$$

where the scalar product is the same as before for full quaternions and the vector product is $\mathbf{q}\times\mathbf{p} = (q_2 p_3 - q_3 p_2)\hat{\imath} + (q_3 p_1 - q_1 p_3)\hat{\jmath} + (q_1 p_2 - q_2 p_1)\hat{\kappa}$.
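To make the Hamilton product of (4) concrete, here is a minimal NumPy sketch; the function name and the representation of a quaternion as a 4-vector of components are our own illustrative choices:

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product (Eq. 4) of quaternions stored as arrays (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = q
    p0, p1, p2, p3 = p
    return np.array([
        q0 * p0 - q1 * p1 - q2 * p2 - q3 * p3,   # scalar part
        q0 * p1 + q1 * p0 + q2 * p3 - q3 * p2,   # i component
        q0 * p2 - q1 * p3 + q2 * p0 + q3 * p1,   # j component
        q0 * p3 + q1 * p2 - q2 * p1 + q3 * p0,   # k component
    ])

i_hat = np.array([0., 1., 0., 0.])
j_hat = np.array([0., 0., 1., 0.])
print(hamilton_product(i_hat, j_hat))  # i j = k
print(hamilton_product(j_hat, i_hat))  # j i = -k, showing non-commutativity
```

The two prints return $(0,0,0,1)$ and $(0,0,0,-1)$, i.e., $\hat{\imath}\hat{\jmath} = \hat{\kappa}$ while $\hat{\jmath}\hat{\imath} = -\hat{\kappa}$, matching (3).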
Similarly to complex numbers, the conjugate of a quaternion can be defined as:

$$q^* = q_0 - q_1\hat{\imath} - q_2\hat{\jmath} - q_3\hat{\kappa} = q_0 - \mathbf{q}. \qquad (7)$$

The norm is also defined and is equal to $|q| = \sqrt{q q^*} = \sqrt{q_0^2 + q_1^2 + q_2^2 + q_3^2}$, that is, the Euclidean norm in $\mathbb{R}^4$. Indeed, $q$ is said to be a unit quaternion if $|q| = 1$, as well as a pure unit quaternion if $q^2 = -1$. Moreover, a quaternion $q$ is endowed with an inverse determined by:

$$q^{-1} = \frac{q^*}{|q|^2}.$$

Note that for unit quaternions the relation $q^* = q^{-1}$ holds.
A quaternion also has a polar form:

$$q = |q|\left(\cos(\theta) + \mathbf{v}\sin(\theta)\right) = |q|\, e^{\mathbf{v}\theta}, \qquad (8)$$

where $\theta \in \mathbb{R}$ is the argument of the quaternion, $\cos(\theta) = q_0/|q|$, $\sin(\theta) = \|\mathbf{q}\|/|q|$, and $\mathbf{v} = \mathbf{q}/\|\mathbf{q}\|$ is a pure unit quaternion.
Furthermore, quaternions show interesting properties when interpreted as points and hyperplanes in $\mathbb{R}^4$. Among them, we find involutions, which are generally defined as self-inverse mappings, i.e., mappings that are their own inverse. Quaternions have an infinite number of involutions [7] that can be generalized by the formula:

$$q^{\mathbf{v}} = -\mathbf{v} q \mathbf{v}, \qquad (9)$$

where $q$ is an arbitrary quaternion to be involved and $\mathbf{v}$ is any unit vector, which acts as the axis of the involution. Among the infinite involutions, the most relevant ones are the three perpendicular involutions defined as:

$$\begin{aligned}
q^{\hat{\imath}} &= -\hat{\imath} q \hat{\imath} = q_0 + q_1\hat{\imath} - q_2\hat{\jmath} - q_3\hat{\kappa} \\
q^{\hat{\jmath}} &= -\hat{\jmath} q \hat{\jmath} = q_0 - q_1\hat{\imath} + q_2\hat{\jmath} - q_3\hat{\kappa} \\
q^{\hat{\kappa}} &= -\hat{\kappa} q \hat{\kappa} = q_0 - q_1\hat{\imath} - q_2\hat{\jmath} + q_3\hat{\kappa}
\end{aligned} \qquad (10)$$

These were the first involutions identified [5] and they are crucial for the study of the second-order statistics of a quaternion signal, as we will see in the next section.
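The perpendicular involutions in (10) can be verified numerically through the Hamilton product of (4); the sketch below uses our own helper names:

```python
import numpy as np

def hamilton(q, p):
    """Hamilton product (Eq. 4) on 4-vectors (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = q
    p0, p1, p2, p3 = p
    return np.array([q0*p0 - q1*p1 - q2*p2 - q3*p3,
                     q0*p1 + q1*p0 + q2*p3 - q3*p2,
                     q0*p2 - q1*p3 + q2*p0 + q3*p1,
                     q0*p3 + q1*p2 - q2*p1 + q3*p0])

def involution(q, v):
    """q^v = -v q v (Eq. 9), with v a pure unit quaternion."""
    return -hamilton(hamilton(v, q), v)

q = np.array([1., 2., 3., 4.])
i_hat = np.array([0., 1., 0., 0.])
# The involution about i keeps q0, q1 and flips the signs of q2, q3 (Eq. 10):
print(involution(q, i_hat))  # [ 1.  2. -3. -4.]
```

Applying the same involution twice returns the original quaternion, which is exactly the self-inverse property mentioned above.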
3 Generative Learning in the Quaternion Domain
In this section, we introduce the quaternion adversarial approach as well as the fundamental quaternion-valued operations employed to define the family of QGANs in the next sections. It is worth noting that in a quaternion neural network each element is a quaternion, including inputs, weights, biases and outputs.
3.1 The Quaternion Adversarial Framework
Generative adversarial networks are built upon a minimax game between the generator network ($G$) and the discriminator one ($D$), as a special case of the concept initially proposed to implement artificial curiosity [31, 32]. They are trained in an adversarial fashion through the following objective function, introduced in [10]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left\{\log D(x)\right\} + \mathbb{E}_{z \sim p_z(z)}\left\{\log\left(1 - D(G(z))\right)\right\}, \qquad (11)$$

where $p_{\text{data}}$ is the real data distribution and $p_z$ is the noise distribution. The two terms in the objective are two cross-entropies [14]. Indeed, the first term is the cross-entropy between $[1\ 0]^{\mathsf{T}}$ and $[D(x)\ \ 1 - D(x)]^{\mathsf{T}}$, whereas the second term is the cross-entropy between $[0\ 1]^{\mathsf{T}}$ and $[D(G(z))\ \ 1 - D(G(z))]^{\mathsf{T}}$. In order to introduce the family of QGANs, we first need to delineate this adversarial approach in the quaternion domain. We therefore redefine the cross-entropy function, as suggested in [27], by replacing real numbers with hypercomplex numbers and computing the operations element-wise. Thus, the quaternion cross-entropy (QCE) between the target quaternion $q$ and the estimated one $\tilde{q}$ can be defined as follows:

$$\begin{aligned}
\text{QCE}(q, \tilde{q}) = -\frac{1}{N}\sum_{n=1}^{N} \Big[ &\ q_0 \log(\tilde{q}_0) + (1 - q_0)\log(1 - \tilde{q}_0) \\
+ &\ q_1 \log(\tilde{q}_1) + (1 - q_1)\log(1 - \tilde{q}_1) \\
+ &\ q_2 \log(\tilde{q}_2) + (1 - q_2)\log(1 - \tilde{q}_2) \\
+ &\ q_3 \log(\tilde{q}_3) + (1 - q_3)\log(1 - \tilde{q}_3) \Big]. \qquad (12)
\end{aligned}$$
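A split implementation of (12) simply applies the binary cross-entropy to each quaternion component and averages over the batch. The following sketch uses our own naming, and the clipping term is an addition of ours for numerical safety:

```python
import numpy as np

def qce(q, q_tilde, eps=1e-12):
    """Quaternion cross-entropy (Eq. 12) for batches of shape (N, 4).
    Components are assumed to lie in (0, 1), e.g. after a split sigmoid."""
    q_tilde = np.clip(q_tilde, eps, 1.0 - eps)   # avoid log(0)
    ce = q * np.log(q_tilde) + (1 - q) * np.log(1 - q_tilde)
    return -np.sum(ce) / q.shape[0]              # -1/N over the batch

target = np.full((8, 4), 0.9)
good = qce(target, np.full((8, 4), 0.9))   # near-perfect prediction, small loss
bad = qce(target, np.full((8, 4), 0.1))    # poor prediction, larger loss
print(good < bad)  # True
```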
More generally, several objective functions proposed to train GANs can be redefined in the quaternion domain. Among the most common ones, we find the Wasserstein distance with a gradient penalty that enforces the Lipschitz continuity of the discriminator, which is defined as follows [1, 15]:

$$V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left\{D(x)\right\} - \mathbb{E}_{z \sim p_z(z)}\left\{D(G(z))\right\} - \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\left\{\left(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\right)^2\right\}, \qquad (13)$$

where the last term is the gradient penalty, a regularization technique for the discriminator.

Other works [25, 3] consider instead the hinge loss, which is given, respectively for the discriminator and the generator, by:

$$V(D, \hat{G}) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left\{\min(0, -1 + D(x))\right\} + \mathbb{E}_{z \sim p_z(z)}\left\{\min(0, -1 - D(\hat{G}(z)))\right\}, \qquad (14)$$

$$V(\hat{D}, G) = -\mathbb{E}_{z \sim p_z(z)}\left\{\hat{D}(G(z))\right\}. \qquad (15)$$

Since (13) and (14) are compositions of expected values and cross-entropies, both the definition of the Wasserstein loss and that of the hinge loss in the quaternion domain are straightforwardly derived by following the procedure shown for the adversarial loss in (11).
3.2 Quaternion Fully Connected Layers
In real-valued neural networks, fully connected layers are generally defined as:
$$\mathbf{y}_r = \phi(\mathbf{W}_r \mathbf{x}_r + \mathbf{b}_r), \qquad (16)$$

where $\mathbf{W}_r \mathbf{x}_r$ performs the multiplication between the weight matrix $\mathbf{W}_r$ and the input $\mathbf{x}_r$, $\mathbf{b}_r$ is the bias and $\phi(\cdot)$ is any activation function. In order to define the same operation in the quaternion domain, we represent the quaternion weight matrix as $\mathbf{W} = \mathbf{W}_0 + \mathbf{W}_1\hat{\imath} + \mathbf{W}_2\hat{\jmath} + \mathbf{W}_3\hat{\kappa}$, the quaternion input as $\mathbf{x} = \mathbf{x}_0 + \mathbf{x}_1\hat{\imath} + \mathbf{x}_2\hat{\jmath} + \mathbf{x}_3\hat{\kappa}$ and the quaternion bias as $\mathbf{b} = \mathbf{b}_0 + \mathbf{b}_1\hat{\imath} + \mathbf{b}_2\hat{\jmath} + \mathbf{b}_3\hat{\kappa}$. Therefore, $\mathbf{W}\mathbf{x}$ in (16) is performed by a vector multiplication between two quaternions, i.e., by the Hamilton product:

$$\begin{aligned}
\mathbf{W}\mathbf{x} &= (\mathbf{W}_0\mathbf{x}_0 - \mathbf{W}_1\mathbf{x}_1 - \mathbf{W}_2\mathbf{x}_2 - \mathbf{W}_3\mathbf{x}_3) \\
&\quad + (\mathbf{W}_0\mathbf{x}_1 + \mathbf{W}_1\mathbf{x}_0 + \mathbf{W}_2\mathbf{x}_3 - \mathbf{W}_3\mathbf{x}_2)\,\hat{\imath} \\
&\quad + (\mathbf{W}_0\mathbf{x}_2 - \mathbf{W}_1\mathbf{x}_3 + \mathbf{W}_2\mathbf{x}_0 + \mathbf{W}_3\mathbf{x}_1)\,\hat{\jmath} \\
&\quad + (\mathbf{W}_0\mathbf{x}_3 + \mathbf{W}_1\mathbf{x}_2 - \mathbf{W}_2\mathbf{x}_1 + \mathbf{W}_3\mathbf{x}_0)\,\hat{\kappa}.
\end{aligned} \qquad (17)$$
Note that $\mathbf{W}$ has dimensionality $\frac{1}{4}|\mathbf{W}_r|$, since it is composed of four submatrices $\mathbf{W}_0, \mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3$, each one with $1/16$ the dimension of $\mathbf{W}_r$. This is a key feature of QNNs, since the quaternion layer built upon the product $\mathbf{W}\mathbf{x}$ has the same output dimension as the real-valued layer built upon $\mathbf{W}_r\mathbf{x}_r$, but with $1/4$ the number of parameters to train. Note also that the submatrices are shared over each component of the quaternion input. This sharing allows the weights to capture internal relations among quaternion elements, since each characteristic in one component influences the other components through the common weights. In this way, the relations among components are preserved and captured by the weights of the network, which is able to process inputs without losing intra-channel information. The bias $\mathbf{b}$ is then added component by component. Finally, in QNNs the activation functions are applied element-wise, resulting in the so-called split activation functions. That is, considering a common Rectified Linear Unit (ReLU) activation function and the quaternion $\mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b}$, the final output $\mathbf{y}$ of the layer will be:

$$\mathbf{y} = \text{ReLU}(\mathbf{z}_0) + \text{ReLU}(\mathbf{z}_1)\hat{\imath} + \text{ReLU}(\mathbf{z}_2)\hat{\jmath} + \text{ReLU}(\mathbf{z}_3)\hat{\kappa}. \qquad (18)$$
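The layer described by (17)-(18) can be sketched in a few lines of NumPy. Shapes and names below are our own illustrative choices, and the final ratio mirrors the 1/4 parameter saving discussed above:

```python
import numpy as np

def quaternion_linear(x, W, b):
    """Split-ReLU quaternion fully connected layer (Eqs. 17-18).
    x: (4, n_in) input components; W: (4, n_out, n_in) submatrices W0..W3,
    shared across the four input components; b: (4, n_out) quaternion bias."""
    x0, x1, x2, x3 = x
    W0, W1, W2, W3 = W
    z = np.stack([
        W0 @ x0 - W1 @ x1 - W2 @ x2 - W3 @ x3,   # scalar part
        W0 @ x1 + W1 @ x0 + W2 @ x3 - W3 @ x2,   # i component
        W0 @ x2 - W1 @ x3 + W2 @ x0 + W3 @ x1,   # j component
        W0 @ x3 + W1 @ x2 - W2 @ x1 + W3 @ x0,   # k component
    ]) + b
    return np.maximum(z, 0.0)                     # split ReLU (Eq. 18)

rng = np.random.default_rng(0)
n_in, n_out = 16, 8
y = quaternion_linear(rng.standard_normal((4, n_in)),
                      rng.standard_normal((4, n_out, n_in)),
                      rng.standard_normal((4, n_out)))
print(y.shape)  # (4, 8)

# Parameter saving: a real layer on the flattened 4*n_in channels would need
# (4*n_out) x (4*n_in) weights; the quaternion layer uses only 4 x n_out x n_in.
ratio = (4 * n_out * n_in) / ((4 * n_out) * (4 * n_in))
print(ratio)  # 0.25
```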
3.3 Quaternion Convolutional Layers
Convolutional layers are generally applied to multichannel inputs, such as images. Supposing we deal with color images, real-valued neural networks break the structure of the input and concatenate the red, green and blue (RGB) channels in a tensor. Quaternion-valued convolutions, instead, preserve the correlations among the channels and encapsulate the image in a quaternion as [27, 26, 37]:

$$\mathbf{x} = 0 + R\,\hat{\imath} + G\,\hat{\jmath} + B\,\hat{\kappa}. \qquad (19)$$

The image channels are the real coefficients of the imaginary units, while the scalar part is set to $0$. Encapsulating channels in a quaternion allows us to treat them as a single entity and thus to preserve intra-channel relations. A visual explanation of the quaternion representation of color images is depicted in Fig. 1.
Fig. 1 Visual explanation of an $R, G, B$ image processed by real- and quaternion-valued networks. On the left, the original three-channel image. The image can be processed in two ways: i) as a tensor of independent channels by a standard real-valued convolutional network, as at the top of the figure; ii) as a single entity, encapsulating it in a quaternion and considering internal relations among channels, as the quaternion-valued convolutional network does at the bottom of the figure. It is worth noting how the real-valued network does not consider any correlation among channels, while the quaternion one preserves the relations among channels.
Similarly to the definition of fully connected layers in the previous section, let us now consider a real-valued convolutional layer delineated by:

$$\mathbf{y} = \phi(\mathbf{W} * \mathbf{x} + \mathbf{b}), \qquad (20)$$

where $*$ is the convolution operator. Quaternion convolutional layers are built with the same procedure depicted for fully connected layers, thus considering the Hamilton product instead of the standard vector multiplication. That is, for quaternion weights and inputs, the convolution operator $\mathbf{W} * \mathbf{x}$ becomes:

$$\begin{aligned}
\mathbf{W} * \mathbf{x} &= (\mathbf{W}_0 * \mathbf{x}_0 - \mathbf{W}_1 * \mathbf{x}_1 - \mathbf{W}_2 * \mathbf{x}_2 - \mathbf{W}_3 * \mathbf{x}_3) \\
&\quad + (\mathbf{W}_0 * \mathbf{x}_1 + \mathbf{W}_1 * \mathbf{x}_0 + \mathbf{W}_2 * \mathbf{x}_3 - \mathbf{W}_3 * \mathbf{x}_2)\,\hat{\imath} \\
&\quad + (\mathbf{W}_0 * \mathbf{x}_2 - \mathbf{W}_1 * \mathbf{x}_3 + \mathbf{W}_2 * \mathbf{x}_0 + \mathbf{W}_3 * \mathbf{x}_1)\,\hat{\jmath} \\
&\quad + (\mathbf{W}_0 * \mathbf{x}_3 + \mathbf{W}_1 * \mathbf{x}_2 - \mathbf{W}_2 * \mathbf{x}_1 + \mathbf{W}_3 * \mathbf{x}_0)\,\hat{\kappa}.
\end{aligned} \qquad (21)$$
Note that in convolutional networks the weight sharing is crucial to properly process channels. Indeed, the RGB channels of an image interact with each other, resulting in combined colors, such as yellow or violet, through a representation of pixels in the color space. Nonetheless, real-valued networks are not able to capture these interactions, since they process input channels separately, while QCNNs not only preserve the input design but also capture these relations through the sharing of weights. In fact, QCNNs perform a twofold learning: the convolution operator has the task of learning external relations among the pixels of the image, while the Hamilton product accomplishes the learning among the channels. Furthermore, as for linear layers, QCNNs are built with $1/4$ the number of parameters with respect to their real-valued counterparts.
3.4 Quaternion Pooling Layers
Many neural networks make use of pooling layers, such as max pooling or average pooling, to extract high-level information and reduce input dimensions. As done for the previous layers, this set of operations can also be redefined in the quaternion domain.

The simplest examples of pooling in the hypercomplex domain are average and sum pooling. Indeed, applying these operations to each quaternion component, as done for the split activation functions, will not affect the final result [37]. A different approach must be defined, instead, for max pooling. Indeed, the position of the maximum in a single component is not guaranteed to coincide with that of the other components. In order to address this issue, a guidance matrix has to be introduced. As in [37], this matrix is built through the quaternion amplitude and keeps track of the position of the maximum, which is then mapped back to the original quaternion matrix in order to proceed with the pooling computation. However, max pooling operations are rarely employed in GANs, thus we only make use of average and sum pooling in our experiments.
3.5 Quaternion Batch Normalization
Introduced in [19], batch normalization (BN) immediately became an ever-present module in neural networks. The idea behind BN is to normalize inputs to have zero mean and unit variance. This normalization helps the generalization ability of the network among different batches of training data and between the train and test data distributions. Moreover, reducing the internal covariate shift remarkably improves the training speed, thus leading to a faster convergence of the model. For these reasons, QNNs are also endowed with batch normalization. However, different versions of this method have been proposed in the literature. An elegant whitening procedure based on the standard covariance matrix is introduced in [8]. Here, the Cholesky decomposition is used to compute the square root of the inverse of the covariance matrix, which is often intractable. The authors assert that this approach ensures zero mean, unit variance and decorrelation among components. However, the covariance matrix is not able to recover the complete second-order statistics in the quaternion domain [4], and the decomposition requires heavy matrix calculations and computational time [18]. Another remarkable approach is introduced in [34], where the input is standardized by computing the average of the variance of each component. Nevertheless, describing the second-order statistics of a signal in the quaternion domain requires meticulous computations, and the approach in [34] is an approximation of the complete variance. Notwithstanding the approximation, this method allows a notable reduction of the computational time.
The theoretically proper procedure to reach a centered, decorrelated and unit-variance quaternion signal would be a whitening procedure. Ideally, we should consider the covariance matrix and then decompose it to whiten the input, in order to avoid computing the square root of the inverse, which is often unfeasible. However, due to the interactions among components, second-order statistics for quaternion random variables are not completely described by the standard covariance matrix [4]. For this reason, the augmented covariance matrix should be considered instead. Such a matrix is augmented with the complementary covariance matrices $\mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}}, \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}}, \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}}$, i.e., the covariance matrices of the quaternion with its three perpendicular involutions $\mathbf{q}^{\hat{\imath}}, \mathbf{q}^{\hat{\jmath}}, \mathbf{q}^{\hat{\kappa}}$. Thus, the augmented covariance matrix, which completely characterizes the second-order information of the augmented quaternion vector $\tilde{\mathbf{q}}$, is defined as:

$$\tilde{\mathbf{C}}_{\mathbf{q}\mathbf{q}} = \mathbb{E}\left\{\tilde{\mathbf{q}}\tilde{\mathbf{q}}^{\mathsf{H}}\right\} =
\begin{bmatrix}
\mathbf{C}_{\mathbf{q}\mathbf{q}} & \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}} \\
\mathbf{C}^{\mathsf{H}}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\kappa}}} \\
\mathbf{C}^{\mathsf{H}}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\kappa}}} \\
\mathbf{C}^{\mathsf{H}}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\imath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\kappa}}}
\end{bmatrix} \qquad (22)$$
where $(\cdot)^{\mathsf{H}}$ is the conjugate transpose operator. The formulation in (22) recovers the complete statistical information of a general quaternion signal. Thus, the theoretically correct procedure would be delineated as:

$$\bar{\mathbf{x}} = \tilde{\mathbf{C}}^{-1/2}_{\mathbf{q}\mathbf{q}}\left(\mathbf{x} - \mathbb{E}\{\mathbf{x}\}\right), \qquad (23)$$

or substituting the inverse square root $\tilde{\mathbf{C}}^{-1/2}_{\mathbf{q}\mathbf{q}}$ with a decomposition of it. However, the construction of the augmented covariance matrix may be quite difficult and computationally expensive due to the computation of each sub-covariance matrix. Moreover, $\tilde{\mathbf{C}}^{-1/2}_{\mathbf{q}\mathbf{q}}$ includes skew-symmetric sub-matrices [4], which make the decomposition more difficult.
In order to simplify the calculation of (22) and make it more feasible for practical applications, a particular case can be considered by leveraging the Q-properness property [35, 4, 12]. Q-properness entails that the quaternion signal is not correlated with its involutions, implying vanishing complementary covariance matrices, i.e., $\mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\imath}}} = \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\jmath}}} = \mathbf{C}_{\mathbf{q}\mathbf{q}^{\hat{\kappa}}} = \mathbf{0}$. Also, for Q-proper random variables the following relation holds:

$$\text{var}\{q_c\} = \mathbb{E}\left\{q_c^2\right\} = \sigma^2, \qquad c = \{0, 1, 2, 3\}. \qquad (24)$$

Thus, considering a Q-proper quaternion, the covariance in (22) becomes:

$$\tilde{\mathbf{C}}_{\mathbf{q}\mathbf{q}} = \mathbb{E}\left\{\tilde{\mathbf{q}}\tilde{\mathbf{q}}^{\mathsf{H}}\right\} =
\begin{bmatrix}
\mathbf{C}_{\mathbf{q}\mathbf{q}} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{C}_{\mathbf{q}^{\hat{\imath}}\mathbf{q}^{\hat{\imath}}} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{C}_{\mathbf{q}^{\hat{\jmath}}\mathbf{q}^{\hat{\jmath}}} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{C}_{\mathbf{q}^{\hat{\kappa}}\mathbf{q}^{\hat{\kappa}}}
\end{bmatrix} = 4\sigma^2 \mathbf{I}. \qquad (25)$$
Assuming Q-properness for a random variable saves a lot of calculations and computational cost. Notwithstanding the theoretical correctness of the above-defined approach, the quaternion batch normalization (QBN) techniques adopted so far in the literature rely on some approximations.

We assume the input signal is Q-proper, thus we consider the covariance in (25) and build the normalization as follows:

$$\bar{\mathbf{x}} = \frac{\mathbf{x} - \mu_q}{\sqrt{\text{var}\{\mathbf{x}\} + \epsilon}} = \frac{\mathbf{x} - \mu_q}{\sqrt{4\sigma^2 + \epsilon}}, \qquad (26)$$

where $\mu_q$ is the quaternion input mean value, which is a quaternion itself, defined as:

$$\mu_q = \frac{1}{N}\sum_{n=1}^{N}\left(q_{0,n} + q_{1,n}\hat{\imath} + q_{2,n}\hat{\jmath} + q_{3,n}\hat{\kappa}\right) = \bar{q}_0 + \bar{q}_1\hat{\imath} + \bar{q}_2\hat{\jmath} + \bar{q}_3\hat{\kappa}. \qquad (27)$$

The final output is computed as follows:

$$\text{QBN}(\mathbf{x}) = \gamma\bar{\mathbf{x}} + \beta, \qquad (28)$$

where $\beta$ is a shifting quaternion parameter and $\gamma$ is a scalar parameter.

In conclusion, the QBN proposed by [8] is an elegant approximation; nevertheless, it is not able to catch the complete second-order statistics, while requiring heavy computations [18]. Thus, we believe that considering Q-proper signals, which are indeed very frequent, is a good approximation which also greatly reduces the computational requirements. For our experiments, we adopt the method represented by (28).
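Under the Q-properness assumption, (26)-(28) reduce to a few lines. The sketch below uses our own naming, with the batch statistic estimating the $4\sigma^2$ term:

```python
import numpy as np

def quaternion_batch_norm(x, gamma=1.0, beta=np.zeros(4), eps=1e-5):
    """QBN for Q-proper signals (Eqs. 26-28). x: (N, 4) batch of quaternions.
    gamma is a real scale parameter, beta a quaternion shift parameter."""
    mu = x.mean(axis=0)                             # quaternion mean (Eq. 27)
    centered = x - mu
    var4 = np.mean(np.sum(centered ** 2, axis=1))   # estimate of 4*sigma^2
    return gamma * centered / np.sqrt(var4 + eps) + beta

rng = np.random.default_rng(0)
x = 3.0 + 2.0 * rng.standard_normal((1024, 4))      # non-centered, non-unit batch
out = quaternion_batch_norm(x)
print(np.allclose(out.mean(axis=0), 0.0, atol=1e-7))        # zero mean
print(abs(np.mean(np.sum(out ** 2, axis=1)) - 1.0) < 1e-3)  # unit quaternion variance
```

Both checks print `True`: the output batch is centered and has (approximately) unit total variance, which is the goal stated above.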
3.6 Quaternion Spectral Normalization
Among the wide variety of techniques proposed to stabilize GAN training, spectral normalization (SN) [25] is one of the most widespread methods. Previously, the crucial importance of having a Lipschitz-bounded discriminator function was introduced in [1, 15]. Lately, it was proved that no restriction on the discriminator space leads to the gradient uninformativeness problem [40]. This means that the gradient of the optimal discriminative function carries no information about the real distribution, thus providing useless feedback to the generator. Forcing a function to be Lipschitz continuous means controlling how fast it increases and bounding its gradients, thus mitigating gradient explosion [40, 11]. In [1], a method based on weight clipping was proposed to force the discriminator to be 1-Lipschitz. Later, such an approach was improved by adding a gradient penalty (GP) that constrains the gradient norm to be at most 1 [15]. The latter method is reproposed in several state-of-the-art GANs and combined with other regularization techniques to improve performance, as suggested in [24]. However, being built on the gradients with respect to the inputs, the gradient penalty cannot impose a regularization outside the support of the fake and real data distributions. Moreover, it requires considerable computation. Spectral normalization, instead, directly operates on the weights of the network, being free of the support limitation, and its computation is faster than other methods [25]. It aims at controlling the Lipschitz constant of the discriminator by constraining the spectral norm of each layer.
A generic function $f$ is $K$-Lipschitz continuous if, for any two points $x_1, x_2$, the following property holds:

$$\frac{\|f(x_1) - f(x_2)\|}{\|x_1 - x_2\|} \leq K, \qquad (29)$$

with $\|\cdot\|$ being the $\ell_2$ norm. The Lipschitz norm $\|f\|_{\text{Lip}}$ of a function $f$ is equal to $\sup_x \sigma(\nabla f(x))$, where $\sigma(\cdot)$ is the spectral norm of the input matrix, that is, its largest singular value.

For a generic linear layer $f(\mathbf{x}) = \mathbf{W}\mathbf{x} + \mathbf{b}$, the Lipschitz norm is:

$$\|f\|_{\text{Lip}} = \sup_{\mathbf{x}} \sigma(\nabla f(\mathbf{x})) = \sup_{\mathbf{x}} \sigma(\mathbf{W}) = \sigma(\mathbf{W}). \qquad (30)$$

Assuming the Lipschitz norm of each layer activation is equal to 1, a constraint that is satisfied by many popular activation functions including ReLU and Leaky ReLU [25], we can apply the Lipschitz bound to the whole network by following $\|f_1 \circ f_2\|_{\text{Lip}} \leq \|f_1\|_{\text{Lip}} \cdot \|f_2\|_{\text{Lip}}$.

Finally, the SN is defined as:

$$\bar{\mathbf{W}}_{SN}(\mathbf{W}) = \frac{\mathbf{W}}{\sigma(\mathbf{W})}, \qquad (31)$$

and it ensures that the normalized weight matrix always satisfies the constraint $\sigma(\bar{\mathbf{W}}_{SN}(\mathbf{W})) = 1$. In [25], the authors underline that applying the original singular value decomposition algorithm to compute $\sigma(\mathbf{W})$ may be extremely heavy. To address this computational complexity, they suggest estimating the largest singular value via the power iteration method.
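The power iteration estimate of $\sigma(\mathbf{W})$ can be sketched as follows. This is a minimal NumPy version for illustration; practical SN implementations reuse a single persistent $u$ vector across training steps rather than restarting from scratch:

```python
import numpy as np

def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)     # right singular vector estimate
        u = W @ v
        u /= np.linalg.norm(u)     # left singular vector estimate
    return float(u @ W @ v)

W = np.diag([3.0, 1.0, 0.5])       # largest singular value is 3
sigma = spectral_norm(W)
print(round(sigma, 6))                         # 3.0
# After the normalization of Eq. (31), the spectral norm becomes 1:
print(round(np.linalg.norm(W / sigma, 2), 6))  # 1.0
```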
In order to control the Lipschitz constant in a QGAN, in this section we explore two methods to define the spectral normalization in the quaternion domain. A first approach normalizes the weights $\mathbf{W}$ by operating on each submatrix $\mathbf{W}_0, \mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3$ independently, computing the spectral norms separately. That is, through the power iteration method as above, we compute $\sigma_0(\mathbf{W}_0), \sigma_1(\mathbf{W}_1), \sigma_2(\mathbf{W}_2), \sigma_3(\mathbf{W}_3)$ and then normalize each submatrix with the corresponding norm. This method forces each submatrix to have spectral norm equal to 1. However, it never takes the whole weight matrix $\mathbf{W}$ into account. Moreover, the relations among the components of the quaternion matrix are not considered, losing the characteristic property of QNNs.

The second method, similarly to the real-valued SN, normalizes the whole matrix $\mathbf{W}$ together, imposing the constraint on the complete matrix rather than on the single submatrices. Therefore, the spectral norm is computed by taking the complete weight matrix into account, thus considering the relations among the quaternion components. However, while the spectral norm is computed as in (30), the normalization step is applied differently from the SN in (31). Instead of normalizing the whole matrix as in (31), since the weight matrix $\mathbf{W}$ is designed as a composition of the submatrices $\mathbf{W}_0, \mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3$, we can leverage this quaternion setup to save computational costs and normalize each submatrix. The normalized submatrices $\bar{\mathbf{W}}_{0,QSN}, \bar{\mathbf{W}}_{1,QSN}, \bar{\mathbf{W}}_{2,QSN}, \bar{\mathbf{W}}_{3,QSN}$ result in a normalized weight matrix $\bar{\mathbf{W}}_{QSN}(\mathbf{W})$ with a more efficient computation than normalizing the full matrix $\mathbf{W}$.
An empirical comparison between the two methods is reported in Section 5. We investigate the two techniques in a plain QGAN and show that the latter approach is more stable and achieves better performance on both datasets considered. We deem it more appropriate both theoretically and empirically, and we use it in our further experiments. From now on, we refer to this approach as the quaternion spectral normalization (QSN).
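A sketch of the second method follows. We assume here that "the complete matrix" is the real block matrix induced by the Hamilton product of (17); this composition, and all names, are our own illustrative choices rather than the paper's exact implementation:

```python
import numpy as np

def compose(W0, W1, W2, W3):
    """Real block matrix acting on stacked components [x0; x1; x2; x3]
    exactly as the Hamilton product Wx of Eq. (17)."""
    return np.block([[W0, -W1, -W2, -W3],
                     [W1,  W0, -W3,  W2],
                     [W2,  W3,  W0, -W1],
                     [W3, -W2,  W1,  W0]])

def qsn(W0, W1, W2, W3):
    """QSN sketch: one spectral norm of the complete matrix,
    applied to normalize each submatrix."""
    sigma = np.linalg.norm(compose(W0, W1, W2, W3), 2)  # largest singular value
    return W0 / sigma, W1 / sigma, W2 / sigma, W3 / sigma

rng = np.random.default_rng(0)
subs = rng.standard_normal((4, 8, 8))
normed = qsn(*subs)
# Dividing every submatrix by sigma divides the composed matrix by sigma,
# so the complete normalized matrix has spectral norm 1:
print(round(np.linalg.norm(compose(*normed), 2), 6))  # 1.0
```

Since the composed matrix is linear in the four submatrices, normalizing the submatrices is equivalent to normalizing the full matrix, which is the efficiency argument made above.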
3.7 Quaternion Weight Initialization
Weight initialization often has a crucial role in network convergence and in reducing the risk of vanishing or exploding gradients [9]. This procedure becomes even more important when dealing with quaternion weights. Indeed, due to the interactions among the elements of the quaternion, the initialization can be neither naively random nor performed independently for each component. For these reasons, an appropriate initialization has to be introduced.

First, consider a weight matrix $\mathbf{W}$ with $\mathbb{E}\{|\mathbf{W}|\} = 0$. The initialization is based on a normalized pure quaternion $u$ generated for each weight submatrix from a uniform distribution in $[0, 1]$. By using the polar form of a quaternion, we can define the initialization of the weight matrix as:

$$\mathbf{W} = |\mathbf{W}|\, e^{u\theta} = |\mathbf{W}|\left(\cos(\theta) + u\sin(\theta)\right), \qquad (32)$$
where each matrix component is initialized as

W0 = φ cos(θ),
W1 = φ u1 sin(θ),
W2 = φ u2 sin(θ),
W3 = φ u3 sin(θ),   (33)
where the angle θ is randomly generated in the interval [−π, π] and φ is randomly sampled in the interval of the standard deviation around zero, [−σ, σ]. The standard deviation is set according to the chosen initialization method, either [9] or [16]. In the first case, we set σ = 1/√(2(n_in + n_out)), whereas in the latter we set σ = 1/√(2 n_in). In both equations, n_in is the number of neurons in the input layer and n_out the number of neurons in the output layer. The variance of W can be written as:

var{W} = E{|W|²} − (E{|W|})².   (34)
However, similarly to the QBN in the previous section, in order to reduce the computations, the term (E{|W|})² can be considered equal to 0 [28, 27]. This is equivalent to considering a Q-proper quaternion signal whose augmented covariance matrix has off-diagonal elements equal to 0 and trace equal to 4σ². Consequently, the variance is computed by considering only the first term of (34) as:

var{W} = E{|W|²} = 4σ².   (35)
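The procedure of (32)-(33) can be sketched as follows. This is a minimal NumPy sketch: the function name is our own, and the two standard-deviation criteria mirror [9] and [16] as reconstructed above.

```python
import numpy as np

def quaternion_init(n_in, n_out, criterion="glorot", rng=None):
    """Quaternion weight init of Eqs. (32)-(33): each weight is
    phi * e^(u theta) with a unit pure quaternion u,
    theta ~ U[-pi, pi] and phi ~ U[-sigma, sigma]."""
    rng = rng or np.random.default_rng()
    if criterion == "glorot":              # sigma = 1 / sqrt(2 (n_in + n_out))
        sigma = 1.0 / np.sqrt(2 * (n_in + n_out))
    else:                                  # He: sigma = 1 / sqrt(2 n_in)
        sigma = 1.0 / np.sqrt(2 * n_in)
    shape = (n_out, n_in)
    u = rng.uniform(0, 1, size=(3,) + shape)       # pure quaternion direction
    u /= np.linalg.norm(u, axis=0, keepdims=True)  # normalize to unit norm
    theta = rng.uniform(-np.pi, np.pi, size=shape)
    phi = rng.uniform(-sigma, sigma, size=shape)
    W0 = phi * np.cos(theta)
    W1, W2, W3 = (phi * u[k] * np.sin(theta) for k in range(3))
    return W0, W1, W2, W3
```

Since u is a unit pure quaternion, the magnitude of each initialized weight equals |φ| and is therefore bounded by σ.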
3.8 Training
The forward phase of a QNN is the same as that of its real-valued counterpart: the input flows from the first to the last layer of the network. It may be interesting to note that in eq. (17) the order of the weight and the input can be inverted, thus changing the output of the product and resulting in an inverted QNN [27, 28]. Concerning the backward phase, it is worth mentioning that the gradient of a general quaternion loss function L is computed for each component of the quaternion weight matrix W as:
δL/δW = δL/δW0 + δL/δW1 î + δL/δW2 ĵ + δL/δW3 κ̂.   (36)
Then, the gradient is propagated back following the chain rule. Indeed, as defined in [27], the backpropagation of quaternion neural networks is just an extension of the method for their real-valued counterpart. Consequently, QNNs can be easily trained as real-valued networks via backpropagation.
4 GAN Architectures in the Quaternion Domain
The previous section described the main blocks and the framework needed to build and train a GAN in the quaternion domain. In this section we go further, presenting the complete definition of a plain QGAN in Subsection 4.1 and of an advanced state-of-the-art QGAN composed of complex blocks in Subsection 4.2. First, in order to set up a QGAN, each input, weight, bias and output has to be manipulated to become a quaternion. Therefore, weight matrices are initialized as compositions of the four submatrices, similarly to (17) and (21). Real-valued operations such as multiplications or convolutions in the networks are replaced with their quaternion counterparts, completing the redefinition of the layers in the quaternion domain. The input is handled as a quaternion and processed as a single entity. For images, a pure quaternion is considered as in (19), while for other kinds of multidimensional signals, the scalar part is considered too. The weights are then initialized following the description in Section 3.7. This careful definition of QGANs makes it possible to design a model with fewer free parameters than the same real-valued model and, consequently, to save memory and computational resources.
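As a concrete example of such a replacement, a quaternion fully connected layer can be sketched via the Hamilton product. This is a NumPy sketch with hypothetical names; the quaternion convolution follows the same pattern, with convolutions in place of matrix products.

```python
import numpy as np

def hamilton_linear(x, W0, W1, W2, W3):
    """Quaternion fully connected layer: the input x = (x0, x1, x2, x3) is
    multiplied by the weight quaternion via the Hamilton product, so all
    four components share the same four submatrices (a 4x parameter saving
    over a real layer of the same width)."""
    x0, x1, x2, x3 = x
    y0 = W0 @ x0 - W1 @ x1 - W2 @ x2 - W3 @ x3
    y1 = W1 @ x0 + W0 @ x1 - W3 @ x2 + W2 @ x3
    y2 = W2 @ x0 + W3 @ x1 + W0 @ x2 - W1 @ x3
    y3 = W3 @ x0 - W2 @ x1 + W1 @ x2 + W0 @ x3
    return y0, y1, y2, y3
```

The cross terms are what couple the channels: each output component mixes all four input components through the same shared submatrices.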
4.1 Vanilla QGAN
In the original GAN [10], both the generator (𝐺) and the discriminator (𝐷) are
defined by fully connected layers. Due to the limited expressivity of this design
with complex data such as images, in [29] the authors propose to replace dense
layers with more suitable operations for this kind of data and to build 𝐺and 𝐷by
stacking several convolutional layers. State-of-the-art GANs are based on the deep
convolutional GAN (DCGAN) [29]. In particular, the DCGAN increases the spatial
dimensionality by means of transposed convolutions in the generator and decreases it
in the discriminator with convolutions. Furthermore, this architecture applies batch normalization in every layer except for the last layer of 𝐺 and the first layer of 𝐷, in order to let the networks learn the correct statistics of the data distribution.
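The growth of the spatial dimensionality can be tracked with the standard output-size formulas; the 4 × 4 kernel, stride 2 and padding 1 defaults below are the typical DCGAN choice and are an assumption here.

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a transposed convolution (generator side)."""
    return (size - 1) * stride - 2 * pad + kernel

def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a strided convolution (discriminator side)."""
    return (size - kernel + 2 * pad) // stride + 1
```

Starting from a 4 × 4 feature map, three such transposed convolutions yield 8, 16 and 32, while the strided convolution in the discriminator reverses each step.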
By redefining the DCGAN in the quaternion domain (QDCGAN), it is possible to explore the potential of quaternion algebra in a simple GAN framework. The QDCGAN generator is defined by an initial quaternion fully connected layer, followed by quaternion transposed convolutions interleaved with quaternion batch normalization and split ReLU activation functions, except for the last layer, which ends with a split Tanh function. The discriminator has the same structure as the generator, but with quaternion transposed convolutions replaced by quaternion convolutions to decrease the dimensionality, and with a final fully connected quaternion layer that returns the real/fake decision by means of a split sigmoid activation. The QDCGAN, as its real-valued counterpart, optimizes the original loss in (11).
Fig. 2 Quaternion Vanilla GAN architecture. Every quantity, including inputs, weights and outputs, is a quaternion. The generator (green network) takes a quaternion noise signal and generates a batch of quaternion images with four channels. The discriminator tries to distinguish between fake and real quaternion samples by exploiting the properties of quaternion algebra.
4.2 Advanced QGAN
The Vanilla QGAN presented above is just a plain example to give a general idea of how to build GANs in the quaternion domain. In this section, we consider a more advanced model, the spectrally normalized GAN (SNGAN) [25], and we present the steps to define its quaternion counterpart.
The quaternion spectrally normalized GAN (QSNGAN) is trained in an adversarial fashion through the hinge loss defined in (14) and (15) for the discriminator and the generator, respectively, as suggested in [25, 3]. The overall architecture of the model is inspired by [3]. Both the generator and the discriminator networks are characterized by quaternion convolutional layers in order to leverage the properties of the Hamilton product. To mitigate the vanishing gradient problem and obtain better performance, a series of residual blocks with upsampling in the generator and downsampling in the discriminator can be adopted [25]. A scheme of the residual block of the proposed QSNGAN is depicted in Fig. 3. The discriminative network plays a crucial role in GAN training, thus it is more complex than the generator network. It
Fig. 3 Quaternion residual block (QResBlock) architecture inspired by [25] and redefined in the quaternion domain. QBN is omitted in the discriminator network and replaced by QSN. Grey blocks are used exclusively in the generator or in the discriminator. The generator employs the upsampling steps in the residual and in the shortcut path, while the discriminator employs the average pooling ones, except for the last residual block of the discriminator, which keeps the dimension invariant.
Fig. 4 First discriminator quaternion residual block (First QResBlock), with quaternion convolutions and average pooling layers to downsample the input.
takes as input the two sets of four-channel quaternion images through a first residual block, illustrated in Fig. 4. The final output of the network is the decision on whether the samples come from the real or the fake distribution.
In order to guarantee a fair comparison with the SNGAN, we consider a real-valued noise signal as input to the generator and handle it with an initial real-valued fully connected layer. The output of the first layer is then encapsulated in a quaternion signal with a procedure similar to the one considered in Subsection 3.3 to handle color images. The signal is then processed by the quaternion generator up to the last layer, which generates the four-channel fake image. The original SNGAN employs batch normalization in the generator and spectral normalization in the discriminator. We keep the same structure and consider the proposed QBN in (28) for the former and the QSN introduced in Section 3.6 for the latter. In particular, we exploit the QSN with the spectral norm computed over the whole weight matrix, which is theoretically preferable and ensures more stable results.
Defining the SNGAN in the quaternion domain allows saving parameters, as we will explore in the next section. Moreover, by processing the channels as a single entity through quaternion convolutions based on the Hamilton product, the QSNGAN is able to capture the relations among channels and preserve intra-channel information, which the SNGAN, conversely, loses. This property translates into an improved generation ability of the QSNGAN, which properly grasps the real data
Fig. 5 QSNGAN architecture scheme. The generator network (top) takes as input a real-valued signal, processes it with a fully connected layer and then encapsulates it in a quaternion signal. The residual blocks are depicted in Fig. 3. The generator outputs a quaternion-valued sample of images that, together with a sample from the real distribution, goes to the input of the discriminator network (bottom). The discriminator handles the samples through a series of residual blocks (the first one is illustrated in Fig. 4, the other ones in Fig. 3) up to the last layer, which outputs the real/fake decision.
distribution. The architecture of the proposed QSNGAN is reported in Fig. 5. In the
scheme, the forward phase flows from left to right for the top network (quaternion
generator) and from right to left for the second network (quaternion discriminator).
4.3 Evaluation metrics
In order to evaluate the performance of the generative networks, we consider two objective metrics: the Fréchet Inception Distance (FID) [17] as the main metric, as it is more consistent with human evaluation, and the Inception Score (IS) [30]. The FID embeds the generated and the real samples into the Inception convolutional features, models the two distributions as Gaussians by evaluating the empirical means μ_g, μ_data and covariances C_g, C_data, and then computes the Fréchet distance as:
FID(p_g, p_data) = ||μ_g − μ_data||² + Tr(C_g + C_data − 2(C_g C_data)^(1/2)),   (37)
where Tr(·) denotes the trace operation. Since the FID is a distance between the real and fake distributions, the lower the FID value, the better the generated samples.
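Given the empirical moments, (37) can be computed as in the following sketch (the helper name is our own; the trace of the matrix square root is obtained from the eigenvalues of C_g C_data, which are real and non-negative for valid covariances):

```python
import numpy as np

def frechet_distance(mu_g, cov_g, mu_d, cov_d):
    """FID between two Gaussians, Eq. (37):
    ||mu_g - mu_d||^2 + Tr(C_g + C_d - 2 (C_g C_d)^(1/2))."""
    diff = mu_g - mu_d
    # eigenvalues of C_g C_d are real and non-negative for PSD covariances,
    # so Tr((C_g C_d)^(1/2)) is the sum of their square roots
    eigvals = np.linalg.eigvals(cov_g @ cov_d)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return diff @ diff + np.trace(cov_g) + np.trace(cov_d) - 2 * tr_sqrt
```

In practice, the moments are estimated from the Inception features of the real and generated batches; identical distributions give a distance of zero.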
Instead, the IS uses the Inception model to obtain the conditional distribution p(y|x) of the generated samples. The IS expects the conditional distribution to have low entropy, since the images represent meaningful objects, while the marginal distribution p(y) should have high entropy due to the diversity among the samples. It is defined as:
IS(p_g) = exp( E_{x∼p_g} { KL[ p(y|x) || p(y) ] } ),   (38)
where KL is the Kullback-Leibler divergence. Conversely to the FID, higher IS values stand for better generated samples. However, the IS has some drawbacks: it does not consider the true data distribution and it is not able to detect mode collapse. We therefore consider the FID as the main metric and the IS as support.
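Given a matrix of class probabilities p(y|x) predicted by the Inception model for the generated samples, (38) reduces to a few lines (a sketch with a hypothetical helper name):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from a matrix of per-sample class probabilities p(y|x)
    (rows: generated samples). Implements Eq. (38):
    exp( E_x KL( p(y|x) || p(y) ) )."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y) over the batch
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

Confident predictions spread uniformly over C classes give the maximum score C, while identical predictions for every sample give the minimum score of 1, which illustrates why the IS rewards both sharpness and diversity.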
5 Experimental Evaluation
In order to evaluate the effectiveness of our proposed approach, we conduct a collection of experiments on the unsupervised image generation task. We take two datasets into account: CelebA-HQ [20], which contains 27k images for training and 3k images for testing, and 102 Oxford Flowers, which contains approximately 7k images for training and slightly fewer than 1k images for testing. We reshape the samples of both datasets to 128 × 128 and then test the real-valued SNGAN and the proposed QSNGAN. We use the Adam optimizer and keep the same hyper-parameters fixed as in [25], i.e., a learning rate equal to 0.0002 and optimizer parameters β1 = 0.0, β2 = 0.9. We only vary the number of critic iterations, considering experiments with critic iterations equal to 1 and to 5, in order to better investigate the behavior of our QSNGAN, which may have a different balance between the generator and discriminator networks with respect to the SNGAN. In every experiment, we fix the batch size to 64 and we perform 100k training iterations for CelebA-HQ and 50k for 102 Oxford Flowers. We have also considered endowing the SNGAN and the QSNGAN with a gradient penalty, as in (13), but we did not notice any improvement in the experiments, meaning that both the SN and the QSN adequately constrain the discriminator to be Lipschitz continuous.
The QSNGAN generator is a quaternion convolutional network as in Fig. 5. The
initial fully connected layer, which takes the noise of size 128 in input, is composed
of 4×4×1024 neurons. The following quaternion residual blocks illustrated in
Fig. 3 stack 1024, 512, 256, 128 and 64 filters. This means that, as an example,
the first residual block is built by interleaving QBNs, split ReLUs and quaternion
convolutions with 1024 kernels and an upsampling module with scale factor equal
to 2. Further, at the end of the last residual connection, we stack a QBN, a split ReLU activation function and a final quaternion convolutional layer with 64 filters to refine the output image, which is then passed to a split Tanh function to bound it in the interval [−1, 1]. For each quaternion convolution, we fix the kernel size to 3 and the stride and padding to 1. Conversely, the shortcut in the residual block is composed of an upsampling module and a quaternion convolution with kernel size equal to 1 and null padding. The network built through this procedure has fewer
Table 1 Summary of the number of network parameters and memory requirements for the real-valued SNGAN and its quaternion-valued counterpart (QSNGAN) on CelebA-HQ. The proposed method saves more than 70% of the total free parameters and of the disk memory for model checkpoints.
Model     #Params G     #Params D     #Params Tot    Disk Memory
SNGAN     32,150,787    29,022,213    61,173,000     115 GB
QSNGAN     9,631,204     7,264,901    16,896,105      35 GB

Generator checkpoint for inference.
than 10M free parameters, compared with the 32M parameters of its real-valued counterpart. This means that the checkpoint for inference saves more than 70% of disk memory, as shown in Table 1.
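The roughly 4× saving per layer can be verified with a quick parameter count (a sketch, bias terms omitted; in a quaternion convolution the channels are grouped into quaternions, so only four real kernels of a quarter size each are learned and shared via the Hamilton product):

```python
def conv_params(c_in, c_out, k):
    """Weights of a real 2D convolution (bias omitted)."""
    return c_in * c_out * k * k

def qconv_params(c_in, c_out, k):
    """Weights of a quaternion convolution: four shared real kernels of
    shape (c_out/4) x (c_in/4) x k x k, i.e. 4x fewer than the real layer."""
    return 4 * (c_in // 4) * (c_out // 4) * k * k
```

For instance, a 256-to-256-channel layer with 3 × 3 kernels drops from 589,824 real weights to a quarter of that in the quaternion domain.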
The QSNGAN discriminator is still a quaternion convolutional network as in Fig. 5, but slightly more complex. At the beginning, the real images are encapsulated in quaternions as depicted in Subsection 3.3, resulting in a batch of four-channel images. Obviously, the images produced by the generator network already comprise four channels and are defined as quaternions.
The first residual block of the discriminator in Fig. 4 is a spectrally-normalized quaternion convolution block with 64 3×3 filters and split ReLU activation functions. The shortcut, instead, as in the generator network, is a 1×1 quaternion convolution with padding equal to 0. In this case, however, both the residual and the shortcut paths end with a 2×2 split average pooling. The images then flow through a stack of five residual blocks built as in Fig. 3 with, respectively, 128, 256, 512, 1024 and 1024 filters. Here, the residual section of each block has a split average pooling to perform downsampling, while the shortcut comprises a quaternion convolution and another average pooling. The downsampling is applied in each residual block except for the last one, which acts as a refiner and leaves the dimensionality unchanged. Every weight is normalized through the QSN introduced in Subsection 3.6. The configurations for kernel size, stride and padding are the same as in the generator. At the end of the residual stack, we apply a split ReLU and a global sum pooling before passing the batch to the final spectrally-normalized fully connected layer which, by means of a sigmoid, returns the real/fake decision. As for the generator, the quaternion discriminator also allows saving parameters while learning the internal relations among channels. This saving is highlighted in Table 1, which reports the exact number of parameters for the quaternion model and the real-valued one. The quaternion GAN can obtain equal or better results when trained with fewer parameters since it leverages the properties of quaternion algebra, including the Hamilton product, which allow it to capture the relations among channels and catch more information from the input. Consequently, the training procedure needs fewer parameters to learn the real distribution and to generate images from it.
The objective evaluation is reported in Table 2. We compute the FID and IS on the test images (3k for CelebA-HQ and slightly fewer than 1k for 102 Oxford Flowers). As shown in Table 2, the proposed method stands out in the generation of samples from both datasets according to the metrics considered.
Table 2 Results summary for the 128 × 128 CelebA-HQ and 102 Oxford Flowers datasets. The proposed QSNGAN obtains a lower FID on each dataset considered. The values of the IS support the FID results. According to the objective metrics, the proposed QSNGAN generates more visually pleasing and diverse samples with respect to the real-valued baseline counterpart. The QSNGAN is also more robust to the choice of the hyper-parameter regarding the discriminator iterations (Critic iter), while the real-valued model fails when changing the original setting, which fixes this parameter to 5.
                        FID                               IS
Model    Critic iter    CelebA-HQ   102 Oxford Flowers    CelebA-HQ        102 Oxford Flowers
SNGAN    1              >200        >200                  <2.000           2.797 ± 0.196
         5              34.483      165.058               2.032 ± 0.062    2.977 ± 0.146
QSNGAN   1              29.417      175.484               2.249 ± 0.164    2.754 ± 0.256
         5              33.068      115.838               2.026 ± 0.082    3.000 ± 0.141

Discriminator collapses and training fails, thus metrics results are not comparable.
Moreover, the two QSNGANs with critic iterations 1 and 5 both score a lower FID than the best configuration of the SNGAN model. The proposed method performs best with one critic iteration per generator iteration, while the real-valued model fails with this configuration. Overall, the QSNGAN is more robust to the choice of the critic iterations than the SNGAN, which is more fragile. The IS strengthens the results obtained with the FID, as it reports higher scores for the proposed method on every dataset.
The visual inspection of the generated samples underlines the improved ability of our QSNGAN. Figures 6 and 7 show a randomly selected batch of generated 128 × 128 images for the real-valued SNGAN and the proposed QSNGAN, respectively. On one hand, the SNGAN seems to be quite unstable and sensitive to the input noise, thus alternating some good-quality images with badly generated ones. Overall, the SNGAN is not always able to distinguish the background from some parts of the subject, sometimes confusing attributes such as the neck or the hair with part of the environment and letting them vanish. On the other hand, the QSNGAN sample in Fig. 7 shows visually pleasing images, with a clear distinction between subject and background. It also shows a higher definition of face attributes, including the most difficult ones, such as eyebrows, beard or skin shades. In addition, colors seem to be more vivid and samples are diverse in terms of pose, gender, expression, and
hair color, among others. Concerning the second dataset, the generated samples for the SNGAN are shown in Fig. 8, while the batch from the QSNGAN is reported in Fig. 9. As is clear from Table 2, the results for this dataset are preliminary but encouraging. Even in this case, the proposed approach gains a lower FID and a higher IS than the real-valued model. Additionally, in the SNGAN samples, pixel artifacts are evident and often misleading, confusing the flower subject with the colored background. On the other hand, the images generated by our QSNGAN contain more distinct subjects. Furthermore, the proposed method better catches every color shade thanks to the quaternion algebra properties, which allow the network to learn the internal relations among channels without losing intra-channel information.
Fig. 6 Randomly generated samples from the real-valued SNGAN on the CelebA-HQ dataset after 100k training iterations. Sometimes this model fails to detect border attributes such as hair and neck, which may fade into the background. Indeed, only a few samples seem to be visually pleasing, while in some other cases the network fails to generate likable images.
Fig. 7 Randomly generated samples from our QSNGAN on the CelebA-HQ dataset after 100k training iterations. These images are part of the test samples, which gained a FID of 29.417 and an IS of 2.249 ± 0.164. The proposed method is able to generate visually pleasing images, well distinguishing the background from the face. Moreover, we do not observe mode collapse, as the samples show different attributes such as gender, hair color, pose and smile, among others.
In conclusion, the proposed quaternion-valued QSNGAN shows an improved
ability in capturing the real data distribution by leveraging the quaternion algebra
Fig. 8 Randomly generated samples from the SNGAN model on the 102 Oxford Flowers dataset after 50k training iterations. The SNGAN misleads some pixels in the images, and the depicted subjects are not always distinguishable.
Fig. 9 Randomly generated samples from the proposed QSNGAN on the 102 Oxford Flowers dataset after 50k training iterations. Flowers contain many different color shades, and most of the subjects are clearly defined. This set of figures shows the improved generation ability of our proposed method with respect to its real-valued counterpart.
properties in every experiment we conduct. It generates better and more vivid samples than its real-valued counterpart, according to both visual inspection and objective metrics. Furthermore, the proposed method has less than 30% of the free parameters of the SNGAN, which also shows worse generation performance.
5.1 Evaluation of Spectral Normalization Methods
This section reports the tests we conducted to evaluate the two quaternion spectral normalization methods described in Subsection 3.6. In order to investigate the performance of the normalization approaches, we validate two models, smaller than the ones introduced in the previous subsection, on the CIFAR10 and STL10 datasets. CIFAR10 contains 50k 32 × 32 images for training and 10k for testing, while STL10 has 105k 48 × 48 images in the train split and 8k in the test split.
We examine three different configurations. The first one does not involve any QSN method, thus the discriminator network is not constrained to be 1-Lipschitz; we run this experiment in order to check the effectiveness of the spectral normalization methods that we propose. The second configuration applies a split computation of the spectral norm for each quaternion component and normalizes each weight submatrix W0, W1, W2, W3 independently. The last approach computes the spectral norm of the whole weight matrix and uses it to normalize each component. We refer to these methods as No QSN, QSN Split and QSN Full, respectively.
To assess the performance, we build the same SNGANs presented in [25], redefining them in the quaternion domain. We adopt the quaternion core residual blocks defined in the previous section and in Fig. 3, while reducing the model dimension. For CIFAR10, we set up a generator with an initial linear layer of 4 × 4 × 256 and then pile up three quaternion residual blocks, each one with 256 filters. As before, we end with a stack of QBN, split ReLU and a quaternion convolution with a final split Tanh to generate the 32 × 32 images in the range [−1, 1]. The discriminator, in which the QSN methods act in each layer, begins with a first residual block (Fig. 4) with 128 filters and then proceeds with three blocks composed of 128 kernels. As in Fig. 5, the network ends with a global sum pooling and a fully connected layer with a sigmoid to output the decision probability. The QSNGAN so defined for CIFAR10 comprises less than 2M parameters. It is worth noting that the real-valued counterpart presented in [25] has more than 5M free parameters.
The model used to generate the 48 × 48 STL10 images is deeper than the previous one and is composed of 5,545,188 parameters. The structure is the same, but it contains an initial layer of 6 × 6 × 512 and then residual blocks with 256, 128 and 64 filters. The final refining quaternion convolutional layer has 64 kernels. The discriminator, instead, has one residual block more than the model for CIFAR10, and the filters are, respectively from the first to the last block, 64, 128, 256, 512 and 1024, with a final 512-unit fully connected layer with sigmoid.
As we can see in Table 3, the unbounded model with no QSN fails to generate images for both CIFAR10 and STL10. Indeed, its FID is much higher than that of the other approaches. This proves the effectiveness of the proposed QSN Full method, which computes the spectral norm of each layer taking all the components into
Table 3 Summary results for the comparison of the two quaternion spectral normalization methods depicted in Section 3. We consider the SNGAN proposed in [25] as the baseline to define two simple models in the quaternion domain and then test the different QSN approaches. QSN Split refers to the first method, which normalizes the submatrices independently, while QSN Full stands for the normalization of the whole weight matrix together. No QSN is a model without any spectral normalization method. While the latter fails, QSN Full generates better images according to the FID on both datasets.
             FID                  IS
Config       CIFAR10   STL10     CIFAR10          STL10
No QSN       70.312    91.567    4.031 ± 1.327    4.744 ± 0.643
QSN Split    35.417    75.112    4.7128 ± 1.270   4.455 ± 0.092
QSN Full     31.966    59.611    4.317 ± 0.951    4.987 ± 0.485
account. As a matter of fact, the proposed approach is capable of generating higher-quality images in every experiment we conduct.
6 Conclusions
In this paper, we have introduced the family of quaternion-valued GANs (QGANs), which leverage the properties of quaternion algebra. We have rigorously defined each core block employed to build the proposed QGANs, including the quaternion adversarial framework. Moreover, we have provided a meticulous experimental evaluation on different image generation benchmarks to prove the effectiveness of our method. We have shown that the proposed QGAN has an improved generation ability with respect to its real-valued counterpart, according to the FID and IS metrics and to visual inspection. Moreover, our method saves up to 75% of the free parameters. We reckon that these results lay the foundations for novel deep GANs capable of capturing higher levels of input information and better grasping the real data distribution, while significantly reducing the overall number of parameters.
References
1. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint: arXiv:1701.07875v3
(2017)
2. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image
synthesis. Int. Conf. on Learning Representations (ICLR) (2019)
3. Chen, T., Zhai, X., Ritter, M., Lucic, M., Houlsby, N.: Self-supervised GANs via auxiliary
rotation loss. In: IEEE/CVF Int. Conf. on Computer Vision and Pattern Recognition (CVPR),
pp. 12146–12155 (2019)
4. Cheong Took, C., Mandic, D.P.: Augmented second-order statistics of quaternion random
signals. Signal Process. 91(2), 214–224 (2011)
5. Chernov, V.: Discrete orthogonal transforms with data representation in composition algebras.
Proc. Scandinavian Conf. on Image Analysis pp. 357–364 (1995)
6. Comminiello, D., Lella, M., Scardapane, S., Uncini, A.: Quaternion convolutional neural
networks for detection and localization of 3D sound events. In: IEEE Int. Conf. on Acoust.,
Speech and Signal Process. (ICASSP), pp. 8533–8537. Brighton, UK (2019)
7. Ell, T.A., Sangwine, S.J.: Quaternion involutions and anti-involutions. Comput. Math. Appl.
53(1), 137–143 (2007)
8. Gaudet, C., Maida, A.: Deep quaternion networks. In: IEEE Int. Joint Conf. on Neural Netw.
(IJCNN). Rio de Janeiro, Brazil (2018)
9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural net-
works. In: Int. Conf. on artificial intelligence and statistics, pp. 249–256 (2010)
10. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, Y.: Generative adversarial nets. In: 27th Int. Conf. on Neural Information Process-
ing Systems (NIPS), vol. 2, pp. 2672–2680. MIT Press, Cambridge, MA, USA (2014)
11. Gouk, H., Frank, E., Pfahringer, B., Cree, M.J.: Regularisation of neural networks by enforcing
Lipschitz continuity. Mach. Learn. 110(2), 393–416 (2021)
12. Grassucci, E., Comminiello, D., Uncini, A.: A quaternion-valued variational autoencoder. In:
IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP) (2021)
13. Grassucci, E., Scardapane, S., Comminiello, D., Uncini, A.: Flexible generative adversarial
networks with non-parametric activation functions. In: Progress in Artificial Intelligence and
Neural Systems, vol. 184. Smart Innovation, Systems and Technologies, Springer (2021)
14. Gui, J., Sun, Z., Wen, Y., Tao, D., Ye, J.p.: A review on generative adversarial networks:
Algorithms, theory, and applications. arXiv preprint: arXiv:2001.06937v1 (2020)
15. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of
Wasserstein GANs. In: Advances in Neural Information Processing Systems (NIPS) (2017)
16. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level
performance on imagenet classification. In: IEEE/CVF Int. Conf. on Computer Vision and
Pattern Recognition (CVPR), pp. 1026–1034 (2015)
17. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two
time-scale update rule converge to a local Nash equilibrium. In: Neural Information Processing
Systems (NIPS), pp. 6626–6637 (2017)
18. Hoffmann, J., Schmitt, S., Osindero, S., Simonyan, K., Elsen, E.: AlgebraNets. arXiv preprint:
arXiv:2006.07360v2 (2020)
19. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing
internal covariate shift. In: Int. Conf. on Machine Learning (ICML), p. 448–456. JMLR.org
(2015)
20. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality,
stability, and variation. In: Int. Conf. on Learning Representations (ICLR) (2018)
21. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial
networks. In: IEEE Conf. on Computer Vision and Pattern Recognition, CVPR, pp. 4401–4410.
Computer Vision Foundation / IEEE (2019)
22. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving
the image quality of StyleGAN. In: 2020 IEEE/CVF Conf. on Computer Vision and Pattern
Recognition (CVPR), pp. 8107–8116. IEEE (2020)
23. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv Preprint:
arXiv:1312.6114v10 pp. 1–14 (2014)
24. Kurach, K., Lucic, M., Zhai, X., Michalski, M., Gelly, S.: A large-scale study on regularization
and normalization in GANs. In: Int. Conf. on Machine Learning (ICML) (2019)
25. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative
adversarial networks. arXiv preprint: arXiv:1802.05957v1 (2018)
26. Parcollet, T., Morchid, M., Linarès, G.: Quaternion convolutional neural networks for het-
erogeneous image processing. In: IEEE Int. Conf. on Acoust., Speech and Signal Process.
(ICASSP), pp. 8514–8518. Brighton, UK (2019)
27. Parcollet, T., Morchid, M., Linarès, G.: A survey of quaternion neural networks. Artif. Intell.
Rev. (2019)
28. Parcollet, T., Ravanelli, M., Morchid, M., Linarès, G., Trabelsi, C., De Mori, R., Bengio, Y.:
Quaternion recurrent neural networks. In: Int. Conf. on Learning Representations (ICLR), pp.
1–19. New Orleans, LA (2019)
29. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolu-
tional generative adversarial networks. arXiv preprint: arXiv:1511.06434v2 (2016)
30. Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved
techniques for training GANs. In: Neural Information Processing Systems (NIPS), pp. 2234–
2242 (2016)
31. Schmidhuber, J.: A possibility for implementing curiosity and boredom in model-building
neural controllers. In: Proc. of the First Int. Conf. on Simulation of Adaptive Behavior on
From Animals to Animats, pp. 222–227. MIT Press, Cambridge, MA, USA (1991)
32. Schmidhuber, J.: Generative adversarial networks are special cases of artificial curiosity (1990)
and also closely related to predictability minimization (1991). Neural Networks 127, 58–66
(2020)
33. Schönfeld, E., Schiele, B., Khoreva, A.: A U-Net based discriminator for generative adversarial
networks. In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp.
8207–8216 (2020)
34. Vecchi, R., Scardapane, S., Comminiello, D., Uncini, A.: Compressing deep-quaternion neural
networks with targeted regularisation. CAAI Trans. Intell. Technol. 5(3), 172–176 (2020)
35. Vía, J., Ramírez, D., Santamaría, I.: Proper and widely linear processing of quaternion random
vectors. IEEE Trans. Inf. Theory 56(7), 3502–3515 (2010)
36. Ward, J.P.: Quaternions and Cayley Numbers: Algebra and Applications. Mathematics and Its
Applications, vol. 403. Kluwer Academic Publishers (1997)
37. Yin, Q., Wang, J., Luo, X., Zhai, J., Jha, S.K., Shi, Y.: Quaternion convolutional neural network
for color image classification and forensics. IEEE Access 7, 20293–20301 (2019)
38. Zhang, H., Goodfellow, I.J., Metaxas, D.N., Odena, A.: Self-attention generative adversarial
networks. In: Int. Conf. on Machine Learning (ICML), Proceedings of Machine Learning
Research, vol. 97, pp. 7354–7363. PMLR (2019)
39. Zhang, H., Zhang, Z., Odena, A., Lee, H.: Consistency regularization for generative adversarial
networks. In: Int. Conf. on Machine Learning (ICML) (2020)
40. Zhou, Z., Liang, J., Song, Y., Yu, L., Wang, H., Zhang, W., Yu, Y., Zhang, Z.: Lipschitz
generative adversarial nets. In: Int. Conf. on Machine Learning (ICML), Proceedings of
Machine Learning Research, vol. 97, pp. 7584–7593. PMLR (2019)