CYBERNETICS AND PHYSICS, VOL. 10, NO. 3, 2021, 127–137
ARTWORK STYLE TRANSFER MODEL USING DEEP
LEARNING APPROACH
Ngo Le Huy Hien
School of Built Environment, Engineering, and Computing
Leeds Beckett University
England
n.hien2994@student.leedsbeckett.ac.uk
Luu Van Huy
Department of Information and Technology
University of Science and Technology, The University of Danang
Vietnam
luuvanhuy2012@gmail.com
Nguyen Van Hieu*
Department of Information and Technology
University of Science and Technology, The University of Danang
Vietnam
nvhieuqt@dut.udn.vn
Article history:
Received 24.08.2021, Accepted 25.11.2021
Abstract
Art in general, and fine arts in particular, play a significant role in human life, entertaining, dispelling stress, and motivating creativity in specific ways. Many well-known artists have left a rich treasure of paintings for humanity, preserving their exquisite talent and creativity through unique artistic styles. In recent years, a technique called 'style transfer' has allowed computers to apply famous artistic styles to a picture or photograph while retaining the shape of the image, creating superior visual experiences. The basic model of that process, named 'Neural Style Transfer,' was introduced promisingly by Leon A. Gatys; however, it has several limitations in output quality and execution time, making it challenging to apply in practice. Based on that basic model, an image transformation network is proposed in this paper to generate higher-quality artwork and to scale to a larger number of images. The proposed model significantly shortens the execution time and can be deployed in real-time applications, providing promising results and performance. The outcomes are auspicious and can serve as a reference model for color grading or semantic image segmentation, and future research will focus on improving its applications.
Key words
Image Processing, Style Transfer, Image Transformer
Network, Deep Learning, Convolutional Neural Network.
1 Introduction
Nowadays, tremendous efforts have been put into ac-
celerating different techniques to assist computers in
performing human-like tasks such as classification and
communication due to the fast-growing artificial intel-
ligence advancements. Most of the conventional deep
learning techniques contribute to improving productiv-
ity and quality of life, and artworks are one of the
promising areas. Preserving artistic features thereby be-
comes noteworthy for both human duties and technol-
ogy, as they represent the heritage and culture through
time [Doulamis and Varvarigou, 2012]. The first image style transfer algorithm was introduced as an innovation that changes a picture's style while preserving its shape, and it has drawn substantial attention [Gatys et al., 2016b]. The term 'neural style transfer' was introduced to denote a machine learning technique for converting an image's style from one to another by blending a content image with a style reference image, such as artworks from reputed artists. The fundamental purpose is to generate a content-look-alike image that is artificially drawn in the style of the reference image, as illustrated in Figure 1.
Although several researchers have been involved in
this trend, there are still many spaces for developing and
solving the backlog of limitations [Liu et al., 2017; Chen
et al., 2018; Gatys et al., 2016a]. Within the scope of this study, the authors point out the remaining limitations of image transformation models and propose methods to resolve them. Moreover, a web application was implemented to apply the built model, attempting to create a number of exquisite art paintings.
Figure 1. A Ha Long Bay image styled with the Udnie artwork.
The approach of this study is divided into three steps: building a model to generate uncomplicated images and reporting observations on the drawbacks of that model; proposing an image conversion network that enhances the style conversion model based on the initial model; and applying the transformation model to build an artistic image generator website. The proposed model significantly shortens the execution time and can serve as a reference model for real-time applications.
2 Literature Review
In recent years, many researchers have addressed style image transformation problems by training convolutional neural networks with per-pixel loss functions [Selim et al., 2016; Zhao et al., 2020]. A typical example is the 'Neural Style Transfer' model proposed by Leon A. Gatys [Gatys et al., 2016b], which directly uses the content image and reference image for training, without any large training dataset. The model
has produced new images of high perceptual quality that
blend the appearance of famous artworks into the content
of an arbitrary photograph, with insights into deep image
representations. Although this model contains unavoid-
able shortcomings, it is a prerequisite for later studies
on image generation methods [Chen et al., 2018; Jing
et al., 2019]. AdaIN model, for example, was developed
to match the mean-variance of a content image with a
reference image for remodeling features [Huang and Belongie, 2017]. A patch-matching procedure was introduced in the style-swap model [Chen and Schmidt, 2016] for replacing content features with the nearest-matching style features. Li et al. [Li et al., 2017] proposed a multi-level stylization that applies whitening and coloring transforms recursively, improving the output quality and preserving the content structure. Style transfer is also related to other neural-network applications such as text classification, semantic parsing, and information extraction. The deep fake image approach [?], for example, might be considered a solution to comparable facial feature transfer challenges. Moreover, these studies show that there is a trade-off between content and style losses. Therefore, many researchers have also drawn on image super-resolution and image segmentation methods.
Image super-resolution is a classic problem related to
image processing [Bazhanov et al., 2018], and there
have been a number of researchers involved in this area
[Cheng et al., 2019; Ma et al., 2020]. Yang et al. [Yang
et al., 2014] provided a general review of previously
standard techniques when applying convolutional neu-
ral networks to their research. Some other output-quality improvement methods have been suggested, such as the model of Chao Dong et al. [Dong et al., 2015], which takes a low-resolution content image through a convolutional neural network [Andreev and Maksimenko, 2019] and produces a high-resolution image. While their neural network architecture is uncomplicated and lightweight, it exhibits high-quality image recovery and runs quickly enough for practical applications. The above studies are the driving force behind this research to produce transition images of the same high quality as conventional image super-resolution.
Image segmentation methods divide an image into many different regions [Long et al., 2015]. Image segmentation has a similar objective to the object detection problem: detecting the image area containing an object and labeling it appropriately [Noh et al., 2015]. Although image segmentation requires a higher level of detail, in return the algorithm provides an understanding of the image's content at a deeper level. It simultaneously reveals the position of the object in the image, the shape of the object, and which object each pixel belongs to [Zheng et al., 2015]. Such methods generate labels for the regions of the input image by running it through a fully convolutional neural network trained with a per-pixel loss function.
The objective of this research is to develop a model
that can provide smooth transitions and results as sharp
as the study of Chao Dong et al. [Dong et al., 2015]. An image transformation network is also proposed, inspired by the studies of Long et al. [Long et al., 2015] and Noh et al. [Noh et al., 2015], improving the quality of the output image and shortening the transition time. The results are
promising for transferring artwork style to other images,
contributing to the applications in many natural science
fields, including materials science and physics.
3 Standard Image Style Conversion Model
In the standard image style conversion model proposed
by Leon A. Gatys et al. [Gatys et al., 2016b], its input
data consists of a content image and a reference image.
The resulting (target) image is initialized as white noise; a convolutional neural network (CNN) is then applied to transform this white-noise image so that it moves closer to the content and visual style.
As illustrated in Figure 2, this standard model com-
prises 2 steps:
1. Training by a convolutional neural network for ex-
tracting features.
2. Calculating the loss functions of the content image and the reference image to update the target image closer to the content and reference images. The result is obtained when the total loss is minimized.
Figure 2. Standard Conversion Model.
3.1 Convolutional Neural Network Feature Extraction
The standard model applies the VGG-16 pre-trained
neural network to extract the content image and refer-
ence image features. Although the same VGG16 network is used, different features are extracted for content images and reference images. When an image is passed through the convolutional neural network, the higher layers capture the high-level content, that is, the objects and their arrangement in the input image, without preserving exact pixel values. In contrast, the lower layers reproduce the exact pixel values of the original image. Therefore, features extracted from higher layers are considered image content features, while features at lower layers are considered image style features, as shown in Figure 3.
Figure 3. The CNN Network extracts the features of content and style
images.
Based on the study of Leon A. Gatys et al. [Gatys et al., 2016b], the following VGG16 layers were used to extract the features of:

Content image: conv5_2;
Reference image: conv1_1, conv2_1, conv3_1, conv4_1, conv5_1.
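As a hedged illustration of this step, the sketch below collects activations from a pre-trained VGG16 at the layers listed above using PyTorch and torchvision; the mapping from the layer names to torchvision's sequential indices is an assumption of this sketch, not a detail stated in the paper.

```python
import torch
import torchvision.models as models

# Hypothetical mapping from the layer names above to the indices of
# torchvision's VGG16 `features` Sequential (an assumption of this sketch).
CONTENT_LAYERS = {"conv5_2": 26}
STYLE_LAYERS = {"conv1_1": 0, "conv2_1": 5, "conv3_1": 10,
                "conv4_1": 17, "conv5_1": 24}

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the feature extractor stays frozen

def extract_features(image, layer_indices):
    """Run `image` (a 1 x 3 x H x W, ImageNet-normalized tensor) through
    VGG16 and collect the activations at the requested layer indices."""
    feats, x = {}, image
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in layer_indices:
            feats[idx] = x
    return feats
```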
3.2 Building Loss Function
This section calculates the loss functions of the content
image and the reference image to update the target image
closer to the content and reference image, as presented
below.
3.2.1 Loss Function for Recreating Content
To extract content features, the content image is passed through the filters of a convolutional neural network. Each layer $l$ of the CNN with $N_l$ filters generates $N_l$ feature maps of size $M_l$ (where $M_l$ = width $\times$ height of the feature map). Each layer $l$ therefore stores a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, in which $F^l_{ij}$ is the activation of the $i$-th filter at position $j$ in layer $l$. Let $\vec{p}$ and $\vec{x}$ be the input (content) and output (target) images, and $P^l$ and $F^l$ their respective feature representations in layer $l$. The content loss is based on the difference between these feature representations:

$$\mathcal{L}_{content} = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2 \qquad (1)$$
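A minimal PyTorch sketch of Eq. (1) is given below, assuming the two arguments are the feature maps of the target and content images extracted at the chosen layer.

```python
import torch

def content_loss(target_features, content_features):
    """Eq. (1): 0.5 * sum of squared differences between the feature
    maps of the target image (F) and the content image (P) at one layer."""
    return 0.5 * torch.sum((target_features - content_features) ** 2)
```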
3.2.2 Loss Function for Recreating Style
Calculating the style loss function is relatively more complicated, but it follows the same principle. Instead of comparing the intermediate outputs of the content image and reference image directly, a Gram matrix is used to compare the two outputs.

Calculating the Gram matrix: after the target image and the reference image are passed through a convolutional neural network, $n_C$ feature maps of size $n_H \times n_W$ are obtained. To measure the similarity of the two images, the $n_C$ feature maps are taken and the scalar product of two feature vectors (each feature map is flattened into a feature vector) is computed for each corresponding pair of maps. As a result, a Gram matrix of size $n_C \times n_C$ is created, as shown in Figure 4. Given a set of $n_C$ vectors of dimension $n_H \times n_W$, the Gram matrix $G$ is the matrix of all possible inner products:

$$G_{ij} = F_i^{\top} F_j \qquad (2)$$

The Gram matrix determines the vectors $F_i$ up to isometry and indicates the correlation between filters.
Figure 4. The Gram Matrix is created from a target image and a ref-
erence image.
The style features are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$, in which $G^l_{ij}$ is the inner product between the (vectorized) feature maps $i$ and $j$ in layer $l$:

$$G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk} \qquad (3)$$

Let $\vec{a}$ and $\vec{x}$ be the style input image and the target output image, and $A^l$ and $G^l$ their corresponding style representations in layer $l$. The contribution of layer $l$ to the total loss is then:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2 \qquad (4)$$

And the total style loss is:

$$\mathcal{L}_{style} = \sum_{l=0}^{L} w_l E_l \qquad (5)$$
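The Gram matrix and the style loss of Eqs. (3)-(5) can be sketched as follows; the tensors are assumed to follow the usual PyTorch (batch, channels, height, width) convention with a batch of one image, which is an assumption of this illustration rather than a stated implementation detail.

```python
import torch

def gram_matrix(features):
    """Eq. (3): Gram matrix of one layer's feature maps.
    `features` has shape (1, C, H, W); the result has shape (C, C)."""
    _, c, h, w = features.shape
    f = features.view(c, h * w)          # flatten each feature map
    return f @ f.t()

def layer_style_loss(target_features, style_features):
    """Eq. (4): contribution of one layer to the style loss."""
    _, c, h, w = target_features.shape
    n_l, m_l = c, h * w
    g = gram_matrix(target_features)
    a = gram_matrix(style_features)
    return torch.sum((g - a) ** 2) / (4 * n_l ** 2 * m_l ** 2)

def style_loss(target_feats, style_feats, weights):
    """Eq. (5): weighted sum of per-layer contributions."""
    return sum(w * layer_style_loss(t, s)
               for w, t, s in zip(weights, target_feats, style_feats))
```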
3.2.3 Synthesizing the Loss Function
To transfer the style of a work of art $\vec{a}$ onto a photograph $\vec{p}$, a new image is synthesized that simultaneously matches the content representation of $\vec{p}$ and the style representation of $\vec{a}$. Therefore, this study uses an aggregate of the content and style losses:

$$\mathcal{L}_{total} = \alpha \mathcal{L}_{content} + \beta \mathcal{L}_{style} \qquad (6)$$

in which $\alpha$ and $\beta$ are the weights for content reproduction and style reproduction, respectively. The selection of $\alpha$ and $\beta$ also affects the quality of the final output image.
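Putting the pieces together, a minimal optimization loop for the standard model might look like the sketch below. It reuses the helpers sketched earlier; `content_image` and `style_image` are assumed to be preprocessed (ImageNet-normalized) tensors, and the weights, learning rate, and iteration count are illustrative values rather than the paper's exact settings.

```python
import torch

# Illustrative settings only; not the paper's exact configuration.
alpha, beta = 1e3, 1e2
content_layer = CONTENT_LAYERS["conv5_2"]
style_layers = list(STYLE_LAYERS.values())

target = torch.randn(1, 3, 256, 256, requires_grad=True)  # white-noise init
optimizer = torch.optim.Adam([target], lr=0.05)

# Features of the fixed content and style images (assumed preprocessed
# tensors of shape 1 x 3 x 256 x 256).
content_feats = extract_features(content_image, [content_layer])
style_feats = extract_features(style_image, style_layers)

for step in range(300):
    optimizer.zero_grad()
    target_content = extract_features(target, [content_layer])
    target_style = extract_features(target, style_layers)
    l_c = content_loss(target_content[content_layer],
                       content_feats[content_layer])
    l_s = style_loss([target_style[i] for i in style_layers],
                     [style_feats[i] for i in style_layers],
                     weights=[0.2] * len(style_layers))
    total = alpha * l_c + beta * l_s   # Eq. (6)
    total.backward()
    optimizer.step()
```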
3.3 Results
Figure 5. Applying the standard transformation model with the content image of the Golden Bridge in Da Nang and the reference image of the Candy artwork.

In general, the standard model was able to apply the style of the given image to the content image. However, it can be seen from Figure 5 that the quality of the output image remains relatively low, even after a long period of training. The color arrays were changed disorderly in the resulting image, it contains noise, and its resolution is severely reduced. Moreover, on average, a transfer takes 10 to 15 minutes for 100 iterations, and the time increases correspondingly with the number of iterations and the resolution of the input image. This is a comparatively long time to obtain an acceptable style conversion in real practice.

In the next section, the study proposes a new style conversion model, based on the standard conversion model, to improve image generation speed and image quality.
4 Proposed Image Style Conversion Model
From the standard image style conversion model out-
lined above, this section proposes a new image style con-
version model. The concept, architecture, and experi-
ment results of the proposed model are presented below.
4.1 The Concept
The standard conversion model initializes the target image as white noise and then attempts to match it to the content image and reference image, leading to a time-consuming process and low-quality output. Accordingly, the concept of this model is to introduce an image conversion network that learns features similar to those of a content image. When an image needs to be transformed, its features can then be obtained by the model faster and more accurately, thereby reducing the conversion time to the target image, as indicated in Figure 6.
Figure 6. Proposed image style conversion model.
4.2 Model Architecture
4.2.1 Image Transformation Network
The image transformation network is a convolutional neural network with residual blocks, parameterized by weights $W$; it converts the input image $x$ to the output image $\hat{y}$ through the mapping $\hat{y} = f_W(x)$. Each loss function $l_i(\hat{y}, y_i)$ measures the difference between the output image $\hat{y}$ and a target image $y_i$. The image transformation network is trained using stochastic gradient descent to find the weights that minimize a weighted combination of loss functions:

$$W^{*} = \arg\min_{W} \; \mathbf{E}_{x,\{y_i\}} \left[ \sum_{i} \lambda_i \, l_i(f_W(x), y_i) \right] \qquad (7)$$

Using the above loss function, the convolutional neural network detailed in Table 1 was used as the proposed image transformation network.

There are a few points of interest in this image transformation network. Strided convolutions are used for the downsampling and upsampling blocks instead of pooling layers. The network body consists of five residual blocks, following the architecture in the research of Kaiming He et al. [He et al., 2016]. All convolutional layers (outside the residual blocks) are followed by a normalization layer and a ReLU non-linearity, as presented in Figure 7.
4.2.2 The Function of the Image Conversion Network Layers
Downsampling with strided convolutions: the first convolution in the proposed network has stride = 1, but the following two layers have stride = 2. This means that every time the filter is relocated, it shifts by 2 pixels instead of 1, and the output feature map has a size of n/2 × n/2. A convolution layer with stride = 2 therefore halves the input size, which is called downsampling. Because the input is downsampled, each pixel of the subsequent feature maps results from a calculation involving a larger number of pixels of the original input image. This approach allows the kernel filters to access a much larger portion of the original input image without increasing the kernel size. Applying a given conversion pattern to the entire image provides more information for each filter about the original input image, making the conversion network perform better.
Upsampling with strided convolutions: upsampling is the opposite of downsampling. After applying 2 downsampling layers with stride = 2, the image size is reduced to 1/4 of the original. The desired output of the conversion network is a stylized image with the same resolution as the original content image. To achieve this, 2 convolution layers with stride = 1/2 (fractionally strided convolutions) are applied. These layers, which increase the output image size, are called upsampling layers.
Residual blocks [He et al., 2016]: between the downsampling and upsampling layers, there are 5 residual blocks. These are convolutional blocks, but the difference between them and conventional convolution layers is that the block's input directly contributes to its output through a skip connection.
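A minimal residual block matching this description might look as follows; instance normalization is used here to stay consistent with Table 1, which is an interpretation of this sketch rather than a confirmed detail of the authors' implementation.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block: two 3x3 convolutions whose output is added to the
    block's input (skip connection), following He et al. (2016)."""
    def __init__(self, channels=128):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)  # the input contributes directly to the output
```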
Figure 7. Image transformation network architecture.
4.2.3 Constructing the Loss Function
Like the standard model, the proposed model also utilizes the VGG16 pre-trained neural network [Van Hieu and Hien, 2020a; Van Hieu and Hien, 2020b] to measure the loss in the input images' content and art style. However, the content (feature reconstruction) loss is computed at layer relu2_2, and the style reconstruction loss is calculated at layers relu1_2, relu2_2, relu3_3, and relu4_3.
Table 1. Details of the proposed image transformation network (input: 256 × 256 × 3).

Operation        Kernel size   Stride   Feature maps   Padding           Activation
Conv2d           9             1        32             ReflectionPad2d
InstanceNorm2d                          32
Conv2d           3             2        64             ReflectionPad2d
InstanceNorm2d                          64
Conv2d           3             2        128            ReflectionPad2d
InstanceNorm2d                          128
Residual block                          128                              ReLU
Residual block                          128                              ReLU
Residual block                          128                              ReLU
Residual block                          128                              ReLU
Residual block                          128                              ReLU
Upsampling                              64
InstanceNorm2d                          64
Upsampling                              32
InstanceNorm2d                          32
Conv2d           9             1        3              ReflectionPad2d   ReLU
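The layout of Table 1 can be sketched as a PyTorch module as below. The 'stride = 1/2' layers are realized here as nearest-neighbor upsampling followed by a convolution, which is one common interpretation rather than the authors' confirmed choice; the `ResidualBlock` class from the previous sketch is reused, and the final output activation is omitted in this sketch.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel, stride):
    """Reflection-padded convolution followed by instance normalization."""
    return nn.Sequential(
        nn.ReflectionPad2d(kernel // 2),
        nn.Conv2d(in_ch, out_ch, kernel, stride),
        nn.InstanceNorm2d(out_ch, affine=True),
        nn.ReLU(inplace=True),
    )

class UpsampleConv(nn.Module):
    """One way to realize a 'stride = 1/2' layer: nearest-neighbor
    upsampling by 2 followed by a reflection-padded 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.ReflectionPad2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=3),
            nn.InstanceNorm2d(out_ch, affine=True),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class TransformNet(nn.Module):
    """Image transformation network y_hat = f_W(x) following Table 1."""
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            conv_block(3, 32, kernel=9, stride=1),      # 256x256
            conv_block(32, 64, kernel=3, stride=2),     # downsample to 128x128
            conv_block(64, 128, kernel=3, stride=2),    # downsample to 64x64
            *[ResidualBlock(128) for _ in range(5)],    # five residual blocks
            UpsampleConv(128, 64),                      # back to 128x128
            UpsampleConv(64, 32),                       # back to 256x256
            nn.ReflectionPad2d(4),
            nn.Conv2d(32, 3, kernel_size=9, stride=1),  # final 9x9 convolution
        )

    def forward(self, x):
        return self.model(x)
```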
Feature Reconstruction Loss: instead of forcing the pixels of the output image $\hat{y} = f_W(x)$ to exactly match the pixels of the target image, they are encouraged to have similar feature representations as computed by the loss network $\phi$. Let $\phi_j(x)$ be the activations of the $j$-th layer of the network $\phi$ when processing image $x$; if $j$ is a convolutional layer, $\phi_j(x)$ is a feature map of shape $C_j \times H_j \times W_j$. The feature reconstruction loss is the normalized, squared Euclidean distance between the feature representations:

$$\ell^{\phi}_{feat}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \left\| \phi_j(\hat{y}) - \phi_j(y) \right\|_2^2 \qquad (8)$$

Style Reconstruction Loss: as in the standard model, the idea of the style loss is to compare the Gram matrices of the feature outputs. The Gram matrix $G^{\phi}_j(x)$ is a $C_j \times C_j$ matrix whose elements are:

$$G^{\phi}_j(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c}\, \phi_j(x)_{h,w,c'} \qquad (9)$$

The style loss is then the squared Frobenius norm of the difference between the Gram matrices of the output image and the target image:

$$\ell^{\phi}_{style}(\hat{y}, y) = \left\| G^{\phi}_j(\hat{y}) - G^{\phi}_j(y) \right\|_F^2 \qquad (10)$$

To perform style reconstruction from a set of layers $J$ instead of a single layer $j$, $\ell^{\phi,J}_{style}(\hat{y}, y)$ is defined as the sum of the losses over all $j \in J$:

$$\ell^{\phi,J}_{style}(\hat{y}, y) = \sum_{j \in J} \ell^{\phi,j}_{style}(\hat{y}, y) \qquad (11)$$
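The sketch below gives hedged PyTorch versions of Eqs. (8)-(11); the feature maps are assumed to come from the frozen VGG16 loss network described earlier.

```python
import torch

def feature_reconstruction_loss(phi_y_hat, phi_y):
    """Eq. (8): normalized squared Euclidean distance between the
    loss-network activations of the output and the content target."""
    c, h, w = phi_y_hat.shape[1:]
    return torch.sum((phi_y_hat - phi_y) ** 2) / (c * h * w)

def gram(phi_x):
    """Eq. (9): normalized Gram matrix of a (B, C, H, W) feature map."""
    b, c, h, w = phi_x.shape
    f = phi_x.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_reconstruction_loss(phi_y_hat, phi_y):
    """Eq. (10): squared Frobenius norm of the Gram-matrix difference."""
    return torch.sum((gram(phi_y_hat) - gram(phi_y)) ** 2)

def total_style_loss(phis_y_hat, phis_y):
    """Eq. (11): sum of the per-layer style losses over the chosen layers."""
    return sum(style_reconstruction_loss(a, b)
               for a, b in zip(phis_y_hat, phis_y))
```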
4.3 Results
Figure 8. Proposed style conversion model from the content image of
Golden Bridge in Da Nang and the reference image of Candy artwork.
Figure 9. Comparison of results generated by the Standard transfor-
mation model (top) and Proposed style conversion model (bottom).
It can clearly be seen from Figures 8 and 9 that the improved conversion network of the proposed model performs significantly better than the standard model in general. The improved conversion is many times faster than the basic transfer model, and small images can even be processed in real time. The artwork generated by the proposed model is also more artistic and sharper than that of the first model. Moreover, once the proposed model has been trained on a style, it can apply that style to any content image in later usage.
5 Experiments and Results
5.1 Dataset Preparation and Configuration
In this study, the dataset used for implementation includes the Flickr8k dataset (8,100 images, 1 GB) and the Microsoft COCO dataset (80,000 images, 13 GB). The content of these image datasets is mostly scenes of diversified environments such as mountains, rivers, and trees, as illustrated in Figure 10. Given the diversity of these datasets, the model was expected to be able to style any content image, even one it has never been trained on.
Figure 10. Flickr8k Dataset.
The Flickr8k dataset was trained on a Google Colab GPU (Tesla K80), while the Microsoft COCO dataset was trained on a personal computer configured with an Intel Core i7-4700MQ. The standard conversion model was configured with the loss function shown in Figure 11 and the following characteristics:
Content weight: 1e3 (the loss weight of the content);
Style weight: 1e2 (the loss weight of the style);
Content image: golden_bridge.jpg (the content image prepared to be styled);
Style image: candy.jpg (the reference image).
And the proposed conversion model was configured with the loss function shown in Figure 12 and the following characteristics:

Content weight: 1e5;
Style weight: 5e10;
Style image: candy.jpg;
Learning rate: 1e-3 (the learning rate of the neural network).
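For illustration, the sketch below wires these hyperparameters into a single training step using the `TransformNet` and the loss sketches from Section 4; the optimizer choice, the batch handling, and the precomputed style Gram targets (`style_gram_targets`) are assumptions made for this example, not details stated in the paper.

```python
import torch

# Hyperparameters taken from the configuration listed above; the learning
# rate is interpreted here as 1e-3.
CONTENT_WEIGHT, STYLE_WEIGHT, LR = 1e5, 5e10, 1e-3

transformer = TransformNet()                     # from the Section 4 sketch
optimizer = torch.optim.Adam(transformer.parameters(), lr=LR)

def train_step(content_batch, style_gram_targets, content_layer, style_layers):
    """One training step. `style_gram_targets` is assumed to be a dict of
    precomputed Gram matrices of the style image at `style_layers` (a list
    of loss-network layer indices)."""
    optimizer.zero_grad()
    y_hat = transformer(content_batch)

    # Activations of the stylized output and of the content batch, obtained
    # from the frozen VGG16 loss network sketched earlier.
    feats_y_hat = extract_features(y_hat, style_layers + [content_layer])
    feats_y = extract_features(content_batch, [content_layer])

    l_content = feature_reconstruction_loss(
        feats_y_hat[content_layer], feats_y[content_layer])
    l_style = sum(
        torch.sum((gram(feats_y_hat[j]) - style_gram_targets[j]) ** 2)
        for j in style_layers)

    loss = CONTENT_WEIGHT * l_content + STYLE_WEIGHT * l_style
    loss.backward()
    optimizer.step()
    return loss.item()
```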
Figure 11. Loss function diagram of the standard loop-based conver-
sion model.
Figure 12. Loss function diagram of the proposed loop-based con-
version model.
5.2 Results
Figures 13, 14 and Table 2 showcase the results and comparisons of the models across different criteria, from which the advantages and disadvantages of each model can be concluded. Traditional metrics were used to evaluate output image quality: the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [Wang et al., 2004], both of which represent the human view of image quality [Hanhart et al., 2013; Huynh-Thu and Ghanbari, 2008; Sheikh et al., 2006]. The goal of these analyses was not to achieve the best PSNR or SSIM results but simply to show the differences in output image quality and in the loss of characteristics from the original image between the different trained models.
5.3 Discussion
It can be noticed from Figure 14 that the PSNR of the images from the standard and proposed models is relatively close; sometimes the standard model is even higher than the proposed one (the higher the PSNR, the lower the noise). However, this index does not precisely determine the quality of the output image. Specifically, for images from the standard model the noise is spread over the whole picture, while for the proposed model it appears mainly in the picture's unimportant parts; therefore, the image quality of the proposed model was still better than that of the standard model. Furthermore, the proposed model's output always retains the main characteristics of the original image, performing much better than the standard model. This is reflected in the SSIM of the proposed model always being higher than that of the standard one.
Figure 15. With the same content and reference image, the proposed style conversion model (bottom right) provides better quality than the standard conversion model (bottom left).
Figure 15 shows that, with the same content and reference image, the proposed model produces considerably better image quality than the standard model, which demonstrates the effectiveness of using an additional conversion network.
Figure 16. The proposed model trained on the Flickr8k dataset (bottom left) and the COCO dataset (bottom right), using a content image of the homeland of Vietnam and the Candy artwork as the reference image.
Figure 16 shows that, for the proposed model, training on a larger dataset gives superior results compared to training on the smaller dataset (the Microsoft COCO dataset is about 10 times larger than the Flickr8k dataset). This once again confirms the benefit of training the proposed image conversion network on a larger dataset.
Figure 13. Some of the artwork results produced by the proposed model.
Figure 14. Comparison of PSNR / SSIM of the images generated by the models (shown under each image).
Table 2. Comparison of the execution times of the models.

Image size (pixels)   Standard Model   Proposed Model on Flickr8k Dataset   Proposed Model on Microsoft COCO Dataset
256 × 256             5 minutes        0.6 seconds                          0.6 seconds
512 × 512             15 minutes       5 seconds                            5 seconds
1024 × 1024           31 minutes       30 seconds                           30 seconds
5.4 Artwork Generator Website
To create exquisite artistic paintings and demonstrate that the proposed image transformation model can be applied in practice, an artistic photo generator website was built with the following main functions.

Art exhibitions (Figure 17): view pictures and collections of famous artists, exhibition events, and related information.

Artwork Generator (Figure 18): transforms images from exhibited artworks or from images uploaded from user devices.
Figure 17. Layout of the artwork generator website.
Figure 18. Artistic Image Generator Website – several style-
transformed images.
6 Conclusion and Future Work
In this paper, a new image conversion model was proposed and compared with a previous conversion architecture, using a convolutional neural network for the transformation and a pre-trained VGG16 model for measuring the loss in the input images' content and style. Despite the substantial efforts made in the past ([Huang and Belongie, 2017; Chen and Schmidt, 2016; Bourached et al., 2021]), this research introduces a high-efficiency model for large-scale, time-efficient image style transfer. The proposed model generates a remarkable improvement in output image quality, performing much better than the standard model while retaining the main characteristics of the original image, as reflected in both the PSNR and SSIM indices. Moreover, it reduces the execution time by more than 90%, which is significant for real-time use. These study outcomes open up new avenues for future research and can serve as a crucial reference for future image style transfer systems. The new image conversion model can be applied not only in the arts but also in various areas such as geotechnics, civil engineering, materials science, and physics.
Future work may gear towards improving applications of the model, such as the ability to learn multiple artwork styles in a single training session or the flexibility to adjust the degree of transformation. The proposed image conversion model is also promising for application in color grading or semantic image segmentation.
7 Acknowledgement
This research was funded and implemented for the
Mercury project of Est Rouge Technologies JSC, Viet-
nam. This work was also supported by the People’s
Committee of Danang and The University of Danang,
Vietnam.
References
Andreev, A. and Maksimenko, V. (2019). Synchroniza-
tion in coupled neural network with inhibitory cou-
pling. Cybernetics and Physics,8(4), pp. 199–204.
Bazhanov, P., Kotina, E., Ovsyannikov, D., and Ploskikh,
V. (2018). Optimization algorithm of the velocity field
determining in image processing. Cybernetics and
Physics,7(4), pp. 174–181.
Bourached, A., Cann, G., Griffiths, R.-R., and Stork,
D. G. (2021). Recovery of underdrawings and ghost-
paintings via style transfer by deep convolutional neu-
ral networks: A digital tool for art scholars. arXiv
preprint arXiv:2101.10807.
Chen, D., Yuan, L., Liao, J., Yu, N., and Hua, G. (2018).
Stereoscopic neural style transfer. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 6654–6663.
Chen, T. Q. and Schmidt, M. (2016). Fast patch-
based style transfer of arbitrary style. arXiv preprint
arXiv:1612.04337.
Cheng, M.-M., Liu, X.-C., Wang, J., Lu, S.-P., Lai, Y.-
K., and Rosin, P. L. (2019). Structure-preserving neu-
ral style transfer. IEEE Transactions on Image Pro-
cessing,29, pp. 909–920.
Dong, C., Loy, C. C., He, K., and Tang, X. (2015).
Image super-resolution using deep convolutional net-
works. IEEE transactions on pattern analysis and ma-
chine intelligence,38 (2), pp. 295–307.
Doulamis, A. and Varvarigou, T. (2012). Image analy-
sis for artistic style identification: A powerful tool for
preserving cultural heritage. Emerging technologies in
non-destructive testing V,71.
Gatys, L. A., Bethge, M., Hertzmann, A., and Shecht-
man, E. (2016a). Preserving color in neural artistic
style transfer. arXiv preprint arXiv:1606.05897.
Gatys, L. A., Ecker, A. S., and Bethge, M. (2016b). Im-
age style transfer using convolutional neural networks.
In Proceedings of the IEEE conference on computer
vision and pattern recognition, pp. 2414–2423.
Hanhart, P., Korshunov, P., and Ebrahimi, T. (2013).
Benchmarking of quality metrics on ultra-high defini-
tion video sequences. In 2013 18th International Con-
ference on Digital Signal Processing (DSP), IEEE, pp.
1–8.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
residual learning for image recognition. In Proceed-
ings of the IEEE conference on computer vision and
pattern recognition, pp. 770–778.
Huang, X. and Belongie, S. (2017). Arbitrary style trans-
fer in real-time with adaptive instance normalization.
pp. 1501–1510.
Huynh-Thu, Q. and Ghanbari, M. (2008). Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44 (13), pp. 800–801.
Jing, Y., Yang, Y., Feng, Z., Ye, J., Yu, Y., and Song, M.
(2019). Neural style transfer: A review. IEEE transac-
tions on visualization and computer graphics,26 (11),
pp. 3365–3385.
Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., and Yang,
M.-H. (2017). Universal style transfer via feature
transforms. arXiv preprint arXiv:1705.08086.
Liu, X.-C., Cheng, M.-M., Lai, Y.-K., and Rosin, P. L.
(2017). Depth-aware neural style transfer. In Proceed-
ings of the Symposium on Non-Photorealistic Anima-
tion and Rendering, pp. 1–10.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully
convolutional networks for semantic segmentation. pp.
3431–3440.
Ma, W., Chen, Z., and Ji, C. (2020). Block shuffle: A
method for high-resolution fast style transfer with lim-
ited memory. IEEE Access,8, pp. 158056–158066.
Noh, H., Hong, S., and Han, B. (2015). Learning decon-
volution network for semantic segmentation. In Pro-
ceedings of the IEEE international conference on com-
puter vision, pp. 1520–1528.
Selim, A., Elgharib, M., and Doyle, L. (2016). Paint-
ing style transfer for head portraits using convolu-
tional neural networks. ACM Transactions on Graph-
ics (ToG),35 (4), pp. 1–18.
Sheikh, H. R., Sabir, M. F., and Bovik, A. C. (2006).
A statistical evaluation of recent full reference image
quality assessment algorithms. IEEE Transactions on
image processing,15 (11), pp. 3440–3451.
Van Hieu, N. and Hien, N. L. H. (2020a). Automatic
plant image identification of vietnamese species using
deep learning models. International Journal of Engi-
neering Trends and Technology,68 (4), pp. 25–31.
Van Hieu, N. and Hien, N. L. H. (2020b). Recognition of
plant species using deep convolutional feature extrac-
tion. International Journal on Emerging Technologies,
11, pp. 904–910.
Wang, Z. and Bovik, A. C. (2009). Mean squared er-
ror: Love it or leave it? a new look at signal fidelity
measures. IEEE signal processing magazine,26 (1),
pp. 98–117.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli,
E. P. (2004). Image quality assessment: from error
visibility to structural similarity. IEEE transactions on
image processing,13 (4), pp. 600–612.
Yang, C.-Y., Ma, C., and Yang, M.-H. (2014). Single-
image super-resolution: A benchmark. In European
conference on computer vision, Springer, pp. 372–386.
Zhao, H.-H., Rosin, P. L., Lai, Y.-K., and Wang, Y.-N.
(2020). Automatic semantic style transfer using deep
convolutional neural networks and soft masks. The Vi-
sual Computer,36 (7), pp. 1307–1324.
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vi-
neet, V., Su, Z., Du, D., Huang, C., and Torr, P. H.
(2015). Conditional random fields as recurrent neural
networks. In Proceedings of the IEEE international
conference on computer vision, pp. 1529–1537.