All content in this area was uploaded by Ngo Le Huy Hien on Dec 01, 2021
CYBERNETICS AND PHYSICS, VOL. 10, NO. 3, 2021, 127–137
ARTWORK STYLE TRANSFER MODEL USING DEEP
LEARNING APPROACH
Ngo Le Huy Hien
School of Built Environment, Engineering, and Computing
Leeds Beckett University
England
n.hien2994@student.leedsbeckett.ac.uk
Luu Van Huy
Department of Information and Technology
University of Science and Technology, The University of Danang
Vietnam
luuvanhuy2012@gmail.com
Nguyen Van Hieu*
Department of Information and Technology
University of Science and Technology, The University of Danang
Vietnam
nvhieuqt@dut.udn.vn
Article history:
Received 24.08.2021, Accepted 25.11.2021
Abstract
Art in general, and fine art in particular, plays a significant
role in human life: it entertains, relieves stress, and
stimulates creativity. Many well-known artists have left
humanity a rich treasure of paintings, preserving their talent
and creativity through unique artistic styles. In recent years,
a technique called 'style transfer' has allowed computers to
apply famous artistic styles to a picture or photograph while
retaining the shapes in the image, creating striking visual
experiences. The basic model of that process, named 'Neural
Style Transfer', was introduced by Leon A. Gatys; however, it
has several limitations in output quality and execution time,
making it challenging to apply in practice. Based on that basic
model, this paper proposes an image transformation network that
generates higher-quality artwork and can process a larger number
of images. The proposed model significantly shortens the
execution time and can be used in real-time applications,
providing promising results and performance. The outcomes can
serve as a reference model for color grading or semantic image
segmentation, and future research will focus on improving its
applications.
Key words
Image Processing, Style Transfer, Image Transformer
Network, Deep Learning, Convolution Neural Network.
1 Introduction
Driven by fast-growing advances in artificial intelligence,
tremendous effort has been put into techniques that help
computers perform human-like tasks such as classification and
communication. Most conventional deep learning techniques
contribute to improving productivity and quality of life, and
artwork is one promising application area. Preserving artistic
features is noteworthy for both people and technology, as these
features represent heritage and culture through time [Doulamis
and Varvarigou, 2012]. The first image transformation algorithm
was introduced as an innovation that changes a picture's style
while preserving its shape, and it has drawn substantial
attention [Gatys et al., 2016b]. The term 'neural style
transfer' denotes a machine learning technique that converts an
image's style by blending a content image with a style reference
image, such as an artwork by a renowned artist. The fundamental
purpose is to generate an image that resembles the content image
but appears to be drawn in the style of the reference image, as
illustrated in Figure 1.
Although several researchers have worked on this topic, there is
still much room for development and for resolving the remaining
limitations [Liu et al., 2017; Chen et al., 2018; Gatys et al.,
2016a]. In this study, the authors point out the remaining
limitations of image transformation models and propose methods
to resolve them. Moreover, a web application was implemented to
apply the built model and to create a number of art paintings.
The approach
Figure 1. Ha Long Bay image rendered in the style of the Udnie artwork.
of this study is divided into 3 steps: building a baseline model
to generate simple images and noting its drawbacks; proposing an
image conversion network that improves on the baseline style
conversion model; and applying the trained model in an artistic
image generator website. The proposed model significantly
shortened the execution time and can be applied as a reference
model in real-time applications.
2 Literature Review
In recent years, many researchers have addressed image style
transformation by training convolutional neural networks with
per-pixel loss functions [Selim et al., 2016; Zhao et al., 2020].
A typical example is the 'Neural Style Transfer' model proposed
by Leon A. Gatys [Gatys et al., 2016b], which uses only the
content image and the reference image, without requiring a large
training dataset. The model produces new images of high
perceptual quality that blend the appearance of famous artworks
with the content of an arbitrary photograph, offering insights
into deep image representations. Although this model has
shortcomings, it is a prerequisite for later studies on image
generation methods [Chen et al., 2018; Jing et al., 2019]. The
AdaIN model, for example, was developed to match the mean and
variance of content image features with those of a reference
image [Huang and Belongie, 2017]. A patch-matching procedure was
introduced in the style swap model [Chen and Schmidt, 2016] to
replace content features with the nearest matching style
features. Li et al. [Li et al., 2017] proposed a multilevel
stylization that applies whitening and coloring transforms
recursively, improving the output quality and preserving the
content structure. The style transfer problem is also related to
other neural network applications, such as text classification,
semantic parsing, and information extraction. The deep fake
image approach [?], for example, might be considered a solution
to comparable facial feature transfer challenges. Moreover, it
can be seen from these studies that there is a trade-off between
content and style losses. Therefore, many researchers have
considered image super-resolution and image segmentation
methods.
Image super-resolution is a classic image processing problem
[Bazhanov et al., 2018], and a number of researchers have worked
in this area [Cheng et al., 2019; Ma et al., 2020]. Yang et al.
[Yang et al., 2014] provided a general review of the techniques
that were standard before convolutional neural networks were
widely adopted. Other methods for improving output quality were
suggested, such as the model of Chao Dong et al. [Dong et al.,
2015], which passes a low-resolution image through a
convolutional neural network [Andreev and Maksimenko, 2019] and
produces a high-resolution image. Although their network
architecture is simple and lightweight, it achieves high-quality
image recovery and runs quickly enough for practical
applications. These studies motivate this research to produce
transferred images of the same high quality as conventional
image super-resolution.
Image segmentation methods divide an image into many different
regions [Long et al., 2015]. Image segmentation shares an
objective with object detection: finding the image region that
contains each object and labeling it appropriately [Noh et al.,
2015]. Although image segmentation requires a higher level of
detail, in return the algorithm gives a deeper understanding of
the image's content. It reveals the position of each object in
the image, the shape of the object, and the object to which each
pixel belongs [Zheng et al., 2015]. This method generates labels
for image regions by running the input image through a fully
convolutional neural network trained with a per-pixel loss
function.
The objective of this research is to develop a model that
provides smooth transitions and results as sharp as those of
Chao Dong et al. [Dong et al., 2015]. It also proposes an image
transformation network inspired by the studies of Long J. [Long
et al., 2015] and Noh H. [Noh et al., 2015], improving the
quality of the output image and shortening the transition time.
The results are promising for transferring artwork style to
other images, contributing to applications in many natural
science fields, including materials science and physics.
3 Standard Image Style Conversion Model
In the standard image style conversion model proposed by Leon A.
Gatys et al. [Gatys et al., 2016b], the input data consists of a
content image and a reference image. The resulting (target)
image is initialized as white noise, and a convolutional neural
network (CNN) is then used to bring the white noise image closer
to the content of the content image and the visual style of the
reference image.
As illustrated in Figure 2, this standard model com-
prises 2 steps:
1. Extracting features with a convolutional neural network.
2. Calculating the loss functions with respect to the content
image and the reference image, and updating the target image to
bring it closer to both. The result is obtained when the total
loss reaches its minimum.
Figure 2. Standard Conversion Model.
3.1 Convolutional Neural Network Feature Extraction
The standard model applies the pre-trained VGG16 neural network
to extract the features of the content image and the reference
image. Although the same VGG16 network is used, different
features are extracted from content images and reference images.
When an image passes through the convolutional neural network,
the higher layers capture high-level content, namely the objects
in the input image and their arrangement, without preserving
exact pixel values. In contrast, the lower layers reproduce the
exact pixel values of the original image. Therefore, features
extracted from higher layers are considered image content
features, while features at lower layers are considered image
style features, as shown in Figure 3.
Figure 3. The CNN Network extracts the features of content and style
images.
Based on the study of Leon A. Gatys et al. [Gatys et al.,
2016b], the following layers of the VGG16 network were used to
extract features:
−Content image: conv5_2;
−Reference image: conv1_1, conv2_1, conv3_1, conv4_1, conv5_1.
3.2 Building Loss Function
This section calculates the loss functions of the content
image and the reference image to update the target image
closer to the content and reference image, as presented
below.
3.2.1 Loss Function for Recreating Content. To extract content
features, the content image was passed through the filters of a
convolutional neural network. Each layer $l$ of the CNN with
$N_l$ filters generates $N_l$ feature maps of size $M_l$
($M_l$ = width × height of the feature map). Therefore, each
layer $l$ stores a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$,
in which $F^l_{ij}$ is the activation of the $i$-th filter at
position $j$ in layer $l$. Let $\vec{p}$ and $\vec{x}$ be the
input and output images, and let $P^l$ and $F^l$ be their
respective feature representations in layer $l$. The loss
function based on the content features of the input and output
images is:
$$L_{content} = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2 \quad (1)$$
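As a minimal sketch of Eq. (1), assuming the layer features are already available as NumPy arrays of shape $(N_l, M_l)$ (the function name `content_loss` and the toy numbers are illustrative only, not taken from the paper's implementation):

```python
import numpy as np

def content_loss(F, P):
    """Eq. (1): half the sum of squared differences between two feature matrices.

    F, P: arrays of shape (N_l, M_l) holding the layer-l features of the
    output (generated) image and of the content image, respectively.
    """
    F = np.asarray(F, dtype=np.float64)
    P = np.asarray(P, dtype=np.float64)
    if F.shape != P.shape:
        raise ValueError("feature matrices must have the same shape")
    return 0.5 * np.sum((F - P) ** 2)

# Toy features: N_l = 2 filters, M_l = 3 spatial positions (made-up numbers).
F_demo = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
P_demo = np.array([[1.0, 0.0, 3.0], [0.0, 1.0, 2.0]])
demo_content_loss = content_loss(F_demo, P_demo)
```

The loss is zero only when the two feature matrices coincide, which is what drives the output image toward the content image in feature space.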
3.2.2 Loss Function For Recreating Style. Calculating the style
loss function is more complicated, although it follows the same
principle. Instead of comparing the intermediate outputs of the
images directly, Gram matrices are used to compare the two
outputs.
Calculating the Gram matrix: After the target image and the
reference image pass through a convolutional neural network,
$n_C$ feature maps of dimensions $n_H \times n_W$ are obtained
for each image. To measure the similarity of the 2 images, each
feature map is flattened to a feature vector, and the scalar
product is computed for every pair of feature vectors. As a
result, a Gram matrix with $n_C \times n_C$ dimensions is
created, as shown in Figure 4. Given a set of $n_C$ vectors of
length $n_H \times n_W$, the Gram matrix $G$ is the matrix of
all possible inner products of these vectors:
$$G_{ij} = F_i^{\top} F_j \quad (2)$$
The Gram matrix determines the vectors $F_i$ up to isometry and
indicates the correlation between filters.
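The construction in Eq. (2) can be sketched in a few lines of NumPy. The array layout `(n_C, n_H, n_W)` and the name `gram_matrix` are assumptions made for this illustration; the paper does not specify its implementation.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Flatten n_C feature maps of size n_H x n_W and return the n_C x n_C Gram matrix."""
    fm = np.asarray(feature_maps, dtype=np.float64)
    n_c = fm.shape[0]
    F = fm.reshape(n_c, -1)   # row i is the flattened feature map F_i
    return F @ F.T            # G[i, j] = inner product of F_i and F_j

# Three made-up 2 x 2 feature maps (n_C = 3, n_H = n_W = 2).
maps = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [0.0, 0.0]],
])
G_demo = gram_matrix(maps)
```

Because flattening discards the spatial arrangement, the Gram matrix records which filters respond together rather than where they respond, which is why it serves as a style descriptor.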
Figure 4. The Gram Matrix is created from a target image and a ref-
erence image.
The style features are given by the Gram matrix
$G^l \in \mathbb{R}^{N_l \times N_l}$, in which $G^l_{ij}$ is
the inner product between the vectorized feature maps $i$ and
$j$ in layer $l$:
$$G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk} \quad (3)$$
Let $\vec{a}$ and $\vec{x}$ be the reference image and the
target output image, and let $A^l$ and $G^l$ be their
corresponding style representations in layer $l$. Then, the
contribution of layer $l$ to the total loss is:
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2 \quad (4)$$
And the total style loss is:
$$L_{style} = \sum_{l=0}^{L} w_l E_l \quad (5)$$
where $w_l$ weights the contribution of layer $l$.
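A hedged sketch of Eqs. (3) to (5) follows, assuming each layer's features are given as a NumPy array of shape $(N_l, M_l)$. The helper names and the toy inputs are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def gram(F):
    """Eq. (3): G[i, j] = sum_k F[i, k] * F[j, k]."""
    return F @ F.T

def layer_style_loss(F, A_feats):
    """Eq. (4): E_l for output features F and reference features A_feats, both (N_l, M_l)."""
    n_l, m_l = F.shape
    diff = gram(F) - gram(A_feats)
    return np.sum(diff ** 2) / (4.0 * n_l ** 2 * m_l ** 2)

def style_loss(out_feats, ref_feats, weights):
    """Eq. (5): weighted sum of the per-layer contributions E_l."""
    if not (len(out_feats) == len(ref_feats) == len(weights)):
        raise ValueError("one feature matrix and one weight per layer are required")
    return sum(w * layer_style_loss(F, A)
               for F, A, w in zip(out_feats, ref_feats, weights))

# Two toy "layers" with identical made-up features and equal weights.
F_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
A_toy = np.array([[1.0, 1.0], [0.0, 0.0]])
demo_layer = layer_style_loss(F_toy, A_toy)
demo_style = style_loss([F_toy, F_toy], [A_toy, A_toy], [0.5, 0.5])
```

The factor $1/(4 N_l^2 M_l^2)$ keeps layers with many filters or large feature maps from dominating the sum.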
3.2.3 Synthesizing the Loss Function. To transfer the style of
an artwork $\vec{a}$ to an image $\vec{p}$, a new image must be
synthesized that simultaneously matches the content
representation of $\vec{p}$ and the style representation of
$\vec{a}$. Therefore, this study uses a combined content and
style loss function:
$$L_{total} = \alpha L_{content} + \beta L_{style} \quad (6)$$
in which $\alpha$ and $\beta$ are the weights for content
reproduction and style reproduction, respectively. The selection
of $\alpha$ and $\beta$ also affects the quality of the final
output image.
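To make the iterative update concrete, the following toy sketch minimizes Eq. (6) by plain gradient descent, starting from white noise. It is heavily simplified and rests on assumptions that are not in the paper: the feature extractor is replaced by the identity (the "image" is itself a $3 \times 16$ feature matrix), a single layer is used, and the values of `alpha`, `beta`, and `lr` are arbitrary.

```python
import numpy as np

def total_loss_and_grad(X, P, A, alpha, beta):
    """Total loss of Eq. (6) and its gradient with respect to the features X.

    Assumption: features are the image itself (identity extractor), one layer only.
    """
    n, m = X.shape
    D = X @ X.T - A                                   # Gram difference G - A
    l_content = 0.5 * np.sum((X - P) ** 2)            # Eq. (1)
    l_style = np.sum(D ** 2) / (4.0 * n**2 * m**2)    # Eq. (4)
    # d(l_content)/dX = X - P ; d(l_style)/dX = (G - A) X / (N^2 M^2)
    grad = alpha * (X - P) + beta * (D @ X) / (n**2 * m**2)
    return alpha * l_content + beta * l_style, grad

rng = np.random.default_rng(0)
P = rng.uniform(size=(3, 16))     # stand-in content features
S = rng.uniform(size=(3, 16))     # stand-in reference (style) features
A = S @ S.T                       # reference Gram matrix
X = rng.uniform(size=(3, 16))     # target starts as white noise

alpha, beta, lr = 1.0, 100.0, 0.1  # illustrative weights and step size
history = []
for _ in range(200):
    loss, grad = total_loss_and_grad(X, P, A, alpha, beta)
    history.append(loss)
    X = X - lr * grad             # move the target image downhill
```

In the actual model the gradient is propagated back through the VGG16 layers to the pixels, which is what makes each iteration expensive.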
3.3 Results
Figure 5. Applying Standard Transformation Model with the content
image of Golden Bridge in Da Nang and the reference image of Candy
Artwork.
In general, the standard model was able to apply the style of
the reference image to the content image. However, it can be
seen from Figure 5 that the quality of the output image remains
relatively low, even after a long optimization. The color
regions in the resulting image were changed in a disorderly way,
the image contains noise, and its resolution was severely
reduced. Moreover, a transfer takes 10 to 15 minutes on average
for 100 iterations, and this time increases with the number of
iterations and the resolution of the input image. This is too
long for style conversion in practical applications.
In the upcoming section, the study proposes a new style
conversion model based on the standard style conversion model to
improve image generation speed and image quality.
4 Proposed Image Style Conversion Model
From the standard image style conversion model out-
lined above, this section proposes a new image style con-
version model. The concept, architecture, and experi-
ment results of the proposed model are presented below.
4.1 The Concept
The standard conversion model initializes the target image as
white noise and then iteratively adjusts it to match the content
image and the reference image, which is time-consuming and
produces low-quality output. Accordingly, the concept of the
proposed model is an image conversion network that is able to
learn features similar to those of a content image. Given an
image that needs to be transformed, the model can obtain its
features faster and more accurately, thereby reducing the time
needed to produce the target image, as indicated in Figure 6.
Figure 6. Proposed image style conversion model. Diagram elements: content image, reference image, image transformation network, target image, VGG16 (content features and reference features), content loss function, style loss function.
4.2 Model Architecture
4.2.1 Image Transformation Network. The image transformation
network is a convolutional neural network with residual blocks,
parameterized by weights $W$; it converts the input image $x$ to
the output image $\hat{y}$ through the mapping
$\hat{y} = f_W(x)$. Each loss function $l_i(\hat{y}, y_i)$
measures the difference between the output image $\hat{y}$ and a
target image $y_i$. The image transformation network was trained
using stochastic gradient descent to find the weights that
minimize a weighted combination of the loss functions:
$$W^{*} = \arg\min_{W} \mathbb{E}_{x,\{y_i\}} \left[ \sum_{i=1} \lambda_i \, l_i\left( f_W(x), y_i \right) \right] \quad (7)$$
With this objective, the layers of the convolutional neural
network used in the proposed image transformation network are
presented in Table 1.
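The training objective in Eq. (7) can be illustrated with a deliberately tiny example. Everything here is an assumption made for illustration: the "network" `f_w` is a single scalar gain, there is one loss term with $\lambda_1 = 1$ (a mean squared error), and the targets are produced by the ideal mapping $y = 2x$.

```python
import numpy as np

def f_w(w, x):
    """Toy 'transformation network': a single scalar gain applied to the input."""
    return w * x

def loss_and_grad(w, x, y):
    """Mean squared error between f_w(x) and the target y, and its derivative in w."""
    residual = f_w(w, x) - y
    return np.mean(residual ** 2), 2.0 * np.mean(residual * x)

rng = np.random.default_rng(0)
w = 0.0                  # initial weight
learning_rate = 0.1
for _ in range(500):
    x = rng.uniform(0.1, 1.0, size=8)   # a random mini-batch of inputs
    y = 2.0 * x                          # targets from the ideal mapping
    _, grad = loss_and_grad(w, x, y)
    w -= learning_rate * grad            # stochastic gradient descent step
```

Each step uses a fresh random mini-batch, so the update follows a noisy estimate of the expectation in Eq. (7). Unlike the standard model, the quantity being optimized is the weight, not the image, which is why a trained network can stylize new images in a single forward pass.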
There are a few points of interest in this image transformation
network model. Strided convolutions were used for the
downsampling and upsampling blocks in the network instead of
pooling layers. The network body consists of five residual
blocks using the architecture of Kaiming H. et al. [He et al.,
2016]. All non-residual convolutional layers were followed by
batch normalization and ReLU nonlinearities, as presented in
Figure 7.
4.2.2 The Function of the Image Conversion Network Layers.
Downsampling with strided convolutions: The first convolution in
the proposed network has stride = 1, but the following two
layers have stride = 2. This means that every time the filter is
moved, it shifts 2 pixels instead of 1, so an input of size
n × n yields an output of size n/2 × n/2. A convolution layer
with stride = 2 therefore halves each spatial dimension of its
input, which is called downsampling. Because the input is
downsampled, each pixel of the downsampled feature map results
from a calculation involving a larger number of pixels of the
original input image. This approach allows the kernel filters to
access a much larger portion of the original input image without
increasing the kernel size. Since a style is applied to the
entire image, giving each filter more information about the
original input image makes the conversion network perform
better.
Upsampling with strided convolutions: Upsampling is the opposite
of downsampling. After the 2 downsampling layers with stride =
2, each spatial dimension of the image is reduced to 1/4 of the
original. The desired output of the conversion network is a
stylized image with the same resolution as the original content
image. To achieve this, 2 convolution layers were applied with
stride = 1/2. These layers, which increase the output image
size, are called upsampling layers.
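The effect of the strides on the spatial size can be traced with the standard convolution output-size formula, floor((n + 2p − k) / s) + 1. The kernel sizes and strides below follow Table 1; the padding of `kernel // 2` pixels per side and the helper names are assumptions made for this sketch.

```python
def conv_out_size(n, kernel, stride, padding):
    """Output side length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def trace_sizes(n=256):
    """Side length of the feature maps after each stage of the network."""
    sizes = [n]
    # (kernel, stride) of the three leading convolutions listed in Table 1.
    # Assumption: padding = kernel // 2, which preserves the size at stride 1.
    for kernel, stride in [(9, 1), (3, 2), (3, 2)]:
        n = conv_out_size(n, kernel, stride, kernel // 2)
        sizes.append(n)
    # The residual blocks keep the size; each upsampling layer (stride 1/2) doubles it.
    for _ in range(2):
        n *= 2
        sizes.append(n)
    # Final 9 x 9 convolution with stride 1.
    n = conv_out_size(n, 9, 1, 4)
    sizes.append(n)
    return sizes

sizes_256 = trace_sizes(256)
```

For a 256 × 256 input, the residual blocks therefore operate on 64 × 64 feature maps, which is 1/16 of the original pixel count per channel and a large part of the speed advantage.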
Residual blocks [He et al., 2016]: Between the downsampling and
upsampling layers, there are 5 residual blocks. These blocks
consist of convolution layers, but unlike conventional
convolution layers, the input of each block is added directly to
its output.
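The skip connection can be sketched as follows. This is a simplified illustration, not the paper's block: the two 3 × 3 convolutions of a real residual block are replaced by hypothetical channel-mixing matrices `w1` and `w2`, and normalization is omitted.

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = x + W2 * relu(W1 * x): the block input is added back to the block output.

    x: features of shape (channels, positions); w1, w2: (channels, channels).
    """
    hidden = np.maximum(w1 @ x, 0.0)   # first transform followed by ReLU
    return x + w2 @ hidden             # skip connection adds the input back

x_demo = np.array([[1.0, -2.0], [3.0, 4.0]])
identity = np.eye(2)
y_demo = residual_block(x_demo, identity, identity)

# With zero weights in the second transform, the block reduces to the identity mapping.
y_zero = residual_block(x_demo, identity, np.zeros((2, 2)))
```

Because the block only has to learn a correction to its input, it can easily pass content structure through unchanged, which suits style transfer where the output should share the structure of the input.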
Figure 7. Image transformation network architecture.
4.2.3 Constructing Loss Function. Like the standard model, the
proposed model also utilizes the pre-trained VGG16 neural
network [Van Hieu and Hien, 2020a; Van Hieu and Hien, 2020b] to
measure the loss in content or art style relative to the input
images. However, the content-feature reconstruction loss
function was computed at layer relu2_2, and the style-feature
reconstruction loss function was calculated at layers relu 2,
relu2_2, relu3_3, and relu4_3.
Table 1. Details of proposed image transformation network.
Operation        Kernel size   Stride   Feature maps    Padding           Activation
Network input    -             -        256 × 256 × 3   -                 -
Conv2d           9             1        32              ReflectionPad2d   -
InstanceNorm2d   -             -        32              -                 -
Conv2d           3             2        64              ReflectionPad2d   -
InstanceNorm2d   -             -        64              -                 -
Conv2d           3             2        128             ReflectionPad2d   -
InstanceNorm2d   -             -        128             -                 -
Residual block   -             -        128             -                 ReLU
Residual block   -             -        128             -                 ReLU
Residual block   -             -        128             -                 ReLU
Residual block   -             -        128             -                 ReLU
Residual block   -             -        128             -                 ReLU
Upsampling       -             -        64              -                 -
InstanceNorm2d   -             -        64              -                 -
Upsampling       -             -        32              -                 -
InstanceNorm2d   -             -        32              -                 -
Conv2d           9             1        3               ReflectionPad2d   ReLU
Feature Reconstruction Loss: Instead of forcing the pixels of
the output image $\hat{y} = f_W(x)$ to exactly match the pixels
of the target image, the two images are encouraged to have
similar feature representations as computed by the loss network
$\phi$. Let $\phi_j(x)$ be the features of layer $j$ of the
network $\phi$ when processing image $x$; if $j$ is a
convolutional layer, $\phi_j(x)$ is a feature map of shape
$C_j \times H_j \times W_j$. The feature reconstruction loss is
the squared Euclidean distance between the feature
representations, normalized by their size:
$$l^{\phi}_{feat}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \left\| \phi_j(\hat{y}) - \phi_j(y) \right\|_F^2 \quad (8)$$
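A minimal sketch of Eq. (8), assuming the layer-$j$ features of both images are already available as arrays of shape $(C_j, H_j, W_j)$; the function name and the toy inputs are illustrative only.

```python
import numpy as np

def feature_reconstruction_loss(phi_out, phi_target):
    """Eq. (8): squared distance between feature maps, divided by C_j * H_j * W_j."""
    phi_out = np.asarray(phi_out, dtype=np.float64)
    phi_target = np.asarray(phi_target, dtype=np.float64)
    if phi_out.shape != phi_target.shape:
        raise ValueError("feature maps must have the same shape")
    c, h, w = phi_out.shape
    return np.sum((phi_out - phi_target) ** 2) / (c * h * w)

# Made-up features with C_j = H_j = W_j = 2.
demo_feat_loss = feature_reconstruction_loss(np.ones((2, 2, 2)), np.zeros((2, 2, 2)))
```

Dividing by the number of feature values makes the loss comparable across layers and image sizes, unlike the unnormalized sum in Eq. (1).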
Style Reconstruction Loss: As in the standard model, the style
loss function compares the Gram matrices of the feature
representations:
$$G^{\phi}_j(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c} \, \phi_j(x)_{h,w,c'} \quad (9)$$
Then, the style loss function is the squared Frobenius norm of
the difference between the Gram matrices of the output image and
the target image:
$$l^{\phi}_{style}(\hat{y}, y) = \left\| G^{\phi}_j(\hat{y}) - G^{\phi}_j(y) \right\|_F^2 \quad (10)$$
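To perform style reproduction from a set of layers $J$ instead
of a single layer $j$, $l^{\phi,J}_{style}(\hat{y}, y)$ is
defined as the sum of the per-layer losses
$l^{\phi,j}_{style}(\hat{y}, y)$ over all layers $j \in J$:
$$l^{\phi,J}_{style}(\hat{y}, y) = \sum_{j \in J} l^{\phi,j}_{style}(\hat{y}, y) \quad (11)$$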
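Eqs. (9) to (11) can be sketched as below. The channel-first layout $(C_j, H_j, W_j)$ is an assumption of this sketch (Eq. (9) indexes features as $h, w, c$, but the resulting Gram matrix is the same), and the function names and toy features are hypothetical.

```python
import numpy as np

def normalized_gram(phi):
    """Eq. (9): C_j x C_j Gram matrix of layer features, divided by C_j * H_j * W_j."""
    c, h, w = phi.shape
    psi = phi.reshape(c, h * w)          # one row per channel
    return psi @ psi.T / (c * h * w)

def style_reconstruction_loss(out_feats, ref_feats):
    """Eqs. (10)-(11): sum over layers of the squared Frobenius norm of Gram differences."""
    total = 0.0
    for phi_out, phi_ref in zip(out_feats, ref_feats):
        diff = normalized_gram(phi_out) - normalized_gram(phi_ref)
        total += np.sum(diff ** 2)
    return total

# Two made-up layers with different channel counts and spatial sizes.
out_feats = [np.array([[[1.0, 1.0]]]), np.array([[[1.0]], [[0.0]]])]
ref_feats = [np.zeros((1, 1, 2)), np.zeros((2, 1, 1))]
demo_style_rec = style_reconstruction_loss(out_feats, ref_feats)
```

Since the Gram matrix is always $C_j \times C_j$, the output and reference images do not need to have the same spatial size for this loss to be defined.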
4.3 Results
Figure 8. Proposed style conversion model from the content image of
Golden Bridge in Da Nang and the reference image of Candy artwork.
Figure 9. Comparison of results generated by the Standard transfor-
mation model (top) and Proposed style conversion model (bottom).
Figures 8 and 9 show that the conversion network of the proposed
model performs significantly better than the standard model in
general. The improved conversion is many times faster than the
basic model, and small images can even be processed in real
time. The artwork generated by the proposed model is also more
artistic and sharper than that of the standard model. Moreover,
once the proposed model has been trained on a style, it can
apply that style to any content image in later use.
5 Experiments and Results
5.1 Dataset Preparation and Configuration
In this study, the datasets used for implementation are the
Flickr8k dataset (8100 images, 1 GB) and the Microsoft COCO
dataset (80000 images, 13 GB). These image datasets mostly
contain scenes of diverse environments such as mountains,
rivers, and trees, as illustrated in Figure 10. Given the
diversity of these datasets, the model was expected to be able
to style any content image, even one it had never seen during
training.
Figure 10. Flickr8k Dataset.
The model was trained on the Flickr8k dataset using a Google
Colab Tesla K80 GPU, and on the Microsoft COCO dataset using a
personal computer with an Intel Core i7-4700MQ. The standard
conversion model was configured with the loss function shown in
Figure 11 and the following characteristics.
−Content weight: 1e3 (loss weight of content);
−Style weight: 1e−2 (loss weight of style);
−Content image: golden bridge.jpg (the content image prepared to
be styled);
−Style image: candy.jpg (reference image).
And the proposed conversion model was configured with the loss
function shown in Figure 12 and the following characteristics.
−Content weight: 1e5;
−Style weight: 5e10;
−Style image: candy.jpg;
−Learning rate: 1e3 (the learning rate of the neural network).
Figure 11. Loss function diagram of the standard loop-based conver-
sion model.
Figure 12. Loss function diagram of the proposed loop-based con-
version model.
5.2 Results
Figures 13 and 14 and Table 2 show the results and comparisons
of the models against different criteria, from which the
advantages and disadvantages of each model are drawn. Two
traditional metrics were used to evaluate output image quality:
peak signal-to-noise ratio (PSNR) and the structural similarity
index (SSIM) [Wang et al., 2004], both of which approximate the
human perception of image quality [Hanhart et al., 2013;
Huynh-Thu and Ghanbari, 2008; Sheikh et al., 2006]. The goal of
these analyses was not to achieve the best PSNR or SSIM results
but simply to show the differences in output image quality and
in the loss of characteristics of the original image between the
trained models.
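For reference, the two metrics can be sketched as follows. PSNR uses its usual definition, $10 \log_{10}(MAX^2 / MSE)$. The SSIM function below is a simplification: it evaluates the SSIM formula of Wang et al. once over the whole image with population statistics, whereas the published index averages the formula over local windows. Function names are illustrative, and 8-bit images (`max_value = 255`) are assumed.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

def global_ssim(x, y, max_value=255.0):
    """SSIM formula evaluated once on whole-image statistics (simplified, no windows)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * max_value) ** 2       # stabilizing constants from Wang et al. (2004)
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8)).astype(np.float64)
black = np.zeros((4, 4))
white = np.full((4, 4), 255.0)
```

PSNR depends only on the mean squared pixel error, while SSIM compares means, variances, and covariance, which is why the two metrics can rank the same pair of images differently.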
5.3 Discussion
Figure 14 shows that the PSNR values of the images from the
standard and proposed models are relatively close; sometimes the
standard model scores higher than the proposed model (the higher
the PSNR, the less noise). However, this index does not
precisely determine the quality of the output image. In images
from the standard model, the noise is spread over the whole
picture, whereas in images from the proposed model it is
concentrated in the unimportant parts of the picture; therefore,
the image quality of the proposed model was still better than
that of the standard model. Furthermore, the proposed model's
image always retains the main characteristics of the original
image, performing much better than the standard model. This is
indicated by the SSIM of the proposed model, which was always
higher than that of the standard model.
Figure 15. With the same content and reference image, the
proposed style conversion model (bottom right) provides better
quality than the standard conversion model (bottom left).
Figure 15 shows that, with the same content and reference image,
the proposed model produces clearly better image quality than
the standard model, which demonstrates the effectiveness of
using an additional conversion network.
Figure 16. The proposed model trained on the Flickr8k dataset
(bottom left) and the COCO dataset (bottom right), using the
content image of the homeland of Vietnam and the reference image
of Candy artwork.
Figure 16 shows that the proposed model trained on a larger
dataset gives superior results compared to the same model
trained on the smaller dataset (the Microsoft COCO dataset is
about 10 times larger than the Flickr8k dataset). This again
confirms the benefit of training the image conversion network of
the proposed model on a larger dataset.
Figure 13. Some of the artwork results produced by the proposed model.
Figure 14. Comparison of PSNR / SSIM of the images generated from the models (under each image).
Table 2. Comparison of the execution times of the models.
Image size (pixels)   Standard Model   Proposed Model on Flickr8k Dataset   Proposed Model on Microsoft COCO Dataset
256 × 256             5 minutes        0.6 second                           0.6 second
512 × 512             15 minutes       5 seconds                            5 seconds
1024 × 1024           31 minutes       30 seconds                           30 seconds
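As a quick worked check of the timings in Table 2, the relative reduction in execution time can be computed directly. The helper name and the conversion of all entries to seconds are the only additions here; the proposed model's times are the same for both training datasets.

```python
# (standard model, proposed model) execution times in seconds, from Table 2.
timings_seconds = {
    "256x256": (5 * 60, 0.6),
    "512x512": (15 * 60, 5.0),
    "1024x1024": (31 * 60, 30.0),
}

def reduction_percent(standard, proposed):
    """Relative reduction in execution time, as a percentage of the standard time."""
    return 100.0 * (1.0 - proposed / standard)

reductions = {
    size: reduction_percent(standard, proposed)
    for size, (standard, proposed) in timings_seconds.items()
}
```

The reductions are roughly 99.8%, 99.4%, and 98.4% for the three image sizes, so the relative advantage narrows somewhat as the resolution grows.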
5.4 Artwork Generator Website
To create exquisite artistic paintings and demonstrate that the
proposed image transformation model can be applied effectively
in practice, an artistic photo generator website was built with
the following main functions.
−Art exhibitions: (Figure 17) View pictures, collec-
tions of famous artists, exhibition events, and re-
lated information.
−Artwork Generator: (Figure 18) Transforms images
from exhibited artworks or images uploaded from
user devices.
Figure 17. Layout of the artwork generator website.
Figure 18. Artistic Image Generator Website – several style-
transformed images.
6 Conclusion and Future Work
In this paper, a new image conversion model was proposed and
compared with a previous conversion architecture. It uses a
convolutional neural network for transformation and a
pre-trained VGG16 model for measuring the loss relative to the
input images' content. Building on the substantial efforts of
earlier work [Huang and Belongie, 2017; Chen and Schmidt, 2016;
Bourached et al., 2021], this research introduced a highly
efficient model for image style transfer with a focus on
large-scale and time-efficient use. The proposed model produces
markedly better output image quality than the standard model
while retaining the main characteristics of the original image,
in terms of both the PSNR and SSIM indices. Moreover, it reduced
the execution time by more than 90%, which is significant for
real-time use. These outcomes open up new avenues for future
research and can serve as a valuable source for future image
style transfer systems. The new image conversion model can be
applied not only in arts but also in various areas such as
geotechnics, civil engineering, materials science, and physics.
Future work may focus on improving applications of the model,
such as the ability to convert more artwork styles in a single
training session or the flexibility to adjust the degree of
transformation. The model is also a promising candidate for
color grading or semantic image segmentation.
7 Acknowledgement
This research was funded and implemented for the
Mercury project of Est Rouge Technologies JSC, Viet-
nam. This work was also supported by the People’s
Committee of Danang and The University of Danang,
Vietnam.
References
Andreev, A. and Maksimenko, V. (2019). Synchronization in coupled neural network with inhibitory coupling. Cybernetics and Physics, 8(4), pp. 199–204.
Bazhanov, P., Kotina, E., Ovsyannikov, D., and Ploskikh, V. (2018). Optimization algorithm of the velocity field determining in image processing. Cybernetics and Physics, 7(4), pp. 174–181.
Bourached, A., Cann, G., Griffiths, R.-R., and Stork, D. G. (2021). Recovery of underdrawings and ghost-paintings via style transfer by deep convolutional neural networks: A digital tool for art scholars. arXiv preprint arXiv:2101.10807.
Chen, D., Yuan, L., Liao, J., Yu, N., and Hua, G. (2018). Stereoscopic neural style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6654–6663.
Chen, T. Q. and Schmidt, M. (2016). Fast patch-based style transfer of arbitrary style. arXiv preprint arXiv:1612.04337.
Cheng, M.-M., Liu, X.-C., Wang, J., Lu, S.-P., Lai, Y.-K., and Rosin, P. L. (2019). Structure-preserving neural style transfer. IEEE Transactions on Image Processing, 29, pp. 909–920.
Dong, C., Loy, C. C., He, K., and Tang, X. (2015). Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), pp. 295–307.
Doulamis, A. and Varvarigou, T. (2012). Image analysis for artistic style identification: A powerful tool for preserving cultural heritage. Emerging Technologies in Non-Destructive Testing V, 71.
Gatys, L. A., Bethge, M., Hertzmann, A., and Shechtman, E. (2016a). Preserving color in neural artistic style transfer. arXiv preprint arXiv:1606.05897.
Gatys, L. A., Ecker, A. S., and Bethge, M. (2016b). Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423.
Hanhart, P., Korshunov, P., and Ebrahimi, T. (2013). Benchmarking of quality metrics on ultra-high definition video sequences. In 2013 18th International Conference on Digital Signal Processing (DSP), IEEE, pp. 1–8.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Huang, X. and Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1501–1510.
Huynh-Thu, Q. and Ghanbari, M. (2008). Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13), pp. 800–801.
Jing, Y., Yang, Y., Feng, Z., Ye, J., Yu, Y., and Song, M. (2019). Neural style transfer: A review. IEEE Transactions on Visualization and Computer Graphics, 26(11), pp. 3365–3385.
Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., and Yang, M.-H. (2017). Universal style transfer via feature transforms. arXiv preprint arXiv:1705.08086.
Liu, X.-C., Cheng, M.-M., Lai, Y.-K., and Rosin, P. L. (2017). Depth-aware neural style transfer. In Proceedings of the Symposium on Non-Photorealistic Animation and Rendering, pp. 1–10.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
Ma, W., Chen, Z., and Ji, C. (2020). Block shuffle: A method for high-resolution fast style transfer with limited memory. IEEE Access, 8, pp. 158056–158066.
Noh, H., Hong, S., and Han, B. (2015). Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528.
Selim, A., Elgharib, M., and Doyle, L. (2016). Painting style transfer for head portraits using convolutional neural networks. ACM Transactions on Graphics (ToG), 35(4), pp. 1–18.
Sheikh, H. R., Sabir, M. F., and Bovik, A. C. (2006). A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing, 15(11), pp. 3440–3451.
Van Hieu, N. and Hien, N. L. H. (2020a). Automatic plant image identification of Vietnamese species using deep learning models. International Journal of Engineering Trends and Technology, 68(4), pp. 25–31.
Van Hieu, N. and Hien, N. L. H. (2020b). Recognition of plant species using deep convolutional feature extraction. International Journal on Emerging Technologies, 11, pp. 904–910.
Wang, Z. and Bovik, A. C. (2009). Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Processing Magazine, 26(1), pp. 98–117.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), pp. 600–612.
Yang, C.-Y., Ma, C., and Yang, M.-H. (2014). Single-image super-resolution: A benchmark. In European Conference on Computer Vision, Springer, pp. 372–386.
Zhao, H.-H., Rosin, P. L., Lai, Y.-K., and Wang, Y.-N. (2020). Automatic semantic style transfer using deep convolutional neural networks and soft masks. The Visual Computer, 36(7), pp. 1307–1324.
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., and Torr, P. H. (2015). Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537.