Journal of Control, Automation and Electrical Systems (2020) 31:294–303
https://doi.org/10.1007/s40313-020-00567-y
Generating Stochastic Processes Through Convolutional Neural Networks

Fernando Fernandes Neto1 · Rodrigo de Losso da Silveira Bueno1 · Pedro Delano Cavalcanti2 · Alemayehu Solomon Admasu3
Received: 24 November 2018 / Revised: 23 November 2019 / Accepted: 9 January 2020 / Published online: 31 January 2020
© Brazilian Society for Automatics–SBA 2020
Abstract
The present work establishes the use of convolutional neural networks as a generative model for stochastic processes that
are widely present in industrial automation and system modelling such as fault detection, computer vision and sensor data
analysis. This enables researchers from a broad range of fields—as in medical imaging, robotics and control engineering—to
develop a general tool for artificial data generation and simulation without the need to identify or assume a specific system
structure or estimate its parameters. We demonstrate the approach as a generative model on top of data retrieved from a
wide set of classic, simplest to the most complex, deterministic and stochastic data generation processes of technological
importance—from damped oscillators to autoregressive conditional heteroskedastic and jump-diffusion models. Also, a
nonparametric estimation and forecast was carried out for the traditional benchmark “Fisher River” time-series dataset,
yielding the superior mean absolute prediction error results compared to a standard ARIMA model. This approach can
have potential applications as an alternative to simulation tools such as Gibbs sampling and Monte Carlo-based methods,
in the enhancement of the understanding of generative adversarial networks (GANs) and in data simulation for training
Reinforcement Learning algorithms.
Keywords: Convolutional neural networks · WaveNet · Stochastic processes · Generative models
The work at Rutgers University was supported in part by the Van
Dyck fund under the school of Graduate Studies and the Department
of Physics and Astronomy.
Fernando Fernandes Neto (corresponding author)
fernando_fernandes_neto@usp.br
Rodrigo de Losso da Silveira Bueno
delosso@usp.br
Pedro Delano Cavalcanti
pedelano@gmail.com
Alemayehu Solomon Admasu
a.solomon@rutgers.edu
1 Universidade de São Paulo (USP), Avenida Prof. Luciano Gualberto, 908; Cidade Universitária, São Paulo, Brazil
2 Department of Physics and Astronomy, Rio de Janeiro State University, R. Sao Francisco Xavier - 524, Rio de Janeiro 20559-900, Brazil
3 Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA
1 Introduction
The idea of the present paper is to adapt convolutional neural networks, which have been successfully applied to the generation of raw audio waveforms (van den Oord et al. 2016b), images (van den Oord et al. 2016a), text (see Jozefowicz et al. 2016) and multivariate systems (see Borovykh et al. 2017), and to show that this kind of neural network can also be used as a generative model on top of data retrieved from a wide set of known deterministic or stochastic data generation processes of industrial importance, from the simplest to the most complex: from damped oscillators to autoregressive conditional heteroskedastic (ARCH) and jump-diffusion models.
In lieu of traditional identification and estimation procedures, a new approach is proposed here: estimate only the hyperparameters of a convolutional neural network, i.e. the number of convolutional layers, the discretization scheme (encoding) and the dilations.
We demonstrate that data generation processes can be understood and simulated using this approach, without the need to assume any hard structural form or impose any kind of restriction on 1D signals, which can later be generalized to 2D or 3D signals that are far more interesting for computer vision, automation and multivariate data analysis tasks.
The remainder of the paper is organized as follows. In Sect. 2, we present background on procedures for the identification, estimation and simulation of stochastic processes, including standard statistical methods. A deep neural network model, the WaveNet architecture, is explained in Sect. 3, while simulation tests and main results are discussed in Sect. 4. Finally, in Sect. 5, we conclude with the findings of the present study and future directions of research.
2 Problem Statement
It is widely known that procedures of identification, estima-
tion and simulation of stochastic processes are somewhat
already well established in mathematics, computing and
related fields and sub-fields such as applied statistics, engi-
neering physics and machine learning. On the other hand,
most of them rely on the fact that, to work properly, the
observer must impose a system structure to estimate its
respective parameters, see Hayashi (2000), and very often
assume a probability distribution function.
On top of this specified and estimated system, the
researcher can accomplish tasks such as forecasting and sim-
ulating the system’s states. In parallel to the development of
these strategies, a separate field of artificial neural networks,
which have mostly been forgotten during the 1990s and mid-
2000s because they were considered black boxes without any
intelligibility of their inner computations, was revived as can
be seen in Kolman and Margaliot (2007).
One of the most important reasons for this revival is the success of deep neural networks (neural networks with several layers), which have been successfully employed in tasks such as pattern recognition, classification and prediction, performing better than humans do. These tasks encompass a wide range of different problems, such as computer vision (character recognition, object detection and others), audio processing, and defeating world-class human players in complex computer games, Go and Chess, see Silver et al. (2017).
A great part of these successes has been attributed to a new hybrid architecture called convolutional neural networks, where convolutional filters are placed and stacked to compose deep networks, enabling the filtering of desired multiscale/multidimensional features that enhance classification/forecasting capabilities and that can be interpreted and understood, plus a Softmax classifier/regressor, which basically works as a normalized multinomial logistic regressor, see Bishop (2006).
Fig. 1 Image of a skull converted into time-series data (Keogh et al. 2006)
By estimating only the hyperparameters of a convolutional neural network, data generation processes can be understood and simulated without the need to assume any hard structural form or impose any kind of restriction. Moreover, as the data are encoded/decoded outside the neural network, by means of transforming a regression task into a classification task, no assumption about the distribution of the data generating process must be made. Our approach is based on the fact that, if it is possible to characterize what can be learnt in terms of one-dimensional stochastic processes, we can verify the capacity of the network not only to learn patterns but also to simulate them.
Furthermore, to some extent, it is possible to transform 2D and 3D signals into time-series data, i.e. 1D signals, in order to use the methods shown in this work. This vastly extends the range of problems we are able to work with. An example is image classification and computer vision. As we can see in Keogh et al. (2006), it is possible to convert shapes and images into time-series data (see Fig. 1), making classification tasks, for example, approachable with this method.
The list of N-dimensional problems that can be reduced to time-series data is huge. Some examples are: the discrimination of different types of coffee beans (Briandet et al. 1996), the identification of chlorine levels in drinking water (Li et al. 2009), predictive modelling of bone ageing (Davis 2013), a leaf image retrieval scheme (Tak and Hwang 2007) and diatom (unicellular algae) identification and classification (Jalba et al. 2004).
Time-series modelling is used in a wide range of automation and control problems, such as modelling electricity loads (Nowicka-Zagrajek and Weron 2002), estimation of electromechanical modes in power systems (Dosiek et al. 2013) and parameter estimation of superimposed exponential signals in noise (Bresler and Macovski 1986).
Forecasting of time series is important for solving a diverse set of problems. Notwithstanding, simulation of stochastic processes, such as ARMA and ARCH, has applications in various fields, including robotics, industrial automation and the simulation of manufacturing processes (Tang and Wong 2000).
Nonparametric forecasting and simulation are a more flexible way to model processes, since they do not assume any previous autoregressive functional structure. This avoids the parametric methods' problems of estimation and identification bias due to the data and the sample size.
Because our network is capable of nonparametric process simulation, it can replace the usual approaches to simulating stochastic processes, such as evolutionary algorithms (EAs) or genetic algorithms (GAs).
The simulated processes can also be applied in reinforcement learning (RL) algorithms. Supply chain management, industrial robotics and industrial systems are problems that have recently begun to be tackled using RL. For example, in dynamic systems, where all sorts of variables are found, one wants models that are prepared to respond effectively in various situations, and this can be achieved with reinforcement learning, as in Chinchali et al. (2018).
Furthermore, this methodology can be applied to Digital Twin models, which are integrated probabilistic simulations of a system, see Glaessgen and Stargel (2012). A Digital Twin digitizes data gathered from scanners and sensors to form an accurate virtual representation of the system. It can be used to continuously prevent damage and flaws and can also anticipate the system's performance or eventual anomalies. This can be applied in various areas of industry, such as manufacturing systems, as in Lee et al. (2015), and product design, as in Tao et al. (2017).
As such, being able to simulate and predict stochastic processes properly is essential in a wide range of machine learning applications in industry and automation, such as fault detection, multivariate nonlinear process modelling, computer vision, medical image recognition, robotics and many others, see, for example, Monteiro et al. (2019) and de Marcelo et al. (2017).
3 Model Description
The main idea of the original WaveNet paper (van den Oord et al. 2016b) is to model the joint probability of a stochastic process $\chi = (\chi_1, \chi_2, \ldots, \chi_T)$ as a product of conditional probabilities

$p(\chi) = \prod_{t=1}^{T} p(\chi_t \mid \chi_1, \ldots, \chi_{t-1})$   (1)

In other words, the probability of each $\chi_t$ is conditioned on all previous observations.
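The factorization in Eq. (1) is what makes sequential (autoregressive) generation possible: each new sample is drawn from the conditional distribution given everything generated so far. A minimal sketch follows, with a dummy uniform conditional model standing in for WaveNet's Softmax output; the function names and the uniform stand-in are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 256  # number of discrete classes (the paper uses an 8-bit encoding)

def conditional_probs(history):
    """Stand-in for p(x_t | x_1, ..., x_{t-1}).

    In the paper this distribution comes from WaveNet's Softmax layer;
    a dummy uniform distribution is used here purely for illustration."""
    return np.full(K, 1.0 / K)

def sample_sequence(T):
    """Generate a sequence one step at a time, sampling each value
    from the conditional distribution given the history so far."""
    history = []
    for _ in range(T):
        p = conditional_probs(history)
        history.append(rng.choice(K, p=p))
    return np.array(history)

x = sample_sequence(100)
```

Any model that returns a valid probability vector over the K classes can be plugged into `conditional_probs` without changing the sampling loop.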
Fig. 2 Dilated causal convolutional layer. Source: van den Oord et al. (2016b)

To model the time series following this approach in terms of a convolutional neural network, the WaveNet architecture consists of stacking what are called dilated causal convolutional layers, i.e. structures as in Fig. 2, plus, as pointed out before, a Softmax layer, which consists of a multinomial logistic classifier given by:
$h_\theta(f) = \frac{1}{\sum_{j=1}^{K} \exp(\theta^{(j)T} f)} \begin{bmatrix} \exp(\theta^{(1)T} f) \\ \exp(\theta^{(2)T} f) \\ \vdots \\ \exp(\theta^{(K)T} f) \end{bmatrix}$   (2)

where $K$ denotes the number of different classes; $f$ denotes the features (independent variables) extracted in the previous layers; $\theta^{(j)T}$ denotes the weights of the features used to classify the output, which are filtered by means of the convolution operations specified in Fig. 2; and $h$ denotes the hypothesis of the output pertaining to a specific class, which depends on $\theta$ and $f$. Here, it is worth mentioning that, given this classification structure, the observed variables in the stochastic process must be encoded into a discrete variable with $K$ different classes, where

$h_\theta(x) = \begin{bmatrix} P(y = 1 \mid x, \theta) \\ P(y = 2 \mid x, \theta) \\ \vdots \\ P(y = K \mid x, \theta) \end{bmatrix}$   (3)
where each $P(\cdot \mid \cdot)$ corresponds to an entry in Eq. (2). In reference to dilated causal convolutional layers, it is worth pointing out some important observations about the main features of these structures:

– These stacked layer structures act as a generalization of discrete wavelet filters (see Borovykh et al. 2017), given the fact that, basically, discrete wavelet transforms can be thought of as a cascade of linear operations;
– Stacking such features with nonlinear operators can provide a general approximator due to shift invariance, as discussed in Cheng et al. (2016) and Neto (2017), enabling the researcher to capture important nonlinearities;
– This structure helps to obtain a very wide receptive field, which facilitates extracting long-range dependencies in conjunction with short memory (as can be seen in Fig. 3).
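The receptive-field point can be made concrete with a short calculation: for width-2 dilated causal convolutions, each layer with dilation d adds d past samples of context, so doubling dilations yields exponential context growth for a linear number of layers. The helper below is an illustrative sketch, not code from the paper.

```python
def receptive_field(dilations, filter_width=2):
    # Each width-w causal convolution with dilation d adds (w - 1) * d
    # past samples to the receptive field of the stack.
    return 1 + (filter_width - 1) * sum(dilations)

one_stack = [2 ** i for i in range(9)]  # dilations 1, 2, 4, ..., 256
print(receptive_field(one_stack))  # 512
```

With dilations doubling up to 256, nine layers already cover 512 past samples, which is why the original WaveNet uses this scheme for long-range dependencies.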
123
Journal of Control, Automation and Electrical Systems (2020) 31:294–303 297
Fig. 3 Multiscale features in dilated causal convolutional layers (hidden layers with dilations 1, 2, 4 and output with dilation 8). Source: van den Oord et al. (2016b)
Fig. 4 Overview of the architecture: a causal convolution followed by k layers of gated dilated convolutions with residual and skip connections, 1×1 convolutions, ReLU activations and a Softmax output. Source: van den Oord et al. (2016b). The blue bubble connects the upper and lower sub-figures (Color figure online)
Keeping that in mind, these dilated convolutional layers are stacked and organized in such a way that each feature is added to the previous layer's features following a residual scheme, as shown in Fig. 4, enabling deeper models and faster convergence, according to He et al. (2015).
It is also important to notice that this architecture applies a gated activation function instead of the rectified linear activation functions (ReLU) that are the most popular in this kind of neural network, since, as shown in van den Oord et al. (2016a), it outperforms the traditional approach. This gated activation function is described by:

$z = \tanh(W_{f,k} \ast x) \odot \sigma(W_{g,k} \ast x)$   (4)

where $\odot$ denotes the element-wise multiplication operator; $\ast$ denotes the convolution operator; $\sigma$ denotes the logistic sigmoid function; $x$ denotes the input; $W_{f,k}$ denotes a learnable convolutional filter at the kth layer; and $W_{g,k}$ denotes a learnable convolutional gate.
Aiming to keep the focus on the main subject of this paper, for further architecture details, one should refer to the original WaveNet implementation paper.
3.1 Forecasting
To test the network's forecasting capabilities, the chosen dataset is the "Fisher River" time series, a benchmark dataset commonly used in the related literature, see Avishek and Prakash (2017), Giordano et al. (2008) and Bahrpeyma et al. (2018).
Our approach will be to model the dataset using the classical ARIMA technique and also the WaveNet architecture. Then, we will compare the mean absolute prediction error associated with each model and discuss the results.
3.2 Simulation
To test the network’s simulation capabilities, the chosen
deterministic processes were:
– Harmonic Oscillator: as studied in motion sensing and analysis (Wang et al. 2003), elementary human locomotion is a cyclical process with varying frequencies where, for instance, the centre of rhythmic arm movement trajectories can be simulated by a single simple harmonic oscillator, Eq. (5) (Vakulenko et al. 2017):

$\frac{d^2 y(t)}{dt^2} + a \cdot y(t) = 0$   (5)

where $y(t)$ is the displacement amplitude as a function of time $t$, and $a$ and $b$ are constants in Eqs. (5) and (6).
– Damped Harmonic Oscillator: similarly, this process renders a more realistic modelling of motion phenomena, such as in automatic vehicle guidance with lane-marker and pedestrian detection algorithms (Luong et al. 1995), minimal dynamical descriptions of ocular movements during visual search (Specht et al. 2017) and line-scratch detection and restoration in old film sequences (Bruni et al. 2003). It is given by:

$\frac{d^2 y(t)}{dt^2} + b \cdot \frac{dy(t)}{dt} + a \cdot y(t) = 0$   (6)
– Logistic Map with a chaotic choice of the parameter (see May 1976): this model has become popular in cryptography, where real-time secure digital image encryption and transmission algorithms based on chaotic logistic maps are employed, with applications in military image databases, confidential video conferencing, cable TV, online personal photograph albums, etc. (Pareek et al. 2006). It has the form:

$x_{n+1} = r \cdot x_n \cdot (1 - x_n)$   (7)

where $r$ is a positive constant parameter and $x_n$ and $x_{n+1}$ are the system variable at the current and next steps, respectively.
– Lorenz System, as in Borovykh et al. (2017): the Lorenz system (8) can be used as a model to generate chaotic time-series datasets for the study of nonlinear dynamics, such as financial asset price prediction (Soofi and Cao 2002) and hydrological streamflow research (see Dulakshi and Liong 2006).

$\frac{dx(t)}{dt} = \sigma (y - x)$
$\frac{dy(t)}{dt} = x(\rho - z) - y$
$\frac{dz(t)}{dt} = xy - \beta z$   (8)

where $\sigma$, $\rho$ and $\beta$ are system parameters, with $(x, y, z)$ varying in time $t$.
In these four deterministic cases, the convolutional neural network should behave similarly to a standard ordinary differential equation/difference equation solver.
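For reference, the deterministic training series above can be generated with simple discretizations. The sketch below covers the damped oscillator (Euler scheme for Eq. (6); setting b = 0 recovers Eq. (5)) and the chaotic logistic map (Eq. (7)); the step size and parameter values are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def damped_oscillator(a=1.0, b=0.1, y0=1.0, v0=0.0, dt=0.01, n=12000):
    """Euler discretization of y'' + b*y' + a*y = 0 (Eq. 6);
    b = 0 gives the undamped harmonic oscillator of Eq. (5)."""
    y, v = y0, v0
    out = np.empty(n)
    for i in range(n):
        out[i] = y
        # Update position and velocity from the current state.
        y, v = y + v * dt, v - (b * v + a * y) * dt
    return out

def logistic_map(r=3.9, x0=0.5, n=12000):
    """Chaotic logistic map x_{n+1} = r * x_n * (1 - x_n) (Eq. 7)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1 - x[i - 1])
    return x

y = damped_oscillator()
x = logistic_map()
```

Both routines return 12,000 samples, matching the series length used in the experiments of Sect. 3.2.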
In the case of the stochastic processes, we have chosen the
following processes:
– Standard diffusion process with mean reversion: this process has been used in image-to-image matching models (Thirion 1998), where object boundaries in one image are considered as static semi-permeable membranes and the second image, modelled as a deformable grid, diffuses through them. Implementations include heart motion tracking (Crum et al. 2004), human brain MRI registration (Klein et al. 2009) and various image processing applications (Zitova and Flusser 2003).

$dX(t) = \theta (\mu - X(t))\, dt + \sigma\, dW(t)$   (9)
– Jump-diffusion process (see Matsuda 2004): this process has applications as a form of random sampling algorithm (Grenander and Miller 1994) for pattern theory, computer vision and medical imaging. In addition, it has found wide use in finance as an alternative to the Black–Scholes option-pricing framework (Kou 2002).

$\frac{dX(t)}{X(t)} = (\alpha - \lambda k) \cdot dt + \sigma\, dW(t) + (y(t) - 1)\, dN(t)$   (10)

where $\theta, \mu, \sigma, \alpha$ are system parameters, $W(t)$ is a Wiener process, $(y(t) - 1)$ is the jump size with mean $k$, and $N(t)$ is a Poisson process with rate $\lambda$ in Eqs. (9) and (10).
In the case of autoregressive model processes, as regression analysis is a common technique in computer vision (Meer et al. 1991) and in trend prediction such as time-series forecasting (Zhang 2003), the ability to simulate and integrate autoregressive model processes, Eqs. (11)–(13), with deep neural network architectures (see Gregor et al. 2014) is of paramount importance. These processes have applications in voice activity detection (see Aibinu et al. 2012), 2D shape classification (Dubois and Glanz 1986) and multitarget (multiple motion) sensing and tracking (Okuma et al. 2004), to name a few. We consider the following three cases:
– Autoregressive process of order 1 (AR):

$x_t = \Phi \cdot x_{t-1} + \epsilon_t + c$   (11)

– Autoregressive moving-average process of order 1 (ARMA):

$x_t = \Phi \cdot x_{t-1} + \theta \cdot \epsilon_{t-1} + \epsilon_t + c$   (12)

– Autoregressive conditional heteroskedastic process of order 1 (ARCH):

$x_t = \epsilon_t h_t, \qquad h_t^2 = c + \Phi_1 \cdot x_{t-1}^2$   (13)

where $c$ is a constant, $\Phi$ and $\theta$ are parameters of the model, and $\epsilon_t$ are error terms.
In these cases, the convolutional neural network should simulate a stochastic process compatible with the original one, as in a standard stochastic differential/difference equation simulator.
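The training series for the autoregressive cases can be generated directly from Eqs. (11) and (13). The sketch below uses illustrative parameter values (the paper reports its true values only through the figures), iterating each recursion with Gaussian error terms.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ar1(phi=0.75, c=1.0, n=12000):
    """AR(1): x_t = phi * x_{t-1} + eps_t + c (Eq. 11)."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t] + c
    return x

def simulate_arch1(c=0.2, phi1=0.5, n=12000):
    """ARCH(1): x_t = eps_t * h_t with h_t^2 = c + phi1 * x_{t-1}^2 (Eq. 13)."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        # Conditional standard deviation depends on the previous value.
        x[t] = eps[t] * np.sqrt(c + phi1 * x[t - 1] ** 2)
    return x

x_ar = simulate_ar1()
x_arch = simulate_arch1()
```

With these values the AR(1) series fluctuates around its stationary mean c/(1 - phi) and the ARCH(1) series has unconditional variance c/(1 - phi1), which is what the parameter re-estimation step in Sect. 4.2 recovers.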
In order to set up the hyperparameters of the convolutional neural network for each synthetic time series, a forward method was adopted: we start with two layers and dilations up to the second order (i.e. dilations 1, 2) and increase the dilations up to the 256th order, which were the dilations chosen in the original paper to deal with eventual long-range dependencies found in text-to-speech applications.
In addition, a single time series was generated for each data generating process, with 12,000 samples. A total of 10,000 samples were used to train the neural network, and 2000 samples were used for back-testing purposes. Whenever poor results were obtained, an additional convolutional layer was added, starting again with only dilations up to the second order.
Moreover, in this specific application, an 8-bit encoding was used to discretize the data into 256 classes; 256 skip channels were used, with a filter width set equal to 2, covering all the hyperparameters established in van den Oord et al. (2016a).
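The 8-bit encoding step can be sketched as a uniform quantizer into 256 classes. The paper does not specify the scheme beyond "8-bit", so the uniform mapping below is an assumption (the original WaveNet audio application used mu-law companding instead); it illustrates how a regression target becomes a classification target.

```python
import numpy as np

def encode_8bit(x):
    """Map a real-valued series onto 256 integer classes (uniform bins).
    Assumed scheme for illustration; the paper only states '8-bit'."""
    lo, hi = x.min(), x.max()
    q = np.minimum(((x - lo) / (hi - lo) * 256).astype(int), 255)
    return q, lo, hi

def decode_8bit(q, lo, hi):
    """Map class indices back to the corresponding bin-centre values."""
    return lo + (q + 0.5) / 256 * (hi - lo)

x = np.sin(np.linspace(0, 20, 1000))
q, lo, hi = encode_8bit(x)
x_hat = decode_8bit(q, lo, hi)
```

The round-trip error is bounded by half a bin width, i.e. (hi - lo)/512, which is the price paid for turning the regression task into a 256-class classification task.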
4 Results
4.1 Forecasting
The "Fisher River" time series can be seen in Fig. 5. It represents the mean daily temperature of a river over the period from 1988 to 1991, and it shows a strong seasonal behaviour. Although the data have a seasonal component, it does not have a strong impact on our analysis, since it has a frequency of one year and our model makes predictions only up to 10 days ahead.

Fig. 5 Fisher River time series: mean daily temperature [°C] against time index

Fig. 6 Fisher River time series: ARIMA and WaveNet MAPE comparison for 1 to 10 steps ahead
That said, an ARIMA model of the data was estimated, a classic approach to such a problem. In order to compare prediction capabilities for 1 to 10 steps ahead, we calculated the mean absolute prediction error (MAPE) for the two models, as can be seen in Fig. 6. The purple bars represent the MAPE of the ARIMA model, and the magenta bars represent the MAPE of the WaveNet model.
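The comparison metric can be computed per horizon as the mean absolute error of the k-step-ahead forecasts over all forecast origins. A sketch follows; the array layout (origins by horizons) is an assumption for illustration.

```python
import numpy as np

def mape_by_horizon(actuals, forecasts):
    """Mean absolute prediction error per forecast horizon.
    actuals[i, k] and forecasts[i, k] hold the realized value and the
    (k+1)-step-ahead forecast made at forecast origin i."""
    return np.abs(actuals - forecasts).mean(axis=0)

# Tiny worked example with two origins and two horizons.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
f = np.array([[1.5, 1.0], [2.5, 5.0]])
errors = mape_by_horizon(a, f)
```

Plotting `errors` for each model over horizons 1 to 10 reproduces the kind of bar comparison shown in Fig. 6.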
Fig. 7 Simulation of a deterministic harmonic oscillator—WaveNet with two layers with dilations up to 8
It is possible to see that, for three or more steps ahead, our
architecture is performing better than the traditional ARIMA
model.
4.2 Simulation
To test the simulation capabilities of this architecture, we
show first the results of the deterministic processes simula-
tions and, afterwards, the results of the stochastic processes
simulations. For visibility purposes, in the case of the logistic
map (Fig. 10), we show only the first 90 observations out-
of-sample. It is also worth mentioning that, for modelling
deterministic processes, we have used the numpy.argsort()
method (present in the NumPy package for Python), to
generate deterministic choices of the Softmax layer output,
obeying the magnitude of the probabilities.
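The two decoding modes used in this work (deterministic via numpy.argsort() for the deterministic processes, stochastic via numpy.random.choice() for the stochastic ones) can be sketched as follows, with a random probability vector standing in for one Softmax output; the Dirichlet stand-in is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(256))  # stand-in for one Softmax output

# Deterministic decoding: take the most probable class, e.g. the last
# index returned by argsort (equivalent to argmax).
det_class = int(np.argsort(probs)[-1])

# Stochastic decoding: draw a class according to the Softmax
# distribution, so repeated runs yield distinct realizations.
sto_class = int(rng.choice(256, p=probs))
```

The deterministic mode turns the network into an ODE/difference-equation solver, while the stochastic mode makes it a Monte Carlo-style simulator of the learnt process.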
In Figs. 7, 8, 9 and 10, dashed green lines represent observations used for back-testing purposes; red lines represent outputs predicted by the model; and dashed black lines represent the original signal used to train the neural network.
Given that, it is possible to see that the WaveNet architecture is able to capture and fairly reproduce the main dynamics of the processes in Figs. 7 (Harmonic Oscillator) and 8 (Damped Harmonic Oscillator).
It is also interesting to notice that, in Fig. 10 (Chaotic Logistic Map), up to the tenth observation the model is able to obtain precise out-of-sample forecasts of the data, while in Fig. 9 (Lorenz system), given the nonlinear multivariable nature of the system, the best out-of-sample guess is an average of the past occurrences.
Fig. 8 Simulation of a deterministic damped harmonic oscillator—WaveNet with nine layers with dilations up to 4
Fig. 9 Simulation of a deterministic Lorenz system—WaveNet with three layers with dilations up to 256
That said, we repeat the same experiment with the five proposed stochastic processes. However, instead of only plotting the time series, we also plot the distribution of the structural parameters of the simulated series, in Figs. 12, 13, 14 and 15 (except for the jump-diffusion process, plotted in Fig. 11, whose parameter estimation would require substantial effort, extrapolating the scope of the present work). Note that the different colour lines in the process value plots in Figs. 12, 13, 14 and 15 represent distinct realizations of the simulated process.
Fig. 10 Simulation of a deterministic chaotic logistic map—WaveNet with five layers with dilations up to 2

Fig. 11 Simulation of a jump-diffusion process—WaveNet with five layers with dilations up to 4

Our hypothesis is supported by the fact that, if the structural parameters estimated from the simulated series are compatible with the original ones (which we can verify, given that we know the true data-generating process parameters), the simulated process will be compatible with the original one.
Hence, in Fig. 11, we first show a jump-diffusion process with a negative drift and, thus, a low probability of occurrence of high-intensity positive jumps. It is also worth noticing that, in Figs. 12, 13, 14 and 15, the median values are plotted in red and the true parameter values are plotted in blue, so a direct comparison between the true known structural parameter values and the ones estimated from the simulated series can be done.
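The kind of check reported in Figs. 12, 13, 14 and 15 (re-estimating structural parameters from simulated paths and comparing them with the true ones) can be sketched for the mean-reverting diffusion of Eq. (9): on an Euler discretization, regressing the increments on the level recovers theta and mu by OLS. Only the true mean reversion speed of 0.1 is stated in the paper; the remaining parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ou(theta=0.1, mu=5.0, sigma=0.5, dt=1.0, n=2000, x0=5.0):
    """Euler scheme for dX = theta*(mu - X) dt + sigma dW (Eq. 9)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = (x[t - 1] + theta * (mu - x[t - 1]) * dt
                + sigma * np.sqrt(dt) * rng.standard_normal())
    return x

def estimate_ou(x, dt=1.0):
    """OLS on the discretized model: dX = a + b*X + noise,
    so that theta = -b / dt and mu = -a / b."""
    b, a = np.polyfit(x[:-1], np.diff(x), 1)
    return -b / dt, -a / b

theta_hat, mu_hat = estimate_ou(simulate_ou())
```

Repeating this over many simulated paths yields histograms of `theta_hat` and `mu_hat` analogous to the frequency plots in Fig. 12.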
Fig. 12 Simulation and inference of a mean-reverting diffusion process—true mean reversion speed parameter = 0.1—blue line not shown (Color figure online)
Fig. 13 Simulation and inference of an AR(1) process—WaveNet with five layers with dilations up to 4
To generate the different realizations of the processes (represented by different colours), 100 simulations of each process were carried out, as in a Monte Carlo approach, using the numpy.random.choice() method (from the NumPy package for Python) to generate random choices that obey a given distribution, which is returned by the Softmax layer of WaveNet.
Fig. 14 Simulation and inference of an ARMA(1, 1) process—WaveNet with five layers with dilations up to 4

Fig. 15 Simulation and inference of an ARCH(1) process—WaveNet with five layers with dilations up to 4

In the first three linear models (Figs. 12, 13, 14), it is possible to verify that the results are very reasonable, given the fact that the networks have learnt from only one realization
of each stochastic process (despite a large sample), and the parameters' deviations from the true values are not large. Also, it is important to state that all experiments used training sets whose first samples were equal to zero, which explains some of the valleys found in the simulations. In Fig. 15, we can also verify that the structural characteristics of the data generation process are compatible with the true parameters.
5 Conclusion
In the present work, we aimed to establish a new way of simulating data generation processes that avoids the traditional identification and estimation procedures. The proposed technique is based on a convolutional neural network, namely the WaveNet architecture, of which we estimate only the hyperparameters, in this case: the number of convolutional layers, the discretization scheme (encoding) and the dilations.
To accomplish this, we have simulated different deterministic and stochastic processes, using an existing implementation of the WaveNet architecture, adapted for this specific purpose, in conjunction with the R Statistical Package and the NumPy package. On top of these different simulations, we show that the generated data are compatible with the original data generation processes to a fairly wide extent, making this a potentially attractive tool that can be employed in several different research areas, such as computer vision, robotics, and time-series classification and forecasting.
In this regard, this approach can also serve as an alternative to simulation tools such as Gibbs sampling or Monte Carlo-based methods, to enhance the understanding of generative adversarial networks (GANs), or in data simulation for training Reinforcement Learning algorithms.
As a perspective for future work and research, we suggest following up these experiments with more complex stochastic processes and their respective generalizations to 2 and 3 dimensions. Moreover, given the high computational cost of the training procedures (all of them were done using only one GPU in TensorFlow), it is crucial to develop an information criterion for this kind of model, in order to guide and speed up the choice of hyperparameters. Extending the architecture to multivariate processes is also a desirable path towards new interesting results.
References
Aibinu, A. M., Salami, M. J. E., & Shafie, A. A. (2012). Artificial neural
network based autoregressive modeling technique with application
in voice activity detection. Engineering Applications of Artificial
Intelligence,25(6), 1265–1276.
Avishek, P., & Prakash, P. (2017). Practical time-series analysis: Mas-
ter time series data processing, visualization, and modeling using
python (1st ed.). Birmingham: Packt Publishing Ltd.
Bahrpeyma, F., Roantree, M., & McCarren, A. (2018). Multistep-ahead prediction: A comparison of analytical and algorithmic approaches. In Big data analytics and knowledge discovery: 20th international conference, DaWaK 2018.
Bishop, C. M. (2006). Pattern recognition and machine learning. Cambridge: Springer.
Borovykh, A., Bohte, S., & Oosterlee, C. W. (2017). Conditional
time series forecasting with convolutional neural networks.
arXiv:1703.04691v3.
Bresler, Y., & Macovski, A. (1986). Exact maximum likelihood parame-
ter estimation of superimposed exponential signals in noise. IEEE
Transactions on Acoustics, Speech, and Signal Processing,34,
1081–1089.
Briandet, R., Kemsley, E. K., & Wilson, R. H. (1996). Discrimination of
Arabica and Robusta in instant coffee by Fourier transform infrared
spectroscopy and chemometrics. Journal of Agricultural and Food
Chemistry,44, 170–174.
Bruni, V., Vitulano, D., & Kokaram, A. (2003). Line scratches detection
and restoration via light diffraction. In 3rd International sympo-
sium on image and signal processing and analysis, Rome, Italy
(Vol. 1, pp. 5–10).
Cheng, X., Chen, X., & Mallat, S. (2016). Deep Haar scattering networks. Information and Inference: A Journal of the IMA, 5, 105–133.
Chinchali, S., Hu, P., Chu, T., Sharma, M., Bansal, M., Misra, R.,
Pavone, M., & Katti, S. (2018). Cellular network traffic scheduling
with deep reinforcement learning. AAAI.
Crum, W. R., Hartkens, T., & Hill, D. L. G. (2004). Non-rigid image registration: Theory and practice. The British Journal of Radiology, 77(2), 140–153.
Davis, L. M. (2013). Predictive modelling of bone ageing. Unpublished
doctoral dissertation, University of East Anglia.
de Marcelo, S. P., et al. (2017). Fault identification in doubly fed induc-
tion generator using FFT and neural networks. Journal of Control,
Automation and Electrical Systems,28, 228–237.
Dosiek, L., Pierre, J. W., & Follum, J. (2013). A recursive maximum
likelihood estimator for the online estimation of electromechanical
modes with error bounds. IEEE Transactions on Power Systems,
28(1), 441–451.
Dubois, S. R., & Glanz, F. H. (1986). An autoregressive model approach
to two-dimensional shape classification. IEEE Transactions PAMI,
8(1), 55–66.
Dulakshi, S. K., & Liong, S. Y. (2006). Chaotic time series prediction
with a global model: Artificial neural network. Journal of Hydrol-
ogy,323(1–4), 92–105.
Giordano, F., Rocca, M., & Perna, C. (2008). Neural network sieve
bootstrap prediction intervals: Some real data evidence. In New
directions in neural networks: 18th Italian workshop on neural
networks: WIRN 2008.
Glaessgen, E. H., & Stargel, D. S. (2012). The digital twin paradigm for future NASA and U.S. air force vehicles. In 53rd Structures, structural dynamics, and materials conference.
Gregor, K., Danihelka, I., Mnih, A., Blundell, C., & Wierstra, D. (2014).
Deep autoregressive networks. PMLR,32(2), 1242–1250.
Grenander, U., & Miller, M. I. (1994). Representations of knowledge in
complex systems. Journal of the Royal Statistical Society B,56(4),
549–603.
Hayashi, F. (2000). Econometrics. Princeton, NJ: Princeton University
Press.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv:1512.03385v1.
Jalba, A. C., Wilkinson, M. H. F., & Roerdink, J. B. T. M. (2004).
Automatic segmentation of diatom images for classification.
Microscopy Research and Technique,65, 72–85.
Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., & Wu, Y. (2016). Exploring the limits of language modeling. arXiv:1602.02410v2.
Keogh, E., Wei, L., Xi, X., Vlachos, M., Lee, S., & Protopapas, P. (2006). LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases (pp. 882–893).
Klein, A., et al. (2009). Evaluation of 14 nonlinear deformation algo-
rithms applied to human brain MRI registration. NeuroImage,
46(3), 786–802.
Kolman, E., & Margaliot, M. (2007). Knowledge extraction from neural networks using all-permutation fuzzy rule base. IEEE Transactions on Neural Networks, 18, 925–931.
Kou, S. G. (2002). A jump-diffusion model for option pricing. Man-
agement Science,48(8), 1086–1101.
Lee, J., Bagheri, B., & Kao, H.-A. (2015). A cyber-physical systems architecture for industry 4.0-based manufacturing systems. Manufacturing Letters, 3, 18–23. https://doi.org/10.1016/j.mfglet.2014.12.001.
Li, L., McCann, J., Pollard, N., & Faloutsos, C. (2009). DynaMMo:
mining and summarization of coevolving sequences with missing
values. KDD’09. Paris, France.
Luong, Q.-T., Weber, J., Koller, D., & Malik, J. (1995). An integrated
stereo-based approach to automatic vehicle guidance. In Proceed-
ings of IEEE International Conference on Computer Vision.
Matsuda, K. (2004). Introduction to Merton jump diffusion model.
Working Paper.
May, R. M. (1976). Simple mathematical models with very complicated
dynamics. Nature,261, 459–467.
Meer, P., Mintz, D., Rosenfeld, A., & Kim, D. Y. (1991). Robust regression methods for computer vision: A review. International Journal of Computer Vision, 6, 59.
Monteiro, N. A. B., Silva, J. J., & Neto, J. S. R. (2019). Soft sensors to
monitoring a multivariate nonlinear process using neural networks.
Journal of Control, Automation and Electrical Systems,30, 54–62.
Neto, F. F. (2017). Building function approximators on top of Haar
scattering networks. Working Paper on Research Net.
Nowicka-Zagrajek, J., & Weron, R. (2002). Modeling electricity loads
in California: ARMA models with hyperbolic noise. Signal Pro-
cessing,82, 1903–1915.
Okuma, K., Taleghani, A., de Freitas, N., Little, J. J., & Lowe, D. G.
(2004). A boosted particle filter: Multitarget detection and track-
ing. In Computer vision—ECCV (pp. 28–39).
Pareek, N. K., Patidar, V., & Sud, K. K. (2006). Image encryption using
chaotic logistic map. Image and Vision Computing,24, 926–934.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359.
Soofi, A. S., & Cao, L. (2002). Modelling and forecasting financial
data: techniques of nonlinear dynamics. New York: Springer.
Specht, J. I., Dimieri, L., Urdapilleta, E., & Gasaneo, G. (2017). Min-
imal dynamical description of eye movements. The European
Physical Journal B,90, 25.
Tak, Y. S., & Hwang, E. (2007). A leaf image retrieval scheme based on
partial dynamic time warping and two-level filtering. In Seventh
international conference on computer and information technology.
Tang, W. K., & Wong, Y. K. (2000). Simulation of manufacturing pro-
cesses using ARMA and ARMAX models. Advances in Modelling
and Analysis B,43, 1–2.
Tao, F., Cheng, J., Qi, Q., Zhang, M., Zhang, H., & Sui, F. (2017). Digital twin-driven product design, manufacturing and service with big data. The International Journal of Advanced Manufacturing Technology, 94(9–12), 3563–3576. https://doi.org/10.1007/s00170-017-0233-1.
Thirion, J.-P. (1998). Image matching as a diffusion process: An analogy
with Maxwell’s demons. Medical Image Analysis,2(3), 243–260.
Vakulenko, S., Radulescu, O., Morozov, I., & Weber, A. (2017). Central-
ized networks to generate human body motions. Sensors (Basel),
17(12), E2907.
van den Oord, A., Dieleman, S., Espeholt, L., Vinyals, O., Graves, A.,
Kalchbrenner, N., & Kavukcuoglu, K. (2016a). Conditional image
generation with PixelCNN decoders. arXiv:1606.05328v2.
van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals,
O., Graves, A., Kalchbrenner, N., Senior, A., & Kavukcuoglu,
K. (2016b). WaveNet: a generative model for raw audio.
arXiv:1609.03499v2.
Wang, L., Hu, W., & Tan, T. (2003). Recent developments in human
motion analysis. Pattern Recognition,36(3), 585–601.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA
and neural network model. Neurocomputing,50, 159–175.
Zitová, B., & Flusser, J. (2003). Image registration methods: A survey. Image and Vision Computing, 21(11), 977–1000.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Most approaches to forecasting time series data employ one-step-ahead prediction approaches. However, recently there has been focus on multi-step-ahead prediction approaches. These approaches demonstrate enhanced prediction capabilities. However, multi-step-ahead prediction increases the complexity of the prediction process in comparison to one-step-ahead approaches. Typically, studies in the examination of multi-step ahead methods have addressed issues such as the increased complexity, inaccuracy, uncertainty, and error variance on the prediction horizon, and have been deployed in various domains such as finance, economics, agriculture and hydrology. When determining which algorithm to use in a time series analyses, the approach is to analyze the series for numerous characteristics and features, such as heteroscedasticity, auto-correlation, seasonality and stationarity. In this work, a comparative analysis of 20 different time series datasets is presented and a demonstration of the complexity in deciding which approach to use is given. The study investigates some of the main prediction approaches such as ARIMA (Autoregressive integrated moving average), NN (Neural Network), RNN (Recurrent neural network) and SVR (Support vector regression), which focus on the recursive prediction strategy and compare them to a new approach known as MRFA (Multi-Resolution Forecast Aggregation).