Long-term time series prediction with the NARX network: An empirical evaluation

October 2008
Neurocomputing 71(16-18):3335-3343

DOI:10.1016/j.neucom.2008.01.030

Authors:

Jose Maria Pires Menezes Jr

Universidade Federal do Piauí

Guilherme A. Barreto

Universidade Federal do Ceará

The NARX network is a dynamical neural architecture commonly used for input–output modeling of nonlinear dynamical systems. When applied to time series prediction, the NARX network is designed as a feedforward time delay neural network (TDNN), i.e., without the feedback loop of delayed outputs, reducing substantially its predictive performance. In this paper, we show that the original architecture of the NARX network can be easily and efficiently applied to long-term (multi-step-ahead) prediction of univariate time series. We evaluate the proposed approach using two real-world data sets, namely the well-known chaotic laser time series and a variable bit rate (VBR) video traffic time series. All the results show that the proposed approach consistently outperforms standard neural network based predictors, such as the TDNN and Elman architectures.

Long-Term Time Series Prediction with the

NARX Network: An Empirical Evaluation

José Maria P. Júnior and Guilherme A. Barreto

Department of Teleinformatics Engineering

Federal University of Ceará, Av. Mister Hull, S/N

CP 6005, CEP 60455-760, Fortaleza-CE, Brazil

josemenezesjr@gmail.com, guilherme@deti.ufc.br

Key words: NARX neural network, long-term prediction, nonlinear traffic modeling, chaotic time series, recurrence plot.

Preprint submitted to Elsevier Science 26 December 2007

1 Introduction

Artiﬁcial neural networks (ANNs) have been successfully applied to a number

of time series prediction and modeling tasks, including ﬁnancial time series

prediction [12], river ﬂow forecasting [3], biomedical time series modeling [11],

communication network traﬃc prediction [6,13,2], chaotic time series predic-

tion [42], among several others (see [34], for a recent survey). In particular,

when the time series is noisy and the underlying dynamical system is non-

linear, ANN models frequently outperform standard linear techniques, such

as the well-known Box-Jenkins models [7]. In such cases, the inherent nonlin-

earity of ANN models and a higher robustness to noise seem to explain their

better prediction performance.

In one-step-ahead prediction tasks, ANN models are required to estimate the

next sample value of a time series, without feeding it back to the model’s

input regressor. In other words, the input regressor contains only actual sample

points of the time series. If the user is interested in a longer prediction horizon,

a procedure known as multi-step-ahead or long-term prediction, the model’s

output should be fed back to the input regressor for a ﬁxed but ﬁnite number of

time steps [39]. In this case, the components of the input regressor, previously

composed of actual sample points of the time series, are gradually replaced by

previous predicted values.
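The feedback procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `recursive_forecast`, the argument `one_step` (any one-step-ahead predictor) and the toy predictor are assumptions introduced here, not part of the paper.

```python
import numpy as np

def recursive_forecast(one_step, history, d, horizon):
    """Iterate a one-step-ahead predictor, feeding each prediction back."""
    # The regressor holds the d most recent values, newest first.
    regressor = list(history[-d:][::-1])
    preds = []
    for _ in range(horizon):
        x_hat = one_step(np.asarray(regressor))
        preds.append(x_hat)
        # The newest prediction enters and the oldest value leaves, so
        # actual samples are gradually replaced by predicted ones.
        regressor = [x_hat] + regressor[:-1]
    return np.asarray(preds)

# Hypothetical toy predictor: x(n+1) = 0.5 * x(n).
toy = lambda r: 0.5 * r[0]
forecast = recursive_forecast(toy, np.array([1.0, 2.0, 4.0]), d=2, horizon=3)
```

After `d` iterations the regressor contains predicted values only, which is the autonomous (dynamic modeling) regime discussed next.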

If the prediction horizon tends to inﬁnity, from some time in the future the

input regressor will start to be composed only of estimated values of the time

series. In this case, the multi-step-ahead prediction task becomes a dynamic

modeling task, in which the ANN model acts as an autonomous system, trying

to recursively emulate the dynamic behavior of the system that generated

the nonlinear time series [17,18]. Multi-step ahead prediction and dynamic

modeling are much more complex to deal with than one-step-ahead prediction,

and it is believed that these are complex tasks in which ANN models play an

important role, in particular recurrent neural architectures [36].

Simple recurrent networks (SRNs) comprise a class of recurrent neural models

that are essentially feedforward in the signal-ﬂow structure, but also contain a

small number of local and/or global feedback loops in their architectures. Even

though feedforward MLP-like networks can be easily adapted to process time

series through an input tapped delay line, giving rise to the well-known Time

Delay Neural Network (TDNN) [36], they can also be easily converted to SRNs

by feeding back the neuronal outputs of the hidden or output layers, giving

rise to Elman and Jordan networks, respectively [23]. It is worth pointing

out that, when applied to long-term prediction, a feedforward TDNN model

will eventually behave as a kind of SRN architecture, since a global loop is

needed to feed back the current estimated value into the input regressor.

The aforementioned recurrent architectures are usually trained by means of

temporal gradient-based variants of the backpropagation algorithm [35]. How-

ever, learning to perform tasks in which the temporal dependencies present in

the input/output signals span long time intervals can be quite diﬃcult using

gradient-based learning algorithms [4]. In [27], the authors report that learn-

ing such long-term temporal dependencies with gradient-descent techniques is

more eﬀective in a class of SRN model called Nonlinear Autoregressive with

eXogenous input (NARX) [28] than in simple MLP-based recurrent models.

This occurs in part because the NARX model’s input vector is cleverly built

through two tapped-delay lines: one sliding over the input signal and

another sliding over the network’s output.

Despite the aforementioned advantages of the NARX network, its feasibility as

a nonlinear tool for univariate time series modeling and prediction has not been

fully explored yet. For example, in [29], the NARX model is indeed reduced to

the TDNN model in order to be applied to time series prediction. Bearing this

under-utilization of the NARX network in mind, we propose a simple strategy

based on Takens' embedding theorem that allows the original architecture of

the NARX network to be easily and eﬃciently applied to long-term prediction

of univariate nonlinear time series.

Potential ﬁelds of application of our approach are communication network

traﬃc characterization [45,14,16] and chaotic time series prediction [22], since

it has been shown that these kinds of data present long-range dependence

due to their self-similar nature. Thus, for the sake of illustration, we evaluate

the proposed approach using two real-world data sets obtained from these

domains, namely the well-known chaotic laser time series and a variable bit

rate (VBR) video traﬃc time series.

The remainder of the paper is organized as follows. In Section 2, we describe

the NARX network model and its main characteristics. In Section 3 we intro-

duce the basics of the nonlinear time series prediction problem and present our

approach. The simulations and discussion of results are presented in Section 4.

The paper is concluded in Section 5.

2 The NARX Network

The Nonlinear Autoregressive model with Exogenous inputs (NARX) [26,30,33]

is an important class of discrete-time nonlinear systems that can be mathe-

matically represented as

$$y(n+1) = f\left[y(n), \ldots, y(n-d_y+1);\; u(n-k), u(n-k-1), \ldots, u(n-d_u-k+1)\right], \tag{1}$$

where $u(n) \in \mathbb{R}$ and $y(n) \in \mathbb{R}$ denote, respectively, the input and output of the model at discrete time step $n$, while $d_u \geq 1$ and $d_y \geq 1$, $d_u \leq d_y$, are the input-memory and output-memory orders, respectively. The parameter $k$ ($k \geq 0$) is a delay term, known as the process dead-time.

Without loss of generality, we always assume $k = 0$ in this paper, thus obtaining the following NARX model:

$$y(n+1) = f\left[y(n), \ldots, y(n-d_y+1);\; u(n), u(n-1), \ldots, u(n-d_u+1)\right], \tag{2}$$

which may be written in vector form as

$$y(n+1) = f[\mathbf{y}(n); \mathbf{u}(n)], \tag{3}$$

where the vectors $\mathbf{y}(n)$ and $\mathbf{u}(n)$ denote the output and input regressors, respectively.

The nonlinear mapping $f(\cdot)$ is generally unknown and can be approximated, for example, by a standard multilayer Perceptron (MLP) network. The resulting connectionist architecture is then called a NARX network [10,32], a powerful class of dynamical models which has been shown to be computationally equivalent to Turing machines [38]. Figure 1 shows the topology of a two-hidden-layer NARX network.

Fig. 1. NARX network with $d_u$ delayed inputs and $d_y$ delayed outputs ($z^{-1}$ = unit time delay).

Training of the NARX network can be carried out in one of two modes:

• Series-Parallel (SP) Mode - In this case, the output regressor is formed only by actual values of the system's output:

$$\hat{y}(n+1) = \hat{f}[\mathbf{y}_{sp}(n); \mathbf{u}(n)] = \hat{f}\left[y(n), \ldots, y(n-d_y+1);\; u(n), u(n-1), \ldots, u(n-d_u+1)\right], \tag{4}$$

where the hat symbol (^) is used to denote estimated values (or functions).

• Parallel (P) Mode - In this case, estimated outputs are fed back and included in the output regressor (see footnote 1):

$$\hat{y}(n+1) = \hat{f}[\mathbf{y}_{p}(n); \mathbf{u}(n)] = \hat{f}\left[\hat{y}(n), \ldots, \hat{y}(n-d_y+1);\; u(n), u(n-1), \ldots, u(n-d_u+1)\right]. \tag{5}$$

Footnote 1: The NARX model in P-mode is also known as Output-Error Model [30].
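The difference between the two modes lies only in which signal fills the output regressor. A minimal Python sketch, assuming a hypothetical helper `narx_input` and toy signals that are not taken from the paper, could look as follows:

```python
import numpy as np

def narx_input(y_source, u, n, d_y, d_u):
    """Concatenate the output regressor [y(n), ..., y(n-d_y+1)] with the
    input regressor [u(n), ..., u(n-d_u+1)].

    Requires n >= max(d_y, d_u) - 1 so that all lags exist.
    """
    y_reg = y_source[n - d_y + 1 : n + 1][::-1]   # newest value first
    u_reg = u[n - d_u + 1 : n + 1][::-1]
    return np.concatenate([y_reg, u_reg])

u = np.arange(10, dtype=float)            # toy input signal
y = 10.0 + np.arange(10, dtype=float)     # toy measured outputs
y_hat = y + 0.5                           # stand-in for network estimates

# Series-parallel mode: the output regressor uses actual outputs.
sp_input = narx_input(y, u, n=5, d_y=3, d_u=2)
# Parallel mode: the output regressor uses estimated outputs.
p_input = narx_input(y_hat, u, n=5, d_y=3, d_u=2)
```

In a real P-mode training loop the estimates would be produced by the network itself at earlier time steps; the fixed offset used here only makes the two regressors easy to tell apart.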

As a tool for nonlinear system identiﬁcation, the NARX network has been suc-

cessfully applied to a number of real-world input-output modeling problems,

such as heat exchangers, waste water treatment plants, catalytic reforming

systems in a petroleum reﬁnery and nonlinear time series prediction (see [29]

and references therein).

As mentioned in the introduction, the particular topic of this paper is the

issue of nonlinear univariate time series prediction with the NARX network.

In this type of application, the output-memory order is usually set to $d_y = 0$, thus reducing the NARX network to the TDNN architecture [29], i.e.

$$y(n+1) = f[\mathbf{u}(n)] = f[u(n), u(n-1), \ldots, u(n-d_u+1)], \tag{6}$$

where $\mathbf{u}(n) \in \mathbb{R}^{d_u}$ is the input regressor. This simplified formulation of the

NARX network eliminates a considerable portion of its representational capa-

bilities as a dynamic network; that is, all the dynamic information that could

be learned from the past memories of the output (feedback) path is discarded.

For many practical applications, however, such as self-similar traﬃc model-

ing [16], the network must be able to robustly store information for a long

period of time in the presence of noise. In gradient-based training algorithms,

the fraction of the gradient due to information $n$ time steps in the past approaches zero as $n$ becomes large. This effect is called the problem of vanishing

gradient and has been pointed out as the main cause of the poor performance

of standard dynamical ANN models when dealing with long-range dependen-

cies.

The original formulation of the NARX network does not circumvent the prob-

lem of vanishing gradient, but it has been demonstrated that it often per-

forms much better than standard dynamical ANNs in such a class of problems,

achieving much faster convergence and better generalization performance [28].

As pointed out in [27], an intuitive explanation for this improvement in per-

formance is that the output memories of a NARX neural network are repre-

sented as jump-ahead connections in the time-unfolded network that is often

encountered in learning algorithms such as the backpropagation through time

(BPTT). Such jump-ahead connections provide shorter paths for propagat-

ing gradient information, reducing the sensitivity of the network to long-term

dependencies.

Hence, if the output memory is discarded, as shown in Equation (6), per-

formance improvement may no longer be observed. Bearing this in mind as a

motivation, we propose a simple strategy to allow the computational resources

of the NARX network to be fully explored in nonlinear time series prediction

tasks.

3 Nonlinear Time Series Prediction with NARX Network

In this section we provide a short introduction to the theory of embedding and state-space reconstruction. The interested reader is referred to [1] for further details.

The state of a deterministic dynamical system is the information necessary to

determine the evolution of the system in time. In discrete time, this evolution

can be described by the following system of diﬀerence equations:

$$\mathbf{x}(n+1) = \mathbf{F}[\mathbf{x}(n)], \tag{7}$$

where $\mathbf{x}(n) \in \mathbb{R}^{d}$ is the state of the system at time step $n$, and $\mathbf{F}[\cdot]$ is a nonlinear vector-valued function. A time series is a time-ordered set of measurements $\{x(n)\}$, $n = 1, \ldots, N$, of a scalar quantity observed at the output of the system. This observable quantity is defined in terms of the state $\mathbf{x}(n)$ of the underlying system as follows:

$$x(n) = h[\mathbf{x}(n)] + \varepsilon(n), \tag{8}$$

where $h(\cdot)$ is a nonlinear scalar-valued function and $\varepsilon$ is a random variable which accounts for modeling uncertainties and/or measurement noise. It is commonly assumed that $\varepsilon(n)$ is drawn from a Gaussian white noise process. It can be inferred immediately from Equation (8) that the observations $\{x(n)\}$ can be

seen as a projection of the multivariate state space of the system onto the

one-dimensional space. Equations (7) and (8) describe together the state-space

behavior of the dynamical system.

In order to perform prediction, one needs to reconstruct (estimate) as well

as possible the state space of the system using the information provided by

$\{x(n)\}$ only. In [40], Takens has shown that, under very general conditions, the

state of a deterministic dynamic system can be accurately reconstructed by a

time window of ﬁnite length sliding over the observed time series as follows:

$$\mathbf{x}_1(n) \triangleq [x(n), x(n-\tau), \ldots, x(n-(d_E-1)\tau)], \tag{9}$$

where $x(n)$ is the sample value of the time series at time $n$, $d_E$ is the embedding dimension and $\tau$ is the embedding delay. Equation (9) implements the delay

embedding theorem [22]. According to this theorem, a collection of time-lagged

values in a dE-dimensional vector space should provide suﬃcient information

to reconstruct the states of an observable dynamical system. By doing this,

we are indeed trying to unfold the projection back to a multivariate state

space whose topological properties are equivalent to those of the state space

that actually generated the observable time series, provided the embedding

dimension $d_E$ is large enough.
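As an illustration, the delay-coordinate vector of Equation (9) can be built with a few lines of NumPy. The function name `delay_embed` and the toy series are assumptions made for this sketch only.

```python
import numpy as np

def delay_embed(x, n, d_E, tau):
    """Return [x(n), x(n - tau), ..., x(n - (d_E - 1) * tau)]."""
    idx = n - tau * np.arange(d_E)        # time indices, newest first
    if idx[-1] < 0:
        raise ValueError("not enough past samples for this (d_E, tau)")
    return np.asarray(x)[idx]

# Toy series x(n) = n, so the lags can be read off directly.
x = np.arange(20, dtype=float)
state = delay_embed(x, n=15, d_E=7, tau=2)
```

With `d_E = 7` and `tau = 2` the reconstructed state spans 13 time steps of the series while using only 7 samples.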

The embedding theorem also provides a theoretical framework for nonlinear

time series prediction, where the predictive relationship between the current

state $\mathbf{x}_1(n)$ and the next value of the time series is given by the following equation:

$$x(n+1) = g[\mathbf{x}_1(n)]. \tag{10}$$

Once the embedding dimension $d_E$ and delay $\tau$ are chosen, one remaining

task is to approximate the mapping function g(·). It has been shown that a

feedforward neural network with enough neurons is capable of approximating

any nonlinear function to an arbitrary degree of accuracy. Thus, it can pro-

vide a good approximation to the function g(·) by implementing the following

mapping:

$$\hat{x}(n+1) = \hat{g}[\mathbf{x}_1(n)], \tag{11}$$

where $\hat{x}(n+1)$ is an estimate of $x(n+1)$ and $\hat{g}(\cdot)$ is the corresponding approximation of $g(\cdot)$. The estimation error, $e(n+1) = x(n+1) - \hat{x}(n+1)$, is

commonly used to evaluate the quality of the approximation.

If we set $\mathbf{u}(n) = \mathbf{x}_1(n)$ and $y(n+1) = x(n+1)$ in Equation (6), the nonlinear state-space reconstruction procedure can be interpreted as equivalent to the time series prediction problem, whose goal is to compute an estimate of $x(n+1)$. Thus, the only thing we have to do is

to train a TDNN model [36]. Once training is completed, the TDNN can be

used for predicting the next samples of the time series.

Despite the correctness of the TDNN approach, recall that it is derived from a

simpliﬁed version of the NARX network by eliminating the output memory. In

order to use the full computational abilities of the NARX network for nonlinear

time series prediction, we propose novel deﬁnitions for its input and output

regressors. Firstly, the input signal regressor, denoted by u(n), is deﬁned by

the delay embedding coordinates of Equation (9):

$$\mathbf{u}(n) = \mathbf{x}_1(n) = [x(n), x(n-\tau), \ldots, x(n-(d_E-1)\tau)], \tag{12}$$

where we set $d_u = d_E$. In words, the input signal regressor $\mathbf{u}(n)$ is composed of $d_E$ actual values of the observed time series, separated from each other by $\tau$ time steps.

Secondly, since the NARX network can be trained in two diﬀerent modes, the

output signal regressor y(n) can be written accordingly as:

$$\mathbf{y}_{sp}(n) = [x(n), \ldots, x(n-d_y+1)], \tag{13}$$

$$\mathbf{y}_{p}(n) = [\hat{x}(n), \ldots, \hat{x}(n-d_y+1)]. \tag{14}$$

Note that the output regressor for the SP-mode shown in Equation (13) contains $d_y$ past values of the actual time series, while the output regressor for the P-mode shown in Equation (14) contains $d_y$ past values of the estimated time series. For a suitably trained network, no matter under which training mode, these outputs are estimates of previous values of $x(n+1)$. Henceforth, NARX networks trained using the regression pairs $\{\mathbf{y}_{sp}(n), \mathbf{x}_1(n)\}$ and $\{\mathbf{y}_{p}(n), \mathbf{x}_1(n)\}$ are denoted by NARX-SP and NARX-P networks, respectively. These NARX networks implement the following predictive mappings, which can be visualized in Figures 2 and 3:

$$\hat{x}(n+1) = \hat{f}[\mathbf{y}_{sp}(n), \mathbf{u}(n)] = \hat{f}[\mathbf{y}_{sp}(n), \mathbf{x}_1(n)], \tag{15}$$

$$\hat{x}(n+1) = \hat{f}[\mathbf{y}_{p}(n), \mathbf{u}(n)] = \hat{f}[\mathbf{y}_{p}(n), \mathbf{x}_1(n)], \tag{16}$$

where the nonlinear function $\hat{f}(\cdot)$ can be readily implemented through an MLP trained with the plain backpropagation algorithm.

Fig. 2. Architecture of the NARX network during training in the SP-mode ($z^{-\tau}$ = $\tau$ unit time delays).

It is worth noting that Figures 2 and 3 correspond to the different ways the NARX network can be trained; that is, in SP-mode or in P-mode, respectively. During the testing phase, however, since long-term predictions are required, the predicted values should be fed back to both the input regressor $\mathbf{u}(n)$ and the output regressor $\mathbf{y}_{sp}(n)$ (or $\mathbf{y}_{p}(n)$), simultaneously. Thus, the resulting predictive model has two feedback loops, one for the input regressor and another for the output regressor, as illustrated in Figure 4.

Fig. 3. Architecture of the NARX network during training in the P-mode ($z^{-\tau}$ = $\tau$ unit time delays).

Thus, unlike the TDNN-based approach for the nonlinear time series predic-

tion problem, the proposed approach makes full use of the output feedback

loop. Equations (12) and (13) are valid only for one-step-ahead prediction

tasks. Again, if one is interested in multi-step-ahead or recursive prediction

tasks, the estimates $\hat{x}$ should also be inserted into both regressors in a recursive

fashion.
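The testing-phase procedure with two feedback loops can be sketched as follows. This is a minimal illustration under stated assumptions: `f_hat` stands for any trained mapping with signature `f_hat(y_reg, u_reg)`, and a single buffer serves both regressors because, during recursive prediction, both are filled from the same (partly predicted) series. None of these names come from the paper.

```python
import numpy as np

def narx_recursive_forecast(f_hat, x, d_E, tau, d_y, horizon):
    """Recursive prediction in which predictions feed both regressors.

    Requires len(x) > max((d_E - 1) * tau, d_y - 1).
    """
    buf = list(x)                 # actual samples, later extended by predictions
    n_start = len(buf)
    for _ in range(horizon):
        n = len(buf) - 1
        # Input regressor: d_E values separated by tau time steps.
        u = np.array([buf[n - i * tau] for i in range(d_E)])
        # Output regressor: d_y values at consecutive time steps.
        y = np.array([buf[n - i] for i in range(d_y)])
        buf.append(float(f_hat(y, u)))
    return np.array(buf[n_start:])

# Hypothetical stand-in for a trained network.
f_toy = lambda y, u: 0.5 * (y[0] + u[1])
preds = narx_recursive_forecast(f_toy, np.arange(10.0), d_E=3, tau=2, d_y=4, horizon=2)
```

Once the horizon exceeds the longest lag, every entry of both regressors is a predicted value and the model runs as an autonomous system.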

One may argue that, in addition to the parameters $d_E$ and $\tau$, the proposed approach introduces one more to be determined, namely, $d_y$. However, this parameter can be eliminated if we recall that, as pointed out in [18], the delay embedding of Equation (9) has an alternative form given by:

$$\mathbf{x}_2(n) \triangleq [x(n), x(n-1), \ldots, x(n-m+1)], \tag{17}$$

where $m$ is an integer defined as $m \geq \tau \cdot d_E$. By comparing Equations (13) and (17), we find that a suitable choice is given by $d_y \geq \tau \cdot d_E$, which also satisfies the necessary condition $d_y > d_u$. However, we have found by experimentation that a value chosen from the interval $d_E < d_y \leq \tau \cdot d_E$ is sufficient for achieving a prediction performance better than those achieved by conventional neural based time series predictors, such as the TDNN and Elman architectures.

Fig. 4. Common architecture for the NARX-P and NARX-SP networks during the testing (recursive prediction) phase.

Finally, the proposed approach is summarized as follows. A NARX network is defined so that its input regressor $\mathbf{u}(n)$ contains samples of the measured variable $x(n)$ separated by $\tau$ ($\tau > 0$) time steps from each other, while the output regressor $\mathbf{y}(n)$ contains actual or estimated values of the same variable, but sampled at consecutive time steps. As training proceeds, these estimates should become more and more similar to the actual values of the time series, indicating convergence of the training process. Thus, it is interesting to note that the input regressor supplies medium- to long-term information about the dynamical behavior of the time series, since the delay $\tau$ is usually larger than unity, while the output regressor, once the network has converged, supplies short-term information about the same time series.

4 Simulations and Discussion

In this paper, our aim is to evaluate, in qualitative and quantitative terms,

the predictive ability of the NARX-P and NARX-SP networks using two real-

world data sets, namely the chaotic laser and the VBR video traﬃc time series.

For the sake of completeness, a performance comparison with the TDNN and

Elman recurrent networks is also carried out.

It is worth emphasizing that our goal in the experiments is to evaluate if the

output regressor ysp (or yp) in the input layer of the NARX network improves

its prediction performance. Thus, to facilitate the performance comparison,

all the networks we simulate have two hidden layers and one output neuron.

All neurons in both hidden layers and the output neuron use hyperbolic tan-

gent activation functions. The standard backpropagation algorithm is used to

train the networks with learning rate equal to 0.001 (selected heuristically).

No momentum term is used. As for the Elman network, only the

neuronal outputs of the ﬁrst hidden layer are fed back to the input layer.

The numbers of neurons, $N_{h,1}$ and $N_{h,2}$, in the first and second hidden layers, respectively, are equal for all simulated networks. These values are chosen according to the following heuristic rules [31]:

$$N_{h,1} = 2 d_E + 1 \quad \text{and} \quad N_{h,2} = \sqrt{N_{h,1}}, \tag{18}$$

where $N_{h,2}$ is rounded up to the next integer. The first rule is motivated by Kolmogorov's theorem on function approximation [19]. The second rule simply states that the number of neurons in the second hidden layer is the square root of the product of the dimension of the first hidden layer and the dimension of the output layer. Finally, we set $d_y = 2\tau d_E$, where $\tau$ is selected as the value occurring at the first minimum of the mutual information function of the time series [15].

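The total number $M$ of adjustable parameters (weights and thresholds) for each of the simulated networks is given by:

$$\begin{aligned}
M &= (d_E + 1) \cdot N_{h,1} + (N_{h,1} + 2) \cdot N_{h,2} + 1 && \text{(TDNN)} \\
M &= (N_{h,1} + d_E + 1) \cdot N_{h,1} + (N_{h,1} + 2) \cdot N_{h,2} + 1 && \text{(ELMAN)} \\
M &= (d_E + d_y + 1) \cdot N_{h,1} + (N_{h,1} + 2) \cdot N_{h,2} + 1 && \text{(NARX)}
\end{aligned} \tag{19}$$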
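The sizing rules and parameter counts above are easy to check numerically. The sketch below simply transcribes Equations (18) and (19) together with the choice $d_y = 2\tau d_E$; the function names are introduced here for illustration only.

```python
import math

def hidden_sizes(d_E):
    """Heuristic hidden-layer sizes of Equation (18)."""
    n_h1 = 2 * d_E + 1
    n_h2 = math.ceil(math.sqrt(n_h1))     # rounded up to the next integer
    return n_h1, n_h2

def parameter_counts(d_E, tau):
    """Number of weights and thresholds per architecture, Equation (19)."""
    n_h1, n_h2 = hidden_sizes(d_E)
    d_y = 2 * tau * d_E                   # output-memory order used in the experiments
    tail = (n_h1 + 2) * n_h2 + 1          # second hidden layer plus output neuron
    return {
        "TDNN": (d_E + 1) * n_h1 + tail,
        "Elman": (n_h1 + d_E + 1) * n_h1 + tail,
        "NARX": (d_E + d_y + 1) * n_h1 + tail,
    }

counts = parameter_counts(d_E=7, tau=2)
```

For $d_E = 7$ and $\tau = 2$, the settings used for the laser series, this gives 189, 414 and 609 parameters for the TDNN, Elman and NARX networks, respectively.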

Once a given network has been trained, it is required to provide estimates of the future sample values of a given time series for a certain prediction horizon $N$. The predictions are executed in a recursive fashion until the desired prediction horizon is reached, i.e., during $N$ time steps the predicted values are fed back in order to take part in the composition of the regressors. The networks are evaluated in terms of the Normalized Mean Squared Error (NMSE),

$$\mathrm{NMSE}(N) = \frac{1}{N \cdot \hat{\sigma}_x^2} \sum_{n=1}^{N} \left(x(n+1) - \hat{x}(n+1)\right)^2, \tag{20}$$

where $x(n+1)$ is the actual value of the time series, $\hat{x}(n+1)$ is the predicted value, $N$ is the prediction horizon (i.e., how many steps into the future a given network has to predict), and $\hat{\sigma}_x^2$ is the sample variance of the actual time series.

The NMSE values are averaged over 10 training/testing runs.
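A minimal NumPy sketch of the NMSE computation is given below. It assumes that, when no variance is supplied, the variance is estimated from the same `actual` segment with NumPy's default (population) normalization; the paper does not state which segment or normalization was used, so the optional `variance` argument allows a different estimate to be passed in.

```python
import numpy as np

def nmse(actual, predicted, variance=None):
    """Normalized mean squared error over the prediction horizon."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if variance is None:
        variance = np.var(actual)         # assumption: variance of this segment
    return np.mean((actual - predicted) ** 2) / variance

perfect = nmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
mean_only = nmse([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

Under this normalization, a predictor that always outputs the series mean scores an NMSE of 1, so values well below 1 indicate that the model captures more than the average level of the series.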

Chaotic laser time series - The ﬁrst data sequence to be used to evaluate

the NARX-P and NARX-SP models is the chaotic laser time series [42]. This

time series comprises measurements of the intensity pulsations of a single-mode Far-Infrared-Laser NH$_3$ in a chaotic state [21]. It was made available

worldwide during a time series prediction competition organized by the Santa

Fe Institute and, since then, has been used in benchmarking studies.

The laser time series has 1500 points which have been rescaled to the range

[−1,1]. The rescaled time series was further split into two sets for the pur-

pose of performing 1-fold cross-validation, so that the ﬁrst 1000 samples were

used for training and the remaining 500 samples for testing. The embedding

dimension was estimated as $d_E = 7$ by applying Cao's method [8], which is a variant of the well-known false nearest neighbors method (see footnote 2). The embedding

delay was estimated as τ= 2. For the chosen parameters, the total number of

modiﬁable weights and biases for the three simulated neural architectures are

the following: M= 189 (TDNN), M= 414 (Elman) and M= 609 (NARX).

The results are shown in Figures 5(a), 5(b) and 5(c) for the NARX-SP, Elman and TDNN networks, respectively (see footnote 3).

Footnote 2: A recent technique for the estimation of $d_E$ can be found in [25].

Fig. 5. Results for the laser series (predicted and original values versus time, 500 test samples): (a) NARX-SP, (b) Elman, (c) TDNN.

A visual inspection illustrates clearly that

the NARX-SP model performed better than the other two architectures. It is

important to point out that a critical situation occurs around time step 60,

where the laser intensity collapses suddenly from its highest value to its lowest

one; then, it starts recovering the intensity gradually. The NARX-SP model

is able to emulate the laser dynamics very closely. The Elman network performed well until the critical point. From this point onwards, it was unable to

emulate the laser dynamics faithfully, i.e., the predicted laser intensities have

much lower amplitudes than the actual ones. The TDNN network had a very

poor predictive performance. From a dynamical point of view the output of

the TDNN seems to be stuck in a limit cycle, since it only oscillates endlessly.

It is worth mentioning that the previous results do not mean that the TDNN and Elman networks cannot learn the dynamics of the chaotic laser. Indeed, this was shown to be possible in [18] using sophisticated training algorithms, such as backpropagation through time (BPTT) [43] or real-time recurrent learning (RTRL) [44]. Regarding the TDNN network, our results confirm the observations reported by Eric Wan [41, p. 62] in his PhD thesis. There, he states that the standard MLP, using the input regressor $\mathbf{x}_1(n)$ only and trained with the instantaneous gradient descent rule, was unable to accurately predict the laser time series. In his own words, “the downward intensity collapse went completely undetected,” as in our case.

Footnote 3: The results for the NARX-P network are not shown since they are equivalent to those shown for the NARX-SP network.

In sum, our results show that under the same conditions, i.e. with the same

number of hidden neurons, using the standard gradient-based backpropagation

algorithm, a short time series for training, and the same number of training

epochs, the NARX-SP network performs better than the TDNN and Elman

networks. It seems that the presence of the output regressor $\mathbf{y}_{sp}$ indeed improves the predictive power of the NARX network.

For the sake of comparison, under similar training and network evaluation

methodologies, the FIR-MLP model proposed by Eric Wan [41] achieved very

good long-term prediction results on the laser time series, which are equivalent

to those obtained by the NARX-SP network. However, the FIR-MLP required

M= 1105 adjustable parameters to achieve such a good performance, while

the NARX-SP model required roughly half the number of parameters (i.e.

M= 609).

The long-term predictive performances of all networks can be assessed in more

quantitative terms by means of NMSE curves. Figure 6(a) shows the evolution

of NMSE as a function of the prediction horizon N. It is worth emphasizing two

types of behavior in this ﬁgure. Below the critical time step (i.e. N < 60), the

NMSE values reported are approximately the same, with a small advantage to

the Elman network. This means that, until the critical point is reached, all networks predict the time series well. For $N > 60$, the NARX-P and NARX-SP models reveal their superior performance. Figure 6(b) shows the evolution of the variance of the predicted values with $N$. Note that the highest values of the variance occur around $N = 60$. Before this point the NARX-SP network provides the smallest variances among all models. For $N > 60$, the variances obtained for the Elman network are of the same order of magnitude as those generated by the NARX-SP network; however, the latter provides much more accurate estimates than the former, as shown in Figure 6(a).

Fig. 6. (a) Multi-step-ahead NMSE values and (b) the variances of the predicted values for the laser time series.

A useful way to qualitatively evaluate the performance of the NARX-SP net-

work for the laser series is through recurrence plots [9]. These diagrams de-

scribe how a reconstructed state-space trajectory recurs or repeats itself, being

useful for characterization of a system as random, periodic or chaotic. For example, random signals tend to occupy the whole area of the plot, indicating

that no value tends to repeat itself. Any structured information embedded

in a periodic or chaotic signal is reﬂected in a certain visual pattern in the

recurrence plot.

Recurrence plots are built by calculating the distance between two points in the state-space at times $i$ (horizontal axis) and $j$ (vertical axis):

$$\delta_{ij} = \|\mathbf{D}(i) - \mathbf{D}(j)\|, \tag{21}$$

where $\|\cdot\|$ is the Euclidean norm. The state vectors $\mathbf{D}(n) = [\hat{x}(n), \hat{x}(n-\tau), \ldots, \hat{x}(n-(d_E-1)\tau)]$ are built using the points of the predicted time series. Then, a dot is placed at the coordinate $(i, j)$ if $\delta_{ij} < r$. In this paper, we set $r = 0.4$ and the prediction horizon to $N = 200$.
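The construction of a recurrence plot can be sketched with NumPy as shown below. The function name `recurrence_matrix` and the toy periodic signal are assumptions for illustration; the resulting boolean matrix marks the coordinates where a dot would be drawn.

```python
import numpy as np

def recurrence_matrix(x, d_E, tau, r):
    """Boolean matrix R with R[i, j] = True when ||D(i) - D(j)|| < r."""
    x = np.asarray(x, dtype=float)
    start = (d_E - 1) * tau               # first time index with a full state vector
    # Column i holds the series delayed by i * tau; row k is the state at n = start + k.
    states = np.stack(
        [x[start - i * tau : len(x) - i * tau] for i in range(d_E)], axis=1
    )
    # Pairwise Euclidean distances between all state vectors.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=2)
    return dists < r

# Toy period-2 signal: states recur every second time step.
signal = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
R = recurrence_matrix(signal, d_E=2, tau=1, r=0.4)
```

For a periodic signal such as this one the matrix shows a regular checkerboard-like pattern, whereas a random signal would scatter isolated dots over the whole plot. The pairwise distance array needs memory quadratic in the series length, which is acceptable for horizons of a few hundred samples.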

The results are shown in Figure 7. It can be easily seen that the recurrence plots shown in Figures 7(a) and 7(b) are the ones most similar to each other, indicating that the NARX-SP network reproduced the original state-space trajectory more faithfully.

VBR video traﬃc time series - Due to the widespread use of Internet

and other packet/cell switching broad-band networks, variable bit rate (VBR)

video traffic will certainly be a major part of the traffic produced by multimedia sources. Hence, many researchers have focused on VBR video traffic prediction to devise network management strategies that satisfy QoS requirements. From the point of view of modeling, a particularly challenging issue in network

traﬃc prediction comes from the important discovery of self-similarity and

long-range dependence (LRD) in broad-band network traﬃc [24]. Researchers

have also observed that VBR video traﬃc typically exhibits burstiness over

multiple time scales (see [5,20], for example).

Fig. 7. Recurrence plots of (a) the original laser time series, and of the series produced by the (b) NARX-SP, (c) TDNN and (d) Elman networks.

In this section, we evaluate the predictive abilities of the NARX-P and NARX-SP networks using a VBR video traffic time series (trace) extracted from Jurassic Park, as described in [37]. This video traffic trace was encoded at the University of Würzburg with MPEG-I. The frame rates of the coded Jurassic Park video sequence have been used. The MPEG algorithm uses three different types of frames: Intraframe (I), Predictive (P) and Bidirectionally-Predictive (B). These three types of frames are organized as a group (Group of Pictures, GoP) defined by the distance L between I frames and the distance M between P frames. If the cyclic frame pattern is {IBBPBBPBBPBBI}, then L = 12 and M = 3. These values for L and M are used in this paper.
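As a small worked example of the GoP parameters, the following sketch recovers L and M from a frame pattern string. The helper name `gop_distances` is hypothetical and is introduced only for illustration.

```python
def gop_distances(pattern):
    """Return (L, M): spacing between consecutive I frames and between consecutive P frames."""
    i_pos = [k for k, frame in enumerate(pattern) if frame == "I"]
    p_pos = [k for k, frame in enumerate(pattern) if frame == "P"]
    L = i_pos[1] - i_pos[0]  # distance between I frames
    M = p_pos[1] - p_pos[0]  # distance between P frames
    return L, M

print(gop_distances("IBBPBBPBBPBBI"))  # (12, 3)
```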

The resulting time series has 2000 points which have been rescaled to the

range [−1,1]. The rescaled time series was further split into two sets for cross-

validation purposes: 1500 samples for training and 500 samples for testing.
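A minimal sketch of this preprocessing is given below. It assumes min-max rescaling and a chronological split (the first 1500 samples for training), neither of which is stated explicitly in the text. The synthetic `raw` series is only a stand-in for the real trace.

```python
def rescale(series, lo=-1.0, hi=1.0):
    """Linearly map the series so that its minimum becomes lo and its maximum hi."""
    x_min, x_max = min(series), max(series)
    if x_max == x_min:
        raise ValueError("a constant series cannot be rescaled")
    scale = (hi - lo) / (x_max - x_min)
    return [lo + (x - x_min) * scale for x in series]

def split(series, n_train):
    """Chronological split: the first n_train samples for training, the rest for testing."""
    return series[:n_train], series[n_train:]

# Synthetic stand-in for the 2000-point trace (assumption, not the real data).
raw = [float((37 * n) % 101) for n in range(2000)]
scaled = rescale(raw)
train, test = split(scaled, 1500)
```

Splitting chronologically keeps the test samples strictly after the training samples, which matters when the model is later asked to predict ahead in time.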

Fig. 8. Evaluation of the sensitivity of the neural networks with respect to (a) the embedding dimension and (b) the number of training epochs. Both panels plot the NMSE of the FTDNN, Elman, NARX-P and NARX-SP networks, against the order in (a) and the number of epochs in (b).

Evaluating the long-term predictive performance of all networks can also help assess the sensitivity of the neural models to important training parameters, such as the number of training epochs and the size of the embedding dimension, as shown in Figure 8.

Figure 8(a) shows the NMSE curves for all neural networks versus the value of the embedding dimension, $d_E$, which varies from 3 to 24. For this simulation we trained all the networks for 300 epochs, with $\tau = 1$ and $d_y = 24$. One can easily note that the NARX-P and NARX-SP performed better than the TDNN and Elman networks. In particular, the performance of the NARX-SP was rather impressive, in the sense that it remains constant throughout the studied range. For $d_E \geq 12$, the performances of the NARX-P and NARX-SP are practically the same. It is worth noting that the performances of the TDNN and Elman networks approach those of the NARX-P and NARX-SP networks when $d_E$ is of the same order of magnitude as $d_y$. This suggests that, for NARX-SP (or NARX-P) networks, we can select a small value for $d_E$ and still obtain very good performance.
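The error measure plotted in Figure 8 can be computed along the following lines. The sketch assumes the common definition of the NMSE, namely the mean squared prediction error divided by the variance of the original series. The function name `nmse` is illustrative.

```python
def nmse(actual, predicted):
    """Mean squared error normalized by the variance of the actual series (assumed definition)."""
    n = len(actual)
    mean = sum(actual) / n
    variance = sum((a - mean) ** 2 for a in actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return mse / variance
```

Under this definition a perfect predictor scores 0, while a predictor that always outputs the mean of the series scores 1, so values well below 1 indicate that the model captures more than the average level of the signal.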

Figure 8(b) shows the NMSE curves obtained from the simulated neural networks versus the number of training epochs, ranging from 90 to 600. For this simulation we trained all the networks with $\tau = 1$, $d_E = 12$ and $d_y = 2\tau d_E = 24$. Again, better performances were achieved by the NARX-P and NARX-SP. The performance of the NARX-SP is practically the same from 100 epochs on. The same behavior is observed for the NARX-P network from 200 epochs on. This can be explained by recalling that the NARX-P uses estimated values to compose the output regressor $y_p(n)$ and, because of that, it learns more slowly than the NARX-SP network.
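The difference between the two training modes can be illustrated by how the output regressor is assembled. The sketch below assumes the regressor layouts $[x(n), x(n-\tau), \ldots]$ for the input and $[v(n), v(n-1), \ldots, v(n-d_y+1)]$ for the output. The helper names and toy values are hypothetical.

```python
def input_regressor(x, n, d_e, tau):
    """u(n) = [x(n), x(n - tau), ..., x(n - (d_e - 1) * tau)] (assumed layout)."""
    return [x[n - k * tau] for k in range(d_e)]

def output_regressor(values, n, d_y):
    """[v(n), v(n - 1), ..., v(n - d_y + 1)] (assumed layout)."""
    return [values[n - k] for k in range(d_y)]

x = list(range(10))            # toy observed series
x_hat = [v + 0.5 for v in x]   # toy network estimates of the same samples

u = input_regressor(x, n=5, d_e=2, tau=2)
y_sp = output_regressor(x, n=5, d_y=3)      # NARX-SP: built from observed values
y_p = output_regressor(x_hat, n=5, d_y=3)   # NARX-P: built from estimated values
```

Because `y_p` carries the network's own errors back into its input during training, it is plausible that the NARX-P needs more epochs to settle, which is consistent with the behavior reported for Figure 8(b).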

Another important behavior can be observed for the TDNN and Elman networks. From 200 epochs onwards, the NMSE values of these networks increase instead of decreasing. We hypothesize that this behavior may be evidence of overfitting, a phenomenon observed when powerful nonlinear models with excessive degrees of freedom (too many adjustable parameters) are trained for a long period on a finite data set. In this sense, the results of Figure 8(b) strongly suggest that the NARX-SP and NARX-P networks are much more robust than the TDNN and Elman networks. In other words, the presence of an output regressor in the NARX-SP and NARX-P networks seems to make them less prone to overfitting than the Elman and TDNN models, even when the number of free parameters in the NARX networks is higher than in the Elman and TDNN models.
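To give a rough sense of the difference in the number of free parameters, the sketch below counts the weights and biases of an MLP with one hidden layer and a single output. The single hidden layer and the hidden size of 10 are hypothetical choices, since this section does not restate the layer sizes used in the simulations. It is also assumed that the NARX input concatenates the input and output regressors.

```python
def count_params(n_inputs, n_hidden):
    """Weights plus biases of an MLP with one hidden layer and one output neuron."""
    hidden = (n_inputs + 1) * n_hidden  # input-to-hidden weights and hidden biases
    output = n_hidden + 1               # hidden-to-output weights and output bias
    return hidden + output

d_e, d_y, n_hidden = 12, 24, 10               # n_hidden is a hypothetical value
tdnn_params = count_params(d_e, n_hidden)        # input regressor only
narx_params = count_params(d_e + d_y, n_hidden)  # input and output regressors
print(tdnn_params, narx_params)  # 141 381
```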

Finally, we show in Figures 9(a), 9(b) and 9(c) typical estimated VBR video traffic traces generated by the TDNN, Elman and NARX-SP networks, respectively. For this simulation, all the neural networks are required to predict recursively the sample values of the VBR video traffic trace for 300 steps ahead in time. For all networks, we have set $d_E = 12$, $\tau = 1$, $d_y = 24$ and trained the neural models for 300 epochs. For these training parameters, the NARX-SP predicted the video traffic trace much better than the TDNN and Elman networks.

Fig. 9. Recursive predictions obtained by the (a) TDNN, (b) Elman and (c) NARX-SP networks. Each panel plots the predicted and original traces (bits) against the frame number.
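Recursive (multi-step-ahead) prediction of the kind used for Figure 9 can be sketched as a loop that feeds each prediction back as an input for the next step. The code below is an illustration under stated assumptions: `one_step` stands for any trained one-step predictor, and the toy predictor used here is an exact linear recurrence for a sine wave, not a neural network.

```python
import math

def recursive_forecast(one_step, history, d_e, tau, horizon):
    """Iterate a one-step predictor, appending each prediction to the input buffer."""
    buf = list(history)
    preds = []
    for _ in range(horizon):
        # Input regressor [x(n), x(n - tau), ..., x(n - (d_e - 1) * tau)]
        u = [buf[-1 - k * tau] for k in range(d_e)]
        nxt = one_step(u)
        preds.append(nxt)
        buf.append(nxt)  # the estimate replaces the unknown future sample
    return preds

# Toy stand-in predictor: sin(w(n+1)) = 2 cos(w) sin(wn) - sin(w(n-1)).
w = 2 * math.pi / 20
toy_one_step = lambda u: 2 * math.cos(w) * u[0] - u[1]
history = [math.sin(w * n) for n in range(12)]
preds = recursive_forecast(toy_one_step, history, d_e=2, tau=1, horizon=300)
```

For a NARX network, the output regressor would be assembled from the same buffer in an analogous way. After the first few steps, every input to the model is itself a prediction, which is why small one-step errors can accumulate over a 300-step horizon.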

As we did for the laser time series, we again emphasize that the results reported in Figure 9 are not meant to imply that the TDNN and Elman networks can never predict the video traffic trace as well as the NARX-SP. They only mean that, for the same training and configuration parameters, the NARX-SP has greater computational power, provided by the output regressor. Recall that the MLP is a universal function approximator; hence any MLP-based neural model, such as the TDNN and Elman networks, is in principle able to approximate complex functions with arbitrary accuracy, once enough training epochs and data are provided.

5 Conclusions and Further Work

In this paper, we have shown that the NARX neural network can successfully use its output feedback loop to improve its predictive performance in complex time series prediction tasks. We used the well-known chaotic laser and real-world VBR video traffic time series to evaluate empirically the proposed approach in long-term prediction tasks. The results have shown that the proposed approach consistently outperforms standard neural network based predictors, such as the TDNN and Elman architectures.

We are currently evaluating the proposed approach on several other applications that require long-term predictions, such as electric load forecasting and financial time series prediction. Applications to signal processing tasks, such as communication channel equalization, are also being planned.

Acknowledgment

The authors would like to thank CNPq (grant #506979/2004-0), CAPES/PRODOC and FUNCAP for their financial support.

References

[1] H. D. Abarbanel, T. W. Frison, L. Tsimring, Obtaining order in a world of chaos, IEEE Signal Processing Magazine 15 (3) (1998) 49–65.

[2] A. F. Atiya, M. A. Aly, A. G. Parlos, Sparse basis selection: New results and application to adaptive prediction of video source traffic, IEEE Transactions on Neural Networks 16 (5) (2005) 1136–1146.

[3] A. F. Atiya, S. M. El-Shoura, S. I. Shaheen, M. S. El-Sherif, A comparison between neural-network forecasting techniques - case study: River flow forecasting, IEEE Transactions on Neural Networks 10 (2) (1999) 402–409.

[4] Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks 5 (2) (1994) 157–166.

[5] J. Beran, R. Sherman, M. S. Taqqu, W. Willinger, Long-range dependence in variable-bit-rate video traffic, IEEE Transactions on Communications 43 (2/3/4) (1995) 1566–1579.

[6] A. Bhattacharya, A. G. Parlos, A. F. Atiya, Prediction of MPEG-coded video source traffic using recurrent neural networks, IEEE Transactions on Signal Processing 51 (8) (2003) 2177–2190.

[7] G. Box, G. M. Jenkins, G. Reinsel, Time Series Analysis: Forecasting & Control, 3rd ed., Prentice Hall, 1994.

[8] L. Cao, Practical method for determining the minimum embedding dimension of a scalar time series, Physica D 110 (1–2) (1997) 43–50.

[9] M. C. Casdagli, Recurrence plots revisited, Physica D 108 (1) (1997) 12–44.

[10] S. Chen, S. A. Billings, P. M. Grant, Nonlinear system identification using neural networks, International Journal of Control 51 (6) (1990) 1191–1214.

[11] D. Coyle, G. Prasad, T. M. McGinnity, A time-series prediction approach for feature extraction in a brain-computer interface, IEEE Transactions on Neural Systems and Rehabilitation Engineering 13 (4) (2005) 461–467.

[12] S. Dablemont, G. Simon, A. Lendasse, A. Ruttiens, F. Blayo, M. Verleysen, Time series forecasting with SOM and local non-linear models - Application to the DAX30 index prediction, in: Proceedings of the 4th Workshop on Self-Organizing Maps (WSOM'03), 2003.

[13] A. D. Doulamis, N. D. Doulamis, S. D. Kollias, An adaptable neural network model for recursive nonlinear traffic prediction and modelling of MPEG video sources, IEEE Transactions on Neural Networks 14 (1) (2003) 150–166.

[14] A. Erramilli, M. Roughan, D. Veitch, W. Willinger, Self-similar traffic and network dynamics, Proceedings of the IEEE 90 (5) (2002) 800–819.

[15] A. M. Fraser, H. L. Swinney, Independent coordinates for strange attractors from mutual information, Physical Review A 33 (1986) 1134–1140.

[16] M. Grossglauser, J. C. Bolot, On the relevance of long-range dependence in network traffic, IEEE/ACM Transactions on Networking 7 (5) (1999) 629–640.

[17] S. Haykin, X. B. Li, Detection of signals in chaos, Proceedings of the IEEE 83 (1) (1995) 95–122.

[18] S. Haykin, J. C. Principe, Making sense of a complex world, IEEE Signal Processing Magazine 15 (3) (1998) 66–81.

[19] R. Hecht-Nielsen, Kolmogorov's mapping neural network existence theorem, in: Proceedings of the IEEE International Conference on Neural Networks, vol. 2, 1987.

[20] D. Heyman, T. Lakshman, What are the implications of long-range dependence for VBR video traffic engineering, IEEE/ACM Transactions on Networking 4 (1996) 301–317.

[21] U. Huebner, N. B. Abraham, C. O. Weiss, Dimensions and entropies of chaotic intensity pulsations in a single-mode far-infrared NH3 laser, Physical Review A 40 (11) (1989) 6354–6365.

[22] H. Kantz, T. Schreiber, Nonlinear Time Series Analysis, 2nd ed., Cambridge University Press, 2006.

[23] J. F. Kolen, S. C. Kremer, A Field Guide to Dynamical Recurrent Networks, Wiley-IEEE Press, 2001.

[24] W. E. Leland, M. S. Taqqu, W. Willinger, D. V. Wilson, On the self-similar nature of ethernet traffic (extended version), IEEE/ACM Transactions on Networking 2 (1) (1994) 1–15.

[25] A. Lendasse, J. Lee, V. Wertz, M. Verleysen, Forecasting electricity consumption using nonlinear projection and self-organizing maps, Neurocomputing 48 (1–4) (2002) 299–311.

[26] I. J. Leontaritis, S. A. Billings, Input-output parametric models for nonlinear systems - Part I: deterministic nonlinear systems, International Journal of Control 41 (2) (1985) 303–328.

[27] T. Lin, B. G. Horne, C. L. Giles, How embedded memory in recurrent neural network architectures helps learning long-term temporal dependencies, Neural Networks 11 (5) (1998) 861–868.

[28] T. Lin, B. G. Horne, P. Tino, C. L. Giles, Learning long-term dependencies in NARX recurrent neural networks, IEEE Transactions on Neural Networks 7 (6) (1996) 1424–1438.

[29] T. Lin, B. G. Horne, P. Tino, C. L. Giles, A delay damage model selection algorithm for NARX neural networks, IEEE Transactions on Signal Processing 45 (11) (1997) 2719–2730.

[30] L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1999.

[31] T. Masters, Practical Neural Network Recipes in C++, Academic Press, 1993.

[32] K. S. Narendra, K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Transactions on Neural Networks 1 (1) (1990) 4–27.

[33] M. Norgaard, O. Ravn, N. K. Poulsen, L. K. Hansen, Neural Networks for Modelling and Control of Dynamic Systems, Springer, 2000.

[34] A. K. Palit, D. Popovic, Computational Intelligence in Time Series Forecasting, 1st ed., Springer Verlag, 2005.

[35] B. A. Pearlmutter, Gradient calculations for dynamic recurrent neural networks: A survey, IEEE Transactions on Neural Networks 6 (5) (1995) 1212–1228.

[36] J. C. Principe, N. R. Euliano, W. C. Lefebvre, Neural Adaptive Systems: Fundamentals Through Simulations, John Wiley and Sons, 2000.

[37] O. Rose, Statistical properties of MPEG video traffic and their impact on traffic modeling in ATM systems, in: Proceedings of the 20th Annual IEEE Conference on Local Computer Networks (LCN'95), IEEE Computer Society, 1995.

[38] H. T. Siegelmann, B. G. Horne, C. L. Giles, Computational capabilities of recurrent NARX neural networks, IEEE Transactions on Systems, Man, and Cybernetics B-27 (2) (1997) 208–215.

[39] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, A. Lendasse, Methodology for long-term prediction of time series, Neurocomputing 70 (16–18) (2007) 2861–2869.

[40] F. Takens, Detecting strange attractors in turbulence, in: D. A. Rand, L.-S. Young (eds.), Dynamical Systems and Turbulence, vol. 898 of Lecture Notes in Mathematics, Springer, 1981.

[41] E. A. Wan, Finite impulse response neural networks with applications in time series prediction, Ph.D. thesis, Department of Electrical Engineering, Stanford University (1993).

[42] A. Weigend, N. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley, Reading, 1994.

[43] P. Werbos, Backpropagation through time: What it does and how to do it, Proceedings of the IEEE 78 (10) (1990) 1550–1560.

[44] R. J. Williams, D. Zipser, A learning algorithm for continually running fully recurrent neural networks, Neural Computation 1 (2) (1989) 270–280.

[45] H. Yousefi'zadeh, E. A. Jonckheere, Dynamic neural-based buffer management for queueing systems with self-similar characteristics, IEEE Transactions on Neural Networks 16 (5) (2005) 1163–1173.
