Neural Network Initialization with Prototypes -
A Case Study in Function Approximation
Jin-Song Pei
School of Civil Engineering and
Environmental Science
University of Oklahoma
Norman, OK 73019
E-mail: jspei@ou.edu
Joseph P. Wright
Division of Applied Science
Weidlinger Associates Inc.
New York, NY 10014
E-mail: wright@wai.com
Andrew W. Smyth
Department of Civil Engineering and
Engineering Mechanics
Columbia University
New York, NY 10027
E-mail: smyth@civil.columbia.edu
Abstract: The initialization of neural networks in function approximation has been studied by many researchers yet remains a challenging problem. Another important yet open issue in the neural network community is how to incorporate knowledge and hints into the training of a meaningful neural network. This study attempts to address these two issues in handling a specific type of engineering problem, namely, modeling nonlinear hysteretic restoring forces of a dynamic system under a specific formulation. The paper showcases a heuristic idea of using a growing technique through a prototype-based initialization, where insights into the governing mathematics/physics are related to the features of the activation functions.
I. INTRODUCTION
As an important issue in both theory and applications, the initialization of multilayer feedforward neural networks in function approximation has been studied by many researchers (e.g., [24], [7], [33], [3], [25], [31], [6], [12], [2], [14], [16], [13], [4], [35]), with a focus on fast convergence. A major lesson from this experience is that an efficient initialization (measured by the success and rate of convergence) has to rely on the features of the function to be approximated (e.g., [6], [15]).
In this study, the authors were prompted by the need in a
specific engineering application of neural networks to seek an
efficient as well as a meaningful solution to the initialization
of a multilayer feedforward neural network.
A heuristic approach is presented to initialize multilayer
feedforward neural networks for a special and critical class
of problems in engineering mechanics and numerous appli-
cations. Mathematically, a smooth function surface is desired
here based on some discrete input-output points. This is a
dual-goal pursuit; one is to find an initial point to start the
training of a multilayer feedforward neural network to enable
convergence and fast training, and the other is to facilitate the
interpretation of the inner workings of neural networks. The latter is required by engineering practice ([28]), especially by the demand for fusing physics-based and data-driven modeling tools.
II. LITERATURE REVIEW
Finding good initial points to start neural network training is of critical importance. Substantial reductions in training time are claimed when the initial point is optimized (e.g., [6]). Quite a few publications on initialization schemes are available, although perhaps not as many as those focusing on the training process itself. Summaries of these works can be found in [15] and [35], for example. Despite these works, both analytic and heuristic approaches to guide practical applications are still lacking. This might be because the issue can only be addressed well by looking into the features of the function to be approximated (or, equivalently, the features of the error function surface) and is thus hard to tackle in a general sense ([6], [15]).
[36] and [1] are examples of past efforts to incorporate knowledge into neural network training. For engineering applications, existing engineering and mathematical knowledge would offer significant advantages if a proper means could be found to utilize it in the most critical and under-addressed issues in neural networks, such as the initial setup. Constructive methods for neural networks do exist (e.g., [11] and a summary in [18]); however, the available knowledge and hints are far from being sufficiently connected to any of these constructive methods.
Among these existing studies, [6] is most closely related to this study; there, a prototype-based initialization was proposed for both pattern classification and function approximation. It is noteworthy that [6] also paid attention to the interpretation of the weights and/or basis functions. The present study, however, is more detailed and exercised more thoroughly, though mainly for the approximation of arbitrary functions in a specific engineering application. The inspirations that these previous works offer to this study can be summarized as follows:
1) Random initialization schemes (e.g., [31]) do not normally work well, which has been widely acknowledged by researchers, e.g., [13], [4].
2) Methods based on a good understanding of the capabilities of the sigmoidal activation function have been studied. The importance of thoroughly exploiting the features of sigmoidal functions has been recognized. Various attempts have been made to use a sigmoidal function as a whole (e.g., [11]), in a linearized form (e.g., [3]), in a piecewise fashion (e.g., [24], [25]), and as a nonlinear function (e.g., [4]), respectively.
From these studies, the need to prevent neurons from saturating has been recognized. The weights between the input and hidden layers are given special attention since they form the basis functions in function approximation. The work in [22] does not deal with neural network initialization directly; however, it analyzed the capability of sigmoidal functions in approximating polynomials. That work was used by [21] in setting up recurrent neural networks.
3) Methods utilizing the features of the function to be approximated have been proposed. The necessity of studying the intrinsic features/patterns of the function to be approximated, and the importance of exercising human judgement on any visible features of the function, have been reported, e.g., in [25], [6]. This will be referred to as engineering judgement in this study.
4) The necessity of preprocessing data is widely recognized. [17] proposed a delta-rule pre-training scheme, while [16] proposed a bottom-up unsupervised learning process before top-down supervised training is applied. For cascade learning, [15] also proposed a pre-training stage.
III. PROPOSED INITIALIZATION
A. Problem Statement
Modeling nonlinear hysteresis has been a critical and active research topic in engineering mechanics, with applications ranging from studying the work loads of airplane joints to performing simulations of infrastructure under dynamic loads (e.g., earthquake excitations), among many others. The nonlinear hysteretic behavior of a joint, especially its rate- and path-dependent aspects, dictates the system-level response, and has thus been studied analytically, experimentally, and numerically. Physics-based macro-models have the advantage of physical interpretability; however, the physical mechanism is often hard to model due to the complex nature of the problem itself. Analytic models (such as the Bouc-Wen model [32]) have been widely adopted in simulations; however, their weak form makes applications, especially online system identification, challenging. Another important class, phenomenological models, has been widely accepted in engineering practice because of the intuitive understanding it affords, although such models face the same challenge as the analytic models, i.e., limited adaptivity to data. Neural networks have been exercised by engineers to tackle this problem using available experimental measurements (e.g., [8]). However, just as in other applications, lacking physical interpretation or any intuitive understanding, this highly adaptive and efficient method has not yet been fully exploited for this engineering problem.
Among the leading formulations for modeling nonlinear hysteresis, the force-state mapping technique for Single-Degree-Of-Freedom (SDOF) systems ([19]) serves as a cornerstone. The usefulness and limitations of this formulation in modeling nonlinear hysteretic restoring forces, especially memory-associated effects, have been discussed in [34] and [20]. In short, force-state mapping treats the nonlinear restoring force of a SDOF system as a function of both of its states, i.e., displacement and velocity. Based on its importance and simplicity, this formulation is adopted in this first attempt at constructing an efficient and meaningful neural network.
In principle, fitting a restoring force surface of a SDOF
system in a state-space can be carried out using a neural
network with one hidden layer ([5], [9]) as in other function
approximation problems, which is defined by the following
expression:
$$ r(x,\dot{x}) \approx \hat{r}(p_1,p_2) = \sum_{j=1}^{n_h} w_{2,j}\, h_j(p_1,p_2), \qquad h_j(p_1,p_2) = \frac{1}{1 + e^{-\left(w_{11,j}\,p_1 + w_{12,j}\,p_2 + b_j\right)}} $$
where $r$ and $\hat{r}$ stand for the exact and approximated restoring forces, while $p_1$ and $p_2$ refer to the scaled displacement $x$ and velocity $\dot{x}$, respectively, for a SDOF system. The scaling issue has been discussed in [29]. A feedforward neural network with more than one hidden layer may be required for better computational efficiency and, perhaps more importantly, to yield better insight into the modeling behavior.
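As a concrete illustration, the single-hidden-layer expression above can be evaluated directly. The network size and all parameter values below are hypothetical placeholders, not values from this study:

```python
import numpy as np

def restoring_force_net(p1, p2, w1, w2, b):
    """Evaluate r^(p1, p2) as a linear sum of logistic sigmoid hidden units.

    p1, p2 : scaled displacement and velocity (scalars or arrays)
    w1     : (n_h, 2) input-to-hidden weights [w_11j, w_12j]
    w2     : (n_h,)   hidden-to-output weights
    b      : (n_h,)   hidden biases
    """
    # weighted sum feeding each hidden node, shape (n_h, n_points)
    a = (w1[:, 0][:, None] * np.atleast_1d(p1)
         + w1[:, 1][:, None] * np.atleast_1d(p2) + b[:, None])
    h = 1.0 / (1.0 + np.exp(-a))   # logistic sigmoid basis functions
    return w2 @ h                  # linear output layer

# a hypothetical 3-node example
w1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
w2 = np.array([2.0, -1.0, 0.5])
b = np.array([0.0, 0.0, 0.0])
r_hat = restoring_force_net(0.2, -0.1, w1, w2, b)
```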
B. Methodology
The authors have proposed an approach [26], [28], [29],
[30] to determine the number of hidden layers, the number
of hidden nodes as well as the initial values of all the
weights and biases, all based on the feature of the restoring
surfaces to be modeled. Overall, the proposed method is a
growing technique. Various prototypes are built first to serve
as a general guideline for the initialization. These prototypes
are constructed to simulate some typical nonlinearities either
based on their mathematical expressions and/or geometric
features, and the capabilities of sigmoidal functions. When
building these prototypes, one is working on a forward prob-
lem. Therefore there is no training involved. Not only the
number of the hidden nodes, but also the values of the weights
and biases are derived in the process. The prototypes are
then grouped according to the typical nonlinearities that they stand for. This is equivalent to constructing a look-up table of nonlinear types and candidate neural network architectures.
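The look-up table idea can be sketched minimally as a mapping from nonlinearity types to candidate architectures and hand-derived starting points. All entries below are hypothetical illustrations, not the paper's actual library:

```python
# A minimal sketch of the prototype look-up table: nonlinearity types keyed
# to candidate architectures and initialization strategies (all entries are
# hypothetical illustrations, not this study's derived values).
PROTOTYPE_LIBRARY = {
    "duffing":   {"hidden_nodes": 5, "init": "odd cubic-like sigmoid sum"},
    "softening": {"hidden_nodes": 8, "init": "plateau-forming sigmoid pairs"},
    "coulomb":   {"hidden_nodes": 5, "init": "steep sigmoid in velocity"},
}

def select_prototype(features):
    """Pick a starting architecture from extracted surface features."""
    return PROTOTYPE_LIBRARY[features["type"]]
```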
When training a multilayer feedforward neural network to
obtain a smooth force-state mapping surface based on some
discrete points in the force-state space, a pre-processing step is
often carried out to extract the main features of the surface to
be approximated. According to these features, some prototypes
are selected to lead to an initial neural network architecture
with the initial weights and biases. Training then starts to
capture the delicate features of the surface, and more nodes
are added for an improved approximation accuracy.
It is important to note that the motivation for this study is not limited to fast training and guaranteed convergence. Of equal importance, seeking a meaningful interpretation, and even an intuitive understanding, is emphasized in this neural network initialization.
C. Understanding Capabilities of Sigmoidal Functions
The key to building the proposed prototypes is to fully utilize the nonlinearity of the sigmoidal activation function in approximating commonly encountered nonlinearities in the force-state mapping. As a first step, the authors ([26], [30]) have presented a constructive approach to approximating polynomials (see Fig. 1 for some examples) and to mapping polynomial fitting into multilayer feedforward neural networks (see Fig. 2). Based on a finite linear sum of hyperbolic sigmoidal functions and their Taylor series expansions, the numbers of hidden nodes as well as initial values for weights and biases are derived to satisfy a specified degree of accuracy. In another study, based on a different methodology by [22] and adopted by [21], derived neural networks were also sought to map polynomials without training. Approximating polynomials is studied first because of its important role in the force-state mapping literature, i.e., Chebyshev [19] and ordinary polynomial fitting. The obtained multilayer feedforward neural network, whose functional form consists of a small sum of sigmoidal basis functions, can at least efficiently approximate polynomials; its application to the force-state mapping problem is thus quite natural and compelling.
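The Taylor-expansion idea can be illustrated with a minimal one-dimensional sketch (not the authors' full construction): since $\tanh(wp) = wp - (wp)^3/3 + O((wp)^5)$, a fixed combination of a linear term and a single hyperbolic-tangent node reproduces a cubic to leading order, with error that shrinks as the input weight $w$ is reduced:

```python
import numpy as np

w = 0.1                          # small input weight so the Taylor expansion holds
p = np.linspace(-1.0, 1.0, 201)  # evaluation grid

# tanh(w p) = w p - (w p)^3 / 3 + O((w p)^5), hence
# (3 / w^3) * (w p - tanh(w p)) ~ p^3, with leading error 0.4 * w^2 * p^5
cubic_approx = 3.0 / w**3 * (w * p - np.tanh(w * p))

max_err = np.max(np.abs(cubic_approx - p**3))
```

With w = 0.1 the worst-case error on [-1, 1] is about 0.004, consistent with the leading error term.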
Fig. 1. Examples of the derived neural networks to approximate polynomials.
The effectiveness and flexibility of multilayer feedforward neural networks, however, far surpass this minimal proficiency in approximating polynomial-type nonlinearities. In addition to the rigorous algebraic approach used in mapping polynomials, a qualitative geometric analysis has been carried out to study the capabilities of linear sums of sigmoidal functions specifically for the force-state mapping problem ([26], [28]). Two examples are given in Figs. 3 and 4.
D. Prototypes
The geometric features, in terms of 2-D contours, of various restoring force surfaces are examined first using analytic, simulated, and experimental results. Then, strategies are developed to mimic these features in a transparent manner by focusing on the number of hidden layers and hidden nodes, as well as the needed values of the weights and biases. These neural networks are the prototypes.
Fig. 2. A derived neural network to explicitly perform polynomial fitting for
two variables up to the cubic power including the cross terms. Note that the
fixed weight training method ([10]) needs to be applied.
Some single-hidden-layer neural network prototypes and their simulation results are presented in Fig. 5. A general procedure for performing these simulations is summarized in the caption. Note that none of the neural network prototypes presented here is obtained from training. Instead, the number of hidden nodes and the values of the weights and biases are decided based on the above algebraic and geometric studies, so as to introduce some of the physical, mathematical, and geometric features of nonlinear surfaces through human judgement rather than leaving these issues entirely to the data sets.
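For instance, a Coulomb-friction-like prototype can be written down by inspection: a linear elastic term in displacement plus a steep sigmoid in velocity that approximates the sign function. The parameter values below are illustrative choices, not those of the prototypes in Fig. 5:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def coulomb_prototype(x, v, k=1.0, f0=1.0, w=50.0):
    """Hand-built prototype restoring force r(x, v) mimicking Coulomb friction:
    an elastic term k*x plus a steep sigmoid in velocity approximating
    f0 * sign(v).  All parameter values here are illustrative, chosen by
    inspection rather than by training."""
    return k * x + f0 * (2.0 * sigmoid(w * v) - 1.0)  # 2*sigmoid - 1 ~ sign(v)
```

Evaluating the prototype away from the velocity transition, e.g. `coulomb_prototype(0.0, 0.1)`, returns approximately +f0, and the friction term flips sign with the velocity.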
It can be seen that the simulation plots in Fig. 5, especially the restoring force versus displacement plots, can represent some typical nonlinear hysteretic phenomena, which illustrates the capability of these prototypes. As long as the initial weights and biases are selected properly, a sum of a very small number of sigmoidal functions can achieve what other polynomial and non-polynomial fitting schemes used by the engineering mechanics community generally cannot achieve in an efficient, flexible, and unified manner. This finding is not merely a validation of the feasibility studies by [5], [9], which explore what multilayer feedforward neural networks can do; rather, it suggests schematically how to design multilayer feedforward neural networks to approximate a nonlinear function, i.e., a nonlinear restoring force in this
-10 0 10
0
0.5
1
w=1
b=0
p
h(p)
-10 0 10
0
0.5
1
w=1
b=-5&-10
p
h(p)
-10 0 10
0
0.5
1
w=1
b=5&10
p
h(p)
-10 0 10
0
0.5
1
w=0.05
b=0
p
h(p)
-10 0 10
0
0.5
1
w=10
b=0
p
h(p)
Fig. 3. Effect of varying weights and biases for a single sigmoidal function.
[Panels of h1(p) + h2(p) versus p for mirrored parameter pairs w1 = w2 = w and b2 = -b1 = b, with (w, b) = (0.3, 2), (0.6, 4), (0.3, 3), (0.6, 6), (0.9, 9), (0.3, 6), (0.6, 12), (0.9, 6), and (0.9, 18).]
Fig. 4. Forming quasi-odd functions with different features especially the
varying length of the plateau by using a linear sum of two sigmoidal functions.
context.
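The quasi-odd plateau construction of Fig. 4 can be reproduced numerically. The sketch below uses the parameter pair from the figure's first panel (w1 = w2 = 0.3, b1 = -2, b2 = 2):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Mirrored sigmoid pair from the first panel of Fig. 4:
# w1 = w2 = 0.3, b1 = -2, b2 = 2
p = np.linspace(-10.0, 10.0, 101)
y = sigmoid(0.3 * p - 2.0) + sigmoid(0.3 * p + 2.0)

# The sum equals exactly 1 at p = 0, forms a central plateau whose length
# is governed by the ratio b/w, and tends toward 0 and 2 at the extremes.
```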
The “meaning” of the weights and biases can be appreciated from a parametric study run on each prototype using the same architecture but with various parameter values. An example is shown in Fig. 6.
The prototypes presented herein serve only as examples. It is suggested to collect more prototypes like these, to study the influence of the values of the weights and biases on each surface profile, to form corresponding strategies based on the mathematical, physical, and geometrical features, and to store all of them in a look-up table or even a library that provides guidelines on how (in terms of architecture design) and where (in terms of initial values of weights and biases) to start neural network training in mapping nonlinear restoring forces.
To interface raw input-output data with such a library, especially when real-world large data sets are studied, it is further proposed to use a pre-processing stage to seek guidance on the initial neural network design (perhaps with some additional iterative trials as well), which aims at grasping the global characteristics of the restoring force surface. The training then
Fig. 5. Examples of neural network prototypes in modeling the restoring force
using displacement and velocity for a SDOF system. In each case, a linear
sum of sigmoidal functions with a selected number of terms and specified
values of weights and biases (see Row 1) is used to construct a restoring
force surface (see Row 2 for the constructed restoring force surfaces). Note
that the values of weights and biases are not indicated. A synthetic swept-sine
excitation force is then applied to excite the SDOF system with the restoring
force surface defined by the neural network. By solving the equation of
motion numerically, a discrete response time history in terms of the system states (displacement and velocity) populates the restoring force surface to create a trajectory (see Row 2). The trajectory is then projected to produce the restoring force versus displacement plot in Row 3.
starts with a clearly defined initial condition that is a product
of some physical insights and engineering judgements, and is
used to fine-tune the surface to reflect some localized and/or
delicate features with or without increasing the size of the
neural network.
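A drastically simplified, hypothetical sketch of such a growing procedure in one dimension is given below. This is not the paper's algorithm: here a new hidden node is simply centered at the point of largest residual, with a fixed slope, and only the output layer is refit by least squares after each addition:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grow_fit(p, r, max_nodes=8, tol=1e-3):
    """Grow a 1-D sigmoid-sum approximation node by node (a simplified,
    hypothetical growing scheme, not this study's method)."""
    w_in, b = [], []

    def design(p):
        cols = [sigmoid(w * p + bb) for w, bb in zip(w_in, b)]
        return np.column_stack(cols + [np.ones(p.size)])  # plus a bias column

    while len(w_in) < max_nodes:
        H = design(p)
        w_out, *_ = np.linalg.lstsq(H, r, rcond=None)     # refit output layer
        resid = r - H @ w_out
        if np.max(np.abs(resid)) < tol:
            break
        p_star = p[np.argmax(np.abs(resid))]  # center a new node at the worst point
        w_in.append(2.0)                      # fixed slope, an arbitrary choice
        b.append(-2.0 * p_star)               # sigmoid transition at p_star
    H = design(p)
    w_out, *_ = np.linalg.lstsq(H, r, rcond=None)         # final refit
    return w_in, b, w_out
```

Because columns are only ever added, the least-squares residual is non-increasing as the network grows.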
The nonuniqueness of identified models in the neural network approach is well known: given different initial points (corresponding to different sets of initial weights and biases), the final trained results differ even for the same set of training data. The proposed methodology leads to an initial point close to a location with some type of physical, geometrical, or mathematical meaning. Starting with such an initial design, it is expected that the final trained results are closer to having some meaning than if one adopted other initialization schemes. This is the key assumption made in this study, and it can be justified by the nature of the local-search training tools often used for a global-search goal associated with generalization using neural networks ([23]).
E. Training Example
A training example is presented in Fig. 7. The entire
procedure including the proposed initialization is demonstrated
from Fig. 7(a) through Fig. 7(d) and explained in the caption.
Prototypes and strategies in handling arbitrary parallel con-
tours and some localized features presented in [26] and [28]
are adopted here to decide the neural network architecture and
initial values of the weights and biases.
[Panels of restoring force surfaces and their projections for a clearance (dead space) prototype under varying values of the weights and biases.]
Fig. 6. A parametric study run on a prototype for clearance (or dead space) nonlinearity, showing that the values of the weights and biases can fine-tune the features of the nonlinearity.
While the major benefit of this prototype-based initialization is that it provides a clear sense of what to do, and how to do it, in the network design, the proposed initialization can also be justified by its improved mean-square-error (MSE) performance (see Case 2 in Fig. 7(d)). The weights and biases derived from the proposed initialization not only give the smallest initial MSE among the four cases, but also lead to the best performance after about 300 epochs. This indicates the computational merit of the proposed initialization, although many more examples should be investigated to validate this conclusion. A comparison of the values of the weights and biases between Case 1 (based on the popular Nguyen-Widrow layer initialization method [24]) and Case 2 (based on the derived initial values) is presented in [29]. It appears that most of the trained values are close to their corresponding initial values in both cases.
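For reference, the baseline used in Cases 1, 3, and 4 can be sketched as follows. This is one common formulation of the Nguyen-Widrow rule (assuming inputs scaled to [-1, 1] and a single hidden layer), not necessarily the exact variant used in this study:

```python
import numpy as np

def nguyen_widrow(n_in, n_hidden, rng=None):
    """One common formulation of Nguyen-Widrow layer initialization:
    random weight directions rescaled so the active regions of the
    sigmoids tile the input range, with biases spread across that range.
    Assumes inputs scaled to [-1, 1]."""
    rng = rng or np.random.default_rng(0)
    scale = 0.7 * n_hidden ** (1.0 / n_in)            # magnitude heuristic
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in)) # random directions
    W *= scale / np.linalg.norm(W, axis=1, keepdims=True)
    b = rng.uniform(-scale, scale, size=n_hidden)     # biases within range
    return W, b
```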
F. Ongoing Work
Presented above is an effort to form an initial neural network according to physical/mathematical/geometrical interpretations while preserving the neural network's adaptivity to data when using force-state mapping to model nonlinear hysteretic restoring forces. Further theoretical justifications (e.g., how to grow the size of the neural network based on mathematical formulations and how to define the fixity of the neural network during training) and training validations for more complicated nonlinearities and large real-world data sets are being carried out. An expansion of the proposed initialization methodology to more advanced formulations of the specific engineering problem is also being studied (e.g., [27]).
[Figure 7 panels: (a) exact restoring force versus displacement plot; (b) contour of the exact restoring force surface; (c) derived neural network architectures; (d) time history of the performance index (squared error versus epoch) of four neural networks. Case 1: derived two-hidden-layer architecture with Nguyen-Widrow layer initialization. Case 2: derived two-hidden-layer architecture with the proposed initialization in this study. Case 3: further simplified one-hidden-layer (six hidden nodes) architecture with Nguyen-Widrow layer initialization. Case 4: further simplified one-hidden-layer (three hidden nodes) architecture with Nguyen-Widrow layer initialization.]
Fig. 7. A training example using the proposed initialization. (a) A simulated velocity-squared damping data set is formed by using $r(x,\dot{x}) = 0.04\dot{x} + 0.04\dot{x}^2\,\mathrm{sign}(\dot{x}) + x$ and a swept-sine excitation $f = \sin(0.01t^2 + 0.01t)$ with a uniform time step, $t = 0, 0.1, \ldots, 200$. (b) The original data set is organized into pairs of restoring force (output of the neural network) versus displacement and velocity (inputs to the neural network). In a pre-processing stage, numerical interpolation is then carried out to form a data-based (non-analytical) restoring force surface. Since 2-D contour lines characterize a 3-D surface, contour features of the interpolated surface are examined. Based on these features, a neural network architecture is derived using the prototype in Fig. 4 and others presented in [26] and [28]. The obtained initial architecture is shown in Cases 1 and 2 in (c). Further simplifications lead to the architectures shown in Cases 3 and 4. (d) To validate the proposed initialization, four cases are considered in the training. Cases 1 and 2 both adopt the derived neural network architecture but with different initial values of weights and biases. Cases 3 and 4 investigate how the simplified architectures affect the training performance. Throughout the training, the batch training mode and the Levenberg-Marquardt algorithm are adopted.
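The data-generation step described in the caption of Fig. 7(a) can be sketched as follows. A unit mass and a fixed-step RK4 integrator are assumptions here; the paper does not state the solver:

```python
import numpy as np

def r(x, v):
    """Restoring force used in Fig. 7(a): velocity-squared damping plus
    linear damping and linear stiffness."""
    return 0.04 * v + 0.04 * v**2 * np.sign(v) + x

def simulate(dt=0.1, t_end=200.0, m=1.0):
    """Integrate m*x'' + r(x, x') = f(t) with fixed-step RK4 to generate
    (displacement, velocity, restoring force) training triples.  Unit mass
    and the RK4 scheme are assumptions, not stated in the paper."""
    n = int(round(t_end / dt)) + 1
    t = np.linspace(0.0, t_end, n)

    def deriv(ti, y):
        x, v = y
        fi = np.sin(0.01 * ti**2 + 0.01 * ti)  # swept-sine excitation
        return np.array([v, (fi - r(x, v)) / m])

    y = np.zeros((n, 2))                        # zero initial conditions
    for i in range(n - 1):
        ti, yi = t[i], y[i]
        k1 = deriv(ti, yi)
        k2 = deriv(ti + dt / 2, yi + dt / 2 * k1)
        k3 = deriv(ti + dt / 2, yi + dt / 2 * k2)
        k4 = deriv(ti + dt, yi + dt * k3)
        y[i + 1] = yi + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

    x, v = y[:, 0], y[:, 1]
    return x, v, r(x, v)   # network inputs (x, v) and target output r
```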
IV. CONCLUSIONS
This study has sought to take a step toward creating a neural network approach with enough physical/mathematical/phenomenological insight to be classified as meaningful, yet remaining highly adaptive. The important problem of modeling nonlinear restoring forces of a SDOF system based on the force-state mapping formulation has been selected as an example to demonstrate that this goal can be achieved by introducing a prototype-based initialization, where human judgement is exercised in an engineered manner based on the algebraic and/or geometric aspects of the problem itself.
ACKNOWLEDGMENT
This study was supported in part by the National Science
Foundation under SGER CMS-0332350 for the first author
and CAREER Award CMS-0134333 for the third author.
REFERENCES
[1] K. A. Al-Mashouq and S. R. Irving, “Including Hints in Training Neural Nets,” Neural Computation, MIT, 1991, vol. 3, pp. 418–427.
[2] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford Univer-
sity Press, 1995, pp. 482.
[3] T. L. Burrows and M. Niranjan, “Feed-Forward and Recurrent Neural
Networks for System Identification, Cambridge University Engineering
Department, CUED/F-INFENG/TR158, 1993.
[4] P. Costa and P. Larzabal, “Initialization of Supervised Training for
Parametric Estimation, Neural Processing Letters, Kluwer Academic
Publishers, Printed in the Netherlands, 1999, vol. 9, pp. 53–61.
[5] G. Cybenko, “Approximation by Superpositions of a Sigmoidal Function,”
Mathematics of Control, Signals, and Systems, 1989, vol. 2, pp. 303–314.
[6] T. Denoeux and R. Lengellé, “Initializing Back Propagation Networks with Prototypes,” Neural Networks, 1993, vol. 6, pp. 351–363.
[7] G. P. Drago and S. Ridella, “Statistically Controlled Activation Weight
Initialization (SCAWI), IEEE Transactions on Neural Networks, 1992,
vol. 3, no. 3, pp. 627–631.
[8] J. Ghaboussi and X. Wu, “Soft Computing with Neural Networks for En-
gineering Applications: Fundamental Issues and Adaptive Approaches,
Structural Engineering Mechanics, 1998, vol. 6, no. 8, pp. 955–969.
[9] K. Hornik, M. Stinchcombe and H. White, “Multilayer Feedforward
Networks are Universal Approximators, Neural Networks, 1989, vol. 2,
pp. 359–366.
[10] W. Y. Huang and R. P. Lippmann, “Neural Net and Traditional Clas-
sifiers, Neural Information Processing Systems, D. Anderson (ed.),
American Institute of Physics, New York, 1988, pp. 387–396.
[11] L. K. Jones, “Constructive Approximations for Neural Networks by
Sigmoidal Functions, Proceedings of the IEEE, 1990, vol. 78, no. 10,
pp. 1586–1589.
[12] L. S. Kim, “Initializing Weights to a Hidden Layer of a Multilayer
Neural Network by Linear Programming, Proceedings of 1993 Interna-
tional Joint Conference on Neural Networks, 1993. vol. 2, pp. 1701–1704.
[13] M.-Y. Kim and C.-H. Choi, “A New Weight Initialization Method for the MLP with the BP in Multiclass Classification Problems,” Neural Processing Letters, 1997, vol. 6, pp. 11–23.
[14] M. Lehtokangas, J. Saarinen and K. Kaski, “Initializing Weights of a
Multilayer Perceptron Network by Using the Orthogonal Least Squares
Algorithm, Neural Computation, 1995, vol. 7, pp. 982–999.
[15] M. Lehtokangas, “Fast Initialization for Cascade-Correlation Learning,”
IEEE Transactions on Neural Networks, 1999, vol. 10, no. 2, pp. 410–
414.
[16] N. B. Karayiannis, Accelerating the Training of Feedforward Neural
Networks Using Generalized Hebbian Rules for Initializing the Internal
Representations, IEEE Transactions on Neural Networks, 1996, vol. 7,
no. 2, pp. 419–426.
[17] G. Li, H. Alnuweiri, Y. Wu and H. Li, “Acceleration of Back Propagation through Initial Weight Pre-Training with Delta Rule,” Proc. Int. Joint Conf. Neural Networks, pp. 580–585, 1993, vol. 1.
[18] L. Ma and K. Khorasani, “New Training Strategies for Constructive
Neural Networks with Application to Regression Problems,” Neural Networks, 2004, vol. 17, pp. 589–609.
[19] S. F. Masri and T. K. Caughey, A Nonparametric Identification Tech-
nique for Nonlinear Dynamic Problems, Journal of Applied Mechanics,
1979, vol. 46, pp. 433–447.
[20] S. F. Masri, J. P. Caffrey, T. K. Caughey, A. W. Smyth and A. G.
Chassiakos, “Identification of the State Equation in Complex Non-Linear
Systems, International Journal of Non-Linear Mechanics, 2004, vol. 39,
pp. 1111–1127.
[21] A. J. Meade, Jr., “Regularization of a Programmed Recurrent Artificial
Neural Network, Journal of Guidance, Control, and Dynamics, 2003.
[22] H. N. Mhaskar, “Neural Networks for Optimal Approximation of Smooth and Analytic Functions,” Neural Computation, 1996, vol. 8, pp. 164–177.
[23] O. Nelles, Nonlinear System Identification: From Classical Approaches
to Neural Networks and Fuzzy Models, Springer Verlag, 2000.
[24] D. Nguyen and B. Widrow, “Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights,” Proceedings of the IJCNN, 1990, vol. III, pp. 21–26.
[25] S. Osowski, “New Approach to Selection of Initial Values of Weights in Neural Function Approximation,” Electronics Letters, 1993, vol. 29, no. 3, pp. 313–315.
[26] J. S. Pei, “Parametric and Nonparametric Identification of Nonlinear Systems,” Ph.D. Dissertation, Columbia University, 2001.
[27] J. S. Pei, A. W. Smyth and E. B. Kosmatopoulos, “Analysis and Modification of Volterra/Wiener Neural Networks for Identification of Nonlinear Hysteretic Dynamic Systems,” Journal of Sound and Vibration, 2004, vol. 275, no. 3–5, pp. 693–718.
[28] J. S. Pei and A. W. Smyth, “A New Approach to Design Multilayer Feedforward Neural Network Architecture in Modeling Nonlinear Restoring Forces: Part I - Formulation,” ASCE Journal of Engineering Mechanics, 2004, accepted for publication.
[29] J. S. Pei and A. W. Smyth, “A New Approach to Design Multilayer Feedforward Neural Network Architecture in Modeling Nonlinear Restoring Forces: Part II - Applications,” ASCE Journal of Engineering Mechanics, 2004, accepted for publication.
[30] J. S. Pei, J. P. Wright, and A. W. Smyth, “Mapping Polynomial Fitting into Feedforward Neural Networks for Modeling Nonlinear Dynamic Systems and Beyond,” Computer Methods in Applied Mechanics and Engineering, 2004, accepted for publication.
[31] W. F. Schmidt, S. Raudys, M. A. Kraaijveld, M. Skurikhina and R. P. W. Duin, “Initializations, Back-Propagation and Generalization of Feed-Forward Classifiers,” Proc. Int. Joint Conf. Neural Networks, 1993, pp. 598–604.
[32] Y. K. Wen, “Methods of Random Vibration for Inelastic Structures,” Appl. Mech. Rev., ASME, 1989, vol. 42, no. 2, pp. 39–52.
[33] L. F. A. Wessels and E. Barnard, “Avoiding False Local Minima by Proper Initialization of Connections,” IEEE Transactions on Neural Networks, 1992, vol. 3, no. 6, pp. 899–905.
[34] K. Worden and G. R. Tomlinson, Nonlinearity in Structural Dynamics: Detection, Identification and Modelling, Institute of Physics Publishing, 2001.
[35] J. Y. F. Yam and T. W. S. Chow, “Feedforward Networks Training Speed Enhancement by Optimal Initialization of the Synaptic Coefficients,” IEEE Transactions on Neural Networks, 2001, vol. 12, no. 2, pp. 430–434.
[36] S. Zhao and T. S. Dillon, “Incorporating Prior Knowledge in the Form of Production Rules into Neural Networks Using Boolean-Like Neurons,” Applied Intelligence, 1997, vol. 7, pp. 275–285.