Neural Network Initialization with Prototypes -
A Case Study in Function Approximation
Jin-Song Pei
School of Civil Engineering and
Environmental Science
University of Oklahoma
Norman, OK 73019
E-mail: jspei@ou.edu
Joseph P. Wright
Division of Applied Science
Weidlinger Associates Inc.
New York, NY 10014
E-mail: wright@wai.com
Andrew W. Smyth
Department of Civil Engineering and
Engineering Mechanics
Columbia University
New York, NY 10027
E-mail: smyth@civil.columbia.edu
Abstract—The initialization of neural networks for function
approximation has been studied by many researchers yet remains
a challenging problem. Another important yet open issue in the
neural network community is how to incorporate knowledge and
hints into training so as to obtain a meaningful neural network.
This study attempts to address both issues in handling a specific
type of engineering problem, namely, modeling the nonlinear
hysteretic restoring force of a dynamic system under a specific
formulation. The paper showcases a heuristic idea of using a
growing technique through a prototype-based initialization, where
insight into the governing mathematics/physics is related to
the features of the activation functions.
I. INTRODUCTION
As an important issue in theories and applications, mul-
tilayer feedforward neural network initialization in function
approximation has been studied by many researchers (e.g.,
[24], [7], [33], [3], [25], [31], [6], [12], [2], [14], [16], [13],
[4], [35]) focusing on fast convergence. A major lesson from
this body of work is that an efficient initialization (measured
by the success and rate of convergence) must exploit the
features of the function to be approximated (e.g., [6], [15]).
In this study, the authors were prompted by a specific
engineering application of neural networks to seek an
efficient as well as meaningful solution to the initialization
of a multilayer feedforward neural network.
A heuristic approach is presented to initialize multilayer
feedforward neural networks for a special and critical class
of problems in engineering mechanics and numerous appli-
cations. Mathematically, a smooth function surface is sought
from some discrete input-output points. This pursuit has two
goals: one is to find an initial point from which to start the
training of a multilayer feedforward neural network so as to
enable convergence and fast training; the other is to facilitate
the interpretation of the inner workings of the neural network.
The latter is driven by the needs of engineering practice ([28]),
especially the demand for fusing physics-based and data-driven
modeling tools.
II. LITERATURE REVIEW
Finding good initial points from which to start neural network
training is of critical importance; substantial reductions in
training time are claimed when the initial point is optimized
(e.g., [6]). Quite a few publications on initialization schemes
are available, although perhaps not as many as those focusing
on the training process itself. Summaries of these works can
be found in [15] and [35], for example. Despite these works,
both analytic and heuristic approaches to guide practical
applications are still lacking. This might be because the
challenge can only be addressed well by looking into the
features of the function to be approximated (or, equivalently,
the features of the error surface) and thus is hard to tackle
in a general sense ([6], [15]).
[36] and [1] are examples of past efforts to incorporate
knowledge into neural network training. For engineering
applications, existing knowledge of engineering and mathematics
would offer significant advantages if a proper means could be
found to utilize it in the most critical and under-addressed
issues in neural networks, such as the initial setup. Constructive
methods in neural networks do exist (e.g., [11] and a summary
in [18]); however, the available knowledge and hints are far
from being sufficiently connected to any of these constructive
methods.
Among these existing studies, [6] is the most closely related
to this one; there, a prototype-based initialization was proposed
for both pattern classification and function approximation.
Noteworthy is that [6] also paid attention to the interpretation
of the weights and/or basis functions. The present study,
however, is more detailed and exercised more thoroughly,
mainly for the approximation of arbitrary functions
in a specific engineering application. The inspirations that
these previous works provide can be summarized as follows:
1) Random initialization schemes (e.g., [31]) do not nor-
mally work well, which has been widely acknowledged
by researchers, e.g., [13], [4].
2) Methods based on a good understanding of the capabilities
of the sigmoidal activation function have been studied.
The importance of thoroughly utilizing the features of
sigmoidal functions has been recognized. Various attempts
have been made to use a sigmoidal function as
a whole (e.g., [11]), in a linearized form (e.g., [3]), in a
piecewise fashion (e.g., [24], [25]), and as a nonlinear
function (e.g., [4]), respectively.
From these studies, the need to prevent neurons from
saturating has been recognized. The weights between the
input and hidden layer are given special attention since
they form the basis functions in function approximation.
The work in [22] does not deal with neural network
initialization directly; however, it analyzed the capability
of sigmoidal functions to approximate polynomials, and
was used by [21] in setting up recurrent neural networks.
3) Methods utilizing the features of the function to be
approximated have been proposed.
The necessity of studying the intrinsic features/patterns of
the function to be approximated, and the importance of
exercising human judgement on any visible features of
the function, have been reported, e.g., in [25], [6].
This will be referred to as engineering judgement in
this study.
4) The necessity of preprocessing data is widely acknowledged.
[17] proposed a delta-rule pre-training scheme, while
[16] proposed a bottom-up unsupervised learning
process before top-down supervised training is applied.
For cascade learning, [15] also proposed a pre-training
stage.
III. PROPOSED INITIALIZATION
A. Problem Statement
Modeling nonlinear hysteresis has been a critical and active
research topic in engineering mechanics, with applications in
studying the working loads of airplane joints, performing
simulations of infrastructure under dynamic loads (e.g.,
earthquake excitations), and many others. Hysteretic
nonlinearity, especially the rate- and path-dependent hysteretic
behavior of a joint, dictates the system-level response and thus
has been studied analytically, experimentally and numerically.
Physics-based macro-models have the advantage of physical
interpretability; however, the physical mechanism is often hard
to model due to the complex nature of the problem itself.
Analytic models (such as the Bouc-Wen model [32]) have been
widely adopted in simulations; however, their weak form makes
application, especially to online system identification,
challenging. Another important class, phenomenological models,
has been widely accepted in engineering practice because it
enables intuitive understanding, although it faces the same
challenge as the analytic models, i.e., limited adaptivity
to data. Neural networks have been exercised by engineers
to tackle such problems using available experimental
measurements (e.g., [8]). However, just as in other applications,
lacking physical interpretation or any intuitive understanding,
this highly adaptive and efficient method has not yet been
fully exploited for this engineering problem.
Among the leading formulations for modeling nonlinear hysteresis,
the force-state mapping technique for Single-Degree-Of-Freedom
(SDOF) systems ([19]) serves as a cornerstone. The usefulness
and limitations of this formulation in modeling nonlinear
hysteretic restoring forces, especially memory-associated
effects, have been discussed in [34] and [20]. In short,
force-state mapping treats the nonlinear restoring force of a
SDOF system as a function of both of its states, i.e.,
displacement and velocity. Given its importance and simplicity,
this formulation is adopted in this first attempt at
constructing an efficient and meaningful neural network.
In principle, fitting a restoring force surface of a SDOF
system in a state-space can be carried out using a neural
network with one hidden layer ([5], [9]) as in other function
approximation problems, which is defined by the following
expression:
r(x, \dot{x}) \approx \hat{r}(p_1, p_2) = \sum_{j=1}^{n_h} w_{2,j}\, h_j(p_1, p_2), \quad \text{and} \quad h_j(p_1, p_2) = \frac{1}{1 + e^{-(w_{11,j} p_1 + w_{12,j} p_2 - b_j)}}

where r and \hat{r} stand for the exact and approximated restoring
force, while p_1 and p_2 refer to the scaled displacement x and
velocity \dot{x}, respectively, for a SDOF system. The scaling issue
has been discussed in [29]. A feedforward neural network
with more than one hidden layer may be required for better
computational efficiency and perhaps more importantly to
yield better insight into the modeling behavior.
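As an illustration only (not the authors' code), the single-hidden-layer approximation above can be sketched in a few lines; the parameter values below are arbitrary placeholders rather than values derived from any prototype:

```python
import numpy as np

def restoring_force_net(p1, p2, w1, w2, b):
    """Single-hidden-layer approximation r_hat(p1, p2).

    w1 : (n_h, 2) input-to-hidden weights [w_11j, w_12j]
    w2 : (n_h,)   hidden-to-output weights w_2j
    b  : (n_h,)   hidden biases b_j
    """
    a = w1[:, 0] * p1 + w1[:, 1] * p2 - b      # pre-activation of node j
    h = 1.0 / (1.0 + np.exp(-a))               # logistic sigmoid h_j
    return float(np.dot(w2, h))                # linear output layer

# placeholder parameters for a 3-node network (illustrative only)
w1 = np.array([[0.5, 0.1], [-0.3, 0.8], [0.2, -0.6]])
w2 = np.array([1.0, -0.5, 0.7])
b = np.array([0.0, 0.2, -0.1])
r_hat = restoring_force_net(0.4, -0.2, w1, w2, b)
```

Since each h_j lies in (0, 1), the output magnitude is bounded by the sum of the absolute output weights.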
B. Methodology
The authors have proposed an approach [26], [28], [29],
[30] to determine the number of hidden layers, the number
of hidden nodes, and the initial values of all the weights
and biases, all based on the features of the restoring force
surfaces to be modeled. Overall, the proposed method is a
growing technique. Various prototypes are built first to serve
as a general guideline for the initialization. These prototypes
are constructed to simulate some typical nonlinearities based
on their mathematical expressions and/or geometric features,
together with the capabilities of sigmoidal functions. When
building these prototypes, one is working on a forward problem;
therefore no training is involved. Not only the number of
hidden nodes, but also the values of the weights and biases,
are derived in the process. The prototypes are then grouped
according to the typical nonlinearities that they stand for.
This is equivalent to constructing a look-up table of
nonlinearity types and candidate neural network architectures.
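The look-up table idea can be sketched as follows; the entries, feature labels, and parameter values here are hypothetical placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical look-up table: nonlinearity type -> candidate prototype
# (number of hidden nodes plus initial weights/biases), built once from
# forward analysis of sigmoid sums -- no training involved.
PROTOTYPES = {
    "duffing":   {"n_hidden": 5, "w1": np.ones((5, 2)), "w2": np.ones(5), "b": np.zeros(5)},
    "coulomb":   {"n_hidden": 3, "w1": np.ones((3, 2)), "w2": np.ones(3), "b": np.zeros(3)},
    "clearance": {"n_hidden": 6, "w1": np.ones((6, 2)), "w2": np.ones(6), "b": np.zeros(6)},
}

def select_prototype(feature_label):
    """Pre-processing classifies the surface's contour features into a
    nonlinearity label; the table returns the matching initial design."""
    return PROTOTYPES[feature_label]

init = select_prototype("coulomb")
```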
When training a multilayer feedforward neural network to
obtain a smooth force-state mapping surface from discrete
points in the force-state space, a pre-processing stage is
often carried out to extract the main features of the surface
to be approximated. Based on these features, prototypes are
selected to provide an initial neural network architecture
together with initial weights and biases. Training then starts
to capture the delicate features of the surface, and more nodes
are added for improved approximation accuracy.
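A minimal sketch of the growing step (illustrative only: the paper retrains all weights, e.g., with Levenberg-Marquardt, whereas this placeholder refits only the output layer by least squares):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grow_network(P, r, w1_init, b_init, tol=1e-3, max_nodes=20, seed=0):
    """Start from prototype hidden nodes (w1_init, b_init); add nodes one
    at a time until the fit is good enough or max_nodes is reached."""
    rng = np.random.default_rng(seed)
    w1, b = w1_init.copy(), b_init.copy()
    while True:
        H = sigmoid(P @ w1.T - b)                  # hidden activations (N, n_h)
        w2, *_ = np.linalg.lstsq(H, r, rcond=None) # refit output layer
        mse = float(np.mean((H @ w2 - r) ** 2))
        if mse < tol or w1.shape[0] >= max_nodes:
            return w1, b, w2, mse
        # grow: append one node with small random input weights
        w1 = np.vstack([w1, 0.1 * rng.standard_normal((1, P.shape[1]))])
        b = np.append(b, 0.0)

# illustrative usage on synthetic force-state data
rng = np.random.default_rng(1)
P = rng.uniform(-1, 1, (100, 2))
target = np.tanh(P[:, 0]) + 0.5 * P[:, 1]
w1, b, w2, mse = grow_network(P, target, np.eye(2), np.zeros(2))
```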
It is important to note that the motivation for this study is
not limited to fast training and guaranteed convergence. Of
equal importance, seeking a meaningful interpretation and
even an intuitive understanding is emphasized in this
neural network initialization.
C. Understanding Capabilities of Sigmoidal Functions
The key to building the proposed prototypes is to fully
utilize the nonlinearity of the sigmoidal activation function in
approximating nonlinearities commonly encountered in
force-state mapping. As a first step, the authors ([26],
[30]) presented a constructive approach to approximating
polynomials (see Fig. 1 for some examples) and to mapping
polynomial fitting into a multilayer feedforward neural
network (see Fig. 2). Based on a finite linear sum of
hyperbolic sigmoidal functions and their Taylor series
expansions, the numbers of hidden nodes as well as initial
values for the weights and biases are derived to satisfy a
specified degree of accuracy. In another study, based on a
different methodology by [22] and adopted by [21], derived
neural networks were also sought to map polynomials without
training. Approximating polynomials is studied first because of
its important role in the force-state mapping literature, i.e.,
Chebyshev [19] and ordinary polynomial fitting. Since the
obtained multilayer feedforward neural network, whose
functional form consists of a small sum of sigmoidal basis
functions, can at least efficiently approximate polynomials,
its application to the force-state mapping problem is quite
natural and compelling.
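As an illustrative sketch (consistent with the antisymmetric two-node design for z ≈ p in Fig. 1, but not the authors' exact derivation): an antisymmetric pair of logistic sigmoids with a small input weight w reproduces the monomial p to leading order in its Taylor expansion, since sigma(wp) - sigma(-wp) = wp/2 + O((wp)^3); scaling the output weights by 2/w recovers z ≈ p:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def approx_linear(p, w=0.1):
    """Two hidden nodes with input weights +w, -w and output weights
    +2/w, -2/w approximate z = p with error O(w**2 * p**3)."""
    return (2.0 / w) * (sigmoid(w * p) - sigmoid(-w * p))

p = np.linspace(-1.0, 1.0, 21)
err = np.max(np.abs(approx_linear(p, w=0.1) - p))
```

For w = 0.1 on [-1, 1] the leading error term w^2 p^3 / 12 keeps the maximum deviation below 1e-3.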
[Figure content not reproducible in text: derived network diagrams, with weights w_{1,j}, w_{2,j} and biases b_1, approximating z ≈ p^0, p^1, p^2 and p^3.]
Fig. 1. Examples of the derived neural networks to approximate polynomials.
The effectiveness and flexibility of multilayer feedforward
neural networks, however, far surpass this minimal proficiency
in approximating polynomial-type nonlinearities. In addition
to the rigorous algebraic approach used in mapping polynomials,
a qualitative geometric analysis has been carried out to
study the capabilities of linear sums of sigmoidal functions
specifically for the force-state mapping problem ([26], [28]).
Two examples are given in Figs. 3 and 4.
D. Prototypes
The geometric features, in terms of 2-D contours, of various
restoring force surfaces are examined first using analytic,
simulated and experimental results. Then, strategies are
developed to mimic these features in a transparent manner by
focusing on the number of hidden layers and hidden nodes, as
well as the needed values of the weights and biases. These
neural networks are the prototypes.
[Figure content not reproducible in text: a three-layer network whose first-layer nodes form powers (p_1)^0 through (p_1)^3, (p_2)^1 through (p_2)^3 and cross terms such as (p_1)^2 p_2 and p_1 (p_2)^2, combined through fully connected layers to output z.]
Fig. 2. A derived neural network to explicitly perform polynomial fitting for
two variables up to the cubic power including the cross terms. Note that the
fixed-weight training method ([10]) needs to be applied.
Some single-hidden-layer neural network prototypes and
their simulation results are presented in Fig. 5. A general
procedure for performing these simulations is summarized in
the caption. Note that none of the neural network prototypes
presented here is obtained from training. Instead, their
numbers of hidden nodes and the values of the weights and
biases are decided based on the above algebraic and geometric
study, so as to introduce some of the physical, mathematical,
and geometric features of nonlinear surfaces through human
judgement rather than leaving these issues entirely to the
data sets.

It can be seen that the simulation plots in Fig. 5, especially
the restoring force versus displacement plots, can represent
some typical nonlinear hysteretic phenomena, which illustrates
the capability of these prototypes. As long as the initial
weights and biases are selected properly, a sum of a very small
number of sigmoidal functions can achieve, in an efficient,
flexible and unified manner, what the polynomial and
non-polynomial fitting schemes used by the engineering
mechanics community generally cannot. This finding is not
merely a validation of the feasibility studies by [5], [9],
which explore what multilayer feedforward neural networks can
do; rather, it suggests schematically how to design multilayer
feedforward neural networks to approximate a nonlinear
function, i.e., a nonlinear restoring force in this
[Figure content not reproducible in text: plots of h(p) over p in [-10, 10] for (w, b) = (1, 0), (1, -5 and -10), (1, 5 and 10), (0.05, 0), (10, 0).]
Fig. 3. Effect of varying weights and biases for a single sigmoidal function.
[Figure content not reproducible in text: plots of h_1(p) + h_2(p) over p in [-10, 10] for weight/bias pairs such as (w_1, b_1; w_2, b_2) = (0.3, -2; 0.3, 2), (0.6, -4; 0.6, 4), (0.9, -9; 0.9, 9), (0.6, -12; 0.6, 12) and others.]
Fig. 4. Forming quasi-odd functions with different features, especially the
varying length of the plateau, by using a linear sum of two sigmoidal functions.
context.
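To illustrate the quasi-odd construction (an illustrative sketch, not the authors' code): with mirrored biases -b and +b and a common weight w, the sum h_1(p) + h_2(p) = sigma(wp + b) + sigma(wp - b) passes through 1 at p = 0 and is odd about the point (0, 1), because sigma(-a) = 1 - sigma(a):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def quasi_odd(p, w=0.3, b=2.0):
    """Sum of two sigmoids with biases -b and +b (cf. Fig. 4)."""
    return sigmoid(w * p + b) + sigmoid(w * p - b)

p = np.linspace(-10.0, 10.0, 101)
s = quasi_odd(p)
# odd symmetry about (0, 1): s(p) + s(-p) == 2 for every p
sym_residual = np.max(np.abs(s + quasi_odd(-p) - 2.0))
```

Increasing b while keeping w fixed widens the central plateau, as in the panels of Fig. 4.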
The “meaning” of the weights and biases can be appreciated
from a parametric study run on each prototype using the same
architecture but with various parameter values. An example
is shown in Fig. 6.
The prototypes presented herein serve only as examples. It
is suggested to collect more prototypes like these, study the
influence of the values of the weights and biases on each
surface profile, form corresponding strategies based on the
mathematical, physical, and geometrical features, and store
all of them in a look-up table or even a library, so as to
provide guidelines on how (in terms of architecture design)
and where (in terms of initial values of weights and biases)
to start neural network training in mapping nonlinear
restoring forces.
To interface raw input-output data with such a library,
especially when real-world large data sets are studied, it is
further proposed to use a pre-processing stage to seek guidance
on the initial neural network design (perhaps with some
additional iterative trials as well), which aims at grasping the
global characteristics of the restoring force surface. The training then
[Figure content not reproducible in text: three prototype networks (Duffing, Softening, Coulomb) with their constructed restoring force surfaces and restoring force versus displacement plots.]
Fig. 5. Examples of neural network prototypes in modeling the restoring force
using displacement and velocity for a SDOF system. In each case, a linear
sum of sigmoidal functions with a selected number of terms and specified
values of weights and biases (see Row 1) is used to construct a restoring
force surface (see Row 2). Note that the values of the weights and biases are
not indicated. A synthetic swept-sine excitation force is then applied to excite
the SDOF system with the restoring force surface defined by the neural
network. By solving the equation of motion numerically, a discrete response
time history in terms of the system states (displacement and velocity)
populates the restoring force surface to create a trajectory (see Row 2). The
trajectory is then projected to produce the restoring force versus displacement
plot in Row 3.
starts with a clearly defined initial condition that is a product
of physical insight and engineering judgement, and is used to
fine-tune the surface to reflect localized and/or delicate
features, with or without increasing the size of the neural
network.
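A minimal sketch of such a pre-processing stage, assuming simple grid binning of the scattered force-state samples (the interpolation and contour analysis described in the paper would replace this placeholder):

```python
import numpy as np

def bin_surface(x, v, r, n_bins=10):
    """Average scattered restoring-force samples (x, v, r) onto a coarse
    displacement-velocity grid, giving a data-based (non-analytical)
    surface whose contour features can then guide prototype selection."""
    xi = np.clip(np.digitize(x, np.linspace(x.min(), x.max(), n_bins + 1)) - 1, 0, n_bins - 1)
    vi = np.clip(np.digitize(v, np.linspace(v.min(), v.max(), n_bins + 1)) - 1, 0, n_bins - 1)
    total = np.zeros((n_bins, n_bins))
    count = np.zeros((n_bins, n_bins))
    np.add.at(total, (xi, vi), r)
    np.add.at(count, (xi, vi), 1.0)
    with np.errstate(invalid="ignore"):
        return total / count            # NaN where a cell has no samples

# synthetic example with a linear restoring force (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
v = rng.uniform(-1, 1, 2000)
surface = bin_surface(x, v, x + 0.1 * v)
```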
The nonuniqueness of identified models for the neural
network approach is well known, that is, given different initial
points (corresponding to different sets of initial weights and
biases), the final trained results differ even for the same set
of training data. The proposed methodology leads to an initial
point that is placed close to a location where there is some type
of physical, geometrical or mathematical meaning. Starting
with such an initial design, it is expected that the final trained
results are closer to having some meaning than if one adopted
other initialization schemes. This is the key assumption made
in this study, which can be justified by the nature of the
local search training tools often used for a global search goal
associated with generalization using neural networks ([23]).
E. Training Example
A training example is presented in Fig. 7. The entire
procedure, including the proposed initialization, is demonstrated
from Fig. 7(a) through Fig. 7(d) and explained in the caption.
Prototypes and strategies for handling arbitrary parallel
contours and some localized features, presented in [26] and [28],
are adopted here to decide the neural network architecture and
the initial values of the weights and biases.
[Figure content not reproducible in text: a clearance (dead-space) prototype network and, for three parameter sets, its restoring force surfaces, contours, and restoring force versus displacement plots.]
Fig. 6. A parametric study run on a prototype for clearance (or dead-space)
nonlinearity, showing that the values of the weights and biases can fine-tune
the features of the nonlinearity.
While the major benefit of this prototype-based initialization
is the clear sense it provides of what to do, and how to do it,
in the network design, the proposed initialization can also be
justified by its improved mean-square-error (MSE) performance
(see Case 2 in Fig. 7(d)). The weights and biases derived from
the proposed initialization not only give the smallest initial
MSE among the four cases, but also lead to the best performance
after about 300 epochs. This indicates the computational merit
of the proposed initialization, although many more examples
should be investigated to validate this conclusion. A comparison
of the values of the weights and biases between Case 1 (based
on the popular Nguyen-Widrow layer initialization method [24])
and Case 2 (based on the derived initial values) is presented
in [29]. It appears that most of the trained values are close
to their corresponding initial values in both cases.
F. Ongoing Work
Presented above is an effort to form an initial neural
network according to physical/mathematical/geometrical
interpretations while preserving the neural network's adaptivity
to data when using force-state mapping to model nonlinear
hysteretic restoring forces. Further theoretical justifications
(e.g., how to grow the size of the neural network based on
some mathematical formulations, and how to define the fixity
of the neural network during training) and training validations
for more complicated nonlinearities and large real-world data
sets are being carried out. An expansion of the proposed
initialization methodology to more advanced formulations of
the specific engineering problem is also being studied (e.g.,
[27]).
[Figure content not reproducible in text: the exact restoring force-displacement plot, the contour of the exact restoring force surface, the four derived network architectures, and squared-error-versus-epoch curves for the four cases:
Case 1 - derived two-hidden-layer architecture with Nguyen-Widrow layer initialization;
Case 2 - derived two-hidden-layer architecture with the proposed initialization in this study;
Case 3 - further simplified one-hidden-layer (six hidden nodes) architecture with Nguyen-Widrow layer initialization;
Case 4 - further simplified one-hidden-layer (three hidden nodes) architecture with Nguyen-Widrow layer initialization.]
Fig. 7. A training example using the proposed initialization. (a) A simulated
velocity-squared damping data set is formed by using r(x, \dot{x}) = 0.04\dot{x} +
0.04\dot{x}^2 \mathrm{sign}(\dot{x}) + x and a swept-sine excitation f = \sin(0.01t^2 + 0.01t)
with a uniform time step t = 0, 0.1, ..., 200. (b) The original data set is
organized into pairs of restoring force (output of the neural network) versus
displacement and velocity (inputs to the neural network). In a pre-processing
stage, numerical interpolation is then carried out to form a data-based
(non-analytical) restoring force surface. Since 2-D contour lines characterize
a 3-D surface, contour features of the interpolated surface are examined.
Based on these features, a neural network architecture is derived using the
prototype in Fig. 4 and others presented in [26] and [28]. The obtained
initial architecture is shown as Cases 1 and 2 in (c). Further simplifications
lead to the architectures shown as Cases 3 and 4. (d) To validate the proposed
initialization, four cases are considered in the training. Cases 1 and 2 both
adopt the derived neural network architecture but with different initial
values of the weights and biases. Cases 3 and 4 investigate how the simplified
architectures affect the training performance. Throughout, the batch training
mode and the Levenberg-Marquardt algorithm are adopted.
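A sketch of how such a training data set can be generated (illustrative only; the integrator here is a classical fourth-order Runge-Kutta assumption, as the paper does not specify one):

```python
import numpy as np

def restoring(x, v):
    """Velocity-squared damping model from Fig. 7(a)."""
    return 0.04 * v + 0.04 * v**2 * np.sign(v) + x

def simulate(dt=0.1, t_end=200.0):
    """Integrate the unit-mass SDOF equation x'' = f(t) - r(x, x')
    under the swept-sine excitation f = sin(0.01 t^2 + 0.01 t)."""
    f = lambda t: np.sin(0.01 * t**2 + 0.01 * t)
    deriv = lambda t, y: np.array([y[1], f(t) - restoring(y[0], y[1])])
    n = int(round(t_end / dt))
    ts = np.linspace(0.0, t_end, n + 1)
    ys = np.zeros((n + 1, 2))                  # columns: displacement, velocity
    for i in range(n):                         # classical RK4 step
        t, y = ts[i], ys[i]
        k1 = deriv(t, y)
        k2 = deriv(t + dt / 2, y + dt / 2 * k1)
        k3 = deriv(t + dt / 2, y + dt / 2 * k2)
        k4 = deriv(t + dt, y + dt * k3)
        ys[i + 1] = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    x, v = ys[:, 0], ys[:, 1]
    return x, v, restoring(x, v)               # training pairs (x, v) -> r

x, v, r = simulate()
```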
IV. CONCLUSIONS
This study has sought to take a step toward creating a
neural network approach with enough physical/mathematical/
phenomenological insight to be classified as meaningful,
yet remain highly adaptive. The important problem of
modeling nonlinear restoring forces in a SDOF system based
on the force-state mapping formulation has been selected as
an example to demonstrate that this goal can be achieved
by introducing a prototype-based initialization in which human
judgement is exercised in an engineered manner based on the
algebraic and/or geometric aspects of the problem itself.
ACKNOWLEDGMENT
This study was supported in part by the National Science
Foundation under SGER CMS-0332350 for the first author
and CAREER Award CMS-0134333 for the third author.
REFERENCES
[1] K. A. Al-Mashouq and S. R. Irving, “Including Hints in Training Neural
Nets,” Neural Computation, MIT, 1991, vol. 3, pp. 418–427.
[2] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford Univer-
sity Press, 1995, pp. 482.
[3] T. L. Burrows and M. Niranjan, “Feed-Forward and Recurrent Neural
Networks for System Identification,” Cambridge University Engineering
Department, CUED/F-INFENG/TR158, 1993.
[4] P. Costa and P. Larzabal, “Initialization of Supervised Training for
Parametric Estimation,” Neural Processing Letters, Kluwer Academic
Publishers, Printed in the Netherlands, 1999, vol. 9, pp. 53–61.
[5] G. Cybenko, “Approximation by Superpositions of Sigmoidal Function,”
Mathematics of Control, Signals, and Systems, 1989, vol. 2, pp. 303–314.
[6] T. Denoeux and R. Lengellé, “Initializing Back Propagation Networks
with Prototypes,” Neural Networks, 1993, vol. 6, pp. 351–363.
[7] G. P. Drago and S. Ridella, “Statistically Controlled Activation Weight
Initialization (SCAWI),” IEEE Transactions on Neural Networks, 1992,
vol. 3, no. 3, pp. 627–631.
[8] J. Ghaboussi and X. Wu, “Soft Computing with Neural Networks for En-
gineering Applications: Fundamental Issues and Adaptive Approaches,”
Structural Engineering Mechanics, 1998, vol. 6, no. 8, pp. 955–969.
[9] K. Hornik, M. Stinchcombe and H. White, “Multilayer Feedforward
Networks are Universal Approximators,” Neural Networks, 1989, vol. 2,
pp. 359–366.
[10] W. Y. Huang and R. P. Lippmann, “Neural Net and Traditional Clas-
sifiers,” Neural Information Processing Systems, D. Anderson (ed.),
American Institute of Physics, New York, 1988, pp. 387–396.
[11] L. K. Jones, “Constructive Approximations for Neural Networks by
Sigmoidal Functions,” Proceedings of the IEEE, 1990, vol. 78, no. 10,
pp. 1586–1589.
[12] L. S. Kim, “Initializing Weights to a Hidden Layer of a Multilayer
Neural Network by Linear Programming,” Proceedings of 1993 Interna-
tional Joint Conference on Neural Networks, 1993. vol. 2, pp. 1701–1704.
[13] M.-Y. Kim and C.-H. Choi, “A New Weight Initialization Method for
the MLP with the BP in Multiclass Classification Problems,” Neural
Processing Letters, 1997, vol. 6, pp. 11–23.
[14] M. Lehtokangas, J. Saarinen and K. Kaski, “Initializing Weights of a
Multilayer Perceptron Network by Using the Orthogonal Least Squares
Algorithm,” Neural Computation, 1995, vol. 7, pp. 982–999.
[15] M. Lehtokangas, “Fast Initialization for Cascade-Correlation Learning,”
IEEE Transactions on Neural Networks, 1999, vol. 10, no. 2, pp. 410–
414.
[16] N. B. Karayiannis, “Accelerating the Training of Feedforward Neural
Networks Using Generalized Hebbian Rules for Initializing the Internal
Representations,” IEEE Transactions on Neural Networks, 1996, vol. 7,
no. 2, pp. 419–426.
[17] G. Li, H. Alnuweiri, Y. Wu and H. Li, “Acceleration of Back
Propagation through Initial Weight Pre-Training with Delta Rule,”
Proc. Int. Joint Conf. Neural Networks, 1993, vol. 1, pp. 580–585.
[18] L. Ma and K. Khorasani, “New Training Strategies for Constructive
Neural Networks with Application to Regression Problems,” Neural
Networks, 2004, vol. 17, pp. 589–609.
[19] S. F. Masri and T. K. Caughey, “A Nonparametric Identification Tech-
nique for Nonlinear Dynamic Problems,” Journal of Applied Mechanics,
1979, vol. 46, pp. 433–447.
[20] S. F. Masri, J. P. Caffrey, T. K. Caughey, A. W. Smyth and A. G.
Chassiakos, “Identification of the State Equation in Complex Non-Linear
Systems,” International Journal of Non-Linear Mechanics, 2004, vol. 39,
pp. 1111–1127.
[21] A. J. Meade, Jr., “Regularization of a Programmed Recurrent Artificial
Neural Network,” Journal of Guidance, Control, and Dynamics, 2003.
[22] H. N. Mhaskar, “Neural Networks for Optimal Approximation of
Smooth and Analytic Functions,” Neural Computation, 1995, vol. 8,
pp. 164–177.
[23] O. Nelles, Nonlinear System Identification: From Classical Approaches
to Neural Networks and Fuzzy Models, Springer Verlag, 2000.
[24] D. Nguyen and B. Widrow, “Improving the Learning Speed of 2-Layer
Neural Networks by Choosing Initial Values of the Adaptive Weights,”
Proceedings of the IJCNN, pp. 21–26, 1990, vol.III.
[25] S. Osowski, “New Approach to Selection of Initial Values of Weights in
Neural Function Approximation,” Electronics Letters, 1993, vol. 29, no. 3,
pp. 313–315.
[26] J. S. Pei, “Parametric and Nonparametric Identification of Nonlinear
Systems,” Columbia University, 2001, Ph.D. Dissertation.
[27] J. S. Pei, A. W. Smyth and E. B. Kosmatopoulos, “Analysis and
Modification of Volterra/Wiener Neural Networks for Identification of
Nonlinear Hysteretic Dynamic Systems,” Journal of Sound and Vibration,
vol. 275, no. 3-5, pp. 693–718.
[28] J. S. Pei and A. W. Smyth, “A New Approach to Design Multilayer Feed-
forward Neural Network Architecture in Modeling Nonlinear Restoring
Forces: Part I - Formulation,” ASCE Journal of Engineering Mechanics,
2004, accepted for publication.
[29] J. S. Pei and A. W. Smyth, “A New Approach to Design Multilayer Feed-
forward Neural Network Architecture in Modeling Nonlinear Restoring
Forces: Part II - Applications,” ASCE Journal of Engineering Mechanics,
2004, accepted for publication.
[30] J. S. Pei, J. P. Wright, and A. W. Smyth, “Mapping Polynomial Fitting
into Feedforward Neural Networks for Modeling Nonlinear Dynamic
Systems and Beyond,” Computer Methods in Applied Mechanics and
Engineering, 2004, accepted for publication.
[31] W. F. Schmidt and S. Raudys and M. A. Kraaijveld and M. Skurikhina,
and R. P. W. Duin, “Initializations, Back-Propagation and Generalization
of Feed-Forward Classifiers,” Proc. Int. Joint Conf. Neural Networks,
pp. 598–604, 1993.
[32] Y. K. Wen, “Methods of Random Vibration for Inelastic Structures,”
Appl. Mech. Rev., ASME, 1989, vol. 42, no. 2, pp. 39-52.
[33] L. F. A. Wessels and E. Barnard, “Avoiding False Local Minima
by Proper Initialization of Connections,” IEEE Transactions on Neural
Networks, 1992, vol. 3, no. 6, pp. 899–905.
[34] K. Worden and G. R. Tomlinson, Nonlinearity in Structural Dynamics:
Detection, Identification and Modelling, Institute of Physics Pub, 2001.
[35] J. Y. F. Yam and T. W. S. Chow, “Feedforward Networks Training Speed
Enhancement by Optimal Initialization of the Synaptic Coefficients,”
IEEE Transactions on Neural Networks, 2001, vol. 12, no. 2, pp. 430–
434.
[36] S. Zhao and T. S. Dillon, “Incorporating Prior Knowledge in the Form
of Production Rules into Neural Networks Using Boolean-Like Neurons,”
Applied Intelligence, 1997, vol. 7, pp. 275–285.