Neural Network Initialization with Prototypes -
A Case Study in Function Approximation
Jin-Song Pei
School of Civil Engineering and
Environmental Science
University of Oklahoma
Norman, OK 73019
E-mail: jspei@ou.edu
Joseph P. Wright
Division of Applied Science
Weidlinger Associates Inc.
New York, NY 10014
E-mail: wright@wai.com
Andrew W. Smyth
Department of Civil Engineering and
Engineering Mechanics
Columbia University
New York, NY 10027
E-mail: smyth@civil.columbia.edu
Abstract—The initialization of neural networks for function
approximation has been studied by many researchers yet remains
a challenging problem. Another important yet open issue in the
neural network community is how to incorporate knowledge and
hints into training so as to obtain a meaningful neural network.
This study attempts to address both issues in handling a specific
type of engineering problem, namely, modeling the nonlinear
hysteretic restoring force of a dynamic system under a specific
formulation. The paper showcases a heuristic idea of using a
growing technique through a prototype-based initialization, where
insight into the governing mathematics/physics is related to
the features of the activation functions.
I. INTRODUCTION
As an important issue in theories and applications, mul-
tilayer feedforward neural network initialization in function
approximation has been studied by many researchers (e.g.,
[24], [7], [33], [3], [25], [31], [6], [12], [2], [14], [16], [13],
[4], [35]) focusing on fast convergence. A major lesson from
this body of work is that an efficient initialization (measured
by the success and rate of convergence) must exploit the
features of the function to be approximated (e.g., [6], [15]).
In this study, the authors were prompted by a specific
engineering application of neural networks to seek an
efficient as well as meaningful solution to the initialization
of a multilayer feedforward neural network.
A heuristic approach is presented to initialize multilayer
feedforward neural networks for a special and critical class
of problems in engineering mechanics and numerous appli-
cations. Mathematically, a smooth function surface is sought
from some discrete input-output points. This pursuit has two
goals: one is to find an initial point from which to start the
training of a multilayer feedforward neural network so as to
enable convergence and fast training; the other is to facilitate
the interpretation of the inner workings of the neural network.
The latter is driven by the needs of engineering practice ([28]),
especially the demand for fusing physics-based and data-driven
modeling tools.
II. LITERATURE REVIEW
Finding good initial points from which to start neural network
training is of critical importance; substantial reductions in
training time are claimed when the initial point is optimized
(e.g., [6]). Quite a few publications on initialization schemes
are available, although perhaps not as many as those focusing
on the training process itself. Summaries of these works can
be found in [15] and [35], for example. Despite these works,
both analytic and heuristic approaches to guide practical
applications are still lacking. This might be because the
challenge can only be addressed well by looking into the
features of the function to be approximated (or, equivalently,
the features of the error surface) and thus is hard to tackle
in a general sense ([6], [15]).
[36] and [1] are examples of past efforts to incorporate
knowledge into neural network training. For engineering
applications, existing knowledge of engineering and mathematics
would offer significant advantages if a proper means could be
found to utilize it in the most critical and under-addressed
issues in neural networks, such as the initial setup. Constructive
methods in neural networks do exist (e.g., [11] and a summary
in [18]); however, the available knowledge and hints are far
from being sufficiently connected to any of these constructive
methods.
Among these existing studies, [6] is the most closely related
to this one; there, a prototype-based initialization was proposed
for both pattern classification and function approximation.
Noteworthy is that [6] also paid attention to the interpretation
of the weights and/or basis functions. The present study,
however, is more detailed and exercised more thoroughly,
mainly for the approximation of arbitrary functions
in a specific engineering application. The inspirations that
these previous works provide can be summarized as follows:
1) Random initialization schemes (e.g., [31]) do not nor-
mally work well, which has been widely acknowledged
by researchers, e.g., [13], [4].
2) Methods based on a good understanding of the capabilities
of the sigmoidal activation function have been studied.
The importance of thoroughly utilizing the features of
sigmoidal functions has been recognized. Various attempts
have been made to use a sigmoidal function as
a whole (e.g., [11]), in a linearized form (e.g., [3]), in a
piecewise fashion (e.g., [24], [25]), and as a nonlinear
function (e.g., [4]), respectively.
From these studies, the need to prevent neurons from
saturating has been recognized. The weights between the
input and hidden layer are given special attention since
they form the basis functions in function approximation.
The work in [22] does not deal with neural network
initialization directly; however, it analyzed the capability
of sigmoidal functions to approximate polynomials, and
was used by [21] in setting up recurrent neural networks.
3) Methods utilizing the features of the function to be
approximated have been proposed.
The necessity of studying the intrinsic features/patterns of
the function to be approximated, and the importance of
exercising human judgement on any visible features of
the function, have been reported, e.g., in [25], [6].
This will be referred to as engineering judgement in
this study.
4) The necessity of preprocessing data is widely acknowledged.
[17] proposed a delta-rule pre-training scheme, while
[16] proposed a bottom-up unsupervised learning
process before top-down supervised training is applied.
For cascade learning, [15] also proposed a pre-training
stage.
III. PROPOSED INITIALIZATION
A. Problem Statement
Modeling nonlinear hysteresis has been a critical and active
research topic in engineering mechanics, with applications in
studying the working loads of airplane joints, performing
simulations of infrastructure under dynamic loads (e.g.,
earthquake excitations), and many others. Hysteretic
nonlinearity, especially the rate- and path-dependent hysteretic
behavior of a joint, dictates the system-level response and thus
has been studied analytically, experimentally and numerically.
Physics-based macro-models have the advantage of physical
interpretability; however, the physical mechanism is often hard
to model due to the complex nature of the problem itself.
Analytic models (such as the Bouc-Wen model [32]) have been
widely adopted in simulations; however, their weak form makes
application, especially to online system identification,
challenging. Another important class, phenomenological models,
has been widely accepted in engineering practice because it
enables intuitive understanding, although it faces the same
challenge as the analytic models, i.e., limited adaptivity
to data. Neural networks have been exercised by engineers
to tackle such problems using available experimental
measurements (e.g., [8]). However, just as in other applications,
lacking physical interpretation or any intuitive understanding,
this highly adaptive and efficient method has not yet been
fully exploited for this engineering problem.
Among the leading formulations for modeling nonlinear hysteresis,
the force-state mapping technique for Single-Degree-Of-Freedom
(SDOF) systems ([19]) serves as a cornerstone. The usefulness
and limitations of this formulation in modeling nonlinear
hysteretic restoring forces, especially memory-associated
effects, have been discussed in [34] and [20]. In short,
force-state mapping treats the nonlinear restoring force of a
SDOF system as a function of both of its states, i.e.,
displacement and velocity. Given its importance and simplicity,
this formulation is adopted in this first attempt at
constructing an efficient and meaningful neural network.
In principle, fitting a restoring force surface of a SDOF
system in a state-space can be carried out using a neural
network with one hidden layer ([5], [9]) as in other function
approximation problems, which is defined by the following
expression:
r(x, \dot{x}) \approx \hat{r}(p_1, p_2) = \sum_{j=1}^{n_h} w_{2,j}\, h_j(p_1, p_2), \quad \text{and} \quad h_j(p_1, p_2) = \frac{1}{1 + e^{-(w_{11,j} p_1 + w_{12,j} p_2 - b_j)}}

where r and \hat{r} stand for the exact and approximated restoring
force, while p_1 and p_2 refer to the scaled displacement x and
velocity \dot{x}, respectively, for a SDOF system. The scaling issue
has been discussed in [29]. A feedforward neural network
with more than one hidden layer may be required for better
computational efficiency and perhaps more importantly to
yield better insight into the modeling behavior.
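As an illustration only (not the authors' code), the single-hidden-layer approximation above can be sketched in a few lines; the parameter values below are arbitrary placeholders rather than values derived from any prototype:

```python
import numpy as np

def restoring_force_net(p1, p2, w1, w2, b):
    """Single-hidden-layer approximation r_hat(p1, p2).

    w1 : (n_h, 2) input-to-hidden weights [w_11j, w_12j]
    w2 : (n_h,)   hidden-to-output weights w_2j
    b  : (n_h,)   hidden biases b_j
    """
    a = w1[:, 0] * p1 + w1[:, 1] * p2 - b      # pre-activation of node j
    h = 1.0 / (1.0 + np.exp(-a))               # logistic sigmoid h_j
    return float(np.dot(w2, h))                # linear output layer

# placeholder parameters for a 3-node network (illustrative only)
w1 = np.array([[0.5, 0.1], [-0.3, 0.8], [0.2, -0.6]])
w2 = np.array([1.0, -0.5, 0.7])
b = np.array([0.0, 0.2, -0.1])
r_hat = restoring_force_net(0.4, -0.2, w1, w2, b)
```

Since each h_j lies in (0, 1), the output magnitude is bounded by the sum of the absolute output weights.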
B. Methodology
The authors have proposed an approach [26], [28], [29],
[30] to determine the number of hidden layers, the number
of hidden nodes, and the initial values of all the weights
and biases, all based on the features of the restoring force
surfaces to be modeled. Overall, the proposed method is a
growing technique. Various prototypes are built first to serve
as a general guideline for the initialization. These prototypes
are constructed to simulate some typical nonlinearities based
on their mathematical expressions and/or geometric features,
together with the capabilities of sigmoidal functions. When
building these prototypes, one is working on a forward problem;
therefore no training is involved. Not only the number of
hidden nodes, but also the values of the weights and biases,
are derived in the process. The prototypes are then grouped
according to the typical nonlinearities that they stand for.
This is equivalent to constructing a look-up table of
nonlinearity types and candidate neural network architectures.
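The look-up table idea can be sketched as follows; the entries, feature labels, and parameter values here are hypothetical placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical look-up table: nonlinearity type -> candidate prototype
# (number of hidden nodes plus initial weights/biases), built once from
# forward analysis of sigmoid sums -- no training involved.
PROTOTYPES = {
    "duffing":   {"n_hidden": 5, "w1": np.ones((5, 2)), "w2": np.ones(5), "b": np.zeros(5)},
    "coulomb":   {"n_hidden": 3, "w1": np.ones((3, 2)), "w2": np.ones(3), "b": np.zeros(3)},
    "clearance": {"n_hidden": 6, "w1": np.ones((6, 2)), "w2": np.ones(6), "b": np.zeros(6)},
}

def select_prototype(feature_label):
    """Pre-processing classifies the surface's contour features into a
    nonlinearity label; the table returns the matching initial design."""
    return PROTOTYPES[feature_label]

init = select_prototype("coulomb")
```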
When training a multilayer feedforward neural network to
obtain a smooth force-state mapping surface from discrete
points in the force-state space, a pre-processing stage is
often carried out to extract the main features of the surface
to be approximated. Based on these features, prototypes are
selected to provide an initial neural network architecture
together with initial weights and biases. Training then starts
to capture the delicate features of the surface, and more nodes
are added for improved approximation accuracy.
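A minimal sketch of the growing step (illustrative only: the paper retrains all weights, e.g., with Levenberg-Marquardt, whereas this placeholder refits only the output layer by least squares):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grow_network(P, r, w1_init, b_init, tol=1e-3, max_nodes=20, seed=0):
    """Start from prototype hidden nodes (w1_init, b_init); add nodes one
    at a time until the fit is good enough or max_nodes is reached."""
    rng = np.random.default_rng(seed)
    w1, b = w1_init.copy(), b_init.copy()
    while True:
        H = sigmoid(P @ w1.T - b)                  # hidden activations (N, n_h)
        w2, *_ = np.linalg.lstsq(H, r, rcond=None) # refit output layer
        mse = float(np.mean((H @ w2 - r) ** 2))
        if mse < tol or w1.shape[0] >= max_nodes:
            return w1, b, w2, mse
        # grow: append one node with small random input weights
        w1 = np.vstack([w1, 0.1 * rng.standard_normal((1, P.shape[1]))])
        b = np.append(b, 0.0)

# illustrative usage on synthetic force-state data
rng = np.random.default_rng(1)
P = rng.uniform(-1, 1, (100, 2))
target = np.tanh(P[:, 0]) + 0.5 * P[:, 1]
w1, b, w2, mse = grow_network(P, target, np.eye(2), np.zeros(2))
```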
It is important to note that the motivation for this study is
not limited to fast training and guaranteed convergence. Of
equal importance, seeking a meaningful interpretation and
even an intuitive understanding is emphasized in this
neural network initialization.
C. Understanding Capabilities of Sigmoidal Functions
The key to building the proposed prototypes is to fully
utilize the nonlinearity of the sigmoidal activation function in
approximating nonlinearities commonly encountered in
force-state mapping. As a first step, the authors ([26],
[30]) presented a constructive approach to approximating
polynomials (see Fig. 1 for some examples) and to mapping
polynomial fitting into a multilayer feedforward neural
network (see Fig. 2). Based on a finite linear sum of
hyperbolic sigmoidal functions and their Taylor series
expansions, the numbers of hidden nodes as well as initial
values for the weights and biases are derived to satisfy a
specified degree of accuracy. In another study, based on a
different methodology by [22] and adopted by [21], derived
neural networks were also sought to map polynomials without
training. Approximating polynomials is studied first because of
its important role in the force-state mapping literature, i.e.,
Chebyshev [19] and ordinary polynomial fitting. Since the
obtained multilayer feedforward neural network, whose
functional form consists of a small sum of sigmoidal basis
functions, can at least efficiently approximate polynomials,
its application to the force-state mapping problem is quite
natural and compelling.
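As an illustrative sketch (consistent with the antisymmetric two-node design for z ≈ p in Fig. 1, but not the authors' exact derivation): an antisymmetric pair of logistic sigmoids with a small input weight w reproduces the monomial p to leading order in its Taylor expansion, since sigma(wp) - sigma(-wp) = wp/2 + O((wp)^3); scaling the output weights by 2/w recovers z ≈ p:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def approx_linear(p, w=0.1):
    """Two hidden nodes with input weights +w, -w and output weights
    +2/w, -2/w approximate z = p with error O(w**2 * p**3)."""
    return (2.0 / w) * (sigmoid(w * p) - sigmoid(-w * p))

p = np.linspace(-1.0, 1.0, 21)
err = np.max(np.abs(approx_linear(p, w=0.1) - p))
```

For w = 0.1 on [-1, 1] the leading error term w^2 p^3 / 12 keeps the maximum deviation below 1e-3.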
[Figure content not reproducible in text: derived network diagrams, with weights w_{1,j}, w_{2,j} and biases b_1, approximating z ≈ p^0, p^1, p^2 and p^3.]
Fig. 1. Examples of the derived neural networks to approximate polynomials.
The effectiveness and flexibility of multilayer feedforward
neural networks, however, far surpass this minimal proficiency
in approximating polynomial-type nonlinearities. In addition
to the rigorous algebraic approach used in mapping polynomials,
a qualitative geometric analysis has been carried out to
study the capabilities of linear sums of sigmoidal functions
specifically for the force-state mapping problem ([26], [28]).
Two examples are given in Figs. 3 and 4.
D. Prototypes
The geometric features, in terms of 2-D contours, of various
restoring force surfaces are examined first using analytic,
simulated and experimental results. Then, strategies are
developed to mimic these features in a transparent manner by
focusing on the number of hidden layers and hidden nodes, as
well as the needed values of the weights and biases. These
neural networks are the prototypes.
[Figure content not reproducible in text: a three-layer network whose first-layer nodes form powers (p_1)^0 through (p_1)^3, (p_2)^1 through (p_2)^3 and cross terms such as (p_1)^2 p_2 and p_1 (p_2)^2, combined through fully connected layers to output z.]
Fig. 2. A derived neural network to explicitly perform polynomial fitting for
two variables up to the cubic power including the cross terms. Note that the
fixed-weight training method ([10]) needs to be applied.
Some single-hidden-layer neural network prototypes and
their simulation results are presented in Fig. 5. A general
procedure for performing these simulations is summarized in
the caption. Note that none of the neural network prototypes
presented here is obtained from training. Instead, their
numbers of hidden nodes and the values of the weights and
biases are decided based on the above algebraic and geometric
study, so as to introduce some of the physical, mathematical,
and geometric features of nonlinear surfaces through human
judgement rather than leaving these issues entirely to the
data sets.

It can be seen that the simulation plots in Fig. 5, especially
the restoring force versus displacement plots, can represent
some typical nonlinear hysteretic phenomena, which illustrates
the capability of these prototypes. As long as the initial
weights and biases are selected properly, a sum of a very small
number of sigmoidal functions can achieve, in an efficient,
flexible and unified manner, what the polynomial and
non-polynomial fitting schemes used by the engineering
mechanics community generally cannot. This finding is not
merely a validation of the feasibility studies by [5], [9],
which explore what multilayer feedforward neural networks can
do; rather, it suggests schematically how to design multilayer
feedforward neural networks to approximate a nonlinear
function, i.e., a nonlinear restoring force in this
[Figure content not reproducible in text: plots of h(p) over p in [-10, 10] for (w, b) = (1, 0), (1, -5 and -10), (1, 5 and 10), (0.05, 0), (10, 0).]
Fig. 3. Effect of varying weights and biases for a single sigmoidal function.
[Figure content not reproducible in text: plots of h_1(p) + h_2(p) over p in [-10, 10] for weight/bias pairs such as (w_1, b_1; w_2, b_2) = (0.3, -2; 0.3, 2), (0.6, -4; 0.6, 4), (0.9, -9; 0.9, 9), (0.6, -12; 0.6, 12) and others.]
Fig. 4. Forming quasi-odd functions with different features, especially the
varying length of the plateau, by using a linear sum of two sigmoidal functions.
context.
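To illustrate the quasi-odd construction (an illustrative sketch, not the authors' code): with mirrored biases -b and +b and a common weight w, the sum h_1(p) + h_2(p) = sigma(wp + b) + sigma(wp - b) passes through 1 at p = 0 and is odd about the point (0, 1), because sigma(-a) = 1 - sigma(a):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def quasi_odd(p, w=0.3, b=2.0):
    """Sum of two sigmoids with biases -b and +b (cf. Fig. 4)."""
    return sigmoid(w * p + b) + sigmoid(w * p - b)

p = np.linspace(-10.0, 10.0, 101)
s = quasi_odd(p)
# odd symmetry about (0, 1): s(p) + s(-p) == 2 for every p
sym_residual = np.max(np.abs(s + quasi_odd(-p) - 2.0))
```

Increasing b while keeping w fixed widens the central plateau, as in the panels of Fig. 4.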
The “meaning” of the weights and biases can be appreciated
from a parametric study run on each prototype using the same
architecture but with various parameter values. An example
is shown in Fig. 6.
The prototypes presented herein serve only as examples. It
is suggested to collect more prototypes like these, study the
influence of the values of the weights and biases on each
surface profile, form corresponding strategies based on the
mathematical, physical, and geometrical features, and store
all of them in a look-up table or even a library, so as to
provide guidelines on how (in terms of architecture design)
and where (in terms of initial values of weights and biases)
to start neural network training in mapping nonlinear
restoring forces.
To interface raw input-output data with such a library,
especially when real-world large data sets are studied, it is
further proposed to use a pre-processing stage to seek guidance
on the initial neural network design (perhaps with some
additional iterative trials as well), which aims at grasping the
global characteristics of the restoring force surface. The training then
[Figure content not reproducible in text: three prototype networks (Duffing, Softening, Coulomb) with their constructed restoring force surfaces and restoring force versus displacement plots.]
Fig. 5. Examples of neural network prototypes in modeling the restoring force
using displacement and velocity for a SDOF system. In each case, a linear
sum of sigmoidal functions with a selected number of terms and specified
values of weights and biases (see Row 1) is used to construct a restoring
force surface (see Row 2). Note that the values of the weights and biases are
not indicated. A synthetic swept-sine excitation force is then applied to excite
the SDOF system with the restoring force surface defined by the neural
network. By solving the equation of motion numerically, a discrete response
time history in terms of the system states (displacement and velocity)
populates the restoring force surface to create a trajectory (see Row 2). The
trajectory is then projected to produce the restoring force versus displacement
plot in Row 3.
starts with a clearly defined initial condition that is a product
of physical insight and engineering judgement, and is used to
fine-tune the surface to reflect localized and/or delicate
features, with or without increasing the size of the neural
network.
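A minimal sketch of such a pre-processing stage, assuming simple grid binning of the scattered force-state samples (the interpolation and contour analysis described in the paper would replace this placeholder):

```python
import numpy as np

def bin_surface(x, v, r, n_bins=10):
    """Average scattered restoring-force samples (x, v, r) onto a coarse
    displacement-velocity grid, giving a data-based (non-analytical)
    surface whose contour features can then guide prototype selection."""
    xi = np.clip(np.digitize(x, np.linspace(x.min(), x.max(), n_bins + 1)) - 1, 0, n_bins - 1)
    vi = np.clip(np.digitize(v, np.linspace(v.min(), v.max(), n_bins + 1)) - 1, 0, n_bins - 1)
    total = np.zeros((n_bins, n_bins))
    count = np.zeros((n_bins, n_bins))
    np.add.at(total, (xi, vi), r)
    np.add.at(count, (xi, vi), 1.0)
    with np.errstate(invalid="ignore"):
        return total / count            # NaN where a cell has no samples

# synthetic example with a linear restoring force (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
v = rng.uniform(-1, 1, 2000)
surface = bin_surface(x, v, x + 0.1 * v)
```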
The nonuniqueness of identified models for the neural
network approach is well known, that is, given different initial
points (corresponding to different sets of initial weights and
biases), the final trained results differ even for the same set
of training data. The proposed methodology leads to an initial
point that is placed close to a location where there is some type
of physical, geometrical or mathematical meaning. Starting
with such an initial design, it is expected that the final trained
results are closer to having some meaning than if one adopted
other initialization schemes. This is the key assumption made
in this study, which can be justified by the nature of the
local search training tools often used for a global search goal
associated with generalization using neural networks ([23]).
E. Training Example
A training example is presented in Fig. 7. The entire
procedure, including the proposed initialization, is demonstrated
from Fig. 7(a) through Fig. 7(d) and explained in the caption.
Prototypes and strategies for handling arbitrary parallel
contours and some localized features, presented in [26] and [28],
are adopted here to decide the neural network architecture and
the initial values of the weights and biases.
[Figure content not reproducible in text: a clearance (dead-space) prototype network and, for three parameter sets, its restoring force surfaces, contours, and restoring force versus displacement plots.]
Fig. 6. A parametric study run on a prototype for clearance (or dead-space)
nonlinearity, showing that the values of the weights and biases can fine-tune
the features of the nonlinearity.
While the major benefit of this prototype-based initialization
is the clear sense it provides of what to do, and how to do it,
in the network design, the proposed initialization can also be
justified by its improved mean-square-error (MSE) performance
(see Case 2 in Fig. 7(d)). The weights and biases derived from
the proposed initialization not only give the smallest initial
MSE among the four cases, but also lead to the best performance
after about 300 epochs. This indicates the computational merit
of the proposed initialization, although many more examples
should be investigated to validate this conclusion. A comparison
of the values of the weights and biases between Case 1 (based
on the popular Nguyen-Widrow layer initialization method [24])
and Case 2 (based on the derived initial values) is presented
in [29]. It appears that most of the trained values are close
to their corresponding initial values in both cases.
F. Ongoing Work
Presented above is an effort to form an initial neural
network according to physical/mathematical/geometrical
interpretations while preserving the neural network's adaptivity
to data when using force-state mapping to model nonlinear
hysteretic restoring forces. Further theoretical justifications
(e.g., how to grow the size of the neural network based on
some mathematical formulations, and how to define the fixity
of the neural network during training) and training validations
for more complicated nonlinearities and large real-world data
sets are being carried out. An expansion of the proposed
initialization methodology to more advanced formulations of
the specific engineering problem is also being studied (e.g.,
[27]).
[Figure content not reproducible in text: the exact restoring force-displacement plot, the contour of the exact restoring force surface, the four derived network architectures, and squared-error-versus-epoch curves for the four cases:
Case 1 - derived two-hidden-layer architecture with Nguyen-Widrow layer initialization;
Case 2 - derived two-hidden-layer architecture with the proposed initialization in this study;
Case 3 - further simplified one-hidden-layer (six hidden nodes) architecture with Nguyen-Widrow layer initialization;
Case 4 - further simplified one-hidden-layer (three hidden nodes) architecture with Nguyen-Widrow layer initialization.]
Fig. 7. A training example using the proposed initialization. (a) A simulated
velocity-squared damping data set is formed by using r(x, \dot{x}) = 0.04\dot{x} +
0.04\dot{x}^2 \mathrm{sign}(\dot{x}) + x and a swept-sine excitation f = \sin(0.01t^2 + 0.01t)
with a uniform time step t = 0, 0.1, ..., 200. (b) The original data set is
organized into pairs of restoring force (output of the neural network) versus
displacement and velocity (inputs to the neural network). In a pre-processing
stage, numerical interpolation is then carried out to form a data-based
(non-analytical) restoring force surface. Since 2-D contour lines characterize
a 3-D surface, contour features of the interpolated surface are examined.
Based on these features, a neural network architecture is derived using the
prototype in Fig. 4 and others presented in [26] and [28]. The obtained
initial architecture is shown as Cases 1 and 2 in (c). Further simplifications
lead to the architectures shown as Cases 3 and 4. (d) To validate the proposed
initialization, four cases are considered in the training. Cases 1 and 2 both
adopt the derived neural network architecture but with different initial
values of the weights and biases. Cases 3 and 4 investigate how the simplified
architectures affect the training performance. Throughout, the batch training
mode and the Levenberg-Marquardt algorithm are adopted.
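A sketch of how such a training data set can be generated (illustrative only; the integrator here is a classical fourth-order Runge-Kutta assumption, as the paper does not specify one):

```python
import numpy as np

def restoring(x, v):
    """Velocity-squared damping model from Fig. 7(a)."""
    return 0.04 * v + 0.04 * v**2 * np.sign(v) + x

def simulate(dt=0.1, t_end=200.0):
    """Integrate the unit-mass SDOF equation x'' = f(t) - r(x, x')
    under the swept-sine excitation f = sin(0.01 t^2 + 0.01 t)."""
    f = lambda t: np.sin(0.01 * t**2 + 0.01 * t)
    deriv = lambda t, y: np.array([y[1], f(t) - restoring(y[0], y[1])])
    n = int(round(t_end / dt))
    ts = np.linspace(0.0, t_end, n + 1)
    ys = np.zeros((n + 1, 2))                  # columns: displacement, velocity
    for i in range(n):                         # classical RK4 step
        t, y = ts[i], ys[i]
        k1 = deriv(t, y)
        k2 = deriv(t + dt / 2, y + dt / 2 * k1)
        k3 = deriv(t + dt / 2, y + dt / 2 * k2)
        k4 = deriv(t + dt, y + dt * k3)
        ys[i + 1] = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    x, v = ys[:, 0], ys[:, 1]
    return x, v, restoring(x, v)               # training pairs (x, v) -> r

x, v, r = simulate()
```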
IV. CONCLUSIONS
This study has sought to take a step toward creating a
neural network approach with enough physical/mathematical/
phenomenological insight to be classified as meaningful,
yet remain highly adaptive. The important problem of
modeling nonlinear restoring forces in a SDOF system based
on the force-state mapping formulation has been selected as
an example to demonstrate that this goal can be achieved
by introducing a prototype-based initialization in which human
judgement is exercised in an engineered manner based on the
algebraic and/or geometric aspects of the problem itself.
ACKNOWLEDGMENT
This study was supported in part by the National Science
Foundation under SGER CMS-0332350 for the first author
and CAREER Award CMS-0134333 for the third author.
REFERENCES
[1] K. A. Al-Mashouq and S. R. Irving, “Including Hints in Training Neural
Nets,” Neural Computation, MIT, 1991, vol. 3, pp. 418–427.
[2] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford Univer-
sity Press, 1995, pp. 482.
[3] T. L. Burrows and M. Niranjan, “Feed-Forward and Recurrent Neural
Networks for System Identification,” Cambridge University Engineering
Department, CUED/F-INFENG/TR158, 1993.
[4] P. Costa and P. Larzabal, “Initialization of Supervised Training for
Parametric Estimation,” Neural Processing Letters, Kluwer Academic
Publishers, Printed in the Netherlands, 1999, vol. 9, pp. 53–61.
[5] G. Cybenko, “Approximation by Superpositions of Sigmoidal Function,”
Mathematics of Control, Signals, and Systems, 1989, vol. 2, pp. 303–314.
[6] T. Denoeux and R. Lengellé, “Initializing Back Propagation Networks
with Prototypes,” Neural Networks, 1993, vol. 6, pp. 351–363.
[7] G. P. Drago and S. Ridella, “Statistically Controlled Activation Weight
Initialization (SCAWI),” IEEE Transactions on Neural Networks, 1992,
vol. 3, no. 3, pp. 627–631.
[8] J. Ghaboussi and X. Wu, “Soft Computing with Neural Networks for En-
gineering Applications: Fundamental Issues and Adaptive Approaches,”
Structural Engineering Mechanics, 1998, vol. 6, no. 8, pp. 955–969.
[9] K. Hornik, M. Stinchcombe and H. White, “Multilayer Feedforward
Networks are Universal Approximators,” Neural Networks, 1989, vol. 2,
pp. 359–366.
[10] W. Y. Huang and R. P. Lippmann, “Neural Net and Traditional Clas-
sifiers,” Neural Information Processing Systems, D. Anderson (ed.),
American Institute of Physics, New York, 1988, pp. 387–396.
[11] L. K. Jones, “Constructive Approximations for Neural Networks by
Sigmoidal Functions,” Proceedings of the IEEE, 1990, vol. 78, no. 10,
pp. 1586–1589.
[12] L. S. Kim, “Initializing Weights to a Hidden Layer of a Multilayer
Neural Network by Linear Programming,” Proceedings of 1993 Interna-
tional Joint Conference on Neural Networks, 1993. vol. 2, pp. 1701–1704.
[13] M.-Y. Kim and C.-H. Choi, “A New Weight Initialization Method for
the MLP with the BP in Multiclass Classification Problems,” Neural
Processing Letters, 1997, vol. 6, pp. 11–23.
[14] M. Lehtokangas, J. Saarinen and K. Kaski, “Initializing Weights of a
Multilayer Perceptron Network by Using the Orthogonal Least Squares
Algorithm,” Neural Computation, 1995, vol. 7, pp. 982–999.
[15] M. Lehtokangas, “Fast Initialization for Cascade-Correlation Learning,”
IEEE Transactions on Neural Networks, 1999, vol. 10, no. 2, pp. 410–
414.
[16] N. B. Karayiannis, “Accelerating the Training of Feedforward Neural
Networks Using Generalized Hebbian Rules for Initializing the Internal
Representations,” IEEE Transactions on Neural Networks, 1996, vol. 7,
no. 2, pp. 419–426.
[17] G. Li, H. Alnuweiri, Y. Wu and H. Li, “Acceleration of Back
Propagation through Initial Weight Pre-Training with Delta Rule,”
Proc. Int. Joint Conf. Neural Networks, 1993, vol. 1, pp. 580–585.
[18] L. Ma and K. Khorasani, “New Training Strategies for Constructive
Neural Networks with Application to Regression Problems,” Neural
Networks, 2004, vol. 17, pp. 589–609.
[19] S. F. Masri and T. K. Caughey, “A Nonparametric Identification Tech-
nique for Nonlinear Dynamic Problems,” Journal of Applied Mechanics,
1979, vol. 46, pp. 433–447.
[20] S. F. Masri, J. P. Caffrey, T. K. Caughey, A. W. Smyth and A. G.
Chassiakos, “Identification of the State Equation in Complex Non-Linear
Systems,” International Journal of Non-Linear Mechanics, 2004, vol. 39,
pp. 1111–1127.
[21] A. J. Meade, Jr., “Regularization of a Programmed Recurrent Artificial
Neural Network,” Journal of Guidance, Control, and Dynamics, 2003.
[22] H. N. Mhaskar, “Neural Networks for Optimal Approximation of
Smooth and Analytic Functions,” Neural Computation, 1995, vol. 8,
pp. 164–177.
[23] O. Nelles, Nonlinear System Identification: From Classical Approaches
to Neural Networks and Fuzzy Models, Springer Verlag, 2000.
[24] D. Nguyen and B. Widrow, “Improving the Learning Speed of 2-Layer
Neural Networks by Choosing Initial Values of the Adaptive Weights,”
Proceedings of the IJCNN, pp. 21–26, 1990, vol.III.
[25] S. Osowski, “New Approach to Selection of Initial Values of Weights in
Neural Function Approximation,” Electronics Letters, 1993, vol. 29, no. 3,
pp. 313–315.
[26] J. S. Pei, “Parametric and Nonparametric Identification of Nonlinear
Systems,” Columbia University, 2001, Ph.D. Dissertation.
[27] J. S. Pei, A. W. Smyth and E. B. Kosmatopoulos, “Analysis and
Modification of Volterra/Wiener Neural Networks for Identification of
Nonlinear Hysteretic Dynamic Systems,” Journal of Sound and Vibration,
vol. 275, no. 3-5, pp. 693–718.
[28] J. S. Pei and A. W. Smyth, “A New Approach to Design Multilayer Feed-
forward Neural Network Architecture in Modeling Nonlinear Restoring
Forces: Part I - Formulation,” ASCE Journal of Engineering Mechanics,
2004, accepted for publication.
[29] J. S. Pei and A. W. Smyth, “A New Approach to Design Multilayer Feed-
forward Neural Network Architecture in Modeling Nonlinear Restoring
Forces: Part II - Applications,” ASCE Journal of Engineering Mechanics,
2004, accepted for publication.
[30] J. S. Pei, J. P. Wright, and A. W. Smyth, “Mapping Polynomial Fitting
into Feedforward Neural Networks for Modeling Nonlinear Dynamic
Systems and Beyond,” Computer Methods in Applied Mechanics and
Engineering, 2004, accepted for publication.
[31] W. F. Schmidt and S. Raudys and M. A. Kraaijveld and M. Skurikhina,
and R. P. W. Duin, “Initializations, Back-Propagation and Generalization
of Feed-Forward Classifiers,” Proc. Int. Joint Conf. Neural Networks,
pp. 598–604, 1993.
[32] Y. K. Wen, “Methods of Random Vibration for Inelastic Structures,”
Appl. Mech. Rev., ASME, 1989, vol. 42, no. 2, pp. 39-52.
[33] L. F. A. Wessels and E. Barnard, “Avoiding False Local Minima
by Proper Initialization of Connections,” IEEE Transactions on Neural
Networks, 1992, vol. 3, no. 6, pp. 899–905.
[34] K. Worden and G. R. Tomlinson, Nonlinearity in Structural Dynamics:
Detection, Identification and Modelling, Institute of Physics Pub, 2001.
[35] J. Y. F. Yam and T. W. S. Chow, “Feedforward Networks Training Speed
Enhancement by Optimal Initialization of the Synaptic Coefficients,”
IEEE Transactions on Neural Networks, 2001, vol. 12, no. 2, pp. 430–
434.
[36] S. Zhao and T. S. Dillon, “Incorporating Prior Knowledge in the Form
of Production Rules into Neural Networks Using Boolean-Like Neurons,”
Applied Intelligence, 1997, vol. 7, pp. 275–285.