Raoul Grasman (c) 2003 Submitted to IEEE Transactions on Signal Processing
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. Y, MONTH 2004
Stochastic maximum likelihood mean and cross-spectrum structure estimation: analytic and neuromagnetic Monte Carlo results

Raoul P.P.P. Grasman*, Hilde M. Huizenga, Lourens J. Waldorp, Peter C.M. Molenaar and Koen B.E. Böcker
Abstract— In [1] we proposed to analyze cross-spectrum matrices obtained from electro- or magneto-encephalographic (EEG/MEG) signals, to obtain estimates of the EEG/MEG sources and their coherence. In this paper we extend this method in two ways: first, by modelling such interactions as linear filters, and second, by taking the mean of the signals across different trials into account. To obtain estimates we propose a stochastic maximum likelihood (SML) method, and obtain the concentrated likelihood that includes the trial means.

Keywords— equivalent current dipole, EEG, MEG, stochastic maximum likelihood, array signal processing, mean structure, covariance structures, functional connectivity, effective connectivity
I. INTRODUCTION

In cognitive neuroscience the objective is to establish how structures of the brain cooperate to give rise to mental functions. Several brain imaging techniques are helpful in determining which parts of the cortex become active during certain mental processes [2]. These techniques include functional magnetic resonance imaging (fMRI) and equivalent current dipole (ECD) modelling of the electro- (EEG) and magneto- (MEG) encephalogram [3]. EEG and MEG measure, respectively, the scalp electric potential field and the magnetic field near the head of a human subject. These fields are generated by localized electric currents associated with neuronal activity. In ECD modelling, these currents are modelled by small current dipoles, and the objective is to estimate their unknown locations, orientations and amplitudes, given the EEG/MEG sensor outputs. Techniques such as fMRI provide great localization precision, whereas EEG and MEG provide great timing precision [2]. While function localization to different parts of the cortex has taken off to a great extent over the last decade, researchers are increasingly interested in testing hypotheses about the cooperativity between these different cortical structures: that is, in estimating the parameters that describe the dynamics of these interactions [4], [5].
Standard methods of investigating interactions include coherence analysis of EEG and MEG signals and so-called event related (de-)synchronization [6]. Problems with the interpretation of these measures of cortico-cortical interactions include volume conduction effects, reference electrode effects, and the lack of spatial resolution of the EEG/MEG [4]. Newer approaches consist of localization of activity by means of dipole source localization procedures and correlating the source amplitude functions estimated for these locations [7], [8], [9], [10]. Still another approach, laid down in [1], is to simultaneously estimate dipole locations and their amplitude cross-spectra from the sample cross-spectra of the EEG/MEG signals. The advantage of this last approach is that it makes full use of the virtues of statistical estimation theory, which include high precision maximum likelihood estimators and straightforward model evaluation theory.

Raoul Grasman, Hilde Huizenga, Lourens Waldorp and Peter Molenaar are with the Department of Psychology, University of Amsterdam, Roetersstraat 15, 1018WB Amsterdam, the Netherlands. Phone: +31 20 525 6734. Fax: +31 20 639 0279. Koen Böcker is with the Department of Psychopharmacology, Utrecht University, Sorbonnelaan 16, De Uithof, 3583CA Utrecht, the Netherlands. *Corresponding author e-mail: grasman@psy.uva.nl.
In this paper we extend the method in [1] with a framework for modelling and testing source amplitude coherence. Furthermore, we include the information that is present in the average of the EEG/MEG signals across repeated trials. The suggested framework has its roots in what is known in biometrics as path analysis, and in econometrics and psychometrics as structural equation modelling (SEM) [11]. The method employs maximum likelihood in a way that is very similar to stochastic maximum likelihood (SML) directions-of-arrival (DOA) estimation, as given in e.g. [12], [13], [14]. We modify the usual SML formulas to include the mean and a more general noise covariance.
This paper is organized as follows. In section II the source model is presented. In section III the mean and cross-spectrum model is given, and a framework for modelling source amplitude coherence is presented. In section IV closed form expressions for the estimators of some of the parameters are derived, and an expression for the concentrated negative log-likelihood function is obtained. Also, the generalized likelihood ratio test (GLRT) statistic and approximate standard errors of the estimators are briefly discussed in connection with model evaluation. In section V the approximate standard errors and GLRT statistic are evaluated in a set of numerical experiments. Finally, in section VI some closing remarks on the methods are made.
II. DIPOLE MODEL AND MEASUREMENTS MODEL
Experimental EEG and MEG data usually consist of signal segments measured in different trials, during which stimuli are presented to subjects in order to evoke specific brain responses. The EEG/MEG signals reflect these responses in a highly entangled way, and the purpose of ECD modelling is to disentangle these signals into the underlying components of localized neuroelectric activity in the cortex. It has been widely recognized that these cortical responses, evoked by the presentation of stimuli, are characterized by a deterministic part and a stochastic part [15].¹ The deterministic part, the event related potential/field (ERP/ERF), can be estimated by averaging the signals across many repeated trials [5]. The stochastic part is only reflected in the variance of the signals across trials. It is generally accepted that these trials may be considered as statistically independent replications of an evoked brain response, provided that the time-interval separation between trials is not too small and unpredictable [15]. In relating the fields produced by these neural currents to the measurements, the head is often modelled as a spherically symmetric conductor that is locally fitted to the curvature of the skull [3]. The sources themselves are described by a parameter vector θ′ = [θ′_1, …, θ′_d], containing location and orientation parameters θ_a for each source, indexed a = 1, …, d. Here ′ denotes transposition.

For EEG/MEG data in trials l = 1, …, L the m-dimensional array of measurements has the form²

ỹ_l(t) = A(θ) s̃_l(t) + ñ_l(t),  t = 0, …, T − 1.  (1)

Here ỹ_l(t) is the vector of measurements from m channels on trial l at time t. A is the m × d matrix of which the columns contain the gains for the unit amplitude sources parameterized by θ. s̃_l(t) is the d-vector of source amplitudes in trial l at time t, and ñ_l(t) is an m-vector of noise signals in trial l at time t, independent of s̃_l(t). The gain matrix A is obtained from the quasi-static Maxwell equations [3]. For the MEG measurements from a spherical head model that we use in section V, A was determined in [18]. Throughout this article it is assumed that the source parameters θ are fixed over time and trials. We will assume that the sources are sufficiently separated so that rank(A(θ)) = d throughout the source region.

If it may be assumed that s̃_l(t) is stochastic in nature, an advantage arises: from the variation across trials, inferences can be made about the interdependency between different sources, which may be interpreted as “functional coupling” of different cortical areas [19]. In [1] we also pointed out this fact, along with a discussion of the advantages of transforming the model into the frequency domain. To summarize the latter: assuming stationarity of the stochastic processes, the Fourier coefficients of different frequencies are asymptotically (i.e. for T → ∞) uncorrelated and have approximately a complex normal distribution [20]; the fitting function may therefore be factored into a set of fitting functions that are much more efficiently evaluated than their time domain equivalent. As a result the computational burden can be reduced drastically. This property has been previously exploited in the context of the analysis of brain signals in [15], [21], [22], [23].

To summarize, we make the following assumptions on the measurements: i) Multiple segments of multichannel data are available, generated in accordance with the source signal plus noise model in (1), in which a precisely defined event occurs, to which the sources respond. ii) Segments are statistically independent of each other. iii) The sources’ parameters θ_a are fixed across time and segments. iv) The source responses consist of a deterministic part and a stochastic part. v) Noise and source signals are statistically independent, and the expected value of the noise is zero. vi) A is a known matrix function of θ and the sources are sufficiently separated such that A has full column rank.

¹ The debate on this issue has recently revived due to experimental findings in [16] and a mathematical analysis of data preprocessing effects verified in the experimental data in [17], resulting in opposing views. In any case, the model can be maintained as a model for the ensemble average, where any trial to trial variation is absorbed into the stochastic part of the response.

² Throughout this paper a tilde (˜) will indicate time domain quantities.
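As a concrete illustration, the measurement model (1) can be simulated directly. All dimensions, the random stand-in for the gain matrix A(θ), and the noise levels below are illustrative assumptions; a real application would compute A from the spherical head model of [18].

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, T, L = 8, 2, 128, 50   # sensors, sources, samples per trial, trials

# Hypothetical gain matrix standing in for A(theta); a real application would
# compute it from the quasi-static Maxwell equations / spherical head model.
A = rng.standard_normal((m, d))

# Deterministic part of the source amplitudes (damped sines, as in section V).
t = np.arange(T)
mean_amp = np.stack([np.exp(-2 * t / T) * np.sin(2 * np.pi * 2 * t / T),
                     np.exp(-2 * t / T) * np.sin(2 * np.pi * 4 * t / T)])

# Model (1): y_l(t) = A(theta) s_l(t) + n_l(t), trial by trial.
trials = np.empty((L, m, T))
for l in range(L):
    s_l = mean_amp + 0.3 * rng.standard_normal((d, T))  # deterministic + stochastic
    n_l = 0.1 * rng.standard_normal((m, T))             # sensor noise
    trials[l] = A @ s_l + n_l
```

Averaging `trials` over the first axis estimates the ERP/ERF part; the across-trial variation carries the stochastic part.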
III. MODEL SPECIFICATION

III-A Mean and cross-spectrum structure

Define y_l(k) = (2πT)^{−1/2} Σ_t ỹ_l(t) exp(−i2πtk/T) to be the discrete Fourier transform coefficient at frequencies 2πk/T, k = 1, …, K < T/2 [21]. Define s_l(k) and n_l(k) similarly. As indicated previously, subject to certain mixing conditions and stationarity of the stochastic part of the signals, the Fourier coefficients y_l(k) have an asymptotically complex normal distribution, and are statistically independent for k ≠ j [20], [21]. Their covariance matrix E{[y_l(k) − E{y_l(k)}][y_l(k) − E{y_l(k)}]*} approaches the cross-spectral density matrix R_k as T → ∞. Here (·)* denotes conjugation and transposition.
For the Fourier transformed data the equivalent of (1) is

y_l(k) = A(θ) s_l(k) + n_l(k),  k = 1, …, K < T/2.  (2)

Using assumption v in the last paragraph of the previous section, the cross-spectrum of the stochastic part of ỹ_l(t) at each frequency then has the structure³

R_k = A(θ) Ψ_k A*(θ) + Θ_k,  k = 1, …, K < T/2.  (3)

Here Ψ_k is the cross-spectrum of the source amplitudes, and Θ_k is the cross-spectrum of the noise signals. These are the limiting values of the covariances of s_l(k) and n_l(k), respectively. This is the model presented in [1]. In that paper, this model was fitted to the sample cross-spectrum R̂_k that was computed from the observed data from the formula [20, p. 282]

R̂_k = (1/(L − 1)) Σ_{l=1}^{L} [y_l(k) − ẏ_k][y_l(k) − ẏ_k]*,  (4)

where ẏ_k = L^{−1} Σ_l y_l(k). If an estimate of the matrix Ψ_k is obtained in this way, coherences between source amplitudes can be obtained as a measure of functional connectivity (see [1]).
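The sample quantities ẏ_k and R̂_k in (4) are straightforward to compute from multi-trial data; the sketch below uses an illustrative toy data array and the (2πT)^{−1/2} normalisation of the Fourier coefficients defined above.

```python
import numpy as np

rng = np.random.default_rng(1)
L, m, T, K = 200, 6, 128, 5        # trials, channels, samples, frequencies kept

# Toy multi-trial data: a shared deterministic mean plus trial noise
# (illustrative stand-in for measured EEG/MEG segments).
t = np.arange(T)
mean_sig = np.outer(rng.standard_normal(m), np.sin(2 * np.pi * 3 * t / T))
data = mean_sig + 0.5 * rng.standard_normal((L, m, T))

# Fourier coefficients y_l(k) = (2 pi T)^(-1/2) sum_t y_l(t) exp(-i 2 pi t k / T);
# np.fft.fft computes the sum with this sign convention, we add the normalisation.
Y = np.fft.fft(data, axis=-1) * (2 * np.pi * T) ** -0.5
Y = Y[:, :, 1 : K + 1]             # keep k = 1, ..., K < T/2; shape (L, m, K)

y_dot = Y.mean(axis=0)             # sample mean per frequency, shape (m, K)

# Sample cross-spectrum (4): one Hermitian m x m matrix per frequency.
R_hat = np.empty((K, m, m), dtype=complex)
for k in range(K):
    C = Y[:, :, k] - y_dot[:, k]   # centered coefficients, rows are trials
    R_hat[k] = C.T @ C.conj() / (L - 1)
```

Each `R_hat[k]` is Hermitian positive semidefinite by construction, as a sample covariance of the centered Fourier coefficients must be.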
In the model thus far, the signal means, and hence the source amplitude means, are ignored. If the amplitude means are not equal to zero, the means contain important information, and are usually the object of interest to the researcher. Therefore the first extension of this model involves the incorporation of the trial average, in the form of the expected value of the Fourier coefficients, by taking expectations in (2):

E{y_l(k)} = μ_k = A(θ) E{s_l(k)} = A(θ) s_k,  (5)

because E{n_l(k)} = 0 by assumption v on ñ_l(t). Here s_k is the k-th Fourier coefficient of the ensemble average waveform s̃(t).
III-B Linear filter model for interactions

In some situations a researcher may entertain substantive hypotheses about interactions between different sources. In the current model these hypotheses may be tested directly if Ψ_k is further structured.
³ A(θ) is real valued for the biophysical model but can be complex in other applications. Henceforth we use * instead of ′ in such cases.
We will approximate the interaction equation between the amplitudes of two sources indexed a and b by a linear filter relation. The relations are not expected to be perfect, since, apart from nonlinearities in the interactions, some intrinsic activity will exist and some external input activation is unaccounted for by the sources incorporated in the model. These effects will be incorporated through an additional zero mean stationary stochastic process term ζ̃_{a,l}(t). For different sources a and b, ζ̃_a and ζ̃_b are assumed to be independent. In addition, a non-random portion of the response is included through an extra term h_a(t). The resulting equation for the interactions between sources is

s̃_{a,l}(t) = h_a(t) + Σ_{b=1}^{d} ∫ h_{ab}(τ) s̃_{b,l}(t − τ) dτ + ζ̃_{a,l}(t)  (6)

for a = 1, …, d. Of course neurophysiological hypotheses may imply that some of these kernels are equal to zero.
By the convolution theorem of Fourier analysis, in the frequency domain this results in the relation

s_{a,l}(k) = α_a(k) + Σ_{b=1}^{d} β_{ab}(k) s_{b,l}(k) + ζ_{a,l}(k)  (7)

where α_a(k) = (1/2π) ∫ h_a(t) exp{−i2πkt} dt, and β_{ab}(k) = (1/2π) ∫ h_{ab}(t) exp{−i2πkt} dt. Zero kernels in (6) correspond to zero coefficients in this equation.
For d sources these relations may be compactly represented in matrix form:

s_l(k) = α_k + B_k s_l(k) + ζ_l(k)  (8)

where B_k = (β_{ab}(k)), and ζ_l(k)′ = (ζ_{1,l}(k), …, ζ_{d,l}(k)). Some restrictions must be imposed in order to make B_k identifiable [11], [24]. We will assume that B_k is specified in such a way that (I − B_k)^{−1} exists; this coincides with the condition that the system of filters in (6) is invertible [20, p. 30]. This ensures that s̃(t) consists of a deterministic component superimposed on a stationary stochastic component. The vector of mean amplitude Fourier coefficients E{s_k} is then obtained by rewriting (8) as s_l(k) = (I − B_k)^{−1}[α_k + ζ_l(k)], and taking expectations:

E{s_l(k)} ≜ s_k = (I − B_k)^{−1} α_k,
as E{ζ_l(k)} = 0 by the assumption on the components of ζ̃_l(t). Besides the invertibility condition on I − B_k, the cross-spectrum of ζ̃_l(t) will have to be restricted. A natural constraint is to restrict E{ζ_l(k) ζ_l(k)*} = Φ_k = diag(φ_1(k), …, φ_d(k)), as otherwise there could be correlations between dipole amplitudes not accounted for by the filter model that was invoked precisely to model these correlations. The amplitude cross-spectrum is now obtained from E{[s_l(k) − E{s_l(k)}][s_l(k) − E{s_l(k)}]*} = E{(I − B_k)^{−1} ζ_l(k) ζ_l(k)* (I − B_k)^{−*}} or

Ψ_k = (I − B_k)^{−1} Φ_k (I − B_k)^{−*}.
For any nonsingular diagonal scaling matrix D it is seen that [D(I − B_k)]^{−1} DΦ_k D* [(I − B_k)* D*]^{−1} = (I − B_k)^{−1} Φ_k (I − B_k)^{−*}. Hence, without fixing the scale of either B_k or Φ_k, both cannot be uniquely identified. A further restriction therefore has to be imposed. With no additional information, it is natural to require either Φ_k = I or diag(B_k) = 0. The former has the interpretation that the intrinsic activity has a uniform spectrum (i.e. is pure white noise), which is somewhat unrealistic, especially for biological systems, and is therefore not desirable. The latter ensures that the diagonal elements of I − B_k are equal to 1 and means that the kernels h_{aa}(t), a = 1, …, d, are identically zero; hence B_k contains the Fourier coefficients of the linear filter that predicts the activity of one source from the activity of only other sources (and not from its own activity).
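The scale indeterminacy can be checked numerically: Ψ_k = (I − B_k)^{−1} Φ_k (I − B_k)^{−*} is invariant under the diagonal rescaling described above, which is why a restriction such as diag(B_k) = 0 is needed. A minimal sketch, with randomly drawn illustrative B_k and Φ_k:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
I = np.eye(d)

# Hypothetical transfer matrix B_k with the identification choice diag(B_k) = 0.
B = 0.3 * (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
np.fill_diagonal(B, 0)

Phi = np.diag(rng.uniform(0.5, 2.0, d))         # diagonal intrinsic spectra

M = np.linalg.inv(I - B)
Psi = M @ Phi @ M.conj().T                      # Psi_k = (I-B)^-1 Phi (I-B)^-*

# Scale indeterminacy: rescaling (I-B) by a diagonal D and Phi by D Phi D*
# leaves Psi unchanged, so B_k and Phi_k are not jointly identified without
# a restriction fixing the scale.
D = np.diag(rng.uniform(0.5, 2.0, d))
M2 = np.linalg.inv(D @ (I - B))
Psi2 = M2 @ (D @ Phi @ D.conj().T) @ M2.conj().T
assert np.allclose(Psi, Psi2)
```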
III-C Structure of the cross-spectrum Θ_k of the noise signals

In [1] the noise cross-spectrum Θ_k was constrained to be proportional to an identity matrix: Θ_k = σ²_k I. Conceptually, this means that the amount of noise is the same for all sensors, and that noise at different sensors is mutually uncorrelated. A more realistic constraint on Θ_k is to assume that the background EEG/MEG consists of dipoles that are randomly located and randomly activated in different trials and at different times [25]. Here we will assume that Θ_k is any function of KN parameters γ = (γ_{jk}), j = 1, …, N, k = 1, …, K, such that these parameters are identifiable.
In summary, then, in addition to the measurement assumptions (section II), we assume: vii) The Fourier coefficients of the Fourier transformed data have (asymptotically) a complex normal distribution, independent for different frequencies. viii) The dependencies between signals of different sources can be reasonably approximated by linear filter relations as in (6); this may be justified as a first order approximation of a Volterra functional expansion [20]. ix) The filter system in (6) is invertible (i.e. (I − B_k)^{−1} exists for all k). x) Restrictions have been introduced, in a way that is justifiable within the context of application, to make the matrices B_k, Φ_k and Θ_k identifiable.
For easy reference we recapitulate some used symbols:

Symbol   Meaning
k        frequency index
θ        vector of all source parameters
A        sensor gain matrix
y_l(k)   Fourier coefficients of the observed signals on the l-th trial
ẏ_k      sample mean of y_l(k)
R̂_k      sample covariance of y_l(k)
μ_k      expected value of y_l(k)
s_k      expected value of source amplitude Fourier coefficient s_l(k)
R_k      cross-spectrum of stochastic part of ỹ_l(t)
Ψ_k      cross-spectrum of stochastic part of s̃_l(t)
Θ_k      cross-spectrum of ñ_l(t)
α_k      Fourier coefficient of the deterministic response of the source amplitude interactions
B_k      matrix of transfer coefficients in the linear filter source interaction model
Φ_k      (diagonal) matrix with the variances of ζ_l(k)
IV. PARAMETER ESTIMATION

Since the interest of the analyst is usually restricted to a limited band of frequencies, not all frequencies have to be incorporated in the analysis; we will denote the subset of K frequencies incorporated by 𝒦.

Following [1], the unknown parameters in (3), (5) and section III-C (i.e. θ, α_k, the non-duplicate elements in Ψ_k, and γ = (γ_{jk}), for j = 1, …, N, k ∈ 𝒦) are collected in the p-vector ξ, and are estimated by maximizing the “likelihood” [20], [21]

ℓ({y_l(k) : l = 1, …, L, k ∈ 𝒦}; ξ) ∝ Π_{l,k} exp{−[y_l(k) − μ_k(ξ)]* R_k(ξ)^{−1} [y_l(k) − μ_k(ξ)]} / (π^m |R_k(ξ)|).

The factorization in k is due to the aforementioned asymptotic independence of the Fourier coefficients of different frequencies [20], [21]. It will be more convenient to minimize the negative log-likelihood, which is proportional to (cf. [26], [27])
F(ξ) = Σ_{k∈𝒦} ( log|R_k(ξ)| + tr{R_k(ξ)^{−1} R̂_k} + [ẏ_k − μ_k(ξ)]* R_k(ξ)^{−1} [ẏ_k − μ_k(ξ)] ) + K log π^m.  (9)
When it is known that μ_k(ξ) ≡ 0 and a single frequency is considered, (9) may be reduced to log|R_k(ξ)| + tr{R_k(ξ)^{−1} R̂_k}, which is the “stochastic ML (SML)” objective function described in e.g. [12], [13], [14], [28], [29].
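A direct implementation of the objective (9) is straightforward. The sketch below assumes the per-frequency model quantities R_k, μ_k and the sample quantities R̂_k, ẏ_k are already available as lists of arrays, and omits the constant K log π^m:

```python
import numpy as np

def neg_log_likelihood(R_list, mu_list, R_hat_list, y_dot_list):
    """Negative log-likelihood (9), up to the constant K log pi^m:
    sum_k log|R_k| + tr{R_k^-1 R_hat_k}
          + (y_dot_k - mu_k)* R_k^-1 (y_dot_k - mu_k)."""
    F = 0.0
    for R, mu, R_hat, y_dot in zip(R_list, mu_list, R_hat_list, y_dot_list):
        Rinv = np.linalg.inv(R)
        resid = y_dot - mu
        F += np.linalg.slogdet(R)[1]            # log|R_k| (real part)
        F += np.trace(Rinv @ R_hat).real        # tr{R_k^-1 R_hat_k}
        F += (resid.conj() @ Rinv @ resid).real # mean (quadratic form) term
    return F

# Sanity check: for R_k = R_hat_k = I_m and zero means, each frequency adds m.
assert abs(neg_log_likelihood([np.eye(3)], [np.zeros(3)],
                              [np.eye(3)], [np.zeros(3)]) - 3.0) < 1e-12
```

In practice this function would be evaluated inside an optimizer over ξ, with R_k(ξ) and μ_k(ξ) rebuilt from (3) and (5) at each step.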
In some cases, the latter negative log-likelihood function can be separated, and for unparameterized Ψ_k and Θ_k = σ²_k I a well known compact equivalent concentrated problem was determined by Böhme (cited in [14], [29]), which greatly increases the computational efficiency.

Here we derive a similar concentrated problem for this case when the mean is incorporated. Furthermore, we obtain estimators under more general noise conditions: we do not assume that Θ_k = σ²_k I, but allow Θ_k = σ²_k U(γ), where U(γ) is a Hermitian positive definite matrix such that γ is identified. For simplicity, however, we will derive the results for U(γ) = I and then indicate how they generalize.
IV-A The case that B = 0 (unparameterized Ψ)

Let F† denote the pseudo-inverse (F*F)^{−1}F*, and Π_F denote the matrix I − FF†, for any F of full column rank. Then, for unparameterized Ψ_k we obtain the expressions

α̂_k(θ̂) = A†(θ) ẏ_k |_{θ=θ̂}  (10)

σ̂²_k(θ̂) = (1/(m − d)) tr{Π_A [R̂_k + ẏ_k ẏ_k*]} |_{θ=θ̂}  (11)

Ψ̂_k(θ̂) = A†(θ) (R̂_k − σ̂²_k(θ) I) A†*(θ) |_{θ=θ̂}  (12)

θ̂ = arg min_θ Σ_{k∈𝒦} log |A(θ) Ψ̂_k(θ) A*(θ) + σ̂²_k(θ) I|.  (13)

In the case that Θ_k(γ) = σ²_k U(γ), A† is replaced by (QA)†Q, Π_A by QΠ_{QA}Q, and σ̂²_k I by σ̂²_k U(γ), where Q = U^{−1/2}, a Hermitian ‘square root’ of U^{−1}. In deriving the results, we will temporarily drop the dependence on k and suppress the dependence of A on θ.

These estimators are obtained by equating partial derivatives of F to zero and solving for the desired parameter. We first consider α.
IV-A.1 Mean amplitude parameters α

Setting the derivatives of F with respect to α equal to zero, the first order conditions −2(ẏ − Aα)* R^{−1} A = 0 are obtained, which may be solved to yield the optimal estimator α̂

α̂ = [A* R^{−1} A]^{−1} A* R^{−1} ẏ.  (14)

In the appendix it is shown that for R = AΨA* + Θ, for any A of full column rank and nonsingular Ψ and Θ,

(A* R^{−1} A)^{−1} A* R^{−1} = (A* Θ^{−1} A)^{−1} A* Θ^{−1},  (15)

so that (10) is obtained for Θ = σ² I, and the more general result is obtained for Θ = σ² U.
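Identity (15) is easy to verify numerically for randomly drawn full-rank A and Hermitian positive definite Ψ and Θ; a minimal sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 7, 3

def random_hpd(n):
    # Random Hermitian positive definite matrix.
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + n * np.eye(n)

A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
Psi, Theta = random_hpd(d), random_hpd(m)
R = A @ Psi @ A.conj().T + Theta

def gls_weights(A, W):
    # (A* W^-1 A)^-1 A* W^-1 for a Hermitian positive definite weight W.
    Winv = np.linalg.inv(W)
    return np.linalg.inv(A.conj().T @ Winv @ A) @ A.conj().T @ Winv

# Identity (15): weighting by R and weighting by Theta give the same matrix,
# so alpha-hat in (14) does not depend on Psi.
assert np.allclose(gls_weights(A, R), gls_weights(A, Theta))
```

This is the property that makes α̂ independent of Ψ, which the next subsection exploits.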
IV-A.2 Amplitude cross-spectral parameters Ψ

Substitution of α̂ into F yields the concentrated negative log-likelihood F|_{α̂}. Let ψ denote a real or imaginary part of an element of Ψ. The derivatives of F|_{α̂} with respect to ψ are the same as those of F because α̂ does not depend on Ψ by (15). Setting the derivatives equal to zero, we may obtain the equations

2ℜ tr{A* R^{−1} (R̂# − R) R^{−1} A ∂Ψ/∂ψ}|_{α=α̂} = 0,  ψ = (ℜΨ)_{ab} or ψ = (ℑΨ)_{ab},  a, b = 1, …, d,

where R̂# = R̂ + [ẏ − μ̂][ẏ − μ̂]*, and μ̂ = Aα̂. From (14), by construction of α̂, A* R^{−1} [ẏ − μ̂] = 0, so that A* R^{−1} R̂# = A* R^{−1} R̂ and the estimation equations can be reduced to, in matrix form,

A* R^{−1} (R̂ − R) R^{−1} A = 0.

Substituting R = AΨA* + Θ, this can be written

A* R^{−1} (R̂ − Θ) R^{−1} A = A* R^{−1} A Ψ A* R^{−1} A.

Therefore the optimal estimate of Ψ is given by

Ψ̂ = [A* R^{−1} A]^{−1} A* R^{−1} (R̂ − Θ) R^{−1} A [A* R^{−1} A]^{−1},  (16)

which with (15) and Θ = σ² I or Θ = σ² U yields (12) and the more general result indicated thereafter.
IV-A.3 Noise spectrum σ² and concentrated negative log-likelihood

We first derive some simplifying expressions, required to concentrate the likelihood with respect to α and Ψ. First note that with U^{−1/2} = Q, Ψ̂ can be rewritten

Ψ̂ = (QA)† Q R̂ Q (QA)†* − σ² (A* U^{−1} A)^{−1}.  (17)

Furthermore, by using the matrix inversion formula (e.g. [30, p. 9], see the appendix) twice, it can be shown that

R^{−1} = (AΨA* + σ² U)^{−1} = (1/σ²) Q Π_{QA} Q + Q (QA)†* [Ψ + σ² (A* U^{−1} A)^{−1}]^{−1} (QA)† Q.

Substituting Ψ̂ from (17) in this expression, a little algebra that cancels terms yields the equation

R^{−1}|_{Ψ̂} = (1/σ²) Q Π_{QA} Q + U^{−1} A (A* U^{−1} R̂ U^{−1} A)^{−1} A* U^{−1}.  (18)
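Both (17) and (18) can be verified numerically: construct Ψ̂ from (17), form R = AΨ̂A* + σ²U, and compare its inverse with the closed form (18). The dimensions and the random Hermitian positive definite stand-ins for U and R̂ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 6, 2
sigma2 = 0.7

def random_hpd(n):
    # Random Hermitian positive definite matrix.
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + n * np.eye(n)

A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
U = random_hpd(m)          # noise shape matrix
R_hat = random_hpd(m)      # stand-in for the sample cross-spectrum

w, V = np.linalg.eigh(U)
Q = V @ np.diag(w ** -0.5) @ V.conj().T       # Q = U^(-1/2), Hermitian
Uinv = np.linalg.inv(U)

QA = Q @ A
QA_pinv = np.linalg.pinv(QA)                  # (QA)^dagger
Pi = np.eye(m) - QA @ QA_pinv                 # Pi_QA = I - (QA)(QA)^dagger

# (17): concentrated amplitude cross-spectrum estimate.
Psi_hat = (QA_pinv @ Q @ R_hat @ Q @ QA_pinv.conj().T
           - sigma2 * np.linalg.inv(A.conj().T @ Uinv @ A))

R = A @ Psi_hat @ A.conj().T + sigma2 * U

# (18): closed form for the inverse of R at Psi = Psi_hat.
R_inv_18 = (Q @ Pi @ Q / sigma2
            + Uinv @ A @ np.linalg.inv(A.conj().T @ Uinv @ R_hat @ Uinv @ A)
              @ A.conj().T @ Uinv)
assert np.allclose(np.linalg.inv(R), R_inv_18)

# Trace identity used to concentrate F: tr{R^-1 R_hat} = tr{Q Pi Q R_hat}/s2 + d.
assert np.isclose(np.trace(R_inv_18 @ R_hat),
                  np.trace(Q @ Pi @ Q @ R_hat) / sigma2 + d)
```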
Substituting (18) in tr{R^{−1} R̂} we obtain

tr{R^{−1}|_{Ψ̂} R̂} = (1/σ²) tr{Q Π_{QA} Q R̂} + tr{U^{−1} A (A* U^{−1} R̂ U^{−1} A)^{−1} A* U^{−1} R̂} = (1/σ²) tr{Q Π_{QA} Q R̂} + d,
where the equality tr{AB} = tr{BA} was used. Furthermore, from (18) and (15) it can be shown that

R^{−1}|_{Ψ̂} [I − A (A* R^{−1} A)^{−1} A* R^{−1}] = Θ^{−1} [I − A (A* Θ^{−1} A)^{−1} A* Θ^{−1}].

From this and from (14), therefore, we find

(ẏ − μ̂)* R^{−1}|_{Ψ̂} (ẏ − μ̂) = (ẏ − μ̂)* R^{−1}|_{Ψ̂} (I − A (A* R^{−1} A)^{−1} A* R^{−1}) ẏ = (ẏ − μ̂)* Θ^{−1} (I − A (A* Θ^{−1} A)^{−1} A* Θ^{−1}) ẏ,

which can be written (1/σ²) ẏ* Q Π_{QA} Q ẏ, because Θ^{−1} = (1/σ²) U^{−1} = (1/σ²) Q². Combining traces now yields the concentration of F in (9) with respect to α and Ψ:

F|_{α̂,Ψ̂} = log |A Ψ̂ A* + σ² U| + (1/σ²) tr{Q Π_{QA} Q (R̂ + ẏ ẏ*)} + d.
To find σ̂² we must take the derivative of F|_{α̂,Ψ̂} with respect to σ². Before doing so, first note that together with (17), R|_{Ψ̂} = A Ψ̂ A* + σ² U can be written A (QA)† Q R̂ Q (QA)†* A* + σ² Q^{−1} Π_{QA} Q^{−1}. Therefore ∂R|_{Ψ̂}/∂σ² = Q^{−1} Π_{QA} Q^{−1}. Setting the derivative of F|_{α̂,Ψ̂} with respect to σ² equal to zero gives the first order conditions

ℜ tr{R^{−1}|_{Ψ̂} Q^{−1} Π_{QA} Q^{−1}} = ℜ tr{Q Π_{QA} Q (R̂ + ẏ ẏ*)} / σ⁴.

With (18) we find tr{R^{−1}|_{Ψ̂} Q^{−1} Π_{QA} Q^{−1}} = (m − d)/σ², and therefore

σ̂² = tr{Q Π_{QA} Q (R̂ + ẏ ẏ*)} / (m − d)

is obtained, which is (11). Substitution of σ̂² in F|_{α̂,Ψ̂} yields the concentrated negative log-likelihood in (13).
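At the population level the concentrated estimators (10)-(12) recover the true parameters exactly: if ẏ = Aα and R̂ = AΨA* + σ²I are set to their expected values, then α̂ = α, σ̂² = σ² and Ψ̂ = Ψ. A sketch with illustrative dimensions, a random real gain matrix, and U = I:

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 9, 3
sigma2 = 0.4

A = rng.standard_normal((m, d))                   # real gain matrix, full rank
alpha = rng.standard_normal(d) + 1j * rng.standard_normal(d)
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Psi = X @ X.conj().T + np.eye(d)                  # true amplitude cross-spectrum

# Population-level "sample" quantities: y_dot = A alpha, R_hat = A Psi A* + s2 I.
y_dot = A @ alpha
R_hat = A @ Psi @ A.conj().T + sigma2 * np.eye(m)

A_pinv = np.linalg.pinv(A)                        # A^dagger
Pi = np.eye(m) - A @ A_pinv                       # Pi_A = I - A A^dagger

alpha_hat = A_pinv @ y_dot                                              # (10)
sigma2_hat = np.trace(Pi @ (R_hat + np.outer(y_dot, y_dot.conj()))
                      ).real / (m - d)                                  # (11)
Psi_hat = A_pinv @ (R_hat - sigma2_hat * np.eye(m)) @ A_pinv.conj().T   # (12)

assert np.allclose(alpha_hat, alpha)
assert np.isclose(sigma2_hat, sigma2)
assert np.allclose(Psi_hat, Psi)
```

With finite trials the same formulas are applied to the sample ẏ_k and R̂_k, and (13) is minimized over θ only.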
IV-B The case that B ≠ 0

Unfortunately, when B ≠ 0, we cannot use the algorithm in (10)-(13). Some parameters can still be separated, however. Next we find estimators for α and Φ when B ≠ 0.

If B ≠ 0, A in (14) must be substituted by A(I − B)^{−1}. The resulting estimator of α has the simple form (I − B) α̂, where α̂ is given in (14).
We can obtain an estimate for Φ in a similar way as Ψ̂. Let φ be a (real) diagonal element of Φ. Setting derivatives of F|_{α̂} with respect to φ equal to zero, we find the first order conditions to be

ℜ tr{Λ* R^{−1} (R̂ − Θ) R^{−1} Λ ∂Φ/∂φ} = ℜ tr{Λ* R^{−1} Λ Φ Λ* R^{−1} Λ ∂Φ/∂φ},

where Λ = A(I − B)^{−1}. Since the partial derivatives are taken only with respect to the real diagonal elements of Φ, it is easy to see that ℜ may be dropped. The first order conditions therefore yield a system of equations with the solution

φ̂ = [(Λ′ R̄^{−1} Λ̄) ⊙ (Λ* R^{−1} Λ)]^{−1} diag[Λ* R^{−1} (R̂ − Θ) R^{−1} Λ],  (19)

where φ = diag(Φ) contains the diagonal elements of Φ, ⊙ denotes the Hadamard product defined by A ⊙ B = (a_{ij} b_{ij}), and R̄ denotes conjugation of R.
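The Hadamard system in (19) rests on the fact that, for Hermitian M = Λ* R^{−1} Λ and real diagonal Φ, diag(MΦM) = (M ⊙ M̄)φ, so φ is recovered by one linear solve. This can be checked numerically with a random Hermitian stand-in for M:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4

X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M = X @ X.conj().T                  # Hermitian stand-in for Lambda* R^-1 Lambda
phi = rng.uniform(0.5, 2.0, d)      # real diagonal elements of Phi

# diag(M Phi M) is linear in phi, with coefficient matrix M (Hadamard) conj(M):
# (M Phi M)_{ii} = sum_j M_ij phi_j M_ji = sum_j |M_ij|^2 phi_j.
lhs = np.diag(M @ np.diag(phi) @ M).real
assert np.allclose(lhs, (M * M.conj()).real @ phi)

# Hence phi is recovered by solving the Hadamard system, as in (19).
phi_rec = np.linalg.solve((M * M.conj()).real, lhs)
assert np.allclose(phi_rec, phi)
```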
IV-C Assessment of model fit

The appropriateness of a model can be assessed by means of various fit assessment techniques that are sometimes grouped under the term “model selection procedure” [31]. Some of these procedures indicate how well the model describes the data, while others provide a rationale for deciding which of several competing models should be preferred on the basis of the data. Hence, a model selection procedure can help to decide how many dipoles should be incorporated in the model, and whether cross-spectral parameters should be included. In [1] we assessed the usefulness of the generalized likelihood ratio test (GLRT) statistic 2L · (F(ξ̂) − Σ_{k∈𝒦} log|R̂_k|) in determining the number of dipole sources that should be incorporated in the model (i.e. the detection problem [32]). Here we assess its effectiveness in testing the lack of interaction between different sources. The GLRT has an asymptotic χ²_{df} distribution with df = K(m² + 2m) − p degrees of freedom, where p is the number of free parameters. For moderate numbers of observations a Bartlett corrected statistic should be used [33], as was indicated in [34], [35].

Confidence regions of the estimates can also help to decide which parameters are necessary and which may be omitted: location estimates that are not contained in each other’s confidence regions indicate separate sources, and confidence regions of cross-spectral parameter estimates indicate whether these differ from zero [31], [36].
V. SIMULATIONS

In [1] we showed that confidence regions can be constructed quite reliably. Confidence regions of the estimated parameters can be computed from the Hessian matrix of the negative log-likelihood F(ξ) evaluated at ξ̂ [37], [38]. A finite difference approximation of the Hessian was calculated from the gradients at the estimate ξ̂. Note that in order to obtain standard errors for all parameters, F in (9) must be implemented fully, including all analytic derivatives, but it only needs to be evaluated after the last iteration of the algorithm in (10)-(13). We used a quasi-Newton algorithm [39] to optimize the full negative log-likelihood (9) to obtain the estimates ξ̂. We refer to [1] for the details on the calculation of the confidence regions.

Fig. 1. To give an impression of the reconstruction, the source amplitude of the first source, as reconstructed in each simulation, is depicted (upper panel: “Reconstructed mean amplitude of source 1”). The fat gray line is the true amplitude as reconstructed from the frequencies used in the estimation. The lower panel (“Estimated source location parameters”) shows box and whisker plots of the source location parameter (θ′_a = [θ^(x)_a, θ^(y)_a, θ^(z)_a]) estimates. Black, gray and white boxed whisker plots correspond to x, y and z coordinates, respectively. Boxes show the estimates between the first and last quartiles, the central line indicates the median, whiskers indicate the estimators’ range extremes, and dots indicate very extreme estimates.

To assess the performance and the stochastic behavior of the estimators in the current extensions, a number of simulations were carried out. Simulations were conducted in much the same way as in [1]: three dipoles were placed in a unit radius sphere at (0, 0.5, 0.75), (0, −0.5, 0.75) and (0.5, 0, 0.75), the first two with orientation cosines (1, 0, 0) and the third with orientation cosines (0, 1, 0). The amplitudes of the dipoles consisted of dampened sine waves added to a vector autoregressive stochastic process: s̃₁(t) = exp(−2t/T) sin(2πt·2/T) + ã₁(t) and s̃₂(t) = exp(−2t/T) sin(2πt·4/T) + ã₂(t), where ã₁(t) and ã₂(t) satisfied the equations ã₁(t) = 0.7 ã₁(t − 1) + ζ̃₁(t) and ã₂(t) = 0.5 ã₁(t − 1) + 0.7 ã₂(t − 1) + ζ̃₂(t). The amplitude of the third dipole was generated from s̃₃(t) = exp(−2t/T) sin(2πt·6/T) + ã₃(t) with ã₃(t) = 0.3 ã₁(t − 1) + 0.7 ã₃(t − 1) + ζ̃₃(t). For all three i = 1, 2, 3, ζ̃ᵢ(t) ∼ N(0, 1).
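The simulated amplitude time series described above can be generated as follows; this is a sketch of the stated simulation design (damped sines plus the VAR(1) recursions), not the authors' actual simulation code:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 128
t = np.arange(T)

# Damped-sine deterministic parts of the three dipole amplitudes.
det = np.stack([np.exp(-2 * t / T) * np.sin(2 * np.pi * t * f / T)
                for f in (2, 4, 6)])

# Stochastic parts: the stated autoregressions, with a1 driving a2 and a3.
a = np.zeros((3, T))
zeta = rng.standard_normal((3, T))              # zeta_i(t) ~ N(0, 1)
for u in range(1, T):
    a[0, u] = 0.7 * a[0, u - 1] + zeta[0, u]
    a[1, u] = 0.5 * a[0, u - 1] + 0.7 * a[1, u - 1] + zeta[1, u]
    a[2, u] = 0.3 * a[0, u - 1] + 0.7 * a[2, u - 1] + zeta[2, u]

s = det + a     # source amplitude time series for one simulated trial
```

Because a2 and a3 each depend on lagged a1, a model that frees only the corresponding filter coefficients (and fixes the rest to zero) is correct for these data.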
With these dipole amplitudes, MEG data were simulated for a whole head 61-sensor array in accordance with (1), the components of ñ(t) each satisfying the auto-regression process ñ(t) = 0.7 ñ(t − 1) + ε̃(t), where ε̃(t) ∼ N(0, σ²_ñ). The matrix function A was obtained from [18]. In all simulations σ²_ñ was twice as large as the largest of the noiseless sensor signal variances, i.e., the signal to noise ratio (SNR) was 1:2. For each trial T = 128 samples were generated. The generated data were (fast Fourier) transformed into the frequency domain, and the mean ẏ_k and sample cross-spectra R̂_k for the first five frequency components were calculated as indicated previously.

TABLE I
COVERAGE RATES OF THE 95% CONFIDENCE INTERVALS

L     θ     α     Φ_k    B_k    σ²_k
100   93.7  95.6  92.2   94.4   94.2
200   94.9  95.5  93.9   94.5   86.2
400   95.1  95.8  94.9   95.1   68.0

Percentage of simulations of which 95% confidence intervals contained the true parameter values when the correct model was fitted. Percentages are computed as the proportion of 300 simulations in each case.

Fig. 2. Percentage of simulations in which the fitted model was accepted as indicated by the significance of the GLRT. This should be in 95% of the simulations in case of the correct model (continuous line), and in as few of the simulations as possible in case of an incorrect model with too few parameters (dashed line).
The simulations were carried out with L = 100, 200 and 400 trials. Two models were fitted: one in which only the filter coefficients from dipole 1 to dipole 2 (β₁₂) and from dipole 1 to dipole 3 (β₁₃) were freely estimated while the others were forced to zero (it can be shown that this is a correct model for the simulated data), and one model in which no interactions were allowed (which is an incorrect model for the simulated data). In all simulations Θ_k = σ²_k I was used, in accordance with the simulated data. The purpose of fitting the incorrect model was to assess the adequacy and usefulness of the Bartlett corrected GLRT in rejecting an incorrect a priori hypothesized model, while retaining a correct a priori hypothesized model.
In Table I coverage rates for different kinds of parameters are
presented. These coverage rates represent in a condensed form
the accuracy of the estimators themselves, and the quality with
which the confidence intervals are constructed. This is achieved
by giving the percentage of simulations in which the true param-
eters were contained in the 95% confidence intervals constructed
in each simulation. To illustrate, Fig. 1 depicts the reconstructed
source amplitude of the first source (location (0.0, 0.5, 0.75)) in
each simulation with 400 trials in which the correct model was
fitted. Furthermore it depicts the estimated source locations of
all sources in each simulation.
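The coverage-rate computation described above can be sketched as follows, assuming Wald-type intervals of the form estimate ± z·SE (the helper name and interface are ours, for illustration only):

```python
import numpy as np

def coverage_rate(estimates, std_errors, true_value, z=1.96):
    """Fraction of simulations whose confidence interval
    [estimate - z*SE, estimate + z*SE] contains the true parameter value.
    z = 1.96 gives a nominal 95% interval."""
    lo = estimates - z * std_errors
    hi = estimates + z * std_errors
    return np.mean((lo <= true_value) & (true_value <= hi))
```

For unbiased, normally distributed estimates with correct standard errors, this fraction should be close to the nominal 95% level, which is what Table I checks.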
As can be seen from Table I, the coverage rates of the con-
fidence intervals are rather close to their theoretical expected
level of 95%—even for relatively small numbers of trials (i.e.
L = 100). The latter is somewhat surprising because the theory
was developed on the assumption of large numbers of trials.
The departure from the theoretical value of the coverage rates of $\sigma^2_k$ was anticipated from the results reported in [1]. The remarkable feature of these coverage rates is that they are near perfect when few trials are available, and the departure increases as the number of trials increases. This seemingly paradoxical result is due to a slight bias of the estimators in combination with oversized confidence intervals for relatively few trials (L = 100). At L = 400 the coverage rate of these parameters is about 68%, which falls neatly in between the rates reported in [1] for these estimators under SNRs of 1:1 and 1:5 and the same number of trials. Apparently, as the signal to noise ratio decreases, the bias increases, since at L = 400 the estimated standard errors were in fact quite good.
The acceptance rate of the GLRT is graphed in Fig. 2. As can be seen from the figure, the GLRT rejected both models when the trial count was low (L = 100), indicating that the asymptotic approximation is inadequate with low trial counts. Acceptance of the correct model was near the nominal rate of 95% for moderate L = 200 (89%) and at the nominal rate for relatively large numbers of trials, L = 400 (96%), indicating adequate approximation of the statistic by the asymptotic distribution in these cases.⁴ The incorrect model was accepted too often with moderate trial counts (19%), indicating that the GLRT was too insensitive to modelling errors in such cases. For relatively large numbers of trials the GLRT therefore seems to be helpful in detecting interactions (but see the discussion below).
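The accept/reject decision used here amounts to comparing the (Bartlett-corrected) GLRT statistic against its asymptotic χ² distribution. A minimal sketch, in which the degrees of freedom and the correction factor are user-supplied placeholders rather than values taken from the paper:

```python
from scipy.stats import chi2

def glrt_accept(lr_statistic, df, bartlett_factor=1.0, alpha=0.05):
    """Accept the hypothesized model if the (optionally Bartlett-corrected)
    GLRT statistic is not significant at level alpha under its asymptotic
    chi-square distribution with df degrees of freedom. bartlett_factor is
    a placeholder for the small-sample correction discussed in the text."""
    corrected = lr_statistic * bartlett_factor
    p_value = chi2.sf(corrected, df)      # survival function: P(X > corrected)
    return p_value >= alpha, p_value
```

With a correct model and adequate L, this decision should accept in roughly 95% of simulations, which is what Fig. 2 tracks.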
VI CONCLUDING REMARKS
We have formulated a framework for modelling coherence be-
tween sources in terms of linear transfer functions. This frame-
work has its roots in the techniques known in the statistical liter-
ature as structural equation modelling (SEM) [11], confirmatory
factor analysis [40], frequency domain dynamic factor analysis
[41] and simultaneous equations [42]. For the latter, frequency
domain-like variants were proposed in [43].
We have given closed form expressions for estimators of the
separable parameters and an expression for the concentrated
negative log-likelihood, which greatly simplify the numerical optimization procedure. The expressions obtained are very similar to the standard expressions found in the signal processing literature on SML DOA estimation, and extend the SML methods with the inclusion of the mean and a more general noise covariance matrix that may depend on unknown parameters. The results of the simulations show that parameter standard errors are reliably constructed, regardless of the number of trials. Furthermore, the results indicated that the GLRT statistic can be indicative of the presence of interactions between sources, provided that enough trials are available (L ≥ 200).
In [1] we also considered least squares techniques. However, the generalized least squares method, although known to have the same asymptotic statistical properties as ML estimators [44], yielded biased estimates of source coherence in finite samples, and was therefore not considered here.
Frequency domain dipole modelling of EEG/MEG data has
been pursued before in [45],[21],[22], while the asymptotic sta-
tistical independence of Fourier coefficients has been exploited
in the context of general EEG signal analysis in e.g. [15], and
[23]. The method discussed here and in [1], can be considered
as extensions to the methods in these references.
Other approaches that use dipole localization techniques from
the outset to study cortical synchrony have been presented in
⁴ This was confirmed by a Kolmogorov–Smirnov test on the distribution of the statistic.
[7], [8] and [9]. In [7] synthetic aperture magnetometry (SAM)
is used to derive time series of activity in regions of interest,
that are then subjected to phase analysis. In [8] a beam form-
ing technique is used that searches for sources with maximum
coherence. In [9] an interesting adaptation of iteratively re-
fined minimum norm estimation [46] is presented which uses
a bootstrapping technique on surrogate data. As argued in [9],
problems with the first two approaches are that the linearly con-
strained minimum variance beam formers were developed under
the assumption of incoherent sources, and their performance is
known to deteriorate with coherent sources. Furthermore the
method in [8] only finds coherent sources, while neurophysio-
logical research indicates that desynchronization of sources may
play an important role in several cognitive processes [6], [4], [5]
(cf. [9]). A similar argument would hold against the use of MU-
SIC for estimating coherence between sources [9], [47]. The
minimum norm estimate is known to suffer from bias in its lo-
cation estimates but was improved with bootstrapping methods
[9]. Once the regions of activity have been localized in this man-
ner, these authors suggest to perform a phase analysis on recon-
structed time series [7]. Although this method seems promising,
it is as yet difficult to see a principled framework in which the
adequateness of the resulting source model can be assessed. In
contrast, maximum likelihood estimation directly provides mea-
sures to assess modelling adequateness in the form of the GLRT
statistic.
As an alternative to SML estimation, subspace fitting methods
(SF), in which A(θ) is fitted by least squares to subspace vec-
tors obtained from e.g. principal components analysis (PCA) or
independent components analysis (ICA [48]) have been investi-
gated [29]. SF methods are corrected versions of methods that
fit individual columns of A(θ) to individual subspace compo-
nents as is done in [49] (PCA) and [16], [10] (ICA), which are
known to be suboptimal [50], [51]. Weighted SF (WSF), which
is based on PCA, was shown to yield asymptotically efficient
estimates of θ in [29]. Furthermore WSF and SML were shown
to be asymptotically robust against violations of distribution as-
sumptions of the source signals. Currently, general (asymp-
totic) distributional properties of other subspace estimates, e.g.
obtained from ICA, are unknown, and therefore it is unclear
whether such estimates are efficient. As indicated earlier, we
also investigated generalized least squares estimation of cross-
spectrum structures, which is also known to be asymptotically
efficient [44], [1], and concluded that it yields strongly biased
coherence estimates in finite samples—this, in contrast with the
SML estimates. We plan to investigate this issue for the simpler (W)SF estimates in future work.
With respect to the GLRT statistic a word of caution is in order. As indicated in [1], the GLRT statistic is distributed as $\chi^2$ only asymptotically, that is, for large L. At the same time, as L grows larger, the sensitivity to modelling error increases, and the test is likely to become significant because of the necessary approximations in the head model, the source model (dipole approximation to extended sources) and the noise model. Therefore the GLRT may not be very appropriate as a rigid rejection criterion, and it has been recommended to use it more as a descriptive index of overall fit than as a statistical test [52]. A large number of alternative measures have been presented in the literature, an overview of which may be found in [11]. In [31] a
number of fit indices for selecting the number of dipole sources
have been assessed, both with respect to certain theoretical re-
quirements, as well as in numerical experiments with dipole lo-
calization with MEG. It was found that information theoretic
criteria on the one hand, and Wald tests on source amplitudes
on the other hand, were quite effective under various circum-
stances. In the current setup, if the mean is modelled, then the confidence intervals of the $\alpha_k$ parameters are akin to the Wald amplitude test discussed in [31].
The difference between estimates of unparameterized Ψ and
estimates of Ψ parameterized by B and Φ is precisely the dis-
tinction made in the neuroimaging community between “func-
tional” and “effective” connectivity [4]. However, it should be
emphasized that in modelling the coherence between sources,
several equivalent models may exist, which can have very dif-
ferent neurophysiological interpretations. For example in the
case of two sources, the interaction may be modelled as the first
source being input to the second, or vice versa. Both models
would fit equally well, so no distinction can be made on the basis
of the fit. Therefore, in applications a priori information should
be available on which interaction patterns are considered to be
more valid than other, mathematically equivalent ones [11].
ACKNOWLEDGMENTS
The Netherlands Organization for Scientific Research (NWO)
is gratefully acknowledged for funding this project. This re-
search was conducted while R. Grasman (527-25-014), L. Waldorp (527-25-013), and K. Böcker (527-25-015) were supported
by a grant of the NWO foundation for Behavioral and Educa-
tional Sciences of this organization awarded to H.M. Huizenga,
P.C.M. Molenaar, L.J. Kenemans, and J.C. de Munck.
We are indebted to the anonymous reviewers whose constructive comments helped us improve a first version of this paper. We also thank Dr. Conor V. Dolan for proofreading.
APPENDIX
Equation (15). Assume $\Psi^{-1}$ and $\Theta^{-1}$ exist, and that A has full column rank. From the relation $(A + CBD)^{-1} = A^{-1} - A^{-1}C(B^{-1} + DA^{-1}C)^{-1}DA^{-1}$ [30, p. 9], we have
$$R^{-1} = (A\Psi A^* + \Theta)^{-1} = \Theta^{-1} - \Theta^{-1}A[\Psi^{-1} + \Gamma]^{-1}A^*\Theta^{-1},$$
where $\Gamma = A^*\Theta^{-1}A$. From this, and from $(I + B)^{-1} = I - (B^{-1} + I)^{-1}$, we find that
$$A^*R^{-1} = (I - \Gamma[\Psi^{-1} + \Gamma]^{-1})A^*\Theta^{-1} = (I - [\Psi^{-1}\Gamma^{-1} + I]^{-1})A^*\Theta^{-1} = [I + \Gamma\Psi]^{-1}A^*\Theta^{-1} = [\Gamma^{-1} + \Psi]^{-1}\Gamma^{-1}A^*\Theta^{-1}.$$
Therefore, post-multiplying $A^*R^{-1}$ by A we find $A^*R^{-1}A = [\Gamma^{-1} + \Psi]^{-1}$, which yields
$$(A^*R^{-1}A)^{-1}A^*R^{-1} = (A^*\Theta^{-1}A)^{-1}A^*\Theta^{-1}.$$
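The final identity can be checked numerically. The sketch below draws random complex matrices (the dimensions m = 6 sensors and q = 3 sources are arbitrary choices of ours) and verifies that $(A^*R^{-1}A)^{-1}A^*R^{-1} = (A^*\Theta^{-1}A)^{-1}A^*\Theta^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 6, 3  # sensors, sources (arbitrary sizes for this check)

# random full-column-rank A, Hermitian positive definite Psi and Theta
A = rng.standard_normal((m, q)) + 1j * rng.standard_normal((m, q))
M = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))
Psi = M @ M.conj().T + q * np.eye(q)
N = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Theta = N @ N.conj().T + m * np.eye(m)

R = A @ Psi @ A.conj().T + Theta
# lhs = (A* R^-1 A)^-1 A* R^-1,  rhs = (A* Theta^-1 A)^-1 A* Theta^-1
lhs = np.linalg.solve(A.conj().T @ np.linalg.solve(R, A),
                      A.conj().T @ np.linalg.inv(R))
rhs = np.linalg.solve(A.conj().T @ np.linalg.solve(Theta, A),
                      A.conj().T @ np.linalg.inv(Theta))
assert np.allclose(lhs, rhs)
```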
REFERENCES
[1] R. P. P. P. Grasman, H. M. Huizenga, L. J. Waldorp, K. B. E. Böcker, and
P. C. M. Molenaar. Frequency domain simultaneous source and source co-
herence estimation with an application to MEG. IEEE Trans. on Biomed-
ical Engineering, in press.
[2] T. J. Sejnowski and P. Smith Churchland. Brain and cognition. In
Michael I. Posner, editor, Foundations of cognitive science, chapter 8,
pages 301–358. MIT Press, Cambridge, Mass., 1989.
[3] M. Hämäläinen, Rita Hari, Risto J. Ilmoniemi, J. Knuutila, and O.V.
Lounasmaa. Magnetoencephalography – theory, instrumentation, and ap-
plications to noninvasive studies of the working human brain. Reviews of
Modern Physics, 65:413–497, 1993.
[4] F. Varela, J. P. Lachaux, E. Rodriguez, and J. Martinerie. The brainweb:
phase synchronization and large-scale integration. Nature reviews: Neu-
roscience, 2(4):229–239, Apr. 2001.
[5] S. L. Bressler. Event-related potentials. In M. A. Arbib, editor, The Hand-
book of Brain Theory and Neural Networks, pages 412–415. MIT Press,
Cambridge, MA, 2002.
[6] G. Pfurtscheller and F. H. Lopes da Silva. Event-related EEG/MEG syn-
chronization and desynchronization: basic principles. Clinical Neurophys-
iology, 110(11):1842–1857, 1999.
[7] T. Holroyd, M. Nielsen, S. Miyauchi, and T. Yanagida. Broad-band mag-
netic brain activity during rhythmic tapping tasks. In J. Nenonen, R. J.
Ilmoniemi, and T. Katila, editors, BioMag2000. Proceedings of 12th Int.
Conf. Biomagnetism, pages 307–310, Espoo, Finland, Aug 2000. Helsinki
University of Technology.
[8] J. Gross, J. Kujala, M. Hämäläinen, L. Timmermann, A. Schnitzler, and
R. Salmelin. Dynamic imaging of coherent sources: Studying neural in-
teractions in the human brain. Proc. Nat. Ac. Sci. USA, 98(2):694–699,
2001.
[9] O. David, L. Garnero, D. Cosmelli, and F. J. Varela. Estimation of neural
dynamics from MEG/EEG cortical current density maps: application to
the reconstruction of large-scale cortical synchrony. IEEE Trans BME,
49(9):975–987, Sep 2002.
[10] A. Delorme, S. Makeig, M. Fabre-Thorpe, and T. Sejnowski. From single-
trial EEG to brain area dynamics. Neurocomputing, 44–46:1057–1064,
2002.
[11] K. A. Bollen. Structural equations with latent variables. Wiley series in
probability and mathematical statistics. Wiley, New York, USA, 1st edi-
tion, 1989.
[12] A. Paulraj, B. Ottersten, R. Roy, A. Swindlehurst, G. Xu, and T. Kailath.
Subspace methods for directions-of-arrival estimation. In N. K. Bose and
C. R. Rao, editors, Handbook of Statistics, chapter 16, pages 639–739.
Elsevier Science Publishers B.V., Amsterdam, Netherlands, 1993.
[13] H. Krim and M. Viberg. Two decades of array signal processing research.
the parametric approach. IEEE Signal Processing Mag., 13(4):67–95, July
1996.
[14] P. Stoica, B. Ottersten, M. Viberg, and R. Moses. Maximum likelihood
array processing for stochastic coherent sources. IEEE Trans. on Signal
Processing, 44(1):96–105, January 1996.
[15] D. T. Pham, J. Möcks, W. Köhler, and T. Gasser. Variable latencies of
noisy signals: Estimation and testing in brain potential data. Biometrika,
74(3):525–533, 1987.
[16] S. Makeig, M. Westerfield, T.-P. Jung, S. Enghoff, J. Townsend, E. Courch-
esne, and T. J. Sejnowski. Dynamic Brain Sources of Visual Evoked Re-
sponses. Science, 295(5555):690–694, 2002.
[17] F. Bijma, J. C. de Munck, H. M. Huizenga, and R. M. Heethaar. A
mathematical approach to the temporal stationarity of background noise
in MEG/EEG measurements. NeuroImage, 20:233–243, 2003.
[18] J. Sarvas. Basic mathematical and electromagnetic concepts of the bio-
magnetic inverse problems. Phys. Med. Biol., 32:11–22, 1987.
[19] K. J. Friston, C. Buechel, G. R. Fink, J. Morris, E. Rolls, and R. J. Dolan.
Psychophysiological and modulatory interactions in neuroimaging. Neu-
roimage, 6:218–229, 1997.
[20] D. R. Brillinger. Time Series: data analysis and theory. International
series in decision processes. Holt, Reinhart and Winston Inc., New York,
1975.
[21] J. Raz, B. Turetsky, and G. Fein. Frequency-domain estimation of the
parameters of human brain electrical dipoles. Journal of the American
Statistical Association, 87(417):69–77, 1992.
[22] J. Raz, C. A. Biggins, B. Turetsky, and G. Fein. Frequency-domain dipole
localization - extensions of the method and applications to auditory and
visual-evoked potentials. IEEE Transactions on Biomedical Engineering,
40(9):909–918, 1995.
[23] J. Raz, V. Cardenas, and D. Fletcher. Frequency-domain estimation of
covariate effects in multichannel brain evoked-potential data. Biometrics,
51(2):448–460, 1995.
[24] K. G. Jöreskog. Analysis of covariance structures. Scandinavian Journal
of Statistics, 8:65–92, 1981.
[25] J. C. de Munck, P. C. M. Vijn, and F. H. Lopes da Silva. A random dipole
model for spontaneous brain activity. IEEE Transactions on Biomedical
Engineering, 39(8):986–990, Aug. 1992.
[26] T. W. Anderson. An introduction to multivariate statistical analysis. Wiley,
New York, 1971.
[27] M. W. Browne and S. H. C. du Toit. Automated fitting of nonstandard
models. Multivariate Behavioral Research, 27:269–300, 1992.
[28] P. Stoica, B. Ottersten, and M. Viberg. Optimal array signal processing in
the presence of coherent wavefronts. volume 5 of ICASSP, pages 2904–
2907, New York, NY, USA, 1996. IEEE.
[29] B. Ottersten, M. Viberg, and T. Kailath. Analysis of subspace fitting and ML techniques for parameter estimation from sensor array data. IEEE Trans. Signal Processing, 40(3):590–600, March 1992.
[30] J. R. Schott. Matrix analysis for statistics. Wiley Series In Probability
And Statistics. John Wiley & Sons, Inc., New York, 1997.
[31] L. J. Waldorp, H. M. Huizenga, R. P. P. P. Grasman, K. B. E. Böcker,
J. C. de Munck, and P. C. M. Molenaar. Model selection in electromag-
netic source analysis with an application to VEF’s. IEEE Transactions on
Biomedical Engineering, 49(10):1121–1129, 2002.
[32] M. Wax and T. Kailath. Detection of signals by information theoretic
criteria. IEEE Trans ASSP, 33(2):387–392, Apr. 1985.
[33] M. S. Bartlett. A note on the multiplying factors for various χ² approximations. J. Roy. Stat. Soc., 16:296–298, 1954.
[34] D. F. Morrison. Multivariate statistical methods. McGraw-Hill, New York,
2d edition, 1989.
[35] L. J. Waldorp, H. M. Huizenga, C. V. Dolan, and P. C. M. Molenaar. Esti-
mated generalized least squares electromagnetic source analysis based on
a parametric noise covariance model. IEEE Transactions on Biomedical
Engineering, 48:737–741, 2001.
[36] H. M. Huizenga, D. J. Heslenfeld, and P. C. M. Molenaar. Optimal mea-
surement conditions for spatiotemporal EEG/MEG source analysis. Psy-
chometrika, 67(2):299–313, Jun 2002.
[37] S. D. Silvey. Statistical inference. Penguin, Harmondsworth, 1970.
[38] G. A. F. Seber and C. J. Wild. Nonlinear Regression. Wiley series in
Probability and Mathematical Statistics, Applied probability and statistics.
Wiley, New York, 1989.
[39] P. E. Gill, M. H. Wright, and W. Murray. Nonlinear Programming. Stan-
ford University Press, Stanford, 1986.
[40] K. G. Jöreskog. A general approach to confirmatory maximum likelihood
factor analysis. Psychometrika, 34:182–202, 1969.
[41] P. C. M. Molenaar. Dynamic factor analysis of psychophysiological sig-
nals. In J.R. Jennings, P. Ackles, and M.G.H. Coles, editors, Advances
in Psychophysiology, volume 5 of Advances in Psychophysiology, pages
229–302. Jessica Kingsley Publishers, London, 1993.
[42] T. Amemiya. Advanced Econometrics. Harvard University Press, Cam-
bridge MA, 1986.
[43] D. R. Brillinger and M. Hatanaka. An harmonic analysis of nonstation-
ary multivariate economic processes. Econometrica, 37(1):131–141, Jan.
1969.
[44] M. W. Browne. Generalized least squares estimators in the analysis of
covariance structures. South African Statistical Journal, 8:1–24, 1974.
[45] B. Lütkenhöner. Frequency-domain localization of intracerebral dipolar sources. Electroencephalography and clinical Neurophysiology,
82(2):112–118, 1992.
[46] R. Srebro. Iterative refinement of the minimum norm solution of the bio-
electric inverse problem. IEEE Trans BME, 43(5):547–552, May 1996.
[47] S. Supek and C. J. Aine. Spatio-temporal modeling of neuromagnetic
data: II. multi-source resolvability of a MUSIC-based location estimator.
Human Brain Mapping, 5(3):154–167, 1997.
[48] A. Hyvärinen and E. Oja. Independent component analysis: algorithms
and applications. Neural Networks, 13:411–430, 2000.
[49] J. Maier, G. Dagneli, H. Spekreijse, and B. W. van Dijk. Principal com-
ponents analysis for source localization of VEPs in man. Vision Research,
27:165–177, 1987.
[50] A. Achim, F. Richer, and J. M. Saint-Hilaire. Methods for separating
temporally overlapping sources of neuroelectric data. Brain Topography,
1(1):22–28, 1988.
[51] J. C. de Munck. The estimation of time varying dipoles on the basis of
evoked potentials. Electroenceph. and clin. Neurophysiol., 77:156–160,
1990.
[52] K. G. Jöreskog and D. Sörbom. LISREL 7, a guide to the program and applications. Jöreskog and Sörbom/SPSS Inc., Chicago, Illinois, 2nd edition, 1989.
Raoul Grasman was born in 1973. He received
a degree in artificial intelligence (cum laude) and
the master's degree in experimental psychology (cum
laude) from the University of Amsterdam in 1997 and
1998, respectively. His research interests concern the
methodology of cognitive neuroscience and experi-
mental psychology research, and multivariate signal
processing. He is currently working towards his Ph.D.
at the University of Amsterdam.
Hilde Huizenga was born in 1965. She received
the MA degree in psychology from the University of
Groningen in 1990, and the Ph.D. degree (cum laude)
in psychology from the University of Amsterdam in
1995. From 1996 to 2001 she was a postdoctoral fellow; currently she is an associate professor at the University of Amsterdam. Her main research interest is the statistical analysis of neuroscientific data, in particular nonlinear regression and covariance structure analysis of
EEG/MEG sources and their interactions.
Lourens Waldorp was born in 1971. He received his
master's degree in methodological psychology in 1998
from the University of Amsterdam. His research inter-
ests include statistical analysis in psychophysiologi-
cal experiments and signal processing. He is currently
working towards a Ph.D. at the University of Amster-
dam.
Peter Molenaar was born in 1946. His doctoral dis-
sertation was about multidimensional signal analysis.
His current research interests include signal analysis
and applied nonlinear dynamics. He is research direc-
tor of several programs and department head.
Koen B.E. Böcker was born in 1966. He received his master's degree in Physiological Psychology and
his Ph.D. degree (cum laude) from Tilburg University,
in 1989 and 1994, respectively. Since 1989 he has held research positions at several universities. Currently, he is an assistant professor at Utrecht
University. His research interests include the study
of perception and cognition (attention, inhibition and
emotion) and the application of source analysis in cog-
nitive neuroscience.