Raoul Grasman (c) 2003 Submitted to IEEE Transactions on Signal Processing
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. Y, MONTH 2004
Stochastic maximum likelihood mean and cross-spectrum structure estimation: analytic and neuromagnetic Monte Carlo results

Raoul P.P.P. Grasman*, Hilde M. Huizenga, Lourens J. Waldorp, Peter C.M. Molenaar and Koen B.E. Böcker
Abstract— In [1] we proposed to analyze cross-spectrum matrices obtained from electro- or magneto-encephalographic (EEG/MEG) signals, to obtain estimates of the EEG/MEG sources and their coherence. In this paper we extend this method in two ways: first, by modelling such interactions as linear filters, and second, by taking the mean of the signals across different trials into account. To obtain estimates we propose a stochastic maximum likelihood (SML) method, and obtain the concentrated likelihood that includes the trial means.

Keywords— equivalent current dipole, EEG, MEG, stochastic maximum likelihood, array signal processing, mean structure, covariance structures, functional connectivity, effective connectivity
I. INTRODUCTION

In cognitive neuroscience the objective is to establish how structures of the brain cooperate to give rise to mental functions. Several brain imaging techniques are helpful in determining which parts of the cortex become active during certain mental processes [2]. These techniques include functional magnetic resonance imaging (fMRI) and equivalent current dipole (ECD) modelling of the electro- (EEG) and magneto- (MEG) encephalogram [3]. EEG and MEG measure, respectively, the scalp electric potential field and the magnetic field near the head of a human subject. These fields are generated by localized electric currents associated with neuronal activity. In ECD modelling, these currents are modelled by small current dipoles, and the objective is to estimate their unknown locations, orientations and amplitudes, given the EEG/MEG sensor outputs. Techniques such as fMRI provide great localization precision, whereas EEG and MEG provide great timing precision [2]. While function localization to different parts of the cortex has taken off to a great extent over the last decade, researchers are increasingly interested in testing hypotheses about the cooperativity between these different cortical structures: that is, in estimating the parameters that describe the dynamics of these interactions [4], [5].
Standard methods of investigating interactions include coherence analysis of EEG and MEG signals and so-called event related (de-)synchronization [6]. Problems with the interpretation of these measures of cortico-cortical interactions include volume conduction effects, reference electrode effects, and the lack of spatial resolution of the EEG/MEG [4]. Newer approaches consist of localization of activity by means of dipole source localization procedures and correlating the source amplitude functions estimated for these locations [7], [8], [9], [10]. Still another approach, laid down in [1], is to simultaneously estimate dipole locations and their amplitude cross-spectra from the sample cross-spectra of the EEG/MEG signals. The advantage of this last approach is that it makes full use of the virtues of statistical estimation theory, which include high precision maximum likelihood estimators and straightforward model evaluation theory.

Raoul Grasman, Hilde Huizenga, Lourens Waldorp and Peter Molenaar are with the Department of Psychology, University of Amsterdam, Roetersstraat 15, 1018WB Amsterdam, the Netherlands. Phone: +31 20 525 6734. Fax: +31 20 639 0279. Koen Böcker is with the Department of Psychopharmacology, Utrecht University, Sorbonnelaan 16, De Uithof, 3583CA Utrecht, the Netherlands. *Corresponding author e-mail: grasman@psy.uva.nl.
In this paper we extend the method in [1] with a framework for modelling and testing source amplitude coherence. Furthermore, we include the information that is present in the average of the EEG/MEG signals across repeated trials. The suggested framework has its roots in what is known in biometrics as path analysis, and in econometrics and psychometrics as structural equation modelling (SEM) [11]. The method employs maximum likelihood in a way that is very similar to stochastic maximum likelihood (SML) directions-of-arrival (DOA) estimation, as given in e.g. [12], [13], [14]. We modify the usual SML formulas to include the mean and a more general noise covariance.
This paper is organized as follows. In section II the source model is presented. In section III the mean and cross-spectrum model is given, and a framework for modelling source amplitude coherence is presented. In section IV closed form expressions for the estimators of some of the parameters are derived, and an expression for the concentrated negative log-likelihood function is obtained. Also, the generalized likelihood ratio test (GLRT) statistic and approximate standard errors of the estimators are briefly discussed in connection with model evaluation. In section V the approximate standard errors and GLRT statistic are evaluated in a set of numerical experiments. Finally, in section VI some closing remarks on the methods are made.
II. DIPOLE MODEL AND MEASUREMENTS MODEL
Experimental EEG and MEG data usually consist of signal segments measured in different trials, during which stimuli are presented to subjects in order to evoke specific brain responses. The EEG/MEG signals reflect these responses in a highly entangled way, and the purpose of ECD modelling is to disentangle these signals into the underlying components of localized neuroelectric activity in the cortex. It has been widely recognized that these cortical responses, evoked by the presentation of stimuli, are characterized by a deterministic part and a stochastic part [15].¹ The deterministic part, the event related potential/field (ERP/ERF), can be estimated by averaging the signals across many repeated trials [5]. The stochastic part is only reflected in the variance of the signals across trials. It is generally accepted that these trials may be considered as statistically independent replications of an evoked brain response, provided that the time-interval separation between trials is not too small and unpredictable [15]. In relating the fields produced by these neural currents to the measurements, the head is often modelled as a spherically symmetric conductor that is locally fitted to the curvature of the skull [3]. The sources themselves are described by a parameter vector θ′ = [θ′_1, …, θ′_d], containing location and orientation parameters θ_a for each source, indexed a = 1, …, d. Here ′ denotes transposition.

For EEG/MEG data in trials l = 1, …, L the m-dimensional array of measurements has the form²

ỹ_l(t) = A(θ) s̃_l(t) + ñ_l(t),  t = 0, …, T − 1.  (1)

Here ỹ_l(t) is the vector of measurements from m channels on trial l at time t. A is the m × d matrix of which the columns contain the gains for the unit amplitude sources parameterized by θ. s̃_l(t) is the d-vector of source amplitudes in trial l at time t, and ñ_l(t) is an m-vector of noise signals in trial l at time t, independent of s̃_l(t). The gain matrix A is obtained from the quasi-static Maxwell equations [3]. For the MEG measurements from a spherical head model that we use in section V, A was determined in [18]. Throughout this article it is assumed that the source parameters θ are fixed over time and trials. We will assume that the sources are sufficiently separated so that rank(A(θ)) = d throughout the source region.

If it may be assumed that s̃_l(t) is stochastic in nature, an advantage arises: from the variation across trials, inferences can be made about the interdependency between different sources, which may be interpreted as “functional coupling” of different cortical areas [19]. In [1] we also pointed out this fact, along with a discussion of the advantages of transforming the model into the frequency domain. To summarize the latter: assuming stationarity of the stochastic processes, the Fourier coefficients of different frequencies are asymptotically (i.e. for T → ∞) uncorrelated and have approximately a complex normal distribution [20]; the fitting function may therefore be factored into a set of fitting functions that are much more efficiently evaluated than their time domain equivalent. As a result the computational burden can be reduced drastically. This property has been previously exploited in the context of the analysis of brain signals in [15], [21], [22], [23].

To summarize, we make the following assumptions on the measurements: i) Multiple segments of multichannel data are available, generated in accordance with the source signal plus noise model in (1), in which a precisely defined event occurs, to which the sources respond. ii) Segments are statistically independent of each other. iii) The sources’ parameters θ_a are fixed across time and segments. iv) The source responses consist of a deterministic part and a stochastic part. v) Noise and source signals are statistically independent, and the expected value of the noise is zero. vi) A is a known matrix function of θ and the sources are sufficiently separated such that A has full column rank.

¹ The debate on this issue has recently revived due to experimental findings in [16] and a mathematical analysis of data preprocessing effects verified in the experimental data in [17], resulting in opposing views. In any case, the model can be maintained as a model for the ensemble average, where any trial to trial variation is absorbed into the stochastic part of the response.

² Throughout this paper a tilde (˜) will indicate time domain quantities.
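As a concrete illustration, the measurement model (1) can be simulated directly. All dimensions, the random stand-in for the gain matrix A(θ), and the noise levels below are illustrative assumptions; a real application would compute A from the spherical head model of [18].

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, T, L = 8, 2, 128, 50   # sensors, sources, samples per trial, trials

# Hypothetical gain matrix standing in for A(theta); a real application would
# compute it from the quasi-static Maxwell equations / spherical head model.
A = rng.standard_normal((m, d))

# Deterministic part of the source amplitudes (damped sines, as in section V).
t = np.arange(T)
mean_amp = np.stack([np.exp(-2 * t / T) * np.sin(2 * np.pi * 2 * t / T),
                     np.exp(-2 * t / T) * np.sin(2 * np.pi * 4 * t / T)])

# Model (1): y_l(t) = A(theta) s_l(t) + n_l(t), trial by trial.
trials = np.empty((L, m, T))
for l in range(L):
    s_l = mean_amp + 0.3 * rng.standard_normal((d, T))  # deterministic + stochastic
    n_l = 0.1 * rng.standard_normal((m, T))             # sensor noise
    trials[l] = A @ s_l + n_l
```

Averaging `trials` over the first axis estimates the ERP/ERF part; the across-trial variation carries the stochastic part.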
III. MODEL SPECIFICATION

III-A Mean and cross-spectrum structure

Define y_l(k) = (2πT)^{−1/2} Σ_t ỹ_l(t) exp(−i2πtk/T) to be the discrete Fourier transform coefficient at frequencies 2πk/T, k = 1, …, K < T/2 [21]. Define s_l(k) and n_l(k) similarly. As indicated previously, subject to certain mixing conditions and stationarity of the stochastic part of the signals, the Fourier coefficients y_l(k) have an asymptotically complex normal distribution, and are statistically independent for k ≠ j [20], [21]. Their covariance matrix E{[y_l(k) − E{y_l(k)}][y_l(k) − E{y_l(k)}]*} approaches the cross-spectral density matrix R_k as T → ∞. Here (·)* denotes conjugation and transposition.
For the Fourier transformed data the equivalent of (1) is

y_l(k) = A(θ) s_l(k) + n_l(k),  k = 1, …, K < T/2.  (2)

Using assumption v in the last paragraph of the previous section, the cross-spectrum of the stochastic part of ỹ_l(t) at each frequency then has the structure³

R_k = A(θ) Ψ_k A*(θ) + Θ_k,  k = 1, …, K < T/2.  (3)

Here Ψ_k is the cross-spectrum of the source amplitudes, and Θ_k is the cross-spectrum of the noise signals. These are the limiting values of the covariances of s_l(k) and n_l(k), respectively. This is the model presented in [1]. In that paper, this model was fitted to the sample cross-spectrum R̂_k that was computed from the observed data from the formula [20, p. 282]

R̂_k = (1/(L − 1)) Σ_{l=1}^{L} [y_l(k) − ẏ_k][y_l(k) − ẏ_k]*,  (4)

where ẏ_k = L^{−1} Σ_l y_l(k). If an estimate of the matrix Ψ_k is obtained in this way, coherences between source amplitudes can be obtained as a measure of functional connectivity (see [1]).
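The sample quantities ẏ_k and R̂_k in (4) are straightforward to compute from multi-trial data; the sketch below uses an illustrative toy data array and the (2πT)^{−1/2} normalisation of the Fourier coefficients defined above.

```python
import numpy as np

rng = np.random.default_rng(1)
L, m, T, K = 200, 6, 128, 5        # trials, channels, samples, frequencies kept

# Toy multi-trial data: a shared deterministic mean plus trial noise
# (illustrative stand-in for measured EEG/MEG segments).
t = np.arange(T)
mean_sig = np.outer(rng.standard_normal(m), np.sin(2 * np.pi * 3 * t / T))
data = mean_sig + 0.5 * rng.standard_normal((L, m, T))

# Fourier coefficients y_l(k) = (2 pi T)^(-1/2) sum_t y_l(t) exp(-i 2 pi t k / T);
# np.fft.fft computes the sum with this sign convention, we add the normalisation.
Y = np.fft.fft(data, axis=-1) * (2 * np.pi * T) ** -0.5
Y = Y[:, :, 1 : K + 1]             # keep k = 1, ..., K < T/2; shape (L, m, K)

y_dot = Y.mean(axis=0)             # sample mean per frequency, shape (m, K)

# Sample cross-spectrum (4): one Hermitian m x m matrix per frequency.
R_hat = np.empty((K, m, m), dtype=complex)
for k in range(K):
    C = Y[:, :, k] - y_dot[:, k]   # centered coefficients, rows are trials
    R_hat[k] = C.T @ C.conj() / (L - 1)
```

Each `R_hat[k]` is Hermitian positive semidefinite by construction, as a sample covariance of the centered Fourier coefficients must be.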
In the model thus far, the signal means, and hence the source amplitude means, are ignored. If the amplitude means are not equal to zero, the means contain important information, and are usually the object of interest to the researcher. Therefore the first extension of this model involves the incorporation of the trial average, in the form of the expected value of the Fourier coefficients, by taking expectations in (2):

E{y_l(k)} = μ_k = A(θ) E{s_l(k)} = A(θ) s_k,  (5)

because E{n_l(k)} = 0 by assumption v on ñ_l(t). Here s_k is the k-th Fourier coefficient of the ensemble average waveform s̃(t).
III-B Linear filter model for interactions

In some situations a researcher may entertain substantive hypotheses about interactions between different sources. In the current model these hypotheses may be tested directly if Ψ_k is further structured.
³ A(θ) is real valued for the biophysical model but can be complex in other applications. Henceforth we use * instead of ′ in such cases.
We will approximate the interaction equation between the amplitudes of two sources indexed a and b by a linear filter relation. The relations are not expected to be perfect, since, apart from nonlinearities in the interactions, some intrinsic activity will exist and some external input activation is unaccounted for by the sources incorporated in the model. These effects will be incorporated through an additional zero mean stationary stochastic process term ζ̃_{a,l}(t). For different sources a and b, ζ̃_a and ζ̃_b are assumed to be independent. In addition, a non-random portion of the response is included through an extra term h_a(t). The resulting equation for the interactions between sources is

s̃_{a,l}(t) = h_a(t) + Σ_{b=1}^{d} ∫ h_{ab}(τ) s̃_{b,l}(t − τ) dτ + ζ̃_{a,l}(t)  (6)

for a = 1, …, d. Of course neurophysiological hypotheses may imply that some of these kernels are equal to zero.
By the convolution theorem of Fourier analysis, in the frequency domain this results in the relation

s_{a,l}(k) = α_a(k) + Σ_{b=1}^{d} β_{ab}(k) s_{b,l}(k) + ζ_{a,l}(k)  (7)

where α_a(k) = (1/2π) ∫ h_a(t) exp{−i2πkt} dt, and β_{ab}(k) = (1/2π) ∫ h_{ab}(t) exp{−i2πkt} dt. Zero kernels in (6) correspond to zero coefficients in this equation.
For d sources these relations may be compactly represented in matrix form:

s_l(k) = α_k + B_k s_l(k) + ζ_l(k)  (8)

where B_k = (β_{ab}(k)), and ζ_l(k)′ = (ζ_{1,l}(k), …, ζ_{d,l}(k)). Some restrictions must be imposed in order to make B_k identifiable [11], [24]. We will assume that B_k is specified in such a way that (I − B_k)^{−1} exists; this coincides with the condition that the system of filters in (6) is invertible [20, p. 30]. This ensures that s̃(t) consists of a deterministic component superimposed on a stationary stochastic component. The vector of mean amplitude Fourier coefficients E{s_k} is then obtained by rewriting (8) as s_l(k) = (I − B_k)^{−1}[α_k + ζ_l(k)], and taking expectations:

E{s_l(k)} ≜ s_k = (I − B_k)^{−1} α_k,
as E{ζ_l(k)} = 0 by the assumption on the components of ζ̃_l(t). Besides the invertibility condition on I − B_k, the cross-spectrum of ζ̃_l(t) will have to be restricted. A natural constraint is to restrict E{ζ_l(k) ζ_l(k)*} = Φ_k = diag(φ_1(k), …, φ_d(k)), as otherwise there could be correlations between dipole amplitudes not accounted for by the filter model that was invoked precisely to model these correlations. The amplitude cross-spectrum is now obtained from E{[s_l(k) − E{s_l(k)}][s_l(k) − E{s_l(k)}]*} = E{(I − B_k)^{−1} ζ_l(k) ζ_l(k)* (I − B_k)^{−*}} or

Ψ_k = (I − B_k)^{−1} Φ_k (I − B_k)^{−*}.
For any nonsingular diagonal scaling matrix D it is seen that [D(I − B_k)]^{−1} DΦ_k D* [(I − B_k)* D*]^{−1} = (I − B_k)^{−1} Φ_k (I − B_k)^{−*}. Hence, without fixing the scale of either B_k or Φ_k, both cannot be uniquely identified. A further restriction therefore has to be imposed. With no additional information, it is natural to require either Φ_k = I or diag(B_k) = 0. The former has the interpretation that the intrinsic activity has a uniform spectrum (i.e. is pure white noise), which is somewhat unrealistic, especially for biological systems, and is therefore not desirable. The latter ensures that the diagonal elements of I − B_k are equal to 1 and means that the kernels h_{aa}(t), a = 1, …, d, are identically zero; hence B_k contains the Fourier coefficients of the linear filter that predicts the activity of one source from the activity of only other sources (and not from its own activity).
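The scale indeterminacy can be checked numerically: Ψ_k = (I − B_k)^{−1} Φ_k (I − B_k)^{−*} is invariant under the diagonal rescaling described above, which is why a restriction such as diag(B_k) = 0 is needed. A minimal sketch, with randomly drawn illustrative B_k and Φ_k:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
I = np.eye(d)

# Hypothetical transfer matrix B_k with the identification choice diag(B_k) = 0.
B = 0.3 * (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
np.fill_diagonal(B, 0)

Phi = np.diag(rng.uniform(0.5, 2.0, d))         # diagonal intrinsic spectra

M = np.linalg.inv(I - B)
Psi = M @ Phi @ M.conj().T                      # Psi_k = (I-B)^-1 Phi (I-B)^-*

# Scale indeterminacy: rescaling (I-B) by a diagonal D and Phi by D Phi D*
# leaves Psi unchanged, so B_k and Phi_k are not jointly identified without
# a restriction fixing the scale.
D = np.diag(rng.uniform(0.5, 2.0, d))
M2 = np.linalg.inv(D @ (I - B))
Psi2 = M2 @ (D @ Phi @ D.conj().T) @ M2.conj().T
assert np.allclose(Psi, Psi2)
```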
III-C Structure of the cross-spectrum Θ_k of the noise signals

In [1] the noise cross-spectrum Θ_k was constrained to be proportional to an identity matrix: Θ_k = σ²_k I. Conceptually, this means that the amount of noise is the same for all sensors, and that noise at different sensors is mutually uncorrelated. A more realistic constraint on Θ_k is to assume that the background EEG/MEG consists of dipoles that are randomly located and randomly activated in different trials and at different times [25]. Here we will assume that Θ_k is any function of KN parameters γ = (γ_{jk}), j = 1, …, N, k = 1, …, K, such that these parameters are identifiable.
In summary, then, in addition to the measurement assumptions (section II), we assume: vii) The Fourier coefficients of the Fourier transformed data have (asymptotically) a complex normal distribution, independent for different frequencies. viii) The dependencies between signals of different sources can be reasonably approximated by linear filter relations as in (6); this may be justified as a first order approximation of a Volterra functional expansion [20]. ix) The filter system in (6) is invertible (i.e. (I − B_k)^{−1} exists for all k). x) Restrictions have been introduced, in a way that is justifiable within the context of application, to make the matrices B_k, Φ_k and Θ_k identifiable.
For easy reference we recapitulate some used symbols:

Symbol   Meaning
k        frequency index
θ        vector of all source parameters
A        sensor gain matrix
y_l(k)   Fourier coefficients of the observed signals on the l-th trial
ẏ_k      sample mean of y_l(k)
R̂_k      sample covariance of y_l(k)
μ_k      expected value of y_l(k)
s_k      expected value of source amplitude Fourier coefficient s_l(k)
R_k      cross-spectrum of stochastic part of ỹ_l(t)
Ψ_k      cross-spectrum of stochastic part of s̃_l(t)
Θ_k      cross-spectrum of ñ_l(t)
α_k      Fourier coefficient of the deterministic response of the source amplitude interactions
B_k      matrix of transfer coefficients in the linear filter source interaction model
Φ_k      (diagonal) matrix with the variances of ζ_l(k)
IV. PARAMETER ESTIMATION

Since the interest of the analyst is usually restricted to a limited band of frequencies, not all frequencies have to be incorporated in the analysis; we will denote the subset of K frequencies incorporated by 𝒦.

Following [1], the unknown parameters in (3), (5) and section III-C (i.e. θ, α_k, the non-duplicate elements in Ψ_k, and γ = (γ_{jk}), for j = 1, …, N, k ∈ 𝒦) are collected in the p-vector ξ, and are estimated by maximizing the “likelihood” [20], [21]

ℓ({y_l(k) : l = 1, …, L, k ∈ 𝒦}; ξ) ∝ Π_{l,k} exp{−[y_l(k) − μ_k(ξ)]* R_k(ξ)^{−1} [y_l(k) − μ_k(ξ)]} / (π^m |R_k(ξ)|).

The factorization in k is due to the aforementioned asymptotic independence of the Fourier coefficients of different frequencies [20], [21]. It will be more convenient to minimize the negative log-likelihood, which is proportional to (cf. [26], [27])
F(ξ) = Σ_{k∈𝒦} ( log|R_k(ξ)| + tr{R_k(ξ)^{−1} R̂_k} + [ẏ_k − μ_k(ξ)]* R_k(ξ)^{−1} [ẏ_k − μ_k(ξ)] ) + K log π^m.  (9)
When it is known that μ_k(ξ) ≡ 0 and a single frequency is considered, (9) may be reduced to log|R_k(ξ)| + tr{R_k(ξ)^{−1} R̂_k}, which is the “stochastic ML (SML)” objective function described in e.g. [12], [13], [14], [28], [29].
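A direct implementation of the objective (9) is straightforward. The sketch below assumes the per-frequency model quantities R_k, μ_k and the sample quantities R̂_k, ẏ_k are already available as lists of arrays, and omits the constant K log π^m:

```python
import numpy as np

def neg_log_likelihood(R_list, mu_list, R_hat_list, y_dot_list):
    """Negative log-likelihood (9), up to the constant K log pi^m:
    sum_k log|R_k| + tr{R_k^-1 R_hat_k}
          + (y_dot_k - mu_k)* R_k^-1 (y_dot_k - mu_k)."""
    F = 0.0
    for R, mu, R_hat, y_dot in zip(R_list, mu_list, R_hat_list, y_dot_list):
        Rinv = np.linalg.inv(R)
        resid = y_dot - mu
        F += np.linalg.slogdet(R)[1]            # log|R_k| (real part)
        F += np.trace(Rinv @ R_hat).real        # tr{R_k^-1 R_hat_k}
        F += (resid.conj() @ Rinv @ resid).real # mean (quadratic form) term
    return F

# Sanity check: for R_k = R_hat_k = I_m and zero means, each frequency adds m.
assert abs(neg_log_likelihood([np.eye(3)], [np.zeros(3)],
                              [np.eye(3)], [np.zeros(3)]) - 3.0) < 1e-12
```

In practice this function would be evaluated inside an optimizer over ξ, with R_k(ξ) and μ_k(ξ) rebuilt from (3) and (5) at each step.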
In some cases, the latter negative log-likelihood function can be separated, and for unparameterized Ψ_k and Θ_k = σ²_k I a well known compact equivalent concentrated problem was determined by Böhme (cited in [14], [29]), which greatly increases the computational efficiency.

Here we derive a similar concentrated problem for this case when the mean is incorporated. Furthermore, we obtain estimators under more general noise conditions: we do not assume that Θ_k = σ²_k I, but allow Θ_k = σ²_k U(γ), where U(γ) is a Hermitian positive definite matrix such that γ is identified. For simplicity, however, we will derive the results for U(γ) = I and then indicate how they generalize.
IV-A The case that B = 0 (unparameterized Ψ)

Let F† denote the pseudo-inverse (F*F)^{−1}F*, and Π_F denote the matrix I − FF†, for any F of full column rank. Then, for unparameterized Ψ_k we obtain the expressions

α̂_k(θ̂) = A†(θ) ẏ_k |_{θ=θ̂}  (10)

σ̂²_k(θ̂) = (1/(m − d)) tr{Π_A [R̂_k + ẏ_k ẏ_k*]} |_{θ=θ̂}  (11)

Ψ̂_k(θ̂) = A†(θ) (R̂_k − σ̂²_k(θ) I) A†*(θ) |_{θ=θ̂}  (12)

θ̂ = arg min_θ Σ_{k∈𝒦} log |A(θ) Ψ̂_k(θ) A*(θ) + σ̂²_k(θ) I|.  (13)

In the case that Θ_k(γ) = σ²_k U(γ), A† is replaced by (QA)†Q, Π_A by QΠ_{QA}Q, and σ̂²_k I by σ̂²_k U(γ), where Q = U^{−1/2}, a Hermitian ‘square root’ of U^{−1}. In deriving the results, we will temporarily drop the dependence on k and suppress the dependence of A on θ.

These estimators are obtained by equating partial derivatives of F to zero and solving for the desired parameter. We first consider α.
IV-A.1 Mean amplitude parameters α

Setting the derivatives of F with respect to α equal to zero, the first order conditions −2(ẏ − Aα)* R^{−1} A = 0 are obtained, which may be solved to yield the optimal estimator α̂

α̂ = [A* R^{−1} A]^{−1} A* R^{−1} ẏ.  (14)

In the appendix it is shown that for R = AΨA* + Θ, for any A of full column rank and nonsingular Ψ and Θ,

(A* R^{−1} A)^{−1} A* R^{−1} = (A* Θ^{−1} A)^{−1} A* Θ^{−1},  (15)

so that (10) is obtained for Θ = σ² I, and the more general result is obtained for Θ = σ² U.
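Identity (15) is easy to verify numerically for randomly drawn full-rank A and Hermitian positive definite Ψ and Θ; a minimal sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 7, 3

def random_hpd(n):
    # Random Hermitian positive definite matrix.
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + n * np.eye(n)

A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
Psi, Theta = random_hpd(d), random_hpd(m)
R = A @ Psi @ A.conj().T + Theta

def gls_weights(A, W):
    # (A* W^-1 A)^-1 A* W^-1 for a Hermitian positive definite weight W.
    Winv = np.linalg.inv(W)
    return np.linalg.inv(A.conj().T @ Winv @ A) @ A.conj().T @ Winv

# Identity (15): weighting by R and weighting by Theta give the same matrix,
# so alpha-hat in (14) does not depend on Psi.
assert np.allclose(gls_weights(A, R), gls_weights(A, Theta))
```

This is the property that makes α̂ independent of Ψ, which the next subsection exploits.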
IV-A.2 Amplitude cross-spectral parameters Ψ

Substitution of α̂ into F yields the concentrated negative log-likelihood F|_{α̂}. Let ψ denote a real or imaginary part of an element of Ψ. The derivatives of F|_{α̂} with respect to ψ are the same as those of F because α̂ does not depend on Ψ by (15). Setting the derivatives equal to zero, we may obtain the equations

2ℜ tr{A* R^{−1} (R̂# − R) R^{−1} A ∂Ψ/∂ψ}|_{α=α̂} = 0,  ψ = (ℜΨ)_{ab} or ψ = (ℑΨ)_{ab},  a, b = 1, …, d,

where R̂# = R̂ + [ẏ − μ̂][ẏ − μ̂]*, and μ̂ = Aα̂. From (14), by construction of α̂, A* R^{−1} [ẏ − μ̂] = 0, so that A* R^{−1} R̂# = A* R^{−1} R̂ and the estimation equations can be reduced to, in matrix form,

A* R^{−1} (R̂ − R) R^{−1} A = 0.

Substituting R = AΨA* + Θ, this can be written

A* R^{−1} (R̂ − Θ) R^{−1} A = A* R^{−1} A Ψ A* R^{−1} A.

Therefore the optimal estimate of Ψ is given by

Ψ̂ = [A* R^{−1} A]^{−1} A* R^{−1} (R̂ − Θ) R^{−1} A [A* R^{−1} A]^{−1},  (16)

which with (15) and Θ = σ² I or Θ = σ² U yields (12) and the more general result indicated thereafter.
IV-A.3 Noise spectrum σ² and concentrated negative log-likelihood

We first derive some simplifying expressions, required to concentrate the likelihood with respect to α and Ψ. First note that with U^{−1/2} = Q, Ψ̂ can be rewritten

Ψ̂ = (QA)† Q R̂ Q (QA)†* − σ² (A* U^{−1} A)^{−1}.  (17)

Furthermore, by using the matrix inversion formula (e.g. [30, p. 9], see the appendix) twice, it can be shown that

R^{−1} = (AΨA* + σ² U)^{−1} = (1/σ²) Q Π_{QA} Q + Q (QA)†* [Ψ + σ² (A* U^{−1} A)^{−1}]^{−1} (QA)† Q.

Substituting Ψ̂ from (17) in this expression, a little algebra that cancels terms yields the equation

R^{−1}|_{Ψ̂} = (1/σ²) Q Π_{QA} Q + U^{−1} A (A* U^{−1} R̂ U^{−1} A)^{−1} A* U^{−1}.  (18)
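Both (17) and (18) can be verified numerically: construct Ψ̂ from (17), form R = AΨ̂A* + σ²U, and compare its inverse with the closed form (18). The dimensions and the random Hermitian positive definite stand-ins for U and R̂ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 6, 2
sigma2 = 0.7

def random_hpd(n):
    # Random Hermitian positive definite matrix.
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + n * np.eye(n)

A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
U = random_hpd(m)          # noise shape matrix
R_hat = random_hpd(m)      # stand-in for the sample cross-spectrum

w, V = np.linalg.eigh(U)
Q = V @ np.diag(w ** -0.5) @ V.conj().T       # Q = U^(-1/2), Hermitian
Uinv = np.linalg.inv(U)

QA = Q @ A
QA_pinv = np.linalg.pinv(QA)                  # (QA)^dagger
Pi = np.eye(m) - QA @ QA_pinv                 # Pi_QA = I - (QA)(QA)^dagger

# (17): concentrated amplitude cross-spectrum estimate.
Psi_hat = (QA_pinv @ Q @ R_hat @ Q @ QA_pinv.conj().T
           - sigma2 * np.linalg.inv(A.conj().T @ Uinv @ A))

R = A @ Psi_hat @ A.conj().T + sigma2 * U

# (18): closed form for the inverse of R at Psi = Psi_hat.
R_inv_18 = (Q @ Pi @ Q / sigma2
            + Uinv @ A @ np.linalg.inv(A.conj().T @ Uinv @ R_hat @ Uinv @ A)
              @ A.conj().T @ Uinv)
assert np.allclose(np.linalg.inv(R), R_inv_18)

# Trace identity used to concentrate F: tr{R^-1 R_hat} = tr{Q Pi Q R_hat}/s2 + d.
assert np.isclose(np.trace(R_inv_18 @ R_hat),
                  np.trace(Q @ Pi @ Q @ R_hat) / sigma2 + d)
```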
Substituting (18) in tr{R^{−1} R̂} we obtain

tr{R^{−1}|_{Ψ̂} R̂} = (1/σ²) tr{Q Π_{QA} Q R̂} + tr{U^{−1} A (A* U^{−1} R̂ U^{−1} A)^{−1} A* U^{−1} R̂} = (1/σ²) tr{Q Π_{QA} Q R̂} + d,
where the equality tr{AB} = tr{BA} was used. Furthermore, from (18) and (15) it can be shown that

R^{−1}|_{Ψ̂} [I − A (A* R^{−1} A)^{−1} A* R^{−1}] = Θ^{−1} [I − A (A* Θ^{−1} A)^{−1} A* Θ^{−1}].

From this and from (14), therefore, we find

(ẏ − μ̂)* R^{−1}|_{Ψ̂} (ẏ − μ̂) = (ẏ − μ̂)* R^{−1}|_{Ψ̂} (I − A (A* R^{−1} A)^{−1} A* R^{−1}) ẏ = (ẏ − μ̂)* Θ^{−1} (I − A (A* Θ^{−1} A)^{−1} A* Θ^{−1}) ẏ,

which can be written (1/σ²) ẏ* Q Π_{QA} Q ẏ, because Θ^{−1} = (1/σ²) U^{−1} = (1/σ²) Q². Combining traces now yields the concentration of F in (9) with respect to α and Ψ:

F|_{α̂,Ψ̂} = log |A Ψ̂ A* + σ² U| + (1/σ²) tr{Q Π_{QA} Q (R̂ + ẏ ẏ*)} + d.
To find σ̂² we must take the derivative of F|_{α̂,Ψ̂} with respect to σ². Before doing so, first note that together with (17), R|_{Ψ̂} = A Ψ̂ A* + σ² U can be written A (QA)† Q R̂ Q (QA)†* A* + σ² Q^{−1} Π_{QA} Q^{−1}. Therefore ∂R|_{Ψ̂}/∂σ² = Q^{−1} Π_{QA} Q^{−1}. Setting the derivative of F|_{α̂,Ψ̂} with respect to σ² equal to zero gives the first order conditions

ℜ tr{R^{−1}|_{Ψ̂} Q^{−1} Π_{QA} Q^{−1}} = ℜ tr{Q Π_{QA} Q (R̂ + ẏ ẏ*)} / σ⁴.

With (18) we find tr{R^{−1}|_{Ψ̂} Q^{−1} Π_{QA} Q^{−1}} = (m − d)/σ², and therefore

σ̂² = tr{Q Π_{QA} Q (R̂ + ẏ ẏ*)} / (m − d)

is obtained, which is (11). Substitution of σ̂² in F|_{α̂,Ψ̂} yields the concentrated negative log-likelihood in (13).
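At the population level the concentrated estimators (10)-(12) recover the true parameters exactly: if ẏ = Aα and R̂ = AΨA* + σ²I are set to their expected values, then α̂ = α, σ̂² = σ² and Ψ̂ = Ψ. A sketch with illustrative dimensions, a random real gain matrix, and U = I:

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 9, 3
sigma2 = 0.4

A = rng.standard_normal((m, d))                   # real gain matrix, full rank
alpha = rng.standard_normal(d) + 1j * rng.standard_normal(d)
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Psi = X @ X.conj().T + np.eye(d)                  # true amplitude cross-spectrum

# Population-level "sample" quantities: y_dot = A alpha, R_hat = A Psi A* + s2 I.
y_dot = A @ alpha
R_hat = A @ Psi @ A.conj().T + sigma2 * np.eye(m)

A_pinv = np.linalg.pinv(A)                        # A^dagger
Pi = np.eye(m) - A @ A_pinv                       # Pi_A = I - A A^dagger

alpha_hat = A_pinv @ y_dot                                              # (10)
sigma2_hat = np.trace(Pi @ (R_hat + np.outer(y_dot, y_dot.conj()))
                      ).real / (m - d)                                  # (11)
Psi_hat = A_pinv @ (R_hat - sigma2_hat * np.eye(m)) @ A_pinv.conj().T   # (12)

assert np.allclose(alpha_hat, alpha)
assert np.isclose(sigma2_hat, sigma2)
assert np.allclose(Psi_hat, Psi)
```

With finite trials the same formulas are applied to the sample ẏ_k and R̂_k, and (13) is minimized over θ only.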
IV-B The case that B ≠ 0

Unfortunately, when B ≠ 0, we cannot use the algorithm in (10)-(13). Some parameters can still be separated, however. Next we find estimators for α and Φ when B ≠ 0.

If B ≠ 0, A in (14) must be substituted by A(I − B)^{−1}. The resulting estimator of α has the simple form (I − B) α̂, where α̂ is given in (14).
We can obtain an estimate for Φ in a similar way as Ψ̂. Let φ be a (real) diagonal element of Φ. Setting derivatives of F|_{α̂} with respect to φ equal to zero, we find the first order conditions to be

ℜ tr{Λ* R^{−1} (R̂ − Θ) R^{−1} Λ ∂Φ/∂φ} = ℜ tr{Λ* R^{−1} Λ Φ Λ* R^{−1} Λ ∂Φ/∂φ},

where Λ = A(I − B)^{−1}. Since the partial derivatives are taken only with respect to the real diagonal elements of Φ, it is easy to see that ℜ may be dropped. The first order conditions therefore yield a system of equations with the solution

φ̂ = [(Λ′ R̄^{−1} Λ̄) ⊙ (Λ* R^{−1} Λ)]^{−1} diag[Λ* R^{−1} (R̂ − Θ) R^{−1} Λ],  (19)

where φ = diag(Φ) contains the diagonal elements of Φ, ⊙ denotes the Hadamard product defined by A ⊙ B = (a_{ij} b_{ij}), and R̄ denotes conjugation of R.
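The Hadamard system in (19) rests on the fact that, for Hermitian M = Λ* R^{−1} Λ and real diagonal Φ, diag(MΦM) = (M ⊙ M̄)φ, so φ is recovered by one linear solve. This can be checked numerically with a random Hermitian stand-in for M:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4

X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M = X @ X.conj().T                  # Hermitian stand-in for Lambda* R^-1 Lambda
phi = rng.uniform(0.5, 2.0, d)      # real diagonal elements of Phi

# diag(M Phi M) is linear in phi, with coefficient matrix M (Hadamard) conj(M):
# (M Phi M)_{ii} = sum_j M_ij phi_j M_ji = sum_j |M_ij|^2 phi_j.
lhs = np.diag(M @ np.diag(phi) @ M).real
assert np.allclose(lhs, (M * M.conj()).real @ phi)

# Hence phi is recovered by solving the Hadamard system, as in (19).
phi_rec = np.linalg.solve((M * M.conj()).real, lhs)
assert np.allclose(phi_rec, phi)
```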
IV-C Assessment of model fit

The appropriateness of a model can be assessed by means of various fit assessment techniques that are sometimes grouped under the term “model selection procedure” [31]. Some of these procedures indicate how well the model describes the data, while others provide a rationale for deciding which of several competing models should be preferred on the basis of the data. Hence, a model selection procedure can help to decide how many dipoles should be incorporated in the model, and whether cross-spectral parameters should be included. In [1] we assessed the usefulness of the generalized likelihood ratio test (GLRT) statistic 2L · (F(ξ̂) − Σ_{k∈𝒦} log|R̂_k|) in determining the number of dipole sources that should be incorporated in the model (i.e. the detection problem [32]). Here we assess its effectiveness in testing the lack of interaction between different sources. The GLRT has an asymptotic χ²_{df} distribution with df = K(m² + 2m) − p degrees of freedom, where p is the number of free parameters. For moderate numbers of observations a Bartlett corrected statistic should be used [33], as was indicated in [34], [35].

Confidence regions of the estimates can also help to decide which parameters are necessary and which may be omitted: location estimates that are not contained in each other’s confidence regions indicate separate sources, and confidence regions of cross-spectral parameter estimates indicate whether these differ from zero [31], [36].
V. SIMULATIONS

In [1] we showed that confidence regions can be constructed quite reliably. Confidence regions of the estimated parameters can be computed from the Hessian matrix of the negative log-likelihood F(ξ) evaluated at ξ̂ [37], [38]. A finite difference approximation of the Hessian was calculated from the gradients at the estimate ξ̂. Note that in order to obtain standard errors for all parameters, F in (9) must be implemented fully, including all analytic derivatives, but it only needs to be evaluated after the last iteration of the algorithm in (10)-(13). We used a quasi-Newton algorithm [39] to optimize the full negative log-likelihood (9) to obtain the estimates ξ̂. We refer to [1] for the details on the calculation of the confidence regions.

Fig. 1. To give an impression of the reconstruction, the source amplitude of the first source, as reconstructed in each simulation, is depicted (upper panel: “Reconstructed mean amplitude of source 1”). The fat gray line is the true amplitude as reconstructed from the frequencies used in the estimation. The lower panel (“Estimated source location parameters”) shows box and whisker plots of the source location parameter (θ′_a = [θ^(x)_a, θ^(y)_a, θ^(z)_a]) estimates. Black, gray and white boxed whisker plots correspond to x, y and z coordinates, respectively. Boxes show the estimates between the first and last quartiles, the central line indicates the median, whiskers indicate the estimators’ range extremes, and dots indicate very extreme estimates.

To assess the performance and the stochastic behavior of the estimators in the current extensions, a number of simulations were carried out. Simulations were conducted in much the same way as in [1]: three dipoles were placed in a unit radius sphere at (0, 0.5, 0.75), (0, −0.5, 0.75) and (0.5, 0, 0.75), the first two with orientation cosines (1, 0, 0) and the third with orientation cosines (0, 1, 0). The amplitudes of the dipoles consisted of dampened sine waves added to a vector autoregressive stochastic process: s̃₁(t) = exp(−2t/T) sin(2πt·2/T) + ã₁(t) and s̃₂(t) = exp(−2t/T) sin(2πt·4/T) + ã₂(t), where ã₁(t) and ã₂(t) satisfied the equations ã₁(t) = 0.7 ã₁(t − 1) + ζ̃₁(t) and ã₂(t) = 0.5 ã₁(t − 1) + 0.7 ã₂(t − 1) + ζ̃₂(t). The amplitude of the third dipole was generated from s̃₃(t) = exp(−2t/T) sin(2πt·6/T) + ã₃(t) with ã₃(t) = 0.3 ã₁(t − 1) + 0.7 ã₃(t − 1) + ζ̃₃(t). For all three i = 1, 2, 3, ζ̃ᵢ(t) ∼ N(0, 1).
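The simulated amplitude time series described above can be generated as follows; this is a sketch of the stated simulation design (damped sines plus the VAR(1) recursions), not the authors' actual simulation code:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 128
t = np.arange(T)

# Damped-sine deterministic parts of the three dipole amplitudes.
det = np.stack([np.exp(-2 * t / T) * np.sin(2 * np.pi * t * f / T)
                for f in (2, 4, 6)])

# Stochastic parts: the stated autoregressions, with a1 driving a2 and a3.
a = np.zeros((3, T))
zeta = rng.standard_normal((3, T))              # zeta_i(t) ~ N(0, 1)
for u in range(1, T):
    a[0, u] = 0.7 * a[0, u - 1] + zeta[0, u]
    a[1, u] = 0.5 * a[0, u - 1] + 0.7 * a[1, u - 1] + zeta[1, u]
    a[2, u] = 0.3 * a[0, u - 1] + 0.7 * a[2, u - 1] + zeta[2, u]

s = det + a     # source amplitude time series for one simulated trial
```

Because a2 and a3 each depend on lagged a1, a model that frees only the corresponding filter coefficients (and fixes the rest to zero) is correct for these data.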
With these dipole amplitudes, MEG data were simulated for a whole head 61-sensor array in accordance with (1), the components of ñ(t) each satisfying the auto-regression process ñ(t) = 0.7 ñ(t − 1) + ε̃(t), where ε̃(t) ∼ N(0, σ²_ñ). The matrix function A was obtained from [18]. In all simulations σ²_ñ was twice as large as the largest of the noiseless sensor signal variances, i.e., the signal to noise ratio (SNR) was 1:2. For each trial T = 128 samples were generated. The generated data were (fast Fourier) transformed into the frequency domain, and the mean ẏ_k and sample cross-spectra R̂_k for the first five frequency components were calculated as indicated previously.

TABLE I
COVERAGE RATES OF THE 95% CONFIDENCE INTERVALS

L     θ     α     Φ_k    B_k    σ²_k
100   93.7  95.6  92.2   94.4   94.2
200   94.9  95.5  93.9   94.5   86.2
400   95.1  95.8  94.9   95.1   68.0

Percentage of simulations of which 95% confidence intervals contained the true parameter values when the correct model was fitted. Percentages are computed as the proportion of 300 simulations in each case.

Fig. 2. Percentage of simulations in which the fitted model was accepted as indicated by the significance of the GLRT. This should be in 95% of the simulations in case of the correct model (continuous line), and in as few of the simulations as possible in case of an incorrect model with too few parameters (dashed line).
The simulations were carried out with L = 100, 200 and 400 trials. Two models were fitted: one in which only the filter coefficients from dipole 1 to dipole 2 (β₁₂) and from dipole 1 to dipole 3 (β₁₃) were freely estimated while the others were forced to zero (it can be shown that this is a correct model for the simulated data), and one model in which no interactions were allowed (which is an incorrect model for the simulated data). In all simulations Θ_k = σ²_k I was used, in accordance with the simulated data. The purpose of fitting the incorrect model was to assess the adequacy and usefulness of the Bartlett corrected GLRT in rejecting an incorrect a priori hypothesized model, while retaining a correct a priori hypothesized model.
In Table I coverage rates for different kinds of parameters are
presented. These coverage rates represent in a condensed form
the accuracy of the estimators themselves, and the quality with
which the confidence intervals are constructed. This is achieved
by giving the percentage of simulations in which the true param-
eters were contained in the 95% confidence intervals constructed
in each simulation. To illustrate, Fig. 1 depicts the reconstructed
source amplitude of the first source (location (0.0, 0.5, 0.75)) in
each simulation with 400 trials in which the correct model was
fitted. Furthermore it depicts the estimated source locations of
all sources in each simulation.
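The coverage-rate computation described above can be sketched as follows, assuming Wald-type intervals of the form estimate ± z·SE (the helper name and interface are ours, for illustration only):

```python
import numpy as np

def coverage_rate(estimates, std_errors, true_value, z=1.96):
    """Fraction of simulations whose confidence interval
    [estimate - z*SE, estimate + z*SE] contains the true parameter value.
    z = 1.96 gives a nominal 95% interval."""
    lo = estimates - z * std_errors
    hi = estimates + z * std_errors
    return np.mean((lo <= true_value) & (true_value <= hi))
```

For unbiased, normally distributed estimates with correct standard errors, this fraction should be close to the nominal 95% level, which is what Table I checks.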
As can be seen from Table I, the coverage rates of the con-
fidence intervals are rather close to their theoretical expected
level of 95%—even for relatively small numbers of trials (i.e.
L = 100). The latter is somewhat surprising because the theory
was developed on the assumption of large numbers of trials.
The departure from the theoretical value of the coverage rates of $\sigma^2_k$ was anticipated from the results reported in [1]. The remarkable feature of these coverage rates is that they are near perfect when few trials are available, and the departure increases as the number of trials increases. This seemingly paradoxical result is due to a slight bias of the estimators in combination with oversized confidence intervals for relatively few trials (L = 100). At L = 400 the coverage rate of these parameters is about 68%, which falls neatly in between the rates reported in [1] for these estimators under SNRs of 1:1 and 1:5 and the same number of trials. Apparently, as the signal to noise ratio decreases, the bias increases, since at L = 400 the estimated standard errors were in fact quite good.
The acceptance rate of the GLRT is graphed in Fig. 2. As can be seen from the figure, the GLRT rejected both models when the trial count was low (L = 100), indicating that the asymptotic approximation is inadequate with low trial counts. Acceptance of the correct model was near the nominal rate of 95% for moderate L = 200 (89%) and at the nominal rate for relatively large numbers of trials, L = 400 (96%), indicating adequate approximation of the statistic by the asymptotic distribution in these cases.⁴ The incorrect model was accepted too often with moderate trial counts (19%), indicating that the GLRT was too insensitive to modelling errors in such cases. For relatively large numbers of trials the GLRT therefore seems to be helpful in detecting interactions (but see the discussion below).
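The accept/reject decision used here amounts to comparing the (Bartlett-corrected) GLRT statistic against its asymptotic χ² distribution. A minimal sketch, in which the degrees of freedom and the correction factor are user-supplied placeholders rather than values taken from the paper:

```python
from scipy.stats import chi2

def glrt_accept(lr_statistic, df, bartlett_factor=1.0, alpha=0.05):
    """Accept the hypothesized model if the (optionally Bartlett-corrected)
    GLRT statistic is not significant at level alpha under its asymptotic
    chi-square distribution with df degrees of freedom. bartlett_factor is
    a placeholder for the small-sample correction discussed in the text."""
    corrected = lr_statistic * bartlett_factor
    p_value = chi2.sf(corrected, df)      # survival function: P(X > corrected)
    return p_value >= alpha, p_value
```

With a correct model and adequate L, this decision should accept in roughly 95% of simulations, which is what Fig. 2 tracks.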
VI CONCLUDING REMARKS
We have formulated a framework for modelling coherence be-
tween sources in terms of linear transfer functions. This frame-
work has its roots in the techniques known in the statistical liter-
ature as structural equation modelling (SEM) [11], confirmatory
factor analysis [40], frequency domain dynamic factor analysis
[41] and simultaneous equations [42]. For the latter, frequency
domain-like variants were proposed in [43].
We have given closed form expressions for estimators of the
separable parameters and an expression for the concentrated
negative log-likelihood, which greatly simplify the numerical optimization procedure. The expressions obtained are very similar to the standard expressions found in the signal processing literature on SML DOA estimation, and extend the SML methods with the inclusion of the mean and a more general noise covariance matrix that may depend on unknown parameters. The results of the simulations show that parameter standard errors are reliably constructed, regardless of the number of trials. Furthermore, the results indicated that the GLRT statistic can be indicative of the presence of interactions between sources, provided that enough trials are available (L ≥ 200).
In [1] we also considered least squares techniques. However, the generalized least squares method, although known to have the same asymptotic statistical properties as ML estimators [44], yielded biased estimates of source coherence in finite samples, and was therefore not considered here.
Frequency domain dipole modelling of EEG/MEG data has
been pursued before in [45],[21],[22], while the asymptotic sta-
tistical independence of Fourier coefficients has been exploited
in the context of general EEG signal analysis in e.g. [15], and
[23]. The method discussed here and in [1], can be considered
as extensions to the methods in these references.
Other approaches that use dipole localization techniques from
the outset to study cortical synchrony have been presented in
⁴ This was confirmed by a Kolmogorov–Smirnov test on the distribution of the statistic.
[7], [8] and [9]. In [7] synthetic aperture magnetometry (SAM)
is used to derive time series of activity in regions of interest,
that are then subjected to phase analysis. In [8] a beam form-
ing technique is used that searches for sources with maximum
coherence. In [9] an interesting adaptation of iteratively re-
fined minimum norm estimation [46] is presented which uses
a bootstrapping technique on surrogate data. As argued in [9],
problems with the first two approaches are that the linearly con-
strained minimum variance beam formers were developed under
the assumption of incoherent sources, and their performance is
known to deteriorate with coherent sources. Furthermore the
method in [8] only finds coherent sources, while neurophysio-
logical research indicates that desynchronization of sources may
play an important role in several cognitive processes [6], [4], [5]
(cf. [9]). A similar argument would hold against the use of MU-
SIC for estimating coherence between sources [9], [47]. The
minimum norm estimate is known to suffer from bias in its lo-
cation estimates but was improved with bootstrapping methods
[9]. Once the regions of activity have been localized in this man-
ner, these authors suggest to perform a phase analysis on recon-
structed time series [7]. Although this method seems promising,
it is as yet difficult to see a principled framework in which the
adequateness of the resulting source model can be assessed. In
contrast, maximum likelihood estimation directly provides mea-
sures to assess modelling adequateness in the form of the GLRT
statistic.
As an alternative to SML estimation, subspace fitting methods
(SF), in which A(θ) is fitted by least squares to subspace vec-
tors obtained from e.g. principal components analysis (PCA) or
independent components analysis (ICA [48]) have been investi-
gated [29]. SF methods are corrected versions of methods that
fit individual columns of A(θ) to individual subspace compo-
nents as is done in [49] (PCA) and [16], [10] (ICA), which are
known to be suboptimal [50], [51]. Weighted SF (WSF), which
is based on PCA, was shown to yield asymptotically efficient
estimates of θ in [29]. Furthermore WSF and SML were shown
to be asymptotically robust against violations of distribution as-
sumptions of the source signals. Currently, general (asymp-
totic) distributional properties of other subspace estimates, e.g.
obtained from ICA, are unknown, and therefore it is unclear
whether such estimates are efficient. As indicated earlier, we
also investigated generalized least squares estimation of cross-
spectrum structures, which is also known to be asymptotically
efficient [44], [1], and concluded that it yields strongly biased
coherence estimates in finite samples—this, in contrast with the
SML estimates. We plan to investigate this issue for the simpler (W)SF estimates in future work.
With respect to the GLRT statistic a word of caution is in order. As indicated in [1], the GLRT statistic is distributed as $\chi^2$ only asymptotically, that is, for large L. At the same time, as L grows larger, the sensitivity to modelling error increases, and the test is likely to become significant because of the necessary approximations in the head model, the source model (dipole approximation to extended sources) and the noise model. Therefore the GLRT may not be very appropriate as a rigid rejection criterion, and it has been recommended to use it more as a descriptive index of overall fit than as a statistical test [52]. A large number of alternative measures have been presented in the literature, an overview of which may be found in [11]. In [31] a
number of fit indices for selecting the number of dipole sources
have been assessed, both with respect to certain theoretical re-
quirements, as well as in numerical experiments with dipole lo-
calization with MEG. It was found that information theoretic
criteria on the one hand, and Wald tests on source amplitudes
on the other hand, were quite effective under various circum-
stances. In the current setup, if the mean is modelled, then the confidence intervals of the $\alpha_k$ parameters are akin to the Wald amplitude test discussed in [31].
The difference between estimates of unparameterized Ψ and
estimates of Ψ parameterized by B and Φ is precisely the dis-
tinction made in the neuroimaging community between “func-
tional” and “effective” connectivity [4]. However, it should be
emphasized that in modelling the coherence between sources,
several equivalent models may exist, which can have very dif-
ferent neurophysiological interpretations. For example in the
case of two sources, the interaction may be modelled as the first
source being input to the second, or vice versa. Both models
would fit equally well, so no distinction can be made on the basis
of the fit. Therefore, in applications a priori information should
be available on which interaction patterns are considered to be
more valid than other, mathematically equivalent ones [11].
ACKNOWLEDGMENTS
The Netherlands Organization for Scientific Research (NWO)
is gratefully acknowledged for funding this project. This re-
search was conducted while R. Grasman (527-25-014), L. Waldorp (527-25-013), and K. Böcker (527-25-015) were supported
by a grant of the NWO foundation for Behavioral and Educa-
tional Sciences of this organization awarded to H.M. Huizenga,
P.C.M. Molenaar, L.J. Kenemans, and J.C. de Munck.
We are indebted to the anonymous reviewers whose constructive comments helped us improve a first version of this paper. We also thank Dr. Conor V. Dolan for proofreading.
APPENDIX
Equation (15). Assume $\Psi^{-1}$ and $\Theta^{-1}$ exist, and that A has full column rank. From the relation $(A + CBD)^{-1} = A^{-1} - A^{-1}C(B^{-1} + DA^{-1}C)^{-1}DA^{-1}$ [30, p. 9], we have
$$R^{-1} = (A\Psi A^* + \Theta)^{-1} = \Theta^{-1} - \Theta^{-1}A[\Psi^{-1} + \Gamma]^{-1}A^*\Theta^{-1},$$
where $\Gamma = A^*\Theta^{-1}A$. From this, and from $(I + B)^{-1} = I - (B^{-1} + I)^{-1}$, we find that
$$A^*R^{-1} = (I - \Gamma[\Psi^{-1} + \Gamma]^{-1})A^*\Theta^{-1} = (I - [\Psi^{-1}\Gamma^{-1} + I]^{-1})A^*\Theta^{-1} = [I + \Gamma\Psi]^{-1}A^*\Theta^{-1} = [\Gamma^{-1} + \Psi]^{-1}\Gamma^{-1}A^*\Theta^{-1}.$$
Therefore, post-multiplying $A^*R^{-1}$ by A we find $A^*R^{-1}A = [\Gamma^{-1} + \Psi]^{-1}$, which yields
$$(A^*R^{-1}A)^{-1}A^*R^{-1} = (A^*\Theta^{-1}A)^{-1}A^*\Theta^{-1}.$$
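The final identity can be checked numerically. The sketch below draws random complex matrices (the dimensions m = 6 sensors and q = 3 sources are arbitrary choices of ours) and verifies that $(A^*R^{-1}A)^{-1}A^*R^{-1} = (A^*\Theta^{-1}A)^{-1}A^*\Theta^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 6, 3  # sensors, sources (arbitrary sizes for this check)

# random full-column-rank A, Hermitian positive definite Psi and Theta
A = rng.standard_normal((m, q)) + 1j * rng.standard_normal((m, q))
M = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))
Psi = M @ M.conj().T + q * np.eye(q)
N = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Theta = N @ N.conj().T + m * np.eye(m)

R = A @ Psi @ A.conj().T + Theta
# lhs = (A* R^-1 A)^-1 A* R^-1,  rhs = (A* Theta^-1 A)^-1 A* Theta^-1
lhs = np.linalg.solve(A.conj().T @ np.linalg.solve(R, A),
                      A.conj().T @ np.linalg.inv(R))
rhs = np.linalg.solve(A.conj().T @ np.linalg.solve(Theta, A),
                      A.conj().T @ np.linalg.inv(Theta))
assert np.allclose(lhs, rhs)
```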
REFERENCES
[1] R. P. P. P. Grasman, H. M. Huizenga, L. J. Waldorp, K. B. E. Böcker, and
P. C. M. Molenaar. Frequency domain simultaneous source and source co-
herence estimation with an application to MEG. IEEE Trans. on Biomed-
ical Engineering, in press.
[2] T. J. Sejnowski and P. Smith Churchland. Brain and cognition. In
Michael I. Posner, editor, Foundations of cognitive science, chapter 8,
pages 301–358. MIT Press, Cambridge, Mass., 1989.
[3] M. Hämäläinen, Rita Hari, Risto J. Ilmoniemi, J. Knuutila, and O.V.
Lounasmaa. Magnetoencephalography – theory, instrumentation, and ap-
plications to noninvasive studies of the working human brain. Reviews of
Modern Physics, 65:413–497, 1993.
[4] F. Varela, J. P. Lachaux, E. Rodriguez, and J. Martinerie. The brainweb:
phase synchronization and large-scale integration. Nature reviews: Neu-
roscience, 2(4):229–239, Apr. 2001.
[5] S. L. Bressler. Event-related potentials. In M. A. Arbib, editor, The Hand-
book of Brain Theory and Neural Networks, pages 412–415. MIT Press,
Cambridge, MA, 2002.
[6] G. Pfurtscheller and F. H. Lopes da Silva. Event-related EEG/MEG syn-
chronization and desynchronization: basic principles. Clinical Neurophys-
iology, 110(11):1842–1857, 1999.
[7] T. Holroyd, M. Nielsen, S. Miyauchi, and T. Yanagida. Broad-band mag-
netic brain activity during rhythmic tapping tasks. In J. Nenonen, R. J.
Ilmoniemi, and T. Katila, editors, BioMag2000. Proceedings of 12th Int.
Conf. Biomagnetism, pages 307–310, Espoo, Finland, Aug 2000. Helsinki
University of Technology.
[8] J. Gross, J. Kujala, M. Hämäläinen, L. Timmermann, A. Schnitzler, and
R. Salmelin. Dynamic imaging of coherent sources: Studying neural in-
teractions in the human brain. Proc. Nat. Ac. Sci. USA, 98(2):694–699,
2001.
[9] O. David, L. Garnero, D. Cosmelli, and F. J. Varela. Estimation of neural
dynamics from MEG/EEG cortical current density maps: application to
the reconstruction of large-scale cortical synchrony. IEEE Trans BME,
49(9):975–987, Sep 2002.
[10] A. Delorme, S. Makeig, M. Fabre-Thorpe, and T. Sejnowski. From single-
trial EEG to brain area dynamics. Neurocomputing, 44–46:1057–1064,
2002.
[11] K. A. Bollen. Structural equations with latent variables. Wiley series in
probability and mathematical statistics. Wiley, New York, USA, 1st edi-
tion, 1989.
[12] A. Paulraj, B. Ottersten, R. Roy, A. Swindlehurst, G. Xu, and T. Kailath.
Subspace methods for directions-of-arrival estimation. In N. K. Bose and
C. R. Rao, editors, Handbook of Statistics, chapter 16, pages 639–739.
Elsevier Science Publishers B.V., Amsterdam, Netherlands, 1993.
[13] H. Krim and M. Viberg. Two decades of array signal processing research.
the parametric approach. IEEE Signal Processing Mag., 13(4):67–95, July
1996.
[14] P. Stoica, B. Ottersten, M. Viberg, and R. Moses. Maximum likelihood
array processing for stochastic coherent sources. IEEE Trans. on Signal
Processing, 44(1):96–105, January 1996.
[15] D. T. Pham, J. Möcks, W. Köhler, and T. Gasser. Variable latencies of
noisy signals: Estimation and testing in brain potential data. Biometrika,
74(3):525–533, 1987.
[16] S. Makeig, M. Westerfield, T.-P. Jung, S. Enghoff, J. Townsend, E. Courch-
esne, and T. J. Sejnowski. Dynamic Brain Sources of Visual Evoked Re-
sponses. Science, 295(5555):690–694, 2002.
[17] F. Bijma, J. C. de Munck, H. M. Huizenga, and R. M. Heethaar. A
mathematical approach to the temporal stationarity of background noise
in MEG/EEG measurements. NeuroImage, 20:233–243, 2003.
[18] J. Sarvas. Basic mathematical and electromagnetic concepts of the bio-
magnetic inverse problems. Phys. Med. Biol., 32:11–22, 1987.
[19] K. J. Friston, C. Buechel, G. R. Fink, J. Morris, E. Rolls, and R. J. Dolan.
Psychophysiological and modulatory interactions in neuroimaging. Neu-
roimage, 6:218–229, 1997.
[20] D. R. Brillinger. Time Series: data analysis and theory. International
series in decision processes. Holt, Reinhart and Winston Inc., New York,
1975.
[21] J. Raz, B. Turetsky, and G. Fein. Frequency-domain estimation of the
parameters of human brain electrical dipoles. Journal of the American
Statistical Association, 87(417):69–77, 1992.
[22] J. Raz, C. A. Biggins, B. Turetsky, and G. Fein. Frequency-domain dipole
localization - extensions of the method and applications to auditory and
visual-evoked potentials. IEEE Transactions on Biomedical Engineering,
40(9):909–918, 1995.
[23] J. Raz, V. Cardenas, and D. Fletcher. Frequency-domain estimation of
covariate effects in multichannel brain evoked-potential data. Biometrics,
51(2):448–460, 1995.
[24] K. G. Jöreskog. Analysis of covariance structures. Scandinavian Journal
of Statistics, 8:65–92, 1981.
[25] J. C. de Munck, P. C. M. Vijn, and F. H. Lopes da Silva. A random dipole
model for spontaneous brain activity. IEEE Transactions on Biomedical
Engineering, 39(8):986–990, Aug. 1992.
[26] T. W. Anderson. An introduction to multivariate statistical analysis. Wiley,
New York, 1971.
[27] M. W. Browne and S. H. C. du Toit. Automated fitting of nonstandard
models. Multivariate Behavioral Research, 27:269–300, 1992.
[28] P. Stoica, B. Ottersten, and M. Viberg. Optimal array signal processing in
the presence of coherent wavefronts. volume 5 of ICASSP, pages 2904–
2907, New York, NY, USA, 1996. IEEE.
[29] B. Ottersten, M. Viberg, and T. Kailath. Analysis of subspace fitting and ML techniques for parameter estimation from sensor array data. IEEE Trans. Signal Processing, 40(3):590–600, March 1992.
[30] J. R. Schott. Matrix analysis for statistics. Wiley Series In Probability
And Statistics. John Wiley & Sons, Inc., New York, 1997.
[31] L. J. Waldorp, H. M. Huizenga, R. P. P. P. Grasman, K. B. E. Böcker,
J. C. de Munck, and P. C. M. Molenaar. Model selection in electromag-
netic source analysis with an application to VEF’s. IEEE Transactions on
Biomedical Engineering, 49(10):1121–1129, 2002.
[32] M. Wax and T. Kailath. Detection of signals by information theoretic
criteria. IEEE Trans ASSP, 33(2):387–392, Apr. 1985.
[33] M. S. Bartlett. A note on the multiplying factors for various χ² approximations. J. Roy. Stat. Soc., 16:296–298, 1954.
[34] D. F. Morrison. Multivariate statistical methods. McGraw-Hill, New York,
2d edition, 1989.
[35] L. J. Waldorp, H. M. Huizenga, C. V. Dolan, and P. C. M. Molenaar. Esti-
mated generalized least squares electromagnetic source analysis based on
a parametric noise covariance model. IEEE Transactions on Biomedical
Engineering, 48:737–741, 2001.
[36] H. M. Huizenga, D. J. Heslenfeld, and P. C. M. Molenaar. Optimal mea-
surement conditions for spatiotemporal EEG/MEG source analysis. Psy-
chometrika, 67(2):299–313, Jun 2002.
[37] S. D. Silvey. Statistical inference. Penguin, Harmondsworth, 1970.
[38] G. A. F. Seber and C. J. Wild. Nonlinear Regression. Wiley series in
Probability and Mathematical Statistics, Applied probability and statistics.
Wiley, New York, 1989.
[39] P. E. Gill, M. H. Wright, and W. Murray. Nonlinear Programming. Stan-
ford University Press, Stanford, 1986.
[40] K. G. Jöreskog. A general approach to confirmatory maximum likelihood
factor analysis. Psychometrika, 34:182–202, 1969.
[41] P. C. M. Molenaar. Dynamic factor analysis of psychophysiological sig-
nals. In J.R. Jennings, P. Ackles, and M.G.H. Coles, editors, Advances
in Psychophysiology, volume 5 of Advances in Psychophysiology, pages
229–302. Jessica Kingsley Publishers, London, 1993.
[42] T. Amemiya. Advanced Econometrics. Harvard University Press, Cam-
bridge MA, 1986.
[43] D. R. Brillinger and M. Hatanaka. An harmonic analysis of nonstation-
ary multivariate economic processes. Econometrica, 37(1):131–141, Jan.
1969.
[44] M. W. Browne. Generalized least squares estimators in the analysis of
covariance structures. South African Statistical Journal, 8:1–24, 1974.
[45] B. Lütkenhöner. Frequency-domain localization of intracerebral dipolar sources. Electroencephalography and clinical Neurophysiology,
82(2):112–118, 1992.
[46] R. Srebro. Iterative refinement of the minimum norm solution of the bio-
electric inverse problem. IEEE Trans BME, 43(5):547–552, May 1996.
[47] S. Supek and C. J. Aine. Spatio-temporal modeling of neuromagnetic
data: II. multi-source resolvability of a MUSIC-based location estimator.
Human Brain Mapping, 5(3):154–167, 1997.
[48] A. Hyvärinen and E. Oja. Independent component analysis: algorithms
and applications. Neural Networks, 13:411–430, 2000.
[49] J. Maier, G. Dagneli, H. Spekreijse, and B. W. van Dijk. Principal com-
ponents analysis for source localization of VEPs in man. Vision Research,
27:165–177, 1987.
[50] A. Achim, F. Richer, and J. M. Saint-Hilaire. Methods for separating
temporally overlapping sources of neuroelectric data. Brain Topography,
1(1):22–28, 1988.
[51] J. C. de Munck. The estimation of time varying dipoles on the basis of
evoked potentials. Electroenceph. and clin. Neurophysiol., 77:156–160,
1990.
[52] K. G. Jöreskog and D. Sörbom. LISREL 7, a guide to the program and applications. Jöreskog and Sörbom/SPSS Inc., Chicago, Illinois, 2nd edition, 1989.
Raoul Grasman was born in 1973. He received
a degree in artificial intelligence (cum laude) and
the master's degree in experimental psychology (cum
laude) from the University of Amsterdam in 1997 and
1998, respectively. His research interests concern the
methodology of cognitive neuroscience and experi-
mental psychology research, and multivariate signal
processing. He is currently working towards his Ph.D.
at the University of Amsterdam.
Hilde Huizenga was born in 1965. She received
the MA degree in psychology from the University of
Groningen in 1990, and the Ph.D. degree (cum laude)
in psychology from the University of Amsterdam in
1995. From 1996 to 2001 she was a postdoctoral fellow; currently she is an associate professor at the University of Amsterdam. Her main research interest is the statistical analysis of neuroscientific data, in particular nonlinear regression and covariance structure analysis of
EEG/MEG sources and their interactions.
Lourens Waldorp was born in 1971. He received his
master's degree in methodological psychology in 1998
from the University of Amsterdam. His research inter-
ests include statistical analysis in psychophysiologi-
cal experiments and signal processing. He is currently
working towards a Ph.D. at the University of Amster-
dam.
Peter Molenaar was born in 1946. His doctoral dis-
sertation was about multidimensional signal analysis.
His current research interests include signal analysis
and applied nonlinear dynamics. He is research direc-
tor of several programs and department head.
Koen B.E. Böcker was born in 1966. He received his master's degree in Physiological Psychology and
his Ph.D. degree (cum laude) from Tilburg University,
in 1989 and 1994, respectively. Since 1989 he has held research positions at several universities. Currently, he is an assistant professor at Utrecht
University. His research interests include the study
of perception and cognition (attention, inhibition and
emotion) and the application of source analysis in cog-
nitive neuroscience.