1188 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 12, DECEMBER 2000
Modeling the Haemodynamic Response in fMRI
Using Smooth FIR Filters
Cyril Goutte*, Finn Årup Nielsen, and Lars Kai Hansen, Member, IEEE
Abstract—Modeling the haemodynamic response in functional
magnetic resonance (fMRI) experiments is an important aspect of
the analysis of functional neuroimages. This has been done in the
past using parametric response functions from a limited family.
In this contribution, we adopt a semi-parametric approach based
on finite impulse response (FIR) filters. In order to cope with
the increase in the number of degrees of freedom, we introduce
a Gaussian process prior on the filter parameters. We show
how to carry out the analysis by incorporating prior knowledge
on the filters, optimizing hyper-parameters using the evidence
framework, or sampling using a Markov Chain Monte Carlo
(MCMC) approach. We present a comparison of our model with
standard haemodynamic response kernels on simulated data, and
perform a full analysis of data acquired during an experiment
involving visual stimulation.
Index Terms—Evidence, FIR filters, fMRI, haemodynamic re-
sponse, Markov Chain Monte Carlo, neuroimaging, smoothness
prior, Tikhonov regularization.
I. INTRODUCTION
MODELING the haemodynamic response is important for
several reasons. First, an appropriate modeling leads to
better statistical maps. With the increased temporal resolution
of functional magnetic resonance (fMRI) images [as compared
to positron emission tomography (PET)], a binary baseline-ac-
tivation description of the data is insufficient and it is necessary
to take into account the temporal pattern of activation due to the
haemodynamic response to the activation. A second reason is
the possibility of performing simulations with the model. The
predicted behavior obtained from simulation can be used to for-
mulate more explicit hypotheses about the fMRI signal, and pos-
sibly optimize the design and acquisition [1]. A last reason is the
possibility, for some models, to give a physiological interpreta-
tion of the model parameters and, thus, better understand the
neurophysiology [2], [3].
Manuscript received April 10, 2000; revised August 21, 2000. This work
was supported by the EU through a BIOMED II Grant BMH4-CT97-2775, the
Human Brain Project P20 MH57180 and the Danish Research Councils through
the Danish Computational Neural Network Center (CONNECT) and the THOR
Center for Neuroinformatics. The Associate Editor responsible for coordinating
the review of this paper and recommending its publication was X. Hu. Asterisk
indicates corresponding author.
*C. Goutte was with the Department of Mathematical Modeling, Technical
University of Denmark, DK-2800 Lyngby, Denmark. He is now with INRIA
Rhone-Alpes, Zirst Montbonnot - 655 avenue de l’Europe F-38334 Saint Ismier
Cedex France (e-mail: cyril.goutte@inrialpes.fr).
F. Å. Nielsen and L. K. Hansen are with the Department of Mathematical
Modeling, Building 321, Technical University of Denmark, DK-2800 Lyngby,
Denmark.
Publisher Item Identifier S 0278-0062(00)10618-4.
The haemodynamic response is usually, as a first approxima-
tion, modeled as a convolution of the experimental paradigm by
a linear filter, and implemented as a linear time-invariant (LTI)
system. This assumption is usually justified by the observation
of additivity in the fMRI signal [4], which is consistent with
the linear hypothesis. Although several groups have since re-
ported small to strong departures from linearity in a number of
contexts [5]–[8], it is still believed that the linearity assumption
holds in a wide range of experimental conditions [8]. The LTI
approach has been pursued using several types of parametric
models of the filter, for example using a Poisson filter [9], a
Gamma filter [4], [10], a Gaussian filter [11], or a simple delay
[12]. In addition, a number of investigators have used linear fil-
ters to model the haemodynamic response, but as they set, rather
than fit, the parameters, they are somewhat out of the scope of this
study (see, e.g., [6], [13]). In these models, the few parameters
have a specific interpretation, measuring, e.g., delay, strength of
activation, etc.
In this contribution, we use a different standpoint, where we do
not impose a specific shape on the linear filter coefficients. The
haemodynamic response is modeled as an FIR function, a partic-
ular case of autoregressive with exogenous input (ARX) model.
This approach has been pioneered by [14]. Though it is undoubt-
edly parametric in the sense that it fits a number of parameters,
these do not really have a physical or physiological meaning. We
will, therefore, refer to it as a semi-parametric modeling approach.
This approach is much more flexible than the use of a parametric
filter shape. In particular, it can reliably model the early decrease
in signal (initial dip, see, e.g., [15], [16]) or the post-activation
undershoot [17], whereas, e.g., the Poisson, Gamma or Gaussian
filters are intrinsically unable to do so.
As the number of parameters increases, there is a risk that
the model will overfit or that parameters become ill determined.
We deal with this problem by placing a Gaussian Process prior
on the filter coefficients, forcing the filter to be smooth. The
resulting model is determined by the data and three hyper-pa-
rameters which can be set beforehand or again fitted on the data
using a probabilistic argument.
In the following sections, we describe the basic theory for
smooth FIR filters. We discuss a number of topics like boundary
conditions and link to traditional Tikhonov regularization. We
then go on to describe how to use a Bayesian argument to esti-
mate the hyper-parameters, either using the evidence argument
or by integrating over nuisance parameters using Markov Chain
Monte Carlo (MCMC) methods.
We illustrate the workings of this filter using several exper-
iments. We first show how the model can implement some of
the LTI models currently in use in the fMRI literature. We show
0278–0062/00$10.00 © 2000 IEEE
Authorized licensed use limited to: Danmarks Tekniske Informationscenter. Downloaded on November 30, 2009 at 10:29 from IEEE Xplore. Restrictions apply.
that the smooth FIR filter is able to implement additional fea-
tures that these classical models cannot, for example a post-ac-
tivation undershoot. We then perform a full analysis of fMRI
data acquired during a visual stimulation experiment. In partic-
ular, we show how to derive from the resulting filter measures of
support (P-values) for the null hypothesis of no activation, and
meaningful physiological information like the strength or delay
in activation.
II. DATA
The dataset was acquired at Hvidovre Hospital on a 1.5-T
Magnetom Vision MR scanner by Egill Rostrup. The scanning
sequence was a 2-D gradient echo EPI (T2 weighted) with 66
ms echo time and 50 degrees RF flip angle. The images were ac-
quired with a matrix of 128 × 128 pixels, with a FOV of 230 mm,
and 10-mm slice thickness, in a para-axial orientation parallel to
the calcarine sulcus. The region of interest (ROI) will be limited
to a 68 × 82 two-dimensional (2-D) voxel map. The voxel di-
mension is 1.8 × 1.8 × 10 mm.
The visual paradigm consists of a rest period of 20 s of dark-
ness using a light fixation dot, followed by 10 s of full-field
checker board reversing at 8 Hz, and ending with 20 s of rest
(darkness). In total, 150 images were acquired in 50 s, corre-
sponding to a period of approximately 330 ms/image.
The experiment was repeated in ten separate runs containing
150 images each. In order to reduce saturation effects, the first
29 images were discarded, leaving 121 images for each run.
The datasets studied in this article were acquired on the same
subject, but during two separate scanning sessions (d3711 and
d3991), such that, e.g., the position and the shape of the slice
are slightly different. In each case, the dataset was built by com-
bining the ten runs into a single sequence of 1210 images. How-
ever, as the runs were acquired separately, it should be noted that
there cannot be any causality between the activation in one run
and the signal measured in the next. Note also that due to the
haemodynamic delay, the signal measured in activated voxels
will be roughly centered within the remaining 40 s of each run.
In the dataset we use in this article, the brain has first been
masked, and the data was preprocessed using the run-based de-
trending described by [18].
III. THEORY OF SMOOTH FIR FILTERS
Let us consider a fMRI signal $y(t)$ acquired in a given voxel
using a stimulus $x(t)$. The image index $t$ runs between one and
$T$. The finite impulse response (FIR) filter of order $h$ models the
fMRI signal using $h$ linear coefficients $\beta_i$

$$y(t) = \sum_{i=1}^{h} \beta_i\, x(t-i+1) + \varepsilon(t) \qquad (1)$$

where $\mathbf{x}_t = [x(t), x(t-1), \ldots, x(t-h+1)]^\top$ is a vector of
past values of the stimulus.
Assuming independent additive zero mean Gaussian noise,
the likelihood of the model parameters becomes

$$p(\mathbf{y}\,|\,\boldsymbol\beta, \sigma^2) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y(t) - \boldsymbol\beta^\top \mathbf{x}_t)^2}{2\sigma^2}\right) \qquad (2)$$

$$= (2\pi\sigma^2)^{-T/2} \exp\left(-\frac{\|\mathbf{y} - X\boldsymbol\beta\|^2}{2\sigma^2}\right) \qquad (3)$$

where $\boldsymbol\beta = (\beta_1, \ldots, \beta_h)^\top$, $X$ is a $T \times h$ matrix
containing the (transposed) input vectors for all values¹ of $t$,
and $\mathbf{y} = (y(1), \ldots, y(T))^\top$ is the vector of measurements,
the target values for our filter. Maximizing the likelihood
with respect to $\boldsymbol\beta$ leads to the well-known maximum-likelihood
(ML) solution

$$\boldsymbol\beta_{\mathrm{ML}} = (X^\top X)^{-1} X^\top \mathbf{y}. \qquad (4)$$
When the ratio of the number of independent data points to the filter
order $h$ is small, the matrix $X^\top X$ tends to be badly conditioned
and the ML solution becomes unstable. It is necessary to regularize
the solution. Alternatively, in a Bayesian context we will
impose constraints on $\boldsymbol\beta$ by specifying a prior $P(\boldsymbol\beta)$. We will
focus on Gaussian priors, of the general form

$$P(\boldsymbol\beta) = (2\pi)^{-h/2}\, |R|^{1/2} \exp\left(-\tfrac{1}{2}\boldsymbol\beta^\top R\, \boldsymbol\beta\right) \qquad (5)$$

where $|R|$ indicates the determinant of a matrix. The posterior
distribution of $\boldsymbol\beta$, conditioned on the data and the hyper-parameters,
becomes

$$p(\boldsymbol\beta\,|\,\mathbf{y}, \sigma^2, R) \propto \exp\left(-\frac{\|\mathbf{y} - X\boldsymbol\beta\|^2}{2\sigma^2} - \frac{1}{2}\boldsymbol\beta^\top R\,\boldsymbol\beta\right) \qquad (6)$$

which is a multivariate Gaussian with precision matrix $A = \sigma^{-2} X^\top X + R$,
and is largest for the maximum a posteriori parameters

$$\boldsymbol\beta_{\mathrm{MAP}} = (X^\top X + \sigma^2 R)^{-1} X^\top \mathbf{y}. \qquad (7)$$

Note that this is also the ridge regression solution when $R$ is a
diagonal matrix with identical elements on the diagonal.
The matrix $R$ implements the constraints that we impose on
the model. Here, we want to obtain smooth filters, i.e., filters
such that neighboring parameters (e.g., $\beta_i$ and $\beta_{i+1}$) have similar
values. This corresponds to saying that neighboring filter parameters
should be somehow correlated. Accordingly, $R$ will be the
inverse of a covariance matrix $\Sigma$ where the covariance is a
decreasing function of the distance between two parameters

$$R = \Sigma^{-1} \quad \text{with} \quad \Sigma_{ij} = v \exp\left(-\eta\, (i-j)^2\right). \qquad (8)$$

In (8), the covariance decreases as a Gaussian parameterized by
$v$ and $\eta$, but any nonnegative decreasing function of the distance
$|i-j|$ could be used. This corresponds to putting a Gaussian
process prior on the filter parameters themselves, rather than on
the predictions [19], [20].
With this expression, the MAP estimate of $\boldsymbol\beta$ becomes
$\boldsymbol\beta_{\mathrm{MAP}} = (X^\top X + \sigma^2 \Sigma^{-1})^{-1} X^\top \mathbf{y}$, which can be efficiently
calculated as $\boldsymbol\beta_{\mathrm{MAP}} = (\Sigma X^\top X + \sigma^2 I)^{-1} \Sigma X^\top \mathbf{y}$, avoiding
the additional inversion of $\Sigma$. The resulting estimate
depends on three hyper-parameters: the noise level $\sigma^2$, the
strength of the prior $v$ and the smoothness factor $\eta$. We will
see below how it is possible to estimate the values of these
parameters using a Bayesian argument.
¹Values of $x(t)$, $t < 1$, can be treated in several ways. For a block design
involving baseline-activation-baseline patterns, they will naturally take the value
of the baseline. Alternatively, all $x(t)$, $t < 1$, can be treated as nuisance
parameters and integrated out of the model.
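As an illustration of the estimation step above, the MAP filter under the Gaussian-process smoothness prior can be sketched in a few lines of NumPy. The hyper-parameter names (v, eta, sigma2 for the prior strength, smoothness factor and noise level) follow our reconstruction of the paper's notation; stimulus values before the first image are padded with the baseline value (zero), as for a block design.

```python
import numpy as np

def smooth_fir_map(y, x, h, v, eta, sigma2):
    """MAP estimate of a smooth FIR filter (sketch).

    y: fMRI time series (length T); x: stimulus (length T).
    Prior covariance on the coefficients: Sigma_ij = v * exp(-eta*(i-j)^2).
    """
    T = len(y)
    # Lagged design matrix; stimulus values before t = 1 are taken as
    # baseline (zero), as for a block design.
    xpad = np.concatenate([np.zeros(h - 1), np.asarray(x, float)])
    X = np.stack([xpad[t:t + h][::-1] for t in range(T)])  # T x h
    # Gaussian-process prior covariance between filter coefficients.
    i = np.arange(h)
    Sigma = v * np.exp(-eta * (i[:, None] - i[None, :]) ** 2)
    # MAP solution written so that Sigma is never inverted:
    # beta = (Sigma X'X + sigma^2 I)^{-1} Sigma X' y
    return np.linalg.solve(Sigma @ X.T @ X + sigma2 * np.eye(h),
                           Sigma @ X.T @ y)
```

Making eta very large recovers ridge regression, while a small eta forces neighboring coefficients toward a common value.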
Fig. 1. Resulting filter using ridge regression (thin line) and the smooth FIR
filter approach (thick line). Top: characteristic length ℓ = 5 s; bottom: ℓ = 7.5 s.
From left to right: increasing levels of regularization. Data: one voxel from the
visual cortex (voxel 429) displaying a large activation.
The hyper-parameter $\eta$ controls the smoothness of the resulting
filter. For large values of $\eta$, $\Sigma_{ij}$ will go to zero very
fast for increasing values of $|i-j|$, such that there will be very
little correlation between parameters: the filter will be very
unsmooth. For $\eta \to \infty$, we recover the ridge regression solution.
For small values of $\eta$, the correlation $\exp(-\eta(i-j)^2)$ will stay
close to one for all $|i-j|$, indicating perfect correlation between the
filter parameters. The filter will be over-smooth. In the limit $\eta \to 0$
all parameters are identical and the filter performs a local averaging
of the stimulus $x$. It is useful to think of $\eta$ as corresponding to a
"characteristic length" of the filter, i.e., the typical length over which
the filter varies. The characteristic length can be defined here
as $\ell = \Delta t / \sqrt{2\eta}$, where $\Delta t$ is the sampling period.
This is quite useful in fMRI modeling because
it is widely believed on the basis of empirical studies [21], [3]
that the haemodynamic response has a characteristic length on
the scale of seconds, typically between 5 and 10 s. In a first
approximation it is then possible to use this prior information such
that $\ell$ corresponds to, e.g., 7 s. For an fMRI experiment where
$\Delta t \approx 0.33$ s, this corresponds to 21 filter parameters.
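Assuming the parameterization $\Sigma_{ij} = v\exp(-\eta(i-j)^2)$ with characteristic length $\ell = \Delta t/\sqrt{2\eta}$ (our reconstruction of the relation used above), converting a desired length in seconds into a smoothness factor is a one-liner:

```python
def eta_from_length(ell_seconds, dt_seconds):
    """Smoothness factor eta such that the characteristic length
    ell = dt / sqrt(2 * eta) matches the requested value (an assumed
    parameterization of the Gaussian covariance)."""
    return 0.5 * (dt_seconds / ell_seconds) ** 2

# A 7-s characteristic length at ~0.33 s/image spans about 21 filter taps.
taps = 7.0 / 0.33
```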
Fig. 1 displays a comparison of smooth filters with the result
of ridge regression. It is quite clear that ridge regression does not
yield smooth filters, and the fluctuation in the parameters is
substantial. It is possible to reduce this fluctuation by increasing the
amount of regularization (from left to right on Fig. 1), but this
also reduces the amplitude of the filter. This fluctuation means
that the filter contains high frequency components, which po-
tentially have a strong influence on summary statistics like the
maximum parameter or the delay, even though they seem to con-
tribute little to the modeling on average. The influence of the
smoothness factor is clear from the comparison between the top
and bottom row in the figure. The smooth filters in the bottom
row are clearly smoother, as expected from the larger characteristic
length (the filters obtained from ridge regression are obviously
identical).
Fig. 2. Comparison of the filters obtained by ridge regression, the smooth FIR
filter without boundary condition (thick solid) and the FIR filter with boundary
conditions (thick dashed). Notice how the endpoint goes (smoothly) to zero.
Data: one voxel from the visual cortex (voxel 429) displaying a large activation.
A. Boundary Conditions
It is apparent from Fig. 1 that the first and (especially) the
last filter parameters can take clearly positive or negative values.
This cannot be avoided using the above equations. However, by
causality, all $\beta_i$ for $i \le 0$ should be zero, as the influence of
the stimulus at time $t$ should be felt only at times $t' \ge t$. This
corresponds to saying that a hypothetical filter parameter $\beta_0$
should be equal to zero. According to our prior, this will have
a decreasing influence on $\beta_1$, $\beta_2$, etc., which will be forced (by
smoothness) to be close to zero. Similarly, it is sensible that the
influence of an activation should vanish in the past, hence, vanishing
filter parameters for large delays. This can again be
implemented by forcing an additional parameter $\beta_{h+1}$ to be zero.
In practice, we are still interested only in estimating the values
of the $h$ filter parameters $\beta_1 \ldots \beta_h$. This is done by defining
a $(h+2) \times (h+2)$ covariance matrix $\widetilde\Sigma$, such that
$\widetilde\Sigma_{ij} = v\exp(-\eta(i-j)^2)$, for $i, j = 0, \ldots, h+1$. The matrix
$R$ is then defined as the central part of $\widetilde\Sigma^{-1}$, i.e., taking away the
first and last rows and columns. This operation can be easily
defined mathematically by introducing the $(h+2) \times h$ matrix
$P$, constructed as the superposition of a row of zeros on
top, a $h \times h$ unit matrix in the central rows and a row of zeros
at the bottom. We then have $R = P^\top \widetilde\Sigma^{-1} P$.
Note that we cannot use the same trick as above to avoid
inverting the parameter covariance matrix. However, $\widetilde\Sigma$ is a band-diagonal
(Toeplitz) matrix, such that efficient methods exist to
perform an inversion in quadratic time instead of cubic for general
matrices. Furthermore, $h$ is usually quite small such that
inversion of a $(h+2) \times (h+2)$ matrix is quite fast.
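A sketch of one possible reading of this construction, with the Gaussian covariance form assumed as before: build the extended covariance over the padded index range, invert it, and keep the central block via the zero-padded selection matrix.

```python
import numpy as np

def boundary_precision(h, v, eta):
    """Prior precision R for a smooth FIR filter with zero boundary
    conditions: the central h x h block of the inverse of the extended
    (h+2) x (h+2) covariance over indices 0 .. h+1."""
    i = np.arange(h + 2)
    Sigma_ext = v * np.exp(-eta * (i[:, None] - i[None, :]) ** 2)
    # Selection matrix: a row of zeros, the h x h identity, a row of zeros.
    P = np.vstack([np.zeros(h), np.eye(h), np.zeros(h)])
    return P.T @ np.linalg.inv(Sigma_ext) @ P
```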
Fig. 2 shows the effect of the boundary conditions. The hyper-
parameters are set to the same values as the left-most bottom
plot on Fig. 1. The smooth FIR filter obtained above had clearly
negative values for the first delays as well as for the longer de-
lays (around 75). This effect disappears when boundary condi-
tions are used. In particular, for large delays, the coefficients go
smoothly to zero, as expected.
B. Link to Tikhonov Regularization
Regularization is often performed using Tikhonov regular-
ization, which imposes a constraint on derivatives of the target
function. In the context of this work, this would correspond
to imposing smoothness by constraining the derivatives of the
filter. The regularized solution is then obtained by minimizing
the penalized cost

$$E(\boldsymbol\beta) = \|\mathbf{y} - X\boldsymbol\beta\|^2 + \lambda \sum_i \left(\beta_i^{(r)}\right)^2 \qquad (9)$$

where $r$ is the order of the derivatives used for smoothing.
Of course the true derivatives are unknown, such that we typically
use instead the central differences approximation, where,
e.g., the first derivative (gradient) is approximated by the difference
between neighboring filter coefficients: $\beta_i^{(1)} \approx \beta_{i+1} - \beta_i$.
The regularized cost can then be formulated as

$$E(\boldsymbol\beta) = \|\mathbf{y} - X\boldsymbol\beta\|^2 + \lambda\, \boldsymbol\beta^\top R\, \boldsymbol\beta \qquad (10)$$

Note that we have kept the notation $R$ because equation (10) can
actually be obtained (up to an additive and a multiplicative
constant) as the negative logarithm of the product of the likelihood
(2) and the prior (5), i.e., as the log-posterior, with $\lambda R$ playing the
same role as the prior precision in (5). The expression of $R$ depends
on the derivative used. For $r = 1$ (gradient) and $r = 2$ (curvature),
we have $R = D_r^\top D_r$, with

$$D_1 = \begin{pmatrix} -1 & 1 & & \\ & -1 & 1 & \\ & & \ddots & \ddots \end{pmatrix} \quad \text{and} \quad D_2 = \begin{pmatrix} 1 & -2 & 1 & & \\ & 1 & -2 & 1 & \\ & & \ddots & \ddots & \ddots \end{pmatrix} \qquad (11)$$

Fig. 3. Comparison of the neighboring influences for ridge regression,
Tikhonov regularization on the gradient ["Tikhonov (1)"] and the curvature
["Tikhonov (2)"] and the smooth filter approach.
Tikhonov regularization is, thus, implemented by using a band-diagonal
regularization matrix $R$. However, whereas for smooth
FIR filters $R$ has nonzero elements on (almost) all diagonals,
the number of diagonals used by Tikhonov regularization
depends on the order of the derivative and the approximation
used. This means that whereas for smooth FIR filters the effect
of one given parameter is far reaching, it is really limited for
Tikhonov regularization (2 neighbors for $r = 2$ here).
By plotting one row (or one column) of the regularization ma-
trix, one can picture the influence of the values of neighboring
parameters on the solution. This is done on Fig. 3 for ridge re-
gression, Tikhonov regularization on the gradient and curva-
tures, and the smooth FIR filter. The influence of the smooth
filter can be seen here as a generalization of the Tikhonov ap-
proach of regularizing on the (approximate) amplitude of higher
order derivatives.
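The band structure underlying this comparison is easy to generate numerically. Below, the regularization matrix is formed as D'D from finite-difference operators, a standard construction for derivative penalties (the exact scaling used in the paper may differ).

```python
import numpy as np

def tikhonov_precision(h, order):
    """Regularization matrix R = D'D built from the finite-difference
    operator of the given order (1: gradient, 2: curvature)."""
    D = np.eye(h)
    for _ in range(order):
        D = np.diff(D, axis=0)  # each pass maps beta_i -> beta_{i+1} - beta_i
    return D.T @ D
```

Interior rows of the order-2 matrix carry the (1, -4, 6, -4, 1) stencil, so each coefficient interacts with only two neighbors on each side, in contrast with the long-range coupling of the smooth-filter prior.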
C. Error Bars
The posterior distribution of $\boldsymbol\beta$ makes it possible to estimate
the uncertainty of $\boldsymbol\beta_{\mathrm{MAP}}$. Note however that (6) is a multivariate
Fig. 4. fMRI time series (+) measured in one strongly activated voxel (voxel
429), with the activation estimated by the smooth FIR filter (solid) and the
prediction error bars on this activation, obtained from (12) (dotted). The error
bars appear slightly overestimated because the first run has a larger amplitude
than the last nine runs, thus inflating the apparent noise level.
Gaussian distribution with a general covariance matrix, such that
the individual components of $\boldsymbol\beta$ are correlated. In that context it
is not easy to represent the uncertainty graphically. Conditional
error bars obtained from $p(\beta_i\,|\,\boldsymbol\beta_{-i}, \mathbf{y})$, where $\boldsymbol\beta_{-i}$
contains all filter parameters except $\beta_i$, give a good idea of how
close the filter parameters should be to each other, but greatly
underestimate the possible range of variation of $\beta_i$. This range
is well estimated by the marginal error bars obtained from $p(\beta_i\,|\,\mathbf{y})$,
but these error bars overlook the fact that filter parameters are
very correlated with each other, such that it is impossible for
example that $\beta_i$ lies at the top of its marginal error bar while $\beta_{i+1}$
lies at the bottom.
The conditional error bars are easily obtained from the posterior
precision matrix $A = \sigma^{-2} X^\top X + R$: the conditional variance of
$\beta_i$ is $1/A_{ii}$, where $A_{ii}$ is the $i$th diagonal element of $A$, while
the marginal variances are the diagonal elements of $A^{-1}$.
It is especially interesting to find error bars on the resulting
predictions of the MAP filter, i.e., the estimated (de-noised) re-
sponse pattern. Using the Gaussian noise assumption, the posterior
for the prediction $y^*$ associated with an input $\mathbf{x}^*$ is also
Gaussian

$$p(y^*\,|\,\mathbf{x}^*, \mathbf{y}) = \mathcal{N}\!\left(\boldsymbol\beta_{\mathrm{MAP}}^\top \mathbf{x}^*,\; \sigma^2 + \mathbf{x}^{*\top} A^{-1} \mathbf{x}^*\right) \qquad (12)$$
This is illustrated for a particular time series (measured in the
visual cortex) in Fig. 4. We have plotted three runs out of 10, and
we see clearly that the data fits well inside the error bars. These
actually seem slightly overestimated, for two main reasons. First
the Gaussian assumption might be violated, though there is little
evidence on this data of outliers. Second, the impression is actu-
ally due to the fact that we represent only three of ten runs on the
figure. Over the 10 runs, 79 measurements exceed the interval
given by the estimate plus or minus 1.96 standard deviations.
This should be compared to an expected 5% of 1210, or 61.
The time series modeled by the smooth FIR filter shows a
clear post-activation undershoot, followed by an overshoot of
similar amplitude. It should be noted that this might be a pre-
processing artefact. Note also that (on this data at least) it is not
possible to observe an “initial dip.”
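The prediction bands of Fig. 4 can be sketched from the Gaussian posterior over the filter, with the design matrix X, prior covariance Sigma and noise level sigma2 as in the preceding sections; the 1.96 factor gives the nominal 95% interval.

```python
import numpy as np

def prediction_bands(X, y, Sigma, sigma2):
    """MAP prediction with +/- 1.96 std error bars for each row of X,
    from the Gaussian posterior over the filter coefficients."""
    A = X.T @ X / sigma2 + np.linalg.inv(Sigma)   # posterior precision
    beta = np.linalg.solve(A, X.T @ y / sigma2)   # posterior mean (MAP)
    mean = X @ beta
    # predictive variance: noise level plus parameter uncertainty
    var = sigma2 + np.einsum('ti,ij,tj->t', X, np.linalg.inv(A), X)
    band = 1.96 * np.sqrt(var)
    return mean, mean - band, mean + band
```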
D. Significance of Activation
In the context of functional neuroimaging, it is not sufficient
to estimate the haemodynamic response in each location of the
brain. One has to use the estimated model in the purpose of
finding regions that are activated by a given stimulus sequence.
This is traditionally done by testing the null hypothesis of no
activation using various statistical tests. In the context of this
study, the null hypothesis takes the form H$_0$: $\boldsymbol\beta = 0$, i.e.,
the filter parameters are identically equal to zero. The alternative
hypothesis is H$_1$: $\boldsymbol\beta \ne 0$. In a Bayesian context, this
problem is fundamentally ill posed. The posterior probability
for each hypothesis, $P(\mathrm{H}_0\,|\,\mathbf{y})$ and $P(\mathrm{H}_1\,|\,\mathbf{y})$, can, of course, be
estimated, but as H$_0$ corresponds to a single point in parameter
($\boldsymbol\beta$) space, the associated volume is zero, yielding $P(\mathrm{H}_0\,|\,\mathbf{y}) = 0$
and rejection of the null hypothesis in favor of the alternative
H$_1$.
In a Bayesian context, the comparison of a point hypothesis
with an interval hypothesis will, thus, usually lead to the adop-
tion of the latter. In order to derive a measure of support for our
point-null hypothesis in a Bayesian context, we will use the con-
cept of highest posterior density (HPD), described, e.g., by [22]
and used in a functional neuroimaging context by [12]. Given
a posterior density function $p(\boldsymbol\beta\,|\,\mathbf{y})$, the HPD region of content
$1-\alpha$ is the region $R_\alpha$ of parameter space such that [22, section
2.8]
1) $P(\boldsymbol\beta \in R_\alpha\,|\,\mathbf{y}) = 1 - \alpha$;
2) $p(\boldsymbol\beta_1\,|\,\mathbf{y}) \ge p(\boldsymbol\beta_2\,|\,\mathbf{y})$ for all $\boldsymbol\beta_1 \in R_\alpha$, $\boldsymbol\beta_2 \notin R_\alpha$.
For a given significance level $\alpha$, we can test whether H$_0$ lies
within the HPD region of content $1-\alpha$. If so, the null hypothesis
would be accepted at level $\alpha$. Otherwise H$_0$ would be rejected
and the voxel declared activated. Alternatively, we can use the
HPD as a measure of support by calculating the volume of the
region $C = \{\boldsymbol\beta : p(\boldsymbol\beta\,|\,\mathbf{y}) \le p(0\,|\,\mathbf{y})\}$. This is the region in
parameter space that lies outside the equiprobability curve going
through 0. Clearly, if H$_0$ is to be accepted at level $\alpha$, this volume
will be larger than $\alpha$, as the HPD of content $1-\alpha$ contains 0. This
region is large when zero is close to the MAP (hence, H$_0$ should not
be rejected), and small when zero is far from the MAP (and H$_0$
should be rejected). It, thus, seems justified to use the volume of $C$
as a measure of support for the null hypothesis.
It is important to note that the use of the HPD to construct a
measure of support for point hypotheses is not exempt from some
of the logical flaws of other traditional measures of support like
$P$-values or Bayes factors [23], [24]. In particular, it potentially
suffers from inconsistency in some pathological cases. How-
ever, it has been noted that in a number of standard situations, it
yields results that are similar to classical statistical tests [22].
In our case, the posterior density of the filter parameters is
Gaussian, such that it is possible to get an efficient closed-form
solution for the measure of support of H$_0$ in each voxel. With
the notation $z = \boldsymbol\beta_{\mathrm{MAP}}^\top A\, \boldsymbol\beta_{\mathrm{MAP}}$, we have

$$P = 1 - \mathrm{gammainc}\!\left(\frac{z}{2}, \frac{h}{2}\right) \qquad (13)$$

where gammainc is the two-parameter (regularized) incomplete Gamma
function, and we have adopted the notation $P$ for the
support for H$_0$ by similarity with traditional $p$-values.
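For a Gaussian posterior, this measure of support is the probability that a chi-square variable with h degrees of freedom exceeds the Mahalanobis distance of zero from the MAP. A sketch using SciPy's regularized upper incomplete gamma function (assumed to match the gammainc convention in the text):

```python
import numpy as np
from scipy.special import gammaincc

def support_p(beta_map, A):
    """HPD-based measure of support for H0: beta = 0, for a Gaussian
    posterior N(beta_map, inv(A)).  Equals P(chi2_h >= beta' A beta)."""
    z = float(beta_map @ A @ beta_map)
    h = len(beta_map)
    return gammaincc(h / 2.0, z / 2.0)  # regularized upper incomplete gamma
```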
E. Estimation of the Delay
Whereas standard parametric models of the haemodynamic
response have one or several parameters representing the delay
(parameter of the Poisson, mean of the Gaussian, ratio of the
Gamma parameters), the FIR filter does not model this directly.
It is necessary to estimate the delay from the many filter param-
eters. One approach is to use the group delay described, e.g., by
Oppenheim and Schafer [25]
$$\tau = \Delta t\; \frac{\sum_{i=1}^{h} i\, \beta_i}{\sum_{i=1}^{h} \beta_i} \qquad (14)$$

i.e., the average of the delay in each filter parameter, weighted
by the parameter values. In some situations, this estimate will be
unreliable. This is the case for example when the denominator of
(14) is small or when the filter has high frequency components.
Note that by construction, the smooth FIR filter contains only
low frequency components, such that the latter does not occur.
Furthermore, the denominator will take small values when the
mean filter coefficient is close to zero, indicating a nonactivated
voxel. Overall, the estimation of the group delay in activated
regions will give a reliable idea of the delay implemented by
the FIR filter.
A second interesting measure is the delay necessary to reach
90% of activation after onset of the stimulus, or to return
within 10% of maximum activation after offset of the stimulus.
This delay has been reported to be between 5 and 8 s [21].
Note that linear filters implement symmetrical responses, such
that the shape of the activation (and, thus, delay) after stimulus
onset is identical to the deactivation after stimulus offset. For
block design involving binary baseline-activation stimulus, the
delay is easily calculated from the cumulative sums of the filter
parameters.
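The group-delay estimate is a coefficient-weighted average of the tap delays; a minimal sketch, with the sampling period dt used to convert taps into seconds:

```python
import numpy as np

def group_delay(beta, dt):
    """Coefficient-weighted average delay of an FIR filter, in seconds.
    Unreliable when sum(beta) is close to zero (nonactivated voxel)."""
    i = np.arange(1, len(beta) + 1)
    return dt * float(i @ beta) / float(np.sum(beta))
```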
F. Tuning of Hyper-Parameters
For given values of $\sigma^2$, $v$ and $\eta$, we have been able to give the
expression of the posterior (6) and derive the MAP estimate of
the filter parameters and some error bars. We will now see how
we can find proper values for these parameters, using again a
probabilistic approach. In a fully Bayesian approach, we would
integrate over “nuisance” parameters to obtain the posterior dis-
tribution of interest. If we are interested in $\boldsymbol\beta$, for example, we
would integrate over the three hyper-parameters, after endowing
them with suitable priors (i.e., reflecting prior knowledge or lack
thereof). In the context of this study, it is impractical to carry out
the marginalization analytically. Classical MCMC techniques
[26] are able to perform numerical integration, but these tech-
niques are computationally intensive and not practical for a full
brain (or even a full slice) analysis. We give a quick guideline
and an example of application in the appendix to this article.
In the following, we will use an intermediate approach,
and select the hyper-parameters according to their likelihood
$p(\mathbf{y}\,|\,\sigma^2, v, \eta)$. Note that using uniform priors
on the hyper-parameters,² the posterior distribution of
the hyper-parameters is proportional to the likelihood,
$p(\sigma^2, v, \eta\,|\,\mathbf{y}) \propto p(\mathbf{y}\,|\,\sigma^2, v, \eta)$. The hyper-parameters
that we wish to optimize will, thus, be chosen so as to maximize
the likelihood, also known as the evidence. This is obtained by
integrating over the distribution of the weights

$$p(\mathbf{y}\,|\,\sigma^2, v, \eta) = \int p(\mathbf{y}\,|\,\boldsymbol\beta, \sigma^2)\, P(\boldsymbol\beta\,|\,v, \eta)\, d\boldsymbol\beta \qquad (15)$$

As the product of the two terms inside the integral has a
Gaussian form, integration can be performed analytically,
leading to

$$p(\mathbf{y}\,|\,\sigma^2, v, \eta) = (2\pi)^{-T/2} \left|\sigma^2 I + X \Sigma X^\top\right|^{-1/2} \exp\left(-\tfrac{1}{2}\, \mathbf{y}^\top \left(\sigma^2 I + X \Sigma X^\top\right)^{-1} \mathbf{y}\right) \qquad (16)$$
The evidence (16) can be optimized over several hyper-pa-
rameters using standard nonlinear optimization techniques [27].
As this can still be computationally intensive, we can use an ap-
proximation from the so-called “evidence framework” [28], [29,
sec. 10.4], which provides a re-estimation formula for the noise
level $\sigma^2$ and the prior strength $v$. In that framework, these hyper-parameters
can be estimated iteratively. For example, given a
noise level $\sigma^2$, we estimate the filter $\boldsymbol\beta_{\mathrm{MAP}}$, and the noise
level is updated according to the resulting filter fit [29,
sec. 10.4].
The model size $h$ can also be thought of as playing the role
of a hyper-parameter. A sensible choice is to take $h$ sufficiently
large such that the corresponding filter contains the entire haemodynamic
response. In this paper we have chosen to take $h = 60$,
corresponding to 20 s. Additional experiments using $h = 75$
(corresponding to 25 s) showed little difference in the results.
As a comparison, the length of the filter in SPM99 (from the
file SPM_HRF.M) is 32 s.
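Evidence maximization is conveniently done on the log scale. A sketch of the log evidence corresponding to (16), computed directly from the T x T marginal covariance (fine for a single voxel; the Woodbury/determinant-lemma rewriting would be preferred for efficiency):

```python
import numpy as np

def log_evidence(X, y, Sigma, sigma2):
    """Log marginal likelihood log p(y | sigma2, Sigma) of the linear
    model y = X beta + noise, beta ~ N(0, Sigma): the (log) evidence."""
    T = len(y)
    C = sigma2 * np.eye(T) + X @ Sigma @ X.T  # marginal covariance of y
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (T * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
```

In practice one would maximize this over the three hyper-parameters with a standard nonlinear optimizer, or use the evidence-framework re-estimation formulas mentioned above.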
IV. EXPERIMENTS
A. Can Smooth FIR Filter Estimate Standard Kernel Shapes?
In a first experiment, we look at the ability of the smooth filter
to recover the shape of traditional linear filters: the Poisson filter
proposed by [9], the Gamma filter of [10] and the Gaussian filter
[11].
We use the same sequence of 1210 images with ten runs of
31 baseline, 30 activation and 60 baseline images. Hence, the paradigm
²Technically, the priors are uniform on the log-domain.
(a) (b)
(c)
Fig. 5. Smooth FIR filter obtained on data generated by convolution of a square wave by a fixed-shape kernel. (a) Poisson, (b) Gamma, and (c) Gaussian filters.
Top: paradigm (dotted), signal (dashed), noisy data (dots) and modeled signal (solid); Bottom: generating filter (dashed) and estimated smooth FIR filter (solid).
Notice that the target and modeled signals (top row) are almost indistinguishable.
is a vector with 1210 elements and consists of a series of square
waves. For all three filters, the mean is taken to be 18 images or
6 s, while the variance of the Gamma and Gaussian filters are set
equal to 70. The variance of the Poisson filter is by construction
equal to the mean, i.e., 18. All filter parameters were scaled
such that the amplitude of the signal was roughly the same as
what we observed in activated voxels in the actual experiment.
Additive white noise of variance 400 was added to the
convolved signal, giving a signal-to-noise ratio between 2.24
and 2.91 dB (i.e., the variance of the signal is only around 40%
larger than that of the noise), cf. Fig. 5.
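The simulated data can be reproduced along these lines; the discretized Gamma kernel (shape and scale chosen to give mean 18 and variance 70 in image units) and the amplitude value are our assumptions, not the authors' exact settings.

```python
import numpy as np
from math import lgamma

def gamma_kernel(length, mean, var):
    """Gamma-shaped FIR kernel sampled at taps 1..length, with the
    given mean and variance in samples (shape k = mean^2/var,
    scale theta = var/mean)."""
    k, theta = mean ** 2 / var, var / mean
    i = np.arange(1, length + 1)
    return np.exp((k - 1) * np.log(i) - i / theta
                  - k * np.log(theta) - lgamma(k))

def simulate(paradigm, kernel, amplitude, noise_var, rng):
    """Convolve the paradigm with the kernel and add white Gaussian noise."""
    signal = amplitude * np.convolve(paradigm, kernel)[:len(paradigm)]
    return signal + rng.normal(0.0, np.sqrt(noise_var), len(paradigm))

# ten runs of 31 baseline, 30 activation and 60 baseline images
run = np.concatenate([np.zeros(31), np.ones(30), np.zeros(60)])
paradigm = np.tile(run, 10)  # 1210 images
data = simulate(paradigm, gamma_kernel(60, 18.0, 70.0),
                amplitude=100.0, noise_var=400.0,
                rng=np.random.default_rng(0))
```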
Fig. 5 shows the results obtained for the three filters. The resulting smooth FIR filter was estimated more accurately for the Gamma and Gaussian generating filters. The Poisson filter
is more difficult to estimate because the “length scale” of the
filter varies widely depending on the delay. There are steep
changes in the filter coefficients around the maximum, while
the second half of the filter is virtually flat. So in a way this
is the least smooth of the three filters. Note that the estima-
tion of the Gamma filter is also slightly impaired for the same
reason: the variation in filter coefficients is faster before the
mode, and smoother afterwards (middle plot). Note however
that when taking into account the rather large amount of noise
in the data, the fit is quite satisfactory in all cases. Furthermore, the misfit in the filter parameters does not prevent the smooth FIR filter from modeling the data almost exactly (cf. top row in Fig. 5).
This simulation indicates that for the three basic filter shapes,
the smooth FIR filter is able to recover the filter shape effi-
ciently. In particular, in all three cases studied here, the recov-
ered FIR filter showed little or no post-activation undershoot, in
accordance with the strictly positive target filters. This suggests
that any difference between the smooth FIR filter and these clas-
sical filters observed on real data is not due to the inability of
the FIR filter to reconstruct the true fMRI response, but rather
GOUTTE et al.: MODELING THE HAEMODYNAMIC RESPONSE IN fMRI USING SMOOTH FIR FILTERS 1195
Fig. 6. (a) Smooth FIR filter obtained for a particular voxel (bold solid), together with the best fit obtained using a Poisson shape (solid), a Gamma shape (dash-dotted), and a Gaussian shape (dashed). (b) Fitted signal, for one run, of the four filters from (a), plotted together with the data, averaged across the ten runs.
Fig. 7. Brain maps indicating the mean filter coefficient in each voxel. The colorbar is common to both images. The two maps are from the same subject, but two different scanning sessions. The differing shapes are due to different alignments in the two sessions. Notice the good agreement between the activation patterns, indicating good reproducibility. Voxels from the primary visual cortex (and, to a lesser extent, the supplementary visual cortex) display a strong (and asymmetric) response to the stimulus. Notice that locally the haemodynamic response yields a “negative activation” (white dots above the activated area). The horizontal lines at rows 14 and 66 in (a) indicate the voxels that we will study in more detail further down (cf. Fig. 8).
to the built-in limitations of these classical filters. In particular,
the modeling of the post-activation undershoot observed on real
data is not due to the Gaussian process prior used to constrain
the FIR coefficients, but reflects a feature that standard filter
shapes are unable to model.
B. What is the Shape of the Haemodynamic Response?
Let us now take the opposite standpoint and compare the
smooth filter obtained on real data to the best fit using the other
three standard kernel shapes. On the same data, we estimate the maximum a posteriori filter parameters, as well as the Poisson, Gamma, and Gaussian filters that best fit the data.
The results are presented on Fig. 6, where we have plotted the
result of the smooth FIR filter together with the best fit obtained
using the three standard kernel shapes introduced above.
One obvious result is that the Poisson filter seems to be quite
inappropriate for estimating the haemodynamic filter. This is
due to the fact that the one-parameter filter has identical mean
and variance. In some particular cases, notably when TR is large
and the filter only covers a few images, this might not be too
limiting. However, this clearly introduces a strong constraint on
the shape of the filter, which leads to an inappropriate filter on
our data. The Poisson parameter is here 17.8, corresponding to
a mean activation delay of 5.9 s, which is reasonable. We would, on the other hand, prefer a wider filter, as the three other filters are wider; but due to the restriction of the Poisson
Fig. 8. The smooth FIR filters obtained on the rows indicated in black on Fig. 7 (rows 14 and 66). The X-axis (VOXELS) runs along the “cut” indicated in Fig. 7; the Y-axis (DELAY) runs along the delays in the FIR filter (like the X-axis in, e.g., Figs. 1 and 2). Notice the strong activation in row 66, in the middle of the range, which corresponds to voxels from the primary visual cortex (V1). There is also a more limited response in the lateral areas. By contrast, the filters in row 14 are almost flat, indicating no activation.
filter, this would increase the activation delay beyond reasonable values. For comparison, the activation delays are 5.3 s and 5.9 s
for the Gamma and Gaussian filters, respectively, and 5.8 s for
the smooth FIR filter. But the width, measured by the standard
deviation, is 1.4 s for the Poisson filter, versus 1.8 s and 1.9 s
for Gamma and Gaussian.
A second salient feature is that, by construction, none of the
three basic filter shapes is able to model the post activation
undershoot evidenced by the smooth filter. The ability of the
Gaussian filter to model the first activation “bump” nicely and to go to zero quickly afterwards gives it a slightly better fit to the data.
By construction, the Gamma filter is skewed and has signifi-
cant mass in the tail (i.e., for large delays). This proves to be
harmful, as it introduces additional misfit around the post-activation undershoot (the filter cannot go to zero fast enough).
This is also the reason why the maximum of the filter seems to
be reached slightly ahead of what is expected. Because of the
skewness, the maximum of the filter is attained noticeably earlier
(at 4.7 s) than the mean (5.9 s). Furthermore, in order to mini-
mize the misfit around the post-activation undershoot, the mode
has to be shifted toward zero.
This result indicates that the smooth FIR filter will be able
to model additional features in the data, when traditional filter
shapes fail. This is important because we know from previous
studies (e.g., [17]) that there is a post-activation undershoot in
fMRI data. It might also be possible to model the initial negative
response if it is present in the data (cf. Section V-A).
C. Full Analysis
The slice that we study in this dataset contains 3891 voxels.
We estimate the smooth FIR filter in each voxel, using 60
delays. This leads to 3891 distinct filters. Note however that in
the MAP estimation procedure (7), the fMRI signal comes into the picture through a single voxel-dependent vector, while the remaining matrix is identical for all voxels. An important methodological question is whether the
hyper-parameters (noise level, prior strength, and length scale) should be kept constant across
the brain or locally estimated. The local estimation requires a
significant increase in computation which makes the estimation
process impractical on current workstations for several thou-
sand voxels. Accordingly, we will here adopt a hybrid approach,
which is computationally easier. The length scale is fixed from a priori knowledge to a value corresponding to a characteristic length of 7 s; the noise level is set by iterative re-estimation, which converges very fast; and the prior strength is fixed for the whole brain to a value optimized using the evidence on a given activated voxel.
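For a single voxel, this kind of MAP estimation can be sketched as Gaussian-process-regularized least squares. The sketch below is ours, not the paper's exact parameterization of (7): we assume a squared-exponential prior covariance, and the names `sigma2` (noise variance), `nu` (prior strength), and `h` (length scale, in images; 21 images corresponds to roughly 7 s at this dataset's sampling of three images per second) are our own.

```python
import numpy as np

def design_matrix(paradigm, n_delays):
    """Lagged copies of the paradigm: X[t, k] = paradigm[t - k]."""
    n = len(paradigm)
    X = np.zeros((n, n_delays))
    for k in range(n_delays):
        X[k:, k] = paradigm[: n - k]
    return X

def map_fir(paradigm, y, n_delays=60, sigma2=400.0, nu=1.0, h=21.0):
    """MAP smooth FIR filter under a GP prior w ~ N(0, K) on the coefficients."""
    d = np.arange(n_delays)
    # Squared-exponential prior covariance: nearby delays are correlated,
    # which is what makes the estimated filter smooth.
    K = nu * np.exp(-0.5 * (d[:, None] - d[None, :]) ** 2 / h**2)
    X = design_matrix(paradigm, n_delays)
    # Posterior mean in "kernel" form, K X' (X K X' + sigma2 I)^{-1} y,
    # which avoids inverting the ill-conditioned matrix K directly.
    S = X @ K @ X.T + sigma2 * np.eye(len(y))
    return K @ (X.T @ np.linalg.solve(S, y))
```

Under the Gaussian noise model this is algebraically identical to the ridge-like form (X'X + sigma2 K^{-1})^{-1} X'y, but numerically better behaved when K is nearly singular.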
Note that using a global set of hyper-parameters does not
imply that the filter itself should be constant. As argued by [10],
the characteristics of the filter should vary spatially. However, it
is desirable to impose some constraint on the filter such that the
filters would not vary unreasonably from one voxel to the
next. The use of global hyper-parameters, beyond its computa-
tional justification, forces such a high level constraint between
the filters.
The results are summarized in Fig. 7. As we have 60
filter parameters/voxel, it is necessary to design a summary
statistic in each voxel for presentation purposes. In Fig. 7
we present the mean filter coefficient. The rationale behind
this choice is that positive responses will display a positive
mean coefficient, even when the post-activation undershoot
is taken into account. Alternatively, we could use the most
extreme coefficient, or the mean absolute coefficient, but
the latter loses the sign of the activation, and we know
from previous studies [30], [18] that some areas display a
negative BOLD response to the stimulus.
In order to represent the filters themselves, additional dimensions are required to accommodate the filter delay and the coefficient values. Accordingly, we will illustrate the difference between the filters on two rows of voxels, one from a nonactivated
area (row 14), and one taken from a cut through the visual cortex
corresponding to row 66 in the summary images (Fig. 7). The re-
sults are presented in Fig. 8. In row 14, the filters are almost flat,
reflecting the fact that there is no activation. Small fluctuations
around zero reflect the presence of noise. On the other hand, the
Fig. 9. Brain maps indicating the log support p in each voxel, calculated using (13). Values have been thresholded at 10 and superimposed over a reference background. (a) displays concentrated activation in the primary visual cortex, as well as in lateral areas. In (b), the lateral activation is more diffuse; on the other hand, artefactual false positives appear (see, e.g., arrow), probably due to movement effects. The colorbar indicates the (base 10) log of p.
filters identified in the voxels corresponding to the primary vi-
sual cortex (V1), in the middle of the range on Fig. 8(b), display
a strong positive activation, followed by a post-activation under-
shoot, modeled by a series of negative filter coefficients around
40 to 60 images delay. Voxels located in the lateral visual cortex
display a moderate positive activation. In some cases, the esti-
mated filter displays a corresponding under-shoot, but the am-
plitudes are so limited that the relevance of this feature is clearly
debatable.
Let us investigate the significance of activation using the
highest probability density approach outlined above. Fig. 9
presents the location of the voxels for which p < 0.001,
superimposed on a background reference. This allows for a
quantitative characterization of the activation pattern outlined
above (Fig. 7). In both experiments there is a clear activation
in the primary visual cortex. Notice that the asymmetric nature
of the activation reproduces well in both experiments. Another
finding is that the negative activations that were apparent above the main (positive) activation area turn out to be highly significant
in Fig. 9. We also note that whereas some significant activation
is present on both sides of the lateral visual cortex in the
first experiment [d3711, Fig. 9(a)], only traces of significant
activation are observed in the second experiment. On the other
hand, experiment d3991 displays a higher number of scattered artefactual activations, including a very consistent area [arrow
on Fig. 9(b)] which could be due to movement artefacts.
Finally, the activated area may seem larger than the cortical
area (visual cortex) especially for d3711. One factor explaining
this is the spatial blurring in the hemodynamic signal, which to
our knowledge is still imperfectly understood.
We will now characterize the delay in activation modeled
by the FIR filters, using the group delay [25] described earlier.
Fig. 10 plots the resulting delays on brain maps where only the
voxels that exceeded the threshold used in Fig. 9 (p < 0.001) are
retained, and superimposed on the background reference. There
is a similarity in the spread of the delays, as well as in the fact
that the group delay seems longer in the posterior region of acti-
vation. There is however a striking difference in the actual delay
values. In the first experiments, the delays range roughly be-
tween nine images (3 s) and 18 images (6 s), while in the second,
the range is between 15 and 24 images (5–8 s). This difference
could be the sign of an inconsistency in the time registration
during the experiments. The activation periods might have ac-
tually occurred earlier than registered in experiment d3711, or
later than registered in d3991, or a combination of both.
Apart from this possible inconsistency, Fig. 10 reveals that
the estimation of the group delay yields values that seem
biologically plausible, though noticeably smaller than the
on-line/off-line delay. The estimates seem to be locally similar,
but display noticeable differences at larger scale, some regions
reacting with shorter delays. This difference in activation delay
was previously spotted using clustering [18].
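The group delay of a causal FIR filter h is the negative derivative of its phase response, which can be evaluated as Re{Σ_k k·h_k·e^(−jωk) / Σ_k h_k·e^(−jωk)}; at ω = 0 this reduces to the coefficient centroid. A small sketch follows, under our assumption that the delay is evaluated near DC (the text cites [25] but does not restate the evaluation frequency):

```python
import numpy as np

def group_delay(h, omega=0.0):
    """Group delay (in samples) of a causal FIR filter h at frequency omega."""
    k = np.arange(len(h))
    e = np.exp(-1j * omega * k)
    # tau(omega) = Re{ sum(k h_k e^{-j w k}) / sum(h_k e^{-j w k}) }
    return float(np.real(np.sum(k * h * e) / np.sum(h * e)))
```

Note that the estimate is numerically fragile when the filter coefficients sum to nearly zero, as in the near-flat filters of non-activated voxels; restricting the delay maps to supra-threshold voxels, as done above, avoids this regime.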
V. DISCUSSION
A. Initial Dip
In addition to modeling the post-activation undershoot, as
shown, e.g., in Fig. 6, the smooth FIR filter is potentially able
to model the initial negative response or initial dip [15], [16],
Fig. 10. Brain maps indicating the group delay measured in each activated voxel (p < 0.001), cf. Fig. 9. The delays in the activated areas range from nine to 18 images for d3711 (3–6 s) and from 15 to 24 images for d3991 (5–8 s), indicating a possible inconsistency in time registration during the experiments. The colorbar indicates delay (in images) between nine images (3 s) and 24 images (8 s).
[31]–[33]. Despite the relatively low field intensity, the initial dip was observed here in a number of voxels, mainly in experiment d3991. Due to the temporal alignment problem that
we have uncovered above, it was difficult to check the repro-
ducibility of the observed negative response across the two experiments.
However, we can report that this feature has been observed
in 20%–30% of the activated voxels, in both experiments, after
correction of the temporal alignment. A complete analysis of
the initial dip is beyond the scope of the current article and will
be reported elsewhere.
B. Dependency on Neuronal Activation
The relationship between the experimental paradigm and the
observed images involves both a neuronal activation induced by
the paradigm, and the hemodynamic response to this activation,
which leads to the actual measurements. By using the paradigm
as input to the linear filter, we assume that the pattern of activa-
tion follows the paradigm closely. Although this is a reasonable
assumption in the context of a strong visual stimulation, in other
cases the stimulus might not be the same as the actual neuronal
activation.
This limitation is common to all the parametric models men-
tioned in this study and has, therefore, little relevance for the
comparison between these methods. It should, however, be kept
in mind when more complicated experiments (perhaps involving
cognitive tasks) are involved. To determine both the neuronal
activation and the hemodynamic response would require blind
deconvolution using a latent variable model. The model of [34]
is an attempt to do this. It is, however, outside of the scope of
this article.
C. Computational Issues
One of the biggest challenges of this approach lies in the practical implementation for whole-brain analysis, or for the analysis of a reasonable subset of voxels, e.g., after sieving with an
omnibus F-test. While the calculation of the MAP estimate for a
given set of hyper-parameters is straightforward and
not more computationally intensive than traditional approaches
based on ridge regression or singular value decomposition, the
tuning of the hyper-parameters is usually time-consuming. While
we have argued for example that the parameter controlling the
typical length scale could be set a priori to between 5 and 10 s
(7 s here), there is no guarantee that this is optimal in any sense.
A full nonlinear optimization over two or three hyper-param-
eters is feasible for a limited number of voxels, but too compu-
tationally demanding for a whole volume or even a slice with
current computing facilities. Similarly, sampling from the pos-
terior using an MCMC technique is only practical for a limited
subset of voxels.
One simplification would be the use of fixed hyper-parameters for the whole volume. In that case, only one nonlinear optimization or Markov chain would be needed to yield a set of
hyper-parameters which are applied to all voxels. Though some
researchers (e.g., [10]) have argued that the characteristics of the
haemodynamic response vary spatially, note that having fixed
hyper-parameters would allow the filters themselves to be spa-
tially different, while tying them at a higher level in a hierar-
chical manner. A drawback of this approach is that it might lead
to averaging some of the characteristics like the noise level or
the length scale. Typically, nonactivated voxels could be mod-
eled using large length scales, corresponding to flat filters, while
activated voxels would benefit from the flexibility introduced by
smaller length scales. Note that this is not necessarily a problem
as far as the predictions themselves are concerned, as suggested
by Fig. 5.
In the experiments described above, we have adopted an intermediate approach, where the length scale is set a priori to 7 s, the regularization strength is optimized once and
for all based on some activated voxels, while the noise level
is estimated locally using an iterative procedure similar to the
re-estimation formulas in the so-called “evidence framework”
(e.g., [28], [29, sec. 10.4]). There is an obvious benefit in terms
of computational time. In our Matlab implementation running on a 450 MHz Pentium II, the full estimation of the filters and
associated measures of support takes around 90 s for 4000
voxels. An added benefit of the local estimation of the noise level is that we
do not need to make the assumption that the noise is spatially
stationary.
VI. CONCLUSION
In this paper, the use of smooth FIR filters for analyzing
functional magnetic resonance imaging data was described.
Smoothness is implemented using a correlated Gaussian prior,
and analysis is carried out using Bayesian inference. The
smooth FIR filter has a number of advantages over standard
(Poisson, Gamma, Gaussian) parametric families for modeling
the haemodynamic response. In particular, it can model a long
post-activation undershoot or the initial negative response.
The generality and flexibility of the smooth FIR approach was
illustrated on simulated data. A full analysis of data acquired
during a visual stimulation experiment with high temporal
resolution was performed. The ability of the smooth FIR filter
to find activated regions was demonstrated using a measure of
support derived from the highest posterior density approach.
APPENDIX
SAMPLING VIA MCMC
As noted in Section III-F, an ideal Bayesian analysis would
not optimize parameters, but obtain distributions of the relevant
quantities by integrating over nuisance parameters. This can be
useful here in at least three contexts:
1) Marginalize the hyper-parameters to obtain the posterior of the filter parameters, in order to obtain the maximum posterior parameters or the covariance of these parameters;
2) Marginalize the hyper-parameters in the distribution of the prediction (12), in order to obtain a predictive distribution conditioned only on the actual data;
3) Obtain the posterior distribution of the hyper-parameters conditioned on the data, in order to check, for example, whether the hyper-parameters are well determined by the data.
Numerical integration methods will be necessary for all
three problems, and can in principle be easily performed using
Fig. 11. (a) Histograms for the three hyper-parameters obtained from sampling the posterior. (b) Bivariate samples of the prior strength and the length scale h (in the log domain) as dots, superimposed on a contour plot of their joint density conditioned on a noise variance of 400.
MCMC [26], in particular Metropolis–Hastings, or hybrid
Monte Carlo [35] if derivatives are available.
A. Priors
The first step is to set up priors for the three hyper-parameters that we use here. We will put a Gamma prior on the variance of the noise and on the prior strength. For normalized data, the mean and shape factor are chosen such that it is unlikely that the noise level far exceeds the variance of the data, while there is significant mass toward zero in order to allow small noise levels (or little regularization). Accordingly, we will scale the prior with the empirical variance calculated on the actual data.
For the length scale, we have strong a priori information
suggesting that typical length scales for the haemodynamic re-
sponse should be between 5 and 10 s. However, we want to
allow larger length scales, which might be useful for nonacti-
vated voxels, where the underlying filter should be uniformly
zero. Accordingly, we will model this prior information using
a log-normal distribution, such that the log of the characteristic length covers this range over two standard deviations, which determines the parameters of the log-normal prior.
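These hyper-prior choices can be sketched as follows. Since the exact mean and shape values are elided above, the numbers below are illustrative placeholders, and the parameterization (scipy's `a`/`scale` for the Gamma, `s`/`scale` for the log-normal) is our own:

```python
import numpy as np
from scipy.stats import gamma, lognorm

def make_priors(y, length_median_s=7.0, length_sd_log=0.35):
    var_y = float(np.var(y))
    # Gamma priors on the noise variance and the prior strength, scaled by
    # the empirical data variance; shape a=1 puts significant mass toward
    # zero while keeping the prior mean at var(y). Placeholder values.
    noise_prior = gamma(a=1.0, scale=var_y)
    strength_prior = gamma(a=1.0, scale=var_y)
    # Log-normal prior on the length scale: median ~7 s, with the log-sd
    # chosen so that 5-10 s spans roughly two standard deviations.
    length_prior = lognorm(s=length_sd_log, scale=length_median_s)
    return noise_prior, strength_prior, length_prior
```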
B. Sampling from the Posterior
We will now illustrate how to sample from the posterior of the
hyper-parameters , in order to check how well
determined these hyper-parameters are. The posterior is easily
obtained (from Bayes’ rule) as it is proportional to the product
of the evidence (16) by the priors described previously. As the
derivative of the evidence with respect to the length scale is nontrivial, we will
simply use a Metropolis–Hastings algorithm [36], [26] with a
Gaussian proposal (in the log domain), which has the advantage
of being symmetric.
After setting the proposal such that we get an acceptance rate between 50% and 60%, we run the chain for 1000 iterations and discard the first 50 samples as “burn-in.” The histograms of
the sample distribution for the three hyper-parameters are pre-
sented in Fig. 11(a). The log scale gives a good indica-
tion of the relative spread of the hyper-parameters around their
mean. Clearly, the noise level is very well determined by the
data. The prior strength is badly determined, meaning that a
wide range of values have large probability. The situation for the
length scale is somewhat intermediate. These results show that
it is sensible to optimize the noise level to a fixed value, as its
posterior distribution is close to a delta function. On the other
hand, it would be interesting to integrate over the prior strength and the length scale, which have broader marginal distributions.
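A minimal Metropolis–Hastings sampler of the kind described above uses a symmetric Gaussian random walk in the log domain of the positive hyper-parameters. In this sketch, `log_post` stands in for the unnormalized log posterior (the evidence (16) plus the log priors), and the step size and iteration counts are illustrative:

```python
import numpy as np

def metropolis_log(log_post, theta0, step=0.3, n_iter=1000, burn_in=50, seed=0):
    """Random-walk Metropolis-Hastings over positive hyper-parameters,
    proposing with a symmetric Gaussian in the log domain."""
    rng = np.random.default_rng(seed)
    log_theta = np.log(np.asarray(theta0, dtype=float))
    lp = log_post(np.exp(log_theta))
    samples, n_accept = [], 0
    for _ in range(n_iter):
        prop = log_theta + step * rng.standard_normal(log_theta.shape)
        lp_prop = log_post(np.exp(prop))
        # The proposal is symmetric in log space; the extra .sum() terms are
        # the Jacobian of the log transform, since log_post is a density
        # over the parameters themselves, not over their logs.
        if np.log(rng.uniform()) < (lp_prop + prop.sum()) - (lp + log_theta.sum()):
            log_theta, lp = prop, lp_prop
            n_accept += 1
        samples.append(np.exp(log_theta))
    return np.array(samples[burn_in:]), n_accept / n_iter
```

The step size would be tuned, as in the text, until the acceptance rate falls in the desired range.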
In Fig. 11(b), we investigate the joint distribution of the prior strength and the length scale h.
The background contour plot has been obtained from the expression of the posterior by setting the noise variance to 400.
As our previous investigation showed that the noise level is
well determined in the neighborhood of this value, this gives
a probably accurate description of the marginal joint distribution. The sample (dots in Fig. 11(b)) seems
to support this approximation, and indicates that, due to correlation between the two hyper-parameters, they are slightly better determined by the data than is suggested by the marginal histograms.
This result shows that obtaining a sample from the hyper-pa-
rameters posterior is potentially useful. Unfortunately, as indi-
cated earlier, it is not computationally possible to perform this
sampling on a large scale.
ACKNOWLEDGMENT
The authors would like to thank E. Rostrup for making his
visual stimulation dataset available to them. They also thank
J. Kershaw for stimulating discussions on highest posterior
density regions and C. Rasmussen for discussions on general
Bayesian matters.
REFERENCES
[1] P. A. Bandettini and R. W. Cox, “Functional contrast in event-related
fMRI: Interstimulus dependency and blocked design comparison,” in
Proceedings of the Fourth International Conference on Functional Map-
ping of the Human Brain. ser. NeuroImage, T. Paus, A. Gjedde, and A.
Evans, Eds. New York: Academic, May 1998, pt. 2 of 3, p. 522.
[2] C. G. Thomas and R. S. Menon, “Amplitude response and stimulus
presentation frequency response of human primary visual cortex using
BOLD EPI at 4 T,” Magn. Reson. Med., vol. 40, no. 2, pp. 203–209,
1998.
[3] R. B. Buxton, E. C. Wong, and L. R. Frank, “Dynamics of blood flow
and oxygenation changes during brain activation: The balloon model,”
Magn. Reson. Med, vol. 39, no. 6, pp. 855–864, 1998.
[4] G. M. Boynton, S. A. Engel, G. H. Glover, and D. J. Heeger, “Linear
systems analysis of functional magnetic resonance imaging in human
V1,” J. Neurosci., vol. 16, no. 13, pp. 4207–4221, 1996.
[5] J. R. Binder, S. M. Rao, T. A. Hammeke, J. A. Frost, P. A. Bandettini,
and J. S. Hyde, “Effects of stimulus on signal response during functional
magnetic-resonance-imaging of auditory-cortex,” Cogn. Brain Res., vol.
2, no. 1, pp. 31–38, 1994.
[6] A. M. Dale and R. L. Buckner, “Selective averaging of rapidly presented
individual trials using fMRI,” Human Brain Mapping, vol. 5, no. 5, pp.
329–340, 1997.
[7] M. D. Robson, J. L. Dorosz, and J. C. Gore, “Measurements of the tem-
poral fMRI response of the human auditory cortex to trains of tones,”
NeuroImage, vol. 7, no. 3, pp. 185–198, 1998.
[8] G. H. Glover, “Deconvolution of impulse response in event-related
BOLD fMRI,” NeuroImage, vol. 9, no. 4, pp. 416–429, 1999.
[9] K. J. Friston, P. Jezzard, and R. Turner, “The analysis of functional MRI
time-series,” Human Brain Mapping, vol. 1, pp. 153–174, 1994.
[10] N. Lange and S. L. Zeger, “Non-linear Fourier time series analysis for
human brain mapping by functional magnetic resonance imaging,” J.
Roy. Statistical Soc., ser. C, Appl. Stat., vol. 46, no. 1, pp. 1–30, 1997.
[11] J. C. Rajapakse, F. Kruggel, J. M. Maisog, and D. Y. von Cramon, “Mod-
eling hemodynamic response for analysis of functional MRI time-se-
ries,” Human Brain Mapping, vol. 6, pp. 283–300, 1998.
[12] J. Kershaw, B. A. Ardekani, and I. Kanno, “Application of Bayesian
inference to fMRI data analysis,” IEEE Trans. Med. Imag., vol. 18, pp.
1138–1153, Dec. 1999.
[13] M. S. Cohen, “Parametric analysis of fMRI data using linear systems
methods,” NeuroImage, vol. 6, no. 2, pp. 93–103, Aug. 1997.
[14] F. Å. Nielsen, L. K. Hansen, P. Toft, C. Goutte, N. Lange, S. C. Strother,
N. Mørch, C. Svarer, R. Savoy, B. Rosen, E. Rostrup, and B. Peter,
“Comparison of two convolution models for fMRI time series,” in
Friberg et al. [37], L. Friberg, A. Gjedde, S. Holm, N. A. Lassen,
and M. Nowak, Eds. New York: Academic, May 1997, pt. 2 of 4 in
NeuroImage, vol. 5, p. S473.
[15] R. S. Menon, S. Ogawa, X. Hu, J. P. Strupp, P. Anderson, and K. Ugurbil,
“BOLD based functional MRI at 4 Tesla includes a capillary bed con-
tribution: Echo-planar imaging correlates with previous optical imaging
using intrinsic signals,” Magn. Reson. Med, vol. 33, no. 3, pp. 453–459,
1995.
[16] E. Yacoub and X. Hu, “Detection of the early negative response in fMRI
at 1.5 tesla,” Magn. Reson. Med, vol. 41, no. 6, pp. 1088–1092, 1999.
[17] G. Krüger, A. Kleinschmidt, and J. Frahm, “Dynamic MRI sensitized
to cerebral blood oxygenation and flow during sustained activation of
human visual cortex,” Magn. Reson. Med, vol. 35, no. 6, pp. 797–800,
1996.
[18] C. Goutte, L. K. Hansen, M. G. Liptrot, and E. Rostrup, “Fea-
ture space clustering for fMRI meta-analysis,” IMM, Tech. Rep.
IMM-REP-1999-13, 1999.
[19] C. E. Rasmussen, “Evaluation of Gaussian processes and other methods
for nonlinear regression,” Ph.D. dissertation, Dept. Comput. Sci., Univ.
Toronto, Toronto, Canada, 1996.
[20] C. K. I. Williams, “Prediction with Gaussian processes: From linear re-
gression to linear prediction and beyond,” in Learning and Inference in
Graphical Models, M. I. Jordan, Ed. Norwell, MA: Kluwer, 1998.
[21] P. A. Bandettini, A. Jesmanowicz, E. C. Wong, and J. S. Hyde, “Pro-
cessing strategies for time-course data sets in functional MRI of the
human brain,” Magn. Reson. Med, vol. 30, no. 2, pp. 161–173, August
1993.
[22] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Anal-
ysis. New York: Wiley, 1992.
[23] M. J. Schervish, “P-values: What they are and what they are not,” Amer. Statistician, vol. 50, no. 3, pp. 203–206, Aug. 1996.
[24] M. Lavine and M. J. Schervish, “Bayes factors: What they are and what
they are not,” Amer. Statistician, vol. 53, no. 2, pp. 119–122, May 1999.
[25] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Pro-
cessing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[26] D. J. C. MacKay, “Introduction to Monte Carlo methods,” in Learning
in Graphical Models. ser. NATO SCIENCE: D Behavioral and Social
Sciences, M. I. Jordan, Ed. Dordrecht, The Netherlands: Kluwer Aca-
demic, 1998, vol. 89.
[27] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery,
Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ.
Press, 1992.
[28] D. MacKay, “A practical Bayesian framework for backprop networks,”
Neural Computation, vol. 4, pp. 448–472, 1992.
[29] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford,
U.K.: Clarendon, 1995.
[30] C. Goutte, P. Toft, E. Rostrup, F. Å. Nielsen, and L. K. Hansen, “On
clustering fMRI time series,” NeuroImage, vol. 9, no. 3, pp. 298–310,
1999.
[31] X. Hu, T. H. Le, and K. Ugurbil, “Evaluation of the early response
in fMRI in individual subjects using short stimulus duration,” Magn.
Reson. Med, vol. 37, no. 6, pp. 877–884, 1997.
[32] E. Yacoub, T. H. Le, K. Ugurbil, and X. Hu, “Further evaluation of
the initial negative response in functional magnetic resonance imaging,”
Magn. Reson. Med, vol. 41, no. 3, pp. 436–441, 1999.
[33] G. M. Hathout, B. Varjavand, and R. K. Gopi, “The early response in
fMRI: A modeling approach,” Magn. Reson. Med, vol. 41, no. 3, pp.
550–554, 1999.
[34] P. A. d. F. R. Højen-Sørensen, L. K. Hansen, and C. E. Rasmussen,
“Bayesian modeling of fMRI time series,” in Advances in Neural In-
formation Processing Systems 12: Proceedings of the 1999 Conference,
S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. Cambridge, MA: MIT
Press, 2000, pp. 754–760.
[35] R. M. Neal, Bayesian Learning for Neural Networks. New York:
Springer, 1996, vol. 118 of Lecture Notes in Statistics.
[36] S. Chib and E. Greenberg, “Understanding the Metropolis–Hastings al-
gorithm,” Amer. Statistician, vol. 49, no. 4, pp. 327–335, 1995.
[37] L. Friberg, A. Gjedde, S. Holm, N. A. Lassen, and M. Nowak, Eds., Pro-
ceedings of the Third International Conference on Functional Mapping
of the Human Brain. New York: Academic, May 1997, part 2 of 4 in
NeuroImage.
... Deconvolution and methods alike are aiming to estimate neuronal activity by undoing the blurring effect of the hemodynamic response, characterized as a hemodynamic response function (HRF). 1 Given the inherently ill-posed nature of hemodynamic deconvolution, due to the strong temporal low-pass characteristics of the HRF, the key is to introduce additional constraints in the estimation problem that are typically expressed as regularizers. For instance, the so-called Wiener deconvolution O R I G I N A L R E S E A R C H A R T I C L E is expressing a "minimal energy" constraint on the deconvolved signal and has been used in the framework of psychophysiological interaction analysis to compute the interaction between a seed's activity-inducing timecourse and an experimental modulation (5)(6)(7)(8)(9). ...
... with up to second-order harmonics per cardiac (f_c,i) and respiratory (f_r,i) component, whose frequencies were randomly generated following normal distributions with variance 0.04 and means i·f_r and i·f_c, for i = [1, 2]. We set the fundamental frequencies to f_r = 0.3 Hz for the respiratory component (71) and f_c = 1.1 Hz for the cardiac component (72). ...
... Note that the term deconvolution is also alternatively employed to refer to the estimation of the hemodynamic response shape assuming a known activity-inducing signal or neuronal activity (1)(2)(3)(4). ...
Article
Full-text available
Deconvolution of the hemodynamic response is an important step to access short timescales of brain activity recorded by functional magnetic resonance imaging (fMRI). Although conventional deconvolution algorithms have been around for a long time (e.g., Wiener deconvolution), recent state-of-the-art methods based on sparsity-pursuing regularization are attracting increasing interest to investigate brain dynamics and connectivity with fMRI. This technical note revisits the main concepts underlying two main methods, paradigm free mapping and total activation, in the most accessible way. Despite their apparent differences in the formulation, these methods are theoretically equivalent as they represent the synthesis and analysis sides of the same problem, respectively. We demonstrate this equivalence in practice with their best-available implementations using both simulations, with different signal-to-noise ratios, and experimental fMRI data acquired during a motor task and resting state. We evaluate the parameter settings that lead to equivalent results and showcase the potential of these algorithms compared to other common approaches. This note is useful for practitioners interested in gaining a better understanding of state-of-the-art hemodynamic deconvolution and aims to answer questions that practitioners often have regarding the differences between the two methods.
... HRF estimation with a data-driven approach or a smoothness constraint has been previously explored. The adopted methods included various constraints through, for example, a Gaussian process prior (Goutte et al., 2000; Ciuciu et al., 2003), cubic smoothing splines (Zhang et al., 2007), Tikhonov regularization (Zhang et al., 2007; Casanova et al., 2008; Casanova et al., 2009; Zhang et al., 2012), spatial regularization (Badillo et al., ...
... 2013; Chaari et al., 2013; Zhang et al., 2018), cross validation (Zhang et al., 2013), nonlinear optimization (Pedregosa et al., 2015). Some of these methods were limited to individual-level modeling (Goutte et al., 2000; Ciuciu et al., 2003; Zhang et al., 2007; Chaari et al., 2013; Pedregosa et al., 2015), and their applicability, performance, and computational feasibility at the population level remain to be explored. Other methods were confined to the region level (Chaari et al., 2013; Zhang et al., 2012; Badillo et al., 2013; Zhang et al., 2013; Zhang et al., 2018) or resorted to information extraction through dimension reduction of the HRF to two or three morphological features (Zhang et al., 2012; Zhang et al., 2013). ...
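The smoothness-constrained FIR estimation that these excerpts discuss can be sketched as a Tikhonov-regularized least-squares problem: a design matrix of lagged stimulus impulses, plus a penalty on second differences of the filter. This is a generic illustration of the idea, not the exact Gaussian-process formulation of Goutte et al. (2000) or any other cited method; the penalty weight `lam`, the HRF length, and the event timing are assumed toy values.

```python
import numpy as np

def fir_design(onsets, n_scans, n_lags):
    """FIR design matrix: column j is the stimulus impulse train delayed by j scans."""
    s = np.zeros(n_scans)
    s[onsets] = 1.0
    X = np.zeros((n_scans, n_lags))
    for j in range(n_lags):
        X[j:, j] = s[: n_scans - j]
    return X

def smooth_fir(y, X, lam=1.0):
    """Penalized least squares: ||y - X h||^2 + lam ||D h||^2, D = second differences."""
    n_lags = X.shape[1]
    D = np.diff(np.eye(n_lags), n=2, axis=0)  # second-difference operator
    return np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)

# Toy recovery: smooth gamma-like "true" HRF, regularly spaced events, mild noise
rng = np.random.default_rng(1)
lags = np.arange(15.0)
h_true = (lags / 4.0) ** 2 * np.exp(-lags / 4.0)
onsets = np.arange(5, 195, 20)
X = fir_design(onsets, 200, 15)
y = X @ h_true + 0.05 * rng.normal(size=200)
h_hat = smooth_fir(y, X)
```

The `lam * D.T @ D` term plays the same role as the Gaussian process prior covariance or a cubic smoothing spline: it leaves smooth filter shapes nearly unpenalized while shrinking jagged, noise-driven ones.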
Preprint
Full-text available
Typical FMRI analyses assume a canonical hemodynamic response function (HRF) with a focus on the overshoot peak height, while other morphological aspects are largely ignored. Thus, in most reported analyses, the overall effect is reduced from a curve to a single scalar. Here, we adopt a data-driven approach to HRF estimation at the whole-brain voxel level, without assuming a profile at the individual level. Then, we estimate the BOLD response in its entirety with a smoothness constraint at the population level to improve predictive accuracy and inferential efficiency. Instead of using just the scalar that represents the effect magnitude, we assess the whole HRF shape, which reveals additional information that may prove relevant for many aspects of a study, as well as for cross-study reproducibility. Through a fast event-related FMRI dataset, we demonstrate the extent of under-fitting and information loss that occurs when adopting the canonical approach. We also address the following questions: 1) How much does the HRF shape vary across regions, conditions, and clinical groups? 2) Does an agnostic approach improve sensitivity to detect an effect compared to an assumed HRF? 3) Can examining HRF shape help validate the presence of an effect complementing statistical evidence? 4) Could the HRF shape provide evidence for whole-brain BOLD response during a simple task?
... Previous work on HRF estimation has explored data-driven approaches. Some methods adopted a smoothness constraint using, for example, a Gaussian process prior (Goutte et al., 2000; Ciuciu et al., 2003; Eickenberg et al., 2017), cubic smoothing splines (Zhang et al., 2007), B-splines (Degras and Lindquist, 2014), a canonical HRF combined with its temporal derivative (Elbau et al., 2018), wavelet bases (Van De Ville et al., 2004; Khalidov et al., 2011), a biophysically informed HRF (Rosa et al., 2015), Tikhonov regularization (Zhang et al., 2007; Casanova et al., 2008; Casanova et al., 2009; Zhang et al., 2012), spatial regularization (Badillo et al., 2013; Chaari et al., 2013; Zhang et al., 2018), cross validation (Zhang et al., 2013), and nonlinear optimization (Pedregosa et al., 2015). Some of these methods were applied to individual-level modeling for task-based experiments (Goutte et al., 2000; Ciuciu et al., 2003; Zhang et al., 2007; Chaari et al., 2013; Pedregosa et al., 2015) and for resting state data (Wu et al., 2021; Cherkaoui et al., 2021). ...
... Some methods adopted a smoothness constraint using, for example, a Gaussian process prior (Goutte et al., 2000; Ciuciu et al., 2003; Eickenberg et al., 2017), cubic smoothing splines (Zhang et al., 2007), B-splines (Degras and Lindquist, 2014), a canonical HRF combined with its temporal derivative (Elbau et al., 2018), wavelet bases (Van De Ville et al., 2004; Khalidov et al., 2011), a biophysically informed HRF (Rosa et al., 2015), Tikhonov regularization (Zhang et al., 2007; Casanova et al., 2008; Casanova et al., 2009; Zhang et al., 2012), spatial regularization (Badillo et al., 2013; Chaari et al., 2013; Zhang et al., 2018), cross validation (Zhang et al., 2013), and nonlinear optimization (Pedregosa et al., 2015). Some of these methods were applied to individual-level modeling for task-based experiments (Goutte et al., 2000; Ciuciu et al., 2003; Zhang et al., 2007; Chaari et al., 2013; Pedregosa et al., 2015) and for resting state data (Wu et al., 2021; Cherkaoui et al., 2021). Other methods have been adopted at the region level (Chaari et al., 2013; Zhang et al., 2012; Badillo et al., 2013; Zhang et al., 2013; Zhang et al., 2018) or developed for information extraction through dimension reduction of the HRF to two or three morphological features (Zhang et al., 2012; Zhang et al., 2013). ...
Article
Typical fMRI analyses often assume a canonical hemodynamic response function (HRF) that primarily focuses on the peak height of the overshoot, neglecting other morphological aspects. Consequently, reported analyses often reduce the overall response curve to a single scalar value. In this study, we take a data-driven approach to HRF estimation at the whole-brain voxel level, without assuming a response profile at the individual level. We then employ a roughness penalty at the population level to estimate the response curve, aiming to enhance predictive accuracy, inferential efficiency, and cross-study reproducibility. By examining a fast event-related FMRI dataset, we demonstrate the shortcomings and information loss associated with adopting the canonical approach. Furthermore, we address the following key questions: 1. To what extent does the HRF shape vary across different regions, conditions, and participant groups? 2. Does the data-driven approach improve detection sensitivity compared to the canonical approach? 3. Can analyzing the HRF shape help validate the presence of an effect in conjunction with statistical evidence? 4. Does analyzing the HRF shape offer evidence for whole-brain response during a simple task?
... For task-based fMRI the underlying effects can be further described based on properties of local hemodynamic response functions (HRF) estimated from the data [49][50][51]. Finite impulse response models [52] are an example of such HRF estimation methods. Possibilities for presenting such HRF results range from unthresholded mapping of quantitative HRF properties to detailed local depiction of HRF curves and quantitative HRF comparison based on a priori knowledge of the brain regions involved in the processing of a task [49]. ...
Article
Full-text available
Many functional magnetic resonance imaging (fMRI) studies and presurgical mapping applications rely on mass-univariate inference with subsequent multiple comparison correction. Statistical results are frequently visualized as thresholded statistical maps. This approach has inherent limitations including the risk of drawing overly-selective conclusions based only on selective results passing such thresholds. This article gives an overview of both established and newly emerging scientific approaches to supplement such conventional analyses by incorporating information about subthreshold effects with the aim to improve interpretation of findings or leverage a wider array of information. Topics covered include neuroimaging data visualization, p-value histogram analysis and the related Higher Criticism approach for detecting rare and weak effects. Further examples from multivariate analyses and dedicated Bayesian approaches are provided.
... For trial-level neural pattern similarity analyses we constructed similar GLMs as we did for the univariate analyses, but modeled each trial of interest as a separate regressor in line with the Least Squares Separate approach 73 . Specifically, an instance of either autobiographical memory recall or directed eCFT was modeled using a finite impulse response (FIR) function 74,75 with three 4-s windows (i.e., two TRs; see Supplementary Fig. S1b). The FIR modeling approach was chosen over the canonical double-Gamma hemodynamic response function to better characterize the temporal dynamics of recollections and mental simulations over an extended period of time (i.e., 12 s), as in other studies of autobiographical memory with extended recall phases 76,77 . ...
Article
Full-text available
Episodic counterfactual thinking (eCFT) is the process of mentally simulating alternate versions of experiences, which confers new phenomenological properties to the original memory and may be a useful therapeutic target for trait anxiety. However, it remains unclear how the neural representations of a memory change during eCFT. We hypothesized that eCFT-induced memory modification is associated with changes to the neural pattern of a memory primarily within the default mode network, moderated by dispositional anxiety levels. We tested this proposal by examining the representational dynamics of eCFT for 39 participants varying in trait anxiety. During eCFT, lateral parietal regions showed progressively more distinct activity patterns, whereas medial frontal neural activity patterns became more similar to those of the original memory. Neural pattern similarity in many default mode network regions was moderated by trait anxiety, where highly anxious individuals exhibited more generalized representations for upward eCFT (better counterfactual outcomes), but more distinct representations for downward eCFT (worse counterfactual outcomes). Our findings illustrate the efficacy of examining eCFT-based memory modification via neural pattern similarity, as well as the intricate interplay between trait anxiety and eCFT generation.
... It is important to stress that p(w|y, X, λ, ν) is conditional on the point estimates of λ and ν, and that it may fail to be even close to a reasonable approximation of ∫∫ p(w, ν, λ|X, y) dν dλ [44]. ... properties of the BOLD signal [47]. It can similarly be relevant to assume smoothness across spectral channels in analyses that model relations between average stimulus power spectra and BOLD fMRI responses. ...
Preprint
Full-text available
Regression is a principal tool for relating brain responses to stimuli or tasks in computational neuroscience. This often involves fitting linear models with predictors that can be divided into groups, such as distinct stimulus feature subsets in encoding models or features of different neural response channels in decoding models. When fitting such models, it can be relevant to impose differential shrinkage of the different groups of regression weights. Here, we explore a framework that allows for straightforward definition and estimation of such models. We present an expectation-maximization algorithm for tuning hyperparameters that control shrinkage of groups of weights. We highlight properties, limitations, and potential use-cases of the model using simulated data. Next, we explore the model in the context of a BOLD fMRI encoding analysis and an EEG decoding analysis. Finally, we discuss cases where the model can be useful and scenarios where regularization procedures complicate model interpretation.
... A popular technique for modeling the hemodynamic response function related to specific events is the finite impulse response (FIR) technique, which estimates response amplitudes at each time point of a specific time window. FIR is more flexible than using a parametric filter shape (Goutte et al., 2000) as it is not biased towards the hemodynamic response function used in traditional fMRI analyses. However, one limitation of FIR models is the risk of overfitting the model due to the large number of parameters, which can lead to noise being modeled instead of signal. ...
Preprint
Full-text available
Cortical function is complex, nuanced, and involves information processing in a multimodal and dynamic world. However, previous functional magnetic resonance imaging (fMRI) research has generally characterized static activation differences between strictly controlled proxies of real-world stimuli that do not encapsulate the complexity of everyday multimodal experiences. Of primary importance to the field of neuroimaging is the development of techniques that distill complex spatiotemporal information into simple, behaviorally relevant representations of neural activation. Herein, we present a novel 4D spatiotemporal clustering method to examine dynamic neural activity associated with events (specifically the onset of human faces in audiovisual movies). Results from this study showed that 4D spatiotemporal clustering can extract clusters of fMRI activation over time that closely resemble the known spatiotemporal pattern of human face processing without the need to model a hemodynamic response function. Overall, this technique provides a new and exciting window into dynamic functional processing across both space and time using fMRI that has wide applications across the field of neuroscience.
... After the modified time series is obtained for each ROI, a Finite Impulse Response (FIR) analysis is performed to extract the hemodynamic response for each ROI. FIR modeling is a model-free approach to obtaining the hemodynamic response from the time series data using the GLM framework [29]. The design matrix for the FIR model consists of a train of impulses at successive TRs. ...
Article
Full-text available
In this article, we try to explore and understand the neurodynamics of the decision-making process for mobile application downloading. We begin the model development in a rather unorthodox fashion. Patterns of brain activation regions are identified, across participants, at different time instances of the decision-making process. Region-wise activation knowledge from previous studies is used to put together the entire process model like a cognitive jigsaw puzzle. We find that there is indeed a common dynamic set of activation patterns that is consistent across people and apps. That is to say, not only are there consistent patterns of activation, there is also a consistent change from one pattern to another across time as people make the app adoption decision. Moreover, this pattern is clearly different for decisions that end in adoption than for decisions that end with no adoption.
Article
Regression is a principal tool for relating brain responses to stimuli or tasks in computational neuroscience. This often involves fitting linear models with predictors that can be divided into groups, such as distinct stimulus feature subsets in encoding models or features of different neural response channels in decoding models. When fitting such models, it can be relevant to allow differential shrinkage of the different groups of regression weights. Here, we explore a framework that allows for straightforward definition and estimation of such models. We present an expectation-maximization algorithm for tuning hyperparameters that control shrinkage of groups of weights. We highlight properties, limitations, and potential use-cases of the model using simulated data. Next, we explore the model in the context of a BOLD fMRI encoding analysis and an EEG decoding analysis. Finally, we discuss cases where the model can be useful and scenarios where regularization procedures complicate model interpretation.
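The differential shrinkage described in the two abstracts above amounts to ridge regression with one penalty per predictor group. A minimal sketch with fixed, hand-picked penalties follows; the cited work tunes these hyperparameters with an expectation-maximization algorithm, which is omitted here, and all data shapes, group labels, and penalty values are illustrative assumptions.

```python
import numpy as np

def grouped_ridge(X, y, groups, lams):
    """Ridge with per-group shrinkage: minimize ||y - Xw||^2 + sum_g lam_g ||w_g||^2.

    groups: one integer label per column of X; lams: penalty for each label.
    """
    penalty = np.diag([lams[g] for g in groups])
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Columns 0-2 carry signal (light shrinkage); columns 3-5 are noise (heavy shrinkage)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
w_true = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.5 * rng.normal(size=200)
w_hat = grouped_ridge(X, y, groups=[0, 0, 0, 1, 1, 1], lams={0: 0.1, 1: 100.0})
```

With a diagonal penalty built from the group labels, the informative group keeps near-unbiased weights while the noise group is shrunk toward zero, which is the behavior the EM procedure learns automatically.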
Article
Full-text available
It is typically assumed that large networks of neurons exhibit a large repertoire of nonlinear behaviours. Here we challenge this assumption by leveraging mathematical models derived from measurements of local field potentials via intracranial electroencephalography and of whole-brain blood-oxygen-level-dependent brain activity via functional magnetic resonance imaging. We used state-of-the-art linear and nonlinear families of models to describe spontaneous resting-state activity of 700 participants in the Human Connectome Project and 122 participants in the Restoring Active Memory project. We found that linear autoregressive models provide the best fit across both data types and three performance metrics: predictive power, computational complexity and the extent of the residual dynamics unexplained by the model. To explain this observation, we show that microscopic nonlinear dynamics can be counteracted or masked by four factors associated with macroscopic dynamics: averaging over space and over time, which are inherent to aggregated macroscopic brain activity, and observation noise and limited data samples, which stem from technological limitations. We therefore argue that easier-to-interpret linear models can faithfully describe macroscopic brain dynamics during resting-state conditions.
Article
Optical imaging studies have provided evidence of an initial increase in deoxyhemoglobin following the onset of neuronal stimulation/activation and demonstrated that this initial increase could be spatially more specific to the site of neuronal activity. These studies also raised the possibility of improving the specificity of fMRI by selective mapping of this early response. Previous MR studies reported the observation of this early response but were limited in scope and not in full agreement. This paper presents a more extensive study that (a) demonstrates the initial signal decrease in individual subjects and (b) examines its dependence on stimulus duration and subject. Binocular visual stimulation experiments were performed on 14 subjects using echo-planar imaging (EPI) with high temporal resolution. An initial signal decrease was consistently observed in regions that were more localized than those displaying the delayed positive response. In agreement with previous fMRI and optical imaging findings, the maximum signal decrease was 1-2% and occurred at approximately 2 s after the onset of the stimulus, depending on the subject. For stimuli longer than 3.0 s, the temporal dynamics and the amount of signal change of the early response was essentially independent of the stimulus duration, while the delayed response and the post-stimulus undershoot increased both in terms of magnitude and rise time as the duration of the stimulus increased; this observation is concordant with the recent optical imaging study.
Article
Two features distinguish the Bayesian approach to learning models from data. First, beliefs derived from background knowledge are used to select a prior probability distribution for the model parameters. Second, predictions of future observations are made by integrating the model's predictions with respect to the posterior parameter distribution obtained by updating this prior to take account of the data. For neural network models, both these aspects present difficulties: the prior over network parameters has no obvious relation to our prior knowledge, and integration over the posterior is computationally very demanding. I address the problem by defining classes of prior distributions for network parameters that reach sensible limits as the size of the network goes to infinity. In this limit, the properties of these priors can be elucidated. Some priors converge to Gaussian processes, in which functions computed by the network may be smooth, Brownian, or fractionally Brownian. Other priors converge to non-Gaussian stable processes. Interesting effects are obtained by combining priors of both sorts in networks with more than one hidden layer.
Conference Paper
This paper describes a sequence of Monte Carlo methods: importance sampling, rejection sampling, the Metropolis method, and Gibbs sampling. For each method, we discuss whether the method is expected to be useful for high-dimensional problems such as arise in inference with graphical models. After the methods have been described, the terminology of Markov chain Monte Carlo methods is presented. The chapter concludes with a discussion of advanced methods, including methods for reducing random walk behaviour.
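The Metropolis method surveyed in the paper above (and analyzed in reference [36]) reduces to a few lines for a random-walk proposal. This is a generic sketch for a user-supplied log-density, not tied to any particular fMRI model; the step size and target below are illustrative choices.

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: Gaussian proposals, accepted with prob min(1, ratio)."""
    rng = np.random.default_rng(seed)
    x, logp = x0, log_target(x0)
    out = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + step * rng.normal()      # symmetric random-walk proposal
        logp_prop = log_target(prop)
        if np.log(rng.uniform()) < logp_prop - logp:  # Metropolis acceptance test
            x, logp = prop, logp_prop
        out[i] = x                          # rejected proposals repeat the current state
    return out

# Sampling a standard normal: log density up to an additive constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
```

Because the proposal is symmetric, the Hastings correction term cancels and only the log-density ratio enters the acceptance test; an asymmetric proposal would require the full Metropolis-Hastings ratio.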
Book
Introduction; Inferences Concerning a Single Mean from Observations Assuming Common Known Variance; Inferences Concerning the Spread of a Normal Distribution from Observations Having Common Known Mean; Inferences When Both Mean and Standard Deviation are Unknown; Inferences Concerning the Difference Between Two Means; Inferences Concerning a Variance Ratio; Analysis of the Linear Model; A General Discussion of Highest Posterior Density Regions; H.P.D. Regions for the Linear Model: A Bayesian Justification of Analysis of Variance; Comparison of Parameters; Comparison of the Means of k Normal Populations; Comparison of the Spread of k Distributions; Summarized Calculations of Various Posterior Distributions
Article
P values (or significance probabilities) have been used in place of hypothesis tests as a means of giving more information about the relationship between the data and the hypothesis than does a simple reject/do not reject decision. Virtually all elementary statistics texts cover the calculation of P values for one-sided and point-null hypotheses concerning the mean of a sample from a normal distribution. There is, however, a third case that is intermediate to the one-sided and point-null cases, namely the interval hypothesis, that receives no coverage in elementary texts. We show that P values are continuous functions of the hypothesis for fixed data. This allows a unified treatment of all three types of hypothesis testing problems. It also leads to the discovery that a common informal use of P values as measures of support or evidence for hypotheses has serious logical flaws.