IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 16, NO. 2, APRIL 1997 187
Multimodality Image Registration by
Maximization of Mutual Information
Frederik Maes, André Collignon, Dirk Vandermeulen, Guy Marchal, and Paul Suetens
Abstract — A new approach to the problem of multi-
modality medical image registration is proposed, using a
basic concept from information theory, Mutual Informa-
tion or relative entropy, as a new matching criterion. The
method presented in this paper applies Mutual Informa-
tion to measure the statistical dependence or information
redundancy between the image intensities of corresponding
voxels in both images, which is assumed to be maximal if
the images are geometrically aligned. Maximization of Mu-
tual Information is a very general and powerful criterion,
because no assumptions are made regarding the nature of
this dependence and no limiting constraints are imposed
on the image content of the modalities involved. The ac-
curacy of the mutual information criterion is validated for
rigid body registration of CT, MR and PET images by com-
parison with the stereotactic registration solution, while ro-
bustness is evaluated with respect to implementation issues,
such as interpolation and optimization, and image content,
including partial overlap and image degradation. Our re-
sults demonstrate that subvoxel accuracy with respect to the
stereotactic reference solution can be achieved completely
automatically and without any prior segmentation, feature
extraction or other pre-processing steps, which makes this
method very well suited for clinical applications.
Keywords—Matching criterion, multimodality images, mu-
tual information, registration.
I. Introduction
The geometric alignment or registration of multi-
modality images is a fundamental task in numerous
applications in three-dimensional (3-D) medical image pro-
cessing. Medical diagnosis, for instance, often benefits from
the complementarity of the information in images of differ-
ent modalities. In radiotherapy planning, dose calculation
is based on the CT data, while tumor outlining is often
better performed in the corresponding MR scan. For brain
function analysis, MR images provide anatomical informa-
tion, while functional information may be obtained from
PET images, etcetera.
The bulk of registration algorithms in medical imaging
(see [3], [16], [23] for an overview) can be classified as being
either frame based, point landmark based, surface based,
or voxel based. Stereotactic frame based registration is
very accurate but inconvenient, and cannot be applied
retrospectively, as with any external point landmark based
This work was supported in part by IBM Belgium (Academic
Joint Study) and by the Belgian National Fund for Scientific Re-
search (NFWO) under grant numbers FGWO 3.0115.92, 9.0033.93
and G.3115.92.
The authors are with the Laboratory for Medical Imaging Research
(directors: Andr´e Oosterlinck & Albert L. Baert), a cooperation be-
tween the Department of Electrical Engineering, ESAT (Kardinaal
Mercierlaan 94, B-3001 Heverlee), and the Department of Radiology,
University Hospital Gasthuisberg (Herestraat 49, B-3000 Leuven), of
the Katholieke Universiteit Leuven, Belgium.
F. Maes is Aspirant of the Belgian National Fund for Scientific
Research (NFWO). E-mail: Frederik.Maes@uz.kuleuven.ac.be.
method, while anatomical point landmark based methods
are usually labor-intensive and their accuracy depends on
the accurate indication of corresponding landmarks in all
modalities. Surface-based registration requires delineation
of corresponding surfaces in each of the images separately.
But surface segmentation algorithms are generally highly
data and application dependent and surfaces are not easily
identified in functional modalities such as PET. Voxel based
(VSB) registration methods optimize a functional measur-
ing the similarity of all geometrically corresponding voxel
pairs for some feature. The main advantage of VSB meth-
ods is that feature calculation is straightforward or even
absent when only grey-values are used, such that the accu-
racy of these methods is not limited by segmentation errors
as in surface based methods.
For intra-modality registration multiple VSB methods
have been proposed that optimize some global measure of
the absolute difference between image intensities of corre-
sponding voxels within overlapping parts or in a region of
interest [5], [11], [19], [26]. These criteria all rely on the as-
sumption that the intensities of the two images are linearly
correlated, which is generally not satisfied in the case of
inter-modality registration. Cross-correlation of feature im-
ages derived from the original image data has been applied
to CT/MR matching using geometrical features such as
edges [15] and ridges [24] or using especially designed inten-
sity transformations [25]. But feature extraction may intro-
duce new geometrical errors and requires extra calculation
time. Furthermore, correlation of sparse features like edges
and ridges may have a very peaked optimum at the regis-
tration solution, but at the same time be rather insensitive
to misregistration at larger distances, as all non-edge or
non-ridge voxels correlate equally well. A multi-resolution
optimization strategy is therefore required, which is not
necessarily a disadvantage, as it can be computationally
attractive.
In the approach of Woods et al. [30] and Hill et al. [12],
[13] misregistration is measured by the dispersion of the
two-dimensional (2-D) histogram of the image intensities
of corresponding voxel pairs, which is assumed to be min-
imal in the registered position. But the dispersion mea-
sures they propose are largely heuristic. Hill’s criterion
requires segmentation of the images or delineation of spe-
cific histogram regions to make the method work [20], while
Woods’ criterion is based on additional assumptions con-
cerning the relationship between the grey-values in the dif-
ferent modalities, which reduces its applicability to some
very specific multi-modality combinations (PET/MR).
In this paper, we propose to use the much more general
notion of Mutual Information (MI) or relative entropy [8],
[22] to describe the dispersive behaviour of the 2-D his-
togram. Mutual information is a basic concept from in-
formation theory, measuring the statistical dependence be-
tween two random variables or the amount of information
that one variable contains about the other. The MI reg-
istration criterion presented here states that the mutual
information of the image intensity values of correspond-
ing voxel pairs is maximal if the images are geometrically
aligned. Because no assumptions are made regarding the
nature of the relation between the image intensities in both
modalities, this criterion is very general and powerful and
can be applied automatically without prior segmentation
on a large variety of applications.
This paper expands on the ideas first presented by Col-
lignon et al. [7]. Related work in this area includes the
work by Viola and Wells et al. [27], [28] and by Studholme
et al. [21]. The theoretical concept of mutual informa-
tion is presented in section II, while the implementation
of the registration algorithm is described in section III.
In sections IV, V and VI we evaluate the accuracy and
the robustness of the MI matching criterion for rigid body
CT/MR and PET/MR registration. Section VII summa-
rizes our current findings, while section VIII gives some
directions for further work. In the appendix, we discuss
the relationship of the MI registration criterion to other
multi-modality VSB criteria.
II. Theory
Two random variables A and B with marginal probability
distributions p_A(a) and p_B(b) and joint probability
distribution p_AB(a, b) are statistically independent if
p_AB(a, b) = p_A(a) · p_B(b), while they are maximally de-
pendent if they are related by a one-to-one mapping T:
p_A(a) = p_B(T(a)) = p_AB(a, T(a)). Mutual information,
I(A, B), measures the degree of dependence of A and
B by measuring the distance between the joint distribu-
tion p_AB(a, b) and the distribution associated with the case
of complete independence p_A(a) · p_B(b), by means of the
Kullback-Leibler measure [22], i.e.

    I(A, B) = Σ_{a,b} p_AB(a, b) log [ p_AB(a, b) / ( p_A(a) · p_B(b) ) ]    (1)
Mutual information is related to entropy by the equa-
tions:

    I(A, B) = H(A) + H(B) − H(A, B)    (2)
            = H(A) − H(A|B)            (3)
            = H(B) − H(B|A)            (4)

with H(A) and H(B) being the entropy of A and B
respectively, H(A, B) their joint entropy and H(A|B) and
H(B|A) the conditional entropy of A given B and of B
given A respectively:

    H(A) = − Σ_a p_A(a) log p_A(a)                  (5)
    H(A, B) = − Σ_{a,b} p_AB(a, b) log p_AB(a, b)   (6)
    H(A|B) = − Σ_{a,b} p_AB(a, b) log p_{A|B}(a|b)  (7)
The entropy H(A) is known to be a measure of the
amount of uncertainty about the random variable A, while
H(A|B) is the amount of uncertainty left in A when know-
ing B. Hence, from equation (3), I(A, B) is the reduction in
the uncertainty of the random variable A by the knowledge
of another random variable B, or, equivalently, the amount
of information that B contains about A. Some properties
of mutual information are summarized in table I (see [22]
for their proof).
TABLE I
Some properties of mutual information.
Non-negativity:    I(A, B) ≥ 0
Independence:      I(A, B) = 0 ⇔ p_AB(a, b) = p_A(a) · p_B(b)
Symmetry:          I(A, B) = I(B, A)
Self information:  I(A, A) = H(A)
Boundedness:       I(A, B) ≤ min(H(A), H(B))
                           ≤ (H(A) + H(B))/2
                           ≤ max(H(A), H(B))
                           ≤ H(A, B)
                           ≤ H(A) + H(B)
Data processing:   I(A, B) ≥ I(A, T(B))
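The entropy relations (2)–(7) and the properties of table I are easy to verify numerically. A minimal sketch in NumPy (the 2 × 2 joint distribution is a made-up example, not data from the paper):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector; zero bins are
    skipped, following the convention 0 log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_ab):
    """I(A, B) = H(A) + H(B) - H(A, B), equation (2)."""
    p_a = p_ab.sum(axis=1)   # marginal distribution of A
    p_b = p_ab.sum(axis=0)   # marginal distribution of B
    return entropy(p_a) + entropy(p_b) - entropy(p_ab.ravel())

# Made-up joint distribution of two binary variables.
p_ab = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
mi = mutual_information(p_ab)

# Non-negativity and boundedness (table I).
h_a = entropy(p_ab.sum(axis=1))
h_b = entropy(p_ab.sum(axis=0))
assert 0.0 <= mi <= min(h_a, h_b)

# Independence: a factorizable joint distribution gives I(A, B) = 0.
p_indep = np.outer(p_ab.sum(axis=1), p_ab.sum(axis=0))
assert abs(mutual_information(p_indep)) < 1e-12
```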
Considering the image intensity values a and b of a pair of
corresponding voxels in the two images that are to be reg-
istered to be random variables A and B respectively, esti-
mations for the joint and marginal distributions p_AB(a, b),
p_A(a) and p_B(b) can be obtained by simple normalization of
the joint and marginal histograms of the overlapping parts
of both images. Intensities a and b are related through
the geometric transformation T_α defined by the registra-
tion parameter α. The MI registration criterion states that
the images are geometrically aligned by the transformation
T_α* for which I(A, B) is maximal. This is illustrated in
figure 1 for a CT and an MR image of the brain, show-
ing the 2-D histogram of the image intensity values in a
non-registered and in the registered position. The high-
intensity values in the histogram of the CT image originat-
ing from the bone of the skull are most likely to be mapped
on low-intensity values in the histogram of the MR image
if the images are properly aligned, resulting in a peak in
the 2-D histogram. The uncertainty about the MR voxel
intensity is thus largely reduced if the corresponding CT
voxel is known to be of high intensity. This correspon-
dence is lost in case of misregistration. However, the MI
criterion does not make limiting assumptions regarding the
relation between image intensities of corresponding voxels
in the different modalities, which is highly data dependent,
and no constraints are imposed on the image content of the
modalities involved.
Fig. 1. Joint histogram of the overlapping volume of the CT and MR
brain images of dataset A in tables II and III: a) initial position:
I(CT, MR) = 0.46; b) registered position: I(CT, MR) = 0.89.
Misregistration was about 20 mm and 10 degrees (see the param-
eters in table III). (Axes: CT intensity vs. MR intensity; labeled
clusters: soft tissue, skull.)

If both marginal distributions p_A(a) and p_B(b) can be
considered to be independent of the registration parame-
ters α, the MI criterion reduces to minimizing the joint
entropy H(A, B) [6]. If either p_A(a) or p_B(b) is indepen-
dent of α, which is the case if one of the images is always
completely contained in the other, the MI criterion reduces
to minimizing the conditional entropy H(A|B) or H(B|A).
However, if both images only partially overlap, which is
very likely during optimization, the volume of overlap will
change when α is varied and p_A(a) and p_B(b) and also
H(A) and H(B) will generally depend on α. The MI cri-
terion takes this into account explicitly, as becomes clear
in equation (2), which can be interpreted as follows [27]:
“maximizing mutual information will tend to find as much
as possible of the complexity that is in the separate datasets
(maximizing the first two terms) so that at the same time
they explain each other well (minimizing the last term)”.
For I(A, B) to be useful as a registration criterion
and well-behaved with respect to optimization, I(A, B)
should vary smoothly as a function of misregistration
|α − α*|. This requires p_A(a), p_B(b) and p_AB(a, b) to change
smoothly when α is varied, which will be the case if the
image intensity values are spatially correlated. This is il-
lustrated in figure 2, showing the behaviour of I(A, B) as a
lustrated in figure 2, showing the behaviour of I(A, B)asa
function of misregistration between an image and itself ro-
tated around the image center. The trace on the left is ob-
tained from an original MR image and shows a single sharp
optimum with a rather broad attraction basin. The trace
on the right is obtained from the same image after hav-
ing reduced the spatial correlation of the image intensity
by repeatedly swapping pairs of randomly selected pixels.
This curve shows many local maxima and the attraction
basin of the global maximum is also much smaller, which
deteriorates the optimization robustness. Thus, although
the formulation of the MI criterion suggests that spatial
dependence of image intensity values is not taken into ac-
count, such dependence is in fact essential for the criterion
to be well-behaved around the registration solution.
III. Algorithm
A. Transformation
With each of the images is associated an image coor-
dinate frame with its origin positioned in a corner of the
image, with the x axis along the row direction, the y axis
along the column direction and the z axis along the plane
direction.

Fig. 2. Spatial correlation of image intensity increases MI registration
robustness. Left: original 256 × 256 2-D MR image (top) and the
same image shuffled by swapping 30,000 randomly selected pixel
pairs (bottom). Both images have the same image content. Right:
MI traces obtained using PV interpolation for in-plane rotation
from −20 to +20 degrees of each image over itself. Local maxima
are marked with '*'.
One of the images is selected to be the floating image F
from which samples s ∈ S are taken and transformed into
the reference image R. S can be the set of grid points of F
or a sub- or superset thereof. Subsampling of the floating
image might be used to increase speed performance, while
supersampling aims at increasing accuracy. For each value
of the registration parameter α only those values s ∈ S_α ⊂
S are retained for which T_α s falls inside the volume of R.
In this paper, we have restricted the transformation T_α
to rigid body transformations only, although it is clear that
the MI criterion can be applied to more general transfor-
mations as well. The rigid body transformation is a super-
position of a 3-D rotation and a 3-D translation and the
registration parameter α is a 6-component vector consist-
ing of 3 rotation angles φ_x, φ_y, φ_z (measured in degrees)
and 3 translation distances t_x, t_y, t_z (measured in millime-
ters). Transformation of image coordinates P_F to P_R from
the image F to image R is given by

    V_R · (P_R − C_R) = R_x(φ_x) · R_y(φ_y) · R_z(φ_z) · V_F · (P_F − C_F) + t(t_x, t_y, t_z)    (8)

with V_F and V_R being 3 × 3 diagonal matrices representing
the voxel sizes of images F and R respectively (in millime-
ters), C_F and C_R the image coordinates of the centers of the
images, R = R_x · R_y · R_z the 3 × 3 rotation matrix, with the
matrices R_x, R_y and R_z representing rotations around the
x, y and z axis respectively, and t the translation vector.
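As a concrete sketch of equation (8), the following builds the rotation matrices and applies the mapping (NumPy; the rotation sign conventions, voxel sizes, centers, and parameter values here are illustrative assumptions, not the authors' exact conventions):

```python
import numpy as np

def rigid_transform(p_f, phi, t, voxel_f, voxel_r, c_f, c_r):
    """Map floating-image coordinates P_F to reference-image coordinates
    P_R per equation (8): V_R (P_R - C_R) = Rx Ry Rz V_F (P_F - C_F) + t."""
    ax, ay, az = np.deg2rad(phi)  # angles are given in degrees
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    v_f, v_r = np.diag(voxel_f), np.diag(voxel_r)  # voxel sizes in mm
    mm = rx @ ry @ rz @ v_f @ (np.asarray(p_f, dtype=float) - c_f) + t
    return np.linalg.solve(v_r, mm) + c_r  # back to voxel coordinates of R

# Made-up example: identity rotation, pure 2 mm translation along x.
p_r = rigid_transform([10, 20, 5], phi=[0, 0, 0],
                      t=np.array([2.0, 0.0, 0.0]),
                      voxel_f=[1.0, 1.0, 1.0], voxel_r=[1.0, 1.0, 1.0],
                      c_f=np.array([128.0, 128.0, 90.0]),
                      c_r=np.array([128.0, 128.0, 50.0]))
```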
B. Criterion
Let f(s) denote the image intensity in the floating image
F at position s and r(T_α s) the intensity at the trans-
formed position in the reference image R. The joint image
intensity histogram h_α(f, r) of the overlapping volume of
both images at position α is computed by binning the im-
age intensity pairs (f(s), r(T_α s)) for all s ∈ S_α. In or-
der to do this efficiently, the floating and the reference
image intensities are first linearly rescaled to the range
[0, n_F − 1] and [0, n_R − 1] respectively, n_F × n_R being the
total number of bins in the joint histogram. Typically, we
use n_F = n_R = 256.
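The rescale-and-bin step can be sketched as follows (NumPy; n_F = n_R = 8 and the intensity values are made-up examples to keep the output small):

```python
import numpy as np

def rescale_to_bins(img, n_bins):
    """Linearly rescale intensities to integer bin indices 0..n_bins-1."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    return np.minimum((img - lo) / (hi - lo) * n_bins, n_bins - 1).astype(int)

def joint_histogram(f_bins, r_bins, n_f, n_r):
    """Bin corresponding intensity pairs (f(s), r(T_a s)) into h(f, r)."""
    h = np.zeros((n_f, n_r))
    np.add.at(h, (f_bins, r_bins), 1)  # one count per sample pair
    return h

# Made-up intensity pairs at corresponding sample positions.
f = rescale_to_bins([0, 100, 200, 300, 400], n_bins=8)
r = rescale_to_bins([5, 50, 90, 130, 170], n_bins=8)
h = joint_histogram(f, r, 8, 8)
assert h.sum() == 5  # every sample contributes exactly one count
```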
In general, T_α s will not coincide with a grid point of R
and interpolation of the reference image is needed to ob-
tain the image intensity value r(T_α s). Nearest neighbour
(NN) interpolation of R is generally insufficient to guar-
antee subvoxel accuracy, as it is insensitive to translations
up to 1 voxel. Other interpolation methods, such as trilin-
ear (TRI) interpolation, may introduce new intensity val-
ues which are originally not present in the reference image,
leading to unpredictable changes in the marginal distribu-
tion p_{R,α}(r) of the reference image for small variations of
α. To avoid this problem, we propose to use trilinear par-
tial volume distribution (PV) interpolation to update the
joint histogram for each voxel pair (s, T_α s). Instead of in-
terpolating new intensity values in R, the contribution of
the image intensity f(s) of the sample s of F to the joint
histogram is distributed over the intensity values of all 8
nearest neighbours of T_α s on the grid of R, using the same
weights as for trilinear interpolation (figure 3). Each entry
in the joint histogram is then the sum of smoothly varying
fractions of 1, such that the histogram changes smoothly
as α is varied.
The three interpolation schemes update the joint histogram as follows,
with n_i the grid points of R nearest to T_α s and w_i the corresponding
trilinear weights:

NN:  r(T_α s) = r(n_k), with n_k = arg min_i d(T_α s, n_i);
     h_α(f(s), r(T_α s)) += 1
TRI: r(T_α s) = Σ_i w_i · r(n_i), with Σ_i w_i(T_α s) = 1;
     h_α(f(s), r(T_α s)) += 1
PV:  ∀i: h_α(f(s), r(n_i)) += w_i, with Σ_i w_i(T_α s) = 1

Fig. 3. Graphical illustration of NN, TRI and PV interpolation in
2-D. NN and TRI interpolation find the reference image inten-
sity value at position T_α s and update the corresponding joint
histogram entry, while PV interpolation distributes the contribu-
tion of this sample over multiple histogram entries defined by its
nearest neighbour intensities, using the same weights as for TRI
interpolation.
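The PV update rule can be sketched in 2-D, where each sample spreads bilinear weights over the 4 nearest reference grid points (the 3-D case distributes 8 trilinear weights analogously; the image and position values below are made-up):

```python
import numpy as np

def pv_update(h, f_bin, r_img, pos):
    """Distribute the contribution of one floating-image sample with
    intensity bin f_bin over the histogram entries of the 4 nearest
    reference grid points (2-D), using bilinear weights summing to 1."""
    x, y = pos
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    for i, j, w in [(x0,     y0,     (1 - dx) * (1 - dy)),
                    (x0 + 1, y0,     dx       * (1 - dy)),
                    (x0,     y0 + 1, (1 - dx) * dy),
                    (x0 + 1, y0 + 1, dx       * dy)]:
        h[f_bin, r_img[i, j]] += w  # h_a(f(s), r(n_i)) += w_i

# Tiny made-up reference image, already rescaled to bin indices.
r_img = np.array([[0, 1],
                  [2, 3]])
h = np.zeros((4, 4))
pv_update(h, f_bin=2, r_img=r_img, pos=(0.25, 0.5))
assert np.isclose(h.sum(), 1.0)  # fractional counts sum to 1 per sample
```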
Estimations for the marginal and joint image intensity
distributions p_{F,α}(f), p_{R,α}(r) and p_{FR,α}(f, r) are obtained
by normalization of h_α(f, r):

    p_{FR,α}(f, r) = h_α(f, r) / Σ_{f,r} h_α(f, r)    (9)
    p_{F,α}(f) = Σ_r p_{FR,α}(f, r)                   (10)
    p_{R,α}(r) = Σ_f p_{FR,α}(f, r)                   (11)

The MI registration criterion I(α) is then evaluated by

    I(α) = Σ_{f,r} p_{FR,α}(f, r) log_2 [ p_{FR,α}(f, r) / ( p_{F,α}(f) · p_{R,α}(r) ) ]    (12)

and the optimal registration parameter α* is found from

    α* = arg max_α I(α)    (13)
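Equations (9)–(12) amount to normalizing the joint histogram and summing over its non-zero entries. A compact sketch (NumPy; the small histograms are made-up limiting cases):

```python
import numpy as np

def mi_from_histogram(h):
    """Evaluate the MI criterion (12) from a joint histogram h_a(f, r)."""
    p_fr = h / h.sum()                      # equation (9)
    p_f = p_fr.sum(axis=1, keepdims=True)   # equation (10)
    p_r = p_fr.sum(axis=0, keepdims=True)   # equation (11)
    nz = p_fr > 0                           # 0 log 0 = 0 by convention
    return np.sum(p_fr[nz] * np.log2(p_fr[nz] / (p_f @ p_r)[nz]))

# A one-to-one intensity mapping gives the maximal MI (here 1 bit).
h_aligned = np.array([[10.0,  0.0],
                      [ 0.0, 10.0]])
assert np.isclose(mi_from_histogram(h_aligned), 1.0)

# A factorizable histogram (statistical independence) gives MI = 0.
h_indep = np.array([[5.0, 5.0],
                    [5.0, 5.0]])
assert np.isclose(mi_from_histogram(h_indep), 0.0)
```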
C. Search
The images are initially positioned such that their cen-
ters coincide and that the corresponding scan axes of both
images are aligned and have the same orientation. Pow-
ell’s multi-dimensional direction set method is then used
to maximize I(α), using Brent’s one-dimensional optimiza-
tion algorithm for the line minimizations [18]. The direc-
tion matrix is initialized with unit vectors in each of the
parameter directions. An appropriate choice for the order
in which the parameters are optimized needs to be spec-
ified, as this may influence optimization robustness. For
instance, when matching images of the brain, the horizon-
tal translation and the rotation around the vertical axis
are more constrained by the shape of the head than the
pitching rotation around the left to right horizontal axis.
Therefore, first aligning the images in the horizontal plane
by first optimizing the in-plane parameters (t_x, t_y, φ_z) may
facilitate the optimization of the out-of-plane parameters
(φ_x, φ_y, t_z). However, as the optimization proceeds, the
Powell algorithm may introduce other optimization direc-
tions and change the order in which these are considered.
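The search loop can be sketched with SciPy's Powell method as a stand-in for the Numerical Recipes routine [18] the authors used; the quadratic cost below is a made-up placeholder for −I(α) (Powell minimizes, so the MI criterion would be negated), and the parameter ordering follows the in-plane-first strategy described above:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder optimum; alpha = (tx, ty, phi_z, phi_x, phi_y, tz), with
# the better-constrained in-plane parameters listed first.
alpha_true = np.array([7.0, 1.1, 2.0, 9.6, -3.1, 18.2])

def neg_mi(alpha):
    """Stand-in cost for -I(alpha): smooth, with a single optimum."""
    return np.sum((alpha - alpha_true) ** 2)

# Initial position: image centers and scan axes aligned, all parameters 0.
alpha0 = np.zeros(6)
res = minimize(neg_mi, alpha0, method="Powell",
               options={"xtol": 1e-5, "ftol": 1e-3})
assert np.allclose(res.x, alpha_true, atol=1e-2)
```

Powell's direction set is initialized to unit vectors in each parameter direction by default, matching the initialization described above.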
D. Complexity
The algorithm was implemented on an IBM RS/6000
workstation (AIX 4.1.3, 58 MHz, 185 SPECfp92; source
code is available on request). The computation time re-
quired for one evaluation of the MI criterion varies linearly
with the number of samples taken from the floating im-
age. While trilinear and partial volume interpolation have
nearly the same complexity (1.4 seconds per million sam-
ples), nearest neighbour interpolation is about three times
as efficient (0.5 seconds per million samples). The num-
ber of criterion evaluations performed during optimization
typically varies between 200 and 600, depending on the
initial position of the images, on the order in which the pa-
rameters are optimized and on the convergence parameters
specified for the Brent and Powell algorithm.
IV. Experiments
The performance of the MI registration criterion was
evaluated for rigid body registration of MR, CT and PET
images of the brain of the same patient. The rigid body
assumption is well satisfied inside the skull in 3-D scans
of the head if patient related changes (due to for instance
inter-scanning operations) can be neglected, provided that
scanner calibration problems and problems of geometric
distortions have been minimized by careful calibration and
scan parameter selection respectively. Registration accu-
racy is evaluated in section V by comparison with external
marker-based registration results and other retrospective
registration methods, while the robustness of the method
is evaluated in section VI with respect to implementation
issues, such as sampling, interpolation and optimization,
and image content, including image degradations, such as
noise, intensity inhomogeneities and distortion, and par-
tial image overlap. Four different datasets are used in the
experiments described below (table II). Dataset A¹ con-
tains high resolution MR and CT images, while dataset B
was obtained by smoothing and subsampling the images of
dataset A to simulate lower resolution data. Dataset C²
contains stereotactically acquired MR, CT and PET im-
ages, which have been edited to remove stereotactic mark-
ers. Dataset D contains an MR image only and is used to
illustrate the effect of various image degradations on the
registration criterion. All images consist of axial slices and
in all cases the x axis is directed horizontally right to left,
the y axis horizontally front to back and the z axis ver-
tically up, such that the image resolution is lowest in the
z direction. In all experiments, the joint histogram size is
256 × 256, while the fractional precision convergence pa-
rameters for the Brent and Powell optimization algorithm
are set to 10⁻³ and 10⁻⁵ respectively [18].
TABLE II
Datasets used in the experiments of sections V and VI.

Set  Image  Size         Voxels (mm)    Range
A    MR     256² × 180   0.98² × 1.00   0–4094
     CT     256² × 100   0.94² × 1.55   0–4093
B    MR     200² × 45    1.25² × 4.00   38–2940
     CT     192² × 39    1.25² × 4.00   0–2713
C    MR     256² × 24    1.25² × 4.00   2–2087
     CT     512² × 29    0.65² × 4.00   0–2960
     PET    128² × 15    2.59² × 8.00   0–683
D    MR     256² × 30    1.33² × 4.00   2–3359
V. Accuracy
The images of datasets A, B and C were registered using
the MI registration criterion with different choices of the
floating image and using different interpolation schemes. In
each case the same optimization strategy was used, starting
from all parameters initially equal to zero and optimizing
the parameters in the order (t_x, t_y, φ_z, φ_x, φ_y, t_z). The
results are summarized in table III by the parameters of the
transformation that takes the MR image as the reference
image. Optimization required 300 to 500 evaluations of the
MI criterion, which was performed on an IBM RS6000/3AT
workstation using PV interpolation in about 20 minutes for
CT to MR matching of dataset A (40 minutes for MR to
CT matching) and in less than 2 minutes for PET to MR
matching of dataset C.

¹ Data provided by P.A. van den Elsen [25].
² Data provided by J.M. Fitzpatrick [10].
The images of dataset A have been registered by van
den Elsen [25] using a correlation-based VSB registration
method. Visual inspection showed this result to be more
accurate than skin marker based registration and we use
it as a reference to validate registration accuracy of the
MI criterion for datasets A and B. For dataset C, we com-
pare our results with the stereotactic registration solution
provided by Fitzpatrick [10]. The difference between the
reference and each of the MI registration solutions was eval-
uated at 8 points near the brain surface (figure 4). The
reference solutions and the mean and the maximal abso-
lute transformed coordinate differences measured at these
points are included in table III.
Fig. 4. The bounding box of the central eighth of the floating image
defines 8 points near the brain surface at which the difference
between different registration transforms is evaluated.
The solutions obtained for dataset A and for dataset
B using different interpolation schemes or for a different
choice of the floating image are all very similar. For dataset
A, the largest differences with the reference solutions oc-
cur for rotation around the x axis (0.7 degrees), but these
are all subvoxel. For dataset B, the differences are some-
what larger, especially in the y direction due to an offset
in the y translation parameter (0.8 mm). However, these
translational differences may have been caused by interpo-
lation and subsampling artifacts introduced when creating
the images of dataset B.
For dataset C, CT to MR registration using TRI inter-
polation did not converge to the reference solution. In this
case, CT to MR registration performs clearly worse than
MR to CT registration, for which all differences are sub-
voxel, the largest being 1.2 mm in the y direction for the
solution obtained using PV interpolation due to a 1 degree
offset for the x rotation parameter. For MR to PET as well
as for PET to MR registration, PV interpolation yields the
smallest differences with the stereotactic reference solution,
especially in the z direction, which are all subvoxel with re-
spect to the voxelsizes of the PET image in case of MR to
PET registration. Relatively large differences occur in the
y direction due to offsets in the y translation parameter of
about 1 to 2 mm.
TABLE III
Reference and MI registration parameters for datasets A, B and C and the mean
and maximal absolute difference evaluated at 8 points near the brain surface.
Set  F/R              Rotation (degrees)       Translation (mm)         Difference (mm)
                      x      y      z          x      y      z          x           y           z
A Reference [25] 9.62 -3.13 2.01 7.00 1.14 18.15
CT/MR NN 10.23 -3.23 2.10 6.98 1.00 18.24 0.09 (0.18) 0.40 (0.79) 0.63 (0.84)
TRI 10.24 -3.21 2.08 6.97 1.05 18.22 0.08 (0.16) 0.40 (0.72) 0.63 (0.80)
PV 10.36 -3.17 2.09 6.94 1.15 18.20 0.08 (0.17) 0.48 (0.76) 0.76 (0.89)
MR/CT NN 10.24 -3.17 2.09 6.95 1.04 18.18 0.08 (0.16) 0.41 (0.74) 0.64 (0.74)
TRI 10.24 -3.15 2.07 6.92 1.00 18.23 0.08 (0.15) 0.41 (0.76) 0.64 (0.80)
PV 10.39 -3.14 2.09 6.90 1.15 18.18 0.10 (0.18) 0.51 (0.77) 0.79 (0.94)
B Reference [25] 9.62 -3.13 2.01 7.00 1.14 18.15
CT/MR NN 10.02 -3.42 2.25 6.63 0.34 18.28 0.40 (0.83) 0.80 (1.45) 0.43 (0.84)
TRI 10.27 -3.11 2.05 6.53 0.54 18.34 0.48 (0.54) 0.61 (1.22) 0.67 (0.99)
PV 10.57 -3.17 2.11 6.60 0.62 18.36 0.40 (0.53) 0.68 (1.47) 0.97 (1.32)
MR/CT NN 10.17 -3.06 2.25 6.47 0.30 17.90 0.54 (0.84) 0.84 (1.57) 0.57 (1.03)
TRI 10.03 -3.05 2.22 6.44 0.37 18.19 0.56 (0.84) 0.77 (1.34) 0.42 (0.64)
PV 10.29 -3.16 2.08 6.48 0.33 17.95 0.52 (0.61) 0.81 (1.48) 0.69 (0.98)
C Reference [10] -0.63 0.05 4.74 26.15 -41.08 -12.35
CT/MR NN 0.87 0.05 4.84 26.70 -40.67 -9.92 0.54 (0.70) 0.74 (1.33) 2.43 (4.80)
TRI 1.21 -1.94 3.67 29.51 -39.78 43.61 ---
PV -0.00 0.00 4.95 26.57 -40.72 -10.00 0.41 (0.77) 0.49 (1.00) 2.35 (3.28)
MR/CT NN -0.21 0.00 4.95 26.56 -41.27 -12.01 0.41 (0.76) 0.35 (0.71) 0.62 (0.98)
TRI -0.51 0.25 5.03 26.35 -40.80 -11.84 0.42 (0.75) 0.43 (0.79) 0.51 (0.95)
PV -1.58 0.13 4.97 26.48 -41.39 -12.18 0.35 (0.73) 0.56 (1.18) 1.38 (1.57)
C Reference [10] 1.52 -1.17 4.22 27.62 -2.60 -4.46
PET/MR NN 0.70 0.26 5.20 27.57 -0.74 -5.08 1.40 (2.28) 1.82 (3.66) 1.97 (3.91)
TRI 0.38 0.01 5.25 27.50 -1.29 -1.37 1.47 (2.31) 1.62 (3.34) 3.22 (6.46)
PV 1.63 0.18 4.98 27.65 -0.46 -4.94 1.09 (1.83) 2.14 (3.32) 1.97 (2.46)
MR/PET NN 0.42 0.14 5.04 27.93 -1.28 -5.03 1.17 (2.16) 1.47 (3.00) 2.00 (4.03)
TRI 0.16 -0.11 4.90 27.99 -1.60 -4.27 0.98 (1.90) 1.27 (2.59) 2.05 (3.66)
PV 1.46 -0.34 4.71 27.94 -0.85 -4.49 0.72 (1.44) 1.74 (2.49) 1.19 (1.37)
VI. Robustness
A. Interpolation and optimization
The robustness of the MI registration criterion with re-
spect to interpolation and optimization was evaluated for
dataset A. The images were registered using either the CT
or the MR volume as the floating image and using differ-
ent interpolation methods. For each combination, various
optimization strategies were tried by changing the order in
which the parameters were optimized, each starting from
the same initial position with all parameters set to 0.
The results are summarized in figure 5. These scatter
plots compare each of the solutions found (represented by
their registration parameters α) with the one for which the
MI registration measure was maximal (denoted by α*) for
each of the interpolation methods separately, using either
the CT or the MR image as the floating image. Differ-
ent solutions are classified by the norm of the registration
parameter difference vector |α − α*| on the horizontal axis
(using mm and degrees for the translation and rotation pa-
rameters respectively) and by the difference in the value of
the MI criterion (MI(α*) − MI(α)) on the vertical axis.
Although the differences are small for each of the interpo-
lation methods used, MR to CT registration seems to be
somewhat more robust than CT to MR registration. More
importantly, the solutions obtained using PV interpolation
are much more clustered than those obtained using NN
or TRI interpolation, indicating that the use of PV in-
terpolation results in a much smoother behaviour of the
registration criterion. This is also apparent from traces
in registration space computed around the optimal solu-
tion for NN, TRI and PV interpolation (figure 6). These
traces look very similar when a large parameter range is
considered, but in the neighbourhood of the registration
solution, traces obtained with NN and TRI interpolation
are noisy and show many local maxima, while traces ob-
tained with PV interpolation are almost quadratic around
the optimum. Remark that the MI values obtained using
TRI interpolation are larger than those obtained using NN
or PV interpolation, which can be interpreted according
to (2): the trilinear averaging and noise reduction of the
reference image intensities resulted in a larger reduction of
the complexity of the joint histogram than the correspond-
ing reduction in the complexity of the reference image his-
togram itself.
B. Subsampling
The computational complexity of the MI criterion is pro-
portional to the number of samples that is taken from the
floating image to compute the joint histogram. Subsam-
pling of the floating image can be applied to increase speed
performance, as long as this does not deteriorate the op-
timization behaviour. This was investigated for dataset A
by registration of the subsampled MR image with the orig-
inal CT image using PV interpolation. Subsampling was
performed by taking samples on a regular grid at sample intervals of f_x, f_y and f_z voxels in the x, y and z direction
Fig. 5. Evaluation of MI registration robustness for dataset A, with either the CT (left) or the MR (right) image as the floating image. Horizontal axis: norm of the difference vector |α − α*| for different optimization strategies using NN, TRI and PV interpolation. α* corresponds to the registration solution with the highest MI value for each interpolation method. Vertical axis: difference in MI value between each solution and α*.
Fig. 6. MI traces around the optimal registration position for dataset A obtained for rotation around the x axis in the range from −180 to +180 degrees (a) and from −0.5 to +0.5 degrees using NN (b), TRI (c) and PV (d) interpolation.
respectively using nearest neighbour interpolation. No av-
eraging or smoothing of the MR image before subsampling
was applied. We used f_x = f_y = 1, 2, 3 or 4, and f_z = 1, 2, 3 or 4. The same optimization strategy was used in each case. Registration solutions α obtained using subsampling were compared with the solution α* found when no subsampling was applied (figure 7). For subsampling factors f = f_x × f_y × f_z up to 48 (4 in the x and y direction, 3 in the z direction) the optimization converged in about 4 minutes to a solution less than 0.2 degrees and 0.2 mm off from the solution found without subsampling.
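The regular-grid subsampling described above can be sketched as follows. This is an illustrative fragment (the nested-list volume layout and names are our own, not the authors' data structures):

```python
def subsample(volume, fx, fy, fz):
    """Take samples on a regular grid at intervals of fx, fy, fz voxels.

    volume is a nested list indexed as volume[z][y][x]; no averaging or
    smoothing is applied before subsampling, matching the experiment in
    the text (the original voxel values are picked directly).
    """
    return [[[volume[z][y][x]
              for x in range(0, len(volume[0][0]), fx)]
             for y in range(0, len(volume[0]), fy)]
            for z in range(0, len(volume), fz)]

# Toy 8 x 8 x 6 volume whose voxel value encodes its own coordinates.
vol = [[[x + 10 * y + 100 * z for x in range(8)] for y in range(8)]
       for z in range(6)]
sub = subsample(vol, 4, 4, 3)  # subsampling factor f = 4 x 4 x 3 = 48
print(len(sub), len(sub[0]), len(sub[0][0]))  # → 2 2 2
```

Only the subsampled floating-image voxels then contribute to the joint histogram, so the cost of one MI evaluation drops by roughly the factor f.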
C. Partial overlap
Clinically acquired images typically only partially over-
lap, as CT scanning is often confined to a specific region to
minimize the radiation dose, while MR protocols frequently
image larger volumes. The influence of partial overlap on
the registration robustness was evaluated for dataset A for
CT to MR registration using PV interpolation. The images
were initially aligned as in the experiment in section V and
the same optimization strategy was applied, but only part
of the CT data was considered when computing the MI criterion. More specifically, three 50-slice slabs were selected at
Fig. 7. Effect of subsampling the MR floating image of dataset A on the registration solution. Subsampling factor f vs. the norm of the difference vector |α − α*|. α* corresponds to the registration solution obtained when no subsampling is applied.
the bottom (the skull base), the middle and the top part of the dataset. The results are summarized in table IV and compared with the solution found using the full dataset by the mean and maximal absolute difference evaluated over the full image at the same 8 points as in section V. The largest parameter differences occur for rotation around the x axis and translation in the z direction, resulting in maximal coordinate differences up to 1.5 CT voxels in the y and z direction, but on average all differences are subvoxel with respect to the CT voxel sizes.
D. Image degradation
Various MR image degradation effects, such as noise,
intensity inhomogeneity and geometric distortion, alter the
intensity distribution of the image, which may affect the MI
registration criterion. This was evaluated for the MR image
of dataset D by comparing MI registration traces obtained
for the original image and itself with similar traces obtained
for the original image and its degraded version (figure 8).
Such traces computed for translation in the x direction are
shown in figure 9.
Noise. The original MR data ranges from 2 to 3359 with
mean 160. White zero-mean Gaussian noise with variance
of 50, 100 and 500 was superimposed onto the original im-
age. Figure 9b shows that increasing the noise level de-
creases the mutual information between the two images
without affecting the MI criterion, as the position of maxi-
mal MI in traces computed for all 6 registration parameters
is not changed when the amount of noise is increased.
Intensity inhomogeneity. To simulate the effect of MR intensity inhomogeneities on the registration criterion, the original MR image intensity I was altered into I′ using a slice-by-slice planar quadratic inhomogeneity factor:

log I′(x, y) = log I(x, y) + ∆ log I(x, y)   (14)
∆ log I(x, y) = −k ((x − x_c)² + (y − y_c)²)   (15)

with (x_c, y_c) being the image coordinates of the point around which the inhomogeneity is centered and k a scale factor. Figure 9c shows MI traces for different values of k (k = 0.001, 0.002, 0.004; x_c = y_c = 100). All traces for all parameters reach their maximum at the same position and the MI criterion is not affected by the presence of the inhomogeneity.
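The bias model of (14)-(15) can be sketched per slice as follows. This is an illustrative fragment; the image size and constant test intensity are our own assumptions:

```python
import math

def apply_inhomogeneity(slice_img, k, xc, yc):
    """Apply log I'(x,y) = log I(x,y) - k((x-xc)^2 + (y-yc)^2), i.e.
    multiply each intensity by exp(-k((x-xc)^2 + (y-yc)^2))."""
    return [[i * math.exp(-k * ((x - xc) ** 2 + (y - yc) ** 2))
             for x, i in enumerate(row)]
            for y, row in enumerate(slice_img)]

# Constant 200 x 200 test slice: the center stays unattenuated, while
# intensities fall off quadratically in log-space away from (xc, yc).
img = [[100.0] * 200 for _ in range(200)]
out = apply_inhomogeneity(img, k=0.001, xc=100, yc=100)
print(round(out[100][100], 1), round(out[100][150], 1))  # → 100.0 8.2
```

Because the factor varies smoothly and monotonically within each slice, it reshapes the marginal intensity distribution without moving the position of the MI optimum, consistent with the traces in figure 9c.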
Geometric distortion. Geometric distortions ∆x, ∆y, ∆z were applied to the original MR image according to a slice-by-slice planar quadratic model of the magnetic field inhomogeneity [17]:

∆x = k ((x − x_c)² + (y − y_c)²)   (16)
∆y = ∆z = 0   (17)
∆i(x, y) = |2k(x − x_c)| i(x + ∆x, y + ∆y)   (18)

with (x_c, y_c) the image coordinates of the center of each image plane and k a scale parameter. Figure 9d shows traces of the registration criterion for various amounts of distortion (k = 0.0001, 0.0005, 0.00075). As expected, the distortion shifts the optimum of the x translation parameter proportionally to the average distortion. No such shift occurred in traces obtained for all other parameters.
VII. Discussion
The mutual information registration criterion presented
in this paper assumes that the statistical dependence be-
tween corresponding voxel intensities is maximal if both
images are geometrically aligned. Because no assumptions
are made regarding the nature of this dependence, the MI
criterion is highly data independent and allows for robust
and completely automatic registration of multi-modality
images in various applications with minimal tuning and
without any prior segmentation or other pre-processing
steps. The results of section V demonstrate that sub-
voxel registration differences with respect to the stereo-
tactic registration solution can be obtained for CT/MR
and PET/MR matching without using any prior knowl-
edge about the grey-value content of both images and the
correspondence between them. Additional experiments on
9 other datasets similar to dataset C within the Retro-
spective Registration Evaluation Project by Fitzpatrick et
al. [10] have verified these results [29], [14]. Moreover, sec-
tion VI-C demonstrated the robustness of the method with
respect to partial overlap, while it was shown in section VI-
D that large image degradations, such as noise and inten-
sity inhomogeneities, have no significant influence on the
MI registration criterion.
Estimations of the image intensity distributions were ob-
tained by simple normalization of the joint histogram. In
all experiments discussed in this paper, the joint histogram
was computed from the entire overlapping part of both im-
ages, using the original image data and a fixed number of
bins of 256 × 256. We have not evaluated the influence
of the bin size, the choice of a region of interest or the
application of non-linear image intensity transformations
on the behaviour of the MI registration criterion. Other
schemes can be used to estimate the image intensity distri-
butions, for instance by using Parzen windowing [9] on a
set of samples taken from the overlapping part of both im-
ages. This approach was used by Viola et al. [27], who also
TABLE IV
Influence of partial overlap on the registration robustness for CT to MR registration of dataset A.

ROI      Slices   Rotation (degrees)        Translation (mm)          Difference (mm)
                  x       y       z         x      y      z           x            y            z
Full     0–99     10.36   -3.17   2.09      6.94   1.15   18.20
Bottom   0–49     10.14   -2.91   2.03      6.67   1.30   19.46       0.28 (0.54)  0.21 (0.46)  1.26 (1.78)
Middle   25–74    9.46    -2.53   2.13      6.67   0.71   17.75       0.43 (0.79)  0.62 (1.31)  1.01 (2.14)
Top      50–99    9.74    -3.05   2.43      6.86   0.82   17.59       0.35 (0.52)  0.52 (1.13)  0.69 (1.46)
Fig. 8. a) Slice 15 of the MR image of dataset D; b) with zero mean Gaussian noise (variance = 500); c) with quadratic inhomogeneity (k = 0.004); d) with geometric distortion (k = 0.00075).
Fig. 9. MI traces using PV interpolation for translation in the x direction of the original MR image of dataset D over its degraded version in the range from −10 to +10 mm: a) original; b) noise; c) intensity inhomogeneity; d) geometric distortion.
use stochastic sampling of the floating image to increase
speed performance.
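For comparison, a Parzen-window estimate of a one-dimensional intensity distribution from a set of samples might look like this. It is a sketch with a Gaussian kernel; the sample values and kernel width are our own illustrative choices, and the paper itself uses histogram normalization instead:

```python
import math

def parzen_density(samples, sigma):
    """Return p(x): a smooth density estimate obtained by centering a
    Gaussian kernel of width sigma on each sample and averaging."""
    norm = 1.0 / (len(samples) * sigma * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / sigma) ** 2)
                                for s in samples)

p = parzen_density([10, 12, 30], sigma=2.0)
print(p(11) > p(20))  # density is higher near the cluster of samples → True
```

Because the estimate is differentiable in its argument, it also lends itself to the gradient-based optimization mentioned below.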
Partial volume interpolation was introduced to make the
joint and marginal distributions and their mutual infor-
mation vary smoothly for small changes in the registra-
tion parameters. The results of section VI-A indicate that
PV interpolation indeed improves optimization robustness
compared to nearest neighbour and trilinear interpolation.
More experiments are needed to compare this approach to
the Parzen windowing method as used by Viola et al. [27]
and the multi-resolution cubic resampling approach as used
by Studholme et al. [20].
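In one dimension, the partial volume update of the joint histogram can be sketched as follows. This is a 1-D illustration of the idea only; the paper applies it in 3-D, distributing each sample over the eight nearest reference voxels with trilinear weights:

```python
def pv_update(hist, a, b_left, b_right, w):
    """Distribute one floating-image sample with intensity a over the two
    nearest reference voxels (intensities b_left, b_right) using the
    interpolation weight w in [0, 1], instead of interpolating a new
    intensity value. The joint histogram entries hist[(a, b)] then vary
    smoothly as w varies with the registration parameters."""
    hist[(a, b_left)] = hist.get((a, b_left), 0.0) + (1.0 - w)
    hist[(a, b_right)] = hist.get((a, b_right), 0.0) + w

hist = {}
pv_update(hist, a=7, b_left=3, b_right=4, w=0.25)
print(hist)  # → {(7, 3): 0.75, (7, 4): 0.25}
```

No new (interpolated) intensity values are introduced into the histogram, which is what keeps the MI criterion smooth near the optimum, in contrast to NN and TRI interpolation.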
The optimization of the MI registration criterion is per-
formed using Powell’s method. We noticed that for low
resolution images the initial order in which the parame-
ters are optimized strongly influences optimization robust-
ness. Generally, we obtained the best results when first
optimizing the in-plane parameters t_x, t_y and φ_z, before optimizing the out-of-plane parameters φ_x, φ_y and t_z. For
low resolution images, the optimization often did not con-
verge to the global optimum if a different parameter order
was specified, due to the occurrence of local optima espe-
cially for the x rotation and the z translation parameters.
In the experiments discussed in this paper the amount of
misregistration that was recovered was as large as 10 de-
grees and 40 mm, but we have not extensively investigated
the robustness of the method with respect to the initial
positioning of the images, for instance by using multiple
randomised starting estimates. The choice of the floating
image may also influence the behaviour of the registration
criterion. In the experiment of section VI-A, MR to CT
matching was found to be more robust than CT to MR
matching. However, it is not clear whether this was caused
by sampling and interpolation issues or by the fact that the
MR image is more complex than the CT image and that
the spatial correlation of image intensity values is higher in
the CT image than in the MR image.
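The effect of the parameter ordering can be illustrated with a simplified per-axis line search. This is a toy stand-in, not Powell's method itself (which additionally constructs conjugate search directions), and the quadratic test criterion is our own:

```python
def cyclic_line_search(f, x, order, step=0.1, sweeps=50):
    """Minimize f by repeatedly improving one parameter at a time,
    visiting the parameters in the given order (e.g. in-plane
    tx, ty, phi_z before out-of-plane phi_x, phi_y, tz)."""
    x = list(x)
    for _ in range(sweeps):
        for i in order:
            for delta in (step, -step):
                while True:  # walk along axis i while f keeps decreasing
                    trial = list(x)
                    trial[i] += delta
                    if f(trial) < f(x):
                        x = trial
                    else:
                        break
    return x

# Toy criterion with minimum at (1, 2, 0, 0, 0, 0): tx, ty, phi_z, phi_x, phi_y, tz.
f = lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2 + sum(q ** 2 for q in p[2:])
sol = cyclic_line_search(f, [0.0] * 6, order=[0, 1, 2, 3, 4, 5])
print([round(v, 1) for v in sol])  # → [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
```

On this separable toy function any ordering converges; the ordering only matters when the criterion has local optima along some axes, as reported for the x rotation and z translation of low resolution images.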
We have not tuned the design of the search strategy to-
wards specific applications. For instance, the number of
criterion evaluations required may be decreased by tak-
ing the limited image resolution into account when deter-
mining convergence. Moreover, the results of section VI-B
demonstrate that for high resolution images subsampling
of the floating image can be applied without deteriorating
optimization robustness. Important speed-ups can thus be
realized by using a multi-resolution optimization strategy,
starting with a coarsely sampled image for efficiency and
increasing the resolution as the optimization proceeds for
accuracy [20]. Furthermore, the smooth behaviour of the
MI criterion, especially when using PV interpolation, may
be exploited by using gradient-based optimization meth-
ods, as explicit formulas for the derivatives of the MI func-
tion with respect to the registration parameters can be ob-
tained [27].
All the experiments discussed in this paper were for rigid
body registration of CT, MR and PET images of the brain
of the same patient. However, it is clear that the MI crite-
rion can equally well be applied to other applications, using
more general geometric transformations. We have used the
same method successfully for patient-to-patient matching
of MR brain images for correlation of functional MR data
and for the registration of CT images of a hardware phan-
tom to its geometrical description to assess the accuracy of
spiral CT imaging [14].
Mutual information measures statistical dependence by
comparing the complexity of the joint distribution with
that of the marginals. Both marginal distributions are
taken into account explicitly, which is an important differ-
ence with the measures proposed by Hill et al. [13] (third or-
der moment of the joint histogram) and Collignon et al. [6]
(entropy of the joint histogram), which focus on the joint
histogram only. In appendices A and B we discuss the re-
lationship of these criteria and of the measure of Woods et
al. [30] (variance of intensity ratios) to the mutual infor-
mation criterion.
Mutual information is only one of a family of mea-
sures of statistical dependence or information redundancy
(see appendix C). We have experimented with ρ(A, B) = H(A, B) − I(A, B), which can be shown to be a metric [8], and ECC(A, B) = 2 I(A, B)/(H(A) + H(B)), the
Entropy Correlation Coefficient [1]. In some cases, these
measures performed better than the original MI criterion,
but we could not establish a clear preference for either
of these. Furthermore, the use of mutual information for
multi-modality image registration is not restricted to the
original image intensities only: other derived features, such
as edges or ridges, can be used as well. Selection of appro-
priate features is an area for further research.
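Under the same histogram-based estimates, the two alternative measures mentioned above can be written as follows. This is an illustrative sketch reusing the entropy definitions, not the authors' code:

```python
import math

def entropies(joint_hist):
    """Return H(A), H(B), H(A,B) in bits from a 2-D joint histogram."""
    total = sum(sum(row) for row in joint_hist)
    def h(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)
    h_a = h(sum(row) / total for row in joint_hist)
    h_b = h(sum(col) / total for col in zip(*joint_hist))
    h_ab = h(c / total for row in joint_hist for c in row)
    return h_a, h_b, h_ab

def rho(joint_hist):
    """rho(A,B) = H(A,B) - I(A,B) = 2 H(A,B) - H(A) - H(B), a metric [8];
    it is minimized rather than maximized."""
    h_a, h_b, h_ab = entropies(joint_hist)
    return 2 * h_ab - h_a - h_b

def ecc(joint_hist):
    """Entropy Correlation Coefficient: 2 I(A,B) / (H(A) + H(B)), in [0, 1]."""
    h_a, h_b, h_ab = entropies(joint_hist)
    return 2 * (h_a + h_b - h_ab) / (h_a + h_b)

aligned = [[50, 0], [0, 50]]       # perfectly dependent: rho = 0, ECC = 1
print(rho(aligned), ecc(aligned))  # → 0.0 1.0
```

ECC normalizes MI by the marginal entropies, which makes it less sensitive to changes in the region of overlap as the images move relative to each other.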
VIII. Conclusion
The mutual information registration criterion presented
in this paper allows for subvoxel accurate, highly robust
and completely automatic registration of multi-modality
medical images. Because the method is largely data inde-
pendent and requires no user interaction or pre-processing,
themethodiswellsuitedtobeusedinclinicalpractice.
Further research is needed to better understand the in-
fluence of implementation issues, such as sampling and in-
terpolation, on the registration criterion. Furthermore, the
performance of the registration method on clinical data can
be improved by tuning the optimization method to specific
applications, while alternative search strategies, including
multi-resolution and gradient-based methods, have to be
investigated. Finally, other registration criteria can be de-
rived from the one presented here, using alternative infor-
mation measures applied on different features.
Appendix A
We show the relationship between the multi-modality
registration criterion devised by Hill et al. [12] and the joint
entropy H(a, b). Hill et al. used the n-th order moment of
the scatter-plot h as a measure of dispersion:

T_n = Σ_{a,b} (h(a, b) / V)^n   (19)
with h(a, b) the histogram entries and V = Σ_{a,b} h(a, b) the common volume of overlap. Approximating the joint probability distribution p(a, b) by p(a, b) = h(a, b)/V, we get:

T_n = Σ_{a,b} p(a, b)^n

It turns out that T_n is one-to-one related to the joint Rényi entropy H_n of order n [22]:

H_n = (1/(1 − n)) log(T_n)
with the following properties:
• lim_{n→1} H_n(p) = −Σ_i p_i log p_i, which is the Shannon entropy.
• n_2 > n_1 → H_{n_2}(p) ≤ H_{n_1}(p)

Hence, the normalized second or third order moment criteria defined by Hill et al. are equivalent to a generalized version of the joint entropy H(a, b).
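Both properties can be checked numerically. This is an illustrative check on a small distribution of our own choosing, approximating the n → 1 limit by n slightly above 1:

```python
import math

def renyi(ps, n):
    """Rényi entropy of order n (in nats): H_n = log(sum p_i^n) / (1 - n)."""
    return math.log(sum(p ** n for p in ps)) / (1 - n)

def shannon(ps):
    """Shannon entropy (in nats): the n -> 1 limit of the Rényi entropy."""
    return -sum(p * math.log(p) for p in ps if p > 0)

p = [0.5, 0.25, 0.25]
print(round(renyi(p, 1.0001), 3), round(shannon(p), 3))  # nearly equal
# Monotonicity: n2 > n1 implies H_n2 <= H_n1.
print(renyi(p, 3) <= renyi(p, 2) <= renyi(p, 1.5))  # → True
```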
Appendix B
We show how the multi-modality registration criterion devised by Woods et al. [30] relates to the conditional entropy H(a|b). Denote by A and B the set of possible intensities in the two images. Denote by a_i and b_i the intensities of A and B at the common voxel position i. For each voxel i with value b_i = b in image B, let a_i(b) be the value at voxel i in the corresponding image A. Let µ_a(b) be the mean and σ_a(b) be the standard deviation of the set {a_i(b) | ∀i : b_i = b}. Let n_b = #{i | b_i = b} and N = Σ_b n_b. The registration criterion that Woods et al. minimize is then defined as follows:

σ = Σ_b (n_b / N) (σ_a(b) / µ_a(b))   (20)
  = Σ_b p_b(b) (σ_a(b) / µ_a(b))   (21)

with p_b the marginal distribution function of image intensities B.

It can be shown [8] that for a given mean µ_a(b) and standard deviation σ_a(b)

H(A|B) = Σ_b p(b) H(A|B = b)   (22)
       = −Σ_b p(b) Σ_a p(a|b) log p(a|b)   (23)
       ≤ Σ_b p(b) log(σ_a(b)) + (1/2) log(2πe)   (24)

with equality if the conditional distribution p(a|b) of image intensities A given B is the normal distribution N(µ_a(b), σ_a(b)).

Using Jensen's inequality for concave functions [8] we get

H(A|B) ≤ Σ_b p(b) log(σ_a(b) / µ_a(b)) + Σ_b p(b) log(µ_a(b))   (25)
       ≤ log(Σ_b p(b) σ_a(b) / µ_a(b)) + log(Σ_b p(b) µ_a(b))   (26)
       = log(σ) + log(µ(a))   (27)

with µ(a) = Σ_b p(b) µ_a(b) the mean intensity of image A.
If µ(a) is constant and p(a|b) can be assumed to be nor-
mally distributed, minimization of σ
then amounts to op-
timizing the conditional entropy H(A|B). In the approach
of Woods, this assumption is approximately accomplished
by editing away parts in one dataset (namely the skin in
MR) for which otherwise additional modes might occur in
p(a|b), while Hill et al. have proposed to take only specifi-
cally selected regions in the joint histogram into account.
Appendix C
Mutual Information I(A, B) is only one example of the more general f-information measures of dependence f(P || P_1 × P_2) [22], with P the set of joint probability distributions P(A, B) and P_1 × P_2 the set of joint probability distributions P(A).P(B) assuming A and B to be independent.

f-information is derived from the concept of f-divergence, which is defined as:

f(P || Q) = Σ_i q_i f(p_i / q_i)

with P = {p_1, p_2, ...} and Q = {q_1, q_2, ...}, with suitable definitions when q_i = 0.
Some examples of f-divergence are:
• I_α-divergence:

I_α = (1 / (α(α − 1))) (Σ_i p_i^α / q_i^(α−1) − 1)

• χ²-divergence:

χ² = Σ_i (p_i − q_i)² / q_i
with corresponding f-informations:
• I_α-information:

I_α(P || P_1 × P_2) = (1 / (α(α − 1))) (Σ_{i,j} p_ij^α / (p_i. p_.j)^(α−1) − 1)

with p_ij = P(i, j), p_i. = Σ_j p_ij and p_.j = Σ_i p_ij.
• χ²-information:

χ²(P || P_1 × P_2) = Σ_{i,j} (p_ij − p_i. p_.j)² / (p_i. p_.j)
Note that I_α(P || P_1 × P_2) is the information-measure counterpart of the n-th order moment used by Hill et al. for n = α = 2, 3. Furthermore,

I_1(P || P_1 × P_2) = Σ_{i,j} p_ij log(p_ij / (p_i. p_.j))

which is the definition of Mutual Information used in this paper.
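Numerically, I_α approaches the mutual information as α → 1. The small joint distribution below is our own illustrative example (entropies in nats):

```python
import math

# Joint distribution p_ij on a 2 x 2 intensity alphabet.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
pi = {i: sum(v for (a, _), v in p.items() if a == i) for i in (0, 1)}  # p_i.
pj = {j: sum(v for (_, b), v in p.items() if b == j) for j in (0, 1)}  # p_.j

def i_alpha(alpha):
    """I_alpha-information of p with respect to the product of its marginals."""
    s = sum(v ** alpha / (pi[i] * pj[j]) ** (alpha - 1)
            for (i, j), v in p.items())
    return (s - 1) / (alpha * (alpha - 1))

# Mutual information I(A,B) = sum_ij p_ij log(p_ij / (p_i. p_.j)).
mi = sum(v * math.log(v / (pi[i] * pj[j])) for (i, j), v in p.items())
print(round(i_alpha(1.0001), 4), round(mi, 4))  # nearly equal (nats)
```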
References
[1] J. Astola, and I. Virtanen, “Entropy correlation coefficient, a
measure of statistical dependence for categorized data,” Proc. of
the Univ. of Vaasa, Discussion Papers, no. 44, Finland, 1982.
[2] J.A. Baddeley, “An error metric for binary images,” Proc. IEEE
Workshop on Robust Computer Vision, pp. 59-78, Bonn, 1992.
[3] L.G. Brown, “A survey of image registration techniques,” ACM
Computing Surveys, vol. 24, no. 4, pp. 325-376, Dec. 1992.
[4] C-H. Chen, Statistical Pattern Recognition, Rochelle Park, N.J.:
Spartan Books, Hayden Book Company, 1973.
[5] J.Y. Chiang, and B.J. Sullivan, “Coincident bit counting - a new
criterion for image registration,” IEEE Trans. Medical Imaging,
vol. 12, no. 1, pp. 30-38, March 1993.
[6] A. Collignon, D. Vandermeulen, P. Suetens, and G. Marchal,
“3D multi-modality medical image registration using feature
space clustering,” Proc. First Int’l Conf. Computer Vision, Vir-
tual Reality and Robotics in Medicine, N. Ayache, ed., pp. 195-
204, Lecture Notes in Computer Science 905, Springer, April
1995.
[7] A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens,
and G. Marchal, “Automated multimodality medical image reg-
istration using information theory,” Proc. XIV’th Int’l Conf. In-
formation Processing in Medical Imaging, Y. Bizais, C. Barillot,
and R. Di Paola, eds., pp. 263-274, Computational Imaging and
Vision 3, Kluwer Academic Publishers, June 1995.
[8] T.M. Cover, and J.A. Thomas, Elements of Information Theory,
New York, N.Y.: John Wiley & Sons, 1991.
[9] R.O. Duda, and P.E. Hart, Pattern Classification and Scene
Analysis, New York, N.Y.: John Wiley & Sons, 1973.
[10] J.M. Fitzpatrick, Principal Investigator, Evaluation of Ret-
rospective Image Registration, National Institutes of Health,
Project Number 1 R01 NS33926-01, Vanderbilt University,
Nashville, TN, 1994.
[11] P. Gerlot-Chiron, and Y. Bizais, “Registration of multimodality
medical images using region overlap criterion,” CVGIP: Graph-
ical Models and Image Processing, vol. 54, no. 5, pp. 396-406,
Sept. 1992.
[12] D.L.G. Hill, D.J. Hawkes, N.A. Harrison, and C.F. Ruff, “A
strategy for automated multimodality image registration in-
corporating anatomical knowledge and imager characteristics,”
Proc. XIII’th Int’l Conf. Information Processing in Medical
Imaging, H.H. Barrett, and A.F. Gmitro, eds., pp. 182-196,
Lecture Notes in Computer Science 687, Springer-Verlag, June
1993.
[13] D.L.G. Hill, C. Studholme, and D.J. Hawkes, “Voxel similarity
measures for automated image registration,” Proc. Visualization
in Biomedical Computing 1994, SPIE, vol. 2359, pp. 205-216,
1994.
[14] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P.
Suetens, “Multi-modality image registration by maximization
of mutual information,” Proc. IEEE Workshop Mathematical
Methods in Biomedical Image Analysis, pp. 14-22, San Fran-
cisco, CA, June 1996.
[15] J.B.A. Maintz, P.A. van den Elsen, and M.A. Viergever, “Com-
parison of feature-based matching of CT and MR brain images,”
Proc. First Int’l Conf. Computer Vision, Virtual Reality and
Robotics in Medicine, N. Ayache, ed., pp. 219-228, Lecture Notes
in Computer Science 905, Springer, April 1995.
[16] C.R. Maurer, and J.M. Fitzpatrick, “A review of medical im-
age registration,” Interactive Image-Guided Neurosurgery, R.J.
Maciunas, ed., pp. 17-44, American Association of Neurological
Surgeons, 1993.
[17] J. Michiels, P. Pelgrims, H. Bosmans, D. Vandermeulen, J. Gy-
bels, G. Marchal, and P. Suetens, “On the problem of geometric
distortion in magnetic resonance images for stereotactic neuro-
surgery,” Magnetic Resonance Imaging, vol. 12, no. 5, pp. 749-
765, 1994.
[18] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling,
Numerical Recipes in C, Second Edition, Cambridge, England:
Cambridge University Press, 1992, chapter 10, pp. 412-419.
[19] T. Radcliffe, R. Rajapakshe, and S. Shalev, “Pseudocorrelation:
a fast, robust, absolute, grey-level image alignment algorithm,”
Med. Phys., vol. 21, no. 6, pp. 761-769, June 1994.
[20] C. Studholme, D.L.G. Hill, and D.J. Hawkes, “Multiresolu-
tion voxel similarity measures for MR-PET registration,” Proc.
XIV’th Int’l Conf. Information Processing in Medical Imaging,
Y. Bizais, C. Barillot, and R. Di Paola, eds., pp. 287-298, Com-
putational Imaging and Vision 3, Kluwer Academic Publishers,
June 1995.
[21] C. Studholme, D.L.G. Hill, and D.J. Hawkes, “Automated 3D
registration of truncated MR and CT images of the head,” Proc.
British Machine Vision Conf., D. Pycock, ed., pp. 27-36, Birm-
ingham, Sept. 1995.
[22] I. Vajda, Theory of Statistical Inference and Information, Dordrecht, The Netherlands: Kluwer Academic Publishers, 1989.
[23] P.A. van den Elsen, E-J.D. Pol, and M.A. Viergever, “Medical
image matching - a review with classification,” IEEE Eng. in
Medicine and Biology, pp. 26-38, March 1993.
[24] P.A. van den Elsen, J.B.A. Maintz, E-J.D. Pol, and M.A.
Viergever, “Automatic registration of CT and MR brain images
using correlation of geometrical features,” IEEE Trans. Medical
Imaging, vol. 14, no. 2, June 1995.
[25] P.A. van den Elsen, E-J.D. Pol, T.S. Sumanaweera, P.F. Hemler,
S. Napel, and J. Adler, “Grey value correlation techniques used
for automatic matching of CT and MR brain and spine images,”
Proc. Visualization in Biomedical Computing, SPIE, vol. 2359,
pp. 227-237, Oct. 1994.
[26] A. Venot, J.F. Lebruchec, and J.C. Roucayrol, “A new class
of similarity measures for robust image registration,” Computer
Vision, Graphics, and Image Processing, vol. 28, no. 2, pp. 176-
184, Nov. 1984.
[27] P. Viola, and W.M. Wells III, “Alignment by maximization of
mutual information,” Proc. Vth Int’l Conf. Computer Vision,
pp. 16-23, Cambridge, MA, June 1995.
[28] W.M. Wells III, P. Viola, H. Atsumi, S. Nakajima, and R. Kiki-
nis, “Multi-modal volume registration by maximization of mu-
tual information,” Medical Image Analysis, vol. 1, no. 1, pp. 35-
51, Mar. 1996.
[29] J. West, J.M. Fitzpatrick, M.Y. Wang, B.M. Dawant, C.R. Mau-
rer, Jr., R.M. Kessler, R.J. Maciunas et al., “Comparison and
evaluation of retrospective intermodality image registration tech-
niques,” Proc. Image Processing, SPIE, vol. 2710, pp. 332-347,
Feb. 1996.
[30] R.P. Woods, J.C. Mazziotta, and S.R. Cherry, “MRI-PET reg-
istration with automated algorithm,” Journal of Computer As-
sisted Tomography, vol. 17, no. 4, pp. 536-546, July/Aug. 1993.