Electronic Journal of Statistics
Vol. 5 (2011) 146–158
ISSN: 1935-7524
DOI: 10.1214/11-EJS602
On the mean and variance of the generalized inverse of a singular Wishart matrix*

R. Dennis Cook†
School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street SE, Minneapolis, MN 55455
e-mail: dennis@stat.umn.edu

and

Liliana Forzani
Instituto de Matemática Aplicada del Litoral and Facultad de Ingeniería Química, CONICET and UNL, Güemes 3450, (3000) Santa Fe, Argentina
e-mail: liliana.forzani@gmail.com

Abstract: We derive the first and the second moments of the Moore-Penrose generalized inverse of a singular standard Wishart matrix without relying on a density. Instead, we use the moments of an inverse Wishart distribution and an invariance argument which is related to the literature on tensor functions. We also find the order of the spectral norm of the generalized inverse of a Wishart matrix as its dimension and degrees of freedom diverge.

AMS 2000 subject classifications: Primary 62H05, secondary 62E15.
Keywords and phrases: Inverse Wishart distribution, Moore-Penrose generalized inverse, singular inverse Wishart distributions, tensor functions.

Received October 2010.
Contents

1 Introduction . . . 147
2 Moments of $W_p^-(I_p, n)$ . . . 148
3 Properties of $W_p^-(\Sigma, n)$ . . . 150
   3.1 Mean and variance . . . 150
   3.2 Order of $\|\Sigma^{1/2} W_p^-(\Sigma, n)\Sigma^{1/2}\|$ . . . 151
4 Proof of Theorem 2.1 . . . 152
   4.1 $E(W^-)$ . . . 152
   4.2 $E(W^- W^-)$ . . . 153
   4.3 $\mathrm{var}\{\mathrm{vec}(W^-)\}$ and $\mathrm{var}\{\mathrm{tr}(W^-)\}$ . . . 153
   4.4 Moments of higher order . . . 154
   4.5 Proof of Proposition 4.1 . . . 155
5 Proof of Theorem 3.1 . . . 157
References . . . 157

* The authors thank Joe Eaton and a referee for helpful suggestions.
† This work was supported in part by grant DMS-1007547 from the U.S. National Science Foundation.
1. Introduction

Let the columns of $X = (X_1, \ldots, X_n) \in \mathbb{R}^{p\times n}$ be a sample $X_i$, $i = 1, \ldots, n$, from $N_p(0, \Sigma)$, the $p$-variate normal distribution with mean 0 and positive definite variance matrix $\Sigma$. The sum of squares matrix $S = XX^T$ follows the $p$-variate Wishart distribution with $n$ degrees of freedom, denoted by $W_p(\Sigma, n)$. If $p > n$, then $W_p(\Sigma, n)$ is called the singular Wishart distribution. Its density was given by Uhlig [16] under the Hausdorff measure and by Srivastava [14] under the Lebesgue measure on the functionally independent elements of $S$.
Let $W \sim W_p(\Sigma, n)$ and let $W^-$ be the usual Moore-Penrose inverse of $W$, defined as the unique matrix $W^-$ such that $W^- W W^- = W^-$, $W W^- W = W$, and $W W^-$ and $W^- W$ are symmetric. As indicated by this definition, we use the usual inner products based on the identity matrix $I_p$ for the two symmetry conditions in the Moore-Penrose inverse, unless stated otherwise. If $W$ is nonsingular then $W^-$ is the regular inverse. The distribution of $W^-$ is called the inverse Wishart distribution when $p \le n$ and the generalized inverse Wishart distribution when $p > n$. Díaz-García and Gutiérrez-Jáimez [4] gave an expression for the density function of the generalized inverse Wishart distribution under the Hausdorff measure. Under the Lebesgue measure on the functionally independent elements, the density was proposed by Bodnar and Okhrin [2] and Zhang [18], but their results seem inconsistent. The density given by Bodnar and Okhrin involves the eigenvalues of $W^-$, while the density given by Zhang does not. Both of their results were based on the density of the singular Wishart distribution given by Srivastava [14], and neither gave moments of the distribution. If $W \sim W_p(\Sigma, n)$ and $p \le n$, then we denote the inverse Wishart distribution of $W^{-1}$ as $W_p^{-1}(\Sigma, n)$. If $p > n$ the distribution of $W^-$ is denoted as $W_p^-(\Sigma, n)$.
In this note we derive the first two moments of $W_p^-(I_p, n)$ without relying on an expression of its density function, and discuss the issues involved in extending this result to $W_p^-(\Sigma, n)$. Our results are based on the first two moments of the inverse Wishart distribution [17] plus an invariance argument. We also find the order of the spectral norm $\|\cdot\|$ of $W_p^-(\Sigma, n)$ as $n, p \to \infty$. In addition to being a contribution to random matrix theory, these results may play a role in Bayesian analysis because the corresponding distributions are natural conjugate priors for the covariance matrix in the normal distribution [4]. They are also useful in studies of estimation methods for high dimensional $n < p$ regressions.

We present our findings on the moments of $W_p^-(I_p, n)$ in Theorem 2.1 of Section 2. Those findings rely on an invariance relationship that is described in Proposition 4.1 and is related to the classical mechanics literature on tensors. Results on the moments of a $W_p^-(\Sigma, n)$ random matrix and on its order are given in Section 3. The proof of Theorem 2.1 is given in Section 4 and the proof of Proposition 4.1 is given in Section 4.5.
Throughout this article $\sim$ means equal in distribution and $\mathbb{R}^{p\times q}$ denotes the collection of all real $p \times q$ matrices. For sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \asymp b_n$ if there are constants $m$, $M$ and $N$ such that $0 < m < |a_n/b_n| < M < \infty$ for all $n > N$. The Kronecker product $\otimes$ of two matrices $A = (A_{ij}) \in \mathbb{R}^{a\times b}$ and $B \in \mathbb{R}^{c\times d}$ is the $ac \times bd$ matrix expressed in block form as $A \otimes B = (A_{ij}B)$, $i = 1, \ldots, a$, $j = 1, \ldots, b$. The vec operator [8] maps $A \in \mathbb{R}^{a\times b}$ to $\mathbb{R}^{ab}$ by stacking its columns. We use $e_i \in \mathbb{R}^p$ to denote the vector with a 1 in the $i$-th position and 0's elsewhere. The $p^2 \times p^2$ commutation matrix is denoted as $C_{p^2} = \sum_{i,j} (e_i \otimes e_j)(e_j^T \otimes e_i^T)$ [10, 11].
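As a quick illustration of this notation (our sketch, not part of the paper), the commutation matrix can be built directly from the displayed sum and checked against its defining property $C_{p^2}\,\mathrm{vec}(A) = \mathrm{vec}(A^T)$:

```python
import numpy as np

def commutation_matrix(p):
    """Build C_{p^2} = sum_{i,j} (e_i ⊗ e_j)(e_j^T ⊗ e_i^T) from the definition."""
    I = np.eye(p)
    C = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            eij = np.kron(I[:, i], I[:, j])   # e_i ⊗ e_j
            eji = np.kron(I[:, j], I[:, i])   # e_j ⊗ e_i
            C += np.outer(eij, eji)           # (e_i ⊗ e_j)(e_j^T ⊗ e_i^T)
    return C

p = 4
C = commutation_matrix(p)
A = np.arange(p * p, dtype=float).reshape(p, p)
vecA = A.reshape(-1, order="F")               # vec stacks columns
assert np.allclose(C @ vecA, A.T.reshape(-1, order="F"))  # C vec(A) = vec(A^T)
assert np.allclose(C @ C, np.eye(p * p))      # C_{p^2} is its own inverse
```

The last assertion reflects the fact that transposing twice returns the original matrix.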
2. Moments of $W_p^-(I_p, n)$

The first two moments of $W_p^{-1}(I_p, n)$ have been known for some time:

Proposition 2.1. (von Rosen, 1988) Let $W \sim W_p(I_p, n)$. If $n > p + 3$, then $\mathrm{rank}(W) = p$ with probability 1,
$$E(W^{-1}) = a_1 I_p,$$
$$E(W^{-1} W^{-1}) = b_1 I_p,$$
$$\mathrm{var}\{\mathrm{vec}(W^{-1})\} = c_1 (I_{p^2} + C_{p^2}) + 2 d_1\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p),$$
where $a_1 = (n-p-1)^{-1}$, $b_1 = (n-1)c_1$, $c_1^{-1} = (n-p)(n-p-1)(n-p-3)$, and $d_1^{-1} = (n-p)(n-p-1)^2(n-p-3)$.
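Proposition 2.1 is easy to spot-check by simulation. The sketch below (ours, not the authors') draws nonsingular Wisharts with $\Sigma = I_p$ and compares the averaged inverse with $a_1 I_p$; the tolerances are loose Monte Carlo ones chosen to be many standard errors wide.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 4, 20, 4000          # nonsingular case: n > p + 3
a1 = 1.0 / (n - p - 1)            # = 1/15 here

inv_sum = np.zeros((p, p))
for _ in range(reps):
    X = rng.standard_normal((p, n))   # columns are N_p(0, I_p)
    W = X @ X.T                       # W ~ W_p(I_p, n)
    inv_sum += np.linalg.inv(W)
E_hat = inv_sum / reps

assert np.allclose(np.diag(E_hat), a1, atol=5e-3)     # diagonal ≈ a_1
off = E_hat - np.diag(np.diag(E_hat))
assert np.abs(off).max() < 5e-3                        # off-diagonal ≈ 0
```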
There is a close relationship between the form of $\mathrm{var}\{\mathrm{vec}(W^{-1})\}$ and the spectral decomposition of a fourth-order isotropic tensor from classical elasticity theory: expressing $\mathrm{var}\{\mathrm{vec}(W^{-1})\}$ in terms of the elements $W^{-1}_{ij}$ of $W^{-1}$ and rearranging terms we have
$$\mathrm{cov}(W^{-1}_{ij}, W^{-1}_{kl}) = c_1(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}) + 2 d_1 \delta_{ij}\delta_{kl} = \{(2c_1 + 6d_1)/3\}\delta_{ij}\delta_{kl} + 2c_1\{(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk})/2 - \delta_{ij}\delta_{kl}/3\},$$
where Kronecker's delta $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. Except for the coefficients $(2c_1 + 6d_1)$ and $2c_1$, this form is identical to a classical tensor decomposition that is related to the bulk and shear moduli (see, for example, equation (14) of [1] and the associated references). Further comments on the relationship between this work and continuum mechanics are given in Section 4. von Rosen (1988) also gives various moments of $W_p^{-1}(\Sigma, n)$, but here our focus is on Wisharts with $\Sigma = I_p$.
The following theorem gives results for $W_p^-(I_p, n)$ that are analogous to those stated in Proposition 2.1 for $W_p^{-1}(I_p, n)$.

Theorem 2.1. Let $W \sim W_p(I_p, n)$. If $p > n + 3$, then $\mathrm{rank}(W) < p$ with probability 1,
$$E(W^-) = a_2 I_p,$$
$$E(W^- W^-) = b_2 I_p,$$
$$\mathrm{var}\{\mathrm{vec}(W^-)\} = c_2 (I_{p^2} + C_{p^2}) + 2 d_2\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p),$$
$$\mathrm{var}\{\mathrm{tr}(W^-)\} = 2 c_3 n + 2 d_3 n^2,$$
where
$$a_2 = \frac{n}{p(p-n-1)}, \qquad b_2 = \frac{n(p-1)c_3}{p},$$
$$c_2 = \frac{n\{p(p-1) - n(p-n-2) - 2\}c_3}{p(p-1)(p+2)}, \qquad d_2 = \frac{n\{n^2(n-1) + 2n(p-2)(p-n) + 2p(p-1)\}d_3}{p^2(p-1)(p+2)},$$
$$c_3^{-1} = (p-n)(p-n-1)(p-n-3) \quad \text{and} \quad d_3^{-1} = (p-n)(p-n-1)^2(p-n-3).$$
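The singular-case moments can be spot-checked the same way, now with `numpy.linalg.pinv` for the Moore-Penrose inverse. This is our illustration, not the authors' computation; the dimensions satisfy $p > n + 3$ (indeed $p - n > 7$, so the Monte Carlo averages below have finite variance) and the tolerances are generous.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 12, 4, 4000          # singular case: p > n + 3
a2 = n / (p * (p - n - 1))        # = 4/84 here

pinv_sum = np.zeros((p, p))
traces = np.empty(reps)
for r in range(reps):
    Y = rng.standard_normal((p, n))
    W = Y @ Y.T                   # W ~ W_p(I_p, n), rank n < p
    Wm = np.linalg.pinv(W)        # Moore-Penrose inverse
    pinv_sum += Wm
    traces[r] = np.trace(Wm)
E_hat = pinv_sum / reps

assert np.allclose(np.diag(E_hat), a2, atol=8e-3)   # E(W^-) ≈ a2 I_p
assert np.abs(E_hat - np.diag(np.diag(E_hat))).max() < 8e-3
assert abs(traces.mean() - p * a2) < 0.03           # E{tr(W^-)} = p a2
# var{tr(W^-)} = 2 c3 n + 2 d3 n^2 ≈ 0.045 here; a crude bracket:
assert 0.005 < traces.var() < 0.3
```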
The constraints $n > p + 3$ and $p > n + 3$ in Proposition 2.1 and Theorem 2.1 are needed to ensure that the moments exist. Comparing Proposition 2.1 and Theorem 2.1 we see that the first two moments of $W^-$ have the same functional form in the singular and nonsingular cases, differing only by $a_j$, $b_j$, $c_j$ and $d_j$, $j = 1, 2$. The following corollary gives the asymptotic magnitudes of these factors. Essentially, it tells us that their asymptotic behavior depends weakly on the rank of $W$. Its proof is straightforward and is omitted.

Corollary 2.1. If $n > p + 3$ and $p/n \to r$ with $0 \le r < 1$ as $p, n \to \infty$ then $a_1 \asymp n^{-1}$, $b_1 \asymp n^{-2}$, $c_1 \asymp n^{-3}$ and $d_1 \asymp n^{-4}$. If $p > n + 3$ and $n/p \to r$ with $0 < r < 1$ as $p, n \to \infty$ then $a_2 \asymp p^{-1} \asymp n^{-1}$, $b_2 \asymp p^{-2} \asymp n^{-2}$, $c_2 \asymp p^{-3} \asymp n^{-3}$ and $d_2 \asymp p^{-4} \asymp n^{-4}$.
To gain some intuition about the structure of the variance in Theorem 2.1, we first recognize that one role of $P_s \equiv (I_{p^2} + C_{p^2})/2$ is to project onto the space of symmetric $p \times p$ matrices: let $A$ be a $p \times p$ matrix. Then $P_s \mathrm{vec}(A) = \{\mathrm{vec}(A) + \mathrm{vec}(A^T)\}/2$ and $\mathrm{vec}^{-1}\{P_s \mathrm{vec}(A)\} = (A + A^T)/2$. Also, $P_v \equiv \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)/p$ projects onto $\mathrm{span}\{\mathrm{vec}(I_p)\}$. Consequently,
$$\mathrm{var}\{\mathrm{vec}(W^-)\} = 2 c_2 P_s + 2 p d_2 P_v = 2 c_2 (P_s - P_v) + 2(c_2 + p d_2) P_v,$$
where $P_s - P_v$ and $P_v$ are orthogonal projection operators. One implication of this development is given in the following corollary.
Corollary 2.2. The eigenvalues of $\mathrm{var}\{\mathrm{vec}(W^-)\}$ are $2(c_2 + p d_2)$ with multiplicity 1, $2c_2$ with multiplicity $p(p+1)/2 - 1$ and 0 with multiplicity $p(p-1)/2$. The corresponding eigenvectors are $\mathrm{vec}(I_p)/\sqrt{p}$, the eigenvectors of $P_s - P_v$, and the eigenvectors of $I_{p^2} - P_s$.
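This eigenstructure can be verified mechanically: build $V = c_2(I_{p^2} + C_{p^2}) + 2 d_2\,\mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$ and count eigenvalue multiplicities. The nonzero spectrum lives on the $p(p+1)/2$-dimensional symmetric subspace. A small NumPy sketch (ours, with arbitrary positive constants standing in for the Wishart ones):

```python
import numpy as np

p = 4
c2, d2 = 0.3, 0.1                      # arbitrary positive stand-ins

# commutation matrix C_{p^2}: C vec(A) = vec(A^T)
C = np.zeros((p * p, p * p))
for i in range(p):
    for j in range(p):
        C[i * p + j, j * p + i] = 1.0

vecI = np.eye(p).reshape(-1, order="F")
V = c2 * (np.eye(p * p) + C) + 2 * d2 * np.outer(vecI, vecI)

eig = np.sort(np.linalg.eigvalsh(V))
# multiplicities: 0 on the skew-symmetric subspace, 2c2 on the trace-free
# symmetric subspace, and 2(c2 + p d2) on span{vec(I_p)}
assert np.sum(np.isclose(eig, 0.0)) == p * (p - 1) // 2
assert np.sum(np.isclose(eig, 2 * c2)) == p * (p + 1) // 2 - 1
assert np.sum(np.isclose(eig, 2 * (c2 + p * d2))) == 1
```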
If $W \sim W_p(\Sigma, n)$ then $\Sigma^{-1/2} W \Sigma^{-1/2} \sim W_p(I_p, n)$ and
$$W^\dagger = \Sigma^{-1/2} \left(\Sigma^{-1/2} W \Sigma^{-1/2}\right)^- \Sigma^{-1/2}$$
is a reflexive generalized inverse of $W$. It follows straightforwardly that
Corollary 2.3. If $W \sim W_p(\Sigma, n)$ and $p > n + 3$ then
$$E(W^\dagger) = a_2 \Sigma^{-1},$$
$$\mathrm{var}\{\mathrm{vec}(W^\dagger)\} = c_2 (I_{p^2} + C_{p^2})(\Sigma^{-1} \otimes \Sigma^{-1}) + 2 d_2\, \mathrm{vec}(\Sigma^{-1})\mathrm{vec}^T(\Sigma^{-1}),$$
$$\mathrm{cov}(W^\dagger_{ij}, W^\dagger_{kl}) = c_2 (\Sigma^{-1}_{ik}\Sigma^{-1}_{jl} + \Sigma^{-1}_{il}\Sigma^{-1}_{jk}) + 2 d_2 \Sigma^{-1}_{ij}\Sigma^{-1}_{kl},$$
where $W^\dagger_{ij}$ and $\Sigma^{-1}_{ij}$ denote element $(i, j)$ of $W^\dagger$ and $\Sigma^{-1}$. The second and third conclusions are the same, except the third is in terms of the elements of $W^\dagger$.

The form of $\mathrm{var}\{\mathrm{vec}(W^\dagger)\}$ given in Corollary 2.3 is identical to the asymptotic variance of the covariance matrix from a sample from an elliptically contoured distribution. In that case the constants are $c_2 = 1 + \kappa$ and $d_2 = \kappa/2$, where $\kappa$ is the kurtosis of the distribution (see, for example, Tyler [15]).
Because $\Sigma W^\dagger W$ and $\Sigma^{-1} W W^\dagger$ are symmetric, the reflexive generalized inverse $W^\dagger$ of $W$ is also the Moore-Penrose inverse in the inner products based on $\Sigma$ and $\Sigma^{-1}$ [13], but $W^\dagger$ is not the usual Moore-Penrose inverse since $W^\dagger W$ and $W W^\dagger$ are not symmetric. We were unable to find succinct expressions for the mean and variance of $W_p^-(\Sigma, n)$ that are analogous to those for $W_p^-(I_p, n)$ given in Theorem 2.1. In the next section we give some results on the moments of $W_p^-(\Sigma, n)$. We also give the order of the spectral norm $\|\cdot\|$ of the scaled Wishart $\Sigma^{1/2} W_p^-(\Sigma, n) \Sigma^{1/2}$ as $n, p \to \infty$, which may be helpful in asymptotic studies of regressions with $p > n$.

3. Properties of $W_p^-(\Sigma, n)$
3.1. Mean and variance
Let $W \sim W_p(\Sigma, n)$ with $p > n + 3$. The singular Wishart matrix $W$ can be decomposed as $W \sim Y Y^T$, where $\mathrm{vec}(Y) \sim N(0, \Sigma \otimes I_n)$. Since $Y \in \mathbb{R}^{p\times n}$ has rank $n$ with probability 1, the usual Moore-Penrose generalized inverse can be decomposed as
$$W^- \sim Y (Y^T Y)^{-2} Y^T \sim \Sigma^{1/2} Z (Z^T \Sigma Z)^{-2} Z^T \Sigma^{1/2}, \qquad (3.1)$$
where $Z \in \mathbb{R}^{p\times n}$ is a matrix of iid standard normal variates. Write the spectral decomposition of $\Sigma$ as $\Sigma = \Gamma \Lambda \Gamma^T$, where $\Gamma \in \mathbb{R}^{p\times p}$ is orthogonal and $\Lambda > 0$ is diagonal. Since the distribution of $Z$ is invariant under orthogonal transformations we have $W^- \sim \Gamma \Lambda^{1/2} Z (Z^T \Lambda Z)^{-2} Z^T \Lambda^{1/2} \Gamma^T$. Consequently, without loss of generality, we assume that $\Sigma = \Lambda$ is a diagonal matrix when studying moments and other quantities. The left and right orthogonal transformations $\Gamma$ and $\Gamma^T$ can be restored straightforwardly for a general $\Sigma > 0$.
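The algebraic identity behind (3.1), namely that the Moore-Penrose inverse of $Y Y^T$ is $Y (Y^T Y)^{-2} Y^T$ when $Y$ has full column rank, can be confirmed numerically. This sketch is ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 8, 3
Y = rng.standard_normal((p, n))            # full column rank with prob. 1
W = Y @ Y.T                                # rank n < p

G = np.linalg.inv(Y.T @ Y)
Wm = Y @ G @ G @ Y.T                       # Y (Y^T Y)^{-2} Y^T

assert np.allclose(Wm, np.linalg.pinv(W))  # matches the Moore-Penrose inverse
# the four Moore-Penrose conditions:
assert np.allclose(Wm @ W @ Wm, Wm)
assert np.allclose(W @ Wm @ W, W)
assert np.allclose(W @ Wm, (W @ Wm).T)
assert np.allclose(Wm @ W, (Wm @ W).T)
```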
Let $M(\Lambda) = E(W^-)$. Then it follows that for all orthogonal matrices $P \in \mathbb{R}^{p\times p}$, $M(P^T \Lambda P) = P^T M(\Lambda) P$ and thus $M(\Lambda)$ is a tensor-valued isotropic tensor function of $\Lambda$. Isotropic tensor functions have been studied extensively in the literature on continuum mechanics (see [12] for an introduction and [9] for recent results). For instance, it is known from this literature that $M(\Lambda)$ and $\Lambda$ have the same eigenvectors. Although there are various representations for isotropic tensor functions [9], they do not seem to provide further illumination in this setting. Let $V(\Lambda) = \mathrm{var}\{\mathrm{vec}(W^-)\} \in \mathbb{R}^{p^2 \times p^2}$. The variance $V(P^T \Lambda P) = (P^T \otimes P^T) V(\Lambda) (P \otimes P)$ is similarly structured as a fourth-order tensor function, and much the same comments apply.
When $\Lambda = I_p$, the distribution of $W^-$ is invariant under orthogonal transformations, $W^- \sim P W^- P^T$ for all orthogonal $P \in \mathbb{R}^{p\times p}$. This invariance property was used extensively in the moment derivations for Theorem 2.1. However, when $\Lambda \ne I_p$ the distribution of $W^-$ is no longer invariant and the moments of $W^-$ become more complicated. Nevertheless, it is still possible to make some progress using symmetry arguments involving the rows $z_i^T$ of $Z$, essentially utilizing invariance under a restricted class of transformations. This leads to the results stated in Theorem 3.1. In preparation, let $m_{ij}(\Lambda) = E\{z_i^T (Z^T \Lambda Z)^{-2} z_j\}$, let $v_{ij,kl}(\Lambda) = \mathrm{cov}\{z_i^T (Z^T \Lambda Z)^{-2} z_j,\, z_k^T (Z^T \Lambda Z)^{-2} z_l\}$, and let $\Lambda_i$ denote the $i$th diagonal element of $\Lambda$, $i, j, k, l = 1, \ldots, p$.
Theorem 3.1. Assume that $\Sigma = \Lambda$ is a diagonal matrix with diagonal elements $\Lambda_i$, $i = 1, \ldots, p$. Then $M(\Lambda)$ is a diagonal matrix with diagonal elements $M_{ii}(\Lambda) = \Lambda_i m_{ii}(\Lambda)$ and
$$V(\Lambda) = \sum_{i,j=1}^{p} \Lambda_i \Lambda_j v_{ii,jj} (e_i e_j^T \otimes e_i e_j^T) + \sum_{i,j=1}^{p} \Lambda_i \Lambda_j v_{ij,ij} (e_j e_j^T \otimes e_i e_i^T)(I_{p^2} + C_{p^2}) - 2 \sum_{i=1}^{p} \Lambda_i^2 v_{ii,ii} (e_i e_i^T \otimes e_i e_i^T),$$
where the $\Lambda$ arguments for $v_{(\cdot)}$ on the right hand side have been suppressed to improve readability, $v_{ii,jj} = \mathrm{cov}\{z_i^T (Z^T \Lambda Z)^{-2} z_i,\, z_j^T (Z^T \Lambda Z)^{-2} z_j\}$, $v_{ij,ij} = \mathrm{var}\{z_i^T (Z^T \Lambda Z)^{-2} z_j\}$ and $v_{ii,ii} = \mathrm{var}\{z_i^T (Z^T \Lambda Z)^{-2} z_i\}$.

The moments $m_{ii}$, $v_{ij,ij}$, $v_{ii,jj}$ and $v_{ii,ii}$ needed for Theorem 3.1 evidently do not have tractable closed-form representations.
3.2. Order of $\|\Sigma^{1/2} W_p^-(\Sigma, n) \Sigma^{1/2}\|$

Let $Z_0 = Z(Z^T Z)^{-1/2}$, and let $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues of $\Sigma$. Then
$$\Gamma^T W^- \Gamma = \Lambda^{1/2} Z (Z^T \Lambda Z)^{-2} Z^T \Lambda^{1/2} = \Lambda^{1/2} Z (Z^T Z)^{-1} (Z_0^T \Lambda Z_0)^{-2} (Z^T Z)^{-1} Z^T \Lambda^{1/2}.$$
Because the normalized matrix $Z_0$ has orthonormal columns, $(Z_0^T \Lambda Z_0)^{-2} \le \lambda_{\min}^{-2} I_n$ and thus $\Sigma^{1/2} W^- \Sigma^{1/2} \le \lambda_{\min}^{-2} \Gamma \Lambda W_I^- \Lambda \Gamma^T$, where $W_I^- \sim W_p^-(I_p, n)$. The order of $\|\Sigma^{1/2} W^- \Sigma^{1/2}\|$ can now be found by application of Chebyshev's inequality. Let $\epsilon > 0$ and, for notational convenience, let $H = \Sigma^{1/2} W^- \Sigma^{1/2}$. Then for all $h \in \mathbb{R}^p$ with $\|h\| = 1$,
$$\begin{aligned}
\Pr(h^T H h \ge \epsilon) &\le \Pr(\lambda_{\min}^{-2} h^T \Gamma \Lambda W_I^- \Lambda \Gamma^T h \ge \epsilon)\\
&\le \epsilon^{-2} \lambda_{\min}^{-4}\{\mathrm{var}(h^T \Gamma \Lambda W_I^- \Lambda \Gamma^T h) + E^2(h^T \Gamma \Lambda W_I^- \Lambda \Gamma^T h)\}\\
&= \epsilon^{-2} \lambda_{\min}^{-4} (h^T \Gamma \Lambda \otimes h^T \Gamma \Lambda)\,\mathrm{var}\{\mathrm{vec}(W_I^-)\}\,(\Lambda \Gamma^T h \otimes \Lambda \Gamma^T h) + \epsilon^{-2} \lambda_{\min}^{-4} a_2^2 (h^T \Gamma \Lambda^2 \Gamma^T h)^2\\
&\le \epsilon^{-2} (\lambda_{\max}/\lambda_{\min})^4 \{2(c_2 + p d_2) + a_2^2\},
\end{aligned}$$
where $2(c_2 + p d_2) = \|\mathrm{var}\{\mathrm{vec}(W_I^-)\}\|$ as given in Corollary 2.2 and $a_2$ is as defined in Theorem 2.1. Combining this with Theorem 2.1 and the conclusions of Corollary 2.1 we have
Corollary 3.1. Let $W \sim W_p(\Sigma, n)$.

(i) Assume that $n > p + 3$ and that $p/n \to r$ with $0 \le r < 1$. Then $\|\Sigma^{1/2} W^- \Sigma^{1/2}\| = O_p(n^{-1})$.

(ii) Assume that the condition number $\lambda_{\max}/\lambda_{\min}$ of $\Sigma$ is bounded as $p \to \infty$, that $p > n + 3$ and that $n/p \to r$ with $0 < r < 1$. Then $\|\Sigma^{1/2} W^- \Sigma^{1/2}\| = O_p(n^{-1})$.
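A rough simulation consistent with Corollary 3.1(ii): for a fixed ratio $n/p$, the quantity $n\,\|\Sigma^{1/2} W^- \Sigma^{1/2}\|$ stays bounded as $n, p$ grow. The sketch below (ours) takes $\Sigma = I_p$, where the scaled norm reduces to $n/\lambda_{\min}(Y^T Y)$; the asserted bound is deliberately generous since this is a single random draw per size.

```python
import numpy as np

rng = np.random.default_rng(3)
ratios = []
for n, p in [(20, 40), (50, 100), (100, 200)]:   # n/p fixed at 1/2, p > n + 3
    Y = rng.standard_normal((p, n))
    lam_min = np.linalg.eigvalsh(Y.T @ Y)[0]     # smallest eigenvalue of Y^T Y
    ratios.append(n / lam_min)                   # n * ||W^-|| when Sigma = I_p
# bounded across growing dimensions: consistent with O_p(n^{-1})
assert all(0.1 < r < 50.0 for r in ratios)
```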
4. Proof of Theorem 2.1

The general idea of this proof is to use invariance arguments along with moment matching via Proposition 2.1.

A singular Wishart matrix $W \sim W_p(I_p, n)$, $p > n + 3$, can be decomposed as $W = Y Y^T$ with $\mathrm{vec}(Y) \sim N(0, I_p \otimes I_n)$ and $Y = H^T D^{1/2} U$, where $H^T \in \mathbb{R}^{p\times n}$ is semi-orthogonal, $H H^T = I_n$, $U \in \mathbb{R}^{n\times n}$ is orthogonal, and the diagonal elements $d_1, \ldots, d_n$ of the diagonal matrix $D \in \mathbb{R}^{n\times n}$ are non-zero with probability 1. Consequently, $W = H^T D H$ and $W^- = H^T D^{-1} H$. Moreover, $Y^T Y \sim W_n(I_n, p)$ and, since $p > n + 3$, $Y^T Y$ has full rank with probability 1 and $(Y^T Y)^{-1} = U^T D^{-1} U \sim W_n^{-1}(I_n, p)$. Interchanging $n$ and $p$ in Proposition 2.1 we have
$$E\{(Y^T Y)^{-1}\} = E(U^T D^{-1} U) = (p - n - 1)^{-1} I_n, \qquad (4.1)$$
$$E\{(Y^T Y)^{-1}(Y^T Y)^{-1}\} = E(U^T D^{-2} U) = c_3 (p - 1) I_n, \qquad (4.2)$$
$$\mathrm{var}[\mathrm{vec}\{(Y^T Y)^{-1}\}] = \mathrm{var}\{\mathrm{vec}(U^T D^{-1} U)\} = c_3 (I_{n^2} + C_{n^2}) + 2 d_3\, \mathrm{vec}(I_n)\mathrm{vec}^T(I_n), \qquad (4.3)$$
where $c_3^{-1} = (p-n)(p-n-1)(p-n-3)$ and $d_3^{-1} = (p-n)(p-n-1)^2(p-n-3)$.
4.1. $E(W^-)$

Using the fact that $Y \sim P Y$ for any orthogonal matrix $P \in \mathbb{R}^{p\times p}$, we get
$$E(W^-) = E[\{(P Y)(P Y)^T\}^-] = P E(W^-) P^T$$
and consequently $E(W^-) = a I_p$ (see, for example, Eaton [5], Proposition 2.14). It remains to find $a$. Since $W^- = H^T D^{-1} H$,
$$a p = \mathrm{tr}\{E(W^-)\} = \mathrm{tr}\{E(H^T D^{-1} H)\} = \mathrm{tr}\{E(D^{-1})\} = n(p - n - 1)^{-1},$$
where we used (4.1) for the last equality. From this we get $E(W^-)$; that is, $a = a_2$.
4.2. $E(W^- W^-)$

Since $E(W^- W^-) = P E(W^- W^-) P^T$ for any orthogonal matrix $P$, $E(W^- W^-) = b I_p$. To find $b$ we have
$$b p = \mathrm{tr}\{E(W^- W^-)\} = \mathrm{tr}\{E(H^T D^{-2} H)\} = \mathrm{tr}\{E(D^{-2})\} = c_3 n(p - 1),$$
where we used (4.2) for the last equality. From this we conclude that $b = b_2$.
4.3. $\mathrm{var}\{\mathrm{vec}(W^-)\}$ and $\mathrm{var}\{\mathrm{tr}(W^-)\}$

Our proof of this part is based on the following proposition, which gives a characterization of matrices that are invariant under a subclass of the orthogonal transformations.

Proposition 4.1. Let $A \in \mathbb{R}^{p^2 \times p^2}$ be such that $(P^T \otimes P^T) A (P \otimes P) = A$ for all orthogonal matrices $P \in \mathbb{R}^{p\times p}$. Then $A = c I_{p^2} + f C_{p^2} + 2 d\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$, for some real multipliers $c$, $d$ and $f$.

This proposition is apparently well-known in the literature on continuum mechanics, where it is often referred to as a representation theorem for fourth-order isotropic tensors. Its proof for the case $p = 3$ can be found in the classical literature on Cartesian tensors [6]. Jog [7] provides a concise proof and cites ten other demonstrations of the same result, most of which are for $p = 3$. All of these proofs rely heavily on analytic traditions, notation and tensor operators that are not readily found in the statistical literature and might seem elusive on first reading. (A dictionary connecting tensors and common matrix operations in statistics was given by Dauxois et al. [3].) For completeness we have included in Section 4.5 a proof that does not use the technical machinery of continuum mechanics, but relies only on the Kronecker product, vec operator [8] and the commutation matrix [10, 11]. These operators were defined in the Introduction and are used widely in the statistical literature.

The collection of matrices $A$ that satisfies the hypothesis of Proposition 4.1 forms a vector space over the real field that is closed under transposition and multiplication. The proposition essentially gives a basis $\{I_{p^2}, C_{p^2}, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)\}$ for this space.
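The forward direction of Proposition 4.1, that any $A = c I_{p^2} + f C_{p^2} + 2 d\,\mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$ is invariant under $(P^T \otimes P^T)(\cdot)(P \otimes P)$, is easy to check numerically for a random orthogonal $P$. This sketch is ours, with arbitrary multipliers:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
c, f, d = 1.2, -0.7, 0.4               # arbitrary real multipliers

# commutation matrix C_{p^2}
C = np.zeros((p * p, p * p))
for i in range(p):
    for j in range(p):
        C[i * p + j, j * p + i] = 1.0
vecI = np.eye(p).reshape(-1, order="F")
A = c * np.eye(p * p) + f * C + 2 * d * np.outer(vecI, vecI)

# random orthogonal P from the QR decomposition of a Gaussian matrix
P, _ = np.linalg.qr(rng.standard_normal((p, p)))
PP = np.kron(P, P)
assert np.allclose(PP.T @ A @ PP, A)   # (P^T ⊗ P^T) A (P ⊗ P) = A
```

The invariance follows from $C_{p^2}(P \otimes P) = (P \otimes P)C_{p^2}$ and $(P^T \otimes P^T)\mathrm{vec}(I_p) = \mathrm{vec}(P^T P) = \mathrm{vec}(I_p)$.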
For notational convenience, let $V = \mathrm{var}\{\mathrm{vec}(W^-)\}$. Since $Y \sim P Y$ for any orthogonal matrix $P \in \mathbb{R}^{p\times p}$ we have
$$V = \mathrm{var}[\mathrm{vec}\{(P Y)(P Y)^T\}^-] = (P \otimes P) V (P^T \otimes P^T),$$
and consequently $V$ satisfies the hypothesis of Proposition 4.1. However, $V$ is also invariant under multiplication by $C_{p^2}$. Using Proposition 4.1 this implies that $c = f$: $V - V C_{p^2} = (c - f) I_{p^2} - (c - f) C_{p^2} = 0$. Consequently the class of covariance matrices must be of the form
$$\mathrm{var}\{\mathrm{vec}(W^-)\} = c (I_{p^2} + C_{p^2}) + 2 d\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p). \qquad (4.4)$$
It remains to find $c$ and $d$, which we do by moment matching. From (4.4) we have
$$\mathrm{var}\{\mathrm{tr}(W^-)\} = \mathrm{var}\{\mathrm{tr}(D^{-1})\} = \mathrm{var}\{\mathrm{vec}^T(I_p)\mathrm{vec}(W^-)\} = 2 c p + 2 d p^2. \qquad (4.5)$$
Now, by (4.3) we get another expression for
$$\mathrm{var}\{\mathrm{tr}(D^{-1})\} = \mathrm{var}[\mathrm{vec}^T(I_n)\mathrm{vec}\{(Y^T Y)^{-1}\}] = 2 c_3 n + 2 d_3 n^2 \qquad (4.6)$$
and using (4.5) and (4.6) together we see that $c$ and $d$ must satisfy
$$c p + d p^2 = c_3 n + d_3 n^2. \qquad (4.7)$$
Since we are pursuing two factors, $c$ and $d$, we require a second independent equation to determine them uniquely. This can be obtained by first taking the trace of (4.4) to get $\mathrm{tr}[\mathrm{var}\{\mathrm{vec}(W^-)\}] = c p^2 + c p + 2 d p$. Second, we obtain a known expression for $\mathrm{tr}[\mathrm{var}\{\mathrm{vec}(W^-)\}]$ by writing it as
$$\mathrm{tr}[\mathrm{var}\{\mathrm{vec}(W^-)\}] = \mathrm{tr}[E\{\mathrm{vec}(W^-)\mathrm{vec}^T(W^-)\}] - \mathrm{tr}[E\{\mathrm{vec}(W^-)\} E^T\{\mathrm{vec}(W^-)\}]$$
and using previous results to reduce the right hand side. Using (4.2) and the previously derived form for $E(W^-)$ we have
$$\mathrm{tr}[E\{\mathrm{vec}(W^-)\mathrm{vec}^T(W^-)\}] = \mathrm{tr}[E\{\mathrm{vec}(H^T D^{-1} H)\mathrm{vec}^T(H^T D^{-1} H)\}] = \mathrm{tr}\{E(D^{-2})\} = n c_3 (p - 1),$$
$$\mathrm{tr}[E\{\mathrm{vec}(W^-)\} E^T\{\mathrm{vec}(W^-)\}] = a_2^2 p.$$
Consequently,
$$c p^2 + c p + 2 d p = n c_3 (p - 1) - a_2^2 p. \qquad (4.8)$$
Using Maple to solve (4.7) and (4.8) for $c$ and $d$ gives the solutions stated in Theorem 2.1.
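The Maple step is just a 2×2 linear solve, so it can be replicated numerically and checked against the closed forms for $c_2$ and $d_2$ in Theorem 2.1. A sketch (ours) for one choice of $(n, p)$:

```python
import numpy as np

n, p = 4, 10                                     # p > n + 3
c3 = 1.0 / ((p - n) * (p - n - 1) * (p - n - 3))
d3 = 1.0 / ((p - n) * (p - n - 1) ** 2 * (p - n - 3))
a2 = n / (p * (p - n - 1))

# equations (4.7) and (4.8):
#   p c + p^2 d           = c3 n + d3 n^2
#   (p^2 + p) c + 2 p d   = n c3 (p - 1) - a2^2 p
M = np.array([[p, p ** 2],
              [p ** 2 + p, 2 * p]], dtype=float)
rhs = np.array([c3 * n + d3 * n ** 2,
                n * c3 * (p - 1) - a2 ** 2 * p])
c, d = np.linalg.solve(M, rhs)

# closed forms from Theorem 2.1
c2 = n * (p * (p - 1) - n * (p - n - 2) - 2) * c3 / (p * (p - 1) * (p + 2))
d2 = n * (n ** 2 * (n - 1) + 2 * n * (p - 2) * (p - n) + 2 * p * (p - 1)) * d3 \
     / (p ** 2 * (p - 1) * (p + 2))
assert np.isclose(c, c2) and np.isclose(d, d2)
```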
4.4. Moments of higher order

Moments of higher order can in principle be found similarly, by using results from von Rosen [17] in combination with moment matching. For instance, consider $E\{(W^-)^3\} = E(H^T D^{-3} H) = b_3 I_p$. Proceeding as in Section 4.2, $E\{(Y^T Y)^{-3}\} = E(U^T D^{-3} U) = r_3 I_n$, where $r_3$ can be obtained from von Rosen's Corollary 3.1, again interchanging the roles of $n$ and $p$. Consequently, $b_3 = n r_3 / p$.
4.5. Proof of Proposition 4.1

The proof of Proposition 4.1 is based on using various subclasses of the orthogonal matrices to characterize the columns of $A$. This characterization is described in the next lemma, which uses the same hypothesis as the proposition.

Lemma 4.1. Let $A \in \mathbb{R}^{p^2 \times p^2}$ be such that $(P^T \otimes P^T) A (P \otimes P) = A$ for all orthogonal matrices $P \in \mathbb{R}^{p\times p}$. Then for some real factors $h$, $d$, $s_1$ and $s_2$,

(i) $A(e_i \otimes e_i) = h\, \mathrm{vec}(I_p) + d (e_i \otimes e_i)$ for $i = 1, \ldots, p$.

(ii) $A(e_i \otimes e_j) = s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)$ for $i \ne j = 1, \ldots, p$.

Proof. This proof is based on taking various forms for $P$ in the hypothesized relationship. We do not distinguish these forms notationally.

Part (i): Let $P$ be any orthogonal matrix with the property that $P e_i = e_i$ for a selected index $i$. Restricting consideration to this subclass of orthogonal matrices and multiplying the hypothesized equation on the right by $e_i \otimes e_i$ we have $(P^T \otimes P^T) A (e_i \otimes e_i) = A(e_i \otimes e_i)$. Let $M = \mathrm{vec}^{-1}\{A(e_i \otimes e_i)\}$ so that
$$\mathrm{vec}(P^T M P) = \mathrm{vec}(M). \qquad (4.9)$$
Without loss of generality take $i = p$, and consider orthogonal matrices of the form
$$P = \begin{pmatrix} P_1 & 0 \\ 0 & 1 \end{pmatrix},$$
where $P_1 \in \mathbb{R}^{(p-1)\times(p-1)}$ is any orthogonal matrix. Clearly, $P e_p = e_p$. Partition $M = (M_{jk})$, $j, k = 1, 2$ according to the partition of $P$. We consider the four partition components $M_{jk}$ separately.

(1) From (4.9) we get $P_1^T M_{11} P_1 = M_{11}$ for all orthogonal matrices $P_1 \in \mathbb{R}^{(p-1)\times(p-1)}$. It is well known that this implies $M_{11} = h_p I_{p-1}$ for some real multiplier $h_p$ (see, for example, Eaton [5], Proposition 2.14).

(2) $P_1^T M_{12} = M_{12}$ for all orthogonal matrices $P_1$ implies that $M_{12} = 0$: write $M_{12} = \lambda U$ with $U \in \mathbb{R}^{p-1}$ a semi-orthogonal matrix. Taking $P_1^T$ to be an orthogonal matrix with row $j$ equal to $U^T$, we have $M_{12} = \lambda e_j$ for any $j = 1, \ldots, p-1$, and therefore $M_{12} = 0$. Similarly, $M_{21} = 0$.

(3) Finally, $M_{22} \in \mathbb{R}$ is arbitrary and it follows that $M$ has the form $M = h_p I_p + d_p e_p e_p^T$ with $d_p = M_{22} - h_p$. Therefore, for $i = 1, \ldots, p$,
$$\mathrm{vec}(M) = A(e_i \otimes e_i) = h_i \mathrm{vec}(I_p) + d_i (e_i \otimes e_i). \qquad (4.10)$$

It remains to show that $h_i$ and $d_i$ are constant over the index $i$. Take a new $P$ such that $P e_i = e_j$ and $P e_j = e_i$ for two selected indices $i \ne j$. Then using (4.10), the hypothesis and (4.10) again we find
$$h_i \mathrm{vec}(I_p) + d_i (e_i \otimes e_i) = A(e_i \otimes e_i) = (P^T \otimes P^T) A (P \otimes P)(e_i \otimes e_i) = (P^T \otimes P^T) A (e_j \otimes e_j) = (P^T \otimes P^T)\{h_j \mathrm{vec}(I_p) + d_j (e_j \otimes e_j)\} = h_j \mathrm{vec}(I_p) + d_j (e_i \otimes e_i).$$
Therefore $h_i = h_j = h$, $d_i = d_j = d$ and part (i) follows.
Part (ii): The proof of this conclusion follows the same logic as the proof of part (i), but we use the subclass of orthogonal matrices with the property that $P e_i = e_i$ and $P e_j = e_j$ for selected indices $i \ne j$. Restricting consideration to this subclass and multiplying the hypothesized equation on the right by $e_i \otimes e_j$ we have $(P^T \otimes P^T) A (e_i \otimes e_j) = A(e_i \otimes e_j)$. Let $M = \mathrm{vec}^{-1}\{A(e_i \otimes e_j)\}$ so that $\mathrm{vec}(P^T M P) = \mathrm{vec}(M)$. Without loss of generality take $i = p - 1$ and $j = p$, and consider orthogonal matrices of the form
$$P = \begin{pmatrix} P_1 & 0 \\ 0 & I_2 \end{pmatrix},$$
where $P_1 \in \mathbb{R}^{(p-2)\times(p-2)}$ is any orthogonal matrix. Partition $M = (M_{jk})$, $j, k = 1, 2$ to conform with the partitions of $P$. Using the hypothesis of the lemma, we reason as follows. (1) $P_1^T M_{11} P_1 = M_{11}$ for all orthogonal matrices $P_1$ of order $p - 2$ again implies that $M_{11} = c_{ij} I_{p-2}$ for some real factor $c_{ij}$. (2) $P_1^T M_{12} = M_{12}$ for all orthogonal matrices $P_1$ again implies $M_{12} = 0$, and analogously $M_{21} = 0$. (3) $M_{22}$ is arbitrary and therefore
$$A(e_i \otimes e_j) = c_{ij} \mathrm{vec}(I_p) + s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i) + t_{ij} (e_i \otimes e_i) + u_{ij} (e_j \otimes e_j). \qquad (4.11)$$
It follows from part (i) and the fact that $A^T$ also satisfies the hypothesis that $(e_k^T \otimes e_k^T) A (e_i \otimes e_j) = 0$ for $i \ne j = 1, \ldots, p$ and $k = 1, \ldots, p$. Using this result and multiplying (4.11) by $(e_k^T \otimes e_k^T)$, with $k \ne i$ and $k \ne j$, $(e_i^T \otimes e_i^T)$ and $(e_j^T \otimes e_j^T)$ respectively, we conclude that $c_{ij} = t_{ij} = u_{ij} = 0$ for $i \ne j$. Therefore
$$A(e_i \otimes e_j) = s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i), \qquad i \ne j. \qquad (4.12)$$
It remains to show that $s_{1ij}$ and $s_{2ij}$ are constant in the indices $i \ne j$. Take a new subclass of $P$'s such that, for two additional selected indices $k \ne s$, $P e_i = e_k$ and $P e_j = e_s$, where still $i \ne j$. Then using (4.12), the hypothesis and (4.12) again we get
$$s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i) = (P^T \otimes P^T) A (P \otimes P)(e_i \otimes e_j) = (P^T \otimes P^T) A (e_k \otimes e_s) = (P^T \otimes P^T)\{s_{1ks} (e_k \otimes e_s) + s_{2ks} (e_s \otimes e_k)\} = s_{1ks} (e_i \otimes e_j) + s_{2ks} (e_j \otimes e_i).$$
Multiplying the first and the last term by $(e_i^T \otimes e_j^T)$ and $(e_j^T \otimes e_i^T)$ respectively we get
$$A(e_i \otimes e_j) = s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i), \qquad \text{for } i \ne j,$$
which concludes the proof of the lemma.
Turning to the proof of Proposition 4.1, we first show that the multipliers in Lemma 4.1 are functionally related; in particular, $d = s_1 + s_2$. It follows immediately from part (ii) of Lemma 4.1 that, for $i \ne j$, $A(P \otimes P)(e_i \otimes e_j) = (P \otimes P)\{s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)\}$. Taking $P$ to be in the subclass of orthogonal matrices with the property that, for selected indices $i \ne j$, $P e_i = (e_i + e_j)/\sqrt{2}$ and $P e_j = (e_j - e_i)/\sqrt{2}$, we have immediately that
$$A\left(\frac{e_i + e_j}{\sqrt{2}} \otimes \frac{e_j - e_i}{\sqrt{2}}\right) = s_1 \left(\frac{e_i + e_j}{\sqrt{2}} \otimes \frac{e_j - e_i}{\sqrt{2}}\right) + s_2 \left(\frac{e_j - e_i}{\sqrt{2}} \otimes \frac{e_i + e_j}{\sqrt{2}}\right).$$
Expanding this equation and simplifying, we find that
$$d\{(e_j \otimes e_j) - (e_i \otimes e_i)\} = (s_1 + s_2)\{(e_j \otimes e_j) - (e_i \otimes e_i)\}$$
and consequently $d = s_1 + s_2$. The conclusion of the proposition now follows from Lemma 4.1 with $d = s_1 + s_2$: taking $v = \sum_{i,j} c_{ij} (e_i \otimes e_j)$,
$$A v = A \sum_{i,j} c_{ij} (e_i \otimes e_j) = \sum_{i \ne j} c_{ij}\{s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)\} + \sum_{i=1}^{p} c_{ii}\{h\, \mathrm{vec}(I_p) + (s_1 + s_2)(e_i \otimes e_i)\}$$
$$= s_1 \sum_{i,j} c_{ij} (e_i \otimes e_j) + s_2 \sum_{i,j} c_{ij} (e_j \otimes e_i) + h \sum_{i=1}^{p} c_{ii}\, \mathrm{vec}(I_p) = s_1 v + s_2 C_{p^2} v + h\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p) v,$$
since $\sum_{i=1}^{p} c_{ii} = \mathrm{vec}^T(I_p) v$. Because $v$ is arbitrary, $A = s_1 I_{p^2} + s_2 C_{p^2} + h\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$, as stated in the proposition.
5. Proof of Theorem 3.1

Recall that Theorem 3.1 requires $\Sigma = \Lambda$ to be a diagonal matrix. For notational convenience, let $H = (Z^T \Lambda Z)^{-2}$. The conclusion that $M(\Lambda)$ is a diagonal matrix arises by noting that, for $i \ne j$, $z_i^T H z_j \sim -z_i^T H z_j$, and thus $E(z_i^T H z_j) = 0$.

By a similar symmetry argument, the element $v_{ij,kl}$ of $V$ equals 0 when at least one of its indices $i, j, k, l$ is distinct; that is, not equal to any other index. If no indices are distinct then they must be equal in pairs, leading to four possibilities: for $i \ne j$, $v_{ii,jj}$, $v_{ij,ij}$, $v_{ij,ji}$ and, for $i = j$, $v_{ii,ii}$. However, $v_{ij,ij} = v_{ij,ji}$, which leads to the three $v$ terms in the theorem. The form of $V$ follows from these results, the representation $V = \sum_{ij,kl} (\Lambda_i \Lambda_j \Lambda_k \Lambda_l)^{1/2} v_{ij,kl}\, \mathrm{vec}(e_i e_j^T)\mathrm{vec}^T(e_k e_l^T)$, and the definition of the commutation matrix.
References

[1] Basser, P. J. and Pajevic, S. (2007). Spectral decomposition of a 4th order covariance tensor: Applications to diffusion tensor MRI. Signal Processing 87, 220–236.
[2] Bodnar, T. and Okhrin, Y. (2008). Properties of the singular, inverse and generalized inverse partitioned Wishart distributions. J. Multivariate Anal. 99, 2389–2405. MR2463397
[3] Dauxois, J., Romain, Y. and Viguier-Pla, S. (1994). Tensor products and statistics. Linear Algebra and its Applications 210, 59–88. MR1294771
[4] Díaz-García, J. A. and Gutiérrez-Jáimez, R. (2006). Distribution of the generalized inverse of a random matrix and its applications. Journal of Statistical Planning and Inference 136, 183–192. MR2207179
[5] Eaton, M. L. (2007). Multivariate Statistics: A Vector Space Approach. Beachwood, Ohio: Institute of Mathematical Statistics. MR2431769
[6] Jeffreys, H. (1931). Cartesian Tensors. Cambridge: Cambridge University Press. Page 91. MR0133075
[7] Jog, C. S. (2006). A concise proof of the representation theorem for fourth-order isotropic tensors. J. Elasticity 85, 119–124. MR2265724
[8] Henderson, H. V. and Searle, S. R. (1979). Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Canadian J. Statist. 7, 65–81. MR0549795
[9] Itskov, M. (2009). Tensor Algebra and Tensor Analysis for Engineers, 2nd Edition. New York: Springer.
[10] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Ann. Statist. 7, 381–394. MR0520247
[11] Neudecker, H. and Wansbeek, T. (1983). Some results on commutation matrices, with statistical applications. Canadian J. Statist. 11, 221–231. MR0732996
[12] Ogden, R. W. (2001). Elements of the theory of finite elasticity. In Fu, Y. B. and Ogden, R. W. (eds) Nonlinear Elasticity: Theory and Applications. Cambridge University Press, 1–58. MR1835108
[13] Stone, M. (1987). Coordinate-Free Multivariate Statistics. New York: Oxford University Press. MR0916472
[14] Srivastava, M. S. (2003). Singular Wishart and multivariate beta distributions. Ann. Statist. 31, 1537–1560. MR2012825
[15] Tyler, D. E. (1981). Asymptotic inference for eigenvectors. Ann. Statist. 9, 725–736. MR0619278
[16] Uhlig, H. (1994). On singular Wishart and singular multivariate beta distributions. Ann. Statist. 22, 395–405. MR1272090
[17] von Rosen, D. (1988). Moments for the inverted Wishart distribution. Scand. J. Statist. 15, 97–109. MR0968156
[18] Zhang, Z. (2007). Pseudo-inverse multivariate/matrix-variate distributions. J. Multivariate Anal. 98, 1684–1692. MR2370113