Electronic Journal of Statistics
Vol. 5 (2011) 146–158
ISSN: 1935-7524
DOI: 10.1214/11-EJS602
On the mean and variance of the generalized inverse of a singular Wishart matrix

R. Dennis Cook
School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street SE, Minneapolis, MN 55455
e-mail: dennis@stat.umn.edu

and

Liliana Forzani
Instituto de Matemática Aplicada del Litoral and Facultad de Ingeniería Química, CONICET and UNL, Güemes 3450, (3000) Santa Fe, Argentina
e-mail: liliana.forzani@gmail.com
Abstract: We derive the first and the second moments of the Moore-Penrose generalized inverse of a singular standard Wishart matrix without relying on a density. Instead, we use the moments of an inverse Wishart distribution and an invariance argument which is related to the literature on tensor functions. We also find the order of the spectral norm of the generalized inverse of a Wishart matrix as its dimension and degrees of freedom diverge.

AMS 2000 subject classifications: Primary 62H05, secondary 62E15.
Keywords and phrases: Inverse Wishart distribution, Moore-Penrose generalized inverse, singular inverse Wishart distributions, tensor functions.

Received October 2010.
Contents

1 Introduction
2 Moments of $W_p^{\dagger}(I_p, n)$
3 Properties of $W_p^{\dagger}(\Sigma, n)$
  3.1 Mean and variance
  3.2 Order of $\|\Sigma^{1/2} W_p^{\dagger}(\Sigma, n) \Sigma^{1/2}\|$
4 Proof of Theorem 2.1
  4.1 $E(W^{\dagger})$
  4.2 $E(W^{\dagger} W^{\dagger})$
  4.3 $\operatorname{var}\{\operatorname{vec}(W^{\dagger})\}$ and $\operatorname{var}\{\operatorname{tr}(W^{\dagger})\}$
  4.4 Moments of higher order
  4.5 Proof of Proposition 4.1
5 Proof of Theorem 3.1
References

The authors thank Joe Eaton and a referee for helpful suggestions. This work was supported in part by grant DMS-1007547 from the U.S. National Science Foundation.
1. Introduction

Let the columns of $X = (X_1, \ldots, X_n) \in \mathbb{R}^{p \times n}$ be a sample $X_i$, $i = 1, \ldots, n$, from $N_p(0, \Sigma)$, the $p$-variate normal distribution with mean 0 and positive definite variance matrix $\Sigma$. The sum of squares matrix $S = XX^T$ follows the $p$-variate Wishart distribution with $n$ degrees of freedom, denoted by $W_p(\Sigma, n)$. If $p > n$, then $W_p(\Sigma, n)$ is called the singular Wishart distribution. Its density was given by Uhlig [16] under the Hausdorff measure and by Srivastava [14] under the Lebesgue measure on the functionally independent elements of $S$.

Let $W \sim W_p(\Sigma, n)$ and let $W^{\dagger}$ be the usual Moore-Penrose inverse of $W$, defined as the unique matrix $W^{\dagger}$ such that $W^{\dagger} W W^{\dagger} = W^{\dagger}$, $W W^{\dagger} W = W$, and $W W^{\dagger}$ and $W^{\dagger} W$ are symmetric. As indicated by this definition, we use the usual inner products based on the identity matrix $I_p$ for the two symmetry conditions in the Moore-Penrose inverse, unless stated otherwise. If $W$ is nonsingular then $W^{\dagger}$ is the regular inverse. The distribution of $W^{\dagger}$ is called the inverse Wishart distribution when $p \le n$ and the generalized inverse Wishart distribution when $p > n$. Díaz-García and Gutiérrez-Jáimez [4] gave an expression for the density function of the generalized inverse Wishart distribution under the Hausdorff measure. Under the Lebesgue measure on the functionally independent elements, the density was proposed by Bodnar and Okhrin [2] and Zhang [18], but their results seem inconsistent: the density given by Bodnar and Okhrin involves the eigenvalues of $W^{\dagger}$, while the density given by Zhang does not. Both of their results were based on the density of the singular Wishart distribution given by Srivastava [14], and neither gave moments of the distribution. If $W \sim W_p(\Sigma, n)$ and $p \le n$, then we denote the inverse Wishart distribution of $W^{-1}$ as $W_p^{-1}(\Sigma, n)$. If $p > n$ the distribution of $W^{\dagger}$ is denoted as $W_p^{\dagger}(\Sigma, n)$.
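As a quick numerical illustration of these defining conditions (our addition, not part of the original paper), the following sketch generates a singular Wishart matrix and checks the four Penrose conditions with numpy.linalg.pinv.

```python
# Minimal sketch (ours, for illustration): the four Penrose conditions
# for the Moore-Penrose inverse of a singular Wishart matrix.
import numpy as np

rng = np.random.default_rng(0)
p, n = 10, 4                       # p > n, so W = X X^T is singular
X = rng.standard_normal((p, n))    # columns are N_p(0, I_p) draws
W = X @ X.T                        # W ~ W_p(I_p, n) with rank n < p
Wd = np.linalg.pinv(W)             # Moore-Penrose inverse W^dagger

assert np.allclose(Wd @ W @ Wd, Wd)      # W^dagger W W^dagger = W^dagger
assert np.allclose(W @ Wd @ W, W)        # W W^dagger W = W
assert np.allclose(W @ Wd, (W @ Wd).T)   # W W^dagger symmetric
assert np.allclose(Wd @ W, (Wd @ W).T)   # W^dagger W symmetric
```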
In this note we derive the first two moments of $W_p^{\dagger}(I_p, n)$ without relying on an expression of its density function, and discuss the issues involved in extending this result to $W_p^{\dagger}(\Sigma, n)$. Our results are based on the first two moments of the inverse Wishart distribution [17] plus an invariance argument. We also find the order of the spectral norm $\|\cdot\|$ of $W_p^{\dagger}(\Sigma, n)$ as $n, p \to \infty$. In addition to being a contribution to random matrix theory, these results may play a role in Bayesian analysis because the corresponding distributions are natural conjugate priors for the covariance matrix in the normal distribution [4]. They are also useful in studies of estimation methods for high dimensional $n < p$ regressions.

We present our findings on the moments of $W_p^{\dagger}(I, n)$ in Theorem 2.1 of Section 2. Those findings rely on an invariance relationship that is described in Proposition 4.1 and is related to the classical mechanics literature on tensors. Results on the moments of a $W_p^{\dagger}(\Sigma, n)$ random matrix and on its order are given in Section 3. The proof of Theorem 2.1 is given in Section 4 and the proof of Proposition 4.1 is given in Section 4.5.
Throughout this article $\sim$ means equal in distribution and $\mathbb{R}^{p \times q}$ denotes the collection of all real $p \times q$ matrices. For sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \asymp b_n$ if there are constants $m$, $M$ and $N$ such that $0 < m < |a_n/b_n| < M < \infty$ for all $n > N$. The Kronecker product of two matrices $A = (A_{ij}) \in \mathbb{R}^{a \times b}$ and $B \in \mathbb{R}^{c \times d}$ is the $ac \times bd$ matrix expressed in block form as $A \otimes B = (A_{ij} B)$, $i = 1, \ldots, a$, $j = 1, \ldots, b$. The vec operator [8] maps $A \in \mathbb{R}^{a \times b}$ to $\mathbb{R}^{ab}$ by stacking its columns. We use $e_i \in \mathbb{R}^p$ to denote the vector with a 1 in the $i$-th position and 0's elsewhere. The $p^2 \times p^2$ commutation matrix is denoted as $C_{p^2} = \sum_{i,j} (e_i \otimes e_j)(e_j^T \otimes e_i^T)$ [10, 11].
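As a small illustration of this definition (our addition, not from the paper), the commutation matrix can be assembled directly from the displayed sum and checked against its defining property $C_{p^2}\operatorname{vec}(A) = \operatorname{vec}(A^T)$.

```python
# Sketch (ours): build C_{p^2} from the definition above and verify that
# it swaps vec(A) and vec(A^T); it is also its own inverse.
import numpy as np

p = 4
E = np.eye(p)
C = sum(np.outer(np.kron(E[i], E[j]), np.kron(E[j], E[i]))
        for i in range(p) for j in range(p))

vec = lambda M: M.reshape(-1, order="F")   # column-stacking vec operator
A = np.arange(p * p, dtype=float).reshape(p, p)
assert np.allclose(C @ vec(A), vec(A.T))   # C_{p^2} vec(A) = vec(A^T)
assert np.allclose(C @ C, np.eye(p * p))   # C_{p^2} is an involution
```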
2. Moments of $W_p^{\dagger}(I_p, n)$

The first two moments of $W_p^{-1}(I_p, n)$ have been known for some time:

Proposition 2.1. (von Rosen, 1988) Let $W \sim W_p(I_p, n)$. If $n > p + 3$, then $\operatorname{rank}(W) = p$ with probability 1,
\[
E(W^{-1}) = a_1 I_p,
\]
\[
E(W^{-1} W^{-1}) = b_1 I_p,
\]
\[
\operatorname{var}\{\operatorname{vec}(W^{-1})\} = c_1 (I_{p^2} + C_{p^2}) + 2 d_1 \operatorname{vec}(I_p) \operatorname{vec}^T(I_p),
\]
where $a_1 = (n-p-1)^{-1}$, $b_1 = (n-1) c_1$, $c_1^{-1} = (n-p)(n-p-1)(n-p-3)$, and $d_1^{-1} = (n-p)(n-p-1)^2(n-p-3)$.
There is a close relationship between the form of $\operatorname{var}\{\operatorname{vec}(W^{-1})\}$ and the spectral decomposition of a fourth-order isotropic tensor from classical elasticity theory: Expressing $\operatorname{var}\{\operatorname{vec}(W^{-1})\}$ in terms of the elements $W^{-1}_{ij}$ of $W^{-1}$ and rearranging terms we have
\[
\operatorname{cov}(W^{-1}_{ij}, W^{-1}_{kl}) = c_1 (\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}) + 2 d_1 \delta_{ij}\delta_{kl}
= \{(2 c_1 + 6 d_1)/3\} \delta_{ij}\delta_{kl} + 2 c_1 \{(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk})/2 - \delta_{ij}\delta_{kl}/3\},
\]
where Kronecker's delta $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. Except for the coefficients $(2 c_1 + 6 d_1)$ and $2 c_1$, this form is identical to a classical tensor decomposition that is related to the bulk and shear moduli (see, for example, equation (14) of [1] and the associated references). Further comments on the relationship between this work and continuum mechanics are given in Section 4. von Rosen (1988) also gives various moments of $W_p^{-1}(\Sigma, n)$, but here our focus is on Wisharts with $\Sigma = I_p$.
The following theorem gives results for $W_p^{\dagger}(I_p, n)$ that are analogous to those stated in Proposition 2.1 for $W_p^{-1}(I_p, n)$.

Theorem 2.1. Let $W \sim W_p(I_p, n)$. If $p > n + 3$, then $\operatorname{rank}(W) < p$ with probability 1,
\[
E(W^{\dagger}) = a_2 I_p,
\]
\[
E(W^{\dagger} W^{\dagger}) = b_2 I_p,
\]
\[
\operatorname{var}\{\operatorname{vec}(W^{\dagger})\} = c_2 (I_{p^2} + C_{p^2}) + 2 d_2 \operatorname{vec}(I_p) \operatorname{vec}^T(I_p),
\]
\[
\operatorname{var}\{\operatorname{tr}(W^{\dagger})\} = 2 c_3 n + 2 d_3 n^2,
\]
where
\[
a_2 = \frac{n}{p(p-n-1)}, \qquad
b_2 = \frac{n(p-1) c_3}{p},
\]
\[
c_2 = \frac{n\{p(p-1) - n(p-n-2) - 2\} c_3}{p(p-1)(p+2)}, \qquad
d_2 = \frac{n\{n^2(n-1) + 2n(p-2)(p-n) + 2p(p-1)\} d_3}{p^2(p-1)(p+2)},
\]
$c_3^{-1} = (p-n)(p-n-1)(p-n-3)$ and $d_3^{-1} = (p-n)(p-n-1)^2(p-n-3)$.
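These constants are easy to check by simulation. The following Monte Carlo sketch (our addition, not from the paper) compares the sample mean of $W^{\dagger}$ with $a_2 I_p$ and the sample variance of $\operatorname{tr}(W^{\dagger})$ with $2 c_3 n + 2 d_3 n^2$.

```python
# Monte Carlo sketch (ours): check E(W^dagger) = a_2 I_p and
# var{tr(W^dagger)} = 2 c_3 n + 2 d_3 n^2 from Theorem 2.1.
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 12, 3, 50_000                 # requires p > n + 3
a2 = n / (p * (p - n - 1))
c3 = 1 / ((p - n) * (p - n - 1) * (p - n - 3))
d3 = 1 / ((p - n) * (p - n - 1) ** 2 * (p - n - 3))

Wd = np.array([np.linalg.pinv(X @ X.T)
               for X in rng.standard_normal((reps, p, n))])
print(np.abs(Wd.mean(axis=0) - a2 * np.eye(p)).max())   # near 0
print(np.trace(Wd, axis1=1, axis2=2).var(),
      2 * c3 * n + 2 * d3 * n ** 2)                      # approximately equal
```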
The constraints $n > p + 3$ and $p > n + 3$ in Proposition 2.1 and Theorem 2.1 are needed to ensure that the moments exist. Comparing Proposition 2.1 and Theorem 2.1 we see that the first two moments of $W^{\dagger}$ have the same functional form in the singular and nonsingular cases, differing only by $a_j$, $b_j$, $c_j$ and $d_j$, $j = 1, 2$. The following corollary gives the asymptotic magnitudes of these factors. Essentially, it tells us that their asymptotic behavior depends weakly on the rank of $W$. Its proof is straightforward and is omitted.

Corollary 2.1. If $n > p + 3$ and $p/n \to r$ with $0 \le r < 1$ as $p, n \to \infty$ then $a_1 \asymp n^{-1}$, $b_1 \asymp n^{-2}$, $c_1 \asymp n^{-3}$ and $d_1 \asymp n^{-4}$. If $p > n + 3$ and $n/p \to r$ with $0 < r < 1$ as $p, n \to \infty$ then $a_2 \asymp p^{-1} \asymp n^{-1}$, $b_2 \asymp p^{-2} \asymp n^{-2}$, $c_2 \asymp p^{-3} \asymp n^{-3}$ and $d_2 \asymp p^{-4} \asymp n^{-4}$.
To gain some intuition about the structure of the variance in Theorem 2.1, we first recognize that one role of $P_s \equiv (I_{p^2} + C_{p^2})/2$ is to project onto the space of symmetric $p \times p$ matrices: Let $A$ be a $p \times p$ matrix. Then $P_s \operatorname{vec}(A) = \{\operatorname{vec}(A) + \operatorname{vec}(A^T)\}/2$ and $\operatorname{vec}^{-1}\{P_s \operatorname{vec}(A)\} = (A + A^T)/2$. Also, $P_v \equiv \operatorname{vec}(I_p) \operatorname{vec}^T(I_p)/p$ projects onto $\operatorname{span}\{\operatorname{vec}(I_p)\}$. Consequently,
\[
\operatorname{var}\{\operatorname{vec}(W^{\dagger})\} = 2 c_2 P_s + 2 p d_2 P_v = 2 c_2 (P_s - P_v) + 2 (c_2 + p d_2) P_v,
\]
where $P_s - P_v$ and $P_v$ are orthogonal projection operators. One implication of this development is given in the following corollary.
Corollary 2.2. The eigenvalues of $\operatorname{var}\{\operatorname{vec}(W^{\dagger})\}$ are $2(c_2 + p d_2)$ with multiplicity 1, $2 c_2$ with multiplicity $p(p+1)/2 - 1$ and 0 with multiplicity $p(p-1)/2$. The corresponding eigenvectors are $\operatorname{vec}(I_p)/\sqrt{p}$, the eigenvectors of $P_s - P_v$ and the eigenvectors of $I_{p^2} - P_s$.
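The projection structure and these multiplicities can be verified numerically; the sketch below (our addition) uses arbitrary positive values for $c_2$ and $d_2$, since only the structure is at issue.

```python
# Sketch (ours): P_s and P_v are projections, and the variance form
# 2 c2 (P_s - P_v) + 2 (c2 + p d2) P_v has eigenvalue multiplicities
# 1, p(p+1)/2 - 1 and p(p-1)/2, as in Corollary 2.2.
import numpy as np

p = 5
E = np.eye(p)
C = sum(np.outer(np.kron(E[i], E[j]), np.kron(E[j], E[i]))
        for i in range(p) for j in range(p))
vI = np.eye(p).reshape(-1, order="F")          # vec(I_p)
Ps = (np.eye(p * p) + C) / 2                   # symmetrizer projection
Pv = np.outer(vI, vI) / p                      # projection onto span{vec(I_p)}
assert np.allclose(Ps @ Ps, Ps) and np.allclose(Pv @ Pv, Pv)

c2, d2 = 0.7, 0.3                              # arbitrary illustrative values
V = 2 * c2 * (Ps - Pv) + 2 * (c2 + p * d2) * Pv
vals = np.round(np.linalg.eigvalsh(V), 8)
for v in np.unique(vals):
    print(v, np.sum(vals == v))                # 0.0 x10, 1.4 x14, 4.4 x1
```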
If $W \sim W_p(\Sigma, n)$ then $\Sigma^{-1/2} W \Sigma^{-1/2} \sim W_p(I_p, n)$ and
\[
W^{-} = \Sigma^{-1/2} (\Sigma^{-1/2} W \Sigma^{-1/2})^{\dagger} \Sigma^{-1/2}
\]
is a reflexive generalized inverse of $W$. It follows straightforwardly that

Corollary 2.3. If $W \sim W_p(\Sigma, n)$ and $p > n + 3$ then
\[
E(W^{-}) = a_2 \Sigma^{-1},
\]
\[
\operatorname{var}\{\operatorname{vec}(W^{-})\} = c_2 (I_{p^2} + C_{p^2})(\Sigma^{-1} \otimes \Sigma^{-1}) + 2 d_2 \operatorname{vec}(\Sigma^{-1}) \operatorname{vec}^T(\Sigma^{-1}),
\]
\[
\operatorname{cov}(W^{-}_{ij}, W^{-}_{kl}) = c_2 (\Sigma^{-1}_{ik} \Sigma^{-1}_{jl} + \Sigma^{-1}_{il} \Sigma^{-1}_{jk}) + 2 d_2 \Sigma^{-1}_{ij} \Sigma^{-1}_{kl},
\]
where $W^{-}_{ij}$ and $\Sigma^{-1}_{ij}$ denote element $(i, j)$ of $W^{-}$ and $\Sigma^{-1}$. The second and third conclusions are the same, except the third is in terms of the elements of $W^{-}$.
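As a numerical check (our addition, not from the paper), the first conclusion of Corollary 2.3 can be verified by averaging simulated reflexive inverses; a diagonal $\Sigma$ is used here only to keep $\Sigma^{-1/2}$ simple.

```python
# Monte Carlo sketch (ours): E(W^-) = a_2 Sigma^{-1}, where
# W^- = Sigma^{-1/2} (Sigma^{-1/2} W Sigma^{-1/2})^dagger Sigma^{-1/2}.
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 6, 2, 50_000                       # requires p > n + 3
Sig = np.diag(np.linspace(1.0, 3.0, p))         # a simple diagonal Sigma > 0
Sri = np.diag(1 / np.sqrt(np.diag(Sig)))        # Sigma^{-1/2}
a2 = n / (p * (p - n - 1))

acc = np.zeros((p, p))
for _ in range(reps):
    X = np.sqrt(np.diag(Sig))[:, None] * rng.standard_normal((p, n))
    W = X @ X.T                                 # W ~ W_p(Sigma, n)
    acc += Sri @ np.linalg.pinv(Sri @ W @ Sri) @ Sri
print(np.round(acc / reps, 4))                  # approx a_2 * Sigma^{-1}
print(np.round(a2 * np.linalg.inv(Sig), 4))
```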
The form of $\operatorname{var}\{\operatorname{vec}(W^{-})\}$ given in Corollary 2.3 is identical to the asymptotic variance of the covariance matrix from a sample from an elliptically contoured distribution. In that case the constants $c_2 = 1 + \kappa$ and $d_2 = \kappa/2$, where $\kappa$ is the kurtosis of the distribution (see, for example, Tyler [15]).

Because $\Sigma W^{-} W$ and $\Sigma^{-1} W W^{-}$ are symmetric, the reflexive generalized inverse $W^{-}$ of $W$ is also the Moore-Penrose inverse in the inner products based on $\Sigma$ and $\Sigma^{-1}$ [13], but $W^{-}$ is not the usual Moore-Penrose inverse since $W^{-} W$ and $W W^{-}$ are not symmetric. We were unable to find succinct expressions for the mean and variance of $W^{\dagger}(\Sigma, n)$ that are analogous to those for $W^{\dagger}(I_p, n)$ given in Theorem 2.1. In the next section we give some results on the moments of $W_p^{\dagger}(\Sigma, n)$. We also give the order of the spectral norm $\|\cdot\|$ of the scaled Wishart $\Sigma^{1/2} W_p^{\dagger}(\Sigma, n) \Sigma^{1/2}$ as $n, p \to \infty$, which may be helpful in asymptotic studies of regressions with $p > n$.
3. Properties of $W_p^{\dagger}(\Sigma, n)$

3.1. Mean and variance

Let $W \sim W_p(\Sigma, n)$ with $p > n + 3$. The singular Wishart matrix $W$ can be decomposed as $W \sim Y Y^T$, where $\operatorname{vec}(Y) \sim N(0, \Sigma \otimes I_n)$. Since $Y \in \mathbb{R}^{p \times n}$ has rank $n$ with probability 1, the usual Moore-Penrose generalized inverse can be decomposed as
\[
W^{\dagger} \sim Y (Y^T Y)^{-2} Y^T \sim \Sigma^{1/2} Z (Z^T \Sigma Z)^{-2} Z^T \Sigma^{1/2}, \tag{3.1}
\]
where $Z \in \mathbb{R}^{p \times n}$ is a matrix of iid standard normal variates. Write the spectral decomposition of $\Sigma$ as $\Sigma = \Gamma \Lambda \Gamma^T$, where $\Gamma \in \mathbb{R}^{p \times p}$ is orthogonal and $\Lambda > 0$ is diagonal. Since the distribution of $Z$ is invariant under orthogonal transformations we have $W^{\dagger} \sim \Gamma \Lambda^{1/2} Z (Z^T \Lambda Z)^{-2} Z^T \Lambda^{1/2} \Gamma^T$. Consequently, without loss of generality, we assume that $\Sigma = \Lambda$ is a diagonal matrix when studying moments and other quantities. The left and right orthogonal transformations $\Gamma$ and $\Gamma^T$ can be restored straightforwardly for a general $\Sigma > 0$.
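The first identity in (3.1) is a deterministic fact about full column rank matrices, and is easy to confirm numerically (our sketch, not from the paper):

```python
# Sketch (ours): for Y of full column rank, the Moore-Penrose inverse of
# W = Y Y^T equals Y (Y^T Y)^{-2} Y^T, the identity used in (3.1).
import numpy as np

rng = np.random.default_rng(3)
p, n = 9, 4
Y = rng.standard_normal((p, n))         # rank n with probability 1
G = np.linalg.inv(Y.T @ Y)              # (Y^T Y)^{-1}
assert np.allclose(np.linalg.pinv(Y @ Y.T), Y @ G @ G @ Y.T)
```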
Let $M(\Lambda) = E(W^{\dagger})$. Then it follows that for all orthogonal matrices $P \in \mathbb{R}^{p \times p}$, $M(P^T \Lambda P) = P^T M(\Lambda) P$ and thus $M(\Lambda)$ is a tensor-valued isotropic tensor function of $\Lambda$. Isotropic tensor functions have been studied extensively in the literature on continuum mechanics (see [12] for an introduction and [9] for recent results). For instance, it is known from this literature that $M(\Lambda)$ and $\Lambda$ have the same eigenvectors. Although there are various representations for isotropic tensor functions [9], they do not seem to provide further illumination in this setting. Let $V(\Lambda) = \operatorname{var}\{\operatorname{vec}(W^{\dagger})\} \in \mathbb{R}^{p^2 \times p^2}$. The variance $V(P^T \Lambda P) = (P^T \otimes P^T) V(\Lambda) (P \otimes P)$ is similarly structured as a fourth-order tensor function, and much the same comments apply.
When $\Lambda = I_p$, the distribution of $W^{\dagger}$ is invariant under orthogonal transformations, $W^{\dagger} \sim P W^{\dagger} P^T$ for all orthogonal $P \in \mathbb{R}^{p \times p}$. This invariance property was used extensively in the moment derivations for Theorem 2.1. However, when $\Lambda \ne I_p$ the distribution of $W^{\dagger}$ is no longer invariant and the moments of $W^{\dagger}$ become more complicated. Nevertheless, it is still possible to make some progress using symmetry arguments involving the rows $z_i^T$ of $Z$, essentially utilizing invariance under a restricted class of transformations. This leads to the results stated in Theorem 3.1. In preparation, let $m_{ij}(\Lambda) = E\{z_i^T (Z^T \Lambda Z)^{-2} z_j\}$, let $v_{ij,kl}(\Lambda) = \operatorname{cov}\{z_i^T (Z^T \Lambda Z)^{-2} z_j, z_k^T (Z^T \Lambda Z)^{-2} z_l\}$, and let $\Lambda_i$ denote the $i$-th diagonal element of $\Lambda$, $i, j, k, l = 1, \ldots, p$.
Theorem 3.1. Assume that $\Sigma = \Lambda$ is a diagonal matrix with diagonal elements $\Lambda_i$, $i = 1, \ldots, p$. Then $M(\Lambda)$ is a diagonal matrix with diagonal elements $M_{ii}(\Lambda) = \Lambda_i m_{ii}(\Lambda)$ and
\[
V(\Lambda) = \sum_{i,j=1}^{p} \Lambda_i \Lambda_j v_{ii,jj} (e_i e_j^T \otimes e_i e_j^T)
+ \sum_{i,j=1}^{p} \Lambda_i \Lambda_j v_{ij,ij} (e_j e_j^T \otimes e_i e_i^T)(I_{p^2} + C_{p^2})
- 2 \sum_{i=1}^{p} \Lambda_i^2 v_{ii,ii} (e_i e_i^T \otimes e_i e_i^T),
\]
where the $\Lambda$ arguments for $v_{(\cdot)}$ on the right hand side have been suppressed to improve readability, $v_{ii,jj} = \operatorname{cov}\{z_i^T (Z^T \Lambda Z)^{-2} z_i, z_j^T (Z^T \Lambda Z)^{-2} z_j\}$, $v_{ij,ij} = \operatorname{var}\{z_i^T (Z^T \Lambda Z)^{-2} z_j\}$ and $v_{ii,ii} = \operatorname{var}\{z_i^T (Z^T \Lambda Z)^{-2} z_i\}$.

The moments $m_{ii}$, $v_{ij,ij}$, $v_{ii,jj}$ and $v_{ii,ii}$ needed for Theorem 3.1 evidently do not have tractable closed-form representations.
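They can, however, be estimated by simulation. A minimal sketch (our addition, not from the paper) estimates $m_{ii}(\Lambda)$, and hence the diagonal of $M(\Lambda)$, by Monte Carlo:

```python
# Sketch (ours): Monte Carlo estimates of m_ii(Lambda) = E{z_i^T (Z^T Lambda Z)^{-2} z_i},
# giving the diagonal M_ii(Lambda) = Lambda_i m_ii(Lambda) of Theorem 3.1.
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 6, 2, 20_000                  # requires p > n + 3
lam = np.linspace(1.0, 2.0, p)             # diagonal of Lambda

m = np.zeros(p)
for _ in range(reps):
    Z = rng.standard_normal((p, n))        # rows z_i^T
    G = np.linalg.inv(Z.T @ (lam[:, None] * Z))   # (Z^T Lambda Z)^{-1}
    H = G @ G                                     # (Z^T Lambda Z)^{-2}
    m += np.einsum("ij,jk,ik->i", Z, H, Z)        # z_i^T H z_i for each i
m /= reps
print(np.round(lam * m, 4))                # estimated diagonal of E(W^dagger)
```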
3.2. Order of $\|\Sigma^{1/2} W_p^{\dagger}(\Sigma, n) \Sigma^{1/2}\|$

Let $Z_0 = Z (Z^T Z)^{-1/2}$, and let $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues of $\Sigma$. Then
\[
\Gamma^T W^{\dagger} \Gamma = \Lambda^{1/2} Z (Z^T \Lambda Z)^{-2} Z^T \Lambda^{1/2}
= \Lambda^{1/2} Z (Z^T Z)^{-1} (Z_0^T \Lambda Z_0)^{-2} (Z^T Z)^{-1} Z^T \Lambda^{1/2}.
\]
Because the normalized matrix $Z_0$ has orthogonal columns, $(Z_0^T \Lambda Z_0)^{-2} \le \lambda_{\min}^{-2} I_n$ and thus $\Sigma^{1/2} W^{\dagger} \Sigma^{1/2} \le \lambda_{\min}^{-2} \Gamma \Lambda W_I^{\dagger} \Lambda \Gamma^T$, where $W_I^{\dagger} \sim W_p^{\dagger}(I_p, n)$. The order of $\|\Sigma^{1/2} W^{\dagger} \Sigma^{1/2}\|$ can now be found by application of Chebyshev's inequality.
Let $\epsilon > 0$ and, for notational convenience, let $H = \Sigma^{1/2} W^{\dagger} \Sigma^{1/2}$. Then for all $h \in \mathbb{R}^p$ with $\|h\| = 1$,
\[
\Pr(h^T H h \ge \epsilon) \le \Pr(\lambda_{\min}^{-2} h^T \Gamma \Lambda W_I^{\dagger} \Lambda \Gamma^T h \ge \epsilon)
\]
\[
\le \epsilon^{-2} \lambda_{\min}^{-4} \{\operatorname{var}(h^T \Gamma \Lambda W_I^{\dagger} \Lambda \Gamma^T h) + E^2(h^T \Gamma \Lambda W_I^{\dagger} \Lambda \Gamma^T h)\}
\]
\[
= \epsilon^{-2} \lambda_{\min}^{-4} (h^T \Gamma \Lambda \otimes h^T \Gamma \Lambda) \operatorname{var}\{\operatorname{vec}(W_I^{\dagger})\} (\Lambda \Gamma^T h \otimes \Lambda \Gamma^T h)
+ \epsilon^{-2} \lambda_{\min}^{-4} a_2^2 (h^T \Gamma \Lambda^2 \Gamma^T h)^2
\]
\[
\le \epsilon^{-2} (\lambda_{\max}/\lambda_{\min})^4 \{2(c_2 + p d_2) + a_2^2\},
\]
where $2(c_2 + p d_2) = \|\operatorname{var}\{\operatorname{vec}(W_I^{\dagger})\}\|$ as given in Corollary 2.2 and $a_2$ is as defined in Theorem 2.1. Combining this with Theorem 2.1 and the conclusions of Corollary 2.1 we have

Corollary 3.1. Let $W \sim W_p(\Sigma, n)$.
(i) Assume that $n > p + 3$ and that $p/n \to r$ with $0 \le r < 1$. Then $\|\Sigma^{1/2} W^{\dagger} \Sigma^{1/2}\| = O_p(n^{-1})$.
(ii) Assume that the condition number $\lambda_{\max}/\lambda_{\min}$ of $\Sigma$ is bounded as $p \to \infty$, that $p > n + 3$ and that $n/p \to r$ with $0 < r < 1$. Then $\|\Sigma^{1/2} W^{\dagger} \Sigma^{1/2}\| = O_p(n^{-1})$.
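A small simulation (our addition, not from the paper) illustrates conclusion (ii) for $\Sigma = I_p$ with $n/p = 1/2$: the scaled norm $n\|W^{\dagger}\|$ stays roughly constant as $n$ grows.

```python
# Simulation sketch (ours): with Sigma = I_p and n/p held at 1/2,
# n * ||W^dagger|| is roughly constant, consistent with O_p(n^{-1}).
import numpy as np

rng = np.random.default_rng(5)
for n in (10, 20, 40, 80):
    p = 2 * n                                   # n/p = 1/2, p > n + 3
    norms = [np.linalg.norm(np.linalg.pinv(Z @ Z.T), 2)
             for Z in rng.standard_normal((200, p, n))]
    print(n, round(n * np.median(norms), 3))    # roughly constant in n
```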
4. Proof of Theorem 2.1

The general idea of this proof is to use invariance arguments along with moment matching via Proposition 2.1.

A singular Wishart matrix $W \sim W_p(I_p, n)$, $p > n + 3$, can be decomposed as $W = Y Y^T$ with $\operatorname{vec}(Y) \sim N(0, I_p \otimes I_n)$ and $Y = H^T D^{1/2} U$, where $H^T \in \mathbb{R}^{p \times n}$ is semi-orthogonal, $H H^T = I_n$, $U \in \mathbb{R}^{n \times n}$ is orthogonal and the diagonal elements $d_1, \ldots, d_n$ of the diagonal matrix $D \in \mathbb{R}^{n \times n}$ are non-zero with probability 1. Consequently, $W = H^T D H$ and $W^{\dagger} = H^T D^{-1} H$. Moreover $Y^T Y \sim W_n(I_n, p)$ and since $p > n + 3$, $Y^T Y$ has full rank with probability 1 and $(Y^T Y)^{-1} = U^T D^{-1} U \sim W_n^{-1}(I_n, p)$. Interchanging $n$ and $p$ in Proposition 2.1 we have
\[
E\{(Y^T Y)^{-1}\} = E(U^T D^{-1} U) = (p - n - 1)^{-1} I_n, \tag{4.1}
\]
\[
E\{(Y^T Y)^{-1} (Y^T Y)^{-1}\} = E(U^T D^{-2} U) = c_3 (p - 1) I_n, \tag{4.2}
\]
\[
\operatorname{var}\{\operatorname{vec}(Y^T Y)^{-1}\} = \operatorname{var}\{\operatorname{vec}(U^T D^{-1} U)\} = c_3 (I_{n^2} + C_{n^2}) + 2 d_3 \operatorname{vec}(I_n) \operatorname{vec}^T(I_n), \tag{4.3}
\]
where $c_3^{-1} = (p - n)(p - n - 1)(p - n - 3)$, $d_3^{-1} = (p - n)(p - n - 1)^2(p - n - 3)$.
4.1. $E(W^{\dagger})$

Using the fact that $Y \sim P Y$ for any orthogonal matrix $P \in \mathbb{R}^{p \times p}$, we get
\[
E(W^{\dagger}) = E[\{(P Y)(P Y)^T\}^{\dagger}] = P E(W^{\dagger}) P^T
\]
and consequently $E(W^{\dagger}) = a I_p$ (see, for example, Eaton [5], Proposition 2.14). It remains to find $a$. Since $W^{\dagger} = H^T D^{-1} H$,
\[
a p = \operatorname{tr}\{E(W^{\dagger})\} = \operatorname{tr}\{E(H^T D^{-1} H)\} = \operatorname{tr}\{E(D^{-1})\} = n (p - n - 1)^{-1},
\]
where we used (4.1) for the last equality. From this we get $E(W^{\dagger})$; that is, $a = a_2$.
4.2. $E(W^{\dagger} W^{\dagger})$

Since $E(W^{\dagger} W^{\dagger}) = P E(W^{\dagger} W^{\dagger}) P^T$ for any orthogonal matrix $P$, $E(W^{\dagger} W^{\dagger}) = b I_p$. To find $b$ we have
\[
b p = \operatorname{tr}\{E(W^{\dagger} W^{\dagger})\} = \operatorname{tr}\{E(H^T D^{-2} H)\} = \operatorname{tr}\{E(D^{-2})\} = c_3 n (p - 1),
\]
where we used (4.2) for the last equality. From this we conclude that $b = b_2$.
4.3. $\operatorname{var}\{\operatorname{vec}(W^{\dagger})\}$ and $\operatorname{var}\{\operatorname{tr}(W^{\dagger})\}$

Our proof of this part is based on the following proposition, which gives a characterization of matrices that are invariant under a subclass of the orthogonal transformations.

Proposition 4.1. Let $A \in \mathbb{R}^{p^2 \times p^2}$ be such that $(P^T \otimes P^T) A (P \otimes P) = A$ for all orthogonal matrices $P \in \mathbb{R}^{p \times p}$. Then $A = c I_{p^2} + f C_{p^2} + 2 d \operatorname{vec}(I_p) \operatorname{vec}^T(I_p)$, for some real multipliers $c$, $d$ and $f$.

This proposition is apparently well-known in the literature on continuum mechanics, where it is often referred to as a representation theorem for fourth-order isotropic tensors. Its proof for the case $p = 3$ can be found in the classical literature on Cartesian tensors [6]. Jog [7] provides a concise proof and cites ten other demonstrations of the same result, most of which are for $p = 3$. All of these proofs rely heavily on analytic traditions, notation and tensor operators that are not readily found in the statistical literature and might seem elusive on first reading. (A dictionary connecting tensors and common matrix operations in statistics was given by Dauxois et al. [3].) For completeness we have included in Section 4.5 a proof that does not use the technical machinery of continuum mechanics, but relies only on the Kronecker product, vec operator [8] and the commutation matrix [10, 11]. These operators were defined in the Introduction and are used widely in the statistical literature.

The collection of matrices $A$ that satisfies the hypothesis of Proposition 4.1 forms a vector space over the real field that is closed under transposition and multiplication. The proposition essentially gives a basis $\{I_{p^2}, C_{p^2}, \operatorname{vec}(I_p) \operatorname{vec}^T(I_p)\}$ for this space.
For notational convenience, let $V = \operatorname{var}\{\operatorname{vec}(W^{\dagger})\}$. Since $Y \sim P Y$ for any orthogonal matrix $P \in \mathbb{R}^{p \times p}$ we have
\[
V = \operatorname{var}[\operatorname{vec}\{(P Y)(P Y)^T\}^{\dagger}] = (P \otimes P) V (P^T \otimes P^T),
\]
and consequently $V$ satisfies the hypothesis of Proposition 4.1. However, $V$ is also invariant under multiplication by $C_{p^2}$. Using Proposition 4.1 this implies that $c = f$: $V - V C_{p^2} = (c - f) I_{p^2} - (c - f) C_{p^2} = 0$. Consequently the class of covariance matrices must be of the form
\[
\operatorname{var}\{\operatorname{vec}(W^{\dagger})\} = c (I_{p^2} + C_{p^2}) + 2 d \operatorname{vec}(I_p) \operatorname{vec}^T(I_p). \tag{4.4}
\]
It remains to find $c$ and $d$, which we do by moment matching. From (4.4) we have
\[
\operatorname{var}\{\operatorname{tr}(W^{\dagger})\} = \operatorname{var}\{\operatorname{tr}(D^{-1})\} = \operatorname{var}\{\operatorname{vec}^T(I_p) \operatorname{vec}(W^{\dagger})\} = 2 c p + 2 d p^2. \tag{4.5}
\]
Now, by (4.3) we get another expression for
\[
\operatorname{var}\{\operatorname{tr}(D^{-1})\} = \operatorname{var}\{\operatorname{vec}^T(I_n) \operatorname{vec}(Y^T Y)^{-1}\} = 2 c_3 n + 2 d_3 n^2 \tag{4.6}
\]
and using (4.5) and (4.6) together we see that $c$ and $d$ must satisfy
\[
c p + d p^2 = c_3 n + d_3 n^2. \tag{4.7}
\]
Since we are pursuing two factors $c$ and $d$ we require a second independent equation to determine them uniquely. This can be obtained by first taking the trace of (4.4) to get $\operatorname{tr}[\operatorname{var}\{\operatorname{vec}(W^{\dagger})\}] = c p^2 + c p + 2 d p$. Second, we obtain a known expression for $\operatorname{tr}[\operatorname{var}\{\operatorname{vec}(W^{\dagger})\}]$ by writing it as
\[
\operatorname{tr}[\operatorname{var}\{\operatorname{vec}(W^{\dagger})\}] = \operatorname{tr}[E\{\operatorname{vec}(W^{\dagger}) \operatorname{vec}^T(W^{\dagger})\}] - \operatorname{tr}[E\{\operatorname{vec}(W^{\dagger})\} E^T\{\operatorname{vec}(W^{\dagger})\}]
\]
and using previous results to reduce the right hand side. Using (4.2) and the previously derived form for $E(W^{\dagger})$ we have
\[
\operatorname{tr}[E\{\operatorname{vec}(W^{\dagger}) \operatorname{vec}^T(W^{\dagger})\}] = \operatorname{tr}[E\{\operatorname{vec}(H^T D^{-1} H) \operatorname{vec}^T(H^T D^{-1} H)\}] = \operatorname{tr}\{E(D^{-2})\} = n c_3 (p - 1),
\]
\[
\operatorname{tr}[E\{\operatorname{vec}(W^{\dagger})\} E^T\{\operatorname{vec}(W^{\dagger})\}] = a_2^2 p.
\]
Consequently,
\[
c p^2 + c p + 2 d p = n c_3 (p - 1) - a_2^2 p. \tag{4.8}
\]
Using Maple to solve (4.7) and (4.8) for $c$ and $d$ gives the solutions stated in Theorem 2.1.
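The same step can be reproduced with open-source tools; the sketch below (our addition, not from the paper) solves (4.7) and (4.8) symbolically with sympy and confirms that the solutions agree with $c_2$ and $d_2$ as stated in Theorem 2.1.

```python
# Sketch (ours): solve (4.7) and (4.8) for c and d with sympy and compare
# with the constants c_2 and d_2 stated in Theorem 2.1.
import sympy as sp

n, p = sp.symbols("n p", positive=True)
c, d = sp.symbols("c d")
c3 = 1 / ((p - n) * (p - n - 1) * (p - n - 3))
d3 = 1 / ((p - n) * (p - n - 1) ** 2 * (p - n - 3))
a2 = n / (p * (p - n - 1))

sol = sp.solve(
    [sp.Eq(c * p + d * p**2, c3 * n + d3 * n**2),                         # (4.7)
     sp.Eq(c * p**2 + c * p + 2 * d * p, n * c3 * (p - 1) - a2**2 * p)],  # (4.8)
    [c, d])

c2 = n * (p * (p - 1) - n * (p - n - 2) - 2) * c3 / (p * (p - 1) * (p + 2))
d2 = (n * (n**2 * (n - 1) + 2 * n * (p - 2) * (p - n) + 2 * p * (p - 1)) * d3
      / (p**2 * (p - 1) * (p + 2)))
assert sp.cancel(sol[c] - c2) == 0 and sp.cancel(sol[d] - d2) == 0
```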
4.4. Moments of higher order

Moments of higher order can in principle be found similarly, by using results from von Rosen [17] in combination with moment matching. For instance, consider $E\{(W^{\dagger})^3\} = E(H^T D^{-3} H) = b_3 I_p$. Proceeding as in Section 4.2, $E\{(Y^T Y)^{-3}\} = E(U^T D^{-3} U) = r_3 I_n$, where $r_3$ can be obtained from von Rosen's Corollary 3.1, again interchanging the roles of $n$ and $p$. Consequently, $b_3 = n r_3 / p$.
4.5. Proof of Proposition 4.1

The proof of Proposition 4.1 is based on using various subclasses of the orthogonal matrices to characterize the columns of $A$. This characterization is described in the next lemma, which uses the same hypothesis as the proposition.

Lemma 4.1. Let $A \in \mathbb{R}^{p^2 \times p^2}$ be such that $(P^T \otimes P^T) A (P \otimes P) = A$ for all orthogonal matrices $P \in \mathbb{R}^{p \times p}$. Then for some real factors $h$, $d$, $s_1$ and $s_2$,
(i) $A(e_i \otimes e_i) = h \operatorname{vec}(I_p) + d (e_i \otimes e_i)$ for $i = 1, \ldots, p$.
(ii) $A(e_i \otimes e_j) = s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)$ for $i \ne j = 1, \ldots, p$.
Proof. This proof is based on taking various forms for $P$ in the hypothesized relationship. We do not distinguish these forms notationally.

Part (i): Let $P$ be any orthogonal matrix with the property that $P e_i = e_i$ for a selected index $i$. Restricting consideration to this subclass of orthogonal matrices and multiplying the hypothesized equation on the right by $e_i \otimes e_i$ we have $(P^T \otimes P^T) A (e_i \otimes e_i) = A(e_i \otimes e_i)$. Let $M = \operatorname{vec}^{-1}\{A(e_i \otimes e_i)\}$ so that
\[
\operatorname{vec}(P^T M P) = \operatorname{vec}(M). \tag{4.9}
\]
Without loss of generality take $i = p$, and consider orthogonal matrices of the form
\[
P = \begin{pmatrix} P_1 & 0 \\ 0 & 1 \end{pmatrix},
\]
where $P_1 \in \mathbb{R}^{(p-1) \times (p-1)}$ is any orthogonal matrix. Clearly, $P e_p = e_p$. Partition $M = (M_{jk})$, $j, k = 1, 2$ according to the partition of $P$. We consider the four partition components $M_{jk}$ separately.

(1). From (4.9) we get $P_1^T M_{11} P_1 = M_{11}$ for all orthogonal matrices $P_1 \in \mathbb{R}^{(p-1) \times (p-1)}$. It is well known that this implies $M_{11} = h_p I_{p-1}$ for some real multiplier $h_p$ (see, for example, Eaton [5], Proposition 2.14).

(2). $P_1^T M_{12} = M_{12}$ for all orthogonal matrices $P_1$ implies that $M_{12} = 0$: Write $M_{12} = \lambda U$ with $U \in \mathbb{R}^{p-1}$ a semi-orthogonal matrix. Taking $P_1^T$ to be an orthogonal matrix with row $j$ equal to $U^T$, we have $M_{12} = \lambda e_j$ for any $j = 1, \ldots, p-1$ and therefore $M_{12} = 0$. Similarly, $M_{21} = 0$.

(3). Finally, $M_{22} \in \mathbb{R}$ is arbitrary and it follows that $M$ has the form $M = h_p I_p + d_p e_p e_p^T$ with $d_p = M_{22} - h_p$. Therefore, for $i = 1, \ldots, p$,
\[
\operatorname{vec}(M) = A(e_i \otimes e_i) = h_i \operatorname{vec}(I_p) + d_i (e_i \otimes e_i). \tag{4.10}
\]
It remains to show that $h_i$ and $d_i$ are constant over the index $i$. Take a new $P$ such that $P e_i = e_j$ and $P e_j = e_i$ for two selected indices $i \ne j$. Then using (4.10), the hypothesis and (4.10) again we find
\[
h_i \operatorname{vec}(I_p) + d_i (e_i \otimes e_i) = A(e_i \otimes e_i)
= (P^T \otimes P^T) A (P \otimes P)(e_i \otimes e_i)
= (P^T \otimes P^T) A (e_j \otimes e_j)
= (P^T \otimes P^T)\{h_j \operatorname{vec}(I_p) + d_j (e_j \otimes e_j)\}
= h_j \operatorname{vec}(I_p) + d_j (e_i \otimes e_i).
\]
Therefore $h_i = h_j = h$, $d_i = d_j = d$ and part (i) follows.
Part (ii): The proof of this conclusion follows the same logic as the proof of part (i), but we use the subclass of orthogonal matrices with the property that $P e_i = e_i$ and $P e_j = e_j$ for selected indices $i \ne j$. Restricting consideration to this subclass and multiplying the hypothesized equation on the right by $e_i \otimes e_j$ we have $(P^T \otimes P^T) A (e_i \otimes e_j) = A(e_i \otimes e_j)$. Let $M = \operatorname{vec}^{-1}\{A(e_i \otimes e_j)\}$ so that $\operatorname{vec}(P^T M P) = \operatorname{vec}(M)$. Without loss of generality take $i = p - 1$ and $j = p$, and consider orthogonal matrices of the form
\[
P = \begin{pmatrix} P_1 & 0 \\ 0 & I_2 \end{pmatrix},
\]
where $P_1 \in \mathbb{R}^{(p-2) \times (p-2)}$ is any orthogonal matrix. Partition $M = (M_{jk})$, $j, k = 1, 2$ to conform with the partitions of $P$. Using the hypothesis of the lemma, we reason as follows. (1) $P_1^T M_{11} P_1 = M_{11}$ for all orthogonal matrices $P_1$ of order $p - 2$ again implies that $M_{11} = c_{ij} I_{p-2}$ for some real factor $c_{ij}$. (2) $P_1^T M_{12} = M_{12}$ for all orthogonal matrices $P_1$ implies again $M_{12} = 0$, and analogously $M_{21} = 0$. (3) $M_{22}$ is arbitrary and therefore
\[
A(e_i \otimes e_j) = c_{ij} \operatorname{vec}(I_p) + s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i) + t_{ij} (e_i \otimes e_i) + u_{ij} (e_j \otimes e_j). \tag{4.11}
\]
It follows from part (i) and the fact that $A^T$ also satisfies the hypothesis that $(e_k^T \otimes e_k^T) A (e_i \otimes e_j) = 0$ for $i \ne j = 1, \ldots, p$ and $k = 1, \ldots, p$. Using this result and multiplying (4.11) by $(e_k^T \otimes e_k^T)$, with $k \ne i$ and $k \ne j$, $(e_i^T \otimes e_i^T)$ and $(e_j^T \otimes e_j^T)$ respectively, we conclude that $c_{ij} = t_{ij} = u_{ij} = 0$ for $i \ne j$. Therefore
\[
A(e_i \otimes e_j) = s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i), \quad i \ne j. \tag{4.12}
\]
It remains to show that $s_{1ij}$ and $s_{2ij}$ are constant in the indices $i \ne j$. Take a new subclass of $P$'s such that, for two additional selected indices $k \ne s$, $P e_i = e_k$, and $P e_j = e_s$ where still $i \ne j$. Then using (4.12), the hypothesis and (4.12) again we get that
\[
s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i) = (P^T \otimes P^T) A (P \otimes P)(e_i \otimes e_j)
= (P^T \otimes P^T) A (e_k \otimes e_s)
= (P^T \otimes P^T)\{s_{1ks} (e_k \otimes e_s) + s_{2ks} (e_s \otimes e_k)\}
= s_{1ks} (e_i \otimes e_j) + s_{2ks} (e_j \otimes e_i).
\]
Multiplying the first and the last term by $(e_i^T \otimes e_j^T)$ and $(e_j^T \otimes e_i^T)$ respectively we get
\[
A(e_i \otimes e_j) = s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i), \quad \text{for } i \ne j,
\]
which concludes the proof of the lemma.
Turning to the proof of Proposition 4.1, we first show that the multipliers in Lemma 4.1 are functionally related; in particular, $d = s_1 + s_2$. It follows immediately from part (ii) of Lemma 4.1 that, for $i \ne j$, $A (P \otimes P)(e_i \otimes e_j) = (P \otimes P)\{s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)\}$. Taking $P$ to be in the subclass of orthogonal matrices with the property that, for selected indices $i \ne j$, $P e_i = (e_i + e_j)/\sqrt{2}$ and $P e_j = (e_j - e_i)/\sqrt{2}$, we have immediately that
\[
A\left(\frac{e_i + e_j}{\sqrt{2}} \otimes \frac{e_j - e_i}{\sqrt{2}}\right)
= s_1 \left(\frac{e_i + e_j}{\sqrt{2}} \otimes \frac{e_j - e_i}{\sqrt{2}}\right)
+ s_2 \left(\frac{e_j - e_i}{\sqrt{2}} \otimes \frac{e_i + e_j}{\sqrt{2}}\right).
\]
Expanding this equation and simplifying we find that
\[
d\{(e_j \otimes e_j) - (e_i \otimes e_i)\} = (s_1 + s_2)\{(e_j \otimes e_j) - (e_i \otimes e_i)\}
\]
and consequently $d = s_1 + s_2$. The conclusion of the proposition now follows from Lemma 4.1 with $d = s_1 + s_2$: Taking $v = \sum_{i,j} c_{ij} (e_i \otimes e_j)$,
\[
A v = A\left\{\sum_{i,j} c_{ij} (e_i \otimes e_j)\right\}
= \sum_{i \ne j} c_{ij}\{s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)\}
+ \sum_{i=1}^{p} c_{ii}\{h \operatorname{vec}(I_p) + (s_1 + s_2)(e_i \otimes e_i)\}
\]
\[
= s_1 \sum_{i,j} c_{ij} (e_i \otimes e_j) + s_2 \sum_{i,j} c_{ij} (e_j \otimes e_i) + h \sum_{i=1}^{p} c_{ii} \operatorname{vec}(I_p)
= s_1 v + s_2 C_{p^2} v + h \operatorname{vec}(I_p) \operatorname{vec}^T(I_p) v,
\]
where the last step uses $\operatorname{vec}^T(I_p) v = \sum_{i=1}^{p} c_{ii}$.
5. Proof of Theorem 3.1

Recall that Theorem 3.1 requires $\Sigma = \Lambda$ to be a diagonal matrix. For notational convenience, let $H = (Z^T \Lambda Z)^{-2}$. The conclusion that $M(\Lambda)$ is a diagonal matrix arises by noting that, for $i \ne j$, $z_i^T H z_j \sim -z_i^T H z_j$, and thus $E(z_i^T H z_j) = 0$.

By a similar symmetry argument, the element $v_{ij,kl}$ of $V$ equals 0 when at least one of its indices $i, j, k, l$ is distinct; that is, not equal to any other index. If no indices are distinct then they must be equal in pairs, leading to four possibilities: for $i \ne j$, $v_{ii,jj}$, $v_{ij,ij}$, $v_{ij,ji}$ and, for $i = j$, $v_{ii,ii}$. However, $v_{ij,ij} = v_{ij,ji}$, which leads to the three $v$ terms in the Theorem. The form of $V$ follows from these results, the representation $V = \sum_{ij,kl} (\Lambda_i \Lambda_j \Lambda_k \Lambda_l)^{1/2} v_{ij,kl} \operatorname{vec}(e_i e_j^T) \operatorname{vec}^T(e_k e_l^T)$, and the definition of the commutation matrix.
References

[1] Basser, P. J. and Pajevic, S. (2007). Spectral decomposition of a 4th order covariance tensor: Applications to diffusion tensor MRI. Signal Processing 87, 220–236.
[2] Bodnar, T. and Okhrin, Y. (2008). Properties of the singular, inverse and generalized inverse partitioned Wishart distributions. J. Multivariate Anal. 99, 2389–2405. MR2463397
[3] Dauxois, J., Romain, Y. and Viguier-Pla, S. (1994). Tensor products and statistics. Linear Algebra and its Applications 210, 59–88. MR1294771
[4] Díaz-García, J. A. and Gutiérrez-Jáimez, R. (2006). Distribution of the generalized inverse of a random matrix and its applications. Journal of Statistical Planning and Inference 136, 183–192. MR2207179
[5] Eaton, M. L. (2007). Multivariate Statistics: A Vector Space Approach. Beachwood, Ohio: Institute of Mathematical Statistics. MR2431769
[6] Jeffreys, H. (1931). Cartesian Tensors. Cambridge: Cambridge University Press. Page 91. MR0133075
[7] Jog, C. S. (2006). A concise proof of the representation theorem for fourth-order isotropic tensors. J. Elasticity 85, 119–124. MR2265724
[8] Henderson, H. V. and Searle, S. R. (1979). Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Canadian J. Statist. 7, 65–81. MR0549795
[9] Itskov, M. (2009). Tensor Algebra and Tensor Analysis for Engineers, 2nd Edition. New York: Springer.
[10] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Ann. Statist. 7, 381–394. MR0520247
[11] Neudecker, H. and Wansbeek, T. (1983). Some results on commutation matrices, with statistical applications. Canadian J. Statist. 11, 221–231. MR0732996
[12] Ogden, R. W. (2001). Elements of the theory of finite elasticity. In Fu, Y. B. and Ogden, R. W. (eds), Nonlinear Elasticity: Theory and Applications. Cambridge University Press, 1–58. MR1835108
[13] Stone, M. (1987). Coordinate-Free Multivariate Statistics. New York: Oxford University Press. MR0916472
[14] Srivastava, M. S. (2003). Singular Wishart and multivariate beta distributions. Ann. Statist. 31, 1537–1560. MR2012825
[15] Tyler, D. E. (1981). Asymptotic inference for eigenvectors. Ann. Statist. 9, 725–736. MR0619278
[16] Uhlig, H. (1994). On singular Wishart and singular multivariate beta distributions. Ann. Statist. 22, 395–405. MR1272090
[17] von Rosen, D. (1988). Moments for the inverted Wishart distribution. Scand. J. Statist. 15, 97–109. MR0968156
[18] Zhang, Z. (2007). Pseudo-inverse multivariate/matrix-variate distributions. J. Multivariate Anal. 98, 1684–1692. MR2370113
... With the standard assumption of normally distributed asset returns, the stochastic components ofw T P consists of S + andx, which are independent under the assumption of normally distributed data (see, e.g., [10]). Unfortunately, there exist no derivation of the expected value or variance of S + , when p > N. In [21] however, these quantities are presented in the special case of = I p . The authors also provided approximate results, using moments of standard normal random variables, and exact results for moments of the generalized reflexive inverse, another quantity that can be applied as an inverse of S. Further, in a recent paper [28], several bounds on the mean and variance of S + are provided, based on the Poincaré separation theorem. ...
... The authors also provided approximate results, using moments of standard normal random variables, and exact results for moments of the generalized reflexive inverse, another quantity that can be applied as an inverse of S. Further, in a recent paper [28], several bounds on the mean and variance of S + are provided, based on the Poincaré separation theorem. Our paper builds on the results presented in [21] and [28] to provide bounds and approximations for the moments of the TP weights, ...
... When is the identity matrix, it is possible to derive exact moments of the TP weights obtained from the Moore-Penrose inverse in the singular case. First, note the following results presented in Theorem 2.1 of [21], which state that in the case = I p and p > n + 3, we have that ...
Article
Full-text available
In this paper, a sample estimator of the tangency portfolio (TP) weights is considered. The focus is on the situation where the number of observations is smaller than the number of assets in the portfolio and the returns are i.i.d. normally distributed. Under these assumptions, the sample covariance matrix follows a singular Wishart distribution and, therefore, the regular inverse cannot be taken. In the paper, bounds and approximations for the first two moments of the estimated TP weights are derived, as well as exact results are obtained when the population covariance matrix is equal to the identity matrix, employing the Moore–Penrose inverse. Moreover, exact moments based on the reflexive generalized inverse are provided. The properties of the bounds are investigated in a simulation study, where they are compared to the sample moments. The difference between the moments based on the reflexive generalized inverse and the sample moments based on the Moore–Penrose inverse is also studied.
... The results of Theorem II.6 are derived by approximating the Moore-Penrose inverse with the reflexive inverse (see, e.g., Cook and Forzani [51]), which provides a good approximation of the Moore-Penrose inverse when c i ∈ (1, 2) (see, Bodnar and Parolya [52]). In this case, the dynamic shrinkage estimator of the GMV portfolio (II.32)-(II.34) is expected to perform good in practice, while for larger values of c i it should be used with caution. ...
Article
In this paper, new results in random matrix theory are derived, which allow us to construct a shrinkage estimator of the global minimum variance (GMV) portfolio when the shrinkage target is a random object. More specifically, the shrinkage target is determined as the holding portfolio estimated from previous data. The theoretical findings are applied to develop theory for dynamic estimation of the GMV portfolio, where the new estimator of its weights is shrunk to the holding portfolio at each time of reconstruction. Both cases with and without overlapping samples are considered in the paper. The non-overlapping samples corresponds to the case when different data of the asset returns are used to construct the traditional estimator of the GMV portfolio weights and to determine the target portfolio, while the overlapping case allows intersections between the samples. The theoretical results are derived under weak assumptions imposed on the data-generating process. No specific distribution is assumed for the asset returns except from the assumption of finite $4+\varepsilon$ , $\varepsilon >0$ , moments. Also, the population covariance matrix with unbounded largest eigenvalue can be considered. The performance of new trading strategies is investigated via an extensive simulation. Finally, the theoretical findings are implemented in an empirical illustration based on the returns on stocks included in the S&P 500 index.
... Isotropic tensors with even orders show up in connection with symmetric matrix variates for which the distribution is rotationally invariant [PvR21] sect 3.3.2, also with studying the generalized inverse of the singular Wishart matrices [CF11]. Our aim in this section to derive the skewness and kurtosis for such matrix variates. ...
Preprint
Full-text available
Preface The first order moment and the variance are the first two characteristics for a probability distribution, actually these two quantities characterize the normal distribution fully. Both the first order moment and the variance can be considered as first and second order cumulants respectively. Cumulants are those quantities which shows the discrepancy between a given distribution and the normal one, since all higher (than second) order cumulants of the normal distribution are zero. The third and fourth order cumulants are the next characteristic in this way to be studied. They have been the focus of interest since the beginning the twenties century under the names of skewness and kurtosis. The skewness and the kurtosis are the third and the fourth order cumulants of a standardized distribution. We consider the skewness and kurtosis for multivariate distributions in this sense. The method of tensor derivative (−derivative for short) of the cumulant generator function and usage of commutator matrices prove to be efficient for obtaining the skewness and kurtosis of several distributions ([Ter21]). This Notes has a dual purpose, one is to study different multivariate distributions in interest by our method and the other is to show the usage of the R package MultiStatM (available in Repository CRAN) in this field. The Note is organized as follows: Chapter 1 summarize some basic notions and ideas which are based on earlier results, commutation and symmetrization, kurtosis matrix and −derivative of functions with symmetric variable. Chapter 2 deals with multi-normal distribution and its descendants. We consider moments, cumulants for normal vector variates and their tensor square and quadratic forms. The first four cumulants are given not only for quadratic forms of normal matrices but for symmetric matrix variates with isotropy cartesian tensors as well. The skewness and kurtosis of multivariate Gamma distribution closes this chapter. Chapter 3 concerns some multivariate discrete distributions like multivariate Bernoulli, and multino-mial distributions. The Appendix includes some known formulae of matrix theory and some proofs.
Article
We consider estimation under scenarios where the signals of interest exhibit change of characteristics over time. In particular, we consider the continual learning problem where different tasks, e.g., data with different distributions, arrive sequentially and the aim is to perform well on the newly arrived task without performance degradation on the previously seen tasks. In contrast to the continual learning literature focusing on the centralized setting, we investigate the problem from a distributed estimation perspective. We consider the well-established distributed learning algorithm CoCoA , which distributes the model parameters and the corresponding features over the network. We provide exact analytical characterization for the generalization error of CoCoA under continual learning for linear regression in a range of scenarios, where overparameterization is of particular interest. These analytical results characterize how the generalization error depends on the network structure, the task similarity and the number of tasks, and show how these dependencies are intertwined. In particular, our results show that the generalization error can be significantly reduced by adjusting the network size, where the most favorable network size depends on task similarity and the number of tasks. We present numerical results verifying the theoretical analysis and illustrate the continual learning performance of CoCoA with a digit classification task.
Article
Full-text available
The late Professor Heinz Neudecker (1933–2017) made significant contributions to the development of matrix differential calculus and its applications to econometrics, psychometrics, statistics, and other areas. In this paper, we present an insightful overview of matrix-oriented findings and their consequential implications in statistics, drawn from a careful selection of works either authored by Professor Neudecker himself or closely aligned with his scientific pursuits. The topics covered include matrix derivatives, vectorisation operators, special matrices, matrix products, inequalities, generalised inverses, moments and asymptotics, and efficiency comparisons within the realm of multivariate linear modelling. Based on the contributions of Professor Neudecker, several results related to matrix derivatives, statistical moments and the multivariate linear model, which can literally be considered to be his top three areas of research enthusiasm, are particularly included.
Article
Full-text available
Objective: Organ deformation models have the potential to improve delivery and reduce toxicity of radiotherapy, but existing data-driven motion models are based on either patient-specific or population data. We propose to combine population and patient-specific data using a Bayesian framework. Our goal is to accurately predict individual motion patterns while using fewer scans than previous models. Approach: We have derived and evaluated two Bayesian deformation models. The models were applied retrospectively to the rectal wall from a cohort of prostate cancer patients. These patients had repeat CT scans evenly acquired throughout radiotherapy. Each model was used to create coverage probability matrices (CPMs). The spatial correlations between these estimated CPMs and the ground truth, derived from independent scans of the same patient, were calculated.\\ Main results: Spatial correlation with ground truth were significantly higher for the Bayesian deformation models than both patient-specific and population-derived models with 1, 2 or 3 patient-specific scans as input. Statistical motion simulations indicate that this result will also hold for more than 3 scans. \\ Significance: The improvement over previous models means that fewer scans per patient are needed to achieve accurate deformation predictions. The models have applications in robust radiotherapy planning and evaluation, among others.
Article
We consider estimation under model misspecification where there is a model mismatch between the underlying system, which generates the data, and the model used during estimation. We propose a model misspecification framework which enables a joint treatment of the model misspecification types of having fake features as well as incorrect covariance assumptions on the unknowns and the noise. We present a decomposition of the output error into components that relate to different subsets of the model parameters corresponding to underlying, fake and missing features. Here, fake features are features which are included in the model but are not present in the underlying system. Under this framework, we characterize the estimation performance and reveal trade-offs between the number of samples, number of fake features, and the possibly incorrect noise level assumption. In contrast to existing work focusing on incorrect covariance assumptions or missing features, fake features is a central component of our framework. Our results show that fake features can significantly improve the estimation performance, even though they are not correlated with the features in the underlying system. In particular, we show that the estimation error can be decreased by including more fake features in the model, even to the point where the model is overparametrized, i.e., the model contains more unknowns than observations.
Preprint
Full-text available
Objective: Organ deformation models have the potential to improve delivery and reduce toxicity of radiotherapy, but existing data-driven motion models are based on either patient-specific or population data. We propose to combine population and patient-specific data using a Bayesian framework. Our goal is to accurately predict individual motion patterns while using fewer scans than previous models. Approach: We have derived and evaluated two Bayesian deformation models. The models were applied retrospectively to the rectal wall from a cohort of prostate cancer patients. These patients had repeat CT scans evenly acquired throughout radiotherapy. Each model was used to create coverage probability matrices (CPMs). The spatial correlations between these CPMs and ``true'' CPMs, derived from independent scans of the same patient, were calculated. Main results: Spatial correlation with ground truth were significantly higher for the Bayesian deformation models than both patient-specific and population-derived models with 1, 2 or 3 patient-specific scans as input. Statistical motion simulations indicate that this result will also hold for more than 3 scans. Significance: The improvement over known models means that fewer scans per patient are needed to achieve accurate deformation predictions. The models have applications in robust radiotherapy planning and evaluation, among others.
Preprint
Full-text available
Sketch-and-project is a framework which unifies many known iterative methods for solving linear systems and their variants, as well as further extensions to non-linear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we obtain sharp guarantees for the convergence rate of sketch-and-project methods via new tight spectral bounds for the expected sketched projection matrix. Our estimates reveal a connection between the sketch-and-project convergence rate and the approximation error of another well-known but seemingly unrelated family of algorithms, which use sketching to accelerate popular matrix factorizations such as QR and SVD. This connection brings us closer to precisely quantifying how the performance of sketch-and-project solvers depends on their sketch size. Our analysis covers not only Gaussian and sub-gaussian sketching matrices, but also a family of efficient sparse sketching methods known as LESS embeddings. Our experiments back up the theory and demonstrate that even extremely sparse sketches show the same convergence properties in practice.
Book
There is a large gap between the engineering course in tensor algebra on the one hand and the treatment of linear transformations within classical linear algebra on the other hand. The aim of this modern textbook is to bridge this gap by means of the consequent and fundamental exposition. The book is addressed primarily to engineering students with some initial knowledge of matrix algebra. Thereby the mathematical formalism is applied as far as it is absolutely necessary. Numerous exercises provided in the book are accompanied by solutions enabling an autonomous study. The last chapters of the book deal with modern developments in the theory of isotropic and anisotropic tensor functions and their applications to continuum mechanics and might therefore be of high interest for PhD-students and scientists working in this area. This second edition is completed by a number of additional examples and exercises. The text and formulae are thoroughly revised and improved where necessary. © Springer-Verlag Berlin Heidelberg 2007, 2009. All rights are reserved.
Article
This collection of papers by leading researchers in the field of finite, nonlinear elasticity concerns itself with the behavior of objects that deform when external forces or temperature gradients are applied. This process is extremely important in many industrial settings, such as aerospace and rubber industries. This book covers the various aspects of the subject comprehensively with careful explanations of the basic theories and individual chapters each covering a different research direction. The authors discuss the use of symbolic manipulation software as well as computer algorithm issues. The emphasis is placed firmly on covering modern, recent developments, rather than the very theoretical approach often found. The book will be an excellent reference for both beginners and specialists in engineering, applied mathematics and physics.
Article
Moments of arbitrary order for the inverted Wishart distribution are obtained with the help of a factorization theorem, moments for normally distributed variables and inverse moments for chi-squared variables. Expressions are given in a recursive as well as a non-recursive manner.
Article
The commutation matrix Pmn changes the order of multiplication of a Kronecker matrix product. The vec operator stacks columns of a matrix one under another in a single column. It is possible to express the vec of a Kronecker matrix product in terms of a Kronecker product of vecs of matrices. The commutation matrix plays an important role here. “Super-vec-operators” like vec A ⊗ vec A vec (A ⊗ A), and vec{(A ⊗ A)Pnn} are very convenient. Several of their properties are being studied. Both the traditional commutation matrix and vec operator and the newer concepts developed from these are applied to multivariate statistical and related problems.
Article
The vec of a matrix X stacks columns of X one under another in a single column; the vech of a square matrix X does the same thing but starting each column at its diagonal element. The Jacobian of a one-to-one transformation X → Y is then ∣∣∂(vecX)/∂(vecY) ∣∣ when X and Y each have functionally independent elements; it is ∣∣ ∂(vechX)/∂(vechY) ∣∣ when X and Y are symmetric; and there is a general form for when X and Y are other patterned matrices. Kronecker product properties of vec(ABC) permit easy evaluation of this determinant in many cases. The vec and vech operators are also very convenient in developing results in multivariate statistics.Le “vec” d'une matrice X est un vecteur contenant les eolonnes de X. Le “vech” d'une matrice carrée X est un vecteur contenant les éléments des colonnes de X qui sont sous ou sur la diagonale. Le Jacobien d'une transformation bijective X → Y s'écrit alors: ‖ ∂(vecX)/∂(vecY)‖ si X et Y ont des elements fonctionnellement independants, ‖ ∂(vechX)/∂(vech Y)‖ si X et Y sont symetriques; on presente egalement une formule generale pour le cas ou X et Y ont differents motifs. Les proprietes du produit de Kronecker de vec(ABC) facilite l'evalution de ce determinant dans plusieurs cas. Les operateurs vec et cech sont aussi utiles pour demontrer des resultats en statistique multivariee.