Electronic Journal of Statistics
Vol. 5 (2011) 146–158
ISSN: 1935-7524
DOI: 10.1214/11-EJS602
On the mean and variance of the generalized inverse of a singular Wishart matrix*

R. Dennis Cook†
School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street SE, Minneapolis, MN 55455
e-mail: dennis@stat.umn.edu

and

Liliana Forzani
Instituto de Matemática Aplicada del Litoral and Facultad de Ingeniería Química, CONICET and UNL, Güemes 3450, (3000) Santa Fe, Argentina
e-mail: liliana.forzani@gmail.com

Abstract: We derive the first and the second moments of the Moore-Penrose generalized inverse of a singular standard Wishart matrix without relying on a density. Instead, we use the moments of an inverse Wishart distribution and an invariance argument which is related to the literature on tensor functions. We also find the order of the spectral norm of the generalized inverse of a Wishart matrix as its dimension and degrees of freedom diverge.

AMS 2000 subject classifications: Primary 62H05, secondary 62E15.
Keywords and phrases: Inverse Wishart distribution, Moore-Penrose generalized inverse, singular inverse Wishart distributions, tensor functions.

Received October 2010.
Contents

1 Introduction . . . 147
2 Moments of $W_p^-(I_p, n)$ . . . 148
3 Properties of $W_p^-(\Sigma, n)$ . . . 150
   3.1 Mean and variance . . . 150
   3.2 Order of $\|\Sigma^{1/2} W_p^-(\Sigma, n)\Sigma^{1/2}\|$ . . . 151
4 Proof of Theorem 2.1 . . . 152
   4.1 $E(W^-)$ . . . 152
   4.2 $E(W^- W^-)$ . . . 153
   4.3 $\mathrm{var}\{\mathrm{vec}(W^-)\}$ and $\mathrm{var}\{\mathrm{tr}(W^-)\}$ . . . 153
   4.4 Moments of higher order . . . 154
   4.5 Proof of Proposition 4.1 . . . 155
5 Proof of Theorem 3.1 . . . 157
References . . . 157

* The authors thank Joe Eaton and a referee for helpful suggestions.
† This work was supported in part by grant DMS-1007547 from the U.S. National Science Foundation.
1. Introduction

Let the columns of $X = (X_1, \ldots, X_n) \in \mathbb{R}^{p\times n}$ be a sample $X_i$, $i = 1, \ldots, n$, from $N_p(0, \Sigma)$, the $p$-variate normal distribution with mean 0 and positive definite variance matrix $\Sigma$. The sum of squares matrix $S = XX^T$ follows the $p$-variate Wishart distribution with $n$ degrees of freedom, denoted by $W_p(\Sigma, n)$. If $p > n$, then $W_p(\Sigma, n)$ is called the singular Wishart distribution. Its density was given by Uhlig [16] under the Hausdorff measure and by Srivastava [14] under the Lebesgue measure on the functionally independent elements of $S$.
Let $W \sim W_p(\Sigma, n)$ and let $W^-$ be the usual Moore-Penrose inverse of $W$, defined as the unique matrix $W^-$ such that $W^- W W^- = W^-$, $W W^- W = W$, and $W W^-$ and $W^- W$ are symmetric. As indicated by this definition, we use the usual inner products based on the identity matrix $I_p$ for the two symmetry conditions in the Moore-Penrose inverse, unless stated otherwise. If $W$ is nonsingular then $W^-$ is the regular inverse. The distribution of $W^-$ is called the inverse Wishart distribution when $p \le n$ and the generalized inverse Wishart distribution when $p > n$. Díaz-García and Gutiérrez-Jáimez [4] gave an expression for the density function of the generalized inverse Wishart distribution under the Hausdorff measure. Under the Lebesgue measure on the functionally independent elements, the density was proposed by Bodnar and Okhrin [2] and Zhang [18], but their results seem inconsistent. The density given by Bodnar and Okhrin involves the eigenvalues of $W^-$, while the density given by Zhang does not. Both of their results were based on the density of the singular Wishart distribution given by Srivastava [14], and neither gave moments of the distribution. If $W \sim W_p(\Sigma, n)$ and $p \le n$, then we denote the inverse Wishart distribution of $W^{-1}$ as $W_p^{-1}(\Sigma, n)$. If $p > n$ the distribution of $W^-$ is denoted as $W_p^-(\Sigma, n)$.
In this note we derive the first two moments of $W_p^-(I_p, n)$ without relying on an expression of its density function, and discuss the issues involved in extending this result to $W_p^-(\Sigma, n)$. Our results are based on the first two moments of the inverse Wishart distribution [17] plus an invariance argument. We also find the order of the spectral norm $\|\cdot\|$ of $W_p^-(\Sigma, n)$ as $n, p \to \infty$. In addition to being a contribution to random matrix theory, these results may play a role in Bayesian analysis because the corresponding distributions are natural conjugate priors for the covariance matrix in the normal distribution [4]. They are also useful in studies of estimation methods for high dimensional $n < p$ regressions.

We present our findings on the moments of $W_p^-(I_p, n)$ in Theorem 2.1 of Section 2. Those findings rely on an invariance relationship that is described in Proposition 4.1 and is related to the classical mechanics literature on tensors. Results on the moments of a $W_p^-(\Sigma, n)$ random matrix and on its order are given in Section 3. The proof of Theorem 2.1 is given in Section 4 and the proof of Proposition 4.1 is given in Section 4.5.
Throughout this article $\sim$ means equal in distribution and $\mathbb{R}^{p\times q}$ denotes the collection of all real $p \times q$ matrices. For sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \asymp b_n$ if there are constants $m$, $M$ and $N$ such that $0 < m < |a_n/b_n| < M < \infty$ for all $n > N$. The Kronecker product $\otimes$ of two matrices $A = (A_{ij}) \in \mathbb{R}^{a\times b}$ and $B \in \mathbb{R}^{c\times d}$ is the $ac \times bd$ matrix expressed in block form as $A \otimes B = (A_{ij}B)$, $i = 1, \ldots, a$, $j = 1, \ldots, b$. The vec operator [8] maps $A \in \mathbb{R}^{a\times b}$ to $\mathbb{R}^{ab}$ by stacking its columns. We use $e_i \in \mathbb{R}^p$ to denote the vector with a 1 in the $i$-th position and 0's elsewhere. The $p^2 \times p^2$ commutation matrix is denoted as $C_{p^2} = \sum_{i,j} (e_i \otimes e_j)(e_j^T \otimes e_i^T)$ [10, 11].
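As a quick illustration of this notation (our sketch, not part of the paper), the commutation matrix can be built directly from the displayed sum and checked against its defining property $C_{p^2}\,\mathrm{vec}(A) = \mathrm{vec}(A^T)$:

```python
import numpy as np

def commutation_matrix(p):
    """Build C_{p^2} = sum_{i,j} (e_i ⊗ e_j)(e_j^T ⊗ e_i^T) from the definition."""
    I = np.eye(p)
    C = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            eij = np.kron(I[:, i], I[:, j])   # e_i ⊗ e_j
            eji = np.kron(I[:, j], I[:, i])   # e_j ⊗ e_i
            C += np.outer(eij, eji)           # (e_i ⊗ e_j)(e_j^T ⊗ e_i^T)
    return C

p = 4
C = commutation_matrix(p)
A = np.arange(p * p, dtype=float).reshape(p, p)
vecA = A.reshape(-1, order="F")               # vec stacks columns
assert np.allclose(C @ vecA, A.T.reshape(-1, order="F"))  # C vec(A) = vec(A^T)
assert np.allclose(C @ C, np.eye(p * p))      # C_{p^2} is its own inverse
```

The last assertion reflects the fact that transposing twice returns the original matrix.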
2. Moments of $W_p^-(I_p, n)$

The first two moments of $W_p^{-1}(I_p, n)$ have been known for some time:

Proposition 2.1. (von Rosen, 1988) Let $W \sim W_p(I_p, n)$. If $n > p + 3$, then $\mathrm{rank}(W) = p$ with probability 1,
$$E(W^{-1}) = a_1 I_p,$$
$$E(W^{-1} W^{-1}) = b_1 I_p,$$
$$\mathrm{var}\{\mathrm{vec}(W^{-1})\} = c_1 (I_{p^2} + C_{p^2}) + 2 d_1\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p),$$
where $a_1 = (n-p-1)^{-1}$, $b_1 = (n-1)c_1$, $c_1^{-1} = (n-p)(n-p-1)(n-p-3)$, and $d_1^{-1} = (n-p)(n-p-1)^2(n-p-3)$.
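Proposition 2.1 is easy to spot-check by simulation. The sketch below (ours, not the authors') draws nonsingular Wisharts with $\Sigma = I_p$ and compares the averaged inverse with $a_1 I_p$; the tolerances are loose Monte Carlo ones chosen to be many standard errors wide.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 4, 20, 4000          # nonsingular case: n > p + 3
a1 = 1.0 / (n - p - 1)            # = 1/15 here

inv_sum = np.zeros((p, p))
for _ in range(reps):
    X = rng.standard_normal((p, n))   # columns are N_p(0, I_p)
    W = X @ X.T                       # W ~ W_p(I_p, n)
    inv_sum += np.linalg.inv(W)
E_hat = inv_sum / reps

assert np.allclose(np.diag(E_hat), a1, atol=5e-3)     # diagonal ≈ a_1
off = E_hat - np.diag(np.diag(E_hat))
assert np.abs(off).max() < 5e-3                        # off-diagonal ≈ 0
```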
There is a close relationship between the form of $\mathrm{var}\{\mathrm{vec}(W^{-1})\}$ and the spectral decomposition of a fourth-order isotropic tensor from classical elasticity theory: expressing $\mathrm{var}\{\mathrm{vec}(W^{-1})\}$ in terms of the elements $W^{-1}_{ij}$ of $W^{-1}$ and rearranging terms we have
$$\mathrm{cov}(W^{-1}_{ij}, W^{-1}_{kl}) = c_1(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}) + 2 d_1 \delta_{ij}\delta_{kl} = \{(2c_1 + 6d_1)/3\}\delta_{ij}\delta_{kl} + 2c_1\{(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk})/2 - \delta_{ij}\delta_{kl}/3\},$$
where Kronecker's delta $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. Except for the coefficients $(2c_1 + 6d_1)$ and $2c_1$, this form is identical to a classical tensor decomposition that is related to the bulk and shear moduli (see, for example, equation (14) of [1] and the associated references). Further comments on the relationship between this work and continuum mechanics are given in Section 4. von Rosen (1988) also gives various moments of $W_p^{-1}(\Sigma, n)$, but here our focus is on Wisharts with $\Sigma = I_p$.
The following theorem gives results for $W_p^-(I_p, n)$ that are analogous to those stated in Proposition 2.1 for $W_p^{-1}(I_p, n)$.

Theorem 2.1. Let $W \sim W_p(I_p, n)$. If $p > n + 3$, then $\mathrm{rank}(W) < p$ with probability 1,
$$E(W^-) = a_2 I_p,$$
$$E(W^- W^-) = b_2 I_p,$$
$$\mathrm{var}\{\mathrm{vec}(W^-)\} = c_2 (I_{p^2} + C_{p^2}) + 2 d_2\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p),$$
$$\mathrm{var}\{\mathrm{tr}(W^-)\} = 2 c_3 n + 2 d_3 n^2,$$
where
$$a_2 = \frac{n}{p(p-n-1)}, \qquad b_2 = \frac{n(p-1)c_3}{p},$$
$$c_2 = \frac{n\{p(p-1) - n(p-n-2) - 2\}c_3}{p(p-1)(p+2)}, \qquad d_2 = \frac{n\{n^2(n-1) + 2n(p-2)(p-n) + 2p(p-1)\}d_3}{p^2(p-1)(p+2)},$$
$$c_3^{-1} = (p-n)(p-n-1)(p-n-3) \quad \text{and} \quad d_3^{-1} = (p-n)(p-n-1)^2(p-n-3).$$
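The singular-case moments can be spot-checked the same way, now with `numpy.linalg.pinv` for the Moore-Penrose inverse. This is our illustration, not the authors' computation; the dimensions satisfy $p > n + 3$ (indeed $p - n > 7$, so the Monte Carlo averages below have finite variance) and the tolerances are generous.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 12, 4, 4000          # singular case: p > n + 3
a2 = n / (p * (p - n - 1))        # = 4/84 here

pinv_sum = np.zeros((p, p))
traces = np.empty(reps)
for r in range(reps):
    Y = rng.standard_normal((p, n))
    W = Y @ Y.T                   # W ~ W_p(I_p, n), rank n < p
    Wm = np.linalg.pinv(W)        # Moore-Penrose inverse
    pinv_sum += Wm
    traces[r] = np.trace(Wm)
E_hat = pinv_sum / reps

assert np.allclose(np.diag(E_hat), a2, atol=8e-3)   # E(W^-) ≈ a2 I_p
assert np.abs(E_hat - np.diag(np.diag(E_hat))).max() < 8e-3
assert abs(traces.mean() - p * a2) < 0.03           # E{tr(W^-)} = p a2
# var{tr(W^-)} = 2 c3 n + 2 d3 n^2 ≈ 0.045 here; a crude bracket:
assert 0.005 < traces.var() < 0.3
```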
The constraints $n > p + 3$ and $p > n + 3$ in Proposition 2.1 and Theorem 2.1 are needed to ensure that the moments exist. Comparing Proposition 2.1 and Theorem 2.1 we see that the first two moments of $W^-$ have the same functional form in the singular and nonsingular cases, differing only by $a_j$, $b_j$, $c_j$ and $d_j$, $j = 1, 2$. The following corollary gives the asymptotic magnitudes of these factors. Essentially, it tells us that their asymptotic behavior depends weakly on the rank of $W$. Its proof is straightforward and is omitted.

Corollary 2.1. If $n > p + 3$ and $p/n \to r$ with $0 \le r < 1$ as $p, n \to \infty$ then $a_1 \asymp n^{-1}$, $b_1 \asymp n^{-2}$, $c_1 \asymp n^{-3}$ and $d_1 \asymp n^{-4}$. If $p > n + 3$ and $n/p \to r$ with $0 < r < 1$ as $p, n \to \infty$ then $a_2 \asymp p^{-1} \asymp n^{-1}$, $b_2 \asymp p^{-2} \asymp n^{-2}$, $c_2 \asymp p^{-3} \asymp n^{-3}$ and $d_2 \asymp p^{-4} \asymp n^{-4}$.
To gain some intuition about the structure of the variance in Theorem 2.1, we first recognize that one role of $P_s \equiv (I_{p^2} + C_{p^2})/2$ is to project onto the space of symmetric $p \times p$ matrices: let $A$ be a $p \times p$ matrix. Then $P_s \mathrm{vec}(A) = \{\mathrm{vec}(A) + \mathrm{vec}(A^T)\}/2$ and $\mathrm{vec}^{-1}\{P_s \mathrm{vec}(A)\} = (A + A^T)/2$. Also, $P_v \equiv \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)/p$ projects onto $\mathrm{span}\{\mathrm{vec}(I_p)\}$. Consequently,
$$\mathrm{var}\{\mathrm{vec}(W^-)\} = 2 c_2 P_s + 2 p d_2 P_v = 2 c_2 (P_s - P_v) + 2(c_2 + p d_2) P_v,$$
where $P_s - P_v$ and $P_v$ are orthogonal projection operators. One implication of this development is given in the following corollary.
Corollary 2.2. The eigenvalues of $\mathrm{var}\{\mathrm{vec}(W^-)\}$ are $2(c_2 + p d_2)$ with multiplicity 1, $2c_2$ with multiplicity $p(p+1)/2 - 1$ and 0 with multiplicity $p(p-1)/2$. The corresponding eigenvectors are $\mathrm{vec}(I_p)/\sqrt{p}$, the eigenvectors of $P_s - P_v$, and the eigenvectors of $I_{p^2} - P_s$.
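This eigenstructure can be verified mechanically: build $V = c_2(I_{p^2} + C_{p^2}) + 2 d_2\,\mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$ and count eigenvalue multiplicities. The nonzero spectrum lives on the $p(p+1)/2$-dimensional symmetric subspace. A small NumPy sketch (ours, with arbitrary positive constants standing in for the Wishart ones):

```python
import numpy as np

p = 4
c2, d2 = 0.3, 0.1                      # arbitrary positive stand-ins

# commutation matrix C_{p^2}: C vec(A) = vec(A^T)
C = np.zeros((p * p, p * p))
for i in range(p):
    for j in range(p):
        C[i * p + j, j * p + i] = 1.0

vecI = np.eye(p).reshape(-1, order="F")
V = c2 * (np.eye(p * p) + C) + 2 * d2 * np.outer(vecI, vecI)

eig = np.sort(np.linalg.eigvalsh(V))
# multiplicities: 0 on the skew-symmetric subspace, 2c2 on the trace-free
# symmetric subspace, and 2(c2 + p d2) on span{vec(I_p)}
assert np.sum(np.isclose(eig, 0.0)) == p * (p - 1) // 2
assert np.sum(np.isclose(eig, 2 * c2)) == p * (p + 1) // 2 - 1
assert np.sum(np.isclose(eig, 2 * (c2 + p * d2))) == 1
```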
If $W \sim W_p(\Sigma, n)$ then $\Sigma^{-1/2} W \Sigma^{-1/2} \sim W_p(I_p, n)$ and
$$W^\dagger = \Sigma^{-1/2} \left(\Sigma^{-1/2} W \Sigma^{-1/2}\right)^- \Sigma^{-1/2}$$
is a reflexive generalized inverse of $W$. It follows straightforwardly that
Corollary 2.3. If $W \sim W_p(\Sigma, n)$ and $p > n + 3$ then
$$E(W^\dagger) = a_2 \Sigma^{-1},$$
$$\mathrm{var}\{\mathrm{vec}(W^\dagger)\} = c_2 (I_{p^2} + C_{p^2})(\Sigma^{-1} \otimes \Sigma^{-1}) + 2 d_2\, \mathrm{vec}(\Sigma^{-1})\mathrm{vec}^T(\Sigma^{-1}),$$
$$\mathrm{cov}(W^\dagger_{ij}, W^\dagger_{kl}) = c_2 (\Sigma^{-1}_{ik}\Sigma^{-1}_{jl} + \Sigma^{-1}_{il}\Sigma^{-1}_{jk}) + 2 d_2 \Sigma^{-1}_{ij}\Sigma^{-1}_{kl},$$
where $W^\dagger_{ij}$ and $\Sigma^{-1}_{ij}$ denote element $(i, j)$ of $W^\dagger$ and $\Sigma^{-1}$. The second and third conclusions are the same, except the third is in terms of the elements of $W^\dagger$.

The form of $\mathrm{var}\{\mathrm{vec}(W^\dagger)\}$ given in Corollary 2.3 is identical to the asymptotic variance of the covariance matrix from a sample from an elliptically contoured distribution. In that case the constants are $c_2 = 1 + \kappa$ and $d_2 = \kappa/2$, where $\kappa$ is the kurtosis of the distribution (see, for example, Tyler [15]).
Because $\Sigma W^\dagger W$ and $\Sigma^{-1} W W^\dagger$ are symmetric, the reflexive generalized inverse $W^\dagger$ of $W$ is also the Moore-Penrose inverse in the inner products based on $\Sigma$ and $\Sigma^{-1}$ [13], but $W^\dagger$ is not the usual Moore-Penrose inverse since $W^\dagger W$ and $W W^\dagger$ are not symmetric. We were unable to find succinct expressions for the mean and variance of $W_p^-(\Sigma, n)$ that are analogous to those for $W_p^-(I_p, n)$ given in Theorem 2.1. In the next section we give some results on the moments of $W_p^-(\Sigma, n)$. We also give the order of the spectral norm $\|\cdot\|$ of the scaled Wishart $\Sigma^{1/2} W_p^-(\Sigma, n) \Sigma^{1/2}$ as $n, p \to \infty$, which may be helpful in asymptotic studies of regressions with $p > n$.

3. Properties of $W_p^-(\Sigma, n)$
3.1. Mean and variance
Let $W \sim W_p(\Sigma, n)$ with $p > n + 3$. The singular Wishart matrix $W$ can be decomposed as $W \sim Y Y^T$, where $\mathrm{vec}(Y) \sim N(0, \Sigma \otimes I_n)$. Since $Y \in \mathbb{R}^{p\times n}$ has rank $n$ with probability 1, the usual Moore-Penrose generalized inverse can be decomposed as
$$W^- \sim Y (Y^T Y)^{-2} Y^T \sim \Sigma^{1/2} Z (Z^T \Sigma Z)^{-2} Z^T \Sigma^{1/2}, \qquad (3.1)$$
where $Z \in \mathbb{R}^{p\times n}$ is a matrix of iid standard normal variates. Write the spectral decomposition of $\Sigma$ as $\Sigma = \Gamma \Lambda \Gamma^T$, where $\Gamma \in \mathbb{R}^{p\times p}$ is orthogonal and $\Lambda > 0$ is diagonal. Since the distribution of $Z$ is invariant under orthogonal transformations we have $W^- \sim \Gamma \Lambda^{1/2} Z (Z^T \Lambda Z)^{-2} Z^T \Lambda^{1/2} \Gamma^T$. Consequently, without loss of generality, we assume that $\Sigma = \Lambda$ is a diagonal matrix when studying moments and other quantities. The left and right orthogonal transformations $\Gamma$ and $\Gamma^T$ can be restored straightforwardly for a general $\Sigma > 0$.
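The algebraic identity behind (3.1), namely that the Moore-Penrose inverse of $Y Y^T$ is $Y (Y^T Y)^{-2} Y^T$ when $Y$ has full column rank, can be confirmed numerically. This sketch is ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 8, 3
Y = rng.standard_normal((p, n))            # full column rank with prob. 1
W = Y @ Y.T                                # rank n < p

G = np.linalg.inv(Y.T @ Y)
Wm = Y @ G @ G @ Y.T                       # Y (Y^T Y)^{-2} Y^T

assert np.allclose(Wm, np.linalg.pinv(W))  # matches the Moore-Penrose inverse
# the four Moore-Penrose conditions:
assert np.allclose(Wm @ W @ Wm, Wm)
assert np.allclose(W @ Wm @ W, W)
assert np.allclose(W @ Wm, (W @ Wm).T)
assert np.allclose(Wm @ W, (Wm @ W).T)
```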
Let $M(\Lambda) = E(W^-)$. Then it follows that for all orthogonal matrices $P \in \mathbb{R}^{p\times p}$, $M(P^T \Lambda P) = P^T M(\Lambda) P$ and thus $M(\Lambda)$ is a tensor-valued isotropic tensor function of $\Lambda$. Isotropic tensor functions have been studied extensively in the literature on continuum mechanics (see [12] for an introduction and [9] for recent results). For instance, it is known from this literature that $M(\Lambda)$ and $\Lambda$ have the same eigenvectors. Although there are various representations for isotropic tensor functions [9], they do not seem to provide further illumination in this setting. Let $V(\Lambda) = \mathrm{var}\{\mathrm{vec}(W^-)\} \in \mathbb{R}^{p^2 \times p^2}$. The variance $V(P^T \Lambda P) = (P^T \otimes P^T) V(\Lambda) (P \otimes P)$ is similarly structured as a fourth-order tensor function, and much the same comments apply.
When $\Lambda = I_p$, the distribution of $W^-$ is invariant under orthogonal transformations, $W^- \sim P W^- P^T$ for all orthogonal $P \in \mathbb{R}^{p\times p}$. This invariance property was used extensively in the moment derivations for Theorem 2.1. However, when $\Lambda \ne I_p$ the distribution of $W^-$ is no longer invariant and the moments of $W^-$ become more complicated. Nevertheless, it is still possible to make some progress using symmetry arguments involving the rows $z_i^T$ of $Z$, essentially utilizing invariance under a restricted class of transformations. This leads to the results stated in Theorem 3.1. In preparation, let $m_{ij}(\Lambda) = E\{z_i^T (Z^T \Lambda Z)^{-2} z_j\}$, let $v_{ij,kl}(\Lambda) = \mathrm{cov}\{z_i^T (Z^T \Lambda Z)^{-2} z_j,\, z_k^T (Z^T \Lambda Z)^{-2} z_l\}$, and let $\Lambda_i$ denote the $i$th diagonal element of $\Lambda$, $i, j, k, l = 1, \ldots, p$.
Theorem 3.1. Assume that $\Sigma = \Lambda$ is a diagonal matrix with diagonal elements $\Lambda_i$, $i = 1, \ldots, p$. Then $M(\Lambda)$ is a diagonal matrix with diagonal elements $M_{ii}(\Lambda) = \Lambda_i m_{ii}(\Lambda)$ and
$$V(\Lambda) = \sum_{i,j=1}^{p} \Lambda_i \Lambda_j v_{ii,jj} (e_i e_j^T \otimes e_i e_j^T) + \sum_{i,j=1}^{p} \Lambda_i \Lambda_j v_{ij,ij} (e_j e_j^T \otimes e_i e_i^T)(I_{p^2} + C_{p^2}) - 2 \sum_{i=1}^{p} \Lambda_i^2 v_{ii,ii} (e_i e_i^T \otimes e_i e_i^T),$$
where the $\Lambda$ arguments for $v_{(\cdot)}$ on the right hand side have been suppressed to improve readability, $v_{ii,jj} = \mathrm{cov}\{z_i^T (Z^T \Lambda Z)^{-2} z_i,\, z_j^T (Z^T \Lambda Z)^{-2} z_j\}$, $v_{ij,ij} = \mathrm{var}\{z_i^T (Z^T \Lambda Z)^{-2} z_j\}$ and $v_{ii,ii} = \mathrm{var}\{z_i^T (Z^T \Lambda Z)^{-2} z_i\}$.

The moments $m_{ii}$, $v_{ij,ij}$, $v_{ii,jj}$ and $v_{ii,ii}$ needed for Theorem 3.1 evidently do not have tractable closed-form representations.
3.2. Order of $\|\Sigma^{1/2} W_p^-(\Sigma, n) \Sigma^{1/2}\|$

Let $Z_0 = Z(Z^T Z)^{-1/2}$, and let $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues of $\Sigma$. Then
$$\Gamma^T W^- \Gamma = \Lambda^{1/2} Z (Z^T \Lambda Z)^{-2} Z^T \Lambda^{1/2} = \Lambda^{1/2} Z (Z^T Z)^{-1} (Z_0^T \Lambda Z_0)^{-2} (Z^T Z)^{-1} Z^T \Lambda^{1/2}.$$
Because the normalized matrix $Z_0$ has orthonormal columns, $(Z_0^T \Lambda Z_0)^{-2} \le \lambda_{\min}^{-2} I_n$ and thus $\Sigma^{1/2} W^- \Sigma^{1/2} \le \lambda_{\min}^{-2} \Gamma \Lambda W_I^- \Lambda \Gamma^T$, where $W_I^- \sim W_p^-(I_p, n)$. The order of $\|\Sigma^{1/2} W^- \Sigma^{1/2}\|$ can now be found by application of Chebyshev's inequality. Let $\epsilon > 0$ and, for notational convenience, let $H = \Sigma^{1/2} W^- \Sigma^{1/2}$. Then for all $h \in \mathbb{R}^p$ with $\|h\| = 1$,
$$\begin{aligned}
\Pr(h^T H h \ge \epsilon) &\le \Pr(\lambda_{\min}^{-2} h^T \Gamma \Lambda W_I^- \Lambda \Gamma^T h \ge \epsilon)\\
&\le \epsilon^{-2} \lambda_{\min}^{-4}\{\mathrm{var}(h^T \Gamma \Lambda W_I^- \Lambda \Gamma^T h) + E^2(h^T \Gamma \Lambda W_I^- \Lambda \Gamma^T h)\}\\
&= \epsilon^{-2} \lambda_{\min}^{-4} (h^T \Gamma \Lambda \otimes h^T \Gamma \Lambda)\,\mathrm{var}\{\mathrm{vec}(W_I^-)\}\,(\Lambda \Gamma^T h \otimes \Lambda \Gamma^T h) + \epsilon^{-2} \lambda_{\min}^{-4} a_2^2 (h^T \Gamma \Lambda^2 \Gamma^T h)^2\\
&\le \epsilon^{-2} (\lambda_{\max}/\lambda_{\min})^4 \{2(c_2 + p d_2) + a_2^2\},
\end{aligned}$$
where $2(c_2 + p d_2) = \|\mathrm{var}\{\mathrm{vec}(W_I^-)\}\|$ as given in Corollary 2.2 and $a_2$ is as defined in Theorem 2.1. Combining this with Theorem 2.1 and the conclusions of Corollary 2.1 we have
Corollary 3.1. Let $W \sim W_p(\Sigma, n)$.

(i) Assume that $n > p + 3$ and that $p/n \to r$ with $0 \le r < 1$. Then $\|\Sigma^{1/2} W^- \Sigma^{1/2}\| = O_p(n^{-1})$.

(ii) Assume that the condition number $\lambda_{\max}/\lambda_{\min}$ of $\Sigma$ is bounded as $p \to \infty$, that $p > n + 3$ and that $n/p \to r$ with $0 < r < 1$. Then $\|\Sigma^{1/2} W^- \Sigma^{1/2}\| = O_p(n^{-1})$.
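A rough simulation consistent with Corollary 3.1(ii): for a fixed ratio $n/p$, the quantity $n\,\|\Sigma^{1/2} W^- \Sigma^{1/2}\|$ stays bounded as $n, p$ grow. The sketch below (ours) takes $\Sigma = I_p$, where the scaled norm reduces to $n/\lambda_{\min}(Y^T Y)$; the asserted bound is deliberately generous since this is a single random draw per size.

```python
import numpy as np

rng = np.random.default_rng(3)
ratios = []
for n, p in [(20, 40), (50, 100), (100, 200)]:   # n/p fixed at 1/2, p > n + 3
    Y = rng.standard_normal((p, n))
    lam_min = np.linalg.eigvalsh(Y.T @ Y)[0]     # smallest eigenvalue of Y^T Y
    ratios.append(n / lam_min)                   # n * ||W^-|| when Sigma = I_p
# bounded across growing dimensions: consistent with O_p(n^{-1})
assert all(0.1 < r < 50.0 for r in ratios)
```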
4. Proof of Theorem 2.1

The general idea of this proof is to use invariance arguments along with moment matching via Proposition 2.1.

A singular Wishart matrix $W \sim W_p(I_p, n)$, $p > n + 3$, can be decomposed as $W = Y Y^T$ with $\mathrm{vec}(Y) \sim N(0, I_p \otimes I_n)$ and $Y = H^T D^{1/2} U$, where $H^T \in \mathbb{R}^{p\times n}$ is semi-orthogonal, $H H^T = I_n$, $U \in \mathbb{R}^{n\times n}$ is orthogonal, and the diagonal elements $d_1, \ldots, d_n$ of the diagonal matrix $D \in \mathbb{R}^{n\times n}$ are non-zero with probability 1. Consequently, $W = H^T D H$ and $W^- = H^T D^{-1} H$. Moreover, $Y^T Y \sim W_n(I_n, p)$ and, since $p > n + 3$, $Y^T Y$ has full rank with probability 1 and $(Y^T Y)^{-1} = U^T D^{-1} U \sim W_n^{-1}(I_n, p)$. Interchanging $n$ and $p$ in Proposition 2.1 we have
$$E\{(Y^T Y)^{-1}\} = E(U^T D^{-1} U) = (p - n - 1)^{-1} I_n, \qquad (4.1)$$
$$E\{(Y^T Y)^{-1}(Y^T Y)^{-1}\} = E(U^T D^{-2} U) = c_3 (p - 1) I_n, \qquad (4.2)$$
$$\mathrm{var}[\mathrm{vec}\{(Y^T Y)^{-1}\}] = \mathrm{var}\{\mathrm{vec}(U^T D^{-1} U)\} = c_3 (I_{n^2} + C_{n^2}) + 2 d_3\, \mathrm{vec}(I_n)\mathrm{vec}^T(I_n), \qquad (4.3)$$
where $c_3^{-1} = (p-n)(p-n-1)(p-n-3)$ and $d_3^{-1} = (p-n)(p-n-1)^2(p-n-3)$.
4.1. $E(W^-)$

Using the fact that $Y \sim P Y$ for any orthogonal matrix $P \in \mathbb{R}^{p\times p}$, we get
$$E(W^-) = E[\{(P Y)(P Y)^T\}^-] = P E(W^-) P^T$$
and consequently $E(W^-) = a I_p$ (see, for example, Eaton [5], Proposition 2.14). It remains to find $a$. Since $W^- = H^T D^{-1} H$,
$$a p = \mathrm{tr}\{E(W^-)\} = \mathrm{tr}\{E(H^T D^{-1} H)\} = \mathrm{tr}\{E(D^{-1})\} = n(p - n - 1)^{-1},$$
where we used (4.1) for the last equality. From this we get $E(W^-)$; that is, $a = a_2$.
4.2. $E(W^- W^-)$

Since $E(W^- W^-) = P E(W^- W^-) P^T$ for any orthogonal matrix $P$, $E(W^- W^-) = b I_p$. To find $b$ we have
$$b p = \mathrm{tr}\{E(W^- W^-)\} = \mathrm{tr}\{E(H^T D^{-2} H)\} = \mathrm{tr}\{E(D^{-2})\} = c_3 n(p - 1),$$
where we used (4.2) for the last equality. From this we conclude that $b = b_2$.
4.3. $\mathrm{var}\{\mathrm{vec}(W^-)\}$ and $\mathrm{var}\{\mathrm{tr}(W^-)\}$

Our proof of this part is based on the following proposition, which gives a characterization of matrices that are invariant under a subclass of the orthogonal transformations.

Proposition 4.1. Let $A \in \mathbb{R}^{p^2 \times p^2}$ be such that $(P^T \otimes P^T) A (P \otimes P) = A$ for all orthogonal matrices $P \in \mathbb{R}^{p\times p}$. Then $A = c I_{p^2} + f C_{p^2} + 2 d\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$, for some real multipliers $c$, $d$ and $f$.

This proposition is apparently well-known in the literature on continuum mechanics, where it is often referred to as a representation theorem for fourth-order isotropic tensors. Its proof for the case $p = 3$ can be found in the classical literature on Cartesian tensors [6]. Jog [7] provides a concise proof and cites ten other demonstrations of the same result, most of which are for $p = 3$. All of these proofs rely heavily on analytic traditions, notation and tensor operators that are not readily found in the statistical literature and might seem elusive on first reading. (A dictionary connecting tensors and common matrix operations in statistics was given by Dauxois et al. [3].) For completeness we have included in Section 4.5 a proof that does not use the technical machinery of continuum mechanics, but relies only on the Kronecker product, vec operator [8] and the commutation matrix [10, 11]. These operators were defined in the Introduction and are used widely in the statistical literature.

The collection of matrices $A$ that satisfies the hypothesis of Proposition 4.1 forms a vector space over the real field that is closed under transposition and multiplication. The proposition essentially gives a basis $\{I_{p^2}, C_{p^2}, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)\}$ for this space.
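The forward direction of Proposition 4.1, that any $A = c I_{p^2} + f C_{p^2} + 2 d\,\mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$ is invariant under $(P^T \otimes P^T)(\cdot)(P \otimes P)$, is easy to check numerically for a random orthogonal $P$. This sketch is ours, with arbitrary multipliers:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
c, f, d = 1.2, -0.7, 0.4               # arbitrary real multipliers

# commutation matrix C_{p^2}
C = np.zeros((p * p, p * p))
for i in range(p):
    for j in range(p):
        C[i * p + j, j * p + i] = 1.0
vecI = np.eye(p).reshape(-1, order="F")
A = c * np.eye(p * p) + f * C + 2 * d * np.outer(vecI, vecI)

# random orthogonal P from the QR decomposition of a Gaussian matrix
P, _ = np.linalg.qr(rng.standard_normal((p, p)))
PP = np.kron(P, P)
assert np.allclose(PP.T @ A @ PP, A)   # (P^T ⊗ P^T) A (P ⊗ P) = A
```

The invariance follows from $C_{p^2}(P \otimes P) = (P \otimes P)C_{p^2}$ and $(P^T \otimes P^T)\mathrm{vec}(I_p) = \mathrm{vec}(P^T P) = \mathrm{vec}(I_p)$.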
For notational convenience, let $V = \mathrm{var}\{\mathrm{vec}(W^-)\}$. Since $Y \sim P Y$ for any orthogonal matrix $P \in \mathbb{R}^{p\times p}$ we have
$$V = \mathrm{var}[\mathrm{vec}\{(P Y)(P Y)^T\}^-] = (P \otimes P) V (P^T \otimes P^T),$$
and consequently $V$ satisfies the hypothesis of Proposition 4.1. However, $V$ is also invariant under multiplication by $C_{p^2}$. Using Proposition 4.1 this implies that $c = f$: $V - V C_{p^2} = (c - f) I_{p^2} - (c - f) C_{p^2} = 0$. Consequently the class of covariance matrices must be of the form
$$\mathrm{var}\{\mathrm{vec}(W^-)\} = c (I_{p^2} + C_{p^2}) + 2 d\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p). \qquad (4.4)$$
It remains to find $c$ and $d$, which we do by moment matching. From (4.4) we have
$$\mathrm{var}\{\mathrm{tr}(W^-)\} = \mathrm{var}\{\mathrm{tr}(D^{-1})\} = \mathrm{var}\{\mathrm{vec}^T(I_p)\mathrm{vec}(W^-)\} = 2 c p + 2 d p^2. \qquad (4.5)$$
Now, by (4.3) we get another expression for
$$\mathrm{var}\{\mathrm{tr}(D^{-1})\} = \mathrm{var}[\mathrm{vec}^T(I_n)\mathrm{vec}\{(Y^T Y)^{-1}\}] = 2 c_3 n + 2 d_3 n^2 \qquad (4.6)$$
and using (4.5) and (4.6) together we see that $c$ and $d$ must satisfy
$$c p + d p^2 = c_3 n + d_3 n^2. \qquad (4.7)$$
Since we are pursuing two factors, $c$ and $d$, we require a second independent equation to determine them uniquely. This can be obtained by first taking the trace of (4.4) to get $\mathrm{tr}[\mathrm{var}\{\mathrm{vec}(W^-)\}] = c p^2 + c p + 2 d p$. Second, we obtain a known expression for $\mathrm{tr}[\mathrm{var}\{\mathrm{vec}(W^-)\}]$ by writing it as
$$\mathrm{tr}[\mathrm{var}\{\mathrm{vec}(W^-)\}] = \mathrm{tr}[E\{\mathrm{vec}(W^-)\mathrm{vec}^T(W^-)\}] - \mathrm{tr}[E\{\mathrm{vec}(W^-)\} E^T\{\mathrm{vec}(W^-)\}]$$
and using previous results to reduce the right hand side. Using (4.2) and the previously derived form for $E(W^-)$ we have
$$\mathrm{tr}[E\{\mathrm{vec}(W^-)\mathrm{vec}^T(W^-)\}] = \mathrm{tr}[E\{\mathrm{vec}(H^T D^{-1} H)\mathrm{vec}^T(H^T D^{-1} H)\}] = \mathrm{tr}\{E(D^{-2})\} = n c_3 (p - 1),$$
$$\mathrm{tr}[E\{\mathrm{vec}(W^-)\} E^T\{\mathrm{vec}(W^-)\}] = a_2^2 p.$$
Consequently,
$$c p^2 + c p + 2 d p = n c_3 (p - 1) - a_2^2 p. \qquad (4.8)$$
Using Maple to solve (4.7) and (4.8) for $c$ and $d$ gives the solutions stated in Theorem 2.1.
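The Maple step is just a 2×2 linear solve, so it can be replicated numerically and checked against the closed forms for $c_2$ and $d_2$ in Theorem 2.1. A sketch (ours) for one choice of $(n, p)$:

```python
import numpy as np

n, p = 4, 10                                     # p > n + 3
c3 = 1.0 / ((p - n) * (p - n - 1) * (p - n - 3))
d3 = 1.0 / ((p - n) * (p - n - 1) ** 2 * (p - n - 3))
a2 = n / (p * (p - n - 1))

# equations (4.7) and (4.8):
#   p c + p^2 d           = c3 n + d3 n^2
#   (p^2 + p) c + 2 p d   = n c3 (p - 1) - a2^2 p
M = np.array([[p, p ** 2],
              [p ** 2 + p, 2 * p]], dtype=float)
rhs = np.array([c3 * n + d3 * n ** 2,
                n * c3 * (p - 1) - a2 ** 2 * p])
c, d = np.linalg.solve(M, rhs)

# closed forms from Theorem 2.1
c2 = n * (p * (p - 1) - n * (p - n - 2) - 2) * c3 / (p * (p - 1) * (p + 2))
d2 = n * (n ** 2 * (n - 1) + 2 * n * (p - 2) * (p - n) + 2 * p * (p - 1)) * d3 \
     / (p ** 2 * (p - 1) * (p + 2))
assert np.isclose(c, c2) and np.isclose(d, d2)
```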
4.4. Moments of higher order

Moments of higher order can in principle be found similarly, by using results from von Rosen [17] in combination with moment matching. For instance, consider $E\{(W^-)^3\} = E(H^T D^{-3} H) = b_3 I_p$. Proceeding as in Section 4.2, $E\{(Y^T Y)^{-3}\} = E(U^T D^{-3} U) = r_3 I_n$, where $r_3$ can be obtained from von Rosen's Corollary 3.1, again interchanging the roles of $n$ and $p$. Consequently, $b_3 = n r_3 / p$.
4.5. Proof of Proposition 4.1

The proof of Proposition 4.1 is based on using various subclasses of the orthogonal matrices to characterize the columns of $A$. This characterization is described in the next lemma, which uses the same hypothesis as the proposition.

Lemma 4.1. Let $A \in \mathbb{R}^{p^2 \times p^2}$ be such that $(P^T \otimes P^T) A (P \otimes P) = A$ for all orthogonal matrices $P \in \mathbb{R}^{p\times p}$. Then for some real factors $h$, $d$, $s_1$ and $s_2$,

(i) $A(e_i \otimes e_i) = h\, \mathrm{vec}(I_p) + d (e_i \otimes e_i)$ for $i = 1, \ldots, p$.

(ii) $A(e_i \otimes e_j) = s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)$ for $i \ne j = 1, \ldots, p$.

Proof. This proof is based on taking various forms for $P$ in the hypothesized relationship. We do not distinguish these forms notationally.

Part (i): Let $P$ be any orthogonal matrix with the property that $P e_i = e_i$ for a selected index $i$. Restricting consideration to this subclass of orthogonal matrices and multiplying the hypothesized equation on the right by $e_i \otimes e_i$ we have $(P^T \otimes P^T) A (e_i \otimes e_i) = A(e_i \otimes e_i)$. Let $M = \mathrm{vec}^{-1}\{A(e_i \otimes e_i)\}$ so that
$$\mathrm{vec}(P^T M P) = \mathrm{vec}(M). \qquad (4.9)$$
Without loss of generality take $i = p$, and consider orthogonal matrices of the form
$$P = \begin{pmatrix} P_1 & 0 \\ 0 & 1 \end{pmatrix},$$
where $P_1 \in \mathbb{R}^{(p-1)\times(p-1)}$ is any orthogonal matrix. Clearly, $P e_p = e_p$. Partition $M = (M_{jk})$, $j, k = 1, 2$ according to the partition of $P$. We consider the four partition components $M_{jk}$ separately.

(1) From (4.9) we get $P_1^T M_{11} P_1 = M_{11}$ for all orthogonal matrices $P_1 \in \mathbb{R}^{(p-1)\times(p-1)}$. It is well known that this implies $M_{11} = h_p I_{p-1}$ for some real multiplier $h_p$ (see, for example, Eaton [5], Proposition 2.14).

(2) $P_1^T M_{12} = M_{12}$ for all orthogonal matrices $P_1$ implies that $M_{12} = 0$: write $M_{12} = \lambda U$ with $U \in \mathbb{R}^{p-1}$ a semi-orthogonal matrix. Taking $P_1^T$ to be an orthogonal matrix with row $j$ equal to $U^T$, we have $M_{12} = \lambda e_j$ for any $j = 1, \ldots, p-1$, and therefore $M_{12} = 0$. Similarly, $M_{21} = 0$.

(3) Finally, $M_{22} \in \mathbb{R}$ is arbitrary and it follows that $M$ has the form $M = h_p I_p + d_p e_p e_p^T$ with $d_p = M_{22} - h_p$. Therefore, for $i = 1, \ldots, p$,
$$\mathrm{vec}(M) = A(e_i \otimes e_i) = h_i \mathrm{vec}(I_p) + d_i (e_i \otimes e_i). \qquad (4.10)$$

It remains to show that $h_i$ and $d_i$ are constant over the index $i$. Take a new $P$ such that $P e_i = e_j$ and $P e_j = e_i$ for two selected indices $i \ne j$. Then using (4.10), the hypothesis and (4.10) again we find
$$h_i \mathrm{vec}(I_p) + d_i (e_i \otimes e_i) = A(e_i \otimes e_i) = (P^T \otimes P^T) A (P \otimes P)(e_i \otimes e_i) = (P^T \otimes P^T) A (e_j \otimes e_j) = (P^T \otimes P^T)\{h_j \mathrm{vec}(I_p) + d_j (e_j \otimes e_j)\} = h_j \mathrm{vec}(I_p) + d_j (e_i \otimes e_i).$$
Therefore $h_i = h_j = h$, $d_i = d_j = d$ and part (i) follows.
Part (ii): The proof of this conclusion follows the same logic as the proof of part (i), but we use the subclass of orthogonal matrices with the property that $P e_i = e_i$ and $P e_j = e_j$ for selected indices $i \ne j$. Restricting consideration to this subclass and multiplying the hypothesized equation on the right by $e_i \otimes e_j$ we have $(P^T \otimes P^T) A (e_i \otimes e_j) = A(e_i \otimes e_j)$. Let $M = \mathrm{vec}^{-1}\{A(e_i \otimes e_j)\}$ so that $\mathrm{vec}(P^T M P) = \mathrm{vec}(M)$. Without loss of generality take $i = p - 1$ and $j = p$, and consider orthogonal matrices of the form
$$P = \begin{pmatrix} P_1 & 0 \\ 0 & I_2 \end{pmatrix},$$
where $P_1 \in \mathbb{R}^{(p-2)\times(p-2)}$ is any orthogonal matrix. Partition $M = (M_{jk})$, $j, k = 1, 2$ to conform with the partitions of $P$. Using the hypothesis of the lemma, we reason as follows. (1) $P_1^T M_{11} P_1 = M_{11}$ for all orthogonal matrices $P_1$ of order $p - 2$ again implies that $M_{11} = c_{ij} I_{p-2}$ for some real factor $c_{ij}$. (2) $P_1^T M_{12} = M_{12}$ for all orthogonal matrices $P_1$ again implies $M_{12} = 0$, and analogously $M_{21} = 0$. (3) $M_{22}$ is arbitrary and therefore
$$A(e_i \otimes e_j) = c_{ij} \mathrm{vec}(I_p) + s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i) + t_{ij} (e_i \otimes e_i) + u_{ij} (e_j \otimes e_j). \qquad (4.11)$$
It follows from part (i) and the fact that $A^T$ also satisfies the hypothesis that $(e_k^T \otimes e_k^T) A (e_i \otimes e_j) = 0$ for $i \ne j = 1, \ldots, p$ and $k = 1, \ldots, p$. Using this result and multiplying (4.11) by $(e_k^T \otimes e_k^T)$, with $k \ne i$ and $k \ne j$, $(e_i^T \otimes e_i^T)$ and $(e_j^T \otimes e_j^T)$ respectively, we conclude that $c_{ij} = t_{ij} = u_{ij} = 0$ for $i \ne j$. Therefore
$$A(e_i \otimes e_j) = s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i), \qquad i \ne j. \qquad (4.12)$$
It remains to show that $s_{1ij}$ and $s_{2ij}$ are constant in the indices $i \ne j$. Take a new subclass of $P$'s such that, for two additional selected indices $k \ne s$, $P e_i = e_k$ and $P e_j = e_s$, where still $i \ne j$. Then using (4.12), the hypothesis and (4.12) again we get
$$s_{1ij} (e_i \otimes e_j) + s_{2ij} (e_j \otimes e_i) = (P^T \otimes P^T) A (P \otimes P)(e_i \otimes e_j) = (P^T \otimes P^T) A (e_k \otimes e_s) = (P^T \otimes P^T)\{s_{1ks} (e_k \otimes e_s) + s_{2ks} (e_s \otimes e_k)\} = s_{1ks} (e_i \otimes e_j) + s_{2ks} (e_j \otimes e_i).$$
Multiplying the first and the last term by $(e_i^T \otimes e_j^T)$ and $(e_j^T \otimes e_i^T)$ respectively we get
$$A(e_i \otimes e_j) = s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i), \qquad \text{for } i \ne j,$$
which concludes the proof of the lemma.
Turning to the proof of Proposition 4.1, we first show that the multipliers in Lemma 4.1 are functionally related; in particular, $d = s_1 + s_2$. It follows immediately from part (ii) of Lemma 4.1 that, for $i \ne j$, $A(P \otimes P)(e_i \otimes e_j) = (P \otimes P)\{s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)\}$. Taking $P$ to be in the subclass of orthogonal matrices with the property that, for selected indices $i \ne j$, $P e_i = (e_i + e_j)/\sqrt{2}$ and $P e_j = (e_j - e_i)/\sqrt{2}$, we have immediately that
$$A\left(\frac{e_i + e_j}{\sqrt{2}} \otimes \frac{e_j - e_i}{\sqrt{2}}\right) = s_1 \left(\frac{e_i + e_j}{\sqrt{2}} \otimes \frac{e_j - e_i}{\sqrt{2}}\right) + s_2 \left(\frac{e_j - e_i}{\sqrt{2}} \otimes \frac{e_i + e_j}{\sqrt{2}}\right).$$
Expanding this equation and simplifying, we find that
$$d\{(e_j \otimes e_j) - (e_i \otimes e_i)\} = (s_1 + s_2)\{(e_j \otimes e_j) - (e_i \otimes e_i)\}$$
and consequently $d = s_1 + s_2$. The conclusion of the proposition now follows from Lemma 4.1 with $d = s_1 + s_2$: taking $v = \sum_{i,j} c_{ij} (e_i \otimes e_j)$,
$$A v = A \sum_{i,j} c_{ij} (e_i \otimes e_j) = \sum_{i \ne j} c_{ij}\{s_1 (e_i \otimes e_j) + s_2 (e_j \otimes e_i)\} + \sum_{i=1}^{p} c_{ii}\{h\, \mathrm{vec}(I_p) + (s_1 + s_2)(e_i \otimes e_i)\}$$
$$= s_1 \sum_{i,j} c_{ij} (e_i \otimes e_j) + s_2 \sum_{i,j} c_{ij} (e_j \otimes e_i) + h \sum_{i=1}^{p} c_{ii}\, \mathrm{vec}(I_p) = s_1 v + s_2 C_{p^2} v + h\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p) v,$$
since $\sum_{i=1}^{p} c_{ii} = \mathrm{vec}^T(I_p) v$. Because $v$ is arbitrary, $A = s_1 I_{p^2} + s_2 C_{p^2} + h\, \mathrm{vec}(I_p)\mathrm{vec}^T(I_p)$, as stated in the proposition.
5. Proof of Theorem 3.1

Recall that Theorem 3.1 requires $\Sigma = \Lambda$ to be a diagonal matrix. For notational convenience, let $H = (Z^T \Lambda Z)^{-2}$. The conclusion that $M(\Lambda)$ is a diagonal matrix arises by noting that, for $i \ne j$, $z_i^T H z_j \sim -z_i^T H z_j$, and thus $E(z_i^T H z_j) = 0$.

By a similar symmetry argument, the element $v_{ij,kl}$ of $V$ equals 0 when at least one of its indices $i, j, k, l$ is distinct; that is, not equal to any other index. If no indices are distinct then they must be equal in pairs, leading to four possibilities: for $i \ne j$, $v_{ii,jj}$, $v_{ij,ij}$, $v_{ij,ji}$ and, for $i = j$, $v_{ii,ii}$. However, $v_{ij,ij} = v_{ij,ji}$, which leads to the three $v$ terms in the theorem. The form of $V$ follows from these results, the representation $V = \sum_{ij,kl} (\Lambda_i \Lambda_j \Lambda_k \Lambda_l)^{1/2} v_{ij,kl}\, \mathrm{vec}(e_i e_j^T)\mathrm{vec}^T(e_k e_l^T)$, and the definition of the commutation matrix.
References

[1] Basser, P. J. and Pajevic, S. (2007). Spectral decomposition of a 4th order covariance tensor: Applications to diffusion tensor MRI. Signal Processing 87, 220–236.
[2] Bodnar, T. and Okhrin, Y. (2008). Properties of the singular, inverse and generalized inverse partitioned Wishart distributions. J. Multivariate Anal. 99, 2389–2405. MR2463397
[3] Dauxois, J., Romain, Y. and Viguier-Pla, S. (1994). Tensor products and statistics. Linear Algebra and its Applications 210, 59–88. MR1294771
[4] Díaz-García, J. A. and Gutiérrez-Jáimez, R. (2006). Distribution of the generalized inverse of a random matrix and its applications. Journal of Statistical Planning and Inference 136, 183–192. MR2207179
[5] Eaton, M. L. (2007). Multivariate Statistics: A Vector Space Approach. Beachwood, Ohio: Institute of Mathematical Statistics. MR2431769
[6] Jeffreys, H. (1931). Cartesian Tensors. Cambridge: Cambridge University Press. Page 91. MR0133075
[7] Jog, C. S. (2006). A concise proof of the representation theorem for fourth-order isotropic tensors. J. Elasticity 85, 119–124. MR2265724
[8] Henderson, H. V. and Searle, S. R. (1979). Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Canadian J. Statist. 7, 65–81. MR0549795
[9] Itskov, M. (2009). Tensor Algebra and Tensor Analysis for Engineers, 2nd Edition. New York: Springer.
[10] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Ann. Statist. 7, 381–394. MR0520247
[11] Neudecker, H. and Wansbeek, T. (1983). Some results on commutation matrices, with statistical applications. Canadian J. Statist. 11, 221–231. MR0732996
[12] Ogden, R. W. (2001). Elements of the theory of finite elasticity. In Fu, Y. B. and Ogden, R. W. (eds) Nonlinear Elasticity: Theory and Applications. Cambridge University Press, 1–58. MR1835108
[13] Stone, M. (1987). Coordinate-Free Multivariate Statistics. New York: Oxford University Press. MR0916472
[14] Srivastava, M. S. (2003). Singular Wishart and multivariate beta distributions. Ann. Statist. 31, 1537–1560. MR2012825
[15] Tyler, D. E. (1981). Asymptotic inference for eigenvectors. Ann. Statist. 9, 725–736. MR0619278
[16] Uhlig, H. (1994). On singular Wishart and singular multivariate beta distributions. Ann. Statist. 22, 395–405. MR1272090
[17] von Rosen, D. (1988). Moments for the inverted Wishart distribution. Scand. J. Statist. 15, 97–109. MR0968156
[18] Zhang, Z. (2007). Pseudo-inverse multivariate/matrix-variate distributions. J. Multivariate Anal. 98, 1684–1692. MR2370113