Covariance matrix estimation under data–based loss
Dominique Fourdrinier^{a,1}, Anis M. Haddouche^{b,*,2} and Fatiha Mezoued^{c,1}
aUniversité de Normandie, UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, avenue de l’Université, BP 12, 76801 Saint-Étienne-du-Rouvray,
France.
bINSA Rouen, LITIS and LMI, avenue de l’Université, BP 12, 76801 Saint-Étienne-du-Rouvray, France.
cÉcole Nationale Supérieure de Statistique et d’Économie Appliquée (ENSSEA), LAMOPS, Tipaza, Algeria.
ARTICLE INFO
Keywords:
data–based loss
elliptically symmetric distributions
high–dimensional statistics
orthogonally invariant estimators
Stein–Haff type identities.
2010 MSC:
62H12
62F10
62C99.
ABSTRACT
In this paper, we consider the problem of estimating the $p \times p$ scale matrix $\Sigma$ of a multivariate linear regression model $Y = X\beta + \varepsilon$ when the distribution of the observed matrix $Y$ belongs to a large class of elliptically symmetric distributions. After deriving the canonical form $(Z^\top\, U^\top)^\top$ of this model, any estimator $\hat{\Sigma}$ of $\Sigma$ is assessed through the data-based loss $\mathrm{tr}\big(S^+ \Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big)$, where $S = U^\top U$ is the sample covariance matrix and $S^+$ is its Moore-Penrose inverse. We provide alternative estimators to the usual estimators $a\,S$, where $a$ is a positive constant, which have smaller associated risk. Compared to the usual quadratic loss $\mathrm{tr}\big(\Sigma^{-1}\hat{\Sigma} - I_p\big)^2$, we obtain a larger class of estimators and a wider class of elliptical distributions for which such an improvement occurs. A numerical study illustrates the theory.
1. Introduction
Let us consider the multivariate linear regression model, with $p$ responses and $n$ observations,
$$Y = X\beta + \varepsilon, \tag{1.1}$$
where $Y$ is an $n \times p$ matrix, $X$ is an $n \times q$ matrix of known constants of rank $q \le n$, and $\beta$ is a $q \times p$ matrix of unknown parameters. We assume that the $n \times p$ noise matrix $\varepsilon$ has an elliptically symmetric distribution with density, with respect to the Lebesgue measure on $\mathbb{R}^{pn}$, of the form
$$\varepsilon \mapsto |\Sigma|^{-n/2}\, f\big(\mathrm{tr}(\varepsilon\,\Sigma^{-1}\varepsilon^\top)\big), \tag{1.2}$$
where $\Sigma$ is a $p \times p$ unknown positive definite matrix and $f(\cdot)$ is a non-negative unknown function.
The model (1.1) has been considered by various authors, such as Kubokawa and Srivastava (1999, 2001), who estimated $\Sigma$ and $\beta$ respectively in the context of (1.2), and Tsukuma and Kubokawa (2016), who estimated $\Sigma$ in the Gaussian setting. A common alternative representation of this model, $Y = M + \varepsilon$, where $\varepsilon$ is as above and $M$ lies in the column space of $X$, has also been considered in the literature. See, for instance, Canu and Fourdrinier (2017) and Candès, Sing-Long and Trzasko (2013).
Although the matrix of regression coefficients $\beta$ is also unknown, we are interested in estimating the scale matrix $\Sigma$. We address this problem within a decision-theoretic framework through a canonical form of the model (1.1), which allows us to use a sufficient statistic $S = U^\top U$ for $\Sigma$, where $U$ is an $(n-q) \times p$ matrix (see Section 2 for more details). In this context, the natural estimators of $\Sigma$ are of the form
$$\hat{\Sigma}_a = a\,S, \tag{1.3}$$
for some positive constant $a$.
As pointed out by James and Stein (1961), the estimators of the form (1.3) perform poorly in the Gaussian setting. In fact, larger (smaller) eigenvalues of $\Sigma$ are overestimated (underestimated) by those estimators. Thus we may expect to improve these estimators by shrinking the eigenvalues of $S$, which gives rise to the class of orthogonally invariant estimators (see Takemura (1984)). Since the seminal work of James and Stein (1961), this problem has been widely studied in the Gaussian setting; see, for instance, Tsukuma and Kubokawa (2016), Tsukuma (2016) and Chételat and Wells (2016). The elliptical setting, however, has been considered by only a few authors, such as Kubokawa and Srivastava (1999) and Haddouche, Fourdrinier and Mezoued (2021).
*Corresponding author
Dominique.Fourdrinier@univ-rouen.fr (D. Fourdrinier); Mohamed.haddouche@insa-rouen.fr (A.M. Haddouche); famezoued@yahoo.fr (F. Mezoued)
1Professor
2Temporarily associated to teaching and research.
First Author et al.: Preprint submitted to Elsevier Page 1 of 11
arXiv:2012.11920v1 [math.ST] 22 Dec 2020
In this paper, the performance of any estimator $\hat{\Sigma}$ of $\Sigma$ is assessed through the data-based loss
$$L_S(\hat{\Sigma},\Sigma) = \mathrm{tr}\big(S^+\Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big) \tag{1.4}$$
and its associated risk
$$R(\hat{\Sigma},\Sigma) = E_{\theta,\Sigma}\Big[\mathrm{tr}\big(S^+\Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big)\Big], \tag{1.5}$$
where $E_{\theta,\Sigma}$ denotes the expectation with respect to the density specified below in (2.3) and where $S^+$ is the Moore-Penrose inverse of $S$. Note that, when $p > n-q$, $S$ is non-invertible and, when $p \le n-q$, $S$ is invertible, so that $S^+$ coincides with the regular inverse $S^{-1}$. This type of loss is called a data-based loss insofar as it contains a part of the observation $U$ through $S = U^\top U$. The notion of data-based loss was introduced by Efron and Morris (1976) when estimating a location parameter. Likewise, Fourdrinier and Strawderman (2015) showed the interest of considering such a data-based loss with respect to the usual quadratic losses. The data-based loss (1.4) was also considered, in a Gaussian setting, by Tsukuma and Kubokawa (2015), who were motivated by the difficulty of handling the standard quadratic loss
$$L(\hat{\Sigma},\Sigma) = \mathrm{tr}\big(\Sigma^{-1}\hat{\Sigma} - I_p\big)^2. \tag{1.6}$$
See Haff (1980) and Tsukuma (2016) for more details. Thus the loss in (1.4) is a data-based variant of (1.6), through which we aim to improve on the estimators $\hat{\Sigma}_a$ in (1.3) by alternative estimators, focusing on improved orthogonally invariant estimators. Note that most improvement results in the Gaussian case were derived thanks to Stein-Haff type identities. Here, we specifically use the Stein-Haff type identity given by Haddouche et al. (2021) in the elliptical case to establish our dominance result; this identity is well adapted to our unified treatment of the cases where $S$ is invertible and non-invertible.
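To make the two criteria concrete, here is a minimal NumPy sketch (the function names are ours) of the data-based loss (1.4) and the quadratic loss (1.6):

```python
import numpy as np

def data_based_loss(Sigma_hat, Sigma, S):
    """Data-based loss tr(S^+ Sigma (Sigma^{-1} Sigma_hat - I_p)^2) of (1.4)."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat) - np.eye(p)  # Sigma^{-1} Sigma_hat - I_p
    return np.trace(np.linalg.pinv(S) @ Sigma @ A @ A)

def quadratic_loss(Sigma_hat, Sigma):
    """Usual quadratic loss tr(Sigma^{-1} Sigma_hat - I_p)^2 of (1.6)."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat) - np.eye(p)
    return np.trace(A @ A)
```

Both losses vanish at $\hat{\Sigma} = \Sigma$; in (1.4), the discrepancy is weighted by $S^+\Sigma$, which is how part of the observation enters the loss.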
The rest of this paper is structured as follows. In Section 2, we give improvement conditions of the proposed
estimators over the usual estimators. In Section 3, we assess the quality of the proposed estimators through a simulation
study in the context of the t–distribution. We also compare numerically our results with those of Konno (2009) in the
Gaussian setting. Finally, we give in an Appendix all the proofs of our findings.
2. Main results
Although we are interested in estimating the scale matrix $\Sigma$, recall that $\beta$ is a $q \times p$ matrix of unknown parameters. Note that, since $X$ has full column rank, the least squares estimator of $\beta$ is $\hat{\beta} = (X^\top X)^{-1} X^\top Y$; this is the maximum likelihood estimator in the Gaussian setting. Natural estimators of the scale matrix $\Sigma$ are based on the residual sum of squares given by
$$S = Y^\top (I_n - P_X)\, Y, \tag{2.1}$$
where $P_X = X (X^\top X)^{-1} X^\top$ is the orthogonal projector onto the subspace spanned by the columns of $X$.
Following the lines of Kubokawa and Srivastava (1999) and Tsukuma and Kubokawa (2020b), we derive the canonical form of the model (1.1), which allows a suitable treatment of the estimation of $\Sigma$. Let $X = Q_1 T^\top$ be the QR decomposition of $X$, where $Q_1$ is an $n \times q$ semi-orthogonal matrix and $T$ a $q \times q$ lower triangular matrix with positive diagonal elements. Setting $m = n-q$, there exists an $n \times m$ semi-orthogonal matrix $Q_2$ which completes $Q_1$ such that $Q = (Q_1\; Q_2)$ is an $n \times n$ orthogonal matrix. Then, since
$$Q_2^\top X\beta = Q_2^\top Q_1 T^\top \beta = 0,$$
we have
$$Q^\top Y = \begin{pmatrix} Z \\ U \end{pmatrix} = \begin{pmatrix} Q_1^\top \\ Q_2^\top \end{pmatrix} X\beta + Q^\top \varepsilon = \begin{pmatrix} \theta \\ 0 \end{pmatrix} + Q^\top \varepsilon, \tag{2.2}$$
where $Q_1^\top X\beta = \theta$ and where $Z$ and $U$ are, respectively, $q \times p$ and $m \times p$ matrices. As $X = Q_1 T^\top$, the projection matrix $P_X$ satisfies $P_X = Q_1 T^\top (T T^\top)^{-1} T\, Q_1^\top = Q_1 Q_1^\top$, so that $I_n - P_X = Q_2 Q_2^\top$. It follows that (2.1) becomes
$$S = Y^\top Q_2 Q_2^\top Y = U^\top U,$$
according to (2.2), which is a sufficient statistic for $\Sigma$.
The orthogonal matrix $Q$ provides a linear reduction from $n$ to $q$ observations within each of the $p$ responses. In addition, according to (1.2), the density of $Q^\top \varepsilon$ is the same as that of $\varepsilon$, and hence $(Z^\top\, U^\top)^\top$ has an elliptically symmetric distribution about the matrix $(\theta^\top\; 0^\top)^\top$ with density
$$(z,u) \mapsto |\Sigma|^{-n/2}\, f\big(\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\, u\,\Sigma^{-1}u^\top\big), \tag{2.3}$$
where $\theta$ and $\Sigma$ are unknown. In this sense, the model (2.2) is the canonical form of the multivariate linear regression model (1.1). Note that the marginal distribution of $U = Q_2^\top Y$ is elliptically symmetric about $0$ with covariance matrix proportional to $I_m \otimes \Sigma$ (see Fang and Zhang (1990)). This implies that $S = U^\top U$ has a generalized Wishart distribution (see Díaz-García and Gutiérrez-Jáimez (2011)), which coincides with the standard (singular or non-singular) Wishart distribution in the Gaussian setting (see Srivastava (2003)).
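The construction above can be checked numerically. The following sketch (NumPy; the variable names are ours) builds $Q_1$ and $Q_2$ from a full QR decomposition of $X$ and verifies that $S = U^\top U$ coincides with the residual sum of squares (2.1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p = 12, 3, 5
X = rng.normal(size=(n, q))          # full column rank (almost surely)
Y = rng.normal(size=(n, p))

# Full QR of X: the first q columns of Q span col(X); the last m = n - q give Q2.
Q, _ = np.linalg.qr(X, mode="complete")
Q1, Q2 = Q[:, :q], Q[:, q:]

Z = Q1.T @ Y                          # q x p
U = Q2.T @ Y                          # m x p
S_canonical = U.T @ U

# Residual sum of squares from the original model, as in (2.1).
P_X = X @ np.linalg.solve(X.T @ X, X.T)
S_residual = Y.T @ (np.eye(n) - P_X) @ Y

assert np.allclose(S_canonical, S_residual)
```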
As mentioned in Section 1, the usual estimators $\hat{\Sigma}_a$ in (1.3) perform poorly. We propose alternative estimators of the form
$$\hat{\Sigma}_J = a\,(S + J), \tag{2.4}$$
where $J = J(Z,S)$ is a correction matrix. The improvement over the class of estimators $\hat{\Sigma}_a$ can be achieved by improving on the best estimator $\hat{\Sigma}_{a_o} = a_o S$ within this class, namely, the estimator which minimizes the risk (1.5). It is proved in the Appendix that
$$\hat{\Sigma}_{a_o} = a_o S, \quad\text{with } a_o = \frac{1}{K^* v} \text{ and } v = \max\{p,m\}, \tag{2.5}$$
where $K^*$ is the normalizing constant (assumed to be finite) of the density defined by
$$(z,u) \mapsto \frac{1}{K^*}\,|\Sigma|^{-n/2}\, F^*\big(\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\, u\,\Sigma^{-1}u^\top\big), \tag{2.6}$$
where, for any $t \ge 0$,
$$F^*(t) = \frac{1}{2}\int_t^{\infty} f(\nu)\,d\nu.$$
Note that, under the quadratic loss function (1.6), the optimal constant is $1/\big(K^*(p+m+1)\big)$. Of course, this risk optimality makes sense only if the risk of $\hat{\Sigma}_{a_o}$ is finite. As shown in Haddouche (2019), this is the case as soon as $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big] < \infty$ and $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\, S^+)\big] < \infty$.
In order to give a unified dominance result of $\hat{\Sigma}_J$ over $\hat{\Sigma}_{a_o}$ for the two cases where $S$ is non-invertible and where $S$ is invertible, we consider, as a correction matrix in (2.4), the projection of a matrix function $G(Z,S) = G$ onto the subspace spanned by the columns of $S S^+$, namely,
$$J = S S^+ G. \tag{2.7}$$
In addition to the risk finiteness conditions for $\hat{\Sigma}_{a_o}$, it can be shown that the risk of $\hat{\Sigma}_J$ is finite as soon as the expectations $E_{\theta,\Sigma}\big[\|\Sigma^{-1}S S^+ G\|_F^2\big]$ and $E_{\theta,\Sigma}\big[\|S^+ G\|_F^2\big]$ are finite, where $\|\cdot\|_F$ denotes the Frobenius norm. Under these conditions, the risk difference between $\hat{\Sigma}_J$ and $\hat{\Sigma}_{a_o}$ is
$$\Delta(G) = a_o^2\, E_{\theta,\Sigma}\Big[\mathrm{tr}\big(\Sigma^{-1} S S^+ G\,\{I_p + S^+ G + S S^+\}\big)\Big] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}(S^+ G)\big]. \tag{2.8}$$
Noticing that the first integrand term in (2.8) depends on the unknown parameter $\Sigma^{-1}$, our approach consists in replacing this integrand term by a random quantity $\delta(G)$ which does not depend on $\Sigma^{-1}$, such that $\Delta(G) \le E^*_{\theta,\Sigma}\big[\delta(G)\big]$, where $E^*_{\theta,\Sigma}$ denotes the expectation with respect to the density (2.6). Clearly, a sufficient condition for $\Delta(G)$ to be non-positive (and hence for $\hat{\Sigma}_J$ to improve over $\hat{\Sigma}_{a_o}$) is that $\delta(G)$ is non-positive. To this end, we rely on the following Stein-Haff type identity.
Lemma 2.1 (Haddouche et al. (2021)). Let $G(z,s)$ be a $p \times p$ matrix function such that, for any fixed $z$, $G(z,s)$ is weakly differentiable with respect to $s$. Assume that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ G)\big] < \infty$. Then we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ G)\big] = K^*\, E^*_{\theta,\Sigma}\Big[\mathrm{tr}\big(2\, S S^+\, \mathcal{D}_s\{S S^+ G\}^\top + (m-r-1)\, S^+ G\big)\Big], \tag{2.9}$$
where $r = \min\{p,m\}$ and $\mathcal{D}_s\{\cdot\}$ is the Haff operator whose generic element is $\frac{1}{2}(1+\delta_{ij})\,\partial/\partial S_{ij}$, with $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$.
Note that the existence of the expectations in (2.9) is implied by the above risk finiteness conditions. An original Stein-Haff identity was derived independently by Stein (1986) and Haff (1979) in the Gaussian setting where $S$ is invertible. This identity was extended to the class of elliptically symmetric distributions in (2.3) by Kubokawa and Srivastava (1999) and also by Bodnar and Gupta (2009). Here, we use the new Stein-Haff type identity recently derived by Haddouche et al. (2021) in the elliptical framework (2.3), which deals with both the non-invertible and invertible cases of $S$.
Applying Lemma 2.1 to the term depending on $\Sigma^{-1}$ on the right-hand side of (2.8) gives
$$\Delta(G) = a_o^2 K^* E^*_{\theta,\Sigma}\Big[(m-r-1)\,\mathrm{tr}\big(S^+ G + (S^+ G)^2 + S^+ G\, S S^+\big) + 2\,\mathrm{tr}\big(S S^+\, \mathcal{D}_s\{S S^+ G + S S^+ G\, S^+ G + S S^+ G\, S S^+\}^\top\big)\Big] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}(S^+ G)\big]. \tag{2.10}$$
It is worth noticing that the risk difference in (2.10) involves both the $E_{\theta,\Sigma}$ and $E^*_{\theta,\Sigma}$ expectations (which coincide in the Gaussian setting since $F^* = f$). Thus, in order to derive a dominance result, we need to compare these two expectations. A possible approach consists in restricting ourselves to the subclass of densities satisfying $c \le F^*(t)/f(t) \le b$, for some positive constants $c$ and $b$ (see Berger (1975) for the class where $c \le F^*(t)/f(t)$). Due to the complexity of the use of the quadratic loss in (1.6) (which necessitates applying the Stein-Haff type identity (2.9) twice), this subclass was considered by Haddouche et al. (2021). Here, thanks to the data-based loss (1.4), we are able to avoid such a restriction, and hence to deal with a larger class of elliptically symmetric distributions in (2.3) (subject to the moment conditions induced by the above finiteness conditions).
Following the suggestion, mentioned in Section 1, to shrink the eigenvalues of $S$, we consider as a correction matrix a matrix $S S^+ G$ with $G$ orthogonally invariant in the following sense. Let $S = H L H^\top$ be the eigenvalue decomposition of $S$, where $H$ is a $p \times r$ semi-orthogonal matrix of eigenvectors and $L = \mathrm{diag}(l_1,\dots,l_r)$, with $l_1 > \dots > l_r$, is the diagonal matrix of the $r$ positive corresponding eigenvalues of $S$ (see Kubokawa and Srivastava (2008) for more details). Then set $G = H L \Psi(L) H^\top$, with $\Psi(L) = \mathrm{diag}(\psi_1(L),\dots,\psi_r(L))$, where each $\psi_i = \psi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$. Consequently, by semi-orthogonality of $H$, we have $S S^+ H = H H^\top H = H$, so that the correction matrix in (2.7) is
$$J = S S^+ G = G = H L \Psi(L) H^\top.$$
Thus the alternative estimators that we consider are of the form
$$\hat{\Sigma}_\Psi = a_o\big(S + H L \Psi(L) H^\top\big) = a_o\, H L\big(I_r + \Psi(L)\big) H^\top, \tag{2.11}$$
which are usually called orthogonally invariant estimators (i.e. equivariant under orthogonal transformations). See, for instance, Takemura (1984).
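As an illustration, here is a minimal sketch (NumPy; names ours, Gaussian case $K^* = 1$ assumed) of the construction of $\hat{\Sigma}_\Psi$ in (2.11), valid whether or not $S$ is invertible:

```python
import numpy as np

def orthogonally_invariant_estimator(S, m, psi):
    """Sigma_hat_Psi = a_o * H L (I_r + Psi(L)) H^T, cf. (2.11), with a_o of (2.5)
    for K* = 1 (Gaussian case). `psi` maps the vector l of positive eigenvalues
    (in decreasing order) to the vector (psi_1(L), ..., psi_r(L))."""
    p = S.shape[0]
    r, v = min(p, m), max(p, m)
    a_o = 1.0 / v                        # (2.5) with K* = 1
    eigval, eigvec = np.linalg.eigh(S)   # ascending order
    idx = np.argsort(eigval)[::-1][:r]   # keep the r largest (positive) eigenvalues
    l, H = eigval[idx], eigvec[:, idx]
    # H L (I_r + Psi(L)) H^T via column scaling of the semi-orthogonal H
    return a_o * (H * (l * (1.0 + psi(l)))) @ H.T
```

With $\psi_i \equiv 0$ the function returns $\hat{\Sigma}_{a_o} = S/v$, which is a quick sanity check of the spectral reconstruction in both the invertible ($p \le m$) and non-invertible ($p > m$) cases.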
Now, adapting the risk finiteness conditions mentioned above, we are in a position to give our dominance result of
the alternative estimators in (2.11) over the optimal estimator in (2.5), under the data–based loss (1.4).
Theorem 2.1. Assume that the expectations $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big]$, $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\, S^+)\big]$, $E_{\theta,\Sigma}\big[\|\Sigma^{-1} H L \Psi(L) H^\top\|_F^2\big]$ and $E_{\theta,\Sigma}\big[\|H \Psi(L) H^\top\|_F^2\big]$ are finite. Let $\Psi(L) = \mathrm{diag}(\psi_1,\dots,\psi_r)$, where each $\psi_i = \psi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$, with $\mathrm{tr}\,\Psi(L) \ge \lambda$ for a fixed positive constant $\lambda$.
Then an upper bound for the risk difference between $\hat{\Sigma}_\Psi$ and $\hat{\Sigma}_{a_o}$ under the loss function (1.4) is given by
$$\Delta(\Psi(L)) \le a_o^2 K^* E^*_{\theta,\Sigma}\big[g(\Psi)\big],$$
where
$$g(\Psi) = \sum_{i=1}^{r}\bigg[2\,(v-r+1)\,\psi_i + (v-r+1)\,\psi_i^2 + 4\,l_i\,(1+\psi_i)\,\frac{\partial \psi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i(2\psi_i+\psi_i^2) - l_j(2\psi_j+\psi_j^2)}{l_i-l_j}\bigg] - 2\,v\,\lambda. \tag{2.12}$$
Also, $\hat{\Sigma}_\Psi$ in (2.11) improves over $\hat{\Sigma}_{a_o}$ in (2.5) as soon as $g(\Psi) \le 0$.
The proof of Theorem 2.1 is given in the Appendix. Note that, although the expectation $E^*_{\theta,\Sigma}$ is associated with the generating function $f(\cdot)$ in (1.2), the function $g(\Psi)$ does not depend on $f(\cdot)$; hence the improvement result in Theorem 2.1 is robust in that sense. Note also that Theorem 2.1 is well adapted to deal with the James and Stein (1961) estimator, where $\psi_i(L) = 1/(v+r-2i+1)$ for $i = 1,\dots,r$, since $\mathrm{tr}\,\Psi(L) > \lambda = 1/(v+r-1)$, and with the Efron-Morris-Dey estimator, considered by Tsukuma and Kubokawa (2020a), where $\psi_i(L) = 1/\big[v\,\big(1 + b\,l_i^\alpha/\mathrm{tr}(L^\alpha)\big)\big]$ for $i = 1,\dots,r$ and positive constants $b$ and $\alpha$, since $\mathrm{tr}\,\Psi(L) > \lambda = r/\big((b+1)\,v\big)$.
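The two weight functions just mentioned can be sketched as follows (NumPy; the Efron-Morris-Dey form is written as we reconstructed it above, so treat that formula as an assumption). The lower bounds $\lambda$ follow because each $\psi_i$ is bounded below termwise:

```python
import numpy as np

def psi_james_stein(l, v):
    """psi_i = 1 / (v + r - 2i + 1), i = 1..r (James-Stein weights)."""
    r = len(l)
    i = np.arange(1, r + 1)
    return 1.0 / (v + r - 2 * i + 1)

def psi_efron_morris_dey(l, v, b=1.0, alpha=1.0):
    """psi_i = 1 / (v (1 + b l_i^alpha / tr(L^alpha))), as reconstructed here."""
    return 1.0 / (v * (1.0 + b * l**alpha / np.sum(l**alpha)))
```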
In the following, we consider a new class of estimators which is an extension of the Haff (1980) class, that is, estimators of the form
$$\hat{\Sigma}_{\alpha,b} = a_o\big(S + H L \Psi(L) H^\top\big) \quad\text{with, for } \alpha \ge 1 \text{ and } b > 0,\quad \Psi(L) = b\,\frac{L^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}, \tag{2.13}$$
where $a_o$ is given in (2.5). For $\alpha = 1$, this is the estimator considered by Konno (2009), who dealt with the Gaussian case and the quadratic loss (1.6), while Tsukuma and Kubokawa (2020a) used an extended Stein loss. An elliptical setting was also considered by Haddouche et al. (2021) under the quadratic loss (1.6).
It is proved in the Appendix that, for the entire class of elliptically symmetric distributions in (2.3), any estimator $\hat{\Sigma}_{\alpha,b}$ in (2.13) improves on the optimal estimator $\hat{\Sigma}_{a_o}$ in (2.5), under the data-based loss (1.4), as soon as
$$0 < b \le \frac{2\,(r-1)}{v-r+1}. \tag{2.14}$$
It is worth noting that Tsukuma and Kubokawa (2020a) gave Condition (2.14) as an improvement condition although their loss was different.
3. Numerical study
Let the elliptical density in (1.2) be a variance mixture of normal distributions where the mixing variable, with density $h$, has the inverse-gamma distribution $\mathcal{IG}(k/2, k/2)$, with shape and scale parameters both equal to $k/2$, for $k > 2$. Thus, for any $t \ge 0$, the generating function $f$ in (1.2) has the form
$$f(t) = \int_0^{\infty} \frac{1}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{t}{2\mathtt{v}}\Big)\,h(\mathtt{v})\,d\mathtt{v},$$
which corresponds to the $t$-distribution with $k$ degrees of freedom. Then the primitive $F^*$ of $f$ in (2.6) is, for any $t \ge 0$,
$$F^*(t) = \frac{1}{2}\int_t^{\infty}\!\int_0^{\infty} \frac{1}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{w}{2\mathtt{v}}\Big)\,h(\mathtt{v})\,d\mathtt{v}\,dw = \int_0^{\infty} \frac{\mathtt{v}}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{t}{2\mathtt{v}}\Big)\,h(\mathtt{v})\,d\mathtt{v},$$
by Fubini's theorem. Therefore the normalizing constant $K^*$ in (2.6) is
$$K^* = \int_{\mathbb{R}^{pn}}\!\int_0^{\infty} \frac{|\Sigma|^{-n/2}}{(2\pi\mathtt{v})^{np/2}}\,\mathtt{v}\,\exp\Big(-\frac{1}{2\mathtt{v}}\big[\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\,\Sigma^{-1}u^\top u\big]\Big)\,h(\mathtt{v})\,d\mathtt{v}\,dz\,du$$
$$= \int_0^{\infty} \mathtt{v}\,\bigg[\int_{\mathbb{R}^{pn}} \frac{|\Sigma|^{-n/2}}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{1}{2\mathtt{v}}\big[\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\,\Sigma^{-1}u^\top u\big]\Big)\,dz\,du\bigg]\,h(\mathtt{v})\,d\mathtt{v} \tag{3.1}$$
by Fubini's theorem. Clearly the innermost integral in (3.1) equals 1, so that
$$K^* = \int_0^{\infty} \mathtt{v}\,h(\mathtt{v})\,d\mathtt{v} = \frac{k}{k-2},$$
by a property of $\mathcal{IG}(k/2, k/2)$. Note that, when $k$ goes to $\infty$, the $t$-distribution goes to the multivariate Gaussian distribution (for which $K^* = 1$ since $f = F^*$) with covariance matrix $I_n \otimes \Sigma$.
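A draw from this model can be sketched as follows (NumPy; the function names are ours); the computation of $K^*$ reduces to the mean of the inverse-gamma mixing variable, which is $k/(k-2)$ for $\mathcal{IG}(k/2,k/2)$:

```python
import numpy as np

def sample_noise(n, p, Sigma, k, rng):
    """One draw of the n x p noise matrix: epsilon = sqrt(v) * N, where
    v ~ InvGamma(k/2, k/2) and the rows of N are i.i.d. N_p(0, Sigma); the
    result is a matrix variate t with k degrees of freedom, of the form (1.2)."""
    # If G ~ Gamma(shape=k/2, scale=2/k), then 1/G ~ InvGamma(k/2, k/2).
    v = 1.0 / rng.gamma(shape=k / 2.0, scale=2.0 / k)
    N = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return np.sqrt(v) * N

def K_star(k):
    """Normalizing constant K* = E[v] = k / (k - 2), for k > 2."""
    return k / (k - 2.0)
```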
In the following, we study numerically the performance of the alternative estimators in (2.13), expressed as
$$\hat{\Sigma}_{\alpha,b} = a_o\Big(S + \frac{b}{\mathrm{tr}(L^{-\alpha})}\,H L^{1-\alpha} H^\top\Big), \quad\text{where } 0 \le b \le b_0 = \frac{2\,(r-1)}{v-r+1} \text{ and } \alpha \ge 1. \tag{3.2}$$
As mentioned above, Konno (2009) considered the case $\alpha = 1$, in the Gaussian setting and under the quadratic loss (1.6), for which his improvement condition is
$$0 \le b \le b_1 = \frac{2\,(r-1)\,(v+r+1)}{(v-r+1)\,(v-r+3)}.$$
Note that, although $b_0 < b_1$, the improvement condition in (3.2) is valid for any $\alpha \ge 1$ and for the whole class of elliptically symmetric distributions (2.3). However, it was shown numerically by Haddouche et al. (2021) that $b_1$ is optimal in the Gaussian context.
We consider the following structures for $\Sigma$: (i) the identity matrix $I_p$ and (ii) an autoregressive structure with coefficient $0.9$ (i.e. a $p \times p$ matrix whose $(i,j)$th element is $0.9^{|i-j|}$). To assess how an alternative estimator $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$, we compute the Percentage Reduction In Average Loss (PRIAL), defined as
$$\mathrm{PRIAL}(\hat{\Sigma}_{\alpha,b}) = \frac{\text{average loss of } \hat{\Sigma}_{a_o} - \text{average loss of } \hat{\Sigma}_{\alpha,b}}{\text{average loss of } \hat{\Sigma}_{a_o}}$$
and based on 1000 independent Monte-Carlo replications, for some pairs $(p,m)$.
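A minimal Monte Carlo sketch of this comparison (NumPy; Gaussian case with $a_o = 1/v$; the function names, the reduced replication count and the default settings are ours; structure (ii) is used for $\Sigma$):

```python
import numpy as np

def prial(losses_ref, losses_alt):
    """Percentage Reduction In Average Loss of the alternative vs the reference."""
    return 100.0 * (np.mean(losses_ref) - np.mean(losses_alt)) / np.mean(losses_ref)

def db_loss(Sigma_hat, Sigma, S):
    """Data-based loss (1.4)."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat) - np.eye(p)
    return np.trace(np.linalg.pinv(S) @ Sigma @ A @ A)

def simulate(p=10, m=25, alpha=1.0, reps=200, seed=0):
    """PRIAL of Sigma_hat_{alpha, b0} of (3.2) over Sigma_hat_{a_o} = S / v."""
    rng = np.random.default_rng(seed)
    Sigma = 0.9 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(0.9)
    r, v = min(p, m), max(p, m)
    b0 = 2.0 * (r - 1) / (v - r + 1)
    root = np.linalg.cholesky(Sigma)
    ref, alt = [], []
    for _ in range(reps):
        U = rng.normal(size=(m, p)) @ root.T       # m Gaussian rows N_p(0, Sigma)
        S = U.T @ U
        l, H = np.linalg.eigh(S)
        l, H = l[::-1][:r], H[:, ::-1][:, :r]       # r largest eigenpairs
        correction = (b0 / np.sum(l**-alpha)) * (H * l**(1 - alpha)) @ H.T
        ref.append(db_loss(S / v, Sigma, S))
        alt.append(db_loss((S + correction) / v, Sigma, S))
    return prial(ref, alt)
```

By the dominance result above, the PRIAL should be positive in expectation; with a small number of replications the Monte Carlo noise can be noticeable.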
In Figure 1, we study the effect of the constant $b$ in (3.2) on the PRIALs in the non-invertible ($(p,m) = (25,10)$) and the invertible ($(p,m) = (10,25)$) cases. The Gaussian setting is investigated for the structure (i) of $\Sigma$. Note that, when $0 \le b \le b_0$, the best PRIAL (around 7% in both the invertible and non-invertible cases) is reported for $b = b_0 = 1.125$ (for $(v,r) = (25,10)$). For this reason, in the following, we consider the estimators $\hat{\Sigma}_{\alpha,b_0}$ with
$$b_0 = \frac{2\,(r-1)}{v-r+1}.$$
Note also that, for $b > b_0$, the estimators $\hat{\Sigma}_{\alpha,b}$ still improve over $\hat{\Sigma}_{a_o}$ and that the maximum value of the PRIAL is around 50%. This shows that there exists a larger range of values of $b$ than the one our theory provides for which $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$.
In Figure 2, we study the effect of $\alpha$ on the PRIALs of the estimator $\hat{\Sigma}_{\alpha,b_0}$ over $\hat{\Sigma}_{a_o} = S/v$ when the sampling distribution is Gaussian ($K^* = 1$ in (2.5)), and over $\hat{\Sigma}_{a_o} = S\,(k-2)/(v\,k)$ when it is the $t$-distribution with $k$ degrees of freedom ($K^* = k/(k-2)$ in (2.5)). For the structure (i) of $\Sigma$, note that, for $\alpha \ge 6$, the PRIALs stabilize at 12.5% in the Gaussian case and at 8.5% in the Student case. Similarly, the PRIALs are better in the Gaussian setting for the structure (ii). In addition, it is interesting to observe that, when $\alpha$ is close to zero, the PRIALs are small for the structure (i) and may be negative for the structure (ii).
In Figure 3, under the Gaussian assumption, we provide the PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ with respect to $\hat{\Sigma}_{a_o} = S/v$ under the data-based loss (1.4) and the PRIALs of $\hat{\Sigma}_{\alpha,b_1}$ with respect to $\hat{\Sigma}_{a_o} = S/(v+r+1)$ under the quadratic loss (1.6). For the two structures (i) and (ii) of $\Sigma$, the PRIALs are better under the data-based loss. For the structure (i) with $\alpha = 1$ (which coincides with Konno's estimator), we observe a PRIAL equal to 1.73%, which is similar to that of Konno (2009). Note that, under the data-based loss, the PRIAL is much better since it equals 13.42%. We observe similar behavior for the structure (ii) as for the structure (i), but with lower PRIALs.
[Figure 1 here: PRIAL (y-axis) against $b$ (x-axis), with curves for the invertible and non-invertible cases.]
Fig. 1: Effect of $b$ on the PRIAL of $\hat{\Sigma}_{\alpha,b}$, with $\alpha = 1$, under the data-based loss in the Gaussian setting. The structure (i) of $\Sigma$ is considered for the invertible case with $(p,m) = (10,25)$ and the non-invertible case with $(p,m) = (25,10)$.
[Figure 2 here: PRIAL (y-axis) against $\alpha$ (x-axis), panels (i) and (ii), with curves for the Gaussian and Student distributions.]
Fig. 2: PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ under the data-based loss. The non-invertible case is considered, with $(p,m) = (50,20)$, for the structures (i) and (ii) of $\Sigma$, for the $t$-distribution with $k = 5$ degrees of freedom and the Gaussian distribution.
[Figure 3 here: PRIAL (y-axis) against $\alpha$ (x-axis), panels (i) and (ii), with curves for the quadratic loss and the data-based loss.]
Fig. 3: PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ under the data-based loss and PRIALs of $\hat{\Sigma}_{\alpha,b_1}$ under the quadratic loss. The non-invertible case is considered, with $(p,m) = (20,10)$, for the structures (i) and (ii) of $\Sigma$ under the Gaussian distribution.
4. Conclusion and perspective
For a wide class of elliptically symmetric distributions, we provide a large class of estimators of the scale matrix $\Sigma$ of the elliptical multivariate linear model (1.1) which improve over the usual estimators $a\,S$. We highlight that the use of the data-based loss (1.4) is more attractive than the use of the classical quadratic loss (1.6). Indeed, (1.4) yields more improved estimators, and their improvement is valid within a larger class of distributions. This means that (1.4) is more discriminating than (1.6) in exhibiting improved estimators.
While the risk difference (2.10) between $\hat{\Sigma}_J = a_o(S + J)$, with $J = S S^+ G(Z,S)$, and $\hat{\Sigma}_{a_o} = a_o S$ holds for a general correction matrix $G(Z,S)$, the dominance result in Theorem 2.1 is given for a correction matrix $G(Z,S) = H L \Psi(L) H^\top$ which depends only on $S$. Recently, Tsukuma (2016) considered, in the Gaussian case, alternative estimators where $G(Z,S)$ depends on $S$ and on the information contained in the sample mean $Z$. This class of estimators merits future investigation in an elliptical setting.
5. Appendix
We give in the following corollary an adaptation of Lemma 2.1 to an orthogonally invariant matrix function $G$, that is, of the form $G = H L \Phi(L) H^\top$, where $\Phi(L) = \mathrm{diag}(\phi_1,\dots,\phi_r)$ and each $\phi_i = \phi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$.
Corollary 5.1. Let $\Phi(L) = \mathrm{diag}(\phi_1,\dots,\phi_r)$, where each $\phi_i = \phi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$. Assume that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] < \infty$. Then we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((v-r+1)\,\phi_i + 2\,l_i\,\frac{\partial \phi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i\phi_i - l_j\phi_j}{l_i-l_j}\bigg)\Bigg].$$
Proof. Let $G = H L \Phi(L) H^\top$, $S^+ = H L^{-1} H^\top$ and $S S^+ = H H^\top$. Then
$$S S^+ G = H H^\top H L \Phi(L) H^\top = H L \Phi(L) H^\top = G,$$
since $H$ is semi-orthogonal. Assuming that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] < \infty$, we have from Lemma 2.1
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Big[2\,\mathrm{tr}\big(H H^\top\, \mathcal{D}_s\{H L \Phi(L) H^\top\}\big) + (m-r-1)\,\mathrm{tr}\big(H \Phi(L) H^\top\big)\Big]. \tag{5.1}$$
Firstly, using Lemma A.4.2 in Haddouche et al. (2021), we have
$$\mathcal{D}_s\big\{H L \Phi(L) H^\top\big\} = H \Phi^{(1)}(L) H^\top + \tfrac{1}{2}\,\mathrm{tr}\big(\Phi(L)\big)\,\big(I_p - H H^\top\big), \tag{5.2}$$
where $\Phi^{(1)}(L) = \mathrm{diag}\big(\phi^{(1)}_1,\dots,\phi^{(1)}_r\big)$, with
$$\phi^{(1)}_i = \frac{1}{2}\,(p-r+2)\,\phi_i + l_i\,\frac{\partial \phi_i}{\partial l_i} + \frac{1}{2}\sum_{j\ne i}^{r}\frac{l_i\phi_i - l_j\phi_j}{l_i-l_j}, \tag{5.3}$$
for $i = 1,\dots,r$.
Secondly, using the fact that $H^\top H = I_r$, we have from (5.2)
$$H H^\top\, \mathcal{D}_s\big\{H L \Phi(L) H^\top\big\} = H \Phi^{(1)}(L) H^\top. \tag{5.4}$$
Then, putting (5.4) in (5.1), we obtain
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Big[2\,\mathrm{tr}\big(\Phi^{(1)}(L)\big) + (m-r-1)\,\mathrm{tr}\big(\Phi(L)\big)\Big].$$
Finally, using (5.3), we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((p+m-2r+1)\,\phi_i + 2\,l_i\,\frac{\partial \phi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i\phi_i - l_j\phi_j}{l_i-l_j}\bigg)\Bigg],$$
where $p+m-2r+1 = v-r+1$.
The optimal constant $a_o$ in (2.5). Let $\hat{\Sigma}_a = a\,S$, where $a > 0$. Assume that the expectations $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big]$ and $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\, S^+)\big]$ are finite. Then the risk of $\hat{\Sigma}_a$ under the data-based loss (1.4) is given by
$$R(\hat{\Sigma}_a,\Sigma) = E_{\theta,\Sigma}\big[\mathrm{tr}\big(S^+\Sigma\,(\Sigma^{-1}\hat{\Sigma}_a - I_p)^2\big)\big] = a^2\, E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ S)\big] - 2\,a\, E_{\theta,\Sigma}\big[\mathrm{tr}(S S^+)\big] + E_{\theta,\Sigma}\big[\mathrm{tr}(S^+\Sigma)\big]. \tag{5.5}$$
Applying the Stein-Haff type identity in Corollary 5.1, with $\Phi(L) = I_r$, to the first term on the right-hand side of (5.5), we obtain
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ S)\big] = E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L H^\top)\big] = K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((v-r+1) + \sum_{j\ne i}^{r}\frac{l_i - l_j}{l_i - l_j}\bigg)\Bigg] = K^*\big[r\,(v-r+1) + r\,(r-1)\big] = K^*\, r\, v. \tag{5.6}$$
Now, using the fact that $\mathrm{tr}(S^+ S) = \mathrm{tr}(H H^\top) = r$, and thanks to (5.6), we have
$$R(\hat{\Sigma}_a,\Sigma) = a^2\, K^*\, r\, v - 2\,a\,r + E_{\theta,\Sigma}\big[\mathrm{tr}(S^+\Sigma)\big].$$
Therefore, choosing $a = 1/(K^* v)$ is optimal under the risk (1.5).
Proof of Theorem 2.1. Let $\hat{\Sigma}_\Psi = a_o\big(S + H L \Psi(L) H^\top\big)$, where $\Psi(L) = \mathrm{diag}(\psi_1,\dots,\psi_r)$, with each $\psi_i = \psi_i(L)$ ($i = 1,\dots,r$) a differentiable function of $L$ and $\mathrm{tr}\,\Psi(L) \ge \lambda > 0$. Using the fact that $H^\top H = I_r$, the terms involved in the risk difference (2.8) become
$$J = S S^+ G = G = H L \Psi(L) H^\top \quad\text{and}\quad S^+ G = H \Psi(L) H^\top.$$
Then the risk difference between $\hat{\Sigma}_\Psi$ and $\hat{\Sigma}_{a_o}$ is given by
$$\Delta(\Psi) = a_o^2\, E_{\theta,\Sigma}\big[\mathrm{tr}\big(\Sigma^{-1} H L\,(2\Psi + \Psi^2)\,H^\top\big)\big] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}\,\Psi\big]. \tag{5.7}$$
Now, applying the Stein-Haff type identity in Corollary 5.1 to the first term on the right-hand side of (5.7), with $\Phi = 2\Psi + \Psi^2$, we have
$$\Delta(\Psi) = a_o^2 K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((v-r+1)\,(2\psi_i+\psi_i^2) + 2\,l_i\,\frac{\partial(2\psi_i+\psi_i^2)}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i(2\psi_i+\psi_i^2) - l_j(2\psi_j+\psi_j^2)}{l_i-l_j}\bigg)\Bigg] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}\,\Psi\big].$$
Therefore, using the fact that $\mathrm{tr}(\Psi) \ge \lambda > 0$, an upper bound for the risk difference $\Delta(\Psi)$ is given by
$$\Delta(\Psi) \le a_o^2 K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg(2\,(v-r+1)\,\psi_i + (v-r+1)\,\psi_i^2 + 4\,l_i\,(1+\psi_i)\,\frac{\partial \psi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i(2\psi_i+\psi_i^2) - l_j(2\psi_j+\psi_j^2)}{l_i-l_j}\bigg) - 2\,(a_o K^*)^{-1}\lambda\Bigg],$$
where $(a_o K^*)^{-1} = v$.
Improvement condition (2.14) of the alternative estimators in (2.13). Consider the class of alternative estimators $\hat{\Sigma}_{\alpha,b}$ in (2.13). Applying Theorem 2.1, an upper bound for the risk difference between $\hat{\Sigma}_{\alpha,b}$ and $\hat{\Sigma}_{a_o}$ is given by
$$\Delta(\Psi) \le a_o^2 K^* E^*_{\theta,\Sigma}\big[g(\Psi)\big], \tag{5.8}$$
where the integrand term in (2.12) becomes
$$g(\Psi) = g_1(\Psi) + g_2(\Psi),$$
with
$$g_1(\Psi) = -2\,(r-1)\,b\sum_{i=1}^{r}\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} + (v-r+1)\,b^2\sum_{i=1}^{r}\frac{l_i^{-2\alpha}}{\mathrm{tr}^2(L^{-\alpha})},$$
since $\mathrm{tr}\,\Psi(L) = b$, and
$$g_2(\Psi) = 4\,b\sum_{i=1}^{r} l_i\Big(1 + b\,\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) + \frac{2\,b}{\mathrm{tr}(L^{-\alpha})}\sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i-l_j} + \frac{b^2}{\mathrm{tr}^2(L^{-\alpha})}\sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i-l_j}.$$
The proof consists in showing that $g_2(\Psi)$ is non-positive. To this end, it can be shown that, for $\alpha \ge 1$,
$$\sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i-l_j} = 2\sum_{i=1}^{r}\sum_{j>i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i-l_j} \le 0 \quad\text{and}\quad \sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i-l_j} = 2\sum_{i=1}^{r}\sum_{j>i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i-l_j} < 0,$$
since $l_1 > \dots > l_r$ and the map $l \mapsto l^{1-\alpha}$ is non-increasing for $\alpha \ge 1$. Then
$$g_2(\Psi) \le 4\,b\sum_{i=1}^{r} l_i\Big(1 + b\,\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) = 4\,b\,\alpha\sum_{i=1}^{r}\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big(1 + b\,\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} - 1\Big),$$
since
$$\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) = \alpha\,\frac{l_i^{-\alpha-1}}{\mathrm{tr}(L^{-\alpha})}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} - 1\Big).$$
Therefore, since $l_i^{-\alpha} \le \mathrm{tr}(L^{-\alpha})$, we have $g_2(\Psi) \le 0$. Then
$$g(\Psi) \le g_1(\Psi) = -2\,(r-1)\,b + (v-r+1)\,b^2\,\frac{\mathrm{tr}(L^{-2\alpha})}{\mathrm{tr}^2(L^{-\alpha})}.$$
Now, using the fact that $\mathrm{tr}(L^{-2\alpha}) \le \mathrm{tr}^2(L^{-\alpha})$ and that $b > 0$, we have
$$g(\Psi) \le -2\,(r-1)\,b + (v-r+1)\,b^2.$$
Hence an upper bound for the risk difference in (5.8) is given by
$$\Delta(\Psi) \le a_o^2\,b\,K^* E^*_{\theta,\Sigma}\big[-2\,(r-1) + (v-r+1)\,b\big].$$
Therefore, $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$ under the data-based loss (1.4) as soon as $0 < b \le b_0 = 2\,(r-1)/(v-r+1)$.
CRediT authorship contribution statement
Dominique Fourdrinier: Conceptualization, Methodology, Supervision, Validation, Writing - review & editing,
Writing - original draft, Software. Anis M. Haddouche: Conceptualization, Methodology, Supervision, Validation,
Writing - review & editing, Writing - original draft, Software. Fatiha Mezoued: Conceptualization, Methodology,
Supervision, Validation, Writing - review & editing, Writing - original draft, Software.
References
Berger, J., 1975. Minimax estimation of location vectors for a wide class of densities. Ann. Statist. 3, 1318–1328.
Bodnar, T., Gupta, A.K., 2009. An identity for multivariate elliptically contoured matrix distribution. Stat. Probab. Lett. 79, 1327–1330.
Candès, E., Sing-Long, C., Trzasko, J.D., 2013. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE T. Signal Proces. 61, 4643–4657.
Canu, S., Fourdrinier, D., 2017. Unbiased risk estimates for matrix estimation in the elliptical case. J. Multivariate Anal. 158, 60–72.
Chételat, D., Wells, M.T., 2016. Improved second order estimation in the singular multivariate normal model. J. Multivariate Anal. 147, 1–19.
Díaz-García, J.A., Gutiérrez-Jáimez, R., 2011. On Wishart distribution: Some extensions. Linear Algebra Appl. 435, 1296–1310.
Efron, B., Morris, C., 1976. Multivariate empirical Bayes and estimation of covariance matrices. Ann. Statist. 4, 22–32.
Fang, K., Zhang, Y., 1990. Generalized Multivariate Analysis. Science Press, Beijing; Springer-Verlag.
Fourdrinier, D., Strawderman, W., 2015. Robust minimax Stein estimation under invariant data-based loss for spherically and elliptically symmetric distributions. Metrika 78, 461–484.
Haddouche, A.M., Fourdrinier, D., Mezoued, F., 2021. Scale matrix estimation of an elliptically symmetric distribution in high and low dimensions. J. Multivariate Anal. 181, 104680.
Haddouche, M.A., 2019. Estimation d'une matrice d'échelle sous un coût basé sur les données. Thesis. Normandie Université; École Nationale Supérieure de Statistique et d'Économie Appliquée (Alger). URL: https://tel.archives-ouvertes.fr/tel-02376077.
Haff, L., 1980. Empirical Bayes estimation of the multivariate normal covariance matrix. Ann. Statist. 8, 586–597.
Haff, L.R., 1979. Estimation of the inverse covariance matrix: Random mixtures of the inverse Wishart matrix and the identity. Ann. Statist. 7, 1264–1276.
James, W., Stein, C., 1961. Estimation with quadratic loss, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, California, pp. 361–379.
Konno, Y., 2009. Shrinkage estimators for large covariance matrices in multivariate real and complex normal distributions under an invariant quadratic loss. J. Multivariate Anal. 100, 2237–2253.
Kubokawa, T., Srivastava, M., 2001. Robust improvement in estimation of a mean matrix in an elliptically contoured distribution. J. Multivariate Anal. 76, 138–152.
Kubokawa, T., Srivastava, M., 2008. Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data. J. Multivariate Anal. 99, 1906–1928.
Kubokawa, T., Srivastava, M.S., 1999. Robust improvement in estimation of a covariance matrix in an elliptically contoured distribution. Ann. Statist. 27, 600–609.
Srivastava, M.S., 2003. Singular Wishart and multivariate Beta distributions. Ann. Statist. 31, 1537–1560.
Stein, C., 1986. Lectures on the theory of estimation of many parameters. J. Sov. Math. 34, 1373–1403.
Takemura, A., 1984. An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population. Tsukuba J. Math. 8, 367–376.
Tsukuma, H., 2016. Estimation of a high-dimensional covariance matrix with the Stein loss. J. Multivariate Anal. 148, 1–17.
Tsukuma, H., Kubokawa, T., 2015. A unified approach to estimating a normal mean matrix in high and low dimensions. J. Multivariate Anal. 139, 312–328.
Tsukuma, H., Kubokawa, T., 2016. Unified improvements in estimation of a normal covariance matrix in high and low dimensions. J. Multivariate Anal. 143, 233–248.
Tsukuma, H., Kubokawa, T., 2020a. Estimation of the covariance matrix, in: Shrinkage Estimation for Mean and Covariance Matrices. Springer, pp. 75–110.
Tsukuma, H., Kubokawa, T., 2020b. Multivariate linear model and group invariance, in: Shrinkage Estimation for Mean and Covariance Matrices. Springer, pp. 27–33.