Covariance matrix estimation under data–based loss
Dominique Fourdrinier^{a,1}, Anis M. Haddouche^{b,*,2} and Fatiha Mezoued^{c,1}
aUniversité de Normandie, UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, avenue de l’Université, BP 12, 76801 Saint-Étienne-du-Rouvray,
France.
bINSA Rouen, LITIS and LMI, avenue de l’Université, BP 12, 76801 Saint-Étienne-du-Rouvray, France.
cÉcole Nationale Supérieure de Statistique et d’Économie Appliquée (ENSSEA), LAMOPS, Tipaza, Algeria.
ARTICLE INFO
Keywords:
data–based loss
elliptically symmetric distributions
high–dimensional statistics
orthogonally invariant estimators
Stein–Haff type identities.
2010 MSC:
62H12
62F10
62C99.
ABSTRACT
In this paper, we consider the problem of estimating the $p \times p$ scale matrix $\Sigma$ of a multivariate linear regression model $Y = X\beta + \varepsilon$ when the distribution of the observed matrix $Y$ belongs to a large class of elliptically symmetric distributions. After deriving the canonical form $(Z^\top\, U^\top)^\top$ of this model, any estimator $\hat{\Sigma}$ of $\Sigma$ is assessed through the data-based loss $\mathrm{tr}\big(S^+ \Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big)$, where $S = U^\top U$ is the sample covariance matrix and $S^+$ is its Moore-Penrose inverse. We provide alternative estimators to the usual estimators $a\,S$, where $a$ is a positive constant, which have smaller associated risk. Compared to the usual quadratic loss $\mathrm{tr}\big(\Sigma^{-1}\hat{\Sigma} - I_p\big)^2$, we obtain a larger class of estimators and a wider class of elliptical distributions for which such an improvement occurs. A numerical study illustrates the theory.
1. Introduction
Let us consider the multivariate linear regression model, with $p$ responses and $n$ observations,
$$Y = X\beta + \varepsilon, \tag{1.1}$$
where $Y$ is an $n \times p$ matrix, $X$ is an $n \times q$ matrix of known constants of rank $q \le n$, and $\beta$ is a $q \times p$ matrix of unknown parameters. We assume that the $n \times p$ noise matrix $\varepsilon$ has an elliptically symmetric distribution with density, with respect to the Lebesgue measure on $\mathbb{R}^{pn}$, of the form
$$\varepsilon \mapsto |\Sigma|^{-n/2}\, f\big(\mathrm{tr}(\varepsilon\,\Sigma^{-1}\varepsilon^\top)\big), \tag{1.2}$$
where $\Sigma$ is a $p \times p$ unknown positive definite matrix and $f(\cdot)$ is a non-negative unknown function.
The model (1.1) has been considered by various authors, such as Kubokawa and Srivastava (1999, 2001), who estimated $\Sigma$ and $\beta$ respectively in the context of (1.2), and Tsukuma and Kubokawa (2016), who estimated $\Sigma$ in the Gaussian setting. A common alternative representation of this model, $Y = M + \varepsilon$, where $\varepsilon$ is as above and $M$ lies in the column space of $X$, has also been considered in the literature. See, for instance, Canu and Fourdrinier (2017) and Candès, Sing-Long and Trzasko (2013).
Although the matrix of regression coefficients $\beta$ is also unknown, we are interested in estimating the scale matrix $\Sigma$. We address this problem within a decision-theoretic framework through a canonical form of the model (1.1), which allows us to use a sufficient statistic $S = U^\top U$ for $\Sigma$, where $U$ is an $(n-q) \times p$ matrix (see Section 2 for more details). In this context, the natural estimators of $\Sigma$ are of the form
$$\hat{\Sigma}_a = a\,S, \tag{1.3}$$
for some positive constant $a$.
As pointed out by James and Stein (1961), the estimators of the form (1.3) perform poorly in the Gaussian setting. In fact, larger (smaller) eigenvalues of $\Sigma$ are overestimated (underestimated) by those estimators. Thus we may expect to improve these estimators by shrinking the eigenvalues of $S$, which gives rise to the class of orthogonally invariant estimators (see Takemura (1984)). Since the seminal work of James and Stein (1961), this problem has been widely studied in the Gaussian setting; see, for instance, Tsukuma and Kubokawa (2016), Tsukuma (2016) and Chételat and Wells (2016). The elliptical setting, however, has been considered by only a few authors, such as Kubokawa and Srivastava (1999) and Haddouche, Fourdrinier and Mezoued (2021).
*Corresponding author
Dominique.Fourdrinier@univ-rouen.fr (D. Fourdrinier); Mohamed.haddouche@insa-rouen.fr (A.M. Haddouche); famezoued@yahoo.fr (F. Mezoued)
1Professor
2Temporarily associated to teaching and research.
First Author et al.: Preprint submitted to Elsevier Page 1 of 11
arXiv:2012.11920v1 [math.ST] 22 Dec 2020
In this paper, the performance of any estimator $\hat{\Sigma}$ of $\Sigma$ is assessed through the data-based loss
$$L_S(\hat{\Sigma},\Sigma) = \mathrm{tr}\big(S^+\Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big) \tag{1.4}$$
and its associated risk
$$R(\hat{\Sigma},\Sigma) = E_{\theta,\Sigma}\Big[\mathrm{tr}\big(S^+\Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big)\Big], \tag{1.5}$$
where $E_{\theta,\Sigma}$ denotes the expectation with respect to the density specified below in (2.3) and where $S^+$ is the Moore-Penrose inverse of $S$. Note that, when $p > n-q$, $S$ is non-invertible and, when $p \le n-q$, $S$ is invertible, so that $S^+$ coincides with the regular inverse $S^{-1}$. This type of loss is called a data-based loss insofar as it contains a part of the observation $U$ through $S = U^\top U$. The notion of data-based loss was introduced by Efron and Morris (1976) when estimating a location parameter. Likewise, Fourdrinier and Strawderman (2015) showed the interest of considering such a data-based loss with respect to the usual quadratic losses. The data-based loss (1.4) was also considered, in a Gaussian setting, by Tsukuma and Kubokawa (2015), who were motivated by the difficulty of handling the standard quadratic loss
$$L(\hat{\Sigma},\Sigma) = \mathrm{tr}\big(\Sigma^{-1}\hat{\Sigma} - I_p\big)^2. \tag{1.6}$$
See Haff (1980) and Tsukuma (2016) for more details. Thus the loss in (1.4) is a data-based variant of (1.6), through which we aim to improve on the estimators $\hat{\Sigma}_a$ in (1.3) by alternative estimators, focusing on improved orthogonally invariant estimators. Note that most improvement results in the Gaussian case were derived thanks to Stein-Haff type identities. Here, we specifically use the Stein-Haff type identity given by Haddouche et al. (2021) in the elliptical case to establish our dominance result; this identity is well adapted to our unified treatment of the cases where $S$ is invertible and non-invertible.
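To make the two criteria concrete, here is a minimal NumPy sketch (the function names are ours) of the data-based loss (1.4) and the quadratic loss (1.6):

```python
import numpy as np

def data_based_loss(Sigma_hat, Sigma, S):
    """Data-based loss tr(S^+ Sigma (Sigma^{-1} Sigma_hat - I_p)^2) of (1.4)."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat) - np.eye(p)  # Sigma^{-1} Sigma_hat - I_p
    return np.trace(np.linalg.pinv(S) @ Sigma @ A @ A)

def quadratic_loss(Sigma_hat, Sigma):
    """Usual quadratic loss tr(Sigma^{-1} Sigma_hat - I_p)^2 of (1.6)."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat) - np.eye(p)
    return np.trace(A @ A)
```

Both losses vanish at $\hat{\Sigma} = \Sigma$; in (1.4), the discrepancy is weighted by $S^+\Sigma$, which is how part of the observation enters the loss.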
The rest of this paper is structured as follows. In Section 2, we give improvement conditions of the proposed
estimators over the usual estimators. In Section 3, we assess the quality of the proposed estimators through a simulation
study in the context of the t–distribution. We also compare numerically our results with those of Konno (2009) in the
Gaussian setting. Finally, we give in an Appendix all the proofs of our findings.
2. Main results
Although we are interested in estimating the scale matrix $\Sigma$, recall that $\beta$ is a $q \times p$ matrix of unknown parameters. Note that, since $X$ has full column rank, the least squares estimator of $\beta$ is $\hat{\beta} = (X^\top X)^{-1} X^\top Y$; this is the maximum likelihood estimator in the Gaussian setting. Natural estimators of the scale matrix $\Sigma$ are based on the residual sum of squares given by
$$S = Y^\top (I_n - P_X)\, Y, \tag{2.1}$$
where $P_X = X (X^\top X)^{-1} X^\top$ is the orthogonal projector onto the subspace spanned by the columns of $X$.
Following the lines of Kubokawa and Srivastava (1999) and Tsukuma and Kubokawa (2020b), we derive the canonical form of the model (1.1), which allows a suitable treatment of the estimation of $\Sigma$. Let $X = Q_1 T^\top$ be the QR decomposition of $X$, where $Q_1$ is an $n \times q$ semi-orthogonal matrix and $T$ a $q \times q$ lower triangular matrix with positive diagonal elements. Setting $m = n-q$, there exists an $n \times m$ semi-orthogonal matrix $Q_2$ which completes $Q_1$ such that $Q = (Q_1\; Q_2)$ is an $n \times n$ orthogonal matrix. Then, since
$$Q_2^\top X\beta = Q_2^\top Q_1 T^\top \beta = 0,$$
we have
$$Q^\top Y = \begin{pmatrix} Z \\ U \end{pmatrix} = \begin{pmatrix} Q_1^\top \\ Q_2^\top \end{pmatrix} X\beta + Q^\top \varepsilon = \begin{pmatrix} \theta \\ 0 \end{pmatrix} + Q^\top \varepsilon, \tag{2.2}$$
where $Q_1^\top X\beta = \theta$ and where $Z$ and $U$ are, respectively, $q \times p$ and $m \times p$ matrices. As $X = Q_1 T^\top$, the projection matrix $P_X$ satisfies $P_X = Q_1 T^\top (T T^\top)^{-1} T\, Q_1^\top = Q_1 Q_1^\top$, so that $I_n - P_X = Q_2 Q_2^\top$. It follows that (2.1) becomes
$$S = Y^\top Q_2 Q_2^\top Y = U^\top U,$$
according to (2.2), which is a sufficient statistic for $\Sigma$.
The orthogonal matrix $Q$ provides a linear reduction from $n$ to $q$ observations within each of the $p$ responses. In addition, according to (1.2), the density of $Q^\top \varepsilon$ is the same as that of $\varepsilon$, and hence $(Z^\top\, U^\top)^\top$ has an elliptically symmetric distribution about the matrix $(\theta^\top\; 0^\top)^\top$ with density
$$(z,u) \mapsto |\Sigma|^{-n/2}\, f\big(\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\, u\,\Sigma^{-1}u^\top\big), \tag{2.3}$$
where $\theta$ and $\Sigma$ are unknown. In this sense, the model (2.2) is the canonical form of the multivariate linear regression model (1.1). Note that the marginal distribution of $U = Q_2^\top Y$ is elliptically symmetric about $0$ with covariance matrix proportional to $I_m \otimes \Sigma$ (see Fang and Zhang (1990)). This implies that $S = U^\top U$ has a generalized Wishart distribution (see Díaz-García and Gutiérrez-Jáimez (2011)), which coincides with the standard (singular or non-singular) Wishart distribution in the Gaussian setting (see Srivastava (2003)).
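The construction above can be checked numerically. The following sketch (NumPy; the variable names are ours) builds $Q_1$ and $Q_2$ from a full QR decomposition of $X$ and verifies that $S = U^\top U$ coincides with the residual sum of squares (2.1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p = 12, 3, 5
X = rng.normal(size=(n, q))          # full column rank (almost surely)
Y = rng.normal(size=(n, p))

# Full QR of X: the first q columns of Q span col(X); the last m = n - q give Q2.
Q, _ = np.linalg.qr(X, mode="complete")
Q1, Q2 = Q[:, :q], Q[:, q:]

Z = Q1.T @ Y                          # q x p
U = Q2.T @ Y                          # m x p
S_canonical = U.T @ U

# Residual sum of squares from the original model, as in (2.1).
P_X = X @ np.linalg.solve(X.T @ X, X.T)
S_residual = Y.T @ (np.eye(n) - P_X) @ Y

assert np.allclose(S_canonical, S_residual)
```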
As mentioned in Section 1, the usual estimators $\hat{\Sigma}_a$ in (1.3) perform poorly. We propose alternative estimators of the form
$$\hat{\Sigma}_J = a\,(S + J), \tag{2.4}$$
where $J = J(Z,S)$ is a correction matrix. The improvement over the class of estimators $\hat{\Sigma}_a$ can be achieved by improving on the best estimator $\hat{\Sigma}_{a_o} = a_o S$ within this class, namely, the estimator which minimizes the risk (1.5). It is proved in the Appendix that
$$\hat{\Sigma}_{a_o} = a_o S, \quad\text{with } a_o = \frac{1}{K^* v} \text{ and } v = \max\{p,m\}, \tag{2.5}$$
where $K^*$ is the normalizing constant (assumed to be finite) of the density defined by
$$(z,u) \mapsto \frac{1}{K^*}\,|\Sigma|^{-n/2}\, F^*\big(\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\, u\,\Sigma^{-1}u^\top\big), \tag{2.6}$$
where, for any $t \ge 0$,
$$F^*(t) = \frac{1}{2}\int_t^{\infty} f(\nu)\,d\nu.$$
Note that, under the quadratic loss function (1.6), the optimal constant is $1/\big(K^*(p+m+1)\big)$. Of course, this risk optimality makes sense only if the risk of $\hat{\Sigma}_{a_o}$ is finite. As shown in Haddouche (2019), this is the case as soon as $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big] < \infty$ and $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\, S^+)\big] < \infty$.
In order to give a unified dominance result of $\hat{\Sigma}_J$ over $\hat{\Sigma}_{a_o}$ for the two cases where $S$ is non-invertible and where $S$ is invertible, we consider, as a correction matrix in (2.4), the projection of a matrix function $G(Z,S) = G$ onto the subspace spanned by the columns of $S S^+$, namely,
$$J = S S^+ G. \tag{2.7}$$
In addition to the risk finiteness conditions for $\hat{\Sigma}_{a_o}$, it can be shown that the risk of $\hat{\Sigma}_J$ is finite as soon as the expectations $E_{\theta,\Sigma}\big[\|\Sigma^{-1}S S^+ G\|_F^2\big]$ and $E_{\theta,\Sigma}\big[\|S^+ G\|_F^2\big]$ are finite, where $\|\cdot\|_F$ denotes the Frobenius norm. Under these conditions, the risk difference between $\hat{\Sigma}_J$ and $\hat{\Sigma}_{a_o}$ is
$$\Delta(G) = a_o^2\, E_{\theta,\Sigma}\Big[\mathrm{tr}\big(\Sigma^{-1} S S^+ G\,\{I_p + S^+ G + S S^+\}\big)\Big] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}(S^+ G)\big]. \tag{2.8}$$
Noticing that the first integrand term in (2.8) depends on the unknown parameter $\Sigma^{-1}$, our approach consists in replacing this integrand term by a random quantity $\delta(G)$ which does not depend on $\Sigma^{-1}$, such that $\Delta(G) \le E^*_{\theta,\Sigma}\big[\delta(G)\big]$, where $E^*_{\theta,\Sigma}$ denotes the expectation with respect to the density (2.6). Clearly, a sufficient condition for $\Delta(G)$ to be non-positive (and hence for $\hat{\Sigma}_J$ to improve over $\hat{\Sigma}_{a_o}$) is that $\delta(G)$ is non-positive. To this end, we rely on the following Stein-Haff type identity.
Lemma 2.1 (Haddouche et al. (2021)). Let $G(z,s)$ be a $p \times p$ matrix function such that, for any fixed $z$, $G(z,s)$ is weakly differentiable with respect to $s$. Assume that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ G)\big] < \infty$. Then we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ G)\big] = K^*\, E^*_{\theta,\Sigma}\Big[\mathrm{tr}\big(2\, S S^+\, \mathcal{D}_s\{S S^+ G\}^\top + (m-r-1)\, S^+ G\big)\Big], \tag{2.9}$$
where $r = \min\{p,m\}$ and $\mathcal{D}_s\{\cdot\}$ is the Haff operator whose generic element is $\frac{1}{2}(1+\delta_{ij})\,\partial/\partial S_{ij}$, with $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$.
Note that the existence of the expectations in (2.9) is implied by the above risk finiteness conditions. An original Stein-Haff identity was derived independently by Stein (1986) and Haff (1979) in the Gaussian setting where $S$ is invertible. This identity was extended to the class of elliptically symmetric distributions in (2.3) by Kubokawa and Srivastava (1999) and also by Bodnar and Gupta (2009). Here, we use the new Stein-Haff type identity recently derived by Haddouche et al. (2021) in the elliptical framework (2.3), which deals with both the non-invertible and invertible cases of $S$.
Applying Lemma 2.1 to the term depending on $\Sigma^{-1}$ on the right-hand side of (2.8) gives
$$\Delta(G) = a_o^2 K^* E^*_{\theta,\Sigma}\Big[(m-r-1)\,\mathrm{tr}\big(S^+ G + (S^+ G)^2 + S^+ G\, S S^+\big) + 2\,\mathrm{tr}\big(S S^+\, \mathcal{D}_s\{S S^+ G + S S^+ G\, S^+ G + S S^+ G\, S S^+\}^\top\big)\Big] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}(S^+ G)\big]. \tag{2.10}$$
It is worth noticing that the risk difference in (2.10) involves both the $E_{\theta,\Sigma}$ and $E^*_{\theta,\Sigma}$ expectations (which coincide in the Gaussian setting since $F^* = f$). Thus, in order to derive a dominance result, we need to compare these two expectations. A possible approach consists in restricting ourselves to the subclass of densities satisfying $c \le F^*(t)/f(t) \le b$, for some positive constants $c$ and $b$ (see Berger (1975) for the class where $c \le F^*(t)/f(t)$). Due to the complexity of the use of the quadratic loss in (1.6) (which necessitates applying the Stein-Haff type identity (2.9) twice), this subclass was considered by Haddouche et al. (2021). Here, thanks to the data-based loss (1.4), we are able to avoid such a restriction, and hence to deal with a larger class of elliptically symmetric distributions in (2.3) (subject to the moment conditions induced by the above finiteness conditions).
Following the suggestion, mentioned in Section 1, to shrink the eigenvalues of $S$, we consider as a correction matrix a matrix $S S^+ G$ with $G$ orthogonally invariant in the following sense. Let $S = H L H^\top$ be the eigenvalue decomposition of $S$, where $H$ is a $p \times r$ semi-orthogonal matrix of eigenvectors and $L = \mathrm{diag}(l_1,\dots,l_r)$, with $l_1 > \dots > l_r$, is the diagonal matrix of the $r$ positive corresponding eigenvalues of $S$ (see Kubokawa and Srivastava (2008) for more details). Then set $G = H L \Psi(L) H^\top$, with $\Psi(L) = \mathrm{diag}(\psi_1(L),\dots,\psi_r(L))$, where each $\psi_i = \psi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$. Consequently, by semi-orthogonality of $H$, we have $S S^+ H = H H^\top H = H$, so that the correction matrix in (2.7) is
$$J = S S^+ G = G = H L \Psi(L) H^\top.$$
Thus the alternative estimators that we consider are of the form
$$\hat{\Sigma}_\Psi = a_o\big(S + H L \Psi(L) H^\top\big) = a_o\, H L\big(I_r + \Psi(L)\big) H^\top, \tag{2.11}$$
which are usually called orthogonally invariant estimators (i.e. equivariant under orthogonal transformations). See, for instance, Takemura (1984).
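As an illustration, here is a minimal sketch (NumPy; names ours, Gaussian case $K^* = 1$ assumed) of the construction of $\hat{\Sigma}_\Psi$ in (2.11), valid whether or not $S$ is invertible:

```python
import numpy as np

def orthogonally_invariant_estimator(S, m, psi):
    """Sigma_hat_Psi = a_o * H L (I_r + Psi(L)) H^T, cf. (2.11), with a_o of (2.5)
    for K* = 1 (Gaussian case). `psi` maps the vector l of positive eigenvalues
    (in decreasing order) to the vector (psi_1(L), ..., psi_r(L))."""
    p = S.shape[0]
    r, v = min(p, m), max(p, m)
    a_o = 1.0 / v                        # (2.5) with K* = 1
    eigval, eigvec = np.linalg.eigh(S)   # ascending order
    idx = np.argsort(eigval)[::-1][:r]   # keep the r largest (positive) eigenvalues
    l, H = eigval[idx], eigvec[:, idx]
    # H L (I_r + Psi(L)) H^T via column scaling of the semi-orthogonal H
    return a_o * (H * (l * (1.0 + psi(l)))) @ H.T
```

With $\psi_i \equiv 0$ the function returns $\hat{\Sigma}_{a_o} = S/v$, which is a quick sanity check of the spectral reconstruction in both the invertible ($p \le m$) and non-invertible ($p > m$) cases.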
Now, adapting the risk finiteness conditions mentioned above, we are in a position to give our dominance result of
the alternative estimators in (2.11) over the optimal estimator in (2.5), under the data–based loss (1.4).
Theorem 2.1. Assume that the expectations $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big]$, $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\, S^+)\big]$, $E_{\theta,\Sigma}\big[\|\Sigma^{-1} H L \Psi(L) H^\top\|_F^2\big]$ and $E_{\theta,\Sigma}\big[\|H \Psi(L) H^\top\|_F^2\big]$ are finite. Let $\Psi(L) = \mathrm{diag}(\psi_1,\dots,\psi_r)$, where each $\psi_i = \psi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$, with $\mathrm{tr}\,\Psi(L) \ge \lambda$ for a fixed positive constant $\lambda$.
Then an upper bound for the risk difference between $\hat{\Sigma}_\Psi$ and $\hat{\Sigma}_{a_o}$ under the loss function (1.4) is given by
$$\Delta(\Psi(L)) \le a_o^2 K^* E^*_{\theta,\Sigma}\big[g(\Psi)\big],$$
where
$$g(\Psi) = \sum_{i=1}^{r}\bigg[2\,(v-r+1)\,\psi_i + (v-r+1)\,\psi_i^2 + 4\,l_i\,(1+\psi_i)\,\frac{\partial \psi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i(2\psi_i+\psi_i^2) - l_j(2\psi_j+\psi_j^2)}{l_i-l_j}\bigg] - 2\,v\,\lambda. \tag{2.12}$$
Also, $\hat{\Sigma}_\Psi$ in (2.11) improves over $\hat{\Sigma}_{a_o}$ in (2.5) as soon as $g(\Psi) \le 0$.
The proof of Theorem 2.1 is given in the Appendix. Note that, although the expectation $E^*_{\theta,\Sigma}$ is associated with the generating function $f(\cdot)$ in (1.2), the function $g(\Psi)$ does not depend on $f(\cdot)$; hence the improvement result in Theorem 2.1 is robust in that sense. Note also that Theorem 2.1 is well adapted to deal with the James and Stein (1961) estimator, where $\psi_i(L) = 1/(v+r-2i+1)$ for $i = 1,\dots,r$, since $\mathrm{tr}\,\Psi(L) > \lambda = 1/(v+r-1)$, and with the Efron-Morris-Dey estimator, considered by Tsukuma and Kubokawa (2020a), where $\psi_i(L) = 1/\big[v\,\big(1 + b\,l_i^\alpha/\mathrm{tr}(L^\alpha)\big)\big]$ for $i = 1,\dots,r$ and positive constants $b$ and $\alpha$, since $\mathrm{tr}\,\Psi(L) > \lambda = r/\big((b+1)\,v\big)$.
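The two weight functions just mentioned can be sketched as follows (NumPy; the Efron-Morris-Dey form is written as we reconstructed it above, so treat that formula as an assumption). The lower bounds $\lambda$ follow because each $\psi_i$ is bounded below termwise:

```python
import numpy as np

def psi_james_stein(l, v):
    """psi_i = 1 / (v + r - 2i + 1), i = 1..r (James-Stein weights)."""
    r = len(l)
    i = np.arange(1, r + 1)
    return 1.0 / (v + r - 2 * i + 1)

def psi_efron_morris_dey(l, v, b=1.0, alpha=1.0):
    """psi_i = 1 / (v (1 + b l_i^alpha / tr(L^alpha))), as reconstructed here."""
    return 1.0 / (v * (1.0 + b * l**alpha / np.sum(l**alpha)))
```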
In the following, we consider a new class of estimators which is an extension of the Haff (1980) class, that is, estimators of the form
$$\hat{\Sigma}_{\alpha,b} = a_o\big(S + H L \Psi(L) H^\top\big) \quad\text{with, for } \alpha \ge 1 \text{ and } b > 0,\quad \Psi(L) = b\,\frac{L^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}, \tag{2.13}$$
where $a_o$ is given in (2.5). For $\alpha = 1$, this is the estimator considered by Konno (2009), who dealt with the Gaussian case and the quadratic loss (1.6), while Tsukuma and Kubokawa (2020a) used an extended Stein loss. An elliptical setting was also considered by Haddouche et al. (2021) under the quadratic loss (1.6).
It is proved in the Appendix that, for the entire class of elliptically symmetric distributions in (2.3), any estimator $\hat{\Sigma}_{\alpha,b}$ in (2.13) improves on the optimal estimator $\hat{\Sigma}_{a_o}$ in (2.5), under the data-based loss (1.4), as soon as
$$0 < b \le \frac{2\,(r-1)}{v-r+1}. \tag{2.14}$$
It is worth noting that Tsukuma and Kubokawa (2020a) gave Condition (2.14) as an improvement condition although their loss was different.
3. Numerical study
Let the elliptical density in (1.2) be a variance mixture of normal distributions where the mixing variable, with density $h$, has the inverse-gamma distribution $\mathcal{IG}(k/2, k/2)$, with shape and scale parameters both equal to $k/2$, for $k > 2$. Thus, for any $t \ge 0$, the generating function $f$ in (1.2) has the form
$$f(t) = \int_0^{\infty} \frac{1}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{t}{2\mathtt{v}}\Big)\,h(\mathtt{v})\,d\mathtt{v},$$
which corresponds to the $t$-distribution with $k$ degrees of freedom. Then the primitive $F^*$ of $f$ in (2.6) is, for any $t \ge 0$,
$$F^*(t) = \frac{1}{2}\int_t^{\infty}\!\int_0^{\infty} \frac{1}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{w}{2\mathtt{v}}\Big)\,h(\mathtt{v})\,d\mathtt{v}\,dw = \int_0^{\infty} \frac{\mathtt{v}}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{t}{2\mathtt{v}}\Big)\,h(\mathtt{v})\,d\mathtt{v},$$
by Fubini's theorem. Therefore the normalizing constant $K^*$ in (2.6) is
$$K^* = \int_{\mathbb{R}^{pn}}\!\int_0^{\infty} \frac{|\Sigma|^{-n/2}}{(2\pi\mathtt{v})^{np/2}}\,\mathtt{v}\,\exp\Big(-\frac{1}{2\mathtt{v}}\big[\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\,\Sigma^{-1}u^\top u\big]\Big)\,h(\mathtt{v})\,d\mathtt{v}\,dz\,du$$
$$= \int_0^{\infty} \mathtt{v}\,\bigg[\int_{\mathbb{R}^{pn}} \frac{|\Sigma|^{-n/2}}{(2\pi\mathtt{v})^{np/2}}\,\exp\Big(-\frac{1}{2\mathtt{v}}\big[\mathrm{tr}\,(z-\theta)\,\Sigma^{-1}(z-\theta)^\top + \mathrm{tr}\,\Sigma^{-1}u^\top u\big]\Big)\,dz\,du\bigg]\,h(\mathtt{v})\,d\mathtt{v} \tag{3.1}$$
by Fubini's theorem. Clearly the innermost integral in (3.1) equals 1, so that
$$K^* = \int_0^{\infty} \mathtt{v}\,h(\mathtt{v})\,d\mathtt{v} = \frac{k}{k-2},$$
by a property of $\mathcal{IG}(k/2, k/2)$. Note that, when $k$ goes to $\infty$, the $t$-distribution goes to the multivariate Gaussian distribution (for which $K^* = 1$ since $f = F^*$) with covariance matrix $I_n \otimes \Sigma$.
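A draw from this model can be sketched as follows (NumPy; the function names are ours); the computation of $K^*$ reduces to the mean of the inverse-gamma mixing variable, which is $k/(k-2)$ for $\mathcal{IG}(k/2,k/2)$:

```python
import numpy as np

def sample_noise(n, p, Sigma, k, rng):
    """One draw of the n x p noise matrix: epsilon = sqrt(v) * N, where
    v ~ InvGamma(k/2, k/2) and the rows of N are i.i.d. N_p(0, Sigma); the
    result is a matrix variate t with k degrees of freedom, of the form (1.2)."""
    # If G ~ Gamma(shape=k/2, scale=2/k), then 1/G ~ InvGamma(k/2, k/2).
    v = 1.0 / rng.gamma(shape=k / 2.0, scale=2.0 / k)
    N = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return np.sqrt(v) * N

def K_star(k):
    """Normalizing constant K* = E[v] = k / (k - 2), for k > 2."""
    return k / (k - 2.0)
```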
In the following, we study numerically the performance of the alternative estimators in (2.13), expressed as
$$\hat{\Sigma}_{\alpha,b} = a_o\Big(S + \frac{b}{\mathrm{tr}(L^{-\alpha})}\,H L^{1-\alpha} H^\top\Big), \quad\text{where } 0 \le b \le b_0 = \frac{2\,(r-1)}{v-r+1} \text{ and } \alpha \ge 1. \tag{3.2}$$
As mentioned above, Konno (2009) considered the case $\alpha = 1$, in the Gaussian setting and under the quadratic loss (1.6), for which his improvement condition is
$$0 \le b \le b_1 = \frac{2\,(r-1)\,(v+r+1)}{(v-r+1)\,(v-r+3)}.$$
Note that, although $b_0 < b_1$, the improvement condition in (3.2) is valid for any $\alpha \ge 1$ and for the whole class of elliptically symmetric distributions (2.3). However, it was shown numerically by Haddouche et al. (2021) that $b_1$ is optimal in the Gaussian context.
We consider the following structures for $\Sigma$: (i) the identity matrix $I_p$ and (ii) an autoregressive structure with coefficient $0.9$ (i.e. a $p \times p$ matrix whose $(i,j)$th element is $0.9^{|i-j|}$). To assess how an alternative estimator $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$, we compute the Percentage Reduction In Average Loss (PRIAL), defined as
$$\mathrm{PRIAL}(\hat{\Sigma}_{\alpha,b}) = \frac{\text{average loss of } \hat{\Sigma}_{a_o} - \text{average loss of } \hat{\Sigma}_{\alpha,b}}{\text{average loss of } \hat{\Sigma}_{a_o}}$$
and based on 1000 independent Monte-Carlo replications, for some pairs $(p,m)$.
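A minimal Monte Carlo sketch of this comparison (NumPy; Gaussian case with $a_o = 1/v$; the function names, the reduced replication count and the default settings are ours; structure (ii) is used for $\Sigma$):

```python
import numpy as np

def prial(losses_ref, losses_alt):
    """Percentage Reduction In Average Loss of the alternative vs the reference."""
    return 100.0 * (np.mean(losses_ref) - np.mean(losses_alt)) / np.mean(losses_ref)

def db_loss(Sigma_hat, Sigma, S):
    """Data-based loss (1.4)."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat) - np.eye(p)
    return np.trace(np.linalg.pinv(S) @ Sigma @ A @ A)

def simulate(p=10, m=25, alpha=1.0, reps=200, seed=0):
    """PRIAL of Sigma_hat_{alpha, b0} of (3.2) over Sigma_hat_{a_o} = S / v."""
    rng = np.random.default_rng(seed)
    Sigma = 0.9 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(0.9)
    r, v = min(p, m), max(p, m)
    b0 = 2.0 * (r - 1) / (v - r + 1)
    root = np.linalg.cholesky(Sigma)
    ref, alt = [], []
    for _ in range(reps):
        U = rng.normal(size=(m, p)) @ root.T       # m Gaussian rows N_p(0, Sigma)
        S = U.T @ U
        l, H = np.linalg.eigh(S)
        l, H = l[::-1][:r], H[:, ::-1][:, :r]       # r largest eigenpairs
        correction = (b0 / np.sum(l**-alpha)) * (H * l**(1 - alpha)) @ H.T
        ref.append(db_loss(S / v, Sigma, S))
        alt.append(db_loss((S + correction) / v, Sigma, S))
    return prial(ref, alt)
```

By the dominance result above, the PRIAL should be positive in expectation; with a small number of replications the Monte Carlo noise can be noticeable.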
In Figure 1, we study the effect of the constant $b$ in (3.2) on the PRIALs in the non-invertible ($(p,m) = (25,10)$) and the invertible ($(p,m) = (10,25)$) cases. The Gaussian setting is investigated for the structure (i) of $\Sigma$. Note that, when $0 \le b \le b_0$, the best PRIAL (around 7% in both the invertible and non-invertible cases) is reported for $b = b_0 = 1.125$ (for $(v,r) = (25,10)$). For this reason, in the following, we consider the estimators $\hat{\Sigma}_{\alpha,b_0}$ with
$$b_0 = \frac{2\,(r-1)}{v-r+1}.$$
Note also that, for $b > b_0$, the estimators $\hat{\Sigma}_{\alpha,b}$ still improve over $\hat{\Sigma}_{a_o}$ and that the maximum value of the PRIAL is around 50%. This shows that there exists a larger range of values of $b$ than the one our theory provides for which $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$.
In Figure 2, we study the effect of $\alpha$ on the PRIALs of the estimator $\hat{\Sigma}_{\alpha,b_0}$ over $\hat{\Sigma}_{a_o} = S/v$ when the sampling distribution is Gaussian ($K^* = 1$ in (2.5)), and over $\hat{\Sigma}_{a_o} = S\,(k-2)/(v\,k)$ when it is the $t$-distribution with $k$ degrees of freedom ($K^* = k/(k-2)$ in (2.5)). For the structure (i) of $\Sigma$, note that, for $\alpha \ge 6$, the PRIALs stabilize at 12.5% in the Gaussian case and at 8.5% in the Student case. Similarly, the PRIALs are better in the Gaussian setting for the structure (ii). In addition, it is interesting to observe that, when $\alpha$ is close to zero, the PRIALs are small for the structure (i) and may be negative for the structure (ii).
In Figure 3, under the Gaussian assumption, we provide the PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ with respect to $\hat{\Sigma}_{a_o} = S/v$ under the data-based loss (1.4) and the PRIALs of $\hat{\Sigma}_{\alpha,b_1}$ with respect to $\hat{\Sigma}_{a_o} = S/(v+r+1)$ under the quadratic loss (1.6). For the two structures (i) and (ii) of $\Sigma$, the PRIALs are better under the data-based loss. For the structure (i) with $\alpha = 1$ (which coincides with Konno's estimator), we observe a PRIAL equal to 1.73%, which is similar to that of Konno (2009). Note that, under the data-based loss, the PRIAL is much better since it equals 13.42%. We observe similar behavior for the structure (ii) as for the structure (i), but with lower PRIALs.
[Figure 1 here: PRIAL (y-axis) against $b$ (x-axis), with curves for the invertible and non-invertible cases.]
Fig. 1: Effect of $b$ on the PRIAL of $\hat{\Sigma}_{\alpha,b}$, with $\alpha = 1$, under the data-based loss in the Gaussian setting. The structure (i) of $\Sigma$ is considered for the invertible case with $(p,m) = (10,25)$ and the non-invertible case with $(p,m) = (25,10)$.
[Figure 2 here: PRIAL (y-axis) against $\alpha$ (x-axis), panels (i) and (ii), with curves for the Gaussian and Student distributions.]
Fig. 2: PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ under the data-based loss. The non-invertible case is considered, with $(p,m) = (50,20)$, for the structures (i) and (ii) of $\Sigma$, for the $t$-distribution with $k = 5$ degrees of freedom and the Gaussian distribution.
[Figure 3 here: PRIAL (y-axis) against $\alpha$ (x-axis), panels (i) and (ii), with curves for the quadratic loss and the data-based loss.]
Fig. 3: PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ under the data-based loss and PRIALs of $\hat{\Sigma}_{\alpha,b_1}$ under the quadratic loss. The non-invertible case is considered, with $(p,m) = (20,10)$, for the structures (i) and (ii) of $\Sigma$ under the Gaussian distribution.
4. Conclusion and perspective
For a wide class of elliptically symmetric distributions, we provide a large class of estimators of the scale matrix $\Sigma$ of the elliptical multivariate linear model (1.1) which improve over the usual estimators $a\,S$. We highlight that the use of the data-based loss (1.4) is more attractive than the use of the classical quadratic loss (1.6). Indeed, (1.4) yields more improved estimators, and their improvement is valid within a larger class of distributions. This means that (1.4) is more discriminating than (1.6) in exhibiting improved estimators.
While the risk difference (2.10) between $\hat{\Sigma}_J = a_o(S + J)$, with $J = S S^+ G(Z,S)$, and $\hat{\Sigma}_{a_o} = a_o S$ holds for a general correction matrix $G(Z,S)$, the dominance result in Theorem 2.1 is given for a correction matrix $G(Z,S) = H L \Psi(L) H^\top$ which depends only on $S$. Recently, Tsukuma (2016) considered, in the Gaussian case, alternative estimators where $G(Z,S)$ depends on $S$ and on the information contained in the sample mean $Z$. This class of estimators merits future investigation in an elliptical setting.
5. Appendix
We give in the following corollary an adaptation of Lemma 2.1 to an orthogonally invariant matrix function $G$, that is, of the form $G = H L \Phi(L) H^\top$, where $\Phi(L) = \mathrm{diag}(\phi_1,\dots,\phi_r)$ and each $\phi_i = \phi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$.
Corollary 5.1. Let $\Phi(L) = \mathrm{diag}(\phi_1,\dots,\phi_r)$, where each $\phi_i = \phi_i(L)$ ($i = 1,\dots,r$) is a differentiable function of $L$. Assume that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] < \infty$. Then we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((v-r+1)\,\phi_i + 2\,l_i\,\frac{\partial \phi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i\phi_i - l_j\phi_j}{l_i-l_j}\bigg)\Bigg].$$
Proof. Let $G = H L \Phi(L) H^\top$, $S^+ = H L^{-1} H^\top$ and $S S^+ = H H^\top$. Then
$$S S^+ G = H H^\top H L \Phi(L) H^\top = H L \Phi(L) H^\top = G,$$
since $H$ is semi-orthogonal. Assuming that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] < \infty$, we have from Lemma 2.1
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Big[2\,\mathrm{tr}\big(H H^\top\, \mathcal{D}_s\{H L \Phi(L) H^\top\}\big) + (m-r-1)\,\mathrm{tr}\big(H \Phi(L) H^\top\big)\Big]. \tag{5.1}$$
Firstly, using Lemma A.4.2 in Haddouche et al. (2021), we have
$$\mathcal{D}_s\big\{H L \Phi(L) H^\top\big\} = H \Phi^{(1)}(L) H^\top + \tfrac{1}{2}\,\mathrm{tr}\big(\Phi(L)\big)\,\big(I_p - H H^\top\big), \tag{5.2}$$
where $\Phi^{(1)}(L) = \mathrm{diag}\big(\phi^{(1)}_1,\dots,\phi^{(1)}_r\big)$, with
$$\phi^{(1)}_i = \frac{1}{2}\,(p-r+2)\,\phi_i + l_i\,\frac{\partial \phi_i}{\partial l_i} + \frac{1}{2}\sum_{j\ne i}^{r}\frac{l_i\phi_i - l_j\phi_j}{l_i-l_j}, \tag{5.3}$$
for $i = 1,\dots,r$.
Secondly, using the fact that $H^\top H = I_r$, we have from (5.2)
$$H H^\top\, \mathcal{D}_s\big\{H L \Phi(L) H^\top\big\} = H \Phi^{(1)}(L) H^\top. \tag{5.4}$$
Then, putting (5.4) in (5.1), we obtain
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Big[2\,\mathrm{tr}\big(\Phi^{(1)}(L)\big) + (m-r-1)\,\mathrm{tr}\big(\Phi(L)\big)\Big].$$
Finally, using (5.3), we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L \Phi(L) H^\top)\big] = K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((p+m-2r+1)\,\phi_i + 2\,l_i\,\frac{\partial \phi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i\phi_i - l_j\phi_j}{l_i-l_j}\bigg)\Bigg],$$
where $p+m-2r+1 = v-r+1$.
The optimal constant $a_o$ in (2.5). Let $\hat{\Sigma}_a = a\,S$, where $a > 0$. Assume that the expectations $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big]$ and $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\, S^+)\big]$ are finite. Then the risk of $\hat{\Sigma}_a$ under the data-based loss (1.4) is given by
$$R(\hat{\Sigma}_a,\Sigma) = E_{\theta,\Sigma}\big[\mathrm{tr}\big(S^+\Sigma\,(\Sigma^{-1}\hat{\Sigma}_a - I_p)^2\big)\big] = a^2\, E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ S)\big] - 2\,a\, E_{\theta,\Sigma}\big[\mathrm{tr}(S S^+)\big] + E_{\theta,\Sigma}\big[\mathrm{tr}(S^+\Sigma)\big]. \tag{5.5}$$
Applying the Stein-Haff type identity in Corollary 5.1, with $\Phi(L) = I_r$, to the first term on the right-hand side of (5.5), we obtain
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} S S^+ S)\big] = E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1} H L H^\top)\big] = K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((v-r+1) + \sum_{j\ne i}^{r}\frac{l_i - l_j}{l_i - l_j}\bigg)\Bigg] = K^*\big[r\,(v-r+1) + r\,(r-1)\big] = K^*\, r\, v. \tag{5.6}$$
Now, using the fact that $\mathrm{tr}(S^+ S) = \mathrm{tr}(H H^\top) = r$, and thanks to (5.6), we have
$$R(\hat{\Sigma}_a,\Sigma) = a^2\, K^*\, r\, v - 2\,a\,r + E_{\theta,\Sigma}\big[\mathrm{tr}(S^+\Sigma)\big].$$
Therefore, choosing $a = 1/(K^* v)$ is optimal under the risk (1.5).
Proof of Theorem 2.1. Let $\hat{\Sigma}_\Psi = a_o\big(S + H L \Psi(L) H^\top\big)$, where $\Psi(L) = \mathrm{diag}(\psi_1,\dots,\psi_r)$, with each $\psi_i = \psi_i(L)$ ($i = 1,\dots,r$) a differentiable function of $L$ and $\mathrm{tr}\,\Psi(L) \ge \lambda > 0$. Using the fact that $H^\top H = I_r$, the terms involved in the risk difference (2.8) become
$$J = S S^+ G = G = H L \Psi(L) H^\top \quad\text{and}\quad S^+ G = H \Psi(L) H^\top.$$
Then the risk difference between $\hat{\Sigma}_\Psi$ and $\hat{\Sigma}_{a_o}$ is given by
$$\Delta(\Psi) = a_o^2\, E_{\theta,\Sigma}\big[\mathrm{tr}\big(\Sigma^{-1} H L\,(2\Psi + \Psi^2)\,H^\top\big)\big] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}\,\Psi\big]. \tag{5.7}$$
Now, applying the Stein-Haff type identity in Corollary 5.1 to the first term on the right-hand side of (5.7), with $\Phi = 2\Psi + \Psi^2$, we have
$$\Delta(\Psi) = a_o^2 K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg((v-r+1)\,(2\psi_i+\psi_i^2) + 2\,l_i\,\frac{\partial(2\psi_i+\psi_i^2)}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i(2\psi_i+\psi_i^2) - l_j(2\psi_j+\psi_j^2)}{l_i-l_j}\bigg)\Bigg] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}\,\Psi\big].$$
Therefore, using the fact that $\mathrm{tr}(\Psi) \ge \lambda > 0$, an upper bound for the risk difference $\Delta(\Psi)$ is given by
$$\Delta(\Psi) \le a_o^2 K^* E^*_{\theta,\Sigma}\Bigg[\sum_{i=1}^{r}\bigg(2\,(v-r+1)\,\psi_i + (v-r+1)\,\psi_i^2 + 4\,l_i\,(1+\psi_i)\,\frac{\partial \psi_i}{\partial l_i} + \sum_{j\ne i}^{r}\frac{l_i(2\psi_i+\psi_i^2) - l_j(2\psi_j+\psi_j^2)}{l_i-l_j}\bigg) - 2\,(a_o K^*)^{-1}\lambda\Bigg],$$
where $(a_o K^*)^{-1} = v$.
Improvement condition (2.14) of the alternative estimators in (2.13). Consider the class of alternative estimators $\hat{\Sigma}_{\alpha,b}$ in (2.13). Applying Theorem 2.1, an upper bound for the risk difference between $\hat{\Sigma}_{\alpha,b}$ and $\hat{\Sigma}_{a_o}$ is given by
$$\Delta(\Psi) \le a_o^2 K^* E^*_{\theta,\Sigma}\big[g(\Psi)\big], \tag{5.8}$$
where the integrand term in (2.12) becomes
$$g(\Psi) = g_1(\Psi) + g_2(\Psi),$$
with
$$g_1(\Psi) = -2\,(r-1)\,b\sum_{i=1}^{r}\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} + (v-r+1)\,b^2\sum_{i=1}^{r}\frac{l_i^{-2\alpha}}{\mathrm{tr}^2(L^{-\alpha})},$$
since $\mathrm{tr}\,\Psi(L) = b$, and
$$g_2(\Psi) = 4\,b\sum_{i=1}^{r} l_i\Big(1 + b\,\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) + \frac{2\,b}{\mathrm{tr}(L^{-\alpha})}\sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i-l_j} + \frac{b^2}{\mathrm{tr}^2(L^{-\alpha})}\sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i-l_j}.$$
The proof consists in showing that $g_2(\Psi)$ is non-positive. To this end, it can be shown that, for $\alpha \ge 1$,
$$\sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i-l_j} = 2\sum_{i=1}^{r}\sum_{j>i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i-l_j} \le 0 \quad\text{and}\quad \sum_{i=1}^{r}\sum_{j\ne i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i-l_j} = 2\sum_{i=1}^{r}\sum_{j>i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i-l_j} < 0,$$
since $l_1 > \dots > l_r$ and the map $l \mapsto l^{1-\alpha}$ is non-increasing for $\alpha \ge 1$. Then
$$g_2(\Psi) \le 4\,b\sum_{i=1}^{r} l_i\Big(1 + b\,\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) = 4\,b\,\alpha\sum_{i=1}^{r}\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big(1 + b\,\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} - 1\Big),$$
since
$$\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) = \alpha\,\frac{l_i^{-\alpha-1}}{\mathrm{tr}(L^{-\alpha})}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} - 1\Big).$$
Therefore, since $l_i^{-\alpha} \le \mathrm{tr}(L^{-\alpha})$, we have $g_2(\Psi) \le 0$. Then
$$g(\Psi) \le g_1(\Psi) = -2\,(r-1)\,b + (v-r+1)\,b^2\,\frac{\mathrm{tr}(L^{-2\alpha})}{\mathrm{tr}^2(L^{-\alpha})}.$$
Now, using the fact that $\mathrm{tr}(L^{-2\alpha}) \le \mathrm{tr}^2(L^{-\alpha})$ and that $b > 0$, we have
$$g(\Psi) \le -2\,(r-1)\,b + (v-r+1)\,b^2.$$
Hence an upper bound for the risk difference in (5.8) is given by
$$\Delta(\Psi) \le a_o^2\,b\,K^* E^*_{\theta,\Sigma}\big[-2\,(r-1) + (v-r+1)\,b\big].$$
Therefore, $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$ under the data-based loss (1.4) as soon as $0 < b \le b_0 = 2\,(r-1)/(v-r+1)$.
CRediT authorship contribution statement
Dominique Fourdrinier: Conceptualization, Methodology, Supervision, Validation, Writing - review & editing,
Writing - original draft, Software. Anis M. Haddouche: Conceptualization, Methodology, Supervision, Validation,
Writing - review & editing, Writing - original draft, Software. Fatiha Mezoued: Conceptualization, Methodology,
Supervision, Validation, Writing - review & editing, Writing - original draft, Software.
References
Berger, J., 1975. Minimax estimation of location vectors for a wide class of densities. Ann. Statist. 3, 1318–1328.
Bodnar, T., Gupta, A.K., 2009. An identity for multivariate elliptically contoured matrix distribution. Stat. Probab. Lett. 79, 1327–1330.
Candès, E., Sing-Long, C., Trzasko, J.D., 2013. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE T. Signal Proces. 61, 4643–4657.
Canu, S., Fourdrinier, D., 2017. Unbiased risk estimates for matrix estimation in the elliptical case. J. Multivariate Anal. 158, 60–72.
Chételat, D., Wells, M.T., 2016. Improved second order estimation in the singular multivariate normal model. J. Multivariate Anal. 147, 1–19.
Díaz-García, J.A., Gutiérrez-Jáimez, R., 2011. On Wishart distribution: Some extensions. Linear Algebra Appl. 435, 1296–1310.
Efron, B., Morris, C., 1976. Multivariate empirical Bayes and estimation of covariance matrices. Ann. Statist. 4, 22–32.
Fang, K., Zhang, Y., 1990. Generalized Multivariate Analysis. Science Press, Beijing; Springer-Verlag.
Fourdrinier, D., Strawderman, W., 2015. Robust minimax Stein estimation under invariant data-based loss for spherically and elliptically symmetric distributions. Metrika 78, 461–484.
Haddouche, A.M., Fourdrinier, D., Mezoued, F., 2021. Scale matrix estimation of an elliptically symmetric distribution in high and low dimensions. J. Multivariate Anal. 181, 104680.
Haddouche, M.A., 2019. Estimation d'une matrice d'échelle sous un coût basé sur les données. Thesis. Normandie Université; École Nationale Supérieure de Statistique et d'Économie Appliquée (Alger). URL: https://tel.archives-ouvertes.fr/tel-02376077.
Haff, L., 1980. Empirical Bayes estimation of the multivariate normal covariance matrix. Ann. Statist. 8, 586–597.
Haff, L.R., 1979. Estimation of the inverse covariance matrix: Random mixtures of the inverse Wishart matrix and the identity. Ann. Statist. 7, 1264–1276.
James, W., Stein, C., 1961. Estimation with quadratic loss, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, California, pp. 361–379.
Konno, Y., 2009. Shrinkage estimators for large covariance matrices in multivariate real and complex normal distributions under an invariant quadratic loss. J. Multivariate Anal. 100, 2237–2253.
Kubokawa, T., Srivastava, M., 2001. Robust improvement in estimation of a mean matrix in an elliptically contoured distribution. J. Multivariate Anal. 76, 138–152.
Kubokawa, T., Srivastava, M., 2008. Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data. J. Multivariate Anal. 99, 1906–1928.
Kubokawa, T., Srivastava, M.S., 1999. Robust improvement in estimation of a covariance matrix in an elliptically contoured distribution. Ann. Statist. 27, 600–609.
Srivastava, M.S., 2003. Singular Wishart and multivariate Beta distributions. Ann. Statist. 31, 1537–1560.
Stein, C., 1986. Lectures on the theory of estimation of many parameters. J. Sov. Math. 34, 1373–1403.
Takemura, A., 1984. An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population. Tsukuba J. Math. 8, 367–376.
Tsukuma, H., 2016. Estimation of a high-dimensional covariance matrix with the Stein loss. J. Multivariate Anal. 148, 1–17.
Tsukuma, H., Kubokawa, T., 2015. A unified approach to estimating a normal mean matrix in high and low dimensions. J. Multivariate Anal. 139, 312–328.
Tsukuma, H., Kubokawa, T., 2016. Unified improvements in estimation of a normal covariance matrix in high and low dimensions. J. Multivariate Anal. 143, 233–248.
Tsukuma, H., Kubokawa, T., 2020a. Estimation of the covariance matrix, in: Shrinkage Estimation for Mean and Covariance Matrices. Springer, pp. 75–110.
Tsukuma, H., Kubokawa, T., 2020b. Multivariate linear model and group invariance, in: Shrinkage Estimation for Mean and Covariance Matrices. Springer, pp. 27–33.