Covariance matrix estimation under data–based loss
Dominique Fourdrinier^a,1, Anis M. Haddouche^b,2 and Fatiha Mezoued^c,1
aUniversité de Normandie, UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, avenue de l’Université, BP 12, 76801 Saint-Étienne-du-Rouvray,
France.
bINSA Rouen, LITIS and LMI, avenue de l’Université, BP 12, 76801 Saint-Étienne-du-Rouvray, France.
cÉcole Nationale Supérieure de Statistique et d'Économie Appliquée (ENSSEA), LAMOPS, Tipaza, Algeria.
Corresponding author: Dominique.Fourdrinier@univ-rouen.fr (D. Fourdrinier); Mohamed.haddouche@insa-rouen.fr (A.M. Haddouche); famezoued@yahoo.fr (F. Mezoued)
1 Professor
2 Temporarily associated to teaching and research.
ARTICLE INFO
Keywords:
data–based loss
elliptically symmetric distributions
high–dimensional statistics
orthogonally invariant estimators
Stein–Haff type identities.
2010 MSC:
62H12
62F10
62C99.
ABSTRACT
In this paper, we consider the problem of estimating the $p\times p$ scale matrix $\Sigma$ of a multivariate linear regression model $Y = X\,\beta + \mathcal{E}$ when the distribution of the observed matrix $Y$ belongs to a large class of elliptically symmetric distributions. After deriving the canonical form $(Z^T\; U^T)^T$ of this model, any estimator $\hat{\Sigma}$ of $\Sigma$ is assessed through the data-based loss $\mathrm{tr}\big(S^{+}\Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big)$, where $S = U^T U$ is the sample covariance matrix and $S^{+}$ is its Moore-Penrose inverse. We provide alternative estimators to the usual estimators $a\,S$, where $a$ is a positive constant, which present smaller associated risk. Compared to the usual quadratic loss $\mathrm{tr}\big((\Sigma^{-1}\hat{\Sigma} - I_p)^2\big)$, we obtain a larger class of estimators and a wider class of elliptical distributions for which such an improvement occurs. A numerical study illustrates the theory.
1. Introduction
Consider the multivariate linear regression model, with $p$ responses and $n$ observations,
$$Y = X\,\beta + \mathcal{E}, \qquad (1.1)$$
where $Y$ is an $n\times p$ matrix, $X$ is an $n\times q$ matrix of known constants of rank $q \le n$, and $\beta$ is a $q\times p$ matrix of unknown parameters. We assume that the $n\times p$ noise matrix $\mathcal{E}$ has an elliptically symmetric distribution with density, with respect to the Lebesgue measure on $\mathbb{R}^{np}$, of the form
$$\varepsilon \mapsto |\Sigma|^{-n/2}\, f\big(\mathrm{tr}(\varepsilon\,\Sigma^{-1}\varepsilon^T)\big), \qquad (1.2)$$
where $\Sigma$ is a $p\times p$ unknown positive definite matrix and $f(\cdot)$ is a non-negative unknown function.
The model (1.1) has been considered by various authors such as Kubokawa and Srivastava (1999, 2001), who estimated $\Sigma$ and $\beta$ respectively in the context of (1.2), and Tsukuma and Kubokawa (2016), who estimated $\Sigma$ in the Gaussian setting. A common alternative representation of this model, $Y = M + \mathcal{E}$, where $\mathcal{E}$ is as above and $M$ lies in the column space of $X$, has also been considered in the literature. See for instance Canu and Fourdrinier (2017) and Candès, Sing-Long and Trzasko (2013).
Although the matrix of regression coefficients $\beta$ is also unknown, we are interested in estimating the scale matrix $\Sigma$. We address this problem within a decision-theoretic framework through a canonical form of the model (1.1), which allows the use of a sufficient statistic $S = U^T U$ for $\Sigma$, where $U$ is an $(n-q)\times p$ matrix (see Section 2 for more details). In this context, the natural estimators of $\Sigma$ are of the form
$$\hat{\Sigma}_a = a\,S, \qquad (1.3)$$
for some positive constant $a$.
As pointed out by James and Stein (1961), the estimators of the form (1.3) perform poorly in the Gaussian setting. In fact, larger (smaller) eigenvalues of $\Sigma$ are overestimated (underestimated) by those estimators. Thus we may expect to improve on these estimators by shrinking the eigenvalues of $S$, which gives rise to the class of orthogonally invariant
estimators (see Takemura (1984)). Since the seminal work of James and Stein (1961), this problem has been largely considered in the Gaussian setting; see, for instance, Tsukuma and Kubokawa (2016), Tsukuma (2016) and Chételat and Wells (2016). The elliptical setting, however, has been considered by only a few authors, such as Kubokawa and Srivastava (1999) and Haddouche, Fourdrinier and Mezoued (2021).
In this paper, the performance of any estimator $\hat{\Sigma}$ of $\Sigma$ is assessed through the data-based loss
$$L_S(\hat{\Sigma}, \Sigma) = \mathrm{tr}\big(S^{+}\Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big) \qquad (1.4)$$
and its associated risk
$$R(\hat{\Sigma}, \Sigma) = E_{\theta,\Sigma}\Big[\mathrm{tr}\big(S^{+}\Sigma\,(\Sigma^{-1}\hat{\Sigma} - I_p)^2\big)\Big], \qquad (1.5)$$
where $E_{\theta,\Sigma}$ denotes the expectation with respect to the density specified below in (2.3) and where $S^{+}$ is the Moore-Penrose inverse of $S$. Note that, when $p > n - q$, $S$ is non-invertible and, when $p \le n - q$, $S$ is invertible, so that $S^{+}$ coincides with the regular inverse $S^{-1}$. This type of loss is called a data-based loss insofar as it contains a part of the observation, $U$, through $S = U^T U$. The notion of data-based loss was introduced by Efron and Morris (1976) when estimating a location parameter. Likewise, Fourdrinier and Strawderman (2015) showed the interest of considering such a data-based loss with respect to the usual quadratic losses. Also, the data-based loss (1.4) was considered, in a Gaussian setting, by Tsukuma and Kubokawa (2015), who were motivated by the difficulty of handling the standard quadratic loss
$$L(\hat{\Sigma}, \Sigma) = \mathrm{tr}\big((\Sigma^{-1}\hat{\Sigma} - I_p)^2\big). \qquad (1.6)$$
See Haff (1980) and Tsukuma (2016) for more details. Thus the loss in (1.4) is a data-based variant of (1.6), through which we aim to improve on the estimators $\hat{\Sigma}_a$ in (1.3) by alternative estimators, focusing on improved orthogonally invariant estimators. Note that most improvement results in the Gaussian case were derived thanks to Stein-Haff type identities. Here, we specifically use the Stein-Haff type identity given by Haddouche et al. (2021) in the elliptical case to establish our dominance result; this identity is well adapted to our unified treatment of the cases where $S$ is invertible and where $S$ is non-invertible.
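For concreteness, the loss (1.4) is straightforward to evaluate numerically. The following is a minimal NumPy sketch of ours (the function name `data_based_loss` is hypothetical, not from the paper):

```python
import numpy as np

def data_based_loss(sigma_hat, sigma, s):
    """Data-based loss (1.4): tr(S^+ Sigma (Sigma^{-1} Sigma_hat - I_p)^2)."""
    p = sigma.shape[0]
    m = np.linalg.solve(sigma, sigma_hat) - np.eye(p)  # Sigma^{-1} Sigma_hat - I_p
    s_pinv = np.linalg.pinv(s)                         # Moore-Penrose inverse S^+
    return np.trace(s_pinv @ sigma @ m @ m)
```

When $p \le n - q$, `np.linalg.pinv(s)` coincides with the regular inverse, in line with the remark above.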
The rest of this paper is structured as follows. In Section 2, we give conditions under which the proposed estimators improve over the usual estimators. In Section 3, we assess the quality of the proposed estimators through a simulation study in the context of the $t$-distribution. We also compare numerically our results with those of Konno (2009) in the Gaussian setting. Finally, we give in an Appendix all the proofs of our findings.
2. Main results
Although we are interested in estimating the scale matrix $\Sigma$, recall that $\beta$ is a $q\times p$ matrix of unknown parameters. Note that, since $X$ has full column rank, the least squares estimator of $\beta$ is $\hat{\beta} = (X^T X)^{-1} X^T Y$; this is the maximum likelihood estimator in the Gaussian setting. Natural estimators of the scale matrix $\Sigma$ are based on the residual sum of squares given by
$$S = Y^T(I_n - P_X)\,Y, \qquad (2.1)$$
where $P_X = X(X^T X)^{-1} X^T$ is the orthogonal projector onto the subspace spanned by the columns of $X$.
Following the lines of Kubokawa and Srivastava (1999) and Tsukuma and Kubokawa (2020b), we derive the canonical form of the model (1.1), which allows a suitable treatment of the estimation of $\Sigma$. Let $X = Q_1 T$ be the QR decomposition of $X$, where $Q_1$ is an $n\times q$ semi-orthogonal matrix and $T$ a $q\times q$ lower triangular matrix with positive diagonal elements. Setting $m = n - q$, there exists an $n\times m$ semi-orthogonal matrix $Q_2$ which completes $Q_1$ such that $Q = (Q_1\; Q_2)$ is an $n\times n$ orthogonal matrix. Then, since
$$Q_2^T X\beta = Q_2^T Q_1 T\beta = 0,$$
we have
$$Q^T Y = \begin{pmatrix} Z \\ U \end{pmatrix} = \begin{pmatrix} Q_1^T \\ Q_2^T \end{pmatrix} X\beta + Q^T\mathcal{E} = \begin{pmatrix} \theta \\ 0 \end{pmatrix} + Q^T\mathcal{E}, \qquad (2.2)$$
where $Q_1^T X\beta = \theta$ and where $Z$ and $U$ are, respectively, $q\times p$ and $m\times p$ matrices. As $X = Q_1 T$, the projection matrix $P_X$ satisfies $P_X = Q_1 T(T^T T)^{-1} T^T Q_1^T = Q_1 Q_1^T$, so that $I_n - P_X = Q_2 Q_2^T$. It follows that (2.1) becomes
$$S = Y^T Q_2 Q_2^T Y = U^T U,$$
according to (2.2), which is a sufficient statistic for $\Sigma$.
The orthogonal matrix $Q$ provides a linear reduction from $n$ to $q$ observations within each of the $p$ responses. In addition, according to (1.2), the density of $Q^T\mathcal{E}$ is the same as that of $\mathcal{E}$, and hence $(Z^T\; U^T)^T$ has an elliptically symmetric distribution about the matrix $(\theta^T\; 0^T)^T$ with density
$$(z, u) \mapsto |\Sigma|^{-n/2}\, f\big(\mathrm{tr}\,(z - \theta)\,\Sigma^{-1}(z - \theta)^T + \mathrm{tr}\; u\,\Sigma^{-1}u^T\big), \qquad (2.3)$$
where $\theta$ and $\Sigma$ are unknown. In this sense, the model (2.2) is the canonical form of the multivariate linear regression model (1.1). Note that the marginal distribution of $U = Q_2^T Y$ is elliptically symmetric about $0$ with covariance matrix proportional to $I_m \otimes \Sigma$ (see Fang and Zhang (1990)). This implies that $S = U^T U$ has a generalized Wishart distribution (see Díaz-García and Gutiérrez-Jáimez (2011)), which coincides with the standard (singular or non-singular) Wishart distribution in the Gaussian setting (see Srivastava (2003)).
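As an illustration, the canonical reduction (2.2) can be computed with a complete QR decomposition; the sketch below is our own (the helper name is hypothetical) and assumes $X$ has full column rank $q$:

```python
import numpy as np

def canonical_form(Y, X):
    """Canonical form (2.2): returns Z (q x p), U (m x p) and S = U^T U."""
    n, q = X.shape
    Q, _ = np.linalg.qr(X, mode='complete')  # full n x n orthogonal Q = (Q1 Q2)
    Z = Q[:, :q].T @ Y                       # Z = Q1^T Y, elliptical about theta
    U = Q[:, q:].T @ Y                       # U = Q2^T Y, elliptical about 0
    return Z, U, U.T @ U                     # S = U^T U, as in (2.1)
```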
As mentioned in Section 1, the usual estimators $\hat{\Sigma}_a$ in (1.3) perform poorly. We propose alternative estimators of the form
$$\hat{\Sigma}_J = a\,(S + J), \qquad (2.4)$$
where $J = J(Z, S)$ is a correction matrix. The improvement over the class of estimators $\hat{\Sigma}_a$ can be achieved by improving on the best estimator $\hat{\Sigma}_{a_o} = a_o S$ within this class, namely, the estimator which minimizes the risk (1.5). It is proved in the Appendix that
$$\hat{\Sigma}_{a_o} = a_o\,S, \quad\text{with}\quad a_o = \frac{1}{K\,v} \;\text{ and }\; v = \max\{p, m\}, \qquad (2.5)$$
where $K$ is the normalizing constant (assumed to be finite) of the density defined by
$$(z, u) \mapsto \frac{1}{K}\,|\Sigma|^{-n/2}\, F\big(\mathrm{tr}\,(z - \theta)\,\Sigma^{-1}(z - \theta)^T + \mathrm{tr}\; u\,\Sigma^{-1}u^T\big), \qquad (2.6)$$
where, for any $t \ge 0$,
$$F(t) = \frac{1}{2}\int_t^{\infty} f(\nu)\,d\nu.$$
Note that under the quadratic loss function (1.6) the optimal constant is $1/\big(K\,(p + m + 1)\big)$. Of course, this risk optimality makes sense only if the risk of $\hat{\Sigma}_{a_o}$ is finite. As shown in Haddouche (2019), this is the case as soon as $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big] < \infty$ and $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\,S^{+})\big] < \infty$.
In order to give a unified dominance result of $\hat{\Sigma}_J$ over $\hat{\Sigma}_{a_o}$ for the two cases where $S$ is non-invertible and where $S$ is invertible, we consider, as a correction matrix in (2.4), the projection of a matrix function $G(Z, S) = G$ onto the subspace spanned by the columns of $SS^{+}$, namely,
$$J = SS^{+}G. \qquad (2.7)$$
In addition to the risk finiteness conditions for $\hat{\Sigma}_{a_o}$, it can be shown that the risk of $\hat{\Sigma}_J$ is finite as soon as the expectations $E_{\theta,\Sigma}\big[\|\Sigma^{-1}SS^{+}G\|_F^2\big]$ and $E_{\theta,\Sigma}\big[\|S^{+}G\|_F^2\big]$ are finite, where $\|\cdot\|_F$ denotes the Frobenius norm. Under these conditions, the risk difference between $\hat{\Sigma}_J$ and $\hat{\Sigma}_{a_o}$ is
$$\Delta(G) = a_o^2\, E_{\theta,\Sigma}\Big[\mathrm{tr}\big(\Sigma^{-1}SS^{+}G\,\{I_p + S^{+}G + SS^{+}\}\big)\Big] - 2\,a_o\, E_{\theta,\Sigma}\big[\mathrm{tr}(S^{+}G)\big]. \qquad (2.8)$$
Noticing that the first integrand term in (2.8) depends on the unknown parameter $\Sigma^{-1}$, our approach consists in replacing this integrand term by a random quantity $\delta(G)$, which does not depend on $\Sigma^{-1}$, such that $\Delta(G) \le E^{*}_{\theta,\Sigma}\big[\delta(G)\big]$, where $E^{*}_{\theta,\Sigma}$ denotes the expectation with respect to the density (2.6). Clearly, a sufficient condition for $\Delta(G)$ to be non-positive (and hence, for $\hat{\Sigma}_J$ to improve over $\hat{\Sigma}_{a_o}$) is that $\delta(G)$ is non-positive. To this end, we rely on the following Stein-Haff type identity.
Lemma 2.1 (Haddouche et al. (2021)). Let $G(z, s)$ be a $p\times p$ matrix function such that, for any fixed $z$, $G(z, s)$ is weakly differentiable with respect to $s$. Assume that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}SS^{+}G)\big] < \infty$. Then we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}SS^{+}G)\big] = K\,E^{*}_{\theta,\Sigma}\Big[\mathrm{tr}\big(2\,SS^{+}\mathcal{D}_s\{SS^{+}G\} + (m - r - 1)\,S^{+}G\big)\Big], \qquad (2.9)$$
where $r = \min\{p, m\}$ and $\mathcal{D}_s\{\cdot\}$ is the Haff operator whose generic element is $\frac{1}{2}(1 + \delta_{ij})\,\frac{\partial}{\partial S_{ij}}$, with $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$.
Note that the existence of the expectations in (2.9) is implied by the above risk finiteness conditions. An original Stein-Haff identity was derived independently by Stein (1986) and Haff (1979) in the Gaussian setting where $S$ is invertible. This identity was extended to the class of elliptically symmetric distributions in (2.3) by Kubokawa and Srivastava (1999) and also by Bodnar and Gupta (2009). Here, we use the new Stein-Haff type identity recently derived by Haddouche et al. (2021) in the elliptical framework (2.3), which deals with both cases, $S$ non-invertible and $S$ invertible.
Applying Lemma 2.1 to the term depending on $\Sigma^{-1}$ in the right-hand side of (2.8) gives
$$\Delta(G) = a_o^2\,K\,E^{*}_{\theta,\Sigma}\Big[(m - r - 1)\,\mathrm{tr}\big(S^{+}G + (S^{+}G)^2 + S^{+}G\,SS^{+}\big) + 2\,\mathrm{tr}\big(SS^{+}\mathcal{D}_s\{SS^{+}G + SS^{+}G\,S^{+}G + SS^{+}G\,SS^{+}\}\big)\Big] - 2\,a_o\,E_{\theta,\Sigma}\big[\mathrm{tr}(S^{+}G)\big]. \qquad (2.10)$$
It is worth noticing that the risk difference in (2.10) depends on both the $E_{\theta,\Sigma}$ and $E^{*}_{\theta,\Sigma}$ expectations (which coincide in the Gaussian setting since $F = f$). Thus, in order to derive a dominance result, we need to compare these two expectations. A possible approach consists in restricting ourselves to the subclass of densities satisfying $c \le F(t)/f(t) \le b$, for some positive constants $c$ and $b$ (see Berger (1975) for the class where $c \le F(t)/f(t)$). Due to the complexity of the use of the quadratic loss in (1.6) (which necessitates applying the Stein-Haff type identity (2.9) twice), this subclass was considered by Haddouche et al. (2021). Here, thanks to the data-based loss (1.4), we are able to avoid such a restriction, and hence to deal with a larger class of elliptically symmetric distributions in (2.3) (subject to the moment conditions induced by the above finiteness conditions).
Following the suggestion to shrink the eigenvalues of $S$ mentioned in Section 1, we consider as a correction matrix a matrix $SS^{+}G$ with $G$ orthogonally invariant in the following sense. Let $S = H L H^T$ be the eigenvalue decomposition of $S$, where $H$ is a $p\times r$ semi-orthogonal matrix of eigenvectors and $L = \mathrm{diag}(l_1, \dots, l_r)$, with $l_1 > \dots > l_r > 0$, is the diagonal matrix of the $r$ positive corresponding eigenvalues of $S$ (see Kubokawa and Srivastava (2008) for more details). Then set $G = H L \Psi(L) H^T$, with $\Psi(L) = \mathrm{diag}(\psi_1(L), \dots, \psi_r(L))$, where each $\psi_i = \psi_i(L)$ ($i = 1, \dots, r$) is a differentiable function of $L$. Consequently, by semi-orthogonality of $H$, we have $SS^{+}H = H H^T H = H$, so that the correction matrix in (2.7) is
$$J = SS^{+}G = G = H L \Psi(L) H^T.$$
Thus the alternative estimators that we consider are of the form
$$\hat{\Sigma}_{\Psi} = a_o\big(S + H L \Psi(L) H^T\big) = a_o\,H L\big(I_r + \Psi(L)\big) H^T, \qquad (2.11)$$
which are usually called orthogonally invariant estimators (i.e. equivariant under orthogonal transformations); see for instance Takemura (1984).
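A minimal sketch of (2.11), under the assumption that the shrinkage rule is supplied as a function `psi` of the vector of positive eigenvalues (all names here are ours):

```python
import numpy as np

def orthogonally_invariant_estimator(S, a_o, psi):
    """Estimator (2.11): a_o * H L (I_r + Psi(L)) H^T."""
    eigval, eigvec = np.linalg.eigh(S)
    order = np.argsort(eigval)[::-1]            # eigenvalues in decreasing order
    l, H = eigval[order], eigvec[:, order]
    r = int(np.sum(l > 1e-12 * l[0]))           # numerical rank r = min(p, m)
    l, H = l[:r], H[:, :r]                      # keep the r positive eigenvalues
    return (H * (a_o * l * (1.0 + psi(l)))) @ H.T
```

Multiplying the columns of $H$ by the shrunken eigenvalues and recomposing avoids forming the diagonal matrix explicitly.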
Now, adapting the risk finiteness conditions mentioned above, we are in a position to give our dominance result for the alternative estimators in (2.11) over the optimal estimator in (2.5), under the data-based loss (1.4).
Theorem 2.1. Assume that the expectations $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big]$, $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\,S^{+})\big]$, $E_{\theta,\Sigma}\big[\|\Sigma^{-1}H L \Psi(L) H^T\|_F^2\big]$ and $E_{\theta,\Sigma}\big[\|H \Psi(L) H^T\|_F^2\big]$ are finite. Let $\Psi(L) = \mathrm{diag}(\psi_1, \dots, \psi_r)$, where each $\psi_i = \psi_i(L)$ ($i = 1, \dots, r$) is a differentiable function of $L$, with $\mathrm{tr}\,\Psi(L) \ge \lambda$ for a fixed positive constant $\lambda$.
Then an upper bound on the risk difference between $\hat{\Sigma}_{\Psi}$ and $\hat{\Sigma}_{a_o}$ under the loss function (1.4) is given by
$$\Delta(\Psi(L)) \le a_o^2\,K\,E^{*}_{\theta,\Sigma}\big[g(\Psi)\big],$$
where
$$g(\Psi) = \sum_{i=1}^{r}\bigg[2\,(v - r + 1)\,\psi_i + (v - r + 1)\,\psi_i^2 + 4\,l_i\,(1 + \psi_i)\,\frac{\partial\psi_i}{\partial l_i} + \sum_{j\neq i}^{r}\frac{l_i\,(2\psi_i + \psi_i^2) - l_j\,(2\psi_j + \psi_j^2)}{l_i - l_j}\bigg] - 2\,v\,\lambda. \qquad (2.12)$$
Also, $\hat{\Sigma}_{\Psi}$ in (2.11) improves over $\hat{\Sigma}_{a_o}$ in (2.5) as soon as $g(\Psi) \le 0$.
The proof of Theorem 2.1 is given in the Appendix. Note that, although the expectation $E^{*}_{\theta,\Sigma}$ is associated with the generating function $f(\cdot)$ in (1.2), the function $g(\Psi)$ does not depend on $f(\cdot)$, and hence the improvement result in Theorem 2.1 is robust in that sense. Note also that Theorem 2.1 is well adapted to deal with the James and Stein (1961) estimator, where $\psi_i(L) = 1/(v + r - 2i + 1)$ for $i = 1, \dots, r$, since $\mathrm{tr}\,\Psi(L) > \lambda = 1/(v + r - 1)$, and with the Efron-Morris-Dey estimator considered by Tsukuma and Kubokawa (2020a), where $\psi_i(L) = 1/\big\{v\,\big(1 + b\,l_i^{\alpha}/\mathrm{tr}(L^{\alpha})\big)\big\}$ for $i = 1, \dots, r$ and positive constants $b$ and $\alpha$, since $\mathrm{tr}\,\Psi(L) > \lambda = r/\big((b + 1)\,v\big)$.
In the following, we consider a new class of estimators which is an extension of the Haff (1980) class, that is, estimators of the form
$$\hat{\Sigma}_{\alpha,b} = a_o\big(S + H L \Psi(L) H^T\big) \quad\text{with, for } \alpha \ge 1 \text{ and } b > 0,\quad \Psi(L) = \frac{b\,L^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}, \qquad (2.13)$$
where $a_o$ is given in (2.5). For $\alpha = 1$, this is the estimator considered by Konno (2009), who deals with the Gaussian case and the quadratic loss (1.6), while Tsukuma and Kubokawa (2020a) used an extended Stein loss. An elliptical setting was also considered by Haddouche et al. (2021) under the quadratic loss (1.6).
It is proved in the Appendix that, for the entire class of elliptically symmetric distributions in (2.3), any estimator $\hat{\Sigma}_{\alpha,b}$ in (2.13) improves on the optimal estimator $\hat{\Sigma}_{a_o}$ in (2.5), under the data-based loss (1.4), as soon as
$$0 < b \le \frac{2\,(r - 1)}{v - r + 1}. \qquad (2.14)$$
It is worth noting that Tsukuma and Kubokawa (2020a) gave Condition (2.14) as an improvement condition although their loss was different.
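Under the same conventions, the class (2.13) and the bound (2.14) read as follows in code; the usage line reuses the hypothetical `orthogonally_invariant_estimator` sketched after (2.11):

```python
import numpy as np

def haff_psi(alpha, b):
    """Shrinkage function of (2.13): Psi(L) = b L^{-alpha} / tr(L^{-alpha})."""
    return lambda l: b * l**(-alpha) / np.sum(l**(-alpha))

def b0(p, m):
    """Upper bound (2.14): 2 (r - 1) / (v - r + 1)."""
    r, v = min(p, m), max(p, m)
    return 2.0 * (r - 1) / (v - r + 1)

# Gaussian case (K = 1), so that a_o = 1 / max(p, m):
# sigma_hat = orthogonally_invariant_estimator(
#     S, 1.0 / max(p, m), haff_psi(alpha=1.0, b=b0(p, m)))
```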
3. Numerical study
Let the elliptical density in (1.2) be a variance mixture of normal distributions where the mixing variable, with density $h$, has the inverse-gamma distribution $\mathcal{IG}(k/2, k/2)$, with shape and scale parameters both equal to $k/2$, for $k > 2$. Thus, for any $t \ge 0$, the generating function $f$ in (1.2) has the form
$$f(t) = \int_0^{\infty} \frac{1}{(2\pi \mathtt{v})^{np/2}}\, \exp\Big(\!-\frac{t}{2\mathtt{v}}\Big)\, h(\mathtt{v})\,d\mathtt{v},$$
which corresponds to the $t$-distribution with $k$ degrees of freedom. Then the primitive $F$ of $f$ in (2.6) is, for any $t \ge 0$,
$$F(t) = \frac{1}{2}\int_t^{\infty}\!\!\int_0^{\infty} \frac{1}{(2\pi \mathtt{v})^{np/2}}\, \exp\Big(\!-\frac{w}{2\mathtt{v}}\Big)\, h(\mathtt{v})\,d\mathtt{v}\,dw = \int_0^{\infty} \frac{\mathtt{v}}{(2\pi \mathtt{v})^{np/2}}\, \exp\Big(\!-\frac{t}{2\mathtt{v}}\Big)\, h(\mathtt{v})\,d\mathtt{v},$$
by Fubini's theorem. Therefore the normalizing constant $K$ in (2.6) is
$$K = \int_{\mathbb{R}^{np}}\int_0^{\infty} \frac{\mathtt{v}\,|\Sigma|^{-n/2}}{(2\pi \mathtt{v})^{np/2}}\, \exp\Big(\!-\frac{1}{2\mathtt{v}}\big[\mathrm{tr}\,(z - \theta)\,\Sigma^{-1}(z - \theta)^T + \mathrm{tr}\,\Sigma^{-1}u^T u\big]\Big)\, h(\mathtt{v})\,d\mathtt{v}\,dz\,du$$
$$= \int_0^{\infty} \mathtt{v}\,\bigg[\int_{\mathbb{R}^{np}} \frac{|\Sigma|^{-n/2}}{(2\pi \mathtt{v})^{np/2}}\, \exp\Big(\!-\frac{1}{2\mathtt{v}}\big[\mathrm{tr}\,(z - \theta)\,\Sigma^{-1}(z - \theta)^T + \mathrm{tr}\,\Sigma^{-1}u^T u\big]\Big)\,dz\,du\bigg]\, h(\mathtt{v})\,d\mathtt{v} \qquad (3.1)$$
by Fubini's theorem. Clearly the innermost integral in (3.1) equals 1, so that
$$K = \int_0^{\infty} \mathtt{v}\,h(\mathtt{v})\,d\mathtt{v} = \frac{k}{k - 2},$$
by the properties of $\mathcal{IG}(k/2, k/2)$. Note that, when $k$ goes to $\infty$, the $t$-distribution tends to the multivariate Gaussian distribution (for which $K = 1$ since $f = F$) with covariance matrix $I_n \otimes \Sigma$.
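The mixture representation also gives a direct way to simulate the model: draw the mixing variable from $\mathcal{IG}(k/2, k/2)$, then a Gaussian matrix scaled by its square root. A sketch under these assumptions (the function name is ours):

```python
import numpy as np

def sample_t_matrix(rows, sigma, k, rng):
    """Draw a rows x p matrix from the t model with k degrees of freedom,
    as a normal variance mixture with v ~ InverseGamma(k/2, k/2)."""
    v = 1.0 / rng.gamma(shape=k / 2.0, scale=2.0 / k)  # v ~ IG(k/2, k/2)
    chol = np.linalg.cholesky(sigma)                   # Sigma = chol @ chol.T
    return np.sqrt(v) * rng.standard_normal((rows, sigma.shape[0])) @ chol.T
```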
In the following, we study numerically the performance of the alternative estimators in (2.13), expressed as
$$\hat{\Sigma}_{\alpha,b} = a_o\Big(S + \frac{b}{\mathrm{tr}(L^{-\alpha})}\,H L^{1-\alpha} H^T\Big) \quad\text{where}\quad 0 < b \le b_0 = \frac{2\,(r - 1)}{v - r + 1} \;\text{ and }\; \alpha \ge 1. \qquad (3.2)$$
As mentioned above, Konno (2009) considers the case $\alpha = 1$, in the Gaussian setting and under the quadratic loss (1.6), for which the improvement condition is
$$0 < b \le b_1 = \frac{2\,(r - 1)\,(v + r + 1)}{(v - r + 1)\,(v - r + 3)}.$$
Note that, although $b_0 < b_1$, the improvement condition in (3.2) is valid for any $\alpha \ge 1$ and for the entire class of elliptically symmetric distributions (2.3). However, it was shown numerically by Haddouche et al. (2021) that $b_1$ is optimal in the Gaussian context.
We consider the following structures for $\Sigma$: (i) the identity matrix $I_p$ and (ii) an autoregressive structure with coefficient $0.9$ (i.e. a $p\times p$ matrix whose $(i, j)$th element is $0.9^{|i-j|}$). To assess how an alternative estimator $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$, we compute the Percentage Reduction In Average Loss (PRIAL), defined as
$$\mathrm{PRIAL}(\hat{\Sigma}_{\alpha,b}) = \frac{\text{average loss of } \hat{\Sigma}_{a_o} - \text{average loss of } \hat{\Sigma}_{\alpha,b}}{\text{average loss of } \hat{\Sigma}_{a_o}}$$
and based on 1000 independent Monte Carlo replications for some pairs $(p, m)$.
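A sketch of this Monte Carlo experiment, reusing the hypothetical helpers introduced above (`data_based_loss`, `sample_t_matrix`, `haff_psi`, `b0` and `orthogonally_invariant_estimator`); only the marginal model of $U$ is simulated, since the estimators and the loss depend on the data through $S = U^T U$ alone:

```python
import numpy as np

def prial(p, m, k, sigma, alpha, n_rep=1000, seed=0):
    """Monte Carlo PRIAL of (3.2) with b = b0 over a_o * S, under loss (1.4)."""
    rng = np.random.default_rng(seed)
    v = max(p, m)
    K = k / (k - 2.0)                       # normalizing constant of the t model
    a_o = 1.0 / (K * v)                     # optimal constant (2.5)
    psi = haff_psi(alpha, b0(p, m))
    loss_usual, loss_alt = np.empty(n_rep), np.empty(n_rep)
    for i in range(n_rep):
        U = sample_t_matrix(m, sigma, k, rng)
        S = U.T @ U
        loss_usual[i] = data_based_loss(a_o * S, sigma, S)
        loss_alt[i] = data_based_loss(
            orthogonally_invariant_estimator(S, a_o, psi), sigma, S)
    return (loss_usual.mean() - loss_alt.mean()) / loss_usual.mean()
```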
In Figure 1, we study the effect of the constant $b$ in (3.2) on the PRIALs in the non-invertible ($(p, m) = (25, 10)$) and the invertible ($(p, m) = (10, 25)$) cases. The Gaussian setting is investigated for the structure (i) of $\Sigma$. Note that, when $0 < b \le b_0$, the best PRIAL (around 7% in both the invertible and non-invertible cases) is reported for $b = b_0 = 1.125$ (for $(v, r) = (25, 10)$). For this reason, in the following, we consider the estimators $\hat{\Sigma}_{\alpha,b_0}$ with
$$b_0 = \frac{2\,(r - 1)}{v - r + 1}.$$
Note also that, for $b > b_0$, the estimators $\hat{\Sigma}_{\alpha,b}$ still improve over $\hat{\Sigma}_{a_o}$ and that the maximum value of the PRIAL is around 50%. This shows that there exists a larger range of values of $b$ than the one our theory provides for which $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$.
In Figure 2, we study the effect of $\alpha$ on the PRIALs of the estimator $\hat{\Sigma}_{\alpha,b_0}$ over $\hat{\Sigma}_{a_o} = S/v$ when the sampling distribution is Gaussian ($K = 1$ in (2.5)), and over $\hat{\Sigma}_{a_o} = S\,(k - 2)/(v\,k)$ when it is the $t$-distribution with $k$ degrees of freedom ($K = k/(k - 2)$ in (2.5)). For the structure (i) of $\Sigma$, note that, for $\alpha \ge 6$, the PRIALs stabilize at 12.5% in the Gaussian case and at 8.5% in the Student case. Similarly, the PRIALs are better in the Gaussian setting for the structure (ii). In addition, it is interesting to observe that, when $\alpha$ is close to zero, the PRIALs are small for the structure (i) and may be negative for the structure (ii).
In Figure 3, under the Gaussian assumption, we provide the PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ with respect to $\hat{\Sigma}_{a_o} = S/v$ under the data-based loss (1.4) and the PRIALs of $\hat{\Sigma}_{\alpha,b_1}$ with respect to $\hat{\Sigma}_{a_o} = S/(v + r + 1)$ under the quadratic loss (1.6). For the two structures (i) and (ii) of $\Sigma$, the PRIALs are better under the data-based loss. For the structure (i) with $\alpha = 1$ (which coincides with Konno's estimator), we observe a PRIAL equal to 1.73%, which is similar to that of Konno (2009). Note that, under the data-based loss, the PRIAL is much better since it equals 13.42%. We observe similar behavior for the structure (ii) as for the structure (i), but with lower PRIALs.
[Figure 1 here: PRIAL (y-axis) against $b$ (x-axis), with curves for the invertible and non-invertible cases.]
Fig. 1: Effect of $b$ on the PRIAL of $\hat{\Sigma}_{\alpha,b}$, with $\alpha = 1$, under the data-based loss in the Gaussian setting. The structure (i) of $\Sigma$ is considered for the invertible case with $(p, m) = (10, 25)$ and the non-invertible case with $(p, m) = (25, 10)$.
[Figure 2 here: two panels, (i) and (ii), showing PRIAL against $\alpha$ for the Gaussian and Student distributions.]
Fig. 2: PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ under the data-based loss. The non-invertible case is considered, with $(p, m) = (50, 20)$, for the structures (i) and (ii) of $\Sigma$, for the $t$-distribution with $k = 5$ degrees of freedom and for the Gaussian distribution.
[Figure 3 here: two panels, (i) and (ii), showing PRIAL against $\alpha$ under the quadratic and data-based losses.]
Fig. 3: PRIALs of $\hat{\Sigma}_{\alpha,b_0}$ under the data-based loss and PRIALs of $\hat{\Sigma}_{\alpha,b_1}$ under the quadratic loss. The non-invertible case is considered, with $(p, m) = (20, 10)$, for the structures (i) and (ii) of $\Sigma$ under the Gaussian distribution.
4. Conclusion and perspective
For a wide class of elliptically symmetric distributions, we provide a large class of estimators of the scale matrix $\Sigma$ of the elliptical multivariate linear model (1.1) which improve over the usual estimators $a\,S$. We highlight that the use of the data-based loss (1.4) is more attractive than the use of the classical quadratic loss (1.6). Indeed, (1.4) yields more improved estimators, and their improvement is valid within a larger class of distributions. This means that (1.4) is more discriminating than (1.6) in exhibiting improved estimators.
While (2.10) gives the risk difference between $\hat{\Sigma}_J = a_o\,(S + J)$, with $J = SS^{+}G(Z, S)$, and $\hat{\Sigma}_{a_o} = a_o\,S$, the dominance result in Theorem 2.1 is given for a correction matrix $G(Z, S) = H L \Psi(L) H^T$ which depends only on $S$. Recently, Tsukuma (2016) considered, in the Gaussian case, alternative estimators where $G(Z, S)$ depends on $S$ and on the information contained in the sample mean $Z$. This class of estimators merits future investigation in an elliptical setting.
5. Appendix
We give in the following corollary an adaptation of the Stein-Haff type identity (2.9) of Lemma 2.1 to an orthogonally invariant matrix function $G$, that is, of the form $G = H L \Phi(L) H^T$, where $\Phi(L) = \mathrm{diag}(\phi_1, \dots, \phi_r)$ and each $\phi_i = \phi_i(L)$ ($i = 1, \dots, r$) is a differentiable function of $L$.
Corollary 5.1. Let $\Phi(L) = \mathrm{diag}(\phi_1, \dots, \phi_r)$, where each $\phi_i = \phi_i(L)$ ($i = 1, \dots, r$) is a differentiable function of $L$. Assume that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}H L \Phi(L) H^T)\big] < \infty$. Then we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}H L \Phi(L) H^T)\big] = K\,E^{*}_{\theta,\Sigma}\bigg[\sum_{i=1}^{r}\Big((v - r + 1)\,\phi_i + 2\,l_i\,\frac{\partial\phi_i}{\partial l_i} + \sum_{j\neq i}^{r}\frac{l_i\,\phi_i - l_j\,\phi_j}{l_i - l_j}\Big)\bigg].$$
Proof. Let $G = H L \Phi(L) H^T$, $S^{+} = H L^{-1} H^T$ and $SS^{+} = H H^T$. Then
$$SS^{+}G = H H^T H L \Phi(L) H^T = H L \Phi(L) H^T = G,$$
since $H$ is semi-orthogonal. Assuming that $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}H L \Phi(L) H^T)\big] < \infty$, we have from Lemma 2.1
$$E_{\theta,\Sigma}\big[\mathrm{tr}\big(\Sigma^{-1}H L \Phi(L) H^T\big)\big] = K\,E^{*}_{\theta,\Sigma}\Big[2\,\mathrm{tr}\big(H H^T\,\mathcal{D}_s\{H L \Phi(L) H^T\}\big) + (m - r - 1)\,\mathrm{tr}\big(H \Phi(L) H^T\big)\Big]. \qquad (5.1)$$
Firstly, using Lemma A.4.2 in Haddouche et al. (2021), we have
$$\mathcal{D}_s\{H L \Phi(L) H^T\} = H\,\Phi^{(1)}(L)\,H^T + \tfrac{1}{2}\,\mathrm{tr}\big(\Phi(L)\big)\,\big(I_p - H H^T\big), \qquad (5.2)$$
where $\Phi^{(1)}(L) = \mathrm{diag}\big(\phi^{(1)}_1, \dots, \phi^{(1)}_r\big)$, with
$$\phi^{(1)}_i = \frac{1}{2}\,(p - r + 2)\,\phi_i + l_i\,\frac{\partial\phi_i}{\partial l_i} + \frac{1}{2}\sum_{j\neq i}^{r}\frac{l_i\,\phi_i - l_j\,\phi_j}{l_i - l_j}, \qquad (5.3)$$
for $i = 1, \dots, r$.
Secondly, using the fact that $H^T H = I_r$, we have from (5.2)
$$H H^T\,\mathcal{D}_s\{H L \Phi(L) H^T\} = H\,\Phi^{(1)}(L)\,H^T. \qquad (5.4)$$
Then, putting (5.4) in (5.1), we obtain
$$E_{\theta,\Sigma}\big[\mathrm{tr}\big(\Sigma^{-1}H L \Phi(L) H^T\big)\big] = K\,E^{*}_{\theta,\Sigma}\Big[2\,\mathrm{tr}\,\Phi^{(1)}(L) + (m - r - 1)\,\mathrm{tr}\,\Phi(L)\Big].$$
Finally, using (5.3), we have
$$E_{\theta,\Sigma}\big[\mathrm{tr}\big(\Sigma^{-1}H L \Phi(L) H^T\big)\big] = K\,E^{*}_{\theta,\Sigma}\bigg[\sum_{i=1}^{r}\Big((p + m - 2r + 1)\,\phi_i + 2\,l_i\,\frac{\partial\phi_i}{\partial l_i} + \sum_{j\neq i}^{r}\frac{l_i\,\phi_i - l_j\,\phi_j}{l_i - l_j}\Big)\bigg],$$
where $p + m - 2r + 1 = v - r + 1$.
The optimal constant $a_o$ in (2.5). Let $\hat{\Sigma}_a = a\,S$, where $a > 0$. Assume that the expectations $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S)\big]$ and $E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma\,S^{+})\big]$ are finite. Then the risk of $\hat{\Sigma}_a$ with respect to the data-based loss (1.4) is given by
$$R(\hat{\Sigma}_a, \Sigma) = E_{\theta,\Sigma}\Big[\mathrm{tr}\big(S^{+}\Sigma\,(\Sigma^{-1}\hat{\Sigma}_a - I_p)^2\big)\Big] = a^2\,E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S\,S^{+}S)\big] - 2\,a\,E_{\theta,\Sigma}\big[\mathrm{tr}(S\,S^{+})\big] + E_{\theta,\Sigma}\big[\mathrm{tr}(S^{+}\Sigma)\big]. \qquad (5.5)$$
Applying the Stein-Haff type identity in Corollary 5.1, with $\Phi(L) = I_r$, to the first term on the right-hand side of (5.5), we obtain
$$E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}S\,S^{+}S)\big] = E_{\theta,\Sigma}\big[\mathrm{tr}(\Sigma^{-1}H L H^T)\big] = K\,E^{*}_{\theta,\Sigma}\bigg[\sum_{i=1}^{r}\Big((v - r + 1) + \sum_{j\neq i}^{r}\frac{l_i - l_j}{l_i - l_j}\Big)\bigg] = K\,\big[r\,(v - r + 1) + r\,(r - 1)\big] = K\,r\,v. \qquad (5.6)$$
Now, using the fact that $\mathrm{tr}(S^{+}S) = \mathrm{tr}(H^T H) = r$ and thanks to (5.6), we have
$$R(\hat{\Sigma}_a, \Sigma) = a^2\,K\,r\,v - 2\,a\,r + E_{\theta,\Sigma}\big[\mathrm{tr}(S^{+}\Sigma)\big],$$
which, as a quadratic in $a$, is minimized at $a = 1/(K\,v)$. Therefore, choosing $a_o = 1/(K\,v)$ is optimal under the risk (1.5).
Proof of Theorem 2.1. Let $\hat{\Sigma}_{\Psi} = a_o\big(S + H L \Psi(L) H^T\big)$, where $\Psi(L) = \mathrm{diag}(\psi_1, \dots, \psi_r)$ is such that each $\psi_i = \psi_i(L)$ ($i = 1, \dots, r$) is a differentiable function of $L$ and $\mathrm{tr}\,\Psi(L) \ge \lambda > 0$. Using the fact that $H^T H = I_r$, the terms involved in the risk difference (2.8) become
$$J = SS^{+}G = G = H L \Psi(L) H^T \quad\text{and}\quad S^{+}G = H \Psi(L) H^T.$$
Then the risk difference between $\hat{\Sigma}_{\Psi}$ and $\hat{\Sigma}_{a_o}$ is given by
$$\Delta(\Psi) = a_o^2\,E_{\theta,\Sigma}\big[\mathrm{tr}\big(\Sigma^{-1}H L\,(2\,\Psi + \Psi^2)\,H^T\big)\big] - 2\,a_o\,E_{\theta,\Sigma}\big[\mathrm{tr}\,\Psi\big]. \qquad (5.7)$$
Now, applying the Stein-Haff type identity in Corollary 5.1 to the first term on the right-hand side of (5.7), with $\Phi = 2\,\Psi + \Psi^2$, we have
$$\Delta(\Psi) = a_o^2\,K\,E^{*}_{\theta,\Sigma}\bigg[\sum_{i=1}^{r}\Big((v - r + 1)\,(2\,\psi_i + \psi_i^2) + 2\,l_i\,\frac{\partial(2\,\psi_i + \psi_i^2)}{\partial l_i} + \sum_{j\neq i}^{r}\frac{l_i\,(2\,\psi_i + \psi_i^2) - l_j\,(2\,\psi_j + \psi_j^2)}{l_i - l_j}\Big)\bigg] - 2\,a_o\,E_{\theta,\Sigma}\big[\mathrm{tr}\,\Psi\big].$$
Therefore, using the fact that $\mathrm{tr}\,\Psi \ge \lambda > 0$, an upper bound on the risk difference $\Delta(\Psi)$ is given by
$$\Delta(\Psi) \le a_o^2\,K\,E^{*}_{\theta,\Sigma}\bigg[\sum_{i=1}^{r}\Big(2\,(v - r + 1)\,\psi_i + (v - r + 1)\,\psi_i^2 + 4\,l_i\,(1 + \psi_i)\,\frac{\partial\psi_i}{\partial l_i} + \sum_{j\neq i}^{r}\frac{l_i\,(2\,\psi_i + \psi_i^2) - l_j\,(2\,\psi_j + \psi_j^2)}{l_i - l_j}\Big) - 2\,(a_o\,K)^{-1}\lambda\bigg],$$
where $(a_o\,K)^{-1} = v$.
Improvement condition (2.14) for the alternative estimators in (2.13). Consider the class of alternative estimators $\hat{\Sigma}_{\alpha,b}$ in (2.13). Then, applying Theorem 2.1, an upper bound on the risk difference between $\hat{\Sigma}_{\alpha,b}$ and $\hat{\Sigma}_{a_o}$ is given by
$$\Delta(\Psi) \le a_o^2\,K\,E^{*}_{\theta,\Sigma}\big[g(\Psi)\big], \qquad (5.8)$$
where the integrand term in (2.12) becomes
$$g(\Psi) = g_1(\Psi) + g_2(\Psi)$$
with
$$g_1(\Psi) = -2\,(r - 1)\,b\sum_{i=1}^{r}\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} + (v - r + 1)\,b^2\sum_{i=1}^{r}\frac{l_i^{-2\alpha}}{\mathrm{tr}^2(L^{-\alpha})},$$
since $\mathrm{tr}\,\Psi(L) = b$ (so that $\lambda = b$), and
$$g_2(\Psi) = 4\,b\sum_{i=1}^{r} l_i\Big(1 + \frac{b\,l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) + \frac{2\,b}{\mathrm{tr}(L^{-\alpha})}\sum_{i=1}^{r}\sum_{j\neq i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i - l_j} + \frac{b^2}{\mathrm{tr}^2(L^{-\alpha})}\sum_{i=1}^{r}\sum_{j\neq i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i - l_j}.$$
The proof consists in showing that the term $g_2(\Psi)$ is non-positive. To this end, it can be shown that, for $\alpha \ge 1$,
$$\sum_{i=1}^{r}\sum_{j\neq i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i - l_j} = 2\sum_{i=1}^{r}\sum_{j>i}^{r}\frac{l_i^{1-\alpha} - l_j^{1-\alpha}}{l_i - l_j} \le 0 \quad\text{and}\quad \sum_{i=1}^{r}\sum_{j\neq i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i - l_j} = 2\sum_{i=1}^{r}\sum_{j>i}^{r}\frac{l_i^{1-2\alpha} - l_j^{1-2\alpha}}{l_i - l_j} < 0,$$
since $L = \mathrm{diag}(l_1, \dots, l_r)$ with $l_1 > \dots > l_r > 0$. Then
$$g_2(\Psi) \le 4\,b\sum_{i=1}^{r} l_i\Big(1 + \frac{b\,l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) = 4\,b\,\alpha\sum_{i=1}^{r}\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big(1 + \frac{b\,l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big)\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} - 1\Big),$$
since
$$\frac{\partial}{\partial l_i}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})}\Big) = \alpha\,\frac{l_i^{-\alpha-1}}{\mathrm{tr}(L^{-\alpha})}\Big(\frac{l_i^{-\alpha}}{\mathrm{tr}(L^{-\alpha})} - 1\Big).$$
Therefore, since $l_i^{-\alpha} \le \mathrm{tr}(L^{-\alpha})$, the term $g_2(\Psi) \le 0$. Then
$$g(\Psi) \le g_1(\Psi) = -2\,(r - 1)\,b + (v - r + 1)\,b^2\,\frac{\mathrm{tr}(L^{-2\alpha})}{\mathrm{tr}^2(L^{-\alpha})}.$$
Now, using the fact that $\mathrm{tr}(L^{-2\alpha}) \le \mathrm{tr}^2(L^{-\alpha})$, we have
$$g(\Psi) \le -2\,(r - 1)\,b + (v - r + 1)\,b^2,$$
since $b > 0$. Hence, an upper bound for the risk difference in (5.8) is given by
$$\Delta(\Psi) \le a_o^2\,b\,K\,E^{*}_{\theta,\Sigma}\big[-2\,(r - 1) + (v - r + 1)\,b\big].$$
Therefore, $\hat{\Sigma}_{\alpha,b}$ improves over $\hat{\Sigma}_{a_o}$ under the data-based loss (1.4) as soon as $0 < b \le b_0 = 2\,(r - 1)/(v - r + 1)$.
CRediT authorship contribution statement
Dominique Fourdrinier: Conceptualization, Methodology, Supervision, Validation, Writing - review & editing,
Writing - original draft, Software. Anis M. Haddouche: Conceptualization, Methodology, Supervision, Validation,
Writing - review & editing, Writing - original draft, Software. Fatiha Mezoued: Conceptualization, Methodology,
Supervision, Validation, Writing - review & editing, Writing - original draft, Software.
References
Berger, J., 1975. Minimax estimation of location vectors for a wide class of densities. Ann. Statist. 3, 1318–1328.
Bodnar, T., Gupta, A.K., 2009. An identity for multivariate elliptically contoured matrix distribution. Stat. Probab. Lett. 79, 1327–1330.
Candès, E., Sing-Long, C., Trzasko, J.D., 2013. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE T. Signal
Proces. 61, 4643–4657.
Canu, S., Fourdrinier, D., 2017. Unbiased risk estimates for matrix estimation in the elliptical case. J. Multivariate Anal. 158, 60–72.
Chételat, D., Wells, M.T., 2016. Improved second order estimation in the singular multivariate normal model. J. Multivariate Anal. 147, 1–19.
Díaz-García, J.A., Gutiérrez-Jáimez, R., 2011. On Wishart distribution: Some extensions. Linear Algebra Appl. 435, 1296–1310.
Efron, B., Morris, C., 1976. Multivariate empirical Bayes and estimation of covariance matrices. Ann. Statist. 4, 22–32.
Fang, K., Zhang, Y., 1990. Generalized Multivariate Analysis. Science Press, Springer-Verlag, Beijing.
Fourdrinier, D., Strawderman, W., 2015. Robust minimax Stein estimation under invariant data–based loss for spherically and elliptically symmetric
distributions. Metrika 78, 461–484.
Haddouche, A.M., Fourdrinier, D., Mezoued, F., 2021. Scale matrix estimation of an elliptically symmetric distribution in high and low dimensions.
J. Multivariate Anal. 181, 104680.
Haddouche, M.A., 2019. Estimation d'une matrice d'échelle sous un coût basé sur les données [Estimation of a scale matrix under a data-based loss]. Thesis. Normandie Université; École nationale supérieure de statistiques et d'économie appliquée (Alger). URL: https://tel.archives-ouvertes.fr/tel-02376077.
Haff, L., 1980. Empirical Bayes estimation of the multivariate normal covariance matrix. Ann. Statist. 8, 586–597.
Haff, L.R., 1979. Estimation of the inverse covariance matrix: Random mixtures of the inverse Wishart matrix and the identity. Ann. Statist. 7,
1264–1276.
James, W., Stein, C., 1961. Estimation with quadratic loss, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and
Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, California. pp. 361–379.
Konno, Y., 2009. Shrinkage estimators for large covariance matrices in multivariate real and complex normal distributions under an invariant
quadratic loss. J. Multivariate Anal. 100, 2237–2253.
Kubokawa, T., Srivastava, M., 2001. Robust improvement in estimation of a mean matrix in an elliptically contoured distribution. J. Multivariate
Anal. 76, 138–152.
Kubokawa, T., Srivastava, M., 2008. Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional
data. J. Multivariate Anal. 99, 1906–1928.
Kubokawa, T., Srivastava, M.S., 1999. Robust improvement in estimation of a covariance matrix in an elliptically contoured distribution. Ann.
Statist. 27, 600–609.
Srivastava, M.S., 2003. Singular Wishart and multivariate Beta distributions. Ann. Statist. 31, 1537–1560.
Stein, C., 1986. Lectures on the theory of estimation of many parameters. J. Sov. Math. 34, 1373–1403.
Takemura, A., 1984. An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population. Tsukuba J. Math.
8, 367–376.
Tsukuma, H., 2016. Estimation of a high-dimensional covariance matrix with the Stein loss. J. Multivariate Anal. 148, 1–17.
Tsukuma, H., Kubokawa, T., 2015. A unified approach to estimating a normal mean matrix in high and low dimensions. J. Multivariate Anal. 139, 312–328.
Tsukuma, H., Kubokawa, T., 2016. Unified improvements in estimation of a normal covariance matrix in high and low dimensions. J. Multivariate
Anal. 143, 233–248.
Tsukuma, H., Kubokawa, T., 2020a. Estimation of the covariance matrix, in: Shrinkage Estimation for Mean and Covariance Matrices. Springer,
pp. 75–110.
Tsukuma, H., Kubokawa, T., 2020b. Multivariate linear model and group invariance, in: Shrinkage Estimation for Mean and Covariance Matrices.
Springer, pp. 27–33.