Content may be subject to copyright.
Available via license: CC BY
arXiv:1606.02274v1 [stat.ME] 7 Jun 2016
The spatial sign covariance matrix and its
application for robust correlation estimation
A. Dürre¹, R. Fried
Fakultät Statistik, Technische Universität Dortmund
44221 Dortmund, Germany
D. Vogel
Institute for Complex Systems and Mathematical Biology, University of Aberdeen
Aberdeen AB24 3UE, United Kingdom
Abstract
We summarize properties of the spatial sign covariance matrix and especially
look at the relationship between its eigenvalues and those of the shape matrix
of an elliptical distribution. The explicit relationship known in the bivariate
case was used to construct the spatial sign correlation coefficient, which is a
non-parametric and robust estimator for the correlation coefficient within the
elliptical model. We consider a multivariate generalization, which we call the
multivariate spatial sign correlation matrix.
1 Introduction
Let $X_1,\ldots,X_n$ denote a sample of independent $p$-dimensional random vectors from
a distribution $F$, and let $s:\mathbb{R}^p\to\mathbb{R}^p$ with $s(x) = x/\|x\|$ for $x \neq 0$
and $s(0) = 0$ denote the spatial sign. Then
$$S_n(t_n, X_1,\ldots,X_n) = \frac{1}{n}\sum_{i=1}^n s(X_i - t_n)\, s(X_i - t_n)^T$$
denotes the empirical spatial sign covariance matrix (SSCM) with location $t_n$. The
canonical choice for the location estimator $t_n$ is the spatial median
$$\mu_n = \operatorname*{argmin}_{\mu\in\mathbb{R}^p} \sum_{i=1}^n \|X_i - \mu\|.$$
Besides its favourable robustness properties, such as an asymptotic breakdown point of 1/2, it
has (under regularity conditions, see [12]) the advantageous feature that it centres the
spatial signs, i.e.,
$$\frac{1}{n}\sum_{i=1}^n s(X_i - \mu_n) = 0,$$
so that $S_n(\mu_n, X_1,\ldots,X_n)$ is indeed the empirical covariance matrix of the spatial
signs of the data. If $t_n$ is (strongly) consistent for a location $t \in \mathbb{R}^p$, it was
shown in [5] that under mild conditions on $F$ the empirical SSCM is a (strongly) consistent
estimator for its population counterpart
$$S(X) = E\big(s(X-t)\, s(X-t)^T\big).$$

¹ Corresponding author, e-mail: alexander.duerre@udo.edu

There are some nice results if $F$ belongs to the class of continuous elliptical distributions,
which means that $F$ possesses a density of the form
$$f(x) = \det(V)^{-1/2}\, g\big((x-\mu)^T V^{-1}(x-\mu)\big)$$
for a location $\mu\in\mathbb{R}^p$, a symmetric and positive definite shape matrix
$V\in\mathbb{R}^{p\times p}$ and a function $g:\mathbb{R}\to\mathbb{R}$, which is often called the
elliptical generator. Prominent members of the elliptical family are the multivariate normal
distribution and elliptical $t$-distributions (e.g. [2], p. 208). If second moments exist, then
$\mu$ is the expectation of $X\sim F$, and $V$ is a multiple of the covariance matrix. The shape
matrix $V$ is unique only up to a multiplicative constant. In the following, we consider the
trace-normalized shape matrix $V_0 = V/\operatorname{tr}(V)$, which is convenient since $S(X)$
also has trace 1. If $F$ is elliptical, then $S(X)$ and $V$ share the same eigenvectors and the
respective eigenvalues have the same ordering. For this reason, the SSCM has been proposed for
robust principal component analysis (e.g. [13, 15]). In the present article, we study the
eigenvalues of the SSCM.
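
The definitions of the spatial sign, the empirical SSCM and the spatial median from the beginning of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spatial median is computed with a plain Weiszfeld iteration (a standard solver chosen here, not one prescribed by the text), and the function names `spatial_signs`, `spatial_median` and `sscm` as well as the simulated data are hypothetical.

```python
import numpy as np

def spatial_signs(X, center):
    """Map each row x to (x - center) / ||x - center||; rows equal to the center map to 0."""
    D = X - center
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    return D / np.where(norms > 0, norms, 1.0)

def spatial_median(X, tol=1e-10, max_iter=1000):
    """Weiszfeld iteration for argmin_mu sum_i ||X_i - mu|| (assumed solver)."""
    mu = X.mean(axis=0)
    for _ in range(max_iter):
        dist = np.maximum(np.linalg.norm(X - mu, axis=1), 1e-12)  # avoid division by zero
        w = 1.0 / dist
        new_mu = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

def sscm(X, center=None):
    """Empirical spatial sign covariance matrix with the given location."""
    if center is None:
        center = spatial_median(X)
    U = spatial_signs(X, center)
    return U.T @ U / X.shape[0]

# Hypothetical data: normal sample with diagonal shape and marginal scales 3, 1, 0.5
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.5]) + np.array([1.0, -2.0, 0.0])

mu_hat = spatial_median(X)
S = sscm(X, mu_hat)
mean_sign = spatial_signs(X, mu_hat).mean(axis=0)  # close to 0: the median centres the signs
```

In this sketch `S` has trace 1 whenever no observation coincides with the centre, and its diagonal entries follow the ordering of the marginal scales, in line with the eigenvalue ordering property stated above.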
2 Eigenvalues of the SSCM
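Let $\lambda_1 \geq \ldots \geq \lambda_p \geq 0$ denote the eigenvalues of $V_0$ and
$\delta_1 \geq \ldots \geq \delta_p \geq 0$ those of $S(X)$. Explicit formulae that relate the
$\delta_i$ to the $\lambda_i$ are only known for $p = 2$ (see [19, 3]), namely
$$\delta_i = \frac{\sqrt{\lambda_i}}{\sqrt{\lambda_1}+\sqrt{\lambda_2}}, \quad i = 1,2. \tag{1}$$
Assuming $\lambda_2 > 0$, we have $\delta_1/\delta_2 = \sqrt{\lambda_1/\lambda_2} \leq \lambda_1/\lambda_2$,
thus the eigenvalues of the SSCM are closer together than those of the corresponding shape
matrix. It is shown in [8] that this holds true for arbitrary $p > 2$, so
$$\lambda_i/\lambda_j \geq \delta_i/\delta_j \quad \text{for } 1 \leq i < j \leq p \tag{2}$$
as long as $\lambda_j > 0$. There is no explicit map between the eigenvalues known for $p > 2$.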
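
As a small worked example of (1), the following sketch evaluates the bivariate map for one illustrative pair of eigenvalues. The function name and the chosen values are hypothetical; only formula (1) itself is taken from the text.

```python
import math

def sscm_eigenvalues_bivariate(lam1, lam2):
    """Formula (1): eigenvalues of S(X) from the eigenvalues of the shape matrix, p = 2."""
    r1, r2 = math.sqrt(lam1), math.sqrt(lam2)
    return r1 / (r1 + r2), r2 / (r1 + r2)

lam1, lam2 = 0.8, 0.2                    # illustrative eigenvalues of V0 (sum to 1)
d1, d2 = sscm_eigenvalues_bivariate(lam1, lam2)

ratio_shape = lam1 / lam2                # eigenvalue ratio of V0
ratio_sscm = d1 / d2                     # square root of ratio_shape, hence smaller
```

Here the ratio of the shape eigenvalues is 4, while the ratio of the SSCM eigenvalues is its square root, which illustrates why the SSCM eigenvalues are closer together.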
Dürre et al. [8] give a representation of $\delta_i$ as a one-dimensional integral, which permits
fast and accurate numerical evaluations for arbitrary $p$,
$$\delta_i = \frac{\lambda_i}{2}\int_0^\infty \frac{1}{(1+\lambda_i x)\prod_{j=1}^p (1+\lambda_j x)^{1/2}}\, dx, \quad i = 1,\ldots,p. \tag{3}$$
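
A minimal numerical sketch of (3) is given below. It is not the implementation of the sscor package; it assumes a substitution $x = u/(1-u)^2$ that maps $[0,\infty)$ to $[0,1)$, followed by Gauss-Legendre quadrature from NumPy. The function name `sscm_eigenvalues` and the number of nodes are choices made for this illustration.

```python
import numpy as np

def sscm_eigenvalues(lam, nodes=200):
    """Evaluate (3) for all i by Gauss-Legendre quadrature on (0, 1)."""
    lam = np.asarray(lam, dtype=float)
    t, w = np.polynomial.legendre.leggauss(nodes)
    u, w = 0.5 * (t + 1.0), 0.5 * w              # nodes and weights rescaled to (0, 1)
    x = u / (1.0 - u) ** 2                       # substitution x = u / (1 - u)^2
    log_jac = np.log1p(u) - 3.0 * np.log1p(-u)   # log of dx/du = (1 + u) / (1 - u)^3
    log_terms = np.log1p(np.outer(lam, x))       # log(1 + lam_i * x), shape (p, nodes)
    # log of the integrand of (3) times the Jacobian, evaluated for every i
    log_f = -log_terms - 0.5 * log_terms.sum(axis=0) + log_jac
    return 0.5 * lam * (np.exp(log_f) @ w)

delta_2d = sscm_eigenvalues([0.8, 0.2])          # can be compared with formula (1)
delta_3d = sscm_eigenvalues([0.5, 0.3, 0.2])     # no closed form known for p = 3
delta_eq = sscm_eigenvalues([1 / 3, 1 / 3, 1 / 3])
```

Working on the log scale keeps the product over $j$ from overflowing for large $x$ and large $p$. For $p = 2$ the result can be checked against (1), and in any dimension the computed $\delta_i$ should sum to 1.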
We use this formula (implemented in R [17] in the package sscor [9]) to get an impression
of how the eigenvalues of $S(X)$ compare to those of $V_0$. We first look at
equidistantly spaced eigenvalues
$$\lambda_i = \frac{2i}{p(p+1)}, \quad i = 1,\ldots,p,$$
[Figure 1: three scatter plots; axes: eigenvalues of $V_0$ and eigenvalues of $S(X)$]
Figure 1: Eigenvalues of the SSCM with respect to the corresponding eigenvalues of the shape
matrix in the equidistant setting for $p = 3$ (left), $p = 11$ (centre) and $p = 101$ (right).
[Figure 2: three scatter plots; axes: eigenvalues of $V_0$ and eigenvalues of $S(X)$]
Figure 2: Eigenvalues of the SSCM with respect to the corresponding eigenvalues of the shape
matrix in the setting of one large eigenvalue for $p = 3$ (left), $p = 11$ (centre) and
$p = 101$ (right).
for different $p = 3, 11, 101$. The magnitude of the eigenvalues necessarily decreases
as $p$ increases, since $\sum_{i=1}^p \lambda_i = \sum_{i=1}^p \delta_i = 1$ by definition of
$V_0$ and $S(X)$. As one can see in Figure 1, the eigenvalues of $S(X)$ and $V_0$ approach each
other for increasing $p$. In fact, the maximal absolute difference for $p = 101$ is roughly
$2 \cdot 10^{-4}$. In the second scenario, we take $p-1$ equidistantly spaced eigenvalues and one
eigenvalue 5 times larger than the largest of these, i.e.,
$$\lambda_i = \begin{cases} \dfrac{i}{p((p-1)/2+5)-5}, & i = 1,\ldots,p-1, \\[1ex] \dfrac{5(p-1)}{p((p-1)/2+5)-5}, & i = p. \end{cases}$$
This models the case where the dependence is mainly driven by one principal component.
As one can see in Figure 2, the distance between the two largest eigenvalues is
smaller for $S(X)$ than for $V_0$. This is not surprising in light of (2). Thus, in general, the
eigenvalues of the SSCM are less separated than those of $V_0$, which is one reason why
the use of the SSCM for robust principal component analysis has been questioned (e.g.
[1, 14]). However, the differences appear to be generally small in higher dimensions.
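
The two eigenvalue configurations of this section can be explored with the following sketch. It rests on several assumptions: (3) is evaluated by Gauss-Legendre quadrature after the substitution $x = u/(1-u)^2$ (a choice made here, not necessarily what sscor does), the second spectrum is built from unnormalized weights $1,\ldots,p-1$ and $5(p-1)$ that are then divided by their sum so that the trace is 1, and all function names are hypothetical.

```python
import numpy as np

def sscm_eigenvalues(lam, nodes=200):
    """Evaluate (3) by Gauss-Legendre quadrature after mapping [0, inf) to [0, 1)."""
    lam = np.asarray(lam, dtype=float)
    t, w = np.polynomial.legendre.leggauss(nodes)
    u, w = 0.5 * (t + 1.0), 0.5 * w
    x = u / (1.0 - u) ** 2
    log_jac = np.log1p(u) - 3.0 * np.log1p(-u)
    log_terms = np.log1p(np.outer(lam, x))
    log_f = -log_terms - 0.5 * log_terms.sum(axis=0) + log_jac
    return 0.5 * lam * (np.exp(log_f) @ w)

def equidistant_spectrum(p):
    """lambda_i = 2 i / (p (p + 1)), in increasing order."""
    i = np.arange(1, p + 1, dtype=float)
    return 2.0 * i / (p * (p + 1))

def one_large_spectrum(p, factor=5.0):
    """p - 1 equidistant eigenvalues plus one that is 'factor' times the largest of them."""
    w = np.arange(1, p + 1, dtype=float)
    w[-1] = factor * (p - 1)
    return w / w.sum()                     # normalize to trace 1

results = {}
for p in (3, 11, 101):
    lam_eq, lam_big = equidistant_spectrum(p), one_large_spectrum(p)
    d_eq, d_big = sscm_eigenvalues(lam_eq), sscm_eigenvalues(lam_big)
    results[p] = {
        "max_abs_diff_equidistant": float(np.max(np.abs(d_eq - lam_eq))),
        "top_ratio_shape": float(lam_big[-1] / lam_big[-2]),
        "top_ratio_sscm": float(d_big[-1] / d_big[-2]),
        "sum_delta": float(d_eq.sum()),
    }
```

The dictionary `results` then allows a comparison of the largest-to-second-largest ratio under $V_0$ and under $S(X)$, which by (2) cannot be larger for the SSCM.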
3 Estimation of the correlation matrix
Equation (1) can be used to derive an estimator for the correlation coefficient based
on the empirical SSCM: the spatial sign correlation coefficient $\rho_n$ ([6]). Under mild
regularity assumptions this estimator is consistent under elliptical distributions and
asymptotically normal with variance
$$ASV(\rho_n) = (1-\rho^2)^2 + \frac{1}{2}(a + a^{-1})(1-\rho^2)^{3/2}, \tag{4}$$
where $a = \sqrt{v_{11}/v_{22}}$ is the ratio of the marginal scales and
$\rho = v_{12}/\sqrt{v_{11} v_{22}}$ is the generalized correlation coefficient, which coincides
with the usual moment correlation coefficient if second moments exist. Equation (4) indicates
that the variance of $\rho_n$ is minimal for $a = 1$, but can get arbitrarily large if $a$ tends
to infinity or 0.
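
One way to see how (1) leads to a correlation estimate: since $S(X)$ and $V$ share their eigenvectors and, for $p = 2$, $\lambda_i$ is proportional to $\delta_i^2$, the shape matrix is proportional to the matrix square of the SSCM. The sketch below uses this observation on a population-level example. It is an illustration derived from (1), not necessarily the exact form of the estimator given in [6], and the function names are hypothetical.

```python
import numpy as np

def bivariate_sscm_from_shape(V):
    """Population SSCM for p = 2: same eigenvectors as V, eigenvalues from formula (1)."""
    lam, U = np.linalg.eigh(V)
    root = np.sqrt(lam)
    delta = root / root.sum()
    return U @ np.diag(delta) @ U.T

def correlation_from_bivariate_sscm(S):
    """Invert (1): S @ S is proportional to the shape matrix, so normalize its entries."""
    V = S @ S
    return V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])

V = np.array([[4.0, 1.0],
              [1.0, 1.0]])                 # illustrative shape matrix with rho = 0.5
S = bivariate_sscm_from_shape(V)
rho = correlation_from_bivariate_sscm(S)
```

Applying `correlation_from_bivariate_sscm` to an empirical bivariate SSCM instead of the population matrix gives a plug-in estimate in the same spirit.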
Therefore a two-step procedure has been proposed, the two-stage spatial sign correlation
$\rho_{\sigma,n}$, which first normalizes the data by a robust scale estimator, e.g., the
median absolute deviation (mad), and then computes the spatial sign correlation of
the transformed data. Under mild conditions (see [7]), this two-step procedure yields
an asymptotic variance of
$$ASV(\rho_{\sigma,n}) = (1-\rho^2)^2 + (1-\rho^2)^{3/2}, \tag{5}$$
which equals that of $\rho_n$ for the favourable case of $a = 1$. Since (5) only depends on
the parameter $\rho$, the two-stage spatial sign correlation coefficient is very suitable to
construct robust and non-parametric confidence intervals for the correlation coefficient
under ellipticity. It turns out that these intervals are quite accurate even for rather
small sample sizes of $n = 10$ and in fact more accurate than those based on the sample
moment correlation coefficient [7].
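
The asymptotic variances (4) and (5) are easy to evaluate directly. The sketch below also shows a plain plug-in Wald interval based on (5); that interval is an assumption made here for illustration, and the construction studied in [7] may differ (for instance by using a transformation). All function names are hypothetical.

```python
import math
from statistics import NormalDist

def asv_spatial_sign(rho, a):
    """Asymptotic variance (4) of the spatial sign correlation; a is the ratio of scales."""
    return (1 - rho**2) ** 2 + 0.5 * (a + 1 / a) * (1 - rho**2) ** 1.5

def asv_two_stage(rho):
    """Asymptotic variance (5) of the two-stage spatial sign correlation."""
    return (1 - rho**2) ** 2 + (1 - rho**2) ** 1.5

def wald_interval(rho_hat, n, level=0.95):
    """Plug-in interval rho_hat +/- z * sqrt(ASV(rho_hat) / n), clipped to [-1, 1] (assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(asv_two_stage(rho_hat) / n)
    return max(-1.0, rho_hat - half), min(1.0, rho_hat + half)

v_equal_scales = asv_spatial_sign(0.3, a=1.0)    # coincides with (5)
v_unequal_scales = asv_spatial_sign(0.0, a=10.0) # much larger than 2
lo, hi = wald_interval(0.3, n=100)
```

Comparing `asv_spatial_sign(rho, a)` for different `a` makes the motivation for the preliminary standardization visible: the variance is smallest for equal marginal scales.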
One can construct an estimator of the correlation matrix $R$ by filling the off-diagonal
positions of the matrix estimate with the bivariate spatial sign correlation coefficients
of all pairs of variables. This was proposed in [6]. Equation (3) allows an alternative
approach: First standardize the data by a robust scale estimator and compute the
SSCM of the transformed data. Then apply a singular value decomposition
$$S_n(t_n, X_1,\ldots,X_n) = \hat{U}\hat{\Delta}\hat{U}^T,$$
where $\hat{\Delta}$ contains the ordered eigenvalues $\hat{\delta}_1 \geq \ldots \geq \hat{\delta}_p$.
One obtains estimates $\hat{\lambda}_1,\ldots,\hat{\lambda}_p$ by inverting (3). Although
theoretical results are yet to be established, we found in our simulations that the following
fixed-point algorithm
$$\hat{\lambda}_i^{(0)} = \hat{\delta}_i, \quad i = 1,\ldots,p,$$
$$\tilde{\lambda}_i^{(k+1)} = 2\hat{\delta}_i \left( \int_0^\infty \frac{1}{(1+\hat{\lambda}_i^{(k)}x)\prod_{j=1}^p (1+\hat{\lambda}_j^{(k)}x)^{1/2}}\, dx \right)^{-1}, \quad i = 1,\ldots,p, \quad k = 0,1,2,\ldots$$
$$\hat{\lambda}_i^{(k+1)} = \tilde{\lambda}_i^{(k+1)} \left( \sum_{j=1}^p \tilde{\lambda}_j^{(k+1)} \right)^{-1}, \quad i = 1,\ldots,p, \quad k = 0,1,2,\ldots$$
works reliably and converges fast. Let $\hat{\Lambda}$ denote the diagonal matrix containing
$\hat{\lambda}_1,\ldots,\hat{\lambda}_p$, then $\hat{V} = \hat{U}\hat{\Lambda}\hat{U}^T$ is a suitable
estimator for the shape of the standardized data and $\hat{R}$ with
$\hat{r}_{ij} = \hat{v}_{ij}/\sqrt{\hat{v}_{ii}\hat{v}_{jj}}$ an estimator for the correlation matrix,
which we call the multivariate spatial sign correlation matrix. Contrary to the pairwise
approach, the multivariate spatial sign correlation matrix is positive semi-definite by
construction.
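
The fixed-point iteration and the subsequent construction of the shape and correlation estimates can be sketched as follows. The sketch makes several assumptions: the integral in (3) is approximated by Gauss-Legendre quadrature after the substitution $x = u/(1-u)^2$, the iteration stops when successive iterates differ by less than a tolerance chosen here, and the input is a given SSCM (the preliminary robust standardization of the data is left out). All function names are hypothetical.

```python
import numpy as np

def integrals_in_eq3(lam, nodes=200):
    """Integrals I_i appearing in (3), so that delta_i = lam_i * I_i / 2."""
    lam = np.asarray(lam, dtype=float)
    t, w = np.polynomial.legendre.leggauss(nodes)
    u, w = 0.5 * (t + 1.0), 0.5 * w
    x = u / (1.0 - u) ** 2
    log_jac = np.log1p(u) - 3.0 * np.log1p(-u)
    log_terms = np.log1p(np.outer(lam, x))
    log_f = -log_terms - 0.5 * log_terms.sum(axis=0) + log_jac
    return np.exp(log_f) @ w

def invert_eigenvalues(delta, tol=1e-12, max_iter=500):
    """Fixed-point iteration: start at delta, update via (3), renormalize to sum 1."""
    delta = np.clip(np.asarray(delta, dtype=float), 0.0, None)  # guard against tiny negatives
    lam = delta / delta.sum()
    for _ in range(max_iter):
        lam_tilde = 2.0 * delta / integrals_in_eq3(lam)
        lam_new = lam_tilde / lam_tilde.sum()
        if np.max(np.abs(lam_new - lam)) < tol:
            return lam_new
        lam = lam_new
    return lam

def shape_and_correlation(S):
    """Eigendecompose the SSCM, invert its eigenvalues, rebuild shape and correlation."""
    delta, U = np.linalg.eigh(S)
    lam = invert_eigenvalues(delta)
    V = U @ np.diag(lam) @ U.T
    d = np.sqrt(np.diag(V))
    return V, V / np.outer(d, d)

# Round trip on a population-level example with known shape eigenvalues
lam_true = np.array([0.6, 0.3, 0.1])
Q, _ = np.linalg.qr(np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]))
delta_true = 0.5 * lam_true * integrals_in_eq3(lam_true)
S = Q @ np.diag(delta_true) @ Q.T
V_hat, R_hat = shape_and_correlation(S)
V_true = Q @ np.diag(lam_true) @ Q.T
```

For symmetric positive semi-definite matrices the singular value decomposition and the eigendecomposition coincide, which is why `numpy.linalg.eigh` is used here. Because $\hat{V}$ is rebuilt from non-negative eigenvalues, `R_hat` is positive semi-definite, as noted in the text.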
Theoretical properties of the new estimator are not straightforward to establish. By
a small simulation study we want to get an impression of its efficiency. We compare the
variances of the moment correlation, the pairwise as well as the multivariate spatial sign
correlation under several elliptical distributions: normal, Laplace and $t$-distributions
with 5 and 10 degrees of freedom. The latter three generate heavier tails than the
normal distribution. The Laplace distribution is obtained by the elliptical generator
$g(x) = c_p \exp(-\sqrt{|x|}/2)$, where $c_p$ is the appropriate integration constant depending
on $p$ (e.g. [2], p. 209).
We take the identity matrix as shape matrix and compare the variances of an off-diagonal
element of the matrix estimates for different dimensions $p = 2, 3, 5, 10, 50$
and sample sizes $n = 100, 1000$. We use the R packages mvtnorm [10] and MNM [16]
for the data generation. The results based on 10000 runs are summarized in Table 1.
Except for the moment correlation at the $t_5$ distribution, the results for $n = 100$ and
$n = 1000$ are very similar. Note that the variance of the moment correlation decreases
at the Laplace distribution as the dimension $p$ increases, but not so for the other
distributions considered. The lower dimensional marginals of the Laplace distribution
are, contrary to the normal and the $t$-distributions, not Laplace distributed (see [11]),
and the kurtosis of the one-dimensional marginals of the Laplace distribution in fact
decreases as $p$ increases.
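
A much reduced version of such a simulation can be sketched in Python. It covers only the bivariate spherical normal case with $n = 100$ and compares the moment correlation with a spatial sign correlation. Several simplifications are assumptions of this sketch and differ from the study described above: the centre is taken as known (the origin), no preliminary scale standardization is applied, only 2000 replications are drawn, and the spatial sign correlation is obtained by squaring the bivariate SSCM, which follows from (1).

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
X = rng.standard_normal((reps, n, 2))            # spherical normal data, true rho = 0

# Moment correlation in each replication
Xc = X - X.mean(axis=1, keepdims=True)
C = np.einsum("rni,rnj->rij", Xc, Xc)
cor = C[:, 0, 1] / np.sqrt(C[:, 0, 0] * C[:, 1, 1])

# Spatial sign correlation with known centre 0 (simplifying assumption)
U = X / np.linalg.norm(X, axis=2, keepdims=True)  # spatial signs
S = np.einsum("rni,rnj->rij", U, U) / n           # one SSCM per replication
V = S @ S                                         # proportional to the shape for p = 2
sscor = V[:, 0, 1] / np.sqrt(V[:, 0, 0] * V[:, 1, 1])

n_var_cor = n * cor.var()       # compare with the entry for cor, N, p = 2 in Table 1
n_var_sscor = n * sscor.var()   # compare with the entry for sscor pairwise, N, p = 2
```

The two scaled variances can be set against the corresponding normal entries of Table 1. Exact agreement is not to be expected given the smaller number of replications and the simplifications listed above.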
For $\rho = 0$, Equation (5) yields an asymptotic variance of 2 for the pairwise spatial sign
correlation matrix elements regardless of the specific elliptical generator, which can also be
observed in the simulation results. The moment correlation is twice as efficient under
normality, but has a higher variance at heavy-tailed distributions. For uncorrelated $t_5$
distributed random variables, the spatial sign correlation outperforms the moment
correlation. Looking at the multivariate spatial sign correlation, we see a strong increase
of efficiency for larger $p$. For $p = 50$ the variance is comparable to that of the moment
correlation. Since the asymptotic variance of the SSCM does not depend on the elliptical
generator, this is expected to also hold for the multivariate spatial sign correlation,
and we find this confirmed by the simulations. The multivariate spatial sign correlation
is more efficient than the moment correlation even under slightly heavier tails for
moderately large $p$.
                                    n = 100                     n = 1000
                         p =   2    3    5   10   50       2    3    5   10   50
N     cor                    1.0  1.0  1.0  1.0  1.0     1.0  1.0  1.0  1.0  1.0
      sscor pairwise         1.9  1.9  1.9  1.9  1.9     2.0  2.0  2.0  2.0  2.0
      sscor multivariate     1.9  1.6  1.4  1.2  1.0     2.0  1.7  1.4  1.2  1.0
t10   cor                    1.3  1.3  1.3  1.3  1.3     1.3  1.3  1.3  1.4  1.3
      sscor pairwise         2.0  1.9  1.9  2.0  1.9     2.0  2.0  2.0  2.0  2.0
      sscor multivariate     2.0  1.7  1.3  1.2  1.0     2.0  1.7  1.4  1.2  1.0
t5    cor                    2.0  2.1  2.1  2.1  2.1     2.6  2.6  2.6  2.6  2.6
      sscor pairwise         2.0  2.0  1.9  2.0  1.9     2.1  2.0  2.0  2.0  2.0
      sscor multivariate     2.0  1.7  1.4  1.2  1.1     2.1  1.7  1.4  1.2  1.0
L     cor                    1.6  1.5  1.3  1.2  1.1     1.6  1.5  1.3  1.2  1.1
      sscor pairwise         1.9  1.9  1.9  2.0  2.0     2.0  2.0  2.0  2.0  2.0
      sscor multivariate     1.9  1.6  1.4  1.2  1.1     2.0  1.7  1.4  1.2  1.1

Table 1: Simulated variances (multiplied by $n$) of one off-diagonal element of the
correlation matrix estimate based on the moment correlation (cor), the pairwise spatial
sign correlation (sscor pairwise) and the multivariate spatial sign correlation matrix
(sscor multivariate) for spherical normal (N), $t_{10}$, $t_5$, and Laplace (L) distributions,
several dimensions $p$ and sample sizes $n = 100, 1000$.
An increase of efficiency for larger $p$ is not uncommon for robust scatter estimators.
It can be observed, amongst others, for M-estimators, the Tyler shape matrix, the MCD,
and S-estimators (e.g. [4, 18]). All of these are affine equivariant estimators, requiring
$n > p$. This is not necessary for the spatial sign correlation matrix. One may expect
that the efficiency gain for large $p$ comes at the expense of robustness, in particular a
larger maximum bias curve. Further research will be necessary to thoroughly explore
the robustness properties and efficiency of the multivariate spatial sign correlation
estimator.
References
[1] Bali J.L., Boente G., Tyler D.E., Wang J.L. (2011). Robust functional principal
components: A projection-pursuit approach. The Annals of Statistics. Vol. 39,
pp. 2852-2882.
[2] Bilodeau M., Brenner D. (1999). Theory of Multivariate Statistics. Springer, New
York.
[3] Croux C., Dehon C., Yadine A. (2010). The k-step spatial sign covariance matrix.
Advances in Data Analysis and Classification. Vol. 4, pp. 137-150.
[4] Croux C., Haesbroeck G. (1999). Influence function and efficiency of the minimum
covariance determinant scatter matrix estimator. Journal of Multivariate Analysis.
Vol. 71, pp. 161-190.
[5] Dürre A., Vogel D., Tyler D.E. (2014). The spatial sign covariance matrix with
unknown location. Journal of Multivariate Analysis. Vol. 130, pp. 107-117.
[6] Dürre A., Vogel D., Fried R. (2015). Spatial sign correlation. Journal of Multivariate
Analysis. Vol. 135, pp. 89-105.
[7] Dürre A., Vogel D. (2016). Asymptotics of the two-stage spatial sign correlation.
Journal of Multivariate Analysis. Vol. 144, pp. 54-67.
[8] Dürre A., Tyler D.E., Vogel D. (2016). On the eigenvalues of the spatial sign
covariance matrix in more than two dimensions. Statistics & Probability Letters.
Vol. 111, pp. 80-85.
[9] Dürre A., Vogel D. (2016). sscor: Robust Correlation Estimation and Testing
Based on Spatial Signs. R package version 0.2.
[10] Genz A., Bretz F., Miwa T., Mi X., Leisch F., Scheipl F., Bornkamp B., Maechler M.,
Hothorn T. (2016). mvtnorm: Multivariate Normal and t Distributions. R
package version 1.0.5.
[11] Kano Y. (1994). Consistency property of elliptic probability density functions.
Journal of Multivariate Analysis. Vol. 51, pp. 139-147.
[12] Kemperman J.H.B. (1987). The median of a finite measure on a Banach space.
Statistical Data Analysis Based on the L1-Norm and Related Methods. pp. 217-230.
[13] Locantore N., Marron J.S., Simpson D.G., Tripoli N., Zhang J.T., Cohen K.L. (1999).
Robust principal component analysis for functional data. Test.
Vol. 8, pp. 1-73.
[14] Magyar A.F., Tyler D.E. (2014). The asymptotic inadmissibility of the spatial sign
covariance matrix for elliptically symmetric distributions. Biometrika. Vol. 101,
pp. 673-688.
[15] Marden J.I. (1999). Some robust estimates of principal components. Statistics &
Probability Letters. Vol. 43, pp. 349-359.
[16] Nordhausen K., Oja H. (2011). Multivariate L1 methods: the package MNM.
Journal of Statistical Software. Vol. 43, pp. 1-28.
[17] R Development Core Team (2016). R: A Language and Environment for Statistical
Computing.
[18] Taskinen S., Croux C., Kankainen A., Ollila E., Oja H. (2006). Influence functions
and efficiencies of the canonical correlation and vector estimates based on scatter
and shape matrices. Journal of Multivariate Analysis. Vol. 97, pp. 359-384.
[19] Vogel D., Köllmann C., Fried R. (2008). Partial correlation estimates based on
signs. Proceedings of the 1st Workshop on Information Theoretic Methods in Science
and Engineering. Vol. 43, pp. 1-6.