Journal of Multivariate Analysis 171 (2019) 94–111
A generalized spatial sign covariance matrix
Jakob Raymaekers, Peter Rousseeuw
Department of Mathematics, KU Leuven, Belgium
article info
Article history:
Received 3 May 2018
Available online 24 November 2018
AMS 2010 subject classifications:
primary 62H12
secondary 62H86
Keywords:
Orthogonal equivariance
Outliers
Robust location and scatter
abstract
The well-known spatial sign covariance matrix (SSCM) carries out a radial transform which
moves all data points to a sphere, followed by computing the classical covariance matrix of
the transformed data. Its popularity stems from its robustness to outliers, fast computation,
and applications to correlation and principal component analysis. In this paper we study
more general radial functions. It is shown that the eigenvectors of the generalized SSCM are
still consistent and the ranks of the eigenvalues are preserved. The influence function of the
resulting scatter matrix is derived, and it is shown that its asymptotic breakdown value is
as high as that of the original SSCM. A simulation study indicates that the best results are
obtained when the inner half of the data points are not transformed and points lying far
away are moved to the center.
©2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY
license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Robust estimation of the covariance (scatter) matrix is an important and challenging problem. Over the last decades,
many robust estimators for the covariance matrix have been developed. Many of them possess the attractive property of
affine equivariance, meaning that when the data are subjected to an affine transformation the estimator will transform
accordingly.
However, all highly robust affine equivariant scatter estimators have a combinatorial time complexity. Other estimators
possess the less restrictive property of orthogonal equivariance. This means that the estimators commute with orthogonal
transformations, which are characterized by orthogonal matrices and include rotations and reflections.
The most well-known orthogonally equivariant scatter estimator is the spatial sign covariance matrix (SSCM) proposed
independently in [20,31] and studied in more detail in [8,11,19], among others. The estimator computes the regular
covariance matrix on the spatial signs of the data, which are the projections of the location-centered data points on the unit
sphere. Somewhat surprisingly, this transformation yields a consistent estimator of the eigenvectors of the true covariance
matrix [20] under relatively general conditions on the underlying distribution. Of course the eigenvalues are different from
the eigenvalues of the true covariance matrix, but it was shown in [31] that the order of the eigenvalues is preserved. We
build on this idea by illustrating that the SSCM is part of a larger class of orthogonally equivariant estimators, all of which
estimate the eigenvectors of the true covariance matrix and preserve the order of the eigenvalues.
The SSCM is easy to compute, and has been used extensively in several applications. The most common use of the SSCM
is probably in the context of (functional) spherical PCA as developed in [5,17,30,32]. Like classical PCA, spherical PCA aims to
find a lower dimensional subspace that captures most of the variability in the data. After centering the data, spherical PCA
projects the data onto the unit (hyper)sphere before searching for the directions of highest variability. This projection gives
all data points the same weight in the estimation of the subspace, thereby limiting the influence of potential outliers. The
directions (‘loadings’) of spherical PCA thus correspond to the eigenvectors of the SSCM scatter matrix. The corresponding
scores are usually taken to be the inner products of the loading vectors with the original (centered) data points, not with the
projections of the data points on the sphere. Concrete applications of spherical PCA include the shape of the cornea in ophthalmology, as analyzed in [17], and multichannel signal processing, as illustrated in [31].
In addition to spherical PCA, there has also been a lot of recent research on the use of the SSCM for constructing robust correlation estimators [7,9,10]. The main focus of that work is on results including asymptotic properties, the eigenvalues, and the influence function which measures robustness. A third application of the SSCM is its use as an initial estimate for
more involved robust scatter estimators [4,16]. The SSCM is particularly well-suited for this task as it is very fast and highly
robust against outlying observations and therefore often yields a reliable starting value. Another application of the SSCM is
to testing for sphericity [29], which uses the asymptotic properties of the SSCM in order to assess whether the underlying
distribution of the data deviates substantially from a spherical distribution. Serneels et al. [28] use the spatial sign transform
as an initial preprocessing step in order to obtain a robust version of partial least squares regression. Finally, Boente et al. [1]
study the SSCM as an operator for functional data analysis.
The next section introduces a generalization of the SSCM and studies its properties. Section 3 compares the performance of several members of this class in a small simulation study. Section 4 applies the method to a real data example, and Section 5 concludes. All proofs can be found in the Appendix.
2. Methodology
2.1. Definition
Definition 1. Let X be a p-variate random variable and µ a vector serving as its center. Define the generalized spatial sign covariance matrix (GSSCM) of X by

S_{g_X}(X) = E_{F_X}{ g_X(X − µ) g_X(X − µ)⊤ },  (1)

where the function g_X is of the form

g_X(t) = t ξ_X(∥t∥),  (2)

where we call ξ_X : R⁺ → R⁺ the radial function and ∥·∥ is the Euclidean norm.

Note that the form of g_X in (2) precisely characterizes an orthogonally equivariant data transformation, as shown in [13], p. 276. Also note that the regular covariance matrix corresponds to ξ_X(r) = 1, and that ξ_X(r) = 1/r yields the SSCM.
For a finite data set X = {x_1, ..., x_n} the GSSCM is given by

S_{g_X}(X) = (1/n) Σ_{i=1}^n ξ_X²{∥x_i − T(X)∥} {x_i − T(X)}{x_i − T(X)}⊤,  (3)

where T is a location estimator. Note that the SSCM gives the x_i with ∥x_i − T(X)∥ < 1 a weight higher than 1, but in general this is not required. In fact, the other functions we will propose satisfy ξ_X(r) ≤ 1 for all r.
In the above definitions, we added the subscript X or X to the functions g and ξ to indicate that they can depend on the random variable X or on the dataset X. In what follows we will drop these subscripts to ease the notational burden. We will study the following functions ξ:
1. Winsorizing (Winsor):

ξ(r) = 1 if r ≤ Q2,   ξ(r) = Q2/r if Q2 < r.  (4)

2. Quadratic Winsor (Quad):

ξ(r) = 1 if r ≤ Q2,   ξ(r) = Q2²/r² if Q2 < r.  (5)

3. Ball:

ξ(r) = 1 if r ≤ Q2,   ξ(r) = 0 if Q2 < r.  (6)

4. Shell:

ξ(r) = 0 if r < Q1,   ξ(r) = 1 if Q1 ≤ r ≤ Q3,   ξ(r) = 0 if Q3 < r.  (7)
5. Linearly Redescending (LR):

ξ(r) = 1 if r ≤ Q2,   ξ(r) = (Q3* − r)/(Q3* − Q2) if Q2 < r ≤ Q3*,   ξ(r) = 0 if Q3* < r.  (8)
The cutoffs Q1, Q2, Q3 and Q3* depend on the Euclidean distances ∥x_i − T(X)∥ by

Q1 = [ hmed_i{∥x_i − T(X)∥^{2/3}} − hmad_i{∥x_i − T(X)∥^{2/3}} ]^{3/2},
Q2 = [ hmed_i{∥x_i − T(X)∥^{2/3}} ]^{3/2} = hmed_i{∥x_i − T(X)∥},
Q3 = [ hmed_i{∥x_i − T(X)∥^{2/3}} + hmad_i{∥x_i − T(X)∥^{2/3}} ]^{3/2},
Q3* = [ hmed_i{∥x_i − T(X)∥^{2/3}} + 1.4826 × hmad_i{∥x_i − T(X)∥^{2/3}} ]^{3/2},

where hmed and hmad are variations on the median and the median absolute deviation, given by the order statistic hmed(y_1, ..., y_n) = y_(h) and hmad(y_1, ..., y_n) = hmed_i |y_i − hmed_j(y_j)|, where h = ⌊(n + p + 1)/2⌋. The 2/3 power in these formulas is the Wilson–Hilferty transformation [33] to near normality. In Appendix A.1 it is verified that this transformation brings the above cutoffs close to the theoretical ones, which are quantiles of a convolution of Gamma random variables with different scale parameters.
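To make the preceding definitions concrete, here is a minimal R sketch of the cutoffs and the five radial functions, together with the GSSCM of Eq. (3). It is only an illustration under the stated formulas; all function and variable names are ours, and the authors' released code (see the Software availability section) should be preferred in practice.

# Minimal R sketch of the cutoffs and the radial functions (4)-(8).
# 'Tx' is a location estimate, e.g. the k-step LTS of Section 2.3.
gsscm_cutoffs <- function(X, Tx) {
  n <- nrow(X); p <- ncol(X)
  d <- sqrt(rowSums(sweep(X, 2, Tx)^2))   # Euclidean distances ||x_i - T(X)||
  h <- floor((n + p + 1) / 2)
  y <- d^(2/3)                            # Wilson-Hilferty transform
  hmed <- sort(y)[h]                      # h-th order statistic
  hmad <- sort(abs(y - hmed))[h]
  list(Q1 = (hmed - hmad)^(3/2), Q2 = hmed^(3/2),
       Q3 = (hmed + hmad)^(3/2), Q3star = (hmed + 1.4826 * hmad)^(3/2))
}

xi_winsor <- function(r, Q) ifelse(r <= Q$Q2, 1, Q$Q2 / r)
xi_quad   <- function(r, Q) ifelse(r <= Q$Q2, 1, Q$Q2^2 / r^2)
xi_ball   <- function(r, Q) as.numeric(r <= Q$Q2)
xi_shell  <- function(r, Q) as.numeric(r >= Q$Q1 & r <= Q$Q3)
xi_lr     <- function(r, Q) ifelse(r <= Q$Q2, 1,
               ifelse(r <= Q$Q3star, (Q$Q3star - r) / (Q$Q3star - Q$Q2), 0))

# GSSCM of Eq. (3) for a given radial function xi
gsscm <- function(X, Tx, xi) {
  Q  <- gsscm_cutoffs(X, Tx)
  Xc <- sweep(X, 2, Tx)                   # centered data
  w  <- xi(sqrt(rowSums(Xc^2)), Q)        # weights xi(||x_i - T(X)||)
  crossprod(Xc * w) / nrow(X)             # (1/n) sum_i w_i^2 (x_i - T)(x_i - T)^T
}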
Fig. 1. Radial functions ξ in Eq. (2).

Fig. 1 shows the above functions ξ and that of the SSCM for distances whose square follows the χ²₂ distribution. The ξ of the SSCM is the only one which upweights observations close to the center. The Winsor ξ and its square have a similar shape, but the latter goes down faster. The Ball and Shell ξ functions are both designed to give a weight of 1 to half (in fact, h) of the data points and 0 to the remainder, to make them comparable. Ball does this by giving a weight of 1 to the h points with the smallest distances. Shell is inspired by the idea of Rocke to downweight observations both with very high and very low distances from the center [25]. The Linearly Redescending ξ is a compromise between the Ball and the Quad ξ functions.
2.2. Preservation of the eigenstructure
In what follows, we assume that the distribution F_X of X has an elliptical density with center zero and that its covariance matrix Σ = E_{F_X}(XX⊤) exists. Therefore, X can be written as X = UDZ, where U is a p×p orthogonal matrix, D is a p×p diagonal matrix with strictly positive diagonal elements, and Z is a p-variate random variable which is spherically symmetric, i.e., its density is of the form f_Z(z) ∝ w(∥z∥), where w is a decreasing function. Assume without loss of generality that the covariance matrix of Z is I_p. The following proposition says that S_g(X) has the same eigenvectors as Σ and preserves the ranks of the eigenvalues.

Proposition 1. Let X = UDZ be a p-variate random variable as described above, with D = diag(δ_1, ..., δ_p) where δ_1 ≥ ··· ≥ δ_p > 0. Assume that the covariance matrix S_g = E_{F_X}{g(X)g(X)⊤} of g(X) exists. Then Σ and S_g can be diagonalized as

Σ = UΛU⊤  and  S_g = UΛ_gU⊤,

where Λ = diag(λ_1, ..., λ_p) with λ_j = δ_j², and Λ_g = diag(λ_{g,1}, ..., λ_{g,p}) with λ_{g,1} ≥ ··· ≥ λ_{g,p} > 0 and λ_j = λ_{j+1} ⟺ λ_{g,j} = λ_{g,j+1}.
This proposition justifies the generalized SSCM approach.
2.3. Location estimator
So far we have not specified any location estimator T. For the SSCM the most often used location estimator is the spatial median (see, e.g., [2] and [12]), which we denote by T_0. The spatial median of a dataset X = {x_1, ..., x_n} is defined as

T_0(X) = argmin_θ Σ_{i=1}^n ∥x_i − θ∥.

In order to improve its robustness against a substantial fraction of outliers, we propose to use the k-step least trimmed squares (LTS) estimator. The LTS method was originally proposed in regression [26], and for multivariate location it becomes

T_LTS(X) = argmin_θ Σ_{i=1}^h (∥x − θ∥²)_(i),

where the subscript (i) stands for the ith smallest squared distance. Without the square this becomes the least trimmed absolute distance estimator studied in [3]. For the multivariate location LTS the C-step of [27] simplifies to:

Definition 2 (C-step). Fix h = ⌊(n + 1)/2⌋. Given a location estimate T_{j−1}(X), we take the set I_j = {i_1, ..., i_h} ⊂ {1, ..., n} such that {∥x_i − T_{j−1}(X)∥ : i ∈ I_j} are the h smallest distances in the set {∥x_i − T_{j−1}(X)∥ : i = 1, ..., n}. The C-step then yields

T_j(X) = (1/h) Σ_{i∈I_j} x_i.

The C-step is fast to compute, and guaranteed to lower the LTS objective. The k-step LTS is then the result of k successive C-steps starting from the spatial median T_0(X).
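A minimal R sketch of this estimator follows; the spatial median is computed here with plain Weiszfeld iterations (the paper cites [2,12] but does not prescribe a specific algorithm), and all names are ours.

# k-step LTS location estimator of Section 2.3 (illustrative sketch).
spatial_median <- function(X, iter = 100, tol = 1e-10) {
  theta <- colMeans(X)
  for (it in seq_len(iter)) {
    d <- pmax(sqrt(rowSums(sweep(X, 2, theta)^2)), tol)  # avoid dividing by 0
    theta_new <- colSums(X / d) / sum(1 / d)             # Weiszfeld update
    if (sqrt(sum((theta_new - theta)^2)) < tol) break
    theta <- theta_new
  }
  theta
}

kstep_lts <- function(X, k = 5) {
  n <- nrow(X)
  h <- floor((n + 1) / 2)
  theta <- spatial_median(X)                  # T_0
  for (j in seq_len(k)) {                     # k C-steps (Definition 2)
    d2 <- rowSums(sweep(X, 2, theta)^2)
    I  <- order(d2)[1:h]                      # h points closest to T_{j-1}
    theta <- colMeans(X[I, , drop = FALSE])   # their mean is T_j
  }
  theta
}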
It is also possible to avoid the estimation of location altogether, by calculating the GSSCM on the O(n²) pairwise differences of the data points. This approach is called the “symmetrization” of an estimator, but is more computationally intensive. Visuri et al. [31] studied the symmetrized SSCM and called it Kendall's τ covariance matrix.
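Symmetrization is only a few lines on top of the gsscm sketch above; note the O(n²) memory cost of forming all pairwise differences. This is an illustration, not the implementation of [31].

# GSSCM of all pairwise differences, avoiding a location estimate.
gsscm_symmetrized <- function(X, xi) {
  idx <- t(combn(nrow(X), 2))                  # all pairs i < j
  D <- X[idx[, 1], , drop = FALSE] - X[idx[, 2], , drop = FALSE]
  gsscm(D, rep(0, ncol(X)), xi)                # differences are centered at 0
}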
2.4. Robustness properties
A major reason for the SSCM’s popularity is its robustness against outliers. Robustness can be quantified by the influence
function and the breakdown value. We will study both for the GSSCM.
The influence function [13] quantifies the effect of a small amount of contamination on a statistical functional T. Consider the contaminated distribution F_{ε,z} = (1 − ε)F + εΔ(z), where Δ(z) is the distribution that puts all its mass in z. The influence function of T at F is then given by

IF(z, T, F) = lim_{ε→0} [T(F_{ε,z}) − T(F)] / ε = ∂/∂ε T(F_{ε,z}) |_{ε=0}.
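At a finite sample this definition can be mimicked by an empirical sensitivity curve: replace a small fraction ε of the data by the contamination point z and rescale the change in the estimate. The sketch below is our own generic illustration, not a tool used in the paper.

# Finite-sample approximation of IF(z, T, F) for a statistic 'stat'
# operating on a data matrix, by replacing ceil(eps * n) points with z.
sens_curve <- function(stat, X, z, eps = 0.01) {
  n <- nrow(X)
  m <- ceiling(eps * n)
  Xc <- X
  Xc[sample(n, m), ] <- matrix(z, m, ncol(X), byrow = TRUE)
  (stat(Xc) - stat(X)) / (m / n)               # difference quotient in epsilon
}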
For the generalized SSCM class we obtain the following result:

Proposition 2. Denote S_g(F) = Ξ_g and let µ = 0 in (1). The influence function of S_g at the distribution F is given by

IF(z, S_g, F) = ∂/∂ε S_g(F_{ε,z}) |_{ε=0} = g(z)g(z)⊤ − Ξ_g + ∫ ∂/∂ε {g_ε(X)g_ε(X)⊤} dF(X) |_{ε=0}.  (9)

If g does not depend on F, the last term of (9) vanishes. For example, for g(t) = t we retrieve the IF of the classical covariance matrix IF(z, Σ, F) = zz⊤ − Σ, and for g(t) = t/∥t∥ we obtain IF(z, SSCM, F) = (z/∥z∥)(z/∥z∥)⊤ − SSCM(F), in line with the findings of [5]. For the GSSCM estimators defined by the functions (4)–(8) the last term of (9) remains, and the expressions of their IF can be found in Appendix A.3.
In order to visualize the influence function we consider the bivariate standard normal case, i.e., F = N(0, I_2). We put contamination at (z, z) or (z, 0) for different values of z and plot the IF for the diagonal elements and the off-diagonal element. Note that we cannot compare the raw IFs directly as S_g(F) = Ξ_g = c_g I_2, where c_g = ∫ g_1(X)² dF(X) with g_1 the first component of g; hence Ξ_g is only equal to I_2 up to a factor. In order to make the estimators consistent for this distribution, we can divide them by c_g, and so we plot IF(z, S_g, F)/c_g in Fig. 2.

Fig. 2. Influence functions of the GSSCM at the bivariate standard normal distribution for contamination at (z, z) (left) and (z, 0) (right). The rows correspond to the first diagonal element S11 (top), the off-diagonal element S12 (middle), and S22 (bottom).

The rows in Fig. 2 correspond to the IF of the first diagonal element S11 (top), the off-diagonal element S12 (middle) and the element S22 (bottom). Let us first consider the left part of the figure, which contains the IFs for an outlier in (z, z). By symmetry, the IFs of the diagonal elements S11 and S22 are the same here. In the regions where the function ξ is 1 the IF is quadratic, like that of the classical covariance. The diagonal elements of the IF of the SSCM are zero, except at z = 0 where it takes the value −1. The Quad IF is the only one which redescends as |z| increases, whereas the others are also bounded but stabilize at a value around 1.3. The shape of the IF of the Ball estimator resembles that of the univariate Huber M-estimator of scale.
For the IF of the off-diagonal element S12, the picture is very different. All are redescending except for the SSCM and Winsor. Here it is Winsor whose IF resembles that of Huber's M-estimator of scale. Note that the IFs of the Ball and Shell estimators have large jumps at their cutoff values. The discontinuities in the IFs are due to the fact that the cutoffs depend on the median and the MAD of the distances ∥X∥^{2/3}, as both the median and the MAD have jumps in their IF.
The right panel of Fig. 2 shows the influence functions for an outlier in (z, 0). In this case the IFs of the diagonal elements S11 and S22 are no longer the same, as the symmetry is broken. The IFs of S11 are again quadratic where ξ = 1, with jumps at the cutoffs. Note that these cutoffs are now located at different values of z, as ∥(z, 0)∥ ≠ ∥(z, z)∥. The IF of the off-diagonal element is constant at 0, indicating that S12 remains zero even when there is an outlier at (z, 0). Finally, for the second diagonal element S22 the IF of the SSCM is −1. This is because adding ε of contamination at (z, 0) reduces the mass of the remaining part of F by ε, which lowers the estimated scatter in the vertical direction. For the other estimators there is an additional effect of (z, 0) on the cutoffs, which causes the discontinuities.
A second tool for quantifying the robustness of an estimator is the finite-sample breakdown value [6]. For a multivariate location estimator T and a dataset X of size n, the breakdown value is the smallest fraction of the data that needs to be replaced by contamination to make the resulting location estimate lie arbitrarily far away from the original location T(X). More precisely,

ε*(T, X) = min{ m/n : sup_{X*_m} ∥T(X*_m) − T(X)∥ = ∞ },

where X*_m ranges over all datasets obtained by replacing any m points of X by arbitrary points.

For a multivariate estimator of scale S, the breakdown value is defined as the smallest fraction of contamination needed to make an eigenvalue of S either arbitrarily large or arbitrarily close to zero. We denote the eigenvalues of S(X) by λ_1{S(X)} ≥ ··· ≥ λ_p{S(X)}. The breakdown value of S is then given by

ε*(S, X) = min{ m/n : sup_{X*_m} max[ λ_1{S(X*_m)}, 1/λ_p{S(X*_m)} ] = ∞ }.
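These definitions can be checked empirically: the sketch below (reusing gsscm, xi_lr and kstep_lts from Section 2) replaces m points by gross outliers and monitors the largest eigenvalue and the inverse of the smallest one. The chosen values of m straddle the breakdown point derived in Proposition 3 below; this is a numerical illustration only.

# Empirical breakdown check for the LR-based GSSCM.
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
for (m in c(0, 10, 30, 48)) {                  # floor((n - p + 1)/2) = 48 here
  Xm <- X
  if (m > 0) Xm[1:m, ] <- matrix(1e6, m, p)    # gross contamination
  ev <- eigen(gsscm(Xm, kstep_lts(Xm), xi_lr),
              symmetric = TRUE, only.values = TRUE)$values
  cat(sprintf("m = %2d : lambda_1 = %.2e, 1/lambda_p = %.2e\n",
              m, max(ev), 1 / min(ev)))
}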
For the results on breakdown we assume the following conditions on the function ξ:

1. The function ξ takes values in [0, 1].
2. For any dataset X, one has #{x_i : ξ{∥x_i − T(X)∥} = 1} ≥ ⌊(n + p + 1)/2⌋.
3. For any vector t, one has ∥g(t)∥ = ∥t∥ ξ(∥t∥) ≤ hmed_i(d_i) + 1.4826 × hmad_i(d_i), where d_i = ∥x_i − T(X)∥.

Note that all functions ξ proposed in (4)–(8) satisfy these assumptions. The following proposition gives the breakdown value of the GSSCM scatter estimator S_g.

Proposition 3. Let X = {x_1, ..., x_n} be a p-dimensional dataset in general position, meaning that no p + 1 points lie on the same hyperplane. Also assume that the location estimator T has a breakdown value of at least ⌊(n − p + 1)/2⌋/n. Then

ε*(S_g, X) = ⌊(n − p + 1)/2⌋ / n.
As we would like the GSSCM scatter estimator to attain this breakdown value, we have to use a location estimator whose breakdown value is at least ⌊(n − p + 1)/2⌋/n. The following proposition verifies that the k-step LTS estimator satisfies this, and even attains the best possible breakdown value for translation equivariant location estimators.

Proposition 4. The k-step LTS estimator T_k satisfies ε*(T_k, X) = ⌊(n + 1)/2⌋/n at any p-variate dataset X = {x_1, ..., x_n}. When the C-steps are iterated until convergence (k → ∞), the breakdown value remains the same.
3. Simulation study
We now perform a simulation study comparing the GSSCM versions (4)(8). As the estimators are orthogonally
equivariant, it suffices to generate diagonal covariance matrices. We generate m=1000 samples of size n=100 from
the multivariate Gaussian distribution of dimension p=10 with center µ=0and covariance matrices Σ1=Ip(‘constant
eigenvalues’), Σ2=diag(10,9,...,1) (‘linear eigenvalues’), and Σ3=diag(102,92,...,1) (‘quadratic eigenvalues’). To
assess robustness we also add 20% and 40% of contamination in the direction of the last eigenvector, at the point (0,...,0, γ )
for several values of γ. For the location estimator Tin (3) we used the k-step LTS with k=5.
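The data-generating step of one replication is straightforward in base R; the sketch below uses the quadratic eigenvalue setting with 20% contamination (our own minimal version of the setup described above).

# One simulation replication: Gaussian data with covariance Sigma_3 and
# 20% of outliers placed at (0, ..., 0, gamma).
set.seed(1)
n <- 100; p <- 10; gamma <- 50
X <- matrix(rnorm(n * p), n, p) %*% diag(10:1)   # covariance diag(10^2, ..., 1)
out <- 1:(0.2 * n)
X[out, ] <- matrix(c(rep(0, p - 1), gamma), length(out), p, byrow = TRUE)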
For measuring how much the estimated Σ̂ deviates from the true Σ we use the Kullback–Leibler divergence (KLdiv) given by

KLdiv(Σ̂, Σ) = trace(Σ̂Σ⁻¹) − ln{det(Σ̂Σ⁻¹)} − p.

We also consider the shape matrices Γ̂ = {det(Σ̂)}^{−1/p} Σ̂ and Γ = {det(Σ)}^{−1/p} Σ, which have determinant 1, and compute KLdivshape(Σ̂, Σ) = KLdiv(Γ̂, Γ). Both the KLdiv and the KLdivshape are then averaged over the m = 1000 replications.
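Both criteria are easy to code; the following R helpers are a direct transcription of the formulas above (the function names are ours).

# Kullback-Leibler divergence between estimated and true covariance matrices
kldiv <- function(Sigma_hat, Sigma) {
  M <- Sigma_hat %*% solve(Sigma)
  as.numeric(sum(diag(M)) - determinant(M, logarithm = TRUE)$modulus) - nrow(Sigma)
}

# Shape version: normalize both matrices to determinant 1 first
kldiv_shape <- function(Sigma_hat, Sigma) {
  p <- nrow(Sigma)
  kldiv(Sigma_hat / det(Sigma_hat)^(1/p), Sigma / det(Sigma)^(1/p))
}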
Fig. 3. Simulation results: KLdiv (left) and KLdivshape (right) for the uncontaminated normal distribution, with constant, linear and quadratic eigenvalues.
Fig. 3 shows the simulation results on the uncontaminated data. Looking at KLdiv (left panel), we note that the SSCM
deviates the most from the true covariance matrix Σ. Among the other choices, Winsor and Quad have the lowest bias,
followed by LR, Shell, and Ball. When looking only at the shape component (right panel), SSCM performs the best when the
distribution is spherical (constant eigenvalues), in line with Remark 3.1 in [19]. However, it loses this dominant performance
once the distribution deviates from sphericity. Among the other GSSCM methods Winsor performs the best, followed by its
quadratic counterpart, LR, Shell, and finally Ball.
Fig. 4. Simulation results: KLdiv (left) and KLdivshape (right) for the normal distribution with constant (top), linear (middle) and quadratic (bottom) eigenvalues and 20% of contamination. The outliers were placed at the point (0, ..., 0, γ).

The result for the simulation with 20% of point contamination is presented in Fig. 4. All plots are as a function of γ, which indicates the position of the outliers. In the left panel (KLdiv), the SSCM has a large bias. The Winsor GSSCM, which did very well in the uncontaminated setting, now has a disappointing performance when the eigenstructure becomes more challenging with linear or quadratic eigenvalues. Quad performs a lot better, but also suffers under quadratic eigenvalues. LR and Shell perform the best here, followed by Ball. Their redescending nature helps them for far outliers. The conclusions for the shape component (right panel) are largely similar, except that Winsor and especially Ball look worse here.
Fig. 5. Simulation results: KLdiv (left) and KLdivshape (right) for the normal distribution with constant (top), linear (middle) and quadratic (bottom) eigenvalues and 40% of point contamination.

The simulation results for 40% of contamination are shown in Fig. 5. The KLdiv plots on the left indicate that the SSCM performs poorly for constant and linear eigenvalues, and looks better for quadratic eigenvalues but not when γ is large (far outliers). Winsor performs badly for linear and quadratic eigenvalues, whereas Quad does much better. Ball looks okay except for relatively small γ. LR and Shell perform the best for both small and large γ, and are okay for intermediate γ. When estimating the shape component (right panels) SSCM and Winsor have the worst performance overall, whereas Ball also does poorly for small to intermediate γ. LR and Shell are the best picks here. Quad does almost as well, but redescends more slowly.
4. Application: Principal component analysis
We analyze a multivariate dataset from a study by Reaven and Miller [24]. The dataset contains five numerical variables for 109 subjects, consisting of 33 overt diabetes patients and 76 healthy people. The variables are body weight, fasting plasma glucose, area under the plasma glucose curve, area under the plasma insulin curve, and steady state plasma glucose response. These data were previously analyzed in [22] in the context of clustering using statistical data depth, and are available in the R package ddalpha [23] under the tag chemdiab_2vs3. Here we analyze the data by principal component analysis. We first standardize the data, as the variables have quite different scales. Denote the standardized observations by z_i for i ∈ {1, ..., 109}.

We consider the diabetes patients as outliers and would like the PCA subspace to model the variability within the healthy patients. For classical PCA, the PCA subspace corresponds to the linear span of the k eigenvectors (also called ‘loadings’) of the covariance matrix which correspond with the k largest eigenvalues. In similar fashion we can perform PCA based on the GSSCM with the LR radial function (8), by considering the linear span of its k first eigenvectors. We take k = 3 components, thereby explaining more than 95% of the variance.
Fig. 6. Scores from the 3 first loading vectors of classical PCA (left) and GSSCM PCA (right).

Fig. 6 shows the scores with respect to the first 3 loadings for classical PCA and GSSCM PCA. The scores s_i are the projections of the observations z_i onto the PCA subspace, i.e., s_{i,j} = z_i⊤ v_j, where v_j denotes the jth eigenvector. From these plots, it is clear that the first eigenvector of the classical PCA is heavily attracted by the diabetes patients. As a result, the outliers are only
distinguishable in their scores with respect to the first principal component. This is very different for the GSSCM PCA, where
the principal components seem to fit the healthy patients better, resulting in outlying scores for the diabetes patients with
respect to several principal components.
In addition to the scores plots, the PCA outlier map of [15] can serve as a diagnostic tool for identifying outliers. It plots the orthogonal distance OD_i against the score distance SD_i for every observation z_i in the dataset. The score distance of observation i captures the distance between the observation and the center of the data within the PCA subspace. It is given by

SD_i = √( Σ_{j=1}^3 (s_{ij}/σ̂_j)² ),

where σ̂_j denotes the scale of the jth scores. For classical PCA σ̂_j is their standard deviation, whereas for GSSCM PCA we take their median absolute deviation. The orthogonal distance to the PCA subspace is given by OD_i = ∥z_i − V s_i∥, where V is the 5 × 3 matrix containing the three eigenvectors in its columns. Both the score distances and the orthogonal distances have cutoffs, described in [15].

Fig. 7. Outlier maps based on classical PCA (left) and GSSCM PCA (right).

Fig. 7 shows the outlier maps resulting from the classical PCA and the GSSCM PCA. Classical PCA clearly fails to distinguish the diabetes patients from the healthy subjects. In contrast, GSSCM PCA flags most of the diabetes patients as having both an abnormally high orthogonal distance to the PCA subspace as well as having a projection in the PCA subspace far away from those of the healthy subjects.
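The quantities in the outlier map are easy to reproduce. The sketch below computes GSSCM-based loadings, scores, score distances and orthogonal distances, reusing the gsscm, kstep_lts and xi_lr functions sketched in Section 2; the cutoff lines of [15] are omitted and the names are ours.

# GSSCM PCA with k components, plus score and orthogonal distances.
# Z is the standardized n x p data matrix.
gsscm_pca <- function(Z, k = 3) {
  S <- gsscm(Z, kstep_lts(Z), xi_lr)               # GSSCM with LR radial function
  V <- eigen(S, symmetric = TRUE)$vectors[, 1:k]   # loadings: first k eigenvectors
  scores <- Z %*% V                                # s_{ij} = z_i' v_j
  SD <- sqrt(rowSums(sweep(scores, 2, apply(scores, 2, mad), "/")^2))
  OD <- sqrt(rowSums((Z - scores %*% t(V))^2))     # OD_i = ||z_i - V s_i||
  list(loadings = V, scores = scores, SD = SD, OD = OD)
}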
5. Conclusions
The spatial sign covariance matrix (SSCM) can be seen as a member of a larger class called Generalized SSCM (GSSCM)
estimators in which other radial functions are allowed. It turns out that the GSSCM estimators are still consistent for the true
eigenvectors while preserving the ranks of the eigenvalues. Their computation is as fast as that of the SSCM. We have studied five
GSSCM methods with intuitively appealing radial functions, and shown that their asymptotic breakdown values are as high
as that of the original SSCM. We also derived their influence functions and carried out a simulation study.
The radial function of the SSCM is ξ(r) = 1/r, which implies that points near the center are given a very high weight in the covariance computation. Our alternative radial functions give these points a weight of at most 1, which yields better performance at uncontaminated Gaussian data (Fig. 3) as well as contaminated data (Figs. 4 and 5). In particular, Winsor is the most similar to SSCM since its ξ(r) is 1 for the central half of the data and Q2/r for the outer half. It performs best for uncontaminated data, but still suffers when far outliers are present. It is almost uniformly outperformed by Quad, whose ξ(r) is 1 in the central half and Q2²/r² outside it. The influence of outliers on Quad smoothly redescends to zero. The other three estimators are hard redescenders whose ξ(r) = 0 for large enough r. Among them, the linearly redescending (LR) radial function performed best overall.
A potential topic for further research is to investigate principal component analysis based on a GSSCM covariance matrix.
Software availability
R code for computing these estimators and an example script are available from the website wis.kuleuven.be/stat/robust/software.
Acknowledgments
This research was supported by projects of Internal Funds KU Leuven, Belgium.
Appendix
A.1. Distribution of Euclidean distances
Exact distribution. The exact distribution of the squared Euclidean distances ∥X∥² of a multivariate Gaussian distribution with general covariance matrix is given by the following result:

Proposition 5. Let X ∼ N(0, Σ), and suppose the eigenvalues of Σ are given by λ_1, ..., λ_p. Then

∥X∥² ∼ Σ_{i=1}^p Γ(1/2, 2λ_i).

For p → ∞ we have ∥X∥² ≈ N( Σ_{i=1}^∞ λ_i, 2 Σ_{i=1}^∞ λ_i² ).

Proof. We can write X = UDZ, where U is an orthogonal matrix, D is the diagonal matrix with elements √λ_1, ..., √λ_p, and Z follows the p-variate standard Gaussian distribution. Note that ∥X∥² = ∥UDZ∥² = ∥DZ∥² = Σ_{i=1}^p λ_i Z_i², where Z_i² ∼ χ²(1). Therefore λ_i Z_i² ∼ Γ(1/2, 2λ_i), so the distribution of ∥X∥² is a sum of independent gamma random variables with a constant shape of 1/2 and varying scale parameters equal to twice the eigenvalues of the covariance matrix. As p → ∞, one has

∥X∥² ≈ N( Σ_{i=1}^∞ λ_i, 2 Σ_{i=1}^∞ λ_i² )

by the Lyapunov Central Limit Theorem.
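Proposition 5 can be verified numerically with the coga package [14]; the sketch below assumes its pcoga(q, shape, rate) interface and compares empirical quantiles of ∥X∥² with the convolution-of-gammas cdf.

# Numerical check of Proposition 5 for linear eigenvalues.
library(coga)
set.seed(1)
p <- 5
lambda <- p:1
X <- matrix(rnorm(1e5 * p), ncol = p) %*% diag(sqrt(lambda))
q <- quantile(rowSums(X^2), c(0.25, 0.50, 0.75))       # empirical quantiles of ||X||^2
pcoga(q, shape = rep(0.5, p), rate = 1 / (2 * lambda)) # should be near 0.25/0.50/0.75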
Approximate distribution of a sum of Gamma variables. Proposition 5 gives the exact distribution of the squared Euclidean distances ∥X∥². The distribution of a sum of gamma distributions has been studied in [21]. Quantiles of this distribution can be computed by the R package coga [14] for convolutions of gamma distributions. However, this computation requires the knowledge of the eigenvalues λ_1, ..., λ_p that we are trying to estimate. Therefore we need a transformation of the Euclidean distances such that the transformed distances have an approximate distribution whose quantiles do not require knowing λ_1, ..., λ_p.

In the simplest case λ_1 = ··· = λ_p (constant eigenvalues), ∥X∥²/λ_1 follows a χ²_p distribution. It is known that when p increases the distribution of ∥X∥² tends to a Gaussian distribution, but this also holds for some other powers of ∥X∥. Wilson and Hilferty [33] found that the best transformation of this type was ∥X∥^{2/3}, in the sense of coming closest to a Gaussian distribution. The quantiles q_α of a Gaussian distribution are easier to compute and can then be transformed back to q_α^{3/2}.
It turns out that the same Wilson–Hilferty transformation also works quite well in the more general situation where the eigenvalues λ_1, ..., λ_p need not be the same. We came to this conclusion by a simulation study, a part of which is illustrated here. The dimension p ranged from 1 to 20 by steps of 1. For each p, we generated n = 10⁶ observations y_1, ..., y_n from the coga distribution with shape parameters (0.5, ..., 0.5). The scale parameters had three settings: constant (2, ..., 2), linear (p, p − 1, ..., 1), and quadratic (p², (p − 1)², ..., 1), after which the scale parameters were further standardized in order to sum to 2p. These correspond to the distribution of the squared Euclidean norms of a multivariate normal distribution where the covariance matrix has eigenvalues that are constant or proportional to (p, p − 1, ..., 1) (linear eigenvalues) or to (p², (p − 1)², ..., 1) (quadratic eigenvalues). Denote the unsquared Euclidean norms as r_i = √y_i. Then we estimate quantiles, e.g., Q3, by assuming normality of the transformed values h_1(r_i) = r_i² (square), h_2(r_i) = r_i (Fisher), and h_3(r_i) = r_i^{2/3} (Wilson–Hilferty), by computing the third quartile of a Gaussian distribution with µ̂ = median_i{h(r_i)} and σ̂ = mad_i{h(r_i)}. Finally, we have evaluated the cumulative distribution function of the coga distribution in Q̂_3². Ideally, we would like to obtain F_coga(Q̂_3²) = 0.75. The result of this experiment is shown in Fig. 8. We clearly see that the Wilson–Hilferty transform brings the approximate quantile closest to its target value. The results for the first quartile Q1 (not shown) are very similar.

Fig. 8. Approximation of the third quartile of a coga distribution for dimensions p ∈ {1, ..., 20} when the eigenvalues are constant (top left), linear (top right), or quadratic (bottom), using three different normalizing transforms.
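One cell of this experiment takes only a few lines; the sketch below (again assuming the rcoga/pcoga interface of [14]) evaluates how close the Wilson–Hilferty-based estimate of the third quartile comes to its target for linear eigenvalues.

# Wilson-Hilferty approximation of Q3 for a coga distribution.
library(coga)
set.seed(1)
p <- 10
sc <- p:1; sc <- sc * 2 * p / sum(sc)       # linear scales, standardized to sum 2p
y  <- rcoga(1e6, shape = rep(0.5, p), rate = 1 / sc)
r  <- sqrt(y)                               # unsquared Euclidean norms
u  <- r^(2/3)                               # Wilson-Hilferty transform
Q3 <- qnorm(0.75, mean = median(u), sd = mad(u))^(3/2)  # back-transform to r-scale
pcoga(Q3^2, shape = rep(0.5, p), rate = 1 / sc)         # ideally close to 0.75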
A.2. Proof of Proposition 1
Part 1: Preservation of the eigenvectors. First note that g is orthogonally equivariant, i.e., g(HX) = Hg(X) for any orthogonal matrix H. Therefore S_g = E_{F_X}{g(X)g(X)⊤} implies E_{F_X}{g(HX)g(HX)⊤} = H S_g H⊤.

The distribution of Z is spherically symmetric, hence invariant to reflections along a coordinate axis, which are described by diagonal matrices R with an entry of −1 and all other entries +1. For every reflection matrix R it thus holds that

E_{F_Z}{g(DZ)g(DZ)⊤} = E_{F_Z}{g(DRZ)g(DRZ)⊤} = E_{F_Z}{g(RDZ)g(RDZ)⊤} = R E{g(DZ)g(DZ)⊤} R⊤,

where the second equality holds because DR = RD as both D and R are diagonal, and the last equality because R is orthogonal. Therefore E_{F_Z}{g(DZ)g(DZ)⊤} is a diagonal matrix, which we can denote as Λ_g = diag(λ_{g,1}, ..., λ_{g,p}).

Now take U an arbitrary orthogonal matrix and let X = UDZ. Then

S_g = E_{F_Z}{g(UDZ)g(UDZ)⊤} = U E_{F_Z}{g(DZ)g(DZ)⊤} U⊤ = U Λ_g U⊤.

For the plain covariance matrix Σ of X we have Σ = E_{F_Z}{UDZ(UDZ)⊤} = UΛU⊤, where Λ = DD⊤ = diag(δ_1², ..., δ_p²). Therefore, the same matrix U diagonalizes both Σ and S_g, hence S_g and Σ have the same eigenvectors.
Part 2: Preservation of the ranks of the eigenvalues. Let i ≠ j and suppose that δ_i > δ_j. We will show that λ_{g,i} > λ_{g,j}. Note that

λ_{g,i} = ∫ g(DZ)_i² f_Z(Z) dZ = ∫ δ_i² z_i² ξ(∥DZ∥)² f_Z(Z) dZ,

where f_Z is the density of Z. Similarly, we have

λ_{g,j} = ∫ g(DZ)_j² f_Z(Z) dZ = ∫ δ_j² z_j² ξ(∥DZ∥)² f_Z(Z) dZ.

This means that λ_{g,i} > λ_{g,j} is equivalent to

∫ (δ_i² z_i² − δ_j² z_j²) ξ(∥DZ∥)² f_Z(Z) dZ > 0.  (A.1)

As Z is spherically symmetric, i.e., f_Z(Z) ∝ w(∥Z∥), we can write (A.1) as

∫ (δ_i² z_i² − δ_j² z_j²) ξ(∥DZ∥)² w(∥Z∥) dZ > 0.  (A.2)

Note that we can change the variable of integration as follows. Let y_k = δ_k z_k and write Y = (y_1, ..., y_p)⊤. Then (A.2) is equivalent to

(1/(δ_1 ··· δ_p)) ∫ (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_{k=1}^p y_k²/δ_k²)^{1/2} ) dY > 0.  (A.3)

We can ignore the positive constant 1/(δ_1 ··· δ_p) and split the integral over the domains A = {y ∈ R^p : |y_i| > |y_j|} and B = {y ∈ R^p : |y_i| < |y_j|}, yielding

∫ (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY
= ∫_A (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY + ∫_B (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY
= ∫_A (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY + ∫_A (y_j² − y_i²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k² + ∆_ij)^{1/2} ) dY
= ∫_A (y_i² − y_j²) ξ(∥Y∥)² [ w( (Σ_k y_k²/δ_k²)^{1/2} ) − w( (Σ_k y_k²/δ_k² + ∆_ij)^{1/2} ) ] dY,

where in the second equality we have changed the variables of the integration over B by replacing (y_i, y_j) by (y_j, y_i), which has Jacobian 1. The ∆_ij in that step is the correction term

∆_ij = y_i²/δ_j² + y_j²/δ_i² − y_i²/δ_i² − y_j²/δ_j² = (y_i² − y_j²)/δ_j² − (y_i² − y_j²)/δ_i² = (y_i² − y_j²)(1/δ_j² − 1/δ_i²).

Note that on A it holds that |y_i| > |y_j|, hence y_i² − y_j² > 0, so ∆_ij > 0. Since w is a decreasing function, it follows that

w( (Σ_{k=1}^p y_k²/δ_k²)^{1/2} ) − w( (Σ_{k=1}^p y_k²/δ_k² + ∆_ij)^{1/2} ) > 0,

which implies (A.3), so λ_{g,i} > λ_{g,j}. In contrast, if δ_i and δ_j are tied, i.e., δ_i = δ_j, it follows that ∆_ij = 0, hence λ_{g,i} = λ_{g,j}. This concludes the proof of Proposition 1.
A.3. Influence functions
Proof of Proposition 2. Consider the contaminated distribution F_{ε,z} = (1 − ε)F + εΔ(z), where z ∈ R^p and ε ∈ [0, 1]. We then have

S_g(F_{ε,z}) = E_{F_{ε,z}}{g_ε(X)g_ε(X)⊤} = (1 − ε) ∫ g_ε(X)g_ε(X)⊤ dF(X) + ε ∫ g_ε(X)g_ε(X)⊤ dΔ(z)(X).

If we take the derivative with respect to ε and evaluate it in ε = 0, we get

∂/∂ε S_g(F_{ε,z}) |_{ε=0} = g(z)g(z)⊤ − Ξ_g + ∫ ∂/∂ε {g_ε(X)g_ε(X)⊤} dF(X) |_{ε=0}.
Calculation of the IF. While the expression of the influence function might seem relatively simple, its (numerical) calculation is rather involved. We can write

∂/∂ε ∫ g_ε(X)g_ε(X)⊤ dF(X) |_{ε=0} = ∫ [ {∂/∂ε g_ε(X)} g_ε(X)⊤ + g_ε(X) {∂/∂ε g_ε(X)⊤} ] dF(X) |_{ε=0}
= ∫ [ ∂/∂ε g_ε(X)|_{ε=0} g(X)⊤ + g(X) ∂/∂ε g_ε(X)⊤|_{ε=0} ] dF(X).

So the term we need to determine is ∂g_ε(X)/∂ε |_{ε=0}. Recalling that g(t) = t ξ(∥t∥), we have g_ε(t) = t ξ_ε(∥t∥). This means that the contamination affects g because it affects the radial function ξ. Therefore we have to compute ∂g_ε(X)/∂ε |_{ε=0} = X ∂ξ_ε(∥X∥)/∂ε |_{ε=0} for the functions g given by (4)–(8).

In these functions ξ depends on F_X through the distribution of ∥X∥^{2/3}. Suppose that ∥X∥^{2/3} ∼ G when X ∼ F, so G is a univariate distribution. For X_ε ∼ F_{ε,z} = (1 − ε)F + εΔ(z) we then have ∥X_ε∥^{2/3} ∼ G_{ε,∥z∥^{2/3}} = (1 − ε)G + εΔ(∥z∥^{2/3}). For uncontaminated data the density of ∥X∥^{2/3} is given by

f_G(t) = f_coga(t³) |3t²|,

where f_coga is the density of the convolution of gamma distributions. We need this density to evaluate the influence functions of their median and mad. The cutoffs in the paper are

Q1 = (hmed ∥X∥^{2/3} − hmad ∥X∥^{2/3})^{3/2},   Q2 = (hmed ∥X∥^{2/3})^{3/2},
Q3 = (hmed ∥X∥^{2/3} + hmad ∥X∥^{2/3})^{3/2},   Q3* = (hmed ∥X∥^{2/3} + 1.4826 × hmad ∥X∥^{2/3})^{3/2},

and we can compute their influence functions, viz.

IF(z, Q1, F) = (3/2) {median(G) − mad(G)}^{1/2} {IF(∥z∥^{2/3}, median, G) − IF(∥z∥^{2/3}, mad, G)},
IF(z, Q2, F) = (3/2) {median(G)}^{1/2} IF(∥z∥^{2/3}, median, G),
IF(z, Q3, F) = (3/2) {median(G) + mad(G)}^{1/2} {IF(∥z∥^{2/3}, median, G) + IF(∥z∥^{2/3}, mad, G)},
IF(z, Q3*, F) = (3/2) {median(G) + 1.4826 × mad(G)}^{1/2} {IF(∥z∥^{2/3}, median, G) + 1.4826 × IF(∥z∥^{2/3}, mad, G)}.
The Winsor GSSCM is given by ξ(r) = 1_{r ≤ Q2} + (Q2/r) 1_{r > Q2}. For the contaminated case this becomes ξ_ε(r) = 1_{r ≤ Q2ε} + (Q2ε/r) 1_{r > Q2ε}. We then have

∂/∂ε ξ_ε(r) = ∂/∂ε [ 1_{[0,Q2ε]}(r) + (Q2ε/r) 1_{(Q2ε,∞)}(r) ]
= δ(r − Q2ε) Q2ε′ + (Q2ε′/r) 1_{(Q2ε,∞)}(r) − (Q2ε/r) δ(r − Q2ε) Q2ε′,

where Q2ε′ denotes ∂Q2ε/∂ε and δ(x − y) denotes the distributional derivative of 1_{(−∞,x]}(y) = 1_{[y,∞)}(x) with respect to x. Evaluation in ε = 0 gives

δ(r − Q2) IF(z, Q2, F) + {IF(z, Q2, F)/r} 1_{(Q2,∞)}(r) − (Q2/r) δ(r − Q2) IF(z, Q2, F)
= (1 − Q2/r) δ(r − Q2) IF(z, Q2, F) + {IF(z, Q2, F)/r} 1_{(Q2,∞)}(r).

As (1 − Q2/r) δ(r − Q2) is 0 everywhere, we only need to integrate the last term. This yields

∂/∂ε g_ε(X) |_{ε=0} = (X/∥X∥) IF(z, Q2, F) 1_{(Q2,∞)}(∥X∥).

The influence function of S_g is thus given by

IF(z, S_g, F) = g(z)g(z)⊤ − Ξ_g(F)
+ ∫ (X/∥X∥) IF(z, Q2, F) 1_{(Q2,∞)}(∥X∥) g(X)⊤ dF(X) + ∫ g(X) (X⊤/∥X∥) IF(z, Q2, F) 1_{(Q2,∞)}(∥X∥) dF(X).

Note that the last two terms in the sum are each other's transpose. The integration is done numerically.

The derivation of the influence function of the Quad GSSCM is entirely similar to that of Winsor. The main difference is that now ∂g_ε(X)/∂ε |_{ε=0} is given by

∂/∂ε g_ε(X) |_{ε=0} = 2 Q2 IF(z, Q2, F) (X/∥X∥²) 1_{(Q2,∞)}(∥X∥).
The linearly redescending (LR) method uses a second cutoff, viz.

ξ(r) = 1 if r ≤ Q2,   ξ(r) = (Q3* − r)/(Q3* − Q2) if Q2 < r ≤ Q3*,   ξ(r) = 0 if r > Q3*.

In the contaminated case we obtain g_ε(x) = x ξ_ε(∥x∥) with

ξ_ε(r) = 1 if r ≤ Q2ε,   ξ_ε(r) = (Q3ε* − r)/(Q3ε* − Q2ε) if Q2ε < r ≤ Q3ε*,   ξ_ε(r) = 0 if r > Q3ε*.

Taking the derivative with respect to ε yields

∂/∂ε ξ_ε(r) = δ(r − Q2ε) Q2ε′ + {(Q3ε* − r)/(Q3ε* − Q2ε)} { δ(r − Q3ε*) Q3ε*′ − δ(r − Q2ε) Q2ε′ }
+ 1_{[Q2ε,Q3ε*]}(r) [ Q3ε*′ (Q3ε* − Q2ε) − (Q3ε*′ − Q2ε′)(Q3ε* − r) ] / (Q3ε* − Q2ε)².

Evaluation in ε = 0 gives

δ(r − Q2) IF(z, Q2, F) + {(Q3* − r)/(Q3* − Q2)} { δ(r − Q3*) IF(z, Q3*, F) − δ(r − Q2) IF(z, Q2, F) }
+ 1_{[Q2,Q3*]}(r) [ IF(z, Q3*, F)(Q3* − Q2) − {IF(z, Q3*, F) − IF(z, Q2, F)}(Q3* − r) ] / (Q3* − Q2)².

When integrating only the last term plays a role, yielding

∂/∂ε g_ε(X) |_{ε=0} = X 1_{[Q2,Q3*]}(∥X∥) [ IF(z, Q3*, F)(Q3* − Q2) − {IF(z, Q3*, F) − IF(z, Q2, F)}(Q3* − ∥X∥) ] / (Q3* − Q2)²
= X 1_{[Q2,Q3*]}(∥X∥) [ IF(z, Q3*, F)(∥X∥ − Q2) + IF(z, Q2, F)(Q3* − ∥X∥) ] / (Q3* − Q2)².

For the Ball GSSCM we analogously derive that

∂/∂ε g_ε(X) |_{ε=0} = δ(∥X∥ − Q2) IF(z, Q2, F) X.

Finally, for the Shell GSSCM we obtain

∂/∂ε g_ε(X) |_{ε=0} = { δ(∥X∥ − Q3) IF(z, Q3, F) − δ(∥X∥ − Q1) IF(z, Q1, F) } X.

This concludes the proof of Proposition 2.
A.4. Breakdown values
Proof of Proposition 3. Denote by J the set of all subsets of {1, ..., n} with p + 1 elements. For every subset J ∈ J we define η_J = max_{i∈J} d²(x_i, H_J), where H_J is the hyperplane minimizing

Σ_{i∈J} d²(x_i, H)

over all possible hyperplanes H, and d(x, H) is the Euclidean distance between a point x and a hyperplane H.

Define η_X = min_{J∈J} η_J. Since the original points {x_1, ..., x_n} are in general position, no p + 1 points can lie on the same hyperplane, which ensures that η_X > 0. We also put c_1 = max_i ∥x_i − T(X)∥ < ∞.

Part 1. We first need to show that ε* ≥ ⌊(n − p + 1)/2⌋/n.

Let m < ⌊(n − p + 1)/2⌋ and replace m observations of X = {x_1, ..., x_n}, yielding X* with location estimate T(X*). Because m/n is below the breakdown value of T, there is a constant c_2 < ∞ so that ∥T(X*) − T(X)∥ ≤ c_2 for all such contaminated datasets X*. By the triangle inequality, ∥x_i − T(X*)∥ ≤ c_1 + c_2 < ∞ for the unchanged points. This implies hmed(d_i*) ≤ c_1 + c_2, hence hmed(d_i*) + 1.4826 × hmad(d_i*) ≤ 2.4826 × hmed(d_i*) ≤ 2.4826 × (c_1 + c_2), where d_i* = ∥x_i* − T(X*)∥. Therefore ∥g(t)∥ ≤ 2.4826 × (c_1 + c_2) by condition 3.

First we show that the largest eigenvalue of S_g(X*) is bounded over all such datasets X*. Take any X*, obtained by replacing m points of X by arbitrary points. Then

λ_max = sup_{∥u∥=1} u⊤ S_g(X*) u = sup_{∥u∥=1} (1/n) Σ_{i=1}^n u⊤ g{x_i* − T(X*)} g{x_i* − T(X*)}⊤ u
= sup_{∥u∥=1} (1/n) Σ_{i=1}^n [u⊤ g{x_i* − T(X*)}]² ≤ sup_{∥u∥=1} (1/n) Σ_{i=1}^n ∥u∥² ∥g{x_i* − T(X*)}∥² ≤ {2.4826 × (c_1 + c_2)}² < ∞.

Next we show that the smallest eigenvalue of S_g(X*) has a positive lower bound for all contaminated datasets X*. By condition 2 on ξ we know that #{x_i* : ξ{∥x_i* − T(X*)∥} = 1} ≥ ⌊(n + p + 1)/2⌋. Therefore, we have at least ⌊(n + p + 1)/2⌋ − (⌊(n − p + 1)/2⌋ − 1) = p + 1 regular points for which ξ{∥x_i − T(X*)∥} = 1; let us assume without loss of generality that these are x_1, ..., x_{p+1}. We can now write

λ_min = min_{∥u∥=1} u⊤ S_g(X*) u = min_{∥u∥=1} (1/n) Σ_{i=1}^n u⊤ g{x_i* − T(X*)} g{x_i* − T(X*)}⊤ u = min_{∥u∥=1} (1/n) Σ_{i=1}^n [u⊤ g{x_i* − T(X*)}]²
≥ min_{∥u∥=1} (1/n) Σ_{i=1}^{p+1} [u⊤{x_i − T(X*)} ξ{∥x_i − T(X*)∥}]² = min_{∥u∥=1} (1/n) Σ_{i=1}^{p+1} [u⊤{x_i − T(X*)}]²
≥ (1/n) Σ_{i=1}^{p+1} d²(x_i, H_{{1,...,p+1}}) ≥ η_X/n > 0.

Part 2. It remains to show that ε* ≤ ⌊(n − p + 1)/2⌋/n. This is the known upper bound for affine equivariant scatter estimators, but that result does not apply here, so we need to show it for this case. Take any m ≥ ⌊(n − p + 1)/2⌋ and replace the last m points of X, keeping the points x_1, ..., x_{n−m} unchanged. By location equivariance we can assume without loss of generality that the average of x_1, ..., x_{n−m} is zero. For j ∈ {n − m + 1, ..., n}, put x_j* = λ a_j, where a_j is such that min_{i∈{n−m+1,...,n}, i≠j} ∥a_j − a_i∥ ≥ 1 and such that for all λ > 1 one has min_{i∈{1,...,n−m}} ∥λ a_j − x_i∥ ≥ λ. This is possible by placing the a_j outside of the convex hull of X and far enough from each other and from X.

Now consider an unbounded increasing sequence of λ_k > 1. For every λ_k the set {x_{n−m+1}*, ..., x_n*} must contain at least one point for which w_i := ξ{∥x_i* − T(X*)∥} = 1; call this point x_b*. Take another point of X* for which w_i = 1, and name it x_c*. Note that x_c* can be an original data point or a replaced point. We now have that ∥x_b* − x_c*∥ ≥ λ_k, hence ∥x_b* − T(X*)∥ + ∥x_c* − T(X*)∥ ≥ λ_k. Therefore ∥x_b* − T(X*)∥² + ∥x_c* − T(X*)∥² ≥ λ_k²/2. We then obtain

Σ_{j=1}^p λ_j{S_g(X*)} = trace{S_g(X*)} = (1/n) Σ_{i=1}^n trace[ g{x_i* − T(X*)} g{x_i* − T(X*)}⊤ ] = (1/n) Σ_{i=1}^n ∥g{x_i* − T(X*)}∥²
≥ (1/n) { ∥x_b* − T(X*)∥² + ∥x_c* − T(X*)∥² } ≥ λ_k² / (2n),

where the first inequality holds because w_b = w_c = 1. This becomes arbitrarily large, and so S_g(X*) explodes. This concludes the proof of Proposition 3.
Proof of Proposition 4. Showing that ε*(T_k, X) ≤ ⌊(n + 1)/2⌋/n is easy, since ⌊(n + 1)/2⌋/n is the upper bound on the breakdown value of all translation equivariant location estimators; see, e.g., [18].

It remains to show that ε*(T_k, X) ≥ ⌊(n + 1)/2⌋/n.

Note that the objective given by the sum of the h smallest squared Euclidean distances is nonincreasing in every C-step. The value of the objective function after step k is

Σ_{j=1}^h d²_{(j)}{X, T_k(X)},

where d_{(j)}{X, T_k(X)} denotes the jth order statistic of the distances ∥x_i − T_k(X)∥, and we have that

Σ_{j=1}^h d²_{(j)}{X, T_k(X)} ≤ Σ_{j=1}^h d²_{(j)}{X, T_{k−1}(X)}.

Recall that h = ⌊(n + 1)/2⌋. Let m ≤ ⌊(n − 1)/2⌋ and replace without loss of generality the last m observations of X = {x_1, ..., x_n} to obtain X* = {x_1, ..., x_{n−m}, x_{n−m+1}*, ..., x_n*} = {x_1*, ..., x_n*}. Since the spatial median T_0 does not yet break down for this m [18], there is a constant c_2 such that max_{1≤i≤n−m} ∥x_i − T_0(X*)∥ ≤ c_2 < ∞ for all such datasets X*.

Consider T_k(X*) and the corresponding objective function Σ_{j=1}^h d²_{(j)}{X*, T_k(X*)}. Since the C-step does not increase the value of the objective function, we have that

Σ_{j=1}^h d²_{(j)}{X*, T_k(X*)} ≤ Σ_{j=1}^h d²_{(j)}{X*, T_{k−1}(X*)} ≤ ··· ≤ Σ_{j=1}^h d²_{(j)}{X*, T_0(X*)}.

Note that

Σ_{j=1}^h d²_{(j)}{X*, T_0(X*)} ≤ Σ_{i=1}^h ∥x_i* − T_0(X*)∥² = Σ_{i=1}^h ∥x_i − T_0(X*)∥² ≤ h c_2²,

where the equality holds because h ≤ n − m, so the first h points of X* are original data points.

Since m is at most ⌊(n − 1)/2⌋ and h = ⌊(n + 1)/2⌋, we have at least ⌊(n + 1)/2⌋ − ⌊(n − 1)/2⌋ = 1 point x_j with 1 ≤ j ≤ n − m for which ∥x_j − T_k(X*)∥² ≤ d²_{(h)}{X*, T_k(X*)}. Note that

∥x_j − T_k(X*)∥² ≤ Σ_{j=1}^h d²_{(j)}{X*, T_k(X*)} ≤ Σ_{j=1}^h d²_{(j)}{X*, T_0(X*)} ≤ h c_2².

So for this x_j we can write

∥T_k(X*) − T_0(X*)∥ ≤ ∥T_k(X*) − x_j∥ + ∥x_j − T_0(X*)∥ ≤ √h c_2 + c_2 < ∞.

Note that this upper bound does not depend on k and therefore remains valid when the procedure is iterated until convergence (k → ∞). This concludes the proof of Proposition 4.
References
[1] G. Boente, D. Rodriguez, M. Sud, The spatial sign operator: Asymptotic results and applications, J. Multivariate Anal. 170 (2018) (in press).
[2] B.M. Brown, Statistical uses of the spatial median, J. R. Stat. Soc. Ser. B Stat. Methodol. 45 (1983) 25–30.
[3] C. Chatzinakos, L. Pitsoulis, G. Zioutas, Optimization techniques for robust multivariate location and scatter estimation, J. Comb. Optim. 31 (2016)
1443–1460.
[4] C. Croux, C. Dehon, A. Yadine, The k-step spatial sign covariance matrix, Adv. Data Anal. Classif. 4 (2010) 137–150.
[5] C. Croux, E. Ollila, H. Oja, Sign and rank covariance matrices: Statistical properties and application to principal components analysis, in: Y. Dodge (Ed.),
Statistical Data Analysis Based on the L1-Norm and Related Methods, Birkhäuser, Basel, 2002, pp. 257–269.
[6] D. Donoho, P. Huber, The notion of breakdown point, in: P. Bickel, K. Doksum, J. Hodges (Eds.), A Festschrift for Erich Lehmann, Wadsworth, Belmont,
CA, pp. 157–184.
[7] A. Dürre, R. Fried, D. Vogel, The spatial sign covariance matrix and its application for robust correlation estimation, Austrian J. Statist. 46 (2017) 13–22.
[8] A. Dürre, D.E. Tyler, D. Vogel, On the eigenvalues of the spatial sign covariance matrix in more than two dimensions, Statist. Probab. Lett. 111 (2016)
80–85.
[9] A. Dürre, D. Vogel, Asymptotics of the two-stage spatial sign correlation, J. Multivariate Anal. 144 (2016) 54–67.
[10] A. Dürre, D. Vogel, R. Fried, Spatial sign correlation, J. Multivariate Anal. 135 (2015) 89–105.
[11] A. Dürre, D. Vogel, D.E. Tyler, The spatial sign covariance matrix with unknown location, J. Multivariate Anal. 130 (2014) 107–117.
[12] J.C. Gower, Algorithm AS 78: The Mediancentre, J. R. Stat. Soc. Ser. C. Appl. Stat. 23 (1974) 466–470.
[13] F. Hampel, E. Ronchetti, P. Rousseeuw, W. Stahel, Robust Statistics: The Approach Based on Influence Functions, Wiley, New York, 1986.
[14] C. Hu, V. Pozdnyakov, J. Yan, Coga: Convolution of Gamma Distributions, University of Connecticut, 2018. R package version 0.2.2.
[15] M. Hubert, P.J. Rousseeuw, K. Vanden Branden, ROBPCA: A new approach to robust principal component analysis, Technometrics 47 (2005) 64–79.
[16] M. Hubert, P.J. Rousseeuw, T. Verdonck, A deterministic algorithm for robust location and scatter, J. Comput. Graph. Statist. 21 (2012) 618–637.
[17] N. Locantore, J.S. Marron, D.G. Simpson, N. Tripoli, J.T. Zhang, K.L. Cohen, Robust principal component analysis for functional data, Test 8 (1999) 1–28.
[18] H. Lopuhaä, P. Rousseeuw, Breakdown points of affine equivariant estimators of multivariate location and covariance matrices, Ann. Statist. 19 (1991)
229–248.
[19] A.F. Magyar, D.E. Tyler, The asymptotic inadmissibility of the spatial sign covariance matrix for elliptically symmetric distributions, Biometrika 101
(2014) 673–688.
[20] J.I. Marden, Some robust estimates of principal components, Statist. Probab. Lett. 43 (1999) 349–359.
[21] P.G. Moschopoulos, The distribution of the sum of independent gamma random variables, Ann. Inst. Statist. Math. 37 (1985) 541–544.
[22] P. Mozharovskyi, K. Mosler, T. Lange, Classifying real-world data with the DDα-procedure, Adv. Data Anal. Classif. 9 (2015) 287–314.
[23] O. Pokotylo, P. Mozharovskyi, R. Dyckerhoff, Depth and depth-based classification with R package ddalpha, arXiv:1608.04109, 2016.
[24] G.M. Reaven, R.G. Miller, An attempt to define the nature of chemical diabetes using a multidimensional analysis, Diabetologia 16 (1979) 17–24.
[25] D.M. Rocke, Robustness properties of S-estimators of multivariate location and shape in high dimension, Ann. Statist. 24 (1996) 1327–1345.
[26] P. Rousseeuw, Least median of squares regression, J. Amer. Statist. Assoc. 79 (1984) 871–880.
[27] P. Rousseeuw, K. Van Driessen, A fast algorithm for the Minimum Covariance Determinant estimator, Technometrics 41 (1999) 212–223.
[28] S. Serneels, E. De Nolf, P.J. Van Espen, Spatial sign preprocessing: A simple way to impart moderate robustness to multivariate estimators, J. Chem.
Inf. Model. 46 (2006) 1402–1409, PMID: 16711760.
[29] S. Sirkiä, S. Taskinen, H. Oja, D.E. Tyler, Tests and estimates of shape based on spatial signs and ranks, J. Nonparametr. Stat. 21 (2009) 155–176.
[30] S. Taskinen, I. Koch, H. Oja, Robustifying principal component analysis with spatial sign vectors, Statist. Probab. Lett. 82 (2012) 765–774.
[31] S. Visuri, V. Koivunen, H. Oja, Sign and rank covariance matrices, J. Statist. Plann. Inference 91 (2000) 557–575.
[32] S. Visuri, H. Oja, V. Koivunen, Subspace-based direction-of-arrival estimation using nonparametric statistics, IEEE Trans. Signal Process. 49 (2001)
2060–2073.
[33] E.B. Wilson, M.M. Hilferty, The distribution of chi-square, Proc. Nat. Acad. Sci. USA 17 (1931) 684–688.
... The SSCM was studied in detail in Magyar and Tyler (2014), Dürre et al. (2014), Dürre et al. (2016), Boente et al. (2019). In Raymaekers and Rousseeuw (2019) a generalisation to the SSCM was introduced, namely the generalized spatial sign covariance matrix (GSSCM). They identified SSCM as a part of a larger class of orthogonally equivariant scatter estimates, namely the generalized spatial sign covariance matrices. ...
... By using the Euclidean norm, the GSSCM becomes an orthogonally equivariant scatter estimator. In Raymaekers and Rousseeuw (2019), it is shown that the GSSCM inherits the consistency properties of the SSCM in that it is a Fisher consistent estimator of the eigenvectors for elliptical distributions and preserves the ranks of the eigenvalues under the same assumptions. ...
... where T is a (orthogonally equivariant) location estimator for the center of the data matrix X ∈ R n× p . In Raymaekers and Rousseeuw (2019), it is suggested to use the k-step least trimmed squares (LTS) estimator. This estimator starts from the spatial median, but adds additional iterative steps to improve robustness against outliers. ...
Article
Full-text available
Outliers contaminating data sets are a challenge to statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis that is based on the generalized spatial sign covariance matrix. Theoretical properties of the proposed method including influence functions, breakdown values and asymptotic efficiencies are derived. These theoretical results are complemented with an extensive simulation study and two real-data examples. We illustrate that generalized spherical principal component analysis can combine great robustness with solid efficiency properties, in addition to a low computational cost.
... which corresponds to a weighted covariance matrix with the data points being down-weighted based on their Euclidean distance from the center. These weighted covariance matrices have recently been studied in [27], wherein they are referred to as general spatial sign covariance matrices. Note that for u(s) = 1/s one obtains the usual SSCM. ...
Preprint
Full-text available
We introduce a class of regularized M-estimators of multivariate scatter and show, analogous to the popular spatial sign covariance matrix (SSCM), that they possess high breakdown points. We also show that the SSCM can be viewed as an extreme member of this class. Unlike the SSCM, this class of estimators takes into account the shape of the contours of the data cloud when down-weighing observations. We also propose a median based cross validation criterion for selecting the tuning parameter for this class of regularized M-estimators. This cross validation criterion helps assure the resulting tuned scatter estimator is a good fit to the data as well as having a high breakdown point. A motivation for this new median based criterion is that when it is optimized over all possible scatter parameters, rather than only over the tuned candidates, it results in a new high breakdown point affine equivariant multivariate scatter statistic.
... An important remark is that, though these descriptors are used in their original scale to build the decision trees, they are standardized for the clustering stage since this is recommendable for these methods. We used the generalized spatial sign technique to accomplish this [60], with quadratic radial function and k-step least trimmed squares for the location estimator, since it delivers robustness to estimators based on co-variances in the presence of potential outliers. ...
Article
Full-text available
It has been argued that hunter-gatherers' food-sharing may have provided the basis for a whole range of social interactions, and hence its study may provide important insight into the evolutionary origin of human sociality. Motivated by this observation, we propose a simple network optimization model inspired by a food-sharing dynamic that can recover some empirical patterns found in social networks. We focus on two of the main food-sharing drivers discussed by the anthropological literature: the reduction of individual starvation risk and the care for the group welfare or egalitarian access to food shares, and show that networks optimizing both criteria may exhibit a community structure of highly-cohesive groups around special agents that we call hunters, those who inject food into the system. These communities appear under conditions of uncertainty and scarcity in the food supply, which suggests their adaptive value in this context. We have additionally obtained that optimal welfare networks resemble social networks found in lab experiments that promote more egalitarian income distribution, and also distinct distributions of reciprocity among hunters and non-hunters, which may be consistent with some empirical reports on how sharing is distributed in waves, first among hunters, and then hunters with their families. These model results are consistent with the view that social networks functionally adaptive for optimal resource use, may have created the environment in which prosocial behaviors evolved. Finally, our model also relies on an original formulation of starvation risk, and it may contribute to a formal framework to proceed in this discussion regarding the principles guiding food-sharing networks.
... Along with Tyler's M estimate, other variations of SCM and robust estimation of scatter matrices are worth exploring for comparison and generalization. For example, the k-step SCM-a finite-iteration intermediary between SCM and Tyler's M estimate-aims to balance robustness and efficiency [12], and the generalized SCM [45] in essence uses an orthogonally equivariant weight function. Finally, the depth-weighted Stahel-Donoho estimates of location and scatter [53] may be incorporated in a slightly relaxed version of our weighted signs framework by considering (transformations of) depth functions multiplied by the ℓ 2 -norm of a vector as the weight function. ...
Article
Multivariate sign functions are often used for robust estimation and inference. We propose using data dependent weights in association with such functions. The proposed weighted sign functions retain desirable robustness properties, while significantly improving efficiency in estimation and inference compared to unweighted multivariate sign-based methods. Using weighted signs, we demonstrate methods of robust location estimation and robust principal component analysis. We extend the scope of using robust multivariate methods to include robust sufficient dimension reduction and functional outlier detection. Several numerical studies and real data applications demonstrate the efficacy of the proposed methodology.
... It was developed to overcome the fact that distance covariance does not have the zero equivalence property in general metric spaces (Lyons, 2013). It has also been shown that distance covariance can be sensible to outliers (Raymaekers and Rousseeuw, 2019), a disadvantage that ball covariance (BCOV) claims not to have. Formally, let X and Y be two random vectors defined respectively in two separable Banach spaces (X, ζ X ) and (Y, ζ Y ), where ζ X and ζ Y are norms defined on X and Y respectively. ...
Preprint
Full-text available
As its name suggests, sufficient dimension reduction (SDR) targets to estimate a subspace from data that contains all information sufficient to explain a dependent variable. Ample approaches exist to SDR, some of the most recent of which rely on minimal to no model assumptions. These are defined according to an optimization criterion that maximizes a nonparametric measure of association. The original estimators are nonsparse, which means that all variables contribute to the model. However, in many practical applications, an SDR technique may be called for that is sparse and as such, intrinsically performs sufficient variable selection (SVS). This paper examines how such a sparse SDR estimator can be constructed. Three variants are investigated, depending on different measures of association: distance covariance, martingale difference divergence and ball covariance. A simulation study shows that each of these estimators can achieve correct variable selection in highly nonlinear contexts, yet are sensitive to outliers and computationally intensive. The study sheds light on the subtle differences between the methods. Two examples illustrate how these new estimators can be applied in practice, with a slight preference for the option based on martingale difference divergence in the bioinformatics example.
Article
The direpack package brings a set of modern statistical dimensionality reduction techniques to the Python universe as a single, consistent package. Several of the methods included are only available as open source through direpack, and the package also offers competitive Python implementations of methods previously only available in other programming languages. In its present version, the package is structured in three subpackages for different approaches to dimensionality reduction: projection pursuit, sufficient dimension reduction and robust M estimators. As a corollary, the package also provides access to regularized regression estimators based on these reduced-dimension spaces, as well as a set of classical and robust preprocessing utilities, including very recent developments such as generalized spatial signs. Finally, direpack has been written to be consistent with the scikit-learn API, so that the estimators can be included seamlessly in (statistical and/or machine) learning pipelines in that framework.
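The "generalized spatial signs" mentioned at the end refer to radial transforms that generalize the classical x ↦ x/‖x‖ preprocessing. Below is a from-scratch numpy sketch of one such radial function (a ball-type transform: points within a cutoff are left untouched, points outside are projected onto the cutoff sphere); this is written for illustration only and is not direpack's own implementation.

```python
import numpy as np

def gen_spatial_sign(X, cutoff=None):
    """Generalized spatial sign preprocessing with a ball-type radial function.
    Returns the robustly centered, radially transformed data."""
    Z = X - np.median(X, axis=0)                  # simple robust centering
    r = np.linalg.norm(Z, axis=1)
    if cutoff is None:
        cutoff = np.median(r)                     # keep the inner half as-is
    shrink = np.where(r > cutoff, cutoff / np.maximum(r, 1e-12), 1.0)
    return Z * shrink[:, None]

# A generalized SSCM is then simply the covariance of the transformed data:
# V = np.cov(gen_spatial_sign(X), rowvar=False)
```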
Article
Multivariate sign-based tests for a class of testing problems on the eigenvalues of scatter matrices are constructed. The class of testing problems is characterized by real-valued mappings, h say. A necessary and sufficient condition on h for obtaining asymptotically valid sign-based procedures is identified. A simulation study shows the very good robustness properties of our sign tests, while their practical relevance is illustrated on a real data set.
Article
Full-text available
Due to increasing recording capabilities, functional data analysis has become an important research topic. For functional data, the study of outlier detection and the development of robust statistical procedures have started only recently. One robust alternative to the sample covariance operator is the sample spatial sign covariance operator. In this paper, we study the asymptotic behaviour of the sample spatial sign covariance operator when the location is unknown. Among other possible applications of the obtained results, we derive the asymptotic distribution of the principal directions obtained from the sample spatial sign covariance operator, and we develop tests to detect differences between the scatter operators of two populations. In particular, the tests' performance is illustrated through a Monte Carlo study for small sample sizes.
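In finite dimensions (e.g., curves evaluated on a common grid), the sample spatial sign covariance operator with unknown location can be sketched as follows: estimate the center by the spatial median, then average the outer products of the spatial signs. Function names are ours; this is an illustration, not the paper's code.

```python
import numpy as np

def spatial_median(X, n_iter=200, tol=1e-8):
    """Weiszfeld iteration for the spatial median of the rows of X."""
    mu = np.median(X, axis=0)
    for _ in range(n_iter):
        r = np.maximum(np.linalg.norm(X - mu, axis=1), 1e-12)
        mu_new = (X / r[:, None]).sum(axis=0) / (1.0 / r).sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

def sscm_operator(X):
    """Sample spatial sign covariance operator for discretized curves
    (rows of X = functions on a common grid); its leading eigenvectors
    estimate the principal directions."""
    Z = X - spatial_median(X)
    r = np.maximum(np.linalg.norm(Z, axis=1), 1e-12)
    U = Z / r[:, None]
    return U.T @ U / len(U)
```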
Article
Full-text available
Following the seminal idea of Tukey, data depth is a function that measures how close an arbitrary point of the space lies to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications, with classification being the most popular one. The R package ddalpha is software designed to fuse the experience of the practitioner with recent achievements in the area of data depth and depth-based classification. ddalpha provides implementations for exact and approximate computation of the most reasonable and widely applied notions of data depth. These can further be used in the depth-based multivariate and functional classifiers implemented in the package, where the DDα-procedure is the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.
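As a flavor of the depth notions such a package computes, here is a minimal numpy version of one of them, the spatial (L1) depth; this is a from-scratch sketch, not ddalpha's implementation.

```python
import numpy as np

def spatial_depth(x, X):
    """Spatial depth of a point x w.r.t. a sample X: one minus the norm of
    the average spatial sign of x - x_i.  Values near 1 indicate central
    points, values near 0 indicate outlying ones."""
    Z = x - X
    r = np.linalg.norm(Z, axis=1)
    mask = r > 0
    U = Z[mask] / r[mask][:, None]
    return 1.0 - np.linalg.norm(U.mean(axis=0))
```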
Article
Full-text available
We summarize properties of the spatial sign covariance matrix and especially look at the relationship between its eigenvalues and those of the shape matrix of an elliptical distribution. The explicit relationship known in the bivariate case was used to construct the spatial sign correlation coefficient, which is a non-parametric and robust estimator for the correlation coefficient within the elliptical model. We consider a multivariate generalization, which we call the multivariate spatial sign correlation matrix.
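For concreteness, the explicit bivariate relationship alluded to above — recalled here as stated in the spatial sign correlation literature, for an elliptical distribution with shape eigenvalues λ₁, λ₂ — reads

$$\delta_j \;=\; \frac{\sqrt{\lambda_j}}{\sqrt{\lambda_1}+\sqrt{\lambda_2}}, \qquad j \in \{1,2\},$$

where δ₁, δ₂ are the eigenvalues of the SSCM. Inverting gives λ₁/λ₂ = (δ₁/δ₂)², which is what makes the shape matrix, and hence a correlation coefficient, recoverable from the SSCM alone.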
Article
Full-text available
Computation of typical statistical sample estimates such as the median or the least squares fit usually requires the solution of an unconstrained optimization problem with a convex objective function, which can be solved efficiently by various methods. The presence of outliers in the data dictates the computation of a robust estimate, which can be defined as the optimal statistical estimate for a subset containing at least half of the observations. The resulting problem is now a combinatorial optimization problem which is often computationally intractable. Classical statistical methods for estimating the multivariate location μ and scatter matrix Σ are based on the sample mean vector and covariance matrix, which are very sensitive in the presence of outlying observations. We propose a new method for robust location and scatter estimation which is composed of two stages. In the first stage, an unbiased multivariate L1-median center for all the observations is attained by a novel procedure called the least trimmed Euclidean deviations estimator. This robust median defines a coverage set of observations which is used in the second stage to iteratively compute the set of outliers that violate the correlational structure of the data set. Extensive computational experiments indicate that the proposed method outperforms existing methods in accuracy, robustness and computational time.
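A generic two-stage scheme of the kind described (robust center first, then iterative flagging of observations that violate the correlational structure) can be sketched as follows. This is not the authors' LTED algorithm, merely an illustration of the architecture, with a coordinatewise median standing in for the robust first-stage center.

```python
import numpy as np
from scipy.stats import chi2

def two_stage_flagging(X, q=0.975, n_iter=25):
    """Generic two-stage robust location/scatter sketch: start from a robust
    center, then alternate between estimating scatter on the currently kept
    points and re-flagging outliers by squared Mahalanobis distance."""
    n, p = X.shape
    mu = np.median(X, axis=0)               # stage 1: robust center (stand-in)
    keep = np.ones(n, dtype=bool)
    cut = chi2.ppf(q, df=p)                 # chi-squared cutoff for distances
    S = np.cov(X.T)
    for _ in range(n_iter):                 # stage 2: iterative flagging
        S = np.cov(X[keep].T)
        d2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
        new_keep = d2 <= cut
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
        mu = X[keep].mean(axis=0)
    return mu, S, ~keep                     # center, scatter, outlier flags
```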
Article
We gather several results on the eigenvalues of the spatial sign covariance matrix of an elliptical distribution. It is shown that the eigenvalues are a one-to-one function of the eigenvalues of the shape matrix and that they are closer together than the latter. We further provide a one-dimensional integral representation of the eigenvalues, which facilitates their numerical computation.
Article
The DDα-classifier, a nonparametric, fast and very robust procedure, is described and applied to fifty classification problems covering a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, which is a low-dimensional unit cube, and then separates them by a projective invariant procedure, called the α-procedure. To each data point the transformation assigns its depth values with respect to the given classes. Several alternative depth notions (spatial depth, Mahalanobis depth, projection depth, and Tukey depth, the latter two being approximated by univariate projections) are used in the procedure and compared regarding their average error rates. With the Tukey depth, which fits the distributions' shape best and is most robust, 'outsiders', that is, data points having zero depth in all classes, appear. They need an additional treatment for classification. Evidence is also given about the dimension of the extended feature space needed for linear separation. The DDα-procedure is available as an R package.
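The first step of such a procedure, the depth-space transform, is easy to sketch: map every observation to its vector of depths with respect to each class, with any depth function plugged in. The α-procedure separator itself is more involved and omitted here; this is an illustration, not the package's code.

```python
import numpy as np

def depth_space(X, class_samples, depth):
    """Map each row of X to its depths w.r.t. every class sample, turning
    the original feature space into a low-dimensional unit cube of depth
    values.  `depth(x, sample)` can be any depth function."""
    return np.array([[depth(x, Xc) for Xc in class_samples] for x in X])

# Usage sketch for two classes, with `my_depth` any depth function:
# D = depth_space(X, [X[y == 0], X[y == 1]], depth=my_depth)
```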
Article
The spatial sign correlation (Dürre, Vogel and Fried, 2015) is a highly robust and easy-to-compute bivariate correlation estimator based on the spatial sign covariance matrix. Since the estimator is inefficient when the marginal scales strongly differ, a two-stage version was proposed. In the first step, the observations are marginally standardized by means of a robust scale estimator, and in the second step, the spatial sign correlation of the thus transformed data set is computed. Dürre et al. (2015) give some evidence that the asymptotic distribution of the two-stage estimator equals that of the spatial sign correlation at equal marginal scales by comparing their influence functions and presenting simulation results, but give no formal proof. In the present paper, we close this gap and establish the asymptotic normality of the two-stage spatial sign correlation and compute its asymptotic variance for elliptical population distributions. We further derive a variance-stabilizing transformation, similar to Fisher's z-transform, and numerically compare the small-sample coverage probabilities of several confidence intervals.
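A compact numpy sketch of the two-stage recipe follows: standardize each margin by median/MAD, compute the 2x2 SSCM of the standardized pairs, and map its eigenvalues back to a correlation via the bivariate relation λ_j ∝ δ_j² recalled earlier. The published estimator uses an equivalent closed form, so treat this as an illustration rather than the authors' definition.

```python
import numpy as np

def two_stage_sign_corr(x, y):
    """Two-stage spatial sign correlation: robustly standardize the margins,
    compute the 2x2 SSCM, and reconstruct the shape matrix up to scale."""
    def standardize(v):
        mad = 1.4826 * np.median(np.abs(v - np.median(v)))
        return (v - np.median(v)) / mad
    Z = np.column_stack([standardize(x), standardize(y)])
    r = np.maximum(np.linalg.norm(Z, axis=1), 1e-12)
    U = Z / r[:, None]
    S = U.T @ U / len(U)                           # 2x2 SSCM
    evals, evecs = np.linalg.eigh(S)
    shape = evecs @ np.diag(evals ** 2) @ evecs.T  # shape matrix up to scale
    return shape[0, 1] / np.sqrt(shape[0, 0] * shape[1, 1])
```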
Article
A new robust correlation estimator based on the spatial sign covariance matrix (SSCM) is proposed. We derive its asymptotic distribution and influence function at elliptical distributions. Finite sample and robustness properties are studied and compared to other robust correlation estimators by means of numerical simulations.