Content uploaded by Peter Rousseeuw
Author content
All content in this area was uploaded by Peter Rousseeuw on Dec 12, 2018
Content may be subject to copyright.
Journal of Multivariate Analysis 171 (2019) 94–111
Contents lists available at ScienceDirect
Journal of Multivariate Analysis
journal homepage: www.elsevier.com/locate/jmva
A generalized spatial sign covariance matrix
Jakob Raymaekers, Peter Rousseeuw ∗
Department of Mathematics, KU Leuven, Belgium
article info
Article history:
Received 3 May 2018
Available online 24 November 2018
AMS 2010 subject classifications:
primary 62H12
secondary 62H86
Keywords:
Orthogonal equivariance
Outliers
Robust location and scatter
abstract
The well-known spatial sign covariance matrix (SSCM) carries out a radial transform which
moves all data points to a sphere, followed by computing the classical covariance matrix of
the transformed data. Its popularity stems from its robustness to outliers, fast computation,
and applications to correlation and principal component analysis. In this paper we study
more general radial functions. It is shown that the eigenvectors of the generalized SSCM are
still consistent and the ranks of the eigenvalues are preserved. The influence function of the
resulting scatter matrix is derived, and it is shown that its asymptotic breakdown value is
as high as that of the original SSCM. A simulation study indicates that the best results are
obtained when the inner half of the data points are not transformed and points lying far
away are moved to the center.
©2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY
license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Robust estimation of the covariance (scatter) matrix is an important and challenging problem. Over the last decades,
many robust estimators for the covariance matrix have been developed. Many of them possess the attractive property of
affine equivariance, meaning that when the data are subjected to an affine transformation the estimator will transform
accordingly.
However, all highly robust affine equivariant scatter estimators have a combinatorial time complexity. Other estimators
possess the less restrictive property of orthogonal equivariance. This means that the estimators commute with orthogonal
transformations, which are characterized by orthogonal matrices and include rotations and reflections.
The most well-known orthogonally equivariant scatter estimator is the spatial sign covariance matrix (SSCM) proposed
independently in [20,31] and studied in more detail in [8,11,19], among others. The estimator computes the regular
covariance matrix on the spatial signs of the data, which are the projections of the location-centered datapoints on the unit
sphere. Somewhat surprisingly, this transformation yields a consistent estimator of the eigenvectors of the true covariance
matrix [20] under relatively general conditions on the underlying distribution. Of course the eigenvalues are different from
the eigenvalues of the true covariance matrix, but it was shown in [31] that the order of the eigenvalues is preserved. We
build on this idea by illustrating that the SSCM is part of a larger class of orthogonally equivariant estimators, all of which
estimate the eigenvectors of the true covariance matrix and preserve the order of the eigenvalues.
The SSCM is easy to compute, and has been used extensively in several applications. The most common use of the SSCM
is probably in the context of (functional) spherical PCA as developed in [5,17,30,32]. Like classical PCA, spherical PCA aims to
find a lower dimensional subspace that captures most of the variability in the data. After centering the data, spherical PCA
projects the data onto the unit (hyper)sphere before searching for the directions of highest variability. This projection gives
all data points the same weight in the estimation of the subspace, thereby limiting the influence of potential outliers. The
∗Corresponding author.
E-mail address: peter@rousseeuw.net (P. Rousseeuw).
https://doi.org/10.1016/j.jmva.2018.11.010
0047-259X/©2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
directions (‘loadings’) of spherical PCA thus correspond to the eigenvectors of the SSCM scatter matrix. The corresponding
scores are usually taken to be the inner products of the loading vectors with the original (centered) data points, not with the
projections of the data points on the sphere. Some concrete applications of spherical PCA are about the shape of the cornea
in ophthalmology as analyzed in [17], and for multichannel signal processing as illustrated in [31].
In addition to spherical PCA, there also has been a lot of recent research on the use of the SSCM for constructing robust
correlation estimators [7,9,10]. The main focus of this work is on results including asymptotic properties, the eigenvalues,
and the influence function which measures robustness. A third application of the SSCM is its use as an initial estimate for
more involved robust scatter estimators [4,16]. The SSCM is particularly well-suited for this task as it is very fast and highly
robust against outlying observations and therefore often yields a reliable starting value. Another application of the SSCM is
to testing for sphericity [29], which uses the asymptotic properties of the SSCM in order to assess whether the underlying
distribution of the data deviates substantially from a spherical distribution. Serneels et al. [28] use the spatial sign transform
as an initial preprocessing step in order to obtain a robust version of partial least squares regression. Finally, Boente et al. [1]
study the SSCM as an operator for functional data analysis.
The next section introduces a generalization of the SSCM and studies its properties. Section 3 compares the performance of several members of this class in a small simulation study. Section 4 applies the method to a real data example, and Section 5 concludes. All proofs can be found in the Appendix.
2. Methodology
2.1. Definition
Definition 1. Let $X$ be a $p$-variate random variable and $\mu$ a vector serving as its center. Define the generalized spatial sign covariance matrix (GSSCM) of $X$ by
$$S_{g_X}(X) = E_{F_X}\{g_X(X-\mu)\, g_X(X-\mu)^\top\}, \tag{1}$$
where the function $g_X$ is of the form
$$g_X(t) = t\, \xi_X(\|t\|), \tag{2}$$
where we call $\xi_X : \mathbb{R}^+ \to \mathbb{R}^+$ the radial function and $\|\cdot\|$ is the Euclidean norm.

Note that the form of $g_X$ in (2) precisely characterizes an orthogonally equivariant data transformation, as shown in [13], p. 276. Also note that the regular covariance matrix corresponds to $\xi_X(r) = 1$, and that $\xi_X(r) = 1/r$ yields the SSCM.
For a finite data set $X = \{x_1, \ldots, x_n\}$ the GSSCM is given by
$$S_{g_X}(X) = \frac{1}{n} \sum_{i=1}^{n} \xi_X^2\{\|x_i - T(X)\|\}\, \{x_i - T(X)\}\{x_i - T(X)\}^\top, \tag{3}$$
where $T$ is a location estimator. Note that the SSCM gives the $x_i$ with $\|x_i - T(X)\| < 1$ a weight higher than 1, but in general this is not required. In fact, the other functions we will propose satisfy $\xi_X(r) \leqslant 1$ for all $r$.

In the above definitions, we added the subscript to the functions $g$ and $\xi$ to indicate that they can depend on the random variable $X$ or on the dataset $X$. In what follows we will drop these subscripts to ease the notational burden. We will study the following functions $\xi$:
1. Winsorizing (Winsor):
$$\xi(r) = \begin{cases} 1 & \text{if } r \leqslant Q_2,\\ Q_2/r & \text{if } Q_2 < r. \end{cases} \tag{4}$$
2. Quadratic Winsor (Quad):
$$\xi(r) = \begin{cases} 1 & \text{if } r \leqslant Q_2,\\ Q_2^2/r^2 & \text{if } Q_2 < r. \end{cases} \tag{5}$$
3. Ball:
$$\xi(r) = \begin{cases} 1 & \text{if } r \leqslant Q_2,\\ 0 & \text{if } Q_2 < r. \end{cases} \tag{6}$$
4. Shell:
$$\xi(r) = \begin{cases} 0 & \text{if } r < Q_1,\\ 1 & \text{if } Q_1 \leqslant r \leqslant Q_3,\\ 0 & \text{if } Q_3 < r. \end{cases} \tag{7}$$
Fig. 1. Radial functions ξin Eq. (2).
5. Linearly Redescending (LR):
$$\xi(r) = \begin{cases} 1 & \text{if } r \leqslant Q_2,\\ (Q_3^* - r)/(Q_3^* - Q_2) & \text{if } Q_2 < r \leqslant Q_3^*,\\ 0 & \text{if } Q_3^* < r. \end{cases} \tag{8}$$
The cutoffs $Q_1$, $Q_2$, $Q_3$ and $Q_3^*$ depend on the Euclidean distances $d_i = \|x_i - T(X)\|$ by
$$Q_1 = \left[\mathrm{hmed}_i(d_i^{2/3}) - \mathrm{hmad}_i(d_i^{2/3})\right]^{3/2},$$
$$Q_2 = \left[\mathrm{hmed}_i(d_i^{2/3})\right]^{3/2} = \mathrm{hmed}_i(d_i),$$
$$Q_3 = \left[\mathrm{hmed}_i(d_i^{2/3}) + \mathrm{hmad}_i(d_i^{2/3})\right]^{3/2},$$
$$Q_3^* = \left[\mathrm{hmed}_i(d_i^{2/3}) + 1.4826 \times \mathrm{hmad}_i(d_i^{2/3})\right]^{3/2},$$
where hmed and hmad are variations on the median and median absolute deviation given by the order statistic $\mathrm{hmed}(y_1, \ldots, y_n) = y_{(h)}$ and $\mathrm{hmad}(y_1, \ldots, y_n) = \mathrm{hmed}_i\, |y_i - \mathrm{hmed}_j(y_j)|$, where $h = \lfloor (n+p+1)/2 \rfloor$. The 2/3 power in these formulas is the Wilson–Hilferty transformation [33] to near normality. In Appendix A.1 it is verified that this transformation brings the above cutoffs close to the theoretical ones, which are quantiles of a convolution of Gamma random variables with different scale parameters.
Fig. 1 shows the above functions $\xi$ and that of the SSCM for distances whose square follows the $\chi^2_2$ distribution. The $\xi$ of the SSCM is the only one which upweights observations close to the center. The Winsor $\xi$ and its square have a similar shape, but the latter goes down faster. The Ball and Shell $\xi$ functions are both designed to give a weight of 1 to half (in fact, $h$) of the data points and 0 to the remainder, to make them comparable. Ball does this by giving a weight of 1 to the $h$ points with the smallest distances. Shell is inspired by the idea of Rocke [25] to downweight observations with both very high and very low distances from the center. The Linearly Redescending $\xi$ is a compromise between the Ball and the Quad $\xi$ functions.
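To make the definitions concrete, the finite-sample GSSCM (3) with the LR radial function (8) and the hmed/hmad cutoffs can be sketched as below. This is an illustrative Python translation of the formulas above, not the authors' R implementation; the function names and array layout are our own choices.

```python
import numpy as np

def hmed(y, p):
    """h-th order statistic with h = floor((n+p+1)/2), as in the paper."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    h = (n + p + 1) // 2
    return y[h - 1]                         # 1-based order statistic y_(h)

def hmad(y, p):
    """hmad(y) = hmed_i |y_i - hmed_j(y_j)|."""
    y = np.asarray(y, dtype=float)
    return hmed(np.abs(y - hmed(y, p)), p)

def xi_LR(r, Q2, Q3star):
    """Linearly Redescending radial function, Eq. (8)."""
    r = np.asarray(r, dtype=float)
    out = np.ones_like(r)
    mid = (r > Q2) & (r <= Q3star)
    out[mid] = (Q3star - r[mid]) / (Q3star - Q2)
    out[r > Q3star] = 0.0
    return out

def gsscm_LR(X, T):
    """Finite-sample GSSCM, Eq. (3), with the LR radial function.
    X is an n x p data matrix and T a location estimate."""
    n, p = X.shape
    Xc = X - T
    d = np.linalg.norm(Xc, axis=1)
    u = d ** (2 / 3)                        # Wilson-Hilferty transform
    m, s = hmed(u, p), hmad(u, p)
    Q2 = m ** 1.5                           # = hmed of the raw distances
    Q3star = (m + 1.4826 * s) ** 1.5        # cutoff Q3*
    w = xi_LR(d, Q2, Q3star) ** 2           # weights xi^2
    return (w[:, None] * Xc).T @ Xc / n
```

The other radial functions (4)–(7) would only change `xi_LR` and which cutoffs are computed.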
2.2. Preservation of the eigenstructure
In what follows, we assume that the distribution $F_X$ of $X$ has an elliptical density with center zero and that its covariance matrix $\Sigma = E_{F_X}(XX^\top)$ exists. Therefore, $X$ can be written as $X = UDZ$, where $U$ is a $p \times p$ orthogonal matrix, $D$ is a $p \times p$ diagonal matrix with strictly positive diagonal elements, and $Z$ is a $p$-variate random variable which is spherically symmetric, i.e., its density is of the form $f_Z(z) \sim w(\|z\|)$, where $w$ is a decreasing function. Assume without loss of generality that the covariance matrix of $Z$ is $I_p$. The following proposition says that $S_g(X)$ has the same eigenvectors as $\Sigma$ and preserves the ranks of the eigenvalues.
Proposition 1. Let $X = UDZ$ be a $p$-variate random variable as described above, with $D = \mathrm{diag}(\delta_1, \ldots, \delta_p)$ where $\delta_1 \geqslant \cdots \geqslant \delta_p > 0$. Assume that the covariance matrix $S_g = E_{F_X}\{g(X)g(X)^\top\}$ of $g(X)$ exists. Then $\Sigma$ and $S_g$ can be diagonalized as
$$\Sigma = U\Lambda U^\top \quad \text{and} \quad S_g = U\Lambda_g U^\top,$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ with $\lambda_j = \delta_j^2$, and $\Lambda_g = \mathrm{diag}(\lambda_{g,1}, \ldots, \lambda_{g,p})$ with $\lambda_{g,1} \geqslant \cdots \geqslant \lambda_{g,p} > 0$ and $\lambda_j = \lambda_{j+1} \Leftrightarrow \lambda_{g,j} = \lambda_{g,j+1}$.
This proposition justifies the generalized SSCM approach.
2.3. Location estimator
So far we have not specified any location estimator $T$. For the SSCM the most commonly used location estimator is the spatial median, which we denote by $T_0$; see, e.g., [2] and [12]. The spatial median of a dataset $X = \{x_1, \ldots, x_n\}$ is defined as
$$T_0(X) = \arg\min_{\theta} \sum_{i=1}^{n} \|x_i - \theta\|.$$
In order to improve its robustness against a substantial fraction of outliers, we propose to use the $k$-step least trimmed squares (LTS) estimator. The LTS method was originally proposed in regression [26], and for multivariate location it becomes
$$T_{\mathrm{LTS}}(X) = \arg\min_{\theta} \sum_{i=1}^{h} \left(\|x_\bullet - \theta\|^2\right)_{(i)},$$
where the subscript $(i)$ stands for the $i$th smallest squared distance. Without the square this becomes the least trimmed absolute distance estimator studied in [3]. For the multivariate location LTS, the C-step of [27] simplifies to:
Definition 2 (C-step). Fix $h = \lfloor (n+1)/2 \rfloor$. Given a location estimate $T_{j-1}(X)$, we take the set $I_j = \{i_1, \ldots, i_h\} \subset \{1, \ldots, n\}$ such that $\{\|x_i - T_{j-1}(X)\| : i \in I_j\}$ are the $h$ smallest distances in the set $\{\|x_i - T_{j-1}(X)\| : i = 1, \ldots, n\}$. The C-step then yields
$$T_j(X) = \frac{1}{h} \sum_{i \in I_j} x_i.$$
The C-step is fast to compute, and guaranteed to lower the LTS objective. The $k$-step LTS is then the result of $k$ successive C-steps starting from the spatial median $T_0(X)$.
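The $k$-step LTS location estimator can be sketched in a few lines. In this illustrative Python version the spatial median is approximated by Weiszfeld's algorithm, a standard iterative solver for this minimization which we chose ourselves; the paper does not prescribe a particular algorithm for $T_0$.

```python
import numpy as np

def spatial_median(X, n_iter=200, tol=1e-10):
    """Approximate the spatial median T0 by Weiszfeld's algorithm, an
    iterative solver for arg min over theta of sum_i ||x_i - theta||."""
    theta = X.mean(axis=0)
    for _ in range(n_iter):
        d = np.linalg.norm(X - theta, axis=1)
        d = np.maximum(d, 1e-12)            # guard against division by zero
        new = np.average(X, axis=0, weights=1.0 / d)
        if np.linalg.norm(new - theta) < tol:
            return new
        theta = new
    return theta

def k_step_lts(X, k=5):
    """k-step multivariate location LTS: k C-steps (Definition 2)
    starting from the spatial median."""
    n, _ = X.shape
    h = (n + 1) // 2
    theta = spatial_median(X)
    for _ in range(k):
        d = np.linalg.norm(X - theta, axis=1)
        idx = np.argsort(d)[:h]             # indices of the h smallest distances
        theta = X[idx].mean(axis=0)         # average of those h points
    return theta
```

Each C-step only sorts distances and averages $h$ points, which is why the $k$-step LTS remains fast.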
It is also possible to avoid the estimation of location altogether, by calculating the GSSCM on the $O(n^2)$ pairwise differences of the data points. This approach is called the ''symmetrization'' of an estimator, but is more computationally intensive. Visuri et al. [31] studied the symmetrized SSCM and called it Kendall's $\tau$ covariance matrix.
2.4. Robustness properties
A major reason for the SSCM’s popularity is its robustness against outliers. Robustness can be quantified by the influence
function and the breakdown value. We will study both for the GSSCM.
The influence function [13] quantifies the effect of a small amount of contamination on a statistical functional $T$. Consider the contaminated distribution $F_{\varepsilon,z} = (1-\varepsilon)F + \varepsilon \Delta(z)$, where $\Delta(z)$ is the distribution that puts all its mass in $z$. The influence function of $T$ at $F$ is then given by
$$\mathrm{IF}(z, T, F) = \lim_{\varepsilon \to 0} \frac{T(F_{\varepsilon,z}) - T(F)}{\varepsilon} = \left.\frac{\partial}{\partial \varepsilon}\, T(F_{\varepsilon,z})\right|_{\varepsilon=0}.$$
For the generalized SSCM class we obtain the following result.

Proposition 2. Denote $S_g(F) = \Xi_g$ and let $\mu = 0$ in (1). The influence function of $S_g$ at the distribution $F$ is given by
$$\mathrm{IF}(z, S_g, F) = \left.\frac{\partial}{\partial \varepsilon}\, S_g(F_{\varepsilon,z})\right|_{\varepsilon=0} = g(z)g(z)^\top - \Xi_g + \left.\frac{\partial}{\partial \varepsilon} \int g_\varepsilon(X)\, g_\varepsilon(X)^\top\, dF(X)\right|_{\varepsilon=0}. \tag{9}$$

If $g$ does not depend on $F$, the last term of (9) vanishes. For example, for $g(t) = t$ we retrieve the IF of the classical covariance matrix, $\mathrm{IF}(z, \Sigma, F) = zz^\top - \Sigma$, and for $g(t) = t/\|t\|$ we obtain $\mathrm{IF}(z, \mathrm{SSCM}, F) = (z/\|z\|)(z/\|z\|)^\top - \mathrm{SSCM}(F)$, in line with the findings of [5]. For the GSSCM estimators defined by the functions (4)–(8) the last term of (9) remains, and the expressions of their IF can be found in Appendix A.3.
In order to visualize the influence function we consider the bivariate standard normal case, i.e., $F = N(0, I_2)$. We put contamination at $(z,z)$ or $(z,0)$ for different values of $z$ and plot the IF for the diagonal elements and the off-diagonal element. Note that we cannot compare the raw IFs directly as $S_g(F) = \Xi_g = c_g I_2$, where $c_g = \int g_1(X)^2\, dF(X)$; hence $\Xi_g$ is only equal to $I_2$ up to a factor. In order to make the estimators consistent for this distribution, we can divide them by $c_g$, and so we plot $\mathrm{IF}(z, S_g, F)/c_g$ in Fig. 2.
Fig. 2. Influence functions of the GSSCM at the bivariate standard normal distribution for contamination at $(z,z)$ (left) and $(z,0)$ (right). The rows correspond to the first diagonal element $S_{11}$ (top), the off-diagonal element $S_{12}$ (middle), and $S_{22}$ (bottom).

The rows in Fig. 2 correspond to the IF of the first diagonal element $S_{11}$ (top), the off-diagonal element $S_{12}$ (middle) and the element $S_{22}$ (bottom). Let us first consider the left part of the figure, which contains the IFs for an outlier in $(z,z)$. By symmetry, the IFs of the diagonal elements $S_{11}$ and $S_{22}$ are the same here. In the regions where the function $\xi$ is 1, the IF is quadratic, like that of the classical covariance. The diagonal elements of the IF of the SSCM are zero, except at $z = 0$ where it takes the value $-1$. The Quad IF is the only one which redescends as $|z|$ increases, whereas the others are also bounded but stabilize at a value around 1.3. The shape of the IF of the Ball estimator resembles that of the univariate Huber M-estimator of scale.

For the IF of the off-diagonal element $S_{12}$, the picture is very different. All are redescending except for the SSCM and Winsor. Here it is Winsor whose IF resembles that of Huber's M-estimator of scale. Note that the IFs of the Ball and Shell estimators have large jumps at their cutoff values. The discontinuities in the IFs are due to the fact that the cutoffs depend on the median and the MAD of the distances $\|X\|^{2/3}$, as both the median and the MAD have jumps in their IF.
The right panel of Fig. 2 shows the influence functions for an outlier in $(z,0)$. In this case the IFs of the diagonal elements $S_{11}$ and $S_{22}$ are no longer the same, as the symmetry is broken. The IFs of $S_{11}$ are again quadratic where $\xi = 1$, with jumps at the cutoffs. Note that these cutoffs are now located at different values of $z$, as $\|(z,0)\| \neq \|(z,z)\|$. The IF of the off-diagonal element is constant at 0, indicating that $S_{12}$ remains zero even when there is an outlier at $(z,0)$. Finally, for the second diagonal element $S_{22}$ the IF of the SSCM is $-1$. This is because adding $\varepsilon$ of contamination at $(z,0)$ reduces the mass of the remaining part of $F$ by $\varepsilon$, which lowers the estimated scatter in the vertical direction. For the other estimators there is an additional effect of $(z,0)$ on the cutoffs, which causes the discontinuities.
A second tool for quantifying the robustness of an estimator is the finite-sample breakdown value [6]. For a multivariate location estimator $T$ and a dataset $X$ of size $n$, the breakdown value is the smallest fraction of the data that needs to be replaced by contamination to make the resulting location estimate lie arbitrarily far away from the original location $T(X)$. More precisely,
$$\varepsilon^*(T, X) = \min\left\{ \frac{m}{n} : \sup_{X^*_m} \|T(X^*_m) - T(X)\| = \infty \right\},$$
where $X^*_m$ ranges over all datasets obtained by replacing any $m$ points of $X$ by arbitrary points.
For a multivariate estimator of scale $S$, the breakdown value is defined as the smallest fraction of contamination needed to make an eigenvalue of $S$ either arbitrarily large or arbitrarily close to zero. We denote the eigenvalues of $S(X)$ by $\lambda_1\{S(X)\} \geqslant \cdots \geqslant \lambda_p\{S(X)\}$. The breakdown value of $S$ is then given by
$$\varepsilon^*(S, X) = \min\left\{ \frac{m}{n} : \sup_{X^*_m} \max\left[\lambda_1\{S(X^*_m)\},\ \lambda_p^{-1}\{S(X^*_m)\}\right] = \infty \right\}.$$
For the results on breakdown we assume the following conditions on the function $\xi$:

1. The function $\xi$ takes values in $[0, 1]$.
2. For any dataset $X$, one has $\#\{x_i : \xi\{\|x_i - T(X)\|\} = 1\} \geqslant \lfloor (n+p+1)/2 \rfloor$.
3. For any vector $t$, one has $\|g(t)\| = \|t\|\, \xi(\|t\|) \leqslant \mathrm{hmed}_i(d_i) + 1.4826 \times \mathrm{hmad}_i(d_i)$.

Note that all functions $\xi$ proposed in (4)–(8) satisfy these assumptions. The following proposition gives the breakdown value of the GSSCM scatter estimator $S_g$.
Proposition 3. Let $X = \{x_1, \ldots, x_n\}$ be a $p$-dimensional dataset in general position, meaning that no $p+1$ points lie on the same hyperplane. Also assume that the location estimator $T$ has a breakdown value of at least $\lfloor (n-p+1)/2 \rfloor / n$. Then $\varepsilon^*(S_g, X) = \lfloor (n-p+1)/2 \rfloor / n$.

As we would like the GSSCM scatter estimator to attain this breakdown value, we have to use a location estimator whose breakdown value is at least $\lfloor (n-p+1)/2 \rfloor / n$. The following proposition verifies that the $k$-step LTS estimator satisfies this, and even attains the best possible breakdown value for translation equivariant location estimators.

Proposition 4. The $k$-step LTS estimator $T_k$ satisfies $\varepsilon^*(T_k, X) = \lfloor (n+1)/2 \rfloor / n$ at any $p$-variate dataset $X = \{x_1, \ldots, x_n\}$. When the C-steps are iterated until convergence ($k \to \infty$), the breakdown value remains the same.
3. Simulation study
We now perform a simulation study comparing the GSSCM versions (4)–(8). As the estimators are orthogonally equivariant, it suffices to generate diagonal covariance matrices. We generate $m = 1000$ samples of size $n = 100$ from the multivariate Gaussian distribution of dimension $p = 10$ with center $\mu = 0$ and covariance matrices $\Sigma_1 = I_p$ ('constant eigenvalues'), $\Sigma_2 = \mathrm{diag}(10, 9, \ldots, 1)$ ('linear eigenvalues'), and $\Sigma_3 = \mathrm{diag}(10^2, 9^2, \ldots, 1)$ ('quadratic eigenvalues'). To assess robustness we also add 20% and 40% of contamination in the direction of the last eigenvector, at the point $(0, \ldots, 0, \gamma)$ for several values of $\gamma$. For the location estimator $T$ in (3) we used the $k$-step LTS with $k = 5$.

For measuring how much the estimated $\hat{\Sigma}$ deviates from the true $\Sigma$ we use the Kullback–Leibler divergence (KLdiv) given by
$$\mathrm{KLdiv}(\hat{\Sigma}, \Sigma) = \mathrm{trace}(\hat{\Sigma}\Sigma^{-1}) - \ln\{\det(\hat{\Sigma}\Sigma^{-1})\} - p.$$
We also consider the shape matrices $\hat{\Gamma} = \{\det(\hat{\Sigma})\}^{-1/p}\, \hat{\Sigma}$ and $\Gamma = \{\det(\Sigma)\}^{-1/p}\, \Sigma$, which have determinant 1, and compute $\mathrm{KLdivshape}(\hat{\Sigma}, \Sigma) = \mathrm{KLdiv}(\hat{\Gamma}, \Gamma)$. Both the KLdiv and the KLdivshape are then averaged over the $m = 1000$ replications.
Fig. 3. Simulation results: KLdiv (left) and KLdivshape (right) for the uncontaminated normal distribution, with constant, linear and quadratic eigenvalues.
Fig. 3 shows the simulation results on the uncontaminated data. Looking at KLdiv (left panel), we note that the SSCM
deviates the most from the true covariance matrix Σ. Among the other choices, Winsor and Quad have the lowest bias,
followed by LR, Shell, and Ball. When looking only at the shape component (right panel), SSCM performs the best when the
distribution is spherical (constant eigenvalues), in line with Remark 3.1 in [19]. However, it loses this dominant performance
once the distribution deviates from sphericity. Among the other GSSCM methods Winsor performs the best, followed by its
quadratic counterpart, LR, Shell, and finally Ball.
The results for the simulation with 20% of point contamination are presented in Fig. 4. All plots are shown as a function of γ,
which indicates the position of the outliers. In the left panel (KLdiv), the SSCM has a large bias. The Winsor GSSCM, which
did very well in the uncontaminated setting, now has a disappointing performance when the eigenstructure becomes more
challenging with linear or quadratic eigenvalues. Quad performs a lot better, but also suffers under quadratic eigenvalues.
LR and Shell perform the best here, followed by Ball. Their redescending nature helps them for far outliers. The conclusions
for the shape component (right panel) are largely similar, except that Winsor and especially Ball look worse here.
The simulation results for 40% of contamination are shown in Fig. 5. The KLdiv plots on the left indicate that the SSCM performs poorly for constant and linear eigenvalues, and looks better for quadratic eigenvalues but not when $\gamma$ is large (far outliers). Winsor performs badly for linear and quadratic eigenvalues, whereas Quad does much better. Ball looks okay except for relatively small $\gamma$. LR and Shell perform the best for both small and large $\gamma$, and are okay for intermediate $\gamma$. When estimating the shape component (right panels) SSCM and Winsor have the worst performance overall, whereas Ball also does poorly for small to intermediate $\gamma$. LR and Shell are the best picks here. Quad does almost as well, but redescends more slowly.
4. Application: Principal component analysis
We analyze a multivariate dataset from a study by Reaven and Miller [24]. The dataset contains five numerical variables for 109 subjects, consisting of 33 overt diabetes patients and 76 healthy people. The variables are body weight, fasting plasma glucose, area under the plasma glucose curve, area under the plasma insulin curve, and steady state plasma glucose response. These data were previously analyzed in [22] in the context of clustering using statistical data depth, and are available in the R package ddalpha [23] under the tag chemdiab_2vs3. Here we analyze the data by principal component analysis. We first standardize the data, as the variables have quite different scales. Denote the standardized observations by $z_i$ for $i \in \{1, \ldots, 109\}$.
We consider the diabetes patients as outliers and would like the PCA subspace to model the variability within the healthy patients. For classical PCA, the PCA subspace corresponds to the linear span of the $k$ eigenvectors (also called 'loadings') of the covariance matrix which correspond with the $k$ largest eigenvalues. In similar fashion we can perform PCA based on the GSSCM with the LR radial function (8), by considering the linear span of its $k$ first eigenvectors. We take $k = 3$ components, thereby explaining more than 95% of the variance.
Fig. 4. Simulation results: KLdiv (left) and KLdivshape (right) for the normal distribution with constant (top), linear (middle) and quadratic (bottom) eigenvalues and 20% of contamination. The outliers were placed at the point $(0, \ldots, 0, \gamma)$.

Fig. 6 shows the scores with respect to the first 3 loadings for classical PCA and GSSCM PCA. The scores $s_i$ are the projections of the observations $z_i$ onto the PCA subspace, i.e., $s_{i,j} = z_i^\top v_j$, where $v_j$ denotes the $j$th eigenvector. From these plots, it is clear that the first eigenvector of the classical PCA is heavily attracted by the diabetes patients. As a result, the outliers are only distinguishable in their scores with respect to the first principal component. This is very different for the GSSCM PCA, where the principal components seem to fit the healthy patients better, resulting in outlying scores for the diabetes patients with respect to several principal components.
Fig. 5. Simulation results: KLdiv (left) and KLdivshape (right) for the normal distribution with constant (top), linear (middle) and quadratic (bottom)
eigenvalues and 40% of point contamination.
Fig. 6. Scores from the 3 first loading vectors of classical PCA (left) and GSSCM PCA (right).

Fig. 7. Outlier maps based on classical PCA (left) and GSSCM PCA (right).

In addition to the scores plots, the PCA outlier map of [15] can serve as a diagnostic tool for identifying outliers. It plots the orthogonal distance $\mathrm{OD}_i$ against the score distance $\mathrm{SD}_i$ for every observation $z_i$ in the dataset. The score distance of observation $i$ captures the distance between the observation and the center of the data within the PCA subspace. It is given by
$$\mathrm{SD}_i = \sqrt{\sum_{j=1}^{3} (s_{ij}/\hat{\sigma}_j)^2},$$
where $\hat{\sigma}_j$ denotes the scale of the $j$th scores. For classical PCA $\hat{\sigma}_j$ is their standard deviation, whereas for GSSCM PCA we take their median absolute deviation. The orthogonal distance to the PCA subspace is given by $\mathrm{OD}_i = \|z_i - V s_i\|$, where $V$ is the $5 \times 3$ matrix containing the three eigenvectors in its columns. Both the score distances and the orthogonal distances have
cutoffs, described in [15]. Fig. 7 shows the outlier maps resulting from the classical PCA and the GSSCM PCA. Classical PCA
clearly fails to distinguish the diabetes patients from the healthy subjects. In contrast, GSSCM PCA flags most of the diabetes
patients as having both an abnormally high orthogonal distance to the PCA subspace as well as having a projection in the
PCA subspace far away from those of the healthy subjects.
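The score and orthogonal distances can be sketched as follows. This is an illustrative Python version; the cutoff values of [15] are omitted, and we use the raw median absolute deviation for the robust scale, matching the text (whether a consistency factor should be applied is a detail left open here).

```python
import numpy as np

def outlier_map_distances(Z, V, robust=True):
    """Score distances SD_i and orthogonal distances OD_i for the PCA
    outlier map. Z holds the (standardized) observations in its rows and
    V holds the k retained eigenvectors in its columns."""
    S = Z @ V                               # scores s_{i,j} = z_i^T v_j
    if robust:
        sig = np.median(np.abs(S - np.median(S, axis=0)), axis=0)  # raw MAD
    else:
        sig = S.std(axis=0, ddof=1)         # standard deviation (classical PCA)
    SD = np.sqrt(np.sum((S / sig) ** 2, axis=1))
    OD = np.linalg.norm(Z - S @ V.T, axis=1)   # distance to the PCA subspace
    return SD, OD
```

Observations with large OD lie far from the PCA subspace, while observations with large SD project far from the center within it.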
5. Conclusions
The spatial sign covariance matrix (SSCM) can be seen as a member of a larger class called Generalized SSCM (GSSCM)
estimators in which other radial functions are allowed. It turns out that the GSSCM estimators are still consistent for the true
eigenvectors while preserving the ranks of the eigenvalues. Their computation is as fast as the SSCM. We have studied five
GSSCM methods with intuitively appealing radial functions, and shown that their asymptotic breakdown values are as high
as that of the original SSCM. We also derived their influence functions and carried out a simulation study.
The radial function of the SSCM is $\xi(r) = 1/r$, which implies that points near the center are given a very high weight in the covariance computation. Our alternative radial functions give these points a weight of at most 1, which yields better performance at uncontaminated Gaussian data (Fig. 3) as well as contaminated data (Figs. 4 and 5). In particular, Winsor is the most similar to SSCM since its $\xi(r)$ is 1 for the central half of the data and $1/r$ for the outer half. It performs best for uncontaminated data, but still suffers when far outliers are present. It is almost uniformly outperformed by Quad, whose $\xi(r)$ is 1 in the central half and $1/r^2$ outside it. The influence of outliers on Quad smoothly redescends to zero. The other three estimators are hard redescenders whose $\xi(r) = 0$ for large enough $r$. Among them, the linearly redescending (LR) radial function performed best overall.
A potential topic for further research is to investigate principal component analysis based on a GSSCM covariance matrix.
Software availability
R-code for computing these estimators and an example script are available from the website wis.kuleuven.be/stat/robust/software.
Acknowledgments
This research was supported by projects of Internal Funds KU Leuven, Belgium.
Appendix
A.1. Distribution of Euclidean distances
Exact distribution. The exact distribution of the squared Euclidean distance $\|X\|^2$ of a multivariate Gaussian distribution with general covariance matrix is given by the following result.

Proposition 5. Let $X \sim N(0, \Sigma)$, and suppose the eigenvalues of $\Sigma$ are given by $\lambda_1, \ldots, \lambda_p$. Then
$$\|X\|^2 \sim \sum_{i=1}^{p} \Gamma(1/2,\ 2\lambda_i).$$
For $p \to \infty$ we have $\|X\|^2 \rightsquigarrow N\left(\sum_{i=1}^{\infty} \lambda_i,\ 2\sum_{i=1}^{\infty} \lambda_i^2\right)$.
Proof. We can write $X = UDZ$, where $U$ is an orthogonal matrix, $D$ is the diagonal matrix with elements $\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_p}$, and $Z$ follows the $p$-variate standard Gaussian distribution. Note that $\|X\|^2 = \|UDZ\|^2 = \|DZ\|^2 = \sum_{i=1}^{p} \lambda_i Z_i^2$, where $Z_i^2 \sim \chi^2(1)$. Therefore, $\lambda_i Z_i^2 \sim \Gamma(1/2, 2\lambda_i)$, so the distribution of $\|X\|^2$ is a sum of independent gamma random variables with a constant shape of 1/2 and varying scale parameters equal to twice the eigenvalues of the covariance matrix. As $p \to \infty$, one has
$$\|X\|^2 \rightsquigarrow N\left(\sum_{i=1}^{\infty} \lambda_i,\ 2\sum_{i=1}^{\infty} \lambda_i^2\right)$$
by the Lyapunov Central Limit Theorem. □
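Proposition 5 can be checked numerically: a $\Gamma(1/2, 2\lambda_i)$ variable has mean $\lambda_i$ and variance $2\lambda_i^2$, so $\|X\|^2$ should have mean $\sum_i \lambda_i$ and variance $2\sum_i \lambda_i^2$. A small Monte Carlo sketch in Python, with eigenvalues chosen by us purely for illustration:

```python
import numpy as np

# Monte Carlo check of Proposition 5: for X ~ N(0, Sigma), ||X||^2 is a sum of
# independent Gamma(1/2, 2*lambda_i) variables, hence it has mean sum(lambda_i)
# and variance 2*sum(lambda_i^2). The orthogonal U does not affect ||X||.
rng = np.random.default_rng(3)
lam = np.array([4.0, 2.0, 1.0])      # arbitrary eigenvalues of Sigma
Sigma = np.diag(lam)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
d2 = np.sum(X ** 2, axis=1)
print(d2.mean())                     # close to sum(lam) = 7
print(d2.var())                      # close to 2*sum(lam^2) = 42
```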
Approximate distribution of a sum of Gamma variables. Proposition 5 gives the exact distribution of the squared Euclidean distances $\|X\|^2$. The distribution of a sum of gamma random variables has been studied in [21]. Quantiles of this distribution can be computed by the R package coga [14] for convolutions of gamma distributions. However, this computation requires knowledge of the eigenvalues $\lambda_1, \ldots, \lambda_p$ that we are trying to estimate. Therefore we need a transformation of the Euclidean distances such that the transformed distances have an approximate distribution whose quantiles do not require knowing $\lambda_1, \ldots, \lambda_p$.
Fig. 8. Approximation of the third quartile of a coga distribution for dimensions $p \in \{1, \ldots, 20\}$ when the eigenvalues are constant (top left), linear (top right), or quadratic (bottom), using three different normalizing transforms.

In the simplest case $\lambda_1 = \cdots = \lambda_p$ (constant eigenvalues), $\|X\|^2/\lambda_1$ follows a $\chi^2_p$ distribution. It is known that when $p$ increases the distribution of $\|X\|^2$ tends to a Gaussian distribution, but this also holds for some other powers of $\|X\|$. Wilson and Hilferty [33] found that the best transformation of this type was $\|X\|^{2/3}$, in the sense of coming closest to a Gaussian distribution. The quantiles $q_\alpha$ of a Gaussian distribution are easier to compute, and can then be transformed back to $q_\alpha^{3/2}$.
It turns out that the same Wilson–Hilferty transformation also works quite well in the more general situation where the eigenvalues $\lambda_1, \ldots, \lambda_p$ need not be the same. We came to this conclusion by a simulation study, a part of which is illustrated here. The dimension $p$ ranged from 1 to 20 by steps of 1. For each $p$, we generated $n = 10^6$ observations $y_1, \ldots, y_n$ from the coga distribution with shape parameters $(0.5, \ldots, 0.5)$. The scale parameters had three settings: constant $(2, \ldots, 2)$, linear $(p, p-1, \ldots, 1)$, and quadratic $(p^2, (p-1)^2, \ldots, 1)$, after which the scale parameters were further standardized in order to sum to $2p$. These correspond to the distribution of the squared Euclidean norms of a multivariate normal distribution where the covariance matrix has eigenvalues that are constant or proportional to $(p, p-1, \ldots, 1)$ (linear eigenvalues) or to $(p^2, (p-1)^2, \ldots, 1)$ (quadratic eigenvalues). Denote the unsquared Euclidean norms as $r_i = \sqrt{y_i}$. We then estimate quantiles, e.g., $Q_3$, by assuming normality of the transformed values $h_1(r_i) = r_i^2$ (square), $h_2(r_i) = r_i$ (Fisher), and $h_3(r_i) = r_i^{2/3}$ (Wilson–Hilferty), computing the third quartile of a Gaussian distribution with $\hat{\mu} = \mathrm{median}_i\{h(r_i)\}$ and $\hat{\sigma} = \mathrm{mad}_i\{h(r_i)\}$. Finally, we evaluated the cumulative distribution function of the coga distribution in $\hat{Q}_3^2$. Ideally, we would like to obtain $F_{\mathrm{coga}}(\hat{Q}_3^2) = 0.75$. The result of this experiment is shown in Fig. 8. We clearly see that the Wilson–Hilferty transform brings the approximate quantile closest to its target value. The results for the first quartile $Q_1$ (not shown) are very similar.
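A condensed Python sketch of this experiment for one setting ($p = 10$, linear eigenvalues) is given below. Two simplifications are ours: the exact coga CDF is replaced by the empirical CDF of the simulated sample, and the MAD carries the usual Gaussian consistency factor 1.4826, as R's mad() does.

```python
import numpy as np

# Condensed version of the quantile experiment for p = 10, 'linear' scales.
rng = np.random.default_rng(4)
p = 10
scales = np.arange(p, 0, -1.0)            # p, p-1, ..., 1
scales *= 2 * p / scales.sum()            # standardize the scales to sum to 2p
# a coga sample: convolution of independent Gamma(shape 1/2, scale s_k)
y = sum(rng.gamma(0.5, s, size=100_000) for s in scales)
r = np.sqrt(y)                            # unsquared Euclidean norms

u = r ** (2 / 3)                          # Wilson-Hilferty transform h3
mu = np.median(u)
sig = 1.4826 * np.median(np.abs(u - mu))  # MAD with consistency factor
q3_u = mu + 0.6745 * sig                  # Gaussian third quartile in u-scale
Q3_hat = q3_u ** 1.5                      # back-transform to the distance scale

cov = np.mean(y <= Q3_hat ** 2)           # should be close to 0.75
print(cov)
```

Repeating this with $h_1$ and $h_2$ in place of $h_3$ reproduces the comparison of Fig. 8.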
A.2. Proof of Proposition 1
Part 1: Preservation of the eigenvectors. First note that $g$ is orthogonally equivariant, i.e., $g(HX) = Hg(X)$ for any orthogonal matrix $H$. Therefore $S_g = E_{F_X}\{g(X)g(X)^\top\}$ implies $E_{F_X}\{g(HX)g(HX)^\top\} = H S_g H^\top$.
The distribution of $Z$ is spherically symmetric, hence invariant to reflections along a coordinate axis, which are described by diagonal matrices $R$ with one entry equal to $-1$ and all other entries $+1$. For every reflection matrix $R$ it thus holds that
$$E_{F_Z}\{g(DZ)g(DZ)^\top\} = E_{F_Z}\{g(DRZ)g(DRZ)^\top\} = E_{F_Z}\{g(RDZ)g(RDZ)^\top\} = R\,E_{F_Z}\{g(DZ)g(DZ)^\top\}\,R^\top,$$
where the second equality holds because $DR = RD$ as both $D$ and $R$ are diagonal, and the last equality because $R$ is orthogonal. Therefore $E_{F_Z}\{g(DZ)g(DZ)^\top\}$ is a diagonal matrix, which we denote by $\Lambda_g = \mathrm{diag}(\lambda_{g,1}, \ldots, \lambda_{g,p})$.

Now take an arbitrary orthogonal matrix $U$ and let $X = UDZ$. Then
$$S_g = E_{F_Z}\{g(UDZ)g(UDZ)^\top\} = U\,E_{F_Z}\{g(DZ)g(DZ)^\top\}\,U^\top = U \Lambda_g U^\top.$$
For the plain covariance matrix $\Sigma$ of $X$ we have $\Sigma = E_{F_Z}\{UDZ(UDZ)^\top\} = U \Lambda U^\top$, where $\Lambda = DD^\top = \mathrm{diag}(\delta_1^2, \ldots, \delta_p^2)$. Therefore the same matrix $U$ diagonalizes both $\Sigma$ and $S_g$, hence $S_g$ and $\Sigma$ have the same eigenvectors.
Part 2: Preservation of the ranks of the eigenvalues. Let $i > j$ and suppose that $\delta_i > \delta_j$. We will show that $\lambda_{g,i} > \lambda_{g,j}$. Note that
$$\lambda_{g,i} = \int g(DZ)_i^2\, f_Z(Z)\, dZ = \int \delta_i^2 z_i^2\, \xi(\|DZ\|)^2 f_Z(Z)\, dZ,$$
where $f_Z$ is the density of $Z$. Similarly, we have
$$\lambda_{g,j} = \int g(DZ)_j^2\, f_Z(Z)\, dZ = \int \delta_j^2 z_j^2\, \xi(\|DZ\|)^2 f_Z(Z)\, dZ.$$
This means that $\lambda_{g,i} > \lambda_{g,j}$ is equivalent to
$$\int (\delta_i^2 z_i^2 - \delta_j^2 z_j^2)\, \xi(\|DZ\|)^2 f_Z(Z)\, dZ > 0. \tag{A.1}$$
As $Z$ is spherically symmetric, i.e., $f_Z(Z) \sim w(\|Z\|)$, we can write (A.1) as
$$\int (\delta_i^2 z_i^2 - \delta_j^2 z_j^2)\, \xi(\|DZ\|)^2 w(\|Z\|)\, dZ > 0. \tag{A.2}$$
Note that we can change the variable of integration as follows. Let $y_k = \delta_k z_k$ and write $Y = (y_1, \ldots, y_p)$. Then (A.2) is equivalent to
$$\frac{1}{\delta_1 \cdots \delta_p} \int (y_i^2 - y_j^2)\, \xi(\|Y\|)^2\, w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2}\bigg)^{1/2}\Bigg)\, dY > 0. \tag{A.3}$$
We can ignore the positive constant $1/(\delta_1 \cdots \delta_p)$ and split the integral over the domains $A = \{x \in \mathbb{R}^p : |x_i| > |x_j|\}$ and $B = \{x \in \mathbb{R}^p : |x_i| < |x_j|\}$, yielding
$$\begin{aligned}
\int (y_i^2 - y_j^2)\, \xi(\|Y\|)^2\, w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2}\bigg)^{1/2}\Bigg)\, dY
&= \int_A (y_i^2 - y_j^2)\, \xi(\|Y\|)^2\, w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2}\bigg)^{1/2}\Bigg)\, dY + \int_B (y_i^2 - y_j^2)\, \xi(\|Y\|)^2\, w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2}\bigg)^{1/2}\Bigg)\, dY \\
&= \int_A (y_i^2 - y_j^2)\, \xi(\|Y\|)^2\, w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2}\bigg)^{1/2}\Bigg)\, dY + \int_A (y_j^2 - y_i^2)\, \xi(\|Y\|)^2\, w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2} + \Delta_{ij}\bigg)^{1/2}\Bigg)\, dY \\
&= \int_A (y_i^2 - y_j^2)\, \xi(\|Y\|)^2 \Bigg\{ w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2}\bigg)^{1/2}\Bigg) - w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2} + \Delta_{ij}\bigg)^{1/2}\Bigg) \Bigg\}\, dY,
\end{aligned}$$
where in the second equality we have changed the variables of the integration over $B$ by replacing $(y_i, y_j)$ by $(-y_j, y_i)$, which has Jacobian 1. The $\Delta_{ij}$ in that step is the correction term
$$\Delta_{ij} = \frac{y_i^2}{\delta_j^2} + \frac{y_j^2}{\delta_i^2} - \frac{y_i^2}{\delta_i^2} - \frac{y_j^2}{\delta_j^2} = \frac{y_i^2 - y_j^2}{\delta_j^2} - \frac{y_i^2 - y_j^2}{\delta_i^2} = (y_i^2 - y_j^2)\bigg(\frac{1}{\delta_j^2} - \frac{1}{\delta_i^2}\bigg).$$
Note that on $A$ it holds that $|y_i| > |y_j|$, hence $y_i^2 - y_j^2 > 0$, so $\Delta_{ij} > 0$. Since $w$ is a decreasing function, it follows that
$$w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2}\bigg)^{1/2}\Bigg) - w\Bigg(\bigg(\sum_{k=1}^p \frac{y_k^2}{\delta_k^2} + \Delta_{ij}\bigg)^{1/2}\Bigg) > 0,$$
which implies (A.3), so $\lambda_{g,i} > \lambda_{g,j}$. In contrast, if $\delta_i$ and $\delta_j$ are tied, i.e., $\delta_i = \delta_j$, it follows that $\Delta_{ij} = 0$, hence $\lambda_{g,i} = \lambda_{g,j}$. This concludes the proof of Proposition 1. □
A.3. Influence functions
Proof of Proposition 2. Consider the contaminated distribution $F_{\varepsilon,z} = (1-\varepsilon)F + \varepsilon\Delta_z$, where $z \in \mathbb{R}^p$ and $\varepsilon \in [0,1]$. We then have
$$S_g(F_{\varepsilon,z}) = E_{F_{\varepsilon,z}}\{g_\varepsilon(X)\, g_\varepsilon(X)^\top\} = (1-\varepsilon)\int g_\varepsilon(X)\, g_\varepsilon(X)^\top\, dF(X) + \varepsilon \int g_\varepsilon(X)\, g_\varepsilon(X)^\top\, d\Delta_z.$$
If we take the derivative with respect to $\varepsilon$ and evaluate it at $\varepsilon = 0$, we get
$$\frac{\partial}{\partial\varepsilon}\, S_g(F_{\varepsilon,z})\bigg|_{\varepsilon=0} = g(z)g(z)^\top - \Xi_g + \int \frac{\partial}{\partial\varepsilon}\big\{g_\varepsilon(X)\, g_\varepsilon(X)^\top\big\}\, dF(X)\bigg|_{\varepsilon=0}.$$
Calculation of the IF. While the expression of the influence function may seem relatively simple, its (numerical) calculation is rather involved. We can write
$$\int \frac{\partial}{\partial\varepsilon}\big\{g_\varepsilon(X)\, g_\varepsilon(X)^\top\big\}\, dF(X)\bigg|_{\varepsilon=0} = \int \bigg[\frac{\partial}{\partial\varepsilon}\{g_\varepsilon(X)\}\, g_\varepsilon(X)^\top + g_\varepsilon(X)\, \frac{\partial}{\partial\varepsilon}\{g_\varepsilon(X)^\top\}\bigg]\, dF(X)\bigg|_{\varepsilon=0} = \int \bigg[\frac{\partial}{\partial\varepsilon}\, g_\varepsilon(X)\bigg|_{\varepsilon=0} g(X)^\top + g(X)\, \frac{\partial}{\partial\varepsilon}\, g_\varepsilon(X)^\top\bigg|_{\varepsilon=0}\bigg]\, dF(X).$$
So the term we need to determine is $\partial g_\varepsilon(X)/\partial\varepsilon|_{\varepsilon=0}$. Recalling that $g(t) = t\,\xi(\|t\|)$, we have $g_\varepsilon(t) = t\,\xi_\varepsilon(\|t\|)$. This means that the contamination affects $g$ because it affects the radial function $\xi$. Therefore we have to compute $\partial g_\varepsilon(X)/\partial\varepsilon|_{\varepsilon=0} = X\, \partial\xi_\varepsilon(\|X\|)/\partial\varepsilon|_{\varepsilon=0}$ for the functions $g$ given by (4)–(8).
In these functions $\xi$ depends on $F_X$ through the distribution of $\|X\|^{2/3}$. Suppose that $\|X\|^{2/3} \sim G$ when $X \sim F$, so $G$ is a univariate distribution. For $X_\varepsilon \sim F_{\varepsilon,z} = (1-\varepsilon)F + \varepsilon\Delta_z$ we then have $\|X_\varepsilon\|^{2/3} \sim G_{\varepsilon,\|z\|^{2/3}} = (1-\varepsilon)G + \varepsilon\Delta_{\|z\|^{2/3}}$. For uncontaminated data the density of $\|X\|^{2/3}$ is given by
$$f_G(t) = f_{\mathrm{coga}}(t^3)\,|3t^2|,$$
where $f_{\mathrm{coga}}$ is the density of the convolution of gamma distributions. We need this density to evaluate the influence functions of the median and mad. The cutoffs in the paper are
$$Q_1 = \big(\mathrm{hmed}\,\|X\|^{2/3} - \mathrm{hmad}\,\|X\|^{2/3}\big)^{3/2}, \qquad Q_2 = \big(\mathrm{hmed}\,\|X\|^{2/3}\big)^{3/2},$$
$$Q_3 = \big(\mathrm{hmed}\,\|X\|^{2/3} + \mathrm{hmad}\,\|X\|^{2/3}\big)^{3/2}, \qquad Q_3^* = \big(\mathrm{hmed}\,\|X\|^{2/3} + 1.4826 \times \mathrm{hmad}\,\|X\|^{2/3}\big)^{3/2},$$
and we can compute their influence functions, viz.
$$\mathrm{IF}(z, Q_1, F) = \tfrac{3}{2}\sqrt{\mathrm{median}(G) - \mathrm{mad}(G)}\,\big\{\mathrm{IF}(\|z\|^{2/3}, \mathrm{median}, G) - \mathrm{IF}(\|z\|^{2/3}, \mathrm{mad}, G)\big\},$$
$$\mathrm{IF}(z, Q_2, F) = \tfrac{3}{2}\sqrt{\mathrm{median}(G)}\,\mathrm{IF}(\|z\|^{2/3}, \mathrm{median}, G),$$
$$\mathrm{IF}(z, Q_3, F) = \tfrac{3}{2}\sqrt{\mathrm{median}(G) + \mathrm{mad}(G)}\,\big\{\mathrm{IF}(\|z\|^{2/3}, \mathrm{median}, G) + \mathrm{IF}(\|z\|^{2/3}, \mathrm{mad}, G)\big\},$$
$$\mathrm{IF}(z, Q_3^*, F) = \tfrac{3}{2}\sqrt{\mathrm{median}(G) + 1.4826 \times \mathrm{mad}(G)}\,\big\{\mathrm{IF}(\|z\|^{2/3}, \mathrm{median}, G) + 1.4826 \times \mathrm{IF}(\|z\|^{2/3}, \mathrm{mad}, G)\big\}.$$
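Empirical versions of these cutoffs are straightforward to compute from the observed norms. The following Python sketch is an illustration, not the authors' code; the plain sample median and unnormalized MAD stand in for hmed and hmad:

```python
import numpy as np

def gsscm_cutoffs(norms):
    """Sample versions of the cutoffs Q1, Q2, Q3 and Q3*, computed on
    the Wilson-Hilferty scale ||x||^(2/3). The plain median and the
    unnormalized MAD stand in for the paper's hmed and hmad."""
    h = norms ** (2.0 / 3.0)
    med = np.median(h)
    mad = np.median(np.abs(h - med))
    return {"Q1": (med - mad) ** 1.5,
            "Q2": med ** 1.5,
            "Q3": (med + mad) ** 1.5,
            "Q3*": (med + 1.4826 * mad) ** 1.5}
```

By construction $Q_1 < Q_2 < Q_3 < Q_3^*$, and for Gaussian data roughly a quarter of the norms fall below $Q_1$ and a quarter above $Q_3$, since the transformed norms are approximately Gaussian.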
The Winsor GSSCM is given by $\xi(r) = \mathbb{1}_{\{r \le Q_2\}} + (Q_2/r)\,\mathbb{1}_{\{r > Q_2\}}$. For the contaminated case this becomes $\xi_\varepsilon(r) = \mathbb{1}_{\{r \le Q_{2,\varepsilon}\}} + (Q_{2,\varepsilon}/r)\,\mathbb{1}_{\{r > Q_{2,\varepsilon}\}}$. We then have
$$\frac{\partial}{\partial\varepsilon}\,\xi_\varepsilon(r) = \frac{\partial}{\partial\varepsilon}\bigg\{\mathbb{1}_{[0,Q_{2,\varepsilon}]}(r) + \frac{Q_{2,\varepsilon}}{r}\,\mathbb{1}_{(Q_{2,\varepsilon},\infty)}(r)\bigg\} = \delta(r - Q_{2,\varepsilon})\,Q_{2,\varepsilon}' + \frac{Q_{2,\varepsilon}'}{r}\,\mathbb{1}_{(Q_{2,\varepsilon},\infty)}(r) - \frac{Q_{2,\varepsilon}}{r}\,\delta(r - Q_{2,\varepsilon})\,Q_{2,\varepsilon}',$$
where $\delta(x - y)$ denotes the distributional derivative of $\mathbb{1}_{(-\infty,x]}(y) = \mathbb{1}_{[y,\infty)}(x)$ with respect to $x$. Evaluation at $\varepsilon = 0$ gives
$$\delta(r - Q_2)\,\mathrm{IF}(z, Q_2, F) + \frac{\mathrm{IF}(z, Q_2, F)}{r}\,\mathbb{1}_{(Q_2,\infty)}(r) - \frac{Q_2}{r}\,\delta(r - Q_2)\,\mathrm{IF}(z, Q_2, F) = \bigg(1 - \frac{Q_2}{r}\bigg)\,\delta(r - Q_2)\,\mathrm{IF}(z, Q_2, F) + \frac{\mathrm{IF}(z, Q_2, F)}{r}\,\mathbb{1}_{(Q_2,\infty)}(r).$$
As $(1 - Q_2/r)\,\delta(r - Q_2)$ is 0 everywhere, we only need to integrate the last term. This yields
$$\frac{\partial}{\partial\varepsilon}\, g_\varepsilon(X)\bigg|_{\varepsilon=0} = \frac{X}{\|X\|}\,\mathrm{IF}(z, Q_2, F)\,\mathbb{1}_{(Q_2,\infty)}(\|X\|).$$
The influence function of $S_g$ is thus given by
$$\mathrm{IF}(z, S_g, F) = g(z)g(z)^\top - \Xi_g(F) + \int \frac{X}{\|X\|}\,\mathrm{IF}(z, Q_2, F)\,\mathbb{1}_{(Q_2,\infty)}(\|X\|)\, g(X)^\top\, dF(X) + \int g(X)\,\bigg\{\frac{X}{\|X\|}\,\mathrm{IF}(z, Q_2, F)\,\mathbb{1}_{(Q_2,\infty)}(\|X\|)\bigg\}^\top dF(X).$$
Note that the last two terms in the sum are each other's transpose. The integration is done numerically.
The derivation of the influence function of the Quad GSSCM is entirely similar to that of Winsor. The main difference is that now $\partial g_\varepsilon(X)/\partial\varepsilon|_{\varepsilon=0}$ is given by
$$\frac{\partial}{\partial\varepsilon}\, g_\varepsilon(X)\bigg|_{\varepsilon=0} = 2Q_2\,\mathrm{IF}(z, Q_2, F)\,\frac{X}{\|X\|^2}\,\mathbb{1}_{(Q_2,\infty)}(\|X\|).$$
The linearly redescending (LR) method uses a second cutoff, viz.
$$\xi(r) = \begin{cases} 1 & \text{if } r \le Q_2, \\ (Q_3^* - r)/(Q_3^* - Q_2) & \text{if } Q_2 < r \le Q_3^*, \\ 0 & \text{if } r > Q_3^*. \end{cases}$$
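The Winsor and LR radial functions, and the resulting generalized SSCM, can be sketched as follows. This is a Python illustration under simplifying assumptions: the cutoffs are passed in as arguments, and the coordinatewise median stands in for the location estimate of the paper:

```python
import numpy as np

def xi_winsor(r, q2):
    """Winsor radial function: leave points with ||x|| <= Q2 alone,
    shrink the others radially onto the sphere of radius Q2."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= q2, 1.0, q2 / np.maximum(r, 1e-12))  # guard r = 0

def xi_lr(r, q2, q3s):
    """Linearly redescending radial function with cutoffs Q2 < Q3*:
    1 up to Q2, linear descent on (Q2, Q3*], and 0 beyond Q3*."""
    r = np.asarray(r, dtype=float)
    return np.clip((q3s - r) / (q3s - q2), 0.0, 1.0)

def gsscm(X, xi):
    """Generalized SSCM: average of g(x) g(x)^T over the centered data,
    where g(x) = x * xi(||x||)."""
    Xc = X - np.median(X, axis=0)        # simple stand-in location estimate
    G = Xc * xi(np.linalg.norm(Xc, axis=1))[:, None]
    return G.T @ G / len(X)
```

In the same notation, the original SSCM corresponds to $\xi(r) = 1/r$, while the Ball and Shell variants use the indicator functions of $\{r \le Q_2\}$ and $\{Q_1 \le r \le Q_3\}$, respectively.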
In the contaminated case we obtain $g_\varepsilon(x) = x\,\xi_\varepsilon(\|x\|)$ with
$$\xi_\varepsilon(r) = \begin{cases} 1 & \text{if } r \le Q_{2,\varepsilon}, \\ (Q_{3,\varepsilon}^* - r)/(Q_{3,\varepsilon}^* - Q_{2,\varepsilon}) & \text{if } Q_{2,\varepsilon} < r \le Q_{3,\varepsilon}^*, \\ 0 & \text{if } r > Q_{3,\varepsilon}^*. \end{cases}$$
Taking the derivative with respect to $\varepsilon$ yields
$$\frac{\partial}{\partial\varepsilon}\,\xi_\varepsilon(r) = \delta(r - Q_{2,\varepsilon})\,Q_{2,\varepsilon}' + \frac{Q_{3,\varepsilon}^* - r}{Q_{3,\varepsilon}^* - Q_{2,\varepsilon}}\,\big\{\delta(r - Q_{3,\varepsilon}^*)\,Q_{3,\varepsilon}^{*\prime} - \delta(r - Q_{2,\varepsilon})\,Q_{2,\varepsilon}'\big\} + \mathbb{1}_{[Q_{2,\varepsilon}, Q_{3,\varepsilon}^*]}(r)\,\frac{Q_{3,\varepsilon}^{*\prime}\,(Q_{3,\varepsilon}^* - Q_{2,\varepsilon}) - (Q_{3,\varepsilon}^{*\prime} - Q_{2,\varepsilon}')\,(Q_{3,\varepsilon}^* - r)}{(Q_{3,\varepsilon}^* - Q_{2,\varepsilon})^2}.$$
Evaluation at $\varepsilon = 0$ gives
$$\delta(r - Q_2)\,\mathrm{IF}(z, Q_2, F) + \frac{Q_3^* - r}{Q_3^* - Q_2}\,\big\{\delta(r - Q_3^*)\,\mathrm{IF}(z, Q_3^*, F) - \delta(r - Q_2)\,\mathrm{IF}(z, Q_2, F)\big\} + \mathbb{1}_{[Q_2, Q_3^*]}(r)\,\frac{\mathrm{IF}(z, Q_3^*, F)\,(Q_3^* - Q_2) - \{\mathrm{IF}(z, Q_3^*, F) - \mathrm{IF}(z, Q_2, F)\}\,(Q_3^* - r)}{(Q_3^* - Q_2)^2}.$$
When integrating, only the last term plays a role, yielding
$$\frac{\partial}{\partial\varepsilon}\, g_\varepsilon(X)\bigg|_{\varepsilon=0} = X\,\mathbb{1}_{[Q_2, Q_3^*]}(\|X\|)\,\frac{\mathrm{IF}(z, Q_3^*, F)\,(Q_3^* - Q_2) - \{\mathrm{IF}(z, Q_3^*, F) - \mathrm{IF}(z, Q_2, F)\}\,(Q_3^* - \|X\|)}{(Q_3^* - Q_2)^2} = X\,\mathbb{1}_{[Q_2, Q_3^*]}(\|X\|)\,\frac{\mathrm{IF}(z, Q_3^*, F)\,(\|X\| - Q_2) + \mathrm{IF}(z, Q_2, F)\,(Q_3^* - \|X\|)}{(Q_3^* - Q_2)^2}.$$
For the Ball GSSCM we analogously derive that
$$\frac{\partial}{\partial\varepsilon}\, g_\varepsilon(X)\bigg|_{\varepsilon=0} = \delta(\|X\| - Q_2)\,\mathrm{IF}(z, Q_2, F)\,X.$$
Finally, for the Shell GSSCM we obtain
$$\frac{\partial}{\partial\varepsilon}\, g_\varepsilon(X)\bigg|_{\varepsilon=0} = \big\{\delta(\|X\| - Q_3)\,\mathrm{IF}(z, Q_3, F) - \delta(\|X\| - Q_1)\,\mathrm{IF}(z, Q_1, F)\big\}\,X.$$
This concludes the proof of Proposition 2. □
A.4. Breakdown values
Proof of Proposition 3. Denote by $\mathcal{J}$ the set of all subsets of $\{1, \ldots, n\}$ with $p+1$ elements. For every subset $J \in \mathcal{J}$ we define $\eta_J = \max_{i \in J} d^2(x_i, H_J)$, where $H_J$ is the hyperplane minimizing $\sum_{i \in J} d^2(x_i, H)$ over all possible hyperplanes $H$, and $d(x, H)$ is the Euclidean distance between a point $x$ and a hyperplane $H$.

Define $\eta_X = \min_{J \in \mathcal{J}} \eta_J$. Since the original points $\{x_1, \ldots, x_n\}$ are in general position, no $p+1$ points can lie on the same hyperplane, which ensures that $\eta_X > 0$. We also put $c_1 = \max_i \|x_i - T(X)\| < \infty$.
Part 1. We first show that $\varepsilon^* \ge \lfloor (n-p+1)/2 \rfloor / n$. Let $m < \lfloor (n-p+1)/2 \rfloor$ and replace $m$ observations of $X = \{x_1, \ldots, x_n\}$, yielding $X^*$ with location estimate $T(X^*)$. Because $m/n$ is below the breakdown value of $T$, there is a constant $c_2 < \infty$ such that $\|T(X^*) - T(X)\| \le c_2$ for all such contaminated datasets $X^*$. By the triangle inequality, $\|x_i - T(X^*)\| \le c_1 + c_2 < \infty$. This implies $\mathrm{hmed}(d_i^*) \le c_1 + c_2$, hence $\mathrm{hmed}(d_i^*) + 1.4826 \times \mathrm{hmad}(d_i^*) \le 2.4826 \times \mathrm{hmed}(d_i^*) \le 2.4826 \times (c_1 + c_2)$, where $d_i^* = \|x_i^* - T(X^*)\|$. Therefore $\|g(t)\| \le 2.4826 \times (c_1 + c_2)$ by condition 3.
First we show that the largest eigenvalue of $S_g(X^*)$ is bounded over all such datasets $X^*$. Take any $X^*$, obtained by replacing $m$ points of $X$ by arbitrary points. Then
$$\lambda_{\max} = \sup_{\|u\|=1} u^\top S_g(X^*)\, u = \sup_{\|u\|=1} \frac{1}{n} \sum_{i=1}^n u^\top g\{x_i^* - T(X^*)\}\, g\{x_i^* - T(X^*)\}^\top u = \sup_{\|u\|=1} \frac{1}{n} \sum_{i=1}^n \big[u^\top g\{x_i^* - T(X^*)\}\big]^2 \le \sup_{\|u\|=1} \frac{1}{n} \sum_{i=1}^n \|u\|^2\, \big\|g\{x_i^* - T(X^*)\}\big\|^2 \le \{2.4826 \times (c_1 + c_2)\}^2 < \infty.$$
Next we show that the smallest eigenvalue of $S_g(X^*)$ has a positive lower bound for all contaminated datasets $X^*$. By condition 2 on $\xi$ we know that $\#\{x_i : \xi\{\|x_i - T(X^*)\|\} = 1\} \ge \lfloor (n+p+1)/2 \rfloor$. Therefore we have at least $\lfloor (n+p+1)/2 \rfloor - (\lfloor (n-p+1)/2 \rfloor - 1) = p+1$ regular points for which $\xi\{\|x_i - T(X^*)\|\} = 1$; let us assume without loss of generality that these are $x_1, \ldots, x_{p+1}$. We can now write
$$\lambda_{\min} = \min_{\|u\|=1} u^\top S_g(X^*)\, u = \min_{\|u\|=1} \frac{1}{n} \sum_{i=1}^n \big[u^\top g\{x_i^* - T(X^*)\}\big]^2 \ge \min_{\|u\|=1} \frac{1}{n} \sum_{i=1}^{p+1} \big[u^\top \{x_i - T(X^*)\}\,\xi\{\|x_i - T(X^*)\|\}\big]^2 = \min_{\|u\|=1} \frac{1}{n} \sum_{i=1}^{p+1} \big[u^\top \{x_i - T(X^*)\}\big]^2 \ge \frac{1}{n} \sum_{i=1}^{p+1} d^2\big(x_i, H_{\{1,\ldots,p+1\}}\big) \ge \frac{\eta_X}{n} > 0.$$
Part 2. It remains to show that $\varepsilon^* \le \lfloor (n-p+1)/2 \rfloor / n$. This is the known upper bound for affine equivariant scatter estimators, but that result does not apply here, so we need to show it for this case. Take any $m \ge \lfloor (n-p+1)/2 \rfloor$ and replace the last $m$ points of $X$, keeping the points $x_1, \ldots, x_{n-m}$ unchanged. By location equivariance we can assume without loss of generality that the average of $x_1, \ldots, x_{n-m}$ is zero. For $j \in \{n-m+1, \ldots, n\}$, put $x_j^* = \lambda a_j$, where $a_j$ is such that $\min_{i \in \{n-m+1,\ldots,n\},\, i \neq j} \|a_j - a_i\| \ge 1$ and such that for all $\lambda > 1$ one has $\min_{i \in \{1,\ldots,n-m\}} \|\lambda a_j - x_i\| \ge \lambda$. This is possible by placing the $a_j$ outside the convex hull of $X$ and far enough from each other and from $X$.

Now consider an unbounded increasing sequence of $\lambda_k > 1$. For every $\lambda_k$, the set $\{x_{n-m+1}^*, \ldots, x_n^*\}$ must contain at least one point for which $w_i = 1$; call this point $x_b^*$. Take another point of $X^*$ for which $w_i = 1$ and name it $x_c^*$. Note that $x_c^*$ can be an original data point or a replaced point. We now have that $\|x_b^* - x_c^*\| \ge \lambda$, hence $\|x_b^* - T(X^*)\| + \|x_c^* - T(X^*)\| \ge \lambda$. Therefore $\|x_b^* - T(X^*)\|^2 + \|x_c^* - T(X^*)\|^2 \ge \lambda^2/2$. We then obtain
c−T(X∗)∥2⩾λ2/2. We then obtain
p
j=1
λj{Sg(X∗)} = trace{Sg(X∗)} = 1
n
n
i=1
trace[{x∗
i−T(X∗)}{x∗
i−T(X∗)}⊤] = 1
n
n
i=1∥x∗
i−T(X∗)∥2
⩾1
n{∥x∗
b−T(X∗)∥2+ ∥x∗
c−T(X∗)∥2}⩾λ2/(2n).
This becomes arbitrarily large and so Sg(X∗) explodes. This concludes the proof of Proposition 3.□
Proof of Proposition 4. Showing that $\varepsilon^*(T, X) \le \lfloor (n+1)/2 \rfloor / n$ is easy, since $\lfloor (n+1)/2 \rfloor / n$ is the upper bound on the breakdown value of all translation equivariant location estimators; see, e.g., [18]. It remains to show that $\varepsilon^*(T, X) \ge \lfloor (n+1)/2 \rfloor / n$.
Note that the objective given by the sum of the $h$ smallest squared Euclidean distances is nonincreasing in every C-step. The value of the objective function after step $k$ is
$$\sum_{j=1}^h d_{(j)}^2\{X, T_k(X)\},$$
where $d_{(j)}\{X, T_k(X)\}$ denotes the $j$th order statistic of the distances $\|x_i - T_k(X)\|$, and we have that
$$\sum_{j=1}^h d_{(j)}^2\{X, T_k(X)\} \le \sum_{j=1}^h d_{(j)}^2\{X, T_{k-1}(X)\}.$$
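This C-step iteration can be sketched as follows. The sketch is a Python illustration under simplifying assumptions: the concentration step averages the $h$ points nearest to the current center, as in LTS-type algorithms, and the coordinatewise median replaces the spatial median as the starting value $T_0$:

```python
import numpy as np

def c_step_location(X, max_iter=100):
    """Iterate C-steps for a trimmed location estimate: keep the
    h points closest to the current center, then recenter on their
    mean. The h-trimmed sum of squared distances never increases."""
    n = len(X)
    h = (n + 1) // 2
    T = np.median(X, axis=0)            # stand-in for the spatial median T0
    for _ in range(max_iter):
        d = np.linalg.norm(X - T, axis=1)
        keep = np.argsort(d)[:h]        # indices of the h smallest distances
        T_new = X[keep].mean(axis=0)    # minimizes the objective on this subset
        if np.allclose(T_new, T):
            break                       # converged
        T = T_new
    return T
```

Because only the $h \approx n/2$ nearest points enter each average, far-away replacement points are ignored once the center settles among the majority of the data.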
Recall that $h = \lfloor (n+1)/2 \rfloor$ and define $c_1 = \max_i \|x_i - T_0(X)\| < \infty$. Let $m < n - h$ and replace without loss of generality the last $m$ observations of $X = \{x_1, \ldots, x_n\}$ to obtain $X^* = \{x_1, \ldots, x_{n-m}, x_{n-m+1}^*, \ldots, x_n^*\} = \{x_1^*, \ldots, x_n^*\}$. Since the spatial median $T_0$ does not yet break down for this $m$ [18], there is a constant $c_2$ such that $\max_{i \le n-m} \|x_i - T_0(X^*)\| \le c_2 < \infty$ for all such datasets $X^*$.
Consider $T_k(X^*)$ and the corresponding objective function $\sum_{j=1}^h d_{(j)}^2\{X^*, T_k(X^*)\}$. Since the C-step does not increase the value of the objective function, we have that
$$\sum_{j=1}^h d_{(j)}^2\{X^*, T_k(X^*)\} \le \sum_{j=1}^h d_{(j)}^2\{X^*, T_{k-1}(X^*)\} \le \cdots \le \sum_{j=1}^h d_{(j)}^2\{X^*, T_0(X^*)\}.$$
Note that
$$\sum_{j=1}^h d_{(j)}^2\{X^*, T_0(X^*)\} \le \sum_{i=1}^h \|x_i^* - T_0(X^*)\|^2 = \sum_{i=1}^h \|x_i - T_0(X^*)\|^2 \le h\, c_2^2 \le (h c_2)^2,$$
where the equality uses that $h \le n - m$, so the first $h$ points of $X^*$ are unchanged.
Since $m$ is at most $\lfloor (n-1)/2 \rfloor$ and $h = \lfloor (n+1)/2 \rfloor$, we have at least $\lfloor (n+1)/2 \rfloor - \lfloor (n-1)/2 \rfloor = 1$ point $x_j$ with $1 \le j \le n - m$ for which $\|x_j - T_k(X^*)\|^2 \le d_{(h)}^2\{X^*, T_k(X^*)\}$. Note that
$$\|x_j - T_k(X^*)\|^2 \le \sum_{j=1}^h d_{(j)}^2\{X^*, T_k(X^*)\} \le \sum_{j=1}^h d_{(j)}^2\{X^*, T_0(X^*)\}.$$
So for this $x_j$ we can write
$$\|T_k(X^*) - T_0(X)\| \le \|T_k(X^*) - x_j\| + \|x_j - T_0(X)\| \le h c_2 + c_1 < \infty.$$
Note that this upper bound does not depend on $k$ and therefore remains valid when the procedure is iterated until convergence ($k \to \infty$). This concludes the proof of Proposition 4. □
References
[1] G. Boente, D. Rodriguez, M. Sud, The spatial sign operator: Asymptotic results and applications, J. Multivariate Anal. 170 (2018) (in press).
[2] B.M. Brown, Statistical uses of the spatial median, J. R. Stat. Soc. Ser. B Stat. Methodol. 45 (1983) 25–30.
[3] C. Chatzinakos, L. Pitsoulis, G. Zioutas, Optimization techniques for robust multivariate location and scatter estimation, J. Comb. Optim. 31 (2016)
1443–1460.
[4] C. Croux, C. Dehon, A. Yadine, The k-step spatial sign covariance matrix, Adv. Data Anal. Classif. 4 (2010) 137–150.
[5] C. Croux, E. Ollila, H. Oja, Sign and rank covariance matrices: Statistical properties and application to principal components analysis, in: Y. Dodge (Ed.),
Statistical Data Analysis Based on the L1-Norm and Related Methods, Birkhäuser, Basel, 2002, pp. 257–269.
[6] D. Donoho, P. Huber, The notion of breakdown point, in: P. Bickel, K. Doksum, J. Hodges (Eds.), A Festschrift for Erich Lehmann, Wadsworth, Belmont, CA, 1983, pp. 157–184.
[7] A. Dürre, R. Fried, D. Vogel, The spatial sign covariance matrix and its application for robust correlation estimation, Austrian J. Statist. 46 (2017) 13–22.
[8] A. Dürre, D.E. Tyler, D. Vogel, On the eigenvalues of the spatial sign covariance matrix in more than two dimensions, Statist. Probab. Lett. 111 (2016)
80–85.
[9] A. Dürre, D. Vogel, Asymptotics of the two-stage spatial sign correlation, J. Multivariate Anal. 144 (2016) 54–67.
[10] A. Dürre, D. Vogel, R. Fried, Spatial sign correlation, J. Multivariate Anal. 135 (2015) 89–105.
[11] A. Dürre, D. Vogel, D.E. Tyler, The spatial sign covariance matrix with unknown location, J. Multivariate Anal. 130 (2014) 107–117.
[12] J.C. Gower, Algorithm AS 78: The Mediancentre, J. R. Stat. Soc. Ser. C. Appl. Stat. 23 (1974) 466–470.
[13] F. Hampel, E. Ronchetti, P. Rousseeuw, W. Stahel, Robust Statistics: The Approach Based on Influence Functions, Wiley, New York, 1986.
[14] C. Hu, V. Pozdnyakov, J. Yan, Coga: Convolution of Gamma Distributions, University of Connecticut, 2018. R package version 0.2.2.
[15] M. Hubert, P.J. Rousseeuw, K. Vanden Branden, ROBPCA: A new approach to robust principal component analysis, Technometrics 47 (2005) 64–79.
[16] M. Hubert, P.J. Rousseeuw, T. Verdonck, A deterministic algorithm for robust location and scatter, J. Comput. Graph. Statist. 21 (2012) 618–637.
[17] N. Locantore, J.S. Marron, D.G. Simpson, N. Tripoli, J.T. Zhang, K.L. Cohen, Robust principal component analysis for functional data, Test 8 (1999) 1–28.
[18] H. Lopuhaä, P. Rousseeuw, Breakdown points of affine equivariant estimators of multivariate location and covariance matrices, Ann. Statist. 19 (1991)
229–248.
[19] A.F. Magyar, D.E. Tyler, The asymptotic inadmissibility of the spatial sign covariance matrix for elliptically symmetric distributions, Biometrika 101
(2014) 673–688.
[20] J.I. Marden, Some robust estimates of principal components, Statist. Probab. Lett. 43 (1999) 349–359.
[21] P.G. Moschopoulos, The distribution of the sum of independent gamma random variables, Ann. Inst. Statist. Math. 37 (1985) 541–544.
[22] P. Mozharovskyi, K. Mosler, T. Lange, Classifying real-world data with the DDα-procedure, Adv. Data Anal. Classif. 9 (2015) 287–314.
[23] O. Pokotylo, P. Mozharovskyi, R. Dyckerhoff, Depth and depth-based classification with R-package ddalpha, arXiv:1608.04109, 2016.
[24] G.M. Reaven, R.G. Miller, An attempt to define the nature of chemical diabetes using a multidimensional analysis, Diabetologia 16 (1979) 17–24.
[25] D.M. Rocke, Robustness properties of S-estimators of multivariate location and shape in high dimension, Ann. Statist. 24 (1996) 1327–1345.
[26] P. Rousseeuw, Least median of squares regression, J. Amer. Statist. Assoc. 79 (1984) 871–880.
[27] P. Rousseeuw, K. Van Driessen, A fast algorithm for the Minimum Covariance Determinant estimator, Technometrics 41 (1999) 212–223.
[28] S. Serneels, E. De Nolf, P.J. Van Espen, Spatial sign preprocessing: A simple way to impart moderate robustness to multivariate estimators, J. Chem.
Inf. Model. 46 (2006) 1402–1409, PMID: 16711760.
[29] S. Sirkia, S. Taskinen, H. Oja, D.E. Tyler, Tests and estimates of shape based on spatial signs and ranks, J. Nonparametr. Stat. 21 (2009) 155–176.
[30] S. Taskinen, I. Koch, H. Oja, Robustifying principal component analysis with spatial sign vectors, Statist. Probab. Lett. 82 (2012) 765–774.
[31] S. Visuri, V. Koivunen, H. Oja, Sign and rank covariance matrices, J. Statist. Plann. Inference 91 (2000) 557–575.
[32] S. Visuri, H. Oja, V. Koivunen, Subspace-based direction-of-arrival estimation using nonparametric statistics, IEEE Trans. Signal Process. 49 (2001)
2060–2073.
[33] E.B. Wilson, M.M. Hilferty, The distribution of chi-square, Proc. Nat. Acad. Sci. USA 17 (1931) 684–688.