Journal of Multivariate Analysis 171 (2019) 94–111
A generalized spatial sign covariance matrix
Jakob Raymaekers, Peter Rousseeuw
Department of Mathematics, KU Leuven, Belgium
article info
Article history:
Received 3 May 2018
Available online 24 November 2018
AMS 2010 subject classifications:
primary 62H12
secondary 62H86
Keywords:
Orthogonal equivariance
Outliers
Robust location and scatter
abstract
The well-known spatial sign covariance matrix (SSCM) carries out a radial transform which
moves all data points to a sphere, followed by computing the classical covariance matrix of
the transformed data. Its popularity stems from its robustness to outliers, fast computation,
and applications to correlation and principal component analysis. In this paper we study
more general radial functions. It is shown that the eigenvectors of the generalized SSCM are
still consistent and the ranks of the eigenvalues are preserved. The influence function of the
resulting scatter matrix is derived, and it is shown that its asymptotic breakdown value is
as high as that of the original SSCM. A simulation study indicates that the best results are
obtained when the inner half of the data points are not transformed and points lying far
away are moved to the center.
©2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY
license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Robust estimation of the covariance (scatter) matrix is an important and challenging problem. Over the last decades,
many robust estimators for the covariance matrix have been developed. Many of them possess the attractive property of
affine equivariance, meaning that when the data are subjected to an affine transformation the estimator will transform
accordingly.
However, all highly robust affine equivariant scatter estimators have a combinatorial time complexity. Other estimators
possess the less restrictive property of orthogonal equivariance. This means that the estimators commute with orthogonal
transformations, which are characterized by orthogonal matrices and include rotations and reflections.
The most well-known orthogonally equivariant scatter estimator is the spatial sign covariance matrix (SSCM) proposed
independently in [20,31] and studied in more detail in [8,11,19], among others. The estimator computes the regular
covariance matrix on the spatial signs of the data, which are the projections of the location-centered data points on the unit
sphere. Somewhat surprisingly, this transformation yields a consistent estimator of the eigenvectors of the true covariance
matrix [20] under relatively general conditions on the underlying distribution. Of course the eigenvalues are different from
the eigenvalues of the true covariance matrix, but it was shown in [31] that the order of the eigenvalues is preserved. We
build on this idea by illustrating that the SSCM is part of a larger class of orthogonally equivariant estimators, all of which
estimate the eigenvectors of the true covariance matrix and preserve the order of the eigenvalues.
The SSCM is easy to compute, and has been used extensively in several applications. The most common use of the SSCM
is probably in the context of (functional) spherical PCA as developed in [5,17,30,32]. Like classical PCA, spherical PCA aims to
find a lower dimensional subspace that captures most of the variability in the data. After centering the data, spherical PCA
projects the data onto the unit (hyper)sphere before searching for the directions of highest variability. This projection gives
all data points the same weight in the estimation of the subspace, thereby limiting the influence of potential outliers. The
directions (‘loadings’) of spherical PCA thus correspond to the eigenvectors of the SSCM scatter matrix. The corresponding
scores are usually taken to be the inner products of the loading vectors with the original (centered) data points, not with the
projections of the data points on the sphere. Concrete applications of spherical PCA include the shape of the cornea in ophthalmology, as analyzed in [17], and multichannel signal processing, as illustrated in [31].
In addition to spherical PCA, there has also been a lot of recent research on the use of the SSCM for constructing robust correlation estimators [7,9,10]. The main focus of that work is on results including asymptotic properties, the eigenvalues, and the influence function which measures robustness. A third application of the SSCM is its use as an initial estimate for
more involved robust scatter estimators [4,16]. The SSCM is particularly well-suited for this task as it is very fast and highly
robust against outlying observations and therefore often yields a reliable starting value. Another application of the SSCM is
to testing for sphericity [29], which uses the asymptotic properties of the SSCM in order to assess whether the underlying
distribution of the data deviates substantially from a spherical distribution. Serneels et al. [28] use the spatial sign transform
as an initial preprocessing step in order to obtain a robust version of partial least squares regression. Finally, Boente et al. [1]
study the SSCM as an operator for functional data analysis.
The next section introduces a generalization of the SSCM and studies its properties. Section 3 compares the performance of several members of this class in a small simulation study. Section 4 applies the method to a real data example, and Section 5 concludes. All proofs can be found in the Appendix.
2. Methodology
2.1. Definition
Definition 1. Let X be a p-variate random variable and µ a vector serving as its center. Define the generalized spatial sign covariance matrix (GSSCM) of X by

S_{g_X}(X) = E_{F_X}{ g_X(X − µ) g_X(X − µ)⊤ },  (1)

where the function g_X is of the form

g_X(t) = t ξ_X(∥t∥),  (2)

where we call ξ_X : R⁺ → R⁺ the radial function and ∥·∥ is the Euclidean norm.

Note that the form of g_X in (2) precisely characterizes an orthogonally equivariant data transformation, as shown in [13], p. 276. Also note that the regular covariance matrix corresponds to ξ_X(r) = 1, and that ξ_X(r) = 1/r yields the SSCM.
For a finite data set X = {x_1, ..., x_n} the GSSCM is given by

S_{g_X}(X) = (1/n) Σ_{i=1}^n ξ_X²{∥x_i − T(X)∥} {x_i − T(X)}{x_i − T(X)}⊤,  (3)

where T is a location estimator. Note that the SSCM gives the x_i with ∥x_i − T(X)∥ < 1 a weight higher than 1, but in general this is not required. In fact, the other functions we will propose satisfy ξ_X(r) ≤ 1 for all r.
In the above definitions, we added the subscript X or X to the functions g and ξ to indicate that they can depend on the random variable X or on the dataset X. In what follows we will drop these subscripts to ease the notational burden. We will study the following functions ξ:
1. Winsorizing (Winsor):

ξ(r) = 1 if r ≤ Q2,   ξ(r) = Q2/r if Q2 < r.  (4)

2. Quadratic Winsor (Quad):

ξ(r) = 1 if r ≤ Q2,   ξ(r) = Q2²/r² if Q2 < r.  (5)

3. Ball:

ξ(r) = 1 if r ≤ Q2,   ξ(r) = 0 if Q2 < r.  (6)

4. Shell:

ξ(r) = 0 if r < Q1,   ξ(r) = 1 if Q1 ≤ r ≤ Q3,   ξ(r) = 0 if Q3 < r.  (7)
5. Linearly Redescending (LR):

ξ(r) = 1 if r ≤ Q2,   ξ(r) = (Q3* − r)/(Q3* − Q2) if Q2 < r ≤ Q3*,   ξ(r) = 0 if Q3* < r.  (8)
The cutoffs Q1, Q2, Q3 and Q3* depend on the Euclidean distances ∥x_i − T(X)∥ by

Q1 = [ hmed_i{∥x_i − T(X)∥^{2/3}} − hmad_i{∥x_i − T(X)∥^{2/3}} ]^{3/2},
Q2 = [ hmed_i{∥x_i − T(X)∥^{2/3}} ]^{3/2} = hmed_i{∥x_i − T(X)∥},
Q3 = [ hmed_i{∥x_i − T(X)∥^{2/3}} + hmad_i{∥x_i − T(X)∥^{2/3}} ]^{3/2},
Q3* = [ hmed_i{∥x_i − T(X)∥^{2/3}} + 1.4826 × hmad_i{∥x_i − T(X)∥^{2/3}} ]^{3/2},

where hmed and hmad are variations on the median and the median absolute deviation, given by the order statistic hmed(y_1, ..., y_n) = y_(h) and hmad(y_1, ..., y_n) = hmed_i |y_i − hmed_j(y_j)|, where h = ⌊(n + p + 1)/2⌋. The 2/3 power in these formulas is the Wilson–Hilferty transformation [33] to near normality. In Appendix A.1 it is verified that this transformation brings the above cutoffs close to the theoretical ones, which are quantiles of a convolution of Gamma random variables with different scale parameters.
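To make the preceding definitions concrete, here is a minimal R sketch of the cutoffs and the five radial functions, together with the GSSCM of Eq. (3). It is only an illustration under the stated formulas; all function and variable names are ours, and the authors' released code (see the Software availability section) should be preferred in practice.

# Minimal R sketch of the cutoffs and the radial functions (4)-(8).
# 'Tx' is a location estimate, e.g. the k-step LTS of Section 2.3.
gsscm_cutoffs <- function(X, Tx) {
  n <- nrow(X); p <- ncol(X)
  d <- sqrt(rowSums(sweep(X, 2, Tx)^2))   # Euclidean distances ||x_i - T(X)||
  h <- floor((n + p + 1) / 2)
  y <- d^(2/3)                            # Wilson-Hilferty transform
  hmed <- sort(y)[h]                      # h-th order statistic
  hmad <- sort(abs(y - hmed))[h]
  list(Q1 = (hmed - hmad)^(3/2), Q2 = hmed^(3/2),
       Q3 = (hmed + hmad)^(3/2), Q3star = (hmed + 1.4826 * hmad)^(3/2))
}

xi_winsor <- function(r, Q) ifelse(r <= Q$Q2, 1, Q$Q2 / r)
xi_quad   <- function(r, Q) ifelse(r <= Q$Q2, 1, Q$Q2^2 / r^2)
xi_ball   <- function(r, Q) as.numeric(r <= Q$Q2)
xi_shell  <- function(r, Q) as.numeric(r >= Q$Q1 & r <= Q$Q3)
xi_lr     <- function(r, Q) ifelse(r <= Q$Q2, 1,
               ifelse(r <= Q$Q3star, (Q$Q3star - r) / (Q$Q3star - Q$Q2), 0))

# GSSCM of Eq. (3) for a given radial function xi
gsscm <- function(X, Tx, xi) {
  Q  <- gsscm_cutoffs(X, Tx)
  Xc <- sweep(X, 2, Tx)                   # centered data
  w  <- xi(sqrt(rowSums(Xc^2)), Q)        # weights xi(||x_i - T(X)||)
  crossprod(Xc * w) / nrow(X)             # (1/n) sum_i w_i^2 (x_i - T)(x_i - T)^T
}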
Fig. 1. Radial functions ξ in Eq. (2).

Fig. 1 shows the above functions ξ and that of the SSCM for distances whose square follows the χ²₂ distribution. The ξ of the SSCM is the only one which upweights observations close to the center. The Winsor ξ and its square have a similar shape, but the latter goes down faster. The Ball and Shell ξ functions are both designed to give a weight of 1 to half (in fact, h) of the data points and 0 to the remainder, to make them comparable. Ball does this by giving a weight of 1 to the h points with the smallest distances. Shell is inspired by the idea of Rocke to downweight observations both with very high and very low distances from the center [25]. The Linearly Redescending ξ is a compromise between the Ball and the Quad ξ functions.
2.2. Preservation of the eigenstructure
In what follows, we assume that the distribution F_X of X has an elliptical density with center zero and that its covariance matrix Σ = E_{F_X}(XX⊤) exists. Therefore, X can be written as X = UDZ, where U is a p×p orthogonal matrix, D is a p×p diagonal matrix with strictly positive diagonal elements, and Z is a p-variate random variable which is spherically symmetric, i.e., its density is of the form f_Z(z) ∝ w(∥z∥), where w is a decreasing function. Assume without loss of generality that the covariance matrix of Z is I_p. The following proposition says that S_g(X) has the same eigenvectors as Σ and preserves the ranks of the eigenvalues.

Proposition 1. Let X = UDZ be a p-variate random variable as described above, with D = diag(δ_1, ..., δ_p) where δ_1 ≥ ··· ≥ δ_p > 0. Assume that the covariance matrix S_g = E_{F_X}{g(X)g(X)⊤} of g(X) exists. Then Σ and S_g can be diagonalized as

Σ = UΛU⊤  and  S_g = UΛ_gU⊤,

where Λ = diag(λ_1, ..., λ_p) with λ_j = δ_j², and Λ_g = diag(λ_{g,1}, ..., λ_{g,p}) with λ_{g,1} ≥ ··· ≥ λ_{g,p} > 0 and λ_j = λ_{j+1} ⟺ λ_{g,j} = λ_{g,j+1}.
This proposition justifies the generalized SSCM approach.
2.3. Location estimator
So far we have not specified any location estimator T. For the SSCM the most often used location estimator is the spatial median (see, e.g., [2] and [12]), which we denote by T_0. The spatial median of a dataset X = {x_1, ..., x_n} is defined as

T_0(X) = argmin_θ Σ_{i=1}^n ∥x_i − θ∥.

In order to improve its robustness against a substantial fraction of outliers, we propose to use the k-step least trimmed squares (LTS) estimator. The LTS method was originally proposed in regression [26], and for multivariate location it becomes

T_LTS(X) = argmin_θ Σ_{i=1}^h (∥x − θ∥²)_(i),

where the subscript (i) stands for the ith smallest squared distance. Without the square this becomes the least trimmed absolute distance estimator studied in [3]. For the multivariate location LTS the C-step of [27] simplifies to:

Definition 2 (C-step). Fix h = ⌊(n + 1)/2⌋. Given a location estimate T_{j−1}(X), we take the set I_j = {i_1, ..., i_h} ⊂ {1, ..., n} such that {∥x_i − T_{j−1}(X)∥ : i ∈ I_j} are the h smallest distances in the set {∥x_i − T_{j−1}(X)∥ : i = 1, ..., n}. The C-step then yields

T_j(X) = (1/h) Σ_{i∈I_j} x_i.

The C-step is fast to compute, and guaranteed to lower the LTS objective. The k-step LTS is then the result of k successive C-steps starting from the spatial median T_0(X).
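A minimal R sketch of this estimator follows; the spatial median is computed here with plain Weiszfeld iterations (the paper cites [2,12] but does not prescribe a specific algorithm), and all names are ours.

# k-step LTS location estimator of Section 2.3 (illustrative sketch).
spatial_median <- function(X, iter = 100, tol = 1e-10) {
  theta <- colMeans(X)
  for (it in seq_len(iter)) {
    d <- pmax(sqrt(rowSums(sweep(X, 2, theta)^2)), tol)  # avoid dividing by 0
    theta_new <- colSums(X / d) / sum(1 / d)             # Weiszfeld update
    if (sqrt(sum((theta_new - theta)^2)) < tol) break
    theta <- theta_new
  }
  theta
}

kstep_lts <- function(X, k = 5) {
  n <- nrow(X)
  h <- floor((n + 1) / 2)
  theta <- spatial_median(X)                  # T_0
  for (j in seq_len(k)) {                     # k C-steps (Definition 2)
    d2 <- rowSums(sweep(X, 2, theta)^2)
    I  <- order(d2)[1:h]                      # h points closest to T_{j-1}
    theta <- colMeans(X[I, , drop = FALSE])   # their mean is T_j
  }
  theta
}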
It is also possible to avoid the estimation of location altogether, by calculating the GSSCM on the O(n²) pairwise differences of the data points. This approach is called the “symmetrization” of an estimator, but is more computationally intensive. Visuri et al. [31] studied the symmetrized SSCM and called it Kendall's τ covariance matrix.
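Symmetrization is only a few lines on top of the gsscm sketch above; note the O(n²) memory cost of forming all pairwise differences. This is an illustration, not the implementation of [31].

# GSSCM of all pairwise differences, avoiding a location estimate.
gsscm_symmetrized <- function(X, xi) {
  idx <- t(combn(nrow(X), 2))                  # all pairs i < j
  D <- X[idx[, 1], , drop = FALSE] - X[idx[, 2], , drop = FALSE]
  gsscm(D, rep(0, ncol(X)), xi)                # differences are centered at 0
}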
2.4. Robustness properties
A major reason for the SSCM’s popularity is its robustness against outliers. Robustness can be quantified by the influence
function and the breakdown value. We will study both for the GSSCM.
The influence function [13] quantifies the effect of a small amount of contamination on a statistical functional T. Consider the contaminated distribution F_{ε,z} = (1 − ε)F + εΔ(z), where Δ(z) is the distribution that puts all its mass in z. The influence function of T at F is then given by

IF(z, T, F) = lim_{ε→0} [T(F_{ε,z}) − T(F)] / ε = ∂/∂ε T(F_{ε,z}) |_{ε=0}.
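At a finite sample this definition can be mimicked by an empirical sensitivity curve: replace a small fraction ε of the data by the contamination point z and rescale the change in the estimate. The sketch below is our own generic illustration, not a tool used in the paper.

# Finite-sample approximation of IF(z, T, F) for a statistic 'stat'
# operating on a data matrix, by replacing ceil(eps * n) points with z.
sens_curve <- function(stat, X, z, eps = 0.01) {
  n <- nrow(X)
  m <- ceiling(eps * n)
  Xc <- X
  Xc[sample(n, m), ] <- matrix(z, m, ncol(X), byrow = TRUE)
  (stat(Xc) - stat(X)) / (m / n)               # difference quotient in epsilon
}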
For the generalized SSCM class we obtain the following result:

Proposition 2. Denote S_g(F) = Ξ_g and let µ = 0 in (1). The influence function of S_g at the distribution F is given by

IF(z, S_g, F) = ∂/∂ε S_g(F_{ε,z}) |_{ε=0} = g(z)g(z)⊤ − Ξ_g + ∫ ∂/∂ε {g_ε(X)g_ε(X)⊤} dF(X) |_{ε=0}.  (9)

If g does not depend on F, the last term of (9) vanishes. For example, for g(t) = t we retrieve the IF of the classical covariance matrix IF(z, Σ, F) = zz⊤ − Σ, and for g(t) = t/∥t∥ we obtain IF(z, SSCM, F) = (z/∥z∥)(z/∥z∥)⊤ − SSCM(F), in line with the findings of [5]. For the GSSCM estimators defined by the functions (4)–(8) the last term of (9) remains, and the expressions of their IF can be found in Appendix A.3.
In order to visualize the influence function we consider the bivariate standard normal case, i.e., F = N(0, I_2). We put contamination at (z, z) or (z, 0) for different values of z and plot the IF for the diagonal elements and the off-diagonal element. Note that we cannot compare the raw IFs directly as S_g(F) = Ξ_g = c_g I_2, where c_g = ∫ g_1(X)² dF(X) with g_1 the first component of g; hence Ξ_g is only equal to I_2 up to a factor. In order to make the estimators consistent for this distribution, we can divide them by c_g, and so we plot IF(z, S_g, F)/c_g in Fig. 2.

Fig. 2. Influence functions of the GSSCM at the bivariate standard normal distribution for contamination at (z, z) (left) and (z, 0) (right). The rows correspond to the first diagonal element S11 (top), the off-diagonal element S12 (middle), and S22 (bottom).

The rows in Fig. 2 correspond to the IF of the first diagonal element S11 (top), the off-diagonal element S12 (middle) and the element S22 (bottom). Let us first consider the left part of the figure, which contains the IFs for an outlier in (z, z). By symmetry, the IFs of the diagonal elements S11 and S22 are the same here. In the regions where the function ξ is 1 the IF is quadratic, like that of the classical covariance. The diagonal elements of the IF of the SSCM are zero, except at z = 0 where it takes the value −1. The Quad IF is the only one which redescends as |z| increases, whereas the others are also bounded but stabilize at a value around 1.3. The shape of the IF of the Ball estimator resembles that of the univariate Huber M-estimator of scale.
For the IF of the off-diagonal element S12, the picture is very different. All are redescending except for the SSCM and Winsor. Here it is Winsor whose IF resembles that of Huber's M-estimator of scale. Note that the IFs of the Ball and Shell estimators have large jumps at their cutoff values. The discontinuities in the IFs are due to the fact that the cutoffs depend on the median and the MAD of the distances ∥X∥^{2/3}, as both the median and the MAD have jumps in their IF.
The right panel of Fig. 2 shows the influence functions for an outlier in (z, 0). In this case the IFs of the diagonal elements S11 and S22 are no longer the same, as the symmetry is broken. The IFs of S11 are again quadratic where ξ = 1, with jumps at the cutoffs. Note that these cutoffs are now located at different values of z, as ∥(z, 0)∥ ≠ ∥(z, z)∥. The IF of the off-diagonal element is constant at 0, indicating that S12 remains zero even when there is an outlier at (z, 0). Finally, for the second diagonal element S22 the IF of the SSCM is −1. This is because adding ε of contamination at (z, 0) reduces the mass of the remaining part of F by ε, which lowers the estimated scatter in the vertical direction. For the other estimators there is an additional effect of (z, 0) on the cutoffs, which causes the discontinuities.
A second tool for quantifying the robustness of an estimator is the finite-sample breakdown value [6]. For a multivariate location estimator T and a dataset X of size n, the breakdown value is the smallest fraction of the data that needs to be replaced by contamination to make the resulting location estimate lie arbitrarily far away from the original location T(X). More precisely,

ε*(T, X) = min{ m/n : sup_{X*_m} ∥T(X*_m) − T(X)∥ = ∞ },

where X*_m ranges over all datasets obtained by replacing any m points of X by arbitrary points.

For a multivariate estimator of scale S, the breakdown value is defined as the smallest fraction of contamination needed to make an eigenvalue of S either arbitrarily large or arbitrarily close to zero. We denote the eigenvalues of S(X) by λ_1{S(X)} ≥ ··· ≥ λ_p{S(X)}. The breakdown value of S is then given by

ε*(S, X) = min{ m/n : sup_{X*_m} max[ λ_1{S(X*_m)}, 1/λ_p{S(X*_m)} ] = ∞ }.
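These definitions can be checked empirically: the sketch below (reusing gsscm, xi_lr and kstep_lts from Section 2) replaces m points by gross outliers and monitors the largest eigenvalue and the inverse of the smallest one. The chosen values of m straddle the breakdown point derived in Proposition 3 below; this is a numerical illustration only.

# Empirical breakdown check for the LR-based GSSCM.
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
for (m in c(0, 10, 30, 48)) {                  # floor((n - p + 1)/2) = 48 here
  Xm <- X
  if (m > 0) Xm[1:m, ] <- matrix(1e6, m, p)    # gross contamination
  ev <- eigen(gsscm(Xm, kstep_lts(Xm), xi_lr),
              symmetric = TRUE, only.values = TRUE)$values
  cat(sprintf("m = %2d : lambda_1 = %.2e, 1/lambda_p = %.2e\n",
              m, max(ev), 1 / min(ev)))
}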
For the results on breakdown we assume the following conditions on the function ξ:

1. The function ξ takes values in [0, 1].
2. For any dataset X, one has #{x_i : ξ{∥x_i − T(X)∥} = 1} ≥ ⌊(n + p + 1)/2⌋.
3. For any vector t, one has ∥g(t)∥ = ∥t∥ ξ(∥t∥) ≤ hmed_i(d_i) + 1.4826 × hmad_i(d_i), where d_i = ∥x_i − T(X)∥.

Note that all functions ξ proposed in (4)–(8) satisfy these assumptions. The following proposition gives the breakdown value of the GSSCM scatter estimator S_g.

Proposition 3. Let X = {x_1, ..., x_n} be a p-dimensional dataset in general position, meaning that no p + 1 points lie on the same hyperplane. Also assume that the location estimator T has a breakdown value of at least ⌊(n − p + 1)/2⌋/n. Then

ε*(S_g, X) = ⌊(n − p + 1)/2⌋ / n.
As we would like the GSSCM scatter estimator to attain this breakdown value, we have to use a location estimator whose breakdown value is at least ⌊(n − p + 1)/2⌋/n. The following proposition verifies that the k-step LTS estimator satisfies this, and even attains the best possible breakdown value for translation equivariant location estimators.

Proposition 4. The k-step LTS estimator T_k satisfies ε*(T_k, X) = ⌊(n + 1)/2⌋/n at any p-variate dataset X = {x_1, ..., x_n}. When the C-steps are iterated until convergence (k → ∞), the breakdown value remains the same.
3. Simulation study
We now perform a simulation study comparing the GSSCM versions (4)(8). As the estimators are orthogonally
equivariant, it suffices to generate diagonal covariance matrices. We generate m=1000 samples of size n=100 from
the multivariate Gaussian distribution of dimension p=10 with center µ=0and covariance matrices Σ1=Ip(‘constant
eigenvalues’), Σ2=diag(10,9,...,1) (‘linear eigenvalues’), and Σ3=diag(102,92,...,1) (‘quadratic eigenvalues’). To
assess robustness we also add 20% and 40% of contamination in the direction of the last eigenvector, at the point (0,...,0, γ )
for several values of γ. For the location estimator Tin (3) we used the k-step LTS with k=5.
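The data-generating step of one replication is straightforward in base R; the sketch below uses the quadratic eigenvalue setting with 20% contamination (our own minimal version of the setup described above).

# One simulation replication: Gaussian data with covariance Sigma_3 and
# 20% of outliers placed at (0, ..., 0, gamma).
set.seed(1)
n <- 100; p <- 10; gamma <- 50
X <- matrix(rnorm(n * p), n, p) %*% diag(10:1)   # covariance diag(10^2, ..., 1)
out <- 1:(0.2 * n)
X[out, ] <- matrix(c(rep(0, p - 1), gamma), length(out), p, byrow = TRUE)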
For measuring how much the estimated Σ̂ deviates from the true Σ we use the Kullback–Leibler divergence (KLdiv) given by

KLdiv(Σ̂, Σ) = trace(Σ̂Σ⁻¹) − ln{det(Σ̂Σ⁻¹)} − p.

We also consider the shape matrices Γ̂ = {det(Σ̂)}^{−1/p} Σ̂ and Γ = {det(Σ)}^{−1/p} Σ, which have determinant 1, and compute KLdivshape(Σ̂, Σ) = KLdiv(Γ̂, Γ). Both the KLdiv and the KLdivshape are then averaged over the m = 1000 replications.
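Both criteria are easy to code; the following R helpers are a direct transcription of the formulas above (the function names are ours).

# Kullback-Leibler divergence between estimated and true covariance matrices
kldiv <- function(Sigma_hat, Sigma) {
  M <- Sigma_hat %*% solve(Sigma)
  as.numeric(sum(diag(M)) - determinant(M, logarithm = TRUE)$modulus) - nrow(Sigma)
}

# Shape version: normalize both matrices to determinant 1 first
kldiv_shape <- function(Sigma_hat, Sigma) {
  p <- nrow(Sigma)
  kldiv(Sigma_hat / det(Sigma_hat)^(1/p), Sigma / det(Sigma)^(1/p))
}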
Fig. 3. Simulation results: KLdiv (left) and KLdivshape (right) for the uncontaminated normal distribution, with constant, linear and quadratic eigenvalues.
Fig. 3 shows the simulation results on the uncontaminated data. Looking at KLdiv (left panel), we note that the SSCM
deviates the most from the true covariance matrix Σ. Among the other choices, Winsor and Quad have the lowest bias,
followed by LR, Shell, and Ball. When looking only at the shape component (right panel), SSCM performs the best when the
distribution is spherical (constant eigenvalues), in line with Remark 3.1 in [19]. However, it loses this dominant performance
once the distribution deviates from sphericity. Among the other GSSCM methods Winsor performs the best, followed by its
quadratic counterpart, LR, Shell, and finally Ball.
Fig. 4. Simulation results: KLdiv (left) and KLdivshape (right) for the normal distribution with constant (top), linear (middle) and quadratic (bottom) eigenvalues and 20% of contamination. The outliers were placed at the point (0, ..., 0, γ).

The result for the simulation with 20% of point contamination is presented in Fig. 4. All plots are as a function of γ, which indicates the position of the outliers. In the left panel (KLdiv), the SSCM has a large bias. The Winsor GSSCM, which did very well in the uncontaminated setting, now has a disappointing performance when the eigenstructure becomes more challenging with linear or quadratic eigenvalues. Quad performs a lot better, but also suffers under quadratic eigenvalues. LR and Shell perform the best here, followed by Ball. Their redescending nature helps them for far outliers. The conclusions for the shape component (right panel) are largely similar, except that Winsor and especially Ball look worse here.
Fig. 5. Simulation results: KLdiv (left) and KLdivshape (right) for the normal distribution with constant (top), linear (middle) and quadratic (bottom) eigenvalues and 40% of point contamination.

The simulation results for 40% of contamination are shown in Fig. 5. The KLdiv plots on the left indicate that the SSCM performs poorly for constant and linear eigenvalues, and looks better for quadratic eigenvalues but not when γ is large (far outliers). Winsor performs badly for linear and quadratic eigenvalues, whereas Quad does much better. Ball looks okay except for relatively small γ. LR and Shell perform the best for both small and large γ, and are okay for intermediate γ. When estimating the shape component (right panels) SSCM and Winsor have the worst performance overall, whereas Ball also does poorly for small to intermediate γ. LR and Shell are the best picks here. Quad does almost as well, but redescends more slowly.
4. Application: Principal component analysis
We analyze a multivariate dataset from a study by Reaven and Miller [24]. The dataset contains five numerical variables for 109 subjects, consisting of 33 overt diabetes patients and 76 healthy people. The variables are body weight, fasting plasma glucose, area under the plasma glucose curve, area under the plasma insulin curve, and steady state plasma glucose response. These data were previously analyzed in [22] in the context of clustering using statistical data depth, and are available in the R package ddalpha [23] under the tag chemdiab_2vs3. Here we analyze the data by principal component analysis. We first standardize the data, as the variables have quite different scales. Denote the standardized observations by z_i for i ∈ {1, ..., 109}.

We consider the diabetes patients as outliers and would like the PCA subspace to model the variability within the healthy patients. For classical PCA, the PCA subspace corresponds to the linear span of the k eigenvectors (also called ‘loadings’) of the covariance matrix which correspond with the k largest eigenvalues. In similar fashion we can perform PCA based on the GSSCM with the LR radial function (8), by considering the linear span of its k first eigenvectors. We take k = 3 components, thereby explaining more than 95% of the variance.
Fig. 6. Scores from the 3 first loading vectors of classical PCA (left) and GSSCM PCA (right).

Fig. 6 shows the scores with respect to the first 3 loadings for classical PCA and GSSCM PCA. The scores s_i are the projections of the observations z_i onto the PCA subspace, i.e., s_{i,j} = z_i⊤ v_j, where v_j denotes the jth eigenvector. From these plots, it is clear that the first eigenvector of the classical PCA is heavily attracted by the diabetes patients. As a result, the outliers are only
distinguishable in their scores with respect to the first principal component. This is very different for the GSSCM PCA, where
the principal components seem to fit the healthy patients better, resulting in outlying scores for the diabetes patients with
respect to several principal components.
In addition to the scores plots, the PCA outlier map of [15] can serve as a diagnostic tool for identifying outliers. It plots the orthogonal distance OD_i against the score distance SD_i for every observation z_i in the dataset. The score distance of observation i captures the distance between the observation and the center of the data within the PCA subspace. It is given by

SD_i = √( Σ_{j=1}^3 (s_{ij}/σ̂_j)² ),

where σ̂_j denotes the scale of the jth scores. For classical PCA σ̂_j is their standard deviation, whereas for GSSCM PCA we take their median absolute deviation. The orthogonal distance to the PCA subspace is given by OD_i = ∥z_i − V s_i∥, where V is the 5 × 3 matrix containing the three eigenvectors in its columns. Both the score distances and the orthogonal distances have cutoffs, described in [15].

Fig. 7. Outlier maps based on classical PCA (left) and GSSCM PCA (right).

Fig. 7 shows the outlier maps resulting from the classical PCA and the GSSCM PCA. Classical PCA clearly fails to distinguish the diabetes patients from the healthy subjects. In contrast, GSSCM PCA flags most of the diabetes patients as having both an abnormally high orthogonal distance to the PCA subspace as well as having a projection in the PCA subspace far away from those of the healthy subjects.
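The quantities in the outlier map are easy to reproduce. The sketch below computes GSSCM-based loadings, scores, score distances and orthogonal distances, reusing the gsscm, kstep_lts and xi_lr functions sketched in Section 2; the cutoff lines of [15] are omitted and the names are ours.

# GSSCM PCA with k components, plus score and orthogonal distances.
# Z is the standardized n x p data matrix.
gsscm_pca <- function(Z, k = 3) {
  S <- gsscm(Z, kstep_lts(Z), xi_lr)               # GSSCM with LR radial function
  V <- eigen(S, symmetric = TRUE)$vectors[, 1:k]   # loadings: first k eigenvectors
  scores <- Z %*% V                                # s_{ij} = z_i' v_j
  SD <- sqrt(rowSums(sweep(scores, 2, apply(scores, 2, mad), "/")^2))
  OD <- sqrt(rowSums((Z - scores %*% t(V))^2))     # OD_i = ||z_i - V s_i||
  list(loadings = V, scores = scores, SD = SD, OD = OD)
}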
5. Conclusions
The spatial sign covariance matrix (SSCM) can be seen as a member of a larger class called Generalized SSCM (GSSCM)
estimators in which other radial functions are allowed. It turns out that the GSSCM estimators are still consistent for the true
eigenvectors while preserving the ranks of the eigenvalues. Their computation is as fast as that of the SSCM. We have studied five
GSSCM methods with intuitively appealing radial functions, and shown that their asymptotic breakdown values are as high
as that of the original SSCM. We also derived their influence functions and carried out a simulation study.
The radial function of the SSCM is ξ(r) = 1/r, which implies that points near the center are given a very high weight in the covariance computation. Our alternative radial functions give these points a weight of at most 1, which yields better performance at uncontaminated Gaussian data (Fig. 3) as well as contaminated data (Figs. 4 and 5). In particular, Winsor is the most similar to SSCM since its ξ(r) is 1 for the central half of the data and Q2/r for the outer half. It performs best for uncontaminated data, but still suffers when far outliers are present. It is almost uniformly outperformed by Quad, whose ξ(r) is 1 in the central half and Q2²/r² outside it. The influence of outliers on Quad smoothly redescends to zero. The other three estimators are hard redescenders whose ξ(r) = 0 for large enough r. Among them, the linearly redescending (LR) radial function performed best overall.
A potential topic for further research is to investigate principal component analysis based on a GSSCM covariance matrix.
Software availability
R code for computing these estimators and an example script are available from the website wis.kuleuven.be/stat/robust/software.
Acknowledgments
This research was supported by projects of Internal Funds KU Leuven, Belgium.
Appendix
A.1. Distribution of Euclidean distances
Exact distribution. The exact distribution of the squared Euclidean distances ∥X∥² of a multivariate Gaussian distribution with general covariance matrix is given by the following result:

Proposition 5. Let X ∼ N(0, Σ), and suppose the eigenvalues of Σ are given by λ_1, ..., λ_p. Then

∥X∥² ∼ Σ_{i=1}^p Γ(1/2, 2λ_i).

For p → ∞ we have ∥X∥² ≈ N( Σ_{i=1}^∞ λ_i, 2 Σ_{i=1}^∞ λ_i² ).

Proof. We can write X = UDZ, where U is an orthogonal matrix, D is the diagonal matrix with elements √λ_1, ..., √λ_p, and Z follows the p-variate standard Gaussian distribution. Note that ∥X∥² = ∥UDZ∥² = ∥DZ∥² = Σ_{i=1}^p λ_i Z_i², where Z_i² ∼ χ²(1). Therefore λ_i Z_i² ∼ Γ(1/2, 2λ_i), so the distribution of ∥X∥² is a sum of independent gamma random variables with a constant shape of 1/2 and varying scale parameters equal to twice the eigenvalues of the covariance matrix. As p → ∞, one has

∥X∥² ≈ N( Σ_{i=1}^∞ λ_i, 2 Σ_{i=1}^∞ λ_i² )

by the Lyapunov Central Limit Theorem.
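Proposition 5 can be verified numerically with the coga package [14]; the sketch below assumes its pcoga(q, shape, rate) interface and compares empirical quantiles of ∥X∥² with the convolution-of-gammas cdf.

# Numerical check of Proposition 5 for linear eigenvalues.
library(coga)
set.seed(1)
p <- 5
lambda <- p:1
X <- matrix(rnorm(1e5 * p), ncol = p) %*% diag(sqrt(lambda))
q <- quantile(rowSums(X^2), c(0.25, 0.50, 0.75))       # empirical quantiles of ||X||^2
pcoga(q, shape = rep(0.5, p), rate = 1 / (2 * lambda)) # should be near 0.25/0.50/0.75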
Approximate distribution of a sum of Gamma variables. Proposition 5 gives the exact distribution of the squared Euclidean distances ∥X∥². The distribution of a sum of gamma distributions has been studied in [21]. Quantiles of this distribution can be computed by the R package coga [14] for convolutions of gamma distributions. However, this computation requires the knowledge of the eigenvalues λ_1, ..., λ_p that we are trying to estimate. Therefore we need a transformation of the Euclidean distances such that the transformed distances have an approximate distribution whose quantiles do not require knowing λ_1, ..., λ_p.

In the simplest case λ_1 = ··· = λ_p (constant eigenvalues), ∥X∥²/λ_1 follows a χ²_p distribution. It is known that when p increases the distribution of ∥X∥² tends to a Gaussian distribution, but this also holds for some other powers of ∥X∥. Wilson and Hilferty [33] found that the best transformation of this type was ∥X∥^{2/3}, in the sense of coming closest to a Gaussian distribution. The quantiles q_α of a Gaussian distribution are easier to compute and can then be transformed back to q_α^{3/2}.
It turns out that the same Wilson–Hilferty transformation also works quite well in the more general situation where the eigenvalues λ_1, ..., λ_p need not be the same. We came to this conclusion by a simulation study, a part of which is illustrated here. The dimension p ranged from 1 to 20 by steps of 1. For each p, we generated n = 10⁶ observations y_1, ..., y_n from the coga distribution with shape parameters (0.5, ..., 0.5). The scale parameters had three settings: constant (2, ..., 2), linear (p, p − 1, ..., 1), and quadratic (p², (p − 1)², ..., 1), after which the scale parameters were further standardized in order to sum to 2p. These correspond to the distribution of the squared Euclidean norms of a multivariate normal distribution where the covariance matrix has eigenvalues that are constant or proportional to (p, p − 1, ..., 1) (linear eigenvalues) or to (p², (p − 1)², ..., 1) (quadratic eigenvalues). Denote the unsquared Euclidean norms as r_i = √y_i. Then we estimate quantiles, e.g., Q3, by assuming normality of the transformed values h_1(r_i) = r_i² (square), h_2(r_i) = r_i (Fisher), and h_3(r_i) = r_i^{2/3} (Wilson–Hilferty), by computing the third quartile of a Gaussian distribution with µ̂ = median_i{h(r_i)} and σ̂ = mad_i{h(r_i)}. Finally, we have evaluated the cumulative distribution function of the coga distribution in Q̂_3². Ideally, we would like to obtain F_coga(Q̂_3²) = 0.75. The result of this experiment is shown in Fig. 8. We clearly see that the Wilson–Hilferty transform brings the approximate quantile closest to its target value. The results for the first quartile Q1 (not shown) are very similar.

Fig. 8. Approximation of the third quartile of a coga distribution for dimensions p ∈ {1, ..., 20} when the eigenvalues are constant (top left), linear (top right), or quadratic (bottom), using three different normalizing transforms.
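One cell of this experiment takes only a few lines; the sketch below (again assuming the rcoga/pcoga interface of [14]) evaluates how close the Wilson–Hilferty-based estimate of the third quartile comes to its target for linear eigenvalues.

# Wilson-Hilferty approximation of Q3 for a coga distribution.
library(coga)
set.seed(1)
p <- 10
sc <- p:1; sc <- sc * 2 * p / sum(sc)       # linear scales, standardized to sum 2p
y  <- rcoga(1e6, shape = rep(0.5, p), rate = 1 / sc)
r  <- sqrt(y)                               # unsquared Euclidean norms
u  <- r^(2/3)                               # Wilson-Hilferty transform
Q3 <- qnorm(0.75, mean = median(u), sd = mad(u))^(3/2)  # back-transform to r-scale
pcoga(Q3^2, shape = rep(0.5, p), rate = 1 / sc)         # ideally close to 0.75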
A.2. Proof of Proposition 1
Part 1: Preservation of the eigenvectors. First note that g is orthogonally equivariant, i.e., g(HX) = Hg(X) for any orthogonal matrix H. Therefore S_g = E_{F_X}{g(X)g(X)⊤} implies E_{F_X}{g(HX)g(HX)⊤} = H S_g H⊤.

The distribution of Z is spherically symmetric, hence invariant to reflections along a coordinate axis, which are described by diagonal matrices R with an entry of −1 and all other entries +1. For every reflection matrix R it thus holds that

E_{F_Z}{g(DZ)g(DZ)⊤} = E_{F_Z}{g(DRZ)g(DRZ)⊤} = E_{F_Z}{g(RDZ)g(RDZ)⊤} = R E{g(DZ)g(DZ)⊤} R⊤,

where the second equality holds because DR = RD as both D and R are diagonal, and the last equality because R is orthogonal. Therefore E_{F_Z}{g(DZ)g(DZ)⊤} is a diagonal matrix, which we can denote as Λ_g = diag(λ_{g,1}, ..., λ_{g,p}).

Now take U an arbitrary orthogonal matrix and let X = UDZ. Then

S_g = E_{F_Z}{g(UDZ)g(UDZ)⊤} = U E_{F_Z}{g(DZ)g(DZ)⊤} U⊤ = U Λ_g U⊤.

For the plain covariance matrix Σ of X we have Σ = E_{F_Z}{UDZ(UDZ)⊤} = UΛU⊤, where Λ = DD⊤ = diag(δ_1², ..., δ_p²). Therefore, the same matrix U diagonalizes both Σ and S_g, hence S_g and Σ have the same eigenvectors.
Part 2: Preservation of the ranks of the eigenvalues. Let i ≠ j and suppose that δ_i > δ_j. We will show that λ_{g,i} > λ_{g,j}. Note that

λ_{g,i} = ∫ g(DZ)_i² f_Z(Z) dZ = ∫ δ_i² z_i² ξ(∥DZ∥)² f_Z(Z) dZ,

where f_Z is the density of Z. Similarly, we have

λ_{g,j} = ∫ g(DZ)_j² f_Z(Z) dZ = ∫ δ_j² z_j² ξ(∥DZ∥)² f_Z(Z) dZ.

This means that λ_{g,i} > λ_{g,j} is equivalent to

∫ (δ_i² z_i² − δ_j² z_j²) ξ(∥DZ∥)² f_Z(Z) dZ > 0.  (A.1)

As Z is spherically symmetric, i.e., f_Z(Z) ∝ w(∥Z∥), we can write (A.1) as

∫ (δ_i² z_i² − δ_j² z_j²) ξ(∥DZ∥)² w(∥Z∥) dZ > 0.  (A.2)

Note that we can change the variable of integration as follows. Let y_k = δ_k z_k and write Y = (y_1, ..., y_p)⊤. Then (A.2) is equivalent to

(1/(δ_1 ··· δ_p)) ∫ (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_{k=1}^p y_k²/δ_k²)^{1/2} ) dY > 0.  (A.3)

We can ignore the positive constant 1/(δ_1 ··· δ_p) and split the integral over the domains A = {y ∈ R^p : |y_i| > |y_j|} and B = {y ∈ R^p : |y_i| < |y_j|}, yielding

∫ (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY
= ∫_A (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY + ∫_B (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY
= ∫_A (y_i² − y_j²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k²)^{1/2} ) dY + ∫_A (y_j² − y_i²) ξ(∥Y∥)² w( (Σ_k y_k²/δ_k² + ∆_ij)^{1/2} ) dY
= ∫_A (y_i² − y_j²) ξ(∥Y∥)² [ w( (Σ_k y_k²/δ_k²)^{1/2} ) − w( (Σ_k y_k²/δ_k² + ∆_ij)^{1/2} ) ] dY,

where in the second equality we have changed the variables of the integration over B by replacing (y_i, y_j) by (y_j, y_i), which has Jacobian 1. The ∆_ij in that step is the correction term

∆_ij = y_i²/δ_j² + y_j²/δ_i² − y_i²/δ_i² − y_j²/δ_j² = (y_i² − y_j²)/δ_j² − (y_i² − y_j²)/δ_i² = (y_i² − y_j²)(1/δ_j² − 1/δ_i²).

Note that on A it holds that |y_i| > |y_j|, hence y_i² − y_j² > 0, so ∆_ij > 0. Since w is a decreasing function, it follows that

w( (Σ_{k=1}^p y_k²/δ_k²)^{1/2} ) − w( (Σ_{k=1}^p y_k²/δ_k² + ∆_ij)^{1/2} ) > 0,

which implies (A.3), so λ_{g,i} > λ_{g,j}. In contrast, if δ_i and δ_j are tied, i.e., δ_i = δ_j, it follows that ∆_ij = 0, hence λ_{g,i} = λ_{g,j}. This concludes the proof of Proposition 1.
A.3. Influence functions
Proof of Proposition 2. Consider the contaminated distribution F_{ε,z} = (1 − ε)F + εΔ(z), where z ∈ R^p and ε ∈ [0, 1]. We then have

S_g(F_{ε,z}) = E_{F_{ε,z}}{g_ε(X)g_ε(X)⊤} = (1 − ε) ∫ g_ε(X)g_ε(X)⊤ dF(X) + ε ∫ g_ε(X)g_ε(X)⊤ dΔ(z)(X).

If we take the derivative with respect to ε and evaluate it in ε = 0, we get

∂/∂ε S_g(F_{ε,z}) |_{ε=0} = g(z)g(z)⊤ − Ξ_g + ∫ ∂/∂ε {g_ε(X)g_ε(X)⊤} dF(X) |_{ε=0}.
Calculation of the IF. While the expression of the influence function might seem relatively simple, its (numerical) calculation is rather involved. We can write

∂/∂ε ∫ g_ε(X)g_ε(X)⊤ dF(X) |_{ε=0} = ∫ [ {∂/∂ε g_ε(X)} g_ε(X)⊤ + g_ε(X) {∂/∂ε g_ε(X)⊤} ] dF(X) |_{ε=0}
= ∫ [ ∂/∂ε g_ε(X)|_{ε=0} g(X)⊤ + g(X) ∂/∂ε g_ε(X)⊤|_{ε=0} ] dF(X).

So the term we need to determine is ∂g_ε(X)/∂ε |_{ε=0}. Recalling that g(t) = t ξ(∥t∥), we have g_ε(t) = t ξ_ε(∥t∥). This means that the contamination affects g because it affects the radial function ξ. Therefore we have to compute ∂g_ε(X)/∂ε |_{ε=0} = X ∂ξ_ε(∥X∥)/∂ε |_{ε=0} for the functions g given by (4)–(8).

In these functions ξ depends on F_X through the distribution of ∥X∥^{2/3}. Suppose that ∥X∥^{2/3} ∼ G when X ∼ F, so G is a univariate distribution. For X_ε ∼ F_{ε,z} = (1 − ε)F + εΔ(z) we then have ∥X_ε∥^{2/3} ∼ G_{ε,∥z∥^{2/3}} = (1 − ε)G + εΔ(∥z∥^{2/3}). For uncontaminated data the density of ∥X∥^{2/3} is given by

f_G(t) = f_coga(t³) |3t²|,

where f_coga is the density of the convolution of gamma distributions. We need this density to evaluate the influence functions of their median and mad. The cutoffs in the paper are

Q1 = (hmed ∥X∥^{2/3} − hmad ∥X∥^{2/3})^{3/2},   Q2 = (hmed ∥X∥^{2/3})^{3/2},
Q3 = (hmed ∥X∥^{2/3} + hmad ∥X∥^{2/3})^{3/2},   Q3* = (hmed ∥X∥^{2/3} + 1.4826 × hmad ∥X∥^{2/3})^{3/2},

and we can compute their influence functions, viz.

IF(z, Q1, F) = (3/2) {median(G) − mad(G)}^{1/2} {IF(∥z∥^{2/3}, median, G) − IF(∥z∥^{2/3}, mad, G)},
IF(z, Q2, F) = (3/2) {median(G)}^{1/2} IF(∥z∥^{2/3}, median, G),
IF(z, Q3, F) = (3/2) {median(G) + mad(G)}^{1/2} {IF(∥z∥^{2/3}, median, G) + IF(∥z∥^{2/3}, mad, G)},
IF(z, Q3*, F) = (3/2) {median(G) + 1.4826 × mad(G)}^{1/2} {IF(∥z∥^{2/3}, median, G) + 1.4826 × IF(∥z∥^{2/3}, mad, G)}.
The Winsor GSSCM is given by ξ(r) = 1_{r ≤ Q2} + (Q2/r) 1_{r > Q2}. For the contaminated case this becomes ξ_ε(r) = 1_{r ≤ Q2ε} + (Q2ε/r) 1_{r > Q2ε}. We then have

∂/∂ε ξ_ε(r) = ∂/∂ε [ 1_{[0,Q2ε]}(r) + (Q2ε/r) 1_{(Q2ε,∞)}(r) ]
= δ(r − Q2ε) Q2ε′ + (Q2ε′/r) 1_{(Q2ε,∞)}(r) − (Q2ε/r) δ(r − Q2ε) Q2ε′,

where Q2ε′ denotes ∂Q2ε/∂ε and δ(x − y) denotes the distributional derivative of 1_{(−∞,x]}(y) = 1_{[y,∞)}(x) with respect to x. Evaluation in ε = 0 gives

δ(r − Q2) IF(z, Q2, F) + {IF(z, Q2, F)/r} 1_{(Q2,∞)}(r) − (Q2/r) δ(r − Q2) IF(z, Q2, F)
= (1 − Q2/r) δ(r − Q2) IF(z, Q2, F) + {IF(z, Q2, F)/r} 1_{(Q2,∞)}(r).

As (1 − Q2/r) δ(r − Q2) is 0 everywhere, we only need to integrate the last term. This yields

∂/∂ε g_ε(X) |_{ε=0} = (X/∥X∥) IF(z, Q2, F) 1_{(Q2,∞)}(∥X∥).

The influence function of S_g is thus given by

IF(z, S_g, F) = g(z)g(z)⊤ − Ξ_g(F)
+ ∫ (X/∥X∥) IF(z, Q2, F) 1_{(Q2,∞)}(∥X∥) g(X)⊤ dF(X) + ∫ g(X) (X⊤/∥X∥) IF(z, Q2, F) 1_{(Q2,∞)}(∥X∥) dF(X).

Note that the last two terms in the sum are each other's transpose. The integration is done numerically.

The derivation of the influence function of the Quad GSSCM is entirely similar to that of Winsor. The main difference is that now ∂g_ε(X)/∂ε |_{ε=0} is given by

∂/∂ε g_ε(X) |_{ε=0} = 2 Q2 IF(z, Q2, F) (X/∥X∥²) 1_{(Q2,∞)}(∥X∥).
The linearly redescending (LR) method uses a second cutoff, viz.

ξ(r) = 1 if r ≤ Q2,   ξ(r) = (Q3* − r)/(Q3* − Q2) if Q2 < r ≤ Q3*,   ξ(r) = 0 if r > Q3*.

In the contaminated case we obtain g_ε(x) = x ξ_ε(∥x∥) with

ξ_ε(r) = 1 if r ≤ Q2ε,   ξ_ε(r) = (Q3ε* − r)/(Q3ε* − Q2ε) if Q2ε < r ≤ Q3ε*,   ξ_ε(r) = 0 if r > Q3ε*.

Taking the derivative with respect to ε yields

∂/∂ε ξ_ε(r) = δ(r − Q2ε) Q2ε′ + {(Q3ε* − r)/(Q3ε* − Q2ε)} { δ(r − Q3ε*) Q3ε*′ − δ(r − Q2ε) Q2ε′ }
+ 1_{[Q2ε,Q3ε*]}(r) [ Q3ε*′ (Q3ε* − Q2ε) − (Q3ε*′ − Q2ε′)(Q3ε* − r) ] / (Q3ε* − Q2ε)².

Evaluation in ε = 0 gives

δ(r − Q2) IF(z, Q2, F) + {(Q3* − r)/(Q3* − Q2)} { δ(r − Q3*) IF(z, Q3*, F) − δ(r − Q2) IF(z, Q2, F) }
+ 1_{[Q2,Q3*]}(r) [ IF(z, Q3*, F)(Q3* − Q2) − {IF(z, Q3*, F) − IF(z, Q2, F)}(Q3* − r) ] / (Q3* − Q2)².

When integrating only the last term plays a role, yielding

∂/∂ε g_ε(X) |_{ε=0} = X 1_{[Q2,Q3*]}(∥X∥) [ IF(z, Q3*, F)(Q3* − Q2) − {IF(z, Q3*, F) − IF(z, Q2, F)}(Q3* − ∥X∥) ] / (Q3* − Q2)²
= X 1_{[Q2,Q3*]}(∥X∥) [ IF(z, Q3*, F)(∥X∥ − Q2) + IF(z, Q2, F)(Q3* − ∥X∥) ] / (Q3* − Q2)².

For the Ball GSSCM we analogously derive that

∂/∂ε g_ε(X) |_{ε=0} = δ(∥X∥ − Q2) IF(z, Q2, F) X.

Finally, for the Shell GSSCM we obtain

∂/∂ε g_ε(X) |_{ε=0} = { δ(∥X∥ − Q3) IF(z, Q3, F) − δ(∥X∥ − Q1) IF(z, Q1, F) } X.

This concludes the proof of Proposition 2.
A.4. Breakdown values
Proof of Proposition 3. Denote by J the set of all subsets of {1, ..., n} with p + 1 elements. For every subset J ∈ J we define η_J = max_{i∈J} d²(x_i, H_J), where H_J is the hyperplane minimizing

Σ_{i∈J} d²(x_i, H)

over all possible hyperplanes H, and d(x, H) is the Euclidean distance between a point x and a hyperplane H.

Define η_X = min_{J∈J} η_J. Since the original points {x_1, ..., x_n} are in general position, no p + 1 points can lie on the same hyperplane, which ensures that η_X > 0. We also put c_1 = max_i ∥x_i − T(X)∥ < ∞.

Part 1. We first need to show that ε* ≥ ⌊(n − p + 1)/2⌋/n.

Let m < ⌊(n − p + 1)/2⌋ and replace m observations of X = {x_1, ..., x_n}, yielding X* with location estimate T(X*). Because m/n is below the breakdown value of T, there is a constant c_2 < ∞ so that ∥T(X*) − T(X)∥ ≤ c_2 for all such contaminated datasets X*. By the triangle inequality, ∥x_i − T(X*)∥ ≤ c_1 + c_2 < ∞ for the unchanged points. This implies hmed(d_i*) ≤ c_1 + c_2, hence hmed(d_i*) + 1.4826 × hmad(d_i*) ≤ 2.4826 × hmed(d_i*) ≤ 2.4826 × (c_1 + c_2), where d_i* = ∥x_i* − T(X*)∥. Therefore ∥g(t)∥ ≤ 2.4826 × (c_1 + c_2) by condition 3.

First we show that the largest eigenvalue of S_g(X*) is bounded over all such datasets X*. Take any X*, obtained by replacing m points of X by arbitrary points. Then

λ_max = sup_{∥u∥=1} u⊤ S_g(X*) u = sup_{∥u∥=1} (1/n) Σ_{i=1}^n u⊤ g{x_i* − T(X*)} g{x_i* − T(X*)}⊤ u
= sup_{∥u∥=1} (1/n) Σ_{i=1}^n [u⊤ g{x_i* − T(X*)}]² ≤ sup_{∥u∥=1} (1/n) Σ_{i=1}^n ∥u∥² ∥g{x_i* − T(X*)}∥² ≤ {2.4826 × (c_1 + c_2)}² < ∞.

Next we show that the smallest eigenvalue of S_g(X*) has a positive lower bound for all contaminated datasets X*. By condition 2 on ξ we know that #{x_i* : ξ{∥x_i* − T(X*)∥} = 1} ≥ ⌊(n + p + 1)/2⌋. Therefore, we have at least ⌊(n + p + 1)/2⌋ − (⌊(n − p + 1)/2⌋ − 1) = p + 1 regular points for which ξ{∥x_i − T(X*)∥} = 1; let us assume without loss of generality that these are x_1, ..., x_{p+1}. We can now write

λ_min = min_{∥u∥=1} u⊤ S_g(X*) u = min_{∥u∥=1} (1/n) Σ_{i=1}^n u⊤ g{x_i* − T(X*)} g{x_i* − T(X*)}⊤ u = min_{∥u∥=1} (1/n) Σ_{i=1}^n [u⊤ g{x_i* − T(X*)}]²
≥ min_{∥u∥=1} (1/n) Σ_{i=1}^{p+1} [u⊤{x_i − T(X*)} ξ{∥x_i − T(X*)∥}]² = min_{∥u∥=1} (1/n) Σ_{i=1}^{p+1} [u⊤{x_i − T(X*)}]²
≥ (1/n) Σ_{i=1}^{p+1} d²(x_i, H_{{1,...,p+1}}) ≥ η_X/n > 0.

Part 2. It remains to show that ε* ≤ ⌊(n − p + 1)/2⌋/n. This is the known upper bound for affine equivariant scatter estimators, but that result does not apply here, so we need to show it for this case. Take any m ≥ ⌊(n − p + 1)/2⌋ and replace the last m points of X, keeping the points x_1, ..., x_{n−m} unchanged. By location equivariance we can assume without loss of generality that the average of x_1, ..., x_{n−m} is zero. For j ∈ {n − m + 1, ..., n}, put x_j* = λ a_j, where a_j is such that min_{i∈{n−m+1,...,n}, i≠j} ∥a_j − a_i∥ ≥ 1 and such that for all λ > 1 one has min_{i∈{1,...,n−m}} ∥λ a_j − x_i∥ ≥ λ. This is possible by placing the a_j outside of the convex hull of X and far enough from each other and from X.

Now consider an unbounded increasing sequence of λ_k > 1. For every λ_k the set {x_{n−m+1}*, ..., x_n*} must contain at least one point for which w_i := ξ{∥x_i* − T(X*)∥} = 1; call this point x_b*. Take another point of X* for which w_i = 1, and name it x_c*. Note that x_c* can be an original data point or a replaced point. We now have that ∥x_b* − x_c*∥ ≥ λ_k, hence ∥x_b* − T(X*)∥ + ∥x_c* − T(X*)∥ ≥ λ_k. Therefore ∥x_b* − T(X*)∥² + ∥x_c* − T(X*)∥² ≥ λ_k²/2. We then obtain

Σ_{j=1}^p λ_j{S_g(X*)} = trace{S_g(X*)} = (1/n) Σ_{i=1}^n trace[ g{x_i* − T(X*)} g{x_i* − T(X*)}⊤ ] = (1/n) Σ_{i=1}^n ∥g{x_i* − T(X*)}∥²
≥ (1/n) { ∥x_b* − T(X*)∥² + ∥x_c* − T(X*)∥² } ≥ λ_k² / (2n),

where the first inequality holds because w_b = w_c = 1. This becomes arbitrarily large, and so S_g(X*) explodes. This concludes the proof of Proposition 3.
Proof of Proposition 4. Showing that ε*(T_k, X) ≤ ⌊(n + 1)/2⌋/n is easy, since ⌊(n + 1)/2⌋/n is the upper bound on the breakdown value of all translation equivariant location estimators; see, e.g., [18].

It remains to show that ε*(T_k, X) ≥ ⌊(n + 1)/2⌋/n.

Note that the objective given by the sum of the h smallest squared Euclidean distances is nonincreasing in every C-step. The value of the objective function after step k is

Σ_{j=1}^h d²_{(j)}{X, T_k(X)},

where d_{(j)}{X, T_k(X)} denotes the jth order statistic of the distances ∥x_i − T_k(X)∥, and we have that

Σ_{j=1}^h d²_{(j)}{X, T_k(X)} ≤ Σ_{j=1}^h d²_{(j)}{X, T_{k−1}(X)}.

Recall that h = ⌊(n + 1)/2⌋. Let m ≤ ⌊(n − 1)/2⌋ and replace without loss of generality the last m observations of X = {x_1, ..., x_n} to obtain X* = {x_1, ..., x_{n−m}, x_{n−m+1}*, ..., x_n*} = {x_1*, ..., x_n*}. Since the spatial median T_0 does not yet break down for this m [18], there is a constant c_2 such that max_{1≤i≤n−m} ∥x_i − T_0(X*)∥ ≤ c_2 < ∞ for all such datasets X*.

Consider T_k(X*) and the corresponding objective function Σ_{j=1}^h d²_{(j)}{X*, T_k(X*)}. Since the C-step does not increase the value of the objective function, we have that

Σ_{j=1}^h d²_{(j)}{X*, T_k(X*)} ≤ Σ_{j=1}^h d²_{(j)}{X*, T_{k−1}(X*)} ≤ ··· ≤ Σ_{j=1}^h d²_{(j)}{X*, T_0(X*)}.

Note that

Σ_{j=1}^h d²_{(j)}{X*, T_0(X*)} ≤ Σ_{i=1}^h ∥x_i* − T_0(X*)∥² = Σ_{i=1}^h ∥x_i − T_0(X*)∥² ≤ h c_2²,

where the equality holds because h ≤ n − m, so the first h points of X* are original data points.

Since m is at most ⌊(n − 1)/2⌋ and h = ⌊(n + 1)/2⌋, we have at least ⌊(n + 1)/2⌋ − ⌊(n − 1)/2⌋ = 1 point x_j with 1 ≤ j ≤ n − m for which ∥x_j − T_k(X*)∥² ≤ d²_{(h)}{X*, T_k(X*)}. Note that

∥x_j − T_k(X*)∥² ≤ Σ_{j=1}^h d²_{(j)}{X*, T_k(X*)} ≤ Σ_{j=1}^h d²_{(j)}{X*, T_0(X*)} ≤ h c_2².

So for this x_j we can write

∥T_k(X*) − T_0(X*)∥ ≤ ∥T_k(X*) − x_j∥ + ∥x_j − T_0(X*)∥ ≤ √h c_2 + c_2 < ∞.

Note that this upper bound does not depend on k and therefore remains valid when the procedure is iterated until convergence (k → ∞). This concludes the proof of Proposition 4.
References
[1] G. Boente, D. Rodriguez, M. Sud, The spatial sign operator: Asymptotic results and applications, J. Multivariate Anal. 170 (2018) (in press).
[2] B.M. Brown, Statistical uses of the spatial median, J. R. Stat. Soc. Ser. B Stat. Methodol. 45 (1983) 25–30.
[3] C. Chatzinakos, L. Pitsoulis, G. Zioutas, Optimization techniques for robust multivariate location and scatter estimation, J. Comb. Optim. 31 (2016)
1443–1460.
[4] C. Croux, C. Dehon, A. Yadine, The k-step spatial sign covariance matrix, Adv. Data Anal. Classif. 4 (2010) 137–150.
[5] C. Croux, E. Ollila, H. Oja, Sign and rank covariance matrices: Statistical properties and application to principal components analysis, in: Y. Dodge (Ed.),
Statistical Data Analysis Based on the L1-Norm and Related Methods, Birkhäuser, Basel, 2002, pp. 257–269.
[6] D. Donoho, P. Huber, The notion of breakdown point, in: P. Bickel, K. Doksum, J. Hodges (Eds.), A Festschrift for Erich Lehmann, Wadsworth, Belmont,
CA, pp. 157–184.
[7] A. Dürre, R. Fried, D. Vogel, The spatial sign covariance matrix and its application for robust correlation estimation, Austrian J. Statist. 46 (2017) 13–22.
[8] A. Dürre, D.E. Tyler, D. Vogel, On the eigenvalues of the spatial sign covariance matrix in more than two dimensions, Statist. Probab. Lett. 111 (2016)
80–85.
[9] A. Dürre, D. Vogel, Asymptotics of the two-stage spatial sign correlation, J. Multivariate Anal. 144 (2016) 54–67.
[10] A. Dürre, D. Vogel, R. Fried, Spatial sign correlation, J. Multivariate Anal. 135 (2015) 89–105.
[11] A. Dürre, D. Vogel, D.E. Tyler, The spatial sign covariance matrix with unknown location, J. Multivariate Anal. 130 (2014) 107–117.
[12] J.C. Gower, Algorithm AS 78: The Mediancentre, J. R. Stat. Soc. Ser. C. Appl. Stat. 23 (1974) 466–470.
[13] F. Hampel, E. Ronchetti, P. Rousseeuw, W. Stahel, Robust Statistics: The Approach Based on Influence Functions, Wiley, New York, 1986.
[14] C. Hu, V. Pozdnyakov, J. Yan, Coga: Convolution of Gamma Distributions, University of Connecticut, 2018. R package version 0.2.2.
[15] M. Hubert, P.J. Rousseeuw, K. Vanden Branden, ROBPCA: A new approach to robust principal component analysis, Technometrics 47 (2005) 64–79.
[16] M. Hubert, P.J. Rousseeuw, T. Verdonck, A deterministic algorithm for robust location and scatter, J. Comput. Graph. Statist. 21 (2012) 618–637.
[17] N. Locantore, J.S. Marron, D.G. Simpson, N. Tripoli, J.T. Zhang, K.L. Cohen, Robust principal component analysis for functional data, Test 8 (1999) 1–28.
[18] H. Lopuhaä, P. Rousseeuw, Breakdown points of affine equivariant estimators of multivariate location and covariance matrices, Ann. Statist. 19 (1991)
229–248.
[19] A.F. Magyar, D.E. Tyler, The asymptotic inadmissibility of the spatial sign covariance matrix for elliptically symmetric distributions, Biometrika 101
(2014) 673–688.
[20] J.I. Marden, Some robust estimates of principal components, Statist. Probab. Lett. 43 (1999) 349–359.
[21] P.G. Moschopoulos, The distribution of the sum of independent gamma random variables, Ann. Inst. Statist. Math. 37 (1985) 541–544.
[22] P. Mozharovskyi, K. Mosler, T. Lange, Classifying real-world data with the DDα-procedure, Adv. Data Anal. Classif. 9 (2015) 287–314.
[23] O. Pokotylo, P. Mozharovskyi, R. Dyckerhoff, Depth and depth-based classification with R package ddalpha, arXiv:1608.04109, 2016.
[24] G.M. Reaven, R.G. Miller, An attempt to define the nature of chemical diabetes using a multidimensional analysis, Diabetologia 16 (1979) 17–24.
[25] D.M. Rocke, Robustness properties of S-estimators of multivariate location and shape in high dimension, Ann. Statist. 24 (1996) 1327–1345.
[26] P. Rousseeuw, Least median of squares regression, J. Amer. Statist. Assoc. 79 (1984) 871–880.
[27] P. Rousseeuw, K. Van Driessen, A fast algorithm for the Minimum Covariance Determinant estimator, Technometrics 41 (1999) 212–223.
[28] S. Serneels, E. De Nolf, P.J. Van Espen, Spatial sign preprocessing: A simple way to impart moderate robustness to multivariate estimators, J. Chem.
Inf. Model. 46 (2006) 1402–1409, PMID: 16711760.
[29] S. Sirkiä, S. Taskinen, H. Oja, D.E. Tyler, Tests and estimates of shape based on spatial signs and ranks, J. Nonparametr. Stat. 21 (2009) 155–176.
[30] S. Taskinen, I. Koch, H. Oja, Robustifying principal component analysis with spatial sign vectors, Statist. Probab. Lett. 82 (2012) 765–774.
[31] S. Visuri, V. Koivunen, H. Oja, Sign and rank covariance matrices, J. Statist. Plann. Inference 91 (2000) 557–575.
[32] S. Visuri, H. Oja, V. Koivunen, Subspace-based direction-of-arrival estimation using nonparametric statistics, IEEE Trans. Signal Process. 49 (2001)
2060–2073.
[33] E.B. Wilson, M.M. Hilferty, The distribution of chi-square, Proc. Nat. Acad. Sci. USA 17 (1931) 684–688.
... The SSCM was studied in detail in Magyar and Tyler (2014), Dürre et al. (2014), Dürre et al. (2016), Boente et al. (2019). In Raymaekers and Rousseeuw (2019) a generalisation to the SSCM was introduced, namely the generalized spatial sign covariance matrix (GSSCM). They identified SSCM as a part of a larger class of orthogonally equivariant scatter estimates, namely the generalized spatial sign covariance matrices. ...
... By using the Euclidean norm, the GSSCM becomes an orthogonally equivariant scatter estimator. In Raymaekers and Rousseeuw (2019), it is shown that the GSSCM inherits the consistency properties of the SSCM in that it is a Fisher consistent estimator of the eigenvectors for elliptical distributions and preserves the ranks of the eigenvalues under the same assumptions. ...
... where T is a (orthogonally equivariant) location estimator for the center of the data matrix X ∈ R n× p . In Raymaekers and Rousseeuw (2019), it is suggested to use the k-step least trimmed squares (LTS) estimator. This estimator starts from the spatial median, but adds additional iterative steps to improve robustness against outliers. ...
Article
Full-text available
Outliers contaminating data sets are a challenge to statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis that is based on the generalized spatial sign covariance matrix. Theoretical properties of the proposed method including influence functions, breakdown values and asymptotic efficiencies are derived. These theoretical results are complemented with an extensive simulation study and two real-data examples. We illustrate that generalized spherical principal component analysis can combine great robustness with solid efficiency properties, in addition to a low computational cost.
... which corresponds to a weighted covariance matrix with the data points being down-weighted based on their Euclidean distance from the center. These weighted covariance matrices have recently been studied in [27], wherein they are referred to as general spatial sign covariance matrices. Note that for u(s) = 1/s one obtains the usual SSCM. ...
Preprint
Full-text available
We introduce a class of regularized M-estimators of multivariate scatter and show, analogous to the popular spatial sign covariance matrix (SSCM), that they possess high breakdown points. We also show that the SSCM can be viewed as an extreme member of this class. Unlike the SSCM, this class of estimators takes into account the shape of the contours of the data cloud when down-weighing observations. We also propose a median based cross validation criterion for selecting the tuning parameter for this class of regularized M-estimators. This cross validation criterion helps assure the resulting tuned scatter estimator is a good fit to the data as well as having a high breakdown point. A motivation for this new median based criterion is that when it is optimized over all possible scatter parameters, rather than only over the tuned candidates, it results in a new high breakdown point affine equivariant multivariate scatter statistic.
... An important remark is that, though these descriptors are used in their original scale to build the decision trees, they are standardized for the clustering stage since this is recommendable for these methods. We used the generalized spatial sign technique to accomplish this [60], with quadratic radial function and k-step least trimmed squares for the location estimator, since it delivers robustness to estimators based on co-variances in the presence of potential outliers. ...
Article
Full-text available
It has been argued that hunter-gatherers' food-sharing may have provided the basis for a whole range of social interactions, and hence its study may provide important insight into the evolutionary origin of human sociality. Motivated by this observation, we propose a simple network optimization model inspired by a food-sharing dynamic that can recover some empirical patterns found in social networks. We focus on two of the main food-sharing drivers discussed by the anthropological literature: the reduction of individual starvation risk and the care for the group welfare or egalitarian access to food shares, and show that networks optimizing both criteria may exhibit a community structure of highly-cohesive groups around special agents that we call hunters, those who inject food into the system. These communities appear under conditions of uncertainty and scarcity in the food supply, which suggests their adaptive value in this context. We have additionally obtained that optimal welfare networks resemble social networks found in lab experiments that promote more egalitarian income distribution, and also distinct distributions of reciprocity among hunters and non-hunters, which may be consistent with some empirical reports on how sharing is distributed in waves, first among hunters, and then hunters with their families. These model results are consistent with the view that social networks functionally adaptive for optimal resource use, may have created the environment in which prosocial behaviors evolved. Finally, our model also relies on an original formulation of starvation risk, and it may contribute to a formal framework to proceed in this discussion regarding the principles guiding food-sharing networks.
... Along with Tyler's M estimate, other variations of SCM and robust estimation of scatter matrices are worth exploring for comparison and generalization. For example, the k-step SCM-a finite-iteration intermediary between SCM and Tyler's M estimate-aims to balance robustness and efficiency [12], and the generalized SCM [45] in essence uses an orthogonally equivariant weight function. Finally, the depth-weighted Stahel-Donoho estimates of location and scatter [53] may be incorporated in a slightly relaxed version of our weighted signs framework by considering (transformations of) depth functions multiplied by the ℓ 2 -norm of a vector as the weight function. ...
Article
Multivariate sign functions are often used for robust estimation and inference. We propose using data dependent weights in association with such functions. The proposed weighted sign functions retain desirable robustness properties, while significantly improving efficiency in estimation and inference compared to unweighted multivariate sign-based methods. Using weighted signs, we demonstrate methods of robust location estimation and robust principal component analysis. We extend the scope of using robust multivariate methods to include robust sufficient dimension reduction and functional outlier detection. Several numerical studies and real data applications demonstrate the efficacy of the proposed methodology.
... It was developed to overcome the fact that distance covariance does not have the zero equivalence property in general metric spaces (Lyons, 2013). It has also been shown that distance covariance can be sensible to outliers (Raymaekers and Rousseeuw, 2019), a disadvantage that ball covariance (BCOV) claims not to have. Formally, let X and Y be two random vectors defined respectively in two separable Banach spaces (X, ζ X ) and (Y, ζ Y ), where ζ X and ζ Y are norms defined on X and Y respectively. ...
Preprint
Full-text available
As its name suggests, sufficient dimension reduction (SDR) targets to estimate a subspace from data that contains all information sufficient to explain a dependent variable. Ample approaches exist to SDR, some of the most recent of which rely on minimal to no model assumptions. These are defined according to an optimization criterion that maximizes a nonparametric measure of association. The original estimators are nonsparse, which means that all variables contribute to the model. However, in many practical applications, an SDR technique may be called for that is sparse and as such, intrinsically performs sufficient variable selection (SVS). This paper examines how such a sparse SDR estimator can be constructed. Three variants are investigated, depending on different measures of association: distance covariance, martingale difference divergence and ball covariance. A simulation study shows that each of these estimators can achieve correct variable selection in highly nonlinear contexts, yet are sensitive to outliers and computationally intensive. The study sheds light on the subtle differences between the methods. Two examples illustrate how these new estimators can be applied in practice, with a slight preference for the option based on martingale difference divergence in the bioinformatics example.
Article
The direpack package brings a set of modern statistical dimensionality reduction techniques to the Python universe as a single, consistent package. Several of the methods included are only available as open source through direpack, and the package also offers competitive Python implementations of methods previously only available in other programming languages. In its present version, the package is structured in three subpackages for different approaches to dimensionality reduction: projection pursuit, sufficient dimension reduction and robust M estimators. As a corollary, the package also provides access to regularized regression estimators based on these reduced-dimension spaces, as well as a set of classical and robust preprocessing utilities, including very recent developments such as generalized spatial signs. Finally, direpack has been written to be consistent with the scikit-learn API, so that the estimators can be included seamlessly in (statistical and/or machine) learning pipelines in that framework.
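The "generalized spatial signs" mentioned at the end refer to radial transforms that generalize the classical x ↦ x/‖x‖ preprocessing. Below is a from-scratch numpy sketch of one such radial function (a ball-type transform: points within a cutoff are left untouched, points outside are projected onto the cutoff sphere); this is written for illustration only and is not direpack's own implementation.

```python
import numpy as np

def gen_spatial_sign(X, cutoff=None):
    """Generalized spatial sign preprocessing with a ball-type radial function.
    Returns the robustly centered, radially transformed data."""
    Z = X - np.median(X, axis=0)                  # simple robust centering
    r = np.linalg.norm(Z, axis=1)
    if cutoff is None:
        cutoff = np.median(r)                     # keep the inner half as-is
    shrink = np.where(r > cutoff, cutoff / np.maximum(r, 1e-12), 1.0)
    return Z * shrink[:, None]

# A generalized SSCM is then simply the covariance of the transformed data:
# V = np.cov(gen_spatial_sign(X), rowvar=False)
```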
Article
Multivariate sign-based tests for a class of testing problems on the eigenvalues of scatter matrices are constructed. The class of testing problems is characterized by real-valued mappings, h say. A necessary and sufficient condition on h for obtaining asymptotically valid sign-based procedures is identified. A simulation study shows the very good robustness properties of our sign tests, while their practical relevance is illustrated on a real data set.
Article
Full-text available
Due to increasing recording capabilities, functional data analysis has become an important research topic. For functional data, the study of outlier detection and the development of robust statistical procedures have started only recently. One robust alternative to the sample covariance operator is the sample spatial sign covariance operator. In this paper, we study the asymptotic behaviour of the sample spatial sign covariance operator when the location is unknown. Among other possible applications of the obtained results, we derive the asymptotic distribution of the principal directions obtained from the sample spatial sign covariance operator, and we develop tests to detect differences between the scatter operators of two populations. In particular, the tests' performance is illustrated through a Monte Carlo study for small sample sizes.
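In finite dimensions (e.g., curves evaluated on a common grid), the sample spatial sign covariance operator with unknown location can be sketched as follows: estimate the center by the spatial median, then average the outer products of the spatial signs. Function names are ours; this is an illustration, not the paper's code.

```python
import numpy as np

def spatial_median(X, n_iter=200, tol=1e-8):
    """Weiszfeld iteration for the spatial median of the rows of X."""
    mu = np.median(X, axis=0)
    for _ in range(n_iter):
        r = np.maximum(np.linalg.norm(X - mu, axis=1), 1e-12)
        mu_new = (X / r[:, None]).sum(axis=0) / (1.0 / r).sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

def sscm_operator(X):
    """Sample spatial sign covariance operator for discretized curves
    (rows of X = functions on a common grid); its leading eigenvectors
    estimate the principal directions."""
    Z = X - spatial_median(X)
    r = np.maximum(np.linalg.norm(Z, axis=1), 1e-12)
    U = Z / r[:, None]
    return U.T @ U / len(U)
```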
Article
Full-text available
Following the seminal idea of Tukey, data depth is a function that measures how close an arbitrary point of the space lies to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications, with classification being the most popular one. The R package ddalpha is software designed to fuse the experience of the practitioner with recent achievements in the area of data depth and depth-based classification. ddalpha provides implementations for exact and approximate computation of the most reasonable and widely applied notions of data depth. These can further be used in the depth-based multivariate and functional classifiers implemented in the package, where the DDα-procedure is the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.
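As a flavor of the depth notions such a package computes, here is a minimal numpy version of one of them, the spatial (L1) depth; this is a from-scratch sketch, not ddalpha's implementation.

```python
import numpy as np

def spatial_depth(x, X):
    """Spatial depth of a point x w.r.t. a sample X: one minus the norm of
    the average spatial sign of x - x_i.  Values near 1 indicate central
    points, values near 0 indicate outlying ones."""
    Z = x - X
    r = np.linalg.norm(Z, axis=1)
    mask = r > 0
    U = Z[mask] / r[mask][:, None]
    return 1.0 - np.linalg.norm(U.mean(axis=0))
```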
Article
Full-text available
We summarize properties of the spatial sign covariance matrix and especially look at the relationship between its eigenvalues and those of the shape matrix of an elliptical distribution. The explicit relationship known in the bivariate case was used to construct the spatial sign correlation coefficient, which is a non-parametric and robust estimator for the correlation coefficient within the elliptical model. We consider a multivariate generalization, which we call the multivariate spatial sign correlation matrix.
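For concreteness, the explicit bivariate relationship alluded to above — recalled here as stated in the spatial sign correlation literature, for an elliptical distribution with shape eigenvalues λ₁, λ₂ — reads

$$\delta_j \;=\; \frac{\sqrt{\lambda_j}}{\sqrt{\lambda_1}+\sqrt{\lambda_2}}, \qquad j \in \{1,2\},$$

where δ₁, δ₂ are the eigenvalues of the SSCM. Inverting gives λ₁/λ₂ = (δ₁/δ₂)², which is what makes the shape matrix, and hence a correlation coefficient, recoverable from the SSCM alone.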
Article
Full-text available
Computation of typical statistical sample estimates such as the median or the least squares fit usually requires the solution of an unconstrained optimization problem with a convex objective function, which can be solved efficiently by various methods. The presence of outliers in the data dictates the computation of a robust estimate, which can be defined as the optimal statistical estimate for a subset containing at least half of the observations. The resulting problem is now a combinatorial optimization problem which is often computationally intractable. Classical statistical methods for estimating the multivariate location μ and scatter matrix Σ are based on the sample mean vector and covariance matrix, which are very sensitive in the presence of outlying observations. We propose a new method for robust location and scatter estimation which is composed of two stages. In the first stage, an unbiased multivariate L1-median center for all the observations is attained by a novel procedure called the least trimmed Euclidean deviations estimator. This robust median defines a coverage set of observations which is used in the second stage to iteratively compute the set of outliers that violate the correlational structure of the data set. Extensive computational experiments indicate that the proposed method outperforms existing methods in accuracy, robustness and computational time.
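A generic two-stage scheme of the kind described (robust center first, then iterative flagging of observations that violate the correlational structure) can be sketched as follows. This is not the authors' LTED algorithm, merely an illustration of the architecture, with a coordinatewise median standing in for the robust first-stage center.

```python
import numpy as np
from scipy.stats import chi2

def two_stage_flagging(X, q=0.975, n_iter=25):
    """Generic two-stage robust location/scatter sketch: start from a robust
    center, then alternate between estimating scatter on the currently kept
    points and re-flagging outliers by squared Mahalanobis distance."""
    n, p = X.shape
    mu = np.median(X, axis=0)               # stage 1: robust center (stand-in)
    keep = np.ones(n, dtype=bool)
    cut = chi2.ppf(q, df=p)                 # chi-squared cutoff for distances
    S = np.cov(X.T)
    for _ in range(n_iter):                 # stage 2: iterative flagging
        S = np.cov(X[keep].T)
        d2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
        new_keep = d2 <= cut
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
        mu = X[keep].mean(axis=0)
    return mu, S, ~keep                     # center, scatter, outlier flags
```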
Article
We gather several results on the eigenvalues of the spatial sign covariance matrix of an elliptical distribution. It is shown that the eigenvalues are a one-to-one function of the eigenvalues of the shape matrix and that they are closer together than the latter. We further provide a one-dimensional integral representation of the eigenvalues, which facilitates their numerical computation.
Article
The DDα-classifier, a nonparametric, fast and very robust procedure, is described and applied to fifty classification problems covering a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, which is a low-dimensional unit cube, and then separates them by a projective invariant procedure, called the α-procedure. To each data point the transformation assigns its depth values with respect to the given classes. Several alternative depth notions (spatial depth, Mahalanobis depth, projection depth, and Tukey depth, the latter two being approximated by univariate projections) are used in the procedure and compared regarding their average error rates. With the Tukey depth, which fits the distributions' shape best and is most robust, 'outsiders', that is, data points having zero depth in all classes, appear. They need an additional treatment for classification. Evidence is also given about the dimension of the extended feature space needed for linear separation. The DDα-procedure is available as an R package.
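The first step of such a procedure, the depth-space transform, is easy to sketch: map every observation to its vector of depths with respect to each class, with any depth function plugged in. The α-procedure separator itself is more involved and omitted here; this is an illustration, not the package's code.

```python
import numpy as np

def depth_space(X, class_samples, depth):
    """Map each row of X to its depths w.r.t. every class sample, turning
    the original feature space into a low-dimensional unit cube of depth
    values.  `depth(x, sample)` can be any depth function."""
    return np.array([[depth(x, Xc) for Xc in class_samples] for x in X])

# Usage sketch for two classes, with `my_depth` any depth function:
# D = depth_space(X, [X[y == 0], X[y == 1]], depth=my_depth)
```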
Article
The spatial sign correlation (Dürre, Vogel and Fried, 2015) is a highly robust and easy-to-compute bivariate correlation estimator based on the spatial sign covariance matrix. Since the estimator is inefficient when the marginal scales strongly differ, a two-stage version was proposed. In the first step, the observations are marginally standardized by means of a robust scale estimator, and in the second step, the spatial sign correlation of the thus transformed data set is computed. Dürre et al. (2015) give some evidence that the asymptotic distribution of the two-stage estimator equals that of the spatial sign correlation at equal marginal scales by comparing their influence functions and presenting simulation results, but give no formal proof. In the present paper, we close this gap and establish the asymptotic normality of the two-stage spatial sign correlation and compute its asymptotic variance for elliptical population distributions. We further derive a variance-stabilizing transformation, similar to Fisher's z-transform, and numerically compare the small-sample coverage probabilities of several confidence intervals.
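A compact numpy sketch of the two-stage recipe follows: standardize each margin by median/MAD, compute the 2x2 SSCM of the standardized pairs, and map its eigenvalues back to a correlation via the bivariate relation λ_j ∝ δ_j² recalled earlier. The published estimator uses an equivalent closed form, so treat this as an illustration rather than the authors' definition.

```python
import numpy as np

def two_stage_sign_corr(x, y):
    """Two-stage spatial sign correlation: robustly standardize the margins,
    compute the 2x2 SSCM, and reconstruct the shape matrix up to scale."""
    def standardize(v):
        mad = 1.4826 * np.median(np.abs(v - np.median(v)))
        return (v - np.median(v)) / mad
    Z = np.column_stack([standardize(x), standardize(y)])
    r = np.maximum(np.linalg.norm(Z, axis=1), 1e-12)
    U = Z / r[:, None]
    S = U.T @ U / len(U)                           # 2x2 SSCM
    evals, evecs = np.linalg.eigh(S)
    shape = evecs @ np.diag(evals ** 2) @ evecs.T  # shape matrix up to scale
    return shape[0, 1] / np.sqrt(shape[0, 0] * shape[1, 1])
```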
Article
A new robust correlation estimator based on the spatial sign covariance matrix (SSCM) is proposed. We derive its asymptotic distribution and influence function at elliptical distributions. Finite sample and robustness properties are studied and compared to other robust correlation estimators by means of numerical simulations.