Original Article
A covariance matrix test for high-dimensional data
Saowapha Chaipitak and Samruam Chongcharoen*
School of Applied Statistics, National Institute of Development Administration (NIDA),
Bang Kapi, Bangkok, 10240 Thailand.
* Corresponding author. Email address: samruam@as.nida.ac.th
Received: 25 October 2015; Accepted: 25 January 2016
Abstract
For multivariate normally distributed data with dimension larger than or equal to the number of observations, or the sample size, called high-dimensional normal data, we propose a test of the null hypothesis that the covariance matrix of a normal population is proportional to a given matrix, under some conditions as the dimension goes to infinity. We show that this test statistic is consistent. The asymptotic null and non-null distributions of the test statistic are also given. The performance of the proposed test is evaluated via a simulation study and an application.
Keywords: asymptotic distribution, high-dimensional data, null distribution, non-null distribution, multivariate normal,
hypothesis testing
Songklanakarin J. Sci. Technol.
38 (5), 521-535, Sep. - Oct. 2016
1. Introduction
Let $X_1, \ldots, X_N$ be a set of independent observations from a multivariate normal distribution $N_p(\mu, \Sigma)$, where both the mean vector $\mu$ and the covariance matrix $\Sigma$, a positive definite matrix, are unknown. In this paper, we are interested in the problem of testing the hypothesis that the covariance matrix of a normal population is proportional to a given matrix, that is,
$$H_0: \Sigma = t\Sigma_0 \quad\text{against}\quad H_1: \Sigma \neq t\Sigma_0,$$
where both $t > 0$ and $\Sigma_0$ are known. The likelihood ratio test (LRT), which is based on the sample covariance matrix, is the traditional technique for this hypothesis and requires $n \geq p$. But in many applications in modern science and economics, e.g. the analysis of DNA microarrays, the dimension is usually in the thousands of gene expressions whereas the sample size is small, so that $n < p$; such data are called high-dimensional data. For such data, the LRT is not applicable because the sample covariance matrix, $S$, is singular when $n < p$ (see, for example, Muirhead, 1982, Sections 8.3 and 8.4; Anderson, 1984, Sections 10.7 and 10.8).
Recently, several authors have proposed tests for related problems, among them John (1971), Nagao (1973), Ledoit and Wolf (2002), Srivastava (2005), and Fisher et al. (2010). These are given as follows.

John (1971) proposed a test statistic for testing that the covariance matrix of a normal population is proportional to an identity matrix, that is, $H_0: \Sigma = tI$, $t > 0$ a known value, which is the locally most powerful invariant test, as
$$U = \frac{1}{p}\,\mathrm{tr}\!\left[\left(\frac{S}{(1/p)\,\mathrm{tr}\,S} - I\right)^{2}\right],$$
and Nagao (1973) proposed a test statistic for testing $H_0: \Sigma = I$ as
$$V = \frac{1}{p}\,\mathrm{tr}\!\left[(S - I)^{2}\right].$$
Both the $U$ and $V$ test statistics are consistent and had been studied under the assumption that $n$ goes to infinity while $p$ remains fixed. Ledoit and Wolf (2002) demonstrated that the test for $H_0$ based on the $U$ statistic is still consistent if $n$ goes to infinity together with $p$, that is, as $(n,p) \to \infty$ with $p/n \to c$, $c \in (0, \infty)$. The null hypothesis $H_0$ is rejected if
$$U_j = \frac{npU}{2} \qquad\qquad (1.1)$$
exceeds the appropriate quantile from the $\chi^2$ distribution with $p(p+1)/2 - 1$ degrees of freedom. For testing $H_0: \Sigma = I$ when $n$ goes to infinity with $p$ such that $n < p$, Ledoit and Wolf (2002) showed that the statistic $V$ is not consistent against every alternative, and that its $n \to \infty$ limiting distribution differs from its $(n,p) \to \infty$ limiting distribution under the null hypothesis.
Then they modified the statistic $V$ as
$$W = \frac{1}{p}\,\mathrm{tr}\!\left[(S - I)^{2}\right] - \frac{p}{n}\left[\frac{1}{p}\,\mathrm{tr}\,S\right]^{2} + \frac{p}{n}.$$
They have shown that the statistic $W$ is consistent as $(n,p) \to \infty$, including the case $n < p$. The test based on $W$ rejects the null hypothesis $H_0$ if $npW/2$ exceeds the appropriate quantile from the $\chi^2$ distribution with $p(p+1)/2$ degrees of freedom.
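For concreteness, the following is a minimal Python sketch (ours, not the authors') of how the statistics reviewed above could be computed: John's $U$ with the rule (1.1), and Ledoit and Wolf's $W$ with the rule $npW/2$. The function names and the $N_p(0, I)$ example data are our own choices, so this is an illustration under those assumptions rather than a definitive implementation.

import numpy as np
from scipy import stats

def john_U(S, p):
    # John's U = (1/p) tr[(S / ((1/p) tr S) - I)^2]
    A = S / (np.trace(S) / p) - np.eye(p)
    return np.trace(A @ A) / p

def ledoit_wolf_W(S, n, p):
    # W = (1/p) tr[(S - I)^2] - (p/n) [(1/p) tr S]^2 + p/n
    A = S - np.eye(p)
    return np.trace(A @ A) / p - (p / n) * (np.trace(S) / p) ** 2 + p / n

rng = np.random.default_rng(0)
p, N = 40, 10
n = N - 1                                  # sample size convention n = N - 1
X = rng.standard_normal((N, p))            # N observations from N_p(0, I)
S = np.cov(X, rowvar=False)                # sample covariance (divisor n)

print("U_j =", n * p * john_U(S, p) / 2,
      "chi2 5% threshold:", stats.chi2.ppf(0.95, p * (p + 1) / 2 - 1))
print("npW/2 =", n * p * ledoit_wolf_W(S, n, p) / 2,
      "chi2 5% threshold:", stats.chi2.ppf(0.95, p * (p + 1) / 2))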
Srivastava (2005) proposed a test statistic for the case $(n,p) \to \infty$ with $n = O(p^{\delta})$, $0 < \delta \leq 1$, to reject the null hypothesis $H_0: \Sigma = \sigma_0^{2}I$, $\sigma_0^{2}$ unknown, with a test statistic that does not involve the unknown $\sigma_0^{2}$. We then apply his test statistic for testing $H_0$ and reject $H_0$ if
$$T_{1S} = \frac{n}{2}\left(\frac{\hat h_2}{\hat h_1^{2}} - 1\right) \qquad\qquad (1.2)$$
exceeds the appropriate quantile of the standard normal distribution, where $\hat h_1 = (1/p)\,\mathrm{tr}\,S$ and
$$\hat h_2 = \frac{n^{2}}{(n-1)(n+2)}\,\frac{1}{p}\left[\mathrm{tr}\,S^{2} - \frac{1}{n}\left(\mathrm{tr}\,S\right)^{2}\right].$$
The statistics $\hat h_1$ and $\hat h_2$ are $(n,p)$-consistent estimators of $(1/p)\,\mathrm{tr}\,\Sigma$ and $(1/p)\,\mathrm{tr}\,\Sigma^{2}$, respectively.
Also, he proposed a test that rejects the null hypothesis $H_0$ if
$$T_{2S} = \frac{n}{2}\left(\hat h_2 - 2\hat h_1 + 1\right) \qquad\qquad (1.3)$$
exceeds the appropriate quantile of the standard normal distribution. Motivated by the result in Srivastava (2005), and requiring $p/n \to c$, $c \in (0, \infty)$, Fisher et al. (2010) proposed a test for $H_0$ based on unbiased and consistent estimators of the second and fourth arithmetic means of the sample eigenvalues. With the constants
$$b = -\frac{4}{n}, \qquad c^{*} = -\frac{2n^{2}+3n-6}{n(n^{2}+n+2)}, \qquad d = \frac{2(5n+6)}{n(n^{2}+n+2)}, \qquad e = -\frac{5n+6}{n^{2}(n^{2}+n+2)},$$
and
$$\tau = \frac{n^{5}(n^{2}+n+2)}{(n+1)(n+2)(n+4)(n+6)(n-1)(n-2)(n-3)},$$
they proposed the test statistic that rejects the null hypothesis $H_0$ if
$$T_F = \frac{n\left(\hat h_4/\hat h_2^{2} - 1\right)}{\sqrt{8\left(8 + 12c + c^{2}\right)}}$$
exceeds the appropriate quantile of the standard normal distribution, where
$$\hat h_4 = \frac{\tau}{p}\left[\mathrm{tr}\,S^{4} + b\,\mathrm{tr}\,S^{3}\,\mathrm{tr}\,S + c^{*}\left(\mathrm{tr}\,S^{2}\right)^{2} + d\,\mathrm{tr}\,S^{2}\left(\mathrm{tr}\,S\right)^{2} + e\left(\mathrm{tr}\,S\right)^{4}\right]$$
is an $(n,p)$-consistent estimator of $(1/p)\,\mathrm{tr}\,\Sigma^{4}$.
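A matching Python sketch (again ours, not from the paper) of Srivastava's statistics (1.2) and (1.3); h_hats implements the $\hat h_1$, $\hat h_2$ formulas quoted above, and the example data, generated under sphericity, are our own assumption.

import numpy as np

def h_hats(S, n, p):
    # (n,p)-consistent estimators of (1/p) tr(Sigma) and (1/p) tr(Sigma^2)
    h1 = np.trace(S) / p
    h2 = (n**2 / ((n - 1) * (n + 2))) / p * (np.trace(S @ S) - np.trace(S)**2 / n)
    return h1, h2

def T_1S(S, n, p):
    h1, h2 = h_hats(S, n, p)
    return (n / 2) * (h2 / h1**2 - 1)      # (1.2): reject for large values vs N(0,1)

def T_2S(S, n, p):
    h1, h2 = h_hats(S, n, p)
    return (n / 2) * (h2 - 2 * h1 + 1)     # (1.3): test of H0: Sigma = I

rng = np.random.default_rng(1)
p, n = 80, 9
X = rng.standard_normal((n + 1, p))        # data generated under sphericity
S = np.cov(X, rowvar=False)
print(T_1S(S, n, p), T_2S(S, n, p))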
The remainder of this paper is organized as follows. Section 2 provides the proposed test statistic and its asymptotic distribution under both the null and alternative hypotheses as $(n,p)$ go to infinity, even if $n < p$. Section 3 shows the performance of the proposed test statistic through a simulation study. Section 4 applies the test statistic to real data. Section 5 contains the conclusions. The theoretical derivations are given in the Appendix.
2. Description of the Proposed Test
Suppose $X_1, \ldots, X_{n+1} \sim N_p(\mu, \Sigma)$ and we are interested in testing that the covariance matrix of a normal population is proportional to a given matrix, that is, $H_0: \Sigma = t\Sigma_0$ against $H_1: \Sigma \neq t\Sigma_0$, where $t > 0$ is a known value and $\Sigma_0$ is a given known positive definite matrix. We propose a test statistic based on a measure of the distance between the two matrices,
$$\Delta = \frac{1}{p}\,\mathrm{tr}\!\left[\left(\Sigma\Sigma_0^{-1} - tI\right)^{2}\right] = \frac{1}{p}\,\mathrm{tr}\!\left[\left(\Sigma\Sigma_0^{-1}\right)^{2}\right] - \frac{2t}{p}\,\mathrm{tr}\!\left(\Sigma\Sigma_0^{-1}\right) + t^{2}, \qquad\qquad (2.1)$$
where $\mathrm{tr}$ denotes the trace of a matrix, and $\Delta = 0$ if and only if the null hypothesis holds. Thus, we may consider testing $H_0: \Delta = 0$ against $H_1: \Delta > 0$.
We shall make the following assumptions:

(A) $\lim_{p\to\infty} a_i = a_i^{0}$, $a_i^{0} \in (0, \infty)$, $i = 1, \ldots, 8$;

(B) $\lim_{(n,p)\to\infty} p/n = c$, $c \in (0, \infty)$;

where $a_i = (1/p)\,\mathrm{tr}\!\left[\left(\Sigma\Sigma_0^{-1}\right)^{i}\right] = (1/p)\sum_{j=1}^{p}\left(\lambda_j/d_j\right)^{i}$. The $\lambda_j$'s are the eigenvalues of the covariance matrix $\Sigma$ and the $d_j$'s are the eigenvalues of the known positive definite matrix $\Sigma_0$.
We need consistent estimators of $a_1$ and $a_2$ for large $p$ and $n$, even if $n < p$. The following theorem provides these consistent estimators.
Theorem 2.1 The unbiased and consistent estimators of $a_1 = (1/p)\,\mathrm{tr}\left(\Sigma\Sigma_0^{-1}\right)$ and $a_2 = (1/p)\,\mathrm{tr}\!\left[\left(\Sigma\Sigma_0^{-1}\right)^{2}\right]$ are, respectively, given by
$$\hat a_1 = \frac{1}{p}\,\mathrm{tr}\left(\Sigma_0^{-1}S\right) \qquad\qquad (2.2)$$
and
$$\hat a_2 = \frac{n^{2}}{(n-1)(n+2)}\,\frac{1}{p}\left[\mathrm{tr}\!\left[\left(\Sigma_0^{-1}S\right)^{2}\right] - \frac{1}{n}\left(\mathrm{tr}\,\Sigma_0^{-1}S\right)^{2}\right]. \qquad\qquad (2.3)$$
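A small Monte Carlo check (ours, not part of the paper) makes Theorem 2.1 concrete: with $\Sigma = 2\Sigma_0$ the targets are $a_1 = 2$ and $a_2 = 4$, and the simulated means of (2.2) and (2.3) should be close to these values. The diagonal choice of $\Sigma_0$ is an assumption made only to keep the sketch short.

import numpy as np

rng = np.random.default_rng(2)
p, n = 30, 20
d = rng.uniform(0.5, 2.0, size=p)
Sigma0 = np.diag(d)                        # a known positive definite matrix
Sigma = 2.0 * Sigma0                       # true covariance, so a1 = 2, a2 = 4
Sigma0_inv = np.diag(1.0 / d)

a1_sim, a2_sim = [], []
for _ in range(2000):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n + 1)
    S = np.cov(X, rowvar=False)
    B = Sigma0_inv @ S                     # Sigma0^{-1} S
    a1_sim.append(np.trace(B) / p)         # estimator (2.2)
    a2_sim.append(n**2 / ((n - 1) * (n + 2)) / p
                  * (np.trace(B @ B) - np.trace(B)**2 / n))  # estimator (2.3)

print(np.mean(a1_sim), np.mean(a2_sim))    # close to 2 and 4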
Thus we use the estimators in Theorem 2.1 to define the unbiased and consistent estimator of $\Delta$ in (2.1) as
$$\hat\Delta = \hat a_2 - 2t\hat a_1 + t^{2}. \qquad\qquad (2.4)$$
The following theorem gives the asymptotic distribution of the estimators $\hat a_1$ and $\hat a_2$ in (2.4).
Theorem 2.2 Under assumptions (A) and (B), as $(n,p) \to \infty$,
$$\begin{pmatrix}\hat a_1\\ \hat a_2\end{pmatrix} \xrightarrow{D} N\left[\begin{pmatrix}a_1\\ a_2\end{pmatrix},\ \begin{pmatrix}2a_2/np & 4a_3/np\\ 4a_3/np & 4\left(2a_4 + ca_2^{2}\right)/np\end{pmatrix}\right],$$
where $x \xrightarrow{D} y$ denotes that $x$ converges in distribution to $y$.
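The covariance structure in Theorem 2.2 can be probed numerically; the sketch below (ours) checks the (1,1) entry. With $\Sigma = 2I$ and $\Sigma_0 = I$ we have $a_2 = 4$, so $\mathrm{Var}(\hat a_1)$ should be close to $2a_2/np = 8/np$; the particular $(n, p)$ and seed are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(4)
p, n, t = 50, 30, 2.0
a1_draws = []
for _ in range(4000):
    X = rng.multivariate_normal(np.zeros(p), t * np.eye(p), size=n + 1)
    S = np.cov(X, rowvar=False)
    a1_draws.append(np.trace(S) / p)       # Sigma0 = I, so a1_hat = tr(S)/p
print(np.var(a1_draws), 2 * 4 / (n * p))   # empirical vs theoretical 2*a2/np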
The following theorem and corollary provide the asymptotic distribution of $\hat\Delta$ under the alternative and null hypotheses, obtained by applying the delta method to a function of two random variables.

Theorem 2.3 Under assumptions (A) and (B), as $(n,p) \to \infty$,
$$\hat\Delta - \Delta \xrightarrow{D} N\left(0, \xi^{2}\right), \qquad\qquad (2.5)$$
with
$$\xi^{2} = \frac{4}{np}\left(2t^{2}a_2 - 4ta_3 + 2a_4 + ca_2^{2}\right).$$
Corollary 2.1 Under the null hypothesis $H_0: \Sigma = t\Sigma_0$ we have $\Delta = 0$, and under assumptions (A) and (B), as $(n,p) \to \infty$,
$$T = \frac{np\,\hat\Delta}{\sqrt{4npc\,t^{4}}} \xrightarrow{D} N(0, 1); \qquad\qquad (2.6)$$
with $c$ replaced by $p/n$, the statistic reduces to $T = n\hat\Delta/(2t^{2})$.

Remark If $t = 1$ and $\Sigma_0 = I$, where $I$ is the identity matrix, then the proposed statistic $T$ is the test statistic $T_{2S}$ in (1.3) given by Srivastava (2005).
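A compact Python sketch of the proposed test as we read (2.2)-(2.6); with $c = p/n$ the statistic reduces to $T = n\hat\Delta/(2t^{2})$, rejecting for large values against the upper normal quantile. This is our illustration under that reading, not the authors' code; the example $\Sigma_0$ and data are assumptions.

import numpy as np
from scipy import stats

def proposed_T(X, Sigma0, t):
    # Test H0: Sigma = t * Sigma0 from an (n+1) x p data matrix X
    N, p = X.shape
    n = N - 1
    S = np.cov(X, rowvar=False)
    B = np.linalg.solve(Sigma0, S)                     # Sigma0^{-1} S
    a1 = np.trace(B) / p                               # (2.2)
    a2 = (n**2 / ((n - 1) * (n + 2)) / p
          * (np.trace(B @ B) - np.trace(B)**2 / n))    # (2.3)
    Delta = a2 - 2 * t * a1 + t**2                     # (2.4)
    return n * Delta / (2 * t**2)                      # (2.6): approx N(0,1) under H0

rng = np.random.default_rng(3)
p, n, t = 100, 20, 2.0
Sigma0 = np.diag(rng.uniform(0.5, 2.0, size=p))
X = rng.multivariate_normal(np.zeros(p), t * Sigma0, size=n + 1)  # H0 true
T = proposed_T(X, Sigma0, t)
print(T, "reject at 5%:", T > stats.norm.ppf(0.95))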
3. Simulation Study

To study the performance of the proposed test statistic $T$, we compute the attained significance level (ASL) of the proposed test by simulation. Based on 10,000 replications of data sets simulated under the null hypothesis $H_0: \Sigma = t\Sigma_0$, the test statistic $T$ is computed, and the ASL of the test is the proportion of replications in which the test statistic rejects the null hypothesis at the nominal significance level 0.05.
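The ASL computation just described can be organized as in the following sketch (ours); the statistic is the one from the sketch after Corollary 2.1, redefined here so the snippet stands alone.

import numpy as np
from scipy import stats

def T_stat(X, Sigma0, t):
    N, p = X.shape
    n = N - 1
    S = np.cov(X, rowvar=False)
    B = np.linalg.solve(Sigma0, S)
    a1 = np.trace(B) / p
    a2 = n**2 / ((n - 1) * (n + 2)) / p * (np.trace(B @ B) - np.trace(B)**2 / n)
    return n * (a2 - 2 * t * a1 + t**2) / (2 * t**2)

def asl(Sigma0, t, n, p, reps=10_000, seed=0):
    # fraction of null replications in which T exceeds the upper 5% quantile
    rng = np.random.default_rng(seed)
    z = stats.norm.ppf(0.95)
    mean = np.zeros(p)
    hits = sum(T_stat(rng.multivariate_normal(mean, t * Sigma0, n + 1),
                      Sigma0, t) > z
               for _ in range(reps))
    return hits / reps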
We simulate the ASL for four different null hypotheses:

1) $H_0^{1}: \Sigma = t\Sigma_0 = C_{01}$, where $C_{01} = (c_{ij})_{p\times p}$ is a Toeplitz matrix with elements $c_0 = 1$, $c_1 = c_{-1} = 0.5$, and the remaining elements equal to zero;

2) $H_0^{2}: \Sigma = t\Sigma_0 = C_{02} = 0.5I_p + 0.5\,\mathbf 1_p\mathbf 1_p'$, where $I_p$ denotes the $p \times p$ identity matrix and $\mathbf 1_p$ denotes the $p \times 1$ vector having each element equal to 1;

3) $H_0^{3}: \Sigma = t\Sigma_0 = C_{03}$, where $C_{03} = (c_{ij})_{p\times p}$ with $c_{ij} = c_{ji}$, $c_{ij} = (-1)^{i+j}\left(i/2j\right)$ for $i < j = 1, \ldots, p$, and $c_{ii} = 1.0$, $i = 1, \ldots, p$;

4) $H_0^{4}: \Sigma = t\Sigma_0 = C_{04}$, where $C_{04} = (c_{ij})_{p\times p}$ with $c_{ij} = c_{ji}$, $c_{ij} = (-1)^{i+j}(0.9)^{|i-j|^{1/5}}$, $i, j = 1, \ldots, p$.

For each null hypothesis, we simulate the empirical power of the proposed test $T$ under an alternative hypothesis paired with each of the four null hypotheses:

1) $H_0^{1}: \Sigma = C_{01}$ against $H_1^{1}: \Sigma = C_1$, where $C_1 = (c_{ij})_{p\times p}$ is a Toeplitz matrix with elements $c_0 = 1$, $c_1 = c_{-1} = 0.49$, and the remaining elements equal to zero;

2) $H_0^{2}: \Sigma = C_{02}$ against $H_1^{2}: \Sigma = C_2 = 0.9I_p + (0.1)\mathbf 1_p\mathbf 1_p'$;

3) $H_0^{3}: \Sigma = C_{03}$ against $H_1^{3}: \Sigma = C_3$, where $C_3 = (c_{ij})_{p\times p}$ with $c_{ij} = c_{ji}$, $c_{ij} = (-1)^{i+j}\left(i/4j\right)$ for $i < j = 1, \ldots, p$, and $c_{ii} = 1.0$, $i = 1, \ldots, p$;

4) $H_0^{4}: \Sigma = C_{04}$ against $H_1^{4}: \Sigma = C_4$, where $C_4 = (c_{ij})_{p\times p}$ with $c_{ij} = c_{ji}$, $c_{ij} = (-1)^{i+j}(0.9)^{|i-j|^{2/5}}$, $i, j = 1, \ldots, p$.

Constructors for these covariance structures are sketched after this list.
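For reproducibility, hedged Python constructors (ours) for the null covariance structures listed above, following our reading of the definitions of $C_{01}$-$C_{04}$:

import numpy as np
from scipy.linalg import toeplitz

def C01(p):
    # Toeplitz: 1 on the diagonal, 0.5 on the first off-diagonals, 0 elsewhere
    col = np.zeros(p)
    col[0], col[1] = 1.0, 0.5
    return toeplitz(col)

def C02(p):
    # 0.5 * I_p + 0.5 * 1_p 1_p'
    return 0.5 * np.eye(p) + 0.5 * np.ones((p, p))

def C03(p):
    # symmetric, c_ij = (-1)^(i+j) i/(2j) for i < j, and 1.0 on the diagonal
    i, j = np.indices((p, p)) + 1
    M = np.where(i < j, (-1.0) ** (i + j) * i / (2 * j), 0.0)
    return M + M.T + np.eye(p)

def C04(p):
    # c_ij = (-1)^(i+j) * 0.9^(|i-j|^(1/5)); the diagonal equals 1
    i, j = np.indices((p, p)) + 1
    return (-1.0) ** (i + j) * 0.9 ** (np.abs(i - j) ** 0.2)

The alternatives $C_1$-$C_4$ follow by replacing 0.5 with 0.49, the pair $(0.5, 0.5)$ with $(0.9, 0.1)$, $2j$ with $4j$, and the exponent $1/5$ with $2/5$, respectively.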
3.1 Simulation results

The ASL corresponding to the four null hypotheses is provided in Table 1. As expected, the ASL of the test statistic $T$ is reasonably close to the nominal significance level 0.05 and gets closer as $p$ and $n$ get large. We found that the four sets of ASL values are almost the same, which means that the attained size of our test statistic is not affected by varying the null covariance matrices.

The empirical powers are shown in Table 2. The four sets of empirical powers of the test statistic $T$ rapidly converge to one and stay high as $n$ and $p$ get large with $n < p$.

We also compute the ASL in a special case of the null covariance matrix, setting $t = 2$ and $\Sigma_0 = I$; that is, we test the null hypothesis $H_0: \Sigma = 2I$ (sphericity). We compare the performance of the proposed test statistic $T$ with the test statistics defined in Ledoit and Wolf (2002), denoted $U_j$ in (1.1), and Srivastava (2005), denoted $T_{1S}$ in (1.2). We compare them under the alternative hypothesis $H_1: \Sigma = 2D$, where $D = \mathrm{diag}(d_1, \ldots, d_p)$ with $d_i \sim \mathrm{Unif}(0,1)$, $i = 1, 2, \ldots, p$. The ASL and the empirical powers are provided in Table 3. Table 3 shows that the ASL of the proposed test statistic $T$ is similar to the values in Table 1 and close to those of $T_{1S}$ and $U_j$. But the test statistic $T$ gives the best performance for all settings of $(n,p)$ and has substantially higher power than $U_j$ and $T_{1S}$ for almost every $n$ and $p$ considered. These results suggest that the proposed test may be more appropriate to use than the $U_j$ and $T_{1S}$ tests, especially when $n$ is small.
4. A Real Example

In this section we use a microarray dataset from Notterman et al. (2001), available at http://genomics-pubs.princeton.edu/oncology/Data/CarcinomaNormaldatasetCancerResearch.xls (last accessed: 9 October 2011). There are 18 colon adenocarcinomas and their paired normal colon tissues, obtained on oligonucleotide arrays. The expression levels of 6500 human genes are measured on each. For simplicity, we restrict attention to the 18 colon adenocarcinomas with only the first 256 measurements each. We examine whether the covariance matrix is spherical. The data give the observed test statistic values $T = 8.500$, $U_j = 284.567$, and $T_{1S} = 270.582$, each with $p\text{-value} \approx 0$; thus the hypothesis of sphericity is rejected at any reasonable significance level.
5. Conclusions

For testing the covariance matrix with high-dimensional data, we have proposed a test statistic under the normality assumption. The test statistic is asymptotically normally distributed. Numerical simulations indicate that our test statistic $T$ in (2.6), constructed from consistent estimators, accurately controls the size of the test, and its power improves as $(n,p)$ get large with $n < p$. Moreover, for testing sphericity of the covariance matrix, the test statistic gives higher power than the tests in Ledoit and Wolf (2002) and Srivastava (2005).
Acknowledgements
The authors would like to express their gratitude to
the Commission on Higher Education (CHE) of Thailand for
their financial support.
Table 1. The ASL of test statistic T under four null hypotheses at nominal significance level 0.05.

                                      The ASL of T
   p    n = N - 1   H_0^1: Sigma=C01   H_0^2: Sigma=C02   H_0^3: Sigma=C03   H_0^4: Sigma=C04
  10        9             0.059              0.058              0.059              0.059
  40        9             0.055              0.055              0.055              0.055
           39             0.056              0.056              0.056              0.057
  80        9             0.057              0.056              0.057              0.057
           39             0.052              0.052              0.052              0.052
           79             0.053              0.052              0.052              0.052
 160        9             0.053              0.053              0.054              0.054
           39             0.056              0.055              0.055              0.056
           79             0.056              0.056              0.056              0.055
          159             0.053              0.053              0.053              0.053
 320        9             0.052              0.052              0.052              0.052
           39             0.052              0.051              0.052              0.052
           79             0.051              0.051              0.050              0.050
          159             0.051              0.050              0.051              0.051
          319             0.053              0.051              0.053              0.053
Table 2. The empirical power of T under four alternative hypotheses.

                                 The empirical power of T
   p    n = N - 1   H_1^1: Sigma=C1   H_1^2: Sigma=C2   H_1^3: Sigma=C3   H_1^4: Sigma=C4
  10        9             0.480             0.560             0.174             0.159
  40        9             0.996             0.617             0.265             0.286
           39             1.000             1.000             0.772             0.877
  80        9             1.000             0.624             0.300             0.330
           39             1.000             1.000             0.837             0.939
           79             1.000             1.000             0.998             1.000
 160        9             1.000             0.625             0.319             0.346
           39             1.000             1.000             0.866             0.966
           79             1.000             1.000             0.999             1.000
          159             1.000             1.000             1.000             1.000
 320        9             1.000             0.629             0.342             0.361
           39             1.000             1.000             0.891             0.977
           79             1.000             1.000             1.000             1.000
          159             1.000             1.000             1.000             1.000
          319             1.000             1.000             1.000             1.000
Table 3. The ASL (under H_0: Sigma = 2I) and the empirical power (under H_1: Sigma = 2D) of T, U_j, and T_1S at nominal significance level 0.05.

                            ASL                     Empirical power
   p    n = N - 1     T      U_j    T_1S         T      U_j    T_1S
  10        9       0.059   0.049   0.048      1.000   0.412   0.405
  40        9       0.055   0.054   0.051      1.000   0.368   0.360
           39       0.056   0.056   0.053      1.000   0.999   0.999
  80        9       0.057   0.057   0.053      1.000   0.356   0.348
           39       0.052   0.052   0.050      1.000   0.999   0.999
           79       0.052   0.051   0.050      1.000   1.000   1.000
 160        9       0.053   0.056   0.054      1.000   0.354   0.346
           39       0.055   0.056   0.055      1.000   0.999   0.999
           79       0.055   0.057   0.055      1.000   1.000   1.000
          159       0.053   0.052   0.052      1.000   1.000   1.000
 320        9       0.052   0.055   0.052      1.000   0.352   0.343
           39       0.052   0.054   0.052      1.000   0.999   0.999
           79       0.050   0.050   0.050      1.000   1.000   1.000
          159       0.051   0.050   0.050      1.000   1.000   1.000
          319       0.053   0.053   0.053      1.000   1.000   1.000
References

Anderson, T.W. 1984. An Introduction to Multivariate Statistical Analysis, 2nd edition, Wiley, New York, U.S.A.

Billingsley, P. 1995. Probability and Measure, 3rd edition, Wiley, New York, U.S.A.

Fisher, T., Sun, X., and Gallagher, C.M. 2010. A new test for sphericity of the covariance matrix for high dimensional data. Journal of Multivariate Analysis. 101, 2554-2570.

John, S. 1971. Some optimal multivariate tests. Biometrika. 58, 123-127.

Ledoit, O., and Wolf, M. 2002. Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Annals of Statistics. 30, 1081-1102.

Lehmann, E.L., and Romano, J.P. 2005. Testing Statistical Hypotheses, 3rd edition, Springer, New York, U.S.A.

Muirhead, R.J. 1982. Aspects of Multivariate Statistical Theory, Wiley, New York, U.S.A.

Nagao, H. 1973. On some test criteria for covariance matrix. Annals of Statistics. 1, 700-709.

Notterman, D.A., Alon, U., Sierk, A.J., and Levine, A.J. 2001. Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research. 61, 3124-3130.

Rao, C.R. 1973. Linear Statistical Inference and Its Applications, 2nd edition, Wiley, New York, U.S.A.

Rencher, A.C. 2000. Linear Models in Statistics, Wiley, New York, U.S.A.

Srivastava, M.S. 2005. Some tests concerning the covariance matrix in high dimensional data. Journal of the Japan Statistical Society. 35, 251-272.
Appendix

Before proving Theorem 2.1, we need the following information and lemma.

For a positive definite symmetric matrix $\Sigma$, by the spectral decomposition we have $\Sigma = \Gamma\Lambda\Gamma'$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ with $\lambda_i$ the $i$th eigenvalue of $\Sigma$, and $\Gamma$ is an orthogonal matrix with each column a corresponding normalized eigenvector $\gamma_1, \gamma_2, \ldots, \gamma_p$. Similarly, we can write $\Sigma_0$ as $\Sigma_0 = RDR'$, where $D = \mathrm{diag}(d_1, d_2, \ldots, d_p)$ with $d_i$ the $i$th eigenvalue of $\Sigma_0$, and $R$ is an orthogonal matrix with each column a corresponding normalized eigenvector $r_1, r_2, \ldots, r_p$ (Rencher, 2000).

Let $nS = YY' \sim W_p(\Sigma, n)$, where $Y = (y_1, y_2, \ldots, y_n)$ and each $y_j \sim N_p(0, \Sigma)$ independently (Anderson, 1984, Section 3.3; Srivastava, 2005; Fisher et al., 2010). Let $U = (u_1, u_2, \ldots, u_n)$, where the $u_j$ are independently and identically distributed (iid) $N_p(0, I)$, so that we can write $Y = \Sigma^{1/2}U$, where $\Sigma^{1/2} = \Gamma\Lambda^{1/2}\Gamma'$. Define $W = U'\Gamma = (w_1, w_2, \ldots, w_p)$; each $w_i$ is iid $N_n(0, I)$. Thus the $v_{ii} = w_i'w_i$ are iid chi-squared random variables with $n$ degrees of freedom.
Lemma A.1. For $v_{ii} = w_i'w_i$ and $v_{ij} = w_i'w_j$, for any $i \neq j$:
$$E(v_{ii}^{r}) = n(n+2)\cdots(n+2r-2), \quad r = 1, 2, \ldots, \qquad \mathrm{Var}(v_{ii}) = 2n, \qquad \mathrm{Var}(v_{ii}^{2}) = 8n(n+2)(n+3),$$
$$E(v_{ii} - n)^{3} = 8n, \qquad E(v_{ii} - n)^{4} = 12n(n+4), \qquad E\left\{v_{ii}^{2} - n(n+2)\right\}^{4} = 3n(n+2)\left[272n^{4} + O(n^{3})\right],$$
$$E(v_{ij}^{2}) = n, \qquad E(v_{ij}^{4}) = 3n(n+2), \qquad E(v_{ii}v_{ij}^{2}) = n(n+2),$$
$$E(v_{ii}^{2}v_{ij}^{2}) = n(n+2)(n+4), \qquad E(v_{ij}^{2}v_{ii}v_{jj}) = n(n+2)^{2}.$$

Proof. The first 6 results can be found in Srivastava (2005) and the last 5 results can be found in Fisher et al. (2010).
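As a numerical sanity check (ours, not part of the paper), two of the Lemma A.1 identities can be verified by simulation; the sample size and seed are arbitrary choices.

import numpy as np

rng = np.random.default_rng(5)
n, reps = 12, 400_000
w_i = rng.standard_normal((reps, n))
w_j = rng.standard_normal((reps, n))
v_ii = np.sum(w_i**2, axis=1)              # chi-squared with n degrees of freedom
v_ij = np.sum(w_i * w_j, axis=1)

print(np.var(v_ii**2), 8 * n * (n + 2) * (n + 3))   # Var(v_ii^2)
print(np.mean(v_ii * v_ij**2), n * (n + 2))         # E(v_ii v_ij^2)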
As in similar proofs of Srivastava (2005), we can write $(1/p)\,\mathrm{tr}\left(\Sigma_0^{-1}S\right)$ and $(1/p)\,\mathrm{tr}\!\left[\left(\Sigma_0^{-1}S\right)^{2}\right]$ in terms of chi-squared random variables:
$$\hat a_1 = \frac{1}{p}\,\mathrm{tr}\left(\Sigma_0^{-1}S\right) = \frac{1}{p}\,\mathrm{tr}\!\left(RD^{-1}R'\,\frac{YY'}{n}\right) = \frac{1}{np}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}v_{ii}. \qquad\qquad (A.1)$$
Similarly, we also have
$$\frac{1}{p}\,\mathrm{tr}\!\left[\left(\Sigma_0^{-1}S\right)^{2}\right] = \frac{1}{n^{2}p}\sum_{i=1}^{p}\frac{\lambda_i^{2}}{d_i^{2}}v_{ii}^{2} + \frac{2}{n^{2}p}\sum_{i<j}^{p}\frac{\lambda_i\lambda_j}{d_id_j}v_{ij}^{2}, \qquad\qquad (A.2)$$
where $v_{ij} = w_i'w_j$. We let
$$\hat a_2 = \frac{n^{2}}{(n-1)(n+2)}\,\frac{1}{p}\left[\mathrm{tr}\!\left[\left(\Sigma_0^{-1}S\right)^{2}\right] - \frac{1}{n}\left(\mathrm{tr}\,\Sigma_0^{-1}S\right)^{2}\right]. \qquad\qquad (A.3)$$
Thus
$$\hat a_2 = \frac{n^{2}}{(n-1)(n+2)}\left[\frac{n-1}{n^{3}p}\sum_{i=1}^{p}\frac{\lambda_i^{2}}{d_i^{2}}v_{ii}^{2} + \frac{2}{n^{2}p}\sum_{i<j}^{p}\frac{\lambda_i\lambda_j}{d_id_j}\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right)\right] = \frac{n^{2}}{(n-1)(n+2)}\left[b_1 + b_2\right], \qquad (A.4)$$
where
$$b_1 = \frac{n-1}{n^{3}p}\sum_{i=1}^{p}\frac{\lambda_i^{2}}{d_i^{2}}v_{ii}^{2}, \qquad b_2 = \frac{2}{n^{2}p}\sum_{i<j}^{p}\frac{\lambda_i\lambda_j}{d_id_j}\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right).$$
Proof of Theorem 2.1.

Since
$$E(\hat a_1) = E\left[\frac{1}{p}\,\mathrm{tr}\left(\Sigma_0^{-1}S\right)\right] = \frac{1}{np}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}E(v_{ii}) = \frac{1}{np}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}\,n = \frac{1}{p}\,\mathrm{tr}\left(\Sigma\Sigma_0^{-1}\right) = a_1.$$
And from Lemma A.1 we easily find that $E\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right) = 0$, so that $E(b_2) = 0$. Thus
$$E(\hat a_2) = \frac{n^{2}}{(n-1)(n+2)}E(b_1) = \frac{n^{2}}{(n-1)(n+2)}\,\frac{n-1}{n^{3}p}\sum_{i=1}^{p}\frac{\lambda_i^{2}}{d_i^{2}}E(v_{ii}^{2}) = \frac{n^{2}}{(n-1)(n+2)}\,\frac{(n-1)\,n(n+2)}{n^{3}p}\sum_{i=1}^{p}\frac{\lambda_i^{2}}{d_i^{2}} = \frac{1}{p}\,\mathrm{tr}\!\left[\left(\Sigma\Sigma_0^{-1}\right)^{2}\right] = a_2.$$
This shows that both $\hat a_1$ and $\hat a_2$ are unbiased estimators of $a_1$ and $a_2$, respectively. To show that $\hat a_1$ and $\hat a_2$ are consistent estimators, consider
$$\mathrm{Var}(\hat a_1) = \frac{1}{n^{2}p^{2}}\sum_{i=1}^{p}\frac{\lambda_i^{2}}{d_i^{2}}\mathrm{Var}(v_{ii}) = \frac{2}{np^{2}}\sum_{i=1}^{p}\frac{\lambda_i^{2}}{d_i^{2}} = \frac{2}{np}\,a_2. \qquad\qquad (A.5)$$
And since
$$\hat a_2 = \frac{n^{2}}{(n-1)(n+2)}\left[b_1 + b_2\right],$$
thus
$$\mathrm{Var}(\hat a_2) = \left[\frac{n^{2}}{(n-1)(n+2)}\right]^{2}\left[\mathrm{Var}(b_1) + \mathrm{Var}(b_2) + 2\,\mathrm{COV}(b_1, b_2)\right],$$
with
$$\mathrm{Var}(b_1) = \frac{(n-1)^{2}}{n^{6}p^{2}}\sum_{i=1}^{p}\frac{\lambda_i^{4}}{d_i^{4}}\mathrm{Var}(v_{ii}^{2}) = \frac{8(n-1)^{2}(n+2)(n+3)}{n^{5}p}\,a_4. \qquad\qquad (A.6)$$
$$\mathrm{Var}(b_2) = \frac{4}{n^{4}p^{2}}\sum_{i<j}^{p}\frac{\lambda_i^{2}\lambda_j^{2}}{d_i^{2}d_j^{2}}\mathrm{Var}\!\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right) = \frac{4}{n^{4}p^{2}}\sum_{i<j}^{p}\frac{\lambda_i^{2}\lambda_j^{2}}{d_i^{2}d_j^{2}}\,E\!\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right)^{2} = \frac{4(n-1)(n+2)}{n^{4}}\left(a_2^{2} - \frac{a_4}{p}\right), \qquad (A.7)$$
using $E\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right)^{2} = 2(n-1)(n+2)$, which follows from the moments in Lemma A.1.
And since $E(b_2) = 0$, then
$$\mathrm{COV}(b_1, b_2) = E(b_1b_2) = \frac{2(n-1)}{n^{5}p^{2}}\sum_{i=1}^{p}\sum_{j<k}^{p}\frac{\lambda_i^{2}}{d_i^{2}}\frac{\lambda_j\lambda_k}{d_jd_k}E\!\left[v_{ii}^{2}\left(v_{jk}^{2} - \frac{1}{n}v_{jj}v_{kk}\right)\right] = 0, \qquad\qquad (A.8)$$
because
$$E\!\left[v_{ii}^{2}\left(v_{jk}^{2} - \frac{1}{n}v_{jj}v_{kk}\right)\right] = n(n+2)\left(n - \frac{1}{n}\,n^{2}\right) = 0 \quad\text{for } i \neq j \neq k,$$
$$E\!\left[v_{ii}^{2}\left(v_{ik}^{2} - \frac{1}{n}v_{ii}v_{kk}\right)\right] = n(n+2)(n+4) - \frac{1}{n}\,n(n+2)(n+4)\,n = 0 \quad\text{for } j = i \neq k,$$
$$E\!\left[v_{ii}^{2}\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right)\right] = n(n+2)(n+4) - \frac{1}{n}\,n(n+2)(n+4)\,n = 0 \quad\text{for } k = i \neq j. \qquad (A.9)$$
By (A.6)-(A.8), we then have
$$\mathrm{Var}(\hat a_2) = \left[\frac{n^{2}}{(n-1)(n+2)}\right]^{2}\left[\mathrm{Var}(b_1) + \mathrm{Var}(b_2) + 2\,\mathrm{COV}(b_1, b_2)\right]$$
$$= \frac{n^{4}}{(n-1)^{2}(n+2)^{2}}\left[\frac{8(n-1)^{2}(n+2)(n+3)}{n^{5}p}\,a_4 + \frac{4(n-1)(n+2)}{n^{4}}\left(a_2^{2} - \frac{a_4}{p}\right)\right] = \frac{4(2n^{2}+3n-6)}{(n-1)(n+2)np}\,a_4 + \frac{4}{(n-1)(n+2)}\,a_2^{2}. \qquad (A.10)$$
Since $\hat a_1$ and $\hat a_2$ are unbiased estimators of $a_1$ and $a_2$, respectively, from (A.5) and (A.10) and by applying Chebyshev's inequality, for any $\epsilon > 0$, as $(n,p) \to \infty$,
$$P\left(\left|\hat a_1 - a_1\right| > \epsilon\right) \leq \frac{1}{\epsilon^{2}}\mathrm{Var}(\hat a_1) = \frac{2a_2}{np\,\epsilon^{2}} \to 0$$
and
$$P\left(\left|\hat a_2 - a_2\right| > \epsilon\right) \leq \frac{1}{\epsilon^{2}}\mathrm{Var}(\hat a_2) = \frac{1}{\epsilon^{2}}\left[\frac{4(2n^{2}+3n-6)}{(n-1)(n+2)np}\,a_4 + \frac{4}{(n-1)(n+2)}\,a_2^{2}\right] \approx \frac{1}{\epsilon^{2}}\left[\frac{8a_4}{np} + \frac{4a_2^{2}}{n^{2}}\right] \to 0.$$
Hence $\hat a_1$ and $\hat a_2$ are unbiased and consistent estimators of $a_1$ and $a_2$, respectively. The proof is completed.
Proof of Theorem 2.2.

From Theorem 2.1, we have
$$E(\hat a_1) = a_1, \qquad E(\hat a_2) = a_2. \qquad\qquad (A.11)$$
By Lemma A.1, with simple calculations and proofs similar to those of Srivastava (2005), under assumptions (A) and (B) and as $(n,p) \to \infty$, we obtain
$$\mathrm{Var}(\hat a_1) = 2a_2/np, \qquad\qquad (A.12)$$
$$\mathrm{Var}(b_1) = \frac{8(n-1)^{2}(n+2)(n+3)}{n^{5}p}\,a_4 \approx 8a_4/np, \qquad\qquad (A.13)$$
$$\mathrm{Var}(b_2) = \frac{4(n-1)(n+2)}{n^{4}}\left(a_2^{2} - \frac{a_4}{p}\right) \approx 4c\,a_2^{2}/np, \qquad\qquad (A.14)$$
$$\mathrm{Var}(\hat a_2) = \frac{4(2n^{2}+3n-6)}{(n-1)(n+2)np}\,a_4 + \frac{4}{(n-1)(n+2)}\,a_2^{2} \approx 4\left(2a_4 + ca_2^{2}\right)/np. \qquad\qquad (A.15)$$
$$\mathrm{COV}(\hat a_1, b_1) = E(\hat a_1b_1) - E(\hat a_1)E(b_1) = \frac{n-1}{n^{4}p^{2}}\left\{E\left[\sum_{i=1}^{p}\frac{\lambda_i}{d_i}v_{ii}\sum_{j=1}^{p}\frac{\lambda_j^{2}}{d_j^{2}}v_{jj}^{2}\right] - \sum_{i=1}^{p}\frac{\lambda_i}{d_i}E(v_{ii})\sum_{j=1}^{p}\frac{\lambda_j^{2}}{d_j^{2}}E(v_{jj}^{2})\right\}$$
$$= \frac{n-1}{n^{4}p^{2}}\sum_{i=1}^{p}\frac{\lambda_i^{3}}{d_i^{3}}\left[E(v_{ii}^{3}) - E(v_{ii})E(v_{ii}^{2})\right] = \frac{n-1}{n^{4}p^{2}}\,4n(n+2)\sum_{i=1}^{p}\frac{\lambda_i^{3}}{d_i^{3}} = \frac{4(n-1)(n+2)}{n^{3}p}\,a_3 \approx \frac{4a_3}{np}. \qquad (A.16)$$
From the fact that $E(b_2) = 0$, and by an argument similar to the proof for $E(b_1b_2)$,
$$\mathrm{COV}(\hat a_1, b_2) = E(\hat a_1b_2) = \frac{2}{n^{3}p^{2}}\sum_{i=1}^{p}\sum_{j<k}^{p}\frac{\lambda_i}{d_i}\frac{\lambda_j\lambda_k}{d_jd_k}E\!\left[v_{ii}\left(v_{jk}^{2} - \frac{1}{n}v_{jj}v_{kk}\right)\right] = 0. \qquad\qquad (A.17)$$
Note that
$$E(\hat a_1\hat a_2) = \frac{n^{2}}{(n-1)(n+2)}\left\{\frac{n-1}{n^{4}p^{2}}E\left[\sum_{i=1}^{p}\frac{\lambda_i}{d_i}v_{ii}\sum_{j=1}^{p}\frac{\lambda_j^{2}}{d_j^{2}}v_{jj}^{2}\right] + \frac{2}{n^{3}p^{2}}E\left[\sum_{i=1}^{p}\frac{\lambda_i}{d_i}v_{ii}\sum_{j<k}^{p}\frac{\lambda_j\lambda_k}{d_jd_k}\left(v_{jk}^{2} - \frac{1}{n}v_{jj}v_{kk}\right)\right]\right\}. \qquad (A.18)$$
By a proof similar to that for $E(b_1b_2)$, we have
$$E\left[\sum_{i=1}^{p}\frac{\lambda_i}{d_i}v_{ii}\sum_{j<k}^{p}\frac{\lambda_j\lambda_k}{d_jd_k}\left(v_{jk}^{2} - \frac{1}{n}v_{jj}v_{kk}\right)\right] = 0,$$
so the expectation of the second term in (A.18) equals zero. Thus, we obtain
$$E(\hat a_1\hat a_2) = \frac{1}{n^{2}(n+2)p^{2}}\left[\sum_{i\neq j}^{p}\frac{\lambda_i}{d_i}\frac{\lambda_j^{2}}{d_j^{2}}E(v_{ii})E(v_{jj}^{2}) + \sum_{i=1}^{p}\frac{\lambda_i^{3}}{d_i^{3}}E(v_{ii}^{3})\right] = \frac{1}{p^{2}}\sum_{i\neq j}^{p}\frac{\lambda_i}{d_i}\frac{\lambda_j^{2}}{d_j^{2}} + \frac{n+4}{np^{2}}\sum_{i=1}^{p}\frac{\lambda_i^{3}}{d_i^{3}} = a_1a_2 + \frac{4a_3}{np}. \qquad (A.19)$$
By (A.11) and (A.19), as $(n,p) \to \infty$, we obtain
$$\mathrm{COV}(\hat a_1, \hat a_2) = E(\hat a_1\hat a_2) - E(\hat a_1)E(\hat a_2) = \frac{4a_3}{np}. \qquad\qquad (A.20)$$
To find the joint distribution of $\hat a_1$ and $\hat a_2$, we use the multivariate central limit theorem (Rao, 1973, p.147) and the Lindeberg central limit theorem (Billingsley, 1995, p.359).
Since $\hat a_2 = \frac{n^{2}}{(n-1)(n+2)}\left[b_1 + b_2\right]$, we need to find the distributions of $\hat a_1$, $b_1$, and $b_2$, each of which is asymptotically normal. First we find the distributions of $\hat a_1$ and $b_1$, because both are functions of the $v_{ii}$; second, that of $b_2$, because it is a function of the $v_{ij}$, $i \neq j$. Finally, the distribution of $\hat a_2$, which is the distribution of a linear function of two normal random variables, is obtained.
First, in order to find the distributions of $\hat a_1$ and $b_1$, with $\lambda_i$ and $d_i$ as before, we let
$$u_{i1} = \frac{\lambda_i\left(v_{ii} - n\right)}{d_i\sqrt{n}} \qquad\text{and}\qquad u_{i2} = \frac{\lambda_i^{2}\left(v_{ii}^{2} - n(n+2)\right)}{d_i^{2}\sqrt{n(n+2)(n+3)}},$$
where $E(u_{i1}) = E(u_{i2}) = 0$, $\mathrm{Var}(u_{i1}) = 2\lambda_i^{2}/d_i^{2}$, $\mathrm{Var}(u_{i2}) = 8\lambda_i^{4}/d_i^{4}$, $\mathrm{COV}(u_{i1}, u_{i2}) = 4e_n\lambda_i^{3}/d_i^{3}$, and $e_n = \sqrt{(n+2)/(n+3)} \to 1$ as $n \to \infty$. Since the $v_{ii}$ are independent, the $\mathbf u_i = (u_{i1}, u_{i2})'$ are independently distributed random vectors, $i = 1, \ldots, p$, with $E(\mathbf u_i) = \mathbf 0$ and covariance matrices
$$\Sigma_{in} = \begin{pmatrix}2\lambda_i^{2}/d_i^{2} & 4e_n\lambda_i^{3}/d_i^{3}\\ 4e_n\lambda_i^{3}/d_i^{3} & 8\lambda_i^{4}/d_i^{4}\end{pmatrix}, \qquad i = 1, \ldots, p.$$
For any $n$, as $p \to \infty$,
$$\frac{1}{p}\left(\Sigma_{1n} + \cdots + \Sigma_{pn}\right) = \begin{pmatrix}2a_2 & 4e_na_3\\ 4e_na_3 & 8a_4\end{pmatrix} \to \Sigma_0^{n}, \qquad\text{where}\qquad \Sigma_0^{n} = \begin{pmatrix}2a_2^{0} & 4e_na_3^{0}\\ 4e_na_3^{0} & 8a_4^{0}\end{pmatrix}.$$
If $F_i$ is the distribution function of $\mathbf u_i$, then, by the $C_r$ inequality in Rao (1973, p.149),
$$\frac{1}{p}\sum_{i=1}^{p}\int_{\|\mathbf u\| > \epsilon\sqrt p}\|\mathbf u\|^{2}\,dF_i \leq \frac{1}{\epsilon^{2}p^{2}}\sum_{i=1}^{p}E\|\mathbf u_i\|^{4} \leq \frac{2}{\epsilon^{2}p^{2}}\sum_{i=1}^{p}\left[E(u_{i1}^{4}) + E(u_{i2}^{4})\right].$$
Since, from Lemma A.1,
$$\frac{1}{p^{2}}\sum_{i=1}^{p}E(u_{i1}^{4}) = \frac{1}{p^{2}}\sum_{i=1}^{p}\frac{\lambda_i^{4}}{d_i^{4}}\,\frac{E(v_{ii} - n)^{4}}{n^{2}} = \frac{12n(n+4)}{n^{2}}\,\frac{1}{p^{2}}\sum_{i=1}^{p}\frac{\lambda_i^{4}}{d_i^{4}} \to 0$$
as $p \to \infty$, and by an analogous derivation $\frac{1}{p^{2}}\sum_{i=1}^{p}E(u_{i2}^{4}) \to 0$ as $p \to \infty$, the Lindeberg condition holds. By applying the multivariate central limit theorem, as $p \to \infty$ for any $n$,
$$\frac{1}{\sqrt p}\left(\mathbf u_1 + \cdots + \mathbf u_p\right) = \begin{pmatrix}\dfrac{1}{\sqrt{np}}\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i\left(v_{ii} - n\right)}{d_i}\\[2ex] \dfrac{1}{\sqrt{np}}\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i^{2}\left(v_{ii}^{2} - n(n+2)\right)}{d_i^{2}\sqrt{(n+2)(n+3)}}\end{pmatrix} \xrightarrow{D} N\left(\mathbf 0, \Sigma_0^{n}\right).$$
Note that, as $n \to \infty$, $e_n \to 1$ and
$$\Sigma_0^{n} \to \begin{pmatrix}2a_2^{0} & 4a_3^{0}\\ 4a_3^{0} & 8a_4^{0}\end{pmatrix}.$$
Thus, it follows that as $(n,p) \to \infty$,
$$\begin{pmatrix}\dfrac{1}{\sqrt{np}}\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i\left(v_{ii} - n\right)}{d_i}\\[2ex] \dfrac{1}{\sqrt{np}}\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i^{2}\left(v_{ii}^{2} - n(n+2)\right)}{d_i^{2}\sqrt{(n+2)(n+3)}}\end{pmatrix} \xrightarrow{D} N\left[\mathbf 0, \begin{pmatrix}2a_2^{0} & 4a_3^{0}\\ 4a_3^{0} & 8a_4^{0}\end{pmatrix}\right].$$
And under assumption (A), which allows the limiting covariance matrix to be written in terms of $a_2$, $a_3$, and $a_4$, we have that
$$\begin{pmatrix}\dfrac{1}{\sqrt{np}}\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i\left(v_{ii} - n\right)}{d_i}\\[2ex] \dfrac{1}{\sqrt{np}}\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i^{2}\left(v_{ii}^{2} - n(n+2)\right)}{d_i^{2}\sqrt{(n+2)(n+3)}}\end{pmatrix} \xrightarrow{D} N\left[\mathbf 0, \begin{pmatrix}2a_2 & 4a_3\\ 4a_3 & 8a_4\end{pmatrix}\right].$$
For the first element of the preceding random vector, since
$$\sqrt{np}\left(\hat a_1 - a_1\right) = \frac{1}{\sqrt{np}}\sum_{i=1}^{p}\frac{\lambda_i\left(v_{ii} - n\right)}{d_i} \xrightarrow{D} N(0, 2a_2),$$
then
$$\hat a_1 \xrightarrow{D} N\left(a_1,\ 2a_2/np\right). \qquad\qquad (A.21)$$
For the second element, we have that
$$\frac{1}{\sqrt{np}}\sum_{i=1}^{p}\frac{\lambda_i^{2}\left(v_{ii}^{2} - n(n+2)\right)}{d_i^{2}\sqrt{(n+2)(n+3)}} = \frac{1}{\sqrt{np}}\left[\frac{n^{3}p\,b_1}{(n-1)\sqrt{(n+2)(n+3)}} - \frac{n(n+2)p\,a_2}{\sqrt{(n+2)(n+3)}}\right] \xrightarrow{D} N(0, 8a_4).$$
Since, as $n \to \infty$,
$$\frac{1}{\sqrt{np}}\left[\frac{n^{3}p\,b_1}{(n-1)\sqrt{(n+2)(n+3)}} - \frac{n(n+2)p\,a_2}{\sqrt{(n+2)(n+3)}}\right] - \sqrt{np}\left(b_1 - a_2\right) \to 0,$$
then $\sqrt{np}\left(b_1 - a_2\right) \xrightarrow{D} N(0, 8a_4)$ also, and with a linear transformation we have the result that
$$b_1 \xrightarrow{D} N\left(a_2,\ 8a_4/np\right). \qquad\qquad (A.22)$$
The next step is to find the distribution of $b_2$. Srivastava (2005) gave the important results, which are used for the next proof, that $v_{ij}/\sqrt n \sim N(0, 1)$ as $n \to \infty$ and $v_{ij}^{2}/n \sim \chi_1^{2}$, and that these are asymptotically independently distributed for all distinct $i$ and $j$.

With $b_2$ as defined in (A.4), we now let
$$\xi_{ij} = \frac{2\lambda_i\lambda_j}{n^{2}p\,d_id_j}\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right).$$
From Lemma A.1 we have $E(\xi_{ij}) = 0$, and we
. From Lemma A.1., we have
( ) 0
Eij
and
let
 
2 2 1
2
21
4 2
2
2 2
4 4
2 2
4 2
2
( ) ( )
4( )
( )
4( 1)( 2) 4
p p i j
p ij ij ii jj
n
i j i j i j
pi j
ij ii jj
n
i j i j
S Var Var v v v
n pd d
Var v v v
d d
n p
Var b
a an n c
a a
p p
n n
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
as ( , )n p
 
.
Let $M_p = \sum_{i<j}^{p}\xi_{ij} = b_2$, and let $P_{ij}$ be the distribution function of $\xi_{ij}$. Then, for $\epsilon > 0$,
$$\frac{1}{S_p^{2}}\sum_{i<j}^{p}\int_{|\xi| > \epsilon S_p}\xi^{2}\,dP_{ij} \leq \frac{1}{\epsilon^{2}S_p^{4}}\sum_{i<j}^{p}E\left(\xi_{ij}^{4}\right) = \frac{1}{\epsilon^{2}S_p^{4}}\,\frac{16}{n^{8}p^{4}}\sum_{i<j}^{p}\frac{\lambda_i^{4}\lambda_j^{4}}{d_i^{4}d_j^{4}}E\!\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right)^{4} \to 0$$
as $p \to \infty$, since $E\left(v_{ij}^{2} - \frac{1}{n}v_{ii}v_{jj}\right)^{4} = O(n^{4})$ by Lemma A.1 while $S_p^{4}$ is of order $1/n^{4}$, so that the bound is $O(1/p^{2})$.
Then it follows from the Lindeberg central limit theorem that
$$\frac{M_p}{S_p} = \frac{\sqrt{np}\,b_2}{2\sqrt{c\left(a_2^{2} - a_4/p\right)}} \xrightarrow{D} N(0, 1).$$
Then we have
$$b_2 \xrightarrow{D} N\left(0,\ \frac{4c}{np}\left(a_2^{2} - \frac{a_4}{p}\right)\right). \qquad\qquad (A.23)$$
By (A.8), $b_1$ and $b_2$ are asymptotically independent. Note that $\hat a_2$ is a linear function of the two random variables $b_1$ and $b_2$; that is,
$$\hat a_2 = \frac{n^{2}}{(n-1)(n+2)}\left[b_1 + b_2\right] \approx b_1 + b_2 \qquad\text{as } n \to \infty.$$
By (A.5), (A.15), (A.22), and (A.23), we then have
$$\hat a_2 \xrightarrow{D} N\left(a_2,\ 4\left(2a_4 + ca_2^{2}\right)/np\right). \qquad\qquad (A.24)$$
From (A.20), $\mathrm{COV}(\hat a_1, \hat a_2) = 4a_3/np$; together with (A.21) and (A.24), we have
$$\begin{pmatrix}\hat a_1\\ \hat a_2\end{pmatrix} \xrightarrow{D} N\left[\begin{pmatrix}a_1\\ a_2\end{pmatrix},\ \begin{pmatrix}2a_2/np & 4a_3/np\\ 4a_3/np & 4\left(2a_4 + ca_2^{2}\right)/np\end{pmatrix}\right].$$
The proof is completed.
Proof of Theorem 2.3. Note that our test statistic is based on $\hat\Delta = \hat a_2 - 2t\hat a_1 + t^{2}$, and we have $\partial\hat\Delta/\partial\hat a_1 = -2t$ and $\partial\hat\Delta/\partial\hat a_2 = 1$. By applying the delta method (Lehmann and Romano, 2005, p.436), thus
$$\hat\Delta - \Delta \xrightarrow{D} N\left(0, \xi^{2}\right),$$
where
$$\xi^{2} = \begin{pmatrix}-2t & 1\end{pmatrix}\begin{pmatrix}2a_2/np & 4a_3/np\\ 4a_3/np & 4\left(2a_4 + ca_2^{2}\right)/np\end{pmatrix}\begin{pmatrix}-2t\\ 1\end{pmatrix} = \frac{4}{np}\left(2t^{2}a_2 - 4ta_3 + 2a_4 + ca_2^{2}\right).$$
The proof is completed.
Proof of Corollary 2.1. Under $H_0$, $a_2 = t^{2}$, $a_3 = t^{3}$, and $a_4 = t^{4}$. Thus $\xi^{2} = 4ct^{4}/np$. It follows from Theorem 2.3 that the null asymptotic distribution of $T$ is $N(0, 1)$. The proof is completed.