arXiv:1509.09273v1 [math.ST] 30 Sep 2015
Functional central limit theorems
in survey sampling
Hélène Boistard1, Hendrik P. Lopuhaä2, and Anne Ruiz-Gazen3
1Toulouse School of Economics
2Delft University of Technology
3Toulouse School of Economics
October 1, 2015
Abstract
For a joint model-based and design-based inference, we establish functional central limit
theorems for the Horvitz-Thompson empirical process and the Hájek empirical process centered
by their finite population mean as well as by their super-population mean in a survey sampling
framework. The results apply to generic sampling designs and essentially only require conditions
on higher order correlations. We apply our main results to a Hadamard differentiable statistical
functional and illustrate its limit behavior by means of a computer simulation.
1 Introduction
Functional central limit theorems are well established in statistics. Much of the theory has been de-
veloped for empirical processes of independent summands. In combination with the functional delta-
method they have become a very powerful tool for investigating the limit behavior for Hadamard or
Fréchet differentiable statistical functionals (e.g., see [vdVW96] or [vdV98] for a rigorous treatment
with several applications).
In survey sampling, results on functional central limit theorems are far from complete. At the
same time there is a need for such results. For instance, in [Dd08] the limit distribution of several
statistical functionals is investigated, under the assumption that such a limit theorem exists for a
design-based empirical process, whereas in [BD09] the existence of a functional central limit theorem
is assumed, to perform model-based inference on several Gini indices. Weak convergence of processes
in combination with the delta method is treated in [Bha07], [Dav09], and [BM11], but these results
are tailor-made for specific statistical functionals and do not apply to the empirical processes that
are typically considered in survey sampling.
Recently, functional central limit theorems for empirical processes in survey sampling have ap-
peared in the literature. Most of them are concerned with empirical processes indexed by a class
of functions, see [BW07], [SW13], and [BCC14]. However, the results in [BW07] and [SW13]
are restricted to sampling schemes that have exchangeable inclusion indicators and constant inclu-
sion probabilities, such as simple random sampling and Bernoulli sampling, whereas the approach
in [BCC14] seems difficult to extend to sampling designs other than those that are closely related
to Poisson sampling. [Wan12] considers empirical processes indexed by a real-valued parameter.
Unfortunately, this paper seems to miss a number of assumptions that cannot be avoided and, more
importantly, it seems to contain a flaw in the proof (see Section 7 for a more detailed discussion).
The main purpose of the present paper is to establish functional central limit theorems for
the Horvitz-Thompson and the Hájek empirical distribution function that apply to general sam-
pling designs. For design-based inference about finite population parameters, these empirical dis-
tribution functions will be centered around their population mean. On the other hand, in many
situations involving survey data, one is interested in the corresponding model parameters (e.g.,
see [KG98] and [BR09]). Recently, Rubin-Bleuer and Schiopu Kratina [RBSK05] defined a mathe-
matical framework for joint model-based and design-based inference through a probability product-
space and introduced a general and unified methodology for studying the asymptotic properties
of model parameter estimators. To incorporate both types of inferences, we consider the Horvitz-
Thompson empirical process and the Hájek empirical process under the super-population model
described in [RBSK05], both centered around their finite population mean as well as around their
super-population mean. Our main results are functional central limit theorems for both empirical
processes indexed by a real-valued parameter and apply to generic sampling schemes. These results
are established under only the standard assumptions that one encounters in asymptotic
theory in survey sampling. Our approach was inspired by an unpublished manuscript by Philippe
Fevrier and Nicolas Ragache, which was the outcome of an internship at INSEE in 2001.
The article is organized as follows. Notations and assumptions are discussed in Section 2.
In particular we briefly discuss the joint model-based and design-based inference setting defined
in [RBSK05]. In Sections 3 and 4, we list the assumptions and state our main results. Our
assumptions essentially concern the inclusion probabilities of the sampling design up to the fourth
order and a central limit theorem (CLT) for the Horvitz-Thompson estimator of a population total
for i.i.d. bounded random variables. Our results allow random inclusion probabilities and are stated
in terms of the design-based expected sample size, but we also formulate more detailed results in
case these quantities are deterministic.
As an application of our results, in combination with the functional delta-method, we obtain the
limit distribution of the poverty rate in Section 5. This example is further investigated in Section 6
by means of a simulation. Finally, in Section 7 we discuss in detail the differences of our results
with the work by [BW07], [SW13], [Wan12], and [BCC14]. All proofs are deferred to Section 8 and
some tedious technicalities can be found in [BLRG15].
2 Notations and assumptions
We adopt the super-population setup as described in [RBSK05]. Consider a sequence of finite
populations $(U_N)$, of sizes $N = 1, 2, \ldots$. With each population we associate a set of indices
$U_N = \{1, 2, \ldots, N\}$. Furthermore, for each index $i \in U_N$, we have a tuple
$(y_i, z_i) \in \mathbb{R} \times \mathbb{R}_+^q$. We denote $\mathbf{y}_N = (y_1, y_2, \ldots, y_N) \in \mathbb{R}^N$
and $\mathbf{z}_N \in \mathbb{R}_+^{q \times N}$ similarly. The vector $\mathbf{y}_N$ contains the values of the
variable of interest and $\mathbf{z}_N$ contains information for the sampling design. We assume that the values
in each finite population are realizations of random variables $(Y_i, Z_i) \in \mathbb{R} \times \mathbb{R}_+^q$,
for $i = 1, 2, \ldots, N$, on a common probability space $(\Omega, \mathcal{F}, \mathbb{P}_m)$. Similarly, we denote
$\mathbf{Y}_N = (Y_1, Y_2, \ldots, Y_N) \in \mathbb{R}^N$ and $\mathbf{Z}_N \in \mathbb{R}_+^{q \times N}$.

To incorporate the sampling design, a product space is defined as follows. For all
$N = 1, 2, \ldots$, let $\mathcal{S}_N = \{s : s \subset U_N\}$ be the collection of subsets of $U_N$ and let
$\mathcal{A}_N = \sigma(\mathcal{S}_N)$ be the $\sigma$-algebra generated by $\mathcal{S}_N$. A sampling design
associated to some sampling scheme is a function $p : \mathcal{A}_N \times \mathbb{R}_+^{q \times N} \mapsto [0, 1]$,
such that

(i) for all $s \in \mathcal{S}_N$, $\mathbf{z}_N \mapsto p(s, \mathbf{z}_N)$ is a Borel-measurable function on $\mathbb{R}_+^{q \times N}$;

(ii) for all $\mathbf{z}_N \in \mathbb{R}_+^{q \times N}$, $A \mapsto p(A, \mathbf{z}_N)$ is a probability measure on $\mathcal{A}_N$.

Note that for each $\omega \in \Omega$, we can define a probability measure
$A \mapsto \mathbb{P}_d(A, \omega) = \sum_{s \in A} p(s, \mathbf{Z}_N(\omega))$
on the design space $(\mathcal{S}_N, \mathcal{A}_N)$. Corresponding expectations will be denoted by
$\mathbb{E}_d(\cdot, \omega)$. Next, we
define a product probability space that includes the super-population and the design space, under
the premise that sample selection and the model characteristic are independent given the design
variables. Let $(\mathcal{S}_N \times \Omega, \mathcal{A}_N \times \mathcal{F})$ be the product space with probability
measure $\mathbb{P}_{d,m}$ defined on simple rectangles $\{s\} \times E \in \mathcal{A}_N \times \mathcal{F}$ by
\[
\mathbb{P}_{d,m}(\{s\} \times E) = \int_E p(s, \mathbf{Z}_N(\omega)) \, \mathrm{d}\mathbb{P}_m(\omega)
= \int_E \mathbb{P}_d(\{s\}, \omega) \, \mathrm{d}\mathbb{P}_m(\omega).
\]
When taking expectations or computing probabilities, we will emphasize whether this is with respect
to the measure $\mathbb{P}_{d,m}$ associated with the product space $(\mathcal{S}_N \times \Omega, \mathcal{A}_N \times \mathcal{F})$,
the measure $\mathbb{P}_d$ associated with the design space $(\mathcal{S}_N, \mathcal{A}_N)$, or the measure
$\mathbb{P}_m$ associated with the super-population space $(\Omega, \mathcal{F})$.
If $n_s$ denotes the size of sample $s$, then this may depend on the specific sampling design, including
the values of the design variables $Z_1(\omega), \ldots, Z_N(\omega)$. Similarly, the inclusion probabilities may
depend on the values of the design variables:
$\pi_i(\omega) = \mathbb{E}_d(\xi_i, \omega) = \sum_{s \ni i} p(s, \mathbf{Z}_N(\omega))$,
where $\xi_i$ is the indicator $1_{\{s \ni i\}}$. Instead of $n_s$, we will consider
$n = \mathbb{E}_d[n_s(\omega)] = \sum_{i=1}^N \mathbb{E}_d(\xi_i, \omega) = \sum_{i=1}^N \pi_i(\omega)$. This
means that the inclusion probabilities and the design-based expected sample size may be random
variables on $(\Omega, \mathcal{F}, \mathbb{P}_m)$.
We first consider the Horvitz-Thompson (HT) empirical processes, obtained from the HT empirical c.d.f.:
\[
F_N^{HT}(t) = \frac{1}{N} \sum_{i=1}^N \frac{\xi_i 1_{\{Y_i \le t\}}}{\pi_i}, \quad t \in \mathbb{R}. \tag{2.1}
\]
We will consider the HT empirical process $\sqrt{n}(F_N^{HT} - F_N)$, obtained by centering around the empirical
c.d.f. $F_N$ of $Y_1, \ldots, Y_N$, as well as the HT empirical process $\sqrt{n}(F_N^{HT} - F)$, obtained by centering
around the c.d.f. $F$ of the $Y_i$'s. A functional central limit theorem for both processes will be
formulated in Section 3. In addition, we will consider the Hájek empirical c.d.f.:
\[
F_N^{HJ}(t) = \frac{1}{\widehat{N}} \sum_{i=1}^N \frac{\xi_i 1_{\{Y_i \le t\}}}{\pi_i}, \quad t \in \mathbb{R}, \tag{2.2}
\]
where $\widehat{N} = \sum_{i=1}^N \xi_i / \pi_i$ is the HT estimator for the population total $N$. Functional central limit
theorems for $\sqrt{n}(F_N^{HJ} - F_N)$ and $\sqrt{n}(F_N^{HJ} - F)$ will be provided in Section 4. The advantage of our
results is that they allow general sampling schemes and that we primarily require bounds on the
rate at which higher order correlations tend to zero $\omega$-almost surely, under the design measure $\mathbb{P}_d$.
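To make the two estimators concrete, here is a small numerical sketch of (2.1) and (2.2); this is our illustration, not part of the paper, and the function name `ht_hajek_ecdf` is ours. It assumes NumPy.

```python
import numpy as np

def ht_hajek_ecdf(y, xi, pi, t):
    """Horvitz-Thompson and Hajek empirical c.d.f.'s at threshold t,
    following (2.1) and (2.2): sampled units (xi == 1) are weighted
    by their inverse inclusion probabilities."""
    w = xi / pi                        # design weights xi_i / pi_i
    N = len(y)
    N_hat = w.sum()                    # HT estimator of the population size N
    ht = (w * (y <= t)).sum() / N      # (2.1): normalize by the true N
    hj = (w * (y <= t)).sum() / N_hat  # (2.2): normalize by the estimate
    return ht, hj

# toy check: a full census (all units sampled with pi_i = 1), where
# both estimators reduce to the ordinary empirical c.d.f.
y = np.array([0.2, 0.5, 1.0, 2.0])
xi = np.ones(4)
pi = np.ones(4)
ht, hj = ht_hajek_ecdf(y, xi, pi, 0.6)  # 2 of the 4 values are <= 0.6
```

The only difference between the two estimators is the normalization: $N$ for Horvitz-Thompson versus the estimated $\widehat{N}$ for Hájek.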
3 FCLT’s for the Horvitz-Thompson empirical processes
A functional central limit theorem for $\sqrt{n}(F_N^{HT} - F_N)$ and $\sqrt{n}(F_N^{HT} - F)$ is obtained by proving weak
convergence of all finite dimensional distributions and tightness. In order to establish the latter for
general sampling schemes, we impose a number of conditions that involve the sets
\[
D_{\nu,N} = \left\{ (i_1, i_2, \ldots, i_\nu) \in \{1, 2, \ldots, N\}^\nu : i_1, i_2, \ldots, i_\nu \text{ all different} \right\}, \tag{3.1}
\]
for the integers $1 \le \nu \le 4$. We assume the following conditions:
(C1) there exist constants $K_1, K_2$, such that for all $i = 1, 2, \ldots, N$,
\[
0 < K_1 \le \frac{N \pi_i}{n} \le K_2 < \infty, \quad \omega\text{-a.s.}
\]
There exists a constant $K_3 > 0$, such that for all $N = 1, 2, \ldots$:

(C2) $\max_{(i,j) \in D_{2,N}} \left| \mathbb{E}_d(\xi_i - \pi_i)(\xi_j - \pi_j) \right| < K_3 \, n / N^2$,

(C3) $\max_{(i,j,k) \in D_{3,N}} \left| \mathbb{E}_d(\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k) \right| < K_3 \, n^2 / N^3$,

(C4) $\max_{(i,j,k,l) \in D_{4,N}} \left| \mathbb{E}_d(\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)(\xi_l - \pi_l) \right| < K_3 \, n^2 / N^4$,

$\omega$-almost surely. These conditions on higher order correlations are commonly used in the literature
on survey sampling in order to derive asymptotic properties of estimators (e.g., see [BO00]
and [CCGL10]). [BO00] proved that they hold for simple random sampling without replacement and
stratified simple random sampling without replacement, whereas [BLRG12] proved that they also hold
for rejective sampling. Lemma 2 from [BLRG12] allows us to reformulate the above conditions
on higher order correlations into conditions on higher order inclusion probabilities.
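As an illustration of (C2), added here by us and not part of the paper, the pairwise covariances under simple random sampling without replacement can be computed exactly from $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$, and the bound holds with $K_3 = 1$; for Poisson sampling the indicators are independent, so the covariance for $i \ne j$ is exactly zero.

```python
from fractions import Fraction

def srswor_pair_cov(N, n):
    """Exact E_d(xi_i - pi_i)(xi_j - pi_j) for i != j under simple random
    sampling without replacement: pi_ij - pi_i * pi_j."""
    pi = Fraction(n, N)
    pi_ij = Fraction(n * (n - 1), N * (N - 1))
    return pi_ij - pi * pi

# (C2) asks |E_d(xi_i - pi_i)(xi_j - pi_j)| < K3 * n / N^2; K3 = 1 suffices here
for N, n in [(100, 10), (1000, 50), (10, 9)]:
    cov = srswor_pair_cov(N, n)
    assert abs(cov) <= Fraction(n, N * N)
```

Exact rational arithmetic (`fractions.Fraction`) avoids any floating-point slack in the comparison.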
To establish the convergence of finite dimensional distributions, for sequences of bounded i.i.d. random
variables $V_1, V_2, \ldots$ on $(\Omega, \mathcal{F}, \mathbb{P}_m)$, we will need a CLT for the HT estimator in the design space,
conditionally on the $V_i$'s. To this end, let $S_N^2$ be the (design-based) variance of the HT estimator
of the population mean, i.e.,
\[
S_N^2 = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} V_i V_j. \tag{3.2}
\]
We assume that

(HT1) For $N$ sufficiently large, $S_N > 0$ and for any sequence of bounded i.i.d. random variables
$V_1, V_2, \ldots$,
\[
\frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right) \to N(0, 1), \quad \omega\text{-a.s.},
\]
in distribution under $\mathbb{P}_d$.

Note that (HT1) holds for simple random sampling without replacement if $n(N - n)/N$ tends to
infinity when $N$ tends to infinity (see [Tho97]), as well as for Poisson sampling under some conditions
on the first order inclusion probabilities (e.g., see [Ful09]). For rejective sampling, [Háj64] gives some
sufficient conditions for (HT1) to hold.
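As a sanity check of our reading of (3.2), not part of the paper: for simple random sampling without replacement the double sum reduces to the classical without-replacement variance $(1 - n/N)\, S_V^2 / n$, with $S_V^2$ the population variance using the $(N-1)$-divisor. A brute-force evaluation confirms this.

```python
import numpy as np

def sn2_design_variance(V, N, n):
    """S_N^2 from (3.2), evaluated by brute force for simple random sampling
    without replacement, where pi_i = n/N and pi_ij = n(n-1)/(N(N-1))."""
    pi = n / N
    pij = n * (n - 1) / (N * (N - 1))
    total = 0.0
    for i in range(N):
        for j in range(N):
            if i == j:
                # diagonal: pi_ii = pi_i, so the weight is (pi - pi^2)/pi^2
                total += (pi - pi * pi) / (pi * pi) * V[i] * V[j]
            else:
                total += (pij - pi * pi) / (pi * pi) * V[i] * V[j]
    return total / N**2

rng = np.random.default_rng(0)
N, n = 40, 10
V = rng.uniform(size=N)
# classical closed form: (1 - n/N) * S_V^2 / n
closed = (1 - n / N) * V.var(ddof=1) / n
assert abs(sn2_design_variance(V, N, n) - closed) < 1e-12
```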
We also need that $n S_N^2$ converges for the particular case where the $V_i$'s are random vectors
consisting of indicators $1_{\{Y_j \le t\}}$.

(HT2) For $k \in \{1, 2, \ldots\}$, $i = 1, 2, \ldots, k$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, define
$\mathbf{Y}_{ik}^t = \left( 1_{\{Y_i \le t_1\}}, \ldots, 1_{\{Y_i \le t_k\}} \right)$.
There exists a deterministic matrix $\Sigma_k^{HT}$, such that
\[
\lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{Y}_{ik} \mathbf{Y}_{jk}^t = \Sigma_k^{HT}, \quad \omega\text{-a.s.} \tag{3.3}
\]
This kind of assumption is quite standard in the literature on survey sampling and is usually
imposed for general random vectors (see, for example, [DS92], p. 379, [FF91], condition 3 on page 457,
or [KR81], condition C4 on page 1014). It suffices to require (3.3) for
$\mathbf{Y}_{ik}^t = \left( 1_{\{Y_i \le t_1\}}, \ldots, 1_{\{Y_i \le t_k\}} \right)$.
Moreover, if (C1)-(C2) hold, then the sequence in (3.3) is bounded, so that by dominated convergence
it follows that
\[
\Sigma_k^{HT} = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{Y}_{ik} \mathbf{Y}_{jk}^t \right]. \tag{3.4}
\]
This might help to get a more tractable expression for $\Sigma_k^{HT}$.
We are now able to formulate our first main result. Let $D(\mathbb{R})$ be the space of càdlàg functions
on $\mathbb{R}$ equipped with the Skorohod topology.

Theorem 3.1. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $F_N$,
and let $F_N^{HT}$ be defined in (2.1). Suppose that conditions (C1)-(C4) and (HT1)-(HT2) hold. Then
$\sqrt{n}(F_N^{HT} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance
function
\[
\mathbb{E}_m \mathbb{G}^{HT}(s) \mathbb{G}^{HT}(t) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} 1_{\{Y_i \le s\}} 1_{\{Y_j \le t\}} \right],
\]
for $s, t \in \mathbb{R}$.
Note that Theorem 3.1 allows a random (design-based) expected sample size $n$ and random
inclusion probabilities. However, the expression of the covariance function of the limiting Gaussian
process is somewhat unsatisfactory. When $n$ and the inclusion probabilities are deterministic, we can
obtain a functional CLT with a more precise expression for $\mathbb{E}_m \mathbb{G}^{HT}(s)\mathbb{G}^{HT}(t)$ under slightly weaker
conditions. This is formulated in the proposition below. Note that by imposing conditions (i)-(ii)
in Proposition 3.1 instead of (3.3), convergence of $n S_N^2$ is not necessarily guaranteed. However, this
is established in Lemma 9.1 in [BLRG15] under (C1) and (C2).

Finally, we would like to emphasize that if we had imposed (HT2) for any sequence $Y_1, Y_2, \ldots$
of bounded random vectors, then (HT2) would have implied conditions (i)-(ii) in the deterministic
setup of Proposition 3.1.
Proposition 3.1. Consider the setting of Theorem 3.1, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose that (C1)-(C4) and (HT1) hold, but instead of (HT2) assume that there
exist constants $\mu_{\pi 1}, \mu_{\pi 2} \in \mathbb{R}$ such that
\[
\text{(i)} \quad \lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \left( \frac{1}{\pi_i} - 1 \right) = \mu_{\pi 1},
\]
\[
\text{(ii)} \quad \lim_{N \to \infty} \frac{n}{N^2} \mathop{\sum\sum}_{i \ne j} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} = \mu_{\pi 2}.
\]
Then $\sqrt{n}(F_N^{HT} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance
function $\mu_{\pi 1} F(s \wedge t) + \mu_{\pi 2} F(s) F(t)$, for $s, t \in \mathbb{R}$.
When $n/N \to \lambda \in [0, 1]$, conditions (i)-(ii) hold with $\mu_{\pi 1} = 1 - \lambda$ and $\mu_{\pi 2} = \lambda - 1$ for
simple random sampling without replacement. For Poisson sampling, (ii) holds trivially because
the trials are independent. For rejective sampling, (i)-(ii), together with $n/N \to \lambda \in [0, 1]$, can be
deduced from the associated Poisson sampling design. Indeed, suppose that (i) holds for Poisson
sampling with first order inclusion probabilities $p_1, \ldots, p_N$, such that $\sum_{i=1}^N p_i = n$. Then, from
Theorem 1 in [BLRG12] it follows that if $d = \sum_{i=1}^N p_i (1 - p_i)$ tends to infinity, assumption (i) holds
for rejective sampling. Furthermore, if $n/N \to \lambda \in [0, 1]$ and $N/d$ has a finite limit, then (ii) also
holds for rejective sampling.
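The values $\mu_{\pi 1} = 1 - \lambda$ and $\mu_{\pi 2} = \lambda - 1$ for simple random sampling without replacement can be verified by direct computation; in fact the finite-$N$ expressions are already exact. A small exact-arithmetic check (ours, not part of the paper):

```python
from fractions import Fraction

def mu_pi_srswor(N, n):
    """Finite-N versions of the limits (i) and (ii) in Proposition 3.1
    for simple random sampling without replacement."""
    pi = Fraction(n, N)
    pij = Fraction(n * (n - 1), N * (N - 1))
    mu1 = Fraction(n, N * N) * N * (1 / pi - 1)
    # N(N-1) ordered pairs i != j, each contributing the same ratio
    mu2 = Fraction(n, N * N) * N * (N - 1) * (pij - pi * pi) / (pi * pi)
    return mu1, mu2

# with lam = n/N, the finite-N values equal 1 - lam and lam - 1 exactly
for N, n in [(1000, 100), (10_000, 500)]:
    mu1, mu2 = mu_pi_srswor(N, n)
    lam = Fraction(n, N)
    assert mu1 == 1 - lam and mu2 == lam - 1
```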
Weak convergence of the process $\sqrt{n}(F_N^{HT} - F)$, where we center with $F$ instead of $F_N$, requires
a CLT in the super-population space for
\[
\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right), \quad \text{where } \mu_V = \mathbb{E}_m(V_i), \tag{3.5}
\]
for sequences of bounded i.i.d. random variables $V_1, V_2, \ldots$ on $(\Omega, \mathcal{F}, \mathbb{P}_m)$. Our approach to establish
asymptotic normality of (3.5) is then to decompose as follows:
\[
\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right)
= \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right)
+ \frac{\sqrt{n}}{\sqrt{N}} \times \sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right). \tag{3.6}
\]
Since the $V_i$'s are i.i.d. and bounded, for the second term on the right hand side, by the traditional
CLT we immediately obtain
\[
\sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right) \to N(0, \sigma_V^2), \tag{3.7}
\]
in distribution under $\mathbb{P}_m$, where $\sigma_V^2$ denotes the variance of the $V_i$'s, whereas the first term on the
right hand side can be handled with (HT1). [BW07] and [SW13] use a decomposition similar to
the one in (3.6). Their approach assumes exchangeable $\xi_i$'s and equal inclusion probabilities $n/N$,
which allows the use of results on the exchangeable weighted bootstrap to handle the first term on the
right hand side of (3.6). Instead, we only require conditions (C2)-(C4) on higher order correlations
for the $\xi_i$'s and allow the $\pi_i$'s to vary within certain bounds as described in (C1). To combine the
two separate limits in (3.7) and (HT1), we will need

(HT3) $n/N \to \lambda \in [0, 1]$, $\omega$-a.s.

We will then use Theorem 5.1(iii) from [RBSK05]. The finite dimensional projections of the processes
involved turn out to be related to a particular HT estimator. In order to have the corresponding
design-based variance converging to a strictly positive constant, we need the following condition.

(HT4) For all $k \in \{1, 2, \ldots\}$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, the matrix $\Sigma_k^{HT}$ in (3.3) is positive definite.
We are now able to formulate our second main result.

Theorem 3.2. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and let $F_N^{HT}$ be defined in (2.1).
Suppose that conditions (C1)-(C4) and (HT1)-(HT4) hold. Then $\sqrt{n}(F_N^{HT} - F)$ converges weakly in
$D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}_F^{HT}$ with covariance function $\mathbb{E}_{d,m} \mathbb{G}_F^{HT}(s) \mathbb{G}_F^{HT}(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} 1_{\{Y_i \le s\}} 1_{\{Y_j \le t\}} \right]
+ \lambda \left( F(s \wedge t) - F(s) F(t) \right),
\]
for $s, t \in \mathbb{R}$.
Theorem 3.2 allows random $n$ and inclusion probabilities. As before, when the sample size $n$ and
inclusion probabilities are deterministic, we can obtain a functional CLT under a simpler condition
than (HT4) and with a more detailed description of the covariance function of the limiting process.

Proposition 3.2. Consider the setting of Theorem 3.2, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose that (C1)-(C4), (HT1) and (HT3) hold, but instead of (HT2) and (HT4)
assume that there exist constants $\mu_{\pi 1}, \mu_{\pi 2} \in \mathbb{R}$ such that
\[
\text{(i)} \quad \lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \left( \frac{1}{\pi_i} - 1 \right) = \mu_{\pi 1} > 0,
\]
\[
\text{(ii)} \quad \lim_{N \to \infty} \frac{n}{N^2} \mathop{\sum\sum}_{i \ne j} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} = \mu_{\pi 2}.
\]
Then $\sqrt{n}(F_N^{HT} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance
function $(\mu_{\pi 1} + \lambda) F(s \wedge t) + (\mu_{\pi 2} - \lambda) F(s) F(t)$, for $s, t \in \mathbb{R}$.
Since $1/\pi_i \ge 1$, we will always have $\mu_{\pi 1} \ge 0$ in condition (i) of Proposition 3.2. This means that
(i) is not very restrictive. For simple random sampling without replacement, condition (i) requires
$\lambda$ to be strictly smaller than one.
4 FCLT's for the Hájek empirical processes

To determine the behavior of the process $\sqrt{n}(F_N^{HJ} - F_N)$, it is useful to relate this process to the
process
\[
\mathbb{G}_N^\pi(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} \left( 1_{\{Y_i \le t\}} - F(t) \right). \tag{4.1}
\]
We can then write
\[
\sqrt{n} \left( F_N^{HJ}(t) - F_N(t) \right) = \mathbb{Y}_N(t) + \left( \frac{N}{\widehat{N}} - 1 \right) \mathbb{G}_N^\pi(t), \tag{4.2}
\]
where
\[
\mathbb{Y}_N(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} - 1 \right) \left( 1_{\{Y_i \le t\}} - F(t) \right). \tag{4.3}
\]
As intermediate results, we will first show that the process $\mathbb{G}_N^\pi$ converges weakly to a mean zero
Gaussian process and that $\widehat{N}/N \to 1$ in probability. As a consequence, the limiting behavior of
$\sqrt{n}(F_N^{HJ} - F_N)$ will be the same as that of $\mathbb{Y}_N$, which is an easier process to handle. Instead of
(HT2) and (HT4) we now need

(HJ2) For $k \in \{1, 2, \ldots\}$, $i = 1, 2, \ldots, k$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, define
\[
\widetilde{\mathbf{Y}}_{ik}^t = \left( 1_{\{Y_i \le t_1\}} - F(t_1), \ldots, 1_{\{Y_i \le t_k\}} - F(t_k) \right).
\]
There exists a deterministic matrix $\Sigma_k^{HJ}$, such that
\[
\lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\widetilde{\mathbf{Y}}_{ik} \widetilde{\mathbf{Y}}_{jk}^t = \Sigma_k^{HJ}, \quad \omega\text{-a.s.} \tag{4.4}
\]
and

(HJ4) For all $k \in \{1, 2, \ldots\}$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, the matrix $\Sigma_k^{HJ}$ in (4.4) is positive definite.
As in the case of (3.4), if (C1)-(C2) hold, then (HJ2) implies
\[
\Sigma_k^{HJ} = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \widetilde{\mathbf{Y}}_{ik} \widetilde{\mathbf{Y}}_{jk}^t \right]. \tag{4.5}
\]
Theorem 4.1. Let $\mathbb{G}_N^\pi$ be defined in (4.1) and let $\widehat{N} = \sum_{i=1}^N \xi_i / \pi_i$. Suppose $n \to \infty$, $\omega$-a.s., and
that there exists $\sigma_\pi^2 \ge 0$, such that
\[
\frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \to \sigma_\pi^2, \quad \omega\text{-a.s.} \tag{4.6}
\]
If in addition,

(i) (HT1) holds, then $\widehat{N}/N \to 1$ in $\mathbb{P}_{d,m}$-probability.

(ii) (C1)-(C4), (HT1), (HT3), (HJ2) and (HJ4) hold, then $\mathbb{G}_N^\pi$ converges weakly in $D(\mathbb{R})$ to a
mean zero Gaussian process $\mathbb{G}^\pi$ with covariance function $\mathbb{E}_{d,m} \mathbb{G}^\pi(s) \mathbb{G}^\pi(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\left( 1_{\{Y_i \le s\}} - F(s) \right) \left( 1_{\{Y_j \le t\}} - F(t) \right) \right]
+ \lambda \left( F(s \wedge t) - F(s) F(t) \right), \quad s, t \in \mathbb{R}.
\]
Note that, in view of condition (HT3), the condition $n \to \infty$ is immediate if $\lambda > 0$. We proceed
by establishing weak convergence of $\sqrt{n}(F_N^{HJ} - F_N)$.

Theorem 4.2. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $F_N$, and
let $F_N^{HJ}$ be defined in (2.2). Suppose $n \to \infty$, $\omega$-a.s., and that (C1)-(C4), (HT1), (HT3), and (HJ2)
hold, as well as condition (4.6). Then $\sqrt{n}(F_N^{HJ} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero
Gaussian process $\mathbb{G}^{HJ}$ with covariance function $\mathbb{E}_{d,m} \mathbb{G}^{HJ}(s) \mathbb{G}^{HJ}(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\left( 1_{\{Y_i \le s\}} - F(s) \right) \left( 1_{\{Y_j \le t\}} - F(t) \right) \right],
\]
for $s, t \in \mathbb{R}$.

Note that we do not need condition (HJ4) in Theorem 4.2. This condition is only needed in
Theorem 4.1 to establish the limit distribution of the finite dimensional projections of the process
$\mathbb{G}_N^\pi$. For Theorem 4.2 we only need that $\mathbb{G}_N^\pi$ is tight.
As before, below we obtain a functional CLT for $\sqrt{n}(F_N^{HJ} - F_N)$ in the case that $n$ and the
inclusion probabilities are deterministic. Similar to the remark we made after Theorem 3.1, note
that if we had imposed (HJ2) for any sequence of bounded random vectors, then this would
imply conditions (i)-(ii) of Proposition 3.1, which can then be left out in Theorem 4.1.

Proposition 4.1. Consider the setting of Theorem 4.2, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose $n \to \infty$ and that (C1)-(C4), (HT1) and (HT3) hold, as well as
conditions (i)-(ii) from Proposition 3.1. Then $\sqrt{n}(F_N^{HJ} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero
Gaussian process $\mathbb{G}^{HJ}$ with covariance function $\mu_{\pi 1} \left( F(s \wedge t) - F(s) F(t) \right)$, for $s, t \in \mathbb{R}$.
Finally, we consider $\sqrt{n}(F_N^{HJ} - F)$. Again, we relate this process to (4.1) and write
\[
\sqrt{n} \left( F_N^{HJ}(t) - F(t) \right) = \frac{N}{\widehat{N}} \, \mathbb{G}_N^\pi(t). \tag{4.7}
\]
Since $\widehat{N}/N \to 1$ in probability, this implies that $\sqrt{n}(F_N^{HJ} - F)$ has the same limiting behavior as
$\mathbb{G}_N^\pi$.

Theorem 4.3. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and let $F_N^{HJ}$ be defined in (2.2).
Suppose $n \to \infty$, $\omega$-a.s., and that (C1)-(C4), (HT1), (HT3), (HJ2) and (HJ4) hold, as well as
condition (4.6). Then $\sqrt{n}(F_N^{HJ} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}_F^{HJ}$
with covariance function $\mathbb{E}_{d,m} \mathbb{G}^\pi(s) \mathbb{G}^\pi(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\left( 1_{\{Y_i \le s\}} - F(s) \right) \left( 1_{\{Y_j \le t\}} - F(t) \right) \right]
+ \lambda \left( F(s \wedge t) - F(s) F(t) \right), \quad s, t \in \mathbb{R}.
\]
With Theorem 4.3 we recover Theorem 1 in [Wan12]. Our assumptions are comparable to those
in [Wan12], although this paper seems to miss a condition on the convergence of the variance, such
as our condition (HJ2).

We conclude this section by establishing a functional CLT for $\sqrt{n}(F_N^{HJ} - F)$ in the case of
deterministic $n$ and inclusion probabilities.

Proposition 4.2. Consider the setting of Theorem 4.3, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose $n \to \infty$ and that (C1)-(C4), (HT1) and (HT3) hold, as well as
conditions (i)-(ii) from Proposition 3.2. Then $\sqrt{n}(F_N^{HJ} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero
Gaussian process $\mathbb{G}^{HJ}$ with covariance function $(\mu_{\pi 1} + \lambda) \left( F(s \wedge t) - F(s) F(t) \right)$, for $s, t \in \mathbb{R}$.
5 Hadamard-differentiable functionals

Theorem 4.3 provides an elegant means to study the limit behavior of estimators that can be
described as $\phi(F_N^{HJ})$, where $\phi$ is a Hadamard-differentiable functional. Given such a $\phi$, the functional
delta-method (e.g., see Theorems 3.9.4 and 3.9.5 in [vdVW96] or Theorem 20.8 in [vdV98]) enables
one to establish the limit distribution of $\phi(F_N^{HJ})$. Similarly, this holds for Theorems 3.1, 3.2, and 4.2,
or Propositions 3.1, 3.2, 4.1, and 4.2 in the special case of deterministic $n$ and inclusion probabilities.

We illustrate this by discussing the poverty rate. This indicator has recently been revisited
by [GT14] and [OAB15]. This example has also been discussed by [Dd08], but under the assumption
of weak convergence of $\sqrt{n}(F_N^{HJ} - F_N)$ to some centered continuous Gaussian process. Note that
this assumption is now covered by our Theorem 4.2 and Proposition 4.1. Let $D_\phi \subset D(\mathbb{R})$ consist of
$F \in D(\mathbb{R})$ that are non-decreasing. Then for $F \in D_\phi$, the poverty rate is defined as
\[
\phi(F) = F\left( \beta F^{-1}(\alpha) \right) \tag{5.1}
\]
for fixed $0 < \alpha, \beta < 1$, where $F^{-1}(\alpha) = \inf\{t : F(t) \ge \alpha\}$. Typical choices are $\alpha = 0.5$ and $\beta = 0.5$
(INSEE) or $\beta = 0.6$ (EUROSTAT). Its Hadamard derivative is given by
\[
\phi_F'(h) = -\frac{\beta f\left( \beta F^{-1}(\alpha) \right)}{f\left( F^{-1}(\alpha) \right)} \, h\left( F^{-1}(\alpha) \right) + h\left( \beta F^{-1}(\alpha) \right). \tag{5.2}
\]
See [BLRG15] for details.
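For a unit-rate exponential distribution, which is the model used later in the simulation study, (5.1) has a closed form; a small sketch (ours, not from the paper):

```python
import math

def poverty_rate_exponential(alpha, beta):
    """phi(F) = F(beta * F^{-1}(alpha)) for a unit-rate exponential,
    where F(t) = 1 - exp(-t) and F^{-1}(a) = -log(1 - a)."""
    q = -math.log(1.0 - alpha)         # F^{-1}(alpha)
    return 1.0 - math.exp(-beta * q)   # equals 1 - (1 - alpha)**beta

# EUROSTAT-style choice alpha = 0.5, beta = 0.6 gives phi(F) close to 0.34,
# the value used in the simulation study of Section 6
assert abs(poverty_rate_exponential(0.5, 0.6) - (1 - 0.5**0.6)) < 1e-12
```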
We then have the following corollaries for the Horvitz-Thompson estimator $\phi(F_N^{HT})$ and the
Hájek estimator $\phi(F_N^{HJ})$ for the poverty rate $\phi(F)$.

Corollary 5.1. Let $\phi$ be defined by (5.1) and suppose that the conditions of Proposition 3.2 hold.
Then, if $F$ is differentiable at $F^{-1}(\alpha)$, the random variable $\sqrt{n}(\phi(F_N^{HT}) - \phi(F))$ converges in
distribution to a mean zero normal random variable with variance
\[
\sigma_{HT,\alpha,\beta}^2
= \frac{\beta^2 f\left( \beta F^{-1}(\alpha) \right)^2}{f\left( F^{-1}(\alpha) \right)^2}
\left( \gamma_{\pi 1} \alpha + \gamma_{\pi 2} \alpha^2 \right)
+ \gamma_{\pi 1} \phi(F) + \gamma_{\pi 2} \phi(F)^2
- \frac{2 \beta f\left( \beta F^{-1}(\alpha) \right)}{f\left( F^{-1}(\alpha) \right)}
\phi(F) \left( \gamma_{\pi 1} + \gamma_{\pi 2} \alpha \right), \tag{5.3}
\]
where $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda$ and $\gamma_{\pi 2} = \mu_{\pi 2} - \lambda$. If in addition $n/N \to 0$, then $\sqrt{n}(\phi(F_N^{HT}) - \phi(F_N))$
converges in distribution to a mean zero normal random variable with variance $\sigma_{HT,\alpha,\beta}^2$.
Corollary 5.2. Let $\phi$ be defined by (5.1) and suppose that the conditions of Proposition 4.2
hold. Then, if $F$ is differentiable at $F^{-1}(\alpha)$, the random variable $\sqrt{n}(\phi(F_N^{HJ}) - \phi(F))$ converges in
distribution to a mean zero normal random variable with variance
\[
\sigma_{HJ,\alpha,\beta}^2
= \frac{\beta^2 f\left( \beta F^{-1}(\alpha) \right)^2}{f\left( F^{-1}(\alpha) \right)^2}
\, \gamma_{\pi 1} \alpha (1 - \alpha)
+ \gamma_{\pi 1} \phi(F) \left( 1 - \phi(F) \right)
- \frac{2 \beta f\left( \beta F^{-1}(\alpha) \right)}{f\left( F^{-1}(\alpha) \right)}
\phi(F) \gamma_{\pi 1} (1 - \alpha), \tag{5.4}
\]
where $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda$. If in addition $n/N \to 0$, then $\sqrt{n}(\phi(F_N^{HJ}) - \phi(F_N))$ converges in distribution
to a mean zero normal random variable with variance $\sigma_{HJ,\alpha,\beta}^2$.
6 Simulation study

The objective of this simulation study is to investigate the performance of the Horvitz-Thompson
(HT) and the Hájek (HJ) estimators for the poverty rate, as defined in (5.1), at the finite population
level and at the super-population level. The asymptotic results from Corollaries 5.1 and 5.2 are used
to obtain variance estimators, whose performance is also assessed in this small study.

Six simulation schemes are implemented with different population sizes and (design-based) expected
sample sizes, namely $N = 10\,000$ and $1000$, and $n = 500$, $100$, and $50$. The samples are drawn
according to three different sampling designs. The first one is simple random sampling without
replacement (SI) with size $n$. The second design is Bernoulli sampling (BE) with parameter $n/N$.
The third one is Poisson sampling (PO) with first order inclusion probabilities equal to $0.4\,n/N$ for
the first half of the population and equal to $1.6\,n/N$ for the other half, where
the population is randomly ordered. The first order inclusion probabilities are deterministic for the
three designs, and the sample size $n_s$ is fixed for the SI design, while it is random with respect to the
design for the BE and PO designs. Moreover, the SI and BE designs are equal probability designs,
while PO is an unequal probability design. The results are obtained by replicating $N_R = 1000$
populations. For each population, $n_R = 1000$ samples are drawn according to the different designs.
The variable of interest $Y$ is generated for each population according to an exponential distribution
with rate parameter equal to one. For this distribution and given $\alpha$ and $\beta$, the poverty rate has
the explicit expression $\phi(F) = 1 - \exp(\beta \ln(1 - \alpha))$. In what follows, $\alpha = 0.5$ and $\beta = 0.6$, so that
$\phi(F) \approx 0.34$. These are the same values for $\alpha$ and $\beta$ as considered in [Dd08].
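A single replication of the PO scheme can be sketched as follows. This is our simplified stand-in, not the paper's code: the paper uses wtd.quantile from the R package Hmisc, and the helper `hajek_poverty_rate` below is only a rough weighted-quantile substitute.

```python
import numpy as np

rng = np.random.default_rng(1)

def hajek_poverty_rate(y, xi, pi, alpha=0.5, beta=0.6):
    """Hajek plug-in estimate of phi(F) = F(beta * F^{-1}(alpha)): a weighted
    alpha-quantile followed by a weighted c.d.f. evaluation at beta times it."""
    w = (xi / pi)[xi == 1]
    ys = y[xi == 1]
    order = np.argsort(ys)
    ys, w = ys[order], w[order]
    cdf = np.cumsum(w) / w.sum()           # Hajek empirical c.d.f. at sampled points
    q = ys[np.searchsorted(cdf, alpha)]    # crude weighted alpha-quantile
    return (w * (ys <= beta * q)).sum() / w.sum()

# one replication: N = 1000, n = 100, pi_i in {0.4 n/N, 1.6 n/N} as in the text
N, n = 1000, 100
y = rng.exponential(size=N)
pi = np.r_[np.full(N // 2, 0.4 * n / N), np.full(N - N // 2, 1.6 * n / N)]
xi = (rng.uniform(size=N) < pi).astype(float)  # independent Poisson sampling draws
est = hajek_poverty_rate(y, xi, pi)
# est is a rough estimate of phi(F) = 1 - 0.5**0.6, roughly 0.34
```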
The Horvitz-Thompson estimator and Hájek estimator for $\phi(F)$ or $\phi(F_N)$ are denoted by $\widehat{\phi}_{HT}$
and $\widehat{\phi}_{HJ}$, respectively. They are obtained by plugging in the empirical c.d.f.'s $F_N^{HT}$ and $F_N^{HJ}$,
respectively, for $F$ in expression (5.1). The empirical quantiles are calculated by using the function
wtd.quantile from the R package Hmisc for the Hájek estimator and by adapting this function for
the Horvitz-Thompson estimator. For the SI sampling design, the two estimators are the same.

The performance of the estimators for the parameters $\phi(F)$ and $\phi(F_N)$ is evaluated using the
Monte Carlo relative bias (RB), reported in Table 1. When estimating the super-population
parameter $\phi(F)$, if $\widehat{\phi}_{ij}$ denotes the estimate (either $\widehat{\phi}_{HT}$ or $\widehat{\phi}_{HJ}$) for the $i$th generated population
and the $j$th drawn sample, the Monte Carlo relative bias of $\widehat{\phi}$ in percentages has the following
expression:
\[
\mathrm{RB}_F(\widehat{\phi}) = \frac{100}{N_R n_R} \sum_{i=1}^{N_R} \sum_{j=1}^{n_R} \frac{\widehat{\phi}_{ij} - \phi(F)}{\phi(F)}.
\]
When estimating the finite population parameter $\phi(F_N)$, the parameter depends on the generated
population and, for the $i$th population, will be denoted by $\phi(F_{N_i})$. The Monte Carlo relative
bias of $\widehat{\phi}$ is then computed by replacing $F$ by $F_{N_i}$ in the above expression. Concerning the relative
biases reported in Table 1, the values are small and never exceed 3% in absolute value. As expected,
these values increase when $n$ decreases. When the centering is relative to $\phi(F_N)$, the relative bias is in general
Table 1: RB (in %) of the HT and the HJ estimators for the finite population $\phi(F_N)$ and the
super-population $\phi(F)$ poverty rate parameter

                            N = 10 000                 N = 1000
                     n=500   n=100   n=50       n=500   n=100   n=50
SI   HT-HJ  φ(F_N)   -0.17   -0.89   -1.82      -0.05   -0.84   -1.62
            φ(F)     -0.20   -0.91   -1.86      -0.18   -0.72   -1.85
BE   HT     φ(F_N)   -0.12   -0.66   -1.29       0.01   -0.65   -1.12
            φ(F)     -0.15   -0.68   -1.34      -0.12   -0.54   -1.36
     HJ     φ(F_N)   -0.17   -0.92   -1.87      -0.04   -0.88   -1.68
            φ(F)     -0.20   -0.93   -1.92      -0.17   -0.76   -1.91
PO   HT     φ(F_N)   -0.05   -1.05   -2.06      -0.06   -0.30   -0.37
            φ(F)     -0.08   -1.07   -2.11      -0.19   -0.19   -0.63
     HJ     φ(F_N)   -0.20   -1.27   -2.95      -0.04   -1.08   -1.99
            φ(F)     -0.23   -1.28   -3.00      -0.17   -0.97   -2.23
Table 2: RB (in %) for the variance estimator of the HT and the HJ estimators for the poverty rate
parameter

                      N = 10 000                 N = 1000
               n=500   n=100   n=50       n=500   n=100   n=50
SI   HT-HJ     -2.21   -3.08   -2.97      -2.25   -3.26   -3.00
BE   HT        -4.15   -5.11   -4.21      -3.31   -5.11   -4.19
     HJ        -2.22   -3.06   -3.03      -2.26   -3.24   -3.03
PO   HT        -4.43   -4.96   -3.45      -3.74   -5.72   -4.59
     HJ        -2.36   -3.43   -3.36      -2.44   -3.75   -4.13
somewhat smaller than when centering with $\phi(F)$. This behavior is most prominent when $N = 1000$
and $n = 500$, which suggests that the estimates are typically closer to the population poverty rate
$\phi(F_N)$ than to the model parameter $\phi(F)$. The Hájek estimator has a larger relative bias than the
Horvitz-Thompson estimator in all situations, but in particular for the Poisson sampling design when
the size of the population is 1000. Note that all values in Table 1 are negative, which illustrates the
fact that the estimators typically underestimate the population and model poverty rates.

In Table 2, the estimators of the variance of $\widehat{\phi}_{HT}$ and $\widehat{\phi}_{HJ}$ are obtained by plugging in the
empirical c.d.f.'s $F_N^{HT}$ and $F_N^{HJ}$, respectively, for $F$ in the expressions (5.3) and (5.4). To estimate $f$
in the variance of $\widehat{\phi}_{HJ}$, we follow [BS03], who propose a Hájek type kernel estimator with a Gaussian
kernel function. For the variance of $\widehat{\phi}_{HT}$, we use a corresponding Horvitz-Thompson estimator by
replacing $\widehat{N}$ by $N$. Based on [Sil86], pages 45-47, we choose the bandwidth $b = 0.79 \, R \, n_s^{-1/5}$, where $R$ denotes
the interquartile range. This differs from [BS03], who propose a similar bandwidth of the order
$N^{-1/5}$. However, this severely underestimates the optimal bandwidth, leading to large variances of
the kernel estimator. Usual bias-variance trade-off computations show that the optimal bandwidth
is of the order $n_s^{-1/5}$.
For the SI sampling design, (5.3) and (5.4) are identical and can be calculated explicitly,
using the fact that $\mu_{\pi 1} + \lambda = 1$ and $\mu_{\pi 2} - \lambda = -1$. For the BE design, $\mu_{\pi 1} + \lambda = 1$, whereas for
Poisson sampling, the value $(n/N^2) \sum_{i=1}^N 1/\pi_i$ is taken for $\mu_{\pi 1} + \lambda$. For these designs, $\mu_{\pi 2} - \lambda = -\lambda$,
where we take $n/N$ as the value of $\lambda$.
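Under these choices, the SI reference variance can be computed in closed form from (5.4) with $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda = 1$; a sketch (ours, under our reading of (5.4)) for the unit-rate exponential, where $f(t) = e^{-t}$ and $F^{-1}(a) = -\log(1-a)$:

```python
import math

def sigma2_hj_si_exponential(alpha=0.5, beta=0.6):
    """Asymptotic variance (5.4) for the SI design (gamma_pi1 = 1) when Y is
    unit-rate exponential, so f(t) = exp(-t) and F^{-1}(a) = -log(1 - a)."""
    q = -math.log(1.0 - alpha)                     # F^{-1}(alpha)
    phi = 1.0 - math.exp(-beta * q)                # poverty rate phi(F)
    c = beta * math.exp(-beta * q) / math.exp(-q)  # beta f(beta q) / f(q)
    return (c * c * alpha * (1 - alpha)
            + phi * (1 - phi)
            - 2 * c * phi * (1 - alpha))

v = sigma2_hj_si_exponential()
assert v >= 0   # a variance must be nonnegative
```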
In order to compute the relative bias of the variance estimates, the asymptotic variance is taken
as reference. This asymptotic variance $\mathrm{AV}(\widehat{\phi})$ of the estimator $\widehat{\phi}$ (either $\widehat{\phi}_{HT}$ or $\widehat{\phi}_{HJ}$) is computed
from (5.3) and (5.4). The expressions $f(\beta F^{-1}(\alpha))$ and $f(F^{-1}(\alpha))$ are explicit in the case of an
Table 3: Coverage probabilities (in %) for 95% confidence intervals of the HT and the HJ estimators
for the finite population $\phi(F_N)$ and the super-population $\phi(F)$ poverty rate parameter

                            N = 10 000                 N = 1000
                     n=500   n=100   n=50       n=500   n=100   n=50
SI   HT-HJ  φ(F_N)   95.2    94.4    93.5       98.8    95.1    94.6
            φ(F)     94.6    93.2    92.2       94.7    93.2    92.0
BE   HT     φ(F_N)   94.9    94.3    94.6       98.4    94.8    94.6
            φ(F)     94.4    93.7    94.9       94.6    93.6    94.7
     HJ     φ(F_N)   95.1    94.3    93.9       98.7    94.9    94.2
            φ(F)     94.7    94.2    93.9       94.7    94.2    93.9
PO   HT     φ(F_N)   94.5    94.2    94.3       96.8    94.0    93.6
            φ(F)     94.5    94.0    94.3       94.6    93.6    93.5
     HJ     φ(F_N)   94.8    93.9    93.6       97.2    94.2    93.3
            φ(F)     94.6    93.9    93.6       94.6    93.9    93.2
exponential distribution. Furthermore, for $\mu_{\pi 1}+\lambda$ and $\mu_{\pi 2}-\lambda$ we use the same expressions as mentioned above. The Monte Carlo relative bias of the variance estimator $\widehat{\mathrm{AV}}(\widehat\phi)$, in percentages, is defined by
\[
\mathrm{RB}\big(\widehat{\mathrm{AV}}(\widehat\phi)\big)
= \frac{100}{N_R\,n_R}\sum_{i=1}^{N_R}\sum_{j=1}^{n_R}
\frac{\widehat{\mathrm{AV}}(\widehat\phi_{ij})-\mathrm{AV}(\widehat\phi)}{\mathrm{AV}(\widehat\phi)},
\]
where $\widehat{\mathrm{AV}}(\widehat\phi_{ij})$ denotes the variance estimate for the $i$th generated population and the $j$th drawn sample.
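The relative bias formula is a plain average over populations and samples; a minimal sketch (function name is ours) computes it from a matrix of variance estimates:

```python
import numpy as np

def relative_bias(av_hat, av_true):
    """Monte Carlo relative bias (in %) of variance estimates av_hat[i][j]
    (i-th generated population, j-th drawn sample) with respect to the
    asymptotic variance av_true: 100/(N_R*n_R) * sum (av_hat - av_true)/av_true."""
    av_hat = np.asarray(av_hat, dtype=float)
    return 100.0 * np.mean((av_hat - av_true) / av_true)
```

A negative value indicates that the asymptotic variance is underestimated on average, which is the pattern reported in Table 2.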
Table 3 gives the Monte Carlo coverage probabilities for a nominal coverage probability of 95%, for the two parameters $\phi(F_N)$ and $\phi(F)$, the Horvitz-Thompson and the Hájek estimators, and the different simulation schemes. In general the coverage probabilities are somewhat smaller than 95%, which is due to the underestimation of the asymptotic variance, as can be seen from Table 2. The case $N=1000$ and $n=500$ for $\widehat\phi_{HJ}$ forms an exception, which is probably due to the fact that in this case $\lambda=n/N$ is far from zero, so that the limit distributions of $\sqrt n\big(\phi(F_N^{HT})-\phi(F_N)\big)$ and $\sqrt n\big(\phi(F_N^{HJ})-\phi(F_N)\big)$ have a larger variance than the ones reported in Corollaries 5.1 and 5.2. Table 2 also shows that the relative biases are smaller than 5% when $n$ is 500, and that the biases are larger for the Horvitz-Thompson estimator than for the Hájek estimator. Again, all relative biases are negative, which illustrates the fact that the asymptotic variance is typically underestimated.
7 Discussion
[Wan12] formulates a functional central limit theorem (see his Theorem 1) for the Hájek empirical c.d.f. from (2.2), centered around $F$. It is also claimed that a similar result holds for the Horvitz-Thompson process in (2.1), but details are not provided. The paper seems to miss a number of assumptions that cannot be avoided. For instance, the proof of his Theorem 1 requires convergence in probability of the covariance matrix of the vector $\sqrt{n^*}\big(F_{n\pi}(t)-F_N(t),\,F_{n\pi}(s)-F_N(s)\big)$. This assumption is comparable with our condition (HJ2), but is missing in [Wan12]. More seriously, the argument establishing Billingsley's tightness condition seems to contain a mistake that cannot be repaired easily (see the inequality on line 6, page 678, in [Wan12]; the inequality can be shown to fail, for instance, for sampling designs with independent inclusion indicators). As a consequence, assumption 5 in [Wan12] differs somewhat from our conditions (C2)-(C4). The remaining assumptions in [Wan12] are comparable to the conditions needed for our Theorem 4.3. Note that, in addition to the latter theorem, we also establish Theorems 3.1, 3.2, and 4.2 for other empirical processes of interest.
[BW07] and [SW13] obtain weak convergence of the empirical process (in our notation)
\[
\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}f(Y_i)-\mathbb E_m f(Y_i)\Big),\qquad f\in\mathcal F. \tag{7.1}
\]
Weak convergence is established under finite population two-phase stratified sampling. This process is comparable to our Horvitz-Thompson empirical process in Theorem 3.2. Although their functional CLT allows general function classes, it only covers sampling designs with equal inclusion probabilities within strata that assume exchangeability of the inclusion indicators $\xi_1,\ldots,\xi_N$, such as simple random sampling and Bernoulli sampling. Their approach views two-phase stratified sampling as a form of bootstrap and uses results on the exchangeably weighted bootstrap for empirical processes from [PW93], as incorporated in [vdVW96]. This approach, in particular the application of Theorem 3.6.13 in [vdVW96], seems difficult to extend to more complex sampling designs that go beyond exchangeable inclusion indicators. Although our results only cover the class of indicators $f_t(y)=\mathbf 1_{(-\infty,t]}(y)$, for $t\in\mathbb R$, they have the advantage of being applicable to general sampling designs. Moreover, our results also include empirical processes centered with the population mean.
[BCC14] establish a functional CLT for the Poisson-like empirical process
\[
\widetilde{\mathbb G}^p_{T_N}(f)=\frac{1}{\sqrt N}\sum_{i=1}^N\frac{(\xi_i-p_i)f(Y_i)}{p_i}-\theta_{N,p}(f),\qquad f\in\mathcal F, \tag{7.2}
\]
where $p=(p_1,\ldots,p_N)$ is the vector of inclusion probabilities corresponding to a Poisson sampling design and
\[
\theta_{N,p}(f)=\frac{1}{d_N}\sum_{i=1}^N(1-p_i)f(Y_i),\qquad d_N=\sum_{i=1}^N p_i(1-p_i).
\]
However, the functional CLT is obtained conditionally on $Y_1,Y_2,\ldots$. In this case, the terms in the summation in (7.2) are independent, which allows the use of Theorem 2.11.1 from [vdVW96]. From their result a functional CLT under rejective sampling can then be established for the design-based Horvitz-Thompson process
\[
\mathbb G_{N,\pi}(f)=\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}f(Y_i)-f(Y_i)\Big),\qquad f\in\mathcal F, \tag{7.3}
\]
$\omega$-almost surely. This is due to the close connection between Poisson sampling and rejective sampling. For this reason, the approach used in [BCC14] seems difficult to extend to other sampling designs. For the class of indicators $f_t(y)=\mathbf 1_{(-\infty,t]}(y)$, $t\in\mathbb R$, the process in (7.3) is similar to the one in our Theorem 3.1, but this theorem allows general sampling designs. Moreover, our results also include empirical processes centered with the superpopulation mean.
8 Proofs
We will use Theorem 13.5 from [Bil99], which requires convergence of the finite dimensional distributions and a tightness condition (see (13.14) in [Bil99]). We first establish the tightness condition, as stated in the following lemma.
Lemma 8.1. Let $Y_1,\ldots,Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $F_N$, and let $F_N^{HT}$ be defined according to (2.1). Let $X_N=\sqrt n(F_N^{HT}-F_N)$ and suppose that (C1)-(C4) hold. Then there exists a constant $K>0$, independent of $N$, such that for any $t_1,t_2$ with $-\infty<t_1\le t\le t_2<\infty$,
\[
\mathbb E_{d,m}\Big[\big(X_N(t)-X_N(t_1)\big)^2\big(X_N(t_2)-X_N(t)\big)^2\Big]\le K\big(F(t_2)-F(t_1)\big)^2.
\]
Proof. First note that
\[
X_N(t)=\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big)\mathbf 1_{\{Y_i\le t\}}.
\]
For the sake of brevity, for $-\infty<t_1\le t\le t_2<\infty$ and $i=1,2,\ldots,N$, we define $p_1=F(t)-F(t_1)$, $p_2=F(t_2)-F(t)$, $A_i=\mathbf 1_{\{t_1<Y_i\le t\}}$, and $B_i=\mathbf 1_{\{t<Y_i\le t_2\}}$. Furthermore, let $\alpha_i=(\xi_i-\pi_i)A_i/\pi_i$ and $\beta_i=(\xi_i-\pi_i)B_i/\pi_i$. Since $p_1p_2\le\big(F(t_2)-F(t_1)\big)^2$, due to the monotonicity of $F$, it suffices to show
\[
\frac{1}{N^4}\,\mathbb E_{d,m}\Bigg[n^2\Big(\sum_{i=1}^N\alpha_i\Big)^2\Big(\sum_{j=1}^N\beta_j\Big)^2\Bigg]\le K p_1p_2. \tag{8.1}
\]
The expectation on the left hand side can be decomposed as follows:
\[
\begin{aligned}
&\sum_{i=1}^N\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]
+\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]\\
&\qquad+\sum_{k=1}^N\sum_{l\ne k}\sum_{i=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k\beta_l\big]
+\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\sum_{l\ne k}\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k\beta_l\big].
\end{aligned}
\tag{8.2}
\]
Note that, by symmetry, the second and third sums on the right hand side can be handled similarly, so that essentially we have to deal with three summations. We consider them one by one.
First note that, since $\mathbf 1_{\{t_1<Y_i\le t\}}\mathbf 1_{\{t<Y_i\le t_2\}}=0$, we only have non-zero expectations when $\{i,j\}$ and $\{k,l\}$ are disjoint. With (C1), we find
\[
\begin{aligned}
\frac{1}{N^4}\sum_{i=1}^N\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]
&=\frac{1}{N^4}\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]\\
&=\frac{1}{N^4}\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_m\Big[\frac{n^2A_iB_k}{\pi_i^2\pi_k^2}\,\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2\Big]\\
&\le\frac{1}{K_1^4}\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_m\Big[\frac{A_iB_k}{n^2}\,\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2\Big].
\end{aligned}
\tag{8.3}
\]
A straightforward computation shows that $\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2$ equals
\[
(\pi_{ik}-\pi_i\pi_k)(1-2\pi_i)(1-2\pi_k)+\pi_i\pi_k(1-\pi_i)(1-\pi_k).
\]
Hence, with (C1)-(C2) we find that
\[
\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2\le\big|\mathbb E_d(\xi_i-\pi_i)(\xi_k-\pi_k)\big|+K_2^2\,\frac{n^2}{N^2}=O\Big(\frac{n^2}{N^2}\Big),
\]
$\omega$-almost surely. It follows that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]\le O\Big(\frac{1}{N^2}\Big)\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_m[A_iB_k].
\]
Since $D_{2,N}$ has $N(N-1)$ elements and $\mathbb E_m[A_iB_j]=p_1p_2$ for $(i,j)\in D_{2,N}$, it follows that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_j^2\big]\le K p_1p_2. \tag{8.4}
\]
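The closed form for the fourth design moment $\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2$ used above can be verified by exact enumeration, since the inclusion indicators only take the values 0 and 1. A minimal check (pure Python, function names are ours):

```python
def fourth_moment(pi_i, pi_k, pi_ik):
    """E[(xi_i - pi_i)^2 (xi_k - pi_k)^2] for 0/1 indicators with
    P(xi_i=1)=pi_i, P(xi_k=1)=pi_k, P(xi_i=xi_k=1)=pi_ik,
    computed by exact enumeration of the joint distribution."""
    probs = {(1, 1): pi_ik,
             (1, 0): pi_i - pi_ik,
             (0, 1): pi_k - pi_ik,
             (0, 0): 1.0 - pi_i - pi_k + pi_ik}
    return sum(p * (xi - pi_i) ** 2 * (xk - pi_k) ** 2
               for (xi, xk), p in probs.items())

def closed_form(pi_i, pi_k, pi_ik):
    """Closed-form expression stated in the proof of Lemma 8.1."""
    return ((pi_ik - pi_i * pi_k) * (1 - 2 * pi_i) * (1 - 2 * pi_k)
            + pi_i * pi_k * (1 - pi_i) * (1 - pi_k))
```

Note that with $\pi_{ik}=\pi_i\pi_k$ (independent indicators) the closed form reduces to $\pi_i\pi_k(1-\pi_i)(1-\pi_k)$, the product of the two Bernoulli variances, as it should.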
Consider the second (and third) summation on the right hand side of (8.2). Similarly to (8.3), we can write
\[
\begin{aligned}
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]
&=\frac{1}{N^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]\\
&\le\frac{1}{N^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\Big|\mathbb E_{d,m}\Big[\frac{n^2A_iA_jB_k}{\pi_i\pi_j\pi_k^2}(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\Big]\Big|\\
&\le\frac{1}{N^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_m\Big[\frac{n^2A_iA_jB_k}{\pi_i\pi_j\pi_k^2}\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\big|\Big]\\
&\le\frac{1}{K_1^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_m\Big[\frac{A_iA_jB_k}{n^2}\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\big|\Big].
\end{aligned}
\]
We find that $\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2$ equals
\[
(1-2\pi_k)\,\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)+\pi_k(1-\pi_k)\,\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j).
\]
With (C1)-(C3), this means $\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\big|=O(n^2/N^3)$, $\omega$-almost surely. It follows that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]=O\Big(\frac{1}{N^3}\Big)\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_m[A_iA_jB_k].
\]
Since $D_{3,N}$ has $N(N-1)(N-2)$ elements and $\mathbb E_m[A_iA_jB_k]=p_1^2p_2$ for $(i,j,k)\in D_{3,N}$, we find
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]\le K p_1p_2. \tag{8.5}
\]
The computations for the third summation in (8.2) are completely similar. Finally, consider the last summation in (8.2). As before, it can be bounded by
\[
\frac{1}{K_1^4}\sum_{(i,j,k,l)\in D_{4,N}}\mathbb E_m\Big[\frac{A_iA_jB_kB_l}{n^2}\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)(\xi_l-\pi_l)\big|\Big].
\]
Since $D_{4,N}$ has $N(N-1)(N-2)(N-3)$ elements and $\mathbb E_m[A_iA_jB_kB_l]=p_1^2p_2^2$ for $(i,j,k,l)\in D_{4,N}$, with (C4) we conclude that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\sum_{l\ne k}\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k\beta_l\big]\le K p_1p_2. \tag{8.6}
\]
Together with (8.4), (8.5), and decomposition (8.2), this proves (8.1).
Lemma 8.2. Let $X_N=\sqrt n(F_N^{HT}-F_N)$ and suppose that (C1)-(C2) and (HT1)-(HT2) hold. For any $k\in\{1,2,\ldots\}$ and $t_1,\ldots,t_k\in\mathbb R$, the vector $\big(X_N(t_1),\ldots,X_N(t_k)\big)$ converges in distribution under $\mathbb P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix $\Sigma_k^{HT}$ given in (3.4).
Proof. We use the Cramér-Wold device. Note that any linear combination
\[
a_1\sqrt n\big(F_N^{HT}(t_1)-F_N(t_1)\big)+\cdots+a_k\sqrt n\big(F_N^{HT}(t_k)-F_N(t_k)\big) \tag{8.7}
\]
can be written as
\[
\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}-\frac1N\sum_{i=1}^N V_{ik}\Bigg), \tag{8.8}
\]
where
\[
V_{ik}=a_1\mathbf 1_{\{Y_i\le t_1\}}+\cdots+a_k\mathbf 1_{\{Y_i\le t_k\}}=\mathbf a_k^t\mathbf Y_{ik}, \tag{8.9}
\]
with $\mathbf Y_{ik}^t=\big(\mathbf 1_{\{Y_i\le t_1\}},\ldots,\mathbf 1_{\{Y_i\le t_k\}}\big)$ and $\mathbf a_k^t=(a_1,\ldots,a_k)$. For the corresponding design-based variance, we have
\[
nS_N^2=\frac{n}{N^2}\sum_{i=1}^N\sum_{j=1}^N\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}V_{ik}V_{jk}
=\mathbf a_k^t\Bigg(\frac{n}{N^2}\sum_{i=1}^N\sum_{j=1}^N\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\mathbf Y_{ik}\mathbf Y_{jk}^t\Bigg)\mathbf a_k
\to\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k, \tag{8.10}
\]
$\omega$-almost surely, according to (HT2), where $\Sigma_k^{HT}$ can be obtained from (3.4). Together with (HT1), it follows that (8.7) converges in distribution to a mean zero normal random variable with variance $\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k$. We conclude that (8.7) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma_k^{HT}$. According to the Cramér-Wold device, this proves the lemma.
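The design-based variance in (8.10) is a double sum over $(\pi_{ij}-\pi_i\pi_j)/(\pi_i\pi_j)$. For a design with independent inclusion indicators (Poisson sampling, $\pi_{ij}=\pi_i\pi_j$ for $i\ne j$ and $\pi_{ii}=\pi_i$) the double sum collapses to a single sum, which the following sketch checks numerically (function names are ours, for illustration only):

```python
import numpy as np

def design_variance_double_sum(v, pi, pij):
    """Design variance of (1/N) sum_i xi_i v_i / pi_i via the double-sum
    formula (1/N^2) sum_{i,j} (pi_ij - pi_i pi_j)/(pi_i pi_j) v_i v_j."""
    N = len(v)
    delta = (pij - np.outer(pi, pi)) / np.outer(pi, pi)
    return v @ delta @ v / N ** 2

def design_variance_poisson(v, pi):
    """Same variance under independent inclusion indicators:
    (1/N^2) sum_i v_i^2 (1 - pi_i)/pi_i."""
    return np.sum(v ** 2 * (1 - pi) / pi) / len(v) ** 2
```

For Poisson sampling only the diagonal terms $\big(\pi_i-\pi_i^2\big)/\pi_i^2=(1-\pi_i)/\pi_i$ survive, so the two functions agree exactly.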
Proof of Theorem 3.1. We first consider $X_N=\sqrt n(F_N^{HT}-F_N)$ for the case that the $Y_i$'s follow a uniform distribution on $[0,1]$. We apply Theorem 13.5 from [Bil99]. Lemma 8.2 provides the limiting distribution of the finite dimensional projections $\big(X_N(t_1),\ldots,X_N(t_k)\big)$, which is the same as that of the vector $\big(\mathbb G^{HT}(t_1),\ldots,\mathbb G^{HT}(t_k)\big)$, where $\mathbb G^{HT}$ is a mean zero Gaussian process with covariance function
\[
\mathbb E_m\mathbb G^{HT}(s)\mathbb G^{HT}(t)=\lim_{N\to\infty}\frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_m\Big[n\,\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\mathbf 1_{\{Y_i\le s\}}\mathbf 1_{\{Y_j\le t\}}\Big],
\]
for all $s,t\in\mathbb R$. Tightness condition (13.14) in [Bil99] is provided by Lemma 8.1. Since $\mathbb G^{HT}$ is continuous at 1, the theorem now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are uniformly distributed on $[0,1]$.
To extend this to a functional CLT for i.i.d. random variables $Y_1,Y_2,\ldots$ with a general c.d.f. $F$, we follow the argument in the proof of Theorem 14.3 from [Bil99]. First define the generalized inverse of $F$,
\[
\varphi(s)=\inf\{t:s\le F(t)\},
\]
which satisfies $s\le F(t)$ if and only if $\varphi(s)\le t$. This means that if $U_1,U_2,\ldots$ are i.i.d. uniformly distributed on $[0,1]$, then $\varphi(U_i)$ has the same distribution as $Y_i$, so that $\mathbf 1_{\{Y_i\le t\}}\stackrel{d}{=}\mathbf 1_{\{\varphi(U_i)\le t\}}=\mathbf 1_{\{U_i\le F(t)\}}$. It follows that
\[
X_N(t)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i\mathbf 1_{\{Y_i\le t\}}}{\pi_i}-\frac1N\sum_{i=1}^N\mathbf 1_{\{Y_i\le t\}}\Bigg)\stackrel{d}{=}Z_N(F(t)),\quad t\in\mathbb R,
\]
where
\[
Z_N(t)=\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big)\mathbf 1_{\{U_i\le t\}},\quad t\in[0,1]. \tag{8.11}
\]
Hence, the general HT empirical process $X_N$ is the image of the HT uniform empirical process $Z_N$ under the mapping $\psi:D[0,1]\mapsto D(\mathbb R)$ given by $[\psi x](t)=x(F(t))$. Note that, if $x_N\to x$ in $D[0,1]$ in the Skorohod topology and $x$ has continuous sample paths, then the convergence is uniform. But then $\psi x_N$ also converges to $\psi x$ uniformly in $D(\mathbb R)$, which implies that $\psi x_N$ converges to $\psi x$ in the Skorohod topology. We have established that $Z_N\Rightarrow Z$ weakly in $D[0,1]$ in the Skorohod topology, where $Z$ has continuous sample paths. Therefore, according to the continuous mapping theorem, e.g., Theorem 2.7 in [Bil99], it follows that $\psi(Z_N)\Rightarrow\psi(Z)$ weakly. This proves the theorem for $Y_i$'s with a general c.d.f. $F$. ✷
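The generalized inverse $\varphi$ and the key equivalence $s\le F(t)\iff\varphi(s)\le t$ are easy to illustrate for an empirical c.d.f., where $\varphi(s)$ is simply an order statistic. A small sketch (our own illustration, not from [Bil99]; function names are ours):

```python
import math

def phi(s, sorted_sample):
    """Generalized inverse phi(s) = inf{t : s <= F_n(t)} of the empirical
    c.d.f. of sorted_sample, i.e. the ceil(s*n)-th order statistic."""
    n = len(sorted_sample)
    k = max(int(math.ceil(s * n)), 1)
    return sorted_sample[k - 1]

def F_n(t, sorted_sample):
    """Empirical c.d.f. F_n(t) = #{X_i <= t} / n."""
    return sum(x <= t for x in sorted_sample) / len(sorted_sample)
```

Applying `phi` to uniform draws is exactly the quantile transform used in the proof: $\varphi(U_i)$ reproduces the distribution of the data, while the events $\{U_i\le F(t)\}$ and $\{\varphi(U_i)\le t\}$ coincide.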
Proof of Proposition 3.1. The proof is similar to that of Theorem 3.1. First consider the case of uniform $Y_i$'s with $F(t)=t$. We only have to verify the weak convergence of the finite dimensional projections of the process $X_N=\sqrt n(F_N^{HT}-F_N)$. Consider (8.7), represented as in (8.8). From (HT1) and Lemma 9.1(ii) in [BLRG15] we conclude that (8.7) converges in distribution to a mean zero normal random variable with variance
\[
\sigma^2_{HT}=\mu_{\pi1}\mathbb E_m V_{1k}^2+\mu_{\pi2}\big(\mathbb E_m[V_{1k}]\big)^2
=\mu_{\pi1}\mathbf a_k^t\mathbb E_m\big[\mathbf Y_{1k}\mathbf Y_{1k}^t\big]\mathbf a_k+\mu_{\pi2}\mathbf a_k^t\big(\mathbb E_m\mathbf Y_{1k}\big)\big(\mathbb E_m\mathbf Y_{1k}\big)^t\mathbf a_k=\mathbf a_k^t\Sigma_k\mathbf a_k,
\]
where $\Sigma_k$ is the $k\times k$ matrix with $(q,r)$-element equal to $\mu_{\pi1}(t_q\wedge t_r)+\mu_{\pi2}t_qt_r$. We conclude that (8.7) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma_k$. As in the proof of Lemma 8.2, by means of the Cramér-Wold device this establishes the limit distribution of $\big(X_N(t_1),\ldots,X_N(t_k)\big)$, which is the same as that of the vector $\big(\mathbb G^{HT}(t_1),\ldots,\mathbb G^{HT}(t_k)\big)$, where $\mathbb G^{HT}$ is a mean zero Gaussian process with covariance function $\mathbb E_{d,m}\mathbb G^{HT}(s)\mathbb G^{HT}(t)=\mu_{\pi1}(s\wedge t)+\mu_{\pi2}st$. From here on, the proof is completely the same as that of Theorem 3.1. ✷
To establish tightness for the process $\sqrt n(F_N^{HT}-F)$, we use the decomposition
\[
\sqrt n(F_N^{HT}-F)=\sqrt n(F_N^{HT}-F_N)+\frac{\sqrt n}{\sqrt N}\cdot\sqrt N(F_N-F). \tag{8.12}
\]
The first process on the right hand side converges weakly to a Gaussian process, according to Theorem 3.1. The process $\sqrt N(F_N-F)$ also converges weakly to a Gaussian process, due to the classical Donsker theorem. In particular, both processes on the right hand side are tight in $D(\mathbb R)$ with the Skorohod metric. In general the sum of two tight processes in $D(\mathbb R)$ is not necessarily tight. However, this is the case if both processes converge weakly to continuous processes (see Lemma 9.2 in [BLRG15]).
Lemma 8.3. Let $V_1,V_2,\ldots$ be a sequence of bounded i.i.d. random variables on $(\Omega,\mathcal F,\mathbb P_m)$ with mean $\mu_V$ and variance $\sigma_V^2$, and let $S_N^2$ be defined by (3.2). Suppose (HT1) and (HT3) hold and $nS_N^2\to\sigma_{HT}^2>0$ in $\mathbb P_m$-probability. Then
\[
\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\mu_V\Bigg) \tag{8.13}
\]
converges in distribution under $\mathbb P_{d,m}$ to a mean zero normal random variable with variance $\sigma_{HT}^2+\lambda\sigma_V^2$.
Note that, in view of the expression for $\sigma_{HT}^2$ obtained in Lemma 9.1, for simple random sampling without replacement the condition $\sigma_{HT}^2>0$ implies that $\lambda$ must differ from 1.
Proof. We use the decomposition
\[
\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\mu_V\Bigg)
=\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\frac1N\sum_{i=1}^N V_i\Bigg)
+\frac{1}{\sqrt n S_N}\times\frac{\sqrt n}{\sqrt N}\times\sqrt N\Bigg(\frac1N\sum_{i=1}^N V_i-\mu_V\Bigg).
\]
According to (HT3), the central limit theorem, Slutsky's theorem, and the fact that $nS_N^2\to\sigma_{HT}^2>0$ in probability,
\[
\frac{1}{\sqrt n S_N}\times\frac{\sqrt n}{\sqrt N}\times\sqrt N\Bigg(\frac1N\sum_{i=1}^N V_i-\mu_V\Bigg)\to N\big(0,\lambda\sigma_V^2/\sigma_{HT}^2\big), \tag{8.14}
\]
in distribution under $\mathbb P_m$, whereas, thanks to (HT1),
\[
\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\frac1N\sum_{i=1}^N V_i\Bigg)\to N(0,1),\quad\omega\text{-a.s.}, \tag{8.15}
\]
in distribution under $\mathbb P_d$. Since the latter limit distribution does not depend on $\omega$, we can apply Theorem 5.1(iii) from [RBSK05]. It follows that
\[
\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\mu_V\Bigg)\to N\big(0,1+\lambda\sigma_V^2/\sigma_{HT}^2\big),
\]
in distribution under $\mathbb P_{d,m}$. Together with $nS_N^2\to\sigma_{HT}^2$ in probability, this implies that the random variable in (8.13) converges to a mean zero normal random variable with variance $\sigma_{HT}^2+\lambda\sigma_V^2$.
Lemma 8.4. Let $X_N^F=\sqrt n(F_N^{HT}-F)$ and suppose that (C1)-(C2) and (HT1)-(HT4) hold. Then for any $k\in\{1,2,\ldots\}$ and $t_1,t_2,\ldots,t_k\in\mathbb R$, the sequence $\big(X_N^F(t_1),\ldots,X_N^F(t_k)\big)$ converges in distribution under $\mathbb P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix $\Sigma^F_{HT}=\Sigma_k^{HT}+\lambda\Sigma_F$, where $\Sigma_k^{HT}$ is given in (3.4) and $\Sigma_F$ is the $k\times k$ matrix with $(q,r)$-entry $F(t_q\wedge t_r)-F(t_q)F(t_r)$, for $q,r=1,2,\ldots,k$.
Proof. The proof is similar to the proof of Lemma 8.2. The details can be found in [BLRG15].
Proof of Theorem 3.2. The proof is completely similar to that of Theorem 3.1. We first consider the process $X_N^F=\sqrt n(F_N^{HT}-F)$ for the case that the $Y_i$'s follow a uniform distribution with $F(t)=t$. Decompose $X_N^F$ as in (8.12). By Theorem 3.1, the first process on the right hand side of (8.12) converges weakly to a process in $C[0,1]$. Due to the classical Donsker theorem and (HT3), the second process on the right hand side of (8.12) also converges weakly to a process in $C[0,1]$. Tightness of $X_N^F$ then follows from Lemma 9.2 in [BLRG15]. Convergence of the finite dimensional distributions is provided by Lemma 8.4. The theorem now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are uniformly distributed on $[0,1]$. Next, this is extended to $Y_i$'s with a general c.d.f. $F$ in the same way as in the proof of Theorem 3.1. ✷
To establish convergence of the finite dimensional distributions of $\sqrt n(F_N^{HT}-F)$ under the conditions of Proposition 3.2, we will use the Cramér-Wold device, as in the proof of Lemma 8.4. To ensure that the limit in (9.2) is still strictly positive without imposing (HT4), we need the following lemma. Its proof can be found in [BLRG15].
Lemma 8.5. Let $F$ be the c.d.f. of the i.i.d. $Y_1,\ldots,Y_N$. For any $k$-tuple $(t_1,\ldots,t_k)\in\mathbb R^k$, suppose that the values $F(t_1),\ldots,F(t_k)$ are all distinct and such that $0<F(t_i)<1$. Let $a,b\in\mathbb R$, with $a\ge b$. If $a>0$, then the $k\times k$ matrix $M$ with $(i,j)$-th element $M_{ij}=aF(t_i\wedge t_j)-bF(t_i)F(t_j)$ is positive definite.
Lemma 8.6. Let $X_N^F=\sqrt n(F_N^{HT}-F)$ and suppose that $n$ and $\pi_i,\pi_{ij}$, for $i,j=1,2,\ldots,N$, are deterministic. Suppose that (C1)-(C2), (HT1), and (HT3) hold, as well as conditions (i)-(ii) of Proposition 3.2. Then, for any $k\in\{1,2,\ldots\}$ and $t_1,\ldots,t_k\in\mathbb R$, the vector $\big(X_N^F(t_1),\ldots,X_N^F(t_k)\big)$ converges in distribution under $\mathbb P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix $\Sigma^F_{HT}$, with $(q,r)$-entry $(\mu_{\pi1}+\lambda)F(t_q\wedge t_r)+(\mu_{\pi2}-\lambda)F(t_q)F(t_r)$, for $q,r=1,2,\ldots,k$.
Proof. The proof follows the same ideas as the proof of Lemma 8.4, but is somewhat more technical. It can be found in [BLRG15].
Proof of Proposition 3.2. The proof is similar to that of Theorem 3.2. Tightness is obtained in the same way, and the convergence of the finite dimensional projections is provided by Lemma 8.6. The proposition now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are uniformly distributed on $[0,1]$. Next, this is extended to $Y_i$'s with a general c.d.f. $F$ in the same way as in the proof of Theorem 3.1. ✷
Proof of Theorem 4.1. For part (i), note that with $S_N^2$ defined in (3.2) with $V_i=1$, from (HT1) together with condition (4.6) it follows that
\[
\sqrt n S_N\times\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}-1\Bigg)\to N(0,\sigma_\pi^2),\quad\omega\text{-a.s.},
\]
in distribution under $\mathbb P_d$. This implies
\[
\sqrt n\Bigg(\frac{\widehat N}{N}-1\Bigg)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}-1\Bigg)\to N(0,\sigma_\pi^2), \tag{8.16}
\]
in distribution under $\mathbb P_{d,m}$. In particular, since $n\to\infty$, this proves part (i).
The proof of part (ii) is along the same lines as the proofs of Theorems 3.1 and 3.2. First consider the case where the $Y_i$'s are uniform, with $F(t)=t$ on $[0,1]$. Then, with $F_N^{HT}$ defined in (2.1) and $X_N^F=\sqrt n(F_N^{HT}-F)$, we can write $\mathbb G^\pi_N(t)=X_N^F(t)-\big(X_N^F(t)-\mathbb G^\pi_N(t)\big)$. According to Theorem 3.2, the process $X_N^F$ converges weakly to a continuous process. As a consequence of (8.16), the process
\[
X_N^F(t)-\mathbb G^\pi_N(t)=t\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}-1\Bigg)
\]
also converges weakly to a continuous process. Hence, similar to the argument in the proof of Theorem 3.2, we conclude that the process $\mathbb G^\pi_N$ is tight. Next, we establish weak convergence of the finite dimensional projections
\[
\big(\mathbb G^\pi_N(t_1),\ldots,\mathbb G^\pi_N(t_k)\big). \tag{8.17}
\]
To this end we apply the Cramér-Wold device and consider linear combinations
\[
a_1\mathbb G^\pi_N(t_1)+\cdots+a_k\mathbb G^\pi_N(t_k)=\frac{\sqrt n}{N}\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}. \tag{8.18}
\]
Convergence of (8.18) is obtained completely similarly to that of (9.1) in Lemma 8.4, but this time with
\[
V_{ik}=a_1\big(\mathbf 1_{\{Y_i\le t_1\}}-t_1\big)+\cdots+a_k\big(\mathbf 1_{\{Y_i\le t_k\}}-t_k\big)
\]
and $\mu_k=0$. Using the fact that (HJ4) allows the use of Lemma 8.3, one can deduce that (8.18) converges in distribution under $\mathbb P_{d,m}$ to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate normal distribution with covariance matrix $\Sigma_\pi=\Sigma_k^{HJ}+\lambda\Sigma_F$, where $\Sigma_k^{HJ}$ and $\Sigma_F$ are given in (4.5) and Lemma 8.4, respectively. By means of the Cramér-Wold device, this proves that (8.17) converges in distribution under $\mathbb P_{d,m}$ to a mean zero $k$-variate normal random vector with covariance matrix $\Sigma_\pi$. This distribution is the same as that of $\big(\mathbb G^\pi(t_1),\ldots,\mathbb G^\pi(t_k)\big)$, where $\mathbb G^\pi$ is a mean zero Gaussian process with covariance function
\[
\lim_{N\to\infty}\frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_m\Big[n\,\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\big(\mathbf 1_{\{Y_i\le s\}}-s\big)\big(\mathbf 1_{\{Y_j\le t\}}-t\big)\Big]+\lambda(s\wedge t-st),\quad s,t\in\mathbb R.
\]
Since $\mathbb G^\pi$ is continuous at 1, the theorem then follows from Theorem 13.5 in [Bil99] for the case of uniform $Y_i$'s. The extension to $Y_i$'s with a general c.d.f. $F$ is completely similar to the proof of Theorem 3.1. ✷
Proof of Theorem 4.2. We use (4.2). From the proof of Theorem 4.1, we know that $\mathbb G^\pi_N$ is tight. Together with Theorem 4.1(i), it then follows that the limit behavior of $\sqrt n(F_N^{HJ}-F_N)$ is the same as that of the process $\mathbb Y_N$ defined in (4.3). This process can be written as
\[
\mathbb Y_N(t)=\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big)\mathbf 1_{\{Y_i\le t\}}-F(t)\,\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big).
\]
As in the proofs of Theorems 3.1, 3.2, and 4.1, we first consider the case of uniform $Y_i$'s. The first process on the right hand side is $\sqrt n(F_N^{HT}-F_N)$, which converges weakly to a continuous process according to Theorem 3.1, whereas the second process also converges to a continuous process, due to (8.16). As in the proof of Theorem 3.2, one can then argue that $\mathbb Y_N$, being the difference of these processes, is tight. Next, we prove weak convergence of the finite dimensional projections
\[
\big(\mathbb Y_N(t_1),\ldots,\mathbb Y_N(t_k)\big). \tag{8.19}
\]
As before, we apply the Cramér-Wold device and consider
\[
a_1\mathbb Y_N(t_1)+\cdots+a_k\mathbb Y_N(t_k)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}-\frac1N\sum_{i=1}^N V_{ik}\Bigg), \tag{8.20}
\]
with
\[
V_{ik}=a_1\big(\mathbf 1_{\{Y_i\le t_1\}}-t_1\big)+\cdots+a_k\big(\mathbf 1_{\{Y_i\le t_k\}}-t_k\big).
\]
Convergence of (8.20) is obtained completely similarly to that of (8.8) in the proof of Lemma 8.2. From (HT1) and (HJ2), it follows that (8.20) converges in distribution under $\mathbb P_{d,m}$ to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate normal distribution with covariance matrix $\Sigma_k^{HJ}$ given in (4.5). By means of the Cramér-Wold device, this proves that (8.19) converges in distribution under $\mathbb P_{d,m}$ to a mean zero $k$-variate normal random vector with covariance matrix $\Sigma_k^{HJ}$. This distribution is the same as that of $\big(\mathbb G^{HJ}(t_1),\ldots,\mathbb G^{HJ}(t_k)\big)$, where $\mathbb G^{HJ}$ is a mean zero Gaussian process with covariance function
\[
\lim_{N\to\infty}\frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_m\Big[n\,\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\big(\mathbf 1_{\{Y_i\le s\}}-s\big)\big(\mathbf 1_{\{Y_j\le t\}}-t\big)\Big],
\]
for $s,t\in\mathbb R$. As before, the theorem now follows from Theorem 13.5 in [Bil99] for the case of uniform $Y_i$'s, and is then extended to $Y_i$'s with a general c.d.f. $F$. ✷
Proof of Theorem 4.3. The theorem follows directly from relation (4.7) and Theorem 4.1. ✷
The proofs of Propositions 4.1 and 4.2 are similar to those of Theorems 4.2 and 4.1, respectively,
and can be found in [BLRG15]. The proofs for Corollaries 5.1 and 5.2 are fairly straightforward
and can be found in [BLRG15].
References
[BCC14] Patrice Bertail, Emilie Chautru, and Stéphan Clémençon. Empirical processes in survey
sampling. Submitted, See also https://hal.archives-ouvertes.fr/hal-00989585,
2014.
[BD09] Garry F. Barrett and Stephen G. Donald. Statistical inference with generalized Gini
indices of inequality, poverty, and welfare. J. Bus. Econom. Statist., 27(1):1–17, 2009.
[Bha07] Debopam Bhattacharya. Inference on inequality from household survey data. J. Econo-
metrics, 137(2):674–707, 2007.
[Bil99] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability
and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, second
edition, 1999. A Wiley-Interscience Publication.
[BLRG12] Hélène Boistard, Hendrik P. Lopuhaä, and Anne Ruiz-Gazen. Approximation of rejective sampling inclusion probabilities and application to high order correlations. Electron. J. Stat., 6:1967–1983, 2012.
[BLRG15] Hélène Boistard, Hendrik P. Lopuhaä, and Anne Ruiz-Gazen. Supplement to "Functional central limit theorems in survey sampling". 2015.
[BM11] Debopam Bhattacharya and Bhashkar Mazumder. A nonparametric analysis of black and white differences in intergenerational income mobility in the United States. Quant. Econ., 2(3):335–379, 2011.
[BO00] F. Jay Breidt and Jean D. Opsomer. Local polynomial regression estimators in survey sampling. Ann. Statist., 28(4):1026–1053, 2000.
[BR09] David Binder and Georgia Roberts. Handbook of Statistics 29B: Sample Surveys: Design,
Methods and Applications., chapter Chapter 24: Design- and Model-Based Inference for
Model Parameters, pages 33–54. Elsevier, Amsterdam, 2009.
[BS03] Yves G. Berger and Chris J. Skinner. Variance estimation for a low income proportion.
J. Roy. Statist. Soc. Ser. C, 52(4):457–468, 2003.
[BW07] Norman E. Breslow and Jon A. Wellner. Weighted likelihood for semiparametric models
and two-phase stratified samples, with application to Cox regression. Scand. J. Statist.,
34(1):86–102, 2007.
[CCGL10] Hervé Cardot, Mohamed Chaouch, Camelia Goga, and Catherine Labruère. Properties
of design-based functional principal components analysis. J. Statist. Plann. Inference,
140(1):75–91, 2010.
[Dav09] Russell Davidson. Reliable inference for the Gini index. J. Econometrics, 150(1):30–40,
2009.
[Dd08] Fabien Dell and Xavier d’Haultfœuille. Measuring the evolution of complex indicators:
Theory and application to the poverty rate in France. Ann. Économ. Statist., (90):259–
290, 2008.
[DS92] Jean-Claude Deville and Carl-Erik Särndal. Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418):376–382, 1992.
[Dud02] R. M. Dudley. Real analysis and probability, volume 74 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, Cambridge, 2002. Revised reprint of the 1989
original.
[FF91] Carol A. Francisco and Wayne A. Fuller. Quantile estimation with a complex survey
design. Ann. Statist., 19(1):454–469, 1991.
[Ful09] W.A. Fuller. Sampling Statistics. Wiley Series in Survey Methodology. Wiley, New York,
2009.
[GT14] Eric Graf and Yves Tillé. Variance estimation using linearization for poverty and social
exclusion indicators. Survey Methodology, 40(1):61–79, 2014.
[Háj64] Jaroslav Hájek. Asymptotic theory of rejective sampling with varying probabilities from
a finite population. Ann. Math. Statist., 35:1491–1523, 1964.
[KG98] Edward L. Korn and Barry I. Graubard. Variance estimation for superpopulation pa-
rameters. Statist. Sinica, 8(4):1131–1151, 1998.
[KR81] D. Krewski and J. N. K. Rao. Inference from stratified samples: properties of the
linearization, jackknife and balanced repeated replication methods. Ann. Statist.,
9(5):1010–1019, 1981.
[OAB15] M. Oguz-Alper and Y. G. Berger. Variance estimation of change of poverty based upon the Turkish EU-SILC survey. Journal of Official Statistics, 31(2):155–175, 2015.
[PW93] Jens Præstgaard and Jon A. Wellner. Exchangeably weighted bootstraps of the general
empirical process. Ann. Probab., 21(4):2053–2086, 1993.
[RBSK05] Susana Rubin-Bleuer and Ioana Schiopu Kratina. On the two-phase framework for joint
model and design-based inference. Ann. Statist., 33(6):2789–2810, 2005.
[Sil86] B. W. Silverman. Density estimation for statistics and data analysis. Monographs on
Statistics and Applied Probability. Chapman & Hall, London, 1986.
[SW13] Takumi Saegusa and Jon A. Wellner. Weighted likelihood estimation under two-phase
sampling. Ann. Statist., 41(1):269–295, 2013.
[Tho97] M. E. Thompson. Theory of sample surveys, volume 74 of Monographs on Statistics and
Applied Probability. Chapman & Hall, London, 1997.
[vdV98] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[vdVW96] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes.
Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to
statistics.
[Wan12] Jianqiang C. Wang. Sample distribution function based goodness-of-fit test for complex
surveys. Comput. Statist. Data Anal., 56(3):664–679, 2012.
Hélène Boistard
Toulouse School of Economics
21 allée de Brienne
31000 Toulouse, France
e-mail: helene@boistard.fr
Hendrik P. Lopuhaä
Delft Institute of Applied Mathematics
Delft University of Technology
Delft, The Netherlands
e-mail: h.p.lopuhaa@tudelft.nl
Anne Ruiz-Gazen
Toulouse School of Economics
21 allée de Brienne
31000 Toulouse, France
e-mail: anne.ruiz-gazen@tse-fr.eu
9 Supplemental Material
9.1 Proofs of Lemmas, Propositions and Corollaries in the main text
Proof of Lemma 8.4. We use the Cramér-Wold device. To this end, we determine the limit distribution of $a_1X_N^F(t_1)+\cdots+a_kX_N^F(t_k)$, for $a_1,\ldots,a_k\in\mathbb R$ fixed, with $\mathbf a_k^t=(a_1,\ldots,a_k)\ne(0,\ldots,0)$. As in the proof of Lemma 8.2, we consider
\[
a_1X_N^F(t_1)+\cdots+a_kX_N^F(t_k)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}-\mu_k\Bigg), \tag{9.1}
\]
where $V_{ik}$ is defined in (8.9). We want to apply Lemma 8.3. As in (8.10),
\[
nS_N^2\to\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k,\quad\omega\text{-a.s.}, \tag{9.2}
\]
where $\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k>0$, thanks to (HT4). This means that, according to Lemma 8.3, the right hand side of (9.1) converges in distribution under $\mathbb P_{d,m}$ to a mean zero normal random variable with variance
\[
\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k+\lambda\Big\{\mathbb E_m[V_{1k}^2]-\big(\mathbb E_m[V_{1k}]\big)^2\Big\}=\mathbf a_k^t\Sigma^F_{HT}\mathbf a_k,
\]
where
\[
\Sigma^F_{HT}=\Sigma_k^{HT}+\lambda\Sigma_F. \tag{9.3}
\]
We conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$. By the Cramér-Wold device, this proves the lemma. ✷
Proof of Lemma 8.5. Without loss of generality we may assume $0<F(t_1)<\cdots<F(t_k)<1$, since we can permute the rows and columns of $M$ without changing the determinant. For the entries of $M$ we can distinguish three situations:
1. if $1\le j<i\le k$, then $M_{ij}=aF(t_j)-bF(t_i)F(t_j)$;
2. if $1\le i=j\le k$, then $M_{ij}=aF(t_i)-bF(t_i)^2$;
3. if $1\le i<j\le k$, then $M_{ij}=aF(t_i)-bF(t_i)F(t_j)$.
Now, for $2\le i\le k$, multiply the $i$-th row by $F(t_1)/F(t_i)$. This changes the determinant by a factor $F(t_1)^{k-1}/\big(F(t_2)\cdots F(t_k)\big)>0$, and as a result, all entries in column $j$ at positions $1\le i\le j\le k$ are the same: $aF(t_1)-bF(t_1)F(t_j)$. Hence, if we subtract row 2 from row 1, then row 3 from row 2, \ldots, and finally row $k$ from row $k-1$, we obtain a new matrix $M'$ whose upper-right triangle consists of zeros and whose main diagonal has elements $M'_{ii}=aF(t_1)-aF(t_1)F(t_i)/F(t_{i+1})$, for $1\le i\le k-1$, and $M'_{kk}=aF(t_1)-bF(t_1)F(t_k)$. It follows that
\[
\det(M)=\frac{F(t_2)\cdots F(t_k)}{F(t_1)^{k-1}}\det(M')
=a^{k-1}F(t_1)\big(F(t_2)-F(t_1)\big)\cdots\big(F(t_k)-F(t_{k-1})\big)\big(a-bF(t_k)\big)>0,
\]
since $a>0$, $0<F(t_1)<\cdots<F(t_k)<1$, and $a-bF(t_k)>a-b\ge0$. ✷
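Both the positive definiteness asserted by Lemma 8.5 and the determinant formula derived in its proof are easy to check numerically. A minimal sketch (function names are ours) builds $M_{ij}=aF(t_i\wedge t_j)-bF(t_i)F(t_j)$ from increasing values $F(t_1),\ldots,F(t_k)$ and compares $\det(M)$ with the closed form:

```python
import numpy as np

def M_matrix(F_vals, a, b):
    """k x k matrix with M_ij = a*F(t_{i ∧ j}) - b*F(t_i)*F(t_j), where
    F_vals = (F(t_1), ..., F(t_k)) are increasing values in (0, 1)."""
    F = np.asarray(F_vals, dtype=float)
    return a * np.minimum.outer(F, F) - b * np.outer(F, F)

def det_closed_form(F_vals, a, b):
    """det(M) = a^(k-1) F(t_1) (F(t_2)-F(t_1)) ... (F(t_k)-F(t_{k-1})) (a - b F(t_k)),
    as derived in the proof of Lemma 8.5."""
    F = np.asarray(F_vals, dtype=float)
    k = len(F)
    return a ** (k - 1) * F[0] * np.prod(np.diff(F)) * (a - b * F[-1])
```

With $a\ge b$ and $a>0$, all eigenvalues of `M_matrix(...)` are strictly positive, in line with the lemma.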
Proof of Lemma 8.6. The proof is similar to that of Lemma 8.4. We determine the limit distribution of (9.1). Note that without loss of generality we can assume that $0\le F(t_1)\le\cdots\le F(t_k)\le1$. In contrast with the proof of Lemma 8.4, we now have to distinguish between several cases.
We first consider the situation where all $F(t_i)$'s are distinct and such that $0<F(t_i)<1$. From (HT1) and Lemma 9.1(ii) we conclude that
\[
nS_N^2\to\sigma_{HT}^2=\mu_{\pi1}\mathbb E_m[V_{1k}^2]+\mu_{\pi2}\big(\mathbb E_m[V_{1k}]\big)^2=\mathbf a_k^t\Sigma_k\mathbf a_k,
\]
where
\[
\Sigma_k=\Big(\mu_{\pi1}F(t_q\wedge t_r)+\mu_{\pi2}F(t_q)F(t_r)\Big)_{q,r=1}^k. \tag{9.4}
\]
First note that
\[
\mu_{\pi1}+\mu_{\pi2}=\lim_{N\to\infty}\frac{n}{N^2}\sum_{i=1}^N\sum_{j=1}^N\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}=\lim_{N\to\infty}\frac{n}{N^2}\operatorname{Var}\Bigg(\sum_{i=1}^N\frac{\xi_i}{\pi_i}\Bigg)\ge0.
\]
Therefore, together with condition (i), we can apply Lemma 8.5 with $a=\mu_{\pi1}$ and $b=-\mu_{\pi2}$. It follows that $\Sigma_k$ is positive definite, so that $\sigma_{HT}^2>0$. This means that, according to Lemma 8.3, the right hand side of (9.1) converges in distribution under $\mathbb P_{d,m}$ to a mean zero normal random variable with variance $(\mu_{\pi1}+\lambda)\mathbb E_m[V_{1k}^2]+(\mu_{\pi2}-\lambda)\big(\mathbb E_m[V_{1k}]\big)^2=\mathbf a_k^t\Sigma^F_{HT}\mathbf a_k$, where
\[
\Sigma^F_{HT}=\Big((\mu_{\pi1}+\lambda)F(t_q\wedge t_r)+(\mu_{\pi2}-\lambda)F(t_q)F(t_r)\Big)_{q,r=1}^k. \tag{9.5}
\]
We conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$. By means of the Cramér-Wold device, this proves the lemma for the case $0<F(t_1)<\cdots<F(t_k)<1$.
The case where the $F(t_i)$'s are not all distinct, but still satisfy $0<F(t_i)<1$, can be reduced to the case where all $F(t_i)$'s are distinct. This can be seen as follows. For simplicity, suppose $F(t_1)=\cdots=F(t_m)=F(t_0)$, with $0<F(t_0)<F(t_{m+1})<\cdots<F(t_k)<1$. Then we can write (9.1) as
\[
a_0X_N^F(t_0)+a_{m+1}X_N^F(t_{m+1})+\cdots+a_kX_N^F(t_k), \tag{9.6}
\]
where $a_0=a_1+\cdots+a_m$. As before, with (HT4) and Lemma 8.5, it follows from Lemma 8.3 that (9.6) converges in distribution to a mean zero normal random variable with variance $\mathbf a_0^t\Sigma^F_0\mathbf a_0$, where $\mathbf a_0=(a_0,a_{m+1},\ldots,a_k)^t$ and
\[
\Sigma^F_0=\gamma_{\pi1}\mathbb E_m\big[\mathbf Y_0\mathbf Y_0^t\big]+(\gamma_{\pi2}-\lambda)\big(\mathbb E_m[\mathbf Y_0]\big)\big(\mathbb E_m[\mathbf Y_0]\big)^t,
\]
with $\mathbf Y_0=\big(\mathbf 1_{\{Y_i\le t_0\}},\mathbf 1_{\{Y_i\le t_{m+1}\}},\ldots,\mathbf 1_{\{Y_i\le t_k\}}\big)^t$. However, note that
\[
\mathbf a_0^t\mathbf Y_0=(a_1+\cdots+a_m)\mathbf 1_{\{Y_i\le t_0\}}+a_{m+1}\mathbf 1_{\{Y_i\le t_{m+1}\}}+\cdots+a_k\mathbf 1_{\{Y_i\le t_k\}}
=a_1\mathbf 1_{\{Y_i\le t_1\}}+\cdots+a_k\mathbf 1_{\{Y_i\le t_k\}}=\mathbf a_k^t\mathbf Y_{1k},
\]
where $\mathbf a_k=(a_1,\ldots,a_k)^t$ and $\mathbf Y_{1k}=\big(\mathbf 1_{\{Y_i\le t_1\}},\ldots,\mathbf 1_{\{Y_i\le t_k\}}\big)^t$, as before. This means that $\mathbf a_0^t\Sigma^F_0\mathbf a_0=\mathbf a_k^t\Sigma^F_{HT}\mathbf a_k$, with $\Sigma^F_{HT}$ from (9.3). It follows that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$. By means of the Cramér-Wold device, this proves the lemma for the case $F(t_1)=\cdots=F(t_m)=F(t_0)<F(t_{m+1})<\cdots<F(t_k)<1$. The argument is the same for other cases with multiple $F(t_i)\in(0,1)$ being equal to each other.
Next, consider the case $F(t_1)=0$. Then $\mathbf 1_{\{Y_i\le t_1\}}=0$ with probability one. This means that the linear combination on the left hand side of (9.1) reduces to $a_2X_N^F(t_2)+\cdots+a_kX_N^F(t_k)$ and
\[
\Sigma_{HT}=\begin{pmatrix}0&0&\cdots&0\\0&&&\\\vdots&&\Sigma_{HT,k-1}&\\0&&&\end{pmatrix}, \tag{9.7}
\]
where $\Sigma_{HT,k-1}$ is the matrix in (9.4) based on $0<F(t_2)<\cdots<F(t_k)<1$. When $\mathbf a_{k-1}^t=(a_2,\ldots,a_k)\ne(0,\ldots,0)$, then
\[
\sigma_{HT}^2=\mathbf a_k^t\Sigma_{HT}\mathbf a_k=\mathbf a_{k-1}^t\Sigma_{HT,k-1}\mathbf a_{k-1}>0,
\]
because $\Sigma_{HT,k-1}$ is positive definite, due to (HT4) and Lemma 8.5. This allows application of Lemma 8.3 to (9.1). As in the previous cases, we conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$ given by (9.3). When $\mathbf a_k^t=(a_1,0,\ldots,0)$, with $a_1\ne0$, then both (9.1) and $a_1N_1+\cdots+a_kN_k$ are equal to zero. According to the Cramér-Wold device, this proves the lemma for the case $F(t_1)=0$.
It remains to consider the case $F(t_k)=1$. In this case, the $(k,k)$-th element of the matrix $\Sigma_{HT}$ in (9.4) is equal to $\mu_{\pi 1}+\mu_{\pi 2}$. We distinguish between $\mu_{\pi 1}+\mu_{\pi 2}=0$ and $\mu_{\pi 1}+\mu_{\pi 2}>0$. In the latter case, from the proof of Lemma 8.5 we find that $\Sigma_{HT}$ has determinant
\[
\mu_{\pi 1}^{k-1}F(t_1)\prod_{i=2}^{k}\big(F(t_i)-F(t_{i-1})\big)\,(\mu_{\pi 1}+\mu_{\pi 2})>0,
\]
using (HT4) and $0<F(t_1)<\cdots<F(t_{k-1})<F(t_k)=1$. This allows application of Lemma 8.3 to (9.1). As before, we conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma^F_{HT}$ from (9.3). According to the Cramér-Wold device, this proves the lemma for the case $F(t_k)=1$ and $\mu_{\pi 1}+\mu_{\pi 2}>0$.
Next, consider the case $F(t_k)=1$ and $\mu_{\pi 1}+\mu_{\pi 2}=0$. This means
\[
\Sigma_{HT} =
\begin{pmatrix}
 & & & 0\\
 & \Sigma_{HT,k-1} & & \vdots\\
 & & & 0\\
0 & \cdots & 0 & 0
\end{pmatrix}, \tag{9.8}
\]
where $\Sigma_{HT,k-1}$ is the matrix in (9.4) corresponding to $0<F(t_1)<\cdots<F(t_{k-1})<1$. When $a_{k-1}^t=(a_1,\ldots,a_{k-1})\ne(0,\ldots,0)$, then
\[
\sigma^2_{HT} = a_k^t\Sigma_{HT}a_k = a_{k-1}^t\Sigma_{HT,k-1}a_{k-1}>0,
\]
because $\Sigma_{HT,k-1}$ is positive definite, due to (HT4) and Lemma 8.5. This allows application of Lemma 8.3 to (9.1). As in the previous cases, we conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma^F_{HT}$ given by (9.3). When $a_k^t=(0,\ldots,0,a_k)$, with $a_k\ne0$, then $a_1N_1+\cdots+a_kN_k=0$ and
\[
a_1X^F_N(t_1)+\cdots+a_kX^F_N(t_k) = a_k\sqrt{n}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_i}-1\right)
\]
converges to zero in probability. The latter follows from the fact that, according to (HT1) and Lemma 9.1, we have that
\[
\sqrt{n}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_i}-1\right)\to N(0,\mu_{\pi 1}+\mu_{\pi 2}), \tag{9.9}
\]
in distribution under $P_{d,m}$. According to the Cramér-Wold device, this proves the lemma for the case $F(t_k)=1$ and $\mu_{\pi 1}+\mu_{\pi 2}=0$. Finally, the argument for the case that $F(t_1)=0$ and $F(t_k)=1$ simultaneously, either with or without repeated values among the $F(t_i)$'s, is completely similar. This finishes the proof. ✷
Proof of Proposition 4.1 The proof is similar to that of Theorem 4.2. We find that the limit behavior of $\sqrt{n}(F^{HJ}_N-F_N)$ is the same as that of the process $Y_N$ defined in (4.3). When we first consider the case of uniform $Y_i$'s with $F(t)=t$, tightness of the process $Y_N$ follows in the same way as in the proof of Theorem 4.2. It remains to establish weak convergence of the finite dimensional projections (8.19). This can be done in the same way as in the proof of Proposition 3.1, but this time with
\[
V_{ik} = a_1\big(\mathbf{1}_{\{Y_i\le t_1\}}-t_1\big)+\cdots+a_k\big(\mathbf{1}_{\{Y_i\le t_k\}}-t_k\big).
\]
From (HT1) and Lemma 9.1(i) we conclude that (8.20) converges in distribution to a mean zero normal random variable with variance
\[
\sigma^2_{HT} = \mu_{\pi 1}E_m V_{1k}^2 = a_k^t\widetilde{\Sigma}_ka_k,
\]
where $\widetilde{\Sigma}_k$ is the $k\times k$-matrix with $(q,r)$-element equal to $\mu_{\pi 1}(t_q\wedge t_r-t_qt_r)$. We conclude that (8.20) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\widetilde{\Sigma}_k$. By means of the Cramér-Wold device this establishes the limit distribution of (8.19), which is the same as that of the vector $\big(\mathbb{G}_{HJ}(t_1),\ldots,\mathbb{G}_{HJ}(t_k)\big)$, where $\mathbb{G}_{HJ}$ is a mean zero Gaussian process with covariance function
\[
E_{d,m}\,\mathbb{G}_{HJ}(s)\mathbb{G}_{HJ}(t) = \mu_{\pi 1}(s\wedge t-st).
\]
From here on, the proof is completely the same as that of Theorem 4.2. ✷
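The covariance function above is $\mu_{\pi 1}$ times the Brownian-bridge covariance: for $U$ uniform on $(0,1)$, $\mathrm{Cov}\big(\mathbf{1}_{\{U\le s\}},\mathbf{1}_{\{U\le t\}}\big)=s\wedge t-st$. A seeded Monte Carlo sketch of this identity (the values of $s$ and $t$ are arbitrary):

```python
import random

def indicator_cov(s, t, m=200000, seed=7):
    """Monte Carlo estimate of Cov(1{U<=s}, 1{U<=t}) for U ~ Uniform(0,1)."""
    rng = random.Random(seed)
    sum_a = sum_b = sum_ab = 0
    for _ in range(m):
        u = rng.random()
        a = 1 if u <= s else 0
        b = 1 if u <= t else 0
        sum_a += a
        sum_b += b
        sum_ab += a * b
    return sum_ab / m - (sum_a / m) * (sum_b / m)

s, t = 0.3, 0.7
exact = min(s, t) - s * t          # s ∧ t − st
approx = indicator_cov(s, t)
print(exact, round(approx, 3))
```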
Proof of Proposition 4.2 From relation (4.7) and Theorem 4.1 we know that the limit behavior of $\sqrt{n}(F^{HJ}_N-F)$ is the same as that of $\mathbb{G}^\pi_N$. Tightness of $\mathbb{G}^\pi_N$ has been obtained in the proof of Theorem 4.1. It remains to establish weak convergence of (8.17). This can be done in the same way as in the proof of Lemma 8.6, but this time with
\[
V_{ik} = a_1\big(\mathbf{1}_{\{Y_i\le t_1\}}-F(t_1)\big)+\cdots+a_k\big(\mathbf{1}_{\{Y_i\le t_k\}}-F(t_k)\big)
\]
and $\mu_k=0$. When $0<F(t_1)<\cdots<F(t_k)<1$, from (HT1) and Lemma 9.1 we find that $nS^2_N\to\mu_{\pi 1}E_m[V_{1k}^2]=a_k^t\Sigma_ka_k$, where
\[
\Sigma_k = \Big(\mu_{\pi 1}\big(F(t_q\wedge t_r)-F(t_q)F(t_r)\big)\Big)_{q,r=1}^{k}. \tag{9.10}
\]
From condition (i) of Proposition 3.2 and Lemma 8.5, it follows that $\Sigma_k$ is positive definite, so that $a_k^t\Sigma_ka_k>0$. Hence, according to Lemma 8.3, the right hand side of (8.18) converges in distribution under $P_{d,m}$ to a mean zero normal random variable with variance $(\mu_{\pi 1}+\lambda)E_m[V_{1k}^2]=a_k^t\Sigma^F_{HJ}a_k$, where
\[
\Sigma^F_{HJ} = \Big((\mu_{\pi 1}+\lambda)\big(F(t_q\wedge t_r)-F(t_q)F(t_r)\big)\Big)_{q,r=1}^{k}. \tag{9.11}
\]
We conclude that the right hand side of (8.18) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HJ}$. By means of the Cramér-Wold device, this proves weak convergence of $\big(\mathbb{G}^\pi_N(t_1),\ldots,\mathbb{G}^\pi_N(t_k)\big)$ for the case that $0<F(t_1)<\cdots<F(t_k)<1$. As in the proof of Lemma 8.6, the case where the $F(t_i)$'s are not all distinct, but satisfy $0<F(t_i)<1$, the case $F(t_1)=0$, and the case $F(t_k)=1$, can be reduced to the previous case. From here on, the proof is completely the same as that of Theorem 4.1. ✷
Proof of (5.2) Following [Dd08], one can write $\phi=\psi_2\circ\psi_1$, where
\[
\psi_1(F) = \big(F,\,\beta F^{-1}(\alpha)\big),
\qquad
\psi_2(F,x) = F(x).
\]
The Hadamard-derivative of $\phi$ can then be obtained from the chain rule, e.g., see Lemma 3.9.3 in [vdVW96]. According to Lemma 3.9.20 in [vdVW96], for $0<\alpha<1$ and $F\in D_\phi$ that have a positive derivative at $F^{-1}(\alpha)$, the map $\psi_1$ is Hadamard-differentiable at $F$ tangentially to the set of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$, with derivative
\[
\psi'_{1,F}(h) = \left(h,\,-\beta\frac{h(F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\right).
\]
It is fairly straightforward to show that for $F$ that are differentiable at $x$, the mapping $\psi_2$ is Hadamard-differentiable at $(F,x)$ tangentially to the set of pairs $(h,\epsilon)$, such that $h$ is continuous at $x$ and $\epsilon\in\mathbb{R}$, with derivative
\[
\psi'_{2,(F,x)}(h,\epsilon) = \epsilon f(x)+h(x).
\]
Then for $F\in D_\phi$ that are differentiable at $\beta F^{-1}(\alpha)$, the mapping $\psi_2$ is Hadamard-differentiable at $\psi_1(F)=\big(F,\beta F^{-1}(\alpha)\big)$. It follows from the chain rule that $\phi(F)=F\big(\beta F^{-1}(\alpha)\big)=\psi_2\circ\psi_1(F)$ is Hadamard-differentiable at $F$ tangentially to the set $D_0$ consisting of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$, with derivative
\[
\phi'_F(h) = -\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\,h(F^{-1}(\alpha))+h(\beta F^{-1}(\alpha)).
\]
✷
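The chain-rule formula for $\phi'_F(h)$ can be sanity-checked with a finite difference in the direction $h$. The sketch below uses illustrative choices that are not from the paper: $F$ standard exponential, $h(x)=\sin(x)e^{-x}$, $\alpha=1/2$, $\beta=2$, and a bisection solver for the quantile:

```python
import math

# Illustrative choices (assumptions for this example, not from the paper).
F = lambda x: 1.0 - math.exp(-x)        # standard exponential cdf
f = lambda x: math.exp(-x)              # its density
h = lambda x: math.sin(x) * math.exp(-x)
alpha, beta = 0.5, 2.0

def quantile(G, p, lo=0.0, hi=50.0):
    """Solve G(x) = p by bisection on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi(G):
    """phi(G) = G(beta * G^{-1}(alpha))."""
    return G(beta * quantile(G, alpha))

# Central finite difference of t -> phi(F + t*h) at t = 0.
t = 1e-5
fd = (phi(lambda x: F(x) + t * h(x)) - phi(lambda x: F(x) - t * h(x))) / (2.0 * t)

# Hadamard derivative from the chain rule above.
q = quantile(F, alpha)                   # F^{-1}(alpha) = log 2
analytic = -beta * f(beta * q) / f(q) * h(q) + h(beta * q)
print(round(fd, 6), round(analytic, 6))
```

The two numbers agree to several decimals, as expected from Hadamard differentiability along the smooth path $F+th$.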
Proof of Corollary 5.1 The mapping $\phi:D_\phi\subset D(\mathbb{R})\mapsto\mathbb{R}$ is Hadamard-differentiable at $F$ tangentially to the set $D_0$ consisting of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$. According to Theorem 3.2, the sequence $\sqrt{n}(F^{HT}_N-F)$ converges weakly to a mean zero Gaussian process $\mathbb{G}^{HT}_F$ with covariance structure
\[
E_{d,m}\,\mathbb{G}^{HT}_F(s)\mathbb{G}^{HT}_F(t) = (\mu_{\pi 1}+\lambda)F(s\wedge t)+(\mu_{\pi 2}-\lambda)F(s)F(t), \tag{9.12}
\]
for $s,t\in\mathbb{R}$. It then follows from Theorem 3.9.4 in [vdVW96], that the random variable $\sqrt{n}\big(\phi(F^{HT}_N)-\phi(F)\big)$ converges weakly to
\[
-\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\,\mathbb{G}^{HT}_F(F^{-1}(\alpha))+\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha)),
\]
which has a normal distribution with mean zero and variance
\[
\begin{aligned}
\sigma^2_{HT,\alpha,\beta}
&= \beta^2\frac{f(\beta F^{-1}(\alpha))^2}{f(F^{-1}(\alpha))^2}\,E\big[\mathbb{G}^{HT}_F(F^{-1}(\alpha))^2\big]
+E\big[\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha))^2\big]\\
&\quad-2\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\,E\big[\mathbb{G}^{HT}_F(F^{-1}(\alpha))\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha))\big].
\end{aligned}
\]
The precise expression can then be derived from (9.12), which proves part one. For part two, write
\[
\sqrt{n}\big(\phi(F^{HT}_N)-\phi(F_N)\big)
= \sqrt{n}\big(\phi(F^{HT}_N)-\phi(F)\big)-\frac{\sqrt{n}}{\sqrt{N}}\,\sqrt{N}\big(\phi(F_N)-\phi(F)\big).
\]
The process $\sqrt{N}(F_N-F)$ converges weakly to a mean zero Gaussian process $\mathbb{G}_F$. Then, Hadamard-differentiability of $\phi$ together with Theorem 3.9.4 in [vdVW96] yields that the sequence $\sqrt{N}\big(\phi(F_N)-\phi(F)\big)$ converges weakly to $\phi'_F(\mathbb{G}_F)$. As $n/N\to0$, the corollary follows from part one. ✷
Proof of Corollary 5.2 The proof is completely the same as that of Corollary 5.1, with the only difference that the covariance structure of the limiting process of $\sqrt{n}\big(\phi(F^{HJ}_N)-\phi(F)\big)$ is now given in Theorem 4.3. ✷
9.2 Additional Lemmas
Lemma 9.1. Let $S^2_N$ be defined by (3.2), where $V_1,V_2,\ldots$ is a sequence of i.i.d. random variables on $(\Omega,\mathcal{F},P_m)$ with $E_m[V_1^4]<\infty$. Suppose that $n$ and $\pi_i,\pi_{ij}$, for $i,j=1,2,\ldots,N$, are deterministic, and let $V_m(S^2_N)$ denote the variance of $S^2_N$. If (C1)-(C2) hold, then $n^2V_m[S^2_N]=O(1/N)$. Moreover,

(i) if $E_m[V_1]=0$ and condition (i) in Proposition 3.1 holds,
\[
nS^2_N\to\sigma^2_{HT}=\mu_{\pi 1}E_m[V_1^2],\quad\text{in }P_m\text{-probability};
\]
(ii) if $E_m[V_1]\ne0$ and conditions (i)-(ii) in Proposition 3.1 hold,
\[
nS^2_N\to\sigma^2_{HT}=\mu_{\pi 1}E_m[V_1^2]+\mu_{\pi 2}\big(E_m[V_1]\big)^2,\quad\text{in }P_m\text{-probability}.
\]
Proof. For any $\epsilon>0$, by the Markov inequality we have
\[
P_m\big(|nS^2_N-E_m[nS^2_N]|>\epsilon\big)\le\frac{n^2V_m[S^2_N]}{\epsilon^2}, \tag{9.13}
\]
where $V_m$ denotes the variance of $S^2_N$ under the super-population model. In order to compute $V_m[S^2_N]$, we first have
\[
E_m[S^2_N]
= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,E_m(V_iV_j)
= \frac{E_m[V_1^2]}{N^2}\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}
+\frac{(E_m[V_1])^2}{N^2}\mathop{\sum\sum}_{i\ne j}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}. \tag{9.14}
\]
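The second equality in (9.14) splits the double sum into diagonal terms, where $\pi_{ii}=\pi_i$ gives $(\pi_{ii}-\pi_i^2)/\pi_i^2=(1-\pi_i)/\pi_i$, and off-diagonal terms. A quick numeric sketch of this identity, with arbitrary illustrative $\pi_i$, $\pi_{ij}$ and moments of $V_1$:

```python
# Check of the second equality in (9.14): the full double sum, with
# E_m(V_i V_j) = E[V^2] on the diagonal and (E[V])^2 off the diagonal,
# equals the two-term expression. All numbers below are illustrative.
N = 4
pi = [0.3, 0.5, 0.6, 0.8]
# symmetric second-order inclusion probabilities; by convention pi_ii = pi_i
pi2 = [[pi[i] if i == j else 0.9 * pi[i] * pi[j] for j in range(N)]
       for i in range(N)]
EV, EV2 = 0.7, 1.3          # illustrative first and second moments of V_1

lhs = sum(
    (pi2[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * (EV2 if i == j else EV ** 2)
    for i in range(N) for j in range(N)
) / N ** 2

rhs = (EV2 / N ** 2) * sum((1 - p) / p for p in pi) + (EV ** 2 / N ** 2) * sum(
    (pi2[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j])
    for i in range(N) for j in range(N) if i != j
)

print(abs(lhs - rhs) < 1e-12)
```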
From this, tedious but straightforward calculus leads to the expression for $(E_m[S^2_N])^2$ and $E_m[S^4_N]$. One finds
\[
N^4\big(E_m[S^2_N]\big)^2 = a_1(E_m[V_1])^4+a_2E_m[V_1^2](E_m[V_1])^2+a_3\big(E_m[V_1^2]\big)^2,
\]
where, according to (C1)-(C2):
\[
\begin{aligned}
a_1 &= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+4\mathop{\sum\sum\sum}_{(i,j,l)\in D_{3,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{il}-\pi_i\pi_l}{\pi_i\pi_l}
+2\mathop{\sum\sum}_{(i,j)\in D_{2,N}}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\right)^2\\
&= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2)+O(N^2/n^2),\\
a_2 &= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+4\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{ik}-\pi_i\pi_k}{\pi_i\pi_k}
= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
a_3 &= \mathop{\sum\sum}_{(i,j)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_j}{\pi_j}
+\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)^2
= \mathop{\sum\sum}_{(i,j)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_j}{\pi_j}+O(N^3/n^2).
\end{aligned}
\]
Furthermore,
\[
N^4E_m[S^4_N] = b_1(E_m[V_1])^4+b_2E_m[V_1^2](E_m[V_1])^2+b_3\big(E_m[V_1^2]\big)^2+b_4E_m[V_1]E_m[V_1^3],
\]
where
\[
\begin{aligned}
b_1 &= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)^2
= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
b_2 &= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+4\mathop{\sum\sum\sum}_{(i,j,l)\in D_{3,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{il}-\pi_i\pi_l}{\pi_i\pi_l}
= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
b_3 &= \mathop{\sum\sum}_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_k}{\pi_k}
+2\mathop{\sum\sum}_{(i,j)\in D_{2,N}}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\right)^2
= \mathop{\sum\sum}_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_k}{\pi_k}+O(N^2/n^2),\\
b_4 &= 4\mathop{\sum\sum}_{(i,j)\in D_{2,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{1-\pi_j}{\pi_j}
= O(N^3/n^2).
\end{aligned}
\]
The variance expression for $S^2_N$ is deduced easily from the previous computations. From the expression derived in [BLRG15], we find that $a_i-b_i=O(N^3/n^2)$, for $i=1,2,3$, and $b_4=O(N^3/n^2)$, so that
\[
n^2V_m[S^2_N] = n^2E_m[S^4_N]-n^2\big(E_m[S^2_N]\big)^2 = O(1/N). \tag{9.15}
\]
From (9.13) we conclude that $nS^2_N-E_m[nS^2_N]$ tends to zero in $P_m$-probability. As a consequence, statements (i) and (ii) follow from (9.14).
Lemma 9.2. If $x_N\rightsquigarrow x$ and $y_N\rightsquigarrow y$ in $D[0,1]$ with the Skorohod metric, and $x,y\in C[0,1]$, then the sequence $\{x_N+y_N\}$ is also tight in $D[0,1]$.

Proof. We use Theorem 13.2 from [Bil99]. The first condition follows easily, since
\[
\sup_{t\in[0,1]}|x_N(t)+y_N(t)|\le\sup_{t\in[0,1]}|x_N(t)|+\sup_{t\in[0,1]}|y_N(t)|.
\]
Because $x_N\rightsquigarrow x$ and $y_N\rightsquigarrow y$, both sequences $\{x_N\}$ and $\{y_N\}$ are tight, so that they satisfy the first condition of Theorem 13.2 individually. For condition (ii) of Theorem 13.2 in [Bil99], choose $\epsilon>0$ and $\eta>0$. According to (12.7) in [Bil99], for any $0<\delta<1/2$,
\[
w'_x(\delta)\le w_x(2\delta).
\]
This means that
\[
P\big(w'_{x_N+y_N}(\delta)\ge\epsilon\big)
\le P\{w_{x_N+y_N}(2\delta)\ge\epsilon\}
\le P\{w_{x_N}(2\delta)\ge\epsilon/2\}+P\{w_{y_N}(2\delta)\ge\epsilon/2\}.
\]
Consider the first probability. Since $x_N\rightsquigarrow x$ in $D[0,1]$ with the Skorohod metric, according to the almost sure representation theorem (see, e.g., Theorem 11.7.2 in [Dud02]), there exist $\widetilde{x}_N$ and $\widetilde{x}$, having the same distribution as $x_N$ and $x$, respectively, such that $\widetilde{x}_N\to\widetilde{x}$, with probability one, in the Skorohod metric. Because $\widetilde{x}\overset{d}{=}x$ and $x\in C[0,1]$, also $\widetilde{x}\in C[0,1]$. Hence, since $\widetilde{x}$ is continuous, it follows that
\[
\sup_{t\in[0,1]}|\widetilde{x}_N(t)-\widetilde{x}(t)|\to0,\quad\text{with probability one.} \tag{9.16}
\]
We then find that
\[
\begin{aligned}
P\{w_{x_N}(2\delta)\ge\epsilon/2\}
&= P\Big\{\sup_{|s-t|<2\delta}|x_N(s)-x_N(t)|\ge\epsilon/2\Big\}
= P\Big\{\sup_{|s-t|<2\delta}|\widetilde{x}_N(s)-\widetilde{x}_N(t)|\ge\epsilon/2\Big\}\\
&\le P\Big\{\sup_{|s-t|<2\delta}|\widetilde{x}(s)-\widetilde{x}(t)|\ge\epsilon/4\Big\}
+P\Big\{\sup_{s\in[0,1]}|\widetilde{x}_N(s)-\widetilde{x}(s)|\ge\epsilon/8\Big\}
+P\Big\{\sup_{t\in[0,1]}|\widetilde{x}_N(t)-\widetilde{x}(t)|\ge\epsilon/8\Big\}.
\end{aligned}
\]
The latter two probabilities tend to zero due to (9.16). For the first probability on the right hand side, note that $C[0,1]$ is separable and complete. This means that each random element in $C[0,1]$ is tight. Hence, $\widetilde{x}\in C[0,1]$ is tight, so that according to Theorem 7.3 in [Bil99], there exists a $0<\delta<1/2$, such that
\[
P\Big\{\sup_{|s-t|<2\delta}|\widetilde{x}(s)-\widetilde{x}(t)|\ge\epsilon/4\Big\} = P\{w_{\widetilde{x}}(2\delta)\ge\epsilon/4\}\le\eta.
\]
We conclude that $\limsup_{N\to\infty}P\{w_{x_N}(2\delta)\ge\epsilon/2\}\le\eta$, and the same result for $y_N$ can be obtained similarly. This proves the lemma.