arXiv:1509.09273v1 [math.ST] 30 Sep 2015
Functional central limit theorems
in survey sampling
Hélène Boistard1, Hendrik P. Lopuhaä2, and Anne Ruiz-Gazen3
1Toulouse School of Economics
2Delft University of Technology
3Toulouse School of Economics
October 1, 2015
Abstract
For a joint model-based and design-based inference, we establish functional central limit
theorems for the Horvitz-Thompson empirical process and the Hájek empirical process centered
by their finite population mean as well as by their super-population mean in a survey sampling
framework. The results apply to generic sampling designs and essentially only require conditions
on higher order correlations. We apply our main results to a Hadamard differentiable statistical
functional and illustrate its limit behavior by means of a computer simulation.
1 Introduction
Functional central limit theorems are well established in statistics. Much of the theory has been de-
veloped for empirical processes of independent summands. In combination with the functional delta-
method they have become a very powerful tool for investigating the limit behavior for Hadamard or
Fréchet differentiable statistical functionals (e.g., see [vdVW96] or [vdV98] for a rigorous treatment
with several applications).
In survey sampling, results on functional central limit theorems are far from complete. At the
same time there is a need for such results. For instance, in [Dd08] the limit distribution of several
statistical functionals is investigated, under the assumption that such a limit theorem exists for a
design-based empirical process, whereas in [BD09] the existence of a functional central limit theorem
is assumed, to perform model-based inference on several Gini indices. Weak convergence of processes
in combination with the delta method is treated in [Bha07], [Dav09], and [BM11], but these results
are tailor-made for specific statistical functionals and do not apply to the empirical processes that
are typically considered in survey sampling.
Recently, functional central limit theorems for empirical processes in survey sampling have ap-
peared in the literature. Most of them are concerned with empirical processes indexed by a class
of functions, see [BW07], [SW13], and [BCC14]. However, the results in [BW07] and [SW13]
are restricted to sampling schemes that have exchangeable inclusion indicators and constant inclu-
sion probabilities, such as simple random sampling and Bernoulli sampling, whereas the approach
in [BCC14] seems difficult to extend to sampling designs other than those that are closely related
to Poisson sampling. [Wan12] considers empirical processes indexed by a real-valued parameter.
Unfortunately, this paper seems to miss a number of assumptions that cannot be avoided and, more
importantly, it seems to contain a flaw in the proof (see Section 7 for a more detailed discussion).
The main purpose of the present paper is to establish functional central limit theorems for
the Horvitz-Thompson and the Hájek empirical distribution function that apply to general sam-
pling designs. For design-based inference about finite population parameters, these empirical dis-
tribution functions will be centered around their population mean. On the other hand, in many
situations involving survey data, one is interested in the corresponding model parameters (e.g.,
see [KG98] and [BR09]). Recently, Rubin-Bleuer and Schiopu Kratina [RBSK05] defined a mathe-
matical framework for joint model-based and design-based inference through a probability product-
space and introduced a general and unified methodology for studying the asymptotic properties
of model parameter estimators. To incorporate both types of inferences, we consider the Horvitz-
Thompson empirical process and the Hájek empirical process under the super-population model
described in [RBSK05], both centered around their finite population mean as well as around their
super-population mean. Our main results are functional central limit theorems for both empirical
processes indexed by a real-valued parameter and apply to generic sampling schemes. These results
are established under only the standard assumptions that one encounters in asymptotic
theory in survey sampling. Our approach was inspired by an unpublished manuscript by Philippe
Fevrier and Nicolas Ragache, which was the outcome of an internship at INSEE in 2001.
The article is organized as follows. Notations and assumptions are discussed in Section 2.
In particular we briefly discuss the joint model-based and design-based inference setting defined
in [RBSK05]. In Sections 3 and 4, we list the assumptions and state our main results. Our
assumptions essentially concern the inclusion probabilities of the sampling design up to the fourth
order and a central limit theorem (CLT) for the Horvitz-Thompson estimator of a population total
for i.i.d. bounded random variables. Our results allow random inclusion probabilities and are stated
in terms of the design-based expected sample size, but we also formulate more detailed results in
case these quantities are deterministic.
As an application of our results, in combination with the functional delta-method, we obtain the
limit distribution of the poverty rate in Section 5. This example is further investigated in Section 6
by means of a simulation. Finally, in Section 7 we discuss in detail the differences of our results
with the work by [BW07], [SW13], [Wan12], and [BCC14]. All proofs are deferred to Section 8 and
some tedious technicalities can be found in [BLRG15].
2 Notations and assumptions
We adopt the super-population setup as described in [RBSK05]. Consider a sequence of finite
populations $(U_N)$, of sizes $N = 1, 2, \ldots$. With each population we associate a set of indices
$U_N = \{1, 2, \ldots, N\}$. Furthermore, for each index $i \in U_N$, we have a tuple
$(y_i, z_i) \in \mathbb{R} \times \mathbb{R}_+^q$. We denote $\mathbf{y}_N = (y_1, y_2, \ldots, y_N) \in \mathbb{R}^N$
and $\mathbf{z}_N \in \mathbb{R}_+^{q \times N}$ similarly. The vector $\mathbf{y}_N$ contains the values of the
variable of interest and $\mathbf{z}_N$ contains information for the sampling design. We assume that the values
in each finite population are realizations of random variables $(Y_i, Z_i) \in \mathbb{R} \times \mathbb{R}_+^q$,
for $i = 1, 2, \ldots, N$, on a common probability space $(\Omega, \mathcal{F}, \mathbb{P}_m)$. Similarly, we denote
$\mathbf{Y}_N = (Y_1, Y_2, \ldots, Y_N) \in \mathbb{R}^N$ and $\mathbf{Z}_N \in \mathbb{R}_+^{q \times N}$.

To incorporate the sampling design, a product space is defined as follows. For all
$N = 1, 2, \ldots$, let $\mathcal{S}_N = \{s : s \subset U_N\}$ be the collection of subsets of $U_N$ and let
$\mathcal{A}_N = \sigma(\mathcal{S}_N)$ be the $\sigma$-algebra generated by $\mathcal{S}_N$. A sampling design
associated to some sampling scheme is a function $p : \mathcal{A}_N \times \mathbb{R}_+^{q \times N} \mapsto [0, 1]$,
such that

(i) for all $s \in \mathcal{S}_N$, $\mathbf{z}_N \mapsto p(s, \mathbf{z}_N)$ is a Borel-measurable function on $\mathbb{R}_+^{q \times N}$;

(ii) for all $\mathbf{z}_N \in \mathbb{R}_+^{q \times N}$, $A \mapsto p(A, \mathbf{z}_N)$ is a probability measure on $\mathcal{A}_N$.

Note that for each $\omega \in \Omega$, we can define a probability measure
$A \mapsto \mathbb{P}_d(A, \omega) = \sum_{s \in A} p(s, \mathbf{Z}_N(\omega))$
on the design space $(\mathcal{S}_N, \mathcal{A}_N)$. Corresponding expectations will be denoted by
$\mathbb{E}_d(\cdot, \omega)$. Next, we
define a product probability space that includes the super-population and the design space, under
the premise that sample selection and the model characteristic are independent given the design
variables. Let $(\mathcal{S}_N \times \Omega, \mathcal{A}_N \times \mathcal{F})$ be the product space with probability
measure $\mathbb{P}_{d,m}$ defined on simple rectangles $\{s\} \times E \in \mathcal{A}_N \times \mathcal{F}$ by
\[
\mathbb{P}_{d,m}(\{s\} \times E) = \int_E p(s, \mathbf{Z}_N(\omega)) \, \mathrm{d}\mathbb{P}_m(\omega)
= \int_E \mathbb{P}_d(\{s\}, \omega) \, \mathrm{d}\mathbb{P}_m(\omega).
\]
When taking expectations or computing probabilities, we will emphasize whether this is with respect
to the measure $\mathbb{P}_{d,m}$ associated with the product space $(\mathcal{S}_N \times \Omega, \mathcal{A}_N \times \mathcal{F})$,
the measure $\mathbb{P}_d$ associated with the design space $(\mathcal{S}_N, \mathcal{A}_N)$, or the measure
$\mathbb{P}_m$ associated with the super-population space $(\Omega, \mathcal{F})$.
If $n_s$ denotes the size of sample $s$, then this may depend on the specific sampling design, including
the values of the design variables $Z_1(\omega), \ldots, Z_N(\omega)$. Similarly, the inclusion probabilities may
depend on the values of the design variables:
$\pi_i(\omega) = \mathbb{E}_d(\xi_i, \omega) = \sum_{s \ni i} p(s, \mathbf{Z}_N(\omega))$,
where $\xi_i$ is the indicator $1_{\{s \ni i\}}$. Instead of $n_s$, we will consider
$n = \mathbb{E}_d[n_s(\omega)] = \sum_{i=1}^N \mathbb{E}_d(\xi_i, \omega) = \sum_{i=1}^N \pi_i(\omega)$. This
means that the inclusion probabilities and the design-based expected sample size may be random
variables on $(\Omega, \mathcal{F}, \mathbb{P}_m)$.
We first consider the Horvitz-Thompson (HT) empirical processes, obtained from the HT empirical c.d.f.:
\[
F_N^{HT}(t) = \frac{1}{N} \sum_{i=1}^N \frac{\xi_i 1_{\{Y_i \le t\}}}{\pi_i}, \quad t \in \mathbb{R}. \tag{2.1}
\]
We will consider the HT empirical process $\sqrt{n}(F_N^{HT} - F_N)$, obtained by centering around the empirical
c.d.f. $F_N$ of $Y_1, \ldots, Y_N$, as well as the HT empirical process $\sqrt{n}(F_N^{HT} - F)$, obtained by centering
around the c.d.f. $F$ of the $Y_i$'s. A functional central limit theorem for both processes will be
formulated in Section 3. In addition, we will consider the Hájek empirical c.d.f.:
\[
F_N^{HJ}(t) = \frac{1}{\widehat{N}} \sum_{i=1}^N \frac{\xi_i 1_{\{Y_i \le t\}}}{\pi_i}, \quad t \in \mathbb{R}, \tag{2.2}
\]
where $\widehat{N} = \sum_{i=1}^N \xi_i / \pi_i$ is the HT estimator for the population total $N$. Functional central limit
theorems for $\sqrt{n}(F_N^{HJ} - F_N)$ and $\sqrt{n}(F_N^{HJ} - F)$ will be provided in Section 4. The advantage of our
results is that they allow general sampling schemes and that we primarily require bounds on the
rate at which higher order correlations tend to zero $\omega$-almost surely, under the design measure $\mathbb{P}_d$.
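To make the two estimators concrete, here is a small numerical sketch of (2.1) and (2.2); this is our illustration, not part of the paper, and the function name `ht_hajek_ecdf` is ours. It assumes NumPy.

```python
import numpy as np

def ht_hajek_ecdf(y, xi, pi, t):
    """Horvitz-Thompson and Hajek empirical c.d.f.'s at threshold t,
    following (2.1) and (2.2): sampled units (xi == 1) are weighted
    by their inverse inclusion probabilities."""
    w = xi / pi                        # design weights xi_i / pi_i
    N = len(y)
    N_hat = w.sum()                    # HT estimator of the population size N
    ht = (w * (y <= t)).sum() / N      # (2.1): normalize by the true N
    hj = (w * (y <= t)).sum() / N_hat  # (2.2): normalize by the estimate
    return ht, hj

# toy check: a full census (all units sampled with pi_i = 1), where
# both estimators reduce to the ordinary empirical c.d.f.
y = np.array([0.2, 0.5, 1.0, 2.0])
xi = np.ones(4)
pi = np.ones(4)
ht, hj = ht_hajek_ecdf(y, xi, pi, 0.6)  # 2 of the 4 values are <= 0.6
```

The only difference between the two estimators is the normalization: $N$ for Horvitz-Thompson versus the estimated $\widehat{N}$ for Hájek.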
3 FCLT’s for the Horvitz-Thompson empirical processes
A functional central limit theorem for $\sqrt{n}(F_N^{HT} - F_N)$ and $\sqrt{n}(F_N^{HT} - F)$ is obtained by proving weak
convergence of all finite dimensional distributions and tightness. In order to establish the latter for
general sampling schemes, we impose a number of conditions that involve the sets
\[
D_{\nu,N} = \left\{ (i_1, i_2, \ldots, i_\nu) \in \{1, 2, \ldots, N\}^\nu : i_1, i_2, \ldots, i_\nu \text{ all different} \right\}, \tag{3.1}
\]
for the integers $1 \le \nu \le 4$. We assume the following conditions:
(C1) there exist constants $K_1, K_2$, such that for all $i = 1, 2, \ldots, N$,
\[
0 < K_1 \le \frac{N \pi_i}{n} \le K_2 < \infty, \quad \omega\text{-a.s.}
\]
There exists a constant $K_3 > 0$, such that for all $N = 1, 2, \ldots$:

(C2) $\max_{(i,j) \in D_{2,N}} \left| \mathbb{E}_d(\xi_i - \pi_i)(\xi_j - \pi_j) \right| < K_3 \, n / N^2$,

(C3) $\max_{(i,j,k) \in D_{3,N}} \left| \mathbb{E}_d(\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k) \right| < K_3 \, n^2 / N^3$,

(C4) $\max_{(i,j,k,l) \in D_{4,N}} \left| \mathbb{E}_d(\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)(\xi_l - \pi_l) \right| < K_3 \, n^2 / N^4$,

$\omega$-almost surely. These conditions on higher order correlations are commonly used in the literature
on survey sampling in order to derive asymptotic properties of estimators (e.g., see [BO00]
and [CCGL10]). [BO00] proved that they hold for simple random sampling without replacement and
stratified simple random sampling without replacement, whereas [BLRG12] proved that they also hold
for rejective sampling. Lemma 2 from [BLRG12] allows us to reformulate the above conditions
on higher order correlations into conditions on higher order inclusion probabilities.
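As an illustration of (C2), added here by us and not part of the paper, the pairwise covariances under simple random sampling without replacement can be computed exactly from $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$, and the bound holds with $K_3 = 1$; for Poisson sampling the indicators are independent, so the covariance for $i \ne j$ is exactly zero.

```python
from fractions import Fraction

def srswor_pair_cov(N, n):
    """Exact E_d(xi_i - pi_i)(xi_j - pi_j) for i != j under simple random
    sampling without replacement: pi_ij - pi_i * pi_j."""
    pi = Fraction(n, N)
    pi_ij = Fraction(n * (n - 1), N * (N - 1))
    return pi_ij - pi * pi

# (C2) asks |E_d(xi_i - pi_i)(xi_j - pi_j)| < K3 * n / N^2; K3 = 1 suffices here
for N, n in [(100, 10), (1000, 50), (10, 9)]:
    cov = srswor_pair_cov(N, n)
    assert abs(cov) <= Fraction(n, N * N)
```

Exact rational arithmetic (`fractions.Fraction`) avoids any floating-point slack in the comparison.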
To establish the convergence of finite dimensional distributions, for sequences of bounded i.i.d. random
variables $V_1, V_2, \ldots$ on $(\Omega, \mathcal{F}, \mathbb{P}_m)$, we will need a CLT for the HT estimator in the design space,
conditionally on the $V_i$'s. To this end, let $S_N^2$ be the (design-based) variance of the HT estimator
of the population mean, i.e.,
\[
S_N^2 = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} V_i V_j. \tag{3.2}
\]
We assume that

(HT1) For $N$ sufficiently large, $S_N > 0$ and for any sequence of bounded i.i.d. random variables
$V_1, V_2, \ldots$,
\[
\frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right) \to N(0, 1), \quad \omega\text{-a.s.},
\]
in distribution under $\mathbb{P}_d$.

Note that (HT1) holds for simple random sampling without replacement if $n(N - n)/N$ tends to
infinity when $N$ tends to infinity (see [Tho97]), as well as for Poisson sampling under some conditions
on the first order inclusion probabilities (e.g., see [Ful09]). For rejective sampling, [Háj64] gives some
sufficient conditions for (HT1) to hold.
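As a sanity check of our reading of (3.2), not part of the paper: for simple random sampling without replacement the double sum reduces to the classical without-replacement variance $(1 - n/N)\, S_V^2 / n$, with $S_V^2$ the population variance using the $(N-1)$-divisor. A brute-force evaluation confirms this.

```python
import numpy as np

def sn2_design_variance(V, N, n):
    """S_N^2 from (3.2), evaluated by brute force for simple random sampling
    without replacement, where pi_i = n/N and pi_ij = n(n-1)/(N(N-1))."""
    pi = n / N
    pij = n * (n - 1) / (N * (N - 1))
    total = 0.0
    for i in range(N):
        for j in range(N):
            if i == j:
                # diagonal: pi_ii = pi_i, so the weight is (pi - pi^2)/pi^2
                total += (pi - pi * pi) / (pi * pi) * V[i] * V[j]
            else:
                total += (pij - pi * pi) / (pi * pi) * V[i] * V[j]
    return total / N**2

rng = np.random.default_rng(0)
N, n = 40, 10
V = rng.uniform(size=N)
# classical closed form: (1 - n/N) * S_V^2 / n
closed = (1 - n / N) * V.var(ddof=1) / n
assert abs(sn2_design_variance(V, N, n) - closed) < 1e-12
```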
We also need that $n S_N^2$ converges for the particular case where the $V_i$'s are random vectors
consisting of indicators $1_{\{Y_j \le t\}}$.

(HT2) For $k \in \{1, 2, \ldots\}$, $i = 1, 2, \ldots, k$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, define
$\mathbf{Y}_{ik}^t = \left( 1_{\{Y_i \le t_1\}}, \ldots, 1_{\{Y_i \le t_k\}} \right)$.
There exists a deterministic matrix $\Sigma_k^{HT}$, such that
\[
\lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{Y}_{ik} \mathbf{Y}_{jk}^t = \Sigma_k^{HT}, \quad \omega\text{-a.s.} \tag{3.3}
\]
This kind of assumption is quite standard in the literature on survey sampling and is usually
imposed for general random vectors (see, for example, [DS92], p. 379, [FF91], condition 3 on page 457,
or [KR81], condition C4 on page 1014). It suffices to require (3.3) for
$\mathbf{Y}_{ik}^t = \left( 1_{\{Y_i \le t_1\}}, \ldots, 1_{\{Y_i \le t_k\}} \right)$.
Moreover, if (C1)-(C2) hold, then the sequence in (3.3) is bounded, so that by dominated convergence
it follows that
\[
\Sigma_k^{HT} = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{Y}_{ik} \mathbf{Y}_{jk}^t \right]. \tag{3.4}
\]
This might help to get a more tractable expression for $\Sigma_k^{HT}$.
We are now able to formulate our first main result. Let $D(\mathbb{R})$ be the space of càdlàg functions
on $\mathbb{R}$ equipped with the Skorohod topology.

Theorem 3.1. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $F_N$,
and let $F_N^{HT}$ be defined in (2.1). Suppose that conditions (C1)-(C4) and (HT1)-(HT2) hold. Then
$\sqrt{n}(F_N^{HT} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance
function
\[
\mathbb{E}_m \mathbb{G}^{HT}(s) \mathbb{G}^{HT}(t) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} 1_{\{Y_i \le s\}} 1_{\{Y_j \le t\}} \right],
\]
for $s, t \in \mathbb{R}$.
Note that Theorem 3.1 allows a random (design-based) expected sample size $n$ and random
inclusion probabilities. However, the expression of the covariance function of the limiting Gaussian
process is somewhat unsatisfactory. When $n$ and the inclusion probabilities are deterministic, we can
obtain a functional CLT with a more precise expression for $\mathbb{E}_m \mathbb{G}^{HT}(s)\mathbb{G}^{HT}(t)$ under slightly weaker
conditions. This is formulated in the proposition below. Note that by imposing conditions (i)-(ii)
in Proposition 3.1 instead of (3.3), convergence of $n S_N^2$ is not necessarily guaranteed. However, this
is established in Lemma 9.1 in [BLRG15] under (C1) and (C2).

Finally, we would like to emphasize that if we had imposed (HT2) for any sequence $Y_1, Y_2, \ldots$
of bounded random vectors, then (HT2) would have implied conditions (i)-(ii) in the deterministic
setup of Proposition 3.1.
Proposition 3.1. Consider the setting of Theorem 3.1, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose that (C1)-(C4) and (HT1) hold, but instead of (HT2) assume that there
exist constants $\mu_{\pi 1}, \mu_{\pi 2} \in \mathbb{R}$ such that
\[
\text{(i)} \quad \lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \left( \frac{1}{\pi_i} - 1 \right) = \mu_{\pi 1},
\]
\[
\text{(ii)} \quad \lim_{N \to \infty} \frac{n}{N^2} \mathop{\sum\sum}_{i \ne j} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} = \mu_{\pi 2}.
\]
Then $\sqrt{n}(F_N^{HT} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance
function $\mu_{\pi 1} F(s \wedge t) + \mu_{\pi 2} F(s) F(t)$, for $s, t \in \mathbb{R}$.
When $n/N \to \lambda \in [0, 1]$, conditions (i)-(ii) hold with $\mu_{\pi 1} = 1 - \lambda$ and $\mu_{\pi 2} = \lambda - 1$ for
simple random sampling without replacement. For Poisson sampling, (ii) holds trivially because
the trials are independent. For rejective sampling, (i)-(ii), together with $n/N \to \lambda \in [0, 1]$, can be
deduced from the associated Poisson sampling design. Indeed, suppose that (i) holds for Poisson
sampling with first order inclusion probabilities $p_1, \ldots, p_N$, such that $\sum_{i=1}^N p_i = n$. Then, from
Theorem 1 in [BLRG12] it follows that if $d = \sum_{i=1}^N p_i (1 - p_i)$ tends to infinity, assumption (i) holds
for rejective sampling. Furthermore, if $n/N \to \lambda \in [0, 1]$ and $N/d$ has a finite limit, then (ii) also
holds for rejective sampling.
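The values $\mu_{\pi 1} = 1 - \lambda$ and $\mu_{\pi 2} = \lambda - 1$ for simple random sampling without replacement can be verified by direct computation; in fact the finite-$N$ expressions are already exact. A small exact-arithmetic check (ours, not part of the paper):

```python
from fractions import Fraction

def mu_pi_srswor(N, n):
    """Finite-N versions of the limits (i) and (ii) in Proposition 3.1
    for simple random sampling without replacement."""
    pi = Fraction(n, N)
    pij = Fraction(n * (n - 1), N * (N - 1))
    mu1 = Fraction(n, N * N) * N * (1 / pi - 1)
    # N(N-1) ordered pairs i != j, each contributing the same ratio
    mu2 = Fraction(n, N * N) * N * (N - 1) * (pij - pi * pi) / (pi * pi)
    return mu1, mu2

# with lam = n/N, the finite-N values equal 1 - lam and lam - 1 exactly
for N, n in [(1000, 100), (10_000, 500)]:
    mu1, mu2 = mu_pi_srswor(N, n)
    lam = Fraction(n, N)
    assert mu1 == 1 - lam and mu2 == lam - 1
```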
Weak convergence of the process $\sqrt{n}(F_N^{HT} - F)$, where we center with $F$ instead of $F_N$, requires
a CLT in the super-population space for
\[
\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right), \quad \text{where } \mu_V = \mathbb{E}_m(V_i), \tag{3.5}
\]
for sequences of bounded i.i.d. random variables $V_1, V_2, \ldots$ on $(\Omega, \mathcal{F}, \mathbb{P}_m)$. Our approach to establish
asymptotic normality of (3.5) is then to decompose as follows:
\[
\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right)
= \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right)
+ \frac{\sqrt{n}}{\sqrt{N}} \times \sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right). \tag{3.6}
\]
Since the $V_i$'s are i.i.d. and bounded, for the second term on the right hand side, by the traditional
CLT we immediately obtain
\[
\sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right) \to N(0, \sigma_V^2), \tag{3.7}
\]
in distribution under $\mathbb{P}_m$, where $\sigma_V^2$ denotes the variance of the $V_i$'s, whereas the first term on the
right hand side can be handled with (HT1). [BW07] and [SW13] use a decomposition similar to
the one in (3.6). Their approach assumes exchangeable $\xi_i$'s and equal inclusion probabilities $n/N$,
which allows the use of results on the exchangeable weighted bootstrap to handle the first term on the
right hand side of (3.6). Instead, we only require conditions (C2)-(C4) on higher order correlations
for the $\xi_i$'s and allow the $\pi_i$'s to vary within certain bounds as described in (C1). To combine the
two separate limits in (3.7) and (HT1), we will need

(HT3) $n/N \to \lambda \in [0, 1]$, $\omega$-a.s.

We will then use Theorem 5.1(iii) from [RBSK05]. The finite dimensional projections of the processes
involved turn out to be related to a particular HT estimator. In order to have the corresponding
design-based variance converging to a strictly positive constant, we need the following condition.

(HT4) For all $k \in \{1, 2, \ldots\}$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, the matrix $\Sigma_k^{HT}$ in (3.3) is positive definite.
We are now able to formulate our second main result.

Theorem 3.2. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and let $F_N^{HT}$ be defined in (2.1).
Suppose that conditions (C1)-(C4) and (HT1)-(HT4) hold. Then $\sqrt{n}(F_N^{HT} - F)$ converges weakly in
$D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}_F^{HT}$ with covariance function $\mathbb{E}_{d,m} \mathbb{G}_F^{HT}(s) \mathbb{G}_F^{HT}(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} 1_{\{Y_i \le s\}} 1_{\{Y_j \le t\}} \right]
+ \lambda \left( F(s \wedge t) - F(s) F(t) \right),
\]
for $s, t \in \mathbb{R}$.
Theorem 3.2 allows random $n$ and inclusion probabilities. As before, when the sample size $n$ and
inclusion probabilities are deterministic, we can obtain a functional CLT under a simpler condition
than (HT4) and with a more detailed description of the covariance function of the limiting process.

Proposition 3.2. Consider the setting of Theorem 3.2, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose that (C1)-(C4), (HT1) and (HT3) hold, but instead of (HT2) and (HT4)
assume that there exist constants $\mu_{\pi 1}, \mu_{\pi 2} \in \mathbb{R}$ such that
\[
\text{(i)} \quad \lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \left( \frac{1}{\pi_i} - 1 \right) = \mu_{\pi 1} > 0,
\]
\[
\text{(ii)} \quad \lim_{N \to \infty} \frac{n}{N^2} \mathop{\sum\sum}_{i \ne j} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} = \mu_{\pi 2}.
\]
Then $\sqrt{n}(F_N^{HT} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance
function $(\mu_{\pi 1} + \lambda) F(s \wedge t) + (\mu_{\pi 2} - \lambda) F(s) F(t)$, for $s, t \in \mathbb{R}$.
Since $1/\pi_i \ge 1$, we will always have $\mu_{\pi 1} \ge 0$ in condition (i) of Proposition 3.2. This means that
(i) is not very restrictive. For simple random sampling without replacement, condition (i) requires
$\lambda$ to be strictly smaller than one.
4 FCLT's for the Hájek empirical processes

To determine the behavior of the process $\sqrt{n}(F_N^{HJ} - F_N)$, it is useful to relate this process to the
process
\[
\mathbb{G}_N^\pi(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} \left( 1_{\{Y_i \le t\}} - F(t) \right). \tag{4.1}
\]
We can then write
\[
\sqrt{n} \left( F_N^{HJ}(t) - F_N(t) \right) = \mathbb{Y}_N(t) + \left( \frac{N}{\widehat{N}} - 1 \right) \mathbb{G}_N^\pi(t), \tag{4.2}
\]
where
\[
\mathbb{Y}_N(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} - 1 \right) \left( 1_{\{Y_i \le t\}} - F(t) \right). \tag{4.3}
\]
As intermediate results, we will first show that the process $\mathbb{G}_N^\pi$ converges weakly to a mean zero
Gaussian process and that $\widehat{N}/N \to 1$ in probability. As a consequence, the limiting behavior of
$\sqrt{n}(F_N^{HJ} - F_N)$ will be the same as that of $\mathbb{Y}_N$, which is an easier process to handle. Instead of
(HT2) and (HT4) we now need

(HJ2) For $k \in \{1, 2, \ldots\}$, $i = 1, 2, \ldots, k$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, define
\[
\widetilde{\mathbf{Y}}_{ik}^t = \left( 1_{\{Y_i \le t_1\}} - F(t_1), \ldots, 1_{\{Y_i \le t_k\}} - F(t_k) \right).
\]
There exists a deterministic matrix $\Sigma_k^{HJ}$, such that
\[
\lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\widetilde{\mathbf{Y}}_{ik} \widetilde{\mathbf{Y}}_{jk}^t = \Sigma_k^{HJ}, \quad \omega\text{-a.s.} \tag{4.4}
\]
and

(HJ4) For all $k \in \{1, 2, \ldots\}$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, the matrix $\Sigma_k^{HJ}$ in (4.4) is positive definite.
As in the case of (3.4), if (C1)-(C2) hold, then (HJ2) implies
\[
\Sigma_k^{HJ} = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \widetilde{\mathbf{Y}}_{ik} \widetilde{\mathbf{Y}}_{jk}^t \right]. \tag{4.5}
\]
Theorem 4.1. Let $\mathbb{G}_N^\pi$ be defined in (4.1) and let $\widehat{N} = \sum_{i=1}^N \xi_i / \pi_i$. Suppose $n \to \infty$, $\omega$-a.s., and
that there exists $\sigma_\pi^2 \ge 0$, such that
\[
\frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \to \sigma_\pi^2, \quad \omega\text{-a.s.} \tag{4.6}
\]
If in addition,

(i) (HT1) holds, then $\widehat{N}/N \to 1$ in $\mathbb{P}_{d,m}$-probability.

(ii) (C1)-(C4), (HT1), (HT3), (HJ2) and (HJ4) hold, then $\mathbb{G}_N^\pi$ converges weakly in $D(\mathbb{R})$ to a
mean zero Gaussian process $\mathbb{G}^\pi$ with covariance function $\mathbb{E}_{d,m} \mathbb{G}^\pi(s) \mathbb{G}^\pi(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\left( 1_{\{Y_i \le s\}} - F(s) \right) \left( 1_{\{Y_j \le t\}} - F(t) \right) \right]
+ \lambda \left( F(s \wedge t) - F(s) F(t) \right), \quad s, t \in \mathbb{R}.
\]
Note that, in view of condition (HT3), the condition $n \to \infty$ is immediate if $\lambda > 0$. We proceed
by establishing weak convergence of $\sqrt{n}(F_N^{HJ} - F_N)$.

Theorem 4.2. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $F_N$, and
let $F_N^{HJ}$ be defined in (2.2). Suppose $n \to \infty$, $\omega$-a.s., and that (C1)-(C4), (HT1), (HT3), and (HJ2)
hold, as well as condition (4.6). Then $\sqrt{n}(F_N^{HJ} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero
Gaussian process $\mathbb{G}^{HJ}$ with covariance function $\mathbb{E}_{d,m} \mathbb{G}^{HJ}(s) \mathbb{G}^{HJ}(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\left( 1_{\{Y_i \le s\}} - F(s) \right) \left( 1_{\{Y_j \le t\}} - F(t) \right) \right],
\]
for $s, t \in \mathbb{R}$.

Note that we do not need condition (HJ4) in Theorem 4.2. This condition is only needed in
Theorem 4.1 to establish the limit distribution of the finite dimensional projections of the process
$\mathbb{G}_N^\pi$. For Theorem 4.2 we only need that $\mathbb{G}_N^\pi$ is tight.
As before, below we obtain a functional CLT for $\sqrt{n}(F_N^{HJ} - F_N)$ in the case that $n$ and the
inclusion probabilities are deterministic. Similar to the remark we made after Theorem 3.1, note
that if we had imposed (HJ2) for any sequence of bounded random vectors, then this would
imply conditions (i)-(ii) of Proposition 3.1, which can then be left out in Theorem 4.1.

Proposition 4.1. Consider the setting of Theorem 4.2, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose $n \to \infty$ and that (C1)-(C4), (HT1) and (HT3) hold, as well as
conditions (i)-(ii) from Proposition 3.1. Then $\sqrt{n}(F_N^{HJ} - F_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero
Gaussian process $\mathbb{G}^{HJ}$ with covariance function $\mu_{\pi 1} \left( F(s \wedge t) - F(s) F(t) \right)$, for $s, t \in \mathbb{R}$.
Finally, we consider $\sqrt{n}(F_N^{HJ} - F)$. Again, we relate this process to (4.1) and write
\[
\sqrt{n} \left( F_N^{HJ}(t) - F(t) \right) = \frac{N}{\widehat{N}} \, \mathbb{G}_N^\pi(t). \tag{4.7}
\]
Since $\widehat{N}/N \to 1$ in probability, this implies that $\sqrt{n}(F_N^{HJ} - F)$ has the same limiting behavior as
$\mathbb{G}_N^\pi$.

Theorem 4.3. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and let $F_N^{HJ}$ be defined in (2.2).
Suppose $n \to \infty$, $\omega$-a.s., and that (C1)-(C4), (HT1), (HT3), (HJ2) and (HJ4) hold, as well as
condition (4.6). Then $\sqrt{n}(F_N^{HJ} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}_F^{HJ}$
with covariance function $\mathbb{E}_{d,m} \mathbb{G}^\pi(s) \mathbb{G}^\pi(t)$ given by
\[
\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N
\mathbb{E}_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}
\left( 1_{\{Y_i \le s\}} - F(s) \right) \left( 1_{\{Y_j \le t\}} - F(t) \right) \right]
+ \lambda \left( F(s \wedge t) - F(s) F(t) \right), \quad s, t \in \mathbb{R}.
\]
With Theorem 4.3 we recover Theorem 1 in [Wan12]. Our assumptions are comparable to those
in [Wan12], although this paper seems to miss a condition on the convergence of the variance, such
as our condition (HJ2).

We conclude this section by establishing a functional CLT for $\sqrt{n}(F_N^{HJ} - F)$ in the case of
deterministic $n$ and inclusion probabilities.

Proposition 4.2. Consider the setting of Theorem 4.3, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$,
are deterministic. Suppose $n \to \infty$ and that (C1)-(C4), (HT1) and (HT3) hold, as well as
conditions (i)-(ii) from Proposition 3.2. Then $\sqrt{n}(F_N^{HJ} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero
Gaussian process $\mathbb{G}^{HJ}$ with covariance function $(\mu_{\pi 1} + \lambda) \left( F(s \wedge t) - F(s) F(t) \right)$, for $s, t \in \mathbb{R}$.
5 Hadamard-differentiable functionals

Theorem 4.3 provides an elegant means to study the limit behavior of estimators that can be
described as $\phi(F_N^{HJ})$, where $\phi$ is a Hadamard-differentiable functional. Given such a $\phi$, the functional
delta-method (e.g., see Theorems 3.9.4 and 3.9.5 in [vdVW96] or Theorem 20.8 in [vdV98]) enables
one to establish the limit distribution of $\phi(F_N^{HJ})$. Similarly, this holds for Theorems 3.1, 3.2, and 4.2,
or Propositions 3.1, 3.2, 4.1, and 4.2 in the special case of deterministic $n$ and inclusion probabilities.

We illustrate this by discussing the poverty rate. This indicator has recently been revisited
by [GT14] and [OAB15]. This example has also been discussed by [Dd08], but under the assumption
of weak convergence of $\sqrt{n}(F_N^{HJ} - F_N)$ to some centered continuous Gaussian process. Note that
this assumption is now covered by our Theorem 4.2 and Proposition 4.1. Let $D_\phi \subset D(\mathbb{R})$ consist of
$F \in D(\mathbb{R})$ that are non-decreasing. Then for $F \in D_\phi$, the poverty rate is defined as
\[
\phi(F) = F\left( \beta F^{-1}(\alpha) \right) \tag{5.1}
\]
for fixed $0 < \alpha, \beta < 1$, where $F^{-1}(\alpha) = \inf\{t : F(t) \ge \alpha\}$. Typical choices are $\alpha = 0.5$ and $\beta = 0.5$
(INSEE) or $\beta = 0.6$ (EUROSTAT). Its Hadamard derivative is given by
\[
\phi_F'(h) = -\frac{\beta f\left( \beta F^{-1}(\alpha) \right)}{f\left( F^{-1}(\alpha) \right)} \, h\left( F^{-1}(\alpha) \right) + h\left( \beta F^{-1}(\alpha) \right). \tag{5.2}
\]
See [BLRG15] for details.
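For a unit-rate exponential distribution, which is the model used later in the simulation study, (5.1) has a closed form; a small sketch (ours, not from the paper):

```python
import math

def poverty_rate_exponential(alpha, beta):
    """phi(F) = F(beta * F^{-1}(alpha)) for a unit-rate exponential,
    where F(t) = 1 - exp(-t) and F^{-1}(a) = -log(1 - a)."""
    q = -math.log(1.0 - alpha)         # F^{-1}(alpha)
    return 1.0 - math.exp(-beta * q)   # equals 1 - (1 - alpha)**beta

# EUROSTAT-style choice alpha = 0.5, beta = 0.6 gives phi(F) close to 0.34,
# the value used in the simulation study of Section 6
assert abs(poverty_rate_exponential(0.5, 0.6) - (1 - 0.5**0.6)) < 1e-12
```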
We then have the following corollaries for the Horvitz-Thompson estimator $\phi(F_N^{HT})$ and the
Hájek estimator $\phi(F_N^{HJ})$ for the poverty rate $\phi(F)$.

Corollary 5.1. Let $\phi$ be defined by (5.1) and suppose that the conditions of Proposition 3.2 hold.
Then, if $F$ is differentiable at $F^{-1}(\alpha)$, the random variable $\sqrt{n}(\phi(F_N^{HT}) - \phi(F))$ converges in
distribution to a mean zero normal random variable with variance
\[
\sigma_{HT,\alpha,\beta}^2
= \frac{\beta^2 f\left( \beta F^{-1}(\alpha) \right)^2}{f\left( F^{-1}(\alpha) \right)^2}
\left( \gamma_{\pi 1} \alpha + \gamma_{\pi 2} \alpha^2 \right)
+ \gamma_{\pi 1} \phi(F) + \gamma_{\pi 2} \phi(F)^2
- \frac{2 \beta f\left( \beta F^{-1}(\alpha) \right)}{f\left( F^{-1}(\alpha) \right)}
\phi(F) \left( \gamma_{\pi 1} + \gamma_{\pi 2} \alpha \right), \tag{5.3}
\]
where $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda$ and $\gamma_{\pi 2} = \mu_{\pi 2} - \lambda$. If in addition $n/N \to 0$, then $\sqrt{n}(\phi(F_N^{HT}) - \phi(F_N))$
converges in distribution to a mean zero normal random variable with variance $\sigma_{HT,\alpha,\beta}^2$.
Corollary 5.2. Let $\phi$ be defined by (5.1) and suppose that the conditions of Proposition 4.2
hold. Then, if $F$ is differentiable at $F^{-1}(\alpha)$, the random variable $\sqrt{n}(\phi(F_N^{HJ}) - \phi(F))$ converges in
distribution to a mean zero normal random variable with variance
\[
\sigma_{HJ,\alpha,\beta}^2
= \frac{\beta^2 f\left( \beta F^{-1}(\alpha) \right)^2}{f\left( F^{-1}(\alpha) \right)^2}
\, \gamma_{\pi 1} \alpha (1 - \alpha)
+ \gamma_{\pi 1} \phi(F) \left( 1 - \phi(F) \right)
- \frac{2 \beta f\left( \beta F^{-1}(\alpha) \right)}{f\left( F^{-1}(\alpha) \right)}
\phi(F) \gamma_{\pi 1} (1 - \alpha), \tag{5.4}
\]
where $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda$. If in addition $n/N \to 0$, then $\sqrt{n}(\phi(F_N^{HJ}) - \phi(F_N))$ converges in distribution
to a mean zero normal random variable with variance $\sigma_{HJ,\alpha,\beta}^2$.
6 Simulation study

The objective of this simulation study is to investigate the performance of the Horvitz-Thompson
(HT) and the Hájek (HJ) estimators for the poverty rate, as defined in (5.1), at the finite population
level and at the super-population level. The asymptotic results from Corollaries 5.1 and 5.2 are used
to obtain variance estimators, whose performance is also assessed in this small study.

Six simulation schemes are implemented with different population sizes and (design-based) expected
sample sizes, namely $N = 10\,000$ and $1000$, and $n = 500$, $100$, and $50$. The samples are drawn
according to three different sampling designs. The first one is simple random sampling without
replacement (SI) with size $n$. The second design is Bernoulli sampling (BE) with parameter $n/N$.
The third one is Poisson sampling (PO) with first order inclusion probabilities equal to $0.4\,n/N$ for
the first half of the population and equal to $1.6\,n/N$ for the other half, where
the population is randomly ordered. The first order inclusion probabilities are deterministic for the
three designs, and the sample size $n_s$ is fixed for the SI design, while it is random with respect to the
design for the BE and PO designs. Moreover, the SI and BE designs are equal probability designs,
while PO is an unequal probability design. The results are obtained by replicating $N_R = 1000$
populations. For each population, $n_R = 1000$ samples are drawn according to the different designs.
The variable of interest $Y$ is generated for each population according to an exponential distribution
with rate parameter equal to one. For this distribution and given $\alpha$ and $\beta$, the poverty rate has
the explicit expression $\phi(F) = 1 - \exp(\beta \ln(1 - \alpha))$. In what follows, $\alpha = 0.5$ and $\beta = 0.6$, so that
$\phi(F) \approx 0.34$. These are the same values for $\alpha$ and $\beta$ as considered in [Dd08].
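A single replication of the PO scheme can be sketched as follows. This is our simplified stand-in, not the paper's code: the paper uses wtd.quantile from the R package Hmisc, and the helper `hajek_poverty_rate` below is only a rough weighted-quantile substitute.

```python
import numpy as np

rng = np.random.default_rng(1)

def hajek_poverty_rate(y, xi, pi, alpha=0.5, beta=0.6):
    """Hajek plug-in estimate of phi(F) = F(beta * F^{-1}(alpha)): a weighted
    alpha-quantile followed by a weighted c.d.f. evaluation at beta times it."""
    w = (xi / pi)[xi == 1]
    ys = y[xi == 1]
    order = np.argsort(ys)
    ys, w = ys[order], w[order]
    cdf = np.cumsum(w) / w.sum()           # Hajek empirical c.d.f. at sampled points
    q = ys[np.searchsorted(cdf, alpha)]    # crude weighted alpha-quantile
    return (w * (ys <= beta * q)).sum() / w.sum()

# one replication: N = 1000, n = 100, pi_i in {0.4 n/N, 1.6 n/N} as in the text
N, n = 1000, 100
y = rng.exponential(size=N)
pi = np.r_[np.full(N // 2, 0.4 * n / N), np.full(N - N // 2, 1.6 * n / N)]
xi = (rng.uniform(size=N) < pi).astype(float)  # independent Poisson sampling draws
est = hajek_poverty_rate(y, xi, pi)
# est is a rough estimate of phi(F) = 1 - 0.5**0.6, roughly 0.34
```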
The Horvitz-Thompson estimator and Hájek estimator for $\phi(F)$ or $\phi(F_N)$ are denoted by $\widehat{\phi}_{HT}$
and $\widehat{\phi}_{HJ}$, respectively. They are obtained by plugging in the empirical c.d.f.'s $F_N^{HT}$ and $F_N^{HJ}$,
respectively, for $F$ in expression (5.1). The empirical quantiles are calculated by using the function
wtd.quantile from the R package Hmisc for the Hájek estimator and by adapting this function for
the Horvitz-Thompson estimator. For the SI sampling design, the two estimators are the same.

The performance of the estimators for the parameters $\phi(F)$ and $\phi(F_N)$ is evaluated using the
Monte Carlo relative bias (RB), reported in Table 1. When estimating the super-population
parameter $\phi(F)$, if $\widehat{\phi}_{ij}$ denotes the estimate (either $\widehat{\phi}_{HT}$ or $\widehat{\phi}_{HJ}$) for the $i$th generated population
and the $j$th drawn sample, the Monte Carlo relative bias of $\widehat{\phi}$ in percentages has the following
expression:
\[
\mathrm{RB}_F(\widehat{\phi}) = \frac{100}{N_R n_R} \sum_{i=1}^{N_R} \sum_{j=1}^{n_R} \frac{\widehat{\phi}_{ij} - \phi(F)}{\phi(F)}.
\]
When estimating the finite population parameter $\phi(F_N)$, the parameter depends on the generated
population and, for the $i$th population, will be denoted by $\phi(F_{N_i})$. The Monte Carlo relative
bias of $\widehat{\phi}$ is then computed by replacing $F$ by $F_{N_i}$ in the above expression. Concerning the relative
biases reported in Table 1, the values are small and never exceed 3% in absolute value. As expected,
these values increase when $n$ decreases. When the centering is relative to $\phi(F_N)$, the relative bias is in general
Table 1: RB (in %) of the HT and the HJ estimators for the finite population $\phi(F_N)$ and the
super-population $\phi(F)$ poverty rate parameter

                            N = 10 000                 N = 1000
                     n=500   n=100   n=50       n=500   n=100   n=50
SI   HT-HJ  φ(F_N)   -0.17   -0.89   -1.82      -0.05   -0.84   -1.62
            φ(F)     -0.20   -0.91   -1.86      -0.18   -0.72   -1.85
BE   HT     φ(F_N)   -0.12   -0.66   -1.29       0.01   -0.65   -1.12
            φ(F)     -0.15   -0.68   -1.34      -0.12   -0.54   -1.36
     HJ     φ(F_N)   -0.17   -0.92   -1.87      -0.04   -0.88   -1.68
            φ(F)     -0.20   -0.93   -1.92      -0.17   -0.76   -1.91
PO   HT     φ(F_N)   -0.05   -1.05   -2.06      -0.06   -0.30   -0.37
            φ(F)     -0.08   -1.07   -2.11      -0.19   -0.19   -0.63
     HJ     φ(F_N)   -0.20   -1.27   -2.95      -0.04   -1.08   -1.99
            φ(F)     -0.23   -1.28   -3.00      -0.17   -0.97   -2.23
Table 2: RB (in %) for the variance estimator of the HT and the HJ estimators for the poverty rate
parameter

                      N = 10 000                 N = 1000
               n=500   n=100   n=50       n=500   n=100   n=50
SI   HT-HJ     -2.21   -3.08   -2.97      -2.25   -3.26   -3.00
BE   HT        -4.15   -5.11   -4.21      -3.31   -5.11   -4.19
     HJ        -2.22   -3.06   -3.03      -2.26   -3.24   -3.03
PO   HT        -4.43   -4.96   -3.45      -3.74   -5.72   -4.59
     HJ        -2.36   -3.43   -3.36      -2.44   -3.75   -4.13
somewhat smaller than when centering with $\phi(F)$. This behavior is most prominent when $N = 1000$
and $n = 500$, which suggests that the estimates are typically closer to the population poverty rate
$\phi(F_N)$ than to the model parameter $\phi(F)$. The Hájek estimator has a larger relative bias than the
Horvitz-Thompson estimator in all situations, but in particular for the Poisson sampling design when
the size of the population is 1000. Note that all values in Table 1 are negative, which illustrates the
fact that the estimators typically underestimate the population and model poverty rates.

In Table 2, the estimators of the variance of $\widehat{\phi}_{HT}$ and $\widehat{\phi}_{HJ}$ are obtained by plugging in the
empirical c.d.f.'s $F_N^{HT}$ and $F_N^{HJ}$, respectively, for $F$ in the expressions (5.3) and (5.4). To estimate $f$
in the variance of $\widehat{\phi}_{HJ}$, we follow [BS03], who propose a Hájek type kernel estimator with a Gaussian
kernel function. For the variance of $\widehat{\phi}_{HT}$, we use a corresponding Horvitz-Thompson estimator by
replacing $\widehat{N}$ by $N$. Based on [Sil86], pages 45-47, we choose the bandwidth $b = 0.79 \, R \, n_s^{-1/5}$, where $R$ denotes
the interquartile range. This differs from [BS03], who propose a similar bandwidth of the order
$N^{-1/5}$. However, this severely underestimates the optimal bandwidth, leading to large variances of
the kernel estimator. Usual bias-variance trade-off computations show that the optimal bandwidth
is of the order $n_s^{-1/5}$.
For the SI sampling design, (5.3) and (5.4) are identical and can be calculated explicitly,
using the fact that $\mu_{\pi 1} + \lambda = 1$ and $\mu_{\pi 2} - \lambda = -1$. For the BE design, $\mu_{\pi 1} + \lambda = 1$, whereas for
Poisson sampling, the value $(n/N^2) \sum_{i=1}^N 1/\pi_i$ is taken for $\mu_{\pi 1} + \lambda$. For these designs, $\mu_{\pi 2} - \lambda = -\lambda$,
where we take $n/N$ as the value of $\lambda$.
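Under these choices, the SI reference variance can be computed in closed form from (5.4) with $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda = 1$; a sketch (ours, under our reading of (5.4)) for the unit-rate exponential, where $f(t) = e^{-t}$ and $F^{-1}(a) = -\log(1-a)$:

```python
import math

def sigma2_hj_si_exponential(alpha=0.5, beta=0.6):
    """Asymptotic variance (5.4) for the SI design (gamma_pi1 = 1) when Y is
    unit-rate exponential, so f(t) = exp(-t) and F^{-1}(a) = -log(1 - a)."""
    q = -math.log(1.0 - alpha)                     # F^{-1}(alpha)
    phi = 1.0 - math.exp(-beta * q)                # poverty rate phi(F)
    c = beta * math.exp(-beta * q) / math.exp(-q)  # beta f(beta q) / f(q)
    return (c * c * alpha * (1 - alpha)
            + phi * (1 - phi)
            - 2 * c * phi * (1 - alpha))

v = sigma2_hj_si_exponential()
assert v >= 0   # a variance must be nonnegative
```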
In order to compute the relative bias of the variance estimates, the asymptotic variance is taken
as reference. This asymptotic variance $\mathrm{AV}(\widehat{\phi})$ of the estimator $\widehat{\phi}$ (either $\widehat{\phi}_{HT}$ or $\widehat{\phi}_{HJ}$) is computed
from (5.3) and (5.4). The expressions $f(\beta F^{-1}(\alpha))$ and $f(F^{-1}(\alpha))$ are explicit in the case of an
Table 3: Coverage probabilities (in %) for 95% confidence intervals of the HT and the HJ estimators
for the finite population $\phi(F_N)$ and the super-population $\phi(F)$ poverty rate parameter

                            N = 10 000                 N = 1000
                     n=500   n=100   n=50       n=500   n=100   n=50
SI   HT-HJ  φ(F_N)   95.2    94.4    93.5       98.8    95.1    94.6
            φ(F)     94.6    93.2    92.2       94.7    93.2    92.0
BE   HT     φ(F_N)   94.9    94.3    94.6       98.4    94.8    94.6
            φ(F)     94.4    93.7    94.9       94.6    93.6    94.7
     HJ     φ(F_N)   95.1    94.3    93.9       98.7    94.9    94.2
            φ(F)     94.7    94.2    93.9       94.7    94.2    93.9
PO   HT     φ(F_N)   94.5    94.2    94.3       96.8    94.0    93.6
            φ(F)     94.5    94.0    94.3       94.6    93.6    93.5
     HJ     φ(F_N)   94.8    93.9    93.6       97.2    94.2    93.3
            φ(F)     94.6    93.9    93.6       94.6    93.9    93.2
exponential distribution. Furthermore, for $\mu_{\pi 1}+\lambda$ and $\mu_{\pi 2}-\lambda$ we use the same expressions as mentioned above. The Monte Carlo relative bias of the variance estimator $\widehat{\mathrm{AV}}(\widehat\phi)$, in percentages, is defined by
\[
\mathrm{RB}\big(\widehat{\mathrm{AV}}(\widehat\phi)\big)
= \frac{100}{N_R\,n_R}\sum_{i=1}^{N_R}\sum_{j=1}^{n_R}
\frac{\widehat{\mathrm{AV}}(\widehat\phi_{ij})-\mathrm{AV}(\widehat\phi)}{\mathrm{AV}(\widehat\phi)},
\]
where $\widehat{\mathrm{AV}}(\widehat\phi_{ij})$ denotes the variance estimate for the $i$th generated population and the $j$th drawn sample.
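The relative bias formula is a plain average over populations and samples; a minimal sketch (function name is ours) computes it from a matrix of variance estimates:

```python
import numpy as np

def relative_bias(av_hat, av_true):
    """Monte Carlo relative bias (in %) of variance estimates av_hat[i][j]
    (i-th generated population, j-th drawn sample) with respect to the
    asymptotic variance av_true: 100/(N_R*n_R) * sum (av_hat - av_true)/av_true."""
    av_hat = np.asarray(av_hat, dtype=float)
    return 100.0 * np.mean((av_hat - av_true) / av_true)
```

A negative value indicates that the asymptotic variance is underestimated on average, which is the pattern reported in Table 2.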
Table 3 gives the Monte Carlo coverage probabilities for a nominal coverage probability of 95%, for the two parameters $\phi(F_N)$ and $\phi(F)$, the Horvitz-Thompson and the Hájek estimators, and the different simulation schemes. In general the coverage probabilities are somewhat smaller than 95%, which is due to the underestimation of the asymptotic variance, as can be seen from Table 2. The case $N=1000$ and $n=500$ for $\widehat\phi_{HJ}$ forms an exception, which is probably due to the fact that in this case $\lambda=n/N$ is far from zero, so that the limit distributions of $\sqrt n\big(\phi(F_N^{HT})-\phi(F_N)\big)$ and $\sqrt n\big(\phi(F_N^{HJ})-\phi(F_N)\big)$ have a larger variance than the ones reported in Corollaries 5.1 and 5.2. Table 2 also shows that the relative biases are smaller than 5% when $n$ is 500, and that the biases are larger for the Horvitz-Thompson estimator than for the Hájek estimator. Again, all relative biases are negative, which illustrates the fact that the asymptotic variance is typically underestimated.
7 Discussion
[Wan12] formulates a functional central limit theorem (see his Theorem 1) for the Hájek empirical c.d.f. from (2.2), centered around $F$. It is also claimed that a similar result holds for the Horvitz-Thompson process in (2.1), but details are not provided. The paper seems to miss a number of assumptions that cannot be avoided. For instance, the proof of his Theorem 1 requires convergence in probability of the covariance matrix of the vector $\sqrt{n^*}\big(F_{n\pi}(t)-F_N(t),\,F_{n\pi}(s)-F_N(s)\big)$. This assumption is comparable with our condition (HJ2), but is missing in [Wan12]. More seriously, the argument establishing Billingsley's tightness condition seems to contain a mistake that cannot be repaired easily (see the inequality on line 6, page 678, in [Wan12]; the inequality can be shown to fail, for instance, for sampling designs with independent inclusion indicators). As a consequence, assumption 5 in [Wan12] differs somewhat from our conditions (C2)-(C4). The remaining assumptions in [Wan12] are comparable to the conditions needed for our Theorem 4.3. Note that, in addition to the latter theorem, we also establish Theorems 3.1, 3.2, and 4.2 for other empirical processes of interest.
[BW07] and [SW13] obtain weak convergence of the empirical process (in our notation)
\[
\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}f(Y_i)-\mathbb E_m f(Y_i)\Big),\qquad f\in\mathcal F. \tag{7.1}
\]
Weak convergence is established under finite population two-phase stratified sampling. This process is comparable to our Horvitz-Thompson empirical process in Theorem 3.2. Although their functional CLT allows general function classes, it only covers sampling designs with equal inclusion probabilities within strata that assume exchangeability of the inclusion indicators $\xi_1,\ldots,\xi_N$, such as simple random sampling and Bernoulli sampling. Their approach views two-phase stratified sampling as a form of bootstrap and uses results on the exchangeably weighted bootstrap for empirical processes from [PW93], as incorporated in [vdVW96]. This approach, in particular the application of Theorem 3.6.13 in [vdVW96], seems difficult to extend to more complex sampling designs that go beyond exchangeable inclusion indicators. Although our results only cover the class of indicators $f_t(y)=\mathbf 1_{(-\infty,t]}(y)$, for $t\in\mathbb R$, they have the advantage of being applicable to general sampling designs. Moreover, our results also include empirical processes centered with the population mean.
[BCC14] establish a functional CLT for the Poisson-like empirical process
\[
\widetilde{\mathbb G}^p_{T_N}(f)=\frac{1}{\sqrt N}\sum_{i=1}^N\frac{(\xi_i-p_i)f(Y_i)}{p_i}-\theta_{N,p}(f),\qquad f\in\mathcal F, \tag{7.2}
\]
where $p=(p_1,\ldots,p_N)$ is the vector of inclusion probabilities corresponding to a Poisson sampling design and
\[
\theta_{N,p}(f)=\frac{1}{d_N}\sum_{i=1}^N(1-p_i)f(Y_i),\qquad d_N=\sum_{i=1}^N p_i(1-p_i).
\]
However, the functional CLT is obtained conditionally on $Y_1,Y_2,\ldots$. In this case, the terms in the summation in (7.2) are independent, which allows the use of Theorem 2.11.1 from [vdVW96]. From their result a functional CLT under rejective sampling can then be established for the design-based Horvitz-Thompson process
\[
\mathbb G_{N,\pi}(f)=\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}f(Y_i)-f(Y_i)\Big),\qquad f\in\mathcal F, \tag{7.3}
\]
$\omega$-almost surely. This is due to the close connection between Poisson sampling and rejective sampling. For this reason, the approach used in [BCC14] seems difficult to extend to other sampling designs. For the class of indicators $f_t(y)=\mathbf 1_{(-\infty,t]}(y)$, $t\in\mathbb R$, the process in (7.3) is similar to the one in our Theorem 3.1, but this theorem allows general sampling designs. Moreover, our results also include empirical processes centered with the superpopulation mean.
8 Proofs
We will use Theorem 13.5 from [Bil99], which requires convergence of the finite dimensional distributions and a tightness condition (see (13.14) in [Bil99]). We first establish the tightness condition, as stated in the following lemma.
Lemma 8.1. Let $Y_1,\ldots,Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $F_N$, and let $F_N^{HT}$ be defined according to (2.1). Let $X_N=\sqrt n(F_N^{HT}-F_N)$ and suppose that (C1)-(C4) hold. Then there exists a constant $K>0$, independent of $N$, such that for any $t_1,t_2$ with $-\infty<t_1\le t\le t_2<\infty$,
\[
\mathbb E_{d,m}\Big[\big(X_N(t)-X_N(t_1)\big)^2\big(X_N(t_2)-X_N(t)\big)^2\Big]\le K\big(F(t_2)-F(t_1)\big)^2.
\]
Proof. First note that
\[
X_N(t)=\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big)\mathbf 1_{\{Y_i\le t\}}.
\]
For the sake of brevity, for $-\infty<t_1\le t\le t_2<\infty$ and $i=1,2,\ldots,N$, we define $p_1=F(t)-F(t_1)$, $p_2=F(t_2)-F(t)$, $A_i=\mathbf 1_{\{t_1<Y_i\le t\}}$, and $B_i=\mathbf 1_{\{t<Y_i\le t_2\}}$. Furthermore, let $\alpha_i=(\xi_i-\pi_i)A_i/\pi_i$ and $\beta_i=(\xi_i-\pi_i)B_i/\pi_i$. Since $p_1p_2\le\big(F(t_2)-F(t_1)\big)^2$, due to the monotonicity of $F$, it suffices to show
\[
\frac{1}{N^4}\,\mathbb E_{d,m}\Bigg[n^2\Big(\sum_{i=1}^N\alpha_i\Big)^2\Big(\sum_{j=1}^N\beta_j\Big)^2\Bigg]\le K p_1p_2. \tag{8.1}
\]
The expectation on the left hand side can be decomposed as follows:
\[
\begin{aligned}
&\sum_{i=1}^N\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]
+\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]\\
&\qquad+\sum_{k=1}^N\sum_{l\ne k}\sum_{i=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k\beta_l\big]
+\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\sum_{l\ne k}\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k\beta_l\big].
\end{aligned}
\tag{8.2}
\]
Note that, by symmetry, the second and third sums on the right hand side can be handled similarly, so that essentially we have to deal with three summations. We consider them one by one.
First note that, since $\mathbf 1_{\{t_1<Y_i\le t\}}\mathbf 1_{\{t<Y_i\le t_2\}}=0$, we only have non-zero expectations when $\{i,j\}$ and $\{k,l\}$ are disjoint. With (C1), we find
\[
\begin{aligned}
\frac{1}{N^4}\sum_{i=1}^N\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]
&=\frac{1}{N^4}\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]\\
&=\frac{1}{N^4}\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_m\Big[\frac{n^2A_iB_k}{\pi_i^2\pi_k^2}\,\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2\Big]\\
&\le\frac{1}{K_1^4}\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_m\Big[\frac{A_iB_k}{n^2}\,\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2\Big].
\end{aligned}
\tag{8.3}
\]
A straightforward computation shows that $\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2$ equals
\[
(\pi_{ik}-\pi_i\pi_k)(1-2\pi_i)(1-2\pi_k)+\pi_i\pi_k(1-\pi_i)(1-\pi_k).
\]
Hence, with (C1)-(C2) we find that
\[
\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2\le\big|\mathbb E_d(\xi_i-\pi_i)(\xi_k-\pi_k)\big|+K_2^2\,\frac{n^2}{N^2}=O\Big(\frac{n^2}{N^2}\Big),
\]
$\omega$-almost surely. It follows that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_k^2\big]\le O\Big(\frac{1}{N^2}\Big)\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\mathbb E_m[A_iB_k].
\]
Since $D_{2,N}$ has $N(N-1)$ elements and $\mathbb E_m[A_iB_j]=p_1p_2$ for $(i,j)\in D_{2,N}$, it follows that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_{d,m}\big[n^2\alpha_i^2\beta_j^2\big]\le K p_1p_2. \tag{8.4}
\]
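The closed form for the fourth design moment $\mathbb E_d(\xi_i-\pi_i)^2(\xi_k-\pi_k)^2$ used above can be verified by exact enumeration, since the inclusion indicators only take the values 0 and 1. A minimal check (pure Python, function names are ours):

```python
def fourth_moment(pi_i, pi_k, pi_ik):
    """E[(xi_i - pi_i)^2 (xi_k - pi_k)^2] for 0/1 indicators with
    P(xi_i=1)=pi_i, P(xi_k=1)=pi_k, P(xi_i=xi_k=1)=pi_ik,
    computed by exact enumeration of the joint distribution."""
    probs = {(1, 1): pi_ik,
             (1, 0): pi_i - pi_ik,
             (0, 1): pi_k - pi_ik,
             (0, 0): 1.0 - pi_i - pi_k + pi_ik}
    return sum(p * (xi - pi_i) ** 2 * (xk - pi_k) ** 2
               for (xi, xk), p in probs.items())

def closed_form(pi_i, pi_k, pi_ik):
    """Closed-form expression stated in the proof of Lemma 8.1."""
    return ((pi_ik - pi_i * pi_k) * (1 - 2 * pi_i) * (1 - 2 * pi_k)
            + pi_i * pi_k * (1 - pi_i) * (1 - pi_k))
```

Note that with $\pi_{ik}=\pi_i\pi_k$ (independent indicators) the closed form reduces to $\pi_i\pi_k(1-\pi_i)(1-\pi_k)$, the product of the two Bernoulli variances, as it should.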
Consider the second (and third) summation on the right hand side of (8.2). Similarly to (8.3), we can write
\[
\begin{aligned}
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]
&=\frac{1}{N^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]\\
&\le\frac{1}{N^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\Big|\mathbb E_{d,m}\Big[\frac{n^2A_iA_jB_k}{\pi_i\pi_j\pi_k^2}(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\Big]\Big|\\
&\le\frac{1}{N^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_m\Big[\frac{n^2A_iA_jB_k}{\pi_i\pi_j\pi_k^2}\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\big|\Big]\\
&\le\frac{1}{K_1^4}\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_m\Big[\frac{A_iA_jB_k}{n^2}\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\big|\Big].
\end{aligned}
\]
We find that $\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2$ equals
\[
(1-2\pi_k)\,\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)+\pi_k(1-\pi_k)\,\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j).
\]
With (C1)-(C3), this means $\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)^2\big|=O(n^2/N^3)$, $\omega$-almost surely. It follows that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]=O\Big(\frac{1}{N^3}\Big)\mathop{\sum\sum\sum}_{(i,j,k)\in D_{3,N}}\mathbb E_m[A_iA_jB_k].
\]
Since $D_{3,N}$ has $N(N-1)(N-2)$ elements and $\mathbb E_m[A_iA_jB_k]=p_1^2p_2$ for $(i,j,k)\in D_{3,N}$, we find
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k^2\big]\le K p_1p_2. \tag{8.5}
\]
The computations for the third summation in (8.2) are completely similar. Finally, consider the last summation in (8.2). As before, it can be bounded by
\[
\frac{1}{K_1^4}\sum_{(i,j,k,l)\in D_{4,N}}\mathbb E_m\Big[\frac{A_iA_jB_kB_l}{n^2}\big|\mathbb E_d(\xi_i-\pi_i)(\xi_j-\pi_j)(\xi_k-\pi_k)(\xi_l-\pi_l)\big|\Big].
\]
Since $D_{4,N}$ has $N(N-1)(N-2)(N-3)$ elements and $\mathbb E_m[A_iA_jB_kB_l]=p_1^2p_2^2$ for $(i,j,k,l)\in D_{4,N}$, with (C4) we conclude that
\[
\frac{1}{N^4}\sum_{i=1}^N\sum_{j\ne i}\sum_{k=1}^N\sum_{l\ne k}\mathbb E_{d,m}\big[n^2\alpha_i\alpha_j\beta_k\beta_l\big]\le K p_1p_2. \tag{8.6}
\]
Together with (8.4), (8.5), and decomposition (8.2), this proves (8.1).
Lemma 8.2. Let $X_N=\sqrt n(F_N^{HT}-F_N)$ and suppose that (C1)-(C2) and (HT1)-(HT2) hold. For any $k\in\{1,2,\ldots\}$ and $t_1,\ldots,t_k\in\mathbb R$, the vector $\big(X_N(t_1),\ldots,X_N(t_k)\big)$ converges in distribution under $\mathbb P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix $\Sigma_k^{HT}$ given in (3.4).
Proof. We use the Cramér-Wold device. Note that any linear combination
\[
a_1\sqrt n\big(F_N^{HT}(t_1)-F_N(t_1)\big)+\cdots+a_k\sqrt n\big(F_N^{HT}(t_k)-F_N(t_k)\big) \tag{8.7}
\]
can be written as
\[
\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}-\frac1N\sum_{i=1}^N V_{ik}\Bigg), \tag{8.8}
\]
where
\[
V_{ik}=a_1\mathbf 1_{\{Y_i\le t_1\}}+\cdots+a_k\mathbf 1_{\{Y_i\le t_k\}}=\mathbf a_k^t\mathbf Y_{ik}, \tag{8.9}
\]
with $\mathbf Y_{ik}^t=\big(\mathbf 1_{\{Y_i\le t_1\}},\ldots,\mathbf 1_{\{Y_i\le t_k\}}\big)$ and $\mathbf a_k^t=(a_1,\ldots,a_k)$. For the corresponding design-based variance, we have
\[
nS_N^2=\frac{n}{N^2}\sum_{i=1}^N\sum_{j=1}^N\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}V_{ik}V_{jk}
=\mathbf a_k^t\Bigg(\frac{n}{N^2}\sum_{i=1}^N\sum_{j=1}^N\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\mathbf Y_{ik}\mathbf Y_{jk}^t\Bigg)\mathbf a_k
\to\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k, \tag{8.10}
\]
$\omega$-almost surely, according to (HT2), where $\Sigma_k^{HT}$ can be obtained from (3.4). Together with (HT1), it follows that (8.7) converges in distribution to a mean zero normal random variable with variance $\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k$. We conclude that (8.7) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma_k^{HT}$. According to the Cramér-Wold device, this proves the lemma.
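The design-based variance in (8.10) is a double sum over $(\pi_{ij}-\pi_i\pi_j)/(\pi_i\pi_j)$. For a design with independent inclusion indicators (Poisson sampling, $\pi_{ij}=\pi_i\pi_j$ for $i\ne j$ and $\pi_{ii}=\pi_i$) the double sum collapses to a single sum, which the following sketch checks numerically (function names are ours, for illustration only):

```python
import numpy as np

def design_variance_double_sum(v, pi, pij):
    """Design variance of (1/N) sum_i xi_i v_i / pi_i via the double-sum
    formula (1/N^2) sum_{i,j} (pi_ij - pi_i pi_j)/(pi_i pi_j) v_i v_j."""
    N = len(v)
    delta = (pij - np.outer(pi, pi)) / np.outer(pi, pi)
    return v @ delta @ v / N ** 2

def design_variance_poisson(v, pi):
    """Same variance under independent inclusion indicators:
    (1/N^2) sum_i v_i^2 (1 - pi_i)/pi_i."""
    return np.sum(v ** 2 * (1 - pi) / pi) / len(v) ** 2
```

For Poisson sampling only the diagonal terms $\big(\pi_i-\pi_i^2\big)/\pi_i^2=(1-\pi_i)/\pi_i$ survive, so the two functions agree exactly.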
Proof of Theorem 3.1. We first consider $X_N=\sqrt n(F_N^{HT}-F_N)$ for the case that the $Y_i$'s follow a uniform distribution on $[0,1]$. We apply Theorem 13.5 from [Bil99]. Lemma 8.2 provides the limiting distribution of the finite dimensional projections $\big(X_N(t_1),\ldots,X_N(t_k)\big)$, which is the same as that of the vector $\big(\mathbb G^{HT}(t_1),\ldots,\mathbb G^{HT}(t_k)\big)$, where $\mathbb G^{HT}$ is a mean zero Gaussian process with covariance function
\[
\mathbb E_m\mathbb G^{HT}(s)\mathbb G^{HT}(t)=\lim_{N\to\infty}\frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_m\Big[n\,\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\mathbf 1_{\{Y_i\le s\}}\mathbf 1_{\{Y_j\le t\}}\Big],
\]
for all $s,t\in\mathbb R$. Tightness condition (13.14) in [Bil99] is provided by Lemma 8.1. Since $\mathbb G^{HT}$ is continuous at 1, the theorem now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are uniformly distributed on $[0,1]$.
To extend this to a functional CLT for i.i.d. random variables $Y_1,Y_2,\ldots$ with a general c.d.f. $F$, we follow the argument in the proof of Theorem 14.3 from [Bil99]. First define the generalized inverse of $F$,
\[
\varphi(s)=\inf\{t:s\le F(t)\},
\]
which satisfies $s\le F(t)$ if and only if $\varphi(s)\le t$. This means that if $U_1,U_2,\ldots$ are i.i.d. uniformly distributed on $[0,1]$, then $\varphi(U_i)$ has the same distribution as $Y_i$, so that $\mathbf 1_{\{Y_i\le t\}}\stackrel{d}{=}\mathbf 1_{\{\varphi(U_i)\le t\}}=\mathbf 1_{\{U_i\le F(t)\}}$. It follows that
\[
X_N(t)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i\mathbf 1_{\{Y_i\le t\}}}{\pi_i}-\frac1N\sum_{i=1}^N\mathbf 1_{\{Y_i\le t\}}\Bigg)\stackrel{d}{=}Z_N(F(t)),\quad t\in\mathbb R,
\]
where
\[
Z_N(t)=\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big)\mathbf 1_{\{U_i\le t\}},\quad t\in[0,1]. \tag{8.11}
\]
Hence, the general HT empirical process $X_N$ is the image of the HT uniform empirical process $Z_N$ under the mapping $\psi:D[0,1]\mapsto D(\mathbb R)$ given by $[\psi x](t)=x(F(t))$. Note that, if $x_N\to x$ in $D[0,1]$ in the Skorohod topology and $x$ has continuous sample paths, then the convergence is uniform. But then $\psi x_N$ also converges to $\psi x$ uniformly in $D(\mathbb R)$, which implies that $\psi x_N$ converges to $\psi x$ in the Skorohod topology. We have established that $Z_N\Rightarrow Z$ weakly in $D[0,1]$ in the Skorohod topology, where $Z$ has continuous sample paths. Therefore, according to the continuous mapping theorem, e.g., Theorem 2.7 in [Bil99], it follows that $\psi(Z_N)\Rightarrow\psi(Z)$ weakly. This proves the theorem for $Y_i$'s with a general c.d.f. $F$. ✷
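The generalized inverse $\varphi$ and the key equivalence $s\le F(t)\iff\varphi(s)\le t$ are easy to illustrate for an empirical c.d.f., where $\varphi(s)$ is simply an order statistic. A small sketch (our own illustration, not from [Bil99]; function names are ours):

```python
import math

def phi(s, sorted_sample):
    """Generalized inverse phi(s) = inf{t : s <= F_n(t)} of the empirical
    c.d.f. of sorted_sample, i.e. the ceil(s*n)-th order statistic."""
    n = len(sorted_sample)
    k = max(int(math.ceil(s * n)), 1)
    return sorted_sample[k - 1]

def F_n(t, sorted_sample):
    """Empirical c.d.f. F_n(t) = #{X_i <= t} / n."""
    return sum(x <= t for x in sorted_sample) / len(sorted_sample)
```

Applying `phi` to uniform draws is exactly the quantile transform used in the proof: $\varphi(U_i)$ reproduces the distribution of the data, while the events $\{U_i\le F(t)\}$ and $\{\varphi(U_i)\le t\}$ coincide.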
Proof of Proposition 3.1. The proof is similar to that of Theorem 3.1. First consider the case of uniform $Y_i$'s with $F(t)=t$. We only have to verify the weak convergence of the finite dimensional projections of the process $X_N=\sqrt n(F_N^{HT}-F_N)$. Consider (8.7), represented as in (8.8). From (HT1) and Lemma 9.1(ii) in [BLRG15] we conclude that (8.7) converges in distribution to a mean zero normal random variable with variance
\[
\sigma^2_{HT}=\mu_{\pi1}\mathbb E_m V_{1k}^2+\mu_{\pi2}\big(\mathbb E_m[V_{1k}]\big)^2
=\mu_{\pi1}\mathbf a_k^t\mathbb E_m\big[\mathbf Y_{1k}\mathbf Y_{1k}^t\big]\mathbf a_k+\mu_{\pi2}\mathbf a_k^t\big(\mathbb E_m\mathbf Y_{1k}\big)\big(\mathbb E_m\mathbf Y_{1k}\big)^t\mathbf a_k=\mathbf a_k^t\Sigma_k\mathbf a_k,
\]
where $\Sigma_k$ is the $k\times k$ matrix with $(q,r)$-element equal to $\mu_{\pi1}(t_q\wedge t_r)+\mu_{\pi2}t_qt_r$. We conclude that (8.7) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma_k$. As in the proof of Lemma 8.2, by means of the Cramér-Wold device this establishes the limit distribution of $\big(X_N(t_1),\ldots,X_N(t_k)\big)$, which is the same as that of the vector $\big(\mathbb G^{HT}(t_1),\ldots,\mathbb G^{HT}(t_k)\big)$, where $\mathbb G^{HT}$ is a mean zero Gaussian process with covariance function $\mathbb E_{d,m}\mathbb G^{HT}(s)\mathbb G^{HT}(t)=\mu_{\pi1}(s\wedge t)+\mu_{\pi2}st$. From here on, the proof is completely the same as that of Theorem 3.1. ✷
To establish tightness for the process $\sqrt n(F_N^{HT}-F)$, we use the decomposition
\[
\sqrt n(F_N^{HT}-F)=\sqrt n(F_N^{HT}-F_N)+\frac{\sqrt n}{\sqrt N}\cdot\sqrt N(F_N-F). \tag{8.12}
\]
The first process on the right hand side converges weakly to a Gaussian process, according to Theorem 3.1. The process $\sqrt N(F_N-F)$ also converges weakly to a Gaussian process, due to the classical Donsker theorem. In particular, both processes on the right hand side are tight in $D(\mathbb R)$ with the Skorohod metric. In general the sum of two tight processes in $D(\mathbb R)$ is not necessarily tight. However, this is the case if both processes converge weakly to continuous processes (see Lemma 9.2 in [BLRG15]).
Lemma 8.3. Let $V_1,V_2,\ldots$ be a sequence of bounded i.i.d. random variables on $(\Omega,\mathcal F,\mathbb P_m)$ with mean $\mu_V$ and variance $\sigma_V^2$, and let $S_N^2$ be defined by (3.2). Suppose (HT1) and (HT3) hold and $nS_N^2\to\sigma_{HT}^2>0$ in $\mathbb P_m$-probability. Then
\[
\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\mu_V\Bigg) \tag{8.13}
\]
converges in distribution under $\mathbb P_{d,m}$ to a mean zero normal random variable with variance $\sigma_{HT}^2+\lambda\sigma_V^2$.
Note that, in view of the expression for $\sigma_{HT}^2$ obtained in Lemma 9.1, for simple random sampling without replacement the condition $\sigma_{HT}^2>0$ implies that $\lambda$ must differ from 1.
Proof. We use the decomposition
\[
\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\mu_V\Bigg)
=\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\frac1N\sum_{i=1}^N V_i\Bigg)
+\frac{1}{\sqrt n S_N}\times\frac{\sqrt n}{\sqrt N}\times\sqrt N\Bigg(\frac1N\sum_{i=1}^N V_i-\mu_V\Bigg).
\]
According to (HT3), the central limit theorem, Slutsky's theorem, and the fact that $nS_N^2\to\sigma_{HT}^2>0$ in probability,
\[
\frac{1}{\sqrt n S_N}\times\frac{\sqrt n}{\sqrt N}\times\sqrt N\Bigg(\frac1N\sum_{i=1}^N V_i-\mu_V\Bigg)\to N\big(0,\lambda\sigma_V^2/\sigma_{HT}^2\big), \tag{8.14}
\]
in distribution under $\mathbb P_m$, whereas, thanks to (HT1),
\[
\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\frac1N\sum_{i=1}^N V_i\Bigg)\to N(0,1),\quad\omega\text{-a.s.}, \tag{8.15}
\]
in distribution under $\mathbb P_d$. Since the latter limit distribution does not depend on $\omega$, we can apply Theorem 5.1(iii) from [RBSK05]. It follows that
\[
\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_iV_i}{\pi_i}-\mu_V\Bigg)\to N\big(0,1+\lambda\sigma_V^2/\sigma_{HT}^2\big),
\]
in distribution under $\mathbb P_{d,m}$. Together with $nS_N^2\to\sigma_{HT}^2$ in probability, this implies that the random variable in (8.13) converges to a mean zero normal random variable with variance $\sigma_{HT}^2+\lambda\sigma_V^2$.
Lemma 8.4. Let $X_N^F=\sqrt n(F_N^{HT}-F)$ and suppose that (C1)-(C2) and (HT1)-(HT4) hold. Then for any $k\in\{1,2,\ldots\}$ and $t_1,t_2,\ldots,t_k\in\mathbb R$, the sequence $\big(X_N^F(t_1),\ldots,X_N^F(t_k)\big)$ converges in distribution under $\mathbb P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix $\Sigma^F_{HT}=\Sigma_k^{HT}+\lambda\Sigma_F$, where $\Sigma_k^{HT}$ is given in (3.4) and $\Sigma_F$ is the $k\times k$ matrix with $(q,r)$-entry $F(t_q\wedge t_r)-F(t_q)F(t_r)$, for $q,r=1,2,\ldots,k$.
Proof. The proof is similar to the proof of Lemma 8.2. The details can be found in [BLRG15].
Proof of Theorem 3.2. The proof is completely similar to that of Theorem 3.1. We first consider the process $X_N^F=\sqrt n(F_N^{HT}-F)$ for the case that the $Y_i$'s follow a uniform distribution with $F(t)=t$. Decompose $X_N^F$ as in (8.12). By Theorem 3.1, the first process on the right hand side of (8.12) converges weakly to a process in $C[0,1]$. Due to the classical Donsker theorem and (HT3), the second process on the right hand side of (8.12) also converges weakly to a process in $C[0,1]$. Tightness of $X_N^F$ then follows from Lemma 9.2 in [BLRG15]. Convergence of the finite dimensional distributions is provided by Lemma 8.4. The theorem now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are uniformly distributed on $[0,1]$. Next, this is extended to $Y_i$'s with a general c.d.f. $F$ in the same way as in the proof of Theorem 3.1. ✷
To establish convergence of the finite dimensional distributions of $\sqrt n(F_N^{HT}-F)$ under the conditions of Proposition 3.2, we will use the Cramér-Wold device, as in the proof of Lemma 8.4. To ensure that the limit in (9.2) is still strictly positive without imposing (HT4), we need the following lemma. Its proof can be found in [BLRG15].
Lemma 8.5. Let $F$ be the c.d.f. of the i.i.d. $Y_1,\ldots,Y_N$. For any $k$-tuple $(t_1,\ldots,t_k)\in\mathbb R^k$, suppose that the values $F(t_1),\ldots,F(t_k)$ are all distinct and such that $0<F(t_i)<1$. Let $a,b\in\mathbb R$, with $a\ge b$. If $a>0$, then the $k\times k$ matrix $M$ with $(i,j)$-th element $M_{ij}=aF(t_i\wedge t_j)-bF(t_i)F(t_j)$ is positive definite.
Lemma 8.6. Let $X_N^F=\sqrt n(F_N^{HT}-F)$ and suppose that $n$ and $\pi_i,\pi_{ij}$, for $i,j=1,2,\ldots,N$, are deterministic. Suppose that (C1)-(C2), (HT1), and (HT3) hold, as well as conditions (i)-(ii) of Proposition 3.2. Then, for any $k\in\{1,2,\ldots\}$ and $t_1,\ldots,t_k\in\mathbb R$, the vector $\big(X_N^F(t_1),\ldots,X_N^F(t_k)\big)$ converges in distribution under $\mathbb P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix $\Sigma^F_{HT}$, with $(q,r)$-entry $(\mu_{\pi1}+\lambda)F(t_q\wedge t_r)+(\mu_{\pi2}-\lambda)F(t_q)F(t_r)$, for $q,r=1,2,\ldots,k$.
Proof. The proof follows the same ideas as the proof of Lemma 8.4, but is somewhat more technical. It can be found in [BLRG15].
Proof of Proposition 3.2. The proof is similar to that of Theorem 3.2. Tightness is obtained in the same way, and the convergence of the finite dimensional projections is provided by Lemma 8.6. The proposition now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are uniformly distributed on $[0,1]$. Next, this is extended to $Y_i$'s with a general c.d.f. $F$ in the same way as in the proof of Theorem 3.1. ✷
Proof of Theorem 4.1. For part (i), note that with $S_N^2$ defined in (3.2) with $V_i=1$, from (HT1) together with condition (4.6) it follows that
\[
\sqrt n S_N\times\frac{1}{S_N}\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}-1\Bigg)\to N(0,\sigma_\pi^2),\quad\omega\text{-a.s.},
\]
in distribution under $\mathbb P_d$. This implies
\[
\sqrt n\Bigg(\frac{\widehat N}{N}-1\Bigg)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}-1\Bigg)\to N(0,\sigma_\pi^2), \tag{8.16}
\]
in distribution under $\mathbb P_{d,m}$. In particular, since $n\to\infty$, this proves part (i).
The proof of part (ii) is along the same lines as the proofs of Theorems 3.1 and 3.2. First consider the case where the $Y_i$'s are uniform, with $F(t)=t$ on $[0,1]$. Then, with $F_N^{HT}$ defined in (2.1) and $X_N^F=\sqrt n(F_N^{HT}-F)$, we can write $\mathbb G^\pi_N(t)=X_N^F(t)-\big(X_N^F(t)-\mathbb G^\pi_N(t)\big)$. According to Theorem 3.2, the process $X_N^F$ converges weakly to a continuous process. As a consequence of (8.16), the process
\[
X_N^F(t)-\mathbb G^\pi_N(t)=t\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}-1\Bigg)
\]
also converges weakly to a continuous process. Hence, similar to the argument in the proof of Theorem 3.2, we conclude that the process $\mathbb G^\pi_N$ is tight. Next, we establish weak convergence of the finite dimensional projections
\[
\big(\mathbb G^\pi_N(t_1),\ldots,\mathbb G^\pi_N(t_k)\big). \tag{8.17}
\]
To this end we apply the Cramér-Wold device and consider linear combinations
\[
a_1\mathbb G^\pi_N(t_1)+\cdots+a_k\mathbb G^\pi_N(t_k)=\frac{\sqrt n}{N}\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}. \tag{8.18}
\]
Convergence of (8.18) is obtained completely similarly to that of (9.1) in Lemma 8.4, but this time with
\[
V_{ik}=a_1\big(\mathbf 1_{\{Y_i\le t_1\}}-t_1\big)+\cdots+a_k\big(\mathbf 1_{\{Y_i\le t_k\}}-t_k\big)
\]
and $\mu_k=0$. Using the fact that (HJ4) allows the use of Lemma 8.3, one can deduce that (8.18) converges in distribution under $\mathbb P_{d,m}$ to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate normal distribution with covariance matrix $\Sigma_\pi=\Sigma_k^{HJ}+\lambda\Sigma_F$, where $\Sigma_k^{HJ}$ and $\Sigma_F$ are given in (4.5) and Lemma 8.4, respectively. By means of the Cramér-Wold device, this proves that (8.17) converges in distribution under $\mathbb P_{d,m}$ to a mean zero $k$-variate normal random vector with covariance matrix $\Sigma_\pi$. This distribution is the same as that of $\big(\mathbb G^\pi(t_1),\ldots,\mathbb G^\pi(t_k)\big)$, where $\mathbb G^\pi$ is a mean zero Gaussian process with covariance function
\[
\lim_{N\to\infty}\frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_m\Big[n\,\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\big(\mathbf 1_{\{Y_i\le s\}}-s\big)\big(\mathbf 1_{\{Y_j\le t\}}-t\big)\Big]+\lambda(s\wedge t-st),\quad s,t\in\mathbb R.
\]
Since $\mathbb G^\pi$ is continuous at 1, the theorem then follows from Theorem 13.5 in [Bil99] for the case of uniform $Y_i$'s. The extension to $Y_i$'s with a general c.d.f. $F$ is completely similar to the proof of Theorem 3.1. ✷
Proof of Theorem 4.2. We use (4.2). From the proof of Theorem 4.1, we know that $\mathbb G^\pi_N$ is tight. Together with Theorem 4.1(i), it then follows that the limit behavior of $\sqrt n(F_N^{HJ}-F_N)$ is the same as that of the process $\mathbb Y_N$ defined in (4.3). This process can be written as
\[
\mathbb Y_N(t)=\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big)\mathbf 1_{\{Y_i\le t\}}-F(t)\,\frac{\sqrt n}{N}\sum_{i=1}^N\Big(\frac{\xi_i}{\pi_i}-1\Big).
\]
As in the proofs of Theorems 3.1, 3.2, and 4.1, we first consider the case of uniform $Y_i$'s. The first process on the right hand side is $\sqrt n(F_N^{HT}-F_N)$, which converges weakly to a continuous process according to Theorem 3.1, whereas the second process also converges to a continuous process, due to (8.16). As in the proof of Theorem 3.2, one can then argue that $\mathbb Y_N$, being the difference of these processes, is tight. Next, we prove weak convergence of the finite dimensional projections
\[
\big(\mathbb Y_N(t_1),\ldots,\mathbb Y_N(t_k)\big). \tag{8.19}
\]
As before, we apply the Cramér-Wold device and consider
\[
a_1\mathbb Y_N(t_1)+\cdots+a_k\mathbb Y_N(t_k)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}-\frac1N\sum_{i=1}^N V_{ik}\Bigg), \tag{8.20}
\]
with
\[
V_{ik}=a_1\big(\mathbf 1_{\{Y_i\le t_1\}}-t_1\big)+\cdots+a_k\big(\mathbf 1_{\{Y_i\le t_k\}}-t_k\big).
\]
Convergence of (8.20) is obtained completely similarly to that of (8.8) in the proof of Lemma 8.2. From (HT1) and (HJ2), it follows that (8.20) converges in distribution under $\mathbb P_{d,m}$ to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate normal distribution with covariance matrix $\Sigma_k^{HJ}$ given in (4.5). By means of the Cramér-Wold device, this proves that (8.19) converges in distribution under $\mathbb P_{d,m}$ to a mean zero $k$-variate normal random vector with covariance matrix $\Sigma_k^{HJ}$. This distribution is the same as that of $\big(\mathbb G^{HJ}(t_1),\ldots,\mathbb G^{HJ}(t_k)\big)$, where $\mathbb G^{HJ}$ is a mean zero Gaussian process with covariance function
\[
\lim_{N\to\infty}\frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N\mathbb E_m\Big[n\,\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\big(\mathbf 1_{\{Y_i\le s\}}-s\big)\big(\mathbf 1_{\{Y_j\le t\}}-t\big)\Big],
\]
for $s,t\in\mathbb R$. As before, the theorem now follows from Theorem 13.5 in [Bil99] for the case of uniform $Y_i$'s, and is then extended to $Y_i$'s with a general c.d.f. $F$. ✷
Proof of Theorem 4.3. The theorem follows directly from relation (4.7) and Theorem 4.1. ✷
The proofs of Propositions 4.1 and 4.2 are similar to those of Theorems 4.2 and 4.1, respectively,
and can be found in [BLRG15]. The proofs for Corollaries 5.1 and 5.2 are fairly straightforward
and can be found in [BLRG15].
References
[BCC14] Patrice Bertail, Emilie Chautru, and Stéphan Clémençon. Empirical processes in survey
sampling. Submitted, See also https://hal.archives-ouvertes.fr/hal-00989585,
2014.
[BD09] Garry F. Barrett and Stephen G. Donald. Statistical inference with generalized Gini
indices of inequality, poverty, and welfare. J. Bus. Econom. Statist., 27(1):1–17, 2009.
[Bha07] Debopam Bhattacharya. Inference on inequality from household survey data. J. Econo-
metrics, 137(2):674–707, 2007.
[Bil99] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability
and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, second
edition, 1999. A Wiley-Interscience Publication.
[BLRG12] Hélène Boistard, Hendrik P. Lopuhaä, and Anne Ruiz-Gazen. Approximation of rejective sampling inclusion probabilities and application to high order correlations. Electron. J. Stat., 6:1967–1983, 2012.
[BLRG15] Hélène Boistard, Hendrik P. Lopuhaä, and Anne Ruiz-Gazen. Supplement to "Functional central limit theorems in survey sampling". 2015.
[BM11] Debopam Bhattacharya and Bhashkar Mazumder. A nonparametric analysis of black and white differences in intergenerational income mobility in the United States. Quant. Econ., 2(3):335–379, 2011.
[BO00] F. Jay Breidt and Jean D. Opsomer. Local polynomial regression estimators in survey sampling. Ann. Statist., 28(4):1026–1053, 2000.
[BR09] David Binder and Georgia Roberts. Handbook of Statistics 29B: Sample Surveys: Design,
Methods and Applications., chapter Chapter 24: Design- and Model-Based Inference for
Model Parameters, pages 33–54. Elsevier, Amsterdam, 2009.
[BS03] Yves G. Berger and Chris J. Skinner. Variance estimation for a low income proportion.
J. Roy. Statist. Soc. Ser. C, 52(4):457–468, 2003.
[BW07] Norman E. Breslow and Jon A. Wellner. Weighted likelihood for semiparametric models
and two-phase stratified samples, with application to Cox regression. Scand. J. Statist.,
34(1):86–102, 2007.
[CCGL10] Hervé Cardot, Mohamed Chaouch, Camelia Goga, and Catherine Labruère. Properties
of design-based functional principal components analysis. J. Statist. Plann. Inference,
140(1):75–91, 2010.
[Dav09] Russell Davidson. Reliable inference for the Gini index. J. Econometrics, 150(1):30–40,
2009.
[Dd08] Fabien Dell and Xavier d’Haultfœuille. Measuring the evolution of complex indicators:
Theory and application to the poverty rate in France. Ann. Économ. Statist., (90):259–
290, 2008.
[DS92] Jean-Claude Deville and Carl-Erik Särndal. Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418):376–382, 1992.
[Dud02] R. M. Dudley. Real analysis and probability, volume 74 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, Cambridge, 2002. Revised reprint of the 1989
original.
[FF91] Carol A. Francisco and Wayne A. Fuller. Quantile estimation with a complex survey
design. Ann. Statist., 19(1):454–469, 1991.
[Ful09] W.A. Fuller. Sampling Statistics. Wiley Series in Survey Methodology. Wiley, New York,
2009.
[GT14] Eric Graf and Yves Tillé. Variance estimation using linearization for poverty and social
exclusion indicators. Survey Methodology, 40(1):61–79, 2014.
[Háj64] Jaroslav Hájek. Asymptotic theory of rejective sampling with varying probabilities from
a finite population. Ann. Math. Statist., 35:1491–1523, 1964.
[KG98] Edward L. Korn and Barry I. Graubard. Variance estimation for superpopulation pa-
rameters. Statist. Sinica, 8(4):1131–1151, 1998.
[KR81] D. Krewski and J. N. K. Rao. Inference from stratified samples: properties of the
linearization, jackknife and balanced repeated replication methods. Ann. Statist.,
9(5):1010–1019, 1981.
[OAB15] M. Oguz-Alper and Y. G. Berger. Variance estimation of change of poverty based upon the Turkish EU-SILC survey. Journal of Official Statistics, 31(2):155–175, 2015.
[PW93] Jens Præstgaard and Jon A. Wellner. Exchangeably weighted bootstraps of the general
empirical process. Ann. Probab., 21(4):2053–2086, 1993.
[RBSK05] Susana Rubin-Bleuer and Ioana Schiopu Kratina. On the two-phase framework for joint
model and design-based inference. Ann. Statist., 33(6):2789–2810, 2005.
[Sil86] B. W. Silverman. Density estimation for statistics and data analysis. Monographs on
Statistics and Applied Probability. Chapman & Hall, London, 1986.
[SW13] Takumi Saegusa and Jon A. Wellner. Weighted likelihood estimation under two-phase
sampling. Ann. Statist., 41(1):269–295, 2013.
[Tho97] M. E. Thompson. Theory of sample surveys, volume 74 of Monographs on Statistics and
Applied Probability. Chapman & Hall, London, 1997.
[vdV98] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[vdVW96] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes.
Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to
statistics.
[Wan12] Jianqiang C. Wang. Sample distribution function based goodness-of-fit test for complex
surveys. Comput. Statist. Data Anal., 56(3):664–679, 2012.
Hélène Boistard
Toulouse School of Economics
21 allée de Brienne
31000 Toulouse, France
e-mail: helene@boistard.fr
Hendrik P. Lopuhaä
Delft Institute of Applied Mathematics
Delft University of Technology
Delft, The Netherlands
e-mail: h.p.lopuhaa@tudelft.nl
Anne Ruiz-Gazen
Toulouse School of Economics
21 allée de Brienne
31000 Toulouse, France
e-mail: anne.ruiz-gazen@tse-fr.eu
9 Supplemental Material
9.1 Proofs of Lemmas, Propositions and Corollaries in the main text
Proof of Lemma 8.4. We use the Cramér-Wold device. To this end, we determine the limit distribution of $a_1X_N^F(t_1)+\cdots+a_kX_N^F(t_k)$, for $a_1,\ldots,a_k\in\mathbb R$ fixed, with $\mathbf a_k^t=(a_1,\ldots,a_k)\ne(0,\ldots,0)$. As in the proof of Lemma 8.2, we consider
\[
a_1X_N^F(t_1)+\cdots+a_kX_N^F(t_k)=\sqrt n\Bigg(\frac1N\sum_{i=1}^N\frac{\xi_i}{\pi_i}V_{ik}-\mu_k\Bigg), \tag{9.1}
\]
where $V_{ik}$ is defined in (8.9). We want to apply Lemma 8.3. As in (8.10),
\[
nS_N^2\to\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k,\quad\omega\text{-a.s.}, \tag{9.2}
\]
where $\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k>0$, thanks to (HT4). This means that, according to Lemma 8.3, the right hand side of (9.1) converges in distribution under $\mathbb P_{d,m}$ to a mean zero normal random variable with variance
\[
\mathbf a_k^t\Sigma_k^{HT}\mathbf a_k+\lambda\Big\{\mathbb E_m[V_{1k}^2]-\big(\mathbb E_m[V_{1k}]\big)^2\Big\}=\mathbf a_k^t\Sigma^F_{HT}\mathbf a_k,
\]
where
\[
\Sigma^F_{HT}=\Sigma_k^{HT}+\lambda\Sigma_F. \tag{9.3}
\]
We conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$. By the Cramér-Wold device, this proves the lemma. ✷
Proof of Lemma 8.5. Without loss of generality we may assume $0<F(t_1)<\cdots<F(t_k)<1$, since we can permute the rows and columns of $M$ without changing the determinant. For the entries of $M$ we can distinguish three situations:
1. if $1\le j<i\le k$, then $M_{ij}=aF(t_j)-bF(t_i)F(t_j)$;
2. if $1\le i=j\le k$, then $M_{ij}=aF(t_i)-bF(t_i)^2$;
3. if $1\le i<j\le k$, then $M_{ij}=aF(t_i)-bF(t_i)F(t_j)$.
Now, for $2\le i\le k$, multiply the $i$-th row by $F(t_1)/F(t_i)$. This changes the determinant by a factor $F(t_1)^{k-1}/\big(F(t_2)\cdots F(t_k)\big)>0$, and as a result, all entries in column $j$ at positions $1\le i\le j\le k$ are the same: $aF(t_1)-bF(t_1)F(t_j)$. Hence, if we subtract row 2 from row 1, then row 3 from row 2, \ldots, and finally row $k$ from row $k-1$, we obtain a new matrix $M'$ whose upper-right triangle consists of zeros and whose main diagonal has elements $M'_{ii}=aF(t_1)-aF(t_1)F(t_i)/F(t_{i+1})$, for $1\le i\le k-1$, and $M'_{kk}=aF(t_1)-bF(t_1)F(t_k)$. It follows that
\[
\det(M)=\frac{F(t_2)\cdots F(t_k)}{F(t_1)^{k-1}}\det(M')
=a^{k-1}F(t_1)\big(F(t_2)-F(t_1)\big)\cdots\big(F(t_k)-F(t_{k-1})\big)\big(a-bF(t_k)\big)>0,
\]
since $a>0$, $0<F(t_1)<\cdots<F(t_k)<1$, and $a-bF(t_k)>a-b\ge0$. ✷
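Both the positive definiteness asserted by Lemma 8.5 and the determinant formula derived in its proof are easy to check numerically. A minimal sketch (function names are ours) builds $M_{ij}=aF(t_i\wedge t_j)-bF(t_i)F(t_j)$ from increasing values $F(t_1),\ldots,F(t_k)$ and compares $\det(M)$ with the closed form:

```python
import numpy as np

def M_matrix(F_vals, a, b):
    """k x k matrix with M_ij = a*F(t_{i ∧ j}) - b*F(t_i)*F(t_j), where
    F_vals = (F(t_1), ..., F(t_k)) are increasing values in (0, 1)."""
    F = np.asarray(F_vals, dtype=float)
    return a * np.minimum.outer(F, F) - b * np.outer(F, F)

def det_closed_form(F_vals, a, b):
    """det(M) = a^(k-1) F(t_1) (F(t_2)-F(t_1)) ... (F(t_k)-F(t_{k-1})) (a - b F(t_k)),
    as derived in the proof of Lemma 8.5."""
    F = np.asarray(F_vals, dtype=float)
    k = len(F)
    return a ** (k - 1) * F[0] * np.prod(np.diff(F)) * (a - b * F[-1])
```

With $a\ge b$ and $a>0$, all eigenvalues of `M_matrix(...)` are strictly positive, in line with the lemma.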
Proof of Lemma 8.6. The proof is similar to that of Lemma 8.4. We determine the limit distribution of (9.1). Note that without loss of generality we can assume that $0\le F(t_1)\le\cdots\le F(t_k)\le1$. In contrast with the proof of Lemma 8.4, we now have to distinguish between several cases.
We first consider the situation where all $F(t_i)$'s are distinct and such that $0<F(t_i)<1$. From (HT1) and Lemma 9.1(ii) we conclude that
\[
nS_N^2\to\sigma_{HT}^2=\mu_{\pi1}\mathbb E_m[V_{1k}^2]+\mu_{\pi2}\big(\mathbb E_m[V_{1k}]\big)^2=\mathbf a_k^t\Sigma_k\mathbf a_k,
\]
where
\[
\Sigma_k=\Big(\mu_{\pi1}F(t_q\wedge t_r)+\mu_{\pi2}F(t_q)F(t_r)\Big)_{q,r=1}^k. \tag{9.4}
\]
First note that
\[
\mu_{\pi1}+\mu_{\pi2}=\lim_{N\to\infty}\frac{n}{N^2}\sum_{i=1}^N\sum_{j=1}^N\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}=\lim_{N\to\infty}\frac{n}{N^2}\operatorname{Var}\Bigg(\sum_{i=1}^N\frac{\xi_i}{\pi_i}\Bigg)\ge0.
\]
Therefore, together with condition (i), we can apply Lemma 8.5 with $a=\mu_{\pi1}$ and $b=-\mu_{\pi2}$. It follows that $\Sigma_k$ is positive definite, so that $\sigma_{HT}^2>0$. This means that, according to Lemma 8.3, the right hand side of (9.1) converges in distribution under $\mathbb P_{d,m}$ to a mean zero normal random variable with variance $(\mu_{\pi1}+\lambda)\mathbb E_m[V_{1k}^2]+(\mu_{\pi2}-\lambda)\big(\mathbb E_m[V_{1k}]\big)^2=\mathbf a_k^t\Sigma^F_{HT}\mathbf a_k$, where
\[
\Sigma^F_{HT}=\Big((\mu_{\pi1}+\lambda)F(t_q\wedge t_r)+(\mu_{\pi2}-\lambda)F(t_q)F(t_r)\Big)_{q,r=1}^k. \tag{9.5}
\]
We conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$. By means of the Cramér-Wold device, this proves the lemma for the case $0<F(t_1)<\cdots<F(t_k)<1$.
The case where the $F(t_i)$'s are not all distinct, but still satisfy $0<F(t_i)<1$, can be reduced to the case where all $F(t_i)$'s are distinct. This can be seen as follows. For simplicity, suppose $F(t_1)=\cdots=F(t_m)=F(t_0)$, with $0<F(t_0)<F(t_{m+1})<\cdots<F(t_k)<1$. Then we can write (9.1) as
\[
a_0X_N^F(t_0)+a_{m+1}X_N^F(t_{m+1})+\cdots+a_kX_N^F(t_k), \tag{9.6}
\]
where $a_0=a_1+\cdots+a_m$. As before, with (HT4) and Lemma 8.5, it follows from Lemma 8.3 that (9.6) converges in distribution to a mean zero normal random variable with variance $\mathbf a_0^t\Sigma^F_0\mathbf a_0$, where $\mathbf a_0=(a_0,a_{m+1},\ldots,a_k)^t$ and
\[
\Sigma^F_0=\gamma_{\pi1}\mathbb E_m\big[\mathbf Y_0\mathbf Y_0^t\big]+(\gamma_{\pi2}-\lambda)\big(\mathbb E_m[\mathbf Y_0]\big)\big(\mathbb E_m[\mathbf Y_0]\big)^t,
\]
with $\mathbf Y_0=\big(\mathbf 1_{\{Y_i\le t_0\}},\mathbf 1_{\{Y_i\le t_{m+1}\}},\ldots,\mathbf 1_{\{Y_i\le t_k\}}\big)^t$. However, note that
\[
\mathbf a_0^t\mathbf Y_0=(a_1+\cdots+a_m)\mathbf 1_{\{Y_i\le t_0\}}+a_{m+1}\mathbf 1_{\{Y_i\le t_{m+1}\}}+\cdots+a_k\mathbf 1_{\{Y_i\le t_k\}}
=a_1\mathbf 1_{\{Y_i\le t_1\}}+\cdots+a_k\mathbf 1_{\{Y_i\le t_k\}}=\mathbf a_k^t\mathbf Y_{1k},
\]
where $\mathbf a_k=(a_1,\ldots,a_k)^t$ and $\mathbf Y_{1k}=\big(\mathbf 1_{\{Y_i\le t_1\}},\ldots,\mathbf 1_{\{Y_i\le t_k\}}\big)^t$, as before. This means that $\mathbf a_0^t\Sigma^F_0\mathbf a_0=\mathbf a_k^t\Sigma^F_{HT}\mathbf a_k$, with $\Sigma^F_{HT}$ from (9.3). It follows that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$. By means of the Cramér-Wold device, this proves the lemma for the case $F(t_1)=\cdots=F(t_m)=F(t_0)<F(t_{m+1})<\cdots<F(t_k)<1$. The argument is the same for other cases with multiple $F(t_i)\in(0,1)$ being equal to each other.
Next, consider the case $F(t_1)=0$. Then $\mathbf 1_{\{Y_i\le t_1\}}=0$ with probability one. This means that the linear combination on the left hand side of (9.1) reduces to $a_2X_N^F(t_2)+\cdots+a_kX_N^F(t_k)$ and
\[
\Sigma_{HT}=\begin{pmatrix}0&0&\cdots&0\\0&&&\\\vdots&&\Sigma_{HT,k-1}&\\0&&&\end{pmatrix}, \tag{9.7}
\]
where $\Sigma_{HT,k-1}$ is the matrix in (9.4) based on $0<F(t_2)<\cdots<F(t_k)<1$. When $\mathbf a_{k-1}^t=(a_2,\ldots,a_k)\ne(0,\ldots,0)$, then
\[
\sigma_{HT}^2=\mathbf a_k^t\Sigma_{HT}\mathbf a_k=\mathbf a_{k-1}^t\Sigma_{HT,k-1}\mathbf a_{k-1}>0,
\]
because $\Sigma_{HT,k-1}$ is positive definite, due to (HT4) and Lemma 8.5. This allows application of Lemma 8.3 to (9.1). As in the previous cases, we conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HT}$ given by (9.3). When $\mathbf a_k^t=(a_1,0,\ldots,0)$, with $a_1\ne0$, then both (9.1) and $a_1N_1+\cdots+a_kN_k$ are equal to zero. According to the Cramér-Wold device, this proves the lemma for the case $F(t_1)=0$.
It remains to consider the case $F(t_k)=1$. In this case, the $(k,k)$-th element of the matrix $\Sigma_{HT}$ in (9.4) is equal to $\mu_{\pi 1}+\mu_{\pi 2}$. We distinguish between $\mu_{\pi 1}+\mu_{\pi 2}=0$ and $\mu_{\pi 1}+\mu_{\pi 2}>0$. In the latter case, from the proof of Lemma 8.5 we find that $\Sigma_{HT}$ has determinant
\[
\mu_{\pi 1}^{k-1}F(t_1)\prod_{i=2}^{k}\big(F(t_i)-F(t_{i-1})\big)\,(\mu_{\pi 1}+\mu_{\pi 2})>0,
\]
using (HT4) and $0<F(t_1)<\cdots<F(t_{k-1})<F(t_k)=1$. This allows application of Lemma 8.3 to (9.1). As before, we conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma^F_{HT}$ from (9.3). According to the Cramér-Wold device, this proves the lemma for the case $F(t_k)=1$ and $\mu_{\pi 1}+\mu_{\pi 2}>0$.
Next, consider the case $F(t_k)=1$ and $\mu_{\pi 1}+\mu_{\pi 2}=0$. This means
\[
\Sigma_{HT} =
\begin{pmatrix}
 & & & 0\\
 & \Sigma_{HT,k-1} & & \vdots\\
 & & & 0\\
0 & \cdots & 0 & 0
\end{pmatrix}, \tag{9.8}
\]
where $\Sigma_{HT,k-1}$ is the matrix in (9.4) corresponding to $0<F(t_1)<\cdots<F(t_{k-1})<1$. When $a_{k-1}^t=(a_1,\ldots,a_{k-1})\ne(0,\ldots,0)$, then
\[
\sigma^2_{HT} = a_k^t\Sigma_{HT}a_k = a_{k-1}^t\Sigma_{HT,k-1}a_{k-1}>0,
\]
because $\Sigma_{HT,k-1}$ is positive definite, due to (HT4) and Lemma 8.5. This allows application of Lemma 8.3 to (9.1). As in the previous cases, we conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma^F_{HT}$ given by (9.3). When $a_k^t=(0,\ldots,0,a_k)$, with $a_k\ne0$, then $a_1N_1+\cdots+a_kN_k=0$ and
\[
a_1X^F_N(t_1)+\cdots+a_kX^F_N(t_k) = a_k\sqrt{n}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_i}-1\right)
\]
converges to zero in probability. The latter follows from the fact that, according to (HT1) and Lemma 9.1, we have that
\[
\sqrt{n}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_i}-1\right)\to N(0,\mu_{\pi 1}+\mu_{\pi 2}), \tag{9.9}
\]
in distribution under $P_{d,m}$. According to the Cramér-Wold device, this proves the lemma for the case $F(t_k)=1$ and $\mu_{\pi 1}+\mu_{\pi 2}=0$. Finally, the argument for the case that $F(t_1)=0$ and $F(t_k)=1$ simultaneously, either with or without repeated values among the $F(t_i)$'s, is completely similar. This finishes the proof. ✷
Proof of Proposition 4.1 The proof is similar to that of Theorem 4.2. We find that the limit behavior of $\sqrt{n}(F^{HJ}_N-F_N)$ is the same as that of the process $Y_N$ defined in (4.3). When we first consider the case of uniform $Y_i$'s with $F(t)=t$, tightness of the process $Y_N$ follows in the same way as in the proof of Theorem 4.2. It remains to establish weak convergence of the finite dimensional projections (8.19). This can be done in the same way as in the proof of Proposition 3.1, but this time with
\[
V_{ik} = a_1\big(\mathbf{1}_{\{Y_i\le t_1\}}-t_1\big)+\cdots+a_k\big(\mathbf{1}_{\{Y_i\le t_k\}}-t_k\big).
\]
From (HT1) and Lemma 9.1(i) we conclude that (8.20) converges in distribution to a mean zero normal random variable with variance
\[
\sigma^2_{HT} = \mu_{\pi 1}E_m V_{1k}^2 = a_k^t\widetilde{\Sigma}_ka_k,
\]
where $\widetilde{\Sigma}_k$ is the $k\times k$-matrix with $(q,r)$-element equal to $\mu_{\pi 1}(t_q\wedge t_r-t_qt_r)$. We conclude that (8.20) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\widetilde{\Sigma}_k$. By means of the Cramér-Wold device this establishes the limit distribution of (8.19), which is the same as that of the vector $\big(\mathbb{G}_{HJ}(t_1),\ldots,\mathbb{G}_{HJ}(t_k)\big)$, where $\mathbb{G}_{HJ}$ is a mean zero Gaussian process with covariance function
\[
E_{d,m}\,\mathbb{G}_{HJ}(s)\mathbb{G}_{HJ}(t) = \mu_{\pi 1}(s\wedge t-st).
\]
From here on, the proof is completely the same as that of Theorem 4.2. ✷
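The covariance function above is $\mu_{\pi 1}$ times the Brownian-bridge covariance: for $U$ uniform on $(0,1)$, $\mathrm{Cov}\big(\mathbf{1}_{\{U\le s\}},\mathbf{1}_{\{U\le t\}}\big)=s\wedge t-st$. A seeded Monte Carlo sketch of this identity (the values of $s$ and $t$ are arbitrary):

```python
import random

def indicator_cov(s, t, m=200000, seed=7):
    """Monte Carlo estimate of Cov(1{U<=s}, 1{U<=t}) for U ~ Uniform(0,1)."""
    rng = random.Random(seed)
    sum_a = sum_b = sum_ab = 0
    for _ in range(m):
        u = rng.random()
        a = 1 if u <= s else 0
        b = 1 if u <= t else 0
        sum_a += a
        sum_b += b
        sum_ab += a * b
    return sum_ab / m - (sum_a / m) * (sum_b / m)

s, t = 0.3, 0.7
exact = min(s, t) - s * t          # s ∧ t − st
approx = indicator_cov(s, t)
print(exact, round(approx, 3))
```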
Proof of Proposition 4.2 From relation (4.7) and Theorem 4.1 we know that the limit behavior of $\sqrt{n}(F^{HJ}_N-F)$ is the same as that of $\mathbb{G}^\pi_N$. Tightness of $\mathbb{G}^\pi_N$ has been obtained in the proof of Theorem 4.1. It remains to establish weak convergence of (8.17). This can be done in the same way as in the proof of Lemma 8.6, but this time with
\[
V_{ik} = a_1\big(\mathbf{1}_{\{Y_i\le t_1\}}-F(t_1)\big)+\cdots+a_k\big(\mathbf{1}_{\{Y_i\le t_k\}}-F(t_k)\big)
\]
and $\mu_k=0$. When $0<F(t_1)<\cdots<F(t_k)<1$, from (HT1) and Lemma 9.1 we find that $nS^2_N\to\mu_{\pi 1}E_m[V_{1k}^2]=a_k^t\Sigma_ka_k$, where
\[
\Sigma_k = \Big(\mu_{\pi 1}\big(F(t_q\wedge t_r)-F(t_q)F(t_r)\big)\Big)_{q,r=1}^{k}. \tag{9.10}
\]
From condition (i) of Proposition 3.2 and Lemma 8.5, it follows that $\Sigma_k$ is positive definite, so that $a_k^t\Sigma_ka_k>0$. Hence, according to Lemma 8.3, the right hand side of (8.18) converges in distribution under $P_{d,m}$ to a mean zero normal random variable with variance $(\mu_{\pi 1}+\lambda)E_m[V_{1k}^2]=a_k^t\Sigma^F_{HJ}a_k$, where
\[
\Sigma^F_{HJ} = \Big((\mu_{\pi 1}+\lambda)\big(F(t_q\wedge t_r)-F(t_q)F(t_r)\big)\Big)_{q,r=1}^{k}. \tag{9.11}
\]
We conclude that the right hand side of (8.18) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HJ}$. By means of the Cramér-Wold device, this proves weak convergence of $\big(\mathbb{G}^\pi_N(t_1),\ldots,\mathbb{G}^\pi_N(t_k)\big)$ for the case that $0<F(t_1)<\cdots<F(t_k)<1$. As in the proof of Lemma 8.6, the case where the $F(t_i)$'s are not all distinct, but satisfy $0<F(t_i)<1$, the case $F(t_1)=0$, and the case $F(t_k)=1$, can be reduced to the previous case. From here on, the proof is completely the same as that of Theorem 4.1. ✷
Proof of (5.2) Following [Dd08], one can write $\phi=\psi_2\circ\psi_1$, where
\[
\psi_1(F) = \big(F,\,\beta F^{-1}(\alpha)\big),
\qquad
\psi_2(F,x) = F(x).
\]
The Hadamard-derivative of $\phi$ can then be obtained from the chain rule, e.g., see Lemma 3.9.3 in [vdVW96]. According to Lemma 3.9.20 in [vdVW96], for $0<\alpha<1$ and $F\in D_\phi$ that have a positive derivative at $F^{-1}(\alpha)$, the map $\psi_1$ is Hadamard-differentiable at $F$ tangentially to the set of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$, with derivative
\[
\psi'_{1,F}(h) = \left(h,\,-\beta\frac{h(F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\right).
\]
It is fairly straightforward to show that for $F$ that are differentiable at $x$, the mapping $\psi_2$ is Hadamard-differentiable at $(F,x)$ tangentially to the set of pairs $(h,\epsilon)$, such that $h$ is continuous at $x$ and $\epsilon\in\mathbb{R}$, with derivative
\[
\psi'_{2,(F,x)}(h,\epsilon) = \epsilon f(x)+h(x).
\]
Then for $F\in D_\phi$ that are differentiable at $\beta F^{-1}(\alpha)$, the mapping $\psi_2$ is Hadamard-differentiable at $\psi_1(F)=\big(F,\beta F^{-1}(\alpha)\big)$. It follows from the chain rule that $\phi(F)=F\big(\beta F^{-1}(\alpha)\big)=\psi_2\circ\psi_1(F)$ is Hadamard-differentiable at $F$ tangentially to the set $D_0$ consisting of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$, with derivative
\[
\phi'_F(h) = -\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\,h(F^{-1}(\alpha))+h(\beta F^{-1}(\alpha)).
\]
✷
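The chain-rule formula for $\phi'_F(h)$ can be sanity-checked with a finite difference in the direction $h$. The sketch below uses illustrative choices that are not from the paper: $F$ standard exponential, $h(x)=\sin(x)e^{-x}$, $\alpha=1/2$, $\beta=2$, and a bisection solver for the quantile:

```python
import math

# Illustrative choices (assumptions for this example, not from the paper).
F = lambda x: 1.0 - math.exp(-x)        # standard exponential cdf
f = lambda x: math.exp(-x)              # its density
h = lambda x: math.sin(x) * math.exp(-x)
alpha, beta = 0.5, 2.0

def quantile(G, p, lo=0.0, hi=50.0):
    """Solve G(x) = p by bisection on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi(G):
    """phi(G) = G(beta * G^{-1}(alpha))."""
    return G(beta * quantile(G, alpha))

# Central finite difference of t -> phi(F + t*h) at t = 0.
t = 1e-5
fd = (phi(lambda x: F(x) + t * h(x)) - phi(lambda x: F(x) - t * h(x))) / (2.0 * t)

# Hadamard derivative from the chain rule above.
q = quantile(F, alpha)                   # F^{-1}(alpha) = log 2
analytic = -beta * f(beta * q) / f(q) * h(q) + h(beta * q)
print(round(fd, 6), round(analytic, 6))
```

The two numbers agree to several decimals, as expected from Hadamard differentiability along the smooth path $F+th$.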
Proof of Corollary 5.1 The mapping $\phi:D_\phi\subset D(\mathbb{R})\mapsto\mathbb{R}$ is Hadamard-differentiable at $F$ tangentially to the set $D_0$ consisting of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$. According to Theorem 3.2, the sequence $\sqrt{n}(F^{HT}_N-F)$ converges weakly to a mean zero Gaussian process $\mathbb{G}^{HT}_F$ with covariance structure
\[
E_{d,m}\,\mathbb{G}^{HT}_F(s)\mathbb{G}^{HT}_F(t) = (\mu_{\pi 1}+\lambda)F(s\wedge t)+(\mu_{\pi 2}-\lambda)F(s)F(t), \tag{9.12}
\]
for $s,t\in\mathbb{R}$. It then follows from Theorem 3.9.4 in [vdVW96], that the random variable $\sqrt{n}\big(\phi(F^{HT}_N)-\phi(F)\big)$ converges weakly to
\[
-\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\,\mathbb{G}^{HT}_F(F^{-1}(\alpha))+\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha)),
\]
which has a normal distribution with mean zero and variance
\[
\begin{aligned}
\sigma^2_{HT,\alpha,\beta}
&= \beta^2\frac{f(\beta F^{-1}(\alpha))^2}{f(F^{-1}(\alpha))^2}\,E\big[\mathbb{G}^{HT}_F(F^{-1}(\alpha))^2\big]
+E\big[\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha))^2\big]\\
&\quad-2\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\,E\big[\mathbb{G}^{HT}_F(F^{-1}(\alpha))\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha))\big].
\end{aligned}
\]
The precise expression can then be derived from (9.12), which proves part one. For part two, write
\[
\sqrt{n}\big(\phi(F^{HT}_N)-\phi(F_N)\big)
= \sqrt{n}\big(\phi(F^{HT}_N)-\phi(F)\big)-\frac{\sqrt{n}}{\sqrt{N}}\,\sqrt{N}\big(\phi(F_N)-\phi(F)\big).
\]
The process $\sqrt{N}(F_N-F)$ converges weakly to a mean zero Gaussian process $\mathbb{G}_F$. Then, Hadamard-differentiability of $\phi$ together with Theorem 3.9.4 in [vdVW96] yields that the sequence $\sqrt{N}\big(\phi(F_N)-\phi(F)\big)$ converges weakly to $\phi'_F(\mathbb{G}_F)$. As $n/N\to0$, the corollary follows from part one. ✷
Proof of Corollary 5.2 The proof is completely the same as that of Corollary 5.1, with the only difference that the covariance structure of the limiting process of $\sqrt{n}\big(\phi(F^{HJ}_N)-\phi(F)\big)$ is now given in Theorem 4.3. ✷
9.2 Additional Lemmas
Lemma 9.1. Let $S^2_N$ be defined by (3.2), where $V_1,V_2,\ldots$ is a sequence of i.i.d. random variables on $(\Omega,\mathcal{F},P_m)$ with $E_m[V_1^4]<\infty$. Suppose that $n$ and $\pi_i,\pi_{ij}$, for $i,j=1,2,\ldots,N$, are deterministic, and let $V_m(S^2_N)$ denote the variance of $S^2_N$. If (C1)-(C2) hold, then $n^2V_m[S^2_N]=O(1/N)$. Moreover,

(i) if $E_m[V_1]=0$ and condition (i) in Proposition 3.1 holds,
\[
nS^2_N\to\sigma^2_{HT}=\mu_{\pi 1}E_m[V_1^2],\quad\text{in }P_m\text{-probability};
\]
(ii) if $E_m[V_1]\ne0$ and conditions (i)-(ii) in Proposition 3.1 hold,
\[
nS^2_N\to\sigma^2_{HT}=\mu_{\pi 1}E_m[V_1^2]+\mu_{\pi 2}\big(E_m[V_1]\big)^2,\quad\text{in }P_m\text{-probability}.
\]
Proof. For any $\epsilon>0$, by the Markov inequality we have
\[
P_m\big(|nS^2_N-E_m[nS^2_N]|>\epsilon\big)\le\frac{n^2V_m[S^2_N]}{\epsilon^2}, \tag{9.13}
\]
where $V_m$ denotes the variance of $S^2_N$ under the super-population model. In order to compute $V_m[S^2_N]$, we first have
\[
E_m[S^2_N]
= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,E_m(V_iV_j)
= \frac{E_m[V_1^2]}{N^2}\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}
+\frac{(E_m[V_1])^2}{N^2}\mathop{\sum\sum}_{i\ne j}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}. \tag{9.14}
\]
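The second equality in (9.14) splits the double sum into diagonal terms, where $\pi_{ii}=\pi_i$ gives $(\pi_{ii}-\pi_i^2)/\pi_i^2=(1-\pi_i)/\pi_i$, and off-diagonal terms. A quick numeric sketch of this identity, with arbitrary illustrative $\pi_i$, $\pi_{ij}$ and moments of $V_1$:

```python
# Check of the second equality in (9.14): the full double sum, with
# E_m(V_i V_j) = E[V^2] on the diagonal and (E[V])^2 off the diagonal,
# equals the two-term expression. All numbers below are illustrative.
N = 4
pi = [0.3, 0.5, 0.6, 0.8]
# symmetric second-order inclusion probabilities; by convention pi_ii = pi_i
pi2 = [[pi[i] if i == j else 0.9 * pi[i] * pi[j] for j in range(N)]
       for i in range(N)]
EV, EV2 = 0.7, 1.3          # illustrative first and second moments of V_1

lhs = sum(
    (pi2[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * (EV2 if i == j else EV ** 2)
    for i in range(N) for j in range(N)
) / N ** 2

rhs = (EV2 / N ** 2) * sum((1 - p) / p for p in pi) + (EV ** 2 / N ** 2) * sum(
    (pi2[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j])
    for i in range(N) for j in range(N) if i != j
)

print(abs(lhs - rhs) < 1e-12)
```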
From this, tedious but straightforward calculus leads to the expression for $(E_m[S^2_N])^2$ and $E_m[S^4_N]$. One finds
\[
N^4\big(E_m[S^2_N]\big)^2 = a_1(E_m[V_1])^4+a_2E_m[V_1^2](E_m[V_1])^2+a_3\big(E_m[V_1^2]\big)^2,
\]
where, according to (C1)-(C2):
\[
\begin{aligned}
a_1 &= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+4\mathop{\sum\sum\sum}_{(i,j,l)\in D_{3,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{il}-\pi_i\pi_l}{\pi_i\pi_l}
+2\mathop{\sum\sum}_{(i,j)\in D_{2,N}}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\right)^2\\
&= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2)+O(N^2/n^2),\\
a_2 &= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+4\mathop{\sum\sum}_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{ik}-\pi_i\pi_k}{\pi_i\pi_k}
= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
a_3 &= \mathop{\sum\sum}_{(i,j)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_j}{\pi_j}
+\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)^2
= \mathop{\sum\sum}_{(i,j)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_j}{\pi_j}+O(N^3/n^2).
\end{aligned}
\]
Furthermore,
\[
N^4E_m[S^4_N] = b_1(E_m[V_1])^4+b_2E_m[V_1^2](E_m[V_1])^2+b_3\big(E_m[V_1^2]\big)^2+b_4E_m[V_1]E_m[V_1^3],
\]
where
\[
\begin{aligned}
b_1 &= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)^2
= \mathop{\sum\sum\sum\sum}_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
b_2 &= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}
+4\mathop{\sum\sum\sum}_{(i,j,l)\in D_{3,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{\pi_{il}-\pi_i\pi_l}{\pi_i\pi_l}
= 2\mathop{\sum\sum\sum}_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
b_3 &= \mathop{\sum\sum}_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_k}{\pi_k}
+2\mathop{\sum\sum}_{(i,j)\in D_{2,N}}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\right)^2
= \mathop{\sum\sum}_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\cdot\frac{1-\pi_k}{\pi_k}+O(N^2/n^2),\\
b_4 &= 4\mathop{\sum\sum}_{(i,j)\in D_{2,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\cdot\frac{1-\pi_j}{\pi_j}
= O(N^3/n^2).
\end{aligned}
\]
The variance expression for $S^2_N$ is deduced easily from the previous computations. From the expression derived in [BLRG15], we find that $a_i-b_i=O(N^3/n^2)$, for $i=1,2,3$, and $b_4=O(N^3/n^2)$, so that
\[
n^2V_m[S^2_N] = n^2E_m[S^4_N]-n^2\big(E_m[S^2_N]\big)^2 = O(1/N). \tag{9.15}
\]
From (9.13) we conclude that $nS^2_N-E_m[nS^2_N]$ tends to zero in $P_m$-probability. As a consequence, statements (i) and (ii) follow from (9.14).
Lemma 9.2. If $x_N\rightsquigarrow x$ and $y_N\rightsquigarrow y$ in $D[0,1]$ with the Skorohod metric, and $x,y\in C[0,1]$, then the sequence $\{x_N+y_N\}$ is also tight in $D[0,1]$.

Proof. We use Theorem 13.2 from [Bil99]. The first condition follows easily, since
\[
\sup_{t\in[0,1]}|x_N(t)+y_N(t)|\le\sup_{t\in[0,1]}|x_N(t)|+\sup_{t\in[0,1]}|y_N(t)|.
\]
Because $x_N\rightsquigarrow x$ and $y_N\rightsquigarrow y$, both sequences $\{x_N\}$ and $\{y_N\}$ are tight, so that they satisfy the first condition of Theorem 13.2 individually. For condition (ii) of Theorem 13.2 in [Bil99], choose $\epsilon>0$ and $\eta>0$. According to (12.7) in [Bil99], for any $0<\delta<1/2$,
\[
w'_x(\delta)\le w_x(2\delta).
\]
This means that
\[
P\big(w'_{x_N+y_N}(\delta)\ge\epsilon\big)
\le P\{w_{x_N+y_N}(2\delta)\ge\epsilon\}
\le P\{w_{x_N}(2\delta)\ge\epsilon/2\}+P\{w_{y_N}(2\delta)\ge\epsilon/2\}.
\]
Consider the first probability. Since $x_N\rightsquigarrow x$ in $D[0,1]$ with the Skorohod metric, according to the almost sure representation theorem (see, e.g., Theorem 11.7.2 in [Dud02]), there exist $\widetilde{x}_N$ and $\widetilde{x}$, having the same distribution as $x_N$ and $x$, respectively, such that $\widetilde{x}_N\to\widetilde{x}$, with probability one, in the Skorohod metric. Because $\widetilde{x}\overset{d}{=}x$ and $x\in C[0,1]$, also $\widetilde{x}\in C[0,1]$. Hence, since $\widetilde{x}$ is continuous, it follows that
\[
\sup_{t\in[0,1]}|\widetilde{x}_N(t)-\widetilde{x}(t)|\to0,\quad\text{with probability one.} \tag{9.16}
\]
We then find that
\[
\begin{aligned}
P\{w_{x_N}(2\delta)\ge\epsilon/2\}
&= P\Big\{\sup_{|s-t|<2\delta}|x_N(s)-x_N(t)|\ge\epsilon/2\Big\}
= P\Big\{\sup_{|s-t|<2\delta}|\widetilde{x}_N(s)-\widetilde{x}_N(t)|\ge\epsilon/2\Big\}\\
&\le P\Big\{\sup_{|s-t|<2\delta}|\widetilde{x}(s)-\widetilde{x}(t)|\ge\epsilon/4\Big\}
+P\Big\{\sup_{s\in[0,1]}|\widetilde{x}_N(s)-\widetilde{x}(s)|\ge\epsilon/8\Big\}
+P\Big\{\sup_{t\in[0,1]}|\widetilde{x}_N(t)-\widetilde{x}(t)|\ge\epsilon/8\Big\}.
\end{aligned}
\]
The latter two probabilities tend to zero due to (9.16). For the first probability on the right hand side, note that $C[0,1]$ is separable and complete. This means that each random element in $C[0,1]$ is tight. Hence, $\widetilde{x}\in C[0,1]$ is tight, so that according to Theorem 7.3 in [Bil99], there exists a $0<\delta<1/2$, such that
\[
P\Big\{\sup_{|s-t|<2\delta}|\widetilde{x}(s)-\widetilde{x}(t)|\ge\epsilon/4\Big\} = P\{w_{\widetilde{x}}(2\delta)\ge\epsilon/4\}\le\eta.
\]
We conclude that $\limsup_{N\to\infty}P\{w_{x_N}(2\delta)\ge\epsilon/2\}\le\eta$, and the same result for $y_N$ can be obtained similarly. This proves the lemma.