
Functional central limit theorems for single-stage sampling designs

Abstract

For a joint model-based and design-based inference, we establish functional central limit theorems for the Horvitz-Thompson empirical process and the Hájek empirical process centered by their finite population mean as well as by their super-population mean in a survey sampling framework. The results apply to generic sampling designs and essentially only require conditions on higher order correlations. We apply our main results to a Hadamard differentiable statistical functional and illustrate its limit behavior by means of a computer simulation.
arXiv:1509.09273v1 [math.ST] 30 Sep 2015
Functional central limit theorems
in survey sampling
Hélène Boistard1, Hendrik P. Lopuhaä2, and Anne Ruiz-Gazen3
1Toulouse School of Economics
2Delft University of Technology
3Toulouse School of Economics
October 1, 2015
1 Introduction
Functional central limit theorems are well established in statistics. Much of the theory has been de-
veloped for empirical processes of independent summands. In combination with the functional delta-
method they have become a very powerful tool for investigating the limit behavior for Hadamard or
Fréchet differentiable statistical functionals (e.g., see [vdVW96] or [vdV98] for a rigorous treatment
with several applications).
In survey sampling, results on functional central limit theorems are far from complete. At the
same time there is a need for such results. For instance, in [Dd08] the limit distribution of several
statistical functionals is investigated, under the assumption that such a limit theorem exists for a
design-based empirical process, whereas in [BD09] the existence of a functional central limit theorem
is assumed, to perform model-based inference on several Gini indices. Weak convergence of processes
in combination with the delta method is treated in [Bha07], [Dav09], [BM11], but these results
are tailor-made for specific statistical functionals, and do not apply to the empirical processes that
are typically considered in survey sampling.
Recently, functional central limit theorems for empirical processes in survey sampling have ap-
peared in the literature. Most of them are concerned with empirical processes indexed by a class
of functions, see [BW07], [SW13], and [BCC14]. However, the results in [BW07] and [SW13]
are restricted to sampling schemes that have exchangeable inclusion indicators and constant inclu-
sion probabilities, such as simple random sampling and Bernoulli sampling, whereas the approach
in [BCC14] seems difficult to extend to sampling designs other than those that are closely related
to Poisson sampling. [Wan12] considers empirical processes indexed by a real valued parameter.
Unfortunately, this paper seems to miss a number of assumptions that cannot be avoided and, more
importantly, it seems to contain a flaw in the proof (see Section 7 for a more detailed discussion).
The main purpose of the present paper is to establish functional central limit theorems for
the Horvitz-Thompson and the Hájek empirical distribution function that apply to general sam-
pling designs. For design-based inference about finite population parameters, these empirical dis-
tribution functions will be centered around their population mean. On the other hand, in many
situations involving survey data, one is interested in the corresponding model parameters (e.g.,
see [KG98] and [BR09]). Recently, Rubin-Bleuer and Schiopu Kratina [RBSK05] defined a mathe-
matical framework for joint model-based and design-based inference through a probability product-
space and introduced a general and unified methodology for studying the asymptotic properties
of model parameter estimators. To incorporate both types of inferences, we consider the Horvitz-
Thompson empirical process and the Hájek empirical process under the super-population model
described in [RBSK05], both centered around their finite population mean as well as around their
super-population mean. Our main results are functional central limit theorems for both empirical
processes indexed by a real valued parameter and apply to generic sampling schemes. These results
are established only requiring the usual standard assumptions that one encounters in asymptotic
theory in survey sampling. Our approach was inspired by an unpublished manuscript from Philippe
Fevrier and Nicolas Ragache, which was the outcome of an internship at INSEE in 2001.
The article is organized as follows. Notations and assumptions are discussed in Section 2.
In particular we briefly discuss the joint model-based and design-based inference setting defined
in [RBSK05]. In Sections 3 and 4, we list the assumptions and state our main results. Our
assumptions essentially concern the inclusion probabilities of the sampling design up to the fourth
order and a central limit theorem (CLT) for the Horvitz-Thompson estimator of a population total
for i.i.d. bounded random variables. Our results allow random inclusion probabilities and are stated
in terms of the design-based expected sample size, but we also formulate more detailed results in
case these quantities are deterministic.
As an application of our results, in combination with the functional delta-method, we obtain the
limit distribution of the poverty rate in Section 5. This example is further investigated in Section 6
by means of a simulation. Finally, in Section 7 we discuss in detail the differences between our results
and the work by [BW07], [SW13], [Wan12], and [BCC14]. All proofs are deferred to Section 8 and
some tedious technicalities can be found in [BLRG15].
2 Notations and assumptions
We adopt the super-population setup as described in [RBSK05]. Consider a sequence of finite populations $(U_N)$, of sizes $N = 1, 2, \ldots$. With each population we associate a set of indices $U_N = \{1, 2, \ldots, N\}$. Furthermore, for each index $i \in U_N$, we have a tuple $(y_i, z_i) \in \mathbb{R} \times \mathbb{R}^q_+$. We denote $\mathbf{y}^N = (y_1, y_2, \ldots, y_N) \in \mathbb{R}^N$ and $\mathbf{z}^N \in \mathbb{R}^{q \times N}_+$ similarly. The vector $\mathbf{y}^N$ contains the values of the variable of interest and $\mathbf{z}^N$ contains information for the sampling design. We assume that the values in each finite population are realizations of random variables $(Y_i, Z_i) \in \mathbb{R} \times \mathbb{R}^q_+$, for $i = 1, 2, \ldots, N$, on a common probability space $(\Omega, \mathcal{F}, P_m)$. Similarly, we denote $\mathbf{Y}^N = (Y_1, Y_2, \ldots, Y_N) \in \mathbb{R}^N$ and $\mathbf{Z}^N \in \mathbb{R}^{q \times N}_+$. To incorporate the sampling design, a product space is defined as follows. For all $N = 1, 2, \ldots$, let $\mathcal{S}_N = \{s : s \subset U_N\}$ be the collection of subsets of $U_N$ and let $\mathcal{A}_N = \sigma(\mathcal{S}_N)$ be the σ-algebra generated by $\mathcal{S}_N$. A sampling design associated to some sampling scheme is a function $p : \mathcal{A}_N \times \mathbb{R}^{q \times N}_+ \mapsto [0, 1]$, such that
(i) for all $s \in \mathcal{S}_N$, $\mathbf{z}^N \mapsto p(s, \mathbf{z}^N)$ is a Borel-measurable function on $\mathbb{R}^{q \times N}_+$.
(ii) for all $\mathbf{z}^N \in \mathbb{R}^{q \times N}_+$, $A \mapsto p(A, \mathbf{z}^N)$ is a probability measure on $\mathcal{A}_N$.
Note that for each $\omega \in \Omega$, we can define a probability measure $A \mapsto P_d(A, \omega) = \sum_{s \in A} p(s, \mathbf{Z}^N(\omega))$ on the design space $(\mathcal{S}_N, \mathcal{A}_N)$. Corresponding expectations will be denoted by $E_d(\cdot, \omega)$. Next, we define a product probability space that includes the super-population and the design space, under the premise that sample selection and the model characteristic are independent given the design variables. Let $(\mathcal{S}_N \times \Omega, \mathcal{A}_N \times \mathcal{F})$ be the product space with probability measure $P_{d,m}$ defined on simple rectangles $\{s\} \times E \in \mathcal{A}_N \times \mathcal{F}$ by
$$P_{d,m}(\{s\} \times E) = \int_E p(s, \mathbf{Z}^N(\omega)) \, dP_m(\omega) = \int_E P_d(\{s\}, \omega) \, dP_m(\omega).$$
When taking expectations or computing probabilities, we will emphasize whether this is with respect either to the measure $P_{d,m}$ associated with the product space $(\mathcal{S}_N \times \Omega, \mathcal{A}_N \times \mathcal{F})$, or the measure $P_d$ associated with the design space $(\mathcal{S}_N, \mathcal{A}_N)$, or the measure $P_m$ associated with the super-population space $(\Omega, \mathcal{F})$.
If $n_s$ denotes the size of sample $s$, then this may depend on the specific sampling design including the values of the design variables $Z_1(\omega), \ldots, Z_N(\omega)$. Similarly, the inclusion probabilities may depend on the values of the design variables, $\pi_i(\omega) = E_d(\xi_i, \omega) = \sum_{s \ni i} p(s, \mathbf{Z}^N(\omega))$, where $\xi_i$ is the indicator $\mathbf{1}_{\{s \ni i\}}$. Instead of $n_s$, we will consider $n = E_d[n_s(\omega)] = \sum_{i=1}^N E_d(\xi_i, \omega) = \sum_{i=1}^N \pi_i(\omega)$. This means that the inclusion probabilities and the design-based expected sample size may be random variables on $(\Omega, \mathcal{F}, P_m)$.
We first consider the Horvitz-Thompson (HT) empirical processes, obtained from the HT empirical c.d.f.:
$$\mathbb{F}_N^{HT}(t) = \frac{1}{N} \sum_{i=1}^N \frac{\xi_i \mathbf{1}_{\{Y_i \le t\}}}{\pi_i}, \quad t \in \mathbb{R}. \tag{2.1}$$
We will consider the HT empirical process $\sqrt{n}(\mathbb{F}_N^{HT} - \mathbb{F}_N)$, obtained by centering around the empirical c.d.f. $\mathbb{F}_N$ of $Y_1, \ldots, Y_N$, as well as the HT empirical process $\sqrt{n}(\mathbb{F}_N^{HT} - F)$, obtained by centering around the c.d.f. $F$ of the $Y_i$'s. A functional central limit theorem for both processes will be formulated in Section 3. In addition, we will consider the Hájek empirical c.d.f.:
$$\mathbb{F}_N^{HJ}(t) = \frac{1}{\widehat{N}} \sum_{i=1}^N \frac{\xi_i \mathbf{1}_{\{Y_i \le t\}}}{\pi_i}, \quad t \in \mathbb{R}, \tag{2.2}$$
where $\widehat{N} = \sum_{i=1}^N \xi_i / \pi_i$ is the HT estimator for the population total $N$. Functional central limit theorems for $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$ and $\sqrt{n}(\mathbb{F}_N^{HJ} - F)$ will be provided in Section 4. The advantage of our results is that they allow general sampling schemes and that we primarily require bounds on the rate at which higher order correlations tend to zero ω-almost surely, under the design measure $P_d$.
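As a minimal illustration of definitions (2.1) and (2.2), the following Python sketch evaluates both empirical c.d.f.'s on a single Poisson sample. The population size, the Exp(1) values, the two-level inclusion probabilities, and all function names are assumptions made for this illustration only.

```python
import random

def poisson_sample(pi, rng):
    """Draw indicators xi_i ~ Bernoulli(pi_i) independently (Poisson sampling)."""
    return [1 if rng.random() < p else 0 for p in pi]

def ht_cdf(t, y, xi, pi):
    """Horvitz-Thompson empirical c.d.f. (2.1): (1/N) * sum_i xi_i 1{y_i <= t} / pi_i."""
    N = len(y)
    return sum(x / p for yi, x, p in zip(y, xi, pi) if yi <= t) / N

def hajek_cdf(t, y, xi, pi):
    """Hajek empirical c.d.f. (2.2): same numerator, divided by N_hat = sum_i xi_i / pi_i."""
    n_hat = sum(x / p for x, p in zip(xi, pi))
    return sum(x / p for yi, x, p in zip(y, xi, pi) if yi <= t) / n_hat

rng = random.Random(0)                               # fixed seed, illustration only
N = 2000                                             # hypothetical population size
y = [rng.expovariate(1.0) for _ in range(N)]         # hypothetical variable of interest
pi = [0.1 if i < N // 2 else 0.3 for i in range(N)]  # hypothetical inclusion probabilities
xi = poisson_sample(pi, rng)
n_expected = sum(pi)                                 # design-based expected sample size n
```

In this sketch the Hájek c.d.f. equals one at $+\infty$ by construction, whereas the HT c.d.f. equals $\widehat{N}/N$ there, which need not be one.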
3 FCLT’s for the Horvitz-Thompson empirical processes
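A functional central limit theorem for $\sqrt{n}(\mathbb{F}_N^{HT} - \mathbb{F}_N)$ and $\sqrt{n}(\mathbb{F}_N^{HT} - F)$ is obtained by proving weak convergence of all finite dimensional distributions and tightness. In order to establish the latter for general sampling schemes, we impose a number of conditions that involve the sets
$$D_{\nu,N} = \left\{ (i_1, i_2, \ldots, i_\nu) \in \{1, 2, \ldots, N\}^\nu : i_1, i_2, \ldots, i_\nu \text{ all different} \right\}, \tag{3.1}$$
for the integers $1 \le \nu \le 4$. We assume the following conditions:
(C1) there exist constants $K_1, K_2$, such that for all $i = 1, 2, \ldots, N$,
$$0 < K_1 \le \frac{N \pi_i}{n} \le K_2 < \infty, \quad \omega\text{-a.s.}$$
There exists a constant $K_3 > 0$, such that for all $N = 1, 2, \ldots$:
(C2) $\max_{(i,j) \in D_{2,N}} \left| E_d (\xi_i - \pi_i)(\xi_j - \pi_j) \right| < K_3 n / N^2$,
(C3) $\max_{(i,j,k) \in D_{3,N}} \left| E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k) \right| < K_3 n^2 / N^3$,
(C4) $\max_{(i,j,k,l) \in D_{4,N}} \left| E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)(\xi_l - \pi_l) \right| < K_3 n^2 / N^4$,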
ω-almost surely. These conditions on higher order correlations are commonly used in the litera-
ture on survey sampling in order to derive asymptotic properties of estimators (e.g., see [BO00],
and [CCGL10]). [BO00] proved that they hold for simple random sampling without replacement and
stratified simple random sampling without replacement, whereas [BLRG12] proved that they hold
also for rejective sampling. Lemma 2 from [BLRG12] allows us to reformulate the above conditions
on higher order correlations into conditions on higher order inclusion probabilities.
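As a short worked check of the second-order condition, consider simple random sampling without replacement with fixed size $n \ge 1$, for which $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$ for $i \ne j$. The computation below is an illustration, not part of the original argument; it shows that the bound in (C2) is then met for any $K_3 > 1$.

```latex
E_d(\xi_i - \pi_i)(\xi_j - \pi_j)
  = \pi_{ij} - \pi_i \pi_j
  = \frac{n(n-1)}{N(N-1)} - \frac{n^2}{N^2}
  = -\frac{n(N-n)}{N^2(N-1)},
\qquad\text{so that}\qquad
\left| E_d(\xi_i - \pi_i)(\xi_j - \pi_j) \right|
  = \frac{n}{N^2} \cdot \frac{N-n}{N-1}
  \le \frac{n}{N^2}.
```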
To establish the convergence of finite dimensional distributions, for sequences of bounded i.i.d. random variables $V_1, V_2, \ldots$ on $(\Omega, \mathcal{F}, P_m)$, we will need a CLT for the HT estimator in the design space, conditionally on the $V_i$'s. To this end, let $S_N^2$ be the (design-based) variance of the HT estimator of the population mean, i.e.,
$$S_N^2 = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} V_i V_j. \tag{3.2}$$
We assume that
(HT1) For $N$ sufficiently large, $S_N > 0$ and, for any sequence of bounded i.i.d. random variables $V_1, V_2, \ldots$,
$$\frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right) \to N(0, 1), \quad \omega\text{-a.s.},$$
in distribution under $P_d$.
Note that (HT1) holds for simple random sampling without replacement if $n(N - n)/N$ tends to infinity when $N$ tends to infinity (see [Tho97]), as well as for Poisson sampling under some conditions on the first order inclusion probabilities (e.g., see [Ful09]). For rejective sampling, [Háj64] gives some sufficient conditions for (HT1) to hold.
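We also need that $n S_N^2$ converges for the particular case where the $V_i$'s are random vectors consisting of indicators $\mathbf{1}_{\{Y_j \le t\}}$.
(HT2) For $k \in \{1, 2, \ldots\}$, $i = 1, 2, \ldots, k$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, define $\mathbf{Y}_{ik}^t = \left( \mathbf{1}_{\{Y_i \le t_1\}}, \ldots, \mathbf{1}_{\{Y_i \le t_k\}} \right)$. There exists a deterministic matrix $\Sigma_k^{HT}$, such that
$$\lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{Y}_{ik} \mathbf{Y}_{jk}^t = \Sigma_k^{HT}, \quad \omega\text{-a.s.} \tag{3.3}$$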
This kind of assumption is quite standard in the literature on survey sampling and is usually imposed for general random vectors (see, for example [DS92], p.379, [FF91], condition 3 on page 457, or [KR81], condition C4 on page 1014). It suffices to require (3.3) for $\mathbf{Y}_{ik}^t = \left( \mathbf{1}_{\{Y_i \le t_1\}}, \ldots, \mathbf{1}_{\{Y_i \le t_k\}} \right)$. Moreover, if (C1)-(C2) hold, then the sequence in (3.3) is bounded, so that by dominated convergence it follows that
$$\Sigma_k^{HT} = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{Y}_{ik} \mathbf{Y}_{jk}^t \right]. \tag{3.4}$$
This might help to get a more tractable expression for $\Sigma_k^{HT}$.
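We are now able to formulate our first main result. Let $D(\mathbb{R})$ be the space of càdlàg functions on $\mathbb{R}$ equipped with the Skorohod topology.
Theorem 3.1. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $\mathbb{F}_N$ and let $\mathbb{F}_N^{HT}$ be defined in (2.1). Suppose that conditions (C1)-(C4) and (HT1)-(HT2) hold. Then $\sqrt{n}(\mathbb{F}_N^{HT} - \mathbb{F}_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance function
$$E_m \mathbb{G}^{HT}(s) \mathbb{G}^{HT}(t) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{1}_{\{Y_i \le s\}} \mathbf{1}_{\{Y_j \le t\}} \right]$$
for $s, t \in \mathbb{R}$.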
Note that Theorem 3.1 allows a random (design-based) expected sample size $n$ and random inclusion probabilities. However, the expression of the covariance function of the limiting Gaussian process is somewhat unsatisfactory. When $n$ and the inclusion probabilities are deterministic, we can obtain a functional CLT with a more precise expression for $E_m \mathbb{G}^{HT}(s) \mathbb{G}^{HT}(t)$ under slightly weaker conditions. This is formulated in the proposition below. Note that when conditions (i)-(ii) in Proposition 3.1 are imposed instead of (3.3), convergence of $n S_N^2$ is not necessarily guaranteed. However, this is established in Lemma 9.1 in [BLRG15] under (C1) and (C2).
Finally, we emphasize that if we had imposed (HT2) for any sequence $\mathbf{Y}_1, \mathbf{Y}_2, \ldots$ of bounded random vectors, then (HT2) would have implied conditions (i)-(ii) in the deterministic setup of Proposition 3.1.
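Proposition 3.1. Consider the setting of Theorem 3.1, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$, are deterministic. Suppose that (C1)-(C4) and (HT1) hold, but instead of (HT2) assume that there exist constants $\mu_{\pi 1}, \mu_{\pi 2} \in \mathbb{R}$ such that
(i) $\displaystyle \lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \left( \frac{1}{\pi_i} - 1 \right) = \mu_{\pi 1}$,
(ii) $\displaystyle \lim_{N \to \infty} \frac{n}{N^2} \sum\sum_{i \ne j} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} = \mu_{\pi 2}$.
Then $\sqrt{n}(\mathbb{F}_N^{HT} - \mathbb{F}_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance function $\mu_{\pi 1} F(s \wedge t) + \mu_{\pi 2} F(s) F(t)$, for $s, t \in \mathbb{R}$.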
When $n/N \to \lambda \in [0, 1]$, conditions (i)-(ii) hold with $\mu_{\pi 1} = 1 - \lambda$ and $\mu_{\pi 2} = \lambda - 1$ for simple random sampling without replacement. For Poisson sampling, (ii) holds trivially because the trials are independent. For rejective sampling, (i)-(ii) together with $n/N \to \lambda \in [0, 1]$ can be deduced from the associated Poisson sampling design. Indeed, suppose that (i) holds for Poisson sampling with first order inclusion probabilities $p_1, \ldots, p_N$, such that $\sum_{i=1}^N p_i = n$. Then, from Theorem 1 in [BLRG12] it follows that if $d = \sum_{i=1}^N p_i (1 - p_i)$ tends to infinity, assumption (i) holds for rejective sampling. Furthermore, if $n/N \to \lambda \in [0, 1]$ and $N/d$ has a finite limit, then also (ii) holds for rejective sampling.
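The values of $\mu_{\pi 1}$ and $\mu_{\pi 2}$ for simple random sampling without replacement can be checked numerically at finite $N$. The sketch below is only an illustration: the function names and the choice $N = 10\,000$, $n = 500$ are assumptions, and the closed forms $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$ are the standard inclusion probabilities of that design.

```python
def mu_constants_srswor(N, n):
    """Finite-N versions of the sums in conditions (i) and (ii) for SRSWOR."""
    pi = n / N                              # first order inclusion probability
    pi_ij = n * (n - 1) / (N * (N - 1))     # second order inclusion probability, i != j
    mu1 = n / N**2 * N * (1.0 / pi - 1.0)   # N identical terms in the sum of (i)
    mu2 = n / N**2 * N * (N - 1) * (pi_ij - pi * pi) / (pi * pi)  # N(N-1) pairs in (ii)
    return mu1, mu2

def mu1_from_probabilities(pi, n):
    """Sum in condition (i) for arbitrary first order inclusion probabilities."""
    N = len(pi)
    return n / N**2 * sum(1.0 / p - 1.0 for p in pi)

mu1, mu2 = mu_constants_srswor(10_000, 500)   # here n/N = 0.05
```

With $n/N = 0.05$ both sums already equal their limiting forms, $1 - n/N$ and $n/N - 1$, up to rounding error.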
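Weak convergence of the process $\sqrt{n}(\mathbb{F}_N^{HT} - F)$, where we center with $F$ instead of $\mathbb{F}_N$, requires a CLT in the super-population space for
$$\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right), \quad \text{where } \mu_V = E_m(V_i), \tag{3.5}$$
for sequences of bounded i.i.d. random variables $V_1, V_2, \ldots$ on $(\Omega, \mathcal{F}, P_m)$. Our approach to establish asymptotic normality of (3.5) is then to decompose as follows
$$\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right) = \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right) + \frac{\sqrt{n}}{\sqrt{N}} \times \sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right). \tag{3.6}$$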
Since the $V_i$'s are i.i.d. and bounded, for the second term on the right hand side, by the traditional CLT we immediately obtain
$$\sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right) \to N(0, \sigma_V^2), \tag{3.7}$$
in distribution under $P_m$, where $\sigma_V^2$ denotes the variance of the $V_i$'s, whereas the first term on the right hand side can be handled with (HT1). [BW07] and [SW13] use a decomposition similar to the one in (3.6). Their approach assumes exchangeable $\xi_i$'s and equal inclusion probabilities $n/N$, which allows the use of results on exchangeable weighted bootstrap to handle the first term on the right hand side of (3.6). Instead, we only require conditions (C2)-(C4) on higher order correlations for the $\xi_i$'s and allow the $\pi_i$'s to vary within certain bounds as described in (C1). To combine the two separate limits in (3.7) and (HT1), we will need
(HT3) $n/N \to \lambda \in [0, 1]$, ω-a.s.
We will then use Theorem 5.1(iii) from [RBSK05]. The finite dimensional projections of the processes
involved turn out to be related to a particular HT estimator. In order to have the corresponding
design-based variance converging to a strictly positive constant, we need the following condition.
(HT4) For all $k \in \{1, 2, \ldots\}$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, the matrix $\Sigma_k^{HT}$ in (3.3) is positive definite.
We are now able to formulate our second main result.
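Theorem 3.2. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and let $\mathbb{F}_N^{HT}$ be defined in (2.1). Suppose that conditions (C1)-(C4) and (HT1)-(HT4) hold. Then $\sqrt{n}(\mathbb{F}_N^{HT} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}_F^{HT}$ with covariance function $E_{d,m} \mathbb{G}_F^{HT}(s) \mathbb{G}_F^{HT}(t)$ given by
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{1}_{\{Y_i \le s\}} \mathbf{1}_{\{Y_j \le t\}} \right] + \lambda \left\{ F(s \wedge t) - F(s) F(t) \right\},$$
for $s, t \in \mathbb{R}$.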
Theorem 3.2 allows random $n$ and inclusion probabilities. As before, when the sample size $n$ and inclusion probabilities are deterministic we can obtain a functional CLT under a simpler condition than (HT4) and with a more detailed description of the covariance function of the limiting process.
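Proposition 3.2. Consider the setting of Theorem 3.2, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$, are deterministic. Suppose that (C1)-(C4), (HT1) and (HT3) hold, but instead of (HT2) and (HT4) assume that there exist constants $\mu_{\pi 1}, \mu_{\pi 2} \in \mathbb{R}$ such that
(i) $\displaystyle \lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \left( \frac{1}{\pi_i} - 1 \right) = \mu_{\pi 1} > 0$,
(ii) $\displaystyle \lim_{N \to \infty} \frac{n}{N^2} \sum\sum_{i \ne j} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} = \mu_{\pi 2}$.
Then $\sqrt{n}(\mathbb{F}_N^{HT} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HT}$ with covariance function $(\mu_{\pi 1} + \lambda) F(s \wedge t) + (\mu_{\pi 2} - \lambda) F(s) F(t)$, for $s, t \in \mathbb{R}$.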
Since $1/\pi_i \ge 1$, we will always have $\mu_{\pi 1} \ge 0$ in condition (i) in Proposition 3.2. This means that (i) is not very restrictive. For simple random sampling without replacement, condition (i) requires $\lambda$ to be strictly smaller than one.
4 FCLT’s for the Hájek empirical processes
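To determine the behavior of the process $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$, it is useful to relate this process to the process
$$\mathbb{G}_N^\pi(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} \left( \mathbf{1}_{\{Y_i \le t\}} - F(t) \right). \tag{4.1}$$
We can then write
$$\sqrt{n} \left( \mathbb{F}_N^{HJ}(t) - \mathbb{F}_N(t) \right) = \mathbb{Y}_N(t) + \left( \frac{N}{\widehat{N}} - 1 \right) \mathbb{G}_N^\pi(t), \tag{4.2}$$
where
$$\mathbb{Y}_N(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} - 1 \right) \left( \mathbf{1}_{\{Y_i \le t\}} - F(t) \right). \tag{4.3}$$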
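The decomposition (4.2) is an exact algebraic identity, so it can be checked numerically on any sample. The sketch below does this for one Poisson sample from an Exp(1) population; the population size, the inclusion probabilities, the seed, and the function names are assumptions made for the illustration.

```python
import math
import random

rng = random.Random(1)                               # fixed seed, illustration only
N = 500                                              # hypothetical population size
y = [rng.expovariate(1.0) for _ in range(N)]
pi = [0.2 if i % 2 == 0 else 0.5 for i in range(N)]  # hypothetical inclusion probabilities
xi = [1 if rng.random() < p else 0 for p in pi]      # Poisson sampling indicators
n = sum(pi)                                          # expected sample size

def F(t):
    """Super-population c.d.f. of the Exp(1) distribution."""
    return 1.0 - math.exp(-t)

def both_sides(t):
    """Return the left and right hand sides of (4.2) at the point t."""
    ind = [1.0 if yi <= t else 0.0 for yi in y]
    w = [x / p for x, p in zip(xi, pi)]              # xi_i / pi_i
    n_hat = sum(w)                                   # HT estimator of N
    f_hj = sum(wi * a for wi, a in zip(w, ind)) / n_hat
    f_n = sum(ind) / N
    lhs = math.sqrt(n) * (f_hj - f_n)
    g_pi = math.sqrt(n) / N * sum(wi * (a - F(t)) for wi, a in zip(w, ind))        # (4.1)
    y_n = math.sqrt(n) / N * sum((wi - 1.0) * (a - F(t)) for wi, a in zip(w, ind))  # (4.3)
    rhs = y_n + (N / n_hat - 1.0) * g_pi
    return lhs, rhs

lhs, rhs = both_sides(1.0)
```

The two sides agree up to floating point error, whatever value of $t$ is used.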
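As intermediate results we will first show that the process $\mathbb{G}_N^\pi$ converges weakly to a mean zero Gaussian process and that $\widehat{N}/N \to 1$ in probability. As a consequence, the limiting behavior of $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$ will be the same as that of $\mathbb{Y}_N$, which is an easier process to handle. Instead of (HT2) and (HT4) we now need
(HJ2) For $k \in \{1, 2, \ldots\}$, $i = 1, 2, \ldots, k$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, define
$$\widetilde{\mathbf{Y}}_{ik}^t = \left( \mathbf{1}_{\{Y_i \le t_1\}} - F(t_1), \ldots, \mathbf{1}_{\{Y_i \le t_k\}} - F(t_k) \right).$$
There exists a deterministic matrix $\Sigma_k^{HJ}$, such that
$$\lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \widetilde{\mathbf{Y}}_{ik} \widetilde{\mathbf{Y}}_{jk}^t = \Sigma_k^{HJ}, \quad \omega\text{-a.s.} \tag{4.4}$$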
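and
(HJ4) For all $k \in \{1, 2, \ldots\}$ and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, the matrix $\Sigma_k^{HJ}$ in (4.4) is positive definite.
As in the case of (3.4), if (C1)-(C2) hold, then (HJ2) implies
$$\Sigma_k^{HJ} = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \widetilde{\mathbf{Y}}_{ik} \widetilde{\mathbf{Y}}_{jk}^t \right]. \tag{4.5}$$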
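Theorem 4.1. Let $\mathbb{G}_N^\pi$ be defined in (4.1) and let $\widehat{N} = \sum_{i=1}^N \xi_i / \pi_i$. Suppose $n \to \infty$, ω-a.s., and that there exists $\sigma_\pi^2 \ge 0$, such that
$$\frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \to \sigma_\pi^2, \quad \omega\text{-a.s.} \tag{4.6}$$
If in addition,
(i) (HT1) holds, then $\widehat{N}/N \to 1$ in $P_{d,m}$ probability.
(ii) (C1)-(C4), (HT1), (HT3), (HJ2) and (HJ4) hold, then $\mathbb{G}_N^\pi$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^\pi$ with covariance function $E_{d,m} \mathbb{G}^\pi(s) \mathbb{G}^\pi(t)$ given by
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \left( \mathbf{1}_{\{Y_i \le s\}} - F(s) \right) \left( \mathbf{1}_{\{Y_j \le t\}} - F(t) \right) \right] + \lambda \left( F(s \wedge t) - F(s) F(t) \right), \quad s, t \in \mathbb{R}.$$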
Note that in view of condition (HT3), the condition $n \to \infty$ is immediate if $\lambda > 0$. We proceed by establishing weak convergence of $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$.
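Theorem 4.2. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $\mathbb{F}_N$ and let $\mathbb{F}_N^{HJ}$ be defined in (2.2). Suppose $n \to \infty$, ω-a.s., and that (C1)-(C4), (HT1), (HT3), and (HJ2) hold, as well as condition (4.6). Then $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HJ}$ with covariance function $E_{d,m} \mathbb{G}^{HJ}(s) \mathbb{G}^{HJ}(t)$ given by
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \left( \mathbf{1}_{\{Y_i \le s\}} - F(s) \right) \left( \mathbf{1}_{\{Y_j \le t\}} - F(t) \right) \right],$$
for $s, t \in \mathbb{R}$.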
Note that we do not need condition (HJ4) in Theorem 4.2. This condition is only needed in Theorem 4.1 to establish the limit distribution of the finite dimensional projections of the process $\mathbb{G}_N^\pi$. For Theorem 4.2 we only need that $\mathbb{G}_N^\pi$ is tight.
As before, below we obtain a functional CLT for $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$ in the case that $n$ and the inclusion probabilities are deterministic. Similar to the remark we made after Theorem 3.1, note that if we had imposed (HJ2) for any sequence of bounded random vectors, then this would imply conditions (i)-(ii) of Proposition 3.1, which can then be left out in Theorem 4.1.
Proposition 4.1. Consider the setting of Theorem 4.2, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$, are deterministic. Suppose $n \to \infty$ and that (C1)-(C4), (HT1) and (HT3) hold, as well as conditions (i)-(ii) from Proposition 3.1. Then $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HJ}$ with covariance function $\mu_{\pi 1} \left( F(s \wedge t) - F(s) F(t) \right)$, for $s, t \in \mathbb{R}$.
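Finally, we consider $\sqrt{n}(\mathbb{F}_N^{HJ} - F)$. Again, we relate this process to (4.1) and write
$$\sqrt{n} \left( \mathbb{F}_N^{HJ}(t) - F(t) \right) = \frac{N}{\widehat{N}} \, \mathbb{G}_N^\pi(t). \tag{4.7}$$
Since $\widehat{N}/N \to 1$ in probability, this implies that $\sqrt{n}(\mathbb{F}_N^{HJ} - F)$ has the same limiting behavior as $\mathbb{G}_N^\pi$.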
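Theorem 4.3. Let $Y_1, \ldots, Y_N$ be i.i.d. random variables with c.d.f. $F$ and let $\mathbb{F}_N^{HJ}$ be defined in (2.2). Suppose $n \to \infty$, ω-a.s., and that (C1)-(C4), (HT1), (HT3), (HJ2) and (HJ4) hold, as well as condition (4.6). Then $\sqrt{n}(\mathbb{F}_N^{HJ} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}_F^{HJ}$ with covariance function $E_{d,m} \mathbb{G}^\pi(s) \mathbb{G}^\pi(t)$ given by
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m \left[ n \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \left( \mathbf{1}_{\{Y_i \le s\}} - F(s) \right) \left( \mathbf{1}_{\{Y_j \le t\}} - F(t) \right) \right] + \lambda \left( F(s \wedge t) - F(s) F(t) \right), \quad s, t \in \mathbb{R}.$$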
With Theorem 4.3 we recover Theorem 1 in [Wan12]. Our assumptions are comparable to those
in [Wan12], although this paper seems to miss a condition on the convergence of the variance, such
as our condition (HJ2).
We conclude this section by establishing a functional CLT for $\sqrt{n}(\mathbb{F}_N^{HJ} - F)$ in the case of deterministic $n$ and inclusion probabilities.
Proposition 4.2. Consider the setting of Theorem 4.3, where $n$ and $\pi_i, \pi_{ij}$, for $i, j = 1, 2, \ldots, N$, are deterministic. Suppose $n \to \infty$ and that (C1)-(C4), (HT1) and (HT3) hold, as well as conditions (i)-(ii) from Proposition 3.2. Then $\sqrt{n}(\mathbb{F}_N^{HJ} - F)$ converges weakly in $D(\mathbb{R})$ to a mean zero Gaussian process $\mathbb{G}^{HJ}$ with covariance function $(\mu_{\pi 1} + \lambda) \left( F(s \wedge t) - F(s) F(t) \right)$, for $s, t \in \mathbb{R}$.
5 Hadamard-differentiable functionals
Theorem 4.3 provides an elegant means to study the limit behavior of estimators that can be described as $\phi(\mathbb{F}_N^{HJ})$, where $\phi$ is a Hadamard-differentiable functional. Given such a $\phi$, the functional delta-method, e.g., see Theorems 3.9.4 and 3.9.5 in [vdVW96] or Theorem 20.8 in [vdV98], enables one to establish the limit distribution of $\phi(\mathbb{F}_N^{HJ})$. Similarly, this holds for Theorems 3.1, 3.2, and 4.2, or Propositions 3.1, 3.2, 4.1, and 4.2 in the special case of deterministic $n$ and inclusion probabilities.
We illustrate this by discussing the poverty rate. This indicator has recently been revisited by [GT14] and [OAB15]. This example has also been discussed by [Dd08], but under the assumption of weak convergence of $\sqrt{n}(\mathbb{F}_N^{HJ} - \mathbb{F}_N)$ to some centered continuous Gaussian process. Note that this assumption is now covered by our Theorem 4.2 and Proposition 4.1. Let $D_\phi \subset D(\mathbb{R})$ consist of $F \in D(\mathbb{R})$ that are non-decreasing. Then for $F \in D_\phi$, the poverty rate is defined as
$$\phi(F) = F\left( \beta F^{-1}(\alpha) \right) \tag{5.1}$$
for fixed $0 < \alpha, \beta < 1$, where $F^{-1}(\alpha) = \inf\{t : F(t) \ge \alpha\}$. Typical choices are $\alpha = 0.5$ and $\beta = 0.5$ (INSEE) or $\beta = 0.6$ (EUROSTAT). Its Hadamard derivative is given by
$$\phi'_F(h) = -\beta \, \frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))} \, h(F^{-1}(\alpha)) + h(\beta F^{-1}(\alpha)). \tag{5.2}$$
See [BLRG15] for details.
We then have the following corollaries for the Horvitz-Thompson estimator $\phi(\mathbb{F}_N^{HT})$ and the Hájek estimator $\phi(\mathbb{F}_N^{HJ})$ for the poverty rate $\phi(F)$.
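Corollary 5.1. Let $\phi$ be defined by (5.1) and suppose that the conditions of Proposition 3.2 hold. Then, if $F$ is differentiable at $F^{-1}(\alpha)$, the random variable $\sqrt{n}(\phi(\mathbb{F}_N^{HT}) - \phi(F))$ converges in distribution to a mean zero normal random variable with variance
$$\sigma^2_{HT,\alpha,\beta} = \beta^2 \frac{f(\beta F^{-1}(\alpha))^2}{f(F^{-1}(\alpha))^2} \left( \gamma_{\pi 1} \alpha + \gamma_{\pi 2} \alpha^2 \right) + \gamma_{\pi 1} \phi(F) + \gamma_{\pi 2} \phi(F)^2 - 2 \beta \frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))} \phi(F) \left( \gamma_{\pi 1} + \gamma_{\pi 2} \alpha \right), \tag{5.3}$$
where $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda$ and $\gamma_{\pi 2} = \mu_{\pi 2} - \lambda$. If in addition $n/N \to 0$, then $\sqrt{n}(\phi(\mathbb{F}_N^{HT}) - \phi(\mathbb{F}_N))$ converges in distribution to a mean zero normal random variable with variance $\sigma^2_{HT,\alpha,\beta}$.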
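Corollary 5.2. Let $\phi$ be defined by (5.1) and suppose that the conditions of Proposition 4.2 hold. Then, if $F$ is differentiable at $F^{-1}(\alpha)$, the random variable $\sqrt{n}(\phi(\mathbb{F}_N^{HJ}) - \phi(F))$ converges in distribution to a mean zero normal random variable with variance
$$\sigma^2_{HJ,\alpha,\beta} = \beta^2 \frac{f(\beta F^{-1}(\alpha))^2}{f(F^{-1}(\alpha))^2} \gamma_{\pi 1} \alpha (1 - \alpha) + \gamma_{\pi 1} \phi(F) \left( 1 - \phi(F) \right) - 2 \beta \frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))} \phi(F) \gamma_{\pi 1} (1 - \alpha), \tag{5.4}$$
where $\gamma_{\pi 1} = \mu_{\pi 1} + \lambda$. If in addition $n/N \to 0$, then $\sqrt{n}(\phi(\mathbb{F}_N^{HJ}) - \phi(\mathbb{F}_N))$ converges in distribution to a mean zero normal random variable with variance $\sigma^2_{HJ,\alpha,\beta}$.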
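The two variance formulas (5.3) and (5.4) can be evaluated directly once $f$, $F^{-1}(\alpha)$ and $\phi(F)$ are known. The sketch below does so for an Exp(1) distribution with $\alpha = 0.5$ and $\beta = 0.6$; the function names and the choice $\gamma_{\pi 1} = 1$, $\gamma_{\pi 2} = -1$ are assumptions for illustration. It also checks the algebraic fact that (5.3) reduces to (5.4) whenever $\gamma_{\pi 2} = -\gamma_{\pi 1}$.

```python
import math

def slope(beta, f, q):
    """Factor beta * f(beta * q) / f(q) that multiplies h(q) in the Hadamard derivative."""
    return beta * f(beta * q) / f(q)

def sigma2_ht(alpha, beta, gamma1, gamma2, f, q, phi):
    """Asymptotic variance (5.3), with q = F^{-1}(alpha) and phi = phi(F)."""
    a = slope(beta, f, q)
    return (a * a * (gamma1 * alpha + gamma2 * alpha ** 2)
            + gamma1 * phi + gamma2 * phi ** 2
            - 2.0 * a * phi * (gamma1 + gamma2 * alpha))

def sigma2_hj(alpha, beta, gamma1, f, q, phi):
    """Asymptotic variance (5.4)."""
    a = slope(beta, f, q)
    return gamma1 * (a * a * alpha * (1.0 - alpha)
                     + phi * (1.0 - phi)
                     - 2.0 * a * phi * (1.0 - alpha))

alpha, beta = 0.5, 0.6
f = lambda x: math.exp(-x)             # Exp(1) density
q = -math.log(1.0 - alpha)             # Exp(1) quantile F^{-1}(alpha)
phi = 1.0 - math.exp(-beta * q)        # poverty rate phi(F) for Exp(1)
v_ht = sigma2_ht(alpha, beta, 1.0, -1.0, f, q, phi)
v_hj = sigma2_hj(alpha, beta, 1.0, f, q, phi)
```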
6 Simulation study
The objective of this simulation study is to investigate the performance of the Horvitz-Thompson
(HT) and the Hájek (HJ) estimators for the poverty rate, as defined in (5.1), at the finite population
level and at the super-population level. The asymptotic results from Corollary 5.1 and 5.2 are used
to obtain variance estimators whose performance is also assessed in this small study.
Six simulation schemes are implemented with different population sizes and (design-based) expected sample sizes, namely $N = 10\,000$ and $1000$ and $n = 500$, $100$, and $50$. The samples are drawn according to three different sampling designs. The first one is simple random sampling without replacement (SI) with size $n$. The second design is Bernoulli sampling (BE) with parameter $n/N$. The third one is Poisson sampling (PO) with first order inclusion probabilities equal to $0.4 n/N$ for the first half of the population and equal to $1.6 n/N$ for the other half of the population, where the population is randomly ordered. The first order inclusion probabilities are deterministic for the three designs and the sample size $n_s$ is fixed for the SI design, while it is random with respect to the design for the BE and PO designs. Moreover, the SI and BE designs are equal probability designs, while PO is an unequal probability design. The results are obtained by replicating $N_R = 1000$ populations. For each population, $n_R = 1000$ samples are drawn according to the different designs. The variable of interest $Y$ is generated for each population according to an exponential distribution with rate parameter equal to one. For this distribution and given $\alpha$ and $\beta$, the poverty rate has an explicit expression $\phi(F) = 1 - \exp(\beta \ln(1 - \alpha))$. In what follows, $\alpha = 0.5$ and $\beta = 0.6$ and $\phi(F) \approx 0.34$. These are the same values for $\alpha$ and $\beta$ as considered in [Dd08].
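The explicit poverty rate for the exponential distribution follows from $F^{-1}(\alpha) = -\ln(1 - \alpha)/\text{rate}$. A minimal Python check, with a hypothetical function name, is:

```python
import math

def poverty_rate_exponential(alpha, beta, rate=1.0):
    """phi(F) = F(beta * F^{-1}(alpha)) for an exponential distribution."""
    q = -math.log(1.0 - alpha) / rate          # quantile F^{-1}(alpha)
    return 1.0 - math.exp(-rate * beta * q)    # F evaluated at beta * q

phi_F = poverty_rate_exponential(0.5, 0.6)     # the setting used in the simulation
```

The value does not depend on the rate parameter, since the rate cancels between the quantile and the c.d.f.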
The Horvitz-Thompson estimator and Hájek estimator for $\phi(F)$ or $\phi(\mathbb{F}_N)$ are denoted by $\widehat{\phi}_{HT}$ and $\widehat{\phi}_{HJ}$, respectively. They are obtained by plugging in the empirical c.d.f.'s $\mathbb{F}_N^{HT}$ and $\mathbb{F}_N^{HJ}$, respectively, for $F$ in expression (5.1). The empirical quantiles are calculated by using the function wtd.quantile from the R package Hmisc for the Hájek estimator and by adapting the function for the Horvitz-Thompson estimator. For the SI sampling design, the two estimators are the same.
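The plug-in principle can be sketched in a few lines of Python. This is not a reproduction of wtd.quantile, whose interpolation rules are not modelled here: the sketch assumes the generalized inverse $F^{-1}(\alpha) = \inf\{t : F(t) \ge \alpha\}$ from (5.1), takes the weights $w_i = 1/\pi_i$ of the sampled units as input, and uses hypothetical function names. Leaving `total` unset normalizes by the sum of the weights (Hájek); passing `total=N` gives the Horvitz-Thompson version.

```python
def weighted_cdf(t, y, w, total=None):
    """Weighted empirical c.d.f. at t over the sampled units (y_i, w_i)."""
    total = sum(w) if total is None else total
    return sum(wi for yi, wi in zip(y, w) if yi <= t) / total

def weighted_quantile(alpha, y, w, total=None):
    """Smallest sampled value t whose weighted c.d.f. is at least alpha."""
    total = sum(w) if total is None else total
    acc = 0.0
    for yi, wi in sorted(zip(y, w)):
        acc += wi
        if acc / total >= alpha:
            return yi
    return max(y)  # reached only if the HT c.d.f. stays below alpha

def poverty_rate(y, w, alpha=0.5, beta=0.6, total=None):
    """Plug-in estimate of phi(F) = F(beta * F^{-1}(alpha))."""
    q = weighted_quantile(alpha, y, w, total)
    return weighted_cdf(beta * q, y, w, total)
```

For example, with sampled values 1, 2, 3, 4 and equal weights, the median is 2, the threshold is 1.2, and the estimated poverty rate is 0.25.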
The performance of the estimators for the parameters $\phi(F)$ and $\phi(\mathbb{F}_N)$ is evaluated using the Monte Carlo relative bias (RB). This is reported in Table 1. When estimating the super-population parameter $\phi(F)$, if $\widehat{\phi}_{ij}$ denotes the estimate (either $\widehat{\phi}_{HT}$ or $\widehat{\phi}_{HJ}$) for the $i$th generated population and the $j$th drawn sample, the Monte Carlo relative bias of $\widehat{\phi}$ in percentages has the following expression
$$RB_F(\widehat{\phi}) = \frac{100}{N_R n_R} \sum_{i=1}^{N_R} \sum_{j=1}^{n_R} \frac{\widehat{\phi}_{ij} - \phi(F)}{\phi(F)}.$$
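The relative bias formula translates directly into code. In the sketch below, the function name and the nested-list layout (one row of estimates per generated population) are assumptions; passing the same target for every population gives the super-population version, while passing population-specific targets gives the finite population version.

```python
def relative_bias(estimates, targets):
    """Monte Carlo relative bias in percent.

    estimates[i][j] is the estimate for population i and sample j;
    targets[i] is the parameter value used as reference for population i.
    """
    count = sum(len(row) for row in estimates)   # equals N_R * n_R for equal-length rows
    total = sum((e - t) / t
                for row, t in zip(estimates, targets)
                for e in row)
    return 100.0 * total / count
```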
When estimating the finite population parameter $\phi(\mathbb{F}_N)$, the parameter depends on the generated population $N_i$, for each $i = 1, \ldots, N_R$, and will be denoted by $\phi(\mathbb{F}_{N_i})$. The Monte Carlo relative bias of $\widehat{\phi}$ is then computed by replacing $F$ by $\mathbb{F}_{N_i}$ in the above expression. Concerning the relative biases reported in Table 1, the values are small and never exceed 3% in absolute value. As expected, these values increase when $n$ decreases. When the centering is relative to $\phi(\mathbb{F}_N)$, the relative bias is in general somewhat smaller than when centering with $\phi(F)$. This behavior is most prominent when $N = 1000$ and $n = 500$, which suggests that the estimates are typically closer to the population poverty rate $\phi(\mathbb{F}_N)$ than to the model parameter $\phi(F)$. The Hájek estimator has a larger relative bias than the Horvitz-Thompson estimator in all situations but in particular for the Poisson sampling design when the size of the population is 1000. Note that all values in Table 1 are negative, which illustrates the fact that the estimators typically underestimate the population and model poverty rates.

Table 1: RB (in %) of the HT and the HJ estimators for the finite population φ(F_N) and the super-population φ(F) poverty rate parameter

                                      N = 10 000                  N = 1000
Design  Estimator  Parameter    n=500   n=100   n=50      n=500   n=100   n=50
SI      HT-HJ      φ(F_N)       -0.17   -0.89   -1.82     -0.05   -0.84   -1.62
                   φ(F)         -0.20   -0.91   -1.86     -0.18   -0.72   -1.85
BE      HT         φ(F_N)       -0.12   -0.66   -1.29     -0.01   -0.65   -1.12
                   φ(F)         -0.15   -0.68   -1.34     -0.12   -0.54   -1.36
        HJ         φ(F_N)       -0.17   -0.92   -1.87     -0.04   -0.88   -1.68
                   φ(F)         -0.20   -0.93   -1.92     -0.17   -0.76   -1.91
PO      HT         φ(F_N)       -0.05   -1.05   -2.06     -0.06   -0.30   -0.37
                   φ(F)         -0.08   -1.07   -2.11     -0.19   -0.19   -0.63
        HJ         φ(F_N)       -0.20   -1.27   -2.95     -0.04   -1.08   -1.99
                   φ(F)         -0.23   -1.28   -3.00     -0.17   -0.97   -2.23

Table 2: RB (in %) for the variance estimator of the HT and the HJ estimators for the poverty rate parameter

                         N = 10 000                  N = 1000
Design  Estimator  n=500   n=100   n=50      n=500   n=100   n=50
SI      HT-HJ      -2.21   -3.08   -2.97     -2.25   -3.26   -3.00
BE      HT         -4.15   -5.11   -4.21     -3.31   -5.11   -4.19
        HJ         -2.22   -3.06   -3.03     -2.26   -3.24   -3.03
PO      HT         -4.43   -4.96   -3.45     -3.74   -5.72   -4.59
        HJ         -2.36   -3.43   -3.36     -2.44   -3.75   -4.13
In Table 2, the estimators of the variance of $\widehat{\phi}_{HT}$ and $\widehat{\phi}_{HJ}$ are obtained by plugging in the empirical c.d.f.'s $\mathbb{F}_N^{HT}$ and $\mathbb{F}_N^{HJ}$, respectively, for $F$ in the expressions (5.3) and (5.4). To estimate $f$ in the variance of $\widehat{\phi}_{HJ}$, we follow [BS03], who propose a Hájek type kernel estimator with a Gaussian kernel function. For the variance of $\widehat{\phi}_{HT}$, we use a corresponding Horvitz-Thompson estimator by replacing $\widehat{N}$ by $N$. Based on [Sil86], pages 45-47, we choose $b = 0.79 R n_s^{-1/5}$, where $R$ denotes the interquartile range. This differs from [BS03], who propose a similar bandwidth of the order $N^{-1/5}$. However, this severely underestimates the optimal bandwidth, leading to large variances of the kernel estimator. Usual bias variance trade-off computations show that the optimal bandwidth is of the order $n_s^{-1/5}$.
For the SI sampling design, (5.3) and (5.4) are identical and can be calculated in an explicit way using the fact that $\mu_{\pi 1} + \lambda = 1$ and $\mu_{\pi 2} - \lambda = -1$. For the BE design, $\mu_{\pi 1} + \lambda = 1$, whereas for Poisson sampling, the value $(n/N^2) \sum_{i=1}^N 1/\pi_i$ is taken for $\mu_{\pi 1} + \lambda$. For these designs, $\mu_{\pi 2} - \lambda = -\lambda$, where we take $n/N$ as the value of $\lambda$.
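Table 3: Coverage probabilities (in %) for 95% confidence intervals of the HT and the HJ estimators
for the finite population $\phi(F_N)$ and the super-population $\phi(F)$ poverty rate parameter

| Design | Estimator | Parameter | N = 10 000, n = 500 | N = 10 000, n = 100 | N = 10 000, n = 50 | N = 1000, n = 500 | N = 1000, n = 100 | N = 1000, n = 50 |
|---|---|---|---|---|---|---|---|---|
| SI | HT-HJ | $\phi(F_N)$ | 95.2 | 94.4 | 93.5 | 98.8 | 95.1 | 94.6 |
| SI | HT-HJ | $\phi(F)$ | 94.6 | 93.2 | 92.2 | 94.7 | 93.2 | 92.0 |
| BE | HT | $\phi(F_N)$ | 94.9 | 94.3 | 94.6 | 98.4 | 94.8 | 94.6 |
| BE | HT | $\phi(F)$ | 94.4 | 93.7 | 94.9 | 94.6 | 93.6 | 94.7 |
| BE | HJ | $\phi(F_N)$ | 95.1 | 94.3 | 93.9 | 98.7 | 94.9 | 94.2 |
| BE | HJ | $\phi(F)$ | 94.7 | 94.2 | 93.9 | 94.7 | 94.2 | 93.9 |
| PO | HT | $\phi(F_N)$ | 94.5 | 94.2 | 94.3 | 96.8 | 94.0 | 93.6 |
| PO | HT | $\phi(F)$ | 94.5 | 94.0 | 94.3 | 94.6 | 93.6 | 93.5 |
| PO | HJ | $\phi(F_N)$ | 94.8 | 93.9 | 93.6 | 97.2 | 94.2 | 93.3 |
| PO | HJ | $\phi(F)$ | 94.6 | 93.9 | 93.6 | 94.6 | 93.9 | 93.2 |

In order to compute the relative bias of the variance estimates, the asymptotic variance is taken
as reference. This asymptotic variance $AV(\hat\phi)$ of the estimator $\hat\phi$ (either $\hat\phi_{\mathrm{HT}}$ or $\hat\phi_{\mathrm{HJ}}$) is computed
from (5.3) and (5.4). The expressions $f(\beta F^{-1}(\alpha))$ and $f(F^{-1}(\alpha))$ are explicit in the case of an
exponential distribution. Furthermore, for $\mu_{\pi 1} + \lambda$ and $\mu_{\pi 2} - \lambda$ we use the same expressions as
mentioned above. The Monte Carlo relative bias of the variance estimator $\widehat{AV}(\hat\phi)$, in percentages, is
defined by
$$RB\big(\widehat{AV}(\hat\phi)\big) = \frac{100}{N_R\, n_R} \sum_{i=1}^{N_R} \sum_{j=1}^{n_R} \frac{\widehat{AV}(\hat\phi_{ij}) - AV(\hat\phi)}{AV(\hat\phi)},$$
where $\widehat{AV}(\hat\phi_{ij})$ denotes the variance estimate for the $i$th generated population and the $j$th drawn
sample.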
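As a small illustration of the relative bias defined above, the following sketch computes it from a nested list of variance estimates. The function name and the input layout (one inner list per generated population) are assumptions made for this example only.

```python
def relative_bias(variance_estimates, asymptotic_variance):
    """Monte Carlo relative bias (in %) of variance estimates.

    variance_estimates[i][j] is the estimate for the i-th generated
    population and the j-th drawn sample; asymptotic_variance is AV(phi_hat).
    """
    n_pop = len(variance_estimates)
    n_samp = len(variance_estimates[0])
    total = sum(
        (est - asymptotic_variance) / asymptotic_variance
        for row in variance_estimates
        for est in row
    )
    return 100.0 * total / (n_pop * n_samp)
```

A negative value indicates that the variance estimator underestimates the asymptotic variance on average.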
Table 3 gives the Monte-Carlo coverage probabilities for a nominal coverage probability of 95%
for the two parameters $\phi(F_N)$ and $\phi(F)$, the Horvitz-Thompson and the Hájek estimators and the
different simulation schemes. In general the coverage probabilities are somewhat smaller than 95%,
which is due to the underestimation of the asymptotic variance, as can be seen from Table 2. The
case $N = 1000$ and $n = 500$ for $\hat\phi_{\mathrm{HJ}}$ forms an exception, which is probably due to the fact that
in this case $\lambda = n/N$ is far from zero, so that the limit distribution of $\sqrt{n}\,(\phi(F_N^{\mathrm{HT}}) - \phi(F_N))$ and
$\sqrt{n}\,(\phi(F_N^{\mathrm{HJ}}) - \phi(F_N))$ has a larger variance than the ones reported in Corollaries 5.1 and 5.2. When
looking at Table 2, the relative biases are smaller than 5% when $n$ is 500. The biases are larger
for the Horvitz-Thompson estimator than for the Hájek estimator. Again all relative biases are
negative, which illustrates the fact that the asymptotic variance is typically underestimated.
7 Discussion
[Wan12] formulates a functional central limit theorem (see his Theorem 1) for the Hájek empirical
c.d.f. from (2.2) centered around F. It is also claimed that a similar result holds for the Horvitz-
Thompson process in (2.1), but details are not provided. The paper seems to miss a number of
assumptions that cannot be avoided. For instance, the proof of his Theorem 1 requires convergence
in probability of the covariance matrix of the vector n(F (t)FN(t), F(s)FN(s)). This
assumption is comparable with our condition (HJ2), but is missing in [Wan12]. More severely, the
argument establishing Billingsley’s tightness condition seems to contain a serious mistake, which
cannot be repaired easily (see the inequality on line 6 page 678 in [Wan12]; the inequality can be
shown not to hold for instance for sampling designs with independent inclusion indicators). As
a consequence, assumption 5 in [Wan12] differs somewhat from our conditions (C2)-(C4). The
remaining assumptions in [Wan12] are comparable to the conditions needed for our Theorem 4.3.
Note that, in addition to the latter theorem, we also establish Theorems 3.1,3.2, and 4.2 for other
empirical processes of interest.
[BW07] and [SW13] obtain weak convergence of the empirical process (in our notation)
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} f(Y_i) - E_m f(Y_i) \right), \qquad f \in \mathcal{F}. \tag{7.1}$$
Weak convergence is established under finite population two-phase stratified sampling. This pro-
cess is comparable to our Horvitz-Thompson empirical process in Theorem 3.2. Although their
functional CLT allows general function classes, it only covers sampling designs with equal inclu-
sion probabilities within strata that assume exchangeability of the inclusion indicators ξ1,...,ξN,
such as simple random sampling and Bernoulli sampling. Their approach views two-phase stratified
sampling as a form of bootstrap and uses results on exchangeable weighted bootstrap for empirical
processes from [PW93], as incorporated in [vdVW96]. This approach, in particular the application
of Theorem 3.6.13 in [vdVW96], seems difficult to extend to more complex sampling designs that
go beyond exchangeable inclusion indicators. Although our results only correspond to the class of
indicators $f_t(y) = 1_{(-\infty,t]}(y)$, for $t \in \mathbb{R}$, the advantage of our results is that they are applicable to
general sampling designs. Moreover, our results also include empirical processes centered with the
population mean.
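[BCC14] establish a functional CLT for the Poisson-like empirical process
$$\widetilde{G}^{p}_{T_N}(f) = \frac{1}{\sqrt{N}} \sum_{i=1}^N (\xi_i - p_i) \left( \frac{f(Y_i)}{p_i} - \theta_{N,p}(f) \right), \qquad f \in \mathcal{F}, \tag{7.2}$$
where $p = (p_1,\ldots,p_N)$ is the vector of inclusion probabilities corresponding to a Poisson sampling
design and
$$\theta_{N,p}(f) = \frac{1}{d_N} \sum_{i=1}^N (1 - p_i) f(Y_i), \qquad d_N = \sum_{i=1}^N p_i (1 - p_i).$$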
However, the functional CLT is obtained conditionally on $Y_1, Y_2, \ldots$. In this case, the terms in the
summation in (7.2) are independent, which allows the use of Theorem 2.11.1 from [vdVW96]. From
their result a functional CLT under rejective sampling can then be established for the design-based
Horvitz-Thompson process
$$G_{N,\pi}(f) = \frac{1}{\sqrt{N}} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} f(Y_i) - f(Y_i) \right), \qquad f \in \mathcal{F}, \tag{7.3}$$
$\omega$-almost surely. This is due to the close connection between Poisson sampling and rejective sampling.
For this reason, the approach used in [BCC14] seems difficult to extend to other sampling
designs. For the class of indicators $f_t(y) = 1_{(-\infty,t]}(y)$, for $t \in \mathbb{R}$, the process in (7.3) is similar to the
one in our Theorem 3.1, but this theorem allows general sampling designs. Moreover, our results
also include empirical processes centered with the superpopulation mean.
8 Proofs
We will use Theorem 13.5 from [Bil99], which requires convergence of finite dimensional distributions
and a tightness condition (see (13.14) in [Bil99]). We will first establish the tightness condition, as
stated in the following lemma.
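Lemma 8.1. Let $Y_1,\ldots,Y_N$ be i.i.d. random variables with c.d.f. $F$ and empirical c.d.f. $F_N$ and let
$F_N^{\mathrm{HT}}$ be defined according to (2.1). Let $X_N = \sqrt{n}\,(F_N^{\mathrm{HT}} - F_N)$ and suppose that (C1)-(C4) hold. Then
there exists a constant $K > 0$ independent of $N$, such that for any $t_1$, $t_2$ and $-\infty < t_1 \le t \le t_2 < \infty$,
$$E_{d,m}\left[ (X_N(t) - X_N(t_1))^2 (X_N(t_2) - X_N(t))^2 \right] \le K \big( F(t_2) - F(t_1) \big)^2.$$
Proof. First note that
$$X_N(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} - 1 \right) 1_{\{Y_i \le t\}}.$$
For the sake of brevity, for $-\infty < t_1 \le t \le t_2 < \infty$, and $i = 1,2,\ldots,N$, we define $p_1 = F(t) - F(t_1)$,
$p_2 = F(t_2) - F(t)$, $A_i = 1_{\{t_1 < Y_i \le t\}}$, and $B_i = 1_{\{t < Y_i \le t_2\}}$. Furthermore, let $\alpha_i = (\xi_i - \pi_i) A_i / \pi_i$
and $\beta_i = (\xi_i - \pi_i) B_i / \pi_i$. Then, according to the fact that $p_1 p_2 \le (F(t_2) - F(t_1))^2$, due to the
monotonicity of $F$, it suffices to show
$$\frac{1}{N^4} E_{d,m}\left[ n^2 \left( \sum_{i=1}^N \alpha_i \right)^2 \left( \sum_{j=1}^N \beta_j \right)^2 \right] \le K p_1 p_2. \tag{8.1}$$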
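The expectation on the left hand side can be decomposed as follows
$$\begin{aligned}
&\sum_{i=1}^N \sum_{k=1}^N E_{d,m}\left[ n^2 \alpha_i^2 \beta_k^2 \right] + \sum_{i=1}^N \sum_{j \ne i} \sum_{k=1}^N E_{d,m}\left[ n^2 \alpha_i \alpha_j \beta_k^2 \right] \\
&\quad + \sum_{k=1}^N \sum_{l \ne k} \sum_{i=1}^N E_{d,m}\left[ n^2 \alpha_i^2 \beta_k \beta_l \right] + \sum_{i=1}^N \sum_{j \ne i} \sum_{k=1}^N \sum_{l \ne k} E_{d,m}\left[ n^2 \alpha_i \alpha_j \beta_k \beta_l \right].
\end{aligned} \tag{8.2}$$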
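Note that by symmetry, sums two and three on the right hand side can be handled similarly, so
that essentially we have to deal with three summations. We consider them one by one.
First note that, since $1_{\{t_1 < Y_i \le t\}} 1_{\{t < Y_i \le t_2\}} = 0$, we will only have non-zero expectations when
$\{i, j\}$ and $\{k, l\}$ are disjoint. With (C1), we find
$$\begin{aligned}
\frac{1}{N^4} \sum_{i=1}^N \sum_{k=1}^N E_{d,m}\left[ n^2 \alpha_i^2 \beta_k^2 \right]
&= \frac{1}{N^4} \mathop{\sum\sum}_{(i,k) \in D_{2,N}} E_{d,m}\left[ n^2 \alpha_i^2 \beta_k^2 \right] \\
&= \frac{1}{N^4} \mathop{\sum\sum}_{(i,k) \in D_{2,N}} E_m\left[ \frac{n^2 A_i B_k}{\pi_i^2 \pi_k^2}\, E_d (\xi_i - \pi_i)^2 (\xi_k - \pi_k)^2 \right] \\
&\le \frac{1}{K_1^4} \mathop{\sum\sum}_{(i,k) \in D_{2,N}} E_m\left[ \frac{A_i B_k}{n^2}\, E_d (\xi_i - \pi_i)^2 (\xi_k - \pi_k)^2 \right].
\end{aligned} \tag{8.3}$$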
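Straightforward computation shows that $E_d (\xi_i - \pi_i)^2 (\xi_k - \pi_k)^2$ equals
$$(\pi_{ik} - \pi_i \pi_k)(1 - 2\pi_i)(1 - 2\pi_k) + \pi_i \pi_k (1 - \pi_i)(1 - \pi_k).$$
Hence, with (C1)-(C2) we find that
$$E_d (\xi_i - \pi_i)^2 (\xi_k - \pi_k)^2 \le \left| E_d (\xi_i - \pi_i)(\xi_k - \pi_k) \right| + K_2^2 \frac{n^2}{N^2} = O\left( \frac{n^2}{N^2} \right),$$
$\omega$-almost surely. It follows that
$$\frac{1}{N^4} \sum_{i=1}^N \sum_{k=1}^N E_{d,m}\left[ n^2 \alpha_i^2 \beta_k^2 \right] \le O\left( \frac{1}{N^2} \right) \mathop{\sum\sum}_{(i,k) \in D_{2,N}} E_m[A_i B_k].$$
Since $D_{2,N}$ has $N(N-1)$ elements and $E_m[A_i B_j] = p_1 p_2$ for $(i,j) \in D_{2,N}$, it follows that
$$\frac{1}{N^4} \sum_{i=1}^N \sum_{j=1}^N E_{d,m}\left[ n^2 \alpha_i^2 \beta_j^2 \right] \le K p_1 p_2. \tag{8.4}$$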
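The closed form for $E_d(\xi_i - \pi_i)^2(\xi_k - \pi_k)^2$ used in the bound above can be checked by enumerating the four outcomes of a pair of inclusion indicators. The sketch below does this with exact fractions; the numerical values of $\pi_i$, $\pi_k$, $\pi_{ik}$ and the helper name are illustrative assumptions only.

```python
from fractions import Fraction as Fr

def design_moment(pi_i, pi_k, pi_ik, power_i, power_k):
    """E_d[(xi_i - pi_i)^power_i * (xi_k - pi_k)^power_k] by direct enumeration."""
    joint = {
        (1, 1): pi_ik,                      # both units sampled
        (1, 0): pi_i - pi_ik,               # only unit i sampled
        (0, 1): pi_k - pi_ik,               # only unit k sampled
        (0, 0): 1 - pi_i - pi_k + pi_ik,    # neither unit sampled
    }
    return sum(
        prob * (xi - pi_i) ** power_i * (xk - pi_k) ** power_k
        for (xi, xk), prob in joint.items()
    )

# Hypothetical first and second order inclusion probabilities.
pi_i, pi_k, pi_ik = Fr(3, 10), Fr(2, 5), Fr(1, 10)

lhs = design_moment(pi_i, pi_k, pi_ik, 2, 2)
rhs = ((pi_ik - pi_i * pi_k) * (1 - 2 * pi_i) * (1 - 2 * pi_k)
       + pi_i * pi_k * (1 - pi_i) * (1 - pi_k))
identity_holds = (lhs == rhs)
```

Because the arithmetic is exact, `identity_holds` compares the two sides without rounding error.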
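Consider the second (and third) summation on the right hand side of (8.2). Similarly to (8.3), we
can then write
$$\begin{aligned}
\frac{1}{N^4} \sum_{i=1}^N \sum_{j \ne i} \sum_{k=1}^N E_{d,m}\left[ n^2 \alpha_i \alpha_j \beta_k^2 \right]
&= \frac{1}{N^4} \mathop{\sum\sum\sum}_{(i,j,k) \in D_{3,N}} E_{d,m}\left[ n^2 \alpha_i \alpha_j \beta_k^2 \right] \\
&\le \frac{1}{N^4} \mathop{\sum\sum\sum}_{(i,j,k) \in D_{3,N}} \left| E_{d,m}\left[ \frac{n^2 A_i A_j B_k}{\pi_i \pi_j \pi_k^2} (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)^2 \right] \right| \\
&\le \frac{1}{N^4} \mathop{\sum\sum\sum}_{(i,j,k) \in D_{3,N}} E_m\left[ \frac{n^2 A_i A_j B_k}{\pi_i \pi_j \pi_k^2} \left| E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)^2 \right| \right] \\
&\le \frac{1}{K_1^4} \mathop{\sum\sum\sum}_{(i,j,k) \in D_{3,N}} E_m\left[ \frac{A_i A_j B_k}{n^2} \left| E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)^2 \right| \right].
\end{aligned}$$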
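We find that $E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)^2$ equals
$$(1 - 2\pi_k)\, E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k) + \pi_k (1 - \pi_k)\, E_d (\xi_i - \pi_i)(\xi_j - \pi_j).$$
With (C1)-(C3), this means $|E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)^2| = O(n^2/N^3)$, $\omega$-almost surely. It follows
that
$$\frac{1}{N^4} \sum_{i=1}^N \sum_{j \ne i} \sum_{k=1}^N E_{d,m}\left[ n^2 \alpha_i \alpha_j \beta_k^2 \right] = O\left( \frac{1}{N^3} \right) \mathop{\sum\sum\sum}_{(i,j,k) \in D_{3,N}} E_m[A_i A_j B_k].$$
Since $D_{3,N}$ has $N(N-1)(N-2)$ elements and $E_{d,m}[A_i A_j B_k] = p_1^2 p_2$, for $(i,j,k) \in D_{3,N}$, we find
$$\frac{1}{N^4} \sum_{i=1}^N \sum_{j \ne i} \sum_{k=1}^N E_{d,m}\left[ n^2 \alpha_i \alpha_j \beta_k^2 \right] \le K p_1 p_2. \tag{8.5}$$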
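The expression for $E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)^2$ used in this step follows from $\xi_k^2 = \xi_k$ for an indicator. The intermediate step, spelled out here for convenience, is:

```latex
\begin{align*}
(\xi_k - \pi_k)^2
  &= \xi_k^2 - 2\pi_k \xi_k + \pi_k^2
   = (1 - 2\pi_k)\,\xi_k + \pi_k^2 \\
  &= (1 - 2\pi_k)(\xi_k - \pi_k) + \pi_k (1 - \pi_k).
\end{align*}
```

Multiplying both sides by $(\xi_i - \pi_i)(\xi_j - \pi_j)$ and taking the design expectation $E_d$ gives the two-term expression stated in the proof.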
The computations for the third summation in (8.2) are completely similar. Finally, consider the
last summation in (8.2). As before, this summation can be bounded by
$$\frac{1}{K_1^4} \sum_{(i,j,k,l) \in D_{4,N}} E_m\left[ \frac{A_i A_j B_k B_l}{n^2} \left| E_d (\xi_i - \pi_i)(\xi_j - \pi_j)(\xi_k - \pi_k)(\xi_l - \pi_l) \right| \right].$$
Since $D_{4,N}$ has $N(N-1)(N-2)(N-3)$ elements and $E_m[A_i A_j B_k B_l] = p_1^2 p_2^2$, for $(i,j,k,l) \in D_{4,N}$,
with (C4) we conclude that
$$\frac{1}{N^4} \sum_{i=1}^N \sum_{j \ne i} \sum_{k=1}^N \sum_{l \ne k} E_{d,m}\left[ n^2 \alpha_i \alpha_j \beta_k \beta_l \right] \le K p_1 p_2. \tag{8.6}$$
Together with (8.4), (8.5) and decomposition (8.2), this proves (8.1).
Lemma 8.2. Let $X_N = \sqrt{n}\,(F_N^{\mathrm{HT}} - F_N)$ and suppose that (C1)-(C2), (HT1)-(HT2) hold. For any
$k \in \{1,2,\ldots\}$, and $t_1,\ldots,t_k \in \mathbb{R}$, $(X_N(t_1),\ldots,X_N(t_k))$ converges in distribution under $P_{d,m}$ to a
$k$-variate mean zero normal random vector with covariance matrix $\Sigma_k^{\mathrm{HT}}$ given in (3.4).
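Proof. We will use the Cramér-Wold device. Note that any linear combination
$$a_1 \sqrt{n}\,\big( F_N^{\mathrm{HT}}(t_1) - F_N(t_1) \big) + \cdots + a_k \sqrt{n}\,\big( F_N^{\mathrm{HT}}(t_k) - F_N(t_k) \big) \tag{8.7}$$
can be written as
$$\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} V_{ik} - \frac{1}{N} \sum_{i=1}^N V_{ik} \right), \tag{8.8}$$
where
$$V_{ik} = a_1 1_{\{Y_i \le t_1\}} + \cdots + a_k 1_{\{Y_i \le t_k\}} = \mathbf{a}_k^t \mathbf{Y}_{ik} \tag{8.9}$$
with $\mathbf{Y}_{ik}^t = (1_{\{Y_i \le t_1\}},\ldots,1_{\{Y_i \le t_k\}})$ and $\mathbf{a}_k^t = (a_1,\ldots,a_k)$. For the corresponding design-based
variance, we have
$$\begin{aligned}
n S_N^2 &= \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} V_{ik} V_{jk} \\
&= \mathbf{a}_k^t \left( \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \mathbf{Y}_{ik} \mathbf{Y}_{jk}^t \right) \mathbf{a}_k \to \mathbf{a}_k^t \Sigma_k^{\mathrm{HT}} \mathbf{a}_k,
\end{aligned} \tag{8.10}$$
$\omega$-almost surely, according to (HT2), where $\Sigma_k^{\mathrm{HT}}$ can be obtained from (3.4). Together with (HT1), it
follows that (8.7) converges in distribution to a mean zero normal random variable with variance
$\mathbf{a}_k^t \Sigma_k^{\mathrm{HT}} \mathbf{a}_k$. We conclude that (8.7) converges in distribution to $a_1 N_1 + \cdots + a_k N_k$, where $(N_1,\ldots,N_k)$
has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma_k^{\mathrm{HT}}$. According to the
Cramér-Wold device this proves the lemma.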
Proof of Theorem 3.1. We first consider $X_N = \sqrt{n}\,(F_N^{\mathrm{HT}} - F_N)$ for the case that the $Y_i$'s follow
a uniform distribution on $[0,1]$. We apply Theorem 13.5 from [Bil99]. Lemma 8.2 provides the
limiting distribution of the finite dimensional projections $(X_N(t_1),\ldots,X_N(t_k))$, which is the same
as that of the vector $(G^{\mathrm{HT}}(t_1),\ldots,G^{\mathrm{HT}}(t_k))$, where $G^{\mathrm{HT}}$ is a mean zero Gaussian process with
covariance function
$$E_m G^{\mathrm{HT}}(s) G^{\mathrm{HT}}(t) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m\left[ n\, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\, 1_{\{Y_i \le s\}} 1_{\{Y_j \le t\}} \right],$$
for all $s, t \in \mathbb{R}$. Tightness condition (13.14) in [Bil99] is provided by Lemma 8.1. Since $G^{\mathrm{HT}}$ is
continuous at 1, the theorem now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are
uniformly distributed on $[0,1]$.
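To extend this to a functional CLT with i.i.d. random variables $Y_1, Y_2, \ldots$ with a general c.d.f. $F$,
we can follow the argument in the proof of Theorem 14.3 from [Bil99]. First define the generalized
inverse of $F$:
$$\varphi(s) = \inf\{t : s \le F(t)\},$$
that satisfies $s \le F(t)$ if and only if $\varphi(s) \le t$. This means that if $U_1, U_2, \ldots$ are i.i.d. uniformly
distributed on $[0,1]$, $\varphi(U_i)$ has the same distribution as $Y_i$, so that $1_{\{Y_i \le t\}} \overset{d}{=} 1_{\{\varphi(U_i) \le t\}} = 1_{\{U_i \le F(t)\}}$.
It follows that
$$X_N(t) = \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i 1_{\{Y_i \le t\}}}{\pi_i} - \frac{1}{N} \sum_{i=1}^N 1_{\{Y_i \le t\}} \right) \overset{d}{=} Z_N(F(t)), \qquad t \in \mathbb{R},$$
where
$$Z_N(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} - 1 \right) 1_{\{U_i \le t\}}, \qquad t \in [0,1]. \tag{8.11}$$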
Hence, the general HT empirical process $X_N$ is the image of the HT uniform empirical process $Z_N$
under the mapping $\psi : D[0,1] \to D(\mathbb{R})$ given by $[\psi x](t) = x(F(t))$. Note that, if $x_N \to x$ in $D[0,1]$
in the Skorohod topology and $x$ has continuous sample paths, then the convergence is uniform. But
then also $\psi x_N$ converges to $\psi x$ uniformly in $D(\mathbb{R})$. This implies that $\psi x_N$ converges to $\psi x$ in the
Skorohod topology. We have established that $Z_N \to Z$ weakly in $D[0,1]$ in the Skorohod topology,
where $Z$ has continuous sample paths. Therefore, according to the continuous mapping theorem,
e.g., Theorem 2.7 in [Bil99], it follows that $\psi(Z_N) \to \psi(Z)$ weakly. This proves the theorem for $Y_i$'s
with a general c.d.f. $F$.
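The key property of the generalized inverse used in this proof, namely $s \le F(t)$ if and only if $\varphi(s) \le t$, also holds when $F$ has jumps. The following sketch checks it on a small discrete distribution; the support points, the probabilities and the function names are illustrative assumptions, and $s$ is restricted to $(0,1]$.

```python
import bisect
from fractions import Fraction as Fr

# Hypothetical discrete c.d.f. with jumps at 1, 2 and 5.
support = [1, 2, 5]
cdf_values = [Fr(1, 4), Fr(3, 4), Fr(1)]

def cdf(t):
    """F(t) for the step c.d.f. defined by support and cdf_values."""
    idx = bisect.bisect_right(support, t)
    return cdf_values[idx - 1] if idx > 0 else Fr(0)

def generalized_inverse(s):
    """phi(s) = inf{t : s <= F(t)}, for s in (0, 1]."""
    return support[bisect.bisect_left(cdf_values, s)]

# Check the equivalence  s <= F(t)  <=>  phi(s) <= t  on a grid.
equivalence_holds = all(
    (s <= cdf(t)) == (generalized_inverse(s) <= t)
    for s in (Fr(j, 20) for j in range(1, 21))
    for t in (0, 1, 1.5, 2, 3, 5, 6)
)
```

This equivalence is what turns the indicator $1_{\{\varphi(U_i) \le t\}}$ into $1_{\{U_i \le F(t)\}}$ in the argument above.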
Proof of Proposition 3.1. The proof is similar to that of Theorem 3.1. First consider the case of
uniform $Y_i$'s with $F(t) = t$. We only have to verify the weak convergence of the finite dimensional
projections of the process $X_N = \sqrt{n}\,(F_N^{\mathrm{HT}} - F_N)$. Consider (8.7) represented as in (8.8). From (HT1)
and Lemma 9.1(ii) in [BLRG15] we conclude that (8.7) converges in distribution to a mean zero
normal random variable with variance
$$\begin{aligned}
\sigma_{\mathrm{HT}}^2 &= \mu_{\pi 1} E_m\left[ V_{1k}^2 \right] + \mu_{\pi 2} \left( E_m[V_{1k}] \right)^2 \\
&= \mu_{\pi 1}\, \mathbf{a}_k^t E_m\left[ \mathbf{Y}_{1k} \mathbf{Y}_{1k}^t \right] \mathbf{a}_k + \mu_{\pi 2}\, \mathbf{a}_k^t \left( E_m \mathbf{Y}_{1k} \right) \left( E_m \mathbf{Y}_{1k} \right)^t \mathbf{a}_k = \mathbf{a}_k^t \Sigma_k \mathbf{a}_k,
\end{aligned}$$
where $\Sigma_k$ is the $k \times k$-matrix with $(q,r)$-element equal to $\mu_{\pi 1} (t_q \wedge t_r) + \mu_{\pi 2}\, t_q t_r$. We conclude
that (8.7) converges in distribution to $a_1 N_1 + \cdots + a_k N_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean
zero normal distribution with covariance matrix $\Sigma_k$. As in the proof of Lemma 8.2, by means of
the Cramér-Wold device this establishes the limit distribution of $(X_N(t_1),\ldots,X_N(t_k))$, which is the
same as that of the vector $(G^{\mathrm{HT}}(t_1),\ldots,G^{\mathrm{HT}}(t_k))$, where $G^{\mathrm{HT}}$ is a mean zero Gaussian process with
covariance function $E_{d,m} G^{\mathrm{HT}}(s) G^{\mathrm{HT}}(t) = \mu_{\pi 1} (s \wedge t) + \mu_{\pi 2}\, st$. From here on, the proof is completely
the same as that of Theorem 3.1.
To establish tightness for the process $\sqrt{n}\,(F_N^{\mathrm{HT}} - F)$ we use the following decomposition
$$\sqrt{n}\,(F_N^{\mathrm{HT}} - F) = \sqrt{n}\,(F_N^{\mathrm{HT}} - F_N) + \sqrt{\frac{n}{N}} \cdot \sqrt{N}\,(F_N - F). \tag{8.12}$$
The first process on the right hand side converges weakly to a Gaussian process, according to
Theorem 3.1. The process $\sqrt{N}\,(F_N - F)$ also converges weakly to a Gaussian process, due to the classical
Donsker theorem. In particular both processes on the right hand side are tight in $D(\mathbb{R})$ with the
Skorohod metric. In general the sum of two tight processes in $D(\mathbb{R})$ is not necessarily tight.
However, this will be the case if both processes converge weakly to continuous processes (see Lemma 9.2
in [BLRG15]).
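To make the three processes in decomposition (8.12) concrete, the sketch below evaluates them on one simulated population and sample. The setting is an assumption chosen for illustration: uniform $Y_i$ (so $F(t) = t$), simple random sampling without replacement with $\pi_i = n/N$, and hypothetical values $N = 200$, $n = 50$.

```python
import math
import random

rng = random.Random(0)
N, n = 200, 50
y = [rng.random() for _ in range(N)]      # uniform super-population, F(t) = t
sample = set(rng.sample(range(N), n))     # simple random sampling without replacement
pi = n / N                                # equal inclusion probabilities

def F_N(t):
    """Empirical c.d.f. of the finite population."""
    return sum(v <= t for v in y) / N

def F_HT(t):
    """Horvitz-Thompson empirical c.d.f."""
    return sum(1.0 / pi for i, v in enumerate(y) if i in sample and v <= t) / N

def decomposition(t):
    """Return the left hand side of (8.12) and its design and model parts."""
    lhs = math.sqrt(n) * (F_HT(t) - t)
    design_part = math.sqrt(n) * (F_HT(t) - F_N(t))
    model_part = math.sqrt(n / N) * math.sqrt(N) * (F_N(t) - t)
    return lhs, design_part, model_part
```

For any `t`, the first returned value equals the sum of the other two up to floating point error, since (8.12) is an algebraic identity; the probabilistic content lies in the separate limits of the two parts.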
Lemma 8.3. Let $V_1, V_2, \ldots$ be a sequence of bounded i.i.d. random variables on $(\Omega, \mathcal{F}, P_m)$ with
mean $\mu_V$ and variance $\sigma_V^2$, and let $S_N^2$ be defined by (3.2). Suppose (HT1) and (HT3) hold and
$n S_N^2 \to \sigma_{\mathrm{HT}}^2 > 0$ in $P_m$-probability. Then,
$$\sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right), \tag{8.13}$$
converges in distribution under $P_{d,m}$ to a mean zero normal random variable with variance $\sigma_{\mathrm{HT}}^2 + \lambda \sigma_V^2$.
Note that, in view of the expression for $\sigma_{\mathrm{HT}}^2$ obtained in Lemma 9.1, for simple random sampling
without replacement, the condition $\sigma_{\mathrm{HT}}^2 > 0$ implies that $\lambda$ must differ from 1.
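Proof. We decompose as follows
$$\begin{aligned}
\frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right)
&= \frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right) \\
&\quad + \frac{1}{\sqrt{n}\, S_N} \times \sqrt{\frac{n}{N}} \times \sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right).
\end{aligned}$$
According to (HT3), the central limit theorem, Slutsky's theorem, and the fact that $n S_N^2 \to \sigma_{\mathrm{HT}}^2 > 0$
in probability,
$$\frac{1}{\sqrt{n}\, S_N} \times \sqrt{\frac{n}{N}} \times \sqrt{N} \left( \frac{1}{N} \sum_{i=1}^N V_i - \mu_V \right) \to N\big( 0, \lambda \sigma_V^2 / \sigma_{\mathrm{HT}}^2 \big), \tag{8.14}$$
in distribution under $P_m$, whereas, thanks to (HT1),
$$\frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \frac{1}{N} \sum_{i=1}^N V_i \right) \to N(0,1), \qquad \omega\text{-a.s.}, \tag{8.15}$$
in distribution under $P_d$. Since the latter limit distribution does not depend on $\omega$, we can apply
Theorem 5.1(iii) from [RBSK05]. It follows that
$$\frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i V_i}{\pi_i} - \mu_V \right) \to N\big( 0, 1 + \lambda \sigma_V^2 / \sigma_{\mathrm{HT}}^2 \big),$$
in distribution under $P_{d,m}$. Together with $n S_N^2 \to \sigma_{\mathrm{HT}}^2$ in probability, this implies that the random
variable in (8.13) converges to a mean zero normal random variable with variance $\sigma_{\mathrm{HT}}^2 + \lambda \sigma_V^2$.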
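The limit variance $\sigma_{\mathrm{HT}}^2 + \lambda \sigma_V^2$ can be illustrated by a small Monte Carlo experiment. The sketch below is only an example under assumptions that are not taken from the text: Bernoulli sampling with $\pi_i = \lambda$, $n$ read as the expected sample size $\lambda N$, and $V_i$ uniform on $[0,1]$. For this design $\pi_{ij} = \pi_i \pi_j$ for $i \ne j$, so that $n S_N^2$ (in the form displayed in (8.10)) tends to $(1 - \lambda) E_m[V_1^2]$.

```python
import math
import random

def simulate_variance(pop_size=400, rate=0.25, reps=2000, seed=1):
    """Monte Carlo variance of sqrt(n) * ((1/N) * sum(xi_i * V_i / pi_i) - mu_V)
    under Bernoulli sampling with pi_i = rate and V_i ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    n = rate * pop_size              # expected sample size, so lambda = rate
    mu_v = 0.5
    draws = []
    for _ in range(reps):
        total = 0.0
        for _ in range(pop_size):
            v = rng.random()         # model step: draw V_i
            if rng.random() < rate:  # design step: xi_i = 1 with probability pi_i
                total += v / rate
        draws.append(math.sqrt(n) * (total / pop_size - mu_v))
    mean = sum(draws) / reps
    return sum((d - mean) ** 2 for d in draws) / (reps - 1)

def limit_variance(rate):
    """sigma_HT^2 + lambda * sigma_V^2 for the assumed design and distribution."""
    second_moment, var_v = 1.0 / 3.0, 1.0 / 12.0
    sigma2_ht = (1.0 - rate) * second_moment   # limit of n * S_N^2 here
    return sigma2_ht + rate * var_v
```

Comparing `simulate_variance()` with `limit_variance(0.25)` shows how the design part and the model part of the variance add up in this simple setting.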
Lemma 8.4. Let $X_N^F = \sqrt{n}\,(F_N^{\mathrm{HT}} - F)$ and suppose that (C1)-(C2), (HT1)-(HT4) hold. Then for any
$k \in \{1,2,\ldots\}$, and $t_1, t_2, \ldots, t_k \in \mathbb{R}$, the sequence $(X_N^F(t_1),\ldots,X_N^F(t_k))$ converges in distribution
under $P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix $\Sigma_{\mathrm{HT}}^F = \Sigma_k^{\mathrm{HT}} + \lambda \Sigma_F$,
where $\Sigma_k^{\mathrm{HT}}$ is given in (3.4) and $\Sigma_F$ is the $k \times k$ matrix with $(q,r)$-entry $F(t_q \wedge t_r) - F(t_q) F(t_r)$,
for $q, r = 1,2,\ldots,k$.
Proof. The proof is similar to the proof of Lemma 8.2. The details can be found in [BLRG15].
Proof of Theorem 3.2. The proof is completely similar to that of Theorem 3.1. We first consider
the process $X_N^F = \sqrt{n}\,(F_N^{\mathrm{HT}} - F)$ for the case that the $Y_i$'s follow a uniform distribution with
$F(t) = t$. Decompose $X_N^F$ as in (8.12). By Theorem 3.1, the first process on the right hand side
of (8.12) converges weakly to a process in $C[0,1]$. Due to the classical Donsker theorem and (HT3),
the second process on the right hand side of (8.12) also converges weakly to a process in $C[0,1]$.
Tightness of $X_N^F$ then follows from Lemma 9.2 in [BLRG15]. Convergence of the finite dimensional
distributions is provided by Lemma 8.4. The theorem now follows from Theorem 13.5 in [Bil99]
for the case that the $Y_i$'s are uniformly distributed on $[0,1]$. Next, this is extended to $Y_i$'s with a
general c.d.f. $F$ in the same way as in the proof of Theorem 3.1.
To establish convergence in distribution of the finite dimensional distributions of $\sqrt{n}\,(F_N^{\mathrm{HT}} - F)$
under the conditions of Proposition 3.2, as in the proof of Lemma 8.4, we will use the Cramér-Wold
device. To ensure that the limit in (9.2) is still strictly positive without imposing (HT4), we will
need the following lemma. Its proof can be found in [BLRG15].
Lemma 8.5. Let $F$ be the c.d.f. of the i.i.d. $Y_1,\ldots,Y_N$. For any $k$-tuple $(t_1,\ldots,t_k) \in \mathbb{R}^k$, suppose
that the values $F(t_1),\ldots,F(t_k)$ are all distinct and such that $0 < F(t_i) < 1$. Let $a, b \in \mathbb{R}$, such that
$a \ge b$. If $a > 0$, then the $k \times k$ matrix $M$ with $(i,j)$-th element $M_{ij} = a F(t_i \wedge t_j) - b F(t_i) F(t_j)$ is
positive definite.
Lemma 8.6. Let $X_N^F = \sqrt{n}\,(F_N^{\mathrm{HT}} - F)$ and suppose that $n$ and $\pi_i$, $\pi_{ij}$, for $i, j = 1,2,\ldots,N$, are
deterministic. Suppose that (C1)-(C2), (HT1) and (HT3) hold, as well as conditions (i)-(ii) of
Proposition 3.2. Then, for any $k \in \{1,2,\ldots\}$, and $t_1,\ldots,t_k \in \mathbb{R}$, $(X_N^F(t_1),\ldots,X_N^F(t_k))$ converges
in distribution under $P_{d,m}$ to a $k$-variate mean zero normal random vector with covariance matrix
$\Sigma_{\mathrm{HT}}^F$, with $(q,r)$-entry $(\mu_{\pi 1} + \lambda) F(t_q \wedge t_r) + (\mu_{\pi 2} - \lambda) F(t_q) F(t_r)$, for $q, r = 1,2,\ldots,k$.
Proof. The proof follows the same ideas as the proof of Lemma 8.4, but is a bit more technical. It
can be found in [BLRG15].
Proof of Proposition 3.2. The proof is similar to that of Theorem 3.2. Tightness is obtained
in the same way and the convergence of finite dimensional projections is provided by Lemma 8.6.
The proposition now follows from Theorem 13.5 in [Bil99] for the case that the $Y_i$'s are uniformly
distributed on $[0,1]$. Next, this is extended to $Y_i$'s with a general c.d.f. $F$ in the same way as in the
proof of Theorem 3.1.
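Proof of Theorem 4.1. For part (i), note that with $S_N^2$ defined in (3.2) with $V_i = 1$, from (HT1)
together with condition (4.6), it follows that
$$\sqrt{n}\, S_N \times \frac{1}{S_N} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} - 1 \right) \to N(0, \sigma_\pi^2), \qquad \omega\text{-a.s.},$$
in distribution under $P_d$. This implies
$$\sqrt{n} \left( \frac{\hat N}{N} - 1 \right) = \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} - 1 \right) \to N(0, \sigma_\pi^2), \tag{8.16}$$
in distribution under $P_{d,m}$. In particular, since $n \to \infty$, this proves part (i).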
The proof of part (ii) is along the same lines as the proof of Theorems 3.1 and 3.2. First consider
the case where the $Y_i$'s are uniform, with $F(t) = t$ on $[0,1]$. Then, with $F_N^{\mathrm{HT}}$ defined in (2.1) and
$X_N^F = \sqrt{n}\,(F_N^{\mathrm{HT}} - F)$, we can write $G_N^\pi(t) = X_N^F(t) - (X_N^F(t) - G_N^\pi(t))$. According to Theorem 3.2,
the process $X_N^F$ converges weakly to a continuous process. As a consequence of (8.16), the process
$$X_N^F(t) - G_N^\pi(t) = t \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} - 1 \right),$$
also converges weakly to a continuous process. Hence, similar to the argument in the proof of
Theorem 3.2, we conclude that the process $G_N^\pi$ is tight. Next, we establish weak convergence of the
finite dimensional projections
$$\big( G_N^\pi(t_1),\ldots,G_N^\pi(t_k) \big). \tag{8.17}$$
To this end we apply the Cramér-Wold device and consider linear combinations
$$a_1 G_N^\pi(t_1) + \cdots + a_k G_N^\pi(t_k) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} V_{ik}. \tag{8.18}$$
Convergence of (8.18) is obtained in the same way as that of (9.1) in Lemma 8.4, but this time
with
$$V_{ik} = a_1 \big( 1_{\{Y_i \le t_1\}} - t_1 \big) + \cdots + a_k \big( 1_{\{Y_i \le t_k\}} - t_k \big),$$
and $\mu_k = 0$. Using the fact that (HJ4) allows the use of Lemma 8.3, one can deduce that (8.18)
converges in distribution under $P_{d,m}$ to $a_1 N_1 + \cdots + a_k N_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate normal
distribution with covariance matrix $\Sigma_\pi = \Sigma_k^{\mathrm{HJ}} + \lambda \Sigma_F$, where $\Sigma_k^{\mathrm{HJ}}$ and $\Sigma_F$ are given in (4.5) and
Lemma 8.4, respectively. By means of the Cramér-Wold device, this proves that (8.17) converges in
distribution under $P_{d,m}$ to a mean zero $k$-variate normal random vector with covariance matrix $\Sigma_\pi$.
This distribution is the same as that of $(G^\pi(t_1),\ldots,G^\pi(t_k))$, where $G^\pi$ is a mean zero Gaussian
process with covariance function
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m\left[ n\, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \big( 1_{\{Y_i \le s\}} - s \big) \big( 1_{\{Y_j \le t\}} - t \big) \right] + \lambda (s \wedge t - st), \qquad s, t \in \mathbb{R}.$$
Since $G^\pi$ is continuous at 1, the theorem then follows from Theorem 13.5 in [Bil99] for the case
of uniform $Y_i$'s. Extension to $Y_i$'s with a general c.d.f. $F$ is completely similar to the proof of
Theorem 3.1.
Proof of Theorem 4.2. We use (4.2). From the proof of Theorem 4.1, we know that $G_N^\pi$ is tight.
Together with Theorem 4.1(i), it then follows that the limit behavior of $\sqrt{n}\,(F_N^{\mathrm{HJ}} - F_N)$ is the same
as that of the process $Y_N$ defined in (4.3). This process can be written as
$$Y_N(t) = \frac{\sqrt{n}}{N} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} - 1 \right) 1_{\{Y_i \le t\}} - F(t)\, \frac{\sqrt{n}}{N} \sum_{i=1}^N \left( \frac{\xi_i}{\pi_i} - 1 \right).$$
As in the proofs of Theorems 3.1, 3.2, and 4.1, we first consider the case of uniform $Y_i$'s. The first
process on the right hand side is $\sqrt{n}\,(F_N^{\mathrm{HT}} - F_N)$, which converges weakly to a continuous process,
according to Theorem 3.1, whereas the second process also converges to a continuous process due
to (8.16). As in the proof of Theorem 3.2, one can then argue that $Y_N$, being the difference of these
processes, is tight. Next, we prove weak convergence of the finite dimensional projections
$$\big( Y_N(t_1),\ldots,Y_N(t_k) \big). \tag{8.19}$$
As before, we apply the Cramér-Wold device and consider
$$a_1 Y_N(t_1) + \cdots + a_k Y_N(t_k) = \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} V_{ik} - \frac{1}{N} \sum_{i=1}^N V_{ik} \right), \tag{8.20}$$
with
$$V_{ik} = a_1 \big( 1_{\{Y_i \le t_1\}} - t_1 \big) + \cdots + a_k \big( 1_{\{Y_i \le t_k\}} - t_k \big).$$
Convergence of (8.20) is obtained in the same way as that of (8.8) in the proof of Lemma 8.2. From
(HT1) and (HJ2), it follows that (8.20) converges in distribution under $P_{d,m}$ to $a_1 N_1 + \cdots + a_k N_k$,
where $(N_1,\ldots,N_k)$ has a $k$-variate normal distribution with covariance matrix $\Sigma_k^{\mathrm{HJ}}$ given in (4.5).
By means of the Cramér-Wold device, this proves that (8.19) converges in distribution under $P_{d,m}$
to a mean zero $k$-variate normal random vector with covariance matrix $\Sigma_k^{\mathrm{HJ}}$. This distribution is the
same as that of $(G^{\mathrm{HJ}}(t_1),\ldots,G^{\mathrm{HJ}}(t_k))$, where $G^{\mathrm{HJ}}$ is a mean zero Gaussian process with covariance
function
$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E_m\left[ n\, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \big( 1_{\{Y_i \le s\}} - s \big) \big( 1_{\{Y_j \le t\}} - t \big) \right],$$
for $s, t \in \mathbb{R}$. As before, the theorem now follows from Theorem 13.5 in [Bil99] for the case of uniform
$Y_i$'s, and is then extended to $Y_i$'s with a general c.d.f. $F$.
Proof of Theorem 4.3 The theorem follows directly from relation (4.7) and Theorem 4.1.
The proofs of Propositions 4.1 and 4.2 are similar to those of Theorems 4.2 and 4.1, respectively,
and can be found in [BLRG15]. The proofs for Corollaries 5.1 and 5.2 are fairly straightforward
and can be found in [BLRG15].
References
[BCC14] Patrice Bertail, Emilie Chautru, and Stéphan Clémençon. Empirical processes in survey
sampling. Submitted, See also https://hal.archives-ouvertes.fr/hal-00989585,
2014.
[BD09] Garry F. Barrett and Stephen G. Donald. Statistical inference with generalized Gini
indices of inequality, poverty, and welfare. J. Bus. Econom. Statist., 27(1):1–17, 2009.
[Bha07] Debopam Bhattacharya. Inference on inequality from household survey data. J. Econo-
metrics, 137(2):674–707, 2007.
[Bil99] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability
and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, second
edition, 1999. A Wiley-Interscience Publication.
[BLRG12] Helène Boistard, Hendrik P. Lopuhaä, and Anne Ruiz-Gazen. Approximation of rejective
sampling inclusion probabilities and application to high order correlations. Electron. J.
Stat., 6:1967–1983, 2012.
[BLRG15] Helène Boistard, Hendrik P. Lopuhaä, and Anne Ruiz-Gazen. Supplement to "functional
central limit theorems in survey sampling". 2015.
[BM11] Debopam Bhattacharya and Bhaskhar Mazumder. A nonparametric analysis of black
and white differences in intergenerational income mobility in the united states. Quant.
Econ., 2(3):335–379, 2011.
[BO00] F. Jay Breidt and Jean D. Opsomer. Local polynomial regression estimators in survey
sampling. Ann. Statist., 28(4):1026–1053, 2000.
[BR09] David Binder and Georgia Roberts. Handbook of Statistics 29B: Sample Surveys: Design,
Methods and Applications, chapter 24: Design- and Model-Based Inference for
Model Parameters, pages 33–54. Elsevier, Amsterdam, 2009.
[BS03] Yves G. Berger and Chris J. Skinner. Variance estimation for a low income proportion.
J. Roy. Statist. Soc. Ser. C, 52(4):457–468, 2003.
[BW07] Norman E. Breslow and Jon A. Wellner. Weighted likelihood for semiparametric models
and two-phase stratified samples, with application to Cox regression. Scand. J. Statist.,
34(1):86–102, 2007.
[CCGL10] Hervé Cardot, Mohamed Chaouch, Camelia Goga, and Catherine Labruère. Properties
of design-based functional principal components analysis. J. Statist. Plann. Inference,
140(1):75–91, 2010.
[Dav09] Russell Davidson. Reliable inference for the Gini index. J. Econometrics, 150(1):30–40,
2009.
[Dd08] Fabien Dell and Xavier d’Haultfœuille. Measuring the evolution of complex indicators:
Theory and application to the poverty rate in France. Ann. Économ. Statist., (90):259–
290, 2008.
[DS92] Jean-Claude Deville and Carl-Erik Särndal. Calibration estimators in survey sampling.
Journal of the American statistical Association, 87(418):376–382, 1992.
[Dud02] R. M. Dudley. Real analysis and probability, volume 74 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, Cambridge, 2002. Revised reprint of the 1989
original.
[FF91] Carol A. Francisco and Wayne A. Fuller. Quantile estimation with a complex survey
design. Ann. Statist., 19(1):454–469, 1991.
[Ful09] W.A. Fuller. Sampling Statistics. Wiley Series in Survey Methodology. Wiley, New York,
2009.
[GT14] Eric Graf and Yves Tillé. Variance estimation using linearization for poverty and social
exclusion indicators. Survey Methodology, 40(1):61–79, 2014.
[Háj64] Jaroslav Hájek. Asymptotic theory of rejective sampling with varying probabilities from
a finite population. Ann. Math. Statist., 35:1491–1523, 1964.
[KG98] Edward L. Korn and Barry I. Graubard. Variance estimation for superpopulation pa-
rameters. Statist. Sinica, 8(4):1131–1151, 1998.
[KR81] D. Krewski and J. N. K. Rao. Inference from stratified samples: properties of the
linearization, jackknife and balanced repeated replication methods. Ann. Statist.,
9(5):1010–1019, 1981.
[OAB15] M. Oguz-Alper and Y. G. Berger. Variance estimation of change of poverty based upon
the turkish eu-silc survey. Journal of Official Statistics, 31(2):155–175, 2015.
[PW93] Jens Præstgaard and Jon A. Wellner. Exchangeably weighted bootstraps of the general
empirical process. Ann. Probab., 21(4):2053–2086, 1993.
[RBSK05] Susana Rubin-Bleuer and Ioana Schiopu Kratina. On the two-phase framework for joint
model and design-based inference. Ann. Statist., 33(6):2789–2810, 2005.
[Sil86] B. W. Silverman. Density estimation for statistics and data analysis. Monographs on
Statistics and Applied Probability. Chapman & Hall, London, 1986.
[SW13] Takumi Saegusa and Jon A. Wellner. Weighted likelihood estimation under two-phase
sampling. Ann. Statist., 41(1):269–295, 2013.
[Tho97] M. E. Thompson. Theory of sample surveys, volume 74 of Monographs on Statistics and
Applied Probability. Chapman & Hall, London, 1997.
[vdV98] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[vdVW96] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes.
Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to
statistics.
[Wan12] Jianqiang C. Wang. Sample distribution function based goodness-of-fit test for complex
surveys. Comput. Statist. Data Anal., 56(3):664–679, 2012.
Hélène Boistard
Toulouse School of Economics
21 allée de Brienne
31000 Toulouse, France
e-mail: helene@boistard.fr
Hendrik P. Lopuhaä
Delft Institute of Applied Mathematics
Delft University of Technology
Delft, The Netherlands
e-mail: h.p.lopuhaa@tudelft.nl
Anne Ruiz-Gazen
Toulouse School of Economics
21 allée de Brienne
31000 Toulouse, France
e-mail: anne.ruiz-gazen@tse-fr.eu
9 Supplemental Material
9.1 Proofs of Lemmas, Propositions and Corollaries in the main text
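Proof of Lemma 8.4. We will use the Cramér-Wold device. To this end, we determine the limit
distribution of $a_1 X_N^F(t_1) + \cdots + a_k X_N^F(t_k)$, for $a_1,\ldots,a_k \in \mathbb{R}$ fixed and $\mathbf{a}_k^t = (a_1,\ldots,a_k) \ne (0,\ldots,0)$.
As in the proof of Lemma 8.2, we consider
$$a_1 X_N^F(t_1) + \cdots + a_k X_N^F(t_k) = \sqrt{n} \left( \frac{1}{N} \sum_{i=1}^N \frac{\xi_i}{\pi_i} V_{ik} - \mu_k \right), \tag{9.1}$$
where $V_{ik}$ is defined in (8.9). We want to apply Lemma 8.3. As in (8.10),
$$n S_N^2 \to \mathbf{a}_k^t \Sigma_k^{\mathrm{HT}} \mathbf{a}_k, \qquad \omega\text{-a.s.}, \tag{9.2}$$
where $\mathbf{a}_k^t \Sigma_k^{\mathrm{HT}} \mathbf{a}_k > 0$, thanks to (HT4). This means that, according to Lemma 8.3, the right hand
side of (9.1) converges in distribution under $P_{d,m}$ to a mean zero normal random variable with
variance
$$\mathbf{a}_k^t \Sigma_k^{\mathrm{HT}} \mathbf{a}_k + \lambda \left\{ E_m\left[ V_{1k}^2 \right] - \left( E_m[V_{1k}] \right)^2 \right\} = \mathbf{a}_k^t \Sigma_{\mathrm{HT}}^F \mathbf{a}_k,$$
where
$$\Sigma_{\mathrm{HT}}^F = \Sigma_k^{\mathrm{HT}} + \lambda \Sigma_F. \tag{9.3}$$
We conclude that (9.1) converges in distribution to $a_1 N_1 + \cdots + a_k N_k$, where $(N_1,\ldots,N_k)$ has a
mean zero $k$-variate normal distribution with covariance matrix $\Sigma_{\mathrm{HT}}^F$. By the Cramér-Wold device,
this proves the lemma.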
Proof of Lemma 8.5. Without loss of generality we may assume $0 < F(t_1) < \cdots < F(t_k) < 1$,
since we can permute the rows and columns of $M$ without changing the determinant. For the entries
of $M$ we can distinguish three situations:
1. if $1 \le j < i \le k$, then $M_{ij} = a F(t_j) - b F(t_i) F(t_j)$
2. if $1 \le i = j \le k$, then $M_{ij} = a F(t_i) - b F(t_i)^2$
3. if $1 \le i < j \le k$, then $M_{ij} = a F(t_i) - b F(t_i) F(t_j)$.
Now, for $2 \le i \le k$, multiply the $i$-th row by $F(t_1)/F(t_i)$. This changes the determinant with a factor
$F(t_1)^{k-1} / (F(t_2) \cdots F(t_k)) > 0$, and as a result, all entries in column $j$, at positions $1 \le i \le j \le k$, are
the same: $a F(t_1) - b F(t_1) F(t_j)$. Hence, if we subtract row-2 from row-1, then row-3 from row-2,
. . . , and then row-$k$ from row-$(k-1)$, we get a new matrix $M'$ with a right-upper triangle consisting
of zeros and a main diagonal with elements $M'_{ii} = a F(t_1) - a F(t_1) F(t_i)/F(t_{i+1})$, if $1 \le i \le k-1$,
and $M'_{kk} = a F(t_1) - b F(t_1) F(t_k)$. It follows that
$$\begin{aligned}
\det(M) &= \frac{F(t_2) \cdots F(t_k)}{F(t_1)^{k-1}} \det(M') \\
&= a^{k-1} F(t_1) \big( F(t_2) - F(t_1) \big) \cdots \big( F(t_k) - F(t_{k-1}) \big) \big( a - b F(t_k) \big) > 0,
\end{aligned}$$
since $a > 0$, $0 < F(t_1) < \cdots < F(t_k) < 1$, and $a - b F(t_k) > a - b \ge 0$.
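The determinant formula at the end of this proof can be checked on a concrete example. The sketch below uses exact fractions and a plain Gaussian elimination; the values of $a$, $b$ and $F(t_1),\ldots,F(t_k)$ are hypothetical, and the helper names are introduced only for this illustration. Since every leading principal submatrix of $M$ has the same structure, positivity of all leading minors (Sylvester's criterion) is checked as well.

```python
from fractions import Fraction as Fr

def determinant(matrix):
    """Exact determinant by Gaussian elimination on Fraction entries."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = Fr(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fr(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                      # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

def lemma_matrix(a, b, F):
    """M_ij = a * F(t_i ^ t_j) - b * F(t_i) * F(t_j) for increasing F values."""
    return [[a * min(fi, fj) - b * fi * fj for fj in F] for fi in F]

def closed_form(a, b, F):
    """a^(k-1) * F(t_1) * prod_i (F(t_i) - F(t_{i-1})) * (a - b * F(t_k))."""
    k = len(F)
    value = a ** (k - 1) * F[0] * (a - b * F[-1])
    for i in range(1, k):
        value *= F[i] - F[i - 1]
    return value

# Hypothetical values with 0 < F(t_1) < ... < F(t_k) < 1, a > 0 and a >= b.
F = [Fr(1, 5), Fr(1, 2), Fr(7, 10), Fr(9, 10)]
a, b = Fr(3, 4), Fr(1, 2)
M = lemma_matrix(a, b, F)
leading_minors = [determinant([row[:j] for row in M[:j]]) for j in range(1, len(F) + 1)]
```

Here `determinant(M)` agrees with `closed_form(a, b, F)`, and all entries of `leading_minors` are positive.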
Proof of Lemma 8.6. The proof is similar to that of Lemma 8.4. We determine the limit
distribution of (9.1). Note that without loss of generality we can assume that $0 \le F(t_1) \le \cdots \le F(t_k) \le 1$.
In contrast with the proof of Lemma 8.4, we now have to distinguish between several
cases.
We first consider the situation where all $F(t_i)$'s are distinct and such that $0 < F(t_i) < 1$. From
(HT1) and Lemma 9.1(ii) we conclude that
$$n S_N^2 \to \sigma_{\mathrm{HT}}^2 = \mu_{\pi 1} E_m\left[ V_{1k}^2 \right] + \mu_{\pi 2} \left( E_m[V_{1k}] \right)^2 = \mathbf{a}_k^t \Sigma_k \mathbf{a}_k,$$
where
$$\Sigma_k = \Big( \mu_{\pi 1} F(t_q \wedge t_r) + \mu_{\pi 2} F(t_q) F(t_r) \Big)_{q,r=1}^k. \tag{9.4}$$
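First note that
$$\mu_{\pi 1} + \mu_{\pi 2} = \lim_{N \to \infty} \frac{n}{N^2} \sum_{i=1}^N \sum_{j=1}^N \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} = \lim_{N \to \infty} \frac{n}{N^2} \operatorname{Var}\left( \sum_{i=1}^N \frac{\xi_i}{\pi_i} \right) \ge 0.$$
Therefore, together with condition (i) we can apply Lemma 8.5 with $a = \mu_{\pi 1}$ and $b = -\mu_{\pi 2}$. It
follows that $\Sigma_k$ is positive definite, so that $\sigma_{\mathrm{HT}}^2 > 0$. This means that, according to Lemma 8.3,
the right hand side of (9.1) converges in distribution under $P_{d,m}$ to a mean zero normal random
variable with variance $(\mu_{\pi 1} + \lambda) E_m[V_{1k}^2] + (\mu_{\pi 2} - \lambda) (E_m[V_{1k}])^2 = \mathbf{a}_k^t \Sigma_{\mathrm{HT}}^F \mathbf{a}_k$, where
$$\Sigma_{\mathrm{HT}}^F = \Big( (\mu_{\pi 1} + \lambda) F(t_q \wedge t_r) + (\mu_{\pi 2} - \lambda) F(t_q) F(t_r) \Big)_{q,r=1}^k. \tag{9.5}$$
We conclude that (9.1) converges in distribution to $a_1 N_1 + \cdots + a_k N_k$, where $(N_1,\ldots,N_k)$ has a
mean zero $k$-variate normal distribution with covariance matrix $\Sigma_{\mathrm{HT}}^F$. By means of the Cramér-Wold
device, this proves the lemma for the case that $0 < F(t_1) < \cdots < F(t_k) < 1$.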
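The case that the $F(t_i)$'s are not all distinct, but still satisfy $0 < F(t_i) < 1$, can be reduced
to the case where all $F(t_i)$'s are distinct. This can be seen as follows. For simplicity, suppose
$F(t_1) = \cdots = F(t_m) = F(t_0)$, with $0 < F(t_0) < F(t_{m+1}) < \cdots < F(t_k) < 1$. Then we can
write (9.1) as
$$a_0 X_N^F(t_0) + a_{m+1} X_N^F(t_{m+1}) + \cdots + a_k X_N^F(t_k), \tag{9.6}$$
where $a_0 = a_1 + \cdots + a_m$. As before, with (HT4) and Lemma 8.5, it follows from Lemma 8.3
that (9.6) converges in distribution to a mean zero normal random variable with variance $\mathbf{a}_0^t \Sigma_0^F \mathbf{a}_0$,
where $\mathbf{a}_0 = (a_0, a_{m+1},\ldots,a_k)^t$ and
$$\Sigma_0^F = \gamma_{\pi 1} E_m\left[ \mathbf{Y}_0 \mathbf{Y}_0^t \right] + (\gamma_{\pi 2} - \lambda) \left( E_m[\mathbf{Y}_0] \right) \left( E_m[\mathbf{Y}_0] \right)^t,$$
with $\mathbf{Y}_0 = (1_{\{Y_i \le t_0\}}, 1_{\{Y_i \le t_{m+1}\}},\ldots,1_{\{Y_i \le t_k\}})^t$. However, note that
$$\begin{aligned}
\mathbf{a}_0^t \mathbf{Y}_0 &= (a_1 + \cdots + a_m) 1_{\{Y_i \le t_0\}} + a_{m+1} 1_{\{Y_i \le t_{m+1}\}} + \cdots + a_k 1_{\{Y_i \le t_k\}} \\
&= a_1 1_{\{Y_i \le t_1\}} + \cdots + a_k 1_{\{Y_i \le t_k\}} = \mathbf{a}_k^t \mathbf{Y}_{1k},
\end{aligned}$$
where $\mathbf{a}_k = (a_1,\ldots,a_k)^t$ and $\mathbf{Y}_{1k} = (1_{\{Y_i \le t_1\}},\ldots,1_{\{Y_i \le t_k\}})^t$, as before. This means that $\mathbf{a}_0^t \Sigma_0^F \mathbf{a}_0 = \mathbf{a}_k^t \Sigma_{\mathrm{HT}}^F \mathbf{a}_k$,
with $\Sigma_{\mathrm{HT}}^F$ from (9.3). It follows that (9.1) converges in distribution to $a_1 N_1 + \cdots + a_k N_k$,
where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma_{\mathrm{HT}}^F$.
By means of the Cramér-Wold device, this proves the lemma for the case $F(t_1) = \cdots = F(t_m) = F(t_0) < F(t_{m+1}) < \cdots < F(t_k) < 1$.
The argument is the same for other cases with multiple
$F(t_i) \in (0,1)$ being equal to each other.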
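Next, consider the case $F(t_1) = 0$. In this case, $1_{\{Y_i \le t_1\}} = 0$ with probability one. This means
that the summation on the left hand side of (9.1) reduces to $a_2 X_N^F(t_2) + \cdots + a_k X_N^F(t_k)$ and
$$\Sigma_{\mathrm{HT}} = \begin{pmatrix} 0 & \mathbf{0}^t \\ \mathbf{0} & \Sigma_{\mathrm{HT},k-1} \end{pmatrix}, \tag{9.7}$$
where $\mathbf{0}$ is the zero vector of length $k-1$ and $\Sigma_{\mathrm{HT},k-1}$ is the matrix in (9.4) based on $0 < F(t_2) < \cdots < F(t_k) < 1$. When $\mathbf{a}_{k-1}^t = (a_2,\ldots,a_k) \ne (0,\ldots,0)$, then
$$\sigma_{\mathrm{HT}}^2 = \mathbf{a}_k^t \Sigma_{\mathrm{HT}} \mathbf{a}_k = \mathbf{a}_{k-1}^t \Sigma_{\mathrm{HT},k-1} \mathbf{a}_{k-1} > 0,$$
because $\Sigma_{\mathrm{HT},k-1}$ is positive definite, due to (HT4) and Lemma 8.5. This allows application of
Lemma 8.3 to (9.1). As in the previous cases, we conclude that (9.1) converges in distribution
to $a_1 N_1 + \cdots + a_k N_k$, where $(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with
covariance matrix $\Sigma_{\mathrm{HT}}^F$ given by (9.3). When $\mathbf{a}_k^t = (a_1, 0,\ldots,0)$, with $a_1 \ne 0$, then both (9.1) and
$a_1 N_1 + \cdots + a_k N_k$ are equal to zero. According to the Cramér-Wold device, this proves the lemma
for the case $F(t_1) = 0$.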
It remains to consider the case $F(t_k)=1$. In this case, the $(k,k)$-th element of the matrix $\Sigma_{HT}$
in (9.4) is equal to $\mu_{\pi1}+\mu_{\pi2}$. We distinguish between $\mu_{\pi1}+\mu_{\pi2}=0$ and $\mu_{\pi1}+\mu_{\pi2}>0$. In the
latter case, from the proof of Lemma 8.5 we find that $\Sigma_{HT}$ has determinant
$$\mu_{\pi1}^{k-1}F(t_1)\left(\prod_{i=2}^{k}\big(F(t_i)-F(t_{i-1})\big)\right)(\mu_{\pi1}+\mu_{\pi2}) > 0,$$
using (HT4) and $0<F(t_1)<\cdots<F(t_{k-1})<F(t_k)=1$. This allows application of Lemma 8.3
to (9.1). As before, we conclude that (9.1) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where
$(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance matrix $\Sigma^F_{HT}$ from (9.3).
According to the Cramér-Wold device, this proves the lemma for the case $F(t_k)=1$ and $\mu_{\pi1}+\mu_{\pi2}>0$.
Next, consider the case $F(t_k)=1$ and $\mu_{\pi1}+\mu_{\pi2}=0$. This means
$$\Sigma_{HT} = \begin{pmatrix} & & & 0\\ & \Sigma_{HT,k-1} & & \vdots\\ & & & 0\\ 0 & \cdots & 0 & 0 \end{pmatrix}, \tag{9.8}$$
where $\Sigma_{HT,k-1}$ is the matrix in (9.4) corresponding to $0<F(t_1)<\cdots<F(t_{k-1})<1$. When
$\mathbf{a}_{k-1}^t=(a_1,\ldots,a_{k-1})\ne(0,\ldots,0)$, then
$$\sigma^2_{HT} = \mathbf{a}_k^t\Sigma_{HT}\mathbf{a}_k = \mathbf{a}_{k-1}^t\Sigma_{HT,k-1}\mathbf{a}_{k-1} > 0,$$
because $\Sigma_{HT,k-1}$ is positive definite, due to (HT4) and Lemma 8.5. This allows application of
Lemma 8.3 to (9.1). As in the previous cases, we conclude that (9.1) converges in distribution to
$a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal distribution with covariance
matrix $\Sigma^F_{HT}$ given by (9.3). When $\mathbf{a}_k^t=(0,\ldots,0,a_k)$, with $a_k\ne0$, then $a_1N_1+\cdots+a_kN_k=0$ and
$$a_1\mathbb{X}^F_N(t_1)+\cdots+a_k\mathbb{X}^F_N(t_k) = a_k\sqrt{n}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_i}-1\right)$$
converges to zero in probability. The latter follows from the fact that according to (HT1) and
Lemma 9.1, we have that
$$\sqrt{n}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_i}-1\right) \to N(0,\mu_{\pi1}+\mu_{\pi2}), \tag{9.9}$$
in distribution under $\mathbb{P}_{d,m}$. According to the Cramér-Wold device, this proves the lemma for the
case $F(t_k)=1$ and $\mu_{\pi1}+\mu_{\pi2}=0$. Finally, the argument for the case that $F(t_1)=0$ and $F(t_k)=1$
simultaneously, either with or without repeated values among the $F(t_i)$'s, is completely similar. This
finishes the proof.
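As an illustration of the degenerate case in (9.9), the following minimal sketch evaluates the statistic $\sqrt{n}\big(N^{-1}\sum_i \xi_i/\pi_i - 1\big)$ for a fixed-size design with equal inclusion probabilities $\pi_i=n/N$. Simple random sampling without replacement is used here only as an assumed example design; the function names and the values of $N$, $n$ and the seeds are hypothetical. For such a design the statistic is zero for every sample, which is consistent with a limiting variance $\mu_{\pi1}+\mu_{\pi2}=0$.

```python
import math
import random


def ht_size_statistic(sample, pi, n):
    """Return sqrt(n) * ((1/N) * sum_{i in sample} 1/pi_i - 1)."""
    N = len(pi)
    ht_mean_of_ones = sum(1.0 / pi[i] for i in sample) / N
    return math.sqrt(n) * (ht_mean_of_ones - 1.0)


def srswor_statistic(N, n, seed):
    """Draw one simple random sample without replacement and evaluate the statistic."""
    rng = random.Random(seed)
    sample = rng.sample(range(N), n)  # fixed sample size n
    pi = [n / N] * N                  # equal first-order inclusion probabilities
    return ht_size_statistic(sample, pi, n)


# The statistic vanishes (up to rounding) whatever sample is drawn.
values = [srswor_statistic(N=1000, n=50, seed=s) for s in range(5)]
print(values)
```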
Proof of Proposition 4.1. The proof is similar to that of Theorem 4.2. We find that the limit
behavior of $\sqrt{n}(\mathbb{F}^{HJ}_N-\mathbb{F}_N)$ is the same as that of the process $\mathbb{Y}_N$ defined in (4.3). When we first
consider the case of uniform $Y_i$'s with $F(t)=t$, tightness of the process $\mathbb{Y}_N$ follows in the same way
as in the proof of Theorem 4.2. It remains to establish weak convergence of the finite dimensional
projections (8.19). This can be done in the same way as in the proof of Proposition 3.1, but this
time with
$$V_{ik} = a_1\big(\mathbb{1}_{\{Y_i\le t_1\}}-t_1\big)+\cdots+a_k\big(\mathbb{1}_{\{Y_i\le t_k\}}-t_k\big).$$
From (HT1) and Lemma 9.1(i) we conclude that (8.20) converges in distribution to a mean zero
normal random variable with variance
$$\sigma^2_{HT} = \mu_{\pi1}\mathbb{E}_m V_{1k}^2 = \mathbf{a}_k^t\widetilde{\Sigma}_k\mathbf{a}_k,$$
where $\widetilde{\Sigma}_k$ is the $k\times k$-matrix with $(q,r)$-element equal to $\mu_{\pi1}(t_q\wedge t_r-t_qt_r)$. We conclude that (8.20)
converges in distribution to $a_1N_1+\cdots+a_kN_k$, where $(N_1,\ldots,N_k)$ has a $k$-variate mean zero normal
distribution with covariance matrix $\widetilde{\Sigma}_k$. By means of the Cramér-Wold device this establishes the
limit distribution of (8.19), which is the same as that of the vector $(\mathbb{G}^{HJ}(t_1),\ldots,\mathbb{G}^{HJ}(t_k))$, where
$\mathbb{G}^{HJ}$ is a mean zero Gaussian process with covariance function
$$\mathbb{E}_{d,m}\mathbb{G}^{HJ}(s)\mathbb{G}^{HJ}(t) = \mu_{\pi1}(s\wedge t-st).$$
From here on, the proof is completely the same as that of Theorem 4.2.
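The identity $\mu_{\pi1}\mathbb{E}_m V_{1k}^2=\mathbf{a}_k^t\widetilde{\Sigma}_k\mathbf{a}_k$ used above can be checked numerically. The sketch below assumes uniform $Y$ on $[0,1]$, as in the proof, and exploits that $V_{1k}$ is piecewise constant in $Y$, so its second moment is an exact finite sum. The coefficients, time points and the value of $\mu_{\pi1}$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
def quadratic_form(a, t, mu_pi1):
    """a^t Sigma a with Sigma_{qr} = mu_pi1 * (min(t_q, t_r) - t_q * t_r)."""
    k = len(a)
    return mu_pi1 * sum(
        a[q] * a[r] * (min(t[q], t[r]) - t[q] * t[r])
        for q in range(k)
        for r in range(k)
    )


def second_moment_uniform(a, t):
    """E[V^2] for V = sum_q a_q (1{Y <= t_q} - t_q) with Y uniform on [0, 1]."""
    center = sum(aq * tq for aq, tq in zip(a, t))
    cuts = [0.0] + sorted(set(t)) + [1.0]
    total = 0.0
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        y = 0.5 * (lo + hi)  # V is constant on (lo, hi], so evaluate at the midpoint
        v = sum(aq for aq, tq in zip(a, t) if y <= tq) - center
        total += (hi - lo) * v * v
    return total


a = [1.0, -2.0, 0.5]   # illustrative coefficients a_1, ..., a_k
t = [0.2, 0.5, 0.9]    # illustrative points t_1 < ... < t_k
mu_pi1 = 0.7           # illustrative value of mu_{pi1}

lhs = mu_pi1 * second_moment_uniform(a, t)
rhs = quadratic_form(a, t, mu_pi1)
print(lhs, rhs)
```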
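Proof of Proposition 4.2. From relation (4.7) and Theorem 4.1 we know that the limit behavior
of $\sqrt{n}(\mathbb{F}^{HJ}_N-F)$ is the same as that of $\mathbb{G}^{\pi}_N$. Tightness of $\mathbb{G}^{\pi}_N$ has been obtained in the proof of
Theorem 4.1. It remains to establish weak convergence of (8.17). This can be done in the same
way as in the proof of Lemma 8.6, but this time with
$$V_{ik} = a_1\big(\mathbb{1}_{\{Y_i\le t_1\}}-F(t_1)\big)+\cdots+a_k\big(\mathbb{1}_{\{Y_i\le t_k\}}-F(t_k)\big)$$
and $\mu_k=0$. When $0<F(t_1)<\cdots<F(t_k)<1$, from (HT1) and Lemma 9.1 we find that
$nS_N^2\to\mu_{\pi1}\mathbb{E}_m[V_{1k}^2]=\mathbf{a}_k^t\Sigma_k\mathbf{a}_k$, where
$$\Sigma_k = \Big(\mu_{\pi1}\big(F(t_q\wedge t_r)-F(t_q)F(t_r)\big)\Big)_{q,r=1}^{k}. \tag{9.10}$$
From condition (i) of Proposition 3.2 and Lemma 8.5, it follows that $\Sigma_k$ is positive definite, so that
$\mathbf{a}_k^t\Sigma_k\mathbf{a}_k>0$. Hence, according to Lemma 8.3, the right hand side of (8.18) converges in distribution
under $\mathbb{P}_{d,m}$ to a mean zero normal random variable with variance $(\mu_{\pi1}+\lambda)\mathbb{E}_m[V_{1k}^2]=\mathbf{a}_k^t\Sigma^F_{HJ}\mathbf{a}_k$,
where, since $V_{1k}$ is centered,
$$\Sigma^F_{HJ} = \Big((\mu_{\pi1}+\lambda)\big(F(t_q\wedge t_r)-F(t_q)F(t_r)\big)\Big)_{q,r=1}^{k}. \tag{9.11}$$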
We conclude that the right hand side of (8.18) converges in distribution to $a_1N_1+\cdots+a_kN_k$, where
$(N_1,\ldots,N_k)$ has a mean zero $k$-variate normal distribution with covariance matrix $\Sigma^F_{HJ}$. By means
of the Cramér-Wold device, this proves weak convergence of $(\mathbb{G}^{\pi}_N(t_1),\ldots,\mathbb{G}^{\pi}_N(t_k))$ for the case that
$0<F(t_1)<\cdots<F(t_k)<1$. As in the proof of Lemma 8.6, the case where the $F(t_i)$'s are not all
distinct, but satisfy $0<F(t_i)<1$, the case $F(t_1)=0$, and the case $F(t_k)=1$, can be reduced to
the previous case. From here on, the proof is completely the same as that of Theorem 4.1.
Proof of (5.2). Following [Dd08], one can write $\phi=\psi_2\circ\psi_1$, where
$$\psi_1(F) = \big(F,\beta F^{-1}(\alpha)\big),\qquad \psi_2(F,x) = F(x).$$
The Hadamard-derivative of $\phi$ can then be obtained from the chain rule, e.g., see Lemma 3.9.3
in [vdVW96]. According to Lemma 3.9.20 in [vdVW96], for $0<\alpha<1$ and $F\in D_\phi$ that have a
positive derivative at $F^{-1}(\alpha)$, the map $\psi_1$ is Hadamard-differentiable at $F$ tangentially to the set
of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$ with derivative
$$\psi'_{1,F}(h) = \left(h,\,-\beta\frac{h(F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\right).$$
It is fairly straightforward to show that for $F$ that are differentiable at $x$, the mapping $\psi_2$ is
Hadamard-differentiable at $(F,x)$ tangentially to the set of pairs $(h,\epsilon)$, such that $h$ is continuous at
$x$ and $\epsilon\in\mathbb{R}$, with derivative
$$\psi'_{2,(F,x)}(h,\epsilon) = \epsilon f(x)+h(x).$$
Then for $F\in D_\phi$ that are differentiable at $\beta F^{-1}(\alpha)$, the mapping $\psi_2$ is Hadamard-differentiable at
$\psi_1(F)=\big(F,\beta F^{-1}(\alpha)\big)$. It follows from the chain rule that $\phi(F)=F\big(\beta F^{-1}(\alpha)\big)=\psi_2\circ\psi_1(F)$ is
Hadamard-differentiable at $F$ tangentially to the set $D_0$ consisting of functions $h\in D(\mathbb{R})$ that are
continuous at $F^{-1}(\alpha)$ with derivative
$$\phi'_F(h) = -\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}h(F^{-1}(\alpha))+h(\beta F^{-1}(\alpha)).$$
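The derivative formula for $\phi'_F(h)$ can be sanity-checked by a finite difference. The sketch below is only an illustration under assumptions that are not in the paper: $F$ is taken to be the standard exponential distribution function, the direction is $h(x)=x^2e^{-x}$, and $\alpha=0.5$, $\beta=0.6$; all function names are hypothetical. The quantile of the perturbed function $F+th$ is computed by bisection, which is valid here because $F+th$ is increasing on the search interval for the small $t$ used.

```python
import math


def F(x):
    """Assumed example: standard exponential distribution function (x >= 0)."""
    return 1.0 - math.exp(-x)


def f(x):
    """Density of the assumed example distribution."""
    return math.exp(-x)


def h(x):
    """Assumed smooth perturbation direction."""
    return x * x * math.exp(-x)


def quantile(G, alpha, lo=0.0, hi=50.0, iters=200):
    """Generalized inverse of an increasing function G on [lo, hi], by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def phi(G, alpha, beta):
    """phi(G) = G(beta * G^{-1}(alpha))."""
    return G(beta * quantile(G, alpha))


def phi_derivative(alpha, beta):
    """Closed-form Hadamard derivative phi'_F(h) from the text."""
    q = quantile(F, alpha)
    return -beta * f(beta * q) / f(q) * h(q) + h(beta * q)


def finite_difference(alpha, beta, t=1e-5):
    """(phi(F + t h) - phi(F)) / t for a small step t."""
    def F_t(x):
        return F(x) + t * h(x)
    return (phi(F_t, alpha, beta) - phi(F, alpha, beta)) / t


exact = phi_derivative(0.5, 0.6)
approx = finite_difference(0.5, 0.6)
print(exact, approx)
```

The two printed numbers should agree up to an error of the order of the step size $t$.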
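Proof of Corollary 5.1. The mapping $\phi: D_\phi\subset D(\mathbb{R})\mapsto\mathbb{R}$ is Hadamard-differentiable at $F$
tangentially to the set $D_0$ consisting of functions $h\in D(\mathbb{R})$ that are continuous at $F^{-1}(\alpha)$. According
to Theorem 3.2, the sequence $\sqrt{n}(\mathbb{F}^{HT}_N-F)$ converges weakly to a mean zero Gaussian process $\mathbb{G}^{HT}_F$
with covariance structure
$$\mathbb{E}_{d,m}\mathbb{G}^{HT}_F(s)\mathbb{G}^{HT}_F(t) = (\mu_{\pi1}+\lambda)F(s\wedge t)+(\mu_{\pi2}-\lambda)F(s)F(t), \tag{9.12}$$
for $s,t\in\mathbb{R}$. It then follows from Theorem 3.9.4 in [vdVW96], that the random variable $\sqrt{n}(\phi(\mathbb{F}^{HT}_N)-\phi(F))$ converges weakly to
$$-\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\mathbb{G}^{HT}_F(F^{-1}(\alpha))+\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha)),$$
which has a normal distribution with mean zero and variance
$$\begin{aligned}
\sigma^2_{HT,\alpha,\beta} ={}& \beta^2\frac{f(\beta F^{-1}(\alpha))^2}{f(F^{-1}(\alpha))^2}\,\mathbb{E}\big[\mathbb{G}^{HT}_F(F^{-1}(\alpha))^2\big]+\mathbb{E}\big[\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha))^2\big]\\
&-2\beta\frac{f(\beta F^{-1}(\alpha))}{f(F^{-1}(\alpha))}\,\mathbb{E}\big[\mathbb{G}^{HT}_F(F^{-1}(\alpha))\,\mathbb{G}^{HT}_F(\beta F^{-1}(\alpha))\big].
\end{aligned}$$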
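The precise expression can then be derived from (9.12), which proves part one. For part two, write
$$\sqrt{n}\big(\phi(\mathbb{F}^{HT}_N)-\phi(\mathbb{F}_N)\big) = \sqrt{n}\big(\phi(\mathbb{F}^{HT}_N)-\phi(F)\big)-\frac{\sqrt{n}}{\sqrt{N}}\sqrt{N}\big(\phi(\mathbb{F}_N)-\phi(F)\big).$$
The process $\sqrt{N}(\mathbb{F}_N-F)$ converges weakly to a mean zero Gaussian process $\mathbb{G}_F$. Then, Hadamard-differentiability
of $\phi$ together with Theorem 3.9.4 in [vdVW96] yields that the sequence $\sqrt{N}(\phi(\mathbb{F}_N)-\phi(F))$
converges weakly to $\phi'_F(\mathbb{G}_F)$. As $n/N\to0$, the second term on the right hand side tends to zero in probability, and part two follows from part one.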
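To make the variance $\sigma^2_{HT,\alpha,\beta}$ in the proof of Corollary 5.1 concrete, the following sketch evaluates it from the covariance structure (9.12). The inputs are illustrative assumptions, not values from the paper: $F$ is the standard exponential distribution function, $\alpha=0.5$, $\beta=0.6$, and $(\mu_{\pi1},\mu_{\pi2},\lambda)=(1,-1,0)$, for which (9.12) reduces to the covariance $F(s\wedge t)-F(s)F(t)$. Function names are hypothetical.

```python
import math


def covariance_ht(s, t, F, mu_pi1, mu_pi2, lam):
    """Covariance (9.12); F(min(s, t)) equals F(s ^ t) because F is nondecreasing."""
    return (mu_pi1 + lam) * F(min(s, t)) + (mu_pi2 - lam) * F(s) * F(t)


def sigma2_ht(alpha, beta, F, f, F_inv, mu_pi1, mu_pi2, lam):
    """Variance of -c * G(q) + G(beta * q) with q = F^{-1}(alpha), c = beta f(beta q) / f(q)."""
    q = F_inv(alpha)
    c = beta * f(beta * q) / f(q)

    def K(s, t):
        return covariance_ht(s, t, F, mu_pi1, mu_pi2, lam)

    return c * c * K(q, q) + K(beta * q, beta * q) - 2.0 * c * K(q, beta * q)


def F_exp(x):
    """Assumed example: standard exponential distribution function."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def f_exp(x):
    """Density of the assumed example distribution (x >= 0)."""
    return math.exp(-x)


def F_exp_inv(u):
    """Quantile function of the assumed example distribution."""
    return -math.log(1.0 - u)


variance = sigma2_ht(0.5, 0.6, F_exp, f_exp, F_exp_inv,
                     mu_pi1=1.0, mu_pi2=-1.0, lam=0.0)
print(variance)
```

Other designs can be explored by changing `mu_pi1`, `mu_pi2` and `lam`, provided the resulting covariance function is a valid one.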
Proof of Corollary 5.2. The proof is completely the same as that of Corollary 5.1, with the only
difference that the covariance structure of the limiting process $\sqrt{n}(\phi(\mathbb{F}^{HJ}_N)-\phi(F))$ is now given in
Theorem 4.3.
9.2 Additional Lemmas
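Lemma 9.1. Let $S_N^2$ be defined by (3.2), where $V_1,V_2,\ldots$ is a sequence of i.i.d. random variables
on $(\Omega,\mathfrak{F},\mathbb{P}_m)$ with $\mathbb{E}_m[V_1^4]<\infty$. Suppose that $n$ and $\pi_i,\pi_{ij}$, for $i,j=1,2,\ldots,N$ are deterministic
and let $\mathbb{V}_m(S_N^2)$ denote the variance of $S_N^2$. If (C1)-(C2) hold, then $n^2\mathbb{V}_m[S_N^2]=O(1/N)$. Moreover,
(i) if $\mathbb{E}_m[V_1]=0$ and condition (i) in Proposition 3.1 holds,
$$nS_N^2\to\sigma^2_{HT}=\mu_{\pi1}\mathbb{E}_m[V_1^2],\quad\text{in }\mathbb{P}_m\text{-probability.}$$
(ii) if $\mathbb{E}_m[V_1]\ne0$ and conditions (i)-(ii) in Proposition 3.1 hold,
$$nS_N^2\to\sigma^2_{HT}=\mu_{\pi1}\mathbb{E}_m[V_1^2]+\mu_{\pi2}(\mathbb{E}_m[V_1])^2,\quad\text{in }\mathbb{P}_m\text{-probability.}$$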
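Proof. For any $\epsilon>0$, by Markov's inequality we have
$$\mathbb{P}_m\big(|nS_N^2-\mathbb{E}_m[nS_N^2]|>\epsilon\big)\le\frac{n^2\mathbb{V}_m[S_N^2]}{\epsilon^2}, \tag{9.13}$$
where $\mathbb{V}_m$ denotes the variance of $S_N^2$ under the super-population model. In order to compute
$\mathbb{V}_m[S_N^2]$, we first have
$$\begin{aligned}
\mathbb{E}_m[S_N^2] &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\mathbb{E}_m(V_iV_j)\\
&= \frac{\mathbb{E}_m[V_1^2]}{N^2}\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}+\frac{(\mathbb{E}_m[V_1])^2}{N^2}\sum\sum_{i\ne j}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}.
\end{aligned} \tag{9.14}$$
From this, tedious but straightforward calculus leads to the expression for $(\mathbb{E}_m[S_N^2])^2$ and $\mathbb{E}_m[S_N^4]$.
One finds
$$N^4\big(\mathbb{E}_m[S_N^2]\big)^2 = a_1(\mathbb{E}_m[V_1])^4+a_2\mathbb{E}_m[V_1^2](\mathbb{E}_m[V_1])^2+a_3\big(\mathbb{E}_m[V_1^2]\big)^2,$$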
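where, according to (C1)-(C2):
$$\begin{aligned}
a_1 &= \sum\sum\sum\sum_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+4\sum\sum\sum_{(i,j,l)\in D_{3,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{\pi_{il}-\pi_i\pi_l}{\pi_i\pi_l}+2\sum\sum_{(i,j)\in D_{2,N}}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\right)^2\\
&= \sum\sum\sum\sum_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2)+O(N^2/n^2),\\
a_2 &= 2\sum\sum\sum_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+4\sum\sum_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\,\frac{\pi_{ik}-\pi_i\pi_k}{\pi_i\pi_k}\\
&= 2\sum\sum\sum_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
a_3 &= \sum\sum_{(i,j)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\,\frac{1-\pi_j}{\pi_j}+\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)^2\\
&= \sum\sum_{(i,j)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\,\frac{1-\pi_j}{\pi_j}+O(N^3/n^2).
\end{aligned}$$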
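Furthermore,
$$N^4\mathbb{E}_m[S_N^4] = b_1(\mathbb{E}_m[V_1])^4+b_2\mathbb{E}_m[V_1^2](\mathbb{E}_m[V_1])^2+b_3\big(\mathbb{E}_m[V_1^2]\big)^2+b_4\mathbb{E}_m[V_1]\mathbb{E}_m[V_1^3],$$
where
$$\begin{aligned}
b_1 &= \sum\sum\sum\sum_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)^2\\
&= \sum\sum\sum\sum_{(i,j,k,l)\in D_{4,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
b_2 &= 2\sum\sum\sum_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+4\sum\sum\sum_{(i,j,l)\in D_{3,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{\pi_{il}-\pi_i\pi_l}{\pi_i\pi_l}\\
&= 2\sum\sum\sum_{(i,k,l)\in D_{3,N}}\frac{1-\pi_i}{\pi_i}\,\frac{\pi_{kl}-\pi_k\pi_l}{\pi_k\pi_l}+O(N^3/n^2),\\
b_3 &= \sum\sum_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\,\frac{1-\pi_k}{\pi_k}+2\sum\sum_{(i,j)\in D_{2,N}}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\right)^2\\
&= \sum\sum_{(i,k)\in D_{2,N}}\frac{1-\pi_i}{\pi_i}\,\frac{1-\pi_k}{\pi_k}+O(N^2/n^2),\\
b_4 &= 4\sum\sum_{(i,j)\in D_{2,N}}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{1-\pi_j}{\pi_j} = O(N^3/n^2).
\end{aligned}$$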
The variance expression for $S_N^2$ is deduced easily from the previous computations. From the expression
derived in [BLRG15], we find that $a_i-b_i=O(N^3/n^2)$, for $i=1,2,3$, and $b_4=O(N^3/n^2)$, so
that
$$n^2\mathbb{V}_m[S_N^2] = n^2\mathbb{E}_m[S_N^4]-n^2\big(\mathbb{E}_m[S_N^2]\big)^2 = O(1/N). \tag{9.15}$$
From (9.13) we conclude that $nS_N^2-\mathbb{E}_m[nS_N^2]$ tends to zero in $\mathbb{P}_m$-probability. As a consequence,
statements (i) and (ii) follow from (9.14).
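Formula (9.14) can be evaluated directly for a concrete design. The sketch below uses simple random sampling without replacement as an assumed example, with the standard inclusion probabilities $\pi_i=n/N$ and $\pi_{ij}=n(n-1)/(N(N-1))$ for $i\ne j$; the moments of $V_1$, the sizes $N$ and $n$, and the function names are hypothetical illustrative choices. For this design, $n\,\mathbb{E}_m[S_N^2]$ works out to $(1-n/N)$ times the variance of $V_1$.

```python
def expected_S2(pi, pi2, m1, m2):
    """Right hand side of (9.14).

    pi  : list of first-order inclusion probabilities pi_i
    pi2 : function (i, j) -> second-order inclusion probability pi_ij, i != j
    m1  : E_m[V_1]
    m2  : E_m[V_1^2]
    """
    N = len(pi)
    diagonal = sum((1.0 - p) / p for p in pi)
    off_diagonal = sum(
        (pi2(i, j) - pi[i] * pi[j]) / (pi[i] * pi[j])
        for i in range(N)
        for j in range(N)
        if i != j
    )
    return (m2 * diagonal + m1 ** 2 * off_diagonal) / N ** 2


# Assumed example design: simple random sampling without replacement.
N, n = 50, 10
pi = [n / N] * N


def pi2_srswor(i, j):
    return n * (n - 1) / (N * (N - 1))


m1, m2 = 2.0, 5.0  # illustrative moments, so Var(V_1) = m2 - m1**2 = 1
scaled = n * expected_S2(pi, pi2_srswor, m1, m2)
print(scaled, (1.0 - n / N) * (m2 - m1 ** 2))
```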
Lemma 9.2. If $x_N\to x$ and $y_N\to y$ weakly in $D[0,1]$ with the Skorohod metric, and $x,y\in C[0,1]$, then
the sequence $\{x_N+y_N\}$ is also tight in $D[0,1]$.
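Proof. We can use Theorem 13.2 from [Bil99]. The first condition follows easily since
$$\sup_{t\in[0,1]}|x_N(t)+y_N(t)|\le\sup_{t\in[0,1]}|x_N(t)|+\sup_{t\in[0,1]}|y_N(t)|.$$
Because $x_N\to x$ and $y_N\to y$ weakly, both sequences $\{x_N\}$ and $\{y_N\}$ are tight, so that they satisfy the
first condition of Theorem 13.2 individually. For condition (ii) of Theorem 13.2 in [Bil99], choose
$\epsilon>0$ and $\eta>0$. According to (12.7) in [Bil99], for any $0<\delta<1/2$,
$$w'_x(\delta)\le w_x(2\delta).$$
This means that
$$\begin{aligned}
\mathbb{P}\{w'_{x_N+y_N}(\delta)\ge\epsilon\} &\le \mathbb{P}\{w_{x_N+y_N}(2\delta)\ge\epsilon\}\\
&\le \mathbb{P}\{w_{x_N}(2\delta)\ge\epsilon/2\}+\mathbb{P}\{w_{y_N}(2\delta)\ge\epsilon/2\}.
\end{aligned}$$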
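Consider the first probability. Since $x_N\to x$ weakly in $D[0,1]$ with the Skorohod metric, according to the
almost sure representation theorem (see, e.g., Theorem 11.7.2 in [Dud02]), there exist $\tilde{x}_N$ and $\tilde{x}$,
having the same distribution as $x_N$ and $x$, respectively, such that $\tilde{x}_N\to\tilde{x}$, with probability one, in
the Skorohod metric. Because $\tilde{x}\stackrel{d}{=}x$ and $x\in C[0,1]$, also $\tilde{x}\in C[0,1]$. Hence, since $\tilde{x}$ is continuous,
it follows that
$$\sup_{t\in[0,1]}|\tilde{x}_N(t)-\tilde{x}(t)|\to0,\quad\text{with probability one.} \tag{9.16}$$
We then find that
$$\begin{aligned}
\mathbb{P}\{w_{x_N}(2\delta)\ge\epsilon/2\} &= \mathbb{P}\Big(\sup_{|s-t|<2\delta}|x_N(s)-x_N(t)|\ge\epsilon/2\Big)\\
&= \mathbb{P}\Big(\sup_{|s-t|<2\delta}|\tilde{x}_N(s)-\tilde{x}_N(t)|\ge\epsilon/2\Big)\\
&\le \mathbb{P}\Big(\sup_{|s-t|<2\delta}|\tilde{x}(s)-\tilde{x}(t)|\ge\epsilon/4\Big)\\
&\quad+\mathbb{P}\Big(\sup_{s\in[0,1]}|\tilde{x}_N(s)-\tilde{x}(s)|\ge\epsilon/8\Big)+\mathbb{P}\Big(\sup_{t\in[0,1]}|\tilde{x}_N(t)-\tilde{x}(t)|\ge\epsilon/8\Big).
\end{aligned}$$
The latter two probabilities tend to zero due to (9.16). For the first probability on the right
hand side, note that $C[0,1]$ is separable and complete. This means that each random element in
$C[0,1]$ is tight. Hence, $\tilde{x}\in C[0,1]$ is tight, so that according to Theorem 7.3 in [Bil99], there exists
a $0<\delta<1/2$, such that
$$\mathbb{P}\Big(\sup_{|s-t|<2\delta}|x(s)-x(t)|\ge\epsilon/4\Big) = \mathbb{P}\{w_x(2\delta)\ge\epsilon/4\}\le\eta.$$
We conclude that $\mathbb{P}\{w_{x_N}(2\delta)\ge\epsilon/2\}\to0$, and the same result for $y_N$ can be obtained similarly.
This proves the lemma.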