Post-stratified Probability-Proportional-to-Size Sampling from Stratified Populations

This paper develops statistical inference based on a post-stratified probability-proportional-to-size (pp) sample from a finite population. A pp sample selects the sample units with selection probabilities proportional to their size and measures them for the characteristic of interest. For each measured unit, the pp sample further creates position information (rank) in a comparison set of size M. The sample is then post-stratified into ranking classes based on their position information in the comparison set. A pp sample is expanded to stratified populations by selecting a pp sample from each stratum population to form the stratified pp sample. Using this stratified pp sample, we construct unbiased and Rao–Blackwell estimators for the mean of the stratified populations. Different sample size allocation procedures for stratum sample sizes are investigated. The new sampling design is applied to apple production data to estimate the total apple production in Turkey.
Supplementary materials for this article are available at https://doi.org/10.1007/s13253-019-00370-6.
Omer Ozturk
Supplementary materials accompanying this paper appear online.
Key Words: Rao–Blackwell estimator; Stratified sampling; Post-stratified sample;
Neyman allocation; Probability-proportional-to-size sampling.
1. INTRODUCTION
In settings where the characteristic of interest $Y$ is approximately proportional to a known positive auxiliary variable $X$, probability-proportional-to-size (pps) sampling would be preferable to simple random sampling. In a pps sample, units are selected with probability proportional to their value of $X$. Hence, important (large) units in the population have a higher chance of being included in the sample. Since the $X$-variable is approximately proportional to the $Y$-variable and its values are available for all population units, it also provides information on the relative position (rank) of the $Y$-value of a unit in a comparison set. This position information can be used to induce more structure in the sample by post-stratifying it into different ranking groups.
O. Ozturk, Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210, USA (e-mail: omer@stat.osu.edu).
© 2019 International Biometric Society
Journal of Agricultural, Biological, and Environmental Statistics, Volume 24, Number 4, Pages 693–718
https://doi.org/10.1007/s13253-019-00370-6
Table 1. Population characteristics of the apple production data (in 1000 kg). $\theta_l$, $\tau_l$, $N_l$, and $\rho_l$ are the mean, standard deviation, population size, and the correlation coefficient between the $X$- and $Y$-variables for stratum $l$, respectively.

Stratum (l)                      θ_l         τ_l         N_l    ρ_l
Marmara (l=1)                    1536.8      6425        106    0.816
Aegean (l=2)                     2233.7      11,604.9    105    0.856
Mediterranean (l=3)              9384.31     29,907.5    94     0.901
Black Sea (l=4)                  967         2389.7      204    0.713
Central Anatolia (l=5)           5588        28,643.4    171    0.986
Eastern Anatolia (l=6)           631.4       1171        103    0.885
Southeastern Anatolia (l=7)      72.4        111.3       68     0.917
All regions combined             2940.456    17,135      851    0.916

One natural setting for post-stratified pps sampling is the estimation of apple production described in Kadilar and Cingi (2003). The population consists of apple-producing localities in Turkey. Apples are produced in seven geographical regions of Turkey: Marmara, Aegean, Mediterranean, Central Anatolia, Black Sea, Eastern Anatolia and Southeastern Anatolia. These regions have different weather patterns, and apple production varies from one region to another. The Turkish Statistical Institute collected data from these regions to estimate the total apple production in 2002. The data set contains two variables, apple production ($Y$, in 1000 kg) and the number of apple trees ($X$), for each locality in each region. The values of the $X$-variable are available for all population units in the database prior to sampling. The entire population consists of seven subpopulations and therefore has a stratified structure. The main characteristics of the population are presented in Table 1.
The entire population, with all regions combined, contains 851 units (townships). The correlation coefficient between $X$ and $Y$ is 0.916. The population means (standard deviations) of the $X$- and $Y$-variables are 37,732.667 (145,031.7) and 2940.456 (17,135), respectively. It is clear from Table 1 that the stratum populations have different means and variances. Hence, both the within- and between-strata variations are large. It is reasonable to assume that apple production $Y$ is approximately proportional to the number of apple trees $X$ in each locality. Hence, the use of a pps sample would be appropriate. In addition to the structure imposed by unequal probability sampling, the number of apple trees ($X$) can provide information about the relative position of the apple production ($Y$) of each sampled unit among a small comparison set of localities.
All 851 localities in the apple production data serve as our finite population in this paper. We select a pps sample with selection probabilities proportional to the number of apple trees and measure the apple production of the selected units. For each unit in the pps sample, we select another pps sample to construct a comparison set and determine the rank of the measured unit in this set using the $X$-variable, without measuring $Y$. The pps sample is then post-stratified based on the ranking information from the comparison sets. The theory of stratified sampling suggests that the post-stratified pps sample improves on the initial pps sample. This paper provides a foundation for such a claim.
Position information has been used successfully in ranked set sampling (rss) and judgment post-stratified sampling (jps) designs. Both of these designs determine the rank of each measured unit in a comparison set of size $M$. Each unit in these designs, in addition to the information it carries, provides additional information through its rank about the other unmeasured units in the comparison set. Since the construction of the comparison set and the ranking of its units incur no additional cost, this additional information is essentially free.
To construct an rss sample of size $n$, we first determine the cycle size $d$ and the set size $M$, with $n = dM$. We then select $nM$ localities from the apple production population and partition them into $n$ disjoint comparison sets, each having $M$ units. Units in each comparison set are ranked without measurement using the $X$-variable, and the value of $Y$ associated with the $h$th ranked $X$, denoted $Y_{[h]j}$, is measured in $d$ different comparison sets, $h = 1, \ldots, M$. The measured values $Y_{[h]j}$, $h = 1, \ldots, M$; $j = 1, \ldots, d$, are called a ranked set sample.
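The rss construction just described can be sketched in Python. This is only an illustration under stated assumptions: the function name `ranked_set_sample`, the use of simple random sampling without replacement to pick the $nM$ units, and the toy population in the usage note are hypothetical choices, not part of the paper.

```python
import random

def ranked_set_sample(x, y, M, cycles, rng):
    """Ranked set sample of size n = cycles * M (illustrative sketch).

    x, y   : population values of the auxiliary and study variables
    M      : set size
    cycles : cycle size d
    rng    : a random.Random instance
    """
    n = M * cycles
    # Assumption: the n*M units are drawn without replacement.
    chosen = rng.sample(range(len(x)), n * M)
    # Partition the selected units into n disjoint comparison sets of size M.
    sets = [chosen[k * M:(k + 1) * M] for k in range(n)]
    sample = []
    for k, comparison_set in enumerate(sets):
        h = k % M                                   # 0-based rank measured in this set
        ordered = sorted(comparison_set, key=lambda i: x[i])  # rank by X only
        unit = ordered[h]
        sample.append((h + 1, y[unit]))             # (rank h, measured Y value)
    return sample
```

For example, with a population of 100 units, `ranked_set_sample(x, y, M=3, cycles=2, rng=random.Random(1))` returns six measured values, two for each rank 1, 2 and 3.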
To construct a jps sample of size $n$, we start with a simple random sample of size $n$ and measure all of its units, $Y_i$; $i = 1, \ldots, n$. For each measured unit in this sample, we select an additional $M - 1$ units from the population to form a comparison set of size $M$. The rank $R_i$ of the measured unit in each of these comparison sets is determined. The pairs $(Y_i, R_i)$, $i = 1, \ldots, n$, constitute a jps sample.
The rss design has attracted the interest of many researchers in the finite population setting. Patil et al. (1995) used a ranked set sample to estimate the population mean of a population of size $N$ when the sample is constructed without replacement. Deshpande et al. (2006) described three different sampling designs and constructed nonparametric confidence intervals for population quantiles. Al-Saleh and Samawi (2007), Ozdemir and Gokpinar (2007, 2008), Gokpinar and Ozdemir (2010), Ozturk and Jafari Jozani (2013), Frey (2011) and Ozturk (2014a, 2016a) computed inclusion probabilities and constructed Horvitz–Thompson-type estimators for the population mean and total based on a ranked set sample. Ozturk and Bayramoglu Kavlak (2018, 2019) developed statistical inference based on a superpopulation model using ranked set sample data. These papers show that the rss design yields a substantial improvement in efficiency over the usual simple random sampling design.
The jps sampling design was originally developed for an infinite population setting in MacEachern et al. (2004). In recent years, considerable attention has been given to research on jps sampling. Wang et al. (2006) developed a class of estimators for the population mean using the concomitants of multivariate order statistics. Wang et al. (2008) imposed a stochastic ordering constraint among judgment classes to improve the efficiency of the estimator of the population mean. Frey and Ozturk (2011) replaced the stochastic ordering constraint with a weaker ordering condition in which the judgment class cumulative distribution functions (cdf) can be no more extreme than the cdf of the true order statistics. In a follow-up paper, Frey (2012) combined this weaker ordering condition with the stochastic ordering constraint to construct a better estimator for the population mean. Frey and Feeman (2012, 2013) constructed optimal estimators within a class of unbiased estimators for the population mean and variance. In the finite population setting, Ozturk (2016b) developed estimators for the population mean based on a jps sample and showed that the estimator needs a finite population correction factor similar to the one used in simple random sampling.
In this paper, we look at the jps sampling design from a different perspective. We construct the jps sample using a probability-proportional-to-size sampling design. We first construct a pps sample from a finite population. This pps sample is then post-stratified based on the relative positions of its units in comparison sets. Even though it may be possible to construct a different, presumably more efficient, type of estimator based on full covariate information, it is not considered in this paper. Section 2 introduces the post-stratified pps (pp) sample in a finite population setting and constructs unbiased estimators for the population mean and its variance. Section 3 constructs a Rao–Blackwell estimator by conditioning on the measured values of the pps sample. Section 4 extends pp sampling to a stratified population, constructs unbiased and Rao–Blackwell estimators for the population mean, and considers four different sample size allocation procedures to minimize the variance of the estimator under a cost model and different stratum population structures. Section 5 provides empirical evidence to investigate the properties of the proposed estimators and compares them with their competitors. Section 6 applies the proposed sampling design and estimators to the apple production data in Turkey. Section 7 provides some concluding remarks. All proofs are given in the supplementary material.
2. PROBABILITY-PROPORTIONAL-TO-SIZE POST-STRATIFIED
SAMPLING
We consider a finite population of size $N$, $P_N = \{u_1, \ldots, u_N\}$. Each population unit $u_i$ possesses two characteristics $Y$ and $X$, where $Y$ is the characteristic of interest and $X$ is an auxiliary size variable. In this population, the actual values of the $Y$- and $X$-variables are denoted by $y_1, \ldots, y_N$ and $x_1, \ldots, x_N$, respectively. We assume that the characteristic $X$ is approximately proportional to the characteristic $Y$. The population mean and variance of $Y$ are denoted by

$$\theta = \frac{1}{N}\sum_{i=1}^{N} y_i \quad\text{and}\quad \gamma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \theta)^2.$$

From this population, we select a probability-proportional-to-size sample of size $n$ with replacement. Note that we use design-based inference. Hence, the values of $Y$ and $X$ in the population are non-random constants, and the sampling variation is induced by the selection probabilities of the units. Let $W_i$ be the indicator function

$$W_i = \begin{cases} 1 & \text{if unit } i \text{ is selected} \\ 0 & \text{otherwise} \end{cases}$$

with $P(W_i = 1) = \pi_i$, where $\pi_i$ is proportional to the value of $X$ on $u_i$, $\pi_i \propto x_i$. We then write $Y_i = W_i y_i$. In this expression, even though $y_i$ is a constant, $Y_i$ is a random variable since $W_i$ has a Bernoulli distribution with success probability $\pi_i$. In the remainder of the paper, we reserve the capital letter $Y$ for a random variable and the lowercase letter ($y_i$) for the constant value of $Y$ on unit $u_i$. The pps sample then consists of the triplets $(u_i, Y_i, \pi_i)$; $i = 1, \ldots, n$.
We now induce more structure in this pps sample to improve its information content. For each selected unit $u_i$ in the pps sample, we select an additional $M - 1$ units using pps sampling with replacement to form a comparison set of size $M$,

$$S_i = \{u_i, u_1, \ldots, u_{M-1}\}, \quad i = 1, \ldots, n.$$

The units in the comparison set $S_i$ are ranked with respect to the size variable $X$, and the rank of $u_i$, $R_i$, is recorded. The pps sample is then augmented with this ranking information, $(u_i, Y_i, \pi_i, R_i)$; $i = 1, \ldots, n$. Each unit $u_i$ in the augmented pps sample carries two pieces of information. The first piece is the value ($Y_i = W_i y_i$) of the characteristic $Y$. The second piece is the relative position (rank $R_i$) of $u_i$ among all $M$ units in the comparison set $S_i$. The rank $R_i$ is obtained at no additional cost since the size variable $X$ is available in the sampling frame prior to sampling. Since the comparison sets are constructed with a with-replacement pps sampling design, the same unit may appear more than once in $S_i$. If this happens, ties are broken at random to rank the units in the comparison set $S_i$. Even under perfect ranking, ties can create ranking error in the comparison sets since they are broken at random.
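A minimal Python sketch of this construction is given here, assuming draw probabilities $\pi_i = x_i / \sum_k x_k$. The function name `draw_pp_sample` and the returned tuple layout are hypothetical and chosen only for illustration.

```python
import random

def draw_pp_sample(x, y, n, M, rng):
    """With-replacement pps sample of size n, augmented with ranks.

    Each selected unit is ranked, using x only, in a comparison set made
    of itself and M - 1 further pps draws. Ties are broken at random.
    Returns a list of tuples (unit index, y value, pi, rank).
    """
    N = len(x)
    total = sum(x)
    pi = [xi / total for xi in x]                 # pi_i proportional to x_i
    sample = []
    for _ in range(n):
        i = rng.choices(range(N), weights=x, k=1)[0]        # measured unit
        others = rng.choices(range(N), weights=x, k=M - 1)  # unmeasured units
        smaller = sum(1 for j in others if x[j] < x[i])
        ties = sum(1 for j in others if x[j] == x[i])
        # Among the tied units, the measured unit takes a uniformly random place.
        rank = smaller + 1 + rng.randint(0, ties)
        sample.append((i, y[i], pi[i], rank))
    return sample
```

Only the first drawn unit of each comparison set is measured; the remaining $M - 1$ units contribute their $X$-values alone.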
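In the augmented pps sample, if we ignore the ranks $R_i$; $i = 1, \ldots, n$, the sample $(Y_i, \pi_i)$ becomes a pps sample. Based on this pps sample, an unbiased estimator of the population mean $\theta$ and its variance are given by

$$\bar{Y}_{pps,n} = \frac{1}{Nn}\sum_{i=1}^{N} \frac{W_i y_i}{\pi_i}, \qquad \sigma^2_{pps,n} = \mathrm{Var}(\bar{Y}_{pps,n}) = \frac{1}{n}\sum_{k=1}^{N} \pi_k \left(\frac{y_k}{N\pi_k} - \theta\right)^2. \tag{1}$$
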
An unbiased estimator of $\sigma^2_{pps,n}$ is available in the literature (Thompson 2002, page 52), and an approximate $(1-\alpha)100\%$ confidence interval for the population mean is given by

$$\bar{Y}_{pps,n} \pm t_{n-1,\alpha/2}\,\hat{\sigma}_{pps,n}, \qquad \hat{\sigma}^2_{pps,n} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{\pi_i N} - \bar{Y}_{pps,n}\right)^2,$$

where $t_{df,a}$ is the $a$th upper quantile of the $t$-distribution with $df$ degrees of freedom.
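As a small illustration, the point estimate and its variance estimate can be computed from the sampled pairs $(y_i, \pi_i)$ as follows. The function name `pps_mean_estimate` is hypothetical, and the $t$-quantile needed for the interval is left to the reader since it is not part of the Python standard library.

```python
def pps_mean_estimate(y_s, pi_s, N):
    """pps estimate of the population mean and its estimated variance.

    y_s, pi_s : measured values and draw probabilities of the n sampled units
    N         : population size
    """
    n = len(y_s)
    scaled = [yi / (N * pi) for yi, pi in zip(y_s, pi_s)]   # y_i / (N pi_i)
    est = sum(scaled) / n
    var_hat = sum((s - est) ** 2 for s in scaled) / (n * (n - 1))
    return est, var_hat
```

When $Y$ is exactly proportional to $X$, every scaled value $y_i/(N\pi_i)$ equals $\theta$, so the estimate is exact and the variance estimate is zero. This is the situation in which pps sampling is most effective.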
In a pps sample, the probability mass function (pmf) and cumulative distribution function (cdf) of $Y_i$ are given by

$$f(y) = f_{Y_i}(y) = P(Y_i = y) = \sum_{i=1}^{N} P(W_i = 1)\, I(y_i = y) = \sum_{j=1}^{N} \pi_j I(y_j = y)$$

and

$$F(y) = F_{Y_i}(y) = \sum_{i:\, y_i \le y}\ \sum_{j=1}^{N} \pi_j I(y_j = y_i),$$

where $I(a)$ is an indicator function. From the above equation, we also observe that the $Y_i$s are independent identically distributed discrete random variables. For independent discrete random variables, the cdf and pmf of the $h$th-order statistic in a set of size $M$ are given by

$$F_{(h:M)}(y) = \sum_{k=h}^{M} \binom{M}{k} F^k(y)\{1 - F(y)\}^{M-k} \quad\text{and}\quad f_{(h:M)}(y) = F_{(h:M)}(y) - F_{(h:M)}(y-),$$

where $F_{(h:M)}(y-)$ is the left limit at $y$.
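The two order-statistic formulas translate directly into code. The sketch below assumes the pmf is supplied as a list of probabilities over the sorted distinct values of $Y$; the function names are illustrative only.

```python
from math import comb

def order_stat_cdf(F_y, h, M):
    """cdf of the h-th order statistic of M iid draws, given F(y) = F_y."""
    return sum(comb(M, k) * F_y ** k * (1.0 - F_y) ** (M - k)
               for k in range(h, M + 1))

def order_stat_pmf(probs, h, M):
    """pmf of the h-th order statistic over sorted distinct support points.

    probs[t] is the pmf f at the t-th smallest distinct value. Each entry
    of the result is F_(h:M)(y) minus its left limit F_(h:M)(y-).
    """
    pmf, F_left = [], 0.0
    for p in probs:
        F_here = F_left + p
        pmf.append(order_stat_cdf(F_here, h, M) - order_stat_cdf(F_left, h, M))
        F_left = F_here
    return pmf
```

For two equally likely values and $M = 2$, the minimum equals the smaller value with probability $1 - (1/2)^2 = 0.75$, which is what `order_stat_pmf([0.5, 0.5], 1, 2)` returns for the first support point.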
In the augmented pps sample, the ranks can be used to post-stratify the pps sample based on the relative positions (ranks) of its units in the comparison sets. The ranks $R_i$; $i = 1, \ldots, n$, are independent identically distributed (iid) discrete uniform random variables on the integers $1, \ldots, M$. For large values of $M$, the post-stratified sample may contain many empty ranking groups. Empty ranking groups usually increase the variance of the estimators. Without loss of generality, we drop the notation $u_i$ from the augmented pps sample. The new sample is called a post-stratified pps (pp) sample, and it contains the triplets $(Y_i, \pi_i, R_i)$, $i = 1, \ldots, n$.
To reduce the likelihood of empty ranking classes, we reduce the number of ranking groups from $M$ to $d$, $1 \le d \le M$, where $d$ is the number of post-stratified ranking groups and $H = M/d$ is the number of ranks in each ranking group. Let

$$D_r = \{(r-1)H + 1, \ldots, (r-1)H + H\}; \quad r = 1, \ldots, d; \quad \bigcup_{r=1}^{d} D_r = \{1, \ldots, M\},$$

where the sets $D_r$; $r = 1, \ldots, d$, form a partition of the integers $1, \ldots, M$. For example, if $M = 9$ and $d = 3$, then $D_1 = \{1, 2, 3\}$, $D_2 = \{4, 5, 6\}$ and $D_3 = \{7, 8, 9\}$ form a partition of the integers $1, \ldots, 9$. Using these partition sets, we stratify the sample into $d$ strata based on the membership of $R_i$ in the set $D_r$; $r = 1, \ldots, d$. Large values of $d$ create more structure in the sample, but may lead to many empty strata and more uncertainty in the estimators. For notational convenience, we relabel the pp sample as $(Z_{i,r}, \pi_i)$; $i = 1, \ldots, n$; $r = 1, \ldots, d$, where

$$Z_{i,r} = Y_i\, I(R_i \in D_r); \quad i = 1, \ldots, n; \quad r = 1, \ldots, d.$$
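The partition and the group membership of a rank can be written in a few lines. The helper names below are hypothetical, and the sketch assumes that $d$ divides $M$ so that $H = M/d$ is an integer.

```python
def rank_groups(M, d):
    """Return the partition D_1, ..., D_d of the ranks 1..M (d must divide M)."""
    if M % d != 0:
        raise ValueError("d must divide M so that H = M / d is an integer")
    H = M // d
    return [list(range((r - 1) * H + 1, (r - 1) * H + H + 1))
            for r in range(1, d + 1)]

def group_of(rank, M, d):
    """Index r of the ranking group D_r that contains the given rank."""
    H = M // d
    return (rank - 1) // H + 1
```

With $M = 9$ and $d = 3$, `rank_groups(9, 3)` reproduces the sets $D_1$, $D_2$, $D_3$ of the example, and `group_of(5, 9, 3)` places rank 5 in group 2.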
The $Z_{i,r}$s are independent but not identically distributed random variables. The conditional distribution of $Z_{1,r}$ given that the rank $R_1$ belongs to the set $D_r$ is given by

$$P(Z_{1,r} = z_1 \mid R_1 \in D_r) = \frac{1}{H}\sum_{h \in D_r} f_{[h:M]}(z_1),$$

where $f_{[h:M]}(z)$ is the pmf of the $h$th-order statistic $Y_{[h:M]}$ in a pps sample when the comparison set is ordered based on the $X$-variable. We note that the rank $R_i$ may not be equal to the rank of $Y_i$ in the comparison set $S_i$ since the units are ranked based on the $X$-variable. Hence, we use square brackets to denote the possibility of ranking error. If the units are ranked based on the $Y$-variable, the comparison sets may still contain repeated observations, since the units are selected with replacement. In this case, the ranking error may be relatively small if the population size $N$ is large with respect to the set size $M$. In this paper, unless stated otherwise, we consider a ranking procedure based on the characteristic $X$.
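Using the conditional distribution of $Z_{1,r}$ given that $Y_1$ has a rank in the set $D_r$, we define the conditional mean and variance of $Z_{1,r}/\pi_1$ as follows:

$$\bar{\mu}_r = E\left(\frac{Z_{1,r}}{\pi_1} \,\Big|\, R_1 \in D_r\right) = \frac{1}{H}\sum_{h \in D_r} E\left(\frac{Y_{[h:M],1}}{\pi_{[h:M],1}}\right) = \frac{1}{H}\sum_{h \in D_r} \mu_{[h:M]} \tag{2}$$

and

$$\mathrm{var}\left(\frac{Z_{1,r}}{\pi_1} \,\Big|\, R_1 \in D_r\right) = \frac{1}{H}\sum_{h \in D_r} \sigma^2_{[h:M]} + \frac{1}{H}\sum_{h \in D_r} \left(\mu_{[h:M]} - \bar{\mu}_r\right)^2 = \frac{1}{H}\left(\sigma^2_r + \tau^2_r\right), \tag{3}$$

where

$$\sigma^2_r = \sum_{h \in D_r} \sigma^2_{[h:M]}; \qquad \tau^2_r = \sum_{h \in D_r} \left(\mu_{[h:M]} - \bar{\mu}_r\right)^2,$$

$$\mu_{[h:M]} = E\left(\frac{Y_{[h:M],1}}{\pi_{[h:M],1}}\right) \quad\text{and}\quad \sigma^2_{[h:M]} = \mathrm{Var}\left(\frac{Y_{[h:M],1}}{\pi_{[h:M],1}}\right).$$
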
Let

$$J_r = \begin{cases} 1/n_r & \text{if } n_r > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

We now construct an estimator for the population mean $\theta$:

$$\bar{Y}_{pp,n} = \frac{1}{N d_n}\sum_{r=1}^{d} I_r J_r \sum_{i=1}^{n} \frac{Z_{i,r}}{\pi_i} = \sum_{r=1}^{d} a_r \bar{Z}_r, \qquad \bar{Z}_r = \frac{J_r}{N}\sum_{i=1}^{n} \frac{Z_{i,r}}{\pi_i}, \qquad a_r = \frac{I_r}{d_n},$$

where $n_r$ is the number of observations in ranking class $r$, $I_r = I(n_r > 0)$ and $d_n = \sum_{r=1}^{d} I_r$. We note that $\bar{Z}_r$ is a pps estimator based on the sample observations having membership in the set $D_r$. Hence, the estimator $\bar{Y}_{pp,n}$ is a weighted average of the pps estimators from the ranking groups. The weights $a_r$; $r = 1, \ldots, d$, are used as an adjustment to create an unbiased estimator for $\theta$.
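Computationally, the estimator is the average, over the nonempty ranking groups, of the group means of the scaled values $y_i/(N\pi_i)$. The following sketch assumes that $d$ divides $M$; the function name `pp_mean_estimate` is hypothetical.

```python
def pp_mean_estimate(y_s, pi_s, ranks, N, M, d):
    """Post-stratified pps (pp) estimate of the population mean.

    y_s, pi_s, ranks : measured values, draw probabilities and ranks R_i
    N                : population size
    M, d             : set size and number of ranking groups (d divides M)
    """
    H = M // d
    groups = {}
    for yi, pi, R in zip(y_s, pi_s, ranks):
        r = (R - 1) // H + 1                     # ranking group containing R
        groups.setdefault(r, []).append(yi / (N * pi))
    d_n = len(groups)                            # number of nonempty groups
    # Each nonempty group contributes its mean (factor J_r) with weight 1/d_n.
    return sum(sum(v) / len(v) for v in groups.values()) / d_n
```

As a check of the arithmetic, scaled values 1, 3 and 5 with ranks 1, 1 and 3 (and $M = d = 3$) form two nonempty groups with means 2 and 5, so the estimate is 3.5.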
Note that $n_r$, $I_r$ and $d_n$ are random variables. The vector of ranking class sample sizes $\mathbf{n} = (n_1, \ldots, n_d)$ is a $d$-dimensional multinomial random vector with parameters $n = n_1 + \cdots + n_d$ and success probability vector $(1/d, \ldots, 1/d)$. Using this multinomial random vector, we establish the following results, proofs of which are given in Ozturk (2014b) and Dastbaravarde et al. (2016).
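Theorem 1. Let $\mathbf{n} = (n_1, \ldots, n_d)$ be a multinomial random vector with success probability vector $(1/d, \ldots, 1/d)$. The following equalities hold:

i. $E\left(\dfrac{I_1}{d_n}\right) = \dfrac{1}{d}$

ii. $\mathrm{Var}\left(\dfrac{I_1}{d_n}\right) = \dfrac{1}{d^2}\displaystyle\sum_{k=1}^{d-1}\left(\dfrac{k}{d}\right)^{n-1}$

iii. $\mathrm{Cov}\left(\dfrac{I_1}{d_n}, \dfrac{I_2}{d_n}\right) = -\dfrac{1}{d-1}\mathrm{Var}\left(\dfrac{I_1}{d_n}\right)$

iv. $E\left(\dfrac{I_1 J_1}{d_n^2}\right) = \dfrac{1}{d^n}\left\{\dfrac{1}{n} + \displaystyle\sum_{k=2}^{d}\sum_{j=1}^{k-1}\sum_{m=1}^{n-k+1} \dfrac{(-1)^{j-1}}{k^2 m}\dbinom{d-1}{k-1}\dbinom{k-1}{j-1}\dbinom{n}{m}(k-j)^{n-m}\right\}.$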
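Because these quantities depend only on $n$ and $d$, they can be checked numerically for small cases. The sketch below evaluates items ii and iv in exact rational arithmetic and compares them with a brute-force enumeration of all $d^n$ equally likely group assignments. It reads the leading factor in item iv as $1/d^n$; that reading, like the function names, is an assumption of this example rather than something taken from the cited proofs.

```python
from fractions import Fraction
from itertools import product
from math import comb

def var_I1_over_dn(n, d):
    """Item ii: Var(I_1 / d_n)."""
    return Fraction(1, d ** 2) * sum(Fraction(k, d) ** (n - 1) for k in range(1, d))

def E_I1J1_over_dn2(n, d):
    """Item iv: E(I_1 J_1 / d_n^2), with leading factor read as 1 / d**n."""
    total = Fraction(1, n)
    for k in range(2, d + 1):
        for j in range(1, k):
            for m in range(1, n - k + 2):
                total += (Fraction((-1) ** (j - 1), k * k * m)
                          * comb(d - 1, k - 1) * comb(k - 1, j - 1)
                          * comb(n, m) * (k - j) ** (n - m))
    return total / d ** n

def brute_force(n, d):
    """Enumerate all assignments of n units to d equally likely groups."""
    e1 = e2 = e3 = Fraction(0)
    for assign in product(range(d), repeat=n):
        n1 = assign.count(0)          # size of ranking group 1
        dn = len(set(assign))         # number of nonempty groups
        if n1 > 0:
            e1 += Fraction(1, dn)
            e2 += Fraction(1, dn * dn)
            e3 += Fraction(1, n1 * dn * dn)
    p = Fraction(1, d ** n)
    mean = e1 * p
    return mean, e2 * p - mean ** 2, e3 * p
```

For $n = 2$ and $d = 2$, both routes give $E(I_1/d_n) = 1/2$, $\mathrm{Var}(I_1/d_n) = 1/8$ and $E(I_1 J_1/d_n^2) = 1/4$.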
Note that the expected values, variance and covariance in Theorem 1 do not depend on population characteristics. They depend only on the design parameter $d$ and the sample size $n$ and hence can be computed once and for all, prior to sampling. We next show that $\bar{Y}_{pp,n}$ is an unbiased estimator for $\theta$ and provide a closed-form expression for its variance.
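Theorem 2. Let $(Z_{i,r}, \pi_i)$; $i = 1, \ldots, n$; $r = 1, \ldots, d$, be a post-stratified probability-proportional-to-size sample from a finite population. The estimator $\bar{Y}_{pp,n}$ is unbiased for the population mean $\theta$, and its variance $\sigma^2_{pp,n} = \mathrm{Var}(\bar{Y}_{pp,n})$ is given by

$$\sigma^2_{pp,n} = \frac{d}{N^2(d-1)}\mathrm{Var}\left(\frac{I_1}{d_n}\right)\sum_{r=1}^{d}\left(\bar{\mu}_r - N\theta\right)^2 + \frac{E\left(I_1 J_1/d_n^2\right)}{N^2 H}\sum_{r=1}^{d}\sum_{h \in D_r}\left\{\left(\mu_{[h:M]} - \bar{\mu}_r\right)^2 + \sigma^2_{[h:M]}\right\}.$$
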
Two types of variation contribute to the variance of the estimator $\bar{Y}_{pp,n}$: variation due to differences among population units, and variation due to differences among the ranking class sample sizes $n_r$, $r = 1, \ldots, d$. The ranking class sample size variation is quantified by the expressions $\mathrm{Var}(I_1/d_n)$ and $E(I_1 J_1/d_n^2)$, where $J_1$ is defined in Eq. (4). For large sample size $n$, we can establish the following limits:

$$\lim_{n\to\infty} n\,\mathrm{Var}\left(\frac{I_1}{d_n}\right) = 0 \quad\text{and}\quad \lim_{n\to\infty} n\,E\left(I_1 J_1/d_n^2\right) = 1/d.$$

Using these two limits, the variance of $\sqrt{n}(\bar{Y}_{pp,n} - \theta)$ can be reduced to a simple form:

$$\mathrm{Var}\left\{\sqrt{n}\left(\bar{Y}_{pp,n} - \theta\right)\right\} \approx \frac{1}{N^2 H d}\sum_{r=1}^{d}\sum_{h \in D_r}\left\{\left(\mu_{[h:M]} - \bar{\mu}_r\right)^2 + \sigma^2_{[h:M]}\right\} = \frac{1}{N^2 d H}\sum_{r=1}^{d}\left(\sigma^2_r + \tau^2_r\right).$$

The large sample approximation shows that the variance of the estimator is partitioned into two pieces, the within and between ranking group variations. This is similar to the partition of the variation in a stratified sample, where the variance is decomposed into within- and between-strata variations.
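We now construct a conditionally unbiased estimator for the variance of $\bar{Y}_{pp,n}$ given that one of the groups has at least two measured units. Let

$$J^*_r = \begin{cases} 1/(n_r - 1) & \text{if } n_r > 1 \\ 0 & \text{otherwise,} \end{cases} \tag{5}$$

$$U_1 = \frac{d}{d^*_n}\sum_{r=1}^{d}\sum_{i=1}^{n}\sum_{j \ne i}^{n} I^*_r J_r J^*_r \left(\frac{Z_{i,r}}{\pi_i} - \frac{Z_{j,r}}{\pi_j}\right)^2, \tag{6}$$

$$U_2 = \frac{1}{d_n^2}\sum_{r=1}^{d}\sum_{t \ne r}^{d}\sum_{i=1}^{n}\sum_{j=1}^{n} I_r I_t J_r J_t \left(\frac{Z_{i,r}}{\pi_i} - \frac{Z_{j,t}}{\pi_j}\right)^2, \tag{7}$$

where $I^*_r = I(n_r > 1)$ and $d^*_n = \sum_{r=1}^{d} I^*_r$.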
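Theorem 3. Let $(Z_{i,r}, \pi_i)$; $i = 1, \ldots, n$; $r = 1, \ldots, d$, be a post-stratified probability-proportional-to-size sample from a finite population. Assume that there is at least one set $D_r$ that contains at least two observations. A conditionally unbiased estimator for $\sigma^2_{pp,n}$ is then given by

$$\hat{\sigma}^2_{pp,n} = \frac{U_1}{2}\left\{E\left(\frac{I_1^2 J_1}{d_n^2}\right) - \mathrm{Var}\left(\frac{I_1}{d_n}\right)\right\} + \frac{U_2}{2(d-1)}\,\frac{\mathrm{Var}(I_1/d_n)}{E\left(I_1 I_2/d_n^2\right)},$$

where $E(I_1 I_2/d_n^2) = -\mathrm{Var}(I_1/d_n)/(d-1) + 1/d^2$.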
Theorem 3 holds for any $n$ as long as there exists a set $D_r$ with at least two observations. We can then construct an approximate $(1-\alpha)100\%$ confidence interval for the population mean $\theta$ for moderate sample sizes,

$$\bar{Y}_{pp,n} \pm t_{n-d_n,\alpha/2}\sqrt{\hat{\sigma}^2_{pp,n}},$$

where the degrees of freedom $df = n - d_n$ are suggested to account for the heterogeneity among ranking classes.
3. RAO–BLACKWELL ESTIMATOR
The post-stratified probability-proportional-to-size sample estimator can be considered as a conditional estimator for given values of the sample units $u_i$, $i = 1, \ldots, n$. Let $R = \{R_1, \ldots, R_n\}$ be the conditional ranks of the $n$ units given $S = (u_1, \ldots, u_n)$. The estimator $\bar{Y}_{pp,n}$ is constructed based on just one realization of the ranks $R_i$, $i = 1, \ldots, n$, given the sample units,

$$\bar{Y}_{pp,n}(R) = \frac{1}{N d_n}\sum_{r=1}^{d} I_r J_r \sum_{i=1}^{n} \frac{Z_{i,r}}{\pi_i} \,\Big|\, u_1, \ldots, u_n,$$

where the notation $R$ highlights that this estimator depends on the realization of the conditional ranks for the given sample unit vector $S$. For a given sample unit vector $S$, one can obtain many realizations of the ranks by constructing different comparison sets from the population. Each of these realizations leads to a different estimator. We then use the Rao–Blackwell theorem to combine all these estimators,

$$\bar{Y}_{RB,n} = E_R\left\{\bar{Y}_{pp,n}(R)\right\},$$

where the expectation is taken over the conditional distribution of the ranks $R_i$; $i = 1, \ldots, n$, given the sample units $u_i$, $i = 1, \ldots, n$. The construction of the Rao–Blackwell estimator requires computing the conditional expectation of the post-stratified probability-proportional-to-size sample estimator over the conditional distribution of the ranks given the set of sample units $S$. Even though we are unable to find a closed-form expression for this expectation, we provide an algorithm to approximate it.
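Algorithm 1.

I. Select an integer $Q$. For $q = 1, \ldots, Q$, construct comparison sets $S^q_i = \{u_i, u^q_2, \ldots, u^q_M\}$; $i = 1, \ldots, n$, where $u^q_t$; $t = 2, \ldots, M$, are the unmeasured units selected from the population using pps sampling to form the comparison set $S^q_i$.

II. Using the comparison sets in step I, compute $R^q = (R^q_1, \ldots, R^q_n)$ and

$$J^q_r = \begin{cases} 1/n^q_r & \text{if } n^q_r > 0 \\ 0 & \text{otherwise;} \end{cases} \qquad I^q_{i,r} = I(R^q_i \in D_r); \qquad n^q_r = \sum_{i=1}^{n} I^q_{i,r};$$

$$I^q_r = I(n^q_r > 0); \qquad d^q_n = \sum_{r=1}^{d} I^q_r; \qquad \bar{Y}^q_{pp,n} = \sum_{r=1}^{d} \frac{I^q_r J^q_r}{d^q_n}\sum_{i=1}^{n} \frac{Y_i\, I(R^q_i \in D_r)}{N\pi_i}.$$

III. Approximate $\bar{Y}_{RB,n}$ by

$$\tilde{Y}_{RB,n} = \frac{1}{Q}\sum_{q=1}^{Q} \bar{Y}^q_{pp,n}.$$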
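The three steps of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the comparison sets are drawn with replacement with weights proportional to $x$, ties are broken at random, and $d$ is assumed to divide $M$.

```python
import random

def pp_estimate(scaled, ranks, H):
    """Average of the group means of scaled values over nonempty rank groups."""
    groups = {}
    for s, R in zip(scaled, ranks):
        groups.setdefault((R - 1) // H, []).append(s)
    return sum(sum(v) / len(v) for v in groups.values()) / len(groups)

def rank_in_new_set(i, x, M, rng):
    """Rank of unit i (by x) in a freshly drawn pps comparison set of size M."""
    others = rng.choices(range(len(x)), weights=x, k=M - 1)
    smaller = sum(1 for j in others if x[j] < x[i])
    ties = sum(1 for j in others if x[j] == x[i])
    return smaller + 1 + rng.randint(0, ties)     # ties broken at random

def rao_blackwell_estimate(units, x, y, M, d, Q, rng):
    """Monte Carlo approximation of the Rao-Blackwell estimator.

    units : indices of the n measured sample units (kept fixed)
    Q     : number of independent re-rankings (steps I and II)
    """
    N, total = len(x), sum(x)
    scaled = [y[i] * total / (N * x[i]) for i in units]   # y_i / (N pi_i)
    H = M // d
    reps = []
    for _ in range(Q):
        ranks = [rank_in_new_set(i, x, M, rng) for i in units]   # step I
        reps.append(pp_estimate(scaled, ranks, H))               # step II
    return sum(reps) / Q                                         # step III
```

Only the ranks are regenerated in each of the $Q$ passes; the measured values stay fixed, so no additional measurement of $Y$ is needed.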
The algorithm does not provide an estimate of the variance of the Rao–Blackwell estimator. We use a jackknife variance estimator to assess the sampling variation. To construct the jackknife variance estimate, for given $Q$ sets of ranks $R^q$, $q = 1, \ldots, Q$, we compute $n$ Rao–Blackwell estimators $\bar{Y}^{(i)}_{RB,n}$, $i = 1, \ldots, n$, where $\bar{Y}^{(i)}_{RB,n}$ is the Rao–Blackwell estimator after the $i$th unit is removed from the sample. We now create the jackknife replicates

$$s_i = n\bar{Y}_{RB,n} - (n-1)\bar{Y}^{(i)}_{RB,n}, \quad i = 1, \ldots, n.$$

The jackknife variance estimate is then given by

$$\hat{\sigma}^2_J = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(s_i - \bar{s}\right)^2,$$

where $\bar{s} = \sum_{i=1}^{n} s_i/n$. An approximate $(1-\alpha)100\%$ confidence interval for the population mean $\theta$ based on the Rao–Blackwell estimator is given by

$$\bar{Y}_{RB,n} \pm t_{n-1,\alpha/2}\sqrt{\hat{\sigma}^2_J}.$$
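The jackknife computation can be written generically for any estimator that maps a sample to a number. In the sketch below, `estimator` stands in for the Rao–Blackwell estimator evaluated with the ranks held fixed. The function name and the callable interface are assumptions of this illustration.

```python
def jackknife_variance(sample, estimator):
    """Jackknife variance estimate built from the replicates s_i.

    sample    : list of sample records (any type the estimator accepts)
    estimator : callable returning a point estimate from a list of records
    """
    n = len(sample)
    full = estimator(sample)
    # s_i = n * (full estimate) - (n - 1) * (estimate without unit i)
    replicates = [n * full - (n - 1) * estimator(sample[:i] + sample[i + 1:])
                  for i in range(n)]
    s_bar = sum(replicates) / n
    return sum((s - s_bar) ** 2 for s in replicates) / (n * (n - 1))
```

When the estimator is the plain sample mean, the replicates reduce to the observations themselves and the result is the usual $s^2/n$, which gives a quick sanity check.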
4. POST-STRATIFIED PROBABILITY-PROPORTIONAL-TO-SIZE
SAMPLES FROM STRATIFIED POPULATIONS
In this section, we extend the post-stratified probability-proportional-to-size sample to stratified populations. We assume that the main population is divided into $L$ disjoint subpopulations $P_{N_l} = \{u_{1,l}, \ldots, u_{N_l,l}\}$, where $N_l$ is the population size of the $l$th stratum population, $l = 1, \ldots, L$. The stratum population means, variances and totals are defined as

$$\theta_l = \frac{1}{N_l}\sum_{i=1}^{N_l} y_{i,l}; \qquad \gamma^2_{N_l} = \frac{1}{N_l}\sum_{i=1}^{N_l}\left(y_{i,l} - \theta_l\right)^2; \qquad t_{N_l} = N_l\theta_l; \qquad l = 1, \ldots, L,$$

$$\theta = \frac{1}{N}\sum_{l=1}^{L}\sum_{i=1}^{N_l} y_{i,l},$$

where $y_{i,l}$ is the value of $Y$ on unit $u_{i,l}$ in stratum population $P_{N_l}$ and $N = N_1 + \cdots + N_L$. In this population, we wish to draw inference on the parameter $\theta$. To construct a post-stratified pps sample from this stratified population, we select a post-stratified pps sample with sample size $n_l$ and set size $M_l$ from each stratum population. We combine these samples to form the post-stratified pps stratified sample (str), $(Y_{i,l}, \pi_{i,l}, R_{i,l})$; $i = 1, \ldots, n_l$; $l = 1, \ldots, L$, where $Y_{i,l}$ is the value of $Y$ on unit $u_{i,l}$, $\pi_{i,l}$ is the selection probability of the unit $u_{i,l}$ and $R_{i,l}$ is the rank of the unit $u_{i,l}$ in the comparison set of size $M_l$ from stratum population $l$. Using this stratified sample, we construct an estimator for the population mean $\theta$:

$$\bar{Y}_{str} = \sum_{l=1}^{L}\frac{N_l}{N}\bar{Y}_{pp,n_l} = \sum_{l=1}^{L}\frac{N_l}{N}\sum_{r=1}^{d_l}\frac{I_{r,l}J_{r,l}}{N_l d_{n_l}}\sum_{i=1}^{n_l}\frac{Z_{i,r,l}}{\pi_{i,l}}; \qquad n = n_1 + \cdots + n_L,$$

$$J_{r,l} = \begin{cases} 1/n_{r,l} & \text{if } n_{r,l} > 0 \\ 0 & \text{otherwise,} \end{cases}$$

where $n_{r,l}$ is the number of observations in ranking group $r$ of stratum $l$, $Z_{i,r,l} = Y_{i,l}\, I(R_{i,l} \in D_{r,l})$, $D_{r,l} = \{(r-1)H_l + 1, \ldots, (r-1)H_l + H_l\}$, $I_{r,l} = I(n_{r,l} > 0)$, $d_{n_l} = \sum_{r=1}^{d_l} I(n_{r,l} > 0)$, $H_l = M_l/d_l$, and $d_l$ is the number of ranking groups for stratum $l$; $l = 1, \ldots, L$. We use the notation $(Z_{i,r,l}, \pi_{i,l})$, $i = 1, \ldots, n_l$; $r = 1, \ldots, d_l$; $l = 1, \ldots, L$, to denote the post-stratified pps sample from a stratified population.
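Theorem 4. Let $(Z_{i,r,l}, \pi_{i,l})$; $i = 1, \ldots, n_l$; $r = 1, \ldots, d_l$; $l = 1, \ldots, L$, be a post-stratified pps sample from a stratified population. The estimator $\bar{Y}_{str}$ is unbiased for the population mean $\theta$, and its variance $\sigma^2_{str} = \mathrm{Var}(\bar{Y}_{str})$ is given by

$$\sigma^2_{str} = \sum_{l=1}^{L}\frac{N_l^2}{N^2}\sigma^2_{pp,n_l},$$

$$\sigma^2_{pp,n_l} = \frac{d_l}{N_l^2(d_l - 1)}\mathrm{Var}\left(\frac{I_{1,l}}{d_{n_l}}\right)\sum_{r=1}^{d_l}\left(\bar{\mu}_{r,l} - N_l\theta_l\right)^2 + \frac{E\left(I_{1,l}J_{1,l}/d_{n_l}^2\right)}{N_l^2 H_l}\sum_{r=1}^{d_l}\sum_{h \in D_{r,l}}\left\{\left(\mu_{[h:M_l]} - \bar{\mu}_{r,l}\right)^2 + \sigma^2_{[h:M_l]}\right\},$$

where $\bar{\mu}_{r,l} = \sum_{h \in D_{r,l}} \mu_{[h:M_l]}/H_l$.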
An unbiased estimator for the population total can be constructed from $T_{str} = N\bar{Y}_{str}$. The variance of $T_{str}$ follows from Theorem 4, $\mathrm{Var}(T_{str}) = N^2\sigma^2_{str}$.
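The stratum weighting is easy to illustrate with the values in Table 1. In the sketch below, the true stratum means $\theta_l$ are plugged in where the stratum estimates $\bar{Y}_{pp,n_l}$ would normally go, purely to show the arithmetic. The function names are hypothetical.

```python
def stratified_mean(stratum_sizes, stratum_estimates):
    """Weighted combination sum_l (N_l / N) * estimate_l."""
    N = sum(stratum_sizes)
    return sum(Nl * est for Nl, est in zip(stratum_sizes, stratum_estimates)) / N

def stratified_total(stratum_sizes, stratum_estimates):
    """Estimate of the population total, N times the stratified mean."""
    return sum(stratum_sizes) * stratified_mean(stratum_sizes, stratum_estimates)

# Stratum sizes N_l and means theta_l from Table 1 (Marmara, ..., Southeastern Anatolia).
N_l = [106, 105, 94, 204, 171, 103, 68]
theta_l = [1536.8, 2233.7, 9384.31, 967.0, 5588.0, 631.4, 72.4]
overall = stratified_mean(N_l, theta_l)   # approximately 2940.47
```

The weighted mean is about 2940.47, which agrees with the overall mean of 2940.456 reported in Table 1 up to the rounding of the stratum means.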
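Corollary 1. Let $n_0 = \min(n_1, \ldots, n_L)$ and $\lambda_l = \lim_{n_l \to \infty} n_l/n > 0$ as $n_0$ goes to infinity. The asymptotic variance of $\sqrt{n}(\bar{Y}_{str} - \theta) = \sum_{l=1}^{L}\sqrt{\dfrac{n}{n_l}}\,\dfrac{N_l}{N}\left[\sqrt{n_l}\left(\bar{Y}_{pp,n_l} - \theta_l\right)\right]$ is given by

$$\sigma^2_\lambda = \sum_{l=1}^{L}\frac{1}{N^2 d_l H_l \lambda_l}\sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right),$$

where $\sigma^2_{r,l} = \sum_{h \in D_{r,l}}\sigma^2_{[h:M_l]}$, $\tau^2_{r,l} = \sum_{h \in D_{r,l}}\left(\mu_{[h:M_l]} - \bar{\mu}_{r,l}\right)^2$ and $\bar{\mu}_{r,l} = \sum_{h \in D_{r,l}}\mu_{[h:M_l]}/H_l$.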
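A conditionally unbiased estimator for $\sigma^2_{str}$ can be established from Theorem 3, given that there is at least one set $D_{r,l}$ having at least two observations in each stratum sample:

$$\hat{\sigma}^2_{str} = \sum_{l=1}^{L}\frac{N_l^2}{N^2}\hat{\sigma}^2_{pp,n_l},$$

$$\hat{\sigma}^2_{pp,n_l} = \frac{U_{1,l}}{2}\left\{E\left(\frac{I_{1,l}^2 J_{1,l}}{d_{n_l}^2}\right) - \mathrm{Var}\left(\frac{I_{1,l}}{d_{n_l}}\right)\right\} + \frac{U_{2,l}}{2(d_l - 1)}\,\frac{\mathrm{Var}(I_{1,l}/d_{n_l})}{E\left(I_{1,l}I_{2,l}/d_{n_l}^2\right)},$$

where $U_{1,l}$ and $U_{2,l}$ are the expressions $U_1$ and $U_2$ in Eqs. (6) and (7) for stratum $l$, respectively. An approximate $(1-\alpha) \times 100\%$ confidence interval for the population mean $\theta$ can be constructed from

$$\bar{Y}_{str} \pm t_{df,\alpha/2}\,\hat{\sigma}_{str},$$

where $df = \sum_{l=1}^{L} n_l - \sum_{l=1}^{L} d_{n_l}$.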
A post-stratified pps sample from a stratified population consists of $L$ different post-stratified probability-proportional-to-size samples, one from each stratum population. The stratum populations usually have different means and variances. For a fixed sample size $n$, $n = n_1 + \cdots + n_L$, the information content of the stratified sample depends on the stratum sample sizes $n_l$; $l = 1, \ldots, L$. For a finite sample size $n$, it is a challenge to investigate the relationship between the stratum sample sizes and the information content of the sample. To ease the computation, we look at four different sample size allocations for large sample sizes.
The equal allocation procedure selects an equal number of observations from each stratum population, $n_l = n/L$, $l = 1, \ldots, L$. Under this allocation scheme, the asymptotic variance of $\sqrt{n}(\bar{Y}_{str} - \theta)$ reduces to

$$\sigma^2_\lambda(E) = \sum_{l=1}^{L}\frac{L}{N^2 d_l H_l}\sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right),$$

where $E$ in $\sigma^2_\lambda(E)$ denotes the equal allocation.
In certain cases, it may be reasonable to select sample sizes proportional to the stratum population sizes, $n_l = n(N_l/N)$. Under proportional (P) allocation, the asymptotic variance of the estimator reduces to

$$\sigma^2_\lambda(P) = \sum_{l=1}^{L}\frac{1}{N d_l H_l N_l}\sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right).$$

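Optimal (Neyman) allocation minimizes the variance of the estimator with respect to the sample sizes $n_l$ subject to the constraint that the sum of the stratum sample sizes equals $n$. Using a Lagrange multiplier, we can show that the Neyman (N) allocation sample sizes are given by

$$n_m = \frac{n\sqrt{\sum_{r=1}^{d_m}\left(\sigma^2_{r,m} + \tau^2_{r,m}\right)/(d_m H_m)}}{\sum_{l=1}^{L}\sqrt{\sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right)/(d_l H_l)}}; \quad m = 1, \ldots, L.$$

Under Neyman allocation, the asymptotic variance simplifies to

$$\sigma^2_\lambda(N) = \frac{\left\{\sum_{l=1}^{L}\sqrt{\sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right)/(d_l H_l)}\right\}^2}{N^2}.$$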
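The three allocations and the resulting asymptotic variances can be compared numerically. The sketch below takes $A_l^2 = \sum_r (\sigma^2_{r,l} + \tau^2_{r,l})$ and $M_l = d_l H_l$ as inputs. The function names are hypothetical, the input values in the usage note are invented for illustration, and the returned sample sizes are not rounded to integers.

```python
from math import sqrt

def allocations(n, N_l, A2, M_l):
    """Equal, proportional and Neyman stratum sample sizes (unrounded)."""
    L, N = len(N_l), sum(N_l)
    equal = [n / L] * L
    proportional = [n * Nl / N for Nl in N_l]
    w = [sqrt(a / m) for a, m in zip(A2, M_l)]     # sqrt(A_l^2 / (d_l H_l))
    neyman = [n * wi / sum(w) for wi in w]
    return equal, proportional, neyman

def asymptotic_variance(n, n_l, N_l, A2, M_l):
    """sigma^2_lambda with lambda_l = n_l / n."""
    N = sum(N_l)
    return sum(a / (N ** 2 * m * (nl / n)) for a, m, nl in zip(A2, M_l, n_l))
```

For instance, with the made-up inputs `N_l = [100, 300]`, `A2 = [4.0, 36.0]`, `M_l = [3, 3]` and `n = 40`, the Neyman allocation gives a smaller asymptotic variance than either the equal or the proportional allocation, and its variance matches the closed-form expression for $\sigma^2_\lambda(N)$.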
Sampling cost is also a limiting factor in sample size determination when there is a budget constraint. In this case, it is desirable to minimize the variance with respect to the stratum sample sizes for a given cost function and budget. A simple cost function for this setting can be constructed as

$$C_T = C_0 + \sum_{l=1}^{L}(c_l + r_l)\,n_l, \tag{8}$$

where $C_T$ is the total cost, $C_0$ is the overhead cost, $c_l$ is the cost of measuring a single unit from stratum population $l$ and $r_l$ is the cost of obtaining the rank of a measured unit in a comparison set in stratum $l$. For settings where post-stratified probability-proportional-to-size sampling is appropriate, it is reasonable to assume that $r_l$ is relatively small since the values of the $X$-variable are available for all population units. Under the cost model in Eq. (8), the asymptotic variance of the estimator is minimized for

$$n_m = \frac{n\sqrt{\sum_{r=1}^{d_m}\left(\sigma^2_{r,m} + \tau^2_{r,m}\right)/\left(M_m(c_m + r_m)\right)}}{\sum_{l=1}^{L}\sqrt{\sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right)/\left(M_l(c_l + r_l)\right)}}; \quad m = 1, \ldots, L.$$

For the cost function $C_T$, the variance of the optimal estimator simplifies to

$$\sigma^2_\lambda(C) = \frac{\left\{\sum_{l=1}^{L}\sqrt{\sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right)/\left(M_l(c_l + r_l)\right)}\right\}^2}{N^2}.$$
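A short sketch of the cost-based allocation and of the cost function in Eq. (8) follows. The function names and the numbers in the usage note are hypothetical; `A2` again denotes $A_l^2 = \sum_r (\sigma^2_{r,l} + \tau^2_{r,l})$.

```python
from math import sqrt

def cost_allocation(n, A2, M_l, c, r):
    """Stratum sample sizes proportional to sqrt(A_l^2 / (M_l (c_l + r_l)))."""
    w = [sqrt(a / (m * (cl + rl))) for a, m, cl, rl in zip(A2, M_l, c, r)]
    return [n * wi / sum(w) for wi in w]

def total_cost(C0, n_l, c, r):
    """Cost function of Eq. (8): overhead plus per-unit measuring and ranking costs."""
    return C0 + sum((cl + rl) * nl for cl, rl, nl in zip(c, r, n_l))
```

With two strata that differ only in measurement cost, say `c = [1.0, 4.0]`, `r = [0.0, 0.0]`, `A2 = [9.0, 9.0]`, `M_l = [3, 3]` and `n = 60`, the cheaper stratum receives twice as many units (40 versus 20), and the total cost with overhead 10 is $10 + 40 \cdot 1 + 20 \cdot 4 = 130$.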
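The equal and proportional allocations are relatively easy to implement. The difference between the variances of the estimators under the equal and proportional allocations can be written as follows:

$$\sigma^2_\lambda(E) - \sigma^2_\lambda(P) = \sum_{l=1}^{L}\frac{L A_l^2}{N M_l}\,\frac{N_l - \bar{N}}{N N_l}; \qquad A_l^2 = \sum_{r=1}^{d_l}\left(\sigma^2_{r,l} + \tau^2_{r,l}\right); \qquad \bar{N} = \sum_{l=1}^{L} N_l/L.$$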
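The difference can be verified term by term from the two variance expressions given earlier for the equal and proportional allocations, writing $M_l = d_l H_l$ and noting that $\bar{N} = N/L$:

$$\frac{L A_l^2}{N^2 M_l} - \frac{A_l^2}{N M_l N_l} = \frac{A_l^2}{N M_l}\left(\frac{L}{N} - \frac{1}{N_l}\right) = \frac{A_l^2}{N M_l}\,\frac{L N_l - N}{N N_l} = \frac{L A_l^2}{N M_l}\,\frac{N_l - \bar{N}}{N N_l}.$$

Summing over $l = 1, \ldots, L$ gives the stated expression. Each term is positive for strata larger than the average stratum size and negative for smaller strata, so the sign of the total depends on how $A_l^2$ varies with $N_l$.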
It is reasonable to assume that $A_l^2$ is an increasing function of the population variance of stratum $l$, $\tau_l^2$. We then expect the difference between the variances under equal and proportional allocation to be positive when large stratum populations (large $N_l$) have large variances (large $\tau_l^2$ or large $A_l^2$). In this case, the proportional allocation samples more data from a stratum population with large size and variance, which reduces the contribution of that stratum sample to the variance of the estimator. We note that implementing the equal and proportional allocations does not require point estimates of the population variances. Choosing between them only requires knowing whether the larger populations have larger variances. This may be less restrictive than knowing point estimates of the population variances.
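The sign argument above can be checked numerically. The sketch below assumes that the variance under allocation fractions $\lambda_l$ has the form $\sum_l A_l^2/(N^2 M_l \lambda_l)$, with $\lambda_l = 1/L$ for equal and $\lambda_l = N_l/N$ for proportional allocation. This form is an assumption, chosen because it reproduces the difference formula above. All inputs are hypothetical.

```python
def var_equal(A2, M, N_l):
    """Variance under equal allocation, lambda_l = 1/L (assumed form)."""
    N, L = sum(N_l), len(N_l)
    return sum(L * a2 / (N ** 2 * m) for a2, m in zip(A2, M))


def var_proportional(A2, M, N_l):
    """Variance under proportional allocation, lambda_l = N_l/N (assumed form)."""
    N = sum(N_l)
    return sum(a2 / (N * nl * m) for a2, m, nl in zip(A2, M, N_l))


def equal_minus_proportional(A2, M, N_l):
    """Closed-form difference: sum_l L*A_l^2/(N*M_l) * (N_l - Nbar)/(N*N_l)."""
    N, L = sum(N_l), len(N_l)
    n_bar = N / L
    return sum(L * a2 / (N * m) * (nl - n_bar) / (N * nl)
               for a2, m, nl in zip(A2, M, N_l))


# Hypothetical strata: sizes and variance sums (illustration only).
N_l = [100, 300, 600]
M = [5, 5, 5]

# Larger strata have larger variances: proportional allocation wins.
diff_increasing = equal_minus_proportional([1.0, 4.0, 9.0], M, N_l)
# Smaller strata have larger variances: equal allocation wins.
diff_decreasing = equal_minus_proportional([9.0, 4.0, 1.0], M, N_l)
print(diff_increasing, diff_decreasing)
```

The first difference is positive and the second is negative, matching the qualitative rule that only the ordering of variances relative to stratum sizes needs to be known.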
The Neyman allocation is optimal. Hence, it yields a smaller variance than both equal and proportional allocations. On the other hand, the computation of the stratum sample sizes requires that $A_l^2$ be known prior to construction of the sample. For settings where $M_l = d_l H_l = M$ for all stratum samples and the stratum population variances are known from previous studies (or may be estimated from pilot studies), the Neyman allocation can be approximated from
$$n_m = n\,\frac{A_m}{\sum_{l=1}^{L}A_l}\approx n\,\frac{\hat\tau_m}{\sum_{l=1}^{L}\hat\tau_l};\qquad m=1,\ldots,L,$$
where $\hat\tau_l^2$ is the estimate of the variance of stratum population $l$.
5. EFFICIENCY COMPARISON OF THE ESTIMATORS
In this section, we provide empirical evidence about the efficiency of the proposed estimators using several populations for which probability-proportional-to-size sampling would be a natural choice. In these populations, the values of the Y-variable are proportional to the values of the X-variable. A small percentage of the population units have extreme values in both the Y- and X-variables, with different proportionality constants. The units that produce extreme values usually behave differently from the other units in the population. They have larger variance, and the slope of the regression of Y on X for these units is larger than the slope for the remaining population units. For example, if we sample farms to estimate crop production (Y), the farm population can be divided into two parts: small/normal-size farms, measured in acres (X), and mega farms that have extremely large X-values. The percentage of mega farms would be small, but they may have larger variance in the Y- and X-variables, and the regression of Y on X may have a larger slope. For our empirical investigation, we generate this type of population structure using the model below.
I. For a fixed population size $N$, generate the size variable $X$ from an exponential distribution with mean 100 and order these $N$ random numbers from smallest to largest, $x_{(1)} < \cdots < x_{(N)}$, where $x_{(i)}$ is the $i$th smallest of the $x$-values.
II. Let $N^{*}$ be the largest integer such that $N^{*} \le N\omega$. Generate the Y-values from
$$y_{[i]} = \begin{cases} 15\,x_{(i)} + \tau\sqrt{x_{(i)}}\,\epsilon_i, & i = 1,\ldots,N^{*},\\ 45\,x_{(i)} + 2\tau\sqrt{x_{(i)}}\,\epsilon_i, & i = N^{*}+1,\ldots,N, \end{cases} \qquad (9)$$
where $\epsilon_i$ is generated from a normal distribution with mean zero and variance 1 and $y_{[i]}$ is the value of the Y-variable that corresponds to $x_{(i)}$.
In model (9), the quantity $1-\omega$ is the proportion of population units that produce extreme measurements on the Y- and X-variables. It corresponds to the proportion of mega farms in the crop production example. In the simulation study, we used R (R Core Team 2018) to generate the population values. We set the random number generator seed to 1 (via set.seed in R) so that the same population values can be re-created. Using model (9), populations of size $N = 2000$ are generated for $\omega = 0.7, 0.8, 0.90, 0.95$ and $\tau = 50, 200, 400, 500, 800$ to establish different correlation structures between the Y- and X-variables. The sample size is $n = 30$ in the first phase of the simulation study. The number of replications in the Rao–Blackwell estimator is taken to be $Q = 50$. For each population, we used the following pairs of set size and number of groups $(M, d)$:
(M,d) = (4,4), (5,5), (10,2), (10,5), (30,3), (60,3).
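The population model can be sketched in Python as follows. This is only an illustration: the paper used R, so the values produced here will not match the published populations even with the same seed. The sketch assumes that the error term in model (9) is scaled by $\tau\sqrt{x_{(i)}}$ for the regular units and by $2\tau\sqrt{x_{(i)}}$ for the extreme units; that reading, the cut-off name `n_star` and the helper names are assumptions.

```python
import math
import random


def generate_population(N=2000, omega=0.7, tau=50.0, seed=1):
    """Generate (x, y) with x sorted and y following the two-regime model.

    Assumed reading of model (9): noise scale tau*sqrt(x) for the first
    n_star units and 2*tau*sqrt(x) for the remaining (extreme) units.
    """
    rng = random.Random(seed)
    # expovariate takes the rate, so rate 1/100 gives mean 100.
    x = sorted(rng.expovariate(1.0 / 100.0) for _ in range(N))
    n_star = math.floor(N * omega + 1e-9)  # guard against float round-off
    y = []
    for i, xi in enumerate(x):
        eps = rng.gauss(0.0, 1.0)
        if i < n_star:
            y.append(15.0 * xi + tau * math.sqrt(xi) * eps)
        else:
            y.append(45.0 * xi + 2.0 * tau * math.sqrt(xi) * eps)
    return x, y


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x, y = generate_population(tau=50.0)
rho_small_tau = correlation(x, y)
x2, y2 = generate_population(tau=800.0)
rho_large_tau = correlation(x2, y2)
print(rho_small_tau, rho_large_tau)
```

Larger values of $\tau$ add more noise relative to the signal, so the correlation between the two variables drops as $\tau$ grows, which is how the different correlation structures are obtained.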
For the populations generated by the model in Eq. (9), a ratio estimator in double sampling could be an alternative estimator for the population mean or total, using the size variable $X$ as an auxiliary variable. Double sampling first selects $nM$ units from the population and measures only the X-variable. In the second stage, it selects a subsample of size $n$ from the $nM$ selected units and measures both the Y- and X-variables. The ratio estimator is then given by
$$\bar Y_R = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} X_i}\;\frac{1}{nM}\sum_{j=1}^{nM} X_j.$$
This estimator is biased, and its approximate mean square error (MSE) is given in Section 14.1 of Thompson (2002). We compare the Rao–Blackwell estimator ($\bar Y_{RB,n}$) with the probability-proportional-to-size ($\bar Y_{pps,n}$), post-stratified probability-proportional-to-size ($\bar Y_{pp,n}$) and ratio ($\bar Y_R$) estimators. The efficiencies of these estimators are defined as follows:
$$RE_1=\frac{\mathrm{Var}(\bar Y_{pps,n})}{\mathrm{Var}(\bar Y_{RB,n})},\qquad RE_2=\frac{\mathrm{Var}(\bar Y_{pp,n})}{\mathrm{Var}(\bar Y_{RB,n})},\qquad RE_3=\frac{\mathrm{MSE}(\bar Y_R)}{\mathrm{Var}(\bar Y_{RB,n})}.$$
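The double-sampling ratio estimator can be sketched as below. The sketch assumes that both phases are simple random samples without replacement, which the text does not state explicitly. The function names and the toy population are hypothetical.

```python
import random


def double_sampling_ratio_estimate(x_pop, y_pop, n, M, rng):
    """Ratio estimate of the population mean of Y from a two-phase sample.

    Phase 1: draw n*M units and record X only.
    Phase 2: draw n of those units and record both Y and X.
    Both draws are simple random samples without replacement (an assumption).
    """
    first = rng.sample(range(len(x_pop)), n * M)
    second = rng.sample(first, n)
    x_bar_first = sum(x_pop[j] for j in first) / (n * M)
    ratio = sum(y_pop[i] for i in second) / sum(x_pop[i] for i in second)
    return ratio * x_bar_first


def mean_square_error(estimates, truth):
    """Average squared deviation of simulated estimates from the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


# Toy population in which Y is exactly proportional to X (illustration only).
x_pop = [float(i) for i in range(1, 201)]
y_pop = [3.0 * v for v in x_pop]
truth = sum(y_pop) / len(y_pop)

rng = random.Random(0)
estimates = [double_sampling_ratio_estimate(x_pop, y_pop, 5, 4, rng)
             for _ in range(200)]
print(mean_square_error(estimates, truth))
```

An efficiency such as $RE_3$ would then be the ratio of this simulated MSE to the simulated variance of a competing estimator computed over the same replications.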
Tables 2 and 3 present the efficiency values $RE_1$, $RE_2$ and $RE_3$ for the finite populations generated by the model in Eq. (9). In all these simulation settings, the Rao–Blackwell estimator ($\bar Y_{RB,n}$) is better than the ratio estimator ($\bar Y_R$), that is, $RE_3 > 1$. The efficiency is higher for larger values of $\rho$ and decreases as $\rho$ decreases. The smallest efficiency gain is $RE_3 = 1.102$ in Table 3 when $M = 60$, $d = 3$, $\tau = 800$, $\omega = 0.95$ and $\rho = 0.377$.
The efficiency of the Rao–Blackwell estimator with respect to the probability-proportional-to-size (pps) estimator depends on the correlation coefficient $\rho$ and the proportion of population units with extreme X- and Y-values ($1-\omega$). For large values of $\rho$ ($\rho \ge 0.5$), the Rao–Blackwell estimator is superior to the pps estimator ($RE_1 \ge 1$). If the correlation coefficient is less than 0.5, the Rao–Blackwell estimator is slightly less efficient than the pps estimator. If the population has a smaller number of units with extreme X- and Y-measurements, the Rao–Blackwell estimator tends to have higher efficiency than the pps estimator for the same correlation coefficient $\rho$. For example, in Table 2, when $\omega = 0.7$ and $\rho = 0.966$ and $0.856$, the $RE_1$ values are usually higher than the $RE_1$ values when $\omega = 0.8$ and $\rho = 0.949$ and $0.854$. These numerical values indicate that the Rao–Blackwell and post-stratified probability-proportional-to-size (pp) sample estimators can handle extreme observations better than the competing ratio estimator. As expected, the Rao–Blackwell estimator is always superior to the pp sample estimator ($RE_2 > 1$).
Tables 4 and 5 present efficiency results for a larger sample size ($n = 120$), different set sizes ($M = 5, 10, 60, 2000$) and numbers of groups ($d$) when $\omega = 0.7, 0.9$. The efficiency values increase slightly with sample size for the same values of $M$, $d$, $\rho$ and $\omega$. For example, in Table 2, when $\omega = 0.7$ and $\rho = 0.966$, the $RE_1$ values are 2.193 and 4.188 for $(M, d) = (5, 5)$ and $(60, 3)$, respectively. The $RE_1$ values in Table 4 for the same simulation settings are 2.353 and 4.449. The efficiencies in Tables 4 and 5 also depend on the set size $M$ and the number of groups. Larger set sizes give better efficiency for large $\rho$. For example, the value of $RE_1$ is largest (5.964) when $\rho = 0.966$, $M = 2000$ and $d = 20$ in Table 4. On the other hand, the value of $RE_1$ is largest (5.054) when $\rho = 0.907$, $M = 2000$ and $d = 10$ in Table 5. This suggests that the selection of $M$ and $d$ depends on the within-set ranking quality through the correlation coefficient between the X- and Y-variables. This relationship is further investigated in Figs. 1 and 2.
Figures 1 and 2 present plots of the $RE_3$ values against $d$ for different set sizes $M$ in nine panels for $\omega = 0.7$ and $\omega = 0.90$, respectively. The displays in each row fix
Table 2. Efficiency comparison of the proposed estimators with probability-proportional-to-size and ratio estimators.
ω τ ρ M d RE1 RE2 RE3 τ ρ M d RE1 RE2 RE3
0.70 50 0.966 4 4 1.993 1.447 6.703 400 0.657 10 5 0.999 1.154 1.281
50 0.966 5 5 2.193 1.564 6.779 400 0.657 30 3 1.036 1.049 1.366
50 0.966 10 2 2.081 1.277 4.433 400 0.657 60 3 1.037 1.039 1.395
50 0.966 10 5 2.610 1.579 5.832 500 0.574 4 4 1.005 1.110 1.382
50 0.966 30 3 3.563 1.466 5.920 500 0.574 5 5 1.008 1.156 1.404
50 0.966 60 3 4.188 1.469 6.523 500 0.574 10 2 1.021 1.024 1.366
200 0.856 4 4 1.163 1.163 2.214 500 0.574 10 5 0.977 1.149 1.221
200 0.856 5 5 1.179 1.209 2.181 500 0.574 30 3 1.008 1.044 1.314
200 0.856 10 2 1.187 1.063 1.834 500 0.574 60 3 1.007 1.035 1.344
200 0.856 10 5 1.168 1.198 1.747 800 0.401 4 4 0.984 1.103 1.274
200 0.856 30 3 1.256 1.085 1.761 800 0.401 5 5 0.985 1.150 1.304
200 0.856 60 3 1.275 1.071 1.784 800 0.401 10 2 0.999 1.020 1.306
400 0.657 4 4 1.024 1.116 1.480 800 0.401 10 5 0.952 1.142 1.155
400 0.657 5 5 1.028 1.161 1.495 800 0.401 30 3 0.976 1.039 1.257
400 0.657 10 2 1.041 1.029 1.420 800 0.401 60 3 0.972 1.030 1.286
0.80 50 0.949 4 4 2.279 1.567 8.075 400 0.680 10 5 1.048 1.154 1.484
50 0.949 5 5 2.442 1.602 7.192 400 0.680 30 3 1.052 1.038 1.407
50 0.949 10 2 2.961 1.533 7.318 400 0.680 60 3 1.040 1.032 1.359
50 0.949 10 5 3.071 1.585 7.580 500 0.605 4 4 1.021 1.116 1.484
50 0.949 30 3 2.851 1.284 5.524 500 0.605 5 5 1.026 1.178 1.459
50 0.949 60 3 2.625 1.220 5.173 500 0.605 10 2 1.039 1.036 1.402
200 0.854 4 4 1.231 1.163 2.485 500 0.605 10 5 1.015 1.121 1.342
200 0.854 5 5 1.229 1.238 2.286 500 0.605 30 3 0.999 1.040 1.327
200 0.854 10 2 1.303 1.099 2.259 500 0.605 60 3 1.013 1.033 1.327
200 0.854 10 5 1.290 1.223 2.201 800 0.445 4 4 1.013 1.117 1.335
200 0.854 30 3 1.259 1.083 1.916 800 0.445 5 5 0.999 1.162 1.311
200 0.854 60 3 1.258 1.069 1.905 800 0.445 10 2 1.010 1.019 1.219
400 0.680 4 4 1.053 1.159 1.639 800 0.445 10 5 0.968 1.159 1.246
400 0.680 5 5 1.051 1.167 1.553 800 0.445 30 3 0.975 1.046 1.240
400 0.680 10 2 1.080 1.041 1.510 800 0.445 60 3 0.967 1.030 1.202
The finite populations are constructed from the model in Eq. (9) with $\omega = 0.70$ and $0.80$, sample size $n = 30$ and population size $N = 2000$; $\rho$ is the correlation coefficient between the X- and Y-variables. The efficiencies are $RE_1 = \mathrm{Var}(\bar Y_{pps,n})/\mathrm{Var}(\bar Y_{RB,n})$, $RE_2 = \mathrm{Var}(\bar Y_{pp,n})/\mathrm{Var}(\bar Y_{RB,n})$ and $RE_3 = \mathrm{MSE}(\bar Y_R)/\mathrm{Var}(\bar Y_{RB,n})$. Variances and mean square errors (MSE) are computed from 5000 simulation replications.
$\rho$ and $\omega$ and change the sample size from $n = 30$ to $n = 120$. The displays in each column fix the sample size and $\omega$ and vary the correlation coefficient. The plots in Fig. 1 suggest that for a large sample size ($n = 120$) and a large correlation coefficient between the X- and Y-variables ($\rho > 0.90$), we can select $M$ as large as possible, with $d$ between 6 and 15. On the other hand, for a moderately large sample size ($n = 60$), $M = 300$ with the number of groups $d$ between 6 and 10 seems to be slightly better than $M = 2000$. For lower correlation coefficients ($\rho < 0.90$), the selection of $M$ does not make a big difference: all efficiency curves are quite close to each other. In this case, the number of groups $d$ should not be larger than 5 for any $M$.
Similar results also hold in Fig. 2. The only difference is that the efficiency values $RE_3$ are much higher. This indicates that the proposed pp and Rao–Blackwell estimators can
Table 3. Efficiency comparison of the proposed estimators with probability-proportional-to-size and ratio estimators.
ω τ ρ M d RE1 RE2 RE3 τ ρ M d RE1 RE2 RE3
0.90 50 0.907 4 4 2.038 1.443 7.825 400 0.629 10 5 1.033 1.129 1.520
50 0.907 5 5 2.210 1.561 7.854 400 0.629 30 3 1.034 1.045 1.447
50 0.907 10 2 1.946 1.279 6.143 400 0.629 60 3 1.038 1.032 1.411
50 0.907 10 5 2.692 1.588 8.307 500 0.553 4 4 1.029 1.125 1.352
50 0.907 30 3 3.723 1.577 9.571 500 0.553 5 5 1.014 1.204 1.413
50 0.907 60 3 4.449 1.474 11.050 500 0.553 10 2 1.024 1.026 1.336
200 0.808 4 4 1.190 1.180 2.480 500 0.553 10 5 1.002 1.148 1.259
200 0.808 5 5 1.224 1.241 2.478 500 0.553 30 3 1.010 1.039 1.298
200 0.808 10 2 1.196 1.068 2.158 500 0.553 60 3 1.010 1.035 1.299
200 0.808 10 5 1.220 1.209 2.286 800 0.393 4 4 0.993 1.119 1.198
200 0.808 30 3 1.298 1.092 2.207 800 0.393 5 5 0.987 1.150 1.223
200 0.808 60 3 1.279 1.074 2.301 800 0.393 10 2 0.996 1.031 1.246
400 0.629 4 4 1.036 1.128 1.569 800 0.393 10 5 0.986 1.134 1.131
400 0.629 5 5 1.044 1.150 1.463 800 0.393 30 3 0.986 1.034 1.150
400 0.629 10 2 1.050 1.039 1.497 800 0.393 60 3 0.986 1.022 1.159
0.95 50 0.870 4 4 1.710 1.368 7.306 400 0.592 10 5 1.001 1.123 1.366
50 0.870 5 5 1.822 1.436 7.334 400 0.592 30 3 1.012 1.048 1.392
50 0.870 10 2 1.354 1.125 4.700 400 0.592 60 3 1.013 1.034 1.262
50 0.870 10 5 2.200 1.523 7.887 500 0.521 4 4 1.007 1.110 1.335
50 0.870 30 3 1.927 1.189 5.863 500 0.521 5 5 1.006 1.134 1.255
50 0.870 60 3 1.811 1.125 5.421 500 0.521 10 2 0.993 1.017 1.281
200 0.766 4 4 1.151 1.196 2.444 500 0.521 10 5 0.999 1.140 1.295
200 0.766 5 5 1.145 1.201 2.244 500 0.521 30 3 0.983 1.032 1.241
200 0.766 10 2 1.079 1.040 1.944 500 0.521 60 3 0.981 1.031 1.213
200 0.766 10 5 1.153 1.180 2.127 800 0.377 4 4 0.996 1.132 1.208
200 0.766 30 3 1.160 1.052 1.930 800 0.377 5 5 0.983 1.134 1.152
200 0.766 60 3 1.125 1.054 1.927 800 0.377 10 2 1.003 1.021 1.140
400 0.592 4 4 1.014 1.129 1.431 800 0.377 10 5 0.973 1.145 1.178
400 0.592 5 5 1.009 1.183 1.420 800 0.377 30 3 0.956 1.019 1.149
400 0.592 10 2 1.021 1.030 1.377 800 0.377 60 3 0.952 1.034 1.102
The finite populations are constructed from the model in Eq. (9) with $\omega = 0.90$ and $0.95$, sample size $n = 30$ and population size $N = 2000$; $\rho$ is the correlation coefficient between the X- and Y-variables. The efficiencies are $RE_1 = \mathrm{Var}(\bar Y_{pps,n})/\mathrm{Var}(\bar Y_{RB,n})$, $RE_2 = \mathrm{Var}(\bar Y_{pp,n})/\mathrm{Var}(\bar Y_{RB,n})$ and $RE_3 = \mathrm{MSE}(\bar Y_R)/\mathrm{Var}(\bar Y_{RB,n})$. Variances and mean square errors (MSE) are computed from 5000 simulation replications.
be better alternatives to a ratio estimator for populations producing extreme X- and Y-values for smaller values of $1-\omega$.
6. APPLICATION TO APPLE PRODUCTION DATA
In this section, we apply the proposed sampling designs to apple production data without stratification. Since the efficiency of the point estimators is discussed in the previous section, we investigate the properties of the confidence intervals.
We performed a simulation study to compare the efficiency of the confidence intervals for the population mean based on the Rao–Blackwell ($\bar Y_{RB,n}$), pp ($\bar Y_{pp,n}$) and pps ($\bar Y_{pps,n}$) estimators. The simulation study considered the set and group size combinations
(M,d) = (6,6), (7,7), (8,8), (9,9), (10,10), (30,3), (30,5), (300,3), (300,5), (300,10),
Table 4. Efficiency comparison of the proposed estimators with probability-proportional-to-size and ratio estimators.
τ ρ M d RE1 RE2 RE3 τ ρ M d RE1 RE2 RE3
50 0.966 5 5 2.353 1.456 7.398 400 0.657 60 10 1.048 1.062 1.339
50 0.966 10 10 2.760 1.408 5.784 400 0.657 2000 2 1.048 1.002 1.267
50 0.966 60 2 2.047 1.114 3.123 400 0.657 2000 4 1.027 1.006 1.250
50 0.966 60 3 4.449 1.380 6.940 400 0.657 2000 5 1.050 1.014 1.297
50 0.966 60 4 3.325 1.226 4.719 400 0.657 2000 10 1.027 1.014 1.381
50 0.966 60 5 3.950 1.302 6.496 400 0.657 2000 20 0.985 1.088 1.285
50 0.966 60 10 4.015 1.328 7.143 500 0.574 5 5 1.026 1.014 1.530
50 0.966 2000 2 1.808 1.029 2.445 500 0.574 10 10 1.020 1.082 1.246
50 0.966 2000 4 3.002 1.035 4.227 500 0.574 60 2 1.032 0.999 1.406
50 0.966 2000 5 3.331 1.028 5.063 500 0.574 60 3 1.044 1.017 1.296
50 0.966 2000 10 4.185 1.077 5.386 500 0.574 60 4 1.042 1.019 1.312
50 0.966 2000 20 5.964 1.258 8.858 500 0.574 60 5 1.038 1.022 1.469
200 0.856 5 5 1.208 1.126 2.160 500 0.574 60 10 1.017 1.055 1.464
200 0.856 10 10 1.237 1.101 2.034 500 0.574 2000 2 1.018 0.999 1.303
200 0.856 60 2 1.195 1.022 1.722 500 0.574 2000 4 1.035 1.002 1.309
200 0.856 60 3 1.300 1.029 1.763 500 0.574 2000 5 1.027 1.009 1.349
200 0.856 60 4 1.215 1.029 1.542 500 0.574 2000 10 1.030 1.012 1.395
200 0.856 60 5 1.285 1.042 1.904 500 0.574 2000 20 0.923 1.090 1.123
200 0.856 60 10 1.322 1.079 1.701 800 0.401 5 5 1.021 1.009 1.234
200 0.856 2000 2 1.153 1.002 1.295 800 0.401 10 10 1.001 1.054 1.362
200 0.856 2000 4 1.281 1.009 1.709 800 0.401 60 2 1.008 1.000 1.394
200 0.856 2000 5 1.251 1.013 1.608 800 0.401 60 3 1.025 1.004 1.262
200 0.856 2000 10 1.289 1.005 1.501 800 0.401 60 4 0.998 1.008 1.340
200 0.856 2000 20 1.160 1.090 1.571 800 0.401 60 5 0.998 1.017 1.275
400 0.657 5 5 1.058 1.038 1.533 800 0.401 60 10 1.007 1.059 1.390
400 0.657 10 10 1.047 1.074 1.492 800 0.401 2000 2 0.996 1.000 1.279
400 0.657 60 2 1.064 1.013 1.329 800 0.401 2000 4 1.011 1.001 1.433
400 0.657 60 3 1.069 1.001 1.526 800 0.401 2000 5 0.966 1.004 1.232
400 0.657 60 4 1.057 1.005 1.318 800 0.401 2000 10 0.937 1.015 1.167
400 0.657 60 5 1.073 1.031 1.440 800 0.401 2000 20 0.894 1.052 1.083
The finite population is constructed from the model in Eq. (9) with $\omega = 0.70$, sample size $n = 120$ and population size $N = 2000$; $\rho$ is the correlation coefficient between the X- and Y-variables. The efficiencies are $RE_1 = \mathrm{Var}(\bar Y_{pps,n})/\mathrm{Var}(\bar Y_{RB,n})$, $RE_2 = \mathrm{Var}(\bar Y_{pp,n})/\mathrm{Var}(\bar Y_{RB,n})$ and $RE_3 = \mathrm{MSE}(\bar Y_R)/\mathrm{Var}(\bar Y_{RB,n})$. Variances and mean square errors (MSE) are computed from 5000 simulation replications.
(600,3), (600,5), (600,10). We considered two sample sizes, $n = 60, 120$. The simulation size is taken to be 1000. The Rao–Blackwell estimator is computed with $Q = 50$ replications. The efficiencies of the confidence intervals are defined as ratios of the average squared lengths,
$$RE_4=\frac{\sum_{i=1}^{1000}L^2_{pp,i}}{\sum_{i=1}^{1000}L^2_{RB,i}},\qquad RE_5=\frac{\sum_{i=1}^{1000}L^2_{pps,i}}{\sum_{i=1}^{1000}L^2_{RB,i}},$$
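where $L_{pp,i}$, $L_{RB,i}$ and $L_{pps,i}$ are the lengths of the confidence intervals in the $i$th replication based on the point estimators $\bar Y_{pp,n}$, $\bar Y_{RB,n}$ and $\bar Y_{pps,n}$, respectively, and are given by
$$L_{pp,i}=2\,t_{n-d_n}\,\hat\sigma_{pp,n,i},\qquad L_{RB,i}=2\,t_{n-1}\,\hat\sigma_{RB,n,i},\qquad L_{pps,i}=2\,t_{n-d_n}\,\hat\sigma_{pps,n,i}.$$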
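The two summary measures reported for the confidence intervals, the ratio of summed squared lengths and the empirical coverage, can be computed from simulation output as in the sketch below. The function names and the toy inputs are hypothetical; the interval lengths themselves would come from the variance estimators described above.

```python
def length_efficiency(lengths_numerator, lengths_denominator):
    """Ratio of summed squared interval lengths, as in RE4 and RE5.

    A value above 1 means the intervals in the denominator are shorter
    on average (in squared length) than those in the numerator.
    """
    num = sum(length ** 2 for length in lengths_numerator)
    den = sum(length ** 2 for length in lengths_denominator)
    return num / den


def coverage(intervals, theta):
    """Fraction of (lower, upper) intervals that contain the true value theta."""
    hits = sum(1 for lower, upper in intervals if lower <= theta <= upper)
    return hits / len(intervals)


# Toy simulation output (illustration only).
pps_lengths = [2.0, 2.0, 2.0]
rb_lengths = [1.0, 1.0, 1.0]
re5 = length_efficiency(pps_lengths, rb_lengths)
cov = coverage([(0.0, 2.0), (3.0, 4.0), (0.5, 1.5), (-1.0, 0.5)], 1.0)
print(re5, cov)
```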
Table 5. Efficiency comparison of the proposed estimators with probability-proportional-to-size and ratio estimators.
τ ρ M d RE1 RE2 RE3 τ ρ M d RE1 RE2 RE3
50 0.907 5 5 2.152 1.395 7.805 400 0.629 60 10 1.095 1.063 1.504
50 0.907 10 10 2.711 1.453 9.264 400 0.629 2000 2 1.037 1.000 1.402
50 0.907 60 2 1.748 1.103 5.312 400 0.629 2000 4 1.072 0.999 1.374
50 0.907 60 3 4.899 1.440 14.821 400 0.629 2000 5 1.092 1.016 1.484
50 0.907 60 4 3.233 1.247 9.029 400 0.629 2000 10 1.069 1.013 1.288
50 0.907 60 5 3.545 1.242 9.746 400 0.629 2000 20 1.029 1.061 1.342
50 0.907 60 10 4.363 1.282 11.177 500 0.553 5 5 1.031 1.027 1.495
50 0.907 2000 2 1.693 1.010 4.899 500 0.553 10 10 1.049 1.090 1.454
50 0.907 2000 4 2.767 1.037 6.999 500 0.553 60 2 1.027 1.005 1.363
50 0.907 2000 5 3.228 1.024 7.918 500 0.553 60 3 1.064 1.004 1.423
50 0.907 2000 10 5.054 1.109 14.088 500 0.553 60 4 1.043 1.018 1.305
50 0.907 2000 20 4.738 1.178 13.567 500 0.553 60 5 1.043 1.031 1.454
200 0.808 5 5 1.234 1.065 2.813 500 0.553 60 10 1.031 1.048 1.373
200 0.808 10 10 1.213 1.111 2.250 500 0.553 2000 2 1.046 1.001 1.338
200 0.808 60 2 1.189 1.021 1.986 500 0.553 2000 4 1.018 1.004 1.280
200 0.808 60 3 1.422 1.058 2.350 500 0.553 2000 5 1.027 1.005 1.243
200 0.808 60 4 1.311 1.042 2.266 500 0.553 2000 10 1.013 1.014 1.214
200 0.808 60 5 1.278 1.049 2.403 500 0.553 2000 20 0.925 1.053 1.132
200 0.808 60 10 1.362 1.103 2.462 800 0.393 5 5 1.025 1.007 1.223
200 0.808 2000 2 1.148 1.004 1.856 800 0.393 10 10 1.004 1.072 1.158
200 0.808 2000 4 1.247 1.003 2.024 800 0.393 60 2 1.004 1.002 1.168
200 0.808 2000 5 1.333 1.006 2.135 800 0.393 60 3 1.017 1.013 1.150
200 0.808 2000 10 1.280 1.017 2.161 800 0.393 60 4 1.002 1.004 1.135
200 0.808 2000 20 1.357 1.102 2.325 800 0.393 60 5 1.006 1.039 1.090
400 0.629 5 5 1.070 1.017 1.537 800 0.393 60 10 0.965 1.061 1.099
400 0.629 10 10 1.065 1.101 1.381 800 0.393 2000 2 1.003 1.004 1.349
400 0.629 60 2 1.058 1.008 1.429 800 0.393 2000 4 0.997 1.005 1.270
400 0.629 60 3 1.051 1.021 1.554 800 0.393 2000 5 0.978 1.002 1.139
400 0.629 60 4 1.060 1.010 1.610 800 0.393 2000 10 0.970 1.029 1.081
400 0.629 60 5 1.073 1.040 1.363 800 0.393 2000 20 0.909 1.099 1.248
The finite population is constructed from the model in Eq. (9) with $\omega = 0.90$, sample size $n = 120$ and population size $N = 2000$; $\rho$ is the correlation coefficient between the X- and Y-variables. The efficiencies are $RE_1 = \mathrm{Var}(\bar Y_{pps,n})/\mathrm{Var}(\bar Y_{RB,n})$, $RE_2 = \mathrm{Var}(\bar Y_{pp,n})/\mathrm{Var}(\bar Y_{RB,n})$ and $RE_3 = \mathrm{MSE}(\bar Y_R)/\mathrm{Var}(\bar Y_{RB,n})$. Variances and mean square errors (MSE) are computed from 5000 simulation replications.
The notations $\hat\sigma_{pp,n,i}$, $\hat\sigma_{RB,n,i}$ and $\hat\sigma_{pps,n,i}$ denote the variance estimates of the estimators in the $i$th replication of the simulation study. Values of $RE_4$ and $RE_5$ greater than 1 indicate that the confidence intervals in the denominators are shorter on average than the ones in the numerators. Table 6 presents the efficiencies and the coverage probabilities of the 95% confidence intervals.
It is clear that the values of $RE_5$ are all greater than 1. Hence, the confidence intervals based on the Rao–Blackwell estimator are shorter than the intervals based on the pps estimator. The efficiency values increase slightly with sample size ($n = 120$). Since the correlation coefficient ($\rho = 0.916$) between the X- and Y-variables is relatively high, larger set sizes $M$ tend to produce shorter intervals based on the Rao–Blackwell estimator.
[Figure 1: nine panels plotting $RE_3$ (vertical axis) against the number of groups $d$ (horizontal axis) for set sizes $M = 30, 60, 300, 2000$. Rows: $\rho = 0.966, 0.856, 0.657$; columns: $n = 30, 60, 120$; $\omega = 0.7$ in all panels.]
Figure 1. Efficiency plots ($RE_3 = \mathrm{MSE}(\bar Y_{R,n})/\mathrm{Var}(\bar Y_{RB,n})$) of the Rao–Blackwell estimator with respect to the ratio estimator in double sampling for different values of $M$, $d$ and $n$ when $\omega = 0.7$.
The efficiency ($RE_4 > 1$) of the jackknife confidence interval based on the Rao–Blackwell estimator is higher when the set size $M$ is less than or equal to 10. For large values of $M$, the $RE_4$ values are around 1, indicating that the jackknife confidence intervals are as good as or slightly less efficient than the pp-based confidence intervals.
The coverage probabilities for all three confidence intervals are slightly lower than the nominal coverage probability 0.95 when $n = 60$. Since the confidence intervals are constructed based on a normal approximation, this may be due to the effect of sample size. Table 6 shows that the coverage probabilities are quite close to 0.95 when $n = 120$.
We performed another simulation study to investigate the efficiency of stratified pp estimators under equal, proportional and Neyman allocations. In this part of the simulation, pp samples are constructed from the stratified apple production data in Table 1. As we observe from the table, there is a large variation between stratum populations. Hence, the use of a stratified pp sample would be appropriate. To determine the sample sizes in the Neyman allocation, we used the population standard deviations in Table 1. The simulation study
[Figure 2: nine panels plotting $RE_3$ (vertical axis) against the number of groups $d$ (horizontal axis) for set sizes $M = 30, 60, 300, 2000$. Rows: $\rho = 0.907, 0.808, 0.629$; columns: $n = 30, 60, 120$; $\omega = 0.9$ in all panels.]
Figure 2. Efficiency plots ($RE_3 = \mathrm{MSE}(\bar Y_{R,n})/\mathrm{Var}(\bar Y_{RB,n})$) of the Rao–Blackwell estimator with respect to the ratio estimator in double sampling for different values of $M$, $d$, $\rho$ and $n$ when $\omega = 0.90$.
considered two designs, stratified pps and stratified pp sampling, with sample sizes $n = 140, 210, 240$. For the stratified pp sampling design, we computed two estimators, $\bar Y_{str}$ and the Rao–Blackwell estimator $\bar Y_{str,RB}$,
$$\bar Y_{str,RB}=\sum_{l=1}^{L}\frac{N_l}{N}\,\bar Y_{RB,n_l},$$
where $\bar Y_{RB,n_l}$ is the Rao–Blackwell estimator from stratum population $l$. The variance of $\bar Y_{str,RB}$ is denoted by $\sigma^2_{\lambda,RB}(E)$, $\sigma^2_{\lambda,RB}(P)$ and $\sigma^2_{\lambda,RB}(N)$ for equal, proportional and Neyman allocations, respectively. To approximate the Rao–Blackwell estimator, the number of replications, $Q$, is selected to be 10 and 50, respectively.
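Combining stratum-level estimates into the stratified estimate is a weighted sum with weights $N_l/N$. The sketch below shows only that step; the stratum estimates are treated as given numbers, and the values used are hypothetical.

```python
def stratified_mean(N_l, stratum_estimates):
    """Combine stratum estimates with weights N_l / N.

    stratum_estimates[l] plays the role of the stratum-level estimator
    (for example a Rao-Blackwell estimate) computed from stratum l.
    """
    N = sum(N_l)
    return sum(nl / N * est for nl, est in zip(N_l, stratum_estimates))


# Hypothetical stratum sizes and stratum-level estimates.
N_l = [100, 300]
estimates = [2.0, 4.0]
combined = stratified_mean(N_l, estimates)
print(combined)
```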
For comparison purposes, we also considered the estimator of $\theta$ based on a stratified pps sample,
Table 6. The efficiencies and coverage probabilities (Cov) of the confidence intervals.
n M d RE4 RE5 Cov (RB) Cov (pp) Cov (pps)
60 6 6 1.172 1.333 0.937 0.938 0.931
60 7 7 1.200 1.344 0.926 0.927 0.933
60 8 8 1.220 1.348 0.919 0.919 0.931
60 9 3 1.108 1.363 0.920 0.923 0.936
60 9 9 1.243 1.366 0.935 0.927 0.934
60 10 10 1.268 1.353 0.927 0.925 0.937
60 30 3 1.062 1.396 0.941 0.935 0.944
60 30 5 1.092 1.401 0.930 0.931 0.935
60 300 3 0.998 1.410 0.932 0.929 0.931
60 300 5 0.994 1.353 0.941 0.934 0.942
60 300 10 1.006 1.283 0.906 0.905 0.937
60 600 3 0.987 1.419 0.931 0.932 0.937
60 600 5 0.991 1.362 0.928 0.928 0.936
60 600 10 0.943 1.204 0.930 0.922 0.937
120 6 6 1.125 1.338 0.949 0.948 0.947
120 7 7 1.125 1.359 0.933 0.949 0.938
120 8 8 1.134 1.371 0.963 0.955 0.960
120 9 3 1.093 1.362 0.943 0.945 0.950
120 9 9 1.147 1.385 0.956 0.953 0.963
120 10 10 1.162 1.388 0.945 0.945 0.940
120 30 3 1.057 1.405 0.940 0.944 0.936
120 30 5 1.077 1.449 0.933 0.934 0.935
120 300 3 1.013 1.475 0.936 0.939 0.938
120 300 5 1.008 1.423 0.942 0.938 0.941
120 300 10 1.017 1.443 0.934 0.932 0.931
120 600 3 1.004 1.471 0.949 0.950 0.961
120 600 5 1.006 1.439 0.941 0.935 0.945
120 600 10 1.006 1.453 0.939 0.940 0.951
The RE4is the ratio of the average squared lengths of the confidence intervals based on post-stratified probability-
proportional-to-size (pp) and Rao–Blackwell (RB) estimators. The RE5is the ratio of the average squared lengths
of the confidence intervals based on probability-proportional-to-size (pps) and RB estimators.
$$\breve Y_{str}=\sum_{l=1}^{L}\frac{N_l}{N}\,\bar Y_{pps,n_l},$$
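where $\bar Y_{pps,n_l}$ is the pps estimator in Eq. (1) from stratum population $l$. The variance of $\sqrt{n}(\breve Y_{str}-\theta)$, similar to the ones in stratified pp sampling, can be computed for equal, proportional and Neyman allocations:
$$\breve\sigma^2_{\lambda}(E)=\sum_{l=1}^{L}\frac{L\breve A_l^2}{N^2},\qquad \breve\sigma^2_{\lambda}(P)=\sum_{l=1}^{L}\frac{\breve A_l^2}{N N_l},\qquad \breve\sigma^2_{\lambda}(N)=\left(\sum_{l=1}^{L}\frac{\breve A_l}{N}\right)^2,$$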
where $\breve A_l^2=\frac{1}{n}\sum_{k=1}^{N}\pi_k\left(\frac{y_k}{N_l\pi_k}-\theta_l\right)^2$.
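The three variance expressions for the stratified pps design can be evaluated directly once the quantities $\breve A_l$ and the stratum sizes are available. The sketch below uses hypothetical values in which the smallest stratum has the largest $\breve A_l$, a configuration in which equal allocation would be expected to beat proportional allocation.

```python
def stratified_pps_variances(A, N_l):
    """Variances under equal, proportional and Neyman allocation.

    A[l] plays the role of A-breve_l (not squared); N_l[l] is the stratum size.
    Returns the tuple (equal, proportional, neyman).
    """
    N, L = sum(N_l), len(N_l)
    v_equal = sum(L * a * a for a in A) / N ** 2
    v_prop = sum(a * a / (N * nl) for a, nl in zip(A, N_l))
    v_neyman = (sum(A) / N) ** 2
    return v_equal, v_prop, v_neyman


# Hypothetical inputs: the smallest stratum is the most variable.
A = [30.0, 20.0, 10.0]
N_l = [100, 300, 600]
v_equal, v_prop, v_neyman = stratified_pps_variances(A, N_l)
print(v_equal, v_prop, v_neyman)
```

By the Cauchy–Schwarz inequality, $(\sum_l \breve A_l)^2 \le L\sum_l \breve A_l^2$, so the Neyman value never exceeds the equal-allocation value in these expressions.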
Stratified samples for both pps and pp designs are constructed using equal, proportional and Neyman allocations. The simulation size is taken to be 50,000. Table 7 presents the relative efficiencies of the stratified pps and pp estimators with respect to the Rao–Blackwell estimators.
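Table 7. Relative efficiencies of the estimators and coverage probabilities of the confidence interval of the parameter $\theta$ for equal (E), proportional (P) and Neyman (N) allocation procedures.
Column key. Stratified pps: (a) $\breve\sigma^2_{\lambda}(E)/\sigma^2_{\lambda,RB}(E)$, (b) $\breve\sigma^2_{\lambda}(P)/\sigma^2_{\lambda,RB}(P)$, (c) $\breve\sigma^2_{\lambda}(N)/\sigma^2_{\lambda,RB}(N)$. Stratified pp: (d) $\sigma^2_{\lambda}(E)/\sigma^2_{\lambda,RB}(E)$, (e) $\sigma^2_{\lambda}(P)/\sigma^2_{\lambda,RB}(P)$, (f) $\sigma^2_{\lambda}(N)/\sigma^2_{\lambda,RB}(N)$. Rao–Blackwell: (g) $\sigma^2_{\lambda,RB}(E)/\sigma^2_{\lambda,RB}(N)$, (h) $\sigma^2_{\lambda,RB}(P)/\sigma^2_{\lambda,RB}(N)$. Stratified pp coverage: Cov(E), Cov(P), Cov(N).
n Q (a) (b) (c) (d) (e) (f) (g) (h) Cov(E) Cov(P) Cov(N)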
140 10 1.742 1.741 1.549 1.356 1.351 1.179 1.351 1.553 0.936 0.927 0.942
140 50 1.864 1.790 1.577 1.372 1.378 1.185 1.331 1.525 0.937 0.928 0.941
210 10 1.907 1.913 1.709 1.269 1.341 1.166 1.418 1.500 0.942 0.937 0.946
210 50 1.932 1.901 1.653 1.330 1.322 1.205 1.319 1.452 0.939 0.938 0.938
280 10 1.967 1.899 1.731 1.266 1.237 1.209 1.328 1.514 0.942 0.938 0.945
280 50 2.060 1.928 1.683 1.278 1.284 1.239 1.256 1.412 0.947 0.938 0.938
It is clear that the Rao–Blackwell estimator provides a substantial improvement over the stratified pps and pp estimators for each allocation procedure. As expected, the efficiency increases with the sample size $n$, but increasing the number of replications $Q$ in the Rao–Blackwell estimator from 10 to 50 does not improve the efficiency appreciably. The Neyman allocation dominates the other allocation procedures, as expected. For this population, equal allocation has higher efficiency than proportional allocation. This is consistent with the expression for $\sigma^2_{\lambda}(E)-\sigma^2_{\lambda}(P)$, which can be negative for populations in which smaller stratum populations have larger variances. A close inspection of the apple production data in Table 1 indicates that the largest stratum population variance belongs to the second smallest stratum. Hence, for this population, equal allocation is better than proportional allocation. The coverage probabilities of the confidence interval of $\theta$ based on the stratified pp estimator are relatively close to the nominal value of 0.95.
7. CONCLUDING REMARKS
In many survey sampling studies, in addition to the variable of interest, the population units have a known auxiliary variable. This auxiliary variable is often proportional to the variable under study. If the population has strong heterogeneity among its members, such as extremely large values for some population units, a pps sample would provide an estimator of the population mean with smaller variance than a simple random sample estimator of the same size. In a pps sample, sample units are selected with selection probabilities proportional to the size of the auxiliary variable. Since the auxiliary variable is highly correlated with the variable of interest, it also provides information about the relative position of the units in a comparison set with respect to the variable of interest. In this paper, we used this position information to construct a post-stratified pps sample. The new sample creates post-strata among the sample units of a pps sample. Hence, the estimators of the population mean have a smaller variance than the estimator from a pps sample of the same size. The post-stratification of the pps sample is performed by conditioning on the comparison sets. We use the Rao–Blackwell theorem to improve the post-stratified pps sample estimator. The new sampling design extends naturally to stratified populations. The efficiency of the estimator of the population mean is empirically evaluated in a stratified population.
[Received February 2019. Accepted July 2019. Published Online July 2019.]
REFERENCES
Al-Saleh, M. F. and Samawi, H. (2007). A note on Inclusion Probability in Ranked Set Sampling for finite
population. Tes t , 16, 198–209.
Dastbaravarde, A., Arghami, N.R., Sarmad, M., (2016). Some theoretical results concerning non parametric esti-
mation by using a judgement poststratification sample. Communications in Statistics - Theory and Methods,
45, 2181–2203.
Deshpande, J.V., Frey, J., Ozturk, O. (2006) Nonparametric ranked set-sampling confidence intervals for a finite
population. Environmental and Ecological Statistics, 13, 25–40.
718 O. Ozturk
Frey, J. (2011). A note on ranked-set sampling using a covariate. Journal of Statistical Planning and Inference,
141, 809–816.
— (2012). Constrained nonparametric estimation of the mean and the CDF using ranked-set sampling with a
covariate. Annals of the Institute of Statistical Mathematics, 64, 439–456.
Frey, J. and Feeman, T.G. (2012). An improved mean estimator for judgement post-stratification. Computational
Statistics and Data Analysis, 56, 418–426.
— (2013). Variance estimation using judgement post- stratification. Annals of the Institute of Statistical Mathe-
matics, 65, 551–569.
Frey, J. and Ozturk, O. (2011). Constrained estimation using judgement post-stratification. Annals of the Institute
of Statistical Mathematics, 63, 769–789.
Gokpinar, F. and Ozdemir, Y.A. (2010). Generalization of inclusion probabilities in ranked set sampling. Hacettepe
Journal of Mathematics and Statistics, 39, 89–95.
Kadilar, C. and Cingi, H. (2003). Ratio estimators in stratified random sampling. Biometrical Journal, 45, 218–225.
MacEachern, S. N., Stasny, E. A., and Wolfe, D. A. (2004) Judgment post- stratification with imprecise rankings.
Biometrics, 60, 207–215.
Ozdemir, Y.A. and Gokpinar,F. (2008). A new formula for inclusion probabilities in median ranked set sampling.
Communications in Statistics - Theory and Methods, 37, 2022–2033.
— (2007). A generalized formula for inclusion probabilities in ranked set sampling. Hacettepe Journal of Mathe-
matics and Statistics, 36, 89–99.
Ozturk, O. (2014a). Estimation of population mean and total in finite population setting using multiple auxiliary
variables. Journal of Agricultural, Biological and Environmental Statistics, 19, 161–184.
— (2014b). Statistical inference for population quantiles and variance in judgment post-stratified samples. Computational
Statistics and Data Analysis, 77, 188–205.
— (2016a). Estimation of a finite population mean and total using population ranks of sample units. Journal of
Agricultural, Biological and Environmental Statistics, 21, 181–202.
— (2016b). Statistical inference based on judgment post-stratified samples in finite population. Survey Methodol-
ogy, 42, 239–262.
Ozturk, O. and Bayramoglu Kavlak, K. (2018). Model based inference using ranked set samples. Survey Methodology,
44, 1–16.
— (2019). Statistical inference using stratified ranked set samples from finite populations. Chapter 12, pages 157–170,
in Ranked Set Sampling: 65 Years Improving the Accuracy in Data Gathering, edited by Bouza and
Al-Omari, Elsevier, San Diego, USA.
Ozturk, O. and Jafari Jozani, M. (2013). Inclusion Probabilities in Partially Rank Ordered Set Sampling. Compu-
tational Statistics and Data Analysis, 69, 122–132.
Patil, G.P., Sinha, A.K., and Taillie, C. (1995). Finite population corrections for ranked set sampling. Annals of the
Institute of Statistical Mathematics, 47, 621–636.
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. URL https://www.R-project.org/.
Thompson, S.K. (2002). Sampling, 2nd edition, Wiley, New York.
Wang, X., Stokes, L., Lim, J., and Chen, M. (2006). Concomitants of multivariate order statistics with application
to judgment post-stratification. Journal of the American Statistical Association, 101, 1693–1704.
Wang, X., Lim, J., Stokes, S.L. (2008). A nonparametric mean estimator for judgement post-stratified data.
Biometrics, 64, 355–363.
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.