Revista Brasileira de Probabilidade e Estatística (1993), 7, pp. 65–82.
© Associação Brasileira de Estatística
PREDICTIVE LIKELIHOOD IN FINITE POPULATIONS

Pilar Iglesias, Mônica C. Sandoval and Carlos Alberto de Bragança Pereira
Departamento de Estatística
Instituto de Matemática e Estatística
Universidade de São Paulo
São Paulo, SP, Brasil
Summary
Superpopulation models are transformed into predictive models in order to permit the use of standard classical statistical techniques. Confidence intervals based on predictive models replace the predictive intervals based on superpopulation models. The ideas are illustrated by various examples, and the normal case turns out to produce intervals that are also obtained by the standard classical survey sampling techniques.
Key words: Maximum likelihood predictor, nuisance parameter, pivotal quantity, prediction, predictive interval, predictive model, minimal sufficient reduction, specific sufficiency.
1. Introduction
Prediction of unknown quantities in parametric statistics has been approached from different points of view, and predictive intervals (for these quantities) of various types have been developed by many authors (see, for example, Thatcher (1964), Hahn (1969), Lawless (1972), and Vit (1973)). Generally, methods for deriving such intervals either make use of pivotal quantities and normal approximations or are based on hypothesis testing approaches. Faulkenberry (1973) proposed an alternative approach based on the study of the conditional distribution of the original random quantities given a sufficient statistic. In connection with this, and under a non-Bayesian perspective, we advocate the use of a likelihood approach in order to obtain maximum likelihood predictors for the quantities of interest. The idea is to derive conditional probability functions producing predictive likelihoods that do not depend on the value of the unknown parameter (a quantity of no interest), which is in fact replaced by the sufficient statistic (the quantity of interest). For more details on the various forms of predictive likelihood we refer to the recent work of Bjørnstad (1990). Other important references on the choice of likelihoods are Bayarri, De Groot, and Kadane (1986) and de Finetti (1977).
The related definitions of predictive likelihood introduced by Lauritzen (1974), Hinkley (1979), and Butler (1986), which consist of conditioning on minimal sufficient statistics, may differ in some situations. Consequently, the maximum likelihood predictors obtained under these situations may differ significantly (Bjørnstad, 1990). The aim of the present paper is to consider a definition that combines the former ones and has interesting justifications in the survey sampling, or finite population, context. The main idea consists of looking for a family of probability distributions for the data to be observed, indexed by the quantity of interest, to be predicted. This quantity, which in the context of finite populations is a function of both the observed and the unobserved quantities, must act as the parameter of a standard statistical model.
Let $\mathbf{Y} = (Y_1,\ldots,Y_N)$ be a random vector with distribution indexed by a parameter $\theta$ (scalar or vector). The quantity to be observed, the sample, is represented without loss of generality by
$$\mathbf{Y}_S = (Y_1,\ldots,Y_n),$$
where $n < N$. The remaining part of $\mathbf{Y}$, the unobserved quantity, is represented by $\mathbf{Y}_U = (Y_{n+1},\ldots,Y_N)$.
The problem to be solved consists of making a predictive statement about a function of $\mathbf{Y}$, $\tau(\mathbf{Y})$, based on the observed value $y_S$ of $\mathbf{Y}_S$.
From a Bayesian point of view, the problem is solved in a straightforward manner. Once the prior probability (density) function, pdf, for the model parameter, $\theta$, is considered, the predictive distribution of $\tau(\mathbf{Y})$ given $(\mathbf{Y}_S = y_S)$ is obtained. For instance, if $p(\theta)$ is the prior pdf, then the predictive pdf at the point $\tau(\mathbf{Y}) = \tau$ is
$$f(\tau \mid y_S) = \int f(\tau \mid y_S, \theta)\, p(\theta \mid y_S)\, d\theta, \qquad (1.1)$$
where $p(\theta \mid y_S)$ is the posterior pdf of $\theta$. That is, the predictive pdf is the average of $f(\tau \mid y_S, \theta)$ under the posterior distribution. Also note that, alternatively, we could write
$$f(\tau \mid y_S) \;\propto\; f(\tau)\, f(y_S \mid \tau), \qquad (1.2)$$
which can be interpreted as Bayes' operation when $\tau$ is considered as the parameter and $f(y_S \mid \tau)$ defines the likelihood of $\tau$. Here also, $f(\tau)$ and $f(y_S \mid \tau)$ are obtained by integration using adequate distributions of $\theta$. Note that, if $\tau$ is sufficient under the full model of $\mathbf{Y}$, then integration to obtain $f(y_S \mid \tau)$ is unnecessary.
If prior information is not supposed to be used, the distributions that could be used for prediction, $f(\tau \mid \theta)$, $f(\tau \mid y_S, \theta)$, and $f(y_S \mid \tau, \theta)$, do involve $\theta$. Consequently, a satisfactory frequentist solution for the prediction of $\tau$ may not be obvious. A tentative solution could be to replace, in these functions, the parameter $\theta$ by its maximum likelihood estimate. Such an approach implicitly assumes that the true value of $\theta$ is its estimate and would not take into account the uncertainty about $\theta$.
Hinkley (1979) and Butler (1986) introduced predictive likelihood functions that neither involve the replacement of an estimate for $\theta$ nor require the use of prior distributions. Consequently, standard inferential methods can be used. The different definitions presented by Hinkley (1979) and Butler (1986) may lead to different solutions for the prediction problem. The definition of predictive likelihood that will be used to solve the problem stated above, the prediction of $\tau$, is more general than the former ones and produces no inconsistency in the particular case of finite populations.
Several aspects of a prediction problem are discussed in Section 2. Section 3 presents solutions for finite population problems. It is interesting to notice that the standard normal model produces the same results obtained under the standard survey sampling techniques.
2. Prediction
Based on the model described in Section 1, the problem to be solved is the prediction of $\tau(\mathbf{Y})$ using the observation $y_S$ of $\mathbf{Y}_S$. The starting point in a frequentist context consists in associating with $\mathbf{Y}_S$ a distribution indexed by $\tau$. The natural procedure is to consider the original random variables $Y_1,\ldots,Y_N$ as independent with a common distribution indexed by an unknown parameter $\theta$. With this model we obtain the conditional distribution of the sample $\mathbf{Y}_S$ given $\tau$ and $\theta$. Hence the new model is indexed by two parameters, $\tau$ and $\theta$. In this manner, and according to the main purpose of this study, the problem may involve inference about $\tau$ in the presence of a nuisance parameter, $\theta$ (Basu (1977)).
The parameter of the modified model is denoted by $\pi = (\tau, \theta)$ and the modified likelihood by $l(\pi \mid y_S)$. The definitions and results presented in the sequel will be the basis of the solution. Note that there is a lack of independence between the sample elements, $Y_1,\ldots,Y_n$, in the modified model.
We are using, in a general notation, $f$ (or $g$) and $l$ for probability density and likelihood functions, respectively. For simplicity, they indicate neither differences in dimension nor in distribution, since these are implicit in each case.
Definition 2.1. The function $l(\pi \mid y_S)$ is called the predictive likelihood. We also call predictive functions the ones that are proportional to it, following the principle of sufficiency.
Definition 2.2. If there exists a point $\hat\pi = (\hat\tau, \hat\theta)$ such that
$$\sup_{\pi} l(\pi \mid y_S) = l(\hat\pi \mid y_S),$$
then $\hat\tau$ is the maximum likelihood predictor of $\tau(\mathbf{Y})$.
Definition 2.3. If there exist functions $T_1 = T_1(\mathbf{Y}_S)$ and $T_2 = T_2(\mathbf{Y}_S)$, with observed values $\tau_1$ and $\tau_2$, such that $T_1 < T_2$ a.s. and $\Pr[(T_1, T_2) \ni \tau(\mathbf{Y}) \mid \theta] = \gamma$, then $(\tau_1, \tau_2)$ is called the predictive interval of $\tau(\mathbf{Y})$ with $100\gamma\%$ confidence.
The following result characterizes the role of $\tau(\mathbf{Y})$ when it is a minimal sufficient statistic under the original model, $f(y \mid \theta)$.
Lemma 2.1. If there exist functions $T = T(\mathbf{Y}_S)$ and $U = U(\mathbf{Y}_U)$ such that (with respect to $\theta$) $T$, $U$, and $\tau = \tau(\mathbf{Y})$ are minimal sufficient reductions of $\mathbf{Y}_S$, $\mathbf{Y}_U$, and $\mathbf{Y}$, respectively, then
i) $f(y_S \mid \pi)$ depends on $\pi$ only through $\tau$, and
ii) $f(y_S \mid \pi) = g(t \mid \tau)\, h(y_S)$, where $g$ is the pdf of $T$.
The proof is straightforward.
It should be noted that if, in this Lemma, $\mathbf{Y}_S$ and $\mathbf{Y}_U$ are statistically independent in the original model and $U$ is uniquely defined by $T$ and $\tau$, then $g$ defines the predictive likelihood of Hinkley (1979). That is, Hinkley's predictive likelihood is $l(u \mid t)$, which is equal to $g(t \mid \tau)$ at the observed point $t$. Moreover, under the conditions of the Lemma, the maximum likelihood predictor of $\tau$ agrees with the predictor defined in Lauritzen (1974). Also, in this case there is no nuisance parameter to be eliminated.
Another feature of the modified likelihood when $f(y_S \mid \pi) = f(y_S \mid \tau)$ is that, under the Bayesian point of view, formula (1.2) may be applied immediately, without the need of integration to obtain its factors. In this case, comparison between Bayesian and frequentist methods follows the standard procedures, since there is no need for elimination of nuisance parameters.
To illustrate Lemma 2.1, we present the following simple example.
Example 2.1. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be independent exponential random quantities with common unknown mean equal to $\theta^{-1}$. The population total $\tau(\mathbf{Y}) = Y_1 + \cdots + Y_N$ and the sample total $T = Y_1 + \cdots + Y_n$ are minimal sufficient reductions of $\mathbf{Y}$ and $\mathbf{Y}_S$ with respect to the original model. The predictive likelihood may be written as
$$l(\pi \mid y_S) \;\propto\; l(\tau \mid t) \;\propto\; \frac{(\tau - t)^{N-n-1}\, t^{\,n-1}}{\tau^{N-1}}\, I(t, \tau), \qquad (2.1)$$
where $I(\cdot\,, \tau)$ is the indicator function of the interval $(0, \tau)$.
The maximum likelihood predictor obtained from (2.1) is $\hat\tau = \frac{T}{n}(N-1)$. To obtain the predictive interval, we notice that $Z = T/\tau$ is a pivotal quantity with a Beta distribution with parameters $n$ and $N-n$. Hence a predictive interval with $100\gamma\%$ confidence is of the form
$$\left( \frac{T}{b},\; \frac{T}{a} \right),$$
where $a$ and $b$ are chosen in such a way that $a < b$ and $\Pr(a < Z < b) = \gamma$. (Note that $\Pr(a < Z < b)$ is the incomplete Beta function divided by the Beta function, both calculated at the point $(n, N-n)$.) To obtain the shortest interval we choose $a$ and $b$ that make $(1/a) - (1/b)$ minimum. In particular, if $N = n + 1$, the shortest predictive interval is
$$\left( T,\; \frac{T}{\sqrt[n]{1 - \gamma}} \right).$$
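As a computational aside, the shortest interval above can be found numerically. The sketch below is our own illustrative code, not part of the original paper; the function and variable names are assumed. It uses the Beta pivot $Z = T/\tau$ and searches for the pair $(a, b)$ that minimizes $1/a - 1/b$ under the coverage constraint.

```python
import numpy as np
from scipy import stats, optimize

def exponential_total_interval(t, n, N, gamma=0.95):
    Z = stats.beta(n, N - n)                 # pivot distribution of Z = T/tau
    tau_hat = (N - 1) * t / n                # maximum likelihood predictor

    def width(a):                            # interval length is T*(1/a - 1/b)
        b = Z.ppf(Z.cdf(a) + gamma)          # choose b so that Pr(a < Z < b) = gamma
        return 1.0 / a - 1.0 / b

    a = optimize.minimize_scalar(width, bounds=(1e-9, Z.ppf(1 - gamma)),
                                 method="bounded").x
    b = Z.ppf(Z.cdf(a) + gamma)
    return tau_hat, (t / b, t / a)

# e.g. sample total t = 12.3 observed for n = 10 out of N = 100 units
print(exponential_total_interval(12.3, 10, 100))
```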
Unfortunately, there are many situations where the quantity of interest, $\tau$, is not a minimal sufficient reduction of $\mathbf{Y}$ for the original model. Consequently, elimination of the nuisance parameter, $\theta$, would be necessary. However, interesting results can be obtained under other kinds of simplifications. In the sequel we discuss a common situation that allows simple solutions.
Suppose that the original parameter has the representation $\theta = (\theta_S, \theta_U)$ and that, for any fixed value of $\theta_U$, $\tau(\mathbf{Y})$ is specific sufficient relative to $\theta_S$ (see Basu (1977)); that is, the nuisance parameter to be eliminated is simply $\theta_U$. In this case, the predictive model depends on the modified parameter $\pi = (\tau, \theta)$ only through $(\tau, \theta_U)$. As before, we represent the maximum likelihood predictor by $\hat\tau$.
A dual situation is when there exists a minimal sufficient reduction of $\mathbf{Y}$ (in relation to $\theta$), $\eta(\mathbf{Y})$, such that $\eta(\mathbf{Y}) = (\tau(\mathbf{Y}), \lambda(\mathbf{Y}))$. The function $l(\eta \mid y_S) = f(y_S \mid \eta)$ may be considered as the predictive likelihood, where $\lambda$ is the nuisance parameter to be eliminated and $\tau$ is the quantity to be predicted. The maximum likelihood predictor in this case is represented by $\tilde\tau$.
The next example is very standard and incorporates both situations described above, besides the fact that there is a choice of $\lambda$ such that $\tau$ and $\lambda$ are statistically independent. Consequently, to obtain the predictor we can use either likelihood, $l(\pi \mid y_S)$ or $l(\eta \mid y_S)$, indifferently, since $\tilde\tau = \hat\tau$.
Example 2.2. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be independent normal random variables with unknown common mean and variance represented by $\theta_S = \mu$ ($\in \mathbb{R}$) and $\theta_U = \sigma^2$ ($\in \mathbb{R}_+$), respectively. Here we take
$$T = Y_1 + \cdots + Y_n, \quad V = Y_1^2 + \cdots + Y_n^2, \quad \tau = Y_1 + \cdots + Y_N, \quad \text{and} \quad \lambda = Y_1^2 + \cdots + Y_N^2.$$
Recall that $\tau$ is specific sufficient in relation to $\mu$. To obtain the first predictive likelihood, $l(\pi \mid y_S)$, we note that $\mathbf{Y}_S \mid \pi$ is distributed as an $n$-variate normal with mean $\frac{\tau}{N} j_n$ and covariance matrix $\sigma^2 \left( I_n - \frac{1}{N} J_n \right)$, where $j_n$ is the $n$-variate vector with all components equal to unity, $I_n$ is the identity matrix of order $n$, and $J_n$ is the square matrix of order $n$ with all components equal to unity. Hence the maximum likelihood predictor of $\tau$ is $\hat\tau = \frac{N}{n} T$. On the other hand, the alternative likelihood is based on the (conditional) distribution of $(T, V) \mid \eta$, since $(T, V)$ is a sufficient reduction (in the original model) of the sample, $\mathbf{Y}_S$, in relation to $(\mu, \sigma^2)$. After some calculations we obtain the likelihood at the point $(t, v)$:
$$l(\eta \mid t, v) \;\propto\; \left( \lambda - v - \frac{(\tau - t)^2}{N - n} \right)^{(N-n-3)/2} \left( \lambda - \frac{\tau^2}{N} \right)^{-(N-3)/2}. \qquad (2.2)$$
The maximum likelihood predictor, $\tilde\tau$, coincides with $\hat\tau = \frac{N}{n} T$. If $\lambda$ is replaced by
$$\lambda^* = \sum_{i=1}^{N} \left( Y_i - \frac{\tau}{N} \right)^2,$$
then we would obtain the same maximum likelihood predictor for $\tau$, since there is a one-to-one correspondence between $(\tau, \lambda)$ and $(\tau, \lambda^*)$. Note that, in the original model, $\tau$ and $\lambda^*$ are statistically independent.
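The claim that $\tilde\tau$ coincides with $\hat\tau = (N/n)T$ can be checked numerically by maximizing (2.2) as reconstructed above. The following sketch, with simulated data and names of our own choosing, is illustrative only:

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)
N, n = 50, 12
y = rng.normal(loc=5.0, scale=2.0, size=n)       # simulated observed sample
t, v = y.sum(), (y ** 2).sum()

def neg_log_lik(eta):                            # minus the log of (2.2)
    tau, lam = eta
    q1 = lam - v - (tau - t) ** 2 / (N - n)      # must stay positive
    q2 = lam - tau ** 2 / N                      # must stay positive
    if q1 <= 0 or q2 <= 0:
        return np.inf
    return -((N - n - 3) / 2 * np.log(q1) - (N - 3) / 2 * np.log(q2))

start = np.array([N * t / n, v * N / n + 1.0])   # a feasible starting point
res = optimize.minimize(neg_log_lik, start, method="Nelder-Mead")
print(res.x[0], N * t / n)                       # the two values should agree
```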
We end this section by noticing that all the discussion presented here can be extended to the case where $Y_1,\ldots,Y_N$ are vectors.
3. Finite population examples
Prediction in finite populations has so far been studied through standard classical sampling theory and the superpopulation model approach. Advantages and restrictions of these two methods have been presented in the literature (see Basu (1969) and Cassel, Särndal, & Wretman (1977) for interesting discussions and for a large list of references). The restrictions on standard sampling theory rest upon the probabilistic model used, which is not related to the quantity of interest. For the superpopulation model, the restrictions stem from the fact that ad hoc methods must be constructed. The problems come from the fact that there is an unknown parameter of no interest that must be estimated in order to use it for the prediction of the unknown quantity of interest. The approach discussed in the present paper eliminates one of the steps by transforming the quantity of interest into a parameter of an alternative model obtained from the original superpopulation model. In this way only standard statistical techniques need to be used.
As in most finite population situations, in this section we consider the population total as the quantity of interest and the sample total as the relevant statistic. Hence, we use the following notation throughout this section:
$$T = Y_1 + \cdots + Y_n \quad \text{and} \quad \tau = Y_1 + \cdots + Y_N.$$
The observed value of $T$ is represented by $t$.
Next we present the predictive likelihood for the exponential family.
Lemma 3.1. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent random quantities with common density function
$$f(y \mid (\theta, \phi)) \;\propto\; \exp\left\{ \frac{\theta y - b(\theta)}{a(\phi)} + c(y, \phi) \right\},$$
where $a(\cdot)$, $b(\cdot)$, and $c(\cdot,\cdot)$ are known functions. The predictive likelihood in this case is
$$l(\tau, \theta, \phi \mid y_S) \;\propto\; l(\tau, \phi \mid t) \;\propto\; \exp\left[ h_1(t, \phi) + h_2(\tau - t, \phi) - h_3(\tau, \phi) \right],$$
where $h_1$, $h_2$, and $h_3$ are known functions.
The proof is simple and relies strongly on the specific sufficiency of $\tau$ (for $\mathbf{Y}$) and $t$ (for $\mathbf{Y}_S$) and on the fact that $t$ and $\tau - t$ are statistically independent in the original model.
In the sequel we present standard examples that will permit the reader to compare with the solutions obtained under the superpopulation approach.
Example 3.2. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent Bernoulli random variables with common unknown parameter (success probability) $\theta$. The predictive likelihood is obtained from the conditional distribution of $T \mid \tau$, which is hypergeometric with parameter $\tau$, population size $N$, and sample size $n$. The maximum likelihood predictor, $\hat\tau$, is an integer that satisfies
$$\frac{(N+1)T}{n} - 1 \;\le\; \hat\tau \;\le\; \frac{(N+1)T}{n}.$$
The construction of the predictive interval with $100\gamma\%$ confidence for $\tau$ presents the same numerical difficulties as the confidence interval for the parameter of the hypergeometric model.
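A minimal sketch of this computation, with names of our own choosing and the hypergeometric conventions of scipy.stats:

```python
import numpy as np
from scipy import stats

def bernoulli_total_predictor(t, n, N):
    taus = np.arange(t, N - n + t + 1)            # feasible population totals
    lik = stats.hypergeom(N, taus, n).pmf(t)      # l(tau | t)
    return taus[np.argmax(lik)]

# an integer in the interval [(N+1)T/n - 1, (N+1)T/n]
print(bernoulli_total_predictor(t=7, n=20, N=100))
```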
Example 3.3. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent Poisson random variables with common unknown mean $\theta$. The predictive likelihood is obtained from the conditional distribution of $T \mid \tau$, which is binomial with parameters $\tau$ and $n/N$. For this binomial model, $\tau$ represents the sample size and $n/N$ the probability of success. The maximum likelihood predictor, $\hat\tau$, is an integer that satisfies
$$\frac{NT}{n} - 1 \;\le\; \hat\tau \;\le\; \frac{NT}{n}.$$
The construction of the predictive set for $\tau$ presents the same numerical difficulties as in the former example.
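An analogous sketch for the Poisson case, scanning integer values of $\tau$; the upper bound of the scan is an arbitrary heuristic of ours:

```python
import numpy as np
from scipy import stats

def poisson_total_predictor(t, n, N, tau_max=None):
    tau_max = tau_max or 20 * N * t // n + 20     # heuristic scan bound
    taus = np.arange(t, tau_max + 1)
    lik = stats.binom(taus, n / N).pmf(t)         # l(tau | t)
    return taus[np.argmax(lik)]

print(poisson_total_predictor(t=15, n=10, N=80))  # close to N*T/n = 120
```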
Example 3.4. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent geometric random variables with common unknown parameter $\theta$, having common mean equal to $(1-\theta)/\theta$. The predictive likelihood is obtained from the conditional distribution of $T \mid \tau$ and is equal to
$$l(\tau \mid t) \;\propto\; \binom{n+t-1}{t} \binom{N-n+\tau-t-1}{\tau-t} \bigg/ \binom{N+\tau-1}{\tau},$$
where $t$ and $\tau$ are integers satisfying $0 \le t \le \tau$. The maximum likelihood predictor, $\hat\tau$, is an integer that satisfies
$$\frac{(N-1)T}{n} - 1 \;\le\; \hat\tau \;\le\; \frac{(N-1)T}{n}.$$
Again, we have the same difficulties to build the predictive set for $\tau$.
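Since the likelihood involves only binomial coefficients, the predictor can be located by a direct scan on the log scale. A sketch with assumed names and an arbitrary grid bound:

```python
import numpy as np
from scipy.special import gammaln

def log_lik(tau, t, n, N):
    # log of C(n+t-1, t) C(N-n+tau-t-1, tau-t) / C(N+tau-1, tau)
    def logC(a, b):
        return gammaln(a + 1) - gammaln(b + 1) - gammaln(a - b + 1)
    return (logC(n + t - 1, t) + logC(N - n + tau - t - 1, tau - t)
            - logC(N + tau - 1, tau))

t, n, N = 9, 5, 40
taus = np.arange(t, 50 * t)                       # arbitrary scan grid
tau_hat = taus[np.argmax(log_lik(taus, t, n, N))]
print(tau_hat, (N - 1) * t / n)                   # argmax vs. the bound (N-1)T/n
```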
Note that the above three examples have in common the fact that the quantity of interest is an integer quantity, which brings the standard difficulties of constructing confidence sets. The following examples are related to continuous random quantities and present nice, simple analytical solutions.
Example 3.5. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent Gamma random variables with common positive parameters $a$ (known) and $\beta$ (unknown), having common mean equal to $a/\beta$. The predictive likelihood is obtained from the distribution of $T \mid \tau$ and is equal to
$$l(\tau \mid t) \;\propto\; \left( 1 - \frac{t}{\tau} \right)^{(N-n)a - 1} \left( \frac{t}{\tau} \right)^{na}.$$
The maximum likelihood predictor is $\hat\tau = \frac{(Na - 1)}{na} T$. To obtain the predictive interval for $\tau$ with confidence $100\gamma\%$, we notice that the pivotal quantity $Z = T/\tau$ has a Beta distribution with parameters $(na, (N-n)a)$. Hence, the predictive interval for $\tau$ is
$$\left( \frac{T}{d},\; \frac{T}{c} \right),$$
where $c$ and $d$ are chosen in such a way that
$$\frac{1}{c} - \frac{1}{d} \;\text{ is minimum subject to } \int_c^d f(z)\, dz = \gamma.$$
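The optimization in the Example 2.1 sketch carries over with the pivot $Z \sim \mathrm{Beta}(na, (N-n)a)$. For brevity, the sketch below (assumed names) computes the simpler equal-tailed variant, which has the stated coverage but is not necessarily the shortest interval:

```python
from scipy import stats

def gamma_total_interval(t, n, N, a, gamma=0.95):
    Z = stats.beta(n * a, (N - n) * a)            # pivot Z = T/tau
    c, d = Z.ppf((1 - gamma) / 2), Z.ppf((1 + gamma) / 2)
    tau_hat = (N * a - 1) * t / (n * a)
    return tau_hat, (t / d, t / c)

print(gamma_total_interval(t=30.0, n=10, N=60, a=2.0))
```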
Example 3.6. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent normal random variables with common unknown mean $\mu$ and known variance $c^2$. The predictive likelihood is obtained from the conditional distribution of $T \mid \tau$, which is normal with mean $\left[\frac{n}{N}\right]\tau$ and variance $\left[1 - \frac{n}{N}\right] n c^2$. Consequently, the maximum likelihood predictor of $\tau$ is $\hat\tau = \frac{N}{n} T$, and the predictive interval for $\tau$ is obtained using the pivotal quantity
$$Z = \frac{T - \frac{n}{N}\tau}{c \sqrt{n \left[ 1 - \frac{n}{N} \right]}},$$
which is distributed as a standard normal variable. The predictive interval for $\tau$ is defined by
$$\hat\tau \;\pm\; N z c \sqrt{\frac{1}{n} - \frac{1}{N}},$$
where $z$ is such that $\Pr(-z < Z < z) = \gamma$.
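A direct transcription of this interval (illustrative only; names are ours):

```python
import numpy as np
from scipy import stats

def normal_total_interval(t, n, N, c, gamma=0.95):
    tau_hat = N * t / n
    z = stats.norm.ppf((1 + gamma) / 2)           # Pr(-z < Z < z) = gamma
    half = N * z * c * np.sqrt(1 / n - 1 / N)
    return tau_hat, (tau_hat - half, tau_hat + half)

print(normal_total_interval(t=52.0, n=10, N=100, c=2.0))
```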
The above examples did not involve any nuisance parameter. The following ones are multiparametric cases.
Example 3.7. This is the continuation of Example 2.2. If the quantity of interest is only the population total, $\tau$, then, in order to obtain the predictive interval, the following pivotal quantity can be used:
$$W = \frac{T - \frac{n}{N}\tau}{S \sqrt{n \left[ 1 - \frac{n}{N} \right]}}, \qquad \text{where } S^2 = \frac{1}{n-1}\left[ V - \frac{T^2}{n} \right].$$
Since the distribution of $W$ is Student $t$ with $n-1$ degrees of freedom, the predictive interval for $\tau$ is defined by
$$\hat\tau \;\pm\; N w S \sqrt{\frac{1}{n} - \frac{1}{N}},$$
where $w$ is such that $\Pr(-w < W < w) = \gamma$.
Note that this last example produces formulas that are equal to those obtained by using the classical sampling techniques, which are not based on any superpopulation model.
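A sketch of the Student $t$ interval computed from raw data $y$ (our names; this reproduces the classical survey sampling formula, as noted above):

```python
import numpy as np
from scipy import stats

def normal_total_t_interval(y, N, gamma=0.95):
    n = len(y)
    t, v = y.sum(), (y ** 2).sum()
    s = np.sqrt((v - t ** 2 / n) / (n - 1))       # sample standard deviation S
    tau_hat = N * t / n
    w = stats.t.ppf((1 + gamma) / 2, df=n - 1)    # Pr(-w < W < w) = gamma
    half = N * w * s * np.sqrt(1 / n - 1 / N)
    return tau_hat, (tau_hat - half, tau_hat + half)

y = np.random.default_rng(1).normal(10.0, 3.0, size=15)
print(normal_total_t_interval(y, N=120))
```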
Example 3.8. If in Example 3.7 we also consider the population variance as a quantity of interest, then the second predictive likelihood (expression (2.2)) of Example 2.2 is the one to be used. Suppose that the population variance is defined by
$$\phi = \frac{1}{N-3} \sum_{i=1}^{N} \left( Y_i - \frac{\tau}{N} \right)^2$$
(for simplicity we consider $N-3$ in place of $N$). The maximum likelihood predictor of $\phi$ is given by
$$\hat\phi = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \frac{T}{n} \right)^2 \;-\; \frac{1}{n} \left( \frac{\hat\tau^2}{N} - \frac{T^2}{n} - \frac{(\hat\tau - T)^2}{N-n} \right)$$
if this expression is positive, and zero otherwise. This predictor can receive interesting interpretations, since it is a function of the sample variance corrected by a function of the population total, the sample total, and the total of the unobserved part of the population. It may not be simple to decide which pivotal quantity should be used to obtain a predictive interval for the population variance. However, a good start could be the fact that the ratio between
$$\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \frac{T}{n} \right)^2 \quad \text{and} \quad \frac{1}{N-n} \sum_{i=n+1}^{N} \left( Y_i - \frac{\tau - T}{N-n} \right)^2$$
is a pivotal quantity with a well-known distribution.
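For concreteness, the displayed predictor $\hat\phi$ can be transcribed as follows (a sketch of the formula as reconstructed above, with names of our own choosing; the truncation at zero follows the text):

```python
import numpy as np

def variance_predictor(y, N):
    n = len(y)
    t = y.sum()
    tau_hat = N * t / n
    s2 = ((y - t / n) ** 2).mean()                # sample variance term
    correction = (tau_hat ** 2 / N - t ** 2 / n
                  - (tau_hat - t) ** 2 / (N - n)) / n
    return max(s2 - correction, 0.0)              # truncated at zero

y = np.random.default_rng(2).normal(4.0, 1.5, size=20)
print(variance_predictor(y, N=200))
```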
Example 3.9. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent normal random variables with known common variance $c^2$ and with the mean of $Y_i$, for $i = 1, 2, \ldots, N$, equal to $\beta_0 + \beta_1 x_i$, where $x_i$ is a fixed value of a covariate $x$ and $\beta_0$ and $\beta_1$ are unknown parameters. In this case, $\tau$ is specific sufficient for $\beta_0$. If the only quantity of interest is $\tau$, then we use the likelihood obtained from the conditional distribution of $y_S \mid \tau, \beta_1$, which is $n$-variate normal with mean equal to
$$\frac{\tau}{N} j_n + \beta_1 (x - \bar{x}_N j_n)$$
and variance equal to
$$c^2 \left( I_n - \frac{1}{N} J_n \right),$$
where $j_n$, $I_n$, and $J_n$ are defined as in Example 2.2, $x = (x_1,\ldots,x_n)$, and
$$\bar{x}_N = \frac{1}{N} \sum_{i=1}^{N} x_i$$
is the population mean of the $x_i$'s (analogously, $\bar{x}_n$ is the sample mean of the $x_i$'s).
The maximum likelihood predictors of $\beta_1$ and $\tau$ are, respectively,
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_N)\left( Y_i - \frac{T}{n} \right)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}$$
and
$$\hat\tau = \frac{N}{n} T - N \hat\beta_1 (\bar{x}_n - \bar{x}_N). \qquad (3.1)$$
To obtain the predictive interval for $\tau$ we use the following standard normal pivotal quantity:
$$\frac{\hat\tau - \tau}{N c \sqrt{\dfrac{R(x)}{n} - \dfrac{1}{N}}},$$
where
$$R(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_N)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}.$$
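A sketch of the predictor and interval of this example (our names; x_pop holds all $N$ covariate values, the first $n$ being the sampled units):

```python
import numpy as np
from scipy import stats

def regression_total_interval(y, x_pop, n, c, gamma=0.95):
    N = len(x_pop)
    x = x_pop[:n]                                 # covariates of sampled units
    xbar_n, xbar_N = x.mean(), x_pop.mean()
    t = y.sum()
    beta1_hat = ((x - xbar_N) * (y - t / n)).sum() / ((x - xbar_n) ** 2).sum()
    tau_hat = N * t / n - N * beta1_hat * (xbar_n - xbar_N)   # formula (3.1)
    R = ((x - xbar_N) ** 2).sum() / ((x - xbar_n) ** 2).sum()
    z = stats.norm.ppf((1 + gamma) / 2)
    half = N * c * z * np.sqrt(R / n - 1 / N)
    return tau_hat, (tau_hat - half, tau_hat + half)

rng = np.random.default_rng(3)
x_pop = rng.uniform(0, 10, size=80)
y = 1.0 + 0.5 * x_pop[:25] + rng.normal(0, 2.0, size=25)
print(regression_total_interval(y, x_pop, n=25, c=2.0))
```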
Example 3.10. In Example 3.9, suppose that, besides $\tau$,
$$\rho = \sum_{i=1}^{N} Y_i x_i$$
is also a quantity of interest. The vector $(\tau, \rho)$ is a minimal sufficient statistic for the original model in relation to $(\beta_0, \beta_1)$. Writing $Z = \sum_{i=1}^{n} Y_i x_i$, and since $(T, Z)$ is a minimal sufficient reduction of the sample, the predictive likelihood may be obtained from the conditional distribution of $(T, Z) \mid (\tau, \rho)$, which is also normal. To describe the mean and the variance of this distribution we introduce the following notation:
$$s_n = \frac{1}{n} \left( x_1^2 + \cdots + x_n^2 \right), \qquad s_N = \frac{1}{N} \left( x_1^2 + \cdots + x_N^2 \right),$$
$$\Sigma_{11} = \Sigma_{12} = n \begin{pmatrix} 1 & \bar{x}_n \\ \bar{x}_n & s_n \end{pmatrix}, \qquad \Sigma_{22} = N \begin{pmatrix} 1 & \bar{x}_N \\ \bar{x}_N & s_N \end{pmatrix}, \qquad \text{and} \qquad \Sigma = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{12}.$$
The conditional distribution of $(T, Z) \mid (\tau, \rho)$ is normal with mean $\mu$ and variance $\Sigma$, where
$$\mu = \frac{n}{N(s_N - \bar{x}_N^2)} \begin{pmatrix} s_N - \bar{x}_N \bar{x}_n & \bar{x}_n - \bar{x}_N \\ \bar{x}_n s_N - \bar{x}_N s_n & s_n - \bar{x}_N \bar{x}_n \end{pmatrix} \begin{pmatrix} \tau \\ \rho \end{pmatrix} = B \begin{pmatrix} \tau \\ \rho \end{pmatrix}.$$
Clearly, the maximum likelihood predictor of $(\tau, \rho)$ is given by $B^{-1}(T, Z)'$. It is not difficult to check that the first component of this vector coincides with expression (3.1). Also, by a proper standard transformation, as we present in the next example, we obtain the pivotal quantity that will produce the predictive region for $(\tau, \rho)$.
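Assuming the matrix $B$ as reconstructed above (which we verified against a direct conditional-expectation calculation in a small numerical case), the joint predictor is a two-line computation; the sketch and its names are illustrative only:

```python
import numpy as np

def joint_predictor(y, x_pop, n):
    N = len(x_pop)
    x = x_pop[:n]
    xn, xN = x.mean(), x_pop.mean()
    sn, sN = (x ** 2).mean(), (x_pop ** 2).mean()
    T, Z = y.sum(), (y * x).sum()
    B = (n / (N * (sN - xN ** 2))) * np.array(
        [[sN - xN * xn, xn - xN],
         [xn * sN - xN * sn, sn - xN * xn]])
    return np.linalg.solve(B, np.array([T, Z]))   # (tau_hat, rho_hat)

rng = np.random.default_rng(4)
x_pop = rng.uniform(0, 5, size=60)
y = 2.0 - 0.8 * x_pop[:20] + rng.normal(0, 1.0, size=20)
print(joint_predictor(y, x_pop, n=20))
```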
Except for the last example, we have been working with univariate random variables. We end this section with a multivariate normal distribution where the quantity of interest is the vector of population totals.
Example 3.11. Let $Y_1,\ldots,Y_N$, the elements of $\mathbf{Y}$, be statistically independent normal random vectors of order $k$ ($> 1$) with unknown common mean vector $\mu$ and known common covariance matrix $\Sigma$. The population total, $\tau$, which is also a vector of order $k$, is a minimal sufficient reduction of $\mathbf{Y}$ with respect to $\mu$. The predictive likelihood can be obtained from the conditional distribution of the sample total $T$ given $\tau$. This distribution is multivariate normal with mean vector $\left[\frac{n}{N}\right]\tau$ and covariance matrix $\left[1 - \frac{n}{N}\right] n \Sigma$. Hence, the maximum likelihood predictor of $\tau$ is $\hat\tau = \frac{N}{n} T$ and the predictive region for $\tau$ is given by
$$\left\{ \tau : \frac{N}{(N-n)n} \left( T - \frac{n}{N}\tau \right)' \Sigma^{-1} \left( T - \frac{n}{N}\tau \right) \le \chi^2 \right\},$$
where $\chi^2$ is the value of a chi-squared variable with $k$ degrees of freedom that gives $100\gamma\%$ confidence.
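A membership test for the chi-squared predictive region (illustrative, with assumed names; at $\tau = (N/n)T$ the quadratic form is zero, so that point always lies inside):

```python
import numpy as np
from scipy import stats

def in_predictive_region(T, tau, Sigma, n, N, gamma=0.95):
    k = len(T)
    d = T - (n / N) * tau
    q = (N / ((N - n) * n)) * d @ np.linalg.solve(Sigma, d)
    return q <= stats.chi2.ppf(gamma, df=k)

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
T = np.array([30.0, 18.0])
print(in_predictive_region(T, np.array([150.0, 90.0]), Sigma, n=10, N=50))
```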
4. Final remarks
To obtain the maximum likelihood predictors in the discrete cases, we compared the likelihood ratios $l(\tau \mid t)/l(\tau - 1 \mid t)$ and $l(\tau \mid t)/l(\tau + 1 \mid t)$ with unity. For the continuous cases, since we presented only smooth likelihoods, we solve the equation obtained by setting the partial derivative of the log-likelihood equal to zero.
It is important that we understand the relevant role that sufficiency and
specific sufficiency play in the method discussed in this article. In order to
illustrate this role, let us consider two simple cases.
Again, consider the population total, $\tau$, as the quantity of interest. First we go back to Examples 2.2 and 3.7, where the original model is normal with mean $\mu$ and variance $\sigma^2$. Suppose that we receive additional information saying that the mean is known to be zero. The model now has only one unknown parameter, $\sigma^2$. The predictive likelihood based on the conditional distribution of the sample given the population total is exactly the same as before, $l(\pi \mid y_S)$. Hence the information that the parameter $\mu$, of no interest, is known to be zero would not improve the prediction of $\tau$. We believe that this is not reasonable, since $\tau$ and $\mu$ are strongly related.
The second case considered here is the regression case of Examples 3.9 and 3.10. The information that the intercept parameter, $\beta_0$, is null also would not change the predictive likelihood, and no improvement in the prediction is attained from such important information. Using Bayesian methods, where all relevant information is processed, this problem would not occur (Datta & Ghosh (1991)).
We end this article by emphasizing that the method discussed here is based on a proper model that is a consequence of standard assumptions. Nowhere were additional assumptions or restrictions imposed. The real question to be discussed is an old one: what is the likelihood function? We believe that Bayesians would answer this question by saying that, after elimination of nuisance parameters by integration, $l(\tau \mid y_S)$ is the correct likelihood function.
(Received January 1993. Revised September 1993.)
References
Bayarri, M.J., De Groot, M.H. and Kadane, J.B. (1986). What is the likelihood function? In: Gupta, S.S. and Berger, J.O. (Eds.), Proceedings of the Fourth Purdue Symposium on Statistical Decision Theory and Related Topics. Springer-Verlag, New York, pp. 3–27.
Basu, D. (1969). Role of sufficiency and likelihood principles in sample survey theory. Sankhyā A, 31, 441–54.
Basu, D. (1975). Statistical information and likelihood. Sankhyā A, 37, 1–71.
Basu, D. (1977). On the elimination of nuisance parameters. JASA, 72, 355–66.
Bjørnstad, J.F. (1990). Predictive likelihood: a review (with discussion). Statistical Science, 5, 242–65.
Butler, R.W. (1986). Predictive likelihood inference with applications (with discussion). JRSS, B 48, 1–38.
Cassel, C.M., Särndal, C.E., and Wretman, J.H. (1977). Foundations of Inference in Survey Sampling. John Wiley, New York.
Datta, G.S. and Ghosh, M. (1991). Bayesian prediction in linear models: applications to small area estimation. The Annals of Statistics, 19(4), 1748–1770.
Davison, A.C. (1986). Approximate predictive likelihood. Biometrika, 73, 323–32.
De Finetti, B. (1977). Probabilities of probabilities: a real problem or a misunderstanding? In: Aykaç, A. and Brumat, C. (Eds.), New Developments in the Applications of Bayesian Methods. North-Holland, Amsterdam, pp. 1–10.
Faulkenberry, G.D. (1973). A method of obtaining prediction intervals. JASA, 68, 433–5.
Hahn, G.J. (1969). Factors for calculating two-sided prediction intervals for samples from a normal distribution. JASA, 64, 878–88.
Hinkley, D.V. (1979). Predictive likelihood. Ann. Statist., 7, 718–28.
Lauritzen, S.L. (1974). Sufficiency, prediction and extreme models. Scand. J. Statist., 1, 128–34.
Lawless, J.F. (1972). On prediction intervals for samples from the exponential distribution and prediction limits for system survival. Sankhyā B, 34, 1–14.
Royall, R.M. (1968). An old approach to finite population sampling theory. JASA, 63, 1269–79.
Royall, R.M. (1971). Linear regression models in finite population sampling theory. In: Godambe, V.P. and Sprott, D.A. (Eds.), Foundations of Statistical Inference. Holt, Rinehart and Winston, Toronto, pp. 259–74.
Thatcher, A.R. (1964). Relationships between Bayesian and confidence limits for prediction. JRSS, B 26, 176–92.
Vit, P. (1973). Interval prediction for a Poisson process. Biometrika, 60, 667–8.