On applications of marginal models for categorical data

METRON - International Journal of Statistics
2004, vol. LXII, n. 1, pp. 15-37
TAMÁS RUDAS, WICHER P. BERGSMA
Summary - The paper considers marginal models for categorical data and after review-
ing the most important theoretical results concerning the definition, estimation and
testing of such models, discusses a number of common statistical problems. These
examples include, among others, the analysis of repeated measurements, panel studies
and missing data. Fitting marginal models in these cases has the potential of pro-
viding the researcher with substantial new insight. The examples illustrate that the
marginal modeling approach may be used more widely than thought before. One of
the examples shows how graphical models associated with directed acyclic graphs can
be parameterized. A general algorithm is presented to compute maximum likelihood
estimates under marginal models.
Key Words - Graphical models; Log-linear models; Marginal models; Maximum
likelihood estimation; Missing data; Repeated measurements.
1. Introduction
During the past decade, a fair number of papers have applied marginal models to medical (Balagtas, Becker, Lang, (1995), Molenberghs, Lesaffre, (1999)) and sociological (Becker, (1994), Becker, Minick, Yang, (1998)) data, alongside papers exploring components of the theory of marginal modeling (Lang, Agresti, (1994), Glonek, McCullagh, (1995), Bergsma, (1997), Lang, McDonald, Smith, (1999), Colombi, Forcina, (2001), Bartolucci, Forcina, Dardanoni, (2001), Bartolucci, Forcina, (2002), Bergsma, Rudas, (2002a), Bergsma, Rudas, (2002b)). In its most general form, a marginal model, when applied to a multivariate statistical problem, imposes structural restrictions on certain marginals (i.e., subsets of the original variables). When the variables are categorical, the models for the marginals are usually of the
log-linear or of the log-affine type. Such models are most conveniently formulated by restricting the values of appropriately defined parameters. Therefore, the existence, flexibility and interpretability of marginal models depend largely on the parameters that are used to formulate the model.

(Received October 2003 and revised December 2003.)
The present paper, based on recent theoretical developments (Bergsma, Rudas, (2002a)), illustrates the applicability of a large class of marginal models to a variety of statistical problems. This class of models is based on restricting the values of certain marginal parameters of the joint distribution in a contingency table. This flexible class of parameters generalizes earlier approaches to defining marginal parameters (Glonek, McCullagh, (1995), Glonek, (1996), Kauermann, (1997)). Certain combinatorial properties of the marginals involved imply smoothness of the parameterization and variation independence of its components. These properties are essential for interpretation, and they imply both the existence of a large class of models and the applicability of standard large sample theory to estimation and testing.
The present paper contains almost no proofs. Section 2 gives a somewhat
informal exposition of the theory referred to above and Section 3 considers
applications of marginal models to a number of common statistical problems.
These include measuring the effect of a treatment, panel studies and Markov
chains, data fusion, missing data, joint treatment of the sampling and statistical
models, and graphical models. It is not the goal of the present paper to explore the analyses made possible by the marginal approach in any depth; rather, the aim is to illustrate that fitting marginal models and interpreting carefully defined parameters may yield new insight into the above problems and, often, appears to be the appropriate strategy. Finally, Section 4 describes
an algorithm to fit marginal log-linear or log-affine models. Much of the
theory and applications discussed in the present paper extend in a natural way
to problems involving continuous data, but these generalizations will not be
considered here.
2. Theory
The class of marginal models applied in this paper is based on marginal log-linear parameters. These are obtained as ordinary log-linear parameters (Agresti, (1990)), but they are computed from a marginal of the contingency table rather than from the entire table. A marginal log-linear parameter is therefore characterized by two subsets of the variables: the one to which we first marginalize, and a subset of it to which the parameter applies. For example, for variables $A, B, C, D$, $\lambda^{ABC}_{i**}$ is a marginal log-linear parameter. The marginal to which it pertains is $ABC$, as shown in the superscript. Within the $ABC$ marginal, the parameter represents the log-linear effect of category $i$ of variable $A$. Note that the ordinary log-linear parameter of variable $A$ in category $i$ is $\lambda^{A}_{i}$, which, as a marginal log-linear parameter, is denoted by $\lambda^{ABCD}_{i***}$, as it is computed from the entire $ABCD$ table, not from a marginal of it. In this paper, only marginal log-linear parameters are considered, that is, the superscript always refers to the marginal from which the parameter is computed.
The usual log-linear parameters can be interpreted (Bishop, Fienberg, Holland, (1975)) as measuring the average conditional association among the variables involved: the association is computed conditionally on all other variables and then averaged over all possible categories of the conditioning variables. Every marginal log-linear parameter pertains to a certain marginal, and the average conditional association is measured within this marginal. For example, when all the variables are binary,
$$\lambda^{ABC}_{1**} = \frac{1}{4}\sum_{j}\sum_{k}\log\left(\frac{p_{1jk+}}{p_{2jk+}}\right)^{1/2},$$
where $p_{ijkl}$ is a cell probability, either theoretical, observed or estimated, and "$+$" is a marginalization operator. That is, $\lambda^{ABC}_{1**}$ is related to the average conditional log odds of category 1 of $A$ versus category 2, conditioned on and averaged over all categories of $B$ and $C$, in the $ABC$ marginal of the $ABCD$ table. Similarly,
$$\lambda^{ABC}_{11*} = \frac{1}{2}\sum_{k}\log\left(\frac{p_{11k+}\,p_{22k+}}{p_{12k+}\,p_{21k+}}\right)^{1/4},$$
that is, the marginal log-linear parameter $\lambda^{ABC}_{ij*}$ of the $ABCD$ table is related to the average conditional log odds ratio between $A$ and $B$, conditioned on and averaged over $C$, after marginalization over $D$. Throughout the paper, positivity of the cell frequencies is assumed.
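For binary variables both formulas are instances of one effect-coded average: take logs of the marginal table, multiply each cell by $+1$ or $-1$ for every effect variable according to whether it is in category 1 or 2, and divide the sum by the number of marginal cells. The Python sketch below (NumPy assumed; the helper name `marginal_loglinear` and the convention that array index 0 stands for category 1 are our own) computes a marginal log-linear parameter this way and checks it against the two displayed formulas on a random positive $2\times2\times2\times2$ table.

```python
import itertools

import numpy as np


def marginal_loglinear(table, marginal, effect):
    """Sketch: marginal log-linear parameter for binary variables.

    table    -- strictly positive array of shape (2, ..., 2); index 0 = category 1
    marginal -- axes forming the marginal, e.g. (0, 1, 2) for ABC in ABCD
    effect   -- subset of `marginal` whose indices are fixed at category 1
    """
    marginal = tuple(sorted(marginal))
    # Sum out every axis that is not part of the marginal ("+" in the text).
    dropped = tuple(ax for ax in range(table.ndim) if ax not in marginal)
    log_marg = np.log(table.sum(axis=dropped))
    total = 0.0
    for cell in itertools.product((0, 1), repeat=len(marginal)):
        # Effect coding: each effect variable in category 2 flips the sign.
        flips = sum(cell[pos] for pos, ax in enumerate(marginal) if ax in effect)
        total += (-1) ** flips * log_marg[cell]
    return total / 2 ** len(marginal)


# Cross-check against the two formulas above on a random positive ABCD table.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2)) + 0.1
p /= p.sum()
p_abc = p.sum(axis=3)  # p_{ijk+}

lam_1 = 0.25 * sum(np.log((p_abc[0, j, k] / p_abc[1, j, k]) ** 0.5)
                   for j in range(2) for k in range(2))
lam_11 = 0.5 * sum(np.log((p_abc[0, 0, k] * p_abc[1, 1, k]
                           / (p_abc[0, 1, k] * p_abc[1, 0, k])) ** 0.25)
                   for k in range(2))

print(np.isclose(marginal_loglinear(p, (0, 1, 2), (0,)), lam_1))     # lambda^{ABC}_{1**}
print(np.isclose(marginal_loglinear(p, (0, 1, 2), (0, 1)), lam_11))  # lambda^{ABC}_{11*}
```

Both checks should print True. The sign pattern is exactly the $\pm 1$ structure hidden in the odds and odds-ratio formulas; variables with more than two categories would need a general sum-to-zero coding, which this sketch does not attempt.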
For convenience, the subset in the superscript of a marginal log-linear parameter will be called the marginal, and the set of variables whose indices appear in the subscript (rather than $*$) will be called the effect measured by the parameter.
The marginal parameters defined here include the ordinary log-linear pa-
rameters (those parameters which have the set of all variables as the relevant
marginal), the multivariate logistic parameters of Glonek, McCullagh (1995)
(those parameters for which the effect variables coincide with the marginal
variables) and a mixture of these considered by Glonek (1996).
The marginal log-linear parameters can be used in several different ways to
parameterize the distribution on a contingency table. The parameter selection
can be done in two steps. The substantive problem at hand determines which
marginals of the contingency table are of interest. In the first step, arrange
these marginals in a hierarchical ordering, i.e. in such a way that no marginal
contains one which comes later in the sequence. It is easy to see that such an
ordering always exists. If a certain rule is followed when selecting the subsets that are effects within the marginals, then, as will be seen later, the resulting parameters will have desirable properties. This rule says that, within each marginal, a subset may be included as an effect only if it is not a subset of any of the previous marginals. Marginal log-linear parameters defined by this rule are called hierarchical.
For example, if for three variables $A, B, C$ the marginals of interest are $AB$ and $BC$, a hierarchical ordering is
$$AB,\ BC.$$
Then, in the second step, for the $AB$ marginal, the marginal log-linear parameters may pertain to the effects $\emptyset, A, B, AB$, and for the $BC$ marginal, the effects can be $C$ and $BC$. Thus, one possible set of hierarchical marginal log-linear parameters implied by the above ordering of the relevant marginals contains the following parameters:
$$\lambda^{AB}_{**},\ \lambda^{AB}_{i*},\ \lambda^{AB}_{ij},\ \lambda^{BC}_{jk}.$$
The following are also hierarchical marginal log-linear parameters, based on the same ordering:
$$\lambda^{AB}_{**},\ \lambda^{AB}_{i*},\ \lambda^{AB}_{*j},\ \lambda^{BC}_{*k},\ \lambda^{BC}_{jk}.$$
Notice, however, that the above parameters are not complete, i.e., they do not constitute a parameterization (see below). If the other hierarchical ordering of the relevant marginals,
$$BC,\ AB,$$
is selected, the resulting parameters may be as follows:
$$\lambda^{BC}_{**},\ \lambda^{BC}_{j*},\ \lambda^{BC}_{*k},\ \lambda^{BC}_{jk},\ \lambda^{AB}_{i*},\ \lambda^{AB}_{ij}.$$
As is seen above, the possible choices of the hierarchical marginal log-linear parameters depend on the ordering of the relevant marginals. For example, $\lambda^{BC}_{j*}$ is allowed in the second ordering but not in the first one. Note that hierarchy of these parameters refers to a property of the ordering of the marginals that determines which parameters are allowed, not to the choice of the effects within the marginals (as is the case with classical hierarchical log-linear models).
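The hierarchy rule is mechanical enough to enumerate. The sketch below (plain Python; the function name `allowed_effects` and the use of single-letter variable names are illustrative assumptions) lists, for a given ordering of marginals, the effects each marginal may carry. An effect is written as a string of variable letters, with the empty string standing for the constant term $\lambda_{**}$.

```python
from itertools import combinations


def effects_of(marginal):
    """All subsets of a marginal, as sorted strings ('' is the empty effect)."""
    letters = sorted(marginal)
    return ["".join(c) for r in range(len(letters) + 1)
            for c in combinations(letters, r)]


def allowed_effects(ordering):
    """Effects permitted in each marginal by the hierarchy rule (a sketch).

    An effect may appear in a marginal only if it is not a subset of any
    earlier marginal in the ordering.
    """
    earlier = []
    allowed = {}
    for marginal in ordering:
        allowed[marginal] = {e for e in effects_of(marginal)
                             if not any(set(e) <= set(prev) for prev in earlier)}
        earlier.append(marginal)
    return allowed


first = allowed_effects(["AB", "BC"])
second = allowed_effects(["BC", "AB"])
print(sorted(first["BC"]))   # effects allowed in BC when AB comes first
print(sorted(second["BC"]))  # effects allowed in BC when it comes first
print("B" in first["BC"], "B" in second["BC"])
```

The last line shows that the effect $B$ within the $BC$ marginal, i.e. $\lambda^{BC}_{j*}$, is permitted only in the second ordering. The function only decides which effects are admissible; which of them to actually include remains the analyst's choice.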
A set of hierarchical marginal log-linear parameters can be completed to a parameterization of the distribution on the contingency table. To do so, the entire set of variables is added as the last marginal in the hierarchical ordering, and every effect not yet present is included as a new parameter in the first marginal that contains it (Bergsma, Rudas, (2002a)). For example, the second set of parameters above can be completed as
$$\lambda^{AB}_{**},\ \lambda^{AB}_{i*},\ \lambda^{AB}_{*j},\ \lambda^{AB}_{ij},\ \lambda^{BC}_{*k},\ \lambda^{BC}_{jk},\ \lambda^{ABC}_{i*k},\ \lambda^{ABC}_{ijk}.$$
Note that, for simplicity, parameterization refers here to the parameterization of a frequency (rather than probability) distribution, and each parameter stands for all its linearly independent choices of the indices; e.g., for binary variables every parameter in the above list refers to one value. To obtain a parameterization of a probability distribution, the overall (constant) effect, i.e., $\lambda^{AB}_{**}$ above, has to be omitted.
The ordinary log-linear parameterization and the one based on the mul-
tivariate logistic transform (Glonek, McCullagh, (1995)) are both hierarchical
marginal log-linear parameterizations.
Now certain desirable properties of marginal log-linear parameters and
of the statistical models derived from them will be studied. Note that these
properties extend to the hierarchical marginal log-linear parameterization if
parameters with these properties are completed by the above procedure to yield
a parameterization.
The properties studied will be smoothness of parameters, variation in-
dependence, existence of log-linear or log-affine marginal models defined by
restricting a set of parameters and the applicability of standard large sample
theory to these models. Proofs of the results to follow can be found in Bergsma,
Rudas (2002a).
Smoothness of the parameters considered essentially means a one-to-one
and differentiable correspondence between the vector of parameters and the
vector of cell probabilities. Smoothness is important in the interpretation of
the parameters and in studying the dimension of a model which, in turn, is
crucial for testing the fit of the model.
Theorem 1. Hierarchical marginal log-linear parameters are smooth for strictly
positive frequency distributions on the contingency table.
The above result establishes only a sufficient condition for smoothness of marginal log-linear parameters, but it can be shown that if the same effect appears among marginal log-linear parameters within different marginals, then these parameters cannot be smooth.
The next property to consider is variation independence of the parameters.
Variation independence means that the joint range of the parameters is the
Cartesian product of the separate ranges of the parameters involved. Lack
of variation independence may lead to the definition of non-existing (empty)
models and makes the separate interpretation of the parameters misleading. To
illustrate the importance of variation independence of the parameters, consider
the following marginal log-linear parameters and their prescribed values for
three binary variables:
$$\lambda^{A}_{*} = \log 8,\quad \lambda^{A}_{1} = 0,\quad \lambda^{B}_{1} = 0,\quad \lambda^{C}_{1} = 0,$$
$$\lambda^{AB}_{11} = \tfrac{1}{4}\log(1/9),\quad \lambda^{AC}_{11} = \tfrac{1}{4}\log(1/9),\quad \lambda^{BC}_{11} = \tfrac{1}{4}\log(1/9).$$
The above prescribed values are all within the ranges of the respective parameters. In spite of this, these values are not within the joint range of the parameters, that is, no distribution has these parameters. To see this, notice that the prescriptions imply that there are 8 observations, that the one-way marginals are uniform $(4, 4)$, and that all three two-way marginals have an odds ratio equal to $1/9$. This completely specifies the two-way marginals of the table (each has cell counts $1, 3$ in its first row and $3, 1$ in its second), but they are not compatible: there is no (non-negative) three-way table with these two-way marginals. This can be seen either by deriving a contradiction from the assumptions or by considering the correlation matrix and establishing that it is not positive definite. The parameters involved in this case are not variation independent, and the prescriptions above specify a non-existent distribution, i.e., they define an empty model. Note that, for this example of a contradiction, neither the specification of the value of $\lambda^{A}_{*}$ nor the multipliers $1/4$ were necessary.
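Whether three prescribed two-way odds ratios fit together can be checked mechanically. The sketch below (NumPy; helper names `symmetric_2x2` and `positive_extension` are hypothetical) rebuilds each two-way marginal from its odds ratio, assuming uniform $(4, 4)$ one-way marginals and 8 observations. It then uses the fact that once the count in the first cell, $x$, is fixed, the two-way margins determine every other cell of the $2\times2\times2$ table, so each cell is $c \pm x$. A strictly positive table exists exactly when some $x$ keeps all eight cells positive.

```python
import numpy as np


def symmetric_2x2(odds_ratio, n=8.0):
    """2x2 table with both one-way margins (n/2, n/2) and the given odds ratio."""
    half = n / 2
    root = np.sqrt(odds_ratio)
    a = half * root / (1 + root)  # solves a**2 / (half - a)**2 == odds_ratio
    return np.array([[a, half - a], [half - a, a]])


def positive_extension(n_ab, n_ac, n_bc):
    """Return a strictly positive 2x2x2 table with these two-way margins, or None.

    Axes are (A, B, C); index 0 = category 1. Once x = n[0, 0, 0] is fixed,
    every other cell has the form c + s * x with slope s = +1 or -1.
    """
    cells = {
        (0, 0, 0): (0.0, 1),
        (0, 0, 1): (n_ab[0, 0], -1),
        (0, 1, 0): (n_ac[0, 0], -1),
        (0, 1, 1): (n_ab[0, 1] - n_ac[0, 0], 1),
        (1, 0, 0): (n_bc[0, 0], -1),
        (1, 0, 1): (n_bc[0, 1] - n_ab[0, 0], 1),
        (1, 1, 0): (n_bc[1, 0] - n_ac[0, 0], 1),
        (1, 1, 1): (n_bc[1, 1] - n_ab[0, 1] + n_ac[0, 0], -1),
    }
    lower = max(-c for c, s in cells.values() if s == 1)  # need c + x > 0
    upper = min(c for c, s in cells.values() if s == -1)  # need c - x > 0
    if lower >= upper:
        return None  # no x makes every cell positive
    x = (lower + upper) / 2
    table = np.zeros((2, 2, 2))
    for idx, (c, s) in cells.items():
        table[idx] = c + s * x
    # Only some margins were used above; confirm all three are reproduced.
    if not (np.allclose(table.sum(axis=2), n_ab)
            and np.allclose(table.sum(axis=1), n_ac)
            and np.allclose(table.sum(axis=0), n_bc)):
        return None
    return table


or_ninth = symmetric_2x2(1 / 9)  # cells 1, 3 / 3, 1
print(positive_extension(or_ninth, or_ninth, or_ninth))
```

With all three odds ratios equal to $1/9$ the function returns None. With odds ratios $1/9$, $1/9$ and $9$ it returns a positive table (the midpoint $x = 0.5$). The sign pattern of the three associations, and not only their size, is what makes a prescription contradictory.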
To see how the lack of variation independence makes the separate interpretation of the parameters invalid, consider a simple $2\times 2$ treatment-by-outcome experiment in two groups, say, men and women. Suppose the following data are observed:
Men
Outcome   Treatment   Control
good      10          5
bad       40          45

Women
Outcome   Treatment   Control
good      30          20
bad       20          30
If the treatment effect is measured by the difference between the proportions of good outcomes among the treated and among the controls, then this measure is .1 for men and .2 for women. Is the treatment then twice as effective for women as for men, as the numerical values suggest? The answer is, of course, not necessarily, because, given the marginals, the maximum value of this measure is .3 for men and 1 for women. That is, the treatment is one third as useful for men, and only one fifth as useful for women, as it could be. The measure of treatment effect used here is not variation independent of the marginal distributions; therefore, it lacks calibration and cannot be interpreted without paying attention to the other parameters.
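The arithmetic behind the calibration argument can be reproduced exactly. In the sketch below (standard-library `fractions`; the function names are hypothetical), the maximum attainable difference with fixed margins is obtained by placing as many good outcomes as possible in the treatment column. The difference in proportions increases with that count, which cannot exceed either the number of good outcomes or the number treated.

```python
from fractions import Fraction


def proportion_difference(table):
    """P(good | treatment) - P(good | control).

    Rows are (good, bad); columns are (treatment, control).
    """
    (good_t, good_c), (bad_t, bad_c) = table
    return Fraction(good_t, good_t + bad_t) - Fraction(good_c, good_c + bad_c)


def max_proportion_difference(table):
    """Largest difference attainable with the same row and column margins."""
    (good_t, good_c), (bad_t, bad_c) = table
    n_t, n_c = good_t + bad_t, good_c + bad_c
    good = good_t + good_c
    # The difference grows with the number of good outcomes placed in the
    # treatment column, which is at most min(good, n_t).
    best_t = min(good, n_t)
    return Fraction(best_t, n_t) - Fraction(good - best_t, n_c)


men = [[10, 5], [40, 45]]
women = [[30, 20], [20, 30]]
for label, table in (("men", men), ("women", women)):
    d, d_max = proportion_difference(table), max_proportion_difference(table)
    print(label, d, d_max, d / d_max)
```

The last column gives the fractions 1/3 for men and 1/5 for women quoted in the text. Exact fractions avoid any floating-point rounding in the comparison.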
To assure variation independence of hierarchical marginal log-linear parameters, a generalization of the classical decomposability concept is needed. An ordering of a class of incomparable marginals is decomposable (Haberman, (1974), Lauritzen, Speed, Vijayan, (1984)) if it consists of two subsets only, or if every subset after the first has the property that its intersection with the union of the previous subsets is equal to its intersection with one of the previous subsets. A hierarchical ordering of subsets consisting of $t$ marginals is ordered decomposable if, for every $3 \le u \le t$, the maximal ones among the first $u$ subsets have a decomposable ordering.
For example, $AB, BC, ABC$ is ordered decomposable but $AB, BC, AC, ABC$ is not. Ordered decomposability does in fact depend on the ordering. For variables $A, B, C, D$, the ordering $AB, BC, ABC, ACD$ is ordered decomposable but the ordering $AB, BC, ACD, ABC$ is not.
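The definition translates directly into a brute-force check. The sketch below (plain Python; all function names are hypothetical) tests the running-intersection property for one ordering, searches all orderings of the maximal marginals, and applies this to every prefix of length $u \ge 3$. The search over permutations is exponential and is meant only for the handful of marginals in such examples.

```python
from itertools import permutations


def is_decomposable_order(sets):
    """Each set from the third on must meet the union of its predecessors
    in the same set as it meets one single predecessor."""
    for t in range(2, len(sets)):
        with_union = sets[t] & frozenset().union(*sets[:t])
        if not any(with_union == (sets[t] & prev) for prev in sets[:t]):
            return False
    return True


def has_decomposable_ordering(sets):
    """Brute force over all orderings; fine for a handful of marginals."""
    return any(is_decomposable_order(list(order)) for order in permutations(sets))


def maximal(sets):
    """Sets not strictly contained in another set of the list."""
    return [s for s in sets if not any(s < other for other in sets)]


def is_ordered_decomposable(ordering):
    sets = [frozenset(m) for m in ordering]
    return all(has_decomposable_ordering(maximal(sets[:u]))
               for u in range(3, len(sets) + 1))


for ordering in (["AB", "BC", "ABC"], ["AB", "BC", "AC", "ABC"],
                 ["AB", "BC", "ABC", "ACD"], ["AB", "BC", "ACD", "ABC"]):
    print(", ".join(ordering), is_ordered_decomposable(ordering))
```

Run on the four orderings of the examples, the checker agrees with the text: the first and third are ordered decomposable, while the second and fourth are not. In each failing case the three marginals $AB, BC, AC$ (or $AB, BC, ACD$) form a cycle that no ordering can break.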
Theorem 2. The components of a hierarchical marginal log-linear parameteriza-
tion are variation independent if and only if the ordering of the marginals involved
is ordered decomposable.
Marginal log-linear parameters derived from marginals in a hierarchical and
ordered decomposable order will be called hierarchical and ordered decompos-
able marginal log-linear parameters.
In the sequel, statistical models defined by restrictions on marginal log-
linear parameters will be considered. In this context, the parameters pertain
to the expectations of cell frequencies under Poisson sampling. A log-linear
marginal model is defined by assuming that certain linear combinations of
marginal log-linear parameters are equal to zero. Such models are never empty,
as they always contain the uniform distribution. A log-affine marginal model
is defined by assuming that certain linear combinations of marginal log-linear
parameters are equal to given constants. Examples of such models will be
considered in the next section.
The existence of log-affine marginal models is, in general, a difficult ques-
tion. In fact, the example used above to illustrate the importance of variation
independence is a log-affine marginal model which is empty, i.e. does not exist.
The following result shows that the conclusion suggested by that example is
true in general.
Theorem 3. A log-affine marginal model defined by restrictions of variation independent parameters is not empty.
This implies that log-affine marginal models based on hierarchical and ordered decomposable marginal log-linear parameters always exist, i.e., contain at least one distribution.
The last desirable property we discuss here is the applicability of standard large sample theory. This is, again, not as straightforward as everyday statistical practice may suggest. For example, consider a $2\times2\times2$ table and the model assuming that $\lambda^{AB}_{11} = 0$, $\lambda^{ABC}_{111} = 0$ and $\lambda^{ABC}_{11*} = 0$. The first condition specifies marginal independence of variables $A$ and $B$, and the last two imply that $A$ and $B$ are conditionally independent given $C$. Dawid (1980) showed that the three assumptions imply that either $A$ is independent of $B$ and $C$ jointly, or $B$ is independent of $A$ and $C$ jointly, or both of these hold true. In the latter case, however large the sample is, the likelihood has, with positive probability, local maxima on both branches of the model, and the likelihood ratio statistic is asymptotically distributed as the minimum of two chi-squared variables rather than as a chi-squared variable.
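The easy direction of Dawid's result can be checked numerically. The sketch below (NumPy; the helper `lam` repeats the binary effect-coded parameter, and all probabilities are made up) builds a distribution on each branch of the model and confirms that all three restricted parameters vanish. For contrast, it builds a table in which $A$ and $B$ are only conditionally independent given $C$; there the first restriction fails. The sketch does not reproduce the non-standard asymptotics, which would require simulation of the likelihood ratio statistic.

```python
import itertools

import numpy as np


def lam(table, marginal, effect):
    """Binary marginal log-linear parameter with every effect index at category 1."""
    marginal = tuple(sorted(marginal))
    dropped = tuple(ax for ax in range(table.ndim) if ax not in marginal)
    log_marg = np.log(table.sum(axis=dropped))
    total = 0.0
    for cell in itertools.product((0, 1), repeat=len(marginal)):
        flips = sum(cell[pos] for pos, ax in enumerate(marginal) if ax in effect)
        total += (-1) ** flips * log_marg[cell]
    return total / 2 ** len(marginal)


def restrictions(p):
    """(lambda^{AB}_{11}, lambda^{ABC}_{111}, lambda^{ABC}_{11*}); axes are A, B, C."""
    return np.array([lam(p, (0, 1), (0, 1)),
                     lam(p, (0, 1, 2), (0, 1, 2)),
                     lam(p, (0, 1, 2), (0, 1))])


# Branch 1: A independent of (B, C) jointly; B and C associated.
branch1 = np.einsum("a,bc->abc", np.array([0.3, 0.7]),
                    np.array([[0.4, 0.1], [0.2, 0.3]]))
# Branch 2: B independent of (A, C) jointly; A and C associated.
branch2 = np.einsum("b,ac->abc", np.array([0.6, 0.4]),
                    np.array([[0.1, 0.5], [0.25, 0.15]]))
# Contrast: A and B independent given C, but both depend on C.
cond = np.array([[0.8, 0.2], [0.2, 0.8]])  # rows: category of C
cond_only = np.einsum("c,ca,cb->abc", np.array([0.5, 0.5]), cond, cond)

for name, p in (("branch 1", branch1), ("branch 2", branch2),
                ("A, B independent given C only", cond_only)):
    print(name, np.round(restrictions(p), 6))
```

Both branches give three zeros. The contrast table keeps the two conditional restrictions at zero but has a clearly non-zero $\lambda^{AB}_{11}$, since conditioning on $C$ and then marginalizing over it induces an association between $A$ and $B$.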
If, however, the model is based on appropriately selected marginal log-linear parameters, the standard asymptotic theory applies.
Theorem 4. Suppose a non-empty log-affine marginal model is based on smooth parameters. Then, under Poisson or multinomial sampling,
a. the probability that the maximum likelihood estimate $\hat\pi$ of the true probability $\pi$ exists and is a stationary point of the likelihood tends to 1 as the sample size goes to infinity;
b. the asymptotic distribution of $N^{1/2}(\hat\pi - \pi)$ is normal with zero expectation, where $N$ is the sample size;
c. the likelihood ratio statistic has an asymptotic chi-squared distribution with the number of degrees of freedom equal to the number of linearly independent restrictions.
If the log-affine marginal model is based on hierarchical and ordered decomposable marginal log-linear parameters, then the above asymptotic results hold true. In other situations, standard asymptotic theory may or may not apply, depending on the true population parameters.
3. Applications
Once in possession of the theoretical results outlined above, one finds
several important statistical problems where marginal log-linear or log-affine
models may be applied. In this section, some of these situations are reviewed.
Here, we formulate the relevant marginal models and investigate their properties
using the results given in the previous section. Issues related to estimation of
these models will be considered in the next section.
3.1. Repeated measurements
One of the most widely used experimental designs in the medical and
behavioral sciences to measure the effect of a treatment is to observe the
same individuals before and after it. In this design, the variables observed
before and after the treatment are related because they are observed on the
same individuals. If the variables are categorical, the observations before and
after the treatment should be considered as marginals of the same contingency
table (Hagenaars, (1990)). We now outline some potentially useful repeated
measurements models and use the theory of the previous section to show that
these models are well-behaved.
If the same characteristic is measured before (variable $A$) and after (variable $D$) treatment and the hypothesis is that the distributions before and after treatment are the same (no effect of the treatment), the statistical model in the $A\times D$ table is defined by
$$\lambda^{A}_{i} = \lambda^{D}_{i}, \quad \text{for all } i.$$
This is a log-linear marginal model, and the marginals involved have a hierarchical and ordered decomposable ordering, e.g., $A, D$. It immediately follows that the model exists and that standard large sample theory applies. In fact, this is the model of marginal homogeneity.
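The restriction can be evaluated directly on an observed square $A\times D$ table. The sketch below (NumPy; function names and both tables are made up for illustration) extends the effect coding of Section 2 to more categories by assuming $\lambda_i = \log m_i - \frac{1}{I}\sum_j \log m_j$. Because both one-way marginals sum to the same total, all gaps vanish exactly when the marginals coincide. Evaluating the gaps on observed data is not a test of the model; fitting it requires constrained maximum likelihood, as discussed in Section 4.

```python
import numpy as np


def one_way_lambdas(margin):
    """Effect-coded one-way parameters: lambda_i = log m_i - mean_j log m_j."""
    logs = np.log(np.asarray(margin, dtype=float))
    return logs - logs.mean()


def marginal_homogeneity_gaps(table):
    """lambda^A_i - lambda^D_i for a square table: rows A (before), columns D (after)."""
    table = np.asarray(table, dtype=float)
    return one_way_lambdas(table.sum(axis=1)) - one_way_lambdas(table.sum(axis=0))


symmetric = [[20, 5, 3], [5, 30, 4], [3, 4, 26]]  # equal row and column totals
shifted = [[20, 10, 5], [2, 30, 4], [1, 3, 25]]   # totals differ
print(marginal_homogeneity_gaps(symmetric))
print(marginal_homogeneity_gaps(shifted))
```

The symmetric table has identical row and column totals, so its gaps are zero up to rounding. The second table does not, so at least one gap is non-zero.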
When two variables are measured before ($A$, $B$) and two after ($D$, $E$) the treatment, an interesting model assumes that $A$ and $B$ are independent and that $D$ and $E$ are independent. Note that this model may be meaningful whether the same or different characteristics are measured before and after treatment. This model assumes that
$$\lambda^{AB}_{ij} = 0, \quad \lambda^{DE}_{lm} = 0, \quad \text{for all } i, j \text{ and } l, m.$$
Here, the marginals involved are $AB, DE$, and they are ordered decomposable.
The related hierarchical marginal log-linear parameters that may be arbitrarily restricted are $\lambda^{AB}_{**}, \lambda^{AB}_{i*}, \lambda^{AB}_{*j}, \lambda^{AB}_{ij}, \lambda^{DE}_{l*}, \lambda^{DE}_{*m}, \lambda^{DE}_{lm}$. Therefore, the model is a marginal log-linear model based on hierarchical and ordered decomposable parameters, and consequently standard large sample theory applies to this model.
A relevant log-affine marginal model based on the same parameters is the one assuming that the ratio of the marginal odds ratios, as measures of association, between $A$ and $B$ and between $D$ and $E$ is equal to a specified constant or, equivalently, that the difference between the associations, as measured by log-linear parameters, is equal to a specified value. This leads to the model specified by
$$\lambda^{AB}_{11} - \lambda^{DE}_{11} = c.$$
This model exists for all $c$, and standard large sample theory applies.
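For binary variables (an assumption of this sketch), $\lambda_{11}$ of a $2\times2$ marginal is one quarter of its log odds ratio, so $c$ is one quarter of the difference between the two marginal log odds ratios. The sketch below (NumPy; function names and the example table are hypothetical) evaluates this quantity for a $2\times2\times2\times2$ table with axes $A, B, D, E$. The example combines an $AB$ marginal with odds ratio 16 and an independent $DE$ marginal with odds ratio 1, so $c = \frac14\log 16 = \log 2$.

```python
import numpy as np


def log_odds_ratio(t):
    """Log odds ratio of a 2x2 table."""
    return np.log(t[0, 0] * t[1, 1] / (t[0, 1] * t[1, 0]))


def association_difference(p):
    """lambda^{AB}_{11} - lambda^{DE}_{11} for a 2x2x2x2 table with axes (A, B, D, E).

    For a 2x2 marginal, lambda_{11} equals one quarter of the log odds ratio.
    """
    p = np.asarray(p, dtype=float)
    return 0.25 * (log_odds_ratio(p.sum(axis=(2, 3)))
                   - log_odds_ratio(p.sum(axis=(0, 1))))


p_ab = np.array([[0.4, 0.1], [0.1, 0.4]])      # odds ratio 16
p_de = np.array([[0.25, 0.25], [0.25, 0.25]])  # odds ratio 1
p = np.einsum("ab,de->abde", p_ab, p_de)
print(association_difference(p), np.log(2))
```

Computing $c$ from a table is only the descriptive side of the model. Fitting the model for a prescribed $c$ is a constrained maximum likelihood problem.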
If there are several variables measured before and after the treatment,
arbitrary linear or affine assumptions about some of the marginal log-linear
parameters pertaining to effects within the marginal of the before-treatment
variables and about some of those pertaining to effects within the marginal of
after-treatment variables will obey standard asymptotic theory, if the model is
not empty. The latter condition holds in the linear case and in the affine case
it holds if the effects for both marginals are decomposable.
There are, however, more general cases covered by the available theory. Restrictions concerning the association between before- and after-treatment variables can also be included. For example, if there are three variables $A, B$ and $C$ measured before and three variables $D, E$ and $F$ measured after the treatment, the model with the following restrictions
$$\lambda^{ABC}_{ijk} = 0,\quad \lambda^{DEF}_{lmn} = 0,$$
$$\lambda^{ABCD}_{i**l} = 0,\quad \lambda^{ABCD}_{ij*l} = 0,\quad \lambda^{ABCD}_{i*kl} = 0,\quad \lambda^{ABCD}_{ijkl} = 0,$$
for all $i, j, k, l, m$ and $n$, means that there is no second order association among the before-treatment variables or among the after-treatment variables, and that $A$ and $D$ are conditionally independent given the other before-treatment variables. This model can be obtained by restricting the parameters derived from the ordering of marginals $ABC, ABCD, DEF$. Since this ordering is hierarchical and ordered decomposable, the model exists and standard asymptotic theory applies.
All this extends in a natural way to designs where several measurements are
taken over the same individuals, either with certain treatments applied between
the measurements or time passing by between the measurements.
Marginal models are also needed when measurements on the same characteristic are taken repeatedly to reduce the effect of measurement error; for example, the blood pressure of a person may be measured on three consecutive days, different tests of the same mental ability may be administered to the same person, or different questions measuring the same attitude may be included in a questionnaire. Suppose $A_1$ and $A_2$ are the first and second measurements of one characteristic, and $B_1$ and $B_2$ are the first and second measurements of another characteristic. Then $A_1$ and $A_2$ should be essentially identical (disregarding error related to imprecise measurement or to temporary fluctuations of the quantity measured), just like $B_1$ and $B_2$. In addition to certain marginal homogeneity restrictions, this would also imply that the association between $A_i$ and $B_j$ is the same for every combination of $i, j = 1, 2$. This leads to the following model:
$$\lambda^{A_1}_{i} = \lambda^{A_2}_{i},\quad \lambda^{B_1}_{j} = \lambda^{B_2}_{j},$$
$$\lambda^{A_1A_2}_{i_1i_2} = r_{i_1i_2}(A),\quad \lambda^{B_1B_2}_{j_1j_2} = r_{j_1j_2}(B),$$
$$\lambda^{A_1B_1}_{ij} = \lambda^{A_1B_2}_{ij} = \lambda^{A_2B_1}_{ij} = \lambda^{A_2B_2}_{ij},$$
for all possible values of the indices, where $r_{i_1i_2}(A)$ and $r_{j_1j_2}(B)$ represent the strength of association between the first and second measurements of the characteristics $A$ and $B$, respectively, and should be selected based on the magnitude and distribution of the error that is usually present, or acceptable, when those measurements are performed. For this model, standard asymptotic theory holds.
3.2. Panel studies and Markov chains
In this setup, again, the same individuals are observed several times on the same variables; however, interest lies not so much in whether the distributions at the different time points are identical or different, but rather in the pattern of change. A frequently investigated hypothesis is that of a $k$-th order Markov chain, that is, that the conditional distribution of the variables measured at time point $t$ depends only on the states at the $k$ preceding time points. This is, of course, a log-linear model, but the related hypotheses discussed below are of the marginal type.
The estimation of the transition probabilities is often among the goals of the analysis of panel data. Parallel to the Markov hypothesis, one may be interested in modeling whether or not the distributions in the previous waves of the panel influence the association between the distributions in the last two waves. The pattern of association between the distributions at the $t$-th and $(t-1)$-st time points can be captured by the conditional odds ratios or log-linear parameters of the joint distribution of the variables measured at these two time points. If this only depends on the distribution at the $(t-k-1)$-st, ..., $(t-2)$-nd time points, the process generating the data has a $k$-th order memory.
Therefore, to test the hypothesis of a first order memory against that of a
second order memory, one needs at least four waves of the panel. In this case,
the hypothesis that one has a first order memory, given that the memory is of
second order (saturated in the present case) can be formulated as
$$\lambda^{A_1A_2A_3A_4}_{**i_3i_4} = \lambda^{A_2A_3A_4}_{*i_3i_4},\quad \lambda^{A_1A_2A_3A_4}_{*i_2i_3i_4} = \lambda^{A_2A_3A_4}_{i_2i_3i_4},\quad \lambda^{A_1A_2A_3A_4}_{i_1i_2i_3i_4} = 0,$$
where $A_i$ denotes the variable(s) measured at the $i$-th time point. This model asserts that the association between $A_3$ and $A_4$ depends only on $A_2$ and not on $A_1$. The association is measured by the appropriate marginal log-linear parameters (or, equivalently, by the appropriate marginal conditional odds ratios). The conditions imply that these marginal log-linear parameters (or marginal conditional odds ratios) are the same whether conditioned on $A_2$ only or on both $A_2$ and $A_1$. This is a collapsibility condition (Whittemore, (1980)) and is also a marginal log-linear model. The marginal log-linear parameters in it are not contained in any hierarchical marginal log-linear parameterization because, for example, the $\{A_3, A_4\}$ effect appears in two marginals. Therefore, the statistical properties of this model cannot be obtained from the results of this paper. In fact, it can be shown (Bergsma, Rudas, (2002a)) that the above parameters cannot be part of a smooth marginal log-linear parameterization and that the Jacobian of any marginal log-linear parameterization containing them is singular at the uniform distribution. Note that the same applies to any similar collapsibility restriction (see also Davis, (1989)).
If the process is known to have, say, a one-step memory, testing stationarity (with respect to the conditional association between neighbors) requires fitting the following model:
$$\lambda^{A_1A_2A_3}_{*jk} = \lambda^{A_2A_3A_4}_{*jk},\quad \lambda^{A_1A_2A_3}_{jkl} = \lambda^{A_2A_3A_4}_{jkl},$$
for every $j, k$ and $l$, if four waves are available. The model says that the conditional association between $A_2$ and $A_3$ given $A_1$ is the same as that between $A_3$ and $A_4$ given $A_2$. This is a hierarchical and ordered decomposable marginal log-linear model, so standard large sample theory applies.
The related log-affine marginal model, in which the above marginal log-linear
parameters have prescribed values (for example, as in small area estimation),
also exists and has standard large sample behavior.
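A quick sanity check of the stationarity restrictions is possible with binary states, which this sketch assumes. The sketch (NumPy; the helpers `lam`, `markov_joint` and `stationarity_gaps` and the transition matrices P and Q are hypothetical) builds the joint distribution of four waves of a first-order chain. It then compares the $A_2A_3$ and three-way parameters of the $A_1A_2A_3$ marginal with the $A_3A_4$ and three-way parameters of the $A_2A_3A_4$ marginal. A time-homogeneous chain should satisfy both restrictions whatever its initial distribution, while changing the last transition matrix should violate the first one.

```python
import itertools

import numpy as np


def lam(table, marginal, effect):
    """Binary marginal log-linear parameter with every effect index at category 1."""
    marginal = tuple(sorted(marginal))
    dropped = tuple(ax for ax in range(table.ndim) if ax not in marginal)
    log_marg = np.log(table.sum(axis=dropped))
    total = 0.0
    for cell in itertools.product((0, 1), repeat=len(marginal)):
        flips = sum(cell[pos] for pos, ax in enumerate(marginal) if ax in effect)
        total += (-1) ** flips * log_marg[cell]
    return total / 2 ** len(marginal)


def markov_joint(initial, transitions):
    """Joint distribution of (A_1, ..., A_T) for a first-order chain."""
    joint = np.asarray(initial, dtype=float)
    for P in transitions:
        # New last axis: joint[..., prev] * P[prev, next].
        joint = joint[..., None] * np.asarray(P, dtype=float)
    return joint


def stationarity_gaps(p):
    """A1A2A3 minus A2A3A4 parameters: the (*jk) effect and the three-way effect."""
    first = [lam(p, (0, 1, 2), (1, 2)), lam(p, (0, 1, 2), (0, 1, 2))]
    second = [lam(p, (1, 2, 3), (2, 3)), lam(p, (1, 2, 3), (1, 2, 3))]
    return np.subtract(first, second)


P = [[0.9, 0.1], [0.3, 0.7]]
Q = [[0.6, 0.4], [0.4, 0.6]]
homogeneous = markov_joint([0.5, 0.5], [P, P, P])
changing = markov_joint([0.5, 0.5], [P, P, Q])  # last transition differs
print(stationarity_gaps(homogeneous))
print(stationarity_gaps(changing))
```

For the homogeneous chain both gaps are zero up to rounding. For the changing chain the first gap equals one quarter of the difference between the log odds ratios of P and Q. The three-way terms stay at zero in both cases, since a first-order chain has no three-way interaction.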
3.3. Incomplete data
There are various statistical problems requiring the analysis of an incom-
plete set of data. Incomplete data may arise unintentionally or intentionally, in
surveys or in censuses, in data collection or in secondary analysis problems.
The most common source of unintentionally missing data in data collection is that some of the respondents in a survey or census fail to respond to certain questions in a questionnaire (item nonresponse) or to the entire questionnaire (unit nonresponse). The problem of coverage error (parts of the population being omitted from the sample frame) leads to incomplete data similar to unit nonresponse. The information collected is intentionally incomplete when, to reduce the burden on respondents with respect to time and invasion of privacy, a long questionnaire is split into shorter overlapping parts and every respondent is asked only the questions in one of the parts. Such split designs may
be applied both in surveys and censuses. In secondary data analysis it may
happen that no available data set contains all the necessary information and the
researcher has to rely on several previously collected sets of data. This leads
to a problem similar to analyzing data arising as a result of a split design, with
the additional problem that the separate sampling procedures behind the sepa-
rate sets of data make even the existence of a common underlying population
distribution questionable.
When the data are categorical, the available information, in all the above
cases, can be considered as being marginal distributions of a higher dimen-
sional contingency table (the one that would contain all variables of interest).
Depending on the actual circumstances, the distribution on the entire table (the complete data) would apply to the entire population or to a sample from it, or may not exist at all. The first step of the analysis in all these cases is to find out whether such a joint distribution exists and, if several such distributions exist, to select one according to some optimality criterion. Depending on the circumstances, such a procedure may be called an extension of measures, estimation or data fusion.
Notice that this scheme also covers model-based estimation of the joint distribution when the sufficient statistics are certain marginal distributions, as, e.g., with log-linear models. Here the information is not incomplete, since the entire table may have been observed; rather, only certain aspects of it (the sufficient statistics) are relevant for further analysis.
If the distributions on an incomparable (with respect to inclusion) set of marginals are given and they are weakly compatible, that is, they coincide on the intersections of the marginals, then decomposability of the set of marginals implies that an extension always exists (in fact, usually infinitely many). If the system is not decomposable, it depends on the actual marginals whether or not weak compatibility implies strong compatibility, that is, the existence of a joint distribution with the given marginals (Darroch, Lauritzen, Speed, (1980), Kellerer, (1964)). This classical theory, however, does not cover cases in which information on a more complex system of marginals is available; it is in these cases that the results of this paper are relevant. If some of the marginals for which observations are available are contained in each other, classical decomposability and the extension procedure based on it are of no help. This is the case, among others, in the common missing data situation in which some respondents did respond to all questions, so that observations are available not only for some marginals but also for the entire table. Note that our approach to handling missing data problems is fundamentally different from the standard approach based on imputation techniques (Little, Rubin, (1987)).
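To make weak compatibility and the decomposable case concrete, here is a minimal sketch (not from the paper; the function names and the probabilities are hypothetical). It checks that an AB and an AC marginal coincide on their intersection A and, because {AB, AC} is decomposable, builds one common extension, $p(a,b,c) = p(a,b)\,p(a,c)/p(a)$, namely the extension under which B and C are conditionally independent given A.

```python
from collections import defaultdict


def margin_over_a(table):
    """Sum a two-way table {(a, x): prob} over x, giving {a: prob}."""
    out = defaultdict(float)
    for (a, _x), p in table.items():
        out[a] += p
    return dict(out)


def weakly_compatible(p_ab, p_ac, tol=1e-12):
    """True if the AB and AC marginals coincide on their intersection A."""
    pa1, pa2 = margin_over_a(p_ab), margin_over_a(p_ac)
    keys = set(pa1) | set(pa2)
    return all(abs(pa1.get(a, 0.0) - pa2.get(a, 0.0)) <= tol for a in keys)


def extend(p_ab, p_ac):
    """One common extension of weakly compatible AB and AC marginals:
    p(a, b, c) = p(a, b) p(a, c) / p(a)."""
    p_a = margin_over_a(p_ab)
    joint = {}
    for (a, b), pab in p_ab.items():
        for (a2, c), pac in p_ac.items():
            if a2 == a and p_a[a] > 0:
                joint[(a, b, c)] = pab * pac / p_a[a]
    return joint


p_ab = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
p_ac = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.35, (1, 1): 0.15}
print(weakly_compatible(p_ab, p_ac))  # True: p(A = 0) = 0.5 in both marginals
joint = extend(p_ab, p_ac)
```

Other extensions exist in general; this one is simply the easiest to write down.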
In this more general case, the following procedure may be applied. Consider all marginals for which information is available, together with their intersections. Order these hierarchically and construct the hierarchical marginal log-linear parameterization. Determine the values of those parameters that can be determined from the given information. If different sources of information are available for a certain marginal, for example both A and AB are observed, consider a pooled estimate of the distribution on A. Set the marginal log-linear parameters for which no information is available to arbitrary values, for example to zero. Then, as described in Bergsma, Rudas (2002a), a generalization of the iterative proportional scaling algorithm can be used to reconstruct the entire distribution.
To illustrate the procedure, assume that for a three-way table, observations are available for the AB, AC and ABC marginals. Adding intersections and putting the marginals in a hierarchical order yields A, B, AB, AC, ABC.
The distribution on the A marginal is estimated by pooling data from all three original distributions, and the B marginal is estimated by pooling data from the AB and ABC marginals. The odds ratios (Rudas, (1998a)) in the AB marginal are estimated by pooling the original AB and ABC data sets. That is, estimates of the one-way marginals and of the odds ratios of the AB marginal table or, equivalently, estimates of the marginal log-linear parameters $\lambda^{A}_{i}$, $\lambda^{B}_{j}$ and $\lambda^{AB}_{ij}$ are obtained, and combining these by the iterative scaling procedure yields our estimate of the AB marginal distribution.
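The combination step can be illustrated with classical two-way iterative proportional fitting. Rescaling the rows or the columns of a two-way table leaves all of its odds ratios unchanged, so starting from any positive table that carries the target odds ratios and alternately matching the target row and column marginals produces a table with both. This is a minimal sketch under that observation, not the generalized algorithm of Bergsma, Rudas (2002a); the seed table and the target marginals are made-up numbers.

```python
def ipf_2way(seed, row_target, col_target, tol=1e-12, max_iter=1000):
    """Rescale the rows and columns of a positive 2-D list `seed` until its
    marginals match the targets; the odds ratios of `seed` are preserved."""
    t = [row[:] for row in seed]
    for _ in range(max_iter):
        # Match the row marginal.
        for i, target in enumerate(row_target):
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        # Match the column marginal.
        for j, target in enumerate(col_target):
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / s
        # Columns match after the last step; stop once rows match as well.
        if all(abs(sum(t[i]) - r) < tol for i, r in enumerate(row_target)):
            break
    return t


def odds_ratio(t):
    """Odds ratio of a 2 x 2 table."""
    return t[0][0] * t[1][1] / (t[0][1] * t[1][0])


seed = [[4.0, 1.0], [1.0, 2.0]]  # carries the target odds ratio 8
fitted = ipf_2way(seed, [0.6, 0.4], [0.3, 0.7])
print(odds_ratio(fitted))  # still 8, up to rounding
```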
Next, the distribution of the AC marginal is obtained by taking into account the already estimated A marginal distribution and the conditional distribution of C given A, which is obtained by pooling data from the original AC and ABC marginals (yielding the marginal log-linear parameters $\lambda^{AC}_{k}$ and $\lambda^{AC}_{ik}$).
Finally, to estimate the ABC marginal, the already estimated AB and AC marginal distributions are combined with the conditional distribution of B given A and C, which is taken from the original ABC marginal (i.e., the marginal log-linear parameters $\lambda^{ABC}_{jk}$ and $\lambda^{ABC}_{ijk}$ are estimated).
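Which parameters are estimated at each step follows a simple bookkeeping rule: in a hierarchical ordering, each effect (nonempty subset of variables) is assigned to the first marginal that contains it. The sketch below (hypothetical helper names; the overall effect is omitted) reproduces the assignment used in the example, with single-letter variable names assumed.

```python
from itertools import combinations


def effects(marginal):
    """All nonempty subsets (effects) of a marginal, e.g. 'AB' -> A, B, AB."""
    vars_ = sorted(marginal)
    return ["".join(c) for k in range(1, len(vars_) + 1)
            for c in combinations(vars_, k)]


def assign_effects(ordered_marginals):
    """Assign each effect to the first marginal, in the given hierarchical
    order, that contains it."""
    seen, assignment = set(), {}
    for m in ordered_marginals:
        new = [e for e in effects(m) if e not in seen]
        seen.update(new)
        assignment[m] = new
    return assignment


print(assign_effects(["A", "B", "AB", "AC", "ABC"]))
# {'A': ['A'], 'B': ['B'], 'AB': ['AB'], 'AC': ['C', 'AC'], 'ABC': ['BC', 'ABC']}
```

The AC entry corresponds to $\lambda^{AC}_{k}$ and $\lambda^{AC}_{ik}$, and the ABC entry to $\lambda^{ABC}_{jk}$ and $\lambda^{ABC}_{ijk}$.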
The procedure may not yield a joint distribution, but if the marginals (including intersections) admit an ordered decomposable ordering, as in the present example, a common extension of the marginals always exists.
In the following example, part of the available information needs to be discarded. Suppose that one is interested in reconstructing or estimating the joint distribution of the variables A, B, C. The AB marginal was observed in a simple random sample, and the AC marginal in a sample that was stratified according to A. However, the stratification in the latter data collection procedure was based on information that may not be reliable, for example outdated census data.
In this situation, one would use the AB sample to estimate the joint distribution of these two variables (disregarding the A marginal in AC) and the AC sample to estimate the C marginal and the interaction between A and C. That is, the information on the distribution of A is taken entirely from the AB sample. Therefore, estimates are available for the following marginal log-linear parameters:
$$\lambda^{AB}_{i},\quad \lambda^{AB}_{j},\quad \lambda^{AB}_{ij},\quad \lambda^{AC}_{k},\quad \lambda^{AC}_{ik}.$$
Because these marginal log-linear parameters are ordered decomposable, a three-dimensional distribution with these parameters always exists.
Note, however, that if observations are additionally available on the joint distribution of BC, then no component of that information, not even the association between B and C, is guaranteed to be strongly compatible (i.e., to yield a joint distribution) with the information obtained from the first two samples, because ordered decomposability is lost.
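A standard small illustration (not from the paper) shows how weak compatibility can fail to imply strong compatibility once AB, AC and BC are all given. Take three binary variables whose pairwise marginals each put probability 1/2 on (0, 1) and on (1, 0). All one-way marginals are uniform, so the three tables agree on their intersections, yet a joint distribution would need $a \ne b$, $a \ne c$ and $b \ne c$ simultaneously, which is impossible with two categories. The sketch checks both facts by enumeration.

```python
from itertools import product

# Each pairwise marginal puts probability 1/2 on (0, 1) and on (1, 0).
pair = {(0, 1): 0.5, (1, 0): 0.5}
marginals = {("A", "B"): pair, ("A", "C"): pair, ("B", "C"): pair}


def one_way(table, pos):
    """One-way marginal of a 2 x 2 table along position `pos`."""
    out = {0: 0.0, 1: 0.0}
    for cell, p in table.items():
        out[cell[pos]] += p
    return out


# Weak compatibility: each variable gets the same one-way marginal from
# every pairwise table that contains it.
var_margins = {}
for (u, w), table in marginals.items():
    for var, pos in ((u, 0), (w, 1)):
        var_margins.setdefault(var, []).append(one_way(table, pos))
weak = all(all(m == ms[0] for m in ms) for ms in var_margins.values())

# A joint distribution can only charge cells whose projections are charged
# by every pairwise table; list such cells.
idx = {"A": 0, "B": 1, "C": 2}
feasible_cells = [
    cell for cell in product((0, 1), repeat=3)
    if all(marginals[(u, w)].get((cell[idx[u]], cell[idx[w]]), 0) > 0
           for (u, w) in marginals)
]
print(weak, feasible_cells)  # True []
```

With no admissible cell, no joint distribution has these three marginals, even though they are weakly compatible.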
As outlined above, the theory of marginal models applies to missing data problems because the different observed data patterns can be considered as information pertaining to various marginals of the joint distribution. If, for example, three variables A, B, C are observed, but for some respondents the observation on A is missing and for the remaining ones the observation on B is missing, the observed data patterns are BC and AC. This problem, quite differently from the standard approach (Little, Rubin, 1987), can be viewed from the point of view of data fusion: one may try to reconstruct the joint distribution with these marginals.
In the present example, the marginal log-linear parameters that cannot be estimated from the data pertain to the AB and ABC effects. Therefore, to be able to estimate the joint distribution while keeping the parameterization hierarchical, either $\lambda^{AB}_{ij}$ or $\lambda^{ABC}_{ij}$, and in both cases $\lambda^{ABC}_{ijk}$, need to be given certain values. While the most straightforward assumption is that these marginal log-linear parameters are equal to zero, this assumption has different implications depending on the parameters to which it is applied. If the parameters selected are $\lambda^{AB}_{ij}$ and $\lambda^{ABC}_{ijk}$, then assuming that they are equal to zero means that A and B are marginally independent, while if the same assumption is applied to $\lambda^{ABC}_{ij}$ and $\lambda^{ABC}_{ijk}$, then A and B are conditionally independent, given C. If it is only the true response to a question that decides whether or not the response is given, then in the first case the observed values of A and the observed values of B are both random samples from their respective distributions. In the second case, this is only true within fixed categories of C.
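That the two zero restrictions really differ can be seen numerically. The sketch below uses made-up probabilities (not from the paper) to build a distribution of three binary variables in which A and B are conditionally independent given C, yet the marginal AB odds ratio is far from 1, so the marginal AB association does not vanish.

```python
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: 0.8, 1: 0.2}  # P(A = 0 | C = c), made-up values
p_b_given_c = {0: 0.8, 1: 0.2}  # P(B = 0 | C = c), made-up values


def bern(p0, x):
    """Probability of category x of a binary variable with P(0) = p0."""
    return p0 if x == 0 else 1.0 - p0


# A and B are independent within each category of C by construction.
joint = {(a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}


def odds_ratio(t):
    """Odds ratio of a 2 x 2 table given as {(a, b): prob}."""
    return t[(0, 0)] * t[(1, 1)] / (t[(0, 1)] * t[(1, 0)])


conditional_ors = [odds_ratio({(a, b): joint[(a, b, c)] for a in (0, 1) for b in (0, 1)})
                   for c in (0, 1)]
marginal_ab = {(a, b): joint[(a, b, 0)] + joint[(a, b, 1)] for a in (0, 1) for b in (0, 1)}
marginal_or = odds_ratio(marginal_ab)
print(conditional_ors, marginal_or)  # conditional odds ratios 1, marginal about 4.52
```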
3.4. Joint treatment of the sampling and statistical models
A statistical model may be viewed as a subset of the possible distributions, and a statistical hypothesis assumes that the true distribution belongs to this subset. When the model is parametric, it restricts some of the parameters of the distribution. The restricted parameters need to be estimated from the data in such a way that the resulting estimates fulfill the requirements of the model, while the parameters not restricted by the model are estimated from the data without further restrictions. A sampling model, on the other hand, assuming finite population size, assigns probabilities to the possible samples (subsets of the population), often in a way that excludes certain samples from consideration. In many practical situations this implies specifying certain parameters of the observed distribution (e.g., in stratified sampling some of the marginals are kept fixed). These parameters should then not be estimated from the data, even if the statistical model, without consideration of the sampling model, would call for estimating them. Rather, the estimates should only be sought among distributions fulfilling both the model and the sampling restrictions.
Therefore, the resulting model is the intersection of the statistical and the sampling models. Considering any model of interest as the intersection of other (possibly simpler) models may prove useful both from a conceptual and from a computational point of view, as was illustrated for log-linear models by Rudas (1998b, 2002).
The parameter estimates obtained under the combined restrictions are the estimates in the statistical model with the sampling model taken into account. Many of the popular sampling models restrict the values of certain marginal log-linear parameters. If the statistical model is also a marginal model, the above combination can be carried out easily and the relationship between the two models becomes apparent, while with other approaches potential conflicts may not be easy to recognize.
As a first example, consider simple random sampling with fixed sample size $N$. Of all possible samples (subsets of the population), only those of size $N$ are considered and no further restriction applies (that is, these samples have equal probabilities). This restriction is equivalent to $\lambda = \log N$, where $\lambda$ is the overall effect. When a marginal log-linear or log-affine statistical model is estimated under this sampling scheme, the combined model is a log-affine marginal model. If the statistical model is independence in a two-way table, the joint restrictions are
$$\lambda = \log N, \qquad \lambda^{AB}_{ij} = 0 \ \text{ for all } i, j,$$
and this is a log-affine marginal model. It is easy to see that if the statistical model is defined by log-linear or log-affine restrictions on a hierarchical marginal log-linear parameterization and the overall effect $\lambda$ does not appear among the parameters restricted by the statistical model, then adding the multinomial constraint $\lambda = \log N$ does not affect the properties of the model.
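As a quick check of the two-way independence example, the closed-form maximum likelihood fit $\hat m_{ij} = n_{i+} n_{+j} / N$ satisfies both restrictions of the combined model: the fitted total is $N$, and the interaction vanishes. The sketch assumes effect (sum-to-zero) coding, under which $\lambda^{AB}_{11}$ of a 2 × 2 table equals one quarter of the log odds ratio, and takes the overall effect as the log of the fitted total; the counts are made up.

```python
import math

n = [[20, 10], [5, 15]]                     # made-up 2 x 2 counts
N = sum(map(sum, n))
rows = [sum(r) for r in n]
cols = [sum(r[j] for r in n) for j in range(2)]
m = [[rows[i] * cols[j] / N for j in range(2)] for i in range(2)]  # independence MLE

overall = math.log(sum(map(sum, m)))        # overall effect: log of fitted total
lam_ab = 0.25 * math.log(m[0][0] * m[1][1] / (m[0][1] * m[1][0]))  # effect-coded AB term
print(m, overall - math.log(N), lam_ab)     # [[15.0, 15.0], [10.0, 10.0]] 0.0 0.0
```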
As another example, consider a case-control study in which cases (e.g., patients to be given a certain treatment) enter the study as a result of a process not under the control of the experimenter, but the design calls for selecting one control person for every case by a certain procedure. The status of both cases and controls is measured before and after the treatment is applied. Here, the design specifies that the case-control marginal is uniform, while the total, the status marginal and the association between the two variables are unrestricted. That is, if A is the case-control variable, $\lambda^{A}_{i} = 0$ for all $i$.
In the case of stratified sampling, the distribution of a (group of) variable(s) is fixed by the sampling design. If the frequency of variable A is fixed, say at $N_i > 0$ in category $i$, this is equivalent to $\lambda^{A} + \lambda^{A}_{i} = \log N_i$, and this restriction should be added to the restrictions imposed by the statistical model.
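Both design restrictions can be checked with a few lines. Assuming effect (sum-to-zero) coding of the one-way marginal, which the text does not fix, $\lambda^{A}$ is the mean of the $\log N_i$ and $\lambda^{A}_{i}$ is the deviation from it. A uniform case-control marginal then gives $\lambda^{A}_{i} = 0$, and fixed stratum sizes give $\lambda^{A} + \lambda^{A}_{i} = \log N_i$. The counts below are made up.

```python
import math


def one_way_effects(counts):
    """Effect-coded parameters of a one-way table: log N_i = lam + lam_i,
    with the lam_i summing to zero."""
    logs = [math.log(x) for x in counts]
    lam = sum(logs) / len(logs)
    return lam, [l - lam for l in logs]


# Case-control design: one control per case, so the A marginal is uniform.
lam_cc, lam_i_cc = one_way_effects([120, 120])
print(lam_i_cc)  # [0.0, 0.0]

# Stratified design fixing N_i = 200, 300, 500 in the categories of A.
strata = [200, 300, 500]
lam_st, lam_i_st = one_way_effects(strata)
```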
To illustrate the possible conflicts that may arise, assume now that the variables A, B, C are observed and that the statistical model of interest prescribes the marginal distribution of C and the marginal odds ratios of AC and of BC. If the available data are obtained from a stratified sample, where stratification prescribed the AB marginal and was based on reliable information concerning the distribution of the AB marginal (for example, a recent census), then the combination of the sampling and statistical models prescribes the AB marginal distribution and the AC and BC marginal odds ratios, and the further parameters of the distribution are to be estimated. This is a log-affine marginal model, and since the parameters are hierarchical but not ordered decomposable, the model may be empty, depending on the actual values of the parameters. But the parameters are smooth, and therefore, if the model is nonempty, standard asymptotic theory applies to it.
The advantage of the marginal modelling approach is that the combinatorial properties of the class of parameters restricted by either of the models determine the statistical properties of the resulting model.
3.5. Graphical models
Graphical log-linear models (Darroch, Lauritzen, Speed, (1980)) use graphs to model the association structure of multivariate distributions. The nodes of the graph are identified with the variables involved, and two variables not connected by an edge are assumed to be conditionally independent, given all other variables. These are conditional independence statements involving all variables. Models pertaining to the joint distribution of variables are also associated with directed acyclic graphs (Lauritzen, (1996)). In this case, a variable is assumed to be conditionally independent of its nondescendants, given its parents, where the nondescendants are the nodes into which no directed path leads from the variable, and the parents are the nodes from which arrows point to the variable. Graphical models based on directed acyclic graphs therefore assume conditional independencies that do not involve all variables but rather certain marginal distributions pertaining to subsets of the variables.
Consequently, graphical models based on directed acyclic graphs are mar-
ginal log-linear models. For example, consider the directed acyclic graph in
Figure 1.
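This graph implies the following conditional independencies:
$$C \perp\!\!\!\perp BDE \mid A,$$
$$D \perp\!\!\!\perp C \mid AB,$$
$$E \perp\!\!\!\perp ABCF \mid D,$$
$$F \perp\!\!\!\perp ABE \mid CD.$$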
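These statements follow mechanically from the rule "each variable is independent of its nondescendants given its parents". The sketch below encodes Figure 1 by its parent sets (read off the node-and-parents subsets A, AB, AC, ABD, DE, CDF used in this section) and lists each variable's local statement. Single-letter variable names and the helper names are assumptions of the sketch.

```python
# Parent sets of the directed acyclic graph in Figure 1.
PARENTS = {"A": "", "B": "A", "C": "A", "D": "AB", "E": "D", "F": "CD"}


def descendants(node):
    """All nodes reachable from `node` along directed paths."""
    children = {v for v, pa in PARENTS.items() if node in pa}
    out = set(children)
    for ch in children:
        out |= descendants(ch)
    return out


def local_markov(node):
    """(nondescendants other than the parents, parents) for `node`."""
    others = set(PARENTS) - {node} - descendants(node) - set(PARENTS[node])
    return "".join(sorted(others)), PARENTS[node]


for v in PARENTS:
    rest, pa = local_markov(v)
    if rest:
        print(f"{v} _||_ {rest} | {pa}")
```

Besides the four statements listed above, it also prints B _||_ C | A, which already follows from C _||_ BDE | A by decomposition.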
The model defined by the above graph or, equivalently, by the above conditional independencies is a marginal model. It will now be illustrated how the marginal log-linear parameters of this model can be used to parameterize the distributions in it. The parameters involved are associated with the arrows in the graph defining the model, and their values can be given intuitively appealing interpretations.

Figure 1. A directed acyclic graph (nodes A, B, C, D, E, F).
The parameterization of graphical log-linear models defined by directed acyclic graphs is based on the factorization of the distributions in such models given in Lauritzen (1996). The factorization involves functions depending only on subsets of the variables that consist of a node and its parents:
$$p(\omega) = \prod_{\alpha \in V} f_{\alpha}\big(\omega_{\{\alpha\} \cup \mathrm{pa}(\alpha)}\big),$$
where $\omega$ is a cell of the table, $V$ is the set of variables forming the table, $\mathrm{pa}(\alpha)$ is the set of variables that are parents of $\alpha$ and, for $W \subseteq V$, $\omega_W$ denotes the projection of the cell $\omega$ onto the variables in $W$.
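A common choice of the factors is $f_\alpha = p(\omega_\alpha \mid \omega_{\mathrm{pa}(\alpha)})$, the conditional distribution of a node given its parents. The sketch below builds a joint distribution of six binary variables for Figure 1 in this way and checks numerically that it sums to one and satisfies two of the listed conditional independencies. The conditional probabilities in `p_one` are made up for illustration.

```python
from itertools import product

ORDER = "ABCDEF"
# Parent sets of Figure 1, read off the subsets A, AB, AC, ABD, DE, CDF.
PARENTS = {"A": "", "B": "A", "C": "A", "D": "AB", "E": "D", "F": "CD"}


def p_one(node, cell):
    """Hypothetical P(node = 1 | parents): made-up numbers for illustration."""
    pa = PARENTS[node]
    if not pa:
        return 0.4
    return 0.3 + 0.4 * sum(cell[ORDER.index(v)] for v in pa) / len(pa)


def joint_prob(cell):
    """p(cell) as the product over nodes of f_alpha = P(alpha | parents)."""
    prob = 1.0
    for node in ORDER:
        q = p_one(node, cell)
        prob *= q if cell[ORDER.index(node)] == 1 else 1.0 - q
    return prob


joint = {cell: joint_prob(cell) for cell in product((0, 1), repeat=len(ORDER))}


def marginal(vars_):
    """Marginal table of the joint over the variables in `vars_`."""
    out = {}
    for cell, p in joint.items():
        key = tuple(cell[ORDER.index(v)] for v in vars_)
        out[key] = out.get(key, 0.0) + p
    return out


# E independent of ABCF given D:  p(cell) p(d) = p(d, e) p(a, b, c, d, f).
p_d, p_de, p_rest = marginal("D"), marginal("DE"), marginal("ABCDF")
ok = all(abs(p * p_d[(d,)] - p_de[(d, e)] * p_rest[(a, b, c, d, f)]) < 1e-12
         for (a, b, c, d, e, f), p in joint.items())
print(round(sum(joint.values()), 10), ok)  # 1.0 True
```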
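The subsets entering the factorization for the above example are, in a hierarchical order, the following:
$$A,\ AB,\ AC,\ ABD,\ DE,\ CDF,$$
where a node is always preceded by its parent(s). The hierarchical marginal log-linear parameters are the following:
$$\lambda^{A},\ \lambda^{A}_{i},\ \lambda^{AB}_{j},\ \lambda^{AB}_{ij},\ \lambda^{AC}_{k},\ \lambda^{AC}_{ik},\ \lambda^{ABD}_{**l},\ \lambda^{ABD}_{i*l},\ \lambda^{ABD}_{*jl},\ \lambda^{ABD}_{ijl},$$
$$\lambda^{DE}_{m},\ \lambda^{DE}_{lm},\ \lambda^{CDF}_{**n},\ \lambda^{CDF}_{*ln},\ \lambda^{CDF}_{k*n},\ \lambda^{CDF}_{kln}.$$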
It is easy to see that the distribution, assuming its positivity, has the desired conditional independence properties if and only if it has a parameterization as above, with all marginal log-linear parameters pertaining to effects not appearing in the list above set to zero. Therefore, the distributions in the graphical model are parameterized by the marginal log-linear parameters pertaining to the node-and-its-parents subsets of the variables.
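Apart from the overall effect $\lambda^{A}$, the list above can be read as follows: for every node, it contains the effects formed by the node together with any subset of its parents. This is a reading of the list, not a rule stated in the text. The sketch generates the effects this way for Figure 1, with single-letter variable names assumed; for example, $\lambda^{CDF}_{*ln}$ corresponds to the effect DF.

```python
from itertools import combinations

# Parent sets of the directed acyclic graph in Figure 1.
PARENTS = {"A": "", "B": "A", "C": "A", "D": "AB", "E": "D", "F": "CD"}


def node_effects(node, parents):
    """Effects made of `node` plus any subset of its parents."""
    out = []
    for k in range(len(parents) + 1):
        for subset in combinations(sorted(parents), k):
            out.append("".join(sorted(subset + (node,))))
    return out


for node, pa in PARENTS.items():
    subset = "".join(sorted(pa + node))  # the node-and-parents subset
    print(subset, node_effects(node, pa))
```

The printed subsets come out in the hierarchical order A, AB, AC, ABD, DE, CDF, with 15 effects in total, which together with $\lambda^{A}$ gives the 16 parameters listed.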
The parameterization presented here consists of parameters with a straightforward interpretation. For example, $\lambda^{A}_{i}$ is the effect of variable A, $\lambda^{AB}_{ij}$ is the effect of A on B, etc. Note that $\lambda^{ABD}_{ijl}$ is a measure of the joint effect of A and B on D, in addition to their separate effects; this parameter is present because the graph contains a directed triangle formed by these variables. This interpretation of $\lambda^{ABD}_{ijl}$ is justified because A and B precede D, and therefore the association among them may be interpreted as an effect.
The parameters are most easily interpreted when all variables are binary, as
in this case they have a single numerical value. In other cases, the parameters
are vector valued and this reflects the way in which effects are measured in
the log-linear tradition.
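For binary variables, the single numerical value can be related to odds ratios. Assuming effect (sum-to-zero) coding, one common choice not fixed by the text, the three-factor parameter $\lambda^{ABD}_{111}$ equals one eighth of the log ratio of the B and D odds ratios in the two categories of A. Likewise, $\lambda^{ABD}_{*11}$ equals one eighth of the sum of these two conditional log odds ratios. The ABD table below is made up.

```python
import math
from itertools import product


def effect_coded(table, effect):
    """Effect-coded log-linear parameter of a 2 x 2 x 2 table, evaluated at
    the first category of each variable in `effect` (a tuple of axis
    positions: 0 = A, 1 = B, 2 = D)."""
    total = 0.0
    for cell in product((0, 1), repeat=3):
        sign = 1
        for axis in effect:
            sign *= 1 if cell[axis] == 0 else -1
        total += sign * math.log(table[cell])
    return total / 8


# Hypothetical ABD marginal table; cells are (i, j, l) for A, B, D.
table = {(0, 0, 0): 10, (0, 0, 1): 4, (0, 1, 0): 6, (0, 1, 1): 8,
         (1, 0, 0): 5, (1, 0, 1): 7, (1, 1, 0): 9, (1, 1, 1): 3}


def bd_odds_ratio(a):
    """Odds ratio between B and D within category a of A."""
    return (table[(a, 0, 0)] * table[(a, 1, 1)]) / (table[(a, 0, 1)] * table[(a, 1, 0)])


lam_abd = effect_coded(table, (0, 1, 2))
print(lam_abd, math.log(bd_odds_ratio(0) / bd_odds_ratio(1)) / 8)  # equal values
```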
The approach to parameterizing graphical models based on directed acyclic graphs presented here makes it possible to associate values with the arrows in the graph in a meaningful way. Many potential users of graphical modeling may find this useful, and it gives graphical modeling a chance to compete with the popular LISREL approach (Jöreskog, (1997)), which, in a similar but different context, provides the user with numbers assigned to the arrows, representing the strengths of effects. In LISREL, the numbers are regression coefficients in marginal regression equations, but these equations are defined by the user without the opportunity to check their consistency or implications. In the approach outlined here, the numbers are values of marginal log-linear parameters. The models are specified with respect to the entire joint distribution using graphs, all the implications can be read off from the graph, and, by restricting attention to directed acyclic graphs, contradictory model specifications are impossible.
4. Fitting marginal models
The models discussed in this paper can be specified by the constraint
$$h(\mu) = 0, \qquad (1)$$
where $\mu = \log m$ is the vector of log expected cell frequencies and
$$h(\mu) = B' \log\big(A' \exp(\mu)\big) - v \qquad (2)$$
for certain fixed matrices $A$ and $B$ and a vector $v$. Under Poisson sampling, the kernel of the unrestricted log-likelihood is given as
$$L_n(\mu) = \sum_i \big(n_i \mu_i - \exp(\mu_i)\big).$$
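As an illustration of the form $h(\mu) = B' \log(A' \exp(\mu)) - v$, the sketch below writes marginal homogeneity of two repeated measurements with three categories in this form. $A'$ maps the cell frequencies to the row and column marginals, and $B'$ contrasts the logs of matching marginal totals. These particular matrices are one possible choice, assumed for the example; other equivalent choices exist.

```python
import numpy as np

I = 3  # number of categories of each of the two measurements

# Cells of the I x I table in row-major order.  A' stacks the operators that
# form the row marginal and the column marginal, so A' m = (rows, columns).
row_sums = np.kron(np.eye(I), np.ones((1, I)))  # shape (I, I*I)
col_sums = np.kron(np.ones((1, I)), np.eye(I))  # shape (I, I*I)
A = np.vstack([row_sums, col_sums]).T           # shape (I*I, 2I)

# B' contrasts log(row total i) with log(column total i) for i = 1..I-1; the
# last category is implied because both marginals share the same total.
B = np.hstack([np.eye(I)[: I - 1], -np.eye(I)[: I - 1]]).T  # shape (2I, I-1)
v = np.zeros(I - 1)


def h(mu):
    """h(mu) = B' log(A' exp(mu)) - v for marginal homogeneity."""
    return B.T @ np.log(A.T @ np.exp(mu)) - v


symmetric = np.array([[5.0, 2.0, 1.0], [2.0, 6.0, 3.0], [1.0, 3.0, 7.0]])
print(h(np.log(symmetric.ravel())))  # both entries are 0: equal marginals
```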
Notice that, when conditioned on the sample size, maximizing this kernel yields the same estimates as maximizing the multinomial likelihood. Maximum likelihood estimation under a statistical model is a constrained optimization problem. If $\hat\mu$, the maximum likelihood estimate (MLE), exists, and if the matrices $A$ and $B$ possess certain regularity properties, the MLE is a saddle point of the Lagrange function
$$L(\mu, \lambda) = L_n(\mu) - \lambda' h(\mu),$$
where $\lambda$ is a vector of Lagrange multipliers.
Aitchison, Silvey (1958) proposed a Fisher scoring method to find the saddle point of $L(\mu, \lambda)$, searching in the product space of the Lagrange multiplier vector and the vector of expected frequencies. A drawback of such an approach is that it does not distinguish between local maxima, local minima or saddle points of the likelihood function subject to the constraints; that is, the algorithm may converge to any stationary point, depending on the starting point. Only in certain special cases, for example for ordinary log-linear models (Haberman, (1974)), is there only one stationary point, which is the maximum of the likelihood. An improved approach (Fletcher, (1970), Rapcsák, (2000)) is based on a so-called exact penalty function $P_c(\mu)$, which has the MLE $\hat\mu$ as an unconstrained maximum. The function depends on a penalty parameter $c > 0$, which must be taken sufficiently large. The advantage is that standard optimization algorithms can be used to maximize $P_c(\mu)$, which is not possible with the Aitchison-Silvey approach. Furthermore, the search is done in the original parameter space of $\mu$, rather than in the product space of the $\lambda$ and $\mu$ parameter spaces, which also simplifies the search.
The function $P_c(\mu)$ is derived from $L(\mu, \lambda)$ by (i) writing the Lagrange multiplier as a function of $\mu$ and (ii) adding a penalty term that penalizes deviations from the model constraint $h(\mu) = 0$. The Lagrange multiplier, as a function of $\mu$, is determined by differentiating $L(\mu, \lambda)$ with respect to $\mu$, equating the result to zero, and solving for $\lambda$. With the Jacobian of $h(\mu)$ given by
$$H(\mu) = \frac{\partial h(\mu)}{\partial \mu'} = B' D_{A'm}^{-1} A' D_m,$$
the resulting equation $n - m - H'\lambda = 0$ generally has no exact solution away from the MLE; a possible solution, in the weighted least squares sense, is
$$\lambda(\mu) = (H D_m^{-1} H')^{-1} H D_m^{-1} (n - m),$$
where $H = H(\mu)$ and $D_x$ is the diagonal matrix with the vector $x$ on the main diagonal. A suitable nonnegative penalty term is the quadratic function $h(\mu)'(H D_m^{-1} H')^{-1} h(\mu)$. Thus, an appropriate exact penalty function has the form
$$P_c(\mu) = L(\mu, \lambda(\mu)) - \tfrac{1}{2}\, c\, h(\mu)'(H D_m^{-1} H')^{-1} h(\mu),$$
where $c$ is some positive constant. Then we have (Rapcsák, (2000)):
Theorem 5. There exists a $\bar{c} > 0$ such that, for every $c > \bar{c}$, $\hat\mu$ is an unconstrained local maximum of $P_c(\mu)$.
Thus, standard optimization algorithms can be used to find $\hat\mu$. However, a large enough value of $c$ needs to be selected. Initially, one can start with any value of $c$ greater than one. If the penalty term is found not to go to zero for the iterated estimates, the penalty parameter must be increased. When a sufficiently large penalty parameter has been found, the algorithm will converge to a local maximum of the likelihood. If there is some doubt whether this is the global maximum, the procedure must be repeated with different starting values.
The standard Newton approach involves complicated derivatives, which makes it impractical. However, a modified quasi-Newton approach based on simplified first and second derivatives of $P_c(\mu)$ can be used instead. It can be shown that the derivative of $P_c(\mu)$, with the derivative of $\lambda(\mu)$ replaced by its expected value, is given by
$$d(\mu) = n - m - H'\lambda(\mu) - (c-1)\, H'(H D_m^{-1} H')^{-1} h(\mu).$$
In spite of the simplification, $d(\mu)$ is still a valid search direction for $\hat\mu$: it vanishes at the MLE, where $h(\hat\mu) = 0$ and $n - \hat m = H'\lambda(\hat\mu)$.
Minus the expected value of the second derivative matrix, evaluated under the model $h(\mu) = 0$, is
$$F_c(\mu) = D_m + (c-2)\, H'(H D_m^{-1} H')^{-1} H,$$
and for $c > 1$ this matrix is positive definite.
For sufficiently large $c > 1$, an algorithm is then
$$\mu^{(0)} = \log n, \qquad \mu^{(k+1)} = \mu^{(k)} + \mathrm{step}^{(k)}\, F_c(\mu^{(k)})^{-1} d(\mu^{(k)}),$$
where $\mathrm{step}^{(k)} \in (0, 1]$ is a step size chosen such that $P_c(\mu^{(k+1)}) > P_c(\mu^{(k)})$, and where, in the starting value, any $n_i = 0$ is replaced by a small positive quantity, say $10^{-50}$. The above algorithm will converge to $\hat\mu$ if the starting estimate $\log n$ is sufficiently close to it. Otherwise, a different starting estimate may need to be tried.
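A minimal NumPy sketch of this iteration follows; it is not the authors' implementation. The default $c = 10$, the tolerances, the cap on step halvings and the small slack in the ascent test (added to absorb floating-point rounding near convergence) are assumptions of the sketch. Each step moves along $F_c^{-1} d$, with $F_c$ used as a positive definite scoring matrix. The two checks fit independence in a 2 × 2 table and marginal homogeneity (equal first-category proportions of the row and column variables), whose MLEs are known in closed form.

```python
import numpy as np


def fit_marginal_model(n, A, B, v, c=10.0, tol=1e-8, max_iter=500):
    """Sketch of the penalty-function quasi-Newton iteration for the
    constraint h(mu) = B' log(A' exp(mu)) - v = 0 under Poisson sampling.
    Returns the fitted expected frequencies m = exp(mu)."""
    n = np.asarray(n, dtype=float)
    # Starting value mu(0) = log n, with zero counts replaced by 1e-50.
    mu = np.log(np.where(n > 0, n, 1e-50))

    def pieces(mu):
        m = np.exp(mu)
        Am = A.T @ m
        h = B.T @ np.log(Am) - v
        H = B.T @ ((A.T * m) / Am[:, None])  # Jacobian B' D_{A'm}^{-1} A' D_m
        W = np.linalg.inv((H / m) @ H.T)     # (H D_m^{-1} H')^{-1}
        lam = W @ (H @ ((n - m) / m))        # Lagrange multiplier lambda(mu)
        P = n @ mu - m.sum() - lam @ h - 0.5 * c * h @ W @ h  # P_c(mu)
        return m, h, H, W, lam, P

    for _ in range(max_iter):
        m, h, H, W, lam, P = pieces(mu)
        # Simplified first derivative d(mu) and scoring matrix F_c(mu).
        d = n - m - H.T @ lam - (c - 1) * H.T @ (W @ h)
        if np.max(np.abs(d)) < tol:
            break
        F = np.diag(m) + (c - 2) * H.T @ W @ H
        delta = np.linalg.solve(F, d)
        # Halve the step until P_c does not decrease (up to rounding slack).
        step = 1.0
        while step > 1e-10 and pieces(mu + step * delta)[-1] < P - 1e-12 * abs(P):
            step /= 2
        mu = mu + step * delta
    return np.exp(mu)


# Independence in a 2 x 2 table (cells 11, 12, 21, 22): log odds ratio = 0.
m_ind = fit_marginal_model([20, 10, 5, 15], np.eye(4),
                           np.array([[1.0], [-1.0], [-1.0], [1.0]]), np.zeros(1))

# Marginal homogeneity: log(m11 + m12) - log(m11 + m21) = 0.
A_mh = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
m_mh = fit_marginal_model([10, 20, 5, 15], A_mh, np.array([[1.0], [-1.0]]), np.zeros(1))
print(np.round(m_ind, 4), np.round(m_mh, 4))
```

For these counts the closed-form MLEs are (15, 15, 10, 10) under independence and (10, 12.5, 12.5, 15) under marginal homogeneity, so the iterations can be checked against them. If the penalty term does not go to zero, $c$ would have to be increased, as described above.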
Acknowledgments
Rudas’s research was supported in part by Grant No. OTKA T-032213 from the Hungarian
National Science Foundation. Bergsma’s research was supported by The Netherlands Organization
for Scientific Research (NWO), Project Number 400-20-001.
REFERENCES
Agresti, A. (1990) Categorical Data Analysis, Wiley, New York.
Aitchison, J. and Silvey, S. D. (1958) Maximum likelihood estimation of parameters subject to restraints, Ann. Math. Stat., 29, 813-828.
Balagtas, C. C., Becker, M. P., and Lang, J. B. (1995) Marginal modelling of categorical data from crossover experiments, Applied Statistics, 44, 63-77.
Bartolucci, F. and Forcina, A. (2002) Extended RC association models allowing for order restrictions and marginal modeling, J. Amer. Statist. Assoc., 97(460), 1192-1199.
Bartolucci, F., Forcina, A., and Dardanoni, V. (2001) Positive quadrant dependence and marginal modeling in two-way tables with ordered margins, J. Amer. Statist. Assoc., 96(456), 1497-1505.
Becker, M. P. (1994) Analysis of repeated categorical measurements using models for marginal distributions: an application to trends in attitudes on legalized abortion, In Marsden, P. V. (ed.) Sociological Methodology, 24, 229-265, Blackwell, Oxford.
Becker, M. P., Minick, S., and Yang, I. (1998) Specifications of models for cross-classified counts: comparisons of the log-linear model and marginal model perspectives, Sociological Methods and Research, 26, 511-529.
Bergsma, W. P. (1997) Marginal Models for Categorical Data, Tilburg University Press, Tilburg.
Bergsma, W. P. and Rudas, T. (2002a) Marginal models for categorical data, Ann. Stat., 30, 140-159.
Bergsma, W. P. and Rudas, T. (2002b) Modeling conditional and marginal association in contingency tables, Ann. Fac. Sci. Toulouse Math., 11(6), 443-454.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975) Discrete Multivariate Analysis, MIT Press, Cambridge, MA.
Colombi, R. and Forcina, A. (2001) Marginal regression models for the analysis of positive association of ordinal response variables, Biometrika, 88(4), 1007-1019.
Darroch, J. N., Lauritzen, S. L., and Speed, T. P. (1980) Markov fields and log-linear models for contingency tables, Ann. Stat., 8, 539-552.
Davis, L. J. (1989) Intersection union tests for strict collapsibility in three-dimensional contingency tables, Ann. Stat., 17, 1693-1708.
Dawid, A. P. (1980) Conditional independence for statistical operations, Ann. Stat., 8, 598-617.
Fletcher, R. (1970) A class of methods for nonlinear programming with termination and convergence properties, In Abadie, J., Wolfe, P. (eds.) Integer and Nonlinear Programming, North Holland, Amsterdam.
Glonek, G. J. N. (1996) A class of regression models for multivariate responses, Biometrika, 83, 15-28.
Glonek, G. J. N. and McCullagh, P. (1995) Multivariate logistic models, J. Roy. Statist. Soc., Ser. B, 57, 533-546.
Haberman, S. J. (1974) The Analysis of Frequency Data, University of Chicago Press, Chicago.
Hagenaars, J. A. (1990) Categorical Longitudinal Data, Sage, Newbury Park.
Jöreskog, K. G. (1997) Structural equation models in the social sciences: specification, estimation, and testing, In Krishnaiah, P. R. (ed.) Applications of Statistics, 267-287, North-Holland, Amsterdam.
Kauermann, G. (1997) A note on multivariate logistic models for contingency tables, Austr. J. Stat., 39, 261-276.
Kellerer, H. G. (1964) Verteilungsfunktionen mit gegebenen Marginalverteilungen, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 3, 247-270.
Lang, J. B. and Agresti, A. (1994) Simultaneously modelling the joint and marginal distributions of multivariate categorical responses, J. Am. Stat. Assoc., 89, 625-632.
Lang, J. B., McDonald, J. W., and Smith, P. W. F. (1999) Association-marginal modelling of multivariate categorical responses: a maximum likelihood approach, J. Am. Stat. Assoc., 94, 1161-1171.
Lauritzen, S. L. (1996) Graphical Models, Clarendon Press, Oxford.
Lauritzen, S. L., Speed, T. P., and Vijayan, K. (1984) Decomposable graphs and hypergraphs, J. Austr. Math. Soc., Ser. A, 36, 12-29.
Little, R. J. and Rubin, D. (1987) Statistical Analysis with Missing Data, Wiley, New York.
Molenberghs, G. and Lesaffre, E. (1999) Marginal modelling of multivariate categorical data, Statistics in Medicine, 18, 2237-2255.
Rapcsák, T. (2000) Global Lagrange multiplier rule and smooth exact penalty functions for equality constraints, In Di Pillo, G., Giannessi, F. (eds.) Nonlinear Optimization and Related Topics, Kluwer, 351-368.
Rudas, T. (1998a) Odds Ratios in the Analysis of Contingency Tables, Sage, Thousand Oaks.
Rudas, T. (1998b) A new algorithm for the maximum likelihood estimation of graphical log-linear models, Computational Statistics, 13(9), 529-537.
Rudas, T. (2002) Canonical representation of log-linear models, Communications in Statistics (Theory and Methods), 31(12), 2311-2323.
TAMÁS RUDAS
Department of Statistics
Faculty of Social Sciences
Eötvös Loránd University
H-1117 Budapest
Pázmány Péter sétány 1/A (Hungary)
rudas@tarki.hu
WICHER P. BERGSMA
EURANDOM, office LG 1.37
P.O. Box 513
5600 MB Eindhoven (The Netherlands)
bergsma@eurandom.tue.nl
... A disszertáció tárgyát képezı, kategoriális változókra alkalmazott grafikus modellek 6 (Rudas, Bergsma, 2004;Rudas, Bergsma, Németh, 2006ab) az utóbbi évtizedekben megjelent két statisztikai terület, a marginális modellek, illetve a grafikus modellek metszéspontján helyezkednek el. E két területet tárgyalom részletesebben a következıkben. ...
... Az alábbiakban a marginális loglineáris paraméterezés további, alapvetı fontosságú tulajdonságait tárgyalom. Hacsak nem jelzem másként, a tételeket Bergsma, Rudas (2002)-bıl és Rudas, Bergsma (2004)-bıl emelem ki, a definíciók, jelölések és magyarázatok esetében e forrásokon kívül Lauritzen (1996)-ra is támaszkodom majd. ...
... Pl. az alábbi irányított körmentes gráf esetén 13. ábra. IKG (Rudas, Bergsma, 2004) páronkénti Markov-tulajdonság mellett többek között ez a két állítás teljesül a modellre: ...
... The objective variables involved in the hypothesis are completed with subjective social position and subjective intergenerational mobility, aiming to take into account perception as a pathway through which objective factors have an impact on individual behavior. The fitted models are graphical models based on directed acyclic graphs and the values of marginal log-linear parameters (as proposed in Rudas, Bergsma, 2004) are used to gain insight into the strengths of associations. The main findings include that according to some parameters, a downward mobility trend prevailed between 1987 and 1992 as opposed to the upward trend between 1992 and 1999. ...
... subjective perception also has mental determinants etc). Rather the values of marginal log-linear parameters (as proposed in Rudas, Bergsma, 2004) are of interest to gain insight into the strengths of associations and into their change over time. ...
... Graphical loglinear modeling is a plausible alternative of the popular LISREL approach, since in LISREL, numbers assigned to the arrows are regression coefficients of different marginal regression equations and their consistency cannot be checked. In the marginal loglinear approach a model is specified with respect to the entire joint distribution hence contradicting model specifications are impossible (Rudas, Bergsma, 2004). The analysis of the model in Figure 1 was repeated by applying path analysis (about the methods and results see Section 5) 7 . ...
Conference Paper
Full-text available
The paper analyzes social mobility data for Hungary from the years 1987, 1992 and 1999. The main focus is put on testing Treiman’s modernization hypothesis that was posed in 1970 and is still widely cited today in the context of transition. The objective variables involved in the hypothesis are completed with subjective social position and subjective intergenerational mobility, aiming to take into account perception as a pathway through which objective factors have an impact on individual behavior. The fitted models are graphical models based on directed acyclic graphs and the values of marginal log-linear parameters (as proposed in Rudas, Bergsma, 2004) are used to gain insight into the strengths of associations. The main findings include that according to some parameters, a downward mobility trend prevailed between 1987 and 1992 as opposed to the upward trend between 1992 and 1999. That is, when investigating transition process we should distinguish these two periods.
... One of the most desirable properties of parameters and parameterizations is the variation independence of their components. Before giving a general definition, a simple example adopted from Rudas and Bergsma (2004) is given to illustrate the concept. ...
Preprint
Full-text available
Marginal models involve restrictions on the conditional and marginal association structure of a set of categorical variables. They generalize log-linear models for contingency tables, which are the fundamental tools for modelling the conditional association structure. This chapter gives an overview of the development of marginal models during the past 20 years. After providing some motivating examples, the first few sections focus on the definition and characteristics of marginal models. Specifically, we show how their fundamental properties can be understood from the properties of marginal log-linear parameterizations. Algorithms for estimating marginal models are discussed, focussing on the maximum likelihood and the generalized estimating equations approaches. It is shown how marginal models can help to understand directed graphical and path models, and a description is given of marginal models with latent variables.
... More about this topic can be found in Rudas and Bergsma (2004), Németh (2009), Rudas et al. (2006), Rudas et al. (2010) and Bergsma et al. (2009). This introduction is mostly based on Németh (2009). ...
Article
It is crucial to understand the role that labor market positions might play in creating gender differences in work–life balance. One theoretical approach to understanding this relationship is the spillover theory. The spillover theory argues that an individual’s life domains are integrated; meaning that well-being can be transmitted between life domains. Based on data collected in Hungary in 2014, this paper shows that work-to-family spillover does not affect both genders the same way. The effect of work on family life tends to be more negative for women than for men. Two explanations have been formulated in order to understand this gender inequality. According to the findings of the analysis, gender is conditionally independent of spillover if financial status and flexibility of work are also incorporated into the analysis. This means that the relative disadvantage for women in terms of spillover can be attributed to their lower financial status and their relatively low access to flexible jobs. In other words, the gender inequalities in work-to-family spillover are deeply affected by individual labor market positions. The observation of the labor market’s effect on work–life balance is especially important in Hungary since Hungary has one of the least flexible labor arrangements in Europe. A marginal log-linear model, which is a method for categorical multivariate analysis, has been applied in this analysis.
Article
Categorical marginal models (CMMs) are flexible tools for modelling dependent or clustered categorical data, when the dependencies themselves are not of interest. A major limitation of maximum likelihood (ML) estimation of CMMs is that the size of the contingency table increases exponentially with the number of variables, so even for a moderate number of variables, say between 10 and 20, ML estimation can become computationally infeasible. An alternative method, which retains the optimal asymptotic efficiency of ML, is maximum empirical likelihood (MEL) estimation. However, we show that MEL tends to break down for large, sparse contingency tables. As a solution, we propose a new method, which we call maximum augmented empirical likelihood (MAEL) estimation and which involves augmentation of the empirical likelihood support with a number of well-chosen cells. Simulation results show good finite sample performance for very large contingency tables.
Preprint
Bayesian methods for graphical log-linear marginal models have not been developed to the same extent as traditional frequentist approaches. In this work, we introduce a novel Bayesian approach to quantitative learning for such models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. Furthermore, the likelihood cannot be expressed analytically as a function of the marginal log-linear interactions, but only in terms of cell counts or probabilities. Posterior distributions cannot be obtained directly, and MCMC methods are needed. Finally, a well-defined model requires parameter values that lead to compatible marginal probabilities. Hence, any MCMC scheme should account for this important restriction. We construct a fully automatic and efficient MCMC strategy for quantitative learning for graphical log-linear marginal models that handles these problems. While the prior is expressed in terms of the marginal log-linear interactions, we build an MCMC algorithm that employs a proposal on the probability parameter space. The corresponding proposal on the marginal log-linear interactions is obtained via parameter transformation. By this strategy, we are able to move within the desired target space. At each step, we work directly with well-defined probability distributions. Moreover, we can exploit a conditionally conjugate setup to build an efficient proposal on probability parameters. The proposed methodology is illustrated by a simulation study and a real dataset.
Article
When data composed of several categorical responses together with categorical or continuous predictors are observed, the multivariate logistic transform introduced by McCullagh and Nelder can be used to define a class of regression models that is, in many applications, particularly suitable for relating the joint distribution of the responses to predictors. In this paper we give a general definition of this class of models and study their properties. A computational scheme for performing maximum likelihood estimation for data sets of moderate size is described and a system of model formulae that succinctly define particular models is introduced. Applications of these models to longitudinal problems are illustrated by numerical examples.
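For the simplest case of two binary responses, the multivariate logistic transform reduces to the two marginal logits plus the marginal log odds ratio. The Python sketch below computes that vector from a 2×2 table of joint probabilities. The function name is hypothetical, responses are assumed coded 0/1, and all cells are assumed positive. A regression model of the kind described above would then let each component depend linearly on the predictors. With more than two responses the transform also contains higher-order log-linear contrasts, which this sketch omits.

```python
import math


def bivariate_logistic_transform(p):
    """Map a 2x2 table p[y1][y2] of joint probabilities for two binary
    responses (coded 0/1) to (logit P(Y1=1), logit P(Y2=1), log odds ratio)."""
    p1 = p[1][0] + p[1][1]  # marginal P(Y1 = 1)
    q1 = p[0][1] + p[1][1]  # marginal P(Y2 = 1)
    eta1 = math.log(p1 / (1 - p1))  # marginal logit of Y1
    eta2 = math.log(q1 / (1 - q1))  # marginal logit of Y2
    eta12 = math.log(p[1][1] * p[0][0] / (p[1][0] * p[0][1]))  # log odds ratio
    return eta1, eta2, eta12
```

For instance, the uniform table with every cell equal to 0.25 maps to (0, 0, 0): both margins are balanced and the responses are independent.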
Article
Generalized log-linear models can be used to describe the association structure and/or the marginal distributions of multivariate categorical responses. We simultaneously model the association structure and marginal distributions using association-marginal (AM) models, which are specially formulated generalized log-linear models that combine two models: an association (A) model, which describes the association among all the responses; and a marginal (M) model, which describes the marginal distributions of the responses. Because the model's composite link function is not required to be invertible, a large class of models can be entertained and model specification is typically straightforward. We propose a "mixed freedom/constraint" parameterization that exploits the special structure of an AM model. Using this parameterization, maximum likelihood fitting is straightforward and typically feasible for large, sparse tables. When a parsimonious association model is used, the size of the fitting problem is substantially reduced, and some of the problems associated with sampling zeros are avoided. We compare the asymptotic behavior of AM model parameter estimators assuming product-multinomial and Poisson sampling. For computational convenience, the product-multinomial variances are obtained by adjusting the Poisson variances. We propose a conditional score statistic for AM model assessment. The proposed maximum likelihood methods are illustrated through an analysis of marijuana use data from five waves of the National Youth Survey.
Article
Models parameterized in terms of linear models for marginal logits and linear models for marginal log-odds ratios provide a useful framework for the analysis of cross-classifications of counts when there is interest in comparing marginal distributions, or studying changes in marginal distributions. Examples of such cross-classifications include tabulations of responses to a collection of items from a questionnaire, and contingency tables cross-classifying the repeated measurements of a categorical response variable from longitudinal studies. The example used for illustrative purposes in this chapter is based on four items common to the National Opinion Research Center's (NORC) 1965 SRS870, 1975 General Social Survey (GSS), and 1985 GSS. Each of the items asked respondents to indicate whether or not they approved of the availability of legalized abortions for women in a specific situation. The utility of the marginal models approach to analyzing repeated categorical measurements is demonstrated through comparisons with two analyses based on conventional log-linear models. An algorithm that can be used to fit marginal models by the method of maximum likelihood is described in the appendix to this chapter.
Article
Graphical log-linear models for contingency tables are characterized as intersections of simpler conditional independence models. This suggests an algorithm to compute maximum likelihood estimates by iteratively imposing the conditional independences on the data. A convergence proof is given and this algorithm is compared to iterative proportional fitting, which is the standard method of computing maximum likelihood estimates under log-linear models. The algorithm proposed here is intuitively appealing and its main idea can be applied to the estimation of other statistical models.
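For comparison, here is a minimal Python/NumPy sketch of classical iterative proportional fitting (IPF), the standard method the abstract refers to, not the algorithm it proposes. It fits the no-three-factor-interaction model [AB][AC][BC] to a three-way table. This is a log-linear model that is not graphical, and it is a case where IPF genuinely has to iterate. The sketch assumes all observed two-way margins are positive; the function name and tolerance settings are illustrative choices.

```python
import numpy as np


def ipf_no_three_way(counts, tol=1e-10, max_iter=1000):
    """Fit the log-linear model [AB][AC][BC] to a three-way table by IPF.

    Each cycle rescales the fitted table so that its AB, AC and BC margins
    match the observed ones; the loop stops when the fitted values change by
    less than tol. Assumes every observed two-way margin is positive."""
    counts = np.asarray(counts, dtype=float)
    # A uniform start has no interactions, so IPF converges to the MLE.
    fit = np.full(counts.shape, counts.sum() / counts.size)
    for _ in range(max_iter):
        old = fit.copy()
        fit *= (counts.sum(axis=2) / fit.sum(axis=2))[:, :, None]  # match AB margin
        fit *= (counts.sum(axis=1) / fit.sum(axis=1))[:, None, :]  # match AC margin
        fit *= (counts.sum(axis=0) / fit.sum(axis=0))[None, :, :]  # match BC margin
        if np.max(np.abs(fit - old)) < tol:
            break
    return fit
```

In a 2×2×2 table, the fitted values reproduce all three observed two-way margins. The conditional AB odds ratios are then equal at both levels of C, which is what "no three-factor interaction" means.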
Article
Marginal models provide a useful framework for the analysis of crossover experiments when the response variable is categorical. We use the three-treatment, three-period crossover experiment with a binary outcome variable to demonstrate how marginal models can be used to perform a likelihood-based analysis of multiple-period crossover experiments. Other designs are discussed in less detail. Maximum likelihood estimation is performed using a constraint equation specification of the marginal model. Data from a crossover trial comparing treatments for primary dysmenorrhoea are used to demonstrate the utility of marginal models in analysing crossover data.
Article
Smooth exact penalty functions based on Courant’s and Fletcher’s ideas are reconsidered. After a short survey, the original ideas are combined with the global Lagrange multiplier rule formulated by the first and second covariant derivatives of the objective function with respect to the induced Riemannian metric of the constraint manifold. The tensor approach is described by the usual tools of nonlinear optimization giving a clearer geometric background of these methods.
Article
Given a set of discrete response variables, some of which are ordinal, and an arbitrary set of discrete explanatory variables, we propose a simple matrix formulation for parameterising the saturated model as in Glonek (1996). This is such that, within a hierarchical structure, marginal logits and log-odds ratios of various possible types, together with the remaining log-linear interactions of high order, may be modelled by equality and inequality constraints. Inequality constraints are particularly relevant for specifying models of positive association. Efficient algorithms are provided for computing maximum likelihood estimates under such constraints. The asymptotic distribution of the likelihood ratio test is derived and an extension of the usual analysis of deviance is outlined which incorporates inequality constraints.
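Two of the logit types mentioned above can be sketched in Python for a single ordinal margin: local (adjacent-category) logits and global (cumulative) logits. The direction convention, higher categories over lower ones, and the function names are assumptions; other papers use the reverse sign. Categories are assumed to have positive probabilities.

```python
import math


def local_logits(p):
    """Adjacent-category (local) logits log(p[j+1] / p[j]) of an ordinal
    marginal distribution p = [p_1, ..., p_k]."""
    return [math.log(p[j + 1] / p[j]) for j in range(len(p) - 1)]


def global_logits(p):
    """Cumulative (global) logits log(P(X > j) / P(X <= j)) for the k - 1
    cutpoints j = 1, ..., k - 1 of an ordinal marginal distribution p."""
    return [math.log(sum(p[j:]) / sum(p[:j])) for j in range(1, len(p))]
```

Both give k - 1 numbers for a k-category margin. Local logits compare neighbouring categories, while global logits dichotomize the scale at each cutpoint. This makes global logits the natural choice when a model imposes stochastic-ordering inequality constraints.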