CONTRIBUTED RESEARCH ARTICLES
hglm: A Package for Fitting Hierarchical
Generalized Linear Models
by Lars Rönnegård, Xia Shen and Moudud Alam
Abstract We present the hglm package for fit-
ting hierarchical generalized linear models. It
can be used for linear mixed models and gener-
alized linear mixed models with random effects
for a variety of links and a variety of distribu-
tions for both the outcomes and the random ef-
fects. Fixed effects can also be fitted in the dis-
persion part of the model.
Introduction
The hglm package (Alam et al.,2010) implements
the estimation algorithm for hierarchical general-
ized linear models (HGLM; Lee and Nelder,1996).
The package fits generalized linear models (GLM;
McCullagh and Nelder,1989) with random effects,
where the random effect may come from a distribu-
tion conjugate to one of the exponential-family dis-
tributions (normal, gamma, beta or inverse-gamma).
The user may explicitly specify the design matrices
both for the fixed and random effects. In conse-
quence, correlated random effects, as well as random
regression models can be fitted. The dispersion pa-
rameter can also be modeled with fixed effects.
The main function is hglm() and the input is spec-
ified in a similar manner as for glm(). For instance,
R> hglm(fixed = y ~ week, random = ~ 1|ID,
family = binomial(link = logit))
fits a logit model for y with week as fixed effect and ID
representing the clusters for a normally distributed
random intercept. Given an hglm object, the standard
generic functions are print(), summary() and plot().
Generalized linear mixed models (GLMM) have
previously been implemented in several R functions,
such as the lmer() function in the lme4 package
(Bates and Maechler,2010) and the glmmPQL() func-
tion in the MASS package (Venables and Ripley,
2002). In GLMM, the random effects are assumed
to be Gaussian whereas the hglm() function allows
other distributions to be specified for the random
effect. The hglm() function also extends the fitting
algorithm of the dglm package (Dunn and Smyth,
2009) by including random effects in the linear pre-
dictor for the mean, i.e. it extends the algorithm so
that it can cope with mixed models. Moreover, the
model specification in hglm() can be given as a formula
or alternatively in terms of y, X, Z and X.disp.
Here y is the vector of observed responses, X and
Z are the design matrices for the fixed and random
effects, respectively, in the linear predictor for the
means, and X.disp is the design matrix for the fixed
effects in the dispersion parameter. Specifying the
matrices directly allows more flexible modeling of the
random effects than an R formula does. This option is
less user friendly, but it makes it possible to fit
random regression models and
random effects with known correlation structure.
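As an illustration of the matrix interface, the following base R sketch builds design matrices of the kind described above. The data frame `dat`, its columns and the object names `Z_int`, `Z_slope` and `X_disp` are hypothetical and only serve the example; matrices like these would be passed on through the X, Z and X.disp arguments.

```r
# Hypothetical layout: 4 subjects (ID), each observed at weeks 0, 1, 2
dat <- data.frame(ID = factor(rep(1:4, each = 3)),
                  week = rep(0:2, times = 4))

# Fixed-effects design matrix: intercept and week
X <- model.matrix(~ week, data = dat)

# Random intercept: one indicator column per subject
Z_int <- model.matrix(~ 0 + ID, data = dat)

# Random regression on week: each indicator column scaled by week
Z_slope <- Z_int * dat$week

# Dispersion design matrix: intercept only (homoscedastic residuals)
X_disp <- model.matrix(~ 1, data = dat)
```

Each row of `Z_int` has a single 1 in the column of its subject, and the corresponding row of `Z_slope` carries the value of week in the same position.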
The hglm package produces estimates of fixed
effects, random effects and variance components as
well as their standard errors. In the output it also
produces diagnostics such as deviance components
and leverages.
Three illustrating models
The hglm package makes it possible to
1. include fixed effects in a model for the residual
variance,
2. fit models where the random effect distribution
is not necessarily Gaussian,
3. estimate variance components when we have
correlated random effects.
Below we describe three models that can be fitted us-
ing hglm(), which illustrate these three points. Later,
in the Examples section, five examples are presented
that include the R syntax and output for the hglm()
function.
Linear mixed model with fixed effects in
the residual variance
We start by considering a normal-normal model with
heteroscedastic residual variance. In biology, for in-
stance, this is important if we wish to model a ran-
dom genetic effect (e.g., Rönnegård and Carlborg,
2007) for a trait y, where the residual variance differs
between the sexes.
For the response $y$ and observation number $i$ we
have:
$$y_i \mid \beta, u, \beta_d \sim N\left(X_i\beta + Z_i u,\; \exp(X_{d,i}\beta_d)\right)$$
$$u \sim \mathrm{MVN}\left(0,\; I\sigma_u^2\right)$$
where $\beta$ are the fixed effects in the mean part of the
model, the random effect $u$ represents random variation
among clusters of observations and $\beta_d$ is the
fixed effect in the residual variance part of the model.
The variance of the random effect $u$ is given by $\sigma_u^2$.
The R Journal Vol. 2/2, December 2010 ISSN 2073-4859
The subscript $i$ for the matrices $X$, $Z$, and $X_d$ indicates
the $i$'th row. Here, a log link function is used
for the residual variance and the model for the residual
variance is therefore given by $\exp(X_{d,i}\beta_d)$. In
the more general GLM notation, the residual variance
here is described by the dispersion term $\phi$, so
we have $\log(\phi_i) = X_{d,i}\beta_d$.
This model cannot be fitted with the dglm pack-
age, for instance, because we have random effects in
the mean part of the model. It is also beyond the
scope of the lmer() function since we allow a model
for the residual variance.
The implementation in hglm() for this model is
demonstrated in Example 2 in the Examples section
below.
A Poisson model with gamma distributed
random effects
For dependent count data it is common to model
a Poisson distributed response with a gamma distributed
random effect (Lee et al., 2006). If we assume
no overdispersion conditional on $u$ and thereby have
a fixed dispersion term, this model may be specified
as:
$$E(y_i \mid \beta, u) = \exp(X_i\beta + Z_i v)$$
where a level $j$ in the random effect $v$ is given by
$v_j = \log(u_j)$ and the $u_j$ are iid gamma distributed
with mean and variance $E(u_j) = 1$, $\mathrm{var}(u_j) = \lambda$.
This model can also be fitted with the hglm pack-
age, since it extends existing GLMM functions (e.g.
lmer()) to allow a non-normal distribution for the
random effect. Later on, in Example 3, we show the
hglm() code used for fitting a gamma-Poisson model
with fixed effects included in the dispersion parame-
ter.
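A minimal base R sketch of how data from this gamma-Poisson model could be simulated. It relies on the shape/scale parametrization of rgamma(); the number of levels, the value of λ and the intercept `beta0` are made up for illustration.

```r
set.seed(42)
lambda <- 0.5      # var(u_j), illustrative value
k <- 1e5           # number of levels (large, so that the moments can be checked)

# A gamma variable with shape 1/lambda and scale lambda has mean 1 and variance lambda
u <- rgamma(k, shape = 1 / lambda, scale = lambda)
v <- log(u)        # random effect on the linear-predictor scale

# Two observations per level and an intercept-only fixed part
cluster <- rep(seq_len(k), each = 2)
beta0 <- 0.3                       # illustrative intercept
mu <- exp(beta0 + v[cluster])      # E(y | u) = exp(X beta + Z v)
y <- rpois(length(mu), mu)

c(mean_u = mean(u), var_u = var(u))
```

The sample mean and variance of `u` should be close to 1 and λ, which is the constraint that makes the intercept identifiable on the log scale.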
A linear mixed model with a correlated
random effect
In animal breeding it is important to estimate variance
components prior to ranking of animal performances
(Lynch and Walsh, 1998). In such models the
genetic effect of each animal is modeled as a level
in a random effect and the correlation structure $A$ is
a matrix with known elements calculated from the
pedigree information. The model is given by
$$y_i \mid \beta, u \sim N\left(X_i\beta + Z_i u,\; \sigma_e^2\right)$$
$$u \sim \mathrm{MVN}\left(0,\; A\sigma_u^2\right)$$
This may be reformulated as (see Lee et al., 2006;
Rönnegård and Carlborg, 2007)
$$y_i \mid \beta, u \sim N\left(X_i\beta + Z_i^* u^*,\; \sigma_e^2\right)$$
$$u^* \sim \mathrm{MVN}\left(0,\; I\sigma_u^2\right)$$
where $Z^* = ZL$ and $L$ is the Cholesky factor of
$A$, i.e. $A = LL^T$.
Thus the model can be fitted using the hglm()
function with a user-specified input matrix Z (see R
code in Example 4 below).
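The reformulation can be checked numerically. The sketch below uses an arbitrary positive definite matrix as a stand-in for A and a randomly generated incidence matrix; both are assumptions made for illustration. Note that chol() in R returns the upper triangular factor, so it is transposed to obtain L.

```r
set.seed(7)
q <- 4       # levels of the random effect
n <- 10      # observations

# Arbitrary symmetric positive definite matrix standing in for A
B <- matrix(rnorm(q * q), q, q)
A <- crossprod(B) + diag(q)

# Incidence matrix: each observation belongs to exactly one level
Z <- diag(q)[sample(q, n, replace = TRUE), , drop = FALSE]

# chol() gives U with A = U'U, hence L = U' satisfies A = L L'
L <- t(chol(A))
Z_star <- Z %*% L

# Both parametrizations imply the same covariance for the random part
max(abs(tcrossprod(Z_star) - Z %*% A %*% t(Z)))
```

Because $Z^* Z^{*T} = Z A Z^T$, a model with independent effects $u^*$ and design matrix $Z^*$ has the same marginal covariance as the model with correlated effects $u$ and design matrix $Z$.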
Overview of the fitting algorithm
The fitting algorithm is described in detail in Lee
et al. (2006) and is summarized as follows. Let $n$ be
the number of observations and $k$ be the number of
levels in the random effect. The algorithm is then:
1. Initialize starting values.
2. Construct an augmented model with response
$y_{aug} = \begin{pmatrix} y \\ E(u) \end{pmatrix}$.
3. Use a GLM to estimate $\beta$ and $v$ given the vector
$\phi$ and the dispersion parameter for the random
effect $\lambda$. Save the deviance components
and leverages from the fitted model.
4. Use a gamma GLM to estimate $\beta_d$ from the
first $n$ deviance components $d$ and leverages
$h$ obtained from the previous model. The response
variable and weights for this model are
$d/(1-h)$ and $(1-h)/2$, respectively. Update
the dispersion parameter by putting $\phi$ equal to
the predicted response values for this model.
5. Use a similar GLM as in Step 4 to estimate $\lambda$
from the last $k$ deviance components and leverages
obtained from the GLM in Step 3.
6. Iterate between steps 3-5 until convergence.
For a more detailed description of the algorithm
in a particular context, see below.
H-likelihood theory
Let $y$ be the response and $u$ an unobserved random
effect. The hglm package fits a hierarchical model
$y \mid u \sim f_m(\mu, \phi)$ and $u \sim f_d(\psi, \lambda)$ where $f_m$ and $f_d$ are
specified distributions for the mean and dispersion
parts of the model.
We follow the notation of Lee and Nelder (1996),
which is based on the GLM terminology by McCullagh
and Nelder (1989). We also follow the likelihood
approach where the model is described in terms of
likelihoods. The conditional (log-)likelihood for $y$
given $u$ has the form of a GLM
$$\ell(\theta', \phi; y \mid u) = \frac{y\theta' - b(\theta')}{a(\phi)} + c(y, \phi) \tag{1}$$
where $\theta'$ is the canonical parameter, $\phi$ is the dispersion
term, $\mu'$ is the conditional mean of $y$ given $u$
where $\eta' = g(\mu')$, i.e. $g()$ is a link function for the
GLM. The linear predictor is given by $\eta' = \eta + v$
where $\eta = X\beta$ and $v = v(u)$ for some strictly monotonic
function of $u$. The link function $v(u)$ should be specified
so that the random effects occur linearly in the
linear predictor to ensure meaningful inference from
the h-likelihood (Lee et al., 2007). The h-likelihood
or hierarchical likelihood is defined by
$$h = \ell(\theta', \phi; y \mid u) + \ell(\alpha; v) \tag{2}$$
where $\ell(\alpha; v)$ is the log density for $v$ with parameter
$\alpha$. The estimates of $\beta$ and $v$ are given by
$\partial h/\partial\beta = 0$ and $\partial h/\partial v = 0$.
The dispersion components are estimated by
maximizing the adjusted profile h-likelihood
$$h_p = \left( h - \frac{1}{2}\log\left| -\frac{1}{2\pi} H \right| \right)_{\beta = \hat\beta,\, v = \hat v} \tag{3}$$
where $H$ is the Hessian matrix of the h-likelihood.
The dispersion term $\phi$ can be connected to a linear
predictor $X_d\beta_d$ given a link function $g_d(\cdot)$ with
$g_d(\phi) = X_d\beta_d$. The adjusted profile likelihoods of $\ell$
and $h$ may be used for inference of $\beta$, $v$ and the dispersion
parameters $\phi$ and $\lambda$ (p. 186 in Lee et al.,
2006). More detail and discussion of h-likelihood
theory is presented in the hglm vignette.
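As a worked special case (a sketch, assuming a Gaussian response with identity link, constant dispersion $\phi = \sigma_e^2$, $v = u$ and $u \sim N(0, I\sigma_u^2)$ with $q$ levels and $n$ observations), the h-likelihood of Eq. (2) and its two score equations can be written out explicitly:

```latex
\begin{align*}
h &= -\frac{n}{2}\log(2\pi\sigma_e^2)
     - \frac{1}{2\sigma_e^2}(y - X\beta - Zu)^T (y - X\beta - Zu)
     - \frac{q}{2}\log(2\pi\sigma_u^2)
     - \frac{1}{2\sigma_u^2} u^T u \\
\frac{\partial h}{\partial \beta} &= \frac{1}{\sigma_e^2} X^T (y - X\beta - Zu) = 0 \\
\frac{\partial h}{\partial u} &= \frac{1}{\sigma_e^2} Z^T (y - X\beta - Zu) - \frac{1}{\sigma_u^2} u = 0
\end{align*}
```

Multiplying both score equations by $\sigma_e^2$ gives a linear system in $(\beta, u)$ in which the ratio $\sigma_e^2/\sigma_u^2$ is added to the diagonal of $Z^T Z$, so for given variance components the estimates are obtained in a single linear solve.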
Detailed description of the hglm fitting al-
gorithm for a linear mixed model with het-
eroscedastic residual variance
In this section we describe the fitting algorithm in de-
tail for a linear mixed model where fixed effects are
included in the model for the residual variance. The
extension to distributions other than Gaussian is de-
scribed at the end of the section.
Lee and Nelder (1996) showed that linear mixed
models can be fitted using a hierarchy of GLM by
using an augmented linear model. Consider the linear mixed
model
$$y = Xb + Zu + e$$
$$V = ZZ^T\sigma_u^2 + R\sigma_e^2$$
where $V$ is the variance of $y$ and $R$ is a diagonal matrix with elements given
by the estimated dispersion model (i.e. $\phi$ defined below).
In the first iteration of the HGLM algorithm, $R$
is an identity matrix. The model may be written as
an augmented weighted linear model:
$$y_a = T_a\delta + e_a \tag{4}$$
where
$$y_a = \begin{pmatrix} y \\ 0_q \end{pmatrix}, \quad T_a = \begin{pmatrix} X & Z \\ 0 & I_q \end{pmatrix}, \quad \delta = \begin{pmatrix} b \\ u \end{pmatrix}, \quad e_a = \begin{pmatrix} e \\ -u \end{pmatrix}$$
Here, $q$ is the number of columns in $Z$, $0_q$ is a vector
of zeros of length $q$, and $I_q$ is the identity matrix
of size $q \times q$. The variance-covariance matrix of the
augmented residual vector is given by
$$V(e_a) = \begin{pmatrix} R\sigma_e^2 & 0 \\ 0 & I_q\sigma_u^2 \end{pmatrix}$$
Given $\sigma_e^2$ and $\sigma_u^2$, this weighted linear model gives
the same estimates of the fixed and random effects
($b$ and $u$ respectively) as Henderson's mixed model
equations (Henderson, 1976).
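This equivalence can be checked numerically. The base R sketch below uses simulated data with made-up dimensions, treats the variance components as known and takes R to be the identity matrix. The mixed model equations are written in their usual textbook form, which is an assumption here since they are not spelled out in the text.

```r
set.seed(1)
n <- 30; q <- 3
X <- cbind(1, rnorm(n))                    # intercept and one covariate
Z <- diag(q) %x% rep(1, n / q)             # random intercept, 10 obs per level
y <- as.vector(X %*% c(1, 0.5) + Z %*% rnorm(q) + rnorm(n))
sigma2_e <- 1; sigma2_u <- 0.5             # treated as known

# Augmented weighted least squares
y_a <- c(y, rep(0, q))
T_a <- rbind(cbind(X, Z),
             cbind(matrix(0, q, ncol(X)), diag(q)))
w_inv <- 1 / c(rep(sigma2_e, n), rep(sigma2_u, q))   # diagonal of W^{-1}
TtWi <- t(T_a * w_inv)                               # T_a' W^{-1}
delta_aug <- solve(TtWi %*% T_a, TtWi %*% y_a)

# Mixed model equations with variance ratio sigma2_e / sigma2_u
ratio <- sigma2_e / sigma2_u
lhs <- rbind(cbind(crossprod(X),    crossprod(X, Z)),
             cbind(crossprod(Z, X), crossprod(Z) + ratio * diag(q)))
rhs <- rbind(crossprod(X, y), crossprod(Z, y))
delta_mme <- solve(lhs, rhs)

all.equal(as.vector(delta_aug), as.vector(delta_mme))
```

The first two elements of either solution are the fixed effects and the remaining q elements are the predicted random effects.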
The estimates from weighted least squares are
given by:
$$T_a^t W^{-1} T_a \hat\delta = T_a^t W^{-1} y_a$$
where $W \equiv V(e_a)$.
The two variance components are estimated iteratively
by applying a gamma GLM to the residuals
$e_i^2$ and $u_i^2$ with intercept terms included in the linear
predictors. The leverages $h_i$ for these models are calculated
from the diagonal elements of the hat matrix:
$$H_a = T_a (T_a^t W^{-1} T_a)^{-1} T_a^t W^{-1} \tag{5}$$
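A small base R sketch of Eq. (5), assuming the balanced random-intercept layout of Example 1 (five clusters of 20 observations) and, purely for illustration, variance components held fixed at the values reported there (0.84 and 0.082).

```r
n.clus <- 5; n.per.clus <- 20
n <- n.clus * n.per.clus
X <- matrix(1, n, 1)
Z <- diag(n.clus) %x% rep(1, n.per.clus)
q <- ncol(Z)

sigma2_e <- 0.84; sigma2_u <- 0.082     # illustrative, held fixed

T_a <- rbind(cbind(X, Z),
             cbind(matrix(0, q, ncol(X)), diag(q)))
w_inv <- 1 / c(rep(sigma2_e, n), rep(sigma2_u, q))   # diagonal of W^{-1}

TtWi <- t(T_a * w_inv)                               # T_a' W^{-1}
H_a <- T_a %*% solve(TtWi %*% T_a, TtWi)             # hat matrix of Eq. (5)
h <- diag(H_a)

range(h[1:n])       # leverages of the observations
range(h[n + 1:q])   # leverages of the random-effect levels
sum(h)              # trace of H_a equals the number of columns of T_a
```

In a balanced layout all observations share one leverage value and all levels of the random effect share another, and the leverages sum to the number of estimated effects.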
A gamma GLM is used to fit the dispersion part of
the model with response
$$y_{d,i} = e_i^2/(1 - h_i) \tag{6}$$
where $E(y_d) = \mu_d$ and $\mu_d \equiv \phi$ (i.e. $\sigma_e^2$ for a Gaussian
response). The GLM model for the dispersion parameter
is then specified by the link function $g_d(\cdot)$
and the linear predictor $X_d\beta_d$, with prior weights
$(1 - h_i)/2$, for
$$g_d(\mu_d) = X_d\beta_d \tag{7}$$
Similarly, a gamma GLM is fitted to the dispersion
term $\alpha$ (i.e. $\sigma_u^2$ for a GLMM) for the random effect $v$,
with
$$y_{\alpha,j} = u_j^2/(1 - h_{n+j}), \quad j = 1, 2, \ldots, q \tag{8}$$
and
$$g_\alpha(\mu_\alpha) = \lambda \tag{9}$$
where the prior weights are $(1 - h_{n+j})/2$ and the estimated
dispersion term for the random effect is given
by $\hat\alpha = g_\alpha^{-1}(\hat\lambda)$.
The algorithm iterates by updating both $R = \mathrm{diag}(\hat\phi)$
and $\sigma_u^2 = \hat\alpha$, and subsequently going back to
Eq. (4).
For a non-Gaussian response variable $y$, the estimates
are obtained simply by fitting a GLM instead
of Eq. (4) and by replacing $e_i^2$ and $u_j^2$ with the deviance
components from the augmented model (see
Lee et al., 2006).
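The iteration over Eqs. (4) to (9) can be sketched in base R for a Gaussian random-intercept model. This is a simplified illustration and not the code used inside hglm(): the dispersion models contain intercepts only, the starting values and the stopping rule are arbitrary choices, and the data are simulated with the same seed and settings as Example 1.

```r
set.seed(123)
n.clus <- 5; n.per.clus <- 20
n <- n.clus * n.per.clus
X <- matrix(1, n, 1)
Z <- diag(n.clus) %x% rep(1, n.per.clus)
a <- rnorm(n.clus, 0, sqrt(0.2))
e <- rnorm(n, 0, 1)
y <- as.vector(Z %*% a + e)

q <- ncol(Z)
y_a <- c(y, rep(0, q))
T_a <- rbind(cbind(X, Z),
             cbind(matrix(0, q, ncol(X)), diag(q)))

phi <- rep(1, n)      # R is the identity in the first iteration
sigma2_u <- 1         # starting value (arbitrary choice)

for (iter in 1:100) {
  # Eq. (4): weighted least squares on the augmented model
  w_inv <- 1 / c(phi, rep(sigma2_u, q))
  TtWi <- t(T_a * w_inv)
  C <- TtWi %*% T_a
  delta <- solve(C, TtWi %*% y_a)

  # Eq. (5): leverages, together with the squared augmented residuals
  h <- rowSums((T_a %*% solve(C)) * T_a) * w_inv
  d <- as.vector(y_a - T_a %*% delta)^2

  # Eqs. (6)-(7): gamma GLM for the residual variance
  resp_e <- d[1:n] / (1 - h[1:n])
  wt_e <- (1 - h[1:n]) / 2
  fit_e <- glm(resp_e ~ 1, family = Gamma(link = "log"), weights = wt_e)

  # Eqs. (8)-(9): gamma GLM for the variance of the random effect
  resp_u <- d[n + 1:q] / (1 - h[n + 1:q])
  wt_u <- (1 - h[n + 1:q]) / 2
  fit_u <- glm(resp_u ~ 1, family = Gamma(link = "log"), weights = wt_u)

  phi_new <- as.vector(fitted(fit_e))
  sigma2_u_new <- exp(coef(fit_u)[[1]])
  change <- max(abs(phi_new[1] - phi[1]), abs(sigma2_u_new - sigma2_u))
  phi <- phi_new
  sigma2_u <- sigma2_u_new
  if (change < 1e-6) break
}

c(sigma2_e = phi[1], sigma2_u = sigma2_u, intercept = delta[1])
```

With intercept-only dispersion models each gamma GLM reduces to a weighted mean, so the update for the residual variance is the sum of squared residuals divided by the sum of $1 - h_i$. The resulting estimates can be compared with the hglm() and lme() output of Example 1.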
Implementation details
Distributions and link functions
There are two important classes of models that can
be fitted in hglm: GLMM and conjugate HGLM.
GLMMs have Gaussian random effects. Conjugate
HGLMs have been commonly used partly due to the
fact that explicit formulas for the marginal likelihood
exist. HGLMs may be used to fit models in sur-
vival analysis (frailty models), where for instance the
complementary-log-log link function can be used on
binary responses (see e.g., Carling et al.,2004). The
gamma distribution plays an important role in mod-
eling responses with a constant coefficient of varia-
tion (see Chapter 8 in McCullagh and Nelder,1989).
For such responses with a gamma distributed random
effect we have a gamma-gamma model. A summary
of the most important models is given in Tables
1 and 2. Note that the random-effect distribution can
be an arbitrary conjugate exponential-family distribution.
For the specific case where the random-effect
distribution is conjugate to the distribution of y,
the model is called a conjugate HGLM. Further implementation
details can be found in the hglm vignette.
Possible future developments
In the current version of hglm() it is possible to in-
clude a single random effect in the mean part of the
model. An important development would be to in-
clude several random effects in the mean part of the
model and also to include random effects in the dis-
persion parts of the model. The latter class of models
is called Double HGLM and has been shown to be
a useful tool for modeling heavy tailed distributions
(Lee and Nelder,2006).
The algorithm of hglm() gives true marginal likelihood
estimates for the fixed effects in conjugate
HGLM (Lee and Nelder, 1996, p. 629), whereas
for other models the estimates are approximated.
Lee and co-workers (see Lee et al.,2006, and refer-
ences therein) have developed higher-order approx-
imations, which are not implemented in the current
version of the hglm package. For such extensions,
we refer to the commercially available GenStat soft-
ware (Payne et al.,2007), the recently available R
package HGLMMM (Molas,2010) and also to com-
ing updates of hglm.
Examples
Example 1: A linear mixed model
Data description The output from the hglm() function
for a linear mixed model is compared to the results
from the lme() function in the nlme (Pinheiro
et al., 2009) package using simulated data. In the simulated
data there are five clusters with 20 observations
in each cluster. For the mean part of the model,
the simulated intercept value is $\mu = 0$, the variance
for the random effect is $\sigma_u^2 = 0.2$, and the residual
variance is $\sigma_e^2 = 1.0$.
Both functions produce the same estimate of
the fixed intercept effect of 0.1473 (s.e. 0.16)
and, to rounding, the same variance component estimates.
The summary.hglm() function gives the estimate
of the variance component for the random intercept
(0.082) as well as the residual variance
(0.84). It also gives the logarithm of the variance
component estimates together with standard
errors below the lines "Model estimates for the
dispersion term" and "Dispersion model for the
random effects". The lme() function gives the
square root of the variance component estimates.
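The three scales can be put side by side with a few lines of base R. The numbers are copied from the hglm() and lme() output of this example; the object names are chosen here for illustration.

```r
# Log-scale estimates from the summary(lmm) output of this example
log_phi    <- -0.1743    # dispersion term (residual variance)
log_lambda <- -2.4997    # dispersion of the random effects

hglm_var <- exp(c(residual = log_phi, random = log_lambda))

# Standard deviations from the lme() output of this example
lme_var <- c(residual = 0.9166, random = 0.2859608)^2

round(rbind(hglm = hglm_var, lme = lme_var), 4)
```

Exponentiating the log-scale estimates and squaring the standard deviations puts both fits on the variance scale, where they can be compared directly.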
The model diagnostics produced by the
plot.hglm() function are shown in Figures 1 and 2.
The data are completely balanced and therefore produce
equal leverages (hatvalues) for all observations
and also for all random effects (Figure 1). Moreover,
the assumption of the deviance components being
gamma distributed is acceptable (Figure 2).
[Figure 1 plot omitted: hatvalues plotted against Index.]
Figure 1: Hatvalues (i.e. diagonal elements of the
augmented hat-matrix) for each observation 1 to 100,
and for each level in the random effect (index 101-105).
Table 1: Commonly used distributions and link functions possible to fit with hglm()

Model name          y|u distribution  Link g(µ)     u distribution  Link v(u)
Linear mixed model  Gaussian          identity      Gaussian        identity
Binomial conjugate  Binomial          logit         Beta            logit
Binomial GLMM       Binomial          logit         Gaussian        identity
Binomial frailty    Binomial          comp-log-log  Gamma           log
Poisson GLMM        Poisson           log           Gaussian        identity
Poisson conjugate   Poisson           log           Gamma           log
Gamma GLMM          Gamma             log           Gaussian        identity
Gamma conjugate     Gamma             inverse       Inverse-Gamma   inverse
Gamma-Gamma         Gamma             log           Gamma           log

Table 2: hglm code for commonly used models

Model name              Setting for family argument  Setting for rand.family argument
Linear mixed model (a)  gaussian(link = identity)    gaussian(link = identity)
Beta-Binomial           binomial(link = logit)       Beta(link = logit)
Binomial GLMM           binomial(link = logit)       gaussian(link = identity)
Binomial frailty        binomial(link = cloglog)     Gamma(link = log)
Poisson GLMM            poisson(link = log)          gaussian(link = identity)
Poisson frailty         poisson(link = log)          Gamma(link = log)
Gamma GLMM              Gamma(link = log)            gaussian(link = identity)
Gamma conjugate         Gamma(link = inverse)        inverse.gamma(link = inverse)
Gamma-Gamma             Gamma(link = log)            Gamma(link = log)

(a) For example, the hglm() code for a linear mixed model is
hglm(family = gaussian(link = identity), rand.family = gaussian(link = identity), ...)
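The settings in Table 2 are R family objects, which bundle a link function with its inverse. A short base R sketch of what such an object carries, using two families from the stats package that appear in the table (Beta() and inverse.gamma() are not base R families and are left out here):

```r
# Complementary log-log link, as used for the binomial frailty model
fam <- binomial(link = "cloglog")
mu <- c(0.1, 0.5, 0.9)
eta <- fam$linkfun(mu)                 # log(-log(1 - mu))
all.equal(eta, log(-log(1 - mu)))
all.equal(fam$linkinv(eta), mu)        # the inverse link maps back to the mean

# Log link for a gamma distributed response or random effect
gam <- Gamma(link = "log")
gam$linkinv(gam$linkfun(2.5))          # round trip returns the mean
```

The `linkfun` and `linkinv` components are what a fitting routine uses to move between the mean scale and the linear predictor.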
[Figure 2 plots omitted. Left panel: Deviances plotted against Index. Right panel: Deviance Quantiles plotted against Gamma Quantiles.]
Figure 2: Deviance diagnostics for each observation
and each level in the random effect.
The R code and output for this example are as follows:
R> set.seed(123)
R> n.clus <- 5 #No. of clusters
R> n.per.clus <- 20 #No. of obs. per cluster
R> sigma2_u <- 0.2 #Variance of random effect
R> sigma2_e <- 1 #Residual variance
R> n <- n.clus*n.per.clus
R> X <- matrix(1, n, 1)
R> Z <- diag(n.clus)%x%rep(1, n.per.clus)
R> a <- rnorm(n.clus, 0, sqrt(sigma2_u))
R> e <- rnorm(n, 0, sqrt(sigma2_e))
R> mu <- 0
R> y <- mu + Z%*%a + e
R> lmm <- hglm(y = y, X = X, Z = Z)
R> summary(lmm)
R> plot(lmm)
Call:
hglm.default(X = X, y = y, Z = Z)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:[1] 0.8400608
Model estimates for the dispersion term:
Link = log
Effects:
Estimate Std. Error
-0.1743 0.1441
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
Dispersion parameter for the random effects
[1] 0.08211
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
-2.4997 0.8682
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value Pr(>|t|)
X.1 0.1473 0.1580 0.933 0.353
Note: P-values are based on 96 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] -0.3237 0.1971
[2,] -0.0383 0.1971
[3,] 0.3108 0.1971
[4,] -0.0572 0.1971
[5,] 0.1084 0.1971
EQL estimation converged in 5 iterations.
R> #Same analysis with the lme function
R> library(nlme)
R> clus <- rep(1:n.clus,
+ rep(n.per.clus, n.clus))
R> summary(lme(y ~ 0 + X,
+ random = ~ 1 | clus))
Linear mixed-effects model fit by REML
Data: NULL
AIC BIC logLik
278.635 286.4203 -136.3175
Random effects:
Formula: ~1 | clus
(Intercept) Residual
StdDev: 0.2859608 0.9166
Fixed effects: y ~ 0 + X
Value Std.Error DF t-value p-value
X 0.1473009 0.1573412 95 0.9361873 0.3516
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.5834807 -0.6570612 0.0270673 0.6677986 2.1724148
Number of Observations: 100
Number of Groups: 5
Example 2: Analysis of simulated data for
a linear mixed model with heteroscedastic
residual variance
Data description Here, a heteroscedastic residual
variance is added to the simulated data from the previous
example. Given the explanatory variable $x_d$,
the simulated residual variance is 1.0 for $x_d = 0$ and
2.72 for $x_d = 1$. The output shows that the variance
of the random effect is 0.109, and that
$\hat\beta_d = (-0.32, 1.47)$, i.e. the two residual variances are
estimated as 0.72 and 3.16. (Code continued from Example 1)
R> beta.disp <- 1
R> X_d <- matrix(1, n, 2)
R> X_d[,2] <- rbinom(n, 1, .5)
R> colnames(X_d) <- c("Intercept", "x_d")
R> e <- rnorm(n, 0,
+ sqrt(sigma2_e*exp(beta.disp*X_d[,2])))
R> y <- mu + Z%*%a + e
R> summary(hglm(y = y, X = X, Z = Z,
+ X.disp = X_d))
Call:
hglm.default(X = X, y = y, Z = Z, X.disp = X_d)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:
Link = log
Effects:
Estimate Std. Error
Intercept -0.3225 0.2040
x_d 1.4744 0.2881
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
Dispersion parameter for the random effects
[1] 0.1093
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
-2.2135 0.8747
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value Pr(>|t|)
X.1 -0.0535 0.1836 -0.291 0.771
Note: P-values are based on 96 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] 0.0498 0.2341
[2,] -0.2223 0.2276
[3,] 0.4404 0.2276
[4,] -0.1786 0.2276
[5,] -0.0893 0.2296
EQL estimation converged in 5 iterations.
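The two residual variances quoted in the data description follow from the log-link dispersion model. A short sketch of the back-transformation, with the estimates copied from the output above (the matrix `X_d_levels` is introduced here for illustration):

```r
# Estimates from the dispersion model output above (log link)
beta_d <- c(Intercept = -0.3225, x_d = 1.4744)

# One row per level of the explanatory variable: x_d = 0 and x_d = 1
X_d_levels <- cbind(Intercept = 1, x_d = c(0, 1))

# Residual variance per level: exp(X_d beta_d)
phi_hat <- as.vector(exp(X_d_levels %*% beta_d))
round(phi_hat, 2)
```

The first element is the estimated residual variance for $x_d = 0$ and the second the one for $x_d = 1$, to be compared with the simulated values 1.0 and 2.72.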
Example 3: Fitting a Poisson model with
gamma random effects, and fixed effects in
the dispersion term
Data description We simulate a Poisson model
with random effects and estimate the parameter in
the dispersion term for an explanatory variable $x_d$.
The estimated dispersion parameter for the random
effects is 1.926 (0.6556 on the log scale). (Code continued from Example 2)
R> u <- rgamma(n.clus,1)
R> eta <- exp(mu + Z%*%u)
R> y <- rpois(length(eta), eta)
R> gamma.pois <- hglm(y = y, X = X, Z = Z,
+ X.disp = X_d,
+ family = poisson(
+ link = log),
+ rand.family =
+ Gamma(link = log))
R> summary(gamma.pois)
Call:
hglm.default(X = X, y = y, Z = Z,
family = poisson(link = log),
rand.family = Gamma(link = log), X.disp = X_d)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:
Link = log
Effects:
Estimate Std. Error
Intercept -0.0186 0.2042
x_d 0.4087 0.2902
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
Dispersion parameter for the random effects
[1] 1.926
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
0.6556 0.7081
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value Pr(>|t|)
X.1 2.3363 0.6213 3.76 0.000293
---
Note: P-values are based on 95 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] 1.1443 0.6209
[2,] -1.6482 0.6425
[3,] -2.5183 0.6713
[4,] -1.0243 0.6319
[5,] 0.2052 0.6232
EQL estimation converged in 3 iterations.
Example 4: Incorporating correlated ran-
dom effects in a linear mixed model - a ge-
netics example
Data description The data consist of 2025 individuals
from two generations where 1000 individuals
have observed trait values $y$ that are approximately
normal (Figure 3). The data we analyze were
simulated for the QTLMAS 2009 Workshop (Coster
et al., 2010; see footnote 1). A longitudinal growth trait was
simulated. For simplicity we analyze only the values
given on the third occasion at age 265 days.
[Figure 3 plots omitted. Left panel: histogram of y (Frequency). Right panel: normal QQ plot, Sample Quantiles against Theoretical Quantiles.]
Figure 3: Histogram and qqplot for the analyzed
trait.
We fitted a model with a fixed intercept and a
random animal effect, $a$, where the correlation structure
of $a$ is given by the additive relationship matrix
$A$ (which is obtained from the available pedigree
information). An incidence matrix $Z_0$ was constructed
and relates observation number with id-number in
the pedigree. For observation $y_i$ coming from individual
$j$ in the ordered pedigree file $Z_0[i,j] = 1$, and
all other elements of row $i$ are 0. Let $L$ be the Cholesky factor
of $A$, and $Z = Z_0 L$. The design matrix for the
fixed effects, $X$, is a column of ones. The estimated
variance components are $\hat\sigma_e^2 = 2.21$ and $\hat\sigma_u^2 = 1.50$.
The R code for this example is given below.
R> data(QTLMAS)
R> y <- QTLMAS[,1]
R> Z <- QTLMAS[,2:2026]
R> X <- matrix(1, 1000, 1)
R> animal.model <- hglm(y = y, X = X, Z = Z)
R> print(animal.model)
Call:
hglm.default(X = X, y = y, Z = Z)
Fixed effects:
X.1
7.279766
Random effects:
[1] -1.191733707 1.648604776 1.319427376 -0.928258503
[5] -0.471083317 -1.058333534 1.011451565 1.879641994
[9] 0.611705900 -0.259125073 -1.426788944 -0.005165978
...
Dispersion parameter for the mean model:[1] 2.211169
Dispersion parameter for the random effects:[1] 1.502516
EQL estimation converged in 2 iterations
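The construction of Z from an incidence matrix and a relationship matrix can be sketched in base R. The three-animal relationship matrix below (two unrelated parents and their offspring) and the vector `id` are toy values chosen for illustration; they are not taken from the QTLMAS data.

```r
# Toy additive relationship matrix: animals 1 and 2 are unrelated parents,
# animal 3 is their offspring
A <- matrix(c(1.0, 0.0, 0.5,
              0.0, 1.0, 0.5,
              0.5, 0.5, 1.0), nrow = 3, byrow = TRUE)

# Position in the pedigree of the individual behind each observation
id <- c(3, 1, 3, 2)

# Incidence matrix: Z0[i, j] = 1 if observation i comes from individual j
Z0 <- matrix(0, length(id), nrow(A))
Z0[cbind(seq_along(id), id)] <- 1

L <- t(chol(A))     # lower triangular factor with A = L L'
Z <- Z0 %*% L       # design matrix that would be passed on as Z

# Z Z' reproduces the relationships among the observed individuals
all.equal(tcrossprod(Z), A[id, id])
```

Individuals without observations simply have no row pointing at them in `Z0`, which is how 1000 records can be linked to a pedigree of 2025 animals.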
Example 5: Binomial-beta model applied
to seed germination data
Data description The seed germination data presented
by Crowder (1978) has previously been analyzed
using a binomial GLMM (Breslow and Clayton, 1993)
and a binomial-beta HGLM (Lee and
Nelder, 1996). The data consist of 831 observations
from 21 germination plates. The effect of seed variety
and type of root extract was studied in a 2 × 2
factorial lay-out. We fit the binomial-beta HGLM
used by Lee and Nelder (1996); setting fix.disp
= 1 in hglm() produces estimates comparable to the
ones obtained by Lee and Nelder (with differences
$< 2 \times 10^{-3}$). The beta distribution parameter $\alpha$ in Lee
and Nelder (1996) was defined as $1/(2a)$ where $a$ is
the dispersion term obtained from hglm(). The output
from the R code given below gives $\hat a = 0.0248$ and
the corresponding estimate given in Lee and Nelder
(1996) is $\hat a = 1/(2\hat\alpha) = 0.023$. We conclude that the
hglm package produces results similar to the ones
presented in Lee and Nelder (1996), and the dispersion
parameters estimated using the EQL method in
GenStat differ by less than 1%. Additional examples,
together with comparisons to estimates produced by
GenStat, are given in the hglm vignette included in
the package on CRAN.
Footnote 1: http://www.qtlmas2009.wur.nl/UK/Dataset
R> data(seeds)
R> germ <- hglm(
+ fixed = r/n ~ extract*I(seed=="O73"),
+ weights = n, data = seeds,
+ random = ~1|plate, family = binomial(),
+ rand.family = Beta(), fix.disp = 1)
R> summary(germ)
Call:
hglm.formula(family = binomial(), rand.family = Beta(),
fixed = r/n ~ extract * I(seed == "O73"),
random = ~1 | plate, data = seeds,
weights = n, fix.disp = 1)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:[1] 1
Model estimates for the dispersion term:
Link = log
Effects:
[1] 1
Dispersion = 1 is used in Gamma model on deviances to
calculate the standard error(s).
Dispersion parameter for the random effects
[1] 0.02483
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
-3.6956 0.5304
Dispersion = 1 is used in Gamma model on deviances to
calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value
(Intercept) -0.5421 0.1928 -2.811
extractCucumber 1.3386 0.2733 4.898
I(seed == "O73")TRUE 0.0751 0.3114 0.241
extractCucumber:I(seed=="O73") -0.8257 0.4341 -1.902
Pr(>|t|)
(Intercept) 0.018429
extractCucumber 0.000625
I(seed == "O73")TRUE 0.814264
extractCucumber:I(seed=="O73") 0.086343
---
Note: P-values are based on 10 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] -0.2333 0.2510
[2,] 0.0085 0.2328
...
[21,] -0.0499 0.2953
EQL estimation converged in 7 iterations.
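The conversion between the hglm() dispersion term and the beta parameter of Lee and Nelder (1996) is a one-line calculation. The sketch below uses the log-scale estimate from the output above and the value 0.023 quoted in the data description; the object names are chosen here for illustration.

```r
# Log-scale estimate from the random-effects dispersion output above
a_hat <- exp(-3.6956)

# Lee and Nelder (1996) parametrize the beta distribution by alpha = 1/(2a)
alpha_hat <- 1 / (2 * a_hat)

# Value quoted in the text for Lee and Nelder (1996): a = 0.023
alpha_LN <- 1 / (2 * 0.023)

round(c(a_hat = a_hat, alpha_hat = alpha_hat, alpha_LN = alpha_LN), 4)
```

The first element reproduces the dispersion parameter printed by summary(germ), and the other two put both analyses on the $\alpha$ scale.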
Summary
The hierarchical generalized linear model approach
offers new possibilities to fit generalized linear mod-
els with random effects. The hglm package extends
existing GLMM fitting algorithms to include fixed ef-
fects in a model for the residual variance, fits mod-
els where the random effect distribution is not neces-
sarily Gaussian and estimates variance components
for correlated random effects. For such models there
are important applications in, for instance: genet-
ics (Noh et al.,2006), survival analysis (Ha and Lee,
2005), credit risk modeling (Alam and Carling,2008),
count data (Lee et al.,2006) and dichotomous re-
sponses (Noh and Lee,2007). We therefore expect
that this new package will be of use for applied statis-
ticians in several different fields.
Bibliography
M. Alam and K. Carling. Computationally feasible
estimation of the covariance structure in general-
ized linear mixed models GLMM. Journal of Sta-
tistical Computation and Simulation, 78:1227–1237,
2008.
M. Alam, L. Rönnegård, and X. Shen. hglm: Hierarchical
Generalized Linear Models, 2010. URL
http://CRAN.R-project.org/package=hglm. R package
version 1.1.1.
D. Bates and M. Maechler. lme4: Linear mixed-effects
models using S4 classes, 2010. URL
http://CRAN.R-project.org/package=lme4. R package version
0.999375-37.
N. E. Breslow and D. G. Clayton. Approximate infer-
ence in generalized linear mixed models. Journal of
the American Statistical Association, 88:9–25, 1993.
K. Carling, L. Rönnegård, and K. Roszbach. An
analysis of portfolio credit risk when counterpar-
ties are interdependent within industries. Sveriges
Riksbank Working Paper, 168, 2004.
A. Coster, J. Bastiaansen, M. Calus, C. Maliepaard,
and M. Bink. QTLMAS 2009: Simulated dataset.
BMC Proceedings, 4(Suppl 1):S3, 2010.
M. J. Crowder. Beta-binomial ANOVA for propor-
tions. Applied Statistics, 27:34–37, 1978.
P. K. Dunn and G. K. Smyth. dglm: Double generalized
linear models, 2009. URL
http://CRAN.R-project.org/package=dglm. R package version 1.6.1.
I. D. Ha and Y. Lee. Comparison of hierarchical likeli-
hood versus orthodox best linear unbiased predic-
tor approaches for frailty models. Biometrika, 92:
717–723, 2005.
C. R. Henderson. A simple method for comput-
ing the inverse of a numerator relationship matrix
used in prediction of breeding values. Biometrics,
32(1):69–83, 1976.
Y. Lee and J. A. Nelder. Double hierarchical general-
ized linear models with discussion. Applied Statis-
tics, 55:139–185, 2006.
Y. Lee and J. A. Nelder. Hierarchical generalized lin-
ear models with discussion. J. R. Statist. Soc. B, 58:
619–678, 1996.
Y. Lee, J. A. Nelder, and Y. Pawitan. Generalized linear
models with random effects. Chapman & Hall/CRC,
2006.
Y. Lee, J. A. Nelder, and M. Noh. H-likelihood: prob-
lems and solutions. Statistics and Computing, 17:
49–55, 2007.
M. Lynch and B. Walsh. Genetics and analysis of Quan-
titative Traits. Sinauer Associates, Inc., 1998. ISBN
087893481.
P. McCullagh and J. A. Nelder. Generalized linear mod-
els. Chapman & Hall/CRC, 1989.
M. Molas. HGLMMM: Hierarchical Generalized Linear
Models, 2010. URL
http://CRAN.R-project.org/package=HGLMMM. R package version 0.1.1.
M. Noh and Y. Lee. REML estimation for binary data
in GLMMs. Journal of Multivariate Analysis, 98:896–
915, 2007.
M. Noh, B. Yip, Y. Lee, and Y. Pawitan. Multicompo-
nent variance estimation for binary traits in family-
based studies. Genetic Epidemiology, 30:37–47, 2006.
R. W. Payne, D. A. Murray, S. A. Harding, D. B. Baird,
and D. M. Soutar. GenStat for Windows (10th edition)
introduction, 2007. URL
http://www.vsni.co.uk/software/genstat.
J. Pinheiro, D. Bates, S. DebRoy, D. Sarkar, and the
R Core team. nlme: Linear and Nonlinear Mixed Effects
Models, 2009. URL
http://CRAN.R-project.org/package=nlme. R package version 3.1-96.
L. Rönnegård and Ö. Carlborg. Separation of base al-
lele and sampling term effects gives new insights
in variance component QTL analysis. BMC Genet-
ics, 8(1), 2007.
W. N. Venables and B. D. Ripley. Modern Applied
Statistics with S. Springer, New York, fourth edition,
2002. URL http://www.stats.ox.ac.uk/pub/MASS4.
ISBN 0-387-95457-0.
Lars Rönnegård
Statistics Unit
Dalarna University, Sweden
and
Department of Animal Breeding and Genetics
Swedish University of Agricultural Sciences, Sweden
lrn@du.se
Xia Shen
Department of Cell and Molecular Biology
Uppsala University, Sweden
and
Statistics Unit
Dalarna University, Sweden
xia.shen@lcb.uu.se
Moudud Alam
Statistics Unit
Dalarna University, Sweden
maa@du.se