The Problem of Scaling in Exponential Random Graph Models

Original Article
Scott W. Duxbury¹
Abstract
This study shows that residual variation can cause problems related to scaling
in exponential random graph models (ERGM). Residual variation is likely to
exist when there are unmeasured variables in a model, even those uncorrelated
with other predictors, or when the logistic form of the model is
inappropriate. As a consequence, coefficients cannot be interpreted as effect
sizes or compared between models, and homophily coefficients, as well as
other interaction coefficients, cannot be interpreted as substantive effects in
most ERGM applications. We conduct a series of simulations considering the
substantive impact of these issues, revealing that realistic levels of residual
variation can have large consequences for ERGM inference. A flexible
methodological framework is introduced to overcome these problems.
Formal tests of mediation and moderation are also proposed. These methods
are applied to revisit the relationship between selective mixing and
triadic closure in a large AddHealth school friendship network. Extensions to
other classes of statistical network models are discussed.
Keywords
social network analysis, exponential random graph models, scaling, mediation,
moderation
¹ University of North Carolina, Chapel Hill, North Carolina, USA
Corresponding Author:
Scott W. Duxbury, Department of Sociology, University of North Carolina, 163 Hamilton Hall,
102 Emerson Drive, Chapel Hill, NC 27514, USA.
Email: duxbury@unc.email.edu.
Sociological Methods & Research
1-39
© The Author(s) 2021
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/0049124120986178
journals.sagepub.com/home/smr
Over the past decade, social scientists have increasingly looked to statistical
network methods to address important substantive questions. These methods
improve upon classical approaches of statistical inference for network data
by relaxing independence assumptions and providing a means to formally
represent interdependent social phenomena. Among the models available,
exponential random graph models (ERGM) have established themselves as
an especially popular tool for this type of inference given their flexibility and
ability to represent nodal, dyadic, and network covariates. Indeed, the diffusion
of ERGM across the social, political, behavioral, and health sciences has
led to an explosion of research examining the generative properties of social
networks (Adams and Schaefer 2016; Cranmer et al. 2017; Kreager et al.
2017; Lewis 2013; Papachristos, Hureau, and Braga 2013; Young 2011).
However, there is a problem in ERGM applications that has gone largely
unnoticed: ERGM coefficients are confounded with residual variation
(unexplained variation in tie probabilities) that rescales model coefficients. This
has several important consequences. First, coefficients and exponentiated
coefficients cannot be interpreted as effect sizes. Second, scaling can produce
differences in coefficient size across models that are unrelated to either
mediation or confounding. Third, scaling can bias interaction coefficients,
including homophily coefficients, yielding incorrect assessments of direction,
interaction effect size, and significance.
While recent studies have sought to address several sources of residual
variation, including unobserved heterogeneity (unmeasured nodal covariates;
Box-Steffensmeier, Christenson, and Morgan 2018; Thiemichen et al.
2016; van Duijn, Snijders, and Zijlstra 2004), nesting structure
(Schweinberger 2020; Schweinberger and Handcock 2015; Stewart et al. 2019), and
measurement error (Kim, Leonardo, and Kirkland 2016), the consequences
of scaling for statistical inference using ERGM have been largely
overlooked. In addition to these sources, we show that scaling can affect ERGM
results even when omitted variables are uncorrelated with other predictors.
We further illustrate that residual variation can impact ERGM coefficients
when the logistic formulation of the model is inappropriate for the data,
which is common in both sparse and especially dense networks and is related
to the known problem of degeneracy (e.g., Handcock et al. 2003; Mele 2017).
Because we rarely have measures of all variables relevant to network
formation,¹ problems linked to residual variation and scaling are likely
prevalent in ERGM applications, and their consequences for scientific inquiry
using statistical network methods are potentially large. Indeed, a survey of
sociological literature applying ERGM shows that it is common for researchers
to interpret coefficients as effect sizes, to compare coefficients between
models, and to rely on interaction coefficients to interpret interactions (e.g.,
Kreager et al. 2017:707; Lewis 2013:18,816; Papachristos and Bastomski
2018:545; Papachristos et al. 2013:434; Stewart et al. 2019:108; Wimmer
and Lewis 2010:618). Current introductory texts on ERGM also make no
mention of scaling and instead recommend interpreting interaction
coefficients as interaction effects (Lusher, Koskinen, and Robins 2013:54) or
comparing coefficients between models to assess confounding (Cranmer
et al. 2017:242; Goodreau, Kitts, and Morris 2009:115).
Although similar issues have been documented in generalized linear models
(see Mood 2010), they have gone unaddressed in statistical network
analysis.² The goal of this article is to introduce the problem of scaling in
ERGM, to outline its sources, and to propose methods for overcoming the
issue in ERGM. The first section provides a brief overview of ERGM. The
second section introduces the problem of scaling and how it affects
parameter interpretation by deriving a latent variable formulation of ERGM. The
third section evaluates the impact of scaling for assessing effect size,
differences in coefficients between models, and interaction coefficients with a
series of simulations. The fourth section proposes methods for overcoming
each of these issues. These methods are extended to develop formal tests for
mediation and moderation analysis, which have yet to be introduced for
statistical network models. The article concludes with a replication study using
the largest AddHealth in-school friendship network examined by Goodreau et al.
(2009). The empirical application shows that correcting for scaling can
change substantive conclusions in ERGM applications. Extensions to other
statistical network models are also discussed.
Overview of ERGM
ERGMs are a class of statistical network models that represent graph
probabilities using an exponential-family distribution. Common ERGM
formulations are Erdos-Renyi models (Erdos and Renyi 1959), dyad independence
(p1) models (Holland and Leinhardt 1981), Markov graphs (Frank and
Strauss 1986), and curved ERGMs (Hunter 2007; Snijders et al. 2006). In
most applications, ERGMs are used to represent some kind of dyadic
dependence structure (e.g., Markov graphs or curved ERGMs) in binary
cross-sectional networks, though extensions to dynamic and weighted networks
exist (Desmarais and Cranmer 2012; Hanneke, Fu, and Xing 2010; Krivitsky
2012).
Given a network $Y$ with $y_{ij}$ ties connecting actors $i$ and $j$, ERGM estimates
the probability of observing $Y$ as a function of exogenous actor-level
characteristics and sufficient graph statistics. ERGM has the following
probability mass function:

$$\Pr(Y = y \mid z(y, x)) = \frac{\exp(\theta^{T} z(y, x))}{\kappa(\theta)}, \tag{1}$$

where $\theta$ is the parameter vector, $z(y, x)$ is a $p$ vector of exogenous
characteristics $x$ and endogenous graph statistics computed on $Y$, and
$\kappa(\theta) = \sum_{y'} \exp(\theta^{T} z(y', x))$ is a normalizing constant
ensuring that the sum of equation (1) over all possible networks equals 1. Due to
the intractability of $\kappa(\theta)$ in most networks of interest, the denominator
is typically approximated using Markov chain Monte Carlo (MCMC) sampling (Geyer and
Thompson 1992; Snijders 2002). A large distribution of possible networks
is simulated and randomly sampled to approximate the maximum
likelihood.³
Because the measured network is only a single representation of the
underlying generative process, the graph statistics of the measured network
are the expectation of those statistics across all possible networks. It is
assumed that the measured network is a reasonable representation of the
underlying stochastic distribution, that the likelihood principle is true, and
that the likelihood formulation of the problem is reasonable.
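
To make the intractability of the normalizing constant concrete, the following minimal sketch enumerates every undirected graph on a small node set and computes the constant by brute force. The edges-only specification and all function names are illustrative assumptions, not part of the article. Because the number of graphs is 2 raised to the number of dyads, this approach is only feasible for a handful of nodes, which is why MCMC approximation is used in practice.

```python
import itertools
import math


def count_edges(graph):
    """Sufficient statistic z(y): the number of ties present in the graph."""
    return sum(graph)


def normalizing_constant(theta, n_nodes):
    """Sum exp(theta * z(y')) over every undirected graph y' on n_nodes."""
    n_dyads = n_nodes * (n_nodes - 1) // 2
    total = 0.0
    # Each graph is a tuple of 0/1 tie indicators, one per dyad.
    for graph in itertools.product((0, 1), repeat=n_dyads):
        total += math.exp(theta * count_edges(graph))
    return total


def graph_probability(graph, theta, n_nodes):
    """Equation (1) for the illustrative edges-only model."""
    kappa = normalizing_constant(theta, n_nodes)
    return math.exp(theta * count_edges(graph)) / kappa
```

For the edges-only case the dyads are independent, so the brute-force sum can be checked against the closed form (1 + exp(theta)) raised to the number of dyads. No such shortcut exists once dyad-dependent terms enter the model.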
Equation (1) provides the joint form of the model representing the
probability of observing a network as a function of its sufficient statistics. This
equation can be rewritten in its conditional form to provide a tie-level
interpretation. Consider the tie variable $y_{ij}$, where $y_{ij} = 1$ if $i$ and $j$
are connected and $y_{ij} = 0$ otherwise. The conditional form of ERGM is

$$\frac{p_{ij}}{1 - p_{ij}} = \frac{\Pr(Y_{ij} = 1 \mid X = x, Y_{-ij} = y_{-ij})}{\Pr(Y_{ij} = 0 \mid X = x, Y_{-ij} = y_{-ij})} = \exp\left(\theta^{T}_{\text{endogenous}} \delta^{+}_{ij}(y) + \theta^{T}_{\text{exogenous}} x_{ij}\right), \tag{2}$$

with cumulative distribution function:

$$p_{ij} = \frac{\exp\left(\theta^{T}_{\text{endogenous}} \delta^{+}_{ij}(y) + \theta^{T}_{\text{exogenous}} x_{ij}\right)}{1 + \exp\left(\theta^{T}_{\text{endogenous}} \delta^{+}_{ij}(y) + \theta^{T}_{\text{exogenous}} x_{ij}\right)}, \tag{3}$$

where $\delta^{+}_{ij}$ is the change in parameterized graph statistic when a focal
tie $y_{ij}$ is toggled from 0 to 1. Readers will recognize the functional form from
logistic regression. Indeed, ERGM is a logit model, an auto-logistic regression
(Frank and Strauss 1986; Wasserman and Patterson 1996), but is distinct
from logistic regression in that tie variables are treated as conditional on the
entire graph structure and are not assumed to be independent (except in
special cases; Besag 1972:72; Besag 1974:201). Parameters can thus be
interpreted as the increase/decrease in log-odds of an $ij$ tie given a one unit
change in a focal covariate effect, conditional on other covariates in the
model.
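
The conditional form can be illustrated with a short sketch that maps coefficients and change statistics to a tie probability through the inverse logit. The coefficient values and change statistics below are hypothetical numbers chosen for illustration only; they are not estimates from the article.

```python
import math


def tie_log_odds(theta, change_stats):
    """Linear predictor: coefficients times change statistics (log of equation 2)."""
    return sum(t * d for t, d in zip(theta, change_stats))


def tie_probability(theta, change_stats):
    """Equation (3): inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-tie_log_odds(theta, change_stats)))


# Hypothetical coefficients: an edges term, a shared-partner term, a covariate.
theta = [-3.0, 1.0, -0.5]
# Hypothetical change statistics from toggling one focal tie from 0 to 1.
delta = [1.0, 2.0, 1.0]

p = tie_probability(theta, delta)
odds = p / (1.0 - p)  # equals exp of the linear predictor
```

Note that the change statistics depend on the rest of the graph, so the same coefficients imply different probabilities for different dyads.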
Scaling in ERGM
The Problem of Scaling: A Latent Variable Formulation
Problems related to scaling and rescaling emerge in ERGM because the error
distribution is invariant across models. Although all stochastic models
(of which ERGM and generalized linear models are cases) assume some random
error, this error is not reflected in either equation (1) or (2). Researchers
are consequently forced to (implicitly) assume a distribution for the error.
The distribution of this error is invariant regardless of model specification.
The simplest way to show how error invariance arises is to formulate
ERGM as a latent variable model. This formulation is common in the literature
on logistic regression (Cramer 2007; Hosmer and Lemeshow 2000;
Karlson, Holm, and Breen 2012; Long 1997; Mood 2010) but has not been
derived for ERGM. Online Appendix A (which can be found at
http://smr.sagepub.com/supplemental/) shows that the latent ERGM representation
presented below is equivalent to the traditional ERGM representation.
To begin, we can regard $Y_{ij}$ as a binary indicator of the latent variable
$Y^{*}_{ij}$ that takes a value of 1 when $Y^{*}_{ij}$ is greater than 0⁴ and takes a
value of 0 otherwise:

$$Y_{ij} = \begin{cases} 1 & \text{if } Y^{*}_{ij} > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

We can then write the latent model for $Y^{*}_{ij}$ as a function of latent
endogenous and exogenous parameters:

$$Y^{*}_{ij} = \alpha^{T}_{\text{endogenous}} \delta^{+}_{ij}(y) + \alpha^{T}_{\text{exogenous}} x_{ij} + \varepsilon_{ij}. \tag{5}$$

While it may seem strange to write the latent variable model as
conditional on the measured graph statistics, recognizing that the measured
network is a direct mapping of the latent variable clarifies that this notation is
merely a convenience. We could equivalently write the latent variable model
as conditional on all values of $Y^{*}_{ij} > 0$ without losing generality.
The benefit of equation (5) is that we are able to account for the error
($\varepsilon$) because $Y^{*}$ is continuous, while $Y$ is not. As in any stochastic
model, we require an assumption on the mean of the error to formulate the model, which
is ubiquitously assumed to be 0 conditional on other covariates. However,
because $\varepsilon$ is unmeasurable, we also require an assumption on the variance to
scale the inequality in equation (4). As in other logit models, we assume that
$\varepsilon$ is a logistic random variable with mean 0 and variance
$\pi^{2}/3 \approx 3.29$ (see, for instance, Winship and Mare 1983:61-63;
Allison 1999:189; Mood 2010:68-69).
The problem of scaling emerges because the variance of $\varepsilon$ is fixed
regardless of model specification. When the true mean of the error is not 0, we
encounter the well-known problem of omitted confounding variable bias.
However, when the true error variance is not 3.29, coefficients are also
biased by scaling. This is clearest to see by incorporating the scale $\tau$ into
the latent variable model:

$$Y^{*}_{ij} = \alpha^{T}_{\text{endogenous}} \delta^{+}_{ij}(y) + \alpha^{T}_{\text{exogenous}} x_{ij} + \tau \varepsilon_{ij}. \tag{6}$$

$\tau$ relates the assumed logistic distribution of $\varepsilon$ to the true error
distribution. It is the ratio of the true standard deviation of the latent error to
the assumed standard deviation of the latent error.
It can be shown (Online Appendix A, which can be found at
http://smr.sagepub.com/supplemental/) that $\tau$ links the latent model to the
estimated model via:

$$\theta^{T}_{\text{endogenous}} \delta^{+}_{ij}(y) + \theta^{T}_{\text{exogenous}} x_{ij} = \frac{\alpha^{T}_{\text{endogenous}} \delta^{+}_{ij}(y) + \alpha^{T}_{\text{exogenous}} x_{ij}}{\tau}, \tag{7}$$

and the latent parameters to the estimated parameters as: $\theta = \alpha / \tau$.
Consequently, only in the special case where the true error variance is 3.29
($\tau = 1$) can we conclude that $\theta = \alpha$. As discussed below, this
conclusion is rarely supported in practice. In the majority of cases, residual
variation is absorbed into $\tau$, and, consequently, the estimated coefficients do
not equal the true (latent) coefficients.
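
As a small numerical sketch of this relationship, the code below computes the scale as the ratio of the true to the assumed standard deviation of the latent error and then divides a latent coefficient by it. The function names and the example variances are assumptions made for illustration.

```python
import math

# Variance of a standard logistic random variable, approximately 3.29.
ASSUMED_ERROR_VARIANCE = math.pi ** 2 / 3


def scale_factor(true_error_variance):
    """tau: true standard deviation of the latent error over the assumed one."""
    return math.sqrt(true_error_variance / ASSUMED_ERROR_VARIANCE)


def estimated_coefficient(alpha, true_error_variance):
    """theta = alpha / tau, the coefficient the logit form can identify."""
    return alpha / scale_factor(true_error_variance)


# With the assumed variance, tau = 1 and the latent coefficient is recovered.
unscaled = estimated_coefficient(1.0, ASSUMED_ERROR_VARIANCE)
# With four times the assumed variance, tau = 2 and the coefficient is halved.
halved = estimated_coefficient(1.0, 4 * ASSUMED_ERROR_VARIANCE)
```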
While scaling does not affect predicted probabilities or change the sign of
non-interaction coefficients or their $z$ statistics (because standard errors scale
along with coefficients),⁵ it does bias coefficient size. Further, because $\tau$
usually varies between groups and between models, scaling can bias
interaction coefficient size, sign, and significance, and prevent comparisons of
coefficients between models (discussed below). Simulations in Online
Appendix B (which can be found at http://smr.sagepub.com/supplemental/)
also indicate that the MCMC methods that address many estimation
problems in ERGM do little to reduce the bias created by scaling.
Sources of Scaling
It should be clear from the above discussion that ERGM coefficients are only
identified to a scale when residual variation is present. Several causes of
residual variation have been discussed elsewhere. For instance,
Box-Steffensmeier et al. (2018) discuss how unmeasured nodal covariates can
contribute to omitted variable bias and model degeneracy. Schweinberger
and colleagues (Schweinberger 2020; Schweinberger and Handcock 2015;
Stewart et al. 2019) develop a similar argument for the broader case of
nesting structure, where accounting for nesting structure improves estimation
of decay parameters and out-of-sample statistics for triad and degree
distributions in curved ERGMs (see Stewart et al. 2019). Kim et al. (2016)
document that measurement error during network data collection can generate
inaccurate sufficient statistics that bias ERGM coefficients.
Each of these studies provides problem-specific solutions such as
including nodal random effects (Box-Steffensmeier et al. 2018; Thiemichen
et al. 2016; van Duijn et al. 2004), explicitly modeling nesting structure
(Stewart et al. 2019), and using pseudolikelihood estimation to reduce
attenuation bias in the presence of measurement error (Kim et al. 2016).
However, residual variation can produce problems of scaling in circumstances
that are more difficult to detect and is likely to persist even in research that
utilizes these corrections.
The first source of residual variation is when the logistic formulation of
the cumulative distribution function is inappropriate. This is a common
concern in rare event models for independent binary data, such as rare
event logistic regression, where the concentration of event probabilities
is too extreme to be appropriately represented by the logistic sigmoid
function (see Cramer 2007). In the case of ERGM, dense concentrations
of tie probabilities close to 0 or close to 1 can disturb the logistic functional
form of the model, which is likely to occur in either very dense or very
sparse networks. In these circumstances, it is unreasonable to assume a
logistic functional form for the model and, as a consequence, the true error
is unlikely to be a logistic random variable. This issue is related to the well-
known problem of fitting ERGMs to either very dense networks or very
sparse networks (Handcock et al. 2003; Mele 2017). However, instead of
pertaining to MCMC-MLE convergence, the problems outlined here can
emerge even in converged models.
The second and perhaps more concerning cause of scaling is omitted
variables, even those that are uncorrelated with other predictors. When an
omitted variable is a determinant of tie probabilities, its residual variation is
absorbed into $\tau$, causing coefficients to rescale. Because this scaling is
caused by the unmeasured effect of an omitted variable, ERGM coefficients
are biased even when omitted variables are uncorrelated with other
predictors. This is an important result. Recent methodological extensions to ERGM
have focused on developing corrections for unmeasured nodal covariates and
unmeasured confounding variables (Box-Steffensmeier et al. 2018; Thiemichen
et al. 2016). However, omitted variables do not need to confound any
observed relationship to rescale ERGM coefficients. Nor is unmeasured
heterogeneity among nodes the sole source of residual variation. Residual
variation may exist due to unmeasured dyadic- and network-level covariates
or because of unmeasured interactions between or within any of these levels.
As a consequence, these corrections do not resolve problems of scaling in
many ERGM applications. Because the assumption of no omitted variables is
untestable, we can rarely rule out the possibility that ERGM results are
affected in practice.
Consequences of Scaling for ERGM Inference
Because we cannot test for residual variation, we can rarely rule out the
possibility of scaling in applied research. As Box-Steffensmeier et al.
(2018:4) observe, many variables relevant to an empirical model go unmeasured
during data collection. The more realistic and more conservative
assumption is that some residual variation exists in most models and that
coefficients are only identified to a scale in most ERGM applications. This
has several important consequences for ERGM inference.
First, coefficients and exponentiated coefficients cannot be interpreted as
effect sizes. While rescaling does not alter conclusions about the direction
and significance of noninteraction coefficients, it does affect coefficient
magnitude. For example, assume that we have a latent model
$Y^{*} = \alpha_{1} X_{1} + \alpha_{2} X_{2} + \tau \varepsilon$ and that $X_{1}$ and
$X_{2}$ are exogenous and uncorrelated node-level variables. If we do not include
$X_{2}$ in our ERGM, it can be shown (see Online Appendix C, which can be found at
http://smr.sagepub.com/supplemental/) that $\theta_{1}$ will rescale to

$$\theta_{1} \approx \alpha_{1} \frac{\sqrt{3.29}}{\sqrt{3.29 + \alpha_{2}^{2} \mathrm{Var}(X_{2})}}. \tag{8}$$

If $X_{1}$ and $X_{2}$ are correlated, we obtain:

$$\theta_{1} \approx (\alpha_{1} + \alpha_{2} \gamma_{1}) \frac{\sqrt{3.29}}{\sqrt{3.29 + \alpha_{2}^{2} \mathrm{Var}(v)}}, \tag{9}$$

where $\gamma_{1}$ is the effect of $X_{1}$ on $X_{2}$ and $v$ is the variation in
$X_{2}$ unexplained by $X_{1}$. Consequently, if the effect size of $\alpha_{2}$ or
the variance of $X_{2}$ is large, the rescaling can be substantial.
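
The two rescaling expressions can be evaluated directly to see how quickly the estimated coefficient departs from its latent value. This is a minimal sketch of equations (8) and (9) as written above; the function names and the input values are hypothetical.

```python
import math

LOGISTIC_VARIANCE = 3.29  # assumed latent error variance, about pi^2 / 3


def theta1_uncorrelated(alpha1, alpha2, var_x2):
    """Equation (8): rescaled coefficient when the omitted X2 is uncorrelated with X1."""
    shrink = math.sqrt(LOGISTIC_VARIANCE) / math.sqrt(
        LOGISTIC_VARIANCE + alpha2 ** 2 * var_x2
    )
    return alpha1 * shrink


def theta1_correlated(alpha1, alpha2, gamma1, var_v):
    """Equation (9): adds the confounded part alpha2 * gamma1 before rescaling."""
    shrink = math.sqrt(LOGISTIC_VARIANCE) / math.sqrt(
        LOGISTIC_VARIANCE + alpha2 ** 2 * var_v
    )
    return (alpha1 + alpha2 * gamma1) * shrink


# Hypothetical inputs: an omitted variable with effect 1 and variance 3.29
# doubles the latent error variance, so the coefficient shrinks by 1/sqrt(2).
example = theta1_uncorrelated(alpha1=1.0, alpha2=1.0, var_x2=3.29)
```

When the omitted variable has no effect (alpha2 equal to 0), both functions return the latent coefficient unchanged.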
Second, coefficients cannot be compared between models unless we
assume that $\tau$ is invariant. In fact, we cannot meet this assumption in any
scenario where we include or exclude a variable that we expect to hold some
explanatory power (i.e., that explains residual variation). Because we are
typically interested in assessing differences between coefficients when we
expect some amount of confounding, we must instead assume:

$$\tau_{\text{Model1}} \neq \tau_{\text{Model2}}, \tag{10}$$

and thus,

$$\theta_{\text{Model1}} - \theta_{\text{Model2}} = \frac{\alpha_{\text{Model1}}}{\tau_{\text{Model1}}} - \frac{\alpha_{\text{Model2}}}{\tau_{\text{Model2}}} \neq \alpha_{\text{Model1}} - \alpha_{\text{Model2}}. \tag{11}$$

Differences in coefficients are therefore an uninterpretable blend of the
change in $\tau$ and actual confounding. This issue proves to be somewhat more
dramatic for between-model comparisons than for interpretations of effect
size. Because $\tau$ can either increase or decrease in size between models, it is
possible that $\theta_{\text{Model1}} - \theta_{\text{Model2}}$ can provide the opposite
sign to $\alpha_{\text{Model1}} - \alpha_{\text{Model2}}$. It is also possible that
rescaling can suppress the difference in coefficients, such that we would conclude
no change between models even though there actually is confounding, or exaggerate
the difference, such that we conclude confounding when two variables are uncorrelated.
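
A short numerical sketch shows how the naive difference in equation (11) can take the opposite sign from the latent difference. All values below are hypothetical and chosen only to make the sign reversal visible.

```python
def naive_difference(alpha_model1, tau_model1, alpha_model2, tau_model2):
    """Left-hand side of equation (11): difference in estimated coefficients."""
    return alpha_model1 / tau_model1 - alpha_model2 / tau_model2


# Hypothetical scenario: adding a variable in Model 2 lowers the latent
# coefficient from 1.0 to 0.8, but it also explains residual variation,
# so the scale drops from 1.5 to 1.0.
latent_diff = 1.0 - 0.8
naive_diff = naive_difference(1.0, 1.5, 0.8, 1.0)

# The latent difference is positive while the naive difference is negative.
signs_disagree = (latent_diff > 0) and (naive_diff < 0)
```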
Third, interaction coefficients, including homophily and heterophily
coefficients, cannot be used to assess significance or interpret the effect of
interactions unless we assume that each group-specific coefficient is
identified to the same scale. This is easiest to see by writing separate models
for each group. Consider the simplest case of two groups with latent models
representing the group-specific tie propensity:

$$\text{Group 0:}\quad Y^{*0}_{ij} = \alpha^{0T}_{\text{endogenous}} \delta^{+}_{ij}(y)^{0} + \alpha^{0T}_{\text{exogenous}} x^{0}_{ij} + \tau^{0} \varepsilon^{0}_{ij}$$

$$\text{Group 1:}\quad Y^{*1}_{ij} = \alpha^{1T}_{\text{endogenous}} \delta^{+}_{ij}(y)^{1} + \alpha^{1T}_{\text{exogenous}} x^{1}_{ij} + \tau^{1} \varepsilon^{1}_{ij}.$$

Since $\varepsilon^{0} = \varepsilon^{1}$, $\theta^{0}$ and $\theta^{1}$ will only be
properly scaled when $\tau^{0} = \tau^{1}$. If $\tau^{0} \neq \tau^{1}$, interaction
coefficients used to represent differences between Groups 0 and 1 will be biased.⁶
Moreover, because we anticipate heterogeneity in effects when we model
interactions, it is typically implausible to assume that $\tau$ is invariant between
groups. In fact, because interaction coefficients measure the equality of
coefficients between groups, interaction coefficients can provide the incorrect sign
and incorrect $z$ statistic when the amount of residual variation differs between
groups.
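
The group-specific case can be sketched the same way: each group's estimated coefficient is its latent coefficient divided by its own scale, and the apparent interaction is the gap between the two. The superscript group labels are written as suffixes here, and every value is hypothetical.

```python
def apparent_interaction(alpha_group0, tau_group0, alpha_group1, tau_group1):
    """Gap between estimated group coefficients, theta^1 minus theta^0."""
    theta_group0 = alpha_group0 / tau_group0
    theta_group1 = alpha_group1 / tau_group1
    return theta_group1 - theta_group0


# Identical latent effects, but Group 1 has twice the residual scale:
# a nonzero "interaction" appears although the latent effects are equal.
spurious = apparent_interaction(1.0, 1.0, 1.0, 2.0)

# A larger latent effect in Group 1 that looks smaller after scaling,
# so the estimated interaction has the wrong sign.
reversed_sign = apparent_interaction(1.0, 1.0, 1.2, 2.0)
```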
Although scaling and rescaling present important problems for statistical
inference using ERGM, these issues have not been addressed in statistical
network analysis. As illustrated above, scaling can emerge when there are
omitted variables in a model, even when those omitted variables are
uncorrelated with all other covariates in a model. Scaling is thus likely prevalent
in applications, as the assumption of no omitted variables is difficult to meet and
impossible to verify. The following section evaluates the substantive impact of
these issues for ERGM inference with a series of simulations.
The Substantive Impact of Scaling
Given that most ERGM coefficients are likely only identified to a scale, a
pressing question is how much impact we should expect scaling and rescaling
to have in applied research. To address this question, we carry out a series
of simulation studies that evaluate the impact of scaling on coefficient size,
differences in coefficients, and interaction coefficients. All simulations are
based on empirical data so that conclusions are realistic.⁷ In the simulations
to follow, residual variation is introduced by omitting uncorrelated variables
from ERGM. The change in parameters of interest therefore arises because the
model is only identified to a scale rather than because of an omitted
confounding variable.
Simulation of Coefficient Size
To assess the substantive impact of scaling on coefficient size in ERGM, a
simulation was conducted using the Faux Dixon High network as a reference
network. Faux Dixon High is a directed network of 1,305 friendship ties
between 248 high school students. The network is simulated from one large
high school in the National Longitudinal Study of Adolescent Health.⁸ We first
fit an ERGM of the form:
$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \theta_{1} \delta^{+}(\mathrm{GWESP}) + \theta_{2} \mathrm{ReceiverGrade}$,
where GWESP indicates a geometrically weighted edgewise shared
partnership term (fixed decay parameter of 0.1), $\theta_{1}$ is approximately 1
($p < .001$), and $\theta_{2}$ is approximately $-0.5$ ($p < .001$). We used an MCMC
sampler (Metropolis–Hastings algorithm) to simulate 1,000 network data sets
using the empirical ERGM parameters, where $\theta_{1}$ equaled 1 and $\theta_{2}$
could equal $-0.4$, $-0.45$, or $-0.5$. We chose these values to remain within the
scope of effect sizes that are realistic given the empirical network data. With 1,000
replications for each condition, the sample space of the experiment contains
results from ERGMs fit to 3,000 network data sets.
The goal was to simulate networks from the model and then attempt to
recapture the true value of $\theta_{1}$ when the amount of residual variation
increased. To introduce residual variation, we fit an ERGM to each of the
simulated data sets with ReceiverGrade omitted from the model. Because
ReceiverGrade is uncorrelated with GWESP ($r = .03$),⁹ the only source of
bias in the simulated values of $\theta_{1}$ is scaling, where larger absolute values
of $\theta_{2}$ correspond to more residual variation and thus greater scaling. We
also fit an ERGM to each network that included ReceiverGrade to provide a control
condition for the experiment. Because these latter ERGMs were fit with
prespecified knowledge of the generative model, there is little-to-no residual
variation in the control condition.
Results are straightforward to summarize (Figure 1). While the mean
GWESP coefficient approximates its true value of 1 across conditions, the
estimated GWESP coefficient rarely replicates its true value when residual
variation is present. As the amount of residual variation increases, the
estimated coefficient increases. When the coefficient for ReceiverGrade is fixed
to $-0.4$, the amount of bias is manageable, where the mean coefficient for
GWESP is 1.35. However, even modest increases in residual variation have
sizable effects. The mean coefficient for GWESP is 1.4 when the
ReceiverGrade coefficient is equal to $-0.45$ and increases to 1.57 when the
ReceiverGrade coefficient is $-0.5$. If we were to interpret the results from an
ERGM in this latter case, we would conclude that GWESP has an effect size
57 percent larger than its actual effect size despite there being no omitted
confounding variables. These differences are exaggerated if we use
exponentiated coefficients to interpret effect sizes. For instance, the true odds
ratio is 2.72 (exp(1) = 2.72), but the estimated odds ratio increases to 3.86
(exp(1.35) = 3.86), 4.06 (exp(1.4) = 4.06), and finally to 4.81 (exp(1.57) =
4.81) at each successive increase in the amount of residual variation.
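
The arithmetic behind these comparisons can be reproduced in a few lines. The sketch below takes the mean GWESP coefficients reported in the text as inputs; the helper name percent_bias is an illustrative assumption.

```python
import math

TRUE_COEFFICIENT = 1.0
# Mean GWESP coefficients reported for the three treatment conditions.
MEAN_ESTIMATES = [1.35, 1.4, 1.57]


def percent_bias(estimate, truth):
    """Relative difference between an estimate and the true value, in percent."""
    return 100.0 * (estimate - truth) / truth


true_odds_ratio = round(math.exp(TRUE_COEFFICIENT), 2)
estimated_odds_ratios = [round(math.exp(b), 2) for b in MEAN_ESTIMATES]
bias_on_logit_scale = [round(percent_bias(b, TRUE_COEFFICIENT)) for b in MEAN_ESTIMATES]
bias_on_odds_scale = [
    round(percent_bias(math.exp(b), math.exp(TRUE_COEFFICIENT))) for b in MEAN_ESTIMATES
]
```

Comparing the last two lists shows why the discrepancy looks larger on the odds-ratio scale: exponentiation is convex, so the same gap in coefficients becomes a larger relative gap in odds ratios.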
Also of note are the confidence intervals in each simulation. Every
confidence interval is narrow, and none of the confidence intervals in any of the
treatment conditions contained the true GWESP (geometrically weighted
edgewise shared partnerships) coefficient of 1. This result illustrates that
scaling can pose problems for ERGM inference even when resampling is
possible because confidence intervals rescale alongside ERGM coefficients.
These results illustrate that the consequences of scaling for conclusions about
effect size can be substantial. Even though ReceiverGrade and
GWESP are uncorrelated, the bias in estimated coefficients is
sizable in every treatment condition. We therefore caution researchers
against relying on coefficients and exponentiated coefficients to interpret
effect size in ERGM applications.
[Figure 1: three panels (Lower bound 95% CI, GWESP Coefficient, Upper bound 95% CI)
showing Control and Treatment conditions for Grade = −.4, −.45, and −.5.]
Figure 1. Effect of scaling on $\theta_{1}$. Note: N = 6,000. Dashed vertical line
marks its true value of 1.

Simulation of Differences in Coefficients
We now consider the effect of scaling on the difference in coefficients using a
simulation based on the Faux Mesa High network data. The Faux Mesa network
is an undirected network of 205 students and 203 ties. We first estimated an
ERGM predicting ties as a function of students' sex and GWESP (fixed decay
term equal to 0.3). Next, we simulated a confounding or mediating edge
covariate $M$ to be correlated with GWESP, but uncorrelated with Grade, using a
linear model of the form: $M = 1 + .5\,\delta^{+}(\mathrm{GWESP}) + e$, where $e$ is a
randomly distributed error with a mean of 0 and a standard deviation of 2. We used the
following ERGM equation to simulate network data sets:

$$\log\left(\frac{p}{1 - p}\right) = \theta_{\text{Edges}} + \theta_{\text{GWESP}} \delta^{+}(\mathrm{GWESP}) + \theta_{M} M + \theta_{\text{Grade}} \mathrm{Grade}, \tag{12}$$

where the coefficients for Edges and GWESP were set to their empirical
values ($\theta_{\text{Edges}} = -5.4$ and $\theta_{\text{GWESP}} = 1.85$) and
$\theta_{M}$ was set to equal 1. To introduce residual variation, we varied the value
of $\theta_{\text{Grade}}$ to equal either $-0.1$, $-0.15$, or $-0.2$, where higher
absolute values increase the amount of residual variation and the degree of scaling.
With 1,000 simulated data sets for each condition, the sample space of the experiment
includes the differences in coefficients for 3,000 simulated networks.
At each replication, we fit two ERGMs to the simulated network data to
calculate the naive difference in coefficients. The first ERGM predicted ties as
a function of the number of edges and GWESP; the second ERGM included $M$.
Because Grade is uncorrelated with either GWESP ($r = .02$) or $M$ ($r = 0$),
omitting it from both models introduces residual variation without introducing
omitted confounding variable bias. The naive difference in coefficients is the
difference in $\theta_{\text{GWESP}}$ between models. We calculated the true
difference in coefficients using the product of true coefficients, which is
equivalent to the true difference in coefficients when using the latent parameter
for $M$ ($\alpha_{M}$) (see Breen, Karlson, and Holm 2013; Mackinnon et al. 2007).
The true difference in coefficients is $\beta_{1} \alpha_{M}$,¹⁰ where $\beta_{1}$ is
obtained from fitting the linear regression
$M = \beta_{0} + \beta_{1} \delta^{+}(\mathrm{GWESP}) + e$ and $\alpha_{M} = 1$ is the
true coefficient used to simulate the network. We used a linear regression to
estimate $\beta_{1}$ instead of the predetermined value of 0.5 because the GWESP
statistic, and thus the value of $\beta_{1}$, varies endogenously for each simulated
network.¹¹
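
The product-of-coefficients calculation described above can be sketched with an ordinary least squares slope computed by hand. The function names and the toy data are assumptions for illustration; in the simulations, the predictor would be the GWESP change statistic from each simulated network.

```python
def ols_slope(x, y):
    """Slope of the least squares line of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    return covariance / variance


def true_difference(gwesp_change_stats, mediator, alpha_m):
    """Product of coefficients: beta_1 from the linear model times alpha_M."""
    beta_1 = ols_slope(gwesp_change_stats, mediator)
    return beta_1 * alpha_m


# Toy data without noise, generated as M = 1 + 0.5 * change statistic.
toy_change_stats = [0.0, 1.0, 2.0, 3.0]
toy_mediator = [1.0 + 0.5 * d for d in toy_change_stats]
toy_result = true_difference(toy_change_stats, toy_mediator, alpha_m=1.0)
```

With noisy data the estimated slope will differ from 0.5 in each replication, which is the reason the text gives for re-estimating it rather than plugging in the predetermined value.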
Figure 2 summarizes simulation results. When there are low amounts of
residual variation, the naive difference in coefficients is small in value and
close to 0. Here, the mean naive difference in coefficients is 0.03 despite the
mean true difference being substantial in size at 0.17. In other words, the
estimated difference in coefficients is 17 percent of the size of the true differ-
ence in coefficients. The naive difference in coefficients also provides the
incorrect sign in 26 percent of cases. Increasing residual variation increases
the discrepancy between the true and naive difference in coefficients. For
instance, when $\theta_{Grade}$ is fixed to equal −0.2, the naive difference in
coefficients yields the incorrect sign in 34 percent of cases. The mean naive
difference in coefficients is 0.05, while the mean true difference in coefficients is
0.22. A researcher encountering these results would conclude that either there
is no difference in coefficients between models or that M suppresses, rather
than explains, the effect of GWESP. Thus, we caution researchers against
comparing coefficients between models in ERGM applications.
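The rescaling mechanism behind these results can be illustrated with a minimal sketch. It assumes an ordinary logistic regression on independent observations rather than an ERGM, and the sample size, coefficient values, and helper names (`solve`, `fit_logit`) are hypothetical choices made only for this illustration. Omitting a predictor that is independent of x still shrinks the estimated coefficient for x.

```python
import math
import random

def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logit(rows, y, iters=12):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]  # negative Hessian, X'WX
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - p) * x[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * x[a] * x[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

random.seed(1)
n = 3000                                    # hypothetical sample size
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]  # drawn independently of x
# Hypothetical true model: logit(p) = 1.0 * x + 2.0 * z
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-(xi + 2.0 * zi))) else 0
     for xi, zi in zip(x, z)]

full = fit_logit([[1.0, xi, zi] for xi, zi in zip(x, z)], y)
reduced = fit_logit([[1.0, xi] for xi in x], y)  # z omitted
print("coefficient for x, z included:", round(full[1], 3))
print("coefficient for x, z omitted: ", round(reduced[1], 3))
```

Because z is unrelated to x, the drop in the coefficient for x reflects rescaling from residual variation rather than confounding.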
[Figure 2: three panels, for Grade = −.2, −.15, and −.1, comparing the true and
naive differences in coefficients; x-axis: difference in coefficients.]
Figure 2. Effect of scaling on difference in coefficients. Note: N = 6,000. Dashed line
marks zero.
Simulation of Interaction Coefficients
We now return to the Faux Dixon network data to examine the impact of
scaling on interaction coefficients. We first fit an ERGM to the Faux Dixon
High network with the form:
$$\log\left(\frac{p}{1-p}\right) = \theta_{SameSex} SameSex + \theta_{SenderSex} Female + \theta_{ReceiverSex} Female + \theta_{SenderGrade} Grade + \theta_{ReceiverGrade} Grade + \theta_{Mutual} \delta^{+}(Mutuality), \tag{13}$$
where the SameSex interaction coefficient is the coefficient of interest
(model results in Table 1). We then used a Metropolis–Hastings algorithm
to simulate 1,000 networks from the empirical ERGM. We fit an ERGM to
each of the simulated networks, where the regressors were SenderSex,
ReceiverSex, Mutual, and SameSex. Residual variation was introduced by
omitting SenderGrade in one condition and then increased in a second condition
by omitting both SenderGrade and ReceiverGrade. Because same sex
friendships are uncorrelated with either ReceiverGrade (r = .003) or
SenderGrade (r = .01), the only source of bias in the interaction coefficient
is from scaling.¹² We also fit an ERGM to each network where both
grade covariates were included to provide a control condition for the
experiment.
Table 1. ERGM of Friendships in Faux Dixon High.

Parameters                Model 1
Sex (male is referent)
  Sender                  0.02 (.06)
  Receiver                0.04 (.06)
  Same sex                0.11* (.05)
Grade
  Sender                  0.25*** (.02)
  Receiver                0.22*** (.02)
Mutual                    3.46*** (.10)
AIC                       11,063
BIC                       11,117
*p < .05. **p < .01. ***p < .001.

Figure 3 summarizes the simulation results. In the control condition, the
simulated values of $\theta_{SameSex}$ reproduce the true value of 0.11 without trouble.
The mean value of $\theta_{SameSex}$ is 0.12 with a mean z statistic of 2.4 (true value
2.33). When there are low amounts of residual variation and only
ReceiverGrade is omitted from the model, however, the average coefficient
for SameSex is 0.05 with a mean z statistic of 1.06. Only in 15 percent of
cases did the estimated value of $\theta_{SameSex}$ approximate its true value of 0.11
with a z statistic greater than 1.96. In an additional 12 percent of cases,
$\theta_{SameSex}$ yielded a negative sign. If we were to interpret these results under
conditions of little scaling, we would conclude that sex homophily has
little-to-no substantive effect on friendship formation. Results are even more
striking when we increase the amount of residual variation. When both
ReceiverGrade and SenderGrade are omitted from the simulated ERGMs,
the mean coefficient for SameSex is −3.29, a substantial increase in size and
a reversal of sign when compared to the true value of $\theta_{SameSex}$. At no point did
the simulated coefficients approximate the true coefficient. Moreover, each
simulated ERGM returned this value at a high level of confidence, where the
mean z statistic for $\theta_{SameSex}$ was −82.78. A researcher examining this model
would be led to conclude that sex homophily is inversely related to friendship
formation. We therefore caution researchers against interpreting interaction
coefficients and their z statistics as evidence of an interaction effect.
[Figure 3: three rows of panels (Control, No Sender Grade, No Sender or Receiver
Grade), each showing the distribution of the SameSex coefficient and of its z statistic.]
Figure 3. Effect of scaling on SameSex coefficient. Note: N = 3,000. Dashed line
marks the true value of $\theta_{SameSex}$ (0.11) and its z statistic (2.3).
Summary of Simulation Results
Simulation results illustrate that residual variation and scaling have impor-
tant consequences for interpreting effect sizes, assessing differences in
effects between models, and interpreting and testing interactions. When
residual variation is present, ERGM coefficients are a biased measure of
effect size, and this bias can be substantial under realistic circumstances.
Coefficients also frequently rescale between models, suppressing differences
in effects and potentially altering conclusions about confounding and mediation.
Interaction coefficients and their z statistics are also often biased in the
simulations, frequently yielding the incorrect sign. Collectively, these results
illustrate that realistic amounts of residual variation can have large conse-
quences for ERGM inference. We now introduce methods for addressing
scaling in ERGM.
Methods for Handling Scaling
Simulation results reveal that realistic amounts of scaling can have large
consequences for ERGM inference. Because we rarely have observational
measures of all relevant variables, it is likely that these problems are com-
mon in practice. How should this issue be overcome in applied research?
While a number of studies have proposed solutions for addressing scaling
and rescaling in generalized linear models (GLM), their extension to statistical
network models is problematic. Most solutions that circulate in the GLM
literature assume independent and identically distributed
observations, an assumption violated by network data. For instance, one
popular method for comparing coefficients between models is to first regress
the confounding variable of interest on the remaining predictor variables and
then standardize the focal variable using the residual from this regression
(Breen et al. 2013; Karlson et al. 2012; Mackinnon et al. 2007). Since
residuals are biased by nonindependence and endogeneity, this method can-
not be used in dyadic dependence ERGMs. Likewise, a common strategy for
testing interactions is to estimate separate regressions for each group and use
likelihood ratio tests to assess the equivalence of coefficients between groups
(Allison 1999). However, because network data are interdependent, splitting
the data into separate models will fundamentally damage the representation
of network structure with potentially impactful consequences.
We propose using a marginal effects framework to overcome problems
related to scaling in ERGM. While ERGM coefficients can only be iden-
tified to a scale, scaling has no effect on predicted tie probabilities (see
Online Appendix A, which can be found at http://smr.sagepub.com/supplemental/).
Moreover, because marginal effects are obtained in postestimation,
the only necessary independence assumption is that the ERGM is
estimated at the correct level of independence, which is an assumption
implicit in all ERGMs (Koskinen and Daraganova 2013). The framework
integrates methods that have been recently proposed for GLM (Long and
Mustillo 2018; Mize, Doan, and Long 2019; Mood 2010) but can be
extended to ERGM due to their relaxed independence assumptions. The
framework is flexible and can be easily applied across ERGM specifications
and applications. We focus here on the average marginal effect
(AME), though all methods can be applied to any marginal effect variant,
including partial effects (see Wooldridge 2002), marginal effects at means
(Agresti 2002; Long 1997), or marginal effects at representative values
(Long and Mustillo 2018; Mize et al. 2019).¹³
Interpreting Effect Sizes
Marginal effects are based on the derivative of the slope at a particular point
in the cumulative distribution function (equation 3). The marginal effect for a
variable is the expected increase in tie probability when the variable
increases by 1. For a continuous variable, we define the marginal effect with
respect to a variable X as its partial derivative,
$$ME^{ij}_{\theta_x} = \theta_x \frac{\partial \hat{p}_{ij}}{\partial X_{ij}}. \tag{14}$$
For binary variables, the partial derivative is equivalent to the difference
in tie probabilities when X changes from 0 to 1. The superscript ij indexes
that all dyads in the ERGM sample space have a marginal effect. Because
marginal effects summarize changes in tie probability, they are unaffected by
scaling (Cramer 2007; Long 1997). Equation (14) also makes clear that the
marginal effect preserves ERGM independence assumptions; that is, mar-
ginal effects are assumed to be independent conditional on the parameterized
sufficient statistics.
The AME of a variable is its mean marginal effect. Formally, we calculate
the AME as:
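$$AME_{\theta_x} = \theta_x \frac{1}{n} \sum_{ij=1} \frac{\partial \hat{p}_{ij}}{\partial X_{ij}} = \frac{\sum_{ij=1} ME^{ij}_{\theta_x}}{n}, \tag{15}$$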
where n is the number of dyads in the ERGM sample space. The AME
expresses the average change in expectation given a one-unit increase in X.
It can be calculated on either the scale of the ERGM linear component or
the scale of tie probabilities. Standard errors are obtained with the Delta
method (see Agresti 2002).¹⁴ The Delta method standard error provides a z
statistic equal to the coefficient z statistic for noninteraction coefficients.
AMEs therefore do not affect conclusions regarding the significance,
direction, or relative size of effects within a model for noninteraction
terms.¹⁵
An appealing property of the AME is that it has an intuitive interpreta-
tion. Suppose that we are examining the effect of age on friendship net-
works and we obtain an AME of 0.005 (calculated on the scale of tie
probabilities) and a coefficient of 2. A one-unit increase in age would
multiply the estimated odds of a friendship by exp(2) = 7.39. However,
because of scaling, we cannot be sure that the
estimated odds ratio is the true odds ratio. Moreover, because odds ratios
are ratios, the substantive change in tie probability varies multiplicatively
for each unit increase in age. Alternatively, we can interpret the AME
equivalently regardless of the value of age or amount of scaling: A one-
year increase in age correlates with an average 0.005 increase in tie prob-
ability. This interpretation is more intuitive and often more immediately
relevant to research interests than odds ratios. Moreover, because AMEs do
not rescale between models or groups, they provide a basis for drawing
cross-model comparisons and for evaluating interaction effects. We now
introduce these methods for ERGM.
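The contrast between odds ratios and changes in probability can be made concrete with a short sketch. The baseline tie probabilities below are hypothetical, and the helper name is illustrative. The same odds ratio of exp(2) translates into very different changes in tie probability depending on the baseline.

```python
import math

def prob_after_odds_ratio(p0, odds_ratio):
    """Tie probability after multiplying the baseline odds by odds_ratio."""
    odds = p0 / (1.0 - p0) * odds_ratio
    return odds / (1.0 + odds)

odds_ratio = math.exp(2)  # coefficient of 2 from the example in the text
for p0 in (0.001, 0.01, 0.1, 0.5):  # hypothetical baseline tie probabilities
    p1 = prob_after_odds_ratio(p0, odds_ratio)
    print(f"baseline {p0:.3f} -> {p1:.4f} (change {p1 - p0:+.4f})")
```

In sparse networks, where baseline tie probabilities are small, a large odds ratio can therefore correspond to a small absolute change in tie probability.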
Testing Differences between Models
As described above, we cannot attribute the difference in coefficients to
confounding or mediation. Comparisons of coefficient significance are also
problematic as the difference between significant and insignificant is not, in
itself, statistically significant (Bollen and Stine 1990; Gelman and Stern
2006). To assess the change in effect size, we calculate the difference in the
AME between models:
$$AME^{Model1}_{\theta_x} - AME^{Model2}_{\theta_x}. \tag{16}$$
Because AMEs are robust to scaling but are still affected by omitted
correlates, AMEs only change in size when a confounding variable is
excluded or included. Differences in AMEs can therefore be attributed to
substantively relevant differences in effects.
To determine whether the change in AME is statistically significant, we
use a Wald test with test statistic:
$$z = \frac{AME^{Model1}_{\theta_x} - AME^{Model2}_{\theta_x}}{\sqrt{Var(AME^{Model1}_{\theta_x}) + Var(AME^{Model2}_{\theta_x}) - 2\,Cov(AME^{Model1}_{\theta_x}, AME^{Model2}_{\theta_x})}}, \tag{17}$$
where the denominator is the standard error for the difference in AMEs.
Rejecting the null hypothesis means that there is a statistically significant
difference in AME between models.
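The test statistic in equation (17) is straightforward to compute once the two AMEs, their variances, and their cross-model covariance are in hand. The sketch below uses hypothetical input values and an illustrative function name, and it assumes a two-sided test against the standard normal distribution.

```python
import math

def wald_difference(ame1, ame2, var1, var2, cov12):
    """Wald z statistic, standard error, and two-sided p-value
    for the difference between two AMEs."""
    se = math.sqrt(var1 + var2 - 2.0 * cov12)
    z = (ame1 - ame2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, se, p_value

# Hypothetical AMEs, variances, and cross-model covariance
z, se, p_value = wald_difference(0.010, 0.008, 1e-6, 1e-6, 5e-7)
print("z:", z, "SE:", se, "p:", p_value)
```

A positive cross-model covariance, which is expected when both models are fit to the same network, shrinks the standard error relative to treating the two AMEs as independent.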
To calculate the standard error for the difference in AMEs, we require the
variance estimates for both AMEs and the cross-model covariance between
AMEs. We obtain the cross-model covariance using seemingly unrelated
estimation (Mize et al. 2019; Weesie 1999). Suppose $\theta_1$ and $\theta_2$ are the
parameter vectors from two separate ERGMs fit to the same network. The
cross-model covariance vector is calculated as:
$$Cov(\theta_1, \theta_2) = D_1^{-1} \sum_i w_i u_{1i} u_{2i}^{T} D_2^{-1}, \tag{18}$$
where D is the negative Hessian matrix of the ERGM log-likelihood ($D^{-1}$ is
the covariance matrix of the ERGM estimator), u is the gradient of the
log-likelihood, and w is a vector of weights equal to zero if the ith parameter does
not appear in both models and equal to 1 if it does. The cross-model covariance
matrix for the marginal effects can be estimated using the Delta method
and the cross-model covariance matrix.
A desirable property of equation (17) is that it reduces to a Sobel test (see
Sobel 1986) in large-sample linear models (see Online Appendix D, which
can be found at http://smr.sagepub.com/supplemental/).¹⁶ Under standard
assumptions for mediation analysis (see Mackinnon 2008:53-55), we can use
this method as a formal test of mediation. If we are conducting mediation
analysis, the difference in AMEs is the indirect effect, or the average change
in tie probability indirectly attributable to X through a mediating pathway.
We can interpret $AME^{Model2}_{\theta_x}$ as the partial or direct effect. Another desirable
trait of this method is that $AME^{Model1}_{\theta_x}$ is equal to the sum of the indirect and
partial effects, which is not usually true in mediation analysis in nonlinear
probability models (Mackinnon 2008). The total effect is equal to $AME^{Model1}_{\theta_x}$.
We can assess the extent of mediation by calculating how much of the
total effect is explained by controlling for the mediator. The percent
mediated is
$$100\left(1 - \frac{AME^{Model2}_{\theta_x}}{AME^{Model1}_{\theta_x}}\right). \tag{19}$$
The percent mediated is a useful quantity when the goal is to summarize the
degree of confounding by including a correlated variable. However, the quantity
may be misleadingly large if the total effect is small. Researchers will often wish
to interpret the percent mediated with respect to the total and indirect effects.
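A small sketch of equation (19) shows why the percent mediated should be read alongside the total and indirect effects. The AME values and the function name are hypothetical, chosen only for illustration.

```python
def percent_mediated(ame_total, ame_direct):
    """Percent of the total effect explained by the mediator, as in equation (19)."""
    if ame_total == 0:
        raise ValueError("total effect is zero; percent mediated is undefined")
    return 100.0 * (1.0 - ame_direct / ame_total)

# Hypothetical case 1: a sizable total effect
print(percent_mediated(0.010, 0.008), "percent; indirect effect", 0.010 - 0.008)

# Hypothetical case 2: a tiny total effect yields a larger percentage
# even though the indirect effect is much smaller in absolute terms
print(percent_mediated(0.0002, 0.0001), "percent; indirect effect", 0.0002 - 0.0001)
```

In the second case the percent mediated is larger even though the indirect effect is twenty times smaller than in the first.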
Simulation. Because there is no baseline method for comparing coefficients
between models in ERGM, it is useful to demonstrate the validity of the
proposed test under conditions of residual variation. The following simula-
tion uses the Faux Desert High network, which is a simulated friendship
network based on one rural Southwest high school in the AddHealth data
collection. The network is undirected¹⁷ with 107 students and 439 friendship
ties. We first fit an ERGM to the network of the form
$\log\left(\frac{\hat{p}_{ij}}{1-\hat{p}_{ij}}\right) = \theta_0 Edges + \theta_1 Grade + \theta_2 AbsDiff(Grade)$, where Grade is a
nodal covariate and AbsDiff(Grade) is the absolute difference in two students’
grades. We stored the estimated coefficients to use for simulation,
where $\theta_0 \approx 1.65$, $\theta_1 \approx 0.2$, and $\theta_2 \approx 1.45$.
To conduct the simulation, we simulated random networks using the above
model, but we included a random nodal-level covariate with a mean of 0 and a
standard deviation of 1 to manipulate residual variation. The random variable was
uncorrelated with any other predictor. Its coefficient took one of three
values: 0.5, 1, or 1.5. We fit four ERGMs to each network and assessed
the difference in AMEs for each matched pair. The first pair of ERGMs was
estimated with and without AbsDiff(Grade), and the difference in $AME_{\theta_1}$ was
recorded. These models included the random variable, so that residual variation
was minimized. This provided a control condition and an estimate of the
true difference in AMEs between models. The same comparisons were per-
formed in the second set of ERGMs, but the random nodal variable was
excluded from ERGM estimations. By increasing the coefficient for the ran-
dom nodal variable, we are thus able to increase residual variation and evaluate
its impact on the difference in AMEs. For each condition, we simulated 1,000
networks. With three levels of residual variation and four ERGMs fit to each
network, the simulation considers the differences in AMEs in 12,000 ERGMs
between 6,000 matched pairs in 3,000 network data sets.
Figure 4 plots the differences between the true and estimated values of
$AME^{Model1}_{\theta_1} - AME^{Model2}_{\theta_1}$, the difference in z statistics, and the difference in
the upper and lower bounds for 95 percent confidence intervals. Across
measures and levels of residual variation, there is little difference between
the true and estimated statistics. The largest discrepancy is for the z statistic,
where in one case the estimated z statistic was 0.58 below the true z statistic.
However, given that the true z statistic was 8.38, the relative discrepancy was
negligible (the estimated z statistic was 7.79). In the overwhelming majority
of cases, there was little-to-no discrepancy between the estimated and true
values: The mean difference in z statistics is 0.04, the mean difference in
$AME^{Model1}_{\theta_1} - AME^{Model2}_{\theta_1}$ is $4.6 \times 10^{-6}$, the mean difference in 95 percent
confidence interval lower bounds is $1.2 \times 10^{-5}$, and the mean difference in
95 percent confidence interval upper bounds is $2.9 \times 10^{-6}$. Consistent
with theoretical results, these simulation findings illustrate that the test of
differences in AMEs is robust to scaling and can be used to compare effects
across models in the presence of residual variation.
[Figure 4: panels for random-variable coefficients of .5, 1, and 1.5; rows show the
difference in AME, the z statistic, and the lower and upper bounds of the 95% CI;
x-axis: difference between true and estimated value.]
Figure 4. Difference in true and estimated $AME^{Model1}_{\theta_1} - AME^{Model2}_{\theta_1}$ when residual
variation is introduced. Note: N = 3,000. Dashed line marks 0 representing no
difference in estimates.
Testing Interaction Effects
As demonstrated in simulation analyses, excluding as few as two uncorre-
lated explanatory variables can reverse the sign of ERGM interaction coeffi-
cients. We now show how marginal effects can be used to interpret and test
interaction effects in ERGM. Moderation exists when the effect of a variable
varies when a second variable changes in value. We measure the interaction
effect using the second difference in AMEs (Long and Mustillo 2018). We
define the AME for a level of an interaction as $AME^{g=k}_{\theta_x}$, where $\theta_x$ is the effect
of interest and g is the moderator with k values. The second difference is:
$$\Delta AME^{g}_{\theta_x} = AME^{g=k_1}_{\theta_x} - AME^{g=k_2}_{\theta_x}. \tag{20}$$
If g is binary, the only second difference for the interaction is when g
changes from 0 to 1. If g is continuous, we can specify the values of g to be
any values in the data set. We can also set g to representative values or
summary statistics, such as the mean plus or minus one standard deviation.
We interpret the second difference as the increase/decrease in AME when
g changes in value. Say that we are interested in assessing the effect of sex
homophily in a friendship network. The AME of interest is a binary indicator
variable for female students. Let the AME for male alters be 0.001 and the
AME for female alters be 0.004. The second difference would be 0.003,
indicating that the effect of being female on tie probabilities increases by
0.003 when an alter is female instead of male, reflecting a preference for
same sex friendships. We assess significance using a Wald test with the
following test statistic:
$$z = \frac{AME^{g=k_1}_{\theta_x} - AME^{g=k_2}_{\theta_x}}{\sqrt{Var(AME^{g=k_1}_{\theta_x}) + Var(AME^{g=k_2}_{\theta_x}) - 2\,Cov(AME^{g=k_1}_{\theta_x}, AME^{g=k_2}_{\theta_x})}}, \tag{21}$$
where the denominator is the standard error for the second difference. As
before, we can use the Delta method to estimate the AME covariance matrix.
When g has a large number of unique values, it may be difficult to
succinctly summarize the impact of an interaction using second differences.
One way to do so is to compute the second difference when g changes from
its smallest value to its maximum, which will provide insight into the overall
change in the interaction effect. In other cases, researchers may simply want
to report the average interaction effect or the average second difference. We
first calculate the second differences for all k in increasing order and then
take the mean second difference:
$$\overline{\Delta AME}^{g}_{\theta_x} = \frac{\sum \left(AME^{g=k}_{\theta_x} - AME^{g=k-1}_{\theta_x}\right)}{N_k - 1}, \tag{22}$$
where $N_k$ is the number of unique values in g. We can interpret $\overline{\Delta AME}^{g}_{\theta_x}$ as
the average change in effect when g increases in value. We can also calculate
the average absolute second difference if the interaction is curvilinear. If we
are examining the average second difference, we test the significance of the
interaction with the average Wald statistic for the second differences. If we
are examining the average absolute second difference, we test significance
with the average absolute Wald statistic. In either scenario, the null hypothesis
is that the average second difference or the average absolute second
difference is zero.¹⁸
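The second difference and its averaged variants can be sketched as follows. The binary case reuses the AMEs from the sex homophily example in the text (0.001 for male alters and 0.004 for female alters). The three-level case uses hypothetical AMEs, and the function names are illustrative.

```python
def second_differences(ames):
    """Successive second differences, with AMEs ordered by increasing g."""
    return [b - a for a, b in zip(ames, ames[1:])]

def average_second_difference(ames, absolute=False):
    """Mean (or mean absolute) second difference across the N_k - 1 steps."""
    diffs = second_differences(ames)
    if absolute:
        diffs = [abs(d) for d in diffs]
    return sum(diffs) / len(diffs)

# Binary moderator: AME for male alters, then AME for female alters
print(second_differences([0.001, 0.004]))

# Hypothetical moderator with three levels and a curvilinear pattern
ames_by_level = [0.001, 0.004, 0.002]
print(average_second_difference(ames_by_level))
print(average_second_difference(ames_by_level, absolute=True))
```

Because successive differences telescope, the plain average depends only on the AMEs at the smallest and largest values of g, which is one reason the absolute version is the more informative summary when the interaction is curvilinear.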
Because marginal effects are robust to scaling, we can also compare
second differences between models. For instance, if we are studying a school
friendship network, we may be interested in evaluating whether triadic clo-
sure explains the effect of sex homophily. We estimate two ERGMs, one that
includes sex homophily, and one that includes both sex homophily and
triangles. We would then compute the second difference for sex homophily
in each model and use equation (17) to test their equivalence, replacing the
AMEs with second differences.
Summary
The marginal effects framework outlined above provides a strategy to over-
come problems of scaling in ERGM. Because it is typically impossible to
determine that there are no omitted variables in a model, we recommend that
researchers use these methods to interpret effect size, test for and interpret
interaction effects, and compare effects between models. These methods
are available for use via the open-source software package ergMargins for R
(Duxbury 2019), available through the Comprehensive R Archive Network
repository and as a part of the xergm suite of packages. We now apply these
methods in an empirical application reexamining the role of selective mixing
and triadic closure in a large AddHealth school friendship network.
Empirical Application: Revisiting Birds of a Feather or
Friend of a Friend?
To demonstrate how these methods can be used to correct for scaling, we
examine the relationship between selective mixing and triad counts in one
large AddHealth friendship network. An empirical regularity in social net-
works is high levels of clustering (Newman 2010; Watts and Strogatz 1998).
Observed levels of clustering in social networks are typically attributed to
two underlying processes: selective mixing (preferential attachment to sim-
ilar alters) and triadic closure. Goodreau et al. (2009) sought to disentangle
these two processes using ERGM on a pooled sample of 59 school networks
from the AddHealth data set. Based on comparisons of homophily coeffi-
cients between models, they found that (1) triadic closure explains much of
the effect of selective mixing, (2) that selective mixing promotes triadic
closure, and (3) that there is little relationship between students’ nodal attri-
butes and triad closure, with the exception of female students’ tendency to be
embedded in triangles.
However, because coefficients are only identified to a scale in most
ERGM applications, we are unable to compare homophily coefficients
between models or interpret them as evidence of a homophily effect. We
now revisit this research question in a replication analysis of the largest
AddHealth in-school network in Goodreau et al.’s (2009) sample. The network
contains friendship nominations between 2,209 7th through 12th graders
(school ID: 44 in the supplementary tables to Goodreau et al. 2009). While
the in-school network is directed, we follow Goodreau et al.’s (2009) inclu-
sion criteria and only examine the 1,893 mutual ties between students. Our
model specification is, for the most part, identical to the original study. The
only difference is that we include Native American students in the “other”
racial category, instead of as an independent group. Nodal covariates or
“sociality” terms include students’ race (whites are referent), sex (males are
referent), and grade (7th grade is referent). Each sociality term is treated as a
categorical variable, with missing values controlled for as a discrete cate-
gory. Selective mixing is measured by including homophily (matched attri-
bute) interactions for each nodal covariate. Triadic closure is measured with
a GWESP term using a fixed decay parameter of 0.25. Consistent with
Goodreau et al. (2009), we estimated three models. The first is a dyad
independence model including only exogenous attributes. The second is a
curved ERGM including only the GWESP parameter. The third is a fully
specified model including all variables.
Table 2 presents results (Panel A). While we were unable to perfectly
replicate Goodreau et al.’s (2009) models, the substantive results are mostly
consistent in terms of direction, relative coefficient size, and differences in
coefficients between models.¹⁹ In model 1, female students have higher tie
probabilities as compared to male students. Black and Asian students have
higher tie probabilities than whites, while Latino students have lower tie
probabilities. Naive interpretations of selective mixing coefficients also sug-
gest that there is a preference for sex, race, and grade homophily across
categories. Model 2 includes only GWESP and an edges term. The positive
coefficient indicates that triadic closure increases the probability of friend-
ship formation. Model 3 presents full model results. The percent change in
coefficients indicates that most coefficients decline in size in the full model,
with the exception of the black sociality coefficient. Consistent with Good-
reau et al. (2009), the change in coefficient size is also quite large, with most
coefficients declining by more than 20 percent. If scaling were not a problem,
these results would indeed imply confounding. We now turn to AMEs to
interpret effect size, interaction effects, and differences in effects between
models.
Panel B in Table 2 presents results as AMEs and second differences. A
focus on AMEs reveals that the sociality effect sizes are relatively small. For
instance, the AME for female students is 0.0003 in model 1, indicating that
female students are only, on average, 0.03 percent more likely to be part of a
mutual friendship than male students. Based on these averages, we would
expect that an Asian female student in seventh grade would only be 0.11
percent more likely—one tenth of one percent—to form a mutual friendship
than a white male student in 10th grade (0.0003 + 0.0003 − (−0.0005) =
0.0011). By comparison, the AME for GWESP is 0.002 in model 2, indicating
that a one-unit increase raises the tie probability by 0.0018 (0.18 percent
increase), with diminishing returns. In other words, closing a single triangle
yields a greater difference in tie probabilities than the absolute difference in
tie probabilities between the demographic most likely to forge a tie (seventh
grade Asian female students) compared to the least likely demographic (10th
grade White male students). This stands in contrast to the substantive impact
implied by the sociality coefficients, which appear, at least intuitively, to
have noteworthy effect sizes.
Table 2. ERGM of Friendships in Large AddHealth School Network.
Panel A. Model results presented as coefficients and standard errors.

                    Model 1           Model 2          Model 3           Percent
                    θ (SE)            θ (SE)           θ (SE)            Change
Edges               19.83*** (.41)    7.78*** (.03)    17.70*** (.51)
Sociality
  Female            0.32*** (.03)                      0.18*** (.05)     34
  Black             0.25*** (.06)                      0.30* (.12)       20
  Latino            0.41*** (.03)                      0.16*** (.04)     39
  Asian             0.43*** (.09)                      0.35* (.15)       19
  Other race        0.29 (.21)                         0.03 (.33)        90
  8th Grade         0.03 (.06)                         0.14 (.11)        467
  9th Grade         0.56*** (.05)                      0.32*** (.09)     43
  10th Grade        0.60*** (.05)                      0.33*** (.09)     45
  11th Grade        0.43*** (.05)                      0.19* (.09)       56
  12th Grade        0.27*** (.06)                      0.16 (.10)        41
Selective mixing
  Female            0.72*** (.05)                      0.61*** (.06)     15
  White             1.05*** (.08)                      0.74*** (.10)     30
  Black             1.52*** (.12)                      1.18*** (.16)     22
  Latino            1.01*** (.08)                      0.87*** (.09)     14
  Asian             1.53*** (.16)                      1.01*** (.21)     34
  Other race        0.14 (.23)                         0.07 (.34)        50
  7th Grade         2.89*** (.20)                      2.42*** (.21)     16
  8th Grade         2.66*** (.19)                      2.00*** (.21)     25
  9th Grade         1.79*** (.09)                      1.49*** (.11)     17
  10th Grade        0.98*** (.08)                      0.79*** (.11)     20
  11th Grade        0.78*** (.08)                      0.66*** (.10)     15
  12th Grade        1.26*** (.09)                      0.91*** (.12)     28
Triadic closure
  GWESP                               2.57*** (.05)    1.92*** (.07)     25
Panel B. Model results presented as AMEs and second differences.

                    Model 1           Model 2          Model 3           Percent
                    AME × 10 (SE)     AME × 10 (SE)    AME × 10 (SE)     Change
Sociality
  Female            0.003*** (.000)                    0.001*** (.000)   51**
  Black             0.002*** (.000)                    0.002* (.000)     3
  Latino            0.003*** (.000)                    0.001*** (.000)   65***
  Asian             0.003*** (.000)                    0.002* (.001)     29
  Other race        0.001 (.002)                       0.000 (.000)      9
  8th Grade         0.000 (.000)                       0.000 (.000)      460
  9th Grade         0.004*** (.000)                    0.002*** (.000)   48**
  10th Grade        0.005*** (.000)                    0.002*** (.000)   52**
  11th Grade        0.003*** (.000)                    0.001* (.000)     62**
  12th Grade        0.002*** (.000)                    0.001 (.000)      46
Selective mixing
  Female            0.007*** (.000)                    0.004*** (.000)   29***
  Black             0.006*** (.000)                    0.005*** (.000)   17
  Latino            0.007*** (.000)                    0.006*** (.000)   15
  Asian             0.004*** (.000)                    0.004*** (.000)   19
  Other race        0.001 (.000)                       0.000 (.002)      62
  8th Grade         0.007*** (.000)                    0.005*** (.000)   31**
  9th Grade         0.010*** (.000)                    0.008*** (.000)   22*
  10th Grade        0.005*** (.000)                    0.004*** (.000)   12
  11th Grade        0.004*** (.000)                    0.004*** (.000)   11
  12th Grade        0.006*** (.000)                    0.004*** (.000)   26
Triadic closure
  GWESP                               0.018*** (.000)  0.013*** (.000)   27***
AIC                 25,310            28,110           24,187
BIC                 25,641            28,136           24,530
Note: Coefficients for “missing” categories not reported. AME standard errors are calculated
with the Delta method. AMEs for selective mixing coefficients are second differences. All AMEs
are calculated on the scale of tie probabilities. AMEs are multiplied by 10 to simplify
presentation. The significance of the percent change in AMEs is determined using equation (17).
*p < .05. **p < .01. ***p < .001.

We now assess the interaction effect for homophily terms in model 3.
Results for the direction and significance of second differences are consistent
with those for interaction coefficients, indicating that scaling is not
problematizing conclusions about the positive influence of homophily. However,
it is also clear that interaction coefficients imply misleading conclusions
about interaction effect size. If we were to interpret homophily
coefficients as odds ratios, for instance, we would conclude that eighth grade
homophily increases the odds of friendship seven times over (exp(2.00) =
7.39). Likewise, we would conclude that same sex friendships increase the
odds of friendship by 84 percent (exp(0.61) = 1.84). Figure 5 makes clear
that the differences in interaction effect sizes are not that large. Instead, the
interaction effect for eighth grade friendships is approximately equal to the
interaction effect for female friendships. In fact, the largest second difference
is for same grade friendships among ninth graders, which has a smaller
coefficient than 8th grade homophily. We arrive at a similar result for racial
homophily. Although the homophily coefficient is largest for black students,
the homophily interaction effect is greatest for Latinos. These results illus-
trate that even in cases where scaling does not problematize conclusions
about the significance and direction of interaction coefficients, it can still
alter conclusions about the relative and overall importance of interaction
effects.
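To make the contrast between odds ratios and probability-scale effects concrete, the following Python sketch evaluates a simple logit with two dummy variables and their interaction. Only the interaction coefficients 2.00 and 0.61 come from the discussion above; the intercepts and main effects are hypothetical values chosen for illustration. This is a plain logit illustration, not the AME procedure applied to the ERGM.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def second_difference(intercept: float, b_ego: float, b_alter: float, b_match: float) -> float:
    """Second difference on the probability scale for a 2x2 design.

    (p11 - p01) - (p10 - p00): how much the effect of the ego dummy on the
    tie probability changes when the alter dummy switches from 0 to 1.
    """
    p00 = inv_logit(intercept)
    p10 = inv_logit(intercept + b_ego)
    p01 = inv_logit(intercept + b_alter)
    p11 = inv_logit(intercept + b_ego + b_alter + b_match)
    return (p11 - p01) - (p10 - p00)


# Odds-ratio reading of the two interaction coefficients quoted in the text.
odds_ratio_grade = math.exp(2.00)
odds_ratio_sex = math.exp(0.61)

# Hypothetical baselines (assumptions): the grade term sits on a lower
# baseline tie probability than the sex term.
sd_grade = second_difference(intercept=-6.0, b_ego=-0.5, b_alter=-0.5, b_match=2.00)
sd_sex = second_difference(intercept=-4.5, b_ego=0.1, b_alter=0.1, b_match=0.61)

print(f"odds ratios: grade={odds_ratio_grade:.2f}, sex={odds_ratio_sex:.2f}")
print(f"second differences: grade={sd_grade:.4f}, sex={sd_sex:.4f}")
```

Under these assumed baselines, the term with the smaller coefficient produces the larger second difference, which is the kind of reordering the comparison of coefficients and AMEs is meant to expose.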
Our final goal is to examine differences in effects between models.
Although the percent change in AMEs is similar to the percent change in
coefficients for some covariates, it is quite different for others. For instance,
while the coefficient for black students increases by 20 percent, the AME
does not change, indicating that the increase in coefficient size is entirely a
result of scaling rather than a suppressing effect. This is particularly relevant
for 12th graders. Even though the coefficient for 12th graders changes from
significant to insignificant, a test of the difference in AMEs reveals that this
change in significance is not itself statistically significant.
[Figure 5 omitted. Y axis: AME. X axis: 10th Grade, 11th Grade, 12th Grade, 8th Grade, 9th Grade, Asian, Black, Female, Latino, Other Race. Legend (alter's attribute): Different, Same.]
Figure 5. Average marginal effects for homophily terms. Note: X axis is students’
attributes. Bands are 95 percent confidence intervals. Second differences are printed
at the top of the plot.
Moreover, even though most AMEs change by more than 10 percent
between models, the difference in effects is only significant for 9 of the 21
model terms. Consequently, despite the relative change in coefficient size
being fairly large in many cases, the conclusion of confounding is not sup-
ported for most covariates. In fact, while each second difference changes by
more than 15 percent between models, the difference in second difference is
only significant for 3 of the 10 interactions. Notably, there is no significant
change in any of the racial homophily second differences. This contrasts with
the differences in racial homophily coefficients, which decline by 15 percent
to 35 percent between models. These results suggest that there is little sys-
tematic relationship between selective mixing and triadic closure in the
school 44 network.
An interesting and unique result that arises from comparisons of AMEs is
that the differences in AMEs are greater for nodal covariates than for homo-
phily covariates (Table 3). Five of the 10 differences in sociality AMEs are
statistically significant. For instance, the AME for female students declines
by 51 percent between models. The direct and indirect effects for sex are both
0.0001. This means that, compared to male students, female students have a
0.0001 higher probability of forming friendship ties because of their gender
(direct effect). The indirect effect reflects that being female is also indirectly
related to forming friendship ties because female students tend to be embedded in a
greater number of triangles. Put differently, being female instead of male
indirectly increases the probability of forming a friendship tie by 0.0001 by
contributing to triadic closure. Through both indirect and direct pathways,
the probability that female students will form reciprocal friendships is 0.0003
higher than it is for male students (total effect).
Table 3. Mediation Analysis for Sociality AMEs with GWESP as the Mediator.
             Total AME        Direct AME       Indirect AME     Percent Change
Female       .003*** (.000)   .001*** (.000)   .001** (.000)    51
Black        .002*** (.000)   .002* (.001)     .000 (.000)      3
Latino       .003*** (.000)   .001*** (.000)   .002*** (.000)   65
Asian        .003*** (.000)   .002* (.000)     .001 (.001)      29
Other race   .002 (.002)      .000 (.002)      .002 (.003)      9
8th Grade    .000 (.000)      .000 (.000)      .001 (.001)      460
9th Grade    .004*** (.000)   .002*** (.001)   .002** (.001)    48
10th Grade   .005*** (.000)   .002*** (.000)   .002** (.000)    52
11th Grade   .003*** (.000)   .001* (.001)     .002** (.000)    62
12th Grade   .002*** (.000)   .001 (.001)      .001 (.001)      46
Note: Delta standard errors in parentheses. AMEs are multiplied by 10. All AMEs are calculated
on the scale of tie probabilities. Indirect AME is the difference in AME between models.
*p < .05. **p < .01. ***p < .001.
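The decomposition into total, direct, and indirect effects can be sketched in a few lines of Python. The sketch assumes that the indirect AME is the difference in AME between the model without and the model with the mediator (as the note to Table 3 states) and that the percent change is that difference expressed relative to the total AME. The input values are hypothetical unrounded AMEs, not the estimates from the school network, and the function name is illustrative.

```python
import math


def decompose_ame(total_ame: float, direct_ame: float) -> tuple[float, float]:
    """Return (indirect AME, percent mediated).

    total_ame:  AME from the model that omits the mediator.
    direct_ame: AME from the model that controls for the mediator.
    """
    if math.isclose(total_ame, 0.0, abs_tol=1e-15):
        raise ValueError("percent mediated is undefined when the total AME is zero")
    indirect_ame = total_ame - direct_ame
    percent_mediated = 100.0 * indirect_ame / total_ame
    return indirect_ame, percent_mediated


# Hypothetical AMEs on the tie-probability scale (assumed values).
total, direct = 0.00030, 0.00015
indirect, percent = decompose_ame(total, direct)
print(f"indirect AME = {indirect:.5f}, percent mediated = {percent:.0f}%")
```

Because the ratio divides by the total AME, very small total effects can produce very large percentages, so the percent change is most informative when read alongside the significance test for the difference in AMEs.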
Similarly, accounting for triadic closure explains a substantial portion of
the effects of being in 9th, 10th, and 11th grade. The percent mediated for
each of these variables ranges from 48 percent to 62 percent (Table 3). This result
suggests that part of the reason that students’ grade is predictive of tie
probabilities is because it affects triangle counts; that is, students in 9th,
10th, and 11th grade tend to be embedded in a greater number of triangles
as compared to students in 7th grade. The AME for Latinos also declines by
65 percent after controlling for GWESP. Because the percent mediated for
sociality AMEs is on average larger than for selective mixing terms, these
results suggest that triangle counts explain a greater share of the effect of
sociality than they do of selective mixing.
A replication of Goodreau et al. (2009) reveals how scaling can alter conclusions
in ERGM applications. While findings regarding the direction and
significance of sociality, triadic closure, and selective mixing effects were supported,
conclusions regarding the substantive impact of these covariates and
those related to indirect effects were problematized. In particular, sociality
terms appear to have small effect sizes when we correct for scaling. We further
found that scaling affected the relative size of homophily coefficients, altering
conclusions regarding the relative impact of some types of selective mixing.
Scaling also led to increases in homophily coefficient size that were not
reflected in the homophily effects, causing homophily coefficients to overstate
the substantive importance of some types of selective mixing, like eighth grade
homophily. Finally, a reanalysis of the differences in effects between models
shows how scaling can alter conclusions about confounding and indirect pathways.
Selective mixing appears to affect triangle counts in only a minority of
cases. Further, sociality terms have greater impact on triadic closure than
selective mixing, which is not reflected in comparisons of naive coefficients.
Discussion
Residual variation can have large consequences for ERGM inference. Coefficients
cannot be compared between models or groups in any
scenario where coefficients are scaled, nor can they be interpreted as
effect sizes. Because the assumption of no omitted variables is difficult to
verify, we can rarely rule out the possibility of residual variation and scaling
in practice. This study outlined these issues and proposed resolutions using
marginal effects. The methods are robust to scaling and can be flexibly
applied across ERGM specifications. These methods were further extended
to develop formal tests of mediation and moderation, which have yet to be
introduced for statistical network analysis. Collectively, the methods provide
a flexible framework for interpreting effect sizes and conducting process
analysis in research using statistical network methods.
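The scaling problem and the robustness of marginal effects can be illustrated with a small simulation. The sketch below uses an ordinary logit (comparable in spirit to a dyad independence model, not a full ERGM) with an unmeasured covariate that is independent of the observed one. All parameter values, the sample size, and the function names are assumptions made for illustration.

```python
import numpy as np


def fit_logit(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Maximum-likelihood logit fit by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)               # score vector
        hess = X.T @ (X * w[:, None])      # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def average_marginal_effect(X: np.ndarray, beta: np.ndarray, col: int) -> float:
    """AME of a 0/1 regressor: mean change in predicted probability when it goes 0 -> 1."""
    X1, X0 = X.copy(), X.copy()
    X1[:, col], X0[:, col] = 1.0, 0.0
    p1 = 1.0 / (1.0 + np.exp(-X1 @ beta))
    p0 = 1.0 / (1.0 + np.exp(-X0 @ beta))
    return float(np.mean(p1 - p0))


rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(0, 2, size=n).astype(float)   # observed binary covariate
u = rng.normal(0.0, 2.0, size=n)               # unmeasured covariate, independent of x
latent = -1.0 + 1.0 * x + u + rng.logistic(size=n)
y = (latent > 0).astype(float)

X_full = np.column_stack([np.ones(n), x, u])   # model that measures u
X_reduced = np.column_stack([np.ones(n), x])   # model that omits u

b_full = fit_logit(X_full, y)
b_reduced = fit_logit(X_reduced, y)
ame_full = average_marginal_effect(X_full, b_full, col=1)
ame_reduced = average_marginal_effect(X_reduced, b_reduced, col=1)

print(f"coefficient on x: full={b_full[1]:.3f}, reduced={b_reduced[1]:.3f}")
print(f"AME of x:         full={ame_full:.4f}, reduced={ame_reduced:.4f}")
```

In this setup the coefficient on the observed covariate shrinks when the independent covariate is omitted, while the two AMEs stay close to each other, which mirrors the argument that marginal effects are a safer basis for comparison than coefficients.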
While the methodological discussion here focused on ERGM, the same
issues can also affect inference in other statistical network models that can be
represented as logit models. Stochastic actor-oriented models, for instance,
use a multinomial logistic regression to model network and behavioral
change (Snijders 2001). Generalizations of ERGM map weighted edge data
to a binary ERGM reference distribution (Desmarais and Cranmer 2012;
Krivitsky 2012), and temporal ERGM reduces to an ERGM with block
structure (Hanneke et al. 2010). Likewise, relational event models are often
estimated as a logistic regression (Butts 2008). Because the proposed meth-
ods rely on postestimation, they can be used to overcome scaling in any of
these models. The methods can therefore be flexibly applied in a variety of
social network research to assess interaction effects and indirect pathways.
They can also be applied in research using frailty ERGM (see Box-
Steffensmeier et al. 2018) to address omitted confounding variables and
rescaling simultaneously.
A further implication of our results is that meta-regressions of statistical
network model output may be affected by scaling. Researchers often use
meta-regression to combine output from multiple ERGMs, where the
meta-coefficient is a weighted or unweighted average of the lower-level
coefficients. The averages of these coefficients can be confounded with
residual variation. Because each lower-level ERGM is identified to a unique
scale ($\tau$ varies between models), it is likely that scaling could alter the size
and potentially the direction of coefficients in ERGM meta-regression. Fur-
ther research should explore the possibility that scaling affects results in
ERGM meta-analyses.
In sum, ERGM results can be affected by scaling, which often arises when
there is residual variation in an empirical model. Because we cannot test for
all possible sources of residual variation (i.e., omitted variables), it is
extremely difficult to rule out the possibility of scaling in practice. A meth-
odological framework was introduced to overcome problems of scaling in
ERGM. Formal tests were also developed to test the equivalence of marginal
effects between models and groups. These methods can be applied to conduct
mediation and moderation analysis in ERGM and related statistical network
models. As such, they introduce a new methodological toolkit that can be
used to assess the significance and effect of interactions and indirect path-
ways in statistical network analysis.
Acknowledgments
I thank David Melamed, Jacob Young, David Schaefer, Skyler Cranmer, and Carter
Butts for helpful comments and conversations at various stages of this project.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or pub-
lication of this article.
ORCID iD
Scott W. Duxbury https://orcid.org/0000-0002-2071-8357
Supplemental Material
The supplemental material for this article is available online.
Notes
1. Indeed, Box-Steffensmeier et al. (2018) develop their frailty extension to expo-
nential random graph model (ERGM) with the specific goal of accounting for
nodal covariates that are rarely measured, such as students’ charisma in school
friendship networks.
2. As will be discussed below, these issues are also present in other statistical
network models, including stochastic actor-oriented models and relational event
models.
3. This is the Geyer–Thompson approach, which uses a Metropolis–Hastings algo-
rithm and is the most widely applied strategy. Another common estimation
algorithm is stochastic approximation (Snijders 2002), which uses smaller ran-
dom Markov chain Monte Carlo (MCMC) samples and more frequent updating.
4. In principle, the threshold of 0 is not necessary for this formulation. Any thresh-
old can be used and scaled to arrive at the same latent variable formulation. A
threshold of 0 is a simple convenience.
5. This is a key difference between scaling and classical omitted confounding
variable bias. When there are omitted confounders in a model, noninteraction
coefficients can yield the incorrect sign, significance, and size. For noninteraction
coefficients, the consequences of scaling are less severe, as z statistics and
the sign of coefficients are unchanged. However, as will be discussed below,
scaling can cause the sign, significance, and magnitude of interaction coefficients
and cross-model differences in coefficients to be incorrect.
6. An equivalent way to conceptualize this is to consider the coefficient for an
interaction $X_{12}$ between dummy variables $X_1$ and $X_2$. The coefficient for $X_{12}$ in a
logit model is not a log-odds ratio, but the ratio of log-odds ratios $\lambda_1/\lambda_2$, where
$\lambda_1$ is the log-odds ratio when $X_2 = 1$ and $\lambda_2$ is the log-odds ratio when $X_2 = 0$
(Ai and Norton 2003). If the amount of scaling is greater in $\lambda_2$ than in $\lambda_1$, $\theta_{12}$
will inflate. If the amount of scaling is greater in $\lambda_1$ than in $\lambda_2$, $\theta_{12}$ will shrink.
Moreover, if $\lambda_2$ is negative while $\lambda_1$ is positive, $\theta_{12}$ will be negative, even
though the interaction effect is positive (e.g., tie probabilities increase
when $X_2 = 1$).
7. All simulation results were replicated using dyad independence models to ensure
that findings were not an artefact of nonstationary MCMC samples. See Online
Appendix B (which can be found at http://smr.sagepub.com/supplemental/) for a
related simulation using p1 models.
8. The “Faux” high school networks are available in the statnet package for R. The
networks were generated by fitting an ERGM to select high school friendship
networks in the AddHealth data set and then simulating a network from the
ERGM. The networks are therefore generated synthetically but replicate the
structural characteristics and friendship patterns of empirical networks.
9. The correlation is computed by computing the GWESP statistic on every dyad in
the network and calculating the correlation with ReceiverGrade for each dyad.
10. While this measure does not yield the true difference in coefficients in most
ERGMs, the product of coefficients is equivalent to the difference in coefficients
when calculated using the latent coefficient ($\beta_1 \alpha_M$) instead of the estimated
coefficient ($\beta_1 \theta_M$). Thus, the two measures are equivalent in this special case.
Note that this generality holds for all logit models (Breen et al. 2013; Mackinnon
et al. 2007), where ERGM is a special case (see also Online Appendix A, which
can be found at http://smr.sagepub.com/supplemental/).
11. While including an endogenous change statistic when the outcome is a tie vari-
able biases the linear estimator, in this case the outcome variable is exogenous
and there are no other variables in the bivariate regression. Thus, much like a
dyad independence model, the model does not violate independence assumptions
necessary for linear regression.
12. The correlation between students’ grade and students’ sex is 0.02.
13. Keeping with convention in sociological research, all methods are introduced
using a micro-level interpretation of tie probabilities.
14. The Delta method uses the gradient of the AMEs to estimate the asymptotic
covariance matrix. The asymptotic covariance matrix for the AMEs is the cross-product
$D_\theta \hat{\Omega}_\theta D_\theta^{T}$, where $D_\theta$ is the gradient of the AMEs and
$\hat{\Omega}_\theta$ is the covariance matrix of the ERGM estimator.
15. Because the AME point estimates are unaffected by scaling and the z statistics
are equivalent to the coefficient z statistics, 95 percent confidence intervals for
the AME should typically include the true AME. To examine this with simula-
tion, we replicated the effect size simulations in section 3.1. Consistent with
expectations, the 95 percent confidence intervals contained the true AME in
94.3 percent of models. Importantly, there was little variation in the coverage
of the confidence intervals as the amount of scaling increased.
16. The Sobel test has been critiqued for providing conservative standard errors in
small sample spaces with fewer than roughly 200 observations (Bollen and Stine
1990; Mackinnon, Lockwood, and Williams 2004). However, because the dyad
sample space is usually quite large for even very small networks (e.g., an undirected
network with 20 nodes has $n(n-1)/2 = 190$ dyads), this is a relatively small
concern in ERGM applications.
17. The original network is directed, but we treat it as undirected to simplify the
simulation.
18. Note that the average of a series of Wald tests follows an asymptotic standard
normal distribution (Dumitrescu and Hurlin 2012).
19. We expect that the discrepancy is, in part, because of changes to the default
behaviors of the ergm software for R, which has undergone substantial updates in
the 10 years since Goodreau et al. (2009) was published, including convergence
criteria for the MC-MLE. This may have had an impact on the likelihood esti-
mator or MCMC sample space in the original study. We had to increase the
number of MCMC samples manually to obtain satisfactory convergence in our
analysis.
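The Delta-method calculation described in note 14 can be sketched as follows. This is a minimal illustration for a plain logit with one binary regressor, not the routine used in the analysis: the coefficient vector, its covariance matrix, and the function names are hypothetical, and the gradient is approximated by central finite differences rather than derived analytically.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def ame_binary(beta, X, col):
    """AME of a 0/1 regressor in column `col`: mean change in predicted probability."""
    X1, X0 = X.copy(), X.copy()
    X1[:, col], X0[:, col] = 1.0, 0.0
    return float(np.mean(sigmoid(X1 @ beta) - sigmoid(X0 @ beta)))


def delta_method_se(beta, cov, X, col, eps=1e-6):
    """Return (standard error, gradient) of the AME.

    The variance is the cross-product D cov D^T, where D is the gradient of
    the AME with respect to the coefficients (central differences here).
    """
    grad = np.zeros(beta.size)
    for k in range(beta.size):
        step = np.zeros(beta.size)
        step[k] = eps
        upper = ame_binary(beta + step, X, col)
        lower = ame_binary(beta - step, X, col)
        grad[k] = (upper - lower) / (2.0 * eps)
    return float(np.sqrt(grad @ cov @ grad)), grad


# Hypothetical estimates (assumed values): intercept and one dummy coefficient.
beta_hat = np.array([-2.0, 0.5])
cov_hat = np.array([[0.010, -0.005],
                    [-0.005, 0.020]])
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 0.0, 1.0])])

ame = ame_binary(beta_hat, X, col=1)
se, grad = delta_method_se(beta_hat, cov_hat, X, col=1)
print(f"AME = {ame:.4f}, Delta-method SE = {se:.4f}, z = {ame / se:.2f}")
```

A numerical gradient keeps the sketch short; an analytic gradient would be preferable when the number of coefficients or dyads is large.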
References
adams, jimi and David R. Schaefer. 2016. “How Initial Prevalence Moderates
Network-based Smoking Change: Estimating Contextual Effects with Stochastic
Actor-based Models.” Journal of Health and Social Behavior 57:22-38.
Agresti, Alan. 2002. Categorical Data Analysis. New York: Wiley.
Ai, Chunrong and Edward C. Norton. 2003. “Interaction Terms in Logit and Probit
Models.” Economics Letters 80:123-29.
Allison, Paul D. 1999. “Comparing Logit and Probit Coefficients across Groups.”
Sociological Methods and Research 28:186-208.
Besag, Julian E. 1972. “Nearest-neighbour Systems and the Auto-logistic Model for
Binary Data.” Journal of the Royal Statistical Society Series B 34:75-83.
Besag, Julian E. 1974. “Spatial Interaction and the Statistical Analysis of Lattice
Systems.” Journal of the Royal Statistical Society Series B 36:192-236.
Bollen, Kenneth A. and Robert Stine. 1990. “Direct and Indirect Effects: Clas-
sical and Bootstrap Estimates of Variability.” Sociological Methodology 20:
115-40.
Box-Steffensmeier, Janet M., Dino P. Christenson, and Jason W. Morgan. 2018.
“Modeling Unobserved Heterogeneity in Social Networks with the Frailty Expo-
nential Random Graph Model.” Political Analysis 26:3-19.
Breen, Richard, Kristian Bernt Karlson, and Anders Holm. 2013. “Total, Direct, and
Indirect Effects in Logit and Probit Models.” Sociological Methods and Research
42:164-91.
Butts, Carter T. 2008. “A Relational Event Framework for Social Action.” Socio-
logical Methodology 38:155-200.
Cramer, Jan Salomon. 2007. “Robustness of Logit Analysis: Analysis for Unobserved
Heterogeneity and Mis-specified Disturbances.” Oxford Bulletin of Economics
and Statistics 69:545-55.
Cranmer, Skyler J., Philip Leifeld, Scott D. McClurg, and Meredith Rolfe. 2017.
“Navigating the Range of Statistical Tools for Inferential Network Analysis.”
American Journal of Political Science 61:237-51.
Desmarais, Bruce A. and Skyler J. Cranmer. 2012. “Statistical Inference for Valued-
edge Networks: The Generalized Exponential Random Graph Model.” PLoS One
7:e30136.
Dumitrescu, Elena Ivona and Christophe Hurlin. 2012. “Testing for Granger Non-causality
in Heterogeneous Panels.” Economic Modelling 29:1450-60.
Duxbury, Scott. 2019. ergMargins: Process Analysis for Exponential Random Graph
Models. Comprehensive R Archive Network.
Erdos, Paul and Alfred Renyi. 1959. “On Random Graphs.” Publicationes Mathema-
ticae 6:290-97.
Frank, Ove and David Strauss. 1986. “Markov Graphs.” Journal of the American
Statistical Association 81:832-42.
Gelman, Andrew and Hal Stern. 2006. “The Difference between “Significant” and
“Not Significant” Is Not Itself Statistically Significant.” The American Statistician
60:328-31.
Geyer, Charles J. and Elizabeth A. Thompson. 1992. “Constrained Monte Carlo
Maximum Likelihood for Dependent Data.” Journal of the Royal Statistical Soci-
ety B 54:657-99.
Goodreau, Steven M., James A. Kitts, and Martina Morris. 2009. “Birds of a Feather,
or Friend of a Friend? Using Exponential Random Graph Models to Investigate
Adolescent Social Networks.” Demography 46:103-25.
Handcock, Mark S., Garry Robins, Tom A. B. Snijders, Jim Moody, and Julian Besag.
2003. “Assessing Degeneracy in Statistical Models of Social Networks.” Journal
of the American Statistical Association 76:33-50.
Hanneke, Steve, Wenjie Fu, and Eric P. Xing. 2010. “Discrete Temporal Models of
Social Networks.” Electronic Journal of Statistics 4:585-605.
Holland, Paul W. and Samuel Leinhardt. 1981. “An Exponential Family of Probabil-
ity Distributions for Directed Graphs.” Journal of the American Statistical Asso-
ciation 76:33-50.
Hosmer, David W. and Stanley Lemeshow. 2000. Applied Logistic Regression. Hobo-
ken, NJ: Wiley-Interscience Publication.
Hunter, David R. 2007. “Curved Exponential Family Models for Social Networks.”
Social Networks 29:216-30.
Karlson, Kristian Bernt, Anders Holm, and Richard Breen. 2012. “Comparing
Regression Coefficients between Same-sample Nested Models Using Logit and
Probit: A New Method.” Sociological Methodology 42:286-313.
Kim, Yeaji, Leonardo Antenangeli, and Justin Kirkland. 2016. “Measurement Error
and Attenuation Bias in Exponential Random Graph Models.” Statistics, Politics,
and Policy 7:29-54.
Koskinen, Johan and Galina Daraganova. 2013. “Dependence Graphs and Sufficient
Statistics.” Pp. 77-90 in Exponential Random Graph Models for Social Networks,
edited by Dean Lusher, Johan Koskinen, and Garry Robins, chapter 7. Cambridge:
Cambridge University Press.
Kreager, Derek A., Jacob T. N. Young, Dana L. Haynie, Martin Bouchard,
David R. Schaefer, and Gary Zajac. 2017. “Where “Old Heads” Prevail:
Inmate Hierarchy in a Men’s Prison Unit.” American Sociological Review
82:685-718.
Krivitsky, Pavel N. 2012. “Exponential-family Random Graph Models for Valued
Networks.” Electronic Journal of Statistics 6:1100-1128.
Lewis, Kevin. 2013. “The Limits of Racial Prejudice.” Proceedings of the National
Academy of Sciences 110:18814-819.
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent
Variables. Thousand Oaks, CA: Sage.
Long, J. Scott and Sarah A. Mustillo. 2018. “Using Predictions and Marginal Effects
to Compare Groups in Regression Models for Binary Outcomes.” Sociological
Methods & Research. doi:10.1177/0049124118799374.
Lusher, Dean, Johan Koskinen, and Garry Robins. 2013. Exponential Random Graph
Models for Social Networks. Cambridge: Cambridge University Press.
Mackinnon, David P. 2008. Introduction to Statistical Mediation Analysis. Abingdon,
UK: Routledge.
Mackinnon, David P., Chondra M. Lockwood, Hendricks Brown, and Wei Wang.
2007. “The Intermediate Endpoint Effect in Logistic and Probit Regression.”
Clinical Trials 4:499-513.
Mackinnon, David P., Chondra M. Lockwood, and Jason Williams. 2004.
“Confidence Limits for the Indirect Effects: Distribution of the Product and
Resampling Methods.” Multivariate Behavioral Research 39:99-128.
Mele, Angelo. 2017. “A Structural Model of Dense Network Formation.” Econome-
trica 85:825-50.
Mize, Trenton, Long Doan, and J. Scott Long. 2019. “A General Framework for
Comparing Predictions and Marginal Effects across Models.” Sociological Meth-
odology 49(1):1-38.
Mood, Carina. 2010. “Logistic Regression: Why We Cannot Do What We Think We
Can Do, and What We Can Do about It.” European Sociological Review 26:67-82.
Newman, Mark E.J. 2010. Networks: An Introduction. Oxford, UK: Oxford Univer-
sity Press.
Papachristos, Andrew V. and Sara Bastomski. 2018. “Connected in Crime: The
Enduring Effect of Neighborhood Networks on the Spatial Patterning of Vio-
lence.” American Journal of Sociology 124:517-68.
Papachristos, Andrew V., David M. Hureau, and Anthony A. Braga. 2013. “The
Corner and the Crew: The Influence of Geography and Social Networks on Gang
Violence.” American Sociological Review 78:417-47.
Schweinberger, Michael. 2020. “Consistent Structure Estimation of Exponential
Family Random Graph Models with Block Structure.” Bernoulli 26:1205-33.
Schweinberger, Michael and Mark S. Handcock. 2015. “Local Dependence in Ran-
dom Graph Models: Characterization, Properties, and Statistical Inference.” Jour-
nal of the Royal Statistical Society, Series B 77:647-76.
Snijders, Tom A. B. 2001. “The Statistical Evaluation of Social Network Dynamics.”
Sociological Methodology 31:361-95.
Snijders, Tom A. B. 2002. “Markov Chain Monte Carlo Estimation of Exponential
Random Graph Models.” Journal of Social Structure 3: 2-37.
Snijders, Tom A.B., Phillipa E. Pattison, Garry L. Robins, and Mark S. Handcock.
2006. “New Specifications for Exponential Random Graph Models.” Sociological
Methodology 36:99-150.
Sobel, Michael E. 1986. “Direct and Indirect Effects in Linear Structural Equation
Models.” Sociological Methods and Research 16:155-76.
Stewart, Jonathan, Michael Schweinberger, Michal Bojanowski, and Martina Morris.
2019. “Multilevel Network Data Facilitate Statistical Inference for Curved
ERGMs with Geometrically Weighted Terms.” Social Networks 59:98-119.
Thiemichen, S., N. Friel, A. Caimo, and G. Kauermann. 2016. “Bayesian Exponential
Random Graph Models with Nodal Random Effects.” Social Networks 46:11-28.
van Duijn, Marijtje A., Tom A. B. Snijders, and Bonne J. H. Zijlstra. 2004. “p2: A
Random Effects Model with Covariates for Directed Graphs.” Statistica Neerlan-
dica 58:234-54.
Wasserman, Stan and Phillipa Pattison. 1996. “Logit Models and Logistic Regres-
sions for Social Networks I: An Introduction to Markov Graphs and p*.” Psycho-
metrika 61:401-25.
Watts, Duncan J. and Steven H. Strogatz. 1998. “Collective Dynamics of ‘Small-
world’ Networks.” Nature 393:440-42.
Weesie, Jeroen. 1999. “Seemingly Unrelated Estimation and Cluster-adjusted Sand-
wich Estimator.” Stata Technical Bulletin 9:231-48.
Wimmer, Andreas and Kevin Lewis. 2010. “Beyond and Below Racial Homophily:
ERG Models of a Friendship Network Documented on Facebook.” American
Journal of Sociology 116:583-642.
Winship, Christopher and Robert D. Mare. 1983. “Structural Equations and Path
Analysis for Discrete Data.” American Journal of Sociology 89:54-110.
Wooldridge, Jeff. 2002. Econometric Analysis of Cross Section and Panel Data.
Cambridge, MA: MIT Press.
Young, Jacob. 2011. “How Do They ‘End Up Together’? A Social Network Analysis
of Self-Control, Homophily, and Adolescent Relationships.” Journal of Quanti-
tative Criminology 27:251-73.
Author Biography
Scott W. Duxbury is an assistant professor of sociology at the University of North
Carolina at Chapel Hill. His research examines network responsiveness to exogenous
shocks, statistical methods for network and panel data, and public influence over the
criminal justice system. His research has appeared in outlets such as American
Sociological Review, Social Forces, Sociological Methods & Research, Criminology,
and Social Networks.
Duxbury 39
... As these methods are constantly evolving, there is also a need to revisit substantive questions on ethnic homophily and its contextual determinants across schools. For example, deriving meaningful effect sizes that have a clear theoretical interpretation and can be compared across models and samples is crucial to answer questions about mechanisms and contextual variation more reliably (Duxbury 2021). ...
... Additionally, in both offerings of both courses men disproportionately undernominate women as strong in the course material as compared to men nominating men (pink dots on both panels of Fig. 5). Although comparing ERGM coefficient values across different-sized networks is ill defined [79], we tentatively observe that this bias from men occurs to a similar extent in every course, with the possible exception of the spring offering of the lab course which has a slightly smaller coefficient estimate for the man → woman variable. ...
Article
Full-text available
Previous work has identified that recognition from others is an important predictor of students’ participation, persistence, and career intentions in physics. However, research has also found a gender bias in peer recognition in which student nominations of strong peers in their physics course disproportionately favor men over women. In this study, we draw on methods from social network analysis and find a consistent gender bias in which men disproportionately undernominate women as strong in their physics course in two offerings of both a lecture course (for science and engineering, but not physics, majors) and a distinct lab course (for science, engineering, and physics majors). We also find in one offering of the lecture course that women disproportionately undernominate men, contrary to what previous research would predict. We expand on prior work by also probing two data sources related to who and what gets recognized in peer recognition: students’ interactions with their peers (who gets recognized) and students’ written explanations of their nominations of strong peers (what gets recognized). Results suggest that the nature of the observed gender bias in peer recognition varies between the instructional contexts of lecture and lab. In the lecture course, the gender bias is related to who gets recognized: both men and women disproportionately overnominate their interaction ties to students of their same gender as strong in the course. In the lab course, the gender bias is also related to what gets recognized: men nominate men more than women because of skills related to interactions, such as being helpful. These findings illuminate the different ways in which students form perceptions of their peers and add nuance to our understanding of the nature of gender bias in peer recognition.
... Although there have been recent advancements aimed towards creating formal tests of mediation in network models (e.g., Duxbury, 2021), these advancements have only just begun to be employed in research contexts. Therefore, while network models provide greater flexibility to enable leadership researchers to incorporate import factors of leadership into network models, their flexibility in considering complex theoretical models remains limited. ...
Article
Full-text available
Lawmakers are routinely confronted by urgent social issues, yet they hold conflicting policy preferences, incentives, and goals that can undermine collaboration. How do lawmakers collaborate on solutions to urgent issues in the presence of conflicts? I argue that by building mutual trust, networks provide a mechanism to overcome the risks conflict imposes on policy collaboration. But, in doing so, network dependence constrains lawmakers’ ability to react to the problems that motivate policy action beyond their immediate connections. I test this argument using machine learning and longitudinal analysis of federal crime legislation cosponsorship networks between 1979 and 2005, a period of rising political elite polarization. Results show that elite polarization increased the effects of reciprocal action and prior collaboration on crime legislation co-sponsorships while suppressing the effect of violent crime rates. These relationships vary only marginally by political party and are pronounced for ratified criminal laws. The findings provide new insights to the role of collaboration networks in the historical development of the carceral state and elucidate how political actors pursue collective policy action on urgent issues in the presence of conflict.
Article
Arms transfers result from economic and political motives, with the latter often dominating the former. While this is accepted knowledge for the post‐World War II period, it seems not to apply earlier. Much existing research argues that in the interwar years, weapons were traded as purely commercial goods because governments had neither the ability nor willingness to control and direct arms transfers. We reassess this idea and argue that, while formal control was largely absent, governments could steer weapons shipments nonetheless because arms producers depended on them as main customers, sales agents, and financiers of their export business. Anecdotal evidence suggests that governments actively used this influence. To test whether interwar arms transfers were the result of political or commercial interests, we use newly collected, historical data on the small arms trade and inferential network analysis methods. Our results suggest that although economic drivers existed throughout the interwar period, political considerations were especially influential when international relations were hostile at the start and end of the period. This research contributes to our understanding of international economic relations between the world wars and of the drivers of arms transfers across time.
Chapter
This chapter gives an overview of the steps to consider when analyzing large networks. These include aspects of data collection and sampling, network effects, methods, and the pitfalls and limitations that arise primarily in the analysis of large networks.
Article
Mediation analysis is increasingly used in the social sciences. Extension to social network data, however, has proved difficult because statistical network models are formulated at a lower level of analysis (the dyad) than many outcomes of interest. This study introduces a general approach for micro-macro mediation analysis in social networks. The author defines the average mediated micro effect (AMME) as the indirect effect of a network selection process on an individual, group, or organizational outcome through its effect on an intervening network variable. The author shows that the AMME can be nonparametrically identified using a wide range of common statistical network and regression modeling strategies under the assumption of conditional independence among multiple mediators. Nonparametric and parametric algorithms are introduced to generically estimate the AMME in a multitude of research designs. The author illustrates the utility of the method with an applied example using cross-sectional National Longitudinal Study of Adolescent to Adult Health data to examine the friendship selection mechanisms that indirectly shape adolescent school performance through their effect on network structure.
Article
The complexity of metropolitan polycentric governance is still challenging scholars and practitioners, who have mostly been engaged in a normative debate in which scant attention has been paid to the coexistence and interdependence of institutional solutions. The ecology of games framework (EGF) can be used to remedy this gap. By incorporating the analysis of institutional variation into EGF propositions about venues' interdependence, this article examines the mechanisms of metropolitan governance configuration resulting from institutional complexity at the inter‐municipal level. Provincial forums, municipal associations, and inter‐municipal agreements are the policy venues studied in the Santiago Metropolitan Region, Chile. Official documents reporting formal agreements in 2017–2021 help to capture the inter‐municipal governance network to which we apply exponential random graph models (ERGMs). The results show the positive effects of mandated provincial venues on inter‐municipal ties and the absence of the effect of self‐organized municipal associations, tendencies that prevail even when incorporating other relevant covariates into the models. These results nourish the EGF debate about interdependencies between coexisting policy venues, emphasizing the role of the different institutional attributes framing the policy venues and the effects of these differences on governance formation.
Article
How do individuals’ network selection decisions create unique network structures? Despite broad sociological interest in the micro-level social interactions that create macro-level network structure, few methods are available to statistically evaluate micro-macro relationships in social networks. This study introduces a general methodological framework for testing the effect of (micro) network selection processes, such as homophily, reciprocity, or preferential attachment, on unique (macro) network structures, such as segregation, clustering, or brokerage. The approach uses estimates from a statistical network model to decompose the contributions of each parameter to a node, subgraph, or global network statistic specified by the researcher. A flexible parametric algorithm is introduced to estimate variances, confidence intervals, and p values. Prior micro-macro network methods can be regarded as special cases of the general framework. Extensions to hypothetical network interventions, joint parameter tests, and longitudinal and multilevel network data are discussed. An example is provided analyzing the micro foundations of political segregation in a crime policy collaboration network.
Article
While economic sociology research and theory argue that excessive network embeddedness depresses competition in illegal markets, prior research does not examine how distinct types of embeddedness may have asymmetric effects on the diversity of purchasing behavior: the range of illegal goods that buyers typically purchase. This study considers how network embeddedness can positively or negatively affect drug purchasing diversity in online drug markets by referring buyers to new vendors or "locking" buyers into recurrent trade for the same products. We analyze novel network data on 16,847 illegal drug exchanges between 7205 actors on one online illegal drug market. Consistent with hypothesized network asymmetry, buyers are more likely to purchase a new type of drug when the transaction is part of an indirect network referral. Although histories of exchange increase the overall frequency of drug purchasing, they are associated with decreases in new drug-type purchases. In the aggregate, these processes either contribute to an integrated market where buyers purchase multiple drugs from multiple vendors (in the case of referrals) or a fragmented market characterized by recurrent trade from the same vendors for the same substances (in the case of repeated trade). We discuss the implications of these findings for research on embeddedness, illegal markets, risky exchange, and drug policy.
Article
Multilevel network data provide two important benefits for ERG modeling. First, they facilitate estimation of the decay parameters in geometrically weighted terms for degree and triad distributions. Estimating decay parameters from a single network is challenging, so in practice they are typically fixed rather than estimated. Multilevel network data overcome that challenge by leveraging replication. Second, such data make it possible to assess out-of-sample performance using traditional cross-validation techniques. We demonstrate these benefits by using a multilevel network sample of classroom networks from Poland. We show that estimating the decay parameters improves in-sample performance of the model and that the out-of-sample performance of our best model is strong, suggesting that our findings can be generalized to the population of interest.
Article
Many research questions involve comparing predictions or effects across multiple models. For example, it may be of interest whether an independent variable’s effect changes after adding variables to a model. Or, it could be important to compare a variable’s effect on different outcomes or across different types of models. When doing this, marginal effects are a useful method for quantifying effects because they are in the natural metric of the dependent variable and they avoid identification problems when comparing regression coefficients across logit and probit models. Despite advances that make it possible to compute marginal effects for almost any model, there is no general method for comparing these effects across models. In this article, the authors provide a general framework for comparing predictions and marginal effects across models using seemingly unrelated estimation to combine estimates from multiple models, which allows tests of the equality of predictions and effects across models. The authors illustrate their method to compare nested models, to compare effects on different dependent or independent variables, to compare results from different samples or groups within one sample, and to assess results from different types of models.
Article
In the study of social processes, the presence of unobserved heterogeneity is a regular concern. It should be particularly worrisome for the statistical analysis of networks, given the complex dependencies that shape network formation combined with the restrictive assumptions of related models. In this paper, we demonstrate the importance of explicitly accounting for unobserved heterogeneity in exponential random graph models (ERGM) with a Monte Carlo analysis and two applications that have played an important role in the networks literature. Overall, these analyses show that failing to account for unobserved heterogeneity can have a significant impact on inferences about network formation. The proposed frailty extension to the ERGM (FERGM) generally outperforms the ERGM in these cases, and does so by relatively large margins. Moreover, our novel multilevel estimation strategy has the advantage of avoiding the problem of degeneration that plagues the standard MCMC-MLE approach.
Article
Research on inmate social order, a once-vibrant area, receded just as U.S. incarceration rates climbed and the country’s carceral contexts dramatically changed. This study returns to inmate society with an abductive mixed-methods investigation of informal status within a contemporary men’s prison unit. We collected narrative and social network data from 133 male inmates housed in a unit of a Pennsylvania medium-security prison. Analyses of inmate narratives suggest that unit “old heads” provide collective goods in the form of mentoring and role modeling that foster a positive and stable peer environment. We test this hypothesis with Exponential Random Graph Models (ERGMs) of peer nomination data. The ERGM results complement the qualitative analysis and suggest that older inmates and inmates who have been on the unit longer are perceived by their peers as powerful and influential. Both analytic strategies point to the maturity of aging and the acquisition of local knowledge as important for attaining informal status in the unit. In summary, this mixed-methods case study extends theoretical insights of classic prison ethnographies, adds quantifiable results capable of future replication, and points to a growing population of older inmates as important for contemporary prison social organization.
Article
We consider the challenging problem of statistical inference for exponential-family random graph models based on a single observation of a random graph with complex dependence. To facilitate statistical inference, we consider random graphs with additional structure in the form of block structure. We have shown elsewhere that when the block structure is known, it facilitates consistency results for $M$-estimators of canonical and curved exponential-family random graph models with complex dependence, such as transitivity. In practice, the block structure is known in some applications (e.g., multilevel networks), but is unknown in others. When the block structure is unknown, the first and foremost question is whether it can be recovered with high probability based on a single observation of a random graph with complex dependence. The main consistency results of the paper show that it is possible to do so under weak dependence and smoothness conditions. These results confirm that exponential-family random graph models with block structure constitute a promising direction of statistical network analysis.
Article
The unequal spatial distribution of crime is an enduring feature of cities. While research suggests that spatial diffusion processes heighten this concentration, the actual mechanisms of diffusion are not well understood as research rarely measures the ways in which people, groups, and behaviors connect neighborhoods. This study considers how a particular behavior, criminal co-offending, creates direct and indirect pathways between neighborhoods. Analyzing administrative records and survey data, the authors find that individual acts of co-offending link together to create a “network of neighborhoods,” facilitating the diffusion of crime over time and across space and, in so doing, create pathways between all Chicago neighborhoods. Statistical analyses demonstrate that these neighborhood networks are (1) stable over time; (2) generated by important structural characteristics, social processes, and endogenous network properties; and (3) a better predictor of the geographic distribution of crime than traditional spatial models.
Article
Methods for group comparisons using predicted probabilities and marginal effects on probabilities are developed for regression models for binary outcomes. Unlike approaches based on the comparison of regression coefficients across groups, the methods we propose are unaffected by the scalar identification of the coefficients and are expressed in the natural metric of the outcome probability. While we develop our approach using binary logit with two groups, we consider how our interpretive framework can be used with a broad class of regression models and can be extended to any number of groups.
Article
Exponential Random Graph Models (ERGMs) are becoming increasingly popular tools for estimating the properties of social networks across the social sciences. While the asymptotic properties of ERGMs are well understood, much less is known about how ERGMs perform in the face of violations of the assumptions that drive those asymptotic properties. Given that empirical social networks rarely meet the strenuous assumptions of the ERGM perfectly, practical researchers are often in the position of knowing their coefficients are imperfect, but not knowing precisely how wrong those coefficients may be. In this research, we examine violations of one such assumption: that social networks are perfectly measured. Using several Monte Carlo simulations, we demonstrate that even randomly distributed measurement errors in networks under study can cause considerable attenuation in coefficients from ERGMs, and do real harm to subsequent hypothesis tests.
Article
This paper proposes an empirical model of network formation, combining strategic and random networks features. Payoffs depend on direct links, but also link externalities. Players meet sequentially at random, myopically updating their links. Under mild assumptions, the network formation process is a potential game and converges to an exponential random graph model (ERGM), generating directed dense networks. I provide new identification results for ERGMs in large networks: if link externalities are nonnegative, the ERGM is asymptotically indistinguishable from an Erdős–Rényi model with independent links. We can identify the parameters only when at least one of the externalities is negative and sufficiently large. However, the standard estimation methods for ERGMs can have exponentially slow convergence, even when the model has asymptotically independent links. I thus estimate parameters using a Bayesian MCMC method. When the parameters are identifiable, I show evidence that the estimation algorithm converges in almost quadratic time.