Running head: NETWORK SIMILARITY
A graph-theory based similarity metric enables comparison of sub-population psychometric
networks
Esther Ulitzsch1, Saurabh Khanna2, Mijke Rhemtulla3, and Benjamin W. Domingue2
1IPN—Leibniz Institute for Science and Mathematics Education
2Stanford Graduate School of Education
3University of California, Davis
Author Note
Correspondence concerning this article should be sent to Esther Ulitzsch,
IPN—Leibniz Institute for Science and Mathematics Education, Educational Measurement,
Olshausenstraße 62, 24118 Kiel, Germany, phone: +49-431-880-1704, email:
ulitzsch@leibniz-ipn.de. Extra materials for this article can be found in the OSF and are
available via the following link: https://osf.io/guxf8/. This work was supported by the
Jacobs Foundation.
Abstract
Network psychometrics leverages pairwise Markov random fields to depict conditional
dependencies among a set of psychological variables as undirected edge-weighted graphs.
Researchers often intend to compare such psychometric networks across sub-populations,
and recent methodological advances provide invariance tests of differences in
sub-population networks. What remains missing, though, is an analogue to an effect size
measure that quantifies differences in psychometric networks. We address this gap by
complementing recent advances for investigating whether psychometric networks differ,
with an intuitive similarity measure quantifying the extent to which networks differ. To
this end, we build on graph-theoretic approaches and propose a similarity measure based
on the Frobenius norm of differences in psychometric networks’ weighted adjacency
matrices. To assess this measure’s utility for quantifying differences between psychometric
networks, we study how it captures differences in sub-population network models implied
by both latent variable models and Gaussian graphical models. We show that a wide array
of network differences translates intuitively into the proposed measure, while the same does
not hold true for customary correlation-based comparisons. In a simulation study on
finite-sample behavior, we show that the proposed measure yields trustworthy results when
population networks differ and sample sizes are sufficiently large, but fails to identify exact
similarity when population networks are the same. From these results, we derive a strong
recommendation to use the measure only as a complement to a significance test for network
similarity. We illustrate potential insights from quantifying psychometric network
similarities through cross-country comparisons of human values networks.
Keywords: network models; group comparisons; graph similarity
1 Introduction
Network psychometrics leverages pairwise Markov random fields to depict conditional
dependencies among a set of psychological variables as undirected edge-weighted graphs
and provides tools for exploring the relationships among the studied observables. In
psychological research, psychometric network models have facilitated a more nuanced
understanding of the interplay of, among others, psychopathological symptoms (e.g. Fried
et al., 2015; Isvoranu et al., 2016; McNally et al., 2015), attitudes and beliefs (Dalege et al.,
2016), or different aspects of health-related quality of life (Kossakowski et al., 2016), and
may pose a viable alternative to latent variable or common cause modeling of psychological
and behavioral data (Borsboom, 2017; Borsboom & Cramer, 2013; Cramer et al., 2010;
Hofmann et al., 2016; McNally et al., 2017), especially when the assumption of a common
cause is not tenable and/or in the absence of strong prior theory on how variables are
related to each other.
A wide range of commonly encountered psychological research questions involves
comparisons across multiple groups. Examples include comparisons between clinical and
non-clinical populations, treatment groups and their control counterparts, groups differing
in their exposure to risk factors, and different cultural groups. When using network models
to address such research questions, researchers are typically interested in whether and to
what extent networks differ across sub-populations. While earlier work conducting such
comparisons predominantly relied on visual inspection of networks (Bringmann et al., 2013;
Koenders et al., 2015; Wigman et al., 2015) or comparisons in terms of some selected
features such as the strength of single edges or centrality indicators of nodes (Birkeland
et al., 2017; Forbes et al., 2021), recent methodological advances provided invariance tests
(Haslbeck, 2022; van Borkulo et al., 2022; Williams et al., 2020) that test for evidence of
differences in sub-population networks and meta-analytic techniques that support
aggregating psychometric networks across different samples (Epskamp et al., 2022). What
is missing, however, is an analogue to an effect size measure that quantifies differences in
psychometric networks. In this study, we aim to fill this gap and complement recently
developed methods for investigating whether psychometric networks differ from each other
with an easily applicable similarity measure for quantifying the extent to which
psychometric networks differ.
To this end, we suggest capitalizing on established graph-theoretical measures for
determining the degree of similarity among graphs, and evaluate whether these may serve
as standardized measures that—in analogy to effect size measures—quantify the overall
difference between psychometric networks. Having such measures at hand opens the path
for, among others, evaluating the fiercely debated replicability of psychometric networks
across different samples (see Forbes et al., 2021; Fried et al., 2018; Jones et al., 2021;
Williams, 2022, for discussions) by gauging differences among original and replicated
networks, quantifying the degree of change an intervention induces in the psychometric
networks of experimental versus control groups, or investigating whether, say, structurally
different clinical samples are as dissimilar to each other as they are to a non-clinical sample.
In what follows, Sections 2 and 3 provide concise overviews of psychometric network
models and previous work on comparisons of sub-population networks. Section 4 then
provides an overview of established graph-theoretical measures for quantifying differences
among graphs. Based on theoretical considerations regarding the applicability of these
measures to psychometric networks, we identify the Frobenius norm of differences in
psychometric networks’ weighted adjacency matrices as a potentially suitable candidate
measure for quantifying psychometric network differences and use it to derive a normalized
similarity measure. To study its utility for quantifying differences between psychometric
networks, Section 5 investigates whether various differences in both population network
models implied by latent variable models and Gaussian graphical models translate
intuitively into the Frobenius norm-based similarity measure. We contrast its behavior
against the correlation between lower triangulars of networks’ weighted adjacency matrices,
which is the current ad hoc (as it has not been formally evaluated) method of choice for
quantifying the similarity between sub-population networks. We show that a wide array of
network differences translates intuitively into the suggested Frobenius norm-based
similarity measure, while the same does not hold true for correlation-based comparisons.
Section 6 studies the Frobenius norm-based similarity measure’s finite-sample behavior. We
show that it yields trustworthy results when population networks differ and sample sizes
are sufficiently large, but fails to identify exact similarity when population networks are the
same. From the simulations’ results, Section 7 then derives initial guidelines for
interpreting the magnitude of the Frobenius norm-based similarity measure. Based on our
investigations of the measure’s finite-sample behavior, we strongly recommend using the
measure only in combination with a significance test for network similarity, and
quantifying the degree of similarity only when the significance test rejects the null
hypothesis that the networks are equal. Section 8 illustrates the insights that can be
gained from quantifying network similarities by conducting cross-country comparisons of
human values networks.
2 Psychometric Network Models
Network psychometrics makes use of pairwise Markov random fields (Murphy, 2012) to
depict conditional dependencies among a set of psychological variables (e.g., symptoms) as
undirected edge-weighted graphs G = (V, E, w) (see Bondy & Murty, 2008; Schulz et al.,
2022). Here, G = (V, E, w) is a tuple (i.e., an immutable ordered sequence) where the set
of nodes V = {v1, v2, ..., vp} denotes the variables of the p-node network model and the
edges E ⊆ [V]² (i.e., the set of all two-element subsets of V) denote the connections among
them. Pairwise Markov random fields encode conditional independence associations, such
that e = {j, k} ∈ E when the variables j and k are not conditionally independent after
conditioning on all other variables. Conversely, e = {j, k} ∉ E indicates that variables j
and k are independent after controlling for all other variables in the network. The weight
function w: E → ℝ assigns to each edge e = {j, k} ∈ E a weight encoding the strength of
conditional dependence between nodes j and k. The weight of an edge e = {j, k} is
denoted by w({j, k}). The structure of G can be represented using its p × p weighted
adjacency matrix A = A(G), with entry a_jk given by

a_jk = w({j, k}) if {j, k} ∈ E, and a_jk = 0 otherwise. (1)
The most common types of Markov random fields used in network
psychometrics are Gaussian graphical models (GGMs; Lauritzen, 1996) for multivariate
normally distributed data, Ising models for binary data, and mixed graphical models for
data containing variables from different distribution families. In the present study, we will
use GGMs to investigate the utility of graph-theoretical measures for quantifying
differences between psychometric networks. In applications of psychometric networks,
continuous data are commonly encountered, and GGMs are the customary method of
choice for their analysis.
In GGMs, edge weights represent non-zero partial correlation coefficients. More
specifically, let y denote a set of p random variables, constituting the nodes of the GGM.
It is assumed that y is centered and follows a multivariate normal distribution,

y ∼ N_p(0, Σ), (2)

with Σ giving the variance-covariance matrix. Partial correlations ω_jk between variables j
and k can be obtained directly from the precision matrix Θ = Σ⁻¹ as

ω_jk = −θ_jk / (√θ_jj · √θ_kk), j ≠ k. (3)

Then, the p × p partial correlation matrix constitutes the weighted adjacency matrix A
of the GGM.
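The mapping from a covariance matrix to a GGM’s weighted adjacency matrix (Equations 2 and 3) can be sketched as follows. The paper’s own implementation is in R; this Python/NumPy translation is our illustrative sketch, and the function name is our own.

```python
import numpy as np

def partial_correlations(sigma):
    """Weighted adjacency matrix of a GGM (Equation 3): partial
    correlations derived from the precision matrix Theta = Sigma^-1."""
    theta = np.linalg.inv(np.asarray(sigma, dtype=float))
    d = np.sqrt(np.diag(theta))
    # omega_jk = -theta_jk / (sqrt(theta_jj) * sqrt(theta_kk))
    omega = -theta / np.outer(d, d)
    np.fill_diagonal(omega, 0.0)  # the network has no self-loops
    return omega

# Three equicorrelated variables (r = .50): each pairwise partial
# correlation, controlling for the third variable, equals r/(1+r) = 1/3.
sigma = np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])
A = partial_correlations(sigma)
```

The resulting matrix is symmetric with a zero diagonal, matching the undirected, loop-free structure of G.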
3 Comparing Psychometric Network Models
Methods for comparing sub-population networks have received increasing attention in
recent years (e.g. Epskamp et al., 2022; Haslbeck, 2022; van Borkulo et al., 2022; Williams
et al., 2020). So far, this rapidly evolving stream of research has mainly been concerned
with developing significance tests focused on identification of evidence for differences across
sub-population networks.
Among these, the Network Comparison Test (NCT; van Borkulo et al., 2022) is the
most widely used and has become a customary tool in applied network psychometrics. The
NCT is a three-step, resampling-based permutation test. First, networks for two groups of
interest are estimated separately, and some test statistic summarizing key structural
differences among them—such as the difference in the networks’ sums of edge weights or
the largest difference in edge weights—is obtained. Second, a reference distribution is
created by pooling the two samples, resampling according to the original sample sizes, and,
for each resample, obtaining the chosen test statistic. Finally, the significance of the
empirical test statistic is assessed by comparing it to the reference distribution. Further
recent methodological advances cover network comparisons across more than two groups
(Haslbeck, 2022), Bayesian network comparisons (Williams et al., 2020), and partial
pooling of networks across multiple samples (Epskamp et al., 2022).
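The NCT’s three-step logic can be sketched as follows. This is a deliberately simplified, hypothetical illustration: it uses unregularized sample partial correlations and the maximum absolute edge-weight difference as the test statistic, whereas the actual NCT estimates regularized networks; all function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def pcor(data):
    """Unregularized sample partial correlations (a simplification)."""
    theta = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(theta))
    omega = -theta / np.outer(d, d)
    np.fill_diagonal(omega, 0.0)
    return omega

def nct_sketch(x, y, n_perm=500):
    # Step 1: estimate both networks and the observed test statistic
    observed = np.max(np.abs(pcor(x) - pcor(y)))
    # Step 2: pool the samples, resample according to the original
    # group sizes, and recompute the statistic for each resample
    pooled, n1 = np.vstack([x, y]), len(x)
    ref = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        ref.append(np.max(np.abs(pcor(pooled[idx[:n1]]) - pcor(pooled[idx[n1:]]))))
    # Step 3: compare the observed statistic to the reference distribution
    p_value = (1 + sum(r >= observed for r in ref)) / (1 + n_perm)
    return observed, p_value
```

Under the null of equal population networks, the permutation p-value is approximately uniform; with large network differences and sufficient sample sizes it approaches its lower bound of 1/(1 + n_perm).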
Naturally, the power of significance tests for comparing psychometric networks
increases with increasing sample size, and even infinitesimal differences between networks
will eventually become significant. Hence, effect size measures that facilitate investigating
whether observed differences in networks are of practical relevance are urgently needed.
Methods for quantifying the extent to which sub-population networks differ, however, have
received far less attention. In applied research, NCTs are often complemented with
(rank-order) correlations of lower triangulars of the network pair’s weighted adjacency
matrices (as in Bereznowski et al., 2021; Burger et al., 2020; Fritz et al., 2018; Maccallum
et al., 2017; Schlegl et al., 2021; Van Loo et al., 2018) or the comparison of specific features
of the network such as node centrality indices (as in Beard et al., 2016; Kossakowski et al.,
2016; Schlegl et al., 2021; Shim et al., 2021). Nevertheless, to date, there are no procedures
accepted as a standard by the psychometric network community for quantifying differences
in sub-population networks, and the suitability of such current practices has not yet been
investigated. Obviously, the comparison of specific features such as node centralities is
sensitive to only some aspects of possible network differences. Likewise, Brusco (2004),
Brusco and Cradit (2005), and Hubert (1978) pointed out that correlation-based
comparisons are able to capture only the presence of linear association in weighted
adjacency matrices, but might overlook other types of structural similarities. In fact, as
will be illustrated below, current correlation-based practice for quantifying differences in
psychometric networks may yield counter-intuitive conclusions or fail to exhibit sensitivity
to specific network differences. To fill this gap, we suggest borrowing from
graph-theoretical literature, where determining the similarity (or distance) between graphs
is a well-known and well-studied problem.
4 Bringing Graph-Theoretical Distance Measures Into Psychometric Network
Modeling
Graph-theoretical distance measures aim to compress differences between graphs into a
real-valued measure that converges to zero as pairs of graphs approach isomorphism, i.e., a
one-to-one correspondence mapping the graphs’ node, edge, and, if applicable, edge-weight
sets onto each other, with zero indicating exact similarity. Figure 1 provides an illustration of isomorphic and
non-isomorphic pairs of unweighted, undirected graphs. In the isomorphic pair, the two
depicted graphs exhibit exact similarity in that there exists a one-to-one mapping of the
vertices of graph 1 to those of graph 2 such that there is an edge between vertices jand k
in graph 1 if and only if there is an edge between the corresponding vertices in graph 2.
For the non-isomorphic pair, no such mapping exists, meaning that the two graphs do not
exhibit exact similarity, and researchers may want to employ graph-theoretical distance
measures to quantify how far the pair of graphs is from isomorphism.
Graph-theoretical distance measures find application in a broad range of fields
where the to-be-studied objects can be expressed as graphs, ranging from neuroscience,
e.g., for quantifying differences in brain connectivity among different subgroups (Sporns
et al., 2004), through computer vision for quantifying differences in images (Wilson & Zhu,
2008) and social network analysis, e.g., to study differences in flow of information (Koutra
et al., 2013), to communication science, e.g., to study re-tweet patterns (Bovet & Makse,
2019; Van Vliet et al., 2021).
Figure 1. Examples of isomorphic and non-isomorphic pairs of graphs: (a) an isomorphic
pair of graphs; (b) a non-isomorphic pair of graphs. In the isomorphic pair, the bijection
b → f, c → g, a → h, and d → e maps the vertices of graph 1 to the vertices of graph 2
such that any two vertices j and k of graph 1 are adjacent in graph 1 if and only if f(j)
and f(k) are adjacent in graph 2. For the non-isomorphic pair, no such mapping exists.
A great variety of measures exists (see Tantardini et al., 2019; Wills & Meyer, 2020,
for an overview). The choice of measure should be guided by the properties of the studied
graphs. Tantardini et al. (2019) provided a three-dimensional taxonomy informing this
choice. First, measures can be distinguished based on whether they are dependent on node
correspondence. In the case of known node correspondence, the two graphs in question
have the same node set, and the pairwise correspondence between nodes is known (i.e., no
mapping is needed). Second, measures can be distinguished based on whether they are
applicable to unweighted graphs only or generalize to weighted graphs. Third, measures
can be distinguished based on whether they are applicable to undirected graphs only or
generalize to directed graphs.
Based on this taxonomy, graph-theoretical measures used to compare psychometric
networks should fulfill the following criteria. First, sub-population comparisons of
psychometric networks are commonly performed on the same set of variables. Hence, a
known node correspondence method can be used. Second, as psychometric networks are
always weighted, measures used for their comparisons should generalize to weighted graphs.
Third, while most psychometric networks are undirected, directed networks exist in the
context of longitudinal and time-series data. These so-called temporal networks are
directed networks of regression coefficients that depict the lagged associations between
variables (e.g., affective states) from one measurement point to the next (see Epskamp,
2020; Epskamp, van Borkulo, et al., 2018; Epskamp, Waldorp, et al., 2018). Thus,
measures should ideally generalize to directed graphs.
Norm-based graph distance measures satisfy these criteria (Tantardini et al., 2019).
These distance measures are based on the difference of the p × p weighted adjacency
matrices A_n and A_m of two graphs. To compress the structure of the obtained matrix
A_n − A_m into a single measure, any matrix norm can be used. The standard choice
(Wills & Meyer, 2020) is the (normalized) Frobenius norm:

d_F(n, m) = (1/√(p/2)) ‖A_n − A_m‖_F = (1/√(p/2)) √( Σ_{j=1}^{p} Σ_{k=1}^{p} |a^n_jk − a^m_jk|² ). (4)

The constant 1/√(p/2) normalizes the Frobenius norm by the size of the to-be-compared
weighted adjacency matrices, thereby ensuring that values of d_F(n, m) are comparable
across pairs of networks with different numbers of nodes.
Besides accommodating the characteristics of psychometric networks, the Frobenius
norm of the difference between two adjacency matrices comes with two further major
advantages over other commonly used graph-theoretical distance measures. First, the
Frobenius norm of the difference between two adjacency matrices is a proper metric in the
mathematical sense (Wills & Meyer, 2020).¹ The second major advantage lies in its
simplicity (Martinez & Chavez, 2018). This is an important property, as more complex
measures, which have been developed for networks capturing flows or distances, may not
be applicable to the context of network psychometrics.
Note that the distance measure d_F(n, m) is unbounded on [0, ∞), with 0 indicating
exact similarity. To facilitate interpretability, following Koutra et al. (2013), we use the
distance-to-similarity transformation

s_F(n, m) = 1 / (1 + d_F(n, m)). (5)

This transformation ensures that s_F(n, m) is bounded within the interval (0, 1]. Note
that the transformation yields a similarity measure instead of a distance measure: for
exact similarity, the Frobenius norm-based similarity measure s_F(n, m) evaluates to 1,
not 0. Since d_F(n, m) is a proper metric satisfying the identity of indiscernibles,
s_F(n, m) equals 1 if and only if the compared networks are isomorphic. An example
calculation of s_F is provided in Appendix A. An R function for obtaining the suggested
measure is provided in Appendix B.
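Equations 4 and 5 are straightforward to implement. The paper provides an R function in Appendix B; the following Python/NumPy sketch is our own equivalent, with hypothetical function names.

```python
import numpy as np

def d_frobenius(a_n, a_m):
    """Normalized Frobenius distance between two p x p weighted
    adjacency matrices (Equation 4)."""
    p = a_n.shape[0]
    return np.linalg.norm(a_n - a_m, ord="fro") / np.sqrt(p / 2)

def s_frobenius(a_n, a_m):
    """Distance-to-similarity transformation (Equation 5):
    bounded in (0, 1], with 1 indicating isomorphic networks."""
    return 1.0 / (1.0 + d_frobenius(a_n, a_m))
```

Identical networks yield s_F = 1, and the measure is symmetric in its two arguments because d_F is a metric.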
5 Illustrations and Experiments with Population Networks
For illustrating and evaluating the proposed Frobenius norm-based measure s_F, we
investigated whether and how it captures various differences in psychometric networks. To
¹That is, it satisfies a) identity of indiscernibles (i.e., d_F(n, m) = 0 iff A_n = A_m),
b) symmetry (i.e., d_F(n, m) = d_F(m, n)), and c) the triangle inequality (i.e.,
d_F(n, m) ≤ d_F(n, l) + d_F(l, m)) (Van Loan & Golub, 1996, p. 56).
this end, we investigated differences between population networks (i.e., we considered true
partial correlation matrices) in two studies.
In Study I, we assessed how s_F captures differences in networks implied by latent
variable models. Psychometric networks can fairly well capture patterns implied by latent
variable models (Golino & Epskamp, 2017; van der Maas et al., 2006). Due to their
ubiquity and long-standing application in psychological research, we believe that there is a
common understanding in the research community of what constitutes a mild or severe
difference among latent variable models. In fact, many studies in the context of
confirmatory factor analysis leverage this common understanding when investigating
misspecifications, e.g., for deriving fit index cut-offs (see Garnier-Villarreal & Jorgensen,
2020; McNeish & Wolf, 2021, for recent examples). We aimed to use common intuition
about latent variable models and the common understanding of the severity of differences
among them to generate insights into the resultant functioning of s_F. The hope in doing
so is that, when inspecting s_F as a function of the varying extent of differences between
sub-population confirmatory factor analysis models, readers can draw on this
understanding to build initial intuition on values of s_F that indicate mild or severe
differences. We considered two commonly studied scenarios of group differences in latent
variable models: a) differences in latent correlations and b) the absence versus presence of
cross-loadings.
In Study II, we investigated how s_F quantifies differences in GGMs. We aimed to
cover a broad range of quantitative (in the sense that networks share the same set of edges,
but differ in edge weights) and qualitative differences (in the sense that networks differ in
structure), considering both network pairs differing in a single edge or edge weight only
and differences spread out across the whole network. To this end, network pairs were
created by first generating network 1 and then deriving network 2 from network 1 through
various types of manipulations. We considered three sets of scenarios. First, we
investigated manipulations of varying severeness of network 1’s strongest edge weight.
Second, we gradually decreased similarity between network 1 and network 2 by increasing
the proportion of network 1’s edge weights affected by manipulations of varying severeness.
Third, we studied how micro-manipulations of edge weights spreading across the whole
network are captured by s_F.
Whenever possible, we also benchmarked our metric’s performance against that of the
correlation between vectorized strictly lower triangulars of the networks’ weighted
adjacency matrices, denoted by s_cor, which is the current method of choice for
determining the similarity between sub-population networks. Note that values of s_F and
s_cor are not directly comparable, as they are located on different scales.
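For completeness, the correlation-based benchmark s_cor can be sketched as follows (our own Python illustration; the paper’s analyses were run in R):

```python
import numpy as np

def s_cor(a_n, a_m):
    """Pearson correlation between the vectorized strictly lower
    triangulars of two weighted adjacency matrices."""
    rows, cols = np.tril_indices(a_n.shape[0], k=-1)
    return np.corrcoef(a_n[rows, cols], a_m[rows, cols])[0, 1]
```

Note that np.corrcoef returns nan when one vector of edge weights has zero variance, which mirrors the situation reported below for networks whose edge weights are all equal.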
R code for our investigations is provided in the OSF repository accompanying this
study.
5.1 Study I: Comparing Networks Implied by Latent Variable Models
5.1.1 Method. For generating population networks, we obtained partial
correlation matrices from model-implied variance-covariance matrices of confirmatory
factor analysis models. That is, for the observed variables y of the networks we assumed

y = ν + Λη + ε, (6)

where ν is a p × 1 vector containing intercepts, Λ is a p × l matrix of factor loadings, η is
an l × 1 vector of latent factors, and ε denotes the vector of multivariate normally
distributed residuals with zero mean vector and covariance matrix Ψ (Bollen, 1989). For
all models, we generated uncorrelated residuals. All intercepts were set to zero, assuming
that the mean structure is saturated and completely reflected in the intercepts, that is,
E(y) = ν = 0. The model-implied covariance structure Σ of the observed variables is
given by

Σ = ΛΦΛᵀ + Ψ, (7)

where Φ is the l × l covariance matrix of the latent factors. From the inverse of the
model-implied covariance matrix, we constructed the partial correlation matrix according
to Equation 3.
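The construction described above can be sketched end-to-end. The following Python sketch (our own translation; the paper’s code is in R) builds the partial correlation network implied by a two-factor model with unit-variance indicators; the function name and default values are illustrative.

```python
import numpy as np

def cfa_implied_network(p=10, rho=0.50, lam=0.70):
    """Partial correlation network implied by a two-factor CFA model
    (Equations 6, 7, and 3), with unit-variance indicators."""
    Lambda = np.zeros((p, 2))
    Lambda[: p // 2, 0] = lam          # first half loads on factor 1
    Lambda[p // 2 :, 1] = lam          # second half loads on factor 2
    Phi = np.array([[1.0, rho], [rho, 1.0]])
    Psi = (1.0 - lam**2) * np.eye(p)   # residual variances so Var(y) = 1
    Sigma = Lambda @ Phi @ Lambda.T + Psi   # Equation 7
    Theta = np.linalg.inv(Sigma)
    d = np.sqrt(np.diag(Theta))
    A = -Theta / np.outer(d, d)             # Equation 3
    np.fill_diagonal(A, 0.0)
    return A
```

With ρ = 1 all edge weights coincide, whereas ρ < 1 yields stronger within-factor than between-factor edges, as illustrated in the exemplar networks below.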
Differences in latent correlations. We generated pairs of networks implied by
two-dimensional confirmatory factor analysis models with different latent correlations ρ
(see Figure 2). For all models, we generated the observed variables y with unit variance
and set all non-zero factor loadings to λ = 0.70. We assumed half of the observed variables
to load on the first factor and the other half on the second factor. All generated pairs of
networks comprised the special case of ρ = 1 (network 1), implying the same covariance
matrix (and, consequently, the same network) as a unidimensional model, and a network
implied by a model with ρ ≠ 1 (network 2). We varied the number of nodes (p = 10;
p = 20) as well as the correlation between the two latent factors of network 2, evaluating
the boundary values of −.99 and .99 as well as the sequence from .95 to −.95 in steps of
.05. Doing so gradually increases differences in networks derived from models with ρ = 1
and ρ ≠ 1, with ρ = −.99 inducing the largest difference.
Figure 2. Confirmatory factor analysis models for two groups with different latent
correlations (group 1: ρ = 1; group 2: ρ ≠ 1).
To help intuition, Figure 3 illustrates exemplar networks implied by models with
ρ = 1 (Figure 3a), ρ = .50 (Figure 3b), and ρ = −.90 (Figure 3c). As was to be expected,
the network with ρ = .50 exhibited greater proximity to the network with ρ = 1 than the
network with ρ = −.90. The model with ρ = 1 implied all partial correlations (i.e., edge
weights) to be equal (.10 in the present case). When ρ ≠ 1, edge strengths between nodes
belonging to the same factor were stronger (.19 for ρ = .50 and .14 for ρ = −.90) compared
to those of the model with perfectly correlated latent factors; and edge weights between
nodes belonging to different factors were considerably weaker when ρ = .50 (.02) and
negative when ρ = −.90 (−.07), mirroring the structure of the respective confirmatory
factor analysis models.
Note that in this set of comparisons, s_cor could not be obtained due to the lack of
variability in edge weights of the ρ = 1 network. Hence, networks were only compared
using s_F.
Figure 3. Psychometric networks implied by confirmatory factor analysis models with
different correlations ρ among latent factors: a) ρ = 1, b) ρ = .50, c) ρ = −.90. Line
thickness represents the absolute size of the edge weights. Red and blue denote negative
and positive edge weights, respectively.
Differences in cross-loadings. For evaluating how s_F and the
correlation-based benchmark s_cor capture differences in loading patterns, we compared
networks implied by two-dimensional models with the same latent correlation (ρ = −.50;
ρ = 0; ρ = .50), but differing in whether (network 2) or not (network 1) there were
cross-loadings (see Figure 4). For network 1, we generated the observed variables y with
unit variance and set all non-zero factor loadings to λ = 0.50. For network 2, we added
cross-loadings to this model, while keeping main factor loadings and error variances
constant across both models. We varied the number of cross-loadings (1; 2) as well as
their size λ, evaluating the sequence λ = 0.05 to λ = 0.50 in steps of 0.05. We, again,
varied the number of nodes (p = 10; p = 20).
Figure 4. Confirmatory factor analysis models for two groups with different loading
patterns.
Figure 5 depicts exemplar network pairs implied by two-dimensional models with
correlations between latent factors of −.50 (Figures 5a and 5b) and .50 (Figures 5c and
5d). Networks b) and d) are implied by confirmatory factor analysis models with high
cross-loadings (λ = 0.50) for variables y6 and y7.
The most pronounced differences between networks implied by models with and
without cross-loadings were observable for weights of edges connecting variables y6 and y7
to variables from different primary factors (i.e., y1 to y5). In the network pair with
negatively correlated latent factors, weights for these edges were negative (−.03) when no
cross-loadings were present but positive (.02) for the network obtained from the model
with cross-loadings. In the network pair with positively correlated latent factors, the
introduction of cross-loadings resulted in higher weights for these edges (.07 as compared
to .03).
5.1.2 Results.
Differences in latent correlations. Figure 6 displays sFas a function of the
correlation between latent factors in the network with ρ6= 1 and the number of nodes p.
Recall that all comparisons involved p-node network pairs with ρ= 1 for network 1 and
ρ6= 1 for network 2. As expected, sFincreases to unity as ρ1. Declines in sFassociated
with declines in ρwere more pronounced for smaller networks (e.g., for ρ=.90,sFwas
.65 for p= 10, but .74 for p= 20), and steepness of changes in sFwas greater the closer ρ
NETWORK SIMILARITY 17
Figure 5. Psychometric networks implied by different latent variable models. a)
two-dimensional model without cross-loadings with a correlation between latent factors of
-.50, b) two-dimensional model with high cross-loadings of variables y6 and y7 with a
correlation between latent factors of -.50, c) two-dimensional model without cross-loadings
with a correlation between latent factors of .50, d) two-dimensional model with high
cross-loadings of variables y6 and y7 with a correlation between latent factors of .50. Line
thickness represents the absolute size of the edge weights. Red and blue denote negative
and positive edge weights, respectively.
in network 2 was to the boundaries of -.99 and .99. Across all conditions, sF ranged from
.62 (p = 10, ρ = -.99) to .98 (p = 10, ρ = .99).
Differences in cross-loadings. Results for sF and the scor baseline are given in
Figure 7. The behavior of both measures is displayed as a function of the number of nodes,
the correlation between latent factors, the number of cross-loadings, and λ*. sF captured
differences in loading patterns in that it decreased steadily as a function of the size of λ*
and tended to be smaller for a higher number of cross-loadings and a smaller number of
nodes. For a single small cross-loading (λ* = 0.05), sF was close to one (i.e., .99),
indicating almost perfect similarity between the networks. Two high cross-loadings
(λ* = 0.50), in contrast, resulted in sF around .90 for p = 10 and .93 for p = 20.
Figure 6. Similarity in terms of the Frobenius norm-based measure sF between two
network models implied by two-dimensional confirmatory factor analysis models with
different correlations between latent factors. The correlation between factors was set to 1
for network 1 and varied for network 2. ρ gives the correlation for network 2; p denotes the
number of nodes.
The alternative scor metric also captured differences in the number and size of the
cross-loadings. However, the magnitude of scor was highly contingent on correlations
between latent factors, and rapidly decreased for high positive correlations. For instance,
while two high cross-loadings (λ* = 0.50) in a network with p = 10 resulted in scor = 0.95
for ρ = -.50, scor was as low as .88 when ρ = .50. These sensitivities are counter-intuitive
as the correlation between latent factors was kept constant within network pairs, and can
be attributed to decreased variability in the edge weights of the network without
cross-loadings when ρ had large, positive values (compare Figures 5a and 5c). sF, in
contrast, did not exhibit such sensitivities to the correlation between latent factors.
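For reference, the correlation-based baseline can be sketched in a few lines. The sketch assumes that scor is the Pearson correlation between the strictly lower-triangular entries of the two weighted adjacency matrices, consistent with the variability argument used when interpreting scor in this paper.

```python
import numpy as np

def s_cor(A1, A2):
    """Correlation-based comparison: Pearson correlation of the strictly
    lower-triangular edge weights (assumed definition of s_cor)."""
    idx = np.tril_indices_from(A1, k=-1)
    return np.corrcoef(A1[idx], A2[idx])[0, 1]

# a uniformly rescaled copy of a network is invisible to a correlation
rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.15, (10, 10))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)
print(round(s_cor(A, 0.8 * A), 6))   # → 1.0
```

This scale-invariance is exactly the limitation that surfaces again in the micro-difference scenarios below, where dampening all edge weights leaves scor at 1.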
5.2 Study II: Comparing Gaussian Graphical Models
5.2.1 Method. For investigating the capability of sF and the scor baseline to
capture differences in networks implied by GGMs, we first generated population networks
according to the procedure outlined in Wysocki and Rhemtulla (2021), varying the number
of nodes (p = 10; p = 20) and network density (i.e., the proportion of possible edges in the
Figure 7. Similarity in terms of the Frobenius norm-based measure sF and
correlation-based comparisons scor of network models implied by confirmatory factor
analysis models with different loading patterns. No cross-loadings were added for network
1. The size (0.50λ*) and number of cross-loadings were varied for network 2. ρ gives the
correlation between factors for both networks; p denotes the number of nodes.
network actually present, regardless of the edges’ weights), considering values from .10 to
.90 in increments of .20. For each of the 2×5 = 10 conditions, 100 population network
pairs were generated.
We started by generating population network 1. To this end, for each of the 100
replications, we generated a random weighted adjacency matrix—constituting network
1—using the R package BDgraph (Mohammadi & Wit, 2015b). For creating the graph
structure, BDgraph randomly samples edges from a binomial distribution according to the
specified network density. Then, using a G-Wishart distribution WG(df, Ip), a
precision matrix Θ is generated according to the graph structure. Here, Ip represents a
p × p identity matrix and df gives the degrees of freedom of the G-Wishart distribution
(Mohammadi & Wit, 2015a).
We obtained edge weights from the simulated precision matrix according to
Equation 3. The degrees of freedom of the G-Wishart distribution determine the degree of
shrinkage of the precision matrix towards the identity matrix (Hsu et al., 2012), and thus
control the size of the edge weights. Note that the same degrees of freedom of the
G-Wishart distribution will result in different partial correlations for different network
densities and sizes. To achieve comparable distributions of edge weights across all networks,
we set the degrees of freedom to correspond to a targeted range of partial correlation values
of -.35 to .35 for each combination of network density and size. For all considered network
densities and sizes, we obtained edge weights distributed normally around zero, with a
standard deviation of around .15.
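The generation procedure itself used BDgraph in R. As a rough Python stand-in (Bernoulli edge sampling at the target density and a diagonally dominant precision matrix instead of a G-Wishart draw, both simplifying assumptions of this sketch), the edge-weight construction looks like:

```python
import numpy as np

def random_ggm(p, density, weight_sd=0.15, seed=0):
    """Simplified stand-in for the BDgraph procedure: sample edges from a
    Bernoulli(density) distribution, fill a precision matrix, force
    positive definiteness via diagonal dominance, and standardize to
    partial correlations (the paper's Equation 3)."""
    rng = np.random.default_rng(seed)
    K = np.zeros((p, p))
    iu = np.triu_indices(p, k=1)
    n_pairs = len(iu[0])
    present = rng.random(n_pairs) < density
    K[iu] = np.where(present, rng.normal(0.0, weight_sd, n_pairs), 0.0)
    K = K + K.T
    np.fill_diagonal(K, np.abs(K).sum(axis=1) + 1.0)   # ensure PD
    d = np.sqrt(np.diag(K))
    W = -K / np.outer(d, d)                            # partial correlations
    np.fill_diagonal(W, 0.0)
    return W

W = random_ggm(p=10, density=0.5)
```

Unlike a G-Wishart draw, the diagonal-dominance trick shrinks the partial correlations somewhat, so it reproduces the shape of the procedure but not its exact edge-weight distribution.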
Finally, for obtaining network 2, the generated network 1 was duplicated and the
duplicate’s edges and/or edge weights were manipulated.² We considered different types of
manipulations, described in greater detail below. On the rare occasions that the
manipulations yielded a non-positive definite correlation matrix, a new network 1 was
generated. For each of the 100 replications, we computed similarities between network 1
and network 2 in terms of sF and scor. In the reporting of results, we focus on the median
of these quantities across the 100 replications.
Networks differing in single edge weights. In our first set of scenarios, we
considered differences in single edge weights. To this end, in the network 2 duplicate, we
manipulated the edge weight with the highest absolute value, either a) halving it (i.e.,
multiplying it by 0.50), b) removing the corresponding edge, or c) switching its sign (i.e.,
multiplying it by -1).
Networks differing in multiple edge weights. In our second set of scenarios,
we investigated differences in multiple edge weights. To this end, for manipulating the
network 2 duplicate, we randomly selected a subset of the edges, considering proportions
² Recall that comparisons of sub-population networks are commonly performed on networks with the same
node set and that both sF and scor assume known node correspondence. Hence, all induced differences
concerned edges and edge weights only.
from .10 to .90 in increments of .10, and either a) halved the selected edges’ weights, b)
removed the respective edges, or c) switched the signs of their weights.
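Both the single- and multiple-edge scenarios can be sketched with one numpy helper (the function and argument names below are hypothetical, not from the paper's code):

```python
import numpy as np

def manipulate(A, proportion, kind, seed=0):
    """Derive network 2 from a duplicate of network 1 by halving,
    removing, or sign-switching a random subset of the present edges."""
    rng = np.random.default_rng(seed)
    B = A.copy()
    rows, cols = np.triu_indices_from(A, k=1)
    present = np.flatnonzero(A[rows, cols] != 0)
    n = int(round(proportion * len(present)))
    chosen = rng.choice(present, size=n, replace=False)
    factor = {"half": 0.5, "remove": 0.0, "sign_switch": -1.0}[kind]
    r, c = rows[chosen], cols[chosen]
    B[r, c] *= factor
    B[c, r] *= factor        # keep the adjacency matrix symmetric
    return B

A = np.array([[0.0, 0.2, -0.1],
              [0.2, 0.0, 0.3],
              [-0.1, 0.3, 0.0]])
B = manipulate(A, proportion=1.0, kind="sign_switch")   # flips every edge sign
```

Setting `proportion` so that exactly one edge is selected recovers the single-edge scenarios of the previous subsection.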
Note that scenario a) and, even more so, scenario b) created network pairs strongly
differing in their connectivity (i.e., the weighted absolute sum of all edges in the network;
van Borkulo et al., 2022). In scenario a), network pairs only differed in connectivity, while
the overall structure of the network remained intact. Differences in connectivity were
particularly pronounced when network 1 was dense and a large proportion of the edge
weights of network 2 was affected by the manipulation. For instance, for scenario b), under
conditions with a density of .70, network 1 exhibited an average connectivity value twice as
high as the connectivity of network 2 when 50% of the original edge weights were removed,
and ten times as high when 90% of the original edge weights were removed. For scenario
a), under conditions with a density of .70, the average connectivity of network 1 was 1.3
times higher than the connectivity of network 2 when 50% of the original edge weights were
halved, and 1.8 times higher when 90% of the original edge weights were halved. Because
differences in network connectivity are a common property of interest, and, especially in
clinical applications, changes in connectivity over time as well as differences between
clinical and non-clinical samples are frequently studied (e.g., Beard et al.,
2016; Kim et al., 2014), it is important to ensure that similarity measures are sensitive to
differences in connectivity.
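Connectivity, as defined above, is computed directly from the weighted adjacency matrix. A minimal sketch, which also reproduces the 1.25 ratio obtained when every edge weight is dampened by 0.80:

```python
import numpy as np

def connectivity(A):
    """Weighted absolute sum of all edges (cf. van Borkulo et al., 2022),
    counting each undirected edge once via the upper triangle."""
    iu = np.triu_indices_from(A, k=1)
    return np.abs(A[iu]).sum()

A = np.array([[0.0, 0.2, -0.1],
              [0.2, 0.0, 0.3],
              [-0.1, 0.3, 0.0]])

# global dampening by 0.80 rescales connectivity by exactly 1/0.8
ratio = connectivity(A) / connectivity(0.8 * A)
print(round(ratio, 4))   # → 1.25
```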
Micro-differences spread across the networks. In our third set of scenarios,
we investigated micro-differences in edge weights that were spread out across the
whole network. To this end, for manipulating the network 2 duplicate, we either slightly
dampened all edge weights by multiplying them by 0.80, or “jittered” them by making half
of them slightly larger (multiplying them by 1.25) and half of them slightly smaller in size
(multiplying them by 0.80).
Note that dampening edge weights led to differences in average connectivity, with
the connectivity of network 1 being 1.25 times higher than the connectivity of network 2
across all conditions.
5.2.2 Results.
Networks differing in single edge weights. Results for our manipulations of
single edge weights as a function of the number of nodes and network 1 density are given in
Figure 8. Both measures exhibited intuitive behavior in that they decreased with
increasing severity of the manipulation, i.e., given network size and density, both measures
were smallest when the strongest edge weight was multiplied by -1 and largest when the
strongest edge weight was multiplied by 0.50.
Dampening the strongest edge weight (i.e., multiplying it by 0.50) led to values for
sF and scor that were fairly constant as a function of density, but differed for
networks of different size. The dependency on network size was to be expected, as a
single-edge difference constitutes a more severe difference in a smaller network than in a larger
one. sF was around .89 for p = 10 and .92 for p = 20 across all density levels.
Correlation-based comparisons, in contrast, were not markedly sensitive to single edges
being dampened, yielding values around .97 and .99 for p = 10 and p = 20, respectively.
Both measures were sensitive to the edge with the strongest weight being missing.
Patterns of sensitivity, however, differed. Again, sF remained fairly constant as a function
of density and exhibited sensitivity to network size, yielding values around .79 for p = 10
and .85 for p = 20 across all density levels. Correlation-based comparisons, in contrast,
were highly sensitive to the networks’ densities. These effects were stronger in smaller as
compared to larger networks. For instance, while scor was .93 for p = 10 and a density of
.90, it was as low as .68 for a density of .10. This is due to the fact that, given the increased
number of zero entries, values in the strictly lower triangular part are less variable in networks
with lower density. As such, comparable sensitivities of scor can be expected for dense
networks in which both networks possess low rather than high variability in their edge
weights.
These patterns were further exacerbated for sign switches of the strongest edge
weight. Values of scor were highly sensitive to density and rapidly increased with increasing
density levels. For instance, scor was .73 for p = 10 and a density of .90, but as low as -.09
for a density of .10. This is remarkable, as researchers may falsely conclude from such a
low correlation that two networks that share the same (small) set of edges but exhibit a
strong difference in a single edge weight are highly dissimilar. Values for sF, in contrast,
were relatively insensitive to density, yielding values around .66 for p = 10 and .74 for
p = 20 across all density levels.
Figure 8. Similarity in terms of the Frobenius norm-based measure sF and
correlation-based comparisons scor between network models differing in single edge weights
across 100 simulated pairs of networks with different size and density. Lines are smoothed
to accommodate small differences in average ranges of network 1’s edge weights across
different conditions. p denotes the number of nodes. Note that the y-axis is truncated at
0.30 for comparability across experiments, but values for scor fall below 0.30.
Networks differing in multiple edge weights. Results for our manipulations
of multiple edge weights as a function of network size and network 1 density as well as the
proportion of edge weights affected by our manipulations are displayed in Figure 9. For
simplicity, only results for densities of .30 and .90 are displayed. Again, both measures
behaved intuitively in that they decreased with increasing severity of the manipulation
in terms of both the type of edge weight manipulation and the proportion of affected edge
weights. Nevertheless, mirroring the results for manipulations of single edge weights, scor was
almost insensitive to edge weights being halved, ranging from .94 to .98 across all
conditions, as compared to a range of .72 to .93 for sF.
For the same type of manipulation, similarities were slightly lower for larger as
compared to smaller networks. For small networks with p= 10, both sFand scor were
somewhat higher in sparse as compared to dense networks. This behavior was to be
expected, because given the same proportion of manipulated edges, for more dense
networks, a higher number of edge weights were affected by the manipulations.
Both measures exhibited a steeper decline with an increasing proportion of affected
edge weights the more severe the type of manipulation. This effect, however, was much
more pronounced for scor. For instance, in small, dense networks (p = 10 and a density of
.90), sF took values of .77 and .61 when 20% and 90% of the edges were missing,
respectively, and .64 and .44 when 20% and 90% of the edge weights were multiplied by
-1. scor, in contrast, was .91 and .29 when 20% and 90% of the edges were missing, and
took values of .65 and -.82 when 20% and 90% of the edge weights were multiplied by -1.
Micro-differences spread across the network. Figure 10 gives results for
micro-manipulations of edge weights spread across the whole network as a function of the
number of nodes and network 1 density. Recall that when all edge weights were slightly
dampened (i.e., multiplied by 0.80), edge weights were merely rescaled. As such, scor was
incapable of capturing these differences, equaling 1 across all conditions. Likewise,
scor was almost insensitive to edge weights being “jittered”, remaining above .98 across all
Figure 9. Similarity in terms of the Frobenius norm-based measure sF and
correlation-based comparisons scor between network models differing in multiple edge
weights across 100 simulated pairs of networks with different size and density. Lines are
smoothed to accommodate small differences in average ranges of network 1’s edge weights
across different conditions. p denotes the number of nodes. Note that the y-axis is
truncated at 0.30 for comparability across experiments, but values for scor fall below 0.30.
conditions. sF yielded intuitive results in that it slowly decreased as density increased
(note that for more dense networks, more edge weights were affected by the manipulation).
For instance, sF was .91 in a network with p = 10 and a low density of .10, and .87 when
density was .90. Results were almost identical when edge weights were dampened.
6 Investigating Finite-Sample Behavior
One challenge with sF is that its sampling distribution may not always be centered around
the population value. This is due to the fact that it is more likely for a finite-sample
network pair to exhibit less extreme similarity or dissimilarity than their population
Figure 10. Similarity in terms of the Frobenius norm-based measure sF and
correlation-based comparisons scor between network models with micro-differences in edge
weights across 100 simulated pairs of networks with different size and density. Lines are
smoothed to accommodate small differences in average ranges of network 1’s edge weights
across different conditions. p denotes the number of nodes.
counterparts,³ especially when population similarity is extremely high or extremely low,
respectively. To illustrate this for the case of sF, consider the case of two identical
sub-population networks with non-zero density. For identical sub-population networks, the
population matrix An − Am has zero entries only. For any given cell, it is highly unlikely
that for both networks, the exact same sample estimate is obtained for two given
sub-samples. With sF being constructed based on absolute values, the first moment of the
sF sampling distribution can, therefore, be assumed to be smaller than the population
value, underestimating similarity. Likewise, extreme values in the matrix An − Am,
stemming, for instance, from extreme negative edge weights in sub-population network n
and extreme positive edge weights in sub-population network m, are likely to be less
extreme for two given sub-samples, resulting in sF values higher than their population
counterparts. To investigate the degree to which this property may distort conclusions on
³ Note that this also holds true for scor.
network similarity, we studied sF’s finite-sample behavior.
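The boundary argument can be illustrated with a few lines of numpy: for identical population networks the difference matrix is exactly zero, but any independent estimation noise makes the observed Frobenius norm strictly positive. This sketch uses the raw norm rather than sF itself, whose exact normalization is defined earlier in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10

# identical sub-population networks: the population difference is the zero matrix
A = np.zeros((p, p))

# independent (symmetric) estimation noise for the two sub-samples
E1 = rng.normal(0.0, 0.02, (p, p)); E1 = (E1 + E1.T) / 2
E2 = rng.normal(0.0, 0.02, (p, p)); E2 = (E2 + E2.T) / 2

diff = (A + E1) - (A + E2)
# the population norm is 0 but the sample norm is not, so a similarity
# measure built on |An - Am| must underestimate similarity at this boundary
print(np.linalg.norm(diff, "fro") > 0.0)   # → True
```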
6.1 Method
We studied the finite-sample behavior of sF for both (a) finite-sample network pairs
stemming from sub-population networks with exact similarity and (b) finite-sample
network pairs stemming from different sub-population networks. For the latter, we studied
all scenarios considered in Study II of our experiments with population networks. For both
cases, we varied the number of nodes (p = 10; p = 20), network density, considering values
from .10 to .90 in increments of .20, and the sub-sample size (N = 200; N = 500;
N = 1000; N = 2000; N = 5000; N = 10000), assuming sample sizes for both sub-samples
to be the same. For each condition, we employed 500 replications and investigated the
median and interquartile range of sF across replications.
6.1.1 Finite-sample behavior for sub-population networks with exact
similarity. For studying the finite-sample behavior of sF for finite-sample network pairs
stemming from sub-population networks with exact similarity, for each replication, we first
generated a population network according to the procedure described above. We then
simulated data according to the generated population network for two independent samples
by drawing values for N observations from the multivariate normal distribution implied by
the generated population network. A separate GGM was fit to each generated data set.
For estimation, we used the EBICglasso method (see Epskamp & Fried, 2018, for an
introduction) as implemented in the R package qgraph (Epskamp et al., 2012), setting
γ = 0.50. Finally, we obtained sF for the two estimated GGMs.
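In Python, the simulate-then-estimate loop could be approximated as follows. Note that this sketch replaces EBICglasso with an unregularized inverse of the sample covariance (no penalty, no EBIC model selection), which differs from the qgraph estimator actually used; the population structure is likewise a hypothetical placeholder.

```python
import numpy as np

def sample_partial_correlations(X):
    """Crude stand-in for EBICglasso: invert the sample covariance and
    standardize to partial correlations (no regularization or EBIC)."""
    K = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(K))
    W = -K / np.outer(d, d)
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(0)
p, N = 10, 2000
Sigma = np.eye(p)
Sigma[0, 1] = Sigma[1, 0] = 0.3     # hypothetical population structure

# two independent sub-samples drawn from the SAME population network
X1 = rng.multivariate_normal(np.zeros(p), Sigma, size=N)
X2 = rng.multivariate_normal(np.zeros(p), Sigma, size=N)
W1 = sample_partial_correlations(X1)
W2 = sample_partial_correlations(X2)
# W1 and W2 differ slightly even though the populations are identical,
# which is precisely the source of the boundary bias discussed above
```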
6.1.2 Finite-sample behavior for different sub-population networks. To
study the finite-sample behavior of sF for finite-sample network pairs stemming from
different sub-population networks, for each replication, we first generated population
network pairs according to the procedure and scenarios described in Section 5.2. Then, we
simulated two data sets of N observations according to the multivariate normal
distributions implied by the two generated population networks. Again, we fit a separate
GGM to each generated data set and obtained sF for the two estimated GGMs.
6.2 Results
6.2.1 Finite-sample behavior for sub-population networks with exact
similarity. Figure 11 displays the median and interquartile ranges of the Frobenius
norm-based measure sF for finite-sample networks stemming from the same population
network with varying size and density across 500 replications. The color of the solid lines
(median) and shaded areas (interquartile range) denotes the sub-sample size. The dashed
gray line marks the true population value of sF = 1. Figure 11 highlights three important
points. First, for small sub-sample sizes, estimates of sF exhibited a strong downward bias
(i.e., researchers would falsely conclude that two finite-sample networks stemming from the
same population network exhibit low similarity). Second, this effect was aggravated by
high network density. Third, and most importantly, when the population network was
dense, sF did not approach the true value of 1 even for very large sub-sample sizes of
N = 10000, such that even with very large samples researchers would conclude that two
finite-sample networks stemming from the same population network exhibit high, but not
perfect, similarity.
6.2.2 Finite-sample behavior for different sub-population networks.
Figures 12 to 14 illustrate the finite-sample behavior of sF for samples stemming from
sub-population networks differing according to the scenarios studied in Section 5.2. In each
figure, the dashed gray line marks the true population value.
Contrasting these figures against Figure 11 illustrates that sF behaved markedly
differently when sub-population networks differed, with the most important difference being
that—regardless of the type and severity of true network differences—estimates of sF
approached the true population value with increasing sub-sample size and yielded
trustworthy conclusions on network similarity when sub-sample size was sufficiently large
Figure 11. Median and interquartile ranges of the Frobenius norm-based measure sF for
finite-sample networks stemming from the same population network with varying size and
density across 500 simulated pairs of networks estimated from data with varying
sub-sample sizes. The dashed gray line marks the true population value of sF = 1. p
denotes the number of nodes. N denotes the sub-sample size.
(>2000).
When sub-sample size was small, however, sF exhibited a severe downward bias
when the population networks exhibited high similarity (e.g., when the strongest edge
weight was halved, see the first row in Figure 12, or when all edge weights were mildly
dampened, see the second row in Figure 14), especially when network density was high. In
contrast, when true similarity was very low (e.g., when the sign of a large proportion of
edge weights was switched in large networks, see the panel in the third row and fourth
column in Figure 13), sF exhibited an upward bias under small sub-sample conditions. That
is, researchers would falsely conclude that two finite-sample networks stemming from
population networks with low similarity exhibit a comparably higher degree of similarity.
In short, under small-sample conditions, estimates of sF were not trustworthy in that they
tended to be less extreme than their population counterparts, such that networks with very
high population similarity appeared less similar, and networks with very low population
similarity appeared more similar, than they truly were.
Figure 12. Median and interquartile ranges of the Frobenius norm-based measure sF for
finite-sample networks stemming from population networks differing in single edge weights
across 500 simulated pairs of networks estimated from data with varying sub-sample sizes.
The dashed gray line marks the true population value. Density gives the density of
population network 1. Network 2 was derived from network 1 by manipulating single edge
weights. p denotes the number of nodes. N denotes the sub-sample size.
7 Interim Summary: Deriving Guidelines
We derive guidelines on the recommended scope of application and interpretation of sF
from our experiments conducted in Sections 5 (investigating population networks) and 6
(investigating finite-sample behavior).
When using sF to investigate network similarity in finite samples, researchers should
be aware that (a) when true population networks exhibit exact similarity, sF does not
approach 1 even for very large sub-sample sizes and (b) when, in contrast, true population
Figure 13. Median and interquartile ranges of the Frobenius norm-based measure sF for
finite-sample networks stemming from population networks differing in multiple edge
weights across 500 simulated pairs of networks estimated from data with varying
sub-sample sizes. The dashed gray line marks the true population value. Density gives the
density of population network 1. Network 2 was derived from network 1 by
manipulating multiple edge weights. p denotes the number of nodes. N denotes the
sub-sample size.
networks do not exhibit exact similarity, finite-sample estimates of sF do approach the true
population value, but only when sub-sample sizes are sufficiently large. Based on these
results, we strongly recommend two rules of use. First, researchers should pair sF with
significance tests for network comparisons, such as the NCT, and calculate sF only after
concluding that population networks indeed differ. Second, if the conducted test for
network comparison evidences differences in sub-population networks, we recommend
interpreting sF only if sub-sample sizes are sufficiently large, i.e., > 2000.
For conditions where these requirements are met, we derive initial guidelines on
interpreting the degree of similarity between sub-population networks from our
Figure 14. Median and interquartile ranges of the Frobenius norm-based measure sF for
finite-sample networks stemming from population networks with micro-differences in edge
weights across 500 simulated pairs of networks estimated from data with varying
sub-sample sizes. The dashed gray line marks the true population value. Density gives the
density of population network 1. Network 2 was derived from network 1 by dampening or
jittering edge weights. p denotes the number of nodes. N denotes the sub-sample size.
experiments with population networks. Recall that these served to provide intuition for the
scale of sF by investigating how scenarios of varying differences in psychometric networks
translate into sF. Note that the values of sF obtained for the most extreme scenarios of
large, dense networks with a large proportion of edge weights differing in sign were slightly
above .40, which means that such values are already indicative of very low similarity. We,
therefore, map values of sF from 1 to .50 in decrements of .10 onto the investigated scenarios
of network differences. Having an analog scenario for their obtained values of sF
may aid researchers in evaluating the severity and practical significance of differences in
their studied networks.
For each value of sF, Table 1 provides one example from Study I (comparing
networks implied by latent variable models) and one from Study II (comparing networks
implied by GGMs). For examples from Study I, we focus on differences in latent
correlations and provide the non-zero correlation of network 2 for which the respective sF
value has been obtained. For examples from Study II, we focus on networks with a medium
density of .50 differing in multiple edge weights and provide the type of manipulation and
proportion of affected edge weights for which the respective sF value has been obtained.
For instance, for sF = .90, researchers can infer that the two compared networks are as
similar as pairs of networks implied by two-dimensional latent variable models where
latent factors are perfectly correlated in network 1 and exhibit a correlation of .95 in
network 2. For a value of sF = .60, researchers could conclude that their to-be-compared
networks are as similar as a pair of large networks (p = 20) with a medium density of .50
that differ in sign in 30% of their edge weights. For further contextualizing these exemplar
scenarios, it should be kept in mind that our investigations of networks implied by GGMs
focused on networks with weights ranging from -.35 to .35, and that the values of sF
obtained for the different scenarios refer to the comparison of population networks.
8 Empirical Example
The empirical example serves (a) to illustrate the insights that can be gained from
quantifying network similarities, (b) to explore agreement and disagreement between sF
and scor in empirical data, and (c) to provide further intuition for network
similarities by giving examples of empirical networks with varying degrees of similarity in
terms of sF and scor. To these ends, we investigated patterns of similarities in psychometric
networks of human values across European countries.
Table 1
Exemplar scenarios for different values of sF

sF     Study I                    Study II
       p = 10      p = 20        p = 10                              p = 20
1      exact similarity
.90    ρ = .95     ρ = .95       20% of edges missing                10% of edges missing
.80    ρ = .80     ρ = .75       90% of edges missing                60% of edges missing
.70    ρ = .05     ρ = -.99      20% of edge weights sign-switched   10% of edge weights sign-switched
.60    ρ = -.99    —             40% of edge weights sign-switched   30% of edge weights sign-switched
.50    —           —             60% of edge weights sign-switched   —

Notes: Study I investigated differences between networks implied by latent
variable models. Study II investigated differences between networks implied by
Gaussian graphical models. In the displayed conditions of Study I, the
correlation between factors ρ was set to 1 for network 1 and varied for network
2. In the displayed conditions of Study II, network 1 had a medium density of
.50 and network 2 was derived from network 1 via the described manipulations.
p denotes the number of nodes. A dash indicates that the respective sF value
(or a smaller one) has not been observed in the studied scenario.
8.1 Method
We used data from the biennial European Social Survey (ESS) 2018.⁴ We focused on
21 items measuring basic human values, derived from Schwartz’s (1992) seminal theory, to
investigate patterns of similarities in human values across European countries. The ESS
basic human values scale includes verbal portraits of 21 different people, each reflecting the
importance of a value (see Table 3). For example, “Thinking up new ideas and being
creative is important to her. She likes to do things in her own original way” describes a
person for whom self-direction values are important. For each portrait, respondents are
asked to rate their likeness to the described person on a six-point Likert scale (1: “not like
me at all”; 6: “very much like me”), and their own values are inferred from these
self-reported similarities (see, e.g., Davidov, 2008, for further details).
We analyzed data from 27 European country samples (see Table 2).
⁴ Data can be retrieved from the ESS data portal via https://ess-search.nsd.no/
Table 2
Overview of Analyzed European Countries
Country Region N
Austria Western 2373
Belgium Western 1724
Bulgaria Eastern 1503
Switzerland Western 1423
Cyprus Southern 757
Czech Republic Eastern 2180
Germany Western 2246
Denmark Northern 469
Estonia Eastern 879
Spain Southern 1479
Finland Northern 1669
France Western 1804
UK Western 2122
Croatia Eastern 1679
Hungary Eastern 1574
Ireland Western 2082
Iceland Northern 787
Italy Southern 2457
Lithuania Eastern 1497
Latvia Eastern 805
Netherlands Western 1591
Norway Northern 1353
Poland Eastern 1296
Portugal Southern 994
Sweden Northern 1462
Slovenia Eastern 1226
Slovakia Eastern 994
A separate GGM was fit to data from each country. For estimation, we used the
same set-up as in the simulation study conducted in Section 6. For each of the
(27 choose 2) = 351 country pairs, we first tested for network invariance using the R package
NetworkComparisonTest (van Borkulo et al., 2017). The resampling-based network
invariance test evaluates the null hypothesis that, for a given pair of networks, all edges are
equal by evaluating whether the largest difference between corresponding edges of the
compared networks is significantly different from zero (van Borkulo et al., 2022). We
employed a significance level of α = .05 with a Bonferroni correction to account for the
multiple pairwise comparisons.
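As a quick check on the arithmetic of this correction, the number of pairwise tests and the Bonferroni-adjusted per-test threshold can be computed as follows (a minimal Python sketch for illustration; the analyses themselves were run in R):

```python
from math import comb

# Number of unordered country pairs among the 27 samples: C(27, 2)
n_tests = comb(27, 2)

# Bonferroni-adjusted per-test significance threshold
alpha_adjusted = 0.05 / n_tests

print(n_tests)         # 351
print(alpha_adjusted)  # ≈ 1.42e-4
```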
We computed and compared pairwise country network similarities in terms of sF
and scor only for network pairs for which the null hypothesis of all edges being equal was
rejected. Data and R code are provided in the OSF repository accompanying this study.
Table 3
Items of the European Social Survey Human Values Scale
Value Name Content
Universalism
UN1 Important that people are treated equally and have equal
opportunities
UN2 Important to understand different people
UN3 Important to care for nature and environment
Benevolence BE1 Important to help people and care for others’ well-being
BE2 Important to be loyal to friends and devote to people close
Conformity CO1 Important to do what is told and follow rules
CO2 Important to behave properly
Tradition TR1 Important to be humble and modest, not draw attention
TR2 Important to follow traditions and customs
Security SE1 Important to live in secure and safe surroundings
SE2 Important that government is strong and ensures safety
Power PO1 Important to be rich, have money and expensive things
PO2 Important to get respect from others
Achievement AC1 Important to show abilities and be admired
AC2 Important to be successful and that people recognise
achievements
Hedonism HE1 Important to have a good time
HE2 Important to seek fun and things that give pleasure
Stimulation ST1 Important to try new and different things in life
ST2 Important to seek adventures and have an exciting life
Self-direction SD1 Important to think new ideas and being creative
SD2 Important to make own decisions and be free
8.2 Results
Of the 351 country pairs, 297 exhibited significant differences in human values networks.
For these country pairs, both sF and scor indicated considerable variability in
similarities across European countries (range sF: [.66; .80]; range scor: [.34; .83]).
Both the network invariance test and the two similarity measures suggested that Western
(proportion of significant network invariance tests: .57; median sF: .75; median scor:
.76) and Northern European (proportion of significant network invariance tests: .50;
median sF: .75; median scor: .74) countries shared strong similarities in human values
networks among each other: network pairs tended to show no significant differences or, if
they did, exhibited high similarities. Eastern (proportion of significant network
invariance tests: .89; median sF: .71; median scor: .57) and, to a lesser extent, Southern
(proportion of significant network invariance tests: .67; median sF: .71; median scor:
.62) European countries had more distinctive networks; that is, they tended to exhibit
significant differences more often and to be less similar to the other countries in their
group. This pattern held for both similarity measures and is illustrated in Figure 15,
which displays human values network similarities for the 27 investigated European
countries in terms of sF and scor. Blue lines indicate that human values networks did not
significantly differ and are identical in Figures 15a and 15b. For human values network
pairs with significant differences, displayed in red, thicker lines indicate higher
similarity in terms of the respective measure. For readability, only the top 20 percent of
network similarities is displayed for each measure.
Figure 15 also illustrates some disagreement between the measures. For instance,
Italy’s human values network was significantly different from all other networks, as
indicated by the absence of blue lines connecting Italy with other countries. While for
sF none of Italy’s similarities to other countries was in the top 20 percent, scor
indicated high similarity of Italy’s network with a few other European countries.
sF and scor exhibited a high, but not perfect, rank-order correlation of .89. To
illustrate the differences in rank orders, Figures 16 and 17 display the human values
networks with the highest and lowest similarities for the two measures. Of all network
pairs with significant invariance tests, both measures yielded the highest values for the
similarity between the networks of Switzerland and Germany (sF = .80; scor = .83). To further
[Figure 15 graphic: two network-similarity graphs over the 27 countries, panel (a) sF and panel (b) scor; country node labels omitted.]
Figure 15. Human values network similarities for 27 European countries for the two
similarity measures. Blue lines indicate that human values networks did not significantly
differ. For human values network pairs with significant differences, displayed in red,
thicker lines indicate higher similarity in terms of the respective measure. Only the top
20 percent of network similarities is displayed for each measure; that is, an absent line
indicates that the network invariance test was significant and the similarity value was
not in the top 20 percent.
investigate sources of deviations from perfect similarity between the two human values
networks, we inspected the distribution of non-zero differences in edge weights of the
German and Swiss human values networks, displayed in Figure 18. As can be seen,
differences oscillated around zero, indicating that deviations from perfect similarity
between the two networks were driven by small differences spread across the whole network
rather than strongly pronounced differences in single edge weights.
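Such a distribution of non-zero edge weight differences can be read directly off the two weighted adjacency matrices. The following Python sketch illustrates the computation on small hypothetical 4-node matrices (the actual analysis used the estimated 21-node GGMs in R):

```python
# Toy 4-node weighted adjacency matrices (symmetric, unit diagonal),
# standing in for two estimated GGMs; all values are hypothetical.
A = [[1.0, 0.2, 0.0, 0.1],
     [0.2, 1.0, 0.3, 0.0],
     [0.0, 0.3, 1.0, 0.0],
     [0.1, 0.0, 0.0, 1.0]]
B = [[1.0, 0.1, 0.0, 0.1],
     [0.1, 1.0, 0.4, 0.0],
     [0.0, 0.4, 1.0, 0.1],
     [0.1, 0.0, 0.1, 1.0]]

def nonzero_edge_weight_differences(a, b):
    """Collect a_ij - b_ij over the strictly lower triangle,
    keeping only pairs whose edge weights actually differ."""
    p = len(a)
    return [a[i][j] - b[i][j]
            for i in range(p) for j in range(i)
            if a[i][j] != b[i][j]]

diffs = nonzero_edge_weight_differences(A, B)
print(diffs)  # three differences of magnitude 0.1
```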
[Figure 16 graphic: human values networks of Switzerland and Germany over the 21 value items (UN1–SD2).]
Figure 16. Significantly different human values networks with the highest similarity for
both similarity measures (sF = .80; scor = .83).
[Figure 17 graphic: human values networks of Iceland, Bulgaria, Portugal, and Slovakia over the 21 value items (UN1–SD2).]
Figure 17. Significantly different human values networks with the lowest similarity for
the two similarity measures. sF was lowest for the networks of Iceland and Bulgaria
(sF = .66; scor = .38), while scor was lowest for the networks of Slovakia and
Portugal (sF = .68; scor = .34).
Iceland and Bulgaria (see Figure 17) displayed the lowest similarity in terms of sF
(sF = .66; scor = .38). With 18 and 60 out of 210 possible edges present, corresponding to
densities of .09 and .29, these were among the networks with the lowest and highest
density, respectively, and only 13 of Iceland’s 18 edges were also included in Bulgaria’s
network. Slovakia and Portugal were the least similar in terms of scor (sF = .68;
scor = .34). With 36 and 31 edges, corresponding to densities of .17 and .15, both
networks were relatively sparse, and only 13 of Portugal’s 31 edges were also included in
Slovakia’s network. Note that, again, the differences within each measure for these two
pairs were rather small. That is, overall, the two measures agreed that the networks of
Iceland and Bulgaria were about as dissimilar as those of Slovakia and Portugal.
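The densities reported here follow directly from the edge counts: with p = 21 nodes there are p(p − 1)/2 = 210 possible edges. A minimal Python check (for illustration; the analyses themselves used R):

```python
def density(n_edges, p):
    """Edge density of an undirected network with p nodes."""
    return n_edges / (p * (p - 1) / 2)

p = 21                          # items in the human values networks
assert p * (p - 1) // 2 == 210  # possible edges among 21 nodes

print(round(density(18, p), 2))  # Iceland:  0.09
print(round(density(60, p), 2))  # Bulgaria: 0.29
print(round(density(36, p), 2))  # Slovakia: 0.17
print(round(density(31, p), 2))  # Portugal: 0.15
```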
[Figure 18 graphic: histogram of non-zero edge weight differences (Germany − Switzerland) on the x-axis, ranging from −0.2 to 0.2, against the number of edge weights on the y-axis.]
Figure 18. Distribution of non-zero differences in edge weights of the German and Swiss
human values networks.
9 Discussion
The present study aimed to provide a similarity measure for quantifying differences in
psychometric networks. To this end, we derived a similarity measure based on the
Frobenius norm of differences in psychometric networks’ weighted adjacency matrices. The
measure originates in a commonly used graph-theoretical measure for determining
similarity of graphs and accommodates the specifics of psychometric networks. We
illustrated and evaluated the proposed similarity measure sF by studying a wide range of
differences in pairs of population network models implied by latent variable models and
by GGMs, as well as sF’s finite-sample behavior.
In our evaluations of population network pairs, we showed that the studied
scenarios were captured in an intuitive manner by sF, while the same did not hold true for
customary correlation-based comparisons. Our evaluations further underlined pitfalls of
currently used correlation-based comparisons (see also Brusco, 2004; Brusco & Cradit,
2005; Hubert, 1978) and, as such, the need for a measure that is more tailored towards the
context of psychometric networks. First, correlation-based comparisons do not allow
comparisons with empty networks due to lack of variance in the entries of the comparison
network’s weighted adjacency matrix. Second, correlation-based comparisons are highly
sensitive to the variability of the entries of the strictly lower triangular parts of
networks’ weighted adjacency matrices. Among other things, this implies different conclusions concerning similarity
between networks for the same type of difference when networks possess low as compared
to high homogeneity of edge weights, or low as compared to high density. Third,
correlation-based comparisons are not sensitive to edge weights being dampened. This
may, for instance, be the case when a clinical intervention reduces symptom dependencies
compared to the baseline. Fourth, on a more general note, the correlation of the strictly
lower triangulars of networks’ weighted adjacency matrices is not a true metric, meaning
that a correlation of 1 does not necessarily imply equality of psychometric networks. This
is different for sF, which allows drawing such conclusions when sF= 1.
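This fourth point can be illustrated with a toy example: if one network is a dampened duplicate of another (here, all edge weights halved; the 3-node matrices are hypothetical), the correlation of the strictly lower triangular parts equals 1 even though the networks clearly differ, whereas sF stays below 1. A Python sketch:

```python
import math

def s_f(a1, a2, p):
    """sF = 1 / (1 + ||A1 - A2||_F / sqrt(p/2)), as in Appendix B."""
    frob = math.sqrt(sum((a1[i][j] - a2[i][j]) ** 2
                         for i in range(p) for j in range(p)))
    return 1 / (1 + frob / math.sqrt(p / 2))

def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

A1 = [[1.0, 0.4, 0.2], [0.4, 1.0, 0.1], [0.2, 0.1, 1.0]]
# A2 is a "dampened duplicate" of A1: every edge weight halved
A2 = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.05], [0.1, 0.05, 1.0]]

lower1 = [A1[i][j] for i in range(3) for j in range(i)]
lower2 = [A2[i][j] for i in range(3) for j in range(i)]

r = pearson(lower1, lower2)
sf = s_f(A1, A2, 3)
print(r)   # 1.0 (up to floating point): the correlation is blind to dampening
print(sf)  # clearly below 1: sF registers the difference
```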
In studying the finite-sample behavior of sF, we found sF to be consistent (albeit biased
for small sub-sample sizes) only if population networks did not exhibit exact similarity.
We therefore strongly recommend (a) using sF only as a complement to significance tests
for network comparisons, such as the NCT, and interpreting it only if there is evidence
for differences in sub-population networks, and (b) interpreting sF only if sub-sample
sizes are sufficiently large, i.e., > 2000. To guide the interpretation of sF when these
conditions are met, we summarized the results of our investigations of sub-population
networks by providing exemplar scenarios for different values of sF.
A further aspect to keep in mind when interpreting sF is that it is a measure of global
similarity that compresses structural similarities and differences between the
to-be-compared networks into a single score. As such, from mere inspection of sF, it is
not possible to determine whether networks have low similarity because of large
differences in some local area of the networks (in the most extreme case, a single edge
weight) or due to small differences spread throughout the networks. A fairly
straightforward follow-up analysis to better understand the determinants of an observed
sF score is an assessment of the distribution of edge weight differences (see Figure 18
from the empirical application for an example).
In an empirical application based on cross-country comparisons of human values
networks, we showcased the potential insights from quantifying network similarity. We
illustrated that sF can be employed for both normative (e.g., the human values networks
of countries A and B exhibit low similarity) and ipsative evaluations (e.g., the networks
of countries A and B exhibit the same degree of similarity as the networks of countries C
and D) of network similarity.
In motivating and evaluating sF, we focused on the quantification of similarity
between sub-population psychometric networks. Another use case for sF that we view as
worthwhile is in simulation studies evaluating the statistical performance of network
estimation techniques. Specificity, sensitivity, and scor are commonly employed outcome
measures in psychometric network simulations (e.g., Isvoranu et al., 2021; van Borkulo
et al., 2014). For a more nuanced evaluation of bias in network estimation, researchers
may consider complementing these with sF between the data-generating and estimated
networks.
Note that although sF was illustrated on GGMs, it is also applicable to other types
of Markov random fields, e.g., Ising models or mixed graphical models. This is because sF
uses the weighted adjacency matrices of networks that have already been estimated and is,
therefore, not bound to specific models or estimation techniques used to obtain the
networks. Nevertheless, separate interpretation guidelines may need to be established, as
bounds and typical ranges of differences in edge weights may differ markedly between
different types of Markov random fields. Recall that GGMs’ edge weights are partial
correlations bounded between −1 and 1. That is, the largest possible contribution of a
single absolute difference between corresponding edge weights to sF is 2. This is
different for Ising models, where edge weights are not bounded and magnitudes can exceed
unity. For instance, in an application of the Ising model to political opinion data,
Brusco et al. (2022) reported maximum edge weights as high as roughly 3.5. For edge
weights of this size, one can easily imagine scenarios where the contribution of a single
absolute difference between corresponding edge weights to sF far exceeds 2, e.g., when
the corresponding edge weight in the comparison network is close to zero or negative.
Consequently, sF used to study differences between Ising models may take values that are
much smaller than sF applied in the context of GGMs.
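The effect of such unbounded edge weights on sF can be illustrated with a toy example (Python; the 3-node networks are hypothetical, and only the edge weight scales are taken from the discussion above): a single edge difference at the maximal GGM scale of 2 versus one at an Ising-type scale of 3.5.

```python
import math

def s_f(a1, a2, p):
    """sF from the normalized Frobenius norm (cf. Appendix B)."""
    frob = math.sqrt(sum((a1[i][j] - a2[i][j]) ** 2
                         for i in range(p) for j in range(p)))
    return 1 / (1 + frob / math.sqrt(p / 2))

def single_edge_net(p, w):
    """p-node network whose only (symmetric) edge has weight w."""
    a = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    a[0][1] = a[1][0] = w
    return a

p = 3
# GGM: partial correlations are bounded, so one edge can differ by at most |1 - (-1)| = 2
sf_ggm = s_f(single_edge_net(p, 1.0), single_edge_net(p, -1.0), p)
# Ising: weights are unbounded; a weight of 3.5 against an absent edge differs by 3.5
sf_ising = s_f(single_edge_net(p, 3.5), single_edge_net(p, 0.0), p)

print(sf_ggm, sf_ising)  # the Ising-scale difference yields the smaller sF
```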
9.1 Limitations and Future Directions
The interpretation guidelines derived from simulation results provide initial guidance for
interpreting obtained similarities. Nevertheless, we point out that these guidelines are
preliminary and should be subject to further refinement once a broader range of scenarios
has been investigated and subject-matter expertise on typical similarity values in different
fields of application of psychometric networks has accumulated.
Note that the finite-sample behavior of sF not only constrains its applicability to
research contexts where network comparison tests indicate differences in sub-population
networks and sub-sample sizes are sufficiently large, but also impedes quantifying
uncertainty in similarity estimates. While bootstrapping seems like an obvious means to
this end, its employment is not straightforward, as bootstrap resamples may often
yield similarities below the sample point estimate. Hence, bootstrapped confidence
intervals for sF can be assumed to underestimate similarity, especially when sF is high.
Developing techniques for quantifying uncertainty in similarity estimates remains an
important task for future research.
The present study focused on using sFas an effect size measure analogue for
quantifying psychometric network differences. Having a measure for quantifying network
similarities, however, also opens the path for uncovering groups of similar networks. More
specifically, when multiple networks are compared (as in, e.g., Fried et al., 2018, who
conducted a cross-cultural multi-site study of post-traumatic stress disorder symptom
networks), the result can, again, be depicted as a complete graph, with nodes denoting networks
and edge weights quantifying similarity among them. Then, graph-modeled data clustering
techniques such as spectral clustering (Von Luxburg, 2007) or cluster editing (Böcker &
Baumbach, 2013) can be used to uncover clusters of networks that are similar to each
other, such that researchers may identify sets of sub-populations that share common
network structures. When comparing symptom networks, researchers may uncover groups
of sub-populations that differ in inter-relationships among symptoms and may design
group-specific interventions. Likewise, clustering idiographic networks (see Epskamp,
van Borkulo, et al., 2018, for an introduction) may uncover sub-populations of, say,
patients that have different network structures and may require different treatment.
Evaluating different clustering techniques and exploring their potential for identifying
subgroups of similar networks pose interesting topics for future research.
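In practice, spectral clustering or cluster editing would be the methods of choice. As a minimal self-contained illustration of the general idea, the Python sketch below computes pairwise sF values among three hypothetical 3-node networks, thresholds them (the cutoff of .8 is an arbitrary assumption, not a value from the paper), and reads clusters off the connected components of the resulting similarity graph:

```python
import math

def s_f(a1, a2, p):
    """sF based on the normalized Frobenius norm (cf. Appendix B)."""
    frob = math.sqrt(sum((a1[i][j] - a2[i][j]) ** 2
                         for i in range(p) for j in range(p)))
    return 1 / (1 + frob / math.sqrt(p / 2))

def clusters_by_threshold(networks, p, threshold):
    """Group networks whose pairwise sF exceeds `threshold` via
    connected components (union-find) of the thresholded similarity graph."""
    n = len(networks)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if s_f(networks[i], networks[j], p) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Three hypothetical 3-node networks: the first two nearly identical,
# the third clearly different from both.
n1 = [[1.0, 0.30, 0.0], [0.30, 1.0, 0.2], [0.0, 0.2, 1.0]]
n2 = [[1.0, 0.25, 0.0], [0.25, 1.0, 0.2], [0.0, 0.2, 1.0]]
n3 = [[1.0, -0.40, 0.5], [-0.40, 1.0, -0.3], [0.5, -0.3, 1.0]]

clusters = clusters_by_threshold([n1, n2, n3], p=3, threshold=0.8)
print(clusters)  # [[0, 1], [2]]
```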
We point out that the present study is aimed at initiating discussions on urgently
needed effect size measures for quantifying differences in sub-population psychometric
networks rather than providing definitive solutions. The proposed sF captures differences
in sub-population networks in an intuitive manner. It is, however, limited by a sampling
distribution that is often not centered around the true population value. Future research may
develop and explore alternative measures.
The concordance indices for comparing proximity matrices developed by Hubert (1978),
for instance, may be a promising starting point. Intuitively speaking, these indices
capture concordance in the ordering of the entries of proximity matrices, i.e., in the
present application, concordance in the ordering of the to-be-compared psychometric
networks’ edge weights. Different indices exist that are sensitive to different aspects of concordance in
ordering. In psychological research, these indices have been applied to compare confusion
matrices (Brusco, 2004) as well as in the context of comparing submatrices of
multitrait-multimethod correlation matrices (Hubert & Baker, 1979), and recently gained
attention in network psychometric applications as a means of easing the interpretability
of weighted adjacency matrices by re-ordering items in a meaningful way (Brusco et al.,
2022). Their utility as effect size measures for quantifying differences in psychometric
networks, however, has not been studied yet.⁵ A potential limitation of concordance indices
may be that they capture differences related to ordinal properties of the to-be-compared
matrices. That is, concordance indices can be expected not to be sensitive to scenarios
where differences between networks are not reflected in the ordering, e.g., when one network is
a dampened duplicate of the other. Likewise, in analogy to drawing on graph-theoretical
literature for quantifying differences in psychometric networks, concordance indices would
need to be selected with care, as not all indices may capture aspects that are meaningful in
the context of network psychometrics. Concordance indices may, however, have a more
favorable sampling distribution that is centered around the population value and, if so,
could be a viable alternative when network comparison tests are not significant or when
sub-sample sizes are small. A further advantage is that generalized concordance indices
exist that allow for comparing more than two matrices (Hubert, 1987), while, in its current
form, sF can only be employed for pairwise comparisons.
Finally, we focused on the special case of comparing sub-population networks
comprising the same set of variables, assuming known node correspondence. Future
research may expand on comparisons of psychometric networks comprising different sets of
⁵ Hubert (1987) also presented permutation-based significance tests for concordance indices. These
compare the obtained concordance index value to a reference distribution of no concordance between
proximity matrices. Note, however, that these tests are based on permutation of matrix entries, while the
entries themselves are treated as fixed. Hence, when applied in the context of network psychometrics,
permutation-based significance tests for concordance indices would neglect the uncertainty of edge weight
estimates due to sampling variation and, in our view, therefore do not pose an alternative to existing
network comparison tests.
variables, e.g., when comparing networks using different measures for similar constructs.
Before evaluating measures applicable to compare networks with unknown node
correspondence, however, conceptual discussions are needed on the specific patterns of
similarity among psychometric networks with unknown node correspondence that applied
researchers would deem meaningful and may want to capture.
Extra Material
Extra materials for this article can be found in the OSF and are available via the following
link: https://osf.io/guxf8/
Appendix A
Example Calculation for sF
To illustrate the calculation of sF, consider two networks n and m with p = 3 nodes
and weighted adjacency matrices

An = [ 1.0  0.3  0.2          Am = [ 1.0  0.1  0.4
       0.3  1.0  0.1                 0.1  1.0  0.0
       0.2  0.1  1.0 ]               0.4  0.0  1.0 ].

The normalized Frobenius norm of the difference between these two adjacency
matrices is

dF(n, m) = (1/√(3/2)) · √( 3·|1.0 − 1.0|² + 2·|0.3 − 0.1|² + 2·|0.2 − 0.4|² + 2·|0.1 − 0.0|² )
         = (1/√(3/2)) · √0.18 ≈ 0.35.

Then, sF = 1/(1 + dF(n, m)) ≈ 0.74.
Appendix B
R Function for Obtaining sF
sF <- function(n1, n2, p){
  # n1: weighted adjacency matrix for network 1
  # n2: weighted adjacency matrix for network 2
  # p: number of nodes
  return(1 / (1 + (norm(n1 - n2, type = "F") / sqrt(p / 2))))
}
Figure B1. R function for obtaining sF.
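For readers working outside R, the same computation can be ported, for instance, to Python and checked against the worked example in Appendix A (an illustrative sketch, not the authors’ code):

```python
import math

def s_f(n1, n2, p):
    """Python port of the R function above:
    sF = 1 / (1 + ||n1 - n2||_F / sqrt(p / 2))."""
    frob = math.sqrt(sum((n1[i][j] - n2[i][j]) ** 2
                         for i in range(p) for j in range(p)))
    return 1 / (1 + frob / math.sqrt(p / 2))

# Weighted adjacency matrices from the worked example in Appendix A
An = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.1], [0.2, 0.1, 1.0]]
Am = [[1.0, 0.1, 0.4], [0.1, 1.0, 0.0], [0.4, 0.0, 1.0]]

print(round(s_f(An, Am, 3), 2))  # 0.74, matching Appendix A
```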
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
The topic of replicability has recently captivated the emerging field of network psychometrics. Although methodological practice (e.g., p-hacking) has been identified as a root cause of unreliable research findings in psychological science, the statistical model itself has come under attack in the partial correlation network literature. In a motivating example, I first describe how sampling variability inherent to partial correlations can merely give the appearance of unreliability. For example, when going from zero-order to partial correlations there is necessarily more sampling variability that translates into reduced statistical power. I then introduce novel methodology for deriving expected network replicability (ENR), wherein replication is modeled with the Poisson-binomial distribution. This analytic solution can be used with the Pearson, Spearman, Kendall, and polychoric partial correlation coefficient. I first employed the method to estimate ENR for a variety of data sets from the network literature. Here it was determined that partial correlation networks do not have inherent limitations, given current estimates of replicability were consistent with ENR. I then highlighted sources that can reduce replicability, that is, when going from continuous to ordinal data with few categories and employing a multiple comparisons correction. To address these challenges, I described a strategy for using the proposed method to plan for network replication. I end with recommendations that include the importance of the network literature repositioning itself with gold-standard approaches for assessing replication, including explicit consideration of Type I and Type II error rates. The method for computing ENR is implemented in the R package GGMnonreg. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Article
Full-text available
The network approach, in which psychological constructs are modeled in terms of interactions between their constituent factors, have rapidly gained popularity in psychology. Applications of such network approaches to various psychological constructs have recently moved from a descriptive stance, in which the goal is to estimate the network structure, to a more comparative stance, in which the goal is to compare network structures across groups. However, the statistical tools to do so are lacking. In this article, we present the network comparison test (NCT). NCT is a statistical test that compares two network structures on three types of characteristics. Performance of NCT is evaluated by means of a simulation study. Simulated data shows that NCT performs well in various circumstances for all three tests: when the groups are simulated to be similar, the error rate (i.e., NCT indicating that they are different, while the simulated networks are similar) is adequately low, and when the groups are simulated to be different, the ability to detect a difference is sufficiently high when the difference between simulated networks and the sample size are substantial. We illustrate NCT by comparing depression symptom networks of males and females. Possible extensions of NCT are discussed.
Article
Full-text available
This study aimed to investigate direct relationships of work addiction symptoms with dimensions of work engagement. We used three samples in which work addiction was measured with the Bergen Work Addiction Scale and work engagement was measured with the Utrecht Work Engagement Scale. One sample comprised responses from working Norwegians ( n 1 = 776), and two samples comprised responses from working Poles ( n 2 = 719; n 3 = 715). We jointly estimated three networks using the fused graphic lasso method. Additionally, we estimated the stability of each network, node centrality, and node predictability and quantitatively compared all networks. The results showed that absorption and mood modification could constitute a bridge between work addiction and work engagement. It suggests that further investigation of properties of absorption and mood modification might be crucial for answering the question of how engaged workers become addicted to work.
Article
Full-text available
Posttraumatic stress disorder (PTSD) researchers have increasingly used psychological network models to investigate PTSD symptom interactions, as well as to identify central driver symptoms. It is unclear, however, how generalizable such results are. We have developed a meta-analytic framework for aggregating network studies while taking between-study heterogeneity into account and applied this framework in the first-ever meta-analytic study of PTSD symptom networks. We analyzed the correlational structures of 52 different samples with a total sample size of n = 29,561 and estimated a single pooled network model underlying the data sets, investigated the scope of between-study heterogeneity, and assessed the performance of network models estimated from single studies. Our main findings are that: (a) We identified large between-study heterogeneity, indicating that it should be expected for networks of single studies to not perfectly align with one-another, and meta-analytic approaches are vital for the study of PTSD networks. (b) While several clear symptom-links, interpretable clusters, and significant differences between strength of edges and centrality of nodes can be identified in the network, no single or small set of nodes that clearly played a more central role than other nodes could be pinpointed, except for the symptom "amnesia" that was clearly the least central symptom. (c) Despite large between-study heterogeneity, we found that network models estimated from single samples can lead to similar network structures as the pooled network model. We discuss the implications of these findings for both the PTSD literature as well as methodological literature on network psychometrics. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Article
Full-text available
Model fit assessment is a central component of evaluating confirmatory factor analysis models and the validity of psychological assessments. Fit indices remain popular and researchers often judge fit with fixed cutoffs derived by Hu and Bentler (1999). Despite their overwhelming popularity, methodological studies have cautioned against fixed cutoffs, noting that the meaning of fit indices varies based on a complex interaction of model characteristics like factor reliability, number of items, and number of factors. Criticism of fixed cutoffs stems primarily from the fact that they were derived from one specific confirmatory factor analysis model and lack generalizability. To address this, we propose a simulation-based method called dynamic fit index cutoffs such that derivation of cutoffs is adaptively tailored to the specific model and data characteristics being evaluated. Unlike previously proposed simulation-based techniques, our method removes existing barriers to implementation by providing an open-source, Web based Shiny software application that automates the entire process so that users neither need to manually write any software code nor be knowledgeable about foundations of Monte Carlo simulation. Additionally, we extend fit index cutoff derivations to include sets of cutoffs for multiple levels of misspecification. In doing so, fit indices can more closely resemble their originally intended purpose as effect sizes quantifying misfit rather than improperly functioning as ad hoc hypothesis tests. We also provide an approach specifically designed for the nuances of 1-factor models, which have received surprisingly little attention in the literature despite frequent substantive interests in unidimensionality. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Article
Full-text available
Statistical network models such as the Gaussian Graphical Model and the Ising model have become popular tools to analyze multivariate psychological datasets. In many applications, the goal is to compare such network models across groups. In this paper, I introduce a method to estimate group differences in network models that is based on moderation analysis. This method is attractive because it allows one to make comparisons across more than two groups for all parameters within a single model and because it is implemented for all commonly used cross-sectional network models. Next to introducing the method, I evaluate the performance of the proposed method and existing approaches in a simulation study. Finally, I provide a fully reproducible tutorial on how to use the proposed method to compare a network model across three groups using the R-package mgm.
Article
Full-text available
A growing number of publications focus on estimating Gaussian graphical models (GGM, networks of partial correlation coefficients). At the same time, generalizibility and replicability of these highly parameterized models are debated, and sample sizes typically found in datasets may not be sufficient for estimating the underlying network structure. In addition, while recent work emerged that aims to compare networks based on different samples, these studies do not take potential cross-study heterogeneity into account. To this end, this paper introduces methods for estimating GGMs by aggregating over multiple datasets. We first introduce a general maximum likelihood estimation modeling framework in which all discussed models are embedded. This modeling framework is subsequently used to introduce meta-analytic Gaussian network aggregation (MAGNA). We discuss two variants: fixed-effects MAGNA, in which heterogeneity across studies is not taken into account, and random-effects MAGNA, which models sample correlations and takes heterogeneity into account. We assess the performance of MAGNA in large-scale simulation studies. Finally, we exemplify the method using four datasets of post-traumatic stress disorder (PTSD) symptoms, and summarize findings from a larger meta-analysis of PTSD symptom.
Article
Social scientists have long studied international differences in political culture and communication. An influential strand of theory within political science argues that different types of political systems generate different parliamentary cultures: Systems with proportional representation generate cross-party cohesion, whereas majoritarian systems generate division. To contribute to this long-standing discussion, we study parliamentarian retweets across party lines using a database of 2.3 million retweets by 4,018 incumbent parliamentarians across 19 countries during 2018. We find that there is at most a tenuous relationship between democratic systems and cross-party retweeting: Majoritarian systems are not unequivocally more divisive than proportional systems. Moreover, we find important qualitative differences: Countries are not only more or less divisive, but they are cohesive and divisive in different ways. To capture this complexity, we complement our quantitative analysis with Visual Network Analysis to identify four types of network structures: divided, bipolar, fringe party, and cohesive.
Article
The Ising model has received significant attention in network psychometrics during the past decade. A popular estimation procedure is IsingFit, which uses nodewise l1-regularized logistic regression along with the extended Bayesian information criterion to establish the edge weights for the network. In this paper, we report the results of a simulation study comparing IsingFit to two alternative approaches: (1) a nonregularized nodewise stepwise logistic regression method, and (2) a recently proposed global l1-regularized logistic regression method that estimates all edge weights in a single stage, thus circumventing the need for nodewise estimation. MATLAB scripts for the methods are provided as supplemental material. The global l1-regularized logistic regression method generally provided greater accuracy and sensitivity than IsingFit, at the expense of lower specificity and much greater computation time. The stepwise approach showed considerable promise. Relative to the l1-regularized approaches, the stepwise method provided better average specificity for all experimental conditions, as well as comparable accuracy and sensitivity at the largest sample size.
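The nodewise scheme common to these estimators can be sketched as follows. This simplified Python version uses scikit-learn's l1-penalized logistic regression with a fixed penalty and averages the two estimates per edge; IsingFit instead selects the penalty via the extended BIC, and an AND/OR rule is the usual symmetrization choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def nodewise_ising(X, C=1.0):
    # Regress each binary node on all others with an l1 penalty;
    # the coefficients are the (asymmetric) edge-weight estimates,
    # which are then symmetrized by averaging.  Simplified sketch:
    # no EBIC tuning of the penalty, no AND-rule.
    n, p = X.shape
    W = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        clf.fit(X[:, others], X[:, j])
        W[j, others] = clf.coef_[0]
    return (W + W.T) / 2.0

# toy binary data: three noisy copies of a common binary source
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=(300, 1))
flip = rng.random((300, 3)) < 0.1  # 10% of entries are flipped
X = np.abs(np.repeat(z, 3, axis=1) - flip.astype(int))
W = nodewise_ising(X)
```

The global single-stage approach discussed in the abstract replaces the loop over nodes with one joint optimization over all edge weights, which is why its estimates need no symmetrization step.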
Article
Objective This study applied network analyses to illustrate patterns of associations between cancer-related physical and psychological symptoms (CPPS) and quality of life (QOL) before and after surgery. Methods Participants were 256 gastric cancer patients admitted for curative resection surgery at the surgical department of a teaching hospital in Korea between May 2016 and November 2017. Participants completed a survey including the MD Anderson Symptom Inventory, the Hospital Anxiety and Depression Scale, and the Functional Assessment of Cancer Therapy-Gastric Cancer before surgery (T0), one week after surgery (T1), and 3–6 months after surgery (T2). Results The three networks featured several salient connections of varying magnitude between CPPS and QOL across all time points. In particular, anxiety was tightly connected to emotional wellbeing (EWB) across all time points and to physical wellbeing (PWB) at T1. Depression, in contrast, was connected to functional wellbeing at T0 and T2, gastric cancer concerns (CS) at T1, and PWB at T2. Distress and sadness were the most central symptoms in the three networks; other central symptoms included shortness of breath at T0, fatigue at T0 and T1, and PWB and CS at T2. Anxiety, depression, and EWB served as bridges connecting CPPS to QOL across all time points with varying degrees of importance, as did PWB at T1 and T2. Conclusions Treating psychological distress and enhancing EWB and PWB may be high-impact intervention strategies throughout the cancer trajectory.