
Probabilistic graphical models in modern social network analysis

Authors:
  • Twilio Inc.

Abstract and Figures

The advent and availability of technology has brought us closer than ever through social networks. Consequently, there is a growing emphasis on mining social networks to extract information for knowledge and discovery. However, methods for social network analysis (SNA) have not kept pace with the data explosion. In this review, we describe directed and undirected probabilistic graphical models (PGMs), and highlight recent applications to social networks. PGMs represent a flexible class of models that can be adapted to address many of the current challenges in SNA. In this work, we motivate their use with simple and accessible examples to demonstrate the modeling and connect to theory. In addition, recent applications in modern SNA are highlighted, including the estimation and quantification of importance, propagation of influence, trust (and distrust), link and profile prediction, privacy protection, and news spread through microblogging. Applications are selected to demonstrate the flexibility and predictive capabilities of PGMs in SNA. Finally, we conclude with a discussion of challenges and opportunities for PGMs in social networks.
An example of using an MLN in political science. a Depicts a social network of five senators with two attributes. The ground predicates (b) are denoted by 15 elliptical nodes. The red ones are captured by the links of the social network and dark blue nodes indicate nodes’ attributes. Two first-order logic formulas, $F_1$ and $F_2$, determine the structure of the MLN. There exist five groundings of $F_1$ (illustrated by the edges between the R(x) and S(x) nodes) and 15 groundings of $F_2$ captured by the rest of the edges. Other examples of MLNs can be found in Tresp and Nickel (2013)
ORIGINAL ARTICLE
Probabilistic graphical models in modern social network analysis
Alireza Farasat (1,2)
Alexander Nikolaev (1)
Sargur N. Srihari (2)
Rachael Hageman Blair (3)
Received: 14 October 2014 / Revised: 14 August 2015 / Accepted: 23 August 2015 / Published online: 19 October 2015
© Springer-Verlag Wien 2015
Abstract The advent and availability of technology has
brought us closer than ever through social networks.
Consequently, there is a growing emphasis on mining
social networks to extract information for knowledge and
discovery. However, methods for social network analysis
(SNA) have not kept pace with the data explosion. In this
review, we describe directed and undirected probabilistic
graphical models (PGMs), and highlight recent applica-
tions to social networks. PGMs represent a flexible class of
models that can be adapted to address many of the current
challenges in SNA. In this work, we motivate their use with
simple and accessible examples to demonstrate the mod-
eling and connect to theory. In addition, recent applications
in modern SNA are highlighted, including the estimation
and quantification of importance, propagation of influence,
trust (and distrust), link and profile prediction, privacy
protection, and news spread through microblogging.
Applications are selected to demonstrate the flexibility and
predictive capabilities of PGMs in SNA. Finally, we con-
clude with a discussion of challenges and opportunities for
PGMs in social networks.
Keywords Probabilistic graphical modeling · Social
network analysis · Bayesian networks · Markov networks ·
Exponential random graph models · Markov logic
networks · Social influence · Network sampling
1 Introduction
Over 40 years ago, social scientist Allen Barton stated that
‘If our aim is to understand people’s behavior rather than
simply to record it, we want to know about primary groups,
neighborhoods, organizations, social circles, and commu-
nities; about interaction, communication, role expectations,
and social control.’ (Barton 1968 as reported in Freeman
2004). This sentiment is fundamental to the concept of
modularity. The importance of structural relationships in
defining communities and predicting future behaviors has
long been recognized, and is not restricted to the social
sciences (Freeman 2004).
Social network analysis (SNA) has a rich history that is
based on the defining principle that links between actors
are informative. The advent and availability of Internet
technology has created an explosion in online social net-
works and a transformation in SNA. The analysis of
today’s social networks is a difficult Big Data problem,
which requires the integration of statistics and computer
science to leverage networks for knowledge mining and
discovery (Manyika et al. 2011). Historically, scientists
have had to rely on tractable records of social interactions
and experiments (e.g., Milgram’s small world experiment);
now they have the luxury of accessing huge digital databases
of relational social data. SNA relies on diverse data rep-
resentations and relational information, which may include
(among others), tracked relationships among actors, events,
and other covariate information (Scott and Carrington
2011). Modeling social networks is especially challenging
due to the heterogeneity of the populations represented, and
the broad spectrum of information represented in the data
itself. Modern applications of SNA include, among others,
the estimation of influence, privacy protection, trust (and
distrust), microblogging, and web browsing.
Rachael Hageman Blair (hageman@buffalo.edu)
1 Department of Industrial and Systems Engineering, University at Buffalo, Buffalo, USA
2 Department of Computer Science and Engineering, University at Buffalo, Buffalo, USA
3 Department of Biostatistics, University at Buffalo, Buffalo, USA
Soc. Netw. Anal. Min. (2015) 5:62
DOI 10.1007/s13278-015-0289-6
In this review, we focus on probabilistic graphical models
(PGMs), which have demonstrated promise in modeling
social networks (Lauritzen 1996; Koller and Friedman
2009). PGMs represent a marriage between graph theory and
probability theory that offers flexible modeling paradigms
with good interpretability. The graphical representation consists
of nodes connected by edges, which may be directed
(Bayesian networks) or undirected (Markov networks). The
relationship between nodes in a graph can be interpreted in
terms of conditional independencies. These independencies
can be read directly from the graph and enable a
tractable decomposition of the joint distribution
through the use of conditional probabilities. In this setting,
a high-dimensional joint distribution of random variables
$\{X_1, X_2, \ldots, X_p\}$ can be represented compactly and
explicitly in a factorized form that has a graphical
interpretation rooted in conditional independencies.
A powerful feature of the PGM modeling paradigms is the
ability to perform probabilistic queries and reasoning on the
graph at multiple levels (Koller and Friedman 2009). Queries
of interest may include the estimation of probabilities (joint,
conditional, or marginal), reasoning about variables in light
of new evidence (causal, evidential, and inter-causal reasoning),
and quantitative predictions through the use of the
graph as a generator in simulations. Another attractive fea-
ture of PGMs is their inherent flexibility to model variables
that follow different distributions, and the ability to bring a
priori information into the learning process.
In this review, we outline the basic theory of PGMs,
along with parameter and structural learning. The topic
of PGMs is extremely rich in content and theory. Several
existing surveys on the topic of graphical models are
similar in spirit (e.g., Goldenberg et al. 2010; Daud
et al. 2010; Salter-Townshend et al. 2012; Srihari 2014).
Our review differs from existing reviews in both style and
content. One distinguishing feature is that we illustrate the
different modeling paradigms using accessible and simple
models. Simple examples facilitate a connection between
theory and practice. Once this connection is established, we
highlight more complex recent applications in SNA that
differ from each other in both the nature of the data and
objectives of the modeling. These applications reveal the
inherent flexibility of PGMs to model a broad spectrum of
data that target relevant open challenges and questions in
SNA. We address both directed PGMs, known as Bayesian
networks (BNs), and undirected PGMs, known as Markov
networks, in Sects. 2 and 3, respectively. In Sect. 4, some
of the current challenges are highlighted, comparisons
between directed and undirected paradigms are made, and
future directions and opportunities for PGM-based research
in SNA are discussed.
2 Directed probabilistic graphical models
Bayesian networks (BNs) are a special class of PGMs that
capture directed dependencies between variables, which
may represent cause and effect relationships. The edges in
a BN form a directed acyclic graph (DAG). The DAG
architecture conveys a critical modeling assumption that
there is no feedback via cycles in the graph. BNs obey the
Markov assumption, which states that each variable, $X_i$, is
independent of its non-descendants (unconnected nodes),
given its parents in G. Taken together, these assumptions
enable the compact representation of the high-dimensional
joint probability distribution of the variables in the model.
Despite their flexibility, the use of directed graphs in SNA
has been somewhat limited, although the applications that
we highlight are diverse. We describe the basic principles
of these directed PGMs and motivate them with applica-
tions in the literature, which showcase their utility in SNA.
Static Bayesian Networks Our major focus is static BNs,
which utilize data from a single snapshot of a social
community at a given time point. A DAG conveys precise
information regarding the conditional independencies
between modeled variables (nodes). For a set of random
variables $\{X_1, X_2, \ldots, X_n\}$, a BN is a network whose structure
encodes conditional independence relationships:
$$P(X_1, X_2, \ldots, X_n) = P(G) \prod_{i=1}^{n} P(X_i \mid pa(X_i), \Theta_i), \qquad (1)$$
where P(G) is the prior distribution over the graph G,
$pa(X_i)$ are the parent nodes of child $X_i$, and $\Theta_i$ denotes the
parameters of the local probability distribution. The prior
comes into play only when an expert cannot describe the
graph and structural learning is required. Structures and
relationships that are more likely (and less likely) can be
embedded into P(G) to influence searches through the
posterior model space.
A simple and fully parameterized BN for a course at a
University is shown in Fig. 1. This network can be viewed
as a template model in which different sections of a course
are taught by different professors ($Pr \in \{p0, p1\}$). Similar
templates can be used for various courses, e.g., Calculus,
Introduction to Chemistry, etc. In this example, different
teaching assistants (TAs) vary in their teaching effectiveness
($TA_q \in \{Poor, Fair, Good\}$), and their own grade in
the course influences their overall ability to convey the
material in the class well ($TA_g \in \{A, B, C\}$). A student’s
grade ($Grade \in \{A, B, C\}$) is caused by their intelligence
($\{i0, i1\}$), the professor of the course, and the grade of the
TA. Finally, if a professor is asked to write a letter of
recommendation for a particular student, for simplicity,
this may be based on the performance of the student in the
course and the professor (e.g., some are more prone to
write positive letters). This scenario, although overly sim-
plistic, may hold in large classroom settings where the
teacher does not get to know the individual students well.
The Markov assumption is apparent through the condi-
tional probability tables (CPDs) for each node, which
depend only on the parents. The head nodes (top layer)
have no parents, thus the CPD table is simply a marginal
probability that will sum to one across the different states
of the discrete variable. On the other hand, nodes with
parents are conditional on the possible combinations of the
parent states. It is evident that even for a small number of
parents, and a small number of states for those parents, the
CPD tables can grow quickly. In our example, $P(G \mid I, Pr, TA_g)$
has 12 possible parent configurations (or scenarios, $2 \times 2 \times 3$) that can
occur and would influence the grade.
Our toy example is an expert system that was written
down according to knowledge about the modeling domain.
Importantly, there are many scenarios that may be more or
less realistic, which may include changing some edges or
the addition of new variables. Nonetheless, it is often the
case that a network structure can be accurately described
by an expert. When a structure is prescribed for a BN,
parameter learning is still required. This comes in the form
of CPD tables for discrete distributions (Fig. 1). In our
simple example, the probabilities for the CPD table may
have been extracted from teaching evaluations, grades, or
other means.
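As a rough illustration of how local CPD tables combine into the joint distribution, the sketch below transcribes the root-node and Grade tables from Fig. 1 and computes the marginal distribution of the student's grade by summing the factorized joint over the parent states. The variable names and the assumed ordering of the Grade table columns (intelligence, then professor, then TA grade) are ours and are not taken from the original figure.

```python
from itertools import product

# CPD values transcribed from Fig. 1; the column-to-state mapping is an assumption.
p_intelligence = {"i0": 0.35, "i1": 0.65}
p_professor = {"p0": 0.5, "p1": 0.5}
p_ta_grade = {"A": 0.6, "B": 0.35, "C": 0.05}

# P(Grade | Intelligence, Professor, TA grade): one (A, B, C) row per parent configuration.
grade_rows = [
    (0.10, 0.20, 0.70), (0.10, 0.25, 0.65), (0.20, 0.30, 0.50),  # i0, p0, TA grade A/B/C
    (0.20, 0.40, 0.40), (0.20, 0.40, 0.40), (0.20, 0.30, 0.50),  # i0, p1
    (0.40, 0.20, 0.40), (0.40, 0.30, 0.30), (0.70, 0.20, 0.10),  # i1, p0
    (0.50, 0.30, 0.20), (0.60, 0.30, 0.10), (0.65, 0.25, 0.10),  # i1, p1
]

# All parent configurations, in the same order as the rows above (last parent varies fastest).
parent_configs = list(product(p_intelligence, p_professor, p_ta_grade))
p_grade_given_parents = {
    config: dict(zip("ABC", row)) for config, row in zip(parent_configs, grade_rows)
}


def joint(i, pr, ta_g, g):
    """P(I, Pr, TA_g, G) as a product of local models; the leaf nodes TA_q and L are omitted."""
    return (p_intelligence[i] * p_professor[pr] * p_ta_grade[ta_g]
            * p_grade_given_parents[(i, pr, ta_g)][g])


# Marginal P(Grade): sum the factorized joint over every parent configuration.
p_grade = {g: sum(joint(i, pr, t, g) for i, pr, t in parent_configs) for g in "ABC"}
print(len(parent_configs), p_grade)
```

The number of parent configurations grows as the product of the parents' state counts, which is why CPD tables become large even for modest numbers of parents.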
[Figure 1: Simple Bayesian Networks]
Panel A: Simple Parameterized Static Bayesian Network
Nodes: Intelligence (I), Professor (Pr), TA Grade (TA_g), TA Quality (TA_q), Grade (G), Letter (L)
Joint distribution: P(I, Pr, TA_g, G, TA_q, L) = P(I) P(Pr) P(TA_g) P(G | I, Pr, TA_g) P(TA_q | TA_g) P(L | Pr, G)

Intelligence:   i0 = 0.35   i1 = 0.65
Professor:      p0 = 0.5    p1 = 0.5
TA Grade:       A = 0.6     B = 0.35    C = 0.05

TA Quality given TA Grade
TA Quality   A      B      C
Good         0.3    0.3    0.1
Fair         0.5    0.6    0.8
Poor         0.2    0.1    0.1

Grade given Intelligence, Professor, TA Grade
I    Pr   TA Grade   Grade A   Grade B   Grade C
i0   p0   A          0.1       0.2       0.7
i0   p0   B          0.1       0.25      0.65
i0   p0   C          0.2       0.3       0.5
i0   p1   A          0.2       0.4       0.4
i0   p1   B          0.2       0.4       0.4
i0   p1   C          0.2       0.3       0.5
i1   p0   A          0.4       0.2       0.4
i1   p0   B          0.4       0.3       0.3
i1   p0   C          0.7       0.2       0.1
i1   p1   A          0.5       0.3       0.2
i1   p1   B          0.6       0.3       0.1
i1   p1   C          0.65      0.25      0.1

Letter given Professor, Grade
Pr   Grade   Very Good   Good   Neutral
p0   A       0.9         0.1    0
p0   B       0.25        0.75   0
p0   C       0           0.1    0.9
p1   A       0.6         0.4    0
p1   B       0.2         0.4    0.4
p1   C       0           0      0.95

Panel B: Time-slices from a Dynamic Bayesian Network
True Network (Student A, Student B); Unrolled BN (two time points: time point k, time point k+1)

Fig. 1 a Simple example of a parameterized Bayesian Network of a University multi-section course. In this scenario, a student may enroll in a course taught by different Professors. A student’s Grade is influenced by the student’s Intelligence, the Professor, and the effectiveness of their TA as measured by the TA grade in the course. The TA Quality is a direct reflection of the TA’s mastery of the material (their grade). The quality of a student’s Letter of recommendation is dependent on the grade in the course, and also the professor. b Time slices from a dynamic Bayesian network of two students that interact in a study group. The true network is given to the left, which includes both feedback and a cycle (prohibited in static BNs). Relationships can be unrolled in a DBN across discrete time points (right)

As demonstrated in our simple example, each child node
is dependent on its parent nodes. The parameter learning
can be viewed as a local model or distribution that involves
only the child and the parents. These local models are the
building blocks of the graphical model and they make up
the factors in the product for the joint distribution (Eq. 1).
When the variables in the model are continuous, the
specification of a local model requires distribution parameters.
For example, if $\{X_1, X_2, X_3\}$ are Gaussian, and
$X_1 \rightarrow X_2 \leftarrow X_3$, then a local model would be of the form
$X_2 \sim N(\beta_0 + \beta_1 X_1 + \beta_2 X_3, \sigma^2)$. Therefore, in the
continuous case, the local model can be viewed as a regression
on the parents. Another popular local model is a conditional
Gaussian Bayesian network (CG-BN), which gives
rise to regressions in which a continuous child node may
have parents that are either discrete or continuous (Lauritzen
1996). To enable factorization, CG-BNs prohibit discrete
children from having continuous parents.
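The regression view of a continuous local model can be illustrated by using the graph as a generator. The sketch below assumes a three-node linear Gaussian network in which $X_2$ is the child of $X_1$ and $X_3$; the coefficient values, the root-node distributions, and the function name are hypothetical choices made only for illustration.

```python
import random
import statistics


def sample_linear_gaussian(n, beta0=1.0, beta1=2.0, beta2=-0.5, sigma=0.3, seed=0):
    """Ancestral sampling from X1 -> X2 <- X3 with a linear Gaussian local model for X2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)  # root node, hypothetical N(0, 1)
        x3 = rng.gauss(1.0, 1.0)  # root node, hypothetical N(1, 1)
        # Child node: a regression on its parents plus Gaussian noise.
        x2 = rng.gauss(beta0 + beta1 * x1 + beta2 * x3, sigma)
        rows.append((x1, x2, x3))
    return rows


samples = sample_linear_gaussian(20000)
mean_x2 = statistics.fmean([row[1] for row in samples])
# Under these assumed parameters, E[X2] = beta0 + beta1 * E[X1] + beta2 * E[X3] = 0.5.
print(round(mean_x2, 2))
```

Parents are sampled before their children, which is always possible in a DAG because a topological ordering of the nodes exists.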
Structural learning is required when the network is not
known and has to be learned from the data. The objective
function for maximization is the posterior probability of a
graph, G, given by:
$$P(G \mid X) \propto P(X \mid G) P(G),$$
where P(G) is the prior on the graph. The marginal likelihood,
$P(X \mid G)$, requires complex integration over the
parameters $\theta$:
$$P(X \mid G) \propto \int P(X \mid \theta, G) P(\theta \mid G) \, d\theta,$$
which can be alleviated with the use of conjugate priors. In
an effort to accelerate the learning process and prevent
over-fitting, a fan-in assumption is typically adopted.
This limits the number of parents that a node can have
(e.g., a node can have no more than three parents). The
graph prior P(G) can be explicitly used to encourage cer-
tain relationships, and penalize against others (Mukherjee
and Speed 2008; Hageman et al. 2011). Computation relies
on the fact that each node in the network, together with the
corresponding parents, represents a local model, which can
be described by a regression. These individual regressions
have priors on their parameters, $P(\theta \mid G)$; for example, a
normal-Wishart prior can be used for nodes that follow a
normal distribution. In practice, the posterior of a graph is
calculated as a product of the local models, which is a valid
representation under the Markov assumption. Possible
local models are often pre-computed in an effort to ease the
computational demand of the learning algorithms.
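The effect of a fan-in restriction on the number of local models that must be scored can be sketched with a simple count. The function below is our own illustration (its name and arguments are hypothetical); it counts the candidate parent sets for a single node, with and without a cap on the number of parents.

```python
from math import comb


def n_parent_sets(n_nodes, max_parents=None):
    """Number of candidate parent sets for one node in a network of n_nodes variables."""
    candidates = n_nodes - 1  # every other node is a potential parent
    k = candidates if max_parents is None else min(max_parents, candidates)
    return sum(comb(candidates, j) for j in range(k + 1))


unrestricted = n_parent_sets(20)               # all subsets of the 19 other nodes
fan_in_three = n_parent_sets(20, max_parents=3)  # at most three parents
print(unrestricted, fan_in_three)
```

With a cap on the number of parents, the count grows polynomially rather than exponentially in the number of nodes, which is what makes pre-computing the local models practical.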
The structural learning problem concerns identifying a
global network that assembles these local models in an
optimal way. The process is a major challenge (NP-hard),
as the number of possible networks is super-exponential
in the number of nodes (Chickering et al. 2001). Structural
learning methods rely on sampling-based approaches
or a greedy optimization, e.g., hill climbing or simulated
annealing (Heckerman 2008). Sampling-based approaches
rely on Markov Chain Monte Carlo (MCMC) techniques
that sample the posterior distribution by moving through
model space according to a proposal distribution. The
proposal represents the modification to the current graph,
$G_{curr}$, which is then evaluated and potentially accepted as
$G_{new}$ (kept in the sample) or rejected (not kept in the
sample, another proposal is attempted) (Madigan et al.
1995). A widely used proposal for a new graph in the
Markov chain is to either add, delete, or reverse a single
edge (Fig. 2). This proposal is implemented within a
Metropolis–Hastings framework. The acceptance criterion
for a new graph is determined by:
$$\min\left\{1,\ \underbrace{\frac{P(G_{new} \mid X)}{P(G_{curr} \mid X)}}_{\text{Bayes factor}}\ \underbrace{\frac{Q(G_{curr} \mid G_{new})}{Q(G_{new} \mid G_{curr})}}_{\text{Hastings ratio}}\right\}. \qquad (2)$$
The Bayes factor gives a measure of goodness of fit of the
proposed graph relative to the current. The Hastings ratio is
simply the ratio of the neighborhood sizes of possible moves from the
current and new graphs, equivalently,
$\mathrm{Neighborhood}(G_{curr}) / \mathrm{Neighborhood}(G_{new})$ ($= 5/6$ in
Fig. 2).
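The acceptance criterion in Eq. 2 can be sketched in a few lines. The sketch assumes that a move is drawn uniformly from the single-edge neighborhood of the current graph, so that each proposal probability is the reciprocal of a neighborhood size; the function name and the use of log posterior scores are our own choices rather than part of the cited methods.

```python
import math


def mh_acceptance(log_post_new, log_post_curr, n_neighbors_curr, n_neighbors_new):
    """Metropolis-Hastings acceptance probability for a single-edge graph proposal.

    Assumes uniform proposals: Q(G_new | G_curr) = 1 / n_neighbors_curr and
    Q(G_curr | G_new) = 1 / n_neighbors_new.
    """
    # Hastings ratio Q(G_curr | G_new) / Q(G_new | G_curr), on the log scale.
    log_hastings = math.log(n_neighbors_curr) - math.log(n_neighbors_new)
    # Posterior ratio P(G_new | X) / P(G_curr | X), on the log scale.
    log_ratio = (log_post_new - log_post_curr) + log_hastings
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


# Equal posterior scores with neighborhood sizes 5 (current) and 6 (new):
# the acceptance probability reduces to the Hastings ratio.
alpha = mh_acceptance(0.0, 0.0, n_neighbors_curr=5, n_neighbors_new=6)
accept = alpha >= 1.0 or alpha > 0.5  # in a sampler, compare alpha with a uniform(0, 1) draw
```

Working on the log scale avoids numerical underflow when the marginal likelihoods are very small, which is common for networks with many nodes.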
The directionality and causal structure of the inferred
model make BN an attractive modeling paradigm for social
networks that capture cause and effect relationships.
Screen-based Bayes net structure (SBNS) was developed as
a search strategy for large-scale data, which relies on the
adopted assumption of sparsity in the overall network
structure (Goldenberg and Moore 2004). SBNS enforces
the sparsity through a two-stage process, which frames the
structural learning problem as a market basket analysis task.
The algorithm relies on the theory of frequent sets and
support, to first screen for local modules of nodes, and then
connect them through a global structure search. The market
basket framework lends itself to transaction style data,
which is by nature large, sparse and binary. The learning
problem is to identify an influence graph based on derived
features of the binary transaction data. In this case, actors
are assumed to be linked to each other indirectly through
items or events. A simple example of individuals linked
through a conference is shown in Fig. 3a. In this example,
the conference attendance (transaction items) can be used
to infer a network of social influence between individuals,
which adds insights into the social hierarchy that are not
apparent in classical interaction networks. The method was
shown to be effective for modeling a variety of SNs,
including citation networks, collaboration data, and movie
appearance records (Berry and Linoff 1997).
Koelle et al. proposed applications of BNs to SNA for
the prediction of novel links and pre-specified node fea-
tures (e.g., leadership potential) (Koelle et al. 2006). The
authors emphasize the advantage of BN to account for
uncertainty, noise, and incompleteness in the network. For
example, topology-based network measures such as degree
centrality, which is often used as a surrogate for importance,
are subject to summarizations over incomplete
and sometimes erroneous data. Comparatively, a BN
affords more flexibility that enables measures such as importance
to be estimated in a more data-dependent manner.
Koelle et al. provide an example of combining topology-
based network measures with covariate information
(Fig. 3b). Directed inference of this type leverages small
local models, which can be naturally translated to regression
or classification problems, depending on the child
node (response variable). In this setting, the local BN can
be evaluated at the node-level, ranked probability estimates
can be used for predictive purposes, and the output serves
as a surrogate for model fit on a given structure.

[Figure 2: Structural Learning: Example of Graph Proposals in MCMC. Panels show the current graph (G_curr) and the new graph (G_new) over nodes X1, X2, X3, each with its neighborhood of graphs reachable by a single edge addition, deletion, or reversal; proposal probabilities 1/6 and 1/5]
Fig. 2 A widely used proposal in an MCMC sampling for structural learning in a BN is the addition, reversal, or deletion of an edge from the current graph, $G_{curr}$, to form a new graph, $G_{new}$. The proposal move is selected at random from models in the neighborhood with a probability. The Metropolis–Hastings acceptance criteria are a function of the neighborhood size for the current and proposed graphs and the overall fit of those graphs to the data as measured by the Bayes factor

[Figure 3: Bayesian Network Applications]
Panel A: Sparse Bayesian Influence. Individuals (Anne, Bill, Cal, Doug, Eden) linked through events (Conference 1, Conference 2, Conference 3); inferred social influence
Panel B: Local Bayesian Network Prediction. Node categories: centrality measure, attribute, derived metric. Nodes: Sex, Education, Religion, Degree Centrality, Link Certainty, Individual Importance, Future Leadership Potential
Panel C: Bayesian models of Twitter Queries. Node types: microblogger, tweet, reply, retweet, hashtag, web resource. Edge types: following, mentioning, tagging, publishing, re-tweeting, sharing. Tweet "layered" search over Microbloggers (m1, m2), Tweets (t1, t2, t3), Terms (k1, k2, k3), Query (Q)
Panel D: Click Modeling. Session level and query level; observed and latent variables $E_{t-1}$, $E_t$, $E_{t+1}$, $A_i$, $a_u$, $s_u$, $S_i$, $C_i$ (E: Examination, A: Attraction, S: Satisfaction)
Fig. 3 Simplified schematics of select examples of Bayesian networks in social networks. a Inferring influence based on transaction style data that links actors to events. b DAGs can utilize network features such as attributes and centrality measures on the network itself to predict derived metrics, e.g., individual importance or leadership potential. c Twitter is a microblogging community, which can be queried using a retrieval model that is based on a Bayesian network. d An application of DBNs for click modeling in a browser. The temporal dimension is a click sequence that connects the examination of webpages with attraction and satisfaction
Privacy protection is a major concern amongst users in
online social networks. Generally, people prefer that their
personal information is shared in small circles of friends
and family, and shielded from strangers. Despite this
common desire, relatively simple BNs have been shown to
be successful in the invasion of privacy through the inference
of personal attributes, which have been shielded
through privacy settings (He et al. 2006). The BNs operate
under the often accurate assumption that friends in social
circles are likely to share common attributes. In 2006, the
recommendation by He et al. to improve privacy was to hide
friend lists through privacy settings, and to request that
friends hide their personal attributes. Practically speaking,
setting the optimal privacy settings is complex, and can be
tedious and difficult for an average user (Lipford et al.
2008). In 2010, a privacy wizard template was proposed,
which automates a person’s privacy settings based on an
implicit set of rules derived using Naive Bayes (the simplest
BN) or decision tree methods (Fang and LeFevre 2010).
On the other side of the application spectrum, BNs are
useful for recommending products and services to users,
taking into account their interests, needs, and communication
patterns. Belief propagation has been used to summarize
belief about a product and propagate that belief
through a BN (Ayday and Fekri 2010; Yang et al. 2013).
Belief propagation is the process in which node marginal
distributions (beliefs) are updated in light of new evidence.
In the case of a BN, evidence (e.g., opinion or ratings) is
absorbed and propagated through a computational object
known as a junction tree, resulting in updated marginal
distributions. Comparing the network marginals before and
after evidence is entered and propagated conveys a system-
wide effect of influence(s), and insights into how perception
or ratings change when recommendations are passed
through a network. Despite its simplicity, the BN approach
has been shown to be competitive with the more classical
collaborative filtering (CF)-based recommendation. Trust
(and distrust) can be highly variable dynamic processes,
which depend not only on distance from a recommender,
but also on the characteristics of the network users (Wang and
Vassileva 2003; Kuter and Golbeck 2007). Accounting for
trust in recommendation systems is an open area of research.
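The idea of comparing marginals before and after evidence is entered can be sketched on a toy network. The example below is purely hypothetical: it uses a three-node chain (product quality, a friend's rating, a user's rating) with made-up probabilities, and it computes marginals by brute-force enumeration rather than with a junction tree, which is only feasible for very small networks.

```python
from itertools import product

# Hypothetical chain BN: Quality -> FriendRating -> UserRating (binary: 0 = low, 1 = high).
p_quality = {0: 0.4, 1: 0.6}
p_friend_given_quality = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # [quality][friend]
p_user_given_friend = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}     # [friend][user]
NAMES = ("quality", "friend", "user")


def joint(q, f, u):
    """Joint probability as the product of the three local models."""
    return p_quality[q] * p_friend_given_quality[q][f] * p_user_given_friend[f][u]


def marginals(evidence=None):
    """Node marginals by enumeration, optionally conditioned on observed states."""
    evidence = evidence or {}
    totals = {name: {0: 0.0, 1: 0.0} for name in NAMES}
    normalizer = 0.0
    for states in product((0, 1), repeat=3):
        assignment = dict(zip(NAMES, states))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(*states)
        normalizer += p
        for name in NAMES:
            totals[name][assignment[name]] += p
    return {n: {s: totals[n][s] / normalizer for s in (0, 1)} for n in NAMES}


prior = marginals()                    # beliefs before any evidence
posterior = marginals({"friend": 1})   # beliefs after observing a high friend rating
print(prior["user"][1], posterior["user"][1])
```

The difference between `prior` and `posterior` shows how a single observed rating shifts beliefs about both the upstream node (quality) and the downstream node (the user's rating).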
Microblogging networks represent another effective
venue for rapidly disseminating information and influence
throughout a community. Twitter is the most well-known
microblogging network, in which posts (tweets) are short
and time-sensitive with respect to the reference of current
topics (Kwak et al. 2010). Users within microblogging
networks of this type participate through the act of following
and being followed, which gives rise naturally to directed
associations (Java et al. 2007). With over 50 million tweets
submitted daily, ranking and querying microblogs has
become an important and active area of open research.
Jabeur et al. proposed a retrieval model for tweet searches,
which takes into account a number of factors, including
hashtags, influence of the microbloggers, and the time
(Jabeur et al. 2012a, b). A query relevance function was
developed based on a BN that leverages the PageRank
algorithm to estimate parameters, such as influence, in the
model (Fig. 3c). The retrieval model was shown to out-
perform traditional methods for information retrieval on
Twitter data from the TREC Tweets 2011 corpus (Ounis
et al. 2011).
Dynamic Bayesian Networks The static BNs described
depict a network at a single time point. This is most often an
oversimplification of the true nature of the network, which is
inherently dynamic. Modeling the dynamics of a network
over the time-course can be achieved in the BN framework
with additional modeling assumptions. Dynamic Bayesian
networks (DBNs) provide compact representations for
encoding structured probability distributions over arbitrarily
long time-courses (Murphy 2002). State-space models, such
as hidden Markov models (HMMs) and Kalman filter models
(KFMs), can be viewed as a special class of the more general
DBN. Specifically, KFMs require unimodal linear Gaussian
assumptions on the state-space variables. HMMs do not
allow for factorizations within the state-space, but can be
extended to hierarchical HMMs for this purpose. Overall,
DBNs enable a more general representation of sequential or
time-course data.
DBN modeling is achieved through the use of template
models, which are instantiated, i.e., duplicated, over multiple
time points. The relationships between the variables within a
template are fixed, and represent the inherent dependencies
between ground variables in the model. There are three types
of edges in a DBN. Intra-time slice edges represent depen-
dencies within a time slice. Persistent edges link the same
variable in two time slices; for example, the velocity of a
vehicle at a time slice is very dependent on the velocity of the
vehicle in the previous time slice. Finally, inter-time slice
edges connect different variables between time slices, for
example, the velocity of a car may also be influenced by
weather at the previous time slice.
The objective is to model a template variable over a
discretized time-course, $X_0, \ldots, X_T$, and represent
$P(X_{0:T})$ as a function of the templates over the range of time
points. Reducing the temporal problem to conditional
template models makes the problem computationally
tractable, but requires the specification of a fixed structure
across the entire time trajectory. In a DBN, the probability
for a random variable X spanning the time-course can be
given in factored form,
$$P(X_{0:T}) = P(X_0) \prod_{t=0}^{T-1} P(X_{t+1} \mid X_t),$$
where $X_0$ represents the initial state, and the conditional
probability terms of the form $P(X_{t+1} \mid X_t)$ convey the
conditional independence assumptions. The conditional
representation of the likelihood is similar in spirit to the
static BN representation, but conveys the conditional
independence with respect to time. The Markov assumption
enables this factorization, which has different, yet
analogous meanings in static and dynamic BNs. In a DBN,
the Markov assumption explains the memoryless property,
i.e., that the current state depends on the previous and is
conditionally independent of the past $(X_{t+1} \perp X_{0:t-1} \mid X_t)$.
Comparatively, in static BNs, the Markov assumption only
captures nodes’ independence of their non-descendants,
given the states of their parents.
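The factored form above can be evaluated directly for a discrete state sequence. The sketch below assumes a single template variable with two hypothetical states and made-up initial and transition probabilities; it accumulates the log of the initial term and of each transition term.

```python
import math


def sequence_log_prob(states, initial, transition):
    """log P(X_0:T) = log P(X_0) + sum over t of log P(X_{t+1} | X_t)."""
    logp = math.log(initial[states[0]])
    for prev, nxt in zip(states, states[1:]):
        logp += math.log(transition[prev][nxt])
    return logp


# Hypothetical two-state template variable (e.g., a user's activity level).
initial = {"engaged": 0.6, "idle": 0.4}
transition = {
    "engaged": {"engaged": 0.7, "idle": 0.3},
    "idle": {"engaged": 0.4, "idle": 0.6},
}

path = ["engaged", "engaged", "idle"]
prob = math.exp(sequence_log_prob(path, initial, transition))
```

The same transition table is reused at every time step, which reflects the fixed template structure that the DBN requires across the time trajectory.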
Briefly, the learning paradigms are rather similar.
Structural learning is typically achieved by the same
scoring strategies, but with the added constraint that the
structure must repeat over time (Friedman et al. 1998).
Such a constraint alleviates the computational burden for
search strategies. Additionally, the best initial structure can
be searched for independently from the remainder of the
time-course. The search is performed either through greedy
hill climbing or sampling.
A major advantage of DBNs is that they can be enriched
to accommodate more complex interactions that would
violate DAG assumptions in a static BN. Figure 1b shows a
simple example of a common situation where the true
network has feedback in the form of self-loops and a cycle
in the graph. This feedback is prohibited in a static BN, but
can be captured in a DBN. In this scenario, two students
form a study group, but also self-study, in an effort to
improve learning outcomes. The unrolled BN captures
these relationships over two time slices using persistent
and inter-time slice edges. These relationships are
preserved over a time series (e.g., a semester), thereby
forming a template model.
Despite the fact that social networks are inherently
dynamic, the applications of DBNs in SNA have been
limited. Importantly, there have been many attempts to
model social networks probabilistically over time, but not
in the strict PGM context, which is the focus of this review;
many of these advances are mentioned in the discussion.
Chapelle et al. used DBNs to model web users’ browsing
history (Chapelle and Zhang 2009). The DBN extends the
traditional and widely used cascade model for browsing
behavior. The dynamics of click sequences for a single click
(Fig. 3d) take into account the information at the query
and session levels, differentiating perceived/actual attraction
($a_u$ and $A_i$, respectively) and perceived/actual satisfaction
($s_u$ and $S_i$, respectively) with links. At each click
(time-step), the hidden binary variables for examination
($E_i$) and satisfaction ($S_i$) track the time progression to
predict future clicks. The DBN approach was shown to
outperform traditional methods, and highlighted the sen-
sitivity of click modeling to measures of relevance and
popularity at the query level.
Meetings can be viewed as social events, in which
valuable information is exchanged mainly through speech.
Effectively processing, capturing, and organizing this
information can be costly, but important in order to max-
imize the impact and information flow for participants.
Dielmann et al. cast the problem of meeting structuring as a
DBN, which partitions meetings into sequences of actions
or phases based on audio (Dielmann and Renals 2004).
DBNs outperformed baseline HMMs in detecting meeting
actions in a smart room, such as dialog, notes at the board,
computer presentations, and presentations at the board.
Twitter and microblogs, in general, have become a
major resource for the media to learn of breaking news or
the occurrence of a critical event. Recently, Sakaki et al.
showed that tweet modeling via Kalman filtering is
effective for the prediction of earthquakes of a certain
magnitude in Japan. Furthermore, they developed a
reporting system, Toretter, which is quicker than the
existing government reporting system in warning registered
individuals through email of an impending quake (Jansen
et al. 2009).
3 Undirected probabilistic graphical models
Markov networks (MNs), also known as Markov random
fields (MRFs), are PGMs with undirected edges. Similar to
directed BNs, a MN is a representation of the joint prob-
ability distribution between random variables (represented
by nodes), where the absence of an edge between two
nodes implies conditional independence between the
nodes, given the other nodes in the network. In this review,
we restrict our focus to MNs, Markov logic networks
(MLNs) and exponential random graph models (ERGMs),
which can be viewed as generalizations of the random
graphs (Frank and Strauss 1986), and are widely used in
SNA (Newman et al. 2002). The basic formulation of these
models and their utility in SNA will be highlighted.
Markov networks express the joint probability of given
random variables by decomposing the network into smaller
complete sub-graphs known as cliques, and use maximal
cliques to capture the random variables’ dependencies. A
clique is a maximal clique if it cannot be extended to
include additional adjacent nodes. The clique representa-
tion enables a compact factorization of the probability
density function (pdf). The joint pdf of n random variables,
$X = \{X_1, \ldots, X_n\}$, with conditional (in)dependencies captured
by a graph, can be expressed as:
$$P(X) = \frac{1}{Z} \prod_{C \in \Omega} \psi_C(X_C), \qquad (3)$$
where $C$ is a maximal clique in the set of maximal cliques
$\Omega$. Let $X_C$ denote a subset of $X$ comprised of the random
variables that form clique $C$. Clique potential $\psi_C(X_C)$ is a
function of these variables (e.g., the frequency of distinct
realizations of the random variables forming the clique). A
unique clique potential can be specified for each clique; the
size of a clique can be one determinant of the corresponding
clique potential; however, cliques of the same
size may have different clique potentials. The clique
potentials are positive functions that capture the dependence
of the variables within the cliques (Koller and
Friedman 2009). The normalizing constant, also known as
the partition function, is given as:
$$Z = \sum_{X \in \chi} \prod_{C \in \Omega} \psi_C(X_C).$$
Each clique potential in a MN is specified by a factor,
which can be viewed as a table of weights for each com-
bination of values of variables in the potential. In some
special cases of MNs such as log-linear models (Murphy
2012), clique potentials are represented by a set of functions,
termed features, with associated weights (i.e.,
$\theta_C \phi_C(X_C) = \log(\psi_C(X_C))$, where $\phi_C(X_C)$ is a feature
derived from the values of the variables in set $X_C$) and $\theta_C$ is
the weight of $\phi_C(X_C)$ estimated at the parameter learning
stage.
The Hammersley–Clifford theorem specifies the condi-
tions under which a positive probability distribution can be
represented as a MN. Specifically, the given representation
(Eq. 3) implies conditional independencies between the
maximal cliques and is, by definition, a Gibbs measure
(Murphy 2012).
A simple example of the use of MNs to study the collective
behavior in a social network is shown in Fig. 4.
Suppose each member of a friendship network (Fig. 4a) is
looking to make a decision about purchasing a given pro-
duct, ‘liking’ a post in an online forum, supporting a
political party, participating in a school activity or choos-
ing a family doctor. In this setting, the random variables in
the nodes are binary and the edges indicate pairwise
dependencies between them. Let $X_1, \ldots, X_5$ denote five
random variables, each of which takes on the value of 1 or
0 to signal the member's attitude, Like or Dislike,
respectively. Figure 4b depicts the MN with three cliques
$a$, $b$ and $c$, $\Omega = \{a, b, c\}$, with $X_a = \{X_1, X_2, X_3\}$,
$X_b = \{X_1, X_5\}$ and $X_c = \{X_4, X_5\}$. The MN includes three clique
potentials, $\psi_a(X_1, X_2, X_3)$, $\psi_b(X_1, X_5)$ and $\psi_c(X_4, X_5)$,
satisfying the requirements of the Hammersley–Clifford
theorem under the assumption that a member's decision
can be affected by only their immediate friends and that it
matters if those friends are also friends with each other.
The joint probability function of $X_1, \ldots, X_5$ is expressed as:
$$P(X_1, \ldots, X_5) = \frac{1}{Z} \psi_a(X_1, X_2, X_3)\, \psi_b(X_1, X_5)\, \psi_c(X_4, X_5),$$
where $Z = \sum_{X_1, X_2, \ldots, X_5} \psi_a(X_1, X_2, X_3)\, \psi_b(X_1, X_5)\, \psi_c(X_4, X_5) = 3 \times 1 \times 13 + \cdots + 9 \times 10 \times 19$ (the summation is taken
over all the $2^5$ distinct realizations of the model's variables;
the first explicitly written term above corresponds to the
case where the variables are all zeros, and the last term
corresponds to the case where the variables are all ones).
Fig. 4 a A social network and a corresponding MN model (b). The nodes of the MN are actors' decisions and the variable dependencies are defined based on the ties in the social network. The three tables show the frequencies of hypothetical (observed) Like and Dislike combinations. c A factor graph for the MN in b
To parameterize the MN and obtain a log-linear model, let
$\theta_a \phi_a(X_1, X_2, X_3) = \log(\psi_a(X_1, X_2, X_3))$, $\theta_b \phi_b(X_1, X_5) = \log(\psi_b(X_1, X_5))$ and $\theta_c \phi_c(X_4, X_5) = \log(\psi_c(X_4, X_5))$. With
$\Theta = \{\theta_a, \theta_b, \theta_c\}$, the joint probability function defined by
the MN becomes:
$$P(X_1, \ldots, X_5) = \frac{1}{Z(\Theta)} \exp\{\theta_a \phi_a(X_1, X_2, X_3) + \theta_b \phi_b(X_1, X_5) + \theta_c \phi_c(X_4, X_5)\},$$
where $Z(\Theta)$ is the partition function. The set of the model
parameters, $\Theta$, would need to be estimated from any
available data of known decisions made by the social
network members. Figure 4c depicts a factor graph corresponding
to the MN. Factor graphs are bipartite graphs
used to specify the factorization of the probability distribution
function, and also, to inform the computation of
marginal probability distributions of MN variables (Murphy
2012).
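The five-member example can be evaluated by brute force, as in the following minimal sketch. Only the all-zeros entries (3, 1, 13) and all-ones entries (9, 10, 19) of the potential tables follow the two terms of $Z$ quoted in the text; every other table entry is a hypothetical placeholder, since the frequency tables of Fig. 4 are not reproduced here. The function names are our own.

```python
from itertools import product

# Clique potentials stored as lookup tables.
# Entries other than all-zeros / all-ones are hypothetical placeholders.
psi_a = {x: 2.0 for x in product((0, 1), repeat=3)}   # clique {X1, X2, X3}
psi_a[(0, 0, 0)] = 3.0
psi_a[(1, 1, 1)] = 9.0
psi_b = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 10.0}   # {X1, X5}
psi_c = {(0, 0): 13.0, (0, 1): 4.0, (1, 0): 4.0, (1, 1): 19.0}  # {X4, X5}


def unnormalized(x):
    """Product of the three clique potentials for x = (x1, ..., x5)."""
    x1, x2, x3, x4, x5 = x
    return psi_a[(x1, x2, x3)] * psi_b[(x1, x5)] * psi_c[(x4, x5)]


# Partition function: sum over all 2^5 joint realizations.
Z = sum(unnormalized(x) for x in product((0, 1), repeat=5))


def joint(x):
    """Normalized joint probability P(X1, ..., X5)."""
    return unnormalized(x) / Z
```

Enumeration is feasible here only because there are 32 states; the number of terms in $Z$ doubles with every added binary variable.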
MN specification problems, including parameter estimation
and structure learning from data, can be quite
challenging. The main difficulty in MN parameter estimation
is that the maximum likelihood problem formulated
with Eq. 3 has no analytical solution due to the complex
expression of Z (Lee et al. 2006). The problem of finding
the optimal structure of the MN using available data,
similar to BNs, is even more challenging (Bromberg et al.
2009). Currently existing approaches to structure learning
are either constraint-based or score-based (see Koller and
Friedman 2009; Ding 2011; Schmidt et al. 2010 for more
details).
MNs have found increased utility in SNA with the
emergence of online social networks (OSNs) and digital
social media (see Bonchi et al. 2011 for a review of key
problems in SNA). The need to capture non-causal
dependencies within and between data instances (e.g.,
profile information) and observed relationships (e.g.,
hyperlinks) in these applications is exacerbated by the
presence of missing or hidden data in OSNs (Xiang and
Neville 2013). A popular problem instance in this domain,
that of user (missing) profile prediction, has been attacked
using MNs (Taskar et al. 2002; Neville and Jensen 2007).
Along with the problem of predicting missing profiles,
link prediction is among the most prominent problems in
Big Data SNA. Multiple variations of MNs that have been
used to estimate the probability that an (unobserved) link
exists between nodes include Markov logic networks,
relational Markov networks, relational Bayesian networks
and relational dependency networks (Al Hasan and Zaki
2011; Chen et al. 2013; Tresp and Nickel 2013). Detection
of community structures is another area of MN application
(Newman 2006). Communities can be discovered through
examination and subsetting (cutting) network relationships
according to labels of interest, and through the use of
weighted community detection algorithms. Social network
clustering is especially challenging in a dynamic context,
e.g. in mobile social networks (Humphreys 2007). Wan
et al. employed undirected graphical models (i.e., condi-
tional random fields) constructed from mobile user logs
that include both communication records and user move-
ment information (Wan et al. 2012).
Several generative models have been proposed, which
are motivated by MNs, and explain the effects of selection
and influence (e.g., see Aggarwal 2011). Modeling chan-
neled spread of opinions and rumors, known more generally
as diffusion modeling, is an active area of research in SNA
(Bach et al. 2012). Several applications of diffusion models
have been proposed for social networks including, but not
limited to, the spread of information (Cowan and Jonard
2004), viral marketing (Kempe et al. 2003), the spread of diseases
(Anderson and May 1979), and the spread of cooperation
(Santos et al. 2006). Given a social network, for each node,
a corresponding random variable indicates the state of the
node (e.g., product or technology adoption) and links in the
network represent dependency (Wortman 2008).
Markov logic networks employ a probabilistic frame-
work that integrates MNs with first-order logic such that
the MN weights are positive for only a small subset of
meaningful features viewed as templates (Richardson and
Domingos 2006). Formally, let $F_i$ denote a first-order logic
formula, i.e., a logical expression comprising constants,
variables, functions and predicates, and $w_i \in \mathbb{R}$ denote a
scalar weight. An MLN is then defined as a set of pairs
$(F_i, w_i)$. From the MLN, the ground Markov network,
$M_{L,C}$, is constructed (Richardson and Domingos 2006) with
the probability distribution (Tresp and Nickel 2013),
$$P(X = x) = \frac{1}{Z} \exp\left( \sum_i w_i n_i(x) \right), \qquad (4)$$
where $n_i(x)$ is the number of true groundings (e.g., true
logic expressions based on observations) of $F_i$, i.e., such
formulae that hold, in $x$.
A simple example network of five senators is shown in
Fig. 5a. In this setting, each senator supports one of two
political parties (Democratic or Republican). Each senator
in this network has two attributes: (1) political affiliation
($R(n)$ is 1 if senator $n$ is a Republican and 0 otherwise, for
$n \in \{A, B, C, D, E\}$), and (2) support for a particular bill
($S(n)$ is 1 if senator $n$ supports the bill and 0 otherwise). Let
$F(n, m)$, a binary symmetric function, denote the relationship
between senators $n$ and $m$ ($n \neq m$). Suppose
‘Republicans do not support the bill’ and ‘If two senators
have a relationship and one is Republican then so is the
other’ are two logical statements denoted by $F_1$ and $F_2$.
The first-order logic format is given as follows:
$$F_1: \forall n,\; R(n) \Rightarrow \neg S(n),$$
$$F_2: \forall n, m,\; F(n, m) \wedge R(n) \Rightarrow R(m).$$
The MLN is similar to the ground network (Fig. 5b).
However, only the combinations of variables corresponding
to logical statements, $F_1$ and $F_2$, are parameterized in
the MLN (all the other weights are zero). Let $X = \{X_1, \ldots, X_{15}\}$
denote the set of all nodes in $M_{L,C}$, where
$X_i$ indicates node $i$ (e.g., $X_1$ is the node labeled $S(A)$ in
Fig. 5b). In an MLN, clique potentials are defined similarly
to those in Markov networks. Now, one can use this MLN
to find the probability that all senators in this example
support the bill (i.e., $P(S(A) = 1, \ldots, S(E) = 1)$). To calculate
the number of true groundings ($n_1$ and $n_2$), both $F_1$
and $F_2$ should be examined for all nodes in the observed
network. More generally, this approach can be implemented
to estimate missing profiles in social networks as
well.
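Counting true groundings can be sketched as follows. The tie set, the truth assignments and the weights are all hypothetical, since the network of Fig. 5a is not reproduced here. The sketch also assumes that $F_2$ is grounded over every ordered pair of distinct senators, which is one possible convention and may differ from the count reported in the caption of Fig. 5.

```python
import math
from itertools import permutations

SENATORS = ["A", "B", "C", "D", "E"]
# Hypothetical symmetric ties (not taken from Fig. 5a).
TIES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "E")}


def related(n, m):
    """Symmetric relationship F(n, m)."""
    return (n, m) in TIES or (m, n) in TIES


def count_true_groundings(R, S):
    """Return (n1, n2) for one world given 0/1 dictionaries R and S."""
    # F1: R(n) => not S(n), one grounding per senator.
    n1 = sum(1 for n in SENATORS if not (R[n] and S[n]))
    # F2: F(n, m) and R(n) => R(m), one grounding per ordered pair n != m.
    n2 = sum(1 for n, m in permutations(SENATORS, 2)
             if not (related(n, m) and R[n]) or R[m])
    return n1, n2


def unnormalized_weight(R, S, w1, w2):
    """exp(w1 * n1 + w2 * n2), i.e., Eq. 4 without the factor 1/Z."""
    n1, n2 = count_true_groundings(R, S)
    return math.exp(w1 * n1 + w2 * n2)


# One hypothetical world: A and B are Republicans; B, C and D support the bill.
R = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
S = {"A": 0, "B": 1, "C": 1, "D": 1, "E": 0}
n1, n2 = count_true_groundings(R, S)
```

Dividing `unnormalized_weight` by its sum over all worlds yields the probability in Eq. 4; worlds that violate fewer weighted formulae receive higher probability when the weights are positive.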
Many problems in statistical relational learning, such as
link prediction (Domingos et al. 2008), social network
modeling, collective classification, link-based clustering
and object identification, can be formulated using instances
of MLN (Richardson and Domingos 2006). Dierkes et al.
used MLNs to investigate the influence of Mobile Social
Networks on consumer decision-making behavior. With
the call detail records represented by a weighted graph,
MLNs were employed in conjunction with logit models as
the learning technique based on lagged neighborhood
variables. The resulting MLNs were used as predictive
models for the analysis of the impact of word of mouth on
churn (the decision to abandon a communication service
provider) and purchase decisions (Dierkes et al. 2011).
As mentioned above, link mining and link prediction
problems can also be addressed using MLNs, since MLNs
combine logic and probability reasoning in a single
framework (Domingos et al. 2010). Furthermore, the abil-
ity of MLNs to represent complex rules by exploiting
relational information makes them an appropriate alterna-
tive for collective classification (e.g., classification of
publications in a citation network, or of hyperlinked web-
pages) (Crane and McDowell 2011).
The Ising model and its variations form a subclass of
MN with foundations in theoretical physics. The Ising
model is a discrete and pairwise MN, and is popular in
applications in part due to its simplicity (Koller and
Friedman 2009). The variables in the model, $X_1, \ldots, X_p$, are
assumed to be binary, and their joint probability is given
as:
$$P(X; \Theta) = \exp\left( \sum_{(j,k) \in E} \theta_{jk} X_j X_k - \Phi(\Theta) \right) \quad \forall X \in \chi,$$
where $\chi = \{-1, 1\}^p$, and $\Phi(\Theta)$ is the log of the partition
function
$$\Phi(\Theta) = \log \left[ \sum_{x \in \chi} \exp\left( \sum_{(j,k) \in E} \theta_{jk} x_j x_k \right) \right].$$
Fig. 5 An example of using MLN in the political science. a Depicts a social network of five senators with two attributes. The ground predicates (b) are denoted by 15 elliptical nodes. The red ones are captured by the links of the social networks and dark blue nodes indicate nodes' attributes. Two first-order logics, $F_1$ and $F_2$, determine the structure of the MLN. There exist five groundings of the $F_1$ (illustrated by the edges between the $R(x)$ and $S(x)'$ nodes) and 15 groundings of $F_2$ captured by the rest of the edges. Other examples of MLNs can be found in Tresp and Nickel (2013)
Special, efficient methods exist for learning the Ising
Model parameters from data (Ravikumar et al. 2010).
While the model was originally found useful for
understanding magnetism and phase transitions, its utility
has since expanded to image processing, neural modeling,
and studies of tipping points in economics and social
domains (Afrasiabi et al. 2013).
In SNA, the Ising model can be employed to analyze
factors such as network substructures and nodal features
affecting the opinion formation process. A classical
example is a study of the spread of a medical innovation,
namely the adoption of the drug tetracycline by 125
physicians in four small cities in Illinois (Van den Bulte
and Lilien 2001). A small subset of this network is illustrated
in Fig. 6. The adoption status of node $k$ is represented
by $X_k$, $\forall k = 1, \ldots, 5$, where $X_k = 1$ if adopted (blue
nodes) and $-1$ otherwise (black nodes). Since the Ising
model only captures pairwise dependencies between variables,
the corresponding MN only considers cliques of size
two (i.e., dyads); hence, one can concisely write the clique
potentials in the form $\psi(X_k, X_j) = X_k X_j$. Let $n^+$ and $n^-$
denote the number of agreements (i.e., cases with $X_k X_j = 1$
for some $k$ and $j$) and the number of disagreements (i.e.,
cases with $X_k X_j = -1$ for some $k$ and $j$), respectively.
Assuming $\theta_{jk} = \theta_A$ if $X_k X_j = 1$ and $\theta_{jk} = \theta_D$ if $X_k X_j = -1$,
with $\Theta = \{\theta_A, \theta_D\}$, one can obtain
$\sum_{(j,k) \in E} \theta_{jk} X_j X_k = \theta_A n^+ - \theta_D n^-$. Hence, the joint probability of all nodes' adoption
status is:
$$P(X_1, \ldots, X_5; \Theta) = \exp(\theta_A n^+ - \theta_D n^- - \Phi(\Theta)),$$
where $\Phi(\Theta) = \log \sum_{x \in \chi} [\exp(\theta_A n^+ - \theta_D n^-)]$ is the log of the partition
function. For this small example, the table in Fig. 6b
shows all clique potentials, which are either 1 or $-1$ based
on the network structure. The counts of agreements and
disagreements are obtained next (e.g., $n^+ = 5$ and $n^- = 2$
in Fig. 6b). The partition function has 32 additive terms:
each combination of $X_k$'s leads to particular values of $n^+$
and $n^-$, and all these values affect the value of $\Phi(\Theta)$.
In the model presented above, the counts of possible
agreements and disagreements depend on the network
structure, so the MN can be said to explore the impact of
homophily on tetracycline adoption decisions. Note that the
model’s parameters, h
A
and h
D
, first need to be estimated
from any given data (i.e., from a single observation of a
network of (non)adopters); however, the approaches to
such parameter estimation are beyond the scope of this
paper.
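For a network of this size, the agreement counts and the log-partition function can be computed by direct enumeration, as in the following minimal sketch. The dyad list is hypothetical (Fig. 6 is not reproduced here) and was chosen only so that it has seven ties; the parameter values passed to the functions are arbitrary rather than estimated.

```python
import math
from itertools import product

# Hypothetical dyads among five physicians (0-indexed); not taken from Fig. 6.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]


def agreement_counts(x, edges=EDGES):
    """Return (n_plus, n_minus) for a state x with entries in {-1, +1}."""
    n_plus = sum(1 for j, k in edges if x[j] * x[k] == 1)
    return n_plus, len(edges) - n_plus


def log_partition(theta_a, theta_d, edges=EDGES, p=5):
    """Phi(Theta): log of the sum over all 2^p states."""
    total = 0.0
    for x in product((-1, 1), repeat=p):
        n_plus, n_minus = agreement_counts(x, edges)
        total += math.exp(theta_a * n_plus - theta_d * n_minus)
    return math.log(total)


def joint_probability(x, theta_a, theta_d, edges=EDGES):
    """P(X; Theta) = exp(theta_A n+ - theta_D n- - Phi(Theta))."""
    n_plus, n_minus = agreement_counts(x, edges)
    phi = log_partition(theta_a, theta_d, edges, len(x))
    return math.exp(theta_a * n_plus - theta_d * n_minus - phi)


# A state in which only the last physician has not adopted.
counts = agreement_counts((1, 1, 1, 1, -1))
```

With this hypothetical tie set, the state above happens to give five agreements and two disagreements, mirroring the counts quoted for Fig. 6b.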
Figure 7 depicts the entire physicians’ advisory network
from a data set prepared by Ron Burt from the 1966 data
collected by Coleman et al. (1966) about the spread of
medical innovation. The figure illustrates the physicians’
network at two different time points and shows how
physicians changed their opinions and adopted the new
medication over time. To find the probability of adoption,
the Ising model can be modified by considering the impact
of nodal attributes on the adoption.
Recently, the Ising Model has been used to examine
social behaviors (Vega-Redondo 2007), including collec-
tive decision making, opinion formation and adoption of
new technologies or products (Grabowski and Kosiński
2006; Krause et al. 2012). For example, Fellows et al.
proposed a random model of the full network by modeling
nodal attributes as random variates. They utilized the new
model formulation to analyze a peer social network from
the National Longitudinal Study of Adolescent Health
(Fellows and Handcock 2012). Agliari et al. (2010) proposed
a model to extract the underlying dynamics of social
systems based on diffusive effects and people's strategic
choices to convince others. Through the adaptation of a
cost function, based on the Ising model, for social inter-
actions between individuals, they showed by numerical
simulation that a steady-state is obtained through natural
dynamics of social systems.
Fig. 6 An example of implementing the Ising model to find the probability of adopting a new medication. a A sub-network of a physicians' advisory network with 5 nodes. b A pairwise Markov network is constructed where the cliques with size of at most 2 are involved
Exponential random graph models (ERGMs) (Wasserman
and Pattison 1996), also known as the $p^*$-class models,
are among the most widely used approaches to
modeling social networks in recent years (Pattison and
Wasserman 1999; Robins et al. 1999, 2007a). A social
network of individuals is denoted by graph $G_s$ with $N$ nodes
and $M$ edges, $M \le \binom{N}{2}$. The corresponding adjacency
matrix of $G_s$ is denoted by $Y = [y_{ij}]_{N \times N}$, where $y_{ij}$ is a random
variable defined as follows:
$$y_{ij} = \begin{cases} 1 & \text{if there exists a link between nodes } i \text{ and } j,\ \forall i, j,\ i \neq j, \\ 0 & \text{otherwise.} \end{cases}$$
Based on an ERGM, the probability of any observed net-
work, y, is:
$$P(Y = y; \Theta) = \frac{1}{Z} \exp\left( \sum_{i=1}^{K} \theta_i f_i(y) \right), \qquad (5)$$
where $f_i(y)$, $i = 1, \ldots, K$, are called sufficient statistics
(Morris et al. 2008; Lusher et al. 2012), or motifs based on
configurations of the observed graph and $\Theta = \{\theta_1, \ldots, \theta_K\}$
is a K-vector of parameters (K is the number of different
sufficient statistics used in the model). Network configu-
rations used to compute sufficient statistics, including but
not limited to network edge count (tie between two actors),
as well as counts of 2-stars (two ties sharing an actor) and
triads of various types, are related to communication pat-
terns among actors in a social network (see Lusher et al.
2012 for more details about network configurations). The
parameters of an ERGM describe the probabilities of a
wide variety of possible configurations in social networks
(Robins et al. 2001). Again, Z is called the normalization
constant.
As an example, a social network of five individuals is
assumed. Since the edges (ties) between nodes are
considered as random variables, the given network is the
most likely realization out of many possible networks. In
this case, an ERGM constructs a probability distribution
over all possible networks with five nodes. Figure 7 illustrates
the social network (A) and the corresponding graph
where edges, $y_{ij}$, $\forall i, j = 1, \ldots, 5$, represent random variables
along with five sufficient statistics ($f_i(y)$, $i = 1, \ldots, 5$)
including edge, 2-star, 3-star, 4-star and triangle (B). The
probability distribution of any possible network is obtained
as follows:
$$P(Y = y; \Theta) = \frac{1}{Z} \exp\left( \theta_1 f_1(y) + \theta_2 f_2(y) + \theta_3 f_3(y) + \theta_4 f_4(y) + \theta_5 f_5(y) \right),$$
where $y$ is any observed network with five nodes, and
$\Theta = \{\theta_1, \ldots, \theta_5\}$, the set of weights of sufficient statistics,
is estimated by solving an optimization problem
in which the probability of the observed network is maximized.
The exact computation of the normalization constant,
$Z$, requires handling of many terms (all possible
network realizations must be considered and their corresponding
sufficient statistics calculated). This challenge is
conventionally handled using Markov chain Monte Carlo
(MCMC) sampling techniques (Snijders et al. 2006).
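For five nodes there are only $2^{10} = 1024$ undirected networks, so the normalization constant can still be computed exactly. The following minimal sketch does so for the five statistics named above. It assumes that a $k$-star count is the sum over nodes of $\binom{d}{k}$, where $d$ is the node degree; the parameter values are supplied by the caller and are not estimates from any data set, and the function names are our own.

```python
import math
from itertools import combinations, product

N = 5
DYADS = list(combinations(range(N), 2))   # the 10 possible ties (i < j)


def sufficient_stats(edges):
    """Return [edges, 2-stars, 3-stars, 4-stars, triangles].

    `edges` is an iterable of (i, j) tuples with i < j.
    """
    edge_set = set(edges)
    degree = [0] * N
    for i, j in edge_set:
        degree[i] += 1
        degree[j] += 1
    stars = [sum(math.comb(d, k) for d in degree) for k in (2, 3, 4)]
    triangles = sum(1 for a, b, c in combinations(range(N), 3)
                    if {(a, b), (a, c), (b, c)} <= edge_set)
    return [len(edge_set)] + stars + [triangles]


def log_weight(edges, theta):
    """sum_i theta_i * f_i(y) for one network."""
    return sum(t * f for t, f in zip(theta, sufficient_stats(edges)))


def partition_function(theta):
    """Z: sum of exp(log_weight) over all 2^10 five-node networks."""
    total = 0.0
    for mask in product((0, 1), repeat=len(DYADS)):
        edges = [d for d, on in zip(DYADS, mask) if on]
        total += math.exp(log_weight(edges, theta))
    return total


def probability(edges, theta):
    """P(Y = y; Theta) as in Eq. 5."""
    return math.exp(log_weight(edges, theta)) / partition_function(theta)
```

The number of terms in $Z$ is $2^{N(N-1)/2}$, which is why exact enumeration is abandoned in favor of MCMC for networks of realistic size.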
Some of the first proposed models, e.g., random graphs
and $p_1$ models (Frank and Strauss 1986), used Bernoulli
and dyadic dependence structures, which are generally
overly simplistic (Robins et al. 2007a). In contrast,
ERGMs are based on the Markov dependence assumption
(Frank and Strauss 1986), which supposes that two possible ties
are conditionally dependent when they share an actor
(node). Moreover, the Markov dependence assumption can be
extended to attributed networks, in which each node
has a set of attributes influencing the node's possible
incoming and outgoing ties (Robins et al. 2007a) (e.g.,
more experienced actors in an advisory network have more
incoming ties). When nodal attributes are taken into
account as random variables, ERGMs and MNs can be
integrated to model the social network due to similarities
that they share (see the Appendix and Fellows and Handcock
2012; Thiemichen et al. 2014; Lusher et al. 2012).
Fig. 7 The spread of new drug adoption through an advisory network of physicians: two snapshots at different time points, about 2 years apart (from left to right). The growth dynamics in the number of adopters can be analyzed with an Ising Model
ERGMs have been widely employed to study network
and friendship formation (Song et al. 2014) and
global network structure using the local structure of the
observed network (Uddin et al. 2013a). The observed
network is considered as one realization from many
possible networks with similar important characteristics
(Robins et al. 2007a). For example, Broekel and Hartog
(2013) used ERGMs to identify factors determining the
structure of inter-organizational networks based on a
single observation. Schaefer and Simpkins (2014) used
SNA to study the relation between weight status and friend
selection and ERGMs to measure the effects of body mass
index on friend selection.
Moreover, Goodreau et al. (2009) used ERGMs to
examine the generative processes that give rise to
widespread patterns in friendship networks. Cranmer and
Desmarais used ERGMs to model co-sponsorship net-
works in the U.S. Congress and conflict networks in the
international system. They determined that several pre-
viously unexplored network parameters are accept-
able predictors of the U.S. House of Representatives
legislative co-sponsorship network (Cranmer and Des-
marais 2011).
The ERGMs have also been utilized in modeling the
changing communication network structure and classifying
networks based on the occurrence of their local features
(Uddin et al. 2013a), and to identify the effects of micro-level structural
properties of physician collaboration networks on hospitalization
cost and readmission rate (Uddin et al. 2013b).
Finally, an ERGM-based model of clustering nodes considering
their role in the network has been reported (Salter-Townshend
and Murphy 2014).
4 Discussion
Mining social networks for knowledge and discovery has
proven to be a very challenging and active research area.
This review focussed on PGMs. The directed and undi-
rected PGM paradigms were described and their applica-
tions to social networks were highlighted. An important
consideration and major challenge is the issue of scalabil-
ity, not only for PGMs, but for SNA, in general. Structural
and parameter learning in high dimensions can be pro-
hibitive. Moreover, for structural learning, both greedy-
and sampling-based search strategies can get stuck at local
minima, and many graphs may be likelihood equivalent.
These numerical caveats can give rise to misleading net-
works, generating models, and subsequent predictions. In
addition, ERGMs can exhibit degeneracy, which occurs
when the generated networks show little resemblance to the
generating model. Modifications to the concept of
goodness of fit have been proposed to safeguard against the
problems of degeneracy (Goodreau 2007; Hunter et al.
2008).
In the majority of applications of PGMs (both directed
and undirected) in SNA, the graphical structures are
assumed to be either known or designed by human experts
(i.e., captured directly by social networks), so the
learning problem is limited to parameter estimation.
However, in practice, hand-constructing PGMs for SNA
faces many barriers: the time taken to construct them varies
from hours to months, experts can be costly or unavailable,
the data may be huge and errors may lead to poor answers.
On the other hand, structure learning is NP-hard with the
hypothesis space being super-exponential ($2^{O(n^2)}$
networks).
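The growth of the hypothesis space can be made concrete with a short calculation. The sketch below counts labeled undirected graphs, of which there are $2^{n(n-1)/2}$, and labeled DAGs using Robinson's recurrence, a standard combinatorial result that is not discussed in this review and is brought in here only for illustration.

```python
from functools import lru_cache
from math import comb


def count_undirected_graphs(n):
    """Number of labeled undirected graphs on n nodes: 2^(n choose 2)."""
    return 2 ** comb(n, 2)


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


# Candidate structures for small networks, as (undirected, DAG) pairs.
sizes = {n: (count_undirected_graphs(n), count_dags(n)) for n in range(1, 6)}
```

Even at five nodes there are 1024 candidate undirected structures and 29281 candidate DAGs, which illustrates why exhaustive search is replaced by greedy or sampling-based strategies.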
Directed and undirected graphs share common inter-
pretations in terms of conditional independences. Selection
of a PGM modeling paradigm is not trivial and is driven by
the data and ultimately what the user hopes to achieve with
the model. When the relationship can be viewed in terms of
cause and effect, BNs are more appropriate, and when the
relationship is association, MNs are preferred. Inferences in
both paradigms are met with challenges. The types of
variables (continuous or discrete) have to be carefully
considered. Modeling with a mixture of these variables is
possible in the case of BNs under strict assumptions.
However, the inference problem becomes more sensitive to
sample size, as the parameters estimated for the local
models are done so from a potentially reduced population,
which can be severely subset by level factors of parent
nodes. Another important learning task, outside of the
scope of this review, is queries that involve the absorption
of evidence (e.g., new data) in the network and propagation
through the network. This process is known as belief
propagation and it takes place on a factor graph (aka cluster
graph). In the case of BNs, the factor graph is a factor tree
(aka junction tree), and the propagation schemes give rise
to exact inferences of marginal distributions (aka beliefs).
On the other hand, in MNs the factor graph may have
cycles, which does not ensure exact inference in terms of
marginals, but has still been shown to be useful in practice,
see Koller and Friedman (2009) for more details.
There are several opportunities to access open source data
resources in order to develop and test methodologies for
PGMs, and related areas. Max Planck researchers have
released OSN data used in publications, which includes
crawled data from Flickr, YouTube, Wikipedia and
Facebook (Mislove et al. 2007; Cha et al. 2008, 2009;
Viswanath et al. 2009). Several directed OSNs have been
released in the Stanford Network Analysis Package (snap),
e.g. from Epinions, Amazon, LiveJournal, Slashdot and
Wikipedia voting (Stanford 2011). Recently, a Facebook
dataset was released that exhibited convergence properties
and was shown to be representative of the underlying pop-
ulation (Gjoka et al. 2010). Document classification datasets
have also been released (Getoor 2012). A sample from the
CiteSeer database contains 3312 publications from one of
six classes, and 4732 links. The Cora dataset consists of
2708 publications classified into seven categories and the
citation network has 5429 links. Each publication is
described by a binary word vector which indicates the
presence of certain words within a collection of 1433.
WebKB consists of 877 scientific publications from five
classes, contains 1601 links and includes binary word attri-
butes similar to Cora. Terrorism databases are also publicly
available (Division 1948; National Consortium for the Study
of Terrorism and Responses to Terrorism 2015). The most
extensive is the RAND Database of Worldwide Terrorism
Incidents, which details terrorist attacks in nine distinct
regions of the world across the time-span 1968–2009 (dates
vary slightly depending on region) (Division 1948). Several
well-known challenges may arise in the analysis and rep-
resentation of terrorist network data, including incomplete
information, latent variables influencing node dynamics, and
fuzzy boundaries between terrorists, supporters of terrorists,
and the innocent (Sparrow 1991; Krebs 2002). The DBLP
computer science bibliography (http://dblp.uni-trier.de/db/)
is a massive online database that contains bibliographic
meta-data for over 2.6 million publications. There is also
ample opportunity to enroll in various data challenges,
which are often posed by corporations and operators of the
networks themselves.
In this review, we surveyed directed and undirected PGMs,
and highlighted their applications in modern social networks.
Despite limitations that arise related to scalability and infer-
ence, it is our opinion that the utility of PGMs has been
somewhat under-realized in the social network arena. It is
indisputable that methods for understanding social networks
have not kept pace with the data explosion. There are several
relevant topics and opportunities in social networks, e.g., link
prediction, collective classification, modeling information
diffusion, entity resolution, and viral marketing, where con-
ditional independencies can be leveraged to improve perfor-
mance. PGMs implicitly convey conditional independence
and provide flexible modeling paradigms, which hold
tremendous promise and untapped opportunity for SNA.
Acknowledgments A. N. is supported in part by a MURI grant
(Number W911NF-09-1-0392) for Unified Research on Network-
based Hard/Soft Information Fusion, issued by the US Army Research
Office (ARO) under the program management of Dr. John Lavery, in
part by the Academy of Finland Grant MineSocMed (Number
268078), and in part by the 2015 U.S. Air Force Summer Faculty
Fellowship Program, sponsored by the Air Force Office of Scientific
Research. R. H. B. is supported through NSF DMS 1312250.
Appendix
Similarity between MNs and ERGMs
While MNs and ERGMs have been developed in different
scientific domains, they both specify exponential family
distributions. MN models treat social network nodes as
random variables, and hence, their utility is most obvious
in modeling processes on networks; ERGMs, on the other
hand, have been conceptualized to model network forma-
tion, where it is the edge presence indicators that are
treated as random variables (these random variables are
dependent if their corresponding edges share a node). But
in fact, this application-related difference in what to treat
as random is not fundamental. This Appendix works to
more rigorously disclose the similarity between MNs and
ERGMs by redefining an ERGM as a PGM. We begin,
however, by reviewing the branch of literature devoted
exclusively to ERGMs.
As with MNs, a well-discussed problem in using ERGMs to
analyze social networks is parameter estimation (Robins
et al. 2007b), which is hampered by a lack of sufficient
observed data. Robins et al. (2007b) outline this and some
other problems associated with ERGMs, e.g., degeneracy in
model selection and bimodal distribution
shapes (see also Handcock et al. 2003; Rinaldo et al. 2009;
Snijders et al. 2006; Handcock et al. 2006).
The roots of ERGMs in the Principle of Maximum
Entropy (Park and Newman 2004) and the Hammersley–
Clifford theorem have been previously pointed out (Robins
et al. 2001; Goldenberg et al. 2010). Here, we illustrate
how MNs and ERGMs are similar in form and structure, using
the most popular significant statistics in ERGMs. Under the
assumption of Markov dependence, for
a given social network, one can build a corresponding
Markov network via the following conversion: (1) each
node in the Markov network will correspond to an edge in
the social network [Fienberg called this construct a ‘usual
graphical model’ for ERGMs (Fienberg 2012)], (2) when
two edges share a node in the social network, a link will be
built between two corresponding nodes in the Markov
network.
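The two conversion steps can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `markov_network` and the labeling of actors as 1..N are assumptions made here, and every possible edge of the social network is treated as a node, as the Markov dependence assumption requires.

```python
from itertools import combinations


def markov_network(n_actors):
    """Build the Markov network for a social network with n_actors actors.

    Step (1): every possible edge (i, j) of the social network becomes a node.
    Step (2): two such nodes are linked iff the underlying edges share an actor.
    """
    nodes = list(combinations(range(1, n_actors + 1), 2))
    links = [(u, v) for u, v in combinations(nodes, 2) if set(u) & set(v)]
    return nodes, links


# Hypothetical usage with five actors.
nodes, links = markov_network(5)
print(len(nodes), "nodes and", len(links), "links")
```

For five actors each possible edge shares an actor with six others, so the construction yields 10 nodes and 30 links.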
Corresponding to each possible edge in a social network,
a node is introduced in the MN; note the difference between
the original social network and the MN: they are not the
same! Consider an ERGM with the
significant statistics including the number of edges, $f_1(y)$,
the number of $k$-stars, $f_i(y)$, $i = 2, \ldots, N-1$, and the
number of triangles, $f_N(y)$. In an MN, a maximum entropy
(maxent) model proposes the following form for the
internal energy of the system: $E_c(x) = \sum_i \alpha_{ci} g_{ci}$,
where $g_{ci}$ is the $i$th feature of clique $c \in X$ and
$\alpha_{ci}$ is its corresponding weight in $G$. Thus,
$\psi_c(x) = \exp\{\beta_c \sum_{i=1}^{N} \alpha_{ci} g_{ci}\}$. Since
there are too many parameters in the MN, their number can be
reduced by imposing homogeneity constraints similar to
those of ERGMs (Robins et al. 2007a). Before imposing
such constraints, the following facts are required.
It is straightforward to demonstrate that $G$ encompasses
cliques of size $\{3, \ldots, N-1\}$. In addition, all substructures
in $G_s$ can be redefined by features in $G$. Considering these
points, we can rewrite the joint probability of all variables
represented by the MN, $P(X)$, as follows:
$$P(X) = \frac{1}{Z(\alpha)} \prod_{c=1}^{C} \exp\left(\beta_c \sum_{i=1}^{N} \alpha_{ci} g_{ci}\right) = \frac{1}{Z(\alpha)} \exp\left(\sum_{c=1}^{C} \beta_c \sum_{i=1}^{N} \alpha_{ci} g_{ci}\right). \tag{6}$$
In (6), $Z(\alpha)$ is the partition function, which is a function of
the parameters. The homogeneity assumption here means
$\alpha_{ci} = \theta'_i \;\; \forall\, c = 1, \ldots, C$; then $P(X)$ is:
$$P(X) = \frac{1}{Z(\theta')} \exp\left(\sum_{i=1}^{N} \theta'_i \sum_{c=1}^{C} \beta_c g_{ci}\right). \tag{7}$$
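The step from (6) to (7) can be spelled out as follows. This is a reader's sketch, assuming the extraction-garbled symbols are $\alpha$, $\beta$ and $\theta'$; it uses only the homogeneity assumption and an interchange of the two finite sums in the exponent:

```latex
\begin{align*}
\sum_{c=1}^{C} \beta_c \sum_{i=1}^{N} \alpha_{ci}\, g_{ci}
  &= \sum_{c=1}^{C} \beta_c \sum_{i=1}^{N} \theta'_i\, g_{ci}
  && (\alpha_{ci} = \theta'_i \text{ for all } c) \\
  &= \sum_{i=1}^{N} \theta'_i \sum_{c=1}^{C} \beta_c\, g_{ci}
  && (\text{interchange the finite sums}).
\end{align*}
```

Because the parameters no longer depend on the clique index $c$, the partition function depends on $\theta'$ alone.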
In (7), let $Z' = Z(\theta')$. In addition, we let $f'_i$ denote
$\sum_{c=1}^{C} \beta_c g_{ci}$, meaning that the substructures $i$ in
all cliques $c$ are added up with weights $\beta_c$. Finally,
substituting $f'_i$ in (7) gives:
$$P(X) = \frac{1}{Z'} \exp\left(\sum_{i=1}^{N} \theta'_i f'_i\right). \tag{8}$$
Comparing $P(Y = y)$ and (8) confirms that ERGMs and
MNs are similar, and that they are identical under the
following conditions:
1. $\theta_i = \theta'_i$,
2. $f_i = f'_i = \sum_{c=1}^{C} \beta_c g_{ci}$.
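To make the shared exponential-family form concrete, the sketch below evaluates an ERGM by brute force for five actors. It assumes the standard form $P(Y = y) = \exp(\sum_i \theta_i f_i(y)) / Z$ with the statistics named above (edges, $k$-stars for $k = 2, \ldots, N-1$, and triangles). The function names and the parameter values are hypothetical, and the full enumeration is feasible only because there are just $2^{10}$ graphs on five actors.

```python
from itertools import combinations
from math import comb, exp

N = 5
DYADS = list(combinations(range(N), 2))  # the 10 possible edges y_ij


def statistics(edges):
    """Return [f_1, ..., f_N]: edges, k-stars for k = 2..N-1, triangles."""
    edge_set = set(edges)
    degree = [0] * N
    for i, j in edge_set:
        degree[i] += 1
        degree[j] += 1
    # A node of degree d is the centre of C(d, k) distinct k-stars.
    k_stars = [sum(comb(d, k) for d in degree) for k in range(2, N)]
    triangles = sum(
        1
        for a, b, c in combinations(range(N), 3)
        if {(a, b), (a, c), (b, c)} <= edge_set
    )
    return [len(edge_set), *k_stars, triangles]


def all_graphs():
    """Yield every graph on N actors as a tuple of dyads."""
    for r in range(len(DYADS) + 1):
        yield from combinations(DYADS, r)


def ergm_distribution(theta):
    """Exact P(Y = y) for every graph y, normalised by the partition function."""
    weights = {
        g: exp(sum(t * f for t, f in zip(theta, statistics(g))))
        for g in all_graphs()
    }
    z = sum(weights.values())  # partition function Z(theta)
    return {g: w / z for g, w in weights.items()}


# Hypothetical parameter values, chosen only for illustration.
theta = [-1.0, 0.1, 0.0, 0.0, 0.5]
dist = ergm_distribution(theta)
print("P(empty graph) =", dist[()])
```

Setting every parameter to zero recovers the uniform distribution over the 1024 graphs, which is a quick sanity check on the normalisation.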
The following numerical example (the same example as in
the ERGM section) depicts the similarities between ERGMs
and MNs. The social network has five actors, $N = 5$
(Fig. 8). Under the Markov dependency assumption,
there exists a unique corresponding Markov network,
shown in Fig. 9, with 10 nodes. There are 15 cliques
(so-called factors) of size three or four,
$$\Phi = \{\phi_1(y_{12}, y_{13}, y_{14}, y_{15}), \ldots, \phi_{15}(y_{24}, y_{45}, y_{25})\}.$$
Fig. 8 (a) A social network with 5 nodes and (b) the corresponding realization network (graph) and sufficient statistics of the observed network
Fig. 9 A social network with five actors (left) and its corresponding Markov network (right)
As already mentioned, the joint probability function of
all variables in each clique is proportional to the exponential
of the internal energy. For instance:
$$\phi_1(x) = \frac{1}{\lambda} \exp\{\beta_1 E_1(y_{12}, y_{13}, y_{14}, y_{15})\},$$
where $E_1(x) = \sum_i \alpha_{1i} g_{1i}$ and $\lambda$ is the distribution
parameter. This simple example shows how ERGMs
and MNs are the same in terms of the underlying concept
and the expressed probability distribution.
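The clique count in this example can be checked mechanically. The sketch below is an illustration written for this purpose, not code from the paper: it rebuilds the Markov network for five actors and lists its maximal cliques with a basic Bron-Kerbosch recursion (no pivoting). The function names are assumptions made here.

```python
from collections import Counter
from itertools import combinations


def line_graph_adjacency(n_actors):
    """Map each possible edge (i, j) to the possible edges sharing an actor with it."""
    nodes = list(combinations(range(1, n_actors + 1), 2))
    return {u: {v for v in nodes if v != u and set(u) & set(v)} for u in nodes}


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored vertices.
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


cliques = maximal_cliques(line_graph_adjacency(5))
sizes = Counter(len(c) for c in cliques)
print(len(cliques), "maximal cliques; count by size:", dict(sizes))
```

The five cliques of size four are the stars centred on each actor, such as $\{y_{12}, y_{13}, y_{14}, y_{15}\}$, and the ten cliques of size three are the triangles on each triple of actors, such as $\{y_{24}, y_{45}, y_{25}\}$, giving 15 in total.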
References
Afrasiabi MH, Guérin R, Venkatesh S (2013) Opinion formation in
Ising networks. In: Information theory and applications work-
shop (ITA), 2013, pp 1–10. IEEE
Aggarwal CC (2011) An introduction to social network data analytics.
Springer, Berlin
Agliari E, Burioni R, Contucci P (2010) A diffusive strategic
dynamics for social systems. J Stat Phys 139(3):478–491
Al Hasan M, Zaki MJ (2011) A survey of link prediction in social
networks. In: Social network data analytics. Springer, Berlin,
pp 243–275
Anderson RM, May RM et al (1979) Population biology of infectious
diseases: Part i. Nature 280(5721):361–367
Ayday E, Fekri F (2010) A belief propagation based recommender
system for online services. In: Proceedings of the fourth ACM
conference on recommender systems, pp 217–220. ACM
Bach SH, Broecheler M, Getoor L, O’Leary DP (2012) Scaling MPE
inference for constrained continuous Markov random fields with
consensus optimization. In: NIPS, pp 2663–2671
Berry MJ, Linoff G (1997) Data mining techniques: for marketing,
sales, and customer support. Wiley, New York
Bonchi F, Castillo C, Gionis A, Jaimes A (2011) Social network
analysis and mining for business applications. ACM Trans Intell
Syst Technol (TIST) 2(3):22
Broekel T, Hartog M (2013) Explaining the structure of inter-
organizational networks using exponential random graph mod-
els. Ind Innov 20(3):277–295
Bromberg F, Margaritis D, Honavar V et al (2009) Efficient Markov
network structure discovery using independence tests. J Artif
Intell Res 35(2):449
Cha M, Mislove A, Adams B, Gummadi KP (2008) Characterizing
social cascades in Flickr. In: Proceedings of the 1st workshop on
online social networks (WOSN’08), Seattle, WA
Cha M, Mislove A, Gummadi KP (2009) A measurement-driven
analysis of information propagation in the Flickr social network.
In: Proceedings of the 18th annual World wide web conference
(WWW’09), Madrid, Spain
Chapelle O, Zhang Y (2009) A dynamic Bayesian network click
model for web search ranking. In: Proceedings of the 18th
international conference on World wide web, pp 1–10. ACM
Chen H, Ku WS, Wang H, Tang L, Sun MT (2013) Linkprobe:
probabilistic inference on large-scale social networks. In: 2013
IEEE 29th international conference on data engineering (ICDE),
pp 290–301. IEEE
Chickering DM, Heckerman D, Meek C (2001) Large-sample
learning of Bayesian networks is NP-hard. J Mach Learn Res
5(2004):1287–1330
Coleman JS, Katz E, Menzel H et al (1966) Medical innovation: a
diffusion study. Bobbs-Merrill Company Indianapolis, New
York
Cowan R, Jonard N (2004) Network structure and the diffusion of
knowledge. J Econ Dyn Control 28(8):1557–1575
Crane R, McDowell LK (2011) Evaluating Markov logic networks for
collective classification. In: Proceedings of the 9th MLG
workshop at the 17th ACM SIGKDD conference on knowledge
discovery and data mining
Cranmer SJ, Desmarais BA (2011) Inferential network analysis with
exponential random graph models. Polit Anal 19(1):66–86
Daud A, Li J, Zhou L, Muhammad F (2010) Knowledge discovery
through directed probabilistic topic models: a survey. Front
Comput Sci China 4(2):280–301
Dielmann A, Renals S (2004) Dynamic Bayesian networks for
meeting structuring. In: IEEE international conference on
acoustics, speech, and signal processing, 2004. Proceedings
(ICASSP’04), vol 5, p V-629. IEEE
Dierkes T, Bichler M, Krishnan R (2011) Estimating the effect of
word of mouth on churn and cross-buying in the mobile phone
market with Markov logic networks. Decis Support Syst
51(3):361–371
Ding S (2011) Learning undirected graphical models with structure
penalty. arXiv:1104.5256
Division NSR (1948) Rand database of worldwide terrorism inci-
dents. http://www.rand.org/nsrd/projects/terrorism-incidents.
html
Domingos P, Kok S, Lowd D, Poon H, Richardson M, Singla P (2008)
Markov logic. In: Probabilistic inductive logic programming.
Springer, Berlin, pp 92–117
Domingos P, Lowd D, Kok S, Nath A, Poon H, Richardson M, Singla
P (2010) Markov logic: a language and algorithms for link
mining. In: Link mining: models, algorithms, and applications.
Springer, New York, pp 135–161
Fang L, LeFevre K (2010) Privacy wizards for social networking
sites. In: Proceedings of the 19th international conference on
World wide web, pp 351–360. ACM
Fellows I, Handcock MS (2012) Exponential-family random network
models (preprint). arXiv:1208.0121
Fienberg SE (2012) A brief history of statistical models for network
analysis and open challenges. J Comput Graph Stat
21(4):825–839
Frank O, Strauss D (1986) Markov graphs. J Am Stat Assoc
81(395):832–842
Freeman L (2004) The development of social network analysis.
Empirical Press, Vancouver
Friedman N, Murphy K, Russell S (1998) Learning the structure of
dynamic probabilistic networks. In: Proceedings of the four-
teenth conference on uncertainty in artificial intelligence.
Morgan Kaufmann Publishers Inc., San Mateo, pp 139–147
Getoor L (2012) Social network datasets. http://www.cs.umd.edu/
sen/lbc-proj/LBC.html
Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in
Facebook: a case study of unbiased sampling of OSNs. In:
INFOCOM, 2010 Proceedings IEEE, pp 1–9
Goldenberg A, Moore A (2004) Tractable learning of large Bayes net
structures from sparse data. In: Proceedings of the twenty-first
international conference on machine learning, p 44. ACM
Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey
of statistical network models. Found Trends Mach Learn
2(2):129–233
Goodreau SM (2007) Advances in exponential random graph (p*)
models applied to a large social network. Soc Netw
29(2):231–248
Goodreau SM, Kitts JA, Morris M (2009) Birds of a feather, or friend
of a friend? using exponential random graph models to
investigate adolescent social networks. Demography
46(1):103–125
Grabowski A, Kosiński R (2006) Ising-based model of opinion
formation in a complex network of interpersonal interactions.
Physica A: Stat Mech Appl 361(2):651–664
Hageman RS, Leduc MS, Korstanje R, Paigen B, Churchill GA
(2011) A Bayesian framework for inference of the genotype–
phenotype map for segregating populations. Genetics
187:1163–1170
Handcock MS, Robins G, Snijders TA, Moody J, Besag J (2003)
Assessing degeneracy in statistical models of social networks.
Technical report, Working paper
Handcock M, Hunter D, Butts C, Goodreau S, Morris M (2006)
Statnet: an r package for the statistical analysis and simulation of
social networks. Manual. University of Washington
He J, Chu WW, Liu ZV (2006) Inferring privacy information from
social networks. In: Intelligence and security informatics.
Springer, Berlin, pp 154–165
Heckerman D (2008) A tutorial on learning with Bayesian networks.
Springer, Berlin
Humphreys L (2007) Mobile social networks and social practice: a
case study of dodgeball. J Comput Mediat Commun
13(1):341–360
Hunter DR, Goodreau SM, Handcock MS (2008) Goodness of fit of
social network models. J Am Stat Assoc 103(481):
Jabeur LB, Tamine L, Boughanem M (2012a) Featured tweet search:
modeling time and social influence for microblog retrieval. In:
2012 IEEE/WIC/ACM international conferences on Web intel-
ligence and intelligent agent technology (WI-IAT), vol 1,
pp 166–173. IEEE
Jabeur LB, Tamine L, Boughanem M (2012b) Uprising microblogs: a
Bayesian network retrieval model for tweet search. In: Proceed-
ings of the 27th annual ACM symposium on applied computing,
pp 943–948. ACM
Jansen BJ, Zhang M, Sobel K, Chowdury A (2009) Twitter power:
tweets as electronic word of mouth. J Am Soc Inf Sci Technol
60(11):2169–2188
Java A, Song X, Finin T, Tseng B (2007) Why we twitter:
understanding microblogging usage and communities. In: Pro-
ceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop
on Web mining and social network analysis, pp 56–65. ACM
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of
influence through a social network. In: Proceedings of the ninth
ACM SIGKDD international conference on knowledge discov-
ery and data mining, pp 137–146. ACM
Koelle D, Pfautz J, Farry M, Cox Z, Catto G, Campolongo J (2006)
Applications of Bayesian belief networks in social network
analysis. In: Proceedings of the 4th Bayesian modeling appli-
cations workshop, UAI conference
Koller D, Friedman N (2009) Probabilistic graphical models:
principles and techniques. Massachusetts Institute of Technol-
ogy, Cambridge
Krause SM, Böttcher P, Bornholdt S (2012) Mean-field-like behavior
of the generalized voter-model-class kinetic Ising model. Phys
Rev E 85(3):031126
Krebs VE (2002) Mapping networks of terrorist cells. Connections
24(3):43–52
Kuter U, Golbeck J (2007) Sunny: a new algorithm for trust inference
in social networks using probabilistic confidence models. AAAI
7:1377–1382
Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social
network or a news media? In: Proceedings of the 19th
international conference on World wide web, pp 591–600. ACM
Lauritzen SL (1996) Graphical models. Oxford University Press,
Oxford
Lee SI, Ganapathi V, Koller D (2006) Efficient structure learning of
Markov networks using L1-regularization. In: Advances in
neural information processing systems, pp 817–824
Lipford HR, Besmer A, Watson J (2008) Understanding privacy
settings in Facebook with an audience view. UPSEC 8:1–8
Lusher D, Koskinen J, Robins G (2012) Exponential random graph
models for social networks: theory, methods, and applications.
Cambridge University Press, Cambridge
Madigan D, York J, Allard D (1995) Bayesian graphical models for
discrete data. Int Stat Rev 63(2):215–232
Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers
AH (2011) Big data: the next frontier for innovation, competi-
tion, and productivity
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B
(2007) Measurement and analysis of online social networks. In:
Proceedings of the 5th ACM/USENIX Internet measurement
conference (IMC’07), San Diego, CA
Morris M, Handcock MS, Hunter DR (2008) Specification of
exponential-family random graph models: terms and computa-
tional aspects. J Stat Softw 24(4):1548
Mukherjee S, Speed T (2008) Network inference using informative
priors. PNAS 11158:14313–14318
Murphy KP (2002) Dynamic Bayesian networks: representation,
inference and learning. PhD thesis, University of California
Murphy KP (2012) Machine learning: a probabilistic perspective. The
MIT Press, Cambridge
National Consortium for the Study of Terrorism and Responses to
Terrorism (START) (2015) University of Maryland. http://www.
start.umd.edu/
Neville J, Jensen D (2007) Relational dependency networks. J Mach
Learn Res 8:653–692
Newman ME (2006) Modularity and community structure in
networks. Proc Natl Acad Sci 103(23):8577–8582
Newman ME, Watts DJ, Strogatz SH (2002) Random graph models of
social networks. Proc Natl Acad Sci 99(suppl 1):2566–2572
Ounis I, Macdonald C, Lin J, Soboroff I (2011) Overview of the
TREC-2011 microblog track. In: Proceedings of the 20th Text
REtrieval Conference (TREC 2011)
Park J, Newman ME (2004) Statistical mechanics of networks. Phys
Rev E 70(6):066117
Pattison P, Wasserman S (1999) Logit models and logistic regressions
for social networks: II. Multivariate relations. Br J Math Stat
Psychol 52(2):169–193
Ravikumar P, Wainwright MJ, Lafferty JD et al (2010) High-
dimensional Ising model selection using l1-regularized logistic
regression. Ann Stat 38(3):1287–1319
Richardson M, Domingos P (2006) Markov logic networks. Mach
Learn 62(1–2):107–136
Rinaldo A, Fienberg SE, Zhou Y et al (2009) On the geometry of
discrete exponential families with application to exponential
random graph models. Electr J Stat 3:446–484
Robins G, Pattison P, Wasserman S (1999) Logit models and logistic
regressions for social networks: III. Valued relations. Psychome-
trika 64(3):371–394
Robins G, Pattison P, Elliott P (2001) Network models for social
influence processes. Psychometrika 66(2):161–189
Robins G, Pattison P, Kalish Y, Lusher D (2007a) An introduction to
exponential random graph (p*) models for social networks. Soc
Netw 29(2):173–191
Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007b)
Recent developments in exponential random graph (p*) models
for social networks. Soc Netw 29(2):192–215
Salter-Townshend M, Murphy TB (2014) Role analysis in networks
using mixtures of exponential random graph models. J Comput
Grap Stat (just-accepted)
Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review
of statistical network analysis: models, algorithms, and software.
Stat Anal Data Min ASA Data Sci J 5(4):243–264
Santos FC, Pacheco JM, Lenaerts T (2006) Evolutionary dynamics of
social dilemmas in structured heterogeneous populations. Proc
Natl Acad Sci USA 103(9):3490–3494
Schaefer DR, Simpkins SD (2014) Using social network analysis to
clarify the role of obesity in selection of adolescent friends. Am J
Public Health 104(7):1223–1229
Schmidt MW, Murphy K, Fung G, Rosales R (2010) Structure
learning in random fields for heart motion abnormality detection.
In: IEEE Conference on Computer Vision and Pattern Recog-
nition. IEEE, pp. 1–8
Scott J, Carrington PJ (2011) The SAGE handbook of social network
analysis. SAGE Publications, London
Snijders TA, Pattison PE, Robins GL, Handcock MS (2006) New
specifications for exponential random graph models. Sociol
Methodol 36(1):99–153
Song X, Jiang S, Yan X, Chen H (2014) Collaborative friendship
networks in online healthcare communities: an exponential
random graph model analysis. In: Smart health, vol 8549.
Springer, Switzerland, pp 75–87
Sparrow MK (1991) The application of network analysis to criminal
intelligence: an assessment of the prospects. Soc Netw
13(3):251–274
Srihari S (2014) Probabilistic graphical models. In: Alhajj R, Rokne J
(eds) Encyclopedia of social network analysis and mining.
Springer, Berlin
Stanford (2011) Stanford network analysis package (snap). http://
snap.stanford.edu
Taskar B, Abbeel P, Koller D (2002) Discriminative probabilistic
models for relational data. In: Proceedings of the eighteenth
conference on uncertainty in artificial intelligence. Morgan
Kaufmann Publishers Inc., USA, pp 485–492
Thiemichen S, Friel N, Caimo A, Kauermann G (2014) Bayesian
exponential random graph models with nodal random effects
(preprint). arXiv:1407.6895
Tresp V, Nickel M (2013) Relational models. In: Rokne J, Alhajj R
(eds) Encyclopedia of social network analysis and mining.
Springer, Heidelberg
Uddin S, Hamra J, Hossain L (2013a) Exploring communication
networks to understand organizational crisis using exponential
random graph models. Comput Math Organ Theory 19(1):25–41
Uddin S, Hossain L, Hamra J, Alam A (2013b) A study of physician
collaborations through social network and exponential random
graph. BMC Health Serv Res 13(1):234
Van den Bulte C, Lilien GL (2001) Medical innovation revisited:
social contagion versus marketing effort. Am J Sociol
106(5):1409–1435
Vega-Redondo F (2007) Complex social networks, vol 44. Cambridge
University Press, Cambridge
Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the
evolution of user interaction in Facebook. In: Proceedings of the
2nd ACM SIGCOMM workshop on social networks
(WOSN’09), Barcelona, Spain
Wan HY, Lin YF, Wu ZH, Huang HK (2012) Discovering typed
communities in mobile social networks. J Comput Sci Technol
27(3):480–491
Wang Y, Vassileva J (2003) Bayesian network-based trust model. In:
IEEE/WIC international conference on Web intelligence, 2003.
WI 2003. Proceedings, pp 372–378. IEEE
Wasserman S, Pattison P (1996) Logit models and logistic regressions
for social networks: I. An introduction to Markov graphs and p*.
Psychometrika 61(3):401–425
Wortman J (2008) Viral marketing and the diffusion of trends on social
networks
Xiang R, Neville J (2013) Collective inference for network data with
copula latent Markov networks. In: Proceedings of the sixth
ACM international conference on Web search and data mining,
pp 647–656. ACM
Yang X, Guo Y, Liu Y (2013) Bayesian-inference-based recommen-
dation in online social networks. IEEE Trans Parallel Distrib
Syst 24(4):642–651
62 Page 18 of 18 Soc. Netw. Anal. Min. (2015) 5:62
123
... G AUSSIAN graphical models (GGMs) [1], [2] are promising in modelling multivariate distributions by capturing the conditional dependency relations between random variables in the form of a graph (network). As a result, GGMs have been extensively used in a wide variety of domains, such as social network analysis [3], gene regulatory networks analytics [4], computer vision [5], neuroimaging [6], [7], and robotics artificial intelligence [8]. Formally, a GGM is a multivariate Gaussian distribution X ∼ N (µ, Σ) associated with an undirected graph G = (V, E), where each variable in X corresponds to a node in G and the topology of G is characterized by the precision matrix Ω = Σ −1 , i.e., a zero off-diagonal entry in the precision matrix denotes the conditional independence of two variables as well as the absence of the corresponding edge in the graph. ...
... These methods aim at learning the skeleton of DAGs, which is different from the graph formed by the concentration matrix of undirected GGMs [31]. 3 ...
... Citation information: DOI 10.1109/TNSE.2022.3208732 Definition 4. (k-separable graph [16]) Let u and v be a pair of non-adjacent nodes in a graph G. Suppose there exists a node set S ⊂ V \ {u, v} with |S| ≤ k such that every path from u to v has to pass through some nodes in S. 3 Then the pair of nodes u and v is said to be k-separable. A graph G is said to be k-separable if every pair of non-adjacent nodes in the graph is k-separable. ...
Article
Full-text available
This paper considers the problem of learning the underlying graph topology of Gaussian Graphical Models (GGMs) from observations. Under high-dimensional settings, to achieve low sample complexity, many existing graph topology learning algorithms assume structural constraints such as sparsity to hold. Without prior knowledge of graph sparsity, the correctness of their results is difficult to check. In this paper, we aim to do away with these assumptions by developing algorithms for learning degree-bounded GGMs and separable GGMs without any sparsity priors. The proposed algorithms, which are based only on the knowledge of conditional independence relations in the data distribution, require minimal structural assumptions while still achieving low sample complexity, and hence are ‘applicable’. Specifically, for any user defined sparsity parameter $k$ , we prove that the proposed algorithms can consistently identify whether a $p$ -dimensional GGM is degree-bounded by $k$ (or strongly $k$ -separable) with $\Omega (k \log p)$ sample complexity. Besides, our algorithms also demonstrate ‘partial’ learning properties whenever the overall graph is not entirely sparse, that is, not all nodes are degree-bounded (or are strongly separable). In this case, we can still learn the sparse portions of the graph, with theoretical guarantees included. Numerical results show that existing algorithms fail even in some simple settings where sparsity assumptions do not hold, whereas our algorithms do not.
... Graphical models (Drton and Maathuis, 2017) such as the Gaussian graphical model (Meinshausen and Bühlmann, 2006;Friedman et al., 2008) have been widely applied to many different fields for identifying key interactions between random variables (Farasat et al., 2015;Wille et al., 2004;Dobra and Lenkoski, 2011). These graphical models usually assume that all samples, or at least the samples under a known condition, are sampled from the same population and thus have the same network. ...
Preprint
Full-text available
In applications such as gene regulatory network analysis based on single-cell RNA sequencing data, samples often come from a mixture of different populations and each population has its own unique network. Available graphical models often assume that all samples are from the same population and share the same network. One has to first cluster the samples and use available methods to infer the network for every cluster separately. However, this two-step procedure ignores uncertainty in the clustering step and thus could lead to inaccurate network estimation. Motivated by these applications, we consider the mixture Poisson log-normal model for network inference of count data from mixed populations. The latent precision matrices of the mixture model correspond to the networks of different populations and can be jointly estimated by maximizing the lasso-penalized log-likelihood. Under rather mild conditions, we show that the mixture Poisson log-normal model is identifiable and has the positive definite Fisher information matrix. Consistency of the maximum lasso-penalized log-likelihood estimator is also established. To avoid the intractable optimization of the log-likelihood, we develop an algorithm called VMPLN based on the variational inference method. Comprehensive simulation and real single-cell RNA sequencing data analyses demonstrate the superior performance of VMPLN.
... In PGMs, researchers can specify probability distributions of their choice for the random variables along with the prior distribution (Airoldi, 2007;Farasat et al., 2015). Taking into consideration priors and likelihood formulations, PGMs fit the data to obtain the distributions of the constants and latent variables. ...
Preprint
Full-text available
To improve the traveling experience, researchers have been analyzing the role of attitudes in travel behavior modeling. Although most researchers use closed-ended surveys, the appropriate method to measure attitudes is debatable. Topic Modeling could significantly reduce the time to extract information from open-ended responses and eliminate subjective bias, thereby alleviating analyst concerns. Our research uses Topic Modeling to extract information from open-ended questions and compare its performance with closed-ended responses. Furthermore, some respondents might prefer answering questions using their preferred questionnaire type. So, we propose a modeling framework that allows respondents to use their preferred questionnaire type to answer the survey and enable analysts to use the modeling frameworks of their choice to predict behavior. We demonstrate this using a dataset collected from the USA that measures the intention to use Autonomous Vehicles for commute trips. Respondents were presented with alternative questionnaire versions (open- and closed- ended). Since our objective was also to compare the performance of alternative questionnaire versions, the survey was designed to eliminate influences resulting from statements, behavioral framework, and the choice experiment. Results indicate the suitability of using Topic Modeling to extract information from open-ended responses; however, the models estimated using the closed-ended questions perform better compared to them. Besides, the proposed model performs better compared to the models used currently. Furthermore, our proposed framework will allow respondents to choose the questionnaire type to answer, which could be particularly beneficial to them when using voice-based surveys.
... In PGMs, researchers can specify probability distributions of their choice for the random variables along with the prior distribution (Airoldi, 2007;Farasat et al., 2015). Taking into consideration priors and likelihood formulations, PGMs fit the data to obtain the distributions of the constants and latent variables. ...
Article
To improve the traveling experience, researchers have been analyzing the role of attitudes in travel behavior modeling. Although most researchers use closed-ended surveys, the appropriate method to measure attitudes is debatable. Topic Modeling could significantly reduce the time to extract information from open-ended responses and eliminate subjective bias, thereby alleviating analyst concerns. Our research uses Topic Modeling to extract information from open-ended questions and compare its performance with closed-ended responses. Furthermore, some respondents might prefer answering questions using their preferred questionnaire type. So, we propose a modeling framework that allows respondents to use their preferred questionnaire type to answer the survey and enable analysts to use the modeling frameworks of their choice to predict behavior. We demonstrate this using a dataset collected from the USA that measures the intention to use Autonomous Vehicles for commute trips. Respondents were presented with alternative questionnaire versions (open- and closed-ended). Since our objective was also to compare the performance of alternative questionnaire versions, the survey was designed to eliminate influences resulting from statements, behavioral framework, and the choice experiment. Results indicate the suitability of using Topic Modeling to extract information from open-ended responses; however, the models estimated using the closed-ended questions perform better compared to them. Besides, the proposed model performs better compared to the models used currently. Furthermore, our proposed framework will allow respondents to choose the questionnaire type to answer, which could be particularly beneficial to them when using voice-based surveys.
... To the best of our knowledge, only a few applications of GMs have been introduced in the SNA framework. For instance, the contribution of Farasat et al. (2015) focuses on the use of GMs to model social networks, representing a sort of marriage between graph theory and probability theory. The authors highlight the utility of GMs in conveying conditional independence and providing flexible modeling paradigms. ...
Article
In the present contribution we provide a discussion of the paper on “Bayesian graphical models for modern biological applications”. The authors present an extensive review of Bayesian graphical models, which are used for a variety of inferential tasks in biological and medical settings. Our contribution proposes a conceptual connection between two scientific frameworks, graphical models and social network analysis, while also highlighting the role played by network models and random graphs. A bibliometric analysis is performed on publications collected from online bibliographic archives to map the main themes characterizing the two research fields. Specifically, a co-word network analysis is carried out using visualization tools and thematic evolution maps.
... A promising modeling option toward this goal is a graphical model, previously applied in a variety of domains, such as genetics (Ghazalpour et al., 2006), sociology (Farasat, Nikolaev, Srihari, & Blair, 2015), and psychology (Fried et al., 2019). From a statistical point of view, network analysis uses multiple regressions that are visually plotted. ...
Article
Research on the etiology of dyslexia typically uses an approach based on a single core deficit, failing to understand how variations in combinations of factors contribute to reading development and how this combination relates to intervention outcome. To fill this gap, this study explored links between 28 cognitive, environmental, and demographic variables related to dyslexia by employing a network analysis using a large clinical database of 1,257 elementary school children. We found two highly connected subparts in the network: one comprising reading fluency and accuracy measures, and one comprising intelligence-related measures. Interestingly, phoneme awareness was functionally related to the controlled and accurate processing of letter-speech sound mappings, whereas rapid automatized naming was more functionally related to the automated convergence of visual and speech information. We found evidence for the contribution of a variety of factors to (a)typical reading development, though associated with different aspects of the reading process. As such, our results contradict prevailing claims that dyslexia is caused by a single core deficit. This study shows how the network approach to psychopathology can be used to study complex interactions within the reading network and discusses future directions for more personalized interventions.
Chapter
People use social media platforms such as Facebook, Twitter, and blog sites to express their views and to critique products they have purchased and movies they have watched. They also use these platforms to find information such as blood donation requirements and job opportunities. During disasters such as floods and earthquakes, these platforms act as powerful media for passing messages to the public. During the COVID-19 pandemic, businesses have used social media platforms for instant communication and interaction between groups of people. In all these scenarios, information diffuses and reaches people at different levels. Sometimes this diffusion benefits readers; sometimes it affects them negatively, with cascading effects of its own. It becomes essential to monitor the rate at which information flows and to stop the spread of fake or false messages. Applying suitable graph network models and theories would support this research issue and help recommend an appropriate model for social media data.
Article
Purpose: Predicting the final status of an ongoing process, or a subsequent activity in a process, is an important aspect of process management. Semi-structured business processes cannot be predicted by precise mathematical methods, so artificial intelligence is one of the approaches that has been applied successfully. This study aims to propose a method that combines deep learning methods, in particular the recurrent neural network, with a Markov chain. Design/methodology/approach: The proposed method applies the BestFirst algorithm for the search section and the Cfssubseteval algorithm for the feature comparison section. This study focuses on the prediction systems of social insurance and tries to present a method that is less costly in providing real-world results based on the past history of an event. Findings: The proposed method is simulated with real data obtained from the Iranian Social Security Organization. The results demonstrate that the proposed method uses slightly more memory than the Markov method; however, CPU usage time decreases dramatically in comparison with both the Markov method and the recurrent neural network, and accuracy and efficiency have, therefore, increased significantly. Originality/value: This research tries to provide an approach capable of producing findings closer to the real world with lower time and processing overheads, given the previous records of an event and the prediction systems of social insurance.
Chapter
We provide a survey on relational models. Relational models describe complete networked domains by taking into account global dependencies in the data. Relational models can lead to more accurate predictions than non-relational machine learning approaches. Relational models are typically based on probabilistic graphical models, e.g., Bayesian networks, Markov networks, or latent variable models. Relational models have applications in social network analysis, the modeling of knowledge graphs, bioinformatics, recommendation systems, natural language processing, medical decision support, and linked data.
Article
We survey the recent literature on theoretical models of diffusion in social networks and the application of these models to viral marketing. To put this work in context, we begin with a review of the most common models that have been examined in the economics and sociology literature, including local interaction games, threshold models, and cascade models, in addition to a family of models based on Markov random fields. We then discuss a series of recent algorithmic and analytical results that have emerged from the computer science community. The first set of results addresses the problem of influence maximization, in which the goal is to determine the optimal group of individuals in a social network to target with an advertising campaign in order to cause a new product or technology to spread throughout the network. We then discuss an analysis of the properties of graphs that allow or prohibit the widespread propagation of trends.
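As a rough illustration of the cascade models mentioned in the abstract above, the following is a minimal sketch of an independent cascade simulation. It is not taken from the surveyed work: the function name `independent_cascade`, the adjacency-dictionary graph format, and the single uniform activation probability `prob` are all assumptions made for this example.

```python
import random
from collections import deque


def independent_cascade(graph, seeds, prob, rng=None):
    """Simulate one run of an independent cascade (illustrative sketch).

    graph: dict mapping each node to a list of its out-neighbors (assumed format).
    seeds: iterable of initially active nodes.
    prob:  assumed uniform probability that an active node activates a neighbor.
    Returns the set of nodes that are active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        node = frontier.popleft()
        # Each newly activated node gets exactly one chance per inactive neighbor.
        for neighbor in graph.get(node, []):
            if neighbor not in active and rng.random() < prob:
                active.add(neighbor)
                frontier.append(neighbor)
    return active


# Hypothetical toy network: "e" has no incoming edges and is never reached.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["a"]}
full_spread = independent_cascade(toy_graph, ["a"], prob=1.0)
no_spread = independent_cascade(toy_graph, ["a"], prob=0.0)
```

In this setting, influence maximization would amount to choosing the seed set whose expected spread, averaged over many such simulated runs, is largest.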
Conference Paper
Health 2.0 provides patients with an unprecedented way to connect with each other online. However, little attention has been paid to how collaborative friendships between patients form in online healthcare communities. This study examines the relationship between collaborative friendship formation and patients’ characteristics. Results from Exponential Random Graph Model (ERGM) analysis indicate that gender homophily does not appear in collaborative friendship networks (CFNs), while health homophily, such as treatment homophily and health-status homophily, increases the likelihood of collaborative friendship formation. This study provides insights for improving website design to help foster close relationships among patients and deepen levels of engagement.