
Probabilistic graphical models in modern social network analysis

October 2015
Social Network Analysis and Mining 5(1)


DOI:10.1007/s13278-015-0289-6

Authors:

Alireza Farasat

Twilio Inc.

A.G. Nikolaev

University at Buffalo, The State University of New York

The advent and availability of technology has brought us closer than ever through social networks. Consequently, there is a growing emphasis on mining social networks to extract information for knowledge and discovery. However, methods for social network analysis (SNA) have not kept pace with the data explosion. In this review, we describe directed and undirected probabilistic graphical models (PGMs), and highlight recent applications to social networks. PGMs represent a flexible class of models that can be adapted to address many of the current challenges in SNA. In this work, we motivate their use with simple and accessible examples to demonstrate the modeling and connect to theory. In addition, recent applications in modern SNA are highlighted, including the estimation and quantification of importance, propagation of influence, trust (and distrust), link and profile prediction, privacy protection, and news spread through microblogging. Applications are selected to demonstrate the flexibility and predictive capabilities of PGMs in SNA. Finally, we conclude with a discussion of challenges and opportunities for PGMs in social networks.


a A social network and a corresponding MN model (b). The nodes of the MN are actors’ decisions and the variable dependencies are defined based on the ties in the social network. The three tables show the frequencies of hypothetical (observed) Like and Dislike combinations. c A factor graph for the MN in b


An example of using an MLN in political science. a Depicts a social network of five senators with two attributes. The ground predicates (b) are denoted by 15 elliptical nodes: the red ones are captured by the links of the social network, and the dark blue ones indicate the nodes’ attributes. Two first-order formulas, $F_1$ and $F_2$, determine the structure of the MLN. There are five groundings of $F_1$ (illustrated by the edges between the R(x) and S(x) nodes) and 15 groundings of $F_2$, captured by the remaining edges. Other examples of MLNs can be found in Tresp and Nickel (2013)


ORIGINAL ARTICLE


Alireza Farasat (1,2) • Alexander Nikolaev • Sargur N. Srihari • Rachael Hageman Blair

Received: 14 October 2014 / Revised: 14 August 2015 / Accepted: 23 August 2015 / Published online: 19 October 2015

© Springer-Verlag Wien 2015


Keywords Probabilistic graphical modeling · Social network analysis · Bayesian networks · Markov networks · Exponential random graph models · Markov logic networks · Social influence · Network sampling

1 Introduction

Over 40 years ago, social scientist Allen Barton stated that “If our aim is to understand people’s behavior rather than simply to record it, we want to know about primary groups, neighborhoods, organizations, social circles, and communities; about interaction, communication, role expectations, and social control.” (Barton 1968 as reported in Freeman 2004). This sentiment is fundamental to the concept of modularity. The importance of structural relationships in defining communities and predicting future behaviors has long been recognized, and is not restricted to the social sciences (Freeman 2004).

Social network analysis (SNA) has a rich history that is based on the defining principle that links between actors are informative. The advent and availability of Internet technology has created an explosion in online social networks and a transformation in SNA. The analysis of today’s social networks is a difficult Big Data problem, which requires the integration of statistics and computer science to leverage networks for knowledge mining and discovery (Manyika et al. 2011). Historically, scientists have had to rely on tractable records of social interactions and experiments (e.g., Milgram’s small world experiment); now they have the luxury of accessing huge digital databases of relational social data. SNA relies on diverse data representations and relational information, which may include (among others) tracked relationships among actors, events, and other covariate information (Scott and Carrington 2011). Modeling social networks is especially challenging due to the heterogeneity of the populations represented and the broad spectrum of information represented in the data itself. Modern applications of SNA include, among others, the estimation of influence, privacy protection, trust (and distrust), microblogging, and web browsing.

Corresponding author: Rachael Hageman Blair, hageman@buffalo.edu

Department of Industrial and Systems Engineering, University at Buffalo, Buffalo, USA

Department of Computer Science and Engineering, University at Buffalo, Buffalo, USA

Department of Biostatistics, University at Buffalo, Buffalo, USA

Soc. Netw. Anal. Min. (2015) 5:62

DOI 10.1007/s13278-015-0289-6

In this review, we focus on probabilistic graphical models (PGMs), which have demonstrated promise in modeling social networks (Lauritzen 1996; Koller and Friedman 2009). PGMs represent a marriage between graph theory and probability theory that offers flexible modeling paradigms with good interpretability. The graphical representation consists of nodes connected by edges, which may be directed (Bayesian networks) or undirected (Markov networks). The relationships between nodes in a graph can be interpreted in terms of conditional independencies. These independencies can be read directly from the graph, and they make a tractable decomposition of the joint distribution possible through the use of conditional probabilities. In this setting, the high-dimensional joint distribution of random variables $\{X_1, X_2, \ldots, X_n\}$ can be represented compactly and explicitly in a factorized form that has a graphical interpretation rooted in conditional independencies.
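The gain from factorization can be made concrete by counting free parameters. The sketch below compares an unrestricted joint table over n binary variables with a BN over the same variables in which each node has at most three binary parents; n = 20 and the fan-in of three are illustrative assumptions, not values taken from the text.

```python
def full_joint_params(n: int) -> int:
    """Free parameters of an unrestricted joint table over n binary variables."""
    return 2 ** n - 1


def bn_params(parents_per_node: list[int]) -> int:
    """Free parameters of a BN over binary variables.

    A node with k binary parents needs one free probability per parent
    configuration (the other state's probability is 1 minus that value).
    """
    return sum(2 ** k for k in parents_per_node)


n = 20
# Hypothetical structure: node i has min(i, 3) parents, so the first node is a root.
parents = [min(i, 3) for i in range(n)]
print(full_joint_params(n))  # 1048575
print(bn_params(parents))    # 143
```

The factorized count grows linearly in n for a fixed fan-in, whereas the full table grows exponentially.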

A powerful feature of the PGM modeling paradigms is the ability to perform probabilistic queries and reasoning on the graph at multiple levels (Koller and Friedman 2009). Queries of interest may include the estimation of probabilities (joint, conditional, or marginal), reasoning about variables in light of new evidence (causal, evidential, and inter-causal reasoning), and quantitative predictions through the use of the graph as a generator in simulations. Another attractive feature of PGMs is their inherent flexibility to model variables that follow different distributions, and the ability to bring a priori information into the learning process.

In this review, we outline the basic theory of PGMs, along with parameter and structural learning. The topic of PGMs is extremely rich in content and theory, and several existing surveys of graphical models are similar in spirit (e.g., Goldenberg et al. 2010; Daud et al. 2010; Salter-Townshend et al. 2012; Srihari 2014). Our review differs from existing reviews in both style and content. One distinguishing feature is that we illustrate the different modeling paradigms using accessible and simple models. Simple examples facilitate a connection between theory and practice. Once this connection is established, we highlight more complex recent applications in SNA that differ from each other in both the nature of the data and the objectives of the modeling. These applications reveal the inherent flexibility of PGMs to model a broad spectrum of data that target relevant open challenges and questions in SNA. We address both directed PGMs, known as Bayesian networks (BNs), and undirected PGMs, known as Markov networks, in Sects. 2 and 3, respectively. In Sect. 4, some of the current challenges are highlighted, comparisons between directed and undirected paradigms are made, and future directions and opportunities for PGM-based research in SNA are discussed.

2 Directed probabilistic graphical models

Bayesian networks (BNs) are a special class of PGMs that capture directed dependencies between variables, which may represent cause and effect relationships. The edges in a BN form a directed acyclic graph (DAG). The DAG architecture conveys a critical modeling assumption: there is no feedback via cycles in the graph. BNs obey the Markov assumption, which states that each variable, $X_i$, is independent of its non-descendants, given its parents in $G$. Taken together, these assumptions enable the compact representation of the high-dimensional joint probability distribution of the variables in the model. Despite their flexibility, the use of directed graphs in SNA has been somewhat limited, although the applications that we highlight are diverse. We describe the basic principles of these directed PGMs and motivate them with applications in the literature, which showcase their utility in SNA.

Static Bayesian Networks Our major focus is static BNs, which utilize data from a single snapshot of a social community at a given time point. A DAG conveys precise information regarding the conditional independencies between modeled variables (nodes). For a set of random variables $\{X_1, X_2, \ldots, X_n\}$, a network with structure $G$ encodes the conditional independence relationships through the factorization

$$P(X_1, X_2, \ldots, X_n) = P(G) \prod_{i=1}^{n} P\big(X_i \mid \mathrm{pa}(X_i), \Theta_i\big), \qquad (1)$$

where $P(G)$ is the prior distribution over the graph $G$, $\mathrm{pa}(X_i)$ are the parent nodes of child $X_i$, and $\Theta_i$ denotes the parameters of the local probability distribution. The prior comes into play only when an expert cannot describe the graph and structural learning is required. Structures and relationships that are more likely (and less likely) can be embedded into $P(G)$ to influence searches through the posterior model space.

A simple and fully parameterized BN for a course at a university is shown in Fig. 1. This network can be viewed as a template model in which different sections of a course are taught by different professors ($Pr \in \{p_0, p_1\}$). Similar templates can be used for various courses, e.g., Calculus, Introduction to Chemistry, etc. In this example, different teaching assistants (TAs) vary in their teaching effectiveness ($TA_q \in \{\mathrm{Poor}, \mathrm{Fair}, \mathrm{Good}\}$), and their own grade in the course ($TA_g \in \{A, B, C\}$) influences their overall ability to convey the material in the class well. A student’s grade ($G \in \{A, B, C\}$) is caused by their intelligence ($I \in \{i_0, i_1\}$), the professor of the course, and the grade of the TA. Finally, if a professor is asked to write a letter of recommendation for a particular student, for simplicity, this may be based on the performance of the student in the course and on the professor (e.g., some are more prone to write positive letters). This scenario, although overly simplistic, may hold in large classroom settings where the teacher does not get to know the individual students well.

The Markov assumption is apparent through the conditional probability distributions (CPDs) for each node, which depend only on the parents. The root nodes (top layer) have no parents, so their CPD table is simply a marginal probability that sums to one across the different states of the discrete variable. On the other hand, the CPDs of nodes with parents are conditioned on every possible combination of the parent states. It is evident that even for a small number of parents, and a small number of states for those parents, the CPD tables can grow quickly. In our example, $P(G \mid I, Pr, TA_g)$ has $2 \times 2 \times 3 = 12$ possible parent configurations (or scenarios) that can occur and that would influence grade level.
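The causal and evidential queries mentioned in the introduction can be answered for this network by brute-force enumeration. The sketch below uses the P(I), P(Pr), P(TA_g), and P(G | I, Pr, TA_g) tables as read from Fig. 1; TA Quality and Letter are left out because, with no evidence on them, they sum out of these particular queries. Enumeration is exponential in the number of variables and is shown only to make the arithmetic visible; it is not presented as the inference method used in the paper.

```python
from itertools import product

# Tables transcribed from Fig. 1.
P_I = {"i0": 0.35, "i1": 0.65}
P_PR = {"p0": 0.5, "p1": 0.5}
P_TAG = {"A": 0.6, "B": 0.35, "C": 0.05}
GRADES = ("A", "B", "C")
P_G = {  # P(G | I, Pr, TA_g) as (P(G=A), P(G=B), P(G=C))
    ("i0", "p0", "A"): (0.1, 0.2, 0.7),
    ("i0", "p0", "B"): (0.1, 0.25, 0.65),
    ("i0", "p0", "C"): (0.2, 0.3, 0.5),
    ("i0", "p1", "A"): (0.2, 0.4, 0.4),
    ("i0", "p1", "B"): (0.2, 0.4, 0.4),
    ("i0", "p1", "C"): (0.2, 0.3, 0.5),
    ("i1", "p0", "A"): (0.4, 0.2, 0.4),
    ("i1", "p0", "B"): (0.4, 0.3, 0.3),
    ("i1", "p0", "C"): (0.7, 0.2, 0.1),
    ("i1", "p1", "A"): (0.5, 0.3, 0.2),
    ("i1", "p1", "B"): (0.6, 0.3, 0.1),
    ("i1", "p1", "C"): (0.65, 0.25, 0.1),
}


def joint(i, pr, tag, g):
    """P(I, Pr, TA_g, G) via the BN factorization (Eq. 1 with a fixed graph)."""
    return P_I[i] * P_PR[pr] * P_TAG[tag] * P_G[(i, pr, tag)][GRADES.index(g)]


def query(target, evidence):
    """P(target | evidence) by summing the joint over all consistent worlds."""
    names = ("I", "Pr", "TA_g", "G")
    scores = {}
    for world in product(P_I, P_PR, P_TAG, GRADES):
        assignment = dict(zip(names, world))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # world contradicts the evidence
        key = assignment[target]
        scores[key] = scores.get(key, 0.0) + joint(*world)
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


print(query("G", {"I": "i1"}))  # causal reasoning: P(G | I = i1)
print(query("I", {"G": "A"}))   # evidential reasoning: P(I | G = A)
```

With these tables, observing an A raises the belief that the student is in state i1 from the prior 0.65 to roughly 0.85.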

Our toy example is an expert system that was written down according to knowledge about the modeling domain. Importantly, many alternative structures may be more or less realistic, for example ones in which some edges are changed or new variables are added. Nonetheless, it is often the case that a network structure can be accurately described by an expert. When a structure is prescribed for a BN, parameter learning is still required. This comes in the form of CPD tables for discrete distributions (Fig. 1). In our simple example, the probabilities for the CPD tables may have been extracted from teaching evaluations, grades, or other means.
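When the structure is fixed, a discrete CPD can be estimated by counting how often each child state occurs under each parent configuration. The sketch below computes maximum-likelihood estimates, with an optional symmetric Dirichlet pseudocount (alpha = 1 gives Laplace smoothing). The records and the helper name `estimate_cpd` are hypothetical and not drawn from any data set discussed in the text.

```python
from collections import Counter, defaultdict


def estimate_cpd(records, child, parents, child_states, alpha=0.0):
    """Estimate P(child | parents) from a list of dict records.

    alpha = 0 gives maximum-likelihood estimates; alpha > 0 adds a symmetric
    Dirichlet pseudocount to every child state. Only parent configurations
    that appear in the data are returned.
    """
    counts = defaultdict(Counter)
    for r in records:
        config = tuple(r[p] for p in parents)
        counts[config][r[child]] += 1
    cpd = {}
    for config, c in counts.items():
        total = sum(c.values()) + alpha * len(child_states)
        cpd[config] = {s: (c[s] + alpha) / total for s in child_states}
    return cpd


# Hypothetical student records (not from the paper).
records = [
    {"I": "i1", "Pr": "p0", "G": "A"},
    {"I": "i1", "Pr": "p0", "G": "A"},
    {"I": "i1", "Pr": "p0", "G": "B"},
    {"I": "i0", "Pr": "p1", "G": "C"},
]
mle = estimate_cpd(records, "G", ["I", "Pr"], ["A", "B", "C"])
smoothed = estimate_cpd(records, "G", ["I", "Pr"], ["A", "B", "C"], alpha=1.0)
print(mle[("i1", "p0")])       # A: 2/3, B: 1/3, C: 0
print(smoothed[("i1", "p0")])  # A: 1/2, B: 1/3, C: 1/6
```

The pseudocount keeps unseen child states from receiving probability zero, which matters when a CPD has many parent configurations and few records per configuration.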

As demonstrated in our simple example, each child node is dependent on its parent nodes. The parameter learning can be viewed as a local model or distribution that involves only the child and the parents. These local models are the building blocks of the graphical model, and they make up the factors in the product for the joint distribution (Eq. 1).

[Fig. 1a: Simple Bayesian Networks. Simple Parameterized Static Bayesian Network with nodes Intelligence, Professor, TA Grade, TA Quality, Grade, and Letter.]

Joint distribution:
P(I, Pr, TA_g, G, TA_q, L) = P(I) P(Pr) P(TA_g) P(G | I, Pr, TA_g) P(TA_q | TA_g) P(L | Pr, G)

Intelligence, P(I): i0 = 0.35, i1 = 0.65
Professor, P(Pr): p0 = 0.5, p1 = 0.5
TA Grade, P(TA_g): A = 0.6, B = 0.35, C = 0.05

TA Quality, P(TA_q | TA_g):
          TA_g = A   TA_g = B   TA_g = C
Good      0.3        0.3        0.1
Fair      0.5        0.6        0.8
Poor      0.2        0.1        0.1

Grade, P(G | I, Pr, TA_g):
I, Pr, TA_g    G = A   G = B   G = C
i0, p0, A      0.1     0.2     0.7
i0, p0, B      0.1     0.25    0.65
i0, p0, C      0.2     0.3     0.5
i0, p1, A      0.2     0.4     0.4
i0, p1, B      0.2     0.4     0.4
i0, p1, C      0.2     0.3     0.5
i1, p0, A      0.4     0.2     0.4
i1, p0, B      0.4     0.3     0.3
i1, p0, C      0.7     0.2     0.1
i1, p1, A      0.5     0.3     0.2
i1, p1, B      0.6     0.3     0.1
i1, p1, C      0.65    0.25    0.1

Letter, P(L | Pr, G):
Pr, G     Very Good   Good   Neutral
p0, A     0.9         0.1    0
p0, B     0.25        0.75   0
p0, C     0           0.1    0.9
p1, A     0.6         0.4    0
p1, B     0.2         0.4    0.4
p1, C     0           0      0.95

[Fig. 1b: Time-slices from a Dynamic Bayesian Network. True network: Student A and Student B with feedback between them. Unrolled BN (two time points): Student A and Student B at time point k and at time point k+1.]

Fig. 1 a Simple example of a parameterized Bayesian network of a university multi-section course. In this scenario, a student may enroll in a course taught by different professors. A student’s Grade is influenced by the student’s Intelligence, the Professor, and the effectiveness of their TA as measured by the TA grade in the course. The TA Quality is a direct reflection of the TA’s mastery of the material (their grade). The quality of a student’s Letter of recommendation depends on the grade in the course and on the professor. b Time slices from a dynamic Bayesian network of two students who interact in a study group. The true network is given on the left, which includes both feedback and a cycle (prohibited in static BNs). Relationships can be unrolled in a DBN across discrete time points (right)

When the variables in the model are continuous, the specification of a local model requires distribution parameters. For example, if $X_1$, $X_2$, and $X_3$ are Gaussian and $X_1 \rightarrow X_3 \leftarrow X_2$, then the local model for $X_3$ would be of the form $\mathcal{N}(\beta_0 + \beta_1 X_1 + \beta_2 X_2, \sigma^2)$. Therefore, in the continuous case, the local model can be viewed as a regression on the parents. Another popular local model is a conditional Gaussian Bayesian network (CG-BN), which gives rise to regressions in which a continuous child node may have both discrete and continuous parents (Lauritzen 1996). To enable factorization, CG-BNs prohibit discrete children from having continuous parents.
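Because a linear Gaussian local model is a regression on the parents, its parameters can be estimated by ordinary least squares. The sketch below simulates a hypothetical child $X_3$ with parents $X_1$ and $X_2$ (the coefficients 1, 2, -0.5 and noise variance 0.25 are illustrative choices) and recovers $\beta_0$, $\beta_1$, $\beta_2$, and $\sigma^2$ with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Hypothetical generating model: X1, X2 ~ N(0, 1) and
# X3 | X1, X2 ~ N(1 + 2 X1 - 0.5 X2, 0.25), i.e., noise standard deviation 0.5.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column for beta_0.
design = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(design, x3, rcond=None)
residuals = x3 - design @ beta
sigma2 = residuals @ residuals / n  # maximum-likelihood variance estimate

print("beta:", beta)       # close to [1.0, 2.0, -0.5]
print("sigma^2:", sigma2)  # close to 0.25
```

In a Bayesian treatment the same regression would carry a conjugate prior on $(\beta, \sigma^2)$ rather than being fitted by least squares alone.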

Structural learning is required when the network is not known and has to be learned from the data. The objective function for maximization is the posterior probability of a graph, $G$, given by

$$P(G \mid X) \propto P(X \mid G)\, P(G),$$

where $P(G)$ is the prior on the graph. The marginal likelihood, $P(X \mid G)$, requires complex integration over the parameters $\Theta$:

$$P(X \mid G) = \int P(X \mid \Theta, G)\, P(\Theta \mid G)\, d\Theta,$$

which can be alleviated with the use of conjugate priors. In an effort to accelerate the learning process and prevent over-fitting, a fan-in assumption is typically adopted. This limits the number of parents that a node can have (e.g., a node can have no more than three parents). The graph prior $P(G)$ can be explicitly used to encourage certain relationships and penalize others (Mukherjee and Speed 2008; Hageman et al. 2011). Computation relies on the fact that each node in the network, together with its parents, represents a local model, which can be described by a regression. These individual regressions have priors on their parameters, $P(\Theta \mid G)$; for example, a normal-Wishart prior can be used for nodes that follow a normal distribution. In practice, the (unnormalized) posterior of a graph is calculated from a product of the local models, which is a valid representation under the Markov assumption. Possible local models are often pre-computed in an effort to ease the computational demand of the learning algorithms.
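For discrete nodes, a standard conjugate choice is a Dirichlet prior on each CPD column, which gives the marginal likelihood in closed form. The sketch below uses the K2 metric (Cooper and Herskovits), i.e., uniform Dirichlet(1) priors, as an example of such a local score; it is swapped in for illustration and is not claimed to be the score used in the works cited. It then pre-computes the score of every candidate parent set under a fan-in limit. The data set and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import lgamma


def k2_local_score(data, child, parents, arity):
    """Log K2 score of `child` given `parents` (uniform Dirichlet(1) priors).

    data: list of dicts mapping variable name -> discrete state.
    arity: dict mapping variable name -> number of states r.
    Parent configurations that never occur contribute 0, so only observed
    configurations need to be summed.
    """
    r = arity[child]
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for n in n_ij.values():
        score += lgamma(r) - lgamma(n + r)
    for n in n_ijk.values():
        score += lgamma(n + 1)
    return score


def precompute_scores(data, variables, arity, max_parents=3):
    """Score every (child, parent set) pair with |parent set| <= max_parents."""
    table = {}
    for child in variables:
        others = [v for v in variables if v != child]
        for k in range(max_parents + 1):
            for pa in combinations(others, k):
                table[(child, pa)] = k2_local_score(data, child, pa, arity)
    return table


# Hypothetical binary data in which B copies A and C is unrelated to both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1) for _ in range(5)]
arity = {"A": 2, "B": 2, "C": 2}
scores = precompute_scores(data, ["A", "B", "C"], arity, max_parents=2)
print(scores[("B", ("A",))] > scores[("B", ())])  # True: A explains B
print(scores[("B", ("C",))] > scores[("B", ())])  # False: C only adds parameters
```

Because the score decomposes over nodes, a structure search can look up these cached terms instead of recomputing them for every candidate graph.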

The structural learning problem concerns identifying a global network that assembles these local models in an optimal way. The process is a major challenge (NP-hard), as the number of possible networks is super-exponential in the number of nodes (Chickering et al. 2001). Structural learning methods rely on sampling-based approaches or greedy optimization, e.g., hill climbing or simulated annealing (Heckerman 2008). Sampling-based approaches rely on Markov chain Monte Carlo (MCMC) techniques that sample the posterior distribution by moving through model space according to a proposal distribution. The proposal represents a modification of the current graph, $G_{\mathrm{curr}}$, into a new graph, $G_{\mathrm{new}}$, which is then evaluated and either accepted (kept in the sample) or rejected (the chain stays at $G_{\mathrm{curr}}$ and another proposal is attempted) (Madigan et al. 1995). A widely used proposal for a new graph in the Markov chain is to either add, delete, or reverse a single edge (Fig. 2). This proposal is implemented within a Metropolis–Hastings framework. The acceptance criterion for a new graph is determined by

$$\min\left\{1,\; \underbrace{\frac{P(G_{\mathrm{new}} \mid X)}{P(G_{\mathrm{curr}} \mid X)}}_{\text{Bayes factor}} \; \underbrace{\frac{Q(G_{\mathrm{curr}} \mid G_{\mathrm{new}})}{Q(G_{\mathrm{new}} \mid G_{\mathrm{curr}})}}_{\text{Hastings ratio}}\right\}. \qquad (2)$$

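The Bayes factor gives a measure of goodness of fit of the proposed graph relative to the current one (strictly, the first ratio in Eq. 2 is the posterior odds, which equals the Bayes factor when the graph prior $P(G)$ is uniform). When moves are proposed uniformly at random from the neighborhood of each graph, the Hastings ratio is simply the ratio of the neighborhood sizes of possible moves from the current and new graphs, $|\mathrm{Neighborhood}(G_{\mathrm{curr}})| / |\mathrm{Neighborhood}(G_{\mathrm{new}})|$ ($= 5/6$ in Fig. 2).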
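The pieces of Eq. 2 can be sketched in a few lines. The code below (i) counts labeled DAGs with Robinson's recurrence to show the super-exponential growth of the search space, (ii) enumerates the add/delete/reverse neighborhood of a DAG, and (iii) computes the Metropolis–Hastings acceptance probability under uniform proposals. The three-node pair of graphs is a hypothetical example whose neighborhood sizes (5 and 6) match the ratio quoted for Fig. 2, and the log posterior values are placeholders rather than scores computed from data.

```python
import math


def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes via Robinson's recurrence."""
    a = [1]  # a[0] = 1 (the empty graph)
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * math.comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no directed cycle."""
    indegree = {v: 0 for v in nodes}
    for _, v in edges:
        indegree[v] += 1
    stack = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    stack.append(b)
    return visited == len(nodes)


def neighbors(nodes, edges):
    """All DAGs reachable from `edges` by adding, deleting, or reversing one edge."""
    edges = frozenset(edges)
    result = []
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                result.append(edges - {(u, v)})               # deletion
                reversed_graph = (edges - {(u, v)}) | {(v, u)}
                if is_acyclic(nodes, reversed_graph):
                    result.append(reversed_graph)            # reversal
            elif (v, u) not in edges:
                added = edges | {(u, v)}
                if is_acyclic(nodes, added):
                    result.append(added)                     # addition
    return result


def acceptance_probability(log_post_new, log_post_curr, n_curr, n_new):
    """Metropolis-Hastings acceptance with uniform proposals over neighborhoods.

    Q(G_new | G_curr) = 1 / n_curr and Q(G_curr | G_new) = 1 / n_new, so the
    Hastings ratio in Eq. 2 reduces to n_curr / n_new.
    """
    log_ratio = (log_post_new - log_post_curr) + math.log(n_curr / n_new)
    return math.exp(min(0.0, log_ratio))


nodes = ["X1", "X2", "X3"]
g_curr = {("X1", "X2"), ("X2", "X3")}   # chain X1 -> X2 -> X3
g_new = {("X1", "X2")}                   # proposal: delete X2 -> X3
n_curr = len(neighbors(nodes, g_curr))   # 5 (adding X3 -> X1 would create a cycle)
n_new = len(neighbors(nodes, g_new))     # 6
print(count_dags(3), n_curr, n_new)      # 25 5 6
# Placeholder log posterior scores; a real sampler would compute them from data.
print(acceptance_probability(-10.0, -10.5, n_curr, n_new))
```

Working in log space avoids underflow, since posterior scores of whole graphs are typically very small numbers.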

The directionality and causal structure of the inferred model make BNs an attractive modeling paradigm for social networks that capture cause and effect relationships. Screen-based Bayes net structure (SBNS) was developed as a search strategy for large-scale data, which relies on the assumption of sparsity in the overall network structure (Goldenberg and Moore 2004). SBNS enforces sparsity through a two-stage process, which frames the structural learning problem as a market basket analysis task. The algorithm relies on the theory of frequent sets and support to first screen for local modules of nodes, and then connect them through a global structure search. The market basket framework lends itself to transaction-style data, which is by nature large, sparse, and binary. The learning problem is to identify an influence graph based on derived features of the binary transaction data. In this case, actors are assumed to be linked to each other indirectly through items or events. A simple example of individuals linked through conferences is shown in Fig. 3a. In this example, conference attendance (the transaction items) can be used to infer a network of social influence between individuals, which adds insights into the social hierarchy that are not apparent in classical interaction networks. The method was shown to be effective for modeling a variety of SNs, including citation networks, collaboration data, and movie appearance records (Berry and Linoff 1997).
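The frequent-set idea behind the screening stage can be illustrated by computing the support of actor pairs in binary transaction data. The attendance records below are hypothetical (the names follow Fig. 3a, but the memberships are invented), and a simple pairwise support threshold stands in for the SBNS screening step, whose details this sketch does not reproduce.

```python
from itertools import combinations

# Hypothetical attendance records: each event (transaction) lists the actors present.
events = {
    "Conference 1": {"Anne", "Bill", "Cal"},
    "Conference 2": {"Anne", "Bill", "Doug"},
    "Conference 3": {"Anne", "Bill", "Eden"},
}


def pair_support(events):
    """Fraction of events in which each pair of actors co-occurs."""
    actors = sorted(set().union(*events.values()))
    n = len(events)
    return {
        (a, b): sum(a in members and b in members for members in events.values()) / n
        for a, b in combinations(actors, 2)
    }


def screen(events, min_support):
    """Keep only actor pairs whose support reaches the threshold."""
    return {pair: s for pair, s in pair_support(events).items() if s >= min_support}


print(screen(events, min_support=0.5))  # {('Anne', 'Bill'): 1.0}
```

Only the surviving pairs would be passed on to a more expensive structure search, which is how screening keeps the problem sparse.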

Koelle et al. proposed applications of BNs to SNA for the prediction of novel links and pre-specified node features (e.g., leadership potential) (Koelle et al. 2006). The authors emphasize the advantage of BNs in accounting for uncertainty, noise, and incompleteness in the network. For example, topology-based network measures such as degree centrality, which is often used as a surrogate for importance, are subject to summarizations over incomplete and sometimes erroneous data. Comparatively, a BN affords more flexibility, which enables measures such as importance to be estimated in a more data-dependent manner. Koelle et al. provide an example of combining topology-based network measures with covariate information (Fig. 3b). Directed inference of this type leverages small local models, which can be naturally translated to regression or classification problems, depending on the child node (response variable). In this setting, the local BN can be evaluated at the node level, ranked probability estimates can be used for predictive purposes, and the output serves as a surrogate for model fit on a given structure.

[Fig. 2 panel: Structural Learning: Example of Graph Proposals in MCMC. A current graph (G_curr) and a new graph (G_new) over nodes X1, X2, X3, each shown with its neighborhood of addition, deletion, and reversal moves; individual moves are selected with probability 1/5 or 1/6.]

Fig. 2 A widely used proposal in MCMC sampling for structural learning in a BN is the addition, reversal, or deletion of an edge from the current graph, $G_{\mathrm{curr}}$, to form a new graph, $G_{\mathrm{new}}$. The proposal move is selected at random, with equal probability, from the models in the neighborhood. The Metropolis–Hastings acceptance criterion is a function of the neighborhood sizes for the current and proposed graphs and of the overall fit of those graphs to the data as measured by the Bayes factor

[Fig. 3 panels, Bayesian Network Applications: A Sparse Bayesian influence (Anne, Bill, Cal, Doug, and Eden linked through Conferences 1–3, and the inferred social influence among them); B Local Bayesian network prediction (nodes Sex, Education, Religion, Degree Centrality, Link Certainty, Individual Importance, and Future Leadership Potential, typed as attributes, centrality measures, or derived metrics); C Bayesian models of Twitter queries (layered search over Microbloggers, Tweets, Terms, and Query; node types microblogger, retweet, hashtag, and web resource; edge types following, mentioning, tagging, publishing, re-tweeting, and sharing); D Click modeling (session-level and query-level variables E examination, A attraction, and S satisfaction, observed or latent, across time slices t−1 to t+1).]

Fig. 3 Simplified schematics of select examples of Bayesian networks in social networks. a Inferring influence based on transaction-style data that links actors to events. b DAGs can utilize network features such as attributes and centrality measures on the network itself to predict derived metrics, e.g., individual importance or leadership potential. c Twitter is a microblogging community, which can be queried using a retrieval model that is based on a Bayesian network. d An application of DBNs for click modeling in a browser. The temporal dimension is a click sequence that connects the examination of webpages with attraction and satisfaction

Privacy protection is a major concern among users of online social networks. Generally, people prefer that their personal information be shared in small circles of friends and family, and shielded from strangers. Despite this common desire, relatively simple BNs have been shown to be successful in invading privacy through the inference of personal attributes that have been shielded through privacy settings (He et al. 2006). The BNs operate under the often accurate assumption that friends in social circles are likely to share common attributes. In 2006, the recommendation by He et al. to improve privacy was to hide friend lists through privacy settings, and to request that friends hide their personal attributes. Practically speaking, choosing the optimal privacy settings is complex, and can be tedious and difficult for an average user (Lipford et al. 2008). In 2010, a privacy wizard template was proposed, which automates a person’s privacy settings based on an implicit set of rules derived using Naive Bayes (the simplest BN) or decision tree methods (Fang and LeFevre 2010).
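The homophily assumption behind such attacks can be sketched with a naive-Bayes-style calculation that treats each friend’s visible attribute as independent evidence about a user’s hidden attribute. The prior, the homophily probability of 0.7, the attribute values, and the friend list are all hypothetical, and this simplification is not the specific model of He et al. (2006).

```python
import math


def infer_hidden_attribute(prior, friend_values, homophily):
    """Posterior over a user's hidden attribute from friends' visible values.

    prior: dict state -> prior probability.
    friend_values: list of observed friend attribute values.
    homophily: P(a friend shares the user's value); the remaining mass is
    spread evenly over the other states. Friends are assumed conditionally
    independent given the user's value (the naive Bayes assumption).
    """
    k = len(prior)
    log_post = {}
    for state, p in prior.items():
        lp = math.log(p)
        for v in friend_values:
            lp += math.log(homophily if v == state else (1 - homophily) / (k - 1))
        log_post[state] = lp
    m = max(log_post.values())  # subtract the max before exponentiating for stability
    unnorm = {s: math.exp(lp - m) for s, lp in log_post.items()}
    z = sum(unnorm.values())
    return {s: u / z for s, u in unnorm.items()}


prior = {"party_a": 0.5, "party_b": 0.5}  # hypothetical hidden attribute
friends = ["party_a", "party_a", "party_b", "party_a"]
print(infer_hidden_attribute(prior, friends, homophily=0.7))
```

Even with an uninformative prior, three agreeing friends out of four push the posterior for party_a to about 0.84, which illustrates why He et al. recommended hiding friend lists as well as one’s own attributes.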

On the other side of the application spectrum, BNs are useful for recommending products and services to users, taking into account their interests, needs, and communication patterns. Belief propagation has been used to summarize belief about a product and propagate that belief through a BN (Ayday and Fekri 2010; Yang et al. 2013). Belief propagation is the process in which node marginal distributions (beliefs) are updated in light of new evidence. In the case of a BN, evidence (e.g., opinions or ratings) is absorbed and propagated through a computational object known as a junction tree, resulting in updated marginal distributions. Comparing the network marginals before and after evidence is entered and propagated conveys the system-wide effect of influence(s), and gives insight into how perceptions or ratings change when recommendations are passed through a network. Despite its simplicity, the BN approach has been shown to be competitive with the more classical collaborative filtering (CF)-based recommendation. Trust (and distrust) can be highly variable, dynamic processes, which depend not only on the distance from a recommender, but also on the characteristics of the network users (Wang and Vassileva 2003; Kuter and Golbeck 2007). Accounting for trust in recommendation systems is an open area of research.
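A full junction-tree implementation is beyond a short example, but on a chain-structured BN exact belief propagation reduces to forward and backward message passing (sum-product). The sketch below propagates a hypothetical like/dislike opinion along a chain of three users and compares the marginals before and after evidence is entered at the last user; the prior and transition probabilities are invented for illustration.

```python
import numpy as np


def chain_marginals(prior, transition, evidence):
    """Exact marginals on a chain X1 -> X2 -> ... -> Xn by sum-product.

    prior: P(X1), shape (k,).
    transition: P(X_{t+1} | X_t), shape (k, k), rows indexed by X_t.
    evidence: list of n likelihood vectors (all ones means no evidence).
    """
    n = len(evidence)
    alpha = [prior * evidence[0]]
    for t in range(1, n):
        alpha.append((alpha[-1] @ transition) * evidence[t])   # forward messages
    beta = [np.ones_like(prior) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = transition @ (evidence[t + 1] * beta[t + 1])  # backward messages
    marginals = [a * b for a, b in zip(alpha, beta)]
    return [m / m.sum() for m in marginals]


prior = np.array([0.5, 0.5])                # user 1: P(like), P(dislike)
transition = np.array([[0.8, 0.2],          # P(next user's opinion | previous user's)
                       [0.3, 0.7]])
no_evidence = [np.ones(2)] * 3
before = chain_marginals(prior, transition, no_evidence)
with_evidence = [np.ones(2), np.ones(2), np.array([1.0, 0.0])]  # user 3 likes
after = chain_marginals(prior, transition, with_evidence)
for b, a in zip(before, after):
    print(b.round(3), "->", a.round(3))
```

Comparing the two sets of marginals shows evidence flowing backward: learning that user 3 likes the product raises the belief that user 1 likes it from 0.5 to about 0.61.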

Microblogging networks represent another effective venue for rapidly disseminating information and influence throughout a community. Twitter is the most well-known microblogging network, in which posts (tweets) are short and time-sensitive with respect to their references to current topics (Kwak et al. 2010). Users within microblogging networks of this type participate through the act of following and being followed, which naturally gives rise to directed associations (Java et al. 2007). With over 50 million tweets submitted daily, ranking and querying microblogs has become an important and active area of open research. Jabeur et al. proposed a retrieval model for tweet searches, which takes into account a number of factors, including hashtags, the influence of the microbloggers, and time (Jabeur et al. 2012a, b). A query relevance function was developed based on a BN that leverages the PageRank algorithm to estimate parameters, such as influence, in the model (Fig. 3c). The retrieval model was shown to outperform traditional methods for information retrieval on Twitter data from the TREC Tweets 2011 corpus (Ounis et al. 2011).
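PageRank is the part of such a retrieval model that is easiest to show in isolation. The sketch below is a standard power-iteration PageRank on a small hypothetical follower graph, with the conventional damping factor of 0.85; how Jabeur et al. turn these scores into BN parameters is not reproduced here.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank.

    graph: dict node -> list of nodes it links to (here: accounts it follows).
    Dangling nodes (no out-links) spread their rank uniformly over all nodes.
    """
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


follows = {  # hypothetical microbloggers: each key follows the accounts listed
    "alice": ["carol"],
    "bob": ["carol", "alice"],
    "carol": ["alice"],
    "dave": ["carol"],
}
scores = pagerank(follows)
print(sorted(scores, key=scores.get, reverse=True))
```

Following an account passes a share of the follower’s own rank to it, so carol, followed by three accounts, ends up ranked first, and alice, followed by carol, ranks second.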

Dynamic Bayesian Networks The static BNs described above depict a network at a single time point. This is most often an oversimplification of the true nature of the network, which is inherently dynamic. Modeling the dynamics of a network over the time-course can be achieved in the BN framework with additional modeling assumptions. Dynamic Bayesian networks (DBNs) provide compact representations for encoding structured probability distributions over arbitrarily long time-courses (Murphy 2002). State-space models, such as hidden Markov models (HMMs) and Kalman filter models (KFMs), can be viewed as special classes of the more general DBN. Specifically, KFMs require unimodal linear Gaussian assumptions on the state-space variables. HMMs do not allow for factorizations within the state-space, but can be extended to hierarchical HMMs for this purpose. Overall, DBNs enable a more general representation of sequential or time-course data.

DBN modeling is achieved through the use of template models, which are instantiated, i.e., duplicated, over multiple time points. The relationships between the variables within a template are fixed, and represent the inherent dependencies between ground variables in the model. There are three types of edges in a DBN. Intra-time slice edges represent dependencies within a time slice. Persistent edges link the same variable in two time slices; for example, the velocity of a vehicle at one time slice is strongly dependent on its velocity in the previous time slice. Finally, inter-time slice edges connect different variables between time slices; for example, the velocity of a car may also be influenced by the weather at the previous time slice.

The objective is to model a template variable over a discretized time-course, $X^{(0)}, \ldots, X^{(T)}$, and to represent $P(X^{(0:T)})$ as a function of the templates over the range of time points. Reducing the temporal problem to conditional template models makes the problem computationally tractable, but requires the specification of a fixed structure


across the entire time trajectory. In a DBN, the probability for a random variable $X$ spanning the time-course can be given in factored form,

$$P(X^{(0:T)}) = P(X^{(0)}) \prod_{t=0}^{T-1} P(X^{(t+1)} \mid X^{(t)}),$$

where $X^{(0)}$ represents the initial state, and the conditional probability terms of the form $P(X^{(t+1)} \mid X^{(t)})$ convey the conditional independence assumptions. This conditional representation of the likelihood is similar in spirit to the static BN representation, but conveys conditional independence with respect to time. The Markov assumption enables this factorization, and it has different, yet analogous, meanings in static and dynamic BNs. In a DBN, the Markov assumption expresses the memoryless property, i.e., that the current state depends on the previous state and is conditionally independent of the more distant past, $(X^{(t+1)} \perp X^{(0:t-1)} \mid X^{(t)})$. Comparatively, in static BNs, the Markov assumption captures each node's independence of its non-descendants, given the states of its parents.
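The factorization above can be checked numerically. The sketch below assumes a hypothetical two-state template (e.g., a user who is inactive, 0, or active, 1) with invented initial and transition probabilities. It evaluates $P(X^{(0:T)})$ as the initial term times the transition terms, confirms that these probabilities sum to one over all sequences, and propagates the marginals $P(X^{(t)})$ forward through the unrolled template.

```python
from itertools import product
import numpy as np

# Hypothetical two-state template: 0 = inactive user, 1 = active user.
p0 = np.array([0.6, 0.4])                 # P(X^(0))
trans = np.array([[0.9, 0.1],             # P(X^(t+1) | X^(t) = 0)
                  [0.3, 0.7]])            # P(X^(t+1) | X^(t) = 1)

def sequence_prob(seq):
    """P(X^(0:T)) = P(X^(0)) * prod_t P(X^(t+1) | X^(t))."""
    p = p0[seq[0]]
    for prev, nxt in zip(seq[:-1], seq[1:]):
        p *= trans[prev, nxt]
    return p

def marginals(T):
    """Row t holds P(X^(t)) for t = 0..T, obtained by unrolling the template."""
    out = [p0]
    for _ in range(T):
        out.append(out[-1] @ trans)
    return np.array(out)

T = 3
total = sum(sequence_prob(s) for s in product((0, 1), repeat=T + 1))
print(sequence_prob((0, 0, 1, 1)), total)
print(marginals(T))
```

Because the same transition table is reused at every step, the model has a fixed number of parameters regardless of the length of the time-course, which is what makes the template representation compact.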

Briefly, the learning paradigms for static and dynamic BNs are rather similar. Structural learning is typically achieved by the same scoring strategies, but with the added constraint that the structure must repeat over time (Friedman et al. 1998). Such a constraint alleviates the computational burden for search strategies. Additionally, the best initial structure can be searched for independently of the remainder of the time-course. The search is performed either through greedy hill climbing or through sampling.

A major advantage of DBNs is that they can be enriched to accommodate more complex interactions that would violate DAG assumptions in a static BN. Figure 1b shows a simple example of a common situation where the true network has feedback in the form of self-loops and a cycle in the graph. This feedback is prohibited in a static BN, but can be captured in a DBN. In this scenario, two students form a study group, but also self-study, in an effort to improve learning outcomes. The unrolled BN captures these relationships for two time slices and contains persistent and inter-time slice edges. These relationships are preserved over a time series (e.g., a semester), thereby forming a template model.
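The unrolling in Fig. 1b can be sketched in a few lines. The variable names S1 and S2 (each student's knowledge) and the edge lists are illustrative assumptions: persistent edges encode self-study, inter-time slice edges encode the study group, and there are no intra-time slice edges. Python's standard `graphlib` module is used to confirm that, although the template contains self-loops and a two-cycle, the unrolled network is acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical template for the study-group example.
persistent = [("S1", "S1"), ("S2", "S2")]     # self-study: same variable, next slice
inter_slice = [("S1", "S2"), ("S2", "S1")]    # study group: other student, next slice
intra_slice = []                              # no within-slice dependencies here

def unroll(T):
    """Instantiate the template over slices 0..T; returns (parent, child) edges."""
    edges = []
    for t in range(T + 1):
        edges += [((a, t), (b, t)) for a, b in intra_slice]
    for t in range(T):
        edges += [((a, t), (b, t + 1)) for a, b in persistent + inter_slice]
    return edges

edges = unroll(T=2)
graph = {}                                    # child -> set of parents
for parent, child in edges:
    graph.setdefault(child, set()).add(parent)

# static_order() raises graphlib.CycleError if the unrolled graph had a cycle.
order = list(TopologicalSorter(graph).static_order())
print(len(edges), len(order))
```

Every edge in the unrolled network points forward in time, so the feedback loops of the template become a valid DAG.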

Although social networks are inherently dynamic, the applications of DBNs in SNA have been limited. Importantly, there have been many attempts to model social networks probabilistically over time, but not in the strict PGM context, which is the focus of this review; many of these advances are mentioned in the discussion.

Chapelle and Zhang used DBNs to model web users' browsing history (Chapelle and Zhang 2009). The DBN extends the traditional and widely used cascade model for browsing behavior. The dynamics of the click sequence (Fig. 3d) take into account information at the query and session levels, differentiating perceived/actual attraction ($a_u$ and $A_i$, respectively) and perceived/actual satisfaction ($s_u$ and $S_i$, respectively) with links. At each click (time-step), the hidden binary variables for examination ($E_i$) and satisfaction ($S_i$) track the time progression to predict future clicks. The DBN approach was shown to outperform traditional methods, and highlighted the sensitivity of click modeling to measures of relevance and popularity at the query level.
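The examination, attraction and satisfaction mechanism can be sketched as a forward recursion over ranks. This follows the commonly cited formulation of the DBN click model, in which a result is clicked if it is examined and attractive, and the user moves on to the next result only if unsatisfied, with a persistence probability `gamma`. The parameter values below are hypothetical, and parameter learning from click logs is not shown.

```python
def click_probabilities(attract, satisfy, gamma):
    """Per-rank click probabilities P(C_i = 1) in a DBN click-model sketch.

    attract[i]: P(A_i = 1), the attractiveness of the result at rank i.
    satisfy[i]: P(S_i = 1 | C_i = 1), satisfaction given a click.
    gamma:      P(E_{i+1} = 1 | E_i = 1, S_i = 0), the user's persistence.
    A click occurs iff the result is both examined and attractive.
    """
    p_exam = 1.0  # the top result is always examined
    clicks = []
    for a, s in zip(attract, satisfy):
        clicks.append(p_exam * a)
        # P(S_i = 1 | E_i = 1) = a * s; an unsatisfied user continues w.p. gamma.
        p_exam *= (1.0 - a * s) * gamma
    return clicks

# Hypothetical parameters for a three-result page.
probs = click_probabilities([0.6, 0.5, 0.4], [0.5, 0.5, 0.5], gamma=0.9)
print([round(p, 4) for p in probs])
```

Setting `gamma = 1` and every satisfaction probability to 1 makes the user stop after the first click, which recovers the classical cascade model that the DBN extends.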

Meetings can be viewed as social events in which valuable information is exchanged mainly through speech. Effectively processing, capturing, and organizing this information can be costly, but it is important in order to maximize the impact and information flow for participants. Dielmann and Renals cast the problem of meeting structuring as a DBN, which partitions meetings into sequences of actions or phases based on audio (Dielmann and Renals 2004). DBNs outperformed baseline HMMs in detecting meeting actions in a smart room, such as dialog, notes at the board, computer presentations, and presentations at the board.

Twitter and microblogs in general have become a major resource for the media to learn of breaking news or the occurrence of a critical event. Recently, Sakaki et al. showed that tweet modeling via Kalman filtering is effective for detecting earthquakes of a certain magnitude in Japan. Furthermore, they developed a reporting system, Toretter, which warns registered individuals of an impending quake through email more quickly than the existing government reporting system (Jansen et al. 2009).
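As an illustration of the filtering step, the sketch below is a generic scalar Kalman filter for a random-walk state observed with Gaussian noise. The noise variances `q` and `r`, the prior, and the observation sequence (e.g., noisy coordinate estimates derived from geotagged tweets) are hypothetical; this is not Sakaki et al.'s system.

```python
def kalman_1d(observations, q=0.01, r=1.0, x0=0.0, p0=10.0):
    """Scalar Kalman filter for a random-walk state.

    State model:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation model: z_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered means and the final posterior variance.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        p = p + q                 # predict: mean unchanged, variance grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update with the innovation z - x
        p = (1.0 - k) * p         # posterior variance shrinks
        estimates.append(x)
    return estimates, p

# Hypothetical noisy readings of one coordinate of an event's location.
est, var = kalman_1d([35.2, 35.9, 35.5, 35.7, 35.6], x0=35.0)
print(round(est[-1], 3), round(var, 3))
```

Each new observation is weighted by the Kalman gain, so the estimate moves quickly while the prior is vague and then settles as the posterior variance shrinks.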

3 Undirected probabilistic graphical models

Markov networks (MNs), also known as Markov random fields (MRFs), are PGMs with undirected edges. Similar to directed BNs, an MN is a representation of the joint probability distribution of random variables (represented by nodes), where the absence of an edge between two nodes implies conditional independence between those nodes, given the other nodes in the network. In this review, we restrict our focus to MNs, Markov logic networks (MLNs) and exponential random graph models (ERGMs), which can be viewed as generalizations of random graphs (Frank and Strauss 1986) and are widely used in SNA (Newman et al. 2002). The basic formulation of these models and their utility in SNA will be highlighted.

Markov networks express the joint probability of given random variables by decomposing the network into smaller complete sub-graphs known as cliques, and use maximal cliques to capture the random variables' dependencies. A clique is a maximal clique if it cannot be extended to


include additional adjacent nodes. The clique representation enables a compact factorization of the probability density function (pdf). The joint pdf of $n$ random variables, $\mathbf{X} = \{X_1, \ldots, X_n\}$, with conditional (in)dependencies captured by a graph, can be expressed as:

$$P(\mathbf{X}) = \frac{1}{Z} \prod_{C \in \Omega} \psi_C(\mathbf{X}_C), \tag{3}$$

where $C$ is a maximal clique in the set of maximal cliques $\Omega$. Let $\mathbf{X}_C$ denote the subset of $\mathbf{X}$ comprising the random variables that form clique $C$. The clique potential $\psi_C(\mathbf{X}_C)$ is a function of these variables (e.g., the frequency of distinct realizations of the random variables forming the clique). A unique clique potential can be specified for each clique; the size of a clique can be one determinant of the corresponding clique potential; however, cliques of the same size may have different clique potentials. The clique potentials are positive functions that capture the dependence of the variables within the cliques (Koller and Friedman 2009). The normalizing constant, also known as the partition function, is given as:

$$Z = \sum_{\mathbf{X} \in \chi} \prod_{C \in \Omega} \psi_C(\mathbf{X}_C),$$

where $\chi$ is the set of all joint realizations of $\mathbf{X}$.

Each clique potential in an MN is specified by a factor, which can be viewed as a table of weights for each combination of values of the variables in the potential. In some special cases of MNs, such as log-linear models (Murphy 2012), clique potentials are represented by a set of functions, termed features, with associated weights (i.e., $\theta_C \phi_C(\mathbf{X}_C) = \log(\psi_C(\mathbf{X}_C))$, where $\phi_C(\mathbf{X}_C)$ is a feature derived from the values of the variables in the set $\mathbf{X}_C$, and $\theta_C$ is the weight of $\phi_C(\mathbf{X}_C)$, estimated at the parameter-learning stage).

The Hammersley–Clifford theorem specifies the conditions under which a positive probability distribution can be represented as an MN. Specifically, a positive distribution satisfies the conditional independencies encoded by an undirected graph if and only if it factorizes over the graph's maximal cliques as in Eq. 3, i.e., if and only if it is a Gibbs measure (Murphy 2012).

A simple example of the use of MNs to study collective behavior in a social network is shown in Fig. 4. Suppose each member of a friendship network (Fig. 4a) is looking to make a decision about purchasing a given product, "liking" a post in an online forum, supporting a political party, participating in a school activity or choosing a family doctor. In this setting, the random variables in the nodes are binary and the edges indicate pairwise dependencies between them. Let $X_1, \ldots, X_5$ denote five random variables, each of which takes on the value 1 or 0 to signal the member's attitude (Like or Dislike, respectively). Figure 4b depicts the MN with three cliques $a$, $b$ and $c$, $\Omega = \{a, b, c\}$, with variable subsets $\mathbf{X}_a$, $\mathbf{X}_b$ and $\mathbf{X}_c$. The MN includes three clique potentials, $\psi_a(\mathbf{X}_a)$, $\psi_b(\mathbf{X}_b)$ and $\psi_c(\mathbf{X}_c)$, satisfying the requirements of the Hammersley–Clifford

Fig. 4 a A social network and a corresponding MN model (b). The nodes of the MN are actors' decisions and the variable dependencies are defined based on the ties in the social network. The three tables show the frequencies of hypothetical (observed) Like and Dislike combinations. c A factor graph for the MN in b


theorem under the assumption that a member's decision can be affected only by their immediate friends and that it matters whether those friends are also friends with each other. The joint probability function of $X_1, \ldots, X_5$ is expressed as:

$$P(X_1, \ldots, X_5) = \frac{1}{Z} \psi_a(\mathbf{X}_a)\, \psi_b(\mathbf{X}_b)\, \psi_c(\mathbf{X}_c),$$

where $Z = \sum_{X_1, \ldots, X_5} \psi_a(\mathbf{X}_a)\, \psi_b(\mathbf{X}_b)\, \psi_c(\mathbf{X}_c) = 3 \times 1 \times 13 + \cdots + 9 \times 10 \times 19$ (the summation is taken over all $2^5$ distinct realizations of the model's variables; the first explicitly written term corresponds to the case where the variables are all zeros, and the last term corresponds to the case where the variables are all ones).

To parameterize the MN and obtain a log-linear model, let $\theta_a(\mathbf{X}_a) = \log(\psi_a(\mathbf{X}_a))$, $\theta_b(\mathbf{X}_b) = \log(\psi_b(\mathbf{X}_b))$ and $\theta_c(\mathbf{X}_c) = \log(\psi_c(\mathbf{X}_c))$. With $\Theta = \{\theta_a, \theta_b, \theta_c\}$, the joint probability function defined by the MN becomes:

$$P(X_1, \ldots, X_5) = \frac{1}{Z(\Theta)} \exp\{\theta_a(\mathbf{X}_a) + \theta_b(\mathbf{X}_b) + \theta_c(\mathbf{X}_c)\},$$

where $Z(\Theta)$ is the partition function. The set of model parameters, $\Theta$, would need to be estimated from any available data on known decisions made by the social network members. Figure 4c depicts a factor graph corresponding to the MN. Factor graphs are bipartite graphs used to specify the factorization of the probability distribution function, and also to inform the computation of marginal probability distributions of MN variables (Murphy 2012).
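The computation of $Z$ and the log-linear reparameterization can be checked by brute force. The clique memberships and potential tables below are hypothetical: they are chosen so that the all-zero and all-one configurations reproduce the first and last terms of the sum for $Z$ given above ($3 \times 1 \times 13$ and $9 \times 10 \times 19$), but the remaining entries are invented rather than read from Fig. 4. The sketch enumerates all $2^5$ realizations, verifies that setting $\theta_C = \log \psi_C$ gives identical probabilities, and lists the variable-factor edges of the corresponding factor graph.

```python
from itertools import product
import math

# Hypothetical cliques (pairs of variable indices) and potential tables
# mapping a pair of decisions (0 = Dislike, 1 = Like) to a positive value.
cliques = {"a": (0, 1), "b": (1, 2), "c": (3, 4)}
potentials = {
    "a": {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 9.0},
    "b": {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 10.0},
    "c": {(0, 0): 13.0, (0, 1): 4.0, (1, 0): 4.0, (1, 1): 19.0},
}

def unnormalized(x):
    """Product of clique potentials psi_C(x_C), i.e. Eq. 3 without 1/Z."""
    return math.prod(potentials[c][tuple(x[i] for i in idx)]
                     for c, idx in cliques.items())

states = list(product((0, 1), repeat=5))      # all 2^5 realizations
Z = sum(unnormalized(x) for x in states)

def prob(x):
    return unnormalized(x) / Z

# Log-linear form: theta_C = log(psi_C), so P(x) = exp(sum_C theta_C(x_C)) / Z.
theta = {c: {k: math.log(v) for k, v in t.items()} for c, t in potentials.items()}

def prob_loglinear(x):
    s = sum(theta[c][tuple(x[i] for i in idx)] for c, idx in cliques.items())
    return math.exp(s) / Z

# Factor graph: bipartite edges linking each factor to the variables it touches.
factor_graph = [(c, f"X{i + 1}") for c, idx in cliques.items() for i in idx]

print(Z, prob((1, 1, 1, 1, 1)), factor_graph)
```

Even in this tiny model, $Z$ requires visiting every configuration, which is why exact computation of the partition function becomes the bottleneck as networks grow.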

MN specification problems, including parameter estimation and structure learning from data, can be quite challenging. The main difficulty in MN parameter estimation is that the maximum likelihood problem formulated with Eq. 3 has no analytical solution due to the complex expression of $Z$ (Lee et al. 2006). The problem of finding the optimal structure of the MN using available data, similar to BNs, is even more challenging (Bromberg et al. 2009). Existing approaches to structure learning are either constraint-based or score-based (see Koller and Friedman 2009; Ding 2011; Schmidt et al. 2010 for more details).

MNs have found increased utility in SNA with the emergence of online social networks (OSNs) and digital social media (see Bonchi et al. 2011 for a review of key problems in SNA). The need to capture non-causal dependencies within and between data instances (e.g., profile information) and observed relationships (e.g., hyperlinks) in these applications is exacerbated by the presence of missing or hidden data in OSNs (Xiang and Neville 2013). A popular problem instance in this domain, that of predicting (missing) user profiles, has been attacked using MNs (Taskar et al. 2002; Neville and Jensen 2007).

Along with the problem of predicting missing profiles, link prediction is among the most prominent problems in Big Data SNA. Multiple variations of MNs that have been used to estimate the probability that an (unobserved) link exists between nodes include Markov logic networks, relational Markov networks, relational Bayesian networks and relational dependency networks (Al Hasan and Zaki 2011; Chen et al. 2013; Tresp and Nickel 2013). Detection of community structures is another area of MN application (Newman 2006). Communities can be discovered through examination and subsetting (cutting) of network relationships according to labels of interest, and through the use of weighted community detection algorithms. Social network clustering is especially challenging in a dynamic context, e.g., in mobile social networks (Humphreys 2007). Wan et al. employed undirected graphical models (i.e., conditional random fields) constructed from mobile user logs that include both communication records and user movement information (Wan et al. 2012).

Several generative models motivated by MNs have been proposed to explain the effects of selection and influence (e.g., see Aggarwal 2011). Modeling the channeled spread of opinions and rumors, known more generally as diffusion modeling, is an active area of research in SNA (Bach et al. 2012). Several applications of diffusion models have been proposed for social networks, including, but not limited to, the spread of information (Cowan and Jonard 2004), viral marketing (Kempe et al. 2003), the spread of diseases (Anderson and May 1979), and the spread of cooperation (Santos et al. 2006). Given a social network, a random variable at each node indicates the state of the node (e.g., product or technology adoption), and links in the network represent dependencies (Wortman 2008).

Markov logic networks employ a probabilistic framework that integrates MNs with first-order logic such that the MN weights are nonzero for only a small subset of meaningful features viewed as templates (Richardson and Domingos 2006). Formally, let $F_i$ denote a first-order logic formula, i.e., a logical expression comprising constants, variables, functions and predicates, and let $w_i \in \mathbb{R}$ denote a scalar weight. An MLN $L$ is then defined as a set of pairs $(F_i, w_i)$. From the MLN, the ground Markov network, $M_{L,C}$, is constructed (Richardson and Domingos 2006) with the probability distribution (Tresp and Nickel 2013)

$$P(X = x) = \frac{1}{Z} \exp\Big(\sum_i w_i\, n_i(x)\Big), \tag{4}$$

where $n_i(x)$ is the number of true groundings of $F_i$ in $x$ (e.g., true logic expressions based on observations), i.e., the number of groundings of the formula that hold in $x$.


A simple example network of five senators is shown in Fig. 5a. In this setting, each senator supports one of two political parties (Democratic or Republican). Each senator in this network has two attributes: (1) political affiliation ($R(n)$ is 1 if senator $n$ is a Republican and 0 otherwise, for $n \in \{A, B, C, D, E\}$), and (2) support for a particular bill ($S(n)$ is 1 if senator $n$ supports the bill and 0 otherwise). Let $F(n, m)$, a binary symmetric function, denote the relationship between senators $n$ and $m$ ($n \ne m$). Suppose "Republicans do not support the bill" and "If two senators have a relationship and one is Republican, then so is the other" are two logical statements denoted by $F_1$ and $F_2$. The first-order logic format is given as follows:

$$F_1: \forall n,\ R(n) \Rightarrow \neg S(n);$$

$$F_2: \forall n, m,\ F(n, m) \wedge R(n) \Rightarrow R(m).$$

The MLN is similar to the ground network (Fig. 5b). However, only the combinations of variables corresponding to the logical statements $F_1$ and $F_2$ are parameterized in the MLN (all the other weights are zero). Let $X = \{X_1, \ldots, X_{15}\}$ denote the set of all nodes in $M_{L,C}$, where $X_i$ indicates node $i$ (e.g., the node labeled $S(A)$ in Fig. 5b is one such $X_i$). In an MLN, clique potentials are defined similarly to those in Markov networks. Now, one can use this MLN to find the probability that all senators in this example support the bill (i.e., $P(S(A) = 1, \ldots, S(E) = 1)$). To calculate the numbers of true groundings ($n_1$ and $n_2$), both $F_1$ and $F_2$ should be examined for all nodes in the observed network. More generally, this approach can be implemented to estimate missing profiles in social networks as well.
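The count of true groundings in Eq. 4 can be made concrete for the senator example. In the sketch below, the relationship links and the weights `w1` and `w2` are hypothetical, $F(n, m)$ is treated as observed evidence, and $F_2$ is grounded over all 20 ordered pairs of distinct senators (the grounding convention used for Fig. 5b may differ). A grounding of an implication is true whenever its premise is false. The probability that all five senators support the bill is obtained by enumerating all $2^{10}$ joint assignments of $R$ and $S$.

```python
from itertools import product, permutations
import math

senators = "ABCDE"
# Hypothetical observed relationships (symmetric) and formula weights.
links = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "E")}
w1, w2 = 1.5, 0.8

def F(n, m):
    """Observed relationship F(n, m), treated as evidence."""
    return (n, m) in links or (m, n) in links

def n_true_groundings(R, S):
    """(n1, n2): numbers of true groundings of F1 and F2 in one world."""
    # F1: R(n) => not S(n), one grounding per senator.
    n1 = sum((not R[n]) or (not S[n]) for n in senators)
    # F2: F(n, m) and R(n) => R(m), one grounding per ordered pair n != m.
    n2 = sum((not (F(n, m) and R[n])) or R[m]
             for n, m in permutations(senators, 2))
    return n1, n2

def world_weight(R, S, w):
    n1, n2 = n_true_groundings(R, S)
    return math.exp(w[0] * n1 + w[1] * n2)    # Eq. 4 without 1/Z

def prob_all_support(w=(w1, w2)):
    """P(S(A) = 1, ..., S(E) = 1) given F, by enumerating all worlds."""
    Z = num = 0.0
    for r_bits, s_bits in product(product((0, 1), repeat=5), repeat=2):
        R, S = dict(zip(senators, r_bits)), dict(zip(senators, s_bits))
        weight = world_weight(R, S, w)
        Z += weight
        if all(s_bits):
            num += weight
    return num / Z

print(prob_all_support())
```

With both weights set to zero, every world is equally likely and the probability is exactly 1/32; a positive weight on $F_1$ makes support less likely for anyone who may be Republican, so the probability drops below 1/32.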

Many problems in statistical relational learning, such as link prediction (Domingos et al. 2008), social network modeling, collective classification, link-based clustering and object identification, can be formulated using instances of MLNs (Richardson and Domingos 2006). Dierkes et al. used MLNs to investigate the influence of mobile social networks on consumer decision-making behavior. With the call detail records represented by a weighted graph, MLNs were employed in conjunction with logit models as the learning technique, based on lagged neighborhood variables. The resulting MLNs were used as predictive models for the analysis of the impact of word of mouth on churn (the decision to abandon a communication service provider) and purchase decisions (Dierkes et al. 2011).

As mentioned above, link mining and link prediction problems can also be addressed using MLNs, since MLNs combine logic and probabilistic reasoning in a single framework (Domingos et al. 2010). Furthermore, the ability of MLNs to represent complex rules by exploiting relational information makes them an appropriate alternative for collective classification (e.g., classification of publications in a citation network, or of hyperlinked webpages) (Crane and McDowell 2011).

The Ising model and its variations form a subclass of MNs with foundations in theoretical physics. The Ising model is a discrete, pairwise MN, and is popular in applications in part due to its simplicity (Koller and Friedman 2009). The variables in the model, $X_1, \ldots, X_n$, are assumed to be binary, and their joint probability is given as:

$$P(X; \Theta) = \exp\Big(\sum_{(j,k) \in E} \theta_{jk} X_j X_k - \Phi(\Theta)\Big), \quad \forall X \in \chi,$$

where $\chi = \{-1, 1\}^n$, $E$ is the edge set of the graph, and $\Phi(\Theta)$ is the log of the partition function,

$$\Phi(\Theta) = \log \sum_{x \in \chi} \exp\Big(\sum_{(j,k) \in E} \theta_{jk} x_j x_k\Big).$$

Fig. 5 An example of using an MLN in political science. a Depicts a social network of five senators with two attributes. The ground predicates (b) are denoted by 15 elliptical nodes. The red ones are captured by the links of the social network and dark blue nodes indicate nodes' attributes. Two first-order logic formulas, $F_1$ and $F_2$, determine the structure of the MLN. There exist five groundings of $F_1$ (illustrated by the edges between the R(x) and S(x) nodes) and 15 groundings of $F_2$, captured by the rest of the edges. Other examples of MLNs can be found in Tresp and Nickel (2013)


Special, efficient methods exist for learning the Ising model parameters from data (Ravikumar et al. 2010). While the model was originally found useful for understanding magnetism and phase transitions, its utility later expanded to image processing, neural modeling, and studies of tipping points in economics and social domains (Afrasiabi et al. 2013).

In SNA, the Ising model can be employed to analyze factors, such as network substructures and nodal features, that affect the opinion formation process. A classical example is a study of the spread of a medical innovation, namely the adoption of the drug tetracycline by 125 physicians in four small cities in Illinois (Van den Bulte and Lilien 2001). A small subset of this network is illustrated in Fig. 6. The adoption status of node $k$ is represented by $X_k$, $\forall k = 1, \ldots, 5$, where $X_k = 1$ if adopted (blue nodes) and $X_k = -1$ otherwise (black nodes), consistent with $\chi = \{-1, 1\}^n$. Since the Ising model only captures pairwise dependencies between variables, the corresponding MN only considers cliques of size two (i.e., dyads); hence, one can concisely write the clique potentials in the form $\psi(X_k, X_j) = X_k X_j$. Let $n_+$ and $n_-$ denote the number of agreements (i.e., ties $(k, j)$ with $X_k X_j = 1$) and the number of disagreements (i.e., ties $(k, j)$ with $X_k X_j = -1$), respectively. Assuming $\theta_{kj} = \theta_+$ if $X_k X_j = 1$ and $\theta_{kj} = \theta_-$ if $X_k X_j = -1$, with $\Theta = \{\theta_+, \theta_-\}$, one obtains

$$\sum_{(j,k) \in E} \theta_{jk} X_j X_k = \theta_+ n_+ - \theta_- n_-.$$

Hence, the joint probability of all nodes' adoption status is:

$$P(X_1, \ldots, X_5; \Theta) = \exp\big(\theta_+ n_+ - \theta_- n_- - \Phi(\Theta)\big),$$

where $\Phi(\Theta) = \log \sum_{x \in \chi} \exp\big(\theta_+ n_+(x) - \theta_- n_-(x)\big)$ is the log of the partition function. For this small example, the table in Fig. 6b shows all clique potentials, which are either 1 or −1 depending on the network structure. The counts of agreements and disagreements are obtained next (e.g., $n_+ = 5$ and $n_- = 2$ in Fig. 6b). The partition function has 32 additive terms: each combination of the $X_k$'s leads to particular values of $n_+$ and $n_-$, and all these values affect the value of $\Phi(\Theta)$.

In the model presented above, the counts of possible agreements and disagreements depend on the network structure, so the MN can be said to explore the impact of homophily on tetracycline adoption decisions. Note that the model's parameters, $\theta_+$ and $\theta_-$, first need to be estimated from the given data (i.e., from a single observation of a network of (non)adopters); however, approaches to such parameter estimation are beyond the scope of this paper.
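A brute-force version of this computation is sketched below. The seven-tie network and the adoption pattern are hypothetical (not the exact topology of Fig. 6a), and $\theta_+$ and $\theta_-$ are free inputs rather than estimated values. The function enumerates all 32 configurations in $\{-1, 1\}^5$ to evaluate $\Phi(\Theta)$.

```python
from itertools import product
import math

# Hypothetical 5-node advisory sub-network with 7 undirected ties
# (illustrative only; not the exact topology of Fig. 6a).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (2, 4)]

def counts(x):
    """(n_plus, n_minus): agreeing ties (x_j * x_k = 1) and disagreeing ties."""
    n_plus = sum(x[j] * x[k] == 1 for j, k in edges)
    return n_plus, len(edges) - n_plus

def log_partition(theta_plus, theta_minus, n=5):
    """Phi(Theta): log of the 2^n-term sum over x in {-1, 1}^n."""
    return math.log(sum(
        math.exp(theta_plus * p - theta_minus * m)
        for p, m in map(counts, product((-1, 1), repeat=n))))

def ising_prob(x, theta_plus, theta_minus):
    n_plus, n_minus = counts(x)
    return math.exp(theta_plus * n_plus - theta_minus * n_minus
                    - log_partition(theta_plus, theta_minus))

x_obs = (1, 1, 1, -1, 1)   # hypothetical pattern: node 3 has not adopted
print(counts(x_obs), ising_prob(x_obs, 0.5, 0.5))
```

Positive values of $\theta_+$ and $\theta_-$ reward agreement and penalize disagreement, so configurations in which neighbors share the same adoption status receive higher probability, which is the homophily effect described above.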

Figure 7 depicts the entire physicians' advisory network from a data set prepared by Ron Burt from the 1966 data collected by Coleman et al. (1966) on the spread of medical innovation. The figure illustrates the physicians' network at two different time points and shows how physicians changed their opinions and adopted the new medication over time. To find the probability of adoption, the Ising model can be modified to account for the impact of nodal attributes on adoption.

Recently, the Ising model has been used to examine social behaviors (Vega-Redondo 2007), including collective decision making, opinion formation and the adoption of new technologies or products (Grabowski and Kosiński 2006; Krause et al. 2012). For example, Fellows and Handcock proposed a random model of the full network by modeling nodal attributes as random variates. They used the new model formulation to analyze a peer social network from the National Longitudinal Study of Adolescent Health (Fellows and Handcock 2012). Agliari et al. (2010) proposed a model to extract the underlying dynamics of social systems based on diffusive effects and people's strategic choices to convince others. By adapting a cost function based on the Ising model for social interactions between individuals, they showed by numerical simulation that a steady state is reached through the natural dynamics of social systems.

Fig. 6 An example of implementing the Ising model to find the probability of adopting a new medication. a A sub-network of a physicians' advisory network with five nodes. b A pairwise Markov network is constructed in which only cliques of size at most 2 are involved


Exponential random graph models (ERGMs) (Wasserman and Pattison 1996), also known as $p^*$-class models, are among the most widely used approaches to modeling social networks in recent years (Pattison and Wasserman 1999; Robins et al. 1999, 2007a). A social network of individuals is denoted by a graph $G$ with $N$ nodes and $M$ edges, $M \le N(N-1)/2$. The corresponding adjacency matrix of $G$ is denoted by $Y = [y_{ij}]_{N \times N}$, where $y_{ij}$ is a random variable defined as follows:

$$y_{ij} = \begin{cases} 1 & \text{if there exists a link between nodes } i \text{ and } j, \quad \forall i, j,\ i \ne j, \\ 0 & \text{otherwise.} \end{cases}$$

Based on an ERGM, the probability of any observed network, $y$, is:

$$P(Y = y; \Theta) = \frac{1}{Z} \exp\Big(\sum_{i=1}^{K} \theta_i f_i(y)\Big), \tag{5}$$

where $f_i(y)$, $i = 1, \ldots, K$, are called sufficient statistics (Morris et al. 2008; Lusher et al. 2012), or motifs based on configurations of the observed graph, and $\Theta = \{\theta_1, \ldots, \theta_K\}$ is a $K$-vector of parameters ($K$ is the number of different sufficient statistics used in the model). Network configurations used to compute sufficient statistics, including but not limited to the network edge count (ties between two actors), as well as counts of 2-stars (two ties sharing an actor) and triads of various types, are related to communication patterns among actors in a social network (see Lusher et al. 2012 for more details about network configurations). The parameters of an ERGM describe the probabilities of a wide variety of possible configurations in social networks (Robins et al. 2001). Again, $Z$ is the normalization constant.

As an example, consider a social network of five individuals. Since the edges (ties) between nodes are considered random variables, the given network is one realization out of many possible networks. In this case, an ERGM constructs a probability distribution over all possible networks with five nodes. Figure 7 illustrates the social network (A) and the corresponding graph, where the edges $y_{ij}$, $\forall i, j = 1, \ldots, 5$, represent random variables, along with five sufficient statistics ($f_i(y)$, $i = 1, \ldots, 5$), namely the edge, 2-star, 3-star, 4-star and triangle counts (B). The probability distribution of any possible network is obtained as follows:

$$P(Y = y; \Theta) = \frac{1}{Z} \exp\big(\theta_1 f_1(y) + \theta_2 f_2(y) + \theta_3 f_3(y) + \theta_4 f_4(y) + \theta_5 f_5(y)\big),$$

where $y$ is any observed network with five nodes, and $\Theta = \{\theta_1, \ldots, \theta_5\}$, the set of weights of the sufficient statistics, is estimated by solving an optimization problem in which the probability of the observed network is maximized. The exact computation of the normalization constant, $Z$, requires handling many terms (all possible network realizations must be considered and their corresponding sufficient statistics calculated). This challenge is conventionally handled using Markov chain Monte Carlo (MCMC) sampling (Snijders et al. 2006).
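The five-node example is small enough to compute $Z$ exactly, since there are only $2^{10} = 1024$ undirected graphs on five labeled nodes; the number of terms grows as $2^{N(N-1)/2}$, which is why MCMC is used in practice. The sketch below computes the five sufficient statistics named above, the exact probability of a network, and a basic Metropolis sampler that toggles one randomly chosen dyad per step. The parameter values and the example network are hypothetical, and this didactic sampler is not the estimation procedure of Snijders et al. (2006).

```python
from itertools import combinations, product
from math import comb, exp
import random

N = 5
PAIRS = list(combinations(range(N), 2))       # the 10 dyads (i, j) with i < j

def stats(edge_set):
    """f(y) = [edges, 2-stars, 3-stars, 4-stars, triangles]."""
    deg = [0] * N
    for i, j in edge_set:
        deg[i] += 1
        deg[j] += 1
    stars = [sum(comb(d, k) for d in deg) for k in (2, 3, 4)]
    tri = sum(1 for a, b, c in combinations(range(N), 3)
              if {(a, b), (a, c), (b, c)} <= edge_set)
    return [len(edge_set)] + stars + [tri]

def score(edge_set, theta):
    """sum_i theta_i * f_i(y), the exponent in Eq. 5."""
    return sum(t * f for t, f in zip(theta, stats(edge_set)))

def exact_prob(y, theta):
    """P(Y = y; Theta), with Z summed over all 2^10 five-node graphs."""
    Z = sum(exp(score({p for p, b in zip(PAIRS, bits) if b}, theta))
            for bits in product((0, 1), repeat=len(PAIRS)))
    return exp(score(y, theta)) / Z

def mcmc_sample(theta, steps=5000, seed=0):
    """Metropolis sampler: propose toggling one uniformly chosen dyad."""
    rng = random.Random(seed)
    y = set()
    for _ in range(steps):
        proposal = y ^ {rng.choice(PAIRS)}    # add or remove that tie
        delta = score(proposal, theta) - score(y, theta)
        if delta >= 0 or rng.random() < exp(delta):
            y = proposal
    return y

theta = [-1.0, 0.2, 0.0, 0.0, 0.5]            # hypothetical weights
y_obs = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}
print(stats(y_obs), exact_prob(y_obs, theta))
print(sorted(mcmc_sample(theta)))
```

Because the toggle proposal is symmetric, accepting with probability $\min(1, e^{\Delta})$ leaves the ERGM distribution invariant, and only differences in the exponent are needed, so $Z$ never has to be computed.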

Some of the first proposed models, e.g., random graphs and $p_1$ models (Frank and Strauss 1986), used Bernoulli and dyadic dependence structures, which are generally overly simplistic (Robins et al. 2007a). In contrast, ERGMs are based on the Markov dependence assumption (Frank and Strauss 1986), which supposes that two possible ties are conditionally dependent when they share an actor (node). Moreover, the Markov dependence assumption can be extended to attributed networks, in which each node is assumed to have a set of attributes influencing the node's possible incoming and outgoing ties (Robins et al. 2007a) (e.g.,

Fig. 7 The spread of new drug adoption through an advisory network of physicians: two snapshots at different time points, about 2 years apart (from left to right). The growth dynamics in the number of adopters can be analyzed with an Ising model


more experienced actors in an advisory network have more incoming ties). When nodal attributes are taken into account as random variables, ERGMs and MNs can be integrated to model the social network because of the similarities they share (see the Appendix and Fellows and Handcock 2012; Thiemichen et al. 2014; Lusher et al. 2012).

ERGMs have been widely employed to study network and friendship formation (Song et al. 2014) and to infer global network structure from the local structure of the observed network (Uddin et al. 2013a). The observed network is considered one realization out of many possible networks with similar important characteristics (Robins et al. 2007a). For example, Broekel and Hartog (2013) used ERGMs to identify factors determining the structure of inter-organizational networks based on a single observation. Schaefer and Simpkins (2014) used SNA to study the relation between weight status and friend selection, and ERGMs to measure the effects of body mass index on friend selection.

Moreover, Goodreau et al. (2009) used ERGMs to examine the generative processes that give rise to widespread patterns in friendship networks. Cranmer and Desmarais used ERGMs to model co-sponsorship networks in the U.S. Congress and conflict networks in the international system. They determined that several previously unexplored network parameters are acceptable predictors of the U.S. House of Representatives legislative co-sponsorship network (Cranmer and Desmarais 2011).

ERGMs have also been utilized in modeling changing communication network structure and classifying networks based on the occurrence of their local features (Uddin et al. 2013a), and in identifying the effects of micro-level structural properties of physician collaboration networks on hospitalization cost and readmission rate (Uddin et al. 2013b). Finally, an ERGM-based model for clustering nodes according to their role in the network has been reported (Salter-Townshend and Murphy 2014).

4 Discussion

Mining social networks for knowledge and discovery has proven to be a very challenging and active research area. This review focused on PGMs. The directed and undirected PGM paradigms were described and their applications to social networks were highlighted. An important consideration and major challenge is scalability, not only for PGMs, but for SNA in general. Structural and parameter learning in high dimensions can be prohibitive. Moreover, for structural learning, both greedy and sampling-based search strategies can get stuck at local optima, and many graphs may be likelihood equivalent. These numerical caveats can give rise to misleading networks, generating models, and subsequent predictions. In addition, ERGMs can exhibit degeneracy, which occurs when the networks generated by a fitted model show little resemblance to the observed network. Modifications to the concept of goodness of fit have been proposed to safeguard against the problems of degeneracy (Goodreau 2007; Hunter et al. 2008).

In the majority of applications of PGMs (both directed and undirected) in SNA, the graphical structures are assumed to be either known or designed by human experts (i.e., captured directly by the social networks), so the learning problem is limited to parameter estimation. However, hand-constructed PGMs for SNA face many practical barriers: the time taken to construct them varies from hours to months, experts can be costly or unavailable, the data may be huge, and errors may lead to poor answers. On the other hand, structure learning is NP-hard, with the hypothesis space being super-exponential ($2^{O(n^2)}$ networks).
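The super-exponential growth of the hypothesis space can be seen by counting directed acyclic graphs, i.e., the candidate BN structures. The sketch below uses Robinson's well-known recurrence for the number of labeled DAGs on $n$ nodes; for undirected MNs, the number of candidate graphs is simply $2^{n(n-1)/2}$.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 8):
    print(n, num_dags(n))
```

Already at seven nodes there are more than a billion candidate structures, so exhaustive search is hopeless and heuristic search (greedy or sampling-based) becomes unavoidable.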

Directed and undirected graphs share common interpretations in terms of conditional independences. Selecting a PGM modeling paradigm is not trivial; the choice is driven by the data and, ultimately, by what the user hopes to achieve with the model. When relationships can be viewed in terms of cause and effect, BNs are more appropriate; when they are associations, MNs are preferred. Inference in both paradigms is challenging. The types of variables (continuous or discrete) have to be carefully considered. Modeling with a mixture of these variable types is possible for BNs under strict assumptions. However, inference then becomes more sensitive to sample size, because the parameters of each local model are estimated from a potentially reduced subpopulation, which can be severely subset by the levels of discrete parent nodes. Another important task, outside the scope of this review, is answering queries that involve absorbing evidence (e.g., new data) into the network and propagating it through the network. This process is known as belief propagation, and it takes place on a cluster graph, whose nodes are clusters of variables that exchange messages. When the cluster graph is a tree, i.e., a clique tree (aka junction tree) as typically constructed for BNs, the propagation schemes yield exact marginal distributions (aka beliefs). In MNs, on the other hand, the cluster graph may have cycles, in which case propagation does not guarantee exact marginals, although it has still been shown to be useful in practice; see Koller and Friedman (2009) for more details.
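A minimal illustration of why propagation is exact on a tree: the sketch below runs sum-product message passing on a three-node chain Markov network (a tree) with binary variables, and checks the resulting beliefs against marginals obtained by brute-force enumeration. The potential values are arbitrary assumptions. On a graph with cycles, the same local updates (loopy belief propagation) would in general only approximate these marginals.

```python
from itertools import product

# Illustrative potentials (assumed values) for a binary chain X0 - X1 - X2.
UNARY = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]   # phi_i(x_i)
PAIR = [[2.0, 1.0], [1.0, 2.0]]                # psi(x_i, x_{i+1}), favors agreement
N = len(UNARY)

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def brute_force_marginals():
    """Exact marginals by enumerating all 2**N joint states."""
    totals = [[0.0, 0.0] for _ in range(N)]
    for x in product((0, 1), repeat=N):
        w = 1.0
        for i in range(N):
            w *= UNARY[i][x[i]]
        for i in range(N - 1):
            w *= PAIR[x[i]][x[i + 1]]
        for i in range(N):
            totals[i][x[i]] += w
    return [normalize(t) for t in totals]

def sum_product_marginals():
    """Sum-product belief propagation on the chain (a tree, hence exact)."""
    # fwd[i][s]: message arriving at node i from its left neighbor
    fwd = [[1.0, 1.0] for _ in range(N)]
    for i in range(1, N):
        fwd[i] = [sum(fwd[i - 1][a] * UNARY[i - 1][a] * PAIR[a][b] for a in (0, 1))
                  for b in (0, 1)]
    # bwd[i][s]: message arriving at node i from its right neighbor
    bwd = [[1.0, 1.0] for _ in range(N)]
    for i in range(N - 2, -1, -1):
        bwd[i] = [sum(PAIR[a][b] * UNARY[i + 1][b] * bwd[i + 1][b] for b in (0, 1))
                  for a in (0, 1)]
    # Belief = incoming messages times the local potential, normalized.
    return [normalize([fwd[i][s] * UNARY[i][s] * bwd[i][s] for s in (0, 1)])
            for i in range(N)]

exact = brute_force_marginals()
beliefs = sum_product_marginals()
for i, (e, b) in enumerate(zip(exact, beliefs)):
    print(f"X{i}: exact={e[0]:.4f}/{e[1]:.4f}  BP={b[0]:.4f}/{b[1]:.4f}")
```

Enumeration costs 2^N evaluations, whereas the two message sweeps grow only linearly with the chain length.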

There are several opportunities to access open-source data resources in order to develop and test methodologies for PGMs and related areas. Max Planck researchers have released OSN data used in their publications, which includes crawled data from Flickr, YouTube, Wikipedia and Facebook (Mislove et al. 2007; Cha et al. 2008, 2009; Viswanath et al. 2009). Several directed OSNs have been released in the Stanford Network Analysis Package (snap), e.g., from Epinions, Amazon, LiveJournal, Slashdot and Wikipedia voting (Stanford 2011). Recently, a Facebook dataset was released that exhibited convergence properties and was shown to be representative of the underlying population (Gjoka et al. 2010). Document classification datasets have also been released (Getoor 2012). A sample from the CiteSeer database contains 3312 publications, each from one of six classes, and 4732 links. The Cora dataset consists of 2708 publications classified into seven categories, and its citation network has 5429 links. Each publication is described by a binary word vector that indicates the presence or absence of each word from a dictionary of 1433 words. WebKB consists of 877 scientific publications from five classes, contains 1601 links, and includes binary word attributes similar to Cora. Terrorism databases are also publicly available (Division 1948; National Consortium for the Study of Terrorism and Responses to Terrorism 2015). The most extensive is the RAND Database of Worldwide Terrorism Incidents, which details terrorist attacks in nine distinct regions of the world across the time span 1968–2009 (dates vary slightly by region) (Division 1948). Several well-known challenges may arise in the analysis and representation of terrorist network data, including incomplete information, latent variables influencing node dynamics, and fuzzy boundaries between terrorists, supporters of terrorists, and the innocent (Sparrow 1991; Krebs 2002). The DBLP computer science bibliography (http://dblp.uni-trier.de/db/) is a massive online database that contains bibliographic metadata for over 2.6 million publications. There is also ample opportunity to participate in various data challenges, which are often posed by corporations and by the operators of the networks themselves.
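Most of these network datasets ship as plain-text edge lists. The sketch below assumes the layout used by many SNAP downloads: lines beginning with `#` are comments, and every other line holds a whitespace-separated `FromNodeId ToNodeId` pair. The function name `read_edge_list` and the toy sample are hypothetical, not real SNAP data. Node identifiers are kept as strings, so no assumption is made about their numbering.

```python
import io

def read_edge_list(lines, directed=True):
    """Build an adjacency map {node: set(successors)} from an edge list.

    Assumed layout: '#' starts a comment line; otherwise 'FromNodeId ToNodeId'.
    """
    adjacency = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and header/comment lines
        src, dst = line.split()[:2]
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set())  # keep sink nodes in the node set
        if not directed:
            adjacency[dst].add(src)
    return adjacency

# Made-up toy file in the assumed layout.
toy = io.StringIO(
    "# Directed graph: toy.txt\n"
    "# FromNodeId\tToNodeId\n"
    "0\t1\n0\t2\n1\t2\n2\t0\n"
)
adj = read_edge_list(toy)
n_nodes = len(adj)
n_edges = sum(len(targets) for targets in adj.values())
print(n_nodes, "nodes,", n_edges, "directed edges")
```

For a downloaded file, an open file object can be passed in place of the `StringIO`, since both iterate line by line.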

In this review, we surveyed directed and undirected PGMs and highlighted their applications in modern social networks. Despite limitations related to scalability and inference, it is our opinion that the utility of PGMs has been somewhat under-realized in the social network arena. It is indisputable that methods for understanding social networks have not kept pace with the data explosion. There are several relevant topics and opportunities in social networks, e.g., link prediction, collective classification, modeling information diffusion, entity resolution, and viral marketing, where conditional independencies can be leveraged to improve performance. PGMs encode conditional independencies directly in their graph structure and provide flexible modeling paradigms, which hold tremendous promise and untapped opportunity for SNA.

Acknowledgments A. N. is supported in part by a MURI grant (Number W911NF-09-1-0392) for Unified Research on Network-based Hard/Soft Information Fusion, issued by the US Army Research Office (ARO) under the program management of Dr. John Lavery, in part by the Academy of Finland Grant MineSocMed (Number 268078), and in part by the 2015 U.S. Air Force Summer Faculty Fellowship Program, sponsored by the Air Force Office of Scientific Research. R. H. B. is supported through NSF DMS 1312250.

Appendix

Similarity between MNs and ERGMs

While MNs and ERGMs were developed in different scientific domains, both specify exponential-family distributions. MN models treat social network nodes as random variables, and hence their utility is most obvious in modeling processes on networks. ERGMs, on the other hand, were conceptualized to model network formation, and they treat edge presence indicators as random variables (under the Markov dependence assumption, these random variables are dependent if their corresponding edges share a node). In fact, however, this application-related difference in what to treat as random is not fundamental. This Appendix discloses the similarity between MNs and ERGMs more rigorously by redefining an ERGM as a PGM. We begin, however, by reviewing the branch of literature devoted exclusively to ERGMs.

As with MNs, a well-discussed problem of ERGMs for analyzing social networks is the challenge of parameter estimation (Robins et al. 2007b), owing to the lack of sufficient observed data. Robins et al. (2007b) outline this and other problems associated with ERGMs, e.g., degeneracy in model selection and bimodal distribution shapes (see also Handcock et al. 2003; Rinaldo et al. 2009; Snijders et al. 2006; Handcock et al. 2006).

The roots of ERGMs in the Principle of Maximum Entropy (Park and Newman 2004) and the Hammersley–Clifford theorem have been pointed out previously (Robins et al. 2001; Goldenberg et al. 2010). Here, we illustrate how MNs and ERGMs are similar in form and structure using the most popular significant statistics in ERGMs. Under the assumption of Markov dependence, for a given social network one can build a corresponding Markov network via the following conversion: (1) each node in the Markov network corresponds to a possible edge in the social network [Fienberg called this construct a "usual graphical model" for ERGMs (Fienberg 2012)], and (2) when two edges share a node in the social network, a link is built between the two corresponding nodes in the Markov network.
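The two conversion rules translate directly into code. The sketch below assumes undirected ties and labels actors 1..N; both are illustrative choices, and the function name is hypothetical. It builds the Markov network over all possible ties. For the five-actor example used later in this Appendix, it yields 10 nodes, and each dyad is linked to the 2(N − 2) = 6 dyads it shares an actor with, giving 30 links.

```python
from itertools import combinations

def markov_dependence_graph(n_actors):
    """Markov network associated with a Markov-dependence ERGM.

    Rule (1): one node per possible undirected edge (dyad {i, j}).
    Rule (2): a link between two dyad-nodes whenever the dyads share an actor.
    """
    dyads = list(combinations(range(1, n_actors + 1), 2))
    links = [(d1, d2) for d1, d2 in combinations(dyads, 2) if set(d1) & set(d2)]
    return dyads, links

dyads, links = markov_dependence_graph(5)
print(len(dyads), "nodes,", len(links), "links")
```

Two distinct dyads share at most one actor, so the number of links is N times C(N − 1, 2), which is 30 for N = 5.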

Corresponding to each possible edge in a social network, a node in the MN is introduced; note the difference between the original social network and the MN: they are not the same! Consider an ERGM with significant statistics including the number of edges, $f_1(y)$, the number of $i$-stars, $f_i(y)$, $i = 2, \ldots, N-1$, and the number of triangles, $f_N(y)$. In an MN, a maximum entropy (maxent) model proposes the following form for the internal energy of each clique $c$: $E_c(x) = \sum_{i=1}^{N} \alpha_i^c f_i^c(x)$. Define $f_i^c(x)$ as the $i$th feature of clique $c \in \Omega$ (the set of cliques of $G$), and $\alpha_i^c$ as its corresponding weight in $G$. Thus, the potential of clique $c$ is $\psi_c(x) = \exp\{\beta \sum_{i=1}^{N} \alpha_i^c f_i^c(x)\} = \exp\{\beta E_c(x)\}$. Since there are too many parameters in the MN, their number can be reduced by imposing homogeneity constraints similar to those of ERGMs (Robins et al. 2007a). Before imposing such constraints, the following facts are needed.

It is straightforward to demonstrate that $G$ contains cliques of size $\{3, \ldots, N-1\}$. In addition, all substructures in the social network can be redefined as features in $G$. Considering these points, we can rewrite the joint probability of all variables represented by the MN, $P(X)$, as follows:

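$$P(X) = \frac{1}{Z(\alpha)} \prod_{c=1}^{C} \exp\Big\{\beta \sum_{i=1}^{N} \alpha_i^c f_i^c(x)\Big\} = \frac{1}{Z(\alpha)} \exp\Big\{\sum_{c=1}^{C} \sum_{i=1}^{N} \beta\, \alpha_i^c f_i^c(x)\Big\} \tag{6}$$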

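In (6), $Z(\alpha)$ is the partition function, which is a function of the parameters. The homogeneity assumption here means $\alpha_i^c = \theta_i$ for all $c = 1, \ldots, C$; then $P(X)$ is:

$$P(X) = \frac{1}{Z(\theta)} \exp\Big\{\sum_{i=1}^{N} \theta_i \sum_{c=1}^{C} \beta f_i^c(x)\Big\}. \tag{7}$$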

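In (7), let $Z_\theta = Z(\theta)$. In addition, we denote $\beta \sum_{c=1}^{C} f_i^c(x)$ by $f_i^G(x)$; that is, the counts of substructure $i$ in all cliques $c$ are added up with weight $\beta$. Finally, if we replace $\beta \sum_{c=1}^{C} f_i^c(x)$ with $f_i^G(x)$ in (7):

$$P(X) = \frac{1}{Z_\theta} \exp\Big\{\sum_{i=1}^{N} \theta_i f_i^G(x)\Big\}. \tag{8}$$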

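Comparing $P(Y = y)$ and (8) confirms that ERGMs and MNs are similar, and under the following conditions they are identical:

1. $\theta_i^{\mathrm{ERGM}} = \theta_i^{\mathrm{MN}}$ for $i = 1, \ldots, N$;

2. $f_i(y) = f_i^G(x) = \beta \sum_{c=1}^{C} f_i^c(x)$.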
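The two conditions can be checked numerically. The sketch below is a hypothetical construction, not code from the source. It takes N = 4 actors, assumes β = 1 and arbitrary parameter values, and defines the clique features so that condition 2 holds:

- each star clique (the dyads incident to one actor) carries half the number of its present edges, since every edge lies in two star cliques, plus the i-stars centered on that actor;
- each triangle clique carries its triangle indicator.

Enumerating all 64 graphs then shows that the MN of (8) and the ERGM assign the same probability to every graph.

```python
from itertools import combinations
from math import comb, exp

# Hypothetical setup: N = 4 actors, beta = 1, arbitrary homogeneous weights.
N = 4
THETA = [-1.0, 0.3, -0.2, 0.8]   # theta_1 (edges), theta_2..theta_{N-1} (i-stars), theta_N (triangles)
BETA = 1.0

ACTORS = range(N)
DYADS = list(combinations(ACTORS, 2))   # nodes of the Markov network
TRIADS = list(combinations(ACTORS, 3))

def degrees(g):
    return [sum(v in d for d in g) for v in ACTORS]

def is_triangle(g, a, b, c):
    return {(a, b), (a, c), (b, c)} <= g

def ergm_stats(g):
    """Global ERGM statistics f_1(y), ..., f_N(y) of graph g (a set of dyads)."""
    deg = degrees(g)
    stars = [sum(comb(k, i) for k in deg) for i in range(2, N)]
    return [len(g)] + stars + [sum(is_triangle(g, *t) for t in TRIADS)]

def clique_features(g):
    """Local features f_i^c(x), one list per clique c of the Markov network."""
    deg = degrees(g)
    feats = []
    for v in ACTORS:                      # star clique: all dyads incident to actor v
        f = [0.0] * N
        f[0] = deg[v] / 2                 # each edge lies in two star cliques
        for i in range(2, N):
            f[i - 1] = comb(deg[v], i)    # i-stars centered at v
        feats.append(f)
    for t in TRIADS:                      # triangle clique: the three dyads among t
        f = [0.0] * N
        f[N - 1] = float(is_triangle(g, *t))
        feats.append(f)
    return feats

def f_G(g):
    """f_i^G(x) = beta * sum over cliques of f_i^c(x)."""
    feats = clique_features(g)
    return [BETA * sum(f[i] for f in feats) for i in range(N)]

def normalize(log_weights):
    w = [exp(x) for x in log_weights]
    z = sum(w)                            # partition function by brute force
    return [x / z for x in w]

GRAPHS = [{d for k, d in enumerate(DYADS) if mask >> k & 1}
          for mask in range(2 ** len(DYADS))]

p_ergm = normalize([sum(t * f for t, f in zip(THETA, ergm_stats(g))) for g in GRAPHS])
p_mn = normalize([sum(t * f for t, f in zip(THETA, f_G(g))) for g in GRAPHS])
print(len(GRAPHS), "graphs; max |P_ERGM - P_MN| =",
      max(abs(a - b) for a, b in zip(p_ergm, p_mn)))
```

If the features were instead defined as raw counts within each clique, every edge would be counted several times. Condition 2 would then fail, and the two distributions would differ.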

The following numerical example (the same example as in the ERGM section) depicts similarities between ERGMs and MNs. The social network has five actors, $N = 5$ (Fig. 8). Under the Markov dependence assumption, there exists a unique corresponding Markov network, shown in Fig. 9, with 10 nodes. There are 15 cliques (so-called factors) of size three or four:

Fig. 8 a A social network with 5 nodes and b the corresponding realization network (graph) and sufficient statistics of the observed network

Fig. 9 A social network with five actors (left) and its corresponding Markov network (right)


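$$\Phi = \{\phi_1(\cdot), \ldots, \phi_{15}(\cdot)\},$$

where each factor $\phi_c$ is a function of the edge variables $y_{ij}$ belonging to clique $c$.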
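The clique count can be reproduced by brute force. The sketch below is an illustrative check, not the authors' procedure. It rebuilds the 10-node Markov network for N = 5 and enumerates its maximal cliques:

- the 5 "star" cliques (the four dyads incident to one actor) have size four;
- the 10 "triangle" cliques (the three dyads among three actors) have size three.

Together they give the 15 factors.

```python
from collections import Counter
from itertools import combinations

N = 5
DYADS = list(combinations(range(1, N + 1), 2))   # the 10 nodes of the Markov network

def adjacent(d1, d2):
    """Two dyad-nodes are linked iff the dyads share an actor."""
    return bool(set(d1) & set(d2))

def is_clique(nodes):
    return all(adjacent(a, b) for a, b in combinations(nodes, 2))

# Brute force over all 2**10 subsets: keep cliques that no further node can extend.
maximal = []
for r in range(1, len(DYADS) + 1):
    for subset in combinations(DYADS, r):
        if is_clique(subset) and not any(
            is_clique(subset + (d,)) for d in DYADS if d not in subset
        ):
            maximal.append(subset)

print(len(maximal), "maximal cliques; sizes:", Counter(len(c) for c in maximal))
```

A set of dyads that pairwise share an actor is either a star or a triangle, which is why only these two shapes appear.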

As already mentioned, the factor defined over all variables in each clique is an exponential function of that clique's internal energy. For instance, $\phi_1(x) = \exp\{\beta E_1(x)\}$, where $E_1(x)$ is the internal energy of the first clique, written in terms of its edge variables $y_{ij}$ and the distribution parameter $\lambda$. This simple example shows how ERGMs and MNs are the same in terms of the underlying concept and the expressed probability distribution.

References

Afrasiabi MH, Guérin R, Venkatesh S (2013) Opinion formation in Ising networks. In: Information theory and applications workshop (ITA), 2013, pp 1–10. IEEE

Aggarwal CC (2011) An introduction to social network data analytics. Springer, Berlin

Agliari E, Burioni R, Contucci P (2010) A diffusive strategic dynamics for social systems. J Stat Phys 139(3):478–491

Al Hasan M, Zaki MJ (2011) A survey of link prediction in social networks. In: Social network data analytics. Springer, Berlin, pp 243–275

Anderson RM, May RM et al (1979) Population biology of infectious diseases: Part I. Nature 280(5721):361–367

Ayday E, Fekri F (2010) A belief propagation based recommender system for online services. In: Proceedings of the fourth ACM conference on recommender systems, pp 217–220. ACM

Bach SH, Broecheler M, Getoor L, O’Leary DP (2012) Scaling MPE inference for constrained continuous Markov random fields with consensus optimization. In: NIPS, pp 2663–2671

Berry MJ, Linoff G (1997) Data mining techniques: for marketing, sales, and customer support. Wiley, New York

Bonchi F, Castillo C, Gionis A, Jaimes A (2011) Social network analysis and mining for business applications. ACM Trans Intell Syst Technol (TIST) 2(3):22

Broekel T, Hartog M (2013) Explaining the structure of inter-organizational networks using exponential random graph models. Ind Innov 20(3):277–295

Bromberg F, Margaritis D, Honavar V et al (2009) Efficient Markov network structure discovery using independence tests. J Artif Intell Res 35(2):449

Cha M, Mislove A, Adams B, Gummadi KP (2008) Characterizing social cascades in Flickr. In: Proceedings of the 1st workshop on online social networks (WOSN’08), Seattle, WA

Cha M, Mislove A, Gummadi KP (2009) A measurement-driven analysis of information propagation in the Flickr social network. In: Proceedings of the 18th annual World wide web conference (WWW’09), Madrid, Spain

Chapelle O, Zhang Y (2009) A dynamic Bayesian network click model for web search ranking. In: Proceedings of the 18th international conference on World wide web, pp 1–10. ACM

Chen H, Ku WS, Wang H, Tang L, Sun MT (2013) Linkprobe: probabilistic inference on large-scale social networks. In: 2013 IEEE 29th international conference on data engineering (ICDE), pp 290–301. IEEE

Chickering DM, Heckerman D, Meek C (2001) Large-sample learning of Bayesian networks is NP-hard. J Mach Learn Res 5(2004):1287–1330

Coleman JS, Katz E, Menzel H et al (1966) Medical innovation: a diffusion study. Bobbs-Merrill Company Indianapolis, New York

Cowan R, Jonard N (2004) Network structure and the diffusion of knowledge. J Econ Dyn Control 28(8):1557–1575

Crane R, McDowell LK (2011) Evaluating Markov logic networks for collective classification. In: Proceedings of the 9th MLG workshop at the 17th ACM SIGKDD conference on knowledge discovery and data mining

Cranmer SJ, Desmarais BA (2011) Inferential network analysis with exponential random graph models. Polit Anal 19(1):66–86

Daud A, Li J, Zhou L, Muhammad F (2010) Knowledge discovery through directed probabilistic topic models: a survey. Front Comput Sci China 4(2):280–301

Dielmann A, Renals S (2004) Dynamic Bayesian networks for meeting structuring. In: IEEE international conference on acoustics, speech, and signal processing, 2004. Proceedings (ICASSP’04), vol 5, p V-629. IEEE

Dierkes T, Bichler M, Krishnan R (2011) Estimating the effect of word of mouth on churn and cross-buying in the mobile phone market with Markov logic networks. Decis Support Syst 51(3):361–371

Ding S (2011) Learning undirected graphical models with structure penalty. arXiv:1104.5256

Division NSR (1948) Rand database of worldwide terrorism incidents. http://www.rand.org/nsrd/projects/terrorism-incidents.html

Domingos P, Kok S, Lowd D, Poon H, Richardson M, Singla P (2008) Markov logic. In: Probabilistic inductive logic programming. Springer, Berlin, pp 92–117

Domingos P, Lowd D, Kok S, Nath A, Poon H, Richardson M, Singla P (2010) Markov logic: a language and algorithms for link mining. In: Link mining: models, algorithms, and applications. Springer, New York, pp 135–161

Fang L, LeFevre K (2010) Privacy wizards for social networking sites. In: Proceedings of the 19th international conference on World wide web, pp 351–360. ACM

Fellows I, Handcock MS (2012) Exponential-family random network models (preprint). arXiv:1208.0121

Fienberg SE (2012) A brief history of statistical models for network analysis and open challenges. J Comput Graph Stat 21(4):825–839

Frank O, Strauss D (1986) Markov graphs. J Am Stat Assoc 81(395):832–842

Freeman L (2004) The development of social network analysis. Empirical Press, Vancouver

Friedman N, Murphy K, Russell S (1998) Learning the structure of dynamic probabilistic networks. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., San Mateo, pp 139–147

Getoor L (2012) Social network datasets. http://www.cs.umd.edu/

sen/lbc-proj/LBC.html

Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: INFOCOM, 2010 Proceedings IEEE, pp 1–9

Goldenberg A, Moore A (2004) Tractable learning of large Bayes net structures from sparse data. In: Proceedings of the twenty-first international conference on machine learning, p 44. ACM

Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233

Goodreau SM (2007) Advances in exponential random graph (p*) models applied to a large social network. Soc Netw 29(2):231–248

Goodreau SM, Kitts JA, Morris M (2009) Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography 46(1):103–125

Grabowski A, Kosiński R (2006) Ising-based model of opinion formation in a complex network of interpersonal interactions. Physica A: Stat Mech Appl 361(2):651–664

Hageman RS, Leduc MS, Korstanje R, Paigen B, Churchill GA (2011) A Bayesian framework for inference of the genotype–phenotype map for segregating populations. Genetics 187:1163–1170

Handcock MS, Robins G, Snijders TA, Moody J, Besag J (2003) Assessing degeneracy in statistical models of social networks. Technical report, Working paper

Handcock M, Hunter D, Butts C, Goodreau S, Morris M (2006) Statnet: an R package for the statistical analysis and simulation of social networks. Manual. University of Washington

He J, Chu WW, Liu ZV (2006) Inferring privacy information from social networks. In: Intelligence and security informatics. Springer, Berlin, pp 154–165

Heckerman D (2008) A tutorial on learning with Bayesian networks. Springer, Berlin

Humphreys L (2007) Mobile social networks and social practice: a case study of dodgeball. J Comput Mediat Commun 13(1):341–360

Hunter DR, Goodreau SM, Handcock MS (2008) Goodness of fit of social network models. J Am Stat Assoc 103(481):

Jabeur LB, Tamine L, Boughanem M (2012a) Featured tweet search: modeling time and social influence for microblog retrieval. In: 2012 IEEE/WIC/ACM international conferences on Web intelligence and intelligent agent technology (WI-IAT), vol 1, pp 166–173. IEEE

Jabeur LB, Tamine L, Boughanem M (2012b) Uprising microblogs: a Bayesian network retrieval model for tweet search. In: Proceedings of the 27th annual ACM symposium on applied computing, pp 943–948. ACM

Jansen BJ, Zhang M, Sobel K, Chowdury A (2009) Twitter power: tweets as electronic word of mouth. J Am Soc Inf Sci Technol 60(11):2169–2188

Java A, Song X, Finin T, Tseng B (2007) Why we twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pp 56–65. ACM

Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 137–146. ACM

Koelle D, Pfautz J, Farry M, Cox Z, Catto G, Campolongo J (2006) Applications of Bayesian belief networks in social network analysis. In: Proceedings of the 4th Bayesian modeling applications workshop, UAI conference

Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. Massachusetts Institute of Technology, Cambridge

Krause SM, Böttcher P, Bornholdt S (2012) Mean-field-like behavior of the generalized voter-model-class kinetic Ising model. Phys Rev E 85(3):031126

Krebs VE (2002) Mapping networks of terrorist cells. Connections 24(3):43–52

Kuter U, Golbeck J (2007) Sunny: a new algorithm for trust inference in social networks using probabilistic confidence models. AAAI 7:1377–1382

Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web, pp 591–600. ACM

Lauritzen SL (1996) Graphical models. Oxford University Press, Oxford

Lee SI, Ganapathi V, Koller D (2006) Efficient structure learning of Markov networks using l1-regularization. In: Advances in neural information processing systems, pp 817–824

Lipford HR, Besmer A, Watson J (2008) Understanding privacy settings in Facebook with an audience view. UPSEC 8:1–8

Lusher D, Koskinen J, Robins G (2012) Exponential random graph models for social networks: theory, methods, and applications. Cambridge University Press, Cambridge

Madigan D, York J, Allard D (1995) Bayesian graphical models for discrete data. Int Stat Rev 63(2):215–232

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH (2011) Big data: the next frontier for innovation, competition, and productivity

Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/USENIX Internet measurement conference (IMC’07), San Diego, CA

Morris M, Handcock MS, Hunter DR (2008) Specification of exponential-family random graph models: terms and computational aspects. J Stat Softw 24(4):1548

Mukherjee S, Speed T (2008) Network inference using informative priors. PNAS 11158:14313–14318

Murphy KP (2002) Dynamic Bayesian networks: representation, inference and learning. PhD thesis, University of California

Murphy KP (2012) Machine learning: a probabilistic perspective. The MIT Press, Cambridge

National Consortium for the Study of Terrorism and Responses to Terrorism (START) (2015) University of Maryland. http://www.start.umd.edu/

Neville J, Jensen D (2007) Relational dependency networks. J Mach Learn Res 8:653–692

Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582

Newman ME, Watts DJ, Strogatz SH (2002) Random graph models of social networks. Proc Natl Acad Sci 99(suppl 1):2566–2572

Ounis I, Macdonald C, Lin J, Soboroff I (2011) Overview of the TREC-2011 microblog track. In: Proceedings of the 20th Text REtrieval Conference (TREC 2011)

Park J, Newman ME (2004) Statistical mechanics of networks. Phys Rev E 70(6):066117

Pattison P, Wasserman S (1999) Logit models and logistic regressions for social networks: II. Multivariate relations. Br J Math Stat Psychol 52(2):169–193

Ravikumar P, Wainwright MJ, Lafferty JD et al (2010) High-dimensional Ising model selection using l1-regularized logistic regression. Ann Stat 38(3):1287–1319

Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62(1–2):107–136

Rinaldo A, Fienberg SE, Zhou Y et al (2009) On the geometry of discrete exponential families with application to exponential random graph models. Electr J Stat 3:446–484

Robins G, Pattison P, Wasserman S (1999) Logit models and logistic regressions for social networks: III. Valued relations. Psychometrika 64(3):371–394

Robins G, Pattison P, Elliott P (2001) Network models for social influence processes. Psychometrika 66(2):161–189

Robins G, Pattison P, Kalish Y, Lusher D (2007a) An introduction to exponential random graph (p*) models for social networks. Soc Netw 29(2):173–191

Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007b) Recent developments in exponential random graph (p*) models for social networks. Soc Netw 29(2):192–215

Salter-Townshend M, Murphy TB (2014) Role analysis in networks using mixtures of exponential random graph models. J Comput Graph Stat (just-accepted)


Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review of statistical network analysis: models, algorithms, and software. Stat Anal Data Min ASA Data Sci J 5(4):243–264

Santos FC, Pacheco JM, Lenaerts T (2006) Evolutionary dynamics of social dilemmas in structured heterogeneous populations. Proc Natl Acad Sci USA 103(9):3490–3494

Schaefer DR, Simpkins SD (2014) Using social network analysis to clarify the role of obesity in selection of adolescent friends. Am J Public Health 104(7):1223–1229

Schmidt MW, Murphy K, Fung G, Rosales R (2010) Structure learning in random fields for heart motion abnormality detection. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 1–8

Scott J, Carrington PJ (2011) The SAGE handbook of social network analysis. SAGE Publications, London

Snijders TA, Pattison PE, Robins GL, Handcock MS (2006) New specifications for exponential random graph models. Sociol Methodol 36(1):99–153

Song X, Jiang S, Yan X, Chen H (2014) Collaborative friendship networks in online healthcare communities: an exponential random graph model analysis. In: Smart health, vol 8549. Springer, Switzerland, pp 75–87

Sparrow MK (1991) The application of network analysis to criminal intelligence: an assessment of the prospects. Soc Netw 13(3):251–274

Srihari S (2014) Probabilistic graphical models. In: Alhajj R, Rokne J (eds) Encyclopedia of social network analysis and mining. Springer, Berlin

Stanford (2011) Stanford network analysis package (snap). http://snap.stanford.edu

Taskar B, Abbeel P, Koller D (2002) Discriminative probabilistic models for relational data. In: Proceedings of the eighteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., USA, pp 485–492

Thiemichen S, Friel N, Caimo A, Kauermann G (2014) Bayesian exponential random graph models with nodal random effects (preprint). arXiv:1407.6895

Tresp V, Nickel M (2013) Relational models. In: Rokne J, Alhajj R (eds) Encyclopedia of social network analysis and mining. Springer, Heidelberg

Uddin S, Hamra J, Hossain L (2013a) Exploring communication networks to understand organizational crisis using exponential random graph models. Comput Math Organ Theory 19(1):25–41

Uddin S, Hossain L, Hamra J, Alam A (2013b) A study of physician collaborations through social network and exponential random graph. BMC Health Serv Res 13(1):234

Van den Bulte C, Lilien GL (2001) Medical innovation revisited: social contagion versus marketing effort. Am J Sociol 106(5):1409–1435

Vega-Redondo F (2007) Complex social networks, vol 44. Cambridge University Press, Cambridge

Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM SIGCOMM workshop on social networks (WOSN’09), Barcelona, Spain

Wan HY, Lin YF, Wu ZH, Huang HK (2012) Discovering typed communities in mobile social networks. J Comput Sci Technol 27(3):480–491

Wang Y, Vassileva J (2003) Bayesian network-based trust model. In: IEEE/WIC international conference on Web intelligence, 2003. WI 2003. Proceedings, pp 372–378. IEEE

Wasserman S, Pattison P (1996) Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika 61(3):401–425

Wortman J (2008) Viral marketing and the diffusion of trends on social networks

Xiang R, Neville J (2013) Collective inference for network data with copula latent Markov networks. In: Proceedings of the sixth ACM international conference on Web search and data mining, pp 647–656. ACM

Yang X, Guo Y, Liu Y (2013) Bayesian-inference-based recommendation in online social networks. IEEE Trans Parallel Distrib Syst 24(4):642–651



Applicable and Partial Learning of Graph Topology Without Sparsity Priors

Article

Full-text available

Jan 2022

This paper considers the problem of learning the underlying graph topology of Gaussian Graphical Models (GGMs) from observations. Under high-dimensional settings, to achieve low sample complexity, many existing graph topology learning algorithms assume structural constraints such as sparsity to hold. Without prior knowledge of graph sparsity, the correctness of their results is difficult to check. In this paper, we aim to do away with these assumptions by developing algorithms for learning degree-bounded GGMs and separable GGMs without any sparsity priors. The proposed algorithms, which are based only on the knowledge of conditional independence relations in the data distribution, require minimal structural assumptions while still achieving low sample complexity, and hence are ‘applicable’. Specifically, for any user defined sparsity parameter $k$ , we prove that the proposed algorithms can consistently identify whether a $p$ -dimensional GGM is degree-bounded by $k$ (or strongly $k$ -separable) with $\Omega (k \log p)$ sample complexity. Besides, our algorithms also demonstrate ‘partial’ learning properties whenever the overall graph is not entirely sparse, that is, not all nodes are degree-bounded (or are strongly separable). In this case, we can still learn the sparse portions of the graph, with theoretical guarantees included. Numerical results show that existing algorithms fail even in some simple settings where sparsity assumptions do not hold, whereas our algorithms do not.

Network Analysis of Count Data from Mixed Populations

Preprint

Full-text available

Dec 2022

In applications such as gene regulatory network analysis based on single-cell RNA sequencing data, samples often come from a mixture of different populations and each population has its own unique network. Available graphical models often assume that all samples are from the same population and share the same network. One has to first cluster the samples and use available methods to infer the network for every cluster separately. However, this two-step procedure ignores uncertainty in the clustering step and thus could lead to inaccurate network estimation. Motivated by these applications, we consider the mixture Poisson log-normal model for network inference of count data from mixed populations. The latent precision matrices of the mixture model correspond to the networks of different populations and can be jointly estimated by maximizing the lasso-penalized log-likelihood. Under rather mild conditions, we show that the mixture Poisson log-normal model is identifiable and has the positive definite Fisher information matrix. Consistency of the maximum lasso-penalized log-likelihood estimator is also established. To avoid the intractable optimization of the log-likelihood, we develop an algorithm called VMPLN based on the variational inference method. Comprehensive simulation and real single-cell RNA sequencing data analyses demonstrate the superior performance of VMPLN.

Open vs Closed-ended questions in attitudinal surveys -- comparing, combining, and interpreting using natural language processing

Preprint

Full-text available

May 2022

To improve the traveling experience, researchers have been analyzing the role of attitudes in travel behavior modeling. Although most researchers use closed-ended surveys, the appropriate method to measure attitudes is debatable. Topic Modeling could significantly reduce the time to extract information from open-ended responses and eliminate subjective bias, thereby alleviating analyst concerns. Our research uses Topic Modeling to extract information from open-ended questions and compare its performance with closed-ended responses. Furthermore, some respondents might prefer answering questions using their preferred questionnaire type. So, we propose a modeling framework that allows respondents to use their preferred questionnaire type to answer the survey and enable analysts to use the modeling frameworks of their choice to predict behavior. We demonstrate this using a dataset collected from the USA that measures the intention to use Autonomous Vehicles for commute trips. Respondents were presented with alternative questionnaire versions (open- and closed- ended). Since our objective was also to compare the performance of alternative questionnaire versions, the survey was designed to eliminate influences resulting from statements, behavioral framework, and the choice experiment. Results indicate the suitability of using Topic Modeling to extract information from open-ended responses; however, the models estimated using the closed-ended questions perform better compared to them. Besides, the proposed model performs better compared to the models used currently. Furthermore, our proposed framework will allow respondents to choose the questionnaire type to answer, which could be particularly beneficial to them when using voice-based surveys.

Open vs closed ended questions in attitudinal surveys comparing combining and interpreting using natural language processing

Article

Apr 2022
TRANSPORT RES C-EMER

To improve the traveling experience, researchers have been analyzing the role of attitudes in travel behavior modeling. Although most researchers use closed-ended surveys, the appropriate method to measure attitudes is debatable. Topic Modeling could significantly reduce the time needed to extract information from open-ended responses and eliminate subjective bias, thereby alleviating analyst concerns. Our research uses Topic Modeling to extract information from open-ended questions and compares its performance with closed-ended responses. Furthermore, some respondents might prefer to answer questions using a particular questionnaire type. We therefore propose a modeling framework that allows respondents to answer the survey using their preferred questionnaire type and enables analysts to use the modeling frameworks of their choice to predict behavior. We demonstrate this using a dataset collected in the USA that measures the intention to use Autonomous Vehicles for commute trips. Respondents were presented with alternative questionnaire versions (open- and closed-ended). Since our objective was also to compare the performance of alternative questionnaire versions, the survey was designed to eliminate influences resulting from statements, behavioral framework, and the choice experiment. Results indicate the suitability of using Topic Modeling to extract information from open-ended responses; however, the models estimated using the closed-ended questions perform better than those estimated using the open-ended responses. In addition, the proposed model performs better than the models currently in use. Furthermore, our proposed framework will allow respondents to choose which questionnaire type to answer, which could be particularly beneficial when using voice-based surveys.

Discussion to: Bayesian graphical models for modern biological applications by Y. Ni, V. Baladandayuthapani, M. Vannucci and F.C. Stingo: Looking for the missing link between graphical models and social network analysis

Article

Nov 2021

In the present contribution we provide a discussion of the paper "Bayesian graphical models for modern biological applications". The authors present an extensive review of Bayesian graphical models, which are used for a variety of inferential tasks in biological and medical settings. Our contribution proposes a conceptual connection between two scientific frameworks, graphical models and social network analysis, while also highlighting the role played by network models and random graphs. A bibliometric analysis is performed on publications collected from online bibliographic archives to map the main themes characterizing the two research fields. Specifically, a co-word network analysis is carried out using visualization tools and thematic evolution maps.

A network approach to dyslexia: Mapping the reading network

Article

Jul 2021
DEV PSYCHOPATHOL

Research on the etiology of dyslexia typically uses an approach based on a single core deficit, failing to understand how variations in combinations of factors contribute to reading development and how this combination relates to intervention outcome. To fill this gap, this study explored links between 28 cognitive, environmental, and demographic variables related to dyslexia by employing a network analysis using a large clinical database of 1,257 elementary school children. We found two highly connected subparts in the network: one comprising reading fluency and accuracy measures, and one comprising intelligence-related measures. Interestingly, phoneme awareness was functionally related to the controlled and accurate processing of letter-speech sound mappings, whereas rapid automatized naming was more functionally related to the automated convergence of visual and speech information. We found evidence for the contribution of a variety of factors to (a)typical reading development, though associated with different aspects of the reading process. As such, our results contradict prevailing claims that dyslexia is caused by a single core deficit. This study shows how the network approach to psychopathology can be used to study complex interactions within the reading network and discusses future directions for more personalized interventions.

A Comprehensive Survey on Information Diffusion Models for Social Media Text: Social Media Analytics

Chapter

Sep 2023

People use social media platforms such as Facebook, Twitter, and blogs to express their views and to criticise products they have purchased and movies they have watched. They also use these platforms to find information such as blood donation requests and job opportunities. During disasters such as floods and earthquakes, these platforms act as powerful media for passing messages to everyone. During the COVID-19 pandemic, businesses made effective use of social media platforms for instant communication and interaction among groups of people. In all these scenarios, information diffuses and reaches people at different levels. Sometimes this diffusion has positive effects on readers; sometimes it has negative impacts, with cascading effects of its own. It therefore becomes essential to monitor the rate at which information flows and to stop the spread of fake or false messages. Applying suitable graph network models and theories would support this research issue and help recommend an appropriate model for social media data.

Temporal Graph Representation Learning via Maximal Cliques

Conference Paper

Dec 2022

Predicting business processes of the social insurance using recurrent neural network and Markov chain

Article

Aug 2021
J Model Manag

Purpose: Predicting the final status of an ongoing process, or the next activity in a process, is an important aspect of process management. Semi-structured business processes cannot be predicted by precise mathematical methods, so artificial intelligence offers one successful alternative. This study aims to propose a method that combines deep learning, in particular the recurrent neural network, with a Markov chain. Design/methodology/approach: The proposed method applies the BestFirst algorithm for the search section and the Cfssubseteval algorithm for the feature comparison section. This study focuses on the prediction systems of social insurance and tries to present a less costly method that provides real-world results based on the past history of an event. Findings: The proposed method is simulated with real data obtained from the Iranian Social Security Organization. The results demonstrate that the proposed method uses slightly more memory than the Markov method; however, CPU time decreases dramatically compared with both the Markov method and the recurrent neural network, and the method therefore significantly increases accuracy and efficiency. Originality/value: This research tries to provide an approach capable of producing findings closer to the real world with less time and processing overhead, given the previous records of an event and the prediction systems of social insurance.

Temporal Pattern Detection in Time-Varying Graphical Models

Conference Paper

Jan 2021

Understanding networks with exponential-family random network models

Article

Aug 2023
SOC NETWORKS

The SAGE Handbook of Social Network Analysis

Book

Jan 2014

Relational Models

Chapter

Sep 2016

We provide a survey of relational models. Relational models describe complete networked domains by taking into account global dependencies in the data. Relational models can lead to more accurate predictions than non-relational machine learning approaches. Relational models are typically based on probabilistic graphical models, e.g., Bayesian networks, Markov networks, or latent variable models. Relational models have applications in social network analysis, the modeling of knowledge graphs, bioinformatics, recommendation systems, natural language processing, medical decision support, and linked data.

Bayesian network-based trust model

Article

Jan 2003

Walking in Facebook: A case study of unbiased sampling of OSNs

Article

Jan 2010

Big data: The next frontier for innovation, competition, and productivity

Technical Report

May 2011

LinkProbe: Probabilistic inference on large-scale social networks

Article

Jan 2013

A measurement-driven analysis of information propagation in the Flickr social network

Article

Jan 2009

Viral Marketing and the Diffusion of Trends on Social Networks

Article

May 2008

Jennifer Wortman

We survey the recent literature on theoretical models of diffusion in social networks and the application of these models to viral marketing. To put this work in context, we begin with a review of the most common models that have been examined in the economics and sociology literature, including local interaction games, threshold models, and cascade models, in addition to a family of models based on Markov random fields. We then discuss a series of recent algorithmic and analytical results that have emerged from the computer science community. The first set of results addresses the problem of influence maximization, in which the goal is to determine the optimal group of individuals in a social network to target with an advertising campaign in order to cause a new product or technology to spread throughout the network. We then discuss an analysis of the properties of graphs that allow or prohibit the widespread propagation of trends.

Collaborative Friendship Networks in Online Healthcare Communities: An Exponential Random Graph Model Analysis

Conference Paper

Jul 2014

Health 2.0 provides patients with an unprecedented way to connect with each other online. However, less attention has been paid to how collaborative friendships between patients form in online healthcare communities. This study examines the relationship between collaborative friendship formation and patients’ characteristics. Results from an Exponential Random Graph Model (ERGM) analysis indicate that gender homophily does not appear in collaborative friendship networks (CFNs), while health homophily, such as treatment homophily and health-status homophily, increases the likelihood of collaborative friendship formation. This study provides insights for improving website design to help foster close relationships among patients and deepen levels of engagement.

Recommended publications

Finding All Bayesian Network Structures within a Factor of Optimal

Learning Compact Markov Logic Networks with Decision Trees

Probabilistic graphical models in complex industrial applications

Using Bayesian networks to estimate missing Airborne Laser Swath Mapping (ALSM) data