Analysis of Physical Activity Propagation in a Health Social Network

NhatHai Phan
University of Oregon, USA
haiphan@cs.uoregon.edu
Dejing Dou
University of Oregon, USA
dou@cs.uoregon.edu
Xiao Xiao
University of Oregon, USA
xiaox@uoregon.edu
Brigitte Piniewski
PeaceHealth Laboratories
BPiniewski@peacehealthlabs.org
David Kil
HealthMantic, Inc
david.kil@healthmantic.com
ABSTRACT
Modeling physical activity propagation, such as the activity level
and intensity, is key to preventing cascades of obesity and to
helping spread wellness and healthy behavior in a social network.
However, there has been a lack of scientific and quantitative studies
elucidating how social communication may deliver physical activity
interventions. In this work we introduce a Community-level Physical
Activity Propagation (CPP) model to analyze physical activity
propagation and social influence at different granularities (i.e.,
individual level and community level). CPP is a novel model inspired
by the well-known Independent Cascade and Community-level Social
Influence models. Given a social network, we utilize a hierarchical
approach to detect a set of communities and the reciprocal influence
strength of their physical activities. CPP provides a powerful tool
to discover, summarize, and investigate influence patterns of
physical activities in a health social network. The detailed
experimental evaluation shows not only the effectiveness of our
approach but also the correlation of the detected communities with
various health outcome measures (i.e., both existing ones and our
novel measure, named Wellness score, which is a combination of
lifestyle parameters, biometrics, and biomarkers). Our promising
results potentially pave the way for knowledge discovery in health
social networks.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—
Data Mining
General Terms
Theory; Algorithms; Experimentation
Keywords
Physical activity propagation; health social network
CIKM ’14 November 03 - 07 2014, Shanghai, China
Copyright 2014 ACM 978-1-4503-2598-1/14/11$15.00.
http://dx.doi.org/10.1145/2661829.2662025.
1. INTRODUCTION
Regular physical activity reduces the risk of developing
cardiovascular disease, diabetes, obesity, osteoporosis, some
cancers, and other chronic conditions [15]. Public health standards
recommend that adults participate in at least 30 minutes of
moderate-intensity physical activity on 5 or more days a week [16].
However, less than 50% of the adult population meets these standards
in many industrialized countries [1, 15]. Thus, finding effective
population-based intervention strategies to propagate physical
activity is a key challenge.
The widespread use of the Internet and the success of online social
networks hold promise for wide-scale promotion of physical activity
behavior change. In many developed countries, Internet access is
greater than 63% and keeps increasing [5]. The Internet is identified
as an important source of health information and may thus be an
appropriate delivery channel for health behavior interventions [10].
Since 2000, a wide range of studies evaluating Internet-delivered
health behavior interventions have been reported. Over half of them
have reported positive behavioral outcomes [9, 17, 18, 27]. More
recently, online social networks have helped people interact and
participate in various physical activities, and thus could better
promote and spread physical activities at an affordable cost.
However, there has been a lack of scientific and quantitative studies
elucidating how social networks may contribute to physical activity
propagation.
Besides online social networks, recent advances in mobile technology
provide new opportunities to support healthy behaviors through
lifestyle monitoring and online communities. Mobile devices can track
and record the walking/jogging/running distance and intensity of an
individual. Utilizing these technologies, our recent study, named
YesiWell, was conducted in 2010-2011 as a collaboration between
PeaceHealth Laboratories, SK Telecom Americas, and the University of
Oregon to record daily physical activities, social activities (i.e.,
text messages, social games, meetup events, competitions, etc.),
biomarkers, and biometric measures (i.e., cholesterol, triglyceride,
BMI, etc.) for a group of 254 individuals who formed a health social
network. Physical activities are reported via a mobile device carried
by each user. All users are enrolled in an online social network
application that allows them to make friends and communicate with
each other. Biomarkers and biometric measures are recorded via
monthly medical tests performed at our laboratories on each user. The
fundamental problems this study seeks to answer, which are also key
to understanding the determinants of healthy behavior propagation,
are as follows:
1. Can social communication affect physical activity propagation?
2. How can we leverage social interaction to understand physical
activity propagation?
3. How can we understand the propagation process at different
granularities?
4. Can we clarify the effect of physical activity propagation on
health outcome measures?
Figure 1: Probability of a message becoming effective in propagating
physical activities.
For the first question, to illustrate that social communication can
deliver physical activity, we performed a simple statistical analysis
on our health social network. Assume that a user $u$ receives a
message $m$ at timestamp $t$ from another user. We compare the total
number of walking and running steps of $u$ in the future period
$[t, t + \Delta t]$ with that in the past period $[t - \Delta t, t]$.
If $u$ increases his total number of steps, then $m$ is considered an
effective message. The solid line in Figure 1 illustrates the
probability of a message becoming effective; meanwhile, the dashed
line shows the probability of users increasing their total number of
steps when the timestamp $t$ is chosen randomly (i.e., the user might
or might not receive a message at a random time $t$). It is clear
that with $\Delta t = 1$ day the probability of a user increasing his
total number of steps is up to 0.58, significantly larger than the
0.26 obtained for a random $t$. This phenomenon persists as
$\Delta t$ increases to 50 days, before dropping off. This evidence
strengthens our belief that social communications in health social
networks can help propagate physical activities.
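The effective-message test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the names `steps_in`, `is_effective`, and `effective_probability` are hypothetical, step counts are assumed to be stored per integer day, missing days are counted as zero steps, and the two windows are treated as half-open (`[t - dt, t)` and `[t, t + dt)`) so that day `t` is not counted twice.

```python
def steps_in(daily_steps, start, end):
    """Total steps on integer days d with start <= d < end.

    daily_steps is a hypothetical {day: steps} mapping; days without a
    record contribute zero (an assumption made for this sketch).
    """
    return sum(daily_steps.get(day, 0) for day in range(start, end))


def is_effective(daily_steps, t, delta_t):
    """True if the receiver walks more in the window after t than before t."""
    past = steps_in(daily_steps, t - delta_t, t)
    future = steps_in(daily_steps, t, t + delta_t)
    return future > past


def effective_probability(steps_by_user, messages, delta_t):
    """Fraction of (receiver, timestamp) messages that are effective."""
    if not messages:
        return 0.0
    hits = sum(is_effective(steps_by_user[u], t, delta_t) for u, t in messages)
    return hits / len(messages)
```

The random baseline in Figure 1 would use the same functions with timestamps drawn at random instead of actual message times.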
Motivated by this evidence, our goal in this paper is to understand
the dynamics of physical activity propagation via social
communication channels at both the individual level and the community
level. More concretely: 1) we aim to evaluate the probability of
physical activity propagation for every social communication edge.
The estimated probabilities can be used in many applications (e.g.,
propagation prediction, health behavior interventions, etc.); 2) we
then devise a graph summarization paradigm for the analysis of
physical activity propagation and social influence. In fact, we aim
to find an abstraction of the propagation process which provides data
analysts with a compact, yet meaningful, view of patterns of
influence and activity diffusion over health social networks. Members
in the same community tend to play the same role in the propagation
process.
To achieve this goal, we draw on the well-known Independent Cascade
(IC) model [7] and the Community-level Social Influence (CSI) model
[12] and adapt them to fit a health social network. In our health
social network, users are strongly encouraged to communicate with
each other. The correlation between effective messages and
ineffective messages does not truly represent the user-user influence
relationship. Therefore, existing models (e.g., CSI) cannot extract
meaningful community structures. To overcome this issue, we propose a
new model called Community-level Physical Activity Propagation (CPP),
in which effective messages are combined with a user’s responsibility
to infer the probability of physical activity propagation in a health
social network. Regarding our discovered structure, a community is
identified by a set of communicating nodes that share a similar
physical activity influence tendency over nodes belonging to other
communities. In order to clarify the effect of activity propagation
on health outcomes, we analyze the correlation of the detected
communities not only with existing health outcome measures (i.e.,
biometrics, BMI, average number of steps, BMI slope) but also with a
novel measure, named Wellness score, which is modeled as a
combination of lifestyle parameters, biometrics, and biomarkers.
The main contributions of this paper are as follows:
1. We introduce the Community-level Physical Activity Propagation
(CPP) model, which is inspired by the IC and CSI models.
2. Given a set of disjoint communities, we devise an
Expectation-Maximization algorithm to effectively learn the strength
of their pairwise influence relationships. We then utilize a greedy
algorithm which explores a given hierarchical partitioning of the
network. Our approach results in a community structure that
guarantees a good balance between accuracy in describing the
identified propagation activities and a compact representation of the
influence relationships.
3. We propose a novel health outcome measure, named Wellness score,
which combines lifestyle parameters, biometrics, and biomarkers into
a score that mimics a percentile ranking of users.
4. Through comprehensive experiments on the YesiWell social network,
we show the effectiveness of our approach. Our discovery potentially
paves the way for knowledge discovery and data mining in health
social networks (e.g., physical activity interventions).
The rest of the paper is organized as follows. In Sec. 2,
we formally define the problem tackled in this paper and
explain the technical detail of our model. The experimental
evaluation is in Sec. 3. We briefly review related prior art
in Sec. 4 and conclude the paper with a summary of our
major findings and future research directions in Sec. 5.
2. COMMUNITY-LEVEL PHYSICAL ACTIVITY PROPAGATION MODEL
We first give a definition of a single trace of physical activity
propagations and review the fundamental independent cascade (IC)
propagation model [7] in Sec. 2.1. Then we introduce the CPP model
(Sec. 2.2). Finally, we present our parameter learning process and
model selection in Sec. 2.3.
2.1 Preliminaries and the Independent Cascade (IC) Model Review
We first explain how to identify a single trace when a user $v$
influences another user $u$ by sending a message. Assume that at time
$t$, user $v$ sends a message $m$ to user $u$; given a $\Delta t$,
$v$ is said to activate $u$ at time $t$ if the total number of
(walking and running) steps of $u$ in $[t, t + \Delta t]$ is larger
than or equal to the total number of steps of $u$ in the past period
$[t - \Delta t, t]$. Normally, the influence can be further
propagated if $u$ successfully activates other users at the next
timestamp (i.e., $t + 1$) [7]. However, the process in health social
networks is usually slower than that. Following [11], we circumvent
this problem by adopting a time window $w$ to define a single trace
as follows: given a chain of users $\alpha = \{U_1, \ldots, U_n\}$
such that $U_i$ is a set of users and
$U_1 \cap U_2 \cap \ldots \cap U_n = \emptyset$, $\alpha$ is called a
single trace if $\forall i \in [1, n - 1]$, every $u \in U_{i+1}$ is
activated by some user $u' \in U_i$ such that
$t_\alpha(u) \in [t_\alpha(u'), t_\alpha(u') + w]$, where
$t_\alpha(u)$ is the activation time of $u$ in $\alpha$. In real
cases, $U_1$ can be a single user instead of a set of users.
Let $G = (V, E)$ denote a directed network, where $V$ is the set of
vertices and $E \subseteq V \times V$ denotes a set of directed arcs.
Each arc $(v, u) \in E$ represents an influence relationship (i.e.,
$v$ is a potential influencer for $u$) and is associated with a
probability $p(v, u)$ which represents the strength of that influence
relationship. Let $D = \{\alpha_1, \ldots, \alpha_r\}$ denote a log
of observed propagation traces over $G$. We assume that each
propagation trace in $D$ is initiated by a special node
$\Omega \notin V$, which models a source of influence that is
external to the network. More specifically, we have
$t_\alpha(\Omega) < t_\alpha(v)$ for each $\alpha \in D$ and
$v \in V$. Time unfolds in discrete steps. At time $t = 0$ all
vertices in $V$ are inactive, and $\Omega$ makes an attempt to
activate every vertex $v \in V$, succeeding with probability
$p(\Omega, v)$. At subsequent time steps, when a node $v$ becomes
active, it makes one attempt at influencing each inactive neighbor
$u$ who receives a message from $v$, succeeding with probability
$p(v, u)$. Multiple nodes may try to independently activate the same
node at the same time.
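To make the discrete-time process concrete, the following is a minimal simulation sketch of one propagation trace. It is an illustration only: the function name `simulate_ic`, the nested-dictionary arc representation, and the convention that nodes activated by the external source $\Omega$ receive activation time 0 are assumptions of this sketch, not part of the original model description.

```python
import random


def simulate_ic(arcs, p_external, rng=None):
    """Simulate one independent-cascade trace.

    arcs:       {v: {u: p(v, u)}} influence probability per directed arc
    p_external: {v: p(Omega, v)} success probability of the external source
    Returns {node: activation step} for every node that became active.
    """
    rng = rng or random.Random()
    activation_time = {}
    frontier = []
    # The external source Omega tries to activate every vertex once.
    for v, prob in p_external.items():
        if rng.random() < prob:
            activation_time[v] = 0
            frontier.append(v)
    step = 0
    while frontier:
        step += 1
        newly_active = []
        # Each newly active node gets exactly one attempt per neighbor.
        for v in frontier:
            for u, prob in arcs.get(v, {}).items():
                if u in activation_time:
                    continue  # already active, cannot be activated again
                if rng.random() < prob:
                    activation_time[u] = step
                    newly_active.append(u)
        frontier = newly_active
    return activation_time
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when comparing simulated traces against observed ones.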
There are different ways to evaluate the function $p$. The
Independent Cascade (IC) model proposed by Kempe et al. [7] can be
instantiated with an arbitrary choice of $p$. They use a uniform
probability $q$ in their experiments, that is, $p(v, u) = q$ for all
$(v, u) \in E$. On the other hand, Saito et al. [21] estimate a
separate probability $p(v, u)$ for every $(v, u) \in E$ from a set of
observed traces. These two approaches can be viewed as opposite ends
of a complexity scale. Using a single parameter results in a simple
but potentially low-accuracy model, while estimating a different
probability for each arc might provide a good fit but at the risk of
overfitting.
Next we introduce our CPP model to shift the model-
ing of influence strength from node-to-node to community-
to-community. In our community-based model, all vertices
which belong to the same cluster are assumed to have iden-
tical influence probabilities towards other clusters.
2.2 The CPP Model
We start by introducing the likelihood of a single trace $\alpha$
when expressed as a function of single edge probability. This is
useful to define the problem that we tackle in this paper. Let
$I_{\alpha,u}$ be the set of user $u$’s neighbors that potentially
influence $u$’s activation in the trace $\alpha$:
$$I_{\alpha,u} = \{ v \mid (v, u) \in E, \text{ if } u \in U_i \text{ then } v \in U_{i-1} \} \qquad (1)$$
Let $p : V \times V \to [0, 1]$ denote a function that maps every
pair of nodes to a probability. The log likelihood of the traces in
$D$ given $p$ can be defined as:
$$\log L(D \mid p) = \sum_{\alpha \in D} \log L_\alpha(p) \qquad (2)$$
Each $v \in I_{\alpha,u}$ succeeds in activating $u$ on the
considered trace $\alpha$ with probability $p(v, u)$ and fails with
probability $1 - p(v, u)$. We define $\gamma_{\alpha,v,u}$ as the
user’s responsibility, which represents the probability that, in
trace $\alpha$, the activation of $u$ was due to the success of the
activation trial performed by $v$. The traces are assumed to be
i.i.d. By using $\gamma_{\alpha,v,u}$, we can define the likelihood
of the observed propagation as follows:
$$L_\alpha(p) = \prod_{u \in V} \prod_{v \in I_{\alpha,u}} p(v, u)^{\gamma_{\alpha,v,u}} \left(1 - p(v, u)\right)^{1 - \gamma_{\alpha,v,u}} \qquad (3)$$
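As an illustration of Equations 2 and 3, the log likelihood can be evaluated as a sum over traces, activated users, and candidate influencers. The sketch below assumes a hypothetical data layout (each trace is a dictionary mapping an activated user $u$ to its influencer set $I_{\alpha,u}$, with responsibilities stored per arc) and clamps probabilities away from 0 and 1 to avoid taking the logarithm of zero; neither choice comes from the paper.

```python
import math


def trace_log_likelihood(influencers, p, gamma, eps=1e-12):
    """log L_alpha(p) for one trace (Eq. 3 in log form).

    influencers: {u: set of v} giving I_{alpha,u}
    p:           {(v, u): probability p(v, u)}
    gamma:       {(v, u): responsibility gamma_{alpha,v,u}}
    """
    total = 0.0
    for u, candidates in influencers.items():
        for v in candidates:
            # Clamp to (0, 1) so both logarithms are defined.
            prob = min(max(p[(v, u)], eps), 1.0 - eps)
            g = gamma[(v, u)]
            total += g * math.log(prob) + (1.0 - g) * math.log(1.0 - prob)
    return total


def log_likelihood(traces, p, gammas):
    """log L(D | p): sum of per-trace log likelihoods (Eq. 2)."""
    return sum(trace_log_likelihood(i, p, g) for i, g in zip(traces, gammas))
```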
Note that social communication is very important to
keep people following health intervention programs. Conse-
quently we encourage social communications, i.e., message
sending. Thus users may receive many messages but we only
consider successful arcs of physical activity influence in Eq.3.
To shift the influence strength estimation from node-to-node to
community-to-community in the CPP model, we use a hierarchical
decomposition $H$ of the network $G$. In detail, $H$ is a tree with
the network $G$ as the root $r$, the nodes in $V$ as leaves, and an
arbitrary number of internal nodes (i.e., between the root $r$ and
the leaves $u \in V$). A cut $h$ of $H$ is a set of edges of $H$ such
that for every $v \in V$, one and only one edge $e \in h$ belongs to
the path from the root $r$ to $v$. Therefore, by removing all the
edges in $h$ from $H$, we disconnect every $v \in V$ from $r$.
Let $C_H$ denote the set of all possible cuts of $H$. Each
$h \in C_H$ results in a partition $P_h$ of the network $G$, so that
all vertices in $V$ that are below the same edge $e \in h$ in $H$
belong to the same cluster $c_e \subseteq V$. Let $c(u)$ denote the
cluster to which the node $u \in V$ belongs in the partition $P_h$.
In the CPP model, all vertices that belong to the same cluster are
assumed to have identical influence probabilities towards other
clusters. Given a probability function
$\hat{p}_h : P_h \times P_h \to [0, 1]$ that assigns a probability
between any two clusters of the partition $P_h$, we define:
$$p_h(v, u) = \hat{p}_h(c(v), c(u)) \qquad (4)$$
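The relationship between a cut, the induced partition, and Equation 4 can be sketched as follows. This is a hedged illustration: the hierarchy is assumed to be given as a hypothetical `children` dictionary, each cut edge is identified by the tree node at its lower end, and the function names are invented for this example.

```python
def leaves_below(children, node):
    """All leaves (network vertices) in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    result = set()
    for kid in kids:
        result |= leaves_below(children, kid)
    return result


def partition_from_cut(children, cut):
    """Map every leaf to its cluster, named after the cut edge above it.

    cut is a set of tree nodes; each stands for the edge from its parent.
    Raises ValueError if a leaf lies below two cut edges.
    """
    cluster_of = {}
    for node in cut:
        for leaf in leaves_below(children, node):
            if leaf in cluster_of:
                raise ValueError(f"leaf {leaf!r} is below two cut edges")
            cluster_of[leaf] = node
    return cluster_of


def p_h(cluster_of, p_hat, v, u):
    """Node-level probability induced by cluster-level p_hat (Eq. 4)."""
    return p_hat[(cluster_of[v], cluster_of[u])]
```

With a leaf-level cut every vertex is its own cluster, so `p_h` reduces to an individual-level probability; coarser cuts make all vertices in a cluster share the same outgoing and incoming probabilities.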
In the next section, we will show that we can find $\hat{p}_h$ using
an expectation-maximization (EM) algorithm. For the moment, we can
assume that $\hat{p}_h$ is induced by $h$ through a deterministic
function, since our aim is to formulate our problem in terms of
finding an optimal cut $h \in C_H$. In fact, a straightforward
solution is the cut at the leaf level of $H$, which maximizes the
likelihood defined in Equations 2 and 3 (i.e., the individual level).
Reducing the number of pairwise influence probabilities used by the
model can only lower the likelihood, but it simplifies the model.
That is why we propose to use a model selection function $f$ that
takes into account both the likelihood and the complexity of the
model.
For instance, Figures 2 and 3 respectively illustrate an example of
input and output for our problem, i.e., a CPP model. The cut $h_1$
corresponds to the leaf-level model where each single node of the
social graph constitutes a state of the CPP model. Essentially, this
is the maximum likelihood cut, which corresponds to the standard
independent cascade model [7] (i.e., individual level). Two other
cuts are also presented, where $h_2$ corresponds to the clustering
$\{\{A, D, F\}, \{B, G\}, \{E, K\}, \{M\}, \{L, N, O\}\}$ and the cut
$h_3$ results in our model in Figure 3, which is the best model
according to the model selection function $f$ in the example.
Figure 2: An example of input for the CPP model: a graph $G$ of
physical activity propagations (each undirected edge is considered as
the corresponding two directed arcs) and a hierarchy $H$. Panels: a
network $G$ of physical activity propagations; hierarchical
decomposition $H$ of $G$.
Figure 3: A possible detected community structure resulting from the
input of Figure 2 and corresponding to the cut $h_3$. The edge
thickness represents the strength of the influence.
We can now formally define the model learning problem addressed in
this paper. Note that the network $G$ and the hierarchy $H$ remain
fixed. The model complexity is only affected by the cut $h \in C_H$.
Definition 1. CPP Model Learning. Given a network $G = (V, E)$, a
set of propagation traces $D$ across $G$, a hierarchical partitioning
$H$ of $G$, and a model selection function $f$, find the optimal cut
of $H$ defined as
$$h^* = \arg\min_{h \in C_H} f(L(D \mid \hat{p}_h), h) \qquad (5)$$
It is interesting to note that the two extreme cases outlined above,
i.e., a uniform probability or a different probability for every
link, can both be modeled in our approach. Indeed, the cut $h_1$ in
Figure 2 places all vertices of $G$ in separate clusters, which
corresponds to the most complex model with a separate influence
probability on every edge. The cuts $h_2$ and $h_3$ induce models
with a coarser granularity (i.e., community level). Finally, if there
is no cut then all vertices are in the same cluster, which results in
the simplest possible model with a constant $p(v, u)$ for each edge
$(v, u)$.
2.3 Learning inter-Community Influence &
Model Selection
In this section, we propose an expectation-maximization (EM) approach
for estimating the pairwise influence strength among the clusters of
nodes, i.e., the parameters of the CPP model. As presented before, we
assume that the clusters in a partition $P_h$ have been induced by a
cut $h$ of a given hierarchical decomposition $H$ of $G$. However,
the EM method presented in this section can be applied to an
arbitrary disjoint partition of $V$. Recall that $c(u)$ denotes the
cluster to which $u$ belongs, and let $C(x) \subseteq V$ denote the
set of vertices that belong to cluster $x \in P_h$.
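According to the discrete-time independent cascade model [7], given a
single trace $\alpha$, at least one user $v \in I_{\alpha,u}$
succeeded in delivering physical activities to user $u$
independently, but we do not know which one. As discussed before, by
using users’ responsibilities $\gamma_{\alpha,v,u}$ we can define the
complete expectation log likelihood of the observed propagation as
follows:
$$Q(\hat{p}_h, \hat{p}_h^{\mathrm{previous}}) = \log \prod_{\alpha \in D} \prod_{u \in V} \prod_{v \in I_{\alpha,u}} \hat{p}_h(c(v), c(u))^{\gamma_{\alpha,v,u}} \left(1 - \hat{p}_h(c(v), c(u))\right)^{1 - \gamma_{\alpha,v,u}} \qquad (6)$$
where $\hat{p}_h^{\mathrm{previous}}$ denotes the probability of the
previous partition. Assuming that we have an estimate of every
$\gamma_{\alpha,v,u}$, we can determine the $\hat{p}_h$ which
maximizes Eq. 6 by solving
$\frac{\partial Q(\hat{p}_h, \hat{p}_h^{\mathrm{previous}})}{\partial \hat{p}_h(x, y)} = 0$
for all pairs of clusters $x, y \in P_h$. This gives the following
estimate of $\hat{p}_h(x, y)$, where $I(\cdot)$ denotes the indicator
function:
$$\hat{p}_h(x, y) = \frac{\sum_{\alpha \in D} \sum_{u \in C(y)} \sum_{v \in I_{\alpha,u} \cap C(x)} \gamma_{\alpha,v,u}}{\sum_{\alpha \in D} \sum_{u \in C(y)} \sum_{v \in C(x)} I(v \in I_{\alpha,u})} \qquad (7)$$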
Next, we need to provide an estimate for every $\gamma_{\alpha,v,u}$.
We do this based on the assumption that the probability distributions
$\gamma_{\alpha,v,u}$ are independent of the partition $P$. Indeed,
if $v$ is believed to be the physical activity influencer for $u$ in
the trace $\alpha$, this belief should not change for different ways
of clustering the two nodes. Therefore, we estimate
$\gamma_{\alpha,v,u}$ from the model where every $u \in V$ belongs to
its own cluster, since this results in simplified estimates which
only depend on the network structure. By denoting this model as
$\hat{p}_o$, we obtain the following estimation of
$\gamma_{\alpha,v,u}$:
$$\gamma_{\alpha,v,u} = \frac{\hat{p}_o(v, u)}{\sum_{z \in I_{\alpha,u}} \hat{p}_o(z, u)} \qquad (8)$$
We can summarize our learning method as follows:
1. Run the EM algorithm without imposing a clustering structure to
estimate $\hat{p}_o(v, u)$ for all arcs $(v, u) \in E$. Note that the
estimate of $\hat{p}_o(v, u)$ is
$\hat{p}_o(v, u) = \frac{\sum_{\alpha \in D} \gamma_{\alpha,v,u}}{\sum_{\alpha \in D} I(v \in I_{\alpha,u})}$.
Repeat the following two steps until convergence.
step 1 - Estimate each success probability $\hat{p}_o$.
step 2 - Update each influence responsibility $\gamma_{\alpha,v,u}$
using Eq. 8.
2. After obtaining $\gamma_{\alpha,v,u}$, keep $\gamma_{\alpha,v,u}$
fixed for different partitions $P_h$, and update $\hat{p}_h(x, y)$
according to Eq. 7.
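The two-stage procedure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the released implementation: each trace is assumed to be a dictionary mapping an activated user $u$ to its influencer set $I_{\alpha,u}$, the edge probabilities are initialized uniformly, a fixed number of iterations stands in for a convergence test, and all function names are hypothetical.

```python
from collections import defaultdict


def estimate_responsibilities(traces, n_iter=100, init=0.5):
    """Alternate Eq. 8 and the p_o update without any clustering.

    traces: list of {u: set of v} giving I_{alpha,u} per trace.
    Returns (p_o, gammas), with gammas[i][(v, u)] the responsibility
    of v for u's activation in trace i.
    """
    # count[(v, u)] = number of traces in which v belongs to I_{alpha,u}
    count = defaultdict(int)
    for trace in traces:
        for u, candidates in trace.items():
            for v in candidates:
                count[(v, u)] += 1
    p_o = {arc: init for arc in count}
    gammas = []
    for _ in range(n_iter):
        # Update responsibilities from the current probabilities (Eq. 8).
        gammas = []
        for trace in traces:
            g = {}
            for u, candidates in trace.items():
                norm = sum(p_o[(z, u)] for z in candidates)
                for v in candidates:
                    g[(v, u)] = (p_o[(v, u)] / norm if norm > 0
                                 else 1.0 / len(candidates))
            gammas.append(g)
        # Re-estimate each edge probability from the responsibilities.
        total = defaultdict(float)
        for g in gammas:
            for arc, value in g.items():
                total[arc] += value
        p_o = {arc: total[arc] / count[arc] for arc in count}
    return p_o, gammas


def community_probabilities(traces, gammas, cluster_of):
    """Cluster-level probabilities for a fixed partition (Eq. 7)."""
    numerator = defaultdict(float)
    denominator = defaultdict(int)
    for trace, g in zip(traces, gammas):
        for u, candidates in trace.items():
            for v in candidates:
                key = (cluster_of[v], cluster_of[u])
                numerator[key] += g[(v, u)]
                denominator[key] += 1
    return {key: numerator[key] / denominator[key] for key in denominator}
```

Because the responsibilities are computed once and then held fixed, `community_probabilities` can be called repeatedly for different candidate partitions without rerunning the iterative step.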
We have presented our learning method to maximize the log likelihood
$\log L(D \mid p_h)$ at the individual level and for a given
partition $P_h$. Recall that the log likelihood is maximized for the
cut $h$ that places every node in its own cluster. We thus need an
approach to address the trade-off between model accuracy and model
complexity. In this work, we utilize the Bayesian Information
Criterion (BIC) [22] as the selection function $f$ in Eq. 5. In
statistics, the BIC is a criterion for model selection among a finite
set of models.
$$BIC = -2 \log L(D \mid p_h) + |h| \log(|D|) \qquad (9)$$
where $|h|$ is the number of inter-community influences
$\hat{p}_h(x, y)$ we need to estimate, and $|D|$ is the number of
traces in $D$.
Finally, we can evaluate different cuts $h \in C_H$ of the
hierarchical decomposition of the network. We utilize the heuristic
bottom-up greedy algorithm proposed in [12] to report the best
solution found, given the hierarchical decomposition $H$. In each
iteration, the algorithm finds the two best communities to merge and
updates the model. The resulting cut as well as the corresponding
parameters are stored in the set $C$. Once the algorithm reaches
$H$’s root, it evaluates the objective function for every cut in $C$
and returns the one with the best value.
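The final selection step, scoring each stored cut with Eq. 9 and keeping the minimum, can be sketched as below. The sketch covers only this scoring step, not the greedy merging procedure of [12]; the function names and the representation of candidates as `(log likelihood, number of parameters)` pairs are assumptions made for illustration.

```python
import math


def bic(log_likelihood, num_parameters, num_traces):
    """Bayesian Information Criterion as in Eq. 9 (lower is better)."""
    return -2.0 * log_likelihood + num_parameters * math.log(num_traces)


def select_cut(candidates, num_traces):
    """Return the name of the candidate cut with the smallest BIC.

    candidates: {cut name: (log likelihood, number of inter-community
                 probabilities that cut requires)}
    """
    return min(
        candidates,
        key=lambda name: bic(candidates[name][0], candidates[name][1], num_traces),
    )
```

A leaf-level cut typically has the highest likelihood but also the most parameters, so the penalty term is what allows a coarser cut to win.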
3. EXPERIMENTS
The CPP model generalizes the presentation of physical
activity propagations in health social networks. In the fol-
lowing we will describe how a CPP model can be exploited
for different purposes including data understanding, and
characterization of physical activity propagation flow. Fur-
thermore it can be used to categorize users based on influ-
ence behaviors and health outcomes. We use the real world
user behavior data and the corresponding social network to
empirically validate the effectiveness of the CPP model. We
first elaborate on the experiment configurations on the data
set, and health outcome evaluation metrics. Then, we in-
troduce the experimental results and how we can utilize our
discovery in different applications.
3.1 Experiment Configuration and Health
Outcome Metrics
Human Physical Activity Dataset. The YesiWell study was conducted in
2010-2011 as a collaboration among several health laboratories and
universities to help people maintain active lifestyles and lose
weight. The dataset was collected from 254 users and includes
personal information, a social network, and their daily physical
activities over ten months from October 2010 to August 2011.
Figure 4: Distribution of the record number and user number.
Figure 5: The number of inbox messages and the number of users
distribution.
The initial physical activity data, collected by a special electronic
device for each user, includes the number of walking and running
steps. Since some users’ daily records are missing from the dataset,
we show a basic analysis of the distribution of the number of
physical activity records in Figure 4. In Figure 4, there are 14
users with fewer than 10 daily physical activity records, and 8 users
with more than 10 but fewer than 20 records. Thus, to clean the data,
we filtered out the users with fewer than 80 daily physical activity
records. In addition, we only consider users who contribute to the
social communication (i.e., users must send (resp., receive) messages
to (resp., from) other users). Finally, we have 123 users for the
experiments. Figure 5 illustrates the distribution of the number of
inbox messages and the number of users in our data. It clearly
follows a power-law distribution.
Body Mass Index (BMI) is a measure of human body shape based on an
individual’s mass and height:
$\mathrm{BMI} = \frac{\text{mass (kg)}}{(\text{height (m)})^2}$.
The BMI is used in a wide variety of contexts as a simple method to
assess how much an individual’s body weight departs from what is
normal or desirable for a person of his or her height. Indeed, BMI
provides a simple numeric measure of a person’s thickness or
thinness, allowing health professionals to discuss overweight and
underweight problems more objectively with their patients. The
current value settings are as follows: a BMI of 18.5 to 25 may
indicate optimal weight, a BMI lower than 18.5 suggests the person is
underweight, a BMI above 25 may indicate the person is overweight,
and a BMI above 30 suggests the person is obese.
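As a small worked illustration of the formula and thresholds above, the sketch below computes a BMI value and maps it to the categories listed in the text. The function names and category labels are chosen for this example only, and the handling of values exactly on a boundary (25 or 30) is an assumption.

```python
def bmi(mass_kg, height_m):
    """Body Mass Index: mass in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to the categories described in the text."""
    if value < 18.5:
        return "underweight"
    if value <= 25:
        return "optimal"
    if value <= 30:
        return "overweight"
    return "obese"
```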
Wellness Score. The medical establishment has acknowledged major
shortcomings of BMI. BMI depends upon weight and the square of
height, but it ignores basic scaling laws whereby mass increases with
the third power of linear dimensions. Hence, larger individuals, even
if they had exactly the same body shape and relative composition,
always have a larger BMI. Also, its assumptions about the
distribution between lean mass and adipose tissue are somewhat
inexact [14, 25]. Thus, to enrich the health outcome and to rank
users’ health, we further propose a novel measure called Wellness
score. In essence, the wellness score is a composite score of one’s
health based on lifestyle parameters, biometrics, and biomarkers.
Lifestyle parameters encompass physical activities measured in steps
per minute, self-reported lifestyle parameters, the number of goals
set and achieved, and social activities in terms of the size of and
communications within one’s social network, creation of and
participation in competitions and social games, and public/private
feed activities within our social network. The biometric and
biomarker component scores are based on a combination of utility
functions (i.e., BMI vs. mortality, triglyceride/HDL vs. health risk,
LDL vs. health risk, HbA1c vs. diabetes risk level, etc.) and
correlation functions between BMI and biomarkers. In short, one’s
component risk score is
$y = \beta_1 U(BMI) + \beta_2 \rho_1 U(TG/HDL) + \beta_3 \rho_2 U(LDL) + \beta_4 \rho_3 U(HbA1c)$,
where each $\beta_i$ is a component weight, $U(\cdot)$ is a specific
utility function associated with the component in parentheses, and
each $\rho_j$ is the correlation coefficient between BMI and the
selected biomarker component. The lifestyle component score is based
on a heuristic weighted combination of the number of steps per day,
intensity of steps based on estimated speed, and various social
activity-derived features highly associated with future weight loss
[8].
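The component risk score is a weighted sum, which the following sketch spells out. The paper does not give the weights, correlation coefficients, or utility functions, so here they are all plain inputs; the function name and argument layout are hypothetical, and no particular numeric values are implied.

```python
def component_risk_score(u_bmi, u_tg_hdl, u_ldl, u_hba1c, beta, rho):
    """Weighted combination of utility values for the risk components.

    u_*:  utility values U(.) already evaluated for each component
    beta: four component weights (beta_1 .. beta_4)
    rho:  three correlation coefficients between BMI and the
          TG/HDL, LDL, and HbA1c components (rho_1 .. rho_3)
    """
    if len(beta) != 4 or len(rho) != 3:
        raise ValueError("expected 4 weights and 3 correlation coefficients")
    return (
        beta[0] * u_bmi
        + beta[1] * rho[0] * u_tg_hdl
        + beta[2] * rho[1] * u_ldl
        + beta[3] * rho[2] * u_hba1c
    )
```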
Finally raw wellness scores are computed over multiple
participants through Markov Chain Monte Carlo sampling
in an attempt to remap the raw scores such that remapped
scores mimic percentile ranking. For instance, a wellness
score of 90 means 90% ranking (i.e., top 10%). We also
apply some boosting at the bottom so that people do not
become too discouraged when their scores are too low.
Experiment Setting. Our proposed model (source code¹) requires as
input a hierarchical decomposition of the network. Following [12], we
obtain this hierarchy by recursively partitioning the underlying
network using METIS [6], which reportedly provides high quality
partitions. Finally, the delay threshold $\Delta t$ and the time
window $w$ are set to a day and a week, respectively. We ran our
experiments on an Intel i7 2.8 GHz processor with 4 GB of memory.
3.2 Experimental Results
An effective way of summarizing influence relationships
in the network is to consider the community-level influence
propagation network. In Figure 6, we show the network of
physical activity propagations for our dataset. The node size
is the average number of steps for all users in their commu-
nity. While the edge width is proportional to the probability
of physical activity influences. The shapes will be described
later. Note that we only consider the arcs which have prob-
abilities larger than 0.25. It is very interesting since the
network is almost acyclic, and this suggests a clear direc-
1ix.cs.uoregon.edu/~haiphan/Publications/CPP.rar
Figure 6: Detected community structure in our
health social network data.
tionality pattern in the flow of physical activities. Moreover,
with the CPP model we are able to categorize the eight de-
tected communities into three kinds of group based on their
influence behavior as follows:
1) Influencer - This group appears as circle nodes in
Figure 6. These nodes have the strongest influence
probabilities for delivering physical activities to users in
other communities. In addition, they receive almost no
physical activity influence from other communities.
2) Influenced users - This group appears as rectangle
nodes in Figure 6. These nodes are easily influenced by
influencers (i.e., circle nodes), since they receive physical
activity influence with high propagation probabilities.
Moreover, the average number of steps of these nodes is
quite large, even larger than that of the influencer nodes.
These influenced users sometimes try to deliver physical
activities to other communities, but only to a limited extent.
3) Non-Influenced users - This group appears as triangle
nodes in Figure 6. These nodes are very hard to influence,
since they receive very small probabilities of physical
activity propagation from other groups. In addition, the
average number of steps of the non-influenced nodes is very
small compared with the other two kinds of nodes.
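The arc filtering, the acyclicity observation, and the three categories can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not give an explicit decision rule, so `categorize` uses a hypothetical one (any strong incoming arc means "influenced"; otherwise any strong outgoing arc means "influencer"; otherwise "non-influenced"), and the probabilities below are toy values, not results from the study.

```python
def strong_arcs(prob, threshold=0.25):
    """Keep only arcs (src, dst) whose influence probability exceeds the threshold."""
    return {(s, d): p for (s, d), p in prob.items() if s != d and p > threshold}


def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True if the directed graph over `nodes` has no cycle."""
    indegree = {n: 0 for n in nodes}
    for (_, dst) in arcs:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for (src, dst) in arcs:
            if src == n:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return visited == len(nodes)


def categorize(nodes, arcs):
    """Hypothetical rule (an assumption, not the paper's definition)."""
    receives = {dst for (_, dst) in arcs}
    sends = {src for (src, _) in arcs}
    labels = {}
    for n in nodes:
        if n in receives:
            labels[n] = "influenced"
        elif n in sends:
            labels[n] = "influencer"
        else:
            labels[n] = "non-influenced"
    return labels


# Toy community-level influence probabilities (hypothetical values).
toy_nodes = ["A", "B", "C", "D"]
toy_prob = {("A", "B"): 0.8, ("A", "C"): 0.4, ("B", "C"): 0.3,
            ("C", "D"): 0.1, ("D", "A"): 0.05}
toy_arcs = strong_arcs(toy_prob)
toy_labels = categorize(toy_nodes, toy_arcs)
toy_is_dag = is_acyclic(toy_nodes, toy_arcs)
```

With the weak arcs removed, the toy graph has no cycle, `A` is labeled an influencer, `B` and `C` are influenced, and `D` is non-influenced.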
Essentially, the effectiveness of our approach can be
validated by examining the differences among the three user
categories in terms of behaviors, lifestyles, and health
outcomes, in order to explain why they exhibit such physical
activity propagation behaviors. We illustrate the variation
of health outcome measures (i.e., BMI, #steps, Wellness score)
over time for the three groups. Note that in the following
experiments, all users in the same category are gathered
together, so we have only three groups of users
instead of the eight detected communities.
BMI. Figure 7 illustrates the average and the standard
deviation of BMI for the three groups (i.e., influencers,
influenced users, and non-influenced users). Interestingly, the
influencer group has an average and a standard deviation of
BMI significantly lower than those of the other two groups.
Since the purpose of the participants who enrolled in this
study is to reduce their BMIs, the influencer group can
potentially serve as their external motivation. That is one of
the reasons why the influencer group has strong influence
probabilities on other groups. Meanwhile, the non-influenced
users have almost the highest average and standard deviation
of BMI, even though their BMI values were quite similar to
those of the influenced user group at the beginning.
Physical activity record number. Figure 8 illustrates
the average number of steps for the three groups over time.
Figure 7: Average and standard deviation of BMI for the three user categories. (a) Average BMI. (b) BMI standard deviation.
Figure 8: Average steps for all users in the three kinds of community, i.e., Influencer, Influenced users, and
non-Influenced users. (Best viewed in color.)
We can see that the influencer group not only has the best
BMI values but also exercises consistently day by day
(i.e., a good lifestyle) from the beginning to the end of the
study. Together with the CPP model results, this clarifies the
activity-delivering role of the influencer group. The
influenced user group performed fewer physical activities
at the beginning (i.e., in the middle of November, 2010),
but afterwards rapidly increased their activities, even
beyond the influencer group. Interestingly, their activity
performance then stabilizes along with the influencer group
until the end of the program. With the CPP model results,
we can say that the influencer group has succeeded in
delivering physical activities to the influenced user group.
Regarding the non-influenced user group, there is no big
change in their physical activity behaviors. They have the
lowest activity performance, and it fluctuates throughout
most of the program. Only during a short period (i.e.,
January to March, 2011) do they have a quite stable
(but still the lowest) activity performance. So, we can say
that it is hard to improve the exercise behavior of the
non-influenced user group via social communications.
Wellness score. We have illustrated the correlation
between the CPP model results and health outcome measures
such as BMI and the exercise activity record number
independently above. However, these individual measures cannot
reflect the actual health status of a user, which is a complex
combination of the user's lifestyle, biometrics, and
biomarkers. Our proposed Wellness score is such a metric.
Figure 9 illustrates the Wellness score for the three user
groups. It is quite clear that the influencer group always has
a high Wellness score. In addition, the influenced user group
shows a big change in its scores: it has a low score at the
beginning but then increases its score to be one of the
highest. Meanwhile, the non-influenced user group has the
lowest score, even though it has a better starting point than
the influenced user group.
Community consistency. Interestingly, in Figure 7b
and Figure 9b, the standard deviations of the BMI and
Wellness score are quite small (i.e., from 1.5 to 2.5 for the
BMI standard deviation, and from 3 to 5 for the Wellness score
standard deviation). Furthermore, they are quite stable (i.e.,
no big changes) for all three user groups. Therefore, not
only the health outcome measures but also the lifestyles and
physical activity record numbers are quite consistent among
the users in the same communities.
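The per-group averages and standard deviations discussed above can be computed as in the following minimal sketch. The function name `group_summary` and the BMI values are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; the paper does not state which estimator was used.

```python
from statistics import mean, pstdev


def group_summary(records):
    """Map each group to (mean, population standard deviation) of its values.

    `records` is an iterable of (group_name, value) pairs.
    """
    by_group = {}
    for group, value in records:
        by_group.setdefault(group, []).append(value)
    return {g: (mean(vals), pstdev(vals)) for g, vals in by_group.items()}


# Toy BMI readings for one time point (hypothetical values).
toy_bmi = [
    ("influencer", 24.0), ("influencer", 26.0),
    ("influenced", 29.0), ("influenced", 31.0),
    ("non-influenced", 30.0), ("non-influenced", 34.0),
]
toy_summary = group_summary(toy_bmi)
```

Repeating this computation for each time point would give curves of the kind plotted in Figures 7 and 9; a small and stable standard deviation within a group is what the text refers to as community consistency.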
So far, we can conclude that there are significant
differences in terms of behaviors, lifestyles, biometrics, and
biomarkers among the three user groups. Indeed, the CPP
model offers an effective tool to discover the flow of
physical activity propagations. Based on that, we can easily
uncover hidden influence patterns and distinguish users in
terms of how they deliver physical activity. Moreover, the
detected communities are internally consistent, which is very
useful for many other tasks such as activity propagation
prediction. Consequently, the CPP model results correlate
strongly with health outcomes, which is very meaningful for
designing physical activity interventions through health
social networks.
Figure 9: Average and standard deviation of Wellness score for the three user categories. (a) Average Wellness score. (b) Wellness score standard deviation.
Figure 10: CPP model vs social link based on health outcome. (a) #steps. (b) Wellness score. The markers correspond to the three user
categories in Figure 6.
The CPP model vs social link clustering. The output
of the CPP model can be graphically represented to
analyze the influence probability between two communities
and social link relationships. An effective way is to plot the
corresponding heat maps, as shown in Figure 10. In these
figures, we plot the Jaccard similarity, in terms of number
of steps and Wellness score, between the communities of the
CPP model and the clusters obtained by clustering the social
network links. Note that the clustering algorithm maximizes
within-cluster correlation and minimizes between-cluster
correlation. Given two clusters A and B, the Jaccard
similarity is computed as follows:
$$J(A, B, \mathit{steps}) = \frac{\sum_{u \in A \cap B} u.\mathit{steps}}{\sum_{u \in A \cup B} u.\mathit{steps}} \qquad (10)$$
where u.steps is the total number of steps reported by u.
We use a similar equation for J(A, B, wellness score).
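As a small worked example, Eq. (10) can be read as the total number of steps of users belonging to both A and B divided by the total number of steps of users belonging to either. The sketch below follows that reading; the function name `weighted_jaccard`, the user identifiers, and the step counts are hypothetical.

```python
def weighted_jaccard(a, b, value):
    """Sum of `value` over users in both sets divided by the sum over users in either set."""
    a, b = set(a), set(b)
    union_total = sum(value[u] for u in a | b)
    if union_total == 0:
        return 0.0  # convention chosen here for empty or all-zero inputs
    return sum(value[u] for u in a & b) / union_total


# Toy data (hypothetical): total steps reported by each user.
toy_steps = {"u1": 1000, "u2": 3000, "u3": 2000, "u4": 4000}
toy_community = {"u1", "u2", "u3"}   # a community from the CPP model
toy_cluster = {"u2", "u3", "u4"}     # a cluster from social link clustering
toy_score = weighted_jaccard(toy_community, toy_cluster, toy_steps)
```

Here the shared users `u2` and `u3` account for 5000 of the 10000 steps in the union, so the similarity is 0.5. Passing a dictionary of Wellness scores instead of steps gives the second variant.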
In general, we observe almost no correlation between the
CPP model and the social link clustering. Five of the eight
communities detected by the CPP model fall almost entirely
in cluster 0, which is the densest cluster in our friend
network. Thus, applying a standard clustering algorithm to
social network links cannot discover the communities obtained
by the CPP model.
Comparison of the CPP model and the CSI model
[12]. To highlight the effectiveness of our CPP model, we
further compare our results with the CSI model. We
applied both model selection functions, MDL [19] and BIC,
proposed for the CSI model. The former generates
only one community, while the latter yields 6 communities.
In Figure 11, we plot the intensity of
the influence probability between two communities observed
from the CSI model (BIC model selection function) and the
CPP model. In the CPP model, the influence role of the
communities c0, c1, and c3 is clear, while c7, c6, and c2
receive strong influence probabilities. Furthermore, c4 and
c5 do not contribute much to the process.
Meanwhile, it is hard to distinguish the differences
between the communities observed by the CSI model. In
addition, the probability range in the CSI model is [0, 0.7],
which is smaller than the range in our model. The reason might
be that our model is designed for health social networks and
does not take into account users who clearly fail to influence
others. In contrast, the CSI model does not consider this.
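To make the role of a model selection function concrete, the sketch below scores candidate community structures with the standard Bayesian Information Criterion, BIC = k ln(n) - 2 ln(L), and keeps the one with the lowest value. It only illustrates the general criterion: how the CSI model counts parameters and computes likelihoods is not described here, and the candidate names, log-likelihoods, and parameter counts are hypothetical.

```python
import math


def bic(log_likelihood, num_params, num_obs):
    """Bayesian Information Criterion; lower values are preferred."""
    return num_params * math.log(num_obs) - 2.0 * log_likelihood


def select_by_bic(candidates, num_obs):
    """Return the name of the candidate with the lowest BIC.

    `candidates` maps a name to a (log_likelihood, num_params) pair.
    """
    return min(candidates, key=lambda name: bic(*candidates[name], num_obs))


# Toy candidates (hypothetical values): coarser models have fewer parameters
# but fit the observed propagations less well.
toy_candidates = {
    "coarse": (-600.0, 1),
    "medium": (-430.0, 36),
    "fine": (-425.0, 64),
}
toy_best = select_by_bic(toy_candidates, num_obs=1000)
```

In this toy setting the intermediate structure wins, because the finest one gains little likelihood for many extra parameters. A different criterion, such as MDL, can penalize complexity differently and therefore select a different number of communities.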
Figure 11: CPP and CSI models on our health social network data. (a) CSI model. (b) CPP model.
4. RELATED WORK
4.1 Physical Activity Intervention Approaches
Regular physical activities decrease the risk of developing
cardiovascular disease, diabetes, obesity, osteoporosis,
some cancers, and other chronic conditions. Thus, finding
effective population-based intervention strategies to promote
physical activities is a key challenge. Website-delivered
physical activity interventions have the potential to overcome
many of the barriers associated with traditional face-to-face
exercise counseling or group-based physical activity
programs. An Internet user can seek advice at any time,
any place, and often at a lower cost compared with other
delivery modalities [20].
In 2000, a set of articles identifying the potential of
interactive health communications, including Internet and
website-delivered interventions, for improving health
behaviors was published [9, 17, 18]. Since then, over fifteen
studies [27] evaluating website-delivered interventions that
used the Internet or e-mail to improve physical activities
have been reported. Improvement in physical activities was
reported in eight of them. Better outcomes were identified
when interventions had more than five contacts with
participants and when the time to follow-up was short
(≤3 months; 60% positive outcomes), compared to medium-term
(3-6 months, 50%) and long-term (≥6 months, 40%) follow-up.
Indeed, a little over half of the controlled trials of
website-delivered physical activity interventions have
reported positive behavioral outcomes. However, intervention
effects were short-lived, and there was limited evidence of
maintenance of physical activity changes.
Although the website-delivered approaches reported positive
results, research is needed to identify elements that can
improve behavioral outcomes, the maintenance of change,
and the engagement and retention of participants; larger and
more representative study samples are also needed. Social
networks have this potential, since they take advantage of
the nature of social relationships to deliver healthy
behavior. Furthermore, a social network could be a long-lived
environment, and thus the retention of participants could be
naturally improved. Although we are a long way from reaching
this goal, our proposed model and findings lay a foundation
for further research, since they offer a powerful tool to
understand physical activity propagation on
a health social network.
4.2 Social Influence and Information Propagation Models
Social influence and the phenomenon of influence-driven
propagations in social networks have received considerable
attention in recent years. One of the key issues in this
area is to identify a set of influential users in a given social
network. Domingos and Richardson [3] approach the prob-
lem with Markov random fields, while Kempe et al. [7] frame
influence maximization as a discrete optimization problem.
Another line of study has focused on the problem of learning
the influence probabilities on every edge of a social network
given an observed log of propagations over this network [4,
21, 24, 28]. In addition, many tasks in machine learning and
data mining involve finding simple and interpretable mod-
els that nonetheless provide a good fit to observed data. In
graph summarization, the objective is to provide a coarse
representation of a graph for further analysis. Tian et al.
[26] and Zhang et al. [29] consider algorithms to build
graph summaries based on node attributes, while Navlakha
et al. [13] use the Minimum Description Length (MDL) principle
[19] to find good structural summaries of graphs. In [12],
Mehmood et al. introduce a hierarchical approach to sum-
marize patterns of influence in a network, by detecting com-
munities and their reciprocal influence strength.
5. CONCLUSIONS AND FUTURE WORK
In this paper, we introduce a hierarchical approach to
analyze physical activity propagation through social
communications at the community level (which can also be
applied at the individual level). Our proposed CPP model offers
a more compact representation of the network of propagations.
Furthermore, it can be easily plotted and exploited
to understand and detect interesting properties in the
information propagation flow over the network. Our empirical
analysis of a real-world health social network highlights
three meaningful observations: 1) social networks
have great potential to propagate physical activities via
social communications, 2) the propagation network found in a
health social network by the CPP model is almost acyclic,
and 3) the physical activity-based influence behavior has a
strong correlation with health outcome measures such as BMI,
lifestyles, and our proposed Wellness score.
Since online social networks have been widely exploited in
recent years, our first observation lays an early brick toward
a new, promising, and perhaps most effective way to propagate
physical activities to a wide population. The second
observation offers interesting insights: it shows the existence
of a clear direction in the propagation of physical activities,
which is useful for designing more effective physical activity
intervention strategies. The third observation
might be exploited to categorize users or to predict user
macro-activities based on their influence behaviors [23].
In the near future, we will clarify the correlation
between physical activity propagation via social
communications and the corresponding friend network. Indeed,
the homophily principle is important for delivering healthy
behavior on health social networks [2]. Therefore, by
discovering the correlation between the homophily effect and
social communications, we could obtain a complete picture. As a
result, we will be able to build better predictive models of
human behavior and better physical activity intervention
approaches.
6. ACKNOWLEDGMENTS
This work is supported by the NIH grant R01GM103309.
The views and conclusions contained in this document are
those of the authors and should not be interpreted as rep-
resenting the official policies, either expressed or implied, of
the NIH, or the U.S. Government.
7. REFERENCES
[1] A. Bauman, T. Armstrong, J. Davies, N. Owen,
W. Brown, B. Bellew, and P. Vita. Trends in physical
activity participation and the impact of integrated
campaigns among Australian adults, 1997-99.
Australian and New Zealand Journal of Public Health,
27(1):76–9, 2003.
[2] N. Christakis and J. Fowler. The spread of obesity in
a large social network over 32 years. New England
Journal of Medicine, 357:370–9, 2007.
[3] P. Domingos and M. Richardson. Mining the network
value of customers. In Proceedings of KDD’01, pages
57–66, 2001.
[4] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan.
Learning influence probabilities in social networks. In
Proceedings of WSDM’10, pages 241–250, 2010.
[5] http://www.internetworldstats.com/stats.htm.
[6] G. Karypis and V. Kumar. A fast and high quality
multilevel scheme for partitioning irregular graphs.
SIAM J. Sci. Comput., 20(1):359–392, 1998.
[7] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing
the spread of influence through a social network. In
Proceedings of KDD’03, pages 137–146, 2003.
[8] D. Kil, F. Shin, B. Piniewski, J. Hahn, and K. Chan.
Impacts of social health data on predicting weight loss
and engagement. In O’Reilly StrataRx Conference,
San Francisco, CA, October 2012.
[9] B. Marcus, C. Nigg, D. Riebe, and L. Forsyth.
Interactive communication strategies: implications for
population-based physical activity promotion.
American Journal of Preventive Medicine,
19(2):121–6, 2000.
[10] A. Marshall, E. Eakin, E. Leslie, and N. Owen.
Exploring the feasibility and acceptability of using
internet technology to promote physical activity
within a defined community. Health Promotion Journal
of Australia, 2005(16):82–4, 2005.
[11] M. Mathioudakis, F. Bonchi, C. Castillo, A. Gionis,
and A. Ukkonen. Sparsification of influence networks.
In Proceedings of KDD’11, pages 529–537, 2011.
[12] Y. Mehmood, N. Barbieri, F. Bonchi, and A. Ukkonen.
CSI: Community-level social influence analysis. In
Proceedings of ECML-PKDD’13, pages 48–63, 2013.
[13] S. Navlakha, R. Rastogi, and N. Shrivastava. Graph
summarization with bounded error. In Proceedings of
SIGMOD’08, pages 419–432, 2008.
[14] National Institutes of Health. Aim for a healthy
weight: Assess your risk. National Institutes of Health, 2007.
[15] U.S. Department of Health and Human Services. Physical
activity and health: A report of the Surgeon General.
Atlanta, GA: U.S. Department of Health and Human Services,
Centers for Disease Control and Prevention, National
Center for Chronic Disease Prevention and Health
Promotion, 1996.
[16] R. Pate, M. Pratt, S. Blair, et al. Physical activity
and public health: a recommendation from the Centers
for Disease Control and Prevention and the American
College of Sports Medicine. JAMA, 273(5):402–7, 1995.
[17] K. Patrick. Information technology and the future of
preventive medicine: potential, pitfalls, and policy.
American Journal of Preventive Medicine,
19(2):132–5, 2000.
[18] J. Prochaska, M. Zabinski, K. Calfas, J. Sallis, and
K. Patrick. PACE+: interactive communication
technology for behavior change in clinical settings.
American Journal of Preventive Medicine,
19(2):127–31, 2000.
[19] J. Rissanen. A universal prior for integers and
estimation by minimum description length. The
Annals of Statistics, 11(2):416–431, 1983.
[20] L. Ritterband, L. Gonder-Frederick, D. Cox,
A. Clifton, R. West, and S. Borowitz. Internet
interventions: in review, in use, and into the future.
Professional Psychology: Research and Practice,
34:527–34, 2003.
[21] K. Saito, R. Nakano, and M. Kimura. Prediction of
information diffusion probabilities for independent
cascade model. In Proceedings of KES 2008, pages
67–75, 2008.
[22] G. Schwarz. Estimating the dimension of a model. The
Annals of Statistics, 6(2):461–464, 1978.
[23] Y. Shen, R. Jin, D. Dou, N. Chowdhury, J. Sun,
B. Piniewski, and D. Kil. Socialized gaussian process
model for human behavior prediction in a health
social network. In ICDM’12, pages 1110–1115, 2012.
[24] J. Tang, J. Sun, C. Wang, and Z. Yang. Social
influence analysis in large-scale networks. In
Proceedings of KDD’09, pages 807–816, 2009.
[25] R. Taylor. Use of body mass index for monitoring
growth and obesity. Paediatrics & Child Health,
15(5):258, 2010.
[26] Y. Tian, R. Hankins, and J. Patel. Efficient
aggregation for graph summarization. In Proceedings
of SIGMOD’08, pages 567–580, 2008.
[27] C. Vandelanotte, K. Spathonis, E. Eakin, and
N. Owen. Website-delivered physical activity
interventions: A review of the literature. American
Journal of Preventive Medicine, 33(1):54–64, 2007.
[28] R. Xiang, J. Neville, and M. Rogati. Modeling
relationship strength in online social networks. In
Proceedings of WWW’10, pages 981–990, 2010.
[29] N. Zhang, Y. Tian, and J. Patel. Discovery-driven
graph summarization. In Proceedings of ICDE’10,
pages 880–891, 2010.
... It is well known that individuals tend to be friends with people who perform behaviors similar to theirs (homophily principle). In addition, as shown in Phan et al. (2014), users differentially experience and absorb physical exercise-based influences from their friends. Therefore, the explicit social influences in health social networks can be defined as a function of the homophily effect and physical exercise-based social influences. ...
... where F u t is a set of friends of user u until time t from the beginning. w t ðm; uÞ is the physical exercise-based social influence of m on u at time t, which is derived by using the CPP model (Phan et al. 2014). s t is the similarity between two arbitrary neighboring users in the social network at time t. ...
... Meanwhile, our SRBM ? better models this by using the physical exercise-based social influence w t ðm; uÞ derived from the CPP model (Phan et al. 2014). In addition, the temporal smoothing is effective in modeling how our network has grown over time (i.e., developed from scratch, Fig. 3b). ...
Article
Full-text available
Human behavior modeling is a key component in application domains such as healthcare and social behavior research. In addition to accurate prediction, having the capacity to understand the roles of human behavior determinants and to provide explanations for the predicted behaviors is also important. Having this capacity increases trust in the systems and the likelihood that the systems will be actually adopted, thus driving engagement and loyalty. However, most prediction models do not provide explanations for the behaviors they predict. In this paper, we study the research problem, human behavior prediction with explanations, for healthcare intervention systems in health social networks. In this work, we propose a deep learning model, named social restricted Boltzmann machine (SRBM), for human behavior modeling over undirected and nodes-attributed graphs. In the proposed SRBM⁺ model, we naturally incorporate self-motivation, implicit and explicit social influences, and environmental events together. Our model not only predicts human behaviors accurately, but also, for each predicted behavior, it generates explanations. Experimental results on real-world and synthetic health social networks confirm the accuracy of SRBM⁺ in human behavior prediction and its quality in human behavior explanation.
... In addition, mobile devices can track and record the distance and intensity of an individual's walking, jogging, and running. We utilized these technologies in our recent study, named YesiWell [Phan et al. (2014)], conducted in 2010-2011 as a collaboration between PeaceHealth Laboratories, SK Telecom Americas, and the University of Oregon to record daily physical 20 activities, social activities (i.e., text messages, social games, events, competitions, etc.), biomarkers, and biometric measures (i.e., cholesterol, triglycerides, BMI, etc.) for a group of 254 individuals. Physical activities were reported via a mobile device carried by each user. ...
... It is well-known that individuals tend to be friends with people who perform behaviors similar to theirs (homophily principle). In addition, as shown in [Phan et al. (2014)], users differentially experience and absorb physical exercise-based influences from their friends. Therefore, the explicit social influences in health 300 social networks can be defined as a function of the homophily effect and physical exercise-based social influences. ...
... where F u t is a set of friends of user u, from the beginning until time t. ψ t (m, u) is the physical exercise-based social influence of m on u at time t, which is derived by using the CPP model [Phan et al. (2014)]. s t is the similarity between two 305 arbitrary neighboring users in the social network at time t. ...
... And likewise, a negative post can indicate a decrease in the activity compared to (the average of) previous and future days. To study the group-level behavior caused by the posts, we build on our previous research [25] that demonstrated how social influence can contribute to physical activity propagation. Findings of the current paper show that there is a significant difference between positive and negative posts in how they influence other members of the network. ...
... Better outcomes were identified when interventions had more than five contacts with participants and when the time to follow-up was short (≤3 months; 60% positive outcomes), compared to medium-term (3-6 months, 50%) and long-term (≥6 months, 40%). Phan et al. [25] introduced a hierarchical approach to analyze physical activity propagation through social communications at the community level. Their findings show that: (a) social networks have great potential to propagate physical activities via social communications; and (b) physical activity-based influence behavior has a strong correlation to health outcome measures such as BMI. ...
... To investigate this, we must equip ourselves with methods that can model influence in a social network. To model physical activity propagation, Phan et al. [25] introduced the Community-level Physical Activity Propagation (CPP) model, which was inspired by the ideas of the Independent Cascade (IC) model [17], and the Community-level Social Influence (CSI) model [21]. Similar to our Topic aware Community-level Physical Activity Propagation (TaCPP) model [26], we take the content of the posts into account. ...
Conference Paper
New horizons are emerging within healthcare delivery, education , intervention provision, and tracking. We study a health social network that has tracked physical activities , biomarkers, and posts the participants have shared, throughout a one-year program. The program was aimed at helping people to adopt healthy behaviors and to lose weight. In this paper, we focus on users' posts that relate to physical activities. Prior papers characterize health based solely on users' information disclosed through natural language or questionnaires. The drawback of these works is their lack of medical records or health-related information to validate their findings. By contrast, with our direct access to users' physical and medical data, we investigate the implication of users' posts at both individual and group levels. We are able to validate our hypotheses about the effects of certain social network activities, by contextualiz-ing them in the specific users' actual medical progress and documented levels of exercise. Our findings show that activity self-disclosure posts are good indicators of one's real-world physical activity, which makes them good resources for monitoring the participants. In addition, using a physical activity propagation model, we show how these posts can influence the physical activity behavior at the network level. Further, posts exhibit distinctive affective, biological, and linguistic style markers. We observe that these characteristics can be used in a predictive capacity, to detect positive activity signals with ∼ 88% accuracy, which can be utilized for an unobtrusive monitoring solution.
... Today, early digital therapeutics approaches offer relatively similar programs to most participants [10,16,24,25,32,33,35]. In addition, the programs offered by these approaches usually do not take into account the impact of social networks, which have been demonstrated to be important in spreading healthy behaviors, e.g., physical activities [15,27,30,31]. Harvesting high-definition insights from the digital transactions of users and their social effects will enable significant personalization. ...
... Overall, we have 7 million data points for physical activities; 10,000 records for BMI and wellness score 1 [31]; 3,101 instances of participation in competitions; 1,765 instances of participation within 278 social games; 2,656 messages sent; 1,828 friend connections; 1,300 goals set; 14,138 survey answers, etc. Users volunteered to join the study; therefore, they were not under any pressure to exercise more or less. After 6 months, the YesiWell users increased leisure walking minutes by 164% on average (i.e., from 129.2 min/week to 341 min/week), compared with 47% among the control participants, who did not use the Yesi-Well social network (P < 0.05 performed by t-test) [15]. ...
Chapter
Full-text available
Identifying the effects of social and motivational factors is critical to understanding how healthy behaviors, i.e., physical activities, spread in digital therapeutics programs. We evaluated a comprehensive interconnected social network of 254 overweight and obese individuals across 335 days. Daily physical activities, social activities, biomarkers, and biometric measures were available for all subjects. We improved proportional hazards models to characterize the impact of self-motivation, influence, and susceptibility in the spread of physical activities. After 6 months, the YesiWell users increased leisure walking minutes by 164% on average compared with 47% among the control participants (\(P<0.05\)). The YesiWell users also lost more weight than the controls (5.2 pounds vs. 1.5 pounds) (\(P<0.01\)). Our estimations showed that influence and susceptibility increase with age; relaxed people are 96% more influential than stressed people (\(P<0.001\)); obese people are 23% more self-motivated (\(P<0.001\)); socially active people are 29% more influential (\(P<0.001\)); those who self-characterize as “keep-to-themselves” people have a 79% greater susceptibility (\(P<0.001\)). Relaxed people exert the most influence on non-stressed peers at 109% more than baseline (\(P<0.001\)). Our findings could enable new and effective personalized behavioral interventions to spread healthy behaviors in next-generation digital therapeutics.
... ψ t (v, u) is the probability that v influences u on physical activity at time t. ψ t (v, u) is derived from the CPP model (Phan et al. 2014) which is an efficient social influence model in health social networks. F u is a set of friends of u in the social network. ...
Article
In recent years, deep learning has spread beyond both academia and industry with many exciting real-world applications. The development of deep learning has presented obvious privacy issues. However, there has been lack of scientific study about privacy preservation in deep learning. In this paper, we concentrate on the auto-encoder, a fundamental component in deep learning, and propose the deep private auto-encoder (dPA). Our main idea is to enforce ε-differential privacy by perturbing the objective functions of the traditional deep auto-encoder, rather than its results. We apply the dPA to human behavior prediction in a health social network. Theoretical analysis and thorough experimental evaluations show that the dPA is highly effective and efficient, and it significantly outperforms existing solutions.
Article
Full-text available
Objective: The primary objective of this online survey is to understand differences in user profile, user preferences and perceived impact among the European population. Methods: The sample groups were based on the most recent report of the European country with the highest and lowest levels of physical activity. Cross-sectional online survey of population resident in Portugal and population resident in Finland were selected by simple random sampling. Responses were collected from the open-source tool LimeSurvey. IBM Statistical Package for Social Sciences Statistics was used to analyse the acquired data. Results: A total of 538 responses were considered with 48.4% of respondents residing in Portugal, and 51.4% residing in Finland. About 38.5% of the general survey population regularly practice exercise, and 39.7% regularly engage in physical activity. Regarding the level of online community experience, responses were distributed between medium, moderately low, and very low. Overall, there is a significant relationship between both sample groups when it comes to physical activity, common emotions using online communities, user perception, preferences and openness. Conclusion: Our survey results provide evidence to support that country of residence is related to user physical activity and highlight the importance of considering demographic factors to understand general population lifestyle choices. Submitted to: International Journal of Health Promotion and Education
Article
Full-text available
The purpose of this study is to find what tools, techniques and strategies can be applied to online communities (OC) to positively impact its users’ engagement with OC dedicated to promoting physical activity (PA). Exploring the different elements that compose digital platforms (DP) that harbour OC can be an innovative step to evaluate the promotion of PA. This is especially helpful in light of the potential for internet-based interventions, which can reach a large number of users at reduced costs. The work plan was divided into four different phases. Phase 1 encompassed the development of a scoping review on OC characteristics. Phase 2 covered an analysis of existing DP. Phase 3 was dedicated to a survey on user preferences and perceived impact. Finally, Phase 4 incorporated the final research discussion along with a conceptual framework and a set of guidelines. The main findings indicate that OC should be harboured in DP optimised in website and app formats and that both are equivalently indispensable. We also found that a variety of details, OC features, OC content, user interaction strategies, and BCT are present in the current DP. Finally, we discovered that there appears to be a broad dissatisfaction with the features offered by OC. Existing evidence is insufficient to draw objective conclusions, therefore more scientific studies on the various subjects must be carried out. Users’ preferences and perceived impact may initially appear to be at odds—users’ expectations of OC appear to be unmet, and user preferences need to be nurtured. Submitted to: Health Promotion International
Article
Objective: The objective of this scoping review was to identify, characterize and synthesize existing literature on the use of online communities to promote physical activity and identify gaps to direct future research. Methods: Systematic searches were conducted in Science Direct, PubMed, Scopus, and Institute of Electrical and Electronics Engineers Xplore for studies published up to August 2020. The search terms included a combination of the following keywords: physical activity, sedentary, exercise, health, sport, brand, online community. No limits were used. Studies were included if they encompassed a full publication containing enough details on characteristics and described any feature primarily aiming at physical activity promotion. Results: A total of 21 different online communities were found in the total of 25 selected studies. Of those studies, all reported on at least one behaviour change technique, 68.2% (n=15) used websites to support the OC, 36% (n=9) reported on strategies to keep users engaged, 16% (n=4) comprised information related to the design process, and 16% (n=4) reported on OC effectiveness. Conclusion: Existing reports do not provide clear, detailed information on the design process or user engagement strategies related to online communities, and only a few studies assess their effectiveness in improving physical activity. Further research is needed. Submitted to: Health Promotion Journal of Australia
Conference Paper
A scoping review was carried out to identify existing literature characterising the content and features of OC targeting the promotion of PA.
Conference Paper
Distributed word representations are able to capture syntactic and semantic regularities in text. In this paper, we present a word representation scheme that incorporates authorship information. While maintaining similarity among related words in the induced distributed space, our word vectors can be effectively used for some text classification tasks too. We build on a log-bilinear document model (lbDm), which extracts document features, and word vectors based on word co-occurrence counts. First, we propose a log-bilinear author model (lbAm), which contains an additional author matrix. We show that by directly learning author feature vectors, as opposed to document vectors, we can learn better word representations for the authorship attribution task. Furthermore, authorship information has been found to be useful for sentiment classification. We enrich the author model with a sentiment tensor, and demonstrate the effectiveness of this hybrid model (lbHm) through our experiments on a movie review-classification dataset.
Article
Objective: To encourage increased participation in physical activity among Americans of all ages by issuing a public health recommendation on the types and amounts of physical activity needed for health promotion and disease prevention. Participants: A planning committee of five scientists was established by the Centers for Disease Control and Prevention and the American College of Sports Medicine to organize a workshop. This committee selected 15 other workshop discussants on the basis of their research expertise in issues related to the health implications of physical activity. Several relevant professional or scientific organizations and federal agencies also were represented. Evidence: The panel of experts reviewed the pertinent physiological, epidemiologic, and clinical evidence, including primary research articles and recent review articles. Consensus process: Major issues related to physical activity and health were outlined, and selected members of the expert panel drafted sections of the paper from this outline. A draft manuscript was prepared by the planning committee and circulated to the full panel in advance of the 2-day workshop. During the workshop, each section of the manuscript was reviewed by the expert panel. Primary attention was given to achieving group consensus concerning the recommended types and amounts of physical activity. A concise "public health message" was developed to express the recommendations of the panel. During the ensuing months, the consensus statement was further reviewed and revised and was formally endorsed by both the Centers for Disease Control and Prevention and the American College of Sports Medicine. Conclusion: Every US adult should accumulate 30 minutes or more of moderate-intensity physical activity on most, preferably all, days of the week.
Conference Paper
Modeling and predicting human behaviors, such as activity level and intensity, is key to preventing cascades of obesity and helping to spread wellness and healthy behavior in a social network. In this work, we propose a Socialized Gaussian Process (SGP) for socialized human behavior modeling. The proposed SGP model naturally incorporates an individual's personal behavior factor and a social correlation factor into a unified model, where a basic Gaussian Process model is leveraged to capture the individual's personal behavior pattern. Furthermore, we extend the Gaussian Process model to a socialized Gaussian Process (SGP), which aims to capture social correlation phenomena in the social network. The detailed experimental evaluation shows that the SGP model achieves the best prediction accuracy compared with the other baseline methods.
Conference Paper
Modeling how information propagates in social networks driven by peer influence is a fundamental research question towards understanding the structure and dynamics of these complex networks, as well as developing viral marketing applications. Existing literature studies influence at the level of individuals, mostly ignoring the existence of a community structure in which multiple nodes may exhibit a common influence pattern. In this paper we introduce CSI, a model for analyzing information propagation and social influence at the granularity of communities. CSI builds on a novel propagation model that generalizes the classic Independent Cascade model to deal with the influence of groups of nodes (instead of single nodes). Given a social network and a database of past information propagation, we propose a hierarchical approach to detect a set of communities and their reciprocal influence strength. CSI provides a higher-level and more intuitive description of the influence dynamics, thus representing a powerful tool to summarize and investigate patterns of influence in large social networks. The evaluation on various datasets suggests the effectiveness of the proposed approach in modeling information propagation at the level of communities. It further enables the detection of interesting patterns of influence, such as the communities that play a key role in the overall diffusion process, or that are likely to start information cascades.
Article
Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph (Bui and Jones, Proc. of the 6th SIAM Conference on Parallel Processing for Scientific Computing, 1993, 445-452; Hendrickson and Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. report SAND 93-1301, Sandia National Laboratories, Albuquerque, NM, 1993). From the early work it was clear that multilevel techniques held great promise; however, it was not known if they can be made to consistently produce high quality partitions for graphs arising in a wide range of application domains. We investigate the effectiveness of many different choices for all three phases: coarsening, partition of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement. We also present a much faster variation of the Kernighan-Lin (KL) algorithm for refining during uncoarsening. We test our scheme on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that our scheme produces partitions that are consistently better than those produced by spectral partitioning schemes in substantially smaller time. Also, when our scheme is used to compute fill-reducing orderings for sparse matrices, it produces orderings that have substantially smaller fill than the widely used multiple minimum degree algorithm.
Article
Objective: To encourage increased participation in physical activity among Americans of all ages by issuing a public health recommendation on the types and amounts of physical activity needed for health promotion and disease prevention.
Article
The provision of health care over the Internet is a rapidly evolving and potentially beneficial means of delivering treatment otherwise unsought or unobtainable. Internet interventions are typically behavioral treatments operationalized and transformed for Web delivery with the goal of symptom improvement. The literature on the feasibility and utility of Internet interventions is limited, and there are even fewer outcome study findings. This article reviews empirically tested Internet interventions and provides an overview of the issues in developing and/or using them in clinical practice. Future directions and implications are also addressed. Although Internet interventions will not likely replace face-to-face care, there is little doubt that they will grow in importance as a powerful component of successful psychobehavioral treatment.
Article
The prevalence of obesity has increased substantially over the past 30 years. We performed a quantitative analysis of the nature and extent of the person-to-person spread of obesity as a possible factor contributing to the obesity epidemic. We evaluated a densely interconnected social network of 12,067 people assessed repeatedly from 1971 to 2003 as part of the Framingham Heart Study. The body-mass index was available for all subjects. We used longitudinal statistical models to examine whether weight gain in one person was associated with weight gain in his or her friends, siblings, spouse, and neighbors. Discernible clusters of obese persons (body-mass index [the weight in kilograms divided by the square of the height in meters], ≥30) were present in the network at all time points, and the clusters extended to three degrees of separation. These clusters did not appear to be solely attributable to the selective formation of social ties among obese persons. A person's chances of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) if he or she had a friend who became obese in a given interval. Among pairs of adult siblings, if one sibling became obese, the chance that the other would become obese increased by 40% (95% CI, 21 to 60). If one spouse became obese, the likelihood that the other spouse would become obese increased by 37% (95% CI, 7 to 73). These effects were not seen among neighbors in the immediate geographic location. Persons of the same sex had relatively greater influence on each other than those of the opposite sex. The spread of smoking cessation did not account for the spread of obesity in the network. Network phenomena appear to be relevant to the biologic and behavioral trait of obesity, and obesity appears to spread through social ties. These findings have implications for clinical and public health interventions.