
Analysis of Physical Activity Propagation in a Health Social Network

November 2014
Intelligent Systems, IEEE 31(1)


DOI:10.1145/2661829.2662025

Conference: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM 2014)

Authors:

Nhat Hai Phan

New Jersey Institute of Technology

Dejing Dou

University of Oregon

Xiao Xiao




Analysis of Physical Activity Propagation in a Health Social Network

NhatHai Phan

University of Oregon, USA

haiphan@cs.uoregon.edu

Dejing Dou

University of Oregon, USA

dou@cs.uoregon.edu

Xiao Xiao

University of Oregon, USA

xiaox@uoregon.edu

Brigitte Piniewski

PeaceHealth Laboratories

BPiniewski@peacehealthlabs.org

David Kil

HealthMantic, Inc

david.kil@healthmantic.com

ABSTRACT

Modeling physical activity propagation, such as activity level and intensity, is key to preventing cascades of obesity and to spreading wellness and healthy behavior in a social network. However, there has been a lack of scientific and quantitative study of how social communication may deliver physical activity interventions. In this work we introduce a Community-level Physical Activity Propagation (CPP) model to analyze physical activity propagation and social influence at different granularities (i.e., individual level and community level). CPP is a novel model inspired by the well-known Independent Cascade and Community-level Social Influence models. Given a social network, we use a hierarchical approach to detect a set of communities and the reciprocal influence strength of their physical activities. CPP provides a powerful tool to discover, summarize, and investigate influence patterns of physical activities in a health social network. The detailed experimental evaluation shows not only the effectiveness of our approach but also the correlation of the detected communities with various health outcome measures (i.e., both existing ones and our novel measure, named Wellness score, which is a combination of lifestyle parameters, biometrics, and biomarkers). Our promising results potentially pave the way for knowledge discovery in health social networks.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications—

Data Mining

General Terms

Theory; Algorithms; Experimentation

Keywords

Physical activity propagation; health social network


CIKM ’14 November 03 - 07 2014, Shanghai, China

http://dx.doi.org/10.1145/2661829.2662025.

1. INTRODUCTION

Regular physical activity reduces the risk of developing cardiovascular disease, diabetes, obesity, osteoporosis, some cancers, and other chronic conditions [15]. Public health standards recommend that adults participate in at least 30 minutes of moderate-intensity physical activity on 5 or more days a week [16]. However, less than 50% of the adult population meets these standards in many industrialized countries [1, 15]. Thus, finding effective population-based intervention strategies to propagate physical activity is a key challenge.

The growth of the Internet and the success of online social networks hold promise for wide-scale promotion of physical activity behavior change. In many developed countries, Internet access is greater than 63% and keeps increasing [5]. The Internet is identified as an important source of health information and may thus be an appropriate delivery channel for health behavior interventions [10]. Since 2000, a wide range of studies evaluating Internet-delivered health behavior interventions have been reported, and over half of them have reported positive behavioral outcomes [9, 17, 18, 27]. More recently, online social networks have helped people interact and participate in various physical activities, and thus could promote and spread physical activities at an affordable cost. However, there has been a lack of scientific and quantitative study of how a social network may contribute to physical activity propagation.

Besides online social networks, recent advances in mobile technology provide new opportunities to support healthy behaviors through lifestyle monitoring and online communities. Mobile devices can track and record the walking/jogging/running distance and intensity of an individual. Using these technologies, our recent study, named YesiWell, was conducted in 2010-2011 as a collaboration between PeaceHealth Laboratories, SK Telecom Americas, and the University of Oregon to record daily physical activities, social activities (i.e., text messages, social games, meetup events, competitions, etc.), biomarkers, and biometric measures (i.e., cholesterol, triglyceride, BMI, etc.) for a group of 254 individuals who formed a health social network. Physical activities are reported via a mobile device carried by each user. All users enroll in an online social network application that allows them to make friends and communicate with each other. Biomarkers and biometric measures are recorded via monthly medical tests performed at our laboratories on each user. The fundamental problems this study seeks to answer, which are also key to understanding the determinants of healthy behavior propagation, are as follows:

1. Can social communication affect physical activity propagation?

2. How can we leverage social interaction to understand physical activity propagation?

3. How can we understand the propagation process at different granularities?

4. Can we clarify the effect of physical activity propagation on health outcome measures?

Figure 1: Probability of a message becoming effective in propagating physical activities.

For the first question, to illustrate that social communication can deliver physical activity, we performed a simple statistical analysis on our health social network. Assume that a user $u$ receives a message $m$ at timestamp $t$ from another user. We compare the total number of walking and running steps of $u$ in the future period $[t, t + \Delta t]$ with that in the past period $[t - \Delta t, t]$. If $u$ increases his or her total number of steps, then $m$ is considered an effective message. The solid line in Figure 1 shows the probability of a message being effective, while the dashed line shows the probability of users increasing their total number of steps when the timestamp $t$ is chosen at random (i.e., the user might or might not receive a message at time $t$). With $\Delta t = 1$ day, the probability that a user increases his or her total number of steps after receiving a message is up to 0.58, significantly larger than the 0.26 observed for a random $t$. This gap persists as $\Delta t$ increases up to 50 days, after which it drops. This evidence strengthens our belief that social communication in health social networks can help propagate physical activities.
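The effective-message test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: step counts are assumed to be daily totals keyed by date, the day of receipt is counted in the future window, missing days count as zero steps, and the names `window_total`, `is_effective_message`, and `effective_rate` are hypothetical.

```python
from datetime import date, timedelta


def window_total(daily_steps, start, n_days):
    """Sum the daily step totals for n_days consecutive days starting at `start`.

    Assumption: days missing from `daily_steps` contribute zero steps.
    """
    return sum(daily_steps.get(start + timedelta(days=k), 0) for k in range(n_days))


def is_effective_message(daily_steps, received_on, delta_days=1):
    """True if steps in the future window exceed steps in the past window.

    Assumption: with daily data, the past window covers the delta_days days
    before `received_on`, and the future window starts on `received_on`.
    """
    past = window_total(daily_steps, received_on - timedelta(days=delta_days), delta_days)
    future = window_total(daily_steps, received_on, delta_days)
    return future > past


def effective_rate(daily_steps_by_user, messages, delta_days=1):
    """Fraction of (receiver, date) messages that are effective."""
    if not messages:
        return 0.0
    hits = sum(
        is_effective_message(daily_steps_by_user[user], day, delta_days)
        for user, day in messages
    )
    return hits / len(messages)
```

Calling `effective_rate` once with the real message timestamps and once with randomly drawn timestamps would give the two quantities that Figure 1 compares.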

Motivated by this evidence, our goal in this paper is to understand the dynamics of physical activity propagation via social communication channels at both the individual level and the community level. More concretely: 1) we aim to evaluate the probability of physical activity propagation for every social communication edge. The estimated probabilities can be used in many applications (e.g., propagation prediction, health behavior interventions); 2) we then devise a graph summarization paradigm for the analysis of physical activity propagation and social influence. In other words, we aim to find an abstraction of the propagation process that provides data analysts with a compact yet meaningful view of patterns of influence and activity diffusion over health social networks. Members of the same community tend to play the same role in the propagation process.

To achieve this goal, we adapt the well-known Independent Cascade (IC) model [7] and the Community-level Social Influence (CSI) model [12] to fit a health social network. In our health social network, users are strongly encouraged to communicate with each other. As a result, the correlation between effective messages and ineffective messages does not truly represent the user-user influence relationship, and existing models (e.g., CSI) cannot extract meaningful community structures. To overcome this issue, we propose a new model called Community-level Physical Activity Propagation (CPP), in which effective messages are combined with a user's responsibility to infer the probability of physical activity propagation in a health social network. In the discovered structure, a community is a set of communicating nodes that share a similar physical activity influence tendency over nodes belonging to other communities. To clarify the effect of activity propagation on health outcomes, we analyze the correlation between detected communities and not only existing health outcome measures (i.e., biometrics, BMI, average number of steps, BMI slope) but also a novel measure, named Wellness score, which is modeled as a combination of lifestyle parameters, biometrics, and biomarkers.

The main contributions of this paper are as follows:

1. We introduce the Community-level Physical Activity Propagation (CPP) model, which is inspired by the ideas of the IC and CSI models.

2. Given a set of disjoint communities, we devise an Expectation-Maximization algorithm to effectively learn the strength of their pairwise influence relationships. We then use a greedy algorithm that explores a given hierarchical partitioning of the network. Our approach yields a community structure that provides a good balance between accuracy in describing the identified propagation activities and a compact representation of the influence relationships.

3. We propose a novel health outcome measure, named Wellness score, which combines lifestyle parameters, biometrics, and biomarkers into a score that mimics a percentile ranking of users.

4. Through a comprehensive experiment on the YesiWell social network, we show the effectiveness of our approach. Our findings potentially pave the way for knowledge discovery and data mining in health social networks (e.g., physical activity interventions).

The rest of the paper is organized as follows. In Sec. 2, we formally define the problem tackled in this paper and explain the technical details of our model. The experimental evaluation is in Sec. 3. We briefly review related prior art in Sec. 4 and conclude the paper with a summary of our major findings and future research directions in Sec. 5.

2. COMMUNITY-LEVEL PHYSICAL ACTIVITY PROPAGATION MODEL

We first define a single trace of physical activity propagation and review the fundamental Independent Cascade (IC) propagation model [7] in Sec. 2.1. Then we introduce the CPP model (Sec. 2.2). Finally, we present our parameter learning process and model selection in Sec. 2.3.

2.1 Preliminaries and the Independent Cascade (IC) Model Review

We first explain how to identify a single trace when a user $v$ influences another user $u$ by sending a message. Assume that at time $t$, user $v$ sends a message $m$ to user $u$. Given a $\Delta t$, $v$ is said to activate $u$ at time $t$ if the total number of (walking and running) steps of $u$ in $[t, t + \Delta t]$ is larger than or equal to the total number of steps of $u$ in the past period $[t - \Delta t, t]$. Normally, the influence can be further propagated if $u$ successfully activates other users at the next timestamp (i.e., $t + 1$) [7]. However, the process in health social networks is usually slower than that. Following [11], we circumvent this problem by adopting a time window $w$ to define a single trace as follows: given a chain of users $\alpha = \{U_1, \dots, U_n\}$ such that each $U_i$ is a set of users and $U_1 \cap U_2 \cap \dots \cap U_n = \emptyset$, $\alpha$ is called a single trace if, for all $i \in [1, n-1]$, every $u \in U_{i+1}$ is activated by some user $u' \in U_i$ such that $t_\alpha(u) \in [t_\alpha(u'), t_\alpha(u') + w]$, where $t_\alpha(u)$ is the activation time of $u$ in $\alpha$. In real cases, $U_1$ can be a single user instead of a set of users.

Let $G = (V, E)$ denote a directed network, where $V$ is the set of vertices and $E \subseteq V \times V$ denotes a set of directed arcs. Each arc $(v, u) \in E$ represents an influence relationship (i.e., $v$ is a potential influencer for $u$) and is associated with a probability $p(v, u)$ that represents the strength of this influence relationship. Let $D = \{\alpha_1, \dots, \alpha_r\}$ denote a log of observed propagation traces over $G$. We assume that each propagation trace in $D$ is initiated by a special node $\Omega \notin V$, which models a source of influence that is external to the network. More specifically, we have $t_\alpha(\Omega) < t_\alpha(v)$ for each $\alpha \in D$ and $v \in V$. Time unfolds in discrete steps. At time $t = 0$ all vertices in $V$ are inactive; $\Omega$ makes an attempt to activate every vertex $v \in V$ and succeeds with probability $p(\Omega, v)$. At subsequent time steps, when a node $v$ becomes active, it makes one attempt at influencing each inactive neighbor $u$ who receives a message from $v$, succeeding with probability $p(v, u)$. Multiple nodes may independently try to activate the same node at the same time.
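To make the discrete-time process concrete, the following Python sketch simulates one cascade. It is a generic illustration of the independent cascade mechanism rather than the authors' implementation. The function name `simulate_independent_cascade` is hypothetical, the `seeds` argument stands in for the nodes that the external source $\Omega$ activated at $t = 0$, and node labels are assumed to be sortable so that runs with a seeded generator are reproducible.

```python
import random


def simulate_independent_cascade(arcs, seeds, rng=None):
    """Simulate one cascade and return the sets of nodes activated at each step.

    arcs:  dict mapping a directed arc (v, u) to its probability p(v, u).
    seeds: nodes active at step 0 (assumed here to be those activated by Omega).
    rng:   optional random.Random instance for reproducible runs.
    """
    rng = rng or random.Random()
    out_arcs = {}
    for (v, u), prob in arcs.items():
        out_arcs.setdefault(v, []).append((u, prob))

    active = set(seeds)
    frontier = set(seeds)
    steps = [set(seeds)]
    while frontier:
        newly_active = set()
        # Each node in the frontier became active in the previous step and
        # gets exactly one attempt on each of its still-inactive neighbors.
        for v in sorted(frontier):
            for u, prob in out_arcs.get(v, []):
                if u in active or u in newly_active:
                    continue
                if rng.random() < prob:
                    newly_active.add(u)
        if newly_active:
            steps.append(newly_active)
        active |= newly_active
        frontier = newly_active
    return steps
```

With a uniform probability $q$ on every arc this reduces to the setting used by Kempe et al. in their experiments; with a separate value per arc it matches the per-arc view of Saito et al.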

There are different ways to choose the function $p$. The Independent Cascade (IC) model proposed by Kempe et al. [7] can be instantiated with an arbitrary choice of $p$. They use a uniform probability $q$ in their experiments, that is, $p(v, u) = q$ for all $(v, u) \in E$. On the other hand, Saito et al. [21] estimate a separate probability $p(v, u)$ for every $(v, u) \in E$ from a set of observed traces. These two approaches can be viewed as opposite ends of a complexity scale. Using a single parameter results in a simple but potentially inaccurate model, while estimating a different probability for each arc might provide a good fit at the risk of overfitting.

Next we introduce our CPP model, which shifts the modeling of influence strength from node-to-node to community-to-community. In our community-based model, all vertices that belong to the same cluster are assumed to have identical influence probabilities towards other clusters.

2.2 The CPP Model

We start by introducing the likelihood of a single trace $\alpha$ expressed as a function of single-edge probabilities. This is useful for defining the problem that we tackle in this paper. Let $I_{\alpha,u}$ be the set of user $u$'s neighbors that potentially influence $u$'s activation in the trace $\alpha$:

$$I_{\alpha,u} = \{ v \mid (v, u) \in E, \text{ if } u \in U_i \text{ then } v \in U_{i-1} \} \tag{1}$$

Let $p : V \times V \to [0, 1]$ denote a function that maps every pair of nodes to a probability. The log likelihood of the traces in $D$ given $p$ can be defined as:

$$\log L(D \mid p) = \sum_{\alpha \in D} \log L_\alpha(p) \tag{2}$$

Each $v \in I_{\alpha,u}$ succeeds in activating $u$ on the considered trace $\alpha$ with probability $p(v, u)$ and fails with probability $1 - p(v, u)$. We define $\gamma_{\alpha,v,u}$ as the user's responsibility, which represents the probability that, in trace $\alpha$, the activation of $u$ was due to the success of the activation trial performed by $v$. The traces are assumed to be i.i.d. By using $\gamma_{\alpha,v,u}$, we can define the likelihood of the observed propagation as follows:

$$L_\alpha(p) = \prod_{u \in V} \prod_{v \in I_{\alpha,u}} p(v, u)^{\gamma_{\alpha,v,u}} \left(1 - p(v, u)\right)^{1 - \gamma_{\alpha,v,u}} \tag{3}$$
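As a small worked illustration of Eqs. 2 and 3, the sketch below evaluates the log likelihood for traces stored as plain dictionaries. The data layout and the function names `trace_log_likelihood` and `log_likelihood` are assumptions made for this example. Probabilities are also clamped away from 0 and 1 to keep the logarithm finite, which is a numerical convenience not mentioned in the text.

```python
import math


def trace_log_likelihood(p, influencers, gamma, eps=1e-12):
    """log L_alpha(p) for one trace, following Eq. 3.

    p:           dict (v, u) -> p(v, u)
    influencers: dict u -> iterable of v, i.e. the set I_{alpha,u}
    gamma:       dict (v, u) -> responsibility gamma_{alpha,v,u} for this trace
    eps:         clamp for probabilities (an assumption, to avoid log(0))
    """
    total = 0.0
    for u, vs in influencers.items():
        for v in vs:
            g = gamma[(v, u)]
            q = min(max(p[(v, u)], eps), 1.0 - eps)
            total += g * math.log(q) + (1.0 - g) * math.log(1.0 - q)
    return total


def log_likelihood(p, traces):
    """log L(D | p) from Eq. 2: the sum of per-trace log likelihoods.

    traces: list of (influencers, gamma) pairs, one pair per trace in D.
    """
    return sum(trace_log_likelihood(p, infl, gam) for infl, gam in traces)
```

For a single arc with $p(v, u) = 0.5$ and $\gamma_{\alpha,v,u} = 1$, the trace contributes $\log 0.5$ to the sum.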

Note that social communication is very important for keeping people engaged in health intervention programs. Consequently, we encourage social communication, i.e., message sending. Users may therefore receive many messages, but we only consider successful arcs of physical activity influence in Eq. 3.

To shift the influence strength estimation from node-to-node to community-to-community in the CPP model, we use a hierarchical decomposition $H$ of the network $G$. In detail, $H$ is a tree with the network $G$ as the root $r$, the nodes in $V$ as leaves, and an arbitrary number of internal nodes (i.e., between the root $r$ and the leaves $u \in V$). A cut $h$ of $H$ is a set of edges of $H$ such that for every $v \in V$, one and only one edge $e \in h$ belongs to the path from the root $r$ to $v$. Therefore, by removing all the edges in $h$ from $H$, we disconnect every $v \in V$ from $r$.

Let $C_H$ denote the set of all possible cuts of $H$. Each $h \in C_H$ results in a partition $P_h$ of the network $G$, such that all vertices in $V$ that are below the same edge $e \in h$ in $H$ belong to the same cluster $c_e \subseteq V$. Let $c(u)$ denote the cluster to which the node $u \in V$ belongs in the partition $P_h$. In the CPP model, all vertices that belong to the same cluster are assumed to have identical influence probabilities towards other clusters. Given a probability function $\hat{p}_h : P_h \times P_h \to [0, 1]$ that assigns a probability to every pair of clusters of the partition $P_h$, we define:

$$p_h(v, u) = \hat{p}_h(c(v), c(u)) \tag{4}$$
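The relationship between a cut, the induced partition, and Eq. 4 can be sketched as follows. The representation is an assumption made for illustration: the hierarchy $H$ is stored as a child-to-parent dictionary (the root has no entry), a cut is a set of (parent, child) tree edges, and each cluster is named after the child endpoint of its cut edge. The function names are hypothetical.

```python
def partition_from_cut(parent, leaves, cut):
    """Return the clusters induced by a cut of the hierarchy H.

    parent: dict child -> parent for every non-root node of H
    leaves: the network nodes V (the leaves of H)
    cut:    set of (parent, child) tree edges
    Raises ValueError if some root-to-leaf path does not cross the cut
    exactly once, i.e. if `cut` is not a valid cut.
    """
    clusters = {}
    for leaf in leaves:
        crossed = []
        node = leaf
        while node in parent:  # walk up to the root, which has no parent entry
            edge = (parent[node], node)
            if edge in cut:
                crossed.append(edge)
            node = parent[node]
        if len(crossed) != 1:
            raise ValueError(
                f"cut crosses the root-to-{leaf} path {len(crossed)} times"
            )
        clusters.setdefault(crossed[0][1], set()).add(leaf)
    return clusters


def cluster_lookup(clusters):
    """Invert the clusters into the map c(u): node -> cluster name."""
    return {u: name for name, members in clusters.items() for u in members}


def community_probability(p_hat, cluster_of, v, u):
    """Eq. 4: p_h(v, u) = p_hat_h(c(v), c(u))."""
    return p_hat[(cluster_of[v], cluster_of[u])]
```

Cutting every edge directly above a leaf yields the individual-level model, while cuts closer to the root yield coarser, community-level models.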

In the next section, we will show that we can find $\hat{p}_h$ using an expectation-maximization (EM) algorithm. For the moment, we assume that $\hat{p}_h$ is determined by $h$, since our aim is to state the problem in terms of finding an optimal cut $h^* \in C_H$. A straightforward solution is the cut at the leaf level of $H$, which maximizes the likelihood defined in Equations 2 and 3 (i.e., the individual level). Reducing the number of pairwise influence probabilities used by the model can only lower the likelihood, but it simplifies the model. This is why we propose to use a model selection function $f$ that takes into account both the likelihood and the complexity of the model.

For instance, Figures 2 and 3 respectively illustrate an example of input and output for our problem, i.e., a CPP model. The cut $h_1$ corresponds to the leaf-level model, where each single node of the social graph constitutes a state of the CPP model. Essentially, this is the maximum likelihood cut, which corresponds to the standard independent cascade model [7] (i.e., the individual level). Two other cuts are also presented: $h_2$ corresponds to the clustering $\{\{A, D, F\}, \{B, G\}, \{E, K\}, \{M\}, \{L, N, O\}\}$, and the cut $h_3$ results in the model in Figure 3, which is the best model according to the model selection function $f$ in the example.

Figure 2: An example of input for the CPP model: a graph $G$ of physical activity propagations (each undirected edge is considered as the corresponding two directed arcs) and a hierarchy $H$. Panels: a network $G$ of physical activity propagations; hierarchical decomposition $H$ of $G$.

Figure 3: A possible detected community structure resulting from the input of Figure 2 and corresponding to the cut $h_3$. The edge thickness represents the strength of the influence.

We can now formally define the model learning problem addressed in this paper. Note that the network $G$ and the hierarchy $H$ remain fixed. The model complexity is only affected by the cut $h \in C_H$.

Definition 1. CPP Model Learning. Given a network $G = (V, E)$, a set of propagation traces $D$ across $G$, a hierarchical partitioning $H$ of $G$, and a model selection function $f$, find the optimal cut of $H$ defined as

$$h^* = \arg\min_{h \in C_H} f(L(D \mid \hat{p}_h), h) \tag{5}$$

It is interesting to note that the two extreme cases outlined above, i.e., a uniform probability or a different probability for every link, can both be modeled in our approach. Indeed, the cut $h_1$ in Figure 2 places all vertices of $G$ in separate clusters, which corresponds to the most complex model with a separate influence probability on every edge. The cuts $h_2$ and $h_3$ induce models with a coarser granularity (i.e., the community level). Finally, if there is no cut, then all vertices are in the same cluster, which results in the simplest possible model with a constant $p(v, u)$ for each edge $(v, u)$.

2.3 Learning inter-Community Influence & Model Selection

In this section, we propose an expectation-maximization (EM) approach for estimating the pairwise influence strength among the clusters of nodes, i.e., the parameters of the CPP model. As presented before, we assume that the clusters in a partition $P_h$ have been induced by a cut $h$ of a given hierarchical decomposition $H$ of $G$. However, the EM method presented in this section can be applied to an arbitrary disjoint partition of $V$. Recall that $c(u)$ denotes the cluster to which $u$ belongs, and let $C(x) \subseteq V$ denote the set of vertices that belong to cluster $x \in P_h$.

According to the discrete-time independent cascade model [7], given a single trace $\alpha$, at least one user $v \in I_{\alpha,u}$ succeeded, independently, in delivering physical activities to user $u$, but we do not know which one. As discussed before, by using the users' responsibilities $\gamma_{\alpha,v,u}$ we can define the expected complete log likelihood of the observed propagation as follows:

$$Q(\hat{p}_h, \hat{p}_h^{\mathrm{previous}}) = \log \prod_{\alpha \in D} \prod_{u \in V} \prod_{v \in I_{\alpha,u}} \hat{p}_h(c(v), c(u))^{\gamma_{\alpha,v,u}} \left(1 - \hat{p}_h(c(v), c(u))\right)^{1 - \gamma_{\alpha,v,u}} \tag{6}$$

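where $\hat{p}_h^{\mathrm{previous}}$ denotes the probability of the previous partition. Assuming that we have an estimate of every $\gamma_{\alpha,v,u}$, we can determine the $\hat{p}_h$ that maximizes Eq. 6 by solving $\partial Q(\hat{p}_h, \hat{p}_h^{\mathrm{previous}}) / \partial \hat{p}_h(x, y) = 0$ for all pairs of clusters $x, y \in P_h$. This gives the following estimate of $\hat{p}_h(x, y)$, where $\mathbb{I}(\cdot)$ is the indicator function:

$$\hat{p}_h(x, y) = \frac{\sum_{\alpha \in D} \sum_{u \in C(y)} \sum_{v \in I_{\alpha,u} \cap C(x)} \gamma_{\alpha,v,u}}{\sum_{\alpha \in D} \sum_{u \in C(y)} \sum_{v \in C(x)} \mathbb{I}(v \in I_{\alpha,u})} \tag{7}$$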

Next, we need to provide an estimate for every $\gamma_{\alpha,v,u}$. We do this based on the assumption that the probability distributions $\gamma_{\alpha,v,u}$ are independent of the partition $P$. Indeed, if $v$ is believed to be the physical activity influencer for $u$ in the trace $\alpha$, this belief should not change for different ways of clustering the two nodes. Therefore, we estimate $\gamma_{\alpha,v,u}$ from the model where every $u \in V$ belongs to its own cluster, since this results in simplified estimates that depend only on the network structure. Denoting this model as $\hat{p}_o$, we obtain the following estimate of $\gamma_{\alpha,v,u}$:

$$\gamma_{\alpha,v,u} = \frac{\hat{p}_o(v, u)}{\sum_{z \in I_{\alpha,u}} \hat{p}_o(z, u)} \tag{8}$$

We can summarize our learning method as follows:

1. Run the EM algorithm without imposing a clustering structure to estimate $\hat{p}_o(v, u)$ for all arcs $(v, u) \in E$. Note that the estimate of $\hat{p}_o(v, u)$ is $\hat{p}_o(v, u) = \frac{\sum_{\alpha \in D} \gamma_{\alpha,v,u}}{\sum_{\alpha \in D} \mathbb{I}(v \in I_{\alpha,u})}$. Repeat the following two steps until convergence.

step 1 - Estimate each success probability $\hat{p}_o$.

step 2 - Update each influence responsibility $\gamma_{\alpha,v,u}$ using Eq. 8.

2. After obtaining $\gamma_{\alpha,v,u}$, keep $\gamma_{\alpha,v,u}$ fixed for different partitions $P_h$, and update $\hat{p}_h(x, y)$ according to Eq. 7.
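The two-stage procedure summarized above can be sketched as follows. This is a minimal illustration under several assumptions, not the released implementation: each trace is a dictionary mapping an activated user $u$ to the list of distinct users in $I_{\alpha,u}$, the node-level probabilities start from a uniform value, a fixed number of iterations replaces a convergence test, and the function names are hypothetical.

```python
from collections import defaultdict


def estimate_responsibilities(traces, n_iter=20, init=0.5):
    """Alternate the two steps of stage 1 at the individual (node) level.

    traces: list of dicts, one per trace alpha, mapping u -> list of the
            distinct users in I_{alpha,u}.
    Returns (p_o, gamma), where gamma is keyed by (trace index, v, u).
    Assumptions: uniform initialization and a fixed iteration count.
    """
    arcs = {(v, u) for trace in traces for u, vs in trace.items() for v in vs}
    p_o = {arc: init for arc in arcs}
    gamma = {}
    for _ in range(n_iter):
        # Update each responsibility with Eq. 8.
        for a, trace in enumerate(traces):
            for u, vs in trace.items():
                denom = sum(p_o[(z, u)] for z in vs)
                for v in vs:
                    gamma[(a, v, u)] = (
                        p_o[(v, u)] / denom if denom > 0 else 1.0 / len(vs)
                    )
        # Re-estimate p_o(v, u): summed responsibility over the number of
        # traces in which v belongs to I_{alpha,u}.
        num, cnt = defaultdict(float), defaultdict(int)
        for (a, v, u), g in gamma.items():
            num[(v, u)] += g
            cnt[(v, u)] += 1
        p_o = {arc: num[arc] / cnt[arc] for arc in arcs}
    return p_o, gamma


def community_influence(traces, gamma, cluster_of):
    """Stage 2: Eq. 7 with the responsibilities held fixed."""
    num, den = defaultdict(float), defaultdict(int)
    for a, trace in enumerate(traces):
        for u, vs in trace.items():
            for v in vs:
                key = (cluster_of[v], cluster_of[u])
                num[key] += gamma[(a, v, u)]
                den[key] += 1
    return {key: num[key] / den[key] for key in den}
```

Because the responsibilities are computed once at the node level, `community_influence` can be re-run cheaply for every candidate partition $P_h$.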

We have presented our learning method to maximize the log likelihood $\log L(D \mid p_h)$ at the individual level and for a given partition $P_h$. Recall that the log likelihood is maximized by the cut $h$ that places every node in its own cluster. We thus need an approach to address the trade-off between model accuracy and model complexity. In this work, we use the Bayesian Information Criterion (BIC) [22] as the selection function $f$ in Eq. 5. In statistics, the BIC is a criterion for model selection among a finite set of models:

$$\mathrm{BIC} = -2 \log L(D \mid p_h) + |h| \log(|D|) \tag{9}$$

where $|h|$ is the number of inter-community influences $\hat{p}_h(x, y)$ we need to estimate and $|D|$ is the number of traces in $D$.
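Eq. 9 and its use as the selection function in Eq. 5 can be illustrated with a short sketch. The function names `bic` and `select_cut` and the candidate values in the usage note are hypothetical; only the formula itself comes from the text.

```python
import math


def bic(log_likelihood, n_parameters, n_traces):
    """Eq. 9: BIC = -2 log L(D | p_h) + |h| log(|D|)."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_traces)


def select_cut(candidates, n_traces):
    """Return the name of the candidate cut with the smallest BIC.

    candidates: dict cut name -> (log likelihood, number of inter-community
                influence parameters for that cut).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_traces))
```

For example, with 200 traces, a fine-grained cut with log likelihood -100 and 50 parameters has a larger BIC than a coarser cut with log likelihood -110 and 5 parameters, so `select_cut` prefers the coarser cut even though its likelihood is lower.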

Finally, we can evaluate different cuts $h \in C_H$ of the hierarchical decomposition of the network. We use the heuristic bottom-up greedy algorithm proposed in [12] to report the best solution found given the hierarchical decomposition $H$. In each iteration, the algorithm finds the two best communities to merge and updates the model. The resulting cut and the corresponding parameters are stored in a set $C$. Once the algorithm reaches the root of $H$, it evaluates the objective function for every cut in $C$ and returns the one with the best value.

3. EXPERIMENTS

The CPP model generalizes the representation of physical activity propagation in health social networks. In the following we describe how a CPP model can be exploited for different purposes, including data understanding and characterization of the flow of physical activity propagation. Furthermore, it can be used to categorize users based on influence behaviors and health outcomes. We use real-world user behavior data and the corresponding social network to empirically validate the effectiveness of the CPP model. We first describe the experiment configuration, the data set, and the health outcome evaluation metrics. Then, we present the experimental results and show how our findings can be used in different applications.

3.1 Experiment Configuration and Health Outcome Metrics

Human Physical Activity Dataset. The YesiWell study was conducted in 2010-2011 as a collaboration among several health laboratories and universities to help people maintain active lifestyles and lose weight. The dataset was collected from 254 users and includes personal information, a social network, and their daily physical activities over ten months from October 2010 to August 2011.

Figure 4: Distribution of the record number and user number.

Figure 5: The number of inbox messages and the number of users distribution.

The initial physical activity data, collected by a dedicated electronic device for each user, includes the number of walking and running steps. Since some users' daily records are missing from the dataset, we show a basic analysis of the distribution of the number of physical activity records in Figure 4. In Figure 4, there are 14 users with fewer than 10 daily physical activity records and 8 users with more than 10 but fewer than 20 records. Thus, to clean the data, we filtered out the users with fewer than 80 daily physical activity records. In addition, we only consider users who contribute to social communication (i.e., users must send (resp., receive) messages to (resp., from) other users). In the end, we have 123 users for the experiments. Figure 5 illustrates the distribution of the number of inbox messages over the users in our data. It clearly follows a power-law distribution.

Body Mass Index (BMI) is a measure of human body shape based on an individual's mass and height, $\mathrm{BMI} = \frac{\text{mass (kg)}}{(\text{height (m)})^2}$. BMI is used in a wide variety of contexts as a simple method to assess how much an individual's body weight departs from what is normal or desirable for a person of his or her height. BMI provides a simple numeric measure of a person's thickness or thinness, allowing health professionals to discuss overweight and underweight problems more objectively with their patients. The current value settings are as follows: a BMI of 18.5 to 25 may indicate optimal weight, a BMI lower than 18.5 suggests the person is underweight, a number above 25 may indicate the person is overweight, and a number above 30 suggests the person is obese.
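The BMI formula and the value settings listed above translate directly into code. The sketch below is illustrative only; the function names and the category labels are chosen for this example, and the boundary values 25 and 30 are assigned to the lower category because the text describes the higher categories as "above" those values.

```python
def bmi(mass_kg, height_m):
    """Body Mass Index: mass in kilograms divided by height in meters squared."""
    return mass_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to the categories described in the text."""
    if value < 18.5:
        return "underweight"
    if value <= 25:
        return "optimal"
    if value <= 30:
        return "overweight"
    return "obese"
```

For instance, a person weighing 70 kg with a height of 1.75 m has a BMI of about 22.9, which falls in the optimal range.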

Wellness Score. The medical establishment has acknowledged major shortcomings of BMI. BMI depends on weight and the square of height, but it ignores basic scaling laws whereby mass increases with the 3rd power of linear dimensions. Hence, larger individuals, even if they had exactly the same body shape and relative composition, always have a larger BMI. Also, its assumptions about the distribution between lean mass and adipose tissue are somewhat inexact [14, 25]. Thus, to enrich the health outcomes and to rank users' health, we further propose a novel measure called Wellness score. In essence, the Wellness score is a composite score of one's health based on lifestyle parameters, biometrics, and biomarkers. Lifestyle parameters encompass physical activities measured in steps per minute, self-reported lifestyle parameters, the number of goals set and achieved, and social activities in terms of the size of and communications within one's social network, creation of and participation in competitions and social games, and public/private feed activities within our social network. The biometric and biomarker component scores are based on a combination of utility functions (i.e., BMI vs. mortality, triglyceride/HDL vs. health risk, LDL vs. health risk, HbA1c vs. diabetes risk level, etc.) and correlation functions between BMI and biomarkers. In short, one's component risk score is

$$y = \beta_1 U(\mathrm{BMI}) + \beta_2 \rho_1 U(\mathrm{TG/HDL}) + \beta_3 \rho_2 U(\mathrm{LDL}) + \beta_4 \rho_3 U(\mathrm{HbA1c}),$$

where $\beta$ is the component weight, $U(\cdot)$ is a specific utility function associated with the component in parentheses, and $\rho$ is the correlation coefficient between BMI and the selected biomarker component. The lifestyle component score is based on a heuristic weighted combination of the number of steps per day, the intensity of steps based on estimated speed, and various social activity-derived features highly associated with future weight loss [8].
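The structure of the component risk score can be illustrated with a small sketch. The utility functions, weights, and correlation coefficients are not specified in the text, so the function below takes already-evaluated utility values as input, and all numbers in the usage note are made up for illustration. The name `component_risk_score` and the dictionary keys are hypothetical.

```python
def component_risk_score(utilities, weights, correlations):
    """Weighted sum y = b1*U(BMI) + b2*r1*U(TG/HDL) + b3*r2*U(LDL) + b4*r3*U(HbA1c).

    utilities:    dict component name -> utility value U(component),
                  assumed to be evaluated elsewhere
    weights:      dict component name -> component weight beta
    correlations: dict biomarker name -> correlation rho with BMI
                  (the BMI term itself carries no correlation factor)
    """
    score = 0.0
    for name, utility in utilities.items():
        rho = 1.0 if name == "BMI" else correlations[name]
        score += weights[name] * rho * utility
    return score
```

With equal weights of 0.25, utilities of 0.8, 0.5, 0.6, and 0.9 for BMI, TG/HDL, LDL, and HbA1c, and correlations of 0.4, 0.2, and 0.5 for the three biomarkers, the score is 0.2 + 0.05 + 0.03 + 0.1125 = 0.3925.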

Finally, raw wellness scores are computed over multiple participants and remapped through Markov Chain Monte Carlo sampling so that the remapped scores mimic a percentile ranking. For instance, a wellness score of 90 means a 90% ranking (i.e., top 10%). We also apply some boosting at the bottom so that people do not become too discouraged when their scores are low.

Experiment Setting. Our proposed model (source code: see footnote 1) requires a hierarchical decomposition of the network as input. Following [12], we obtain this hierarchy by recursively partitioning the underlying network using METIS [6], which reportedly provides high-quality partitions. Finally, the delay threshold $\Delta t$ and the time window $w$ are set to a day and a week, respectively. We ran our experiments on an Intel i7 2.8 GHz processor with 4 GB of memory.

3.2 Experimental Results

An effective way of summarizing influence relationships in the network is to consider the community-level influence propagation network. In Figure 6, we show the network of physical activity propagations for our dataset. The node size represents the average number of steps over all users in the community, while the edge width is proportional to the probability of physical activity influence. The node shapes are described below. Note that we only consider arcs whose probabilities are larger than 0.25. Interestingly, the network is almost acyclic, which suggests a clear directionality pattern in the flow of physical activities.

1 ix.cs.uoregon.edu/~haiphan/Publications/CPP.rar

Figure 6: Detected community structure in our health social network data.

Moreover, with the CPP model we are able to categorize the eight detected communities into three kinds of groups based on their influence behavior, as follows:

1) Influencers - This group is shown as circle nodes in Figure 6. These nodes have the strongest influence probabilities for delivering physical activities to users in other communities. In addition, they receive almost no physical activity influence from other communities.

2) Influenced users - This group is shown as rectangle nodes in Figure 6. These nodes are easily influenced by the influencers (i.e., circle nodes), since they receive physical activity influence with high propagation probabilities. Moreover, the average number of steps of these nodes is quite large, even larger than that of the influencer nodes. These influenced users sometimes deliver physical activities to other communities, but only weakly.

3) Non-influenced users - This group is shown as triangle nodes in Figure 6. These nodes are very hard to influence, since they receive only very small probabilities of physical activity propagation from other groups. In addition, the average number of steps of the non-influenced nodes is very small compared with the other two kinds of nodes.
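The thresholding, the acyclicity observation, and the three categories above can be sketched in Python as follows. The 0.25 threshold is the one stated in the text; the classification rule (compare total outgoing and incoming probability) and the community labels are hypothetical simplifications for illustration, not the procedure used to produce Figure 6.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Tuple

Arcs = Dict[Tuple[str, str], float]


def filter_arcs(arcs: Arcs, threshold: float = 0.25) -> Arcs:
    """Keep only arcs whose influence probability exceeds the threshold."""
    return {arc: p for arc, p in arcs.items() if p > threshold}


def is_acyclic(nodes: Iterable[str], arcs: Arcs) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    nodes = list(nodes)
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(nodes)


def classify(nodes: Iterable[str], arcs: Arcs) -> Dict[str, str]:
    """Assumed rule: compare total outgoing vs. incoming probability."""
    out_w, in_w = defaultdict(float), defaultdict(float)
    for (u, v), p in arcs.items():
        out_w[u] += p
        in_w[v] += p
    roles = {}
    for n in nodes:
        if out_w[n] > in_w[n]:
            roles[n] = "influencer"
        elif in_w[n] > 0:
            roles[n] = "influenced"
        else:
            roles[n] = "non-influenced"
    return roles
```

On a toy graph with arcs A→C (0.6), B→C (0.5), C→D (0.3), D→A (0.1), and A→E (0.05), filtering at 0.25 removes the two weak arcs, which breaks the only cycle and leaves E without incoming influence.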

Essentially, the effectiveness of our approach can be validated by examining the differences among the three user categories in terms of behaviors, lifestyles, and health outcomes, which help explain why they exhibit such physical activity propagation behaviors. We illustrate how the health outcome measures (i.e., BMI, #steps, Wellness score) vary over time for the three groups. Note that in the following experiments, all users in the same category are gathered together, so we have only three groups of users instead of the eight detected communities.

BMI. Figure 7 illustrates the average and the standard deviation of BMI for the three groups (i.e., influencers, influenced users, and non-influenced users). Interestingly, the influencer group has an average and a standard deviation of BMI significantly lower than those of the other two groups. Since the purpose of the participants who enrolled in this study is to reduce their BMIs, the influencer group can potentially serve as their external motivation. That is one of the reasons why the influencer group has strong influence probabilities on the other groups. Meanwhile, the non-influenced users have almost the highest average and standard deviation of BMI, even though their BMI values are quite similar to those of the influenced user group at the beginning.
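The per-group statistics plotted over time can be computed with a short aggregation like the one below. The record layout `(group, period, value)` and the use of the population standard deviation are assumptions made for this sketch; the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, float]  # (group, period, value), assumed layout


def group_stats(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(group, period): (mean, population std)} for a health measure."""
    buckets = defaultdict(list)
    for group, period, value in records:
        buckets[(group, period)].append(value)
    return {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}
```

The same function applies unchanged to BMI, daily steps, or the Wellness score, since it only depends on the numeric value attached to each user and period.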

Physical activity record number. Figure 8 illustrates

the average number of steps for the three groups over time.

(a) Average BMI (b) BMI standard deviation

Figure 7: Average and standard deviation of BMI for the three user categories.

Figure 8: Average steps for all users in the three kinds of community, i.e., Influencer, Influenced users, and non-Influenced users. (Best viewed in color)

We can see that the influencer group not only has the best BMI values but also exercises steadily day by day (i.e., a good lifestyle) from the beginning to the end of the study. Together with the CPP model results, this clarifies the activity-delivering role of the influencer group. The influenced user group did fewer physical activities at the beginning (i.e., in mid-November 2010), but afterwards rapidly increased their activities, even surpassing the influencer group. Interestingly, their activity performance then stabilizes along with that of the influencer group until the end of the program. Together with the CPP model results, this suggests that the influencer group has been successful in delivering physical activities to the influenced user group.

Regarding the non-influenced user group, there is no big change in their physical activity behavior. They have the lowest activity performance, and it fluctuates throughout the program lifetime. Only during a short period (i.e., January to March 2011) do they have a fairly stable (but still the lowest) activity performance. This suggests that it is hard to improve the exercise behavior of the non-influenced user group via social communications.

Wellness score. We have illustrated the correlation between the CPP model results and health outcome measures such as BMI and the exercise activity record number independently above. However, these individual measures cannot reflect a user's actual health status, which is a complex combination of the user's lifestyle, biometrics, and biomarkers. Our proposed wellness score is such a metric. Figure 9 illustrates the wellness score for the three user groups. It is quite clear that the influencer group always has a high wellness score. In addition, the influenced user group shows a big change in their scores: they have a low score at the beginning but afterwards increase their scores to be among the highest. Meanwhile, the non-influenced user group has the lowest score, even though they have a better starting point than the influenced user group.

Community consistency. Interestingly, in Figure 7b and Figure 9b, the standard deviations of the BMI and Wellness score are quite small (i.e., from 1.5 to 2.5 for the BMI standard deviation, and from 3 to 5 for the Wellness score standard deviation). Furthermore, they are quite stable (i.e., no big changes) for all three user groups. Therefore, not only the health outcome measures but also the lifestyles and physical activity record numbers are quite consistent among the users in the same communities.

To this point, we can conclude that there are significant differences in terms of behaviors, lifestyles, biometrics, and biomarkers among the three user groups. Indeed, the CPP model offers an effective tool for discovering the flow of physical activity propagations. Based on that, we can readily uncover previously hidden influence patterns and distinguish users in terms of physical activity delivery. Moreover, the detected communities are internally consistent, which is very useful for many other tasks such as activity propagation prediction. Consequently, the CPP model results correlate strongly with health outcomes, which is very meaningful for designing physical activity interventions through health social networks.

(a) Average Wellness score (b) Wellness score standard deviation

Figure 9: Average and standard deviation of Wellness score for the three user categories.

(a) #steps (b) Wellness score

Figure 10: CPP model vs social link based on health outcome. The markers correspond to the three user categories in Figure 6.

The CPP model vs social link clustering. The output of the CPP model can be graphically represented to analyze the influence probability between two communities and social link relationships. An effective way is to plot the corresponding heat maps, as shown in Figure 10. In these figures, we plot the Jaccard similarity, in terms of the number of steps and the wellness score, between the communities of the CPP model and the clusters obtained by clustering the social network links. Note that the clustering algorithm maximizes within-cluster correlation and minimizes between-cluster correlation. Given two clusters $A$ and $B$, the Jaccard similarity is computed as follows:

$$J(A, B, \mathit{steps}) = \frac{\sum_{u \in A \cap B} u.\mathit{steps}}{\sum_{u \in A \cup B} u.\mathit{steps}} \qquad (10)$$

where $u.\mathit{steps}$ is the total number of steps reported by user $u$. We use a similar equation for $J(A, B, \text{wellness score})$.
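Equation (10) translates directly into code. The sketch below assumes that each cluster is a set of user identifiers and that the per-user totals are stored in a dictionary; the function name and the zero-denominator convention are choices made for this example.

```python
from typing import Dict, Set


def weighted_jaccard(a: Set[str], b: Set[str], value: Dict[str, float]) -> float:
    """Equation (10): value-weighted overlap between clusters `a` and `b`.

    `value[u]` is the quantity attached to user u (e.g., total steps or
    wellness score). Returns 0.0 when the union carries no weight, which
    is a convention assumed here.
    """
    denominator = sum(value[u] for u in a | b)
    if denominator == 0:
        return 0.0
    return sum(value[u] for u in a & b) / denominator
```

For instance, with step totals of 1000, 3000, 2000, and 4000 for users u1 to u4, the clusters {u1, u2, u3} and {u2, u3, u4} share 5000 of the 10000 steps in their union, giving a similarity of 0.5.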

In general, we observe almost no correlation between the CPP model and the social link clustering. Five of the eight communities detected by the CPP model fall almost entirely in cluster 0, which is the densest cluster in our friend network. Thus, applying a standard clustering algorithm to the social network links cannot discover the communities obtained by the CPP model.

Comparison of the CPP model and the CSI model [12]. To highlight the effectiveness of our CPP model, we further compare our results with the CSI model. We applied both model selection functions, MDL [19] and BIC, proposed for the CSI model. The former generates only one community, while the latter yields 6 communities. In Figure 11, we plot the intensity of the influence probability between pairs of communities obtained from the CSI model (BIC model selection function) and the CPP model. In the CPP model, the influence role of the communities $c_0$, $c_1$, and $c_3$ is clear, while $c_7$, $c_6$, and $c_2$ receive strong influence probabilities. Furthermore, $c_4$ and $c_5$ do not contribute much to the process.
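For readers unfamiliar with BIC-based model selection, the following sketch shows the standard Bayesian Information Criterion and how it picks among candidate community structures of different granularity. It uses the textbook definition only; the exact objective used inside the CSI model is not reproduced here, and the candidate names and numbers in the usage note are invented for illustration.

```python
import math
from typing import List, Tuple

Candidate = Tuple[str, float, int]  # (name, log-likelihood, number of parameters)


def bic(log_likelihood: float, num_params: int, num_observations: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    return num_params * math.log(num_observations) - 2.0 * log_likelihood


def select_model(candidates: List[Candidate], num_observations: int) -> str:
    """Return the name of the candidate with the lowest BIC."""
    best = min(candidates, key=lambda c: bic(c[1], c[2], num_observations))
    return best[0]
```

With 1000 observations and hypothetical candidates `("coarse", -5200.0, 1)`, `("medium", -5000.0, 36)`, and `("fine", -4990.0, 64)`, the medium structure wins: the fine structure fits slightly better, but its extra parameters cost more than the gain in likelihood.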

Meanwhile, it is hard to distinguish the differences between the communities obtained by the CSI model. In addition, the probability range in the CSI model is [0, 0.7], which is smaller than the range in our model. A possible reason is that our model is designed for health social networks and does not take into account users who clearly fail to influence others, whereas the CSI model does not consider this.

4. RELATED WORK

4.1 Physical Activity Intervention Approaches

Regular physical activity decreases the risk of developing cardiovascular disease, diabetes, obesity, osteoporosis, some cancers, and other chronic conditions. Thus, finding effective population-based intervention strategies to promote physical activities is a key challenge. Website-delivered physical activity interventions have the potential to overcome many of the barriers associated with traditional face-to-face exercise counseling or group-based physical activity programs. An Internet user can seek advice at any time, in any place, and often at a lower cost compared with other delivery modalities [20].

(a) CSI model (b) CPP model

Figure 11: CPP and CSI models on our health social network data.

In 2000, a set of articles was published that identified the potential of interactive health communications, including Internet and website-delivered interventions, for improving health behaviors [9, 17, 18]. Since then, over fifteen studies [27] evaluating website-delivered interventions that used the Internet or e-mail to improve physical activities have been reported. Improvement in physical activities was reported in eight of them. Better outcomes were identified when interventions had more than five contacts with participants and when the time to follow-up was short (≤3 months; 60% positive outcomes), compared to medium-term (3-6 months, 50%) and long-term (≥6 months, 40%) follow-up. Indeed, a little over half of the controlled trials of website-delivered physical activity interventions have reported positive behavioral outcomes. However, intervention effects were short-lived, and there was limited evidence of maintenance of physical activity changes.

Although the website-delivered approaches reported positive results, research is needed to identify elements that can improve behavioral outcomes, the maintenance of change, and the engagement and retention of participants; larger and more representative study samples are also needed. Social networks have the potential to meet these needs, since they take advantage of the nature of social relationships to deliver healthy behavior. Furthermore, a social network can be a long-lived environment, and thus the retention of participants could be naturally improved. Although there is still a long way to go to reach this goal, our proposed model and findings are a foundation for further research, since they offer a powerful tool for understanding physical activity propagation in a health social network.

4.2 Social Influence and Information Propagation Models

Social influence and the phenomenon of influence-driven propagations in social networks have received considerable attention in recent years. One of the key issues in this area is to identify a set of influential users in a given social network. Domingos and Richardson [3] approach the problem with Markov random fields, while Kempe et al. [7] frame influence maximization as a discrete optimization problem. Another line of study has focused on the problem of learning the influence probabilities on every edge of a social network given an observed log of propagations over this network [4, 21, 24, 28]. In addition, many tasks in machine learning and data mining involve finding simple and interpretable models that nonetheless provide a good fit to observed data. In graph summarization, the objective is to provide a coarse representation of a graph for further analysis. Tian et al. [26] and Zhang et al. [29] consider algorithms to build graph summaries based on node attributes, while Navlakha et al. [13] use the Minimum Description Length (MDL) principle [19] to find good structural summaries of graphs. In [12], Mehmood et al. introduce a hierarchical approach to summarize patterns of influence in a network by detecting communities and their reciprocal influence strength.

5. CONCLUSIONS AND FUTURE WORK

In this paper, we introduce a hierarchical approach to analyze physical activity propagation through social communications at the community level (which can also be applied at the individual level). Our proposed CPP model offers a more compact representation of the network of propagations. Furthermore, it can be easily plotted and exploited to understand and detect interesting properties in the information propagation flow over the network. Our empirical analysis over a real-world health social network highlights three meaningful observations: 1) social networks have great potential to propagate physical activities via social communications, 2) the propagation network found in a health social network by the CPP model is almost acyclic, and 3) the physical activity-based influence behavior has a strong correlation with health outcome measures such as BMI, lifestyles, and our proposed Wellness score.

Since online social networks have been widely exploited in recent years, our first observation lays an early brick on a new, promising, and perhaps the most effective way to propagate physical activities to a wide population. The second observation offers interesting insights: it shows the existence of a clear direction in the propagation of physical activities, which is useful for designing more effective physical activity intervention strategies. The third observation might be exploited to categorize users or to predict user macro-activities based on their influence behaviors [23].

In the near future, we plan to clarify the correlation between physical activity propagation via social communications and the corresponding friend network. Indeed, the homophily principle is important for delivering healthy behavior on health social networks [2]. Therefore, by discovering the correlation between the homophily effect and social communications, we could obtain a more complete picture. As a result, we will be able to build better human behavior predictive models and physical activity intervention approaches.

6. ACKNOWLEDGMENTS

This work is supported by the NIH grant R01GM103309. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH or the U.S. Government.

7. REFERENCES

[1] A. Bauman, T. Armstrong, J. Davies, N. Owen, W. Brown, B. Bellew, and P. Vita. Trends in physical activity participation and the impact of integrated campaigns among Australian adults, 1997-99. Australian and New Zealand Journal of Public Health, 27(1):76–9, 2003.

[2] N. Christakis and J. Fowler. The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357:370–9, 2007.

[3] P. Domingos and M. Richardson. Mining the network value of customers. In Proceedings of KDD'01, pages 57–66, 2001.

[4] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of WSDM'10, pages 241–250, 2010.

[5] http://www.internetworldstats.com/stats.htm.

[6] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, 1998.

[7] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In Proceedings of KDD'03, pages 137–146, 2003.

[8] D. Kil, F. Shin, B. Piniewski, J. Hahn, and K. Chan. Impacts of social health data on predicting weight loss and engagement. In O'Reilly StrataRx Conference, San Francisco, CA, October 2012.

[9] B. Marcus, C. Nigg, D. Riebe, and L. Forsyth. Interactive communication strategies: implications for population-based physical activity promotion. American Journal of Preventive Medicine, 19(2):121–6, 2000.

[10] A. Marshall, E. Eakin, E. Leslie, and N. Owen. Exploring the feasibility and acceptability of using internet technology to promote physical activity within a defined community. Health Promotion Journal of Australia, 2005(16):82–4, 2005.

[11] M. Mathioudakis, F. Bonchi, C. Castillo, A. Gionis, and A. Ukkonen. Sparsification of influence networks. In Proceedings of KDD'11, pages 529–537, 2011.

[12] Y. Mehmood, N. Barbieri, F. Bonchi, and A. Ukkonen. CSI: Community-level social influence analysis. In Proceedings of ECML-PKDD'13, pages 48–63, 2013.

[13] S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded error. In Proceedings of SIGMOD'08, pages 419–432, 2008.

[14] National Institutes of Health. Aim for a healthy weight: Assess your risk. National Institutes of Health, 2007.

[15] U.S. Department of Health and Human Services. Physical activity and health: A report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, 1996.

[16] R. Pate, M. Pratt, S. Blair, et al. Physical activity and public health. A recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine. JAMA, 273(5):402–7, 1995.

[17] K. Patrick. Information technology and the future of preventive medicine: potential, pitfalls, and policy. American Journal of Preventive Medicine, 19(2):132–5, 2000.

[18] J. Prochaska, M. Zabinski, K. Calfas, J. Sallis, and K. Patrick. PACE+: interactive communication technology for behavior change in clinical settings. American Journal of Preventive Medicine, 19(2):127–31, 2000.

[19] J. Rissanen. A universal prior for integers and estimation by minimum description length. The Annals of Statistics, 11(2):416–431, 1983.

[20] L. Ritterband, L. Gonder-Frederick, D. Cox, A. Clifton, R. West, and S. Borowitz. Internet interventions: in review, in use, and into the future. Professional Psychology: Research and Practice, 34:527–34, 2003.

[21] K. Saito, R. Nakano, and M. Kimura. Prediction of information diffusion probabilities for independent cascade model. In Proceedings of KES 2008, pages 67–75, 2008.

[22] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, 1978.

[23] Y. Shen, R. Jin, D. Dou, N. Chowdhury, J. Sun, B. Piniewski, and D. Kil. Socialized Gaussian process model for human behavior prediction in a health social network. In ICDM'12, pages 1110–1115, 2012.

[24] J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis in large-scale networks. In Proceedings of KDD'09, pages 807–816, 2009.

[25] R. Taylor. Use of body mass index for monitoring growth and obesity. Paediatrics & Child Health, 15(5):258, 2010.

[26] Y. Tian, R. Hankins, and J. Patel. Efficient aggregation for graph summarization. In Proceedings of SIGMOD'08, pages 567–580, 2008.

[27] C. Vandelanotte, K. Spathonis, E. Eakin, and N. Owen. Website-delivered physical activity interventions: A review of the literature. American Journal of Preventive Medicine, 33(1):54–64, 2007.

[28] R. Xiang, J. Neville, and M. Rogati. Modeling relationship strength in online social networks. In Proceedings of WWW'10, pages 981–990, 2010.

[29] N. Zhang, Y. Tian, and J. Patel. Discovery-driven graph summarization. In Proceedings of ICDE'10, pages 880–891, 2010.

A deep learning approach for human behavior prediction with explanations in health social networks: social restricted Boltzmann machine (SRBM+)

Article

Full-text available

Sep 2016

Human behavior modeling is a key component in application domains such as healthcare and social behavior research. In addition to accurate prediction, having the capacity to understand the roles of human behavior determinants and to provide explanations for the predicted behaviors is also important. Having this capacity increases trust in the systems and the likelihood that the systems will be actually adopted, thus driving engagement and loyalty. However, most prediction models do not provide explanations for the behaviors they predict. In this paper, we study the research problem, human behavior prediction with explanations, for healthcare intervention systems in health social networks. In this work, we propose a deep learning model, named social restricted Boltzmann machine (SRBM), for human behavior modeling over undirected and nodes-attributed graphs. In the proposed SRBM⁺ model, we naturally incorporate self-motivation, implicit and explicit social influences, and environmental events together. Our model not only predicts human behaviors accurately, but also, for each predicted behavior, it generates explanations. Experimental results on real-world and synthetic health social networks confirm the accuracy of SRBM⁺ in human behavior prediction and its quality in human behavior explanation.

Ontology-based Deep Learning for Human Behavior Prediction with Explanations in Health Social Networks (Information Sciences - IF: 4.832)

Article

Full-text available

Aug 2016
INFORM SCIENCES

Characterizing Physical Activity in a Health Social Network

Conference Paper

Apr 2016

New horizons are emerging within healthcare delivery, education , intervention provision, and tracking. We study a health social network that has tracked physical activities , biomarkers, and posts the participants have shared, throughout a one-year program. The program was aimed at helping people to adopt healthy behaviors and to lose weight. In this paper, we focus on users' posts that relate to physical activities. Prior papers characterize health based solely on users' information disclosed through natural language or questionnaires. The drawback of these works is their lack of medical records or health-related information to validate their findings. By contrast, with our direct access to users' physical and medical data, we investigate the implication of users' posts at both individual and group levels. We are able to validate our hypotheses about the effects of certain social network activities, by contextualiz-ing them in the specific users' actual medical progress and documented levels of exercise. Our findings show that activity self-disclosure posts are good indicators of one's real-world physical activity, which makes them good resources for monitoring the participants. In addition, using a physical activity propagation model, we show how these posts can influence the physical activity behavior at the network level. Further, posts exhibit distinctive affective, biological, and linguistic style markers. We observe that these characteristics can be used in a predictive capacity, to detect positive activity signals with ∼ 88% accuracy, which can be utilized for an unobtrusive monitoring solution.

Social and Motivational Factors for the Spread of Physical Activities in a Health Social Network

Chapter

Full-text available

Dec 2021

Identifying the effects of social and motivational factors is critical to understanding how healthy behaviors, i.e., physical activities, spread in digital therapeutics programs. We evaluated a comprehensive interconnected social network of 254 overweight and obese individuals across 335 days. Daily physical activities, social activities, biomarkers, and biometric measures were available for all subjects. We improved proportional hazards models to characterize the impact of self-motivation, influence, and susceptibility in the spread of physical activities. After 6 months, the YesiWell users increased leisure walking minutes by 164% on average compared with 47% among the control participants (\(P<0.05\)). The YesiWell users also lost more weight than the controls (5.2 pounds vs. 1.5 pounds) (\(P<0.01\)). Our estimations showed that influence and susceptibility increase with age; relaxed people are 96% more influential than stressed people (\(P<0.001\)); obese people are 23% more self-motivated (\(P<0.001\)); socially active people are 29% more influential (\(P<0.001\)); those who self-characterize as “keep-to-themselves” people have a 79% greater susceptibility (\(P<0.001\)). Relaxed people exert the most influence on non-stressed peers at 109% more than baseline (\(P<0.001\)). Our findings could enable new and effective personalized behavioral interventions to spread healthy behaviors in next-generation digital therapeutics.

Differential Privacy Preservation for Deep Auto-Encoders: an Application of Human Behavior Prediction

Article

Feb 2016

In recent years, deep learning has spread beyond both academia and industry with many exciting real-world applications. The development of deep learning has presented obvious privacy issues. However, there has been lack of scientific study about privacy preservation in deep learning. In this paper, we concentrate on the auto-encoder, a fundamental component in deep learning, and propose the deep private auto-encoder (dPA). Our main idea is to enforce ε-differential privacy by perturbing the objective functions of the traditional deep auto-encoder, rather than its results. We apply the dPA to human behavior prediction in a health social network. Theoretical analysis and thorough experimental evaluations show that the dPA is highly effective and efficient, and it significantly outperforms existing solutions.

Online Communities as a strategy to improve Physical Activity: a survey on user preferences and perceived impact

Article

Full-text available

Jun 2023

Jennifer Santos

Objective: The primary objective of this online survey is to understand differences in user profile, user preferences and perceived impact among the European population. Methods: The sample groups were based on the most recent report of the European country with the highest and lowest levels of physical activity. Cross-sectional online survey of population resident in Portugal and population resident in Finland were selected by simple random sampling. Responses were collected from the open-source tool LimeSurvey. IBM Statistical Package for Social Sciences Statistics was used to analyse the acquired data. Results: A total of 538 responses were considered with 48.4% of respondents residing in Portugal, and 51.4% residing in Finland. About 38.5% of the general survey population regularly practice exercise, and 39.7% regularly engage in physical activity. Regarding the level of online community experience, responses were distributed between medium, moderately low, and very low. Overall, there is a significant relationship between both sample groups when it comes to physical activity, common emotions using online communities, user perception, preferences and openness. Conclusion: Our survey results provide evidence to support that country of residence is related to user physical activity and highlight the importance of considering demographic factors to understand general population lifestyle choices. Submitted to: International Journal of Health Promotion and Education

Where Physical and Digital Meet and Unite: An Online Community Approach

Article

Full-text available

Dec 2023

Jennifer Santos

The purpose of this study is to find what tools, techniques and strategies can be applied to online communities (OC) to positively impact its users’ engagement with OC dedicated to promoting physical activity (PA). Exploring the different elements that compose digital platforms (DP) that harbour OC can be an innovative step to evaluate the promotion of PA. This is especially helpful in light of the potential for internet-based interventions, which can reach a large number of users at reduced costs. The work plan was divided into four different phases. Phase 1 encompassed the development of a scoping review on OC characteristics. Phase 2 covered an analysis of existing DP. Phase 3 was dedicated to a survey on user preferences and perceived impact. Finally, Phase 4 incorporated the final research discussion along with a conceptual framework and a set of guidelines. The main findings indicate that OC should be harboured in DP optimised in website and app formats and that both are equivalently indispensable. We also found that a variety of details, OC features, OC content, user interaction strategies, and BCT are present in the current DP. Finally, we discovered that there appears to be a broad dissatisfaction with the features offered by OC. Existing evidence is insufficient to draw objective conclusions, therefore more scientific studies on the various subjects must be carried out. Users’ preferences and perceived impact may initially appear to be at odds—users’ expectations of OC appear to be unmet, and user preferences need to be nurtured. Submitted to: Health Promotion International

Online Communities as a strategy to improve Physical Activity: a scoping review on its use and characteristics

Article

Full-text available

Jun 2021

Jennifer Santos

Objective: The objective of this scoping review was to identify, characterize and synthesize existing literature on the use of online communities to promote physical activity and identify gaps to direct future research. Methods: Systematic searches were conducted in Science Direct, PubMed, Scopus, and Institute of Electrical and Electronics Engineers Xplore for studies published up to August 2020. The search terms included a combination of the following keywords: physical activity, sedentary, exercise, health, sport, brand, online community. No limits were used. Studies were included if they encompassed a full publication containing enough details on characteristics and described any feature primarily aiming at physical activity promotion. Results: A total of 21 different online communities were found in the total of 25 selected studies. Of those studies, all reported on at least one behaviour change technique, 68.2% (n=15) used websites to support the OC, 36% (n=9) reported on strategies to keep users engaged, 16% (n=4) comprised information related to the design process, and 16% (n=4) reported on OC effectiveness. Conclusion: Existing reports do not provide evident detailed information on the design process or user engagement strategies related to online communities, and only a few studies assess its effectiveness in improving physical activity. Further research is needed. Submitted to: Health promotion journal of Australia

Online Communities to improve Physical Activity: an analysis of their content and features

Conference Paper

Full-text available

Feb 2022

A scoping review was carried out to identify existing literature characterising the content and features of OC targeting the promotion of PA.

Personalized Semantic Word Vectors

Conference Paper

Oct 2016

Distributed word representations are able to capture syntactic and semantic regularities in text. In this paper, we present a word representation scheme that incorporates authorship information. While maintaining similarity among related words in the induced distributed space, our word vectors can be effectively used for some text classification tasks too. We build on a log-bilinear document model (lbDm), which extracts document features, and word vectors based on word co-occurrence counts. First, we propose a log-bilinear author model (lbAm), which contains an additional author matrix. We show that by directly learning author feature vectors, as opposed to document vectors, we can learn better word representations for the authorship attribution task. Furthermore, authorship information has been found to be useful for sentiment classification. We enrich the author model with a sentiment tensor, and demonstrate the effectiveness of this hybrid model (lbHm) through our experiments on a movie review-classification dataset.

Physical activity and public health. A recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine

Article

Feb 1995

Objective: To encourage increased participation in physical activity among Americans of all ages by issuing a public health recommendation on the types and amounts of physical activity needed for health promotion and disease prevention. Participants: A planning committee of five scientists was established by the Centers for Disease Control and Prevention and the American College of Sports Medicine to organize a workshop. This committee selected 15 other workshop discussants on the basis of their research expertise in issues related to the health implications of physical activity. Several relevant professional or scientific organizations and federal agencies also were represented. Evidence: The panel of experts reviewed the pertinent physiological, epidemiologic, and clinical evidence, including primary research articles and recent review articles. Consensus process: Major issues related to physical activity and health were outlined, and selected members of the expert panel drafted sections of the paper from this outline. A draft manuscript was prepared by the planning committee and circulated to the full panel in advance of the 2-day workshop. During the workshop, each section of the manuscript was reviewed by the expert panel. Primary attention was given to achieving group consensus concerning the recommended types and amounts of physical activity. A concise "public health message" was developed to express the recommendations of the panel. During the ensuing months, the consensus statement was further reviewed and revised and was formally endorsed by both the Centers for Disease Control and Prevention and the American College of Sports Medicine. Conclusion: Every US adult should accumulate 30 minutes or more of moderate-intensity physical activity on most, preferably all, days of the week.

Socialized Gaussian Process Model for Human Behavior Prediction in a Health Social Network

Conference Paper

Dec 2012

Modeling and predicting human behaviors, such as activity level and intensity, is key to preventing cascades of obesity and to helping spread wellness and healthy behavior in a social network. In this work, we propose a Socialized Gaussian Process (SGP) for socialized human behavior modeling. The proposed SGP model naturally incorporates a person's individual behavior factor and social correlation factor into a unified model, where a basic Gaussian Process model is leveraged to capture each individual's personal behavior pattern. Furthermore, we extend the Gaussian Process model to a socialized Gaussian Process (SGP), which aims to capture social correlation phenomena in the social network. The detailed experimental evaluation shows that the SGP model achieves the best prediction accuracy compared with the baseline methods.

CSI: Community-Level Social Influence Analysis

Conference Paper

Sep 2013

Modeling how information propagates in social networks driven by peer influence is a fundamental research question for understanding the structure and dynamics of these complex networks, as well as for developing viral marketing applications. Existing literature studies influence at the level of individuals, mostly ignoring the existence of a community structure in which multiple nodes may exhibit a common influence pattern. In this paper we introduce CSI, a model for analyzing information propagation and social influence at the granularity of communities. CSI builds on a novel propagation model that generalizes the classic Independent Cascade model to handle the influence of groups of nodes (instead of single nodes). Given a social network and a database of past information propagation, we propose a hierarchical approach to detect a set of communities and their reciprocal influence strength. CSI provides a higher-level and more intuitive description of the influence dynamics, thus representing a powerful tool to summarize and investigate patterns of influence in large social networks. The evaluation on various datasets suggests the effectiveness of the proposed approach in modeling information propagation at the level of communities. It further makes it possible to detect interesting patterns of influence, such as the communities that play a key role in the overall diffusion process, or that are likely to start information cascades.
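For orientation, the classic Independent Cascade model that this abstract says CSI generalizes can be sketched as below. This is only a minimal illustration of the node-level model, not the CSI algorithm itself; the function name, the graph encoding (a dict of weighted out-edges), and the example probabilities are all assumptions made for the sketch.

```python
import random


def independent_cascade(graph, seeds, rng=None):
    """Simulate one run of the Independent Cascade model.

    graph: dict mapping node -> list of (neighbor, activation_probability)
           (hypothetical encoding chosen for this sketch).
    seeds: iterable of initially active nodes.
    Returns the set of nodes active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active node gets exactly one chance to
            # activate each of its currently inactive neighbors.
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Toy example with deterministic probabilities (1.0 always fires, 0.0 never).
toy_graph = {
    "a": [("b", 1.0), ("c", 0.0)],
    "b": [("d", 1.0)],
}
reached = independent_cascade(toy_graph, ["a"])
```

A community-level variant would, roughly speaking, attach such probabilities to pairs of groups rather than pairs of nodes; how CSI actually does this is described in the cited paper, not here.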

Karypis, G., Kumar, V.: A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing 20(1), 359-392

Article

Jan 1999

Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph (Bui and Jones, Proc. of the 6th SIAM Conference on Parallel Processing for Scientific Computing, 1993, 445-452; Hendrickson and Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. report SAND 93-1301, Sandia National Laboratories, Albuquerque, NM, 1993). From the early work it was clear that multilevel techniques held great promise; however, it was not known if they can be made to consistently produce high quality partitions for graphs arising in a wide range of application domains. We investigate the effectiveness of many different choices for all three phases: coarsening, partition of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement. We also present a much faster variation of the Kernighan-Lin (KL) algorithm for refining during uncoarsening. We test our scheme on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that our scheme produces partitions that are consistently better than those produced by spectral partitioning schemes in substantially smaller time. Also, when our scheme is used to compute fill-reducing orderings for sparse matrices, it produces orderings that have substantially smaller fill than the widely used multiple minimum degree algorithm.
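The coarsening phase mentioned in this abstract can be illustrated with a small sketch of heavy-edge matching: vertices are visited in some order, and each unmatched vertex is paired with the unmatched neighbor connected by the heaviest edge. This is a simplified illustration under assumptions, not the authors' implementation; the function name, the adjacency encoding, and the `order` and `seed` parameters are hypothetical.

```python
import random


def heavy_edge_matching(adj, order=None, seed=0):
    """Compute a matching that prefers heavy edges.

    adj:   dict vertex -> dict neighbor -> edge weight (assumed symmetric).
    order: optional visiting order; a seeded random order is used if omitted.
    Returns a dict mapping each vertex to its partner
    (a vertex left unmatched maps to itself).
    """
    if order is None:
        order = list(adj)
        random.Random(seed).shuffle(order)
    match = {}
    for u in order:
        if u in match:
            continue
        # Candidate partners: neighbors that are still unmatched.
        candidates = [(w, v) for v, w in adj[u].items()
                      if v not in match and v != u]
        if candidates:
            _, v = max(candidates, key=lambda c: c[0])
            match[u] = v
            match[v] = u
        else:
            match[u] = u  # no free neighbor; vertex stays alone
    return match


# Toy graph: edges 1-2 (weight 5), 1-3 (1), 2-4 (1), 3-4 (7).
toy_adj = {
    1: {2: 5, 3: 1},
    2: {1: 5, 4: 1},
    3: {1: 1, 4: 7},
    4: {2: 1, 3: 7},
}
pairs = heavy_edge_matching(toy_adj, order=[1, 2, 3, 4])
```

Each matched pair would then be collapsed into a single vertex of the coarser graph; that contraction step, along with partitioning and refinement, is omitted here.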

Pate RR, Pratt M, Blair SN, Haskell WL, Macera CA, Bouchard C, Buchner D, Ettinger W, Heath GW, King AC et al: Physical activity and public health: recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine. JAMA 273, 402-407

Article

Feb 1995

Objective: To encourage increased participation in physical activity among Americans of all ages by issuing a public health recommendation on the types and amounts of physical activity needed for health promotion and disease prevention.

Internet Interventions: In Review, In Use, and Into the Future

Article

Oct 2003

The provision of health care over the Internet is a rapidly evolving and potentially beneficial means of delivering treatment otherwise unsought or unobtainable. Internet interventions are typically behavioral treatments operationalized and transformed for Web delivery with the goal of symptom improvement. The literature on the feasibility and utility of Internet interventions is limited, and there are even fewer outcome study findings. This article reviews empirically tested Internet interventions and provides an overview of the issues in developing and/or using them in clinical practice. Future directions and implications are also addressed. Although Internet interventions will not likely replace face-to-face care, there is little doubt that they will grow in importance as a powerful component of successful psychobehavioral treatment.

Use of body mass index for monitoring growth and obesity

Article

May 2010

Richard S. Taylor

The Spread of Obesity in a Large Social Network Over 32 Years

Article

Jul 2007
NEW ENGL J MED

Nicholas Christakis

The prevalence of obesity has increased substantially over the past 30 years. We performed a quantitative analysis of the nature and extent of the person-to-person spread of obesity as a possible factor contributing to the obesity epidemic. We evaluated a densely interconnected social network of 12,067 people assessed repeatedly from 1971 to 2003 as part of the Framingham Heart Study. The body-mass index was available for all subjects. We used longitudinal statistical models to examine whether weight gain in one person was associated with weight gain in his or her friends, siblings, spouse, and neighbors. Discernible clusters of obese persons (body-mass index [the weight in kilograms divided by the square of the height in meters], ≥30) were present in the network at all time points, and the clusters extended to three degrees of separation. These clusters did not appear to be solely attributable to the selective formation of social ties among obese persons. A person's chances of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) if he or she had a friend who became obese in a given interval. Among pairs of adult siblings, if one sibling became obese, the chance that the other would become obese increased by 40% (95% CI, 21 to 60). If one spouse became obese, the likelihood that the other spouse would become obese increased by 37% (95% CI, 7 to 73). These effects were not seen among neighbors in the immediate geographic location. Persons of the same sex had relatively greater influence on each other than those of the opposite sex. The spread of smoking cessation did not account for the spread of obesity in the network. Network phenomena appear to be relevant to the biologic and behavioral trait of obesity, and obesity appears to spread through social ties. These findings have implications for clinical and public health interventions.
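The body-mass index definition quoted in this abstract (weight in kilograms divided by the square of height in meters, with 30 or above classified as obese) translates directly into a short calculation. The sketch below is illustrative only; the function names and the example values are assumptions, not part of the cited study.

```python
def body_mass_index(weight_kg, height_m):
    """Return BMI: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Apply the BMI >= 30 cut-off stated in the abstract."""
    return body_mass_index(weight_kg, height_m) >= threshold


# Hypothetical example values, chosen only to exercise the formula.
example_bmi = body_mass_index(90.0, 1.5)
example_flag = is_obese(60.0, 2.0)
```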

Estimating the dimension of a model. The Annals of Statistics

Article

G Schwarz

SUPPLEMENTARY ONLINE MATERIAL FOR: THE SPREAD OF OBESITY IN A LARGE SOCIAL NETWORK OVER 32 YEARS

Article
