
Online profiling and clustering of Facebook users

February 2015
Decision Support Systems 70:60 - 72


DOI:10.1016/j.dss.2014.12.001

Authors:

Erasmus University Rotterdam

Abstract In a relatively short period of time, social media have acquired a prominent role in media and daily life. Although this development brought about several academic endeavors, the literature concerning the analysis of social media data to investigate one's customer base appears to be limited. In this paper, we show how data from the social network site Facebook can be operationalized to gain insight into the individuals connected to a company's Facebook site. In particular, we propose a data collection framework to obtain individual specific data and propose methodology to explore user profiles and identify segments based on these profiles. The proposed data collection framework can be used as an identification step in an analytical customer relationship management implementation that specifically focuses on potential customers. We illustrate our methodology by applying it to the Facebook page of an internationally well-known professional football (soccer) club. In our analysis, we identify four clusters of users that differ with respect to their indicated “liking” profiles.



Jan-Willem van Dam, Michel van de Velden ⁎

Econometric Institute, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands

Article info

Article history:

Received 20 November 2013

Received in revised form 22 September 2014

Accepted 1 December 2014

Available online 9 December 2014

Keywords:

Online profiling

Social networks

Customer relationship management

Correspondence analysis

Cluster analysis

Facebook


1. Introduction

Social networks and the role they play in daily life have grown considerably over the last few years. As illustrated by the editorials of two recent special issues [3,8], recent academic publications cover a broad spectrum of topics related to social media. Some examples concern the potential of social media and its effect on customer loyalty [4]; how to use Facebook to activate customers in sharing product/service recommendations [7,16,19]; the role of social networks, in particular Facebook, in intentional social actions [6]; the relationship between personal networks and patterns of Facebook usage [29]; and the effectiveness of user-generated content in stimulating sales [10,17,39]. This short list of topics and references is by no means exhaustive. It only serves to illustrate the recent interest and range of applications relating to firms and social networks. One common element in the extant literature is that none of these studies builds on directly observed individual-level social network data. Instead, either aggregate data or focus groups and/or (online) surveys were employed to answer the research questions. This limitation was recently also observed by [31]. In their study of the effect of social media participation on visit frequency and profitability, survey respondents were linked to their social media (i.e., Facebook) profiles by matching of names. In this paper we add to the existing literature by explicitly considering the retrieval and analysis of profile data directly obtained from social network sites. The proposed methodology can be implemented in an analytical customer relationship management (CRM) framework aimed at the analysis of customer characteristics that may help improve a firm's customer management strategies. Moreover, by focusing on data from public profiles on the social media platform Facebook, we are able to identify potential rather than actual customers. That is, in contrast to typical CRM implementations that rely on data directly obtained from customers, we consider a much broader group of individuals who indicated an interest in a firm even when an actual purchase has not yet been recorded.

The contribution of this paper is threefold. First, we show how Facebook users that “like” a firm can be identified. As also observed by [31], this is not a trivial task. Second, using the information volunteered by such Facebook users through their publicly available pages, we show how segments of Facebook users can be identified through data visualization and cluster analysis methods. Clustering of a firm's Facebook fans may improve understanding of strategic segmentation of social media users connected to a firm [28]. Moreover, the cluster results and visualizations can be used to improve targeting of marketing efforts. For example, a company may consider seeking cooperation with another brand or a popular media figure based on the popularity of such a brand or person with the (potential) customers. Such efforts could also be targeted directly at specific groups of (potential) customers rather than at all (potential) customers. Third, we apply our methodology to a (large) data set of Facebook users that indicated liking a popular and successful international football club. This football club granted us administrator rights, under the provision that we not reveal the name of the club or any results indicative of the club's name. The results of our analysis show that clusters can be obtained based on the Facebook users' liking behavior. Given the differences between liking patterns in these clusters, differentiated marketing strategies for the different clusters can be developed.

⁎ Corresponding author at: Michel van de Velden, Econometric Institute, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands. E-mail addresses: jwvdam@gmail.com (J.-W. van Dam), vandevelden@ese.eur.nl (M. van de Velden).

The remainder of this paper is structured as follows. First, we briefly review previous research on analytical CRM and online profiling. Next, we briefly discuss specific data considerations for Facebook. We introduce some terminology and review Facebook's data analysis and programming facilities. In Section 4, we show how specific Facebook users can be identified. Next, we analyze the individual-level data using a combined multiple correspondence analysis and k-means cluster analysis method. We show how results can be visualized and interpreted. The paper concludes with a discussion of our results, implications for research and practice, and future directions.

2. Customer relationship management and online profiling

Customer relationship management (CRM) has become widely recognized as an important business approach [27,31]. It has been defined as an “enterprise approach to understanding and influencing customer behavior through meaningful communications in order to improve customer acquisition, customer retention, customer loyalty, and customer profitability”. Hence, customer acquisition (or: identification) can be seen as the first step in a customer relationship management cycle that, together with retention and customer development, is geared at creating a better understanding of (potential) customers in order to increase long-term customer value to the firm [22,33,30].

Customer identification is typically based on information directly available to a firm [38]. For example, customers may be required to provide certain background information upon purchasing a product. In addition, companies may ask customers to volunteer information by completing a survey, or persuade them to join a loyalty program. Based on the available data, a customer profile, that is, a model of the customer, can be constructed. Based on such a customer profile, a marketeer decides on appropriate strategies and tactics to meet the specific needs of the consumer [32]. Hence, possessing accurate information about the preferences and background characteristics of (potential) customers makes it possible to improve targeting of, possibly individual-specific, marketing efforts [25].

Obtaining direct customer information requires an existing relationship with the firm. That is, customers need to have either purchased a product or made contact with the firm in such a way that identification is possible, so that additional information can be collected. In the case of as yet unidentified potential customers, it is not possible to acquire data in this fashion. Moreover, except for the observable transactional data (i.e., purchase time and amounts, etc.), customers may decline to provide additional information.

Social media offer a new source of customer profile information. In particular, social media offer opportunities to identify potential customers. Through social media, individuals often express preferences for brands, products, services, persons, political parties, etc., in a free, unsolicited way. Thus, if one is able to collect such information from potential customers of a certain firm, for example by focusing on users that indicated an interest in that firm, online profiles can be created that allow for better, individualized targeting.

Although it has been suggested that the rise of new media requires novel approaches to successfully manage customer relationships [18], applications in which customer background data from social network sources are used to gain insight into customer backgrounds appear to be underrepresented in the academic literature. There are some studies [24,23,15,5] in which social network data were used that contained personal information of the users in the data set. However, the goal of these studies was to study network ties [24] or privacy issues [5,15], or to relate the number of friends to the amount of information available on a person's Facebook page [23]. None of the studies used the data for online profiling: the collection of information from the Internet for the purpose of formulating a profile of users' habits and interests [37]. In this paper, we fill this gap by proposing a data collection framework for the purpose of online profiling.

Online profiling can be divided into two categories: reactive and non-reactive data collection [37]. Non-reactive data collection focuses on the collection of data concerning Web usage behavior, e.g., IP addresses of visitors, time spent on certain Web pages, and clicking behavior. These data are used to gain insight into Web user behavior and, thus, into characteristics of individual visitors or visitor groups. Non-reactive data form a large and potentially interesting source of online profile information. However, for the construction of online user profiles, the observed usage behavior must first be transformed into meaningful variables. The construction and definition of such variables is not always an easy task. In our study, we therefore primarily focus on the retrieval and analysis of online profiles based on reactive rather than non-reactive data.

Reactive data collection zooms in on visitor characteristics that cannot be collected through tracking the Web usage behavior of a visitor. Instead, reactive information is collected by using forms and selection menus, which have to be filled in by visitors themselves. Reactive data require little to no recoding of the original variables and are immediately collected at the user level. Moreover, in the case of Facebook, providing reactive data requires very little effort from the users. For example, when joining Facebook, users are asked to provide certain personal background information (e.g., name, gender, date of birth). Users provide this information by selecting the appropriate options. This basic background information can be supplemented by more personal information concerning, for example, hobbies, relationship status, etc. Finally, by “liking” other pages, personal preferences for persons or objects can be indicated.

The resulting online profiles can be of great value to marketeers, as they can be used to identify different (segments of) users (customers) that require different marketing approaches. Moreover, they enable the company to know its potential customers, that is, individuals that indicated a preference for the product/brand by “liking” it on Facebook.

3. Facebook data

Facebook users put personal information on their Facebook page. Some examples are someone's name, gender, date of birth, e-mail address, sexual orientation, marital status, interests, hobbies, favorite sports team(s), favorite athlete(s), or favorite music. Furthermore, it is possible to specify your Facebook friends, post messages, and publish pictures or other content. Consequently, the potential value to marketeers and researchers of the information available through Facebook is substantial. However, extracting the information is no trivial task as:

1. Facebook users are able to make certain information not publicly available and therefore not visible to non-friends.

2. Facebook users are not obliged to fill in fields and therefore many users do not specify all possible information about themselves.

3. The default statistics that Facebook offers for Facebook page administrators are limited.

4. It is not obvious how Facebook users who “like” your page can be identified.

The first two points are a result of the design and policy of Facebook.com, and we therefore take them as given. Instead, we focus on the extraction of available data from Facebook and consider a user profile data collection framework that takes the above-mentioned issues into account.

The Facebook data collection framework that we propose consists of three steps: 1) identification of “fans” of the Facebook page, 2) retrieval of relevant data for the identified fans, and 3) preparation of the data. The first step of this framework requires administrator rights to the page; in the other steps, public information from the relevant pages needs to be collected. Before we show how to implement the data collection framework, we briefly summarize some important aspects concerning Facebook pages and the available data.

3.1. Facebook Insights

The owner of a Facebook page is in principle the administrator of the page. Personal pages are typically managed only by the page owner. In the case of a company's Facebook page, however, the page administrator can also give other Facebook users administrator rights. It is possible to have multiple administrators for one Facebook page, e.g., multiple marketing and CRM employees may be page administrators. A Facebook page administrator has certain privileges in comparison to regular users or visitors of a Facebook page. An administrator can edit, publish and withdraw content, target advertisements and install Facebook apps on the page. Also, administrators have access to Facebook Insights, a dashboard which provides statistics on user growth, demographics, consumption of content, and creation of content. However, the information made available through the dashboard is aggregated over users who “liked” the page. Consequently, the possibilities for analyzing individual-specific data using this feature are limited.

On Facebook, users can indicate whether they “like” another Facebook page. Thus, they are able to express a form of affinity with the company, person or product behind the Facebook page. Through Facebook Insights it is possible to see how many users “like” your page, how this number evolves over time, and whether these users are active on your page or not. (Active Facebook users are defined as users who, within a chosen time period, engaged with, viewed, or consumed content generated by a Facebook page.) Furthermore, you can see which media on the page are most popular (e.g., watching videos, listening to audio, or viewing photos). It is also possible to see which Facebook tabs (e.g., the wall, information, photos, and events) are most popular and from which external referrers visitors come. Additionally, one is able to see which page posts have been viewed the most and which posts generated the most user feedback.

The above-mentioned possibilities of the Facebook Insights dashboard all concern information related to the Facebook page itself. Information about the Facebook users is only present through aggregated breakdowns.

Fig. 1 gives an example of the breakdowns for the gender and age distributions based on the information users provided on their Facebook pages. The home country and home city are determined using the IP address from which users (who indicated “liking” the page) access the Facebook page. The language is based on the users' default language setting when accessing Facebook.

Fig. 1. Screenshot of Facebook Insights.

Other personal information of users, such as relationship status, sexual orientation, favorite brands, favorite music, liked pages, etc., is not accessible through the Facebook Insights dashboard.

3.2. Facebook application programming interface (API)

Through the Facebook Developers Platform [9] it is possible to develop web applications (or plugins) which make use of the Facebook platform, e.g., mobile applications which make it possible to connect to your Facebook page to post pictures, applications which integrate Facebook features in a Web site, or applications which make it possible to find friends.

The Facebook Developers Platform consists of multiple application programming interfaces (APIs). The Graph API is the core of the Facebook Platform, enabling one to read and write data to Facebook. It provides a simple and consistent view of the social graph, uniformly representing objects (e.g., people, photos, events and pages) and the connections between them (e.g., friendships, likes and photo tags) [9]. In addition to the Graph API, there is an Internationalization API, an Ads API, and a Chat API. For our data collection framework, the Graph API is crucial.

The Graph API makes it possible for developers to integrate Facebook into Web sites and Web applications, and to build Facebook applications. However, even when a Facebook user has a public profile, which is accessible (online) by anyone, the data in this profile are not publicly accessible through the API. In fact, only the following fields are always publicly available through the API: user id; username; full name; first name; last name; gender; locale (i.e., the default language setting); profile picture. For marketing or customer relationship management purposes, this list is not very useful. In addition, if we compare this list with the complete list of fields that Facebook provides, we observe that there is potentially much more relevant information available on the Facebook pages.
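To make the restriction concrete, the following minimal Python sketch filters a user object down to the fields listed above as always publicly available. The key names (id, username, name, first_name, last_name, gender, locale, picture) are assumptions modeled on the Graph API user object of that period, and the example record is invented for illustration; neither is taken from the paper.

```python
# Assumed key names for the always-public fields; they mirror the list in the
# text but are not guaranteed to match any current version of the Graph API.
PUBLIC_FIELDS = ("id", "username", "name", "first_name", "last_name",
                 "gender", "locale", "picture")


def public_subset(user_object):
    """Return only the fields the text lists as always publicly available."""
    return {key: user_object[key] for key in PUBLIC_FIELDS if key in user_object}


# Hypothetical user object, for illustration only.
example_user = {
    "id": "1234567890",
    "name": "Jane Doe",
    "first_name": "Jane",
    "last_name": "Doe",
    "gender": "female",
    "locale": "nl_NL",
    "birthday": "01/01/1990",          # not in the public list, so dropped
    "relationship_status": "single",   # not in the public list, so dropped
}

print(public_subset(example_user))
```

The fields that are dropped in this sketch (date of birth, relationship status) are exactly the kind of profile information that has to be collected by other means.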

Considering the complete list of fields that Facebook provides, we identify the following potentially interesting characteristics that are not available through the API: date of birth; place of birth; sexual orientation; political view; relationship status; education; work experience; contact information; activities; interests; likes (i.e., Internet pages “liked” by a Facebook user; these could correspond to books, movies, or athletes, but also to friends' Facebook pages). This last field, likes, is of particular interest in this study as we want to see if fans can be clustered according to the preferences indicated in this field. As the API does not allow the retrieval of these data, we need to develop alternative methods. In the next section, we consider how such data can be obtained.

4. Facebook user profile data collection framework

To gather a Facebook user's profile information relevant to customer relationship management and/or marketing purposes, the information resources described in the previous section must be combined. Fig. 2 shows the user profile data collection framework. For convenience we introduce the term “fan” for users who “liked” a Facebook page. In fact, Facebook itself originally gave users the option to “become a fan of” other pages and later changed this into “like”. The data collection framework consists of three parts: 1) identifying the ‘fans’, 2) gathering the personal information, and 3) preparing and structuring the gathered data.

4.1. Identifying Facebook page fans

Administrators of a Facebook page can list fans of their page by accessing https://www.facebook.com/browse/?type=page_fans&page_id=1234567890, where 1234567890 should be replaced by the page ID of the page one is interested in (and is administrator of). A screen shot of this Web page is given in Fig. 3. When one clicks on the ‘See more’ button at the bottom of the page, more ‘fans’ are listed. However, after showing 500 ‘fans’, the button does not show up anymore. After exploring the HTTP requests resulting from clicking the ‘See more’ button, we conclude that:

• Facebook uses Asynchronous JavaScript and XML (AJAX) for its HTTP requests.

• Facebook uses two parameters (fb_dtsg and post_form_id) to prevent cross-site request forgery (CSRF) in its HTTP requests.

• Only authenticated Facebook administrators have access to the page (cookies are used for authentication).

• The response to the HTTP request is in JSON format and contains each fan's picture, name, URL, and ID.

With these observations in mind, a PHP script was written to store the name, URL and ID of each Facebook user in a (MySQL) database. The pseudo code for this script can be found as Algorithm 1 in Appendix A. Running this algorithm yields, after removing duplicate Facebook IDs, 10,000 unique ‘fans’. Facebook does not give information about how these ‘fans’ are selected. By changing the fb_dtsg and post_form_id parameters, a new set of 10,000 unique ‘fans’ is obtained. Between these sets there exists some overlap, but the greater part of the ‘fans’ is different. As the algorithm always results in 10,000 unique Facebook IDs, it will be difficult to obtain a list of all fans if the total number of fans exceeds 10,000. A sample, however, can be obtained without too much difficulty by using Algorithm 1 and varying the two parameters.
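The batching and de-duplication logic described above can be sketched in a few lines of Python. This is not the authors' Algorithm 1 (which is a PHP script given in Appendix A); it is a minimal illustration under stated assumptions. The function fetch_batch is a hypothetical stand-in for the authenticated request that returns one batch of fan records, and the record keys "id", "name" and "url" are assumed names. No real Facebook endpoint is called.

```python
from typing import Callable, Dict, Iterable, List

Fan = Dict[str, str]  # assumed keys: "id", "name", "url"


def collect_fans(fetch_batch: Callable[[int], List[Fan]],
                 max_batches: int = 1000) -> Dict[str, Fan]:
    """Request successive batches until one is empty; keep one record per ID."""
    fans: Dict[str, Fan] = {}
    for batch_index in range(max_batches):
        batch = fetch_batch(batch_index)
        if not batch:
            break
        for fan in batch:
            fans.setdefault(fan["id"], fan)  # the first record per ID wins
    return fans


def merge_runs(runs: Iterable[Dict[str, Fan]]) -> Dict[str, Fan]:
    """Combine several runs (e.g., with different request parameters)."""
    merged: Dict[str, Fan] = {}
    for run in runs:
        for fan_id, fan in run.items():
            merged.setdefault(fan_id, fan)
    return merged


def make_fake_fetcher(ids_per_batch):
    """Hypothetical fetcher used only to exercise the logic offline."""
    def fetch(batch_index):
        if batch_index >= len(ids_per_batch):
            return []
        return [{"id": i, "name": "user " + i, "url": "https://example.invalid/" + i}
                for i in ids_per_batch[batch_index]]
    return fetch


run_a = collect_fans(make_fake_fetcher([["1", "2"], ["2", "3"]]))
run_b = collect_fans(make_fake_fetcher([["3", "4"]]))
sample = merge_runs([run_a, run_b])
print(sorted(sample))
```

Keying the result on the Facebook ID is what makes overlapping runs safe to combine: a fan that appears in several sets is stored once.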

4.2. Gathering fans' public profile information

The second step in the data collection process is gathering information on the Facebook ‘fans’ identified in the previous step. This can be done by visiting these Facebook pages and storing the relevant individual-level data. Note that only public information can be obtained. The data we thus obtain only concern users that granted public access to their pages.

4.3. Data preparation and storage

The third step in our profile data collection framework concerns preparation and structuring of the gathered data. When one creates or updates his/her Facebook profile, personal information can (and in some cases, such as name and date of birth, must) be provided by completing several fields. There are text fields (e.g., name, language, interests), check boxes (e.g., sexual orientation), and lists (e.g., gender, relationship status). For check boxes and lists, the options which can be selected are limited. For text fields no such limitation exists and users can type in anything they like. We distinguish between two sets of variables: background characteristics and liking data.

Fig. 2. Facebook data collection framework.

4.3.1. Background characteristics

From individual Facebook pages we are able to obtain personal background information of the users. In particular, from the public profiles we can obtain the variables gender, date of birth, location, relationship status and the number of Facebook friends. Gender and date of birth are straightforward background variables. For the other variables, we briefly indicate how the data are available on the Facebook pages and how we process them for our analysis.

4.3.2. Location

A user's location is represented by a string with the name of the city or town someone lives in and/or comes from, together with a URL to the Facebook page of that location. Location may be useful when analyzing fans of a page, and we may be interested in more details about the location. In particular, for a geographical overview of the fanbase one needs to know the country, continent, and the coordinates (latitude, longitude) of a location. This can be achieved by using the GeoNames geographical database [36]. The GeoNames API has a fuzzy search engine which accepts all kinds of input. For example, the engine accepts both ‘Rio de Janeiro’ and ‘Rio Janeiro’ as search terms for the large Brazilian city. As output, GeoNames yields various details such as city name, country name, latitude, longitude, number of inhabitants, etc.
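As an illustration of this enrichment step, the Python sketch below builds a request URL for the GeoNames search web service and extracts city, country and coordinates from a JSON response. It is a sketch under assumptions rather than the procedure used in the paper: the endpoint and the parameters q, maxRows, fuzzy and username follow our reading of the public GeoNames documentation, the response keys (geonames, name, countryName, lat, lng) are assumed from the same source, and the sample response is hand-written with approximate coordinates. No network request is made.

```python
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_url(place, username, fuzzy=0.8):
    """Build a search URL; 'username' is the caller's GeoNames account name."""
    params = {"q": place, "maxRows": 1, "fuzzy": fuzzy, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


def first_match(response):
    """Return city, country and coordinates of the top hit, or None."""
    hits = response.get("geonames", [])
    if not hits:
        return None
    top = hits[0]
    return {
        "city": top.get("name"),
        "country": top.get("countryName"),
        "lat": float(top["lat"]),
        "lng": float(top["lng"]),
    }


# Hand-written response with approximate values, for illustration only.
sample_response = {
    "geonames": [
        {"name": "Rio de Janeiro", "countryName": "Brazil",
         "lat": "-22.90642", "lng": "-43.18223"}
    ]
}

print(build_search_url("Rio Janeiro", "your_username"))
print(first_match(sample_response))
```

Returning None for an empty result keeps unresolved locations explicit, so that they can be handled separately (for instance via the language setting discussed next).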

For ‘fans’ who don't publicly specify their location we cannot discover their latitude and longitude. However, through Facebook's API it is possible to gather the user's Facebook language setting. Assuming that the language corresponds to the user's location, one could use the language to determine, at country level, the user's location. That is, a Facebook user who is using Facebook in Japanese is assumed to come from Japan. Although we believe that this assumption is not very unrealistic, there are cases in which a language cannot be linked to one specific country. For example, we cannot infer a country from a language setting such as “Arabic”, “English”, “French”, “German” or “Spanish”.
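The language-based fallback amounts to a lookup with an explicit list of languages that are too widespread to map to one country. A minimal Python sketch follows; the ambiguous languages and the Japanese example come from the text, while the function name and the idea of returning None for unresolved cases are our own illustrative choices. Any further entries in the lookup table would have to be supplied by the analyst.

```python
# Languages the text names as not linkable to a single country.
AMBIGUOUS_LANGUAGES = {"Arabic", "English", "French", "German", "Spanish"}

# Only the example from the text is filled in; extend as needed.
LANGUAGE_TO_COUNTRY = {"Japanese": "Japan"}


def country_from_language(language):
    """Infer a country from a language setting, or None if not possible."""
    if language in AMBIGUOUS_LANGUAGES:
        return None
    return LANGUAGE_TO_COUNTRY.get(language)


for language in ("Japanese", "English", "Klingon"):
    print(language, "->", country_from_language(language))
```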

4.3.3. Relationship status

A Facebook user can specify their relationship status by selecting one of the following options: single, in a relationship, engaged, married, it's complicated, widowed, separated or divorced. However, in our data set we also found values such as ‘in a complicated relationship’, ‘in an open relationship’, or ‘civil partnership’. This is probably a result of the fact that Facebook changed the possible values for the field ‘relationship status’ over the years. At the time of our data gathering process (2011), it was not possible to choose values other than the eight listed above. Therefore, we convert the values ‘in a complicated relationship’ and ‘in an open relationship’ into ‘in a relationship’, and ‘civil partnership’ becomes ‘married’.
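The recoding rule just described is a small lookup table. The Python sketch below applies it; the eight canonical values and the three conversions are those given in the text, whereas the lower-casing, whitespace trimming and the choice to return None for unknown values are assumptions added for illustration.

```python
CANONICAL_STATUSES = {
    "single", "in a relationship", "engaged", "married",
    "it's complicated", "widowed", "separated", "divorced",
}

# Legacy values found in the data and their replacements, as in the text.
RECODE = {
    "in a complicated relationship": "in a relationship",
    "in an open relationship": "in a relationship",
    "civil partnership": "married",
}


def normalize_status(raw):
    """Map a raw relationship status to one of the eight canonical values."""
    if raw is None:
        return None
    value = raw.strip().lower()
    value = RECODE.get(value, value)
    return value if value in CANONICAL_STATUSES else None


print(normalize_status("In an open relationship"))
print(normalize_status("Civil partnership"))
print(normalize_status("Married"))
```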

Fig. 3. Facebook administrators: screen shot of Web page which lists people who ‘like’ a Facebook page.

4.3.4. Number of Facebook friends

The number of Facebook friends can serve as a proxy for Facebook

activity or popularity of a user.

4.3.5. Liking data

In addition to the background information, which, with the exception of the number of Facebook friends, is user supplied, we are able to find for each user which other Facebook pages are “liked”. Based on this information we would like to see if clusters/segments of users can be identified. That is, is it possible to distinguish groups of Facebook users with similar “liking” patterns? Similar patterns could indicate similar preferences, and companies may be able to employ segment-specific marketing strategies. For example, if a segment of users tends to like certain artists more often than any other segment, promotions involving such an artist could be specifically targeted at that segment alone. In the next section, we introduce methodology to find and interpret segments based solely on the liking profiles of users.

5. Application: clustering Facebook fans

In the previous section we described in some detail how a Facebook page owner/administrator can obtain data on the page's fans. A customer relationship manager or a marketing manager would like to make these data operational by, for example, investigating whether fans can be segmented according to their indicated preferences and/or background characteristics. That is, is it possible to identify groups of fans on the basis of individual-specific like data? For example, are certain brands or celebrities notably more popular in subgroups? Such information could be useful as it allows better targeting of marketing strategies.

A large, internationally successful football (soccer) club granted us administrator rights to its official Facebook site. This enabled us to extract the data using the framework introduced in Section 4. For strategic purposes, the football club requested that its name, and any information that could possibly lead readers to infer the name, be kept from the public. Consequently, in our data analysis only a selection of general, not football-related, labels is used.

At the time of the data extraction, February 2011, the total number of Facebook fans of the club was about 4 million. From these, we extracted data on over 40,000 fans. To check the representativeness of this sample, we compared the gender and age distributions of our sample to those of the population as obtained through Facebook Insights. The results, presented in Table 1, show only small differences, indicating that our data set is representative with respect to gender and age distribution. Furthermore, we see that the Facebook fans of the football club are, not surprisingly, predominantly young males.
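One simple way to quantify such a comparison is the largest absolute difference between sample shares and population shares across categories. The Python sketch below illustrates this. The measure itself is our choice and is not claimed to be the one used for Table 1, and the counts and shares are invented numbers that do not correspond to the paper's data.

```python
def shares(counts):
    """Convert category counts into proportions that sum to one."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def max_abs_difference(sample_counts, population_shares):
    """Largest absolute gap between sample and population shares."""
    sample_shares = shares(sample_counts)
    categories = set(sample_shares) | set(population_shares)
    return max(abs(sample_shares.get(c, 0.0) - population_shares.get(c, 0.0))
               for c in categories)


# Invented numbers for illustration; these are NOT the values of Table 1.
sample_counts = {"male": 800, "female": 200}
population_shares = {"male": 0.78, "female": 0.22}

gap = max_abs_difference(sample_counts, population_shares)
print(round(gap, 3))
```

The same function can be applied to age brackets; a small gap on both variables supports treating the sample as representative on those dimensions only.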

As explained in the previous section, Facebook users' data concerning their location can be enriched with geographical identifiers such as latitude and longitude. In Fig. 4, the resulting concentrations of fans are visualized in a heatmap created using the Google Maps API [13]. The figure shows a high concentration of Facebook ‘fans’ in Europe, India, Nigeria, South-east Asia, and Central America. Large parts of Africa and Australia have a low ‘fan’ concentration and in China there are hardly any ‘fans’ visible. For China and Africa, this is a result of the fact that Facebook's penetration there is relatively low compared to that in other regions [2]. Australia, in contrast, has a relatively high general Facebook penetration (46%) [2], yet according to Fig. 4 there are hardly any ‘fans’ of “our” football club in Australia. Apparently, this football club is not popular in Australia on Facebook.

Recall that our data collection framework only allows for the retrieval

of publicly published fields. Table 2 shows a breakdown of users'

background data available in our initial sample. We see that except for

gender and language settings, the percentage of users providing the

profile information varies and is generally limited. We therefore exclude

these variables when attempting to identify subgroups. Instead, we

focus on the “like” data. Our goal is to find clusters of Facebook fans

based on their liking data.

Facebook users can specify what/who they like on their proﬁle page.

For example, not only famous movie stars, movies, sports, athletes, tv-

programs, and actors but also brands, restaurants or personal friends

may be liked. For our initial sample of 43,861 fans, we found that

176,381 unique Facebook pages were “liked”. However, of these

176,381 pages, 77.5% were liked by only one user in our sample. Often,

these pages are simply personal friends' Facebook pages. For our pur-

poses, such pages are not interesting and a selection must be made.

We consider only the top 150 Facebook pages in terms of “likes” in

our sample. Selecting data corresponding to these 150 most popular

pages reduces our sample to 16,170 cases. However, the distribution

of the number of likes in this sample of 16,170 users is rather skewed

as many people have only few likes and only a few have many likes.

To allow for discrimination on the basis of the liking proﬁles, we only

consider users that liked at least 5 other pages. The resulting data set

consists of 11,712 individuals. Constructing a data matrix with individ-

uals as rows and the top 150 liked pages as columns, yields a large

matrix with few observations. The sparsity of the data (i.e., the many zeros

indicating that a page is not “liked”) and the dimensionality of the data

set pose a serious problem for “normal” cluster analysis methods. We

therefore analyze the large data matrix by using a joint dimension

reduction and clustering approach.
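The selection steps described above (keep the most-liked pages, drop users with too few likes among them, build a binary matrix) can be sketched as follows. This is a minimal illustration, not the code used for the paper: the function name `build_like_matrix`, the input layout (a dictionary from user id to liked page names) and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def build_like_matrix(likes_by_user, top_n=150, min_likes=5):
    """Return (users, pages, rows) for a binary user-by-page like matrix."""
    # Count, per page, how many users in the sample liked it.
    page_counts = Counter(p for pages in likes_by_user.values() for p in set(pages))
    # Keep the top_n most-liked pages (ties broken by page name for determinism).
    ranked = sorted(page_counts, key=lambda p: (-page_counts[p], p))
    top_pages = ranked[:top_n]
    top_set = set(top_pages)
    users, rows = [], []
    for user in sorted(likes_by_user):
        liked = set(likes_by_user[user]) & top_set
        # Drop users with too few likes among the retained pages.
        if len(liked) >= min_likes:
            users.append(user)
            rows.append([1 if p in liked else 0 for p in top_pages])
    return users, top_pages, rows


# Toy data: four users, four pages; keep the 2 most-liked pages and users with >= 2 likes.
toy = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["a", "z"], "u4": ["c"]}
users, pages, rows = build_like_matrix(toy, top_n=2, min_likes=2)
print(users, pages, rows)
```

With the settings reported in the text (`top_n=150`, `min_likes=5`), the same two filters would be applied to the full sample.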

5.1. MCA K-means

There exist several methods for clustering high-dimensional data.

One popular approach is to use a two-step procedure. In the ﬁrst step,

a dimension reduction technique is used to reduce the dimensionality

of the data. In the second step, cluster analysis is applied to the data in

the reduced space. This method may be referred to as the tandem approach

[1]. As shown by [35], an important drawback of this method is

that the dimension reduction may distort or hide the cluster structure.

To overcome this problem, several methods have been proposed

[20,21,34]; here we apply the joint MCA and K-means method proposed

by [20].

MCA, also known as homogeneity analysis [12], yields optimal scal-

ing values for the columns (i.e., quantiﬁcations for pages) in such a way

that pages differently assessed by the individuals receive dissimilar

scale values. Furthermore, rows (i.e., individuals) exhibiting dissimilar

patterns of liked pages, receive dissimilar scale values. K-means clustering

[26] finds clusters by minimizing the sum of squared deviations between

the individual observations and their cluster means. [20]

proposed a joint method, from here on referred to as MCA–Kmeans,

that averages the MCA and K-means objective functions. An important

advantage of the MCA–Kmeans approach is that it enables visualization

of the data. A more formal formulation as well as an efﬁcient algorithm

useful for dealing with large data matrices is given in Appendix B.
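To make the K-means criterion concrete, the following sketch evaluates the sum of squared deviations between observations and their cluster means for two candidate partitions of a toy data set. The function name `kmeans_objective` and the toy points are illustrative assumptions; the sketch covers only the K-means part, not the joint MCA–Kmeans objective.

```python
import numpy as np


def kmeans_objective(points, labels):
    """Sum of squared deviations between points and their cluster means."""
    total = 0.0
    for c in np.unique(labels):
        members = points[labels == c]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([0, 0, 1, 1])  # groups nearby points together
bad = np.array([0, 1, 0, 1])   # mixes far-apart points

print(kmeans_objective(points, good), kmeans_objective(points, bad))
```

K-means searches for the partition with the smallest value of this criterion; here the partition that groups nearby points is clearly preferred.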

5.2. Analysis

We apply MCA–Kmeans to the 11,712 observations with the 150 bi-

nary variables indicating whether a page was or was not liked by an in-

dividual. To decide upon the number of dimensions and clusters we

inspect the changes in ﬁt when more dimensions/clusters are added.

Table 1
Age and gender distributions: Insights' data versus our sample.

                     Overall  13–24  25–34  35–44  45–54  55+
Male population      0.82     0.77   0.17   0.03   0.01   0.02
Male sample          0.79     0.74   0.22   0.01   0.01   0.02
Female population    0.14     0.77   0.15   0.04   0.02   0.02
Female sample        0.19     0.78   0.17   0.03   0.01   0.01

In particular, for the dimensionality, we consider the adjusted explained

inertia of the MCA solution, as defined by [14]. The adjusted explained

inertia takes into account the rather speciﬁc structure of the data ap-

proximated in MCA. In particular, it corrects for the underestimation

of typical correspondence analysis ﬁt measures when applied to a

(super)indicator matrix. Fig. 5 gives the cumulative explained inertia

for the MCA solutions with different dimensionality. Although the effect

is small, we can see that after three dimensions the effect of adding

more dimensions decreases. We therefore consider only three dimen-

sions in our analysis. An additional beneﬁt of this choice is that it allows

for graphical representations.

To select the number of clusters we consider the value of the objec-

tive function, using a three dimensional solution, with different num-

bers of clusters. In Fig. 6 the ﬁnal objective function values are plotted

against the number of clusters. The decrease in objective function

value after four clusters is small and we therefore consider three dimen-

sional solutions with four clusters.
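The selection rule used here is a visual "elbow" judgment. One way to mimic it in code, purely as an illustration, is to pick the smallest setting after which one more cluster (or dimension) lowers the objective by less than a chosen tolerance. The function name `pick_by_elbow`, the tolerance and the objective values below are hypothetical; the values are not read from Fig. 5 or Fig. 6.

```python
def pick_by_elbow(values, tol):
    """Smallest setting after which adding one more lowers the objective by < tol."""
    keys = sorted(values)
    for current, nxt in zip(keys, keys[1:]):
        if values[current] - values[nxt] < tol:
            return current
    return keys[-1]


# Hypothetical objective values for K = 2..5 (made up for this example).
toy_values = {2: 10.0, 3: 8.0, 4: 6.5, 5: 6.4}
chosen = pick_by_elbow(toy_values, tol=0.5)
print(chosen)
```

The tolerance plays the role of the analyst's judgment of what counts as a "small" decrease, so the outcome should be read together with the plot rather than instead of it.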

5.3. Results

In Fig. 7, the solution using the ﬁrst two dimensions of the MCA–

Kmeans solution is given. Cluster memberships are indicated by using

different colors and symbols. We see that in the ﬁrst two dimensions

three clusters appear separated from each other whereas the fourth

cluster, situated around the origin, appears to overlap all three other

clusters. As can be veriﬁed from Fig. 8, this fourth cluster separates itself

from the other three clusters in the third dimension. Note that, in MCA,

the origin corresponds to the average profile, that is, the average distribution

of likes over Facebook pages. The fact that many attribute points

are situated close to the origin, is partly due to the relatively large

number of not liked pages. That is, most users did not “like” more than

8 out of the 150 pages. Hence, each row of the data matrix contains

many zeros for the “like” columns and, consequently, many ones for

the corresponding “not liked” columns. This causes the attribute points

corresponding to the “not liked” pages to dominate the mean profile

and draws the corresponding points to the origin.
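The dominance of the “not liked” categories is easy to see once the binary like matrix is expanded into an indicator matrix with one “liked” and one “not liked” column per page. The sketch below is an illustration on made-up data; the function name `indicator_matrix` and the column ordering are assumptions of the example.

```python
import numpy as np


def indicator_matrix(likes):
    """Expand an n x m 0/1 like matrix into n x 2m 'liked'/'not liked' columns."""
    likes = np.asarray(likes)
    # Interleave so columns read: page 1 liked, page 1 not liked, page 2 liked, ...
    return np.stack([likes, 1 - likes], axis=2).reshape(likes.shape[0], -1)


likes = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 0]])
Z = indicator_matrix(likes)
# Column masses: share of all ones that fall in each indicator column.
masses = Z.sum(axis=0) / Z.sum()
not_liked_share = masses[1::2].sum()
print(Z.shape, round(float(not_liked_share), 3))
```

In this toy example ten of the twelve ones sit in “not liked” columns, so these columns carry most of the mass that defines the average profile.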

The spread and sizes of the clusters are nicely displayed in Figs. 7 and

8. However, the attributes (i.e. the Facebook pages) are not labeled to

avoid further cluttering. Consequently, interpretation of the clusters in

terms of the liked/not liked pages is not possible from these ﬁgures.

For a better interpretation of the cluster with respect to the pages,

Figs. 9 and 10 give joint plots of the cluster centers and the attributes.

Points close to the origin have not been labeled to avoid clutter and,

due to the confidentiality agreement, we have relabeled pages corresponding

to famous football players as FFP and pages corresponding to

football clubs as FC.

Fig. 4. Where do the football club's Facebook fans come from?

Table 2
Overview of the available profile information for the 43,861 users in our sample.

Profile field          % FB users in our sample
Gender                 99.6
Date of birth          2.5
Relationship status    21.5
Sexual orientation     27.8
Location               54.2
Hometown               29.8
FB language setting    97.4
# FB friends           48.6
Education              37.1
Work experience        22.3

Fig. 5. Explained adjusted inertia as a function of number of MCA dimensions (horizontal axis: number of dimensions, 1–10; vertical axis: explained inertia, in %).


Looking at the positions of the attributes and the clusters, we see

that in cluster 1 (3549 observations) there appears to be a link with

Latin America. That is, the Facebook page fútbol and pages correspond-

ing to entertainers particularly popular in Latin America (e.g., Daddy

Yankee, Wisin & Yandel) are relatively often liked. Also, the football

club near cluster 1 is in fact the only South American club present in

our data set. In cluster 2 (1734 observations) we ﬁnd relatively many

likes to Southeast Asia related stars and topics (e.g. SCTV and RCTI are

Indonesian television stations, Upin and Ipin is a Malaysian television

series, and Timnas Indonesia is the Facebook page of the Indonesian na-

tional football team). The three pages farthest removed from the origin

and relatively often associated with individuals in cluster 3 (4048 obser-

vations) are cricket and India related (e.g., Sachin Tendulkar is a famous

Indian cricket batsman). Other pages that are relatively often associated

with this cluster are chess, traveling and sleeping. Note that the cluster

mean for this cluster is not far from that of cluster 4 (2381 observations)

so we should be careful in interpreting the pages close to both cluster

centers on the basis of the ﬁrst two dimensions. Instead, plotting the

second and third dimension clariﬁes some differences as the clusters

separate along the third dimension. Fig. 10 gives the corresponding

plot, where again, to avoid clutter, we removed some labels and use

the general labels for players and clubs. Individuals in the fourth

cluster relatively often like pages corresponding to American enter-

tainers (e.g. Vin Diesel, Selena Gomez, John Cena, Megan Fox). Also,

Disney, Jackie Chan, Mafia Wars (a multiplayer social network

game) and Facebook are liked more often than average in this

cluster.

It is important to note that the four clusters are characterized by

liked Facebook pages that predominantly are not immediately foot-

ball related. In fact, the clutter of football related pages close to the

origin indicates that in all clusters, these pages are liked as well.

This is not surprising as all individuals in our sample ‘liked’ the football

club which granted us administrator rights, thus asserting their

interest in football. However, as the clusters differentiate themselves

through non-football related pages, opportunities arise for cluster

speciﬁc marketing efforts.

The MCA–Kmeans approach emphasizes relative rather than abso-

lute differences. This means that if we look at the distribution of likes

in each cluster, the attributes closest to the cluster means in the plot,

need not be the most often observed in the cluster. In fact, as indicated

before, for all clusters, the most often liked pages are predominantly

football related pages. Table 3 lists the 10 most often liked pages in

the four clusters. To distinguish between different football clubs and

players we numbered them. Note that differences among the most

popular Facebook pages are limited. The order of the clubs and

players varies, but these are small differences that are of no practical

signiﬁcance.

The MCA–Kmeans results suggest that the clustering may be

linked to geographical factors. To further study this we consider

the Facebook data concerning the locations of the individuals. How-

ever, if individuals chose not to publish their locations, we cannot

determine the country of origin. The language settings could be

used to ﬁnd plausible country or regions for these data. On the other

hand, the fact that the information is missing may also be informative

in itself and we choose to leave the missing locations as they are.

Table 4 gives, for each cluster, the 10 most frequently occurring countries

and the corresponding percentages of occurrences per cluster. We see

that, as conjectured earlier, the ﬁrst cluster has a clear Latin American

Fig. 6. Value of MCA–Kmeans objective for 3 dimensional solutions for different numbers of clusters (horizontal axis: number of clusters K, 2–5; vertical axis: value of the objective function).

Fig. 7. MCA–Kmeans solution with attributes and subjects (axes: Dimension 1 and Dimension 2; legend: Clusters 1–4 and liked pages).

component. Cluster 2 is heavily dominated by Facebook users from

Indonesia. Note that this is the only cluster in which “unknown” is not

the most frequently observed country. Also, with over 62% users from

Indonesia, it is by far the most homogeneous cluster concerning nationalities.

Facebook users from India are overrepresented in the third

cluster. For the fourth cluster there does not appear to be a strong geo-

graphical link.
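A tabulation of the kind reported in Table 4 can be sketched as follows. The function name `country_shares`, the record layout (pairs of cluster number and country, with `None` for an unpublished location) and the toy records are assumptions made for this illustration; missing locations are kept as their own “Unknown” category, as in the text.

```python
from collections import Counter, defaultdict


def country_shares(records, top=10):
    """Per cluster, the most frequent countries with their percentage of members."""
    by_cluster = defaultdict(Counter)
    for cluster, country in records:
        # Missing locations are kept as their own category.
        by_cluster[cluster][country if country else "Unknown"] += 1
    result = {}
    for cluster, counts in by_cluster.items():
        size = sum(counts.values())
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
        result[cluster] = [(name, round(100.0 * n / size, 2)) for name, n in ranked]
    return result


toy_records = [(2, "Indonesia"), (2, "Indonesia"), (2, None), (2, "Malaysia"),
               (3, "India"), (3, None)]
shares = country_shares(toy_records)
print(shares)
```

Because percentages are computed within each cluster, clusters of very different sizes can still be compared directly.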

6. Conclusions

In a relatively short time, social network sites have become an im-

portant part of daily life for millions of people. Consequently, such

sites are considered to be an important marketing tool. Interviews

with marketing and customer relationship managers reveal that a

clear strategy regarding the social network sites often does not exist

Fig. 8. MCA–Kmeans solution with attributes and subjects (axes: Dimension 2 and Dimension 3; legend: Clusters 1–4 and liked pages).

Fig. 9. MCA–Kmeans plot with cluster means and liked Facebook pages (axes: Dimension 1 and Dimension 2). FC labels denote football clubs, FFP indicates famous players.

and managers are often unable to use the social network data in their

customer relationship management strategy.

In this paper, we formulated a data collection framework for re-

trieving online proﬁle data from Facebook users. In particular, we

showed how a Facebook page owner, that is, a person or company

with administrator rights to the Facebook page, can ﬁnd other

Facebook users that indicated liking their page. Then, by visiting

the pages of such users, individual level data can be collected.

We applied the data collection framework to obtain a sample of

Facebook users who indicated “liking” a large international football

club. Then, using a joint dimension reduction and clustering approach,

clusters could be identiﬁed on the basis of the users' liking patterns.

Four clusters were obtained that differed with respect to the liking pat-

terns. Moreover, the visualizations immediately exposed how the clus-

ters differentiated themselves. In particular, differences in relative

popularity of non football related Facebook pages characterize the dif-

ferent clusters. Furthermore, the clusters appear to be separated along

geographical lines. That is, although no geographical data were used,

the clusters differentiated themselves along Facebook pages of locally

popular music/tv/sport stars. The popularity of certain pages in only

certain (or one) clusters, could be used to formulate better targeted,

differentiated, marketing strategies.

6.1. Implications for research

In the CRM literature [33,27,38] customer identiﬁcation is considered

as the ﬁrst step in a CRM cycle. Typically, the identiﬁcation concerns di-

rectly observable customer generated content (e.g., transaction data).

The identiﬁcation of potential rather than actual customers as imple-

mented in the data collection framework presented in this paper, offers

several new research opportunities. It would, for example, be interesting

to study the added value and incorporation of the proposed framework

into existing CRM systems. Merging the online proﬁle data from the (po-

tential) customers as obtained from social media, with actual transaction

data, offers other research opportunities. Moreover, tracking the proﬁles

over time allows researchers to study effects of targeted marketing

efforts in a structural fashion.

The data collection framework presented in this paper was designed

speciﬁcally for Facebook. However, Facebook is not the only social

media platform on which individuals provide information about their

preferences and personal backgrounds. Similar ideas and methods can

perhaps be used to obtain publicly available data from other social

media platforms (e.g., Twitter, Instagram, Google Plus, LinkedIn). It

may in fact depend on the ﬁrm and its product which social media

outlet is the most interesting.

6.2. Implications for practice

Despite the often acknowledged potential of social network data,

most Facebook related marketing research relies on (online) question-

naires and/or focus groups rather than directly exploiting social network

data. One reason for this situation concerns the limited possibilities for

Fig. 10. MCA–Kmeans plot with cluster means and attributes (liked Facebook pages); axes: Dimension 2 and Dimension 3. FC labels correspond to clubs, FFP labels correspond to famous football players.

Table 3
Top 10 pages per cluster.

Cluster 1       Cluster 2           Cluster 3       Cluster 4
Club 1          Club 5              Football        Club 3
Club 2          Player 2            Club 1          Club 2
Player 1        Club 2              Club 2          Player 2
Player 2        Club 1              Club 3          Club 1
Club 3          Timnas Indonesia    Player 2        Club 4
Fútbol          Harry Potter        Player 1        Harry Potter
Club 4          Player 3            Harry Potter    And 1 more
South Park      Club 4              Club 4          Player 5
The Simpsons    Player 1            Player 6        Player 1
Club 5          Player 4            AKON            Club 5


directly retrieving data from social network sites. In this paper, we for-

mulated a data collection framework for retrieving online proﬁle data

from Facebook users. We showed how a Facebook page owner, that is

a person or company with administrator rights to the Facebook page,

can ﬁnd other Facebook users that indicated liking their page. Then, by

visiting the pages of such users, individual level data can be collected.

The proposed data collection framework has direct potential for

marketing managers as it makes it possible to investigate whether dis-

tinct clusters requiring distinct marketing efforts can be identiﬁed

among potential customers (that is, users that already showed some

form of affiliation to the company by “liking” it on Facebook). Hence,

the general framework presented in this paper can be used to improve

and enhance implementation of the identiﬁcation phase in a ﬁrm's

CRM process. In particular, by focusing on potential rather than existing

customers, information becomes available that can be used to improve

marketing efforts aimed speciﬁcally at acquisition of new customers.

Ideally, a system should be implemented that merges the online proﬁle

data with other available proﬁle data (e.g., proﬁles based on transaction

data).

6.3. Limitations and future research directions

The proposed data collection framework only allows for the retrieval

of data from users with public proﬁles. Moreover, from the sample of

users with a public profile, we selected users that “liked” at least five

of the most popular 150 Facebook pages. The sample analyzed in this

paper therefore does not necessarily represent the population of

Facebook users who “liked” the football club. Instead, the sample only

represents active mainstream users.

Another important issue, inherent to some extent to Facebook and

other “new” media, concerns the rapid developments that may overtake

current research. In our case, between collecting the data (February 2011)

and finalizing this paper, the number of users who “liked” the Facebook

page increased from around 4 million to 21 million. More importantly,

however, given the steady increase of Facebook users, it may very well

be the case that current users differ from previous users in their usage

of Facebook. As we only received administrator rights for a short period

of time, we did not study such changes. However, the data collection

framework makes it possible to easily track such changes and act

upon them.

In this paper, we considered a clustering analysis based solely on lik-

ing patterns of Facebook users. Although such indicated liking patterns

require very little effort from the users, they are considered as so-called

reactive data. It would be interesting to see whether the reactive data

can be augmented by non-reactive data. For example, considering net-

work data (i.e. by incorporating data concerning the connections be-

tween users), and/or by other data available on users' Facebook pages

(e.g., posted messages/links/pictures, etc.). Augmenting the data in

such a fashion, may yield even richer and more challenging data sets.

Finally, it should be noted that other social network related applica-

tions can also beneﬁt from our data collection framework. For example,

recently, [11] considered targeting strategies directed towards individ-

uals in a social network using data obtained directly from a large social

network site. Their analysis could be extended by using Facebook data

obtained after application of our methodology.

Appendix A. Identifying Facebook page fans

Appendix B. MCA–K-means

In MCA–Kmeans, the objective is to minimize a weighted average of

the MCA objective and a K-means objective. The resulting objective

function of MCA–Kmeans can be expressed as:

$$
\min_{Y,B,C,G} \phi(Y,B,C,G) = \alpha_1\,\text{MCA} + (1-\alpha_1)\,K\text{-means}
= \alpha_1 \sum_{j} \lVert Y - Z_j B_j \rVert^2 + (1-\alpha_1) \lVert Y - CG \rVert^2
\quad \text{s.t. } Y^T Y = I_k,
$$

where $Y$ denotes the $n \times k$ group configuration, $Z_j$ is the $n \times p_j$

(observed) indicator matrix for the $j$th variable, $B_j$ is a matrix of category

quantifications (attribute weights), $C$ denotes the $n \times K$ cluster membership

matrix and $G$ gives the $K \times k$ matrix of cluster means. The number

of clusters ($K$) and the dimensionality ($k$) need to be selected by the

user. The $\alpha_1$ coefficient, which lies between zero and one, allows us to

control for the importance of the dimension reduction part versus the

clustering part. In our application we fix $\alpha_1$ to 0.5 so that both parts are

equally important. An alternating least-squares algorithm can be used

Table 4
Top 10 countries per cluster (with cluster sizes) and clusterwise relative frequencies (in %) per country.

Cluster 1 (3549)     Cluster 2 (1734)     Cluster 3 (4048)     Cluster 4 (2381)     All (11,712)
Unknown    21.67     Indonesia  62.11     Unknown    16.67     Unknown    21.50     Unknown    18.53
Mexico      9.30     Unknown    12.34     India      16.30     Indonesia  11.68     Indonesia  15.45
USA         6.03     Malaysia    5.94     Indonesia   8.37     Malaysia    6.34     India       7.03
Colombia    5.04     UK          2.48     Malaysia    5.09     India       5.12     Malaysia    4.46
Brazil      3.38     USA         2.13     Nigeria     4.47     UK          4.49     USA         3.94
UK          3.35     Thailand    1.04     UK          4.47     Egypt       3.02     UK          3.84
Indonesia   3.27     Turkey      0.98     USA         3.53     USA         2.86     Mexico      3.63
Argentina   3.07     Spain       0.92     Egypt       2.67     Mexico      2.48     Nigeria     2.23
Chile       2.70     Brazil      0.75     Brazil      2.03     Nigeria     2.10     Brazil      2.09
France      2.51     Mexico      0.58     Iran        1.90     Algeria     1.68     Egypt       2.00
Total      60.33     Total      89.27     Total      65.51     Total      61.28     Total      63.20


to solve the minimization problem. For fixed $Y$, the category quantifications

become:

$$ B_j = \left( Z_j^T Z_j \right)^{-1} Z_j^T Y, $$

and, similarly, for fixed $Y$ and $C$, the cluster group means can be

calculated as:

$$ G = \left( C^T C \right)^{-1} C^T Y. $$

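Furthermore, [20] shows that, for fixed $C$, the configuration matrix $Y$

can be obtained using the eigenequation

$$ \left( \alpha_1 \sum_{j} Z_j \left( Z_j^T Z_j \right)^{-1} Z_j^T + \alpha_2\, C \left( C^T C \right)^{-1} C^T \right) Y = Y \Lambda, \tag{1} $$

where $\alpha_2 = 1 - \alpha_1$.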
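A short derivation may help to see where this eigenequation comes from. It is a sketch under the notation of the objective function above, with two shorthands introduced only here: the projectors $P_j = Z_j (Z_j^T Z_j)^{-1} Z_j^T$ and $P_C = C (C^T C)^{-1} C^T$. Substituting the optimal $B_j$ and $G$ for fixed $Y$ and $C$ and using $Y^T Y = I_k$ gives:

```latex
\begin{align*}
\lVert Y - Z_j B_j \rVert^2 &= \lVert (I - P_j) Y \rVert^2
  = \operatorname{tr}(Y^T Y) - \operatorname{tr}(Y^T P_j Y)
  = k - \operatorname{tr}(Y^T P_j Y), \\
\lVert Y - C G \rVert^2 &= k - \operatorname{tr}(Y^T P_C Y), \\
\phi &= \text{const} - \operatorname{tr}\!\Big( Y^T \Big[ \alpha_1 \sum_{j} P_j + (1 - \alpha_1) P_C \Big] Y \Big).
\end{align*}
```

Minimizing $\phi$ over $Y$ with $Y^T Y = I_k$ therefore amounts to maximizing the trace term, and a trace of this form is maximized by the eigenvectors belonging to the $k$ largest eigenvalues of the matrix in square brackets.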

By considering the eigenvectors (i.e., the columns of $Y$) corresponding

to the $k$ largest eigenvalues, the optimal group configuration, for

fixed $C$, is obtained. After updating $Y$ in this fashion, the cluster membership

matrix $C$ is obtained by considering distances of the $k$-dimensional

points in $Y$ to cluster means in $G$ and by subsequently assigning observations

to the closest cluster.

Starting with some initial values for $C$ and $Y$ (e.g., random cluster

memberships and $Y$ the configuration obtained after applying MCA)

the approximations are sequentially updated leading the objective to

decrease monotonically. If the decrease is below a certain threshold,

the algorithm terminates and a solution is obtained. To reduce the

chances of obtaining a local minimum, several random starts should

be applied.
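The alternating scheme can be sketched in a few lines for a small binary like matrix. The code below is an illustration under several assumptions, not the authors' implementation: it forms the n × n matrix explicitly (feasible only for toy data), takes the starting memberships as an argument instead of drawing random starts, removes the trivial solution by centering, and uses pseudo-inverses so that an empty category or cluster does not cause a failure. The names `projector` and `mca_kmeans_sketch` are introduced for this example.

```python
import numpy as np


def projector(A):
    """Orthogonal projector onto the column space of A (pinv tolerates empty columns)."""
    return A @ np.linalg.pinv(A)


def mca_kmeans_sketch(likes, init_labels, n_clusters, n_dims, alpha=0.5,
                      max_iter=50, tol=1e-8):
    """Toy alternating least-squares loop for a small n x m 0/1 like matrix."""
    likes = np.asarray(likes)
    labels = np.asarray(init_labels)
    n, m = likes.shape
    center = np.eye(n) - np.ones((n, n)) / n  # projects out the trivial constant solution
    # One n x 2 indicator matrix Z_j ("liked", "not liked") per page, and its projector.
    projs = [projector(np.column_stack([likes[:, j], 1 - likes[:, j]])) for j in range(m)]
    mca_sum = sum(projs)
    previous = np.inf
    for _ in range(max_iter):
        C = np.eye(n_clusters)[labels]                    # n x K membership matrix
        M = center @ (alpha * mca_sum + (1 - alpha) * projector(C)) @ center
        _, vecs = np.linalg.eigh((M + M.T) / 2)           # eigenvalues in ascending order
        Y = vecs[:, ::-1][:, :n_dims]                     # eigenvectors of the k largest
        G = np.linalg.pinv(C) @ Y                         # cluster means for the old C
        dist = ((Y[:, None, :] - G[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)                      # assign to closest cluster mean
        C = np.eye(n_clusters)[labels]
        G = np.linalg.pinv(C) @ Y                         # means for the updated C
        mca_part = sum(((Y - P @ Y) ** 2).sum() for P in projs)
        objective = alpha * mca_part + (1 - alpha) * ((Y - C @ G) ** 2).sum()
        if previous - objective < tol:
            break
        previous = objective
    return Y, labels, float(objective)


# Toy data: two groups of users with disjoint liking patterns.
toy_likes = np.array([[1, 1, 1, 0, 0, 0]] * 5 + [[0, 0, 0, 1, 1, 1]] * 5)
start = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]  # deliberately imperfect starting memberships
Y, labels, objective = mca_kmeans_sketch(toy_likes, start, n_clusters=2, n_dims=1)
print(labels, objective)
```

On this toy example the two misassigned users are moved to the correct group after the first update. In practice the loop would be run from several random starts, as recommended above, and the efficient formulation described next would replace the explicit n × n matrix.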

Note that the eigenequation (1) is of crucial importance in the proposed

algorithm. For large n, the matrix that needs to be considered becomes

large. It is therefore useful to reformulate the method in a more efﬁcient

way. This can easily be achieved by deﬁning

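$$ X = \begin{bmatrix} \sqrt{\alpha_1}\, Z D_z^{-1/2} & \sqrt{1-\alpha_1}\, C D_c^{-1/2} \end{bmatrix}, $$

where $Z$ collects the indicator matrices $Z_j$ of all variables side by side, $D_z = \operatorname{diag}(Z^T Z)$ and $D_c = \operatorname{diag}(C^T C)$.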

If we consider the singular value decomposition

$$ X = Y \Lambda^{1/2} V^T, $$

where $Y^T Y = V^T V = I$, we get, in accordance with Eq. (1),

$$ X X^T Y = Y \Lambda $$

and

$$ X^T X V = V \Lambda. \tag{2} $$

The group configuration $Y$ can thus be obtained as

$$ Y = X V \Lambda^{-1/2}. \tag{3} $$

Finally, although not specifically mentioned in [20], it is important to

consider all $Z$ matrices in deviation from the mean vector to avoid a so-called

trivial solution. Alternatively, the trivial solution, i.e. the eigenvector

corresponding to the largest eigenvalue of $X^T X$, should be ignored.

An important advantage of using Eq. (2) over Eq. (1) is that we only need

to find the $k + 1$ largest eigenvalues and corresponding eigenvectors

for the $Q \times Q$ matrix $X^T X$ rather than for the $n \times n$ matrix $X X^T$.
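The relation between the small and the large eigenproblem can be checked numerically. The sketch below builds $X$ for made-up toy data, solves the eigenproblem for $X^T X$, recovers $Y$ through Eq. (3) and leaves the comparison with $X X^T$ to the reader or a test. The data, the variable names and the choice to center the columns of $X$ (in place of centering each $Z$ matrix) are assumptions of this example. In the toy data Q happens to exceed n; the computational gain arises when n is much larger than Q, as in the application with 11,712 users.

```python
import numpy as np

likes = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1],
                  [0, 0, 1], [1, 1, 1], [0, 1, 0]])  # toy 0/1 like matrix (n = 6)
labels = np.array([0, 0, 1, 1, 0, 1])                 # toy cluster memberships (K = 2)
alpha, k = 0.5, 2

Z = np.hstack([likes, 1 - likes])                     # "liked" and "not liked" columns
C = np.eye(2)[labels]
# For 0/1 matrices, diag(Z'Z) and diag(C'C) are just the column sums.
X = np.hstack([np.sqrt(alpha) * Z / np.sqrt(Z.sum(axis=0)),
               np.sqrt(1 - alpha) * C / np.sqrt(C.sum(axis=0))])
X = X - X.mean(axis=0)                                # deviations from the mean

# Small problem: eigenvectors of the Q x Q matrix X'X (here Q = 8).
eigvals, V = np.linalg.eigh(X.T @ X)
top = np.argsort(eigvals)[::-1][:k]
lam, V = eigvals[top], V[:, top]
Y = X @ V / np.sqrt(lam)                              # Eq. (3): Y = X V Lambda^(-1/2)
print(Y.shape)
```

The columns of `Y` obtained this way satisfy the large eigenequation and are orthonormal, so they can be used directly in the update of the group configuration.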

References

[1] H. Arabie, L. Hubert, Cluster analysis in marketing research. Factorial k-means

analysis for two-way data, in: R. Bagozzi (Ed.), Advanced Methods of Marketing

Research (160–189), Blackwell, Oxford, 1994.

[2] E.D. Argaez, http://www.internetworldstats.com/facebook.htm, 2011 (Accessed on 2

June 2011).

[3] S. Ba, H.R. Rao, DSS special issue on the theory and applications of social networks,

Decision Support Systems 55 (4) (2013) 939–940 (1. Social Media Research and

Applications 2. Theory and Applications of Social Networks).

[4] C.H. Baird, G. Parasnis, From social media to social customer relationship manage-

ment, Strategy and Leadership 39 (2011) 30–37.

[5] R. Chakraborty, C. Vishik, H.R. Rao, Privacy preserving actions of older adults on so-

cial media: exploring the behavior of opting out of information sharing, Decision

Support Systems 55 (4) (2013) 948–956 (1. Social Media Research and

Applications 2. Theory and Applications of Social Networks).

[6] C.M. Cheung, M.K. Lee, A theoretical model of intentional social action in online

social networks, Decision Support Systems 49 (1) (2010) 24–30.

[7] J. Claussen, T. Kretschmer, P. Mayrhofer, The effect of rewarding user engagement:

the case of Facebook apps, Information Systems Research 24 (2013).

[8] W. Duan, Special issue on social media: an editorial introduction, Decision Support

Systems 55 (4) (2013) 861–862 (1. Social Media Research and Applications

2. Theory and Applications of Social Networks).

[9] Facebook.com, http://developers.facebook.com/, 2011 (Accessed on 16 March 2011).

[10] C. Forman, A. Ghose, B. Wiesenfeld, Examining the relationship between reviews

and sales: the role of reviewer identity discloser in electronic markets, Information

Systems Research 19 (2008) 291–313.

[11] S. Gelper, R. van der Lans, G. van Bruggen, Competition for attention in online

social networks: implications for viral marketing (unpublished manuscript), 2014.

[12] A. Giﬁ, Nonlinear Multivariate Analysis, Wiley, Chichester, 1990.

[13] Google, http://code.google.com/apis/maps/index.html, 2004 (Accessed on 3 May 2011).

[14] M.J. Greenacre, Correspondence Analysis in Practice, Academic Press, London,

1993.

[15] R. Gross, A. Acquisti, Information revelation and privacy in online social networks

(the Facebook case), Proceedings of the 2005 ACM Workshop on Privacy in the

Electronic Society, 2005, pp. 71–80.

[16] L. Harris, C. Dennis, Engaging customers on Facebook: challenges for e-tailers,

Journal of Consumer Behaviour 10 (2011) 338–346.

[17] T. Hennig-Thurau, K.P. Gwinner, G. Walsh, D.D. Gremler, Electronic word-of-mouth

via consumer-opinion platforms: what motivates consumers to articulate

themselves on the Internet, Journal of Interactive Marketing 18 (2004)

38–52.

[18] T. Hennig-Thurau, E.C. Malthouse, C. Friege, S. Gensler, L. Lobschat, A. Rangaswamy, B.

Skiera, The impact of new media on customer relationships, Journal of Service

Research 13 (2010) 311–330.

[19] S. Ho, D. Bodoff, K. Tam, Timing of adaptive web personalization and its effects

on online consumer behavior, Information Systems Research 22 (2011)

660–679.

[20] H. Hwang, W.R. Dillon, Y. Takane, An extension of multiple correspondence analysis

for identifying heterogenous subgroups of respondents, Psychometrika 71 (2006)

161–171.

[21] A. Iodice D' Enza, F. Palumbo, Iterative factor clustering of binary data, Computational

Statistics 1–19 (2012), http://dx.doi.org/10.1007/s00180-012-0329-x.

[22] A.H. Kracklauer, D.Q. Mills, D. Seifert, Customer management as the origin of collaborative

customer relationship management, Collaborative Customer Relationship

Management, Springer, 2004, pp. 3–6.

[23] C. Lampe, N. Ellison, C. Steinﬁeld, A familiar face(book): proﬁle elements as signals

in an online social network, Proceedings of Conference on Human Factors in Com-

puting Systems, ACM Press, 2007, pp. 435–444.

[24] K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, N. Christakis, Tastes, ties, and time: a

new social network dataset using Facebook.com, Social Networks 30 (4) (2008)

330–342.

[25] Y.-M. Li, Y.-L. Shiu, A diffusion mechanism for social advertising over microblogs,

Decision Support Systems 54 (1) (2012) 9–22.

[26] J. MacQueen, Some methods for classification and analysis of multivariate observations.

In L. Cam & J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on

Mathematical Statistics and Probability (1, 281–297), University of California Press,

California, 1967.

[27] E.W. Ngai, L. Xiu, D.C. Chau, Application of data mining techniques in customer

relationship management: a literature review and classiﬁcation, Expert Systems

with Applications 36 (2) (2009) 2592–2602.

[28] S. Okazaki, What do we know about mobile internet adopters? A cluster analysis,

Information Management 43 (2006) 127–141.

[29] N. Park, S. Lee, J.H. Kim, Individuals' personal network characteristics and patterns of

Facebook use: a social network approach, Computers in Human Behavior 28 (2012)

1700–1707.

[30] A. Parvatiyar, J.N. Sheth, Customer relationship management: emerging practice, process, and discipline, Journal of Economic and Social Research 3 (2) (2001) 1–34.

[31] R. Rishika, A. Kumar, R. Janakiraman, R. Bezawada, The effect of customers' social media participation on customer visit frequency and profitability: an empirical investigation, Information Systems Research 24 (2013).

[32] M.J. Shaw, C. Subramaniam, G.W. Tan, M. Welge, Knowledge management and data mining for marketing, Decision Support Systems 31 (2001) 127–137.

[33] R.S. Swift, Accelerating Customer Relationships: Using CRM and Relationship Technologies, Prentice Hall Professional, 2001.


[34] S. Van Buuren, W. Heiser, Clustering n objects into k groups under optimal scaling of variables, Psychometrika 54 (1989) 699–706.

[35] M. Vichi, H. Kiers, Factorial k-means analysis for two-way data, Computational Statistics and Data Analysis 37 (2001) 49–64.

[36] M. Wick, http://download.geonames.org/export/dump/readme.txt, 2005 (Accessed on 28 April 2011).

[37] K.P. Wiedmann, H. Buxel, G. Walsh, Customer profiling in e-commerce: methodological aspects and challenges, Journal of Database Marketing 9 (2) (2002) 170–184.

[38] M. Xu, J. Walton, Gaining customer knowledge through analytical CRM, Industrial Management & Data Systems 105 (7) (2005) 955–971.

[39] F. Zhu, X. Zhang, Impact of online consumer reviews on sales: the moderating role of product and consumer characteristics, Journal of Marketing 74 (2010) 133–148.

Jan-Willem van Dam obtained his Master's degree Cum Laude in Economics & Informatics from Erasmus University Rotterdam, the Netherlands, in 2011. The focus of his Master's thesis is on employing data mining techniques for enhancing sport marketing applications. His research interests cover areas such as data mining, Web 2.0, the Semantic Web foundations and applications, and Web information systems.

Michel van de Velden is an assistant professor at the Econometric Institute of the Erasmus University Rotterdam. His research interests concern the development and application of visualization methods for multivariate data. His work covers a wide range of research disciplines ranging from linear algebra to transportation science, and has been published in an equally wide range of high-standing academic journals including Linear Algebra and its Applications, Psychometrika, Journal of Computational and Graphical Statistics, Journal of Statistical Software and Marketing Letters. For a full CV and list of publications, please visit http://people.few.eur.nl/vandevelden/

