
Review of social media analytics process and Big Data pipeline

April 2018
Social Network Analysis and Mining 8(1)

DOI:10.1007/s13278-018-0507-0

Authors:

Hiba Sebei

University of Sfax

Mohamed Ali Hadj Taieb

Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Tunisia

Mohamed Ben Aouicha

University of Sfax


Social Network Analysis and Mining (2018) 8:30

https://doi.org/10.1007/s13278-018-0507-0

REVIEW ARTICLE

Review ofsocial media analytics process andBig Data pipeline

HibaSebei1 · MohamedAliHadjTaieb1 · MohamedBenAouicha1

Received: 29 August 2017 / Revised: 25 March 2018 / Accepted: 27 March 2018

Abstract

Social media analytics is a research axis focused on extracting useful insights from social media data, with the aim of helping individuals and organizations make optimal decisions in several areas of life (business, marketing, politics, health, etc.). Social networks, microblogging, and media-sharing websites are striking instances of online social media: built on Web 2.0 technologies, they promote interaction between users and these websites and shift the user's position from that of a mere consumer to that of a social data producer. As a result, huge amounts of social data are generated, making these websites critical sources of Big Data. Traditional media analytical techniques appear inadequate for processing this huge array of unstructured social media data and for capturing its massive range, particularly given the shift from the batch scale to the streaming one. This has led to the injection of Big Data technologies throughout the analysis process. The present survey is intended to help researchers identify the challenges encountered during the analysis process along with the corresponding Big Data solutions. The aim is to provide a clear analytical process applicable with Big Data technologies. A systematic literature review is conducted to address the challenges facing the integration of Big Data technologies and to present some adequate solutions. Following an extensive literature search, a global view of the superposition of social media analytics and Big Data technologies is drawn and discussed, along with a promising potential research trend.

Keywords Big Data pipeline · Online social media · Social Big Data · Social media analytics · Big Data challenges · Big Data technologies

1 Introduction

Online social media websites are a critically important class of Big Data sources (Gandomi and Haider 2015; Yaqoob et al. 2016). They mainly comprise online social networking websites (e.g., Facebook, MySpace, etc.), multimedia sharing websites (e.g., YouTube, Instagram, etc.), and microblogging websites (e.g., Twitter). Given the widespread use of these websites worldwide, a huge amount of data is generated within seconds.

According to the 2018 report on the digital world published by We Are Social and Hootsuite (see footnote 1), more than half of the world's population now uses the Internet, 42% are active social media users, and about 39% are active mobile social users.

In addition to their large-scale use, these websites are built on Web 2.0 technology (Cormode and Krishnamurthy 2008; Newman et al. 2016; Zeng et al. 2010), which makes the exploitation of the massive amounts of user-generated data profitable for a wide range of domains, such as politics (Stieglitz and Dang-Xuan 2013), through forecasts of election results; the biomedical area (Kotsilieris et al. 2017; Ji et al. 2017); and business decision-making (Rahmani et al. 2014), through assessments of customers' attitudes as expressed on the various social media sites (He et al. 2017). Thus, the rise of social media analytics as a

* Hiba Sebei
hiba.enis@gmail.com

1 Multimedia, InfoRmation systems and Advanced Computing Laboratory, Sfax University, Sfax, Tunisia

1 https://wearesocial.com/blog/2018/01/global-digital-report-2018


process (Gandomi and Haider 2015) aims to extract useful knowledge from social data through the collection, cleaning, and analysis of the social data. What matters most about data is not the data themselves but rather the information and knowledge they contain. Several studies (Chen et al. 2009; Rowley 2007; Stenmark 2002) discuss the difference between data, information, and knowledge, and devise a hierarchy that highlights the connection between them. In this respect, Ackoff

(1989) deﬁnes data as “symbols that represent the proper-

ties of objects and events data have no meaning, processing

this data leads to the producing of information that have

meaning and answer to question related to ‘who,’ ‘what,’

‘where,’ and ‘when’ and knowledge is about transform-

ing information into instructions and rules and providing

answers to ‘how’ questions.” Chen et al. (2009), in turn, distinguish the three terms within a computational space, according to what they signify in computer memory. Accordingly, data refer to the representations of models and attributes of real or simulated entities; information represents the results of a computational process, such as statistical analysis, for assigning meanings to the data, or the transcripts of some meanings assigned by human beings; and knowledge designates “Data

that represent the results of a computer-simulated cogni-

tive process, such as perception, learning, association, and

reasoning, or the transcripts of some knowledge acquired

by human beings.”
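The data, information, and knowledge distinction discussed above can be illustrated with a minimal sketch. The posts, field names, and the helper functions `to_information` and `to_knowledge` are hypothetical and introduced only for illustration; they are not taken from the cited works.

```python
from collections import Counter

# Data: raw symbols with no meaning attached yet (hypothetical posts).
posts = [
    {"user": "a", "text": "battery dies fast", "hour": 9},
    {"user": "b", "text": "love the camera", "hour": 9},
    {"user": "c", "text": "battery is weak", "hour": 10},
]


def to_information(posts, keyword):
    """Answer a 'what/when' question: mentions of `keyword` per hour."""
    return Counter(p["hour"] for p in posts if keyword in p["text"])


def to_knowledge(info, threshold):
    """Answer a 'how' question with a rule: hours that call for a reaction."""
    return sorted(hour for hour, n in info.items() if n >= threshold)


info = to_information(posts, "battery")
print(info)
print(to_knowledge(info, threshold=1))
```

In this sketch the processing step assigns meaning to the raw posts, and the rule applied on top of it plays the role of knowledge in the sense of Ackoff (1989).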

In its complete form, social media analytics refers to the traditional multistep task of analyzing social media data in order to obtain useful insights. It covers areas such as sentiment analysis (Li and Wu 2010), sentiment classification, social network analysis (Magnusson 2012), and data mining (Han et al. 2011a). These analytical frameworks rest on a multitude of associated techniques such as text mining, computational linguistics, machine learning, and natural language processing.

Social media websites are typical examples of Big Data sources, characterized by an exponential growth of heterogeneous data (videos, photographs, texts, audio). This makes the social media analytical process a challenging task, owing mainly to the inefficiency of traditional techniques in analyzing this huge flow of social data (Orgaz et al. 2016). This raises the need for new technologies that boost the performance of traditional techniques and help achieve reliable analysis results (Peng et al. 2017; Sapountzi and Kostas 2016). These novel technologies are basically Big Data technologies which, once combined with conventional social media analytics, show remarkable potential for processing the flood of social media data. Hence the emergence of a newly coined term, Social Big Data (SBD) or Big Social Data (BSD), to characterize this new area of research (Bohlouli et al. 2015; He et al. 2017; Orgaz et al. 2016). SBD designates the combination of Big Data technologies and frameworks with traditional analysis techniques in order to process and analyze social media data and derive useful value. In turn, Nguyen et al. (2014) and Vatrapu et al. (2016) define SBD as the generated social media data, characterized by a massive volume of unstructured data.

Noteworthy, however, is that the alliance between social media analytics and Big Data does not seem to be explicitly discussed. An examination of the relevant surveys published between 2008 and 2017 and retrieved with the terms social media analytics review and Big Data analytics review reveals that they predominantly focus on reviewing the software tools applied to social media scraping and analytics (Batrinca and Treleaven 2015), highlighting social media modeling and analysis techniques (Jure 2011), outlining the different tools, methods, and techniques useful for analyzing Big Data (Elgendy and Elragal 2014), treating the analysis methods for social media messages (Imran et al. 2015), and investigating the state-of-the-art techniques for analyzing social media data (Wu et al. 2016).

Worth citing, in this respect, are the work of Orgaz et al. (2016) and the literature analysis conducted by Stieglitz et al. (2018). The former reviews the MapReduce methodology and the frameworks that implement it (e.g., Hadoop, Spark, etc.), as well as social media analytics methods and algorithms (e.g., community detection, text analytics, etc.) designed to handle the flow of Big Social Data, along with applications of social data analysis. The present research, by comparison, not only reviews the methodologies but also categorizes them by their function in each of the social data processing steps. We also provide an application illustrating the different steps involved in the process, implemented with Big Data technologies. The second work, by Stieglitz et al. (2018), takes the form of a structured literature analysis. The authors define the area of intervention of social media analytics and highlight its application in several domains (politics, communication, business, etc.). They also briefly enumerate the challenges arising from the fact that social media data share the Big Data characteristics (volume, variety, velocity, and veracity), while focusing exclusively on the challenges that emerge during the first three steps of the analytics (discovery, collection, and preparation). Furthermore, Stieglitz et al. (2018) categorize the articles according to the step-specific challenges they address and report the solutions mentioned in the state of the art. The present work differs with regard to three


major points. First, an attempt is made to discuss thoroughly how social media data actually exhibit the challenging aspects of Big Data. Identifying a single Big Data V does not constitute a sufficiently reliable condition for recognizing and accounting for a Big Data processing problem. Our contribution is distinguished in this respect by explaining how the Big Data aspects are embodied within the social data domain. The aim is to help the users of social media analytics determine whether they are actually faced with a Big Data problem in the course of their analysis. Second, the solutions proposed in the present research go further by outlining the challenges related to each single step. A detailed description of the Big Data solutions suited to each type of challenge is given, together with a comparison between them. Finally, the proposed framework differs in terms of the steps to follow in order to analyze big social media data, along with the techniques and technologies applicable to each of them.

Most of the existing research works involve isolated case studies that stress the challenges researchers encounter when deploying specific methods to analyze social media data, such as social network analysis or opinion mining. Other researchers deal with the difficulties associated with particular related problems (Imran et al. 2015). There are no set standards or guidelines that researchers can follow while conducting the analysis in order to identify the difficulties likely to arise during the analysis process and how Big Data could be used in that process.

In this respect, the present work is intended to provide an explicit description of how social media analytics behaves within a Big Data context. To obtain a comprehensive view of the combined scheme integrating social media analytics and Big Data technologies, we have formulated two main questions that the survey tries to answer:

• What types of challenges do researchers encounter when analyzing big social data?

• How could Big Data technologies be effectively integrated to deal with such challenges?

Figure 1 illustrates the motivation behind the present work along with the research questions.

The remainder of this paper is structured as follows. First, the terminology used in the paper is presented. Second, the methodology followed is described, whereby a systematic review is conducted, along with the results achieved, which are the subject of Sect. 3. Sect. 4 discusses the major findings, highlighting what is achieved by combining the social media analytics process with a Big Data architecture, and presents the features of Big Data technologies. Finally, potential new research directions and perspectives are highlighted.

2 Background

This section outlines the major terms used in this survey, along with their respective definitions.

2.1 Big Data

The International Data Corporation (IDC) reported that as much as 1.8 ZB of data had been created by the end of 2011 and predicted that no less than 2.8 ZB of data would be generated in the following few years (2016), while 40 ZB of data would be generated by enterprises by the year 2020. This flood of data is subsumed under the term “Big Data,” which is defined by answering two major questions (Gandomi and Haider 2015): What is Big Data? What tasks does it perform? In answer to the first, Big Data denotes the explosion of various data sources, such as social media and mobile devices. In this respect, the McKinsey report states that Big Data is about “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze” (Manyika et al. 2011). In turn, Oussous et al. (2017) define the term as “the large growing data sets that include heterogeneous formats: structured, unstructured and semi-structured data.” With respect to the second question, Big Data is considered as the technologies useful for managing a huge amount of data that cannot be managed through traditional technologies. Similarly, the IDC describes Big Data as a new generation of software and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.

Whichever of the two questions these definitions try to answer (what Big Data is, or what tasks it performs), taken together they present Big Data as a standard notion that describes both data, through well-defined characteristics such as volume and variety, and the technologies involved in treating these enormous amounts of generated data. Figure 2 illustrates how often the term Big Data has been searched since the year 2004, relative to the total search volume across various world


regions. The term has spread widely over the last few years, starting from the year 2011. The figure also shows that the search rate for the term reached its peak over the last 5 years, having multiplied by eight. Gandomi and Haider (2015) report a study of the ProQuest Research Library on the frequency distribution of documents containing the term Big Data: at the beginning of 2011, a frequency of about 380 documents containing the term was recorded, and this figure rose remarkably in 2013 to reach about 1800 as a monthly frequency.

2.2 The related Big Data Vs

The ﬁrst reﬂection lying behind the term Big Data consists

in the volume denoting the data size or amount (Newman

Fig. 1 Illustration of the context and the research questions [diagram labels: Input, Processing, Output; Social media analytics; Big Data technologies; Research Questions (the two questions listed in Sect. 1); Extracted Knowledge (Business, Health, Politics)]

Fig. 2 Evolution of the search rate of term Big Data from 2004 until 2017 (made by Google trends)


etal. 2016). Noteworthy, however, is that the states of art

highlight a variety of dimensions characterizing these data

and their applications, emphasizing the added value they

provide. Three major dimensions associated with Big Data,

termed as the three Vs of Big Data, are volume, variety,

and veracity, which display a common consensus reached

among authors. It is actually, Laney, who initially described

Big Data through the three Vs, namely volume, variety, and

velocity (Uddin etal. 2014).

Volume represents the scale of generated data, which exceeds terabytes to reach petabytes and even exabytes. Data are continuously generated from a multiplicity of sources such as social media, cloud-based services (e.g., Amazon), enterprise data, and the Internet of Things (IoT) (Khan et al. 2014; Storey and Song 2017). An estimate by Radicati and Hoang (2011) states that the number of e-mail accounts created worldwide would increase from 3.3 billion in 2012 to over 4.3 billion by late 2016. The survey made by IBM in mid-2012 reveals that a data amount exceeding one terabyte is ranked as Big Data (Schroeck et al. 2012). This threshold is relative (Gandomi and Haider 2015), as the quantification of data volume also depends on other factors such as time and data type. Concerning the time factor, storage capacities will increase, allowing the management of bigger datasets. As for the data-type factor, one terabyte of textual data is not necessarily equivalent to one terabyte of video data. Hence, Big Data is not just about volume but includes other dimensions beginning with the letter V, culminating in the “Vs” of Big Data.

Variety describes the various data sources and types (Chen and Zhang 2014). Data stemming from different sources come in different formats. For instance, one can distinguish structured data, which are often managed with Structured Query Language (SQL), a programming language created for managing and querying data within Relational Database Management Systems (RDBMS) (Hashem et al. 2015). Structured data are easy to input, query, and store. There are also data generated in a semistructured format, such as Extensible Markup Language (XML) and JavaScript Object Notation (JSON) data. Yet the main format characterizing Big Data is unstructured data, such as multimedia data (videos, photographs, and audio), which do not take a fixed format (Gandomi and Haider 2015); this makes their management a serious challenge for data scientists.
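The three format families mentioned above can be contrasted with a small sketch that uses only the Python standard library. The table name, field names, and sample values are hypothetical and serve only to illustrate the distinction; they do not come from the surveyed works.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Structured: a fixed schema, queried with SQL inside an RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, likes INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, 10), (2, 25)])
total_likes = conn.execute("SELECT SUM(likes) FROM posts").fetchone()[0]
conn.close()

# Semistructured: self-describing records whose fields may vary.
json_post = json.loads('{"id": 3, "text": "hello", "tags": ["a", "b"]}')
xml_post = ET.fromstring("<post id='4'><text>hi</text></post>")

# Unstructured: no fixed format, only raw bytes that need dedicated
# processing (hypothetical first bytes of an image file).
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0])

print(total_likes, json_post["tags"], xml_post.find("text").text, len(image_bytes))
```

The structured and semistructured records can be queried directly by field, whereas the unstructured payload exposes no fields at all, which is where the management difficulty noted above arises.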

Velocity refers to the speed of incoming and outgoing data (Chen and Zhang 2014). The speed of data generation is evaluated on a scale ranging from batch through near real time and real time to streaming. According to Yaqoob et al. (2016), data velocity depends highly on the proliferation of mobile devices and other sensors connected to the Internet. Additionally, providing reasonable response times and updates has become a requirement and a reference by which the efficiency of applications can be assessed, as confirmed by Fan and Bifet (2013). Managing and analyzing streaming data are further challenges that require relevant techniques and technologies (Orgaz et al. 2016). Some other authors and companies attribute further dimensions to Big Data. For instance, IBM and Microsoft “coined Veracity as the fourth V, which represents the unreliability inherent in some sources of data as customer sentiments in social media are uncertain in nature” (Gandomi and Haider 2015). In this respect, veracity refers to data messiness and trustworthiness (Gandomi and Haider 2015). For Storey and Song (2017), veracity raises challenges related to data quality (Haryadi et al. 2016), closely associated with accuracy, timeliness, currency, completeness (Agrawal et al. 2012), consistency, and accessibility (Corbellini et al. 2017), that should be handled by means of automated techniques. In turn, McKinsey and Oracle added the notion of value as a fourth V in defining Big Data (Chen and Zhang 2014). This value dimension refers to the worth of the insights hidden within Big Data (Gandomi and Haider 2015). For Wang et al. (2017), Big Data is characterized by 5V dimensions, considering both the value and veracity aspects. Other authors, such as Uddin et al. (2014), even speak of seven Vs attached to Big Data, incorporating both the validity and volatility dimensions. They define validity as the correctness and accuracy of data with regard to the intended usage, while volatility designates the retention policy for structured data as frequently implemented in businesses. Figure 3 depicts the different dimensions associated with Big Data and their respective designations.
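The difference between the batch and streaming ends of the velocity scale can be sketched as follows. This is a minimal illustration under simple assumptions: the function names and the sample values are hypothetical, and the running mean stands in for whatever statistic an analysis would maintain.

```python
def batch_mean(values):
    """Batch: the whole dataset is available before processing starts."""
    return sum(values) / len(values)


def streaming_mean(stream):
    """Streaming: update the result incrementally as each item arrives,
    without storing the items seen so far."""
    count, mean = 0, 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental mean update
        yield mean


likes = [4, 8, 6, 2]              # hypothetical likes per incoming post
running = list(streaming_mean(iter(likes)))
print(running)                    # one up-to-date estimate per arrival
print(batch_mean(likes))          # a single answer once all data is in
```

The streaming variant keeps only a counter and the current estimate, which is what allows it to return an up-to-date answer after every arrival, whereas the batch variant needs the full dataset first.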

2.3 Social media analytics

Stieglitz et al. (2014) define social media analytics as “an emerging interdisciplinary research field that aims on combining, extending, and adapting methods for analysis of social media data.” Another definition, introduced by Zeng et al. (2010), considers social media analytics as tools and frameworks “to collect, monitor, analyze, summarize, and visualize social media data, usually driven by specific requirements from a target application.”

3 Research design

For an eﬀective social media analytics process to take place

under a Big Data architecture, one has to primarily recog-

nize the reasons lying behind the implementation of Big

Data technologies as necessary conditions for extracting


knowledge from social data. It therefore seems appropriate to review the challenges of social media analytics reported in the state of the art before discussing them. This section is structured as follows. First, we describe the methods used to conduct this study. Second, the major challenges drawn from the examined works are highlighted, mainly the lack of predefined steps for processing social media analytics and the Big Data dimensions characterizing social data. Finally, some relevant solutions are proposed for each cited challenge.

3.1 The applied methodology

In this paper, a systematic review is conducted. The relevant data are collected manually: a paper search based on a set of predefined terms is run on the available electronic databases, and the title and abstract of each retrieved work are then skimmed to determine its relevance. The applied methodology is further detailed in the upcoming subsections.

3.1.1 Databases andterms

Our search method is database-oriented and covers published journal articles and conference papers extracted from four main bibliographic databases dealing with computer science areas: ACM (Association for Computing Machinery), IEEE (Institute of Electrical and Electronics Engineers) Xplore, Springer, and Elsevier. The search focused on retrieving examples of social media analytics frameworks and identifying the perceived challenges related to the Big Data aspects of social media data: volume, variety, and velocity. To this end, a number of terms were searched: “social media analytics,” “social media analysis,” and “social data analysis,” as synonyms describing the same search area. These terms are combined with the terms “Big Data challenges” and “Frameworks” (Table 1).

3.1.2 Inclusion andexclusion rules

The criteria applied to select, among the studies collected from the electronic databases, those that satisfy the requirements of the research questions are mainly:

• The paper should be published between the year 2008 and the year 2018.

Fig. 3 Big Data dimensions

01 Volume: refers to the size of the data (C. P. Chen and Zhang 2014)
02 Variety: describes the sources and types of data (C. P. Chen and Zhang 2014)
03 Velocity: refers to the speed of incoming and outgoing data (C. P. Chen and Zhang 2014)
04 Variability: refers to the variation in the data flow rates (Gandomi and Haider 2015)
05 Value: refers to the worth of hidden insights inside big data (Gandomi and Haider 2015)
06 Veracity: refers to the messiness and trustworthiness of data (Gandomi and Haider 2015)
07 Validity: refers to the correctness and accuracy of data with regard to the intended usage (Khan et al. 2014)
08 Volatility: refers to the retention policy of structured data that we implement every day in our businesses (Khan et al. 2014)

Table 1 Used search terms and electronic databases

Search terms: (Social media analytics OR Social media analysis OR Social data analysis) AND (Frameworks; Big Data challenges)

Electronic databases: ACM; IEEE Xplore; Springer; Elsevier


• The paper should be written in English.

• The paper should involve processing steps related to the social media analytics task.

• The paper should discuss the challenges of social media analytics based on the four Vs of Big Data (volume, variety, velocity, and veracity). Only these four Vs are considered, for the following reasons. The value dimension, as defined by Sapountzi and Kostas (2016), is the process adopted for extracting insights and useful information from data, and different techniques described by Zeng et al. (2010), such as machine learning, data mining, statistics, optimization, and decision support analysis, are implemented to extract value from data. This dimension could therefore stand as a prerequisite for the processing of Big Data rather than as a Big Data dimension. The volatility and validity dimensions, for their part, depend highly on the application domain of the social data.
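The inclusion and exclusion rules listed above can be expressed as a simple screening function. The sketch below is hypothetical: the `Paper` record and its fields are introduced for illustration, and treating the two topical criteria (processing steps, four-V challenges) as alternatives rather than joint requirements is an assumption, since the text does not say how they combine.

```python
from dataclasses import dataclass, field

FOUR_VS = {"volume", "variety", "velocity", "veracity"}


@dataclass
class Paper:
    """Hypothetical screening record filled in after reading title and abstract."""
    title: str
    year: int
    language: str
    describes_steps: bool
    vs_discussed: set = field(default_factory=set)


def is_included(p):
    """Apply the date, language, and topical rules to one paper."""
    in_period = 2008 <= p.year <= 2018
    in_english = p.language.lower() == "english"
    # Assumption: meeting either topical criterion is enough.
    on_topic = p.describes_steps or bool(p.vs_discussed & FOUR_VS)
    return in_period and in_english and on_topic


candidates = [
    Paper("A", 2015, "English", True, {"volume"}),
    Paper("B", 2005, "English", True, {"volume"}),   # outside the period
    Paper("C", 2016, "English", False, {"value"}),   # value is not one of the four Vs
]
print([p.title for p in candidates if is_included(p)])
```

Under a stricter reading, where both topical criteria must hold, the `or` in `on_topic` would become `and`.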

3.2 Results overview

The query built from the above predefined search terms and their combinations was run on the four stated electronic databases and, after application of the inclusion and exclusion criteria, yielded the relevant articles. These articles fall into three main categories. First, papers that discuss a social media analytics process by describing the different steps of the analysis. Second, papers that discuss social media analytics challenges arising from the Big Data dimensions of the analyzed social data and propose some solutions. Finally, papers that present social media analytics frameworks based on a Big Data architecture. Table 2 sums up the search results for the relevant papers in each of the databases.

3.2.1 Lack ofpredened step foranalyzing social media

data

Among the relevant papers collected, this subsection focuses on those that present social media analytics frameworks and describe the steps followed to extract useful knowledge from social data.

3.2.1.1 Findings The selected papers present a set of frameworks in several domains: health (Abbasi et al. 2014; Dredze 2012), emergency situations (Avvenuti et al. 2016), business (Wang et al. 2016), etc. For each domain, we investigate the presented frameworks and summarize their input, output, and analysis steps. The results are summarized in Table 3, which lists several frameworks categorized by application field. It covers the major frameworks pertaining to the political context (Skoric et al. 2012; Stieglitz and Dang-Xuan 2013; Yaqub et al. 2017), in which the authors collect data from social media websites (e.g., Twitter) and analyze them to investigate user behavior before and after an election, as in the US election case. Stieglitz et al. (2018) extend the framework introduced in (Stieglitz and Dang-Xuan 2013) by adding the challenges arising in each step of the framework. The table also illustrates the nature of people's discussion and sentiment regarding the politicians concerned. Besides, the authors apply social media data to predict the election results. The frameworks rest on several techniques such as opinion mining and sentiment analysis (Stieglitz and Dang-Xuan 2013), while the analysis results are reported using dashboards and curves. In addition, Table 3 illustrates some frameworks pertaining to the healthcare domain (Ji et al. 2017; Kanhabua et al. 2012a). Both frameworks rely on the analysis of healthcare-related social media data, and each addresses a specific subject. Ji et al. (2017) focus on using social media data to search for information related to a specific disease. Their framework is intended for patients and doctors alike, as it enables them to search for information related to symptoms and medicines. The framework introduced by Kanhabua et al. (2012a) aims to track the temporal development of outbreak mentions on Twitter, as a tool for detecting early warnings so that health authorities can respond rapidly. Even though both frameworks analyze Twitter data, they differ in the technologies applied for storage (RDF database) and visualization, while the processing logic follows the same analysis steps: collect, cleanse, store, analyze, and visualize the analysis results. Other frameworks, devoted to predicting natural disasters, are also listed in the table (Avvenuti et al. 2014; Sakaki et al. 2013; Win and Aung 2017); they are based on analyzing tweets extracted from the Twitter microblogging website. The relevant data are collected, filtered, and then classified. These frameworks display a variety of classification methods and techniques. In

Table 2 Number of research papers per database

Electronic database     Number of papers identified in search     Number of papers meeting inclusion criteria
ACM digital library     45                                        20
Springer                10                                        4
IEEE Xplore             64                                        31
Elsevier                36                                        6

Social Network Analysis and Mining (2018) 8:30

1 3

30 Page 8 of 28

Table 3 Social media analytics-based frameworks

Field: Politics
Ref: (Stieglitz and Dang-Xuan 2013)
Input:
• Status updates and corresponding comments from public Facebook profiles and pages: Facebook Graph API
• Public tweets: Twitter Search API and Twitter Streaming API
• Blog messages from Web blogs: RSS feeds and HTML parsing
Analysis steps:
• Data tracking and monitoring: refers to the choice of the appropriate data tracking sources and the methods used to extract data (keyword/topic-based approach, actor-based approach, etc.)
• Data preprocessing: prepare textual data by eliminating stop words, stemming, and lemmatization
• Data analysis: refers to social network analysis, opinion mining, sentiment analysis, text analysis, etc.
Output:
• Visual representation of the analysis
• Dashboard and reports

Field: Politics
Ref: (Yaqub et al. 2017)
Input: Tweets: the streaming API
Analysis steps:
• Collect tweets using keywords
• Data cleaning and extraction
• Sentiment tagging and classification of the gathered tweets
• Importing data into the MySQL database to perform exploration
• Development of a user behavioral model, formulating hypotheses and deriving findings approving or disapproving the hypotheses for the analysis of data
Output: Diagrams that show the frequency and quantify sentiment based on extracted tweets

Field: Politics
Ref: (Skoric et al. 2012)
Input: Tweets published by a set of selected Singapore-based Twitter users
Analysis steps:
• Data collection (tweets using the Twitter REST API)
• Data storage: MySQL
• Data measures
Output: Curve drawing to show the correlation between the tweets and votes

Field: Health
Ref: (Ji et al. 2017)
Input: Twitter and health care websites
Analysis steps:
• Data collection and filtering
• Data storage in an RDF database
• Developing analytic services such as recommendation and statistical services
• Visualizing the result for users (patients and clinics)
Output: User dashboard that shows information (e.g., drugs and conditions) concerning a target disease

Field: Health
Ref: (Kanhabua et al. 2012a)
Input:
• 3000 official outbreak reports publicly available from external resources (WHO and ProMED-mail)
• Twitter collection consisting of over 112 million tweets
Analysis steps:
• Data collection
• Tweet processing (filtering the relevant tweets related to the outbreak)
• Data analysis: drawing the evolution of the outbreak-related tweets over time
Output: User dashboard that visualizes the temporal development of an outbreak; the target place is Bangladesh


Table 3 (continued)

Field: Other, Urban
Ref: (Bocconi et al. 2015)
Input: Social media (e.g., Twitter, Instagram, Foursquare), mobile phone data, spatial statistics, and demographics
Analysis steps:
• Ingestion and analysis: responsible for acquiring, cleansing, and analysis of social data
• Fusion tier: caters for the integration interoperability issues across different data sources and usage domains
• Exploration and visualization: user interfaces for data exploration, comparison, and urban analytics
Output: Map-based visualizations that show clustered points, choropleth, and path

Field: Other, Business
Ref: (Oh et al. 2015)
Input: Tweets
Analysis steps:
• Capturing social media data through the identification of relevant keywords and the extraction of data based on these keywords (tweets related to the keywords), using the Twitter Search API
• Preprocessing the extracted data, which refers to data filtering using text mining techniques (“remove irrelevant tweets or to assign tweets such as in case of tweets belonging to more than one ad or brand”)
• Understanding data by defining the relevant measures and analyzing the data (sentiment analysis techniques, text mining, etc.)
Output: Presentation: summarizing and reporting the analysis results

Field: Other, Marketing
Ref: (Bothos et al. 2010)
Input: Tweets
Analysis steps:
• Content extraction from social media by customized parsers for each source (Flixster.com, IMDb.com, Twitter, etc.)
• Execute Web queries
• Execute targeted queries at the microblogging application Twitter
• Processing social information by processing of rating sentiment analysis, query analysis market-based collective intelligence with artificial agents
Output: Reports

Field: Other, Content analysis (DWFP)
Ref: (Dang et al. 2014)
Input: Tweets
Analysis steps:
• Data integration through data parsing and collection
• Storing the collected data in a unified database
• Developing search support using keyword-based functions
• Developing the multilingual translation support function through the use of the Google Translation API
Output: Visualizing the search results using a unified interface design for users through application of Java Server Pages (JSP) technology


Table 3 (continued)

Field: General purpose
Ref: (Stieglitz et al. 2018)
Input: Data based on the research domain (e.g., marketing data, political data, business data, etc.)
Analysis steps:
• Data tracking and storage: tracking based on a set of approaches (e.g., keyword, actor, and URL related) and methods (e.g., API, RSS/HTML parsing), storage in databases
• Data preparation
• Data analysis based on a set of approaches (e.g., structural attribute, topic/trend related) and methods (e.g., social network analysis, content analysis, etc.)
Output: Reports

Field: Disaster
Ref: (Win and Aung 2017)
Input: Tweets
Analysis steps:
• Tweets collection: Twitter Streaming API
• Tweets preprocessing: reduce the redundancy and noise
• Feature extraction: detection of linguistic features such as word n-grams, POS features, and sentiment lexicon features using the NRC Hashtag Sentiment Lexicon
• Creating disaster lexicons from annotated tweets
• Classification of tweets: LibLinear classifier
Output: The corpus for the searched disaster, containing the target relevant tweets

Field: Disaster
Ref: (Sakaki et al. 2013)
Input: Tweets including keywords related to a target event
Analysis steps:
• Crawl tweets: Twitter Search API
• Classifying tweets into positive and negative tweets, where positive means that a tweet truly refers to an actual contemporaneous earthquake occurrence
• Event detector
• Location estimation
Output: Visualization of the earthquake location estimation based on tweets, using a geographic map

Field: Disaster
Ref: (Avvenuti et al. 2014)
Input: Tweets from Twitter
Analysis steps:
• Data acquisition: collect data based on keywords
• Data filtering: filter the noise from the collected data
• Event detection
Output: Web application designed to show temporal, geographical, and content analyses at both the event and message levels


addition, the table also contains frameworks serving other purposes, such as business (Oh et al. 2015), marketing (Bothos et al. 2010), and urban development (Bocconi et al. 2015).

3.2.1.2 Discussion: a Big Data pipeline for encapsulating

social media analytics The reached result appears to reveal

well that despite the variety of these frameworks-associated

applications ﬁelds, they share some points in common such

as the input which is extracted from the social media via

their available APIs (e.g., Twitter Rest API, Facebook Graph

API, etc.). The same applies to the output which refers to the

value extracted from the analyzed social data along with the

visual representation of their analysis (e.g., report, graphs;

maps). Noteworthy, however, and as Table3 analysis steps

indicate, is that each of these frameworks proves to undertake

speciﬁc steps to extract value from social data. A number of

studies maintain the absence of clear processes whereby the

steps could be deﬁned to derive and extract useful informa-

tion from social data, as documented by Peng etal. (2017),

who state the lack of standardization with regard to the pro-

cessing of social networking data. Nevertheless, following

the emergence of social data analytics, it has become crucial

to identify the involved steps necessary for constructing a

clear view for companies and researchers as to how these

data could be managed (Cambria et al. 2014). The state-

of-the art framework appears to reveal the persistence of

an inconsistency prevailing among/between the research

works with respect to the steps to address during the social

media analytics process. As far as Stieglitz etal. (2018) are

concerned, a sample of social media analytics framework

has been proposed, whereby four main steps are deﬁned,

namely: the discovery, tracking, preparation, and analysis

steps. For each step to be well determined, the authors set

a selection of convenient methods and approaches. Further-

more, their devised framework undertakes to provide infor-

mation about the challenges likely to emerge with respect

to each step. Based on the approaches and methods already

outlined throughout the present survey, along with those

discussed by Stieglitz and Dang-Xuan (2013), as well as the

Big Data aspects characterizing social data, we propose to

put forward a new social media analytics relating scheme.

As already stated, the newly advanced social media relating

analytical frameworks turn out to diﬀer in terms of the steps,

as well as techniques and technologies liable for implemen-

tation with respect to each of them. Concerning the relevant

techniques and technologies relative to each speciﬁc step,

they make subject of a full section (Sect.4), while the sug-

gested steps’ identiﬁcation is discussed in the section below.

Indeed, as social media represent Big Data sources, it is reasonable to adopt the Big Data processing pipeline as the set of predefined steps for encapsulating the social media analytics process.

Worth citing in this respect are Furht and Villanustre (2016), who designed a workflow that highlights the different steps involved in analyzing Big Data. This workflow outlines six steps: data collection, which refers to extracting data from different sources and in different formats (structured and unstructured data); ingestion, which refers to “loading vast amounts of data onto a single data store”; discovery and cleansing, which cover “understanding format and content; clean up and formatting”; integration, which designates the “linking, entity extraction, entity resolution, indexing and data fusion”; analysis, which covers techniques such as intelligence, statistics, predictive and text analytics, and machine learning to analyze Big Data; and delivery, which covers the querying, visualization, and real-time delivery of the analysis results for enterprises. In addition, Agrawal et al. (2015) present a generally accepted Big Data processing pipeline comprising the steps needed to process Big Data sources. The authors consider five distinct steps necessary, namely:

• Acquisition and recording: This step describes the process of collecting and storing data from different sources.

• Information extraction and cleaning: This step aims to prepare and cleanse data for the processing step.

• Data integration, aggregation, and representation: This step deals with putting data into formats suitable for the analysis.

• Query processing, data modeling, and analysis: These procedures concern the querying and data mining processes, aimed at analyzing data through Big Data analytics techniques.

• Big Data interpretation: This step consists in understanding the analysis and taking the right decisions regarding a particular problem, given the complexity of the data.
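The five steps listed above can be read as a chain of functions, each consuming the output of the previous one. The following Python sketch is only an illustration under stated assumptions: the function names, the sample posts, and the keyword-matching "analysis" are hypothetical and are not taken from Agrawal et al. or from any surveyed framework.

```python
from typing import Dict, List, Set

Record = Dict[str, object]  # one social media item (hypothetical shape)


def acquire() -> List[Record]:
    """Acquisition and recording: stand-in for an API collector (made-up posts)."""
    return [
        {"id": 1, "text": "  Flood warning in the city centre  "},
        {"id": 2, "text": ""},
        {"id": 3, "text": "Great match tonight"},
        {"id": 1, "text": "  Flood warning in the city centre  "},
    ]


def extract_and_clean(records: List[Record]) -> List[Record]:
    """Information extraction and cleaning: drop empty and duplicate posts."""
    seen, cleaned = set(), []
    for record in records:
        text = str(record["text"]).strip()
        if text and record["id"] not in seen:
            seen.add(record["id"])
            cleaned.append({"id": record["id"], "text": text})
    return cleaned


def integrate(records: List[Record]) -> List[Record]:
    """Integration, aggregation, representation: add a uniform token list."""
    return [{**r, "tokens": str(r["text"]).lower().split()} for r in records]


def analyze(records: List[Record], keywords: Set[str]) -> List[Record]:
    """Query processing, modeling, analysis: flag posts that match a keyword."""
    return [{**r, "relevant": bool(keywords & set(r["tokens"]))} for r in records]


def interpret(records: List[Record]) -> str:
    """Interpretation: condense the analysis into a statement for a decision maker."""
    hits = sum(1 for r in records if r["relevant"])
    return f"{hits} of {len(records)} posts are relevant"


def run_pipeline(keywords: Set[str]) -> str:
    return interpret(analyze(integrate(extract_and_clean(acquire())), keywords))


summary = run_pipeline({"flood", "earthquake"})
print(summary)
```

Keeping each step behind its own function mirrors the point made in the text: the sequence of steps can stay fixed while the techniques used inside each step are swapped.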

In the same vein, Gandomi and Haider (2015) regroup these steps into two major categories. The first category, Big Data management, includes the processes and supporting technologies used to acquire, store, prepare, and retrieve the data to be analyzed. It involves three main steps: acquisition and recording; information extraction and cleansing; and data integration, aggregation, and representation. In this respect, Big Data management, as defined by Siddiqa et al. (2016), stands as a new discipline based on the application of “data management techniques, tools, and platforms as storage, preprocessing, processing and security” and serves “to enhance data quality and accessibility for decision-making.”


In fact, the authors advance a Big Data management process flow and taxonomy that help to describe the different activities involved in extracting decisional information from Big Data. It includes several activities, namely data collection, storage, and preprocessing, which prepares the collected data for analysis by means of techniques and algorithms such as data cleansing (Kumar and Chadha 2012), transmission (Siddiqa et al. 2016), processing, and analysis via the two data mining methods of classification and prediction. The second category, Big Data analytics, consists in applying techniques for analyzing and acquiring intelligence from Big Data. It includes the querying process, data modeling, and analysis, as well as the interpretation step. This encapsulation offers companies and researchers working on social media data analytics a clear view of the social data processing pipeline; their remaining concern is exclusively the selection of the appropriate techniques and methods for achieving their goals. However, each step of the pipeline exposes challenges basically related to the data dimensions. Several discussions address this point: Agrawal et al. (2012) and Chen and Zhang (2014) mention that the pipeline challenges consist in heterogeneity and incompleteness, scale, timeliness, privacy, and human collaboration, where scale refers to the huge volume of data, which raises problems of data storage and processing, and timeliness is related to the speed of incoming data and the response time. Siddiqa et al. (2016) also discuss parameters that should be handled during Big Data management, such as the availability of the system to users at any time, scalability, data integrity, heterogeneity, resource optimization, and velocity, while Olshannikova et al. (2016) classify the challenges during Big Data processing into:

• Data challenges related to the characteristics of the data, such as volume, variety, velocity, veracity, data quality, data availability, and scalability.

• Processing challenges related to the methods used to capture, transform, model, etc., the data.

• Management challenges related to the privacy and security of data during the processing steps.

Figure4 depicts the Big Data pipeline as used to

encapsulate the social media analytics process and the

challenges related to each step. In the upcoming subsec-

tion, these challenges are investigated along with their

requirements.

3.2.2 Big Data dimensions characterizing social data

Each step of the previously discussed pipeline exposes several challenges related to the Big Data aspects characterizing social data. In this subsection, we therefore examine the remaining set of selected papers, which show how social media data exemplify the Big Data dimensions. The challenges stemming from these dimensions are also addressed, along with the solutions proposed by the state-of-the-art frameworks studied.

3.2.2.1 Findings Volume Online social media websites offer a diversity of free, easy-to-use, public services, which makes them available to all Web users without restrictions or costs. They are characterized by a great number of active users. Twitter has reached about 330 million monthly active users (footnote 2), while YouTube has recorded over 1 billion users (footnote 3) and Facebook announced an average of about 2 billion

[Figure 4: Social Big Data analytics pipeline, divided into Social Big Data management and Social Big Data analysis. Steps: data collection, data storage, data preprocessing, data processing, data analysis, data interpretation. Challenges: data fidelity, privacy, security, data quality, data streaming, real-time response, heterogeneity, data visualization, scalability, availability, integrity.]

Fig. 4 Big Data pipeline used to encapsulate the social media analytics process

2 https://blog.hootsuite.com/twitter-statistics/.

3 https://www.youtube.com/intl/eng/yt/about/press/.


daily active users, in 2017 (footnote 4). Indeed, every 20 min, about 2 million friend requests and 3 million messages are sent and 1 million links are shared (footnote 5) on Facebook. Moreover, no fewer than 3600 photographs are shared every minute on Instagram (footnote 6), while 300 h of new video are uploaded every minute to YouTube, and about 500 million tweets are posted every day on Twitter (see footnote 2). This highlights the huge amount of data generated, which corresponds to the volume dimension associated with Big Data.
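To put the quoted figures on a common time scale, the short Python calculation below converts them into per-second and per-day rates. It uses only the numbers cited above; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

tweets_per_day = 500_000_000       # Twitter figure quoted in the text
video_hours_per_minute = 300       # YouTube figure quoted in the text
photos_per_minute = 3600           # Instagram figure quoted in the text

tweets_per_second = tweets_per_day / SECONDS_PER_DAY
video_hours_per_day = video_hours_per_minute * MINUTES_PER_DAY
photos_per_day = photos_per_minute * MINUTES_PER_DAY

print(f"{tweets_per_second:,.0f} tweets per second")
print(f"{video_hours_per_day:,} hours of video per day")
print(f"{photos_per_day:,} photographs per day")
```

At these rates a collector that keeps everything has to absorb several thousand tweets every second, which is why sampling and filtering appear so early in the surveyed frameworks.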

As a matter of fact, access to these social data raises several technical challenges (Jagadish et al. 2014; Reuter and Scholl 2014). Access relies on querying social media platforms through their available APIs (e.g., Facebook Graph API (footnote 7), Twitter Search API (footnote 8), YouTube Data API (footnote 9), etc.), which turn out to be quite limited. For example, the YouTube Data API sets a limit of 30,000 units per user per second, while the overall quota per day is set at 50,000,000 units. Researchers may also collect data via other methods, such as Web crawling, which does not comply with the terms of service of most social media platforms (Imran et al. 2015). To achieve reliable results, researchers need to extract a huge amount of data to serve as a sample for the analytic process. For instance, in studying the characteristics of YouTube videos based on the 7.6 MB of crawled videos, Cheng et al. (2013) report that 900 TB of disk space is required to store nearly 120 million YouTube videos. Karpenko and Aarabi (2011) develop a compact representation called “tiny videos” that achieves high video compression, based on the extraction of 52,159 videos occupying 520 GB of disk space using the YouTube Data API; the video metadata are stored in a file of 2.8 MB. In turn, Achrekar et al. (2011) develop the Social Network Enabled Flu Trends (SNEFT) framework to track and predict the emergence and spread of an influenza epidemic in a population, based on tweets and profiles extracted from Twitter between Oct 18, 2009 and Oct 31, 2010. They collected 4.7 million tweets from 1.5 million unique users, along with their social relationships on Twitter, with retweets accounting for 9.5% of the collected tweets.
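The figures in this paragraph lend themselves to a few back-of-the-envelope checks. The Python sketch below uses only the numbers quoted above and assumes decimal storage units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the reading of the two YouTube limits as a sustained rate against a daily budget is an assumption made for illustration.

```python
TB = 10**12  # decimal terabyte in bytes (assumed unit convention)
MB = 10**6   # decimal megabyte in bytes

# API limits as quoted in the text
daily_quota_units = 50_000_000
per_user_units_per_second = 30_000

# Time for a single user at the per-second ceiling to use up the daily quota
seconds_to_exhaust = daily_quota_units / per_user_units_per_second
minutes_to_exhaust = seconds_to_exhaust / 60

# Average size implied by 900 TB for about 120 million videos
avg_video_mb = 900 * TB / 120_000_000 / MB

# Number of retweets implied by 9.5% of 4.7 million tweets
retweets = round(0.095 * 4_700_000)

print(f"daily quota exhausted after about {minutes_to_exhaust:.1f} min")
print(f"average video size: {avg_video_mb} MB")
print(f"retweets in the SNEFT collection: {retweets:,}")
```

Under these assumptions the daily quota, not the per-second limit, is the binding constraint on a collection campaign, and the implied average video size is close to the 7.6 MB figure mentioned for the crawled videos.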

These huge amounts of data, reaching gigabytes and terabytes, along with the extraction of multimedia data in the form of videos and photographs, raise several challenges throughout the social media analytics process, mainly:

• Data storage (Chen and Zhang 2014; Kaisler et al. 2013): As current disk technology is limited to about 10 terabytes per disk, 1 exabyte would require 100,000 disks, which makes it difficult to attach the number of disks required.

• Data processing: Processor speed has followed Moore's Law. Yet a fundamental shift is under way: data volume is increasing at a rate that exceeds CPU speeds and those of other computing resources (Jagadish et al. 2014).

• Data visualization: The human eye has difficulty taking in a large amount of data, and computer screen size is limited to around 1 to 3 million pixels. These challenges are categorized by Agrawal et al. (2015) under the heading of perceptual scalability.
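The storage and visualization items above both rest on simple ratios. The following Python check assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes) and reuses the 4.7 million tweets of the SNEFT collection as an example data set; the one-pixel-per-tweet comparison is an illustrative assumption, not a claim from the cited works.

```python
TB = 10**12  # decimal terabyte in bytes (assumed unit convention)
EB = 10**18  # decimal exabyte in bytes

disk_capacity_bytes = 10 * TB                  # per-disk limit quoted in the text
disks_per_exabyte = EB // disk_capacity_bytes  # disks needed to hold 1 EB
print(f"{disks_per_exabyte:,} disks of 10 TB are needed for 1 EB")

screen_pixels_max = 3_000_000   # upper screen size quoted in the text
tweets_collected = 4_700_000    # SNEFT collection size quoted earlier

# Even with a single pixel per tweet, the collection does not fit on screen
one_pixel_per_tweet_fits = tweets_collected <= screen_pixels_max
print(f"one pixel per tweet fits on screen: {one_pixel_per_tweet_fits}")
```

The second result illustrates perceptual scalability: aggregation or sampling is needed before visualization, since the raw items outnumber the available pixels.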

Variety Social media websites offer a huge amount of data including profile data, user connections, multimedia metadata (Hiba et al. 2018), and data describing users' daily activities. People can upload videos and photographs, express opinions through textual posts, and express feelings through a variety of actions. Thus, online social media data are based on user-generated content, which Kaplan and Haenlein (2010) define as the sum of all ways in which people make use of Social Media. They add that the term has been in use since 2005 and describes the different forms of media content that are publicly available and created by end users. In turn, the Organization for Economic Cooperation and Development (OECD) lists different types of user-generated content (e.g., texts, photographs, audio, videos, citizen journalism, mobile content, etc.) (Vickery and Wunsch-Vincent 2007).

Therefore, social media data abound with a huge variety of data types, classified as structured data, such as profile information, and unstructured data, such as YouTube videos, Facebook posts, and Google Plus activities. Social media data stand as striking examples of unstructured data. Hence the rise of several challenges in both the data storage and data processing stages, mainly:

• Data storage: Social media data include multimedia data such as photographs and videos. A photograph may have one or more colors, so there are no predefined fields characterizing photographs, which makes them unfit for storage in a relational database. Storing such data therefore requires new technologies that cope with the lack of a predefined schema.

• Data processing: Fig. 5 illustrates a tweet extracted from Twitter using the Twitter4j (footnote 10) library. The extracted tweet is represented in a semistructured format (JSON), but it does not lend itself directly to analysis, since what really matters is the ability to query the data. In such a case, the real value of the extracted data lies in recognizing what is

4 https://newsroom.fb.com/company-info/.

5 http://www.statisticbrain.com/facebook-statistics/.

6 https://vibbi.com/buy-instagram-followers.

7 https://developers.facebook.com/docs/graph-api.

8 https://dev.twitter.com/overview/api.

9 https://developers.google.com/youtube/v3/?hl=de.

10 http://twitter4j.org/en/index.html.


actually being tweeted, that is, the value of the text attribute. This value is a textual string that contains links, text, and annotations.

Figure 6 depicts the result of a query that extracts a video from YouTube using the YouTube Data API. The query response is in JSON format and contains the video metadata and the video link; it reveals no information about the video content itself. Processing such data types therefore requires appropriate techniques for extracting useful information, and then value, from the multimedia content (text, photograph, video, and audio).
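As a minimal illustration of why a JSON tweet still needs processing before analysis, the Python sketch below separates the plain words of the text attribute from the links and hashtags embedded in it. The payload is a made-up example whose field names (`text`, `entities`, `hashtags`, `urls`, `expanded_url`) are assumed to follow the layout commonly seen in tweet objects; it is not the tweet shown in Fig. 5, and real responses carry many more fields.

```python
import json

# Hypothetical tweet-like payload, shortened for the example
raw_tweet = """
{
  "id_str": "1",
  "created_at": "Sat Jul 01 10:00:00 +0000 2017",
  "text": "Big Data meets social media https://example.org/post #analytics",
  "entities": {
    "hashtags": [{"text": "analytics"}],
    "urls": [{"expanded_url": "https://example.org/post"}]
  }
}
"""


def extract_content(tweet_json: str) -> dict:
    """Split a tweet into plain words, links, and hashtags."""
    tweet = json.loads(tweet_json)
    entities = tweet.get("entities", {})
    urls = [u["expanded_url"] for u in entities.get("urls", [])]
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    # Keep only the tokens that are neither links nor hashtags
    words = [
        token for token in tweet["text"].split()
        if not token.startswith(("http://", "https://", "#"))
    ]
    return {"plain_text": " ".join(words), "urls": urls, "hashtags": hashtags}


content = extract_content(raw_tweet)
print(content)
```

The same idea does not carry over to the video response of Fig. 6: its JSON holds only metadata and a link, so the content itself has to be fetched and analyzed with multimedia techniques.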

Velocity This dimension concerns the speed at which data are generated. Given the availability of online social media via mobile applications such as Facebook, YouTube, and Instagram, users are constantly connected to these applications and continuously produce data (He et al. 2017). Recently released statistics reveal that 500 million tweets are posted daily on Twitter (footnote 11) and 300 h of video are uploaded to YouTube every minute. Statistics also indicate that a huge amount of data is generated on the scale of minutes and days (see footnote 3). Figure 7 illustrates the tweets generated by the CNN official page between July 1, 2017 and July 20, 2017. Figure 7

Fig. 5 Extracted tweet in JSON format, obtained with the Twitter4J library from the Twitter microblogging website, showing the different formats (link, text, annotation) of data included in one textual tweet

Fig. 6 Extracted video in JSON format, obtained with the YouTube Data API from the YouTube sharing website

Fig. 7 Evolution of the total number of tweets, replies, and retweets generated by CNN and its reactors on Twitter between July 1 and July 20, 2017, using the TweetStats tool

11 https://www.omnicoreagency.com/twitter-statistics/.


is generated via the TweetStats tool, and it reflects the average number of tweets generated by CNN on a daily basis, which reaches 137.9 tweets per day and about 2895 tweets per month. It also reflects information about the number of retweets and replies corresponding to the generated tweets. In some cases, the analysis of social data must produce rapid responses, so developers seek to minimize latency and to analyze the data as they stream in. This is the case for crisis responders, who may want lower latency for a better response to a developing situation (Imran et al. 2015; Middleton et al. 2014). To this end, Twitter and YouTube offer, respectively, the Twitter Streaming APIs (footnote 12) and the YouTube Live Streaming API (footnote 13), which provide developers with low-latency access to tweet data and streamed video data. Still, analyzing such data raises major challenges during the data processing task, as it requires techniques and technologies suited to processing data in real time (Jagadish et al. 2014).
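One common way to reason about streamed data is to keep only a recent time window in memory and update a statistic as each item arrives. The Python sketch below counts the events seen in the last 60 seconds; the class name, window length, and arrival times are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order. It does not represent the method of any cited framework or streaming API.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts the events observed during the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp: float) -> int:
        """Register one event and return the number of events in the window."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that are older than the window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


counter = SlidingWindowCounter(window_seconds=60)
arrival_times = [0, 10, 20, 65, 70, 130]  # seconds, made-up arrivals
counts = [counter.add(t) for t in arrival_times]
print(counts)
```

Because old timestamps are discarded as new ones arrive, memory use depends on the arrival rate within the window rather than on the total length of the stream, which is the property that batch-oriented processing lacks.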

Veracity Social media data are also characterized by their veracity. Being based on human-generated content, social data are usually full of rumors (Ashwin et al. 2016; Mendoza et al. 2010). Such unverified information might have a negative effect on the decision-making process and confuse people (Ashwin et al. 2016). Identifying and detecting these rumors is therefore necessary during data analysis for reliable results to be achieved. Figure 8 shows the projection of the Big Data dimensions onto social media data features.

The integration of Big Data technologies to address the above challenges appears in state-of-the-art frameworks that propose schemes for social Big Data analysis, in which the authors describe the relevant steps along with the Big Data technologies implemented. An example is the Social Media Analysis using Big Data Technology (SoMABIT) platform developed by Bohlouli et al. (2015), which is divided into three major layers: a data layer describing the different sources providing social media data; a logic layer for knowledge discovery and decision-making, built on Big Data technologies such as Hadoop (White 2012) and Mahout (Owen and Owen 2012) and on distributed algorithms implemented through the MapReduce paradigm (Dean and Ghemawat 2008); and an application layer

[Figure 8: projection of the Big Data dimensions onto social media data. Panel text: 1.4 billion daily active users on average for December 2017 (footnote 14); since 2015 more than 10,000 videos, which recorded over a billion views and more than 70 million viewing hours (footnote 15); 330 monthly active users (footnote 16); streaming data; more than terabytes of data; text, videos, photos; production at a scale of seconds; spread of rumors.]

Fig. 8 Projection of the four Vs of Big Data dimensions on social media data

12 https://dev.twitter.com/streaming/overview.

13 https://developers.google.com/youtube/v3/live/getting-started.


acting as an end-user interface for querying the platform and visualizing the discovered knowledge. The need to extract knowledge from social media data is highlighted by He et al. (2017) as a way to improve organizational and corporate performance. They also point to the limits of traditional content analysis as a motivation for systematic methods of extracting knowledge from social data. To this end, they propose a framework that rests on Big Data analytics technologies to “process Social Big Data, visualize and benchmark comparisons among competitors across events, products, issues and any other areas” and to store knowledge within a dedicated knowledge management system usable by managers and employees alike. The study conducted by Peng et al. (2017) is another contribution that strengthens the relationship between social influence analysis and Big Data.
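Since the MapReduce paradigm is mentioned above as the basis for distributed algorithms, a single-machine Python sketch of its three phases may help fix the idea. This is an illustration of the general paradigm on made-up posts, with hypothetical function names; it is not the SoMABIT implementation and does not use Hadoop or Mahout.

```python
from collections import defaultdict
from itertools import chain


def map_phase(post: str):
    """Map: emit a (word, 1) pair for every word of one post."""
    return [(word.lower(), 1) for word in post.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


posts = ["storm hits coast", "Storm warning issued", "coast road closed"]

mapped = chain.from_iterable(map_phase(post) for post in posts)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

In a distributed setting, the map calls run in parallel on different partitions of the posts and the reduce calls run in parallel on different keys, which is what lets the approach scale with data volume.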

In this respect, the social influence analysis process comprises several steps: data collection and storage within the cloud infrastructure; a preprocessing step that cleans the data of irrelevant private information by means of Big Data techniques such as machine learning, data mining, and natural language processing, in order to ensure the performance of the following step; and social influence analysis itself, which includes the selection of the users' influence, performance analysis of related algorithms, selection of evaluation metrics, and influence computing. The process culminates in the application of social influence analysis to several use cases such as stock market prediction and personal recommendation. Table 4 summarizes the aspects of social media data and the challenges arising from them during the data analytics process.

3.2.2.2 Discussion: the need of Big Data technologies

The reached results reveal that the Big Data dimensions associated with social data raise three kinds of challenges. Material challenges concern storage and processing devices (i.e., CPU, storage disk), whose capacity or speed is limited during the storage and processing steps. Technical challenges refer to the inefficiency of traditional approaches and methods in analyzing multimedia data. Technological challenges describe the inefficiency of tools, such as visualization tools, in handling the huge amount of unstructured data. Thus, the inefficiency of the data analysis methods (Orgaz et al. 2016) and of traditional tools (Orgaz et al. 2016; Sapountzi and Kostas 2016) for analyzing unstructured data at large scale promotes the integration of Big Data-related technologies as a solution to the special challenges that social media analytics poses. Hence, novel paradigms and software (Chang et al. 2014) have been developed to handle the massive volume, variety, and velocity of the issued data in a bid to facilitate the extraction of useful value from them for various purposes.

This has culminated in the emergence of a new research area, dubbed Social Big Data, which aims to combine Big Data tools and technologies with traditional social data analytics techniques in order to boost them and to ensure effective processing and analysis of the data sourced from social media. As illustrated in Fig. 9, Orgaz et al. (2016) consider Social Big Data to be the result of two combined domains: Big Data and social media. They also define Social Big Data as a concept describing the processes and methods applied to social data that exhibit the basic Big Data dimensions, such as volume, variety, and velocity, with the aim of extracting useful knowledge for users and companies alike. In this respect, Cambria et al. (2014) consider the Big Social Data analysis as “inherently interdisciplinary and spans areas such as machine learning, graph mining, information retrieval, knowledge-based systems, linguistics, common-sense reasoning, natural language processing, and Big Data computing.” Based on the data-type categorization, Sapountzi and Kostas (2016) provide a special view in regard to social

Table 4 Challenges of the social media analytics steps based on the social data aspects

Social data aspect | Challenges | Social media analytics steps
Volume | Provide more space to store data (Chen and Zhang 2014; Jagadish et al. 2014); volume is increasing faster than CPU speed (Chen and Zhang 2014); ensure data scalability (Chen and Zhang 2014) | Data storage; data processing; data visualization
Variety | Homogenize the data for the analysis (Jagadish et al. 2014); provide analytics methods for the multimedia data; store non-schema data | Data storage; data preprocessing; data analysis
Velocity | Ensure the data availability; real-time response (Chen and Zhang 2014); real-time visualization | Data capture; data storage; data processing; data visualization
Veracity | Noisy data due to the user-generated data; spread of rumors (Ashwin et al. 2016; Mendoza et al. 2010) | Data preprocessing; data processing


networking data analysis. They adopt social network analysis techniques for structured data analysis, which resolve tasks such as link prediction, community detection, and influence analysis, and they integrate Big Data analytics, such as text and multimedia mining, for analyzing social networking data.
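To make one of the structure-based tasks named above concrete, the following is a minimal sketch of link prediction. It uses the common-neighbors heuristic, which is one simple technique chosen here for illustration and is not claimed to be the method used by the cited authors. The graph, the user names, and the function name are hypothetical.

```python
from itertools import combinations

def common_neighbor_scores(edges):
    """Score every non-adjacent node pair by its number of shared neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, nothing to predict
        shared = len(neighbors[u] & neighbors[v])
        if shared:
            scores[(u, v)] = shared
    return scores

# Hypothetical friendship graph, treated as undirected for simplicity.
edges = [("ana", "bob"), ("ana", "cat"), ("bob", "dan"), ("cat", "dan"), ("dan", "eve")]

# Rank candidate links: more shared neighbors first, ties broken alphabetically.
ranked = sorted(common_neighbor_scores(edges).items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

Pairs that share more neighbors are ranked as more likely future links. On a real social graph the same computation would have to be distributed, which is where the graph processing frameworks discussed in this survey come in.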

The reviewed papers present the challenges and the solutions related to the Big Data dimensions. However, their studies are limited to the description of the use of Big Data technologies for resolving specific use cases. Thus, in the next section, we combine the previous results in order to draw a global view describing the alliance of Big Data and social media analytics.

4 Discussion

The papers investigated in this survey raise two main points. First, there are no predefined steps for analyzing social media data. Second, the Big Data aspect of the analyzed data is a challenge that is addressed through the integration of Big Data technologies. For the first point, we propose the use of the Big Data pipeline to encapsulate the social media analytics steps. For the second one, each of the previous studies proposes a way to integrate the adequate Big Data technologies and methods, but only for a target use case. Linking these studies is useful for building a global view of the social media analytics process under a Big Data environment. The steps of this process are as follows.

• Data collection and acquisition is a step that refers to gathering data from different sources and transmitting them to the data storage platform. Social networking websites (e.g., Facebook), microblogging websites (e.g., Twitter), and multimedia sharing websites (e.g., YouTube), as instances of online social media websites, provide application programming interfaces (APIs) for extracting data, such as the Facebook Graph API,14 Twitter Search API,15 YouTube Data API,16 and Google+ REST API.17 Stieglitz and Dang-Xuan (2013) advance several approaches for tracking data from social media: the self-involved approach, the keyword/topic-based approach, the actor-based approach, and the random/exploratory approach. At this level, the data are collected in different formats, namely structured and unstructured data.

• Data recording or storing refers to the systems applied to store the collected data while considering the challenges related to the volume, privacy, and scalability of data. This step is based on techniques such as clustering, replication, and indexing (Siddiqa et al. 2016). Traditional data storage systems (relational databases) remain single-node systems with a fixed schema, which renders the storage of Big Data a challenging task. Hence, Big Data offers data storage technologies based on the sharing, clustering, replication, and indexing principles (Corbellini et al. 2017; Siddiqa et al. 2016). In this regard, storage systems comprise file system-based technologies such as the Google File System (GFS) (Ghemawat et al. 2003) and the Hadoop Distributed File System (HDFS) (Orgaz

Fig. 9 Illustration of Social Big Data. The figure combines social data (volume, velocity, variety, etc.) with Big Data analytics (machine learning, image mining, audio analytics, video analytics, text analytics, graph mining) into Social Big Data analytics, which yields extracted knowledge.

14 https://developers.facebook.com/docs/graph-api.

15 https://dev.twitter.com/overview/api.

16 https://developers.google.com/youtube/v3/getting-started.

17 https://developers.google.com/+/web/api/rest/.


etal. 2016), and databases technologies as NoSQL data-

bases (Corbellini etal. 2017).

– NoSQL databases are provided to store distributed

data and allow a horizontal scaling opportunity.

These databases involve four categories:

– key value (Corbellini etal. 2017), for example Redis

(Carlson 2013);

– Document-oriented database (Corbellini et al.

2017), for example MangoDB (Chodorow 2013) and

CoucheDB (Lennon 2009);

– wide column (Corbellini et al. 2017), for example

HBase (Taylor 2010) and BigTable (Chang etal. 2008);

– graph databases (Corbellini etal. 2017), for example

Neo4j18 and AllegroGraph (Aasman 2006).

• Data preprocessing refers to the methods applied to prepare data in a specified format fit for analysis, using a variety of techniques: data cleansing (Kumar and Chadha 2012), which helps in handling missing values, noisy data, and data inconsistency; data transmission (Siddiqa et al. 2016); data reduction (Santhanam and Padmavathi 2014), which relies on such techniques as data compression and reduplication; data integration (Ahamed et al. 2014; Esposito et al. 2015), which refers to the combination of data drawn from multiple sources; and data transformation (Baskar et al. 2013), which deals with the normalization, aggregation, and generalization of data. Certain challenges are associated with this step, namely data heterogeneity, noise, incompleteness, etc.

• Data processing: processing Big Data rests on parallel and distributed programming paradigms. In this respect, four main processing paradigms can be distinguished: batch processing, stream processing, interactive processing (Chen and Zhang 2014), and large-scale graph processing as introduced by Sakr (2016).

– Batch processing relies on the MapReduce (Dean and Ghemawat 2008) paradigm, whereby the data are first stored and then processed. The most popular framework implementing this paradigm is Hadoop (White 2012); another batch processing framework is Dryad (Isard et al. 2007).

– Stream processing is used to process streaming data and get real-time responses (Baquero et al. 2016), with frameworks such as Storm, S4 (Neumeyer et al. 2010), and Apache Kafka (Auradkar et al. 2012).

– Interactive processing, as defined by Chen and Zhang (2014), is a framework that “presents the data in an interactive environment, allowing users to undertake their own analysis of information”, as is the case of Google’s Dremel (Mikolov et al. 2011).

– Large-scale graph processing is used to process large-scale graphs, for example with Pregel (Malewicz et al. 2010). It is worth noting at this level that a clear distinction needs to be established among processing frameworks for large-scale RDF graphs (Sakr 2016).

• Data analysis describes Big Data techniques such as machine learning, text analytics, and multimedia analytics used to get insights from the relevant data. It is provided through a variety of Big Data tools and libraries that implement Big Data analytics (Sapountzi and Kostas 2016), such as multimedia analysis and text mining, relevant to several disciplines (Chen and Zhang 2014) such as data mining, machine learning, and social network analysis. In this regard, Big Data offers a variety of tools that operate on top of Big Data processing frameworks. Worth citing in this respect are Apache Mahout (Owen and Owen 2012), Skytree server,19 and Spark MLlib (Meng et al. 2016) for machine learning, and Apache Nutch (Orgaz et al. 2016) in the business data analysis context.

• Data interpretation refers to the visualization of reports, diagrams, and tables in a format recognizable by end users, and still faces challenges related to data complexity. It relies on Big Data visualization tools to create insights, based on the analysis, that end users can understand. Visualization tools offer a clear view of the interpreted data and allow interaction between users and the extracted insights via the visualization of reports and graphs. Several Big Data tools are applicable in this regard, such as Pentaho20 for reports, Tableau21 for data visualization, the Jaspersoft package22 for generating business intelligence reports, and Talend Open Studio23 for graphic visualization.
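The batch paradigm listed among the processing steps above can be illustrated with a minimal, single-machine sketch of the MapReduce pattern. This is not Hadoop code: it only mimics the map, shuffle, and reduce phases in plain Python, and the function names and the sample records are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one stored record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run the three phases in sequence over records that are already stored."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical stored tweets; in a batch setting the data are stored first, then processed.
stored_tweets = ["big data tools", "social data", "big social data"]
counts = run_job(stored_tweets)
print(counts)
```

In a distributed framework, the map and reduce calls would run in parallel on different nodes and the shuffle would move data across the network; the sketch keeps only the logical structure.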

Figures10 and 11 detail each related step through highlight-

ing the Big Data technologies involved along with the relevant

methods and techniques applicable to each step. Hence, Fig.10

deals with the steps related to the Social Big Data management,

while Fig.11 illustrates the analysis and interpretation steps

relevant to the Social Big Data analytics process.
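As a rough sketch of how the six steps chain together, the following toy pipeline passes a handful of records through collection, storage, preprocessing, processing, analysis, and interpretation. Every function is a deliberately trivial stand-in, assumed here for illustration only: the "API" is an in-memory list, the "store" is a Python list, and the "analysis" is a token count.

```python
def collect(source):
    """Data collection: the 'API' here is just an in-memory iterable (assumption)."""
    return list(source)

def store(records, storage):
    """Data storage: a plain list stands in for a distributed store."""
    storage.extend(records)
    return storage

def preprocess(records):
    """Data preprocessing: drop empty records, normalize case and whitespace."""
    return [" ".join(r.lower().split()) for r in records if r and r.strip()]

def process(records):
    """Data processing: tokenize each record (a real system would parallelize this)."""
    return [r.split() for r in records]

def analyze(token_lists):
    """Data analysis: a trivial statistic, the number of tokens per record."""
    return [len(tokens) for tokens in token_lists]

def interpret(values):
    """Data interpretation: summarize the results in a report-like dictionary."""
    mean = sum(values) / len(values) if values else 0.0
    return {"records": len(values), "mean_tokens": mean}

def run_pipeline(source):
    """Chain the six steps in the order described in the text."""
    storage = []
    stored = store(collect(source), storage)
    return interpret(analyze(process(preprocess(stored))))

report = run_pipeline(["  Great  PRODUCT ", "", "not bad at all"])
print(report)
```

The value of the pipeline view is that each stand-in can be swapped for a Big Data technology of the corresponding step without changing the overall flow.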

To better explain the usefulness of the proposed framework, we first establish an analogy between some existing frameworks from the literature and the proposed framework architecture. Second, we pick a social media analytics task

18 https://neo4j.com/.

19 http://www.skytree.net/.

20 https://www.pentaho.com/.

21 https://www.tableau.com/.

22 https://github.com/Jaspersoft/jasperreports.

23 https://www.talend.com/products/talend-open-studio/.


Fig. 10 Social Big Data management steps. The figure lists, for each step, the following elements:

• Data sources: social media data sources (media sharing, microblogging, social networking).

• Data collection: tracking approaches (Stieglitz and Dang-Xuan 2013): self-involved, keyword-based, actor-based, random/explorative, URL-based; tracking methods (Stieglitz and Dang-Xuan 2013): APIs (Facebook Graph API, Twitter Search API, YouTube Data API, Google+ REST API) and RSS/HTML parsing (blogs); output data format: structured, semi-structured, and unstructured data.

• Data storage: storage techniques (Siddiqa et al. 2016): clustering, replication, indexing; storage systems: file systems (GFS, HDFS, COSMOS, TFS) and NoSQL databases.

• Data preprocessing: preprocessing algorithms: data cleansing (Kumar and Chadha 2012): missing value, noisy data, inconsistent data; data transformation (Baskar et al. 2013): normalization, aggregation, generalization; data reduction (Santhanam and Padmavathi 2014): compression, reduplication.

• Data processing: programming model and tools: batch processing (Hadoop); stream processing (S4, Apache Kafka, Storm, etc.); interactive processing (Google’s Dremel); large-scale graph processing (Pregel, Apache Giraph, etc.).

Fig. 11 Social Big Data analysis steps. The figure lists, for each step, the following elements:

• Data analysis: analysis type (content-based analysis, structure-based analysis); analysis objective (topic/issue/trend analysis, opinion sentiment analysis, social network analysis, statistical analysis); analysis method (text mining, video content analysis, image analysis, trend detection, opinion mining, sentiment analysis, link prediction, community detection, influence analysis, cluster analysis, linear regression); Big Data analysis tools (machine learning and data mining, search engine, statistical analysis).

• Data interpretation: analysis output (dashboard, report, graph); Big Data tools (Talend, Pentaho, Tableau, Qlikview, Gephi, Walrus).


that consists of analyzing the sentiment of Twitter users toward a new brand product. Then, we describe the processing of this task based on the predefined steps of the proposed framework in order to demonstrate its utility.

For the analogy between existing frameworks and the proposed architecture, Table 5 introduces a set of applications implementing the conceptual view of the Social Big Data analytics framework. The table presents some simple examples of applications by introducing the techniques applied at each step of the predefined Social Big Data analytics pipeline. In this respect, Selvan and Moh (2015) stress how crucial real-time customer feedback is for companies by analyzing tweets extracted via the Twitter streaming API and processed using the Hadoop framework. The data are stored prior to the analysis in the HDFS component of the Hadoop framework. In the same context, and for the purpose of investigating public opinion on a particular topic of interest, Bhuta et al. (2014) analyze Twitter data using sentiment analysis techniques after collecting a set of public tweets with the Twitter streaming API and filtering them by eliminating the non-English words. The analysis results are reported using statistical graphs and geographical charts. Table 5 also introduces a framework developed by Bohlouli et al. (2015), which is a concrete representation of the proposed conceptual Social Big Data analytics framework. The data were collected using the Twitter Streaming API, with the Flume component of Hadoop transferring the collected data to the HDFS for the storage step. Bohlouli et al. (2015) also use the Hadoop framework to process the collected data and Mahout for analysis purposes, and they visualize the attained results through a user interface involving reports, diagrams, and curves.
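The keyword-based filtering and the real-time, storage-free processing mentioned for these applications can be sketched with Python generators. This is an illustrative stand-in and not the Twitter streaming API or any of the cited implementations: the incoming stream is a hypothetical in-memory iterator, and the function names are assumptions.

```python
from collections import Counter

def keyword_filter(stream, keywords):
    """Yield only the tweets that mention at least one tracked keyword."""
    wanted = {k.lower() for k in keywords}
    for tweet in stream:
        if wanted & set(tweet.lower().split()):
            yield tweet

def running_counts(stream, keywords):
    """Update keyword counts one tweet at a time, without storing the stream."""
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for tweet in stream:
        counts.update(w for w in tweet.lower().split() if w in wanted)
        yield dict(counts)  # snapshot of the counts after this tweet

# Hypothetical incoming tweets, consumed lazily one by one.
tracked = ["phone", "battery"]
incoming = iter(["New phone launch today", "Lunch was fine", "phone battery is weak"])
snapshots = list(running_counts(keyword_filter(incoming, tracked), tracked))
print(snapshots)
```

Each tweet is handled as it arrives and then discarded, which mirrors the "no storage" entry of Table 5; a batch variant would first write the filtered tweets to a store such as HDFS and process them afterwards.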

The variety of Big Data technologies makes identifying the adequate technology a confusing task. In this respect, the current study highlights the most popular Big Data technologies suitable for both the storage and processing steps and provides a summary of their major features. The state of the art reveals the existence of several Big Data storage technologies (Strohbach et al. 2016), namely distributed file systems such as the Hadoop Distributed File System (HDFS), NoSQL databases, and NewSQL databases, along with Big Data querying platforms. The current work identifies the features related to each data model by giving examples of technologies for each one of them. Several features are derived from the state of the art (Hashem et al. 2015; Siddiqa et al. 2017; Wu et al. 2017; Corbellini et al. 2017), namely Consistency, Availability, and Partition tolerance (CAP) (Han et al. 2011b), of which a technology can guarantee at most two out of three, along with the architecture and the data storage characterizing the technology. Table 6 depicts these features along with the advantages and disadvantages of each data model. For the processing step, the state of the art (Yaqoob et al. 2016; Belcastro et al. 2018; Hu et al. 2014; Cao et al. 2017) reveals a set of features that serve to compare the technologies of the different Big Data processing paradigms. Among those features, we cite: scalability, defined as the ability of a system, network, or process to handle a growing amount of work either by adding new resources to a single node or by adding new nodes to the system (Cao et al. 2017); fault tolerance, referring to the continuity of system operation despite the failure of a node (Cao et al. 2017); latency, describing the speed of the system response, a feature usually used for comparing real-time technologies (Chintapalli et al. 2016); and the programming model implemented by the technology (i.e., MapReduce, Directed Acyclic Graph (DAG), Message Passing, Bulk Synchronous Parallel (BSP), Workflow, and SQL-like) (Belcastro et al. 2018). Table 7 details these features along with applications of the most popular Big Data processing technologies for each processing paradigm (i.e., batch, stream, interactive, and graph processing).
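To give an intuition for the four NoSQL data models compared in Table 6, the sketch below represents one hypothetical tweet in each of them using plain Python structures. These are conceptual illustrations only and do not reproduce the API or storage layout of any particular database; the identifiers and field names are assumptions.

```python
# One hypothetical tweet, reused in all four representations.
tweet_id, user, text = "t1", "ana", "big data is fun"

# Key-value: an opaque value looked up by a single key.
key_value = {f"tweet:{tweet_id}": f"{user}|{text}"}

# Document: the value is a structured, JSON-like document.
document = {tweet_id: {"user": user, "text": text, "hashtags": []}}

# Wide column: row key -> column family -> column -> value.
wide_column = {tweet_id: {"content": {"text": text}, "meta": {"user": user}}}

# Graph: nodes plus labeled edges between them.
graph = {
    "nodes": {user: {"type": "user"}, tweet_id: {"type": "tweet", "text": text}},
    "edges": [(user, "POSTED", tweet_id)],
}

def posted_by(graph, author):
    """Follow POSTED edges to find the tweets written by one user."""
    return [dst for src, label, dst in graph["edges"]
            if src == author and label == "POSTED"]

print(key_value["tweet:t1"])              # whole value fetched by key
print(document["t1"]["text"])             # one field of a document
print(wide_column["t1"]["meta"]["user"])  # one column of one family
print(posted_by(graph, "ana"))            # relationship traversal
```

The access patterns differ accordingly: the key-value form only supports lookup by key, the document and wide-column forms allow reading part of a record, and the graph form makes relationships first-class.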

In order to reach the target objectives of the analysis, the researcher should set a clear plan prior to proceeding

Table 5 Social Big Data analytics framework applications (steps of the Social Big Data conceptual framework)

References | Data sources | Data collection | Data storage | Data preprocessing | Data processing | Data analysis | Data interpretation
(Selvan and Moh 2015) | Twitter | Streaming API and Flume component of Hadoop | HDFS of Hadoop | Filtration done based on keyword | Hive of Hadoop, MapReduce paradigm, and Apache Oozie | Text analysis | Visualization in Microsoft Office Excel
(Bhuta et al. 2014) | Twitter | Twitter Streaming API | No storage as it is a real-time processing | Filter non-English tweets | Dictionary-based classification | Sentiment analysis | Statistical graphs, geographical charts
(Bohlouli et al. 2015) | Twitter | Social media API and Flume | HBase | Filter the data and reduce noise | Implementation of the Hadoop MapReduce | Mahout for decision making and sentiment analysis | HTML-based user interface


Table 6 Features-related Big Data storage technologies

Technology | Data model | Scalability | CAP (C: Consistency, A: Availability, P: Partition) | Data storage | Query language
Redis | Key-value | Supported (High) | CP | In-memory | API
MemcacheDB | Key-value | Supported | CP | In-memory | API
Cassandra | Column-oriented | Supported | AP | In-memory | MapReduce
HBase | Column-oriented | Supported | CP | On-disk | MapReduce
Hypertable | Column-oriented | Supported | CP | On-disk and in-memory | Thrift interface
MongoDB | Document | Supported | CP | On-disk | SQL
CouchDB | Document | Supported | AP | On-disk | MapReduce
Neo4j | Graph | Supported | CP | On-disk and in-memory | Cypher, SPARQL
AllegroGraph (RDF graph) | Graph | Supported | CP | On-disk | SPARQL, RDFS++

Architecture (entries in printed order; the source lists eight entries for the nine technologies): multi data nodes; master/slave; multi-master; master/slave; master/slave; master/slave; master/slave; master/slave.

Data model | Description | Advantages | Disadvantages
Key-value | Data are stored as a distributed hash table | Very fast random access via key, scalable, easy to distribute across clusters, and provides a simple model as a hash table (Storey and Song 2017). Several popular websites use the key-value store data model, namely: “Dynamo at Amazon, Redis at GitHub, Digg, and Blizzard Interactive, Memcached at Facebook, Zynga and Twitter and Voldemort at Linkedin” (Atikoglu et al. 2012) | No complex filtering query, the join needs to be performed in the applications, and there is no mechanism for supporting multirecord consistency (Corbellini et al. 2017; Storey and Song 2017)
Column-oriented | Data tables are stored as sections of columns of data. It is an extension of the key-value store database, where columns can have a complex structure, rather than a blob value (Storey and Song 2017; Bermbach et al. 2015) | Easy distribution, management of very large volumes of data, and partial update | All joins must be made in the code, no constraints and no triggers (Corbellini et al. 2017)
Document | A collection of key-value stores where the value is a document, such as JSON, BSON (Abramova and Bernardino 2013) | Good for semistructured data, easy partition, the number of requests for composite objects is limited, permission of the ad hoc applications, and partial update | All joins must be made in the code, no constraints and no triggers (Corbellini et al. 2017)
Graph | Based upon graph theory (set of nodes, edges, and properties) (Storey and Song 2017) | Neo4j: represents many large real-world entities such as maps and social networks (e.g., OpenStreetMap, Twitter) (Corbellini et al. 2017). AllegroGraph: represents linked open data (RDF graph) and offers a full query language (SPARQL) | Not efficient at processing high volumes of transactions


Table 7 Features-related Big Data processing technologies

Feature | Hadoop (batch) | Dryad(a) (batch) | S4 (stream) | Spark Streaming (stream) | Apache Storm (stream) | Pregel (graph) | GraphLab (graph)
Programming model | MapReduce | Directed Acyclic Graph | MapReduce | Directed Acyclic Graph | Directed Acyclic Graph | Bulk Synchronous Parallel | MapReduce
Programming language | Java | C++ | Scala | Scala | Clojure(b) | C++ | C++
Scalability | Supported | Supported | Supported | Supported | Supported | Supported (Cao et al. 2017) | Supported (Cao et al. 2017)
Fault tolerance | Supported on node level | Supported on node level | Supported | Supported | Supported by the Nimbus, the master node; in case of failure, the passive node becomes active without affecting the workers | Fault tolerance by checkpointing | Supported by snapshot update
Latency | MapReduce reads and writes from disk, which slows down the processing speed | Not supported (processes terabytes of data at a scale of minutes) (Cao et al. 2017) | Lower latency due to the use of local node memory (Cao et al. 2017) | Micro-batch: runs applications up to 100× faster in memory and 10× faster on disk than Hadoop (Chintapalli et al. 2016; Xin et al. 2012) | Lower latency response (Chintapalli et al. 2016; Belcastro et al. 2018) | Not supported | Not supported

Applications:

• Hadoop: useful for “distributed sorting, Web link-graph reversal, Web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation” (Dean and Ghemawat 2010; Cao et al. 2017).

• Dryad: used by Microsoft to analyze petabytes of data on clusters of thousands of computers (Cao et al. 2017).

• S4: a general purpose framework, used by Yahoo, Google, and Bing for processing unlimited streams of data (Neumeyer et al. 2010); a stream processing engine for applications based on data mining and machine learning (Neumeyer et al. 2010).

• Spark Streaming: stores data in RAM, which makes it faster than Hadoop on processing iterative machine learning (Belcastro et al. 2018; Xin et al. 2012).

• Apache Storm: applications related to the processing of social network data and sensor networks (Belcastro et al. 2018).

• Graph processing (Pregel and GraphLab): graph computing such as PageRank (Malewicz et al. 2010), shortest path, and bipartite matching; social network analysis, as in the case of Facebook, in which graph processing is used to analyze the social graph formed by users and their connections (Hu et al. 2014); Netflix movie recommendation (Low et al. 2012); friends-of-friends score application (Ching et al. 2015).

a https://www.microsoft.com/en-us/research/project/dryad/

b https://clojure.org/


with the analysis process. Accordingly, one can refer to the proposed framework, which identifies the steps to be followed while accounting for the challenges and the useful solutions provided in this respect. More specifically:

1. Identify whether the case actually displays a Big Data problem by examining the characteristics of the collected data. In the selected case study, the analysis subject is a selection of tweets extracted from Twitter, for which the following dimensions are considered: the volume (to be quantified), the variety (tweets: unstructured data), the velocity (real-time data), and the veracity. Taken together, these conditions constitute a Big Data challenge.

2. Examine each step of the proposed framework by identifying its goals, challenges, and requirements, along with the possible solutions.
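The check described in step 1 can be written down as a small helper that reports which of the four dimensions a dataset exhibits. This is only a sketch: the `DatasetProfile` fields, the function name, and in particular the volume threshold are hypothetical choices made for illustration, since the text leaves the volume "to be quantified".

```python
from dataclasses import dataclass

@dataclass
class DatasetProfile:
    size_gb: float               # volume of the collected data
    unstructured: bool           # variety: free text, images, video
    arrives_continuously: bool   # velocity: streaming rather than static data
    user_generated: bool         # veracity: noise and rumors are likely

def big_data_dimensions(profile, volume_threshold_gb=1000.0):
    """List the Big Data dimensions a dataset exhibits.

    The volume threshold is an arbitrary placeholder, not a value from the survey.
    """
    checks = {
        "volume": profile.size_gb >= volume_threshold_gb,
        "variety": profile.unstructured,
        "velocity": profile.arrives_continuously,
        "veracity": profile.user_generated,
    }
    return [name for name, present in checks.items() if present]

# Hypothetical profile of the tweet collection from the case study.
tweets = DatasetProfile(size_gb=2500.0, unstructured=True,
                        arrives_continuously=True, user_generated=True)
print(big_data_dimensions(tweets))
```

Returning the list of dimensions, rather than a single yes/no answer, keeps visible which challenges of Table 4 will have to be handled in the later steps.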

Data collection

• Goal: collect real-time data about the target subject.

• Solutions:

- Method: search by keyword.

- Tools: Twitter streaming API.

Data storage

• Goal: store a huge amount of unstructured data and give easy access to the collected data.

• Challenges: volume and variety.

• Solutions:

- Tools: NoSQL database, cloud solution, etc.

Data preprocessing

• Goal: extract the relevant tweets and delete the noise (useless metadata).

• Challenges: volume, unstructured data.

• Solutions: data cleansing algorithms.

Data processing

• Goal: parallel analysis of the tweets.

• Challenges: volume.

• Solutions:

- Tools: Hadoop (batch) or Spark (real-time), and a sentiment analysis tool.

Data analysis

• Goal: generate scores relative to user sentiment.

• Requirement: identify whether it is real-time or batch processing.

• Challenges: volume and variety.

• Solutions:

- Identify the adequate sentiment analysis method.

- Tools: sentiment analysis tool (e.g., the R language).

Data interpretation

• Goal: visualize the individual sentiment score values and interpret all the scores to identify the popularity of the brand.

• Challenges: volume.

• Solution: the use of a reporting tool supporting the data volume.
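The analysis and interpretation goals of this case study can be illustrated with a minimal dictionary-based sentiment scorer, in the spirit of the dictionary-based classification listed in Table 5. The word lists, the tweets, and the function names are hypothetical, and the scoring rule (positive words minus negative words) is one simple choice among many sentiment analysis methods.

```python
# Tiny illustrative lexicons; real sentiment dictionaries are far larger.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def sentiment_score(tweet):
    """Score one tweet: number of positive words minus number of negative words."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def brand_popularity(tweets):
    """Aggregate per-tweet scores into a simple popularity indicator."""
    scores = [sentiment_score(t) for t in tweets]
    positive = sum(s > 0 for s in scores)
    share = positive / len(scores) if scores else 0.0
    return {"scores": scores, "positive_share": share}

# Hypothetical preprocessed tweets about the new product.
sample = ["love this great phone", "bad battery", "it is a phone"]
summary = brand_popularity(sample)
print(summary["scores"], round(summary["positive_share"], 2))
```

The sketch splits on whitespace only, so punctuation attached to a word would prevent a match; in practice this is handled in the preprocessing step. Because each tweet is scored independently, the scoring parallelizes naturally over the tweets, which fits the processing goal stated above.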


The present survey targets researchers who intend to analyze social media data, for any particular purpose, without having a clear idea of the steps needed to carry out the analysis, of the challenges likely to be encountered along the way, or of the Big Data technologies fit for application in this respect. Accordingly, from the present article, researchers should be able to learn about:

• The steps to follow for a social media analytics task to be conducted efficiently.

• How to make sure that they are really dealing with a Big Data problem during the social media analysis. To this end, they can turn to the section on the Big Data dimensions characterizing social media data to see how each dimension maps onto social data, and to be aware that volume is not the only characteristic of a Big Data problem.

• The challenges associated with each step, identified before they emerge. Concerning data collection, for instance, they have to know how much data they will need to collect and, therefore, how much memory space will be required. They also need to identify whether the application pertains to real-time analysis or to batch analysis, so that the adequate technologies can be identified, along with the appropriate algorithms for maintaining real-time storage, analysis, and response. Additionally, they have to identify the adequate methods whereby multimedia data can be analyzed when unstructured data are being dealt with.

• The Big Data tools most commonly appropriate for each step. The present survey describes several frameworks, tools, and algorithms commonly applicable throughout the analysis process. For instance, for the storage step, users can apply NoSQL databases, file systems, or cloud storage; for the processing step, the relevant frameworks are categorized according to their use (batch, stream, interactive, and graph processing); and for the analysis stage, the convenient tools are categorized according to their functionalities (searching, mining, statistical analysis, etc.).
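The question of how much space the collected data will require can be approached with a back-of-the-envelope estimate. The sketch below multiplies an arrival rate by an average record size, a collection period, and a replication factor. All the numbers in the example, as well as the function name and the default replication factor of 3, are assumptions made for illustration and not figures from the survey.

```python
def estimate_storage_gb(tweets_per_second, avg_bytes_per_tweet, days, replication=3):
    """Estimate the space (in GB) needed to store a tweet collection.

    replication is the number of copies kept by the storage layer (assumed value).
    """
    seconds = days * 24 * 60 * 60
    raw_bytes = tweets_per_second * avg_bytes_per_tweet * seconds
    return raw_bytes * replication / 1e9

# Hypothetical collection: 50 tweets per second, 2000 bytes each, for 30 days.
needed = estimate_storage_gb(tweets_per_second=50, avg_bytes_per_tweet=2000, days=30)
print(f"{needed:.1f} GB")
```

With these assumed inputs, 30 days amount to 2,592,000 seconds, the raw data to 259.2 GB, and the replicated data to 777.6 GB. An estimate of this kind helps decide early whether a single-node store suffices or whether a distributed storage technology is needed.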

5 Conclusion andpotential research trends

The present work studies the interaction between social media analytics and Big Data. Based on the reached results, the paper sets up an alliance between social media analytics and Big Data at two major levels, namely the processing steps of social media analytics and the technologies applied in social media analytics.

Concerning the processing steps, the state of the art holds that each social media analytics framework follows its own steps throughout the processing of social media data, and no clear, shared view is available of how such data should be processed. Hence, the Big Data processing pipeline is put forward to encapsulate the processing procedures of social media analytics. This encapsulation rests on an analogy between the Big Data dimensions and the features of social media data, whereby it has been shown that social media data are simply a Big Data source exhibiting all of the Big Data dimensions discussed, specifically volume, variety, velocity, and veracity. The relevant processing steps comprise four phases: a data collection stage, in which data are gathered through different methods such as social media APIs; a data storage phase, which requires a platform able to hold huge amounts of continuously generated unstructured data; a data preprocessing step, which applies specific algorithms to clean the data and prepare it for processing; and finally an analysis and interpretation stage, which summarizes and visualizes the useful insights extracted from social media data so that end users can easily understand and recognize them.
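The four phases can be pictured as a chain of functions. The following is a minimal Python sketch under stated assumptions: `fetch_posts` stands in for a social media API call and returns hard-coded sample records, an in-memory list stands in for the storage platform, and a simple term count stands in for the analysis technique. None of these names or data come from the surveyed frameworks.

```python
from collections import Counter
import re


def fetch_posts():
    """Collection: hypothetical stand-in for a social media API call."""
    return [
        {"id": 1, "text": "Big Data is  BIG!! #bigdata"},
        {"id": 2, "text": "Streaming data, big challenges http://example.org"},
        {"id": 2, "text": "Streaming data, big challenges http://example.org"},
    ]


def store(posts, storage):
    """Storage: append raw posts to an in-memory list (assumed storage layer)."""
    storage.extend(posts)
    return storage


def preprocess(storage):
    """Preprocessing: drop duplicate ids, strip URLs, lowercase, tokenize."""
    seen, cleaned = set(), []
    for post in storage:
        if post["id"] in seen:
            continue  # duplicate record, skip it
        seen.add(post["id"])
        text = re.sub(r"https?://\S+", " ", post["text"])
        cleaned.append(re.findall(r"[a-z]+", text.lower()))
    return cleaned


def analyze(cleaned, top_n=3):
    """Analysis and interpretation: report the most frequent terms."""
    counts = Counter(token for tokens in cleaned for token in tokens)
    return counts.most_common(top_n)


def run_pipeline():
    """Chain the four phases: collect, store, preprocess, analyze."""
    storage = []
    store(fetch_posts(), storage)
    return analyze(preprocess(storage))


if __name__ == "__main__":
    for term, count in run_pipeline():
        print(f"{term}: {count}")
```

Each function could be swapped for a Big Data technology of the corresponding stage without changing the overall shape of the chain, which is the point of describing the process as a pipeline.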

Concerning the second level of the alliance, it rests on integrating the Big Data technologies relevant to each step of the social media analytics process. Introduced under the Social Big Data research field, this combination identifies, for each step of the process, the frameworks best suited to optimizing the analysis results and supporting the smooth flow of social data. Big Data technologies support the processing of huge amounts of unstructured data through the Hadoop framework, and the real-time processing of continuously generated data through the Storm framework, given the need for systems able to cope with such a fast data flow. For such frameworks, sets of libraries have been developed, for example Mahout, MLlib, Solr, and GraphX, to support analyses such as machine learning, search, and graph computation. For visualization, Big Data likewise offers technologies whereby the analysis results can be displayed on dashboards (for example, QlikView), in tables (for example, Tableau), or through large graphs (as with Gephi).
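The contrast between batch processing of stored data and real-time processing of continuously generated data can be illustrated without any cluster framework. The sketch below is plain Python and does not use the Hadoop or Storm APIs; the function names and the sample hashtags are illustrative assumptions. The batch function sees the whole dataset at once, whereas the streaming function updates its result incrementally as each record arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_count(hashtags: Iterable[str]) -> Counter:
    """Batch style: the complete dataset is available before processing starts."""
    return Counter(hashtags)


def stream_count(hashtags: Iterable[str]) -> Iterator[Counter]:
    """Stream style: update running counts and emit a snapshot per record."""
    running = Counter()
    for tag in hashtags:
        running[tag] += 1
        yield running.copy()  # result is usable before the stream ends


if __name__ == "__main__":
    sample = ["#bigdata", "#storm", "#bigdata"]  # hypothetical input
    print("batch:", batch_count(sample))
    for snapshot in stream_count(iter(sample)):
        print("stream:", snapshot)
```

Once the stream is exhausted, the last snapshot equals the batch result; the difference lies in when intermediate answers become available, which is what matters for a fast data flow.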

Figures10 and 11 depict the hybrid alliance by intro-

ducing through a conceptual framework how social media

analytics can be established under a Big Data environment

context.

Social Network Analysis and Mining (2018) 8:30


Overall, this study can serve as an initial guide for those who intend to analyze social media data, giving them a clear view of the processing steps involved in social media analytics and of the Big Data technologies applicable to each step of the pipeline. Its main contribution is a conceptual framework that associates the Big Data pipeline with the social data analysis process. Still, further studies, some of which are currently underway in our laboratories, are needed to explore the feasibility of the proposed scheme and the potential performance of merging the collection processes of several social networks, while accounting for data heterogeneity.

It is also critically important to understand the processing steps involved in Social Big Data analytics and to identify the challenges related to each step, while evaluating the Big Data technologies developed to manage such a huge amount of social data. Yet what really matters is choosing "the right tools for the right job" so that the desired analytics goals, namely understanding and prediction, are achieved through Big Data processing. To help people benefit from Big Data technologies, this work presents the different technologies used during the storage and processing steps. In addition, it reviews the features of the storage technologies (i.e., scalability, CAP, architecture, data storage, and query language) and of the processing technologies (i.e., open source, scalability, latency, API, fault tolerance, and programming language), along with their advantages, disadvantages, and applications.

Constructing an effective analysis strategy is the first step in extracting insights from data. To ensure the success of such a strategy, a set of requirements should be fulfilled, including identifying the objectives behind the data analytics procedure and talking to the stakeholders, i.e., the people on the business and technical sides likely to be affected by the analysis results, in order to understand their specific needs, identify the data useful for achieving the analytics goals, and identify the techniques and tools required for processing the data, as each tool has features that suit it to a specific need.

This work deals exclusively with the Big Data frameworks treated in a selection of relevant works. However, the range of Big Data technologies is wider than the set covered here. As a future line of research, other existing Big Data technologies could be investigated and systematically compared. For instance, researchers could consider several Big Data technologies capable of achieving the same task, but with different performance levels. Future studies should therefore specify the parameters needed to compare the existing Big Data technologies. Taking Big Data storage solutions as an example, inexperienced users may be unsure whether a document database or a graph database is the better fit. Answering such a question often requires extensive experience with databases, which makes such a comparison extremely useful for the user.
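One way to make such a comparison concrete is to record each technology against a fixed set of parameters. The Python sketch below reuses the storage features reviewed in this work (scalability, CAP, architecture, data storage, and query language) as fields. The `StorageProfile` class, the `differing_parameters` helper, and the two entries `StoreA` and `StoreB` with their values are hypothetical placeholders, not measurements or descriptions of real products.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StorageProfile:
    """Comparison record; field names follow the storage features reviewed above."""
    name: str
    scalability: str
    cap: str
    architecture: str
    data_storage: str
    query_language: str


def differing_parameters(a: StorageProfile, b: StorageProfile) -> dict:
    """Return {parameter: (value_a, value_b)} for every parameter that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(StorageProfile)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    }


# Hypothetical entries with made-up values, for illustration only.
store_a = StorageProfile("StoreA", "horizontal", "AP", "peer-to-peer", "document", "custom API")
store_b = StorageProfile("StoreB", "horizontal", "CA", "master-slave", "graph", "custom API")

if __name__ == "__main__":
    for parameter, (va, vb) in differing_parameters(store_a, store_b).items():
        print(f"{parameter}: {store_a.name}={va} vs {store_b.name}={vb}")
```

Listing only the parameters on which two candidates differ directs an inexperienced user to the decisions that actually separate them, such as a document versus a graph data model.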

It is also worth noting that the present research draws a global view of the social media analytics process, which could be further extended to focus on a specific input of the analysis, for instance by constructing a social media analytics framework that extracts knowledge from media data (e.g., photographs or videos). The extended framework would be more specific, reviewing the techniques and algorithms used for analyzing image or video data, along with the technologies relevant to such purposes.


A preview of this full-text is provided by Springer Nature.

Learn more

Content available from Social Network Analysis and Mining

This content is subject to copyright. Terms and conditions apply.

How luxury fashion brands leverage TikTok to captivate young consumers: an exploratory investigation using video analytics

Article

Full-text available

Jan 2024

Fashion brands including luxury brands are embracing TikTok to access young consumers, but there is a notable absence of research on how luxury fashion brands can leverage TikTok. Video analytics is crucial for understanding marketing communications via TikTok, a video-based social media platform. This study aims to examine how luxury brands establish their presence and effectively attract and engage with young consumers on TikTok through social media video analytics. A multiple case study approach was employed on the selected four luxury fashion brands. Data were collected from the selected brands’ official accounts, endorsed users’ accounts, and related hashtag links on TikTok. A three-stage content analysis of social media video analytics was conducted. The common and customized strategies employed across the selected brands on TikTok were identified, respectively. The findings revealed that young consumers prefer high-quality videos regarding branding messages, branded challenges, and influencers-led branded content. A consumer-brand engagement framework was proposed based on the data analysis. This research contributes to understanding how TikTok benefits the fashion industry and offers theoretical and practical insights for fashion brands to better harness TikTok. This study represents a pioneering endeavor in exploring social media video analytics, contributing to the advancement of marketing analytics literature.

Scientific Appearance in Telegram

Article

May 2024

This paper examines the influence of scientific appearance (SA) on post dissemination and analyses a dataset of important actors in Germany, specifically those involved in the dissemination of disinformation on the social media platform Telegram. SA is identified through textual elements such as predefined keywords or digital object identifiers (DOIs). Characteristics and behaviours of actors with and without SA are compared using metadata such as forward counts and original posts. The additional content analysis provides insights into SA's usage and impact. The findings indicate that SA may influence the dissemination of posts and demonstrate how different methods can be applied for studying social media platforms.

Mining crowdsourced text to capture hikers' perceptions associated with landscape features and outdoor physical activities

Article

Full-text available

Oct 2023
ECOL INFORM

Issue 3 www.ijrar.org (E-ISSN

Article

Full-text available

Jul 2023

Big Data refers to the rapidly growing volume, variety, value, veracity, and velocity of data being generated in the modern digital world. It looks at the different kinds and sources of Big Data, such as structured data, semi-structured data, and unstructured data, highlighting the growing significance of the sources and elements of unstructured data from the social media perspective, Time Series Data, Geospatial Data, and Streaming Data. The three main challenges of Big Data, including characteristic challenges, processing challenges, and management challenges, are highlighted in this paper. This paper presents an overview of Big Data's characteristics, types, challenges, and various social media platforms. In conclusion, organizations not at all longer neglect unstructured data today; relatively, they are inventing means of evaluating it to extract information.

Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data

Article

Full-text available

May 2023
NEURAL COMPUT APPL

Text categorization and sentiment analysis are two of the most typical natural language processing tasks with various emerging applications implemented and utilized in different domains, such as health care and policy making. At the same time, the tremendous growth in the popularity and usage of social media, such as Twitter, has resulted on an immense increase in user-generated data, as mainly represented by the corresponding texts in users’ posts. However, the analysis of these specific data and the extraction of actionable knowledge and added value out of them is a challenging task due to the domain diversity and the high multilingualism that characterizes these data. The latter highlights the emerging need for the implementation and utilization of domain-agnostic and multilingual solutions. To investigate a portion of these challenges this research work performs a comparative analysis of multilingual approaches for classifying both the sentiment and the text of an examined multilingual corpus. In this context, four multilingual BERT-based classifiers and a zero-shot classification approach are utilized and compared in terms of their accuracy and applicability in the classification of multilingual data. Their comparison has unveiled insightful outcomes and has a twofold interpretation. Multilingual BERT-based classifiers achieve high performances and transfer inference when trained and fine-tuned on multilingual data. While also the zero-shot approach presents a novel technique for creating multilingual solutions in a faster, more efficient, and scalable way. It can easily be fitted to new languages and new tasks while achieving relatively good results across many languages. However, when efficiency and scalability are less important than accuracy, it seems that this model, and zero-shot models in general, can not be compared to fine-tuned and trained multilingual BERT-based classifiers.

Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data

Article

Full-text available

May 2023
NEURAL COMPUT APPL

Text categorization and sentiment analysis are two of the most typical natural language processing tasks with various emerging applications implemented and utilized in different domains, such as health care and policy making. At the same time, the tremendous growth in the popularity and usage of social media, such as Twitter, has resulted on an immense increase in user-generated data, as mainly represented by the corresponding texts in users' posts. However, the analysis of these specific data and the extraction of actionable knowledge and added value out of them is a challenging task due to the domain diversity and the high multilingualism that characterizes these data. The latter highlights the emerging need for the implementation and utilization of domain-agnostic and multilingual solutions. To investigate a portion of these challenges this research work performs a comparative analysis of multilingual approaches for classifying both the sentiment and the text of an examined multilingual corpus. In this context, four multilingual BERT-based classifiers and a zero-shot classification approach are utilized and compared in terms of their accuracy and applicability in the classification of multilingual data. Their comparison has unveiled insightful outcomes and has a twofold interpretation. Multilingual BERT-based classifiers achieve high performances and transfer inference when trained and fine-tuned on multilingual data. While also the zero-shot approach presents a novel technique for creating multilingual solutions in a faster, more efficient, and scalable way. It can easily be fitted to new languages and new tasks while achieving relatively good results across many languages. However, when efficiency and scalability are less important than accuracy, it seems that this model, and zero-shot models in general, can not be compared to fine-tuned and trained multilingual BERT-based classifiers.

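Alem da mensagem: o Telegram como um ambiente informacional complexo, suas affordances e a disseminação de desinformação política no Brasil [Beyond the message: Telegram as a complex informational environment, its affordances, and the dissemination of political disinformation in Brazil]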

Thesis

Dec 2023

Giulia Tucci

This thesis investigates the informational flow within Telegram, proposing a theoretical framework for research developed with data collected from this hybrid digital platform, positioned at the intersection of messaging app and social media. The study focuses on the dissemination of political disinformation in Brazil. A scoping review underpins the research, pinpointing empirical studies that used data from Telegram groups or channels. Methods, procedures for source selection, and data collection tools were analyzed. The relevance of specific functionalities and affordances was emphasized. Additionally, a case study on pro-Bolsonaro groups during the 2022 electoral campaign was conducted, shedding light on strategies, actors, and flows of (dis)information. The research deepens the understanding of Telegram as a complex informational space, exploring its potential influence on the spread of political disinformation. The blend of digital methods and complex network analysis enabled the identification of app affordances and the development of a taxonomy of user actions. These elements constitute the proposed framework, serving as a guide for future inquiries, unveiling Telegram's unique facets and its implications in the contemporary informational landscape. Keywords: Telegram; digital platforms; affordances; digital methods; electoral campaigns; disinformation.

Competitive Sentiment Analysis for Brand Reputation Monitoring

Conference Paper

Feb 2024

Social Network Analysis: A Survey on Process, Tools, and Application

Article

Feb 2024

Due to the explosive rise of online social networks, social network analysis (SNA) has emerged as a significant academic field in recent years. Understanding and examining social relationships in networks through network analysis opens up numerous research avenues in sociology, literature, media, biology, computer science, sports, and more. Accordingly, certain studies review and discuss some research verticals of SNA, such as viral marketing, information diffusion, clustering, and link prediction, to provide background knowledge and understanding. These studies still lack coverage of the SNA process, tools, and practical aspects in multidisciplinary applications. Inspired by these facts, we discuss the background, process, tools, and applications of SNA. First, we present a detailed description of the SNA process. Thereafter, we present a comparative analysis of SNA tools and languages. Finally, we discuss the various applications corresponding to SNA research verticals.

A taxonomy and survey of big data in social media

Article

Jul 2023

Examining the particular value of each platform for big data would be difficult because of the variety of social media forms and sizes. Using social media to objectively and subjectively analyze large groups of individuals makes it the most effective tool for this task. There are numerous sources of big data within the organization. Social media can be identified by the interaction and communication it facilitates. Utilizing social media has become a daily occurrence in modern society. In addition, because so many internet users are also active on social media, this frequent use generates data that demonstrates the importance of researching the relationship between big data and social media. We conducted a systematic literature review (SLR) to identify 42 articles published between 2018 and 2022 that examined the significance of big data in social media and upcoming issues in this field. We also discuss the potential benefits of utilizing big data in social media. Our analysis discovered open problems and future challenges, such as high‐quality data, information accessibility, speed, natural language processing (NLP), and enhancing prediction approaches. In our investigation of evaluation metrics for big data in social media, the distribution reveals that 24% is related to data‐trace, 12% to execution time, 21% to accuracy, 6% to cost, 10% to recall, 11% to precision, 11% to F1‐score, and 5% to run-time complexity.

Programming Models and Systems for Big Data Analysis

Article

Jan 2018

Big Data analysis refers to advanced and efficient data mining and machine learning techniques applied to large amounts of data. Research work and results in the area of Big Data analysis are continuously growing, and more and more new and efficient architectures, programming models, systems, and data mining algorithms are being proposed. Taking into account the most popular programming models for Big Data analysis (MapReduce, Directed Acyclic Graph, Message Passing, Bulk Synchronous Parallel, Workflow and SQL-like), we analysed the features of the main systems implementing them. These systems are compared using four classification criteria (i.e. level of abstraction, type of parallelism, infrastructure scale and classes of applications) to help developers and users identify and select the best solution according to their skills, hardware availability, productivity and application needs.

Social media analytics – Challenges in topic discovery, data collection, and data preparation

Article

Apr 2018
INT J INFORM MANAGE

Since an ever-increasing part of the population makes use of social media in their day-to-day lives, social media data is being analysed in many different disciplines. The social media analytics process involves four distinct steps: data discovery, collection, preparation, and analysis. While there is a great deal of literature on the challenges and difficulties involving specific data analysis methods, there is hardly any research on the stages of data discovery, collection, and preparation. To address this gap, we conducted an extended and structured literature analysis through which we identified the challenges addressed and the solutions proposed. The literature search revealed that the volume of data was the challenge most often cited by researchers. In contrast, other categories have received less attention. Based on the results of the literature search, we discuss the most important challenges for researchers and present potential solutions. The findings are used to extend an existing framework on social media analytics. The article provides benefits for researchers and practitioners who wish to collect and analyse social media data.

Big Data Technologies: A Survey

Article

Jun 2017

Developing Big Data applications has become increasingly important in the last few years. In fact, several organizations from different sectors depend increasingly on knowledge extracted from huge volumes of data. However, in the Big Data context, traditional data techniques and platforms are less efficient. They show slow responsiveness and a lack of scalability, performance and accuracy. To face the complex Big Data challenges, much work has been carried out. As a result, various types of distributions and technologies have been developed. This paper is a review that surveys recent technologies developed for Big Data. It aims to help readers select and adopt the right combination of Big Data technologies according to their technological needs and specific applications' requirements. It provides not only a global view of the main Big Data technologies but also comparisons according to different system layers such as the Data Storage Layer, Data Processing Layer, Data Querying Layer, Data Access Layer and Management Layer. It categorizes and discusses the main technologies' features, advantages, limits and usages.

Sentiment based Analysis of Tweets during the US Presidential Elections

Conference Paper

Jun 2017

In a relatively short period of time, social media has gained significant importance as a mass communication and public engagement tool for political and governance purposes. Rapid dissemination of information through social media platforms such as Twitter gives politicians and campaigners the ability to broadcast their message to a wide audience instantly and directly while bypassing the traditional media channels. In this paper, we investigate the nature and characteristics of the political discourse that took place on Twitter during the American Presidential elections of November 2016. The goal of this study is to perform an exploratory sentiment-based analysis of Twitter data gathered both before and after Election Day. Our objective is to identify the nature and sentiment of discussions and to understand the behavior of users with respect to their Twitter profiles and the associated attributes of their tweets. We also aim to inspect popular Twitter discussion topics and their relation to important news and events occurring at the same time.

The impact of social networks on health care

Article

May 2017

Our work examines the risks and benefits stemming from the evolution of Social Network Services (SNSs) in the healthcare domain. More specifically, we study the impact of specific health-oriented social networks such as PatientsLikeMe. Social networks have evolved into a ubiquitous part of daily life, and Web 2.0 paved the way for the internet to be used as a method of interactive communication and information immersion. Health SNSs have the strength to influence healthcare service delivery and information availability, supported by emerging technologies that track, gather and quantify real-time medical data from patients. SNSs support publicly provided information to patients, offering them the power not only to educate themselves but also to take part in the decision-making process concerning their health. On the other hand, healthcare stakeholders have gained access to new information that can help to cut costs, advance research, and improve the healthcare system. However, apart from the unambiguous benefits of SNSs, several risks are identified, such as the violation of patient confidentiality. By incorporating the volumes of data collected by websites like PatientsLikeMe and other Web 2.0 applications, the patient–industry partnership could ensure better products at lower costs. Web 3.0 is the next step toward a health care ecosystem that will evolve out of micro-contributions, creating the most accurate representations of medicine for the stakeholders.

A Systematic Approach on Data Pre-processing In Data Mining

Article

Nov 2013

Data pre-processing is an important and critical step in the data mining process, and it has a huge impact on the success of data mining for soil classification. Data pre-processing is a first step of the knowledge discovery in databases (KDD) process that reduces the complexity of the data and offers better analysis and ANN training. Based on the data collected from the field as well as from the soil testing laboratory, data analysis is performed more accurately and efficiently. Data pre-processing is a challenging and tedious task, as it involves extensive manual effort and time in developing the data operation scripts. There are a number of different tools and methods used for pre-processing, including: sampling, which selects a representative subset from a large population of data; transformation, which manipulates raw data to produce a single input; denoising, which removes noise from data; normalization, which organizes data for more efficient access; and feature extraction, which pulls out specified data that is significant in some particular context. Pre-processing techniques for soil data sets are also useful for classification in data mining.

Big data storage technologies: a survey

Article

Aug 2017

There is a great thrust in industry toward the development of more feasible and viable tools for storing data of fast-growing volume, velocity, and diversity, termed 'big data'. The structural shift of the storage mechanism from traditional data management systems to NoSQL technology is driven by the need to fulfil big data storage requirements. However, the available big data storage technologies are inefficient at providing consistent, scalable, and available solutions for continuously growing heterogeneous data. Storage is the preliminary process of big data analytics for real-world applications such as scientific experiments, healthcare, social networks, and e-business. So far, Amazon, Google, and Apache are some of the industry standards in providing big data storage solutions, yet the literature does not report an in-depth survey of storage technologies available for big data, investigating the performance and magnitude gains of these technologies. The primary objective of this paper is to conduct a comprehensive investigation of state-of-the-art storage technologies available for big data. A well-defined taxonomy of big data storage technologies is presented to assist data analysts and researchers in understanding and selecting a storage mechanism that better fits their needs. To evaluate the performance of different storage architectures, we compare and analyze the existing approaches using Brewer's CAP theorem. The significance and applications of storage technologies and support to other categories are discussed. Several future research challenges are highlighted with the intention to expedite the deployment of a reliable and scalable storage system.

MongoDB: The Definitive Guide

Book

Jan 2010

Managing extracted knowledge from big social media data for business decision making

Article

Apr 2017
J Knowl Manag

Purpose: This paper aims to propose a knowledge management (KM) framework for leveraging big social media data to help interested organizations integrate Big Data technology, social media and KM systems to store, share and leverage their social media data. Specifically, this research focuses on extracting valuable knowledge on social media by contextually comparing social media knowledge among competitors. Design/methodology/approach: A case study was conducted to analyze nearly one million Twitter messages associated with five large companies in the retail industry (Costco, Walmart, Kmart, Kohl’s and The Home Depot) to extract and generate new knowledge and to derive business decisions from big social media data. Findings: This case study confirms that this proposed framework is sensible and useful in terms of integrating Big Data technology, social media and KM in a cohesive way to design a KM system and its process. Extracted knowledge is presented visually in a variety of ways to discover business intelligence. Originality/value: Practical guidance for integrating Big Data, social media and KM is scarce. This proposed framework is a pioneering effort in using Big Data technologies to extract valuable knowledge on social media and discover business intelligence by contextually comparing social media knowledge among competitors.
