Semantic Information Fusion of Linked Open
Data and Social Big Data for the Creation
of an Extended Corporate CRM Database
Ana I. Torre-Bastida, Esther Villar-Rodriguez,
Javier Del Ser, and Sergio Gil-Lopez
Abstract. The amount of on-line available open information from heterogeneous sources and domains is growing at an extremely fast pace, and constitutes an important knowledge base for industries and companies. In this context, two relevant data providers can be highlighted: the "Linked Open Data" and "Social Media" paradigms. The fusion of these data sources (the former structured, the latter raw data), along with the information contained in structured corporate databases within the organizations themselves, may unveil significant business opportunities and a competitive advantage to those who are able to understand and leverage their value. In this paper, we present a use case representing the creation of a knowledge base of existing and potential customers, exploiting social and linked open data, based on which any given organization might infer valuable information as a support for decision making. To achieve this, a solution based on the synergy of big data and semantic technologies is designed and developed. The former implements the tasks of collection and initial data fusion based on natural language processing techniques, whereas the latter performs semantic aggregation, persistence, reasoning and retrieval of information, as well as the triggering of alerts over the semantized information.
Keywords: Big Data, Social Media, Linked Open Data, business intelligence, information fusion, ontology management, information modelling.
1 Introduction and Motivation
Nowadays, organizations need to gather valuable information that will allow them to improve their business processes and optimize their decision making. In this context,
Ana I. Torre-Bastida ·Esther Villar-Rodriguez ·Javier Del Ser ·Sergio Gil-Lopez
TECNALIA, OPTIMA Unit, E-48160 Derio, Spain
e-mail: {isabel.torre,esther.villar,javier.delser,
sergio.gil}@tecnalia.com
© Springer International Publishing Switzerland 2015
D. Camacho et al. (eds.), Intelligent Distributed Computing VIII,
Studies in Computational Intelligence 570, DOI: 10.1007/978-3-319-10422-5_23
212 A.I. Torre-Bastida et al.
business intelligence [1] is the set of strategies, relevant aspects and key technologies for the creation of knowledge about the data environment, through the analysis of these data and their context, with the ultimate aim of facilitating business decision making. However, the principal obstacle to this task is the vast amount of available data and the efficient extraction of useful information from huge repositories. The problem associated with data volume is known as Big Data, where the collection of data sets is so tremendous and complex that processing them with traditional data management tools becomes computationally unaffordable. Two of the most notable data providers in Big Data are Linked Open Data (LOD [2]) and social big data. Social media is becoming an important context-rich information source for organizations, and therefore many business executives consider it an essential challenge to incorporate this user-generated information into their decision-making chain. The goal is for businesses to profit from social platforms such as Wikipedia (DBpedia in the LOD), Facebook or Twitter. Due to the heterogeneity of the received digitized data, which follow non-standard schemas with low accuracy and reliability, a great human effort becomes necessary to extract, format and assimilate them; this constitutes the second major problem, namely the removal of noise from the data content before using it. A third problem arising therefrom is how to merge these datasets with traditional business data, such as relational databases or corporate knowledge systems.
Many projects follow these business intelligence research lines in areas such as brand recognition, competitor analysis or benchmarking [3–5]. However, there are scarce studies applying them to a specific matter such as customer relationship management, towards the identification of potential customers or the improvement and enrichment of existing clients' information. Aimed at filling this gap, our approach defines a system capable of 1) implementing the generation and management of an extended corporate CRM database; 2) solving several related analytics problems (knowledge discovery and aggregation/fusion) stemming therefrom; and 3) exploiting emerging data sources by using the semantic and big-data technology stack. Technically, our system follows a semantic aggregation approach that takes advantage of the structure of the LOD datasets so as to enhance our solution. In detail, the main contributions of our scheme are the following:
- Analysis of the particular business problem of discovering and improving an organization's customer database.
- Exploitation of new data sources (social media, LOD).
- Use of the semantic and big-data technology stack for data collection and aggregation tasks.
In the rest of this paper we first introduce the main concepts related to our approach (Social Media and Linked Open Data) and the closest related work. Then our scheme and its core processing steps are described in detail. Finally, we illustrate a use case to evaluate our system prototype.
Semantic Information Fusion for the Creation of a CRM Database 213
2 Background
The web has recently undergone a transformation in the amount and type of available contents, from which a new paradigm called Big Data has emerged. This new term is used to describe the exponential growth and availability of data, both structured and unstructured. In our approach we use two clear examples of these kinds of datasets: social big data (unstructured) and Linked Open Data (semantically structured).
Nowadays social media platforms are storing enormous amounts of data that have not previously been automatically analyzed and that could reveal critical information. The reason behind this is that the user role has shifted from being a mere consumer to a content provider. Social media is defined by Kaplan and Haenlein [7] as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content". For this reason it can be considered a context-rich source of big data, and is usually referred to as Social Big Data. To better explain this definition we must introduce two concepts: Web 2.0 and User Generated Content (UGC). Web 2.0 describes a new method in which software developers and end-users collaborate on the World Wide Web; that is, content and applications are no longer statically published by an individual, but are continuously modified by a collaborative community of users instead. UGC describes the various forms of media content that are publicly available and created by end-users.
On the other hand, Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. With this definition, Bizer et al. [6] defined the linked data paradigm and provided a mechanism for building the Web of Data, which is based on semantic Web technologies and may be considered a simplified version of the Semantic Web. The data model for representing interlinked data is RDF [3], where data is represented as node- and edge-labeled directed graphs. Some published Linked Data sets contain billions of triples, and their number is steadily increasing, yielding the Linked Data Cloud, i.e. a group of data sets available on the Web as Linked Data with links pointing at other Linked Data sets.
2.1 Related Work
In business intelligence – especially in the area of competitive information (CI) –
the data gathering process can involve a large number of research areas regarding
to technologies and strategies, which have unchained an intense activity in the re-
lated literature. Our approach deals with social big data which has been broadly
adopted to nourish data analytic systems. Studies as the one by Rappaport in [8] in-
troduces the essential role it can take to exploit social media in the field of business
intelligence, presenting cases of study in which the social media data is turned into
business advantages. In [9] a preliminary study about using text-mining techniques
in the task of collaborative intelligence information gathering is presented. The main
difference with our approach lies on the used techniques (our work resorts to
semantic fusion and big data technologies) and the application domain, which in our case is the specific example of creating a knowledge base of customers. Another work related to our scheme is the one presented by Shroff et al. in [10], which describes a framework for the fusion of business intelligence in various industries such as manufacturing, retail or insurance. Once again, and unlike our proposal, this contribution hinges on a global, general case-based implementation without concentrating on a specific problem. Furthermore, the artificial intelligence techniques used in their work are the blackboard architecture and locality-sensitive hashing, which are far from our semantic fusion approach. Another interesting and more specific work is presented by Agichtein et al. in [11], which elaborates on a high-quality social media information gathering scheme, but only manages data posted on the Yahoo! Answers social platform. Other investigations also discuss the advantages of data fusion on information collected from social media, as in [12], in which multiple features of the social media environment (textual, visual and user information) are fused for later use in a retrieval algorithm for large social media data (Flickr). Likewise, in [13] a use case of a shared on-line calendar is presented and enhanced with events generated by user social networks and location data using fusion techniques. Further, we highlight the work presented by Kim et al. in [14], due to the fact that it is the only one that uses semantic fusion techniques. However, its purpose is out of the scope of business intelligence and it does not provide enough technical details; its methodology and assessment are deemed insufficient for a fair comparison with the technical proposal presented next. Finally, we analyze the work by Hoang et al. in [15], a survey about the technologies and applications of Linked Data mashups, as well as the challenges to build them. In that paper a use case close to our approach is presented, since both use semantic technologies for integration. The main difference lies in the architecture (they use semantic web pipes, whereas our approach uses ontology mapping/alignment techniques for the semantic integration) and the datasets (they exploit Freebase and do not use social media data sources).
3 Information Collection from the Web
Our system collects external information, such as company related tweets, customer
feedback (comments) or business related open data and merges this data with tra-
ditional enterprise databases, with the aim at storing these aggregated information
following an adequate business semantic model. This section delves into the first of
these tasks, the information collection from two different on-line available sources:
the Social Media and Linked Open Data Cloud, as well as into the subsequent filters
to extract their relevance:
1. Social Big Data: at this point the data collected from two streaming social platforms is selected.
a. Facebook posts from specific user-ids are the considered data, extracting the comments generated by these users.
b. Twitter feeds containing certain keywords or coming from specific user-ids. These keywords are extracted from the corporation documents and website using a TF/IDF approach.
The technology used to perform these tasks is the Facebook1 and Twitter2 streaming application programming interfaces (APIs).
2. Linked Open Data Cloud: there are several datasets related to the business domain, such as DBpedia, CrunchBase or Freebase, which can be queried via the SPARQL query language or web services. From these datasets, structured information about customers is obtained, which is later mapped to the semantic model of our system.
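As an illustration of the keyword extraction mentioned in item 1b, the TF/IDF ranking over a corpus of corporate documents can be sketched in plain Python. This is a minimal sketch, not the actual implementation used in the system; the function name and toy corpus are ours.

```python
import math
from collections import Counter

def tfidf_keywords(documents, top_k=3):
    """Rank terms by accumulated TF-IDF over a corpus and return the top ones.

    `documents` is a list of token lists (e.g. tokenized corporate documents
    or website pages). Terms occurring in every document get an IDF of zero
    and are thus never selected as discriminative keywords.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    # Accumulate TF * IDF per term over all documents.
    scores = Counter()
    for doc in documents:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(top_k)]
```

A term such as "energy" that appears in every document of the corpus scores zero and is filtered out, which is precisely why TF/IDF surfaces the terms that characterize the corporation rather than generic vocabulary.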
The data collection process is detailed in Figure 1. This task is composed of three sub-processes: data collection and noise reduction, extraction of disambiguated entities, and harvesting of the related entities' information available as open data.
In a first step, the different social media data streams are captured using the aforementioned APIs. Next, the posts (from the Twitter or Facebook platforms) are preprocessed. At this stage we use the Freeling API3 to carry out the language analysis, calculating their corresponding synsets (i.e. groups of data elements that are considered to be semantically equivalent, each represented by an identifier). The collection of pairs formed by each post and its synsets are the input events to a set of rules that allow deducing whether the post (tweet/comment) can be considered within the business domain. This stage is what we have coined the NOISE filter. This filter, composed of a set of rules, is implemented with the Esper CEP engine4, built up from a set of synsets constructing a context that describes and models the business domain, for example concept: business; synsets: 08056231, 08058098. If any of the synsets belonging to the ongoing post can be matched to any of the synsets that form the context, the rule is activated and the post is filtered as pertinent. Otherwise the post is discarded, since it is assumed that its content is not about anything related to the business world.
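The matching rule behind the NOISE filter can be sketched as a simple synset intersection test. This is an illustrative sketch of the rule semantics only (the actual system evaluates such rules inside the Esper CEP engine); the synset codes follow the example in the text, and the function name is ours.

```python
# Business-domain context: concept name -> synset identifiers (codes as in
# the paper's example; the "business" entry mirrors the sample context).
BUSINESS_CONTEXT = {
    "organization": {"08056231", "08058098"},
    "business": {"08061042"},
}

def noise_filter(post_synsets, context=BUSINESS_CONTEXT):
    """Return True (post is pertinent) if the post shares at least one synset
    with any concept of the business context, False (discard) otherwise."""
    context_synsets = set().union(*context.values())
    return bool(set(post_synsets) & context_synsets)
```

A post whose synsets overlap the context activates the rule and passes through; any other post is treated as noise and dropped before the costlier NER stage.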
Once posts have gone through the noise filter, the result will be deemed valuable, since it is likely to provide meaningful information about the domain; it is then fed to the subprocess in charge of entity extraction. Named Entity Recognition (NER) refers to the module or function in charge of detecting any kind of entity, such as cities, organizations or people, and is mostly utilized by NLP utilities as a contributor of semantic information. In our case, the filtered posts can contain
1 https://developers.facebook.com/docs/graph-api
2 https://dev.twitter.com/docs/api/streaming
3 http://nlp.lsi.upc.edu/freeling/
4 http://esper.codehaus.org/
[Figure 1 depicts the data collection process flow: tweets retrieved by keyword, tweets retrieved by user-id and Facebook posts retrieved by user-id are passed through the NOISE filter (CEP); the filtered posts (tweet/fb_post plus customer, or tweet plus keyword) feed the NER entity extraction stage (Map/Reduce); the extracted entities are then used for open data collection (SPARQL/REST) against the LOD datasets (DBpedia, Freebase, CrunchBase), yielding the entities and their related information.]
Fig. 1 Data collection process flow
any named entity corresponding to an already existing customer, a potential client or even a competitor working in the same market sectors. For this purpose, the Daedalus Topic Extraction API has been selected, integrating it in a Map-Reduce framework to parallelize the algorithm responsible for extracting entities. The output obtained from the Map-Reduce job is a set of entities grouped by post. Finally, for each of the previously extracted entities, we collect the information available in the Linked Open Data sets (Freebase, DBpedia) and other open data sets such as CrunchBase. This information is merged and aggregated with the existing data from the corporation's relational databases, with the final aim of feeding the semantic model.
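The shape of this Map-Reduce job can be sketched in plain Python: the map phase emits (post, entity) pairs and the reduce phase groups entities by post. This is an illustrative sketch under our own assumptions; `extract_entities` is a hypothetical stand-in for the external Topic Extraction service, and here it simply treats capitalized tokens as named entities.

```python
from collections import defaultdict

def extract_entities(text):
    """Hypothetical placeholder for the external entity-extraction service:
    a toy heuristic that keeps capitalized tokens as named entities."""
    return [tok.strip(".,") for tok in text.split() if tok[:1].isupper()]

def map_phase(post):
    """Map step: for one (post_id, text) pair, emit (post_id, entity) pairs."""
    post_id, text = post
    return [(post_id, entity) for entity in extract_entities(text)]

def reduce_phase(pairs):
    """Reduce step: group the emitted entities by post."""
    grouped = defaultdict(set)
    for post_id, entity in pairs:
        grouped[post_id].add(entity)
    return dict(grouped)
```

In the real framework the map tasks run in parallel over partitions of the filtered posts, but the contract between the two phases is exactly the one shown: pairs out of map, per-post entity sets out of reduce.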
4 Semantic Fusion: Aggregation, Model and Interlinking
The semantic aggregation process has two main goals: to improve the existing information about the organization's customers and to discover new potential customers. The entire process is detailed in Figure 2. First of all, a classification process is applied to each post to determine whether its contents relate to any entity existing in the semantic data model. Depending on the result of this classification, the system follows two alternative flows. In the positive case, the semantic model is updated with the new information about the customer and its partnerships/relationships. Otherwise, the data gathered from the Linked Open Data Cloud is mapped into a
[Figure 2 depicts the semantic data aggregation flow: the entities extracted per post, together with the corporation RDBMS, feed a classification step; existing customer information leads to updating the customer and its relationships in the semantic model, whereas potential entities' information goes through LOD and relational information mapping, creating new customers and relationships via the semantic model updating step; all results are persisted in the RDF repository holding the semantic model.]
Fig. 2 Semantic data aggregation
new instance within the semantic model. These processes are supported by a set of previously computed semantic links between our model and the LOD dataset vocabularies, which are calculated following the ontology alignment process proposed by the authors in [16].
With regard to the definition of our model schema, well-known semantic vocabularies are reused to promote interoperability with other RDF repositories or datasets. Our ontology model is based on the combination of the schema.org ontology along with that used in DBpedia, plus vocabularies such as SKOS to specify semantic relationships and links. New classes or properties are also modeled in case the existing vocabularies do not provide their definition. Finally, the new instances of the semantic data model are stored in the OpenLink Virtuoso RDF repository5.
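The mapping of harvested LOD properties into instances of our model can be sketched as a lookup over the precomputed property alignments. This is a minimal illustrative sketch, not the system's implementation: the `d:` namespace comes from the use-case figure, while the alignment table, the `d:employees` property and the function name are hypothetical examples of ours.

```python
# Namespace of our model (as used in the use-case example of Section 6).
D = "http://datafusion.org/ontology/"

# Hypothetical property alignments, standing in for the semantic links
# precomputed by the ontology alignment process of [16].
PROPERTY_MAP = {
    "foaf:homepage": D + "url",
    "dbpedia-owl:numberOfEmployees": D + "employees",
}

def map_entity(entity_uri, lod_properties, rdf_type=D + "energycompany"):
    """Translate harvested LOD key/value pairs into triples of our model,
    typing the new instance and skipping properties with no known alignment."""
    triples = [(entity_uri, "rdf:type", rdf_type)]
    for prop, value in lod_properties.items():
        if prop in PROPERTY_MAP:
            triples.append((entity_uri, PROPERTY_MAP[prop], value))
    return triples
```

Unaligned LOD properties are silently dropped here; in the full system such gaps are where new classes or properties would be modeled, as noted above.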
5 Information Retrieval, Inference and Alert Generation
Once the information has been converted to RDF format following our semantic model and saved in the RDF repository, some added-value operations can be implemented over the stored data, such as the following:
- Information retrieval: in our case SPARQL, the current W3C recommendation for querying RDF data, is selected to allow users to perform selective queries. In

5 http://virtuoso.openlinksw.com/
our system, the SPARQL endpoint provided by the Virtuoso RDF repository and the JENA API6 are the chosen tools for implementing this module.
- Inference: based on the information stored in the repository, semantic inference processes (RDFS and OWL) can be performed with the aim of discovering new relationships. This task can be accomplished by semantic reasoners like Pellet [17], in combination with the JENA API. This process also allows for the definition of specific business semantic rules implemented using the SWRL (Semantic Web Rule Language, combining OWL and RuleML) language.
- Alert generation: finally, an alert generation module is responsible for monitoring the data and triggering events that indicate that the conditions specified in an alert have been fulfilled. For its implementation, a listener is utilized during the loading and inference process that detects whether the alert conditions have been met.
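The listener-based alert mechanism can be sketched as a callback invoked for every triple that is loaded or inferred. This is an illustrative sketch under our own assumptions; the condition shown (detecting a hypothetical `d:potentialCustomer` typing) and all names are ours, not the system's actual rule vocabulary.

```python
def make_alert_listener(condition, on_alert):
    """Return a callback to be invoked for every (s, p, o) triple loaded or
    inferred; it fires `on_alert` when the triple satisfies the condition."""
    def listener(triple):
        if condition(triple):
            on_alert(triple)
    return listener

# Example: collect an alert whenever a new potential customer is asserted.
fired = []
listener = make_alert_listener(
    condition=lambda t: t[1] == "rdf:type" and t[2] == "d:potentialCustomer",
    on_alert=fired.append,
)
```

Because the listener runs during loading and inference alike, alerts can be triggered both by explicitly collected facts and by relationships discovered by the reasoner.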
6 Use Case
This section describes in detail an illustrative example of the process followed by
our system, from the moment the data are collected until the information is retrieved by a SPARQL query.

[Figure 3 summarizes the data collection example. The inputs are tweets retrieved by TF/IDF keywords ("automotive", "energy", "IT", ...), tweets retrieved by the user-ids extracted from the relational database (e.g. "iberdrola"), and Facebook posts retrieved by user-id; among the sample posts are a tweet announcing that @Gamesa_Official wins a contract to supply 20 MW to Energa in Poland, Iberdrola posts mentioning its Scottish subsidiary ScottishPower and its Brazilian subsidiary Elektro, the Votkinskaya substation built for RusHydro, and a new science and technology plan with @Innobasque, @AlianzaIK4, @tecnalia, @jakiunde and @idom. After NLP preprocessing, the NOISE filter retains only the posts matching the business context (e.g. organization -> 08056231, 08058098; business -> 08061042). NER processing then extracts the entities (Gamesa_Official, Energa; ScottishPower; RusHydro; Innobasque, AlianzaIK4, Tecnalia, Jakiunde, Idom; ...), and the open data collection stage queries Freebase, DBpedia and CrunchBase for each of them, e.g. with the DBpedia SPARQL lookup

SELECT ?thing
WHERE {
  ?thing rdfs:label ?name.
  FILTER(regex(str(?name), "Iberdrola", "i")) }

yielding entries such as ScottishPower [foaf:homepage http://www.scottishpower.com/; dbpedia-owl:numberOfEmployees 9953; ...] and Energa [dbpedia-owl:country dbpedia:Poland; ...].]
Fig. 3 Data collection example

[Figure 4 shows the semantic data model generated by the LOD and relational information mapping and the semantic model updating steps, persisted in the RDF repository, with instances such as (using PREFIX d: <http://datafusion.org/ontology/>):

org1 a d:energycompany.
org1 rdfs:label "ScottishPower".
web1 a d:website.
web1 d:url <http://www.scottishpower.com/>.
org1 d:contact web1.
org2 a d:energycompany.
org2 rdfs:label "Energa".

It also shows an example SPARQL execution over this model:

PREFIX d: <http://datafusion.org/ontology/>
SELECT ?org ?name ?subj ?name2
WHERE {
  ?org a d:organization.
  ?org rdfs:label ?name.
  ?subj a d:subject.
  ?subj rdfs:label ?name2.
  ?org d:relatedTo ?subj. }

which returns, among others: org1 "ScottishPower" subj1 "energy"; org2 "Energa" subj1 "energy"; org3 "Gamesa" subj1 "energy"; org4 "RusHydro" subj1 "energy"; org5 "AlianzaIK4" subj1 "energy", subj2 "industry", subj3 "IT".]
Fig. 4 Generated semantic data model and SPARQL execution example

6 https://jena.apache.org/documentation/query/

The data collection process is shown in Figure 3. The first input is
the real data retrieved from Twitter and Facebook. Tweets and posts are preprocessed to transform them into synsets, as explained in Section 3. These synsets are filtered (a noise filter for irrelevant data) using a macro consisting of a set of synsets representing the business domain (the business context in the figure). The filtered tweets and posts are then subject to a named entity recognition procedure aimed at extracting the entities, so as to collect for them the information available on the LOD.
Finally, the data model and instances generated by the semantic aggregation process, and an example of information retrieval using a SPARQL sentence, are shown in Figure 4. As shown in the figure, the query returns a list of all organizations and their related subjects. In this context it must be noted that although ScottishPower is annotated as energycompany, this entity is also returned by the query, because in the ontological model (see Figure 2) an energycompany is categorized as a subclass of organization. This unveils one of the advantages of using a semantic model for information retrieval.
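The subclass-based retrieval just described can be sketched as a minimal RDFS subsumption check: an instance typed as a subclass is also returned when querying for the superclass. This is an illustrative sketch of the entailment only (the real system delegates it to the reasoner); the function name is ours, and the subclass table assumes an acyclic hierarchy.

```python
# Fragment of the ontology model: energycompany is a subclass of organization.
SUBCLASS_OF = {"d:energycompany": "d:organization"}

def instances_of(cls, triples, subclass_of=SUBCLASS_OF):
    """Return the subjects typed as `cls`, either directly or through an
    rdfs:subClassOf chain (assumed acyclic)."""
    result = set()
    for s, p, o in triples:
        if p != "rdf:type":
            continue
        c = o
        while c is not None:       # walk up the subclass chain
            if c == cls:
                result.add(s)
                break
            c = subclass_of.get(c)
    return result
```

Querying for d:organization thus returns ScottishPower even though it was only asserted as a d:energycompany, mirroring the behavior of the SPARQL query of Figure 4 under RDFS inference.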
7 Concluding Remarks and Future Research
This manuscript has addressed the problem of automatically creating and managing a customer database from a novel perspective: semantic aggregation. Input data come from new sources such as social media and Linked Open Data. Furthermore, different modules have been implemented leveraging the Big Data (Map-Reduce,
Complex Event Processing) and semantic web (RDF repository, reasoner, SWRL) technology stacks. A use case exemplifies the multiple possibilities our approach offers to a corporation, ranging from the discovery of new customers to the expansion of the knowledge base of traditional clients. This brings profitable advantages in the business domain, where decision making is a critical process and the collection of customer information is a key factor.
Future work will be devoted to the study of new applications and to enlarging the technical scope of this semantic aggregation, e.g. so as to also include projects referencing entities, business concepts or places, and properties that can be matched to relationships of the model inferred from the posts, thanks to the development of new algorithms using NLP and classification techniques. Furthermore, multilingual features will also be considered for inclusion in the platform.
References
1. Moss, L.T., Atre, S.: Business Intelligence Roadmap: The Complete Project Lifecycle for
Decision-Support Applications. Addison-Wesley (2003)
2. Bizer, C., Heath, T., Idehen, K., Berners-Lee, T.: Linked data on the web (LDOW2008). In:
Proceedings of the 17th International Conference on World Wide Web, pp. 1265–1266 (2008)
3. Hoffman, D.L., Fodor, M.: Can you measure the ROI of your social media marketing. MIT
Sloan Management Review 52(1), 41–49 (2010)
4. Vuori, V.: Social media changing the competitive intelligence process: elicitation of employ-
ees’ competitive knowledge. Tampereen teknillinen yliopisto. Julkaisu-Tampere University
of Technology. Publication; 1001 (2011)
5. Bingham, T., Conner, M.: The new social learning: A guide to transforming organizations
through social media. Berrett-Koehler Publishers (2010)
6. Bizer, C., Heath, T., Berners-Lee, T.: Linked data-the story so far. International Journal on
Semantic Web and Information Systems 5(3), 1–22 (2009)
7. Kaplan, A.M., Haenlein, M.: Users of the world, unite! The challenges and opportunities of
Social Media. Business Horizons 53(1), 59–68 (2010)
8. Rappaport, S.D.: Listen First!: Turning Social Media Conversations Into Business Advantage.
John Wiley and Sons (2011)
9. Dey, L., Haque, S.M., Khurdiya, A., Shroff, G.: Acquiring competitive intelligence from
social media. In: Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics
for Noisy Unstructured Text Data, p. 3. ACM (2011)
10. Shroff, G., Agarwal, P., Dey, L.: Enterprise information fusion for real-time business intelli-
gence. In: IEEE International Conference on Information Fusion (FUSION), pp. 1–8 (2011)
11. Agichtein, E., Castillo, C., Donato, D., Gionis, A., Mishne, G.: Finding high-quality content
in social media. In: Proceedings of the 2008 International Conference on Web Search and
Data Mining, pp. 183–194. ACM (2008)
12. Cui, B., Tung, A.K., Zhang, C., Zhao, Z.: Multiple feature fusion for social media applica-
tions. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management
of Data, pp. 435–446. ACM (2010)
13. Lovett, T., O’Neill, E., Irwin, J., Pollington, D.: The calendar as a sensor: analysis and im-
provement using data fusion with social networks and location. In: Proceedings of the 12th
ACM International Conference on Ubiquitous Computing, pp. 3–12. ACM (2010)
14. Kim, H., Son, J., Jang, K.: Semantic Data Fusion: from Open Data to Linked Data. In: Pro-
ceedings of the European Semantic Web Conference (2013)
15. Hanh, H.H., Tai, N.C., Duy, K.T., Dosam, H., Jason, J.J.: Semantic Information Integration
with Linked Data Mashups Approaches. International Journal of Distributed Sensor Net-
works 2014, Article ID 813875 (2014)
16. Torre-Bastida, A.I., Villar-Rodriguez, E., Del Ser, J., Camacho, D., Gonzalez-Rodriguez,
M.: On Interlinking Linked Data Sources by Using Ontology Matching Techniques and the
Map-Reduce Framework. In: Corchado, E., Yin, H. (eds.) IDEAL 2014. LNCS, vol. 8669,
pp. 53–60. Springer, Heidelberg (2014)
17. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web 5(2), 51–53 (2007)
... The underlying common language for services is focused on a collection of ontologies that allow for the representation and reasoning of various objects, circumstances and possible threats, and so on. In [131], a use case is proposed that represents the development of a current and future consumer knowledge base, leveraging of social and connected open data on the basis of which any company could infer useful information as a decision-making support. Semantic technologies perform semantic aggregation, persistence, reasoning and retrieval of information, as well as the triggering of alerts over the semantized information. ...
Chapter
Full-text available
In this chapter, we provide an overview of the current trends in using semantic technologies in the IoT domain, presenting practical applications and use cases in different domains, such as in the healthcare domain (home care and occupational health), disaster management, public events, precision agriculture, intelligent transportation, building and infrastructure management. More specifically, we elaborate on semantic web-enabled middleware, frameworks and architectures (e.g. semantic descriptors for M2M) proposed to overcome the limitations of device and data heterogeneity. We present recent advances in structuring, modelling (e.g. RDFa, JSON-LD) and semantically enriching data and information derived from sensor environments, focusing on the advanced conceptual modelling capabilities offered by semantic web ontology languages (e.g. RDF/OWL2). Querying and validation solutions on top of RDF graphs and Linked Data (e.g. SPARQL, SPIN and SHACL) are also presented. Furthermore, insights are provided on reasoning, aggregation, fusion and interpretation solutions that aim to intelligently process and ingest sensor information, infusing also human awareness for advanced situational awareness.
... IV. MOTIVATING SCENARIO Nowadays, organizations and industries need to gather valuable information that will allow them to improve their business processes and optimize their decision-making [25]. In this context, we are investigating how semantic technologies can be used to leverage the effort of sales and marketing staff to help them be more responsive to customers and to identify new sales opportunities. ...
Conference Paper
Full-text available
The Web has recently undergone a transformation in the amount and type of available information. The emerging Linked Open Data Cloud (LOD) contains hundreds of published structured data sources. These sources are open to the public and can be accessed from various SPARQL endpoints. On the other hand, social platforms store enormous amounts of data that could reveal critical information. This information is commonly supplied through REST APIs in a semi-structured format. The access, retrieval, and utilization of these different data models on the Web impose a need for the data to be integrated, providing users with a single entry point to them. In this paper, we explore the major challenges in this area and discuss the limitations of some existing integration solutions and tools. We also propose a semantic virtual integration architecture to link corporate internal CRM databases with large-scale Social Data and Linked Data.
... IV. MOTIVATING SCENARIO Nowadays, organizations and industries need to gather valuable information that will allow them to improve their business processes and optimize their decision-making [25]. In this context, we are investigating how semantic technologies can be used to leverage the effort of sales and commercial staff to help them be more responsive to customers, and to identify new sales opportunities. ...
Conference Paper
Full-text available
Today, there is a huge amount of valuable data on the web that organizations or users can exploit to improve their decision-making process. With the advent of Linked Open Data (LOD), many structured data sources have been published and can be accessed from various SPARQL endpoints. On the other hand, social data is becoming an important source of context-rich information, which Web APIs generally provide in a semi-structured format. The access, retrieval, and utilization of these different data models on the web impose a need for the data to be integrated, providing users with a single entry point to them. In this paper, we explore the major challenges in this area and discuss the limitations of some existing integration solutions and tools. We also propose a modular mediator-based architecture to integrate heterogeneous enterprise data with large-scale Social Data and Linked Data.
Article
Advances in semantic web technologies have rocketed the volume of linked data published on the web. In this regard, linked open data (LOD) has long been a topic of great interest in a wide range of fields (e.g. open government, business, culture, education, etc.). This article reports the results of a systematic literature review on LOD. 250 articles were reviewed to provide a general overview of the current applications, technologies, and methodologies for LOD. The main findings include: i) most of the studies conducted so far focus on the use of semantic web technologies and tools applied to contexts such as biology, social sciences, libraries, research, and education; ii) there is a lack of research with regard to a standardized methodology for managing LOD; and iii) plenty of tools can be used for managing LOD, but most of them lack user-friendly interfaces for querying datasets.
Article
Full-text available
The web diversification into the Web of Data and social media means that companies need to gather all the necessary data to help make the best-informed market decisions. However, data providers on the web publish data in various data models and may equip it with different search capabilities, thus requiring data integration techniques to access them. This work explores the current challenges in this area, discusses the limitations of some existing integration tools, and addresses them by proposing a semantic mediator-based approach to virtually integrate enterprise data with large-scale social and linked data. The implementation of the proposed approach is a configurable middleware application and a user-friendly keyword search interface that retrieves its input from internal enterprise data combined with various SPARQL endpoints and Web APIs. An evaluation study was conducted to compare its features with recent integration approaches. The results illustrate the added value and usability of the contributed approach.
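The mediator idea described in the abstracts above can be sketched as a single keyword entry point that fans out to per-source adapters and merges their results on a shared key. The adapters here are stubs with hard-coded sample rows (in practice they would wrap the internal CRM, a SPARQL endpoint and a Web API); all names and fields are illustrative assumptions.

```python
# Toy mediator: one keyword search merged across heterogeneous sources.

def crm_adapter(keyword: str) -> list:
    """Stub for the internal enterprise CRM source."""
    rows = [{"name": "Acme Corp", "sector": "retail"}]
    return [r for r in rows if keyword.lower() in r["name"].lower()]

def lod_adapter(keyword: str) -> list:
    """Stub for a Linked Data source; a real adapter would issue a
    SPARQL query against a public endpoint."""
    rows = [{"name": "Acme Corp", "dbpedia": "dbr:Acme_Corporation"}]
    return [r for r in rows if keyword.lower() in r["name"].lower()]

def mediator(keyword: str) -> dict:
    """Fan out to all adapters and merge rows on the shared 'name' key."""
    merged: dict = {}
    for adapter in (crm_adapter, lod_adapter):
        for row in adapter(keyword):
            merged.setdefault(row["name"], {}).update(row)
    return merged

result = mediator("acme")
```

The design point mirrors the papers: each source keeps its native access method behind an adapter, and only the mediator layer knows how to join the results.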
Article
Nowadays, research in Linked Data has advanced significantly, and it now spans an enormous range of applications, such as research publications and data sets. Its flexibility and effectiveness in handling and linking data from numerous sources have made Linked Data more popular. The aim of this article is to systematically review the literature on Linked Data and its development since 2009. Moreover, cumulative experiences and lessons learned from recent years are highlighted. Findings showed that Linked Data has grown in the past five years in terms of the number of datasets, research publications and domain-specific applications.
Article
Full-text available
The introduction of the semantic web and Linked Data makes sharing data on the Internet easier. Subsequently, the Resource Description Framework (RDF) has become the standard for publishing structured data resources on the Internet and is used to interconnect them with other data resources. To remedy the data integration issues of traditional web mashups, semantic web technology uses Linked Data based on the RDF data model as the unified data model for combining, aggregating, and transforming data from heterogeneous data resources to build Linked Data mashups. The semantic web community has made tremendous efforts to enable Linked Data mashups, but there is still a lack of a systematic survey of concepts, technologies, applications, and challenges. Therefore, in this paper we investigate in detail semantic mashup research and application approaches to information integration. The paper also presents a Linked Data mashup application as an illustration of the proposed approaches.
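The "unified data model" step of such a mashup can be sketched as lifting heterogeneous records into a common subject-predicate-object form so they can be aggregated under one subject. The sample records, keys and the `ex:city/` prefix below are illustrative assumptions, not taken from the paper.

```python
# Lift two heterogeneous source records into shared triples.

api_record = {"id": "42", "title": "Berlin", "population": 3700000}
csv_record = {"city_id": "42", "country": "Germany"}

def lift(record: dict, subject_key: str, prefix: str) -> list:
    """Turn a flat record into (subject, predicate, object) triples,
    using one field as the shared subject identifier."""
    subj = f"{prefix}{record[subject_key]}"
    return [(subj, k, str(v)) for k, v in record.items() if k != subject_key]

triples = (lift(api_record, "id", "ex:city/")
           + lift(csv_record, "city_id", "ex:city/"))

# Both sources now share the subject 'ex:city/42' and can be aggregated.
merged = {p: o for s, p, o in triples if s == "ex:city/42"}
```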
Article
Full-text available
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Article
Full-text available
Competitive intelligence is the art of defining, gathering and analyzing intelligence about competitors' products, promotions, sales, etc. from external sources. The Web stands out as an important source for gathering competitive intelligence. News, blogs, as well as social media not only provide competitor information but also provide direct comparisons of customer behavior with respect to different verticals among competing organizations. This paper discusses methodologies to obtain competitive intelligence from different types of web resources, including social media, using a wide array of text mining techniques. It provides some results from case studies to show how the gathered information can be integrated with structured data and used to explain business facts, and thereby adopted for future decision making.
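One elementary text-mining step of the kind described above is counting competitor mentions across scraped social posts. The competitor names and sample posts are illustrative assumptions; a real pipeline would add proper tokenisation, entity resolution and sentiment analysis.

```python
# Count competitor mentions in a handful of sample social posts.
from collections import Counter

competitors = {"acme", "globex"}
posts = [
    "Switched from Acme to Globex last month, no regrets",
    "Acme's new promotion looks great",
]

mentions: Counter = Counter()
for post in posts:
    # Crude normalisation: lowercase, drop possessives and punctuation.
    for token in post.lower().replace("'s", "").split():
        token = token.strip(",.!?")
        if token in competitors:
            mentions[token] += 1
```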
Article
Full-text available
This paper argues that social media metrics should be captured as customer investments in marketers’ social media efforts and that applications considered in concert with performance objectives drive the choice of metrics. Motivating this approach are the “four c’s” that drive consumer use of social media. These include the connections consumers make with each other, the user-generated content they create, their consumption of other users’ content and their control of their own online experiences. Social media metrics that are linked to three broad social media performance objectives are identified for eight general categories of social media applications and the paths managers have for improving social media effectiveness that rely on using these metrics are discussed.
Conference Paper
The Web is increasingly understood as a global information space consisting not just of linked documents, but also of Linked Data. More than just a vision, the resulting Web of Data has been brought into being by the maturing of the Semantic Web technology stack, and by the publication of an increasing number of data sets according to the principles of Linked Data. The Linked Data on the Web (LDOW2008) workshop brings together researchers and practitioners working on all aspects of Linked Data. The workshop provides a forum to present the state of the art in the field and to discuss ongoing and future research challenges. In this workshop summary we will outline the technical context in which Linked Data is situated, describe developments in the past year through initiatives such as the Linking Open Data community project, and look ahead to the workshop itself.
Thesis
Competitive intelligence process aims to provide actionable information about the external business environment to back up decision-making in companies. The effects that the rise of social media may have on competitive intelligence are a topic of interest to both practice and theory. The main objectives of this dissertation are to understand how social media changes the competitive intelligence process and how it can enhance the elicitation of employees' competitive knowledge. The research questions are studied using both theoretical and empirical research approaches. The empirical study consists of three data sets complementing each other, adopting several methods and perspectives. The results of the dissertation suggest that social media has an effect on companies' information environment, as the widespread use of social media produces more voluminous and more versatile information than before. In the competitive intelligence context this influences information gathering especially: social media for its part increases the available information sources, but it also offers technologies to automate some parts of information gathering and processing. In addition, the use of suitable social media tools can affect the elicitation of employees' competitive knowledge and make competitive knowledge more visible in a company. Social media provides an opportunity to implement the competitive intelligence process in a participative and collaborative way, engaging employees in the process. The role of the employees shifts to that of more active participants shaping the collaborative understanding by contributing their competitive knowledge to the process as well as benefiting more from others' competitive knowledge. However, the success of using social media in better utilising and sharing employees' competitive knowledge relies heavily on the utility, perceived usefulness and affordance of the tools, as well as how motivated the employees are to use them for knowledge sharing.
The main motivating factors and barriers are in line with those regarding general knowledge sharing. The main contributions include increasing knowledge on the connection between social media and competitive intelligence: how the emergence of social media affects carrying out the competitive intelligence process and especially sharing of employees’ competitive knowledge. In addition, the research reveals the motivational factors and barriers related to employees’ willingness to use social media for sharing competitive knowledge. The findings also have practical managerial implications for companies planning to adopt social media for competitive knowledge sharing, as they provide means for them to prepare the conditions for successful utilisation and active employee participation.
Conference Paper
Interlinking different data sources has become a crucial task due to the explosion of diverse, heterogeneous information repositories in the so-called Web of Data. In this paper an approach to extract relationships between entities existing in huge Linked Data sources is presented. Our approach hinges on the Map-Reduce processing framework and context-based ontology matching techniques so as to discover the maximum number of possible relationships between entities within different data sources in a computationally efficient fashion. To this end the processing flow is composed of three Map-Reduce jobs in charge of 1) the collection of linksets between datasets; 2) context generation; and 3) the construction of entity pairs and similarity computation. In order to assess the performance of the proposed scheme, an exemplifying prototype is implemented between the DBpedia and LinkedMDB datasets. The obtained results are promising and pave the way towards benchmarking the proposed interlinking procedure against other ontology matching systems.
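The three-stage flow described above can be emulated with plain functions instead of a real Map-Reduce cluster. The entity names and context terms are illustrative assumptions; the similarity used here is Jaccard overlap of each entity's context set, one common choice in context-based ontology matching, not necessarily the paper's exact measure.

```python
# Stage 1: linksets collected from two datasets (entity -> context terms).
dbpedia = {"dbr:Blade_Runner": {"film", "scott", "1982", "scifi"}}
lmdb = {"lmdb:film/123": {"film", "scott", "scifi", "dick"}}

# Stage 2: context generation (the term sets above already play that role
# in this toy version; the real job derives them from neighbouring triples).

# Stage 3: build cross-dataset entity pairs and compute similarity.
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)

pairs = [
    (e1, e2, jaccard(c1, c2))
    for e1, c1 in dbpedia.items()
    for e2, c2 in lmdb.items()
]
best = max(pairs, key=lambda t: t[2])
```

In the Map-Reduce version, stages 1-3 each become a map/reduce job so that the pairwise comparisons are sharded across workers.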
Conference Paper
The shared online calendar is the de facto standard for event organisation and management in the modern office environment. It is also a potentially valuable source of context, provided the calendar event data represent an accurate account of 'real-world' events. However, as we show through a field study, the calendar does not represent reality well, as genuine events are hidden by a multitude of reminders and 'placeholders', i.e. events that appear in the calendar but do not occur. We show that the calendar's representation of real events can be significantly improved through data fusion with other sources of context, namely social network and location data. Finally, we discuss some of the issues raised during our field study, their significance and how performance could be further improved.
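The fusion step described above can be sketched as keeping a calendar event only when another context source corroborates it; here location check-ins stand in for that second source. The event fields and check-in data are illustrative assumptions, not the paper's dataset.

```python
# Filter placeholder calendar events via fusion with location check-ins.

events = [
    {"title": "Team meeting", "room": "A1", "attendees": ["bob", "eve"]},
    {"title": "Pay invoice", "room": None, "attendees": []},  # a reminder
]
checkins = {"bob": "A1", "eve": "A1"}  # last seen location per person

def is_real(event: dict, checkins: dict) -> bool:
    """Treat an event as real only if it has a room and at least one
    attendee was seen checking in there."""
    return event["room"] is not None and any(
        checkins.get(a) == event["room"] for a in event["attendees"]
    )

real_events = [e["title"] for e in events if is_real(e, checkins)]
```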