Semantic Information Fusion of Linked Open
Data and Social Big Data for the Creation
of an Extended Corporate CRM Database
Ana I. Torre-Bastida, Esther Villar-Rodriguez,
Javier Del Ser, and Sergio Gil-Lopez
Abstract. The amount of on-line available open information from heterogeneous sources and domains is growing at an extremely fast pace, and constitutes an important knowledge base for industries and companies. In this context, two relevant data providers can be highlighted: the "Linked Open Data" and "Social Media" paradigms. The fusion of these data sources (the former structured, the latter raw data), along with the information contained in structured corporate databases within the organizations themselves, may unveil significant business opportunities and a competitive advantage to those who are able to understand and leverage their value. In this paper, we present a use case representing the creation of a knowledge base of existing and potential customers, exploiting social and linked open data, based on which any given organization might infer valuable information as a support for decision making. To achieve this, a solution based on the synergy of big data and semantic technologies is designed and developed. The former implements the tasks of collection and initial data fusion based on natural language processing techniques, whereas the latter performs semantic aggregation, persistence, reasoning and retrieval of information, as well as the triggering of alerts over the semantized information.
Keywords: Big Data, Social Media, Linked Open Data, business intelligence, information fusion, ontology management, information modelling.
1 Introduction and Motivation
Nowadays, organizations need to gather valuable information that will allow them to improve their business processes and optimize their decision making. In this context,
Ana I. Torre-Bastida ·Esther Villar-Rodriguez ·Javier Del Ser ·Sergio Gil-Lopez
TECNALIA, OPTIMA Unit, E-48160 Derio, Spain
e-mail: {isabel.torre,esther.villar,javier.delser,
sergio.gil}@tecnalia.com
© Springer International Publishing Switzerland 2015
D. Camacho et al. (eds.), Intelligent Distributed Computing VIII,
Studies in Computational Intelligence 570, DOI: 10.1007/978-3-319-10422-5_23
212 A.I. Torre-Bastida et al.
business intelligence [1] is the set of strategies, relevant aspects and key technologies for the creation of knowledge about the data environment, through the analysis of these data and their context, with the ultimate aim of facilitating business decision making. However, the principal obstacle to this task is the vast amount of available data and the efficient extraction of useful information from huge repositories. The problem associated with data volume is known as Big Data, where the collection of data sets is so tremendous and complex that processing them with traditional data management tools becomes computationally unaffordable. Two of the most notable data providers in Big Data are Linked Open Data (LOD [2]) and social big data. Social media is becoming an important context-rich information source for organizations, and therefore many business executives consider it an essential challenge to incorporate this user-generated information into their decision-making chain. The goal is for businesses to profit from social platforms such as Wikipedia (DBpedia in the LOD), Facebook or Twitter. Due to the heterogeneity of the received digitized data, which follow non-standard schemas with low accuracy and reliability, a great human effort becomes necessary to extract, format and assimilate them; this constitutes the second major problem, namely the removal of noise from the data content before using it. A third problem arising therefrom is how to merge these datasets with traditional business data, such as relational databases or corporate knowledge systems.
Many projects follow these business intelligence research lines in areas such as brand recognition, competitor analysis or benchmarking [3–5]. However, there are scarce studies applying them to a specific matter such as customer relationship management, towards the identification of potential customers or the improvement and enrichment of existing clients' information. Aimed at filling this gap, our approach defines a system capable of 1) implementing the generation and management of an extended corporate CRM database; 2) solving several related analytics problems (knowledge discovery and aggregation/fusion) stemming therefrom; and 3) exploiting emerging data sources by using the semantic and big-data technology stack. Technically, our system follows a semantic aggregation approach that takes advantage of the structure of the LOD datasets so as to enhance our solution. In detail, the main contributions of our scheme are the following:
- Analysis of the particular business problem of discovering and improving an organization's customer database.
- Exploitation of new data sources (social media, LOD).
- Use of the semantic and big-data technology stack for data collection and aggregation tasks.
In the rest of this paper we first introduce the main concepts related to our approach (Social Media and Linked Open Data) and the closest related work. Then our scheme and its core processing steps are described in detail. Finally, we illustrate a use case to evaluate our system prototype.
Semantic Information Fusion for the Creation of a CRM Database 213
2 Background
The web has recently undergone a transformation in the amount and type of available contents, from which a new paradigm called Big Data has emerged. This new term is used to describe the exponential growth and availability of data, both structured and unstructured. In our approach we use two clear examples of these kinds of datasets: social big data (unstructured) and Linked Open Data (semantically structured).
Nowadays social media platforms are storing enormous amounts of data that have not previously been automatically analyzed and that could reveal critical information. The reason behind this is that the user role has shifted from being a mere consumer to a content provider. Social media is defined by Kaplan and Haenlein [7] as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content". For this reason it can be considered a context-rich source of big data, and is usually referred to as Social Big Data. To better explain this definition we must introduce two concepts: Web 2.0 and User Generated Content (UGC). Web 2.0 describes a new method in which software developers and end-users collaborate on the World Wide Web; that is, content and applications are no longer statically published by an individual, but are continuously modified by a collaborative community of users instead. UGC describes the various forms of media content that are publicly available and created by end-users.
On the other hand, Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. With this definition, Bizer et al. [6] defined the linked data paradigm and provided a mechanism for building the Web of Data, which is based on semantic Web technologies and may be considered a simplified version of the Semantic Web. The data model for representing interlinked data is RDF [3], where data is represented as node- and edge-labeled directed graphs. Some published Linked Data sets contain billions of triples, and their number is steadily increasing, yielding the Linked Data Cloud, i.e. a group of data sets available on the Web as Linked Data with links pointing at other Linked Data sets.
2.1 Related Work
In business intelligence – especially in the area of competitive information (CI) –
the data gathering process can involve a large number of research areas regarding
to technologies and strategies, which have unchained an intense activity in the re-
lated literature. Our approach deals with social big data which has been broadly
adopted to nourish data analytic systems. Studies as the one by Rappaport in [8] in-
troduces the essential role it can take to exploit social media in the field of business
intelligence, presenting cases of study in which the social media data is turned into
business advantages. In [9] a preliminary study about using text-mining techniques
in the task of collaborative intelligence information gathering is presented. The main
difference with our approach lies on the used techniques (our work resorts to
semantic fusion and big data technologies) and the application domain, which in our case is the specific example of creating a knowledge base of customers. Another work related to our scheme is the one presented by Shroff et al. in [10], which describes a framework for the fusion of business intelligence in various industries such as manufacturing, retail or insurance. Once again, and unlike our proposal, this contribution hinges on a global, general case-based implementation without concentrating on a specific problem. Furthermore, the artificial intelligence techniques used in their work are the blackboard architecture and locality-sensitive hashing, which are far from our semantic fusion approach. Another interesting and more specific work is presented by Agichtein et al. in [11], which elaborates on a high-quality social media information gathering scheme, but only manages data posted on the Yahoo! Answers social platform. Other investigations also discuss the advantages of data fusion on information collected from social media, as in [12], in which multiple features of the social media environment (textual, visual and user information) are fused for later use in a retrieval algorithm for large social media data (Flickr). Likewise, in [13] a use case of a shared on-line calendar is presented and enhanced with events generated by user social networks and location data using fusion techniques. Further, we highlight the work presented by Kim et al. in [14], due to the fact that it is the only one that uses semantic fusion techniques. However, its purpose is out of the scope of business intelligence and it does not provide enough technical details; its methodology and assessment are deemed insufficient for a fair comparison with the technical proposal presented next. Finally, we analyze the work by Hoang et al. in [15], a survey about the technologies and applications of Linked Data mashups, as well as the challenges to build them. In that paper a use case close to our approach is presented, since both use semantic technologies for integration. The main difference lies in the architecture (they use semantic web pipes, whereas our approach uses ontology mapping/alignment techniques for the semantic integration) and the datasets (they exploit Freebase and do not use social media data sources).
3 Information Collection from the Web
Our system collects external information, such as company related tweets, customer
feedback (comments) or business related open data and merges this data with tra-
ditional enterprise databases, with the aim at storing these aggregated information
following an adequate business semantic model. This section delves into the first of
these tasks, the information collection from two different on-line available sources:
the Social Media and Linked Open Data Cloud, as well as into the subsequent filters
to extract their relevance:
1. Social Big Data: at this point the data collected from two streaming social platforms is selected.
a. Facebook posts from specific user-ids are the considered data, extracting the comments generated by these users.
b. Twitter feeds containing certain keywords or coming from specific user-ids. These keywords are extracted from the corporation documents and website using a TF/IDF approach.
The technology used to perform these tasks is the Facebook1 and Twitter2 streaming application programming interfaces (APIs).
2. Linked Open Data Cloud: there are several datasets related to the business domain, such as DBpedia, CrunchBase or Freebase, which can be queried via the SPARQL query language or web services. From these datasets, structured information about customers is obtained, which is later mapped to the semantic model of our system.
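As an illustration of the keyword extraction mentioned in item 1b, the TF/IDF ranking over a corpus of corporate documents can be sketched in plain Python. This is a minimal sketch, not the actual implementation used in the system; the function name and toy corpus are ours.

```python
import math
from collections import Counter

def tfidf_keywords(documents, top_k=3):
    """Rank terms by accumulated TF-IDF over a corpus and return the top ones.

    `documents` is a list of token lists (e.g. tokenized corporate documents
    or website pages). Terms occurring in every document get an IDF of zero
    and are thus never selected as discriminative keywords.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    # Accumulate TF * IDF per term over all documents.
    scores = Counter()
    for doc in documents:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(top_k)]
```

A term such as "energy" that appears in every document of the corpus scores zero and is filtered out, which is precisely why TF/IDF surfaces the terms that characterize the corporation rather than generic vocabulary.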
The data collection process is detailed in Figure 1. This task is composed of three sub-processes: data collection and noise reduction, extraction of disambiguated entities, and harvesting of the related entities' information available as open data.
In a first step, the different social media data streams are captured using the aforementioned APIs. Next, the posts (from the Twitter or Facebook platforms) are preprocessed. At this stage we use the Freeling API3 to carry out the language analysis, calculating their corresponding synsets (i.e. groups of data elements that are considered to be semantically equivalent, each represented by an identifier). The collection of pairs formed by each post and its synsets are the input events to a set of rules that allow deducing whether the post (tweet/comment) can be considered within the business domain. This stage is what we have coined the NOISE filter. This filter, composed of a set of rules, is implemented with the Esper CEP engine4, built up from a set of synsets constructing a context that describes and models the business domain, for example concept: business; synsets: 08056231, 08058098. If any of the synsets belonging to the ongoing post can be matched to any of the synsets that form the context, the rule is activated and the post is filtered as pertinent. Otherwise the post is discarded, since it is assumed that its content is not about anything related to the business world.
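The matching rule behind the NOISE filter can be sketched as a simple synset intersection test. This is an illustrative sketch of the rule semantics only (the actual system evaluates such rules inside the Esper CEP engine); the synset codes follow the example in the text, and the function name is ours.

```python
# Business-domain context: concept name -> synset identifiers (codes as in
# the paper's example; the "business" entry mirrors the sample context).
BUSINESS_CONTEXT = {
    "organization": {"08056231", "08058098"},
    "business": {"08061042"},
}

def noise_filter(post_synsets, context=BUSINESS_CONTEXT):
    """Return True (post is pertinent) if the post shares at least one synset
    with any concept of the business context, False (discard) otherwise."""
    context_synsets = set().union(*context.values())
    return bool(set(post_synsets) & context_synsets)
```

A post whose synsets overlap the context activates the rule and passes through; any other post is treated as noise and dropped before the costlier NER stage.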
Once posts have gone through the noise filter, the result will be deemed valuable, since it is likely to provide meaningful information about the domain; it is then fed to the subprocess in charge of entity extraction. Named Entity Recognition (NER) refers to the module or function in charge of detecting any kind of entity, such as cities, organizations or people, and is mostly utilized by NLP utilities as a contributor of semantic information. In our case, the filtered posts can contain
1 https://developers.facebook.com/docs/graph-api
2 https://dev.twitter.com/docs/api/streaming
3 http://nlp.lsi.upc.edu/freeling/
4 http://esper.codehaus.org/
[Figure 1 depicts the data collection process flow: tweets retrieved by keyword, tweets retrieved by user-id and Facebook posts retrieved by user-id are passed through the NOISE filter (CEP); the filtered posts (tweet/fb_post plus customer, or tweet plus keyword) feed the NER entity extraction stage (Map/Reduce); the extracted entities are then used for open data collection (SPARQL/REST) against the LOD datasets (DBpedia, Freebase, CrunchBase), yielding the entities and their related information.]
Fig. 1 Data collection process flow
any named entity corresponding to an already existing customer, a potential client or even a competitor working in the same market sectors. For this purpose, the Daedalus Topic Extraction API has been selected, integrating it in a Map-Reduce framework to parallelize the algorithm responsible for extracting entities. The output obtained from the Map-Reduce job is a set of entities grouped by post. Finally, for each of the previously extracted entities, we collect the information available in the Linked Open Data sets (Freebase, DBpedia) and other open data sets such as CrunchBase. This information is merged and aggregated with the existing data from the corporation's relational databases, with the final aim of feeding the semantic model.
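The shape of this Map-Reduce job can be sketched in plain Python: the map phase emits (post, entity) pairs and the reduce phase groups entities by post. This is an illustrative sketch under our own assumptions; `extract_entities` is a hypothetical stand-in for the external Topic Extraction service, and here it simply treats capitalized tokens as named entities.

```python
from collections import defaultdict

def extract_entities(text):
    """Hypothetical placeholder for the external entity-extraction service:
    a toy heuristic that keeps capitalized tokens as named entities."""
    return [tok.strip(".,") for tok in text.split() if tok[:1].isupper()]

def map_phase(post):
    """Map step: for one (post_id, text) pair, emit (post_id, entity) pairs."""
    post_id, text = post
    return [(post_id, entity) for entity in extract_entities(text)]

def reduce_phase(pairs):
    """Reduce step: group the emitted entities by post."""
    grouped = defaultdict(set)
    for post_id, entity in pairs:
        grouped[post_id].add(entity)
    return dict(grouped)
```

In the real framework the map tasks run in parallel over partitions of the filtered posts, but the contract between the two phases is exactly the one shown: pairs out of map, per-post entity sets out of reduce.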
4 Semantic Fusion: Aggregation, Model and Interlinking
The semantic aggregation process has two main goals: to improve the existing information about the organization's customers and to discover new potential customers. The entire process is detailed in Figure 2. First of all, a classification process is applied to each post to determine whether its contents relate to any entity existing in the semantic data model. Depending on the result of this classification, the system follows two alternative flows. In the positive case, the semantic model is updated with the new information about the customer and its partnerships/relationships. Otherwise, the data gathered from the Linked Open Data Cloud is mapped into a
[Figure 2 depicts the semantic data aggregation flow: the entities extracted per post, together with the corporation RDBMS, feed a classification step; existing customer information leads to updating the customer and its relationships in the semantic model, whereas potential entities' information goes through LOD and relational information mapping, creating new customers and relationships via the semantic model updating step; all results are persisted in the RDF repository holding the semantic model.]
Fig. 2 Semantic data aggregation
new instance within the semantic model. These processes are supported by a set of previously computed semantic links between our model and the LOD dataset vocabularies, which are calculated following the ontology alignment process proposed by the authors in [16].
With regard to the definition of our model schema, well-known semantic vocabularies are reused to promote interoperability with other RDF repositories or datasets. Our ontology model is based on the combination of the schema.org ontology along with that used in DBpedia, plus vocabularies such as SKOS to specify semantic relationships and links. New classes or properties are also modeled in case the existing vocabularies do not provide their definition. Finally, the new instances of the semantic data model are stored in the OpenLink Virtuoso RDF repository5.
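The mapping of harvested LOD properties into instances of our model can be sketched as a lookup over the precomputed property alignments. This is a minimal illustrative sketch, not the system's implementation: the `d:` namespace comes from the use-case figure, while the alignment table, the `d:employees` property and the function name are hypothetical examples of ours.

```python
# Namespace of our model (as used in the use-case example of Section 6).
D = "http://datafusion.org/ontology/"

# Hypothetical property alignments, standing in for the semantic links
# precomputed by the ontology alignment process of [16].
PROPERTY_MAP = {
    "foaf:homepage": D + "url",
    "dbpedia-owl:numberOfEmployees": D + "employees",
}

def map_entity(entity_uri, lod_properties, rdf_type=D + "energycompany"):
    """Translate harvested LOD key/value pairs into triples of our model,
    typing the new instance and skipping properties with no known alignment."""
    triples = [(entity_uri, "rdf:type", rdf_type)]
    for prop, value in lod_properties.items():
        if prop in PROPERTY_MAP:
            triples.append((entity_uri, PROPERTY_MAP[prop], value))
    return triples
```

Unaligned LOD properties are silently dropped here; in the full system such gaps are where new classes or properties would be modeled, as noted above.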
5 Information Retrieval, Inference and Alert Generation
Once the information has been converted to RDF format following our semantic model and saved in the RDF repository, some added-value operations can be implemented over the stored data, such as the following:
- Information retrieval: in our case SPARQL, the current W3C recommendation for querying RDF data, is selected to allow users to perform selective queries. In

5 http://virtuoso.openlinksw.com/
our system, the SPARQL endpoint provided by the Virtuoso RDF repository and the JENA API6 are the chosen tools for implementing this module.
- Inference: based on the information stored in the repository, semantic inference processes (RDFS and OWL) can be performed with the aim of discovering new relationships. This task can be accomplished by semantic reasoners like Pellet [17], in combination with the JENA API. This process also allows for the definition of specific business semantic rules implemented using the SWRL (Semantic Web Rule Language, combining OWL and RuleML) language.
- Alert generation: finally, an alert generation module is responsible for monitoring the data and triggering events that indicate that the conditions specified in an alert have been fulfilled. For its implementation, a listener is utilized during the loading and inference process that detects whether the alert conditions have been met.
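The listener-based alert mechanism can be sketched as a callback invoked for every triple that is loaded or inferred. This is an illustrative sketch under our own assumptions; the condition shown (detecting a hypothetical `d:potentialCustomer` typing) and all names are ours, not the system's actual rule vocabulary.

```python
def make_alert_listener(condition, on_alert):
    """Return a callback to be invoked for every (s, p, o) triple loaded or
    inferred; it fires `on_alert` when the triple satisfies the condition."""
    def listener(triple):
        if condition(triple):
            on_alert(triple)
    return listener

# Example: collect an alert whenever a new potential customer is asserted.
fired = []
listener = make_alert_listener(
    condition=lambda t: t[1] == "rdf:type" and t[2] == "d:potentialCustomer",
    on_alert=fired.append,
)
```

Because the listener runs during loading and inference alike, alerts can be triggered both by explicitly collected facts and by relationships discovered by the reasoner.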
6 Use Case
This section describes in detail an illustrative example of the process followed by
our system, from the moment the data are collected until the information is retrieved by a SPARQL query.

[Figure 3 summarizes the data collection example. The inputs are tweets retrieved by TF/IDF keywords ("automotive", "energy", "IT", ...), tweets retrieved by the user-ids extracted from the relational database (e.g. "iberdrola"), and Facebook posts retrieved by user-id; among the sample posts are a tweet announcing that @Gamesa_Official wins a contract to supply 20 MW to Energa in Poland, Iberdrola posts mentioning its Scottish subsidiary ScottishPower and its Brazilian subsidiary Elektro, the Votkinskaya substation built for RusHydro, and a new science and technology plan with @Innobasque, @AlianzaIK4, @tecnalia, @jakiunde and @idom. After NLP preprocessing, the NOISE filter retains only the posts matching the business context (e.g. organization -> 08056231, 08058098; business -> 08061042). NER processing then extracts the entities (Gamesa_Official, Energa; ScottishPower; RusHydro; Innobasque, AlianzaIK4, Tecnalia, Jakiunde, Idom; ...), and the open data collection stage queries Freebase, DBpedia and CrunchBase for each of them, e.g. with the DBpedia SPARQL lookup

SELECT ?thing
WHERE {
  ?thing rdfs:label ?name.
  FILTER(regex(str(?name), "Iberdrola", "i")) }

yielding entries such as ScottishPower [foaf:homepage http://www.scottishpower.com/; dbpedia-owl:numberOfEmployees 9953; ...] and Energa [dbpedia-owl:country dbpedia:Poland; ...].]
Fig. 3 Data collection example

[Figure 4 shows the semantic data model generated by the LOD and relational information mapping and the semantic model updating steps, persisted in the RDF repository, with instances such as (using PREFIX d: <http://datafusion.org/ontology/>):

org1 a d:energycompany.
org1 rdfs:label "ScottishPower".
web1 a d:website.
web1 d:url <http://www.scottishpower.com/>.
org1 d:contact web1.
org2 a d:energycompany.
org2 rdfs:label "Energa".

It also shows an example SPARQL execution over this model:

PREFIX d: <http://datafusion.org/ontology/>
SELECT ?org ?name ?subj ?name2
WHERE {
  ?org a d:organization.
  ?org rdfs:label ?name.
  ?subj a d:subject.
  ?subj rdfs:label ?name2.
  ?org d:relatedTo ?subj. }

which returns, among others: org1 "ScottishPower" subj1 "energy"; org2 "Energa" subj1 "energy"; org3 "Gamesa" subj1 "energy"; org4 "RusHydro" subj1 "energy"; org5 "AlianzaIK4" subj1 "energy", subj2 "industry", subj3 "IT".]
Fig. 4 Generated semantic data model and SPARQL execution example

6 https://jena.apache.org/documentation/query/

The data collection process is shown in Figure 3. The first input is
the real data retrieved from Twitter and Facebook. Tweets and posts are preprocessed to transform them into synsets, as explained in Section 3. These synsets are filtered (a noise filter for irrelevant data) using a macro consisting of a set of synsets representing the business domain (the business context in the figure). The filtered tweets and posts are then subject to a named entity recognition procedure aimed at extracting the entities, so as to collect for them the information available on the LOD.
Finally, the data model and instances generated by the semantic aggregation process, and an example of information retrieval using a SPARQL sentence, are shown in Figure 4. As shown in the figure, the query returns a list of all organizations and their related subjects. In this context it must be noted that although ScottishPower is annotated as energycompany, this entity is also returned by the query, because in the ontological model (see Figure 2) an energycompany is categorized as a subclass of organization. This unveils one of the advantages of using a semantic model for information retrieval.
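The subclass-based retrieval just described can be sketched as a minimal RDFS subsumption check: an instance typed as a subclass is also returned when querying for the superclass. This is an illustrative sketch of the entailment only (the real system delegates it to the reasoner); the function name is ours, and the subclass table assumes an acyclic hierarchy.

```python
# Fragment of the ontology model: energycompany is a subclass of organization.
SUBCLASS_OF = {"d:energycompany": "d:organization"}

def instances_of(cls, triples, subclass_of=SUBCLASS_OF):
    """Return the subjects typed as `cls`, either directly or through an
    rdfs:subClassOf chain (assumed acyclic)."""
    result = set()
    for s, p, o in triples:
        if p != "rdf:type":
            continue
        c = o
        while c is not None:       # walk up the subclass chain
            if c == cls:
                result.add(s)
                break
            c = subclass_of.get(c)
    return result
```

Querying for d:organization thus returns ScottishPower even though it was only asserted as a d:energycompany, mirroring the behavior of the SPARQL query of Figure 4 under RDFS inference.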
7 Concluding Remarks and Future Research
This manuscript has addressed the problem of automatically creating and managing a customer database from a novel perspective: semantic aggregation. Input data come from new sources such as social media and Linked Open Data. Furthermore, different modules have been implemented leveraging the Big Data (Map-Reduce,
Complex Event Processing) and semantic web (RDF repository, reasoner, SWRL) technology stacks. A use case exemplifies the multiple possibilities our approach offers to a corporation, ranging from the discovery of new customers to the expansion of the knowledge base of traditional clients. This brings profitable advantages in the business domain, where decision making is a critical process and the collection of customer information is a key factor.
Future work will be devoted to the study of new applications and to enlarging the technical scope of this semantic aggregation, e.g. so as to also include projects referencing entities, business concepts or places, and properties that can be matched to relationships of the model inferred from the posts, thanks to the development of new algorithms using NLP and classification techniques. Furthermore, multilingual features will also be considered for inclusion in the platform.
References
1. Moss, L.T., Atre, S.: Business Intelligence Roadmap: The Complete Project Lifecycle for
Decision-Support Applications. Addison-Wesley (2003)
2. Bizer, C., Heath, T., Idehen, K., Berners-Lee, T.: Linked data on the web (LDOW2008). In:
Proceedings of the 17th International Conference on World Wide Web, pp. 1265–1266 (2008)
3. Hoffman, D.L., Fodor, M.: Can you measure the ROI of your social media marketing. MIT
Sloan Management Review 52(1), 41–49 (2010)
4. Vuori, V.: Social media changing the competitive intelligence process: elicitation of employ-
ees’ competitive knowledge. Tampereen teknillinen yliopisto. Julkaisu-Tampere University
of Technology. Publication; 1001 (2011)
5. Bingham, T., Conner, M.: The new social learning: A guide to transforming organizations
through social media. Berrett-Koehler Publishers (2010)
6. Bizer, C., Heath, T., Berners-Lee, T.: Linked data-the story so far. International Journal on
Semantic Web and Information Systems 5(3), 1–22 (2009)
7. Kaplan, A.M., Haenlein, M.: Users of the world, unite! The challenges and opportunities of
Social Media. Business Horizons 53(1), 59–68 (2010)
8. Rappaport, S.D.: Listen First!: Turning Social Media Conversations Into Business Advantage.
John Wiley and Sons (2011)
9. Dey, L., Haque, S.M., Khurdiya, A., Shroff, G.: Acquiring competitive intelligence from
social media. In: Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics
for Noisy Unstructured Text Data, p. 3. ACM (2011)
10. Shroff, G., Agarwal, P., Dey, L.: Enterprise information fusion for real-time business intelli-
gence. In: IEEE International Conference on Information Fusion (FUSION), pp. 1–8 (2011)
11. Agichtein, E., Castillo, C., Donato, D., Gionis, A., Mishne, G.: Finding high-quality content
in social media. In: Proceedings of the 2008 International Conference on Web Search and
Data Mining, pp. 183–194. ACM (2008)
12. Cui, B., Tung, A.K., Zhang, C., Zhao, Z.: Multiple feature fusion for social media applica-
tions. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management
of Data, pp. 435–446. ACM (2010)
13. Lovett, T., O’Neill, E., Irwin, J., Pollington, D.: The calendar as a sensor: analysis and im-
provement using data fusion with social networks and location. In: Proceedings of the 12th
ACM International Conference on Ubiquitous Computing, pp. 3–12. ACM (2010)
14. Kim, H., Son, J., Jang, K.: Semantic Data Fusion: from Open Data to Linked Data. In: Pro-
ceedings of the European Semantic Web Conference (2013)
15. Hanh, H.H., Tai, N.C., Duy, K.T., Dosam, H., Jason, J.J.: Semantic Information Integration
with Linked Data Mashups Approaches. International Journal of Distributed Sensor Net-
works 2014, Article ID 813875 (2014)
16. Torre-Bastida, A.I., Villar-Rodriguez, E., Del Ser, J., Camacho, D., Gonzalez-Rodriguez,
M.: On Interlinking Linked Data Sources by Using Ontology Matching Techniques and the
Map-Reduce Framework. In: Corchado, E., Yin, H. (eds.) IDEAL 2014. LNCS, vol. 8669,
pp. 53–60. Springer, Heidelberg (2014)
17. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web 5(2), 51–53 (2007)
... The underlying common language for services is focused on a collection of ontologies that allow for the representation and reasoning of various objects, circumstances and possible threats, and so on. In [131], a use case is proposed that represents the development of a current and future consumer knowledge base, leveraging of social and connected open data on the basis of which any company could infer useful information as a decision-making support. Semantic technologies perform semantic aggregation, persistence, reasoning and retrieval of information, as well as the triggering of alerts over the semantized information. ...
Chapter
Full-text available
In this chapter, we provide an overview of the current trends in using semantic technologies in the IoT domain, presenting practical applications and use cases in different domains, such as in the healthcare domain (home care and occupational health), disaster management, public events, precision agriculture, intelligent transportation, building and infrastructure management. More specifically, we elaborate on semantic web-enabled middleware, frameworks and architectures (e.g. semantic descriptors for M2M) proposed to overcome the limitations of device and data heterogeneity. We present recent advances in structuring, modelling (e.g. RDFa, JSON-LD) and semantically enriching data and information derived from sensor environments, focusing on the advanced conceptual modelling capabilities offered by semantic web ontology languages (e.g. RDF/OWL2). Querying and validation solutions on top of RDF graphs and Linked Data (e.g. SPARQL, SPIN and SHACL) are also presented. Furthermore, insights are provided on reasoning, aggregation, fusion and interpretation solutions that aim to intelligently process and ingest sensor information, infusing also human awareness for advanced situational awareness.
... IV. MOTIVATING SCENARIO Nowadays, organizations and industries need to gather valuable information that will allow them to improve their business processes and optimize their decision-making [25]. In this context, we are investigating how semantic technologies can be used to leverage the effort of sales and marketing staff to help them be more responsive to customers and to identify new sales opportunities. ...
Conference Paper
Full-text available
The Web has recently undergone a transformation in the amount and type of available information. The emerging Linked Open Data Cloud (LOD) contains hundreds of published structured data sources. These sources are open to the public and can be accessed from various SPARQL endpoints. On the other hand, social platforms store enormous amounts of data that could reveal critical information. This information is commonly supplied through REST APIs in a semi-structured format. The access, retrieval, and utilization of these different data models on the Web impose a need for the data to be integrated, providing users with a single entry point to them. In this paper, we explore the major challenges in this area and discuss the limitations of some existing integration solutions and tools. We also propose a semantic virtual integration architecture to link corporate internal CRM databases with large-scale Social Data and Linked Data.
... IV. MOTIVATING SCENARIO Nowadays, organizations and industries need to gather valuable information that will allow them to improve their business processes and optimize their decision-making [25]. In this context, we are investigating how semantic technologies can be used to leverage the effort of sales and commercial staff to help them be more responsive to customers, and to identify new sales opportunities. ...
Conference Paper
Full-text available
Today, there is a huge amount of valuable data on the web that organizations or users can exploit to improve their decision-making process. With the advent of Linked Open Data (LOD), many structured data sources have been published and can be accessed from various SPARQL endpoints. On the other hand, social data is becoming an important source of context-rich information, which Web APIs generally provide in a semi-structured format. The access, retrieval, and utilization of these different data models on the web impose a need for the data to be integrated, providing users with a single entry point to them. In this paper, we explore the major challenges in this area and discuss the limitations of some existing integration solutions and tools. We also propose a modular mediator-based architecture to integrate heterogeneous enterprise data with large-scale Social Data and Linked Data.
Article
Advances in semantic web technologies have rocketed the volume of linked data published on the web. In this regard, linked open data (LOD) has long been a topic of great interest in a wide range of fields (e.g. open government, business, culture, education, etc.). This article reports the results of a systematic literature review on LOD. 250 articles were reviewed to provide a general overview of the current applications, technologies, and methodologies for LOD. The main findings include: i) most of the studies conducted so far focus on the use of semantic web technologies and tools applied to contexts such as biology, social sciences, libraries, research, and education; ii) there is a lack of research with regard to a standardized methodology for managing LOD; and iii) plenty of tools can be used for managing LOD, but most of them lack user-friendly interfaces for querying datasets.
Article
Full-text available
The web diversification into the Web of Data and social media means that companies need to gather all the necessary data to help make the best-informed market decisions. However, data providers on the web publish data in various data models and may equip it with different search capabilities, thus requiring data integration techniques to access them. This work explores the current challenges in this area, discusses the limitations of some existing integration tools, and addresses them by proposing a semantic mediator-based approach to virtually integrate enterprise data with large-scale social and linked data. The implementation of the proposed approach is a configurable middleware application and a user-friendly keyword search interface that retrieves its input from internal enterprise data combined with various SPARQL endpoints and Web APIs. An evaluation study was conducted to compare its features with recent integration approaches. The results illustrate the added value and usability of the contributed approach.
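The mediator idea described in the abstracts above can be sketched as a single keyword entry point that fans out to per-source adapters and merges their results on a shared key. The adapters here are stubs with hard-coded sample rows (in practice they would wrap the internal CRM, a SPARQL endpoint and a Web API); all names and fields are illustrative assumptions.

```python
# Toy mediator: one keyword search merged across heterogeneous sources.

def crm_adapter(keyword: str) -> list:
    """Stub for the internal enterprise CRM source."""
    rows = [{"name": "Acme Corp", "sector": "retail"}]
    return [r for r in rows if keyword.lower() in r["name"].lower()]

def lod_adapter(keyword: str) -> list:
    """Stub for a Linked Data source; a real adapter would issue a
    SPARQL query against a public endpoint."""
    rows = [{"name": "Acme Corp", "dbpedia": "dbr:Acme_Corporation"}]
    return [r for r in rows if keyword.lower() in r["name"].lower()]

def mediator(keyword: str) -> dict:
    """Fan out to all adapters and merge rows on the shared 'name' key."""
    merged: dict = {}
    for adapter in (crm_adapter, lod_adapter):
        for row in adapter(keyword):
            merged.setdefault(row["name"], {}).update(row)
    return merged

result = mediator("acme")
```

The design point mirrors the papers: each source keeps its native access method behind an adapter, and only the mediator layer knows how to join the results.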
Article
Nowadays, research in Linked Data has advanced significantly, and it now spans an enormous range of applications, such as research publications and data sets. Its flexibility and effectiveness in handling and linking data from numerous sources have made Linked Data more popular. The aim of this article is to systematically review the literature on Linked Data and its development since 2009. Moreover, cumulative experiences and lessons learned from recent years are highlighted. Findings showed that Linked Data has grown in the past five years in terms of the number of datasets, research publications and domain-specific applications.
Article
Full-text available
The introduction of the semantic web and Linked Data makes sharing data on the Internet easier. Subsequently, the Resource Description Framework (RDF) has become the standard for publishing structured data resources on the Internet and is used to interconnect them with other data resources. To remedy the data integration issues of traditional web mashups, semantic web technology uses Linked Data based on the RDF data model as the unified data model for combining, aggregating, and transforming data from heterogeneous data resources to build Linked Data mashups. The semantic web community has made tremendous efforts to enable Linked Data mashups, but there is still a lack of a systematic survey of concepts, technologies, applications, and challenges. Therefore, in this paper we investigate in detail semantic mashup research and application approaches to information integration. The paper also presents a Linked Data mashup application as an illustration of the proposed approaches.
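The "unified data model" step of such a mashup can be sketched as lifting heterogeneous records into a common subject-predicate-object form so they can be aggregated under one subject. The sample records, keys and the `ex:city/` prefix below are illustrative assumptions, not taken from the paper.

```python
# Lift two heterogeneous source records into shared triples.

api_record = {"id": "42", "title": "Berlin", "population": 3700000}
csv_record = {"city_id": "42", "country": "Germany"}

def lift(record: dict, subject_key: str, prefix: str) -> list:
    """Turn a flat record into (subject, predicate, object) triples,
    using one field as the shared subject identifier."""
    subj = f"{prefix}{record[subject_key]}"
    return [(subj, k, str(v)) for k, v in record.items() if k != subject_key]

triples = (lift(api_record, "id", "ex:city/")
           + lift(csv_record, "city_id", "ex:city/"))

# Both sources now share the subject 'ex:city/42' and can be aggregated.
merged = {p: o for s, p, o in triples if s == "ex:city/42"}
```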
Article
Full-text available
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Article
Full-text available
Competitive intelligence is the art of defining, gathering and analyzing intelligence about competitors' products, promotions, sales, etc. from external sources. The Web stands out as an important source for gathering competitive intelligence. News, blogs, as well as social media not only provide competitor information but also provide direct comparisons of customer behavior with respect to different verticals among competing organizations. This paper discusses methodologies to obtain competitive intelligence from different types of web resources, including social media, using a wide array of text mining techniques. It provides some results from case studies to show how the gathered information can be integrated with structured data and used to explain business facts, and thereby adopted for future decision making.
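One elementary text-mining step of the kind described above is counting competitor mentions across scraped social posts. The competitor names and sample posts are illustrative assumptions; a real pipeline would add proper tokenisation, entity resolution and sentiment analysis.

```python
# Count competitor mentions in a handful of sample social posts.
from collections import Counter

competitors = {"acme", "globex"}
posts = [
    "Switched from Acme to Globex last month, no regrets",
    "Acme's new promotion looks great",
]

mentions: Counter = Counter()
for post in posts:
    # Crude normalisation: lowercase, drop possessives and punctuation.
    for token in post.lower().replace("'s", "").split():
        token = token.strip(",.!?")
        if token in competitors:
            mentions[token] += 1
```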
Article
Full-text available
This paper argues that social media metrics should be captured as customer investments in marketers’ social media efforts and that applications considered in concert with performance objectives drive the choice of metrics. Motivating this approach are the “four c’s” that drive consumer use of social media. These include the connections consumers make with each other, the user-generated content they create, their consumption of other users’ content and their control of their own online experiences. Social media metrics that are linked to three broad social media performance objectives are identified for eight general categories of social media applications and the paths managers have for improving social media effectiveness that rely on using these metrics are discussed.
Conference Paper
The Web is increasingly understood as a global information space consisting not just of linked documents, but also of Linked Data. More than just a vision, the resulting Web of Data has been brought into being by the maturing of the Semantic Web technology stack, and by the publication of an increasing number of data sets according to the principles of Linked Data. The Linked Data on the Web (LDOW2008) workshop brings together researchers and practitioners working on all aspects of Linked Data. The workshop provides a forum to present the state of the art in the field and to discuss ongoing and future research challenges. In this workshop summary we will outline the technical context in which Linked Data is situated, describe developments in the past year through initiatives such as the Linking Open Data community project, and look ahead to the workshop itself.
Thesis
Competitive intelligence process aims to provide actionable information about the external business environment to back up decision-making in companies. The effects that the rise of social media may have on competitive intelligence are a topic of interest to both practice and theory. The main objectives of this dissertation are to understand how social media changes the competitive intelligence process and how it can enhance the elicitation of employees' competitive knowledge. The research questions are studied using both theoretical and empirical research approaches. The empirical study consists of three data sets complementing each other, adopting several methods and perspectives. The results of the dissertation suggest that social media has an effect on companies' information environment, as the widespread use of social media produces more voluminous and more versatile information than before. In the competitive intelligence context this influences information gathering especially: social media for its part increases the available information sources, but it also offers technologies to automate some parts of information gathering and processing. In addition, the use of suitable social media tools can affect the elicitation of employees' competitive knowledge and make competitive knowledge more visible in a company. Social media provides an opportunity to implement the competitive intelligence process in a participative and collaborative way, engaging employees in the process. The role of the employees shifts to that of more active participants shaping the collaborative understanding by contributing their competitive knowledge to the process as well as benefiting more from others' competitive knowledge. However, the success of using social media in better utilising and sharing employees' competitive knowledge relies heavily on the utility, perceived usefulness and affordance of the tools, as well as how motivated the employees are to use them for knowledge sharing.
The main motivating factors and barriers are in line with those regarding general knowledge sharing. The main contributions include increasing knowledge on the connection between social media and competitive intelligence: how the emergence of social media affects carrying out the competitive intelligence process and especially sharing of employees’ competitive knowledge. In addition, the research reveals the motivational factors and barriers related to employees’ willingness to use social media for sharing competitive knowledge. The findings also have practical managerial implications for companies planning to adopt social media for competitive knowledge sharing, as they provide means for them to prepare the conditions for successful utilisation and active employee participation.
Conference Paper
Interlinking different data sources has become a crucial task due to the explosion of diverse, heterogeneous information repositories in the so-called Web of Data. In this paper an approach to extract relationships between entities existing in huge Linked Data sources is presented. Our approach hinges on the Map-Reduce processing framework and context-based ontology matching techniques so as to discover the maximum number of possible relationships between entities within different data sources in a computationally efficient fashion. To this end the processing flow is composed of three Map-Reduce jobs in charge of 1) the collection of linksets between datasets; 2) context generation; and 3) the construction of entity pairs and similarity computation. In order to assess the performance of the proposed scheme, an exemplifying prototype is implemented between the DBpedia and LinkedMDB datasets. The obtained results are promising and pave the way towards benchmarking the proposed interlinking procedure against other ontology matching systems.
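The three-stage flow described above can be emulated with plain functions instead of a real Map-Reduce cluster. The entity names and context terms are illustrative assumptions; the similarity used here is Jaccard overlap of each entity's context set, one common choice in context-based ontology matching, not necessarily the paper's exact measure.

```python
# Stage 1: linksets collected from two datasets (entity -> context terms).
dbpedia = {"dbr:Blade_Runner": {"film", "scott", "1982", "scifi"}}
lmdb = {"lmdb:film/123": {"film", "scott", "scifi", "dick"}}

# Stage 2: context generation (the term sets above already play that role
# in this toy version; the real job derives them from neighbouring triples).

# Stage 3: build cross-dataset entity pairs and compute similarity.
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)

pairs = [
    (e1, e2, jaccard(c1, c2))
    for e1, c1 in dbpedia.items()
    for e2, c2 in lmdb.items()
]
best = max(pairs, key=lambda t: t[2])
```

In the Map-Reduce version, stages 1-3 each become a map/reduce job so that the pairwise comparisons are sharded across workers.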
Conference Paper
The shared online calendar is the de facto standard for event organisation and management in the modern office environment. It is also a potentially valuable source of context, provided the calendar event data represent an accurate account of 'real-world' events. However, as we show through a field study, the calendar does not represent reality well, as genuine events are hidden by a multitude of reminders and 'placeholders', i.e. events that appear in the calendar but do not occur. We show that the calendar's representation of real events can be significantly improved through data fusion with other sources of context, namely social network and location data. Finally, we discuss some of the issues raised during our field study, their significance and how performance could be further improved.
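The fusion step described above can be sketched as keeping a calendar event only when another context source corroborates it; here location check-ins stand in for that second source. The event fields and check-in data are illustrative assumptions, not the paper's dataset.

```python
# Filter placeholder calendar events via fusion with location check-ins.

events = [
    {"title": "Team meeting", "room": "A1", "attendees": ["bob", "eve"]},
    {"title": "Pay invoice", "room": None, "attendees": []},  # a reminder
]
checkins = {"bob": "A1", "eve": "A1"}  # last seen location per person

def is_real(event: dict, checkins: dict) -> bool:
    """Treat an event as real only if it has a room and at least one
    attendee was seen checking in there."""
    return event["room"] is not None and any(
        checkins.get(a) == event["room"] for a in event["attendees"]
    )

real_events = [e["title"] for e in events if is_real(e, checkins)]
```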