Complex Network Analysis of a Tourism Content Sharing Network

Alex Becheru
University of Craiova
Email: becheru@gmail.com
Costin Bădică
University of Craiova
Email: cbadica@software.ucv.ro
Mihăiță Antonie
University of Craiova
Email: mihai.antonie@gmail.com
Abstract—This paper presents results of the analysis of a
tourism information web-site (AmFostAcolo.ro) by using Complex
Networks (CN) methods. The work accomplished here complements
a previous paper, where we discussed data extraction and
its modelling as a complex network. Properties of the resulting
network, its communities and its vertices are examined in order
to extract useful information and detect social phenomena.
Temporal analysis methods are employed for examining the
evolution of the web-site. The results obtained indicate the natural
development of the web-site and the usefulness of CN analysis
methods in this scenario.
I. INTRODUCTION
Over the last decade, great interest has been shown in the
application of information and communication technology (ICT)
to the development of the Smart Tourism business [1], here
understood as “the application of information and communication
technologies to the tourism sector”¹. Tourists want to benefit
from advanced ICT services for knowledge and information
management that help them take informed decisions matching
their preferences, with less effort and in a shorter time.
Tourism companies seek to improve the quality of their services,
as well as their image, based on the feedback gathered from
their customers.
Many sources of tourist information can be found, e.g. web-
sites and social media [2], [3]. These sources can be defined as
content sharing networks that allow users to share and make
use of the information provided. In this paper we shall consider
as content the reviews and/or comments expressed in natural
language, stating users’ opinions and experiences regarding
tourism entities.
As data source we chose one of the most popular tourism
information web-sites in Romania, AmFostAcolo². Registered
users are able to interact with each other through: (i) echoes, as
well as answers to echoes, posted in relation to certain reviews
or comments, and (ii) asking questions and giving answers to
questions about a certain tourism entity.
The focus of this paper is the analysis of the content sharing
network extracted from the above-mentioned web-site. The
extraction and processing workflow was discussed and
defined in a previous paper [4]. We hope to obtain
valuable information and to assess the usefulness of Complex
Network analysis methods in this context.
¹ http://www.smarttourism.org/
² http://www.amfostacolo.ro
Fig. 1. Exemplification of the users that share information regarding a
particular tourism entity.
We explicitly use the term user of the web-site and not
customer of a tourism entity, as a user sharing information is
not necessarily a customer. Consider a scenario in which
user A was a customer of hotel Ah, located near hotel Bh.
User A can share a valuable piece of information about Bh
without having been its customer: Bh has a better view of the
sea than hotel Ah. Figure 1 illustrates the difference between
a web-site user and a customer of a tourism entity.
Throughout this paper we refer to tourism entities as areas
of tourism interest. There is no limitation on the size of the
area: it can be a country or a specific hotel. Nor is there any
limitation on the type of the tourism entity: it can be a hotel,
an aqua park, a forest, etc.
This paper is structured as follows. The next section
reviews some background information about complex networks
and mentions related work. The third section provides an
insight into the data extraction and processing workflow. The
fourth section details the experiments performed and the results
obtained. Conclusions and future work are presented in the
last section.
II. BACKGROUND AND RELATED WORK
In order to understand complex interconnected systems, a
new field of research has emerged: Network Science (NS), or
Complex Networks Analysis (CNA). The core of this new research
field builds on Graph Theory and Computer Science. NS
investigates non-trivial features of graph problems that are
usually not addressed by lattice theory or random graphs.
Understanding such non-trivial features is of high interest,
as they frequently occur in real world problems. The complexity
of real world networks comes from the modelling and
evaluation of overlapping and interdependent phenomena that
are neither purely regular nor purely random. Complexity
may also come from the sheer size of the network itself.
Two important papers stand as the building blocks of
Network Science. Paul Erdős and Alfréd Rényi wrote
about random graphs in 1959 [5]. In 1973, Mark Granovetter
described the strength of weak ties [6]. A graph usually
consists of a number of subgraphs; nodes inside these subgraphs
are tightly connected to each other and loosely connected
(through weak ties) with other subgraphs. One may think that
those weak ties are not relevant, but without them the graph of
subgraphs would not exist. CNA emerged at the beginning of
the 1990s as a result of the progress in applied computational
sciences. The most important factor, however, was the access to
data describing real world networks, brought about by the
emergence of the World Wide Web and by the explosion of
interest in detailed mapping across many sciences, especially
biology and economics.
NS can be used in many application domains. For example,
internet companies like Google and Facebook are practically
built on complex networks. In medicine, the spread of diseases
is now studied with the help of CNA [7]. Security forces
map the networks of acquaintances of wanted individuals; such
maps can reveal alternative ways to reach them. Saddam
Hussein was reportedly captured using methods from
NS [8]. Large oil companies use a branch of CNA known
as Organisational Network Analysis to improve the flow of
information within the companies [9].
Our literature survey on Smart Tourism found a considerable
number of papers on this subject. In particular, much work
is aimed at developing recommender systems for tourism that
exploit recent advances in ICT, including mobile computing
[10], [11], [12], semantic technologies [13], [14] and
geo-tagged information [15]. A recent survey of
tourism recommender systems is presented in [16].
Two other papers focus on the analysis of data
extracted from AmFostAcolo [17], [18]. Paper [17] presents
the results of a sentiment analysis relating the tourist opinion
holder to the review content. Paper [18] explores complex network
representations of tourist reviews for extracting lexical and
quantitative features of the review text. Other works employing
computational network analysis methods in the tourism domain
are [19] and [20].
III. DATA EXTRACTION, PROCESSING WORKFLOW AND NETWORK CREATION
This section reiterates previous work, as the workflow
presented here was introduced in a previous paper [4].
Web-site users are able to interact on AmFostAcolo in
at least two ways. Firstly, users can ask questions and give
answers related to a particular tourism entity. This facility
is useful whenever users would like to ask something not
necessarily related to a specific impression. For example, a
user can ask a question related to the tourism entity “Cheile
Sohodolului”. Secondly, users can post tourist impressions
related to a certain place, so each place becomes a container of
impressions or reviews written by the users registered at
AmFostAcolo who visited the place and shared their impressions
on the web-site. Each post can trigger discussions about the
place in question: other users can post echoes to it.
Most often these echoes are questions requiring clarifications
or more details about the post. Questions in turn can trigger
answers, and so on.
Fig. 2. Data extraction and preprocessing workflow.
The task of data extraction was achieved using the data
preprocessing workflow described in Figure 2. This task
comprises the following activities:
• Data extraction from AmFostAcolo and its conversion
  to XML. This was achieved by developing a
  customized tool based on the jsoup library for HTML
  processing³.
• Data storage into a relational database developed using
  MySQL⁴. This was achieved by developing a customized
  tool based on the JAXB⁵ library for binding XML data with
  their Java implementation.
• Data export to CSV format, to facilitate its further
  processing using other tools.
Our main interest in AmFostAcolo data relates to users,
their questions related to a tourism entity, as well as the
answers to these questions. For this purpose we have defined
an XML schema for representing this data in XML. Data
was extracted from AmFostAcolo using the technique of Web
scraping, and then was saved onto a set of target XML files.
The XML schema of the extracted data is presented in
Figure 3. The root node represents a user, so there is one such
file for each user registered with AmFostAcolo. For each user
we record the set of questions posted by this user, as well as
their answers, using qapost elements, each containing one
question as well as one answer element. For each answer we also
record the user that posted it, captured using the
qauthor element. Using this simple representation we are able
to capture the interactions established between the site users.
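How a per-user file of this kind yields interactions can be illustrated with a minimal Python sketch. Only the element names qapost and qauthor come from the description above; the root element name, the name attribute, the question, answer and text elements, and the sample content are assumptions made for illustration, since the actual schema is the one given in Figure 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-user file: everything except <qapost> and <qauthor>
# is an assumed layout, and the content is made up.
SAMPLE_USER_FILE = """\
<user name="userA">
  <qapost>
    <question>Is the road to the gorge open in winter?</question>
    <answer>
      <qauthor>userB</qauthor>
      <text>Yes, but only up to the first parking area.</text>
    </answer>
  </qapost>
</user>
"""


def interactions_from_user_xml(xml_text):
    """Return (asker, answerer) pairs found in one per-user XML file."""
    root = ET.fromstring(xml_text)
    asker = root.get("name")
    pairs = []
    for post in root.iter("qapost"):
        for author in post.iter("qauthor"):
            if author.text:  # skip empty author elements
                pairs.append((asker, author.text.strip()))
    return pairs


pairs = interactions_from_user_xml(SAMPLE_USER_FILE)
```

Each returned pair records who asked and who answered, which is the information needed to create one link of the network.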
In our network representation users are vertices, and links
represent interactions between users. To be more explicit,
let us imagine that user A posts a question on the
web-site and user B answers that question. This interaction is
represented as a directed link having B as source vertex and
A as target vertex. In order to understand the evolution of the
interactions we also captured the time of the answer, so that we
can further conduct temporal analysis.
³ http://jsoup.org/
⁴ https://www.mysql.com/
⁵ https://jaxb.java.net/
Fig. 3. XML schema of extracted data.
Fig. 4. AmFostAcolo question answer interaction network. The vertices’
diameters are proportional to their Page-Rank coefficient. The colour of
each vertex depicts its community, as determined with the modularity algorithm
[21]. Links are coloured according to the colour of their source vertex.
In constructing the network we took into consideration all the
web-site’s users and their respective interactions. By applying
the above-mentioned method for network creation we obtained
a directed network with 8017 vertices and 25666 links.
See Figure 4 for a visual representation of the network.
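The construction rule described above (an answer from B to a question of A becomes a link from B to A, stamped with the time of the answer) can be sketched in plain Python as follows. The record layout, the function name and the toy data are hypothetical, and counting repeated answers between the same pair of users as a link weight is an assumption, not necessarily the weighting used for the reported metrics.

```python
from collections import Counter
from datetime import date


def build_interaction_network(records):
    """Build a directed, weighted link list from question/answer records.

    records: iterable of (asker, answerer, answered_on) tuples.
    Returns (weights, first_answer), both keyed by the directed
    link (answerer, asker).
    """
    weights = Counter()   # how many answers went along each link
    first_answer = {}     # earliest answer date seen on each link
    for asker, answerer, answered_on in records:
        link = (answerer, asker)  # source = answerer, target = asker
        weights[link] += 1
        if link not in first_answer or answered_on < first_answer[link]:
            first_answer[link] = answered_on
    return weights, first_answer


# Toy records with made-up users and dates.
records = [
    ("A", "B", date(2012, 3, 1)),
    ("A", "B", date(2011, 7, 15)),
    ("C", "B", date(2013, 1, 9)),
]
weights, first_answer = build_interaction_network(records)
vertices = {user for link in weights for user in link}
```

Keeping the earliest date per link is one possible way to define the time of link formation needed for temporal analysis.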
IV. EXPERIMENTS AND RESULTS
The experiments presented in this section were conducted
using the following tools. Gephi is an interactive visualisation
and exploration platform for all kinds of networks and complex
systems, dynamic and hierarchical graphs [22]. NetworkX is a
Python package for exploration and analysis of networks and
network algorithms [23].
A. Questions to be answered
The purpose of this paper is to analyse the
extracted tourism information sharing network. We hope to
better understand how information is exchanged in such an
environment. A key interest is to observe the development
process of such a network. We can separate our experiments
into 2 categories:
• Experiments that analyse topological aspects
  of the network. These analyses are conducted in
  depth using 3 levels of granularity: entire network,
  community and vertex. The presence of diverse
  sociological phenomena is tested.
• Experiments that regard the evolution of the network
  through time. The methods mentioned in the previous
  category are used, but taking the temporal
  scale into account.
The following questions should be answered in the
subsequent pages:
1) Can we categorise the network as a complex network
type? How does this influence the web-site’s
community of users?
2) Are any sociological phenomena present? What
is their influence on the content sharing network?
3) Is the community of users stable and resilient to
changes?
4) How did the network form over time?
5) Is the community expanding or contracting?
6) Is there any evidence that a review regarding a
tourism entity, either positive or negative, may have
significant influence on the community?
7) Can we determine trends of user interest, seasonal or
country-wise?
B. Network topological aspects
We start by calculating basic metrics for the entire network
[24], see Table I. On average, a user responds to
questions asked by about 3 other users, giving almost 6
answers in total (see rows 1 and 2 of the table). The longest
shortest path between two vertices (the diameter) is 16, which
is small compared to the size of the network (25666 links).
Together with the small average path length, 5.042, this is an
important sign that the small world phenomenon
[25] may be present in our network. Such a phenomenon
may be due to the presence of hubs, vertices
with a large number of connections that interconnect various
parts of a network. In order to detect the presence of hubs
we computed the degree distribution of the vertices. As we can
see in Figure 5, the distribution is scale-free, a type
widely encountered in the natural world [26]. We found
the same type of distribution when we took into consideration
the Page-Rank [27] coefficient of each vertex. These findings
support the presence of hubs, i.e. a few nodes with large degree.
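As an illustration of the metrics discussed above, the following plain Python sketch computes an average degree, a degree distribution, and the diameter and average path length by breadth-first search. It is not the Gephi implementation used for Table I: the function names and the toy link list are hypothetical, and following link direction and averaging only over reachable ordered pairs are assumptions. The one figure it checks against the paper is that 25666 links divided by 8017 vertices gives the Avg. Degree of 3.201.

```python
from collections import Counter, deque


def average_degree(num_vertices, num_links):
    """Links per vertex."""
    return num_links / num_vertices


def degree_distribution(links):
    """Map each total degree (in + out) to the number of vertices having it."""
    degree = Counter()
    for source, target in links:
        degree[source] += 1
        degree[target] += 1
    return Counter(degree.values())


def path_metrics(links):
    """Return (diameter, average shortest path length).

    Paths follow link direction; only reachable ordered pairs count.
    """
    successors = {}
    for source, target in links:
        successors.setdefault(source, set()).add(target)
        successors.setdefault(target, set())
    longest, total, pairs = 0, 0, 0
    for start in successors:
        distance = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            vertex = queue.popleft()
            for neighbour in successors[vertex]:
                if neighbour not in distance:
                    distance[neighbour] = distance[vertex] + 1
                    queue.append(neighbour)
        for hops in distance.values():
            if hops > 0:
                longest = max(longest, hops)
                total += hops
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)


# Toy chain a -> b -> c -> d (made-up vertices).
toy_links = [("a", "b"), ("b", "c"), ("c", "d")]
distribution = degree_distribution(toy_links)
diameter, avg_path = path_metrics(toy_links)
network_avg_degree = average_degree(8017, 25666)
```

A scale-free network would show up in such a distribution as many low-degree vertices and a few very high-degree ones.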
Another way to test for the small world phenomenon is to
conduct a diffusion experiment: we would expect the
diffusion to reach a significant part of the
network. As start vertex we chose a node in the periphery
of the graph with out-degree 8 (double
the average). The diffusion is set to lose 70% of its strength
at each step. Only the vertices up to neighbour of neighbour
can forward the diffusion further; the rest can only receive it.
As can be seen in Figure 6, the diffusion spread to a significant
part of the network, although the start vertex is positioned
in the periphery of the graph. Given all of the above, we
can state that the small world phenomenon is present in
our network. In our context this means that information
travels fast and with ease in our tourism information sharing
network. Thus a review, either positive or negative, could get
noticed by a significant part of the network.
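The diffusion rule can be sketched as follows. This is one possible reading of the description above, not the tool actually used for the experiment: the function name, the adjacency format and the toy chain are hypothetical, and spreading along link direction, letting the first arrival fix a vertex's strength, and treating the start vertex as hop 0 are assumptions.

```python
def diffuse(successors, start, loss=0.7, forwarding_hops=2):
    """Spread a signal from `start` along directed links.

    successors: dict mapping a vertex to the vertices it links to.
    loss: fraction of strength lost at each step (0.7 = 70%).
    forwarding_hops: vertices at most this many hops from `start`
        may forward the signal; vertices further away only receive.
    Returns a dict {vertex: strength received}.
    """
    strength = {start: 1.0}
    frontier = [start]
    hop = 0
    while frontier and hop <= forwarding_hops:
        next_frontier = []
        for vertex in frontier:
            passed_on = strength[vertex] * (1.0 - loss)
            for neighbour in successors.get(vertex, ()):
                if neighbour not in strength:  # first arrival wins
                    strength[neighbour] = passed_on
                    next_frontier.append(neighbour)
        frontier = next_frontier
        hop += 1
    return strength


# Toy chain a -> b -> c -> d -> e (made-up vertices).
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
reached = diffuse(chain, "a")
```

Under this reading the signal reaches vertices at most three hops from the start, so in a small world network with hubs it can still cover a large share of the vertices.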
The presence of hubs may indicate another
social phenomenon, preferential attachment [28]: a set of
processes in which some quantity, typically some form of
wealth or credit, is distributed among a number of individuals
or objects according to how much they already have, so that
those who are already wealthy receive more than those who
are not. In our context this would mean that a post from a very
active user (one who responded to many questions in the past)
weighs more (as in, is trusted more) than an answer from a
typical user.
We continue our analysis by moving the granularity from
the entire network to communities. For community detection we
use the modularity algorithm [21]. The value of the modularity
lies in the range [−1/2, 1). It is positive if the number of edges
within groups exceeds the number expected on the basis of
chance. The value of the modularity coefficient for our network
is 0.48, thus we can argue that social communities are present.
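To make the modularity coefficient concrete, the sketch below evaluates it for a given partition using the standard formula for an undirected, unweighted graph: the sum over communities of L_c/m − (d_c/2m)², where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its vertices. It only scores a partition and does not implement the community detection algorithm of [21]; treating links as undirected and unweighted is an assumption, and the names and toy graph are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    community_of: dict mapping every vertex to a community label.
    """
    m = len(edges)
    internal = {}    # edges with both ends in the same community
    degree_sum = {}  # sum of vertex degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single edge (made-up vertices).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_group = {v: "x" for v in range(1, 7)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives a positive value, while putting all vertices in one community gives zero, in line with the interpretation given above.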
The number of communities detected is 146, but this
value is inflated by many small scattered communities (not
connected with the giant component of the network). By
keeping only the communities that form the giant component
we obtained 25 large communities, see Figure 7. The average
inter-community degree is 47.2 (the maximum is 48), which
shows that the communities are very well interconnected. For
a better understanding of the inter-community connectivity see
the histogram of the degree distribution in Figure 8.
Visually, our network appears to be of core-periphery
type [29], as there is a central conglomerate of nodes
which appears to be very well connected, while the outer parts
of the network are more scattered. To verify this
we had to determine and analyse the core. The core should
have a higher degree of interconnection than the rest
of the network and should be more compact. To extract
the core we eliminated from the network the nodes that have a
degree less than 30 (10 times the average). Thus we obtained a
core network having 300 vertices (3.74% of the total) and 3080
links (12% of the total). The results of the metrics analysis can
be seen in Table II.
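The core extraction step can be sketched as a single degree filter. The function name and the toy links are hypothetical, and using the total degree (in plus out) and filtering once rather than repeatedly are assumptions about how the threshold was applied.

```python
from collections import Counter


def extract_core(links, min_degree):
    """Keep vertices whose total degree is at least `min_degree`.

    links: list of (source, target) pairs.
    Returns (core_vertices, core_links), where core_links are the
    links having both ends in the core.
    """
    degree = Counter()
    for source, target in links:
        degree[source] += 1
        degree[target] += 1
    core = {vertex for vertex, d in degree.items() if d >= min_degree}
    core_links = [(s, t) for s, t in links if s in core and t in core]
    return core, core_links


# Toy network: hub h linked to a, b and c, plus one link a -> b.
toy_links = [("h", "a"), ("h", "b"), ("h", "c"), ("a", "b")]
core, core_links = extract_core(toy_links, min_degree=2)

# Share of vertices kept in the paper's core: 300 out of 8017.
core_share_percent = 100 * 300 / 8017
```

In the toy example vertex c has degree 1 and is dropped, together with its link to the hub.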
As expected, the core is far more connected than the
rest of the network. The average degree is more than 3 times
larger, while the average weighted degree is almost 5 times
larger. Even the diameter is reduced to almost half of its
previous value (see Table I for the metrics of the entire
network). The modularity coefficient has a lower value, as it is
harder to determine communities in such a well connected
environment; this is consistent with the smaller
number of communities. The average clustering coefficient is
more than 4 times larger, while the average path length has
shrunk considerably. All these results support the existence of
a core-periphery structure in our network.
The core-periphery type of complex network is very
resilient to node removals, especially from the core. This means
that our user community is stable, and only a catastrophic event
would lead to a separation of the giant component. The core
acts as a diffusion enhancer: once information gets there
it is distributed fast to all parts of the network. To become a
member of the core a user has to be very active, so in a sense
the community is meritocratic.
At the granularity level of vertices we were most interested
in analysing geographical points of interest. The majority of
users have stated their location, usually a city or a county. The
large majority of questions on the web-site refer to a tourism
entity, and each tourism entity can be pinpointed to a location.
A location may be a country, a city or a specific address. Thus
we were able to construct a network where vertices represent
locations. A directed link from vertex A to vertex B was
constructed if a user from location A answered a question
regarding location B.
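The aggregation from users to locations can be sketched as follows. The function name, the input layout and the toy data are hypothetical; skipping answerers without a stated location and counting answers as link weights are assumptions.

```python
from collections import Counter


def build_location_network(answers, user_location):
    """Aggregate answers into weighted links between locations.

    answers: iterable of (answerer, question_location) pairs.
    user_location: dict mapping a user to the location they stated.
    Returns a Counter keyed by (answerer_location, question_location).
    """
    weights = Counter()
    for answerer, question_location in answers:
        origin = user_location.get(answerer)
        if origin is None:  # the user did not state a location
            continue
        weights[(origin, question_location)] += 1
    return weights


# Toy data with made-up users.
user_location = {"u1": "Bucharest", "u2": "Sibiu"}
answers = [
    ("u1", "Greece"),
    ("u1", "Greece"),
    ("u2", "Bulgaria"),
    ("u3", "Turkey"),  # u3 has no stated location and is skipped
]
location_links = build_location_network(answers, user_location)
```

Rankings of locations can then be derived from the weighted degrees of this aggregated network.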
Table III depicts the top 5 locations from which questions
are answered and the top 5 countries of interest for the
web-site users. The ranking was made based on weighted in-degree
and weighted out-degree, respectively. Although much more work
has to be done regarding geographical analysis, the current
results are consistent with the Romanian tourism market⁶.
Fig. 5. Degree distribution of vertices.
Metric                                                        Value
Avg. Degree                                                   3.201
Avg. Weighted degree                                          5.881
Diameter                                                      16
Modularity coefficient                                        0.48
Number of Communities detected by the modularity algorithm   146
Avg. Clustering coefficient                                   0.015
Avg. Path length                                              5.042
TABLE I. TABLE CONTAINING BASIC GRAPH METRICS OF THE ENTIRE NETWORK.
Fig. 7. The inter-community graph. Each vertex represents a community;
communities can be distinguished through their colour. The vertices’ diameter is
proportional to their respective Page-Rank coefficient. Only the communities
that have inter-community degree larger than 0 are considered.
C. Temporal analysis
As mentioned before, in order to conduct the temporal
analysis we had to capture the time of link formation. We
decided to take a snapshot of the network on January 1st
of each year from 2010 to 2015. The results can be seen in
Figure 9. The network grows significantly from the inside out
(like a tree), which we regard as natural growth.
⁶ http://vacantalamare.stirileprotv.ro/stiri/romanii-nu-mai-prefera-bulgaria-topul-destinatiilor-de-vacanta-alese-in-aceasta-vara.html, last visited July 20th 2015
Fig. 8. Degree distribution of communities. Only the communities that have
inter-community degree larger than 0 are considered.
Metric                                                        Value
Avg. Degree                                                   10.267
Avg. Weighted degree                                          28.477
Diameter                                                      9
Modularity coefficient                                        0.299
Number of Communities detected by the modularity algorithm   10
Avg. Clustering coefficient                                   0.068
Avg. Path length                                              3.262
TABLE II. TABLE CONTAINING BASIC GRAPH METRICS OF THE CORE.
Fig. 10. Degree distribution of communities. Only the communities that have
inter-community degree larger than 0 are considered.
The growth process seems to follow these steps:
1) From the current core, start developing towards the outside,
see years 2010 and 2014.
2) Develop connections with the outskirts until they
become a part of the core (as connected as the core),
see years 2011 to 2013 and 2015.
3) Return to step 1.
Further, we wanted to analyse the development pace of
the network: is it slowing down or speeding up? As can be seen in
Figure 10, the pace of link formation has been constant, apart
from a significant rise between 2010 and 2012.
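The yearly snapshots and the pace of link formation can be sketched as below. The function names and toy links are hypothetical, and counting a link in a snapshot only if it was formed strictly before January 1st of that year is an assumption.

```python
from datetime import date


def yearly_snapshots(timed_links, first_year=2010, last_year=2015):
    """Return {year: links existing on January 1st of that year}.

    timed_links: iterable of (source, target, formed_on) tuples.
    """
    snapshots = {}
    for year in range(first_year, last_year + 1):
        cutoff = date(year, 1, 1)
        snapshots[year] = [
            (source, target)
            for source, target, formed_on in timed_links
            if formed_on < cutoff
        ]
    return snapshots


def links_added_between_snapshots(snapshots):
    """Return {year: links gained since the previous snapshot}."""
    years = sorted(snapshots)
    return {
        later: len(snapshots[later]) - len(snapshots[earlier])
        for earlier, later in zip(years, years[1:])
    }


# Toy links with made-up users and dates.
timed_links = [
    ("B", "A", date(2009, 5, 2)),
    ("C", "A", date(2010, 8, 19)),
    ("D", "B", date(2010, 11, 30)),
    ("E", "C", date(2013, 2, 14)),
]
snapshots = yearly_snapshots(timed_links)
growth = links_added_between_snapshots(snapshots)
```

Plotting the values of growth per year gives a simple view of whether link formation is speeding up or slowing down.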
V. CONCLUSION AND FUTURE WORK
In conclusion, we can argue that the Complex Networks
methods of analysis allowed us to answer all the questions
raised in section IV-A. This shows the usefulness of such
methods in the presented case study.
Fig. 6. AmFostAcolo question answer interaction network with diffusion experiment. The vertices’ diameters are proportional to their Page-Rank coefficient
[27]. The colour of each vertex depicts whether the diffusion has reached it (red) or not (grey). Links are coloured according to the colour of their source vertex.
The arrow points to the start vertex of the diffusion.
Top locations for question answering   Weighted In-Degree   Top country destinations   Weighted Out-Degree
Bucharest                              1126                 Greece                     865
Giurgiu                                276                  Bulgaria                   685
Brasov                                 259                  Turkey                     394
Iasi                                   242                  Romania                    227
Sibiu                                  198                  Egypt                      168
TABLE III. TABLE STATING TOP 5 DEPARTURE LOCATIONS AND DESTINATIONS.
We were able to establish that the network is of
core-periphery type, and to describe the influence this has,
especially on the increased resilience of the network. We found
significant evidence for the presence of the small world and
preferential attachment phenomena. We showed that the network
developed in a “natural” way, from the inside out. We also
showed that the rate of link creation is on a stable positive
slope, thus the community is still in the process of expansion.
We showed that a review, either positive or negative, can be
perceived very fast by a significant part of the network, due to
the type of the network and to the phenomena just mentioned.
We were able to determine the countries which are of most
interest to the users of the web-site.
We acknowledge that further work is needed. We plan to
expand our data source to several web-sites. We also plan to
analyse the communities in more detail. The geographical
analysis has to be further expanded, and the seasonal visitation
trends have to be discovered and analysed.
ACKNOWLEDGEMENT
This work was supported by the strategic grant
POSDRU/159/1.5/2/133255, Project ID 133225 (2014), co-financed
by the European Social Fund within the Sectorial Operational
Program Human Resources Development 2007-2013.
Fig. 9. Time lapse of the evolution of the network as captured on the first day of each year. From left to right and top to bottom the respective years are: 2010,
2011, 2012, 2013, 2014, 2015.
REFERENCES
[1] G. Büyüközkan and B. Ergün, “Intelligent system applications in
electronic tourism,” Expert Systems with Applications, vol. 38, no. 6,
pp. 6586–6598, 2011.
[2] U. Gretzel, H. Werthner, C. Koo, and C. Lamsfus, “Conceptual foun-
dations for understanding smart tourism ecosystems,” Computers in
Human Behavior, 2015.
[3] E. No and J. K. Kim, “Comparing the attributes of online tourism
information sources,” Computers in Human Behavior, 2015.
[4] M. Antonie, C. Bădică, and A. Becheru, “Towards social data
analytics for smart tourism: A network science perspective,” accepted at
RUMOUR-2015, Workshop on Social Media and the Web of Linked
Data, 18 July 2015, Sibiu, Romania.
[5] A. Rényi and P. Erdős, “On random graphs,” Publicationes
Mathematicae, vol. 6, pp. 290–297, 1959.
[6] M. S. Granovetter, “The strength of weak ties,” American journal of
sociology, pp. 1360–1380, 1973.
[7] A.-L. Barabási, N. Gulbahce, and J. Loscalzo, “Network medicine: a
network-based approach to human disease,” Nature Reviews Genetics,
vol. 12, no. 1, pp. 56–68, 2011.
[8] C. Wilson, “Searching for Saddam: Why social network analysis hasn’t
led us to Osama bin Laden,” Slate (February 26, 2010), 2010.
[9] R. L. Cross, J. Singer, S. Colella, R. J. Thomas, and Y. Silverstone,
The organizational network fieldbook: Best practices, techniques and
exercises to drive organizational innovation and performance. John
Wiley & Sons, 2010.
[10] W.-S. Yang and S.-Y. Hwang, “itravel: A recommender system in mobile
peer-to-peer environment,” Journal of Systems and Software, vol. 86,
no. 1, pp. 12–20, 2013.
[11] M. Rodriguez-Sanchez, J. Martinez-Romo, S. Borromeo, and
J. Hernandez-Tamames, “Gat: Platform for automatic context-aware
mobile services for m-tourism,” Expert Systems with Applications,
vol. 40, no. 10, pp. 4154–4163, 2013.
[12] D. Gavalas, C. Konstantopoulos, K. Mastakas, and G. Pantziou, “Mobile
recommender systems in tourism,” Journal of Network and Computer
Applications, vol. 39, pp. 319–333, 2014.
[13] M. Al-Hassan, H. Lu, and J. Lu, “A semantic enhanced hybrid rec-
ommendation approach: A case study of e-government tourism service
recommendation system,” Decision Support Systems, vol. 72, pp. 97–
109, 2015.
[14] Á. García-Crespo, J. L. López-Cuadrado, R. Colomo-Palacios,
I. González-Carrasco, and B. Ruiz-Mezcua, “Sem-fit: A semantic based
expert system to provide recommendations in the tourism domain,”
Expert Systems with Applications, vol. 38, no. 10, pp. 13310–13319,
2011.
[15] K. Jiang, H. Yin, P. Wang, and N. Yu, “Learning from contextual
information of geo-tagged web photos to rank personalized tourism
attractions,” Neurocomputing, vol. 119, pp. 17–25, 2013.
[16] J. Borràs, A. Moreno, and A. Valls, “Intelligent tourism recommender
systems: A survey,” Expert Systems with Applications, vol. 41, no. 16,
pp. 7370–7389, 2014.
[17] M. Colhon, C. Bădică, and A. Şendre, “Relating the opinion holder
and the review accuracy in sentiment analysis of tourist reviews,” in
Knowledge Science, Engineering and Management. Springer, 2014,
pp. 246–257.
[18] A. Becheru, C. Bădică, and M. Colhon, “Tourist review analytics using
complex networks.”
[19] R. Baggio, N. Scott, and C. Cooper, “Network science: A review
focused on tourism,” Annals of Tourism Research, vol. 37, no. 3, pp.
802–827, 2010.
[20] D. F. Nettleton, “Data mining of social networks represented as graphs,”
Computer Science Review, vol. 7, pp. 1–34, 2013.
[21] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast
unfolding of communities in large networks,” Journal of Statistical
Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
[22] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open
source software for exploring and manipulating networks,” 2009.
[Online]. Available: http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154
[23] D. A. Schult and P. Swart, “Exploring network structure, dynamics, and
function using networkx,” in Proceedings of the 7th Python in Science
Conferences (SciPy 2008), vol. 2008, 2008, pp. 11–16.
[24] P. Boldi and S. Vigna, “Axioms for centrality,” Internet Mathematics,
vol. 10, no. 3-4, pp. 222–262, 2014.
[25] S. Milgram, “The small world problem,” Psychology today, vol. 2, no. 1,
pp. 60–67, 1967.
[26] A.-L. Barabási et al., “Scale-free networks: a decade and beyond,”
Science, vol. 325, no. 5939, p. 412, 2009.
[27] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation
ranking: bringing order to the web.” 1999.
[28] M. E. Newman, “Clustering and preferential attachment in growing
networks,” Physical Review E, vol. 64, no. 2, p. 025102, 2001.
[29] D. A. Hojman and A. Szeidl, “Core and periphery in networks,” Journal
of Economic Theory, vol. 139, no. 1, pp. 295–309, 2008.