2019 International Conference on Computing, Mathematics and Engineering Technologies – iCoMET 2019
978-1-5386-9509-8/19/$31.00 ©2019 IEEE
Sentiment Analysis of News Articles:
A Lexicon based Approach
Soonh Taj, Baby Bakhtawer Shaikh, Areej Fatemah Meghji
Mehran University of Engineering and Technology, Jamshoro, Pakistan
engr.soonhtaj@gmail.com, malkashaikh70@yahoo.com, areej.fatemah@faculty.muet.edu.pk
Abstract: The modern technological era has reshaped
traditional lifestyles in several domains. The medium of
publishing news and events has become faster with the
advancement of Information Technology (IT). IT has also been
flooded with immense amounts of data, published every minute
of every day by millions of users in the form of comments,
blogs, news shared through blogs, social media, micro-blogging
websites and more. Manual traversal of such huge data is a
challenging job; thus, sophisticated methods are required to
perform this task automatically and efficiently. News reports
describe events that carry emotions: good, bad or neutral.
Sentiment analysis is used to investigate human emotions
(i.e., sentiments) present in textual information. This paper
presents a lexicon-based approach for sentiment analysis of
news articles. The experiments have been performed on the BBC
news dataset and demonstrate the applicability and validity of
the adopted approach.
Keywords— Sentiment analysis, Lexicon-based approach, news
articles.
I. INTRODUCTION
With the emergence of the Internet, web and mobile
technologies, people have changed their way of consuming
news. Traditional physical newspapers and magazines have
been replaced by virtual online versions like online news and
weblogs. Readers are more inclined to use online sources of
news mainly due to two key features: interactivity and
immediacy [1].
In this day and age, people want to consume as much news,
from as many sources, as they possibly can, on matters that
are important to them or that catch their attention.
Interactivity refers to the tendency of readers to seek out
and consume news that interests them. Immediacy represents the
need of people to be informed about news without delay [2].
Today's technology allows people to benefit from these
features by providing instant news on events as they happen in
real time. Online news websites have developed effective
strategies to draw people's attention [3]. Online news
expresses opinions regarding news entities, which may comprise
people, places or even things, while reporting on events that
have recently occurred [4]. For this reason, several news
websites offer interactive emotion rating services through
which news can be rated as positive, negative or neutral [5].
Sentiment Analysis, or Opinion Mining, is a way of finding
the polarity or strength of the opinion (positive or negative)
expressed in written text; in the case of this paper, a news
article [3], [4]. Manual labeling of sentiment words is a
time-consuming process. Two popular approaches are used to
automate sentiment analysis: the first makes use of a lexicon
of weighted words, and the second is based on machine
learning. Lexicon-based methods use a dictionary of opinion
words and match the words of a given text against it to find
polarity. As opposed to machine learning methods, this
approach does not need to preprocess data, nor does it have to
train a classifier [6]. This research is based on a method for
lexicon-based sentiment analysis of news articles.
The remainder of this paper is organized as follows:
Section II presents related work conducted in sentiment
analysis for news articles. Section III presents the proposed
methodology and experiment setup of this paper. Results have
been presented in Section IV followed by limitations of the
research in Section V. Finally, Section VI presents the
conclusion of this research.
II. RELATED WORK
Many researchers have contributed to news sentiment
analysis using different approaches. This section briefly
discusses previous work on sentiment analysis.
Reis, Olmo, Benevenuto, Kwak, Prates and An proposed a
methodology to discover the relationship between sentiment
polarity and news popularity [3]. Using different sentiment
analysis methods, an experiment was conducted on the content
of 69,907 headlines produced by four well-known media
corporations: The New York Times, BBC, Reuters, and
Dailymail. After extracting features from the text of the
headlines, the research analyzed their sentiment polarity. It
concluded that the polarity of a headline had a strong impact
on the popularity of the news article: negative and positive
headlines gained greater interest than headlines with a
neutral tone.
Godbole, Srinivasaiah, and Sekine built an algorithm based
on sentiment lexicons that finds the sentiment words and
associated entities in a text corpus of news and blogs by
looking at the co-occurrence of an entity and a sentiment word
in the same sentence [4]. Seven dimensions, namely general,
health, crime, sports, business, politics and media, were
selected for sentiment analysis of news and blogs. Two trends
were analyzed in the experiment: 1) Polarity: whether the
sentiment associated with an entity is positive or negative,
and 2) Subjectivity: how much sentiment an entity garners.
Scores for both polarity and subjectivity were calculated.
Islam, Ashraf, Abir and Mottalib proposed an approach to
classify online news [6]. Sentiment analysis was done at the
sentence level, and a dynamic dictionary with predefined
positive and negative words was used to help find sentiment
polarity. The following steps were carried out for news
article classification: 1) Selection of an online news
article. 2) Extraction of sentences from the news article.
Sentences can be simple, compound, complex or compound
complex. 3) Search for positive words, phrases or clauses in
those sentences and finding their polarities. 4) Combining the
polarities of all sentences to get the final polarity of the
news article. An accuracy of 91% was reported for the
classification of news articles.
Meyer, Bikdash and Dai performed fine-grained sentiment
analysis of financial news headlines using both a machine
learning approach and a lexicon-based approach [8]. A total of
eight experiments were conducted to find more accurate
results, and the results from both approaches were compared.
For the lexicon-based approach, a Bag of Words (BOW) model
along with the General Inquirer Lexicon (H4N) was used to
determine sentiment polarities. For the machine learning
approach, a Parts of Speech (POS) syntactic model was used.
The experiments concluded that more accurate results were
obtained using the machine learning approach.
Shirsat, Jagdale and Deshmukh proposed a methodology
for sentiment analysis at the document level so that the
polarity of an entire news article could be determined [9].
The paper explored a dataset of 2225 documents. After text
pre-processing through tokenization, stop word removal and
stemming, post-processing was done on the entire news
article. In this step a sentiment score was assigned to the
article, and based on this score the news articles were
categorized as positive, negative or neutral.
Agarwal, Sharma, Sikka and Dhir performed opinion
mining using Python packages to classify words and
SentiWordNet 3.0 to identify positive and negative words, so
that the overall impact (positive or negative sentiment) of a
news headline could be evaluated [10]. The impact of news
headlines was analyzed using two algorithms.
Algorithm 1: Preprocessing of each word
Select a news headline, then pre-process each word in it
using a POS tagger, lemmatization and stemming. This is
done using the Natural Language Toolkit (NLTK).
Algorithm 2: Analyzing news headlines
After pre-processing, pass each word into the SentiWordNet
3.0 dictionary to find its positive, negative and objective
scores. If positive score > negative score, mark the news
headline as positive. If positive score < negative score,
mark the news headline as negative.
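The decision rule of Algorithm 2 can be sketched as follows. This is a minimal illustration only: TOY_LEXICON and classify_headline are hypothetical names, the scores are invented, and the handling of ties (mapped to neutral here) is an assumption, since the summary above does not state it. Real SentiWordNet 3.0 scores are attached to synsets rather than to bare words.

```python
# Hypothetical miniature lexicon: word -> (positive, negative, objective).
# The values are invented for illustration and are NOT SentiWordNet scores.
TOY_LEXICON = {
    "win": (0.625, 0.0, 0.375),
    "growth": (0.5, 0.0, 0.5),
    "crisis": (0.0, 0.75, 0.25),
    "loss": (0.0, 0.625, 0.375),
    "report": (0.0, 0.0, 1.0),
}


def classify_headline(tokens, lexicon=TOY_LEXICON):
    """Compare summed positive and negative scores of a pre-processed headline."""
    pos_total = 0.0
    neg_total = 0.0
    for tok in tokens:
        # Words missing from the lexicon are treated as fully objective.
        pos, neg, _obj = lexicon.get(tok, (0.0, 0.0, 1.0))
        pos_total += pos
        neg_total += neg
    if pos_total > neg_total:
        return "positive"
    if pos_total < neg_total:
        return "negative"
    return "neutral"  # assumption: the tie case is not specified in the text


print(classify_headline(["growth", "report"]))
print(classify_headline(["crisis", "loss"]))
```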
Lei, Rao, Li, Quan, and Wenyin built a model for
detecting social emotions induced by news articles, tweets
etc. [11]. The model comprises modules for document selection,
part-of-speech tagging, and lexicon generation based on
social emotions. The model first creates a training set from
a corpus of news documents and then applies POS tagging and
feature extraction. After this step, social emotion lexicons
are generated by calculating the probabilities of the
emotions based on the document. To test the accuracy of the
model, a dataset of 40,897 news articles collected from the
societal channel was used.
III. RESEARCH METHODOLOGY
The methodology used for sentiment analysis of news
articles in this paper is based on the Lexicon-based approach.
Sentiment analysis can generally be carried out using
supervised or unsupervised approaches. A supervised approach
relies on a set of labeled training data that is used to build
a classification model, with the intent of using this model to
classify new data for which labels are not present.
Unsupervised or lexicon-based approaches to sentiment
analysis do not require any training data. In this approach,
the sentiment conveyed by a word is inferred from the polarity
of the word. In the case of a sentence or a document, the
polarities of the individual words collectively convey the
sentiment of the sentence or the document. Thus the polarity
of a sentence is the cumulative total (sum) of the polarities
of the individual words (or phrases) in the sentence [12].
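In symbols, and under an assumed notation that is not part of the original text, let a sentence s consist of the words w_1, ..., w_n and let pol(w) denote the lexicon polarity of a word, taken as 0 for words absent from the lexicon. The summation described above then reads:

```latex
\mathrm{pol}(s) \;=\; \sum_{i=1}^{n} \mathrm{pol}(w_i)
```

As a purely hypothetical example, if "good" has polarity +1, "bad" has polarity -1 and all other words have polarity 0, the sentence "good news after bad start" sums to 1 + 0 + 0 - 1 + 0 = 0.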
This approach utilizes predefined lists of words in which
each word is associated with a specific sentiment. The
approach can further use the following methods:
1. Dictionary-based methods: a lexicon dictionary is used to
find the positive opinion words and negative opinion words.
2. Corpus-based methods: a large corpus of words is used and,
based on syntactic patterns, other opinion words can be found
within the context.
Sentiment analysis can be done at the document level,
sentence level, word level or phrase level. This paper
explores sentiment analysis at the document level. Similar to
[13], [14], this research identifies whether the opinions
expressed in news articles are positive, negative or neutral.
The dictionary-based approach has been used for sentiment
analysis of news articles, utilizing the WordNet lexical
dictionary. The experiment for this research was carried out
using the RapidMiner tool. The methodology for this experiment
is presented in Fig. 1.
Fig. 1. Sentiment Analysis Methodology
The methodology comprised five steps, starting with data
collection. The BBC News dataset has been used for this
experiment. The next step was preprocessing the collected
data in order to reduce inconsistencies in the dataset. The
polarity of the words in the collected news articles was
computed next using the WordNet lexical dictionary. The steps
are explained in detail below.
A. Data Collection
The BBC News dataset was utilized for this experiment.
The dataset is available online at
http://mlg.ucd.ie/datasets/bbc.html. It comprises a total of
2225 documents, each a news article reported on the BBC news
website between 2004 and 2005. The news stories belong to five
topical areas, which serve as the class labels: business,
entertainment, politics, sport, and tech.
B. Text Pre-processing
News articles in the dataset were preprocessed.
Preprocessing is a necessary step to clean text (reduce its
noise) and to remove inconsistencies from it so that the
cleansed data can be used more effectively in a text mining or
sentiment analysis task [15]. The entire preprocessing task
was carried out using the RapidMiner tool, which provides a
vast set of operators for preprocessing tasks. The first
preprocessing task was tokenizing the text of the news
articles into a set of tokens using the “Tokenize” operator.
Tokenizing breaks a sequence of characters into individual
components such as words, phrases or symbols, which are termed
tokens. Apart from individual words and phrases, tokens can
even comprise entire sentences. During tokenization some
characters, such as punctuation marks, are discarded. After
tokenization, the text of every document was converted to
lower case using the “Transform Cases” operator. Stop words
were removed from the text using the “Filter Stopwords
(English)” operator. The next task was reducing inflected or
derived words to a common base form through a process called
stemming. Stemming of words was done using the “Stem
(WordNet)” operator.
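The four preprocessing steps above can be approximated outside RapidMiner with a few lines of Python. The sketch below is not the RapidMiner implementation: STOP_WORDS is a tiny illustrative list, crude_stem is a hypothetical suffix stripper standing in for the WordNet-based stemmer, and all function names are assumptions made for this example.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and",
              "is", "are", "was", "were", "to"}


def tokenize(text):
    """Split text into alphabetic tokens, discarding punctuation and digits."""
    return re.findall(r"[A-Za-z]+", text)


def crude_stem(token):
    """Strip a few common suffixes; a stand-in, not the WordNet stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = tokenize(text)                              # 1. tokenize
    tokens = [t.lower() for t in tokens]                 # 2. lower-case
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. drop stop words
    return [crude_stem(t) for t in tokens]               # 4. stem


print(preprocess("The markets were rising, and shares jumped."))
```

The crude stemmer over-strips some words (for example "rising" becomes "ris"), which is one reason dictionary-backed stemmers or lemmatizers are usually preferred in practice.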
C. Calculate Polarity of Sentiment Words
After preprocessing, the statistical technique known as
Term Frequency-Inverse Document Frequency (TF-IDF) was
applied. TF-IDF counts how often a term occurs in a document
and discounts terms that occur in many documents of the
collection [16]. Words that occur frequently in a document,
but not throughout the whole collection, are considered
important and receive a higher weight. Using TF-IDF, important
words or terms in each document were identified and assigned a
weight according to their occurrence in the news article.
After identification of important words, a dictionary was
used to assign a sentiment score to the discovered words. The
WordNet dictionary, which is a lexical database for the
English language, was used in this experiment. WordNet
contains more than 118,000 different word forms and more than
90,000 different word senses [17]. In this experiment it was
used to find opinion words in a given text and to assign a
sentiment score to them.
D. Calculate Total Sentiment Score
According to the principle of document-level sentiment
analysis, each individual document is tagged with its
respective polarity. This is generally done by finding the
polarities of individual words, phrases and sentences and
combining them to predict the polarity of the whole document.
Treating each news article as a document, the sentiment
conveyed in the article was computed by combining the
polarities of the individual words, phrases and sentences in
it.
The sentiment score of the whole news article was
calculated using the “Extract Sentiment” operator. This
operator provides the final sentiment result: text with a
sentiment score of -1 is considered negative and text with a
sentiment score of +1 is considered positive. The operator
relies on the SentiWordNet 3.0.0 dictionary, which is an
extension of the WordNet dictionary; WordNet and SentiWordNet
are connected by synset IDs. The total sentiment score of each
news article was also calculated using the score sentiment
function based on the WordNet and SentiWordNet dictionaries.
E. Sentiment Results
News articles were classified into positive, negative and
neutral classes according to their total sentiment score. The
sentiment of each news article was calculated as the average
of the sentiment values of its words.
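The averaging and labelling step can be illustrated with a short sketch. Several details are assumptions made for this example and are not confirmed by the text: the word polarities in TOY_POLARITY are invented, the average is taken only over words found in the lexicon, and the label is derived from the sign of the average (+1, -1 or 0).

```python
# Hypothetical word-level polarities in [-1, 1]; not taken from SentiWordNet.
TOY_POLARITY = {"gain": 0.5, "strong": 0.25, "fear": -0.5, "fall": -0.25}


def article_score(tokens, lexicon=TOY_POLARITY):
    """Average polarity of the tokens found in the lexicon; 0.0 if none match."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def article_label(score):
    """Map the sign of the score to +1 (positive), -1 (negative) or 0 (neutral)."""
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


tokens = ["gain", "strong", "fall", "market"]
print(article_score(tokens), article_label(article_score(tokens)))
```

With a sign-based rule, an article is neutral only when its positive and negative words cancel exactly or when no lexicon word occurs in it.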
IV. RESULTS AND DISCUSSION
News articles with a sentiment score of 0 were considered
neutral, news articles with a score of +1 were treated as
positive, and news articles with a score of -1 were treated as
negative. The results of the experiment are presented in
Table I.
TABLE I. SENTIMENT RESULTS
News Class      Total Articles   Positive   Negative   Neutral
Business        510              274        205        31
Entertainment   401              163        220        18
Politics        417              205        200        12
Sport           511              246        236        29
Tech            401              170        216        15
It was observed that the majority of news articles fell
into the negative or positive categories, with a small
percentage of articles having neutral sentiment. A majority of
news articles in the Entertainment and Tech categories
exhibited negative sentiment, whereas the Business and Sport
categories contained more articles depicting positive
sentiment than negative. The Politics category had an almost
equal proportion of articles exhibiting positive and negative
sentiment. The results of sentiment analysis have been
graphically represented in Fig. 2.
Fig. 2. Results of News Articles
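The proportions discussed above can be recomputed from the counts printed in Table I. The sketch below only restates those counts; the helper names shares and majority are hypothetical. Note that the per-class totals as printed add up to 2240, whereas Section III-A gives 2225 documents for the dataset; the sketch uses the table values unchanged.

```python
# Counts as printed in Table I: class -> (positive, negative, neutral).
TABLE_I = {
    "Business": (274, 205, 31),
    "Entertainment": (163, 220, 18),
    "Politics": (205, 200, 12),
    "Sport": (246, 236, 29),
    "Tech": (170, 216, 15),
}


def shares(counts):
    """Percentage of positive, negative and neutral articles in one class."""
    total = sum(counts)
    return tuple(round(100 * c / total, 1) for c in counts)


def majority(counts):
    """Label of the largest of the three counts."""
    labels = ("positive", "negative", "neutral")
    return labels[counts.index(max(counts))]


for name, counts in TABLE_I.items():
    print(name, sum(counts), shares(counts), majority(counts))
```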
V. RESEARCH LIMITATIONS AND CHALLENGES
Most sentiment analysis research focuses on text written in
English and Chinese, with comparatively little work carried
out on Arabic, Thai and Italian. Lexical databases also suffer
from insufficient or limited word coverage, since many new
words and their new semantics must continually be added [18].
The accuracy of sentiment classification is also a challenge
in sentiment analysis. Choosing the techniques most suitable
for a specific sentiment analysis task is a further challenge,
as the nature of the dataset keeps changing: datasets of news,
reviews, and blogs all have text expressed in various formats.
This causes a variation in the accuracy and performance of
sentiment analysis classifiers [19].
A limitation of the proposed model is that it only uses
English news articles from one source for sentiment analysis.
VI. CONCLUSION
There are many directions in sentiment analysis that can be
explored. This paper explored sentiment analysis of news
articles using a BBC dataset comprising news articles
published between 2004 and 2005. It was observed that the
business and sport categories had more positive articles,
whereas entertainment and tech had a majority of negative
articles. Future work will be based on sentiment analysis of
news using various machine learning approaches, together with
the development of an online application where users can read
news that interests them and customize their news feed based
on sentiment analysis methods.
REFERENCES
[1] M. Karlsson, “The immediacy of online news, the visibility of
journalistic processes and a restructuring of journalistic authority,”
Journalism, vol. 12, no. 3, pp. 279–295, 2011.
[2] A. Kohut, C. Doherty, M. Dimock, and S. Keeter, “Americans spending
more time following the news,” Pew Research Center, 2010.
[3] J. Reis, P. Olmo, F. Benevenuto, H. Kwak, R. Prates, and J. An,
“Breaking the news: first impressions matter on online news,” in ICWSM
’15, 2015.
[4] N. Godbole, M. Srinivasaiah, and S. Sekine, “Large-scale sentiment
analysis for news and blogs,” in International Conference on Weblogs
and Social Media, Denver, CO, 2007.
[5] J. Lei, Y. Rao, Q. Li, X. Quan, and L. Wenyin, “Towards building a
social emotion detection system for online news,” Future Generation
Computer Systems, vol. 37, pp. 438–448, 2014.
[6] M. U. Islam, F. B. Ashraf, A. I. Abir, and M. A. Mottalib, “Polarity
detection of online news articles based on sentence structure and
dynamic dictionary,” 2017 20th International Conference of Computer
and Information Technology (ICCIT), Dhaka, 2017, pp. 1–5.
doi: 10.1109/ICCITECHN.2017.8281777
[7] V. Kharde and P. Sonawane, “Sentiment analysis of twitter data: a
survey of techniques,” International Journal of Computer Applications,
vol. 139, no. 11, pp. 5–15, 2016.
[8] B. Meyer, M. Bikdash, and X. Dai, “Fine-grained financial news
sentiment analysis,” SoutheastCon 2017, 2017.
[9] V. S. Shirsat, R. S. Jagdale, and S. N. Deshmukh, “Document level
sentiment analysis from news articles,” 2017 International Conference
on Computing, Communication, Control and Automation (ICCUBEA),
Pune, 2017, pp. 1–4.
[10] A. Agarwal, V. Sharma, G. Sikka, and R. Dhir, “Opinion mining of news
headlines using SentiWordNet,” 2016 Symposium on Colossal Data
Analysis and Networking (CDAN), Indore, 2016, pp. 1–5.
doi: 10.1109/CDAN.2016.7570949
[11] J. Lei, Y. Rao, Q. Li, X. Quan, and L. Wenyin, “Towards building a
social emotion detection system for online news,” Future Generation
Computer Systems, vol. 37, pp. 438–448, 2014.
[12] C. Musto, G. Semeraro, and M. Polignano, “A comparison of
lexicon-based approaches for sentiment analysis of microblog posts,”
Information Filtering and Retrieval, 59, 2014.
[13] A. Dandrea, F. Ferri, P. Grifoni, and T. Guzzo, “Approaches, tools and
applications for sentiment analysis implementation,” International
Journal of Computer Applications, vol. 125, no. 3, pp. 26–33, 2015.
[14] M. Devika, C. Sunitha, and A. Ganesh, “Sentiment analysis: a
comparative study on different approaches,” Procedia Computer
Science, vol. 87, pp. 44–49, 2016.
[15] E. Haddi, X. Liu, and Y. Shi, “The role of text pre-processing in
sentiment analysis,” Procedia Computer Science, vol. 17, pp. 26–32,
2013.
[16] K. Ghag and K. Shah, “SentiTFIDF – Sentiment classification using
relative term frequency inverse document frequency,” International
Journal of Advanced Computer Science and Applications, vol. 5, no. 2,
2014.
[17] G. A. Miller, “WordNet: a lexical database for English,”
Communications of the ACM, vol. 38, no. 11, pp. 39–41, Jan. 1995.
[18] Z. Madhoushi, A. R. Hamdan, and S. Zainudin, “Sentiment analysis
techniques in recent works,” 2015 Science and Information Conference
(SAI), 2015.
[19] D. M. E. D. M. Hussein, “A survey on sentiment analysis challenges,”
Journal of King Saud University - Engineering Sciences, vol. 30, no. 4,
pp. 330–338, 2018.
... In this part, we provide the sentiment analysis by analyzing the abstract and then labeling it into three classifications: positive, negative, or neutral (Carrillo-de-Albornoz et al., 2018;Sharma et al., 2023;Taj et al., 2019). This labeling also assists by The ChatGPT, which was already used in the previous study (Javaid et al., 2023;Praveen & Vajrobol, 2023;Wang et al., 2023). ...
... Sentiment analysis extends our understanding of the field by revealing a complex landscape of positive, negative, and neutral sentiments about ChatGPT's role in education (Carrillo-de-Albornoz et al., 2018;Sharma et al., 2023;Taj et al., 2019). This complexity is mainly present in higher education, where concern about academic integrity combines with optimism about ChatGPT's ability to enhance student learning, engagement, and critical thinking. ...
Article
This bibliometric study involves 241 articles about ChatGPT within education, revealing a robust study field with an extraordinary annual growth rate of 23,900% and a high level of international collaboration (18.67%). The results also show that leading countries contribute distinct insights. Five unique study clusters emerged from co-occurrence analysis, concentrating on the development, role, and practical impact of ChatGPT. The multidisciplinary scope of the research highlights ChatGPT’s broad applicability from transformations to ethical dilemmas. Sentiment analysis also showed that teaching is essential, especially in higher education and medicine. The limitations of this study are concentrated on specific databases. Future research suggests adding more databases, the ethical, and pedagogical implications.
... Meanwhile, clustering methods as a subset of an unsupervised approach that uses TF-IDF (term frequency-inverse document frequency) as a criterion for text classification, where TF corresponds to terms frequency, IDF is a weighting factor, and the term of highest TF-IDF is considered as Potential features. Taj et al. (2019) and Hung et al. (2021) joint TF-IDF and WordNet for sentiment analysis task, Word-Net is used to link the polarity terms with their corresponding scores. A threshold operator is proposed to determine the appropriate class. ...
... It was found that LSTM with the help of data augmentation outperforms CNN and R-CNN. Taj et al. (2019) used the same deep learning algorithms over the Pre-trained representation model for sentiment analysis. Following the same approach, Wint et al. (2018) proposed a SA approach based on Bi-LSTM stacked over two parallel CNN layers, this architecture merges the feature sequences from each CNN layer at the pooling layer and passed them to Bi-LSTM. ...
Article
Full-text available
With the enormous growth of social data in recent years, sentiment analysis has gained increasing research attention and has been widely explored in various languages. Arabic language nature imposes several challenges, such as the complicated morphological structure and the limited resources, Thereby, the current state-of-the-art methods for sentiment analysis remain to be enhanced. This inspired us to explore the application of the emerging deep-learning architecture to Arabic text classification. In this paper, we present an ensemble model which integrates a convolutional neural network, bidirectional long short-term memory (Bi-LSTM), and attention mechanism, to predict the sentiment orientation of Arabic sentences. The convolutional layer is used for feature extraction from the higher-level sentence representations layer, the BiLSTM is integrated to further capture the contextual information from the produced set of features. Two attention mechanism units are incorporated to highlight the critical information from the contextual feature vectors produced by the Bi-LSTM hidden layers. The context-related vectors generated by the attention mechanism layers are then concatenated and passed into a classifier to predict the final label. To disentangle the influence of these components, the proposed model is validated as three variant architectures on a multi-domains corpus, as well as four benchmarks. Experimental results show that incorporating Bi-LSTM and attention mechanism improves the model’s performance while yielding 96.08% in accuracy. Consequently, this architecture consistently outperforms the other State-of-The-Art approaches with up to + 14.47%, + 20.38%, and + 18.45% improvements in accuracy, precision, and recall respectively. These results demonstrated the strengths of this model in addressing the challenges of text classification tasks.
... The politics category exhibits a near-equilibrium distribution of articles with favorable and negative views. The scope of this study is restricted to the utilization of English news items obtained from a singular source for the purpose of doing sentiment analysis [6]. ...
... BBC News Dataset The BBC News Dataset [9], provided by the British Broadcasting Corporation, comprises 2,225 news articles from 2004 to 2005, categorized into five sections: politics, sports, business, technology, and entertainment. Renowned for its clear categorization and high-quality content, the dataset is extensively used for research in text classification, natural language processing, and machine learning. ...
Technical Report
Full-text available
Financial forecasting in the healthcare sector is a critical task, necessitating the integration of diverse data sources for accurate predictions. This paper introduces MarketProphet, a novel multi-modal dataset tailored for time series forecasting of the S&P 500 Healthcare index. The dataset comprises daily trading information of 22 select healthcare stocks, identified based on trading volume, transaction amounts, and price variations. To enrich the financial data, MarketProphet also integrates daily news articles and policy updates relevant to the healthcare sector, offering a comprehensive view of market influencers. The dataset spans multiple years, providing daily records of stock metrics (opening and closing prices, trading volumes, and price changes) alongside the S&P 500 Healthcare index values. The inclusion of qualitative data from news and policy sources allows for a layered analytical approach, enhancing the potential for accurate market trend predictions. MarketProphet is designed to support advanced time series forecasting, employing both quantitative financial data and qualitative insights. This blend aims to improve the prediction accuracy of the S&P 500 Healthcare index's future movements, which is particularly valuable for financial analysts, data scientists, and policy researchers. By presenting a holistic view that combines financial performance with sector-specific news and policy developments, MarketProphet emerges as a pivotal tool for understanding and forecasting the dynamics of the healthcare market.
Article
Full-text available
Sentiment analysis (SA) is an intellectual process of extricating user's feelings and emotions. It is one of the pursued field of Natural Language Processing (NLP). The evolution of Internet based applications has steered massive amount of personalized reviews for various related information on the Web. These reviews exist in different forms like social Medias, blogs, Wiki or forum websites. Both travelers and customers find the information in these reviews to be beneficial for their understanding and planning processes. The boom of search engines like Yahoo and Google has flooded users with copious amount of relevant reviews about specific destinations, which is still beyond human comprehension. Sentiment Analysis poses as a powerful tool for users to extract the needful information, as well as to aggregate the collective sentiments of the reviews. Several methods have come to the limelight in recent years for accomplishing this task. In this paper we compare the various techniques used for Sentiment Analysis by analyzing various methodologies.
Article
Full-text available
Abstract With accelerated evolution of the internet as websites, social networks, blogs, online portals, reviews, opinions, recommendations, ratings, and feedback are generated by writers. This writer generated sentiment content can be about books, people, hotels, products, research, events, etc. These sentiments become very beneficial for businesses, governments, and individuals. While this content meant to be useful, a bulk of this writer generated content require using the text mining techniques and sentiment analysis. But there are several challenges faced the sentiment analysis and evaluation process. These challenges become obstacles in analyzing the accurate meaning of sentiments and detecting the suitable sentiment polarity. Sentiment analysis is the practice of applying Natural Language Processing and Text Analysis techniques to identify and extract subjective information from text. This paper presents a survey on the Sentiment analysis challenges relevant to their approaches and techniques.
Article
Full-text available
With the advancement of web technology and its growth, there is a huge volume of data present in the web for internet users and a lot of data is generated too. Internet has become a platform for online learning, exchanging ideas and sharing opinions. Social networking sites like Twitter, Facebook, Google+ are rapidly gaining popularity as they allow people to share and express their views about topics,have discussion with different communities, or post messages across the world. There has been lot of work in the field of sentiment analysis of twitter data. This survey focuses mainly on sentiment analysis of twitter data which is helpful to analyze the information in the tweets where opinions are highly unstructured, heterogeneous and are either positive or negative, or neutral in some cases. In this paper, we provide a survey and a comparative analyses of existing techniques for opinion mining like machine learning and lexicon-based approaches, together with evaluation metrics. Using various machine learning algorithms like Naive Bayes, Max Entropy, and Support Vector Machine, we provide a research on twitter data streams.General challenges and applications of Sentiment Analysis on Twitter are also discussed in this paper.
Conference Paper
The Sentiment Analysis (SA) task is to label people's opinions with different categories, such as positive and negative, from a given piece of text. Another task is to decide whether a given text is subjective, expressing the writer's opinions, or objective, expressing. These tasks are performed at different levels of analysis, ranging from the document level to the sentence and phrase level. Another task is aspect extraction, which originated from aspect-based sentiment analysis at the phrase level. All these tasks fall under the umbrella of SA. In recent years a large number of methods, techniques and enhancements have been proposed for the problem of SA in different tasks at different levels. This survey aims to categorize SA techniques in general, without focusing on a specific level or task, and to review the main research problems in recent articles presented in this field. We found that machine learning-based techniques (including supervised, unsupervised and semi-supervised learning), lexicon-based techniques and hybrid techniques are the most frequently used. The open problems are that recent techniques are still unable to work well across different domains; sentiment classification based on insufficient labeled data is still a challenging problem; there is a lack of SA research in languages other than English; and existing techniques are still unable to deal with complex sentences that require more than sentiment words and simple parsing.
Article
The exponential growth of available online information provides computer scientists with many new challenges and opportunities. A recent trend is to analyze people's feelings, opinions and orientation about facts and brands: this is done by exploiting Sentiment Analysis techniques, whose goal is to classify the polarity of a piece of text according to the opinion of the writer. In this paper we propose a lexicon-based approach for sentiment classification of Twitter posts. Our approach is based on the exploitation of widespread lexical resources such as SentiWordNet, WordNet-Affect, MPQA and SenticNet. In the experimental session the effectiveness of the approach was evaluated against two state-of-the-art datasets. Preliminary results provide interesting outcomes and pave the way for future research in the area.
Conference Paper
The 24-hour news cycle and barrage of online media is a constant drum beat. The flow of positive and negative news is always in flux, influencing our current perspective and reassessing our future outlook. Nowhere is this more true than in the capital markets, where assets are priced and risk is assessed based on future expectations. While many factors influence a trader's decision to buy or sell an asset, it can be argued that the sentiment from the 24-hour news cycle greatly impacts their outlook on the future value of an asset. In this paper we propose new methods to predict the positive or negative sentiment of financial news. Our analysis has found that contemporary document-level sentiment analysis methods break down at fine-grained levels. Fine-grained analysis methods are vitally important as the velocity and impact of small texts, such as tweets and news flashes, increase their influence over the decision process. Using Natural Language Processing methods, we extract syntactic sentence patterns from financial news headlines. From these patterns we conduct experiments using both lexicon and machine learning sentiment analysis approaches to predict sentiment. We find that our sentiment prediction methods are able to consistently outperform lexicon methods. Our robust techniques give the financial practitioner a method to fold a fine-grained news sentiment factor into their pricing or risk prediction models.