
Sentiment Analysis of News Articles: A Lexicon based Approach

February 2019


DOI:10.1109/ICOMET.2019.8673428

Conference: 2nd International Conference on Computing Mathematics & Engineering Technologies-2019 (iCoMET)

Authors:

Soonh Taj

Sukkur Institute of Business Administration

Areej Fatemah Meghji

Mehran University of Engineering and Technology



2019 International Conference on Computing, Mathematics and Engineering Technologies – iCoMET 2019


Soonh Taj, Baby Bakhtawer Shaikh, Areej Fatemah Meghji

Mehran University of Engineering and Technology, Jamshoro, Pakistan

engr.soonhtaj@gmail.com, malkashaikh70@yahoo.com, areej.fatemah@faculty.muet.edu.pk

Abstract: The modern technological era has reshaped traditional lifestyles in several domains. Advances in Information Technology (IT) have made the publishing of news and events faster. IT has also brought a flood of data, published every minute of every day by millions of users in the form of comments, blogs, news shared through blogs, social media, micro-blogging websites and more. Manually traversing such huge volumes of data is a challenging job; thus, sophisticated methods are required to perform this task automatically and efficiently. News reports describe events that carry emotions: good, bad or neutral. Sentiment analysis is used to investigate the human emotions (i.e., sentiments) present in textual information. This paper presents a lexicon-based approach for sentiment analysis of news articles. Experiments performed on the BBC news dataset demonstrate the applicability and validity of the adopted approach.

Keywords: Sentiment analysis, Lexicon-based approach, News articles.

I. INTRODUCTION

With the emergence of the Internet, web and mobile technologies, people have changed the way they consume news. Traditional printed newspapers and magazines have been replaced by virtual online versions such as online news sites and weblogs. Readers are more inclined to use online sources of news mainly because of two key features: interactivity and immediacy [1].

In this day and age, people want to consume as much news, from as many sources, as they possibly can, on matters that are important to them or that catch their attention. Interactivity refers to the inherent tendency of the masses to consume news of their interest. Immediacy represents people's need to be informed about news without delay [2]. The world we live in and the technology we are accustomed to allow people to benefit from these features by providing instant news on events as they happen in real time. Online news websites have developed effective strategies to draw people's attention [3]. Online news expresses opinions regarding news entities, which may include people, places or even things, while reporting on recent events [4]. For this reason, several news websites offer interactive emotion-rating services on their various channels, reflecting the fact that news can be positive, negative or neutral [5].

Sentiment analysis, or opinion mining, is a way of determining the polarity (positive or negative) or strength of the opinion expressed in written text; in the case of this paper, the text is a news article [3] [4]. Manually labeling sentiment words is a time-consuming process. Two popular approaches are used to automate sentiment analysis. The first makes use of a lexicon of weighted words, and the second is based on machine learning. Lexicon-based methods use a dictionary of opinion words and match the words of a given text against it to find polarity. As opposed to machine learning methods, this approach does not need to preprocess data, nor does it have to train a classifier [6]. This research is based on a method for lexicon-based sentiment analysis of news articles.

The remainder of this paper is organized as follows. Section II presents related work on sentiment analysis of news articles. Section III presents the proposed methodology and the experimental setup. Results are presented in Section IV, followed by the limitations of the research in Section V. Finally, Section VI presents the conclusion of this research.

II. RELATED WORK

Many researchers have contributed to news sentiment analysis using different approaches. This section briefly discusses previous work on sentiment analysis.

Reis, Olmo, Benevenuto, Kwak, Prates and An proposed a methodology to discover the relationship between sentiment polarity and news popularity [3]. Using different sentiment analysis methods, they conducted an experiment on 69,907 headlines produced by four highly reputed media outlets: The New York Times, BBC, Reuters and Dailymail. By extracting features from the text of the headlines, the research analyzed their sentiment polarity. It concluded that the polarity of a headline has a strong impact on the popularity of the news article: positive and negative headlines attracted greater interest than headlines with a neutral tone.

Godbole, Srinivasaiah and Sekine built an algorithm based on sentiment lexicons to find sentiment words and their associated entities in a corpus of news and blogs. The algorithm looks at the co-occurrence of an entity and a sentiment word in the same sentence [4]. Seven dimensions (general, health, crime, sports, business, politics and media) were selected for sentiment analysis of news and blogs. Two trends were analyzed in the experiment: 1) Polarity, whether the sentiment associated with an entity is positive or negative; and 2) Subjectivity, how much sentiment an entity garners. Scores for both polarity and subjectivity were calculated.
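The following minimal Python sketch makes the two trends concrete. It credits every sentiment word in a sentence to each entity mentioned in that sentence, then derives a polarity and a subjectivity score per entity. Several parts are hypothetical:

- the word lists;
- single-word, lower-cased entity matching;
- the two formulas, polarity = (p - n)/(p + n) and subjectivity = sentiment words per mention.

These are illustrative assumptions, not the exact definitions used in [4].

```python
from collections import defaultdict

# Hypothetical toy lexicons; real systems use far larger word lists.
POSITIVE = {"win", "good", "praised"}
NEGATIVE = {"loss", "bad", "criticised"}


def entity_scores(sentences, entities):
    """Return {entity: (polarity, subjectivity)} from sentence-level co-occurrence."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    mentions = defaultdict(int)
    for sentence in sentences:
        words = sentence.lower().split()
        for entity in entities:
            if entity.lower() not in words:
                continue
            mentions[entity] += 1
            # Every sentiment word in the same sentence is credited to the entity.
            pos[entity] += sum(w in POSITIVE for w in words)
            neg[entity] += sum(w in NEGATIVE for w in words)
    scores = {}
    for entity, n_mentions in mentions.items():
        p, n = pos[entity], neg[entity]
        polarity = (p - n) / (p + n) if p + n else 0.0  # assumed formula
        subjectivity = (p + n) / n_mentions             # sentiment words per mention
        scores[entity] = (polarity, subjectivity)
    return scores
```

For example, `entity_scores(["Arsenal win praised", "Arsenal suffer loss"], ["Arsenal"])` credits two positive words and one negative word to Arsenal across two mentions. This gives a polarity of 1/3 and a subjectivity of 1.5.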

Islam, Ashraf, Abir and Mottalib proposed an approach to classify online news. Sentiment analysis was performed at the sentence level, and a dynamic dictionary of predefined positive and negative words was used to help find sentiment polarity [6]. The following steps were carried out to classify news articles:

1) Selection of an online news article.

2) Extraction of sentences from the news article. Sentences can be simple, compound, complex or compound-complex.

3) Search for positive words, phrases or clauses in those sentences and determination of their polarities.

4) Combination of the polarities of all sentences to obtain the final polarity of the news article.

The approach classified news articles with 91% accuracy.
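The four steps can be approximated in a few lines of Python. This sketch simplifies the approach under several assumptions:

- `LEXICON` is a hypothetical stand-in for the dynamic dictionary of [6].
- Sentences are split on terminal punctuation, and sentence structure (simple, compound, complex) is ignored.
- The article label is the sign of the summed sentence polarities.

```python
import re

# Hypothetical stand-in for the dynamic dictionary of positive/negative words in [6].
LEXICON = {"growth": 1, "success": 1, "strong": 1, "fall": -1, "crisis": -1, "weak": -1}


def sentence_sign(sentence):
    """Step 3: sign (-1, 0 or +1) of the summed word polarities in one sentence."""
    total = sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", sentence.lower()))
    return (total > 0) - (total < 0)


def article_polarity(article):
    """Steps 2 and 4: split the article into sentences, then combine their polarities."""
    sentences = [s for s in re.split(r"[.!?]+", article) if s.strip()]
    combined = sum(sentence_sign(s) for s in sentences)
    if combined > 0:
        return "positive"
    if combined < 0:
        return "negative"
    return "neutral"
```

For example, `article_polarity("Profits show strong growth. A crisis looms. Success continues.")` returns `"positive"`, because two of the three sentences are positive.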

Meyer, Bikdash and Dai performed fine-grained sentiment analysis of financial news headlines using both a machine learning approach and a lexicon-based approach. They conducted a total of eight experiments and compared the results of the two approaches [8]. For the lexicon-based approach, a Bag of Words (BOW) model with the General Inquirer lexicon (H4N) was used to determine sentiment polarities. For the machine learning approach, a Parts of Speech (POS) syntactic model was used. The experiments showed that the machine learning approach produced more accurate results.

Shirsat, Jagdale and Deshmukh proposed a methodology for document-level sentiment analysis so that the polarity of an entire news article could be determined [9]. The paper explored a dataset of 2225 documents. Text was first preprocessed through tokenization, stop word removal and stemming. Post-processing was then performed on the entire news article: a sentiment score was assigned to each article, and on the basis of this score the articles were categorized as positive, negative or neutral.

Agarwal, Sharma, Sikka and Dhir performed opinion mining using Python packages to classify words and SentiWordNet 3.0 to identify positive and negative words. This allowed them to evaluate the overall impact, i.e., the positive or negative sentiment, of a news headline [10]. The impact of news headlines was analyzed using two algorithms.

Algorithm 1: Preprocessing of each word

Select a news headline, then preprocess each word in it using a POS tagger and perform lemmatization and stemming. This is done using the Natural Language Toolkit (NLTK).

Algorithm 2: Analyzing news headlines

After preprocessing, pass each word to the SentiWordNet 3.0 dictionary to find its positive, negative and objective scores. If the positive score > the negative score, mark the news headline as positive; if the positive score < the negative score, mark it as negative.
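Algorithm 2 reduces to a comparison of summed scores. So that it runs offline, the sketch below uses a small hypothetical table of (positive, negative, objective) triples in place of SentiWordNet 3.0. In practice the scores could be read through NLTK's `sentiwordnet` corpus reader, which exposes `pos_score()`, `neg_score()` and `obj_score()` on each sense. Two choices are assumptions, since the description above specifies neither:

- summing the scores over all words;
- labeling ties as neutral.

```python
# Hypothetical (positive, negative, objective) scores standing in for SentiWordNet 3.0;
# as in SentiWordNet, each triple sums to 1. Unknown words count as fully objective.
SCORES = {
    "rise": (0.5, 0.0, 0.5),
    "record": (0.25, 0.0, 0.75),
    "crash": (0.0, 0.625, 0.375),
    "fear": (0.0, 0.5, 0.5),
}
OBJECTIVE = (0.0, 0.0, 1.0)


def classify_headline(words):
    """Algorithm 2: compare summed positive and negative scores of preprocessed words."""
    positive = sum(SCORES.get(w, OBJECTIVE)[0] for w in words)
    negative = sum(SCORES.get(w, OBJECTIVE)[1] for w in words)
    if positive > negative:
        return "positive"
    if positive < negative:
        return "negative"
    return "neutral"  # tie case: not specified in the algorithm; an assumption here
```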

Lei, Rao, Li, Quan and Wenyin built a model for detecting the social emotions that news articles, tweets and similar texts induce in readers [11]. The model comprises modules for document selection, part-of-speech tagging, and lexicon generation based on social emotions. The model first creates a training set from a corpus of news documents and then applies POS tagging and feature extraction. Social emotion lexicons are then generated by calculating the probabilities of emotions based on the documents. To test the accuracy of the model, a dataset of 40,897 news articles collected from the societal channel was used.
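One simple way to turn document-level emotion ratings into a word-level lexicon is to average, for each word, the normalized emotion distributions of the documents that contain it. The Python sketch below does exactly that. It is a simplified illustration, not the probabilistic model of [11], and the input format (a token list plus reader votes per emotion) is an assumption.

```python
from collections import Counter, defaultdict


def build_emotion_lexicon(documents):
    """documents: list of (words, {emotion: reader_votes}) pairs.

    Returns {word: {emotion: probability}} by averaging the normalised emotion
    distributions of the documents in which each word appears.
    """
    totals = defaultdict(Counter)
    for words, votes in documents:
        vote_sum = sum(votes.values())
        if vote_sum == 0:
            continue  # a document without ratings carries no emotion evidence
        for word in set(words):
            for emotion, count in votes.items():
                totals[word][emotion] += count / vote_sum
    lexicon = {}
    for word, dist in totals.items():
        norm = sum(dist.values())
        lexicon[word] = {emotion: value / norm for emotion, value in dist.items()}
    return lexicon
```

For example, a word that occurs in one article rated mostly "sad" and in another rated entirely "touched" ends up with probability mass split between the two emotions.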

III. RESEARCH METHODOLOGY

The methodology used for sentiment analysis of news articles in this paper is based on the lexicon-based approach. Sentiment analysis can generally be carried out using supervised or unsupervised approaches. A supervised approach uses a set of labeled training data to build a classification model, which is then used to classify new data for which labels are not available.

Unsupervised, or lexicon-based, approaches to sentiment analysis do not require any training data. In this approach, the sentiment conveyed by a word is inferred from the polarity of that word. In the case of a sentence or a document, the polarities of the individual words that compose it collectively convey its sentiment. Thus, the polarity of a sentence is the cumulative total (sum) of the polarities of the individual words (or phrases) in the sentence [12].
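The summation rule can be sketched in Python as follows. `POLARITY` is a hypothetical lexicon. The greedy two-word phrase lookup is an illustrative assumption; it shows how a phrase entry such as a negation can take precedence over its single words.

```python
# Hypothetical word and phrase polarities; a real lexicon would be far larger.
POLARITY = {"good": 1.0, "excellent": 2.0, "poor": -1.0, "not good": -1.0}


def sentence_polarity(tokens):
    """Sum word polarities, letting a known two-word phrase override its single words."""
    total, i = 0.0, 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in POLARITY:
            total += POLARITY[bigram]
            i += 2  # consume both words of the phrase
        else:
            total += POLARITY.get(tokens[i], 0.0)  # unknown words contribute 0
            i += 1
    return total
```

Consider the tokens of "service was not good but food excellent". The phrase "not good" contributes -1 and "excellent" contributes +2, so the sentence polarity is 1.0.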

This approach uses predefined lists of words in which each word is associated with a specific sentiment. It can be further divided into the following methods:

1. Dictionary-based methods: a lexicon dictionary is used to find positive and negative opinion words.

2. Corpus-based methods: a large corpus of words is used, and further opinion words are found within their context on the basis of syntactic patterns.
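Corpus-based expansion can be illustrated with a classic heuristic, used here purely as an example and not as the method of this paper. It assumes that words joined by "and" share a polarity, while words joined by "but" have opposite polarities. Starting from a small hypothetical seed lexicon, the sketch below propagates polarity along such pairs found in a corpus.

```python
import re

# Overlapping matches via a lookahead, so "calm and elegant but costly" yields two pairs.
PAIR = re.compile(r"(?=\b([a-z]+) (and|but) ([a-z]+)\b)")


def expand_lexicon(seed, corpus, rounds=3):
    """Propagate polarity through "X and Y" (same sign) and "X but Y" (opposite sign)."""
    lexicon = dict(seed)
    pairs = []
    for text in corpus:
        for a, conj, b in PAIR.findall(text.lower()):
            pairs.append((a, b, 1 if conj == "and" else -1))
    for _ in range(rounds):
        changed = False
        for a, b, sign in pairs:
            for known, unknown in ((a, b), (b, a)):
                if known in lexicon and unknown not in lexicon:
                    lexicon[unknown] = sign * lexicon[known]
                    changed = True
        if not changed:
            break  # no new words were labeled in this round
    return lexicon
```

With the seed `{"calm": 1}` and the sentence "The hotel was calm and elegant but costly", the sketch labels "elegant" as positive and "costly" as negative.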

Sentiment analysis can be performed at the document, sentence, word or phrase level. This paper explores sentiment analysis at the document level. Similar to [13] [14], this research identifies whether the opinions expressed in the documents (news articles) are positive, negative or neutral. The dictionary-based approach, using the WordNet lexical dictionary, has been applied to the sentiment analysis of news articles. The experiment was carried out using the RapidMiner tool. The methodology for this experiment is presented in Fig. 1.

Fig. 1. Sentiment Analysis Methodology

The methodology comprised five steps, starting with data collection. The BBC News dataset was used for this experiment. The next step was preprocessing the collected data to reduce inconsistencies in the dataset. The polarity of the words in the collected news articles was then computed using the WordNet lexical dictionary. The steps are explained in detail below.

A. Data Collection

The BBC News dataset was used for this experiment. The dataset is available online at http://mlg.ucd.ie/datasets/bbc.html. It comprises 2225 documents, each a news article published on the BBC news website between 2004 and 2005. The news stories belong to five topical areas, with the following class labels: business, entertainment, politics, sport and tech.
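A minimal Python loader for the dataset is sketched below. It rests on two assumptions that are not details given above:

- the downloaded raw-text archive unpacks into one folder per class label, with one `.txt` file per article;
- the files can be read as latin-1.

The function name `load_bbc` is likewise hypothetical.

```python
from pathlib import Path

BBC_LABELS = ("business", "entertainment", "politics", "sport", "tech")


def load_bbc(root):
    """Return a list of (label, text) pairs read from <root>/<label>/*.txt.

    The folder layout and the latin-1 encoding are assumptions about the unpacked
    archive; latin-1 is used because it can decode any byte sequence.
    """
    documents = []
    for label in BBC_LABELS:
        folder = Path(root, label)
        if not folder.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(folder.glob("*.txt")):
            documents.append((label, path.read_text(encoding="latin-1")))
    return documents
```

Counting the labels of the returned pairs, e.g. with `collections.Counter(label for label, _ in load_bbc("bbc"))` for a hypothetical `bbc` folder, gives the number of articles per class.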

B. Text Pre-processing

The news articles in the dataset were preprocessed. Preprocessing is a necessary step to clean text (reduce noise) and to remove inconsistencies, so that the cleansed data can be used more effectively in text mining or sentiment analysis tasks [15]. The entire preprocessing task was carried out using the RapidMiner tool, which provides a vast set of operators for preprocessing.

The first preprocessing task was tokenizing the text of the news articles into a set of tokens using the “Tokenize” operator. Tokenizing breaks a sequence of sentences (a combination of strings) into individual components, called tokens, such as words, phrases or symbols. Apart from individual words and phrases, tokens can even comprise entire sentences. During tokenization, some characters, such as punctuation marks, are discarded.

After tokenization, the text of all documents was converted to lower case using the “Transform Cases” operator. Stop words were then removed using the “Filter Stopwords (English)” operator. The next task was reducing inflected or derived words to their base form through a process called stemming, which was done using the “Stem (WordNet)” operator.
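The same sequence of operations (tokenize, transform cases, filter stop words, stem) can be sketched in plain Python. The stop-word list and the suffix-stripping `stem` function are crude hypothetical stand-ins for RapidMiner's English stop-word list and WordNet-based stemmer, so individual outputs will differ from the tool's.

```python
import re

# Small illustrative stop-word list; the English list used by the tool is much longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "were", "on"}


def tokenize(text):
    """Split on non-letters, discarding punctuation (cf. the "Tokenize" operator)."""
    return re.findall(r"[A-Za-z]+", text)


def stem(word):
    """Crude suffix stripping; a stand-in for, not a copy of, WordNet-based stemming."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]                  # Transform Cases
    content = [t for t in lowered if t not in STOP_WORDS]  # Filter Stopwords
    return [stem(t) for t in content]                      # Stem
```

For example, `preprocess("The markets were falling, and shares jumped.")` returns `["market", "fall", "share", "jump"]`.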

C. Calculate the Sentiment Polarity of Sentiment Words

After preprocessing, the statistical technique known as Term Frequency-Inverse Document Frequency (TF-IDF) was applied [16]. TF-IDF has two components:

- Term frequency counts how often a term occurs in a document.
- Inverse document frequency scales that count down for terms that occur in many documents.

Words that occur frequently in a document but rarely across the collection are therefore considered important and given a higher weight. Using TF-IDF, the important words or terms in each document were identified and weighted according to their occurrence in the news articles.
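The weighting can be written out in Python. This sketch uses one common variant: tf = count / document length and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. RapidMiner's exact normalization may differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document.

    Uses tf = count / document length and idf = ln(N / df); other tools may
    smooth or normalise these quantities differently.
    """
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Take the documents `[["match", "win", "win"], ["match", "loss"]]`. A term that appears in every document, such as "match", receives weight 0, while "win" receives (2/3) ln 2.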

After the important words were identified, a dictionary was used to assign sentiment scores to them. The WordNet dictionary, a lexical database for the English language, was used in this experiment. WordNet contains more than 118,000 different word forms and more than 90,000 different word senses [17]. In this work, WordNet was used to find opinion words in a given text and to assign sentiment scores to them.

D. Calculate Total Sentiment Score

According to the principle of document-level sentiment analysis, each individual document is tagged with its respective polarity. This is generally done by finding the polarities of the individual words/phrases and sentences and combining them to predict the polarity of the whole document. Treating each news article as a document, the sentiment conveyed in the article was computed by combining the polarities of the individual words/phrases and sentences it contains.

The sentiment score of each whole news article was calculated using the “Extract Sentiment” operator. This operator provides the final sentiment results: text with a sentiment score of -1 is considered negative, and text with a sentiment score of +1 is considered positive. The operator draws on the SentiWordNet 3.0.0 dictionary, which is an extension of the WordNet dictionary; WordNet and SentiWordNet are connected through synset IDs. Using a score-sentiment function based on the WordNet and SentiWordNet dictionaries, the total sentiment score of each news article was calculated.
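The link between WordNet senses and SentiWordNet scores can be illustrated with a small Python sketch. The synset IDs and scores below are made up. Averaging (positive - negative) over all senses of a word is one common convention assumed here; it is not necessarily how the “Extract Sentiment” operator computes its scores internally.

```python
# Hypothetical WordNet side: word -> synset IDs (the IDs are made up).
WORD_SYNSETS = {
    "good": ["s1", "s2"],
    "crisis": ["s3"],
}
# Hypothetical SentiWordNet side: synset ID -> (positive, negative) scores.
SYNSET_SCORES = {
    "s1": (0.75, 0.0),
    "s2": (0.5, 0.0),
    "s3": (0.0, 0.5),
}


def word_score(word):
    """Average (positive - negative) over all senses of a word; 0.0 if unknown."""
    synsets = WORD_SYNSETS.get(word, [])
    if not synsets:
        return 0.0
    return sum(SYNSET_SCORES[s][0] - SYNSET_SCORES[s][1] for s in synsets) / len(synsets)


def document_score(tokens):
    """Total sentiment score of a document: the sum of its word scores."""
    return sum(word_score(t) for t in tokens)
```

Here "good" scores (0.75 + 0.5) / 2 = 0.625 and "crisis" scores -0.5, so the tokens `["good", "crisis", "talks"]` total 0.125.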

E. Sentiment Results

News articles were classified into positive, negative and neutral classes according to their total sentiment score. The sentiment of each news article was calculated as the average of its word sentiment scores.
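This final step can be sketched as averaging the word scores of an article and mapping the sign of the average onto the three classes (+1, -1, 0) used in Section IV. Treating any nonzero average as non-neutral is an assumption, since no threshold is stated above.

```python
SENTIMENT_LABELS = {1: "positive", -1: "negative", 0: "neutral"}


def classify_article(word_scores):
    """Average the word scores and map the sign to +1, -1 or 0 (empty -> neutral)."""
    if not word_scores:
        return 0
    average = sum(word_scores) / len(word_scores)
    return (average > 0) - (average < 0)
```

For example, `SENTIMENT_LABELS[classify_article([0.5, -0.25, 0.0])]` gives `"positive"`.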

IV. RESULTS AND DISCUSSION

News articles with a sentiment score of 0 were considered neutral, those with a score of +1 were treated as positive, and those with a score of -1 were treated as negative. The results of the experiment are presented in Table I.

TABLE I. SENTIMENT RESULTS

News Class | Total Articles | Positive | Negative | Neutral

Business | 510 | 274 | 205 | 31

Entertainment | 401 | 163 | 220 | 18

Politics | 417 | 205 | 200 | 12

Sport | 511 | 246 | 236 | 29

Tech | 401 | 170 | 216 | 15
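The proportions discussed below can be recomputed from Table I with a short Python script. The counts are copied from the table, and the row check confirms that each class total equals the sum of its positive, negative and neutral articles.

```python
# Counts copied from Table I: (total, positive, negative, neutral).
TABLE_I = {
    "Business": (510, 274, 205, 31),
    "Entertainment": (401, 163, 220, 18),
    "Politics": (417, 205, 200, 12),
    "Sport": (511, 246, 236, 29),
    "Tech": (401, 170, 216, 15),
}


def shares(table):
    """Return {class: (positive %, negative %, neutral %)} rounded to one decimal."""
    result = {}
    for name, (total, pos, neg, neu) in table.items():
        assert pos + neg + neu == total, f"row {name} does not add up"
        result[name] = tuple(round(100 * n / total, 1) for n in (pos, neg, neu))
    return result
```

For instance, the script gives 53.7% positive articles for Business and 54.9% negative articles for Entertainment. No category has more than 7% neutral articles.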

It was observed that the vast majority of news articles fell into the positive or negative categories, with only a small percentage (under 7% in every category) having neutral sentiment. Most articles in the Entertainment and Tech categories exhibited negative sentiment, whereas the Business and Sport categories contained more positive than negative articles. The Politics category had an almost equal proportion of articles exhibiting positive and negative sentiment. The results of the sentiment analysis are represented graphically in Fig. 2.

Fig. 2. Results of News Articles

V. RESEARCH LIMITATIONS AND CHALLENGES

Sentiment analysis research has focused mainly on text written in English and Chinese, with only a few studies now being carried out on Arabic, Thai and Italian. Word coverage is insufficient or limited, as many new words and their new meanings must continually be added to the lexical database [18]. Achieving accurate sentiment classification is also a challenge. Choosing the techniques most suitable for a specific sentiment analysis task is another, because the nature of the dataset keeps changing: datasets of news, reviews and blogs all express text in different formats. This causes variation in the accuracy and performance of sentiment analysis classifiers [19].

A limitation of the proposed model is that it only uses English news articles from a single source for sentiment analysis.

VI. CONCLUSION

There are many directions in sentiment analysis that can be explored. This paper explored sentiment analysis of news using a BBC dataset comprising news articles from 2004 and 2005. It was observed that the Business and Sport categories had more positive articles, whereas the Entertainment and Tech categories had a majority of negative articles. Future work will address sentiment analysis of news using various machine learning approaches. It will also involve developing an online application where users can read news of their interest and, through sentiment analysis methods, customize their news feed.

REFERENCES

[1] M. Karlsson, “The immediacy of online news, the visibility of journalistic processes and a restructuring of journalistic authority,” Journalism, vol. 12, no. 3, pp. 279–295, 2011.

[2] A. Kohut, C. Doherty, M. Dimock, and S. Keeter, “Americans spending more time following the news,” Pew Research Center, 2010.

[3] J. Reis, P. Olmo, F. Benevenuto, H. Kwak, R. Prates, and J. An, “Breaking the news: first impressions matter on online news,” in ICWSM ’15, 2015.

[4] N. Godbole, M. Srinivasaiah, and S. Sekine, “Large-scale sentiment analysis for news and blogs,” in International Conference on Weblogs and Social Media, Denver, CO, 2007.

[5] J. Lei, Y. Rao, Q. Li, X. Quan, and L. Wenyin, “Towards building a social emotion detection system for online news,” Future Generation Computer Systems, vol. 37, pp. 438–448, 2014.

[6] M. U. Islam, F. B. Ashraf, A. I. Abir, and M. A. Mottalib, “Polarity detection of online news articles based on sentence structure and dynamic dictionary,” in 2017 20th International Conference of Computer and Information Technology (ICCIT), Dhaka, 2017, pp. 1–5, doi: 10.1109/ICCITECHN.2017.8281777.

[7] V. Kharde and P. Sonawane, “Sentiment analysis of twitter data: a survey of techniques,” International Journal of Computer Applications, vol. 139, no. 11, pp. 5–15, 2016.

[8] B. Meyer, M. Bikdash, and X. Dai, “Fine-grained financial news sentiment analysis,” in SoutheastCon 2017, 2017.

[9] V. S. Shirsat, R. S. Jagdale, and S. N. Deshmukh, “Document Level Sentiment Analysis from News Articles,” in 2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, 2017, pp. 1–4.

[10] A. Agarwal, V. Sharma, G. Sikka, and R. Dhir, “Opinion mining of news headlines using SentiWordNet,” in 2016 Symposium on Colossal Data Analysis and Networking (CDAN), Indore, 2016, pp. 1–5, doi: 10.1109/CDAN.2016.7570949.

[11] J. Lei, Y. Rao, Q. Li, X. Quan, and L. Wenyin, “Towards building a social emotion detection system for online news,” Future Generation Computer Systems, vol. 37, pp. 438–448, 2014.

[12] C. Musto, G. Semeraro, and M. Polignano, “A comparison of lexicon-based approaches for sentiment analysis of microblog posts,” Information Filtering and Retrieval, 59, 2014.

[13] A. Dandrea, F. Ferri, P. Grifoni, and T. Guzzo, “Approaches, tools and applications for sentiment analysis implementation,” International Journal of Computer Applications, vol. 125, no. 3, pp. 26–33, 2015.

[14] M. Devika, C. Sunitha, and A. Ganesh, “Sentiment analysis: a comparative study on different approaches,” Procedia Computer Science, vol. 87, pp. 44–49, 2016.

[15] E. Haddi, X. Liu, and Y. Shi, “The Role of Text Pre-processing in Sentiment Analysis,” Procedia Computer Science, vol. 17, pp. 26–32, 2013.

[16] K. Ghag and K. Shah, “SentiTFIDF: Sentiment classification using relative term frequency inverse document frequency,” International Journal of Advanced Computer Science and Applications, vol. 5, no. 2, 2014.

[17] G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, Jan. 1995.

[18] Z. Madhoushi, A. R. Hamdan, and S. Zainudin, “Sentiment analysis techniques in recent works,” in 2015 Science and Information Conference (SAI), 2015.

[19] D. M. E. D. M. Hussein, “A survey on sentiment analysis challenges,” Journal of King Saud University - Engineering Sciences, vol. 30, no. 4, pp. 330–338, 2018.

The ChatGPT Impact on Education: A Comprehensive Bibliometric Review

Article

May 2024

This bibliometric study involves 241 articles about ChatGPT within education, revealing a robust study field with an extraordinary annual growth rate of 23,900% and a high level of international collaboration (18.67%). The results also show that leading countries contribute distinct insights. Five unique study clusters emerged from co-occurrence analysis, concentrating on the development, role, and practical impact of ChatGPT. The multidisciplinary scope of the research highlights ChatGPT’s broad applicability from transformations to ethical dilemmas. Sentiment analysis also showed that teaching is essential, especially in higher education and medicine. The limitations of this study are concentrated on specific databases. Future research suggests adding more databases, the ethical, and pedagogical implications.

Improving Arabic sentiment analysis across context-aware attention deep model based on natural language processing

Article

Full-text available

Apr 2024

With the enormous growth of social data in recent years, sentiment analysis has gained increasing research attention and has been widely explored in various languages. Arabic language nature imposes several challenges, such as the complicated morphological structure and the limited resources, Thereby, the current state-of-the-art methods for sentiment analysis remain to be enhanced. This inspired us to explore the application of the emerging deep-learning architecture to Arabic text classification. In this paper, we present an ensemble model which integrates a convolutional neural network, bidirectional long short-term memory (Bi-LSTM), and attention mechanism, to predict the sentiment orientation of Arabic sentences. The convolutional layer is used for feature extraction from the higher-level sentence representations layer, the BiLSTM is integrated to further capture the contextual information from the produced set of features. Two attention mechanism units are incorporated to highlight the critical information from the contextual feature vectors produced by the Bi-LSTM hidden layers. The context-related vectors generated by the attention mechanism layers are then concatenated and passed into a classifier to predict the final label. To disentangle the influence of these components, the proposed model is validated as three variant architectures on a multi-domains corpus, as well as four benchmarks. Experimental results show that incorporating Bi-LSTM and attention mechanism improves the model’s performance while yielding 96.08% in accuracy. Consequently, this architecture consistently outperforms the other State-of-The-Art approaches with up to + 14.47%, + 20.38%, and + 18.45% improvements in accuracy, precision, and recall respectively. These results demonstrated the strengths of this model in addressing the challenges of text classification tasks.

Sentiment Analysis of Bangladeshi Digital Newspaper by Using Machine Learning and Natural Language Processing

Conference Paper

Full-text available

May 2024

MarketProphet-SP500MediNews: Forecasting Healthcare Sector S&P 500 Index Trends Using Data from 22 Key Stocks and Medical News Insights

Technical Report

Full-text available

Feb 2024

Financial forecasting in the healthcare sector is a critical task, necessitating the integration of diverse data sources for accurate predictions. This paper introduces MarketProphet, a novel multi-modal dataset tailored for time series forecasting of the S&P 500 Healthcare index. The dataset comprises daily trading information of 22 select healthcare stocks, identified based on trading volume, transaction amounts, and price variations. To enrich the financial data, MarketProphet also integrates daily news articles and policy updates relevant to the healthcare sector, offering a comprehensive view of market influencers. The dataset spans multiple years, providing daily records of stock metrics (opening and closing prices, trading volumes, and price changes) alongside the S&P 500 Healthcare index values. The inclusion of qualitative data from news and policy sources allows for a layered analytical approach, enhancing the potential for accurate market trend predictions. MarketProphet is designed to support advanced time series forecasting, employing both quantitative financial data and qualitative insights. This blend aims to improve the prediction accuracy of the S&P 500 Healthcare index's future movements, which is particularly valuable for financial analysts, data scientists, and policy researchers. By presenting a holistic view that combines financial performance with sector-specific news and policy developments, MarketProphet emerges as a pivotal tool for understanding and forecasting the dynamics of the healthcare market.

Machine learning-based Sentiment Analysis: A Comprehensive Review

Conference Paper

Apr 2024

Forecasting Political Security Threats: A Fusion of Lexicon-Based and ML Approaches

Chapter

May 2024

Współczesne Problemy Gospodarowania

Book

Jan 2023

Pivotal study about the sentimental analysis and its application on news and its psychological effects on our life

Conference Paper

Jan 2024

Sentiment Analysis on News Headlines of Nation’s Capital Relocation Using CNN and SVM

Conference Paper

Nov 2023

Checking the Truthfulness of News Channels using NLP Techniques

Conference Paper

Nov 2023

Document Level Sentiment Analysis from News Articles

Conference Paper

Full-text available

Aug 2017

Polarity detection of online news articles based on sentence structure and dynamic dictionary

Conference Paper

Full-text available

Dec 2017

Sentiment Analysis: A Comparative Study on Different Approaches

Article

Full-text available

Dec 2016

Sentiment analysis (SA) is an intellectual process of extricating user's feelings and emotions. It is one of the pursued field of Natural Language Processing (NLP). The evolution of Internet based applications has steered massive amount of personalized reviews for various related information on the Web. These reviews exist in different forms like social Medias, blogs, Wiki or forum websites. Both travelers and customers find the information in these reviews to be beneficial for their understanding and planning processes. The boom of search engines like Yahoo and Google has flooded users with copious amount of relevant reviews about specific destinations, which is still beyond human comprehension. Sentiment Analysis poses as a powerful tool for users to extract the needful information, as well as to aggregate the collective sentiments of the reviews. Several methods have come to the limelight in recent years for accomplishing this task. In this paper we compare the various techniques used for Sentiment Analysis by analyzing various methodologies.

A Survey on Sentiment Analysis Challenges

Article

Full-text available

Apr 2016

Doaa Mohey El-Din

Abstract With accelerated evolution of the internet as websites, social networks, blogs, online portals, reviews, opinions, recommendations, ratings, and feedback are generated by writers. This writer generated sentiment content can be about books, people, hotels, products, research, events, etc. These sentiments become very beneficial for businesses, governments, and individuals. While this content meant to be useful, a bulk of this writer generated content require using the text mining techniques and sentiment analysis. But there are several challenges faced the sentiment analysis and evaluation process. These challenges become obstacles in analyzing the accurate meaning of sentiments and detecting the suitable sentiment polarity. Sentiment analysis is the practice of applying Natural Language Processing and Text Analysis techniques to identify and extract subjective information from text. This paper presents a survey on the Sentiment analysis challenges relevant to their approaches and techniques.

Sentiment Analysis of Twitter Data: A Survey of Techniques

Article

Full-text available

Apr 2016

With the advancement of web technology and its growth, there is a huge volume of data present in the web for internet users and a lot of data is generated too. Internet has become a platform for online learning, exchanging ideas and sharing opinions. Social networking sites like Twitter, Facebook, Google+ are rapidly gaining popularity as they allow people to share and express their views about topics,have discussion with different communities, or post messages across the world. There has been lot of work in the field of sentiment analysis of twitter data. This survey focuses mainly on sentiment analysis of twitter data which is helpful to analyze the information in the tweets where opinions are highly unstructured, heterogeneous and are either positive or negative, or neutral in some cases. In this paper, we provide a survey and a comparative analyses of existing techniques for opinion mining like machine learning and lexicon-based approaches, together with evaluation metrics. Using various machine learning algorithms like Naive Bayes, Max Entropy, and Support Vector Machine, we provide a research on twitter data streams.General challenges and applications of Sentiment Analysis on Twitter are also discussed in this paper.

Sentiment analysis techniques in recent works

Conference Paper

Jul 2015

Suhaila Zainudin

The Sentiment Analysis (SA) task is to label people's opinions into different categories, such as positive and negative, from a given piece of text. Another task is to decide whether a given text is subjective, expressing the writer's opinions, or objective, expressing. These tasks have been performed at different levels of analysis, ranging from the document level to the sentence and phrase levels. Another task is aspect extraction, which originated from aspect-based sentiment analysis at the phrase level. All these tasks fall under the umbrella of SA. In recent years, a large number of methods, techniques, and enhancements have been proposed for the problem of SA in different tasks at different levels. This survey aims to categorize SA techniques in general, without focusing on a specific level or task, and to review the main research problems in recent articles in this field. We found that machine learning-based techniques (including supervised, unsupervised, and semi-supervised learning), lexicon-based techniques, and hybrid techniques are the most frequently used. The open problems are that recent techniques still do not work well across different domains; sentiment classification with insufficient labeled data is still a challenging problem; there is a lack of SA research in languages other than English; and existing techniques are still unable to deal with complex sentences that require more than sentiment words and simple parsing.

A comparison of lexicon-based approaches for sentiment analysis of microblog

Article

Jan 2014

The exponential growth of available online information provides computer scientists with many new challenges and opportunities. A recent trend is to analyze people's feelings, opinions, and orientation about facts and brands: this is done by exploiting Sentiment Analysis techniques, whose goal is to classify the polarity of a piece of text according to the opinion of the writer. In this paper we propose a lexicon-based approach for sentiment classification of Twitter posts. Our approach is based on the exploitation of widespread lexical resources such as SentiWordNet, WordNet-Affect, MPQA, and SenticNet. In the experiments, the effectiveness of the approach was evaluated against two state-of-the-art datasets. Preliminary results provide interesting outcomes and pave the way for future research in the area.

Opinion mining of news headlines using SentiWordNet

Conference Paper

Mar 2016

Fine-grained financial news sentiment analysis

Conference Paper

Mar 2017

The 24-hour news cycle and the barrage of online media are a constant drumbeat. The flow of positive and negative news is always in flux, influencing our current perspective and reassessing our future outlook. Nowhere is this more true than in the capital markets, where assets are priced and risk is assessed based on future expectations. While many factors influence a trader's decision to buy or sell an asset, it can be argued that the sentiment from the 24-hour news cycle greatly impacts their outlook on the future value of an asset. In this paper we propose new methods to predict the positive or negative sentiment of financial news. Our analysis has found that contemporary document-level sentiment analysis methods break down at fine-grained levels. Fine-grained analysis methods are vitally important as the velocity and impact of small texts, such as tweets and news flashes, increase their influence over the decision process. Using Natural Language Processing methods, we extract syntactic sentence patterns from financial news headlines. From these patterns we conduct experiments using both lexicon and machine learning sentiment analysis approaches to predict sentiment. We find that our sentiment prediction methods consistently outperform lexicon methods. Our robust techniques give the financial practitioner a method to fold a fine-grained news sentiment factor into their pricing or risk prediction models.

Large-scale sentiment analysis for news and blogs

Article

Jan 2007
