
Social Media Sentiment Analysis on Third Booster Dosage for COVID-19 Vaccination: A Holistic Machine Learning Approach

March 2023


DOI:10.1007/978-981-19-8477-8_14

In book: Intelligent Systems and Human Machine Collaboration (pp.179-190)

Authors:

Papri Ghosh

Narula Institute of Technology

Ritam Dutta

Poornima University

Nikita Agarwal

Siksha O Anusandhan University

Siddhartha Chatterjee

College of Engineering and Management, Kolaghat




Papri Ghosh, Ritam Dutta, Nikita Agarwal, Siddhartha Chatterjee, and Solanki Mitra

Abstract For more than two years, public health has faced a serious threat from COVID-19 infection. This article presents a holistic machine learning approach to social media sentiment analysis on the third booster dosage for COVID-19 vaccination across the globe. In this work, Twitter responses are used to perform the sentiment analysis. The large number of tweets on social media requires databases of multiple terabytes. Machine learning-based sentiment analysis can be performed by retrieving millions of Twitter responses from users on a daily basis. Opinions regarding any news item or trending product launch can be readily ascertained from Twitter data. Our aim is to analyze user tweet responses on the third booster dosage for COVID-19 vaccination. In this sentiment analysis, the user responses are first categorized into positive, negative, and neutral sentiment. A performance study is carried out in which, based on their sentiment scores, the application distinguishes positive, negative, and neutral tweet responses once they are clustered with various dictionaries, establishing strong support for the prediction. This paper surveys polarity detection using various machine learning algorithms, viz. Naïve Bayes (NB), K-Nearest Neighbors (KNN), Recurrent Neural Networks (RNN), and Valence Aware Dictionary and sEntiment Reasoner (VADER), on the third booster dosage for COVID-19 vaccination. VADER sentiment analysis achieves 97% accuracy, 92% precision, and 95% recall, outperforming the other machine learning models considered.

P. Ghosh

CSE Department, Narula Institute of Technology, Kolkata, West Bengal, India

R. Dutta (corresponding author) · N. Agarwal

ITER, Siksha ‘O’ Anusandhan University, Bhubaneswar, Odisha, India

e-mail: ritamdutta1986@gmail.com

S. Chatterjee

CSE Department, IMPS College of Engineering and Technology, Malda, West Bengal, India

S. Mitra

CSE Department, University of Glasgow, University Ave, Glasgow G12 8QW, UK

S. Bhattacharyya et al. (eds.), Intelligent Systems and Human Machine Collaboration,

Lecture Notes in Electrical Engineering 985,

https://doi.org/10.1007/978-981-19-8477-8_14


Keywords Holistic approach · NB · KNN · RNN · VADER sentiment analysis · Twitter response · Booster dosage

1 Introduction

Social websites take different forms, viz. blogs, image and forum sharing, video sharing, social networks, and microblogs, and include online social media such as Facebook, Instagram, YouTube, LinkedIn, and Twitter. These websites and mobile applications share people's responses globally.

On these social sites, individuals across the world can express their discussions and comments in various forms such as text, image, video, and emoji [1,2]. As a huge source of information, social media can be used to gather user opinions and polls on a given topic. Microblogs have become one of the simplest and most familiar platforms and are therefore a source of varied data [3]. Twitter is one such microblogging service; it enables users to share and reply within a short time frame through tweets [4]. It provides a rich supply of data that is used in various scientific studies, which apply sentiment analysis to extract and analyze opinions expressed in tweets on topics such as markets, elections, and share-trade prediction.

Linguistic Inquiry and Word Count (LIWC) is one of the tools used for text extraction [5,6]. Most of these tools require programming. In our work we have used the Valence Aware Dictionary and sEntiment Reasoner (VADER), which performs sentiment analysis of tweets on the third booster dosage for COVID-19 vaccination across various countries.

2 Literature Survey

A thorough literature survey covering text mining and sentiment analysis approaches has been performed in this work. Gurkhe et al. [7] collected and processed Twitter data from numerous sources and removed the content that does not hold any polarity. Bouazizi et al. [8] used the tool SENTA for sentiment analysis of tweets and calculated a score for each sentiment. Gautam et al. [9] performed review classification on tweets using primary algorithms, viz. Naïve Bayes, Support Vector Machine (SVM), and Maximum Entropy, from the NLTK module of Python. Amolik et al. [10] applied Twitter sentiment analysis to the Hollywood movie industry and compared the classification accuracy of the Naïve Bayes and SVM algorithms. Mukherjee et al. [11] used a hybrid sentiment analysis tool, TwiSent, in which a spell checker and a linguistic handler are already defined. Davidov et al. [12] introduced a supervised sentiment analysis technique for Twitter data. Neethu et al. [13] applied machine learning techniques, viz. SVM, Naïve Bayes, and Maximum Entropy, on the MATLAB platform for data classification. A typical design structure of tweets on a terrorist attack and their possible activities has been presented by Garg et al. [14]. A large number of hashtagged tweets posted after the attack were processed with the Naïve Bayes algorithm, and the characteristics of the comments were analyzed. Hasan et al. [15] proposed a hybrid approach in which tweets are tracked by hashtag on political trends. Several Urdu tweets were translated to English for analysis, and the Naïve Bayes and SVM approaches were used to build a structure. Bhavsar et al. [16,17] designed and proposed a sentiment analysis method on the Python platform, with data collected from Kaggle. Users' emotions were classified as positive or negative to determine accuracy. Otaibi et al. [41] structured a model using both supervised and unsupervised algorithms. They used the Twitter API to extract 7000 tweets commenting on the quality of McDonald's and KFC. The analysis was performed in the R programming language.

3 Sentiment Analysis on Twitter

This system primarily consists of the following stages, which are shown in Fig. 1.

Fig. 1 Different stages of sentiment analysis


3.1 Data Extraction

For any machine learning model, a dataset is the primary need. The model is trained on these data and then predicts unseen data. Initially, we choose the topic and the associated subject on which data will be gathered. The social media responses (tweets) are retrieved in unstructured, structured, and semi-structured form.

3.2 Data Pre-processing

In this step, data pre-processing is performed on the collected social media responses (tweets). The large set of data is filtered by eliminating irrelevant, inconsistent, and noisy information. The text is converted to lowercase, and duplicate entries, punctuation, extra spaces, stop words, etc. are removed; contractions are then expanded and lemmatization is applied.
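The pre-processing steps above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the pipeline used in this work: the stop-word set, the contraction table, and the function names `clean_tweet` and `preprocess` are hypothetical, and lemmatization is left out because it needs an external resource.

```python
import re
import string

# Hypothetical, deliberately tiny resources; a real pipeline would use fuller lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "my"}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am", "it's": "it is"}


def clean_tweet(text):
    """Lowercase a tweet, expand contractions, and strip noise tokens."""
    text = text.lower()
    # Drop URLs and @mentions, which carry no sentiment of their own.
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # Expand contractions before punctuation (the apostrophe) is removed.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Remove punctuation, then split on whitespace (collapses extra spaces).
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def preprocess(tweets):
    """Clean every tweet and drop empty results and exact duplicates."""
    seen, cleaned = set(), []
    for tweet in tweets:
        result = clean_tweet(tweet)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned
```

Contractions are expanded before punctuation is stripped because the apostrophe would otherwise be lost and "can't" would become the unknown token "cant".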

3.3 Sentiment Detection

Sentiment detection can be performed by combining data classification with data (tweet) mining [18].

3.4 Sentiment Classification

Algorithm-based sentiment analysis is generally classified into two approaches, viz. supervised learning and unsupervised learning. In supervised learning, Naïve Bayes, SVM, and Maximum Entropy are used to perform the sentiment analysis [19–28].

3.5 Evaluation

The final output is analyzed to decide whether or not it should be preferred [29–33].


4 Proposed Model Component Description

Twitter is a social networking platform that enables its users to send and read microblogs of up to 280 characters, called "tweets". It allows registered users to read and post tweets through the web, short message service (SMS), and mobile applications. As a worldwide real-time communications platform, Twitter has more than four hundred million monthly visitors and 255 million monthly active users around the world. Twitter's active group of registered members includes world leaders, major athletes, star performers, news organizations, and entertainment outlets. It is presently available in more than thirty-five languages.

Twitter was launched in 2006 by Jack Dorsey, Evan Williams, Biz Stone, and Noah Glass. Twitter is headquartered in San Francisco, California, USA.

Table 1 shows the sample dataset collected from social media (Twitter), in which the different sentiment categories have been identified by the machine learning algorithms used in the model.

The Natural Language Toolkit (NLTK) has been used with our Valence Aware Dictionary and sEntiment Reasoner (VADER) model. NLTK is a free, open-source Python package that provides many tools for building programs and classifying data [34,35]. The VADER model is a rule-based sentiment analysis tool that is specifically attuned to the user emotions expressed in social media [36,37]. Accents and punctuation marks were removed from all instances of the datasets, and then different data pre-processing techniques were applied, viz. lemma extraction [38], stemming, Part of Speech (PoS) tagging [39,40], and summarization. Similar words in the received Twitter responses are identified using regular expressions and passed to a dictionary to label the data, which can then be used for supervised learning.

Table 1 Sample dataset collected from social media (Twitter)

Comments    Category

            Positive

            Negative

            Neutral

4.1 Flow Chart of Our Proposed Model

The flowchart of our proposed sentiment analysis model is depicted in Fig. 2. The steps of the analysis process applied to the received texts are given as Algorithm 1 in Fig. 3.

In our work, the VADER sentiment analysis tool has been used to obtain the simulated responses. VADER is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to the sentiments expressed in social media (Twitter) and also provides good results on texts from other domains.

4.2 Different Machine Learning Models Used for Performance Comparison

Machine learning models are trained on some data and are then used to make predictions on unseen data. Their specialty is that they can extract and learn the features of a dataset using a feature selection technique and therefore do not require human intervention. This section describes the machine learning models used in our proposed work.

4.2.1 Naïve Bayes

The Naïve Bayes classifier is a well-known supervised machine learning approach that makes predictions based on probabilities. It uses Bayes' theorem to determine the posterior probability, calculated as shown below:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)} \tag{1}$$

where,

P(X|C): likelihood

P(C|X): posterior probability

P(C): class probability

P(X): predictor probability

Fig. 2 Flowchart of our proposed sentiment analysis model
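As an illustration of Eq. (1), the following is a minimal word-count Naïve Bayes classifier written from scratch. It is a sketch under stated assumptions rather than the implementation used in this work: the function names, the add-one (Laplace) smoothing, and the whitespace tokenization are all hypothetical choices. Because P(X) is identical for every class, it is dropped and only the numerator P(X|C)·P(C) is compared, in log space.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (text, label) pairs.

    Returns class priors P(C), per-class word counts, and the vocabulary.
    """
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: n / len(samples) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(text, priors, word_counts, vocab):
    """Return the class maximising log P(C) + sum of log P(word | C)."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        log_p = math.log(prior)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen (word, class) pairs non-zero.
                log_p += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = log_p
    return max(scores, key=scores.get)
```

The "naïve" part is the independence assumption: the likelihood P(X|C) is taken as a product of per-word probabilities, which is why the loop simply adds one log term per word.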

4.2.2 K-Nearest Neighbor (KNN)

K-Nearest Neighbor is a simple yet widely used supervised machine learning algorithm, applied in text mining, pattern recognition, and many other fields. It classifies a sample according to the classes of its k most similar (nearest) neighbors. In our work, we used the grid search technique to determine the value of k, and k = 15 was observed to give good results, as shown in Fig. 4b.
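The idea of choosing k by grid search can be sketched as follows. This is a toy illustration, not the experimental setup of this work: the Euclidean distance, the leave-one-out scoring, and the function names are assumptions, and real tweets would first have to be turned into numeric feature vectors.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train: list of (vector, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def grid_search_k(data, candidates):
    """Return (best_k, accuracy) using leave-one-out accuracy on `data`."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = 0
        for i, (vec, label) in enumerate(data):
            rest = data[:i] + data[i + 1:]  # hold out the i-th sample
            if knn_predict(rest, vec, k) == label:
                correct += 1
        acc = correct / len(data)
        if acc > best_acc:  # ties keep the smaller, earlier k
            best_k, best_acc = k, acc
    return best_k, best_acc
```

A k that is too large lets the majority class outvote the true neighbourhood, which is why the search is needed at all; on a small two-cluster dataset the accuracy drops sharply once k exceeds the cluster size.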


Fig. 3 Proposed algorithm for Twitter comments sentiment classification

Fig. 4 a Sentiment score analysis using NB, KNN, RNN and VADER algorithms, b Accuracy, Precision and Recall for different sentiment analyzer models

4.2.3 Recurrent Neural Network (RNN)

Recurrent Neural Networks are a type of neural network that remembers earlier inputs when making later predictions. They analyze sequential data more effectively than other machine learning models, which use only the current input to make predictions. The RNN model has a memory that retains relevant past information and forgets irrelevant information. As the same parameters are reused at every step of the sequence, the complexity of the model is reduced to a large extent. The RNN model is widely used in text mining, natural language processing, and many other tasks. The parameters used in our RNN model are shown in Table 2.


Table 2 Parameters used in our RNN model

Parameters                                   RNN

Layers used                                  4

Fully connected layers used                  2

Number of nodes in each layer                128, 64, 32, 16

Number of nodes in fully connected layers    16, 1

Optimization technique                       RMSprop

Activation function                          SoftMax

Epochs                                       100

Learning rate                                0.0001
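To make the notions of "memory" and shared parameters concrete, the sketch below runs the forward pass of a single simple (Elman-style) recurrent layer in plain Python. It is not the four-layer network of Table 2: the tanh activation, the weight names `w_xh`, `w_hh`, `b_h`, and the tiny dimensions are assumptions chosen for readability, and no training is shown.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run one recurrent layer over a sequence and return every hidden state.

    inputs: list of input vectors, one per time step
    w_xh:   input-to-hidden weights, shape [hidden][input]
    w_hh:   hidden-to-hidden weights, shape [hidden][hidden]
    b_h:    hidden bias, length [hidden]
    The same three parameter sets are reused at every time step.
    """
    hidden = [0.0] * len(b_h)  # initial memory is empty
    states = []
    for x in inputs:
        new_hidden = []
        for j in range(len(b_h)):
            total = b_h[j]
            total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
            total += sum(w_hh[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden  # carried forward to the next step
        states.append(hidden)
    return states
```

Feeding the same input twice yields two different hidden states whenever `w_hh` is non-zero, because the second step also sees the state left behind by the first; with `w_hh` set to zero the layer degenerates to a memoryless model.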

4.2.4 VADER

The Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment analysis tool is an unsupervised approach that can detect the polarity of the sentiment (positive, negative, or neutral) of a given text from unlabeled data. Therefore, our proposed VADER model is less expensive than existing supervised learning approaches. Conventional sentiment analyzer models have to learn from labeled training data, which complicates the process. The VADER sentiment analyzer does the job without any labeling. VADER uses a lexicon of sentiment-related words to determine the overall sentiment of a given body of text.
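The lexicon-based idea can be illustrated with a heavily simplified scorer. This sketch is not the VADER tool itself: the six-word lexicon and its valence values are invented for illustration, only one rule (negation of the preceding word) is modelled, and the constants (normalisation term 15, negation factor -0.74, thresholds of ±0.05) are the values commonly associated with VADER and should be treated as assumptions here.

```python
import math
import re

# Hypothetical miniature lexicon; the real VADER lexicon holds thousands of rated entries.
TOY_LEXICON = {
    "good": 1.9, "great": 3.1, "safe": 1.5,
    "bad": -2.5, "painful": -2.0, "fear": -2.2,
}
NEGATIONS = {"not", "no", "never"}


def compound_score(text, alpha=15.0):
    """Sum word valences and squash the total into the open interval (-1, 1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = TOY_LEXICON.get(tok, 0.0)
        # A negation directly before a sentiment word flips and dampens it.
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence *= -0.74
        total += valence
    return total / math.sqrt(total * total + alpha)


def polarity(text, threshold=0.05):
    """Map the compound score to one of the three sentiment categories."""
    score = compound_score(text)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

No labeled training data appears anywhere in the sketch, which is the property the section relies on: all knowledge sits in the lexicon and the rules.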

4.3 Evaluation Metrics

This section describes the metrics used to evaluate the performance of our models. The three evaluation metrics incorporated in our work are described below:

Accuracy: the fraction of all predictions that are correct.

Precision: the ratio of the number of correct positive results to the number of positive results predicted by the classifier.

Recall: the ratio of the number of correct positive results to the number of relevant (actually positive) samples.
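The three definitions translate directly into code. The sketch below assumes a one-vs-rest view in which one label is treated as "positive" for precision and recall; how the three sentiment classes were combined in this work is not stated, so that choice and the function name `evaluate` are assumptions.

```python
def evaluate(y_true, y_pred, positive_label):
    """Return (accuracy, precision, recall) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when the class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

For example, with true labels positive, negative, neutral, positive and predictions positive, positive, neutral, negative, two of four predictions are correct (accuracy 0.5), one of the two predicted positives is right (precision 0.5), and one of the two actual positives is found (recall 0.5).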

5 Simulation Results and Discussion

For the result analysis, we carried out the computation on data collected from social media (Twitter). The sentiment analysis tool incorporates word-order-sensitive relationships between terms, and the collected data are then processed to identify positive, negative, and neutral sentiments.

Fig. 5 Twitter polarity count on third booster dosage using VADER sentiment analysis

Fig. 4a presents a bar chart comparing the sentiment scores obtained from social media (Twitter) responses by four machine learning algorithms, i.e., NB, KNN, RNN, and VADER. The performance of these models is evaluated using the well-known machine learning evaluation metrics, viz. accuracy, precision, and recall, as shown in Fig. 4b. The Twitter responses on the usage and significance of the third booster dosage for COVID-19 vaccination have been modeled with all four algorithms, of which the VADER sentiment analyzer gives the most accurate results.

The Twitter polarity count on the third booster dosage has also been compared across the top five superpower countries using VADER sentiment analysis, as shown in Fig. 5.
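A per-country polarity count of the kind plotted in Fig. 5 can be produced with a simple tally. The sketch assumes each tweet has already been reduced to a (country, label) pair; that record layout and the function name `polarity_counts` are hypothetical, and the sample records in the usage note are invented, not taken from the dataset.

```python
from collections import Counter, defaultdict


def polarity_counts(records):
    """records: iterable of (country, polarity_label) pairs.

    Returns {country: {label: count}} for plotting or tabulation.
    """
    counts = defaultdict(Counter)
    for country, label in records:
        counts[country][label] += 1
    # Convert to plain dicts so the result is easy to print or serialise.
    return {country: dict(tally) for country, tally in counts.items()}
```

Calling `polarity_counts([("India", "positive"), ("India", "negative"), ("USA", "positive")])` groups the labels by country, giving one positive and one negative tweet for India and one positive tweet for USA.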

6 Conclusion and Future Scope

This paper records the results of analyzing Twitter responses from various countries on the usage and probable significance of the third booster dosage to combat COVID-19 infection. The World Health Organization wishes to speculate into the medicine market. The machine learning models, viz. NB, KNN, RNN, and VADER, were used to analyze the human sentiments portrayed in tweets. This comparative study shows that VADER sentiment analysis achieved 97% accuracy, 92% precision, and 95% recall. The results show a robust correlation among Twitter comments in terms of sentiment polarity. Because VADER is an unsupervised learning approach, the proposed model is less expensive than existing supervised learning approaches. The proposed approach can be used for other contagious infections if needed in future.

References

1. Jansen BJ, Hang MZ, Sobeland K, Chowdury A (2009) Twitter power: Tweets as electronic

word of mouth. J Am Soc Inf Sci Technol 60(11):2169–2188

2. Kharde V, Sonawane P (2016) Sentiment analysis of twitter data: a survey of techniques. Int J

Comput Appl. arXiv:1601.06971

3. Selvaperumal P, Suruliandi A (2014) A short message classification algorithm for tweet

classification. Int Conf Recent Trends Inf Technol 1–3

4. Singh T, Kumari M (2016) Role of text pre-processing in twitter sentiment analysis. Procedia

Comput Sci 89:549–554

5. Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and

computerized text analysis methods. J Lang Soc Psychol 29(1):24–54

6. Gilbert CJ (2016) Vader: a parsimonious rule-based model for sentiment analysis of social

media text. In: Eighth international conference on weblogs and social media (ICWSM-14)

7. Gurkhe D, Pal N, Bhatia R (2014) Effective sentiment analysis of social media datasets using

Naïve Bayesian classification. Int J Comput Appl

8. Bouazizi M, Ohtsuki T (2018) Multi-class sentiment analysis in Twitter: what if classification

is not the answer. IEEE Access 6:64486–64502

9. Gautam G, Yadav D (2014) Sentiment analysis of twitter data using machine learning

approaches and semantic analysis. In: 7th international conference on contemporary computing

10. Amolik A, Jivane N, Bhandari JM, Venkatesan M (2016) Twitter sentiment analysis of movie

reviews using machine learning techniques. Int J Eng Technol 7(6):1–7

11. Mukherjee S, Malu A, Balamurali AR, Bhattacharyya P (2013) TwiSent: a multistage system

for analyzing sentiment in Twitter. In: Proceedings of the 21st ACM international conference

on information and knowledge management

12. Davidov D, Sur O, Rappoport A (2010) Enhanced sentiment learning using Twitter hashtags

and smileys. In: Proceedings of the 23rd international conference on computational linguistics,

posters

13. Neethu M, Rajasree R (2013) Sentiment analysis in twitter using machine learning techniques.

In: 4th international conference on computing, communications and networking technologies,

IEEE, Tiruchengode, India

14. Garg P, Garg H, Ranga V (2017) Sentiment analysis of the Uri terror attack using Twitter. In:

International conference on computing, communication and automation, IEEE, Greater Noida,

India

15. Hasan A, Moin S, Karim A, Shamshirb S (2018) Machine learning based sentiment analysis

for Twitter accounts, Licensee, MDPI, Switzerland

16. Bhavsar H, Manglani R (2019) Sentiment analysis of Twitter data using python. Int Res J Eng

Technol 3(2):41–45

17. Sirsat S, Rao S, Wukkadada B (2019) Sentiment analysis on Twitter data for product evaluation.

IOSR J Eng 5(1):22–25

18. Behdenna S, Barigou F, Belalem G (2018) Document level sentiment analysis: a survey. In:

EAI endorsed transactions on context-aware systems and applications

19. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for

sentiment analysis. Comput Linguist J 267–307

20. Tong RM (2001) An operational system for detecting and tracking opinions in on-line

discussions. In: Working notes of the SIGIR workshop on operational text classification, pp 1–6


21. Turney P, Littman M (2003) Measuring praise and criticism: inference of semantic orientation

from association. ACM Trans Inform Syst J 21(4):315–346

22. Kaur L (2016) Review paper on Twitter sentiment analysis techniques. Int J Res Appl Sci Eng

Technol 4(1)

23. Miller GA, Beckwith R, Fellbaum C, Gross D, Miller KJ (1990) Introduction to WordNet: an

on-line lexical database. Int J Lexicogr 3(4):235–244

24. Mohammad S, Dunne C, Dorr B (2009) Generating high-coverage semantic orientation lexicons

from overtly marked words and a thesaurus. In: Proceedings of the conference on empirical

methods in natural language processing

25. Harb A, Plantie M, Dray G, Roche M, Trousset F, Poncelet F (2008) Web opinion mining:

How to extract opinions from blogs? In: Proceedings of the 5th international conference on

soft computing as transdisciplinary science and technology (CSTST 08), pp 211–217

26. Turney PD (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised

classification of reviews. In: Proceedings of the 40th annual meeting on association for

computational linguistics, pp 417–424

27. Wang G, Araki K (2007) Modifying SOPMI for Japanese weblog opinion mining by using

a balancing factor and detecting neutral expressions. In: Human language technologies: the

conference of the North American chapter of the association for computational linguistics,

companion volume, pp 189–192

28. Rice DR, Zorn C (2013) Corpus-based dictionaries for sentiment analysis of specialized

vocabularies. In: Proceedings of NDATAD, pp 98–115

29. Su KY, Chiang TH, Chang JS (1996) An overview of corpus-based statistics oriented (CBSO)

Techniques for natural language processing. Comput Linguist Chin Lang Process 1(1):101–157

30. Su KY, Chiang TH (1990) Some key issues in designing MT systems. Mach Transl 5(4):265–300

31. Su KY, Chiang TH (1992) Why corpus-based statistics oriented machine translation. In:

Proceedings of 4th international conference on theoretical and methodological issue in machine

translation, Montreal, Canada, pp 249–262

32. Su KY, Chang JS, Una Hsu YL (1995) A corpus-based two-way design for parameterized MT

systems: rationale, architecture and training issues. In: Proceedings of the 6th international

conference on theoretical and methodological issues in machine translation, TMI-95, pp 334–353

33. Moss HE, Ostrin RK, Tyler LK, Marslen WD (1995) Accessing different types of lexical

semantic information: evidence from priming. J Exp Psychol Learn Mem Cogn 21(1):863–883

34. Natural Language Toolkit. http://www.nltk.org/. Last Accessed 20 Nov 2018

35. Bird S, Loper E, Klein E (2009) Natural language processing with python. O’Reilly Media Inc.

36. Gilbert CJ (2014) Vader: A parsimonious rule-based model for sentiment analysis of social

media text. In: 8th international conference on weblogs and social media

37. Elbagir S, Yang J (2019) Twitter sentiment analysis using natural language toolkit and VADER

sentiment. LNCS 232:342–347

38. Natural Language Processing with neural networks. http://nilc.icmc.usp.br/nlpnet/. Last

Accessed 05 Dec 2021

39. Fonseca ER, Rosa JLG (2013) A two-step convolutional neural network approach for semantic

role labelling. In: Proceedings of the 2013 international joint conference on neural networks-

2013, pp 2955–2961

40. Fonseca ER, Rosa JLG (2013) Mac-Morpho revisited: towards robust part-of-speech tagging.

In: Proceedings of the 9th Brazilian symposium in information and human language technology,

pp 98–107

41. Rahman SA, Al Otaibi FA, AlShehri WA (2019) Sentiment analysis of Twitter data. In: 2019

international conference on computer and information sciences, pp 1–4. https://doi.org/10.1109/ICCISci.2019.8716464
