Social Media Sentiment Analysis
on Third Booster Dosage for COVID-19
Vaccination: A Holistic Machine
Learning Approach
Papri Ghosh, Ritam Dutta, Nikita Agarwal, Siddhartha Chatterjee, and Solanki Mitra
Abstract For more than two years, public health has faced a legitimate threat from COVID-19 infection. This article presents a holistic machine learning approach to gain insight into social media sentiment on the third booster dosage for COVID-19 vaccination across the globe. In this work, the researchers have considered people's Twitter responses to perform the sentiment analysis. The large number of tweets on social media requires a multi-terabyte database. Machine-learning-based sentiment analysis can be performed by retrieving millions of Twitter responses from users on a daily basis. Comments regarding any news or any trending product launch can be readily ascertained from Twitter data. Our aim is to analyze user tweet responses on the third booster dosage for COVID-19 vaccination. In this sentiment analysis, the user responses are first categorized into positive, negative, and neutral sentiment. A performance study is then carried out in which, based on their sentiment scores, the application distinguishes positive, negative, and neutral tweet responses once they are clustered with various dictionaries, which establishes strong support for the prediction. This paper surveys polarity detection using various machine learning algorithms, viz. Naïve Bayes (NB), K-Nearest Neighbors (KNN), Recurrent Neural Networks (RNN), and Valence Aware Dictionary and sEntiment Reasoner (VADER), on the third booster dosage for COVID-19 vaccination. The VADER sentiment analysis achieves 97% accuracy, 92% precision, and 95% recall, the best among the machine learning models compared.
P. Ghosh
CSE Department, Narula Institute of Technology, Kolkata, West Bengal, India
R. Dutta (corresponding author) · N. Agarwal
ITER, Siksha ‘O’ Anusandhan University, Bhubaneswar, Odisha, India
e-mail: ritamdutta1986@gmail.com
S. Chatterjee
CSE Department, IMPS College of Engineering and Technology, Malda, West Bengal, India
S. Mitra
CSE Department, University of Glasgow, University Ave, Glasgow G12 8QW, UK
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Bhattacharyya et al. (eds.), Intelligent Systems and Human Machine Collaboration,
Lecture Notes in Electrical Engineering 985,
https://doi.org/10.1007/978-981-19-8477-8_14
Keywords Holistic approach · NB · KNN · RNN · VADER sentiment analysis · Twitter response · Booster dosage
1 Introduction
Social websites take many forms, viz. blogs, image- and forum-sharing sites, video-sharing social networks, and microblogs, and include online social media such as Facebook, Instagram, YouTube, LinkedIn, and Twitter. These websites and mobile applications share people’s responses globally.
On these social sites, individuals across the world can express their discussions and comments in various forms such as text, image, video, and emoji [1,2]. Social media, as a huge source of knowledge, can gather user opinions and polls on these expressions. Microblogs have become one of the simplest and most familiar of these forms and therefore a source of varied data [3]. Twitter is one such microblog service; it enables users to share and reply within a short time frame through tweets [4]. It provides a rich supply of data that is utilized in various scientific studies, which use sentiment analysis to extract and analyze the opinions expressed in tweets on topics such as markets, elections, and share-trade prediction.
Linguistic Inquiry and Word Count (LIWC) is one of the tools for text extraction [5,6]. Most of these tools require programming. In our work we have used the Valence Aware Dictionary and sEntiment Reasoner (VADER) for sentiment analysis of tweets on the third booster dosage for COVID-19 vaccination among various countries.
2 Literature Survey
A thorough literature survey of text mining and sentiment analysis approaches has been performed in this work. Gurkhe et al. [7] collected and processed Twitter data from numerous sources and removed the content that does not hold any polarity. Bouazizi et al. [8] used the tool SENTA for sentiment analysis of tweets and calculated a score for each sentiment. Gautam et al. [9] performed review classification on tweets using the Naïve Bayes, Support Vector Machine (SVM), and Maximum Entropy algorithms from the NLTK module of Python. Amolik et al. [10] applied Twitter sentiment analysis to the Hollywood movie industry and compared the classification accuracy of the Naïve Bayes and SVM algorithms. Mukherjee et al. [11] used a hybrid sentiment analysis tool, TwiSent, in which a spell checker and a linguistic handler are already defined. Davidov et al. [12] introduced a supervised sentiment analysis technique for Twitter data. Neethu et al. [13] applied the machine learning techniques SVM, Naïve Bayes, and Maximum Entropy on the MATLAB platform for data classification. A typical design structure of tweets on a terrorist attack and the related activities was presented by Garg et al. [14]: a large number of hashtagged tweets posted after the attack were fed to the Naïve Bayes algorithm, and the characteristics of the comments were analyzed. Hasan et al. [15] proposed a hybrid approach in which tweets are tracked by hashtag on political trends; several Urdu tweets are translated into English for analysis, and the Naïve Bayes and SVM approaches are used to build the model. Bhavsar et al. [16,17] designed a sentiment analysis method on the Python platform with data collected from Kaggle; users’ emotions were classified as positive or negative and the accuracy was measured. Otaibi et al. [41] built a model using both supervised and unsupervised algorithms. They used the Twitter API to extract 7000 tweets commenting on the quality of McDonald’s and KFC, and performed the analysis in the R programming language.
3 Sentiment Analysis on Twitter
This system primarily consists of the following stages, which are shown in Fig. 1.
Fig. 1 Different stages of sentiment analysis
3.1 Data Extraction
For any machine learning model, a dataset is the primary need: the model is trained on these data and then predicts unseen data. Initially we choose the topic and the associated subject on which responses will be gathered. The social media responses (tweets) are retrieved in unstructured, structured, and semi-structured form.
3.2 Data Pre-processing
In this step, data pre-processing is performed on the collected social media responses (tweets). The large set of data is filtered by eliminating irrelevant, inconsistent, and noisy information. The text is converted to lowercase, and duplicate entries, punctuation, extra spaces, stop words, etc. are removed; contraction handling and lemmatization are then applied.
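The cleaning steps above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the stop-word list and contraction map are small hypothetical stand-ins, URL removal is an added assumption, and lemmatization is omitted because it needs an external resource such as NLTK.

```python
import re

# Hypothetical, deliberately tiny resources; a real pipeline would use
# fuller lists (for example the stop words shipped with NLTK).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "it", "i"}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "don't": "do not", "it's": "it is"}

def preprocess(tweets):
    """Lowercase, expand contractions, strip URLs and punctuation,
    drop stop words, and discard tweets that become duplicates."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = tweet.lower()
        for short, full in CONTRACTIONS.items():
            text = text.replace(short, full)
        text = re.sub(r"https?://\S+", " ", text)   # remove links
        text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove punctuation (also drops emoji and accents)
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        key = " ".join(tokens)
        if key and key not in seen:                 # skip empty and duplicate tweets
            seen.add(key)
            cleaned.append(tokens)
    return cleaned
```

Note that this aggressive cleaning suits bag-of-words classifiers; lexicon-based tools may instead benefit from keeping punctuation and capitalization as intensity cues.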
3.3 Sentiment Detection
Sentiment detection is performed by combining data classification with data (tweet) mining [18].
3.4 Sentiment Classification
Algorithm-based sentiment analysis is generally classified into two approaches, viz. supervised learning and unsupervised learning. In supervised learning, Naïve Bayes, SVM, and Maximum Entropy are used to perform the sentiment analysis [19–28].
3.5 Evaluation
The final output is analyzed to decide whether or not the result should be accepted [29–33].
4 Proposed Model Component Description
Twitter is a social networking platform that enables its users to send and read micro-blogs of up to 280 characters called “tweets”. It allows registered users to browse and post tweets through the internet, short message service (SMS), and mobile applications. As a worldwide real-time communications platform, Twitter has more than 400 million monthly visitors and 255 million monthly active users around the world. Twitter’s active group of registered members includes world leaders, major athletes, star performers, news organizations, and entertainment outlets. It is presently available in more than 35 languages.
Twitter was launched in 2006 by Jack Dorsey, Evan Williams, Biz Stone, and Noah Glass. Twitter is headquartered in San Francisco, California, USA.
Here in Table 1, the sample dataset collected from social media (Twitter) has been
showcased where the different sentiment categories have been identified by machine
learning algorithms used in the model.
Table 1 Sample dataset collected from social media (Twitter)
Columns: Comments, Category (Positive, Negative, Neutral)
The Natural Language Toolkit (NLTK) has been used in our Valence Aware Dictionary and sEntiment Reasoner (VADER) model. NLTK is a free, open-source Python package that offers many tools for building programs and classifying data [34,35]. The VADER model is a rule-based sentiment analysis tool that is specifically attuned to the user emotions expressed in social media [36,37]. All instances in the datasets had accents and punctuation marks removed, and then different data pre-processing techniques were applied, viz. lemma extraction [38], stemming, part-of-speech (PoS) tagging [39,40], and summarization. Similar words in the received Twitter responses are identified using regular expressions and passed to a dictionary to label the data, which can then be used for supervised learning.
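As a rough illustration of the regular-expression and dictionary labeling step, the sketch below groups similar word forms with patterns and assigns a label by counting matches. The patterns and the tie-breaking rule are hypothetical choices made only for this example; the dictionaries actually used in this work are not reproduced here.

```python
import re

# Hypothetical patterns: each one groups similar word forms under a label.
PATTERNS = {
    "positive": re.compile(r"\b(thank\w*|protect\w*|safe\w*|grateful)\b"),
    "negative": re.compile(r"\b(fear\w*|scar\w*|pain\w*|worr\w*)\b"),
}

def label_tweet(text):
    """Label a tweet by the category whose patterns match most often;
    ties (including no match at all) are labeled neutral."""
    text = text.lower()
    hits = {label: len(pattern.findall(text)) for label, pattern in PATTERNS.items()}
    if hits["positive"] > hits["negative"]:
        return "positive"
    if hits["negative"] > hits["positive"]:
        return "negative"
    return "neutral"
```

Labels produced this way are noisy, so in practice a sample would normally be checked by hand before being used to train a supervised model.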
4.1 Flow Chart of Our Proposed Model
In this section, the flowchart of our proposed sentiment analysis model is depicted in Fig. 2. The steps of Algorithm 1 for analyzing the received texts are shown in Fig. 3.
In our work, the VADER sentiment analysis tool has been used to get the simulated response. VADER is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to the sentiments expressed in social media (Twitter) and also provides substantial results on texts from other domains.
4.2 Different Machine Learning Models Used
for Performance Comparison
The machine learning models are trained with some data and are used to make
predictions on unseen data. The specialty of machine learning models is that they
can extract and learn the features of a dataset using some feature selection technique
and therefore don’t require human intervention. This section describes the machine
learning models used in our proposed work.
4.2.1 Naïve Bayes
The Naïve Bayes classifier is a well-known supervised machine learning approach that makes predictions based on class probabilities. It uses Bayes’ theorem to determine the probability value, calculated as shown below:
$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)} \tag{1}$$
where,
P(X|C): likelihood
P(C|X): posterior probability
P(C): class probability
P(X): predictor probability
Fig. 2 Flowchart of our proposed sentiment analysis model
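To make Eq. (1) concrete, the following is a minimal multinomial Naïve Bayes sketch in plain Python. It is an illustration under stated assumptions rather than the classifier configuration used in this work: the toy training tweets are invented, Laplace (add-one) smoothing is an added choice, and P(X) is dropped because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect the counts needed for P(C) and P(x_i | C) from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, model):
    """Return the class maximizing log P(C) + sum_i log P(x_i | C)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior, log P(C)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Laplace smoothing keeps unseen words from zeroing the likelihood.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data, for illustration only.
docs = [
    ["booster", "safe", "effective"],
    ["booster", "great", "protection"],
    ["booster", "bad", "side", "effects"],
    ["terrible", "side", "effects"],
]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.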
4.2.2 K-Nearest Neighbor (KNN)
K-Nearest Neighbor is one of the simplest yet most widely used supervised machine learning algorithms. It is applied in text mining, pattern recognition, and many other fields. It classifies a sample according to the classes of its k most similar (nearest) neighbors. In our work, we have used the grid search technique to determine the value of k, and it was observed that k = 15 gave good results, as shown in Fig. 4b.
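The idea of majority voting among the k nearest neighbors, together with a simple grid search over k, can be sketched as follows. The two-dimensional feature vectors, the Euclidean distance, and the leave-one-out scoring are assumptions made for this example; the features and validation scheme used in this work are not specified here.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

def grid_search_k(X, y, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(X, y, k))

# Hypothetical features, e.g. (share of positive words, share of negative words).
X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.1, 0.9), (0.2, 0.8), (0.1, 0.7)]
y = ["positive", "positive", "positive", "negative", "negative", "negative"]
```

With three sentiment classes a vote can tie; this sketch simply takes the first of the tied labels, whereas a fuller implementation might weight votes by distance.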
Fig. 3 Proposed algorithm for twitter comments sentiment classification
Fig. 4 a Sentiment score analysis using NB, KNN, RNN and VADER algorithms, b Accuracy, Precision and Recall for different sentiment analyzer models
4.2.3 Recurrent Neural Network (RNN)
Recurrent Neural Networks are a type of neural network that remembers earlier data to make future predictions. They analyze sequential data more effectively than other machine learning models, as the latter use only the current data to make predictions. The RNN model has a memory that retains the relevant past information and forgets the irrelevant information. As the same parameters are used throughout the layers, the complexity of the model is reduced to a large extent. The RNN model is widely used in text mining, natural language processing, and many other tasks. The parameters used in our RNN model are shown in Table 2.
Table 2 Parameters used in our RNN model
Parameters                                   RNN
Layers used                                  4
Fully connected layers used                  2
Number of nodes in each layer                128, 64, 32, 16
Number of nodes in fully connected layers    16, 1
Optimization technique                       RMSprop
Activation function                          SoftMax
Epochs                                       100
Learning rate                                0.0001
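The memory and parameter-sharing properties described for the RNN can be seen in a minimal forward pass of a single simple (Elman) recurrent layer, sketched below in plain Python. This is not the four-layer network of Table 2 and it involves no training; the weight values and the two-dimensional inputs are arbitrary placeholders chosen for illustration.

```python
import math

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run one simple recurrent layer over a sequence and return the final
    hidden state, using h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    The same W_xh, W_hh and b_h are reused at every time step."""
    hidden_size = len(b_h)
    h = [0.0] * hidden_size                      # initial memory is empty
    for x in inputs:
        h = [
            math.tanh(
                sum(W_xh[j][i] * x[i] for i in range(len(x)))
                + sum(W_hh[j][k] * h[k] for k in range(hidden_size))
                + b_h[j]
            )
            for j in range(hidden_size)
        ]
    return h

# Arbitrary placeholder weights: 2 inputs, 2 hidden units.
W_xh = [[0.5, -0.2], [0.1, 0.3]]
W_hh = [[0.0, 0.1], [0.2, 0.0]]
b_h = [0.0, 0.0]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
final_state = rnn_forward(sequence, W_xh, W_hh, b_h)
```

Because each hidden state depends on the previous one, feeding the same vectors in a different order gives a different final state, which is how word order can influence the prediction.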
4.2.4 VADER
The Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment analysis tool is an unsupervised approach that is able to detect the polarity of the sentiment (positive, negative, or neutral) of a given text when the data are unlabeled. Therefore, our proposed VADER model is less expensive than existing supervised learning approaches. Conventional sentiment analyzer models must learn from labeled training data, which complicates the process. The VADER sentiment analyzer gets the job done without label creation. VADER uses a lexicon of sentiment-related words to determine the overall sentiment of a given body of text.
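How a lexicon can yield a polarity without any labeled training data is sketched below with a toy scorer. It is not the VADER implementation: the lexicon entries and their valence values are hypothetical, and negation is handled by a simple sign flip. The normalization of the summed valence into a score between -1 and 1, and the 0.05 thresholds, follow what is, to our understanding, the convention of the reference VADER implementation.

```python
import math

# Hypothetical mini-lexicon; these are NOT the real VADER entries or values.
LEXICON = {"good": 1.9, "great": 3.1, "safe": 1.5,
           "bad": -2.5, "terrible": -2.1, "painful": -1.8}
NEGATIONS = {"not", "no", "never"}

def compound_score(text, alpha=15.0):
    """Sum word valences (flipping the sign after a negation) and
    squash the total into the open interval (-1, 1)."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok.strip(".,!?"), 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence                   # simplified negation rule
        total += valence
    return total / math.sqrt(total * total + alpha)

def polarity(text, threshold=0.05):
    """Map the compound score to positive, negative, or neutral."""
    score = compound_score(text)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

The actual tool additionally applies rules for punctuation, capitalization, degree modifiers, and contrastive conjunctions, none of which are modeled in this sketch.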
4.3 Evaluation Metrics
This section describes the metrics used to evaluate the performance of our models.
The three evaluation metrics incorporated in our work are described below:
Accuracy: It is the fraction of all predictions that are correct.
Precision: It is the ratio of the number of correct positive results to the number of positive results predicted by the classifier.
Recall: It is the ratio of the number of correct positive results to the number of relevant (actually positive) samples.
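The three definitions above can be written directly as code. The sketch below computes precision and recall for one chosen class against all others; how the per-class values are averaged over the three sentiment classes is not stated in this work, so no averaging is attempted, and the label strings in the usage lines are placeholders.

```python
def evaluate(y_true, y_pred, positive_label):
    """Return (accuracy, precision, recall), treating positive_label as the
    class of interest and every other label as 'not positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive_label and p == positive_label for t, p in pairs)
    fp = sum(t != positive_label and p == positive_label for t, p in pairs)
    fn = sum(t == positive_label and p != positive_label for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # guard against no predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # guard against no actual positives
    return accuracy, precision, recall

# Placeholder labels for illustration.
y_true = ["pos", "pos", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "pos"]
accuracy, precision, recall = evaluate(y_true, y_pred, "pos")
```

In this toy case four of six predictions are correct, and two of the three tweets predicted positive (and two of the three truly positive tweets) are hits, so all three metrics equal 2/3.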
5 Simulation Results and Discussion
For the result analysis, we carried out the computation on data collected from social media (Twitter). The sentiment analysis tool incorporates word-order-sensitive relationships between terms, and the collected data are then processed to identify positive, negative, and neutral sentiments.
Fig. 5 Twitter polarity count on third booster dosage using VADER sentiment analysis
In Fig. 4a, a bar chart compares the sentiment scores obtained from social media (Twitter) responses by four machine learning algorithms, i.e., NB, KNN, RNN, and VADER. The performance of these models is evaluated using the well-known machine learning evaluation metrics, viz. accuracy, precision, and recall, as shown in Fig. 4b. The Twitter responses on the usage and significance of the third booster dosage for COVID-19 vaccination have been modeled by category using the four algorithms, among which the VADER sentiment analyzer shows the most accurate results.
The Twitter polarity count on the third booster dosage has also been compared among the top five superpower countries using VADER sentiment analysis, as shown in Fig. 5.
6 Conclusion and Future Scope
This paper records the results of analyzing Twitter responses from various countries on the usage and probable significance of a third booster dosage to combat COVID-19 infection. The World Health Organization wishes to speculate into the medicine market. The machine learning models, viz. NB, KNN, RNN, and VADER, were used to analyze the sentiments people expressed in tweets. In this comparative study, VADER sentiment analysis achieved 97% accuracy, 92% precision, and 95% recall. The results have shown a robust correlation among Twitter comments in sentiment polarity. Being an unsupervised learning approach, VADER yields a model that is less expensive than existing supervised learning approaches. The proposed approach can be used for other contagious infections if needed in the future.
References
1. Jansen BJ, Hang MZ, Sobeland K, Chowdury A (2009) Twitter power: Tweets as electronic
word of mouth. J Am Soc Inf Sci Technol 60(11):2169–2188
2. Kharde V, Sonawane P (2016) Sentiment analysis of twitter data: a survey of techniques. Int J
Comput Appl. ArXiv1601.06971
3. Selvaperumal P, Suruliandi A (2014) A short message classification algorithm for tweet
classification. Int Conf Recent Trends Inf Technol 1–3
4. Singh T, Kumari M (2016) Role of text pre-processing in twitter sentiment analysis. Procedia
Comput Sci 89:549–554
5. Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and
computerized text analysis methods. J Lang Soc Psychol 29(1):24–54
6. Gilbert CJ (2016) Vader: a parsimonious rule-based model for sentiment analysis of social
media text. In: Eighth international conference on weblogs and social media (ICWSM-14)
7. Gurkhe D, Pal N, Bhatia R (2014) Effective sentiment analysis of social media datasets using
Naïve Bayesian classification. Int J Comput Appl
8. Bouazizi M, Ohtsuki T (2018) Multi-class sentiment analysis in Twitter: what if classification
is not the answer. IEEE Access. 6:64486–64502
9. Gautam G, Yadav D (2014) Sentiment analysis of twitter data using machine learning
approaches and semantic analysis. In: 7th international conference on contemporary computing
10. Amolik A, Jivane N, Bhandari JM, Venkatesan M (2016) Twitter sentiment analysis of movie
reviews using machine learning techniques. Int J Eng Technol 7(6):1–7
11. Mukherjee S, Malu A, Balamurali AR, Bhattacharyya P (2013) TwiSent: a multistage system
for analyzing sentiment in Twitter. In: Proceedings of the 21st ACM international conference
on information and knowledge management
12. Davidov D, Sur O, Rappoport A (2010) Enhanced sentiment learning using Twitter hashtags
and smileys. In: Proceedings of the 23rd international conference on computational linguistics,
posters
13. Neethu M, Rajasree R (2013) Sentiment analysis in twitter using machine learning techniques.
In: 4th international conference on computing, communications and networking technologies,
IEEE, Tiruchengode, India
14. Garg P, Garg H, Ranga V (2017) Sentiment analysis of the Uri terror attack using Twitter. In:
International conference on computing, communication and automation, IEEE, Greater Noida,
India
15. Hasan A, Moin S, Karim A, Shamshirb S (2018) Machine learning based sentiment analysis
for Twitter accounts, Licensee, MDPI, Switzerland
16. Bhavsar H, Manglani R (2019) Sentiment analysis of Twitter data using python. Int Res J Eng
Technol 3(2):41–45
17. Sirsat S, Rao S, Wukkadada B (2019) Sentiment analysis on Twitter data for product evaluation.
IOSR J Eng 5(1):22–25
18. Behdenna S, Barigou F, Belalem G (2018) Document level sentiment analysis: a survey. In:
EAI endorsed transactions on context-aware systems and applications
19. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for
sentiment analysis. Comput Linguist J 267–307
20. Tong RM (2001) An operational system for detecting and tracking opinions in on-line
discussions. In: Working notes of the SIGIR workshop on operational text classification, pp
1–6
21. Turney P, Littman M (2003) Measuring praise and criticism: inference of semantic orientation
from association. ACM Trans Inform Syst J 21(4):315–346
22. Kaur L (2016) Review paper on Twitter sentiment analysis techniques. Int J Res Appl Sci Eng
Technol 4(1)
23. Miller GA, Beckwith R, Fellbaum C, Gross D, Miller KJ (1990) Introduction to WordNet: an
on-line lexical database. Int J Lexicogr 3(4):235–244
24. Mohammad S, Dunne C, Dorr B (2009) Generating high-coverage semantic orientation lexicons
from overly marked words and a thesaurus, In: Proceedings of the conference on empirical
methods in natural language processing
25. Harb A, Plantie M, Dray G, Roche M, Trousset F, Poncelet F (2008) Web opinion mining:
How to extract opinions from blogs? In: Proceedings of the 5th international conference on
soft computing as transdisciplinary science and technology (CSTST 08), pp 211–217
26. Turney PD (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised
classification of reviews. In: Proceedings of the 40th annual meeting on association for
computational linguistics, pp 417–424
27. Wang G, Araki K (2007) Modifying SOPMI for Japanese weblog opinion mining by using
a balancing factor and detecting neutral expressions. In: Human language technologies: the
conference of the North American chapter of the association for computational linguistics,
companion volume, pp 189–192
28. Rice DR, Zorn C (2013) Corpus-based dictionaries for sentiment analysis of specialized
vocabularies. In: Proceedings of NDATAD, pp 98–115
29. Su KY, Chiang TH, Chang JS (1996) An overview of corpus-based statistics oriented (CBSO)
Techniques for natural language processing. Comput Linguist Chin Lang Process 1(1):101–157
30. Su KY, Chiang TH (1990) Some key issues in designing MT systems. Mach Transl 5(4):265–300
31. Su KY, Chiang TH (1992) Why corpus-based statistics oriented machine translation. In:
Proceedings of 4th international conference on theoretical and methodological issue in machine
translation, Montreal, Canada, pp 249–262
32. Su KY, Chang JS, Una Hsu YL (1995) A corpus-based two-way design for parameterized MT
systems: rationale, architecture and training issues. In: Proceedings of the 6th international
conference on theoretical and methodological issues in machine translation, TMI-95, pp 334–353
33. Moss HE, Ostrin RK, Tyler LK, Marslen WD (1995) Accessing different types of lexical
semantic information: evidence from priming. J Exp Psychol Learn Mem Cogn 21(1):863–883
34. Natural Language Toolkit. http://www.nltk.org/. Last Accessed 20 Nov 2018
35. Bird S, Loper E, Klein E (2009) Natural language processing with python. O’Reilly Media Inc.
36. Gilbert CJ (2014) Vader: A parsimonious rule-based model for sentiment analysis of social
media text. In: 8th international conference on weblogs and social media
37. Elbagir S, Yang J (2019) Twitter sentiment analysis using natural language toolkit and VADER
sentiment. LNCS 232:342–347
38. Natural Language Processing with neural networks. http://nilc.icmc.usp.br/nlpnet/. Last
Accessed 05 Dec 2021
39. Fonseca ER, Rosa JLG (2013) A two-step convolutional neural network approach for semantic
role labelling. In: Proceedings of the 2013 international joint conference on neural networks-
2013, pp 2955–2961
40. Fonseca ER, Rosa JLG (2013) Mac-Morpho revisited: towards robust part-of-speech tagging.
In: Proceedings of the 9th Brazilian symposium in information and human language technology,
pp 98–107
41. Rahman SA, Al Otaibi FA, AlShehri WA (2019) Sentiment analysis of Twitter data. In: 2019
international conference on computer and information sciences, pp 1–4.
https://doi.org/10.1109/ICCISci.2019.8716464