Content available from Information Systems Frontiers
A Hybrid Approach of Machine Learning and Lexicons to Sentiment
Analysis: Enhanced Insights from Twitter Data of Natural Disasters
Shalak Mendon 1,2 & Pankaj Dutta 2 & Abhishek Behl 2 & Stefan Lessmann 3
Accepted: 7 January 2021
© Springer Science+Business Media, LLC, part of Springer Nature 2021
Abstract
The success of sentiment analysis lies in identifying the most frequent and relevant opinions among users relating to a
particular topic. In this paper, we develop a framework to analyze users’ sentiments on Twitter about natural disasters using data
pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches. We choose
TF-IDF and K-means for sentiment classification over affinitive and hierarchical clustering. We use Latent Dirichlet Allocation
and a pipeline of Doc2Vec and K-means to capture themes, and then perform multi-level polarity index classification and its
time series analysis. In our study, we draw insights from 243,746 tweets on Kerala’s 2018 natural disasters in India. The key
findings of the study are the classification of sentiments based on similarity and polarity indices and the identification of themes
among the topics discussed on Twitter. We observe different sets of emotions and influencers, among others. This case example
of the Kerala floods shows how the government and other organizations could track positive/negative sentiments with respect to
time and location, gain a better understanding of the topics of discussion trending among the public, and collaborate with crucial
Twitter users/influencers to spread information and identify gaps in the implementation of schemes in terms of design and execution.
The uniqueness of this research is the streamlined and efficient combination of algorithms and techniques embedded in the
framework used to achieve the above output, which can be integrated into a platform with a GUI for further automation.
Keywords Sentimental analysis · K-means clustering · Latent Dirichlet allocation · Machine learning · Twitter · Natural disasters
1 Introduction
Sentiment analysis using social media is an emerging and
rapidly growing segment in understanding people’s opinions
concerning day-to-day events (Zahra et al. 2020). Social media
websites like Twitter, Facebook, YouTube, and LinkedIn
have garnered billions of users worldwide and have been
growing at a rapid pace (Kapoor et al. 2018). Especially in
emerging countries with a high growth rate of internet
penetration, more and more people have adopted social media to
talk to one another, share their opinions, and listen to others’
views. The immediate transfer of data has proven to be
extremely useful in natural disasters (Liu and Xu 2018;
Bhuvana and Aram 2019).
Twitter, one of the social media websites, lets a user write
messages of at most 280 characters at a time.
These short messages help quickly convey information
among users (Tang et al. 2009; Vomfell et al. 2018).
Unlike lengthy articles and blogs written by one user, which
take time to analyze, Twitter messages are directly on point
and help explain the sentiments quickly. Tweets can be
analyzed based on hashtags, which are typically keywords used
by people, allowing all sentiments of people to be collated in one
place (Khan et al. 2014; Pandey et al. 2017). In this paper, we
concentrate on developing a framework for sentiment
analysis (Öztürk and Ayvaz 2018), which could be used for
multiple scenarios. Our study considers the Kerala floods,
which occurred in 2018 in India (Indian Express 2018).
People worldwide used several hashtags like
#KeralaFloods, #DoForKerala, #IndiaForKerala, and
#KeralaDonationChallenge, among others. These keywords
were generated at different points in time and helped in
understanding people’s sentiments at different times (Bandyopadhyay
et al. 2018).
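
As an illustration of how hashtags allow tweets to be collated in one place, the following minimal Python sketch groups tweets by the hashtags they contain. It is not the pipeline used in this study: the sample tweets are invented for the example, and the names SAMPLE_TWEETS, HASHTAG_PATTERN, and group_by_hashtag are hypothetical. Hashtags are lowercased on the assumption that #KeralaFloods and #keralafloods should fall into the same group.

```python
import re
from collections import defaultdict

# Hypothetical sample tweets, invented for illustration (not from the study's dataset).
SAMPLE_TWEETS = [
    "Relief camps need drinking water #KeralaFloods #DoForKerala",
    "Donated today, please join in #KeralaDonationChallenge",
    "Rescue teams are doing great work #keralafloods",
]

# A hashtag is taken to be '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def group_by_hashtag(tweets):
    """Map each lowercased hashtag to the list of tweets that contain it."""
    groups = defaultdict(list)
    for tweet in tweets:
        # A set ensures a tweet that repeats a hashtag is listed once for that tag.
        tags = {tag.lower() for tag in HASHTAG_PATTERN.findall(tweet)}
        for tag in tags:
            groups[tag].append(tweet)
    return dict(groups)


groups = group_by_hashtag(SAMPLE_TWEETS)
for tag in sorted(groups):
    print(tag, len(groups[tag]))
```

Grouping by lowercased tag is one possible design choice; a study that treats differently cased hashtags as distinct signals would omit the normalization step.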
* Pankaj Dutta
pdutta@iitb.ac.in
1 Wipro Limited, Electronic City, Bengaluru, Karnataka 560100, India
2 SJM School of Management, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
3 Chair of Information Systems, School of Business and Economics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
https://doi.org/10.1007/s10796-021-10107-x
Published online: 14 February 2021
Information Systems Frontiers (2021) 23:1145–1168