Journal of Statistics and Management Systems
ISSN: 0972-0510 (Print) 2169-0014 (Online) Journal homepage: https://www.tandfonline.com/loi/tsms20
Empirical evaluation of deep learning models for
sentiment analysis
Ajeet Ram Pathak, Manjusha Pandey & Siddharth Rautaray
To cite this article: Ajeet Ram Pathak, Manjusha Pandey & Siddharth Rautaray (2019) Empirical
evaluation of deep learning models for sentiment analysis, Journal of Statistics and Management
Systems, 22:4, 741-752, DOI: 10.1080/09720510.2019.1609554
To link to this article: https://doi.org/10.1080/09720510.2019.1609554
Published online: 25 Jun 2019.
Ajeet Ram Pathak *
Manjusha Pandey †
Siddharth Rautaray §
School of Computer Engineering
Kalinga Institute of Industrial Technology (KIIT) University
Bhubaneswar 751024
Odisha
India
Abstract
The availability of computing resources and the large-scale data generated
by Artificial Intelligence, Internet of Things and social media platforms have
led to a resurgence of deep learning technology. Deep learning architectures
have been successfully adopted to solve problems in a variety of domains such as
computer vision, information retrieval, robotics, and natural language
processing. Owing to the inherent ability of deep architectures to extract
hierarchical structures from complex multimedia data, they have been widely used
for classification, regression and prediction tasks. Motivated by the same, this
paper addresses the problem of identifying subjective information in text
documents and predicting sentiments at the sentence level using a deep
feedforward neural network with global average pooling and a long short term
memory model with dense layers. The experimental results indicate that both
models are on par and provide good accuracy on the benchmark dataset for
sentiment classification.
Subject Classification: (2010) 68M12
Keywords: Deep feedforward neural network, Deep learning, Long short term memory model,
Sentiment analysis, Social media analytics
*E-mail: ajeet.pathak44@gmail.com (Corresponding author)
†E-mail: manjushapandey82@gmail.com
§E-mail: sr.rgpv@gmail.com
1. Introduction
Sentiment analysis aims to determine the sentiments of a speaker as
“positive” or “negative” with reference to a certain event or subject [5]. It
is applicable in various fields. In business industries, it allows inferring
sentiments of customers about services or products, and helps to improve
services, launch new products, etc. Public sentiment and the reaction
of the public towards different campaigns and schemes implemented by the
Government play a crucial role in decision making in the domain of
politics. Recent developments in artificial intelligence (AI) systems for
recognizing and analyzing human emotions and sentiments have been driven
by the large volume of data available on social media platforms, cheaper
computing resources and emerging deep learning capabilities combined with
natural language processing and computer vision. Many sectors such as
business services, gaming, healthcare, retail and advertising have been
adopting sentiment analysis and emotion recognition software, and this
market is projected to reach $3.8 billion by 2025 [2]. However, identifying
and monitoring the contents of social sites on the Web and filtering this
information for sentiment analysis is a very challenging task due to the
diversity of sites, heterogeneous opinionated data (sentiments expressed via
text, pictures, emoticons), slang, unstructured data, and regional languages.
The data available on social media platforms are characterized as big data due
to their immense volume, the rate at which they are generated (velocity) and
their heterogeneity (variety) [10]. Manually finding the relevant sites holding
opinionated data, then extracting and predicting the sentiments, is infeasible
for human analysts. Therefore, automated systems for sentiment analysis are
needed. Due to the practical applications of sentiment analysis in various
fields, researchers have proposed numerous models that perform analysis at
multiple levels of text granularity (document-level, phrase-level,
sentence-level, aspect-level).
Existing research has focused on applying supervised and
unsupervised techniques for sentiment analysis. Early research papers
used supervised methods based on Support Vector Machines (SVM), Naïve
Bayes, Maximum Entropy, etc. On the other hand, techniques based on
sentiment lexicons, syntactic analysis, etc. have been used as unsupervised
methods.
The past decade witnessed the proliferation of deep learning as a powerful
technique in various application domains such as computer vision,
speech recognition and natural language processing. Motivated by the same,
this paper addresses the problem of sentence-level sentiment
classification by performing experiments using two deep architectures, viz.
a deep feedforward neural network with global average pooling and a long
short term memory (LSTM) model, on the benchmark dataset. Based on
the results obtained, it is observed that both models perform on par and
that significant results have been achieved on the benchmark dataset.
2. Related work
The problem of sentiment analysis from social media data has attracted
considerable interest from the research community over the past two decades.
Considering the scope of the paper, deep learning approaches for sentiment
analysis are discussed here. Motivated by the success of deep learning
in the domain of computer vision [9, 11-13], deep architectures have also
been adopted for natural language processing tasks [20-23].
To exploit linguistic resources for sentiment classification,
[14] proposed to model negation, sentiment and intensity words using a
linguistically regularized LSTM network. This sequence
model regularizes the difference between the sentiment distribution of the
current position and that of the forward or backward positions. By modeling
linguistic resources, this model works independently of parsing structures
and phrase-level annotations.
To improve the accuracy of sentiment analysis for Arabic data, an
ensemble model combining a convolutional neural network (CNN) and
LSTM has been put forth in [3]. In this model, a soft voting scheme is used
in which the predicted class probabilities are averaged across the CNN
and LSTM models, and the class with the highest average is chosen as the final
prediction of the ensemble model. [4] applied an LSTM network with the Global
Vectors for Word Representation (GloVe) model for sentiment analysis on the
SemEval-2017 dataset and predicted the sentiments on a five-point
ordinal scale with the classes strongly negative, negative, neutral,
positive and strongly positive.
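The soft voting scheme described for [3] can be illustrated with a minimal Python sketch. The function name `soft_vote` and the probability values below are hypothetical and chosen only for illustration; they are not taken from [3].

```python
def soft_vote(prob_a, prob_b):
    """Average two class-probability vectors and pick the class with the highest mean."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    averaged = [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]
    best_class = max(range(len(averaged)), key=averaged.__getitem__)
    return best_class, averaged

# Hypothetical class probabilities in the order [negative, positive].
cnn_probs = [0.40, 0.60]
lstm_probs = [0.70, 0.30]

label, avg = soft_vote(cnn_probs, lstm_probs)
print(label, avg)
```

In this toy case the CNN alone would predict the positive class, but the averaged probabilities favour the negative class, which is the behaviour that distinguishes soft voting from a simple majority vote over predicted labels.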
The performance of a conventional recurrent neural network (RNN)
and LSTM on sentiment classification accuracy has been compared
in [6]. Based on the results, it is claimed that LSTM works better than
RNN. For sentiment analysis of Hindi reviews, [15] proposed a deep
learning model based on CNN. They experimented with different settings
of CNN parameters by varying input size, regularization technique,
output dimension, dropout rate, epochs, activation function, etc. to obtain
optimal results. Rezaeinia et al. [16] proposed an improved word vector
model designed using a combination of a lexicon-based approach, a
parts-of-speech tagging approach, a word position technique and the
word2vec/GloVe method. For learning sentiment-specific word embeddings, [19]
described a method of integrating sentiment information of texts. They also
presented how to develop neural network models for fine-grained sentiment
analysis.
Some research papers have taken into account the influence of
factors such as data quality, structural information, domain-specific
factors and user behavior while performing sentiment analysis to
obtain improved results. Li et al. [7] assessed the effect of the textual
quality of reviews, based on review length, readability and word count, on
the performance of sentiment analysis by experimenting on a movie
review dataset using three deep learning models: simple RNN, LSTM and
CNN. They claimed that datasets with short reviews and high readability
give higher accuracy than datasets with longer reviews and low
readability. Along similar lines, the CNN-based approach proposed in [1]
incorporates information on user behavior (personality traits and
social activities) for sentiment analysis. The semi-supervised RNN-based
model proposed in [17] performs sentiment analysis by utilizing
structural information among reviews at different levels of granularity
such as words, phrases and sentences. For forecasting
sentiments in the domain of financial sentiment analysis, [18] proposed
an RNN-based approach which takes as input word vectors obtained from the
GloVe method. Their proposed model takes into account the effect of
market trends, propensity, etc. for sentiment forecasting.
3. Methodology
For sentence-level sentiment classification, we used two models based
on deep architectures, viz. a deep feedforward neural network model with
global average pooling and an LSTM model with dense layers. Initially,
the input reviews are preprocessed. As a first step, punctuation
marks are removed and words are converted to lower case. Then
tokenization is performed and a word index dictionary is created for all the
words in a review so that each review can be represented as an ordered
sequence of integers. In order to provide an input to the LSTM, all reviews
should have the same length. Therefore, padding is applied to maintain a
fixed length for each review. Labels are encoded as ‘1’ for a positive review
and ‘0’ for a negative review.
Figure 1 shows the configuration of the deep feedforward neural network model
with global average pooling. The model has one embedding layer, a global
average pooling layer and six dense layers. Fixed-length reviews are converted
into embedded vectors by the embedding layer. After the embedding layer,
global average pooling is applied [8] as shown in Figure 2. Global average
pooling acts as a structural regularizer and prevents overfitting. It
establishes correspondence between feature maps and the confidence maps of
categories, and generates one feature map for each corresponding category.
Figure 1
Deep feedforward neural network model with global average pooling
Figure 2
Global average pooling
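The preprocessing steps and the pooling operation described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names, the two sample sentences, the fixed length of 6, the use of id 0 with right-padding, and the toy 2-dimensional embedding table are all hypothetical and do not reflect the configuration used in the experiments.

```python
import string

def preprocess(review):
    """Lower-case the review, strip punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return review.lower().translate(table).split()

def build_word_index(token_lists):
    """Map each distinct word to an integer id; id 0 is reserved for padding."""
    word_index = {}
    for tokens in token_lists:
        for word in tokens:
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def encode_and_pad(tokens, word_index, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with 0 (assumed scheme)."""
    ids = [word_index[w] for w in tokens if w in word_index][:max_len]
    return ids + [0] * (max_len - len(ids))

def global_average_pool(embedded):
    """Average a (sequence_length x embedding_dim) list of vectors over the sequence axis."""
    length = len(embedded)
    dim = len(embedded[0])
    return [sum(vec[d] for vec in embedded) / length for d in range(dim)]

# Hypothetical sample reviews.
reviews = ["A great, moving film!", "Dull plot; bad acting."]
tokens = [preprocess(r) for r in reviews]
word_index = build_word_index(tokens)
padded = [encode_and_pad(t, word_index, max_len=6) for t in tokens]

# Toy embedding table (hypothetical values): row i is the 2-d vector for word id i.
embedding = [[0.0, 0.0]] + [[float(i), float(-i)] for i in range(1, len(word_index) + 1)]
embedded_first = [embedding[i] for i in padded[0]]

# Padding positions are included in the average here; no masking is applied.
pooled_first = global_average_pool(embedded_first)
print(padded)
print(pooled_first)
```

Whatever the review length, the pooled vector has as many entries as the embedding dimension, which is what allows the dense layers that follow to take a fixed-size input.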
All dense layers except the last one use the rectified linear unit (ReLU) as
the activation function. The equation of the ReLU activation function is given as
$$f(x) = \max(0, x) \tag{1}$$
where x is the input to the neuron. It is a non-linear activation function
whose output equals the input if the input is greater than 0, and is 0 otherwise.
As the final output is a prediction stating whether the review is
positive or negative, the output layer has a single unit which uses the sigmoid
activation function. Its equation is given as
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{2}$$
where x is the input to the neuron. This function squashes real-valued inputs
to the open interval (0, 1).
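Equations (1) and (2) can be written directly as small Python functions. This is a minimal sketch; the branch on the sign of x in `sigmoid` is an implementation choice made here to avoid overflow in `math.exp` for large negative inputs and is not part of the paper.

```python
import math

def relu(x):
    """Equation (1): returns x for positive inputs and 0 otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Equation (2), evaluated in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), sigmoid(x))
```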
Figure 3 shows the configuration of the LSTM model with dense layers.
LSTM models are good at handling long-term dependencies in sentences.
Such models process the data with the help of gate vectors and can
control the passing of information along the sequence. The inputs to the LSTM
at time t are $x_t$, $h_{t-1}$ and $c_{t-1}$.
For time t, the entries in the LSTM can be given as
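$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) && (3)\\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) && (4)\\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) && (5)\\
g_t &= \phi(W_{xc} x_t + W_{hc} h_{t-1} + b_c) && (6)\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t && (7)\\
h_t &= o_t \odot \phi(c_t) && (8)
\end{aligned}
$$
where $i_t$ is the input gate, $f_t$ is the forget gate, $c_t$ is the memory
cell unit, $o_t$ is the output gate, $\odot$ denotes element-wise
multiplication, and $h_t, f_t, i_t, g_t, c_t, o_t \in \mathbb{R}^N$.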
$$\phi(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{9}$$
is the hyperbolic tangent function which squashes its inputs to the open
interval (–1, 1).
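A single LSTM time step following equations (3) to (8) can be sketched in plain Python as below. The dictionary layout of the weights, the helper `make_params`, the constant weight value 0.1, and the toy input sequence are hypothetical choices for illustration; a trained model would use learned weights.

```python
import math

def sigmoid(x):
    """Logistic function from equation (2)."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(W_x, x, W_h, h, b):
    """Return W_x x + W_h h + b, with matrices stored as lists of rows."""
    return [
        sum(w * v for w, v in zip(row_x, x))
        + sum(w * v for w, v in zip(row_h, h))
        + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]

def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of equations (3)-(8); p maps names such as 'W_xi' to weights."""
    i_t = [sigmoid(v) for v in affine(p["W_xi"], x_t, p["W_hi"], h_prev, p["b_i"])]
    f_t = [sigmoid(v) for v in affine(p["W_xf"], x_t, p["W_hf"], h_prev, p["b_f"])]
    o_t = [sigmoid(v) for v in affine(p["W_xo"], x_t, p["W_ho"], h_prev, p["b_o"])]
    g_t = [math.tanh(v) for v in affine(p["W_xc"], x_t, p["W_hc"], h_prev, p["b_c"])]
    # Equation (7): element-wise mix of the old cell state and the candidate values.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Equation (8): the output gate scales the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

def make_params(n_in, n_hidden, value):
    """Hypothetical helper: fill every weight and bias with the same constant."""
    params = {}
    for gate in ("i", "f", "o", "c"):
        params["W_x" + gate] = [[value] * n_in for _ in range(n_hidden)]
        params["W_h" + gate] = [[value] * n_hidden for _ in range(n_hidden)]
        params["b_" + gate] = [value] * n_hidden
    return params

# Run three time steps over a toy sequence of 2-d inputs with N = 3 hidden units.
params = make_params(n_in=2, n_hidden=3, value=0.1)
h, c = [0.0] * 3, [0.0] * 3
for x_t in ([1.0, 0.5], [0.2, -0.3], [0.0, 1.0]):
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

The same `h` and `c` are fed back in at every step, which is how information from earlier words can influence the hidden state at later positions in the sentence.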
The integers obtained by encoding the fixed-length reviews are converted
into embedded vectors and passed to the LSTM layers recursively, followed
by dense layers. The output layer predicts the sentiment associated with
sentences using the sigmoid activation function given in equation (2).
Figure 4 depicts the workflow of modules in the LSTM-based deep architecture
for sentiment analysis.
Figure 3
Long short term memory model with dense layers
Figure 4
Sentiment analysis using LSTM model
4. Experimentation details and results
The models have been evaluated on the IMDb Movie Review dataset,
which is a standard benchmark dataset for sentiment classification. This
dataset contains 50,000 reviews which are evenly split into 25,000 training
and 25,000 test reviews. The experiments were performed on Google
Compute Engine, and the implementation was done in Python using the TensorFlow
environment with the Keras API. We trained both models, the deep feedforward
neural network with global average pooling and the LSTM model with dense
layers, using the binary cross-entropy loss function L and the Adam optimizer.
The loss function L is given in equation (10).
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \tag{10}$$
where $y_i$ denotes the actual label (‘1’ for positive sentiment and ‘0’ for
negative sentiment), $\hat{y}_i$ denotes the predicted probability of the
positive label, and N denotes the number of samples.
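As a worked illustration of equation (10), the sketch below evaluates the loss on the seven sample reviews of Table 1, using the actual labels and the prediction scores listed there for each model. The clipping constant `eps` is an assumption added here to keep the logarithm finite; it is not part of equation (10).

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Equation (10): mean negative log-likelihood of the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # assumed clipping to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Actual labels and prediction scores for the 7 sample reviews in Table 1.
actual = [0, 1, 1, 0, 0, 0, 0]
ff_scores = [0.24896, 0.59615, 0.79539, 0.33660, 0.38660, 0.48448, 0.38755]
lstm_scores = [0.34446, 0.55112, 0.69589, 0.43456, 0.48632, 0.46579, 0.42122]

ff_loss = binary_cross_entropy(actual, ff_scores)
lstm_loss = binary_cross_entropy(actual, lstm_scores)
print(round(ff_loss, 4), round(lstm_loss, 4))
```

On these seven samples the feedforward scores lie further from 0.5 on the correct side than the LSTM scores, so the feedforward loss is the smaller of the two even though both models predict every label correctly.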
For both models, we set the maximum word size to 256, the number of
epochs to 10 and the batch size to 512. Initially, we tested both models on
sample reviews. Table 1 shows the prediction scores obtained by the deep
feedforward neural network and the LSTM model with dense layers. Out of the 7
reviews, review numbers 2 and 3 are positive and the remaining are
negative. Based on the probability scores, both the deep feedforward
neural network and the LSTM model correctly predicted the labels for all
7 reviews. We trained and tested the deep feedforward neural network for
in-domain sentiment analysis on the IMDb dataset. The validation
accuracy and loss for this model are 88% and 30%, as depicted in Figures
5 and 6 respectively. To check the effectiveness of the LSTM model for
cross-domain sentiment analysis, we trained the LSTM model on the IMDb dataset
and tested it on a Restaurants reviews dataset. For the LSTM model, an
accuracy of 78% and a loss of 29% have been achieved, as shown in Figures 7
and 8 respectively. As the LSTM model was tested on a dataset other than
the one it was trained on, there is a difference between the accuracies of the
deep feedforward neural network and the LSTM model.
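The mapping from prediction scores to predicted labels for the sample reviews can be reproduced with a short Python sketch. The decision threshold of 0.5 is an assumption; it is not stated explicitly in the text, but it is consistent with every score and predicted label reported in Table 1.

```python
def predict_labels(scores, threshold=0.5):
    """Assign label 1 when the score reaches the (assumed) threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the actual label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Actual labels and prediction scores for the 7 sample reviews in Table 1.
actual = [0, 1, 1, 0, 0, 0, 0]
ff_scores = [0.24896, 0.59615, 0.79539, 0.33660, 0.38660, 0.48448, 0.38755]
lstm_scores = [0.34446, 0.55112, 0.69589, 0.43456, 0.48632, 0.46579, 0.42122]

ff_labels = predict_labels(ff_scores)
lstm_labels = predict_labels(lstm_scores)
print(ff_labels, accuracy(actual, ff_labels))
print(lstm_labels, accuracy(actual, lstm_labels))
```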
Table 1
Prediction scores by the deep architectures
Review  Deep feedforward neural network    LSTM model with dense layers       Actual
Number  Prediction score  Predicted label  Prediction score  Predicted label  label
1       0.24896           0                0.34446           0                0
2       0.59615           1                0.55112           1                1
3       0.79539           1                0.69589           1                1
4       0.33660           0                0.43456           0                0
5       0.38660           0                0.48632           0                0
6       0.48448           0                0.46579           0                0
7       0.38755           0                0.42122           0                0
Figure 5
Validation accuracy per epoch for deep feedforward neural network with
global average pooling
Figure 6
Validation loss per epoch for deep feedforward neural network with global
average pooling
Figure 7
Validation accuracy per epoch for LSTM model with dense layers
Figure 8
Validation loss per epoch for LSTM model with dense layers
5. Conclusion
Sentiment analysis involves automatically identifying and extracting
subjective information and predicting the sentiment of the given
subject represented in text documents. In this paper, we experimented
with two models based on deep architectures for performing sentiment
analysis.
Based on the results, it can be observed that both models achieved
significant accuracy on par for sentiment classification of reviews from
the IMDb dataset. We also achieved an accuracy of 78% using the LSTM model
for cross-domain sentiment analysis. As future work, we aim to design an
ensemble deep learning model for sentiment prediction over large-scale
data.
References
[1] Alharbi, A. S. M., & de Doncker, E. Twitter sentiment analysis with a deep neural network: An enhanced approach using user behavioral information. Cognitive Systems Research, 54, 50–61 (2019).
[2] Emotion Recognition and Sentiment Analysis Market, https://www.tractica.com/newsroom/press-releases/emotion-recognition-and-sentiment-analysis-market-to-reach-3-8-billion-by-2025/
[3] Heikal, M., Torki, M., & El-Makky, N. Sentiment Analysis of Arabic Tweets using Deep Learning. Procedia Computer Science, 142, 114–122 (2018).
[4] Karpov, N., Lyashuk, A., & Vizgunov, A. Sentiment Analysis Using Deep Learning. In International Conference on Network Analysis (pp. 281–288) (2016).
[5] Agarwal, B., & Mittal, N. Semantic Feature Clustering for Sentiment Analysis of English Reviews. IETE Journal of Research, 60(6), 414–422 (2014).
[6] Li, D., & Qian, J. Text sentiment analysis based on long short-term memory. In Computer Communication and the Internet (ICCCI), 2016 IEEE International Conference on (pp. 471–475) (2016).
[7] Li, L., Goh, T.-T., & Jin, D. How textual quality of online reviews affect classification performance: a case of deep learning sentiment analysis. Neural Computing and Applications, 1–29 (2018).
[8] Lin, M., Chen, Q., & Yan, S. Network in network. arXiv Preprint arXiv:1312.4400 (2013).
[9] Pathak, A. R., Pandey, M., & Rautaray, S. Application of Deep Learning for Object Detection. Procedia Computer Science, 132, 1706–1717 (2018).
[10] Pathak, A. R., Pandey, M., & Rautaray, S. Construing the big data based on taxonomy, analytics and approaches. Iran Journal of Computer Science, 1(4), 237–259 (2018).
[11] Pathak, A. R., Pandey, M., & Rautaray, S. Deep Learning Approaches for Detecting Objects from Images: A Review. In P. K. Pattnaik, S. S. Rautaray, H. Das, & J. Nayak (Eds.), Progress in Computing, Analytics and Networking (pp. 491–499). Singapore: Springer Singapore (2018).
[12] Pathak, A. R., Pandey, M., Rautaray, S., & Pawar, K. Assessment of Object Detection Using Deep Convolutional Neural Networks. In S. Bhalla, V. Bhateja, A. A. Chandavale, A. S. Hiwale, & S. C. Satapathy (Eds.), Intelligent Computing and Information and Communication (pp. 457–466). Singapore: Springer Singapore (2018).
[13] Pawar, K., & Attar, V. Deep learning approaches for video-based anomalous activity detection. World Wide Web. https://doi.org/10.1007/s11280-018-0582-1 (2018).
[14] Qian, Q., Huang, M., Lei, J., & Zhu, X. Linguistically Regularized LSTM for Sentiment Classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (pp. 1679–1689) (2017).
[15] Rani, S., & Kumar, P. Deep Learning Based Sentiment Analysis Using Convolution Neural Network. Arabian Journal for Science and Engineering, 1–10 (2018).
[16] Rezaeinia, S. M., Rahmani, R., Ghodsi, A., & Veisi, H. Sentiment analysis based on improved pre-trained word embeddings. Expert Systems with Applications, 117, 139–147 (2019).
[17] Rong, W., Peng, B., Ouyang, Y., Li, C., & Xiong, Z. Structural information aware deep semi-supervised recurrent neural network for sentiment analysis. Frontiers of Computer Science, 9(2), 171–184 (2015).
[18] Souma, W., Vodenska, I., & Aoyama, H. Enhanced news sentiment analysis using deep learning methods. Journal of Computational Social Science, 1–14 (2019).
[19] Tang, D., & Zhang, M. Deep Learning in Sentiment Analysis. In Deep Learning in Natural Language Processing (pp. 219–253). Springer (2018).
[20] Jain, G., Sharma, M., & Agarwal, B. Spam Detection in Social Media using Convolutional and Long Short Term Memory Neural Network. Annals of Mathematics and Artificial Intelligence, 85(1), 21–44 (2019).
[21] Agarwal, B., Ramampiaro, H., Langseth, H., & Ruocco, M. A Deep Network Model for Paraphrase Detection in Short Text Messages. Information Processing and Management, 54(6), 922–937 (2018).
[22] Ram, S., Gupta, S., & Agarwal, B. Devanagri Character Recognition Model Using Deep Convolution Neural Network. Journal of Statistics and Management Systems, 21(4), 593–599 (2018).
[23] Seth, S., & Agarwal, B. Diabetic detection using Convolutional Neural Network. Journal of Statistics and Management Systems, 21(4), 569–574 (2018).