Co-LSTM: Convolutional LSTM Model for Sentiment Analysis in Social Big Data
Ranjan Kumar Behera1, Monalisa Jena2, Santanu Kumar Rath3, Sanjay Misra4
1,3Department of Computer Science & Engineering, National Institute of Technology, Rourkela, India, 769008
2Department of Information and Communication Technology, F. M. University Balasore, Odisha, India
4Department of Electrical and Information Engineering, Covenant University, Ota 1023, Nigeria
4Department of Computer Engineering, Atilim University, Ankara Turkey
jranjanb.19@gmail.com1, bmonalisa.26@gmail.com2, skrath@nitrkl.ac.in3, sanjay.misra@covenantuniversity.edu.ng4
Abstract
Analysis of consumer reviews posted on social media is essential for several business applications.
Consumer reviews posted on social media are increasing at an exponential rate in both number
and relevance, which leads to big data. In this paper, a hybrid approach of two deep learning architectures,
namely Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) (RNN with memory),
is suggested for sentiment classification of reviews posted in diverse domains. Deep convolutional networks
have been highly effective in local feature selection, while recurrent networks (LSTM) often yield good results
in the sequential analysis of long text. The proposed Co-LSTM model is mainly aimed at two objectives
in sentiment analysis. First, it is highly adaptable in examining big social data, keeping scalability in mind,
and second, unlike conventional machine learning approaches, it is not tied to any particular domain.
The experiment has been carried out on four review datasets from diverse domains to train a model that
can handle all kinds of dependencies that usually arise in a post. The experimental results show that the
proposed ensemble model outperforms other machine learning approaches in terms of accuracy and other
parameters.
Keywords: Deep Learning; Big Data; Sentiment Analysis; Word Embedding; RNN; CNN; LSTM
1. Introduction
Social media provides an extraordinary platform for big data analytics in various real-world applications.
A massive amount of data is continuously generated as users post their views or opinions while
communicating with each other through social platforms such as Twitter, Facebook, and Myspace. Social
data, generated from various social channels, is one form of big data and possesses all three big data
characteristics: velocity, heterogeneity, and large volume. Apart from these, it possesses a unique
characteristic known as semantics, which refers to the fact that it is generated manually and contains
symbolic information having inherent subjective meaning. This unique characteristic of social big data
leads to several challenges and opportunities for sentiment analysis. Sentiment analysis (SA) has been an
emerging research direction since the early 2000s. Various terms like opinion mining, sentiment
classification, review mining,
Preprint submitted to Journal of Information Processing and Management November 12, 2020
sentiment mining, and opinion extraction are also used for sentiment analysis. It is a way of predicting attitudes
towards products or social entities from sentiments. The source material for sentiment analysis varies
from textual to visual representations. The sentiments expressed in social media are a valuable source for
modeling business strategies to achieve business goals. Sentiment analysis is often used for managing the
online reputation of a specific product or brand. However, as the amount of data in social media repositories
is increasing at an exponential rate, traditional algorithms often fail to extract the sentiments from such big data.
Affective computing is one of the emerging research applications of sentiment analysis, which is able to capture
public sentiment automatically from social media posts [1]. Sentiment analysis can be treated as a
classification task, as it classifies the orientation of a text as positive, negative, or neutral. The
widely adopted approaches to big data sentiment analysis of unstructured data can be categorized
into lexicon-based, linguistic-based, and machine-learning-based approaches.
The classification task involved in SA can be categorized into four different areas: subjectivity
classification, word sentiment classification, document sentiment classification, and opinion extraction.
Subjectivity classification intends to classify sentences as subjective or objective. A subjective
sentence expresses an opinion about a topic or subject, whereas an objective
sentence conveys factual information. Subjectivity classification is one of the sub-categories of
sentence-level sentiment analysis. In document-level classification, the whole document is treated as a
unit for sentiment analysis. The techniques involved in sentence-level sentiment analysis are not different
from those of document-level SA, as sentences can be treated as mini-documents. Word sentiment classification
determines the polarity of the sentiment associated with a particular word.
Sentiment analysis can be categorized based on the dataset used for processing. The major sources of data
are public reviews associated with a product, organization, movie, or any other social entity. These
reviews are important to business analytics, as they play a vital role in decisions about products.
Sentiment analysis is not only applied to product reviews but can also be applied to stock prediction, movie
reviews, news articles, or political debates. For example, in political debates, it may be desired to figure out
the opinions of voters on certain electoral candidates or political parties. The election results may be heavily
influenced by predicting the sentiment of users from their political posts. Various micro-blogging and social
network websites are rich sources of information, as people freely post their opinions and thoughts for
discussion about a certain topic, which can be used as a valuable resource in sentiment analysis. In this
paper, social media reviews from various domains such as airlines, movies, self-driving cars, and election data are
considered for modeling the architecture for sentiment analysis. Severyn and Moschitti [2] have shown in
their paper that traditional machine learning algorithms perform well on classification and regression tasks
for small datasets. However, deep networks are more suitable for processing big social media data, especially
in the area of text classification. Wang et al. [3] and Yin et al. [4] observed that both feature detection and
dependency capturing for long sentences are necessary to accurately classify a sentence.
The major contributions of this paper can be stated as below:
In this paper, an effort has been made to develop an effective deep neural network architecture for
sentiment analysis which can process big social data in a scalable manner without compromising
on performance. To this end, the functionality of both CNN and RNN is leveraged
in the proposed Co-LSTM model for sentiment classification. The CNN model is mainly used for
deep learning, which automatically extracts features from the big social data without the manual
intervention required by traditional machine learning models. It is able to tune the
hyper-parameters of the classifier model automatically, which makes the model scalable enough to handle big data.
In the proposed approach, the CNN model is used for better feature extraction through a pooling
process and the LSTM is adopted for capturing the long-term dependency among words in a sentence.
The second contribution is to develop a sentiment analysis model that is domain-independent.
To this end, we have trained the deep learning architecture using reviews from four different
domains where almost all kinds of word dependencies exist and evaluated the performance separately
for each dataset. Reviews from domains such as movies, airlines, self-driving cars, and presidential
election data have been considered to develop a generalized classifier which does not need any
domain-specific knowledge.
The remaining sections of the paper are organized as follows. In section 2, the motivation for the
hybridization of CNN and LSTM is discussed. Section 3 presents the literature survey on techniques
involved in sentiment analysis. The methodology adopted in the study is presented in section 4. The step-by-step
process of the proposed algorithm is discussed in section 5. Evaluation parameters for the algorithm are
discussed in section 6. The implementation and results are discussed in section 7. In section 8,
the conclusion and future work for the paper are presented. Section 9 points out a few
threats to the validity of the work.
2. Motivation
The motivation for the research work may be described as follows:
In the present world of digitization, social media available on the web is a big source of customer
interactions and reviews. Sentiment analysis of such a huge amount of data helps to identify and track
customer behavior about products, services, or brands [5]. Customer feedback is essential
in the decision-making process. For example, customer reviews about an e-commerce product can
help a new user to decide on the product before buying it. The same approach is also applicable to
movie reviews, as they help users decide which movie to watch. For businesses, one can study
the sentiments about a specific product or brand in specific demographic areas to identify potential
customers or the business potential of a new product or service in that area. Thus sentiment
analysis helps to enhance the business of an enterprise. Likewise, there are several applications of SA
which are helpful in our day-to-day activities [6].
Opinion mining or sentiment analysis in social media has some major hurdles associated with it.
One of the biggest challenges is the authentication of the end-user, where there is the possibility of
noise being incorporated in the data acquired. Another major hurdle is inconsistency in social media
data. The expression of sentiments and wording styles vary from person to person. People sometimes
use shorthand notations, which make it difficult for the classifier to properly distinguish between word
features. For example, a word like ‘us’ can stand for ‘United States of America’ as well as a pronoun,
so the classifier might confuse ‘us’ as a pronoun with ‘us’ as a name or noun. Generally,
proper grammar and spelling conventions are often not followed while writing reviews on social media.
Sometimes people use acronyms that make the analysis more complicated. Social media sentiment
analysis poses several challenges in handling noise such as special characters and informal words. Apart
from that, the data also contains sentences which involve sarcasm, different kinds of negation statements,
ambiguous words, multi-polarity words, etc. Some solutions for handling sarcasm in sentiment
analysis are described in the work presented by Maynard et al. [7]. The other major challenges are the
cleaning and preprocessing of the sheer volume of data based on the context of reviews. Domain-specific
knowledge is thus needed for feature engineering of the data and for making a proper
transformation in the preprocessing phase, which is a cumbersome task. We were also motivated by
a few papers on domain-independent sentiment analysis, authored by Biyani et al. [8]
and Bagheri et al. [9].
Social big data is a potential resource for sentiment analysis as it involves human sentiments
on a specific topic or product. It involves a lot of sarcasm and dependencies which need to be exploited
for predicting sentiments accurately. It also consists of short texts and proverbs, where the actual sentiment
is quite challenging to predict. A number of statistical learning approaches already exist for sentiment
analysis. However, their performance highly depends on the quality of the features extracted from the review.
They usually require expertise in feature engineering, and they are also expensive in terms of computational
time and space. A neural network can reduce the burden of proper feature engineering. CNN is
able to exploit parallelism in extracting the local correlations and patterns from the text, as the
computation at a time step doesn’t depend on the computation at the previous time step. In this
paper, we have adopted CNN for better feature engineering for the big social review data. However,
it may not be suitable for capturing the contextual information from a given review as it doesn’t
remember the past context. We have therefore adopted LSTM, which is well suited to capturing temporal
contextual information and the dependencies of words inside the reviews.
It is mainly used for sequence prediction.
Some other architectures, such as the simple multi-layer perceptron (MLP) and the probabilistic neural
network, can be used for feature extraction, but they are not suitable for processing a large set of big
social review data. These architectures are not suited to capturing sequential dependencies, which are
essential for sentiment classification. In the second phase, a simple RNN can be used for
classification, but it suffers from the vanishing gradient problem, which makes it quite difficult to train
on problems that require long-term temporal dependencies. This has motivated us to use CNN
in the first phase and LSTM in the second phase.
3. Related Work
In sentiment analysis, the given text or review is analyzed to capture the prevalent emotional
opinion within that text and to identify the reviewer's attitude as positive, negative, or neutral. Technically, it
is the process of extracting the sentiment orientation of a text unit by using Natural Language Processing
(NLP), statistics, or machine learning methods. Sentiment analysis plays a crucial role in social media
monitoring, as it captures the public opinion about certain topics. Some of the pioneering works related to
sentiment analysis are presented below:
Dos Santos and Gatti [10] have proposed an efficient CNN model that exploits character-level to sentence-level
information to classify the emotional level of short text. Their model consists of
two CNN layers, which they named the character Convolutional Neural Network (ChCNN). Zhou et al. [11]
have proposed a bi-directional LSTM model for sentiment analysis in which a two-dimensional pooling
layer has been adopted. They experimented on the Stanford Sentiment Treebank (SST) database, which
resulted in 88.7% accuracy. Ma et al. [12] have proposed an extension of the LSTM model termed Sentic
LSTM for targeted aspect-based sentiment analysis. Their work is mainly concerned with combining the tasks
of target-dependent aspect detection and aspect-based polarity classification. In another work, they have
embedded commonsense knowledge in the recurrent encoder for targeted sentiment analysis [13]. Their
model is a hybridization of the attention architecture and Sentic LSTM. Wang et al. [14] have proposed a
hybrid version of CNN and RNN for opinion analysis of sentences. As the CNN model is independent of
the location of a word in the sentence, the two models are arranged in layers, i.e., the
output of the CNN is fed as input to the RNN model. Rao et al. [15] have proposed document-level
sentiment analysis which captures the semantic relationships between the sentences in the document. They
have proposed SSR-LSTM and SR-LSTM, which are based on deep recurrent neural networks.
Hussain et al. [16] have shown the potential of a semi-supervised model which hybridizes
random projection scaling and support vector machines to perform reasoning on big social media data. Their
model seems to be quite suitable for extracting the semantic information from emoticon representation and polarity
identification in knowledge based on big social data. Cambria et al. [17] have developed a three-level
representation for sentiment analysis termed SenticNet 5, which is able to discover conceptual primitives
automatically and in which commonsense knowledge is embedded. An ensemble of top-down and bottom-up
learning has been embedded in SenticNet 6, which is based on symbolic and subsymbolic AI [18]. They have
trained their model using the WordNet-Affect emoticon list, which is freely available on the Internet.
Sentiment analysis of customer reviews is based on a procedure which may be called a dichotomous
one. The procedures followed in it can be categorized into three types: supervised methods,
lexicon-based methods, and semantic-based methods. These are described as follows:
3.1. Supervised methods
In supervised methods, sentiments of reviews are predicted based on the labelled sentiments associated
with the available review data [19]. The overall procedure is to predict sentiments with a classification
model built using different machine learning techniques, which is trained on the available data after
proper feature engineering. Qiang et al. [20] have presented a comparison of different supervised
machine learning techniques for sentiment classification of travel destination reviews in the USA. There are
different techniques available to carry out feature engineering and data transformations, such as n-grams [21],
Part-Of-Speech (POS) tagging [19], methods based on semantic patterns [22], and word-based
semantic concepts [23].
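As an illustration of the n-gram features mentioned above, the following is a minimal sketch that extracts word n-grams from a review. The function name `word_ngrams` and the whitespace tokenization are assumptions made for this example and are not taken from the cited works.

```python
def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a whitespace-tokenized text."""
    tokens = text.lower().split()
    # Slide a window of n consecutive tokens over the review.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


review = "the flight was not good"
unigrams = word_ngrams(review, 1)
bigrams = word_ngrams(review, 2)
print(bigrams)
```

Bigrams such as ("not", "good") keep a negation together with the word it modifies, which a unigram representation would separate.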
The major limitation of supervised methods is that they are domain-specific, i.e., classifier models
trained on restaurant reviews may not work well on movie or product reviews [24]. Another limitation
is that the classifier needs a large amount of training data to cover all possible cases. Araque et
al. [25] have proposed deep learning-based ensemble techniques for classifying sentiment in social
applications. In their work, they hybridized surface classifiers with linear machine learning algorithms. The
feature processing has been carried out by combining deep and surface features from different domains.
3.2. Lexicon based methods
Lexicon-based methods use the sentiment orientation of words or phrases existing in a review to evaluate
the overall sentiment score. Based on the obtained sentiment score, the review is termed either positive or
negative. Hence, lexicon-based methods are based on counting sentiment lexicons rather than training
on data. The model will be more effective if the lexicon dictionary covers a larger number of words.
There exist various pre-built dictionaries of terms and associated sentiment orientations, such as SentiWordNet
[26], the MPQA subjectivity lexicon [27], and the LIWC lexicon [28]. The major disadvantage of this approach
is the cost of looking up the sentiment orientation of each word in the dictionary. Also,
the sentiment orientation of a word may vary from domain to domain. This problem can be tackled if the
sentiment orientation of a word is considered with respect to the semantics of its context [29]. But in
most lexicon-based approaches, the presence of syntactical features or words is taken to
reflect the sentiment explicitly, independent of the context in the document. Deep learning has been a popular trend
in sentence-level sentiment analysis. Yoon et al. [30] proposed a multi-channel lexicon-based model which
hybridizes CNN with bidirectional LSTM for sentiment classification. The performance of their model is
based on a set of rules extracted from the sentiment orientation of the lexicon present in the context, which is
domain-dependent. Their work uses a multi-channel embedding layer based on the Word2Vec
model. In this paper, the proposed hybridized model is domain-independent, and a training-based
approach is adopted for sentiment analysis rather than a lexicon-based approach. In this paper, a
normal LSTM model is adopted instead of a bidirectional LSTM, as bidirectional LSTMs are
more complex and need huge computational power. They also need to scan the entire review text to capture
the context dependency, which makes them computationally inefficient when processing huge volumes of social
media reviews.
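The lexicon-based scoring described at the start of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: `TOY_LEXICON` is a hypothetical hand-made dictionary (a real system would load a resource such as SentiWordNet), and treating a score of zero as positive is an arbitrary choice made for the example.

```python
# Hypothetical toy lexicon mapping words to sentiment orientations.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0, "delay": -0.5}


def lexicon_score(review, lexicon):
    """Sum the sentiment orientation of every known word in the review."""
    tokens = review.lower().split()
    return sum(lexicon.get(token, 0.0) for token in tokens)


def lexicon_label(review, lexicon):
    """Label the review by the sign of its score (ties count as positive here)."""
    return "positive" if lexicon_score(review, lexicon) >= 0 else "negative"


print(lexicon_label("great crew but terrible delay", TOY_LEXICON))
print(lexicon_score("not good", TOY_LEXICON))
```

The second call returns a positive score for "not good", since the lookup ignores the negation. This is the context-independence limitation discussed in this subsection.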
3.3. Semantic based methods
Various types of semantic-based sentiment approaches have been proposed by several authors, which
can broadly be classified as conceptual semantic and contextual semantic [31]. In the case of contextual
semantics, also known as statistical semantics, co-occurrence patterns of words in the text are used to
evaluate the semantics [32]. In conceptual semantics, external semantic knowledge bases such as semantic
networks are used with natural language processing to conceptually represent the words that convey
sentiments. SenticNet [33] is an example of a conceptual lexicon for sentiment analysis. Although conceptual
semantic approaches have outperformed the contextual approaches in many cases, they are limited to their
knowledge base domain. Wei et al. [34] have proposed a semantic approach for clustering words in a text based
on lexical chains and the WordNet model. In their work, WordNet is integrated with lexical chains to exploit
the ontological structure for capturing the semantic relationship between the words in a cluster.
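The co-occurrence patterns used by contextual (statistical) semantics can be counted with a few lines of code. The sketch below is only illustrative: the function name, the window size of two words, and the two-sentence corpus are assumptions made for this example.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count unordered word pairs that appear within `window` words of each other."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                if word != tokens[j]:  # skip pairs of a word with itself
                    counts[tuple(sorted((word, tokens[j])))] += 1
    return counts


corpus = ["great movie great cast", "boring movie"]
counts = cooccurrence_counts(corpus, window=2)
print(counts[("great", "movie")])
```

Words that frequently share neighbors in such a table are treated as semantically related, without consulting any external knowledge base.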
3.4. Research questions
In this paper, the following research challenges have been identified, and an effort has been made to resolve
them using deep learning algorithms.
RQ1: Review data in social media often consists of noisy elements like incorrect spellings, grammatical
errors, product ids, and hyperlinks. Sometimes reviews are rich with emoticons, which make the task of
sentiment analysis more difficult. Emoticons are not natural-language text. They are textual symbols consisting
of various characters representing a specific smiley face. Each of them is associated with some kind of
emotion (happy, sad, irritated, etc.). Handling emoticons is found to be challenging compared to handling
text which represents emotions. Sailunaz et al. [35] have presented a model for sentiment analysis that can
classify sentences based on emoticons associated with the text. Emoticons are essential elements of
short texts or small reviews. Is the classifier model able to handle noisy data and emoticons?
To address this issue, proper feature engineering is desirable before training the classifier models. A
huge text corpus has been used as a reference to identify incorrect spellings in the reviews. Google-1 (Billion
Word Corpus) [36] has been used to handle misspelled words. Each misspelled word is replaced with a word
available in the text corpus at a distance of one or two letters. All numerical digits have been replaced with
the newly introduced word “digit”. Hyperlinks are filtered out using a regular expression. Emojis are
handled through the package known as emoji-sentiment-lexicon, which replaces the emoticons present in
the review text. The LSTM architecture in the proposed model is able to capture the context in which emoticons
are used in the reviews.
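The cleaning steps described above (hyperlink removal, digit replacement, and spelling correction within a distance of one or two letters) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the regular expressions, the function names, and the tiny `VOCAB` set standing in for the reference corpus are all assumptions, and emoticon handling is omitted.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")  # assumed hyperlink pattern
NUMBER_PATTERN = re.compile(r"\d+")


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def correct_word(word, vocabulary, max_distance=2):
    """Replace an unknown word by the closest vocabulary word within max_distance."""
    if word in vocabulary:
        return word
    best = min(sorted(vocabulary), key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_distance else word


def clean_review(text, vocabulary):
    text = URL_PATTERN.sub(" ", text.lower())   # filter out hyperlinks
    text = NUMBER_PATTERN.sub(" digit ", text)  # replace numbers with "digit"
    tokens = re.findall(r"[a-z]+", text)        # drop special characters
    return [correct_word(token, vocabulary) for token in tokens]


# Hypothetical vocabulary standing in for a large reference corpus.
VOCAB = {"flight", "delayed", "digit", "hours", "the", "was"}
cleaned = clean_review("The flihgt was delayd 3 hours http://t.co/xyz", VOCAB)
print(cleaned)  # ['the', 'flight', 'was', 'delayed', 'digit', 'hours']
```

Scanning the whole vocabulary for every unknown word is only practical for a toy example. With a corpus of the size mentioned in the text, an indexed candidate lookup would be needed.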
RQ2: The processed string can’t be fed directly to a model for training, as most learning algorithms
require numerical vectors as input. Traditional approaches for converting a string into a numerical value,
like Tf-Idf [37] or one-hot encoding, provide a random numerical index to a word or phrase. The random numerical
vector may not be able to capture the actual context involved in the text. The research question may be framed
as: how can the context of words or phrases be captured in their numerical representation?
In this paper, a word embedding layer is used to create a numerical feature matrix that
captures the actual context present in the review text. Each word is assigned a one-dimensional numerical
vector that is trainable. Here the numerical vector is constructed through several
training steps rather than by random assignment.
RQ3: The feature matrix obtained from the word embedding layer is passed through the convolutional
neural network. The output of the convolutional layer is then provided as input to the neural architecture to
predict the sentiment as positive or negative. Most conventional classification models treat every
feature of an input independently, which is not appropriate for human-originated reviews. How can
the dependency between the words in a sentence be captured for predicting actual sentiments?
To capture the sequential dependency or semantic representation of a review, a Long Short Term Memory
(LSTM) layer is used. LSTM is able to capture the long-term dependency of words in the text
owing to its architecture, which maintains a memory in each unit.
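The data flow outlined in RQ2 and RQ3 (word embedding, then convolution with pooling, then LSTM, then a positive/negative output) can be traced with a small forward pass. The sketch below is not the authors' implementation: it uses untrained random weights, a standard LSTM formulation, and hypothetical sizes (`VOCAB_SIZE`, `EMBED_DIM`, `N_FILTERS`, `WIDTH`, `POOL`, `HIDDEN`) chosen only to keep the example small.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conv1d_relu(seq, filters, width):
    """Slide each filter over `width` consecutive word vectors, then apply ReLU."""
    out = []
    for t in range(len(seq) - width + 1):
        window = [x for vec in seq[t:t + width] for x in vec]  # flatten the window
        out.append([max(0.0, dot(f, window)) for f in filters])
    return out


def max_pool1d(seq, size):
    """Element-wise maximum over non-overlapping groups of `size` time steps."""
    return [[max(col) for col in zip(*seq[t:t + size])]
            for t in range(0, len(seq) - size + 1, size)]


def lstm_last_hidden(seq, W, hidden):
    """Run a standard LSTM over `seq` and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h + [1.0]  # input, previous hidden state, bias term
        pre = [dot(row, z) for row in W]  # 4 * hidden pre-activations
        i = [sigmoid(v) for v in pre[0:hidden]]               # input gate
        f = [sigmoid(v) for v in pre[hidden:2 * hidden]]      # forget gate
        o = [sigmoid(v) for v in pre[2 * hidden:3 * hidden]]  # output gate
        cand = [math.tanh(v) for v in pre[3 * hidden:]]       # candidate memory
        c = [f[k] * c[k] + i[k] * cand[k] for k in range(hidden)]
        h = [o[k] * math.tanh(c[k]) for k in range(hidden)]
    return h


# Hypothetical sizes and untrained random weights, for illustration only.
VOCAB_SIZE, EMBED_DIM, N_FILTERS, WIDTH, POOL, HIDDEN = 20, 4, 3, 2, 2, 5
embedding = rand_matrix(VOCAB_SIZE, EMBED_DIM)
filters = rand_matrix(N_FILTERS, WIDTH * EMBED_DIM)
lstm_W = rand_matrix(4 * HIDDEN, N_FILTERS + HIDDEN + 1)
out_w = rand_matrix(1, HIDDEN)[0]


def predict_positive_probability(word_ids):
    embedded = [embedding[i] for i in word_ids]       # one vector per word
    features = conv1d_relu(embedded, filters, WIDTH)  # local n-gram features
    pooled = max_pool1d(features, POOL)               # shorter feature sequence
    h = lstm_last_hidden(pooled, lstm_W, HIDDEN)      # sequential dependencies
    return sigmoid(dot(out_w, h))                     # probability of "positive"


p = predict_positive_probability([3, 7, 1, 12, 5, 9, 0])
print(round(p, 3))
```

For a seven-word review, the convolution yields six feature vectors and pooling reduces them to three, so the LSTM processes a shorter sequence than the original review. In practice all weights, including the embedding matrix, would be learned by back-propagation.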
4. Background Details
4.1. Word Embedding Techniques
Word embedding is the technique of converting text into numbers so that it can be used as input to
machine learning algorithms [38]. The same text can be converted to different numerical formats following
different procedures depending on the context in which it is used. The word embedding process is quite important
in text processing, as most machine learning and neural network techniques do not operate on
plain text but only on numbers. Technically, a word embedding method maps a word to a vector, based on
a dictionary, which may be trained over a text corpus using a neural network. Vector representation of
a word can be of various types. One-hot encoding is a popular vector representation of words that
consists of binary numbers only. In this representation, if the position of a word in a sentence is n, the
nth position of the vector corresponding to the word is one, and the remaining values are zero. For example,
considering the sentence “social media research”, the one-hot encoded vector for ‘media’ will be [0, 1, 0]
since the word ‘media’ exists only in the second position of the sentence. Word embedding
techniques can be categorized into two classes: frequency-based embedding and prediction-based
embedding. Frequency-based word embedding techniques are based upon how frequently a word is used in
the sentence [39]. Count-vectorizer, Tf-Idf vectorizer, and co-occurrence matrix are some examples
of frequency-based techniques [40]. The prediction-based techniques use previous information and neural
network models to prepare the word vector based on the context [31]. CBOW (Continuous Bag of Words)
and the skip-gram model are examples of this category [33].
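The one-hot example above and a simple frequency-based (count) vector can be reproduced with the short sketch below. The helper names are assumptions made for this illustration, and the vocabulary is indexed in order of first appearance so that the result matches the example in the text.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map each distinct word to an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Binary vector with a single one at the index of `word`."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def count_vector(sentence, vocab):
    """Frequency-based representation: how often each vocabulary word occurs."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocab]  # dicts preserve insertion order


vocab = build_vocabulary(["social media research"])
print(one_hot("media", vocab))                      # [0, 1, 0]
print(count_vector("media media research", vocab))  # [0, 2, 1]
```

Neither representation says anything about how words relate to one another, which is the gap that prediction-based embeddings such as CBOW and skip-gram aim to fill.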
4.2. Deep Learning Techniques
Deep learning is a representation learning technique that can itself process the raw input into a form suitable
for classification or regression, eliminating the feature engineering required by conventional
machine learning techniques. There are various deep learning models such as the convolutional neural network
(CNN), probabilistic neural network (PNN), and recurrent neural network (RNN).
4.2.1. Convolutional Neural Network (CNN)
CNN generally operates based on the convolution and sub-sampling processes carried out through a series
of layers [31], followed by one or more fully connected layers. The operations performed in the
CNN model pass through three sequential layers as follows:
Convolution Layer: CNN gets its name from the convolution operation it performs.
The convolution process primarily helps in extracting features from input data. For example, if an
image is considered as input, then the convolution process extracts features from the image while
preserving the spatial relationship between pixels by learning image features using small squares (2-D
filters) of input data. When it is applied in text classification, it helps in extracting the feature matrix
by preserving high-level word or phrase representation.
Pooling Layer: When the size of the input is too large, it is desirable to
reduce the number of trainable parameters. The feature dimension needs to be reduced without losing
any important information. Pooling layers are periodically introduced between subsequent convolution
layers. Pooling (also called sub-sampling or down-sampling) reduces the spatial size of each feature
map but retains the most important information. Spatial pooling can be of different types: max,
average, sum, etc. In the case of max pooling, a spatial neighborhood (for example, a 2×2 window) is
defined and the largest element of the rectified feature map within that window is taken. Instead of
taking the largest element, the average or the sum of all elements in that window may also
be taken, giving average and sum pooling respectively. In this paper, the max-pooling approach has
been considered.
Fully Connected Layer: The fully connected layer is a traditional multi-layer perceptron that uses a
softmax activation function in the output layer. The term “fully connected” implies that every neuron
in the previous layer is connected to every neuron in the next layer. The output from the
convolutional and pooling layers represents high-level features of the input data. The intuition behind
the fully connected layer is to use these features for classifying the input into various classes based on
the training dataset. Most of the features from convolutional and pooling layers seem to be good for
the classification task.
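Two of the operations described in the layers above, max pooling over a 2×2 window and the softmax activation of the output layer, can be illustrated with a small worked example. The feature map values and the class scores below are made up for illustration.

```python
import math


def max_pool_2x2(feature_map):
    """Max over non-overlapping 2x2 windows; an odd trailing row or column is dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]

probs = softmax([2.0, 1.0])  # hypothetical scores for two classes
print(probs)
```

The 4×4 feature map shrinks to 2×2 while the largest activation of each window survives, which is the sense in which pooling reduces dimension without discarding the strongest features.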
4.2.2. Recurrent Neural Network (RNN)
In real-world scenarios, the semantic information of a word often depends on the meaning associated
with previous words in a text. CNN fails to process this dependency, as it considers every word in the
text independently. RNN may be an appropriate solution to capture the dependency. RNNs perform
sequential analysis by carrying out the same process recurrently for every element in the sequence. An RNN
possesses a memory to capture the information that has already been calculated, which influences the result
to be evaluated. The schematic diagram of an RNN is depicted in Figure 1.
The process of an RNN may be well represented through an example. Consider a text which consists of
a sequence of three words. The network is unfolded into three layers (one layer for each word), as shown in
Figure 1. To visualize the computation, let X_t be the one-hot encoded vector of the word input at
timestamp t, C_t the cell state at timestamp t, which acts as a memory for the network, and Y_t the output
at timestamp t.
Figure 1: Schematic flow diagram of a recurrent neural network, unfolded over timestamps t-1, t and t+1 (input layer X_{t-1}, X_t, X_{t+1}; cell states C_{t-1}, C_t; output layer Y_{t-1}, Y_t, Y_{t+1})
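The unfolding over three words can be sketched in a few lines. This is an illustrative toy, assuming a plain tanh recurrence C_t = tanh(W_x X_t + W_c C_{t-1}) and a linear output Y_t = W_y C_t; the weight values and the names `W_x`, `W_c`, `W_y` are hypothetical and are not taken from the paper.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, c_prev, W_x, W_c, W_y):
    """One recurrent step: update the cell state, then emit an output."""
    pre = [a + b for a, b in zip(matvec(W_x, x_t), matvec(W_c, c_prev))]
    c_t = [math.tanh(p) for p in pre]
    y_t = matvec(W_y, c_t)
    return c_t, y_t


# Hypothetical weights: vocabulary of 3 words, 2 hidden units, 1 output.
W_x = [[0.5, -0.3, 0.8], [0.1, 0.7, -0.2]]
W_c = [[0.4, 0.0], [-0.6, 0.9]]
W_y = [[1.0, -1.0]]

# A three-word text, each word given as a one-hot vector X_t.
sequence = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

c = [0.0, 0.0]          # initial cell state
outputs = []
for x_t in sequence:    # the same weights are reused at every timestamp
    c, y_t = rnn_step(x_t, c, W_x, W_c, W_y)
    outputs.append(y_t)
```

The loop body is identical at every timestamp; only the cell state `c` carries information from earlier words to later ones.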
4.2.3. Long Short Term Memory (LSTM)
LSTM is a refined variant of the RNN used for sequential modeling, mainly on text data. It can be
considered a special case of the RNN in which only the essential portion of the data is passed to the next
layer instead of the whole data. One of the major problems in a simple RNN is the vanishing
gradient problem [41] [42]. Gradient descent is often used in neural networks to minimize the error by
optimizing the weight values at each neuron. In an RNN, the gradient of the loss function usually decreases exponentially
as it is back-propagated through successive time steps, which is known as the vanishing gradient problem.
For example, in a sentence like “I play cricket, and I am good at bowling”, the word ‘bowling’
depends on the word ‘cricket’, which appears much earlier in the sentence. As the distance
between two dependent words increases, the gradient shrinks significantly and the performance of the RNN often
degrades. The Long Short Term Memory (LSTM) overcomes this problem and performs well in the case of long-term
dependencies.
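The exponential decay of the gradient can be seen in a scalar toy model. The sketch below assumes a one-unit RNN with recurrence c_t = tanh(w · c_{t-1} + x); the weight w = 0.5, the constant input x and the function name are illustrative assumptions.

```python
import math


def gradient_through_time(w, steps, c0=0.5, x=0.1):
    """Return |d c_t / d c_0| for t = 1..steps in a scalar tanh RNN."""
    c, grad, history = c0, 1.0, []
    for _ in range(steps):
        c = math.tanh(w * c + x)
        # Chain rule: d tanh(w*c_prev + x) / d c_prev = w * (1 - c_t^2)
        grad *= w * (1.0 - c * c)
        history.append(abs(grad))
    return history


history = gradient_through_time(w=0.5, steps=20)
```

Each step multiplies the gradient by a factor of at most w, so after 20 steps the influence of the first state is bounded by 0.5^20, i.e. below one millionth. This is the effect that makes a distant word such as ‘cricket’ hard for a simple RNN to connect with ‘bowling’.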
Figure 2: Schematic diagram of the proposed approach (input: social media reviews → preprocessing to remove noise such as special characters, emoticons and hyperlinks → vectorization: construction of the feature vector from the dictionary → feature matrix for each review using word embeddings → training of the models using deep learning algorithms → classification of reviews → results)
5. Proposed model for sentiment analysis
The presented approach passes through three layers: word embedding, convolution, and
LSTM. The schematic diagram of the proposed approach for sentiment analysis is presented in Figure
2. In the first layer, word embedding is applied to embed the words in the review, which removes the
domain dependency of the review features. The second layer uses convolution and
pooling to identify the important local and deep features in the sentence [43]. The third layer applies
the LSTM network to the output of the second layer to capture its sequential dependencies from
left to right. The combination of the three layers helps in capturing the behavior of the sentence. The output of
the LSTM is then supplied to a fully connected sigmoid layer, and the result is evaluated with binary
cross-entropy as the loss function. The overall architecture of the classifier is shown in Figure 7. The steps
of the proposed approach are as follows:
Step 1: Preprocessing. Social media reviews are mostly text and often contain noisy data such as
special characters, symbols, and hyperlinks. This noise is filtered out with the help of
regular expressions. In the preprocessing stage, all the reviews are broken into word tokens.
Duplicate words are then eliminated to obtain a unique representation for each word. A vocabulary
dictionary is then constructed with the unique words as keys and the word indices as values. Two new words,
“digit” and “unknown”, are introduced to represent all numerical digits and all words that are
not present in the dictionary, respectively. The process of vocabulary dictionary construction is shown in
Figure 3.
Figure 3: Vocabulary dictionary creation before feature vectorization. Noise removal: escape or special characters and hyperlinks are removed from the large set of social media reviews. Tokenization and duplicate removal: all the reviews are combined in a single text file, which is converted into a list of word tokens; duplicate words are removed and all words are converted to lower case. Vocabulary dictionary creation: a dictionary of key-value pairs is created in which each key is a word and each value is the index of that word in the tokenized list; two new elements, with keys for the unknown word and for digits, are inserted at the end of the dictionary with the values following the last index.
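The preprocessing steps of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the two regular expressions, the function names `clean` and `build_vocabulary`, and the sample reviews are hypothetical and are not the authors' code.

```python
import re


def clean(text):
    """Remove hyperlinks and special characters, then lower-case."""
    text = re.sub(r"https?://\S+", " ", text)     # drop hyperlinks
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)   # drop symbols and emoticons
    return text.lower()


def build_vocabulary(reviews):
    """Map each unique word to an index; append "digit" and "unknown" last."""
    vocabulary = {}
    for review in reviews:
        for token in clean(review).split():
            if token.isdigit():
                continue  # all numbers share the single "digit" entry
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    # The two special words receive the indices after the last real word.
    vocabulary.setdefault("digit", len(vocabulary))
    vocabulary.setdefault("unknown", len(vocabulary))
    return vocabulary


# Hypothetical sample reviews.
reviews = [
    "Pizza here is expensive but tasty :-)",
    "Tasty pizza, 2 slices! http://example.com",
]
vocabulary = build_vocabulary(reviews)
```

Using a dictionary keyed by word performs duplicate removal and index assignment in one pass, since a word is only assigned an index the first time it is seen.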
After preprocessing, text vectorization is carried out for each review. Each element of
the vector representation of a review corresponds to the index of a word in the vocabulary dictionary. The
length of the vector is fixed to 25. As most of the reviews have a word-length of less than 25, the
index of the newly introduced word “unknown” is padded at the end to reach length 25. If the word-length
of a review exceeds 25, the less significant features are removed, i.e., the review is
truncated to 25 words. The insignificant words are identified through lemmatization and stop word
removal using the NLTK package available in Python. Most words in English have several alternative
forms with similar meaning. Lemmatization is the process of transforming these alternative forms to the base form,
which inherently reduces the number of words. The feature vectorization process is shown in Figure 4.
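The padding and truncation rule can be sketched as below. The paper relies on NLTK for lemmatization and stop word removal; to keep the example self-contained, a small hypothetical stop-word set stands in for it, and lemmatization is omitted. The vocabulary indices are example values only.

```python
MAX_LEN = 25
# Hypothetical stop-word list standing in for the NLTK-based filtering.
STOP_WORDS = {"is", "here", "but", "the", "a"}


def vectorize(tokens, vocabulary, max_len=MAX_LEN):
    """Turn a token list into a fixed-length vector of vocabulary indices."""
    if len(tokens) > max_len:
        # Only over-long reviews lose their less significant words.
        tokens = [t for t in tokens if t not in STOP_WORDS]
    unknown = vocabulary["unknown"]
    indices = [
        vocabulary["digit"] if t.isdigit() else vocabulary.get(t, unknown)
        for t in tokens
    ]
    indices = indices[:max_len]                              # truncate
    return indices + [unknown] * (max_len - len(indices))    # pad


# Hypothetical vocabulary with example indices.
vocabulary = {"pizza": 1, "here": 159, "is": 200, "expensive": 101,
              "but": 90, "tasty": 456, "digit": 457, "unknown": 458}

short_vector = vectorize(
    ["pizza", "here", "is", "expensive", "but", "tasty"], vocabulary)
long_vector = vectorize(["pizza", "is"] * 15, vocabulary)
```

Here `short_vector` keeps its six indices and is padded with the index of “unknown”, while the 30-token review first loses its stop words and is then padded in the same way.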
Sometimes dimensionality reduction is necessary to filter features and reduce the computational complexity.
As an intuitive example, “awesomely amazing” may be mapped to just “amazing”, which reduces
the input size without losing semantic information. We have adopted PCA for dimensionality reduction of
the feature matrix, which is then passed into the CNN and LSTM model as input. A novel architecture of
convolution and pooling has also been considered for feature filtering.
Figure 4: Process of feature vectorization. The social media review text is represented as a numerical vector in which each element corresponds to the index of the word in the vocabulary dictionary; if a word is not available in the dictionary, the index of the “unknown” word is inserted. If the word-length is less than 25, the index of “unknown” is padded at the end of the vector to reach length 25; if it is greater than 25, the review is truncated to length 25 by eliminating insignificant features.
Figure 5: Word embedding for feature matrix construction. Example: the review [ pizza here is expensive but tasty :-) ] is tokenized to [ pizza, here, is, expensive, but, tasty ], mapped to the indices [ 1 159 200 101 90 456 ], and embedded as a 6 × 128 feature matrix with one 128-dimensional row per word.
Figure 6: Schematic diagram of the word embedding model (CBOW). Input layer: one-hot encoded vectors, based on indices, of the context words w(i), w(i+1), w(i+3) and w(i+4); hidden layer; output layer: vector representation of the target word w(i+2).
Step 2: Word-embedding model. Each word in the list of texts is embedded into a vector of dimension 128,
which is trained through the backpropagation process. The Word2vec algorithm has been used for training the
word embedding, as it is simple and efficient for vector representation. Word embedding is a model
that maps a review in textual format into a numerical vector space, which can be further processed
by neural networks. Prior to the representation, the vocabulary dictionary is created for the datasets
considered. In the vocabulary dictionary, each word is associated with an index that represents the position of
the word in the dictionary. As the position of each word is unique, we have leveraged it for the vector
representation of each review in the dataset. These indices are used to construct
the one-hot encoded representation, which is the
input to the word embedding model. A sample feature matrix constructed from the word embedding model
is presented in Figure 5. The index values of the words in Figure 5 are only an example and
vary from dataset to dataset. The values inside the matrix are the randomly assigned weights of
the embedding layer, which are adjusted through the backpropagation process. The word embedding model uses
CBOW, which takes the context of a word as input and tries to predict the representation of
the target word. Internally, it uses a three-layer feed-forward neural network to construct the numerical
representation of words. The architectural diagram of the word embedding model (CBOW) is presented in
Figure 6. The schematic diagram of the deep learning process is presented in Figure 7.
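The relation between word indices, one-hot vectors and the 6 × 128 feature matrix can be illustrated with a randomly initialized embedding table. This sketch covers only the lookup, not CBOW training; the vocabulary size of 500, the random seed and the helper names are assumptions.

```python
import random

EMBEDDING_DIM = 128
VOCAB_SIZE = 500  # hypothetical vocabulary size

# Randomly assigned weights, as they would be before backpropagation.
rng = random.Random(0)
embedding = [[rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
             for _ in range(VOCAB_SIZE)]


def one_hot(index, size):
    """Return a vector of zeros with a single one at the given index."""
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


def embed_one_hot(vector, weights):
    """Multiply a one-hot row vector by the embedding weight matrix."""
    return [sum(vector[i] * weights[i][d] for i in range(len(vector)))
            for d in range(len(weights[0]))]


# Example indices for "pizza here is expensive but tasty".
review_indices = [1, 159, 200, 101, 90, 456]

# Looking up row i equals multiplying the one-hot vector of i by the matrix.
feature_matrix = [embedding[i] for i in review_indices]
via_one_hot = embed_one_hot(one_hot(159, VOCAB_SIZE), embedding)
```

Because a one-hot vector selects exactly one row of the weight matrix, practical implementations usually replace the multiplication with a direct row lookup, as done for `feature_matrix` here.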
Step 3: Convolutional Layer. In the convolution layer, seven filters, each of size 3 × 3 with stride one, are
traversed over the input feature matrix to extract the required features. Multiple filters are used to
extract different types of features. For example, if a matrix of size 8 × 128 is traversed with a filter of
dimension 3 × 3, the convolution process delivers a feature matrix of size 6 × 126. It captures all the local
hidden features, as shown in Figure 8. The Rectified Linear Unit (ReLU) activation function has been used in
the fully connected layer of the CNN, as it is reported to be six times faster than the sigmoid and tanh activation
functions [44]. However, in the last layer, the sigmoid function is used to obtain the class label. The input to
the last layer is the output of the last LSTM layer.
Figure 7: Schematic diagram of deep learning steps in the proposed sentiment analysis model (social media review → word embedding layer → convolution layer → max pooling layer → LSTM layer → fully connected CNN layer → fully connected sigmoid layer → sentiment result: positive/negative)
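The output size quoted in Step 3 can be checked with a small unpadded convolution. The sketch below slides a single 3 × 3 filter with stride one over an 8 × 128 matrix; the input values are hypothetical, the filter values are the ones shown in Figure 8, and the function name is an assumption.

```python
def convolve2d_valid(matrix, kernel, stride=1):
    """Slide the kernel over the matrix without padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for r in range(0, len(matrix) - k_rows + 1, stride):
        row = []
        for c in range(0, len(matrix[0]) - k_cols + 1, stride):
            # Element-wise product of the window and the kernel, summed.
            row.append(sum(matrix[r + i][c + j] * kernel[i][j]
                           for i in range(k_rows) for j in range(k_cols)))
        output.append(row)
    return output


# Hypothetical 8 x 128 input feature matrix.
feature_matrix = [[((7 * r + c) % 10) / 10 for c in range(128)]
                  for r in range(8)]

# The 3 x 3 filter shown in Figure 8.
kernel = [[1, 0, 1],
          [0, 1, 0],
          [0, 1, 1]]

feature_map = convolve2d_valid(feature_matrix, kernel)
output_shape = (len(feature_map), len(feature_map[0]))  # (6, 126)
```

In general an n × m input and a k × k filter with stride one give an (n − k + 1) × (m − k + 1) output. With seven filters, the same routine would be applied seven times, once per filter, producing seven feature maps.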
Figure 8: Convolution process. A zero-padded 8 × 128 feature matrix is convolved with the 3 × 3 filter [1 0 1; 0 1 0; 0 1 1], giving a 6 × 126 convolved feature matrix.
Step 4: Max-pooling Layer. After obtaining a feature matrix of size u × v from the convolution layer, max-pooling
is performed with a filter of dimension 2 × 2. In max-pooling, the maximum feature value is selected
at each position of the filter while traversing. A stride of 2 is used for traversing the filter, so the
resulting feature matrix is of dimension u/2 × v/2. The max-pooling operation is performed for each convolution
filter independently. Figure 9 shows the schematic structure of the max-pooling layer.
Figure 9: Max-pooling layer (a 6 × 126 feature matrix is reduced to 3 × 63)
Step 5: Long Short Term Memory (LSTM) Network. The output from the max-pooling layer is passed to
the LSTM layer to sequentially analyze the generated feature vectors from left to right. Since the important
local features have already been extracted at the output of the max-pooling layer, the LSTM network is able to
check the long-term dependencies and detect the global features. The output of the LSTM layer is flattened
to reduce the features and is then passed through the fully connected CNN layer to predict the actual
sentiment. In this work, one hundred LSTM units have been applied with a dropout of ten percent
to avoid over-fitting.
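The left-to-right scan performed by the LSTM layer can be sketched with the standard LSTM cell equations (forget, input and output gates plus a candidate state). This is a generic textbook cell, not the authors' implementation; the sizes, random weights and input sequence are hypothetical, and dropout is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; all vectors are plain Python lists."""
    concat = h_prev + x_t  # concatenation [h_{t-1}, x_t]

    def gate(name, activation):
        weights, biases = params[name]
        return [activation(sum(w * v for w, v in zip(row, concat)) + b)
                for row, b in zip(weights, biases)]

    forget = gate("forget", sigmoid)          # how much old memory to keep
    write = gate("input", sigmoid)            # how much new content to store
    output = gate("output", sigmoid)          # how much memory to expose
    candidate = gate("candidate", math.tanh)  # proposed new content

    c_t = [f * c + i * g
           for f, c, i, g in zip(forget, c_prev, write, candidate)]
    h_t = [o * math.tanh(c) for o, c in zip(output, c_t)]
    return h_t, c_t


HIDDEN, INPUT = 4, 3  # hypothetical sizes; the paper uses one hundred units
rng = random.Random(0)
params = {
    name: ([[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN + INPUT)]
            for _ in range(HIDDEN)],
           [0.0] * HIDDEN)
    for name in ("forget", "input", "output", "candidate")
}

# Scan a hypothetical sequence of five feature vectors from left to right.
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(INPUT)] for _ in range(5)]
h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
```

The cell state `c` is updated additively (old memory scaled by the forget gate plus gated new content), which is what lets information from early positions survive over long distances.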
Step 6: Sigmoid Layer. The feature vectors obtained from the output of the LSTM layer are passed to a fully
connected sigmoid layer to find the probability distribution of each category. It can be mathematically
defined as follows:
$$P_{\mathrm{sigmoid}}(C_j) = \frac{e^{o_j}}{1 + e^{o_j}} \tag{1}$$
where P_sigmoid(C_j) is the probability distribution for the category j and o_j represents the output
corresponding to the category j. The sigmoid activation function is used to normalize the confidence score of
the classifier to a value between zero and one. After the probability distribution is obtained from the sigmoid layer, binary cross
entropy is applied as the loss function to calculate the disparity between actual and predicted sentiments.
$$\mathrm{loss} = \sum_{i=1}^{k} R(C_i) \times \log P_{\mathrm{sigmoid}}(C_i) \tag{2}$$
where k is the number of categories and R(C_i) is the actual sentiment associated with the text. It takes a
discrete value from the set L = {0, 1}, where L is the sentiment label of the review text (Negative, Positive). The loss
is similar to a likelihood function, which seeks to minimize the difference between the probability distribution
in the training set and the model's predicted probability distribution on the testing dataset.
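Equation (1) and the loss can be evaluated on a few numbers as follows. Note one assumption made here: the sketch uses the conventional form of binary cross-entropy, which carries a leading minus sign and a second term for the negative class, −[R log P + (1 − R) log(1 − P)], whereas Eq. (2) as printed shows only the R log P term. The raw scores and labels are hypothetical.

```python
import math


def sigmoid(o):
    """Eq. (1): squash a raw output score into the interval (0, 1)."""
    return math.exp(o) / (1.0 + math.exp(o))


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy in its conventional form."""
    total = 0.0
    for r, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(r * math.log(p) + (1 - r) * math.log(1.0 - p))
    return total / len(labels)


outputs = [2.0, -1.0, 0.0]   # hypothetical raw scores o_j
labels = [1, 0, 1]           # R(C): 1 = positive, 0 = negative
probabilities = [sigmoid(o) for o in outputs]
loss = binary_cross_entropy(labels, probabilities)
```

A confident correct prediction contributes a loss close to zero, while a confident wrong prediction is penalized heavily, which is the disparity measure the text describes.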
6. Implementation
6.1. Dataset used for Experiment
In this paper, four review datasets from diverse domains, namely movie reviews, airline reviews, US
presidential election reviews, and self-driving car reviews, have been considered for the experiment. As
these come from different domains, the writing styles of the reviews differ considerably from each other, and different
kinds of word dependencies may be present in the reviews. As one of the contributions of this
paper is a domain-independent sentiment analysis model, the model has been trained by merging
the training sets of all the datasets, and evaluation has been carried out for each dataset separately.
The confusion matrices presented in the result section are based on the testing part of each individual dataset.
All of these datasets are balanced, i.e., the numbers of samples belonging to the positive,
negative, and neutral classes are equal or almost equal. The datasets are
described as follows:
1. Movie Review: The Large Movie Review Dataset (often referred to as the IMDB dataset) contains
25,000 highly polar movie reviews (good or bad) for training and the same amount again for testing.
The problem is to determine whether a given movie review has a positive or negative sentiment. The
data was collected by Stanford researchers and was used in a 2011 paper, where a 70:30 split of the
data was used for training and testing [21].
2. Airline Review Dataset: This data originally came from Crowdflower’s Data for Everyone library. It
contains reviews about major U.S. airlines. The Twitter data was scraped from February 2013 to
January 2014 in a paper by Wan et al. [45], and it is labeled to classify tweets as positive, negative, or
neutral, followed by a categorization of the negative reasons (such as “late flight” or “rude service”). It
records whether the sentiment of each tweet in this set was positive, neutral, or negative for six US
airlines.
3. Self Driving Car dataset: This dataset has been collected from the website “https://www.kaggle.com/”
[46]. It has three attributes: Twitter id, review text, and the polarity associated with the
sentiment.
4. US Presidential Election Dataset: This data is collected from the website “https://www.kaggle.com/”
[21]. It is the first GOP debate Twitter sentiment data, which analyzes tweets on the first 2016 GOP
Presidential Debate. It consists of 21 attributes and 13,871 reviews.
6.2. Performance Evaluation Parameters
The results obtained from the experiment are discussed in this section. For validation, the proposed Co-LSTM
model has been compared with other machine learning models: SVM, Naive Bayes, Linear Regression,
Random Forest, CNN, and RNN. The performance of the proposed algorithm has been
assessed in terms of accuracy, precision, recall, and F-measure, which are computed from the confusion
matrix. A statistical t-test has also been used to show whether the proposed algorithm differs significantly
from the other algorithms. The ROC curve and AUC value are also presented for analyzing the
performance of the proposed algorithm.
Table 1: Confusion Matrix
                              Correct label
Predicted label    Positive               Negative
Positive           True Positive (TP)     False Positive (FP)
Negative           False Negative (FN)    True Negative (TN)
Confusion Matrix. The confusion matrix, also known as the error matrix or contingency matrix, is a tabular
representation of the statistical values obtained through experiments. It summarizes the actual
and predicted labels of the reviews processed by the classifier. It is used to evaluate the performance
of most supervised machine learning algorithms. The confusion matrix for binary classification can
be represented in the form shown in Table 1. In this study, reviews are classified as having
either positive or negative sentiment. The confusion matrix has four components, from which
the different performance parameters can be evaluated:
True Positive (TP): the reviews that are originally labeled as positive and also predicted
as positive by the classifier.
False Positive (FP): the reviews that are originally labeled as negative but predicted as
positive by the classifier.
True Negative (TN): the reviews that are originally labeled as negative and also predicted
as negative by the classifier.
False Negative (FN): the reviews that are originally labeled as positive but predicted as
negative by the classifier.
The performance of the proposed classifier has been evaluated based on the following parameters.
i. Precision: It is defined as the ratio of true positive predictions to the total number of positive predictions.
It measures the exactness of the classifier. It can be expressed as:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$
ii. Recall: It is defined as the ratio of the number of true positive predictions to the total number of
actual positive samples. It is also known as sensitivity.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$
iii. F-measure: It is the harmonic mean of Precision and Recall.
$$F\text{-}\mathrm{measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$
iv. Accuracy: It is defined as the fraction of samples that are predicted correctly.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{6}$$
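Equations (3) to (6) translate directly into code. The counts used below are hypothetical and are not taken from any of the result tables; the function name `evaluate` is likewise an assumption.

```python
def evaluate(tp, fp, fn, tn):
    """Compute the four evaluation parameters from confusion-matrix counts."""
    precision = tp / (tp + fp)                                  # Eq. (3)
    recall = tp / (tp + fn)                                     # Eq. (4)
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. (5)
    accuracy = (tp + tn) / (tp + fp + tn + fn)                  # Eq. (6)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for a test set of 200 reviews.
scores = evaluate(tp=80, fp=20, fn=10, tn=90)
```

For these counts the precision is 0.80, the recall is about 0.889, the F-measure is about 0.842 and the accuracy is 0.85. The F-measure lies between precision and recall, closer to the smaller of the two, as expected of a harmonic mean.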
6.3. Result Analysis and Discussion
Table 2: Confusion Matrix and Evaluation Parameters for Movie Review Dataset
Model              Actual  Predicted Yes  Predicted No  Precision  Recall  F-Measure  Accuracy
SVM                Yes     329            66            0.8329     0.8266  0.8298     0.8311
                   No      69             336
Naive Bayes        Yes     355            40            0.8987     0.7230  0.8014     0.7800
                   No      136            269
Linear Regression  Yes     318            77            0.8051     0.8010  0.8030     0.8050
                   No      79             326
Random Forest      Yes     302            93            0.7646     0.6028  0.6741     0.6350
                   No      199            206
CNN                Yes     316            79            0.8000     0.8294  0.8144     0.8200
                   No      65             340
RNN                Yes     296            99            0.7494     0.7810  0.7649     0.7725
                   No      83             322
Co-LSTM            Yes     330            65            0.8354     0.8350  0.8302     0.8313
                   No      70             335
Standard machine learning models such as SVM, linear regression, random forest, and Naive Bayes
are considered for experimental comparison. Deep learning models are found to be more effective
than these traditional machine learning algorithms. CNN and LSTM networks form the basic framework of the
proposed model, i.e., Co-LSTM. To classify sentiment reviews efficiently, researchers have frequently
come up with ensemble systems based on these architectures, and the experimental results reported in
the literature reflect the viability of the different techniques. Although extensive studies have been carried out
using traditional models, a good amount of work has also been carried out using deep learning models in recent
years. The latter are found to outperform the traditional systems in most cases, thereby establishing
their utility in the field of natural language processing, including sentiment analysis. The performance results
of the various machine learning techniques are presented in confusion matrix form along with the
evaluation parameters for each of the datasets.
Comparative analysis of the classification models based on precision, recall, F-measure, and accuracy for
the movie review dataset is presented in Table 2. It can be observed that the
proposed Co-LSTM model yields better accuracy and F-measure than the other algorithms. Naive Bayes and CNN
have the best precision and recall values, respectively, for the movie review dataset, as they are biased
more towards positive sentiments. The top three models for the movie review dataset in terms of accuracy are
Co-LSTM, SVM, and CNN, with 83.13%, 83.11%, and 82.00% respectively.
Table 3: Confusion Matrix and Evaluation Parameters for Airline Dataset
Model              Actual  Predicted Yes  Predicted No  Precision  Recall  F-Measure  Accuracy
SVM                Yes     3419           230           0.9370     0.9529  0.9449     0.9136
                   No      169            799
Naive Bayes        Yes     3646           3             0.9992     0.8135  0.8968     0.8183
                   No      836            132
Linear Regression  Yes     3611           38            0.9896     0.9007  0.9431     0.9056
                   No      398            570
Random Forest      Yes     3589           60            0.9836     0.8680  0.9221     0.8687
                   No      546            422
CNN                Yes     3553           96            0.9737     0.9449  0.9591     0.9344
                   No      207            761
RNN                Yes     3541           108           0.9704     0.9651  0.9678     0.9489
                   No      128            840
Co-LSTM            Yes     3442           207           0.9433     0.9860  0.9681     0.9496
                   No      49             919
The experimental results for accuracy, precision, recall, and F-measure on the Airline review dataset are
presented in Table 3. As with the movie review dataset, the Naive Bayes algorithm is more inclined towards
positive sentiment; its precision value is found to be 0.9992. The Co-LSTM
model has better accuracy, F-measure, and recall than all other classifiers on the Airline
review dataset. It can be observed that RNN and CNN come very close to Co-LSTM in performance:
the accuracy of Co-LSTM, RNN, and CNN is found to be 94.96%, 94.89%, and 93.44% respectively.
The performance results on the self-driving car dataset are presented in Table 4. It can be observed that
Co-LSTM performs better in terms of accuracy, F-measure, and recall for self-driving car reviews. The precision
value for the Naive Bayes algorithm is found to be 100%. The deep learning models RNN and CNN
have accuracies of 83.62% and 83.44% respectively. Unlike on other datasets, the performance of SVM is satisfactory
for self-driving car reviews in terms of precision, recall, and F-measure. Table 5 shows the performance results
of all the models on the US presidential election data. On this dataset, the accuracy of Co-LSTM is found to be
90.45%. It outperforms all other models in terms of accuracy, F-measure, and recall. As with the self-driving car
dataset, the precision value for the Naive Bayes model is 1.0, owing to a stronger bias towards positive
sentiments.
Table 4: Confusion Matrix and Evaluation Parameters for Self Driving Car Dataset
Model              Actual  Predicted Yes  Predicted No  Precision  Recall  F-Measure  Accuracy
SVM                Yes     1615           398           0.9023     0.8549  0.8278     0.8081
                   No      274            491
Naive Bayes        Yes     2013           0             1.0000     0.7265  0.8416     0.7271
                   No      758            7
Linear Regression  Yes     1956           57            0.9717     0.7878  0.8701     0.7898
                   No      527            238
Random Forest      Yes     1907           106           0.9473     0.7583  0.8423     0.7430
                   No      608            157
CNN                Yes     1884           129           0.9359     0.8506  0.8912     0.8344
                   No      331            434
RNN                Yes     1916           97            0.9518     0.8426  0.8939     0.8362
                   No      358            407
Co-LSTM            Yes     1895           118           0.9414     0.8798  0.9095     0.8643
                   No      259            506
Table 5: Confusion Matrix and Evaluation Parameters for GOP Dataset
Model              Actual  Predicted Yes  Predicted No  Precision  Recall  F-Measure  Accuracy
SVM                Yes     2937           451           0.8669     0.9167  0.8911     0.8327
                   No      267            637
Naive Bayes        Yes     3388           0             1.0000     0.808   0.8938     0.8124
                   No      805            99
Linear Regression  Yes     3323           65            0.9808     0.8518  0.9118     0.8502
                   No      578            326
Random Forest      Yes     3241           147           0.9566     0.8529  0.9018     0.8355
                   No      559            345
CNN                Yes     3258           130           0.9616     0.9075  0.9338     0.8924
                   No      332            572
RNN                Yes     3333           55            0.9838     0.8686  0.9226     0.8698
                   No      504            400
Co-LSTM            Yes     3256           132           0.961      0.9213  0.9408     0.9045
                   No      278            626
The observed values from the sentiment classification have been plotted as Receiver Operating
Characteristic (ROC) curves. The ROC curve represents the trade-off between the false positive rate (FPR) and
the true positive rate (TPR). The FPR is defined as the ratio of the number of false positives to the
total number of actual negatives in the dataset. Similarly, the TPR is defined as the ratio of
the number of true positives to the total number of actual positives in the dataset; it is the same as
recall or sensitivity. The ROC curve plots the false positive rate (x-axis) against the true positive
rate (y-axis), both of which range from 0 to 1. It is one of the suitable approaches for finding the best model for a
classification task. A classification model is said to perform better if its curve is more inclined
towards the true positive rate. The best classifier has a curve that passes close to (0,1), i.e., the
top-left region. The performance of a model can therefore be evaluated through the area under the
ROC curve: the larger the area under the curve, the better the performance of the model. Figures 10a, 10b, 10c, and
10d show the ROC curves of the different classifiers for the movie, airline, self-driving car, and GOP datasets
respectively. It can be observed that the black line, i.e., the ROC curve of the proposed Co-LSTM model,
lies closest to the TPR axis, which indicates that it has a high TPR and a low FPR. Naive Bayes is found
to have more false positives than the other classifiers on most of the datasets. For the Airline and
Self-driving car datasets, Co-LSTM has a more clearly distinguishable ROC curve than the other models.
It can be observed that the ROC curves of the deep learning models RNN and CNN are close to each
other. The area under the curve (AUC) for the models is listed in Table 6. It can be noted that the AUC is
highest for Co-LSTM on all the datasets.
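The construction of an ROC curve and its AUC can be sketched as follows. The labels and classifier scores are hypothetical, the function names are assumptions, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical true labels (1 = positive) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
```

For this toy example the area is 8/9, which equals the fraction of positive-negative pairs in which the positive review receives the higher score. A classifier that ranks every positive above every negative reaches the point (0,1) and an AUC of 1.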
The paired t-test analysis has been performed for each pair of classification models and for each of the
evaluation parameters, i.e., accuracy, precision, recall, and F-measure. It is used to check whether the
performance of the proposed model is significantly different from that of the others. The t-test analysis has been
performed on each dataset with 5-fold cross-validation. The major parameter evaluated in the t-test analysis is
the p-value. A classifier is said to be significantly different from another if the p-value is less than 0.05. It
can be observed from Table 7 that for accuracy, recall, and F-measure, Co-LSTM is significantly different
from all other classification models, i.e., the values obtained for accuracy, recall, and F-measure are not due to
randomness.
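A paired t-test over five folds can be illustrated with the standard library alone. The per-fold accuracies below are hypothetical. Instead of computing the p-value, the sketch compares the t statistic with 2.776, the two-tailed critical value for a significance level of 0.05 and 4 degrees of freedom; exceeding it is equivalent to a p-value below 0.05.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic of the paired differences between two result lists."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_error


# Hypothetical accuracies of two models on the same five folds.
model_a = [0.86, 0.87, 0.85, 0.88, 0.86]
model_b = [0.83, 0.84, 0.83, 0.84, 0.83]

t_statistic = paired_t_statistic(model_a, model_b)

T_CRITICAL = 2.776  # two-tailed, alpha = 0.05, degrees of freedom = 5 - 1
significant = abs(t_statistic) > T_CRITICAL
```

The test is paired because both models are evaluated on the same folds, so only the per-fold differences enter the statistic and fold-to-fold difficulty cancels out.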
Figure 10: ROC comparison for different classification models: (a) Movie, (b) Airline, (c) Self Driving Car, (d) GOP
Table 6: Area under curve (AUC) value for ROC Curve
Model              Movie Review  Airline Dataset  Self Driving Car  GOP
SVM                0.905         0.955            0.795             0.862
Naive Bayes        0.877         0.930            0.743             0.820
Linear Regression  0.881         0.954            0.799             0.865
Random Forest      0.695         0.881            0.700             0.817
CNN                0.894         0.970            0.867             0.922
RNN                0.862         0.978            0.868             0.920
Co-LSTM            0.920         0.984            0.909             0.934
Table 7: t-test analysis (p-value) for various evaluation parameters
Accuracy
         SVM    NB     LR     RF     CNN    RNN    Co-LSTM
SVM      -      0.058  0.792  0.257  0.162  0.486  0.010
NB       0.058  -      0.031  0.774  0.015  0.100  0.013
LR       0.792  0.031  -      0.151  0.017  0.370  0.019
RF       0.257  0.774  0.151  -      0.043  0.027  0.029
CNN      0.162  0.015  0.017  0.043  -      0.399  0.043
RNN      0.486  0.100  0.370  0.027  0.399  -      0.030
Co-LSTM  0.010  0.013  0.019  0.029  0.043  0.030  -
Precision
         SVM    NB     LR     RF     CNN    RNN    Co-LSTM
SVM      -      0.037  0.166  0.324  0.208  0.375  0.017
NB       0.037  -      0.142  0.095  0.039  0.139  0.002
LR       0.166  0.142  -      0.043  0.058  0.155  0.039
RF       0.324  0.095  0.043  -      0.689  0.939  0.017
CNN      0.208  0.039  0.058  0.689  -      0.825  0.018
RNN      0.375  0.139  0.155  0.939  0.825  -      0.018
Co-LSTM  0.017  0.002  0.039  0.017  0.018  0.018  -
Recall
         SVM    NB     LR     RF     CNN    RNN    Co-LSTM
SVM      -      0.001  0.012  0.048  0.179  0.202  0.016
NB       0.001  -      0.006  0.951  0.001  0.024  0.004
LR       0.012  0.006  -      0.246  0.008  0.230  0.021
RF       0.048  0.951  0.246  -      0.062  0.067  0.026
CNN      0.179  0.001  0.008  0.062  -      0.315  0.013
RNN      0.202  0.024  0.230  0.067  0.315  -      0.010
Co-LSTM  0.016  0.004  0.021  0.026  0.013  0.010  -
F-Measure
         SVM    NB     LR     RF     CNN    RNN    Co-LSTM
SVM      -      0.368  0.602  0.409  0.223  0.652  0.012
NB       0.368  -      0.086  0.554  0.029  0.305  0.011
LR       0.602  0.086  -      0.188  0.006  0.745  0.005
RF       0.409  0.554  0.188  -      0.085  0.037  0.039
CNN      0.223  0.029  0.006  0.085  -      0.415  0.038
RNN      0.652  0.305  0.745  0.037  0.415  -      0.020
Co-LSTM  0.012  0.011  0.005  0.039  0.038  0.020  -
7. Conclusion and Future Work
A neural network architecture comprising both CNN and LSTM has been proposed to predict the
sentiment of customer reviews. The major advantage of this model is that it is not limited to a specific
domain. Thus, the same model can be trained for product reviews as well as service reviews without
degrading the performance. No sophisticated manual feature engineering is required, which avoids the need for
domain-specific expertise. This is due to the use of the pre-trained word-embedding model for embedding the
input feature vector. In the next step, the use of the CNN layer before the LSTM network helps to identify
only the important features of the embedded vector, which greatly reduces the training time and hence
makes the model computationally feasible. In the last stage, the LSTM layer helps to build the model
by studying the sequential arrangement of the review rather than considering words or phrases alone.
Thus the model also incorporates the context of the review and performs better in cases
such as negation and sarcasm.
In this study, an application of a hybrid neural network architecture combining a Recurrent Neural Network
(RNN) and a Convolutional Neural Network (CNN), built on top of the word embedding model, has been
presented. The main advantage of this architecture is the sequential study of the important features in a
review to predict the sentiment. Owing to the word embedding model and the LSTM network,
performance is good across multiple domains (as we experimented with movie reviews and airline tweets)
without any domain-specific feature engineering. The approach can also be verified on other sentence classification
tasks.
8. Threat to Validation
The proposed architecture is based on the convolutional deep neural networks in the context of natural
language processing. Few limitations of the proposed convolutional LSTM model may be as follows:520
The deep learning model requires a huge amount of data for proper training and is computationally
intensive too.
24
In feature matrix creation, the word embedding model is trained on the pre-trained data corpus.
Pre-trained data corpus should be large enough to cover all frequently used words. If the pre-trained
corpus is not sufficient; some of the important features might be missing while training the model.525
If the initial convolutional layer of the Co-LSTM model is unable to preserve some of the word-order
or sequence information in the text, then the sequential dependency of the words may not reach the
LSTM layer. In that case, the LSTM layer may act as just a fully connected layer without any memory.
In this work, a word embedding model based on a pre-trained corpus has been considered. Sometimes,
it is quite difficult to deal with misspellings or other irregularities found in the language used
in social media. However, this can be improved by building a social media-specific word-embedding
model.
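The vocabulary-coverage limitations listed above can be checked before training with a simple measurement. The sketch below is illustrative only: the function name `embedding_coverage`, the toy posts, and the toy vocabulary are hypothetical, and a real check would use the vocabulary of the actual pre-trained embedding model.

```python
from collections import Counter


def embedding_coverage(tokenized_posts, vocabulary):
    """Return the share of token occurrences found in the embedding vocabulary
    and the out-of-vocabulary tokens, most frequent first."""
    counts = Counter(tok.lower() for post in tokenized_posts for tok in post)
    total = sum(counts.values())
    missing = {tok: n for tok, n in counts.items() if tok not in vocabulary}
    covered = total - sum(missing.values())
    coverage = covered / total if total else 0.0
    return coverage, sorted(missing, key=lambda t: (-missing[t], t))


# Toy social-media posts containing misspellings and informal spellings.
posts = [
    ["the", "flight", "was", "gr8"],
    ["luv", "the", "crew"],
    ["flight", "delayed", "agian"],
]
# Hypothetical stand-in for the vocabulary of a pre-trained embedding model.
vocab = {"the", "flight", "was", "crew", "delayed", "love", "great", "again"}

coverage, oov = embedding_coverage(posts, vocab)
print(f"coverage: {coverage:.0%}, out-of-vocabulary: {oov}")
```

A low coverage value, or a list of frequent out-of-vocabulary tokens dominated by misspellings, would suggest that a social media-specific embedding model is worth considering.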
Acknowledgement
This research work was supported by the Fund for Improvement of S&T Infrastructure in Universities and
Higher Educational Institutions (FIST) Scheme under the Department of Science and Technology (DST), Govt.
of India. The authors wish to express their gratitude and heartiest thanks to the Department of Computer
Science & Engineering, National Institute of Technology, Rourkela, India for providing research support.
References
[1] E. Cambria, Affective computing and sentiment analysis, IEEE Intelligent Systems 31 (2) (2016) 102–107.
[2] A. Severyn, A. Moschitti, Twitter sentiment analysis with deep convolutional neural networks, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 959–962.
[3] Y. Wang, M. Huang, X. Zhu, L. Zhao, Attention-based lstm for aspect-level sentiment classification, in: Proceedings of the 2016 conference on empirical methods in natural language processing, 2016, pp. 606–615.
[4] W. Yin, K. Kann, M. Yu, H. Schütze, Comparative study of cnn and rnn for natural language processing, arXiv preprint arXiv:1702.01923.
[5] G. Vinodhini, R. Chandrasekaran, Sentiment analysis and opinion mining: a survey, International Journal 2 (6) (2012) 282–292.
[6] B. Liu, Sentiment analysis and opinion mining, Synthesis lectures on human language technologies 5 (1) (2012) 1–167.
[7] D. Maynard, M. A. Greenwood, Who cares about sarcastic tweets? investigating the impact of sarcasm on sentiment analysis, in: LREC 2014 Proceedings, ELRA, 2014, pp. 26–31.
[8] P. Biyani, C. Caragea, P. Mitra, C. Zhou, J. Yen, G. E. Greer, K. Portier, Co-training over domain-independent and domain-dependent features for sentiment analysis of an online cancer support community, in: International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), IEEE, 2013, pp. 413–417.
[9] A. Bagheri, M. Saraee, F. De Jong, Care more about customers: Unsupervised domain-independent aspect detection for sentiment analysis of customer reviews, Knowledge-Based Systems 52 (2013) 201–213.
[10] C. Dos Santos, M. Gatti, Deep convolutional neural networks for sentiment analysis of short texts, in: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 2014, pp. 69–78.
[11] P. Zhou, Z. Qi, S. Zheng, J. Xu, H. Bao, B. Xu, Text classification improved by integrating bidirectional lstm with two-dimensional max pooling, arXiv preprint arXiv:1611.06639.
[12] Y. Ma, H. Peng, T. Khan, E. Cambria, A. Hussain, Sentic lstm: a hybrid network for targeted aspect-based sentiment analysis, Cognitive Computation 10 (4) (2018) 639–650.
[13] Y. Ma, H. Peng, E. Cambria, Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive lstm, in: Thirty-second AAAI conference on artificial intelligence, 2018, pp. 5876–5883.
[14] X. Wang, W. Jiang, Z. Luo, Combination of convolutional and recurrent neural network for sentiment analysis of short texts, in: Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers, 2016, pp. 2428–2437.
[15] G. Rao, W. Huang, Z. Feng, Q. Cong, Lstm with sentence representations for document-level sentiment classification, Neurocomputing 308 (2018) 49–57.
[16] A. Hussain, E. Cambria, Semi-supervised learning for big social data analysis, Neurocomputing 275 (2018) 1662–1673.
[17] E. Cambria, S. Poria, D. Hazarika, K. Kwok, Senticnet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 1795–1802.
[18] E. Cambria, Y. Li, F. Z. Xing, S. Poria, K. Kwok, Senticnet 6: Ensemble application of symbolic and subsymbolic ai for sentiment analysis, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 105–114.
[19] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, R. Passonneau, Sentiment analysis of twitter data, in: Proceedings of the workshop on languages in social media, Association for Computational Linguistics, 2011, pp. 30–38.
[20] Q. Ye, Z. Zhang, R. Law, Sentiment classification of online reviews to travel destinations by supervised machine learning approaches, Expert systems with applications 36 (3) (2009) 6527–6535.
[21] A. Bifet, E. Frank, Sentiment knowledge discovery in twitter streaming data, in: International conference on discovery science, Springer, 2010, pp. 1–15.
[22] H. Saif, Y. He, H. Alani, Semantic sentiment analysis of twitter, in: International semantic web conference, Springer, 2012, pp. 508–524.
[23] R. K. Behera, S. K. Rath, S. Misra, R. Damaševičius, R. Maskeliūnas, Large scale community detection using a small world model, Applied Sciences 7 (11) (2017) 1173.
[24] A. Aue, M. Gamon, Customizing sentiment classifiers to new domains: A case study, in: Proceedings of recent advances in natural language processing (RANLP), Vol. 1, Citeseer, 2005, pp. 2–1.
[25] O. Araque, I. Corcuera-Platas, J. F. Sanchez-Rada, C. A. Iglesias, Enhancing deep learning sentiment analysis with ensemble techniques in social applications, Expert Systems with Applications 77 (2017) 236–246.
[26] S. Baccianella, A. Esuli, F. Sebastiani, Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining., in: LREC, Vol. 10, 2010, pp. 2200–2204.
[27] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis, in: Proceedings of the conference on human language technology and empirical methods in natural language processing, Association for Computational Linguistics, 2005, pp. 347–354.
[28] J. W. Pennebaker, M. R. Mehl, K. G. Niederhoffer, Psychological aspects of natural language use: Our words, our selves, Annual review of psychology 54 (1) (2003) 547–577.
[29] E. Cambria, An introduction to concept-level sentiment analysis, in: Mexican International Conference on Artificial Intelligence, Springer, 2013, pp. 478–483.
[30] J. Yoon, H. Kim, Multi-channel lexicon integrated cnn-bilstm models for sentiment analysis, in: Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017), 2017, pp. 244–253.
[31] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research 12 (Aug) (2011) 2493–2537.
[32] P. D. Turney, P. Pantel, From frequency to meaning: Vector space models of semantics, Journal of artificial intelligence research 37 (2010) 141–188.
[33] E. Cambria, C. Havasi, A. Hussain, Senticnet 2: A semantic and affective resource for opinion mining and sentiment analysis., in: FLAIRS conference, 2012, pp. 202–207.
[34] T. Wei, Y. Lu, H. Chang, Q. Zhou, X. Bao, A semantic approach for text clustering using wordnet and lexical chains, Expert Systems with Applications 42 (4) (2015) 2264–2275.
[35] K. Sailunaz, R. Alhajj, Emotion and sentiment analysis from twitter text, Journal of Computational Science 36 (2019) 101003.
[36] C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, T. Robinson, One billion word benchmark for measuring progress in statistical language modeling, arXiv preprint arXiv:1312.3005.
[37] J. Ramos, et al., Using tf-idf to determine word relevance in document queries, in: Proceedings of the first instructional conference on machine learning, Vol. 242, Piscataway, NJ, 2003, pp. 133–142.
[38] O. Melamud, O. Levy, I. Dagan, A simple word embedding model for lexical substitution, in: Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 2015, pp. 1–7.
[39] O. Levy, Y. Goldberg, Neural word embedding as implicit matrix factorization, in: Advances in neural information processing systems, 2014, pp. 2177–2185.
[40] D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, B. Qin, Learning sentiment-specific word embedding for twitter sentiment classification, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1, 2014, pp. 1555–1565.
[41] J. Wang, L.-C. Yu, K. R. Lai, X. Zhang, Dimensional sentiment analysis using a regional cnn-lstm model, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2016, pp. 225–230.
[42] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computation 9 (8) (1997) 1735–1780.
[43] Y. Kim, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882.
[44] C.-N. Chou, C.-K. Shie, F.-C. Chang, J. Chang, E. Y. Chang, Representation learning on large and small data, Big Data Anal. Large-Scale Multimed. Search. Wiley, Hoboken, NJ (2019) 3–30.
[45] Y. Wan, Q. Gao, An ensemble sentiment classification system of twitter data for airline services analysis, in: Data Mining Workshop (ICDMW), 2015 IEEE International Conference on, IEEE, 2015, pp. 1318–1325.
[46] L.-C. Chen, J. T. Barron, G. Papandreou, K. Murphy, A. L. Yuille, Semantic image segmentation with task-specific edge detection using cnns and a discriminatively trained domain transform, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4545–4554.
... To provide a comparative analysis of results, we also include ML algorithms such as Support Vector Machines (SVM), Naïve Bayes (NB), Logistic Regression (LR), and Random Forest (RF). These ML algorithms are selected based on [5] that serves for comparison with the DL models. The existing paper focuses on sentiment classification of self-driving car data using SVM, NB, LR, RF, CNN, and LSTM models [5]. ...
... These ML algorithms are selected based on [5] that serves for comparison with the DL models. The existing paper focuses on sentiment classification of self-driving car data using SVM, NB, LR, RF, CNN, and LSTM models [5]. In this research paper, we aim to expand upon the existing work by incorporating additional DL models. ...
... A hybrid approach, Co-LSTM, which combines CNN and LSTM for sentiment classification of consumer reviews from social media across diverse domains is presented in [5]. The Co-LSTM model aims to handle big social data while being scalable and domain-independent. ...
Article
Sentiment Analysis (SA) is a crucial task in understanding public opinions and perceptions towards emerging technologies. In this study, we focus on SA for a self-driving car dataset as it provides valuable insights into public perceptions and opinions towards a transformative technology. The dataset consists of textual reviews associated with sentiment labels, providing insights into how people perceive self-driving car technology. Our objective is to analyze the sentiments expressed in these reviews using Deep Learning (DL) models, namely, Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Bidirectional GRU (BiGRU). We compared our results with an existing technique in the field of self-driving car sentiment classification, that implemented various Machine Learning (ML) and DL models, including Support Vector Machines (SVM), Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), CNN, and LSTM. In our study, we expanded upon this research by evaluating the performance of ANN, BiLSTM, GRU, and BiGRU. Results reveal that BiLSTM, GRU, and BiGRU exhibit superior performance in sentiment classification within the self-driving car dataset. These findings offer valuable insights into public sentiment towards self-driving cars, contributing significantly to the advancement of SA techniques in the domain of autonomous vehicles. Additionally, the results are statistically tested and are statistically significant.
... This enhancement allows the model to effectively learn and retain patterns over long periods when processing sequential data. Experimentally, it has been shown that the number of layers in a given LSTM model plays a crucial role in enhancing the performance of sequential data analysis [19]. In a restricted environment, as the number of layers in a network increases, its ability to discern intricate patterns and representations from the data grows. ...
... The comparison of various methods is tabulated in Table 1. Methods Outcomes [19] Consumer reviews analysis by topic modeling approach with CNN and LSTM. ...
Article
Full-text available
Sentimental Analysis is considered a computational strategy that helps in identifying and assessing the emotions of people via text documents. Tools and different methods have been adopted for determining both positive and negative emotions in the form of text data analytics by using Machine and Deep Learning techniques. Experimentally, it has been shown that the accuracy of existing text classification models such as Bi-LSTM, Decision Tree, and Ensemble Classifiers is limited by poor quality data, inappropriate hyperparameter tuning, and model-specific bias levels. Additionally, these models are prone to overfitting, high computational overhead, and longer training time. To overcome these limitations, we proposed a hybrid binary classification framework by combining Deep sequential features with the Random Forest (RF) technique. The approach is implemented in four phases: Initially, data preprocessing is performed by employing a Vader sentiment package. In the second step, the deep Long Short Term Memory (LSTM) model was employed to extract deep sequential features corresponding to sad and happy emotions. In the third phase, a bi-orthogonalization algorithm with principal component Analysis (PCA) and Singular Value Decomposition (SVD) was employed to minimize the redundancy and maximize the relevance of extracted features. Finally, a five-fold cross-validation technique was implemented to discriminate sad and happy emotions using the Random Forest (RF) algorithm. Eventually, a grid search approach was implemented for hyperparameter tuning and results were compared with five baseline algorithms (Vanilla LSTM (VLSTM), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Naïve Bayes (NB), Ada Boost Algorithm (ABA). 
The experimental outcomes revealed that the proposed model achieved an accuracy rate of 99.631% on the 4000 stories dataset which was superior to all five state-of-the-art methods with a margin of 4.63%, 10.7%, 19.44%, 21%, and 56.5%, respectively. Interestingly, the proposed model realized improved results in terms of other conventional performance metrics also such as precision, recall, specificity, and time complexity. Overall, the proposed model has great potential in educational institutions, child psychology research, and child-friendly content moderation, generally helping in the understanding of the emotions and experiences of children in the digital realm.
... The experiments indicated that LSTM are a good choice to address sentiment analysis task. The authors in [11] introduced their online system for Arabic sentiment analysis. They proposed a deep learning model based on CNN combined with LSTM. ...
Article
Full-text available
This research investigates the efficacy of ensemble learning within the field of Arabic sentiment analysis. Ensemble learning, which combines predictions from multiple models to enhance accuracy, has shown promising results when compared to individual models. Hence, we propose an ensemble learning model that integrates two robust models: a Bidirectional Long Short-Term Memory (BiLSTM) model and a Generative pre-trained transformers (GPT) model. The GPT model has previously demonstrated effectiveness in various Arabic natural language processing (ANLP) tasks. To examine the performance of our ensemble model, we separately trained the BiLSTM and transformer-based model using three different datasets. We combined the models by aggregating their final probabilities for each class. Through multiple experiments, we compared the effectiveness of the proposed ensemble model with the standalone models. The results clearly indicate that the ensemble learning models outperform the standalone models in Arabic sentiment analysis. Specifically, the proposed ensemble model that demonstrated an accuracy increase of nearly 7% when compared to the best standalone model.
... Picard introduced Speech Emotion Recognition (SER) in 1997, and it has received a lot of attention since then. Language is primarily constructed through speech, which plays a critical role in conveying not just significant semantic information also rich emotional nuances [1,2]. Goal of SER is to detect a user's emotional states through their speech, promoting harmonic communication among humans and machines, such as the smart home assistants discussed in this paper. ...
Article
Full-text available
With the expansion of Speech Emotion Recognition in the consumer domain, several devices, particularly those designed for managing smart home personal assistants for the elderly, have been widely available on the market. The increasing processing power and connection, together with the growing need to facilitate longer residency through technological interventions, highlight the potential benefits of smart home assistants. Enabling these assistants to recognize human emotions would greatly improve user-assistant communication, allowing the assistant to deliver more constructive and customized feedback to the user. In this research work, Modeling and Sentiment Analysis of Social Relationships in Elderly Smart Homes Based on Graph Neural Networks (SASR-MBHNN-BBOA) is proposed. The input data are collected from Social Recommendation Dataset. Then, input data are pre-processed utilizing Inverse Optimal Safety Filters (IOSF) for cleaning the data and removing the background noise. Then the pre-processed data are given to Memristive Bi-neuron Hopfield Neural Network (MBHNN) for predicting the sentiments like positive, negative and neutral. In general, MBHNN doesn’t express some adoption of optimization approaches for determining optimal parameters to predicting the sentiments accurately. Hence BBOA is proposed to optimize MBHNN classifier which precisely predicts the sentiments in elderly smart home. The proposed SASR-MBHNN-BBOA method is implemented in Python, and it assessed with numerous performance metrics such as accuracy, precision, recall, F1-score, ROC. 
The outcomes show SASR-MBHNN-BBOA attains 20.8%, 19.5%, and 29.6% higher Accuracy, 28.8%, 22.5%, and 32.6% higher Precision, 15.5%, 27.4%, and 18.2% higher Recall are analysed with existing methods such as, Emotional speech analysis in real time for smart home assistants.(SASR-CNN-SHA), Machine Learning to Investigate Elderly Care Requirements in China via the Lens of Family Caregivers (SASR-ML-IECR),Identifying User Emotions via Audio Conversations with Smart Assistants (SASR-DNN-EASA) methods respectively.
Article
Sentence clustering plays a central role in various text-processing activities and has received extensive attention for measuring semantic similarity between compared sentences. However, relatively little focus has been placed on evaluating clustering performance using available similarity measures that adopt low-dimensional continuous representations. Such representations are crucial in domains like sentence clustering, where traditional word co-occurrence representations often achieve poor results when clustering semantically similar sentences that share no common words. This article presents a new implementation that incorporates a sentence similarity measure based on the notion of embedding representation for evaluating the performance of three types of text clustering methods: partitional clustering, hierarchical clustering, and fuzzy clustering, on standard textual datasets. This measure derives its semantic information from pre-training models designed to simulate human knowledge about words in natural language. The article also compares the performance of the used similarity measure by training it on two state-of-the-art pre-training models to investigate which yields better results. We argue that the superior performance of the selected clustering methods stems from their more effective use of the semantic information offered by this embedding-based similarity measure. Furthermore, we use hierarchical clustering, the best-performing method, for a text summarization task and report the results. The implementation in this article demonstrates that incorporating the sentence embedding measure leads to significantly improved performance in both text clustering and text summarization tasks.
Article
Full-text available
Online product recommendation has gained much popularity in recent years and has become the most demanding research area that can help consumers make better purchasing decisions. Recently, many machine learning techniques have been tested on various datasets for analyzing customer sentiments through online portals. Still, customers have difficulty finding profound products due to a lack of depth-level recommendations. The existing models for product recommendation may rely on either text or image reviews and ignore the multus-medium based reviews that lead to a poor recommendation. Furthermore, the recommendation system does not properly utilize product ranking. To effectively analyze the sentiment of online products, a novel framework ASIF is suggested in this manuscript. The key steps of the proposed framework are multi-modal data collection, normalization, text and image-based feature extraction, two-level feature fusion and extended transfer learning-based recommendation at binary level and multilevel. Five different datasets have been used for the evaluation of ASIF. From the experimental analysis and comparison to the baseline methods, it has been observed that the accuracy, precision, recall and F-Score of ASIF is far better, giving 95.95% and 94.95% on the standard dataset.
Article
Full-text available
Sentiment analysis has emerged as one of the most popular natural language processing (NLP) tasks in recent years. A classic setting of the task mainly involves classifying the overall sentiment polarity of the inputs. However, it is based on the assumption that the sentiment expressed in a sentence is unified and consistent, which does not hold in the reality. As a fine-grained alternative of the task, analyzing the sentiment towards a specific target and aspect has drawn much attention from the community for its more practical assumption that sentiment is dependent on a particular set of aspects and entities. Recently, deep neural models have achieved great successes on sentiment analysis. As a functional simulation of the behavior of human brains and one of the most successful deep neural models for sequential data, long short-term memory (LSTM) networks are excellent in learning implicit knowledge from data. However, it is impossible for LSTM to acquire explicit knowledge such as commonsense facts from the training data for accomplishing their specific tasks. On the other hand, emerging knowledge bases have brought a variety of knowledge resources to our attention, and it has been acknowledged that incorporating the background knowledge is an important add-on for many NLP tasks. In this paper, we propose a knowledge-rich solution to targeted aspect-based sentiment analysis with a specific focus on leveraging commonsense knowledge in the deep neural sequential model. To explicitly model the inference of the dependent sentiment, we augment the LSTM with a stacked attention mechanism consisting of attention models for the target level and sentence level, respectively. In order to explicitly integrate the explicit knowledge with implicit knowledge, we propose an extension of LSTM, termed Sentic LSTM. The extended LSTM cell includes a separate output gate that interpolates the token-level memory and the concept-level input. 
In addition, we propose an extension of Sentic LSTM by creating a hybrid of the LSTM and a recurrent additive network that simulates sentic patterns. In this paper, we are mainly concerned with a joint task combining the target-dependent aspect detection and targeted aspect-based polarity classification. The performance of proposed methods on this joint task is evaluated on two benchmark datasets. The experiment shows that the combination of proposed attention architecture and knowledge-embedded LSTM could outperform state-of-the-art methods in two targeted aspect sentiment tasks. We present a knowledge-rich solution for the task of targeted aspect-based sentiment analysis. Our model can effectively incorporate the commonsense knowledge into the deep neural network and be trained in an end-to-end manner. We show that the two-step attentive neural architecture as well as the proposed Sentic LSTM and H-Sentic-LSTM can achieve an improved performance on resolving the aspect categories and sentiment polarity for a targeted entity in its context over state-of-the-art systems.
Conference Paper
Full-text available
In this paper, we propose a solution to targeted aspect-based sentiment analysis. We augment the long short-term memory (LSTM) network with a hierarchical attention mechanism consisting of a target-level attention and a sentence-level attention. Commonsense knowledge of sentiment- related concepts is incorporated into the end-to-end training of a deep neural network for sentiment classification. In order to tightly integrate the commonsense knowledge into the recurrent encoder, we propose an extension of LSTM, termed Sentic LSTM. We conduct experiments on two publicly released datasets, which show that the combination of the proposed attention architecture and Sentic LSTM can outperform state-of-the-art methods in targeted aspect sentiment.
Article
Full-text available
In a social network, small or large communities within the network play a major role in deciding the functionalities of the network. Despite of diverse definitions, communities in the network may be defined as the group of nodes that are more densely connected as compared to nodes outside the group. Revealing such hidden communities is one of the challenging research problems. A real world social network follows small world phenomena, which indicates that any two social entities can be reachable in a small number of steps. In this paper, nodes are mapped into communities based on the random walk in the network. However, uncovering communities in large-scale networks is a challenging task due to its unprecedented growth in the size of social networks. A good number of community detection algorithms based on random walk exist in literature. In addition, when large-scale social networks are being considered, these algorithms are observed to take considerably longer time. In this work, with an objective to improve the efficiency of algorithms, parallel programming framework like Map-Reduce has been considered for uncovering the hidden communities in social network. The proposed approach has been compared with some standard existing community detection algorithms for both synthetic and real-world datasets in order to examine its performance, and it is observed that the proposed algorithm is more efficient than the existing ones.
Article
Full-text available
Deep learning techniques for Sentiment Analysis have become very popular. They provide automatic feature extraction and both richer representation capabilities and better performance than traditional feature based techniques (i.e., surface methods). Traditional surface approaches are based on complex manually extracted features, and this extraction process is a fundamental question in feature driven methods. These long-established approaches can yield strong baselines, and their predictive capabilities can be used in conjunction with the arising deep learning methods. In this paper we seek to improve the performance of deep learning techniques integrating them with traditional surface approaches based on manually extracted features. The contributions of this paper are sixfold. First, we develop a deep learning based sentiment classifier using a word embeddings model and a linear machine learning algorithm. This classifier serves as a baseline to compare to subsequent results. Second, we propose two ensemble techniques which aggregate our baseline classifier with other surface classifiers widely used in Sentiment Analysis. Third, we also propose two models for combining both surface and deep features to merge information from several sources. Fourth, we introduce a taxonomy for classifying the different models found in the literature, as well as the ones we propose. Fifth, we conduct several experiments to compare the performance of these models with the deep learning baseline. For this, we use seven public datasets that were extracted from the microblogging and movie reviews domain. Finally, as a result, a statistical study confirms that the performance of these proposed models surpasses that of our original baseline on F1-Score.
Article
With the recent development of deep learning, research in AI has gained new vigor and prominence. While machine learning has succeeded in revitalizing many research fields, such as computer vision, speech recognition, and medical diagnosis, we are yet to witness impressive progress in natural language understanding. One of the reasons behind this unmatched expectation is that, while a bottom-up approach is feasible for pattern recognition, reasoning and understanding often require a top-down approach. In this work, we couple sub-symbolic and symbolic AI to automatically discover conceptual primitives from text and link them to commonsense concepts and named entities in a new three-level knowledge representation for sentiment analysis. In particular, we employ recurrent neural networks to infer primitives by lexical substitution and use them for grounding common and commonsense knowledge by means of multi-dimensional scaling.
Article
Online social networks have emerged as new platform that provide an arena for people to share their views and perspectives on different issues and subjects with their friends, family, relatives, etc. We can share our thoughts, mental state, moments, stand on specific social, national, international issues through text, photos, audio and video messages and posts. Indeed, despite the availability of other forms of communication, text is still one of the most common ways of communication in a social network. The target of the work described in this paper is to detect and analyze sentiment and emotion expressed by people from text in their twitter posts and use them for generating recommendations. We collected tweets and replies on few specific topics and created a dataset with text, user, emotion, sentiment information, etc. We used the dataset to detect sentiment and emotion from tweets and their replies and measured the influence scores of users based on various user-based and tweet-based parameters. Finally, we used the latter information to generate generalized and personalized recommendations for users based on their twitter activity. The method we used in this paper includes some interesting novelties such as, (i) including replies to tweets in the dataset and measurements, (ii) introducing agreement score, sentiment score and emotion score of replies in influence score calculation, (iii) generating general and personalized recommendation containing list of users who agreed on the same topic and expressed similar emotions and sentiments towards that particular topic.
Chapter
Extracting useful features from a scene is an essential step in any computer vision and multimedia data analysis task. The approaches in feature extraction can be divided into two categories: model‐centric and data‐driven. This chapter focuses on how neural networks, specifically convolutional neural networks (CNNs), achieve effective representation learning. It reviews representative CNN models proposed since 2012. The chapter deals with the small data problem. It presents how features learned from one source domain with big data can be transferred to a different target domain with small data. Deep learning has its roots in neuroscience. CNNs are composed of two major components: feature extraction and classification. The common practice of transfer representation learning is to pretrain a CNN on a very large dataset and then to use the pretrained CNN as either an initialization or a fixed feature extractor for the task of interest.
Article
Recently, owing to their ability to handle sequences of different lengths, neural networks, especially long short-term memory (LSTM) networks, have achieved great success in sentiment classification and are now widely used for it. However, one of the remaining challenges is to model long texts so as to exploit the semantic relations between sentences in document-level sentiment classification. Existing neural network models are not powerful enough to capture sufficient sentiment information over relatively long time-steps. To address this problem, we propose a new neural network model (SR-LSTM) with two hidden layers. The first layer learns sentence vectors that represent the semantics of sentences with a long short-term memory network, and in the second layer, the relations between sentences are encoded in the document representation. Further, we also propose an approach to improve it, which first cleans the datasets by removing sentences with less emotional polarity to provide better input for our model. The proposed models outperform the state-of-the-art models on three publicly available document-level review datasets.
Article
In an era of social media and connectivity, web users are becoming increasingly enthusiastic about interacting, sharing, and working together through online collaborative media. More recently, this collective intelligence has spread to many different areas, with a growing impact on everyday life, such as in education, health, commerce and tourism, leading to an exponential growth in the size of the social Web. However, the distillation of knowledge from such unstructured big data is an extremely challenging task. Consequently, the semantic and multimodal contents of the Web today, whilst well suited for human use, are still barely accessible to machines. In this work, we explore the potential of a novel semi-supervised learning model based on the combined use of random projection scaling as part of a vector space model, and support vector machines to perform reasoning on a knowledge base. The latter is developed by merging a graph representation of commonsense with a linguistic resource for the lexical representation of affect. Comparative simulation results show a significant improvement in tasks such as emotion recognition and polarity detection, and pave the way for development of future semi-supervised learning approaches to big social data analytics.
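As a rough illustration of the random projection idea mentioned in the abstract above, the following minimal sketch reduces high-dimensional vectors with a Gaussian random matrix and checks how well a pairwise distance is preserved. It is not the cited authors' method: the abstract does not specify how their "random projection scaling" is defined, so the Gaussian construction, the dimensions, and the helper names (`random_projection_matrix`, `project`, `euclidean`) are all assumptions made for illustration.

```python
import math
import random


def random_projection_matrix(n_in, n_out, seed=0):
    """Build an n_out x n_in matrix with i.i.d. N(0, 1/n_out) entries.

    Assumption: a plain Gaussian random projection, chosen for illustration.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)]
            for _ in range(n_out)]


def project(matrix, vector):
    """Map a vector into the lower-dimensional space (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: two random 1000-dimensional vectors reduced to 200 dims.
rng = random.Random(1)
n_in, n_out = 1000, 200
a = [rng.random() for _ in range(n_in)]
b = [rng.random() for _ in range(n_in)]

R = random_projection_matrix(n_in, n_out)
a_low, b_low = project(R, a), project(R, b)

# Ratio of projected to original distance; values near 1 mean the
# projection approximately preserved the distance between a and b.
ratio = euclidean(a_low, b_low) / euclidean(a, b)
print(f"reduced dimension: {len(a_low)}, distance ratio: {ratio:.3f}")
```

The reduced vectors could then be passed to a downstream classifier such as a support vector machine; that step is omitted here to keep the sketch dependency-free.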