Co-LSTM: Convolutional LSTM model for sentiment analysis in social big data

January 2021
Information Processing & Management 58(1):102435

DOI:10.1016/j.ipm.2020.102435

Authors:

Ranjan Kumar Behera

National Institute of Technology Rourkela

Monalisa Jena

Fakir Mohan University

Santanu Kumar Rath

National Institute of Technology Rourkela

Sanjay Misra

Institute for Energy Technology

Co-LSTM: Convolutional LSTM Model for Sentiment Analysis in Social Big Data

Ranjan Kumar Behera¹, Monalisa Jena², Santanu Kumar Rath³, Sanjay Misra⁴

¹,³ Department of Computer Science & Engineering, National Institute of Technology, Rourkela, India, 769008

² Department of Information and Communication Technology, F. M. University, Balasore, Odisha, India

⁴ Department of Electrical and Information Engineering, Covenant University, Ota 1023, Nigeria

⁴ Department of Computer Engineering, Atilim University, Ankara, Turkey

jranjanb.19@gmail.com¹, bmonalisa.26@gmail.com², skrath@nitrkl.ac.in³, sanjay.misra@covenantuniversity.edu.ng⁴

Abstract

Analysis of consumer reviews posted on social media is essential for several business applications. Consumer reviews posted on social media are increasing at an exponential rate in both number and relevance, which leads to big data. In this paper, a hybrid of two deep learning architectures, namely the Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM, an RNN with memory), is suggested for sentiment classification of reviews posted in diverse domains. Deep convolutional networks have been highly effective in local feature selection, while recurrent networks (LSTM) often yield good results in the sequential analysis of long text. The proposed Co-LSTM model has two main objectives in sentiment analysis. First, it is highly adaptable in examining big social data, keeping scalability in mind; second, unlike conventional machine learning approaches, it is not tied to any particular domain. The experiment has been carried out on four review datasets from diverse domains to train a model that can handle all kinds of dependencies that usually arise in a post. The experimental results show that the proposed ensemble model outperforms other machine learning approaches in terms of accuracy and other parameters.

Keywords: Deep Learning; Big Data; Sentiment Analysis; Word Embedding; RNN; CNN; LSTM

1. Introduction

Social media provides an extraordinary platform for big data analytics in various real-world applications. A massive amount of data is continuously generated as users post their views or opinions while communicating with each other through social platforms such as Twitter, Facebook, Myspace, etc. Social data, a form of big data generated from various social channels, exhibits all three big data characteristics: velocity, heterogeneity, and large volume. Apart from these, it possesses a unique characteristic known as semantic, which refers to the fact that it is generated manually and contains symbolic information with inherently subjective meaning. This unique characteristic of social big data leads to several challenges and opportunities for sentiment analysis. Sentiment analysis (SA) has been an emerging research direction since the early 2000s. Various terms such as opinion mining, sentiment classification, review mining, sentiment mining, and opinion extraction are also used for sentiment analysis. It is a way of predicting attitudes towards products or social entities from sentiments. The sources for sentiment analysis vary from textual to visual representations. The sentiments expressed in social media are a valuable source for modeling business strategies to achieve business goals. Sentiment analysis is often used for managing the online reputation of a specific product or brand. However, as the amount of data in social media repositories is increasing at an exponential rate, traditional algorithms often fail to extract sentiments from such big data. Affective computing is one of the emerging research applications of sentiment analysis, which is able to capture public sentiment automatically from social media posts [1]. Sentiment analysis can be treated as a classification task, as it classifies the orientation of a text as positive, negative, or neutral. The widely adopted approaches to sentiment analysis of unstructured big data can be categorized as lexicon-based, linguistic-based, or machine-learning-based.

Preprint submitted to Journal of Information Processing and Management November 12, 2020

The classification task involved in SA can be categorized into four different domains: subjectivity classification, word sentiment classification, document sentiment classification, and opinion extraction. Subjectivity classification intends to classify sentences as subjective or objective. A subjective sentence expresses an opinion about a topic or subject, whereas an objective sentence conveys factual information. It is one of the sub-categories of sentence-level sentiment analysis. In document-level classification, the whole document is treated as a unit for sentiment analysis. The techniques involved in sentence-level sentiment analysis are not different from those of document-level SA, as sentences can also be treated as mini-documents. Word sentiment classification determines the polarity of the sentiment associated with a particular word.

Sentiment analysis can be categorized based on the dataset used for processing. The major sources of data are public reviews associated with a product, organization, movie, or any other social entity. These reviews are important to business analytics, as they play a vital role in decisions about products. Sentiment analysis is not only applied to product reviews but can also be applied to stock prediction, movie reviews, news articles, or political debates. For example, in political debates, it may be desirable to figure out the opinions of voters on certain electoral candidates or political parties. Predicting the sentiment of users from their political posts may heavily influence election results. Various micro-blogging and social network websites are rich sources of information, as people freely post their opinions and thoughts about a certain topic, and these can be used as valuable resources in sentiment analysis. In this paper, social media reviews from various domains (airline, movies, self-driving cars, and election data) are considered for modeling the architecture for sentiment analysis. Severyn and Moschitti [2] have shown that traditional machine learning algorithms perform well on classification and regression tasks for small datasets. However, deep networks are more suitable for processing big social media data, especially in the area of text classification. Wang et al. [3] and Yin et al. [4] observed that both feature detection and dependency capturing for long sentences are necessary to classify a sentence accurately.

The major contributions of this paper are as follows:

• An effort has been made to develop an effective deep neural network architecture for sentiment analysis that can process big social data in a scalable manner without compromising performance. To this end, the functionality of both CNN and RNN is leveraged in the proposed Co-LSTM model for sentiment classification. The CNN is mainly used to extract features automatically from big social data, in place of the manual intervention required by traditional machine learning models. Such models are able to tune the hyper-parameters of the classifier automatically, which makes the model scalable enough to handle big data. In the proposed approach, the CNN is used for better feature extraction through a pooling process, and the LSTM is adopted for capturing the long-term dependencies among words in a sentence.

• The second contribution is a sentiment analysis model that is domain-independent. To this end, we have trained the deep learning architecture using reviews from four different domains in which almost all kinds of word dependencies exist, and evaluated the performance separately for each dataset. Reviews from the movie, airline, self-driving car, and presidential election domains have been considered in order to develop a generalized classifier that does not need any domain-specific knowledge.

The remaining sections of the paper are organized as follows. Section 2 discusses the motivation for the hybridization of CNN and LSTM. Section 3 presents the literature survey on the techniques involved in sentiment analysis. The methodology adopted in the study is presented in section 4. The step-by-step process of the proposed algorithm is discussed in section 5. Evaluation parameters for the algorithm are discussed in section 6. The implementation and results are discussed in section 7. Section 8 presents the conclusion and future work. Section 9 points out a few threats to the validity of the work.

2. Motivation

The motivation for the research work may be described as follows:

• In the present world of digitization, social media available on the web is a big source of customer interactions and reviews. Sentiment analysis of such a huge amount of data helps to identify and track customer behavior regarding products, services, or brands [5]. Customer feedback is essential in the decision-making process. For example, customer reviews of an e-commerce product can help a new user decide on the product before buying it. The same applies to movie reviews, as they help users decide which movie to watch. Also, a business can study the sentiments about a specific product or brand in specific demographic areas to identify potential customers or the business potential of a new product or service in that area. Thus sentiment analysis helps to enhance the business of an enterprise. Likewise, there are several applications of SA that are helpful in our day-to-day activities [6].

• Opinion mining or sentiment analysis in social media has some major hurdles associated with it. One of the biggest challenges is the authentication of the end user, as unauthenticated users may introduce noise into the acquired data. Another major hurdle is inconsistency in social media data. The expression of sentiments and wording styles vary from person to person. People sometimes use shorthand notations, which make it difficult for the classifier to properly distinguish between word features. For example, ‘us’ can stand for ‘United States of America’ as well as being a pronoun, so the classifier might confuse ‘us’ as a pronoun with ‘us’ as a name or noun. Proper grammar and spelling conventions are often not followed when writing reviews on social media. Sometimes people use acronyms, which make the analysis more complicated. Social media sentiment analysis poses several challenges in handling noise such as special characters, informal words, etc. Apart from that, the data also contains sentences that involve sarcasm, different kinds of negation, ambiguous words, multi-polarity words, etc. Some solutions for handling sarcasm in sentiment analysis are described in the work presented by Maynard et al. [7]. The other major challenge is cleaning and preprocessing the sheer volume of data based on the context of the reviews. Feature engineering of such data and proper transformation in the preprocessing phase require domain-specific knowledge, which makes this a cumbersome task. We have also been motivated by a few papers on domain-independent sentiment analysis authored by Biyani et al. [8] and Bagheri et al. [9].

• Social big data is a potential resource for sentiment analysis, as it involves human sentiments on a specific topic or product. It involves a lot of sarcasm and many dependencies, which need to be exploited to predict sentiments accurately. It also consists of short texts and proverbs, for which the actual sentiment is quite challenging to predict. A number of statistical learning approaches already exist for sentiment analysis. However, their performance depends heavily on the quality of the features extracted from the review. They usually require expertise in feature engineering and are also expensive in terms of computational time and space. A neural network can reduce the burden of feature engineering. A CNN is able to exploit parallelism in extracting local correlations and patterns from the text, as the computation at one time step does not depend on the computation at the previous time step. In this paper, we have adopted a CNN for better feature engineering on big social review data. However, a CNN may not be suitable for capturing contextual information from a given review, as it does not remember the past context. We have therefore adopted LSTM, which is suitable for capturing temporal contextual information. It is well suited to capturing the dependencies between words inside the reviews and is mainly used for sequence prediction.

Other architectures, such as the simple multi-layer perceptron (MLP) and the probabilistic neural network, can be used for feature extraction, but they are not suitable for processing a large set of big social review data. These architectures are also not suitable for capturing sequential dependencies, which are essential for sentiment classification. In the second phase, a simple RNN could be used for classification, but it suffers from the vanishing gradient problem, which makes it quite difficult to train on problems that require long-term temporal dependencies. This has motivated us to use a CNN in the first phase and an LSTM in the second phase.

3. Related Work

In sentiment analysis, the given text or review is analyzed to capture the prevalent emotional opinion within it and to identify the reviewer's attitude as positive, negative, or neutral. Technically, it is the process of extracting the sentiment orientation of a text unit by using Natural Language Processing (NLP), statistics, or machine learning methods. Sentiment analysis plays a crucial role in social media monitoring, as it captures public opinion about certain topics. Some of the pioneering works related to sentiment analysis are presented below:

Dos Santos and Gatti [10] have proposed an efficient CNN model that exploits character-level to sentence-level information to classify the emotional level of short text. Their model consists of two CNN layers, which they named the character Convolutional Neural Network (ChCNN). Zhou et al. [11] have proposed a bi-directional LSTM model for sentiment analysis in which a two-dimensional pooling layer has been adopted. Their experiments on the Stanford Sentiment Treebank (SST) dataset resulted in 88.7% accuracy. Ma et al. [12] have proposed an extension of the LSTM model, termed Sentic LSTM, for targeted aspect-based sentiment analysis. Their work mainly concerns combining the tasks of target-dependent aspect detection and aspect-based polarity classification. In another work, they have embedded commonsense knowledge in the recurrent encoder for targeted sentiment analysis [13]. Their model is a hybridization of the attention architecture and Sentic LSTM. Wang et al. [14] have proposed a hybrid of CNN and RNN for opinion analysis of sentences. As the CNN model is independent of the location of a word in the sentence, the two models are arranged in layers, i.e., the output of the CNN is fed as input to the RNN. Rao et al. [15] have proposed a document-level sentiment analysis that captures the semantic relationships between the sentences in a document. They have proposed SSR-LSTM and SR-LSTM, which are based on deep recurrent neural networks.

Hussain et al. [16] have shown the potential of a semi-supervised model that hybridizes random projection scaling and support vector machines to perform reasoning on big social media data. Their model seems to be quite suitable for extracting semantic information from emoticon representations and for polarity identification in knowledge based on big social data. Cambria et al. [17] have developed a three-level representation for sentiment analysis termed SenticNet 5, which is able to discover conceptual primitives automatically and embeds commonsense knowledge. An ensemble of top-down and bottom-up learning has been embedded in SenticNet 6, which is based on symbolic and subsymbolic AI [18]. They have trained their model using the WordNet-Affect emoticon list, which is freely available on the Internet.

Sentiment analysis of customer reviews is based on a procedure that may be called a dichotomous one. The procedures followed can be categorized into three types: supervised methods, lexicon-based methods, and semantic-based methods. These are described as follows:

3.1. Supervised methods

In supervised methods, the sentiments of reviews are predicted based on the labelled sentiments associated with the available review data [19]. The overall procedure is to predict sentiments using a classification model built with different machine learning techniques, which is trained on the available data after proper feature engineering. Qiang et al. [20] have presented a comparison of different supervised machine learning techniques for sentiment classification of travel destination reviews in the USA. Different techniques are available for feature engineering and data transformation, such as n-grams [21], Part-Of-Speech (POS) tagging [19], methods based on semantic patterns [22], and word-based semantic concepts [23].
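As an illustration of the n-gram features mentioned above, the following minimal sketch extracts contiguous word n-grams from a whitespace-tokenized review. The function name `ngrams` and the sample review are hypothetical and are not taken from the cited works.

```python
def ngrams(tokens, n):
    """Return the contiguous n-token tuples of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical review, tokenized on whitespace.
review = "the flight was not good".split()

unigrams = ngrams(review, 1)
bigrams = ngrams(review, 2)
print(bigrams)
```

Bigrams such as ('not', 'good') keep a negation next to the word it modifies, which single-word features cannot do.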

The major limitation of supervised learning methods is that they are domain-specific, i.e., classifier models trained on restaurant reviews may not work well on movie or product reviews [24]. Another limitation is that the classifier needs a large amount of training data to cover all possible cases. Araque et al. [25] have proposed deep learning-based ensemble techniques for sentiment classification in social applications. In their work, they hybridized surface classifiers with linear machine learning algorithms. The feature processing has been carried out by combining deep and surface features from different domains.

3.2. Lexicon based methods

Lexicon-based methods use the sentiment orientation of the words or phrases in a review to evaluate the overall sentiment score. Based on the obtained sentiment score, the review is labeled as either positive or negative. Hence, lexicon-based methods rely on counting sentiment lexicon entries rather than on training with data. The model is more effective if the lexicon dictionary covers a larger number of words. There exist various built-in dictionaries of terms with associated sentiment orientations, such as SentiWordNet [26], the MPQA subjectivity lexicon [27], and the LIWC lexicon [28]. The major disadvantage of this approach is the cost of looking up the sentiment orientation of each word in the dictionary. Also, the sentiment orientation of a word may vary from domain to domain. This problem can be tackled if the sentiment orientation of a word is considered with respect to the semantics of its context [29]. But in most lexicon-based approaches, the presence of syntactical features or words explicitly reflects the sentiment, independent of the context in the document.

Deep learning has been a popular trend in sentence-level sentiment analysis. Yoon et al. [30] proposed a multi-channel lexicon-based model which hybridizes CNN with bidirectional LSTM for sentiment classification. Their work uses a multi-channel embedding layer based on the Word2Vec model. The performance of their model depends on the set of rules extracted from the sentiment orientation of the lexicon entries present in the context, which is domain-dependent. In this paper, the proposed hybridized model is domain-independent, and a training-based approach is adopted for sentiment analysis rather than a lexicon-based approach. A normal LSTM model is adopted instead of a bidirectional LSTM, as bidirectional LSTMs are found to be more complex and need huge computational power. They also need to scan the entire review text to capture the context dependency, which makes them computationally inefficient when processing huge volumes of social media reviews.
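The lexicon-based scoring described at the start of this subsection can be sketched as follows. This is only an illustration: `TOY_LEXICON` and its orientation values are invented for the example and do not come from SentiWordNet, MPQA, or LIWC, and treating a zero score as negative is an arbitrary choice.

```python
# Hypothetical toy lexicon: word -> sentiment orientation.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the orientations of all lexicon words that occur in `text`."""
    return sum(lexicon.get(token, 0.0) for token in text.lower().split())


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Label a review from its score; a score of zero is treated as negative here."""
    return "positive" if lexicon_score(text, lexicon) > 0 else "negative"


print(lexicon_label("Great food but bad service"))
print(lexicon_label("not good"))
```

The second call shows the context problem noted above: "not good" is scored from the word "good" alone, so a pure word-counting approach labels it positive.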

3.3. Semantic based methods

Various types of semantic-based sentiment approaches have been proposed by several authors, which can broadly be classified as conceptual semantic and contextual semantic [31]. In contextual semantics, also known as statistical semantics, the co-occurrence patterns of words in the text are used to evaluate the semantics [32]. In conceptual semantics, external semantic knowledge bases such as semantic networks are used with natural language processing to conceptually represent the words that convey sentiments. SenticNet [33] is an example of a conceptual lexicon for sentiment analysis. Although conceptual semantic approaches have outperformed the contextual approaches in many cases, they are limited to the domain of their knowledge base. Wei et al. [34] have proposed a semantic approach for clustering words in a text based on lexical chains and the WordNet model. In their work, WordNet is integrated with lexical chains to exploit the ontological structure for capturing the semantic relationships between the words in a cluster.

3.4. Research questions

In this paper, the following research challenges have been identified, and an effort has been made to resolve them using deep learning algorithms.

RQ1: Review data in social media often contains noisy elements such as incorrect spellings, grammatical errors, product ids, and hyperlinks. Sometimes reviews are rich in emoticons, which makes sentiment analysis more difficult. Emoticons are not natural-language text. They are textual symbols consisting of various characters that represent a specific smiley face. Each of them is associated with some kind of emotion (happy, sad, irritated, etc.). Handling emoticons is more challenging than handling text that represents emotions. Sailunaz et al. [35] have presented a model for sentiment analysis that can classify sentences based on the emoticons associated with the text. Emoticons are essential elements of short texts and small reviews. Is the classifier model able to handle noisy data and emoticons?

To address this issue, proper feature engineering is desirable before training the classifier models. A huge text corpus has been used as a reference to identify incorrect spellings in the reviews. Google-1 (Billion Word Corpus) [36] has been used to handle misspelled words. Each misspelled word is replaced with a word from the corpus that is at a distance of one or two letters from it. All numerical digits have been replaced with the newly introduced word “digit”. Hyperlinks are filtered out using a regular expression. Emojis are handled through the package known as emoji-sentiment-lexicon, which replaces the emoticons present in the review text. The LSTM architecture in the proposed model is able to capture the context in which emoticons are used in the reviews.
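Two of the cleaning steps above, hyperlink filtering and digit replacement, can be sketched with Python's standard `re` module. The patterns `URL_RE` and `NUM_RE` and the function `clean_review` are assumptions made for illustration, since the paper does not give its regular expressions. The sketch also assumes that each run of digits becomes a single "digit" token. Spelling correction and emoticon handling are not covered.

```python
import re

# Assumed patterns; the original expressions are not given in the paper.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")


def clean_review(text):
    """Remove hyperlinks, replace numbers with the word 'digit', tidy whitespace."""
    text = URL_RE.sub(" ", text)        # filter out hyperlinks first
    text = NUM_RE.sub(" digit ", text)  # each run of digits becomes one token
    return " ".join(text.split())       # collapse repeated whitespace


print(clean_review("Paid 250 dollars, see http://example.com/x now"))
```

Hyperlinks are removed before digits are replaced so that digits inside a URL do not leave stray "digit" tokens behind.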

RQ2: The processed string cannot be fed directly to a model for training, as most learning algorithms require numerical vectors as input. Traditional approaches for converting a string into numerical values, such as Tf-Idf [37] or one-hot encoding, assign an arbitrary numerical index to a word or phrase. Such a numerical vector may not be able to capture the actual context involved in the text. The research question may be framed as: how can the context of words or phrases be captured in their numerical representation?

In this paper, a word embedding layer is used to create a numerical feature matrix that captures the actual context present in the review text. Each word is assigned a one-dimensional numerical vector that is self-trainable. Here the numerical vector is learned over several training steps rather than assigned arbitrarily.
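The lookup performed by a word embedding layer can be sketched as below. All names (`build_vocab`, `init_embeddings`, `feature_matrix`), the vector size, and the padding scheme are assumptions for illustration. The vectors are only given small random starting values here; the training steps that would adjust them are not shown.

```python
import random


def build_vocab(reviews):
    """Map each distinct token to an integer index; index 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for review in reviews:
        for token in review.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def init_embeddings(vocab_size, dim, seed=0):
    """One small random vector per index; training would later update these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def feature_matrix(review, vocab, embeddings, max_len):
    """Stack the vectors of a review's tokens into a max_len x dim matrix.

    Unknown tokens fall back to the padding index, which is a simplification.
    """
    ids = [vocab.get(token, 0) for token in review.lower().split()][:max_len]
    ids += [0] * (max_len - len(ids))
    return [embeddings[i] for i in ids]


reviews = ["great movie", "the flight was late"]
vocab = build_vocab(reviews)
embeddings = init_embeddings(len(vocab), dim=4)
matrix = feature_matrix("great flight", vocab, embeddings, max_len=5)
print(len(matrix), len(matrix[0]))
```

Each row of `matrix` is the vector of one token, so a review becomes a fixed-size feature matrix that a convolutional layer can consume.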

RQ3: The feature matrix obtained from the word embedding layer is passed through the convolutional neural network. The output of the convolutional layer is then provided as input to the neural architecture to predict the sentiment as positive or negative. Most conventional classification models treat every feature of an input independently, which does not hold for reviews written by humans. How can the dependency between the words in a sentence be captured for predicting actual sentiments?

To capture the sequential dependency or semantic representation of a review, a Long Short Term Memory (LSTM) layer is used. LSTM seems to be able to capture the long-term dependency of words in the text owing to its architecture, which maintains a memory in each unit.

4. Background Details

4.1. Word Embedding Techniques

Word embedding is the technique of converting text into numbers so that it can be used as input to machine learning algorithms [38]. The same text may be converted to different numerical formats by different procedures, depending on the context in which it is used. The word embedding process is quite important in text processing, as most machine learning and neural network techniques operate only on numbers, not on plain text. Technically, a word embedding method maps a word to a vector, based on a dictionary, which may be trained over a text corpus using a neural network. The vector representation of a word can be of various types. One-hot encoding is a popular vector representation of words that consists of binary numbers only. In this representation, if the position of a word in a sentence is n, the nth position of the vector corresponding to the word is one, and the remaining values are zero. For example, considering the sentence “social media research”, the one-hot encoded vector for ‘media’ will be [0, 1, 0], since the word ‘media’ occurs only in the second position of the sentence.

Word embedding techniques can be categorized into two classes: frequency-based embedding and prediction-based embedding. Frequency-based word embedding techniques are based upon how frequently a word is used in the sentence [39]. The count vectorizer, the Tf-Idf vectorizer, and the co-occurrence matrix are examples of frequency-based techniques [40]. Prediction-based techniques use previous information and neural network models to prepare the word vector based on the context [31]. The CBOW (continuous bag of words) and skip-gram models are examples of this category [33].
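The one-hot example above, together with a simple count vector as one frequency-based representation, can be reproduced with a few lines of Python. The function names and the second sample sentence are illustrative assumptions.

```python
from collections import Counter


def one_hot(word, sentence):
    """Position-based one-hot vector of `word` within `sentence`, as described above."""
    return [1 if token == word else 0 for token in sentence.split()]


def count_vector(sentence, vocabulary):
    """Frequency-based vector: how often each vocabulary word occurs in `sentence`."""
    counts = Counter(sentence.split())
    return [counts[word] for word in vocabulary]


print(one_hot("media", "social media research"))
print(count_vector("social media loves social data", ["social", "media", "research"]))
```

Neither vector says anything about the meaning of a word, which is the gap that prediction-based embeddings aim to close.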

4.2. Deep Learning Techniques

Deep learning is a representation learning technique that can itself process the raw input into a form suitable for classification or regression, eliminating the feature engineering required by conventional machine learning techniques. There are various deep learning models, such as the convolutional neural network (CNN), the probabilistic neural network (PNN), the recurrent neural network (RNN), etc.

4.2.1. Convolutional Neural Network (CNN)

CNN generally operates based on the convolution and sub-sampling process carried out through a series

of layers [31]. It is then followed by one or more fully connected layers. All the operations performed in the

CNN model passes through three sequential layers as follows:

•Convolution Layer: CNN has got such a name mainly because of the convolution operation performed.265

The Convolution process primarily helps in extracting features from input data. For example, if an

image is considered as input then the convolution process extracts the features from the image with

preserving the spatial relationship between pixels by learning image features using small squares (2-D

ﬁlters) of input data. When it is applied in text classiﬁcation, it helps in extracting the feature matrix

by preserving high-level word or phrase representation.270

•Pooling Layer: It is a good practice that when the size of the input is too large, it is desirable to

reduce the number of trainable parameters. The feature dimension needs to be reduced without losing

any important information. Pooling layers are periodically introduced between subsequent convolution

layers. Pooling (also called sub-sampling or down-sampling) reduces the spatial size of each feature

map but retains the most important information. Spatial Pooling can be of diﬀerent types: max,275

average, sum, etc. In the case of max pooling, a spatial neighborhood (for example, a 22 window) is

deﬁned and take the largest element from the rectiﬁed feature map within that window. Instead of

taking the largest element, average (average pooling) or sum of all elements in that window may also

be considered for average and sum pooling respectively. In this paper, the max-pooling approach has

been considered.280

•Fully Connected Layer: The fully connected layer is a traditional multi-layer perceptron that uses a
softmax activation function in the output layer. The term "fully connected" implies that every neuron
in the previous layer is connected to every neuron in the next layer. The outputs of the
convolutional and pooling layers represent high-level features of the input data. The intuition behind
the fully connected layer is to use these features to classify the input into classes based on
the training dataset. Most of the features from the convolutional and pooling layers tend to be useful for
the classification task.
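The pooling variants described in the list above can be illustrated with a small sketch. This is a plain-Python illustration under our own assumptions (the function name `pool2d`, the list-of-rows representation, and the 4 × 4 example values are hypothetical), not the implementation used in the paper.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2-D feature map (a list of rows) with a size x size window."""
    reducers = {
        "max": max,
        "sum": sum,
        "average": lambda values: sum(values) / len(values),
    }
    reduce_window = reducers[mode]
    n_rows = (len(feature_map) - size) // stride + 1
    n_cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect the elements of the current spatial neighborhood.
            window = [feature_map[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_window(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 rectified feature map.
fmap = [[1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 3.0, 2.0],
        [2.0, 6.0, 1.0, 1.0]]

max_pooled = pool2d(fmap, mode="max")          # [[4.0, 5.0], [6.0, 3.0]]
sum_pooled = pool2d(fmap, mode="sum")          # [[10.0, 8.0], [9.0, 7.0]]
avg_pooled = pool2d(fmap, mode="average")      # [[2.5, 2.0], [2.25, 1.75]]
```

With a 2 × 2 window and stride 2, each output element summarizes one non-overlapping block, so the 4 × 4 map shrinks to 2 × 2 in every mode.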

4.2.2. Recurrent Neural Network (RNN)

In real-world text, the semantic information of a word often depends on the meaning of the
words that precede it. A CNN fails to capture this dependency because it considers the words in the
text independently. An RNN is an appropriate way to capture the dependency. RNNs perform
sequential analysis by carrying out the same process recurrently for every element in the sequence. An RNN
possesses a memory that captures the information already calculated, which influences the result
to be evaluated. The schematic diagram of an RNN is depicted in Figure 1.

The RNN process is best explained through an example. Consider a text that consists of
a sequence of three words. The network is unfolded into three layers (one layer for each word), as shown in
Figure 1. To visualize the computation, let X_t be the one-hot encoded vector of the word input at
timestamp t, C_t the cell state at timestamp t, which acts as the memory of the network, and Y_t the output
at timestamp t.

Figure 1: Schematic Flow Diagram for Recurrent Neural Network (input layer X_{t-1}, X_t, X_{t+1}; cell states C_{t-1}, C_t; output layer Y_{t-1}, Y_t, Y_{t+1})
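The unfolding over three words can be sketched in a few lines. The text does not state the update equations, so the sketch assumes the common simple-RNN form C_t = tanh(W_in X_t + W_state C_{t-1}) and Y_t = W_out C_t; the weight values, the three-word vocabulary, and all function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, w_in, w_state, w_out):
    """Unroll a simple RNN over a sequence of one-hot vectors X_t."""
    state = [0.0] * len(w_state)              # C_0: empty memory
    outputs = []
    for x_t in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x_t), matvec(w_state, state))]
        state = [math.tanh(v) for v in pre]   # C_t depends on X_t and C_{t-1}
        outputs.append(matvec(w_out, state))  # Y_t
    return outputs, state


def one_hot(index, size):
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


# Hypothetical weights: vocabulary of 3 words, memory of size 2, scalar output.
W_IN = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.2]]
W_STATE = [[0.6, 0.0], [0.0, 0.6]]
W_OUT = [[1.0, -1.0]]

# Two three-word texts that differ only in the first word.
text_a = [one_hot(i, 3) for i in (0, 1, 2)]
text_b = [one_hot(i, 3) for i in (2, 1, 2)]
outputs_a, _ = rnn_forward(text_a, W_IN, W_STATE, W_OUT)
outputs_b, _ = rnn_forward(text_b, W_IN, W_STATE, W_OUT)
```

The same weights are reused at every timestamp, and the final outputs of the two texts differ even though their last two words are identical, because the cell state carries information about the first word forward.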

4.2.3. Long Short Term Memory (LSTM)

LSTM is a sophisticated version of the RNN used for sequential modeling, mainly on text data. It can be
considered a special case of the RNN in which only the essential portion of the data is passed to the next
layer instead of the whole data. One of the major problems in a simple RNN is the vanishing
gradient problem [41] [42]. The gradient descent method is often used in neural networks to minimize the error by
optimizing the weight value at each neuron. In an RNN, the gradient of the loss function usually decreases exponentially
over successive steps of back-propagation, which is known as the vanishing gradient problem.
For example, in a sentence like "I play cricket, and I am good at bowling", the word 'bowling'
depends on the word 'cricket', which appears much earlier in the sentence. As the distance
between two dependent words increases, the performance of the RNN often decreases and the gradient value vanishes
significantly. The Long Short Term Memory (LSTM) network overcomes this problem and performs well in the long-term
dependency case.
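The exponential decay of the gradient can be made concrete with a toy calculation. The sketch assumes a scalar tanh RNN, C_t = tanh(w C_{t-1} + x_t), which is our simplification and not a model from the paper; by the chain rule the gradient of C_T with respect to C_0 is the product of the per-step factors w (1 - C_t^2).

```python
import math


def gradient_through_time(w_state, inputs):
    """Track |dC_t / dC_0| for a scalar tanh RNN as t grows."""
    state, grad = 0.0, 1.0
    history = []
    for x_t in inputs:
        state = math.tanh(w_state * state + x_t)
        grad *= w_state * (1.0 - state ** 2)   # chain rule: dC_t / dC_{t-1}
        history.append(abs(grad))
    return history


# Hypothetical recurrent weight 0.5 and a constant input over 20 timestamps.
history = gradient_through_time(0.5, [0.3] * 20)
```

Each factor is at most 0.5 in magnitude here, so after 20 steps the gradient is below 0.5^20, about one millionth. This is the effect that makes a dependency such as 'cricket' to 'bowling' hard to learn for a simple RNN.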

Figure 2: Schematic diagram of the proposed approach (Input: social media reviews -> preprocessing to remove noise such as special characters, emoticons, and hyperlinks -> vectorization: construction of the feature vector from the dictionary -> feature matrix for each review using word embeddings -> training of the models using deep learning algorithms -> classification of reviews -> results)

5. Proposed model for sentiment analysis

The presented approach passes through three layers: the word embedding, convolution, and
LSTM layers. The schematic diagram of the proposed approach for sentiment analysis is presented in Figure
2. In the first layer, word embedding is applied to embed the words of the review, which removes the
domain dependency of the review features. The second layer uses the convolution layer and the pooling
process to identify the important local and deep features in the sentence [43]. The third layer applies
the LSTM network to the output of the second layer to capture its sequential dependency from
left to right. The combination of the three layers helps in capturing the behavior of the sentence. The output of
the LSTM is then supplied to the fully connected sigmoid layer to evaluate the result, with binary
cross-entropy as the loss function. The overall architecture of the classifier is shown in Figure 7. The steps
of the proposed approach are as follows:

Step 1: Preprocessing. Social media reviews are usually text that contains noisy data such as
special characters, symbols, and hyperlinks. The noisy information is filtered out with the help of
regular expressions. In the preprocessing stage, all the reviews are broken into word tokens.
Duplicate words are then eliminated to obtain a unique representation for each word. A vocabulary
dictionary is then constructed with the unique words as keys and the word indices as values. Two new words,
"digit" and "unknown", are introduced to represent all numerical digits and all words that are
not present in the dictionary, respectively. The process of vocabulary dictionary construction is shown in
Figure 3.
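The preprocessing and dictionary construction of Step 1 can be sketched as follows. The regular expressions, the function names, and the example reviews are our own assumptions; the paper does not list the exact patterns it uses.

```python
import re


def clean(review):
    """Strip hyperlinks and special characters, then lower-case the text."""
    review = re.sub(r"https?://\S+", " ", review)     # assumed hyperlink pattern
    review = re.sub(r"[^A-Za-z0-9\s]", " ", review)   # emoticons, symbols, etc.
    return review.lower()


def build_vocabulary(reviews):
    """Map every unique word to an index; append 'digit' and 'unknown'."""
    vocabulary = {}
    for review in reviews:
        for token in clean(review).split():
            if token.isdigit():
                continue                  # all numbers share the 'digit' entry
            vocabulary.setdefault(token, len(vocabulary))
    # The two special words take the next free indices at the end.
    vocabulary.setdefault("digit", len(vocabulary))
    vocabulary.setdefault("unknown", len(vocabulary))
    return vocabulary


# Hypothetical reviews.
vocab = build_vocabulary([
    "Pizza here is expensive but tasty :-)",
    "Tasty pizza for 10 dollars http://example.com",
])
```

Using `setdefault` keeps the first index assigned to each word, which removes duplicates without a separate pass over the token list.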

Figure 3: Vocabulary dictionary creation before feature vectorization (Input text: large set of social media reviews -> Noise removal: escape or special characters and hyperlinks are removed, and all reviews are combined in a single text file -> Tokenization and duplicate removal: the text file is converted into a list of word tokens, duplicate words are removed, and all words are lower-cased -> Vocabulary dictionary creation: a dictionary of key-value pairs is created in which each key is a word and each value is the index of that word in the tokenized list; two new elements with keys for the unknown word and the digit are inserted at the end of the dictionary, with values following the last index)

After preprocessing, text vectorization is carried out for each review. Each element of
the vector representation of a review corresponds to the index of a word in the vocabulary dictionary. The
length of the vector is fixed at 25. As most of the reviews have a word-length below 25, the
index of the newly introduced word "unknown" is padded at the end to bring the length to 25. If the word-length
of a review exceeds 25, the less significant features are removed, i.e., the review is
truncated to 25 words. The insignificant words are identified through lemmatization and stop word
removal using the NLTK package available in Python. Most words in English have several alternative
forms with similar meaning. Lemmatization is the process of transforming an alternative form into its base form,
which inherently reduces the number of words. The feature vectorization process is shown in Figure 4.
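A minimal sketch of the padding and truncation rule follows. The tiny stop-word set stands in for the NLTK-based lemmatization and stop word removal, which is not reproduced here; the function name and the example vocabulary are hypothetical.

```python
STOP_WORDS = {"a", "an", "and", "but", "is", "of", "the"}  # hypothetical, tiny on purpose


def vectorize(tokens, vocabulary, max_len=25):
    """Turn a tokenized review into a fixed-length vector of vocabulary indices."""
    if len(tokens) > max_len:
        # Assumed notion of "less significant": drop stop words before cutting.
        tokens = [t for t in tokens if t not in STOP_WORDS]
    unknown, digit = vocabulary["unknown"], vocabulary["digit"]
    indices = [digit if t.isdigit() else vocabulary.get(t, unknown) for t in tokens]
    indices = indices[:max_len]                              # truncate to 25
    return indices + [unknown] * (max_len - len(indices))    # pad with 'unknown'


# Hypothetical vocabulary dictionary.
vocabulary = {"pizza": 0, "here": 1, "is": 2, "expensive": 3,
              "but": 4, "tasty": 5, "digit": 6, "unknown": 7}

short_vector = vectorize("pizza here is expensive but tasty".split(), vocabulary)
long_vector = vectorize(["pizza", "is"] * 20, vocabulary)
```

The short review keeps its six indices and is padded with the index of "unknown"; the 40-token review first loses its stop words and is then padded or cut to exactly 25 entries.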

Dimensionality reduction is sometimes necessary to filter features and reduce the computational complexity.
An intuitive example is that "awesomely amazing" may be mapped to just "amazing", as this reduces
the input size without losing semantic information. We have adopted PCA for dimensionality reduction of
the feature matrix, which is then passed to the CNN and LSTM model as input. A novel architecture of
convolution and pooling has also been used for feature filtering.

Figure 4: Process of feature vectorization (Social media review text -> the review is represented as a numerical vector in which each element is the index of a word in the vocabulary dictionary; if a word is not available in the dictionary, the index of the "Unknown" word is inserted -> word-length check against 25: if < 25, the index of unknown is padded at the end of the vector to make the length 25; if > 25, the review is truncated by eliminating insignificant features to make the length 25)

Figure 5: Word embedding for feature matrix construction (example: the review [ pizza here is expensive but tasty :-) ] is tokenized to [ pizza, here, is, expensive, but, tasty ], mapped to the indices [ 1 159 200 101 90 456 ], and embedded as a 6 × 128 feature matrix with one row per word, e.g., 0.23, 0.65, 0.55, ..., 0.88 for the first word)

Figure 6: Schematic diagram of Word embedding model (CBOW) (input layer: the context words w(i), w(i+1), w(i+3), and w(i+4) as one-hot encoded vectors based on indices; hidden layer; output layer: vector representation of the target word w(i+2))

Step 2: Word-embedding model. Each word in the list of texts is embedded into a vector of dimension 128,
which is trained through the backpropagation process. The Word2vec algorithm has been used for training the
word embedding, as it is simple and efficient for vector representation. Word embedding is a model
used to map a review in textual format into a numerical vector space that can be further processed
by neural networks. Prior to the representation, the vocabulary dictionary is created for the datasets
considered. In the vocabulary dictionary, each word is associated with an index that represents the position of
the word in the dictionary. As the position of each word is unique, we have leveraged it for the vector
representation of each review in the dataset. The indices are used to represent the words in the
vocabulary dictionary. They are used to construct the one-hot encoded representation, which is treated
as the input to the word embedding model. A sample feature matrix constructed from the word embedding model
is presented in Figure 5. The index values of the words in Figure 5 are only an example; they
vary from dataset to dataset. The values inside the matrix are the randomly assigned weights of
the embedding layer, which are adjusted through the backpropagation process. In the word embedding model,
CBOW is used, which takes the context of a word as input and tries to predict the representation of
the target word. Internally, it uses a three-layer feed-forward neural network to construct the numerical
representation of words. The architectural diagram of the word embedding model (CBOW) is presented in
Figure 6. The schematic diagram of the deep learning process is presented in Figure 7.
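The relation between the word indices, the one-hot vectors, and the 6 × 128 feature matrix of Figure 5 can be sketched as below. The sketch shows only the forward lookup with randomly initialized weights; the CBOW training by backpropagation is omitted, and the vocabulary size of 500, the initialization range, and the function names are our assumptions.

```python
import random


def init_embeddings(vocab_size, dim=128, seed=0):
    """Randomly initialized embedding weights, one row of `dim` values per word."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def one_hot(index, vocab_size):
    vector = [0.0] * vocab_size
    vector[index] = 1.0
    return vector


def embed_via_one_hot(index, weights):
    """Multiply the one-hot vector by the weight matrix (this selects one row)."""
    hot = one_hot(index, len(weights))
    return [sum(hot[i] * weights[i][d] for i in range(len(weights)))
            for d in range(len(weights[0]))]


def feature_matrix(indices, weights):
    """Equivalent shortcut: look up the row of each word index directly."""
    return [weights[i] for i in indices]


weights = init_embeddings(vocab_size=500)      # hypothetical vocabulary size
indices = [1, 159, 200, 101, 90, 456]          # example indices from Figure 5
matrix = feature_matrix(indices, weights)      # 6 rows of 128 values
```

Multiplying a one-hot vector by the weight matrix and looking up a row give the same result, which is why embedding layers are usually implemented as a table lookup.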

Figure 7: Schematic diagram of deep learning steps in the proposed sentiment analysis model (social media review -> word embedding layer -> convolution layer -> max pooling layer -> LSTM layer -> fully connected CNN layer -> fully connected sigmoid layer -> sentiment results, positive/negative)

Step 3: Convolutional Layer. In the convolution layer, seven filters, each of size 3 × 3 with stride one, are
traversed over the input feature matrix to extract the required features. Multiple filters are used to
extract different types of features. For example, if a matrix of size 8 × 128 is traversed with a filter of
dimension 3 × 3, the convolution process delivers a feature matrix of size 6 × 126, i.e., (8 - 3 + 1) × (128 - 3 + 1). It captures the local
hidden features, as shown in Figure 8. The Rectified Linear Unit (ReLU) activation function has been used in
the fully connected layer of the CNN, as it is reported to be six times faster than the sigmoid and tanh activation
functions [44]. In the last layer, however, the sigmoid function is used to obtain the class label. The input to
the last layer is the output of the last LSTM layer.
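The size arithmetic of Step 3 can be checked with a short sketch of an unpadded ("valid") convolution with stride one. The 3 × 3 filter values are those shown in Figure 8; the all-ones input matrix and the function name are hypothetical, and, as in most deep learning libraries, the filter is applied without flipping.

```python
def conv2d_valid(matrix, kernel, stride=1):
    """Slide a filter over a 2-D matrix without padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(matrix) - k_rows) // stride + 1
    out_cols = (len(matrix[0]) - k_cols) // stride + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Element-wise product of the filter and the window under it.
            row.append(sum(matrix[r * stride + i][c * stride + j] * kernel[i][j]
                           for i in range(k_rows) for j in range(k_cols)))
        output.append(row)
    return output


kernel = [[1, 0, 1],
          [0, 1, 0],
          [0, 1, 1]]                                   # filter shown in Figure 8
features = [[1.0] * 128 for _ in range(8)]             # hypothetical 8 x 128 input
convolved = conv2d_valid(features, kernel)             # 6 rows of 126 values
```

In the proposed model seven such filters are applied, so this step would be repeated once per filter, each producing its own 6 × 126 feature map.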

Figure 8: Convolution Process (an 8 × 128 feature matrix, shown with a border of zeros around the embedded values, is traversed by the 3 × 3 filter [1 0 1; 0 1 0; 0 1 1] to give a 6 × 126 convolved feature matrix whose first entries are 1.54, 1.45 in the first row and 1.37, 2.62 in the second)

Step 4: Maxpooling Layer. After a feature matrix of size u × v is obtained from the convolution layer,
max-pooling is performed with a filter of dimension 2 × 2. In max-pooling, the maximum feature value is selected
at each position of the filter while traversing. A stride of 2 is used for traversing the filter. The
resulting feature matrix is of dimension u/2 × v/2. The max-pooling operation is performed for each convolution
filter independently. Figure 9 shows the schematic structure of the max-pooling layer.

Figure 9: Max-Pooling Layer (a 6 × 126 convolved feature matrix is reduced to 3 × 63; for example, the window containing 1.54, 1.45, 1.37, and 2.62 is reduced to 2.62)

Step 5: Long Short Term Memory (LSTM) Network. The output of the max-pooling layer is passed to
the LSTM layer, which analyzes the generated feature vectors sequentially from left to right. Since the important
local features have already been extracted at the output of the max-pooling layer, the LSTM network is able to
check the long-term dependencies to detect the global features. The output of the LSTM layer is flattened
to reduce the features and is then passed through the fully connected CNN layer to predict the actual
sentiment. In this work, one hundred LSTM units have been applied with a ten percent dropout
to avoid over-fitting.

Step 6: Sigmoid Layer. The feature vectors obtained from the output of the LSTM layer are passed to a fully
connected sigmoid layer to find the probability distribution over the categories. It is mathematically
defined as follows:

$$P_{sigmoid}(C_j) = \frac{e^{o_j}}{1 + e^{o_j}} \tag{1}$$

where P_sigmoid(C_j) is the probability distribution for category j and o_j represents the output corresponding
to category j. The sigmoid activation function is used to normalize the confidence score of
the classifier to a value between zero and one. After the probability distribution is obtained from the sigmoid layer, binary cross
entropy is applied as the loss function to calculate the disparity between the actual and the predicted sentiments.

$$loss = -\sum_{i=1}^{k} R(C_i) \times \log P_{sigmoid}(C_i) \tag{2}$$

where k is the number of categories and R(C_i) is the actual sentiment associated with the text. It takes a
discrete value from the set L = {0, 1}, where L is the sentiment label of the review text (Negative, Positive). The loss
is similar to a likelihood function that seeks to minimize the difference between the probability distribution
in the training set and the model's predicted probability distribution for the testing dataset.
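Equations (1) and (2) can be evaluated directly for a single review. The sketch assumes k = 2 categories with P(Negative) = 1 - P(Positive) and a one-hot R(C_i), under which Equation (2) reduces to the usual binary cross-entropy; the clipping constant `eps` and the function names are our additions.

```python
import math


def sigmoid(o):
    """Equation (1): e^o / (1 + e^o), written in a numerically stable form."""
    if o >= 0:
        return 1.0 / (1.0 + math.exp(-o))
    e = math.exp(o)
    return e / (1.0 + e)


def binary_cross_entropy(label, o, eps=1e-12):
    """Equation (2) for one review with label 0 (Negative) or 1 (Positive)."""
    p_positive = min(max(sigmoid(o), eps), 1.0 - eps)   # avoid log(0)
    probability = {1: p_positive, 0: 1.0 - p_positive}  # assumed P(Negative)
    actual = {1: float(label == 1), 0: float(label == 0)}   # R(C_i)
    return -sum(actual[c] * math.log(probability[c]) for c in (0, 1))


confident_right = binary_cross_entropy(1, 5.0)    # small loss
undecided = binary_cross_entropy(1, 0.0)          # log(2)
confident_wrong = binary_cross_entropy(1, -5.0)   # large loss
```

The loss grows as the predicted probability of the true class falls, which is the disparity that training minimizes.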

6. Implementation

6.1. Dataset used for Experiment

In this paper, four review datasets from diverse domains, namely movie reviews, airline reviews, US
presidential election reviews, and self-driving car reviews, have been considered for the experiment. As all of
these are from different domains, the writing styles of the reviews differ considerably from each other. Different
kinds of word dependencies may occur within the reviews. As one of the contributions of this
paper is to build a domain-independent sentiment analysis model, the model has been trained by merging
the training sets of all the datasets, and evaluation has been carried out for each dataset separately.
The confusion matrices presented in the result section are based on the testing part of each individual dataset.
All of these datasets are balanced, i.e., the numbers of samples belonging to the positive,
negative, or neutral classes are equal or almost equal to each other. The datasets are
described as follows:

1. Movie Review: The Large Movie Review Dataset (often referred to as the IMDB dataset) contains
25,000 highly polar movie reviews (good or bad) for training and the same amount again for testing.
The problem is to determine whether a given movie review has a positive or negative sentiment. The
data was collected by Stanford researchers and was used in a 2011 paper, where a split of 70:30 of the
data was used for training and test [21].

2. Airline Review Dataset: This data originally came from Crowdflower's Data for Everyone library. It
contains reviews about major U.S. airlines. The Twitter data was scraped from February 2013 to
January 2014 in a paper by Wan et al. [45], and it is labeled to classify positive, negative, and
neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service"). It
records whether the sentiment of each tweet in this set was positive, neutral, or negative for six US
airlines.

3. Self Driving Car dataset: This dataset has been collected from the website "https://www.kaggle.com/"
[46]. It has three attributes: Twitter id, review text, and the polarity associated with the
sentiment.

4. US Presidential Election Dataset: This data is collected from the website "https://www.kaggle.com/"
[21]. It is the first GOP debate Twitter sentiment data, which analyzes tweets on the first 2016 GOP
Presidential Debate. It consists of 21 attributes and 13,871 reviews.

6.2. Performance Evaluation Parameters

The results obtained from the experiment are discussed in this section. The proposed Co-LSTM
model has been compared with other machine learning models, namely SVM, Naive Bayes, Linear Regression,
Random Forest, CNN, and RNN, for validation. The performance of the proposed algorithm has been
assessed in terms of accuracy, precision, recall, and F-measure, which are computed from the confusion
matrix. A statistical t-test has also been used to show whether the proposed algorithm is significantly
different from the other algorithms. The ROC curve and AUC value are also presented for analyzing the
performance of the proposed algorithm.

Table 1: Confusion Matrix

                      Correct label
Predicted label       Positive               Negative
Positive              True Positive (TP)     False Positive (FP)
Negative              False Negative (FN)    True Negative (TN)

Confusion Matrix. The confusion matrix, also known as the error matrix or contingency matrix, is a visual
representation of the statistical values obtained through experiments. It shows the statistics of the actual
and predicted labels of each review for the classifier. It is used to evaluate the performance
of most supervised machine learning algorithms. The confusion matrix for binary classification can
be represented in the form shown in Table 1. In this study, reviews are labeled as
either positive or negative sentiments. The confusion matrix has four components, with the help of which
the different performance parameters can be evaluated:

•True Positive (TP): It represents the reviews that are originally labeled as positive and also predicted
as positive by the classifier.

•False Positive (FP): It represents the reviews that are originally labeled as negative but predicted as
positive by the classifier.

•True Negative (TN): It represents the reviews that are originally labeled as negative and also predicted
as negative by the classifier.

•False Negative (FN): It represents the reviews that are originally labeled as positive but predicted as
negative by the classifier.

The performance of the proposed classiﬁer has been evaluated based on the following parameters.

i. Precision: It is defined as the ratio of the number of true positive predictions to the total number of positive predictions.
It measures the exactness of the classifier. It can be expressed as:

$$Precision = \frac{TP}{TP + FP} \tag{3}$$

ii. Recall: It is defined as the ratio of the number of true positive predictions to the total number of
actual positive samples. It is also known as sensitivity.

$$Recall = \frac{TP}{TP + FN} \tag{4}$$

iii. F-measure: It is the harmonic mean of Precision and Recall.

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{5}$$

iv. Accuracy: It is defined as the fraction of samples that are predicted correctly.

$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \tag{6}$$
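Equations (3) to (6) translate directly into code. The confusion-matrix counts below are hypothetical and are not taken from the result tables; the function name is ours, and the sketch does not guard against empty classes (zero denominators).

```python
def evaluate(tp, fp, fn, tn):
    """Compute Equations (3)-(6) from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Hypothetical counts for 200 test reviews.
scores = evaluate(tp=80, fp=20, fn=10, tn=90)
# precision = 0.8, recall = 80/90, f_measure = 160/190, accuracy = 0.85
```

Because the F-measure is a harmonic mean, it lies between precision and recall and is pulled towards the smaller of the two.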

6.3. Result Analysis and Discussion

Table 2: Confusion Matrix, Evaluation Parameters for Movie Review Dataset

Model               Actual   Predicted Yes   Predicted No   Precision   Recall   F-Measure   Accuracy
SVM                 Yes      329             66             0.8329      0.8266   0.8298      0.8311
                    No       69              336
Naive Bayes         Yes      355             40             0.8987      0.7230   0.8014      0.7800
                    No       136             269
Linear Regression   Yes      318             77             0.8051      0.8010   0.8030      0.8050
                    No       79              326
Random Forest       Yes      302             93             0.7646      0.6028   0.6741      0.6350
                    No       199             206
CNN                 Yes      316             79             0.8000      0.8294   0.8144      0.8200
                    No       65              340
RNN                 Yes      296             99             0.7494      0.7810   0.7649      0.7725
                    No       83              322
Co-LSTM             Yes      330             65             0.8354      0.8350   0.8302      0.8313
                    No       70              335

Standard machine learning models such as SVM, linear regression, random forest, and Naive Bayes
are considered for experimental comparison. Deep learning models are found to be more effective
than these machine learning algorithms. CNN and LSTM networks form the basic framework of the
proposed model, i.e., Co-LSTM. To classify sentiment reviews efficiently, researchers have frequently
proposed ensemble systems based on these architectures, and the experimental results reported in the
literature reflect the viability of the different techniques. Although extensive studies have been carried out
using traditional models, a good amount of work has also been carried out using deep learning models in recent
years. The latter are found to outperform the traditional systems in most cases, thereby establishing
their utility in the field of natural language processing, including sentiment analysis. The performance results
of the various machine learning techniques are presented in confusion matrix form along with the
evaluation parameters for each of the datasets.

A comparative analysis of the classification models based on precision, recall, f-measure, and accuracy for
the movie review dataset is presented in Table 2. It can be observed that the proposed Co-LSTM model
yields better accuracy and f-measure than the other algorithms. The Naive Bayes and CNN
models have better precision and recall values respectively for the movie review dataset, as they are biased
more towards positive sentiments. The top three models for the movie review dataset in terms of accuracy are
found to be Co-LSTM, SVM, and CNN, with 83.13%, 83.11%, and 82% respectively.

Table 3: Confusion Matrix, Evaluation Parameters for Airline Dataset

Model               Actual   Predicted Yes   Predicted No   Precision   Recall   F-Measure   Accuracy
SVM                 Yes      3419            230            0.9370      0.9529   0.9449      0.9136
                    No       169             799
Naive Bayes         Yes      3646            3              0.9992      0.8135   0.8968      0.8183
                    No       836             132
Linear Regression   Yes      3611            38             0.9896      0.9007   0.9431      0.9056
                    No       398             570
Random Forest       Yes      3589            60             0.9836      0.8680   0.9221      0.8687
                    No       546             422
CNN                 Yes      3553            96             0.9737      0.9449   0.9591      0.9344
                    No       207             761
RNN                 Yes      3541            108            0.9704      0.9651   0.9678      0.9489
                    No       128             840
Co-LSTM             Yes      3442            207            0.9433      0.9860   0.9681      0.9496
                    No       49              919

The experimental results for accuracy, precision, recall, and f-measure for the Airline review dataset are
presented in Table 3. As with the movie review dataset, the Naive Bayes algorithm is more inclined towards
positive sentiment. The precision value for the Naive Bayes algorithm is found to be 0.9992. The Co-LSTM
model has better accuracy, f-measure, and recall values than all other classifiers for the Airline
review dataset. It can be observed that the performance of Co-LSTM and CNN is very close to that
of RNN. The accuracy of Co-LSTM, RNN, and CNN is found to be 94.96%, 94.89%, and 93.44% respectively.

The performance results for the self-driving car dataset are presented in Table 4. It can be observed that
Co-LSTM performs better in terms of accuracy, f-measure, and recall for self-driving car reviews. The precision
value for the Naive Bayes algorithm is found to be 100%. The deep learning models RNN and CNN
have accuracies of 83.62% and 83.44% respectively. Unlike in the other datasets, the performance of SVM is satisfactory
for self-driving car reviews in terms of precision, recall, and f-measure. Table 5 shows the performance results
of all the models on the US presidential election data. In this dataset, the accuracy of Co-LSTM is found to be
90.45%. It outperforms all other models in terms of accuracy, f-measure, and recall. As in the self-driving car
dataset, the precision value for the Naive Bayes model is 1.0 here, owing to its bias towards positive
sentiments.

Table 4: Confusion Matrix, Evaluation Parameters for Self Driving Car Dataset

Model               Actual   Predicted Yes   Predicted No   Precision   Recall   F-Measure   Accuracy
SVM                 Yes      1615            398            0.9023      0.8549   0.8278      0.8081
                    No       274             491
Naive Bayes         Yes      2013            0              1.0000      0.7265   0.8416      0.7271
                    No       758             7
Linear Regression   Yes      1956            57             0.9717      0.7878   0.8701      0.7898
                    No       527             238
Random Forest       Yes      1907            106            0.9473      0.7583   0.8423      0.7430
                    No       608             157
CNN                 Yes      1884            129            0.9359      0.8506   0.8912      0.8344
                    No       331             434
RNN                 Yes      1916            97             0.9518      0.8426   0.8939      0.8362
                    No       358             407
Co-LSTM             Yes      1895            118            0.9414      0.8798   0.9095      0.8643
                    No       259             506

Table 5: Confusion Matrix, Evaluation Parameters for GOP Datasets

Model               Actual   Predicted Yes   Predicted No   Precision   Recall   F-Measure   Accuracy
SVM                 Yes      2937            451            0.8669      0.9167   0.8911      0.8327
                    No       267             637
Naive Bayes         Yes      3388            0              1.0000      0.808    0.8938      0.8124
                    No       805             99
Linear Regression   Yes      3323            65             0.9808      0.8518   0.9118      0.8502
                    No       578             326
Random Forest       Yes      3241            147            0.9566      0.8529   0.9018      0.8355
                    No       559             345
CNN                 Yes      3258            130            0.9616      0.9075   0.9338      0.8924
                    No       332             572
RNN                 Yes      3333            55             0.9838      0.8686   0.9226      0.8698
                    No       504             400
Co-LSTM             Yes      3256            132            0.961       0.9213   0.9408      0.9045
                    No       278             626

The observed values from the sentiment classification have been plotted as Receiver Operating
Characteristic (ROC) curves. The ROC curve represents the trade-off between the false positive rate (FPR) and
the true positive rate (TPR). The FPR is defined as the ratio of the number of false positives to the
total number of actual negatives in the dataset. Similarly, the TPR is defined as the ratio of
the number of true positives to the total number of actual positives in the dataset; it is the same as
recall or sensitivity. The ROC curve is plotted with the false positive rate on the x-axis and the true positive
rate on the y-axis, each ranging from 0 to 1. It is a suitable approach for finding the best model for a
classification task. A classification model is said to perform better if its curve is more inclined
towards the true positive rate. The best classifier has a curve that passes close to the point (0,1), i.e., the
top-left region. The performance of a model can therefore be evaluated through the area under the
ROC curve: the larger the area under the curve, the better the performance of the model. Figures 10a, 10b, 10c, and
10d show the ROC curves of the different classifiers for the movie, airline, self-driving car, and GOP datasets
respectively. It can be observed that the black line, i.e., the ROC curve of the proposed Co-LSTM model,
lies closest to the TPR axis, which indicates that it has a high TPR and a low FPR. Naive Bayes is found
to have more false positives than the other classifiers in most of the datasets. For the Airline and
Self-driving car datasets, the ROC curve of Co-LSTM is more clearly distinguishable from those of the other models.
It can be observed that the ROC curves of the deep learning models RNN and CNN are close to each
other. The area under the curve (AUC) for the models is listed in Table 6. It can be noted that the AUC is
highest for Co-LSTM in all the datasets.
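The construction of an ROC curve and its AUC can be sketched as follows. The labels and classifier scores are hypothetical, the threshold sweep with the trapezoidal rule is one common way to obtain the curve and may differ from the tool used for Figure 10, and both classes are assumed to be present.

```python
def roc_points(labels, scores):
    """Sweep the decision threshold from high to low and collect (FPR, TPR)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only when the next score differs (ties share a threshold).
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = positive) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))                                  # 8/9

perfect_points = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
perfect_area = auc(perfect_points)                                      # 1.0
```

The perfect classifier passes through the point (0, 1) and has an AUC of 1, while the imperfect one ranks a single negative review above a positive one and loses one ninth of the area.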

A paired t-test analysis has been performed for each pair of classification models and for each of the
evaluation parameters, i.e., accuracy, precision, recall, and f-measure. It is used to check whether the
performance of the proposed model is significantly different from that of the others. The t-test analysis has been
performed for each dataset with 5-fold cross-validation. The main parameter evaluated in the t-test analysis is
the p-value. A classifier is said to be significantly different from another if the p-value is less than 0.05. It
can be observed from Table 7 that for accuracy, recall, and f-measure, Co-LSTM is significantly different
from all other classification models, i.e., the values obtained for accuracy, recall, and f-measure are not due to
randomness.
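The paired t-test on 5-fold results can be sketched with the standard library. The fold accuracies below are hypothetical, not the values behind Table 7. Instead of computing a p-value, the sketch compares the t statistic with 2.776, the two-sided 5% critical value of Student's t distribution with 4 degrees of freedom, which is equivalent to checking p < 0.05 for five folds.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the paired differences between two models' fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err


T_CRITICAL_DF4 = 2.776   # two-sided, alpha = 0.05, 5 folds -> 4 degrees of freedom

# Hypothetical per-fold accuracies of two classifiers on the same five folds.
model_a = [0.91, 0.90, 0.92, 0.91, 0.90]
model_b = [0.88, 0.89, 0.90, 0.87, 0.88]

t_value = paired_t_statistic(model_a, model_b)
significant = abs(t_value) > T_CRITICAL_DF4
```

The test is paired because both models are scored on the same folds, so only the fold-wise differences enter the statistic. The sketch assumes the differences are not all identical, since the standard error would then be zero.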

Figure 10: ROC Comparison for Different Classification Models: (a) Receiver Operating Characteristic (ROC) for Movie, (b) ROC for Airline, (c) ROC for Self Driving Car, (d) ROC for GOP

Table 6: Area under curve (AUC) value for ROC Curve

Models              Movie Review   Airline Dataset   Self Driving Car   GOP
SVM                 0.905          0.955             0.795              0.862
Naive Bayes         0.877          0.930             0.743              0.820
Linear Regression   0.881          0.954             0.799              0.865
Random Forest       0.695          0.881             0.700              0.817
CNN                 0.894          0.970             0.867              0.922
RNN                 0.862          0.978             0.868              0.920
Co-LSTM             0.920          0.984             0.909              0.934

7. Conclusion and Future Work

A neural network architecture comprised of both CNN and LSTM has been proposed to predict the500

sentiment of customer reviews. The major advantage of this model is that it is not limited to a speciﬁc

domain. Thus, the same model can be trained for product reviews as well as service reviews without

Table 7: t-test analysis (p-value) for various evaluation parameters

Accuracy Precision

SVM NB LR RF CNN RNN Co-LSTM SVM NB LR RF CNN RNN Co-LSTM

SVM - 0.058 0.792 0.257 0.162 0.486 0.010 - 0.037 0.166 0.324 0.208 0.375 0.017

NB 0.058 - 0.031 0.774 0.015 0.100 0.013 0.037 - 0.142 0.095 0.039 0.139 0.002

LR 0.792 0.031 - 0.151 0.017 0.370 0.019 0.166 0.142 - 0.043 0.058 0.155 0.039

RF 0.257 0.774 0.151 - 0.043 0.027 0.029 0.324 0.095 0.043 - 0.689 0.939 0.017

CNN 0.162 0.015 0.017 0.043 - 0.399 0.043 0.208 0.039 0.058 0.689 - 0.825 0.018

RNN 0.486 0.100 0.370 0.027 0.399 - 0.030 0.375 0.139 0.155 0.939 0.825 - 0.018

Co-LSTM 0.010 0.013 0.019 0.029 0.043 0.030 - 0.017 0.002 0.039 0.017 0.018 0.018 -

Recall F-Measure

SVM NB LR RF CNN RNN Co-LSTM SVM NB LR RF CNN RNN Co-LSTM

SVM - 0.001 0.012 0.048 0.179 0.202 0.016 - 0.368 0.602 0.409 0.223 0.652 0.012

NB 0.001 - 0.006 0.951 0.001 0.024 0.004 0.368 - 0.086 0.554 0.029 0.305 0.011

LR 0.012 0.006 - 0.246 0.008 0.230 0.021 0.602 0.086 - 0.188 0.006 0.745 0.005

RF 0.048 0.951 0.246 - 0.062 0.067 0.026 0.409 0.554 0.188 - 0.085 0.037 0.039

CNN 0.179 0.001 0.008 0.062 - 0.315 0.013 0.223 0.029 0.006 0.085 - 0.415 0.038

RNN 0.202 0.024 0.230 0.067 0.315 - 0.010 0.652 0.305 0.745 0.037 0.415 - 0.020

Co-LSTM 0.016 0.004 0.021 0.026 0.013 0.010 - 0.012 0.011 0.005 0.039 0.038 0.020 -

degrading the performance. No sophisticated manual feature engineering is required, so no domain-specific expertise is needed; this follows from using a pre-trained word-embedding model to embed the input feature vector. Next, the CNN layer placed before the LSTM network selects only the important features from the embedded vectors, which greatly reduces training time and keeps the model computationally feasible. Finally, the LSTM layer models the sequential arrangement of the review rather than considering words or phrases in isolation. The model therefore incorporates the context of the review and performs better on context-dependent phenomena such as negation and sarcasm.
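The three stages described above (embedding, convolution, LSTM) can be traced in a toy forward pass. This is a minimal pure-Python sketch, not the implementation used in the experiments: all sizes, the random weights, the max-pooling step, the bias-free LSTM and the sigmoid output unit are assumptions made for illustration.

```python
import math
import random

random.seed(0)  # reproducible toy weights

# Hypothetical toy sizes, chosen only to keep the example readable.
EMBED_DIM, N_FILTERS, KERNEL, POOL, HIDDEN = 4, 3, 2, 2, 2


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def embed(tokens, table):
    """Stage 1: look up each token's vector (zero vector if unseen)."""
    return [table.get(t, [0.0] * EMBED_DIM) for t in tokens]


def conv1d_relu(seq, filters):
    """Stage 2a: slide each filter over KERNEL consecutive word vectors, then ReLU."""
    out = []
    for i in range(len(seq) - KERNEL + 1):
        window = [x for vec in seq[i:i + KERNEL] for x in vec]  # flatten the window
        out.append([max(0.0, dot(f, window)) for f in filters])
    return out


def max_pool(seq, size=POOL):
    """Stage 2b: keep the strongest activation per filter in each block of steps."""
    return [[max(col) for col in zip(*seq[i:i + size])]
            for i in range(0, len(seq) - size + 1, size)]


def lstm_last_state(seq, W):
    """Stage 3: run a bias-free LSTM over the feature sequence, return final h."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x in seq:
        z = x + h  # concatenate current input with previous hidden state
        i_gate = [sigmoid(dot(row, z)) for row in W["i"]]
        f_gate = [sigmoid(dot(row, z)) for row in W["f"]]
        o_gate = [sigmoid(dot(row, z)) for row in W["o"]]
        cand = [math.tanh(dot(row, z)) for row in W["c"]]
        c = [f * c_prev + i * g for f, c_prev, i, g in zip(f_gate, c, i_gate, cand)]
        h = [o * math.tanh(c_new) for o, c_new in zip(o_gate, c)]
    return h


tokens = "the movie was not good at all".split()
table = {t: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for t in sorted(set(tokens))}
filters = rand_matrix(N_FILTERS, KERNEL * EMBED_DIM)
W = {gate: rand_matrix(HIDDEN, N_FILTERS + HIDDEN) for gate in "ifoc"}
w_out = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

embedded = embed(tokens, table)                       # 7 steps x 4 dims
features = max_pool(conv1d_relu(embedded, filters))   # 6 conv steps -> 3 pooled steps
h = lstm_last_state(features, W)
score = sigmoid(dot(w_out, h))                        # untrained sentiment score in (0, 1)
print(len(embedded), len(features), len(h))
```

The point of the sketch is the data flow: the convolution shortens the sequence the LSTM has to process (seven word vectors become three feature vectors here), which is the source of the training-time saving, while the LSTM still consumes those features in order.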

In this study, a hybrid neural network architecture combining a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN), built on top of a word embedding model, has been presented. The main advantage of this architecture is that it studies the important features of a review sequentially to predict the sentiment. Owing to the word embedding model and the LSTM network, the model performs well across multiple domains (as we experimented with movie reviews and airline tweets) without any domain-specific feature engineering. The approach can also be evaluated on other sentence classification tasks.

8. Threats to Validity

The proposed architecture is based on convolutional deep neural networks in the context of natural language processing. A few limitations of the proposed convolutional LSTM model are as follows:

• The deep learning model requires a large amount of data for proper training and is also computationally intensive.

• For feature matrix creation, the word embedding model relies on a pre-trained corpus. This corpus should be large enough to cover all frequently used words; if it is not, some important features may be missing when the model is trained.

• If the initial convolutional layer of the Co-LSTM model fails to preserve some of the word order or sequence information of the text, the sequential dependencies between words are lost before they reach the LSTM. The LSTM layer may then act as little more than a fully connected layer without any memory.

• This work uses a word embedding model based on a pre-trained corpus, which can struggle with misspellings and other irregularities of the language used on social media. This could be improved by building a social-media-specific word embedding model.
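The vocabulary-coverage and misspelling limitations listed above can be made concrete with a small check. The sketch below is illustrative only: the vocabulary, the example tweet and the helper names are hypothetical, and collapsing repeated letters is just one simple normalization, not something the paper applies.

```python
import re

# Hypothetical stand-in for the vocabulary of a pre-trained embedding model.
PRETRAINED_VOCAB = {"the", "flight", "was", "great", "terrible", "service", "so", "not"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def normalize_elongation(token):
    """Collapse runs of three or more identical letters ('sooooo' -> 'so')."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def coverage(tokens, vocab):
    """Return the fraction of tokens found in vocab and the list of misses."""
    missing = [t for t in tokens if t not in vocab]
    return 1 - len(missing) / len(tokens), missing


tweet = "The flight was sooooo terible, service not greaaat"
raw = tokenize(tweet)
cleaned = [normalize_elongation(t) for t in raw]

print(coverage(raw, PRETRAINED_VOCAB))      # (0.625, ['sooooo', 'terible', 'greaaat'])
print(coverage(cleaned, PRETRAINED_VOCAB))  # (0.875, ['terible'])
```

Simple normalization recovers the elongated words but not the genuine misspelling, which would still receive no pre-trained vector. That residual gap is what an embedding model trained on social media text is meant to close.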

Acknowledgement

This research work was supported by the Fund for Improvement of S&T Infrastructure in Universities and Higher Educational Institutions (FIST) Scheme under the Department of Science and Technology (DST), Govt. of India. The authors wish to express their gratitude and heartiest thanks to the Department of Computer Science & Engineering, National Institute of Technology, Rourkela, India, for providing research support.


Sentiment Analysis of Self Driving Car Dataset: A comparative study of Deep Learning approaches

Article

May 2024

Sentiment Analysis (SA) is a crucial task in understanding public opinions and perceptions towards emerging technologies. In this study, we focus on SA for a self-driving car dataset as it provides valuable insights into public perceptions and opinions towards a transformative technology. The dataset consists of textual reviews associated with sentiment labels, providing insights into how people perceive self-driving car technology. Our objective is to analyze the sentiments expressed in these reviews using Deep Learning (DL) models, namely, Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Bidirectional GRU (BiGRU). We compared our results with an existing technique in the field of self-driving car sentiment classification, that implemented various Machine Learning (ML) and DL models, including Support Vector Machines (SVM), Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), CNN, and LSTM. In our study, we expanded upon this research by evaluating the performance of ANN, BiLSTM, GRU, and BiGRU. Results reveal that BiLSTM, GRU, and BiGRU exhibit superior performance in sentiment classification within the self-driving car dataset. These findings offer valuable insights into public sentiment towards self-driving cars, contributing significantly to the advancement of SA techniques in the domain of autonomous vehicles. Additionally, the results are statistically tested and are statistically significant.

Children’s Sentiment Analysis From Texts by Using Weight Updated Tuned With Random Forest Classification

Article

Full-text available

Jan 2024

Sentimental Analysis is considered a computational strategy that helps in identifying and assessing the emotions of people via text documents. Tools and different methods have been adopted for determining both positive and negative emotions in the form of text data analytics by using Machine and Deep Learning techniques. Experimentally, it has been shown that the accuracy of existing text classification models such as Bi-LSTM, Decision Tree, and Ensemble Classifiers is limited by poor quality data, inappropriate hyperparameter tuning, and model-specific bias levels. Additionally, these models are prone to overfitting, high computational overhead, and longer training time. To overcome these limitations, we proposed a hybrid binary classification framework by combining Deep sequential features with the Random Forest (RF) technique. The approach is implemented in four phases: Initially, data preprocessing is performed by employing a Vader sentiment package. In the second step, the deep Long Short Term Memory (LSTM) model was employed to extract deep sequential features corresponding to sad and happy emotions. In the third phase, a bi-orthogonalization algorithm with principal component Analysis (PCA) and Singular Value Decomposition (SVD) was employed to minimize the redundancy and maximize the relevance of extracted features. Finally, a five-fold cross-validation technique was implemented to discriminate sad and happy emotions using the Random Forest (RF) algorithm. Eventually, a grid search approach was implemented for hyperparameter tuning and results were compared with five baseline algorithms (Vanilla LSTM (VLSTM), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Naïve Bayes (NB), Ada Boost Algorithm (ABA). 
The experimental outcomes revealed that the proposed model achieved an accuracy rate of 99.631% on the 4000 stories dataset which was superior to all five state-of-the-art methods with a margin of 4.63%, 10.7%, 19.44%, 21%, and 56.5%, respectively. Interestingly, the proposed model realized improved results in terms of other conventional performance metrics also such as precision, recall, specificity, and time complexity. Overall, the proposed model has great potential in educational institutions, child psychology research, and child-friendly content moderation, generally helping in the understanding of the emotions and experiences of children in the digital realm.

A combined Bi-LSTM-GPT Model for Arabic Sentiment Analysis

Article

Full-text available

Jul 2023

This research investigates the efficacy of ensemble learning within the field of Arabic sentiment analysis. Ensemble learning, which combines predictions from multiple models to enhance accuracy, has shown promising results when compared to individual models. Hence, we propose an ensemble learning model that integrates two robust models: a Bidirectional Long Short-Term Memory (BiLSTM) model and a Generative pre-trained transformers (GPT) model. The GPT model has previously demonstrated effectiveness in various Arabic natural language processing (ANLP) tasks. To examine the performance of our ensemble model, we separately trained the BiLSTM and transformer-based model using three different datasets. We combined the models by aggregating their final probabilities for each class. Through multiple experiments, we compared the effectiveness of the proposed ensemble model with the standalone models. The results clearly indicate that the ensemble learning models outperform the standalone models in Arabic sentiment analysis. Specifically, the proposed ensemble model that demonstrated an accuracy increase of nearly 7% when compared to the best standalone model.

Modeling and Sentiment Analysis of Social Relationships in Elderly Smart Homes Based on Graph Neural Networks

Article

Full-text available

May 2024

Qianqian Hu

With the expansion of Speech Emotion Recognition in the consumer domain, several devices, particularly those designed for managing smart home personal assistants for the elderly, have been widely available on the market. The increasing processing power and connection, together with the growing need to facilitate longer residency through technological interventions, highlight the potential benefits of smart home assistants. Enabling these assistants to recognize human emotions would greatly improve user-assistant communication, allowing the assistant to deliver more constructive and customized feedback to the user. In this research work, Modeling and Sentiment Analysis of Social Relationships in Elderly Smart Homes Based on Graph Neural Networks (SASR-MBHNN-BBOA) is proposed. The input data are collected from Social Recommendation Dataset. Then, input data are pre-processed utilizing Inverse Optimal Safety Filters (IOSF) for cleaning the data and removing the background noise. Then the pre-processed data are given to Memristive Bi-neuron Hopfield Neural Network (MBHNN) for predicting the sentiments like positive, negative and neutral. In general, MBHNN doesn’t express some adoption of optimization approaches for determining optimal parameters to predicting the sentiments accurately. Hence BBOA is proposed to optimize MBHNN classifier which precisely predicts the sentiments in elderly smart home. The proposed SASR-MBHNN-BBOA method is implemented in Python, and it assessed with numerous performance metrics such as accuracy, precision, recall, F1-score, ROC. 
The outcomes show SASR-MBHNN-BBOA attains 20.8%, 19.5%, and 29.6% higher Accuracy, 28.8%, 22.5%, and 32.6% higher Precision, 15.5%, 27.4%, and 18.2% higher Recall are analysed with existing methods such as, Emotional speech analysis in real time for smart home assistants.(SASR-CNN-SHA), Machine Learning to Investigate Elderly Care Requirements in China via the Lens of Family Caregivers (SASR-ML-IECR),Identifying User Emotions via Audio Conversations with Smart Assistants (SASR-DNN-EASA) methods respectively.

Experimental study on short-text clustering using transformer-based semantic similarity measure

Article

May 2024

Sentence clustering plays a central role in various text-processing activities and has received extensive attention for measuring semantic similarity between compared sentences. However, relatively little focus has been placed on evaluating clustering performance using available similarity measures that adopt low-dimensional continuous representations. Such representations are crucial in domains like sentence clustering, where traditional word co-occurrence representations often achieve poor results when clustering semantically similar sentences that share no common words. This article presents a new implementation that incorporates a sentence similarity measure based on the notion of embedding representation for evaluating the performance of three types of text clustering methods: partitional clustering, hierarchical clustering, and fuzzy clustering, on standard textual datasets. This measure derives its semantic information from pre-training models designed to simulate human knowledge about words in natural language. The article also compares the performance of the used similarity measure by training it on two state-of-the-art pre-training models to investigate which yields better results. We argue that the superior performance of the selected clustering methods stems from their more effective use of the semantic information offered by this embedding-based similarity measure. Furthermore, we use hierarchical clustering, the best-performing method, for a text summarization task and report the results. The implementation in this article demonstrates that incorporating the sentence embedding measure leads to significantly improved performance in both text clustering and text summarization tasks.

Identification of Effective Deep Learning Approaches for Classifying Sentiments at Aspect Level in Different Domain

Conference Paper

Dec 2023

ASIF: attention-based sentiment inquiry framework for profound product recommendations

Article

Full-text available

May 2024
MULTIMED TOOLS APPL

Asif Nawaz

Online product recommendation has gained much popularity in recent years and has become the most demanding research area that can help consumers make better purchasing decisions. Recently, many machine learning techniques have been tested on various datasets for analyzing customer sentiments through online portals. Still, customers have difficulty finding profound products due to a lack of depth-level recommendations. The existing models for product recommendation may rely on either text or image reviews and ignore the multus-medium based reviews that lead to a poor recommendation. Furthermore, the recommendation system does not properly utilize product ranking. To effectively analyze the sentiment of online products, a novel framework ASIF is suggested in this manuscript. The key steps of the proposed framework are multi-modal data collection, normalization, text and image-based feature extraction, two-level feature fusion and extended transfer learning-based recommendation at binary level and multilevel. Five different datasets have been used for the evaluation of ASIF. From the experimental analysis and comparison to the baseline methods, it has been observed that the accuracy, precision, recall and F-Score of ASIF is far better, giving 95.95% and 94.95% on the standard dataset.

Mongolian Text Sentiment Analysis Based on Multi-scale CNN and mLSTM Mode

Conference Paper

Aug 2023

Investigation of causal public opinion indexes for price fluctuation in vegetable marketing

Article

May 2024
COMPUT ELECTR ENG

CiteNet: Cross-modal incongruity perception network for multimodal sentiment prediction

Article

Apr 2024
KNOWL-BASED SYST

Sentic LSTM: a Hybrid Network for Targeted Aspect-Based Sentiment Analysis

Article

Full-text available

Aug 2018

Sentiment analysis has emerged as one of the most popular natural language processing (NLP) tasks in recent years. A classic setting of the task mainly involves classifying the overall sentiment polarity of the inputs. However, it is based on the assumption that the sentiment expressed in a sentence is unified and consistent, which does not hold in the reality. As a fine-grained alternative of the task, analyzing the sentiment towards a specific target and aspect has drawn much attention from the community for its more practical assumption that sentiment is dependent on a particular set of aspects and entities. Recently, deep neural models have achieved great successes on sentiment analysis. As a functional simulation of the behavior of human brains and one of the most successful deep neural models for sequential data, long short-term memory (LSTM) networks are excellent in learning implicit knowledge from data. However, it is impossible for LSTM to acquire explicit knowledge such as commonsense facts from the training data for accomplishing their specific tasks. On the other hand, emerging knowledge bases have brought a variety of knowledge resources to our attention, and it has been acknowledged that incorporating the background knowledge is an important add-on for many NLP tasks. In this paper, we propose a knowledge-rich solution to targeted aspect-based sentiment analysis with a specific focus on leveraging commonsense knowledge in the deep neural sequential model. To explicitly model the inference of the dependent sentiment, we augment the LSTM with a stacked attention mechanism consisting of attention models for the target level and sentence level, respectively. In order to explicitly integrate the explicit knowledge with implicit knowledge, we propose an extension of LSTM, termed Sentic LSTM. The extended LSTM cell includes a separate output gate that interpolates the token-level memory and the concept-level input. 
In addition, we propose an extension of Sentic LSTM by creating a hybrid of the LSTM and a recurrent additive network that simulates sentic patterns. In this paper, we are mainly concerned with a joint task combining the target-dependent aspect detection and targeted aspect-based polarity classification. The performance of proposed methods on this joint task is evaluated on two benchmark datasets. The experiment shows that the combination of proposed attention architecture and knowledge-embedded LSTM could outperform state-of-the-art methods in two targeted aspect sentiment tasks. We present a knowledge-rich solution for the task of targeted aspect-based sentiment analysis. Our model can effectively incorporate the commonsense knowledge into the deep neural network and be trained in an end-to-end manner. We show that the two-step attentive neural architecture as well as the proposed Sentic LSTM and H-Sentic-LSTM can achieve an improved performance on resolving the aspect categories and sentiment polarity for a targeted entity in its context over state-of-the-art systems.

Targeted Aspect-Based Sentiment Analysis�via Embedding Commonsense Knowledge into an Attentive LSTM

Conference Paper

Full-text available

Feb 2018

In this paper, we propose a solution to targeted aspect-based sentiment analysis. We augment the long short-term memory (LSTM) network with a hierarchical attention mechanism consisting of a target-level attention and a sentence-level attention. Commonsense knowledge of sentiment- related concepts is incorporated into the end-to-end training of a deep neural network for sentiment classification. In order to tightly integrate the commonsense knowledge into the recurrent encoder, we propose an extension of LSTM, termed Sentic LSTM. We conduct experiments on two publicly released datasets, which show that the combination of the proposed attention architecture and Sentic LSTM can outperform state-of-the-art methods in targeted aspect sentiment.

Large Scale Community Detection Using a Small World Model

Article

Full-text available

Nov 2017

In a social network, small or large communities within the network play a major role in deciding the functionalities of the network. Despite of diverse definitions, communities in the network may be defined as the group of nodes that are more densely connected as compared to nodes outside the group. Revealing such hidden communities is one of the challenging research problems. A real world social network follows small world phenomena, which indicates that any two social entities can be reachable in a small number of steps. In this paper, nodes are mapped into communities based on the random walk in the network. However, uncovering communities in large-scale networks is a challenging task due to its unprecedented growth in the size of social networks. A good number of community detection algorithms based on random walk exist in literature. In addition, when large-scale social networks are being considered, these algorithms are observed to take considerably longer time. In this work, with an objective to improve the efficiency of algorithms, parallel programming framework like Map-Reduce has been considered for uncovering the hidden communities in social network. The proposed approach has been compared with some standard existing community detection algorithms for both synthetic and real-world datasets in order to examine its performance, and it is observed that the proposed algorithm is more efficient than the existing ones.

Enhancing Deep Learning Sentiment Analysis with Ensemble Techniques in Social Applications

Article

Full-text available

Feb 2017
EXPERT SYST APPL

Deep learning techniques for Sentiment Analysis have become very popular. They provide automatic feature extraction and both richer representation capabilities and better performance than traditional feature based techniques (i.e., surface methods). Traditional surface approaches are based on complex manually extracted features, and this extraction process is a fundamental question in feature driven methods. These long-established approaches can yield strong baselines, and their predictive capabilities can be used in conjunction with the arising deep learning methods. In this paper we seek to improve the performance of deep learning techniques integrating them with traditional surface approaches based on manually extracted features. The contributions of this paper are sixfold. First, we develop a deep learning based sentiment classifier using a word embeddings model and a linear machine learning algorithm. This classifier serves as a baseline to compare to subsequent results. Second, we propose two ensemble techniques which aggregate our baseline classifier with other surface classifiers widely used in Sentiment Analysis. Third, we also propose two models for combining both surface and deep features to merge information from several sources. Fourth, we introduce a taxonomy for classifying the different models found in the literature, as well as the ones we propose. Fifth, we conduct several experiments to compare the performance of these models with the deep learning baseline. For this, we use seven public datasets that were extracted from the microblogging and movie reviews domain. Finally, as a result, a statistical study confirms that the performance of these proposed models surpasses that of our original baseline on F1-Score.

Attention-based LSTM for Aspect-level Sentiment Classification

Conference Paper

Full-text available

Jan 2016

SenticNet 5: Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings

Article

Apr 2018

With the recent development of deep learning, research in AI has gained new vigor and prominence. While machine learning has succeeded in revitalizing many research fields, such as computer vision, speech recognition, and medical diagnosis, we are yet to witness impressive progress in natural language understanding. One of the reasons behind this unmatched expectation is that, while a bottom-up approach is feasible for pattern recognition, reasoning and understanding often require a top-down approach. In this work, we couple sub-symbolic and symbolic AI to automatically discover conceptual primitives from text and link them to commonsense concepts and named entities in a new three-level knowledge representation for sentiment analysis. In particular, we employ recurrent neural networks to infer primitives by lexical substitution and use them for grounding common and commonsense knowledge by means of multi-dimensional scaling.

Emotion and Sentiment Analysis from Twitter Text

Article

Jul 2019

Online social networks have emerged as new platform that provide an arena for people to share their views and perspectives on different issues and subjects with their friends, family, relatives, etc. We can share our thoughts, mental state, moments, stand on specific social, national, international issues through text, photos, audio and video messages and posts. Indeed, despite the availability of other forms of communication, text is still one of the most common ways of communication in a social network. The target of the work described in this paper is to detect and analyze sentiment and emotion expressed by people from text in their twitter posts and use them for generating recommendations. We collected tweets and replies on few specific topics and created a dataset with text, user, emotion, sentiment information, etc. We used the dataset to detect sentiment and emotion from tweets and their replies and measured the influence scores of users based on various user-based and tweet-based parameters. Finally, we used the latter information to generate generalized and personalized recommendations for users based on their twitter activity. The method we used in this paper includes some interesting novelties such as, (i) including replies to tweets in the dataset and measurements, (ii) introducing agreement score, sentiment score and emotion score of replies in influence score calculation, (iii) generating general and personalized recommendation containing list of users who agreed on the same topic and expressed similar emotions and sentiments towards that particular topic.

Representation Learning on Large and Small Data

Chapter

Mar 2019

Extracting useful features from a scene is an essential step in any computer vision and multimedia data analysis task. The approaches in feature extraction can be divided into two categories: model‐centric and data‐driven. This chapter focuses on how neural networks, specifically convolutional neural networks (CNNs), achieve effective representation learning. It reviews representative CNN models proposed since 2012. The chapter deals with the small data problem. It presents how features learned from one source domain with big data can be transferred to a different target domain with small data. Deep learning has its roots in neuroscience. CNNs are composed of two major components: feature extraction and classification. The common practice of transfer representation learning is to pretrain a CNN on a very large dataset and then to use the pretrained CNN as either an initialization or a fixed feature extractor for the task of interest.

LSTM with sentence representations for Document-level Sentiment Classification

Article

May 2018
NEUROCOMPUTING

Recently, due to their ability to deal with sequences of different lengths, neural networks, especially long short-term memory networks, have achieved great success in sentiment classification and are widely used for it. However, one of the remaining challenges is to model long texts to exploit the semantic relations between sentences in document-level sentiment classification. Existing neural network models are not powerful enough to capture enough sentiment messages from relatively long time-steps. To address this problem, we propose a new neural network model (SR-LSTM) with two hidden layers. The first layer learns sentence vectors to represent the semantics of sentences with a long short-term memory network, and in the second layer, the relations between sentences are encoded in the document representation. Further, we also propose an approach to improve it, which first cleans the datasets and removes sentences with less emotional polarity to provide better input for our model. The proposed models outperform the state-of-the-art models on three publicly available document-level review datasets.

Semi-Supervised Learning for Big Social Data Analysis

Article

Oct 2017
NEUROCOMPUTING

In an era of social media and connectivity, web users are becoming increasingly enthusiastic about interacting, sharing, and working together through online collaborative media. More recently, this collective intelligence has spread to many different areas, with a growing impact on everyday life, such as in education, health, commerce and tourism, leading to an exponential growth in the size of the social Web. However, the distillation of knowledge from such unstructured big data is an extremely challenging task. Consequently, the semantic and multimodal contents of the Web today, whilst well suited for human use, are still barely accessible to machines. In this work, we explore the potential of a novel semi-supervised learning model based on the combined use of random projection scaling as part of a vector space model, and support vector machines to perform reasoning on a knowledge base. The latter is developed by merging a graph representation of commonsense with a linguistic resource for the lexical representation of affect. Comparative simulation results show a significant improvement in tasks such as emotion recognition and polarity detection, and pave the way for the development of future semi-supervised learning approaches to big social data analytics.

Recommended publications

A Hybrid Model for Review Analysis Using Deep Learning

SentiCircles: A Platform for Contextual and Conceptual Sentiment Analysis

SentiCircles for Contextual and Conceptual Semantic Sentiment Analysis of Twitter
