Methodological Study Of Opinion Mining And Sentiment Analysis Techniques

Authors:
  • University of Technology and Applied Sciences (CAS - Ibri) Oman
  • Thakur Publication pvt. ltd

Abstract

Decision making, at both the individual and the organizational level, is usually accompanied by a search for other people's opinions on the matter. The tremendous growth of opinion-rich resources such as reviews, forum discussions, blogs, micro-blogs and Twitter provides a rich anthology of sentiments. This user-generated content can be a boon to the market if its semantic orientation is analyzed. Opinion mining and sentiment analysis formalize the study and interpretation of opinions and sentiments. The digital ecosystem has itself paved the way for the use of the huge volume of opinionated data it records. This paper attempts to review and evaluate the various techniques used for opinion mining and sentiment analysis.
International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014
DOI: 10.5121/ijsc.2014.5102
Pravesh Kumar Singh (1), Mohd Shahid Husain (2)
(1) M.Tech, Department of Computer Science and Engineering, Integral University, Lucknow, India
(2) Assistant Professor, Department of Computer Science and Engineering, Integral University, Lucknow, India
KEYWORDS
Opinion Mining, Sentiment Analysis, Feature Extraction Techniques, Naïve Bayes Classifiers, Clustering,
Support Vector Machines
1. INTRODUCTION
Individuals and companies alike are generally interested in the opinions of others. Someone who wants to purchase a new product, for example, first tries to find reviews, i.e., what other people think about the product, and bases the decision on those reviews.
Similarly, companies dig deep for consumer reviews. The digital ecosystem offers a plethora of them in the form of blogs, reviews, etc.
A very basic step of opinion mining and sentiment analysis is feature extraction. Figure 1 shows the process of opinion mining and sentiment analysis.
Various methods are used for opinion mining and sentiment analysis, among which the following are the important ones:
1) Naïve Bayes Classifier.
2) Support Vector Machine (SVM).
3) Multilayer Perceptron.
4) Clustering.
This paper categorizes the work done on feature extraction and classification in opinion mining and sentiment analysis. In addition, it appraises the performance, advantages and disadvantages of the different techniques.
2. DATA SETS
This section gives brief details of the datasets used in the experiments.
2.1. Product Review Dataset
Blitzer collected product reviews from amazon.com belonging to a total of 25 categories, such as videos and toys, and randomly selected 4000 +ve and 4000 -ve reviews.
2.2. Movie Review Dataset
The movie review dataset is taken from the work of Pang and Lee (2004). It contains 1000 +ve and 1000 -ve processed movie reviews.
3. CLASSIFICATION TECHNIQUES
3.1. Naïve Bayes Classifier
It is a probabilistic, supervised classifier based on Bayes' theorem, which is due to Thomas Bayes. According to this theorem, if there are two events, say e1 and e2, then the conditional probability of occurrence of event e1 when e2 has already occurred is given by the following formula:
$$P(e_1 \mid e_2) = \frac{P(e_2 \mid e_1)\,P(e_1)}{P(e_2)}$$
This algorithm is used to calculate the probability that a piece of data is positive or negative. So, the conditional probability of a sentiment is given as:
$$P(\text{Sentiment} \mid \text{Sentence}) = \frac{P(\text{Sentiment})\,P(\text{Sentence} \mid \text{Sentiment})}{P(\text{Sentence})}$$
And the conditional probability of a word is given as:
$$P(\text{Word} \mid \text{Sentiment}) = \frac{\text{Number of occurrences of the word in the class} + 1}{\text{Number of words belonging to the class} + \text{Total number of words}}$$
Algorithm
S1: Initialize P(positive) ← num_sentences(positive) / num_total_sentences
S2: Initialize P(negative) ← num_sentences(negative) / num_total_sentences
S3: Convert sentences into words
    for each class in {positive, negative}:
        for each word in {sentence}:
            P(word | class) ← (num_occurrences(word | class) + 1) / (num_words(class) + num_total_words)
            P(class) ← P(class) * P(word | class)
Return max {P(positive), P(negative)}
The above algorithm can be represented using figure 2
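The steps of the algorithm can be sketched in Python as follows. This is a minimal illustration rather than the implementation evaluated in [1]: the function names `train_naive_bayes` and `classify` are hypothetical, "total number of words" is assumed to mean the number of distinct words seen in training (the usual add-one smoothing), and logarithms are summed instead of multiplying probabilities to avoid numerical underflow.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_sentences):
    """Count sentences and words per class from (sentence, label) pairs."""
    sentence_counts = Counter()   # number of sentences per class (S1, S2)
    word_counts = {}              # class -> Counter of word occurrences
    vocabulary = set()            # distinct words seen in training (assumption)
    for sentence, label in labeled_sentences:
        sentence_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in sentence.lower().split():   # S3: sentence -> words
            counts[word] += 1
            vocabulary.add(word)
    return sentence_counts, word_counts, vocabulary


def classify(sentence, model):
    """Return the class with the highest (log) posterior score."""
    sentence_counts, word_counts, vocabulary = model
    total_sentences = sum(sentence_counts.values())
    scores = {}
    for label, n_sentences in sentence_counts.items():
        # Prior P(class) = sentences in class / total sentences
        log_p = math.log(n_sentences / total_sentences)
        words_in_class = sum(word_counts[label].values())
        for word in sentence.lower().split():
            # P(word | class) with add-one smoothing
            p_word = (word_counts[label][word] + 1) / (words_in_class + len(vocabulary))
            log_p += math.log(p_word)
        scores[label] = log_p
    return max(scores, key=scores.get)


training = [
    ("good great film", "positive"),
    ("great acting", "positive"),
    ("bad boring film", "negative"),
    ("awful plot", "negative"),
]
model = train_naive_bayes(training)
print(classify("great film", model))
```

The four training sentences are made-up toy data; with real reviews, tokenization and stop-word handling would need more care than `str.split`.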
3.1.1. Evaluation of Algorithm
The following measures are used to evaluate the algorithm:
Accuracy
Precision
Recall
Relevance
The following contingency table is used to calculate the various measures.

              Relevant              Irrelevant
Detected      True Positive (tp)    False Positive (fp)
Undetected    False Negative (fn)   True Negative (tn)

Now,
$$\text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Recall} = \frac{tp}{tp + fn}$$
$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}, \qquad F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
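As a small illustration of the measures above, the following Python sketch computes them from the four cells of the contingency table. The function name `evaluation_measures` and the example counts are hypothetical and not taken from any of the surveyed experiments.

```python
def evaluation_measures(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F from contingency counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Made-up counts for 100 classified sentences.
measures = evaluation_measures(tp=40, fp=10, fn=20, tn=30)
print(measures)
```

The guards against empty denominators are a design choice of this sketch; returning 0.0 in those cases is one common convention, not the only one.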
Figure 2. Algorithm of Naïve Bayes (diagram labels: training set, +ve sentence, -ve sentence, classifier, sentence, review, book review)
3.1.2. Accuracy
Ion Smeureanu and Cristian Bucur [1] trained the Naïve Bayes algorithm on 5000 sentences and obtained an accuracy of 0.79939209726444 (about 79.9%), where the number of groups (n) is 2.
3.1.3. Advantages of Naïve Bayes Classification Method
1. Model is easy to interpret.
2. Efficient computation.
3.1.4. Disadvantage of Naïve Bayes Classification Method
It assumes that the attributes are independent, which is not necessarily valid.
3.2 Support Vector Machine (SVM)
SVM is a supervised learning model. It is associated with a learning algorithm that analyzes the data and identifies patterns for classification.
The SVM algorithm is based on the concept of a decision plane that defines decision boundaries. A decision plane separates groups of instances having different class memberships.
For example, consider instances that belong to either class Circle or class Diamond. A separating line (figure 3) defines the boundary: all instances to its right are Circle and all instances to its left are Diamond.
If there is a training data set D, a set of n points, it is written as:
$$D = \{(x_i, c_i) \mid x_i \in \mathbb{R}^p,\; c_i \in \{-1, 1\}\}_{i=1}^{n} \qquad (1)$$
where $x_i$ is a p-dimensional real vector. The goal is to find the maximum-margin hyperplane, i.e., the one that splits the points having $c_i = 1$ from those having $c_i = -1$. Any hyperplane can be written as the set of points x satisfying:
$$w \cdot x - b = 0 \qquad (2)$$
Figure 3. Principle of SVM (diagram labels: support vectors)
Finding a maximum-margin hyperplane reduces to finding the pair w and b such that the distance between the two margin hyperplanes is maximal while they still separate the data. These hyperplanes are described by:
$$w \cdot x - b = 1$$
and
$$w \cdot x - b = -1$$
The distance between the two hyperplanes is $\frac{2}{\|w\|}$, and therefore $\|w\|$ needs to be minimized. The problem is to minimize $\|w\|$ in w, b subject to $c_i (w \cdot x_i - b) \ge 1$ for any i = 1, ..., n.
Using Lagrange multipliers ($\alpha_i$), this optimization problem can be expressed as:
$$\min_{w,b}\,\max_{\alpha \ge 0} \left\{ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ c_i (w \cdot x_i - b) - 1 \right] \right\} \qquad (3)$$
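To make the constraint and the margin concrete, the sketch below checks whether a candidate pair (w, b) satisfies $c_i (w \cdot x_i - b) \ge 1$ for every training point and reports the margin width $2 / \|w\|$. It does not solve the optimization problem; the helper names and the four toy points are assumptions made for illustration only.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_margin(w, b, points, labels):
    """True if every point meets c_i * (w . x_i - b) >= 1."""
    return all(c * (dot(w, x) - b) >= 1 for x, c in zip(points, labels))


def margin_width(w):
    """Distance between the hyperplanes w.x - b = 1 and w.x - b = -1."""
    return 2 / math.sqrt(dot(w, w))


# Toy data: two points per class (made-up values).
points = [(2, 2), (3, 3), (0, 0), (-1, 0)]
labels = [1, 1, -1, -1]

for w, b in [((1.0, 1.0), 2.0), ((0.5, 0.5), 1.0), ((0.25, 0.25), 0.5)]:
    print(w, b, satisfies_margin(w, b, points, labels), round(margin_width(w), 3))
```

Among the feasible candidates, the one with the smaller norm of w gives the wider margin, which is why the norm is minimized; the third candidate has an even smaller norm but violates the constraints.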
3.2.1. Extensions of SVM
There are some extensions that make SVM more robust and adaptable to real-world problems. These extensions include the following:
1. Soft Margin Classification
In text classification the data are sometimes linearly separable; for very high-dimensional problems and for multi-dimensional problems the data are also linearly separable. Generally (in most cases) the opinion mining solution is one that classifies most of the data and ignores outliers and noisy data. If a training set D cannot be separated cleanly, the solution is to have a fat decision margin and allow some mistakes.
Mathematically, slack variables $\xi_i$ are introduced; a non-zero $\xi_i$ allows $x_i$ to not meet the margin requirement at a cost proportional to $\xi_i$.
2. Non-linear Classification
Non-linear classifiers were proposed by Bernhard Boser, Isabelle Guyon and Vladimir Vapnik in 1992 by applying kernels to maximum-margin hyperplanes.
Aizerman introduced the kernel trick, i.e., every dot product is replaced by a non-linear kernel function. In this case the effectiveness of SVM depends on the selection of the kernel and the soft margin parameters.
3. Multiclass SVM
SVM is basically suited to two-class tasks, but multiclass SVM is available for multiclass problems. In the multiclass case, labels drawn from a finite set of several elements are assigned to objects. The problem is reduced to binary classifiers, which can be built in two ways:
1. Distinguishing one label versus all the others (one-versus-all), and
2. Distinguishing between each pair of classes (one-versus-one).
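Two of the extensions above can be illustrated with a few lines of Python. The sketch assumes the same hyperplane convention $w \cdot x - b$ as equation (2); the slack of a point is taken to be $\max(0,\, 1 - c_i (w \cdot x_i - b))$, and the Gaussian (RBF) kernel is used as one common example of a non-linear kernel. The paper does not name a specific kernel, so that choice, the `gamma` value and the function names are assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slack(w, b, x, c):
    """Slack xi_i: how far point x falls short of the margin (0 if it meets it)."""
    return max(0.0, 1 - c * (dot(w, x) - b))


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel, one possible replacement for the dot product."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


w, b = (0.5, 0.5), 1.0
print(slack(w, b, (2, 2), 1))    # on the margin: no slack needed
print(slack(w, b, (1, 1), 1))    # on the separating hyperplane: slack 1
print(slack(w, b, (0, 0), 1))    # on the wrong side: slack 2
print(rbf_kernel((0, 0), (1, 1)))
```

A soft-margin objective would add a penalty proportional to the sum of these slack values to the quantity being minimized; the penalty weight is the soft margin parameter mentioned in the text.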
3.2.2. Accuracy
When Pang et al. [10] used unigram features, the best result was obtained with a presence-based (rather than frequency-based) model run by SVM, which reached 82.9% accuracy.
3.2.3. Advantages of Support Vector Machine Method
1. Very good performance on experimental results.
2. Low dependency on data set dimensionality.
3.2.4. Disadvantages of Support Vector Machine Method
1. Categorical or missing values need to be pre-processed.
2. Difficult interpretation of resulting model.
3.3. Multi-Layer Perceptron (MLP)
A multi-layer perceptron is a feed-forward neural network with one or more layers between the input and the output. Feed-forward means that data flows in one direction, from the input layer to the output layer. This artificial neural network (ANN) begins with an input layer in which every node represents a predictor variable. Input nodes (neurons) are connected to every neuron in the next layer (the hidden layer). The neurons of one hidden layer are connected to the neurons of the next hidden layer.
The output layer is made up as follows:
1. When the prediction is binary, the output layer is made up of one neuron, and
2. When the prediction is non-binary, the output layer is made up of N neurons.
This arrangement gives an efficient flow of information from the input layer to the output layer.
Figure 4 shows the structure of an MLP. As in a single-layer perceptron, there is an input layer and an output layer, but there is also a hidden layer at work in this algorithm.
An MLP is trained with the back-propagation algorithm, which has two phases:
Phase I: This is the forward phase, in which activations are propagated from the input layer to the output layer.
Phase II: In this phase, the error between the actual output and the requested nominal value in the output layer is propagated in the backward direction in order to change the weights and bias values.
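The two phases can be sketched for a network with one hidden layer and a single output neuron (the binary case). This is a minimal illustration under several assumptions not stated in the paper: sigmoid activations, a squared-error loss, per-sample gradient descent with a fixed learning rate, and the hypothetical helper names used below. The logical OR function serves as made-up training data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Random small weights; biases start at zero."""
    rng = random.Random(seed)
    return {
        "hidden_w": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "hidden_b": [0.0] * n_hidden,
        "out_w": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "out_b": 0.0,
    }


def forward(net, x):
    """Phase I: propagate activations from the input to the output layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["hidden_w"], net["hidden_b"])]
    output = sigmoid(sum(w * h for w, h in zip(net["out_w"], hidden)) + net["out_b"])
    return hidden, output


def train_step(net, x, target, learning_rate=0.5):
    """Phase II: propagate the output error backwards and update the weights."""
    hidden, output = forward(net, x)
    delta_out = (output - target) * output * (1.0 - output)
    # Hidden deltas use the output weights *before* they are updated.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["out_w"], hidden)]
    for j, h in enumerate(hidden):
        net["out_w"][j] -= learning_rate * delta_out * h
        net["hidden_b"][j] -= learning_rate * delta_hidden[j]
        for i, xi in enumerate(x):
            net["hidden_w"][j][i] -= learning_rate * delta_hidden[j] * xi
    net["out_b"] -= learning_rate * delta_out


def mean_squared_error(net, samples):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in samples) / len(samples)


samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]
net = init_network(n_inputs=2, n_hidden=2)
error_before = mean_squared_error(net, samples)
for _ in range(1000):
    for x, target in samples:
        train_step(net, x, target)
error_after = mean_squared_error(net, samples)
print(error_before, error_after)
```

For a non-binary prediction the single output neuron would be replaced by N output neurons, each with its own weight vector and delta.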
MLP is a popular technique because it can act as a universal function approximator. It is a general, flexible and non-linear tool because a “back propagation” network with at least one hidden layer and enough non-linear units can learn virtually any function or relationship between a group of input and output variables (whether the variables are discrete or continuous).
An advantage of MLP compared to classical modeling methods is that it does not impose any sort of constraint on the initial data, nor does it generally start from specific assumptions.
Another benefit of the method lies in its capability to estimate good models even in the presence of noise in the analyzed information, as arises when there are omitted and outlier values in the distribution of the variables. Hence, it is a robust method when dealing with noise in the given information.
3.3.1. Accuracy
On health care data, Ludmila I. Kuncheva (IEEE Member) calculated the accuracy of MLP as 84.25%-89.50%.
3.3.2. Advantages of MLP
1. It acts as a universal function approximator.
2. MLP can learn each and every relationship among input and output variables.
3.3.3. Disadvantages of MLP
1. MLP needs more execution time than other techniques, because its flexibility comes with the need for enough training data.
2. It is considered a complex “black box”.
3.4 Clustering Classifier
Clustering is an unsupervised learning method, i.e., no point carries a label. A clustering technique recognizes the structure in the data and groups the points based on how close they are to one another.
So, clustering is the process of organizing objects and instances into classes or groups whose members are similar in some way, while the members of one cluster are dissimilar to those in other clusters.
Because the method is unsupervised, one does not know in advance how many clusters or groups exist in the data.
Using this method one can organize the data set into different clusters based on the similarities and distances among data points.
A clustering is denoted as a set of subsets C = C1, ..., Ck of S, such that:
$$S = \bigcup_{i=1}^{k} C_i \quad \text{and} \quad C_i \cap C_j = \varnothing \ \text{for } i \ne j.$$
Therefore, any object in S belongs to exactly one subset.
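The two conditions in this definition (the subsets cover S and are pairwise disjoint) can be checked directly. The following sketch uses Python sets; the function name `is_partition` and the sample data are hypothetical.

```python
def is_partition(clusters, data):
    """True if the clusters are pairwise disjoint and their union equals data."""
    seen = set()
    for cluster in clusters:
        cluster = set(cluster)
        if seen & cluster:        # an object already belongs to another cluster
            return False
        seen |= cluster
    return seen == set(data)      # every object is covered, nothing extra


S = {"a", "b", "c", "d", "e"}
print(is_partition([{"a", "b"}, {"c"}, {"d", "e"}], S))   # valid clustering
print(is_partition([{"a", "b"}, {"b", "c"}, {"d", "e"}], S))   # "b" appears twice
print(is_partition([{"a", "b"}, {"c"}], S))   # "d" and "e" are not covered
```

Overlapping (fuzzy) clusterings deliberately violate the disjointness condition, so this check applies to exclusive clusterings only.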
For example, consider figure 5, where the data set has three natural clusters.
Now consider some real-life examples that illustrate clustering:
Example 1: Group people of similar size together to make small and large shirts. The alternatives are:
1. Tailor-made for each person: expensive.
2. One-size-fits-all: does not fit all.
Example 2: In advertising, segment consumers according to their similarities in order to do targeted advertising.
Example 3: To create a topic hierarchy, take a group of texts and organize them according to the similarity of their content.
Basically, two kinds of measures are used to estimate this relation:
1. Distance measures and
2. Similarity measures
Distance Measures
Various clustering methods use distance measures to determine the similarity or dissimilarity between any pair of objects.
It is convenient to denote the distance between two instances $x_i$ and $x_j$ as $d(x_i, x_j)$. A valid distance measure should be symmetric and attain its minimum value (usually zero) for identical vectors.
A distance measure is called a metric distance measure if it also satisfies the following properties:
1. Triangle inequality: $d(x_i, x_k) \le d(x_i, x_j) + d(x_j, x_k) \quad \forall x_i, x_j, x_k \in S$
2. $d(x_i, x_j) = 0 \Rightarrow x_i = x_j \quad \forall x_i, x_j \in S$
There are variations in distance measures depending upon the attribute in question.
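The properties above can be tested numerically on a finite sample of points. The sketch below, which assumes Python 3.8+ for `math.dist`, checks symmetry, the zero-distance property and the triangle inequality for the Euclidean distance, and shows that the squared Euclidean distance fails the triangle inequality. The paper does not single out these two measures; they and the helper name `is_metric_on` are chosen here only as examples.

```python
import itertools
import math


def euclidean(x, y):
    return math.dist(x, y)


def squared_euclidean(x, y):
    return math.dist(x, y) ** 2


def is_metric_on(points, d, tol=1e-12):
    """Check the metric properties of d on a finite set of points."""
    for x, y in itertools.product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:        # symmetry
            return False
        if d(x, y) <= tol and x != y:           # d(x, y) = 0 implies x = y
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # triangle inequality
            return False
    return True


points = [(0, 0), (3, 4), (6, 8)]
print(euclidean((0, 0), (3, 4)))
print(is_metric_on(points, euclidean))
print(is_metric_on(points, squared_euclidean))
```

Passing the check on a sample does not prove that a measure is a metric in general; failing it, as the squared distance does here, is enough to show that it is not.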
3.4.1. Clustering Algorithms
A number of clustering algorithms are in popular use. The basic reason for the large number of clustering methods is that the notion of “cluster” is not precisely defined (Estivill-Castro, 2000). As a result, many clustering methods have been developed, each using a different induction principle.
1. Exclusive Clustering
In this clustering algorithm, data are grouped in an exclusive way, so that each data point belongs to only one cluster. An example of exclusive clustering is K-means clustering.
2. Overlapping Clustering
This clustering algorithm uses fuzzy sets to group data, so each point may belong to two or more clusters with different degrees of membership.
3. Hierarchical Clustering
Hierarchical clustering has two variations: agglomerative and divisive clustering.
Agglomerative clustering is based on the union of the two nearest clusters. The initial state is obtained by setting every data point as its own cluster. After some iterations it reaches the final clusters needed. It is a bottom-up version.
Divisive clustering begins with one cluster containing all data items. At every step, clusters are successively split into smaller clusters according to some dissimilarity. It is a top-down version.
4. Probabilistic Clustering
It uses a mixture of Gaussians and takes a completely probabilistic approach.
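As an illustration of exclusive clustering, the following is a minimal K-means sketch in plain Python (3.8+ for `math.dist`). It assumes Euclidean distance and takes the initial centroids as an explicit argument so that the result is reproducible; real implementations usually choose them randomly or with a seeding heuristic. The function name `k_means` and the six sample points are made up.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Assign each point to exactly one cluster; return (assignments, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:   # converged: nothing changed
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
assignments, centroids = k_means(points, initial_centroids=[(1, 1), (8, 8)])
print(assignments)
print(centroids)
```

Note that the number of clusters k has to be supplied by the user, which is one of the disadvantages of clustering listed later in this section.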
3.4.2. Evaluation Criteria Measures for Clustering Technique
Basically, the criteria are divided into two groups: internal quality criteria and external quality criteria.
1. Internal Quality Criteria
Using a similarity measure, these criteria measure the compactness of the clusters. They generally take into consideration the intra-cluster homogeneity, the inter-cluster separability or a combination of the two. They do not use any external information besides the data itself.
2. External Quality Criteria
External quality criteria are useful for examining whether the structure of the clusters matches some previously defined classification of the instances or objects.
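One example of each kind of criterion is sketched below. The paper does not name specific measures, so both are assumptions chosen for illustration: the within-cluster sum of squared errors (SSE) as an internal criterion of compactness, and purity as an external criterion that compares clusters with previously known labels. The function names and data are hypothetical.

```python
import math
from collections import Counter


def within_cluster_sse(points, assignments):
    """Internal criterion: sum of squared distances to each cluster's centroid."""
    clusters = {}
    for point, cluster_id in zip(points, assignments):
        clusters.setdefault(cluster_id, []).append(point)
    sse = 0.0
    for members in clusters.values():
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        sse += sum(math.dist(p, centroid) ** 2 for p in members)
    return sse


def purity(assignments, true_labels):
    """External criterion: share of objects in their cluster's majority class."""
    clusters = {}
    for cluster_id, label in zip(assignments, true_labels):
        clusters.setdefault(cluster_id, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(true_labels)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(within_cluster_sse(points, [0, 0, 1, 1]))   # compact clusters
print(within_cluster_sse(points, [0, 1, 0, 1]))   # stretched clusters
print(purity([0, 0, 1, 1], ["pos", "pos", "neg", "pos"]))
```

A lower SSE indicates more compact clusters, while a purity closer to 1 indicates better agreement with the predefined classification.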
3.4.3. Accuracy
Depending on the data, the accuracy of clustering techniques varies from 65.33% to 99.57%.
3.4.4. Advantages of Clustering Method
The most important benefit of this technique is that it offers the classes or groups that fulfill
(approximately) an optimality measure.
3.4.5. Disadvantages of Clustering Method
1. There is no learning set of labeled observations.
2. Number of groups is usually unknown.
3. Implicitly, users already choose the appropriate features and distance measure.
4. CONCLUSION
An important part of information gathering has always been finding out what other people think. The rising availability of opinion-rich resources such as online review websites and blogs means that one can easily search for and recognize the opinions of others. People can express their ideas and opinions concerning goods and services. These views and thoughts are subjective information that signifies someone's opinions, sentiments, emotional state or evaluation.
In this paper, different methods for data (feature or text) extraction are presented. Every method has some benefits and limitations, and one can use these methods according to the situation for feature and text extraction. Based on the survey, table 1 shows the accuracy of the different methods on different data sets using the N-gram feature.
Table 1: Accuracy of Different Methods

                  Movie Reviews              Product Reviews
                  NB      MLP     SVM        NB      MLP     SVM
N-gram Feature    75.50   81.05   81.15      62.50   79.27   79.40
According to the survey, the accuracy of SVM is better than that of the other methods when the N-gram feature is used.
The four methods discussed in the paper are applicable in different areas; for example, clustering is applied to movie reviews and SVM techniques are applied to biological reviews and analysis.
Although the field of opinion mining is new, diverse methods are already available, and they can be implemented in various programming languages such as PHP and Python, with innumerable applications as an outcome. From a convergent point of view, Naïve Bayes is best suited to textual classification, clustering to consumer services, and SVM to biological reading and interpretation.
ACKNOWLEDGEMENTS
Every good piece of writing requires the help and support of many people to be truly good. I would like to take this opportunity to thank all those who extended a helping hand whenever I needed one.
I offer my heartfelt gratitude to Mr. Mohd. Shahid Husain, who encouraged, guided and helped me a lot in the project. I extend my thanks to Miss Ratna Singh (fiancée) for her incandescent help in completing this paper.
A vote of thanks goes to my family for their moral and emotional support. Above all, utmost thanks to the Almighty God for the divine intervention in this academic endeavor.
REFERENCES
[1] Ion Smeureanu, Cristian Bucur, “Applying Supervised Opinion Mining Techniques on Online User Reviews”, Informatica Economică, vol. 16, no. 2, 2012.
[2] Bo Pang and Lillian Lee, “Opinion Mining and Sentiment Analysis”, Foundations and Trends in Information Retrieval, Vol. 2, Nos. 1-2, 2008.
[3] Abbasi, “Affect intensity analysis of dark web forums”, in Proceedings of Intelligence and Security Informatics (ISI), pp. 282-288, 2007.
[4] K. Dave, S. Lawrence & D. Pennock, “Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews”, Proceedings of the 12th International Conference on World Wide Web, pp. 519-528, 2003.
[5] B. Liu, “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data”, Opinion Mining, Springer, 2007.
[6] B. Pang & L. Lee, “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales”, Proceedings of the Association for Computational Linguistics (ACL), pp. 115-124, 2005.
[7] Nilesh M. Shelke, Shriniwas Deshpande, Vilas Thakre, “Survey of Techniques for Opinion Mining”, International Journal of Computer Applications (0975-8887), Volume 57, No. 13, November 2012.
[8] Nidhi Mishra and C. K. Jha, “Classification of Opinion Mining Techniques”, International Journal of Computer Applications 56(13):1-6, October 2012, published by Foundation of Computer Science, New York, USA.
[9] Oded Z. Maimon, Lior Rokach, “Data Mining and Knowledge Discovery Handbook”, Springer, 2005.
[10] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, “Sentiment classification using machine learning techniques”, in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79-86.
[11] “Towards Enhanced Opinion Classification using NLP Techniques”, IJCNLP 2011, pp. 101-107, Chiang Mai, Thailand, November 13, 2011.
Author
Pravesh Kumar Singh is a fine blend of strong scientific orientation and editing. He holds a Bachelor of Technology degree in Computer Science from Dr. Ram Manohar Lohia Awadh University, a renowned gurukul in India, where he excelled not only in academics but also in choreography. He received his master's degree in Computer Science and Engineering from Integral University, Lucknow, India. He currently heads the MCA (Master in Computer Applications) department at Thakur Publications and also works as Senior Editor.
... Consequently, accurate sentiment analysis hinges on the initial identification and elimination of botgenerated content [10]. Furthermore, sentiment analysis is highly reliant on suitable training datasets, and SVMs have emerged as a powerful model, demonstrating an impressive 80 percent accuracy in predicting the influence of political parties [11]. However, an essential prerequisite involves identifying human-generated content accurately amidst the presence of bots and other automated programs [12] . ...
... The text-blob automatically removes Hashtag's symbol from the fetched tweets. According to one idea, [11] adding the content, keywords, or title of the documents that a URL leads to as a feature that will improve our performance. d) Removing Numbers, Special Characters and Punctuations: Text-blob also removes punctuation and numbers from the data to make it less noisy. ...
... Pada teknik klasifikasi machine learning di atas dapat diketahui bahwa naïve bayes digunakan untuk menghitung probabilitas dengan waktu komputasi yang pendek pada data latih dengan performa yang baik (Jadhav dan Channe, 2016). Selain itu, penggunaan algoritma naïve bayes yang mudah (dalam interpretasi model) dan tingkat efisiensi yang baik, dipandang sebagai keunggulan dibanding algoritma atau teknik lainnya dalam machine learning (Ahmad et al., 2017;Singh & Husain, 2014). Decision tree dan K-nearest neighbors (KNN) merupakan algoritma supervised learning dengan pengambilan keputusan menggunakan model seperti pohon dan data yang baru masuk tergantung dengan nilai yang paling banyak pada "tetangga" nya. ...
Article
Full-text available
Analisis sentimen atau bisa disebut juga opinion mining merupakan salah satu tugas utama dari Natural Language Processing (NLP) yang merupakan studi komputasi yang mempelajari tentang pendapat seseorang terhadap suatu topik bahasan atau entitas. Analisis dilakukan dengan algoritma machine learning (pembelajaran mesin) Naïve Bayes, Decision Tree, dan K-Nearest Neighbor dengan membagi sentimen ke dalam dua kategori sentimen yaitu sentimen positif dan sentimen negatif. Data analisis diambil dari Financial Opinion Mining and Question Answering (FiQA) dan The Financial PhraseBank yang terdiri dari 4.840 kalimat yang dipilih dari berbagai berita keuangan dan dianotasi oleh 16 annotator berbeda yang berpengalaman dalam domain finansial. Penelitian ini ditujukan untuk mendapatkan hasil analisis sentimen dengan algoritma terbaik melalui perbandingan performa algoritma machine learning Naïve Bayes, Decision Tree, dan K-Nearest Neighbor terhadap kalimat finansial yang disajikan oleh FiQA dan The Financial PhraseBank. Berdasarkan analisis, didapatkan hasil performa dari masing-masing algoritma dengan nilai akurasi algoritma Naïve Bayes sebesar 78,45%; algoritma Decision Tree dengan nilai akurasi sebesar 77,72%; algoritma K-Nearest Neighbor (k=3) dengan nilai akurasi sebesar 41,25%; dan K-Nearest Neighbor (k=5) dengan nilai akurasi sebesar 37,38%. Analisis sentimen dengan algoritma Naive Bayes memiliki performa paling baik dengan nilai akurasi paling tinggi. Sentiment analysis or can also be called opinion mining is one of the main tasks of Natural Language Processing (NLP) which is a computational study that studies a person's opinion on a topic or entity. The analysis was performed with machine learning algorithms Naïve Bayes, Decision Tree, and K-Nearest Neighbor by dividing sentiment into two categories of sentiment namely positive sentiment and negative sentiment. 
The analysis data was taken from Financial Opinion Mining and Question Answering (FiQA) and The Financial PhraseBank which consisted of 4,840 sentences selected from various financial news and annotated by 16 different annotators experienced in the financial domain. This research is aimed at obtaining sentiment analysis results with the best algorithms through comparison of the performance of Naïve Bayes, Decision Tree, and K-Nearest Neighbor machine learning algorithms against financial sentences presented by FiQA and The Financial PhraseBank. Based on the analysis, the performance results of each algorithm were obtained with the accuracy value of the Naïve Bayes algorithm of 78,45%; Decision Tree algorithm with an accuracy value of 77,72%; K-Nearest Neighbor algorithm (k=3) with an accuracy value of 41,25%; and K-Nearest Neighbor (k=5) with an accuracy value of 37,38%. Sentiment analysis with the Naive Bayes algorithm (K=5) performs best with the highest accuracy values.
... There are many more applications and improvements on opinion mining and sentiment analysis algorithms. This survey give a several sentiment or opinion mining technique and their accuracy [2], [3], [4], [5], [6], [7], [8], [9], [10] and [11] given by authors. (Table 1) 5 ...
Article
Full-text available
Now a days, customers opinions are plays the major role in the E-commerce applications such as Flipkart, Amazon, eBay etc. Based on customer feedback on the product or seller in the form reviews or comments are the difficulty process by potential buyers to choose a products through online. In the proposed system, the various sentiment analysis techniques to provide a solution in two main areas. 1) Extract customer opinions on specific product or seller. 2) Analyze the sentiments towards that specific product or seller. In this paper, we analyzed several opinion mining techniques and sentiment analysis and their correctness in the categories of opinions or sentiments.
Chapter
With the technological developments in the fields of natural language processing (NLP) and opinion mining (OM), many real-time applications are concentrating on analyzing the opinions of the people. The opinions or reviews given by the people through the internet are collected for summarization or classification based on the need. The feature selection typically saves the operating time, eliminates irrelevant features and redundancy. For feature selection, a semantic based feature selection algorithm called information gain (IG) is used. Naive Bayes, bagging, support vector machines (SVM), classification and regression trees (CART), and algorithms along with optimization techniques like ant colony optimization algorithms are used to optimize and classify the opinions. Also, in this chapter, the state-of-the art machine learning technique, deep learning, is also involved with the convolution neural networks (CNN) algorithm to identify the positive and negative opinions in different fields such as movie reviews, emojis and medical data.
Article
Full-text available
The 2021 COP26 meeting presented South Africa with an $8.5 billion deal to reduce its heavy reliance on coal, sparking a renewed public debate about transforming the country's coal-fired energy system to address emissions, energy deficits, and declining services. This paper examines public opinion on this important energy transition initiative as expressed on social media. While the use of social media platforms for public deliberation on policy matters is increasing in Africa, research exploring the African social media landscape in the context of energy transition is limited. This paper addresses this gap by qualitatively analysing 3,980 Facebook comments on 31 news posts related to the COP26 deal using sentiment and thematic approaches in ATLAS.ti 22. The findings reveal a prevalent negative sentiment and delegitimizing opinions that challenge the deal's credibility. Prominent topics within the discourse encompass concerns about corruption, distrust in public institutions, and perceptions of foreign involvement. Although some motivating factors supporting the deal emerged, negative sentiments and viewpoints dominated the discourse. Studying symbolic practices related to energy visions in this underexplored Global South context yields valuable insights into public opinions on energy transitions, highlighting the link between governance institutions and societal attitudes toward energy transition.
Chapter
Sentiment analysis's main goal is to extract the context from the text. The digital world of today offers us a variety of raw data formats, including blogs, Twitter, and Facebook. In order to perform analysis on this raw data, researchers must transform it into useful information. Numerous researchers used both deep learning and traditional machine learning techniques to determine the text's polarity. In order to understand the work done, we reviewed both approaches in this paper. The best methods for classifying the text will be selected by the researchers with the aid of this paper. We select a few of the best articles and evaluate them critically based on various factors. The purpose of this study is to explore the different machine learning and deep learning techniques to identify its importance as well as to raise an interest for this research area.
Article
Full-text available
In recent years, the spectacular development of web technologies, lead to an enormous quantity of user generated information in online systems. This large amount of information on web platforms make them viable for use as data sources, in applications based on opinion mining and sentiment analysis. The paper proposes an algorithm for detecting sentiments on movie user reviews, based on naive Bayes classifier. We make an analysis of the opinion mining domain, techniques used in sentiment analysis and its applicability. We implemented the proposed algorithm and we tested its performance, and suggested directions of development.
Article
Full-text available
The important part to gather the information is always seems as what the people think. The growing availability of opinion rich resources like online review sites and blogs arises as people can easily seek out and understand the opinions of others. Users express their views and opinions regarding products and services. These opinions are subjective information which represents user’s sentiments, feelings or appraisal related to the same. The concept of opinion is very broad. In this paper we focus on the Classification of opinion mining techniques that conveys user’s opinion i.e. positive or negative at various levels. The precise method for predicting opinions enable us, to extract sentiments from the web and foretell online customer’s preferences, which could prove valuable for marketing research. Much of the research work had been done on the processing of opinions or sentiments recently because opinions are so important that whenever we need to make a decision we want to know others ’ opinions. This opinion is not only important for a user but is also useful for an organization.
Conference Paper
Affects play an important role in influencing people's perceptions and decision making. Affect analysis is useful for measuring the presence of hate, violence, and the resulting propaganda dissemination across extremist groups. In this study we performed affect analysis of U.S. and Middle Eastern extremist group forum postings. We constructed an affect lexicon using a probabilistic disambiguation technique to measure the usage of violence and hate affects. These techniques facilitate in-depth analysis of multilingual content. The proposed approach was evaluated by applying it across 16 U.S. supremacist and Middle Eastern extremist group forums. Analysis across regions reveals that the Middle Eastern test bed forums have considerably greater violence intensity than the U.S. groups. There is also a strong linear relationship between the usage of hate and violence across the Middle Eastern messages.
Article
The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics and heuristics vary depending on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited due to noise and ambiguity. But in the context of a complete web-based tool and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.
Book
The Data Mining process encompasses many different specific techniques and algorithms that can be used to analyze the data and derive the discovered knowledge. An important problem regarding the results of the Data Mining process is the development of efficient indicators for assessing the quality of the results of the analysis. This quality assessment problem is a cornerstone issue of the whole process because: i) The analyzed data may hide interesting patterns that the Data Mining methods are called to reveal. Due to the size of the data, the requirement for automatically evaluating the validity of the extracted patterns is stronger than ever. ii) A number of algorithms and techniques have been proposed which, under different assumptions, can lead to different results. iii) The number of patterns generated during the Data Mining process is very large, but only a few of these patterns are likely to be of any interest to the domain expert who is analyzing the data. In this chapter we introduce the main concepts and quality criteria in Data Mining. We also present an overview of approaches that have been proposed in the literature for evaluating Data Mining results.
Book
The article mainly describes the basic tasks of Web data mining, including content mining, structure mining, and usage mining. Given the complexity and particular characteristics of Web data, only a small part of it, such as daily logs, can be mined with commonly used data mining methods. Beyond that, Web pages must first undergo the necessary data processing so that they meet the mining requirement of structured data, or XML techniques must be used to construct a semi-structured data model before mining is carried out.
Article
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.
Article
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
Article
We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star". We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.
Nilesh M. Shelke, Shriniwas Deshpande, Vilas Thakre, Survey of Techniques for Opinion Mining, International Journal of Computer Applications (0975-8887) Volume 57-No.13, November 2012.