Methodological Study Of Opinion Mining And Sentiment Analysis Techniques

February 2014
International Journal on Soft Computing 5(1):11-21

DOI:10.5121/ijsc.2014.5102

License
CC BY 4.0

Authors:

Mohd Shahid Husain

University of Technology and Applied Sciences (CAS - Ibri) Oman

Pravesh Kumar Singh

Thakur Publication pvt. ltd


International Journal on Soft Computing (IJSC) Vol. 5, No. 1, February 2014

DOI: 10.5121/ijsc.2014.5102

METHODOLOGICAL STUDY OF OPINION MINING AND SENTIMENT ANALYSIS TECHNIQUES

Pravesh Kumar Singh¹, Mohd Shahid Husain²

¹M.Tech, Department of Computer Science and Engineering, Integral University, Lucknow, India

²Assistant Professor, Department of Computer Science and Engineering, Integral University, Lucknow, India

ABSTRACT

Decision making, at both the individual and the organizational level, is always accompanied by a search for the opinions of others. The tremendous growth of opinion-rich resources such as reviews, forum discussions, blogs, micro-blogs and Twitter provides a rich anthology of sentiments. This user-generated content can benefit the market if its semantic orientation is analyzed. Opinion mining and sentiment analysis formalize the study and interpretation of opinions and sentiments. The digital ecosystem has itself paved the way for the use of the huge volume of opinionated data that is recorded. This paper is an attempt to review and evaluate the various techniques used for opinion mining and sentiment analysis.

KEYWORDS

Opinion Mining, Sentiment Analysis, Feature Extraction Techniques, Naïve Bayes Classifiers, Clustering,

Support Vector Machines

1. INTRODUCTION

Individuals and companies are generally interested in the opinions of others. For example, someone who wants to purchase a new product first looks at the reviews, i.e., what other people think about the product, and decides on the basis of those reviews.

Similarly, companies dig deep for consumer reviews. The digital ecosystem offers a plethora of them in the form of blogs, reviews etc.

A very basic step of opinion mining and sentiment analysis is feature extraction. Figure 1 shows the process of opinion mining and sentiment analysis.

There are various methods used for opinion mining and sentiment analysis, among which the following are the important ones:

1) Naïve Bayes Classifier.

2) Support Vector Machine (SVM).

3) Multilayer Perceptron.

4) Clustering.

This paper categorizes the work done on feature extraction and classification in opinion mining and sentiment analysis. In addition, the performance, advantages and disadvantages of the different techniques are appraised.

2. DATA SETS

This section provides brief details of the datasets used in the experiments.

2.1. Product Review Dataset

Blitzer collected product reviews from amazon.com belonging to a total of 25 categories, such as videos and toys, and randomly selected 4000 positive (+ve) and 4000 negative (−ve) reviews.

2.2. Movie Review Dataset

The movie review dataset is taken from the work of Pang and Lee (2004). It contains 1000 positive (+ve) and 1000 negative (−ve) processed movie reviews.

3. CLASSIFICATION TECHNIQUES

3.1. Naïve Bayes Classifier

It is a probabilistic, supervised classifier based on Bayes' theorem, which is named after Thomas Bayes. According to this theorem, if there are two events $e_1$ and $e_2$, then the conditional probability of occurrence of event $e_1$ when $e_2$ has already occurred is given by the following formula:

$$P(e_1 \mid e_2) = \frac{P(e_2 \mid e_1)\, P(e_1)}{P(e_2)}$$

This algorithm is used to calculate the probability that a piece of data is positive or negative. So, the conditional probability of a sentiment is given as:

$$P(\text{Sentiment} \mid \text{Sentence}) = \frac{P(\text{Sentiment})\, P(\text{Sentence} \mid \text{Sentiment})}{P(\text{Sentence})}$$

And the conditional probability of a word is given as:

$$P(\text{Word} \mid \text{Sentiment}) = \frac{\text{Number of occurrences of the word in the class} + 1}{\text{Number of words belonging to the class} + \text{Total number of words}}$$

Algorithm

S1: Initialize P(positive) ← num_sentences(positive) / num_total_sentences

S2: Initialize P(negative) ← num_sentences(negative) / num_total_sentences

S3: Convert sentences into words

for each class in {positive, negative}:

    for each word in {phrase}:

        P(word | class) ← (num_occurrences(word | class) + 1) / (num_words(class) + num_total_words)

        P(class) ← P(class) * P(word | class)

Return max {P(pos), P(neg)}

The above algorithm can be represented using figure 2
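Steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the implementation evaluated in [1]: the function names and the four training sentences are hypothetical, the "total number of words" is assumed to be the number of distinct words in the training data, and logarithms are summed instead of multiplying probabilities in order to avoid numerical underflow.

```python
import math
from collections import Counter


def train_naive_bayes(labelled_sentences):
    """Collect the counts needed by steps S1-S3 from (sentence, label) pairs."""
    sentence_counts = Counter()   # number of sentences per class
    word_counts = {}              # class -> Counter of word occurrences
    vocabulary = set()            # distinct words seen in training (assumption)
    for sentence, label in labelled_sentences:
        sentence_counts[label] += 1
        words = sentence.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return sentence_counts, word_counts, vocabulary


def classify(sentence, model):
    """Return the class with the highest log-posterior score."""
    sentence_counts, word_counts, vocabulary = model
    total_sentences = sum(sentence_counts.values())
    scores = {}
    for label in sentence_counts:
        # S1/S2: prior P(class).
        score = math.log(sentence_counts[label] / total_sentences)
        words_in_class = sum(word_counts[label].values())
        for word in sentence.lower().split():
            # S3: add-one smoothed P(word | class).
            p = (word_counts[label][word] + 1) / (words_in_class + len(vocabulary))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set, for illustration only.
TRAINING = [
    ("great plot and great acting", "positive"),
    ("a wonderful moving film", "positive"),
    ("boring plot and terrible acting", "negative"),
    ("a dull awful film", "negative"),
]
model = train_naive_bayes(TRAINING)
print(classify("great film", model))
print(classify("terrible film", model))
```

On this toy data the first sentence is scored as positive and the second as negative, because "great" occurs only in positive training sentences and "terrible" only in negative ones.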

3.1.1. Evaluation of Algorithm

To evaluate the algorithm, the following measures are used:

Accuracy

Precision

Recall

Relevance

The following contingency table is used to calculate the various measures.

                       Relevant               Irrelevant
Detected Opinions      True Positive (tp)     False Positive (fp)
Undetected Opinions    False Negative (fn)    True Negative (tn)

Now,

$$\text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Recall} = \frac{tp}{tp + fn}$$

$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$

and the harmonic mean of precision and recall (the F-measure) is

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
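The measures can be computed directly from the four cells of the contingency table. The sketch below is illustrative only: the function name and the counts are hypothetical, and it assumes that none of the denominators is zero.

```python
def evaluate(tp, fp, fn, tn):
    """Compute the evaluation measures from contingency-table counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts: 40 tp, 10 fp, 20 fn, 30 tn.
measures = evaluate(tp=40, fp=10, fn=20, tn=30)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With these counts the precision is 0.8, the recall is 40/60, the accuracy is 0.7 and the F-measure is 80/110.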

Figure 2. Algorithm of Naïve Bayes (diagram labels: Training Set with +ve and −ve Sentences, Classifier, Book Review, Review Sentence)

3.1.2. Accuracy

Smeureanu and Bucur [1] trained the Naïve Bayes algorithm on 5000 sentences and obtained an accuracy of 0.79939209726444 (about 79.9%), where the number of groups (n) is 2.

3.1.3. Advantages of Naïve Bayes Classification Method

1. The model is easy to interpret.

2. Computation is efficient.

3.1.4. Disadvantage of Naïve Bayes Classification Method

It assumes that the attributes are independent, which is not necessarily valid.

3.2 Support Vector Machine (SVM)

SVM is a supervised learning model with an associated learning algorithm that analyzes the data and identifies patterns for classification.

The SVM algorithm is based on the concept of a decision plane that defines decision boundaries. A decision plane separates groups of instances that have different class memberships.

For example, consider instances that belong to either class Circle or class Diamond. There is a separating line (figure 3) which defines a boundary. On the right side of the boundary all instances are Circle and on the left side all instances are Diamond.

Suppose there is a training data set D, a set of n points written as:

$$D = \{(x_i, c_i) \mid x_i \in \mathbb{R}^p,\; c_i \in \{-1, 1\}\}_{i=1}^{n} \qquad (1)$$

where $x_i$ is a p-dimensional real vector and $c_i$ is its class label. The goal is to find the maximum-margin hyperplane, i.e., the hyperplane that splits the points having $c_i = 1$ from those having $c_i = -1$ with the largest margin. Any hyperplane can be written as the set of points x satisfying:

$$w \cdot x - b = 0 \qquad (2)$$

Figure 3. Principle of SVM (diagram labels: Support Vectors)

Finding the maximum-margin hyperplane reduces to finding the pair w and b such that the distance between two parallel hyperplanes is maximal while they still separate the data. These hyperplanes are described by:

$$w \cdot x - b = 1$$

and

$$w \cdot x - b = -1$$

The distance between these two hyperplanes is $2 / \|w\|$, and therefore $\|w\|$ needs to be minimized. The problem is thus to minimize $\|w\|$ in w, b subject to $c_i (w \cdot x_i - b) \geq 1$ for any i = 1…n.

Using Lagrange multipliers ($\alpha_i \geq 0$), and replacing $\|w\|$ by the equivalent objective $\frac{1}{2}\|w\|^2$, this optimization problem can be expressed as:

$$\min_{w,b} \max_{\alpha \geq 0} \left\{ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ c_i (w \cdot x_i - b) - 1 \right] \right\} \qquad (3)$$
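The constraint $c_i (w \cdot x_i - b) \geq 1$ and the margin width can be checked numerically for a candidate hyperplane. The following minimal Python sketch assumes a hypothetical four-point data set and hand-picked values of w and b; the helper names are illustrative only, and the width $2 / \|w\|$ is the standard distance between the two margin hyperplanes. It checks candidates and does not solve the optimization problem.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def margin_width(w):
    """Distance between the hyperplanes w.x - b = 1 and w.x - b = -1."""
    return 2 / math.sqrt(dot(w, w))


def satisfies_constraints(points, labels, w, b):
    """True if c_i * (w.x_i - b) >= 1 holds for every training point."""
    return all(c * (dot(w, x) - b) >= 1 for x, c in zip(points, labels))


# Hypothetical, linearly separable toy data.
points = [(2, 2), (3, 3), (0, 0), (-1, 0)]
labels = [1, 1, -1, -1]

for w, b in [((0.5, 0.5), 1), ((1, 1), 2), ((0.1, 0.1), 0.2)]:
    feasible = satisfies_constraints(points, labels, w, b)
    print(w, b, feasible, round(margin_width(w), 3))
```

Both of the first two candidates satisfy all constraints, but the first has the smaller $\|w\|$ and therefore the wider margin, which is why the optimization prefers it. The third candidate has an even smaller $\|w\|$ but violates the constraints.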

3.2.1. Extensions of SVM

Some extensions make SVM more robust and adaptable to real-world problems. These extensions include the following:

1. Soft Margin Classification

In text classification the data are sometimes linearly separable, in particular for very high-dimensional problems. In most cases, however, the preferred opinion mining solution is one that classifies most of the data correctly and ignores outliers and noisy data. If a training set D cannot be separated cleanly, the solution is to use a fat decision margin and allow some mistakes.

Mathematically, slack variables $\xi_i$ are introduced; a non-zero $\xi_i$ allows $x_i$ to not meet the margin requirement at a cost proportional to $\xi_i$.
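The role of the slack variables can be made concrete with a small sketch. It assumes the usual soft-margin formulation, in which $\xi_i = \max(0,\, 1 - c_i (w \cdot x_i - b))$ and the objective is $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$; the trade-off parameter C, the data and the function names are assumptions made for illustration and do not come from the surveyed papers.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slack_values(points, labels, w, b):
    """Slack xi_i: how far each point falls short of the margin requirement."""
    return [max(0.0, 1 - c * (dot(w, x) - b)) for x, c in zip(points, labels)]


def soft_margin_objective(points, labels, w, b, C=1.0):
    """0.5 * ||w||^2 plus C times the total slack (C is an assumed parameter)."""
    return 0.5 * dot(w, w) + C * sum(slack_values(points, labels, w, b))


# Hypothetical data: the last point is a negative example lying on the
# separating hyperplane, so it cannot meet the margin requirement.
points = [(2, 2), (3, 3), (0, 0), (1, 1)]
labels = [1, 1, -1, -1]
w, b = (0.5, 0.5), 1

slacks = slack_values(points, labels, w, b)
objective = soft_margin_objective(points, labels, w, b, C=1.0)
print(slacks, objective)
```

Only the last point receives a non-zero slack; the larger C is chosen, the more such violations cost relative to a wide margin.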

2. Non-linear Classification

Non-linear classifiers were proposed by Bernhard Boser, Isabelle Guyon and Vladimir Vapnik in 1992 by applying the kernel trick to maximum-margin hyperplanes.

In the kernel trick, originally proposed by Aizerman, every dot product is replaced by a non-linear kernel function. In this case the effectiveness of the SVM depends on the selection of the kernel and of the soft margin parameters.
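The idea that a kernel replaces a dot product can be checked on a small example. The sketch below assumes the homogeneous polynomial kernel of degree 2 in two dimensions, chosen here only because its feature map is easy to write down; it shows that the kernel value equals the dot product of the explicitly mapped vectors, so the mapping never has to be computed in practice.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y):
    """Degree-2 polynomial kernel K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def feature_map(x):
    """Explicit feature map phi with phi(x) . phi(y) = (x . y)^2 in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, y)                     # via the kernel function
explicit = dot(feature_map(x), feature_map(y))   # via the mapped vectors
print(implicit, explicit)
```

Both routes give the same value up to floating-point rounding, while the kernel route works entirely in the original two-dimensional space.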

3. Multiclass SVM

SVM is basically suited to two-class tasks, but multiclass SVM is available for multiclass problems. In the multiclass case, labels drawn from a finite set of several elements are assigned to objects. The usual approach is to reduce the single multiclass problem to multiple binary classification problems, whose binary classifiers can be built in two ways:

1. Distinguishing between one label and all the rest (one versus all), and

2. Distinguishing between every pair of classes (one versus one).
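The two decompositions differ in how many binary classifiers they require. The following sketch only enumerates the binary tasks for a hypothetical label set; the label names and function names are illustrative, and no classifier is trained.

```python
from itertools import combinations


def one_vs_rest_tasks(labels):
    """One binary task per label: that label against all remaining labels."""
    return [(label, [other for other in labels if other != label])
            for label in labels]


def one_vs_one_tasks(labels):
    """One binary task per unordered pair of labels."""
    return list(combinations(labels, 2))


# Hypothetical sentiment labels.
labels = ["positive", "neutral", "negative", "mixed"]
rest_tasks = one_vs_rest_tasks(labels)
pair_tasks = one_vs_one_tasks(labels)
print(len(rest_tasks), len(pair_tasks))
```

For k labels the first scheme needs k classifiers and the second k(k − 1)/2, so with four labels there are 4 and 6 binary tasks respectively.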

3.2.2. Accuracy

Pang et al. [10] obtained the best result with unigram features in a presence-based (rather than frequency-based) model classified by SVM, reaching 82.9% accuracy.

3.2.3. Advantages of Support Vector Machine Method

1. Very good performance in experimental results.

2. Low dependency on data set dimensionality.

3.2.4. Disadvantages of Support Vector Machine Method

1. Categorical or missing values need to be pre-processed.

2. The resulting model is difficult to interpret.

3.3. Multi-Layer Perceptron (MLP)

A multi-layer perceptron is a feed-forward neural network with one or more layers between the input and the output. Feed-forward means that data flows in one direction, from the input layer to the output layer. This artificial neural network (ANN) begins with an input layer in which every node represents a predictor variable. The input nodes (neurons) are connected to every neuron in the next layer (called a hidden layer). The neurons of a hidden layer are in turn connected to the neurons of the next hidden layer, and so on up to the output layer.

The output layer is made up as follows:

1. When the prediction is binary, the output layer consists of one neuron, and

2. When the prediction is non-binary, the output layer consists of N neurons.

This arrangement gives an efficient flow of information from the input layer to the output layer.

Figure 4 shows the structure of an MLP. As in a single-layer perceptron there is an input layer and an output layer, but there is also a hidden layer.

An MLP is trained with the back-propagation algorithm, which has two phases:

Phase I: The forward phase, in which activations are propagated from the input layer to the output layer.

Phase II: The backward phase, in which the error between the actual output value and the requested nominal value in the output layer is propagated backwards in order to change the weights and bias values.
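The two phases can be sketched for a very small network. The example below assumes one hidden layer of two sigmoid neurons, a single sigmoid output neuron, the squared error $\frac{1}{2}(\text{output} - \text{target})^2$ and plain gradient descent; the initial weights, the learning rate and the function names are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Phase I: propagate activations from the input layer to the output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def backward_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.5):
    """Phase II: propagate the output error backwards, update weights and biases."""
    hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
    # Error signal at the output neuron for the loss 0.5 * (output - target)^2.
    delta_out = (output - target) * output * (1 - output)
    # Error signals of the hidden neurons, using the old output weights.
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out - lr * delta_out
    new_w_hidden = [[w - lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - lr * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out


# Hypothetical input, target and initial parameters.
x, target = [1.0, 0.0], 1.0
w_hidden, b_hidden = [[0.5, -0.4], [0.3, 0.8]], [0.0, 0.0]
w_out, b_out = [0.7, -0.2], 0.0

_, before = forward(x, w_hidden, b_hidden, w_out, b_out)
updated = backward_step(x, target, w_hidden, b_hidden, w_out, b_out)
_, after = forward(x, *updated)
print(before, after)
```

After one update the output moves closer to the target; repeating the two phases over many training examples is what training an MLP amounts to.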

MLP is a popular technique because it can act as a universal function approximator. A “back propagation” network with at least one hidden layer and enough non-linear units can learn practically any function or relationship between a group of input and output variables (whether the variables are discrete or continuous), which makes MLP a general, flexible, non-linear tool.

An advantage of MLP compared to classical modeling methods is that it does not impose any constraint on the initial data, nor does it generally start from specific assumptions.

Another benefit of the method lies in its capability to estimate good models even in the presence of noise in the analyzed information, as arises when there are missing and outlier values in the distribution of the variables. Hence, it is a robust method for dealing with noise in the given information.

3.3.1. Accuracy

On health care data, Ludmila I. Kuncheva (IEEE Member) reported MLP accuracies of 84.25%–89.50%.

3.3.2. Advantages of MLP

1. It acts as a universal function approximator.

2. MLP can learn practically any relationship among input and output variables.

3.3.3. Disadvantages of MLP

1. MLP needs more execution time than other techniques, because its flexibility requires enough training data.

2. It is considered a complex “black box”.

3.4 Clustering Classifier

Clustering is an unsupervised learning method: no point carries a label. A clustering technique recognizes the structure in the data and groups the points based on how near they are to one another.

So, clustering is the process of organizing objects or instances into classes or groups whose members are similar in some way, while the members of one cluster are dissimilar to those of the other clusters.

Because the method is unsupervised, one does not know in advance how many clusters or groups exist in the data.

Using this method one can organize the data set into different clusters based on the similarities and distances among the data points.

A clustering is denoted as a set of subsets $C = C_1, \ldots, C_k$ of S, such that:

$$S = \bigcup_{i=1}^{k} C_i \quad \text{and} \quad C_i \cap C_j = \emptyset \;\; \text{for } i \neq j$$

Therefore, any object in S belongs to exactly one subset.
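The two conditions on the subsets (they cover S, and they are pairwise disjoint) can be verified mechanically. The sketch below is an illustration with hypothetical data; the clusters are assumed to be given as Python sets.

```python
def is_partition(clusters, s):
    """True if the clusters cover s and no element lies in two clusters."""
    union = set().union(*clusters)
    # The sizes add up to the size of the union only if the clusters are disjoint.
    disjoint = sum(len(c) for c in clusters) == len(union)
    return union == set(s) and disjoint


S = {1, 2, 3, 4, 5}
print(is_partition([{1, 2}, {3}, {4, 5}], S))     # covers S, no overlap
print(is_partition([{1, 2}, {2, 3}, {4, 5}], S))  # element 2 is in two clusters
print(is_partition([{1, 2}, {3}], S))             # elements 4 and 5 are missing
```

Only the first grouping is a valid clustering in the sense of the definition; overlapping (fuzzy) clusterings deliberately relax the disjointness condition.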

For example, consider figure 5, where the data set has three natural clusters.

Now consider some real-life examples illustrating clustering:

Example 1: Group people of similar size together in order to make small and large shirts, as a compromise between two extremes:

1. Tailor-made for each person: expensive

2. One-size-fits-all: does not fit all.

Example 2: In advertising, segment consumers according to their similarities in order to do targeted advertising.

Example 3: To create a topic hierarchy, take a group of texts and organize them according to the similarity of their content.

Basically, two kinds of measures are used to estimate this relation:

1. Distance measures and

2. Similarity measures

Distance Measures

Many clustering methods use distance measures to determine the similarity or dissimilarity between pairs of objects.

It is convenient to denote the distance between two instances $x_i$ and $x_j$ as $d(x_i, x_j)$. A valid distance measure should be symmetric and should attain its minimum value (usually zero) for identical vectors.

A distance measure is called a metric distance measure if it also satisfies the following properties:

1. Triangle inequality: $d(x_i, x_k) \leq d(x_i, x_j) + d(x_j, x_k) \quad \forall\, x_i, x_j, x_k \in S$

2. $d(x_i, x_j) = 0 \Rightarrow x_i = x_j \quad \forall\, x_i, x_j \in S$

There are variations in distance measures depending upon the attribute in question.
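These properties can be tested on a finite sample of points. The sketch below is illustrative: the function name, the tolerance and the sample points are assumptions, and passing the check on a sample does not prove that a measure is a metric in general. It compares the Euclidean distance with the squared Euclidean distance, which violates the triangle inequality.

```python
import math
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check symmetry, d = 0 only for identical points, and the triangle inequality."""
    for x, y in product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:
            return False          # not symmetric
        if d(x, y) <= tol and x != y:
            return False          # zero distance between distinct points
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False          # triangle inequality violated
    return True


def squared_euclidean(x, y):
    return math.dist(x, y) ** 2


sample = [(0, 0), (1, 0), (2, 0), (3, 4)]
print(is_metric(sample, math.dist))
print(is_metric(sample, squared_euclidean))
```

For the points (0, 0), (1, 0) and (2, 0) the squared distance from the first to the third is 4, which exceeds the sum 1 + 1 of the two intermediate squared distances, so the second check fails.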

3.4.1. Clustering Algorithms

A number of clustering algorithms are in popular use. The basic reason for the large number of clustering methods is that the notion of a “cluster” is not precisely defined (Estivill-Castro, 2000). As a result, many clustering methods have been developed, each using a different induction principle.

1. Exclusive Clustering

In this clustering algorithm, data are grouped in an exclusive way, so that each data item belongs to only one cluster. An example of exclusive clustering is K-means clustering.
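A minimal sketch of K-means illustrates the exclusive assignment. It assumes the common Lloyd iteration (assign every point to its nearest centroid, then recompute the centroids), Euclidean distance, a fixed number of iterations and initial centroids supplied by the caller; the data and names are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Lloyd iteration: assign points to the nearest centroid, then update centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)   # each point joins exactly one cluster
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical two-dimensional data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, centroids = k_means(points, centroids=[(1, 1), (8, 8)])
print(clusters)
print(centroids)
```

Every point ends up in exactly one of the two clusters. In practice the result depends on the initial centroids and on the chosen number of clusters, both of which are inputs here.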

2. Overlapping Clustering

This clustering algorithm uses fuzzy sets to group data, so each point may belong to two or more clusters with different degrees of membership.

3. Hierarchical Clustering

Hierarchical clustering has two variations: agglomerative and divisive clustering.

Agglomerative clustering is based on the union of the two nearest clusters. The initial state is obtained by treating every data item as its own cluster. After some iterations it reaches the final clusters needed. It is a bottom-up approach.

Divisive clustering begins with one cluster containing all data items. At every step, clusters are successively split into smaller clusters according to some dissimilarity. It is a top-down approach.
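The bottom-up variant can be sketched in a few lines. The example assumes single linkage (the distance between two clusters is the smallest distance between any of their members), Euclidean distance and a target number of clusters as the stopping rule; these choices, the data and the function name are assumptions made for illustration.

```python
import math


def agglomerative(points, target_clusters):
    """Repeatedly merge the two nearest clusters until target_clusters remain."""
    clusters = [[p] for p in points]      # start: every item is its own cluster
    while len(clusters) > max(target_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # union of the two nearest clusters
        del clusters[j]
    return clusters


# Hypothetical data: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = agglomerative(points, target_clusters=3)
print(result)
```

The two pairs at distance 1 are merged first and the isolated point remains a cluster of its own. A divisive method would reach a comparable grouping from the opposite direction by splitting the full data set.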

4. Probabilistic Clustering

It uses a fully probabilistic approach, for example a mixture of Gaussians.

3.4.2. Evaluation Criteria Measures for Clustering Technique

Basically, the criteria are divided into two groups: internal quality criteria and external quality criteria.

1. Internal Quality Criteria

Using a similarity measure, these criteria measure the compactness of the clusters. They generally take into consideration intra-cluster homogeneity, inter-cluster separability, or a combination of the two. They do not use any external information besides the data itself.

2. External Quality Criteria

External quality criteria are useful for examining whether the structure of the clusters matches some previously defined classification of the instances or objects.

3.4.3. Accuracy

Depending on the data, the accuracy of the clustering techniques varied from 65.33% to 99.57%.

3.4.4. Advantages of Clustering Method

The most important benefit of this technique is that it offers classes or groups that (approximately) fulfill an optimality measure.

3.4.5. Disadvantages of Clustering Method

1. There is no learning set of labeled observations.

2. The number of groups is usually unknown.

3. Implicitly, users must already have chosen the appropriate features and distance measure.

4. CONCLUSION

The most important part of gathering information has always been finding out what people think. The rising accessibility of opinion-rich resources such as online review websites and blogs means that one can easily search for and recognize the opinions of others. One can also express one's own ideas and opinions concerning goods and services. These views and thoughts are subjective information that signifies the opinions, sentiments, emotional state or evaluation of someone.

In this paper, different methods for data (feature or text) extraction are presented. Every method has some benefits and limitations, and one can use these methods according to the situation for feature and text extraction. Based on the survey, the accuracy of the different methods on different data sets using the N-gram feature is shown in table 1.

Table 1: Accuracy of Different Methods

                  Movie Reviews                  Product Reviews
                  (no label)  MLP     SVM        (no label)  MLP     SVM
N-gram Feature    75.50       81.05   81.15      62.50       79.27   79.40

According to the survey, the accuracy of SVM is better than that of the other three methods when the N-gram feature is used.

The four methods discussed in the paper are applicable in different areas; for example, clustering is applied to movie reviews and SVM techniques are applied to biological reviews and analysis. Although the field of opinion mining is new, diverse methods are available, and they can be implemented in various programming languages such as PHP and Python, with innumerable applications as the outcome. From a convergent point of view, Naïve Bayes is best suited for textual classification, clustering for consumer services and SVM for biological reading and interpretation.

ACKNOWLEDGEMENTS

Every good piece of writing requires the help and support of many people for it to be truly good. I would like to take the opportunity to thank all those who extended a helping hand whenever I needed one.

I offer my heartfelt gratitude to Mr. Mohd. Shahid Husain, who encouraged, guided and helped me a lot in the project. I extend my thanks to Miss Ratna Singh (fiancée) for her incandescent help in completing this paper.

A vote of thanks to my family for their moral and emotional support. Above all, utmost thanks to the Almighty God for the divine intervention in this academic endeavor.

REFERENCES

[1] Ion Smeureanu, Cristian Bucur, “Applying Supervised Opinion Mining Techniques on Online User Reviews”, Informatica Economică, vol. 16, no. 2, 2012.

[2] Bo Pang and Lillian Lee, “Opinion Mining and Sentiment Analysis”, Foundations and Trends in Information Retrieval, Vol. 2, Nos. 1–2, 2008.

[3] Abbasi, “Affect intensity analysis of dark web forums”, in Proceedings of Intelligence and Security Informatics (ISI), pp. 282–288, 2007.

[4] K. Dave, S. Lawrence & D. Pennock, “Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews”, Proceedings of the 12th International Conference on World Wide Web, pp. 519–528, 2003.

[5] B. Liu, “Web Data Mining: Exploring hyperlinks, contents, and usage data”, Opinion Mining, Springer, 2007.

[6] B. Pang & L. Lee, “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales”, Proceedings of the Association for Computational Linguistics (ACL), pp. 115–124, 2005.

[7] Nilesh M. Shelke, Shriniwas Deshpande, Vilas Thakre, “Survey of Techniques for Opinion Mining”, International Journal of Computer Applications (0975–8887), Volume 57, No. 13, November 2012.

[8] Nidhi Mishra and C K Jha, “Classification of Opinion Mining Techniques”, International Journal of Computer Applications 56(13):1–6, October 2012, Published by Foundation of Computer Science, New York, USA.

[9] Oded Z. Maimon, Lior Rokach, “Data Mining and Knowledge Discovery Handbook”, Springer, 2005.

[10] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, “Sentiment classification using machine learning techniques”, in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86.

[11] “Towards Enhanced Opinion Classification using NLP Techniques”, IJCNLP 2011, pages 101–107, Chiang Mai, Thailand, November 13, 2011.

Author

Pravesh Kumar Singh combines a strong scientific orientation with editing experience. He is a Computer Science (Bachelor in Technology) graduate of Dr. Ram Manohar Lohia Awadh University, a renowned gurukul in India, where he excelled not only in academics but also in choreography. He received his master's degree in Computer Science and Engineering from Integral University, Lucknow, India. He currently heads the MCA (Master in Computer Applications) department at Thakur Publications and also works as a Senior Editor.

Vote Bank Forecast Based on Social Media Sentiment Analysis

Conference Paper

Full-text available

Dec 2023

Analisis Sentimen terhadap Kalimat Finansial pada FiQA dan The Financial PhraseBank

Article

Full-text available

Jun 2023

Analisis sentimen atau bisa disebut juga opinion mining merupakan salah satu tugas utama dari Natural Language Processing (NLP) yang merupakan studi komputasi yang mempelajari tentang pendapat seseorang terhadap suatu topik bahasan atau entitas. Analisis dilakukan dengan algoritma machine learning (pembelajaran mesin) Naïve Bayes, Decision Tree, dan K-Nearest Neighbor dengan membagi sentimen ke dalam dua kategori sentimen yaitu sentimen positif dan sentimen negatif. Data analisis diambil dari Financial Opinion Mining and Question Answering (FiQA) dan The Financial PhraseBank yang terdiri dari 4.840 kalimat yang dipilih dari berbagai berita keuangan dan dianotasi oleh 16 annotator berbeda yang berpengalaman dalam domain finansial. Penelitian ini ditujukan untuk mendapatkan hasil analisis sentimen dengan algoritma terbaik melalui perbandingan performa algoritma machine learning Naïve Bayes, Decision Tree, dan K-Nearest Neighbor terhadap kalimat finansial yang disajikan oleh FiQA dan The Financial PhraseBank. Berdasarkan analisis, didapatkan hasil performa dari masing-masing algoritma dengan nilai akurasi algoritma Naïve Bayes sebesar 78,45%; algoritma Decision Tree dengan nilai akurasi sebesar 77,72%; algoritma K-Nearest Neighbor (k=3) dengan nilai akurasi sebesar 41,25%; dan K-Nearest Neighbor (k=5) dengan nilai akurasi sebesar 37,38%. Analisis sentimen dengan algoritma Naive Bayes memiliki performa paling baik dengan nilai akurasi paling tinggi. Sentiment analysis or can also be called opinion mining is one of the main tasks of Natural Language Processing (NLP) which is a computational study that studies a person's opinion on a topic or entity. The analysis was performed with machine learning algorithms Naïve Bayes, Decision Tree, and K-Nearest Neighbor by dividing sentiment into two categories of sentiment namely positive sentiment and negative sentiment. 
The analysis data was taken from Financial Opinion Mining and Question Answering (FiQA) and The Financial PhraseBank which consisted of 4,840 sentences selected from various financial news and annotated by 16 different annotators experienced in the financial domain. This research is aimed at obtaining sentiment analysis results with the best algorithms through comparison of the performance of Naïve Bayes, Decision Tree, and K-Nearest Neighbor machine learning algorithms against financial sentences presented by FiQA and The Financial PhraseBank. Based on the analysis, the performance results of each algorithm were obtained with the accuracy value of the Naïve Bayes algorithm of 78,45%; Decision Tree algorithm with an accuracy value of 77,72%; K-Nearest Neighbor algorithm (k=3) with an accuracy value of 41,25%; and K-Nearest Neighbor (k=5) with an accuracy value of 37,38%. Sentiment analysis with the Naive Bayes algorithm (K=5) performs best with the highest accuracy values.

An Overview on Opinion Mining Techniques and Sentiment Analysis

Article

Full-text available

Jan 2018

Now a days, customers opinions are plays the major role in the E-commerce applications such as Flipkart, Amazon, eBay etc. Based on customer feedback on the product or seller in the form reviews or comments are the difficulty process by potential buyers to choose a products through online. In the proposed system, the various sentiment analysis techniques to provide a solution in two main areas. 1) Extract customer opinions on specific product or seller. 2) Analyze the sentiments towards that specific product or seller. In this paper, we analyzed several opinion mining techniques and sentiment analysis and their correctness in the categories of opinions or sentiments.

Sentiment Analysis: A Hybrid Approach on Twitter Data

Article

Jan 2024

Performance Evaluation of Deep Learning and Machine Learning Techniques for Opinion Mining

Chapter

May 2024

With the technological developments in the fields of natural language processing (NLP) and opinion mining (OM), many real-time applications are concentrating on analyzing the opinions of the people. The opinions or reviews given by the people through the internet are collected for summarization or classification based on the need. The feature selection typically saves the operating time, eliminates irrelevant features and redundancy. For feature selection, a semantic based feature selection algorithm called information gain (IG) is used. Naive Bayes, bagging, support vector machines (SVM), classification and regression trees (CART), and algorithms along with optimization techniques like ant colony optimization algorithms are used to optimize and classify the opinions. Also, in this chapter, the state-of-the art machine learning technique, deep learning, is also involved with the convolution neural networks (CNN) algorithm to identify the positive and negative opinions in different fields such as movie reviews, emojis and medical data.

Exploring Transition Tensions in Public Opinion on the COP26 Coal Phase-out Deal for South Africa as Expressed on Facebook

Article

Full-text available

May 2024
ENVIRON COMMUN

The 2021 COP26 meeting presented South Africa with an $8.5 billion deal to reduce its heavy reliance on coal, sparking a renewed public debate about transforming the country's coal-fired energy system to address emissions, energy deficits, and declining services. This paper examines public opinion on this important energy transition initiative as expressed on social media. While the use of social media platforms for public deliberation on policy matters is increasing in Africa, research exploring the African social media landscape in the context of energy transition is limited. This paper addresses this gap by qualitatively analysing 3,980 Facebook comments on 31 news posts related to the COP26 deal using sentiment and thematic approaches in ATLAS.ti 22. The findings reveal a prevalent negative sentiment and delegitimizing opinions that challenge the deal's credibility. Prominent topics within the discourse encompass concerns about corruption, distrust in public institutions, and perceptions of foreign involvement. Although some motivating factors supporting the deal emerged, negative sentiments and viewpoints dominated the discourse. Studying symbolic practices related to energy visions in this underexplored Global South context yields valuable insights into public opinions on energy transitions, highlighting the link between governance institutions and societal attitudes toward energy transition.

A Comparative Study of Automatic Hate Speech Detection Using Machine Learning

Conference Paper

Jan 2024

Deep Learning or Traditional Methods for Sentiment Analysis: A Review

Chapter

Feb 2024

Sentiment analysis's main goal is to extract the context from the text. The digital world of today offers us a variety of raw data formats, including blogs, Twitter, and Facebook. In order to perform analysis on this raw data, researchers must transform it into useful information. Numerous researchers used both deep learning and traditional machine learning techniques to determine the text's polarity. In order to understand the work done, we reviewed both approaches in this paper. The best methods for classifying the text will be selected by the researchers with the aid of this paper. We select a few of the best articles and evaluate them critically based on various factors. The purpose of this study is to explore the different machine learning and deep learning techniques to identify its importance as well as to raise an interest for this research area.

Combining machine learning algorithms for personality trait prediction

Article

Feb 2024

Exploring the Applications, Challenges, and Issues of Sentiment Analysis

Conference Paper

Feb 2023

Applying Supervised Opinion Mining Techniques on Online User Reviews

Article

Full-text available

Jan 2012

In recent years, the spectacular development of web technologies, lead to an enormous quantity of user generated information in online systems. This large amount of information on web platforms make them viable for use as data sources, in applications based on opinion mining and sentiment analysis. The paper proposes an algorithm for detecting sentiments on movie user reviews, based on naive Bayes classifier. We make an analysis of the opinion mining domain, techniques used in sentiment analysis and its applicability. We implemented the proposed algorithm and we tested its performance, and suggested directions of development.

Classification of Opinion Mining Techniques

Article

Full-text available

Oct 2012

The important part to gather the information is always seems as what the people think. The growing availability of opinion rich resources like online review sites and blogs arises as people can easily seek out and understand the opinions of others. Users express their views and opinions regarding products and services. These opinions are subjective information which represents user’s sentiments, feelings or appraisal related to the same. The concept of opinion is very broad. In this paper we focus on the Classification of opinion mining techniques that conveys user’s opinion i.e. positive or negative at various levels. The precise method for predicting opinions enable us, to extract sentiments from the web and foretell online customer’s preferences, which could prove valuable for marketing research. Much of the research work had been done on the processing of opinions or sentiments recently because opinions are so important that whenever we need to make a decision we want to know others ’ opinions. This opinion is not only important for a user but is also useful for an organization.

Affect Intensity Analysis of Dark Web Forums

Conference Paper

May 2007

Affects play an important role in influencing people's perceptions and decision making. Affect analysis is useful for measuring the presence of hate, violence, and the resulting propaganda dissemination across extremist groups. In this study we performed affect analysis of U.S. and Middle Eastern extremist group forum postings. We constructed an affect lexicon using a probabilistic disambiguation technique to measure the usage of violence and hate affects. These techniques facilitate in-depth analysis of multilingual content. The proposed approach was evaluated by applying it across 16 U.S. supremacist and Middle Eastern extremist group forums. Analysis across regions reveals that the Middle Eastern test bed forums have considerably greater violence intensity than the U.S. groups. There is also a strong linear relationship between the usage of hate and violence across the Middle Eastern messages.

Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews

Article

Oct 2003

The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics and heuristics vary depending on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited due to noise and ambiguity. But in the context of a complete web-based tool and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.

The Data Mining and Knowledge Discovery Handbook

Book

Jan 2005

The Data Mining process encompasses many different specific techniques and algorithms that can be used to analyze the data and derive the discovered knowledge. An important problem regarding the results of the Data Mining process is the development of efficient indicators for assessing the quality of the results of the analysis. This quality assessment problem is a cornerstone issue of the whole process because: i) The analyzed data may hide interesting patterns that the Data Mining methods are called to reveal. Due to the size of the data, the requirement for automatically evaluating the validity of the extracted patterns is stronger than ever. ii) A number of algorithms and techniques have been proposed which under different assumptions can lead to different results. iii) The number of patterns generated during the Data Mining process is very large, but only a few of these patterns are likely to be of any interest to the domain expert who is analyzing the data. In this chapter we will introduce the main concepts and quality criteria in Data Mining. We will also present an overview of approaches that have been proposed in the literature for evaluating the Data Mining results.

Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data

Book

Jan 2007

Bing Liu

The article mainly describes the basic tasks of Web data mining, including content, structure, and usage mining. Given the complexity and particular nature of Web data, only a small part of it, such as log records, can be mined with commonly used data mining methods. For the rest, Web pages must first undergo the necessary preprocessing so that they meet the mining requirements of structured data, or XML techniques must be used to construct a semi-structured data model on which data mining is then carried out.

Opinion Mining and Sentiment Analysis

Article

Jan 2008

An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Thumbs up? Sentiment Classification Using Machine Learning Techniques

Article

Jun 2002

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales

Article

Jul 2005

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star". We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

Nilesh M. Shelke, Shriniwas Deshpande, Vilas Thakre, Survey of Techniques for Opinion Mining, International Journal of Computer Applications (0975-8887), Volume 57, No. 13, November 2012.
