OpinionFinder: A system for subjectivity analysis
Theresa Wilson, Paul Hoffmann, Swapna Somasundaran, Jason Kessler,
Janyce Wiebe†‡, Yejin Choi§, Claire Cardie§, Ellen Riloff, Siddharth Patwardhan
Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260
§Department of Computer Science, Cornell University, Ithaca, NY 14853
School of Computing, University of Utah, Salt Lake City, UT 84112
{twilson,hoffmanp,swapna,wiebe}@cs.pitt.edu,
{ychoi,cardie}@cs.cornell.edu,{riloff,sidd}@cs.utah.edu
1 Introduction
OpinionFinder is a system that performs subjectivity analysis, automatically identifying when opinions, sentiments, speculations, and other private states are present in text. Specifically, OpinionFinder aims to identify subjective sentences and to mark various aspects of the subjectivity in these sentences, including the source (holder) of the subjectivity and words that are included in phrases expressing positive or negative sentiments.
Our goal with OpinionFinder is to develop a system capable of supporting other Natural Language Processing (NLP) applications by providing them with information about the subjectivity in documents. Of particular interest are question answering systems that focus on being able to answer opinion-oriented questions, such as the following:

Was the election in Iran regarded as fair?
Is support diminishing for the war in Iraq?

To answer these types of questions, a system needs to be able to identify when opinions are expressed in text and who is expressing them. Other applications that would benefit from knowledge of subjective language include systems that summarize the various viewpoints in a document or that mine product reviews. Even typical fact-oriented applications, such as information extraction, can benefit from subjectivity analysis by filtering out opinionated sentences (Riloff et al., 2005).
2 OpinionFinder
OpinionFinder runs in two modes, batch and interactive. Document processing is largely the same for both modes. In batch mode, OpinionFinder takes a list of documents to process. Interactive mode provides a front-end that allows a user to query on-line news sources for documents to process.
2.1 System Architecture Overview
OpinionFinder operates as one large pipeline. Conceptually, the pipeline can be divided into two parts. The first part performs mostly general-purpose document processing (e.g., tokenization and part-of-speech tagging). The second part performs the subjectivity analysis. The results of the subjectivity analysis are returned to the user in the form of SGML/XML markup of the original documents.
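To make the two-part design concrete, the following Python sketch mimics the overall flow: a general-purpose processing stage, a subjectivity-analysis stage, and serialization of the results as inline markup. The function names, the toy clue list, and the tag names are illustrative assumptions only, not OpinionFinder's actual components or output schema.

# Illustrative sketch of the two-part pipeline described above.
# All names and tags are hypothetical; the real system's markup differs.

def document_processing(text):
    """Stage 1: general-purpose processing (here, naive sentence splitting and tokenization)."""
    sentences = text.split(". ")
    return [{"text": s, "tokens": s.split()} for s in sentences if s]

def subjectivity_analysis(sentences):
    """Stage 2: add subjectivity annotations to each processed sentence (toy clue lookup)."""
    clues = {"fear", "fears", "hopes"}
    for s in sentences:
        s["subjective"] = any(tok.lower() in clues for tok in s["tokens"])
    return sentences

def to_markup(sentences):
    """Serialize results as inline markup over the original sentences (illustrative tags)."""
    out = []
    for s in sentences:
        tag = "subj" if s["subjective"] else "obj"
        out.append('<sentence type="%s">%s</sentence>' % (tag, s["text"]))
    return "\n".join(out)

print(to_markup(subjectivity_analysis(document_processing(
    "The report was released today. Officials fear the numbers will worsen."))))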
2.2 Document Processing
For general document processing, OpinionFinder first runs the Sundance partial parser (Riloff and Phillips, 2004) to provide semantic class tags, identify Named Entities, and match extraction patterns that correspond to subjective language (Riloff and Wiebe, 2003). Next, OpenNLP 1.1.0 (http://opennlp.sourceforge.net/) is used to tokenize, sentence-split, and part-of-speech tag the data, and the Abney stemmer in SCOL version 1g (http://www.vinartus.net/spa/) is used to stem. In batch mode, OpinionFinder parses the data again, this time to obtain constituency parse trees (Collins, 1997), which are then converted to dependency parse trees (Xia and Palmer, 2001). Currently, this stage is only included for batch mode processing due to the time required for parsing. Finally, a clue-finder is run to identify words and phrases from a large subjective language lexicon.
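The Python sketch below illustrates the ordering of these preprocessing steps (sentence splitting, tokenization, part-of-speech tagging, and stemming) using NLTK as a stand-in; the actual system uses Sundance, OpenNLP, and the Abney stemmer rather than the substitutes shown here.

# Stand-in for the preprocessing order described above, using NLTK in place
# of Sundance/OpenNLP/SCOL. Requires the 'punkt' and
# 'averaged_perceptron_tagger' NLTK data packages.
import nltk
from nltk.stem.porter import PorterStemmer  # Porter stemmer stands in for the Abney stemmer

def preprocess(document_text):
    stemmer = PorterStemmer()
    processed = []
    for sent in nltk.sent_tokenize(document_text):       # sentence splitting
        tokens = nltk.word_tokenize(sent)                 # tokenization
        tagged = nltk.pos_tag(tokens)                     # part-of-speech tagging
        stems = [stemmer.stem(tok) for tok in tokens]     # stemming
        processed.append({"tokens": tokens, "pos": tagged, "stems": stems})
    return processed

print(preprocess("The senator said the plan was a disaster. Aides fear a backlash."))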
2.3 Subjectivity Analysis
The subjectivity analysis has four components.
2.3.1 Subjective Sentence Classification
The first component is a Naive Bayes classifier that distinguishes between subjective and objective sentences using a variety of lexical and contextual features (Wiebe and Riloff, 2005; Riloff and Wiebe, 2003). The classifier is trained using subjective and objective sentences, which are automatically generated from a large corpus of unannotated data by two high-precision, rule-based classifiers.
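A minimal sketch of this training setup follows, assuming toy clue lists and scikit-learn in place of the actual feature set and implementation: high-precision rules label a handful of unannotated sentences (abstaining when unsure), and a Naive Bayes classifier is trained on the resulting labels.

# Sketch of training a subjective/objective Naive Bayes classifier on
# automatically labeled sentences. The clue lists are toy placeholders.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

SUBJ_CLUES = {"outrageous", "fears", "terrible", "hopes"}
OBJ_CLUES = {"said", "percent", "reported"}

def rule_label(sentence):
    """High-precision rule: label only when strong clues appear; otherwise abstain."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if words & SUBJ_CLUES and not words & OBJ_CLUES:
        return "subjective"
    if words & OBJ_CLUES and not words & SUBJ_CLUES:
        return "objective"
    return None  # abstain on unclear cases

unannotated = [
    "The decision was outrageous and terrible.",
    "The company reported earnings of 3 percent.",
    "Officials said the vote was held on Tuesday.",
    "She fears the policy hopes will collapse.",
]
labeled = [(s, y) for s in unannotated if (y := rule_label(s)) is not None]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform([s for s, _ in labeled])
clf = MultinomialNB().fit(X, [y for _, y in labeled])
print(clf.predict(vectorizer.transform(["The verdict was terrible."])))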
2.3.2 Speech Event and Direct Subjective Expression Classification
The second component identifies speech events (e.g., “said,” “according to”) and direct subjective expressions (e.g., “fears,” “is happy”). Speech events include both speaking and writing events. Direct subjective expressions are words or phrases where an opinion, emotion, sentiment, etc. is directly described. A high-precision, rule-based classifier is used to identify these expressions.
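A rule-based matcher of this kind can be approximated as below; the tiny phrase lists are placeholders for the much larger sets of expressions the actual classifier covers.

# Illustrative high-precision matcher for speech events and direct
# subjective expressions; patterns here are toy examples.
import re

SPEECH_EVENTS = [r"\bsaid\b", r"\baccording to\b", r"\bannounced\b"]
DIRECT_SUBJECTIVE = [r"\bfears?\b", r"\bis happy\b", r"\bpraised\b"]

def find_expressions(sentence):
    hits = []
    for pat in SPEECH_EVENTS:
        for m in re.finditer(pat, sentence, re.IGNORECASE):
            hits.append(("speech-event", m.group(), m.start()))
    for pat in DIRECT_SUBJECTIVE:
        for m in re.finditer(pat, sentence, re.IGNORECASE):
            hits.append(("direct-subjective", m.group(), m.start()))
    return hits

print(find_expressions("According to the minister, she fears the talks will fail."))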
2.3.3 Opinion Source Identification
The third component is a source identifier that combines a Conditional Random Field sequence tagging model (Lafferty et al., 2001) and extraction pattern learning (Riloff, 1996) to identify the sources of speech events and direct subjective expressions (Choi et al., 2005). The source of a speech event is the speaker; the source of a subjective expression is the experiencer of the private state. The source identifier is trained on the MPQA Opinion Corpus (freely available at http://nrrc.mitre.org/NRRC/publications.htm) using a variety of features, including those obtained from the dependency parse. Because the source identifier relies on dependency parse information, it is currently only included in batch mode.
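The following sketch casts source identification as BIO sequence tagging with a CRF, using the sklearn-crfsuite package; the toy training data and the small feature set (word form, POS tag, dependency relation, previous POS tag) only approximate the features described in Choi et al. (2005).

# Source identification as sequence tagging with a CRF (sklearn-crfsuite).
import sklearn_crfsuite

def token_features(sent, i):
    word, pos, deprel = sent[i]
    return {"word.lower": word.lower(), "pos": pos, "deprel": deprel,
            "prev.pos": sent[i - 1][1] if i > 0 else "BOS"}

# Toy training data: (word, POS, dependency relation) with BIO source labels.
train_sents = [
    [("The", "DT", "det"), ("minister", "NN", "nsubj"), ("said", "VBD", "root"),
     ("taxes", "NNS", "dobj"), ("will", "MD", "aux"), ("rise", "VB", "xcomp")],
    [("Analysts", "NNS", "nsubj"), ("fear", "VBP", "root"),
     ("a", "DT", "det"), ("recession", "NN", "dobj")],
]
train_labels = [["B-SOURCE" if tok[2] == "nsubj" else "O" for tok in sent]
                for sent in train_sents]

X = [[token_features(sent, i) for i in range(len(sent))] for sent in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X[:1]))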
2.3.4 Sentiment Expression Classification
The final component uses two classifiers to identify words contained in phrases that express positive or negative sentiments (Wilson et al., 2005). The first classifier focuses on identifying sentiment expressions. The second classifier takes the sentiment expressions and identifies those that are positive and negative. Both classifiers were developed using BoosTexter (Schapire and Singer, 2000) and trained on the MPQA Corpus.
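A rough sketch of the two-stage setup follows, with AdaBoost over word-count features standing in for BoosTexter and a few hand-made phrases standing in for MPQA training data: the first classifier flags candidate sentiment expressions, and the second assigns polarity only to the flagged ones.

# Two-stage sentiment expression classification; AdaBoost is a stand-in
# for BoosTexter, and the phrase lists are placeholders for MPQA data.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

expressions = ["great success", "bitter failure", "annual meeting",
               "strongly condemned", "warmly welcomed", "quarterly report"]
is_sentiment = [1, 1, 0, 1, 1, 0]                      # stage 1 labels
polarity = {"great success": "positive", "bitter failure": "negative",
            "strongly condemned": "negative", "warmly welcomed": "positive"}

stage1 = make_pipeline(CountVectorizer(), AdaBoostClassifier(n_estimators=50))
stage1.fit(expressions, is_sentiment)

sentiment_exprs = [e for e in expressions if e in polarity]
stage2 = make_pipeline(CountVectorizer(), AdaBoostClassifier(n_estimators=50))
stage2.fit(sentiment_exprs, [polarity[e] for e in sentiment_exprs])

for phrase in ["bitter dispute", "warmly praised"]:
    if stage1.predict([phrase])[0] == 1:
        print(phrase, "->", stage2.predict([phrase])[0])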
3 Related Work
Please see (Wiebe and Riloff, 2005; Choi et al., 2005; Wilson et al., 2005) for related work in automatic opinion and sentiment analysis.
4 Acknowledgments
This work was supported by the Advanced Research and Development Activity (ARDA), by the National Science Foundation under grants IIS-0208028, IIS-0208798, and IIS-0208985, and by the Xerox Foundation.
References
Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan. 2005. Identifying sources of opinions with conditional random fields and extraction patterns. In HLT/EMNLP 2005.
M. Collins. 1997. Three generative, lexicalised models for statistical parsing. In ACL-97.
J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML-2001.
E. Riloff and W. Phillips. 2004. An Introduction to the Sundance and AutoSlog Systems. Technical Report UUCS-04-015, School of Computing, University of Utah.
E. Riloff and J. Wiebe. 2003. Learning extraction patterns for subjective expressions. In EMNLP-2003.
E. Riloff, J. Wiebe, and W. Phillips. 2005. Exploiting subjectivity classification to improve information extraction. In AAAI-2005.
E. Riloff. 1996. Automatically generating extraction patterns from untagged text. In AAAI/IAAI, Vol. 2.
R. E. Schapire and Y. Singer. 2000. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168.
J. Wiebe and E. Riloff. 2005. Creating subjective and objective sentence classifiers from unannotated texts. In CICLing-2005.
T. Wilson, J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT/EMNLP 2005.
F. Xia and M. Palmer. 2001. Converting dependency structures to phrase structures. In HLT-2001.