Content uploaded by Siddharth Patwardhan
Author content
All content in this area was uploaded by Siddharth Patwardhan on Feb 19, 2014
Content may be subject to copyright.
Content uploaded by Siddharth Patwardhan
Author content
All content in this area was uploaded by Siddharth Patwardhan
Content may be subject to copyright.
OpinionFinder: A system for subjectivity analysis
Theresa Wilson‡, Paul Hoffmann‡, Swapna Somasundaran†, Jason Kessler†,
Janyce Wiebe†‡, Yejin Choi§, Claire Cardie§, Ellen Riloff∗, Siddharth Patwardhan∗
‡Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260
†Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260
§Department of Computer Science, Cornell University, Ithaca, NY 14853
∗School of Computing, University of Utah, Salt Lake City, UT 84112
{twilson,hoffmanp,swapna,wiebe}@cs.pitt.edu,
{ychoi,cardie}@cs.cornell.edu,{riloff,sidd}@cs.utah.edu
1 Introduction
OpinionFinder is a system that performs subjectivity
analysis, automatically identifying when opinions,
sentiments, speculations and other private states are
present in text. Specifically, OpinionFinder aims to
identify subjective sentences and to mark various as-
pects of the subjectivity in these sentences, includ-
ing the source (holder) of the subjectivity and words
that are included in phrases expressing positive or
negative sentiments.
Our goal with OpinionFinder is to develop a sys-
tem capable of supporting other Natural Language
Processing (NLP) applications by providing them
with information about the subjectivity in docu-
ments. Of particular interest are question answering
systems that focus on being able to answer opinion-
oriented questions, such as the following:
Was the election in Iran regarded as fair?
Is support dimishing for the war in Iraq?
To answer these types of questions, a system needs
to be able to identify when opinions are expressed in
text and who is expressing them. Other applications
that would benefit from knowledge of subjective lan-
guage include systems that summarize the various
viewpoints in a document or that mine product re-
views. Even typical fact-oriented applications, such
as information extraction, can benefit from subjec-
tivity analysis by filtering out opinionated sentences.
(Riloff et al., 2005).
2 OpinionFinder
OpinionFinder runs in two modes, batch and inter-
active. Document processing is largely the same for
both modes. In batch mode, OpinionFinder takes a
list of documents to process. Interactive mode pro-
vides a front-end that allows a user to query on-line
news sources for documents to process.
2.1 System Architecture Overview
OpinionFinder operates as one large pipeline. Con-
ceptually, the pipeline can be divided into two parts.
The first part performs mostly general purpose doc-
ument processing (e.g., tokenization and part-of-
speech tagging). The second part performs the sub-
jectivity analysis. The results of the the subjectiv-
ity analysis are returned to the user in the form of
SGML/XML markup of the original documents.
2.2 Document Processing
For general document processing, OpinionFinder
first runs the Sundance partial parser (Riloff and
Phillips, 2004) to provide semantic class tags, iden-
tify Named Entities, and match extraction patterns
that correspond to subjective language (Riloff and
Wiebe, 2003). Next, OpenNLP11.1.0 is used to
tokenize, sentence split and part-of-speech tag the
data, and the Abney stemmer in SCOL2version 1g is
used to stem. In batch mode, OpinionFinder parses
the data again, this time to obtain constituency parse
trees (Collins, 1997), which are then converted to
dependency parse trees (Xia and Palmer, 2001).
1http://opennlp.sourceforge.net/
2http://www.vinartus.net/spa/
Currently, this stage is only included for batch mode
processing due to the time required for parsing. Fi-
nally, a clue-finder is run to identify words and
phrases from a large subjective language lexicon.
2.3 Subjectivity Analysis
The subjectivity analysis has four components.
2.3.1 Subjective Sentence Classification
The first component is a Naive Bayes classifier
that distinguishes between subjective and objective
sentences using a variety of lexical and contextual
features (Wiebe and Riloff, 2005; Riloff and Wiebe,
2003). The classifier is trained using subjective and
objective sentences, which are automatically gener-
ated from a large corpus of unannotated data by two
high-precision, rule-based classifiers.
2.3.2 Speech Events and Direct Subjective
Expression Classification
The second component identifies speech events
(e.g., “said,” “according to”) and direct subjective
expressions (e.g., “fears,” “is happy”). Speech
events include both speaking and writing events.
Direct subjective expressions are words or phrases
where an opinion, emotion, sentiment, etc. is di-
rectly described. A high-precision, rule-based clas-
sifier is used to identify these expressions.
2.3.3 Opinion Source Identification
The third component is a source identifier that
combines a Conditional Random Field sequence
tagging model (Lafferty et al., 2001) and extraction
pattern learning (Riloff, 1996) to identify the sources
of speech events and direct subjective expressions
(Choi et al., 2005). The source of a speech event is
the speaker; the source of a subjective expression is
the experiencer of the private state. The source iden-
tifier is trained on the MPQA Opinion Corpus3using
a variety of features, including those obtained from
the dependency parse. Because the source identi-
fier relies on dependency parse information, it is cur-
rently only included in batch mode.
2.3.4 Sentiment Expression Classification
The final component uses two classifiers to iden-
tify words contained in phrases that express pos-
3The MPQA Opinion Corpus can be freely obtained at
http://nrrc.mitre.org/NRRC/publications.htm.
itive or negative sentiments (Wilson et al., 2005).
The first classifier focuses on identifying sentiment
expressions. The second classifier takes the senti-
ment expressions and identifies those that are pos-
itive and negative. Both classifiers were developed
using BoosTexter (Schapire and Singer, 2000) and
trained on the MPQA Corpus.
3 Related Work
Please see (Wiebe and Riloff, 2005; Choi et al.,
2005; Wilson et al., 2005) for related work in au-
tomatic opinion and sentiment analysis.
4 Acknowledgments
This work was supported by the Advanced Research
and Development Activity (ARDA), by the National
Science Foundation under grants IIS-0208028, IIS-
0208798 and IIS-0208985, and by the Xerox Foun-
dation.
References
J. Choi, C. Cardie, E. Riloff, and S. Patwardhan. 2005. Identi-
fying sources of opinions with conditional random fields and
extraction patterns. In HLT/EMNLP 2005.
M. Collins. 1997. Three generative, lexicalised models for sta-
tistical parsing. In ACL-97.
J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional
random fields: Probabilistic models for segmenting and la-
beling sequence data. In ICML-2001.
E. Riloff and W. Phillips. 2004. An Introduction to the Sun-
dance and AutoSlog Systems. Technical Report UUCS-04-
015, School of Computing, University of Utah.
E. Riloff and J. Wiebe. 2003. Learning extraction patterns for
subjective expressions. In EMNLP-2003.
E. Riloff, J. Wiebe, and W. Phillips. 2005. Exploiting sub-
jectivity classification to improve information extraction. In
AAAI-2005.
Ellen Riloff. 1996. Automatically generating extraction pat-
terns from untagged text. In AAAI/IAAI, Vol. 2.
R. E. Schapire and Y. Singer. 2000. BoosTexter: A boosting-
based system for text categorization. Machine Learning,
39(2/3):135–168.
J. Wiebe and E. Riloff. 2005. Creating subjective and objec-
tive sentence classifiers from unannotated texts. InCICLing-
2005.
T. Wilson, J. Wiebe, and P. Hoffman. 2005. Recognizing
contextual polarity in phrase-level sentiment analysis. In
HLT/EMNLP 2005.
F. Xia and M. Palmer. 2001. Converting dependency structures
to phrase structures. In HLT-2001.