Conference Paper

Development and use of a gold standard data set for subjectivity classifications

... Content with subjective bias can make people doubtful about a text's reliability and possibly trigger social unrest with offensive language. Prior research has used lexical and grammatical cues such as lexico-syntactic patterns (Wiebe and Riloff, 2005; Riloff and Wiebe, 2003) or various n-gram features (Murray and Carenini, 2009; Wilson and Raaijmakers, 2008; Wiebe et al., 1999) to classify sentences as either subjective or objective. For instance, in the encyclopedia domain, Recasens et al. (2013) constructed an automatic parallel corpus from Wikipedia revisions that violate the Neutral Point of View (NPOV) policy, which advocates for "fairly presenting views with reliable sources and avoiding editor bias", and introduced the task of identifying the bias-induced word in a statement. ...
... A pilot study conducted by Pryzant et al. (2020) on their Wikipedia Neutrality Corpus (WNC) demonstrated that over 5% of the revisions are not related to bias mitigation and thus wrongly labeled on the sentence level. Meanwhile, existing manually annotated corpora for subjectivity often suffer from the small dataset size in Wiebe et al. (1999) or limited annotation quality: annotator agreement from Hube and Fetahu (2019) falls ...
... We create the new WIKIBIAS corpus by first extracting Wikipedia revisions where editors provide Neutral Point of View (NPOV) justifications (Recasens et al., 2013; Yang et al., 2017; Zanzotto and Pennacchiotti, 2010; Pryzant et al., 2020) to construct automatically labeled data (WIKIBIAS-AUTO); then manually annotating sentences with fine-grained bias types at the span-level to create clean ground truth (WIKIBIAS-MANUAL). This is in contrast to the prior work on subjectivity that annotated only on the sentence-level (Wiebe et al., 1999; Hube and Fetahu, 2019, 2018). In particular, we design a two-stage human annotation methodology to handle sentences with both single- and multi-edits. ...
... According to Carrilo et al. [7], sentiment analysis, in short, can be understood from three sequential states: (1) The detection of subjectivity, which aims to discover the terms, expressions, or phrases that contain an opinion [19,27,33]. "Opinion" is understood in SA as the individual's feeling about a given object (or property, attribute of an object). ...
... (2) The recognition of polarity allows the text to be classified as positive or negative [10,26,31]. (3) The classification of inference intensity permits the identification of n degrees of positivity or negativity, in a way that a certain characteristic presented in a text may be polarized as weak or strongly positive; there is, therefore, a scale for polarities [6,7,10,33]. ...
... However, only the second paper presented the steps used to classify the polarity intensity, which related the polarity of the sentence to that of each aspect. The identification of new aspects for sentiment analysis is related to the possibility of defining dictionaries of terms, not only of sentiment lexicons, but of characteristics related to certain contexts, for example, the characteristics most evaluated in the tourism sector, but this factor was not presented [8,9,18,19,21,27,33]. ...
... Past work found that linguistic features such as part of speech, named entities, and hedging can distinguish between subjective [66,68] and objective [38,54] statements (corresponding to sharing opinions and facts), and that lexicon-based features can distinguish information seeking questions (which roughly correspond to fact seeking) from other types of questions such as social coordination [28,30,45]. But because these results relied exclusively on third-party labels, they only reflect perceptions. ...
... Selecting linguistic features. We began with a basic set of linguistic features [66]: the usage of pronouns, adjectives, cardinal numbers, modals, and adverbs. We then refined the pronoun feature by distinguishing the use of first-person and second-person pronouns [30,38]. ...
... Subjectivity detection. Distinguishing between opinions and facts is closely related to the task of subjectivity detection, for which a number of language-based models have been proposed [40,50,64,66]; see [43] for a more complete survey. However, the two tasks are not identical as subjectivity encompasses more than opinions: [66] defines subjective language as expressing private state, which includes not only opinions but also emotions and speculation [52]. ...
Preprint
Discourse involves two perspectives: a person's intention in making an utterance and others' perception of that utterance. The misalignment between these perspectives can lead to undesirable outcomes, such as misunderstandings, low productivity and even overt strife. In this work, we present a computational framework for exploring and comparing both perspectives in online public discussions. We combine logged data about public comments on Facebook with a survey of over 16,000 people about their intentions in writing these comments or about their perceptions of comments that others had written. Unlike previous studies of online discussions that have largely relied on third-party labels to quantify properties such as sentiment and subjectivity, our approach also directly captures what the speakers actually intended when writing their comments. In particular, our analysis focuses on judgments of whether a comment is stating a fact or an opinion, since these concepts were shown to be often confused. We show that intentions and perceptions diverge in consequential ways. People are more likely to perceive opinions than to intend them, and linguistic cues that signal how an utterance is intended can differ from those that signal how it will be perceived. Further, this misalignment between intentions and perceptions can be linked to the future health of a conversation: when a comment whose author intended to share a fact is misperceived as sharing an opinion, the subsequent conversation is more likely to derail into uncivil behavior than when the comment is perceived as intended. Altogether, these findings may inform the design of discussion platforms that better promote positive interactions.
... Sentiment analysis aims to identify attitudes expressed in a text by its author(s) and has drawn much attention in the domain of natural language processing due to its wide applicability. However, early efforts on sentiment analysis focused mostly on determining the sentiment polarity of a document or a sentence (positive, negative or neutral) without identifying the targets commented on [18,19,20], which offers only limited support for making public-opinion-based decisions. Therefore, the majority of recent related work turns to fine-grained sentiment analysis, which involves identifying target aspects and classifying their corresponding sentiments in a text. ...
Preprint
Extracting aspect-polarity pairs from texts is an important task in fine-grained sentiment analysis. While existing approaches to this task have made considerable progress, they are limited in capturing relationships among aspect-polarity pairs in a text, which degrades extraction performance. Moreover, the existing state-of-the-art approaches, namely token-based sequence tagging and span-based classification, have their own defects, such as polarity inconsistency resulting from separately tagging tokens in the former and heterogeneous categorization in the latter, where aspect-related and polarity-related labels are mixed. To remedy these defects, and inspired by recent advances in relation extraction, we propose to generate aspect-polarity pairs directly from a text with relation extraction technology, regarding aspect-pairs as unary relations where aspects are entities and the corresponding polarities are relations. Based on this perspective, we present a position- and aspect-aware sequence2sequence model for joint extraction of aspect-polarity pairs. The model is characterized by its ability to capture not only relationships among aspect-polarity pairs in a text through sequence decoding, but also correlations between an aspect and its polarity through position- and aspect-aware attention. Experiments on three benchmark datasets demonstrate that our model outperforms the existing state-of-the-art approaches by a significant margin.
... Most existing methods for subjectivity classification are based on supervised learning. One of the earliest studies [54] used a naive Bayes classifier with a combination of binary features such as the presence of adjectives, adverbs (other than negation terms), pronouns, a cardinal number, etc. Researchers also used other learning algorithms with more refined features. ...
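The setup mentioned in the snippet above, a naive Bayes classifier over binary presence features, can be illustrated with a minimal sketch. This is not the cited study's implementation: the feature names, the Laplace smoothing constant, and the four toy training rows are assumptions made up for illustration, and the binary features are assumed to have been extracted beforehand (for example by a part-of-speech tagger).

```python
import math

# Hypothetical binary features, one 0/1 flag per sentence (assumed to be
# extracted upstream; the names are illustrative, not from the cited study).
FEATURES = ["has_adjective", "has_adverb", "has_pronoun", "has_cardinal_number"]


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    rows   : list of 0/1 tuples, one tuple per sentence
    labels : list of class names, aligned with rows
    alpha  : Laplace smoothing constant (an assumption of this sketch)
    """
    model = {}
    for c in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        prior = len(class_rows) / len(rows)
        # Smoothed estimate of P(feature = 1 | class) for each feature.
        probs = [
            (sum(r[j] for r in class_rows) + alpha) / (len(class_rows) + 2 * alpha)
            for j in range(len(rows[0]))
        ]
        model[c] = (prior, probs)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one 0/1 row."""
    best_class, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            # A present feature contributes p, an absent one contributes 1 - p.
            score += math.log(p if x else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy training data, invented purely for illustration.
rows = [(1, 1, 1, 0), (1, 0, 1, 0), (0, 0, 0, 1), (0, 1, 0, 1)]
labels = ["subjective", "subjective", "objective", "objective"]
model = train_bernoulli_nb(rows, labels)

print(predict(model, (1, 0, 1, 0)))
print(predict(model, (0, 0, 0, 1)))
```

With real data, the rows would come from annotated sentences and the model would be evaluated on held-out examples rather than on the training rows.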
... Several document-level [41,48], sentence-level [54,55,59] and feature-level [20,21] sentiment analysis systems have been proposed in the literature in order to determine opinions about products and product features. The two main tasks in sentiment analysis are word sense disambiguation and negation handling, and most of the issues are also concerned with these two tasks. ...
Thesis
Extracting unstructured data from the Web and analysing it to determine useful information that customers and manufacturers can use to make decisions about products is a challenging task. There are some existing techniques to evaluate products based on the ratings and product reviews posted on the Web. However, all these techniques have inherent issues and limitations and are therefore unable to fulfil the needs and requirements of both customers and manufacturers. For instance, the existing sentiment analysis methods (which classify the opinions in customer reviews about a product as positive or negative) are not able to determine the context of a word in a sentence accurately. In addition, the negation handling methods adopted while determining the sentiment are not able to deal with all types of negation, and they do not consider all exceptions where negations behave differently. Similarly, the existing product reputation models are based on a single source, are not robust to false and biased ratings, are not able to reflect recent opinions, do not allow users to evaluate a product on different criteria, and do not provide good estimation accuracy. Moreover, the existing product reputation systems are centralized, which brings issues such as a single point of failure and easily falsified evaluation information, and makes them unsuitable for solving a complex problem. This thesis proposes methods and techniques for evaluating product reputation based on data available on the Web and for providing valuable information to customers and manufacturers for decision making. 
These methods perform the following tasks: 1) extract product evaluation data from multiple Web sources, 2) analyse product reviews to determine whether opinions about product features in customer reviews are positive or negative, 3) compute different product reputation values while considering different evaluation criteria, and 4) provide the results to customers and manufacturers to support their decisions. This thesis contributes to three main research areas: 1) feature-level sentiment analysis, 2) product reputation models and 3) multiagent architecture. First, word sense disambiguation and negation handling methods are proposed to improve the performance of feature-level sentiment analysis. Second, a novel mathematical model is proposed that computes several reputation values in order to evaluate a product based on different criteria. Finally, a multiagent architecture for review analysis and product evaluation is proposed. A huge amount of the product evaluation data on the Web is in textual form (i.e. product reviews). To analyse product reviews and evaluate products, we propose a feature-level sentiment analysis method that determines the opinions about different features of a product. A word sense disambiguation method is introduced that identifies the sense of words according to the context while determining the polarity. In addition, a negation handling method is proposed that determines the sequence of words affected by different types of negation. The results show that both the word sense disambiguation and negation handling methods improve the overall accuracy of feature-level sentiment analysis. A multi-source product reputation model is proposed in which informative, robust and strategy-proof aggregation methods are introduced to compute different reputation values. 
Sources from which reviews are extracted may not be credible; hence, a source credibility measuring method is proposed in order to avoid malicious web sources. In addition, suitable decay principles for product reputation are introduced in order to reflect the newest opinions about a product quickly. The model also considers several parameters such as reviewer expertise, rating trustworthiness, time span of ratings, and reviewer age, sex and location in order to evaluate a product in different ways. Different types of ratings (i.e. textual and numeric ratings) are considered when computing reputation values, which increases the choices available to customers and manufacturers when making decisions. The results show that the proposed model is robust, strategy-proof, able to reflect recent opinions, and estimates true reputation values even if some ratings are false. A multiagent layered architecture is proposed for product reputation evaluation. The main idea behind this layered architecture is to divide the complex problem of product evaluation, which is handled by a single entity in a centralized fashion, into simpler and smaller problems handled by several entities in a distributed fashion. The architecture addresses different aspects of product evaluation such as taking inputs and displaying results to users, review extraction, feature-level sentiment analysis and computing reputation values. In addition, the architecture addresses issues associated with the centralized approach and offers additional benefits such as autonomy, pro-activeness, openness and social ability.
... Although cannot distinguish multiple harmful opinions. ➢ Sentence level [189,190]: The task is to determine, for each sentence, whether it is harmful and whether it is subjective. Neutral opinions are not counted. ...
Preprint
In today's digital world, online platforms play a significant role in facilitating communication and content sharing. Furthermore, with the emergence of large language models (LLMs), there has been a notable increase in both the quantity and variety of content. The exponential rise in user-generated content has led to challenges in maintaining a respectful online environment. In some cases, users have taken advantage of anonymity to post harmful content, which can negatively affect the user experience and pose serious social problems. Recognizing the limitations of manual moderation, automatic detection systems have been developed to tackle this problem. Nevertheless, several obstacles persist, including the lack of a comprehensive framework, the absence of a universal definition of harmful content, and the need for detailed annotation guidelines. The current definitions of harmful content are static and do not adapt to changes over time. Furthermore, the detection methods are outdated and fail to keep pace with advancements in content, platforms, and new technologies such as LLMs. This study aims to address these challenges by introducing, for the first time, a detailed framework adaptable to any content, language and platform. This framework encompasses various aspects of harmful content detection. One of the key components of the framework is our development of a cross-language annotation guideline. Additionally, the integration of sentiment analysis represents an approach to enhancing harmful content detection. Furthermore, a definition of harmful content is proposed, which is formulated through a comprehensive review of various related concepts and emerging needs, allowing for adaptability to dynamic changes. Addressing these challenges and implementing a harmful content detection framework is vital, as it allows for early detection and prevention of harmful content, greatly improving the safety and security of all online users.
... In most scenarios, determining the level of intensity of mood or moods (positive/negative) can be an effective attribute in exploring the views of hate speech identification. Machine learning algorithms are used to classify the content based on their essential or relevant words and phrases [5], [6]. Subjective and non-substantive functions in the input are detected and are used for the conceptual classification of the input. ...
Article
This paper describes a model for spotting offensive content in comments collected from social media. The comments include expressions and emoticons and are mostly in code-mixed language, and classifying these code-mixed comments is tricky. The proposed system uses a multi-head attention model to extract features from the code-mixed Tamil input data. Various classification algorithms are applied to these extracted features to categorize offensive comments. The generated labels are optimized by performing majority voting on labels generated by different algorithms. This system is validated on the validation set and is evaluated by applying the Tamil CodeMix test data from the dataset published by the HASOC task (Task2-subtask1) at FIRE 2021. The evaluation yields an average weighted F1 score of 0.83 and ranks 3rd in the official ranking.
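The majority-voting step described in the abstract above can be sketched roughly as follows. The abstract does not say how ties are resolved or which label names are used, so the tie-breaking rule (the earliest classifier in the list wins) and the labels "OFF" and "NOT" are assumptions of this sketch, and the function name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label lists from several classifiers into one label per item.

    predictions_per_model: list of label lists, one list per classifier,
    all of the same length and aligned by item.
    """
    voted = []
    for labels_for_item in zip(*predictions_per_model):
        # most_common(1) picks the most frequent label; on a tie it returns
        # the label encountered first, i.e. the earliest classifier's choice.
        voted.append(Counter(labels_for_item).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three classifiers on three comments.
preds = [
    ["OFF", "NOT", "OFF"],
    ["OFF", "OFF", "NOT"],
    ["NOT", "NOT", "NOT"],
]
print(majority_vote(preds))
```

Using an odd number of classifiers for a binary task avoids ties altogether, which is one reason this simple scheme is often preferred over weighted alternatives.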
... At the sentence level, the task is to classify whether each sentence expresses a positive, negative or neutral opinion. Before analysing the polarity (orientation) of the sentence, it is necessary to determine whether the sentence is subjective or objective because the objective ones do not express opinions, unlike subjective ones [5]. Once sentences have been separated into these two categories, the subjective sentences can be labelled as positive or negative. ...
Article
Social network systems are constantly fed with text messages. While this enables rapid communication and global awareness, some messages could be aptly made to hurt or mislead. Automatically identifying meaningful parts of a sentence, such as, e.g., positive or negative sentiments in a phrase, would give valuable support for automatically flagging hateful messages, propaganda, etc. Many existing approaches concerned with the study of people’s opinions, attitudes and emotions and based on machine learning require an extensive labelled dataset and provide results that are not very decisive in many circumstances due to the complexity of the language structure and the fuzziness inherent in most of the techniques adopted. This paper proposes a deterministic approach that automatically identifies people’s sentiments at the sentence level. The approach is based on text analysis rules that are manually derived from the way Italian grammar works. Such rules are embedded in finite-state automata and then expressed in a way that facilitates checking unstructured Italian text. A few grammar rules suffice to analyse an ample amount of correctly formed text. We have developed a tool that has validated the proposed approach by analysing several hundreds of sentences gathered from social media: hence, they are actual comments given by users. Such a tool exploits parallel execution to make it ready to process many thousands of sentences in a fraction of a second. Our approach outperforms a well-known previous approach in terms of precision.
... The first approach is to choose such words through questionnaires or computer analysis (selecting words for the index by using machine text analysis); such indices can also be constructed by experts. The MFD (Moral Foundations Dictionary) [28] is an example of an index compiled through a questionnaire. ...
Article
The purpose of this research is to evaluate the influence of sanctions on the Russian economy, taking into consideration the sectoral aspect (oil and gas, telecommunications and the consumer sector). The research methodology comprises econometric modeling (elastic net and GARCH modeling) and text analysis. In the paper we developed our own sanction indices based on the text analysis. We used the EcSentiThemeLex dictionary to assess the news’ positivity and negativity. The empirical research base consists of news publications of the lenta.ru portal for the period from 01.01.2014 to 31.03.2023 represented by the thematic sections “economy” and “science and technology”. The research results are as follows. On the basis of GARCH modeling we revealed that sanctions have a negative impact on the capitalization of the largest companies in oil and gas, the consumer sector and telecommunications. The news tonality influences companies’ capitalization. We have developed sanctions indices (a minimal index, an expanded index, a maximally expanded index) which make it possible to assess the extent of sanctions pressure. On the basis of the elastic net method we concluded that sentiment variables take priority over the control ones, i.e. information on sanctions and its tonality influences the stock market more than oil prices, the rouble exchange rate and the interbank rate in the short term. The influence of sanctions is not industry specific. However, the study does entail certain limitations: 1. reliance on publications from a single source; 2. the use of a single dictionary for evaluating news sentiment; 3. the sanctions index does not allow the incorporation of new terms when fresh sanctions are imposed. We intend to address these issues in future research.
... Sentiment analysis originates from the analysis of subjectivity in sentences [10]. Due to the emergence of a large number of network resources, the research of sentiment analysis has become an active field since 2000 [11]. ...
Article
In the field of natural language processing, sentiment analysis via deep learning achieves excellent performance when large labeled datasets are available. However, labeled data are insufficient in many sentiment analysis tasks, and obtaining these data is time-consuming and laborious. Prompt learning aims to resolve this data deficiency by reformulating downstream tasks with the help of a prompt. The model performance of this method depends on the quality of the prompt. This paper proposes an adaptive prompting (AP) construction strategy using a seq2seq-attention structure to acquire the semantic information of the input sequence. Our method of dynamically constructing adaptive prompts can not only improve the quality of the prompt, but can also effectively generalize to other fields by constructing a pre-trained prompt with existing public labeled data. The experimental results on FewCLUE datasets demonstrate that the proposed method AP can effectively construct an appropriate adaptive prompt regardless of the quality of the hand-crafted prompt and outperforms the state-of-the-art baselines.
... It must be noted, though, that subjectivity is not equivalent to sentiment, as many objective sentences can imply opinions. 3. Entity and Aspect level: Aspect level achieves finer-grained analysis. ...
Article
Every day, social media networks (SMN) generate massive amounts of data that may be structured, unstructured or semi-structured. The data may range from plain text to a graphic image, audio or video. Analysing this varied and ever-growing data is a big challenge. This paper focuses on extracting and analysing data from the widely used online text messaging application WhatsApp through the process of Sentiment Analysis. Sentiment Analysis, also known as opinion mining, is the contextual mining of data to identify, extract and analyse the underlying sentiment in messages and classify them as positive, negative or neutral. The R language has been used in this paper to understand the different emotions in a WhatsApp chat.
... Automatic classification of sentiment has been applied in various fields of research over the past 20 years as access to vast amounts of written text about various topics has become available through the internet. As early as 1999, Wiebe et al. [58] worked on a dataset for automatic classification of news articles to identify whether information is being presented as fact or opinion. While sentiment analysis is still being used to analyse media platforms like those of news agencies [5,46], its application today also includes platforms on which a wide variety of people contribute content such as social media or internet forums. ...
Preprint
Software testing is an integral part of modern software engineering practice. Past research has not only underlined its significance, but also revealed its multi-faceted nature. The practice of software testing and its adoption is influenced by many factors that go beyond tools or technology. This paper sets out to investigate the context of software testing from the practitioners' point of view by mining and analyzing sentimental posts on the widely used question and answer website Stack Overflow. By qualitatively analyzing sentimental expressions of practitioners, which we extract from the Stack Overflow dataset using sentiment analysis tools, we discern factors that help us to better understand the lived experience of software engineers with regards to software testing. Grounded in the data that we have analyzed, we argue that sentiments like insecurity, despair and aspiration, have an impact on practitioners' attitude towards testing. We suggest that they are connected to concrete factors like the level of complexity of projects in which software testing is practiced.
... Objective phrases can also contain a hidden opinion (see [16] on this). ...
... Sentence-level SA, in contrast, aims to classify the sentiment expressed in each sentence. Here, the first step is to identify whether the sentence is subjective, i.e. expressing people's views and opinions, or objective, i.e. expressing factual information [4]. If the sentence is subjective, then sentence-level SA will determine whether the sentence expresses positive or negative opinions. ...
Article
Nowadays, Deep Learning (DL) is a fast-growing and highly attractive research field in image processing and natural language processing (NLP), and it is being adopted across several sectors such as medicine, agriculture, commerce and many other areas. This is mainly because of the advantages of DL, such as automatic feature extraction, the capability to process a larger number of parameters and the capacity to produce more accurate results. In this paper, we have examined the research works which have used DL-based Sentiment Analysis (SA) for social network data. This paper provides a brief explanation of SA, the necessity of text pre-processing, performance metrics and the roles of DL models in SA. The main focus of this paper is to explore how DL algorithms can enhance the performance of SA beyond that of traditional machine learning algorithms for text-based analysis. Since DL models are more effective for NLP research, text classification can be applied to complex sentences that contain two opposing emotions about an event. Through this literature appraisal we conclude that the Convolutional Neural Network (CNN) technique yields higher accuracy than the others. The paper also brings to light that there has been no major focus on mixed emotions using DL methods, which increases the scope for future research.
... Sentence-level classification identifies the sentence as objective or subjective, also known as subjectivity classification [3]. Then, considering these subjective sentences as small documents, the sentiments expressed by these sentences are classified as negative or positive. ...
... Abdul-Mageed et al. [1] produced an Arabic dataset that was manually labeled into four classes: objective, subjective-positive, subjective-negative, and subjective-neutral. Their labeling criteria were derived from [42], in which if a sentence is not objective, then it will be a candidate for the other three subjective classes. As a result of their strict annotation process, their dataset consists of 1,281 objective, 491 subjective-positive, 689 subjective-negative, and 394 subjective-neutral news sentences. ...
Article
The image of the tolerant religion of Islam has been distorted by extremists in the last two decades in many ways, such as luring teenagers into terrorist acts. Nowadays, millions of users socialize and share ideas using social media platforms such as Twitter. Typically, the ideas shared on Twitter (tweets) reach and influence many people who could simply retweet them and make them even spread faster. Unfortunately, some of these ideas are posted by extremists who share hateful Arabic content. Thus, it is very important to automate the process of controlling and monitoring hateful Arabic tweets, given that Arabic is the most widely used language in the Islamic world. In this paper, we provide a manually labeled and curated dataset of 3,000 Arabic tweets that contain hateful and non-hateful tweets. To automate the process of detecting hateful tweets, we utilize advanced Machine Learning (ML) techniques and perform sentiment analysis to capture the meaning of the Arabic words in a proper word embedding (Word2Vec). Also, we used the proposed model to classify and analyze 100,000 tweets of the last decade. The outcome of this work promotes future research on analyzing Arabic hateful speech by providing a manually labeled Arabic dataset, and the trained model (achieved 92% accuracy) which can be used as an underlying tool by governments, Internet service providers, and social media applications to detect any inflammatory tweets before they spread to a wider audience.
... Subjectivity classification classifies sentences into two classes, subjective and objective [25]. An objective sentence states some factual information, whereas a subjective sentence expresses personal feelings, views, judgments, or beliefs. ...
Preprint
Multitask learning often helps improve the performance of related tasks, as these are often interdependent and perform better when solved in a joint framework. In this paper, we present a deep multitask learning framework that jointly performs polarity and subjectivity detection. We propose an attention-based multitask model for predicting polarity and subjectivity. The input sentences are transformed into vectors using pre-trained BERT and GloVe embeddings, and the results show that the BERT embedding based model works better than the GloVe based model. We compare our approach with state-of-the-art models in both subjectivity and polarity classification in single-task and multitask frameworks. The proposed approach reports baseline performances for both polarity detection and subjectivity detection.
... It is the most crucial step in Sentiment Analysis. Subjective sentences express the author's views, assessments, sentiments, beliefs and perceptions [2], [4]. Sentiment Analysis has two approaches: lexicon-based and machine learning-based [6]. ...
Conference Paper
Full-text available
Recent years have witnessed a significant growth in the data collected from reviews posted on various websites. Reviews are a direct way of getting the response of the customers and clients of any business, making them a convenient way of getting feedback on marketing, performance and other such characteristics associated with any product or service. The opinions mined from these collections of data can provide strategies to improve sales based on how well a product is received. This is done in two steps, the first being Subjectivity Detection, followed by Sentiment Analysis. Various methods have already been introduced in this field for this process, ranging from SVMs and Naive Bayes to deep learning. Since English is the most commonly used language in the world, it is not surprising that most work done in this field focuses on it. But it is already known that there are roughly 6500 languages used around the world. India alone has 447 languages, which ranks it fourth on the list of countries with the greatest number of languages. The proposed research work focuses on sentiment classification in Hindi language text. It experiments with a method that does not rely on the availability of language dictionaries. This is done by creating completely numerical data corresponding to the text. The model proposed in this paper uses a combination of a Recurrent Neural Network and a Convolutional Neural Network to extract the subjective data from the given dataset of movie reviews.
... Though, it must be noted that subjectivity is not correspondent to sentiment as many objective sentences can imply opinions. 3. Entity and Aspect level: Aspect level achieves finer-grained analysis. ...
Article
Every day, social media networks (SMN) generate massive amounts of data that may be structured, unstructured or semi-structured. The data may range from plain text to a graphic image, audio or video. Analysing this varied and ever-growing data is a big challenge. This paper focusses on extracting and analyzing data from the widely used online text message application WhatsApp through the process of Sentiment Analysis. Sentiment Analysis, also known as opinion mining, is the contextual mining of data to identify, extract and analyse the underlying sentiment in messages and classify them as positive, negative or neutral. The R language has been used in this paper to understand the different emotions in a WhatsApp chat.
... A more comprehensive overview of the topic, including the evolution of SD over the years and the relation with other tasks, is covered by the excellent survey of Chaturvedi et al. [6]. To the best of our knowledge, Wiebe et al. [30] are the first to create a corpus for SD. They annotate a set of news articles and also describe an iterative process to improve inter-annotator agreement and annotation guidelines, from which we draw inspiration for our own process. ...
Conference Paper
We present SubjectivITA: the first Italian corpus for subjectivity detection on news articles, with annotations at sentence and document level. Our corpus consists of 103 articles extracted from online newspapers, amounting to 1,841 sentences. We also define baselines for sentence-and document-level subjectivity detection using transformer-based and statistical classifiers. Our results suggest that sentence-level subjectivity annotations may often be sufficient to classify the whole document.
... Neutral usually means no opinion. This level of analysis is closely related to subjectivity classification [29], which distinguishes sentences (called objective sentences) that express factual information from sentences (called subjective sentences) that express subjective views and opinions. However, we should note that subjectivity is not equivalent to sentiment as many objective sentences can imply opinions, e.g., "We bought the car last month and the windshield wiper has fallen off." ...
Thesis
Full-text available
With my work, I intend to showcase a lexical-level sentiment analysis while matching the found tokens with previous state-of-the-art datasets, providing sentiment rating to the tweets which consist of these words, developing a word occurrence probability cluster for especially this type of events from these features. Henceforth, I bring to the table an LSTM model for predicting sentiments by training and testing the sentiment-rated tweets, implying which, I achieved an accuracy of 84.51%, which I analyzed and addressed further by doing the epoch-level error analysis, and keeping up with that I even take an approach for making the trend analysis of tweet counts and people’s changing opinions reflecting through tweets about this new topic with time.
... Starting from Wiebe et al. [18] work in the late 90s, there has been a surge of interest in the different setups of SA. In general, it can be done at a document, sentence, or aspect level [5] and the classification in terms of positive, negative, or neutral, but also other more fine-grained scales such as a ranking from 1 to 5. ...
Article
Twenty-four studies on twenty-three distinct languages and eleven social media illustrate the steady interest in deep learning approaches for multilingual sentiment analysis of social media. We improve over previous reviews with wider coverage from 2017 to 2020 as well as a study focused on the underlying ideas and commonalities behind the different solutions to achieve multilingual sentiment analysis. Interesting findings of our research are (i) the shift of research interest to cross-lingual and code-switching approaches, (ii) the apparent stagnation of the less complex architectures derived from a backbone featuring an embedding layer, a feature extractor based on a single CNN or LSTM and a classifier, (iii) the lack of approaches tackling multilingual aspect-based sentiment analysis through deep learning, and, surprisingly, (iv) the lack of more complex architectures such as transformer-based ones, despite results suggesting that the more difficult tasks require more elaborate architectures. Full text: https://authors.elsevier.com/a/1cv0e5aecSjupP
... Emotion analysis is the extraction, detection, and classification of the theme views, feelings, and attitudes, which was first proposed by Nasukawa et al. [8]. The main tasks of affective analysis include subjective and objective analysis [9], affective tendency analysis (affective classification), viewpoint information extraction, comment mining, etc., among which affective classification is the most commonly used. ...
Article
Full-text available
This article, the official news site of Linyi five services in Shanghai, Yiwu GanZhou Shenzhen national logistics hub of news as the data source, through Word2Vec and construction LSTM emotion classification model, with positive or negative emotion in general categories, calculating to analyze its emotional value, from the emotional category and time series analysis and word frequency vector to Linyi public opinion analysis of logistics hub
... On the other hand, when annotated corpora are available, machine-learning methods are a natural choice for building subjectivity and sentiment classifiers. For example, Wiebe et al. (1999) used a data set manually annotated for subjectivity to train a machine learning classifier, which led to significant improvements over the baseline. ...
Conference Paper
Full-text available
In this study we propose to identify some data sources that can be used to determine a "controlled vocabulary" (lexicon of markers), in order to develop an engine for extracting articles referring to a given topic. Methods for automatically enriching the created lexicon of markers are presented.
... Furthermore, they have used different text mining, Natural Language Processing and Network Analysis techniques to predict user behavior. Any company or food delivery company can use this sort of information [26,27] for the purpose of success and failure of product. Nobody has worked to analyze the behavior of certain decisions and their impact on human life before. ...
Article
Full-text available
Tweet data can be processed into useful information. Social media sites like Twitter, Facebook, and Google+ are rapidly growing in popularity. These social media sites provide a platform for people to share and express their views about daily life, discuss particular topics, hold discussions with different communities, or connect with the world by posting messages. Tweets posted on Twitter are expressed as opinions. These opinions can be used for different purposes, such as gauging public views on uncertain decisions such as Muslim ban in America, War in Syria, American Soldiers in Afghanistan etc. These decisions have a direct impact on users' lives; violations & aggressiveness are common causes. For this purpose, we will collect opinions from Twitter on some popular decisions taken in the past decade. We will divide the sentiments into two classes, anger (hatred) and positive. We will propose a hypothesis model for such data which will be used in future. We will use Support Vector Machine (SVM), Naive Bayes (NB), and Logistic Regression (LR) classifiers for the text classification task. Furthermore, we will also compare SVM results with NB and LR. The research will help us to predict early behaviors & reactions of people before the big consequences of such decisions.
... It achieved 91% accuracy and explored the contribution of similarity and Bayesian classification approaches. Ref. [12] is also worth citing, as the authors are amongst the first to deal with the subjectivity classification task. They developed a reference dataset and achieved over 81% accuracy with a Bayesian classifier. ...
Article
Full-text available
Texts published on social media have been a valuable source of information for companies and users, as the analysis of this data helps improving/selecting products and services of interest. Due to the huge amount of data, techniques for automatically analyzing user opinions are necessary. The research field that investigates these techniques is called sentiment analysis. This paper focuses specifically on the task of subjectivity classification, which aims to predict whether a text passage conveys an opinion. We report the study and comparison of machine learning methods of different paradigms to perform subjectivity classification of book review sentences in Portuguese, which have shown to be a challenging domain in the area. Specifically, we explore richer features for the task, using several lexical, centrality-based and discourse features. We show the contributions of the different feature sets and evidence that the combination of lexical, centrality-based and discourse features produce better results than any of the feature sets individually. Additionally, by analyzing the achieved results and the acquired knowledge by some symbolic machine learning methods, we show that some discourse relations may clearly signal subjectivity. Our corpus annotation also reveals some distinctive discourse structuring patterns for sentence subjectivity.
... Machine-learning methods often produce only a binary result (positive or negative) from batch processing using a supervised classifier trained on a large domain-specific labelled training set, so that the classifier can distinguish between positive and negative patterns of messages (Hew et al., 2020). Some of the most popular machine learning algorithms for sentiment analysis are Support Vector Machines (Pang et al., 2002; Dave et al., 2003; Gamon, 2004; Matsumoto et al., 2005; Airoldi et al., 2004), Naive Bayes (Wiebe et al., 1999, 2005; Yu and Hatzivassiloglou, 2003; Melville et al., 2009), and Maximum Entropy-based Classifiers (Nigam et al., 1999; Pang et al., 2002). While there are clearly many libraries for machine-learning analysis of text-based data, there are fewer training sets to work from. ...
Article
When presenting to a large group of students, either in an amphitheatre or through an online platform, effectively connecting to the audience – understanding how well the audience is following the presentation and taking appropriate actions promptly if they experience difficulties – is a serious challenge. Backchannel systems are sometimes deployed to address this issue by allowing the audience to give feedback to the presenter without interrupting the current discourse. However, these systems are not designed to immediately aggregate and present the audience's feedback to the presenter in a meaningful way that is easy to quickly digest. To fill this gap, we have explored a proof-of-concept method for analysing the emotions and sentiments from the audience's feedback in real time and displaying to the presenter a morale graph showing a trend of the audience's overall reaction over time. This allows a presenter to effectively connect to their audience in real time, knowing whether their presentation is going well and what issues their audience may have in common at any specific moment. We have further implemented this method in an educational context, using a prototype backchannel system, known as ClasSense, for a lecturer to effectively connect to their students. This paper presents the evaluation of this system, which shows that lecturers accept and prefer the morale graph based user interface developed over other backchannel user interfaces that display all posts in chronological order. Students also positively expressed their agreement that the system not only makes their feedback an important part of the class but also increases their interactions with the lecturers. This is further confirmed with a Markov chain predicting the probability that students’ and lecturers’ survey results lead to their overall positive sentiment towards the tool. The flexibility of the ClasSense system suggests it may also be suitable in contexts other than education.
... Subjectivity is ubiquitous in our use of language (Banfield, 1982;Quirk et al., 1985;Wiebe et al., 1999;Benamara et al., 2017), and is therefore an important aspect to consider in Natural Language Processing (NLP). For example, subjectivity can be associated with different senses of the same word. ...
Preprint
Full-text available
Subjectivity is the expression of internal opinions or beliefs which cannot be objectively observed or verified, and has been shown to be important for sentiment analysis and word-sense disambiguation. Furthermore, subjectivity is an important aspect of user-generated data. In spite of this, subjectivity has not been investigated in contexts where such data is widespread, such as in question answering (QA). We therefore investigate the relationship between subjectivity and QA, while developing a new dataset. We compare and contrast with analyses from previous work, and verify that findings regarding subjectivity still hold when using recently developed NLP architectures. We find that subjectivity is also an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance. For instance, a subjective question may or may not be associated with a subjective answer. We release an English QA dataset (SubjQA) based on customer reviews, containing subjectivity annotations for questions and answer spans across 6 distinct domains.
... In the literature, a wide range of features has been explored in the task of tweet sentiment analysis, including unigrams, bigrams, n-grams, part-of-speech (POS) tags, word embeddings, and word clusters [19], [20], [30], [21], [1], [9]. In this work we use the TweetToSparseFeatureVector filter in the Weka Affective tweets [14] package to extract word n-grams, character n-grams, Brown word clusters and part-of-speech tags. ...
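As a rough illustration of two of the feature types named in this snippet, the sketch below extracts word n-grams and character n-grams in plain Python. It is not the Weka filter mentioned above and does not reproduce its options; the function names and the lowercasing and whitespace tokenization are assumptions chosen for brevity.

```python
def word_ngrams(text, n):
    """Return word n-grams over a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return overlapping character n-grams, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


tweet = "So happy today"
print(word_ngrams(tweet, 1))   # unigrams
print(word_ngrams(tweet, 2))   # bigrams
print(char_ngrams("wow", 2))   # character bigrams
```

In practice such n-grams are usually counted and turned into sparse vectors before being passed to a classifier; that step is left out here.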
Article
Full-text available
Emotions are known to influence the perception of human beings along with their memory, thinking and imagination. Human perception is important in today’s world in a wide range of areas including but not limited to business, education, art, and music. Microblogging and social networking sites like Twitter and Facebook are challenging sources of information that allow people to share their feelings and thoughts on a daily basis. In this paper we propose an approach to automatically detect emotions in Twitter messages that explores characteristics of the tweets and the writer’s emotion using a Support Vector Machine LibLinear model and achieves 98% accuracy. Emotion mining has gained attention in the field of computer science due to the vast variety of systems that can be developed and promising applications like remote health care systems, customer care services, smart phones that react based on the user’s emotion, and vehicles that sense the emotion of the driver. These emotions help in understanding the current state of the user. In order to perform suitable actions or provide suggestions on how users can enhance their feelings for a healthier life-style, we use actionable recommendations. In this work we extract action rules with respect to the user emotions that help provide suggestions for users.
... In this method feature vector is built for each annotated text entry, which is then used to train machine learning algorithms and finally validate the learning against annotated reference text. In the early days text is classified as either subjective or objective using supervised sentiment analysis [8]. One of the earliest works carried out using supervised approach of analysis of sentiment is prediction of stock market performance [9], leading to the development of an election outcome prediction [10] and movie box-office performance prediction [11]. ...
Article
Full-text available
Most of the recent work in sentiment analysis is carried out on textual data. Text-based sentiment analysis mainly relies on the construction of word dictionaries, using machine learning techniques that learn and extract opinions from large text corpora. Text-based sentiment analysis has numerous applications, such as analysing customer satisfaction with a brand or product and gauging voting intentions. With the rapid growth of social media, users post humongous volumes of data in various modalities such as text, image, audio, and video. These multimodal data streams bring new opportunities for going beyond text-based sentiment analysis and improving possible results. Since sentiment can be extracted from facial and vocal expressions, prosody and body posture, multimodal sentiment analysis offers new avenues in sentiment analysis. In multimodal sentiment analysis, the sentiment is extracted from transcribed content, visual and vocal features. This survey defines sentiment and sentiment analysis, states problems and challenges in multimodal sentiment analysis, and finally reviews some of the recent computational approaches used in multimodal sentiment analysis.
Chapter
Twitter is increasingly being used as a venue for medical research because of the large number of unstructured and free-text tweets sent there on healthcare-related topics. In natural language processing, sentiment analysis is one sort of data mining that may be used to assess the direction of a person's personality. Computational linguistics is applied to the analysis of text to infer and assess conceptual understanding of the internet, social media, and related topics. Healthcare information is also widely available online in the form of personal blogs, social media, and websites that rate medical conditions, but this data has not been compiled in a systematic fashion. A few of the numerous advantages of sentiment analysis include better healthcare outcomes and more efficient medical practice. In this paper, we explore possible new opportunities for researchers who want to work in the domain of sentiment analysis in the medical field. We explore many recent and existing papers and identify the strengths and research gaps of these papers in terms of methodologies, datasets used, and different machine learning and deep learning models. These tabular forms give new direction for research in this domain.
Keywords: Sentiment analysis; Lexicon-based sentiment classification deep learning; Clinical text mining; Health status analysis computational linguistics
Conference Paper
Online review applications usually recommend the most helpful reviews to users who read reviews. Here, we introduce a new problem: assessing the usefulness of a review for the owner of an establishment. Specifically, we propose using the aspects and sentiments of reviews, and generating a ranking ordered from the most useful reviews for the management and development of the establishment.
Article
This reflection article addresses a difficulty faced by scholars and practitioners working with numbers about people, which is that those who study people want numerical data about these people. Unfortunately, time and time again, this numerical data about people is wrong. Addressing the potential causes of this wrongness, we present examples of analyzing people numbers, i.e., numbers derived from digital data by or about people, and discuss the comforting illusion of data validity. We first lay a foundation by highlighting potential inaccuracies in collecting people data, such as selection bias. Then, we discuss inaccuracies in analyzing people data, such as the flaw of averages, followed by a discussion of errors that are made when trying to make sense of people data through techniques such as posterior labeling. Finally, we discuss a root cause of people data often being wrong – the conceptual conundrum of thinking the numbers are counts when they are actually measures. Practical solutions to address this illusion of data validity are proposed. The implications for theories derived from people data are also highlighted, namely that these people theories are generally wrong as they are often derived from people numbers that are wrong.
Article
Full-text available
The present research paper attempts to estimate the influence of physical capital investment, education expenditure and trade on the economic growth of the four major countries in South Asia, namely, India, Pakistan, Bangladesh and Sri Lanka. In addition, the study estimates spillover benefits of the institutional measures of voice and accountability, political stability and absence of violence and terrorism in the neighbouring countries on economic growth of the home country. The paper diagnoses the intra-regional trade in South Asia and whether Northeast India can catalyse the economic integration in the region. The study also throws light on the spillover benefits from regional integration and hindrances in realisation of the spillover benefits for the North-Eastern states from the Act-East Policy of the Government of India. We run a panel regression for the period 1996-2016 to estimate the influence of various conventional factors and spillover effects of institutional measures of voice and accountability and political stability and absence of violence and terrorism on economic growth of the four major economies of South Asia. Annual data on various explanatory variables have been collected from Penn World Tables, World Bank, and World Bank Governance Indicators for the four South Asian countries, namely, India, Pakistan, Bangladesh and Sri Lanka, for the period 1996 to 2016. Significant positive effects of physical capital investment, trade, regional institutions of voice and accountability and political stability are observed. Surprisingly, it is observed that international trade measured by trade-GDP ratio has a positive and significant influence on economic growth of the countries in South Asia, but intraregional trade within South Asia remains meagre. Policy makers should make the most of the geographical location of the Northeast states in escalating the economic growth of South Asian nations. This also provides the Northeast region a generous opportunity to reap the benefits of the Act East Policy of India. Keywords: International Economics, Institutions and Macroeconomy, Panel Data Models, Estimation
Article
Full-text available
In recent years, deep learning-based sentiment analysis has received attention mainly because of the rise of social media and e-commerce. In this paper, we showcase the fact that the polarity detection and subjectivity detection subtasks of sentiment analysis are inter-related. To this end, we propose a knowledge-sharing-based multitask learning framework. To ensure high-quality knowledge sharing between the tasks, we use the Neural Tensor Network, which consists of a bilinear tensor layer that links the two entity vectors. We show that BERT-based embedding with our MTL framework outperforms the baselines and achieves a new state-of-the-art status in multitask learning. Our framework shows that the information across datasets for related tasks can be helpful for understanding task-specific features.
Chapter
Full-text available
The electroencephalogram is a test that is used to keep track of brain activity. These signals are generally used in clinical areas to identify various brain activities that happen during specific tasks and to design brain–machine interfaces to help in prosthesis, orthosis, exoskeletons, etc. One of the tedious tasks in designing a brain–machine interface application is the processing of EEG signals acquired from a real-time environment. The complexity arises due to the fact that the signals are noisy, non-stationary, and high-dimensional in nature. So, building a robust BMI depends on the efficient processing of these signals. Optimal selection of features from the signals and of the classifiers used plays a vital role in building efficient devices. This paper concentrates on surveying the recent feature selection, feature extraction, and classification algorithms used in various applications for the development of BMI.
Keywords: EEG; Prosthesis; Orthosis; Exoskeletons
Book
Full-text available
“Learning gives creativity, creativity leads to thinking, thinking provides knowledge, and knowledge makes you great.” — Abdul Kalam Azad. “The capacity to learn is a Gift, The ability to learn is a skill, The willingness to learn is a choice.” — Brian Herbert. “Anyone who stops learning is old whether at twenty or eighty. Anyone who keeps learning stays young.” —Henry Ford. “It is the customer who pays the wages and the more you engage with customers the clearer things become and the easier it is to determine what you should be doing. ” — John Russell, President, Harley Davidson. This is an Edited book from the chapters of researchers presented in a seminar.
Chapter
With the rapid development and popularity of the World Wide Web, the Internet has entered the Web 2.0 and social network era. In Web 1.0, the Internet was characterized by static web pages. Web 2.0 refers to a World Wide Web that highlights user-generated content. Social networks are represented by a number of online tools and platforms, such as Twitter, Facebook, Weibo, and WeChat, where people share their perspectives, opinions, thoughts, and experiences. These online platforms contain innumerable subjective texts regarding different topics and events that fully reflect the individual opinions, sentiments, attitudes, and emotions of all of society.
Chapter
Full-text available
Recent advances in machine learning have led to computer systems that are humanlike in behavior. Sentiment analysis, the automatic determination of emotions in text, is allowing us to capitalize on substantial previously unattainable opportunities in commerce, public health, government policy, social sciences, and art. Further, analysis of emotions in text, from news to social media posts, is improving our understanding of not just how people convey emotions through language but also how emotions shape our behavior. This article presents a sweeping overview of sentiment analysis research that includes: the origins of the field, the rich landscape of tasks, challenges, a survey of the methods and resources used, and applications. We also discuss how, without careful fore-thought, sentiment analysis has the potential for harmful outcomes. We outline the latest lines of research in pursuit of fairness in sentiment analysis.
Article
This paper intends to consolidate the review and perform a literature survey on sentiment analysis and opinion mining. In this paper we try to analyze people's sentiments, opinions, and emotions from their text language, by which we can try to understand what mood or emotion the person was in while writing the text message. There are many types of sentimental moods according to which a person writes the text; it can be classified as happy, sad, neutral, or angry. There are also times when the user can be sad and angry at the same time, which needs to be identified by the analysis.
Article
Sentiments are the attitude, opinions, thoughts, beliefs or feelings of the writer towards something, such as people, artifacts, company or location. Sentiment analysis intends to conclude the judgment of a presenter or an author apropos to some subject matter or on the whole relative polarity of the manuscript. The outlook could be the perception or assessment, emotional condition, or the projected poignant message of the person behind
Conference Paper
Full-text available
Consumers use numerical review ratings, and unstructured review text on various latent aspects of the products/services to make online purchase decisions. The sheer volume of content creates an information overload issue for consumers in extracting relevant information. With the advent of advanced analytical tools and cheaper computing resources, it is now possible to extract rich information from unstructured review text. Using latest data mining and analytics tools and techniques, this work proposes a novel approach to derive and extract objective latent aspect ratings and ways to integrate the proposed model with extant review systems to address the information overload issue.
Article
Along with the emergence of the Internet, the rapid development of handheld devices has democratized content creation due to the extensive use of social media and has resulted in an explosion of short informal texts. Although a sentiment analysis of these texts is valuable for many reasons, this task is often perceived as a challenge given that these texts are often short, informal, noisy, and rich in language ambiguities, such as polysemy. Moreover, most of the existing sentiment analysis methods are based on clean data. In this paper, we present DICET, a transformer-based method for sentiment analysis that encodes representation from a transformer and applies deep intelligent contextual embedding to enhance the quality of tweets by removing noise while taking word sentiments, polysemy, syntax, and semantic knowledge into account. We also use the bidirectional long- and short-term memory network to determine the sentiment of a tweet. To validate the performance of the proposed framework, we perform extensive experiments on three benchmark datasets, and results show that DICET considerably outperforms the state of the art in sentiment classification.