Figure 1: Example annotation task

Source publication
Conference Paper
Full-text available
We present results of a sentiment annotation study in the context of historical German plays. Our annotation corpus consists of 200 representative speeches from the German playwright Gotthold Ephraim Lessing. Six annotators, five non-experts and one expert in the domain, annotated the speeches according to different sentiment annotation schemes. Th...

Contexts in source publication

Context 1
... participants were able to annotate the presence of one or more emotion categories from a set of eight basic emotions (anger, fear, surprise, trust, anticipation, joy, disgust, sadness). Figure 1 illustrates the annotation process: For the differentiated polarity and the binary polarity, every annotator was asked to choose the most adequate sentiment category. For the emotion category, the instruction was to mark any emotions that are present in a speech. ...
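For concreteness, the scheme described above could be represented as follows. This is a minimal sketch: the eight emotion categories come from the excerpt, while the exact binary and differentiated polarity label sets and all field names are our own assumptions, not the study's published scheme.

```python
from dataclasses import dataclass, field

# The eight basic emotions named in the excerpt (Plutchik's categories).
EMOTIONS = {"anger", "fear", "surprise", "trust",
            "anticipation", "joy", "disgust", "sadness"}

# Assumed label sets: the excerpt does not spell these out.
BINARY_POLARITY = {"positive", "negative"}
DIFFERENTIATED_POLARITY = {"positive", "negative", "neutral", "mixed"}

@dataclass
class SpeechAnnotation:
    """One annotator's judgment of one speech (hypothetical field names)."""
    speech_id: int
    annotator: str
    binary_polarity: str          # exactly one label
    differentiated_polarity: str  # exactly one label
    emotions: set = field(default_factory=set)  # zero or more labels

    def __post_init__(self):
        assert self.binary_polarity in BINARY_POLARITY
        assert self.differentiated_polarity in DIFFERENTIATED_POLARITY
        assert self.emotions <= EMOTIONS

# Example: one speech, one annotator.
ann = SpeechAnnotation(1, "A1", "negative", "mixed", {"anger", "sadness"})
```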

Citations

... The annotation process is time-consuming, tedious, and demanding due to the highly diverse nature of emotional expressions, the potential usage of metaphorical language in fiction, and the general interconnectedness of preceding and subsequent text. In some cases, the annotators are not ideally aligned with each other (Schmidt et al., 2018). The multitude of interpretations regarding the emotions conveyed in the text underscores its subjective nature, introducing an additional layer of complexity to the annotation process. ...
Article
Full-text available
In recent years, there has been an increasing focus on women’s science fiction in China. A prevailing perception among readers and critics suggests that women’s sensibilities enable them to convey more nuanced emotions in their works. To examine this viewpoint within the realm of contemporary Chinese science fiction, a quantitative approach based on affective computing was employed. This approach allowed for a systematic evaluation of indicators such as emotional arc, emotional richness, and twistiness. The findings reveal that while individual writers may exhibit distinct emotional writing styles, overall, there is no significant disparity in emotional narratives between male and female science fiction writers.
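The indicators mentioned in this abstract (emotional arc, twistiness) are not defined in the excerpt. A common way to build an arc in sentiment-arc research is to smooth per-sentence sentiment scores with a moving average; the sketch below assumes such scores are already available, and the sign-change count is only a hypothetical proxy for "twistiness", not the paper's actual measure.

```python
def emotional_arc(scores, window=10):
    """Moving-average smoothing of per-sentence sentiment scores."""
    if len(scores) < window:
        return scores[:]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def sign_changes(arc):
    """Hypothetical 'twistiness' proxy: number of polarity reversals."""
    signs = [s for s in ((v > 0) - (v < 0) for v in arc) if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

scores = [0.2, 0.5, -0.1, -0.4, -0.2, 0.3, 0.6, 0.1, -0.3, -0.5, 0.0, 0.4]
arc = emotional_arc(scores, window=3)
print(arc, sign_changes(arc))
```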
... Annotators whose labels revealed a limited correlation with the mean of the group (Spearman rho < 0.8) were excluded from the later stages of the work. The final agreement between the remaining annotators has been checked through Krippendorff's alpha method, verifying an alpha value above the 0.67 threshold (Schmidt et al., 2018). The K-alpha statistical test was performed using R statistics - rel. ...
Article
Full-text available
Feedback and requests by occupants are relevant sources of data for improving building management and building maintenance. Indeed, most predictable faults can be identified directly by occupants and communicated to facility managers through messages written in the end-users' native language. In this sense, natural language processing methods can support the request identification and attribution process if they are robust enough to extract useful information from these unstructured textual sources. Machine learning (ML) can support assessing and managing these data, especially in the case of many simultaneous communications. In this field, the application of pre-processing and ML methods to English-language databases has been widely reported, while efforts in other native languages are still limited, which impacts real applicability. Moreover, the performance of combinations of pre-processing methods, ML methods and classification-class attribution has received limited investigation across different languages. To fill this gap, this work explores the performance of automatic priority assignment of end-users' maintenance requests depending on the combined influence of: (a) different natural language pre-processing methods, (b) several supervised ML algorithms, (c) two priority classification rules (2-class versus 4-class), and (d) the database language (i.e. the original database written in Italian, the native end-users' language, and a version translated into English, as a standard reference). Analyses are performed on a database of about 12,000 maintenance requests written in Italian concerning a stock of 23 buildings open to the public. A random sample of the sentences is supervised and labelled by 20 expert annotators following the best-worst method to attribute a priority score. Labelled sentences are then pre-processed using four different approaches to progressively reduce the number of unique words (potential predictors). Five consolidated ML methods are applied, and comparisons involve accuracy, precision, recall and F1-score for each combination of pre-processing action, ML method and number of priority classes. Results show that, within each ML algorithm, different pre-processing methods have limited impact on the final accuracy and average F1-score. In both the Italian and English conditions, the best performance is obtained by the NN, LR and SVM methods and by the 2-class priority classification scale, while NB generally fails. In this sense, the results confirm that facility managers can be effectively supported by ML methods for preliminary priority assessments in building maintenance processes, even when the request database is written in the end-users' native language.
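To make the kind of pipeline this abstract evaluates concrete (text pre-processing feeding a supervised classifier for 2-class priority assignment), here is a minimal scikit-learn sketch. The toy requests, labels, English stop-word setting and the choice of logistic regression are placeholders, not the study's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-ins for labelled maintenance requests (2-class priority).
texts = ["water leak in restroom", "flickering light in hallway",
         "elevator stuck between floors", "door handle slightly loose"] * 25
labels = ["high", "low", "high", "low"] * 25

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels)

# TF-IDF pre-processing feeds a linear classifier; LinearSVC or
# MultinomialNB could be swapped in to mimic the method comparison.
pipe = make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"),
                     LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
print(classification_report(y_test, pipe.predict(X_test)))
```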
... But there is no general notion of why this should be transferable to another domain or another language with a "similar" task. The problem of distributional shift or domain adaptation often contributes to a loss of performance [35,36,37]. Then again: how similar do these domains have to be to safely assume generalization? ...
... Recent research suggests that every step along the way to a trained pipeline based on human annotations involves the risk of bias [38]. Also, in the special case of sentiment, many studies show that scholars' and crowd-sourced annotations alike have particularly low agreement between annotators in historic texts [39,36,37]. Even the agreement of various machine-learning algorithms seems surprisingly low when trained and evaluated on the same sentiment data sets [40]. ...
... Annotators whose priority scores showed limited correlation with the mean of the whole annotator group (Spearman rho < 0.8) were excluded from the later stages of the work. The agreement between the remaining annotators has been checked through Krippendorff's alpha method, verifying an alpha value > 0.67, which is usually taken as the threshold (Schmidt et al. 2018). This verification was carried out for the remaining annotators' group as a whole and considering the specific expertise levels. ...
... Various combinations of existing lexicons and NLP tools have been evaluated against a human-annotated subsample [45], which serves as a gold standard. In fact, Human Manual Annotation (HMA) techniques still seem to better retrieve the presence of particular terms (i.e. ...
... The three levels are then of equal size. A polarity annotation contingency table has been plotted to evaluate the agreement of all annotators, and Krippendorff's α coefficient has been calculated [45]. ...
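Several of the excerpts above rely on Krippendorff's α and the conventional 0.67 threshold. As a self-contained illustration, here is a sketch of the nominal-data version of α; the cited studies likely used ordinal or interval weighting for priority scores, so treat this as a simplification, and the ratings are invented.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(data):
    """data: rows = annotators, columns = units; None marks a missing rating."""
    o = Counter()                      # coincidence matrix over label pairs
    for j in range(len(data[0])):      # collect pairable values per unit
        vals = [row[j] for row in data if row[j] is not None]
        if len(vals) < 2:
            continue
        for a, b in permutations(range(len(vals)), 2):
            o[(vals[a], vals[b])] += 1.0 / (len(vals) - 1)
    n_c = Counter()                    # marginal counts per label
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    if n <= 1:
        return 1.0
    d_o = sum(w for (c, k), w in o.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1.0 if d_e == 0 else 1.0 - d_o / d_e

# Three annotators labelling six units; None = no rating.
ratings = [["pos", "neg", "neg", "pos", None,  "neg"],
           ["pos", "neg", "pos", "pos", "neg", "neg"],
           ["pos", "neg", "neg", "pos", "neg", None]]
alpha = krippendorff_alpha_nominal(ratings)
print(f"alpha = {alpha:.3f}", "OK" if alpha > 0.67 else "below threshold")
```

The Spearman-based screening mentioned above can be done analogously with scipy.stats.spearmanr between each annotator's scores and the per-unit group mean.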
Article
In the building management process, the collection of end-users' maintenance requests is a rich source of information for evaluating occupants' satisfaction and building systems. Computerized Maintenance Management Systems typically collect non-standardized data that are difficult to analyze. Text mining methodologies can help to extract information from end-users' requests and support priority-assignment decisions. Sentiment Analysis can be applied to this aim, but complexities due to word/sentence orientations/polarities and domains/contexts can reduce its effectiveness. This study compares the ability of different Sentiment Analysis techniques and Human Manual Annotation, considered the gold standard, to automatically define a maintenance severity ranking. About 12,000 requests were collected over 34 months in 23 university buildings. Results show that current Sentiment Analysis techniques seem to only partially recognize the role of technical words in the severity assessment of requests, underlining the need for novel lexicons in the field of building facility management for automatic maintenance management procedures.
... While these projects are very promising, there are several problems concerning the current state of this field: the annotation process has been shown to be rather tedious and challenging, often leading to the need for (expensive) experts who understand the context and language of the material. Furthermore, agreement among annotators is low, due to the inherent subjectivity of narrative and poetic texts but also due to interpretation problems caused by historical or vague language [1,27,50,36,38,48,41]. These annotation problems pose challenges to the creation of the valid corpora that modern machine learning requires. ...
... The annotation results are as follows: the annotators agree on 348 of the 672 speeches (52%) with a Cohen's κ value of 0.233 (fair agreement according to [18]). These values are rather low [1,27,42,50,36,38,48,41]. We define the gold standard we use for the evaluation of sentiment prediction via the following approach: if the annotators agree on a speech, the speech is assigned the chosen class. ...
... The majority of annotations are negative, which is in line with previous research on the annotation of literary texts [1,42,36,38,48,41]. The high number of neutral annotations is, according to our analysis, due to the fact that many speeches are very short (e.g. ...
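A minimal sketch of the agreement check and the agreement-based gold-standard rule quoted above, using scikit-learn's Cohen's kappa implementation; the annotator labels are invented for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Invented example labels from two annotators over eight speeches.
ann1 = ["neg", "neg", "pos", "neu", "neg", "pos", "neg", "neu"]
ann2 = ["neg", "pos", "pos", "neg", "neg", "pos", "neu", "neu"]

kappa = cohen_kappa_score(ann1, ann2)
print(f"Cohen's kappa: {kappa:.3f}")

# Gold-standard rule from the excerpt: keep only speeches on which
# the annotators agree, and assign the agreed class.
gold = {i: a for i, (a, b) in enumerate(zip(ann1, ann2)) if a == b}
print(f"agreed on {len(gold)} of {len(ann1)} speeches:", gold)
```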
Conference Paper
Full-text available
We present first results of an exploratory study about sentiment analysis via different media channels on a German historical play. We propose exploring media channels other than text for sentiment analysis on plays, since the auditory and visual channels might offer important cues. We perform a case study and investigate how textual, auditory (voice-based), and visual (face-based) sentiment analysis perform compared to human annotations, and how these approaches differ from each other. As a use case, we chose Emilia Galotti by the famous German playwright Gotthold Ephraim Lessing. We acquired a video recording of a 2002 theater performance of the play at the "Wiener Burgtheater". We evaluate textual lexicon-based sentiment analysis and two state-of-the-art audio and video sentiment analysis tools. As gold standard, we use speech-based annotations by three expert annotators. We found that audio and video sentiment analysis do not perform better than textual sentiment analysis, and that the presentation of the video channel did not improve annotation statistics. We discuss the reasons for this negative result and the limitations of the approaches. We also outline how we plan to further investigate the possibilities of multimodal sentiment analysis.
... Examples of other projects and research goals are the analysis and prediction of plot developments (Reagan et al., 2016), character relations (Nalisnick & Baird, 2013; Yavuz, 2021) or "happy endings" (Jannidis et al., 2016) via sentiment and emotion analysis. One branch of research focuses on the annotation of texts with sentiment or emotion information, both to create well-curated corpora for evaluation and machine learning and to investigate annotation behavior and agreement statistics (Alm & Sproat, 2005; Sprugnoli et al., 2016; Schmidt, Burghardt, & Dennerlein, 2018a; Schmidt, Burghardt, Dennerlein, & Wolff, 2019a; Schmidt, Jakob, & Wolff, 2019; Schmidt, Winterl, Maul, Schark, Vlad, & Wolff, 2019). Modern approaches explore multimodal methods to analyze sentiment in cultural artefacts (Ortloff et al., 2019). ...
Conference Paper
Full-text available
We present SentText, a web-based tool to perform and explore lexicon-based sentiment analysis on texts, specifically developed for the Digital Humanities (DH) community. The tool was developed following ideas of the user-centered design process, and we gathered requirements via semi-structured interviews. The tool offers the functionality to perform sentiment analysis with predefined sentiment lexicons or self-adjusted lexicons. Users can explore the results of sentiment analysis via various visualizations like bar or pie charts and word clouds. It is also possible to analyze and compare collections of documents. Furthermore, we have added a close reading function enabling researchers to examine the applicability of sentiment lexicons for specific text sorts. We report on first usability tests with positive results. We argue that the tool is beneficial for exploring lexicon-based sentiment analysis in the DH but can also be integrated into DH teaching.
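As an illustration of the lexicon-based scoring such a tool performs, here is a minimal sketch. The tiny polarity lexicon and the tokenizer are invented stand-ins; a real run on German texts would load a resource such as SentiWS instead.

```python
import re

# Tiny invented lexicon; real analyses would load e.g. SentiWS entries
# (word -> polarity weight) instead.
LEXICON = {"love": 1.0, "joy": 0.8, "good": 0.5,
           "hate": -1.0, "fear": -0.7, "bad": -0.5}

def lexicon_sentiment(text):
    """Sum polarity weights of all lexicon words found in the text."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    hits = [(t, LEXICON[t]) for t in tokens if t in LEXICON]
    score = sum(w for _t, w in hits)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label, hits

print(lexicon_sentiment("I love the good ending, but I fear the start."))
```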
... Various forms of syntactic (cf. [4]) or semantic annotation [3,6,41] exist for various media types. In film studies, various approaches towards annotation are used. ...
... [22]). Reasons for this might be that annotators perceive the task as challenging and tedious [1,41,42,49]. If the annotators have no expertise, they report problems with the language and the missing context [1,41,42,49]. Furthermore, narrative texts are generally more prone to subjectivity since they can be interpreted in different ways. ...
Chapter
Full-text available
Movies in Digital Humanities are often enriched with information by annotating the text, e.g. via subtitles. However, we hypothesize that the missing presentation of the multimedia content is disadvantageous for certain annotation types like sentiment annotation. We claim that performing the annotation live during the viewing of the movie is beneficial for the annotation process. We present and evaluate the first version of a novel approach and prototype to perform live sentiment annotation of movies while watching them. The prototype consists of an Arduino microcontroller and a potentiometer paired with a slider. We perform an annotation study on five movies, each receiving sentiment annotations from three annotators, once via live annotation and once via traditional subtitle annotation, to compare the approaches. While the agreement among annotators increases slightly with live sentiment annotation, the overall experience and subjective effort measured by quantitative post questionnaires improve significantly. The qualitative analysis of post-annotation interviews validates these findings.
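On the logging side, slider readings could be captured with a short script like the one below. This is a sketch under our own assumptions, not the published setup: it presumes the Arduino firmware prints one raw potentiometer reading (0-1023) per line over USB serial, it uses the pyserial package, and the port name is hypothetical.

```python
import csv
import time
import serial  # pyserial; assumes one 0-1023 value per line from the Arduino

PORT = "/dev/ttyUSB0"  # hypothetical port name

def record(path, duration_s=10.0):
    """Log timestamped slider readings, rescaled to [-1, 1], to a CSV file."""
    with serial.Serial(PORT, 9600, timeout=1) as ser, \
         open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t_seconds", "sentiment"])
        start = time.monotonic()
        while time.monotonic() - start < duration_s:
            line = ser.readline().decode("ascii", errors="ignore").strip()
            if not line.isdigit():
                continue  # skip partial or garbled lines
            sentiment = int(line) / 1023.0 * 2.0 - 1.0  # 0..1023 -> -1..1
            writer.writerow([round(time.monotonic() - start, 3),
                             round(sentiment, 3)])

record("live_annotation.csv", duration_s=5.0)
```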
... These corpora allow for the application of advanced machine learning approaches but also for systematic performance evaluations of various approaches. While such corpora are rather easy to acquire in the context of social media and product reviews, e.g. by using Amazon Mechanical Turk [29], it has been shown that the annotation of literary texts is more challenging [2,25,26]. Due to the current lack of annotated corpora, researchers often employ rule-based sentiment analysis approaches (dictionary-based approaches; cf. [13]). These are often not optimized for the specific domain and vocabulary of literary texts and are often outperformed by machine learning approaches [13,29]. ...
... There are various reasons for the lack of annotated corpora for literary texts. Sentiment annotation for this text sort is currently done with text-based annotation tools enabling the annotation of passages [25,26] or of more complex relations in the text [14]. However, annotators perceive the task as very challenging and tedious [2,25]. If the annotators have no specific expertise, they report many problems with the language and the missing context [2,25,27]. ...
Conference Paper
Full-text available
In this contribution, we present the first version of a novel approach and prototype to perform live sentiment annotation of movies while watching them. Our prototype consists of an Arduino microcontroller and a potentiometer, which is paired with a slider. We motivate the need for this approach by arguing that presenting the multimedia content of movies and performing the annotation live during viewing is beneficial for the annotation process and more intuitive for the viewer/annotator. After outlining the motivation and the technical setup of our system, we outline the studies with which we plan to validate the benefits of our system.
... Second, we also see potential in using different media channels to improve the annotation of sentiment for narrative texts. Literary texts have proven very difficult and tedious to annotate due to their historic and complex language (Schmidt, Burghardt & Dennerlein, 2018; Schmidt, Burghardt & Wolff, 2018; Alm et al., 2005). Presenting the text material in multimodal form might improve and facilitate the sentiment annotation process. ...
... The annotator was a male student of German literary studies who had to write a thesis about Emilia Galotti during the annotation process and can therefore be regarded as an expert on this specific play. We restricted the annotation to 200 speeches, since sentiment annotation in this context has proven very tedious and challenging (Alm et al., 2005; Schmidt, Burghardt & Dennerlein, 2018). Therefore, all comparisons with human annotations are done with these specific 200 speeches. ...
Conference Paper
Full-text available
We present a case study as part of a work-in-progress project about multimodal sentiment analysis on historic German plays, taking Emilia Galotti by G. E. Lessing as our initial use case. We analyze the textual version and an audio version (audiobook). We focus on ready-to-use sentiment analysis methods: for the textual component, we implement a naive lexicon-based approach and another approach that enhances the lexicon by means of several NLP methods. For the audio analysis, we use the free version of the Vokaturi tool. We compare the results of all approaches and evaluate them against the annotations of a human expert, which serve as a gold standard. For our use case, we can show that audio and text sentiment analysis behave very differently: textual sentiment analysis tends to predict sentiment as rather negative, and audio sentiment analysis as rather positive. Compared to the gold standard, textual sentiment analysis achieves an accuracy of 56%, while the accuracy of audio sentiment analysis is only 32%. We discuss possible reasons for these mediocre results and give an outlook on further steps we want to pursue in the context of multimodal sentiment analysis on historic plays.
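The reported 56% and 32% figures are plain accuracies against the expert gold standard; once predictions and annotations are aligned per speech, the comparison is straightforward to reproduce. A sketch with invented labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented per-speech labels: expert gold standard vs. two automatic methods.
gold       = ["neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
text_pred  = ["neg", "neg", "neg", "neg", "pos", "neg", "neg", "neg"]
audio_pred = ["pos", "pos", "pos", "neg", "pos", "pos", "pos", "neg"]

for name, pred in [("text", text_pred), ("audio", audio_pred)]:
    print(name, f"accuracy={accuracy_score(gold, pred):.2f}")
    print(confusion_matrix(gold, pred, labels=["neg", "pos"]))
```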