Figure 7 - uploaded by Agneta Gulz
The three body stereotypes (somatotypes) defined by Sheldon, et al. (1940). From left to right: ectomorph: narrow shoulders and hips, thin narrow chest and abdomen, thin legs and arms, very little muscle or body fat; mesomorph: broad shoulders and narrow hips, muscular body, strong forearms and thighs; endomorph: wide hips and narrow shoulders, fat body, fat upper arms and thighs, slim wrists and ankles. © Magnus Haake.


Source publication
Article
The paper deals with the use of visual stereotypes in virtual pedagogical agents and its potential impact in digital learning environments. An analysis of the concept of visual stereotypes is followed by a discussion of affordances and drawbacks as to their use in the context of traditional media. Next, the paper explores whether virtual pedagogica...


Citations

... For example, NPCs having the appearance of mentors wear a beard, which conveys older age and higher competence [11]. These visual cues aim to activate stereotypes in the player's mind [12]. In Super Mario Bros., the aggressive and angry nature of the Goombas is conveyed through their slanting eyebrows. ...
Article
Non-Playable Characters (NPCs) are a subtype of virtual agents that populate video games by endorsing social roles in the narrative. To infer NPCs’ roles, players evaluate NPCs’ appearance and behaviors, usually by ascribing human traits to NPCs, such as intelligence, likability and morality. In particular, hostile NPCs in video games are essential to build the games’ inherent challenges. The three experiments reported here investigated the extent to which the perception of hostility in a military shooter game (including both threat of appearance and aggressiveness in behaviors) is influenced by the appearance and the behaviors of NPCs thanks to perceived intelligence, likability and morality-related questionnaires. Our results first show that hostility is efficiently conveyed through NPCs’ behaviors, but not significantly by their appearance. Second, our study allows identifying the main predictors of hostility perception, namely unfriendliness, knowledge and harmfulness.
... Studies have shown that stereotypes apply to technology and that the evaluation of different cues impacts several perceived competence items. Usually, female-gendered technology is perceived as having lower levels of trustworthiness and intelligence than technology that embodies a man, but higher levels of warmth and empathy (Haake & Gulz, 2008). It is emphasized that stereotypes are more likely to occur when technology is used in domains that are specific to one of the genders, rather than in gender-neutral ones. Thus, when technology embodies a woman, individuals perceive its competence as higher in domains that are typical for women, compared to technical or other occupations that are labeled as being specific to men. ...
... This is the reason why it is difficult to reshape them to avoid their negative effect. However, technology can help shape new generations that could become less oriented towards strong stereotypes that lead to discrimination (Haake & Gulz, 2008). Gender stereotypes are highly frequent while communicating with conversational agents or virtual assistants. ...
Article
Artificial Intelligence implies computer systems capable of mimicking human-like intelligence and competencies. In today's society it is an exciting topic; thus, technology's gender features and roles are of great interest as well. As the literature is still scarce and inconsistent, the present paper aims to develop a systematic literature review on gender stereotypes attached to technology (virtual assistants and robots). The main goals are to emphasize the labels given to technology from a gender perspective, the perceived competencies of the gendered technology, the most relevant variables responsible for the way gender issues are perceived in connection with technology, and the proposed solutions for diminishing the technology gender stereotypes. Forty-five scientific papers have been selected and analyzed. Findings suggest that the most intelligent technologies are designed as females, male-gendered technology performs better in task-solving, and users’ age and technology’s visual representation are important variables in perception.
... It can be defined as the character or character in the design context, which is related to the presentation or appearance and visual impression. A person's physical appearance forms expectations of other values in character [13]. This theory shows that the character's visual presentation acts as a specific trait that represents the values, identity, motivations, and dispositions they possess. ...
Article
The purpose of this research is to identify the modifications made to the iconic Pencak Silat movements from Panglipur martial arts that were shown in The Raid 2 movie. The Panglipur martial arts version of Pencak Silat is one of the traditional ones. The Panglipur Pencak Silat movements and stances used in The Raid 2 movie are not as perfect as the original version, since they were modified according to the needs of the movie scenes. This study used the ethnography research method to analyze the cultural identity contained in the characters of The Raid 2 movie. A descriptive qualitative method was also used to collect the data needed for this research. This method was also used to observe and study the visuals of the Pencak Silat movements. The visual analysis was conducted by studying important and related scenes in The Raid 2 movie to find out whether the Panglipur Pencak Silat movements had been modified. The visual analysis and studies showed that the Panglipur Pencak Silat movements shown in The Raid 2 movie were modified from their original version to suit the purposes and needs of the movie. The impact of this study is a rise in audience interest in Indonesian traditional martial arts and culture through the modification of the Pencak Silat movements that are visualized in a modernized martial arts action movie.
... The concept of stereotype needs to be emphasized here. It refers to the classification of a person into a certain group according to his/her visual characteristics (Haake & Gulz, 2008). They indicated that people have a fixed impression of certain workers such as flight attendants, and thus their effect on students in the teaching environment cannot be ignored. ...
... Cultural differences in some clues can have effects on findings from PA research (Gulz & Haake, 2006; Haake & Gulz, 2008; Richards & Dignum, 2019). For example, Lin et al. (2020) explored whether the personalization principle applies to Asian populations. ...
... Moreover, Lily Diagrams were used to visualize the clues to aid understanding of the framework. We emphasized the cultural differences of PAs (Gulz & Haake, 2006; Haake & Gulz, 2008; Richards & Dignum, 2019), so we searched for literature in specific areas (Taiwan) to avoid misunderstandings about the effectiveness. We then collected 2000-2019 PA literature to run the framework. ...
Article
Pedagogical agents (PAs) are a crucial aspect of the e-learning environment. A PA is defined as a virtual character presented on an interface and designed to promote student learning. PAs have been widely discussed in academic papers. However, an appropriate analysis framework has not been proposed because of the diversity and complexity of PAs. Therefore, we reviewed the literature and proposed a list of related clues, including environmental, learner, role, appearance, and social clues. We used this framework to analyse the learning effectiveness of PAs in specific areas. The keyword ‘pedagogical agent’ was used to search for related papers from 2000 to 2019. A total of 136 related papers were obtained. A meta-analysis was performed using a random effects model (Hedges’ g was used to measure the effect size). The effect size of the learning effectiveness of PAs was small to medium (g = 0.423). The results of the subgroup analysis (Hedges’ g) revealed that subjects, grades, additional support, appearance style, and facial expression changes had different moderating effects on the effect of PAs on learning effectiveness. We discussed the moderating effect of related clues and suggested future research directions.
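As a rough illustration of the effect-size measure named in the abstract above, the following sketch computes Hedges' g for two groups from summary statistics, using the common small-sample correction factor. The function name, the argument names, and the numbers in the usage line are assumptions made for illustration; they are not taken from the reviewed studies, and the sketch does not implement the random effects pooling itself.

```python
import math


def hedges_g(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Standardized mean difference with the usual small-sample correction.

    All argument names are hypothetical; they describe two groups
    (for example, learners with and without a pedagogical agent).
    """
    # Pooled standard deviation of the two groups.
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    pooled_sd = math.sqrt(pooled_var)

    # Cohen's d: raw mean difference in pooled-SD units.
    d = (mean_1 - mean_2) / pooled_sd

    # Approximate correction factor J = 1 - 3 / (4 * df - 1), df = n_1 + n_2 - 2.
    df = n_1 + n_2 - 2
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical summary statistics, for illustration only.
print(round(hedges_g(10.0, 8.0, 2.0, 2.0, 20, 20), 3))
```

By the conventional rules of thumb, values around 0.2 are read as small and values around 0.5 as medium, which is consistent with the abstract describing g = 0.423 as small to medium.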
... Our presumptions are based on common references and traits. First impressions of a new character have been shown to be persistent even if impressions are contradicted or more nuanced information is revealed (Haake and Gulz, 2008). Visual attributes help game designers to communicate elements of a game's theme, story, and challenge, and steer player behavior (Baranowski et al., 2008; Schell, 2008; Przybylski et al., 2010; Bakkes et al., 2012; Mohd Tuah et al., 2017). ...
... Narratives often use tropes or clichés that audiences are familiar with, such as the damsel in distress trope used in Mario, to set expectations and to make clear what actions will need to be taken. Similar to narrative tropes, character designers use visual archetypes (Haake and Gulz, 2008; e.g., the muscle packed action hero; or the magician in long robes) to provide visual affordances for players (e.g., recognizing a character that will likely use brute force vs. magic) to motivate player actions. For example, when facing a life or death decision we would act differently toward an immoral character (e.g., someone who puts themselves at risk to help an injured child in a dire situation vs. someone who always acts in their own best interests); or, someone who betrays their team or family, compared to someone who has displayed moral behavior (e.g., acting with loyalty even while being tempted toward disloyalty). ...
Article
The visual design of antagonists—typically thought of as “bad guys”—is crucial for game design. Antagonists are key to providing the backdrop to a game's setting and motivating a player's actions. The visual representation of antagonists is important because it affects player expectations about the character's personality and potential actions. Particularly important is how players perceive an antagonist's morality. For example, an antagonist appearing disloyal might foreshadow betrayal; a character who looks cruel suggests that tough fights are ahead; or, a player might be surprised when a friendly looking character attacks them. Today, the art of designing character morality is informed by archetypal elements, existing characters, and the artist's own background. However, little work has provided insight into how an antagonist's appearance can lead players to make moral judgments. Using Mechanical Turk, we collected participant ratings on a stimulus image set of 105 antagonists from popular video games. The results of our work provide insights into how the visual attributes of antagonists can influence judgments of character morality. Our findings provide a valuable new lens for understanding and deepening an important aspect of game design. Our results can be used to help ensure that a particular character design has the best chance to be universally seen as “evil,” or to help create more complex and conflicted emotional experiences through carefully designed characters that do not appear to be bad. Our research extends current research practices that seek to build an understanding of game design and provides exciting new directions for exploring how design and aesthetic practices can be better studied and supported.
... Visual stereotypes can appear in many different media formats: photos, movies, paintings, drawings, comics, animated movies, etc. Some of these formats allow different degrees of visual naturalism, which in the context of stereotypes is a feature of interest [1]. Visual Stereotypes and Virtual Pedagogical Agents Rather than using narrative, belief bonding, or even communication, characters are created by imagining gods or characters. ...
... Nevertheless, this portrayal of the ideal can be taken one step further with interactive computer media. A key difference lies in what is otherwise seen as a central potential of virtual characters, not least in pedagogical terms, namely their interactivity: Virtual characters may communicate, respond, and answer, thus establishing a dynamic, mutual social relation [1]. ...
... [201]). It is possible that the more 'stylized' (non-realistic and exaggerated) appearance of our coach (Fig. 6.4) impacted this, and that a more 'naturalistic' (human-looking) appearance, including affectively expressive facial expressions, would have served us better [271,272]. The third explanation is that our experimental setup impacted coach effectiveness. ...
Thesis
This thesis presents the research, design, and evaluation of the learning support system VESSEL: Virtual Environment to Support the Societal participation Education of Low-literates. The project was started from the premise that people of low literacy in the Netherlands participate in society less often and less effectively than literate people do: Their lower ability to read, write, speak, and understand the Dutch language hampers their ability to independently be part of society. Our goal was to create learning support prototypes with a re-usable design rationale, aimed at helping these people of low literacy learn to improve their societal participation. To achieve this, low-literate learners participated throughout the entire design process, ensuring that we addressed their wants and needs with regard to learning and the perceived shortcomings of existing learning materials and kept in mind their skills and capabilities in order to ensure effective learning. Particularly, we investigated the possible ways that digital learning, Virtual Learning Environments (VLE), and Embodied Conversational Agents (ECA) could help fulfill the societal participation needs of this target group. We used the Socio-Cognitive Engineering (SCE) methodology to organize and structure this research, distinguishing the foundation, specification and evaluation of the VESSEL design. Two studies provided a grounded foundation for VESSEL, which was refined and worked out into three subsequent studies that provided the consequential design specifications and prototype evaluations (all prototypes have been tested with a human ’Wizard of Oz’ simulating VESSEL functionality). In the first study, we collected necessary information for the foundation of VESSEL in three categories. 
The first category consisted of the operational demands, which form an overview of the context of use: Demographic information about low-literate learners in the Netherlands, a description of the crucial practical situations of participating in Dutch society, and important attributes of learning societal participation in the Netherlands. The second category encompassed human factors knowledge, consisting of literature about adult learning and ICT-supported learning. The third category contained technology insights, which we gathered by looking at both existing learning support software in the areas of low literacy and societal participation in the Netherlands, and the envisioned capabilities of VLEs. In the second study, we extended and refined our knowledge of the operational demands (as the foundation for VESSEL). We spoke to low-literate language learners in the Netherlands, in order to gain qualitative insights into their daily life experiences related to participating in Dutch society. We used participant workshops and Cultural Probes to obtain large amounts of rich data pertaining to these experiences, and we used the Grounded Theory method to transform these data into the Societal Participation Experiences of Low-Literates (SPELL) model. This model describes the four attribute categories of societal participation experiences: Personal attributes, formal societal attributes, information societal attributes, and information-communication attributes. In the third study, we used our foundation of information to create a first prototype, a ’proof-of-concept’ VESSEL. This prototype consisted of four interactive scenario-based learning exercises: Two exercises (’Easy’ and ’Hard’) about conducting online banking, and two exercises about talking to a city hall service desk employee. 
The prototype also contained our ECA ’coach’, Anna, who could provide three types of learning support: Cognitive learning support based on scaffolding, affective learning support based on motivational interviewing, and social learning support based on small talk. This prototype was evaluated with low-literate language learners throughout the Netherlands in an empirical mixed-method experiment, in which users did all four exercises both with and without coach support. Results showed that all learners managed to complete all exercises with coach support, while almost no learners completed all exercises without coach support. Participants interacted with the coach in a natural manner: They asked for her help and even talked to her without external prompting. A majority of participants appreciated her presence and help. In the fourth study, we formalized the coach’s cognitive learning support capabilities for the design and evaluation of the second prototype. We used existing scaffolding literature and our own experiences from the third study to define five levels of cognitive learning support: Prompt, Explanation, Hint, Instruction, and Modeling. We created a large corpus of standardized speech utterances for the coach in the context of the Hard Online Banking exercise, and wrote detailed rules describing which type of utterance the coach should use in any given situation, how long the coach should wait between utterances, and what kinds of user-uttered keywords she should react to and how. The model describes that the coach should always offer the lowest level of support (Prompt) for any new topic, that support should always go up in level and never repeat itself unless asked, and that the coach should wait a certain amount of time after any utterance. 
Two support models were made to describe this timing: The Generalized Model, in which the coach always waits 20 seconds, and the Individualized Model, in which the coach adapts the support wait time to the individual participant’s previous performance. The second prototype was created, focusing on an expanded version of the Hard Online Banking exercise, and an empirical mixed-method experiment was conducted with low-literate learners to test the differences between the two support models: Learners completed three exercises in either the Generalized condition (with a consistent 20 second support wait time) or the Individualized condition (in which their support time in exercises two and three depended on their results in exercises one and two). We hypothesized that both the Generalized and Individualized Models would increase learning effectiveness, and that the Individualized Model would increase learning effectiveness significantly more than the Generalized Model. Results support the first hypothesis: Support from either model resulted in high learning effectiveness and higher self-efficacy for low-literate learners, and low-literate learners managed to use the new keyword-based speech recognition without the need for explanation. The second hypothesis was not supported: No differences in learning effectiveness were found between the two support models. In the fifth study, we formalized the coach’s affective and social support capabilities for the design and evaluation of the third prototype. 
We used existing motivational interviewing literature to define four levels of affective learning support (Reflective Listening, Normalizing, Affirmation, and Self-Efficacy Supporting) for three emotional states (Anger, Fear, and Sadness) at three levels of specificity (General, Specific, and Very Specific), and created a corpus of affective support utterances: For each combination of emotional state and specificity (General, Specific, or Very Specific Anger, Fear, or Sadness), one or two support utterances were created for each level of affective support. We used the Shimmer photoplethysmographic sensor and the FaceReader facial expression recognition software to infer learners’ affective states from their heart rate and facial expressions (respectively), and connected this to new affective support rules: Whenever the coach inferred an emotion at a certain level of specificity, it should use one Reflective Listening utterance relevant to that particular combination, one Normalizing utterance, one Affirmation utterance, and one Self-Efficacy utterance, in that order. We also used existing small talk literature to write a simple branching small talk script for the coach, focused on bonding with the user and introducing the new Volunteer Work exercise, in which learners had to fill out a volunteer work background information form and then talk to an ECA about their answers. A third and final Wizard-of-Oz prototype was created and evaluated with low-literate learners in an empirical mixed-method experiment, in which learners completed the full exercise once with only cognitive learning support and once with cognitive, affective, and social learning support. Results did not show strong significant differences between the two conditions. 
We identified three potential explanations: Our exercises did not manage to evoke emotional reactions in learners strong enough for our sensors to detect, our affective support model was not effective in the way we intended, and/or our experimental setup limited the amount of emotional reactions learners could experience. However, the prototype in general did work as intended: Learners completed every exercise, requested and used the coach’s support, and reported higher self-efficacy at the end of the experiment. This experiment also reported differences between NT1 and NT2 learners and between men and women, suggesting more careful study into demographic differences will be required. Overall, results from our studies show that VESSEL seems to be increasing learning effectiveness. Learners across studies reported that working with VESSEL made for a positive learning experience, and after doing challenging societal participation exercises for the first time, learners’ self-efficacy regarding the exercise topic (online banking / volunteer work) increased and stayed on the new level throughout. However, it proved difficult to clearly identify distinctive effects for specific VESSEL functions: For instance, positive learning outcomes could not clearly be attributed to the adaptive timing of the support or the constructive scaffolding used for cognitive support, and the positive experience of interacting with the coach could not be attributed to the presence of scripted small talk and affective support. Crucially, our results show that learners were able to use VESSEL as intended: they interacted with the exercises as intended and with the coach as envisioned, without the need for prior explanations or tutorials (save a brief introduction given by the coach). This suggests that we have managed to incorporate the actual capabilities, shortcomings, and wants and needs of people of low literacy into the design of VESSEL. 
However, it is not clear whether these positive outcomes would apply to all low-literate learners: While we attempted to recruit low-literate participants from different backgrounds and skill levels, on reflection, the majority of our participants were relatively high-skilled and intrinsically motivated. This is further complicated by the relatively low number of participants in our experiments, which calls the power of the results into question. Just as importantly, we regularly saw that learners socially engaged with ’Anna’: They responded to her questions, asked questions of their own, thanked her for her help, and even occasionally talked to her as if she was a real person – telling stories and making jokes. Learners were grateful for the support, and generally indicated that they would like to receive more support like this in the future.
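The cognitive support rules summarized in the thesis abstract above (start every new topic at the lowest level, only move up, do not repeat unless asked, and wait a fixed 20 seconds in the Generalized Model) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the VESSEL implementation: the class name, method names, the per-topic bookkeeping, and the behavior once the highest level has been given are all hypothetical, and the Individualized Model is left out because the abstract does not specify how the wait time is adapted.

```python
from dataclasses import dataclass, field

# The five support levels, in ascending order, as named in the abstract.
LEVELS = ["Prompt", "Explanation", "Hint", "Instruction", "Modeling"]

# Generalized Model: the coach always waits 20 seconds after an utterance.
GENERALIZED_WAIT_SECONDS = 20


@dataclass
class SupportState:
    """Hypothetical bookkeeping of which level comes next for each topic."""

    next_level: dict = field(default_factory=dict)

    def next_support(self, topic, repeat_requested=False):
        idx = self.next_level.get(topic, 0)
        # Repeat the most recent level only when the learner asks for it.
        if repeat_requested and idx > 0:
            return LEVELS[idx - 1]
        # Assumption: once Modeling has been given, nothing further is offered.
        if idx >= len(LEVELS):
            return None
        # Otherwise offer the next level and advance; a new topic starts at Prompt.
        self.next_level[topic] = idx + 1
        return LEVELS[idx]


state = SupportState()
print(state.next_support("login"))                         # first support on a new topic
print(state.next_support("login"))                         # escalates one level
print(state.next_support("login", repeat_requested=True))  # repeats without escalating
```

Keeping the state per topic is one way to honor the rule that any new topic starts again at the lowest level while support already given on another topic is not reset.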
... Similar to a chatbot scenario (Berger et al., 2019) or virtual agents in Virtual Reality/Augmented Reality applications, these guided dialogues could support focus and engagement (Wang et al., 2019). The concept of avatars has been explored for many years for underlying motivational effects but also regarding pedagogical and stereotypical gender effects of such so-called pedagogical agents (Baylor, 2009; Gulz & Haake, 2006; Haake & Gulz, 2008). Visualisations of pedagogical agents (hereafter referred to as PA) have been widely investigated by researchers who reported effects regarding learning styles, learning impact and performance (Atorf et al., 2019; Laureano-Cruces et al., 2016; Veletsianos, 2010), influences on persuasion (Khan & Sutcliffe, 2014), student motivation/emotions (Bendou et al., 2017; Liew et al., 2016) and benefits concerning cognitive load (Schroeder, 2017). ...
Conference Paper
Educational institutes are currently facing the new normality that an ongoing pandemic situation has brought to teaching and learning. Distributed learning with content that blends over several platforms and locations needs to be created with didactic expertise in a feasible manner. At the same time, the possibilities for creating and distributing digital content have developed rapidly. Advanced computing supports the creation of artificial images, natural speech, and even natural-looking but non-existent persons. Since such generative content is often also published under a Creative Commons license, it presents a viable option for designing learning content, assignments, or instructions for tasks. However, there is still limited evidence on how, for example, generated pedagogical agents (tutors) influence behaviour and decisions. This study investigated the influences of artificially generated tutor personas in a decision-making task distributed internationally on the Google Play store. The field experiment extended the balloon analogue risk task (BART) with instructions from generated persona photographs to evaluate potential influences on risk-taking behaviour. In a between-subject design, either a female tutor, a male tutor, or no tutor picture at all was presented during the task. The results (N=74) show a higher risk propensity when displaying a male artificial instructor compared to a female instructor. Participants also proceed with greater caution when instructed by a female tutor, as they reflect longer before initiating the next step to pump up the balloon. Further lines of research and experiences from the distribution of an investigative instruction app on Google Play are summarised in the conclusive implications.
... No.2: 111-134 E-ISSN: 2477-586X, ISSN: 2338-3348 | https://doi.org/10.28932/srjd.v4i2.2038 | Received: 19-11-2019, Accepted: 18-07-2020 Peter Players' expectations of these heroes relate to presentation/appearance and visual impression, which are of course determined by the physical appearance of the character (Haake, Gulz, 2008). This theory shows that visual presentation Serat Rupa Journal of Design, July 2020, Vol.4, ...
Article
Gatotkaca is one of the well-known heroes in Indonesian pop culture, originating from the Indian literary work the Mahabharata. In Indonesia, Gatotkaca is frequently featured in wayang performances. In modern times, Gatotkaca has been widely adopted into advertisements, comic books, caricatures and, most recently, an online game named Mobile Legends Bang Bang. Indonesia has the largest number of Mobile Legends Bang Bang players compared with other parts of the world. In developing the game, Mobile Legends added a hero character taken from Indonesia, Gatotkaca, to the list of heroes that players can choose from. The game's developer hopes that this character can represent players from Indonesia and add to the appeal of the game. The initial assumption of this study is that the generation of players aged 15-30 does not necessarily recognize Indonesian traits in the character visualization developed by Montoon, a global company that does not originate from Indonesia. This study aims to explore the extent to which young Indonesian players can recognize the Gatotkaca character in the game Mobile Legends, and to find out what myth Montoon seeks to build through the Gatotkaca character that makes Indonesian players interested in playing it. The researchers used questionnaires and focus group discussions to collect the necessary information. The respondents' results were analyzed using Barthes' semiotics to obtain data linking player perceptions with the visual characteristics of the hero character, and triangulation of three sources (the FGD results, the survey, and Barthes' theory of signification) was used to compare the data.
The mythic representation found is that of Gatotkaca as the only male hero character representing Indonesia, one who has godlike power and is highly masculine and unfriendly, yet has the majesty of a king. The respondents could identify Gatotkaca as a character from Indonesia based on the visuals of the costume, but not all visual aspects of Gatotkaca represent Indonesia. In addition, the generation of players aged 15-30 holds misperceptions about Indonesian identity. The presence of the Gatotkaca character does not affect their choice of character when playing Mobile Legends.
... Flowery clothes, twinkling mirror-balls, dreadlocks, shoulder pads, skulls and crosses, Cowboy hats, Baseball caps and lots of jewelry are visual synonyms for music styles ranging from the early sixties to contemporary Hip-Hop. Visual stereotypes play an essential role in our social interaction with unfamiliar others [91]. They trigger a categorization process in which we quickly form expectations on a person's likely behavior, attitudes, opinions, personality, manners, etc. and thus shape our personal attitude towards that person. ...
Preprint
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective. This thesis focuses on the information provided by the visual layer of music videos and how it can be harnessed to augment and improve tasks of the MIR research domain. The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone, without the sound being heard. This leads to the hypothesis that there exists a visual language that is used to express mood or genre. As a further consequence, it can be concluded that this visual information is music related and thus should be beneficial for the corresponding MIR tasks such as music genre classification or mood recognition. A series of comprehensive experiments and evaluations are conducted which are focused on the extraction of visual information and its application in different MIR tasks. A custom dataset is created, suitable to develop and test visual features which are able to represent music related information. Evaluations range from low-level visual features to high-level concepts retrieved by means of Deep Convolutional Neural Networks. Additionally, new visual features are introduced capturing rhythmic visual patterns. In all of these experiments the audio-based results serve as benchmark for the visual and audio-visual approaches. The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification and Cross-Genre Classification. Experiments show that an audio-visual approach harnessing high-level semantic information gained from visual concept detection outperforms audio-only genre-classification accuracy by 16.43%.