Figure 3: Selection of open-ended responses

Source publication
Conference Paper
Full-text available
Expressive voices are needed on speech-generating devices. Three expressive voice styles, forming a gradient from calm to very intense, were created using HMM-based speech synthesis as an initial point of inquiry. A survey was completed by 64 adults (half non-native English speakers; half men, half women) that included a listening test which evaluated whether the voice styles were perceived as ranging from calm to very intense.

Similar publications

Article
Full-text available
Adults as well as infants have the capacity to discriminate languages based on visual speech alone. Here, we investigated whether adults' ability to discriminate languages based on visual speech cues is influenced by the age of language acquisition. Adult participants who had all learned English (as a first or second language) but did not speak French...
Preprint
Full-text available
This preprint reports data from early and late sequential Turkish-Australian English bilinguals and an Australian English monolingual control group. The participants completed an AX discrimination task containing vowel contrasts from Turkish, English, and Thai. Results revealed that early exposure led to better discrimination - and possibly...
Conference Paper
Full-text available
The goal of this task is to allow for comparison across sense-induction and discrimination systems, and also to compare these systems to other supervised and knowledge-based systems. In total there were 6 participating systems. We reused the SemEval-2007 English lexical sample subtask of task 17, and set up both clustering-style unsupervised evaluation...
Conference Paper
Full-text available
This paper proposes using linguistic knowledge from Wiktionary to improve lexical disambiguation in multiple languages, focusing on part-of-speech tagging in selected languages with various characteristics, including English, Vietnamese, and Korean. Dictionaries and subsumption networks are first automatically extracted from Wiktionary. These linguistic...
Article
Full-text available
The present studies examined the role of linguistic experience in directing English and Mandarin learners' attention to aspects of a visual scene. Specifically, they asked whether young language learners in these 2 cultures attend to differential aspects of a word-learning situation. Two groups of English and Mandarin learners, 6-8-month-olds (n =...

Citations

... Evaluating expressive speech study. We can draw on some experience evaluating the use and effectiveness of more expressive voices (Hennig, 2013; Hennig, Székely, Carson-Berndsen, & Chellali, 2012). A computer-based survey was developed in which sets of identical phrases were synthesized with three different tones of voice. ...
Article
Full-text available
People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. Despite its importance in human interaction, however, the issue of tone of voice remains all but absent from AAC research and development. In this paper, we describe three interdisciplinary projects, past, present and future: the critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; and the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand the frontiers of AAC and disrupt speech technology as a whole.
... The reason for this is that the user of a speech-generating device often does not have the opportunity to listen to the resulting synthetic speech ahead of a conversation, but has to trust the known characteristics of the voice styles in order to use them during interaction. Results of further perceptual experiments aimed at AAC applications in particular (Hennig et al., 2012) have shown that the three voices can be characterised on an expressiveness gradient representing different intensities of emotion: from calm (voice A), through intense (voice B), to very intense (voice C). In this study, participants were asked to compare a set of sentences and decide which one sounds more emotionally intense. ...
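To make the comparison procedure concrete, the following is a minimal illustrative sketch (not code from the study) of how pairwise "which sounds more intense?" judgements could be tallied to recover an ordering of the three voice styles; the data layout and variable names are assumptions for illustration.

from collections import Counter

# Hypothetical judgement data: each tuple records which of two voice styles
# a listener judged as the MORE emotionally intense of the pair.
judgements = [
    ("B", "A"), ("C", "B"), ("C", "A"),
    ("B", "A"), ("A", "B"), ("C", "A"),
]

# Count how often each voice style won a pairwise comparison.
wins = Counter(more_intense for more_intense, _ in judgements)

# Rank the styles by win count to recover the expressiveness gradient.
gradient = sorted(["A", "B", "C"], key=lambda v: wins[v])
print(" < ".join(gradient))  # with the sample data above: A < B < C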
Article
Full-text available
The ability to efficiently facilitate social interaction and emotional expression is an important, yet unmet requirement for speech generating devices aimed at individuals with speech impairment. Using gestures such as facial expressions to control aspects of expressive synthetic speech could contribute to an improved communication experience for both the user of the device and the conversation partner. For this purpose, a mapping model between facial expressions and speech is needed that is high-level (utterance-based), versatile, and personalisable. In the mapping developed in this work, the visual and auditory modalities are connected through the intended emotional salience of a message: the intensity of the user's facial expressions is mapped to the emotional intensity of the synthetic speech. The mapping model has been implemented in a system called WinkTalk that uses estimated facial expression categories and their intensity values to automatically select between three expressive synthetic voices reflecting three degrees of emotional intensity. An evaluation is conducted through an interactive experiment using simulated augmented conversations. The results show that automatic control of synthetic speech through facial expressions is fast, non-intrusive, sufficiently accurate, and supports the user in feeling more involved in the conversation. It can be concluded that the system has the potential to facilitate a more efficient communication process between user and listener.
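As a rough illustration of the utterance-level mapping this abstract describes, the sketch below maps an estimated facial-expression intensity to one of three expressive voices. The threshold values and voice names are assumptions, not WinkTalk's actual implementation; the per-user threshold parameters stand in for the personalisation the paper calls for.

def select_voice(intensity: float,
                 calm_max: float = 0.33,
                 intense_max: float = 0.66) -> str:
    """Map an estimated facial-expression intensity (0..1) to a voice style.

    The two thresholds are per-user parameters, reflecting the idea that
    the mapping between face and voice should be personalisable.
    """
    if intensity <= calm_max:
        return "voice_A"   # calm
    if intensity <= intense_max:
        return "voice_B"   # intense
    return "voice_C"       # very intense

print(select_voice(0.8))  # an animated expression selects the most intense voice: voice_C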
... This resulted in an HMM-based adult male voice with three distinct voice styles that vary in both prosody and voice quality. Previous work with the same participants confirmed that listeners perceive these voice styles as distinct and spontaneously labeled them as lying on a continuum from calm (voice A) to expressive/emphatic (voice B) to very expressive/emphatic (voice C) [30]. 81% of participants agreed that the three voice styles could be described as ranging from calm to very intense. ...
Conference Paper
Full-text available
As speech synthesis technology develops more advanced paralinguistic capabilities, open questions emerge regarding how humans perceive the use of such vocal capabilities by robots. Perceptions of spoken interaction are complex and influenced by multiple factors, including the linguistic content of a message, the social context, the perceived intelligence of the agent, and the form factor of its embodiment. This paper shares results from a study that controlled for the above factors in order to investigate the effect on human listeners of a male synthetic voice with an expressive range. Participants were randomly assigned to three conditions, counterbalanced for gender and language background, in which the application of paralinguistic cues was varied. As the voice became more expressive and appropriate to the context, observers were more likely to describe the communication as effective, but less likely to refer to the unseen agent as a person. Possible effects of listener gender and cultural-linguistic background are examined. Implications for future methodologies in this field are discussed.
Conference Paper
This paper describes a demonstration of the WinkTalk system, a speech synthesis platform using expressive synthetic voices. With the help of a webcam and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects the voice that matches their expression at the moment of sending a message. WinkTalk is an early research prototype intended to demonstrate that facial expressions can provide more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech-generating devices.
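To show where such a mapping would sit in the demo's send-time flow, here is a hypothetical end-to-end sketch: analyse the webcam frame, choose a matching voice, then synthesise. estimate_intensity() and synthesise() are placeholders for the facial expression analyser and the TTS engine, not WinkTalk's actual API, and the thresholds are illustrative.

def estimate_intensity(frame) -> float:
    """Placeholder for the facial expression analyser; returns intensity in 0..1."""
    raise NotImplementedError

def synthesise(text: str, voice: str) -> bytes:
    """Placeholder for the expressive TTS engine; returns audio for one voice."""
    raise NotImplementedError

def speak(text: str, frame) -> bytes:
    # The voice is chosen from the user's face at the moment the message is
    # sent, instead of by manual selection of a voice type.
    intensity = estimate_intensity(frame)
    voice = "voice_A" if intensity <= 0.33 else ("voice_B" if intensity <= 0.66 else "voice_C")
    return synthesise(text, voice)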