Figure 3: Selection of open-ended responses

Source publication
Conference Paper
Full-text available
Expressive voices are needed on speech-generating devices. Three expressive voice styles, forming a gradient from calm to very intense, were created using HMM-based speech synthesis as an initial point of inquiry. A survey was completed by 64 adults (half non-native English speakers; half men, half women) that included a listening test which evaluated whether the voice styles were perceived as ranging from calm to very intense.

Similar publications

Article
Full-text available
Adults as well as infants have the capacity to discriminate languages based on visual speech alone. Here, we investigated whether adults' ability to discriminate languages based on visual speech cues is influenced by the age of language acquisition. Adult participants who had all learned English (as a first or second language) but did not speak French...
Preprint
Full-text available
This preprint reports data from early and late sequential Turkish-Australian English bilinguals and an Australian English monolingual control group. The participants completed an AX discrimination task containing vowel contrasts from Turkish, English, and Thai. Results revealed that early exposure led to better discrimination - and possibly...
Conference Paper
Full-text available
The goal of this task is to allow for comparison across sense-induction and discrimination systems, and also to compare these systems to other supervised and knowledge-based systems. In total there were 6 participating systems. We reused the SemEval-2007 English lexical sample subtask of task 17, and set up both clustering-style unsupervised evaluation...
Conference Paper
Full-text available
This paper proposes using linguistic knowledge from Wiktionary to improve lexical disambiguation in multiple languages, focusing on part-of-speech tagging in selected languages with various characteristics, including English, Vietnamese, and Korean. Dictionaries and subsumption networks are first automatically extracted from Wiktionary. These linguistic...
Article
Full-text available
The present studies examined the role of linguistic experience in directing English and Mandarin learners' attention to aspects of a visual scene. Specifically, they asked whether young language learners in these 2 cultures attend to differential aspects of a word-learning situation. Two groups of English and Mandarin learners, 6-8-month-olds (n =...

Citations

... Evaluating expressive speech study. We can draw on some experience evaluating the use and effectiveness of more expressive voices (Hennig, 2013; Hennig, Székely, Carson-Berndsen, & Chellali, 2012). A computer-based survey was developed in which sets of identical phrases were synthesized with three different tones of voice. ...
Article
Full-text available
People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. Despite its importance in human interaction, however, the issue of tone of voice remains all but absent from AAC research and development. In this paper, we describe three interdisciplinary projects, past, present and future: the critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; and the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand the frontiers of AAC and disrupt speech technology as a whole.
... The reason for this is that the user of a speech-generating device often does not have the opportunity to listen to the resulting synthetic speech ahead of a conversation, but has to trust the known characteristics of the voice styles in order to use them during interaction. Results of further perceptual experiments aimed at AAC applications in particular (Hennig et al., 2012) have shown that the three voices can be characterised on an expressiveness gradient representing different intensities of emotion: from calm (voice A), through intense (voice B), to very intense (voice C). In this study, participants were asked to compare a set of sentences and decide which one sounds more emotionally intense. ...
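To make the comparison procedure concrete, the following is a minimal illustrative sketch (not code from the study) of how pairwise "which sounds more intense?" judgements could be tallied to recover an ordering of the three voice styles; the data layout and variable names are assumptions for illustration.

from collections import Counter

# Hypothetical judgement data: each tuple records which of two voice styles
# a listener judged as the MORE emotionally intense of the pair.
judgements = [
    ("B", "A"), ("C", "B"), ("C", "A"),
    ("B", "A"), ("A", "B"), ("C", "A"),
]

# Count how often each voice style won a pairwise comparison.
wins = Counter(more_intense for more_intense, _ in judgements)

# Rank the styles by win count to recover the expressiveness gradient.
gradient = sorted(["A", "B", "C"], key=lambda v: wins[v])
print(" < ".join(gradient))  # with the sample data above: A < B < C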
Article
Full-text available
The ability to efficiently facilitate social interaction and emotional expression is an important, yet unmet requirement for speech generating devices aimed at individuals with speech impairment. Using gestures such as facial expressions to control aspects of expressive synthetic speech could contribute to an improved communication experience for both the user of the device and the conversation partner. For this purpose, a mapping model between facial expressions and speech is needed that is high-level (utterance-based), versatile, and personalisable. In the mapping developed in this work, the visual and auditory modalities are connected through the intended emotional salience of a message: the intensity of the user's facial expressions is mapped to the emotional intensity of the synthetic speech. The mapping model has been implemented in a system called WinkTalk that uses estimated facial expression categories and their intensity values to automatically select between three expressive synthetic voices reflecting three degrees of emotional intensity. An evaluation is conducted through an interactive experiment using simulated augmented conversations. The results show that automatic control of synthetic speech through facial expressions is fast, non-intrusive, sufficiently accurate, and supports the user in feeling more involved in the conversation. It can be concluded that the system has the potential to facilitate a more efficient communication process between user and listener.
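As a rough illustration of the utterance-level mapping this abstract describes, the sketch below maps an estimated facial-expression intensity to one of three expressive voices. The threshold values and voice names are assumptions, not WinkTalk's actual implementation; the per-user threshold parameters stand in for the personalisation the paper calls for.

def select_voice(intensity: float,
                 calm_max: float = 0.33,
                 intense_max: float = 0.66) -> str:
    """Map an estimated facial-expression intensity (0..1) to a voice style.

    The two thresholds are per-user parameters, reflecting the idea that
    the mapping between face and voice should be personalisable.
    """
    if intensity <= calm_max:
        return "voice_A"   # calm
    if intensity <= intense_max:
        return "voice_B"   # intense
    return "voice_C"       # very intense

print(select_voice(0.8))  # an animated expression selects the most intense voice: voice_C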
... This resulted in an HMM-based adult male voice with three distinct voice styles that vary in both prosody and voice quality. Previous work with the same participants confirmed that listeners perceive these voice styles as distinct and spontaneously labeled them as lying on a continuum from calm (voice A) to expressive/emphatic (voice B) to very expressive/emphatic (voice C) [30]. 81% of participants agreed that the three voice styles could be described as ranging from calm to very intense. ...
Conference Paper
Full-text available
As speech synthesis technology develops more advanced paralinguistic capabilities, open questions emerge regarding how humans perceive the use of such vocal capabilities by robots. Perceptions of spoken interaction are complex and influenced by multiple factors, including the linguistic content of a message, the social context, the perceived intelligence of the agent, and the form factor of its embodiment. This paper shares results from a study that controlled for the above factors in order to investigate the effect on human listeners of a male synthetic voice with an expressive range. Participants were randomly assigned to three conditions, counterbalanced for gender and language background, in which the application of paralinguistic cues was varied. As the voice became more expressive and appropriate to the context, observers were more likely to describe the communication as effective, but less likely to refer to the unseen agent as a person. Possible effects of listener gender and cultural-linguistic background are examined. Implications for future methodologies in this field are discussed.
Conference Paper
This paper describes a demonstration of the WinkTalk system, a speech synthesis platform using expressive synthetic voices. With the help of a webcam and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects the voice that matches their expression at the moment of sending a message. WinkTalk is an early research prototype intended to demonstrate that facial expressions can provide more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech-generating devices.
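To show where such a mapping would sit in the demo's send-time flow, here is a hypothetical end-to-end sketch: analyse the webcam frame, choose a matching voice, then synthesise. estimate_intensity() and synthesise() are placeholders for the facial expression analyser and the TTS engine, not WinkTalk's actual API, and the thresholds are illustrative.

def estimate_intensity(frame) -> float:
    """Placeholder for the facial expression analyser; returns intensity in 0..1."""
    raise NotImplementedError

def synthesise(text: str, voice: str) -> bytes:
    """Placeholder for the expressive TTS engine; returns audio for one voice."""
    raise NotImplementedError

def speak(text: str, frame) -> bytes:
    # The voice is chosen from the user's face at the moment the message is
    # sent, instead of by manual selection of a voice type.
    intensity = estimate_intensity(frame)
    voice = "voice_A" if intensity <= 0.33 else ("voice_B" if intensity <= 0.66 else "voice_C")
    return synthesise(text, voice)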