Fig. 5 - uploaded by Yasuharu Den
Model fitting by GUI tool

Source publication
Article
Full-text available
An architecture for a highly interactive, human-like spoken-dialog agent is discussed in this paper. In order to easily integrate modules of different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine that has a simple common interface and is...
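The "virtual machine with a simple common interface" idea can be illustrated with a minimal sketch: every engine (recognizer, synthesizer, face renderer, dialog controller) is driven by the same small set of plain-text commands over standard I/O, so an agent manager can treat them uniformly. The command names and slot layout below are illustrative assumptions, not the actual Galatea protocol.

```python
import sys

# Hypothetical slot-based module interface: each engine exposes named slots
# that an agent manager can read and write with two text commands.
class VirtualMachineModule:
    def __init__(self, name):
        self.name = name
        self.slots = {"Run": "IDLE"}          # module state exposed as named slots

    def handle(self, line):
        # Two commands only: "set <Slot> = <value>" and "get <Slot>".
        tokens = line.strip().split(None, 1)
        if not tokens:
            return ""
        cmd, rest = tokens[0], tokens[1] if len(tokens) > 1 else ""
        if cmd == "set" and "=" in rest:
            slot, value = (s.strip() for s in rest.split("=", 1))
            self.slots[slot] = value
            return f"rep {self.name}.{slot} = {value}"     # report the new value
        if cmd == "get":
            slot = rest.strip()
            return f"rep {self.name}.{slot} = {self.slots.get(slot, '')}"
        return f"err {self.name}: unknown command '{line.strip()}'"

if __name__ == "__main__":
    module = VirtualMachineModule("TTS")
    for line in sys.stdin:                    # the agent manager drives the module
        print(module.handle(line), flush=True)
```

Because every module answers the same small command set, adding or swapping an engine only requires wrapping it in this interface rather than changing the manager.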

Contexts in source publication

Context 1
... customize the face model from only a snapshot, a generic face model is manually adjusted to the frontal face image. A graphical user interface helps shorten the time needed to complete this fitting process. Fig. 5 shows the image before and after fitting. First, four points, located at the two temples, the bottom of the nose, and the top of the chin, are adjusted, and the facial features are determined roughly. Second, four points around each eye and the center of each eyeball are set, and the eyelid contours, the mouth, and the nose position are ...
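As a rough illustration of the coarse step, the four GUI-adjusted points can be treated as correspondences between the generic model and the photograph, from which a global transform is estimated and applied to every model vertex. The coordinates and the least-squares affine fit below are illustrative assumptions; the paper excerpt does not state how the adjustment is realized internally.

```python
import numpy as np

# Hypothetical coarse fitting: the four landmarks adjusted in the GUI
# (left temple, right temple, bottom of the nose, top of the chin) define an
# affine transform that drags every vertex of the generic face model onto the
# frontal photograph.  All coordinates here are made-up examples.
generic_landmarks = np.array([[-60.0, 0.0], [60.0, 0.0], [0.0, -40.0], [0.0, -90.0]])
clicked_landmarks = np.array([[112.0, 180.0], [248.0, 176.0], [181.0, 232.0], [184.0, 289.0]])

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst points."""
    A = np.hstack([src, np.ones((len(src), 1))])      # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2) transform parameters
    return params

def warp(vertices, params):
    """Apply the fitted transform to all vertices of the generic face mesh."""
    return np.hstack([vertices, np.ones((len(vertices), 1))]) @ params

params = fit_affine(generic_landmarks, clicked_landmarks)
print(warp(np.array([[0.0, 0.0], [-30.0, -20.0]]), params))   # two sample vertices
```

The finer second pass over the eyes, eyelids, mouth, and nose would presumably adjust only the vertices near each additional point in a similar fashion.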
Context 2
... the software toolkit, we have built several experimental ASDA systems to evaluate the toolkit. A screenshot of the system and an example of a user–system interaction are shown in Fig. 15 and Fig. 16 ...

Similar publications

Article
Full-text available
A corpus-based method of generating fundamental frequency (F0) contours of various speaking styles from text was developed. Instead of directly predicting F0 values, the method predicts command values of the F0 contour generation process model. Because of the model constraint, the resulting F0 contour retains a certain naturalness even when the predict...
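For context, the "F0 contour generation process model" is usually formulated (following Fujisaki) as a superposition of phrase and accent components in the log-F0 domain, so "predicting command values" means predicting the magnitudes and timings in something like the sketch below; the notation is the standard one and is not taken from this abstract.

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p_i}\, G_p(t - T_{0i})
  + \sum_{j=1}^{J} A_{a_j}\,\bigl[\,G_a(t - T_{1j}) - G_a(t - T_{2j})\,\bigr],
\qquad
G_p(t) = \alpha^2 t\, e^{-\alpha t},\quad
G_a(t) = \min\!\bigl[\,1 - (1 + \beta t)\, e^{-\beta t},\ \gamma\,\bigr]
\quad (t \ge 0;\ \text{both zero for } t < 0)
```

Here the phrase commands (magnitudes A_p at times T_0) and accent commands (amplitudes A_a over intervals [T_1, T_2]) are the values predicted from text, and the model constraint is what keeps the resulting contour smooth.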
Article
Full-text available
This paper describes the Isarn speech synthesis system; Isarn is a regional dialect spoken in northeastern Thailand. In this study, we focus on improving the prosody generation of the system by using additional context features. In order to develop the system, the speech parameters (Mel-cepstrum and fundamental frequencies of phonemes within di...
Article
Full-text available
Galatea is a software toolkit for developing a human-like spoken dialog agent. In order to easily integrate modules of different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine having a simple common interface...
Preprint
Full-text available
We propose a training method for spontaneous speech synthesis models that guarantees the consistency of linguistic parts of synthesized speech. Personalized spontaneous speech synthesis aims to reproduce the individuality of disfluency, such as filled pauses. Our prior model includes a filled-pause prediction model and synthesizes filled-pause-incl...
Conference Paper
Full-text available
This paper proposes a novel framework that enables us to manipulate and control formants in HMM-based speech synthesis. In this framework, the dependency between formants and spectral features is modelled by piecewise linear transforms; formant parameters are effectively mapped by these transforms to the means of Gaussian distributions over the spectral synth...
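Read literally, the piecewise linear mapping means each Gaussian mean over spectral features is a region-dependent linear function of the formant parameters; a sketch of that relationship, with notation assumed rather than taken from the paper:

```latex
\boldsymbol{\mu}_q = \mathbf{A}_{k(q)}\,\mathbf{f} + \mathbf{b}_{k(q)}
```

where f is the vector of formant parameters, k(q) selects the linear piece associated with state or region q, and mu_q is the mean of the Gaussian over spectral features. Editing f at synthesis time then shifts the spectral means, and hence the formants of the generated speech.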

Citations

... The Galatea toolkit [19], along with MMDAgent, is a voice interaction toolkit used in this research. The Galatea toolkit can be used to develop a spoken dialogue system with a life-like animated agent. ...
Article
Full-text available
In this paper, we propose a voice interaction system using 3D-CG virtual agents for stand-alone smartphones. Because the proposed system handles speech recognition and speech synthesis on a stand-alone smartphone, unlike existing mobile voice interaction systems, it enables users to talk naturally without the delays caused by network communication. Moreover, the proposed system can be fully customized through dialogue scripts, Java-based plugins, and Android APIs. Therefore, developers can easily build original voice interaction systems for smartphones based on the proposed system. We have made a subset of the proposed system available as open-source software. We expect that this system will contribute to studies of human–agent interaction using smartphones.
... Prior work on multi-modal spoken dialog frameworks tended to focus on speech input and output as the primary modality; see [1,2,4]. Examples include RavenClaw (later Olympus) [1,2], Galatea [3], and DIPPER [4]. Both Olympus and DIPPER provide a hub-based or open agent architecture that offers interfaces to speech recognizers, TTS, dialog managers, and natural language understanding. ...
Data
Full-text available
This paper presents a framework for rapid multi-modal dialog system development for domain-specific systems, as well as a run-time engine that automates domain-independent tasks and behaviors of a conversational multi-modal system. We present the set of modules that make up the engine and discuss some of the tools for rapid development of the domain-specific language model, corpus, and interaction rules. The capabilities of the multi-modal framework are demonstrated with the help of two conversational mobile systems that have been built using this framework.
... In the latest version (version 3) of our system, around 30 chord progressions, 65 rhythm patterns for melody (including 10 patterns that can generate melodies with an auftakt), and 37 accompaniment patterns are installed so that the user can give directions for generating songs. The prosody of the lyrics is analyzed with the text-to-speech engine of Galatea Talk [17] and shown in the text fields located in the boxes. Users can manually correct the prosody by editing the string in the text field. ...
Article
Full-text available
This paper describes a system designed to assist users in easily creating original songs from Japanese lyrics. Although software that helps accomplish this task has advanced recently, assisting users through the difficulties of composition is still a challenging task. We discuss a possible solution for assisting composers through three approaches: designing a system with direction functionality for generating songs, formulating composition as an optimization problem, and integrating a synthesis and analysis engine for vocals and lyrics. After 54 days of operation of our implemented web-based system, 15,139 songs were automatically generated by 5,908 distinct users. On average, 2.33 songs were generated per access to the website per user, and a wide variety of composition parameters were chosen for song generation. The results indicate that our method is able to greatly assist users in generating original songs from Japanese lyrics.
... The speech synthesis module is implemented using Gtalk and SoX (Sound eXchange). Gtalk is a Japanese text-to-speech engine developed by the Galatea Project [7]. SoX is a command-line utility that can convert audio files between various formats. ...
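A minimal sketch of how such a pipeline can be wired together, assuming a TTS command that writes a WAV file; the `gtalk_synthesize` call is a placeholder (the cited text does not give the actual invocation), while the SoX call reflects standard SoX usage for resampling to 16 kHz mono.

```python
import subprocess

def synthesize(text, raw_wav="utterance_raw.wav"):
    """Placeholder for the Gtalk TTS step; assumed to write a WAV file."""
    # Hypothetical command name standing in for whatever produces raw_wav from text.
    subprocess.run(["gtalk_synthesize", text, raw_wav], check=True)
    return raw_wav

def convert_with_sox(in_wav, out_wav="utterance_16k_mono.wav"):
    """Convert the synthesized audio to 16 kHz mono WAV using SoX."""
    subprocess.run(["sox", in_wav, "-r", "16000", "-c", "1", out_wav], check=True)
    return out_wav

if __name__ == "__main__":
    convert_with_sox(synthesize("こんにちは"))
```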
Conference Paper
WioNA (Wikipedia Ontology NAo) is proposed to achieve much better HRI by integrating four elements: a Japanese speech interface, semantic interpretation, the Japanese Wikipedia Ontology, and the Robot Action Ontology. WioNA is implemented on the humanoid robot "Nao". In WioNA, we developed two ontologies: the Japanese Wikipedia Ontology and the Robot Action Ontology. The Japanese Wikipedia Ontology has a large concept hierarchy and an instance network with many properties, built (semi-)automatically from Japanese Wikipedia. By giving the Japanese Wikipedia Ontology to Nao as its knowledge, Nao can converse with users on topics from many fields. The Robot Action Ontology, in contrast, is built by organizing the various performable actions of Nao in order to control and generate robot actions. Aligning the Robot Action Ontology with the Japanese Wikipedia Ontology enables Nao to perform actions related to the dialogue topics. To show the validity of WioNA, we describe human–robot conversation logs from two case studies whose dialogue topics are sport and a rock singer. These case studies show how well HRI proceeds in WioNA on these topics.
... A female voice was used to record 3410 sentences, including greetings, aizuchi, and weather information, with a familiar and lively intonation. GalateaTalk [24] is used as the speech synthesizer, and can control speaker type, voice tone (intonation), and speech rate. The same sentences have been recorded in both the human and synthesized voices. ...
Article
If a spoken dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a spoken dialog system that emulates human behavior in a dialog. The proposed system makes use of a decision tree to generate responses at the appropriate times. These responses include ‘aizuchi’ (back-channel), ‘repetition’, ‘collaborative completion’, etc. At each time interval, the decision tree determines the response timing from features derived from the pitch and energy contours, the recognition hypotheses, and the preparation status of the response generator. A subjective evaluation shows that there is a high degree of naturalness in the timing of ordinary responses and aizuchi, and that the spoken dialog system exhibits user-friendly behavior. The recorded voice system was preferred to a text-to-speech system (synthesized speech), and almost all subjects felt familiarity with the aizuchi. © 2010 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
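A minimal sketch of the decision-tree idea described in the abstract: frame-level features are pushed through a trained tree at each time interval to decide whether to stay silent, produce an aizuchi, or take the turn. The feature set, labels, and training rows below are invented for illustration and are not the paper's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Columns: [pitch dropped?, energy low?, ASR hypothesis ready?, response prepared?]
X = np.array([
    [0, 0, 0, 0],   # user still speaking, nothing ready       -> wait
    [1, 1, 0, 0],   # pause detected, no hypothesis yet         -> aizuchi
    [1, 1, 0, 1],   # pause detected, response ready early      -> aizuchi
    [1, 1, 1, 1],   # pause + hypothesis + response prepared    -> take the turn
    [0, 1, 1, 1],   # energy dipped but pitch still level       -> wait
    [1, 0, 1, 1],   # pitch dropped but user still loud         -> aizuchi
])
y = np.array([0, 1, 1, 2, 0, 1])   # 0 = wait, 1 = aizuchi, 2 = respond

timing_tree = DecisionTreeClassifier().fit(X, y)

# At each time interval the latest features are pushed through the tree.
print(timing_tree.predict([[1, 1, 1, 1]]))   # -> [2], i.e. generate the response
```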
... For example, it can be programmed to report the frequency of restroom use. The voice announcement function was developed using the speech synthesis module of the Galatea Toolkit [6]. 3) Daily Activity Monitor: the log viewer presents the raw data, which represents the detailed phenomena occurring in the home, as a replay. ...
Article
Full-text available
We have been developing an RFID (radio frequency identification) mat system to assist caregivers in a group home. In Japan, the number of group homes offering home-like care for elderly persons suffering from dementia has increased considerably. Even though the smaller number of people residing in a group home makes it suitable for family-like care, the scarcity of caregivers increases the burden, especially during the night. To augment caregiver attention, we developed floor mats with embedded RFID antennae and slippers with RFID tags. These can help caregivers be aware of the activities of persons suffering from dementia by specifying whether an individual has passed over a mat in a particular corridor. This not only helps the caregivers understand such persons by reviewing their activities but also keeps them informed about their current activities. We introduced the floor mats in a real group home and confirmed the feasibility of the system. In this study, we describe the system and lessons learned from our experiment.
... These two components communicate with each other using AJAX technology. The speech recognition engine (Julius [4]), the speech synthesis engine (gtalk [5]), the facial image synthesis engine (FSM [6]), and the interaction manager are deliverables from the Japanese research project that developed anthropomorphic spoken dialogue agents (the Galatea Project [7]), and the later ISTC (Interactive Speech Technology Consortium [8] [9]). ...
Conference Paper
Full-text available
In this paper, we propose a system that enables users to have multimodal interactions (MMI) with an anthropomorphic agent via a web browser. By using the system, a user can interact simply by accessing a web site from his/her web browser. A notable characteristic of the system is that the anthropomorphic agent is synthesized from a photograph of a real human face. This makes it possible to construct a web site whose owner's facial agent speaks with visitors to the site. This paper describes the structure of the system and provides a screen shot.
... What, then, are the good candidates? The broad answer is anything that is observable in human-human communication, but they can be exemplified by phenomena others have looked at: Kawamoto et al. (2004) mentions grunts and back-channel feedback, use of prosody to indicate utterance type and emotion, incremental understanding and interruptability, and facial animation with lip synchronisation; Porzel (2006) focuses on turn-taking issues; to mention but a few. The selected candidate can then be tested for perception, understanding and response. ...
Article
This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human–human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human–computer dialogue mimics or replicates some aspect of human–human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.
... Some, e.g., CSLU and SpeechBuilder, have also been used for educational purposes. And yet others, such as Olympus, Galatea (Kawamoto et al., 2002), and DIPPER (Bos et al., 2003), are primarily used for research. Different toolkits rely on different theories and dialog representations: finite-state, slot-filling, plan-based, information state-update. ...
Conference Paper
Full-text available
We introduce Olympus, a freely available framework for research in conversational interfaces. Olympus' open, transparent, flexible, modular and scalable nature facilitates the development of large-scale, real-world systems, and enables research leading to technological and scientific advances in conversational spoken language interfaces. In this paper, we describe the overall architecture, several systems spanning different domains, and a number of current research efforts supported by Olympus.