Fig. 5 - uploaded by Yasuharu Den
Model fitting by GUI tool

Source publication
Article
Full-text available
An architecture for a highly interactive, human-like spoken-dialog agent is discussed in this paper. In order to easily integrate modules of different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine that has a simple common interface and is...
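The "virtual machine with a simple common interface" idea can be illustrated with a minimal sketch: every engine (recognizer, synthesizer, face renderer, dialog controller) is driven by the same small set of plain-text commands over standard I/O, so an agent manager can treat them uniformly. The command names and slot layout below are illustrative assumptions, not the actual Galatea protocol.

```python
import sys

# Hypothetical slot-based module interface: each engine exposes named slots
# that an agent manager can read and write with two text commands.
class VirtualMachineModule:
    def __init__(self, name):
        self.name = name
        self.slots = {"Run": "IDLE"}          # module state exposed as named slots

    def handle(self, line):
        # Two commands only: "set <Slot> = <value>" and "get <Slot>".
        tokens = line.strip().split(None, 1)
        if not tokens:
            return ""
        cmd, rest = tokens[0], tokens[1] if len(tokens) > 1 else ""
        if cmd == "set" and "=" in rest:
            slot, value = (s.strip() for s in rest.split("=", 1))
            self.slots[slot] = value
            return f"rep {self.name}.{slot} = {value}"     # report the new value
        if cmd == "get":
            slot = rest.strip()
            return f"rep {self.name}.{slot} = {self.slots.get(slot, '')}"
        return f"err {self.name}: unknown command '{line.strip()}'"

if __name__ == "__main__":
    module = VirtualMachineModule("TTS")
    for line in sys.stdin:                    # the agent manager drives the module
        print(module.handle(line), flush=True)
```

Because every module answers the same small command set, adding or swapping an engine only requires wrapping it in this interface rather than changing the manager.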

Contexts in source publication

Context 1
... customize the face model from only a snapshot, a generic face model is manually adjusted to the frontal face image. A graphical user interface helps shorten the time needed to complete this fitting process. Fig. 5 shows the image before and after fitting. First, four points, located at the two temples, the bottom of the nose, and the top of the chin, are adjusted, and the facial features are determined roughly. Second, four points around each eye and the center of each eyeball are set, and the eyelid contours, the mouth, and the nose position are ...
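As a rough illustration of the coarse step, the four GUI-adjusted points can be treated as correspondences between the generic model and the photograph, from which a global transform is estimated and applied to every model vertex. The coordinates and the least-squares affine fit below are illustrative assumptions; the paper excerpt does not state how the adjustment is realized internally.

```python
import numpy as np

# Hypothetical coarse fitting: the four landmarks adjusted in the GUI
# (left temple, right temple, bottom of the nose, top of the chin) define an
# affine transform that drags every vertex of the generic face model onto the
# frontal photograph.  All coordinates here are made-up examples.
generic_landmarks = np.array([[-60.0, 0.0], [60.0, 0.0], [0.0, -40.0], [0.0, -90.0]])
clicked_landmarks = np.array([[112.0, 180.0], [248.0, 176.0], [181.0, 232.0], [184.0, 289.0]])

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst points."""
    A = np.hstack([src, np.ones((len(src), 1))])      # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2) transform parameters
    return params

def warp(vertices, params):
    """Apply the fitted transform to all vertices of the generic face mesh."""
    return np.hstack([vertices, np.ones((len(vertices), 1))]) @ params

params = fit_affine(generic_landmarks, clicked_landmarks)
print(warp(np.array([[0.0, 0.0], [-30.0, -20.0]]), params))   # two sample vertices
```

The finer second pass over the eyes, eyelids, mouth, and nose would presumably adjust only the vertices near each additional point in a similar fashion.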
Context 2
... the software toolkit, we have built several experimental ASDA systems to evaluate the toolkit. A screenshot of the system and an example of a user–system interaction are shown in Fig. 15 and Fig. 16 ...

Similar publications

Article
Full-text available
A corpus-based method of generating fundamental frequency (F0) contours of various speaking styles from text was developed. Instead of directly predicting F0 values, the method predicts command values of the F0 contour generation process model. Because of the model constraint, the resulting F0 contour retains a certain naturalness even when the predict...
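For context, the "F0 contour generation process model" is usually formulated (following Fujisaki) as a superposition of phrase and accent components in the log-F0 domain, so "predicting command values" means predicting the magnitudes and timings in something like the sketch below; the notation is the standard one and is not taken from this abstract.

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p_i}\, G_p(t - T_{0i})
  + \sum_{j=1}^{J} A_{a_j}\,\bigl[\,G_a(t - T_{1j}) - G_a(t - T_{2j})\,\bigr],
\qquad
G_p(t) = \alpha^2 t\, e^{-\alpha t},\quad
G_a(t) = \min\!\bigl[\,1 - (1 + \beta t)\, e^{-\beta t},\ \gamma\,\bigr]
\quad (t \ge 0;\ \text{both zero for } t < 0)
```

Here the phrase commands (magnitudes A_p at times T_0) and accent commands (amplitudes A_a over intervals [T_1, T_2]) are the values predicted from text, and the model constraint is what keeps the resulting contour smooth.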
Article
Full-text available
This paper describes the Isarn speech synthesis system; Isarn is a regional dialect spoken in northeastern Thailand. In this study, we focus on improving the prosody generation of the system by using additional context features. In order to develop the system, the speech parameters (Mel-cepstrum and fundamental frequencies of phonemes within di...
Article
Full-text available
Galatea is a software toolkit for developing a human-like spoken dialog agent. In order to easily integrate modules of different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine having a simple common interface...
Preprint
Full-text available
We propose a training method for spontaneous speech synthesis models that guarantees the consistency of linguistic parts of synthesized speech. Personalized spontaneous speech synthesis aims to reproduce the individuality of disfluency, such as filled pauses. Our prior model includes a filled-pause prediction model and synthesizes filled-pause-incl...
Conference Paper
Full-text available
This paper proposes a novel framework that enables us to manipulate and control formants in HMM-based speech synthesis. In this framework, the dependency between formants and spectral features is modelled by piecewise linear transforms; formant parameters are effectively mapped by these transforms to the means of Gaussian distributions over the spectral synth...
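Read literally, the piecewise linear mapping means each Gaussian mean over spectral features is a region-dependent linear function of the formant parameters; a sketch of that relationship, with notation assumed rather than taken from the paper:

```latex
\boldsymbol{\mu}_q = \mathbf{A}_{k(q)}\,\mathbf{f} + \mathbf{b}_{k(q)}
```

where f is the vector of formant parameters, k(q) selects the linear piece associated with state or region q, and mu_q is the mean of the Gaussian over spectral features. Editing f at synthesis time then shifts the spectral means, and hence the formants of the generated speech.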

Citations

... The Galatea toolkit [19], along with MMDAgent, is a voice interaction toolkit used in this research. The Galatea toolkit can be used to develop a spoken dialogue system with a life-like animated agent. ...
Article
Full-text available
In this paper, we propose a voice interaction system using 3D-CG virtual agents for stand-alone smartphones. Because the proposed system handles speech recognition and speech synthesis on a stand-alone smartphone, unlike existing mobile voice interaction systems, it enables users to talk naturally without the delays caused by network communication. Moreover, the proposed system can be fully customized through dialogue scripts, Java-based plugins, and Android APIs. Therefore, developers can easily build original voice interaction systems for smartphones based on the proposed system. We have made a subset of the proposed system available as open-source software. We expect that this system will contribute to studies of human–agent interaction using smartphones.
... Prior work on multi-modal spoken dialog frameworks tended to focus on speech input and output as the primary modality; see [1,2,4]. Examples include RavenClaw (later Olympus) [1,2], Galatea [3], and DIPPER [4]. Both Olympus and DIPPER provide a hub-based or open agent architecture that offers interfaces to speech recognizers, TTS, dialog managers, and natural language understanding. ...
Data
Full-text available
This paper presents a framework for rapid multi-modal dialog system development for domain-specific systems, as well as a run-time engine that automates domain-independent tasks and behaviors of a conversational multi-modal system. We present the set of modules that make up the engine and discuss some of the tools for rapid development of the domain-specific language model, corpus, and interaction rules. The capabilities of the multi-modal framework are demonstrated with the help of two conversational mobile systems that have been built using this framework.
... In the latest version (version 3) of our system, around 30 chord progressions, 65 rhythm patterns for melody (including 10 patterns that can generate melodies with an auftakt), and 37 accompaniment patterns are installed so that the user can give directions for generating songs. The prosody of the lyrics is analyzed with the text-to-speech engine of Galatea Talk [17] and shown in the text fields located in the boxes. Users can manually correct the prosody by editing the string in the text field. ...
Article
Full-text available
This paper describes a system designed to assist users in easily creating original songs from Japanese lyrics. Although software that helps accomplish this task has advanced recently, assisting users through the difficulties of composition is still a challenging task. We discuss a possible solution for assisting composers through three approaches: designing a system with direction functionality for generating songs, formulating composition as an optimization problem, and integrating a synthesis and analysis engine for vocals and lyrics. After 54 days of operation of our implemented web-based system, 15,139 songs were automatically generated by 5,908 distinct users. On average, 2.33 songs were generated per access to the website per user, and a wide variety of composition parameters were chosen for song generation. The results indicate that our method is able to greatly assist users in generating original songs from Japanese lyrics.
... The speech synthesis module is implemented using Gtalk and SoX (Sound eXchange). Gtalk is a Japanese text-to-speech engine developed by the Galatea Project [7]. SoX is a command-line utility that can convert audio files between various formats. ...
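A minimal sketch of how such a pipeline can be wired together, assuming a TTS command that writes a WAV file; the `gtalk_synthesize` call is a placeholder (the cited text does not give the actual invocation), while the SoX call reflects standard SoX usage for resampling to 16 kHz mono.

```python
import subprocess

def synthesize(text, raw_wav="utterance_raw.wav"):
    """Placeholder for the Gtalk TTS step; assumed to write a WAV file."""
    # Hypothetical command name standing in for whatever produces raw_wav from text.
    subprocess.run(["gtalk_synthesize", text, raw_wav], check=True)
    return raw_wav

def convert_with_sox(in_wav, out_wav="utterance_16k_mono.wav"):
    """Convert the synthesized audio to 16 kHz mono WAV using SoX."""
    subprocess.run(["sox", in_wav, "-r", "16000", "-c", "1", out_wav], check=True)
    return out_wav

if __name__ == "__main__":
    convert_with_sox(synthesize("こんにちは"))
```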
Conference Paper
WioNA (Wikipedia Ontology NAo) is proposed to achieve much better HRI by integrating four elements: a Japanese speech interface, semantic interpretation, the Japanese Wikipedia Ontology, and the Robot Action Ontology. WioNA is implemented on the humanoid robot "Nao". In WioNA, we developed two ontologies: the Japanese Wikipedia Ontology and the Robot Action Ontology. The Japanese Wikipedia Ontology has a large concept hierarchy and an instance network with many properties, built (semi-)automatically from Japanese Wikipedia. By giving the Japanese Wikipedia Ontology to Nao as its knowledge, Nao can converse with users on topics from many fields. The Robot Action Ontology, in contrast, is built by organizing the various performable actions of Nao in order to control and generate robot actions. Aligning the Robot Action Ontology with the Japanese Wikipedia Ontology enables Nao to perform actions related to the dialogue topics. To show the validity of WioNA, we describe human–robot conversation logs from two case studies whose dialogue topics are sport and a rock singer. These case studies show how well HRI proceeds in WioNA on these topics.
... A female voice was used to record 3410 sentences, including greetings, aizuchi, and weather information, with a familiar and lively intonation. GalateaTalk [24] is used as the speech synthesizer, and can control speaker type, voice tone (intonation), and speech rate. The same sentences have been recorded in both the human and synthesized voices. ...
Article
If a spoken dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a spoken dialog system that emulates human behavior in a dialog. The proposed system makes use of a decision tree to generate responses at the appropriate times. These responses include ‘aizuchi’ (back-channel), ‘repetition’, ‘collaborative completion’, etc. At each time interval, the decision tree determines the response timing from features derived from the pitch and energy contours, the recognition hypotheses, and the preparation status of the response generator. A subjective evaluation shows that there is a high degree of naturalness in the timing of ordinary responses and aizuchi, and that the spoken dialog system exhibits user-friendly behavior. The recorded voice system was preferred to a text-to-speech system (synthesized speech), and almost all subjects felt familiarity with the aizuchi. © 2010 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
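A minimal sketch of the decision-tree idea described in the abstract: frame-level features are pushed through a trained tree at each time interval to decide whether to stay silent, produce an aizuchi, or take the turn. The feature set, labels, and training rows below are invented for illustration and are not the paper's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Columns: [pitch dropped?, energy low?, ASR hypothesis ready?, response prepared?]
X = np.array([
    [0, 0, 0, 0],   # user still speaking, nothing ready       -> wait
    [1, 1, 0, 0],   # pause detected, no hypothesis yet         -> aizuchi
    [1, 1, 0, 1],   # pause detected, response ready early      -> aizuchi
    [1, 1, 1, 1],   # pause + hypothesis + response prepared    -> take the turn
    [0, 1, 1, 1],   # energy dipped but pitch still level       -> wait
    [1, 0, 1, 1],   # pitch dropped but user still loud         -> aizuchi
])
y = np.array([0, 1, 1, 2, 0, 1])   # 0 = wait, 1 = aizuchi, 2 = respond

timing_tree = DecisionTreeClassifier().fit(X, y)

# At each time interval the latest features are pushed through the tree.
print(timing_tree.predict([[1, 1, 1, 1]]))   # -> [2], i.e. generate the response
```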
... For example, it can be programmed to report the frequency of restroom use. The voice announcement function was developed using the speech synthesis module of the Galatea Toolkit [6]. 3) Daily Activity Monitor: the log viewer presents the raw data, which represents the detailed phenomena occurring in the home, as a replay. ...
Article
Full-text available
We have been developing an RFID (radio frequency identification) mat system to assist caregivers in a group home. In Japan, the number of group homes offering home-like care for elderly persons suffering from dementia has increased considerably. Even though the smaller number of people residing in a group home makes it suitable for family-like care, the scarcity of caregivers increases the burden, especially during the night. To augment caregiver attention, we developed floor mats with embedded RFID antennae and slippers with RFID tags. These can help caregivers be aware of the activities of persons suffering from dementia by specifying whether an individual has passed over a mat in a particular corridor. This not only helps the caregivers understand such persons by reviewing their activities but also keeps them informed about their current activities. We introduced the floor mats in a real group home and confirmed the feasibility of the system. In this study, we describe the system and lessons learned from our experiment.
... These two components communicate with each other using AJAX technology. The speech recognition engine (Julius [4]), the speech synthesis engine (gtalk [5]), the facial image synthesis engine (FSM [6]), and the interaction manager are deliverables from the Japanese research project that developed anthropomorphic spoken dialogue agents (the Galatea Project [7]), and the later ISTC (Interactive Speech Technology Consortium [8] [9]). ...
Conference Paper
Full-text available
In this paper, we propose a system that enables users to have multimodal interactions (MMI) with an anthropomorphic agent via a web browser. By using the system, a user can interact simply by accessing a web site from his/her web browser. A notable characteristic of the system is that the anthropomorphic agent is synthesized from a photograph of a real human face. This makes it possible to construct a web site whose owner's facial agent speaks with visitors to the site. This paper describes the structure of the system and provides a screen shot.
... What, then, are the good candidates? The broad answer is anything that is observable in human-human communication, but they can be exemplified by phenomena others have looked at: Kawamoto et al. (2004) mentions grunts and back-channel feedback, use of prosody to indicate utterance type and emotion, incremental understanding and interruptability, and facial animation with lip synchronisation; Porzel (2006) focuses on turn-taking issues; to mention but a few. The selected candidate can then be tested for perception, understanding and response. ...
Article
This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human–human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human–computer dialogue mimics or replicates some aspect of human–human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.
... Some, e.g., CSLU and SpeechBuilder, have also been used for educational purposes. And yet others, such as Olympus, Galatea (Kawamoto et al., 2002), and DIPPER (Bos et al., 2003), are primarily used for research. Different toolkits rely on different theories and dialog representations: finite-state, slot-filling, plan-based, information state-update. ...
Conference Paper
Full-text available
We introduce Olympus, a freely available framework for research in conversational interfaces. Olympus' open, transparent, flexible, modular and scalable nature facilitates the development of large-scale, real-world systems, and enables research leading to technological and scientific advances in conversational spoken language interfaces. In this paper, we describe the overall architecture, several systems spanning different domains, and a number of current research efforts supported by Olympus.