Lucia’s wireframe, textures and renderings.

Multimodal Embodied Conversational Agents: A discussion of architectures, frameworks and modules for commercial applications

Conference Paper

Full-text available

Dec 2022

Lip syncing method for realistic expressive 3D face model

Article

Full-text available

Mar 2018
MULTIMED TOOLS APPL

Lip synchronization of 3D face model is now being used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and is growing in use and importance. High level of realism can be used in demanding applications such as computer games and cinema. Authoring lip syncing with complex and subtle expressions is still difficult and fraught with problems in terms of realism. This research proposed a lip syncing method of realistic expressive 3D face model. Animated lips requires a 3D face model capable of representing the myriad shapes the human face experiences during speech and a method to produce the correct lip shape at the correct time. The paper presented a 3D face model designed to support lip syncing that align with input audio file. It deforms using Raised Cosine Deformation (RCD) function that is grafted onto the input facial geometry. The face model was based on MPEG-4 Facial Animation (FA) Standard. This paper proposed a method to animate the 3D face model over time to create animated lip syncing using a canonical set of visemes for all pairwise combinations of a reduced phoneme set called ProPhone. The proposed research integrated emotions by the consideration of Ekman model and Plutchik’s wheel with emotive eye movements by implementing Emotional Eye Movements Markup Language (EEMML) to produce realistic 3D face model.

Lip syncing method for realistic expressive 3D face model

Article

Mar 2017
MULTIMED TOOLS APPL

Itimad Raheem Ali

Two-Dimensional Expressive Speech Animation Animação 2D de Fala Expressiva

Thesis

Feb 2015

Paula Dornhofer Paro Costa

The facial animation technology experiences an increasing demand for applications involving virtual assistants, sellers, tutors and newscasters; lifelike game characters, social agents, and tools for scientific experiments in psychology and behavioral sciences. A relevant and challenging aspect of the development of talking heads is the realistic reproduction of the speech articulatory movements combined with the elements of non-verbal communication and the expression of emotions. This work presents an image-based, or 2D, facial animation synthesis methodology that allows the reproduction of a wide range of expressive speech emotional states and also supports the modulation of head movements and the control of face elements, like the blinking of the eyes and the raising of the eyebrows. The synthesis of the animation uses a database of prototype images which are combined to produce animation keyframes. The weights used for combining the prototype images are derived from a statistical active appearance model (AAM), which is built from a set of sample images extracted from an audio-visual corpus of a real face. The generation of the animation keyframes is driven by the timed phonetic transcription of the speech to be animated and the desired emotional state. The keyposes consist of expressive context-dependent visemes that implicitly model the speech coarticulation effects. The transition between adjacent keyposes is performed through a non-linear image morphing algorithm. To evaluate the synthesized animations, a perceptual evaluation based on the recognition of emotions was performed. Among the contributions of the work is also the building of a database of expressive speech video and motion capture data for Brazilian Portuguese.

Two-Dimensional Expressive Speech Animation

Thesis

Full-text available

Feb 2015

Paula Dornhofer Paro Costa

The facial animation technology experiences an increasing demand for applications involving virtual assistants, sellers, tutors and newscasters; lifelike game characters, social agents, and tools for scientific experiments in psychology and behavioral sciences. A relevant and challenging aspect of the development of talking heads is the realistic reproduction of the speech articulatory movements combined with the elements of non-verbal communication and the expression of emotions. This work presents an image-based, or 2D, facial animation synthesis methodology that allows the reproduction of a wide range of expressive speech emotional states and also supports the modulation of head movements and the control of face elements, like the blinking of the eyes and the raising of the eyebrows. The synthesis of the animation uses a database of prototype images which are combined to produce animation keyframes. The weights used for combining the prototype images are derived from a statistical active appearance model (AAM), which is built from a set of sample images extracted from an audio-visual corpus of a real face. The generation of the animation keyframes is driven by the timed phonetic transcription of the speech to be animated and the desired emotional state. The keyposes consist of expressive context-dependent visemes that implicitly model the speech coarticulation effects. The transition between adjacent keyposes is performed through a non-linear image morphing algorithm. To evaluate the synthesized animations, a perceptual evaluation based on the recognition of emotions was performed. Among the contributions of the work is also the building of a database of expressive speech video and motion capture data for Brazilian Portuguese.

Discovering the Italian Literature: Interactive Access to Audio-indexed Text Resources

Conference Paper

Full-text available

May 2014

In this paper we present a web interface to study Italian through the access to read Italian literature. The system allows to browse the content, search for specific words and listen to the correct pronunciation produced by native speakers in a given context. This work aims at providing people who are interested in learning Italian with a new way of exploring the Italian culture and literature through a web interface with a search module. By submitting a query, users may browse and listen to the results through several modalities including: a) the voice of a native speaker: if an indexed audio track is available, the user can listen either to the query terms or to the whole context in which they appear (sentence, paragraph, verse); b) a synthetic voice: the user can listen to the results read by a text-to-speech system; c) an avatar: the user can listen to and look at a talking head reading the paragraph and visually reproducing real speech articulatory movements. In its up to date version, different speech technologies currently being developed at ISTC-CNR are implemented into a single framework. The system will be described in detail and hints for future work are discussed.

Lucia WebGL: A new WebGL-based talking head

Article

Full-text available

Jan 2014

In this DEMO we present the first worldwide WebGL implementation of a talking head (LuciaWebGL), and also the first WebGL talking head running on iOS mobile devices (Apple iPhone and iPad).

A webGL talking head for mobile devices

Conference Paper

Full-text available

May 2012

Luciaweb is a 3D Italian talking avatar based on the new WebGL technology. WebGL is the standard programming library to develop 3D computer graphics inside the web browsers. In the last year we developed a facial animation system based on this library to interact with the user in a bimodal way. The overall system is a client-server application using the http protocol: we have a client (a browser or an app) and a web server. No software download and no plugin are required. All the software reside on the server and the visualization player is delivered inside the html pages that the client ask at the beginning of the connection. On the server side a software called AudioVideo Engine generates the phonemes and visemes information needed for the animation. The demo called Emotional Parrot shows the ability to reproduce the same input in different emotional states. This is the first WebGL software running on iOS device ever.

One Shot Audio to Animated Video Generation

Preprint

Full-text available

Feb 2021

We consider the challenging problem of audio to animated video generation. We propose a novel method OneShotAu2AV to generate an animated video of arbitrary length using an audio clip and a single unseen image of a person as an input. The proposed method consists of two stages. In the first stage, OneShotAu2AV generates the talking-head video in the human domain given an audio and a person's image. In the second stage, the talking-head video from the human domain is converted to the animated domain. The model architecture of the first stage consists of spatially adaptive normalization based multi-level generator and multiple multilevel discriminators along with multiple adversarial and non-adversarial losses. The second stage leverages attention based normalization driven GAN architecture along with temporal predictor based recycle loss and blink loss coupled with lipsync loss, for unsupervised generation of animated video. In our approach, the input audio clip is not restricted to any specific language, which gives the method multilingual applicability. OneShotAu2AV can generate animated videos that have: (a) lip movements that are in sync with the audio, (b) natural facial expressions such as blinks and eyebrow movements, (c) head movements. Experimental evaluation demonstrates superior performance of OneShotAu2AV as compared to U-GAT-IT and RecycleGan on multiple quantitative metrics including KID(Kernel Inception Distance), Word error rate, blinks/sec

Lucia’s wireframe, textures and renderings.

Similar publications

Citations