Fig. 3: Lucia’s wireframe, textures and renderings.

Source publication
Conference Paper
Full-text available
LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of the FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted with the ELITE optical-tracking movement analyzer. LUCIA can copy a real human being b...
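
The abstract describes driving standard MPEG-4 Facial Animation Parameters (FAPs) from optically tracked human data. As a purely illustrative sketch of that general idea, the Python snippet below converts marker displacements from an ELITE-style tracker into FAP values by normalising against Facial Animation Parameter Units (FAPU); the marker-to-FAP mapping, the FAPU values and all names are assumptions for illustration, not LUCIA's actual code.

```python
# Hypothetical sketch: converting tracked marker displacements into MPEG-4 FAP values.
# The marker-to-FAP mapping and the FAPU values below are illustrative assumptions,
# not the actual LUCIA implementation.

# FAPUs (Facial Animation Parameter Units) are fractions of distances measured on the
# neutral face, e.g. MNS = mouth-nose separation / 1024 (values here are made up).
FAPU = {"MNS": 0.06 / 1024, "MW": 0.05 / 1024}  # metres per FAP unit

# Example mapping: marker name -> (FAP name, axis, FAPU used for normalisation)
MARKER_TO_FAP = {
    "lower_lip_mid": ("open_jaw", "y", "MNS"),
    "left_corner_lip": ("stretch_l_cornerlip", "x", "MW"),
}

def markers_to_faps(neutral, frame):
    """Convert one frame of 3D marker positions into FAP values.

    neutral, frame: dicts of marker name -> (x, y, z) in metres.
    Returns a dict of FAP name -> integer FAP value.
    """
    axis_index = {"x": 0, "y": 1, "z": 2}
    faps = {}
    for marker, (fap, axis, fapu) in MARKER_TO_FAP.items():
        i = axis_index[axis]
        displacement = frame[marker][i] - neutral[marker][i]
        faps[fap] = round(displacement / FAPU[fapu])
    return faps

neutral = {"lower_lip_mid": (0.0, -0.030, 0.0), "left_corner_lip": (0.025, -0.020, 0.0)}
frame = {"lower_lip_mid": (0.0, -0.042, 0.0), "left_corner_lip": (0.028, -0.020, 0.0)}
print(markers_to_faps(neutral, frame))  # e.g. {'open_jaw': -205, 'stretch_l_cornerlip': 61}
```
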

Similar publications

Thesis
Full-text available
Animated lip-sync is a well-studied problem and its usefulness to industry, academia and society is undisputed. Many solutions have been proposed over the last century, but more recently the application of deep neural networks has sparked much new research. In the spirit of scientific reproducibility, this thesis documents an attempt to replicate...

Citations

... The agent's voice (text-to-speech, TTS) can also significantly impact users' perception of cognitive and emotional trust [18], [64]. Many companies have even commercialised such voice-based assistants [43]. [b] FACSvatar [87]. ...
... These low-poly avatars use fewer polygons for the avatar's mesh structure and support little facial and behavioural detail in the animation. LUCIA [43] is an open-source 3D platform that allows developers to control the posture and behaviour of the avatar. It can imitate humans by tracking the movements of passive markers placed on a person's face. ...
... The open-source 3D facial animation model Xface is implemented on the basis of the MPEG-4 approach [5] (Fig. 2 shows the talking-head systems: a Greta [1], b Xface [3], and c Lucia [24]). In Xface, a set of vertices is selected for each feature point (zone), and a Raised Cosine Function (RCF) is used to deform the region and move the neighbouring vertices when a feature point is moved. Being a distance transform, this achieves satisfactory results. ...
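
The snippet above describes deforming the mesh around each MPEG-4 feature point with a raised-cosine falloff. The following is a minimal Python sketch of that general technique, assuming a single influence radius per feature point; the function names and parameters are hypothetical and are not taken from Xface, LUCIA or the cited article.

```python
import math

def raised_cosine_weight(distance, radius):
    """Weight in [0, 1] that falls off smoothly with distance from the feature point."""
    if distance >= radius:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * distance / radius))

def deform_region(vertices, feature_point, displacement, radius):
    """Move the feature point by `displacement` and drag neighbouring vertices
    along with a raised-cosine falloff (a simple distance transform)."""
    deformed = []
    for v in vertices:
        w = raised_cosine_weight(math.dist(v, feature_point), radius)
        deformed.append(tuple(v[i] + w * displacement[i] for i in range(3)))
    return deformed

# Example: pull vertices near a lip-corner feature point 5 mm to the right.
verts = [(0.0, 0.0, 0.0), (0.004, 0.001, 0.0), (0.03, 0.0, 0.0)]
print(deform_region(verts, feature_point=(0.0, 0.0, 0.0),
                    displacement=(0.005, 0.0, 0.0), radius=0.01))
```
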
Article
Full-text available
Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and is growing in use and importance. A high level of realism can be achieved in demanding applications such as computer games and cinema. Authoring lip syncing with complex and subtle expressions is still difficult and fraught with problems in terms of realism. This research proposed a lip-syncing method for a realistic, expressive 3D face model. Animating lips requires a 3D face model capable of representing the myriad shapes the human face assumes during speech, and a method for producing the correct lip shape at the correct time. The paper presented a 3D face model designed to support lip syncing aligned with an input audio file. It deforms using a Raised Cosine Deformation (RCD) function that is grafted onto the input facial geometry. The face model is based on the MPEG-4 Facial Animation (FA) standard. The paper proposed a method to animate the 3D face model over time to create animated lip syncing, using a canonical set of visemes for all pairwise combinations of a reduced phoneme set called ProPhone. The proposed research integrated emotions, based on the Ekman model and Plutchik’s wheel, with emotive eye movements by implementing the Emotional Eye Movements Markup Language (EEMML) to produce a realistic 3D face model.
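
The abstract describes producing the correct lip shape at the correct time from a timed phoneme sequence by mapping phonemes to a canonical viseme set and transitioning between keyposes. The sketch below illustrates that general pipeline with a toy phoneme-to-viseme table and linear interpolation over a single mouth parameter; the table, the ProPhone reduction and the RCD-based deformation of the actual paper are not reproduced here, and all names are illustrative.

```python
# Hypothetical sketch: timed phonemes -> viseme keyframes -> interpolated mouth parameter.
# The phoneme-to-viseme table and the single "mouth_open" parameter are toy assumptions.

PHONEME_TO_VISEME = {"p": "bilabial", "a": "open", "s": "spread", "sil": "neutral"}
VISEME_MOUTH_OPEN = {"bilabial": 0.0, "open": 0.9, "spread": 0.2, "neutral": 0.1}

def keyframes(timed_phonemes):
    """timed_phonemes: list of (phoneme, start_time_s). Returns (time, mouth_open) keyframes."""
    return [(t, VISEME_MOUTH_OPEN[PHONEME_TO_VISEME[p]]) for p, t in timed_phonemes]

def mouth_open_at(keys, t):
    """Linearly interpolate the mouth-open value at time t between surrounding keyframes."""
    if t <= keys[0][0]:
        return keys[0][1]
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return keys[-1][1]

keys = keyframes([("sil", 0.0), ("p", 0.10), ("a", 0.18), ("s", 0.30), ("sil", 0.40)])
print(mouth_open_at(keys, 0.14))  # halfway between the /p/ and /a/ keyposes -> 0.45
```
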
... LUCIA is an MPEG-4 facial animation engine that implements a modified version of the Cohen-Massaro coarticulation model to model visual speech (see Section 2.3.1) (COSI et al., 2004; LEONE et al., 2012). The system also accepts as input an APML script containing emotional tags associated with Ekman's six facial expressions. ...
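
The snippet refers to a modified Cohen-Massaro coarticulation model. In that family of models, each speech segment exerts a time-varying "dominance" over each articulatory parameter, and the realized trajectory is a dominance-weighted average of the segments' targets. The code below is a minimal illustration of that idea with negative-exponential dominance functions; the constants and the specific modification used in LUCIA are assumptions, not the published model.

```python
import math

def dominance(t, center, magnitude=1.0, rate=20.0):
    """Negative-exponential dominance of a segment centred at `center` (seconds)."""
    return magnitude * math.exp(-rate * abs(t - center))

def coarticulated_value(t, segments):
    """Dominance-weighted average of per-segment targets (Cohen-Massaro style).

    segments: list of (target_value, center_time).
    """
    weights = [dominance(t, c) for _, c in segments]
    return sum(w * v for w, (v, _) in zip(weights, segments)) / sum(weights)

# Lip-rounding targets for an unrounded /i/ followed by a rounded /u/:
segments = [(0.1, 0.10), (0.9, 0.25)]
for t in (0.10, 0.175, 0.25):
    print(round(coarticulated_value(t, segments), 3))
# Rounding starts rising before the /u/ centre is reached: anticipatory coarticulation.
```
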
Thesis
Facial animation technology is experiencing increasing demand for applications involving virtual assistants, sellers, tutors and newscasters; lifelike game characters; social agents; and tools for scientific experiments in psychology and the behavioural sciences. A relevant and challenging aspect of the development of talking heads is the realistic reproduction of speech articulatory movements combined with the elements of non-verbal communication and the expression of emotions. This work presents an image-based (2D) facial animation synthesis methodology that can reproduce a wide range of expressive emotional speech states and also supports the modulation of head movements and the control of facial elements such as eye blinks and eyebrow raises. The synthesis of the animation uses a database of prototype images which are combined to produce animation keyframes. The weights used for combining the prototype images are derived from a statistical active appearance model (AAM), built from a set of sample images extracted from an audio-visual corpus of a real face. The generation of the animation keyframes is driven by the timed phonetic transcription of the speech to be animated and the desired emotional state. The keyposes consist of expressive, context-dependent visemes that implicitly model speech coarticulation effects. The transition between adjacent keyposes is performed through a non-linear image-morphing algorithm. To evaluate the synthesized animations, a perceptual evaluation based on the recognition of emotions was performed. Among the contributions of this work is also a database of expressive-speech video and motion-capture data for Brazilian Portuguese.
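
The thesis abstract describes building keyframes as weighted combinations of prototype images, with the weights derived from an active appearance model. The sketch below shows only the generic "weighted combination of prototype images" step as a NumPy blend; the AAM fitting, the corpus and the non-linear morphing of the thesis are not reproduced, and all names and values are illustrative.

```python
import numpy as np

def blend_prototypes(prototypes, weights):
    """Weighted combination of prototype images (H x W x 3 arrays) into one keyframe.

    Weights are normalised to sum to 1 so the result stays in the valid pixel range.
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    stack = np.stack(prototypes).astype(np.float64)   # shape (N, H, W, 3)
    keyframe = np.tensordot(weights, stack, axes=1)   # shape (H, W, 3)
    return keyframe.clip(0, 255).astype(np.uint8)

# Toy example: three 2x2 "prototype images" blended with AAM-style weights.
protos = [np.full((2, 2, 3), v, dtype=np.uint8) for v in (50, 120, 200)]
print(blend_prototypes(protos, [0.2, 0.5, 0.3]))  # every pixel equals 130
```
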
... Users interacting with the web interface are also offered the opportunity to listen to and look at a female talking head reading the paragraph which contains the keyword. This has been achieved through LUCIA-WebGL (Cosi et al., 2011), an MPEG-4 Facial Animation Parameters (FAP) driven talking head that implements a decoder compatible with the "Predictable Facial Animation Object Profile" (Pandzic and Forchheimer, 2003). LUCIA-WebGL is entirely based on real human data collected by means of ELITE, a fully automatic movement analyzer for 3D kinematics data acquisition (Ferrigno and Pedotti, 1985). ...
Conference Paper
Full-text available
In this paper we present a web interface for studying Italian through access to read Italian literature. The system allows users to browse the content, search for specific words and listen to the correct pronunciation produced by native speakers in a given context. This work aims at providing people who are interested in learning Italian with a new way of exploring Italian culture and literature through a web interface with a search module. By submitting a query, users may browse and listen to the results through several modalities, including: a) the voice of a native speaker: if an indexed audio track is available, the user can listen either to the query terms or to the whole context in which they appear (sentence, paragraph, verse); b) a synthetic voice: the user can listen to the results read by a text-to-speech system; c) an avatar: the user can listen to and look at a talking head reading the paragraph and visually reproducing real speech articulatory movements. In its current version, different speech technologies being developed at ISTC-CNR are integrated into a single framework. The system is described in detail and directions for future work are discussed.
... At ISTC-CNR in Padua we developed LUCIA, a talking head based on an open-source facial animation framework [7], [8], [9]. More recently, WebGL ("Web Graphics Library") was introduced [10] and is currently supported by the major web browser vendors. ...
Article
Full-text available
In this demo we present the first WebGL implementation of a talking head worldwide (LuciaWebGL), which is also the first WebGL talking head running on iOS mobile devices (Apple iPhone and iPad).
... An efficient coding of the shape and animation of the human face was included in the MPEG-4 international standard [3]. At ISTC-CNR in Padua we developed the LUCIA talking head, an open-source facial animation framework [4]. With the introduction of WebGL [5], which brings 3D graphics to web browsers, we made it possible for Lucia to be embedded in any web site [6]. ...
Conference Paper
Full-text available
Luciaweb is a 3D Italian talking avatar based on the new WebGL technology. WebGL is the standard programming library for developing 3D computer graphics inside web browsers. Over the last year we developed a facial animation system based on this library to interact with the user in a bimodal way. The overall system is a client-server application using the HTTP protocol: there is a client (a browser or an app) and a web server. No software download and no plugin are required. All the software resides on the server, and the visualization player is delivered inside the HTML pages that the client requests at the beginning of the connection. On the server side, a software component called AudioVideo Engine generates the phoneme and viseme information needed for the animation. The demo, called Emotional Parrot, shows the ability to reproduce the same input in different emotional states. This is the first WebGL software ever to run on an iOS device.
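
The abstract above describes a client-server design in which the server-side AudioVideo Engine produces phoneme and viseme timing information that the browser-side WebGL player consumes over HTTP. The following is a minimal, hypothetical sketch of such an interface using only the Python standard library; the endpoint path, the JSON layout and the canned timing values are assumptions for illustration and are not the actual Luciaweb protocol.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned example of the kind of timing data an AudioVideo-Engine-like component
# might return for a short utterance (purely illustrative values).
EXAMPLE_ANIMATION = {
    "text": "ciao",
    "phonemes": [{"ph": "tS", "start": 0.00, "end": 0.12},
                 {"ph": "a",  "start": 0.12, "end": 0.25},
                 {"ph": "o",  "start": 0.25, "end": 0.40}],
    "visemes":  [{"viseme": 12, "start": 0.00, "end": 0.12},
                 {"viseme": 10, "start": 0.12, "end": 0.25},
                 {"viseme": 13, "start": 0.25, "end": 0.40}],
}

class AnimationHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A WebGL client would fetch this JSON and drive its viseme/FAP player with it.
        if self.path.startswith("/animate"):
            body = json.dumps(EXAMPLE_ANIMATION).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    # Serve the hypothetical /animate endpoint on localhost:8080.
    HTTPServer(("localhost", 8080), AnimationHandler).serve_forever()
```
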
Preprint
Full-text available
We consider the challenging problem of audio-to-animated-video generation. We propose a novel method, OneShotAu2AV, to generate an animated video of arbitrary length using an audio clip and a single unseen image of a person as input. The proposed method consists of two stages. In the first stage, OneShotAu2AV generates the talking-head video in the human domain given an audio clip and a person's image. In the second stage, the talking-head video is converted from the human domain to the animated domain. The model architecture of the first stage consists of a spatially adaptive normalization based multi-level generator and multiple multi-level discriminators, along with multiple adversarial and non-adversarial losses. The second stage leverages an attention-based normalization driven GAN architecture, along with a temporal-predictor-based recycle loss and a blink loss coupled with a lip-sync loss, for unsupervised generation of animated video. In our approach, the input audio clip is not restricted to any specific language, which gives the method multilingual applicability. OneShotAu2AV can generate animated videos that have: (a) lip movements that are in sync with the audio, (b) natural facial expressions such as blinks and eyebrow movements, and (c) head movements. Experimental evaluation demonstrates superior performance of OneShotAu2AV compared to U-GAT-IT and RecycleGan on multiple quantitative metrics, including KID (Kernel Inception Distance), word error rate, and blinks/sec.