Figure 1
Automatic Initialization Procedure: a) a general 3D face mesh model; b) mesh model with texture; c) a facial image after ASM convergence; d) before pose initialization; e) after pose initialization; f) personalized face model adjustment.


Source publication
Conference Paper
We present a model- and exemplar-based technique for head pose tracking. Because of the dynamic nature, it is not possible to represent face appearance by a single texture image. Instead, we sample the complex face appearance space by a few reference images (exemplars). By taking advantage of the rich geometric information of a 3D face model...

Contexts in source publication

Context 1
... problem of constructing the 3D face model is formulated as searching for the head pose and the metric coefficients to best fit the face feature points and silhouettes, which are detected by ASM. As shown in Figure 1 (b) and (c), there are three different sets of point correspondences between the 3D face mesh model and the point distribution model of ASM. Each red point in (c) corresponds to a vertex on the face mesh in (b). ...
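As an illustration of this fitting step, the sketch below alternates a weak-perspective pose fit with a linear fit of the metric coefficients so that the projected mesh vertices match the ASM landmarks. It is a minimal sketch under assumed conventions (a weak-perspective camera, a linear metric basis); the function and argument names are hypothetical, not taken from the paper.

```python
# Hedged sketch: alternate pose and metric-coefficient estimation so the
# projected mesh vertices fit the detected ASM landmarks (assumptions, not
# the paper's implementation).
import numpy as np

def fit_pose_and_shape(pts2d, mean_shape, basis, n_iters=4):
    """pts2d: (N,2) ASM landmarks; mean_shape: (N,3) generic mesh vertices;
    basis: (K,N,3) metric deformation basis (all names are assumptions)."""
    K = basis.shape[0]
    coeffs = np.zeros(K)
    P = np.zeros((2, 3))
    for _ in range(n_iters):
        shape = mean_shape + np.tensordot(coeffs, basis, axes=1)
        X = shape - shape.mean(axis=0)              # centred 3D vertices
        y = pts2d - pts2d.mean(axis=0)              # centred 2D landmarks
        # Pose step: least-squares 2x3 weak-perspective camera matrix.
        P = np.linalg.lstsq(X, y, rcond=None)[0].T  # (2,3)
        # Shape step: linear least squares on the metric coefficients
        # (centring distributes over the linear basis, so this is exact).
        X0 = mean_shape - mean_shape.mean(axis=0)
        resid = (y - X0 @ P.T).ravel()
        A = np.stack([(b - b.mean(axis=0)) @ P.T for b in basis], axis=-1)
        coeffs = np.linalg.lstsq(A.reshape(-1, K), resid, rcond=None)[0]
    return P, coeffs
```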
Context 2
... point-to-curve correspondences are converted to point-to-point correspondences using an iterative closest point approach. Figure 1 (d) and (e) show the generic mesh before and after pose initialization, respectively. ...
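A minimal sketch of that conversion, assuming each silhouette curve is given as a sampled 2D polyline: each model point is matched to the closest sample on its curve, and the pose fit is then repeated with the resulting point-to-point pairs (the names here are illustrative, not the paper's).

```python
import numpy as np

def closest_points_on_curves(points, curves):
    """Turn point-to-curve matches into point-to-point ones by picking,
    for each 2D point, the nearest sample on its associated polyline."""
    matches = []
    for p, curve in zip(points, curves):   # curve: (M,2) sampled polyline
        d = np.linalg.norm(curve - p, axis=1)
        matches.append(curve[np.argmin(d)])
    return np.asarray(matches)
```

In an ICP-style loop, this matching step and the pose update alternate until the correspondences stop changing.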
Context 3
... find that it usually terminates in 3 or 4 iterations. Figure 1 (f) shows the final face mesh. We can see that its silhouettes match the image silhouettes much better than in (e). ...

Similar publications

Article
Facial expression is a topic frequently used in several areas of research, such as security, psychology, and entertainment. All of this research demands high-quality source data of facial expressions. This research builds a Javanese-Indonesian Female Facial Expression Database in three-dimensional data, called the JIFFE-3D Database. JIFFE-3D D...
Article
Statistical models of non-rigid deformable objects, such as Active Appearance Models (AAM), are a popular means of registration, tracking, and synthesis of faces. Due to their rapid fitting and good accuracy, they are used extensively for facial expression tracking and analysis. A problem facing AAM-based face tracking is their inability to generalise...
Article
This paper presents a groundbreaking online educational platform that utilizes facial expression recognition technology to track the progress of students within the classroom environment. Through periodic image capture and facial data extraction, the platform employs ResNet50, CBAM, and TCNs for enhanced facial expression recognition. Achieving acc...
Preprint
Recent advances in Generative Adversarial Nets (GANs) have shown remarkable improvements for facial expression editing. However, current methods are still prone to generate artifacts and blurs around expression-intensive regions, and often introduce undesired overlapping artifacts while handling large-gap expression transformations such as transfor...

Citations

... • Appearance template methods: compare a face image to a set of exemplar templates to find the most similar view [105,106];
• Detector array: use a series of head detectors, each trained for a specific pose, and assign the pose of the detector with the greatest support [107][108][109];
• Manifold embedding: embed an image into low-dimensional manifolds that model the continuous variation in head pose and use these for pose regression [110][111][112][113][114][115][116][117][118][119];
• Tracking methods: use temporal constraints to recover the pose from observed movements in video frames [51][120][121][122][123][124];
• Hybrid classical approaches: combine one or more of the aforementioned methods in a single model [1,104]; ...
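As a concrete, hedged example of the first category above (appearance templates), the following sketch assigns the pose label of the most similar exemplar image under a simple pixel-wise distance; the function name and the assumption of pre-cropped, equal-sized images are mine, not the survey's.

```python
import numpy as np

def nearest_exemplar_pose(query, exemplars, poses):
    """Appearance-template baseline: return the pose label of the exemplar
    image closest to the query (all images pre-cropped to the same size)."""
    q = query.ravel().astype(float)
    errs = [np.linalg.norm(e.ravel().astype(float) - q) for e in exemplars]
    return poses[int(np.argmin(errs))]
```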
Article
Head pose estimation (HPE) is an active and popular area of research. Over the years, many methods have been developed, leading to a progressive improvement in accuracy; nevertheless, head pose estimation remains an open research topic, especially in unconstrained environments. In this paper, we will review the increasing amount of available datasets and the modern methodologies used to estimate orientation, with special attention to deep learning techniques. We will discuss the evolution of the field by proposing a classification of head pose estimation methods, explaining their advantages and disadvantages, and highlighting the different ways deep learning techniques have been used in the context of HPE. An in-depth performance comparison and discussion is presented at the end of the work. We also highlight the most promising research directions for future investigations on the topic.
... Most feature-based methods in the literature sample a set of feature points at specific feature positions within the face region. The chosen features often serve as landmarks for flexible [12,21,22,24,34,42] or geometrical face models that infer the head orientation from the relative configuration of the facial features [29]. Methods that cater for the occurrence of non-rigid face deformations due to changes in facial expression often require a preliminary training stage in order to capture the deformation modes of a shape model [42], to learn meaningful face expressions by the creation of several non-rigid motion models [24], or to select salient keypoints that are stable under non-rigid face deformations and learn corresponding keypoint descriptors at different head pose angles [38]. Nonetheless, the estimation accuracy of methods that rely on model-fitting generally depends upon accurate initialisation and tracking of specific facial features. ...
Article
Head pose estimation under non-rigid face movement is particularly useful in applications relating to eye-gaze tracking in less constrained scenarios, where the user is allowed to move naturally during tracking. Existing vision-based head pose estimation methods often require accurate initialisation and tracking of specific facial landmarks, while methods that handle non-rigid face deformations typically necessitate a preliminary training phase prior to head pose estimation. In this paper, we propose a method to estimate the head pose in real-time from the trajectories of a set of feature points spread randomly over the face region, without requiring a training phase or model-fitting of specific facial features. Conversely, our method exploits the 3-dimensional shape of the surface of interest, recovered via shape and motion factorisation, in combination with Kalman and particle filtering to determine the contribution of each feature point to the estimation of head pose based on a variance measure. Quantitative and qualitative results reveal the capability of our method in handling non-rigid face movement without deterioration of the head pose estimation accuracy. | Post-print: http://www.stefaniacristina.engineer/publications/
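The shape recovery mentioned in this abstract builds on shape-and-motion factorisation; below is a minimal sketch of the classical rank-3 (Tomasi-Kanade-style) factorisation, leaving out the metric upgrade and the Kalman/particle filtering stages. The interface is an assumption, not the paper's implementation.

```python
import numpy as np

def factorise_shape_motion(W):
    """Rank-3 factorisation of a measurement matrix W (2F x N), whose rows
    are the x/y tracks of N feature points over F frames. Returns motion
    (2F x 3) and shape (3 x N) up to an affine ambiguity."""
    W = W - W.mean(axis=1, keepdims=True)   # register: subtract centroids
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])           # motion factor
    S = np.sqrt(s[:3])[:, None] * Vt[:3]    # shape factor
    return M, S
```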
... However, these algorithms mostly require the person(s) to more or less face the camera and to be fairly close to it, so that the face is imaged at a relatively high resolution. Using video, head pose estimation can be included in a joint head and pose tracking algorithm [27], [15], [28], [29]. Early work by Stiefelhagen and Zhu [30], for example, used a Gaussian Mixture Model (GMM) on head pose angles to estimate VFOA. ...
Article
In this paper, we propose a new method for estimating the Visual Focus Of Attention (VFOA) in a video stream captured by a single distant camera and showing several persons sitting around a table, as in formal meeting or videoconferencing settings. The visual targets for a given person are automatically extracted on-line using an unsupervised algorithm that incrementally learns the different appearance clusters from low-level visual features computed from face patches provided by a face tracker, without the need for an intermediate error-prone step of head-pose estimation as in classical approaches. The clusters learnt in that way can then be used to classify the different visual attention targets of the person during a tracking run, without any prior knowledge of the environment, the configuration of the room, or the visible persons. Experiments on public datasets containing almost two hours of annotated videos from meetings and video-conferencing show that the proposed algorithm produces state-of-the-art results and even outperforms a traditional supervised method that is based on head orientation estimation and that classifies visual focus of attention using Gaussian Mixture Models.
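To make the incremental clustering idea concrete, here is a hedged sketch: each face-patch feature vector joins its nearest cluster when close enough (with a running-mean update) or starts a new cluster otherwise. The threshold-based rule and all names are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

class IncrementalClusters:
    """Online appearance clustering of face-patch feature vectors."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.means, self.counts = [], []

    def assign(self, feat):
        if self.means:
            d = [np.linalg.norm(m - feat) for m in self.means]
            k = int(np.argmin(d))
            if d[k] < self.threshold:           # join the nearest cluster
                self.counts[k] += 1
                self.means[k] += (feat - self.means[k]) / self.counts[k]
                return k
        self.means.append(np.asarray(feat, dtype=float).copy())
        self.counts.append(1)                   # otherwise start a new one
        return len(self.means) - 1
```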
... This is done either globally, e.g. by learning to classify image patches of the head at different angles based on low-level visual features, or locally, i.e. by localising certain facial features and geometrically and statistically inferring the global orientation (see [11] for a literature survey). Further, with video, head pose estimation can be included in a joint head and pose tracking algorithm [9,2,8,14]. Early work by Stiefelhagen et al. [17], for example, used a Gaussian Mixture Model (GMM) on head pose angles to estimate VFOA. ...
Conference Paper
In this paper, we propose a novel approach for estimating visual focus of attention in video streams. The method is based on an unsupervised algorithm that incrementally learns the different appearance clusters from low-level visual features extracted from face patches provided by a face tracker. The clusters learnt in that way can then be used to classify the different visual attention targets of a given person during a tracking run, without any prior knowledge on the environment and the configuration of the room or the visible persons. Experiments on public datasets containing almost two hours of annotated videos from meetings and video-conferencing show that the proposed algorithm produces state-of-the-art results and even outperforms a traditional supervised method that is based on head orientation estimation and that classifies visual focus of attention using Gaussian Mixture Models.
... For a given person, τ_s is constant. Estimating τ_s can be carried out using either feature-based [11] or featureless approaches [9]. In our recent work, we have shown that some components of the shape control vector can be automatically initialized with a featureless approach [12]. ...
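For context, τ_s here is the static shape control vector of a Candide-style model: person-specific geometry is a linear combination of shape units added to the mean mesh. A one-line sketch under that assumption (all names are illustrative):

```python
import numpy as np

def apply_shape_units(mean_vertices, shape_units, tau_s):
    """Candide-style static deformation: mean mesh (N,3) plus shape units
    (K,N,3) weighted by the person-specific control vector tau_s (K,)."""
    return mean_vertices + np.tensordot(tau_s, shape_units, axes=1)
```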
Conference Paper
Human-machine interaction is a hot topic nowadays in the computer vision and robotics communities. In this context, face recognition algorithms (used as a primary cue for assessing a person's identity) work well under controlled conditions but degrade significantly when tested in real-world environments. This is mostly due to the difficulty of simultaneously handling variations in illumination, pose, and occlusions. In this paper, we propose a novel approach to robust pose-invariant face recognition for human-robot interaction based on the real-time fitting of a 3D deformable model to input images taken from video sequences. More concretely, our approach generates a rectified face image irrespective of the actual head-pose orientation. Experimental results on the Honda video database, using several manifold learning techniques, show a distinct advantage of the proposed method over the standard 2D appearance-based snapshot approach.
... This method does not take advantage of the fact that knowledge about head pose could improve head modeling and thus head tracking accuracy. The second group of methods [3,7,14] considers head tracking and pose estimation as a joint process. Following this idea, in previous work we proposed a method relying on a Bayesian formulation that couples the head tracking and pose estimation problems. ...
Article
This paper presents a Rao-Blackwellized mixed-state particle filter for joint head tracking and pose estimation. Rao-Blackwellizing a particle filter consists of marginalizing some of the variables of the state space in order to compute their posterior probability density function exactly. Marginalizing variables reduces the dimension of the configuration space, makes the particle filter more efficient, and requires fewer particles. Experiments were conducted on our head pose ground-truth video database consisting of people engaged in meeting discussions. Results from these experiments demonstrated the benefits of the Rao-Blackwellized particle filter model, with fewer particles, over the mixed-state particle filter model.
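A hedged sketch of the Rao-Blackwellization idea follows: the continuous head location is sampled with particles, while the discrete pose label is marginalized exactly per particle. The dynamics and likelihood callables are placeholders, and this illustrates the general construction rather than the paper's filter.

```python
import numpy as np

def rbpf_step(locs, weights, pose_post, A, loc_dynamics, loc_lik, pose_lik):
    """One step of a Rao-Blackwellized mixed-state particle filter.
    locs: particle head locations; pose_post: (n, P) exact per-particle
    posteriors over P discrete poses; A: (P, P) pose transition matrix.
    loc_dynamics, loc_lik, pose_lik are placeholder callables."""
    P = A.shape[0]
    for i in range(len(locs)):
        locs[i] = loc_dynamics(locs[i])                # sample location
        pred = pose_post[i] @ A                        # exact pose predict
        lik = np.array([pose_lik(locs[i], p) for p in range(P)])
        post = pred * lik
        weights[i] *= loc_lik(locs[i]) * post.sum()    # marginal likelihood
        pose_post[i] = post / post.sum()               # exact pose update
    weights /= weights.sum()
    return locs, weights, pose_post
```

Because the pose dimension is handled exactly, the particles only have to cover the location space, which is why fewer particles suffice.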
... However, since head pose estimation is very sensitive to head localization [27], head pose results are highly dependent on the tracking accuracy. To address this issue, [17], [24], [28] perform the head tracking and the pose estimation jointly. ...
Article
We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian Mixture Model (GMM) or a Hidden Markov Model (HMM) whose hidden states correspond to the VFOA. The novelties of this work are threefold. First, contrary to previous studies on the topic, in our set-up the potential VFOA of a person is not restricted to other participants only. It includes environmental targets as well (a table and a projection screen), which increases the complexity of the task, with more VFOA targets spread in the pan as well as tilt gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the prediction of the head pose given a gaze target. Third, an unsupervised parameter adaptation step not using any labeled data is proposed, which accounts for the specific gazing behaviour of each participant.
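As a toy illustration of the GMM variant (one Gaussian over (pan, tilt) head-pose angles per VFOA target, with means possibly set from a geometric gaze model), the sketch below makes a MAP decision; the names and the scipy dependency are my assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify_vfoa(pan_tilt, target_means, target_covs, priors):
    """MAP VFOA decision over Gaussians on (pan, tilt) head-pose angles,
    one per visual target (other participants, table, screen, ...)."""
    scores = [w * multivariate_normal.pdf(pan_tilt, mean=m, cov=c)
              for m, c, w in zip(target_means, target_covs, priors)]
    return int(np.argmax(scores))
```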
... However, since head pose estimation is very sensitive to head localization [28], head pose results are highly dependent on the tracking accuracy. To address this issue, [25], [29], [18] perform the head tracking and the pose estimation jointly. ...
Article
We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian mixture model (GMM) or a hidden Markov model (HMM) whose hidden states correspond to the VFOA. The novelties of this paper are threefold. First, contrary to previous studies on the topic, in our setup, the potential VFOA of a person is not restricted to other participants only. It includes environmental targets as well (a table and a projection screen), which increases the complexity of the task, with more VFOA targets spread in the pan as well as tilt gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the prediction of the head pose given a gaze target. Third, an unsupervised parameter adaptation step not using any labeled data is proposed, which accounts for the specific gazing behavior of each participant. Using a publicly available corpus of eight meetings featuring four persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained either using a magnetic sensor device or a vision-based tracking system. The results clearly show that in such complex but realistic situations, the VFOA recognition performance is highly dependent on how well the visual targets are separated for a given meeting participant. In addition, the results show that the use of a geometric model with unsupervised adaptation achieves better results than the use of training data to set the HMM parameters.
... For a given subject, τ_s is constant. Estimating τ_s can be carried out using either feature-based (Lu et al., 2001) or featureless approaches (Ahlberg, 2002). In our work, we assume that the control vector τ_s is already known for every subject, and it is set manually using, for instance, the face in the first frame of the video sequence (the Candide model and target face shapes are aligned manually). ...
Chapter
This chapter provided a set of recent deterministic and stochastic (robust) techniques that perform efficient facial expression recognition from video sequences. More precisely, we described two texture- and view-independent frameworks for facial expression recognition given natural head motion. Both frameworks use temporal classification and do not require any learned facial image patch since the facial texture model is learned online. The latter property makes them more flexible than many existing recognition approaches. The proposed frameworks can easily include other facial gestures in addition to the universal expressions. The first framework (Tracking then Recognition) exploits the temporal representation of tracked facial actions in order to infer the current facial expression in a deterministic way. Within this framework, we proposed two different recognition methods: i) a method based on Dynamic Time Warping, and ii) a method based on Linear Discriminant Analysis. The second framework (Tracking and Recognition) proposes a novel paradigm in which facial action tracking and expression recognition are simultaneously performed. This framework consists of two stages. In the first stage, the 3D head pose is recovered using a deterministic registration technique based on Online Appearance Models. In the second stage, the facial actions as well as the facial expression are simultaneously estimated using a stochastic framework based on multi-class dynamics. We have shown that possible inaccuracies affecting the out-of-plane parameters associated with the 3D head pose have no impact on the stochastic tracking and recognition. The developed scheme lends itself nicely to real-time systems. We expect the approach to perform well in the presence of perturbing factors, such as video discontinuities and moderate illumination changes. The developed face tracker was successfully tested with moderate rapid head movements. Should ultra-rapid head movements break tracking, it is possible to use a re-initialization process or a stochastic tracker that propagates a probability distribution over time, such as the particle-filter-based tracking method presented in our previous work (Dornaika & Davoine, 2006). The out-of-plane face motion range is limited within the interval [-45 deg, 45 deg] for the pitch and the yaw angles. Within this range, the obtained distortions associated with the facial patch are still acceptable to estimate the correct pose of the head. Note that the proposed algorithm does not require that the first frame should be a neutral face since all universal expressions have the same probability. The current work uses an appearance model given by one single multivariate Gaussian whose parameters are slowly updated over time. The robustness of this model is improved through the use of robust statistics that prevent outliers from deteriorating the global appearance model. This relatively simple model was adopted to allow real-time performance. We found that the tracking based on this model was successful even in the presence of occlusions caused by a rotated face and occluding hands. The current appearance model can be made more sophisticated through the use of Gaussian mixtures (Zhou et al., 2004; Lee, 2005) and/or illumination templates to take into account sudden and significant local appearance changes due for instance to the presence of shadows.
... For a given subject, τ_s is constant. Estimating τ_s can be carried out using either feature-based (Lu et al. 2001) or featureless approaches (Ahlberg 2002). In our work, we assume that the control vector τ_s is already known for every subject, and it is set manually using, for instance, the face in the first frame of the video sequence (the Candide model and target face shapes are aligned manually). ...
Article
The recognition of facial gestures and expressions in image sequences is an important and challenging problem. Most existing methods adopt the following paradigm: first, facial actions/features are retrieved from the images; then, the facial expression is recognized based on the retrieved temporal parameters. In contrast to this mainstream approach, this paper introduces a new approach allowing the simultaneous retrieval of facial actions and expression using a particle filter adopting multi-class dynamics that are conditioned on the expression. For each frame in the video sequence, our approach is split into two consecutive stages. In the first stage, the 3D head pose is retrieved using a deterministic registration technique based on Online Appearance Models. In the second stage, the facial actions as well as the facial expression are simultaneously retrieved using a stochastic framework based on second-order Markov chains. The proposed fast scheme is as robust as, or more robust than, existing ones in a number of respects. We describe extensive experiments and provide performance evaluations to show the feasibility and robustness of the proposed approach.