Fig. 3 - uploaded by Christoph Strecha
Model initialization. Left column: the input stereo pair with feature points and their spatial uncertainty. Middle column: the fit of the model guided by the feature points. The fit is relatively accurate, but alignment errors are still visible at the contour of the face. Right column: renderings of the initialized model. The reconstruction is relatively poor, but the main facial features are already visible. 


Source publication
Conference Paper
Full-text available
This paper presents a new method for face modeling and face recognition from a pair of calibrated stereo cameras. In a first step, the algorithm builds a stereo reconstruction of the face by adjusting the global transformation parameters and the shape parameters of a 3D morphable face model. The adjustment of the parameters is such that stere...

Context in source publication

Context 1
... the 3 × 3 matrix J_T is the Jacobian of the rigid-body transformation evaluated at S(X_p), and ∂X_p/∂α_j is a 3-vector of derivatives, which contains the XYZ-values of the j-th eigen-shape at the position of X_p. The initialization procedure is graphically illustrated in Fig. 3. ...
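The two quantities named in this excerpt combine, via the chain rule, into the derivative of a transformed model point with respect to the j-th shape coefficient. The following is a reconstruction from the context (notation follows the excerpt; the formula is not quoted from the paper):

```latex
\frac{\partial\, T\!\left(S(\mathbf{X}_p)\right)}{\partial \alpha_j}
  \;=\; J_T \,\frac{\partial \mathbf{X}_p}{\partial \alpha_j},
```

where T is the rigid-body transformation, J_T its 3 × 3 Jacobian evaluated at S(X_p), and ∂X_p/∂α_j collects the XYZ-values of the j-th eigen-shape at the position of X_p.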

Similar publications

Article
Full-text available
Blind estimation of a two-slope feature-domain reverberation model is proposed. The reverberation model is suitable for robust distant-talking automatic speech recognition approaches which use a convolution in the feature domain to characterize the reverberant feature vector sequence, e.g. [1, 2, 3]. Since the model describes the reverberation by...

Citations

... For the real-world, multi-view experiments, we use multiple realistic datasets, including the multi-view car (MVC) dataset [Ozuysal et al., 2009], a multi-view stereo dataset of generic objects (Recon3D) [Kolev et al., 2010], a multi-view, multi-class dataset [Roig et al., 2011] (MVMC), the Amsterdam library of object images [Geusebroek et al., 2005] (ALOI), and a multi-view stereo face dataset [Fransens et al., 2005] (Stereo Face). The latter three of these datasets are from the EPFL CVLAB data repository 1 . ...
Preprint
Full-text available
It is known that representations from self-supervised pre-training can perform on par, and often better, on various downstream tasks than representations from fully-supervised pre-training. This has been shown in a host of settings such as generic object classification and detection, semantic segmentation, and image retrieval. However, some issues have recently come to the fore that demonstrate some of the failure modes of self-supervised representations, such as performance on non-ImageNet-like data, or complex scenes. In this paper, we show that self-supervised representations based on the instance discrimination objective lead to better representations of objects that are more robust to changes in the viewpoint and perspective of the object. We perform experiments of modern self-supervised methods against multiple supervised baselines to demonstrate this, including approximating object viewpoint variation through homographies, and real-world tests based on several multi-view datasets. We find that self-supervised representations are more robust to object viewpoint and appear to encode more pertinent information about objects that facilitate the recognition of objects from novel views.
... We conduct experiments on a diverse collection of datasets including both rigid and non-rigid objects. To show the generalizability of our method, we have conducted a series of experiments: (i) hand pose estimation using a synthetic training set and real NYU hand depth image data [41] for testing, (ii) synthesis of rotated views of rigid objects using the 3D object dataset [4], (iii) synthesis of rotated views using a real face dataset [9], and (iv) the modification of a diverse range of attributes on a synthetic face dataset [17]. For each experiment, we have trained the models using 80% of the datasets. ...
... The cumulative number of frames with maximum error below a threshold distance D has then been computed, as is commonly used in hand pose estimation tasks [6,29]. A comparison of the pose estimation results using synthetic views generated by the proposed model, the CVAE-GAN model, and the CAAE model is presented in Fig. 7, along with the results obtained by performing pose estimation using the single-view input frame alone (Fig. 10 gives a qualitative evaluation for view synthesis of real faces using the image dataset [9]). In particular, for a threshold distance D = 40 mm, the proposed model yields the highest accuracy, with 61.98% of the frames having all predicted joint locations within 40 mm of the ground truth values. ...
... Real face experiment We have also conducted an experiment using a real face dataset to show the applicability of LTNN for real images. The stereo face database [9], consisting of images of 100 individuals from 10 different viewpoints, was used for experiments with real faces. These faces were first segmented using the method of [28], and then we manually cleaned up the failure cases. ...
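The threshold metric used in the excerpts above (count the frames whose maximum joint error falls below a distance D) is simple to state in code. A minimal sketch, with illustrative names not taken from the cited papers:

```python
def fraction_below_threshold(per_joint_errors, threshold_mm):
    """Fraction of frames whose *maximum* joint error is below the threshold.

    per_joint_errors: one list of per-joint errors (in mm) per frame, so a
    frame counts as correct only if every predicted joint is within range.
    """
    ok = sum(1 for frame in per_joint_errors if max(frame) < threshold_mm)
    return ok / len(per_joint_errors)
```

Sweeping `threshold_mm` over a range of distances yields the cumulative curve the excerpt describes.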
Article
Full-text available
We propose a fully convolutional conditional generative neural network, the latent transformation neural network, capable of rigid and non-rigid object view synthesis using a lightweight architecture suited for real-time applications and embedded systems. In contrast to existing object view synthesis methods which incorporate conditioning information via concatenation, we introduce a dedicated network component, the conditional transformation unit. This unit is designed to learn the latent space transformations corresponding to specified target views. In addition, a consistency loss term is defined to guide the network toward learning the desired latent space mappings, a task-divided decoder is constructed to refine the quality of generated views of objects, and an adaptive discriminator is introduced to improve the adversarial training process. The generalizability of the proposed methodology is demonstrated on a collection of three diverse tasks: multi-view synthesis on real hand depth images, view synthesis of real and synthetic faces, and the rotation of rigid objects. The proposed model is shown to be comparable with the state-of-the-art methods in structural similarity index measure and \(L_{1}\) metrics while simultaneously achieving a 24% reduction in the compute time for inference of novel images.
... Model-based stereo reconstruction was explored in Wallraven et al. [1999]. The reconstruction quality was improved by eliminating the estimation of illumination and reflectance in Amberg et al. [2007]; Fransens et al. [2005]. 3DMMs also prove to be very valuable in low-resolution settings where high-quality image textures cannot be exploited, or under occlusions [Romeiro and Zickler 2007;Thies et al. 2018b]. ...
Preprint
Full-text available
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.
... The experiments were carried out in two settings: to evaluate the success of face swapping in images containing two faces, and to measure performance in a real-time system. Fig. 3 shows experiments on faces from images taken from the HP [12], FERET, and Stereo Face [11] datasets, demonstrating the application of the proposed method to a single input image. ...
... Reconstructing the shape of a 3D face seen under varying poses should always result in the same face mesh. To investigate the robustness to varying positions and occlusions, we used the above assumption to test our method on the Morphace dataset [45], which consists of eight expressions with varying pose of 100 subjects. This dataset has the extracted feature pixel information marked in two face images of each viewpoint, so we omitted the feature extraction in the proposed method and directly used the provided feature pixel information in this experiment. ...
Article
Full-text available
Face reconstruction is a popular topic in 3D vision systems. However, traditional methods often depend on monocular cues, which contain few feature pixels and use only their location information while ignoring a large amount of textural information. Furthermore, they are affected by the accuracy of the feature extraction method and by occlusion. Here, we propose a novel facial reconstruction framework that accurately extracts the 3D shapes and poses of faces from images captured at multiple views. It extends the traditional monocular bilinear model to a multi-view-based bilinear model by incorporating a feature prior constraint and a texture constraint, both learned from multi-view images. The feature prior constraint is used as a shape prior, allowing us to estimate accurate 3D facial contours. Furthermore, the texture constraint extracts a high-precision 3D facial shape where traditional methods fail because of their limited number of feature points or the mostly texture-less and texture-repetitive nature of the input images. Meanwhile, it fully exploits the implicit 3D information of the multi-view images, which also enhances the robustness of the results. Additionally, the proposed method uses only two or more uncalibrated images with an arbitrary baseline, estimating calibration and shape simultaneously. A comparison with the state-of-the-art monocular bilinear model-based method shows that the proposed method achieves significantly higher accuracy.
... They primarily involve stereo photogrammetric techniques and/or structured light. Such systems often assume the availability of multiple facial images as input and thus need to address a number of challenges, such as establishing dense correspondences across facial images with varying poses, expressions, and illumination [4], [5]. ...
... To evaluate the proposed method, we make a comparative analysis with some recent state-of-the-art methods such as LSE [15], LcR [27], LINE [21], and EZ-PLS [32]. The experiments are performed on the FEI database [38] and subsets of FERET [39], CAS-PEAL [40], and the Stereo-pair database [41]. Hallucination results are demonstrated using two objective metrics, i.e. ...
Article
Full-text available
In this paper, ambiguity in the low-resolution (LR) and high-resolution (HR) manifolds for nearest neighbor selection in the face hallucination (FH) problem is considered. To improve the performance of FH, we propose to resolve the ambiguity through two measures. Firstly, an improved search criterion, i.e., reference patch embedding (RPE), is designed for neighbor embedding (NE) to enhance the structural similarity in the LR manifold. Secondly, locality-constrained partial least squares (PLS) estimation is employed for NE in the HR manifold. PLS maximizes the degree of similarity between the two manifolds, which then share almost the same local structure. Therefore, locality-constrained refined neighbor selection in a unified feature space better optimizes the reconstruction weights, and the performance is improved. Extensive experiments illustrate that the proposed methods lead to better performance with respect to peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) compared to the results obtained by traditional position-patch-based methods for FH.
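PSNR, one of the two objective metrics named above, is defined directly from the mean squared error between a reference and a reconstructed image. A minimal sketch (assuming 8-bit intensities flattened to a sequence; names are illustrative, not from the cited paper):

```python
import math

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    n = len(reference)
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / n
    if mse == 0.0:
        return math.inf  # identical images: infinite PSNR
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference; the worst case for 8-bit images (every pixel off by the full range) gives 0 dB.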
... We test the algorithm with the stereo face database [10]. The database has in total 1600 images from 100 faces, where each face has 16 different view images captured by 2 cameras. ...
Article
Full-text available
In this paper, we propose a novel face feature extraction method based on deep learning. Using synthesized multi-view face images, we train a deep face feature (DFF) extractor based on the correlation between projections of a face point on images from different views. A feature vector can be extracted for each pixel of the face image based on the trained DFF model, and it is more effective than general purpose feature descriptors for face-related tasks such as alignment, matching, and reconstruction. Based on the DFF, we develop an effective face alignment method with single or multiple face images as input, which iteratively updates landmarks, pose and 3D shape. Experiments demonstrate that our method can achieve state-of-the-art results for face alignment with a single image, and the alignment can be further improved with multi-view face images.
... The Mahalanobis distance is used to find the distance between features. R. Fransens [22] presented an algorithm that builds a stereo reconstruction of the face by adjusting the global transformation parameters and shape parameters of a 3D morphable face model. The resulting shape and texture information of a face is used for face recognition. ...
... The estimated decision scores of 2D and 3D features are fused at the decision level using the OR rule. The proposed method is evaluated by carrying out experiments on the stereo face database [22]. The remaining sections of the paper are organized as follows. ...
... To evaluate the performance of the proposed multimodal face recognition method, we carried out various experiments using the stereo face database [22]. It consists of stereo pairs of 70 subjects (35 males, 35 females) recorded from eight different viewpoints. ...
Article
Full-text available
In this paper, we present a multimodal 2D+3D face recognition method using block-based curvelet features. The 3D surface of the face (depth map) is computed from the stereo face images using a stereo vision technique. The statistical measures mean, standard deviation, variance, and entropy are extracted from each block of a curvelet subband, for both the depth and intensity images independently. To compute the decision score, a KNN classifier is employed independently for the intensity image and the depth map. The computed decision scores of the intensity image and depth map are then combined at the decision level to improve the face recognition rate. The combination of intensity and depth map is verified experimentally using a benchmark face database. The experimental results show that the proposed multimodal method outperforms either individual modality.
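The four per-block statistics this abstract names are standard descriptors. A minimal sketch of computing them for one block of coefficients (the curvelet decomposition itself is not reproduced here; names and the 16-bin histogram used for entropy are illustrative assumptions):

```python
import math

def block_statistics(block, bins=16, value_range=(0, 256)):
    """Mean, standard deviation, variance, and histogram entropy of one block.

    `block` is a flat list of values; in the paper these statistics are taken
    per block of a curvelet subband, for depth and intensity independently.
    """
    n = len(block)
    mean = sum(block) / n
    var = sum((v - mean) ** 2 for v in block) / n
    std = math.sqrt(var)
    # Shannon entropy of the normalized value histogram.
    lo, hi = value_range
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in block:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in hist if c > 0)
    return mean, std, var, entropy
```

Concatenating these four values over all blocks yields one feature vector per image, which is what a KNN classifier would then score.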
... CAS-PEAL (Gao et al. 2008a), FERET, CMU (Rowley et al. 1998), Stereo-pair (Fransens et al. 2005) ...
Article
Full-text available
This paper comprehensively surveys the development of face hallucination (FH), including both face super-resolution and face sketch-photo synthesis techniques. Indeed, these two techniques share the same objective of inferring a target face image (e.g. high-resolution face image, face sketch and face photo) from a corresponding source input (e.g. low-resolution face image, face photo and face sketch). Considering the critical role of image interpretation in modern intelligent systems for authentication, surveillance, law enforcement, security control, and entertainment, FH has attracted growing attention in recent years. Existing FH methods can be grouped into four categories: Bayesian inference approaches, subspace learning approaches, a combination of Bayesian inference and subspace learning approaches, and sparse representation-based approaches. In spite of achieving a certain level of development, FH is limited in its success by complex application conditions such as variant illuminations, poses, or views. This paper provides a holistic understanding and deep insight into FH, and presents a comparative analysis of representative methods and promising future directions.