Identify computer generated characters by analysing facial expressions variation
Duc-Tien Dang-Nguyen, Giulia Boato, Francesco G.B. De Natale
Department of Information and Computer Science - University of Trento
via Sommarive 5, 38123 Trento - Italy
dangnguyen@disi.unitn.it, boato@disi.unitn.it, denatale@ing.unitn.it
Abstract—Significant improvements have recently been achieved in both the quality and realism of computer generated characters, which are nowadays often very difficult to distinguish from real ones. However, generating highly realistic facial expressions is still a challenging issue, since synthetic expressions usually follow a repetitive pattern, while in natural faces the same expression is usually produced in similar but not identical ways. In this paper, we propose a method to distinguish between computer generated and natural faces based on facial expression analysis. In particular, small variations of the facial shape models corresponding to the same expression are used as evidence of synthetic characters.
I. INTRODUCTION
Digital graphics tools are nowadays widespread and exploited to create realistic computer media data by both professional and non-professional users. In particular, computer generated (CG) characters are increasingly used in many applications such as talking faces, e-learning, virtual meetings and especially video games. Since the first virtual newsreader Ananova (http://news.bbc.co.uk/2/hi/entertainment/718327.stm), introduced in 2000, significant improvements have been achieved in both the quality and realism of CG characters, which are nowadays often very difficult to distinguish from real ones.
On the one hand, these results open a new area for advanced human-computer interaction. On the other hand, non-existing subjects or situations can be generated, leading to the need for techniques that assess data trustability and authenticity with sufficient confidence. Therefore, the research community has recently focused on the development of tools supporting the discrimination between natural and CG multimedia content in an accurate and reliable way.
In multimedia forensics, approaches distinguishing between CG and natural data have been developed since 2005. Most of them focus on still images: by estimating statistical differences in wavelet-based decompositions [1][2]; by modelling physical differences such as local patch statistics, fractal and quadratic geometry, and surface gradients [3]; by evaluating the
noise of the recording device [4]; or by combining different types of information, as in the hybrid approach of [5]. Recently, a geometric approach supporting the distinction between CG and real human faces has been presented in [6], which exploits face asymmetry as a discriminative feature. However, to the best of our knowledge, there is no multimedia forensics approach that aims at discriminating between CG and natural objects or subjects in video sequences. Such a goal requires different techniques with respect to the state of the art.
In this paper we propose a method to distinguish between CG and real characters by analysing facial expressions. Reproducing facial expressions is one of the most challenging issues in creating virtual characters [7], and studies dating back to 1971 have analysed this problem (see for instance [8]). Most algorithms generate synthetic facial expressions following the Facial Action Coding System (FACS) by Ekman [9][10] or the MPEG-4 standard [11]. In FACS, the facial muscles are coded as Action Units (AUs), and an expression is then represented as a combination of AUs. In MPEG-4, explicit movements for each point on the face are defined by Facial Animation Parameters (FAPs). Based on these parameters (FACS or FAPs), a physically-based model is applied to make the expression more realistic. However, when CG content becomes very realistic it often also becomes unfamiliar (the so-called uncanny valley [12]), and some recent approaches attempt to overcome this problem [7][13]. Here, we propose to exploit this gap to differentiate between computer generated and natural faces.
The underlying idea is that facial expressions in CG characters follow a repetitive pattern, while in natural faces the same expression is usually produced in similar but not identical ways (e.g., human beings do not always smile in the same way). Our forensic technique takes as input various instances of the same character expression (extracted from corresponding frames of the video sequences) and determines whether the character is CG or natural based on the analysis of the corresponding variations. We show that CG faces often replicate the same expression in exactly the same way, i.e., their variations are smaller than those of natural faces, and can therefore be automatically detected.
The rest of this paper is organized as follows: the proposed method is described in Section II, experimental results are reported in Section III, and Section IV draws some conclusions.
II. PROPOSED METHOD
Our method consists of five steps, as detailed in Figure 1. From a given video sequence, frames that contain human faces are extracted in the first step (A). Then, in step B, facial expression recognition is applied in order to recognize the expressions of the detected faces. Six types of facial expressions are used in this step, following the six universal expressions of Ekman (happiness, sadness, disgust, surprise, anger, and fear) [9], plus a 'neutral' one. Based on the recognition results, faces corresponding to a particular expression (e.g., happiness) are selected for the next steps. Note that the 'neutral' expressions are not considered, i.e., faces showing no expression are not taken into account for further processing. In the next step (C), the Active Shape Model (ASM), which represents the shape of a face, is extracted from each face. In order to measure their variations, all shapes have to be comparable; thus, in step D, each extracted ASM is normalized to a standard shape, after which all shapes are directly comparable. Finally, in step E, the differences between the normalized shapes are analysed and, based on the results of this variation analysis, the given sequence is classified as CG or natural.
The right part of Figure 1 illustrates the analysis procedure on the happiness expression. Seven frames that contain faces are extracted in step A. Then, facial expression recognition is applied in step B and three happy faces are kept. For each face, the corresponding ASM model, which is represented by a set of reference points, is extracted in step C. Each model is then normalized to a standard shape in step D. All normalized shapes are compared in step E and, based on the analysis results, the given character is classified as computer generated, since the differences between the normalized shapes are small (details about the variation analysis are given in the following Subsection II-E).
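To make the data flow of the five steps explicit, the following sketch chains them into a single classification routine. It is only an illustrative outline: the individual step functions are hypothetical callables passed in by the caller and are not part of the paper, while tau and subset stand for the threshold and the point subset used in the variation analysis.

```python
# Illustrative outline of the five-step pipeline (A-E). The step functions are
# hypothetical callables supplied by the caller, standing in for the components
# described in Subsections II-A to II-E.
def classify_character(frames, expression, tau, subset,
                       detect_face, recognize_expression,
                       extract_asm, normalize_asm, compute_evv):
    faces = [f for f in frames if detect_face(f)]                        # step A
    faces = [f for f in faces if recognize_expression(f) == expression]  # step B
    shapes = [extract_asm(f) for f in faces]                             # step C
    shapes = [normalize_asm(s) for s in shapes]                          # step D
    evv = compute_evv(shapes, subset)                                    # step E
    return "CG" if evv < tau else "natural"
```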
A. Human faces extraction
The face detection problem has been effectively addressed by the Viola-Jones method [14], which can be applied in real-time applications with high accuracy. In this step, we reuse this approach to detect faces in video frames, and the frames that contain faces are extracted. More details about this well-known method can be found in [15] and [14]. It is worth mentioning that in this first work we do not address the problem of face recognition, thus assuming a single person per video sequence (the analysed character).
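As an illustration of this step, the sketch below filters the frames of a video with OpenCV's Haar-cascade implementation of the Viola-Jones detector. The paper does not specify a particular implementation, so this is only one common way to realize it.

```python
# Keep only the frames of a video that contain at least one detected face,
# using OpenCV's Haar-cascade implementation of the Viola-Jones detector.
import cv2

def frames_with_faces(video_path):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    kept = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:          # keep frames containing at least one face
            kept.append(frame)
    cap.release()
    return kept
```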
B. Facial expression recognition
Facial expression recognition is a nontrivial problem in facial analysis. In this study, we applied an EigenFaces-based application [16], developed by Rosa, for facial expression recognition. The goal of this step is to filter out the outlier expressions and keep the recognized ones for the following steps. Note that this application associates an expression with a given face without requiring any detection of reference points. Figure 1 shows an example of the results of this application on 7 faces (3 happy, 2 disgust, 1 surprise and 1 neutral).
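Since the application of [16] is an external tool, the following sketch only illustrates the general EigenFaces idea behind this step: training faces are projected onto a PCA basis and a test face is assigned to the expression with the nearest class mean. It is a generic stand-in under these assumptions, not the classifier actually used in the paper.

```python
# Minimal EigenFaces-style expression classifier: PCA projection followed by a
# nearest-class-mean decision. A generic illustration of the idea behind [16],
# not the actual application used in the paper.
import numpy as np

def train_eigen_expressions(train_images, labels, n_components=30):
    X = np.asarray([img.ravel().astype(np.float64) for img in train_images])
    mean = X.mean(axis=0)
    # principal directions of the centred training faces
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]
    proj = (X - mean) @ basis.T
    centroids = {lab: proj[np.asarray(labels) == lab].mean(axis=0)
                 for lab in set(labels)}
    return mean, basis, centroids

def predict_expression(image, mean, basis, centroids):
    p = (image.ravel().astype(np.float64) - mean) @ basis.T
    return min(centroids, key=lambda lab: np.linalg.norm(p - centroids[lab]))
```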
Fig. 2. The 87 points of Active Shape Model (ASM). Source: Microsoft
Research Face SDK.
C. Active Shape Model Extraction
Thanks to the preprocessing performed in the first two steps, the input images for this step are confirmed to contain the same facial expression of the same person. In order to extract the face shapes used in our analysis, an alignment method is applied. In this step, we follow the Component-based Discriminative Search approach proposed by Liang et al. [17]. The general idea of this approach is to find the best match among mode candidates, where modes are important predefined points on face images (e.g., eyes, nose, mouth) and are detected from multiple component positions [17]. Given a face image, the result of this step is a set of reference points representing the detected face. Figure 3 (a) shows an example of this step, where the right image shows the reference points representing the face in the left image. In this method, the authors exploit the so-called ASM, which contains 87 reference points, as shown in Figure 2. Another example of this step on a CG face is reported in Figure 3 (c), where the left image shows the synthetic facial image and the right one shows the corresponding ASM.
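The Component-based Discriminative Search detector of [17] and the 87-point ASM of the Microsoft Face SDK are not freely scriptable, so the sketch below uses dlib's 68-point landmark detector as a stand-in to show the shape-extraction interface; the predictor model file is an external download and its path is an assumption.

```python
# Stand-in for the landmark-extraction step: dlib's 68-point facial landmark
# detector is used instead of the 87-point ASM of [17]/[22], which is not
# publicly scriptable. The predictor model file must be downloaded separately.
import dlib
import numpy as np

def extract_landmarks(gray_image,
                      predictor_path="shape_predictor_68_face_landmarks.dat"):
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(predictor_path)
    rects = detector(gray_image, 1)
    if len(rects) == 0:
        return None                      # no face found in this image
    shape = predictor(gray_image, rects[0])
    # landmarks as an (N, 2) array of (x, y) coordinates
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float64)
```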
D. Normalized Face Computation
ASM models precisely and suitably represent faces, but they are not directly comparable, since faces may differ in size or orientation; they need to be normalized in order to be compared. In this step, we apply the traditional approach from [18] to normalize the shape of a face into a common coordinate system. This normalization is an affine transformation used to map the reference points into fixed positions. Since the inner eye corners and the philtrum are stable under different expressions, these points have been chosen as reference points. As shown in Figure 2, reference points 0 and 8 are the two inner eye corners. The last reference point, the philtrum, can be computed from the top point of the outer lip and the two nostrils (points 51 and 41, 42 on the ASM model, respectively), as follows:
$$p_{\mathrm{philtrum}} = \frac{1}{2}\left(\frac{p_{41} + p_{42}}{2} + p_{51}\right) \qquad (1)$$

where $p_{41}$, $p_{42}$, and $p_{51}$ are the reference points on the extracted ASM.

Fig. 1. Schema of the proposed method: A. Human faces are extracted from the video sequence(s). B. Facial expressions are recognized (in the example 3 happy, 2 disgust, 1 surprise and 1 neutral). C. Faces with the same expression are selected (in this example only happy faces) and their active shape models are extracted. D. The extracted models are normalized. E. Differences on the normalized models are analysed to determine whether the character is CG.
After computing the three reference points, each ASM model is normalized by moving {p_41, p_42, p_philtrum} into their normalized positions, as follows: (i) rotate the segment [p_41, p_42] onto a horizontal line segment; (ii) shear the shape so that the philtrum lies on the perpendicular line through the midpoint of [p_41, p_42]; and finally (iii) scale the image so that the length of the segment [p_41, p_42] and the distance from p_philtrum to [p_41, p_42] take predefined fixed values (see [18] for more details).
Figure 3 (b) and (d) show examples of normalized faces after the face normalization step: the left images show the normalized faces and the right ones show the normalized reference points.
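A possible implementation of this normalization, following the reconstructed Eq. (1) for the philtrum, is sketched below: the affine transform mapping {p_41, p_42, p_philtrum} to fixed target positions is estimated and applied to all landmarks. The target coordinates are arbitrary illustrative values, not the fixed positions prescribed in [18].

```python
# Sketch of the shape normalization of Subsection II-D: estimate the affine
# transform that sends the two nostril points and the philtrum to fixed
# positions, then apply it to all landmarks. The target coordinates below are
# illustrative assumptions, not the values used in [18].
import numpy as np
import cv2

TARGETS = np.float32([[100, 120], [140, 120], [120, 150]])  # assumed positions

def philtrum(points):
    # Eq. (1): midpoint between the nostril midpoint and the top outer-lip point
    return ((points[41] + points[42]) / 2.0 + points[51]) / 2.0

def normalize_shape(points):
    src = np.float32([points[41], points[42], philtrum(points)])
    A = cv2.getAffineTransform(src, TARGETS)        # 2x3 affine matrix
    homog = np.hstack([points, np.ones((len(points), 1))])
    return homog @ A.T                              # normalized (N, 2) landmarks
```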
E. Variation Analysis
In this step, differences among normalized ASM models are
analysed in order to determine if a given character (and there-
fore the corresponding set of faces) is CG or real. We analyse
the differences as described in the following paragraphs.
First, the distance d_{i,p} of each reference point p on model i from the average position of point p over all models is calculated as:

$$d_{i,p} = \left\|(x,y)_{i,p} - \overline{(x,y)}_{p}\right\| \qquad (2)$$

where $(x,y)_{i,p}$ is the position of the reference point p on the model i; $\overline{(x,y)}_{p} = \frac{1}{N}\sum_{i=1}^{N}(x,y)_{i,p}$, where N is the number of normalized ASM models; and $\|\cdot\|$ is the Euclidean distance.

Fig. 3. ASM and normalized ASM: (a) and (c) show a photographic and a computer generated happy face, respectively, and their corresponding ASM points; (b) and (d) show the normalized images of (a) and (c), respectively, and their corresponding normalized points.
Depending on the facial expression ξ (among the six universal expressions), a subset S_ξ of reference points (not all 87 points) is selected for the analysis. For example, for the happiness expression (ξ = 1) only reference points from 0 to 15 and from 48 to 67, which represent the eyes and the mouth, are considered, i.e., S_1 = {0, 1, 2, ..., 15, 48, 49, ..., 67}. The subsets are selected based on our experiments and on suggestions from EMFACS [9], in which a facial expression is represented by a combination of AU codes. Table I lists the reference points selected in our method and the corresponding AU codes from EMFACS. Some of the AU codes are also explained in Table II. The full codes of EMFACS can be found in [9].
TABLE I
EXPRESSIONS WITH ACTION UNITS AND CORRESPONDING ASM POINTS

ξ  Expression   Action Units (AUs)   Reference Points (S_ξ)
1  Happiness    6+12                 S_1 = {0-15, 48-67}
2  Sadness      1+4+15               S_2 = {0-35, 48-57}
3  Surprise     1+2+5B+26            S_3 = {16-35, 48-67}
4  Fear         1+2+4+5+20+26        S_4 = {16-35, 48-57}
5  Anger        4+5+7+23             S_5 = {0-64}
6  Disgust      9+15+16              S_6 = {0-15, 48-67}
TABLE II
EXAMPLE OF SOME FACIAL ACTIONS [9]

AU Number   FACS Name
1           Inner Brow Raiser
4           Brow Lowerer
6           Cheek Raiser
12          Lip Corner Puller
15          Lip Corner Depressor
...         ...
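For reference, the point subsets S_ξ of Table I can be written as a small lookup table. The sketch below is a direct transcription, assuming the zero-based 87-point numbering of Figure 2.

```python
# Reference-point subsets S_xi of Table I as a lookup table
# (point indices follow the 87-point ASM numbering of Figure 2).
SUBSETS = {
    "happiness": list(range(0, 16)) + list(range(48, 68)),   # S_1 = {0-15, 48-67}
    "sadness":   list(range(0, 36)) + list(range(48, 58)),   # S_2 = {0-35, 48-57}
    "surprise":  list(range(16, 36)) + list(range(48, 68)),  # S_3 = {16-35, 48-67}
    "fear":      list(range(16, 36)) + list(range(48, 58)),  # S_4 = {16-35, 48-57}
    "anger":     list(range(0, 65)),                         # S_5 = {0-64}
    "disgust":   list(range(0, 16)) + list(range(48, 68)),   # S_6 = {0-15, 48-67}
}
```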
Two main properties are taken into account in this analysis, mean and variance, calculated following their traditional definitions:

$$\mu_{p} = \frac{1}{N}\sum_{i=1}^{N} d_{i,p}, \qquad \sigma_{p} = \frac{1}{N}\sum_{i=1}^{N}\left\|d_{i,p} - \mu_{p}\right\|^{2} \qquad (3)$$

where $\mu_{p}$ and $\sigma_{p}$ are the mean and variance of all distances $d_{i,p}$ at reference point p over all models.
The given set of models for expression ξ is classified as CG or natural by comparing the Expression Variation Value EVV_ξ with a threshold τ_ξ. The value of EVV_ξ is computed as follows:

$$EVV_{\xi} = \alpha_{\xi}\,\frac{\frac{1}{|S_{\xi}|}\sum_{p}\mu_{p}}{\lambda_{1}} + (1-\alpha_{\xi})\,\frac{\max_{p}\{\sigma_{p}\}}{\lambda_{2}} \qquad (4)$$

where $\alpha_{\xi}$ is a weighting constant, $\alpha_{\xi} \in [0, 1]$; $\lambda_{1}$ and $\lambda_{2}$ are normalization values used to map the numerators into [0, 1]. In our experiments $\alpha_{\xi}$ is set to 0.7 for ξ = 1, ..., 6. EVV_ξ is then compared with τ_ξ, recognizing the character corresponding to the set of faces as CG if EVV_ξ < τ_ξ, and as natural otherwise.
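The variation analysis of Eqs. (2)-(4) can be sketched as follows; λ_1, λ_2 and the threshold τ_ξ are treated as given inputs, since the paper fixes them experimentally.

```python
# Sketch of the variation analysis of Eqs. (2)-(4). The normalization values
# lambda1, lambda2 and the threshold tau are treated as given, since the paper
# sets them experimentally.
import numpy as np

def expression_variation_value(shapes, subset, alpha=0.7,
                               lambda1=1.0, lambda2=1.0):
    # shapes: list of normalized (N_points, 2) landmark arrays of one character
    S = np.stack(shapes)                       # (N_models, N_points, 2)
    mean_pos = S.mean(axis=0)                  # average position of each point
    d = np.linalg.norm(S - mean_pos, axis=2)   # Eq. (2): d_{i,p}
    d = d[:, subset]                           # restrict to S_xi (Table I)
    mu = d.mean(axis=0)                        # Eq. (3): mu_p
    sigma = ((d - mu) ** 2).mean(axis=0)       # Eq. (3): sigma_p
    # Eq. (4): weighted combination of average mean and maximum variance
    return alpha * mu.mean() / lambda1 + (1 - alpha) * sigma.max() / lambda2

def is_computer_generated(shapes, subset, tau):
    # decision rule: EVV < tau  ->  CG, otherwise natural
    return expression_variation_value(shapes, subset) < tau
```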
Figure 4 shows the mean values, corresponding to all 87 ASM points, for the sadness expression (ξ = 2), analysed on the two sets of images shown in Figure 5 (b). The horizontal axis represents p, from 1 to 87, while the vertical axis shows the value of μ_p. Since the facial expression is sadness (ξ = 2), only the values from μ_0 to μ_35 and from μ_48 to μ_57 are considered (see the selected reference points in Table I). In this example, the Expression Variation Value EVV_2 of the CG face is 0.35, compared to 0.74 for the natural one (τ_2 = 0.6).
Fig. 4. Example of differences in the mean of the ASM points between the CG and photographic sad faces of Figure 5 (b).
The values of the thresholds τ_ξ (ξ = 1, ..., 6) are manually set based on experiments, with the goal of keeping the misclassification rate as small as possible.
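As an aside, the sketch below shows one simple way such a threshold could be chosen automatically from labelled EVV scores, by sweeping candidate values and minimizing the misclassification count. This is only an assumption about a possible automation, not the authors' procedure, which sets the thresholds manually.

```python
# Illustrative threshold selection: sweep candidate values and keep the one
# that minimizes the number of misclassifications on labelled EVV scores.
import numpy as np

def pick_threshold(evv_cg, evv_natural):
    evv_cg = np.asarray(evv_cg)
    evv_natural = np.asarray(evv_natural)
    candidates = np.unique(np.concatenate([evv_cg, evv_natural]))
    best_tau, best_errors = None, float("inf")
    for tau in candidates:
        # decision rule: EVV < tau -> CG, otherwise natural
        errors = np.sum(evv_cg >= tau) + np.sum(evv_natural < tau)
        if errors < best_errors:
            best_tau, best_errors = tau, errors
    return best_tau
```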
III. EXPERIMENTAL RESULTS
In our experiments, we use two public datasets:
- The Boğaziçi University Head Motion Analysis Project Database (BUHMAP-DB) [19], which contains 440 videos of 11 people (6 female, 5 male) performing 5 repetitions of 8 different gestures. We selected the happiness and sadness gestures from this database, since the other six gestures are not related to our topic. This yields 110 videos, each lasting about 1-2 seconds.
- The Japanese Female Facial Expression (JAFFE) Database [20], which contains 213 images of 7 facial expressions posed by 10 Japanese female models.
The first experiment is performed on the happiness and sadness expressions from the BUHMAP-DB videos. Starting from the 11 people of BUHMAP-DB, we created 11 CG characters using FaceGen [21] and morphed all of them into both happy and sad faces. FaceGen is a powerful tool that can build complex face structures from one to three images. In our case, we pass a 'neutral' image to FaceGen to build the face structure, and then use the Morph options to generate the happiness and sadness expressions on the newly generated face. We thus obtained 110 sets of happy and sad faces, where each model has 5 sets corresponding to happiness and 5 sets corresponding to sadness. Figure 5 shows two examples of the CG versions and the original faces from BUHMAP-DB.
Fig. 5. Examples of (a) happy and (b) sad faces from BUHMAP-DB and the corresponding CG faces generated via FaceGen.

Fig. 6. Examples of (a) happy, (b) sad, and (c) surprised faces from JAFFE and the corresponding CG faces generated via FaceGen.
The goal of this experiment is to analyse the differences between the CG models and the natural faces, in order to confirm the idea underlying the proposed method. The analysis is performed as follows: for each video sequence 10 frames are uniformly extracted, and similarly for each CG model 10 images are selected. Then, the sets of images are analysed and the corresponding Expression Variation Values are computed as described in Section II-E. In this case, since the expressions are already known, we apply the method starting from step C. In this step, we use the Microsoft Face SDK [22] to extract the ASM models. Finally, we apply steps D and E to obtain the results.
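A minimal sketch of the uniform frame extraction used here, as one reasonable interpretation of "10 frames are uniformly extracted", using OpenCV:

```python
# Uniformly sample a fixed number of frames from a video with OpenCV.
import cv2
import numpy as np

def sample_frames(video_path, n_frames=10):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in np.linspace(0, max(total - 1, 0), n_frames).astype(int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))   # jump to the sampled frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```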
Figure 7 shows the EVV_1 values computed on the 55 sets of CG and the 55 sets of natural happy faces. These values are well separated between CG and natural: there is only one misclassification using the threshold τ_1 = 0.45, so the accuracy is 99% (109/110). For the sadness expression the result is even better, with 100% accuracy using the threshold τ_2 = 0.6; the EVV_2 values for CG and natural characters are perfectly separated, as shown in Figure 8.
Our second experiment is performed on the JAFFE database, which covers all six expressions. Also in this case we used FaceGen [21] to create CG models reproducing the JAFFE models (see Figure 6 for some examples). For each model in this database, we reproduced all 6 expressions; we therefore perform the second test on 120 sets of images, 60 sets of CG faces and 60 sets of JAFFE real faces. The complete approach described in Section II is applied as a classifier on these sets.
Fig. 7. Expression Variation Values computed for the happiness expression. The threshold value τ_1 is 0.45. The separation between CG and natural EVV_1 values is clearly visible, with only one misclassification.

Fig. 8. Expression Variation Values computed for the sadness expression. The threshold value τ_2 is 0.6. CG and natural EVV_2 values are clearly separated.

Figure 9 shows the average EVV_ξ for each expression (ξ = 1, ..., 6). The inner blue boundary represents the EVV_ξ computed from the CG sets of images, and the outer red boundary represents the natural EVV_ξ. The results show that CG and natural Expression Variation Values can be differentiated by comparing them with a set of thresholds τ_ξ, visualized by the green boundary. The average classification performance in this experiment is 96.67%. Details for each expression are reported in the confusion matrices of Table III.
TABLE III
CONFUSION MATRICES ON CG AND NATURAL FACES, COMPUTED ON THE JAFFE DATABASE
(rows: true class; columns: classified as CG / Natural)

ξ  Expression   True class   CG     Natural
1  Happiness    CG           100%   0%
                Natural      0%     100%
2  Sadness      CG           100%   0%
                Natural      0%     100%
3  Surprise     CG           100%   0%
                Natural      0%     100%
4  Fear         CG           90%    10%
                Natural      0%     100%
5  Anger        CG           100%   0%
                Natural      0%     100%
6  Disgust      CG           80%    20%
                Natural      10%    90%
Fig. 9. Average Expression Variation Values analysed for all expressions (axes: Happiness, Sadness, Surprise, Anger, Fear, Disgust; curves: Computer Generated, Natural, Threshold). CG and natural EVV_ξ are separated for all ξ = 1, ..., 6.
The last experiment compares Star Trek Aurora (http://auroratrek.com), a fully-animated production, with Star Trek Odyssey, a live-action movie from the Star Trek: Hidden Frontier series (http://www.hiddenfrontier.com). In Star Trek Aurora, two graphics applications, namely Poser and Cinema 4D, were used to create the entire 3D world and its characters. We extracted 4 female characters from each movie and selected frames containing happy expressions of those characters. The happy faces were then confirmed using the Rosa application [16]. An illustration of two characters with a happy expression is shown in Figure 10. Finally, the EVVs are computed and compared. Using the same threshold as in the first experiment (τ_1 = 0.45), all EVV_1 values calculated for the 4 characters of Star Trek Aurora are smaller than τ_1, while all EVV_1 values from Star Trek Odyssey are above τ_1, i.e., the CG characters can be recognized and separated from the natural ones.
Fig. 10. Examples of happy faces extracted from (a) Star Trek Aurora and (b) Star Trek Odyssey.
IV. CONCLUSIONS
In this study, we introduced the novel problem of differentiating between CG and natural human faces in video sequences, and we presented a method that distinguishes CG characters based on facial expression analysis. Indeed, the results show that CG characters usually present smaller differences in face shape across repetitions of the same expression, in
comparison with real persons. Although the experiments were performed only on small datasets, they show that the method can be effective. Further work will be devoted to the automatic selection of thresholds and to the exploitation of transitional parameters of faces.
REFERENCES
[1] S. Lyu and H. Farid, "How realistic is photorealistic?" IEEE Transactions on Signal Processing, vol. 53, no. 2, pp. 845–850, 2005.
[2] Y. Wang and P. Moulin, "On discrimination between photorealistic and photographic images," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2006, pp. II.161–II.164.
[3] T. T. Ng, S. F. Chang, J. Hsu, L. Xie, and M. P. Tsui, "Physics-motivated features for distinguishing photographic images and computer graphics," in ACM Multimedia, 2005, pp. 239–248.
[4] N. Khanna, G. T. C. Chiu, J. P. Allebach, and E. J. Delp, "Forensic techniques for classifying scanner, computer generated and digital camera images," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 1653–1656.
[5] V. Conotter and L. Cordin, "Detecting photographic and computer generated composites," in SPIE Symposium on Electronic Imaging, 2011.
[6] D.-T. Dang-Nguyen, G. Boato, and F. G. B. De Natale, "Discrimination between computer generated and natural human faces based on asymmetry information," in European Signal Processing Conference (EUSIPCO), Bucharest, 2012.
[7] A. Tinwell, M. Grimshaw, D. A. Nabi, and A. Williams, "Facial expression of emotion and perception of the Uncanny Valley in virtual characters," Computers in Human Behavior, vol. 27, no. 2, pp. 741–749, Mar. 2011.
[8] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124–129, 1971.
[9] P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Palo Alto: Consulting Psychologists Press, 1978.
[10] P. Ekman, W. Friesen, and J. Hager, Facial Action Coding System (FACS): Manual. Salt Lake City (USA): A Human Face, 2002.
[11] ISO, ISO/IEC 14496-2:1999: Information technology - Coding of audio-visual objects - Part 2: Visual, 1999.
[12] M. Mori, "Bukimi no tani (the uncanny valley)," Energy, vol. 7, no. 4, pp. 33–35, 1970.
[13] K. F. MacDorman, R. D. Green, C.-C. Ho, and C. T. Koch, "Too real for comfort? Uncanny responses to computer generated faces," Computers in Human Behavior, vol. 25, no. 3, pp. 695–710, May 2009.
[14] P. A. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[15] ——, "Rapid object detection using a boosted cascade of simple features," in CVPR, 2001, pp. 511–518.
[16] L. Rosa, "EigenExpressions for Facial Expression Recognition," http://www.advancedsourcecode.com/facialexpression.asp, 2007.
[17] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Proceedings of the 10th European Conference on Computer Vision: Part II, ser. ECCV '08. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 72–85.
[18] Y. Liu, K. L. Schmidt, J. F. Cohn, and S. Mitra, "Facial asymmetry quantification for expression invariant human identification," Computer Vision and Image Understanding, vol. 91, no. 1/2, pp. 138–159, 2003.
[19] O. Aran, I. Ari, M. A. Güvensan, H. Haberdar, Z. Kurt, H. Türkmen, A. Uyar, and L. Akarun, "A database of non-manual signs in Turkish sign language," in Signal Processing and Communications Applications (SIU 2007), Eskişehir, 2007.
[20] M. J. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 200–205.
[21] "FaceGen Modeller from Singular Inversions," http://www.facegen.com, 2004.
[22] "Microsoft Research Face SDK," http://research.microsoft.com/en-us/projects/facesdk/, May 2012.