Identify computer generated characters by analysing facial expressions variation
Duc-Tien Dang-Nguyen, Giulia Boato, Francesco G.B. De Natale
Department of Information and Computer Science - University of Trento
via Sommarive 5, 38123 Trento - Italy
dangnguyen@disi.unitn.it, boato@disi.unitn.it, denatale@ing.unitn.it
Abstract—Significant improvements have recently been achieved in both the quality and realism of computer generated characters, which are nowadays often very difficult to distinguish from real ones. However, generating highly realistic facial expressions is still a challenging issue, since synthetic expressions usually follow a repetitive pattern, while in natural faces the same expression is usually produced in similar but not identical ways. In this paper, we propose a method to distinguish between computer generated and natural faces based on facial expression analysis. In particular, small variations of the facial shape models corresponding to the same expression are used as evidence of synthetic characters.
I. INTRODUCTION
Digital graphics tools are nowadays widespread and exploited to create realistic computer media data by both professional and non-professional users. In particular, computer generated (CG) characters are increasingly used in many applications such as talking faces, e-learning, virtual meetings and especially video games. Since the first virtual newsreader Ananova (http://news.bbc.co.uk/2/hi/entertainment/718327.stm), introduced in 2000, significant improvements have been achieved in both the quality and realism of CG characters, which are nowadays often very difficult to distinguish from real ones.
On the one hand, these results open a new area for advanced human-computer interaction. On the other hand, non-existing subjects or situations can be generated, leading to the need for techniques that assess data trustability and authenticity with sufficient confidence. Therefore, the research community has recently focused on the development of tools supporting the discrimination between natural and CG multimedia content in an accurate and reliable way.
In multimedia forensics, approaches distinguishing between CG and natural data have been developed since 2005. Most of them focus on still images: by estimating statistical differences in wavelet-based decompositions [1][2]; by modelling physical differences such as local patch statistics, fractal and quadratic geometry, and surface gradients [3]; by evaluating the
noise of the recording device [4]; or by combining different types of information, as in the hybrid approach of [5]. Recently, a geometric approach supporting the distinction between CG and real human faces has been presented in [6], which exploits face asymmetry as a discriminative feature. However, to the best of our knowledge, there is no multimedia forensics approach that aims at discriminating between CG and natural objects or subjects in video sequences. Such a goal requires different techniques with respect to the state of the art.
In this paper we propose a method to distinguish between CG and real characters by analysing facial expressions. Reproducing facial expressions is one of the most challenging issues in creating virtual characters [7], and studies dating back to 1971 have analysed this problem (see for instance [8]). Most algorithms generate synthetic facial expressions following the Facial Action Coding System (FACS) by Ekman [9][10] or the MPEG-4 standard [11]. In FACS, the facial muscles are coded as Action Units (AUs), and an expression is then represented as a combination of AUs. In MPEG-4, explicit movements for each point on the face are defined by Facial Animation Parameters (FAPs). Based on these parameters (FACS or FAPs), a physically-based model is applied to make the expression more realistic. However, when CG content becomes very realistic it often also becomes unfamiliar (the so-called uncanny valley [12]), and some recent approaches attempt to overcome this problem [7][13]. Here, we propose to exploit this gap to differentiate between computer generated and natural faces.
The underlying idea is that facial expressions in CG characters follow a repetitive pattern, while in natural faces the same expression is usually produced in similar but not identical ways (e.g., human beings do not always smile in the same way). Our forensic technique takes as input various instances of the same character expression (extracted from corresponding frames of the video sequences) and determines whether the character is CG or natural based on the analysis of the corresponding variations. We show that CG faces often replicate the same expression in exactly the same way, i.e., their variations are smaller than those of natural faces, and can therefore be automatically detected.
The rest of this paper is organized as follows: the proposed method is described in Section II, experimental results are reported in Section III, and Section IV draws some conclusions.
II. PROPOSED METHOD
Our method consists of five steps, as detailed in Figure 1. From a given video sequence, frames that contain human faces are extracted in the first step (A). Then, in step B, facial expression recognition is applied in order to recognize the expressions of the detected faces. Six types of facial expressions are used in this step, following the six universal expressions of Ekman (happiness, sadness, disgust, surprise, anger, and fear) [9], plus a 'neutral' one. Based on the recognition results, faces corresponding to a particular expression (e.g., happiness) are selected for the next steps. Note that the 'neutral' expressions are not considered, i.e., faces showing no expression are not taken into account for further processing. In the next step (C), the Active Shape Model (ASM), which represents the shape of a face, is extracted from each face. In order to measure their variations, all shapes have to be comparable; thus, in step D, each extracted ASM is normalized to a standard shape, after which all shapes are directly comparable. Finally, in step E, the differences between the normalized shapes are analysed and, based on the results of this variation analysis, the given sequence is classified as CG or natural.
The right part of Figure 1 illustrates the analysis procedure on the happiness expression. Seven frames that contain faces are extracted in step A. Then, facial expression recognition is applied in step B and three happy faces are kept. For each face, the corresponding ASM model, which is represented by a set of reference points, is extracted in step C. Each model is then normalized to a standard shape in step D. All normalized shapes are compared in step E and, based on the analysis results, the given character is classified as computer generated, since the differences between the normalized shapes are small (details about the variation analysis are given in the following Subsection II-E).
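To make the data flow of the five steps explicit, the following sketch chains them into a single classification routine. It is only an illustrative outline: the individual step functions are hypothetical callables passed in by the caller and are not part of the paper, while tau and subset stand for the threshold and the point subset used in the variation analysis.

```python
# Illustrative outline of the five-step pipeline (A-E). The step functions are
# hypothetical callables supplied by the caller, standing in for the components
# described in Subsections II-A to II-E.
def classify_character(frames, expression, tau, subset,
                       detect_face, recognize_expression,
                       extract_asm, normalize_asm, compute_evv):
    faces = [f for f in frames if detect_face(f)]                        # step A
    faces = [f for f in faces if recognize_expression(f) == expression]  # step B
    shapes = [extract_asm(f) for f in faces]                             # step C
    shapes = [normalize_asm(s) for s in shapes]                          # step D
    evv = compute_evv(shapes, subset)                                    # step E
    return "CG" if evv < tau else "natural"
```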
A. Human faces extraction
The face detection problem has been effectively addressed by the Viola-Jones method [14], which can be applied in real-time applications with high accuracy. In this step, we reuse this approach to detect faces in video frames, and the frames that contain faces are extracted. More details about this well-known method can be found in [15] and [14]. It is worth mentioning that in this first work we do not address the problem of face recognition, thus assuming a single person per video sequence (the analysed character).
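As an illustration of this step, the sketch below filters the frames of a video with OpenCV's Haar-cascade implementation of the Viola-Jones detector. The paper does not specify a particular implementation, so this is only one common way to realize it.

```python
# Keep only the frames of a video that contain at least one detected face,
# using OpenCV's Haar-cascade implementation of the Viola-Jones detector.
import cv2

def frames_with_faces(video_path):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    kept = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:          # keep frames containing at least one face
            kept.append(frame)
    cap.release()
    return kept
```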
B. Facial expression recognition
Facial expression recognition is a nontrivial problem in facial analysis. In this study, we applied an EigenFaces-based application [16], developed by Rosa, for facial expression recognition. The goal of this step is to filter out the outlier expressions and keep the recognized ones for the following steps. Note that this application associates an expression with a given face without requiring any detection of reference points. Figure 1 shows an example of the results of this application on 7 faces (3 happy, 2 disgust, 1 surprise and 1 neutral).
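Since the application of [16] is an external tool, the following sketch only illustrates the general EigenFaces idea behind this step: training faces are projected onto a PCA basis and a test face is assigned to the expression with the nearest class mean. It is a generic stand-in under these assumptions, not the classifier actually used in the paper.

```python
# Minimal EigenFaces-style expression classifier: PCA projection followed by a
# nearest-class-mean decision. A generic illustration of the idea behind [16],
# not the actual application used in the paper.
import numpy as np

def train_eigen_expressions(train_images, labels, n_components=30):
    X = np.asarray([img.ravel().astype(np.float64) for img in train_images])
    mean = X.mean(axis=0)
    # principal directions of the centred training faces
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]
    proj = (X - mean) @ basis.T
    centroids = {lab: proj[np.asarray(labels) == lab].mean(axis=0)
                 for lab in set(labels)}
    return mean, basis, centroids

def predict_expression(image, mean, basis, centroids):
    p = (image.ravel().astype(np.float64) - mean) @ basis.T
    return min(centroids, key=lambda lab: np.linalg.norm(p - centroids[lab]))
```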
Fig. 2. The 87 points of Active Shape Model (ASM). Source: Microsoft
Research Face SDK.
C. Active Shape Model Extraction
Thanks to the preprocessing performed in the first two steps, the input images for this step are confirmed to contain the same facial expression of the same person. In order to extract the face shapes used in our analysis, an alignment method is applied. In this step, we follow the Component-based Discriminative Search approach proposed by Liang et al. [17]. The general idea of this approach is to find the best match among mode candidates, where modes are important predefined points on face images (e.g., eyes, nose, mouth) and are detected from multiple component positions [17]. Given a face image, the result of this step is a set of reference points representing the detected face. Figure 3 (a) shows an example of this step, where the right image shows the reference points representing the face in the left image. In this method, the authors exploit the so-called ASM, which contains 87 reference points, as shown in Figure 2. Another example of this step on a CG face is reported in Figure 3 (c), where the left image shows the synthetic facial image and the right one shows the corresponding ASM.
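The Component-based Discriminative Search detector of [17] and the 87-point ASM of the Microsoft Face SDK are not freely scriptable, so the sketch below uses dlib's 68-point landmark detector as a stand-in to show the shape-extraction interface; the predictor model file is an external download and its path is an assumption.

```python
# Stand-in for the landmark-extraction step: dlib's 68-point facial landmark
# detector is used instead of the 87-point ASM of [17]/[22], which is not
# publicly scriptable. The predictor model file must be downloaded separately.
import dlib
import numpy as np

def extract_landmarks(gray_image,
                      predictor_path="shape_predictor_68_face_landmarks.dat"):
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(predictor_path)
    rects = detector(gray_image, 1)
    if len(rects) == 0:
        return None                      # no face found in this image
    shape = predictor(gray_image, rects[0])
    # landmarks as an (N, 2) array of (x, y) coordinates
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float64)
```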
D. Normalized Face Computation
ASM models precisely and suitably represent faces, but they are not directly comparable, since faces may differ in size or orientation; they need to be normalized in order to be compared. In this step, we apply the traditional approach from [18] to normalize the shape of a face into a common coordinate system. This normalization is an affine transformation used to map the reference points into fixed positions. Since the inner eye corners and the philtrum are stable under different expressions, these points have been chosen as reference points. As shown in Figure 2, reference points 0 and 8 are the two inner eye corners. The last reference point, the philtrum, can be computed from the top point of the outer lip and the two nostrils (points 51 and 41, 42 on the ASM model, respectively), as follows:
$$p_{\mathrm{philtrum}} = \frac{1}{2}\left(\frac{p_{41} + p_{42}}{2} + p_{51}\right) \qquad (1)$$

where $p_{41}$, $p_{42}$, and $p_{51}$ are the reference points on the extracted ASM.

Fig. 1. Schema of the proposed method: A. Human faces are extracted from the video sequence(s). B. Facial expressions are recognized (in the example 3 happy, 2 disgust, 1 surprise and 1 neutral). C. Faces with the same expression are selected (in this example only happy faces) and their active shape models are extracted. D. The extracted models are normalized. E. Differences on the normalized models are analysed to determine whether the character is CG.
After computing the three reference points, each ASM model is normalized by moving {p_41, p_42, p_philtrum} into their normalized positions, as follows: (i) rotate the segment [p_41, p_42] onto a horizontal line segment; (ii) shear the shape so that the philtrum lies on the perpendicular line through the midpoint of [p_41, p_42]; and finally (iii) scale the image so that the length of the segment [p_41, p_42] and the distance from p_philtrum to [p_41, p_42] take predefined fixed values (see [18] for more details).
Figure 3 (b) and (d) show examples of normalized faces after the face normalization step: the left images show the normalized faces and the right ones show the normalized reference points.
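A possible implementation of this normalization, following the reconstructed Eq. (1) for the philtrum, is sketched below: the affine transform mapping {p_41, p_42, p_philtrum} to fixed target positions is estimated and applied to all landmarks. The target coordinates are arbitrary illustrative values, not the fixed positions prescribed in [18].

```python
# Sketch of the shape normalization of Subsection II-D: estimate the affine
# transform that sends the two nostril points and the philtrum to fixed
# positions, then apply it to all landmarks. The target coordinates below are
# illustrative assumptions, not the values used in [18].
import numpy as np
import cv2

TARGETS = np.float32([[100, 120], [140, 120], [120, 150]])  # assumed positions

def philtrum(points):
    # Eq. (1): midpoint between the nostril midpoint and the top outer-lip point
    return ((points[41] + points[42]) / 2.0 + points[51]) / 2.0

def normalize_shape(points):
    src = np.float32([points[41], points[42], philtrum(points)])
    A = cv2.getAffineTransform(src, TARGETS)        # 2x3 affine matrix
    homog = np.hstack([points, np.ones((len(points), 1))])
    return homog @ A.T                              # normalized (N, 2) landmarks
```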
E. Variation Analysis
In this step, differences among normalized ASM models are
analysed in order to determine if a given character (and there-
fore the corresponding set of faces) is CG or real. We analyse
the differences as described in the following paragraphs.
First, the distance d_{i,p} of each reference point p on model i from the average position of point p over all models is calculated as:

$$d_{i,p} = \left\|(x,y)_{i,p} - \overline{(x,y)}_{p}\right\| \qquad (2)$$

where $(x,y)_{i,p}$ is the position of the reference point p on the model i; $\overline{(x,y)}_{p} = \frac{1}{N}\sum_{i=1}^{N}(x,y)_{i,p}$, where N is the number of normalized ASM models; and $\|\cdot\|$ is the Euclidean distance.

Fig. 3. ASM and normalized ASM: (a) and (c) show a photographic and a computer generated happy face, respectively, and their corresponding ASM points; (b) and (d) show the normalized images of (a) and (c), respectively, and their corresponding normalized points.
Depending on the facial expression ξ (among the six universal expressions), a subset S_ξ of reference points (not all 87 points) is selected for the analysis. For example, for the happiness expression (ξ = 1) only reference points from 0 to 15 and from 48 to 67, which represent the eyes and the mouth, are considered, i.e., S_1 = {0, 1, 2, ..., 15, 48, 49, ..., 67}. The subsets are selected based on our experiments and on suggestions from EMFACS [9], in which a facial expression is represented by a combination of AU codes. Table I lists the reference points selected in our method and the corresponding AU codes from EMFACS. Some of the AU codes are also explained in Table II. The full codes of EMFACS can be found in [9].
TABLE I
EXPRESSIONS WITH ACTION UNITS AND CORRESPONDING ASM POINTS

ξ  Expression   Action Units (AUs)   Reference Points (S_ξ)
1  Happiness    6+12                 S_1 = {0-15, 48-67}
2  Sadness      1+4+15               S_2 = {0-35, 48-57}
3  Surprise     1+2+5B+26            S_3 = {16-35, 48-67}
4  Fear         1+2+4+5+20+26        S_4 = {16-35, 48-57}
5  Anger        4+5+7+23             S_5 = {0-64}
6  Disgust      9+15+16              S_6 = {0-15, 48-67}
TABLE II
EXAMPLE OF SOME FACIAL ACTIONS [9]

AU Number   FACS Name
1           Inner Brow Raiser
4           Brow Lowerer
6           Cheek Raiser
12          Lip Corner Puller
15          Lip Corner Depressor
...         ...
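For reference, the point subsets S_ξ of Table I can be written as a small lookup table. The sketch below is a direct transcription, assuming the zero-based 87-point numbering of Figure 2.

```python
# Reference-point subsets S_xi of Table I as a lookup table
# (point indices follow the 87-point ASM numbering of Figure 2).
SUBSETS = {
    "happiness": list(range(0, 16)) + list(range(48, 68)),   # S_1 = {0-15, 48-67}
    "sadness":   list(range(0, 36)) + list(range(48, 58)),   # S_2 = {0-35, 48-57}
    "surprise":  list(range(16, 36)) + list(range(48, 68)),  # S_3 = {16-35, 48-67}
    "fear":      list(range(16, 36)) + list(range(48, 58)),  # S_4 = {16-35, 48-57}
    "anger":     list(range(0, 65)),                         # S_5 = {0-64}
    "disgust":   list(range(0, 16)) + list(range(48, 68)),   # S_6 = {0-15, 48-67}
}
```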
Two main properties are taken into account in this analysis, mean and variance, calculated following their traditional definitions:

$$\mu_{p} = \frac{1}{N}\sum_{i=1}^{N} d_{i,p}, \qquad \sigma_{p} = \frac{1}{N}\sum_{i=1}^{N}\left\|d_{i,p} - \mu_{p}\right\|^{2} \qquad (3)$$

where $\mu_{p}$ and $\sigma_{p}$ are the mean and variance of all distances $d_{i,p}$ at reference point p over all models.
The given set of models for expression ξ is classified as CG or natural by comparing the Expression Variation Value EVV_ξ with a threshold τ_ξ. The value of EVV_ξ is computed as follows:

$$EVV_{\xi} = \alpha_{\xi}\,\frac{\frac{1}{|S_{\xi}|}\sum_{p}\mu_{p}}{\lambda_{1}} + (1-\alpha_{\xi})\,\frac{\max_{p}\{\sigma_{p}\}}{\lambda_{2}} \qquad (4)$$

where $\alpha_{\xi}$ is a weighting constant, $\alpha_{\xi} \in [0, 1]$; $\lambda_{1}$ and $\lambda_{2}$ are normalization values used to map the numerators into [0, 1]. In our experiments $\alpha_{\xi}$ is set to 0.7 for ξ = 1, ..., 6. EVV_ξ is then compared with τ_ξ, recognizing the character corresponding to the set of faces as CG if EVV_ξ < τ_ξ, and as natural otherwise.
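The variation analysis of Eqs. (2)-(4) can be sketched as follows; λ_1, λ_2 and the threshold τ_ξ are treated as given inputs, since the paper fixes them experimentally.

```python
# Sketch of the variation analysis of Eqs. (2)-(4). The normalization values
# lambda1, lambda2 and the threshold tau are treated as given, since the paper
# sets them experimentally.
import numpy as np

def expression_variation_value(shapes, subset, alpha=0.7,
                               lambda1=1.0, lambda2=1.0):
    # shapes: list of normalized (N_points, 2) landmark arrays of one character
    S = np.stack(shapes)                       # (N_models, N_points, 2)
    mean_pos = S.mean(axis=0)                  # average position of each point
    d = np.linalg.norm(S - mean_pos, axis=2)   # Eq. (2): d_{i,p}
    d = d[:, subset]                           # restrict to S_xi (Table I)
    mu = d.mean(axis=0)                        # Eq. (3): mu_p
    sigma = ((d - mu) ** 2).mean(axis=0)       # Eq. (3): sigma_p
    # Eq. (4): weighted combination of average mean and maximum variance
    return alpha * mu.mean() / lambda1 + (1 - alpha) * sigma.max() / lambda2

def is_computer_generated(shapes, subset, tau):
    # decision rule: EVV < tau  ->  CG, otherwise natural
    return expression_variation_value(shapes, subset) < tau
```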
Figure 4 shows the mean values, corresponding to all 87 ASM points, for the sadness expression (ξ = 2), analysed on the two sets of images shown in Figure 5 (b). The horizontal axis represents p, from 1 to 87, while the vertical axis shows the value of μ_p. Since the facial expression is sadness (ξ = 2), only the values from μ_0 to μ_35 and from μ_48 to μ_57 are considered (see the selected reference points in Table I). In this example, the Expression Variation Value EVV_2 of the CG face is 0.35, compared to 0.74 for the natural one (τ_2 = 0.6).
Fig. 4. Example of differences in the mean of the ASM points between the CG and photographic sad faces of Figure 5 (b).
The values of the thresholds τ_ξ (ξ = 1, ..., 6) are manually set based on experiments, with the goal of keeping the misclassification rate as small as possible.
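As an aside, the sketch below shows one simple way such a threshold could be chosen automatically from labelled EVV scores, by sweeping candidate values and minimizing the misclassification count. This is only an assumption about a possible automation, not the authors' procedure, which sets the thresholds manually.

```python
# Illustrative threshold selection: sweep candidate values and keep the one
# that minimizes the number of misclassifications on labelled EVV scores.
import numpy as np

def pick_threshold(evv_cg, evv_natural):
    evv_cg = np.asarray(evv_cg)
    evv_natural = np.asarray(evv_natural)
    candidates = np.unique(np.concatenate([evv_cg, evv_natural]))
    best_tau, best_errors = None, float("inf")
    for tau in candidates:
        # decision rule: EVV < tau -> CG, otherwise natural
        errors = np.sum(evv_cg >= tau) + np.sum(evv_natural < tau)
        if errors < best_errors:
            best_tau, best_errors = tau, errors
    return best_tau
```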
III. EXPERIMENTAL RESULTS
In our experiments, we use two public datasets:
- The Boğaziçi University Head Motion Analysis Project Database (BUHMAP-DB) [19], which contains 440 videos of 11 people (6 female, 5 male) performing 5 repetitions of 8 different gestures. We selected the happiness and sadness gestures from this database, since the other six gestures are not related to our topic. This yields 110 videos, each lasting about 1-2 seconds.
- The Japanese Female Facial Expression (JAFFE) Database [20], which contains 213 images of 7 facial expressions posed by 10 Japanese female models.
The first experiment is performed on the happiness and sadness expressions from the BUHMAP-DB videos. Starting from the 11 people of BUHMAP-DB, we created 11 CG characters using FaceGen [21] and morphed all of them into both happy and sad faces. FaceGen is a powerful tool that can build complex face structures from one to three images. In our case, we pass a 'neutral' image to FaceGen to build the face structure, and then use the Morph options to generate the happiness and sadness expressions on the newly generated face. We thus obtained 110 sets of happy and sad faces, where each model has 5 sets corresponding to happiness and 5 sets corresponding to sadness. Figure 5 shows two examples of the CG versions and the original faces from BUHMAP-DB.
Fig. 5. Examples of (a) happy and (b) sad faces from BUHMAP-DB and the corresponding CG faces generated via FaceGen.

Fig. 6. Examples of (a) happy, (b) sad, and (c) surprised faces from JAFFE and the corresponding CG faces generated via FaceGen.
The goal of this experiment is to analyse the differences between the CG models and the natural faces, in order to confirm the idea underlying the proposed method. The analysis is performed as follows: for each video sequence 10 frames are uniformly extracted, and similarly for each CG model 10 images are selected. Then, the sets of images are analysed and the corresponding Expression Variation Values are computed as described in Section II-E. In this case, since the expressions are already known, we apply the method starting from step C. In this step, we use the Microsoft Face SDK [22] to extract the ASM models. Finally, we apply steps D and E to obtain the results.
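A minimal sketch of the uniform frame extraction used here, as one reasonable interpretation of "10 frames are uniformly extracted", using OpenCV:

```python
# Uniformly sample a fixed number of frames from a video with OpenCV.
import cv2
import numpy as np

def sample_frames(video_path, n_frames=10):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in np.linspace(0, max(total - 1, 0), n_frames).astype(int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))   # jump to the sampled frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```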
Figure 7 shows the EVV_1 values computed on the 55 sets of CG and the 55 sets of natural happy faces. These values are well separated between CG and natural: there is only one misclassification using the threshold τ_1 = 0.45, so the accuracy is 99% (109/110). For the sadness expression the result is even better, with 100% accuracy using the threshold τ_2 = 0.6; the EVV_2 values for CG and natural characters are perfectly separated, as shown in Figure 8.
Our second experiment is performed on the JAFFE database, which covers all six expressions. Also in this case we used FaceGen [21] to create CG models reproducing the JAFFE models (see Figure 6 for some examples). For each model in this database, we reproduced all 6 expressions; we therefore perform the second test on 120 sets of images, 60 sets of CG faces and 60 sets of JAFFE real faces. The complete approach described in Section II is applied as a classifier on these sets.
Fig. 7. Expression Variation Values computed for the happiness expression. The threshold value τ_1 is 0.45. The separation between CG and natural EVV_1 values is clearly visible, with only one misclassification.

Fig. 8. Expression Variation Values computed for the sadness expression. The threshold value τ_2 is 0.6. CG and natural EVV_2 values are clearly separated.

Figure 9 shows the average EVV_ξ for each expression (ξ = 1, ..., 6). The inner blue boundary represents the EVV_ξ computed from the CG sets of images, and the outer red boundary represents the natural EVV_ξ. The results show that CG and natural Expression Variation Values can be differentiated by comparing them with a set of thresholds τ_ξ, visualized by the green boundary. The average classification performance in this experiment is 96.67%. Details for each expression are reported in the confusion matrices of Table III.
TABLE III
CONFUSION MATRICES ON CG AND NATURAL FACES, COMPUTED ON THE JAFFE DATABASE
(rows: true class; columns: classified as CG / Natural)

ξ  Expression   True class   CG     Natural
1  Happiness    CG           100%   0%
                Natural      0%     100%
2  Sadness      CG           100%   0%
                Natural      0%     100%
3  Surprise     CG           100%   0%
                Natural      0%     100%
4  Fear         CG           90%    10%
                Natural      0%     100%
5  Anger        CG           100%   0%
                Natural      0%     100%
6  Disgust      CG           80%    20%
                Natural      10%    90%
Fig. 9. Average Expression Variation Values analysed for all expressions (axes: Happiness, Sadness, Surprise, Anger, Fear, Disgust; curves: Computer Generated, Natural, Threshold). CG and natural EVV_ξ are separated for all ξ = 1, ..., 6.
The last experiment compares Star Trek Aurora (http://auroratrek.com), a fully-animated production, with Star Trek Odyssey, a live-action movie from the Star Trek: Hidden Frontier series (http://www.hiddenfrontier.com). In Star Trek Aurora, two graphics applications, namely Poser and Cinema 4D, were used to create the entire 3D world and its characters. We extracted 4 female characters from each movie and selected frames containing happy expressions of those characters. The happy faces were then confirmed using the Rosa application [16]. An illustration of two characters with a happy expression is shown in Figure 10. Finally, the EVVs are computed and compared. Using the same threshold as in the first experiment (τ_1 = 0.45), all EVV_1 values calculated for the 4 characters of Star Trek Aurora are smaller than τ_1, while all EVV_1 values from Star Trek Odyssey are above τ_1, i.e., the CG characters can be recognized and separated from the natural ones.
Fig. 10. Examples of happy faces extracted from (a) Star Trek Aurora and (b) Star Trek Odyssey.
IV. CONCLUSIONS
In this study, we introduced the novel problem of differentiating between CG and natural human faces in video sequences, and we presented a method that distinguishes CG characters based on facial expression analysis. Indeed, the results show that CG characters usually present smaller differences in face shape across repetitions of the same expression, in
comparison with real persons. Although the experiments were performed only on small datasets, they show that the method can be effective. Further work will be devoted to the automatic selection of thresholds and to the exploitation of transitional parameters of faces.
REFERENCES
[1] S. Lyu and H. Farid, "How realistic is photorealistic?" IEEE Transactions on Signal Processing, vol. 53, no. 2, pp. 845–850, 2005.
[2] Y. Wang and P. Moulin, "On discrimination between photorealistic and photographic images," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2006, pp. II.161–II.164.
[3] T. T. Ng, S. F. Chang, J. Hsu, L. Xie, and M. P. Tsui, "Physics-motivated features for distinguishing photographic images and computer graphics," in ACM Multimedia, 2005, pp. 239–248.
[4] N. Khanna, G. T. C. Chiu, J. P. Allebach, and E. J. Delp, "Forensic techniques for classifying scanner, computer generated and digital camera images," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 1653–1656.
[5] V. Conotter and L. Cordin, "Detecting photographic and computer generated composites," in SPIE Symposium on Electronic Imaging, 2011.
[6] D.-T. Dang-Nguyen, G. Boato, and F. G. B. De Natale, "Discrimination between computer generated and natural human faces based on asymmetry information," in European Signal Processing Conference (EUSIPCO), Bucharest, 2012.
[7] A. Tinwell, M. Grimshaw, D. A. Nabi, and A. Williams, "Facial expression of emotion and perception of the Uncanny Valley in virtual characters," Computers in Human Behavior, vol. 27, no. 2, pp. 741–749, Mar. 2011.
[8] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124–129, 1971.
[9] P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Palo Alto: Consulting Psychologists Press, 1978.
[10] P. Ekman, W. Friesen, and J. Hager, Facial Action Coding System (FACS): Manual. Salt Lake City (USA): A Human Face, 2002.
[11] ISO, ISO/IEC 14496-2:1999: Information technology - Coding of audio-visual objects - Part 2: Visual, 1999.
[12] M. Mori, "Bukimi no tani (the uncanny valley)," Energy, vol. 7, no. 4, pp. 33–35, 1970.
[13] K. F. MacDorman, R. D. Green, C.-C. Ho, and C. T. Koch, "Too real for comfort? Uncanny responses to computer generated faces," Computers in Human Behavior, vol. 25, no. 3, pp. 695–710, May 2009.
[14] P. A. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[15] ——, "Rapid object detection using a boosted cascade of simple features," in CVPR, 2001, pp. 511–518.
[16] L. Rosa, "EigenExpressions for Facial Expression Recognition," http://www.advancedsourcecode.com/facialexpression.asp, 2007.
[17] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Proceedings of the 10th European Conference on Computer Vision: Part II, ser. ECCV '08. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 72–85.
[18] Y. Liu, K. L. Schmidt, J. F. Cohn, and S. Mitra, "Facial asymmetry quantification for expression invariant human identification," Computer Vision and Image Understanding, vol. 91, no. 1/2, pp. 138–159, 2003.
[19] O. Aran, I. Ari, M. A. Güvensan, H. Haberdar, Z. Kurt, H. Türkmen, A. Uyar, and L. Akarun, "A database of non-manual signs in Turkish sign language," in Signal Processing and Communications Applications (SIU 2007), Eskişehir, 2007.
[20] M. J. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 200–205.
[21] "FaceGen Modeller from Singular Inversions," http://www.facegen.com, 2004.
[22] "Microsoft Research Face SDK," http://research.microsoft.com/en-us/projects/facesdk/, May 2012.