Equipping social robots with culturally-sensitive facial expressions of
emotion using data-driven methods
Chaona Chen1,2, Laura B. Hensel2, Yaocong Duan2, Robin A. A. Ince1,
Oliver G. B. Garrod1, Jonas Beskow3, Rachael E. Jack1,2, and Philippe G. Schyns1,2
1Institute of Neuroscience and Psychology, University of Glasgow, G12 8QB, Scotland, UK
2School of Psychology, University of Glasgow, G12 8QB, Scotland, UK
3Furhat Robotics, 11428 Stockholm, Sweden
Email: rachael.jack@glasgow.ac.uk
Abstract— Social robots must be able to generate realistic
and recognizable facial expressions to engage their human
users. Many social robots are equipped with standardized
facial expressions of emotion that are widely considered to be
universally recognized across all cultures. However, mounting
evidence shows that these facial expressions are not univer-
sally recognized – for example, they elicit significantly lower
recognition accuracy in East Asian cultures than they do
in Western cultures. Therefore, without culturally sensitive
facial expressions, state-of-the-art social robots are restricted
in their ability to engage a culturally diverse range of human
users, which in turn limits their global marketability. To
develop culturally sensitive facial expressions, novel data-driven
methods are used to model the dynamic face movement patterns
that convey basic emotions (e.g., happy, sad, anger) in a given
culture using cultural perception. Here, we tested whether such
dynamic facial expression models, derived in an East Asian
culture and transferred to a popular social robot, improved
the social signalling generation capabilities of the social robot
with East Asian participants. Results showed that, compared to
the social robot's existing set of ‘universal’ facial expressions,
the culturally-sensitive facial expression models are recognized
with generally higher accuracy and judged as more human-
like by East Asian participants. We also detail the specific
dynamic face movements (Action Units) that are associated with
high recognition accuracy and judgments of human-likeness,
including those that further boost performance. Our results
therefore demonstrate the utility of using data-driven methods
that employ human cultural perception to derive culturally-
sensitive facial expressions that improve the social face signal
generation capabilities of social robots. We anticipate that these
methods will continue to inform the design of social robots and
broaden their usability and global marketability.
I. INTRODUCTION
Facial expressions are widely considered to be the univer-
sal language of emotion. Based on Darwin's ground-breaking
theory on the biological origins of facial expressions of
emotion [1] and Ekman's seminal cross-cultural recognition
studies (e.g., [2]), several dominant theories in the field of
psychology have argued that six basic emotions – happy,
surprise, fear, disgust, anger and sad – are expressed and
This work was supported by The Economic and Social Research Council and Medical Research Council (United Kingdom; ESRC/MRC-060-25-0010), the British Academy (BA SG171783), the Wellcome Trust (107802/Z/15/Z) and the Multidisciplinary University Research Initiative (MURI)/Engineering and Physical Sciences Research Council (EP/N019261/1).
Fig. 1. A. Examples of the six standardized universal facial expressions of
basic emotions and individual face movements, called Action Units (AUs).
B. Color-coded points show the average recognition accuracy of these facial
expressions in 40 locations across the world as reported in 15 previous
studies [2, 5, 6, 21-32]. Figure adapted from [33] with permission.
recognized in the same way across all cultures (e.g., [2-7]). To represent these universal facial expressions, the field established a set of six standardized facial expressions, each comprising a specific pattern of face movements called Action Units (AUs), such as Lid Tightener (AU7) and Lip Stretcher (AU20) (see Fig. 1A for examples) [8]. These standardized facial expression
images quickly became the gold standard in research and
consequently influenced a broad range of fields including
affective computing [see 9 for a review] and social robotics
[10-12]. For example, state-of-the-art social robots such as
Felix [13], SAYA [14] and Furhat [15, see also 16 for a re-
view] generate facial expressions based on these standardized
Action Unit patterns.
However, mounting evidence shows that these facial ex-
pressions are not recognized well across different cultures.
Whereas all six facial expressions are recognized with com-
parably high accuracy in Western cultures, facial expressions
such as fear, disgust, and anger elicit significantly lower
accuracy in a number of other cultures [17, 18, see also
reviews of 19, 20]. To illustrate, Fig. 1B shows the recog-
nition accuracy of these standardized facial expressions in
40 locations across the world reported in 15 well-known
Fig. 2. Data-driven, perception-based method to model culturally-sensitive dynamic facial expressions of emotion and their transference to social robotics.
A. Stimulus generation and task procedure. B. Facial expression modelling procedure. C. Transference of facial expression models to social robotics and
cultural validation.
studies [2, 5, 6, 21-32]. Red represents high recognition
accuracy (i.e., >75%) and blue represents lower accuracy
(i.e., <75%) [33]. As shown by the distribution of red and
blue points, these standardized facial expressions tend to
be recognized primarily in Western cultures but less so in
Eastern cultures. These consistent cultural differences there-
fore suggest that the facial expressions widely considered
to be universal are instead more representative of Western
culture [see also reviews of 18, 34]. Indeed, findings of
substantial cultural differences in a variety of psychological
phenomena once thought to be universal are now increasing
[17, 35, 36] because most current psychological knowledge is
derived from Western (more specifically, Western, educated,
industrialized, rich and democratic – WEIRD) [37] popula-
tions and using Western-centric theories and confirmatory
methods [38]. A further limitation of these standardized
facial expressions of emotion is that they are static and
therefore do not represent the naturalistic dynamics of human
facial expressions [39]. Traditional theory-driven approaches
in psychology have therefore restricted understanding of the
specific dynamic face movement patterns that convey basic
emotions in different cultures. In turn, this has impacted
related fields such as social robotics where expressive ca-
pacity remains limited (e.g., primarily to Western cultures,
without naturalistic dynamics). For example, social robots
using these standardized universal facial expressions tend
to elicit low recognition accuracy (<50%) amongst non-
Western participants [40].
II. RELATED WORK
To better understand facial expression communication
across cultures, new data-driven methods have been used
to model the specific dynamic face movement patterns that
convey the six basic emotions in different cultures [e.g., 17,
41]. Fig. 2A-B illustrates this approach. On each experimental
trial, cultural participants view a random facial animation
generated by a facial animation platform [42] that randomly
samples a subset of Action Units (AUs) from a core set of 42
AUs. For example, in Fig. 2A, three AUs are selected – Outer
Brow Raiser (AU2) color-coded in green, Lip Corner Puller
(AU12) in blue, and Dimpler (AU14) in red. Each AU is
then independently activated with a random movement (in
Fig. 2A; see color-coded temporal activation curves for each
AU; temporal parameters are labelled in the green curve). For
each Action Unit, we generated a time course by randomly sampling six values from a uniform distribution on the interval [0,1] and fitting a cubic Hermite spline through three 2-dimensional (time, amplitude) control points. These six values are transformed within the unit interval into temporal parameters that represent onset latency, acceleration, peak amplitude, peak latency, deceleration, and offset latency, according to the rules for each parameter (see [42] for full details). Participants view the
resulting randomly generated facial animation and classify it
according to one of six emotions (‘happy,’ ‘surprise,’ ‘fear,’
‘disgust,’ ‘anger’ or ‘sad’) and rate its intensity on a 5-point
scale (‘very weak’ to ‘very strong’). If the facial animation
does not correspond to any of these emotions, participants se-
lect ‘other.’ Therefore, each facial animation that is classified
by the participant as a particular emotion at a given intensity
contains a dynamic face movement pattern that elicits the
perception of that particular emotion in the participant. After
many such trials, a statistical relationship is built between the
stimulus information (here, dynamic Action Units) presented
on each trial and the participant's corresponding responses
(e.g., ‘happy,’ ‘very strong’) as depicted in Fig. 2B. This
procedure therefore produces a statistically robust model of
a dynamic facial expression pattern that elicits the perception
of a given emotion in a participant from the culture of interest
as demonstrated by a perceptual validation task (see [42]
for full details of the modelling procedure1). Importantly,
because these models are quantifiable representations of
facial expressions, they can be directly transferred to social
robotics to generate culturally-sensitive facial expressions, as
illustrated in Fig. 2C. Therefore, this data-driven approach of
agnostically sampling face movements and using subjective
human cultural perception to model the dynamic Action
Unit patterns that represent different emotions (or any social
category such as different smiles [43], personality traits [44],
pain and pleasure [35] and mental states [45]) in a bottom-
up manner is particularly suitable for objectively exploring
facial expressions in diverse cultures [38].
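The AU sampling and temporal-curve generation described in this section can be sketched in code. The sketch below is illustrative only: the exact mapping from the six uniform values to the temporal parameters follows rules in [42] that are not reproduced here, so treating the acceleration/deceleration values as spline tangents and sampling one to six AUs per animation are simplifying assumptions.

```python
import numpy as np

def hermite(t, p0, p1, m0, m1):
    # Cubic Hermite basis functions on a unit-parameter segment.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

def random_au_curve(rng, n_frames=30):
    """Build one AU's activation time course from six uniform values
    (onset latency, acceleration, peak amplitude, peak latency,
    deceleration, offset latency -- simplified mapping)."""
    onset, accel, peak_amp, peak_lat, decel, offset = rng.uniform(0, 1, 6)
    t_on, t_peak, t_off = np.sort([onset, peak_lat, offset])  # order times
    ts = np.linspace(0.0, 1.0, n_frames)
    curve = np.zeros(n_frames)
    rise = (ts >= t_on) & (ts <= t_peak)
    fall = (ts > t_peak) & (ts <= t_off)
    if t_peak > t_on:  # smooth rise from rest to peak amplitude
        u = (ts[rise] - t_on) / (t_peak - t_on)
        curve[rise] = hermite(u, 0.0, peak_amp, accel, 0.0)
    if t_off > t_peak:  # smooth fall from peak amplitude back to rest
        u = (ts[fall] - t_peak) / (t_off - t_peak)
        curve[fall] = hermite(u, peak_amp, 0.0, 0.0, -decel)
    return np.clip(curve, 0.0, 1.0)

# One random animation: a random subset of the 42 core AUs,
# each with its own independently sampled time course.
rng = np.random.default_rng(0)
chosen_aus = rng.choice(42, size=rng.integers(1, 7), replace=False)
animation = {int(au): random_au_curve(rng) for au in chosen_aus}
```

Because every animation is a fully parameterized AU pattern, each one can be replayed, analyzed, or transferred to any face model that implements the same AU vocabulary.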
Using this approach, Jack, et al. [17] modelled a set of 30
dynamic facial expressions of each of the six basic emotions
using the cultural perception of 30 East Asian participants
with each model derived from an individual participant.
Comparison of these 30 individual models in each emotion
category showed high consistency across participants as
measured by average Hamming distance: Happy: Median =
0.07 (SE = 0.002); Surprise: Median = 0.12 (SE = 0.003);
Fear: Median = 0.19 (SE = 0.003); Disgust: Median = 0.19
(SE = 0.004); Anger: Median = 0.21 (SE = 0.004) and Sad:
Median = 0.14 (SE = 0.003; see also similarity matrix in
Fig. 2 in [17]). Here, we aim to transfer these 30 culturally-
derived dynamic facial expression models to a popular social
robot head, Furhat (https://www.furhatrobotics.com/), and
examine whether they improve recognition accuracy amongst
East Asian participants compared to the social robot's exist-
ing ‘universal’ facial expressions.
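The consistency measure reported above can be illustrated with a short sketch. It assumes each model is reduced to a binary vector marking which of the 42 AUs it contains, which is one plausible reading of how a normalized Hamming distance applies here.

```python
import numpy as np

def pairwise_hamming_median(models):
    """Median normalized Hamming distance across all pairs of models.
    `models`: (n_models, n_aus) binary array (1 = AU present)."""
    models = np.asarray(models)
    n = models.shape[0]
    dists = [np.mean(models[i] != models[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.median(dists))

# Three toy 4-AU models: pairwise distances 0.25, 0.25, 0.5 -> median 0.25.
toy = [[1, 1, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 1, 0]]
```

A low median (near 0) indicates that participants' individual models share most AUs, as reported for happy; larger values indicate more idiosyncratic patterns.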
III. METHOD
A. Transference of culturally-derived dynamic facial expres-
sion models to a social robot
To display the culturally-derived dynamic facial expression
models on the social robot head, we first supplemented
the social robot's existing facial movement vocabulary of
7 pre-set universal facial expressions of emotion (2 happy,
1 surprise, 1 fear, 1 disgust, 1 anger and 1 sad) with a
set of 42 individual dynamic Action Units including all
combinations (see [46] for full details of transforming the AU
shape deviation data to the social robot's mesh topologies).
With this development, we displayed each of the culturally-
derived dynamic facial expression models of the six basic
emotions (n = 30 models per emotion) on the social robot
head along with the existing set of standardized universal
facial expressions of emotion (n = 7).
In a first experiment, we asked a group of East Asian
participants to classify all of these facial expressions by
emotion (section B). In a second experiment, we asked the
same group of East Asian participants to judge their human-
likeness (section C). We blocked and counterbalanced these
1The values of temporal parameters reported in this study are normalized
within their unit interval (i.e., [0,1]). Formulas (11) to (16) in [42] should be used to transform these temporal parameter values into their real
values (e.g., time in seconds for peak latency). The Action Unit patterns and
temporal parameters of each facial expression model have been deposited
on Open Science Framework (available at https://osf.io/nxe9q/).
two tasks across participants and describe the design and re-
sults below. For both experiments, we recruited 10 East Asian
participants (10 Chinese, 5 females, mean age 23.6 years, SD
= 2.12 years) with minimal exposure to and engagement with
other cultures as assessed by questionnaire (see [17] for an
example). All participants had normal or corrected-to-normal
vision, were free from any emotion related atypicalities (e.g.
Autism Spectrum Disorder, depression), learning difficulties
(e.g. dyslexia), synaesthesia, and disorders of face perception
(e.g. prosopagnosia) as per self-report. All participants had
a minimum International English Language Testing System
(IELTS) score of 6.0 (competent user). Each participant gave
written informed consent and received a standard rate of £6
per hour for their participation. The Ethics Committee of the
College of Science and Engineering, University of Glasgow
provided ethical approval (Ref No: 300160186).
Fig. 3. A. Recognition accuracy of culturally-derived facial expressions
and the social robot's existing facial expressions. B. Judgments of human-
likeness. In both panels, red circles represent the culturally-derived facial
expression models; blue represents the social robot's existing standardized
facial expressions. Circle size represents the number of facial expression
models (e.g., in happy, 6 models are recognized at 95% accuracy; in disgust, 1 model is recognized at 25% accuracy).
B. Recognition of universal versus culturally-derived facial
expressions of emotion
On each trial, participants viewed a facial expression
displayed on the social robot head and classified it according
Fig. 4. A. Classification performance of culturally-derived facial expressions of the six basic emotions. The color-coded matrix shows the proportion of
trials on which participants classified the input facial expression as a given emotion (see the colorbar to the right). B. Color-coded face maps show the
Action Unit patterns of the models that participants classified correctly (diagonal squares) and incorrectly (off-diagonal squares). Color-coding indicates
the proportion of trials (see colorbar to the right). For example, Upper Lip Raiser Left and Right and Cheek Raiser Left are common Action Units in
disgust expressions, which likely causes the confusion of anger as disgust as shown in A.
to one of six emotions – happy, surprise, fear, disgust, anger
or sad – in a 6-alternative forced choice task. Each participant
viewed a total of 374 facial animations ([30 culturally-derived facial expression models × 6 emotions + 7 existing standardized universal facial expressions] × 2 repetitions)
presented in random order across the experiment. We pre-
sented each facial animation on one of the social robot's 7
in-built face textures (‘Default,’ ‘Male,’ ‘Female,’ ‘Obama,’
‘iRobot,’ ‘Gabriel,’ and ‘Avatar’), pseudo-randomly selected
for each participant so that each face texture appeared an
equal number of times across the experiment. We blocked all
trials by face texture and randomized the order of the blocks
for each participant. We presented each facial animation once
for a duration of 1.25 seconds. After each animation, the face
returned to a neutral expression. Participants then responded
using a Graphic User Interface (GUI) displayed on a 19-inch
flat panel Dell monitor next to the robot head. We instructed
participants to respond quickly and accurately. Following
participant response, two beeps sounded to cue participants
to prepare for the next trial. Participants then viewed the
social robot and pressed the spacebar to start the next trial.
We displayed the social robot head (size 22.5 cm × 16 cm)
in the participant's central visual field at a constant viewing
distance of 90 cm using a chin rest. The facial expressions
therefore subtended 14.25° (vertical) and 10.16° (horizontal)
of visual angle, which reflects the average size of a human
face [47] during natural social interaction [48]. We used
MATLAB 2016a to display the GUI and record participant
responses.
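The reported visual angles follow from the standard formula for the angle subtended by a stimulus at a given viewing distance; the short sketch below reproduces them from the stated head size and 90 cm distance.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    # Angle subtended at the eye by a stimulus of the given size.
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

vertical = visual_angle_deg(22.5, 90)    # ~14.25 degrees
horizontal = visual_angle_deg(16.0, 90)  # ~10.16 degrees
```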
To compare the recognition accuracy of the culturally-
derived facial expression models with the social robot's
existing facial expressions, we computed the proportion of
correct responses for each facial expression model (n = 30
per emotion) and each of the social robot's existing facial
expressions (n = 7 total) by pooling the data across all
trials and participants. Fig. 3A shows the results for each
emotion. Red circles represent each culturally-derived facial
expression model; blue represents the social robot's existing
facial expressions. Circle size represents the number of facial
expression models with a specific accuracy (e.g., in happy,
6 models are recognized at 95% accuracy). As shown by
the distribution of red circles in each emotion category, the
majority of the culturally-derived facial expression models
elicited higher recognition accuracy than the social robot's
existing facial expressions. One exception is anger where
only 1 model showed higher performance than the social
robot's existing facial expression. We will explore and re-
port on this lower recognition performance later in the
manuscript.
C. Judgments of human-likeness of universal versus
culturally-derived facial expressions of emotion
Next, we compared the participants' judgments of human-
likeness for the culturally-derived facial expression models
and the social robot's existing universal facial expressions.
On each trial, we presented a pair of facial expressions
of the same emotion (e.g., happy) – one culturally-derived
facial expression and one of the social robot's existing facial
expressions – each displayed on the same face texture and
in pseudo-random sequential order. We presented each facial
expression once for a duration of 1.25 seconds, with an inter-
stimulus interval (ISI) of 0.5 seconds between each. After
displaying each pair of facial expressions, the social robot
face returned to a neutral expression. Participants indicated
which facial expression they thought looked most human-
like using a GUI displayed on a 19-inch flat panel Dell
monitor positioned next to the social robot head. Following
participant response, two beeps sounded to cue participants
to prepare for the next trial. Participants then viewed the
social robot head and pressed the spacebar to start the next
trial. We randomly assigned one of the social robot's 7 in-
built face textures to each emotion category, blocked the
trials by emotion, and randomized the order of the blocks
across the experiment for each participant. Each participant
completed a total of 420 trials ([60 pairs of happy facial
expressions + 30 pairs of facial expressions for each of
the other 5 emotions] × 2 pair orders). We used the same viewing conditions and equipment as in section B (Recognition of universal versus culturally-derived facial expressions of emotion) above.
To compare the human-likeness judgments of the
culturally-derived facial expression models and the social
robot's existing facial expressions, we computed the propor-
tion of times that participants selected each facial expression
as most human-like by pooling data across all trials and participants. Fig. 3B shows the results. As shown by the
distribution of red circles in Fig. 3B, participants consistently
judged the culturally-derived facial expression models as
more human-like than the social robot's existing standardized
universal facial expressions.
We now return to exploring the low recognition accuracy
amongst the culturally-derived facial expression models of
anger (see Fig. 3A). First, we examined the distribution
of correct and incorrect classifications as shown by the
confusion matrix in Fig. 4A. The y axis represents the
emotion of the input stimulus and the x axis represents
the participants' classification response. Each color-coded
cell of the matrix shows the proportion of trials on which
participants classified the input stimulus (e.g., facial expres-
sion model of anger) as a given emotion category (e.g.,
disgust) with proportions derived by pooling data across
all participants and trials. Squares on the diagonal show
the correct classifications; off-diagonal squares show the
incorrect classifications. Brighter colors indicate a higher
proportion of trials; darker colors indicate a lower proportion
of trials (see colorbar to the right). For the anger models, the
off-diagonal squares show that participants tended to mis-
classify these facial expression models as disgust. Similarly,
participants misclassified disgust models as anger, although
to a lower degree. To explore the potential face signalling
source of these misclassifications, we examined the Action
Units distributed across correct and incorrect responses. Fig.
4B shows the results, where each face map shows the Action
Unit patterns that participants classified correctly (diagonal
squares) and incorrectly (off-diagonal squares). Red indicates
a high proportion of trials; blue indicates a low proportion
of trials (see colorbar to the right). For anger, the off-
diagonal squares show that participants tended to misclassify
as disgust the facial expression models that comprised Action
Units that are prevalent in disgust such as the Upper Lip
Raiser, bilaterally (AU10R and AU10L), and Cheek Raiser
Left (AU6L). Similarly, participants misclassified disgust
facial expressions as anger when they contained Action Units
that are common in correctly classified anger expressions
such as Lip Pressor (AU24) and Lip Tightener (AU23).
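A confusion matrix like the one in Fig. 4A can be computed from the pooled trials as follows. This function is a generic sketch of the proportion-per-row computation, not the authors' analysis code.

```python
import numpy as np

EMOTIONS = ["happy", "surprise", "fear", "disgust", "anger", "sad"]

def confusion_matrix(stimuli, responses):
    """Rows: stimulus emotion; columns: response emotion; each row holds
    the proportion of that stimulus's trials given each response."""
    idx = {e: i for i, e in enumerate(EMOTIONS)}
    counts = np.zeros((6, 6))
    for s, r in zip(stimuli, responses):
        counts[idx[s], idx[r]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Guard empty rows so emotions with no trials stay at zero.
    return np.divide(counts, totals, out=np.zeros_like(counts),
                     where=totals > 0)
```

The diagonal of the returned matrix gives per-emotion recognition accuracy; large off-diagonal cells (e.g., anger classified as disgust) flag systematic confusions.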
D. Dynamic Action Units associated with high performance
We showed that the culturally-derived facial expression
models are recognized with higher accuracy and are judged
as more human-like compared to the social robot's existing
facial expressions. To identify which specific face move-
ments are associated with these improved performances, we
used an information-theoretic approach based on mutual
information (MI) [49, 50]. Specifically, MI quantifies the
relationship between two variables – here, the presence
of an Action Unit and performance of a facial expression
model (i.e., recognition accuracy or judgments of human-
likeness). High MI would indicate that an Action Unit (e.g.,
Brow Lowerer, AU4) is strongly associated with performance
(e.g., correct emotion classifications); low MI would in-
dicate a weak relationship. To identify, for each emotion,
the AUs that are strongly associated with performance, we
applied the following analysis for recognition accuracy and
human-likeness separately: We computed the MI between
each Action Unit (i.e., whether it is present or absent in the
culturally-derived facial expression model) and performance
(e.g., correct emotion classifications) by pooling the partic-
ipants' responses to the culturally-derived facial expressions
collected in section B (Recognition of universal versus culturally-derived facial expressions of emotion), resulting in 600 trials per emotion category (30 models × 10 participants × 2
repetitions). We computed the MI for each Action Unit
except three that are present in 100% of the facial expres-
sion models – i.e., in happy, Lip Corner Puller (AU12)
and Dimpler (AU14); in surprise, Inner/Outer Brow Raiser
(AU1-2) – which therefore provide no variance to compute
MI. We established the statistical significance of high MI
values using a Monte Carlo simulation method by shuffling
the participants' responses 1000 times, computing MI for
each Action Unit at each iteration and using the random
distribution of MI values to identify the Action Units with
MI values that are significantly higher than chance (i.e.,
>95% of the distribution, uncorrected). All Action Units
with significantly high MI are displayed on color-coded face
maps in Fig. 5A for recognition accuracy and in Fig. 5B for judgments of human-likeness. The color-coded matrices
next to the face maps indicate these Action Units in the first
column.
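The MI analysis can be sketched as below. The binary formulation (AU present/absent versus correct/incorrect) and the shuffling-based threshold mirror the described procedure, but the function names and the permutation count shown in the usage are illustrative.

```python
import numpy as np

def mutual_info_bits(x, y):
    """MI between two binary arrays (e.g., AU presence vs. correctness)."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in (0, 1):
        for yv in (0, 1):
            p_xy = np.mean((x == xv) & (y == yv))
            p_x, p_y = np.mean(x == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def significance_threshold(x, y, rng, n_perm=1000, pct=95):
    """Monte Carlo null distribution: shuffle the responses, recompute MI,
    and take the 95th percentile as the (uncorrected) threshold."""
    null = [mutual_info_bits(x, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.percentile(null, pct))
```

An Action Unit would then be flagged as significantly associated with performance when its observed `mutual_info_bits` value exceeds the shuffled threshold.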
Certain Action Units could also improve recognition per-
formance based on their specific dynamic properties such
as high amplitude, early peak latency, or fast acceleration.
Fig. 5. Dynamic Action Unit patterns associated with high recognition accuracy (panel A) and high judgments of human-likeness (panel B). In each panel,
the face maps show the Action Units that are associated with high performance; the color-coded matrices also indicate any specific (unit interval) temporal
parameter values associated with performance (see legend at bottom). Action Units that further boost performance are indicated with white asterisks.
For example, panel A shows that in sad, Chin Raiser at high amplitude further boosts recognition accuracy. Panel B shows that in fear, judgments of human-likeness are boosted by Mouth Stretch with medium peak latency.
To identify any such Action Units, we computed for each
Action Unit separately, the MI between performance (e.g.,
correct versus incorrect emotion classifications) and four
main temporal parameters – amplitude, peak latency, ac-
celeration, deceleration – separately using three levels of
temporal parameter values (high, medium, low). We estab-
lished statistical significance of high MI values for each
temporal parameter using a Monte Carlo simulation method
as described above. Action Unit dynamics with significantly
high MI are also displayed in the face maps shown in Fig.
5A for recognition accuracy and Fig. 5B for judgments of
human-likeness. Next, to specify the level of these dynamic
properties (i.e., high, medium, or low) we computed the
frequency of each level across the high-performance trials
(e.g., correct emotion classifications) and took the temporal
parameter level with the highest frequency. The results
are shown in Fig. 5A in the color-coded matrices, where
distinct colours indicate the value of each temporal parameter
significantly associated with high recognition accuracy (blue
– low [0.01, 0.4], green – medium [0.41, 0.8] and yellow
– high [0.81, 1] for the unit interval of each parameter;
see legends below). Together, these results show that for
each emotion, several specific Action Units and/or their
specific dynamic properties are strongly associated with high
recognition accuracy and judgments of human-likeness. For
example, for happy, the presence of Inner Brow Raiser (AU1)
and Cheek Raiser (AU6) is strongly associated with high
recognition accuracy. For surprise, the presence of Mouth
Stretch (AU27) is strongly associated with judgments of
human-likeness.
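The level-assignment step (low/medium/high, then the most frequent level among high-performance trials) can be sketched as follows, using the approximate bin boundaries given in the Fig. 5 legend; the frequency-count formulation is an assumption.

```python
import numpy as np

def level(v):
    # Bin a unit-interval parameter value: low (<= 0.4),
    # medium (<= 0.8), high otherwise (cf. the bins in Fig. 5).
    return 0 if v <= 0.4 else (1 if v <= 0.8 else 2)

def dominant_level(param_values, correct):
    """Most frequent parameter level among correctly classified trials."""
    levels = np.array([level(v) for v in param_values])
    mask = np.array(correct, dtype=bool)
    counts = np.bincount(levels[mask], minlength=3)
    return ["low", "medium", "high"][int(np.argmax(counts))]
```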
E. Dynamic Action Units that further boost performance
Above, we identified the Action Units and their dynamic
properties that are strongly associated with (and therefore
important for) the correct classification of emotions and
judgments of human-likeness. As shown in Fig. 3, certain
facial expression models elicit particularly high performance.
To identify the specific face movements that further boost
performance, we first computed the MI between each Action
Unit and very high performance for each emotion separately.
High MI would indicate that an Action Unit boosts perfor-
mance. We defined very high performance as the accuracy
elicited by the top 25% of facial expression models in each
task – i.e., recognition accuracy and judgments of human-
likeness – separately. We established statistical significance
using the Monte Carlo method described above. These high-
performance Action Units are also displayed in the face maps
in Figs. 5A-B and indicated with white asterisks in the color-
coded matrices. Next, to identify and characterize the specific
dynamic Action Unit properties that boost performance, we
conducted a similar MI analysis as before. These Action
Units are also displayed in the face maps in Figs. 5A-B
with the color-coded matrices showing the specific level of
each temporal parameter. For example, in fear, Brow Lowerer (AU4) boosts recognition accuracy, while Mouth Stretch with medium peak latency boosts judgments of human-likeness.
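The definition of very high performance (the top 25% of facial expression models per task) can be operationalized as below; the percentile cutoff computation is the only assumption beyond what the text states.

```python
import numpy as np

def top_quartile_mask(scores):
    """Mark the top 25% of facial expression models by task score
    (recognition accuracy or human-likeness, computed separately)."""
    scores = np.asarray(scores, dtype=float)
    return scores >= np.percentile(scores, 75)
```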
IV. CONCLUSIONS
Here, we transferred a set of 30 culturally-derived dynamic
facial expression models of the six basic emotions to a pop-
ular social robot and compared the recognition accuracy and
human-likeness judgments of East Asian participants with
those of the social robot's existing standardized universal
facial expressions. Results show that these culturally-derived
dynamic facial expression models generally outperformed the
social robot's existing facial expressions on both recognition
accuracy and judgments of human-likeness. Further analysis of the facial expression models revealed that certain Action Units and temporal dynamic properties drive high performance on both recognition accuracy and judgments of human-likeness. We also showed that the misclassification
of the anger facial expression models as disgust could be
due to shared Action Units – i.e., Upper Lip Raiser (AU10R
and AU10L) and Cheek Raiser Left (AU6L) – as shown
in Fig. 4B (see also, e.g., [27, 51, 52]) that could reflect
a latent signalling structure across emotion categories [17].
Identifying such common face movement patterns and those
that clearly distinguish specific emotion categories could
therefore better inform the design of social robots to further
enhance performance.
Together, our results highlight the advantage of using
culturally-sensitive dynamic facial expressions that are de-
rived from cultural perception using data-driven methods
over the theoretically-derived standardized facial expressions
of emotion that are currently in common use. We therefore
anticipate that our data-driven and user-centred approach
to modelling dynamic facial expressions will be used to
further diversify and refine the social face signal gener-
ation capabilities of social robots. Such facial expression
models could also be used, in conjunction with additional
information about cultural context, to improve the social
sensing capabilities of social robots. User-directed selection of culturally-sensitive facial expressions in artificial agents could also serve personal preferences, such as learning culture-specific facial expressions. Together, we anticipate
that the use of data-driven approaches will further inform
the design of culturally-sensitive digital agents to improve
their performance, accessibility, and marketability within a
culturally diverse global market.
REFERENCES
[1] C. Darwin, The Expression of the Emotions in Man and Animals, 3rd
ed. London: Fontana Press, 1999/1872.
[2] P. Ekman, E. R. Sorenson, and W. V. Friesen, ”Pan-Cultural Elements
in Facial Displays of Emotion,” Science, vol. 164, pp. 86-88, April 4,
1969 1969.
[3] P. Ekman and W. Friesen, ”A new pan-cultural facial expression of
emotion,” Motivation and Emotion, vol. 10, pp. 159-168, 1986.
[4] D. Matsumoto, ”American-Japanese Cultural Differences in the
Recognition of Universal Facial Expressions,” Journal of Cross-
Cultural Psychology, vol. 23, pp. 72-84, 1992.
[5] M. Biehl, D. Matsumoto, P. Ekman, V. Hearn, K. Heider, T. Kudoh,
et al., ”Matsumoto and Ekman's Japanese and Caucasian Facial Ex-
pressions of Emotion (JACFEE): Reliability Data and Cross-National
Differences,” Journal of Nonverbal Behavior, vol. 21, pp. 3-21, 1997.
[6] J. D. Boucher and G. E. Carlson, ”Recognition of Facial Expression
in Three Cultures,” Journal of Cross-Cultural Psychology, vol. 11, pp.
263-280, 1980.
[7] K. Shimoda, M. Argyle, and P. R. Bitti, ”The intercultural
recognition of emotional expressions by three national racial groups:
English, Italian and Japanese,” European Journal of Social Psychology,
vol. 8, pp. 169-179, 1978.
[8] P. Ekman and W. V. Friesen, Manual for the Facial Action Coding
System. Palo Alto, CA: Consulting Psychologists Press, 1978.
[9] M. Pantic and L. J. M. Rothkrantz, ”Automatic analysis of facial
expressions: The state of the art,” IEEE Transactions on pattern
analysis and machine intelligence, vol. 22, pp. 1424-1445, 2000.
[10] N. Lazzeri, D. Mazzei, A. Greco, A. Rotesi, A. Lanatà, and D. E. De
Rossi, ”Can a humanoid face be expressive? A psychophysiological
investigation,” Frontiers in bioengineering and biotechnology, vol. 3,
p. 64, 2015.
[11] C. C. Bennett and S. Šabanović, ”Deriving minimal features for
human-like facial expressions in robotic faces,” International Journal
of Social Robotics, vol. 6, pp. 367-381, 2014.
[12] E. G. Krumhuber and K. R. Scherer, ”The look of fear from the eyes
varies with the dynamic sequence of facial actions,” Swiss Journal of
Psychology, 2016.
[13] L. Cañamero and J. Fredslund, ”I show you how I like you-can you
read it in my face? [robotics],” IEEE Transactions on systems, man,
and cybernetics – Part A: Systems and humans, vol. 31, pp. 454-459,
2001.
[14] T. Hashimoto, S. Hiramatsu, T. Tsuji, and H. Kobayashi, ”Develop-
ment of the face robot SAYA for rich facial expressions,” in SICE-
ICASE, 2006. International Joint Conference, 2006, pp. 5423-5428.
[15] S. Al Moubayed, J. Beskow, G. Skantze, and B. Granström, ”Furhat: a
back-projected human-like robot head for multiparty human-machine
interaction,” in Cognitive Behavioural Systems. Springer, 2012, pp.
114-130.
[16] T. Fong, I. Nourbakhsh, and K. Dautenhahn, ”A survey of socially
interactive robots,” Robotics and autonomous systems, vol. 42, pp.
143-166, 2003.
[17] R. E. Jack, O. G. B. Garrod, H. Yu, R. Caldara, and P. G. Schyns,
”Facial expressions of emotion are not culturally universal,” Proceed-
ings of the National Academy of Sciences, vol. 109, pp. 7241-7244,
2012.
[18] H. A. Elfenbein and N. Ambady, ”On the universality and cultural
specificity of emotion recognition: A meta-analysis,” Psychological
Bulletin, vol. 128, pp. 203-235, 2002.
[19] B. Mesquita and N. H. Frijda, ”Cultural variations in emotions: a
review,” Psychological Bulletin, vol. 112, pp. 179-204, Sep 1992.
[20] J. A. Russell, ”Is there universal recognition of emotion from facial
expression? A review of the cross-cultural studies,” Psychological
Bulletin, vol. 115, pp. 102-41, Jan 1994.
[21] L. Ducci, L. Arcuri, and T. Sineshaw, ”Emotion Recognition in
Ethiopia The Effect of Familiarity with Western Culture on Accuracy
of Recognition,” Journal of Cross-Cultural Psychology, vol. 13, pp.
340-351, 1982.
[22] P. Ekman, ”Universals and cultural differences in facial expressions of
emotion,” presented at the Nebraska Symposium on Motivation, 1972.
[23] P. Ekman, W. V. Friesen, M. O'Sullivan, A. Chan, I. Diacoyanni-
Tarlatzis, K. Heider, et al., ”Universals and cultural differences in the
judgments of facial expressions of emotion,” Journal of Personality
and Social Psychology, vol. 53, pp. 712-7, Oct 1987.
[24] H. A. Elfenbein and N. Ambady, ”Universals and Cultural Differences
in Recognizing Emotions,” Current Directions in Psychological Sci-
ence, vol. 12, pp. 159-164, 2003.
[25] H. A. Elfenbein, M. Mandal, N. Ambady, S. Harizuka, and S.
Kumar, ”Hemifacial differences in the in-group advantage in emotion
recognition,” Cognition and Emotion, vol. 18, pp. 613-629, 2004.
[26] Y. Huang, S. Tang, D. Helmeste, T. Shioiri, and T. Someya, ”Dif-
ferential judgement of static facial expressions of emotions in three
cultures,” Psychiatry and clinical neurosciences, vol. 55, pp. 479-483,
2001.
[27] R. E. Jack, C. Blais, C. Scheepers, P. G. Schyns, and R. Caldara,
”Cultural confusions show that facial expressions are not universal,”
Current Biology, vol. 19, pp. 1543-1548, 2009.
[28] G. Kirouac and F. Y. Doré, ”Accuracy of the judgment of facial
expression of emotions as a function of sex and level of education,”
Journal of Nonverbal Behavior, vol. 9, pp. 3-7, 1985.
[29] G. Kirouac and F. Y. Doré, ”Accuracy and latency of judgment of
facial expressions of emotions,” Perceptual and Motor Skills, vol. 57,
pp. 683-686, 1983.
[30] D. Matsumoto and P. Ekman, ”American-Japanese cultural differences
in intensity ratings of facial expressions of emotion,” Motivation and
Emotion, vol. 13, pp. 143-157, 1989.
[31] F. T. McAndrew, ”A cross-cultural study of recognition thresholds for
facial expressions of emotion,” Journal of Cross-Cultural Psychology,
vol. 17, pp. 211-224, 1986.
[32] T. Shioiri, T. Someya, D. Helmeste, and S. W. Tang, ”Misinterpretation
of facial expression: A cross-cultural study,” Psychiatry and clinical
neurosciences, vol. 53, pp. 45-50, 1999.
[33] R. E. Jack, ”Culture and facial expressions of emotion,” Visual
Cognition, vol. 21, pp. 1248-1286, Sep 1 2013.
[34] N. L. Nelson and J. A. Russell, ”Universality revisited,” Emotion
Review, vol. 5, pp. 8-15, 2013.
[35] C. Chen, C. Crivelli, O. G. B. Garrod, P. G. Schyns, J.-M. Fernández-
Dols, and R. E. Jack, ”Distinct facial expressions represent pain and
pleasure across cultures,” Proceedings of the National Academy of
Sciences, 2018.
[36] C. Crivelli, J. A. Russell, S. Jarillo, and J.-M. Fernández-Dols, ”The fear
gasping face as a threat display in a Melanesian society,” Proceedings
of the National Academy of Sciences, vol. 113, pp. 12403-12407, 2016.
[37] J. Henrich, S. Heine, and A. Norenzayan, ”The weirdest people in the
world?,” Behavioral and Brain Sciences, vol. 33, pp. 61-83, 2010.
[38] R. E. Jack, C. Crivelli, and T. Wheatley, ”Data-driven methods
to diversify knowledge of human psychology,” Trends in Cognitive
Sciences, vol. 22, pp. 1-5, 2018.
[39] E. Krumhuber, A. S. Manstead, D. Cosker, D. Marshall, P. L. Rosin,
and A. Kappas, ”Facial dynamics as indicators of trustworthiness and
cooperative behavior,” Emotion, vol. 7, p. 730, 2007.
[40] G. Trovato, T. Kishi, N. Endo, K. Hashimoto, and A. Takanishi,
”A cross-cultural study on generation of culture dependent facial
expressions of humanoid social robot,” in International Conference
on Social Robotics, 2012, pp. 35-44.
[41] R. E. Jack, W. Sun, I. Delis, O. G. Garrod, and P. G. Schyns, ”Four
not six: Revealing culturally common facial expressions of emotion,”
Journal of Experimental Psychology: General, vol. 145, p. 708, 2016.
[42] H. Yu, O. G. B. Garrod, and P. G. Schyns, ”Perception-driven facial
expression synthesis,” Computers and Graphics, vol. 36, pp. 152-162,
2012.
[43] M. Rychlowska, R. E. Jack, O. G. Garrod, P. G. Schyns, J. D. Martin,
and P. M. Niedenthal, ”Functional smiles: Tools for love, sympathy,
and war,” Psychological Science, vol. 28, pp. 1259-1270, 2017.
[44] D. Gill, O. G. Garrod, R. E. Jack, and P. G. Schyns, ”Facial
movements strategically camouflage involuntary social signals of face
morphology,” Psychological science, vol. 25, pp. 1079-1086, 2014.
[45] C. Chen, O. Garrod, P. Schyns, and R. Jack, ”The Face is the Mirror
of the Cultural Mind,” Journal of vision, vol. 15, pp. 928-928, 2015.
[46] C. Chen, O. G. Garrod, J. Zhan, J. Beskow, P. G. Schyns, and R. E.
Jack, ”Reverse Engineering Psychologically Valid Facial Expressions
of Emotion into Social Robots,” in Automatic Face and Gesture
Recognition (FG 2018), 2018 13th IEEE International Conference on,
2018, pp. 448-452.
[47] L. Ibrahimagić-Šeper, A. Čelebić, N. Petričević, and E. Selimović,
”Anthropometric differences between males and females in face di-
mensions and dimensions of central maxillary incisors,” Medicinski
glasnik, vol. 3, pp. 58-62, 2006.
[48] E. Hall, The Hidden Dimension. Garden City, NY: Doubleday, 1966.
[49] T. E. Nichols and A. P. Holmes, ”Nonparametric permutation tests
for functional neuroimaging: a primer with examples,” Human Brain
Mapping, vol. 15, pp. 1-25, 2002.
[50] T. M. Cover and J. A. Thomas, Elements of Information Theory. John
Wiley and Sons, 2012.
[51] S. Du, Y. Tao, and A. M. Martinez, ”Compound facial expressions
of emotion,” Proceedings of the National Academy of Sciences, p.
201322355, 2014.
[52] V. Shuman, E. Clark-Polner, B. Meuleman, D. Sander, and K. R.
Scherer, ”Emotion perception from a componential perspective,” Cog-
nition and Emotion, vol. 31, pp. 47-56, 2017.