Equipping social robots with culturally-sensitive facial expressions of
emotion using data-driven methods
Chaona Chen1,2, Laura B. Hensel2, Yaocong Duan2, Robin A. A. Ince1,
Oliver G. B. Garrod1, Jonas Beskow3, Rachael E. Jack1,2, and Philippe G. Schyns1,2
1Institute of Neuroscience and Psychology, University of Glasgow, G12 8QB, Scotland, UK
2School of Psychology, University of Glasgow, G12 8QB, Scotland, UK
3Furhat Robotics, 11428 Stockholm, Sweden
Email: rachael.jack@glasgow.ac.uk
Abstract Social robots must be able to generate realistic
and recognizable facial expressions to engage their human
users. Many social robots are equipped with standardized
facial expressions of emotion that are widely considered to be
universally recognized across all cultures. However, mounting
evidence shows that these facial expressions are not univer-
sally recognized – for example, they elicit significantly lower
recognition accuracy in East Asian cultures than they do
in Western cultures. Therefore, without culturally sensitive
facial expressions, state-of-the-art social robots are restricted
in their ability to engage a culturally diverse range of human
users, which in turn limits their global marketability. To
develop culturally sensitive facial expressions, novel data-driven
methods are used to model the dynamic face movement patterns
that convey basic emotions (e.g., happy, sad, anger) in a given
culture using cultural perception. Here, we tested whether such
dynamic facial expression models, derived in an East Asian
culture and transferred to a popular social robot, improved
the social signalling generation capabilities of the social robot
with East Asian participants. Results showed that, compared to
the social robot's existing set of ‘universal’ facial expressions,
the culturally-sensitive facial expression models are recognized
with generally higher accuracy and judged as more human-
like by East Asian participants. We also detail the specific
dynamic face movements (Action Units) that are associated with
high recognition accuracy and judgments of human-likeness,
including those that further boost performance. Our results
therefore demonstrate the utility of using data-driven methods
that employ human cultural perception to derive culturally-
sensitive facial expressions that improve the social face signal
generation capabilities of social robots. We anticipate that these
methods will continue to inform the design of social robots and
broaden their usability and global marketability.
I. INTRODUCTION
Facial expressions are widely considered to be the univer-
sal language of emotion. Based on Darwin's ground-breaking
theory on the biological origins of facial expressions of
emotion [1] and Ekman's seminal cross-cultural recognition
studies (e.g., [2]), several dominant theories in the field of
psychology have argued that six basic emotions – happy,
surprise, fear, disgust, anger and sad – are expressed and
recognized in the same way across all cultures (e.g., [2-7]).

This work was supported by The Economic and Social Research Council
and Medical Research Council (United Kingdom; ESRC/MRC-060-25-0010),
British Academy (BA SG171783), Wellcome Trust (107802/Z/15/Z)
and Multidisciplinary University Research Initiative (MURI)/Engineering
and Physical Sciences Research Council (EP/N019261/1).

Fig. 1. A. Examples of the six standardized universal facial expressions of
basic emotions and individual face movements, called Action Units (AUs).
B. Color-coded points show the average recognition accuracy of these facial
expressions in 40 locations across the world as reported in 15 previous
studies [2, 5, 6, 21-32]. Figure adapted from [33] with permission.

To represent these universal facial expressions, the field
established a set of six standardized facial expressions (see
Fig. 1A for examples), each comprising a specific pattern
of face movements called Action Units (AUs), such as Lid
Tightener (AU7) and Lip Corner Puller (AU12) [8]. These
standardized facial expression
images quickly became the gold standard in research and
consequently influenced a broad range of fields including
affective computing [see 9 for a review] and social robotics
[10-12]. For example, state-of-the-art social robots such as
Feelix [13], SAYA [14] and Furhat [15, see also 16 for a re-
view] generate facial expressions based on these standardized
Action Unit patterns.
However, mounting evidence shows that these facial ex-
pressions are not recognized well across different cultures.
Whereas all six facial expressions are recognized with com-
parably high accuracy in Western cultures, facial expressions
such as fear, disgust, and anger elicit significantly lower
accuracy in a number of other cultures [17, 18, see also
reviews of 19, 20]. To illustrate, Fig. 1B shows the recog-
nition accuracy of these standardized facial expressions in
40 locations across the world reported in 15 well-known
studies [2, 5, 6, 21-32].

Fig. 2. Data-driven, perception-based method to model culturally-sensitive dynamic facial expressions of emotion and their transference to social robotics.
A. Stimulus generation and task procedure. B. Facial expression modelling procedure. C. Transference of facial expression models to social robotics and
cultural validation.

Red represents high recognition
accuracy (i.e., >75%) and blue represents lower accuracy
(i.e., <75%) [33]. As shown by the distribution of red and
blue points, these standardized facial expressions tend to
be recognized primarily in Western cultures but less so in
Eastern cultures. These consistent cultural differences there-
fore suggest that the facial expressions widely considered
to be universal are instead more representative of Western
culture [see also reviews of 18, 34]. Indeed, findings of
substantial cultural differences in a variety of psychological
phenomena once thought to be universal are now increasing
[17, 35, 36] because most current psychological knowledge is
derived from Western (more specifically, Western, educated,
industrialized, rich and democratic – WEIRD) [37] popula-
tions and using Western-centric theories and confirmatory
methods [38]. A further limitation of these standardized
facial expressions of emotion is that they are static and
therefore do not represent the naturalistic dynamics of human
facial expressions [39]. Traditional theory-driven approaches
in psychology have therefore restricted understanding of the
specific dynamic face movement patterns that convey basic
emotions in different cultures. In turn, this has impacted
related fields such as social robotics where expressive ca-
pacity remains limited (e.g., primarily to Western cultures,
without naturalistic dynamics). For example, social robots
using these standardized universal facial expressions tend
to elicit low recognition accuracy (<50%) amongst non-
Western participants [40].
II. RELATED WORK
To better understand facial expression communication
across cultures, new data-driven methods have been used
to model the specific dynamic face movement patterns that
convey the six basic emotions in different cultures [e.g., 17,
41]. Fig. 2A-B illustrates this approach. On each experimental
trial, cultural participants view a random facial animation
generated by a facial animation platform [42] that randomly
samples a subset of Action Units (AUs) from a core set of 42
AUs. For example, in Fig. 2A, three AUs are selected – Outer
Brow Raiser (AU2) color-coded in green, Lip Corner Puller
(AU12) in blue, and Lips Part (AU25) in red. Each AU is
then independently activated with a random movement (in
Fig. 2A, see the color-coded temporal activation curve for each
AU; temporal parameters are labelled on the green curve). For
each Action Unit, we generated a time course using a cubic
Hermite spline interpolation of three 2-dimensional (time,
amplitude) control points, randomly generating six values
by sampling a uniform distribution on the interval [0,1].
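To make this concrete, the following is a minimal illustrative sketch (a simplification, not the implementation in [42]) of how six uniform samples could parameterize one AU's activation curve; the mapping of the sampled values to control points, and the omission of acceleration and deceleration shaping, are assumptions of the sketch.

```python
# Minimal sketch (not the implementation in [42]): one random AU time course
# from six uniform samples. The parameter ordering and spline construction
# below are illustrative assumptions.
import numpy as np
from scipy.interpolate import CubicHermiteSpline

rng = np.random.default_rng(1)
DURATION = 1.25  # seconds, the stimulus duration used in this study

def random_au_time_course(n_frames=30):
    # Six uniform samples on [0, 1] -> temporal parameters on the unit interval
    onset, accel, peak_amp, peak_lat, decel, offset = rng.uniform(0, 1, 6)

    # Order the three latencies so that onset < peak < offset within the animation;
    # acceleration/deceleration shaping from [42] is omitted in this sketch.
    t_on, t_peak, t_off = np.sort([onset, peak_lat, offset]) * DURATION

    # Three (time, amplitude) control points (rise to peak, return to neutral),
    # interpolated with a cubic Hermite spline; zero slopes give a smooth profile.
    times = np.array([0.0, t_on, t_peak, t_off, DURATION])
    amps = np.array([0.0, 0.0, peak_amp, 0.0, 0.0])
    spline = CubicHermiteSpline(times, amps, dydx=np.zeros_like(times))

    frames = np.linspace(0.0, DURATION, n_frames)
    return frames, np.clip(spline(frames), 0.0, 1.0)
```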
These values are then transformed on the unit interval into
temporal parameters that represent the properties of onset
latency, acceleration, peak amplitude, peak latency, decel-
eration, and offset latency, according to the rules for each
parameter (see [42] for full details). Participants view the
resulting randomly generated facial animation and classify it
according to one of six emotions (‘happy,’ ‘surprise,’ ‘fear,’
‘disgust,’ ‘anger’ or ‘sad’) and rate its intensity on a 5-point
scale (‘very weak’ to ‘very strong’). If the facial animation
does not correspond to any of these emotions, participants se-
lect ‘other.’ Therefore, each facial animation that is classified
by the participant as a particular emotion at a given intensity
contains a dynamic face movement pattern that elicits the
perception of that particular emotion in the participant. After
many such trials, a statistical relationship is built between the
stimulus information (here, dynamic Action Units) presented
on each trial and the participant's corresponding responses
(e.g., ‘happy,’ ‘very strong’) as depicted in Fig. 2B. This
procedure therefore produces a statistically robust model of
a dynamic facial expression pattern that elicits the perception
of a given emotion in a participant from the culture of interest
as demonstrated by a perceptual validation task (see [42]
for full details of the modelling procedure1). Importantly,
because these models are quantifiable representations of
facial expressions, they can be directly transferred to social
robotics to generate culturally-sensitive facial expressions, as
illustrated in Fig. 2C. Therefore, this data-driven approach of
agnostically sampling face movements and using subjective
human cultural perception to model the dynamic Action
Unit patterns that represent different emotions (or any social
category such as different smiles [43], personality traits [44],
pain and pleasure [35] and mental states [45]) in a bottom-
up manner is particularly suitable for objectively exploring
facial expressions in diverse cultures [38].
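As a rough sketch of how such a stimulus-response relationship might be computed (a simplified assumption, not the authors' modelling pipeline), one could compare each AU's rate of occurrence on trials classified as a given emotion against its base rate across all trials:

```python
# Rough sketch (simplified assumption, not the authors' pipeline): for one
# participant, flag the AUs whose presence rate on trials classified as the
# target emotion exceeds their base rate across all trials.
import numpy as np

def au_model_for_emotion(au_present, responses, emotion, z_thresh=1.96):
    """au_present: (n_trials, n_aus) binary matrix of the randomly sampled AUs.
    responses: (n_trials,) array of the participant's emotion labels.
    Returns a binary AU-presence vector for the target emotion."""
    chosen = responses == emotion
    if not chosen.any():
        return np.zeros(au_present.shape[1], dtype=int)
    base_rate = au_present.mean(axis=0)          # expected rate under random sampling
    hit_rate = au_present[chosen].mean(axis=0)   # rate on trials labelled `emotion`
    # Binomial z-approximation of whether each AU is over-represented
    se = np.sqrt(base_rate * (1 - base_rate) / chosen.sum())
    with np.errstate(divide='ignore', invalid='ignore'):
        z = (hit_rate - base_rate) / se
    return ((se > 0) & (z > z_thresh)).astype(int)
```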
Using this approach, Jack et al. [17] modelled a set of 30
dynamic facial expressions of each of the six basic emotions
using the cultural perception of 30 East Asian participants
with each model derived from an individual participant.
Comparison of these 30 individual models in each emotion
category showed high consistency across participants as
measured by the average Hamming distance: Happy: Median =
0.07 (SE = 0.002); Surprise: Median = 0.12 (SE = 0.003);
Fear: Median = 0.19 (SE = 0.003); Disgust: Median = 0.19
(SE = 0.004); Anger: Median = 0.21 (SE = 0.004) and Sad:
Median = 0.14 (SE = 0.003; see also similarity matrix in
Fig. 2 in [17]). Here, we aim to transfer these 30 culturally-
derived dynamic facial expression models to a popular social
robot head – Furhat https://www.furhatrobotics.com/ – and
examine whether they improve recognition accuracy amongst
East Asian participants compared to the social robot's exist-
ing ‘universal’ facial expressions.
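For reference, the pairwise consistency measure above can be sketched as the normalized Hamming distance between binary AU-presence vectors, under the assumption that each model is represented as a 42-element binary AU vector:

```python
# Minimal sketch (assuming each model is a 42-element binary AU-presence vector):
# pairwise consistency of the 30 models of one emotion as normalized Hamming distances.
import numpy as np
from itertools import combinations

def pairwise_hamming(models):
    """models: (n_models, n_aus) binary matrix, e.g., 30 x 42.
    Returns the normalized Hamming distance for every pair of models."""
    return np.array([np.mean(a != b) for a, b in combinations(models, 2)])
```

Summary statistics such as the medians reported above can then be taken over the resulting distribution of pairwise distances.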
III. METHOD
A. Transference of culturally-derived dynamic facial expres-
sion models to a social robot
To display the culturally-derived dynamic facial expression
models on the social robot head, we first supplemented
the social robot's existing facial movement vocabulary of
7 pre-set universal facial expressions of emotion (2 happy,
1 surprise, 1 fear, 1 disgust, 1 anger and 1 sad) with a
set of 42 individual dynamic Action Units and all of their
combinations (see [46] for full details of transforming the AU
shape deviation data to the social robot's mesh topologies).
With this development, we displayed each of the culturally-
derived dynamic facial expression models of the six basic
emotions (n = 30 models per emotion) on the social robot
head along with the existing set of standardized universal
facial expressions of emotion (n = 7).
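For intuition, a generic linear blendshape scheme of the following kind can drive a face mesh from time-varying AU activations; this is an illustrative sketch only (the actual transfer to the social robot's mesh topologies follows [46]), and all array names below are hypothetical.

```python
# Illustrative sketch only (hypothetical arrays; the actual mesh transfer follows [46]):
# a generic linear blendshape scheme that drives a face mesh from AU activations.
import numpy as np

def animate_face(neutral_vertices, au_deltas, au_activations):
    """neutral_vertices: (n_vertices, 3) neutral face mesh.
    au_deltas: (n_aus, n_vertices, 3) per-AU shape deviations at full activation.
    au_activations: (n_frames, n_aus) AU activation over time, each in [0, 1].
    Returns (n_frames, n_vertices, 3) animated vertex positions."""
    # Each frame is the neutral mesh plus the weighted sum of AU shape deviations.
    return neutral_vertices[None] + np.einsum('fa,avc->fvc', au_activations, au_deltas)
```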
In a first experiment, we asked a group of East Asian
participants to classify all of these facial expressions by
emotion (section B). In a second experiment, we asked the
same group of East Asian participants to judge their human-
likeness (section C). We blocked and counterbalanced these
two tasks across participants and describe the design and
results below.

1 The values of temporal parameters reported in this study are normalized
within their unit interval (i.e., [0,1]). Formulas (11) to (16) in [42] should
be used to transform these temporal parameter values into their real
values (e.g., time in seconds for peak latency). The Action Unit patterns and
temporal parameters of each facial expression model have been deposited
on the Open Science Framework (available at https://osf.io/nxe9q/).

For both experiments, we recruited 10 East Asian
participants (10 Chinese, 5 females, mean age 23.6 years, SD
= 2.12 years) with minimal exposure to and engagement with
other cultures as assessed by questionnaire (see [17] for an
example). All participants had normal or corrected-to-normal
vision, were free from any emotion related atypicalities (e.g.
Autism Spectrum Disorder, depression), learning difficulties
(e.g. dyslexia), synaesthesia, and disorders of face perception
(e.g. prosopagnosia) as per self-report. All participants had
a minimum International English Language Testing System
(IELTS) score of 6.0 (competent user). Each participant gave
written informed consent and received a standard rate of £6
per hour for their participation. The Ethics Committee of the
College of Science and Engineering, University of Glasgow
provided ethical approval (Ref No: 300160186).
Fig. 3. A. Recognition accuracy of culturally-derived facial expressions
and the social robot's existing facial expressions. B. Judgments of human-
likeness. In both panels, red circles represent the culturally-derived facial
expression models; blue represents the social robot's existing standardized
facial expressions. Circle size represents the number of facial expression
models (e.g., in happy, six models are recognized at 95% accuracy; in
disgust, 1 model is recognized at 25% accuracy).
B. Recognition of universal versus culturally-derived facial
expressions of emotion
On each trial, participants viewed a facial expression
displayed on the social robot head and classified it according
to one of six emotions – happy, surprise, fear, disgust, anger
or sad – in a 6-alternative forced choice task.

Fig. 4. A. Classification performance of culturally-derived facial expressions of the six basic emotions. The color-coded matrix shows the proportion of
trials on which participants classified the input facial expression as a given emotion (see the colorbar to the right). B. Color-coded face maps show the
Action Unit patterns of the models that participants classified correctly (diagonal squares) and incorrectly (off-diagonal squares). Color-coding indicates
the proportion of trials (see colorbar to the right). For example, Upper Lip Raiser Left and Right and Cheek Raiser Left are common Action Units in
disgust expressions, which likely causes the confusion of anger as disgust, as shown in A.

Each participant
viewed a total of 374 facial animations ([30 culturally-
derived facial expression models × 6 emotions + 7 existing
standardized universal facial expressions] × 2 repetitions)
presented in random order across the experiment. We pre-
sented each facial animation on one of the social robot's 7
in-built face textures (‘Default,’ ‘Male,’ ‘Female,’ ‘Obama,’
‘iRobot,’ ‘Gabriel,’ and ‘Avatar’), pseudo-randomly selected
for each participant so that each face texture appeared an
equal number of times across the experiment. We blocked all
trials by face texture and randomized the order of the blocks
for each participant. We presented each facial animation once
for a duration of 1.25 seconds. After each animation, the face
returned to a neutral expression. Participants then responded
using a Graphical User Interface (GUI) displayed on a 19-inch
flat panel Dell monitor next to the robot head. We instructed
participants to respond quickly and accurately. Following
participant response, two beeps sounded to cue participants
to prepare for the next trial. Participants then viewed the
social robot and pressed the spacebar to start the next trial.
We displayed the social robot head (size 22.5 cm × 16 cm)
in the participant's central visual field at a constant viewing
distance of 90 cm using a chin rest. The facial expressions
therefore subtended 14.25° (vertical) and 10.16° (horizontal)
of visual angle, which reflects the average size of a human
face [47] during natural social interaction [48]. We used
MATLAB 2016a to display the GUI and record participant
responses.
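For reference, these visual angles follow directly from the display size and viewing distance:

```latex
\theta_{\text{vertical}} = 2\arctan\!\left(\frac{22.5\,\text{cm}}{2 \times 90\,\text{cm}}\right) \approx 14.25^{\circ},
\qquad
\theta_{\text{horizontal}} = 2\arctan\!\left(\frac{16\,\text{cm}}{2 \times 90\,\text{cm}}\right) \approx 10.16^{\circ}.
```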
To compare the recognition accuracy of the culturally-
derived facial expression models with the social robot's
existing facial expressions, we computed the proportion of
correct responses for each facial expression model (n = 30
per emotion) and each of the social robot's existing facial
expressions (n = 7 total) by pooling the data across all
trials and participants. Fig. 3A shows the results for each
emotion. Red circles represent each culturally-derived facial
expression model; blue represents the social robot's existing
facial expressions. Circle size represents the number of facial
expression models with a specific accuracy (e.g., in happy,
6 models are recognized at 95% accuracy). As shown by
the distribution of red circles in each emotion category, the
majority of the culturally-derived facial expression models
elicited higher recognition accuracy than the social robot's
existing facial expressions. One exception is anger where
only 1 model showed higher performance than the social
robot's existing facial expression. We will explore and re-
port on this lower recognition performance later in the
manuscript.
C. Judgments of human-likeness of universal versus
culturally-derived facial expressions of emotion
Next, we compared the participants' judgments of human-
likeness for the culturally-derived facial expression models
and the social robot's existing universal facial expressions.
On each trial, we presented a pair of facial expressions
of the same emotion (e.g., happy) – one culturally-derived
facial expression and one of the social robot's existing facial
expressions – each displayed on the same face texture and
in pseudo-random sequential order. We presented each facial
expression once for a duration of 1.25 seconds, with an inter-
stimulus interval (ISI) of 0.5 seconds between them. After
displaying each pair of facial expressions, the social robot
face returned to a neutral expression. Participants indicated
which facial expression they thought looked most human-
like using a GUI displayed on a 19-inch flat panel Dell
monitor positioned next to the social robot head. Following
participant response, two beeps sounded to cue participants
to prepare for the next trial. Participants then viewed the
social robot head and pressed the spacebar to start the next
trial. We randomly assigned one of the social robot's 7 in-
built face textures to each emotion category, blocked the
trials by emotion, and randomized the order of the blocks
across the experiment for each participant. Each participant
completed a total of 420 trials ([60 pairs of happy facial
expressions + 30 pairs of facial expressions for each of
the other 5 emotions] × 2 pair orders). We used the same
viewing conditions and equipment as in section B (Recognition
of universal versus culturally-derived facial expressions of
emotion) above.
To compare the human-likeness judgments of the
culturally-derived facial expression models and the social
robot's existing facial expressions, we computed the propor-
tion of times that participants selected each facial expression
as most human-like by pooling data across all trials and
participants. Fig. 3B shows the results. As shown by the
distribution of red circles in Fig. 3B, participants consistently
judged the culturally-derived facial expression models as
more human-like than the social robot's existing standardized
universal facial expressions.
We now return to exploring the low recognition accuracy
amongst the culturally-derived facial expression models of
anger (see Fig. 3A). First, we examined the distribution
of correct and incorrect classifications as shown by the
confusion matrix in Fig. 4A. The y axis represents the
emotion of the input stimulus and the x axis represents
the participants' classification response. Each color-coded
cell of the matrix shows the proportion of trials on which
participants classified the input stimulus (e.g., facial expres-
sion model of anger) as a given emotion category (e.g.,
disgust) with proportions derived by pooling data across
all participants and trials. Squares on the diagonal show
the correct classifications; off-diagonal squares show the
incorrect classifications. Brighter colors indicate a higher
proportion of trials; darker colors indicate a lower proportion
of trials (see colorbar to the right). For the anger models, the
off-diagonal squares show that participants tended to mis-
classify these facial expression models as disgust. Similarly,
participants misclassified disgust models as anger, although
to a lower degree. To explore the potential face signalling
source of these misclassifications, we examined the Action
Units distributed across correct and incorrect responses. Fig.
4B shows the results, where each face map shows the Action
Unit patterns that participants classified correctly (diagonal
squares) and incorrectly (off-diagonal squares). Red indicates
a high proportion of trials; blue indicates a low proportion
of trials (see colorbar to the right). For anger, the off-
diagonal squares show that participants tended to misclassify
as disgust the facial expression models that comprised Action
Units that are prevalent in disgust such as the Upper Lip
Raiser, bilaterally (AU10R and AU10L), and Cheek Raiser
Left (AU6L). Similarly, participants misclassified disgust
facial expressions as anger when they contained Action Units
that are common in correctly classified anger expressions
such as Lip Pressor (AU24) and Lip Tightener (AU23).
D. Dynamic Action Units associated with high performance
We showed that the culturally-derived facial expression
models are recognized with higher accuracy and are judged
as more human-like compared to the social robot's existing
facial expressions. To identify which specific face move-
ments are associated with these improved performances, we
used an information-theoretic approach based on mutual
information (MI) [49, 50]. Specifically, MI quantifies the
relationship between two variables – here, the presence
of an Action Unit and performance of a facial expression
model (i.e., recognition accuracy or judgments of human-
likeness). High MI would indicate that an Action Unit (e.g.,
Brow Lowerer, AU4) is strongly associated with performance
(e.g., correct emotion classifications); low MI would in-
dicate a weak relationship. To identify, for each emotion,
the AUs that are strongly associated with performance, we
applied the following analysis for recognition accuracy and
human-likeness separately: We computed the MI between
each Action Unit (i.e., whether it is present or absent in the
culturally-derived facial expression model) and performance
(e.g., correct emotion classifications) by pooling the partic-
ipants' responses to the culturally-derived facial expressions
collected in B. Recognition of universal versus culturally-
derived facial expressions of emotion, resulting in 600 trials
per emotion category (30 models ×10 participants ×2
repetitions). We computed the MI for each Action Unit
except three that are present in 100% of the facial expres-
sion models – i.e., in happy, Lip Corner Puller (AU12)
and Dimpler (AU14); in surprise, Inner/Outer Brow Raiser
(AU1-2) – which therefore provide no variance to compute
MI. We established the statistical significance of high MI
values using a Monte Carlo simulation method by shuffling
the participants' responses 1000 times, computing MI for
each Action Unit at each iteration and using the random
distribution of MI values to identify the Action Units with
MI values that are significantly higher than chance (i.e.,
>95% of the distribution, uncorrected). All Action Units
with significantly high MI are displayed on color-coded face
maps in Fig. 5A for recognition accuracy and in Fig. 5B
for judgements of human-likeness. The color-coded matrices
next to the face maps indicate these Action Units in the first
column.
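A minimal sketch of this analysis (an assumption about the computation, not the authors' code) is shown below: MI is computed between a binary AU-presence variable and binary response accuracy, and significance is assessed against a permutation-based null distribution of the kind described above.

```python
# Minimal sketch (an assumption about the analysis, not the authors' code): MI between
# binary AU presence and binary accuracy, with a permutation (Monte Carlo) threshold.
import numpy as np

def mutual_info_discrete(x, y):
    """MI in bits between two discrete 1-D arrays, from their joint distribution."""
    xs, ys = np.unique(x), np.unique(y)
    joint = np.array([[np.mean((x == a) & (y == b)) for b in ys] for a in xs])
    px, py = joint.sum(axis=1, keepdims=True), joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def significant_aus(au_present, correct, n_perm=1000, alpha=0.05, seed=0):
    """au_present: (n_trials, n_aus) binary; correct: (n_trials,) binary accuracy.
    Returns a boolean mask of AUs whose MI exceeds the 95th percentile of the null."""
    rng = np.random.default_rng(seed)
    n_aus = au_present.shape[1]
    observed = np.array([mutual_info_discrete(au_present[:, j], correct)
                         for j in range(n_aus)])
    null = np.empty((n_perm, n_aus))
    for i in range(n_perm):
        shuffled = rng.permutation(correct)        # break the stimulus-response link
        null[i] = [mutual_info_discrete(au_present[:, j], shuffled)
                   for j in range(n_aus)]
    return observed > np.percentile(null, 100 * (1 - alpha), axis=0)  # uncorrected
```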
Certain Action Units could also improve recognition per-
formance based on their specific dynamic properties such
as high amplitude, early peak latency, or fast acceleration.
Fig. 5. Dynamic Action Unit patterns associated with high recognition accuracy (panel A) and high judgments of human-likeness (panel B). In each panel,
the face maps show the Action Units that are associated with high performance; the color-coded matrices also indicate any specific (unit interval) temporal
parameter values associated with performance (see legend at bottom). Action Units that further boost performance are indicated with white asterisks.
For example, panel A shows that in sad, Chin Raiser at high amplitude further boosts recognition accuracy. Panel B shows that in fear, judgments of
human-likeness are boosted by Mouth Stretch with medium peak latency.
To identify any such Action Units, we computed, for each
Action Unit separately, the MI between performance (e.g.,
correct versus incorrect emotion classifications) and each of
four main temporal parameters – amplitude, peak latency,
acceleration, and deceleration – using three levels of
temporal parameter values (high, medium, low). We estab-
lished statistical significance of high MI values for each
temporal parameter using a Monte Carlo simulation method
as described above. Action Unit dynamics with significantly
high MI are also displayed in the face maps shown in Fig.
5A for recognition accuracy and Fig. 5B for judgments of
human-likeness. Next, to specify the level of these dynamic
properties (i.e., high, medium, or low) we computed the
frequency of each level across the high-performance trials
(e.g., correct emotion classifications) and took the temporal
parameter level with the highest frequency. The results
are shown in Fig. 5A in the color-coded matrices, where
distinct colours indicate the value of each temporal parameter
significantly associated with high recognition accuracy (blue
– low [0.01, 0.4], green – medium [0.41, 0.8] and yellow
– high [0.81, 1] for the unit interval of each parameter;
see legends below). Together, these results show that for
each emotion, several specific Action Units and/or their
specific dynamic properties are strongly associated with high
recognition accuracy and judgments of human-likeness. For
example, for happy, the presence of Inner Brow Raiser (AU1)
and Cheek Raiser (AU6) is strongly associated with high
recognition accuracy. For surprise, the presence of Mouth
Stretch (AU27) is strongly associated with judgments of
human-likeness.
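The level coding described above can be sketched as follows (the bin edges are taken from the Fig. 5 legend; everything else is an illustrative assumption):

```python
# Sketch of the level coding used above: bin a unit-interval temporal parameter into
# low/medium/high and report the level most frequent on high-performance trials.
import numpy as np

def parameter_level(value):
    # Bin edges follow the legend: low [0.01, 0.4], medium [0.41, 0.8], high [0.81, 1]
    if value <= 0.40:
        return 'low'
    if value <= 0.80:
        return 'medium'
    return 'high'

def dominant_level(values_on_correct_trials):
    """Most frequent level across high-performance (e.g., correctly classified) trials."""
    labels = [parameter_level(v) for v in values_on_correct_trials]
    names, counts = np.unique(labels, return_counts=True)
    return names[int(np.argmax(counts))]
```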
E. Dynamic Action Units that further boost performance
Above, we identified the Action Units and their dynamic
properties that are strongly associated with (and therefore
important for) the correct classification of emotions and
judgments of human-likeness. As shown in Fig. 3, certain
facial expression models elicit particularly high performance.
To identify the specific face movements that further boost
performance, we first computed the MI between each Action
Unit and very high performance for each emotion separately.
High MI would indicate that an Action Unit boosts perfor-
mance. We defined very high performance as the accuracy
elicited by the top 25% of facial expression models in each
task – i.e., recognition accuracy and judgments of human-
likeness – separately. We established statistical significance
using the Monte Carlo method described above. These high-
performance Action Units are also displayed in the face maps
in Figs. 5A-B and indicated with white asterisks in the color-
coded matrices. Next, to identify and characterize the specific
dynamic Action Unit properties that boost performance, we
conducted a similar MI analysis as before. These Action
Units are also displayed in the face maps in Figs. 5A-B
with the color-coded matrices showing the specific level of
each temporal parameter. For example, in fear, Brow Lowerer
(AU4) boosts recognition accuracy, and Mouth Stretch with
medium peak latency boosts judgments of human-likeness.
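A sketch of the 'very high performance' labelling used in this analysis (assuming per-model accuracies are available as an array) is:

```python
# Sketch (assumption): label the top 25% of models per emotion as 'very high
# performance' for the boost analysis above.
import numpy as np

def very_high_performance_mask(model_accuracy):
    """model_accuracy: (n_models,) accuracy per facial expression model (e.g., 30 values).
    Returns True for models in the top quartile."""
    return model_accuracy >= np.percentile(model_accuracy, 75)
```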
IV. CONCLUSIONS
Here, we transferred a set of 30 culturally-derived dynamic
facial expression models of the six basic emotions to a pop-
ular social robot and compared the recognition accuracy and
human-likeness judgments of East Asian participants with
those of the social robot's existing standardized universal
facial expressions. Results show that these culturally-derived
dynamic facial expression models generally outperformed the
social robot's existing facial expressions on both recognition
accuracy and judgements of human-likeness. Further anal-
ysis of the facial expression models revealed that certain
Action Units and temporal dynamic properties drive high
performance on both recognition accuracy and judgements
of human-likeness. We also showed that the misclassification
of the anger facial expression models as disgust could be
due to shared Action Units – i.e., Upper Lip Raiser (AU10R
and AU10L) and Cheek Raiser Left (AU6L) – as shown
in Fig. 4B (see also, e.g., [27, 51, 52]) that could reflect
a latent signalling structure across emotion categories [17].
Identifying such common face movement patterns and those
that clearly distinguish specific emotion categories could
therefore better inform the design of social robots to further
enhance performance.
Together, our results highlight the advantage of using
culturally-sensitive dynamic facial expressions that are de-
rived from cultural perception using data-driven methods
over the theoretically-derived standardized facial expressions
of emotion that are currently in common use. We therefore
anticipate that our data-driven and user-centred approach
to modelling dynamic facial expressions will be used to
further diversify and refine the social face signal gener-
ation capabilities of social robots. Such facial expression
models could also be used, in conjunction with additional
information about cultural context, to improve the social
sensing capabilities of social robots. User-directed selection
of culturally-sensitive facial expressions in artificial
agents could also accommodate personal preferences, such as learning
culture-specific facial expressions. Together, we anticipate
that the use of data-driven approaches will further inform
the design of culturally-sensitive digital agents to improve
their performance, accessibility, and marketability within a
culturally diverse global market.
REFERENCES
[1] C. Darwin, The Expression of the Emotions in Man and Animals, 3rd
ed. London: Fontana Press, 1999/1872.
[2] P. Ekman, E. R. Sorenson, and W. V. Friesen, ”Pan-Cultural Elements
in Facial Displays of Emotion,” Science, vol. 164, pp. 86-88, April 4,
1969.
[3] P. Ekman and W. Friesen, ”A new pan-cultural facial expression of
emotion,” Motivation and Emotion, vol. 10, pp. 159-168, 1986.
[4] D. Matsumoto, ”American-Japanese Cultural Differences in the
Recognition of Universal Facial Expressions,” Journal of Cross-
Cultural Psychology, vol. 23, pp. 72-84, 1992.
[5] M. Biehl, D. Matsumoto, P. Ekman, V. Hearn, K. Heider, T. Kudoh,
et al., ”Matsumoto and Ekman's Japanese and Caucasian Facial Ex-
pressions of Emotion (JACFEE): Reliability Data and Cross-National
Differences,” Journal of Nonverbal Behavior, vol. 21, pp. 3-21, 1997.
[6] J. D. Boucher and G. E. Carlson, ”Recognition of Facial Expression
in Three Cultures,” Journal of Cross-Cultural Psychology, vol. 11, pp.
263-280, 1980.
[7] Kimiko Shimoda, M. Argyle, and P. R. Bitti, ”The intercultural
recognition of emotional expressions by three national racial groups:
English, Italian and Japanese,” European Journal of Social Psychology,
vol. 8, pp. 169-179, 1978.
[8] P. Ekman and W. V. Friesen, Manual for the Facial Action Coding
System. Palo Alto, CA: Consulting Psychologists Press, 1978.
[9] M. Pantic and L. J. M. Rothkrantz, ”Automatic analysis of facial
expressions: The state of the art,” IEEE Transactions on pattern
analysis and machine intelligence, vol. 22, pp. 1424-1445, 2000.
[10] N. Lazzeri, D. Mazzei, A. Greco, A. Rotesi, A. Lanatà, and D. E. De
Rossi, ”Can a humanoid face be expressive? A psychophysiological
investigation,” Frontiers in Bioengineering and Biotechnology, vol. 3,
p. 64, 2015.
[11] C. C. Bennett and S. Sabanovic, ”Deriving minimal features for
human-like facial expressions in robotic faces,” International Journal
of Social Robotics, vol. 6, pp. 367-381, 2014.
[12] E. G. Krumhuber and K. R. Scherer, ”The look of fear from the eyes
varies with the dynamic sequence of facial actions,” Swiss Journal of
Psychology, 2016.
[13] L. Cañamero and J. Fredslund, ”I show you how I like you – can you
read it in my face? [robotics],” IEEE Transactions on Systems, Man,
and Cybernetics – Part A: Systems and Humans, vol. 31, pp. 454-459,
2001.
[14] T. Hashimoto, S. Hiramatsu, T. Tsuji, and H. Kobayashi, ”Develop-
ment of the face robot SAYA for rich facial expressions,” in SICE-
ICASE, 2006. International Joint Conference, 2006, pp. 5423-5428.
[15] S. Al Moubayed, J. Beskow, G. Skantze, and B. Granström, ”Furhat: a
back-projected human-like robot head for multiparty human-machine
interaction,” in Cognitive Behavioural Systems, ed: Springer, 2012, pp.
114-130.
[16] T. Fong, I. Nourbakhsh, and K. Dautenhahn, ”A survey of socially
interactive robots,” Robotics and Autonomous Systems, vol. 42, pp.
143-166, 2003.
[17] R. E. Jack, O. G. B. Garrod, H. Yu, R. Caldara, and P. G. Schyns,
”Facial expressions of emotion are not culturally universal,” Proceed-
ings of the National Academy of Sciences, vol. 109, pp. 7241-7244,
2012.
[18] H. A. Elfenbein and N. Ambady, ”On the universality and cultural
specificity of emotion recognition: A meta-analysis,” Psychological
Bulletin, vol. 128, pp. 203-235, 2002.
[19] B. Mesquita and N. H. Frijda, ”Cultural variations in emotions: a
review,” Psychological Bulletin, vol. 112, pp. 179-204, Sep 1992.
[20] J. A. Russell, ”Is there universal recognition of emotion from facial
expression? A review of the cross-cultural studies,” Psychological
Bulletin, vol. 115, pp. 102-41, Jan 1994.
[21] L. Ducci, L. Arcuri, and T. Sineshaw, ”Emotion Recognition in
Ethiopia: The Effect of Familiarity with Western Culture on Accuracy
of Recognition,” Journal of Cross-Cultural Psychology, vol. 13, pp.
340-351, 1982.
[22] P. Ekman, ”Universals and cultural differences in facial expressions of
emotion,” presented at the Nebraska Symposium on Motivation, 1972.
[23] P. Ekman, W. V. Friesen, M. O'Sullivan, A. Chan, I. Diacoyanni-
Tarlatzis, K. Heider, et al., ”Universals and cultural differences in the
judgments of facial expressions of emotion,” Journal of Personality
and Social Psychology, vol. 53, pp. 712-7, Oct 1987.
[24] H. A. Elfenbein and N. Ambady, ”Universals and Cultural Differences
in Recognizing Emotions,” Current Directions in Psychological Sci-
ence, vol. 12, pp. 159-164, October 1, 2003.
[25] H. A. Elfenbein, M. Mandal, N. Ambady, S. Harizuka, and S.
Kumar, ”Hemifacial differences in the in-group advantage in emotion
recognition,” Cognition and Emotion, vol. 18, pp. 613-629, 2004.
[26] Y. Huang, S. Tang, D. Helmeste, T. Shioiri, and T. Someya, ”Dif-
ferential judgement of static facial expressions of emotions in three
cultures,” Psychiatry and clinical neurosciences, vol. 55, pp. 479-483,
2001.
[27] R. E. Jack, C. Blais, C. Scheepers, P. G. Schyns, and R. Caldara,
”Cultural confusions show that facial expressions are not universal,”
Current Biology, vol. 19, pp. 1543-1548, 2009.
[28] G. Kirouac and F. Y. Doré, ”Accuracy of the judgment of facial
expression of emotions as a function of sex and level of education,”
Journal of Nonverbal Behavior, vol. 9, pp. 3-7, 1985.
[29] G. Kirouac and F. Y. Doré, ”Accuracy and latency of judgment of
facial expressions of emotions,” Perceptual and Motor Skills, vol. 57,
pp. 683-686, 1983.
[30] D. Matsumoto and P. Ekman, ”American-Japanese cultural differences
in intensity ratings of facial expressions of emotion,” Motivation and
Emotion, vol. 13, pp. 143-157, 1989.
[31] F. T. McAndrew, ”A cross-cultural study of recognition thresholds for
facial expressions of emotion,” Journal of Cross-Cultural Psychology,
vol. 17, pp. 211-224, 1986.
[32] T. Shioiri, T. Someya, D. Helmeste, and S. W. Tang, ”Misinterpretation
of facial expression: A cross-cultural study,” Psychiatry and Clinical
Neurosciences, vol. 53, pp. 45-50, 1999.
[33] R. E. Jack, ”Culture and facial expressions of emotion,” Visual
Cognition, vol. 21, pp. 1248-1286, Sep 1 2013.
[34] N. L. Nelson and J. A. Russell, ”Universality revisited,” Emotion
Review, vol. 5, pp. 8-15, 2013.
[35] C. Chen, C. Crivelli, O. G. B. Garrod, P. G. Schyns, J.-M. Fernández-
Dols, and R. E. Jack, ”Distinct facial expressions represent pain and
pleasure across cultures,” Proceedings of the National Academy of
Sciences, 2018.
[36] C. Crivelli, J. A. Russell, S. Jarillo, and J.-M. Fernández-Dols, ”The fear
gasping face as a threat display in a Melanesian society,” Proceedings
of the National Academy of Sciences, vol. 113, pp. 12403-12407, 2016.
[37] J. Henrich, S. Heine, and A. Norenzayan, ”The weirdest people in the
world?,” Behavioral and Brain Sciences, vol. 33, pp. 61-83, 2010.
[38] R. E. Jack, C. Crivelli, and T. Wheatley, ”Data-driven methods
to diversify knowledge of human psychology,” Trends in Cognitive
Sciences, vol. 22, pp. 1-5, 2018.
[39] E. Krumhuber, A. S. Manstead, D. Cosker, D. Marshall, P. L. Rosin,
and A. Kappas, ”Facial dynamics as indicators of trustworthiness and
cooperative behavior,” Emotion, vol. 7, p. 730, 2007.
[40] G. Trovato, T. Kishi, N. Endo, K. Hashimoto, and A. Takanishi,
”A cross-cultural study on generation of culture dependent facial
expressions of humanoid social robot,” in International Conference
on Social Robotics, 2012, pp. 35-44.
[41] R. E. Jack, W. Sun, I. Delis, O. G. Garrod, and P. G. Schyns, ”Four
not six: Revealing culturally common facial expressions of emotion,”
Journal of Experimental Psychology: General, vol. 145, p. 708, 2016.
[42] H. Yu, O. G. B. Garrod, and P. G. Schyns, ”Perception-driven facial
expression synthesis,” Computers and Graphics, vol. 36, pp. 152-162,
2012.
[43] M. Rychlowska, R. E. Jack, O. G. Garrod, P. G. Schyns, J. D. Martin,
and P. M. Niedenthal, ”Functional smiles: Tools for love, sympathy,
and war,” Psychological Science, vol. 28, pp. 1259-1270, 2017.
[44] D. Gill, O. G. Garrod, R. E. Jack, and P. G. Schyns, ”Facial
movements strategically camouflage involuntary social signals of face
morphology,” Psychological Science, vol. 25, pp. 1079-1086, 2014.
[45] C. Chen, O. Garrod, P. Schyns, and R. Jack, ”The Face is the Mirror
of the Cultural Mind,” Journal of vision, vol. 15, pp. 928-928, 2015.
[46] C. Chen, O. G. Garrod, J. Zhan, J. Beskow, P. G. Schyns, and R. E.
Jack, ”Reverse Engineering Psychologically Valid Facial Expressions
of Emotion into Social Robots,” in Automatic Face and Gesture
Recognition (FG 2018), 2018 13th IEEE International Conference on,
2018, pp. 448-452.
[47] L. Ibrahimagic-Seper, A. Celebic, N. Petricevic, and E. Selimovic,
”Anthropometric differences between males and females in face di-
mensions and dimensions of central maxillary incisors,” Medicinski
glasnik, vol. 3, pp. 58-62, 2006.
[48] E. Hall, The Hidden Dimension. Garden City, NY: Doubleday, 1966.
[49] T. E. Nichols and A. P. Holmes, ”Nonparametric permutation tests
for functional neuroimaging: a primer with examples,” Human Brain
Mapping, vol. 15, pp. 1-25, 2002.
[50] T. M. Cover and J. A. Thomas, Elements of Information Theory. John
Wiley and Sons, 2012.
[51] S. Du, Y. Tao, and A. M. Martinez, ”Compound facial expressions
of emotion,” Proceedings of the National Academy of Sciences, p.
201322355, 2014.
[52] V. Shuman, E. Clark-Polner, B. Meuleman, D. Sander, and K. R.
Scherer, ”Emotion perception from a componential perspective,” Cog-
nition and Emotion, vol. 31, pp. 47-56, 2017.