Laughter Type Recognition from Whole Body
Motion
Harry J. Griffin, Min S. H. Aung, Bernardino Romera-Paredes, Ciaran McLoughlin, Gary McKeown, William Curran, Nadia Bianchi-Berthouze
UCL Interaction Centre, University College London, London, UK
School of Psychology, Queen's University Belfast, UK
Email: (harry.griffin, m.aung, ucabbro, ucjt511, n.berthouze)@ucl.ac.uk
Email: (G.McKeown, w.curran)@qub.ac.uk
Abstract—Despite the importance of laughter in social interac-
tions it remains little studied in affective computing. Respiratory,
auditory, and facial laughter signals have been investigated
but laughter-related body movements have received almost no
attention. The aim of this study is twofold: first an investigation
into observers’ perception of laughter states (hilarious, social,
awkward, fake, and non-laughter) based on body movements
alone, through their categorization of avatars animated with
natural and acted motion capture data. Significant differences
in torso and limb movements were found between animations
perceived as containing laughter and those perceived as non-
laughter. Hilarious laughter also differed from social laughter
in the amount of bending of the spine, the amount of shoulder
rotation and the amount of hand movement. The body movement
features indicative of laughter differed between sitting and
standing avatar postures. Based on the positive findings in this
perceptual study, the second aim is to investigate the possibility
of automatically predicting the distributions of observer’s ratings
for the laughter states. The findings show that the automated
laughter recognition rates approach human rating levels, with
the Random Forest method yielding the best performance.
Keywords: laughter, body movement, automatic emotion recog-
nition, automatic laughter type recognition, laughter type perception
I. INTRODUCTION
The increasing use of virtual agents and robots in enter-
tainment, collaborative, and support roles places ever greater
demands on their ability to detect users’ emotional state
from various modalities (body movements, facial expressions,
speech) and produce emotional displays. This is particularly
true in socially complex human-computer interactions such
as education, rehabilitation and health scenarios. In these
situations emotionally expressive agents are much preferred
by users [1].
Laughter is a ubiquitous and complex signal that remains
relatively uninvestigated, in contrast to studies on other emo-
tional expressions such as smiling [2]. Due to the range
of vocal and physical expressions of laughter, its detection
and synthesis are very challenging. Laughter does more than
express hilarity. It can convey negative and mixed emotions
and act as an invitation to shared expression [3]. At least
23 types of laughter have been identified (hilarious, anxious,
embarrassed, etc.) [4] with each laughter type having its own
social function. Hence, the ability to produce the appropriate
type and intensity of laughter in response to a user’s emotional
signals, including laughter, would be a dramatic step forward
in the realism and possibly efficacy of virtual agents.
There have been few studies on synthesizing laughter
in virtual agents, most of which have focused on acoustics
and the face [5], [6]. Urbain et al. present a laughter ma-
chine that is able to recognize laughter from sounds and
give a response [7]. The distinctive respiration patterns of
laughter have been widely corroborated [8] and integrated
into anatomically inspired models of laughter [9]. Recently,
Niewiadomski and Pelachaud investigated the coordination of
virtual agents’ laughter respiration behaviour with other visual
cues; however, this work is mainly based on hilarious laughter
[10]. A further difficulty for synthesis of laughter-related body
movements is that stereotypical laughter actions, e.g. clutching
one's abdomen, rocking back and forth, slapping one's leg, are
well known but may be seen as exaggerated and unnatural.
Work on automatic recognition of laughter has also started
to emerge but, as with the synthesis of laughter, has mostly
focused on the acoustic modality (e.g., [11]–[13]) and more
recently on the combination of face and voice cues [14]. Less
attention has been given to body laughter expressions. Whole-
body postural changes and peripheral gestures associated with
different types of laughter remain unelucidated. In [15], the
authors use electromyographic sensors to measure diaphrag-
matic activity to detect laughter in people watching television.
This is used to trigger laughter in nearby robotic dolls with
the aim of enhancing the user’s laughter.
More recently, there has been interest in creating automatic
classifiers able to differentiate laughter types. To this end,
motion descriptors based on energy estimates, correlation of
shoulder movements and periodicity to characterise laughter
have been investigated [16]. Using a combination of these
measures a Body Laughter Index (BLI) was calculated. The
BLIs of 8 laughter clips were compared with 8 observers’
ratings of the energy of the shoulder movement. A correlation,
albeit weak, between the observers’ ratings and BLIs was
found.
There has been growing evidence supporting the possibility
of automatically discriminating between different emotions
from various modalities: acoustics [17], facial expressions [18]
and body [19]–[23]. Moreover, the study in [24] went further
in trying to characterize different types of laughter. They
investigated automatic discrimination of five types of acted
laughter: happiness, giddiness, excitement, embarrassment and
hurtful. Actors were asked to enact these five emotions using
both vocal and facial expressions whilst they were video-
recorded. The video clips were labelled by expert observers
who were also made aware of the intention of the actors. The
results showed that automatic recognition based only on the
vocal features reached higher accuracy (70% correct recognition)
than when using both facial and vocal features (60% correct
recognition) or facial features alone (40% correct recognition).
While, on the basis of these results, the authors argue that
vocal expressions carry more emotional information than facial
expressions, it should be noted that the actors were asked to try
to keep the head as still as possible so that it was always frontal
to the video camera. This may have constrained and limited
the way people expressed their laughter through their faces and
head movements. In addition, the fact that the expressions were
acted also raises the question of how naturalistic they were.
One could argue that we are better at acting an expression
through our voice since we can hear it, while we cannot see
our face. This is particularly true when the actors are not
professionals but lay people.
In this study we investigate perception of laughter type
from body movements and lay the groundwork for laughter
type recognition from these cues. This study makes two
contributions: first, by identifying body movements that are
perceived as indicative of different types of natural laughter,
it informs more convincing animation of laughter in avatars,
which will increase their perceived conversational authenticity
and emotional range. Second, it investigates if it is possible to
automatically discriminate between different types of laughter
by comparing a wide range of automated recognition methods.
II. MOTION DATA COLLECTION
Users’ perception of laughter-related body movements was
investigated in a forced-choice perceptual experiment. Body
movements captured during different types of natural and acted
laughter were used to animate an avatar. Observers categorized
the animations as hilarious, social, awkward, fake, or non-
laughter. Naive observers’ categorizations were used to allow
analysis of the perception of body movements in the absence
of other modalities, e.g., verbal and facial, and in the absence of
knowledge of the eliciting stimulus and context.
A. Laughter Capture
Nine pairs of participants took part in a motion capture
recording session. The movements of one member of each pair
(subjects - 3 male, 6 female, mean age 25.7) were captured
using a whole-body inertial motion capture suit (Animazoo
IGS-190). The suit was modified to maximize the sensitivity to
spine and shoulder movements. Tasks to elicit laughter in both
standing and sitting postures included word games, collabora-
tive games (Pictionary) and humorous videos [25]. Laughter
also occurred during conversation in “rest” periods. The
subjects also produced fake laughter on request.
B. Stimulus Preparation
Using video recordings of the motion capture session, we
segmented laughter episodes and gave them preliminary la-
bels: hilarious; social (back-channeling, polite, conversational
laughter); awkward (involving a negative emotion such as embarrassment or discomfort on another's behalf); or fake.
Fig. 1. Example stills from the animated avatars
In total, 508 laughter segments and 41 randomly located non-
laughter segments, some containing other behaviour such as
talking, were identified. The motion capture data from these
segments were used to animate an avatar defined by the
positional co-ordinate triplets of 26 anatomical points over the
whole body. The anatomical proportions were the same for all
animations (Figure 1). Viewing angle was standardized to a
slightly elevated ¾ viewpoint, although models were free to
walk and turn in the standing tasks. One hundred and twenty-
six animations (experimenter labels: 34 hilarious, 43 social,
16 awkward, 19 fake, 14 non-laughter - mean duration =
4.1s, SD = 1.8s) were selected as stimuli for the perceptual
phase (non-laughter animations were chosen randomly from the previously identified segments, with durations within the range of durations of the laughter animations). This ratio of laughter types
according to experimenter-determined labels was designed
to match the frequency of laughter-types in a naturalistic
database [4]. Note that the level of agreement between the
experimenter-determined labels and observers’ categorization
is not of interest here; rather we wished to establish which body
movements are perceived by the observers as indicative of
different laughter types. Therefore this distribution of stimuli
by experimenter-determined labels was implemented only with
the aim of producing sufficient segments in each observer-
determined category to allow valid statistical analysis of body
movement. The observers’ categorisations act as our ground
truth and the experimenter-determined labels are not used in
the analysis.
III. PERCEPTUAL STUDY
A. Body Feature Analysis
Thirty-two observers (17 male, 15 female, mean age 33.0)
viewed the clips of the animated avatar in random order and
categorized each clip as hilarious, social, awkward, fake or
non-laughter. No audio was presented with the animations.
The modal laughter category selected by the observers
acted as the ground truth for the statistical analysis of body
movement features [19]. The number of potential movement
features that can be analyzed is large and increases exponen-
tially if the interactions of multiple features are considered in
combination. Therefore, our selection of features was based on
previous findings in the literature [9], [26] and observers’ com-
ments in post-experiment interviews on which features they
found useful in categorizing laughter. These included postural
changes such as bending of the spine and gestures such as
moving a hand toward the face or abdomen (Table I). Feature
analysis was based on the position coordinate triplets of the
relevant anatomical nodes.

TABLE I. LIST OF KNOWLEDGE-BASED FEATURES TO BE ANALYSED.
Hands/gesture
  Maximum, minimum and range of distance between hands
  Maximum, minimum and range of distance of left hand from hip
  Maximum, minimum and range of distance of right hand from hip
  Maximum, minimum and range of distance of left hand from head
  Maximum, minimum and range of distance of right hand from head
Shoulder movement
  Correlation of left and right shoulder-hip distances
  Range of azimuthal shoulder rotation
Spine and neck bending
  Maximum, minimum and range of upper back bending
  Maximum, minimum and range of lower back bending
  Maximum, minimum and range of neck bending
  Maximum, minimum and range of compound spine bending

Maximum and minimum bending were calculated as the greatest and smallest deviation, respectively, from collinearity of the spine sections adjacent to the node
in question. Range of bending was calculated as maximum
bending minus minimum bending. Bending was calculated at
each spine node including the neck, and collectively across
all spine nodes (compound bending), defined as the sum of
deviation from collinearity of all spine sections. Distances were
calculated as Euclidean distances in 3D space. The features for
hilarious, social and non-laughter segments were entered into
separate one-way ANOVAs for standing and sitting segments
(the independent variable was the modal observer categoriza-
tion). Planned comparisons tested differences between laughter
and non-laughter (hilarious and social vs. non-laughter) and
between laughter types (hilarious vs. social).
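As an illustration of how the knowledge-based features in Table I could be derived from the 26-point coordinate data, the following is a minimal Python sketch; it is not the authors' implementation, and the array layouts and node ordering are assumptions.

```python
import numpy as np

def compound_spine_bending(spine_xyz):
    """Sum of deviations from collinearity over all adjacent spine sections.

    spine_xyz: (n_nodes, 3) array of spine/neck node positions for one frame,
    assumed ordered from pelvis to head. Returns compound bending in radians.
    """
    bending = 0.0
    for k in range(1, len(spine_xyz) - 1):
        a = spine_xyz[k] - spine_xyz[k - 1]      # lower section
        b = spine_xyz[k + 1] - spine_xyz[k]      # upper section
        cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        bending += np.arccos(np.clip(cos_angle, -1.0, 1.0))  # deviation from collinearity
    return bending

def distance_range_features(frames_a, frames_b):
    """Max, min and range of the Euclidean distance between two nodes over a segment.

    frames_a, frames_b: (n_frames, 3) position arrays, e.g. left hand and head.
    """
    d = np.linalg.norm(frames_a - frames_b, axis=1)
    return d.max(), d.min(), d.max() - d.min()
```

Per-segment maxima, minima and ranges of compound bending follow by applying the first function frame by frame and summarising the resulting time series.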
B. Ground Truth from Observer Categorization
The mean number of observers who selected the modal
category was 13.8 (SD = 4.3) with a maximum agreement of
29 of the 32 observers. Segments tied for the modal category
were excluded from the body movement analysis, as were
segments for which the modal category was selected by less than 1/3 of observers (< 11/32). For all experimenter-defined labels, the most common observer categorization was social or non-laughter. Too few awkward (N = 4) and fake (N = 1) segments remained, so these were excluded from further analysis. Ninety-
one segments (52 standing; 39 sitting) were entered into the
final analysis of body movements.
C. Body Movements
For sitting laughter, ANOVAs revealed main effects of
observer categorization on the range of distance between the
hands, and the range of both hands’ distance from the head
and hip (all F(2, 36) > 7, p ≤ .003); the range of azimuthal
shoulder rotation (F (2, 36) = 10.04, p < .001); the range of
bending at all spine and neck nodes and of compound spine
bending (all F (2, 36) > 11, p < .001); and the minimum
bending at the upper back and neck (both F (2, 36) > 4.5,
p < .02).
For all of these features, planned contrasts revealed significantly greater activity in laughter than non-laughter segments (all |t| > 2.5, p < .02). Planned comparisons also revealed greater range of distances of both hands from the hip and of the left hand from the head in hilarious than social laughter (all |t| > 2, p < .04); and a greater range of azimuthal shoulder rotation, greater range of bending at all spine and neck nodes and a greater range of compound spine bending in hilarious than social laughter (all |t| > 2, p < .05).
For standing laughter, ANOVAs revealed main effects of
observer categorization on the range of distance between the
hands, the range of both hands’ distance from the head and
hip, the maximum distance of both hands from the hip and
the minimum distance of the right hand from the head (all
F (2, 49) > 3, p < .05); the range of bending and the
maximum bending of upper and lower back and compound
spine bending (all F (2, 49) > 3, p < .05).
Planned comparisons of these effects revealed greater range of hand-to-hand, hand-to-head, and hand-to-hip distances for both hands in laughter than non-laughter segments, and the range of right-hand-to-hip distances was greater in hilarious than social laughs (all |t| > 2.5, p < .02); both hands moved further from the hip and the right hand moved closer to the head in laughter than non-laughter segments (all |t| > 3, p < .05); the range of upper, lower and compound spine bending was greater for laughter than non-laughter segments and the range of upper and compound spine bending was greater for hilarious than social laughs (all |t| > 2, p < .05); in addition the maximum compound spine bending was greater in laughter than non-laughter segments (|t| > 2.46, p = .018).
IV. AUTOMATIC RECOGNITION
The second aim in this study is to investigate the possibility
of automatically predicting the distributions of observers’
ratings for the five types of laughter. The relative performances
of a broad range of supervised machine learning methods are
tested. In this part of the study we consider the distributions
of the ratings from all 32 observers. This leads to a 5-
output regression problem. If the frequencies of these ratings
are normalised the values can be viewed as a degree of
belief for each outcome and we also preserve a measure of
observer agreement for each instance. This also removes the need to treat the most frequent label as the ground truth, which is a weak assumption for instances with low agreement.
Moreover, this will also allow for the full set of 126 instances
to be used. The knowledge-based features listed in Table I
serve as part of the full feature set for recognition. We also
include kinematically derived motion quantities analogous to
the amount of energy expended. It has been shown that kinetic
energy measures can contribute to the detection of laughter
[16]. For three dimensional motion data a measure analogous
to kinetic energy can be compactly calculated using the sum
of the angular velocity at each joint over each laughter
segment [22]. Therefore, in the full feature set we also include
the energy from five upper body articulations: left and right
elbows, left and right shoulders and neck. Initial experiments
showed a low degree of variance in the lower body joints for
this dataset, so these were excluded.
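A minimal sketch of how such an energy-like measure could be computed per joint, assuming joint angles are available from the capture suit and taking the sum of angular-speed magnitudes over the segment; the exact angle representation is an assumption, not the authors' specification.

```python
import numpy as np

def joint_energy(joint_angles, dt):
    """Energy-like measure for one joint over a laughter segment.

    joint_angles: (n_frames,) or (n_frames, 3) array of joint angles in radians
    (representation assumed); dt: sampling interval in seconds.
    Returns the sum of angular-speed magnitudes over the segment.
    """
    angular_velocity = np.diff(joint_angles, axis=0) / dt
    if angular_velocity.ndim == 1:
        speeds = np.abs(angular_velocity)
    else:
        speeds = np.linalg.norm(angular_velocity, axis=1)
    return speeds.sum()

# One energy feature per upper-body articulation, e.g. (hypothetical keys):
# energies = [joint_energy(angles[j], dt) for j in
#             ("l_elbow", "r_elbow", "l_shoulder", "r_shoulder", "neck")]
```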
A. Supervised Learning Models
Formally the problem consists of a set of $T = 5$ supervised regression tasks, one for each type of laughter (including 'non-laughter'). We denote by $x^i \in \mathbb{R}^d$ the vector of attributes describing instance $i$. We define the matrix of all of the training instances as $X \in \mathbb{R}^{d \times m}$, where $m$ is the number of training instances and $d$ is the dimensionality of the data. A distinct label $y_t^i$ is provided for each task $t \in \{1, \ldots, T\}$ and instance $i$, taken from the frequency of observations. We denote by $Y_t \in \mathbb{R}^m$ the vector of labels for task $t$ over all instances. We also denote the corresponding model prediction as $\hat{y}_t^i$.
a) k-Nearest Neighbour (kNN): This is a simple model which assigns the value of the predicted output based on the $K$ nearest training instances in the data space. We attain the necessary multiple-outcome vector by using the means of the labels from the $K$ nearest neighbours $N_K(x) \subseteq \{1, 2, \ldots, m\}$ of a given instance $x$. For a test instance $x$, the prediction is calculated by
$$\hat{y}_t = \frac{1}{K} \sum_{i \in N_K(x)} y_t^i.$$
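A minimal sketch of this multi-output kNN prediction, assuming Euclidean distance and numpy arrays; array names are placeholders.

```python
import numpy as np

def knn_predict(X_train, Y_train, x, k):
    """Predict a label distribution as the mean over the k nearest training instances.

    X_train: (m, d) feature matrix; Y_train: (m, T) normalised rating frequencies;
    x: (d,) test instance. Euclidean distance is assumed as the metric.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Y_train[nearest].mean(axis=0)   # (T,) predicted degrees of belief
```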
b) Multi-Layer Perceptron with Softmax (MLP): The MLP is a widely used feed-forward neural network that can be naturally applied to learn multiple regression tasks. For our purposes we further constrain the sum of the network outputs to 1 by using the softmax activation function [27]. This is an extension of the logistic function, given by:
$$\hat{y}_t^i = \frac{\exp(q_t^i)}{\sum_{s=1}^{T} \exp(q_s^i)},$$
where $q_t^i$ is the activation value of the output node for task $t$ and input $i$.
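For reference, the softmax constraint on the output layer can be sketched as follows (a numerically stabilised numpy sketch of the activation only, not of the trained network).

```python
import numpy as np

def softmax(q):
    """Map output-node activations q (length T) to predictions that sum to 1."""
    q = q - q.max()          # subtract the max for numerical stability
    e = np.exp(q)
    return e / e.sum()
```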
c) Random Forest (RF): We also investigate the use of the Random Forest algorithm [28] to generate an ensemble of decision trees, using the mean of the ensemble as the final outcome. Each of these trees only has access to a set of $\delta$ attributes, randomly chosen when the tree is created. In the experiments conducted here, we have set the number of trees to 500, and the number of attributes considered for each tree to $\delta = \lfloor\sqrt{d}\rfloor = 5$, as suggested in [29].
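A sketch of this configuration using scikit-learn's RandomForestRegressor on synthetic placeholder data; note that scikit-learn samples the feature subset at each split rather than once per tree, so this only approximates the setting described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 34))                 # placeholder feature matrix
Y_train = rng.dirichlet(np.ones(5), size=100)        # placeholder rating distributions
X_test = rng.normal(size=(20, 34))

# 500 trees; max_features="sqrt" approximates delta = floor(sqrt(d)).
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt")
rf.fit(X_train, Y_train)
Y_pred = rf.predict(X_test)                          # rows are predicted degrees of belief
```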
d) Linear and Kernel Ridge Regression (RR, KRR): This is a baseline regression approach. In the linear form, RR is based on solving the optimization problem
$$\min_{w_t} \left\| X^{\top} w_t - Y_t \right\|_2^2 + \lambda \left\| w_t \right\|^2,$$
where $w_t$ represents the weight vector of the linear model $f_t(x) = \langle w_t, x \rangle$, $x, w_t \in \mathbb{R}^d$, for task $t \in \{1, \ldots, T\}$. For convenience we denote by $\|\cdot\|_2$ the $\ell_2$-norm of a vector. One can extend this approach to non-linear models by applying the kernel trick. In this case we have chosen the Gaussian kernel $K(x, t) = \exp\left(-\frac{1}{\sigma^2}\|x - t\|_2^2\right)$.
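A sketch of the linear and kernel variants using scikit-learn (Ridge and KernelRidge) on synthetic placeholder data; alpha plays the role of $\lambda$ and gamma corresponds to $1/\sigma^2$ for the Gaussian (RBF) kernel. Parameter values here are placeholders, not the tuned values.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 34))
Y_train = rng.dirichlet(np.ones(5), size=100)
X_test = rng.normal(size=(20, 34))

rr = Ridge(alpha=1.0).fit(X_train, Y_train)                                   # linear RR
krr = KernelRidge(alpha=1.0, kernel="rbf", gamma=0.1).fit(X_train, Y_train)   # Gaussian kernel
Y_rr, Y_krr = rr.predict(X_test), krr.predict(X_test)
```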
e) Linear and Kernel Support Vector Regression (SVR, KSVR): Finally we implement Support Vector Regression to predict the degree of belief of each laughter type based on the frequency of the ratings for each instance. In the linear form, SVR is based on the optimization of the following problem:
$$\min_{w_t} \; \frac{1}{2}\|w_t\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad \left| y_t^i - w_t^{\top} x^i \right| \le \varepsilon + \xi_i, \;\; \xi_i \ge 0.$$
Here, $\varepsilon \ge 0$ is the deviation allowed from the ground truth labels $y_t^i$. This constraint is relaxed at some points by adding an extra margin $\xi_i$; deviations larger than $\varepsilon$ are penalized through the second parameter $C \ge 0$. As with KRR, a non-linear variant, KSVR, is also used in the comparison, also employing the Gaussian kernel.
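A sketch of the SVR/KSVR setup with one $\varepsilon$-SVR fitted per laughter type, using scikit-learn on synthetic placeholder data; the C and epsilon values are placeholders, not the tuned values used in the experiments.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 34))
Y_train = rng.dirichlet(np.ones(5), size=100)
X_test = rng.normal(size=(20, 34))

# One epsilon-SVR per laughter type (linear kernel for SVR; use kernel="rbf" for KSVR).
svr_models = [SVR(kernel="linear", C=1.0, epsilon=0.05).fit(X_train, Y_train[:, t])
              for t in range(Y_train.shape[1])]
Y_pred = np.column_stack([m.predict(X_test) for m in svr_models])
```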
B. Evaluation Metrics
In order to robustly evaluate the multiple outcomes of the models against the distribution of the observers' categorisations, as suggested in [23], we apply four well-established multi-score metrics over a number of instances $M$:
1) Mean Square Error: this is the standard loss function, computed as:
$$\mathrm{MSE} := \frac{1}{MT} \sum_{i=1}^{M} \sum_{t=1}^{T} \left( y_t^i - \hat{y}_t^i \right)^2$$
2) Cosine Similarity: finds the cosine of the angle between two vectors, resulting in a maximum of 1 when the vectors are fully aligned:
$$\mathrm{CS} := \frac{1}{M} \sum_{i=1}^{M} \frac{y^{i\top} \hat{y}^i}{\|y^i\|_2 \, \|\hat{y}^i\|_2}$$
3) Top Match Rate: evaluates the number of times the predicted top-ranked label is the same as the top-ranked label of the ground truth:
$$\mathrm{TMR} := \frac{1}{M} \sum_{i=1}^{M} \mathbf{1}\left\{ \operatorname*{argmax}_{1 \le t \le T} y_t^i = \operatorname*{argmax}_{1 \le t \le T} \hat{y}_t^i \right\}$$
where $\mathbf{1}_A$ is the indicator function of condition $A$:
$$\mathbf{1}_A = \begin{cases} 1, & A \text{ is true} \\ 0, & A \text{ is false.} \end{cases}$$
4) Ranking Loss: this metric calculates the average fraction of label pairs that are reversely ordered for an instance. Ordering the label outcomes as $y_{l_1}^i \ge y_{l_2}^i \ge \ldots \ge y_{l_T}^i$, the ranking loss of the predicted outputs can be calculated by:
$$\mathrm{RL} := \frac{1}{M} \sum_{i=1}^{M} \frac{\sum_{j=1}^{T} \sum_{k=j+1}^{T} \mathbf{1}\left\{ \hat{y}_{l_j}^i < \hat{y}_{l_k}^i \right\}}{T(T-1)/2}$$
where $\mathbf{1}_A$ is the same indicator function on condition $A$ as for TMR.
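The four metrics can be implemented directly from the definitions above; the following numpy sketch assumes Y and Y_hat are (M, T) arrays of true and predicted rating distributions (array names are placeholders).

```python
import numpy as np

def mse(Y, Y_hat):
    """Mean square error over M instances and T tasks."""
    return np.mean((Y - Y_hat) ** 2)

def cosine_similarity(Y, Y_hat):
    """Mean cosine of the angle between true and predicted rating vectors."""
    num = np.sum(Y * Y_hat, axis=1)
    den = np.linalg.norm(Y, axis=1) * np.linalg.norm(Y_hat, axis=1)
    return np.mean(num / den)

def top_match_rate(Y, Y_hat):
    """Fraction of instances whose top-ranked label is predicted correctly."""
    return np.mean(np.argmax(Y, axis=1) == np.argmax(Y_hat, axis=1))

def ranking_loss(Y, Y_hat):
    """Average fraction of label pairs that the prediction orders in reverse."""
    M, T = Y.shape
    loss = 0.0
    for i in range(M):
        order = np.argsort(-Y[i])      # labels ordered by true value, descending
        inverted = sum(Y_hat[i, order[j]] < Y_hat[i, order[k]]
                       for j in range(T) for k in range(j + 1, T))
        loss += inverted / (T * (T - 1) / 2)
    return loss / M
```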
C. Recognition Results
We implement and evaluate all of the models outlined
above using a leave-one-subject-out (LOSO) validation ap-
proach. This ensures instances from the same subject are not
present in training, validation and test sets at the same time.
We split the subjects into three groups: n training subjects,
1 validation subject to tune model parameters and 1 testing
subject to assess performance. For each model this procedure
is repeated 72 times (9 test subjects ×8 validation subjects,
accounting for all combinations) and the average results are
reported. Parameter values were tuned over a set range for
each of the models; the appropriate ranges were determined
in initial experiments. The parameters adjusted are as follows:
for kNN: $k$; RR: $\lambda$; SVR: $C$; KSVR: $C$, $\sigma$; KRR: $\lambda$, $\sigma$; and MLP: $n_{\mathrm{hidden}}$ (the number of hidden layer nodes).
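A sketch of this 9 × 8 = 72-run LOSO scheme; fit_and_score is a hypothetical user-supplied callable that trains on the training subjects, tunes on the validation subject and returns the metric scores on the test subject.

```python
from itertools import permutations
import numpy as np

def loso_runs(subject_ids, fit_and_score):
    """Run all (test, validation) subject pairs of a LOSO evaluation.

    subject_ids: iterable of the subject labels (9 here); fit_and_score(train, val, test)
    returns the metric scores for one run. Returns mean and standard deviation over runs.
    """
    subjects = list(set(subject_ids))
    scores = []
    for test_s, val_s in permutations(subjects, 2):   # 9 x 8 = 72 runs
        train_s = [s for s in subjects if s not in (test_s, val_s)]
        scores.append(fit_and_score(train_s, [val_s], [test_s]))
    return np.mean(scores, axis=0), np.std(scores, axis=0)
```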
Table II compares the performances of all of the models using the four multi-score metrics. The results show the mean (and standard deviation) of each measure over the 72 runs.

TABLE II. COMPARISON OF RECOGNITION PERFORMANCES. ↑ INDICATES HIGHER VALUES CORRESPOND TO BETTER PERFORMANCE AND ↓ INDICATES THE OPPOSITE. THE FIRST SEVEN ROWS CORRESPOND TO THE AUTOMATIC RECOGNITION MODELS; THE LAST ROW (IR) INDICATES THE MEAN LEVEL OF AGREEMENT BETWEEN OBSERVER GROUPS.

        MSE ↓            CS ↑             TMR ↑            RL ↓
k-NN    0.0151 (0.0041)  0.8825 (0.0300)  0.5272 (0.1658)  0.2998 (0.0517)
RR      0.0142 (0.0030)  0.8892 (0.0242)  0.4935 (0.2175)  0.2942 (0.0800)
KRR     0.0145 (0.0037)  0.8871 (0.0287)  0.5054 (0.2026)  0.2972 (0.0700)
SVR     0.0148 (0.0040)  0.8837 (0.0350)  0.4967 (0.2070)  0.3005 (0.0879)
KSVR    0.0149 (0.0039)  0.8842 (0.0302)  0.4815 (0.1965)  0.2989 (0.0791)
MLP     0.0192 (0.0066)  0.8536 (0.0450)  0.4837 (0.2112)  0.3195 (0.0668)
RF      0.0101 (0.0036)  0.9205 (0.0250)  0.6620 (0.1665)  0.2648 (0.0467)
IR      0.0217 (0.0032)  0.9457 (0.0081)  0.8489 (0.0291)  0.1003 (0.0092)

In order to understand how informative the form features (Table I) are on their own, we also tested the models when trained without the five energy-based features. The results showed similar but reduced performances in comparison to the ones reported in Table II. For example, the best performing scores without energy features were again for the RF model, with MSE: 0.0106, CS: 0.9163, TMR: 0.662, RL: 0.2712. This demonstrates the dis-
criminatory power of the form features between laughter types.
This supports previous results showing the importance of form
in affective body expression recognition [30]. In addition, we
also seek to understand the level of agreement between human
observer groups. This calculation would provide a quantitative
context when assessing the rates given in Table II. Using
a simplified version of the approach proposed in [20], the
raters were split randomly into two groups of 16 and the
collective predictions of each group were computed. The same
four measures used for evaluating the systems were applied
to measure the agreement between these two predictions. We
repeated this process 10000 times and computed the averages
(and standard deviation). The results are reported in the last
row of Table II as IR. We can see that the results obtained
for the models are very similar to the inter-rater agreement measures for MSE and CS but are worse for TMR and RL.
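A sketch of this inter-rater agreement estimate, assuming a (n_instances, n_raters) matrix of integer category labels and one of the metric functions defined earlier (e.g. cosine similarity); the procedure below is an illustration, not the authors' code.

```python
import numpy as np

def inter_rater_agreement(ratings, metric, n_repeats=10000, seed=0):
    """Estimate agreement between two random halves of the observer pool.

    ratings: (n_instances, n_raters) integer category labels in {0, ..., T-1};
    metric: callable taking two (n_instances, T) distributions.
    Returns the mean and standard deviation of the metric over the repeats.
    """
    rng = np.random.default_rng(seed)
    n_instances, n_raters = ratings.shape
    T = int(ratings.max()) + 1
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n_raters)
        halves = []
        for idx in (perm[: n_raters // 2], perm[n_raters // 2:]):
            counts = np.stack([np.bincount(ratings[i, idx], minlength=T)
                               for i in range(n_instances)])
            halves.append(counts / counts.sum(axis=1, keepdims=True))
        scores.append(metric(halves[0], halves[1]))
    return np.mean(scores), np.std(scores)
```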
Table III shows the F1-score and accuracy of the classi-
fications for each laughter type from each of the models by
assuming the most frequent observer label as the ground truth
and the highest model output as the prediction. This can be
viewed as treating the data as a classification problem. Within
the 126 instances there were only 6 instances where ’awkward’
was the most frequent label and 5 instances for ’fake’, whereas
the number of instances for ’hilarious’, social’, and ’non-
laughter’ were 25, 46, and 44 respectively. Moreover, for some
of the subjects these classes do not occur if ground truth
is considered in this way. Since we use LOSO classification
performance can not be measured, therefore we show the F1
and accuracy scores for the remaining classes in Table III.
V. DISCUSSION AND CONCLUSION
In this section we discuss the findings from the perceptual
study and the investigation into automated recognition.
TABLE III. F1-SCORE (TOP) AND ACCURACY (BOTTOM) FOR EACH MODEL BASED ON THE MOST FREQUENT OBSERVER LABELS FOR THE THREE CATEGORIES WITH A SIGNIFICANT NUMBER OF INSTANCES.

        Hilarious  Social   Not a Laugh
k-NN    0.5941     0.3864   0.5498
        0.6000     0.3397   0.6818
RR      0.5253     0.2287   0.5875
        0.5700     0.1712   0.7869
KRR     0.5268     0.2744   0.6068
        0.5900     0.2174   0.7585
SVR     0.5103     0.2555   0.5864
        0.5600     0.1902   0.7813
KSVR    0.4840     0.2894   0.5676
        0.4900     0.2418   0.7273
MLP     0.4175     0.3797   0.5995
        0.4050     0.3261   0.6932
RF      0.5636     0.5562   0.7441
        0.6200     0.5516   0.8011
Analysis based on observer categorization of avatar anima-
tions revealed diagnostic body movement features for laughter
perception. The importance of spine movements in sitting
and standing postures may reflect observers’ sensitivity to
the respiratory movements that generate characteristic laughter
vocalizations and cause the spine to bend [9]. Similarly, that
hilarious laughter had a greater range of spine bending than
social laughter may be due to the energetic nature of hilarious
laughter relative to more controlled social laughter.
The range of azimuthal shoulder rotation was greater in
laughter than non-laughter in the sitting but not standing pos-
ture. When standing, models were free to turn, whereas in the
more constrained sitting condition shoulder rotation may have
been indicative of an energetic laughter episode. Alongside
the findings on spine bending, this hints that greater upper
body movement may indicate laughter. It would be counter-intuitive, however, for any large upper body movement to indicate laughter, so observers' perception of laughter compared to energetic non-laughter movements, e.g. coughs, should be investigated.
The range of distance between the hands was greater
in laughter than non-laughter segments, also indicating dis-
crimination based on the overall amount of movement. An
alternative explanation is the presence of specific gestures
such as pointing to laughter-eliciting stimuli. Standing laughter
segments had a smaller minimum right hand to head distance
than those categorized as non-laughter, suggesting that moving
the hand near or onto the face was seen as indicative of
laughter. This is of particular interest, since this gesture is
incidental to the core process of laughing; however, the timing
of this gesture may be crucial in conveying the presence and
nature of laughter and such temporal factors merit further
investigation. For example, the study reported in [31] shows that local temporal dynamics improve the automatic discrimination between affective body expressions.
There was insufficient consensus on awkward and fake
laughter to draw conclusions on body movements indicative
of these laughter types. These laughter types may be too emo-
tionally and socially complex, or too infrequent in real life, for
observers to have a clear mental model of the body movements
associated with them. Alternatively these types of laughter may
be indistinguishable, on the basis of body movements alone,
from hilarious or social laughter, or from non-laughter speech.
Further information, such as vocalizations, facial expressions,
and context may be necessary for observers to disambiguate
them.
Although we optimized capture of shoulder and spine
movement, the avatar animations were unable to show non-
rigid deformation of the avatar sections (shoulder movement
was shown through relative movements of rigid sections).
Non-rigid deformations of the torso from respiratory action
may be important in animating naturalistic laughter [9]. In
addition our equipment did not capture hand gestures so
the precise nature of arm and hand movements may have
been ambiguous to observers, for example, they may have
been unable to distinguish a pointing gesture from a palm-up
gesture. Annotation of the video recordings of these sessions
in future will identify meaningful gestures and, when these can
be animated, allow us to analyse their contribution towards the
perception of different laughter types.
Ultimately the capture of body movements using more
accessible technology e.g., Microsoft Kinect, will make laugh-
ter detection ubiquitous in interactive systems. Our findings
suggest that torso bending movements, possibly driven by
respiratory actions, and peripheral gestures are used by ob-
servers to detect and classify laughter, and that these should
be included when animating laughter. The resting posture, e.g.
sitting vs. standing, should also be considered as it affects
laughter-diagnostic movements, e.g., shoulder rotation. Future work should cover more complex laughter, e.g. awkward, that we were unable to reliably elicit in this study. The sex, age, cultural background and personality of the laugher and ob-
server should also be further considered, for example, laughter
produced by extroverts and introverts may vary and specific
attitudes towards laughter may affect the perception of the
emotional content of the laughter. Some of these factors have
been investigated in [32] using the same set of body laughter
stimuli used in our study. The role of body movements may be
more complex in multimodal displays than in this uni-modal
study and our findings should be validated with simultaneous
facial and audio information to establish their applicability in
functional human-avatar interactions. The temporal dependen-
cies of laughter signals between these modalities and within the
body-movement channel will need to be carefully considered in
these scenarios as the perceived emotional content of laughter
may be strongly dependent on the order, duration and temporal
profile, e.g. onset and offset speed, of these signals.
The results on automatic recognition (Tables II and III) demonstrate the effectiveness of the non-parametric RF model. The relatively poorer performance of the parametric models could be partially explained by the LOSO validation process used to tune the model parameters. Recalling that LOSO separates the training, validation and test sets by subject, this suggests that these models may have been prone to idiosyncratic effects during tuning; this did not affect the RF model as no pre-tuning was done. In contrast, the other models showed a significant dependency on their respective tuned parameters $k$, $\lambda$, $C$, $\sigma$ and $n_{\mathrm{hidden}}$. The processing times for all of the models are similar and within the same order of magnitude, with the exception of the MLP, which required up to 10 times longer depending on $n_{\mathrm{hidden}}$. When considering the MSE and CS scores, the recognition methods show good performance. These metrics are more sensitive to the distribution of observer labels upon which all of the models are trained. It can be concluded that the full feature set used in this study is descriptive and appropriate for learning the observer distributions, with the worst performing model, MLP, still returning high scores. In contrast, when considering TMR and RL, all of the models return mediocre scores. However, in principle, this is not unexpected since all the methods are regression models by design.
Table III shows F1 classification scores for three categories:
hilarious, social and non-laughter. The most readily classified
category is non-laughter with social as the most difficult to
discriminate. This shows the feature set used in this study could be salient for distinguishing non-laughter from laughter on the basis of body movements. Nevertheless, the features are still descriptive enough to discriminate the other classes well above chance level (20%). It is also worth noting that the MSE and CS rates for all of the models are similar to the MSE and CS scores for the inter-observer group agreement, though this is not directly comparable since the values in Table II stem from all 32 observers while the values calculated for IR stem from two groups of 16. Nevertheless, it does provide an indicator of the
model performances relative to human recognition rates.
Future work should include the in-depth analysis of the
decision tree ensembles within the RF model. This could give
insight into the various features and corresponding thresholds
that have the most discriminatory power and could further
inform the design of improved recognition systems. Further-
more, methods to account for idiosyncratic artifacts should be
considered such as individual bias removal [22] or transfer
learning methods [33].
ACKNOWLEDGMENTS
The research leading to these results has received funding
from the European Union Seventh Framework Programme
(FP7/2007-2013) under grant agreement no. 270780. We thank
all those who participated in our experiments, Jianchuan Qi for
his help in collecting the motion capture data and the members
of the ILHAIRE consortium for their comments.
REFERENCES
[1] C. Creed and R. Beale, "User interactions with an affective nutritional coach," Interacting with Computers, vol. 24, no. 5, pp. 339–350, 2012.
[2] M. Ochs, R. Niewiadomski, P. Brunet, and C. Pelachaud, "Smiling virtual agent in social context," Cognitive Processing, pp. 1–14, 2011.
[3] E. Holt, "The last laugh: Shared laughter and topic termination," Journal of Pragmatics, vol. 42, pp. 1513–1525, 2010.
[4] G. McKeown, R. Cowie, W. Curran, W. Ruch, and E. Douglas-Cowie, "ILHAIRE laughter database," in Proceedings of the 4th International Workshop on Corpora for Research on Emotion, Sentiment & Social Signals, LREC, 2012, pp. 32–35.
[5] D. Cosker and J. Edge, "Laughing, crying, sneezing and yawning: Automatic voice driven animation of non-speech articulations," in Proceedings of Computer Animation and Social Agents, CASA, 2009.
[6] R. Niewiadomski, J. Urbain, C. Pelachaud, and T. Dutoit, "Finding out the audio and visual features that influence the perception of laughter intensity and differ in inhalation and exhalation phases," in Proceedings of the 4th International Workshop on Corpora for Research on Emotion, Sentiment & Social Signals, LREC, 2012.
[7] J. Urbain, R. Niewiadomski, E. Bevacqua, T. Dutoit, A. Moinet, C. Pelachaud, B. Picart, J. Tilmanne, and J. Wagner, "AVLaughterCycle: enabling a virtual agent to join in laughing with a conversational partner using a similarity-driven audiovisual laughter animation," Journal on Multimodal User Interfaces, vol. 4, pp. 47–58, 2010.
[8] M. Filippelli, R. Pellegrino, I. Iandelli, G. Misuri, J. Rodarte, R. Duranti, V. Brusasco, and G. Scano, "Respiratory dynamics during laughter," Journal of Applied Physiology, vol. 90, pp. 1441–1446, 2001.
[9] P. C. DiLorenzo, V. B. Zordan, and B. L. Sanders, "Laughing out loud: control for modeling anatomically inspired laughter using audio," ACM Transactions on Graphics, vol. 27, p. 125, 2008.
[10] R. Niewiadomski and C. Pelachaud, "Towards multimodal expression of laughter," in Intelligent Virtual Agents. Springer, 2012, pp. 231–244.
[11] C.-H. Chou, C.-H. Li, B.-W. Chen, J.-F. Wang, and P.-C. Lin, "A real-time training-free laughter detection system based on novel syllable segmentation and correlation methods," in Awareness Science and Technology (iCAST), 2012 4th International Conference on. IEEE, 2012, pp. 294–297.
[12] K. Laskowski, "Contrasting emotion-bearing laughter types in multiparticipant vocal activity detection for meetings," in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on. IEEE, 2009, pp. 4765–4768.
[13] M. Miranda, J. A. Alonzo, J. Campita, S. Lucila, and M. Suarez, "Discovering emotions in Filipino laughter using audio features," in Human-Centric Computing (HumanCom), 2010 3rd International Conference on. IEEE, 2010, pp. 1–6.
[14] S. Petridis and M. Pantic, "Audiovisual discrimination between laughter and speech," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, 2008, pp. 5117–5120.
[15] S. Fukushima, Y. Hashimoto, T. Nozawa, and H. Kajimoto, "Laugh enhancer using laugh track synchronized with the user's laugh motion," in CHI '10 Extended Abstracts on Human Factors in Computing Systems, ser. CHI EA '10. New York, NY, USA: ACM, 2010, pp. 3613–3618.
[16] M. Mancini, G. Varni, D. Glowinski, and G. Volpe, "Computing and evaluating the body laughter index," in Human Behavior Understanding. Springer, 2012, pp. 90–98.
[17] M. El Ayadi, M. S. Kamel, and F. Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognition, vol. 44, no. 3, pp. 572–587, 2011.
[18] Z. Zeng, M. Pantic, G. Roisman, and T. Huang, "A survey of affect recognition methods: Audio, visual, and spontaneous expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39–58, 2009.
[19] A. Kleinsmith and N. Bianchi-Berthouze, "Affective body expression perception and recognition: a survey," IEEE Transactions on Affective Computing, vol. 4, pp. 15–33, 2013.
[20] A. Kleinsmith, N. Bianchi-Berthouze, and A. Steed, "Automatic recognition of non-acted affective postures," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 4, pp. 1027–1038, 2011.
[21] G. Castellano, S. D. Villalba, and A. Camurri, "Recognising human emotions from body movement and gesture dynamics," in Affective Computing and Intelligent Interaction. Springer, 2007, pp. 71–82.
[22] D. Bernhardt and P. Robinson, "Detecting affect from non-stylised body motions," in Affective Computing and Intelligent Interaction. Springer, 2007, pp. 59–70.
[23] H. Meng, A. Kleinsmith, and N. Bianchi-Berthouze, "Multi-score learning for affect recognition: the case of body postures," in Affective Computing and Intelligent Interaction. Springer, 2011, pp. 225–234.
[24] C. Galvan, D. Manangan, M. Sanchez, J. Wong, and J. Cu, "Audiovisual affect recognition in spontaneous Filipino laughter," in Knowledge and Systems Engineering (KSE), 2011 Third International Conference on. IEEE, 2011, pp. 266–271.
[25] G. McKeown, W. Curran, C. McLoughlin, H. J. Griffin, and N. Bianchi-Berthouze, "Laughter induction techniques suitable for generating motion capture data of laughter associated body movements," in 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE), 2013.
[26] W. Ruch and P. Ekman, "The expressive pattern of laughter," in Emotion, Qualia, and Consciousness, 2001, pp. 426–443.
[27] J. S. Bridle, "Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition," in Neurocomputing. Springer, 1990, pp. 227–236.
[28] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[29] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston, "Random forest: a classification and regression tool for compound classification and QSAR modeling," Journal of Chemical Information and Computer Sciences, vol. 43, no. 6, pp. 1947–1958, 2003.
[30] A. Kleinsmith and N. Bianchi-Berthouze, "Form as a cue in the automatic recognition of non-acted affective body expressions," Lecture Notes in Computer Science, vol. 6874, pp. 155–164, 2011.
[31] A. Kleinsmith, T. Fushimi, and N. Bianchi-Berthouze, "An incremental and interactive affective posture recognition system," in International Workshop on Adapting the Interaction Style to Affective Factors, in conjunction with the International Conference on User Modeling, 2005.
[32] G. McKeown, W. Curran, D. Kane, R. McCahon, H. Griffin, C. McLoughlin, and N. Bianchi-Berthouze, "Human perception of laughter from context-free whole body motion dynamic stimuli," in International Conference on Affective Computing and Intelligent Interaction, 2013, in press.
[33] B. Romera-Paredes, M. Aung, M. Pontil, A. Williams, P. Watson, and N. Bianchi-Berthouze, "Transfer learning to account for idiosyncrasy in face and body expressions," in Automatic Face and Gesture Recognition, 2013.
... Laughter is a nonverbal vocalization, which affects our facial expressions, our body movements and posture (Urbain et al., 2013). The availability of multimodal data is not only important for conducting pragmatic analysis of laughter (not the focus of the current paper), allowing for a holistic interpretation of the contextual cues, but also for its detection: audio alone allows for less precise annotation both in terms of laughter onset and offset time, but also for detection itself, considering that laughter can also be silent (Cosentino et al., 2016;Griffin et al., 2013). For this reason, we chose to use a corpus for which video data was available: the Providence Corpus (Demuth et al., 2006). ...
... In some previous work, it has been referred to as 'laughter intensity' (e.g., Curran et al., 2017;El Haddad et al., 2018;McKeown & Curran, 2015), but in the current work, we prefer to refer to it as 'arousal' since we believe it is informative about the change in the laugher's mood as a result of appraising the laughable (Ku et al., 2017;Reisenzein et al., 2019), and because it avoids confusion with acoustic intensity. Laughter can in fact be silent (Cosentino et al., 2016;Griffin et al., 2013) and nonetheless signal a very high arousal state. The stacked barplot in Fig. 3 shows the arousal level displayed in laughter over all time-points by children and mothers (as perceived and agreed upon by annotators 14 ). ...
Article
Full-text available
Laughter is a valuable means for communicating and engaging in interaction since the earliest months of life. Nevertheless, there is a dearth of work on how its use develops in early interactions—given its putative reflexive nature, it has often been disregarded from studies on pre-linguistic vocalizations. We provide a longitudinal characterization of laughter use analyzing interactions of 4 babies with their mothers at five time-points (12, 18, 24, 30, and 36 months). We show how child laughter is very distinct from mothers’ (and adults’ generally), in terms of frequency, duration, level of arousal displayed, overlap with speech, and responsiveness to others’ laughter. Notably, contrary to what might be expected, we observed that children laugh significantly less than their mothers, especially at the first time-points analyzed. We indeed observe an increasing developmental trajectory in the production of laughter overall and in the contingent multimodal response to mothers’ laughter, showing the child’s increasing attunement to the social environment, interest in others’ appraisals and mental states, and awareness of its communicative value. We also show how mothers’ contingent responses to child laughter change over time, going from high-frequency mimicry, to a lower rate of diversified multimodal responses, in line with the child’s neuro-psychological development. Our data support a dynamic view of dialogue where interactants influence each other bidirectionally and emphasizes the crucial communicative value of laughter. When language is not fully developed, laughter might be an early means, in its already fully available expressiveness, to hold the conversational turn and enable meaningful vocal contribution in interaction at the same level of the interlocutor. Our study aims to provide a benchmark for typical laughter development, since we believe it can be an early means, along with other commonly analyzed behaviors (e.g., smiling, gazing, pointing, etc.), to gain insight into early child neuro-psychological development.
... Laughter is a non-verbal vocalisation which affects our facial expressions, our body movements and posture (Griffin et al., 2013;Cosentino et al., 2016). It is very important to adopt a multimodal approach both to laughter identification (laughter can be silent) and to its interpretation: in order to identify laughables, infer intentions, observe gaze direction and attentional states, and take into account other non-verbal social signals. ...
... Laughter is a non-verbal vocalisation which affects our facial expressions, and our body movements and posture. In interpreting laughter, it is very important to take a multimodal approach both to its identification (laughter can be silent) and its comprehension in order to identify laughables, infer intentions, observe gaze direction and attentional states, and take into account other non-verbal social signals (Griffin et al., 2013;Cosentino, Sessa, and Takanishi, 2016). For that reason, and moreover because the study would investigate children for whom language is just emerging, where a good proportion of the communication and interaction is necessarily nonverbally mediated, we decided to use a corpus for which video data was available: ...
Thesis
Full-text available
Laughter is a social vocalization universal across cultures and languages. It is ubiquitous in our dialogues and able to serve a wide range of functions. Laughter has been studied from several perspectives, but the classifications proposed are hard to integrate. Despite being crucial in our daily interaction, relatively little attention has been devoted to the study of laughter in conversation, attempting to model its sophisticated pragmatic use, neuro-correlates in perception and development in children. In the current thesis a new comprehensive framework for laughter analysis is proposed, crucially grounded in the assumption that laughter has propositional content, arguing for the need to distinguish different layers of analysis, similarly to the study of speech: form, positioning, semantics and pragmatics. A formal representation of laughter meaning is proposed and a multilingual corpus study (French, Chinese and English) is conducted in order to test the proposed framework and to deepen our understanding of laughter use in adult conversation. Preliminary investigations are conducted on the viability of a laughter form-function mapping based on acoustic features and on the neuro-correlates involved in the perception of laughter serving different functions in natural dialogue. Our results give rise to novel generalizations about the placement, alignment, semantics and function of laughter, stressing the high pragmatic skills involved in its production and perception. The development of the semantic and pragmatic use of laughter is observed in a longitudinal corpus study of 4 American-English child-mother pairs from 12 to 36 months of age. Results show that laughter use undergoes important development at each level analysed, which complies with what could be hypothesised on the base of phylogenetic data, and that laughter can be an effective means to track cognitive/communicative development, and potential difficulties or delays at a very early stage.
... On the other hand, additional information about body movement such as the velocity and acceleration of joints was used in Kipp and Martin (2009) in combination with the dimensional emotional model to quantify arousal and valence. Other studies have focused on particular expressions and their body language manifestations such as laughter states (Griffin et al. 2013). Here, torso and limb information were compared with different classifiers to identify if the laughter state was fake, awkward, social, or hilarious. ...
Book
Full-text available
The book gives an introduction into the theory and practice of the transdisciplinary field of Character Computing, introduced by Alia El Bolock. The latest scientific findings indicate that “One size DOES NOT fit all” in terms of how to design interactive systems and predict behavior to tailor the interaction experience. Emotions are one of the essential factors that influence people’s daily experiences; they influence decision making and how different emotions are interpreted by different individuals. For example, some people may perform better under stress and others may break. Building upon Rosalind Picard’s vision, if we want computers to be genuinely intelligent and to interact naturally with us, we must give computers the ability to recognize, understand, even to have and express emotions and how different characters perceive and react to these emotions, hence having richer and truly tailored interaction experiences. Psychological processes or personality traits are embedded in the existing fields of Affective and Personality Computing. However, this book is the first that systematically addresses this including the whole human character; namely our stable personality traits, our variable affective, cognitive and motivational states as well as our morals, beliefs and socio-cultural embedding. The book gives an introduction into the theory and practice of the transdisciplinary field of Character Computing. The emerging field leverages Computer Science and Psychology to extend technology to include the whole character of humans and thus paves the way for researchers to truly place humans at the center of any technological development. Character Computing is presented from three main perspectives: ● Profiling and sensing the character ● Leveraging characters to build ubiquitous character-aware systems ● Investigating how to extend Artificial Intelligence to create artificial characters
... On the other hand, additional information about body movement such as the velocity and acceleration of joints was used in Kipp and Martin (2009) in combination with the dimensional emotional model to quantify arousal and valence. Other studies have focused on particular expressions and their body language manifestations such as laughter states (Griffin et al. 2013). Here, torso and limb information were compared with different classifiers to identify if the laughter state was fake, awkward, social, or hilarious. ...
Chapter
Character Computing is a novel and interdisciplinary field of research based on interactive research between Computer Science and Psychology. To allow appropriate recognition and prediction of human behavior, Character Computing needs to be grounded on psychological definitions of human behavior that consider explicit as well as implicit human factors. The framework that guides Character Computing therefore needs to be of considerable complexity in order to capture the human user’s behavior in its entirety. The question to answer in this chapter is how Character Computing can be empirically realized and validated. The psychologically driven interdisciplinary framework for Character Computing will be outlined and how it is realized empirically as Character Computing platform. Special focus in this chapter is laid on experimental validation of the Character Computing approach including concrete laboratory experiments. The chapter adds to the former chapter which discussed the different steps of the Character Computing framework more broadly with respect to current theories and trends in Psychology and Behavior Computing.
Article
Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multi-modal models for different combinations of input modalities, training, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, robust to cross-contamination, occlusion and perspective differences. Our results show that performance of models with body movement inputs does not suffer when trained with video-acquired labels, despite their lower inter-rater agreement.
Article
People often encounter difficulties in building shared understanding during everyday conversation. The most common symptom of these difficulties are self-repairs, when a speaker restarts, edits or amends their utterances mid-turn. Previous work has focused on the verbal signals of self-repair, i.e. speech disfluences (filled pauses, truncated words and phrases, word substitutions or reformulations), and computational tools now exist that can automatically detect these verbal phenomena. However, face-to-face conversation also exploits rich non-verbal resources and previous research suggests that self-repairs are associated with distinct hand movement patterns. This paper extends those results by exploring head and hand movements of both speakers and listeners using two motion parameters: height (vertical position) and 3D velocity. The results show that speech sequences containing self-repairs are distinguishable from fluent ones: speakers raise their hands and head more (and move more rapidly) during self-repairs. We obtain these results by analysing data from a corpus of 13 unscripted dialogues, and we discuss how these findings could support the creation of improved cognitive artificial systems for natural human-machine and human-robot interaction.
Preprint
The development of virtual agents has enabled human-avatar interactions to become increasingly rich and varied. Moreover, an expressive virtual agent, i.e. one that mimics the natural expression of emotions, enhances social interaction between a user (human) and an agent (intelligent machine). The set of non-verbal behaviors of a virtual character is, therefore, an important component in the context of human-machine interaction. Laughter is not just an audio signal; it is an intrinsically multimodal non-verbal signal that, in addition to audio, includes facial expressions and body movements. Motion analysis often relies on a relevant motion capture dataset, but the main issue is that acquiring such a dataset is expensive and time-consuming. This work studies the relationship between laughter and body movements in dyadic conversations. The body movements were extracted from videos using a deep-learning-based pose estimation model. We found that, in the explored NDC-ME dataset, a single statistical feature (i.e., the maximum value, or the maximum of the Fourier transform) of a joint movement correlates only weakly (around 30%) with laughter intensity. However, we did not find a direct correlation between audio features and body movements. We discuss the challenges of using such a dataset for the audio-driven co-laughter motion synthesis task.
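To make the per-joint features mentioned above concrete, a minimal sketch (assuming one 1-D joint-movement signal and one intensity label per laughter episode; the data layout, windowing, and names are hypothetical) could compute the maximum value and the maximum Fourier magnitude and correlate each with intensity:

import numpy as np
from scipy.stats import pearsonr

def episode_features(signal):
    # Two scalar features of a 1-D joint-movement signal for one episode.
    signal = np.asarray(signal, dtype=float)
    max_value = signal.max()                                # largest displacement/speed value
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))  # magnitude spectrum, DC removed
    return max_value, spectrum.max()

# Hypothetical usage: correlate each feature with a per-episode intensity label.
rng = np.random.default_rng(1)
episodes = [rng.normal(scale=0.5 + 0.05 * i, size=200) for i in range(40)]
intensity = np.array([0.5 + 0.05 * i + rng.normal(scale=0.2) for i in range(40)])
features = np.array([episode_features(e) for e in episodes])
for name, column in zip(("max value", "max |FFT|"), features.T):
    r, p = pearsonr(column, intensity)
    print(f"{name}: r = {r:.2f}, p = {p:.3f}")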
Chapter
Character Computing envisions systems that can detect, synthesize, and adapt to human character. The development and realization of this field hinge upon the availability of data about human character traits and states. This data must be comprehensive enough to model the embedded causality in the triad of behavior–situation–character that makes up the core of Character Computing. Acquiring this data requires an intelligent and scalable platform for sensing, processing, analysis, and decision support, which we label as Character-IoT (CIoT). This chapter investigates how this CIoT can be realized. A comprehensive study of sensing modalities in the areas of affective and personality computing is presented to identify the technologies that can be adopted in Character Computing. This includes facial expressions, speech, text, gestures, and others. We also highlight artificial intelligence techniques that are most commonly used in areas of affective and personality computing and analyze which ones are suitable for Character Computing. Finally, we propose an architectural framework for CIoT that can be adopted by future researchers in this field.
Article
Thanks to the decreasing cost of whole-body sensing technology and its increasing reliability, there is an increasing interest in, and understanding of, the role played by body expressions as a powerful affective communication channel. The aim of this survey is to review the literature on affective body expression perception and recognition. One issue is whether there are universal aspects to affect expression perception and recognition models or if they are affected by human factors such as culture. Next, we discuss the difference between form and movement information as studies have shown that they are governed by separate pathways in the brain. We also review psychological studies that have investigated bodily configurations to evaluate if specific features can be identified that contribute to the recognition of specific affective states. The survey then turns to automatic affect recognition systems using body expressions as at least one input modality. The survey ends by raising open questions on data collecting, labeling, modeling, and setting benchmarks for comparing automatic recognition systems.
Conference Paper
The ILHAIRE project seeks to scientifically analyse laughter in sufficient detail to allow the modelling of human laughter and the subsequent generation and synthesis of laughter in avatars suitable for human-machine interaction. As part of the process an incremental database is required, providing different types of data to aid in modelling and synthesis. Here we present an initial part of that database in which laughs were extracted from a number of pre-existing databases. Emphasis has been placed on the extraction of laughs that are social and conversational in style, as existing databases already include instances of hilarious laughter. However, an attempt has been made to exhaustively extract all instances of laughter from databases that were not designed for the purpose of generating hilarious laughter. These databases are: the Belfast Naturalistic Database, the HUMAINE Database, the Green Persuasive Database, the Belfast Induced Natural Emotion Database and the SEMAINE Database.
Conference Paper
Laughter is a frequently occurring social signal and an important part of human non-verbal communication. However, it is often overlooked as a serious topic of scientific study. While the lack of research in this area is mostly due to laughter's non-serious nature, it is also a particularly difficult social signal to produce on demand in a convincing manner, making it a difficult topic for study in laboratory settings. In this paper we provide some techniques and guidance for inducing both hilarious laughter and conversational laughter. These techniques were devised with the goal of capturing motion information related to laughter while the person laughing was either standing or seated. Comments on the value of each of the techniques, and general guidance as to the importance of atmosphere, environment and social setting, are provided.
Conference Paper
Laughter is a strong social signal in human-human and human-machine communication. However, very few attempts to model it exist. In this paper we discuss several challenges regarding the generation of laughs. We focus, more particularly, on two aspects: a) modeling laughter with different intensities and b) modeling respiration behavior during laughter. Both of these models combine a data-driven approach with high-level animation control. Careful analysis and implementation of the synchronization mechanisms linking visual and respiratory cues has been undertaken. This allows us to reproduce the highly correlated multimodal signals of laughter on a 3D virtual agent.
Article
The role of body posture in affect recognition, and the importance of emotion in the development and support of intelligent and social behavior, have been accepted and researched within several fields. While posture is considered important, much research has focused on extracting emotion information from dance sequences. Instead, our focus is on creating an affective posture recognition system that incrementally learns to recognize and react to people's affective behaviors. In this paper, we examine a set of requirements for creating this system, and our proposed solutions. The first requirement is that the system is general and non-situation-specific. Secondly, it should be able to handle explicit and implicit feedback. Finally, it must be able to incrementally learn the emotion categories without predefining them. We tested and compared the performance of our system using 182 standing postures, described as a combination of form features and motion flow features across several emotion categories, against a typical algorithm used for recognition, back-propagation, and against human observers, with the aim of showing the generalizability of the system. This initial testing showed positive results.
Conference Paper
The EU-ICT FET Project ILHAIRE is aimed at endowing machines with automated detection, analysis, and synthesis of laughter. This paper describes the Body Laughter Index (BLI) for automated detection of laughter starting from the analysis of body movement captured by a video source. The BLI algorithm is described, and the index is computed on a corpus of videos. An assessment of the algorithm by means of subjects' ratings is also presented. Results show that BLI can successfully distinguish between different videos of laughter, although improvements are needed with respect to the perception of subjects, multimodal fusion, cultural aspects, and generalization to a broad range of social contexts.
Conference Paper
In this paper we investigate the use of the Transfer Learning (TL) framework to extract the commonalities across a set of subjects and also to learn the way each individual instantiates these commonalities, in order to model idiosyncrasy. To implement this we apply three variants of Multi-Task Learning, namely: Regularized Multi-Task Learning (RMTL), Multi-Task Feature Learning (MTFL) and Composite Multi-Task Feature Learning (CMTFL). Two datasets are used; the first is a set of point-based facial expressions with annotated discrete levels of pain. The second consists of full-body motion capture data taken from subjects diagnosed with chronic lower back pain. A synchronized electromyographic signal from the lumbar paraspinal muscles is taken as a pain-related behavioural indicator. We compare our approaches with Ridge Regression, which is a comparable model without the Transfer Learning property, as well as with a subtractive method for removing idiosyncrasy. The TL-based methods show statistically significant improvements in correlation coefficients between predicted model outcomes and the target values compared to baseline models. In particular, RMTL consistently outperforms all other methods; a paired t-test between RMTL and the best performing baseline method returned a maximum p-value of 2.3 × 10^-4.
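A minimal numpy sketch of the shared-plus-individual decomposition that underlies RMTL is given below, assuming squared loss and plain gradient descent; the function names, learning rate and regularization weights are illustrative and do not reproduce the formulation or hyperparameters used in the paper above.

import numpy as np

def rmtl_fit(X_tasks, y_tasks, lam_shared=0.1, lam_task=1.0, lr=1e-2, n_iter=2000):
    # Each task (subject) t gets weights w0 + v[t]: a component shared across
    # subjects plus a subject-specific deviation, each penalised separately.
    n_features = X_tasks[0].shape[1]
    w0 = np.zeros(n_features)
    v = np.zeros((len(X_tasks), n_features))
    for _ in range(n_iter):
        grad_w0 = lam_shared * w0
        for t, (X, y) in enumerate(zip(X_tasks, y_tasks)):
            residual = X @ (w0 + v[t]) - y
            grad_t = X.T @ residual / len(y)
            v[t] -= lr * (grad_t + lam_task * v[t])   # update subject-specific part
            grad_w0 += grad_t                          # accumulate shared-part gradient
        w0 -= lr * grad_w0
    return w0, v

# Hypothetical usage with synthetic per-subject regression data:
rng = np.random.default_rng(2)
X_tasks = [rng.normal(size=(50, 8)) for _ in range(5)]
true_shared = rng.normal(size=8)
y_tasks = [X @ (true_shared + 0.2 * rng.normal(size=8)) for X in X_tasks]
w0, v = rmtl_fit(X_tasks, y_tasks)
print("recovered shared weights:", np.round(w0, 2))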
Conference Paper
In this paper, a laughter detection system based on the correlation characteristics of signals is proposed. The system has the advantages of being speaker-independent, computationally inexpensive, and training-free. To achieve this, a modified autocorrelation function (MACF) is combined with a new approach called the vocal tract transfer detector (VTTD) for segmenting an input signal into a syllable stream. Next, based on each syllable's Mel-scale frequency cepstral coefficients (MFCCs), the correlation between two consecutive syllables is measured by the dynamic time warping (DTW) algorithm. Consecutive syllables with high correlation are considered a laughter segment. In our experimental results, the proposed system achieves an accuracy rate of 88.67%. In addition, compared with the baseline, the proposed system reduces the word error rate (WER) of syllable segmentation by 5.9%. Such results indicate that the proposed method is effective in detecting laughter, thereby demonstrating the feasibility of the system.
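The syllable-correlation step can be sketched roughly as follows, assuming syllable waveforms have already been segmented (the MACF/VTTD stage is not reproduced) and using librosa for MFCC extraction and DTW; the similarity threshold and function names are hypothetical, not the authors' parameters.

import numpy as np
import librosa

def syllable_similarity(y1, y2, sr, n_mfcc=13):
    # DTW alignment cost between the MFCC sequences of two syllables,
    # normalised by the warping-path length; lower cost = more similar.
    m1 = librosa.feature.mfcc(y=y1, sr=sr, n_mfcc=n_mfcc)
    m2 = librosa.feature.mfcc(y=y2, sr=sr, n_mfcc=n_mfcc)
    D, wp = librosa.sequence.dtw(X=m1, Y=m2, metric="euclidean")
    return D[-1, -1] / len(wp)

def label_laughter(syllables, sr, threshold=50.0):
    # Mark consecutive syllable pairs whose normalised DTW cost falls below
    # a (hypothetical) threshold as belonging to a laughter segment.
    return [syllable_similarity(a, b, sr) < threshold
            for a, b in zip(syllables[:-1], syllables[1:])]

Repeated laughter syllables ("ha-ha-ha") tend to be acoustically similar, so their pairwise DTW cost is low, which is the intuition the detection rule above relies on.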
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
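As a concrete instance of the tree-ensemble approach described above, a minimal scikit-learn sketch is given below; the synthetic features and class labels are placeholders for body-movement statistics and laughter categories, not the actual data or settings of any study cited here.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: rows are observations, columns stand in for movement
# statistics; labels stand in for discrete laughter categories.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = rng.integers(0, 4, size=200)

# Each tree is grown on a bootstrap sample, with a random feature subset
# considered at every split, as described in the abstract above.
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                oob_score=True, random_state=0)
forest.fit(X, y)
print("out-of-bag accuracy:", round(forest.oob_score_, 3))
print("5-fold CV accuracy:", cross_val_score(forest, X, y, cv=5).mean().round(3))
print("feature importances:", forest.feature_importances_.round(3))

The out-of-bag score and feature importances correspond to the internal estimates of error and variable importance mentioned in the abstract.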
Article
This paper investigates how users respond to emotional expressions displayed by an embodied agent. In a between-subjects experiment (N = 50) an emotionally expressive agent (simulating the role of a nutritional coach) was perceived as significantly more likeable and caring than an unemotional version. Feedback from participants also revealed detailed insights into their perceptions of the agents and highlighted a strong preference for the emotionally expressive version. Design implications for embodied agents are discussed and future research areas identified.