Laughter Type Recognition from Whole Body Motion

September 2013

DOI:10.1109/ACII.2013.64

Conference: Affective Computing and Intelligent Interaction
At: Geneva, Switzerland

Authors:

Harry Griffin

University College London

Min Hane Aung

University of London

Bernardino Romera-Paredes

University of Oxford

Gary Mckeown

Queen's University Belfast


Harry J. Griffin∗, Min S. H. Aung∗, Bernardino Romera-Paredes∗, Ciaran McLoughlin∗, Gary McKeown†, William Curran†, Nadia Bianchi-Berthouze∗

∗UCL Interaction Centre, University College London, London, UK

†School of Psychology, Queen's University Belfast, UK

Email: (harry.griffin, m.aung, ucabbro, ucjt511, n.berthouze)@ucl.ac.uk

Email: (G.McKeown, w.curran)@qub.ac.uk

Abstract: Despite the importance of laughter in social interactions, it remains little studied in affective computing. Respiratory, auditory, and facial laughter signals have been investigated but laughter-related body movements have received almost no attention. The aim of this study is twofold. The first aim is to investigate observers' perception of laughter states (hilarious, social, awkward, fake, and non-laughter) based on body movements alone, through their categorization of avatars animated with natural and acted motion capture data. Significant differences in torso and limb movements were found between animations perceived as containing laughter and those perceived as non-laughter. Hilarious laughter also differed from social laughter in the amount of bending of the spine, the amount of shoulder rotation and the amount of hand movement. The body movement features indicative of laughter differed between sitting and standing avatar postures. Based on the positive findings in this perceptual study, the second aim is to investigate the possibility of automatically predicting the distributions of observers' ratings for the laughter states. The findings show that the automated laughter recognition rates approach human rating levels, with the Random Forest method yielding the best performance.

Keywords: laughter, body movement, automatic emotion recognition, automatic laughter type recognition, laughter type perception

I. INTRODUCTION

The increasing use of virtual agents and robots in entertainment, collaborative, and support roles places ever greater demands on their ability to detect users' emotional state from various modalities (body movements, facial expressions, speech) and produce emotional displays. This is particularly true in socially complex human-computer interactions such as education, rehabilitation and health scenarios. In these situations emotionally expressive agents are much preferred by users [1].

Laughter is a ubiquitous and complex signal that remains relatively uninvestigated, in contrast to studies on other emotional expressions such as smiling [2]. Due to the range of vocal and physical expressions of laughter, its detection and synthesis are very challenging. Laughter does more than express hilarity. It can convey negative and mixed emotions and act as an invitation to shared expression [3]. At least 23 types of laughter have been identified (hilarious, anxious, embarrassed, etc.) [4] with each laughter type having its own social function. Hence, the ability to produce the appropriate type and intensity of laughter in response to a user's emotional signals, including laughter, would be a dramatic step forward in the realism and possibly efficacy of virtual agents.

There have been few studies on synthesizing laughter in virtual agents, most of which have focused on acoustics and the face [5], [6]. Urbain et al. present a laughter machine that is able to recognize laughter from sounds and give a response [7]. The distinctive respiration patterns of laughter have been widely corroborated [8] and integrated into anatomically inspired models of laughter [9]. Recently, Niewiadomski and Pelachaud investigated the coordination of virtual agents' laughter respiration behaviour with other visual cues; however, this work is mainly based on hilarious laughter [10]. A further difficulty for synthesis of laughter-related body movements is that stereotypical laughter actions, e.g. clutching one's abdomen, rocking back and forth, slapping one's leg, are well known but may be seen as exaggerated and unnatural.

Work on automatic recognition of laughter has also started to emerge but, as with the synthesis of laughter, has mostly focused on the acoustic modality (e.g., [11]–[13]) and more recently on the combination of face and voice cues [14]. Less attention has been given to body laughter expressions. Whole-body postural changes and peripheral gestures associated with different types of laughter remain unelucidated. In [15], the authors use electromyographic sensors to measure diaphragmatic activity to detect laughter in people watching television. This is used to trigger laughter in nearby robotic dolls with the aim of enhancing the user's laughter.

More recently, there has been interest in creating automatic classifiers able to differentiate laughter types. To this end, motion descriptors based on energy estimates, correlation of shoulder movements and periodicity to characterise laughter have been investigated [16]. Using a combination of these measures a Body Laughter Index (BLI) was calculated. The BLIs of 8 laughter clips were compared with 8 observers' ratings of the energy of the shoulder movement. A correlation, albeit weak, between the observers' ratings and BLIs was found.

There has been growing evidence supporting the possibility of automatically discriminating between different emotions from various modalities: acoustics [17], facial expressions [18] and body [19]–[23]. Moreover, the study in [24] went further in trying to characterize different types of laughter. The authors investigated automatic discrimination of five types of acted laughter: happiness, giddiness, excitement, embarrassment and hurtful. Actors were asked to enact these five emotions using both vocal and facial expressions whilst they were video-recorded. The video clips were labelled by expert observers who were also made aware of the intention of the actors. The results showed that automatic recognition based only on the vocal features reached higher accuracy (70% correct recognition) than when using both facial and vocal features (60% correct recognition) or facial features alone (40% correct recognition). While, on the basis of these results, the authors argue that vocal expressions carry more emotional information than facial expressions, it should be noted that the actors were asked to try to keep the head as still as possible so that it was always frontal to the video camera. This may have constrained and limited the way people expressed their laughter through their faces and head movements. In addition, the fact that the expressions were acted also raises the question of how naturalistic they were. One could argue that we are better at acting an expression through our voice since we can hear it, while we cannot see our face. This is particularly true when the actors are not professionals but lay people.

In this study we investigate perception of laughter type from body movements and lay the groundwork for laughter type recognition from these cues. This study makes two contributions: first, by identifying body movements that are perceived as indicative of different types of natural laughter, it informs more convincing animation of laughter in avatars, which will increase their perceived conversational authenticity and emotional range. Second, it investigates if it is possible to automatically discriminate between different types of laughter by comparing a wide range of automated recognition methods.

II. MOTION DATA COLLECTION

Users' perception of laughter-related body movements was investigated in a forced-choice perceptual experiment. Body movements captured during different types of natural and acted laughter were used to animate an avatar. Observers categorized the animations as hilarious, social, awkward, fake, or non-laughter. Naive observers' categorizations were used to allow analysis of the perception of body movements in the absence of other modalities, e.g., verbal and facial, and in the absence of knowledge of the eliciting stimulus and context.

A. Laughter Capture

Nine pairs of participants took part in a motion capture recording session. The movements of one member of each pair (subjects: 3 male, 6 female, mean age 25.7) were captured using a whole-body inertial motion capture suit (Animazoo IGS-190). The suit was modified to maximize the sensitivity to spine and shoulder movements. Tasks to elicit laughter in both standing and sitting postures included word games, collaborative games (Pictionary) and humorous videos [25]. Laughter also occurred during conversation during "rest" periods. The subjects also produced fake laughter on request.

B. Stimulus Preparation

Using video recordings of the motion capture session, we segmented laughter episodes and gave them preliminary labels: hilarious; social (back-channeling, polite, conversational laughter); awkward (involving a negative emotion such as embarrassment or discomfort on another's behalf); or fake. In total, 508 laughter segments and 41 randomly located non-laughter segments, some containing other behaviour such as talking, were identified. The motion capture data from these segments were used to animate an avatar defined by the positional co-ordinate triplets of 26 anatomical points over the whole body. The anatomical proportions were the same for all animations (Figure 1). Viewing angle was standardized to a slightly elevated ¾ viewpoint, although models were free to walk and turn in the standing tasks. One hundred and twenty-six animations (experimenter labels: 34 hilarious, 43 social, 16 awkward, 19 fake, 14 non-laughter; mean duration = 4.1 s, SD = 1.8 s) were selected as stimuli for the perceptual phase (non-laughter animations were chosen randomly from the previous sample, with durations within the range of durations of the laughter animations). This ratio of laughter types according to experimenter-determined labels was designed to match the frequency of laughter types in a naturalistic database [4]. Note that the level of agreement between the experimenter-determined labels and observers' categorization is not of interest here; rather we wished to establish which body movements are perceived by the observers as indicative of different laughter types. Therefore this distribution of stimuli by experimenter-determined labels was implemented only with the aim of producing sufficient segments in each observer-determined category to allow valid statistical analysis of body movement. The observers' categorisations act as our ground truth and the experimenter-determined labels are not used in the analysis.

Fig. 1. Example stills from the animated avatars

III. PERCEPTUAL STUDY

A. Body Feature Analysis

Thirty-two observers (17 male, 15 female, mean age 33.0) viewed the clips of the animated avatar in random order and categorized each clip as hilarious, social, awkward, fake or non-laughter. No audio was presented with the animations.

The modal laughter category selected by the observers acted as the ground truth for the statistical analysis of body movement features [19]. The number of potential movement features that can be analyzed is large and increases exponentially if the interactions of multiple features are considered in combination. Therefore, our selection of features was based on previous findings in the literature [9], [26] and observers' comments in post-experiment interviews on which features they found useful in categorizing laughter. These included postural changes such as bending of the spine and gestures such as moving a hand toward the face or abdomen (Table I). Feature analysis was based on the position coordinate triplets of the relevant anatomical nodes. Maximum and minimum bending were calculated as greatest and smallest deviation respectively from collinearity of the spine sections adjacent to the node in question. Range of bending was calculated as maximum bending minus minimum bending. Bending was calculated at each spine node including the neck, and collectively across all spine nodes (compound bending), defined as the sum of deviation from collinearity of all spine sections. Distances were calculated as Euclidean distances in 3D space. The features for hilarious, social and non-laughter segments were entered into separate one-way ANOVAs for standing and sitting segments (the independent variable was the modal observer categorization). Planned comparisons tested differences between laughter and non-laughter (hilarious and social vs. non-laughter) and between laughter types (hilarious vs. social).

TABLE I. LIST OF KNOWLEDGE BASED FEATURES TO BE ANALYSED.

Hands/gesture
    Maximum, minimum and range of distance between hands
    Maximum, minimum and range of distance of left hand from hip
    Maximum, minimum and range of distance of right hand from hip
    Maximum, minimum and range of distance of left hand from head
    Maximum, minimum and range of distance of right hand from head

Shoulder movement
    Correlation of left and right shoulder-hip distances
    Range of azimuthal shoulder rotation

Spine and neck bending
    Maximum, minimum and range of upper back bending
    Maximum, minimum and range of lower back bending
    Maximum, minimum and range of neck bending
    Maximum, minimum and range of compound spine bending
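The bending and distance features described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that "deviation from collinearity" is the angle, in radians, between the two sections meeting at a node, and the names `bending_angle`, `compound_bending` and `summarize` are hypothetical.

```python
import math

def bending_angle(prev_pt, node, next_pt):
    """Angle (radians) by which the sections meeting at `node` deviate from a straight line."""
    a = [n - p for p, n in zip(prev_pt, node)]   # section entering the node
    b = [q - n for n, q in zip(node, next_pt)]   # section leaving the node
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / (math.hypot(*a) * math.hypot(*b))
    return math.acos(max(-1.0, min(1.0, cos)))   # clamp against rounding error

def compound_bending(spine_pts):
    """Sum of the bending angles at every interior node of an ordered spine chain."""
    return sum(bending_angle(*spine_pts[i - 1:i + 2]) for i in range(1, len(spine_pts) - 1))

def summarize(per_frame_values):
    """Maximum, minimum and range of a feature over the frames of one segment."""
    hi, lo = max(per_frame_values), min(per_frame_values)
    return {"max": hi, "min": lo, "range": hi - lo}

# Euclidean hand-to-hand distance for each frame (toy coordinates, two frames).
left = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
right = [(3.0, 4.0, 0.0), (0.0, 6.0, 8.0)]
hand_distance = summarize([math.dist(l, r) for l, r in zip(left, right)])
```

Applying `summarize` to the per-frame value of each quantity yields the "maximum, minimum and range" triplets listed in Table I.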

B. Ground Truth from Observer Categorization

The mean number of observers who selected the modal category was 13.8 (SD = 4.3) with a maximum agreement of 29 of the 32 observers. Segments tied for the modal category were excluded from the body movement analysis, as were segments for which the modal category was selected by less than one third of observers (< 11/32). For all experimenter-defined labels, the most common observer categorization was social or non-laughter. Too few awkward (N = 4) and fake (N = 1) segments remained, so these were excluded from further analysis. Ninety-one segments (52 standing; 39 sitting) were entered into the final analysis of body movements.
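The exclusion rule above can be sketched as follows. This is a hedged illustration: the function name `modal_category` and the `min_votes` parameter are hypothetical, and the threshold of 11 votes is taken from the "< 11/32" criterion in the text.

```python
from collections import Counter

def modal_category(ratings, min_votes=11):
    """Return the modal label for one segment, or None if the segment is excluded."""
    ranked = Counter(ratings).most_common()
    top_label, top_votes = ranked[0]
    # A tie exists when the runner-up has as many votes as the top label.
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if tied or top_votes < min_votes:
        return None
    return top_label

clear = ["hilarious"] * 15 + ["social"] * 10 + ["fake"] * 7
tie = ["hilarious"] * 12 + ["social"] * 12 + ["fake"] * 8
weak = ["hilarious"] * 10 + ["social"] * 9 + ["fake"] * 8 + ["awkward"] * 5
labels = [modal_category(r) for r in (clear, tie, weak)]
```

With these toy ratings only the first segment keeps a modal label; the tied and low-agreement segments are dropped.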

C. Body Movements

For sitting laughter, ANOVAs revealed main effects of observer categorization on the range of distance between the hands, and the range of both hands' distance from the head and hip (all F(2, 36) > 7, p ≤ .003); the range of azimuthal shoulder rotation (F(2, 36) = 10.04, p < .001); the range of bending at all spine and neck nodes and of compound spine bending (all F(2, 36) > 11, p < .001); and the minimum bending at the upper back and neck (both F(2, 36) > 4.5, p < .02).

For all of these features, planned contrasts revealed significantly greater activity in laughter than non-laughter segments (all $t_{\mathrm{abs}}$ > 2.5, p < .02). Planned comparisons also revealed greater range of distances of both hands from the hip and of the left hand from the head in hilarious than social laughter (all $t_{\mathrm{abs}}$ > 2, p < .04); and a greater range of azimuthal shoulder rotation, greater range of bending at all spine and neck nodes and a greater range of compound spine bending in hilarious than social laughter (all $t_{\mathrm{abs}}$ > 2, p < .05).

For standing laughter, ANOVAs revealed main effects of observer categorization on the range of distance between the hands, the range of both hands' distance from the head and hip, the maximum distance of both hands from the hip and the minimum distance of the right hand from the head (all F(2, 49) > 3, p < .05); the range of bending and the maximum bending of upper and lower back and compound spine bending (all F(2, 49) > 3, p < .05).

Planned comparisons of these effects revealed greater range of hand-to-hand, hand-to-head, and hand-to-hip distances for both hands in laughter than non-laughter segments, and the range of right-hand-to-hip distances was greater in hilarious than social laughs (all $t_{\mathrm{abs}}$ > 2.5, p < .02); both hands moved further from the hip and the right hand moved closer to the head in laughter than non-laughter segments (all $t_{\mathrm{abs}}$ > 3, p < .05); the range of upper, lower and compound spine bending was greater for laughter than non-laughter segments and the range of upper and compound spine bending was greater for hilarious than social laughs (all $t_{\mathrm{abs}}$ > 2, p < .05). In addition, the maximum compound spine bending was greater in laughter than non-laughter segments ($t_{\mathrm{abs}}$ > 2.46, p = .018).

IV. AUTOMATIC RECOGNITION

The second aim of this study is to investigate the possibility of automatically predicting the distributions of observers' ratings for the five types of laughter. The relative performances of a broad range of supervised machine learning methods are tested. In this part of the study we consider the distributions of the ratings from all 32 observers. This leads to a 5-output regression problem. If the frequencies of these ratings are normalised, the values can be viewed as a degree of belief for each outcome, and we also preserve a measure of observer agreement for each instance. This also removes the need to treat the most frequent label as a ground truth, which is a weak assumption for instances with low agreement. Moreover, this allows the full set of 126 instances to be used. The knowledge based features listed in Table I serve as part of the full feature set for recognition. We also include kinematically derived motion quantities analogous to the amount of energy expended. It has been shown that kinetic energy measures can contribute to the detection of laughter [16]. For three dimensional motion data a measure analogous to kinetic energy can be compactly calculated using the sum of the angular velocity at each joint over each laughter segment [22]. Therefore, in the full feature set we also include the energy from five upper body articulations: left and right elbows, left and right shoulders and neck. Initial experiments showed a low degree of variance in lower body joints for this dataset, and these joints were therefore excluded.
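As an illustration of the normalisation step, the sketch below turns the 32 categorical ratings of one clip into a 5-element degree-of-belief vector. The label order in `LABELS` and the function name `rating_distribution` are assumptions made for this example only.

```python
LABELS = ("hilarious", "social", "awkward", "fake", "non-laughter")

def rating_distribution(ratings, labels=LABELS):
    """Normalised frequency of each label among the observers' ratings of one clip."""
    total = len(ratings)
    return [ratings.count(label) / total for label in labels]

# One clip rated by 32 observers (toy data).
clip = ["hilarious"] * 16 + ["social"] * 8 + ["non-laughter"] * 8
target = rating_distribution(clip)   # regression target for this clip
```

The largest entry of the vector also serves as a simple measure of observer agreement for the clip.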

A. Supervised Learning Models

Formally the problem consists of a set of $T = 5$ supervised regression tasks, one for each type of laughter (including 'non-laughter'). We denote by $x_i \in \mathbb{R}^d$ the vector of attributes describing instance $i$. We define the matrix of all of the training instances as $X \in \mathbb{R}^{d \times m}$, where $m$ is the number of training instances and $d$ is the dimensionality of the data. A distinct label $y_{t,i}$ is provided for each task $t \in \{1 \ldots T\}$, for instance $i$, taken from the frequency of observations. We denote by $Y_t \in \mathbb{R}^m$ the vector of labels of task $t$ for all instances. We also denote the corresponding model predicted output as $\hat{y}_{t,i}$.

a) k-Nearest Neighbour (k-NN): This is a simple model which assigns the value of the predicted output based on the $K$ nearest training instances in the data space. We attain the necessary multiple outcome vector by using the means of the labels from the $K$ nearest neighbours $N_K(x) \subset \{1, 2, \ldots, m\}$ of a given instance $x$. For a test instance $x$, the prediction is calculated by

$$\hat{y}_t = \frac{1}{K} \sum_{i \in N_K(x)} y_{t,i}$$
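A minimal sketch of this multi-output k-NN prediction follows. It assumes Euclidean distance in feature space, which the text does not specify, and the function name `knn_predict` is hypothetical.

```python
import math

def knn_predict(train_x, train_y, x, k):
    """Mean label vector of the k training instances nearest to x."""
    # Indices of training instances sorted by distance to the query (Euclidean: an assumption).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    neighbours = order[:k]
    n_tasks = len(train_y[0])
    return [sum(train_y[i][t] for i in neighbours) / k for t in range(n_tasks)]

train_x = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
train_y = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # one label vector per instance
prediction = knn_predict(train_x, train_y, (0.2, 0.0), k=2)
```

Because every training label vector sums to 1, the averaged prediction also sums to 1 without further normalisation.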

b) Multi Layer Perceptron with Softmax (MLP): The MLP is a widely used feed forward neural network that can be naturally applied to learn multiple regression tasks. For our purposes we further constrain the sum of the network outputs to 1 by using the softmax activation function [27]. This is an extension of the logistic function given by:

$$\hat{y}_{t,i} = \frac{\exp(q_{t,i})}{\sum_{s=1}^{T} \exp(q_{s,i})}$$

where $q_{t,i}$ is the activation value for the output node for task $t$ and input $i$.
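The softmax constraint on the network outputs can be illustrated with a short, self-contained function. Subtracting the maximum activation before exponentiating is a standard numerical-stability step added here; it is not mentioned in the text and does not change the result.

```python
import math

def softmax(q):
    """Map output-node activations q to non-negative values that sum to 1."""
    shift = max(q)                              # avoids overflow for large activations
    exps = [math.exp(v - shift) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

uniform = softmax([0.0, 0.0, 0.0, 0.0, 0.0])    # equal activations give equal beliefs
peaked = softmax([2.0, 1.0, 0.0, -1.0, -2.0])
```

The ordering of the activations is preserved, so the top-ranked laughter type is unchanged by the transformation.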

c) Random Forest (RF): We also investigate the use of the Random Forest algorithm [28] to generate an ensemble of decision trees, using the mean of the ensemble as the final outcome. Each of these trees only has access to a set of $\delta$ attributes, randomly chosen when the tree was created. In the experiments conducted here, we have set the number of trees to 500, and the number of attributes considered for each tree to $\delta = \sqrt{d} = 5$, as suggested in [29].

d) Linear and Kernel Ridge Regression (RR, KRR): This is a baseline regression approach. In the linear form, RR is based on solving the optimization problem

$$\min_{w_t} \left\| X^\top w_t - Y_t \right\|_2^2 + \lambda \left\| w_t \right\|_2^2$$

where $w_t$ represents the weight vector of the linear model $f_t(x) = \langle w_t, x \rangle$, $x, w_t \in \mathbb{R}^d$, for task $t \in \{1 \ldots T\}$. For convenience we denote as $\|\cdot\|_2$ the $\ell_2$-norm of a vector. One can extend this approach to non-linear models by applying the kernel trick. In this case we have chosen the Gaussian kernel

$$K(x, t) = \exp\left( \frac{-1}{2\sigma^2} \left\| x - t \right\|_2^2 \right)$$
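For reference, the linear ridge problem has a well-known closed-form solution, which the text does not state. A short derivation, using $X \in \mathbb{R}^{d \times m}$ and the label vector $Y_t \in \mathbb{R}^m$ as defined earlier, sets the gradient of the objective to zero:

```latex
\nabla_{w_t}\left( \left\| X^\top w_t - Y_t \right\|_2^2 + \lambda \left\| w_t \right\|_2^2 \right)
  = 2 X \left( X^\top w_t - Y_t \right) + 2 \lambda w_t = 0
\quad\Longrightarrow\quad
w_t = \left( X X^\top + \lambda I_d \right)^{-1} X Y_t
```

For $\lambda > 0$ the matrix $X X^\top + \lambda I_d$ is positive definite, so the inverse always exists; the $T$ tasks share the same matrix and differ only in $Y_t$.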

e) Linear and Kernel Support Vector Regression (SVR, KSVR): Finally we implement Support Vector Regression to predict the degree of belief of each of the laughter types based on the frequency of the ratings for each instance. In the linear form, SVR is based on the optimization of the following problem:

$$\min_{w_t, \xi} \; \frac{1}{2} \left\| w_t \right\|_2^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad \left| y_{t,i} - \langle w_t, x_i \rangle \right| \leq \varepsilon + \xi_i, \quad \xi_i \geq 0$$

Here, $\varepsilon \geq 0$ is the deviation allowed from the ground truth labels $y_{t,i}$. This constraint is weakened at some points by adding an extra margin $\xi_i$. The penalty on deviations larger than $\varepsilon$ is adjusted by the second parameter $C \geq 0$. Similar to KRR, a non-linear variant, KSVR, is also used in the comparison, also employing the Gaussian kernel.
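The role of the slack variable can be made concrete with a small sketch. For a fixed prediction, the smallest slack satisfying both constraints is the amount by which the absolute error exceeds the allowed deviation. The function name `epsilon_insensitive_slack` is hypothetical.

```python
def epsilon_insensitive_slack(y, prediction, epsilon):
    """Smallest slack xi >= 0 such that |y - prediction| <= epsilon + xi."""
    return max(0.0, abs(y - prediction) - epsilon)

inside = epsilon_insensitive_slack(0.50, 0.45, epsilon=0.1)    # error within the tube
outside = epsilon_insensitive_slack(0.50, 0.20, epsilon=0.1)   # error exceeds the tube
```

Only instances whose error exceeds the allowed deviation contribute to the sum weighted by C in the objective.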

B. Evaluation Metrics

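In order to robustly evaluate the multiple outcomes of the models against the distribution of the observers' categorisations, as suggested in [23], we apply four well established multi-score metrics over a number of instances $M$, writing $y_i = (y_{1,i}, \ldots, y_{T,i})$ for the label vector of instance $i$ and $\hat{y}_i$ for its prediction:

1) Mean Square Error: this is the standard loss function, which is computed as:

$$\mathrm{MSE} := \frac{1}{MT} \sum_{i=1}^{M} \sum_{t=1}^{T} \left( y_{t,i} - \hat{y}_{t,i} \right)^2$$

2) Cosine Similarity: finds the cosine of the angle between two vectors, resulting in a maximum of 1 when the vectors are fully aligned.

$$\mathrm{CS} := \frac{1}{M} \sum_{i=1}^{M} \frac{\langle y_i, \hat{y}_i \rangle}{\left\| y_i \right\|_2 \left\| \hat{y}_i \right\|_2}$$

3) Top Match Rate: evaluates the number of times the predicted top ranked label is the same as the top ranked label for the ground truth.

$$\mathrm{TMR} := \frac{1}{M} \sum_{i=1}^{M} \mathbb{1}\left( \operatorname*{argmax}_{1 \leq t \leq T} y_{t,i} = \operatorname*{argmax}_{1 \leq t \leq T} \hat{y}_{t,i} \right)$$

where $\mathbb{1}(A)$ is a function on condition $A$:

$$\mathbb{1}(A) = \begin{cases} 1, & A \text{ is true} \\ 0, & A \text{ is false} \end{cases}$$

4) Ranking Loss: this metric calculates the average fraction of label pairs that are reversely ordered for an instance. By ordering the label outcomes as $y_{1,i} \geq y_{2,i} \geq \ldots \geq y_{T,i}$, the ranking loss of the predicted outputs can be calculated by:

$$\mathrm{RL} := \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{T-1} \sum_{k=j+1}^{T} \frac{\mathbb{1}\left( \hat{y}_{j,i} < \hat{y}_{k,i} \right)}{T \times (T - 1) / 2}$$

where $\mathbb{1}(A)$ is the same function on condition $A$ as for TMR.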
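The four metrics can be sketched in plain Python as below. This is an illustrative reading of the definitions, not the authors' code: the averaging of the squared error over both instances and tasks is an assumption, and all function names are hypothetical. `Y` and `P` are lists of observed and predicted label vectors, one per instance.

```python
import math

def mse(Y, P):
    """Squared error averaged over all instances and tasks (normalisation assumed)."""
    M, T = len(Y), len(Y[0])
    return sum((y - p) ** 2 for yi, pi in zip(Y, P) for y, p in zip(yi, pi)) / (M * T)

def cosine_similarity(Y, P):
    """Mean cosine of the angle between observed and predicted vectors."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))
    return sum(cos(yi, pi) for yi, pi in zip(Y, P)) / len(Y)

def top_match_rate(Y, P):
    """Fraction of instances whose top-ranked predicted label matches the observed one."""
    def argmax(v):
        return max(range(len(v)), key=v.__getitem__)
    return sum(argmax(yi) == argmax(pi) for yi, pi in zip(Y, P)) / len(Y)

def ranking_loss(Y, P):
    """Mean fraction of label pairs that the prediction orders in reverse."""
    total = 0.0
    for yi, pi in zip(Y, P):
        T = len(yi)
        order = sorted(range(T), key=lambda t: -yi[t])   # labels by decreasing observed value
        reversed_pairs = sum(pi[order[j]] < pi[order[k]]
                             for j in range(T) for k in range(j + 1, T))
        total += reversed_pairs / (T * (T - 1) / 2)
    return total / len(Y)

observed = [[0.5, 0.3, 0.2]]
flipped = [[0.2, 0.3, 0.5]]   # prediction that reverses the observed ranking
```

A perfect prediction gives MSE 0, CS 1, TMR 1 and RL 0, while the fully reversed ranking in `flipped` gives TMR 0 and RL 1.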

C. Recognition Results

We implement and evaluate all of the models outlined above using a leave one subject out (LOSO) validation approach. This ensures instances from the same subject are not present in training, validation and test sets at the same time. We split the subjects into three groups: n training subjects, 1 validation subject to tune model parameters and 1 testing subject to assess performance. For each model this procedure is repeated 72 times (9 test subjects × 8 validation subjects, accounting for all combinations) and the average results are reported. Parameter values were tuned over a set range for each of the models; the appropriate ranges were determined in initial experiments. The parameters adjusted are as follows: for k-NN: $k$; RR: $\lambda$; SVR: $C$; KSVR: $C$, $\sigma$; KRR: $\lambda$, $\sigma$; and MLP: $n_{\mathrm{hidden}}$ (the number of hidden layer nodes).
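The 72 train/validation/test combinations can be enumerated as in the following sketch. Subject identifiers 1 to 9 and the function name `loso_splits` are placeholders chosen for illustration.

```python
from itertools import permutations

def loso_splits(subjects):
    """Yield (train, validation, test) for every ordered pair of distinct subjects."""
    for test, validation in permutations(subjects, 2):
        train = [s for s in subjects if s not in (test, validation)]
        yield train, validation, test

splits = list(loso_splits(range(1, 10)))   # nine recorded subjects
```

With nine subjects each split holds out two of them, leaving seven for training, and the ordered pairs give 9 × 8 = 72 runs.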

Table II compares the performances of all of the models using the four multi-score metrics. The results show the mean (and standard deviation) of each measure after the 72 runs.

TABLE II. COMPARISON OF RECOGNITION PERFORMANCES. ↑ INDICATES HIGHER VALUES CORRESPOND TO BETTER PERFORMANCE AND ↓ INDICATES THE OPPOSITE. THE FIRST SEVEN ROWS CORRESPOND TO THE AUTOMATIC RECOGNITION MODELS, THE LAST ROW (IR) INDICATES THE MEAN LEVEL OF AGREEMENT BETWEEN OBSERVER GROUPS.

Model   MSE ↓             CS ↑              TMR ↑             RL ↓
k-NN    0.0151 (0.0041)   0.8825 (0.0300)   0.5272 (0.1658)   0.2998 (0.0517)
RR      0.0142 (0.0030)   0.8892 (0.0242)   0.4935 (0.2175)   0.2942 (0.0800)
KRR     0.0145 (0.0037)   0.8871 (0.0287)   0.5054 (0.2026)   0.2972 (0.0700)
SVR     0.0148 (0.0040)   0.8837 (0.0350)   0.4967 (0.2070)   0.3005 (0.0879)
KSVR    0.0149 (0.0039)   0.8842 (0.0302)   0.4815 (0.1965)   0.2989 (0.0791)
MLP     0.0192 (0.0066)   0.8536 (0.0450)   0.4837 (0.2112)   0.3195 (0.0668)
RF      0.0101 (0.0036)   0.9205 (0.0250)   0.6620 (0.1665)   0.2648 (0.0467)
IR      0.0217 (0.0032)   0.9457 (0.0081)   0.8489 (0.0291)   0.1003 (0.0092)

In order to understand how well the form features alone (Table I) would perform, we also tested the models when trained without using the five energy based features. The results showed similar but reduced performances in comparison to the ones reported in Table II. For example the best performing scores without energy features were for the RF model with MSE: 0.0106, CS: 0.9163, TMR: 0.662, RL: 0.2712. This demonstrates the discriminatory power of the form features between laughter types. This supports previous results showing the importance of form in affective body expression recognition [30]. In addition, we also seek to understand the level of agreement between human observer groups. This calculation provides a quantitative context when assessing the rates given in Table II. Using a simplified version of the approach proposed in [20], the raters were split randomly into two groups of 16 and the collective predictions of each group were computed. The same four measures used for evaluating the systems were applied to measure the agreement between these two predictions. We repeated this process 10000 times and computed the averages (and standard deviation). The results are reported in the last row of Table II as IR. We can see that the results obtained for the models are very similar to the inter-rater agreement measures for MSE and CS but are worse for TMR and RL.
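One round of the split-half procedure for a single clip might look like the sketch below. It assumes that the "collective prediction" of a group is its normalised label frequency, which the text does not spell out; the function name `split_half_distributions` and the seeded generator are illustrative choices.

```python
import random

def split_half_distributions(ratings, labels, rng):
    """Randomly split one clip's ratings into two equal groups and normalise each."""
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    groups = (shuffled[:half], shuffled[half:])
    return [[group.count(label) / len(group) for label in labels] for group in groups]

labels = ("hilarious", "social")
ratings = ["hilarious"] * 20 + ["social"] * 12     # 32 observers, toy data
group_a, group_b = split_half_distributions(ratings, labels, random.Random(0))
```

Repeating the split many times over all clips and scoring one group's distributions against the other's with the four metrics would give an agreement estimate of the kind reported in the IR row.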

Table III shows the F1-score and accuracy of the classifications for each laughter type from each of the models by assuming the most frequent observer label as the ground truth and the highest model output as the prediction. This can be viewed as treating the data as a classification problem. Within the 126 instances there were only 6 instances where 'awkward' was the most frequent label and 5 instances for 'fake', whereas the numbers of instances for 'hilarious', 'social', and 'non-laughter' were 25, 46, and 44 respectively. Moreover, for some of the subjects these classes do not occur if ground truth is considered in this way. Since we use LOSO, classification performance cannot be measured for these two classes; we therefore show the F1 and accuracy scores for the remaining classes in Table III.

V. DISCUSSION AND CONCLUSION

In this section we discuss the findings from the perceptual study and the investigation into automated recognition.

TABLE III. F1-SCORE (TOP) AND ACCURACY (BOTTOM) FOR EACH MODEL BASED ON THE MOST FREQUENT OBSERVER LABELS FOR THE THREE CATEGORIES WITH A SIGNIFICANT NUMBER OF INSTANCES.

Model             Hilarious   Social   Not a Laugh
k-NN   F1         0.5941      0.3864   0.5498
       Accuracy   0.6000      0.3397   0.6818
RR     F1         0.5253      0.2287   0.5875
       Accuracy   0.5700      0.1712   0.7869
KRR    F1         0.5268      0.2744   0.6068
       Accuracy   0.5900      0.2174   0.7585
SVR    F1         0.5103      0.2555   0.5864
       Accuracy   0.5600      0.1902   0.7813
KSVR   F1         0.4840      0.2894   0.5676
       Accuracy   0.4900      0.2418   0.7273
MLP    F1         0.4175      0.3797   0.5995
       Accuracy   0.4050      0.3261   0.6932
RF     F1         0.5636      0.5562   0.7441
       Accuracy   0.6200      0.5516   0.8011

Analysis based on observer categorization of avatar animations revealed diagnostic body movement features for laughter perception. The importance of spine movements in sitting and standing postures may reflect observers' sensitivity to the respiratory movements that generate characteristic laughter vocalizations and cause the spine to bend [9]. Similarly, that hilarious laughter had a greater range of spine bending than social laughter may be due to the energetic nature of hilarious laughter relative to more controlled social laughter.

The range of azimuthal shoulder rotation was greater in laughter than non-laughter in the sitting but not standing posture. When standing, models were free to turn, whereas in the more constrained sitting condition shoulder rotation may have been indicative of an energetic laughter episode. Alongside the findings on spine bending, this hints that greater upper body movement may indicate laughter. It is counter-intuitive that any large upper body movement indicates laughter, so observers' perception of laughter compared to energetic, non-laughter movements, e.g. coughs, should be investigated.

The range of distance between the hands was greater in laughter than non-laughter segments, also indicating discrimination based on the overall amount of movement. An alternative explanation is the presence of specific gestures such as pointing to laughter-eliciting stimuli. Standing laughter segments had a smaller minimum right hand to head distance than those categorized as non-laughter, suggesting that moving the hand near or onto the face was seen as indicative of laughter. This is of particular interest, since this gesture is incidental to the core process of laughing; however, the timing of this gesture may be crucial in conveying the presence and nature of laughter and such temporal factors merit further investigation. For example the study reported in [31] shows that local temporal dynamics improve the automatic discrimination between affective body expressions.

There was insufficient consensus on awkward and fake laughter to draw conclusions on body movements indicative of these laughter types. These laughter types may be too emotionally and socially complex, or too infrequent in real life, for observers to have a clear mental model of the body movements associated with them. Alternatively these types of laughter may be indistinguishable, on the basis of body movements alone, from hilarious or social laughter, or from non-laughter speech. Further information, such as vocalizations, facial expressions, and context may be necessary for observers to disambiguate them.

Although we optimized capture of shoulder and spine

movement, the avatar animations were unable to show non-

rigid deformation of the avatar sections (shoulder movement

was shown through relative movements of rigid sections).

Non-rigid deformations of the torso from respiratory action

may be important in animating naturalistic laughter [9]. In

addition, our equipment did not capture hand gestures, so

the precise nature of arm and hand movements may have

been ambiguous to observers; for example, they may have

been unable to distinguish a pointing gesture from a palm-up

gesture. Annotation of the video recordings of these sessions

in future will identify meaningful gestures and, when these can

be animated, allow us to analyse their contribution towards the

perception of different laughter types.

Ultimately, the capture of body movements using more

accessible technology, e.g., Microsoft Kinect, will make laughter

detection ubiquitous in interactive systems. Our findings

suggest that torso bending movements, possibly driven by

respiratory actions, and peripheral gestures are used by ob-

servers to detect and classify laughter, and that these should

be included when animating laughter. The resting posture, e.g.

sitting vs. standing, should also be considered as it affects

laughter-diagnostic movements, e.g., shoulder rotation. Future

work should cover more complex laughter, e.g. awkward, that

we were unable to reliably elicit in this study. The sex, age,

cultural background and personality of the laugher and observer

should also be further considered; for example, laughter

produced by extroverts and introverts may vary and specific

attitudes towards laughter may affect the perception of the

emotional content of the laughter. Some of these factors have

been investigated in [32] using the same set of body laughter

stimuli used in our study. The role of body movements may be

more complex in multimodal displays than in this uni-modal

study and our findings should be validated with simultaneous

facial and audio information to establish their applicability in

functional human-avatar interactions. The temporal dependen-

cies of laughter signals between these modalities and within the

body-movement channel will need to be carefully considered in

these scenarios as the perceived emotional content of laughter

may be strongly dependent on the order, duration and temporal

profile, e.g. onset and offset speed, of these signals.

The results on automatic recognition (Tables II and III)

demonstrate the effectiveness of the non-parametric RF

model. The relatively poorer performance of the parametric models

could be partially explained by the LOSO validation process

used to tune the model parameters. Because LOSO

separates the training, validation and test sets by subject, the

parametric models may have been prone to idiosyncratic effects

during this tuning; this did not affect the RF model as no

pre-tuning was done. In contrast, the other models showed a

significant dependency on their respective tuned parameters k,

λ, C, σ and n_hidden. The processing times for all of the models

are similar and within the same order of magnitude, with

the exception of the MLP, which required up to 10 times longer

depending on n_hidden. When considering MSE and CS scores,

the recognition methods show good performance. These

metrics are more sensitive to the distribution of observer labels

upon which all of the models are trained. It can be concluded

that our full feature set used in this study is descriptive and

appropriate for learning the observer distributions, with the

worst-performing model (MLP) still returning high scores. In

contrast, when considering TMR and LR, all of the models

return mediocre scores. However, in principle,

this is not unexpected since all the methods are regression

models by design.
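The contrast between distribution-based and category-based scores can be illustrated with a small Python sketch. It assumes that CS denotes cosine similarity between the predicted and observed rating distributions, and it uses a simple top-category comparison as a stand-in for a categorical criterion; both definitions, the function names and the example numbers are assumptions made for illustration and are not taken from the study's results.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length distributions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def cosine_similarity(pred, target):
    """Cosine of the angle between two distributions treated as vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return dot / norm if norm else 0.0


def top_match(pred, target):
    """True if the highest-rated category is the same in both distributions."""
    return pred.index(max(pred)) == target.index(max(target))


if __name__ == "__main__":
    # Hypothetical rating distributions over five laughter states.
    observed = [0.40, 0.35, 0.10, 0.05, 0.10]
    predicted = [0.36, 0.39, 0.10, 0.05, 0.10]
    print(mse(predicted, observed))
    print(cosine_similarity(predicted, observed))
    print(top_match(predicted, observed))
```

In this made-up example the predicted distribution is very close to the observed one in terms of MSE and cosine similarity, yet the top category differs because the two leading categories are nearly tied. This is one way a regression model can score well on distribution metrics while scoring only moderately on a categorical criterion.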

Table III shows F1 classification scores for three categories:

hilarious, social and non-laughter. The most readily classified

category is non-laughter, with social the most difficult to

discriminate. This shows the feature set used in this study could

be salient for classifying non-laughter from body movements.

Nevertheless, the features are still descriptive for the discrimination

of the other classes, well above chance level (20%). It is also

worth noting that the MSE and CS rates for all of the models

are similar to the MSE and CS scores for the inter-observer

group agreement. However, the two sets of scores are not

directly comparable, since the values in Table II stem from all

32 observers and the values calculated for IR stem from two

groups of 16. Nevertheless, it does provide an indicator of the

model performances relative to human recognition rates.
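A split-half comparison of this kind can be sketched in Python as follows. The sketch assumes that each stimulus has one category label per observer, that the observers are divided at random into two equal groups, and that agreement is measured as the MSE between the two groups' label distributions; the grouping procedure and the function names are assumptions for illustration, not the procedure reported in the paper.

```python
import random
from collections import Counter

CATEGORIES = ["hilarious", "social", "awkward", "fake", "non-laughter"]


def label_distribution(labels):
    """Proportion of observers choosing each category, in CATEGORIES order."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in CATEGORIES]


def split_half_mse(labels, rng):
    """MSE between the label distributions of two random halves of the observers."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first = label_distribution(shuffled[:half])
    second = label_distribution(shuffled[half:])
    return sum((a - b) ** 2 for a, b in zip(first, second)) / len(CATEGORIES)


if __name__ == "__main__":
    # Hypothetical labels from 32 observers for a single animation.
    labels = ["hilarious"] * 14 + ["social"] * 10 + ["non-laughter"] * 8
    print(split_half_mse(labels, random.Random(0)))
```

Because each half contains only 16 observers, its distribution is estimated from fewer labels than the 32-observer distribution used to train and score the models, which is why the two sets of scores are indicative rather than directly comparable.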

Future work should include the in-depth analysis of the

decision tree ensembles within the RF model. This could give

insight into the various features and corresponding thresholds

that have the most discriminatory power and could further

inform the design of improved recognition systems. Further-

more, methods to account for idiosyncratic artifacts should be

considered such as individual bias removal [22] or transfer

learning methods [33].
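One simple reading of individual bias removal is to subtract each subject's mean feature vector from that subject's samples, so that the models see deviations from a personal baseline rather than absolute values. The Python sketch below illustrates this idea only; the function name `remove_subject_bias` and the data layout are hypothetical, and the sketch is not claimed to reproduce the method of [22] or the transfer learning approach of [33].

```python
from collections import defaultdict


def remove_subject_bias(samples):
    """Subtract the per-subject mean from every feature vector.

    samples: list of (subject_id, feature_vector) pairs (hypothetical layout).
    Returns a list in the same order with centred feature vectors.
    """
    sums = {}
    counts = defaultdict(int)
    for subject, features in samples:
        if subject not in sums:
            sums[subject] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[subject][i] += value
        counts[subject] += 1

    # Mean feature vector of each subject.
    means = {s: [v / counts[s] for v in total] for s, total in sums.items()}

    return [(s, [v - m for v, m in zip(f, means[s])]) for s, f in samples]


if __name__ == "__main__":
    data = [("a", [1.0, 4.0]), ("a", [3.0, 8.0]), ("b", [10.0, 0.0])]
    print(remove_subject_bias(data))
```

In a subject-wise validation scheme such as LOSO, the mean of a held-out subject would have to be estimated from that subject's own data, so this kind of centring presupposes that some unlabeled data from each new subject is available.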

ACKNOWLEDGMENTS

The research leading to these results has received funding

from the European Union Seventh Framework Programme

(FP7/2007-2013) under grant agreement no. 270780. We thank

all those who participated in our experiments, Jianchuan Qi for

his help in collecting the motion capture data and the members

of the ILHAIRE consortium for their comments.

REFERENCES

[1] C. Creed and R. Beale, “User interactions with an affective nutritional

coach,” Interacting with Computers, vol. 24, no. 5, pp. 339–350, 2012.

[2] M. Ochs, R. Niewiadomski, P. Brunet, and C. Pelachaud, “Smiling vir-

tual agent in social context,” International journal Cognitive Processing,

pp. 1–14, 2011.

[3] E. Holt, “The last laugh: Shared laughter and topic termination,” Journal

of Pragmatics, vol. 42, pp. 1513–1525, 2010.

[4] G. McKeown, R. Cowie, W. Curran, W. Ruch, and E. Douglas-

Cowie, “ILHAIRE laughter database,” in Proceedings of 4th International

Workshop on Corpora for Research on Emotion, Sentiment & Social

Signals, LREC, 2012, pp. 32–35.

[5] D. Cosker and J. Edge, “Laughing, crying, sneezing and yawning:

Automatic voice driven animation of non-speech articulations,” in

Proceedings of Computer Animation and Social Agents, CASA, 2009.

[6] R. Niewiadomski, J. Urbain, C. Pelachaud, and T. Dutoit, “Finding out

the audio and visual features that influence the perception of laughter

intensity and differ in inhalation and exhalation phases,” in Proceedings

of 4th International Workshop on Corpora for Research on Emotion,

Sentiment & Social Signals, LREC, 2012.

[7] J. Urbain, R. Niewiadomski, E. Bevacqua, T. Dutoit, A. Moinet,

C. Pelachaud, B. Picart, J. Tilmanne, and J. Wagner, “AVLaughterCycle:

enabling a virtual agent to join in laughing with a conversational partner

using a similarity-driven audiovisual laughter animation,” Journal of

Multimodal User Interfaces, vol. 4, pp. 47–58, 2010.

[8] M. Filippelli, R. Pellegrino, I. Iandelli, G. Misuri, J. Rodarte, R. Duranti,

V. Brusasco, and G. Scano, “Respiratory dynamics during laughter,”

Journal of Applied Physiology, vol. 90, pp. 1441–1446, 2001.

[9] P. C. DiLorenzo, V. B. Zordan, and B. L. Sanders, “Laughing out loud: control

for modeling anatomically inspired laughter using audio,” ACM

Transactions on Graphics, vol. 27, p. 125, 2008.

[10] R. Niewiadomski and C. Pelachaud, “Towards multimodal expression

of laughter,” in Intelligent Virtual Agents. Springer, 2012, pp. 231–244.

[11] C.-H. Chou, C.-H. Li, B.-W. Chen, J.-F. Wang, and P.-C. Lin, “A real-

time training-free laughter detection system based on novel syllable

segmentation and correlation methods,” in Awareness Science and

Technology (iCAST), 2012 4th International Conference on. IEEE,

2012, pp. 294–297.

[12] K. Laskowski, “Contrasting emotion-bearing laughter types in multipar-

ticipant vocal activity detection for meetings,” in Acoustics, Speech and

Signal Processing, 2009. ICASSP 2009. IEEE International Conference

on. IEEE, 2009, pp. 4765–4768.

[13] M. Miranda, J. A. Alonzo, J. Campita, S. Lucila, and M. Suarez, “Dis-

covering emotions in Filipino laughter using audio features,” in Human-

Centric Computing (HumanCom), 2010 3rd International Conference

on. IEEE, 2010, pp. 1–6.

[14] S. Petridis and M. Pantic, “Audiovisual discrimination between laughter

and speech,” in Acoustics, Speech and Signal Processing, 2008. ICASSP

2008. IEEE International Conference on. IEEE, 2008, pp. 5117–5120.

[15] S. Fukushima, Y. Hashimoto, T. Nozawa, and H. Kajimoto, “Laugh

enhancer using laugh track synchronized with the user’s laugh motion,”

in CHI ’10 Extended Abstracts on Human Factors in Computing

Systems, ser. CHI EA ’10. New York, NY, USA: ACM, 2010, pp.

3613–3618.

[16] M. Mancini, G. Varni, D. Glowinski, and G. Volpe, “Computing and

evaluating the body laughter index,” in Human Behavior Understanding.

Springer, 2012, pp. 90–98.

[17] M. El Ayadi, M. S. Kamel, and F. Karray, “Survey on speech emotion

recognition: Features, classification schemes, and databases,” Pattern

Recognition, vol. 44, no. 3, pp. 572–587, 2011.

[18] Z. Zeng, M. Pantic, G. Roisman, and T. Huang, “A survey of affect

recognition methods: Audio, visual, and spontaneous expressions,”

Pattern Analysis and Machine Intelligence, IEEE Transactions on,

vol. 31, no. 1, pp. 39–58, 2009.

[19] A. Kleinsmith and N. Bianchi-Berthouze, “Affective body expression

perception and recognition: a survey,” IEEE Trans. Affective Computing,

vol. 4, pp. 15–33, 2013.

[20] A. Kleinsmith, N. Bianchi-Berthouze, and A. Steed, “Automatic recog-

nition of non-acted affective postures,” Systems, Man, and Cybernetics,

Part B: Cybernetics, IEEE Transactions on, vol. 41, no. 4, pp. 1027–

1038, 2011.

[21] G. Castellano, S. D. Villalba, and A. Camurri, “Recognising human

emotions from body movement and gesture dynamics,” in Affective

computing and intelligent interaction. Springer, 2007, pp. 71–82.

[22] D. Bernhardt and P. Robinson, “Detecting affect from non-stylised body

motions,” in Affective Computing and Intelligent Interaction. Springer,

2007, pp. 59–70.

[23] H. Meng, A. Kleinsmith, and N. Bianchi-Berthouze, “Multi-score

learning for affect recognition: the case of body postures,” in Affective

Computing and Intelligent Interaction. Springer, 2011, pp. 225–234.

[24] C. Galvan, D. Manangan, M. Sanchez, J. Wong, and J. Cu, “Audiovisual

affect recognition in spontaneous Filipino laughter,” in Knowledge and

Systems Engineering (KSE), 2011 Third International Conference on.

IEEE, 2011, pp. 266–271.

[25] G. McKeown, W. Curran, C. McLoughlin, H. J. Grifﬁn, and N. Bianchi-

Berthouze, “Laughter induction techniques suitable for generating mo-

tion capture data of laughter associated body movements,” in 2nd Inter-

national Workshop on Emotion Representation, Analysis and Synthesis

in Continuous Time and Space (EmoSPACE), 2013.

[26] W. Ruch and P. Ekman, “The expressive pattern of laughter,” Emotion,

qualia, and consciousness, pp. 426–443, 2001.

[27] J. S. Bridle, “Probabilistic interpretation of feedforward classification

network outputs, with relationships to statistical pattern recognition,” in

Neurocomputing. Springer, 1990, pp. 227–236.

[28] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp.

5–32, 2001.

[29] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and

B. P. Feuston, “Random forest: a classification and regression tool

for compound classification and QSAR modeling,” Journal of chemical

information and computer sciences, vol. 43, no. 6, pp. 1947–1958, 2003.

[30] A. Kleinsmith and N. Bianchi-Berthouze, “Form as a cue in the

automatic recognition of non-acted affective body expressions,” Lecture

Notes in Computer Science, vol. 6874, pp. 155–164, 2011.

[31] A. Kleinsmith, T. Fushimi, and N. Bianchi-Berthouze, “An incremental

and interactive affective posture recognition system,” in International

Workshop on Adapting the Interaction Style to Affective Factors, in

conjunction with the International Conference on User Modeling, 2005.

[32] G. McKeown, W. Curran, D. Kane, R. McCahon, H. Griffin,

C. McLoughlin, and N. Bianchi-Berthouze, “Human perception of

laughter from context-free whole body motion dynamic stimuli,” in

International Conference on Affective Computing and Intelligent In-

teraction, 2013, in press.

[33] B. Romera-Paredes, M. Aung, M. Pontil, A. Williams, P. Watson, and

N. Bianchi-Berthouze, “Transfer learning to account for idiosyncrasy in

face and body expressions,” Automatic Face and Gesture Recognition,

2013.

A Longitudinal Characterization of Typical Laughter Development in Mother–Child Interaction from 12 to 36 Months: Formal Features and Reciprocal Responsiveness

Article

Full-text available

Aug 2022
J NONVERBAL BEHAV

Laughter is a valuable means for communicating and engaging in interaction since the earliest months of life. Nevertheless, there is a dearth of work on how its use develops in early interactions—given its putative reflexive nature, it has often been disregarded from studies on pre-linguistic vocalizations. We provide a longitudinal characterization of laughter use analyzing interactions of 4 babies with their mothers at five time-points (12, 18, 24, 30, and 36 months). We show how child laughter is very distinct from mothers’ (and adults’ generally), in terms of frequency, duration, level of arousal displayed, overlap with speech, and responsiveness to others’ laughter. Notably, contrary to what might be expected, we observed that children laugh significantly less than their mothers, especially at the first time-points analyzed. We indeed observe an increasing developmental trajectory in the production of laughter overall and in the contingent multimodal response to mothers’ laughter, showing the child’s increasing attunement to the social environment, interest in others’ appraisals and mental states, and awareness of its communicative value. We also show how mothers’ contingent responses to child laughter change over time, going from high-frequency mimicry, to a lower rate of diversified multimodal responses, in line with the child’s neuro-psychological development. Our data support a dynamic view of dialogue where interactants influence each other bidirectionally and emphasizes the crucial communicative value of laughter. When language is not fully developed, laughter might be an early means, in its already fully available expressiveness, to hold the conversational turn and enable meaningful vocal contribution in interaction at the same level of the interlocutor. 
Our study aims to provide a benchmark for typical laughter development, since we believe it can be an early means, along with other commonly analyzed behaviors (e.g., smiling, gazing, pointing, etc.), to gain insight into early child neuro-psychological development.

Growing up laughing: Laughables and pragmatic functions between 12 and 36 months

Article

Jul 2023
J PRAGMATICS

Laughter in interaction : semantics, pragmatics, and child development

Thesis

Full-text available

Sep 2019

Chiara Mazzocconi

Laughter is a social vocalization universal across cultures and languages. It is ubiquitous in our dialogues and able to serve a wide range of functions. Laughter has been studied from several perspectives, but the classifications proposed are hard to integrate. Despite being crucial in our daily interaction, relatively little attention has been devoted to the study of laughter in conversation, attempting to model its sophisticated pragmatic use, neuro-correlates in perception and development in children. In the current thesis a new comprehensive framework for laughter analysis is proposed, crucially grounded in the assumption that laughter has propositional content, arguing for the need to distinguish different layers of analysis, similarly to the study of speech: form, positioning, semantics and pragmatics. A formal representation of laughter meaning is proposed and a multilingual corpus study (French, Chinese and English) is conducted in order to test the proposed framework and to deepen our understanding of laughter use in adult conversation. Preliminary investigations are conducted on the viability of a laughter form-function mapping based on acoustic features and on the neuro-correlates involved in the perception of laughter serving different functions in natural dialogue. Our results give rise to novel generalizations about the placement, alignment, semantics and function of laughter, stressing the high pragmatic skills involved in its production and perception. The development of the semantic and pragmatic use of laughter is observed in a longitudinal corpus study of 4 American-English child-mother pairs from 12 to 36 months of age. Results show that laughter use undergoes important development at each level analysed, which complies with what could be hypothesised on the base of phylogenetic data, and that laughter can be an effective means to track cognitive/communicative development, and potential difficulties or delays at a very early stage.

Character Computing

Book

Full-text available

Jan 2020

The book gives an introduction into the theory and practice of the transdisciplinary field of Character Computing, introduced by Alia El Bolock. The latest scientific findings indicate that “One size DOES NOT fit all” in terms of how to design interactive systems and predict behavior to tailor the interaction experience. Emotions are one of the essential factors that influence people’s daily experiences; they influence decision making and how different emotions are interpreted by different individuals. For example, some people may perform better under stress and others may break. Building upon Rosalind Picard’s vision, if we want computers to be genuinely intelligent and to interact naturally with us, we must give computers the ability to recognize, understand, even to have and express emotions and how different characters perceive and react to these emotions, hence having richer and truly tailored interaction experiences. Psychological processes or personality traits are embedded in the existing fields of Affective and Personality Computing. However, this book is the first that systematically addresses this including the whole human character; namely our stable personality traits, our variable affective, cognitive and motivational states as well as our morals, beliefs and socio-cultural embedding. The book gives an introduction into the theory and practice of the transdisciplinary field of Character Computing. The emerging field leverages Computer Science and Psychology to extend technology to include the whole character of humans and thus paves the way for researchers to truly place humans at the center of any technological development. Character Computing is presented from three main perspectives: ● Profiling and sensing the character ● Leveraging characters to build ubiquitous character-aware systems ● Investigating how to extend Artificial Intelligence to create artificial characters

A Psychologically Driven, User-Centered Approach to Character Modeling

Chapter

Jan 2020

Character Computing is a novel and interdisciplinary field of research based on interactive research between Computer Science and Psychology. To allow appropriate recognition and prediction of human behavior, Character Computing needs to be grounded on psychological definitions of human behavior that consider explicit as well as implicit human factors. The framework that guides Character Computing therefore needs to be of considerable complexity in order to capture the human user’s behavior in its entirety. The question to answer in this chapter is how Character Computing can be empirically realized and validated. The psychologically driven interdisciplinary framework for Character Computing will be outlined and how it is realized empirically as Character Computing platform. Special focus in this chapter is laid on experimental validation of the Character Computing approach including concrete laboratory experiments. The chapter adds to the former chapter which discussed the different steps of the Character Computing framework more broadly with respect to current theories and trends in Psychology and Behavior Computing.

Impact of Annotation Modality on Label Quality and Model Performance in the Automatic Assessment of Laughter In-the-Wild

Article

Jan 2023

Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multi-modal models for different combinations of input modalities, training, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, robust to cross-contamination, occlusion and perspective differences. Our results show that performance of models with body movement inputs does not suffer when trained with video-acquired labels, despite their lower inter-rater agreement.

Speakers Raise Their Hands and Head During Self-Repairs in Dyadic Conversations

Article

Dec 2023

People often encounter difficulties in building shared understanding during everyday conversation. The most common symptom of these difficulties are self-repairs, when a speaker restarts, edits or amends their utterances mid-turn. Previous work has focused on the verbal signals of self-repair, i.e. speech disfluences (filled pauses, truncated words and phrases, word substitutions or reformulations), and computational tools now exist that can automatically detect these verbal phenomena. However, face-to-face conversation also exploits rich non-verbal resources and previous research suggests that self-repairs are associated with distinct hand movement patterns. This paper extends those results by exploring head and hand movements of both speakers and listeners using two motion parameters: height (vertical position) and 3D velocity. The results show that speech sequences containing self-repairs are distinguishable from fluent ones: speakers raise their hands and head more (and move more rapidly) during self-repairs. We obtain these results by analysing data from a corpus of 13 unscripted dialogues, and we discuss how these findings could support the creation of improved cognitive artificial systems for natural human-machine and human-robot interaction.

Real-time audiovisual laughter detection

Conference Paper

May 2017

Analysis of Co-Laughter Gesture Relationship on RGB videos in Dyadic Conversation Contex

Preprint

Full-text available

Jun 2022

The development of virtual agents has enabled human-avatar interactions to become increasingly rich and varied. Moreover, an expressive virtual agent i.e. that mimics the natural expression of emotions, enhances social interaction between a user (human) and an agent (intelligent machine). The set of non-verbal behaviors of a virtual character is, therefore, an important component in the context of human-machine interaction. Laughter is not just an audio signal, but an intrinsic relationship of multimodal non-verbal communication, in addition to audio, it includes facial expressions and body movements. Motion analysis often relies on a relevant motion capture dataset, but the main issue is that the acquisition of such a dataset is expensive and time-consuming. This work studies the relationship between laughter and body movements in dyadic conversations. The body movements were extracted from videos using deep learning based pose estimator model. We found that, in the explored NDC-ME dataset, a single statistical feature (i.e, the maximum value, or the maximum of Fourier transform) of a joint movement weakly correlates with laughter intensity by 30%. However, we did not find a direct correlation between audio features and body movements. We discuss about the challenges to use such dataset for the audio-driven co-laughter motion synthesis task.

Character-IoT (CIoT): Toward Human-Centered Ubiquitous Computing

Chapter

Jan 2020

Amr El Mougy

Character Computing envisions systems that can detect, synthesize, and adapt to human character. The development and realization of this field hinge upon the availability of data about human character traits and states. This data must be comprehensive enough to model the embedded causality in the triad of behavior–situation–character that makes up the core of Character Computing. Acquiring this data requires an intelligent and scalable platform for sensing, processing, analysis, and decision support, which we label as Character-IoT (CIoT). This chapter investigates how this CIoT can be realized. A comprehensive study of sensing modalities in the areas of affective and personality computing is presented to identify the technologies that can be adopted in Character Computing. This includes facial expressions, speech, text, gestures, and others. We also highlight artificial intelligence techniques that are most commonly used in areas of affective and personality computing and analyze which ones are suitable for Character Computing. Finally, we propose an architectural framework for CIoT that can be adopted by future researchers in this field.

Affective Body Expression Perception and Recognition: A Survey

Article

Full-text available

Jan 2013

Thanks to the decreasing cost of whole-body sensing technology and its increasing reliability, there is an increasing interest in, and understanding of, the role played by body expressions as a powerful affective communication channel. The aim of this survey is to review the literature on affective body expression perception and recognition. One issue is whether there are universal aspects to affect expression perception and recognition models or if they are affected by human factors such as culture. Next, we discuss the difference between form and movement information as studies have shown that they are governed by separate pathways in the brain. We also review psychological studies that have investigated bodily configurations to evaluate if specific features can be identified that contribute to the recognition of specific affective states. The survey then turns to automatic affect recognition systems using body expressions as at least one input modality. The survey ends by raising open questions on data collecting, labeling, modeling, and setting benchmarks for comparing automatic recognition systems.

ILHAIRE Laughter Database

Conference Paper

Full-text available

May 2012

The ILHAIRE project seeks to scientifically analyse laughter in sufficient detail to allow the modelling of human laughter and subsequent generation and synthesis of laughter in avatars suitable for human machine interaction. As part of the process an incremental database is required providing different types of data to aid in modelling and synthesis. Here we present an initial part of that database in which laughs were extracted from a number of pre-existing databases. Emphasis has been placed on extraction of laughs that are social and conversational in style as there are already existing databases that include instances of hilarious laughter. However, an attempt has been made to exhaustively extract all instances of laughter from databases that were not designed for the purpose of generating hilarious laughter. Theses databases are: the Belfast Naturalistic Database, the HUMAINE Database, the Green Persuasive Database, the Belfast Induced Natural Emotion Database and the SEMAINE Database.

Laughter Induction Techniques Suitable for Generating Motion Capture Data of Laughter Associated Body Movements

Conference Paper

Full-text available

Apr 2013

Laughter is a frequently occurring social signal and an important part of human non-verbal communication. However it is often overlooked as a serious topic of scientific study. While the lack of research in this area is mostly due to laughter's non-serious nature, it is also a particularly difficult social signal to produce on demand in a convincing manner; thus making it a difficult topic for study in laboratory settings. In this paper we provide some techniques and guidance for inducing both hilarious laughter and conversational laughter. These techniques were devised with the goal of capturing mo-tion information related to laughter while the person laughing was either standing or seated. Comments on the value of each of the techniques and general guidance as to the importance of atmosphere, environment and social setting are provided.

Towards Multimodal Expression of Laughter

Conference Paper

Full-text available

Sep 2012

Laughter is a strong social signal in human-human and human-machine communication. However, very few attempts to model it exist. In this paper we discuss several challenges regarding the generation of laughs. We focus, more particularly, on two aspects a) modeling laughter with different intensities and b) modeling respiration behavior during laughter. Both of these models combine a data-driven approach with high-level animation control. Careful analysis and implementation of the synchronization mechanisms linking visual and respiratory cues has been undertaken. It allows us to reproduce the highly correlated multimodal signals of laughter on a 3D virtual agent.

An Incremental and Interactive Affective Posture Recognition System

Article

The role of body posture in affect recognition, and the im-portance of emotion in the development and support of intelligent and social behavior have been accepted and researched within several fields. While posture is considered important, much research has focused on ex-tracting emotion information from dance sequences. Instead, our focus is on creating an affective posture recognition system that incrementally learns to recognize and react to people's affective behaviors. In this pa-per, we examine a set of requirements for creating this system, and our proposed solutions. The first requirement is that the system is general and non-situation specific. Secondly, it should be able to handle explicit and implicit feedback. Finally, it must be able to incrementally learn the emotion categories without predefining them. We tested and compared the performance of our system using 182 standing postures described as a combination of form features and motion flow features, across sev-eral emotion categories, with a typical algorithm used for recognition, back-propagation, and with human observers in an aim to show the gen-eralizability of the system. This initial testing showed positive results.

Computing and Evaluating the Body Laughter Index

Conference Paper

Oct 2012

The EU-ICT FET Project ILHAIRE is aimed at endowing machines with automated detection, analysis, and synthesis of laughter. This paper describes the Body Laughter Index (BLI) for automated detection of laughter starting from the analysis of body movement captured by a video source. The BLI algorithm is described, and the index is computed on a corpus of videos. The assessment of the algorithm by means of subject's rating is also presented. Results show that BLI can successfully distinguish between different videos of laughter, even if improvements are needed with respect to perception of subjects, multimodal fusion, cultural aspects, and generalization to a broad range of social contexts.

Transfer learning to account for idiosyncrasy in face and body expressions

Conference Paper

Apr 2013

In this paper we investigate the use of the Transfer Learning (TL) framework to extract the commonalities across a set of subjects and also to learn the way each individual instantiates these commonalities to model idiosyncrasy. To implement this we apply three variants of Multi Task Learning, namely: Regularized Multi Task Learning (RMTL), Multi Task Feature Learning (MTFL) and Composite Multi Task Feature Learning (CMTFL). Two datasets are used; the first is a set of point based facial expressions with annotated discrete levels of pain. The second consists of full body motion capture data taken from subjects diagnosed with chronic lower back pain. A synchronized electromyographic signal from the lumbar paraspinal muscles is taken as a pain-related behavioural indicator. We compare our approaches with Ridge Regression which is a comparable model without the Transfer Learning property; as well as with a subtractive method for removing idiosyncrasy. The TL based methods show statistically significant improvements in correlation coefficients between predicted model outcomes and the target values compared to baseline models. In particular RMTL consistently outperforms all other methods; a paired t-test between RMTL and the best performing baseline method returned a maximum p-value of 2.3 × 10⁻⁴.

A real-time training-free laughter detection system based on novel syllable segmentation and correlation methods

Conference Paper

Aug 2012

In this paper, a laughter detection system based on the correlation characteristics of signals is proposed. The system is speaker-independent, computationally inexpensive, and training-free. To achieve this, a modified autocorrelation function (MACF) is combined with a new approach called the vocal tract transfer detector (VTTD) to segment an input signal into a syllable stream. Next, based on each syllable's Mel-scale frequency cepstral coefficients (MFCCs), the correlation between two consecutive syllables is measured by the dynamic time warping (DTW) algorithm. Consecutive syllables with high correlation are considered a laughter segment. In our experiments, the proposed system achieves an accuracy rate of 88.67%. In addition, compared with the baseline, the proposed system reduces the word error rate (WER) of syllable segmentation by 5.9%. These results indicate that the proposed method is effective in detecting laughter, demonstrating the feasibility of the system.
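
As an illustration of the DTW comparison step mentioned in this abstract, the following is a minimal sketch of the textbook dynamic time warping distance between two sequences of feature vectors. It is not the cited system's implementation: the function name `dtw_distance`, the Euclidean frame distance, and the toy vectors standing in for MFCC frames are all assumptions made for the example. A small distance corresponds to high similarity between two syllables.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of equal-dimension vectors.

    `a` and `b` are lists of frames (for example, per-frame MFCC vectors).
    The frame-to-frame cost is Euclidean distance (an assumption here).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Toy frames: the second sequence is a time-stretched copy of the first,
# so the warped alignment cost is zero.
syllable_1 = [[0.0], [1.0], [2.0]]
syllable_2 = [[0.0], [1.0], [1.0], [2.0]]
stretched_cost = dtw_distance(syllable_1, syllable_2)
```

In a detector of the kind the abstract describes, such a distance would be computed for each pair of consecutive syllables and compared against a threshold; the threshold value is not given in the abstract and would have to be chosen separately.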

Random Forests (Machine Learning, Volume 45, Number 1)

Article

Oct 2001

Leo Breiman

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
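
The idea summarized in this abstract (trees grown on independently drawn random vectors, with a random choice of features at each split, combined by voting) can be sketched in a deliberately simplified form. The code below is an assumption-laden toy, not Breiman's algorithm in full: each "tree" is a single-split stump, exactly one randomly chosen feature is considered per stump, and the names `fit_stump`, `fit_forest`, and `predict` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split tree on a single feature.

    Returns (feature, threshold, left_label, right_label).
    """
    values = sorted(set(row[feature] for row in X))
    majority = Counter(y).most_common(1)[0][0]
    # Fallback: constant prediction (every row goes left of an infinite threshold).
    best = (feature, float("inf"), majority, majority)
    best_errors = sum(label != majority for label in y)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if errors < best_errors:
            best, best_errors = (feature, threshold, left_label, right_label), errors
    return best


def fit_forest(X, y, n_trees=25, seed=0):
    """Grow stumps on bootstrap samples, each using one random feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        feature = rng.randrange(n_features)         # random feature for the split
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated classes in two features.
X = [[0, 0], [1, 1], [0, 1], [1, 0], [5, 5], [6, 6], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
```

A full random forest grows deep trees and samples a fresh feature subset at every node; the stump version only shows how bootstrap sampling, random feature choice, and voting fit together.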

User interactions with an affective nutritional coach

Article

Sep 2012

This paper investigates how users respond to emotional expressions displayed by an embodied agent. In a between-subjects experiment (N = 50) an emotionally expressive agent (simulating the role of a nutritional coach) was perceived as significantly more likeable and caring than an unemotional version. Feedback from participants also revealed detailed insights into their perceptions of the agents and highlighted a strong preference for the emotionally expressive version. Design implications for embodied agents are discussed and future research areas identified.
