Affect Detection and Classification from the Non-stationary Physiological Data

December 2013

1:240-245

DOI:10.1109/ICMLA.2013.49

Conference: Proceedings of the 2013 12th International Conference on Machine Learning and Applications - Volume 01

Authors:

Omar AlZoubi

Applied Science University

Davide Fossati

Carnegie Mellon University Qatar

Rafael A Calvo

Imperial College London

Affect Detection and Classification from the Non-Stationary Physiological Data

Omar AlZoubi and Davide Fossati

Computer Science

Carnegie Mellon University in Qatar

Doha, Qatar

oalzoubi@cmu.edu, dfossati@cmu.edu

Sidney D’Mello

Department of Computer Science

The University of Notre Dame

Notre Dame, IN, USA

sdmello@nd.edu

Rafael A. Calvo

School of Electrical and Information Engineering

The University of Sydney

Sydney, Australia

Rafael.Calvo@sydney.edu.au

Abstract— Affect detection from physiological signals has

received a great deal of attention recently. One arising

challenge is that physiological measures are expected to exhibit

considerable variations or non-stationarities over recordings from multiple

days/sessions. These variations make it challenging to

effectively classify affective states from future physiological

data. The present study collects affective physiological data

(electrocardiogram (ECG), electromyogram (EMG), skin

conductivity (SC), and respiration (RSP)) from four

participants over five sessions each. The study provides

insights on how diagnostic physiological features of affect

change over time. We compare the classification performance

of two feature sets, pooled features (obtained from pooled day

data) and day-specific features using an updatable classifier

ensemble algorithm. The study also provides an analysis on the

performance of individual physiological channels for affect

detection. Our results show that using a pooled feature set for

affect detection is more accurate than using day-specific

features. The corrugator and zygomatic facial EMGs were

more reliable measures for detecting valence than arousal

compared to ECG, RSP and SC over the span of multi-session

recordings. It is also found that corrugator EMG features and

a fusion of features from all physiological channels have the

highest affect detection accuracy for both valence and arousal.

Keywords—Affect; emotion; classifier ensembles;

physiological; Non-Stationary

I. INTRODUCTION

There is increased attention on using physiological signals in

affect detection systems that are able to detect either discrete

emotional categories or affective dimensions of valence and

arousal [1-3]. Physiological responses such as facial muscle

activity, skin conductivity, heart activity, and respiration,

have all been considered as potential physiological markers

for recognizing affective states. Despite high classification

rates achieved under laboratory conditions [4], the changing

nature of physiological signals introduces significant

challenges when one moves from the lab and into the real

world. In particular, physiological data is expected to exhibit

non-stationarities or day variations due to factors such as:

electrode drift, changes in the electrode impedance, and

modulations by other mental states such as attention and

motivation of subjects [2, 5].

Non-stationarity of the physiological signal indicates that

the signal changes its statistical characteristics as a function

of time. These changes then propagate to feature values

extracted from signals over time. Non-stationarities or day

variations of physiological data represent a major problem

for building reliable classification models of affect (i.e.,

generalizability within an individual). This is because

classification methods assume that training data is obtained

from a stationary distribution. In real world contexts,

however, this assumption of stationarity is routinely violated.

According to Kuncheva [6], every real-world classification

system should be equipped with a mechanism to adapt to

changes in the environment. Therefore, this study utilizes an

updatable ensemble classification approach (winnow),

discussed in more detail in section II.

Understanding the nature of these day variations is

essential for developing reliable affect detection systems that

can be deployed in real world affective computing

applications. There is a critical need for basic research on

how physiological signals vary over time before effective

solutions can be proposed. In this study we try to address two

related research questions. First, the day variation in

physiological data might indicate that diagnostic features of

affect vary from one day/session to another. We test this

issue and evaluate a new classification approach that uses

day-specific features for affect detection. The performance of

day-specific features is then compared to a classification

approach that uses pooled features obtained from pooled day

2013 12th International Conference on Machine Learning and Applications

DOI 10.1109/ICMLA.2013.49

240

data. Second, we test the performance and reliability of

individual physiological channels for detecting both valence

and arousal affective dimensions over the span of multiple

day recording sessions.

A. Background and Related Work

Research on emotion has referred to the existence of a set of

discrete emotional prototypes (e.g., happiness, sadness, etc.)

[7]. As opposed to the notion or existence of discrete

emotions, Russell [8] suggested that affective experience is

best described in the two-dimensional space of valence and

arousal. The arousal dimension ranges from highly activated

to highly deactivated, and the valence dimension from highly

pleasant to highly unpleasant. Happiness, for example,

is modelled with positive valence and high

arousal. Sadness, on the other hand, is modelled with

negative valence and low arousal.

Recent research has utilized physiological signals for

affect detection of both affective dimensions of valence and

arousal. Kim and Andre [1] conducted an experiment to

detect the levels of valence and arousal of subjects during

music listening using physiological signals. They recorded

ECG, facial EMG, SC, and RSP from three participants.

They achieved 89% classification accuracy for 1-2 degrees

of valence and 77% for 1-2 degrees of arousal using an LDA

classifier. Similarly, Lichtenstein, et al. [4] recorded ECG,

SC, EMG, RSP, and skin temperature from 41 subjects while

watching emotionally charged films. They were able to

detect 1-2 degrees of arousal with 82% accuracy and 72%

for 1-2 degrees of valence using an SVM classifier.

Likewise, Picard, et al. [2] recorded physiological data over a

period of 20 days from one subject (actor), who was asked to

self-elicit a set of eight emotions. They faced the problem of

degrading classification performance when data from

multiple days were combined. They attempted to address the

problem of day variation by including day information as

additional classification features; however, this did not yield

a significant improvement in accuracy.

The above-mentioned studies used traditional static batch

classification techniques without any mechanism for updating

the classifier. In contrast, adaptive and

updatable classification techniques have been used to handle

non-stationarities in two closely related domains:

speech recognition and Brain Computer Interfaces (BCI).

For example, Maier-Hein, et al. [9] implemented an adaptive

approach to detect non-audible speech using seven EMG

electrodes. They found that major problems in surface

EMG-based speech recognition ensue from repositioning

electrodes between recording sessions, environmental

temperature changes, and skin tissue properties of the

speaker. In order to reduce the impact of these factors, they

investigated a variety of signal normalization and model

adaptation methods. An average word accuracy of 97.3%

was achieved using seven EMG channels and the same

electrode positions. The performance dropped to 76.2% after

repositioning the electrodes if no normalization or adaptation

was performed. However, by applying the adaptation methods

they managed to restore the recognition rates to 87.1%.

Adaptive and dynamic classification approaches have

also been employed in BCI research. BCI aims at giving the

ability to control devices through mere thoughts by utilizing

brain signals such as electroencephalogram (EEG). For

example, Lowne, et al. [10] compared the performance of a

dynamic classification approach to a static classifier and an

MLP classifier on an online BCI experiment. They used

EEG data from eight subjects during a wrist extension

exercise; 20% of the data were labelled with true labels

(movement, non-movement). The three classifiers were then

tested on the EEG data in a sequential manner (in temporal

order) to detect one of the two classes. The performance of

the dynamic classifier was significantly higher than that of

the static classifier and MLP. This shows that classifier

adaptation is more effective compared to static classification

when dealing with changing data such as physiological data.

This gives the rationale for the use of an adaptive and

updatable classification approach such as the winnow

updatable ensemble algorithm to classify our corpus of

affective data.

The remainder of this paper is organized as follows.

Section II describes the procedure of collecting affective

physiological data and the computational methods employed

for feature extraction and classification. Section III presents

and discusses our results, and section IV provides concluding

remarks and directions for future work.

II. MEASURES, DATA AND METHODS

A. Participants and Measures

Participants were six students from the University of Sydney

(five males and one female) between 24 and 39 years of age.

Participants were paid for their participation in the study.

The physiological sensors used for recording physiological

activity from participants were: ECG, SC, EMG, and RSP.

The physiological signals were acquired using a BIOPAC

MP150 system and AcqKnowledge software with a sampling

rate of 1000 Hz for all channels. The ECG signal was

collected with two electrodes placed on both wrists. EMG

was recorded from the corrugator (eyebrow) and zygomatic

(cheek) facial muscles. The SC was recorded from the index

and middle fingers of the non-dominant hand, and a

respiration belt fixed around the participant’s chest was used to

measure respiration activity.

The affect-inducing stimulus consisted of a set of 400

images selected from the International Affective Picture

System (IAPS) collection [11]. The images were selected on

the basis of their normative valence and arousal scores. The

mean valence norm scores range from 1.40 to 8.34, and

mean arousal norm scores range from 1.72 to 7.35 (on a

scale from 1 to 9). Images were selected from the four

quadrants: PositiveValence-LowArousal (mean IAPS

valence norm > 6.03 and mean IAPS arousal norm < 5.47),

PositiveValence-HighArousal (mean IAPS valence norm >

6.03 and mean IAPS arousal norm > 5.47),

NegativeValence-HighArousal (mean IAPS valence norm <

3.71 and mean IAPS arousal norm > 5.47), and

NegativeValence-LowArousal (mean IAPS valence norm <

3.71 and mean IAPS arousal norm < 5.47). The idea was to

select images from the extremes of both valence and arousal

in order to maximize the differences of participants’

physiological responses. The set of 400 images was then

divided into 5 sets of 80 images each (20 images from each

category). However, for the classification procedures described

in section III, we classify each of the valence and arousal

dimensions separately.
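The quadrant rules above can be written down as a small lookup. The following is a minimal sketch in Python, not the selection script used in the study; the function name `iaps_quadrant` and the constant names are hypothetical, and only the three threshold values (6.03, 3.71, and 5.47) come from the text. Images whose norms fall between the valence thresholds, or exactly on a boundary, are treated here as not selected, which is an assumption.

```python
from typing import Optional

# Thresholds quoted in the text (IAPS normative scores on a 1-9 scale).
VALENCE_POSITIVE_MIN = 6.03
VALENCE_NEGATIVE_MAX = 3.71
AROUSAL_SPLIT = 5.47


def iaps_quadrant(valence_norm: float, arousal_norm: float) -> Optional[str]:
    """Return the quadrant label for an image, or None if no rule applies."""
    if valence_norm > VALENCE_POSITIVE_MIN:
        valence = "PositiveValence"
    elif valence_norm < VALENCE_NEGATIVE_MAX:
        valence = "NegativeValence"
    else:
        return None  # mid-range valence: outside all four quadrants

    if arousal_norm > AROUSAL_SPLIT:
        arousal = "HighArousal"
    elif arousal_norm < AROUSAL_SPLIT:
        arousal = "LowArousal"
    else:
        return None  # exactly on the arousal boundary: not covered by the rules

    return f"{valence}-{arousal}"
```

For the two classification tasks, the valence label is the part of the quadrant name before the hyphen and the arousal label is the part after it.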

Only four participants were able to complete the five

recording sessions. The female participant and one male

participant reported their inability to continue because some

images had explicit content that was overwhelming to them.

Therefore, only data from four participants were used in the

current study. This indicates the difficulty of obtaining

affective physiological data that are intended to track

variations over multiple sessions. We note that even though

the sample size is small, each participant was recorded over

5 sessions. This is consistent with the present goal of

tracking variations within an individual rather than across

individuals.

B. Procedure

Participants sat in a quiet, dimmed room and were asked to

sign a consent form before the start of the session. They then

viewed a set of emotionally charged IAPS images, during

which their physiological signals were continuously

recorded. Each recording session lasted approximately 60

minutes. Each emotional trial consisted of presenting each

image for 12 seconds, followed by a screen that showed a 2

x 2 affective grid [12], which lasted a few seconds and allowed

participants to rate their levels of valence (positive,

negative) and arousal (low, high). A blank screen was

presented afterwards for 8 seconds to allow physiological

activity to return to baseline neutral levels before a new

image was presented. Five images were presented

consecutively from each category in order to maintain a

stable emotional state for that category. This protocol was

designed to suit the intended goals of our study and is based

on previous research [11]. Each subject participated in five

recording sessions each separated by one week. A different

set of images was presented for each session in order to

prevent habituation effects. The same setup was used in all

recording sessions.

C. Feature Extraction

The MATLAB-based Augsburg Biosignal Toolbox [13] was

used to preprocess and extract features from the raw

physiological data. A total of 214 statistical features (e.g.

mean, median, standard deviation, maxima and minima)

were extracted from the five physiological channels using a

window size of 12 seconds (the length of the emotional trial).

In general, SC responses (SCR) can be observed 1-3 seconds

after stimulus presentation. EMG responses are

substantially faster; however, the frequency of the muscle

activity can be summed up over a period of time to indicate a

change in behavioral pattern. ECG and respiration responses

are considered slower; however, estimating cardiac and

respiratory patterns from short-term periods is common in

psychophysiology research [14]. Eighty-four features

were extracted from ECG, 21 from SC, 21 from each of the

EMG channels, and 67 from the RSP channel. A complete

description of these features can be found in [13].
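To illustrate the kind of per-trial statistics described above, the following Python sketch cuts a 12-second window out of a signal sampled at 1000 Hz and computes a handful of summary features. It is not the Augsburg Biosignal Toolbox and does not reproduce its 214 features; the function names, the `onset_seconds` parameter, and the feature naming scheme are assumptions made for the example.

```python
import statistics
from typing import Dict, Sequence

SAMPLING_RATE_HZ = 1000  # sampling rate stated in the text
WINDOW_SECONDS = 12      # length of one emotional trial


def trial_window(signal: Sequence[float], onset_seconds: float,
                 rate_hz: int = SAMPLING_RATE_HZ,
                 seconds: int = WINDOW_SECONDS) -> Sequence[float]:
    """Return the samples of one trial, starting at the image onset."""
    start = int(round(onset_seconds * rate_hz))
    return signal[start:start + rate_hz * seconds]


def window_features(window: Sequence[float], prefix: str) -> Dict[str, float]:
    """Compute a few illustrative statistical features for one window."""
    if len(window) < 2:
        raise ValueError("need at least two samples to compute differences")
    # First difference: change between consecutive samples.
    first_diff = [b - a for a, b in zip(window, window[1:])]
    return {
        f"{prefix}-mean": statistics.fmean(window),
        f"{prefix}-median": statistics.median(window),
        f"{prefix}-std": statistics.pstdev(window),
        f"{prefix}-max": max(window),
        f"{prefix}-min": min(window),
        f"{prefix}-range": max(window) - min(window),
        f"{prefix}-1Diff-mean": statistics.fmean(first_diff),
    }
```

Calling `window_features(trial_window(sc_signal, onset), "SC")` for each trial and each channel would yield one feature vector per image, which is the unit of classification used in the rest of the paper.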

D. Classification Methods

The present study used the winnow updatable ensemble

algorithm described in detail in Table I. It is an ensemble-

based algorithm that is similar to a weighted majority voting

algorithm. It combines decisions from ensemble members

based on their weights. However, it utilizes a different

updating approach for member classifiers. This includes

promoting ensemble members that make correct predictions

and demoting those that make incorrect predictions.

Updating of the weights is done automatically based on

incoming data, which makes this approach suitable for online

applications that operate in non-stationary environments.

We set alpha, the parameter used to update the weights of

ensemble members, to 2. Acceptable results have been achieved using an

alpha value of two in previous research [6]. In this study we

use hard labeled data only. The ensemble relies on regular

feedback where weights are updated on the basis of error.

This is a prerequisite for the updating mechanism used by

winnow. However, in real world applications immediate

feedback might not be available all the time. If

feedback or true class labels are unavailable, other approaches

could be explored, for example semi-supervised

techniques that combine labeled and unlabeled data for

training a classifier. Additionally, feedback could be

obtained on demand by asking users of an affect-aware

system in a way similar to self-reports. In this case the

frequency at which the ensemble is updated could depend on

some performance threshold.

The WEKA data mining package, and PRTools 4.0 [15],

a pattern recognition MATLAB library, were used for

classification. Chi-square feature selection was used for

dimensionality reduction in order to avoid problems

associated with large feature spaces. WEKA’s support vector

machine (SMO) classifier with a linear kernel was utilized

for training classification models. Many successful

applications of SVMs have been demonstrated in previous

research [16]. Choosing the SMO classifier as a base

classifier is independent from the classification approach

adopted by the ensemble algorithm. In future work,

additional classifiers could be evaluated to determine their

effect on the performance of the ensemble algorithm.

TABLE I. THE WINNOW ENSEMBLE ALGORITHM

- Initialization: Given a classifier ensemble D = (D1, ..., Dn), initialize all classifier weights: wi = 1, i = 1:n.

- Classification: For a new example x, calculate the support for each class as the sum of the weights of all classifiers Di that suggest class label ck for x, k = 1:number of classes. Set x to the class with the largest support.

- Updating: If x is classified correctly by classifier Di, then its weight is increased (promotion) by wi = alpha * wi, where alpha > 1. If classifier Di incorrectly classifies x, then its weight is decreased (demotion) by wi = wi / alpha.
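The three steps of Table I (initialization, classification, updating) can be sketched in a few lines of Python. This is an illustrative sketch rather than the PRTools or WEKA code used in the study: the class name `WinnowEnsemble` is hypothetical, base classifiers are assumed to be plain callables that return a label, and ties in class support are broken by whichever label was seen first, which Table I does not specify.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence

# A base classifier is assumed to be any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]


class WinnowEnsemble:
    def __init__(self, members: List[Classifier], alpha: float = 2.0) -> None:
        if alpha <= 1:
            raise ValueError("alpha must be greater than 1")
        self.members = members
        self.alpha = alpha
        self.weights = [1.0] * len(members)  # initialization: w_i = 1

    def predict(self, x: Sequence[float]) -> Hashable:
        """Return the class whose supporters have the largest total weight."""
        support = defaultdict(float)
        for member, weight in zip(self.members, self.weights):
            support[member(x)] += weight
        # Assumption: ties go to the label encountered first.
        return max(support, key=support.get)

    def update(self, x: Sequence[float], true_label: Hashable) -> None:
        """Promote members that were right about x and demote the rest."""
        for i, member in enumerate(self.members):
            if member(x) == true_label:
                self.weights[i] *= self.alpha  # promotion
            else:
                self.weights[i] /= self.alpha  # demotion
```

In an online setting, `predict` would be called on each incoming instance and `update` called once its true label becomes available, so that members trained on days resembling the current data gain influence over time.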

E. Day Datasets

Day datasets were constructed separately for the two

affective measures valence (positive/negative) and arousal

(low/high). Additionally, separate datasets were constructed

for both IAPS-mapped categories (as described in section

II.A) and self-reports of participants. In total there were 80 (4

participants x 5 recording sessions x 2 affective measures

(valence and arousal) x 2 ratings (IAPS and self-reports))

datasets with 80 instances in each dataset. IAPS ratings

datasets had a balanced distribution of labels 40:40 for

positive/negative valence or low/high arousal. On the other

hand, self-reports datasets had an unbalanced distribution of

classes, so a downsampling procedure (WEKA’s

SpreadSubsample which produces a random subsample of a

dataset) was applied to obtain a balanced distribution of

classes. This was done in order to avoid classifier bias towards

predicting the majority class. On average, 34% of data was lost

for self-reports arousal datasets and 15% of data was lost for

self-reports valence datasets. Therefore, baseline

classification accuracy is 50% for both types of datasets. A

preliminary analysis showed that the top five ranked features,

selected using chi-square feature selection, were sufficient to

produce consistent classification results without sacrificing

performance. Therefore, the top five features were selected

from each dataset and used in all subsequent analyses.
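The class-balancing step can be illustrated with a short Python sketch that randomly discards instances of the larger class until all classes have the same size. It is a stand-in for the idea, not WEKA's SpreadSubsample filter; the function name `balance_by_downsampling` and the `seed` parameter are assumptions introduced for the example.

```python
import random
from collections import defaultdict
from typing import Any, Hashable, List, Sequence, Tuple


def balance_by_downsampling(instances: Sequence[Any],
                            labels: Sequence[Hashable],
                            seed: int = 0) -> List[Tuple[Any, Hashable]]:
    """Randomly keep the same number of instances from every class."""
    by_class = defaultdict(list)
    for instance, label in zip(instances, labels):
        by_class[label].append(instance)

    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    balanced = []
    for label, group in by_class.items():
        for instance in rng.sample(group, target):
            balanced.append((instance, label))
    return balanced
```

With an 80-instance dataset split 50:30 between two classes, this keeps 60 instances (30 per class), so a classifier that always predicts one class scores 50%.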

III. RESULTS AND DISCUSSIONS

We first tested the effectiveness and reliability of the IAPS

images in inducing both valence and arousal using Cohen’s

kappa. We then explored the issue of how diagnostic features

of affect change over time by applying feature ranking (chi-

square) to separate day datasets, and how this issue could

affect the building of reliable classification models. Next, we

carried out two classification experiments. The first

experiment compared three feature selection and training

strategies utilizing the winnow ensemble algorithm and a

day cross-validation procedure that provides the baseline

accuracy. The effect of using day-specific and pooled

features on classification accuracy was tested. The second

experiment involved testing the effect of individual

physiological channels on affect detection accuracy. This

experiment built on the results of the first experiment by

using pooled features only for affect classification. In all

classifier training strategies described in the following

sections, we adopted a day cross-validation strategy in order

to test on all available data.

A. The Effectiveness of IAPS in Inducing Affect

In order to test the effectiveness of the IAPS stimuli in

inducing both valence and arousal, we tested the level of

agreement between participants’ self-reports and IAPS

normative ratings using Cohen’s kappa. Participants’ self-

reported valence showed higher agreement with IAPS

normative ratings compared to arousal. The kappa score for

valence was 0.89, and kappa score for arousal was 0.41. It is

evident that the IAPS stimuli were quite successful in

eliciting valence, but were much less effective in influencing

arousal. However, both rating schemes will be used to

assess affect detection accuracy.
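For reference, Cohen's kappa compares the observed agreement between two sets of labels with the agreement expected by chance from their marginal label frequencies. The following Python sketch shows the standard computation; the function name `cohens_kappa` and the toy labels in the usage note are illustrative and are not the study's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(ratings_a: Sequence[Hashable],
                 ratings_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal label proportions, summed over labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1.0 - expected)
```

In this setting, one sequence would hold the IAPS-derived labels (for example positive/negative valence) and the other the participant's self-reported labels for the same images.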

B. Day-Specific Features

As an example of how diagnostic features change across

days, Table II presents the results of chi-square feature

selection applied to participant S1 (applied to each day data

separately). The chi-square value represents the degree of

relevance of a feature to class category. It can be seen from

the results that the diagnostic features are different for each

day. This is a reflection of the changing nature of

physiological data. Table II presents the features selected

from one participant using IAPS ratings only (because of space

limitations and as a demonstration); however, other subjects’

data showed similar behavior. An interesting observation is

that there are frequent features which reoccur on different

days. This is promising as it allows for easier calibration of

affect detection classification models. However, this leaves us

wondering whether classification models that are built from

these day-specific features are more accurate than those built

using pooled features selected from pooled day data. We

address this issue in the next section.

C. Winnow Results on Day-Specific and Pooled Features

The winnow ensemble algorithm was run with an ensemble

of four base classifiers each trained on a separate day-

specific features dataset. Testing was done on the remaining

day-data. The procedure was repeated five times to test on all

available data. We also tested the performance of pooled

features, which are features selected from four days of data

combined, using the same training procedure. As a baseline,

we used a day cross-validation procedure, where a single

SMO classification model was constructed from pooled data

of four days, and testing was done on the remaining day data.

This process was repeated five times in order to test on all

available data. The baseline procedure represents a static

classification approach without an update mechanism. This

process was repeated for the four categories (Valence-IAPS,

Arousal-IAPS, Valence-self, and Arousal-self). The results

in Table III were obtained from a mixed feature set, in which

the top five features were selected from all physiological

channels.

TABLE II. TOP FIVE FEATURES SELECTED FROM EACH DAY'S DATA SEPARATELY FOR PARTICIPANT S1, WITH VALENCE (IAPS) AS CLASS LABEL

Day     Chi-square   Feature name
Day 1   43           SC-2Diff-minRatio
Day 1   43           SC-2Diff-maxRatio
Day 1   23           ZYG-EMG-1Diff-maxRatio
Day 1   23           ZYG-EMG-1Diff-minRatio
Day 1   23           RSP-Pulse-max
Day 2   23           ZYG-EMG-1Diff-minRatio
Day 2   23           ZYG-EMG-1Diff-maxRatio
Day 2   14           RSP-2Diff-range
Day 2   13           RSP-2Diff-min
Day 2   12           ZYG-EMG-2Diff-mean
Day 3   10           SC-1Diff-minRatio
Day 3   10           RSP-Ampl-1Diff-max
Day 3   10           SC-1Diff-maxRatio
Day 3   6            RSP-Ampl2Diff-maxRatio
Day 3   5            ECG-HrvDistr-mean
Day 4   16           ZYG-EMG-max
Day 4   10           ECG-QS-min
Day 4   9            ECG-QS-range
Day 4   8            ECG-HrvDistr-mean
Day 4   6            RSP-Pulse1Diff-maxRatio
Day 5   28           ZYG-EMG-2Diff-minRatio
Day 5   26           SC-2Diff-maxRatio
Day 5   24           SC-2Diff-minRatio
Day 5   23           ZYG-EMG-2Diff-maxRatio
Day 5   12           RSP-Ampl-mean

a. ZYG: Zygomatic facial muscle, Amp: Amplitude, min: Minimum, max: Maximum, HRV: Heart rate variability, 1Diff: First Difference, 2Diff: Second Difference.

TABLE III. AVERAGE CLASSIFICATION ACCURACY FOR THREE TRAINING STRATEGIES USING MIXED FEATURES FROM ALL PHYSIOLOGICAL CHANNELS (%)

             Valence (IAPS)           Arousal (IAPS)
Subject ID   D-CV    W-PF    W-SF     D-CV    W-PF    W-SF
S1           59      74      64       52      72      56
S2           54      63      61       51      69      53
S3           50      74      61       52      75      49
S4           50      76      54       48      73      59
Average      53.25   71.75   60.00    50.75   72.25   54.25

             Valence (Self)           Arousal (Self)
Subject ID   D-CV    W-PF    W-SF     D-CV    W-PF    W-SF
S1           52      73      68       50      63      62
S2           53      62      59       49      65      48
S3           51      79      64       51      76      60
S4           50      70      54       52      69      69
Average      51.50   71.00   61.25    50.50   68.25   59.75

b. D-CV: Day Cross-Validation, W-PF: Winnow with Pooled Features, W-SF: Winnow with Day-Specific Features
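The fold structure shared by the three training strategies (train on four days, test on the fifth, repeat five times) can be sketched as a small Python generator. Only the splitting logic is shown; the function name `leave_one_day_out` is hypothetical, and how each strategy trains its model or ensemble on the four training days is left out.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_day_out(days: Sequence[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_days, test_day) pairs, holding out each day once."""
    for held_out in days:
        training_days = [day for day in days if day != held_out]
        yield training_days, held_out


if __name__ == "__main__":
    for training_days, test_day in leave_one_day_out([1, 2, 3, 4, 5]):
        # D-CV: fit one model on the pooled data of training_days.
        # Winnow: fit one base classifier per day in training_days.
        print(f"train on days {training_days}, test on day {test_day}")
```

Under the baseline, the four training days are pooled into one model, whereas under the winnow strategies each training day contributes one ensemble member, which is why the ensemble has four base classifiers.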

It can be seen from the results that accuracy scores using

day-specific and pooled features are higher than day cross-

validation baseline accuracy. This indicates that we were

able to leverage the dynamic updating of the winnow algorithm to

enhance classification accuracy for this type of data. It can

also be seen from Table III that winnow with pooled features

on average outperformed winnow with day-specific features.

We had expected that day-specific features would provide

higher performance. A possible explanation for the lower

performance of day-specific features is

that day data tends to have higher clustering cohesion and

tightness compared to data for the same emotion category

across multiple days. In order to test this effect further, in the

next section we present classification results using single

channel’s data using the same procedures described earlier

for day cross-validation, pooled features, and day-specific

features. This also will allow us to shed light on the

performance and reliability of these individual channels for

affect detection over multiple sessions.

D. Channels Effect on Affect Detection Accuracy

The same training procedures described above were

performed on each individual physiological channel’s data.

Results are not shown here but will be outlined in the

analysis of variance (ANOVA) described next.

In order to examine the effect of three training strategies

on affect detection accuracy, an ANOVA was conducted on

all accuracy scores combined. This is a one-way repeated

measures ANOVA with accuracy as the dependent variable

and training strategy as the independent variable. A

significant main effect was found for training strategy (F (2,

285) = 57.67, p < 0.05). Bonferroni post hoc tests revealed

that accuracy scores for winnow with pooled features (M =

65.14) were higher than those for winnow with day-specific

features (M = 57.81) and day cross-validation (M = 55.55).

These results suggest that using winnow with pooled features

is more suitable for building predictive models of affect than

the other two training strategies. Pooled features were

able to describe a participant’s overall affective

states with significantly higher accuracy than the

more discrete day-specific features. On the other hand, using

the winnow ensemble achieved what was expected by

outperforming the single model approach represented by the

day cross-validation approach.

We tested the effect of both physiological channel and

emotion on affect detection accuracy using two-way repeated

measures ANOVA. This analysis was done using winnow

accuracy scores with pooled features only. We found a

significant main effect for channel (F (5, 72) = 12.60, p <

0.05), indicating that physiological channels vary in their

usefulness for affect detection. Post hoc tests revealed that

accuracy scores for EMG-cur (M = 69.87) and mixed

features (M = 72.25) were significantly higher than those for the other

channels: ECG (M = 62.37), EMG-zyg (M = 58.5), RSP (M =

61.62), and SC (M = 61.62). We did not find a significant

effect for emotion category (F (3, 72) = 1.14, p = 0.34),

which indicates that there are no significant differences in

the accuracy at which valence and arousal are detected given

a particular channel. However, when the number of emotion

categories was reduced from four to two,

the effect of emotion was marginally non-significant (F (1,

84) = 3.5, p = 0.065). A significant effect for the interaction

between channel and emotion was found (F (5, 84) = 2.36, p

< 0.05). This indicates that some channels have a stronger

influence on one of the two affective components (valence,

arousal) over the other.

The interaction effect was further explored by conducting

simple effects tests. The tests revealed that both EMG

channels were more useful for detecting valence than

arousal; EMG-cur (F (1, 84) = 4, p < 0.05), and EMG-zyg (F

(1, 84) = 10.44, p < 0.05). Other channels were equally

likely to detect both valence and arousal with the same

accuracy, ECG (F (1, 84) = 43, p = 0.62), mixed features (F

(1, 84) = 70, p = 0.15), RSP (F (1, 84) = 0.07, p = 0.79), and

SC (F (1, 84) = 0, p = 0.99). Fig. 1 shows the interaction

effect between channel and emotion. Our findings are in

accordance with the literature with regard to both EMG channels.

The corrugator and zygomatic EMG have consistently shown

changes with the valence component of emotion

[17]. On the other hand, previous research has generally

considered SC as an index of arousal [18]. In this

regard, our results might have been affected by using IAPS

images as stimuli.

Fig. 1. Interaction effect between channel and emotion

The results also showed that detecting arousal with

acceptable accuracy required more physiological markers in

comparison to valence (see Fig. 1). This is probably the reason

that some previous research has outlined that the detection of

arousal is harder than valence [1]. The literature is somewhat

inconsistent in this regard, with studies reporting higher

detection rates for arousal than valence [4, 19] and the

contrary [1]. However, a study by

Gomez et al. [20] found that induced physiological changes

in subjects’ valence lasted longer than those in arousal,

which dissipated quickly. This might explain the higher

detection rates of valence compared to arousal. However, it

should also be noted that other researchers believe that

valence can be more difficult to detect than

arousal because valence information is conveyed more subtly [21].

IV. CONCLUSIONS, LIMITATIONS AND FUTURE WORK

We have shown that diagnostic physiological features of

affect vary from day to day. This is a challenging issue for

building effective affect detection systems. Using day-specific

features did not improve affect detection

over using the pooled feature set. Both facial EMGs

were more predictive of valence than arousal compared to

ECG, RSP and SC. This has implications for designers of

affect detection systems who are more interested in detecting

valence than arousal. This also suggests that facial EMG is

more reliable than other measures when considering affect

detection over multiple sessions. Additionally, EMG-cur

and a fusion of features from all channels yielded the

highest detection rates for both valence and arousal. There

are two primary limitations to the present study. The first

is the relatively small sample size, so

replication with a larger sample is warranted. The second

limitation is that emotions were artificially induced rather

than spontaneously experienced. This approach was adopted

because strict laboratory control was desired in the present

experiment. Replicating this research in more naturalistic

contexts is an important step for future work.

REFERENCES

[1] J. Kim and E. Andre, "Emotion Recognition Based on Physiological

Changes in Music Listening," IEEE Trans. Pattern Anal. Mach.

Intell., vol. 30, pp. 2067-2083, 2008.

[2] R. W. Picard, E. Vyzas, and J. Healey, "Toward Machine Emotional

Intelligence: Analysis of Affective Physiological State," IEEE Trans.

Pattern Anal. Mach. Intell., vol. 23, pp. 1175-1191, 2001.

[3] O. AlZoubi, S. K. D'Mello, and R. A. Calvo, "Detecting Naturalistic

Expressions of Nonbasic Affect using Physiological Signals," IEEE

Transactions on Affective Computing, vol. 3, pp. 298-310, 2012.

[4] A. Lichtenstein, A. Oehme, S. Kupschick, and T. Jürgensohn,

"Comparing Two Emotion Models for Deriving Affective States from

Physiological Data," in Affect and Emotion in Human-Computer

Interaction, vol. 4868, C. Peter and R. Beale, Eds. Springer

Berlin / Heidelberg, 2008, pp. 35-50.

[5] O. Alzoubi, M. S. Hussain, S. D'Mello, and R. A. Calvo, "Affective

modeling from multichannel physiology: analysis of day differences,"

in Proceedings of the 4th international conference on Affective

computing and intelligent interaction-Volume Part I, 2011, pp. 4-13.

[6] L. I. Kuncheva, "Classifier Ensembles for Changing Environments,"

in Multiple Classifier Systems, 2004, pp. 1-15.

[7] P. Ekman, "An argument for basic emotions," Cognition & Emotion,

vol. 6, pp. 169-200, 1992.

[8] J. A. Russell, "A circumplex model of affect," Journal of Personality

and Social Psychology, vol. 39, pp. 1161-1178, 1980.

[9] L. Maier-Hein, F. Metze, T. Schultz, and A. Waibel, "Session

independent non-audible speech recognition using surface

electromyography," in Automatic Speech Recognition and

Understanding, 2005 IEEE Workshop on, 2005, pp. 331-336.

[10] D. R. Lowne, S. J. Roberts, and R. Garnett, "Sequential non-

stationary dynamic classification with sparse feedback," Pattern

Recognition, vol. 43, pp. 897-905, Mar 2010.

[11] M. Bradley and P. J. Lang, "The international affective picture system

(IAPS) in the study of emotion and attention," in Handbook of

Emotion Elicitation and Assessment, J. A. Coan and J. J. B. Allen,

Eds. New York: Oxford University Press, 2007, pp. 29-46.

[12] J. A. Russell, A. Weiss, and G. A. Mendelsohn, "Affect grid: A

single-item scale of pleasure and arousal," Journal of Personality and

Social Psychology, vol. 57, pp. 493-502, 1989.

[13] J. Wagner. (October, 2009). Augsburg Biosignal Toolbox (AuBT).

Available: http://mm-werkstatt.informatik.uni-augsburg.de/project_details.php?id=%2033

[14] S. D. Kreibig, "Autonomic nervous system activity in emotion: A

review," Biological Psychology, vol. 84, pp. 394-421, 2010.

[15] F. v. d. Heijden, R. P. Duin, D. d. Ridder, and D. M. Tax,

Classification, parameter estimation and state estimation - an

engineering approach using Matlab. Chichester: John Wiley & Sons,

2004.

[16] A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern

recognition: a review," Pattern Analysis and Machine Intelligence,

IEEE Transactions on, vol. 22, pp. 4-37, 2000.

[17] A. O. Hamm, H. T. Schupp, and A. I. Weike, "Motivational

organization of emotions: Autonomic changes, cortical responses, and

reflex modulation," Handbook of affective sciences, pp. 187-211,

2003.

[18] R. W. Levenson, "Autonomic Nervous System Differences among

Emotions," Psychological Science, vol. 3, pp. 23-27, 1992.

[19] A. Haag, S. Goronzy, P. Schaich, and J. Williams, "Emotion

Recognition Using Bio-sensors: First Steps towards an Automatic

System," in Affective Dialogue Systems, 2004, pp. 36-48.

[20] P. Gomez, P. G. Zimmermann, S. Guttormsen Schär, and B. Danuser,

"Valence lasts longer than arousal: Persistence of induced moods as

assessed by psychophysiological measures," Journal of

Psychophysiology, vol. 23, pp. 7-17, 2009.

[21] R. W. Picard, Affective Computing, second ed. Cambridge,

Massachusetts: The MIT Press, 1997.
