Conference PaperPDF Available

Affect Detection and Classification from the Non-stationary Physiological Data

Authors:

Abstract and Figures

Affect detection from physiological signals has received a great deal of attention recently. One arising challenge is that physiological measures are expected to exhibit considerable variations or non-stationarities over multiple days/sessions recordings. These variations pose challenges to effectively classify affective sates from future physiological data. The present study collects affective physiological data (electrocardiogram (ECG), electromyogram (EMG), skin conductivity (SC), and respiration (RSP)) from four participants over five sessions each. The study provides insights on how diagnostic physiological features of affect change over time. We compare the classification performance of two feature sets, pooled features (obtained from pooled day data) and day-specific features using an up datable classifier ensemble algorithm. The study also provides an analysis on the performance of individual physiological channels for affect detection. Our results show that using pooled feature set for affect detection is more accurate than using day-specific features. The corrugator and zygomatic facial EMGs were more reliable measures for detecting valence than arousal compared to ECG, RSP and SC over the span of multi-session recordings. It is also found that corrugator EMG features and a fusion of features from all physiological channels have the highest affect detection accuracy for both valence and arousal.
Content may be subject to copyright.
Affect Detection and Classification from the Non-
Stationary Physiological Data
Omar AlZoubi and Davide Fossati
Computer Science
Carnegie Mellon University in Qatar
Doha, Qatar
oalzoubi@cmu.edu, dfossati@cmu.edu
Sidney D’Mello
Department of Computer Science
The University of Notre Dame
Notre Dame, IN, USA
sdmello@nd.edu
Rafael A. Calvo
School of Electrical and Information Engineering
The University of Sydney
Sydney, Australia
Rafael.Calvo@sydney.edu.au
Abstract— Affect detection from physiological signals has
received a great deal of attention recently. One arising
challenge is that physiological measures are expected to exhibit
considerable variations or non-stationarities over multiple
days/sessions recordings. These variations pose challenges to
effectively classify affective sates from future physiological
data. The present study collects affective physiological data
(electrocardiogram (ECG), electromyogram (EMG), skin
conductivity (SC), and respiration (RSP)) from four
participants over five sessions each. The study provides
insights on how diagnostic physiological features of affect
change over time. We compare the classification performance
of two feature sets; pooled features (obtained from pooled day
data) and day-specific features using an updatable classifier
ensemble algorithm. The study also provides an analysis on the
performance of individual physiological channels for affect
detection. Our results show that using pooled feature set for
affect detection is more accurate than using day-specific
features. The corrugator and zygomatic facial EMGs were
more reliable measures for detecting valence than arousal
compared to ECG, RSP and SC over the span of multi-session
recordings. It is also found that corrugator EMG features and
a fusion of features from all physiological channels have the
highest affect detection accuracy for both valence and arousal.
Keywords—Affect; emotion; classifier ensembles;
physiological; Non-Stationary
I. INTRODUCTION
There is increased attention on using physiological signals in
affect detection systems that are able to detect either discrete
emotional categories or affective dimensions of valence and
arousal [1-3]. Physiological responses such as facial muscle
activity, skin conductivity, heart activity, and respiration,
have all been considered as potential physiological markers
for recognizing affective states. Despite high classification
rates achieved under laboratory conditions [4], the changing
nature of physiological signals introduces significant
challenges when one moves from the lab and into the real
world. In particular, physiological data is expected to exhibit
non-stationarities or day variations due to factors such as:
electrode drift, changes in the electrode impedance, and
modulations by other mental states such as attention and
motivation of subjects [2, 5].
Non-stationarity of the physiological signal indicates that
the signal changes its statistical characteristics as a function
of time. These changes then propagate to feature values
extracted from signals over time. Non-stationarities or day
variations of physiological data represents a major problem
for building reliable classification models of affect (i.e.,
generalizability within an individual). This is because
classification methods assume that training data is obtained
from a stationary distribution. In real world contexts,
however, this assumption of stationarity is routinely violated.
According to Kuncheva [6], every real-world classification
system should be equipped with a mechanism to adapt to
changes in the environment. Therefore, this study utilizes an
updatable ensemble classification approach (winnow),
discussed in more detail in section II.
Understanding the nature of these day variations is
essential for developing reliable affect detection systems that
can be deployed in real world affective computing
applications. There is a critical need for basic research on
how physiological signals vary over time before effective
solutions can be proposed. In this study we try to address two
related research questions. First, the day variation in
physiological data might indicate that diagnostic features of
affect vary from one day/session to another. We test this
issue and evaluate a new classification approach that uses
day-specific features for affect detection. The performance of
day-specific features is then compared to a classification
approach that uses pooled features obtained from pooled day
2013 12th International Conference on Machine Learning and Applications
978-0-7695-5144-9/13 $26.00 © 2013 IEEE
DOI 10.1109/ICMLA.2013.49
240
2013 12th International Conference on Machine Learning and Applications
978-0-7695-5144-9/13 $31.00 © 2013 IEEE
DOI 10.1109/ICMLA.2013.49
240
2013 12th International Conference on Machine Learning and Applications
978-0-7695-5144-9/13 $31.00 © 2013 IEEE
DOI 10.1109/ICMLA.2013.49
240
2013 12th International Conference on Machine Learning and Applications
978-0-7695-5144-9/13 $31.00 © 2013 IEEE
DOI 10.1109/ICMLA.2013.49
240
data. Second, we test the performance and reliability of
individual physiological channels for detecting both valence
and arousal affective dimensions over the span of multiple
day recording sessions.
A. Background and Related Work
Research on emotion referred to the existence of a set of
discrete emotional prototypes (e.g. happiness, sadness, etc)
[7]. As opposed to the notion or existence of discrete
emotions, Russell [8] suggested that affective experience is
best described in the two-dimensional space of valence and
arousal. The arousal dimension ranges from highly activated
to highly deactivated, and the valence dimension from highly
pleasant to highly unpleasant. Take happiness for example as
an emotion; it is modelled with positive valence and high
arousal. On the other hand, Sadness is modelled with a
negative valence and low arousal.
Recent research has utilized physiological signals for
affect detection of both affective dimensions of valence and
arousal. Kim and Andre [1] conducted an experiment to
detect the levels of valence and arousal of subjects during
music listening using physiological signals. They recorded
ECG, facial EMG, SC, and RSP from three participants.
They achieved 89% classification accuracy for 1-2 degrees
of valence and 77% for 1-2 degrees of arousal using an LDA
classifier. Similarly, Lichtenstein, et al. [4] recorded ECG,
SC, EMG, RSP, and skin temperature from 41 subjects while
watching emotionally charged films. They were able to
detect 1-2 degrees of arousal with 82% accuracy and 72%
for 1-2 degrees of valence using an SVM classifier.
Likewise, Picard, et al. [2] recorded physiological data over a
period of 20 days from one subject (actor), who was asked to
self-elicit a set of eight emotions. They faced the problem of
degrading classification performance when data from
multiple days were combined. They attempted to address the
problem of day variation by including day information as
additional classification features; however, this did not yield
a significant improvement in accuracy.
The above mentioned studies have used traditional batch
static classification techniques without any updating
mechanism to the classifier. On the other hand; adaptive and
updatable classification techniques have been used to handle
non-stationarities in two close domains of study; which are
speech recognition and Brain Computer Interfaces (BCI).
For example, Maier-Hein, et al. [9] implemented an adaptive
approach to detect non-audible speech using seven EMG
electrodes. They realized that a major problem in surface
EMG based speech recognition ensue from repositioning
electrodes between recording sessions, environmental
temperature changes, and skin tissue properties of the
speaker. In order to reduce the impact of these factors, they
investigated a variety of signal normalization and model
adaptation methods. An average word accuracy of 97.3%
was achieved using seven EMG channels and the same
electrode positions. The performance dropped to 76.2% after
repositioning the electrodes if no normalization or adaptation
is performed. However, by applying the adaptation methods
they managed to restore the recognition rates to 87.1%.
Adaptive and dynamic classification approaches have
also been employed in BCI research. BCI aims at giving the
ability to control devices through mere thoughts by utilizing
brain signals such as electroencephalogram (EEG). For
example Lowne, et al. [10] compared the performance of a
dynamic classification approach to a static classifier and an
MLP classifier on an online BCI experiment. They used
EEG data from eight subjects during a wrist extension
exercise; 20% of the data were labelled with true labels
(movement, non-movement). The three classifiers were then
tested on the EEG data in a sequential manner (timely
ordered) to detect one of the two classes. The performance of
the dynamic classifier was significantly higher than that of
the static classifier and MLP. This shows that classifier
adaptation is more effective compared to static classification
when dealing with changing data such as physiological data.
This gives the rational for the use of an adaptive and
updatable classification approach such as the winnow
updatable ensemble algorithm to classify our corpus of
affective data.
The remaining of this paper is organized as follows.
Section II describes the procedure of collecting affective
physiological data and the computational methods employed
for feature extraction and classification. Section III presents
and discusses our results, and section IV provides concluding
remarks and directions for future work.
II. MEASURES, DATA AND METHODS
A. Participants and Measures
Participants were six students from the University of Sydney
(five males and one female) between 24 and 39 years of age.
Participants were paid for their participation in the study.
The physiological sensors used for recording physiological
activity from participants were: ECG, SC, EMG, and RSP.
The physiological signals were acquired using a BIOPAC
MP150 system and AcqKnowledge software with a sampling
rate of 1000 Hz for all channels. The ECG signal was
collected with two electrodes placed on both wrists. EMG
was recorded from the corrugator (eyebrow) and zygomatic
(cheek) facial muscles. The SC was recorded from the index
and middle fingers of the non-dominant hand, and a
respiration belt fixed around the participant chest was used to
measure respiration activity.
The affect-inducing stimulus consisted of set of 400
images selected from the International Affective Picture
System (IAPS) collection [11]. The images were selected on
the basis of their normative valence and arousal scores. The
mean valence norm scores ranges from 1.40 to 8.34, and
mean arousal norm scores ranges from 1.72 to 7.35 (on a
scale from 1 to 9). Images were selected from the four
quadrants: PositiveValence-LowArousal (mean IAPS
valence norm > 6.03 and mean IAPS arousal norm < 5.47),
PositiveValence-HighArousal (mean IAPS valence norm >
6.03 and mean IAPS arousal norm > 5.47),
241241241241
NegativeValence-HighArousal (mean IAPS valence norm <
3.71 and mean IAPS arousal norm > 5.47), and
NegativeValence-LowArousal (mean IAPS valence norm <
3.71 and mean IAPS arousal norm < 5.47). The idea was to
select images from the extremes of both valence and arousal
in order to maximize the differences of participants’
physiological responses. The set of 400 images was then
divided into 5 sets of 80 images each (20 images from each
category). However for classification procedures described
in section III, we classify each of the valence and arousal
dimensions separately.
Only four participants were able to complete the five
recording sessions. The female participant and one male
participant reported their inability to continue because some
images had explicit content that were overwhelming to them.
Therefore, only data from four participants were used in the
current study. This indicates the difficulty of obtaining a
affective physiological data that is intended to track
variations over multiple sessions. We note that even though
the sample size is small, each participant was recorded over
5 sessions. This is consistent with the present goal of
tracking variations within an individual rather than across
individuals.
B. Procedure
participants sat in a quite dimmed room and were asked to
sign a consent form before the start of the session. They then
viewed a set of emotionally charged IAPS images, during
which their physiological signals were continuously
recorded. Each recording session lasted approximately 60
minutes. Each emotional trial consisted of presenting each
image for 12 seconds, followed by a screen that showed a 2
X 2 affective grid [12] which lasted few seconds and allowed
participants to rank their levels of valence (positive,
negative) and arousal (low, high). A blank screen was
presented afterwards for 8 seconds to allow physiological
activity to return to baseline neutral levels before a new
image was presented. Five images were presented
consecutively from each category in order to maintain a
stable emotional state for that category. This protocol was
designed to suit the intended goals of our study and is based
on previous research [11]. Each subject participated in five
recording sessions each separated by one week. A different
set of images were presented for each session in order to
prevent habituation effects. The same setup was used in all
recording sessions.
C. Feature Extraction
The MATLAB-based Augsburg Biosignal Toolbox [13] was
used to preprocess and extract features from the raw
physiological data. A total of 214 statistical features (e.g.
mean, median, standard deviation, maxima and minima)
were extracted from the five physiological channels using
window size of 12 seconds (the length of the emotional trial).
In general, SC responses (SCR) can be observed 1-3 seconds
after stimulus presentations. EMG responses are
substantially faster, however, the frequency of the muscle
activity can be summed up over a period of time to indicate a
change in behavioral pattern. ECG and respiration responses
are considered slower, however, estimating cardiac and
respiratory patterns form short term periods is common in
psychophysiology research area [14]. Eighty-four features
were extracted from ECG, 21 from SC, 21 from each of the
EMG channels, and 67 from the RSP channel. A complete
description of these features can be found in [13].
D. Classification Methods
The present study used the winnow updatable ensemble
algorithm described in detail in Table. I. It is an ensemble
based algorithm that is similar to a weighted majority voting
algorithm. It combines decisions from ensemble members
based on their weights. However, it utilizes a different
updating approach for member classifiers. This includes
promoting ensemble members that make correct predictions
and demoting those that make incorrect predictions.
Updating of the weights is done automatically based on
incoming data, which makes this approach suitable for online
applications that operate in non-stationary environments .
We set the parameter alpha value to 2 (alpha = 2); which
is the parameter used to update the weights of classifier
members. Acceptable results have been achieved using an
alpha value of two in previous research [6]. In this study we
use hard labeled data only. The ensemble relies on regular
feedback where weights are updated on the basis of error.
This is a prerequisite for the updating mechanism used by
winnow. However, in real world applications immediate
feedback might not be available all the time. In case that
feedback or true class labels are unknown, other approaches
could be explored. For example the use of semi-supervised
techniques that combine labeled and unlabeled data for
training a classifier. Additionally, feedback could be
obtained on-demand by asking users of an affect-aware
system in away similar to self-reports. In this case the
frequency at which the ensemble is updated could depend on
some performance threshold.
The WEKA data mining package, and PRTools 4.0 [15],
a pattern recognition MATLAB library, were used for
classification. Chi-square feature selection was used for
dimensionality reduction in order to avoid problems
associated with large feature spaces. WEKA’s support vector
machine (SMO) classifier with a linear kernel was utilized
for training classification models. Many successful
TABLE I. THE WINNOW ENSEMBLE ALGORITHM
- Initialization: Given a classifier ensemble D = (D1,….,Dn),
Initialize all classifier weights; wi = 1. i =1:n.
- Classification: For a new example x, calculate the support for
each class as the sum of the weights of all classifiers Di that suggest
class label ck for x. Set x to the class with largest support.
k=1:number of classes.
- Updating: if x is classified correctly by classifier Di then its
weight is increased (promotion) by wi = alpha * wi,, where alpha >
1. If classifier Di incorrectly classifies x, then its weight is
decreased by wi = wi / alpha (demotion).
242242242242
applications of SVMs have been demonstrated in previous
research [16]. Choosing the SMO classifier as a base
classifier is independent from the classification approach
adopted by the ensemble algorithm. In future work,
additional classifiers could be evaluated to determine their
effect on the performance of the ensemble algorithm.
E. Day Datasets
Day datasets were constructed separately for the two
affective measures valence (positive/negative) and arousal
(low/high). Additionally, separate datasets were constructed
for both IAPS mapped categories (as described in section II.
A.) and self-reports of participants. In total there were 80 (4
participants x 5 recording sessions x 2 affective measures
(valence and arousal) x 2 ratings (IAPS and self-reports))
datasets with 80 instances in each dataset. IAPS ratings
datasets had a balanced distribution of labels 40:40 for
positive/negative valence or low/high arousal. On the other
hand, self-reports datasets had unbalanced distribution of
classes, so a down sampling procedure (WEKA’s
SpreadSubsample which produces a random subsample of a
dataset) was applied to obtain a balanced distribution of
classes. This is done in order to avoid classifier bias towards
predicting majority class. On average, 34% of data was lost
for self-reports arousal datasets and 15% of data was lost for
self-reports valence datasets. Therefore, baseline
classification accuracy is (50%) for both types of data sets. A
preliminary analysis showed that the top five ranked features
-using chi-square feature selection- were sufficient to
produce consistent classification results without sacrificing
performance. Therefore, the top five features were selected
from each dataset and used in all subsequent analysis.
III. RESULTS AND DISCUSSIONS
We first tested the effectiveness and reliability of the IAPS
images of inducing both valence and arousal using Cohen’s
kappa. We then explored the issue of how diagnostic features
of affect change over time by applying feature ranking (chi-
square) to separate day datasets, and how this issue could
affect the building of reliable classification models. Next, we
carried out two classification experiments. The first
experiment compared three feature selection and training
strategies utilizing the winnow ensemble algorithm and a
day-cross validation procedure that resembles baseline
accuracy. The effect of using day-specific and pooled
features on classification accuracy was tested. The second
experiment involved testing the effect of individual
physiological channels on affect detection accuracy. This
experiment built on the results of the first experiment by
using pooled features only for affect classification. In all
classifier training strategies described in the following
sections, we adopted a day cross-validation strategy in order
to test on all available data.
A. The effectivness of IAPS of inducing Affect
In order to test the effectiveness of the IAPS stimuli in
inducing both valence and arousal, we tested the level of
agreement between participants’ self-reports and IAPS
normative ratings using Cohen’s kappa. Participants’ self-
reported valence showed higher agreement with IAPS
normative ratings compared to arousal. The kappa score for
valence was 0.89, and kappa score for arousal was 0.41. It is
evident that the IAPS stimuli were quite successful in
eliciting valence, but was much less effective in influencing
arousal. However, both ratings schemes will be used to
assess affect detection accuracy.
B. Day-Specific Features
As an example of how diagnostic features change across
days, Table II presents the results of chi-square feature
selection applied to participant S1 (applied to each day data
separately). The chi-square value represents the degree of
relevance of a feature to class category. It can be seen from
the results that the diagnostic features are different for each
day. This is a reflection of the changing nature of
physiological data. Table III presents the features selected
from one participant using IAPS ratings only (for space
limitations and as a demonstration), however other subjects
data showed similar behavior. An interesting observation is
that there are frequent features which reoccur on different
days. This is promising as it allows for easier calibration of
affect detection classification models. However this leaves us
wondering whether classification models that are built from
these day-specific features are more accurate than those built
using pooled features selected from pooled day data. We
address this issue in the next section.
C. Winnow Results on Day-Specific and Pooled Features
The winnow ensemble algorithm was run with an ensemble
of four base classifiers each trained on a separate day-
specific features dataset. Testing was done on the remaining
day-data. The procedure was repeated five times to test on all
available data. We also tested the performance of pooled
TABLE II. TOP FIVE SELECTED FEATURES PERFORMED ON DAY DATA SEPRATELY FOR PARTICIPANT S1 WITH VALENCE (IAPS) AS CLASS LABEL
Chi Square/ Feature Name
Day 1 Features Day 2 Features Day 3 Features Day 4 Features Day 5 Features
43 SC-2Diff-minRatio 23 ZYG-EMG-1Diff-minRatio 10 SC-1Diff-minRatio 16 ZYG-EMG-max 28 ZYG-EMG-2Diff-minRatio
43 SC-2Diff-maxRatio 23 ZYG-EMG-1Diff-
maxRatio 10 RSP-Ampl-1Diff-max 10 ECG-QS-min 26 SC-2Diff-maxRatio
23 ZYG-EMG-1Diff-maxRatio 14 RSP-2Diff-range 10 SC-1Diff-maxRatio 9 ECG-QS-range 24 SC-2Diff-minRatio
23 ZYG-EMG-1Diff-minRatio 13 RSP-2Diff-min 6 RSP-Ampl2Diff-maxRatio 8 ECG-HrvDistr-mean 23 ZYG-EMG-2Diff-maxRatio
23 RSP-Pulse-max 12 ZYG-EMG-2Diff-mean 5 ECG-HrvDistr-mean 6 RSP-Pulse1Diff-maxRatio 12 RSP-Ampl-mean
a. ZYG: Zygomatic facial muscle, Amp: Amplitude, min: Minimum, max, Maximum, HRV: Heart rate variability, 1Diff: First Difference., 2Diff: Second Difference.
243243243243
TABLE III. AVERAGE CLASSIFI CATION ACCURACY FOR THREE
TRAINING STRATEGIES USING MIXED FEATURES FROM ALL
PHYSIOLOGICAL CHANNELS (%)
Subject ID D-CV W-
PF
W-SF D-CV W-
PF
W-SF
Valence (IAPS) Arousal (IAPS)
S1 59 74 64 52 72 56
S2 54 63 61 51 69 53
S3 50 74 61 52 75 49
S4 50 76 54 48 73 59
Avergae 53.25 71.75 60.00 50.75 72.25 54.25
Valence (Self) Arousal (Self)
S1 52 73 68 50 63 62
S2 53 62 59 49 65 48
S3 51 79 64 51 76 60
S4 50 70 54 52 69 69
Average 51.50 71.00 61.25 50.50 68.25 59.75
b. D-CV Day Cross-Validation, W-PF: Winnow with Pooled Features, W-SF: Winnow with
Day-Specific Features
features, which are features selected from four days data
combined, using the same training procedure. As a baseline,
we used a day cross-validation procedure, where a single
SMO classification model was constructed from pooled data
of four days, and testing was done on the remaining day data.
This process was repeated five times in order to test on all
available data. The baseline procedure represents a static
classification approach without an update mechanism. This
process was repeated for the four categories (Valence-IAPS ,
Arousal-IAPS , Valence-self, and Arousal-self). The results
in Table III were obtained from a mixed feature set, in which
the top five features were selected from all physiological
channels.
It can be seen from results that accuracy scores using
day-specific and pooled features are higher than day cross-
validation baseline accuracy. This indicates that we were
able to leverage the dynamicity of winnow algorithm to
enhance classification accuracy for this type of data. It can
also be seen from Table III that winnow with pooled features
on average outperformed winnow with day-specific features.
We were expecting that day specific features could provide
higher performance. The explanation to this lower
performance of day-specific features could be due to the fact
that day data tends to have higher clustering cohesion and
tightness compared to data for the same emotion category
across multiple days. In order to test this effect further, in the
next section we present classification results using single
channel’s data using the same procedures described earlier
for day cross-validation, pooled features, and day-specific
features. This also will allow us to shed light on the
performance and reliability of these individual channels for
affect detection over multiple sessions.
D. Channels Effect on Affect Detection Accuracy
The same training procedures described above were
performed on each individual physiological channel’s data.
Results are not shown here but will be outlined in the
Analysis of Variance (ANOVA) analysis described next.
In order to examine the effect of three training strategies
on affect detection accuracy, an ANOVA was conducted on
all accuracy scores combined. This is a one-way repeated
measure ANOVA with accuracy as the dependent variable
and training strategy as the independent variable. A
significant main effect was found for training strategy (F (2,
285) = 57.67, p < 0.05). Bonferroni posthoc tests revealed
that accuracy scores for winnow with pooled features (M =
65.14) were higher than those for winnow with day-specific
features (M = 57.81) and day cross-validation (M = 55.55).
These results suggest that using winnow with pooled features
is more suitable for building predictive models of affect than
the other two training strategies. Pooled features showed that
it had the capacity to describe someone’s overall affective
states with significantly higher accuracy compared to the
more discrete day specific-features. On the other hand, using
winnow ensemble achieved what was expected by
outperforming the single model approach represented by the
day cross-validation approach.
We tested the effect of both physiological channel and
emotion on affect detection accuracy using two-way repeated
measures ANOVA. This analysis was done using winnow
accuracy scores with pooled features only. We found
significant main effect for channel (F (5, 72) = 12.60, p <
0.05), indicating that physiological channels vary in their
usefulness for affect detection. Posthoc tests revealed that
accuracy scores for EMG-cur (M = 69.87) and mixed
features (M = 72.25) were significantly higher than other
channels ECG (M = 62.37), EMG-zyg (M = 58.5), RSP (M =
61.62), and SC (M = 61.62). We did not find significant
effect for emotion category (F (3, 72) = 1.14, p = 0.34),
which indicates that there are no significant differences in
the accuracy at which valence and arousal are detected given
a particular channel. However, when the levels of emotional
categories were decreased to two rather than four categories,
the effect of emotion was only marginally insignificant (F (1,
84) = 3.5, p = 0.065). A significant effect for the interaction
between channel and emotion was found (F (5, 84) = 2.36, p
< 0.05). This indicates that some channels have stronger
influence on one of the two affective components (valence,
arousal) over the other.
The interaction effect was further explored by conducting
simple effects tests. The tests revealed that both EMG
channels were more useful for detecting valence than
arousal; EMG-cur (F (1, 84) = 4, p < 0.05), and EMG-zyg (F
(1, 84) = 10.44, p < 0.05). Other channels where equally
likely to detect both valence and arousal with the same
accuracy, ECG (F (1, 84) = 43, p = 0.62), mixed features (F
(1, 84) = 70, p = 0.15), RSP (F (1, 84) = 0.07, p = 0.79), and
SC (F (1, 84) = 0, p = 0.99). Fig. 1 shows the interaction
effect between channel and emotion. Our findings came in
accordance with literature in regard to both EMG channels.
The corrugator and zygomatic EMG have always shown
consistent changes with the valence component of emotion
[17]. On the other hand, previous research has always
considered SC as an index of arousal [18]. Taking this in
regard, our results might have been affected by using IAPS
images as stimulus.
244244244244
Fig. 1. Interaction effect between channel and emotion
The results also showed that detecting arousal with
acceptable accuracy required more physiological markers in
comparison to valence (see Fig. 1). This probably the reason
that some previous research has outlined that the detection of
arousal is harder than valence [1]. The literature is somehow
inconsistent in this regard, with studies reporting higher
detection rates for arousal than valence [4, 19] and the
contrary [1]. However, an interesting study conducted by
Gomez, et al. [20] found that induced physiological changes
of subjects’ valence lasted longer than those of arousal in
which they dissipate quickly. This might explain the higher
detection rates of valence compared to arousal. However, it
should also be noted that other researchers believe that
valence detection can be more difficult to detect compared to
arousal as valence information is conveyed more subtly [21].
IV. CONCLUSIONS, LIMITATIONS AND FUTURE WORK
We have shown that diagnostic physiological features of
affect exhibit day variations. This is a challenging issue for
building effective affect detection systems. Using day-
specific features did not yield improved affect detection
over that of using pooled feature set. Both facial EMGs
were more predictive of valence than arousal compared to
ECG, RSP and SC. This has implications if designers of
affect detection systems were more interested in detecting
valence than arousal. This also suggests that facial EMG is
more reliable than other measures when considering affect
detection over multiple sessions. Additionally, EMG-cur
and a fusion of features from all channels yielded the
highest detection rates for both valence and arousal. There
are two primary limitations with the present study. One
limitation of our work is the relatively small sample size, so
replication with a larger sample is warranted. The second
limitation is that emotions were artificially induced rather
than spontaneously experienced. This approach was adopted
because strict laboratory control was desired in the present
experiment. Replicating this research in more naturalistic
contexts is an important step for future work.
REFERENCES
[1] J. Kim and E. Andre, "Emotion Recognition Based on Physiological
Changes in Music Listening," IEEE Trans. Pattern Anal. Mach.
Intell., vol. 30, pp. 2067-2083, 2008.
[2] R. W. Picard, E. Vyzas, and J. Healey, "Toward Machine Emotional
Intelligence: Analysis of Affective Physiological State," IEEE Trans.
Pattern Anal. Mach. Intell., vol. 23, pp. 1175-1191, 2001.
[3] O. AlZoubi, S. K. D'Mello, and R. A. Calvo, "Detecting Naturalistic
Expressions of Nonbasic Affect using Physiological Signals," IEEE
Transactions on Affective Computing, vol. 3, pp. 298-310, 2012.
[4] A. Lichtenstein, A. Oehme, S. Kupschick, and T. Jürgensohn,
"Comparing Two Emotion Models for Deriving Affective States from
Physiological Data," in Affect and Emotion in Human-Computer
Interaction. vol. 4868, C. Peter and R. Beale, Eds., ed: Springer
Berlin / Heidelberg, 2008, pp. 35-50.
[5] O. Alzoubi, M. S. Hussain, S. D'Mello, and R. A. Calvo, "Affective
modeling from multichannel physiology: analysis of day differences,"
in Proceedings of the 4th international conference on Affective
computing and intelligent interaction-Volume Part I, 2011, pp. 4-13.
[6] L. I. Kuncheva, "Classifier Ensembles for Changing Environments,"
in Multiple Classifier Systems, ed, 2004, pp. 1-15.
[7] P. Ekman, "An argument for basic emotions," Cognition & Emotion,
vol. 6, pp. 169-200, 1992.
[8] J. A. Russell, "A circumplex model of affect," Journal of Personality
and Social Psychology, vol. 39, pp. 1161-1178, 1980.
[9] L. Maier-Hein, F. Metze, T. Schultz, and A. Waibel, "Session
independent non-audible speech recognition using surface
electromyography," in Automatic Speech Recognition and
Understanding, 2005 IEEE Workshop on, 2005, pp. 331-336.
[10] D. R. Lowne, S. J. Roberts, and R. Garnett, "Sequential non-
stationary dynamic classification with sparse feedback," Pattern
Recognition, vol. 43, pp. 897-905, Mar 2010.
[11] M. Bradley and P. J. Lang, "The international affective picture system
(iaps) in the study of emotion and attention," in Handbook of
Emotion Elicitation and Assessment, J. A. Coan and J. J. B. Allen,
Eds., ed New York:: Oxford University Press, 2007, pp. 29-46.
[12] J. A. Russell, A. Weiss, and G. A. Mendelsohn, "Affect grid: A
single-item scale of pleasure and arousal," Journal of Personality and
Social Psychology, vol. 57, pp. 493-502, 1989.
[13] J. Wagner. (October, 2009). Augsburg Biosignal Toolbox (AuBT).
Available: http://mm-werkstatt.informatik.uni-
augsburg.de/project_details.php?id=%2033
[14] S. D. Kreibig, "Autonomic nervous system activity in emotion: A
review," Biological Psychology, vol. 84, pp. 394-421, 2010.
[15] F. v. d. Heijden, R. P. Duin, D. d. Ridder, and D. M. Tax,
Classification, parameter estimation and state estimation - an
engineering approach using Matlab. Chichester: John Wiley & Sons,
2004.
[16] A. K. Jain, R. P. W. Duin, and M. Jianchang, "Statistical pattern
recognition: a review," Pattern Analysis and Machine Intelligence,
IEEE Transactions on, vol. 22, pp. 4-37, 2000.
[17] A. O. Hamm, H. T. Schupp, and A. I. Weike, "Motivational
organization of emotions: Autonomic changes, cortical responses, and
reflex modulation," Handbook of affective sciences, pp. 187-211,
2003.
[18] R. W. Levenson, "Autonomic Nervous System Differences among
Emotions," Psychological Science, vol. 3, pp. 23-27, 1992.
[19] A. Haag, S. Goronzy, P. Schaich, and J. Williams, "Emotion
Recognition Using Bio-sensors: First Steps towards an Automatic
System," in Affective Dialogue Systems, ed, 2004, pp. 36-48.
[20] P. Gomez, P. G. Zimmermann, S. Guttormsen Schär, and B. Danuser,
"Valence lasts longer than arousal: Persistence of induced moods as
assessed by psychophysiological measures," Journal of
Psychophysiology, vol. 23, pp. 7-17, 2009.
[21] R. W. Picard, Affective Computing, second ed. Cambridge,
Massachusetts: The MIT Press, 1997.
245245245245
... Emotions can be detected from various modalities including; text [6], speech [7], facial [8], and physiological signals [9], [10]. Emotion detection from text is considered a complex task compared to sentiment analysis. ...
Conference Paper
Full-text available
Emotion Detection (ED) from text has been an active research field recently. It has attracted the attention of researchers as it can measure the emotional contexts while humans interact with computers. Humans could express their emotion in various ways; using typed text, facial expressions, speech, gestures, and physiological measures. ED is considerably different from sentiment analysis SA, where SA goal is to detect polarity from text such as positive, negative or neutral. On the other hand, ED aims to recognize emotions from input text. Emotions can be modeled as discrete categories, e.g. Ekmans six basic emotions (angry, fear, joy, disgust, surprise and sadness). On the other hand there is the dimensional model that express emotions as valence, arousal and dominance values. Social media provides a rich source of emotional text, e.g. Twitter and Facebook. In this paper, we provide a review of recent work on ED from Arabic text. We discuss approaches (lexicons, machine learning, deep neural networks and ensemble approaches), tools for text processing, and We also discuss description of the most popular datasets in this domain.
... Hussain et al. (2011) proposed a framework for fusing multimodal psychological features, such as heart and facial muscle activity, skin response, and respiration, for detecting and recognizing affective states. AlZoubi et al. (2013) explored the effect of the affective feature variations over time on the classification of affective states. Siddiquie et al. (2013) analyzed four different affective dimensions, such as activation, expectancy, power, and valence (Schuller et al., 2011). ...
Article
Full-text available
Recognizing human activities from video sequences or still images is a challenging task due to problems, such as background clutter, partial occlusion, changes in scale, viewpoint, lighting, and appearance. Many applications, including video surveillance systems, human-computer interaction, and robotics for human behavior characterization, require a multiple activity recognition system. In this work, we provide a detailed review of recent and state-of-the-art research advances in the field of human activity classification. We propose a categorization of human activity methodologies and discuss their advantages and limitations. In particular, we divide human activity classification methods into two large categories according to whether they use data from different modalities or not. Then, each of these categories is further analyzed into sub-categories, which reflect how they model human activities and what type of activities they are interested in. Moreover, we provide a comprehensive analysis of the existing, publicly available human activity classification datasets and examine the requirements for an ideal human activity recognition dataset. Finally, we report the characteristics of future research directions and present some open issues on human activity recognition.
Article
Full-text available
Background Anger dyscontrol is a common issue after traumatic brain injury (TBI). With the growth of wearable physiological sensors, there is new potential to facilitate the rehabilitation of such anger in the context of daily life. This potential, however, depends on how well physiological markers can distinguish changing emotional states and for such markers to generalize to real-world settings. Our study explores how wearable photoplethysmography (PPG), one of the most widely available physiological sensors, could be used detect anger within a heterogeneous population. Methods This study collected the TRIEP (Toronto Rehabilitation Institute Emotion-Physiology) dataset, which comprised of 32 individuals (10 TBI), exposed to a variety of elicitation material (film, pictures, self-statements, personal recall), over two day sessions. This complex dataset allowed for exploration into how the emotion-PPG relationship varied over changes in individuals, endogenous/exogenous drivers of emotion, and day-to-day differences. A multi-stage analysis was conducted looking at: (1) times-series visual clustering, (2) discriminative time-interval features of anger, and (3) out-of-sample anger classification. Results Characteristics of PPG are largely dominated by inter-subject (between individuals) differences first, then intra-subject (day-to-day) changes, before differentiation into emotion. Both TBI and non-TBI individuals showed evidence of linear separable features that could differentiate anger from non-anger classes within time-interval analysis. However, what is more challenging is that these separable features for anger have various degrees of stability across individuals and days. Conclusion This work highlights how there are contextual, non-stationary challenges to the emotion-physiology relationship that must be accounted for before emotion regulation technology can perform in real-world scenarios. It also affirms the need for a larger breadth of emotional sampling when building classification models.
Article
Full-text available
Signals from peripheral physiology (e.g., ECG, EMG, and GSR) in conjunction with machine learning techniques can be used for the automatic detection of affective states. The affect detector can be user-independent, where it is expected to generalize to novel users, or user-dependent, where it is tailored to a specific user. Previous studies have reported some success in detecting affect from physiological signals, but much of the work has focused on induced affect or acted expressions instead of contextually constrained spontaneous expressions of affect. This study addresses these issues by developing and evaluating user-independent and user-dependent physiology-based detectors of nonbasic affective states (e.g., boredom, confusion, curiosity) that were trained and validated on naturalistic data collected during interactions between 27 students and AutoTutor, an intelligent tutoring system with conversational dialogues. There is also no consensus on which techniques (i.e., feature selection or classification methods) work best for this type of data. Therefore, this study also evaluates the efficacy of affect detection using a host of feature selection and classification techniques on three physiological signals (ECG, EMG, and GSR) and their combinations. Two feature selection methods and nine classifiers were applied to the problem of recognizing eight affective states (boredom, confusion, curiosity, delight, flow/-engagement, surprise, and neutral). The results indicated that the user-independent modeling approach was not feasible; however, a mean kappa score of 0.25 was obtained for user-dependent models that discriminated among the most frequent emotions. The results also indicated that k-nearest neighbor and Linear Bayes Normal Classifier (LBNC) classifiers yielded the best affect detection rates. Single channel ECG, EMG, and GSR and three-channel multimodal models were generally more diagnostic than two--channel models.
Article
Full-text available
How long induced moods last is a critical question for mood research, but has been only poorly addressed to date. In particular, physiological parameters have rarely been included to assess the effectiveness of mood induction procedures. We investigated the persistence of four different moods (positive high-arousal, positive low-arousal, negative high-arousal, and negative low-arousal) induced by film clips during a computer task. We measured subjective affective state, respiration, skin conductance level (SCL), heart rate, and corrugator activity. People who watched the two negative clips reported more negative valence after the task and showed more facial frowning and lower SCL during the task than people who watched the two positive clips. No arousal effects persisted throughout the task. The results suggest that induced changes in the valence dimension of moods are maintained throughout an intervening task and are physiologically best reflected by corrugator activity and SCL, whereas induced changes in the arousal dimension dissipate quickly. The implications of these findings for mood research are discussed.
Article
Full-text available
Factor-analytic evidence has led most psychologists to describe affect as a set of dimensions, such as displeasure, distress, depression, excitement, and so on, with each dimension varying independently of the others. However, there is other evidence that rather than being independent, these affective dimensions are interrelated in a highly systematic fashion. The evidence suggests that these interrelationships can be represented by a spatial model in which affective concepts fall in a circle in the following order: pleasure (0), excitement (45), arousal (90), distress (135), displeasure (180), depression (225), sleepiness (270), and relaxation (315). This model was offered both as a way psychologists can represent the structure of affective experience, as assessed through self-report, and as a representation of the cognitive structure that laymen utilize in conceptualizing affect. Supportive evidence was obtained by scaling 28 emotion-denoting adjectives in 4 different ways: R. T. Ross's (1938) technique for a circular ordering of variables, a multidimensional scaling procedure based on perceived similarity among the terms, a unidimensional scaling on hypothesized pleasure–displeasure and degree-of-arousal dimensions, and a principal-components analysis of 343 Ss' self-reports of their current affective states. (70 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
This article introduces a single-item scale, the Affect Grid, designed as a quick means of assessing affect along the dimensions of pleasure–displeasure and arousal–sleepiness. The Affect Grid is potentially suitable for any study that requires judgments about affect of either a descriptive or a subjective kind. The scale was shown to have adequate reliability, convergent validity, and discriminant validity in 4 studies in which college students used the Affect Grid to describe (a) their current mood, (b) the meaning of emotion-related words, and (c) the feelings conveyed by facial expressions. Other studies (e.g., J. Snodgrass et al; see record 1989-13842-001) are cited to illustrate the potential uses of the Affect Grid as a measure of mood. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Conference Paper
Full-text available
Physiological signals are widely considered to contain affective information. Consequently, pattern recognition techniques such as classification are commonly used to detect affective states from physiological data. Previous studies have achieved some success in detecting affect from physiological measures, especially in controlled environments where emotions are experimentally induced. One challenge that arises is that physiological measures are expected to exhibit considerable day variations due to a number of extraneous factors such as environmental changes and sensor placements. These variations pose challenges to effectively classify affective sates from future physiological data; this is a common problem for real world requirements. The present study provides a quantitative analysis of day variations of physiological signals from different subjects. We propose a classifier ensemble approach using a Winnow algorithm to address the problem of day-variation in physiological signals. Our results show that the Winnow ensemble approach outperformed a static classification approach for detecting affective states from physiological signals that exhibited day variations.
Article
Classification, Parameter Estimation and State Estimation is a practical guide for data analysts and designers of measurement systems and postgraduates students that are interested in advanced measurement systems using MATLAB. 'Prtools' is a powerful MATLAB toolbox for pattern recognition and is written and owned by one of the co-authors, B. Duin of the Delft University of Technology. After an introductory chapter, the book provides the theoretical construction for classification, estimation and state estimation. The book also deals with the skills required to bring the theoretical concepts to practical systems, and how to evaluate these systems. Together with the many examples in the chapters, the book is accompanied by a MATLAB toolbox for pattern recognition and classification. The appendix provides the necessary documentation for this toolbox as well as an overview of the most useful functions from these toolboxes. With its integrated and unified approach to classification, parameter estimation and state estimation, this book is a suitable practical supplement in existing university courses in pattern classification, optimal estimation and data analysis. Covers all contemporary main methods for classification and estimation. Integrated approach to classification, parameter estimation and state estimation. Highlights the practical deployment of theoretical issues. Provides a concise and practical approach supported by MATLAB toolbox. Offers exercises at the end of each chapter and numerous worked out examples. PRtools toolbox (MATLAB) and code of worked out examples available from the internet. Many examples showing implementations in MATLAB. Enables students to practice their skills using a MATLAB environment.
Chapter
This paper describes an experiment on emotion measurement and classification based on different physiological parameters, which was conducted in the context of a European project on ambient intelligent mobile devices. Emotion induction material consisted of five four-minute video films that induced two positive and three negative emotions. The experimental design gave consideration to both, the basic and the dimensional model of the structure of emotion. Statistical analyses were conducted for films and for self-assessed emotional state and in addition, supervised machine learning technique was utilized. Recognition rates reached up to 72% for a specific emotion (one out of five) and up to 82% for an underlying dimension (one out of two).
Article
Many data analysis problems require robust tools for discerning between states or classes in the data. In this paper we consider situations in which the decision boundaries between classes are potentially non-linear and subject to “concept drift” and hence static classifiers fail. The applications for which we present results are characterized by the requirement that robust online decisions be made and by the fact that target labels may be missing, so there is very often no feedback regarding the system's performance. The inherent non-stationarity in the data is tracked using a non-linear dynamic classifier, the parameters of which evolve under an extended Kalman filter framework, derived using a sequential Bayesian-learning paradigm. The method is extended to take into account missing and incorrectly labeled targets and to actively request target labels. The method is shown to work well in simulation as well as when applied to sequential decision problems in medical signal analysis.