
YAAD: Young Adult’s Affective Data Using Wearable ECG and GSR sensors

Authors:
Muhammad Najam Dar, National University of Sciences and Technology (NUST), Islamabad, Pakistan (najam.dar77@ce.ceme.edu.pk)
Sajid Gul Khawaja, National University of Sciences and Technology (NUST), Islamabad, Pakistan (sajid.gul@ceme.nust.edu.pk)
Amna Rahim, National University of Sciences and Technology (NUST), Islamabad, Pakistan (amna.rahim18@ce.ceme.edu.pk)
Aqsa Rahim, Ghulam Ishaq Khan Institute of Science and Engineering, Topi, Pakistan (aqsa.rahim@giki.edu.pk)
Muhammad Usman Akram, National University of Sciences and Technology (NUST), Islamabad, Pakistan (usman.akram@ceme.nust.edu.pk)
Abstract—Emotions play a significant role in human-computer interaction and in the entertainment consumption behavior common among young adults. The main challenge is the lack of a publicly available dataset of physiological signals with emotion labels for young adults. This article presents a multi-modal dataset of Electrocardiogram (ECG) and Galvanic Skin Response (GSR) signals for the emotion classification of young adults. Signal acquisition was performed through Shimmer3 ECG and Shimmer3 GSR units worn on the chest and palm of the participants. The ECG signals were acquired from 25 participants, and the GSR signals from 12 participants, while they watched 21 emotional stimulus videos divided into three sessions. The data was self-annotated for seven emotions: happy, sad, fear, surprise, anger, disgust, and neutral. These emotional states were further self-annotated with five intensity levels of felt emotion: very low, low, moderate, high, and very high. Participants also annotated valence, arousal, and dominance scores through a Google form for each provided stimulus. The baseline experimental results for classifying the four classes of high valence high arousal (HVHA), high valence low arousal (HVLA), low valence high arousal (LVHA), and low valence low arousal (LVLA) on the ECG data are reported with an accuracy of 69.66%. Our baseline method for the proposed dataset achieved 66.64% accuracy for the eight-class classification of categorical emotions. The significance of the data lies in its larger number of emotion classes and its less intrusive sensors, which better mimic real-world applications. The Young Adults' Affective Data (YAAD) is made publicly available and is valuable for researchers developing behavioral assessments based on physiological signals.
Index Terms—Electrocardiograms (ECG), Galvanic Skin Re-
sponse (GSR), Emotion recognition, Affective computing, Human
computer interaction, Biomedical signals.
I. INTRODUCTION
Emotions play a significant role in the decision-making of advanced human-computer interaction and automated healthcare. The intrinsic behavior of the Electrocardiogram (ECG) and Galvanic Skin Response (GSR) manifests the actual emotional experience of a person, with the help of newly developed wearable and low-cost sensors. Affective computing through the processing of these biomedical signals has recently improved the quality of psychological healthcare monitoring [19, 14], emotion regulation, stress management, affect-aware tutoring [5], and support for patients with cognitive disorders, with the advancement of deep learning algorithms [28, 15, 7, 16]. Existing datasets such as AMIGOS [4] and MAHNOB-HCI [29], which correlate ECG and GSR signals with emotional states, have had a significant impact on affective computing research, but they come with the challenges of a limited number of emotion classes, intrusive sensors for data acquisition, and coverage of the adult age range only. In contrast, our dataset provides more emotion classes, acquired with wearable sensors from young adults, which is significant for affective computing.
The advancement of technology [9, 20, 12, 27, 26] and distance learning brings more technology users in the age range of children and young adults [3, 2, 10]. The emotion elicitation problem is more challenging for children and young adults, as they are more physically active and emotionally sensitive than adults. The existing emotion-annotated datasets with physiological signals incorporate only adults above 19 years. Researchers have therefore focused on the emotion elicitation of adults, due to the lack of a publicly available dataset for children and young adults. Similarly, the available datasets usually acquire Electroencephalogram (EEG) signals in addition to ECG and GSR signals, making them more intrusive to users and less applicable in real-world applications. The existing datasets are also limited in their emotion classes and self-annotation labels. In this work, we collected a self-annotated multimodal dataset of young adults with more emotion classes and less intrusive wearable ECG and GSR sensors.
This study provides a publicly available Young Adults Affective Data (YAAD) dataset for emotion recognition (Identification Number: doi.org/10.17632/g2p7vwxyn2.4; direct URL to data: https://data.mendeley.com/datasets/g2p7vwxyn2/4).
AMIGOS [22]: Participants: 40 (27M, 13F); Modalities: EEG, ECG, GSR and audio-visual; Self-assessment annotations: Dimensional (valence, arousal, dominance, liking, familiarity) and Categorical (six basic emotions); Dimensional scale: 1 to 9; Acquisition: 14-channel EEG, wireless ECG and GSR; Age (years): 21-40 (µ=28.3); Stimuli: 20 videos; Signal duration: 51-150 s.
DEAP [21]: Participants: 32 (16M, 16F); Modalities: EEG, GSR and peripheral signals; Self-assessment annotations: Dimensional (arousal, valence, liking, dominance and familiarity); Dimensional scale: continuous scale 1 to 9; Acquisition: 32-channel EEG and wired GSR; Age (years): 19-37 (µ=26.9); Stimuli: 40 videos; Signal duration: 60 s each.
DECAF [1]: Participants: 30 (16M, 14F); Modalities: ECG and peripheral signals; Self-assessment annotations: Dimensional (valence, arousal and dominance); Dimensional scale: 0 to 5 and -2 to +2; Acquisition: 3-channel ECG; Age (years): µ=27.3, σ=4.3; Stimuli: 32 videos; Signal duration: µ=80 s, σ=20 s.
DREAMER [18]: Participants: 23 (14M, 9F); Modalities: EEG, ECG; Self-assessment annotations: Dimensional (valence, arousal and dominance); Dimensional scale: 1 to 5; Acquisition: 14-channel EEG, wireless ECG; Age (years): 22-33 (µ=26.6, σ=2.7); Stimuli: 18 videos; Signal duration: 65-393 s (µ=199 s).
MAHNOB-HCI [29]: Participants: 30 (13M, 17F); Modalities: EEG, ECG, GSR and peripheral signals; Self-assessment annotations: Dimensional (valence, dominance); Dimensional scale: 1 to 9; Acquisition: 32-channel EEG and wired ECG, GSR; Age (years): 19-40 (µ=26.06, σ=4.93); Stimuli: 20 videos; Signal duration: 12-22 s (µ=17.6 s, σ=2.2 s).
SEED-IV [30]: Participants: 15 (7M, 8F); Modalities: EEG; Self-assessment annotations: Categorical (happy, sad, neutral and fear); Dimensional scale: none; Acquisition: 62-channel ESI Neuroscan system; Age (years): 20-24; Stimuli: 24 videos; Signal duration: 120 s each.
YAAD: Participants: 25 (15M, 10F); Modalities: ECG and GSR; Self-assessment annotations: Dimensional (valence, arousal, dominance, familiarity) and Categorical (six basic emotions, five levels of each of the six basic emotion categories, neutral and mixed emotion state); Dimensional scale: 0 to 10; Acquisition: Shimmer3 wireless; Age (years): 8-25 (µ=15.23, σ=4.84); Stimuli: 21 videos; Signal duration: approx. 39 s each.
TABLE I: Comparison of state-of-the-art emotion databases using physiological signals. M represents male, F represents female, µ represents mean and σ represents standard deviation.
Researchers can use this data to develop emotionally intelligent
software to provide product customization, emotion regulation,
stress management, and monitoring anomalies in psychologi-
cal and emotional behavior. The provided data can help im-
prove the quality of distance learning and healthcare services
to people suffering from alexithymia, autism, and other related
conditions by providing a baseline for emotion recognition.
The data significantly augments the available datasets by providing more emotional categories (35 emotional categories:
7 emotions with five levels each, simultaneously provided
with the dimensional scale of valence, arousal, and dominance
values) and the associated psychological environment. Section
II compares the proposed dataset with existing datasets, while
the description of data and experimentation setup is provided
in Sections III and IV, respectively. Section V is dedicated to the baseline affective computing results, and a conclusion is provided in Section VI.
II. RELATED WORK
The induced emotion through stimuli is highly subjective,
often represented by self-annotation. The available data is
characterized by various variables, such as number of par-
ticipants, type of stimuli, number of emotion classes, type
of acquisition sensor, number and type of modalities, ease
of data acquisition, and variation in annotation models. This
section will review critical databases that tried to generalize
these parameters for affective computing in real-world appli-
cations. The comprehensive comparison of these parameters
between state-of-the-art databases and the provided publicly available database is presented in Table I.
There are two approaches to model emotions through annotations: the dimensional and the categorical. Plutchik's [24, 13] emotion wheel and Russell's circumplex model [25] are based on the dimensional approach. The circumplex model is
based on two-dimensional space with the scale of valence
and arousal. This two-dimensional space is divided into four
quadrants of high valence high arousal (HVHA), high valence
low arousal (HVLA), low valence high arousal (LVHA), and
low valence low arousal (LVLA). The participants can annotate
in a range of scalar values for valence and arousal, which can
then be classified in one of these four quadrants. Ekman [8]
proposes the categorical approach with six basic emotion cat-
egories: happiness, sadness, anger, fear, disgust, and surprise.
Most databases, such as [21], [18] and [30], use only one of these approaches to annotate emotion labels. Only the AMIGOS [22] database incorporates both approaches for annotation to generalize its data for comparative analysis in the domain
of affective computing. In our dataset collection, we also
incorporated both dimensional and categorical approaches with
the addition of five levels of each of the six basic emotions
during self-annotation. Similarly, the annotation provided in YAAD is on a larger dimensional scale of 0 to 10 compared to previous datasets, for more accurate modeling of self-annotation.
Fig. 1: Dataset statistics for the emotion classes and the valence, arousal, dominance model. (A) Number of samples labelled with the seven emotion classes of happy, sad, fear, anger, neutral, disgust and surprise, at the five levels of very low, low, moderate, high and very high. (B) Samples 1-252 with their valence, arousal and dominance scores (scale 0-10).
A few state-of-the-art databases with facial emotion expressions are available for young adults and teenagers [23, 11, 6]. However, the databases for emotion recognition using physiological signals focus on adult age groups only, as shown in Table I. YAAD is provided for emotion elicitation of young adults, with a higher standard deviation of participant age. Another critical problem is the collection of data through non-invasive wireless devices with a minimum number of electrodes. In previous databases, EEG is incorporated in addition to ECG, GSR, and other peripheral signals, making them intrusive to the participants compared to a real-world environment. Our dataset acquires only ECG and GSR signals, with single electrodes and wireless devices, for less intrusive data acquisition than previous datasets.
III. DATA DESCRIPTION
Our dataset consists of two configurations, one with single
modal ECG signals and the other with multi-modal ECG and
GSR signals. The provided multi-modal dataset comprises
seven emotional states (happy, sad, anger, fear, disgust, sur-
prise, and neutral). Each of these seven states consists of
five levels of very low, low, moderate, high, and very high
annotations representing the intensity of the felt state with
a total of 35 states. The annotation is considered a mixed emotional state when a participant self-annotates the same level of felt emotion for more than one emotional state.
The dataset contains two folders, namely raw data and self
annotations. The raw data folder contains two sub-folders
namely single modal and multi-modal. The multimodal sub-
folder contains raw data of both ECG and GSR signals in sepa-
rate sub-folders. Both ECG and GSR signals were collected si-
multaneously from 12 participants. Each ECG and GSR folder
contains 252 files (3 sessions x 12 persons x 7 emotions). The
set of experiments performed for a single modal contains only
ECG signals. Therefore, the single-modal sub-folder of raw data consists of 154 ECG samples comprising 13 participants (other than those used in the multi-modal experiments) watching seven stimulus videos for a variable number of sessions. Participants of the single-modal dataset with ID numbers 1, 3, 4, 5, 6, 7, 10, and 13 participated in the first session only; the participant with ID number 2 participated in the first two sessions; and participants with ID numbers 8, 9, 11 and 12 participated in all three sessions (8 x 7 + 1 x 14 + 4 x 21 = 154 samples). Each ECG and GSR sample has size 1 x 5000 with a 128 Hz sampling frequency (39 sec of data) and a unique ID based on session number, person ID, and video stimulus ID. For example, the sample ID ECGdata s3p11v2 represents the ECG data sample of session 3, participant ID 11, and video ID 2. The self annotation labels folder contains the self-annotated labels (emotions) for all the raw data signals of the single-modal and multi-modal sets (two separate Excel files). Therefore, each of these 39-sec raw data samples correlates with one of the 35 emotional states provided in the self-annotation Excel
file. The distribution of samples across these 35 emotional labels is presented in Fig. 1(a); encouragingly, the labels indicate that most of the participants felt strong emotions in response to the provided stimuli.
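As an illustration of how the released files can be consumed, the following Python sketch parses a sample ID and loads the corresponding raw signal together with its self-annotation row. The folder layout, file extension, variable name inside the .mat file, and the Excel sheet and column names are assumptions inferred from the description above and may differ in the released archive.

```python
# Hedged sketch: load one raw YAAD sample and look up its self-annotation.
# Paths, the .mat variable name, and Excel column names are assumptions.
import re
import scipy.io as sio
import pandas as pd

sample_id = "ECGdata s3p11v2"                      # session 3, participant 11, video 2
session, person, video = map(
    int, re.search(r"s(\d+)p(\d+)v(\d+)", sample_id).groups()
)

mat = sio.loadmat(f"raw data/multi-modal/ECG/{sample_id}.mat")              # assumed path
ecg = next(v for k, v in mat.items() if not k.startswith("__")).squeeze()   # 1 x 5000 at 128 Hz

labels = pd.read_excel("self annotations/multi_modal_labels.xlsx")          # assumed file name
row = labels[(labels["Session"] == session)
             & (labels["Participant"] == person)
             & (labels["Video"] == video)]         # assumed column names
print(ecg.shape, row)
```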
The provided data incorporates 25 volunteer participants, 10 females and 15 males. Their ages vary from 8 to 25 years, and each participant completed a questionnaire in the form of a self-assessment (url). The self-assessment Excel file contains, for each sample, the session, participant, and video ID, the ratings of the 35 emotional states, and the 3D valence, arousal, and dominance model on a scale of 0 to 10 (from lowest to highest). The valence, arousal, and dominance scores of the samples are represented in Fig. 1(b), which shows a fairly even distribution of these values across the provided scale. The valence score represents the positiveness or negativeness of the emotion, the arousal score represents the excitement level, and the dominance score represents the level of control and influence felt in response to the emotional stimuli. The
Self-Assessment Manikin (SAM) is used for the 3-dimensional emotional assessment of the subjects in the questionnaire. SAM is a non-verbal pictorial assessment technique that is quick, inexpensive, and easy, and that directly measures the pleasure, arousal, and dominance associated with a person's affective reaction to a wide variety of stimuli. These manikins are mapped with numbers.
Fig. 2: Raw data samples of ECG and GSR for various emotion classes. (A) Raw ECG data for the happy class, (B) sad class, (C) fear class, (D) anger class, (E) neutral class, (F) disgust class, (G) surprised class, (H) raw ECG data combined for the seven emotion classes over a 2-sec window, (I) raw GSR data combined for the seven emotion classes.
Session 1: S1V1 (Happy) Babies Laughing; S1V2 (Sad) Barely there; S1V3 (Neutral) National Park Alaska; S1V4 (Surprise) Highest Waterfall; S1V5 (Disgust) Disgust compilation; S1V6 (Anger) Triggering OCD; S1V7 (Fear) Horror scene diet.
Session 2: S2V1 (Happy) Funny babies; S2V2 (Sad) Life is beautiful; S2V3 (Neutral) Abstract shapes; S2V4 (Surprise) Bungee jump; S2V5 (Disgust) Open diabetic ulcer; S2V6 (Anger) Schindler's List; S2V7 (Fear) The moonlight man.
Session 3: S3V1 (Happy) Funny panda; S3V2 (Sad) The Pianist; S3V3 (Neutral) Color bars; S3V4 (Surprise) Alpha jetman; S3V5 (Disgust) Disgusting eating; S3V6 (Anger) Kashmir protest; S3V7 (Fear) Whisper.
TABLE II: 21 stimulus videos with ID, target emotion and video title.
Levels from 0 to 10 are defined for each dimension, representing an increase in intensity, as shown in the questionnaire attached in the Appendix. For valence, the scale goes from unpleasant to pleasant; for arousal, from calm to activated; and for dominance, from no control to in control. The participant's name, age, gender, and familiarity score with the stimulus video (from 0 to 10, where 10 means highly familiar and 0 means the video had never been watched before) are also present in the Excel sheet.
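For reference, a short sketch of reading the self-assessment sheet described above and summarizing the 0-10 valence, arousal, dominance, and familiarity ratings (similar in spirit to Fig. 1(b)) is given below; the file name and column names are assumptions and may differ in the released sheet.

```python
# Hedged sketch: summarize the 0-10 self-assessment ratings.
# The Excel file name and column names below are assumptions.
import pandas as pd

sa = pd.read_excel("self annotations/self_assessment.xlsx")         # assumed file name
rating_cols = ["Valence", "Arousal", "Dominance", "Familiarity"]    # assumed column names
print(sa[rating_cols].describe())                   # per-dimension distribution, cf. Fig. 1(b)
print(sa["Familiarity"].value_counts().sort_index())  # how familiar the stimuli were
```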
The raw ECG data for samples of the seven emotion categories are represented in Fig. 2(a) to Fig. 2(g). Raw ECG signals representing these emotional categories are combined over a 2-sec window and shown in Fig. 2(h). Similarly, Fig. 2(i) shows the combined representation of the seven emotion classes of raw GSR data. These raw signals, together with the self-annotation labels, allow researchers to preprocess the data, extract features using biomedical signal processing, and classify emotional categories to develop emotion-related applications.
IV. EXPERIMENTATION PROTOCOL
Shimmer3 ECG and Shimmer3 GSR units are used to
capture ECG and GSR signals, respectively. The sampling frequency for the ECG signal is set to 256 Hz. Similarly, the sampling frequency for the GSR signal is set to 128 Hz, and a total of 5000 samples are considered for each signal.
The electrode placement for the wearable Shimmer3 ECG unit
is represented in Fig. 3(a), where there is no need to place
gel on the chest. GSR signals are acquired by placing one
electrode on the palmar surface of the medial phalange and
the other on the palmar surface of the distal phalange and
an ear clip on the side, as shown in Fig. 3(b). Emotional stimuli are provided after 5 sec of signal acquisition; these 5 sec should be considered the baseline signal of the neutral state in the absence of emotional stimuli. Participants were allowed to recover and relax through a 3-5 minute gap between the trials of the seven provided stimulus videos. The same environmental setup was used for each signal acquisition process to standardize the dataset. The environmental setup, with a participant watching an emotional stimulus, is shown in Fig. 3(c); both wearable ECG and GSR sensors were paired with a PC and communicated the acquired signals to MATLAB software through Bluetooth.

Fig. 3: Dataset acquisition procedure. (A) Shimmer3 GSR unit, (B) Shimmer3 ECG unit, (C) participant watching a stimulus video with wearable ECG and GSR sensors communicating with MATLAB for data streaming and storage through Bluetooth.
There were 21 stimulus videos, each shown to a participant for 39 sec. Table II lists all the stimulus videos shown to participants during data acquisition. For example, stimulus ID S1V1 represents the first stimulus video used for the first session, and S3V5 represents the fifth stimulus video used for the third session. The details of these videos and their links are also provided in the supplementary material with the dataset.
The real-time streaming of data from a Shimmer device to MATLAB uses the Shimmer MATLAB Instrument Driver. Each sensor has its own COM port number, through which its signals are shared over Bluetooth to the PC and saved as .mat files. The ECG data is pre-filtered, before data storage, with a second-order Chebyshev low-pass filter (LPF) with a corner frequency slightly smaller than the Nyquist frequency and a second-order Chebyshev high-pass filter (HPF) with a corner frequency of 0.5 Hz, to minimize the effect of environmental noise and muscle movement. Interference noise was minimized during signal acquisition: mobile phones and all other electronic devices were removed from the environment, and the subjects were asked to stay still without unnecessary movements.
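The described pre-filtering can be reproduced offline along the following lines; this is a hedged sketch assuming Chebyshev Type-I filters with a 0.5 dB passband ripple and a low-pass corner at 95% of the Nyquist frequency, since only the filter order and corner frequencies are specified above.

```python
# Hedged sketch of the described pre-filtering (filter type, ripple, and the
# exact low-pass corner are assumptions; only order and 0.5 Hz HPF corner are given).
import numpy as np
from scipy.signal import cheby1, filtfilt

def prefilter_ecg(ecg, fs=128.0, ripple_db=0.5):
    """2nd-order Chebyshev LPF (corner just below Nyquist) + 2nd-order HPF (0.5 Hz)."""
    nyq = fs / 2.0
    # Low-pass: corner slightly smaller than the Nyquist frequency (assumed 0.95 * nyq).
    b_lp, a_lp = cheby1(N=2, rp=ripple_db, Wn=0.95 * nyq, btype="low", fs=fs)
    # High-pass: 0.5 Hz corner to suppress baseline wander / movement artifacts.
    b_hp, a_hp = cheby1(N=2, rp=ripple_db, Wn=0.5, btype="high", fs=fs)
    filtered = filtfilt(b_lp, a_lp, ecg)      # zero-phase filtering
    filtered = filtfilt(b_hp, a_hp, filtered)
    return filtered

raw = np.random.randn(5000)                   # placeholder for one 1 x 5000 sample
clean = prefilter_ecg(raw)
```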
TABLE III: Confusion matrix of the ECG modality for four-class classification of the dimensional self-annotation (rows: output class; columns: target class).

Output Class    HVHA   HVLA   LVHA   LVLA
HVHA             278     14      5     14
HVLA              28    720     41    217
LVHA              20    127    655    241
LVLA              41    220    278   1242
TABLE IV: Confusion matrix of the eight emotion categories (rows: output class; columns: target class).

Output Class   Anger  Disgust  Fear  Happy  Mixed  Neutral  Sad  Surprise
Anger            140        2     0      8      8        0    1         0
Disgust            0      252    47     52     66        0   11        14
Fear               4        0   197     11     11        2    7         1
Happy             24       21    38    912    122       75   30        30
Mixed            117       71    72    185    660      111  111        30
Neutral            1        0     0     30      0      189    0         1
Sad                0        0     3     25     30        0  309         3
Surprise           0        1     0      1      1        0    0       105
V. AFFECTIVE COMPUTING RESULTS
This section presents the baseline results of affective com-
puting with ECG signals using the deep learning methodology
proposed in [7]. The proposed dataset has both dimensional
and categorical self-annotations. Therefore, we computed emo-
tion classification in two different schemes. First, four classes of HVHA, HVLA, LVHA, and LVLA were classified by dividing the scales of valence and arousal in half. Second, eight categorical classes of happiness, sadness, anger, fear, disgust, neutral, surprise, and mixed emotion were classified.
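A minimal sketch of the quadrant labeling implied by this scheme is given below; the midpoint threshold of 5 on the 0-10 scales and the handling of ties are assumptions.

```python
# Hedged sketch: map 0-10 valence/arousal self-annotations to the four quadrant labels.
def quadrant_label(valence: float, arousal: float, threshold: float = 5.0) -> str:
    v = "HV" if valence >= threshold else "LV"   # high vs. low valence (assumed cut at 5)
    a = "HA" if arousal >= threshold else "LA"   # high vs. low arousal (assumed cut at 5)
    return v + a                                 # one of "HVHA", "HVLA", "LVHA", "LVLA"

print(quadrant_label(8, 3))  # -> "HVLA"
```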
The initial exploratory results of classification were com-
puted using ECG signals by combining single modal (154
samples) and multi-modal (252 samples) data with a total of
406 samples from 25 subjects. Each sample contains 39-sec
of ECG data, while the first five seconds of ECG recording is
the baseline signal without any provided stimuli. The last 34-
sec of each sample contains the data with the provided stimuli
of the target emotion. The data from each sample is divided into 1-sec segments, and the average baseline is subtracted from each 1-sec segment. The total number of 1-sec segments was 406 x 34
= 13,804 samples. These 13,804 samples are divided into 70
percent training (9,663 samples) and 30 percent testing (4,141
samples). After baseline removal and z-score normalization,
the CNN-LSTM architecture was trained and then tested with
these samples based on the methodology provided in [7].
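The segmentation and baseline-removal step can be sketched as follows; the exact boundaries, normalization order, and constants are assumptions based on the description above rather than released code.

```python
# Hedged sketch of segmentation, baseline removal, and z-score normalization:
# each 39-sec sample (5000 points at 128 Hz) gives a 5-sec baseline and 34 one-second
# stimulus segments; the mean 1-sec baseline is subtracted from every segment.
import numpy as np

FS = 128  # sampling frequency reported for the stored samples

def segment_sample(signal: np.ndarray, baseline_sec: int = 5, stimulus_sec: int = 34):
    baseline = signal[: baseline_sec * FS].reshape(baseline_sec, FS)
    mean_baseline = baseline.mean(axis=0)                 # average 1-sec baseline
    stimulus = signal[baseline_sec * FS : (baseline_sec + stimulus_sec) * FS]
    segments = stimulus.reshape(stimulus_sec, FS) - mean_baseline
    # z-score normalization per 1-sec segment
    segments = (segments - segments.mean(axis=1, keepdims=True)) / (
        segments.std(axis=1, keepdims=True) + 1e-8
    )
    return segments                                       # shape: (34, 128)

sample = np.random.randn(5000)                            # placeholder raw ECG sample
print(segment_sample(sample).shape)                       # (34, 128)
```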
The methodology employed for classification is based on three one-dimensional convolutional layers, with 16 filters of size 1 x 8 each, followed by an LSTM layer. The model was trained with a batch size of 64 and a learning rate of 0.001 using the Adam optimizer. The number of training epochs was set to 200, and the experimental setup followed [7].
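A compact Keras sketch of a CNN-LSTM of this kind is shown below for orientation; the LSTM width, activations, padding, and output layer are assumptions, and the exact architecture is the one described in [7].

```python
# Hedged sketch of a CNN-LSTM baseline: three 1-D conv layers (16 filters, kernel 8)
# followed by an LSTM and a softmax classifier. Several hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(segment_len=128, n_classes=4, lstm_units=64):
    model = models.Sequential([
        layers.Input(shape=(segment_len, 1)),            # one 1-sec ECG segment
        layers.Conv1D(16, 8, padding="same", activation="relu"),
        layers.Conv1D(16, 8, padding="same", activation="relu"),
        layers.Conv1D(16, 8, padding="same", activation="relu"),
        layers.LSTM(lstm_units),                          # temporal summary of conv features
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_cnn_lstm()
# model.fit(X_train, y_train, batch_size=64, epochs=200, validation_split=0.1)
```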
For four-class classification with the dimensional labels, an emotion recognition accuracy of 69.66% is achieved. Similarly, for eight-class classification with the categorical emotion labels, an accuracy of 66.64% is achieved. The detailed class-wise results of the four-class classification are presented as a confusion matrix in Table III; similarly, the detailed class-wise results for the eight classes are presented in Table IV. The proposed dataset is based on a single ECG electrode, and the recording was limited to 39 sec per stimulus to minimize intrusion to the participant. Therefore, due to the smaller number of training samples and the single electrode, the recognition performance is lower than that of [7]; however, it is comparable to the baseline results of many state-of-the-art approaches. The proposed dataset allows researchers to improve ECG- and GSR-based emotion recognition performance with less intrusive data collection and a larger set of emotion classes.
VI. CONCLUSION
In this work, a multimodal YAAD database is presented
and made publicly available to the research community of
affective computing. The lack of a physiological-signal-based emotion database for young adults has left this group, the prime consumers of the latest technology, understudied. This database contains ECG and GSR recordings acquired with wireless sensors from 25 participants, with self-annotation on both dimensional and categorical scales. The audio-visual stimuli of 21 selected videos
are used to induce target emotions. The wide range of self-
annotations and larger classes of emotions with less intrusive
data collection make it valuable to explore the potential of
ECG and GSR for emotion elicitation in young adults. A limitation of this work is that it incorporates only ECG and GSR signals, as compared with EEG-based emotion analysis [17]. The
baseline results are provided for ECG signals, enabling the
researchers to explore the potential of GSR and the multimodal
fusion of both signals for comparison and improvement of
emotion elicitation in the domain of affective computing.
VII. ETHICS STATEMENT
Data acquisition was performed in a controlled environment
with subjects’ consent, where they agreed to volunteer for this
data collection to support research. The study did not involve any patients; the data was collected from students who volunteered for data collection. All subjects signed a consent form. The NUST ethical review committee initially approved the study with protocol number 03-2021-02/20, and the collection of data was performed in a dedicated controlled environment.
The data provided online is wholly anonymized and contains
no information revealing the subject’s identity.
VIII. ACKNOWLEDGEMENTS
The data acquisition process and the relevant research were carried out at the Biomedical Image/Signal Analysis (BIOMISA) Research Lab, NUST, Islamabad, Pakistan.
REFERENCES
[1] Mojtaba Khomami Abadi et al. “DECAF: MEG-based
multimodal database for decoding affective physiolog-
ical responses”. In: IEEE Transactions on Affective
Computing 6.3 (2015), pp. 209–222.
[2] Hussain Ahmad et al. “Futuristic Short Range Optical
Communication: A Survey”. In: IEEE International
Conference on Information Science and Communication
Technology (ICISCT) (2020).
[3] Jozenia Torres Colorado and Jane Eberle. “Student
demographics and success in online learning environ-
ments.” In: (2012).
[4] Juan Abdon Miranda Correa et al. “Amigos: A dataset
for affect, personality and mood research on individ-
uals and groups”. In: IEEE Transactions on Affective
Computing (2018).
[5] Ailbhe Cullen et al. “Creaky voice and the classification
of affect”. In: Proceedings of WASSS, Grenoble, France
(2013).
[6] Kirsten A Dalrymple, Jesse Gomez, and Brad Duchaine.
“The Dartmouth Database of Children’s Faces: Acqui-
sition and validation of a new face stimulus set”. In:
PloS one 8.11 (2013), e79131.
[7] Muhammad Najam Dar et al. “Cnn and lstm-based emo-
tion charting using physiological signals”. In: Sensors
20.16 (2020), p. 4551.
[8] Paul Ekman and Wallace V Friesen. Unmasking the
face: A guide to recognizing emotions from facial clues.
Vol. 10. Ishk, 2003.
[9] Khush Naseeb Fatima et al. “Fully automated diagnosis
of papilledema through robust extraction of vascu-
lar patterns and ocular pathology from fundus pho-
tographs”. In: Biomedical optics express, Vol. 8, Issue
2, pp. 1005-1024 (2017).
[10] Bilal Hassan et al. “Automated retinal edema detection
from fundus and optical coherence tomography scans”.
In: IEEE 5th International Conference on Control,
Automation and Robotics (ICCAR), pp. 325-330 (2019).
[11] Bilal Hassan et al. “Autonomous Framework for Person
Identification by Analyzing Vocal Sounds and Speech
Patterns”. In: IEEE 5th International Conference on
Control, Automation and Robotics (ICCAR), pp. 325-
330 (2019).
[12] Bilal Hassan et al. “Deep learning based joint segmen-
tation and characterization of multi-class retinal fluid
lesions on OCT scans for clinical use in anti-VEGF
therapy”. In: Computers in Biology and Medicine, Au-
gust (2021).
[13] Bilal Hassan et al. “Structure tensor based automated
detection of macular edema and central serous retinopa-
thy using optical coherence tomography images”. In:
Journal of Optical Society of America A, Vol. 33, Issue
4, pp. 455-463 (2016).
[14] Taimur Hassan, Saleem Aslam, and Ju Wook Jang.
“Fully automated multi-resolution channels and mul-
tithreaded spectrum allocation protocol for IoT based
sensor nets”. In: IEEE Access, Vol. 6, pp. 22545-22556
(2018).
[15] Taimur Hassan et al. “RAG-FW: A hybrid convolutional
framework for the automated extraction of retinal le-
sions and lesion-influenced grading of human retinal
pathology”. In: IEEE Journal of Biomedical and Health
Informatics. January 2021.
[16] Taimur Hassan et al. “Structure tensor graph searches
based fully automated grading and 3D profiling of mac-
ulopathy from retinal OCT images”. In: IEEE Access,
Vol. 6, pp. 44644-44658 (2018).
[17] Zhongyang He et al. “Cross-Day EEG-Based Emotion
Recognition Using Transfer Component Analysis”. In:
Electronics 11.4 (2022), p. 651.
[18] Stamos Katsigiannis and Naeem Ramzan. “DREAMER:
A database for emotion recognition through EEG and
ECG signals from wireless low-cost off-the-shelf de-
vices”. In: IEEE journal of biomedical and health
informatics 22.1 (2017), pp. 98–107.
[19] Samina Khalid et al. “Automated segmentation and
quantification of drusen in fundus and optical coher-
ence tomography images for detection of ARMD”. In:
Journal of Digital Imaging, Vol. 31, Issue 4, pp. 464-
476 (2018).
[20] Samina Khalid et al. “Fully automated robust system
to detect retinal edema, central serous chorioretinopa-
thy, and age related macular degeneration from optical
coherence tomography images”. In: BioMed research
international, pp. 217-220 (2017).
[21] Sander Koelstra et al. “Deap: A database for emotion
analysis; using physiological signals”. In: IEEE trans-
actions on affective computing 3.1 (2011), pp. 18–31.
[22] Juan Abdon Miranda-Correa et al. “Amigos: A dataset
for affect, personality and mood research on individ-
uals and groups”. In: IEEE Transactions on Affective
Computing 12.2 (2018), pp. 479–493.
[23] Behnaz Nojavanasghari et al. “Emoreact: a multimodal
approach and dataset for recognizing emotional re-
sponses in children”. In: Proceedings of the 18th acm
international conference on multimodal interaction.
2016, pp. 137–144.
[24] Robert Plutchik. “The nature of emotions: Human emo-
tions have deep evolutionary roots, a fact that may
explain their complexity and provide tools for clinical
practice”. In: American scientist 89.4 (2001), pp. 344–
350.
[25] James A Russell. “A circumplex model of affect.”
In: Journal of personality and social psychology 39.6
(1980), p. 1161.
[26] Muhammad Shafay et al. “Deep Fusion Driven Se-
mantic Segmentation for the Automatic Recognition of
Concealed Contraband Items”. In: SoCPaR, pp. 550-559
(2020).
[27] Muhammad Shafay et al. “Temporal Fusion Based
Multi-scale Semantic Segmentation for Detecting Con-
cealed Baggage Threats”. In: IEEE International Con-
ference on Systems, Man, and Cybernetics (SMC), Au-
gust (2021).
[28] Lin Shu et al. “A review of emotion recognition using
physiological signals”. In: Sensors 18.7 (2018), p. 2074.
[29] Mohammad Soleymani et al. “A multimodal database
for affect recognition and implicit tagging”. In: IEEE
transactions on affective computing 3.1 (2011), pp. 42–
55.
[30] Wei-Long Zheng et al. “Emotionmeter: A multimodal
framework for recognizing human emotions”. In: IEEE
transactions on cybernetics 49.3 (2018), pp. 1110–1122.