Identification of Human Activity and Associated
Context Using Smartphone Inertial Sensors in
Unrestricted Environment
Sadam Hussain Noorani
Department of Computer Engineering
University of Engg. and Technology
Taxila, Pakistan
cpe.sadam@gmail.com
Aasim Raheel
Department of Computer Engineering
University of Engg. and Technology
Taxila, Pakistan
asim.raheel@uettaxila.edu.pk
Sheharyar Khan
Department of Computer Engineering
University of Engg. and Technology
Taxila, Pakistan
sheharyar.khan@uettaxila.edu.pk
Aamir Arsalan
Department of Software Engineering
Fatima Jinnah Women University
Rawalpindi, Pakistan
aamir.arsalan@fjwu.edu.pk
Muhammad Ehatisham-ul-Haq
Department of Creative Technologies
Air University
Islamabad, Pakistan
ehtisham@mail.au.edu.pk
Abstract—Smartphones are becoming increasingly ubiquitous
owing to the needs and demands of the modern era. The world
is transforming into a global village through the proliferation of
smart devices. Nowadays, smartphones are enriched with inertial
sensors which can be used to recognize physical human activities in the wild.
Human activity recognition (HAR) lies at the core of many appli-
cations like health monitoring, fall detection, road safety, personal
assistance, and behavior-based context awareness. Context-based
HAR is a new dimension that provides fine-grained information
about the action being performed and leads us towards the
automated and intelligent system design, which is useful to
furnish the smart solution to real-life problems. In this paper,
smartphone sensors are utilized to propose a framework for
human activity and context recognition. This framework per-
forms activity-dependent context recognition on the ExtraSensory
dataset using random forest, decision tree, and k-nearest neighbor
classifiers, and achieves an accuracy of 93.10%, demonstrating
the effectiveness of the proposed framework.
Index Terms—Machine Learning, Ubiquitous Computing, Intelligent Systems, Context-Aware, Activity Recognition.
I. INTRODUCTION
Activity recognition is the ability to identify and recognize
the actions performed by an individual [1]. Human activity
recognition aims to recognize activities from a series of
observations or the actions of subjects and the environmental
conditions [2, 3]. There are primarily two mechanisms to identify
human activities: vision-based and wearable sensor-based
methods. Vision-based activity recognition from
video sequences or images is challenging due to problems like
background clutter, partial occlusion, variations in scale, angle,
lighting, and appearance. In contrast, the non-visual approach
to activity recognition relies mostly on various wearable
sensors [4]. The increasing sophistication of pervasive
gadgets (particularly smartphones) and their sensing and
networking capabilities have made it possible to continuously
monitor human actions and their behavioral environment.
The awareness and understanding of behavioral context enable
users to adapt proactively and intelligently to the
physical settings or situations around them [5]. As a result,
the growth of smartphones has enabled software developers
to construct context-aware applications capable of identifying
human-centric or community-based novel social and cognitive
activities in any setting and place [6]. Working under in-the-wild
conditions is challenging for researchers because capturing
data in an unconstrained/uncontrolled environment without
disrupting daily life is difficult. The true essence of real life is
suppressed when a person is monitored by a camera,
whereas one can easily perform daily living activities while
interacting with a smartphone [7]. Moreover, the processing
capability of modern smartphones is comparable to that of
computers and can execute the majority of jobs previously
performed on PCs, with the additional advantage of portability
[8]. Human activity recognition is the foundation of a number
of high-impact applications, such as health, behavior-based
context-awareness, automation, and self-managing systems. In
addition to this, context-aware activity recognition is valuable
for third-party applications, such as targeted advertising, re-
search platforms, and corporate management [9].
In the literature, activity and context have been recognized
independently; in this work, we adopt a two-step strategy
to achieve activity-dependent context recognition. This
work proposes a novel framework capable of recognizing
human physical activities and their associated contexts utiliz-
ing smartphone sensors in-the-wild settings. As illustrated in
Fig. 1., the proposed scheme distinguishes among six human
activities, including lying down, standing, sitting, walking,
running, and bicycling, and their related behavioral contexts
in the second phase of the experiment.

979-8-3503-3239-1/23/$31.00 © 2023 IEEE

Fig. 1. Primary human activities of daily living along with associated
contexts based on the "ExtraSensory" dataset

The objective of this
study is to develop a cost-effective system for identifying
and categorizing human activities and contexts associated with
them. In this aspect, the proposed scheme offers the following
notable contributions:
• Recognizing six different human activities in an in-the-wild
environment
• Behavioral context recognition based on physical activities
• A comparison of the proposed scheme with available
state-of-the-art human activity and context recognition
techniques
The rest of the paper is structured as follows: Section II
details the literature related to human activity and context
recognition. The stages involved in the proposed framework
for recognizing human primary activities and contexts are
elaborated in Section III. In Section IV, the experimental find-
ings are presented and analyzed. In Section V, a comparative
analysis of the proposed scheme with the existing schemes is
presented followed by a conclusion in Section VI.
II. RELATED WORK
The human ability to recognize another person’s activities is
one of the main subjects of study in the scientific areas of com-
puter vision and machine learning [10]. Context-aware human
activity recognition is a new level in HAR research that leads
to automated and intelligent system development. S. Tahir et
al. developed a human object interaction identification scheme
using wrist-mounted sensors. They identified 3objects and 14
interactions with 86.90% accuracy using the random forest
(RF) classifier [11]. Yuqing et al. developed an acceleration-
based human activity recognition approach employing a con-
volutions neural network (CNN) with a modified convolution
kernel achieving an average accuracy of 93.8% [12]. Andrei
et al. used smartphone sensor data to accurately recognize
six different human activities which include walking, running,
sitting, standing, climbing, and descending stairs attaining an
accuracy of 94% [13].
Yusra et al. studied context-aware human activity recognition
using behavioral contexts and physical activities, achieving
an accuracy of 84% using the RF classifier. Context-aware
and context-independent HAR experiments demonstrated that
behavioral context improved the primary activity recognition
accuracy [14]. Ehatisham-ul-Haq et al. developed a smart-
phone accelerometer-based model to recognize four daily life
activities (lying down, standing, sitting, and walking) and their
associated contexts with an accuracy of 97.7% using Random
Forest classifier [15]. Khowaja et al. studied context-aware
personalized human activity recognition (CAPHAR) with as-
sociative learning in intelligent environments. CAPHAR com-
putes class association rules between low-level actions and
contextual information to recognize high-level activities and
achieved 23.7% better accuracy for the unseen subjects [16].
Otebolaku et al. identified context-aware complex human
activities using a hybrid deep learning-based model. They
explored ambient parameters, including lighting and noise level,
using a CNN and Long Short-Term Memory (LSTM) network to
complement sensory data from traditional sensors. Their hybrid
deep learning models outperformed context-free activity
recognition models with an accuracy of 76.80% [17]. Ying et al.
presented an ensemble learning algorithm (ELA) for smartphone
sensor-based activity recognition. The proposed ELA uses a gated
recurrent unit, a CNN, and a deep neural network (DNN). Input
samples to the DNN consist of a feature vector with 561
time-domain and frequency-domain parameters. A fully-connected
DNN fused the output of the three activity classification models.
The ELA outperformed existing systems, achieving 96.70%
accuracy [18].
III. PROPOSED METHODOLOGY
Fig. 2 shows a two-level framework for the recognition
of human activities and their associated contexts. The first
level recognizes six primary human activities, whereas the
second level recognizes the behavioral contexts related to each
primary activity. The raw data from inertial sensors (gyroscope
and accelerometer) are pre-processed and then subjected to
Fig. 2. Proposed Methodology for Human Activity and Context Recognition (HACR)
feature extraction and selection stage followed by context-
aware activity recognition using three different machine learn-
ing algorithms which include random forest (RF), decision tree
(DT), and k-nearest neighbors (KNN). The details about each
step involved in the proposed methodology are presented in
the sub-sections below.
A. Data Acquisition
A publicly available dataset termed "ExtraSensory" is utilized
to conduct experiments for the proposed method. This
dataset was gathered in out-of-lab settings from 60 subjects
while they performed their daily life activities. The ExtraSensory
dataset consists of data from a wide range of sensors; however,
in this study, only the data from the accelerometer and gyroscope
sensors of the smartphone, sampled at 40 Hz, are used. In
addition to primary activity labels, the dataset also provides
secondary context information corresponding to each of the
primary activities. In this study, we have used six primary
activities (i.e., lying down, sitting, standing, walking, running,
and bicycling), as shown in Table I, where each activity instance
consists of 20 seconds of data. All the primary activities contain
secondary context labels except running and bicycling; therefore,
these two activities are used only for primary activity recognition
and are excluded from secondary context recognition.
B. Pre-processing
Data pre-processing is the transformation of raw data into
a format that can be readily analyzed. Raw accelerometer and
gyroscope data may contain a variety of noise sources, so
motion signals are pre-processed before classification. Before
applying machine learning or data mining techniques, the input
data need to be cleaned: different imputation techniques are
employed to remove or replace missing data. Moreover, a
3rd-order moving average smoothing filter is applied to
eliminate noise from the raw inertial sensor data, which are
then segmented using a windowing method.
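The pre-processing steps above can be sketched as follows; the mean-imputation strategy, window length, and step size are illustrative assumptions, not parameters stated in the paper:

```python
import numpy as np

def impute(signal):
    """Replace missing samples (NaN) with the mean of observed values."""
    signal = np.asarray(signal, dtype=float)
    signal[np.isnan(signal)] = np.nanmean(signal)
    return signal

def smooth(signal, order=3):
    """3rd-order moving average filter to suppress sensor noise."""
    kernel = np.ones(order) / order
    return np.convolve(signal, kernel, mode="same")

def segment(signal, window_size, step):
    """Split a 1-D signal into fixed-length windows."""
    return np.array([signal[i:i + window_size]
                     for i in range(0, len(signal) - window_size + 1, step)])

# Example: 20 s of one accelerometer axis at 40 Hz = 800 samples
raw = np.random.randn(800)
raw[10] = np.nan                                   # a missing sample
clean = smooth(impute(raw))
windows = segment(clean, window_size=80, step=80)  # 2-s non-overlapping windows
print(windows.shape)                               # (10, 80)
```

The same pipeline would be applied independently to each axis of the accelerometer and gyroscope.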
C. Features Extraction and Selection
After pre-processing the inertial sensor data, feature
extraction is performed: 20 different time-domain features, as
utilized in [19], are extracted, yielding 60 feature values for
each sensor (accelerometer and gyroscope) and resulting in a
1x120-dimensional feature vector for each activity of a subject.
After feature extraction, the "InfoGain" (information gain)
feature selection method is applied to choose the best subset
of features.
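A minimal sketch of this feature pipeline, assuming a handful of representative time-domain features per axis (the paper extracts 20) and using scikit-learn's mutual information score as a stand-in for the InfoGain ranking:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# A few illustrative time-domain features per axis (the paper uses 20).
def time_domain_features(axis):
    return [axis.mean(), axis.std(), axis.min(), axis.max(),
            np.sqrt(np.mean(axis ** 2))]          # RMS

def feature_vector(acc_xyz, gyr_xyz):
    """Concatenate per-axis features of both sensors into one row vector."""
    feats = []
    for sensor in (acc_xyz, gyr_xyz):             # each of shape (samples, 3)
        for axis in sensor.T:
            feats.extend(time_domain_features(axis))
    return np.array(feats)

rng = np.random.default_rng(0)
X = np.stack([feature_vector(rng.normal(size=(800, 3)),
                             rng.normal(size=(800, 3))) for _ in range(40)])
y = rng.integers(0, 6, size=40)                   # six activity labels

# Information-gain-style selection via mutual information
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
X_sel = selector.transform(X)
print(X_sel.shape)                                # (40, 10)
```

With the paper's full set of 20 features per axis, the same construction yields the 1x120-dimensional vector described above.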
D. Classification
Following feature extraction and selection, the selected
feature subset is subjected to human activity and context
recognition. Three different supervised machine learning
algorithms, namely RF, DT, and KNN, are employed in this study.
Random forest is an ensemble technique that uses a decision
tree as its base classifier and can be used for both classification
and regression; the number of decision trees used in our study
is 100. KNN uses proximity to assign a class to a test data
point; the number of neighbors used in the current experiment
is k = 5. A decision tree is a supervised machine learning
algorithm based on the entropy of the data: the feature selected
as the root node of the tree is the one with the minimum
entropy value.
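With scikit-learn, the three classifiers with the stated hyperparameters (100 trees for RF, k = 5 for KNN, entropy-based splits for DT) might look like this; the synthetic data is only a stand-in for the selected feature subset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected feature subset (6 activity classes)
X, y = make_classification(n_samples=600, n_features=20, n_classes=6,
                           n_informative=10, random_state=0)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),   # 100 trees
    "DT": DecisionTreeClassifier(criterion="entropy", random_state=0),  # entropy splits
    "KNN": KNeighborsClassifier(n_neighbors=5),                       # k = 5
}
for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, round(clf.score(X, y), 3))
```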
IV. EXPERIMENTAL RESULTS
The extracted feature vector is labeled in two different man-
ners to perform activity and context recognition experiments.
For activity recognition, the feature vector is assigned six
different labels corresponding to each primary activity. For the
second phase, unique contexts corresponding to each primary
activity are labeled. For context recognition, only contexts
related to four primary activities which include walking,
sitting, standing, and lying down are used because context
information related to running and bicycling is not available.
Three different classifiers are used to perform the activity
and context recognition. 10-fold cross-validation is used to
TABLE I
LIST OF PRIMARY HUMAN ACTIVITIES ALONG WITH RELATED BEHAVIORAL CONTEXT INFORMATION
S.No Human Physical Activities Associated Human Contexts
1 Lying Down Relaxing, Surfing on Internet, Sleeping, Watching TV
2 Sitting Computer Work, Studying, Surfing on Internet, At Home, Phone in Pocket, I Am Driving
3 Standing Talking, With Friends, At Home, Phone in Pocket, Phone in Bag, Phone in Hand
4 Walking Talking, With Friends, At Home, Outside, Phone in Pocket, Phone in Bag, Phone in Hand
TABLE II
HUMAN ACTIVITY RECOGNITION RESULTS FOR RF, DT, AND KNN
CLASSIFIERS IN TERMS OF ACCURACY, PRECISION, RECALL, AND
F-MEASURE
Classifier Accuracy Precision Recall F-Measure
RF 0.814 0.818 0.814 0.813
KNN 0.789 0.787 0.789 0.786
J48 0.753 0.753 0.753 0.753
Fig. 3. Confusion Matrix for Primary Activities
evaluate the performance of the proposed scheme in which
the instances are divided into 10 equal parts and nine parts
are used for training and one part is used for testing purposes.
This process is repeated 10 times and an average accuracy is
reported.
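The 10-fold cross-validation protocol described above can be sketched as follows, again with synthetic stand-in data in place of the labeled feature vectors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the labeled feature vectors (6 activity classes)
X, y = make_classification(n_samples=600, n_features=20, n_classes=6,
                           n_informative=10, random_state=0)

# 10-fold CV: nine folds train, one fold tests, repeated 10 times
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                         X, y, cv=10)
print(len(scores), round(scores.mean(), 3))  # average accuracy over the 10 folds
```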
Activity recognition results are presented in Table II. The
classifiers are evaluated in terms of classification accuracy,
precision, recall, and F-measure. Based on the average values
of these indicators, it can be inferred that the RF classifier
outperforms the other classification algorithms achieving an
average accuracy of 81.4%. Moreover, the RF classifier re-
sulted in a precision, recall, and F-measure value of 0.818,
0.814, and 0.813, respectively. Similarly, the confusion matrix
presented in Fig. 3 illustrates that out of 115252 instances,
93781 are correctly classified and only 21471 are misclassified.
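Given a confusion matrix such as the one in Fig. 3, overall accuracy and the per-class precision and recall reported in the tables can be recovered directly from its counts; a minimal sketch with a small made-up 3-class matrix (not the actual figure data):

```python
import numpy as np

# Illustrative 3-class confusion matrix (rows: true, cols: predicted);
# the counts are invented for demonstration, not taken from Fig. 3.
cm = np.array([[50,  5,  5],
               [ 4, 40,  6],
               [ 6,  4, 30]])

accuracy = np.trace(cm) / cm.sum()            # correct / total
precision = np.diag(cm) / cm.sum(axis=0)      # per predicted class
recall = np.diag(cm) / cm.sum(axis=1)         # per true class
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3))                     # 0.8
```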
The summary of results for context recognition correspond-
ing to each primary activity is summarized in Table III. It can
be examined from the table that context recognition accuracy
for each of the primary activities i.e., lying down, sitting,
walking, and standing for RF classifier is 93.1%, 80.5%,
TABLE III
CONTEXT RECOGNITION RESULTS FOR THE RF, DT, AND KNN
CLASSIFIERS IN TERMS OF ACCURACY, PRECISION, RECALL, AND
F-MEASURE
Contexts Classifiers Accuracy Precision Recall F-Measure
Lying Down
RF 0.931 0.930 0.934 0.938
J48 0.928 0.938 0.928 0.929
KNN 0.926 0.925 0.926 0.925
Sitting
RF 0.805 0.810 0.805 0.805
J48 0.721 0.721 0.722 0.721
KNN 0.705 0.706 0.706 0.706
Walking
RF 0.688 0.698 0.677 0.677
KNN 0.534 0.544 0.543 0.541
J48 0.518 0.518 0.518 0.518
Standing
RF 0.612 0.609 0.611 0.600
J48 0.535 0.534 0.536 0.535
KNN 0.478 0.463 0.478 0.465
68.8%, and 61.2%, respectively. Similarly, the RF classifier
has the highest value for precision, recall, and F-measure as
compared to other classifiers used for the context recognition
corresponding to each primary activity. These results are
supported by the confusion matrices presented in Fig. 4. It
can be observed from Fig. 4(a) that, for the lying down activity,
out of 61403 instances, 57210 are correctly classified and only
4193 are misclassified. From Fig. 4(b), it can be observed that,
out of 25974 instances, 17875 are correctly classified and 8099
are misclassified. Moreover, Fig. 4(c) shows the result for
context recognition with the standing activity, where 16796
instances are correctly classified and 10681 are misclassified
out of a total of 27477 instances. Similarly, Fig. 4(d) shows
that, for the sitting activity, 68257 context instances are
correctly classified and 16509 are misclassified out of 84766
instances.
V. DISCUSSION
The studies related to human activity and context recognition
in the out-of-lab environment are very few, with a very
limited number of contexts. Table IV presents the comparative
Fig. 4. Confusion matrices for context recognition results based on primary activities of (a) Lying Down (b) Walking (c) Standing (d) Sitting, where the
contexts are denoted by the labels: Phone in bag (BAG), With Friends (FRND), Phone in Hand (HAND), At Home (HOME), Phone in Pocket (POKT),
Talking (TALK), Computer Work (CWRK), I am driving (DRIV), Surfing on Internet (INTR), Studying (STDY), Relaxing (RELX), Sleeping (SLEP), and
Outside (OSIDE).
TABLE IV
COMPARISON OF A FEW EXISTING STUDIES FOR HUMAN ACTIVITY AND CONTEXT RECOGNITION USING SMART SENSORS

Ref | Year | Recognized Activities | Subjects | Sensors Used | Classifier | Accuracy
[14] | 2020 | In-the-wild [6 physical activities with 10 corresponding human behavioral contexts] | 60 | Smartphone's Accelerometer | RF | 84.00%
[15] | 2020 | In-the-wild [4 physical activities with 13 corresponding human behavioral contexts] | 60 | Smartphone's Accelerometer | RF | 97.77%
[19] | 2022 | In-the-wild [6 physical activities with 10 corresponding human behavioral contexts and 4 phone positions] | 60 | Smartphone and smartwatch Accelerometer | Boosted Decision Tree | 96.70%
Proposed | 2023 | In-the-wild [6 physical activities with 23 corresponding human behavioral contexts] | 60 | Smartphone's Accelerometer and Gyroscope | RF, DT, KNN | 93.10%
analysis of the proposed framework with the existing state-
of-the-art schemes available in the literature. In this
comparison, we considered only those studies which have used
the ExtraSensory dataset. It can be observed from the table
that the number of subjects for all the studies, including our
proposed scheme, is 60. However, the studies proposed in
[14, 15] used only smartphone accelerometer data, whereas
the study conducted in [19] used data from the accelerometer
sensors of both a smartphone and a smartwatch. In our
proposed scheme, we used accelerometer and gyroscope data
from smartphone inertial sensors for activity and context
recognition. In terms of the number of recognized contexts,
our proposed scheme recognizes a higher number, i.e., 23
contexts associated with 6 primary activities, with an accuracy
of 93.10% that is comparable to all earlier studies. The study
proposed in [14] recognizes 10 contexts associated with 6
primary activities with an accuracy of 84%, the study in [15]
recognizes 13 contexts associated with 4 primary activities with
an accuracy of 97.77%, and the study presented in [19]
recognizes 10 contexts associated with 6 primary activities with
an accuracy of 96.70%. Thus, our proposed scheme recognizes
13 more contexts than the studies in [14, 19] while achieving
comparable accuracy.
VI. CONCLUSION
This paper presented a two-stage human activity and context
recognition framework using data from smartphone inertial
sensors. 20-second segments of data from the inertial sensors
(accelerometer and gyroscope) are used to extract time-domain
features, which are passed through a feature selection process
before finally performing activity and context recognition using
RF, DT, and KNN classifiers. The RF classifier produced the
best results in terms of classification accuracy, precision,
recall, and F-measure as compared to the DT and KNN
classifiers. This research can be extended to include more
behavioral contexts, which could further be used for user
identification applications. Multiple sensory modalities can be
added for this purpose to increase the system's recognition
performance.
REFERENCES
[1] T. Brezmes, J.-L. Gorricho, and J. Cotrina, “Activity
recognition from accelerometer data on a mobile phone,”
in Distributed Computing, Artificial Intelligence, Bioin-
formatics, Soft Computing, and Ambient Assisted Living:
10th International Work-Conference on Artificial Neural
Networks, Spain, pp. 796–799, Springer, 2009.
[2] A. Jordao, L. A. B. Torres, and W. R. Schwartz, “Novel
approaches to human activity recognition based on ac-
celerometer data,” Signal, Image and Video Processing,
vol. 12, no. 7, pp. 1387–1394, 2018.
[3] M. Usman, Z. Noor, I. Farooq, A. Arsalan,
M. Ehatisham-ul Haq, and A. Raheel, “A smart
chair design for recognizing human-object interactions
using pressure sensors,” in 2020 IEEE 23rd International
Multitopic Conference (INMIC), pp. 1–6, IEEE, 2020.
[4] L. M. Dang, K. Min, H. Wang, M. J. Piran, C. H. Lee,
and H. Moon, “Sensor-based and vision-based human
activity recognition: A comprehensive survey,” Pattern
Recognition, vol. 108, p. 107561, 2020.
[5] Y. Vaizman, K. Ellis, and G. Lanckriet, “Recognizing
detailed human context in the wild from smartphones
and smartwatches,” IEEE pervasive computing, vol. 16,
no. 4, pp. 62–74, 2017.
[6] Z. Gao, D. Liu, K. Huang, and Y. Huang, “Context-
aware human activity and smartphone position-mining
with motion sensors,” Remote Sensing, vol. 11, no. 21,
p. 2531, 2019.
[7] F. Niemann, S. Lüdtke, C. Bartelt, and M. Ten Hompel,
“Context-aware human activity recognition in industrial
processes,” Sensors, vol. 22, no. 1, p. 134, 2022.
[8] M. Vrigkas, C. Nikou, and I. A. Kakadiaris, “A review
of human activity recognition methods,” Frontiers in
Robotics and AI, vol. 2, p. 28, 2015.
[9] N. Gupta, S. K. Gupta, R. K. Pathak, V. Jain, P. Rashidi,
and J. S. Suri, “Human activity recognition in artificial
intelligence framework: a narrative review,” Artificial
intelligence review, pp. 1–54, 2022.
[10] P. K. Shukla, A. Vijayvargiya, R. Kumar, et al., “Human
activity recognition using accelerometer and gyroscope
data from smartphones,” in 2020 International Confer-
ence on Emerging Trends in Communication, Control and
Computing (ICONC3), pp. 1–6, IEEE, 2020.
[11] S. Tahir, A. Raheel, M. Ehatisham-ul Haq, and A. Ar-
salan, “Object based human-object interaction (hoi)
recognition using wrist-mounted sensors,” in 2020
IEEE 23rd International Multitopic Conference (INMIC),
pp. 1–6, IEEE, 2020.
[12] Y. Chen and Y. Xue, “A deep learning approach to human
activity recognition based on single accelerometer,” in
2015 IEEE international conference on systems, man,
and cybernetics, pp. 1488–1492, IEEE, 2015.
[13] R.-A. Voicu, C. Dobre, L. Bajenaru, and R.-I. Ciobanu,
“Human physical activity recognition using smartphone
sensors,” Sensors, vol. 19, no. 3, p. 458, 2019.
[14] Y. Asim, M. A. Azam, M. Ehatisham-ul Haq, U. Naeem,
and A. Khalid, “Context-aware human activity recogni-
tion (cahar) in-the-wild using smartphone accelerometer,”
IEEE Sensors Journal, vol. 20, no. 8, pp. 4361–4371,
2020.
[15] M. Ehatisham-ul Haq, M. A. Azam, Y. Asim, Y. Amin,
U. Naeem, and A. Khalid, “Using smartphone accelerom-
eter for human physical activity and context recogni-
tion in-the-wild,” Procedia Computer Science, vol. 177,
pp. 24–31, 2020.
[16] S. A. Khowaja, B. N. Yahya, and S.-L. Lee, “Caphar:
context-aware personalized human activity recogni-
tion using associative learning in smart environments,”
Human-centric Computing and Information Sciences,
vol. 10, no. 1, pp. 1–35, 2020.
[17] A. Omolaja, A. Otebolaku, and A. Alfoudi, “Context-
aware complex human activity recognition using hybrid
deep learning models,” Applied Sciences, vol. 12, no. 18,
p. 9305, 2022.
[18] T.-H. Tan, J.-Y. Wu, S.-H. Liu, and M. Gochoo, “Human
activity recognition using an ensemble learning algorithm
with smartphone sensor data,” Electronics, vol. 11, no. 3,
p. 322, 2022.
[19] M. Ehatisham-ul Haq, F. Murtaza, M. A. Azam, and
Y. Amin, “Daily living activity recognition in-the-wild:
Modeling and inferring activity-aware human contexts,”
Electronics, vol. 11, no. 2, p. 226, 2022.