Figure 5 - uploaded by Jang-Hee Yoo
Color value I_i of the i-th pixel for the statistical model in RGB color space [44].


Source publication
Article
Full-text available
This review article extensively surveys current progress in video-based human activity recognition. Three aspects of human activity recognition are addressed, from low-level to high-level representation: core technology, human activity recognition systems, and applications. In the core technology, three critical processing s...

Contexts in source publication

Context 1
... [44], each pixel is modeled by four parameters: the brightness distortion, the chromaticity distortion, the variation of the brightness distortion, and the variation of the chromaticity distortion. As shown in Figure 5 [44], E_i is the expected color value (say, as recorded in the background image) and I_i is the observed color value of the i-th pixel. The brightness distortion is defined via the shortest distance between the i-th pixel I_i and the line OE_i. ...
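The geometry described above can be sketched numerically: a brightness scale factor moves the expected color E_i along the line OE_i toward the observed color I_i, and the chromaticity distortion is the orthogonal distance that remains. A minimal numpy sketch of that projection, ignoring the per-channel variance normalization used in the full model of [44]:

```python
import numpy as np

def distortions(I, E):
    """Brightness scale alpha and chromaticity distortion CD for one pixel:
    alpha*E_i is the point on the line OE_i closest to the observed color
    I_i; CD is the remaining orthogonal distance."""
    I = np.asarray(I, dtype=float)
    E = np.asarray(E, dtype=float)
    alpha = I.dot(E) / E.dot(E)          # projection of I_i onto line OE_i
    cd = np.linalg.norm(I - alpha * E)   # orthogonal (chromaticity) distance
    return alpha, cd

# A pixel in shadow keeps its chromaticity but loses brightness:
alpha, cd = distortions([50, 100, 125], [100, 200, 250])
print(alpha, cd)  # alpha = 0.5, cd = 0.0
```

A foreground pixel of a different color would instead yield a large `cd`, which is what the background-subtraction decision exploits.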
Context 2
... A DBN [78] is a Bayesian network with the same structure unrolled along the time axis, as shown in Figure 15 [7]. An HMM has been proven to be a special type of DBN with a fixed inference-graph structure. ...
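The unrolling idea can be made concrete with a toy forward pass: the same transition and emission matrices are reused at every time slice, which is exactly the repeated-structure property a DBN encodes (all numbers below are illustrative, not taken from [78] or [7]):

```python
import numpy as np

# A 2-state HMM viewed as a DBN unrolled in time: one slice per step,
# identical parameters in every slice.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])   # P(state_t | state_{t-1})
B  = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(observation | state)
pi = np.array([0.5, 0.5])                 # initial state distribution

def forward(obs):
    """Likelihood of an observation sequence via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # advance one DBN slice
    return alpha.sum()

print(forward([0, 0, 1]))
```

Because the slice parameters never change, inference cost grows only linearly with sequence length.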

Similar publications

Article
Full-text available
Human Activity Recognition (HAR) has emerged as a vital measure of quality of life, holding significant implications for human health. The need for effective mobility monitoring in diverse settings, both indoors and outdoors, necessitates the development of scientific and technological tools. To broaden accessibility, wearable devices like smartpho...
Article
Full-text available
Human Activity Recognition is the act of recognizing activities performed by humans in real-time. This can be done using video data or more advanced forms of data like inertial data, depth maps, or human skeletal joint trajectories. In this work, we perform human action recognition through skeletal joint tracking of the human body using a deep recurren...
Article
Full-text available
Human activity recognition (HAR) is an emerging methodology essential for smart homes with practical applications such as personal lifecare and healthcare services for the elderly and disabled people. In this work, we present a novel HAR methodology utilizing the recognized body parts of human depth silhouettes and Hidden Markov Models (HMMs). We f...
Article
Full-text available
In recent years, the use of accelerometers embedded in smartphones for Human Activity Recognition (HAR) has been well considered. Nevertheless, the role of the sensor placement is yet to be explored and needs to be further investigated. In this study, we investigated the role of sensor placements for recognizing various types of physical activities...
Article
Full-text available
The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human–computer interaction. This paper tackles a well-known gap in the field, which is the lack of testing in the applicability and reliability of XAI evaluation metrics in the skeleton-based...

Citations

... HAR refers to the systematic method of recognizing and categorizing human behaviours by leveraging various data types [2]. The technology discussed in this context is versatile, with applications across various areas such as smart homes, medical diagnostics, and video surveillance [3,4]. The growing popularity of collaborative robotic systems (CRS) in diverse contexts has necessitated the advancement of precise and automated methodologies for HAR [5]. ...
Article
Full-text available
Human activity recognition (HAR) has gained significant attention in computer vision and human-computer interaction. This paper investigates the difficulties encountered in HAR, namely differentiating between various activities by extracting spatial and temporal features from sequential data. Traditional machine learning approaches necessitate manual feature extraction, hindering their effectiveness. For temporal features, RNNs have been widely used for HAR; however, they struggle to process long sequences, leading to information bottlenecks. This work introduces a framework that effectively integrates spatial and temporal features by utilizing a series of layers that incorporate a self-attention mechanism to overcome these problems. Here, spatial characteristics are derived using 1D convolutions coupled with pooling layers to capture essential spatial information. After that, GRUs are used to effectively represent the temporal dynamics inherent in sequential data. Furthermore, an attention mechanism dynamically selects the significant segments within the sequence, thereby improving the model's comprehension of context and enhancing the efficacy of deep neural networks (DNNs) for HAR. Three optimizers, namely Adam, SGD, and RMSprop, were employed to train the model, each tested with three distinct learning rates: 0.1, 0.001, and 0.0001. Experiments on the UCI-HAR dataset show that the model performs well, reaching 97% accuracy when using the Adam optimizer with a learning rate of 0.001.
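The attention step this abstract describes, dynamically weighting the time steps of the recurrent features, can be sketched as single-query attention pooling. This is a simplified stand-in, not the authors' exact architecture: here `H` plays the role of the GRU outputs and `w` a learned scoring vector.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    """Attention pooling over a sequence of hidden states H (T x D):
    each time step gets a scalar relevance score, softmax turns the
    scores into weights, and the output is the weighted sum."""
    scores = H @ w        # (T,) relevance of each time step
    a = softmax(scores)   # attention weights, sum to 1
    return a @ H, a       # context vector (D,), weights (T,)

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))   # e.g. 6 GRU outputs of dimension 4
ctx, weights = attention_pool(H, rng.normal(size=4))
print(ctx.shape, weights.sum())
```

The softmax lets the model concentrate weight on a few informative segments instead of averaging the whole sequence uniformly.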
... The amount of data generated by the various monitored scenarios is massive, but only some are relevant. Therefore, identifying the critical information in the scenario, detection, and interpretation allow for association with events of interest [4,5]. ...
... The use of motion primitives according to the activity to be modeled, via a language based on motion primitives, is an expressive and straightforward way to implement modeling for activity inference, particularly for a public inexperienced in managing surveillance systems.
Lateral road entry: seq(A, B, C, D), with state A=[(9,5), (9,6), (10,6), (11,6), (11,5), (10,5)]; state B=[(9,7), (9,8), (10,8), (11,8), (11,7), (10,7)]; state C=[(11,5), (11,4), (12,4), (12,5)]; state D=[(13,4), (13,5), (14,5), (14,4), (15,4), (15,5)].
Join the highway: con(A, B, C, D), with state C=[(12,5), (12,6), (13,6), (14,6), (14,5), (13,5), (15,5), (15,6)]; state D=[(12,7), (12,8), (14,8), (13,7), (13,8), (14,7), (15,7), (15,8)]. ...
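The `seq` operator in the state definitions quoted above reads as "visit the states in order". An illustrative re-implementation over grid-cell states (this is a sketch of the idea, not the SEL reference interpreter):

```python
def seq(track, *states):
    """SEL-style 'seq' operator: does the trajectory visit the given
    states in order? Each state is a set of grid cells; 'track' is the
    sequence of cells a moving object passes through."""
    idx = 0
    for cell in track:
        if idx < len(states) and cell in states[idx]:
            idx += 1                     # advance to the next state
    return idx == len(states)            # all states reached in order

# States borrowed from the "Lateral road entry" example above:
A = {(9, 5), (9, 6), (10, 6), (11, 6), (11, 5), (10, 5)}
B = {(9, 7), (9, 8), (10, 8), (11, 8), (11, 7), (10, 7)}
C = {(11, 5), (11, 4), (12, 4), (12, 5)}

print(seq([(9, 5), (10, 7), (12, 4)], A, B, C))  # True: A, then B, then C
```

A `con` (concurrency) operator would instead require the states to be occupied simultaneously by different tracks; the sequential case above is the simplest primitive.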
Article
Full-text available
SEL, a State-based Language for Video Surveillance Modeling, is a formal language designed to represent and identify activities in surveillance systems through scenario semantics and the creation of motion primitives structured in programs. Motion primitives represent the temporal evolution of motion evidence. They are the most basic motion structures detected as motion evidence, including operators such as sequence, parallel, and concurrency, which indicate trajectory evolution, simultaneity, and synchronization. SEL is a very expressive language that characterizes interactions by describing the relationships between motion primitives. These interactions determine the scenario’s activity and meaning. An experimental model is constructed to demonstrate the value of SEL, incorporating challenging activities in surveillance systems. This approach assesses the language’s suitability for describing complicated tasks.
... Research in computer vision encompasses human activity recognition [1]. Systems designed for such recognition can be pivotal for video cataloging, content-driven retrieval, interactive human-computer interfaces, and surveillance [2], [3]. ...
Article
Full-text available
Human action recognition has emerged as a significant area of study due to its diverse applications. This research investigates convolutional neural network (CNN) structures to extract spatio-temporal attributes from 2D images. By harnessing the power of pre-trained residual network 50 (ResNet50) and visual geometric group 16 (VGG16) networks through transfer learning, intricate human actions can be discerned more effectively. These networks aid in isolating and merging spatio-temporal features, which are then trained using a support vector machine (SVM) classifier. The refined approach yielded an accuracy of 89.71% on the UCF-101 dataset. Utilizing the UCF YouTube action dataset, activities such as basketball playing and cycling were successfully identified using ResNet50 and VGG16 models. Despite variations in frame dimensions, 3DCNN models demonstrated notable proficiency in video classification. The training phase achieved a remarkable 95.6% accuracy rate. Such advancements in leveraging pre-trained neural networks offer promising prospects for enhancing human activity recognition, especially in areas like personal security and senior care.
... Depending on the sensing method employed, HAR can be broadly categorized into external and internal sensor-based approaches. External methods encompass optical signals (video), Wi-Fi signals (utilized in efficient Wi-Fi-based HAR), environmental signals (e.g., smart home data, including temperature, humidity, CO2 levels, light intensity), and even seismic waves [2][3][4][5][6]. Notably, camera-based approaches have demonstrated remarkable performance in HAR, particularly with advancements in artificial neural networks [7][8][9]. ...
Article
Full-text available
Various sensing modalities, including external and internal sensors, have been employed in research on human activity recognition (HAR). Among these, internal sensors, particularly wearable technologies, hold significant promise due to their lightweight nature and simplicity. Recently, HAR techniques leveraging wearable biometric signals, such as electrocardiography (ECG) and photoplethysmography (PPG), have been proposed using publicly available datasets. However, to facilitate broader practical applications, a more extensive analysis based on larger databases with cross-subject validation is required. In pursuit of this objective, we initially gathered PPG signals from 40 participants engaged in five common daily activities. Subsequently, we evaluated the feasibility of classifying these activities using deep learning architecture. The model’s performance was assessed in terms of accuracy, precision, recall, and F-1 measure via cross-subject cross-validation (CV). The proposed method successfully distinguished the five activities considered, with an average test accuracy of 95.14%. Furthermore, we recommend an optimal window size based on a comprehensive evaluation of performance relative to the input signal length. These findings confirm the potential for practical HAR applications based on PPG and indicate its prospective extension to various domains, such as healthcare or fitness applications, by concurrently analyzing behavioral and health data through a single biometric signal.
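The window-size recommendation above presupposes a windowing step that cuts the continuous PPG stream into fixed-length classifier inputs. A generic sketch of that segmentation with overlap (this is the standard preprocessing pattern, not the authors' specific pipeline; `win` and `step` are the tunable parameters):

```python
def segment(signal, win, step):
    """Split a 1-D signal into fixed-length windows of size 'win',
    advancing by 'step' samples each time (step < win gives overlap)."""
    return [signal[i:i + win]
            for i in range(0, len(signal) - win + 1, step)]

# 10 samples, window of 4, 50% overlap -> 4 windows
windows = segment(list(range(10)), win=4, step=2)
print(len(windows), windows[0])  # 4 [0, 1, 2, 3]
```

Longer windows give each classification more context but increase latency, which is exactly the trade-off behind evaluating performance against input-signal length.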
... This method aided in ... (10) and right hip (13). ...
... The number of parameters, denoted as Param, was calculated using (13) and (14) for the convolutional and dense layers. In these formulas, Kernel represents the kernel size, and Input and Output correspond to the numbers of inputs and outputs. ...
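Equations (13) and (14) themselves are not reproduced in this excerpt, but the standard counts they describe can be sketched, assuming one bias term per output filter or unit (that bias assumption is ours, since the formulas are not shown here):

```python
def conv1d_params(kernel, inputs, outputs):
    """Trainable parameters of a 1-D convolutional layer:
    a kernel-by-inputs weight block plus one bias per output filter."""
    return (kernel * inputs + 1) * outputs

def dense_params(inputs, outputs):
    """Trainable parameters of a dense layer: weights plus biases."""
    return (inputs + 1) * outputs

# e.g. kernel 3 over 9 input channels into 64 filters, then 128 -> 6 units
print(conv1d_params(kernel=3, inputs=9, outputs=64))  # 1792
print(dense_params(inputs=128, outputs=6))            # 774
```

Summing these per-layer counts over the whole network gives the model-size figures typically reported alongside accuracy.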
Article
Full-text available
Research in the field of human activity recognition is of great interest due to its potential for various applications, such as medical rehabilitation. Advancing its development has become increasingly necessary to enable efficient detection of and response to a wide range of movements. Current recognition methods rely on calculating changes in joint distance to classify activity patterns. Therefore, a different approach is required to identify the direction of movement, distinguishing activities that exhibit similar joint distance changes but differing motion directions, such as sitting and standing. The research conducted in this study focused on determining the direction of movement using an innovative joint angle shift approach. By analyzing the joint angle shift value between specific joints and reference points in the sequence of activity frames, the research enabled the detection of variations in activity direction. The joint angle shift method was combined with a Deep Convolutional Neural Network (DCNN) model to classify 3D datasets encompassing spatial-temporal information from RGB-D video image data. Model performance was evaluated using the confusion matrix. The results show that the model successfully classified nine activities in the Florence 3D Actions dataset, including sitting and standing, obtaining an accuracy of (96.72 ± 0.83)%. In addition, to evaluate its robustness, this model was tested on the UTKinect Action3D dataset, obtaining an accuracy of 97.44%, proving that state-of-the-art performance has been achieved.
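The joint-angle idea underlying this abstract can be sketched as the angle at a joint formed by two adjacent skeleton segments; tracking how that angle shifts across frames is what separates activities with similar joint-distance changes but opposite motion directions. An illustrative helper (a generic angle computation, not the paper's exact feature):

```python
import math

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by segments b->a and b->c,
    e.g. the knee angle from hip, knee, and ankle 3D coordinates."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))

print(joint_angle((0, 1, 0), (0, 0, 0), (1, 0, 0)))  # 90.0
```

The per-frame shift is then simply the difference of this angle between consecutive frames, whose sign encodes the direction of motion.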
... An HAR system is composed of three functional subsystems [17]: (i) a sensing module responsible for continuously collecting information from the environment [18], (ii) a processing module responsible for extracting the main features from the sensor signals to discriminate between activities [19] and (iii) a classification module to identify the activity from the key features extracted by the previous module [20]. Concerning (i), the use of computer vision technology has been proposed [21,22] due to its high reliability [23]. A large amount of temporal information about the development of an activity can be extracted by studying the person's postures and movements, the objects he/she is using or the environment where the activity is taking place [24]. ...
Article
Full-text available
As people get older, living at home can expose them to potentially dangerous situations when performing everyday actions or simple tasks due to physical, sensory or cognitive limitations. This could compromise the residents’ health, a risk that in many cases could be reduced by early detection of the incidents. The present work focuses on the development of a system capable of detecting in real time the main activities of daily life that one or several people can perform at the same time inside their home. The proposed approach corresponds to an unsupervised learning method, which has a number of advantages, such as facilitating future replication or improving control and knowledge of the internal workings of the system. The final objective of this system is to facilitate the implementation of this method in a larger number of homes. The system is able to analyse the events provided by a network of non-intrusive sensors and the locations of the residents inside the home through a Bluetooth beacon network. The method is built upon an accurate combination of two hidden Markov models: one providing the rooms in which the residents are located and the other providing the activity the residents are carrying out. The method has been tested with the data provided by the public database SDHAR-HOME, providing accuracy results ranging from 86.78% to 91.68%. The approach presents an improvement over existing unsupervised learning methods as it is replicable for multiple users at the same time.
... Consequently, HAR models tend to be complex in nature. In the past, researchers relied on designing handcrafted feature extractors to encode the necessary features for obtaining precise motion representations from video sequences, aiming to enhance the accuracy of HAR models [7][8][9]. Nevertheless, methods based on hand-crafted feature extraction have limitations, as they heavily rely on human insight and cannot automatically adapt to new data. Consequently, their applicability in real-world scenarios, which are often dynamic and ever-changing, is very limited. ...
Article
Full-text available
Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos holds significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread implementation. In this research paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for optical flow extraction and 3D convolution which are computationally intensive. By removing these components, CAMA-Net achieves superior efficiency compared to many existing approaches in terms of computation efficiency. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes the relevance score between key-value pairs obtained from the 2D ResNet backbone. This process establishes correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results convincingly demonstrate the effectiveness of our proposed model, surpassing the performance of existing 2D-CNN based baseline models. Article Highlights Recent human action recognition models are not yet ready for practical applications due to high computation needs. We propose a 2D CNN-based human action recognition method to reduce the computation load. The proposed method achieves competitive performance compared to most SOTA 2D CNN-based methods on public datasets.
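The relevance computation between key-value pairs described in this abstract follows the general shape of scaled dot-product attention: query-key similarity, softmax over the keys, weighted sum of the values. A generic numpy sketch of that pattern (CAMA-Net's actual module differs in its details):

```python
import numpy as np

def relevance(Q, K, V):
    """Scaled dot-product attention: relevance scores between queries Q
    and keys K select a softmax-weighted combination of values V."""
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # query-key similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)         # softmax over the keys
    return w @ V, w

rng = np.random.default_rng(1)
out, w = relevance(rng.normal(size=(2, 8)),    # 2 query vectors
                   rng.normal(size=(5, 8)),    # 5 key vectors
                   rng.normal(size=(5, 8)))    # 5 value vectors
print(out.shape)  # (2, 8)
```

Because this needs only 2D feature maps as queries, keys, and values, it avoids the optical-flow and 3D-convolution cost the paper sets out to eliminate.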
... With the continuous advancement of artificial intelligence and wearable technology, research on human activity recognition (HAR) based on inertial measurement unit (IMU) devices has become a trending field (Yadav et al., 2021). It has extensive practical applications such as smart homes, health monitoring, exercise tracking, and game design (Rashidi and Cook, 2009;Hong et al., 2010;Ke et al., 2013). Compared to visual-based HAR, sensor-based HAR methods can better protect user privacy, especially with the development and proliferation of wearable devices that embed sensors such as accelerometers, gyroscopes, and magnetometers. ...
Article
Full-text available
Human activity recognition (HAR) has recently become a popular research field in the wearable sensor technology scene. By analyzing human behavior data, some disease risks or potential health issues can be detected, and patients’ rehabilitation progress can be evaluated. With the excellent performance of Transformer in natural language processing and visual tasks, researchers have begun to focus on its application in time series. The Transformer model models long-term dependencies between sequences through self-attention mechanisms, capturing contextual information over extended periods. In this paper, we propose a hybrid model based on the channel attention mechanism and Transformer model to improve the feature representation ability of sensor-based HAR tasks. Extensive experiments were conducted on three public HAR datasets, and the results show that our network achieved accuracies of 98.10%, 97.21%, and 98.82% on the HARTH, PAMAP2, and UCI-HAR datasets, respectively. The overall performance is on par with the most advanced methods.
... Activities undertaken by individuals are not exclusive to the realm of sports; they can also encompass chores such as household cleaning, cooking, water collection, and more [3]. The more frequently a person engages in these activities, the more likely they are to become habits that influence their health [4]. In sports, there are three intensities of activity: high, medium, and low [5]. ...
Article
Problems and Purpose. The global prevalence of diabetes is on the rise, alongside other diseases related to obesity, including hypertension, heart disease and dyslipidemia. Exercise is recognized as a method to control blood sugar levels. This study aimed to investigate the impact of low and moderate-intensity sports exercises on blood sugar levels in patients. Materials and Methods. This research was a laboratory experiment following a completely randomized design. Twenty participants were enlisted for the study and divided into two groups: a control group and a moderate-intensity exercise group. The study spanned two months. Results. The results revealed a significant change in the blood sugar levels of patients who underwent low and moderate-intensity aerobic exercise with a p-value of 0.001. Conclusion. This research concludes that there are notable differences in the impact of low and moderate-intensity aerobic exercises on body mass index and blood sugar levels in diabetes patients.
... Human activity recognition using video signals is a rapidly evolving field with applications in various domains, such as surveillance systems [1], human-computer interaction, and healthcare monitoring [2]. The ability to automatically analyze and understand human activities from video data has significant implications for improving safety, enhancing user experiences, and enabling intelligent systems [3]. In this study, we present a comprehensive approach for human activity recognition that leverages spatial-temporal features and a two-level hierarchical method, integrating hidden Markov models (HMM) [4] and support vector machines (SVM) to achieve accurate and robust activity classification. ...
Article
Full-text available
Human Activity Recognition (HAR) is an important field with diverse applications. However, video-based HAR is challenging because of various factors, such as noise, multiple people, and obscured body parts. Moreover, it is difficult to identify similar activities within and across classes. This study presents a novel approach that utilizes body region relationships as features and a two-level hierarchical model for classification to address these challenges. The proposed system uses a Hidden Markov Model (HMM) at the first level to model human activity, and similar activities are then grouped and classified using a Support Vector Machine (SVM) at the second level. The performance of the proposed system was evaluated on four datasets, with superior results observed for the KTH and Basic Kitchen Activity (BKA) datasets. Promising results were obtained for the HMDB-51 and UCF101 datasets. Improvements of 25%, 25%, 4%, 22%, 24%, and 30% in accuracy, recall, specificity, precision, F1-score, and MCC, respectively, are achieved for the KTH dataset. On the BKA dataset, the second level of the system shows improvements of 8.6%, 8.6%, 0.85%, 8.2%, 8.4%, and 9.5% for the same metrics compared to the first level. These findings demonstrate the potential of the proposed two-level hierarchical system for human activity recognition applications.