Fig 7 - uploaded by Alessandro Carfì
Confusion matrix for the online testing with α = 0.05 and γ = 0.9. The bottom row reports the recall measures while the rightmost column reports the precision measures. The blue cell reports the overall accuracy.

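As a reading aid for the figure, the hedged sketch below shows how per-class recall and precision margins and the overall accuracy can be computed from a confusion matrix. The gesture class names, the example predictions, and the scikit-learn-based computation are illustrative assumptions rather than the paper's implementation, and the figure's row/column orientation may be transposed with respect to the scikit-learn convention used here.

```python
# Hedged sketch: computing the per-class precision/recall margins and the overall
# accuracy that a confusion matrix like Fig. 7 reports. Class names and data are
# illustrative placeholders, not the paper's gestures or results.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ["circle", "swipe", "circle", "shake", "swipe", "shake"]  # hypothetical labels
y_pred = ["circle", "swipe", "swipe", "shake", "swipe", "circle"]
labels = ["circle", "swipe", "shake"]

# Rows = true classes, columns = predicted classes (scikit-learn convention).
cm = confusion_matrix(y_true, y_pred, labels=labels)

recall = cm.diagonal() / cm.sum(axis=1)      # per true class (row margin)
precision = cm.diagonal() / cm.sum(axis=0)   # per predicted class (column margin)
accuracy = cm.diagonal().sum() / cm.sum()    # the single corner cell

for name, r, p in zip(labels, recall, precision):
    print(f"{name}: recall={r:.2f} precision={p:.2f}")
print(f"overall accuracy={accuracy:.2f}")
```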

Source publication
Conference Paper
Full-text available
Gestures are a natural communication modality for humans. The ability to interpret gestures is fundamental for robots aiming to interact naturally with humans. Wearable sensors are a promising means of monitoring human activity; in particular, the use of triaxial accelerometers for gesture recognition has been explored. Despite this, the state of the art pr...

Contexts in source publication

Context 1
... increasing the reactivity of the system. To verify whether and to what extent the above statement holds, we have repeated the tests on Dataset B two more times, once decreasing C to α = 0.05 while keeping τ at the value defined in (9), and once decreasing τ to γ = 0.6 while keeping C as defined in (9). The results of the first test are shown in Fig. 7 and Fig. 9b, while the results of the second test are shown in Fig. 8. Fig. 7 shows that, as expected, the new C value yields an increase in recall at the expense of a small decrease in precision. Moreover, the number of samples required to issue the label (see Fig. 9b) is significantly smaller than with the values defined in (9). Similarly, ...
Context 2
... the above statement holds, we have repeated the tests on Dataset B two more times, once decreasing C to α = 0.05 while keeping τ at the value defined in (9), and once decreasing τ to γ = 0.6 while keeping C as defined in (9). The results of the first test are shown in Fig. 7 and Fig. 9b, while the results of the second test are shown in Fig. 8. Fig. 7 shows that, as expected, the new C value yields an increase in recall at the expense of a small decrease in precision. Moreover, the number of samples required to issue the label (see Fig. 9b) is significantly smaller than with the values defined in (9). Similarly, Fig. 8 shows that the new τ value yields an increase in recall at ...
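The excerpts above explain that relaxing the confidence threshold (α for C) or the temporal threshold (γ for τ) trades precision for recall and reactivity. The paper's actual decision rule, defined in its Eq. (9), is not reproduced here; the sketch below is only a generic illustration, under the assumption of an online classifier that commits to a label once its running class probability exceeds a threshold C, to show why a lower threshold yields earlier decisions from fewer samples at the cost of occasionally committing too soon.

```python
# Generic illustration (not the paper's Eq. (9)): an online classifier that
# commits to a label once the running class probability exceeds a threshold C.
# Lowering C makes the decision earlier (fewer samples, higher reactivity and
# recall) at the risk of committing before the evidence is conclusive
# (lower precision).
from typing import Iterable, Optional, Tuple

def online_decision(prob_stream: Iterable[dict], C: float) -> Tuple[Optional[str], int]:
    """Return (label, samples_used); label is None if C is never reached."""
    n = 0
    for n, probs in enumerate(prob_stream, start=1):
        best_label, best_prob = max(probs.items(), key=lambda kv: kv[1])
        if best_prob >= C:
            return best_label, n
    return None, n

# Hypothetical per-sample class probabilities for one gesture execution.
stream = [{"wave": 0.40, "push": 0.35}, {"wave": 0.55, "push": 0.30},
          {"wave": 0.70, "push": 0.20}, {"wave": 0.93, "push": 0.05}]

print(online_decision(stream, C=0.65))   # earlier decision, fewer samples
print(online_decision(stream, C=0.90))   # later but more confident
```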

Similar publications

Article
Full-text available
Modern wheelchairs, despite their advanced and robotic technologies, have not reached the lives of millions of disabled people due to their high costs, technical limitations, and safety issues. This paper proposes a gesture-controlled smart wheelchair system with an IoT-enabled fall detection mechanism to overcome these problems. It can recognize gestures usi...

Citations

... Hand Gesture Recognition (HGR) plays an essential role in various interactive systems, including signaling systems that rely on gestures [1,2], recognition of sign language [3,4], sports-specific sign language recognition [5,6], human gesture recognition [7,8], pose and posture detection [9,10], physical exercise monitoring [11,12], and control of smart ... Yi Yao and Chang-Tsun Li's research [51] focuses on addressing the formidable task of recognizing and tracking hand movements in uncontrolled environments. They identify several critical challenges inherent in such environments, including multiple hand regions, moving background objects, variations in scale, speed, and trajectory location, changing lighting conditions, and frontal occlusions. ...
... This methodology simplifies the handling of input images acquired from different views or angles. Equations (7) and (8) can be used to shear an image along the x-axis and the y-axis. In these equations, (x, y) are the coordinates of a pixel in the original image, (x′, y′) are the coordinates of the corresponding pixel in the sheared image, and shx and shy are the shear factors along the x-axis and y-axis, respectively. ...
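The excerpt paraphrases the cited work's shear equations (7) and (8) without reproducing them. Assuming they take the standard affine form x′ = x + shx·y, y′ = y (shear along the x-axis) and x′ = x, y′ = y + shy·x (shear along the y-axis), a minimal sketch of the corresponding augmentation with OpenCV could look as follows; the function name and canvas handling are illustrative choices, not the cited paper's code.

```python
# Hedged sketch of shear augmentation along x and y. The equations assumed here
# (x' = x + shx*y, y' = y  and  x' = x, y' = y + shy*x) are the standard affine
# shear; the cited paper's Eqs. (7)-(8) are presumed to match this form.
import cv2
import numpy as np

def shear(image: np.ndarray, shx: float = 0.0, shy: float = 0.0) -> np.ndarray:
    h, w = image.shape[:2]
    # Affine matrix mapping (x, y) -> (x + shx*y, shy*x + y).
    M = np.float32([[1.0, shx, 0.0],
                    [shy, 1.0, 0.0]])
    # Enlarge the canvas so the sheared content is not clipped.
    new_w = int(w + abs(shx) * h)
    new_h = int(h + abs(shy) * w)
    return cv2.warpAffine(image, M, (new_w, new_h))

img = np.zeros((100, 100, 3), dtype=np.uint8)   # placeholder image
augmented = shear(img, shx=0.2, shy=0.0)        # shear along the x-axis only
```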
Article
Full-text available
This research stems from the increasing use of hand gestures in various applications, ranging from sign language recognition to electronic device control. The focus is on the importance of accuracy and robustness in recognizing hand gestures to avoid misinterpretation and instruction errors. However, many experiments on hand gesture recognition are conducted in limited laboratory environments, which do not fully reflect the everyday use of hand gestures. Therefore, the importance of an ideal background in hand gesture recognition, involving only the signer without any distracting background, is highlighted. In the real world, the use of hand gestures involves various unique environmental conditions, including differences in background colors, varying lighting conditions, and different hand gesture positions. However, the datasets available to train hand gesture recognition models often lack sufficient variability, thereby hindering the development of accurate and adaptable systems. This research aims to develop a robust hand gesture recognition model capable of operating effectively in diverse real-world environments. By leveraging deep learning-based image augmentation techniques, the study seeks to enhance the accuracy of hand gesture recognition by simulating various environmental conditions. Through data duplication and augmentation methods, including background, geometric, and lighting adjustments, the diversity of the primary dataset is expanded to improve the effectiveness of model training. It is important to note that the utilization of the green screen technique, combined with geometric and lighting augmentation, significantly contributes to the model’s ability to recognize hand gestures accurately. The research results show a significant improvement in accuracy, especially when implementing the proposed green screen technique, underscoring its effectiveness in adapting to various environmental contexts. Additionally, the study emphasizes the importance of adjusting augmentation techniques to the dataset’s characteristics for optimal performance. These findings provide valuable insights into the practical application of hand gesture recognition technology and pave the way for further research in tailoring techniques to datasets with varying complexities and environmental variations.
... In particular, there is the theme of reliable communication between agents. Successful collaboration requires a robot to understand and react to the actions of humans, which can take different forms, e.g., voice or gestures [4]- [6], or can be implicitly expressed through nonverbal cues [7], [8]. Simultaneously, the human operator requires clear feedback and intuitive media to anticipate the robot's actions. ...
Preprint
Full-text available
This article presents an open-source architecture for conveying robots' intentions to human teammates using Mixed Reality and Head-Mounted Displays. The architecture has been developed focusing on its modularity and re-usability aspects. Both binaries and source code are available, enabling researchers and companies to adopt the proposed architecture as a standalone solution or to integrate it in more comprehensive implementations. Due to its scalability, the proposed architecture can be easily employed to develop shared Mixed Reality experiences involving multiple robots and human teammates in complex collaborative scenarios.
... Gestures are used to provide high-level commands, such as take off, land, or stop, whereas the robot's velocity is determined by mapping the user's wrist movements. A similar setting was considered by Carfì et al. [22]. However, their framework was not designed for HRI. ...
Article
Full-text available
Recent advances in robotics have allowed the introduction of robots assisting and working together with human subjects. To promote their use and diffusion, intuitive and user-friendly interaction means should be adopted. In particular, gestures have become an established way to interact with robots since they allow commanding them in an intuitive manner. In this article, we focus on the problem of gesture recognition in human–robot interaction (HRI). While this problem has been largely studied in the literature, it poses specific constraints when applied to HRI. We propose a framework consisting of a pipeline devised to take these specific constraints into account. We implement the proposed pipeline considering, as an example, an evaluation use case. To this end, we consider standard machine learning algorithms for the classification stage and evaluate their performance using different metrics for a thorough assessment.
... In contrast to the natural use of gestures in social situations, gestures as part of a user interface (UI) are considered functional [2]. Most current works focus on a 1-to-1 mapping between gestures and controlled variables (e.g., [3], [4], [5]), thus heavily restricting the expressiveness of the gestures. This paper proposes a system that allows the user to control a robot by communicating target actions, objects, and other parameters via gestures. ...
... Many works focus on developing better sensors enabling gesture recognition, such as wearable [8] or contactless sensors [9]. The development of better methods to detect individual gestures [3], [4] and human activities provides important key components for HRI systems. Several papers utilize the Leap Motion controller [10] to detect hand gestures. ...
Preprint
Full-text available
Collaborative robots have become a popular tool for increasing productivity in partly automated manufacturing plants. Intuitive robot teaching methods are required to quickly and flexibly adapt robot programs to new tasks. Gestures have an essential role in human communication. However, in human-robot-interaction scenarios, gesture-based user interfaces are so far rarely used, and when they are, they employ a one-to-one mapping of gestures to robot control variables. In this paper, we propose a method that infers the user's intent based on gesture episodes, the context of the situation, and common sense. The approach is evaluated in a simulated table-top manipulation setting. We conduct deterministic experiments with simulated users and show that the system can even handle the personal preferences of each user.
... Independently of the sensor type, a gesture-based interface should process the collected data to identify motions with explicit communication intent and ignore normal user operations. This problem is known as gesture recognition, and for IMU data, it has been approached using Naive Bayes Classifiers, Logistic Regression and Decision Trees [19], Convolutional Neural Networks (CNNs) [20], Dynamic Time Warping (DTW) [21], Support Vector Machines (SVMs) [22], and Long Short-Term Memory Neural Networks (LSTM) [23]. ...
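As a concrete illustration of one of the classifier families listed above, the hedged sketch below implements template matching with Dynamic Time Warping and a 1-nearest-neighbour rule over tri-axial accelerometer sequences. The templates, the Euclidean local cost, and the single-template-per-class setup are assumptions made for brevity and do not reproduce any of the cited systems.

```python
# Hedged sketch: 1-nearest-neighbour gesture classification with Dynamic Time
# Warping over tri-axial accelerometer sequences. Templates and gestures are
# placeholders; this mirrors the DTW family cited as [21], not its exact code.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW between two (T, 3) acceleration sequences with Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def classify(query: np.ndarray, templates: dict) -> str:
    """Return the label of the template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Hypothetical templates: one recorded example per gesture class.
rng = np.random.default_rng(0)
templates = {"wave": rng.normal(size=(40, 3)), "push": rng.normal(size=(30, 3))}
query = templates["wave"] + rng.normal(scale=0.1, size=(40, 3))
print(classify(query, templates))   # expected: "wave"
```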
Chapter
Full-text available
Close human-robot interaction (HRI), especially in industrial scenarios, has been extensively investigated for the advantages of combining human and robot skills. For an effective HRI, the validity of currently available human-machine communication media or tools should be questioned, and new communication modalities should be explored. This article proposes a modular architecture allowing human operators to interact with robots through different modalities. In particular, we implemented the architecture to handle gestural and touchscreen input, using a smartwatch and a tablet, respectively. Finally, we performed a comparative user experience study between these two modalities.
... Since it was proposed in 1991 [37], RNN with time series as input has been widely used for human activity classification or gesture estimation [38][39][40][41][42][43][44]. Many researchers have carried out extensive work to improve the performance of RNN models in HAR [45][46][47], and Torti et al. [48] propose an RNN system for fall detection suitable for a microcontroller embedded implementation, with an overall detection rate of 98%. ...
Article
Full-text available
Fall detection is a challenging task in human activity recognition but is meaningful for health monitoring. However, for sensor-based fall prediction problems, recurrent architectures such as recurrent neural network models used to extract temporal features sometimes fail to accurately capture global information. Therefore, an improved WTCN model is proposed in this research, in which a temporal convolutional network is combined with the wavelet transform. Firstly, we use the wavelet transform to convert the one-dimensional time-domain signal into a two-dimensional time-frequency representation. This method helps us to process the raw signal data efficiently. Secondly, we design a temporal convolutional network model with ultralong memory, drawing on relevant convolutional architectures. It effectively avoids the vanishing- and exploding-gradient problems. In addition, this paper also conducts experiments comparing our WTCN model with typical recurrent architectures such as the long short-term memory network on three datasets, UniMiB SHAR, SisFall, and UMAFall. The results show that WTCN outperforms other traditional methods, the accuracy of the proposed algorithm is up to 99.53%, and human fall behavior can be effectively recognized in real time.
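The abstract describes the first stage of WTCN as a wavelet transform that turns a one-dimensional time-domain signal into a two-dimensional time-frequency representation. A minimal sketch of that step using the PyWavelets continuous wavelet transform is given below; the Morlet wavelet, the scale range, the sampling rate, and the synthetic signal are assumptions, since the paper's exact settings are not stated in this excerpt.

```python
# Hedged sketch of the wavelet-transform stage: turning a 1-D accelerometer
# channel into a 2-D time-frequency image (scalogram) that a convolutional
# model such as a TCN/CNN can consume. Wavelet and scales are illustrative.
import numpy as np
import pywt

fs = 50.0                                  # assumed sampling rate in Hz
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 2 * t) + 0.5 * np.random.randn(t.size)  # placeholder signal

scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

scalogram = np.abs(coeffs)                 # shape: (len(scales), len(signal))
print(scalogram.shape)                     # 2-D input for the convolutional stage
```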
... Independently of the sensor type, a gesture-based interface should process the collected data to identify motions with explicit communication intent and ignore normal user operations. This problem is known as gesture recognition, and for IMU data, it has been approached using Naive Bayes Classifiers, Logistic Regression and Decision Trees [19], Convolutional Neural Networks (CNNs) [20], Dynamic Time Warping (DTW) [21], Support Vector Machines (SVMs) [22], and Long Short-Term Memory Neural Networks (LSTM) [23]. ...
Preprint
Full-text available
Close human-robot interaction (HRI), especially in industrial scenarios, has been extensively investigated for the advantages of combining human and robot skills. For an effective HRI, the validity of currently available human-machine communication media or tools should be questioned, and new communication modalities should be explored. This article proposes a modular architecture allowing human operators to interact with robots through different modalities. In particular, we implemented the architecture to handle gestural and touchscreen input, using a smartwatch and a tablet, respectively. Finally, we performed a comparative user experience study between these two modalities.
... In addition to these traditional machine learning methods, the emergence and widespread availability of new hardware allowing the use of deep learning architectures has motivated a tendency towards using deep learning approaches for human activity recognition as well. These include recurrent neural networks (RNNs) (Murakami & Taguchi, 1991;Murad & Pyun, 2017;Carfi et al., 2018;Koch et al., 2019), long short-term memory (LSTM) (Chen et al., 2016;Singh et al., 2017;Pienaar & Malekian, 2019) and convolutional neural networks (CNN) (Wang et al., 2019;Lee et al., 2017;Gholamrezaii & Taghi Almodarresi, 2019;Naqvi et al., 2020;Cruciani et al., 2020;Mehmood et al., 2021;Mekruksavanich & Jitpattanakul, 2021). ...
Article
Full-text available
Manual behavioral observations have been applied in both natural-environment and laboratory experiments to analyze and quantify animal movement and behavior. Although these observations have contributed tremendously to ecological and neuroscientific disciplines, they come with challenges and disadvantages. They are not only time-consuming, labor-intensive, and error-prone but can also be subjective, which makes the results difficult to reproduce. Therefore, there is an ongoing endeavor towards automated behavioral analysis, which has also paved the way for open-source software approaches. Even though these approaches can in principle be applied to different animal groups, current applications mostly focus on mammals, especially rodents. However, extending those applications to other vertebrates, such as birds, is advisable not only for extending species-specific knowledge but also for contributing to the larger evolutionary picture and the role of behavior within it. Here we present an open-source software package as a starting point for bird behavior classification. It can analyze pose-estimation data generated by established deep-learning-based pose-estimation tools such as DeepLabCut to build supervised machine learning classifiers for pigeon behaviors, and it can be broadened to support other bird species as well. We show that by training different machine learning and deep learning architectures using multivariate time series data as input, an F1 score of 0.874 can be achieved for a set of seven distinct behaviors. In addition, an algorithm for further tuning the bias of the predictions towards either precision or recall is introduced, which allows tailoring the classifier to specific needs.
... A considerable number of studies have been conducted to detect human motion based on wearable or attachable sensors, including sensor gloves [8], surface electromyography [9], triaxial accelerometers [10], and surface markers [11]. Image-based detection using cameras has emerged as an effective alternative for detecting human workers' motion due to its low cost, comfort for human workers, and the recently increased computational power available. ...
Article
Full-text available
The safety of human workers has been the main concern in close human-robot collaboration. Along with rapidly developing artificial intelligence techniques, deep learning models using two-dimensional images have become feasible solutions for human motion detection. These models serve as “sensors” in the closed-loop system that involves humans and robots. Most existing methods that detect human motion from images do not consider the uncertainty of the deep learning model itself. The mappings established by deep learning models should not be taken blindly, and thus uncertainty should be a natural part of this type of sensor. In particular, model uncertainty should be explicitly quantified and incorporated into robot motion control to guarantee safety. With this motivation, to rigorously quantify the uncertainty of these “sensors”, this letter proposes a probabilistic interpretation method and provides a framework to automatically benefit from a deep model's uncertainty. Experimental data from human-robot collaboration have been collected and used to validate the proposed method. A training strategy is proposed to efficiently train surrogate models that learn to refine the predictions of the main Bayesian models. The proposed framework is also compared with the Ego hands benchmark, showing a 4.7% increase in mIoU.
... Initially, the idea of using temporal information was proposed in 1991 [178] to recognize a finger alphabet consisting of 42 symbols and in 1995 [179] to classify 66 different hand shapes with about 98% accuracy. Since then, the recurrent neural network (RNN) with time series as input has been widely applied to classify human activities or estimate hand gestures [180][181][182][183][184][185][186][187]. ...
Article
Full-text available
Mobile and wearable devices have enabled numerous applications, including activity tracking, wellness monitoring, and human–computer interaction, that measure and improve our daily lives. Many of these applications are made possible by leveraging the rich collection of low-power sensors found in many mobile and wearable devices to perform human activity recognition (HAR). Recently, deep learning has greatly pushed the boundaries of HAR on mobile and wearable devices. This paper systematically categorizes and summarizes existing work that introduces deep learning methods for wearables-based HAR and provides a comprehensive analysis of the current advancements, developing trends, and major challenges. We also present cutting-edge frontiers and future directions for deep learning-based HAR.