Figure 1 - uploaded by Jakub Segen
Driving a robot gripper with gestures.


Source publication
Conference Paper
Full-text available
This paper describes a system that uses a camera and a point light source to track a user's hand in three dimensions. Using depth cues obtained from projections of the hand and its shadow, the system computes the 3D position and orientation of two fingers (thumb and pointing finger). The system recognizes one dynamic and two static gestures. Recogn...
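The abstract above does not give the reconstruction equations; the following is a minimal numpy sketch of the hand-plus-shadow depth idea it describes. The pinhole camera at the origin, the light position, and the table plane are all assumed values, not the paper's actual calibration.

import numpy as np

# Assumed calibration (illustrative only): pinhole camera at the origin looking
# along +Z, a point light at a known position, and the shadow cast on the table
# plane z = TABLE_Z. None of these values come from the paper.
LIGHT = np.array([0.3, -0.5, 0.2])     # assumed light position (m)
TABLE_Z = 1.0                          # assumed table distance from the camera (m)
F, CX, CY = 800.0, 320.0, 240.0        # assumed focal length and principal point (px)

def ray(u, v):
    # Unit viewing ray through pixel (u, v) under the assumed pinhole model.
    d = np.array([(u - CX) / F, (v - CY) / F, 1.0])
    return d / np.linalg.norm(d)

def fingertip_3d(pix_finger, pix_shadow):
    # Triangulate a fingertip from its image point and its shadow's image point.
    r_f, r_s = ray(*pix_finger), ray(*pix_shadow)
    shadow = r_s * (TABLE_Z / r_s[2])          # the shadow lies on the table plane
    # The fingertip lies on the viewing ray s * r_f and on the light ray
    # LIGHT + u * (shadow - LIGHT); solve the small least-squares system for s, u.
    A = np.column_stack([r_f, LIGHT - shadow])
    s, _ = np.linalg.lstsq(A, LIGHT, rcond=None)[0]
    return s * r_f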

Context in source publication

Context 1
... In addition to being intuitive, hand gestures also offer higher dimensionality. For example, with the gesture shown in Figure 1, the user can control up to seven parameters (the gripper's position, orientation, and jaw separation) by natural finger movements. Such dimensionality is not provided by traditional input devices. ...
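As a rough illustration of such a mapping (not the paper's implementation), the seven parameters can be derived from two tracked fingertips and the pointing-finger direction; the helper below and its coordinate conventions are assumptions.

import numpy as np

def gripper_command(thumb_tip, index_tip, index_dir):
    # Map two tracked fingers to a seven-parameter gripper command:
    # 3D position, 3D orientation (returned as a rotation matrix), jaw separation.
    position = (thumb_tip + index_tip) / 2.0             # gripper position (3 DOF)
    jaw = np.linalg.norm(index_tip - thumb_tip)          # jaw separation (1 DOF)
    z = index_dir / np.linalg.norm(index_dir)            # approach axis along the pointing finger
    x = index_tip - thumb_tip
    x = x - np.dot(x, z) * z                             # axis spanning the two fingertips
    x = x / (np.linalg.norm(x) + 1e-9)
    y = np.cross(z, x)                                   # completes a right-handed frame
    orientation = np.column_stack([x, y, z])             # orientation (3 DOF)
    return position, orientation, jaw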

Similar publications

Conference Paper
Full-text available
Playing a musical instrument involves a complex set of continuous gestures, both to play the notes and to convey expression. To learn an instrument, a student must learn not only the music itself but also how to perform these bodily gestures. We present MirrorFugue, a set of three interfaces on a piano keyboard designed to visualize hand gesture of...

Citations

... Pose estimation is an extensive application of computer vision that analyzes the individual body parts that make up a posture through key-point data analysis [1], for applications such as fitness (an AI-led instructor in place of a professional trainer) [2], [3], physical therapy (posture correction based on mapping and correcting postures) [4], [5], video game or movie production with enriched visuals (based on mapping onto avatars through infrared (IR) sensors), and robotics (for flexible and smooth reflexes with minimal recalibration) [6], [7]. Pose estimation applications comprise tracking changes in human posture and providing feedback in real time [8]. ...
... According to Table 2, the following pairs have been chosen to form the edges of the skeletal structure of the pose: (0, 1), (0, 2), (2, 4), (1, 3), (6, 8), (8, 10), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15), (6, 12), (12, 14), (14, 16), (5, 6). Each key point has an associated confidence score, based on which the key points are located. ...
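A minimal sketch of how such edge pairs and per-key-point confidence scores are typically combined when drawing the skeleton (the 0.3 threshold and the (x, y, confidence) layout are assumed values, not taken from the cited article):

import numpy as np

# Edge list quoted in the excerpt above (COCO-style key-point indices).
EDGES = [(0, 1), (0, 2), (2, 4), (1, 3), (6, 8), (8, 10), (5, 7), (7, 9),
         (5, 11), (11, 13), (13, 15), (6, 12), (12, 14), (14, 16), (5, 6)]

def skeleton_segments(keypoints, conf_thresh=0.3):
    # keypoints: (17, 3) array of (x, y, confidence); an edge is kept only
    # when both of its key points exceed the (assumed) confidence threshold.
    segments = []
    for i, j in EDGES:
        if keypoints[i, 2] >= conf_thresh and keypoints[j, 2] >= conf_thresh:
            segments.append((keypoints[i, :2], keypoints[j, :2]))
    return segments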
Article
Full-text available
Pose estimation of human activity recognition has been a keen area of interest in augmented reality experiences, gaming and robotics, animations, behavioral analysis, and more. One such exciting variant of pose estimation in the field of health and science is yoga pose estimation. This paper explores yoga pose estimation using deep learning networks. The research aims to build a system for estimating 45 different complex yoga asanas from 11,000 images using deep learning algorithms. This system is built using a Region-based Convolutional Neural Network (RCNN) to estimate the joints in the body, followed by a Convolutional Neural Network (CNN) for classifying the poses. The model is trained using the Yoga-82 (hierarchically labeled) dataset, a new dataset with complex pose variations mainly designed for hierarchical labeling. Next, it highlights the pose estimation task through ResNet models followed by an optimization algorithm, which increases the accuracy by 10%. The resultant accuracy is 90.5% for the ResNet50 model. Finally, it provides a solution for overlapping yoga poses, multi-person, in-air, and non-conventional poses using a dense network of 17 critical points for analysis and prediction.

1. Introduction. Pose estimation is an extensive application of computer vision that analyzes the individual body parts that make up a posture through key-point data analysis [1], for applications such as fitness (an AI-led instructor in place of a professional trainer) [2], [3], physical therapy (posture correction based on mapping and correcting postures) [4], [5], video game or movie production with enriched visuals (based on mapping onto avatars through infrared (IR) sensors), and robotics (for flexible and smooth reflexes with minimal recalibration) [6], [7]. Pose estimation applications comprise tracking changes in human posture and providing feedback in real time [8]. Yoga pose estimation has been an extensive area of research in clinical applications [9], behavioral analysis, human pose co-estimation (PCE), and prototype pose characterization [10]. Many models like PoseNet [11], OpenPose [12], as well as OpenCV contour detection [13], have been curated and customized to build AI-based pose estimators for medical [14], [15] and fitness-related applications [16]. Recent advances also involve pose estimation in 3D space using MediaPipe [17]. TensorFlow MoveNet [18] has also paved the way for designing an animated AI pose trainer [19]. Despite the range of models available and proposed for pose estimation, work on the variety of poses remains limited [20]. There are many instances, like the dog pose or cat pose, where the key joint points are obscured in the pose image [21]; in such a scenario, detection and prediction need to go hand in hand. Training on the dataset through the proposed Region-Based Convolutional Neural Network (RCNN) [22] model ensures that no such limitation is faced in everyday yoga pose applications. Further optimization gives robustness to the proposed model [23]. Pose estimation from an image or video frame is a highly challenging task. It depends on the scale and resolution of the image and other aspects like lighting conditions, fluctuations, occlusions, background conditions, and more [24].
The complexity increases when pose estimation is applied to fitness-related activities [25], mainly due to the wide variety and diversity of possible poses (e.g., thousands of yoga asanas), occlusions (e.g., obstruction of key-point locations due to varied poses), and different angles of appearance (front, back, and side views) [26].
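As a rough illustration of the ResNet50 classification stage mentioned above (an assumed torchvision-based sketch, not the authors' code; the RCNN joint-estimation stage, Yoga-82 data loading, and the optimization step are omitted):

import torch
import torch.nn as nn
from torchvision import models

# Assumed setup: ResNet50 backbone with its final layer replaced for the
# 45 yoga-asana classes described above; training and optimization are not shown.
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 45)

x = torch.randn(1, 3, 224, 224)        # one dummy RGB image
logits = model(x)                      # shape (1, 45)
print(logits.argmax(dim=1))            # predicted asana index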
... Face and hand gesture detection are two current areas of emphasis in the field. Cameras and computer vision algorithms have been used in a variety of ways to translate sign language [14,15,16]. Algorithms based on computer vision depend on artificial intelligence systems that are used to extract data from images. ...
Article
Abstract: In the real world, humans communicate with each other to share their thoughts or feelings. Here, the VSP (Virtual Smart Phone) is introduced to connect the physical and virtual worlds. VSP supports natural hand gestures, hand movement, and the Internet. VSP users can communicate with each other via a virtual mobile phone. Touch gestures, radio waves, and cloud computing technology are used to achieve this. Augmented reality (AR) has the potential to revolutionize the way people interact with technology. This paper presents a novel AR-based virtual smartphone, which is capable of providing users with an immersive experience. The proposed system is composed of a depth camera, a set of virtual reality (VR) glasses, and a smartphone application. The system is able to track the user's hand movements and gestures, allowing for a more natural interaction with the virtual smartphone. The user is able to manipulate the virtual smartphone display in the same way as a physical device. Dependency on a physical cell phone is removed with VSP. By touching the user's palm, dialling a call, watching movies, or viewing images on the palm or wrist is possible. Calls are placed and terminated using touch gestures. Touch-based engagement in communication is possible using VSP. Furthermore, the proposed system is capable of recognizing various types of gestures, such as swipes and taps, which can be used to control the virtual display. The paper provides an overview of the system architecture and implementation details. The performance of the system is evaluated in terms of accuracy and latency. The results demonstrate that the proposed system is capable of providing a highly interactive, immersive experience. Keywords: Augmented Reality (AR); Computer Vision; Gesture; VSP
... However, for gesture interfaces, the definition of trigger signals is challenging. Besides the use of dedicated mechanical devices, such as a thumb switch, the use of dedicated trigger gestures [Choumane et al., 2010; Segen and Kumar, 1999], e.g., tapping in space [Jang et al., 2015], is a suitable option in line with a gestural interface. However, gesture recognition is still error-prone and suffers from per-user differences. ...
Thesis
Full-text available
Lifting virtual reality off the rollercoaster of hype and putting it on a substantial track through our lives requires applications that sustainably bring benefits that exceed the costs to the user. Also, while in the past a lot of research activity has been engaged in increasing and carving out the benefits of immersive technology, there has been much less in reducing the actual costs. The recent rise of consumer HMDs radically changed the possibilities in this endeavor. We believe that the low price and high quality were just door openers, but the main and unique feature is the small device footprint. People can suddenly carry around and use affordable, high-fidelity virtual reality devices wherever they want, without the need to visit special-purpose facilities. In this thesis, we therefore look into the possibilities and unique challenges this raises for (office) desk-based working scenarios, as they are ubiquitous in data analysis. As part of our contribution, we first characterize this scenario, introduce the term deskVR, and name the technical challenges that come with it. Furthermore, we tackle specific demands in two pillars of interaction in virtual reality, selection & manipulation and navigation. These demands are mainly characterized by the fact that a user will most of the time be seated, and the integration of additional hardware, such as controllers and advanced tracking devices, would again increase the costs. As a result, we come up with a new seated travel technique and manifold support for hands-free interaction, including desk-aligned passive haptic menus. Then, we investigate passive and active methods to prevent and reduce cybersickness, for which one primary driver is virtual travel; for us, tackling cybersickness is one of the critical tasks that have to be solved to integrate virtual reality into everyday life successfully. Finally, driven by the needs of our partners in neuroscience, we apply the methods and findings made in this thesis to a prototypical application framework for immersive 3D graph exploration, serving as proof of concept for the integrability of virtual reality into desk-based working scenarios. In the graph visualization domain, we then also propose new vertex-positioning and edge-bundling methods that address challenges arising with the performed up-projection into 3D interactive space.
... The problem of isolating the object of interest, i.e., the gesticulating signer's palm, from a static 2D input image is a well-explored subject, and successful methods are aplenty. Examples of such methods can be found here [2,3,4,5,6]. These methods can easily be implemented as an extra step in our extensible and modular framework. ...
Preprint
Full-text available
Our work addresses the problem of automatically recognising a Sign Language alphabet from a given still image obtained under arbitrary illumination. To solve this problem, we designed a computational framework that is founded on the notion that shape features are robust to illumination changes. The statistical classifier part of the framework uses a set of weighted, self-learned features, i.e., binary relationships between pairs of pixels. There are two possible pairings: an edge pixel with another edge pixel, and an edge pixel with a non-edge pixel. This two-pairing arrangement allows a consistent 2D image representation for all letters of the Sign Language alphabets, even if they were to be captured under varying illumination settings. Our framework, which is modular and extensible, paves the way for a system to perform robust (to illumination changes) recognition of the Sign Language alphabets. We also provide arguments to justify our framework design in terms of its fitness for real-world application.
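A minimal sketch of the pixel-pairing idea described above, assuming OpenCV's Canny detector as the edge extractor and a fixed, hypothetical list of pixel pairs; the self-learning and weighting of features from the paper are not reproduced:

import cv2
import numpy as np

def pair_features(gray, pairs):
    # gray:  a single-channel hand image.
    # pairs: a hypothetical, fixed list of ((r1, c1), (r2, c2)) pixel locations.
    # Returns, for each pair, two bits: edge-edge and edge/non-edge.
    edges = cv2.Canny(gray, 100, 200) > 0      # assumed edge-detector choice
    feats = []
    for p, q in pairs:
        e1, e2 = bool(edges[p]), bool(edges[q])
        feats.append((int(e1 and e2), int(e1 != e2)))
    return np.array(feats, dtype=np.uint8)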
... On the first attempt, the false rejection rate is around 2-3 percent and the false acceptance rate is less than 0.0001 percent. Each standalone unit can store 48 fingerprint templates, which may be expanded to 846 by installing an additional memory package [11]. ...
... Gesture recognition has become more interesting because of recent improvements in the field of computer vision, and defining dedicated trigger gestures has been shown to basically work [7,17,26]. But especially when defining a trigger, approaches in both recognition fields suffer from high detection latency [13], since a gesture has to be finished or a word has to be spoken to be detected correctly. ...
Conference Paper
Full-text available
We extended BlowClick, an NVVI metaphor for clicking, by adding machine learning methods to more reliably classify blowing events. We found that a support vector machine with a Gaussian kernel performed best, with at least the same latency and more precision than before. Furthermore, we added acoustic feedback to the NVVI trigger, which increases the user's confidence. With this extended technique, we conducted a user study with 33 participants and confirmed that it is possible to use NVVI as a reliable trigger as part of a hands-free point-and-click interface.
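For reference, a Gaussian-kernel (RBF) support vector machine of the kind described can be sketched with scikit-learn as follows; the acoustic features and labels below are placeholders, not the study's data:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder acoustic feature vectors and labels (1 = blowing event).
X = np.random.rand(200, 13)            # e.g. 13 spectral features, purely illustrative
y = np.random.randint(0, 2, 200)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))              # predicted classes for five feature frames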
... It tracks hands well but requires high processing power. Segen and Kumar [7] used barehand interaction for their GestureVR system, but it requires a uniform background. Stauffer and Grimson [8] proposed a method that updates the background model adaptively. ...
Article
Full-text available
Using one’s hands in human–computer interaction increases both the effectiveness of computer usage and the speed of interaction. One way of accomplishing this goal is to utilize computer vision techniques to develop hand-gesture-based interfaces. A video database system is one application where a hand-gesture-based interface is useful, because it provides a way to specify certain queries more easily. We present a hand-gesture-based interface for a video database system to specify motion and spatiotemporal object queries. We use a regular, low-cost camera to monitor the movements and configurations of the user’s hands and translate them to video queries. We conducted a user study to compare our gesture-based interface with a mouse-based interface on various types of video queries. The users evaluated the two interfaces in terms of different usability parameters, including the ease of learning, ease of use, ease of remembering (memory), naturalness, comfortable use, satisfaction, and enjoyment. The user study showed that querying video databases is a promising application area for hand-gesture-based interfaces, especially for queries involving motion and spatiotemporal relations.
... In addition, these systems rely on cameras to capture high-resolution video frames, which brings privacy concerns, as the raw camera data can be leaked to an adversary [52,64]. While prior vision methods [55,56] have leveraged shadows to infer human gestures, they work strictly under a single light source and do not apply in a natural indoor setting with multiple light sources. ...
Conference Paper
Full-text available
We present LiSense, the first-of-its-kind system that enables both data communication and fine-grained, real-time human skeleton reconstruction using Visible Light Communication (VLC). LiSense uses shadows created by the human body from blocked light and reconstructs 3D human skeleton postures in real time. We overcome two key challenges to realize shadow-based human sensing. First, multiple lights on the ceiling lead to diminished and complex shadow patterns on the floor. We design light beacons enabled by VLC to separate light rays from different light sources and recover the shadow pattern cast by each individual light. Second, we design an efficient inference algorithm to reconstruct user postures using 2D shadow information with a limited resolution collected by photodiodes embedded in the floor. We build a 3 m x 3 m LiSense testbed using off-the-shelf LEDs and photodiodes. Experiments show that LiSense reconstructs the 3D user skeleton at 60 Hz in real time with 10 degrees mean angular error for five body joints.
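One simple way to realize such per-light separation is to modulate each LED at its own beacon frequency and demodulate the photodiode signal per frequency; the sketch below assumes that scheme and illustrative frequencies, which may differ from LiSense's actual VLC encoding.

import numpy as np

FS = 2000.0                                                  # assumed sampling rate (Hz)
BEACONS = {"led_A": 200.0, "led_B": 310.0, "led_C": 450.0}   # assumed beacon frequencies (Hz)

def per_light_intensity(samples):
    # Amplitude each LED contributes at one photodiode; a light whose rays are
    # blocked by the body contributes a noticeably weaker beacon amplitude.
    spectrum = np.abs(np.fft.rfft(samples)) / len(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / FS)
    return {name: spectrum[np.argmin(np.abs(freqs - f))]
            for name, f in BEACONS.items()}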
... This method finds the angle between two points on the finger contour. If the angle between the two points is within the threshold range, the middle point is considered a fingertip [6,15,16]. ...
... After extracting the hand contour, the K-Curvature algorithm is applied. This algorithm is commonly used to find fingertips [15,16]. Referring to Fig. 5, the algorithm forms vectors from each contour point A(i) to its neighbor points B and C at a distance of K. The calculations for points B and C are shown in (1) and (2). ...
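A minimal sketch of the K-Curvature test described above (the contour format, K, and the angle threshold are assumed values):

import numpy as np

def k_curvature_fingertips(contour, k=20, angle_thresh_deg=60.0):
    # contour: (N, 2) array of ordered hand-contour points (e.g. from cv2.findContours).
    # A point is flagged as a fingertip candidate when the angle between the
    # vectors to its k-th neighbours on either side falls below the threshold.
    n = len(contour)
    tips = []
    for i in range(n):
        a = contour[i].astype(float)
        b = contour[(i - k) % n].astype(float)     # neighbour k points behind (point B)
        c = contour[(i + k) % n].astype(float)     # neighbour k points ahead  (point C)
        v1, v2 = b - a, c - a
        cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if angle < angle_thresh_deg:
            tips.append(i)
    return tips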
Conference Paper
Full-text available
Gestures are an important aspect of human interaction, and also of human-computer interaction. Gesture recognition is the mathematical interpretation of a human motion by a computing device. Hand gestures are often used for input commands on personal computers. Recognizing hand gestures as input allows the user to access the computer interactively and makes interaction more natural. This paper presents a finger detection application using Kinect. Kinect is a depth sensor that effectively captures gestures in real time. To detect and recognize fingertips, the details of the captured hand image must be extracted using image processing methods. In this paper, the proposed method detects and recognizes fingertips using the K-Curvature algorithm. Finally, a finger-counting application is applied, and the proposed method is discussed at the end of this paper. The experimental results show an acceptable average fingertip detection accuracy of 73.7% and an average processing time of 15.73 ms. Considering this result, the application of the proposed method can be extended to a hand rehabilitation system.