Figure 1 - uploaded by Jakub Segen
Driving a robot gripper with gestures.


Source publication
Conference Paper
Full-text available
This paper describes a system that uses a camera and a point light source to track a user's hand in three dimensions. Using depth cues obtained from projections of the hand and its shadow, the system computes the 3D position and orientation of two fingers (thumb and pointing finger). The system recognizes one dynamic and two static gestures. Recogn...
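The abstract above does not give the reconstruction equations; the following is a minimal numpy sketch of the hand-plus-shadow depth idea it describes. The pinhole camera at the origin, the light position, and the table plane are all assumed values, not the paper's actual calibration.

import numpy as np

# Assumed calibration (illustrative only): pinhole camera at the origin looking
# along +Z, a point light at a known position, and the shadow cast on the table
# plane z = TABLE_Z. None of these values come from the paper.
LIGHT = np.array([0.3, -0.5, 0.2])     # assumed light position (m)
TABLE_Z = 1.0                          # assumed table distance from the camera (m)
F, CX, CY = 800.0, 320.0, 240.0        # assumed focal length and principal point (px)

def ray(u, v):
    # Unit viewing ray through pixel (u, v) under the assumed pinhole model.
    d = np.array([(u - CX) / F, (v - CY) / F, 1.0])
    return d / np.linalg.norm(d)

def fingertip_3d(pix_finger, pix_shadow):
    # Triangulate a fingertip from its image point and its shadow's image point.
    r_f, r_s = ray(*pix_finger), ray(*pix_shadow)
    shadow = r_s * (TABLE_Z / r_s[2])          # the shadow lies on the table plane
    # The fingertip lies on the viewing ray s * r_f and on the light ray
    # LIGHT + u * (shadow - LIGHT); solve the small least-squares system for s, u.
    A = np.column_stack([r_f, LIGHT - shadow])
    s, _ = np.linalg.lstsq(A, LIGHT, rcond=None)[0]
    return s * r_f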

Context in source publication

Context 1
... In addition to being intuitive, hand gestures also offer higher dimensionality. For example, with the gesture shown in Figure 1, the user can control up to seven parameters (the gripper's position, orientation, and jaw separation) by natural finger movements. Such dimensionality is not provided by traditional input devices. ...
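As a rough illustration of such a mapping (not the paper's implementation), the seven parameters can be derived from two tracked fingertips and the pointing-finger direction; the helper below and its coordinate conventions are assumptions.

import numpy as np

def gripper_command(thumb_tip, index_tip, index_dir):
    # Map two tracked fingers to a seven-parameter gripper command:
    # 3D position, 3D orientation (returned as a rotation matrix), jaw separation.
    position = (thumb_tip + index_tip) / 2.0             # gripper position (3 DOF)
    jaw = np.linalg.norm(index_tip - thumb_tip)          # jaw separation (1 DOF)
    z = index_dir / np.linalg.norm(index_dir)            # approach axis along the pointing finger
    x = index_tip - thumb_tip
    x = x - np.dot(x, z) * z                             # axis spanning the two fingertips
    x = x / (np.linalg.norm(x) + 1e-9)
    y = np.cross(z, x)                                   # completes a right-handed frame
    orientation = np.column_stack([x, y, z])             # orientation (3 DOF)
    return position, orientation, jaw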

Similar publications

Conference Paper
Full-text available
Playing a musical instrument involves a complex set of continuous gestures, both to play the notes and to convey expression. To learn an instrument, a student must learn not only the music itself but also how to perform these bodily gestures. We present MirrorFugue, a set of three interfaces on a piano keyboard designed to visualize hand gesture of...

Citations

... Pose estimation is an extensive application of computer vision that analyzes the individual body parts that make up a posture through key-point data analysis [1], for applications such as fitness (an AI-led instructor in place of a professional trainer) [2], [3], physical therapy (posture correction based on mapping and correcting postures) [4], [5], video game or movie production with enriched visuals (based on mapping onto avatars through infrared (IR) sensors), and robotics (for flexible and smooth reflexes with minimal recalibration) [6], [7]. Pose estimation applications comprise tracking changes in human posture and providing feedback in real time [8]. ...
... According to Table 2, the following pairs have been chosen to form the edges of the skeletal structure of the pose: (0, 1), (0, 2), (2, 4), (1, 3), (6, 8), (8, 10), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15), (6, 12), (12, 14), (14, 16), (5, 6). Each key point has an associated confidence score, based on which the key points are located. ...
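A minimal sketch of how such edge pairs and per-key-point confidence scores are typically combined when drawing the skeleton (the 0.3 threshold and the (x, y, confidence) layout are assumed values, not taken from the cited article):

import numpy as np

# Edge list quoted in the excerpt above (COCO-style key-point indices).
EDGES = [(0, 1), (0, 2), (2, 4), (1, 3), (6, 8), (8, 10), (5, 7), (7, 9),
         (5, 11), (11, 13), (13, 15), (6, 12), (12, 14), (14, 16), (5, 6)]

def skeleton_segments(keypoints, conf_thresh=0.3):
    # keypoints: (17, 3) array of (x, y, confidence); an edge is kept only
    # when both of its key points exceed the (assumed) confidence threshold.
    segments = []
    for i, j in EDGES:
        if keypoints[i, 2] >= conf_thresh and keypoints[j, 2] >= conf_thresh:
            segments.append((keypoints[i, :2], keypoints[j, :2]))
    return segments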
Article
Full-text available
Pose estimation of human activity recognition has been a keen area of interest in augmented reality experiences, gaming and robotics, animations, behavioral analysis, and more. One such exciting variant of pose estimation in the field of health and science is yoga pose estimation. This paper explores yoga pose estimation using deep learning networks. The research aims to build a system for estimating 45 different complex yoga asanas from 11,000 images using deep learning algorithms. This system is built using a Region-based Convolutional Neural Network (RCNN) to estimate the joints in the body, followed by a Convolutional Neural Network (CNN) for classifying the poses. The model is trained using the Yoga-82 (hierarchically labeled) dataset, a new dataset with complex pose variations mainly designed for hierarchical labeling. Next, it highlights the pose estimation task through ResNet models followed by an optimization algorithm, which increases the accuracy by 10%. The resultant accuracy is 90.5% for the ResNet50 model. Finally, it provides a solution for overlapping yoga poses, multi-person, in-air, and non-conventional poses using a dense network of 17 critical points for analysis and prediction.

1. Introduction. Pose estimation is an extensive application of computer vision that analyzes the individual body parts that make up a posture through key-point data analysis [1], for applications such as fitness (an AI-led instructor in place of a professional trainer) [2], [3], physical therapy (posture correction based on mapping and correcting postures) [4], [5], video game or movie production with enriched visuals (based on mapping onto avatars through infrared (IR) sensors), and robotics (for flexible and smooth reflexes with minimal recalibration) [6], [7]. Pose estimation applications comprise tracking changes in human posture and providing feedback in real time [8]. Yoga pose estimation has been an extensive area of research in clinical applications [9], behavioral analysis, human pose co-estimation (PCE), and prototype pose characterization [10]. Many models like PoseNet [11], OpenPose [12], as well as OpenCV contour detection [13], have been curated and customized to build AI-based pose estimators for medical [14], [15] and fitness-related applications [16]. Recent advances also involve pose estimation in 3D space using MediaPipe [17]. TensorFlow MoveNet [18] has also paved the way for designing an animated AI pose trainer [19]. Despite the range of models available and proposed for pose estimation, work on the variety of poses remains limited [20]. There are many instances, like the dog pose or cat pose, where the key joint points are obscured in the pose image [21]; in such a scenario, detection and prediction need to go hand in hand. Training on the dataset through the proposed Region-Based Convolutional Neural Network (RCNN) [22] model ensures that no such limitation is faced in everyday yoga pose applications. Further optimization gives robustness to the proposed model [23]. Pose estimation from an image or video frame is a highly challenging task. It depends on the scale and resolution of the image and other aspects like lighting conditions, fluctuations, occlusions, background conditions, and more [24].
The complexity increases when pose estimation is applied to fitness-related activities [25], mainly due to the wide variety and diversity of possible poses (e.g., thousands of yoga asanas), occlusions (e.g., obstruction of key-point locations due to varied poses), and different angles of appearance (front, back, and side views) [26].
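As a rough illustration of the ResNet50 classification stage mentioned above (an assumed torchvision-based sketch, not the authors' code; the RCNN joint-estimation stage, Yoga-82 data loading, and the optimization step are omitted):

import torch
import torch.nn as nn
from torchvision import models

# Assumed setup: ResNet50 backbone with its final layer replaced for the
# 45 yoga-asana classes described above; training and optimization are not shown.
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 45)

x = torch.randn(1, 3, 224, 224)        # one dummy RGB image
logits = model(x)                      # shape (1, 45)
print(logits.argmax(dim=1))            # predicted asana index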
... Face and hand gesture detection are two current areas of emphasis in the field. Cameras and computer vision algorithms have been used in a variety of ways to translate sign language [14,15,16]. Algorithms based on computer vision depend on artificial intelligence systems that are used to extract data from images. ...
Article
Abstract: In the real world, humans communicate with each other to share their thoughts or feelings. Here, the VSP (Virtual Smart Phone) is introduced to connect the physical and virtual worlds. VSP supports natural hand gestures, hand movement, and the Internet. VSP users can communicate with each other via a virtual mobile phone. Touch gestures, radio waves, and cloud computing technology are used to achieve this. Augmented reality (AR) has the potential to revolutionize the way people interact with technology. This paper presents a novel AR-based virtual smartphone, which is capable of providing users with an immersive experience. The proposed system is composed of a depth camera, a set of virtual reality (VR) glasses, and a smartphone application. The system is able to track the user's hand movements and gestures, allowing for a more natural interaction with the virtual smartphone. The user is able to manipulate the virtual smartphone display in the same way as a physical device. Dependency on a physical cell phone is removed with VSP. By touching the user's palm, dialling a call, watching movies, or viewing images on the palm or wrist is possible. Calls are placed and terminated using touch gestures. Touch-based engagement in communication is possible using VSP. Furthermore, the proposed system is capable of recognizing various types of gestures, such as swipes and taps, which can be used to control the virtual display. The paper provides an overview of the system architecture and implementation details. The performance of the system is evaluated in terms of accuracy and latency. The results demonstrate that the proposed system is capable of providing a highly interactive, immersive experience. Keywords: Augmented Reality (AR); Computer Vision; Gesture; VSP
... However, for gesture interfaces, the definition of trigger signals is challenging. Besides the use of dedicated mechanical devices, such as a thumb switch, the use of dedicated trigger gestures [Choumane et al., 2010; Segen and Kumar, 1999], e.g., tapping in space [Jang et al., 2015], is a suitable option in line with a gestural interface. However, gesture recognition is still error-prone and suffers from per-user differences. ...
Thesis
Full-text available
Lifting virtual reality off the rollercoaster of hype and putting it on a substantial track through our lives requires applications that sustainably bring benefits that exceed the costs to the user. Also, while in the past a lot of research activity has been engaged in increasing and carving out the benefits of immersive technology, there has been much less in reducing the actual costs. The recent rise of consumer HMDs radically changed the possibilities in this endeavor. We believe that the low price and high quality were just door openers, but the main and unique feature is the small device footprint. People can suddenly carry around and use affordable, high-fidelity virtual reality devices wherever they want, without the need to visit special-purpose facilities. In this thesis, we therefore look into the possibilities and unique challenges this raises for (office) desk-based working scenarios, as they are ubiquitous in data analysis. As part of our contribution, we first characterize this scenario, introduce the term deskVR, and name the technical challenges that come with it. Furthermore, we tackle specific demands in two pillars of interaction in virtual reality, selection & manipulation and navigation. These demands are mainly characterized by the fact that a user will most of the time be seated, and the integration of additional hardware, such as controllers and advanced tracking devices, would again increase the costs. As a result, we come up with a new seated travel technique and manifold support for hands-free interaction, including desk-aligned passive haptic menus. Then, we investigate passive and active methods to prevent and reduce cybersickness, for which one primary driver is virtual travel; for us, tackling cybersickness is one of the critical tasks that have to be solved to integrate virtual reality into everyday life successfully. Finally, driven by the needs of our partners in neuroscience, we apply the methods and findings made in this thesis to a prototypical application framework for immersive 3D graph exploration, serving as proof of concept for the integrability of virtual reality into desk-based working scenarios. In the graph visualization domain, we then also propose new vertex-positioning and edge-bundling methods that address challenges arising with the performed up-projection into 3D interactive space.
... The problem of isolating the object of interest, i.e., the gesticulating signer's palm, from a static 2D input image is a well-explored subject, and successful methods are aplenty. Examples of such methods can be found here [2,3,4,5,6]. These methods can easily be implemented as an extra step in our extensible and modular framework. ...
Preprint
Full-text available
Our work addresses the problem of automatically recognising a Sign Language alphabet from a given still image obtained under arbitrary illumination. To solve this problem, we designed a computational framework that is founded on the notion that shape features are robust to illumination changes. The statistical classifier part of the framework uses a set of weighted, self-learned features, i.e., binary relationships between pairs of pixels. There are two possible pairings: an edge pixel with another edge pixel, and an edge pixel with a non-edge pixel. This two-pairing arrangement allows a consistent 2D image representation for all letters of the Sign Language alphabets, even if they were to be captured under varying illumination settings. Our framework, which is modular and extensible, paves the way for a system to perform robust (to illumination changes) recognition of the Sign Language alphabets. We also provide arguments to justify our framework design in terms of its fitness for real-world application.
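A minimal sketch of the pixel-pairing idea described above, assuming OpenCV's Canny detector as the edge extractor and a fixed, hypothetical list of pixel pairs; the self-learning and weighting of features from the paper are not reproduced:

import cv2
import numpy as np

def pair_features(gray, pairs):
    # gray:  a single-channel hand image.
    # pairs: a hypothetical, fixed list of ((r1, c1), (r2, c2)) pixel locations.
    # Returns, for each pair, two bits: edge-edge and edge/non-edge.
    edges = cv2.Canny(gray, 100, 200) > 0      # assumed edge-detector choice
    feats = []
    for p, q in pairs:
        e1, e2 = bool(edges[p]), bool(edges[q])
        feats.append((int(e1 and e2), int(e1 != e2)))
    return np.array(feats, dtype=np.uint8)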
... On the first attempt, the false rejection rate is around 2-3 percent and the false acceptance rate is less than 0.0001 percent. Each standalone unit can store 48 fingerprint templates, which may be expanded to 846 by installing an additional memory package [11]. ...
... Gesture recognition has become more interesting because of recent improvements in the field of computer vision, and defining dedicated trigger gestures has been shown to basically work [7,17,26]. But especially when defining a trigger, approaches in both recognition fields suffer from high detection latency [13], since a gesture has to be finished or a word has to be spoken to be detected correctly. ...
Conference Paper
Full-text available
We extended BlowClick, an NVVI metaphor for clicking, by adding machine learning methods to more reliably classify blowing events. We found that a support vector machine with a Gaussian kernel performed best, with at least the same latency and more precision than before. Furthermore, we added acoustic feedback to the NVVI trigger, which increases the user's confidence. With this extended technique, we conducted a user study with 33 participants and confirmed that it is possible to use NVVI as a reliable trigger as part of a hands-free point-and-click interface.
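For reference, a Gaussian-kernel (RBF) support vector machine of the kind described can be sketched with scikit-learn as follows; the acoustic features and labels below are placeholders, not the study's data:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder acoustic feature vectors and labels (1 = blowing event).
X = np.random.rand(200, 13)            # e.g. 13 spectral features, purely illustrative
y = np.random.randint(0, 2, 200)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))              # predicted classes for five feature frames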
... It tracks hands well but requires high processing power. Segen and Kumar [7] used barehand interaction for their GestureVR system, but it requires a uniform background. Stauffer and Grimson [8] proposed a method that updates the background model adaptively. ...
Article
Full-text available
Using one’s hands in human–computer interaction increases both the effectiveness of computer usage and the speed of interaction. One way of accomplishing this goal is to utilize computer vision techniques to develop hand-gesture-based interfaces. A video database system is one application where a hand-gesture-based interface is useful, because it provides a way to specify certain queries more easily. We present a hand-gesture-based interface for a video database system to specify motion and spatiotemporal object queries. We use a regular, low-cost camera to monitor the movements and configurations of the user’s hands and translate them to video queries. We conducted a user study to compare our gesture-based interface with a mouse-based interface on various types of video queries. The users evaluated the two interfaces in terms of different usability parameters, including the ease of learning, ease of use, ease of remembering (memory), naturalness, comfortable use, satisfaction, and enjoyment. The user study showed that querying video databases is a promising application area for hand-gesture-based interfaces, especially for queries involving motion and spatiotemporal relations.
... In addition, these systems rely on cameras to capture high-resolution video frames, which brings privacy concerns, as the raw camera data can be leaked to an adversary [52,64]. While prior vision methods [55,56] have leveraged shadows to infer human gestures, they work strictly under a single light source and do not apply in a natural indoor setting with multiple light sources. ...
Conference Paper
Full-text available
We present LiSense, the first-of-its-kind system that enables both data communication and fine-grained, real-time human skeleton reconstruction using Visible Light Communication (VLC). LiSense uses shadows created by the human body from blocked light and reconstructs 3D human skeleton postures in real time. We overcome two key challenges to realize shadow-based human sensing. First, multiple lights on the ceiling lead to diminished and complex shadow patterns on the floor. We design light beacons enabled by VLC to separate light rays from different light sources and recover the shadow pattern cast by each individual light. Second, we design an efficient inference algorithm to reconstruct user postures using 2D shadow information with a limited resolution collected by photodiodes embedded in the floor. We build a 3 m x 3 m LiSense testbed using off-the-shelf LEDs and photodiodes. Experiments show that LiSense reconstructs the 3D user skeleton at 60 Hz in real time with 10 degrees mean angular error for five body joints.
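One simple way to realize such per-light separation is to modulate each LED at its own beacon frequency and demodulate the photodiode signal per frequency; the sketch below assumes that scheme and illustrative frequencies, which may differ from LiSense's actual VLC encoding.

import numpy as np

FS = 2000.0                                                  # assumed sampling rate (Hz)
BEACONS = {"led_A": 200.0, "led_B": 310.0, "led_C": 450.0}   # assumed beacon frequencies (Hz)

def per_light_intensity(samples):
    # Amplitude each LED contributes at one photodiode; a light whose rays are
    # blocked by the body contributes a noticeably weaker beacon amplitude.
    spectrum = np.abs(np.fft.rfft(samples)) / len(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / FS)
    return {name: spectrum[np.argmin(np.abs(freqs - f))]
            for name, f in BEACONS.items()}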
... This method finds the angle between two points on the finger contour. If the angle between the two points is within the threshold range, the middle point is considered a fingertip [6,15,16]. ...
... After extracting the hand contour, the K-Curvature algorithm is applied. This algorithm is commonly used to find fingertips [15,16]. Referring to Fig. 5, the algorithm forms vectors from each contour point A(i) to its neighbor points B and C at a distance of K. The calculations for points B and C are shown in (1) and (2). ...
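A minimal sketch of the K-Curvature test described above (the contour format, K, and the angle threshold are assumed values):

import numpy as np

def k_curvature_fingertips(contour, k=20, angle_thresh_deg=60.0):
    # contour: (N, 2) array of ordered hand-contour points (e.g. from cv2.findContours).
    # A point is flagged as a fingertip candidate when the angle between the
    # vectors to its k-th neighbours on either side falls below the threshold.
    n = len(contour)
    tips = []
    for i in range(n):
        a = contour[i].astype(float)
        b = contour[(i - k) % n].astype(float)     # neighbour k points behind (point B)
        c = contour[(i + k) % n].astype(float)     # neighbour k points ahead  (point C)
        v1, v2 = b - a, c - a
        cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if angle < angle_thresh_deg:
            tips.append(i)
    return tips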
Conference Paper
Full-text available
Gestures are an important aspect of human interaction, and also of human-computer interaction. Gesture recognition is the mathematical interpretation of a human motion by a computing device. Hand gestures are often used for input commands on personal computers. Recognizing hand gestures as input allows the user to access the computer interactively and makes interaction more natural. This paper presents a finger detection application using Kinect. Kinect is a depth sensor that effectively captures gestures in real time. To detect and recognize fingertips, the details of the captured hand image must be extracted using image processing methods. In this paper, the proposed method detects and recognizes fingertips using the K-Curvature algorithm. Finally, a finger-counting application is applied, and the proposed method is discussed at the end of this paper. The experimental results show an acceptable average fingertip detection accuracy of 73.7% and an average processing time of 15.73 ms. Considering this result, the application of the proposed method can be extended to a hand rehabilitation system.