Figure 1 - uploaded by Hugo Jair Escalante
Color rendering of depth images from the gesture challenge database, recorded with a Kinect™ camera.
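For readers unfamiliar with this kind of rendering, the sketch below shows one common way to color-render a raw depth frame: normalize the depth values and map them through a colormap. The file name, colormap, and handling of missing depth are illustrative assumptions, not details of how the challenge images were produced.

```python
# Minimal sketch (assumed workflow): color-render a depth frame with a colormap.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

depth = plt.imread("depth_frame.png").astype(np.float32)   # hypothetical 2-D depth map
valid = depth > 0                                           # zeros often mark missing depth
lo, hi = depth[valid].min(), depth[valid].max()

norm = np.zeros_like(depth)
norm[valid] = (depth[valid] - lo) / (hi - lo + 1e-6)        # scale valid depths to [0, 1]

rgb = cm.jet(norm)[..., :3]                                 # map normalized depth to RGB
rgb[~valid] = 0.0                                           # paint missing pixels black

plt.imshow(rgb)
plt.axis("off")
plt.show()
```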

Source publication
Conference Paper
Full-text available
We organized a challenge on gesture recognition: http://gesture.chalearn.org. We made available a large database of 50,000 hand and arm gestures video-recorded with a Kinect™ camera, providing both RGB and depth images. We used the Kaggle platform to automate submissions and entry evaluation. The focus of the challenge is on “one-shot-learning”, whic...
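As a rough illustration of the one-shot-learning setting (a single labeled example per gesture class), the sketch below classifies a test sequence by its dynamic time warping (DTW) distance to each training template. The feature extraction is left abstract and all names are hypothetical; this is not the challenge's baseline code.

```python
# Illustrative one-shot classification: nearest template under DTW distance.
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two sequences of per-frame feature vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_one_shot(test_seq, templates):
    """templates maps each gesture label to its single training sequence."""
    return min(templates, key=lambda label: dtw_distance(test_seq, templates[label]))
```

In the challenge itself, a test sequence may contain several gestures, so a temporal segmentation step would precede this matching.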

Similar publications

Article
Full-text available
Many presentations these days are given with the help of a presentation tool. Lecturers at universities and researchers at conferences use such tools to order the flow of the presentation and to help audiences follow its points. Presenters control the presentation tools using a mouse and keyboard, which keeps the presenters always beside the...
Article
Full-text available
A Natural User Interface (NUI) is a medium of interaction between a user and a machine through a natural entity (air), in the form of the user's gestures, recognized by the machine using gesture recognition. Gesture recognition is the detection of human bodily motion and behavior. Advances have been made using advanced cameras, hardware devices like...
Article
Full-text available
Sign language recognition (SLR) can provide a helpful tool for communication between the deaf and the external world. This paper proposed a component-based, vocabulary-extensible SLR framework using data from surface electromyographic (sEMG) sensors, accelerometers (ACC), and gyroscopes (GYRO). In this framework, a sign word was considered to be...
Conference Paper
Full-text available
The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of the involved visual cues (e.g., finger and lip movements, subtle facial expressions, body pose, etc.), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. In order to promote the research...

Citations

... During the last 20 years, quite a few hand gesture datasets have been published for machine learning purposes. These include the Cambridge hand gesture dataset [18], the Naval Air Training and Operating Procedures Standardization (NATOPS) aircraft handling signals database [16], the Keck gesture dataset [15], ChaLearn [19], Sheffield KInect Gesture (SKIG) [20], and the Microsoft Research Cambridge-12 (MSRC-12) Kinect gesture dataset [21], to name just a few. These datasets were devised for general gesture recognition purposes, none of them focusing specifically on human-drone interaction, an area lately attracting increasing research interest. ...
Conference Paper
Full-text available
Camera-equipped Unmanned Aerial Vehicles (UAVs, or drones) have revolutionized several application domains, with a steadily increasing degree of cognitive autonomy in commercial drones paving the way for unprecedented robotization of daily life. Dynamic cooperation of UAVs with human collaborators is typically necessary during a mission; a fact that has led to various solutions for high-level UAV-operator interaction. Hand gestures are an effective way of facilitating this remote drone handling, giving rise to new gesture languages for visual communication between operators and autonomous UAVs. This paper reviews all the available languages which could be used or have been created for this purpose, as well as relevant gesture recognition datasets for training machine learning models. Moreover, a novel, generic, base gesture language for handling camera-equipped UAVs is proposed, along with a corresponding, large-scale, publicly available video dataset. The presented language can easily and consistently be extended in the future to more specific scenarios/profiles, tailored for particular application domains and/or additional UAV equipment (e.g., aerial manipulators/arms). Finally, we evaluate: a) the performance of state-of-the-art gesture recognition algorithms on the proposed dataset, in a quantitative and objective manner, and b) the intuitiveness, effectiveness and completeness of the proposed gesture language, in a qualitative and subjective manner.
... Then the centroid of each cluster was found mathematically, the states of the FSM were determined, and finally the gesture was recognized. Guyon et al. (2012) described the ChaLearn gesture dataset, recorded using a Kinect camera and including Wrestling Referee Signals and Volleyball Referee Signals. Hari and Wilscy (2014) identified umpire frames from each scene of a cricket game and analyzed vertical and horizontal intensity projection profiles. ...
Article
Full-text available
Recognition of hand gestures (hand signals) is an active research area for human-computer interaction with many possible applications. Automatic machine vision-based hand gesture interfaces for real-time applications require fast and extremely robust human, pose and hand detection, and gesture recognition. Attempting to recognize gestures performed by official referees in sports videos (such as basketball games) places tough requirements on the image segmentation techniques. Here we propose an image segmentation technique based on histogram of oriented gradients (HOG) and local binary pattern (LBP) features, which allows recognizing the signals of a basketball referee from recorded game videos; we achieved an accuracy of 95.6% using LBP features and a support vector machine for classification. Our results are relevant for real-time analysis of basketball games.
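A HOG/LBP-plus-SVM pipeline of the kind described above can be approximated with scikit-image and scikit-learn as sketched below; the parameter values (cell sizes, LBP radius, SVM kernel) are guesses for illustration, not the authors' settings.

```python
# Rough sketch of a HOG + LBP + SVM classifier for referee-signal images.
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def extract_features(gray_image):
    """Concatenate HOG descriptors with a uniform-LBP histogram."""
    hog_vec = hog(gray_image, orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(gray_image, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([hog_vec, lbp_hist])

def train_classifier(train_images, train_labels):
    """train_images are assumed to be pre-cropped grayscale referee regions."""
    X = np.array([extract_features(img) for img in train_images])
    clf = SVC(kernel="rbf")
    return clf.fit(X, train_labels)
```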
... Therefore, this dataset was used to train our networks from scratch. The CGD dataset was first proposed in [47] and has since been widely used to evaluate the performance of OSLHGR algorithms. In the videos, each performer is recorded in front of a fixed Kinect™ camera. ...
Article
Full-text available
Deep convolutional neural networks (CNNs) have made great breakthroughs in the field of vision-based gesture recognition, but it is challenging to deploy these high-performance networks on resource-constrained mobile platforms and to acquire large numbers of labeled samples for deep training of CNNs. Furthermore, there are application scenarios with only a few samples, or even a single one, for a new gesture class, so that recognition methods based on CNNs cannot achieve satisfactory classification performance. In this paper, a well-designed lightweight network based on I3D, with spatial-temporal separable 3D convolutions and the Fire module, is proposed as an effective tool for the extraction of discriminative features. Capacity gained by deep training on large numbers of samples from related categories is then transferred to enhance the learning ability of the proposed network instead of training it from scratch. In this way, one-shot learning hand gesture recognition (OSLHGR) is carried out by a decision based on a distance measure. Moreover, a mechanism of discrimination evolution, incorporating newly introduced samples and voting integration over multiple classifiers, is established to improve the learning and classification performance of the proposed method. Finally, a series of experiments and tests on the IsoGD and Jester datasets demonstrate the effectiveness of our improved lightweight I3D. Meanwhile, a specific dataset of gestures with varying angles and directions, BSG 2.0, and the ChaLearn gesture dataset (CGD) are used for the test of OSLHGR. The results on different experimental platforms validate the advantages of the proposed method in classification accuracy and real-time response speed.
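The spatial-temporal separable 3D convolutions mentioned in this abstract factorize a full 3-D kernel into a spatial (1×3×3) convolution followed by a temporal (3×1×1) convolution. Below is a hedged PyTorch sketch of such a block; channel counts, normalization, and activation choices are assumptions, not the authors' exact lightweight I3D design.

```python
# Sketch of a spatial-temporal separable 3D convolution block (PyTorch).
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Spatial convolution: 1 x 3 x 3 kernel over (T, H, W) volumes.
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        # Temporal convolution: 3 x 1 x 1 kernel.
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):          # x: (batch, channels, time, height, width)
        x = self.spatial(x)
        x = self.temporal(x)
        return self.relu(self.bn(x))

# Example: a clip of 16 RGB frames at 112 x 112 resolution (assumed sizes).
clip = torch.randn(1, 3, 16, 112, 112)
print(SeparableConv3d(3, 64)(clip).shape)   # -> torch.Size([1, 64, 16, 112, 112])
```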
... Section 2) to tackle the prevalent lack of inverse models. Doing so, among other benefits, readily unlocks the capacity of bidirectional effect associativity as well as performing motor imagery (Jeannerod, 2006). Exploiting language for action understanding. ... [Spilled table of per-dataset usage counts in the surveyed literature, ranging from KTH (Schüldt et al., 2004) with 15 uses down to single-use datasets, including ChaLearn Gesture (Guyon et al., 2012).]
Article
Full-text available
Understanding and defining the meaning of “action” is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for action execution. Unfortunately, to this day we still lack both a clear understanding of the concept of an action and a set of established criteria that ultimately characterize an action. In this survey, we thus first review existing ideas and theories on the notion and meaning of action. Subsequently, we discuss the role of action in robotics and attempt to give a seminal definition of action in accordance with its use in robotics research. Given this definition we then introduce a taxonomy for categorizing action representations in robotics along various dimensions. Finally, we provide a meticulous literature survey on action representations in robotics where we categorize relevant literature along our taxonomy. After discussing the current state of the art we conclude with an outlook towards promising research directions.
... [Spilled table ("# Usage") of per-dataset usage counts in the surveyed literature, from KTH (Schüldt et al. 2004) with 15 uses down to single-use datasets, including ChaLearn Gesture (Guyon et al. 2012).] ...
Preprint
Full-text available
Understanding and defining the meaning of "action" is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for action execution. Unfortunately, to this day we still lack both a clear understanding of the concept of an action and a set of established criteria that ultimately characterize an action. In this survey we thus first review existing ideas and theories on the notion and meaning of action. Subsequently we discuss the role of action in robotics and attempt to give a seminal definition of action in accordance with its use in robotics research. Given this definition we then introduce a taxonomy for categorizing action representations in robotics along various dimensions. Finally, we provide a systematic literature survey on action representations in robotics where we categorize relevant literature along our taxonomy. After discussing the current state of the art we conclude with an outlook towards promising research directions.
... Then the centroid of each cluster is found mathematically, the FSM states are determined, and finally the gesture is recognized. Guyon [8] described the ChaLearn gesture dataset, recorded using a Kinect camera and including Referee Wrestling Signals and Referee Volleyball Signals. Trigueiros et al. [9] proposed a vision-based system which is able to interpret dynamic and static gestures of the referee. ...
Article
Full-text available
Hand gestures, either static or dynamic, for human-computer interaction in real-time systems are an area of active research with many possible applications. However, vision-based hand gesture interfaces for real-time applications require fast and extremely robust hand detection and gesture recognition. Attempting to recognize gestures performed by officials in typical sports videos places tremendous computational requirements on the image segmentation techniques. Here we propose an image segmentation technique based on Histogram of Oriented Gradients (HOG) features that allows recognizing the signals of the basketball referee from videos. We achieve an accuracy of 97.5% using a Support Vector Machine (SVM) for classification.
... Thanks to the immense popularity of the Microsoft Kinect 3D camera, there has been a surge in interest in developing methods for human gesture and action recognition from 3D skeletal data and depth images (Shotton et al., 2011; Wu and Shao, 2014b). Also, a number of new datasets (Escalera et al., 2013; Fothergill et al., 2012; Guyon et al., 2012; Wang et al., 2012) have provided researchers with the opportunity to design novel representations and algorithms, and test them on a much larger number of sequences. ...
... Recently, the gesture recognition domain has been stimulated by the collection and publication of large corpora. One such corpus was made available for the ChaLearn 2013 (Guyon et al., 2012) multi-modal gesture recognition competition hosted on Kaggle. This corpus was recorded with a Microsoft Kinect and includes RGB images, depth images and audio (the users say the gesture out loud). ...
... A two-layered HGR scheme is further exploited to reduce the computational complexity. The overall framework that integrates all of the above is evaluated on data from the ChaLearn Gesture Dataset (CGD2011) [19]. ...
... The ChaLearn Gesture Dataset (CGD2011), which is designed for one-shot learning, is used in the experiments. CGD2011 is the largest gesture dataset recorded with Kinect [19]; it consists of 50,000 gestures (grouped into 500 batches, each batch including 47 sequences and each sequence containing 1-5 gestures drawn from one of 30 small gesture vocabularies of 8-15 gestures), with a frame size of 240 × 320 at 10 frames/second, recorded by 20 different users. In our experiments, the Levenshtein distance is used to evaluate the HGR performance, as in the ChaLearn gesture challenge. ...
... Recognition performance using the second layer, the first two layers, and all three layers on the first 20 development batches of CGD2011 [19] (TeLev is the average Levenshtein distance) ...
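The TeLev score mentioned in these snippets is based on the Levenshtein (edit) distance between the predicted and the true sequences of gesture labels. A small sketch of that metric follows; the normalization by the total number of true gestures is one common convention and is assumed here.

```python
# Sketch of a Levenshtein-based gesture recognition score (TeLev-style).
def levenshtein(pred, truth):
    """Edit distance between predicted and true lists of gesture labels."""
    n, m = len(pred), len(truth)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i - 1][j - 1] + (pred[i - 1] != truth[j - 1])
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, sub)
    return D[n][m]

def telev(predicted_seqs, true_seqs):
    """Summed edit distance normalized by the total number of true gestures (assumed)."""
    total = sum(levenshtein(p, t) for p, t in zip(predicted_seqs, true_seqs))
    return total / sum(len(t) for t in true_seqs)
```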
Article
Full-text available
This paper proposes a novel method for real-time gesture recognition. Aiming at improving the effectiveness and accuracy of HGR, a spatial pyramid is applied to segment the gesture sequence into linguistic units, and a temporal pyramid is proposed to obtain a time-related histogram for each single gesture. These two pyramids help to extract more comprehensive information about human gestures from RGB and depth video. A two-layered HGR scheme is further exploited to reduce the computational complexity. The proposed method achieves high accuracy and low computational complexity on the ChaLearn Gesture Dataset, which comprises more than 50,000 recorded gesture sequences.
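The temporal pyramid described in this abstract can be pictured as splitting a gesture's per-frame feature codes into progressively finer time segments and concatenating one histogram per segment. The sketch below is a generic illustration of that idea with assumed parameters, not the paper's implementation.

```python
# Generic temporal pyramid of histograms over per-frame feature codes.
import numpy as np

def temporal_pyramid_histogram(frame_codes, num_bins, levels=3):
    """frame_codes: 1-D array of per-frame codeword indices for one gesture."""
    frame_codes = np.asarray(frame_codes)
    pyramid = []
    for level in range(levels):
        segments = np.array_split(frame_codes, 2 ** level)   # 1, 2, 4, ... segments
        for seg in segments:
            hist, _ = np.histogram(seg, bins=num_bins, range=(0, num_bins))
            pyramid.append(hist / max(len(seg), 1))           # normalize per segment
    return np.concatenate(pyramid)

# Example: 40 frames quantized into 64 codewords -> (1 + 2 + 4) * 64 features.
codes = np.random.randint(0, 64, size=40)
print(temporal_pyramid_histogram(codes, num_bins=64).shape)   # (448,)
```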
... Long Short-Term Memory (LSTM) networks have proven successful for this task [22]. With the release of the ChaLearn Gesture Challenge data set [23], there have been a number of works on one-shot learning, in which a single training example is used per gesture class [24], [25], [26]. A third focus is on developing methods that work on well-established gesture sets, such as sign languages. ...
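For reference, the LSTM-based approach cited above can be pictured as a recurrent classifier over per-frame feature vectors (e.g., skeleton joint coordinates). The PyTorch sketch below uses assumed dimensions and a single recurrent layer purely for illustration.

```python
# Minimal LSTM sequence classifier sketch for gesture recognition (PyTorch).
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    def __init__(self, feat_dim=60, hidden=128, num_classes=12):  # assumed sizes
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):               # x: (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(x)      # h_n: (1, batch, hidden)
        return self.head(h_n[-1])       # logits per gesture class

# Example: a batch of 4 sequences, 30 frames each, 60-D features per frame.
logits = GestureLSTM()(torch.randn(4, 30, 60))
print(logits.shape)                     # torch.Size([4, 12])
```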
... There is a significant amount of previous research on gesture recognition and benchmarking. Different data sets with recorded gestures are available, such as the NATOPS gesture database [13], [14] and the ChaLearn gesture challenge [15]. However, none of the previously recorded data sets were directly usable in our scenario. ...