Fig 3 - uploaded by Yangsheng Wang
A recognition process of head shake.

Source publication
Conference Paper
Full-text available
Head gestures such as nodding and shaking are a common form of body language in human communication, and their recognition plays an important role in the development of Human-Computer Interaction (HCI). Since a head gesture is continuous motion over a sequential time series, the key problems of recognition are to track multi-vi...

Context in source publication

Context 1
... an example, Fig. 3 shows the recognition process of head shake. Our algorithm is initialized by the output of a Haar-like-feature-based boosted cascade face detector. Once a frontal face is detected, the color model is created from the aligned region and the size of the head in the image is obtained. The next twelve frames are then taken out. With a Kalman filter ...
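The tracking step described here can be sketched with a constant-velocity Kalman filter over the detected head center. The following is a minimal pure-NumPy illustration; the state layout, noise values, and the `track_head` helper are assumptions for this sketch, not the authors' implementation:

```python
import numpy as np

def make_kalman(dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity Kalman filter matrices for a 2-D head center.

    State: [x, y, vx, vy]; measurement: [x, y].
    The noise levels q and r are illustrative assumptions.
    """
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)
    Q = q * np.eye(4)  # process noise (assumed)
    R = r * np.eye(2)  # measurement noise (assumed)
    return F, H, Q, R

def track_head(measurements, dt=1.0):
    """Filter a sequence of noisy (x, y) head-center detections."""
    F, H, Q, R = make_kalman(dt)
    x = np.array([*measurements[0], 0.0, 0.0])  # init from first detection
    P = np.eye(4)
    estimates = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the new detection
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.asarray(z) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        estimates.append(x[:2].copy())
    return np.array(estimates)
```

Feeding the per-frame detections through such a filter smooths detector jitter and supplies a prediction for frames where detection fails.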

Similar publications

Article
Full-text available
Hand gestures are an important type of natural language used in many research areas such as human-computer interaction and computer vision. Hand gesture recognition requires the prior determination of the hand position through detection and tracking. One of the most efficient strategies for hand tracking is to use 2D visual information such as col...
Conference Paper
Full-text available
With the growing usage of computer systems in daily life, a natural and intuitive Human Computer Interaction (HCI) method to support the embedding of computer systems in our environment seems necessary. Gestures are of utmost importance for the design of natural user interfaces. Hand gesture recognition to extract meaningful expressions from the hu...
Article
Full-text available
The number of sold Nintendo Wiimotes (Wii Remote Controllers) exceeded in the first half of 2008 the number of sold Tablet PCs. This success made it very quickly the most widespread computer input device worldwide. Considering the fact that Nintendo also offers an API for the Windows platform which allows accessing the Wiimote inner states through the integr...

Citations

... The position and orientation of the head were initially identified using the multi-view model (MVM), and the hidden Markov model (HMM) was then adopted to recognize head gestures via statistical inference. Subsequently, Lu et al. [15] applied the Bayesian network framework to the MVM to identify head gestures. Later, color information was incorporated into the Bayesian network to enhance robustness and performance [16]. ...
Article
Full-text available
This paper proposes an adaptive Kalman filter (AKF) to improve the performance of a vision-based human machine interface (HMI) applied to a video game. The HMI identifies head gestures and decodes them into corresponding commands. Face detection and feature tracking algorithms are used to detect optical flow produced by head gestures. Such approaches often fail due to changes in head posture, occlusion and varying illumination. The adaptive Kalman filter is applied to estimate motion information and reduce the effect of missing frames in a real-time application. Failure in head gesture tracking eventually leads to malfunctioning game control, reducing the scores achieved, so the performance of the proposed vision-based HMI is examined using a game scoring mechanism. The experimental results show that the proposed interface has a good response time, and the adaptive Kalman filter improves the game scores by ten percent.
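One common way to make a Kalman filter "adaptive" in the sense described in this abstract is to re-estimate the measurement-noise covariance from recent innovation statistics, so that occlusion or missing frames inflate the noise estimate and the filter leans on its prediction. The sketch below is a 1-D illustration of that idea with assumed window size and noise values, not the paper's exact AKF:

```python
import numpy as np

def adaptive_kalman_1d(zs, q=1e-3, r0=1.0, window=5):
    """1-D adaptive Kalman filter (random-walk motion model).

    R is re-estimated from a sliding window of innovations, so a burst
    of tracking noise inflates R and the filter trusts its prediction
    more. Parameters q, r0, and window are illustrative assumptions.
    """
    x, p, r = zs[0], 1.0, r0
    innovations, estimates = [], []
    for z in zs[1:]:
        # predict (random walk)
        p = p + q
        # innovation and its empirical variance over the window
        nu = z - x
        innovations.append(nu)
        recent = innovations[-window:]
        # innovation variance ~ p + r, hence r ~ mean(nu^2) - p
        r = max(np.mean(np.square(recent)) - p, 1e-6)
        # update
        k = p / (p + r)
        x = x + k * nu
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)
```

On a steady signal the estimated R settles near the true measurement variance, giving a small, stable gain; a sudden jump in innovation magnitude raises R and damps the response.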
... Head poses were first detected by a multi-view model (MVM), and then a hidden Markov model (HMM) was used as the statistical inference model for head gesture recognition. Moreover, Lu et al. [4] presented a Bayesian network (BN) based framework into which the MVM and the statistical inference model were integrated for recognition. The decision on a head gesture was made by comparing the maximum posterior, the output of the BN, with thresholds. ...
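The decision rule described in this snippet, scoring a pose sequence under each gesture model and comparing the best score to a threshold, can be illustrated with a tiny discrete-HMM forward pass. The two toy "nod"/"shake" models and the threshold below are invented for illustration, not taken from the cited papers:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm (scaled to avoid underflow)."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

def classify(obs, models, threshold=-20.0):
    """Pick the gesture whose HMM scores highest; reject if even the
    best score falls below the threshold (threshold is an assumption)."""
    scores = {name: forward_loglik(obs, *m) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None

# Toy pose symbols: 0=frontal, 1=left, 2=right, 3=up, 4=down
pi = np.array([1.0, 0.0])
A = np.array([[0.6, 0.4], [0.4, 0.6]])
B_shake = np.array([[0.2, 0.7, 0.05, 0.025, 0.025],
                    [0.2, 0.05, 0.7, 0.025, 0.025]])
B_nod = np.array([[0.2, 0.025, 0.025, 0.7, 0.05],
                  [0.2, 0.025, 0.025, 0.05, 0.7]])
models = {"shake": (pi, A, B_shake), "nod": (pi, A, B_nod)}
```

A shake sequence alternates left/right poses, which only the shake model assigns high probability to, so its forward score dominates and clears the rejection threshold.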
Conference Paper
This paper presents a vision-based human machine interface (HMI) for the Xbox. It applies feature-tracking algorithms to recognize the user's head gestures and translates them into commands for the game. The pyramidal implementation of Lucas-Kanade feature tracking is used to trace optical flow across a sequence of frames. The experimental results show the feasibility of the proposed vision-based interface, which performs reasonably compared with the standard game pad.
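The core Lucas-Kanade step, solving a small least-squares system built from image gradients, can be sketched for a single pyramid level on a synthetic frame pair. This is an illustration of the technique, not the Xbox interface's implementation; a real system would use the pyramidal version (e.g. OpenCV's `calcOpticalFlowPyrLK`), and the helper names here are assumptions:

```python
import numpy as np

def lucas_kanade_step(I1, I2):
    """Estimate a single translational flow (dy, dx) between two frames
    by least squares on the brightness-constancy equation
    Iy*dy + Ix*dx + It = 0, taken over the whole window."""
    Iy, Ix = np.gradient(I1)   # spatial gradients (axis 0 = rows)
    It = I2 - I1               # temporal gradient
    A = np.stack([Iy.ravel(), Ix.ravel()], axis=1)
    b = -It.ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow                # (dy, dx)

def gaussian_frame(shift_y=0.0, shift_x=0.0, size=48, sigma=6.0):
    """Synthetic frame: a smooth Gaussian blob, optionally shifted
    by a sub-pixel amount (stands in for a tracked face patch)."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    cy, cx = size / 2 + shift_y, size / 2 + shift_x
    return np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
```

Because the linearization only holds for small displacements, large head motions need the pyramidal scheme: the flow is estimated on coarse, downsampled levels first and refined at finer ones.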
... The main disadvantage is the accuracy in detecting small amounts of motion. The systems introduced in [1, 2] have been based on color transforms to detect facial skin color. In [3], the mobile contours have been first enhanced using pre-filtering and then transformed into the log-polar domain. ...
Article
Full-text available
Today computers have become more accessible and easier to use for everyone except the disabled. Though some progress has been made on this issue, it has focused either on a specific disability or on solutions too expensive for real-world scenarios. Major contributions have been made for people lacking fine motor skills and for speech-based interfaces, but what if a user lacks both? In this regard we have proposed an integrated video-based system that enables the user to give commands by head gestures and enter text by lip-reading. Currently, certain gestures and a limited vocabulary are recognized by the system, but this could be extended within the current framework.
... In [15] the authors embedded color information and a subspace method in a Bayesian network for head gesture recognition, but the system fails when only slight head motion is involved. Similarly, [16] uses a color model to detect the head and hair and then extracts invariant moments, which are used to detect three distinct head gestures with a discrete HMM. ...
Article
Full-text available
A functional head-movement corpus and convolutional neural networks (CNNs) for detecting head-movement functions are presented for analyzing the multiple communicative functions of head movements in multiparty face-to-face conversations. First, focusing on the multifunctionality of head movements, i.e., that a single head movement can simultaneously perform multiple functions, this paper defines 32 non-mutually-exclusive function categories, whose genres are speech production, eliciting and giving feedback, turn management, and cognitive and affect display. To represent and capture arbitrary multifunctional structures, our corpus employs multiple binary codes and logical-sum-based aggregations of multiple coders’ judgments. A corpus analysis targeting four-party Japanese conversations revealed multifunctional patterns in which the speaker modulates multiple functions, such as emphasis and eliciting listeners’ responses, through rhythmic head movements, and listeners express various attitudes and responses through continuous back-channel head movements. This paper proposes CNN-based binary classifiers for detecting each of the functions from the angular velocity of the head pose and the presence or absence of utterances. The experimental results showed that the recognition performance varies greatly, from approximately 30% to 90% in terms of the F-score, depending on the function category, and the performance was positively correlated with the amount of data and inter-coder agreement. In addition, we noted a tendency toward overdetection that added more functions to those originally in the corpus. The analyses and experiments confirm that our approach is promising for studying the multifunctionality of head movements.
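The corpus's logical-sum aggregation, marking a function present if any coder marked it, reduces for binary codes to an element-wise OR across coders. A minimal sketch (the array shapes and helper name are assumptions):

```python
import numpy as np

def aggregate_functions(codes):
    """Logical-sum aggregation of multiple coders' binary judgments.

    codes: array-like of shape (n_coders, n_frames, n_functions), with
    entry 1 if that coder judged the function present in that frame.
    Returns shape (n_frames, n_functions): 1 if ANY coder marked it,
    so the categories stay non-mutually-exclusive.
    """
    codes = np.asarray(codes, dtype=bool)
    return np.logical_or.reduce(codes, axis=0).astype(int)
```

The union keeps every function any coder perceived, which suits non-mutually-exclusive categories but also means a single coder's over-labeling propagates into the aggregate.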
Article
Wearable devices with accelerometers, gyroscopes and so on are available wherever and whenever users need them. In this study, we construct a head gesture recognition system using a wearable sensor with an accelerometer. We use a glasses-like wearable sensor to provide an easy-to-use system. In order to recognize users' head gestures such as "nodding", "shaking" and so on, we extract feature vectors using principal component analysis (PCA) on the acceleration time-series data and then classify the data with a k-nearest neighbor (k-NN) or multi-layer perceptron (MLP) classifier. Moreover, we realize a real-time head gesture recognition system. Through experiments on head gesture recognition with multiple users, we confirm the effectiveness of our system.
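The pipeline this abstract describes, PCA over acceleration windows followed by a k-NN vote, can be sketched in pure NumPy. The window dimensionality, k, and the helper names below are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Principal axes of the training windows via eigendecomposition
    of the sample covariance."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    # keep the directions with the largest variance
    components = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return mean, components

def pca_transform(X, mean, components):
    """Project (centered) windows onto the principal axes."""
    return (X - mean) @ components

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training feature vectors."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = np.asarray(y_train)[np.argsort(d)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)
```

PCA compresses each acceleration window into a few variance-dominant coordinates, after which the non-parametric k-NN vote needs no per-user training beyond storing labeled examples.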
Article
The human face is an attractive biometric identifier, and face recognition has certainly improved a lot since its beginnings some three decades ago, but its application in the real world has still achieved limited success. In this doctoral dissertation we focus on a local feature of the human face, namely the lip, and analyse its relevance to and influence on person recognition. An in-depth study is carried out with respect to the various steps involved, such as detection, evaluation, normalization and the applications of human lip motion. Initially we present a lip detection algorithm that is based on the fusion of two independent methods. The first method is based on edge detection and the second one on region segmentation, each having distinct characteristics and thus exhibiting different strengths and weaknesses. We exploit these strengths by combining the two methods using fusion. Then we present results from extensive testing and evaluation of the detection algorithm on a realistic database. Next we give a comparison of the visual features of lip motion for their relevance to person recognition. For this purpose we extract various geometric and appearance-based lip features and compare them using three feature selection measures: Minimal-Redundancy-Maximum-Relevance, Bhattacharyya Distance and Mutual Information. Next we extract features which model the behavioural aspect of lip motion during speech and exploit them for person recognition. The behavioural features include static features, such as the normalized length of the major/minor axis and the coordinates of lip extrema points, and dynamic features based on optical flow. These features are used to build a client model with a Gaussian Mixture Model (GMM), and finally the classification is achieved using a Bayesian decision rule. Recognition results are then presented on a text-independent database specifically designed for testing behavioural features, which require comparatively more data.
Lastly we propose a temporal normalization method to compensate for variation caused by lip motion during speech. Given a group of videos of a person uttering the same sentence multiple times, we study the lip motion in one of the videos and select certain key frames as synchronization frames. We then synchronize these frames from the first video with the remaining videos of the same person. Finally all the videos are normalized temporally by interpolation using lip morphing. To evaluate our normalization algorithm we have devised a spatio-temporal person recognition algorithm that compares normalized and un-normalized videos.
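Of the three feature-selection measures mentioned, the Bhattacharyya distance has a closed form when each class's feature values are modeled as univariate Gaussians. The sketch below ranks features by it; the Gaussian modeling assumption and helper names are mine, not the dissertation's exact procedure:

```python
import numpy as np

def bhattacharyya_gauss(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians.
    Larger values mean the feature separates the two classes better."""
    return (0.25 * np.log(0.25 * (var1 / var2 + var2 / var1 + 2))
            + 0.25 * (mu1 - mu2) ** 2 / (var1 + var2))

def rank_features(X1, X2):
    """Rank feature columns by class separability (most relevant first).

    X1, X2: samples of the same features from two classes,
    shape (n_samples, n_features).
    """
    d = [bhattacharyya_gauss(X1[:, j].mean(), X1[:, j].var(),
                             X2[:, j].mean(), X2[:, j].var())
         for j in range(X1.shape[1])]
    return np.argsort(d)[::-1]
```

The first term captures separation due to differing variances and the second due to differing means, so a feature scores high if the two class distributions differ in either respect.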
Article
Full-text available
Gestures and speech interact. They are linked in language production and perception, with their interaction contributing to felicitous communication. The multifaceted nature of these interactions has attracted considerable attention from the speech and gesture community. This article provides an overview of our current understanding of manual and head gesture form and function, and of the principal functional interactions between gesture and speech in aiding communication, transporting meaning and producing speech. Furthermore, we present an overview of research on temporal speech-gesture synchrony, including the special role of prosody in speech-gesture alignment. In addition, we provide a summary of tools and data available for gesture analysis, and describe speech-gesture interaction models and simulations in technical systems. This overview also serves as an introduction to a Special Issue covering a wide range of articles on these topics. We provide links to the Special Issue throughout this paper.