Figure 3 - uploaded by Candace Sidner
Head nod recognition curves when varying the detection threshold.

Source publication
Conference Paper
Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. We investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of user gestures. We present a recognition framework which (1) extracts contextual features from an...

Context in source publication

Context 1
... total of 274 head nods and 14 head shakes were naturally performed by the participants while interacting with the robot. Figure 3 shows head nod detection results for all 9 subjects used during testing. The ROC curves present the detection performance of each recognition algorithm when varying the detection threshold. ...
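As a rough illustration of how an ROC curve arises from varying a detection threshold, the sketch below sweeps a threshold over detector scores and records the false positive rate and true positive rate at each setting. The function name `roc_points`, the scores, and the labels are hypothetical and are not taken from the source publication; this is a minimal sketch, not the authors' evaluation code.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false_positive_rate, true_positive_rate) pair per threshold.

    A sample is counted as a detection when its score is >= the threshold.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical detector scores and ground truth (True = head nod present).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [True, True, True, False, False, False]

curve = roc_points(scores, labels, thresholds=[0.0, 0.3, 0.5, 0.7, 1.0])
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves along the curve toward the upper right (more detections, more false alarms); raising it moves toward the origin.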

Citations

... Others note that gestures can be a distraction if not seamlessly integrated with speech output (Piwek et al., 2005). Interesting attempts have also been made to develop automatic recognition of human feedback gestures to ECAs (Morency et al., 2006). Generally, there is a need for more empirical studies of relevant multimodal data to use as a basis for more complex and realistic models. ...
... The motion when the person pushes the button may not be observable because it is too subtle to detect; however, the flash lights coming from the pocket-sized camera can be used to infer what the person has done. Morency et al. [11] use a conversational script, extracted with a speech recognition technique, as contextual information to help vision systems understand head gesture behaviours. For example, when there is a question mark at the end of a sentence, it is more likely that a nod or head shake gesture will be observed. ...
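The question-mark example can be pictured as a contextual feature that raises the score of a visual detector. The sketch below is only an illustration under stated assumptions: the names `context_feature` and `fused_score`, the linear weighting, and the weight of 0.3 are all hypothetical, and the cited work may combine context and vision quite differently.

```python
def context_feature(utterance):
    """Return 1.0 if the agent's last utterance ends with a question mark, else 0.0."""
    return 1.0 if utterance.rstrip().endswith("?") else 0.0


def fused_score(vision_score, utterance, context_weight=0.3):
    """Blend a vision-only gesture score with the contextual feature.

    The linear blend and the default weight are assumptions for illustration.
    """
    if not 0.0 <= context_weight <= 1.0:
        raise ValueError("context_weight must lie in [0, 1]")
    return (1.0 - context_weight) * vision_score + context_weight * context_feature(utterance)


# The same ambiguous visual evidence scores higher after a question.
after_question = fused_score(0.5, "Do you want to continue?")
after_statement = fused_score(0.5, "I will continue now.")
print(after_question, after_statement)
```

With a fixed detection threshold, the same head motion is therefore more likely to be accepted as a nod or shake directly after a question than after a statement.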
Conference Paper
A video surveillance system capable of detecting suspicious activities or behaviours is of paramount importance to law enforcement agencies. Such a system will not only reduce the workload of security personnel involved with monitoring the CCTV video feeds but also improve the time required to respond to any incident. There are two well-known models to detect suspicious behaviour: misuse detection models, which are dependent on suspicious behaviour definitions, and anomaly detection models, which measure deviations from defined normal behaviour. However, it is nearly impossible to encapsulate the entire spectrum of either suspicious or normal behaviour. One of the ways to overcome this problem is by developing a system which learns in real time and adapts itself to behaviour which can be considered as common and normal or uncommon and suspicious. We present an approach utilising contextual information. Two contextual features, namely, type of behaviour and the commonality level of each type, are extracted from long-term observation. Then, a data stream model which treats the incoming data as a continuous stream of information is used to extract these features. We further propose a clustering algorithm which works in conjunction with the data stream model. Experiments and comparisons are conducted on the well-known CAVIAR datasets to show the efficacy of utilising contextual information for detecting suspicious behaviour. The proposed approach is generic in nature and can be applied to any features. However, for the purpose of this study, we have employed pedestrian trajectories to represent the behaviour of people.
... [23] did not consider the term x_0, which can be useful for incorporating the prior knowledge beyond the input observations x_{1:T}. For instance, dialog context as the current topic and expectations from previous utterances helps guide the head gesture recognition in a multi-modal interface [30]. In addition, triangular-chain MODEL1 also has edges between the observations and z, where the hidden CRF in [23] does not. ...
Article
Sequential modeling is a fundamental task in scientific fields, especially in speech and natural language processing, where many problems of sequential data can be cast as a sequential labeling or a sequence classification. In many applications, the two problems are often correlated, for example named entity recognition and dialog act classification for spoken language understanding. This paper presents triangular-chain conditional random fields (CRFs), a unified probabilistic model combining two related problems. Triangular-chain CRFs jointly represent the sequence and meta-sequence labels in a single graphical structure that both explicitly encodes their dependencies and preserves uncertainty between them. An efficient inference and parameter estimation method is described for triangular-chain CRFs by extending linear-chain CRFs. This method outperforms baseline models on synthetic data and real-world dialog data for spoken language understanding.
... These results are well in tune with earlier work on this topic. Hence, we conjecture that the DMLN framework provides an elegant and effective way to extend that work into the temporal domain involving human activities – a research topic that has found considerable attention in the computer vision field in past years [25], [16], [13], [7]. ...
... has primarily been explored by Torralba and colleagues in [20], [18], [35], [36]. They explore different probabilistic models in these works but the central theme throughout is to leverage hypothesized location information to place a prior over possible objects. They do this both to increase classification accuracy and to improve running time. [16] follows a similar approach but focuses on integrating speech and pose. Activity recognition [25], [16], [13], [7] has seen growing interest. [25] is similar to this work in their notion of activities but focuses on more complicated tasks (e.g. infant care) and uses RFID-tagged objects instead of cameras. [7] offers an elegant algorithm ...
Conference Paper
In this paper, we introduce a first-order probabilistic model that combines multiple cues to classify human activities from video data accurately and robustly. Our system works in a realistic office setting with background clutter, natural illumination, different people, and partial occlusion. The model we present is compact, requires only fifteen sentences of first-order logic grouped as a Dynamic Markov Logic Network (DMLN) to implement the probabilistic model, and leverages existing state-of-the-art work in pose detection and object recognition.
Conference Paper
We used Fisher linear discriminant analysis (LDA), static neural networks (NN), and focused time delay neural networks (TDNN) for gesture recognition. Gestures were collected in the form of acceleration signals along three axes from six participants. A sports watch containing a 3-axis accelerometer was worn by the users, who performed four gestures. Each gesture was performed for ten seconds, at the speed of one gesture per second. User-dependent and user-independent k-fold cross validations were carried out to measure the classifier performance. Using first and second order statistical descriptors of acceleration signals from validation datasets, LDA and NN classifiers were able to recognize the gestures at an average rate of 86% and 97% (user-dependent) and 89% and 85% (user-independent), respectively. TDNNs proved to be the best, achieving near perfect classification rates both for user-dependent and user-independent scenarios, while operating directly on the acceleration signals, alleviating the need for explicit feature extraction.
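The difference between user-dependent and user-independent validation lies in how samples are split. The sketch below shows one plausible reading, assuming leave-one-user-out folds for the user-independent case and k folds within a single user's samples for the user-dependent case. The abstract does not specify the exact fold construction, so the function names and splitting scheme are hypothetical.

```python
def user_independent_folds(sample_users):
    """Leave-one-user-out: each fold tests on one user and trains on all others."""
    folds = []
    for held_out in sorted(set(sample_users)):
        test = [i for i, u in enumerate(sample_users) if u == held_out]
        train = [i for i, u in enumerate(sample_users) if u != held_out]
        folds.append((train, test))
    return folds


def user_dependent_folds(sample_users, user, k):
    """k-fold split restricted to a single user's samples."""
    indices = [i for i, u in enumerate(sample_users) if u == user]
    if not 2 <= k <= len(indices):
        raise ValueError("k must be between 2 and the user's sample count")
    folds = []
    for f in range(k):
        test = indices[f::k]  # every k-th sample, starting at offset f
        train = [i for i in indices if i not in test]
        folds.append((train, test))
    return folds


# Hypothetical data set: each entry is the user who produced that sample.
sample_users = ["a", "a", "b", "b", "c", "c"]

independent = user_independent_folds(sample_users)
dependent = user_dependent_folds(sample_users, user="a", k=2)
print(independent[0])  # train on users b and c, test on user a
print(dependent)       # folds drawn only from user a's samples
```

In the user-independent split, no test user contributes training samples, so the score reflects generalization to unseen people; the user-dependent split measures performance on a person the classifier has already seen.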