Conference Paper

Hierarchical 3D Pose Estimation for Articulated Human Body Models from a Sequence of Volume Data


Abstract

This contribution describes a camera-based approach that fully automatically extracts the 3D motion parameters of persons using a model-based strategy. In the first step, a 3D body model of the person to be tracked is constructed automatically using a calibrated setup of sixteen digital cameras and a monochromatic background. From the silhouette images, the 3D shape of the person is determined using the shape-from-silhouette approach. This model is segmented into rigid body parts, and a dynamic skeleton structure is fitted. In the second step, the resulting movable, personalized body template is exploited to estimate the 3D motion parameters of the person in arbitrary poses. Using the same camera setup and the shape-from-silhouette approach, a sequence of volume data is captured, to which the movable body template is fitted. Using a modified ICP algorithm, the fitting is performed in a hierarchical manner along the kinematic chains of the body model. The resulting sequence of motion parameters for the articulated body model can be used for gesture recognition, control of virtual characters, or robot manipulators.
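The shape-from-silhouette step amounts to voxel carving: a voxel survives only if it projects into the foreground silhouette of every camera. The following is a minimal illustrative sketch, not the authors' implementation; the function name, the binary-mask silhouette representation, and the 3x4 projection matrices are assumptions.

```python
import numpy as np

def carve_voxels(voxel_centers, silhouettes, projections):
    """Shape-from-silhouette: keep only the voxels whose projection falls
    inside the foreground silhouette of every camera."""
    keep = np.ones(len(voxel_centers), dtype=bool)
    homo = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    for sil, P in zip(silhouettes, projections):
        h, w = sil.shape
        # project homogeneous voxel centers into this camera's image plane
        uvw = homo @ P.T
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        fg = np.zeros(len(voxel_centers), dtype=bool)
        fg[inside] = sil[v[inside], u[inside]] > 0
        keep &= fg  # a single non-silhouette view carves the voxel away
    return keep
```

With sixteen cameras, as in the paper, the intersection of the sixteen silhouette cones approximates the person's visual hull.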


... The introduction of depth cameras has drawn much research attention from 2D images to 3D depth images for human behavior analysis [4], [5], [6], [7]. There are three main reasons for this trend. ...
... Before starting the ICP algorithm, the correspondence between the human model and the human point cloud must be computed. In [5], [6], [7], every point is projected globally onto every limb model, and the closest foot point is chosen as the corresponding point for each data point. In contrast, in this paper we gradually compute each limb's closest data points. ...
Article
Full-text available
In this paper, human poses, represented by real 3D point clouds obtained from a Kinect sensor, are estimated and tracked by a hierarchical human model in an ICP framework. This paper makes several novel contributions. First, we compute the human model's nearest points rather than each point's nearest limbs, as traditional methods do, so that every limb is assigned points. Second, we consider global information while hierarchically running ICP for each local limb, so as to preserve the articulated kinematic chain. Third, by analyzing the four limbs (two legs and two arms) and enforcing joint constraints, we solve several specific problems, such as leg- or arm-crossing. Experimental results on various real human actions verify the method's effectiveness.
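The correspondence strategy described above, each limb selecting its nearest data points rather than each data point selecting its nearest limb, can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the sampled-point limb representation and the k-nearest selection are assumptions.

```python
import numpy as np

def limb_nearest_points(limb_samples, data, k=3):
    """For each limb (given as sampled model points), pick its k nearest
    data points, so every limb receives correspondences even when another
    limb lies closer to most of the cloud."""
    corr = {}
    for name, samples in limb_samples.items():
        # distance of every data point to the closest sample of this limb
        d = np.linalg.norm(data[None, :, :] - samples[:, None, :], axis=2).min(axis=0)
        corr[name] = np.argsort(d)[:k]
    return corr
```

In the point-to-limb direction, a limb far from the bulk of the cloud could end up with no correspondences at all; the limb-to-point direction guarantees each limb k matches.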
... In addition to blobs, several researchers have proposed classifying posture based on silhouettes. Weik and Liedtke [6] propose tracing the negative minimum curvatures along body contours to segment body parts and then identify body postures using a modified ICP algorithm. Fujiyoshi et al. ...
... In this section, we introduce a centroid context. Using Eqs. (6) and (7), we can define a centroid context to describe the characteristics of an arbitrary posture P. In the previous section, we presented a tree search algorithm that can be used to find a spanning tree P dfs T from a posture P based on the triangulation result. As shown in Fig. 5, (b) is the spanning tree derived from (a). ...
Article
This paper presents a new posture classification system that can be used to analyze human movements and irregular actions directly from video sequences. In order to better characterize a posture in a sequence, we triangulate it into triangular meshes and extract two features, namely, the skeleton feature and the centroid context feature, from the triangulated meshes. The first can be used as a coarse representation, while the second is used to derive a finer description. We adopt a dfs (depth-first search) scheme to extract the skeletal features of a posture from the triangulation result. The proposed skeleton feature extraction scheme is more robust and efficient than conventional silhouette-based approaches. To extract the centroid context feature, we use the skeletal features extracted in the first stage. The centroid context feature is a finer representation that can characterize the shape of a whole body or body parts. The two descriptors working together make human movement analysis a very efficient and accurate process. Experimental results show that the proposed method is a robust, accurate, and powerful tool for human movement analysis.
... In addition to blobs, several researchers have proposed for classifying human postures based on silhouettes. Weik and Liedtke [12] traced the negative minimum curvatures along body contours to segment body parts and then identified body postures using a modified Iterative Closest Point (ICP) algorithm. Fujiyoshi et al. [8] presented a skeleton-based method to recognize postures by extracting skeletal features based on curvature changes along the human silhouette. ...
... Based on (12) and checking all clusters, the set of key postures can be constructed for analyzing human movement sequences. ...
Article
This paper presents a novel posture classification system that analyzes human movements directly from video sequences. In the system, each sequence of movements is converted into a posture sequence. To better characterize a posture in a sequence, we triangulate it into triangular meshes, from which we extract two features: the skeleton feature and the centroid context feature. The first feature is used as a coarse representation of the subject, while the second is used to derive a finer description. We adopt a depth-first search (dfs) scheme to extract the skeletal features of a posture from the triangulation result. The proposed skeleton feature extraction scheme is more robust and efficient than conventional silhouette-based approaches. The skeletal features extracted in the first stage are used to extract the centroid context feature, which is a finer representation that can characterize the shape of a whole body or body parts. The two descriptors working together make human movement analysis a very efficient and accurate process because they generate a set of key postures from a movement sequence. The ordered key posture sequence is represented by a symbol string. Matching two arbitrary action sequences then becomes a symbol string matching problem. Our experiment results demonstrate that the proposed method is a robust, accurate, and powerful tool for human movement analysis.
... Similarly to [126], this approach used voxel labeling to extract measurements (Fig. 11), yet the tracking employs an extended Kalman filter to ensure more robustness and to cope with nonlinear behavior. Weik and Liedtke [130] proposed an optimized articulated skeleton that best fits the body shape data. This model is deployed to estimate the body motion from a stream of 3-D data using a hierarchical ICP algorithm [26]. ...
Article
Full-text available
The recent advances in full human body imaging technology illustrated by the 3D human body scanner (HBS), a device delivering full human body shape data, opened up large perspectives for the deployment of this technology in various fields (e.g. clothing industry, anthropology, entertainment). Yet this advance brought challenges on how to process and interpret the data delivered by the HBS in order to bridge the gap between this technology and potential applications. This paper surveys the literature on methods for human body scan data segmentation and modelling that attempted to overcome these challenges. It also discusses and evaluates the different approaches with respect to several requirements.
... The main goal is to move one sensor (using a robot) from one position to another around the object in order to monitor and analyse important features that characterize the studied target. Likewise, there are other similar works on 3D reconstruction of objects and spaces, such as [9], [10], [11]. ...
Article
Full-text available
As a consequence of increasing safety concerns, camera surveillance has been widely adopted as a way to monitor public spaces. One of the major challenges of camera surveillance is to design an optimal method for camera network placement in order to ensure the greatest possible coverage. In addition, this method must consider the landscape of the monitored environment to take into account existing objects that may influence the deployment of such a network. In this paper, a new Voronoi-based 3D GIS-oriented approach named "HybVOR" is proposed for surveillance camera network placement. The "HybVOR" approach aims to achieve coverage near 100% through three main phases. First, a Voronoi diagram is generated from buildings' footprints and cameras are placed on the Voronoi edges. Second, the level of coverage is assessed by calculating a viewshed based on a raster digital surface model of the region of interest. Finally, the visibility of the main buildings' entrances is evaluated based on a 3D vector model that contains these features. The effectiveness of the "HybVOR" approach is demonstrated through a case study corresponding to an area of interest in Jeddah Seaport in the Kingdom of Saudi Arabia.
... Some particular techniques and other constraints are necessary to reduce this computation time. Weik and Liedtke [4] proposed a hierarchical method for 3D pose estimation. Luck et al. [5] proposed a real-time algorithm that reduces the number of joints and their degrees of freedom in the human body model. ...
Article
Full-text available
This paper proposes a real-time, video-based motion capture system using only one video camera. Since conventional video-based motion capture systems need many cameras and take a long time to process the many video images, they cannot generate motion data in real time and therefore cannot be used as a real-time input device for a standard PC. In contrast, the prototype system proposed in this paper uses only one video camera: it takes video images of the upper body of the person and employs a very simple motion-tracking method to generate upper-body motion data, e.g., the x, y, z position of the hands, face rotation, and body rotation, in real time. This paper mainly describes its aspects as a hand and face motion-capturing device for a standard PC, showing application examples. Keywords: image understanding, motion capture, motion recognition, interface, virtual reality
... Body pose recovery methods from images frequently use geometrically defined models of humans as a reference for pose estimation (e.g. Cheung and Kanade, 2000; Weik and Liedtke, 2001; Kakadiaris and Metaxas, 1998). Previous research has focused on creating a geometric skeleton that represents a simplification of the subject's geometric properties as a set of solids roughly aligned to the medial axis of each limb. ...
Article
Full-text available
The recovery of 3D models from visual data generally results in a geometric skeleton that is a simplification of the shape of the captured figure, called an avatar. In this paper, we introduce a method called hierarchical kinematic synthesis that identifies an articulated skeleton, called a kinematic skeleton, which provides a compact representation of the movement of the tracked subject. In this work, the kinematic skeleton of a human is computed from a finite number of key poses captured from full-body articulated movements of an arbitrary subject, and provides the location of joints and the length and twist angle of the links that form the limbs of the 3D avatar. We use an approximation to the human skeleton which consists of five serial chains constructed from revolute and spherical joints. To recover the kinematic skeleton, a hierarchical approximate finite-position synthesis methodology determines the dimensions of these chains limb by limb. We show that this technique effectively recovers the kinematic skeleton for several synthetically generated datasets, and that the identification of the kinematic skeleton improves pose estimation for 3D data while simplifying the generation of avatar movement.
... In addition to blobs, another larger class of approaches to segmenting a posture is based on features of the human silhouette. For example, Weik and Liedtke [5] tracked the negative minimum curvatures along body contours to segment body parts and then recognized body postures using a modified ICP algorithm. Furthermore, in [9], Mori et al. integrated the features of contour, shape, shading, and focus to detect half-limbs and assembled them into different human body parts. ...
Conference Paper
Full-text available
This paper presents a new segmentation algorithm that segments a body posture into different body parts using the technique of triangulation. To analyze each posture well, we first propose a triangulation-based method that triangulates it into different triangle meshes. Then, we use a depth-first search scheme to find a spanning tree, which serves as the skeleton feature, from the set of triangulation meshes. The triangulation-based scheme for extracting important skeleton features is more robust and effective than other silhouette-based approaches. Different body parts can then be roughly extracted by removing all the branching points from the spanning tree. A model-driven technique is then proposed to segment a human body into semantic parts more accurately. This technique uses the concept of the Gaussian mixture model (GMM) to model different visual properties of different body parts. Then, a suitable segmentation scheme can be driven by classifying these models using their skeletons. Experimental results have proved that the proposed method is robust, accurate, and powerful for body part segmentation.
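The depth-first search that turns the triangle-adjacency graph into a spanning-tree skeleton can be sketched as follows; the adjacency-dictionary representation and function name are illustrative assumptions, not the paper's code. Branching points of the resulting tree are then removed to separate rough body parts.

```python
def dfs_spanning_tree(adjacency, root):
    """Depth-first search over the triangle-adjacency graph; the returned
    parent map defines a spanning tree used as a coarse skeleton."""
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for nb in adjacency[node]:
            if nb not in parent:
                parent[nb] = node  # tree edge node -> nb
                stack.append(nb)
    return parent
```

Nodes with three or more tree neighbours are the branching points; deleting them leaves one chain of triangles per limb.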
... In addition to blobs, the silhouette is another important feature of body part segmentation. For example, Weik and Liedtke [22] tracked the negative minimum curvatures along body contours and then analyzed each body part using a modified iterative closest point algorithm. In [13], Rosin traced the convexity of object contour to segment an object into different parts. ...
Article
Full-text available
This paper presents a novel segmentation algorithm that segments a body posture into different body parts using the technique of deformable triangulation. To analyze each posture more accurately, postures are segmented into triangular meshes, from which a spanning tree can be found using a depth-first search scheme. Then, we can decompose the tree into different subsegments, where each subsegment can be considered a limb. Two hybrid methods (i.e., the skeleton-based and model-driven methods) are then proposed for segmenting the posture into different body parts according to its occlusion conditions. To analyze occlusion conditions, a novel clustering scheme is proposed to cluster the training samples into a set of key postures. Then, a model space can be used to classify and segment each posture. If the input posture belongs to the non-occlusion category, the skeleton-based method is used to divide it into different body parts, which can be refined using a set of Gaussian mixture models (GMMs). For the occlusion case, we propose a model-driven technique to select a good reference model for guiding the process of body part segmentation. However, if two postures' contours are similar, there will be some ambiguity that can lead to failure during the model selection process. Thus, this paper proposes a tree structure that uses a tracking technique so that the best model can be selected not only from the current frame but also from its previous frame. Then, a suitable GMM-based segmentation scheme can be used to finely segment a body posture into the different body parts. The experimental results show that the proposed method for body part segmentation is robust, accurate, and powerful.
... Some particular techniques and other constraints are necessary to reduce this computation time. Weik and Liedtke proposed a hierarchical method for 3D pose estimation [8]. Luck et al. proposed a real-time algorithm with a reduced number of joints and degrees of freedom in a human body [9]. ...
... Pose estimation based on 3D data has been addressed in [7]. The 3D volume of a person is estimated in a multi-camera setup using the shape-from-silhouette method. ...
Conference Paper
Full-text available
We describe a technique for estimating human pose from an image sequence captured by a time-of-flight camera. The pose estimation is derived from a simple model of the human body that we fit to the data in 3D space. The model is represented by a graph consisting of 44 vertices for the upper torso, head, and arms. The anatomy of these body parts is encoded by the edges, i.e. an arm is represented by a chain of pairwise connected vertices whereas the torso consists of a 2-dimensional grid. The model can easily be extended to the representation of legs by adding further chains of pairwise connected vertices to the lower torso. The model is fit to the data in 3D space by employing an iterative update rule common to self-organizing maps. Despite the simplicity of the model, it captures the human pose robustly and can thus be used for tracking the major body parts, such as arms, hands, and head. The accuracy of the tracking is around 5–6 cm root mean square (RMS) for the head and shoulders and around 2 cm RMS for the head. The implementation of the procedure is straightforward and real-time capable.
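The iterative update rule common to self-organizing maps, as described above, pulls the winning (closest) model vertex, and more weakly its graph neighbours, toward each data point. The following is a minimal 2D sketch of that style of fit, not the paper's implementation; the learning rates, epoch count, and edge-list graph representation are arbitrary assumptions.

```python
import numpy as np

def som_fit(vertices, edges, data, epochs=10, lr=0.1, neighbor_lr=0.05):
    """SOM-style fit: for each data point, move the closest model vertex
    (and, with a smaller step, its graph neighbours) toward that point."""
    verts = vertices.copy()
    for _ in range(epochs):
        for p in data:
            w = np.argmin(np.linalg.norm(verts - p, axis=1))  # winner vertex
            verts[w] += lr * (p - verts[w])
            for a, b in edges:  # weaker pull on graph neighbours of the winner
                if a == w:
                    verts[b] += neighbor_lr * (p - verts[b])
                elif b == w:
                    verts[a] += neighbor_lr * (p - verts[a])
    return verts
```

The neighbour updates are what keep the graph's chains (arms, torso grid) coherent instead of letting vertices scatter independently over the point cloud.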
... These approaches reconstruct the volume of a moving person at interactive frame rates and fit a comparably simple ellipsoid model to the volumes [3], or compute the motion parameters for a kinematic structure by means of a force field exerted by the volume elements [14]. In [23], an iterative closest point method is used to fit a human model to volume data. In previous work, efficient optical feature tracking and volume reconstruction were hardly considered simultaneously for the acquisition of color-based feature tracking to determine the 3D locations of salient body features over time. ...
... Additionally, silhouettes have been used widely as tools for recognition. Weik and Liedtke [21] use 16 cameras and a shape-from-silhouette method to build a model template for the body, which is then matched to volume data generated in each successive frame. Pose analysis is performed based on an automatic reconstruction of the subject's skeleton. ...
Conference Paper
Full-text available
In this paper, we introduce a novel method for employing image-based rendering to extend the range of use of human motion recognition systems. We demonstrate the use of image-based rendering to generate additional training sets for view-dependent human motion recognition systems. Input views orthogonal to the direction of motion are created automatically to construct the proper view from a combination of non-orthogonal views taken from several cameras. To extend motion recognition systems, image-based rendering can be utilized in two ways: (i) to generate additional training sets for these systems containing a large number of non-orthogonal views, and (ii) to generate orthogonal views (the views those systems are trained to recognize) from a combination of non-orthogonal views taken from several cameras. In this case, image-based rendering is used to generate views orthogonal to the mean direction of motion. We tested the method using an existing view-dependent human motion recognition system on two different sequences of motion, and promising initial results were obtained.
Conference Paper
Nowadays, highly detailed animations of live-actor performances are increasingly easy to acquire, and 3D video has received considerable attention in visual media productions. This lecture will address a new paradigm for achieving performance capture using cage-based shapes in motion. We define cage-based performance capture as the non-invasive process of capturing the non-rigid surface of actors from multiple views in the form of sparse control-deformation-handle trajectories and a laser-scanned static template shape.
Conference Paper
Nowadays, highly detailed animations of live-like performances are easier to acquire thanks to low-cost sensors, and 4D meshes have received considerable attention in visual media productions. This course will address a new paradigm for achieving performance capture using cage-based shapes in motion. We define cage-based performance capture as the non-invasive process of capturing the non-rigid surface of actors from multiple views in the form of sparse control-deformation-handle trajectories and a laser-scanned template shape. In this course, we address the hard problem of extracting or acquiring, and then reusing, non-rigid parametrization for video-based animations in four steps: (1) cage-based inverse kinematics, (2) conversion of surface performance capture into cage-based deformation, (3) cage-based cartoon surface exaggeration, and (4) cage-based registration of time-varying reconstructed point clouds. The key objective is to attract the interest of game programmers, digital artists, and filmmakers in employing purely geometric, animator-friendly tools to capture and reuse surfaces in motion. Finally, a broad range of advanced animation techniques and promising research-to-production opportunities for the years to come, between the graphics and vision fields, will be presented. At first sight, a central challenge is to express plausible boneless deformations while preserving global and local properties of dynamic captured surfaces with a limited number of controllable, flexible, and reusable parameters. While abandoning the classical articulated skeleton as the underlying structure, we show that cage-based deformers offer a flexible design-space abstraction for dynamic non-rigid surface motion through learning space-time shape variability. Registered cage-handle trajectories allow the reconstruction of complex mesh sequences by deforming an enclosed template mesh.
Decoupling motion from geometry, cage-based performance capture techniques offer reusable outputs for animation transfer.
Article
The acquisition of human motion data is of major importance for creating interactive virtual environments, intelligent user interfaces, and realistic computer animations. Today's performance of off-the-shelf computer hardware enables marker-free non-intrusive optical tracking of the human body. In addition, recent research shows that it is possible to efficiently acquire and render volumetric scene representations in real-time. This paper describes a system to capture human motion without the use of markers or scene-intruding devices. Instead, a 2D feature tracking algorithm and a silhouette-based 3D volumetric scene reconstruction method are applied directly to the image data. A person is recorded by multiple synchronized cameras, and a multi-layer hierarchical kinematic skeleton is fitted to each frame in a two-stage process. The pose of a first model layer at every time step is determined from the tracked 3D locations of hands, head and feet. A more sophisticated second skeleton layer is fitted to the motion data by applying a volume registration technique. We present results with a prototype system showing that the approach is capable of running at interactive frame rates.
Article
In marker-based human motion capture systems, accurately extracting and tracking the 2-D coordinates of the body joints is a key step, because the 3-D reconstruction process and the reliability of the capture system depend heavily on it. Unlike traditional solutions, we use ordinary industrial cameras and take colorful balls as the markers to address this key point. We have also extended our solution to handle the problem of occlusion. Finally, we obtained good results in practical applications, and the whole process can be computed in real time. The method will be extended in future work.
Conference Paper
This paper surveys the current state of the art of digital human body modeling with a focus on information inclusion and analyzes the results from the aspects of design and engineering. It presents the results of a literature study, which intended to investigate the modeling approaches within the mentioned categories, and to investigate the fidelity of models based on their information content. In view of the fact that modeling is always a simplified representation of reality, models with different information contents are developed for different applications. It is also discussed in this paper that the information content of human body models, however, reflects not only the aspect of application, but also the level of fidelity, or functional sophistication. Taking into consideration the sorts of information needed to model the human body as a complex organic system, the authors propose an information content-based categorization. The major categories of aspect models of the human body that have been incorporated in a stratified reasoning scheme are: morphological, material, structural, mechanical, physiological and behavioral models. One conclusion is that remarkable progress has been achieved in terms of the sophistication of models (i.e., of information inclusion and processing methods). Another conclusion is that further increase of the fidelity of models will not be possible without the proper treatment of the concomitant complexities. Integration of various aspect models and real-time computational processing of human models are inevitable in several fields of application. However, development of human body models of such high sophistication goes together with an exponential growth in the required capacities. This leads us to a trade-off problem in digital human body modeling.
Conference Paper
This paper presents a new posture classification system that analyzes different human behaviors directly from video sequences using the technique of triangulation. To analyze each posture in the video sequences well, we propose a triangulation-based method that triangulates it into different triangle meshes, from which two important posture features are then extracted: the skeleton and the centroid context. The first is used for a coarse search, and the second for a finer classification that classifies postures in more detail. For the first descriptor, we take advantage of a dfs (depth-first search) scheme to extract the skeleton features of a posture from its triangulation result. Then, with the help of skeleton information, we can define a new shape descriptor, the centroid context, to describe a posture up to a semantic level. That is, the centroid context is a finer descriptor that describes a posture not only from its whole shape but also from its body parts. Since the two descriptors are complementary, all desired human postures can be compared and classified very accurately. This ability to classify postures helps us generate a set of key postures for transferring a behavior sequence into a set of symbols. Then, a novel string matching scheme is proposed to analyze different human behaviors. Experimental results have proved that the proposed method is robust, accurate, and powerful for human behavior analysis.
Article
Full-text available
This paper evaluates the influence of depth information on the gesture recognition process. We propose depth silhouettes, a natural extension of the binary silhouette concept, as a mechanism to incorporate depth information for gesture recognition. Using depth silhouettes, we define extensions of three classic techniques employed previously for gesture recognition with monocular vision. These include: (a) silhouette compression using PCA and learning with HMM; (b) exemplar-based gesture recognition using HMM; and (c) temporal templates that in this work are compressed using PCA and learned with SVM. The results obtained show that, independently of the technique employed, the use of depth silhouettes increases the success rate significantly. Additionally, we show that the best results are obtained through the combined use of PCA and HMM.
Conference Paper
Full-text available
The recent advances in full human body (HB) imaging technology illustrated by the 3D human body scanner (HBS), a device delivering full HB shape data, opened up large perspectives for the deployment of this technology in various fields such as the clothing industry, anthropology, and entertainment. However, these advances also brought challenges on how to process and interpret the data delivered by the HBS in order to bridge the gap between this technology and potential applications. This paper presents a literature survey of research work on HBS data segmentation and modeling aiming at overcoming these challenges, and discusses and evaluates different approaches with respect to several requirements.
Conference Paper
Full-text available
This work discusses an approach to seamlessly integrate real and virtual scene content by on-the-fly 3D scene modeling and dynamic scene interaction. The key element is a ToF depth camera, accompanied by color cameras, mounted on a pan-tilt head. The system allows the environment to be scanned for easy 3D reconstruction, and it tracks and models dynamically moving objects such as human actors in 3D. This makes it possible to compute mutual occlusions between real and virtual objects and to generate correct light and shadows with mutual light interaction. No dedicated studio is required, as virtually any room can be turned into a virtual studio with this approach. Since the complete process operates in 3D and produces consistent color and depth sequences, the system can be used for full 3D TV production.
Book
Driven by consumer-market applications that enjoy steadily increasing economic importance, graphics hardware and rendering algorithms are a central focus of computer graphics research. Video-based rendering is an approach that aims to overcome the current bottleneck in the time-consuming modeling process and has applications in areas such as computer games, special effects, and interactive TV. This book offers an in-depth introduction to video-based rendering, a rapidly developing new interdisciplinary topic employing techniques from computer graphics, computer vision, and telecommunication engineering. Providing an overview of the state-of-the-art of video-based rendering and details of the fundamental VBR algorithms, the author discusses the advantages, the potential, as well as the limitations of the approach in the context of different application scenarios.
Article
We introduce in this paper a novel method for employing image-based rendering to extend the range of applicability of human motion and gait recognition systems. Much work has been done in the field of human motion and gait recognition, and many interesting methods for detecting and classifying motion have been developed. However, systems that can robustly recognize human behavior in real-world contexts have yet to be developed. A significant reason for this is that the activities of humans in typical settings are unconstrained in terms of the motion path. People are free to move throughout the area of interest in any direction they like. While there have been many good classification systems developed in this domain, the majority of these systems have used a single camera providing input to a training-based learning method. Methods that rely on a single camera are implicitly view-dependent. In practice, the classification accuracy of these systems often becomes increasingly poor as the angle between the camera and the direction of motion varies away from the training view angle. As a result, these methods have limited real-world applications, since it is often impossible to limit the direction of motion of people so rigidly. We demonstrate the use of image-based rendering to adapt the input to meet the needs of the classifier by automatically constructing the proper view (image), that matches the training view, from a combination of arbitrary views taken from several cameras. We tested the method on 162 sequences of video data of human motions taken indoors and outdoors, and promising results were obtained.
Article
We present an algorithm for acquiring the 3D surface geometry and motion of a dynamic piecewise-rigid object using a single depth video camera. The algorithm identifies and tracks the rigid components in each frame, while accumulating the geometric information acquired over time, possibly from different viewpoints. The algorithm also reconstructs the dynamic skeleton of the object, thus can be used for markerless motion capture. The acquired model can then be animated to novel poses. We show the results of the algorithm applied to synthetic and real depth video.
Article
In this paper, we propose a new algorithm for partitioning human postures represented by 3D point clouds sampled from the surface of the human body. The algorithm is formed as a constrained extension of a recently developed segmentation method, spectral clustering (SC). The algorithm offers two merits: (1) as a nonlinear method, it can handle data (the point cloud) sampled from a manifold (the surface of the human body) rather than from the entire embedded 3D space; (2) by using constraints, it facilitates the integration of multiple similarities for human posture partitioning and helps to reduce the limitations of spectral clustering. We show that the constrained spectral clustering (CSC) can still be solved by generalized eigen-decomposition. Experimental results confirm the effectiveness of the proposed algorithm.
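The eigen-decomposition step behind such a method can be sketched in plain NumPy. The sketch below is unconstrained spectral clustering only (the paper's constraint machinery is omitted); the Gaussian affinity, the symmetric Laplacian normalization (equivalent to the generalized problem L v = λ D v), and the deterministic farthest-point 2-means seeding are illustrative choices, not the paper's formulation:

```python
import numpy as np

def spectral_embed(points, k, sigma=1.0):
    """Embed a point cloud using the k smallest eigenvectors of the
    symmetrically normalized graph Laplacian (equivalent eigenpairs
    to the generalized problem L v = lambda D v)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian affinity graph
    np.fill_diagonal(W, 0.0)
    d_isqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(points)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues ascending
    return vecs[:, :k]

def two_means(X, iters=20):
    """Tiny 2-means with deterministic farthest-point initialization."""
    c = np.stack([X[0], X[((X - X[0]) ** 2).sum(1).argmax()]])
    for _ in range(iters):
        labels = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
        c = np.stack([X[labels == j].mean(0) for j in (0, 1)])
    return labels

# two well-separated blobs standing in for points sampled from two limbs
rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal(0.0, 0.1, (30, 3)),
                   rng.normal(3.0, 0.1, (30, 3))])
labels = two_means(spectral_embed(cloud, 2))
```

Because the cross-blob affinities are nearly zero, the affinity graph is almost disconnected and the two smallest eigenvectors approximately indicate the two components, so the clusters separate cleanly in the embedding.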
Article
Full-text available
Camera placement has an enormous impact on the performance of vision systems, but the best placement to maximize performance depends on the purpose of the system. As a result, this paper focuses largely on the problem of task-specific camera placement. We propose a new camera placement method that optimizes views to provide the highest resolution images of objects and motions in the scene that are critical for the performance of some specified task (e.g. motion recognition, visual metrology, part identification, etc.). A general analytical formulation of the observation problem is developed in terms of motion statistics of a scene and resolution of observed actions, resulting in an aggregate observability measure. The goal of this system is to optimize across multiple cameras the aggregate observability of the set of actions performed in a defined area. The method considers dynamic and unpredictable environments, where the subject of interest changes in time. It does not attempt to measure or reconstruct surfaces or objects, and does not use an internal model of the subjects for reference. As a result, this method differs significantly in its core formulation from camera placement solutions applied to problems such as inspection, reconstruction or the Art Gallery class of problems. We present tests of the system's optimized camera placement solutions using real-world data in both indoor and outdoor situations and robot-based experimentation using an all-terrain robot vehicle (ATRV-Jr) in an indoor setting.
Article
Part II uses the foundations of Part I [35] to define constraint equations for 2D-3D pose estimation of different corresponding entities. Most articles on pose estimation concentrate on specific types of correspondences, mostly between points, and only rarely use line correspondences. The first aim of this part is to extend pose estimation scenarios to correspondences of an extended set of geometric entities. In this context we are interested in relating the following (2D) image and (3D) model types: 2D point/3D point, 2D line/3D point, 2D line/3D line, 2D conic/3D circle, 2D conic/3D sphere. Furthermore, to handle articulated objects, we describe kinematic chains in this context in a similar manner. We ensure that all constraint equations result in a distance measure in Euclidean space, which is well posed in the context of noisy data. We also discuss the numerical estimation of the pose. We propose to use linearized twist transformations, which result in well-conditioned and quickly solvable systems of equations. The key idea is not to search for the representation of the Lie group describing the rigid body motion, but for the representation of its generating Lie algebra. This leads to real-time capable algorithms.
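The linearization idea can be sketched for the simplest case, 3D point correspondences. For a small twist (w, v) in the Lie algebra se(3), a point moves as p' ≈ p + w × p + v, which is linear in (w, v); solving the stacked least-squares system and exponentiating back to SE(3) gives a pose update. The Gauss-Newton loop and all names below are an illustrative sketch, not the paper's formulation:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix: hat(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(w, v):
    """Closed-form exponential of a twist (w, v) in se(3) -> 4x4 motion."""
    T = np.eye(4)
    theta = np.linalg.norm(w)
    if theta < 1e-12:                     # pure translation
        T[:3, 3] = v
        return T
    K = hat(w / theta)                    # unit-axis skew matrix
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K
    V = (np.eye(3) + (1.0 - np.cos(theta)) / theta * K
         + (theta - np.sin(theta)) / theta * K @ K)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def estimate_pose(model, target, iters=10):
    """Gauss-Newton on the linearized residual q - p ~ -hat(p) w + v."""
    T = np.eye(4)
    for _ in range(iters):
        p = model @ T[:3, :3].T + T[:3, 3]
        A = np.zeros((3 * len(p), 6))
        for i, pi in enumerate(p):
            A[3*i:3*i+3, :3] = -hat(pi)   # derivative w.r.t. w
            A[3*i:3*i+3, 3:] = np.eye(3)  # derivative w.r.t. v
        xi, *_ = np.linalg.lstsq(A, (target - p).ravel(), rcond=None)
        T = exp_se3(xi[:3], xi[3:]) @ T   # multiplicative update
    return T

# recover a known rigid motion from noiseless point correspondences
rng = np.random.default_rng(0)
model = rng.normal(size=(10, 3))
T_true = exp_se3(np.array([0.1, -0.2, 0.3]), np.array([0.4, 0.1, -0.2]))
target = model @ T_true[:3, :3].T + T_true[:3, 3]
T_est = estimate_pose(model, target)
```

Note that the search variable is the six-vector (w, v) in the Lie algebra, not the twelve entries of the rigid motion itself, which is exactly the point made in the abstract.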
Conference Paper
This paper presents a human action recognition system for recognizing various behaviors directly from videos. Firstly, we triangulate the human body into triangle meshes. Then, we use a depth-first search (DFS) scheme to find a spanning tree over the set of meshes. All leaves of the spanning tree are adopted as the extremities. Unlike traditional approaches, which find the extremities on the target's silhouette as skeletons, the extremities found from the internal centroids of triangle meshes represent a human posture more accurately and robustly. To model each human action, all the input skeleton sequences are then transformed into symbol sequences. Then, we design a string-matching scheme to measure the similarity between any two human behaviors. Since 2D postures are used in this paper, the above scheme is sensitive to different viewpoints. To solve the view-independence problem, a 2D matrix is then constructed for recording the symbol relations between two viewpoints. Thus, our proposed matching scheme is almost view-invariant. Experimental results show that the proposed scheme is a robust, efficient, and promising tool in human action recognition.
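The extremity-finding step reduces to graph traversal. Below is a minimal sketch over a toy adjacency graph standing in for the triangle-mesh neighbourhood structure, with node 0 playing the torso; the graph and names are illustrative, not the paper's data:

```python
def spanning_tree_leaves(adj, root):
    """Build a DFS spanning tree and return its leaves, which play the
    role of body extremities (head, hands, feet) in the skeleton."""
    parent = {root: None}
    children = {u: [] for u in adj}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:        # tree edge: first time we see w
                parent[w] = u
                children[u].append(w)
                stack.append(w)
    return sorted(u for u in adj if not children[u])

# toy mesh-adjacency graph: 0 = torso, chains ending at nodes 3, 4, 5
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5], 3: [0], 4: [1], 5: [2]}
extremities = spanning_tree_leaves(adj, root=0)   # -> [3, 4, 5]
```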
Conference Paper
The acquisition of human motion data is of major importance for creating interactive virtual environments, intelligent user interfaces, and realistic computer animations. Today's performance of off-the-shelf computer hardware enables marker-free non-intrusive optical tracking of the human body. In addition, recent research shows that it is possible to efficiently acquire and render volumetric scene representations in real-time. This paper describes a system to capture human motion at interactive frame rates without the use of markers or scene-intruding devices. Instead, 2D computer vision and 3D volumetric scene reconstruction algorithms are applied directly to the image data. A person is recorded by multiple synchronized cameras, and a multilayer hierarchical kinematic skeleton is fitted to each frame in a two-stage process. We present results with a prototype system running on two PCs.
Conference Paper
This paper proposes a real-time, video-based motion capture system using two video cameras. Since conventional video-based motion capture systems use many video cameras and take a long time to process the many video images, they cannot generate motion data in real time. The proposed prototype system, on the other hand, uses at most two video cameras and employs a very simple motion-tracking method based on object color and edge distributions: it takes video images of the person, locates body parts such as the hands, feet, and head (e.g., the x, y position of each), and generates motion data for these parts in real time. Using two video cameras, it generates 3D motion data in real time. This paper mainly describes the system's aspects as a real-time motion capture system for the tip parts of the human body, i.e., the hands, feet, and head, and validates its usefulness by showing virtual reality (VR) application examples.
Article
Full-text available
The ability to recognize humans and their activities by vision is key for a machine to interact intelligently and effortlessly with a human-inhabited environment. Because of many potentially important applications, “looking at people” is currently one of the most active application domains in computer vision. This survey identifies a number of promising applications and provides an overview of recent developments in this domain. The scope of this survey is limited to work on whole-body or hand motion; it does not include work on human faces. The emphasis is on discussing the various methodologies; they are grouped in 2-D approaches with or without explicit shape models and 3-D approaches. Where appropriate, systems are reviewed. We conclude with some thoughts about future directions.
Article
This contribution describes the semi-automatic creation of highly realistic flexible 3D models of participants for distributed 3D videoconferencing systems. The proposed technique uses a flexible mesh template surrounding an interior skeleton structure which is based on a simplified human skeleton. The vertices of this template are arranged in rigid rings along the bones of the skeleton. Using 3D data obtained by a shape from silhouettes approach, the size and shape of the mesh template are adapted to the real person. Texture mapping of the adapted mesh using real camera images leads to a natural impression. The mesh organization in rigid rings allows an efficient surface deformation according to the skeleton movements. Once the resulting model is transmitted, it can be animated subsequently using the simple parameter set of the interior skeleton structure. Results obtained with real image data confirm the eligibility of the animated person models in terms of realism and efficiency for 3D videoconferencing applications. Copyright © 2000 John Wiley & Sons, Ltd.
Article
This work is concerned with the signal-to-symbol problem of building skinned, segmented, land-marked and labeled 3D models of the whole human body from range data. A fully automated model-based process is presented that takes raw range data, cleans and skins it, and then locates "interesting" features, to enrich the surface with symbolic information for specific applications. The method is validated via volumetrics in medicine and surface anthropometry.
Article
A technique called distance-ordered homotopic thinning (DOHT) for skeletonizing 3D binary images is presented. DOHT produces skeletons that are homotopic, thin, and medial. This is achieved by sequentially deleting points in ascending distance order until no more can be safely deleted. A point can be safely deleted only if doing so preserves topology. Distance information is provided by the chamfer distance transform, an integer approximation to the Euclidean distance transform. Two variations of DOHT are presented that arise from using different rules for preserving points. The first uses explicit rules for preserving the ends of medial axes or edges of medial surfaces, and the second preserves the centers of maximal balls identified from the chamfer distance transform. By thresholding the centers according to their distance values, the user can control the scale of features represented in the skeleton. Results are presented for real and synthetic 2D and 3D data.
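The chamfer distance transform underlying DOHT can be sketched in 2D with the classic two-pass 3-4 mask (orthogonal steps cost 3, diagonal steps cost 4, i.e. roughly 3x the Euclidean distance); the 3D case adds more mask neighbours but is structurally identical. This is a generic sketch of the transform, not the paper's implementation:

```python
import numpy as np

def chamfer_34(img):
    """Two-pass 3-4 chamfer distance transform of a 2D binary image:
    approximate distance (scaled by 3) from each object pixel (1)
    to the nearest background pixel (0)."""
    INF = 1 << 30
    h, w = img.shape
    d = np.where(img > 0, INF, 0).astype(np.int64)
    # forward raster pass: causal neighbours (left, up, up-left, up-right)
    for y in range(h):
        for x in range(w):
            if d[y, x] == 0:
                continue
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x - 1] + 3)
            if y > 0:
                d[y, x] = min(d[y, x], d[y - 1, x] + 3)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y - 1, x - 1] + 4)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y - 1, x + 1] + 4)
    # backward raster pass: anti-causal neighbours
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if d[y, x] == 0:
                continue
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x + 1] + 3)
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y + 1, x] + 3)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y + 1, x + 1] + 4)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y + 1, x - 1] + 4)
    return d

# 5x5 image: a 3x3 object with a one-pixel background border
img = np.zeros((5, 5), dtype=int)
img[1:4, 1:4] = 1
d = chamfer_34(img)   # d[2, 2] == 6: two 3-cost steps from the border
```

In DOHT these integer distance values supply both the ascending deletion order and, via centers of maximal balls, the feature-scale threshold described in the abstract.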
Conference Paper
This paper describes a system which can perform full 3-D pose estimation of a single arbitrarily shaped, rigid object at rates up to 10 Hz. A triangular mesh model of the object to be tracked is generated offline using conventional range sensors. Real-time range data of the object is sensed by the CMU high speed VLSI range sensor. Pose estimation is performed by registering the real-time range data to the triangular mesh model using an enhanced implementation of the Iterative Closest Point (ICP) Algorithm introduced by Besl and McKay (1992). The method does not require explicit feature extraction or specification of correspondence. Pose estimation accuracies of the order of 1% of the object size in translation, and 1 degree in rotation have been measured
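The registration loop of such an ICP tracker can be sketched as follows. This is plain nearest-neighbour, point-to-point ICP with the closed-form Kabsch alignment; the paper's enhancements (and its range-sensor specifics) are omitted, and the demo data is synthetic:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid motion (R, t) minimizing ||R P + t - Q||
    over paired point sets P and Q."""
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    s = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, s]) @ U.T
    return R, cq - R @ cp

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: nearest-neighbour correspondences,
    then closed-form rigid alignment, iterated."""
    P = src.copy()
    for _ in range(iters):
        # correspondence: nearest dst point for each current src point
        nn = ((P[:, None, :] - dst[None, :, :]) ** 2).sum(-1).argmin(1)
        R, t = kabsch(P, dst[nn])
        P = P @ R.T + t
    return P

# register a cloud against a slightly rotated and shifted copy of itself
rng = np.random.default_rng(3)
src = rng.normal(size=(15, 3))
c, s = np.cos(0.05), np.sin(0.05)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
dst = src @ Rz.T + np.array([0.1, -0.05, 0.02])
aligned = icp(src, dst)
```

As the abstract notes, no explicit feature extraction or correspondence specification is needed: the nearest-neighbour step supplies the correspondences, which is what makes ICP attractive for tracking raw range data against a mesh model.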
Article
An algorithm for the mapping of texture from multiple camera views onto a 3D model of a real object is presented. The texture sources are images taken from an object rotating in front of a stationary calibrated camera. The 3D model is represented by a wireframe built of triangles and is geometrically adjusted to the camera views. The presented approach aims at reducing the texture distortion associated with the boundaries between triangles mapped from different camera views, and the disturbances caused by parts of the object surface that are not visible. For this purpose, adjacent triangles describing the surface of the 3D model are grouped into homogeneous surface regions which are textured with a common image, followed by local texture filtering at the region boundaries. For triangles not visible in any camera view, a filter has been developed which uses the texture from adjacent visible triangles to generate synthetic texture. Experimental investigations with different real 3D objects have ...
Article
This contribution describes an approach towards 3D teleconferencing. Textured, 3D anthropomorphic models are used in a virtual environment to give the impression of physical closeness. The requirements for such a conferencing system are, on the one hand, textured, articulated 3D models of the conferees. For high realism a flexible deformation model has been integrated into the 3D models. On the other hand, these models have to be animated in the virtual meeting room according to the motion parameters of the real conferees; therefore, motion estimation has to be performed. To avoid wiring the persons, this has to be done optically. In this approach a gradient-based motion tracker has been implemented. No markers or optical tracking points are needed to extract the hierarchic motion parameters of the conferee. It works on a stereoscopic image sequence and employs the flexible, articulated anthropomorphic model of the conferee. The motion hierarchy of the articulated model is used to reduce the...
D. A. Simon, M. Hebert, T. Kanade: Real-time 3D pose estimation using a high-speed range sensor
L. Dekker, I. Douros, B. F. Buxton, P. Treleaven: Building symbolic information for 3D human body modeling from range data