Conference Paper

Hierarchical 3D Pose Estimation for Articulated Human Body Models from a Sequence of Volume Data


Abstract

This contribution describes a camera-based approach that fully automatically extracts the 3D motion parameters of persons using a model-based strategy. In the first step, a 3D body model of the person to be tracked is constructed automatically using a calibrated setup of sixteen digital cameras and a monochromatic background. From the silhouette images, the 3D shape of the person is determined using the shape-from-silhouette approach. This model is segmented into rigid body parts, and a dynamic skeleton structure is fitted. In the second step, the resulting movable, personalized body template is exploited to estimate the 3D motion parameters of the person in arbitrary poses. Using the same camera setup and the shape-from-silhouette approach, a sequence of volume data is captured, to which the movable body template is fitted. Using a modified ICP algorithm, the fitting is performed in a hierarchical manner along the kinematic chains of the body model. The resulting sequence of motion parameters for the articulated body model can be used for gesture recognition, control of virtual characters, or robot manipulators.
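The shape-from-silhouette step amounts to voxel carving: a voxel survives only if it projects into the foreground silhouette of every camera. The following is a minimal illustrative sketch, not the authors' implementation; the function name, the binary-mask silhouette representation, and the 3x4 projection matrices are assumptions.

```python
import numpy as np

def carve_voxels(voxel_centers, silhouettes, projections):
    """Shape-from-silhouette: keep only the voxels whose projection falls
    inside the foreground silhouette of every camera."""
    keep = np.ones(len(voxel_centers), dtype=bool)
    homo = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    for sil, P in zip(silhouettes, projections):
        h, w = sil.shape
        # project homogeneous voxel centers into this camera's image plane
        uvw = homo @ P.T
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        fg = np.zeros(len(voxel_centers), dtype=bool)
        fg[inside] = sil[v[inside], u[inside]] > 0
        keep &= fg  # a single non-silhouette view carves the voxel away
    return keep
```

With sixteen cameras, as in the paper, the intersection of the sixteen silhouette cones approximates the person's visual hull.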


... The introduction of depth cameras has drawn much research attention from 2D images to 3D depth images for human behavior analysis [4], [5], [6], [7]. There are three main reasons for this trend. ...
... Before starting the ICP algorithm, the correspondence between the human model and the human point cloud must be computed. In [5], [6], [7], every point is projected globally onto every limb model, and the closest foot point is chosen as the corresponding point for each data point. In contrast, in this paper we gradually compute each limb's closest data points. ...
Article
Full-text available
In this paper, human poses, represented by real 3D point clouds obtained from a Kinect sensor, are estimated and tracked by a hierarchical human model in an ICP framework. This paper makes several novel contributions. First, we compute the human model's nearest points rather than each point's nearest limbs, as traditional methods do, so that every limb is assigned points. Second, we consider global information while hierarchically running ICP for each local limb, so as to preserve the articulated kinematic chain. Third, by analyzing the four limbs (two legs and two arms) and enforcing joint constraints, we solve several specific problems, such as leg- or arm-crossing. Experimental results on various real human actions verify the method's effectiveness.
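The correspondence strategy described above, each limb selecting its nearest data points rather than each data point selecting its nearest limb, can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the sampled-point limb representation and the k-nearest selection are assumptions.

```python
import numpy as np

def limb_nearest_points(limb_samples, data, k=3):
    """For each limb (given as sampled model points), pick its k nearest
    data points, so every limb receives correspondences even when another
    limb lies closer to most of the cloud."""
    corr = {}
    for name, samples in limb_samples.items():
        # distance of every data point to the closest sample of this limb
        d = np.linalg.norm(data[None, :, :] - samples[:, None, :], axis=2).min(axis=0)
        corr[name] = np.argsort(d)[:k]
    return corr
```

In the point-to-limb direction, a limb far from the bulk of the cloud could end up with no correspondences at all; the limb-to-point direction guarantees each limb k matches.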
... In addition to blobs, several researchers have proposed classifying posture based on silhouettes. Weik and Liedtke [6] propose tracing the negative minimum curvatures along body contours to segment body parts and then identify body postures using a modified ICP algorithm. Fujiyoshi et al. ...
... In this section, we introduce a centroid context. Using Eqs. (6) and (7), we can define a centroid context to describe the characteristics of an arbitrary posture P. In the previous section, we presented a tree search algorithm that can be used to find a spanning tree P dfs T from a posture P based on the triangulation result. As shown in Fig. 5, (b) is the spanning tree derived from (a). ...
Article
This paper presents a new posture classification system that can be used to analyze human movements and irregular actions directly from video sequences. In order to better characterize a posture in a sequence, we triangulate it into triangular meshes and extract two features, namely, the skeleton feature and the centroid context feature, from the triangulated meshes. The first can be used as a coarse representation, while the second is used to derive a finer description. We adopt a dfs (depth-first search) scheme to extract the skeletal features of a posture from the triangulation result. The proposed skeleton feature extraction scheme is more robust and efficient than conventional silhouette-based approaches. To extract the centroid context feature, we use the skeletal features extracted in the first stage. The centroid context feature is a finer representation that can characterize the shape of a whole body or body parts. The two descriptors working together make human movement analysis a very efficient and accurate process. Experimental results show that the proposed method is a robust, accurate, and powerful tool for human movement analysis.
... In addition to blobs, several researchers have proposed for classifying human postures based on silhouettes. Weik and Liedtke [12] traced the negative minimum curvatures along body contours to segment body parts and then identified body postures using a modified Iterative Closest Point (ICP) algorithm. Fujiyoshi et al. [8] presented a skeleton-based method to recognize postures by extracting skeletal features based on curvature changes along the human silhouette. ...
... Based on (12) and checking all clusters, the set of key postures can be constructed for analyzing human movement sequences. ...
Article
This paper presents a novel posture classification system that analyzes human movements directly from video sequences. In the system, each sequence of movements is converted into a posture sequence. To better characterize a posture in a sequence, we triangulate it into triangular meshes, from which we extract two features: the skeleton feature and the centroid context feature. The first feature is used as a coarse representation of the subject, while the second is used to derive a finer description. We adopt a depth-first search (dfs) scheme to extract the skeletal features of a posture from the triangulation result. The proposed skeleton feature extraction scheme is more robust and efficient than conventional silhouette-based approaches. The skeletal features extracted in the first stage are used to extract the centroid context feature, which is a finer representation that can characterize the shape of a whole body or body parts. The two descriptors working together make human movement analysis a very efficient and accurate process because they generate a set of key postures from a movement sequence. The ordered key posture sequence is represented by a symbol string. Matching two arbitrary action sequences then becomes a symbol string matching problem. Our experiment results demonstrate that the proposed method is a robust, accurate, and powerful tool for human movement analysis.
... Similarly to [126], this approach used voxel labeling to extract measurements (Fig. 11), yet the tracking employs an extended Kalman filter to ensure more robustness and to cope with nonlinear behavior. Weik and Liedtke [130] proposed an optimized articulated skeleton that best fits the body shape data. This model is deployed to estimate the body motion from a stream of 3-D data using a hierarchical ICP algorithm [26]. ...
Article
Full-text available
The recent advances in full human body imaging technology illustrated by the 3D human body scanner (HBS), a device delivering full human body shape data, opened up large perspectives for the deployment of this technology in various fields (e.g. clothing industry, anthropology, entertainment). Yet this advance brought challenges on how to process and interpret the data delivered by the HBS in order to bridge the gap between this technology and potential applications. This paper surveys the literature on methods for human body scan data segmentation and modelling that attempted to overcome these challenges. It also discusses and evaluates the different approaches with respect to several requirements.
... The main goal is to move one sensor (using a robot) from one position to another around the object in order to monitor and analyse important features that characterize the studied target. Likewise, there are other similar works on 3D reconstruction of objects and spaces, such as [9], [10], [11]. ...
Article
Full-text available
As a consequence of increasing safety concerns, camera surveillance has been widely adopted as a way to monitor public spaces. One of the major challenges of camera surveillance is to design an optimal method for camera network placement in order to ensure the greatest possible coverage. In addition, this method must consider the landscape of the monitored environment to take into account existing objects that may influence the deployment of such a network. In this paper, a new Voronoi-based 3D GIS-oriented approach named "HybVOR" is proposed for surveillance camera network placement. The "HybVOR" approach aims to achieve coverage near 100% through three main phases. First, a Voronoi diagram is generated from buildings' footprints and cameras are placed on the Voronoi edges. Second, the level of coverage is assessed by calculating a viewshed based on a raster digital surface model of the region of interest. Finally, the visibility of the main buildings' entrances is evaluated based on a 3D vector model that contains these features. The effectiveness of the "HybVOR" approach is demonstrated through a case study corresponding to an area of interest in Jeddah Seaport in the Kingdom of Saudi Arabia.
... Some particular techniques and other constraints are necessary to reduce this computation time. Weik and Liedtke [4] proposed a hierarchical method for 3D pose estimation. Luck et al. [5] proposed a real-time algorithm that reduces the number of joints and their degrees of freedom in the human body model. ...
Article
Full-text available
This paper proposes a real-time, video-based motion capture system using only one video camera. Since conventional video-based motion capture systems need many cameras and take a long time to process the many video images, they cannot generate motion data in real time and therefore cannot be used as a real-time input device for a standard PC. In contrast, the prototype system proposed in this paper uses only one video camera: it takes video images of the upper body of the person and employs a very simple motion-tracking method to generate upper-body motion data, e.g., the x, y, z position of the hands, face rotation, and body rotation, in real time. This paper mainly describes its aspects as a hand and face motion-capturing device for a standard PC, showing application examples. Keywords: image understanding, motion capture, motion recognition, interface, virtual reality
... Body pose recovery methods from images frequently use geometrically defined models of humans as a reference for pose estimation (e.g. Cheung and Kanade, 2000; Weik and Liedtke, 2001; Kakadiaris and Metaxas, 1998). Previous research has focused on creating a geometric skeleton that represents a simplification of the subject's geometric properties as a set of solids roughly aligned to the medial axis of each limb. ...
Article
Full-text available
The recovery of 3D models from visual data generally results in a geometric skeleton that is a simplification of the shape of the captured figure, called an avatar. In this paper, we introduce a method called hierarchical kinematic synthesis that identifies an articulated skeleton, called a kinematic skeleton, which provides a compact representation of the movement of the tracked subject. In this work, the kinematic skeleton of a human is computed from a finite number of key poses captured from full-body articulated movements of an arbitrary subject, and provides the location of joints and the length and twist angle of the links that form the limbs of the 3D avatar. We use an approximation to the human skeleton which consists of five serial chains constructed from revolute and spherical joints. To recover the kinematic skeleton, a hierarchical approximate finite-position synthesis methodology determines the dimensions of these chains limb by limb. We show that this technique effectively recovers the kinematic skeleton for several synthetically generated datasets, and that the identification of the kinematic skeleton improves pose estimation for 3D data while simplifying the generation of avatar movement.
... In addition to blobs, another larger class of approaches to segmenting a posture is based on features of the human silhouette. For example, Weik and Liedtke [5] tracked the negative minimum curvatures along body contours to segment body parts and then recognized body postures using a modified ICP algorithm. Furthermore, in [9], Mori et al. integrated the features of contour, shape, shading, and focus to detect half-limbs and assembled them into different human body parts. ...
Conference Paper
Full-text available
This paper presents a new segmentation algorithm that segments a body posture into different body parts using the technique of triangulation. To analyze each posture well, we first propose a triangulation-based method that triangulates it into different triangle meshes. Then, we use a depth-first search scheme to find a spanning tree, which serves as the skeleton feature, from the set of triangulation meshes. The triangulation-based scheme for extracting important skeleton features is more robust and effective than other silhouette-based approaches. Different body parts can then be roughly extracted by removing all the branching points from the spanning tree. A model-driven technique is then proposed to segment a human body into semantic parts more accurately. This technique uses the concept of the Gaussian mixture model (GMM) to model different visual properties of different body parts. Then, a suitable segmentation scheme can be driven by classifying these models using their skeletons. Experimental results have proved that the proposed method is robust, accurate, and powerful for body part segmentation.
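The depth-first search that turns the triangle-adjacency graph into a spanning-tree skeleton can be sketched as follows; the adjacency-dictionary representation and function name are illustrative assumptions, not the paper's code. Branching points of the resulting tree are then removed to separate rough body parts.

```python
def dfs_spanning_tree(adjacency, root):
    """Depth-first search over the triangle-adjacency graph; the returned
    parent map defines a spanning tree used as a coarse skeleton."""
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for nb in adjacency[node]:
            if nb not in parent:
                parent[nb] = node  # tree edge node -> nb
                stack.append(nb)
    return parent
```

Nodes with three or more tree neighbours are the branching points; deleting them leaves one chain of triangles per limb.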
... In addition to blobs, the silhouette is another important feature of body part segmentation. For example, Weik and Liedtke [22] tracked the negative minimum curvatures along body contours and then analyzed each body part using a modified iterative closest point algorithm. In [13], Rosin traced the convexity of object contour to segment an object into different parts. ...
Article
Full-text available
This paper presents a novel segmentation algorithm that segments a body posture into different body parts using the technique of deformable triangulation. To analyze each posture more accurately, postures are segmented into triangular meshes, from which a spanning tree can be found using a depth-first search scheme. Then, we can decompose the tree into different subsegments, where each subsegment can be considered a limb. Two hybrid methods (i.e., the skeleton-based and model-driven methods) are then proposed for segmenting the posture into different body parts according to its occlusion conditions. To analyze occlusion conditions, a novel clustering scheme is proposed to cluster the training samples into a set of key postures. Then, a model space can be used to classify and segment each posture. If the input posture belongs to the non-occlusion category, the skeleton-based method is used to divide it into different body parts, which can be refined using a set of Gaussian mixture models (GMMs). For the occlusion case, we propose a model-driven technique to select a good reference model for guiding the process of body part segmentation. However, if two postures' contours are similar, there will be some ambiguity that can lead to failure during the model selection process. Thus, this paper proposes a tree structure that uses a tracking technique so that the best model can be selected not only from the current frame but also from its previous frame. Then, a suitable GMM-based segmentation scheme can be used to finely segment a body posture into the different body parts. The experimental results show that the proposed method for body part segmentation is robust, accurate, and powerful.
... Some particular techniques and other constraints are necessary to reduce this computation time. Weik and Liedtke proposed a hierarchical method for 3D pose estimation [8]. Luck et al. proposed a real-time algorithm with a reduced number of joints and degrees of freedom in a human body [9]. ...
... Pose estimation based on 3D data has been addressed in [7]. The 3D volume of a person is estimated in a multi-camera setup using the shape-from-silhouette method. ...
Conference Paper
Full-text available
We describe a technique for estimating human pose from an image sequence captured by a time-of-flight camera. The pose estimation is derived from a simple model of the human body that we fit to the data in 3D space. The model is represented by a graph consisting of 44 vertices for the upper torso, head, and arms. The anatomy of these body parts is encoded by the edges, i.e. an arm is represented by a chain of pairwise connected vertices whereas the torso consists of a 2-dimensional grid. The model can easily be extended to the representation of legs by adding further chains of pairwise connected vertices to the lower torso. The model is fit to the data in 3D space by employing an iterative update rule common to self-organizing maps. Despite the simplicity of the model, it captures the human pose robustly and can thus be used for tracking the major body parts, such as arms, hands, and head. The accuracy of the tracking is around 5–6 cm root mean square (RMS) for the head and shoulders and around 2 cm RMS for the head. The implementation of the procedure is straightforward and real-time capable.
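The iterative update rule common to self-organizing maps, as described above, pulls the winning (closest) model vertex, and more weakly its graph neighbours, toward each data point. The following is a minimal 2D sketch of that style of fit, not the paper's implementation; the learning rates, epoch count, and edge-list graph representation are arbitrary assumptions.

```python
import numpy as np

def som_fit(vertices, edges, data, epochs=10, lr=0.1, neighbor_lr=0.05):
    """SOM-style fit: for each data point, move the closest model vertex
    (and, with a smaller step, its graph neighbours) toward that point."""
    verts = vertices.copy()
    for _ in range(epochs):
        for p in data:
            w = np.argmin(np.linalg.norm(verts - p, axis=1))  # winner vertex
            verts[w] += lr * (p - verts[w])
            for a, b in edges:  # weaker pull on graph neighbours of the winner
                if a == w:
                    verts[b] += neighbor_lr * (p - verts[b])
                elif b == w:
                    verts[a] += neighbor_lr * (p - verts[a])
    return verts
```

The neighbour updates are what keep the graph's chains (arms, torso grid) coherent instead of letting vertices scatter independently over the point cloud.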
... These approaches reconstruct the volume of a moving person at interactive frame rates and fit a comparably simple ellipsoid model to the volumes [3], or compute the motion parameters for a kinematic structure by means of a force field exerted by the volume elements [14]. In [23], an iterative closest point method is used to fit a human model to volume data. In previous work, efficient optical feature tracking and volume reconstruction were hardly considered simultaneously for the acquisition of color-based feature tracking to determine the 3D locations of salient body features over time. ...
... Additionally, silhouettes have been used widely as tools for recognition. Weik and Liedtke [21] use 16 cameras and a shape-from-silhouette method to build a model template for the body, which is then matched to volume data generated in each successive frame. Pose analysis is performed based on an automatic reconstruction of the subject's skeleton. ...
Conference Paper
Full-text available
In this paper, we introduce a novel method for employing image-based rendering to extend the range of use of human motion recognition systems. We demonstrate the use of image-based rendering to generate additional training sets for view-dependent human motion recognition systems. Input views orthogonal to the direction of motion are created automatically to construct the proper view from a combination of non-orthogonal views taken from several cameras. To extend motion recognition systems, image-based rendering can be utilized in two ways: (i) to generate additional training sets for these systems containing a large number of non-orthogonal views, and (ii) to generate orthogonal views (the views those systems are trained to recognize) from a combination of non-orthogonal views taken from several cameras. In this case, image-based rendering is used to generate views orthogonal to the mean direction of motion. We tested the method using an existing view-dependent human motion recognition system on two different sequences of motion, and promising initial results were obtained.
Conference Paper
Nowadays, highly detailed animations of live-actor performances are increasingly easy to acquire, and 3D video has received considerable attention in visual media productions. This lecture will address a new paradigm for achieving performance capture using cage-based shapes in motion. We define cage-based performance capture as the non-invasive process of capturing the non-rigid surface of actors from multiple views in the form of sparse control-deformation-handle trajectories and a laser-scanned static template shape.
Conference Paper
Nowadays, highly detailed animations of live-like performances are easier to acquire thanks to low-cost sensors, and 4D meshes have received considerable attention in visual media productions. This course will address a new paradigm for achieving performance capture using cage-based shapes in motion. We define cage-based performance capture as the non-invasive process of capturing the non-rigid surface of actors from multiple views in the form of sparse control-deformation-handle trajectories and a laser-scanned template shape. In this course, we address the hard problem of extracting or acquiring, and then reusing, non-rigid parametrization for video-based animations in four steps: (1) cage-based inverse kinematics, (2) conversion of surface performance capture into cage-based deformation, (3) cage-based cartoon surface exaggeration, and (4) cage-based registration of time-varying reconstructed point clouds. The key objective is to attract the interest of game programmers, digital artists, and filmmakers in employing purely geometric, animator-friendly tools to capture and reuse surfaces in motion. Finally, a broad range of advanced animation techniques and promising research-to-production opportunities for the years to come, between the graphics and vision fields, will be presented. At first sight, a central challenge is to express plausible boneless deformations while preserving global and local properties of dynamic captured surfaces with a limited number of controllable, flexible, and reusable parameters. While abandoning the classical articulated skeleton as the underlying structure, we show that cage-based deformers offer a flexible design-space abstraction for dynamic non-rigid surface motion through learning space-time shape variability. Registered cage-handle trajectories allow the reconstruction of complex mesh sequences by deforming an enclosed template mesh.
Decoupling motion from geometry, cage-based performance capture techniques offer reusable outputs for animation transfer.
Article
The acquisition of human motion data is of major importance for creating interactive virtual environments, intelligent user interfaces, and realistic computer animations. Today's performance of off-the-shelf computer hardware enables marker-free non-intrusive optical tracking of the human body. In addition, recent research shows that it is possible to efficiently acquire and render volumetric scene representations in real-time. This paper describes a system to capture human motion without the use of markers or scene-intruding devices. Instead, a 2D feature tracking algorithm and a silhouette-based 3D volumetric scene reconstruction method are applied directly to the image data. A person is recorded by multiple synchronized cameras, and a multi-layer hierarchical kinematic skeleton is fitted to each frame in a two-stage process. The pose of a first model layer at every time step is determined from the tracked 3D locations of hands, head and feet. A more sophisticated second skeleton layer is fitted to the motion data by applying a volume registration technique. We present results with a prototype system showing that the approach is capable of running at interactive frame rates.
Article
In marker-based human motion capture systems, accurately extracting and tracking the 2-D coordinates of the body joints is a key step, because the 3-D reconstruction process and the reliability of the capture system depend heavily on it. Unlike traditional solutions, we use ordinary industrial cameras and take colorful balls as the markers to address this key point. We have also extended our solution to handle the problem of occlusion. Finally, we obtained good results in practical applications, and the whole process can be computed in real time. The method will be extended in future work.
Conference Paper
This paper surveys the current state of the art of digital human body modeling with a focus on information inclusion and analyzes the results from the aspects of design and engineering. It presents the results of a literature study, which intended to investigate the modeling approaches within the mentioned categories, and to investigate the fidelity of models based on their information content. In view of the fact that modeling is always a simplified representation of reality, models with different information contents are developed for different applications. It is also discussed in this paper that the information content of human body models, however, reflects not only the aspect of application, but also the level of fidelity, or functional sophistication. Taking into consideration the sorts of information needed to model the human body as a complex organic system, the authors propose an information content-based categorization. The major categories of aspect models of the human body that have been incorporated in a stratified reasoning scheme are: morphological, material, structural, mechanical, physiological and behavioral models. One conclusion is that remarkable progress has been achieved in terms of the sophistication of models (i.e., of information inclusion and processing methods). Another conclusion is that further increase of the fidelity of models will not be possible without the proper treatment of the concomitant complexities. Integration of various aspect models and real-time computational processing of human models are inevitable in several fields of application. However, development of human body models of such high sophistication goes together with an exponential growth in the required capacities. This leads us to a trade-off problem in digital human body modeling.
Conference Paper
This paper presents a new posture classification system that analyzes different human behaviors directly from video sequences using the technique of triangulation. To analyze each posture in the video sequences well, we propose a triangulation-based method that triangulates it into different triangle meshes, from which two important posture features are then extracted: the skeleton and the centroid context. The first is used for a coarse search, and the second for a finer classification that classifies postures in more detail. For the first descriptor, we take advantage of a dfs (depth-first search) scheme to extract the skeleton features of a posture from its triangulation result. Then, with the help of skeleton information, we can define a new shape descriptor, the centroid context, to describe a posture up to a semantic level. That is, the centroid context is a finer descriptor that describes a posture not only from its whole shape but also from its body parts. Since the two descriptors are complementary, all desired human postures can be compared and classified very accurately. This ability to classify postures helps us generate a set of key postures for transferring a behavior sequence into a set of symbols. Then, a novel string matching scheme is proposed to analyze different human behaviors. Experimental results have proved that the proposed method is robust, accurate, and powerful for human behavior analysis.
Article
Full-text available
This paper evaluates the influence of depth information on the gesture recognition process. We propose depth silhouettes, a natural extension of the binary silhouette concept, as a mechanism to incorporate depth information for gesture recognition. Using depth silhouettes, we define extensions of three classic techniques employed previously for gesture recognition with monocular vision. These include: (a) silhouette compression using PCA and learning with HMM; (b) exemplar-based gesture recognition using HMM; and (c) temporal templates that in this work are compressed using PCA and learned with SVM. The results obtained show that, independently of the technique employed, the use of depth silhouettes increases the success rate significantly. Additionally, we show that the best results are obtained through the combined use of PCA and HMM.
Conference Paper
Full-text available
The recent advances in full human body (HB) imaging technology illustrated by the 3D human body scanner (HBS), a device delivering full HB shape data, opened up large perspectives for the deployment of this technology in various fields such as the clothing industry, anthropology, and entertainment. However, these advances also brought challenges on how to process and interpret the data delivered by the HBS in order to bridge the gap between this technology and potential applications. This paper presents a literature survey of research work on HBS data segmentation and modeling aiming at overcoming these challenges, and discusses and evaluates different approaches with respect to several requirements.
Conference Paper
Full-text available
This work discusses an approach to seamlessly integrate real and virtual scene content by on-the-fly 3D scene modeling and dynamic scene interaction. The key element is a ToF depth camera, accompanied by color cameras, mounted on a pan-tilt head. The system allows the environment to be scanned for easy 3D reconstruction, and it tracks and models dynamically moving objects such as human actors in 3D. This makes it possible to compute mutual occlusions between real and virtual objects and to generate correct light and shadows with mutual light interaction. No dedicated studio is required, as virtually any room can be turned into a virtual studio with this approach. Since the complete process operates in 3D and produces consistent color and depth sequences, the system can be used for full 3D TV production.
Book
Driven by consumer-market applications that enjoy steadily increasing economic importance, graphics hardware and rendering algorithms are a central focus of computer graphics research. Video-based rendering is an approach that aims to overcome the current bottleneck in the time-consuming modeling process and has applications in areas such as computer games, special effects, and interactive TV. This book offers an in-depth introduction to video-based rendering, a rapidly developing new interdisciplinary topic employing techniques from computer graphics, computer vision, and telecommunication engineering. Providing an overview of the state-of-the-art of video-based rendering and details of the fundamental VBR algorithms, the author discusses the advantages, the potential, as well as the limitations of the approach in the context of different application scenarios.
Article
We introduce in this paper a novel method for employing image-based rendering to extend the range of applicability of human motion and gait recognition systems. Much work has been done in the field of human motion and gait recognition, and many interesting methods for detecting and classifying motion have been developed. However, systems that can robustly recognize human behavior in real-world contexts have yet to be developed. A significant reason for this is that the activities of humans in typical settings are unconstrained in terms of the motion path. People are free to move throughout the area of interest in any direction they like. While there have been many good classification systems developed in this domain, the majority of these systems have used a single camera providing input to a training-based learning method. Methods that rely on a single camera are implicitly view-dependent. In practice, the classification accuracy of these systems often becomes increasingly poor as the angle between the camera and the direction of motion varies away from the training view angle. As a result, these methods have limited real-world applications, since it is often impossible to limit the direction of motion of people so rigidly. We demonstrate the use of image-based rendering to adapt the input to meet the needs of the classifier by automatically constructing the proper view (image), that matches the training view, from a combination of arbitrary views taken from several cameras. We tested the method on 162 sequences of video data of human motions taken indoors and outdoors, and promising results were obtained.
Article
We present an algorithm for acquiring the 3D surface geometry and motion of a dynamic piecewise-rigid object using a single depth video camera. The algorithm identifies and tracks the rigid components in each frame, while accumulating the geometric information acquired over time, possibly from different viewpoints. The algorithm also reconstructs the dynamic skeleton of the object, thus can be used for markerless motion capture. The acquired model can then be animated to novel poses. We show the results of the algorithm applied to synthetic and real depth video.
Article
In this paper, we propose a new algorithm for partitioning human postures represented by 3D point clouds sampled from the surface of the human body. The algorithm is formed as a constrained extension of a recently developed segmentation method, spectral clustering (SC). The algorithm offers two merits: (1) as a nonlinear method, it can handle data (the point cloud) sampled from a manifold (the surface of the human body) rather than from the entire embedded 3D space; (2) by using constraints, it facilitates the integration of multiple similarities for human posture partitioning and helps to reduce the limitations of spectral clustering. We show that the constrained spectral clustering (CSC) can still be solved by generalized eigen-decomposition. Experimental results confirm the effectiveness of the proposed algorithm.
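The eigen-decomposition step behind such a method can be sketched in plain NumPy. The sketch below is unconstrained spectral clustering only (the paper's constraint machinery is omitted); the Gaussian affinity, the symmetric Laplacian normalization (equivalent to the generalized problem L v = λ D v), and the deterministic farthest-point 2-means seeding are illustrative choices, not the paper's formulation:

```python
import numpy as np

def spectral_embed(points, k, sigma=1.0):
    """Embed a point cloud using the k smallest eigenvectors of the
    symmetrically normalized graph Laplacian (equivalent eigenpairs
    to the generalized problem L v = lambda D v)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian affinity graph
    np.fill_diagonal(W, 0.0)
    d_isqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(points)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues ascending
    return vecs[:, :k]

def two_means(X, iters=20):
    """Tiny 2-means with deterministic farthest-point initialization."""
    c = np.stack([X[0], X[((X - X[0]) ** 2).sum(1).argmax()]])
    for _ in range(iters):
        labels = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
        c = np.stack([X[labels == j].mean(0) for j in (0, 1)])
    return labels

# two well-separated blobs standing in for points sampled from two limbs
rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal(0.0, 0.1, (30, 3)),
                   rng.normal(3.0, 0.1, (30, 3))])
labels = two_means(spectral_embed(cloud, 2))
```

Because the cross-blob affinities are nearly zero, the affinity graph is almost disconnected and the two smallest eigenvectors approximately indicate the two components, so the clusters separate cleanly in the embedding.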
Article
Full-text available
Camera placement has an enormous impact on the performance of vision systems, but the best placement to maximize performance depends on the purpose of the system. As a result, this paper focuses largely on the problem of task-specific camera placement. We propose a new camera placement method that optimizes views to provide the highest resolution images of objects and motions in the scene that are critical for the performance of some specified task (e.g. motion recognition, visual metrology, part identification, etc.). A general analytical formulation of the observation problem is developed in terms of motion statistics of a scene and resolution of observed actions, resulting in an aggregate observability measure. The goal of this system is to optimize across multiple cameras the aggregate observability of the set of actions performed in a defined area. The method considers dynamic and unpredictable environments, where the subject of interest changes in time. It does not attempt to measure or reconstruct surfaces or objects, and does not use an internal model of the subjects for reference. As a result, this method differs significantly in its core formulation from camera placement solutions applied to problems such as inspection, reconstruction or the Art Gallery class of problems. We present tests of the system's optimized camera placement solutions using real-world data in both indoor and outdoor situations and robot-based experimentation using an all-terrain robot vehicle (ATRV-Jr) in an indoor setting.
Article
Part II uses the foundations of Part I [35] to define constraint equations for 2D-3D pose estimation of different corresponding entities. Most articles on pose estimation concentrate on specific types of correspondences, mostly between points, and only rarely use line correspondences. The first aim of this part is to extend pose estimation scenarios to correspondences of an extended set of geometric entities. In this context we are interested in relating the following (2D) image and (3D) model types: 2D point/3D point, 2D line/3D point, 2D line/3D line, 2D conic/3D circle, 2D conic/3D sphere. Furthermore, to handle articulated objects, we describe kinematic chains in this context in a similar manner. We ensure that all constraint equations result in a distance measure in Euclidean space, which is well posed in the context of noisy data. We also discuss the numerical estimation of the pose. We propose to use linearized twist transformations, which result in well-conditioned and quickly solvable systems of equations. The key idea is not to search for the representation of the Lie group describing the rigid body motion, but for the representation of its generating Lie algebra. This leads to real-time capable algorithms.
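The linearization idea can be sketched for the simplest case, 3D point correspondences. For a small twist (w, v) in the Lie algebra se(3), a point moves as p' ≈ p + w × p + v, which is linear in (w, v); solving the stacked least-squares system and exponentiating back to SE(3) gives a pose update. The Gauss-Newton loop and all names below are an illustrative sketch, not the paper's formulation:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix: hat(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(w, v):
    """Closed-form exponential of a twist (w, v) in se(3) -> 4x4 motion."""
    T = np.eye(4)
    theta = np.linalg.norm(w)
    if theta < 1e-12:                     # pure translation
        T[:3, 3] = v
        return T
    K = hat(w / theta)                    # unit-axis skew matrix
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K
    V = (np.eye(3) + (1.0 - np.cos(theta)) / theta * K
         + (theta - np.sin(theta)) / theta * K @ K)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def estimate_pose(model, target, iters=10):
    """Gauss-Newton on the linearized residual q - p ~ -hat(p) w + v."""
    T = np.eye(4)
    for _ in range(iters):
        p = model @ T[:3, :3].T + T[:3, 3]
        A = np.zeros((3 * len(p), 6))
        for i, pi in enumerate(p):
            A[3*i:3*i+3, :3] = -hat(pi)   # derivative w.r.t. w
            A[3*i:3*i+3, 3:] = np.eye(3)  # derivative w.r.t. v
        xi, *_ = np.linalg.lstsq(A, (target - p).ravel(), rcond=None)
        T = exp_se3(xi[:3], xi[3:]) @ T   # multiplicative update
    return T

# recover a known rigid motion from noiseless point correspondences
rng = np.random.default_rng(0)
model = rng.normal(size=(10, 3))
T_true = exp_se3(np.array([0.1, -0.2, 0.3]), np.array([0.4, 0.1, -0.2]))
target = model @ T_true[:3, :3].T + T_true[:3, 3]
T_est = estimate_pose(model, target)
```

Note that the search variable is the six-vector (w, v) in the Lie algebra, not the twelve entries of the rigid motion itself, which is exactly the point made in the abstract.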
Conference Paper
This paper presents a human action recognition system for recognizing various behaviors directly from videos. Firstly, we triangulate the human body into triangle meshes. Then, we use a depth-first search (DFS) scheme to find a spanning tree over the set of meshes. All leaves of the spanning tree are adopted as the extremities. Unlike traditional approaches, which find the extremities on the target's silhouette as skeletons, the extremities found from the internal centroids of triangle meshes represent a human posture more accurately and robustly. To model each human action, all the input skeleton sequences are then transformed into symbol sequences. Then, we design a string-matching scheme to measure the similarity between any two human behaviors. Since 2D postures are used in this paper, the above scheme is sensitive to different viewpoints. To solve the view-independence problem, a 2D matrix is then constructed for recording the symbol relations between two viewpoints. Thus, our proposed matching scheme is almost view-invariant. Experimental results show that the proposed scheme is a robust, efficient, and promising tool in human action recognition.
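The extremity-finding step reduces to graph traversal. Below is a minimal sketch over a toy adjacency graph standing in for the triangle-mesh neighbourhood structure, with node 0 playing the torso; the graph and names are illustrative, not the paper's data:

```python
def spanning_tree_leaves(adj, root):
    """Build a DFS spanning tree and return its leaves, which play the
    role of body extremities (head, hands, feet) in the skeleton."""
    parent = {root: None}
    children = {u: [] for u in adj}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:        # tree edge: first time we see w
                parent[w] = u
                children[u].append(w)
                stack.append(w)
    return sorted(u for u in adj if not children[u])

# toy mesh-adjacency graph: 0 = torso, chains ending at nodes 3, 4, 5
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5], 3: [0], 4: [1], 5: [2]}
extremities = spanning_tree_leaves(adj, root=0)   # -> [3, 4, 5]
```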
Conference Paper
The acquisition of human motion data is of major importance for creating interactive virtual environments, intelligent user interfaces, and realistic computer animations. Today's performance of off-the-shelf computer hardware enables marker-free non-intrusive optical tracking of the human body. In addition, recent research shows that it is possible to efficiently acquire and render volumetric scene representations in real-time. This paper describes a system to capture human motion at interactive frame rates without the use of markers or scene-intruding devices. Instead, 2D computer vision and 3D volumetric scene reconstruction algorithms are applied directly to the image data. A person is recorded by multiple synchronized cameras, and a multilayer hierarchical kinematic skeleton is fitted to each frame in a two-stage process. We present results with a prototype system running on two PCs.
Conference Paper
This paper proposes a real-time, video-based motion capture system using two video cameras. Since conventional video-based motion capture systems use many video cameras and take a long time to process the many video images, they cannot generate motion data in real time. The proposed prototype system, on the other hand, uses at most two video cameras and employs a very simple motion-tracking method based on object color and edge distributions: it takes video images of the person, locates body parts such as the hands, feet, and head (e.g., the x, y position of each), and generates motion data for these parts in real time. Using two video cameras, it generates 3D motion data in real time. This paper mainly describes the system's aspects as a real-time motion capture system for the tip parts of the human body, i.e., the hands, feet, and head, and validates its usefulness by showing virtual reality (VR) application examples.
Article
Full-text available
The ability to recognize humans and their activities by vision is key for a machine to interact intelligently and effortlessly with a human-inhabited environment. Because of many potentially important applications, “looking at people” is currently one of the most active application domains in computer vision. This survey identifies a number of promising applications and provides an overview of recent developments in this domain. The scope of this survey is limited to work on whole-body or hand motion; it does not include work on human faces. The emphasis is on discussing the various methodologies; they are grouped in 2-D approaches with or without explicit shape models and 3-D approaches. Where appropriate, systems are reviewed. We conclude with some thoughts about future directions.
Article
This contribution describes the semi-automatic creation of highly realistic flexible 3D models of participants for distributed 3D videoconferencing systems. The proposed technique uses a flexible mesh template surrounding an interior skeleton structure which is based on a simplified human skeleton. The vertices of this template are arranged in rigid rings along the bones of the skeleton. Using 3D data obtained by a shape from silhouettes approach, the size and shape of the mesh template are adapted to the real person. Texture mapping of the adapted mesh using real camera images leads to a natural impression. The mesh organization in rigid rings allows an efficient surface deformation according to the skeleton movements. Once the resulting model is transmitted, it can be animated subsequently using the simple parameter set of the interior skeleton structure. Results obtained with real image data confirm the eligibility of the animated person models in terms of realism and efficiency for 3D videoconferencing applications. Copyright © 2000 John Wiley & Sons, Ltd.
Article
This work is concerned with the signal-to-symbol problem of building skinned, segmented, land-marked and labeled 3D models of the whole human body from range data. A fully automated model-based process is presented that takes raw range data, cleans and skins it, and then locates "interesting" features, to enrich the surface with symbolic information for specific applications. The method is validated via volumetrics in medicine and surface anthropometry.
Article
A technique called distance-ordered homotopic thinning (DOHT) for skeletonizing 3D binary images is presented. DOHT produces skeletons that are homotopic, thin, and medial. This is achieved by sequentially deleting points in ascending distance order until no more can be safely deleted. A point can be safely deleted only if doing so preserves topology. Distance information is provided by the chamfer distance transform, an integer approximation to the Euclidean distance transform. Two variations of DOHT are presented that arise from using different rules for preserving points. The first uses explicit rules for preserving the ends of medial axes or edges of medial surfaces, and the second preserves the centers of maximal balls identified from the chamfer distance transform. By thresholding the centers according to their distance values, the user can control the scale of features represented in the skeleton. Results are presented for real and synthetic 2D and 3D data.
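The chamfer distance transform underlying DOHT can be sketched in 2D with the classic two-pass 3-4 mask (orthogonal steps cost 3, diagonal steps cost 4, i.e. roughly 3x the Euclidean distance); the 3D case adds more mask neighbours but is structurally identical. This is a generic sketch of the transform, not the paper's implementation:

```python
import numpy as np

def chamfer_34(img):
    """Two-pass 3-4 chamfer distance transform of a 2D binary image:
    approximate distance (scaled by 3) from each object pixel (1)
    to the nearest background pixel (0)."""
    INF = 1 << 30
    h, w = img.shape
    d = np.where(img > 0, INF, 0).astype(np.int64)
    # forward raster pass: causal neighbours (left, up, up-left, up-right)
    for y in range(h):
        for x in range(w):
            if d[y, x] == 0:
                continue
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x - 1] + 3)
            if y > 0:
                d[y, x] = min(d[y, x], d[y - 1, x] + 3)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y - 1, x - 1] + 4)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y - 1, x + 1] + 4)
    # backward raster pass: anti-causal neighbours
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if d[y, x] == 0:
                continue
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x + 1] + 3)
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y + 1, x] + 3)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y + 1, x + 1] + 4)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y + 1, x - 1] + 4)
    return d

# 5x5 image: a 3x3 object with a one-pixel background border
img = np.zeros((5, 5), dtype=int)
img[1:4, 1:4] = 1
d = chamfer_34(img)   # d[2, 2] == 6: two 3-cost steps from the border
```

In DOHT these integer distance values supply both the ascending deletion order and, via centers of maximal balls, the feature-scale threshold described in the abstract.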
Conference Paper
This paper describes a system which can perform full 3-D pose estimation of a single arbitrarily shaped, rigid object at rates up to 10 Hz. A triangular mesh model of the object to be tracked is generated offline using conventional range sensors. Real-time range data of the object is sensed by the CMU high speed VLSI range sensor. Pose estimation is performed by registering the real-time range data to the triangular mesh model using an enhanced implementation of the Iterative Closest Point (ICP) Algorithm introduced by Besl and McKay (1992). The method does not require explicit feature extraction or specification of correspondence. Pose estimation accuracies of the order of 1% of the object size in translation, and 1 degree in rotation have been measured
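The registration loop of such an ICP tracker can be sketched as follows. This is plain nearest-neighbour, point-to-point ICP with the closed-form Kabsch alignment; the paper's enhancements (and its range-sensor specifics) are omitted, and the demo data is synthetic:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid motion (R, t) minimizing ||R P + t - Q||
    over paired point sets P and Q."""
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    s = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, s]) @ U.T
    return R, cq - R @ cp

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: nearest-neighbour correspondences,
    then closed-form rigid alignment, iterated."""
    P = src.copy()
    for _ in range(iters):
        # correspondence: nearest dst point for each current src point
        nn = ((P[:, None, :] - dst[None, :, :]) ** 2).sum(-1).argmin(1)
        R, t = kabsch(P, dst[nn])
        P = P @ R.T + t
    return P

# register a cloud against a slightly rotated and shifted copy of itself
rng = np.random.default_rng(3)
src = rng.normal(size=(15, 3))
c, s = np.cos(0.05), np.sin(0.05)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
dst = src @ Rz.T + np.array([0.1, -0.05, 0.02])
aligned = icp(src, dst)
```

As the abstract notes, no explicit feature extraction or correspondence specification is needed: the nearest-neighbour step supplies the correspondences, which is what makes ICP attractive for tracking raw range data against a mesh model.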
Article
An algorithm for the mapping of texture from multiple camera views onto a 3D model of a real object is presented. The texture sources are images taken from an object rotating in front of a stationary calibrated camera. The 3D model is represented by a wireframe built of triangles and is geometrically adjusted to the camera views. The presented approach aims at reducing the texture distortion associated with the boundaries between triangles mapped from different camera views, and the disturbances caused by parts of the object surface that are not visible. For this purpose, adjacent triangles describing the surface of the 3D model are grouped into homogeneous surface regions which are textured with a common image, followed by local texture filtering at the region boundaries. For triangles not visible in any camera view, a filter has been developed which uses the texture from adjacent visible triangles to generate synthetic texture. Experimental investigations with different real 3D objects have ...
Article
This contribution describes an approach towards 3D teleconferencing. Textured, 3D anthropomorphic models are used in a virtual environment to give the impression of physical closeness. The requirements for such a conferencing system are, on the one hand, textured, articulated 3D models of the conferees. For high realism a flexible deformation model has been integrated into the 3D models. On the other hand, these models have to be animated in the virtual meeting room according to the motion parameters of the real conferees; therefore, motion estimation has to be performed. To avoid wiring the persons, this has to be done optically. In this approach a gradient-based motion tracker has been implemented. No markers or optical tracking points are needed to extract the hierarchic motion parameters of the conferee. It works on a stereoscopic image sequence and employs the flexible, articulated anthropomorphic model of the conferee. The motion hierarchy of the articulated model is used to reduce the...
D. A. Simon, M. Hebert, T. Kanade: Real-time 3D pose estimation using a high-speed range sensor
L. Dekker, I. Douros, B. F. Buxton, P. Treleaven: Building symbolic information for 3D human body modeling from range data