Visualization of the ground contact constraint.

Source publication
Article
Full-text available
Human motion capture (MoCap) plays a key role in healthcare and human–robot collaboration. Some researchers have combined orientation measurements from inertial measurement units (IMUs) and positional inference from cameras to reconstruct the 3D human motion. Their works utilize multiple cameras or depth sensors to localize the human in three dimen...

Context in source publication

Context 1
... define the cost as depicted in Figure 2. Let $\hat{P}^B_g(\theta)$, where $g \in \{\text{left\_foot}, \text{right\_foot}\}$, be the left or right ankle position of the estimated model, and let $P^B_g$ be the intersection between the contact surface and the line along which the 2D ankle keypoint is back-projected into three dimensions. ...
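The excerpt cuts off before the cost itself is written out. A plausible reconstruction, assuming the standard choice of a squared-distance penalty between the two points just defined (the exact form in the source may differ):

```latex
% Hypothetical reconstruction; the excerpt truncates before the cost term.
% Assumes a squared-distance penalty between the estimated ankle position
% \hat{P}^B_g(\theta) and the back-projected contact point P^B_g.
E_{\text{contact}}(\theta) = \sum_{g \in \{\text{left\_foot},\,\text{right\_foot}\}}
    \bigl\lVert \hat{P}^B_g(\theta) - P^B_g \bigr\rVert_2^2
```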

Citations

... Additionally, approaches like smoothing techniques and robust algorithms designed for challenging conditions, unexpected noise, and segmentation play a pivotal role. These highlights underscore the importance of algorithms in optimizing and enhancing the quality of sensor-captured data [1,7,32,39–43]. ...
Article
Full-text available
Technological advancements have expanded the range of methods for capturing human body motion, including solutions involving inertial sensors (IMUs) and optical alternatives. However, the rising complexity and costs associated with commercial solutions have prompted the exploration of more cost-effective alternatives. This paper presents a markerless optical motion capture system using a RealSense depth camera and intelligent computer vision algorithms. It facilitates precise posture assessment, the real-time calculation of joint angles, and acquisition of subject-specific anthropometric data for gait analysis. The proposed system stands out for its simplicity and affordability in comparison to complex commercial solutions. The gathered data are stored in comma-separated value (CSV) files, simplifying subsequent analysis and data mining. Preliminary tests, conducted in controlled laboratory environments and employing a commercial MEMS-IMU system as a reference, revealed a maximum relative error of 7.6% in anthropometric measurements, with a maximum absolute error of 4.67 cm at average height. Stride length measurements showed a maximum relative error of 11.2%. Static joint angle tests had a maximum average error of 10.2%, while dynamic joint angle tests showed a maximum average error of 9.06%. The proposed optical system offers sufficient accuracy for potential application in areas such as rehabilitation, sports analysis, and entertainment.
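The real-time joint-angle calculation this abstract mentions reduces, at its core, to the angle between two segment vectors meeting at a shared keypoint. A minimal sketch of that computation, with hypothetical keypoint values rather than the paper's code:

```python
# Minimal sketch (not the paper's implementation): a joint angle from three
# 3D keypoints, as a markerless depth-camera system might compute it.
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at joint b (degrees) formed by segments b->a and b->c."""
    u, v = a - b, c - b
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Example: knee flexion from hip, knee, and ankle positions (meters).
hip = np.array([0.0, 1.0, 0.0])
knee = np.array([0.0, 0.55, 0.05])
ankle = np.array([0.0, 0.1, 0.0])
print(f"knee angle: {joint_angle(hip, knee, ankle):.1f} deg")
```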
... Also, 3D linear displacement kinematics relative to a starting position can be derived. However, the mostly consistent accuracy achieved for 3D angular kinematics cannot be achieved in 3D position estimation, as this involves double integration of the acceleration signal, causing strong integration drift [8,9]. Since neither the rate gyroscope nor the magnetometer provides additional displacement data, this drift cannot be counteracted through data fusion, as is done for the angular estimates. ...
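A minimal sketch of why the double integration drifts: even a small constant accelerometer bias grows quadratically in the position estimate. All values below are illustrative.

```python
# Double-integrating a biased, noisy accelerometer signal from a sensor that
# is actually at rest; the position estimate drifts as 0.5 * bias * t^2.
import numpy as np

dt, duration = 0.01, 60.0                       # 100 Hz for 60 s
t = np.arange(0.0, duration, dt)
bias = 0.02                                     # 0.02 m/s^2 residual bias
measured = bias + np.random.normal(0.0, 0.05, t.shape)

velocity = np.cumsum(measured) * dt             # first integration
position = np.cumsum(velocity) * dt             # second integration

print(f"position drift after {duration:.0f} s: {position[-1]:.2f} m "
      f"(bias alone predicts {0.5 * bias * duration**2:.2f} m)")
```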
Article
Full-text available
Integrated Ultra-wideband (UWB) and Magnetic Inertial Measurement Unit (MIMU) sensor systems have been gaining popularity for pedestrian tracking and indoor localization applications, mainly due to their complementary error characteristics that can be exploited to achieve higher accuracies via a data fusion approach. These integrated sensor systems have the potential for improving the ambulatory 3D analysis of human movement (estimating 3D kinematics of body segments and joints) over systems using only on-body MIMUs. For this, high accuracy is required in the estimation of the relative positions of all on-body integrated UWB/MIMU sensor modules. So far, these integrated UWB/MIMU sensors have not been reported to have been applied for full-body ambulatory 3D analysis of human movement. Also, no review articles have been found that have analyzed and summarized the methods integrating UWB and MIMU sensors for on-body applications. Therefore, a comprehensive analysis of this technology is essential to identify its potential for application in 3D analysis of human movement. This article thus aims to provide such a comprehensive analysis through a structured technical review of the methods integrating UWB and MIMU sensors for accurate position estimation in the context of the application for 3D analysis of human movement. The methods used for integration are all summarized along with the accuracies that are reported in the reviewed articles. In addition, the gaps that are required to be addressed for making this system applicable for the 3D analysis of human movement are discussed.
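The complementary error characteristics described above are typically exploited with a filter that propagates position from the IMU and corrects the accumulated drift with each UWB fix. A toy 1D Kalman-filter sketch of that pattern; rates, noise levels, and matrices are assumptions, not values from the reviewed methods:

```python
# Toy 1D IMU/UWB fusion: the IMU drives the prediction step, UWB position
# fixes drive the correction step. Illustrative only.
import numpy as np

dt = 0.01
F = np.array([[1.0, dt], [0.0, 1.0]])    # transition for state [pos, vel]
B = np.array([0.5 * dt**2, dt])          # how acceleration enters the state
H = np.array([[1.0, 0.0]])               # UWB observes position directly
Q = np.eye(2) * 1e-4                     # process noise (IMU errors)
R = np.array([[0.05**2]])                # UWB noise, ~5 cm std

x = np.zeros(2)                          # state: position, velocity
P = np.eye(2)

def predict(accel: float):
    global x, P
    x = F @ x + B * accel                # IMU-driven propagation
    P = F @ P @ F.T + Q

def update(uwb_pos: float):
    global x, P
    y = uwb_pos - (H @ x)                # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + (K @ y).ravel()
    P = (np.eye(2) - K @ H) @ P

for step in range(1000):
    predict(accel=np.random.normal(0.0, 0.1))   # noisy IMU at 100 Hz
    if step % 10 == 0:                           # UWB fix at 10 Hz
        update(uwb_pos=np.random.normal(0.0, 0.05))
print(f"fused position estimate: {x[0]:.3f} m")
```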
... Inertial Measurement Units (IMUs) are used for the self-perception of a robot's kinematics, e.g., in order to guarantee equilibrium during motion and operation. IMUs are mainly composed of gyroscopes, magnetometers, and accelerometers [111]. Infrared (IR) sensors are used, for instance, in combination with sonar to improve robots' ability to efficiently follow physicians in hospital wards. ...
Article
Full-text available
Cyber-physical or virtual systems or devices that are capable of autonomously interacting with human or non-human agents in real environments are referred to as social robots. The primary areas of application for biomedical technology are nursing homes, hospitals, and private homes for the purpose of providing assistance to the elderly, people with disabilities, children, and medical personnel. This review examines the current state-of-the-art of social robots used in healthcare applications, with a particular emphasis on the technical characteristics and requirements of these different types of systems. Humanoids robots, companion robots, and telepresence robots are the three primary categories of devices that are identified and discussed in this article. The research looks at commercial applications, as well as scientific literature (according to the Scopus Elsevier database), patent analysis (using the Espacenet search engine), and more (searched with Google search engine). A variety of devices are enumerated and categorized, and then our discussion and organization of their respective specifications takes place.
... However, IMUs suffer from severe drift during long-term capture, resulting in misalignments with the human body. Some methods therefore exploit additional sensors, such as an RGB camera [17], RGB-D cameras [46,57,67], or LiDAR [27], to alleviate the problem and achieve clear improvements. However, they all focus on HPE without considering scene constraints, which limits them for reconstructing integrated human-scene digital urban environments and natural human-scene interactions. ...
Preprint
Full-text available
We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and camera, we record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method to fit local SMPL meshes to the scene and fine-tune the camera calibration during dynamic motions frame by frame, resulting in plausible and scene-natural 3D human poses. Eventually, SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 m² (up to 13,000 m²), including more than 100K LiDAR frames, 300K video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, including camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates SLOPER4D poses significant challenges to existing methods and produces great research opportunities. The dataset and code are released at http://www.lidarhumanmotion.net/sloper4d/
... Approaches that fuse these two types of sensors to benefit from their complementarity, and so achieve more robust mocap, have attracted much attention. Preceding methods propose to combine IMUs with RGB cameras, either by optimization [42,20,37,35,36] or by regression [10,52,55,68]. Recently, Liang et al. [30] presented a learning-and-optimization method fusing a single camera with only 4 IMUs, which demonstrates robust capture of challenging motions. ...
Preprint
We propose a multi-sensor fusion method for capturing challenging 3D human motions with accurate consecutive local poses and global trajectories in large-scale scenarios, only using a single LiDAR and 4 IMUs. Specifically, to fully utilize the global geometry information captured by LiDAR and local dynamic motions captured by IMUs, we design a two-stage pose estimator in a coarse-to-fine manner, where point clouds provide the coarse body shape and IMU measurements optimize the local actions. Furthermore, considering the translation deviation caused by the view-dependent partial point cloud, we propose a pose-guided translation corrector. It predicts the offset between captured points and the real root locations, which makes the consecutive movements and trajectories more precise and natural. Extensive quantitative and qualitative experiments demonstrate the capability of our approach for compelling motion capture in large-scale scenarios, which outperforms other methods by an obvious margin. We will release our code and captured dataset to stimulate future research.
... It can capture accurate short-term motions but suffers from severe drift as the acquisition time increases. Some methods [6,13,39,47,48] utilize extra external RGB or RGB-D cameras as a remedy to improve the accuracy, but at the cost of limited capture space, human activities, and interactions. HPS [9] uses a head-mounted camera, which looks outwards like the human eyes, to complement IMUs in global localization. ...
... However, IMU-based methods suffer from severe drift over time. To improve pose estimation accuracy, some methods [6,13,24,39,40] utilize extra external RGB or RGB-D cameras as a remedy. Helten et al. [10] combined two RGB-D cameras with IMUs to perform local pose optimization. ...
Preprint
Full-text available
We propose Human-centered 4D Scene Capture (HSC4D) to accurately and efficiently create a dynamic digital world, containing large-scale indoor-outdoor scenes, diverse human motions, and rich interactions between humans and environments. Using only body-mounted IMUs and a LiDAR, HSC4D is space-free, without any external devices' constraints, and map-free, without pre-built maps. Considering that IMUs can capture human poses but always drift during long-period use, while LiDAR is stable for global localization but rough for local positions and orientations, HSC4D makes both sensors complement each other through a joint optimization and achieves promising results for long-term capture. Relationships between humans and environments are also explored to make their interactions more realistic. To facilitate many downstream tasks, like AR, VR, robots, autonomous driving, etc., we propose a dataset containing three large scenes (1k-5k m²) with accurate dynamic human motions and locations. Diverse scenarios (climbing gym, multi-story building, slope, etc.) and challenging human activities (exercising, walking up/down stairs, climbing, etc.) demonstrate the effectiveness and the generalization ability of HSC4D. The dataset and code are available at https://github.com/climbingdaily/HSC4D.
... As image-based mocap solutions suffer from occlusions, fusing images with IMUs to achieve more robust motion tracking has recently attracted much attention. This can be achieved either by energy-based optimization [22,35–37,43,65], which optimizes human pose to fit both image features and inertial measurements, or by feature-based estimation [13,62], which regresses human pose from combined features derived from images and IMUs. Zhang et al. [87] propose to exploit IMUs in 2D pose estimation by fusing the image features of each pair of joints linked by the IMUs. ...
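The energy-based family named here can be summarized compactly: the pose θ is optimized against a reprojection term on detected 2D keypoints and an orientation term on IMU readings. A generic sketch, where Π is the camera projection, J_j(θ) the model joints, p_j the detected keypoints, ρ a robust penalty, and R_i(θ) the segment orientations; the concrete terms and weights vary across the cited works and are assumed here:

```latex
% Generic sketch of the energy-based fusion objective; term choices and
% weights differ across the cited works and are illustrative only.
E(\theta) = \lambda_{\mathrm{2D}} \sum_{j} \rho\bigl( \Pi(J_j(\theta)) - p_j \bigr)
          + \lambda_{\mathrm{IMU}} \sum_{i} \bigl\lVert \log\bigl( R_i(\theta)^{\top} R_i^{\mathrm{IMU}} \bigr) \bigr\rVert^2
```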
Preprint
Motion capture from sparse inertial sensors has shown great potential compared to image-based approaches, since occlusions do not reduce tracking quality and the recording space is not restricted to the viewing frustum of a camera. However, capturing the motion and global position from only a sparse set of inertial sensors is inherently ambiguous and challenging. In consequence, recent state-of-the-art methods can barely handle very long-period motions, and unrealistic artifacts are common due to unawareness of physical constraints. To this end, we present the first method which combines a neural kinematics estimator and a physics-aware motion optimizer to track body motions with only 6 inertial sensors. The kinematics module first regresses the motion status as a reference, and then the physics module refines the motion to satisfy the physical constraints. Experiments demonstrate a clear improvement over the state of the art in terms of capture accuracy, temporal stability, and physical correctness.
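A schematic sketch of the kinematics-then-physics split this abstract describes. The refinement below enforces one representative physical constraint, the zero-velocity ("no foot skate") condition during detected ground contact; it illustrates the idea and is not the paper's optimizer:

```python
# A kinematics module would produce the reference foot trajectory; a
# physics-aware pass then pins the foot in place over each contact interval.
import numpy as np

def refine_foot_trajectory(foot_pos: np.ndarray, in_contact: np.ndarray) -> np.ndarray:
    """Hold the foot fixed over each contiguous contact interval."""
    refined = foot_pos.copy()
    t = 0
    while t < len(foot_pos):
        if in_contact[t]:
            start = t
            while t < len(foot_pos) and in_contact[t]:
                t += 1
            refined[start:t] = foot_pos[start:t].mean(axis=0)  # anchor pose
        else:
            t += 1
    return refined

# Example: a reference foot that drifts 2 cm while flagged as in contact.
traj = np.cumsum(np.full((50, 3), [0.0004, 0.0, 0.0]), axis=0)
contact = np.ones(50, dtype=bool)
print(np.ptp(refine_foot_trajectory(traj, contact), axis=0))  # ~zero motion
```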
... The conventional approach to gait analysis is to attach six IMUs to the upper and lower legs and feet [15]. Some IMU-based full-body motion analyses require more than 10 inertial sensors to track one subject [2], [16], [17]. Such configurations are prone to errors because each sensor must be attached to a predefined body segment. ...
... The results also suggest that the attention module reduces intra-misassignment errors. [Figure 7: confusion matrices of assignment results on CMU-MoCap [44]; (a) the one-by-one approach [16], (b) ours.] The left column represents the assignment results of the conventional work [18], and the right column represents the results of the proposed method. ...
Article
Full-text available
Due to the recent technological advances in inertial measurement units (IMUs), many applications for the measurement of human motion using multiple body-worn IMUs have been developed. In these applications, each IMU has to be attached to a predefined body segment. A technique to identify the body segment on which each IMU is mounted allows users to attach inertial sensors to arbitrary body segments, which avoids having to remeasure due to incorrect attachment of the sensors. We address this IMU-to-segment assignment problem and propose a novel end-to-end learning model that incorporates a global feature generation module and an attention-based mechanism. The former extracts the feature representing the motion of all attached IMUs, and the latter enable the model to learn the dependency relationships between the IMUs. The proposed model thus identifies the IMU placement based on the features from global motion and relevant IMUs. We quantitatively evaluated the proposed method using synthetic and real public datasets with three sensor configurations, including a full-body configuration mounting 15 sensors. The results demonstrated that our approach significantly outperformed the conventional and baseline methods for all datasets and sensor configurations.
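A hypothetical sketch of the model family this abstract describes: per-IMU features, a pooled global motion feature, and self-attention capturing dependencies between IMUs, followed by per-IMU segment classification. The layer sizes, 6-channel input layout, and time-averaging are assumptions, not the paper's architecture:

```python
# Sketch of an attention-based IMU-to-segment assignment model (PyTorch).
import torch
import torch.nn as nn

class ImuToSegment(nn.Module):
    def __init__(self, feat_dim: int = 64, n_segments: int = 15):
        super().__init__()
        # per-IMU encoder: 6 channels (3-axis accel + gyro), averaged over time
        self.encoder = nn.Sequential(nn.Linear(6, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(2 * feat_dim, n_segments)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_imus, time, 6) raw inertial signals
        f = self.encoder(x.mean(dim=2))            # (batch, n_imus, feat_dim)
        attended, _ = self.attn(f, f, f)           # dependencies between IMUs
        g = attended.mean(dim=1, keepdim=True)     # global motion feature
        g = g.expand(-1, f.shape[1], -1)           # broadcast to every IMU
        return self.head(torch.cat([attended, g], dim=-1))

logits = ImuToSegment()(torch.randn(2, 15, 100, 6))
print(logits.shape)  # torch.Size([2, 15, 15]): per-IMU segment scores
```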
Preprint
Either RGB images or inertial signals have been used for the task of motion capture (mocap), but combining them is a new and interesting topic. We believe that the combination is complementary and able to solve the inherent difficulties of using one modality alone, including occlusions, extreme lighting/texture, and out-of-view cases for visual mocap, and global drift for inertial mocap. To this end, we propose a method that fuses monocular images and sparse IMUs for real-time human motion capture. Our method contains a dual coordinate strategy to fully exploit the IMU signals with different goals in motion capture. To be specific, besides one branch transforming the IMU signals to the camera coordinate system to combine with the image information, there is another branch learning from the IMU signals in the body root coordinate system to better estimate body poses. Furthermore, a hidden state feedback mechanism is proposed for both branches to compensate for their own drawbacks in extreme input cases. Thus our method can easily switch between the two kinds of signals or combine them in different cases to achieve robust mocap. Quantitative and qualitative results demonstrate that by delicately designing the fusion method, our technique significantly outperforms the state-of-the-art vision, IMU, and combined methods on both global orientation and local pose estimation. Our code is available for research at https://shaohua-pan.github.io/robustcap-page/.
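A minimal sketch of the dual coordinate strategy just described: the same IMU reading is re-expressed once in the camera frame (to pair with image features) and once in the body-root frame (for camera-independent pose estimation). All rotations and names below are illustrative placeholders, not the paper's code:

```python
# Re-expressing one IMU measurement in two target frames.
import numpy as np

def to_frame(R_world_to_frame: np.ndarray, R_imu_in_world: np.ndarray,
             accel_in_world: np.ndarray):
    """Re-express an IMU orientation and acceleration in a target frame."""
    return R_world_to_frame @ R_imu_in_world, R_world_to_frame @ accel_in_world

R_wc = np.eye(3)                    # world -> camera (from calibration)
R_wr = np.eye(3)                    # world -> body root (from the root IMU)
R_imu = np.eye(3)                   # one sensor's orientation in world frame
acc = np.array([0.0, 0.0, 9.81])    # its acceleration in world frame

cam_branch = to_frame(R_wc, R_imu, acc)    # input to the image-fusion branch
root_branch = to_frame(R_wr, R_imu, acc)   # input to the body-pose branch
```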