Figure - available from: Multimedia Tools and Applications
The proposed hand bones representation

Source publication
Article
Hand pose estimation is a significant research topic for various computer vision applications. Nonetheless, reliable and robust pose estimation with existing methods remains challenging due to the complex anatomy of the hand and the varying shapes and sizes of hands. The traditional approach involved using depth sensors or multi-camera setups. Howe...

Citations

... Much research on hand pose estimation has been conducted with depth cameras [2], [3]. However, their performance degrades beyond controlled indoor environments; they are sensitive to changing lighting conditions, such as direct sunlight, which can interfere with their infrared sensors [4]. Moreover, the hardware comes at a premium. ...
... HKD can be performed using various input modalities, including depth cameras and Leap Motion controllers. However, these devices are costly and affected by ambient conditions [4]. Therefore, color cameras were considered as the medium for capturing hand images. ...
... Yang et al. [15] modified the Nonparametric Structure Regularization Machine [16] by replacing its backbone with an HRNet integrated with a shuffle attention network [17] to estimate the 2D hand pose. In [4], a three-stage approach was proposed. The first stage extracted features using a UNet with a pretrained ResNet-34 encoder. ...
Article
This paper deals with the measurement of hand keypoints in a vision-based setup under different constraints. Hand keypoint detection plays a crucial role in many gesture-based applications. However, developing a generalized detection method has remained a long-standing problem. Several factors impede accurate detection: the fingers’ distance from the camera and their proximity to one another, self-occlusion, variations in illumination, and background clutter. To overcome these barriers, we propose a two-stage architecture. The first stage generates precise hand regions, eliminating adjoining skin regions and background clutter. The second stage incorporates a novel multiscale attention block to detect keypoint coordinates precisely. Qualitative and quantitative evaluations found that the proposed architecture outperforms state-of-the-art models, with endpoint errors of 2.3, 1.14, and 2.11 pixels on the three benchmark datasets. This advancement lays the groundwork for future 3D hand pose estimation developments and their applications.
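The abstract reports endpoint error in pixels but does not define it. As a minimal sketch, assuming the common definition (the mean Euclidean distance between predicted and ground-truth 2D keypoints), the metric could be computed as follows. The function name `mean_endpoint_error` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_endpoint_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean Euclidean distance, in pixels, between matched 2D keypoints.

    Assumption: this is the usual reading of "endpoint error"; the paper
    may use a different normalization or averaging scheme.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    # Pair each predicted keypoint with its ground-truth counterpart.
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


# Hypothetical toy data with three keypoints (not results from the paper).
predicted = [(10.0, 10.0), (20.0, 24.0), (33.0, 30.0)]
ground_truth = [(10.0, 10.0), (20.0, 20.0), (30.0, 34.0)]

# Per-keypoint distances are 0, 4, and 5 pixels, so the mean is 3.0.
print(mean_endpoint_error(predicted, ground_truth))
```

Under this assumed definition, a lower value means the predicted keypoints lie closer to the annotations, which is the sense in which the per-dataset figures above are compared.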