
3D RECONSTRUCTION TECHNIQUE FROM 2D SEQUENTIAL HUMAN BODY IMAGES IN SPORTS: A REVIEW

October 2020
Technology Reports of Kansai University


Authors:

Azrulhizam Shapi’i

Universiti Kebangsaan Malaysia

Zainal Rasyid Mahayuddin

Universiti Kebangsaan Malaysia



ISSN: 04532198

Volume 62, Issue 09, October, 2020



Azrulhizam Shapii¹, Sanaz Pichak², Zainal Rasyid Mahayuddin³

¹,²,³ Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia

ABSTRACT: 3D reconstruction is a fundamental problem in computer vision. Recent research has addressed it successfully with motion capture systems that use body-worn markers and multiple cameras. However, recovering the 3D pose of a full human body from a single camera remains challenging because of noisy backgrounds, variation in human appearance, and self-occlusion. This study investigated methods for 3D reconstruction from monocular image sequences of vigorous activities such as sports. Six current methods were selected based on their focus on fully automated estimation of 3D human pose from 2D joint locations. Each of them proposes an algorithm for solving this ill-posed problem. The evaluation was divided into two stages. In the first stage, a theoretical and comparative study of each method identified the technique used, the problems addressed, and the results achieved. The advantages and disadvantages of each method were then listed, and factors such as accuracy and self-occlusion handling were compared across the methods. In the second stage, three methods were chosen, based on the advantages found in the first stage, for evaluation on a specific dataset. First, the code of the three methods was run on the PennAction dataset (tennis), and their 3D reconstruction performance is shown. Then, the methods were tested on sequences of varied activities from the CMU motion capture database. The novelty of this study is the evaluation of current methods by the accuracy of their performance on a tennis player dataset. We also propose a technique that combines the particular advantages of each method to create a more efficient method for 3D reconstruction from 2D sequential images of outdoor activities.

KEYWORDS: 3D Reconstruction, Sports, Human pose, Image sequence.

1. Introduction

Multimedia equipment can capture video or multiple photographs in real time during a sports activity, and the footage can be replayed to an athlete after the game to identify and correct faults in technique. Although this approach is flexible, the images provide only a single perspective (a single camera view), which considerably reduces the scope for in-depth analysis [4]. Multiple cameras can capture the player's performance simultaneously, but such a setup is costly and complicated. It also requires post-processing, which limits the time available for motion capture. On the other hand, 3D reconstruction of the human body from sequential images raises multiple challenges. In this article, several considerations guided the analysis of the different methods to determine the most suitable approach for tennis. First, a realistic human body was targeted, which is complex to model because of variations in individual body shape and clothing. Second, self-occlusion must be recognized accurately: with a stationary camera, some limbs block other body parts in the image. Third, finding proper image descriptors helps to resolve many pose ambiguities and usually requires trial and evaluation to determine the most competitive representations. Finally, special attention was given to real-world conditions such as cluttered backgrounds, uncontrolled scenes, noisy data, and the speed of the moving person across sequential frames [12]. Therefore, this research identified the method that best addresses the ill-posed problem and handles outdoor conditions in the tennis court environment. This paper has three objectives. First, we evaluate different methods for 3D reconstruction of the human body from a sequence of monocular images to determine which performs efficiently under occlusion and noise on real-world data. Second, we compare the implemented 3D reconstruction methods to identify the advantages and drawbacks of each. Third, we propose a new technique that combines the benefits of the methods studied and is more accurate in a particular application (tennis) with a fixed, ordinary camera.

2. Literature Review

Considerable research has addressed the challenge of human motion capture from imagery. Methods such as [7], [11], [14] reconstruct 3D human motion from feature tracks in monocular image sequences, handling arbitrary camera motion by relying on previously trained base poses. They also cover any kind of movement, periodic or non-periodic. This review covers the methods proposed by [11], [3], [1], [15], [14], [6]. These six recent 3D reconstruction algorithms were selected for analysis based on their reported performance and results. The theoretical approach of each method is discussed, and the details of its mathematical model are identified. [11] offered an activity-independent model that retrieves the 3D configuration of a human figure from the 2D locations of anatomical landmarks in a single image, leveraging a large motion capture corpus as a substitute for visual memory. [3] developed three principled approaches to enhance particle filtering by integrating bottom-up information, either as a proposal density for obtaining more diverse particles or as complementary cues to improve likelihood computation during the correction step. The author also demonstrated that a feedback mechanism from top-down modeling could further adapt the bottom-up predictors and improve tracking performance. [1] modeled how joint limits vary with pose in order to obtain valid poses. They collected a motion capture dataset that explores a wide range of human poses and developed a pose-dependent model of joint limits that forms their prior. [15] proposed integrating a sparsity-driven 3D geometric prior with temporal smoothness, both when the image locations of the human joints are provided and when they are unknown; in the latter case, the image locations of the joints are treated as latent variables to account for ambiguities in the 2D joint locations. The approach suggested by [14] predicts non-rigid 3D human shape and motion from image sequences captured by uncalibrated cameras. Like other state-of-the-art solutions, it factorizes the 2D observations into camera parameters, base poses, and mixing coefficients. Its novelty compared with existing methods is that it can handle arbitrary camera motion without a predefined skeleton or anthropometric constraints, whereas other approaches require sufficient camera motion during the sequence to obtain a proper 3D reconstruction.
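The factorization used by [14], in which 2D observations are explained by camera parameters, base poses, and mixing coefficients, can be illustrated with a small sketch. The Python code below is not the authors' implementation: the function names, the weak-perspective camera (a 2x3 matrix plus a scale), and the toy base poses are assumptions made purely for illustration.

```python
import math
from typing import List, Sequence

Pose3D = List[List[float]]  # J joints, each [x, y, z]
Pose2D = List[List[float]]  # J joints, each [u, v]


def combine_base_poses(base_poses: Sequence[Pose3D], coeffs: Sequence[float]) -> Pose3D:
    """Build a 3D pose as a linear combination of base poses (hypothetical helper)."""
    n_joints = len(base_poses[0])
    pose = [[0.0, 0.0, 0.0] for _ in range(n_joints)]
    for weight, base in zip(coeffs, base_poses):
        for j in range(n_joints):
            for d in range(3):
                pose[j][d] += weight * base[j][d]
    return pose


def project(pose: Pose3D, camera: Sequence[Sequence[float]], scale: float = 1.0) -> Pose2D:
    """Weak-perspective projection; `camera` holds the first two rows of a rotation matrix."""
    return [[scale * sum(row[d] * joint[d] for d in range(3)) for row in camera]
            for joint in pose]


def reprojection_error(observed: Pose2D, projected: Pose2D) -> float:
    """Mean Euclidean distance between observed and reprojected 2D joints."""
    return sum(math.dist(o, p) for o, p in zip(observed, projected)) / len(observed)


# Toy example (assumed data): two joints, two base poses, camera looking along the z axis.
base_poses = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
]
pose = combine_base_poses(base_poses, coeffs=[2.0, 3.0])
projected = project(pose, camera=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
error = reprojection_error([[0.0, 0.0], [5.0, 7.0]], projected)
print(pose, projected, error)
```

In a full method, the mixing coefficients and camera would be chosen to minimize this reprojection error subject to priors; the sketch only evaluates the error for given values.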

[6] aimed to make 3D motion reconstruction more accurate by adding built-in knowledge, such as a height-map, to the algorithmic scheme for reconstructing 3D pose and motion from a single-view calibrated camera. Finally, our work is a comparative study of methods for 3D reconstruction of the human body from a 2D image sequence of a tennis player. We focused on evaluating methods that studied sports poses by analyzing factors that are still not fully resolved, such as accuracy of human pose estimation, self-occlusion, and noisy background. We ran the code of these algorithms in MATLAB on the Penn Action dataset to obtain 3D reconstruction results. After collecting and comparing all the results, we proposed a new technique that combines three methods: the Xiaowei method, the Wandt method, and the Du method. The proposed approach aims to improve 2D joint localization and occlusion handling so that 2D images can be reconstructed in 3D with realistic results and minimal requirements.


3. Methodology

The methodology used in this research consisted of four phases, described in Figure 1. Phase one consisted of analyzing multiple recently published methods for 3D reconstruction to identify the six methods most relevant to this research. Phase two consisted of comparing the experimental results reported by the authors of each technique on several factors, such as projection, camera, realistic reconstruction, self-occlusion, accuracy, noisy background, and processing speed, to shorten the list to three methods. In phase three, these three selected methods were evaluated on specific sequential images from the tennis player dataset, and the results were compared. Finally, phase four consisted of proposing a new, improved method for 3D reconstruction from 2D sequential images that combines the strengths of each technique evaluated.

Figure 1. Methodology Used for Comparative Study of 3D Reconstruction

Figure 2 displays the pathway used to evaluate the performance of the three chosen methods. The first step was to analyze the mathematics described for each method. Following this, the code was implemented in MATLAB. Each method was first run on the dataset used by its authors to verify that the code worked without error. When the code was not provided, additional work was required: the method was programmed from the mathematical description given by the authors. Specific factors were then evaluated on our particular dataset to assess the performance of these methods and compare their accuracy in 3D reconstruction [8]. The outputs of the codes were compared on a tennis player dataset (Penn Action). Finally, the methods were evaluated on the CMU dataset to measure their 3D reconstruction error and accuracy percentage and to compare the results.


Figure 2. Process Used for The Evaluation of Three Selected Methods

The final stage of this research consisted of compiling the advantages found in the evaluated methods. Specific benefits were integrated into the core method (i.e., the method that showed the best performance on the proposed dataset) to overcome the disadvantages found and improve the efficacy of 3D reconstruction. The final proposed method includes these highlights and provides a novel approach for 3D reconstruction from 2D sequential tennis images.

4. Results

The review of each selected method is presented in Table 1, which describes the advantages and disadvantages of each technique.

Table 1. Summary of Advantages and Disadvantages of Each Algorithm.

Ramakrishna method

Specification of application:
1- human pose recovery based on sparse representation in an over-complete dictionary
2- enforces a mandatory criterion on the sum of squared limb lengths
3- enforces constancy for eight selected limbs
4- utilizes a model with 23 landmarks of human anatomy
5- uses the Matching Pursuit (MP) algorithm to estimate the sparse representation of the 3D pose and the relative camera from only a 2D image

Pros:
1- solves anthropometric regularity
2- robustness to missing data
3- joint sensitivity to noise
4- describes an expansive range of actions by a statistical model of human pose
5- solves the pose and camera by reducing the image reprojection error
6- low RMS reconstruction error
7- provides accurate results using a single image, with no requirement for annotation to resolve ambiguities
8- accuracy in the camera estimation
9- applied to frames of monocular video streams
10- able to recover the pose from non-standard viewpoints
11- good generalization to an extensive range of poses and viewpoints

Cons:
1- limb proportions differ between individuals
2- does not support occlusion handling
3- cannot recover the correct pose under strong perspective effects when the mean pose is not a reasonable initialization

Atul method

Specification of application:
1- fully automated 3D human pose and shape analysis of the human targets in videos, recognizing their activities and characterizing their behavior
2- combines top-down and bottom-up methods
3- uses advanced particle filtering (PF) algorithms
4- uses the framework of the non-parametric density propagation system based on particle filtering

Pros:
1- high efficacy under substantial ambiguities
2- overcomes limitations of particle filtering by improving the proposal density modeling and the likelihood computation function
3- improves tracking
4- solves non-rigid deformable surface reconstruction
5- articulated body pose recovery in static images

Cons:
1- self-occlusion and partial occlusion in unseen scenarios
2- optimization problems
3- uses fixed bone length priors

Akhter method

Specification of application:
1- a physically motivated prior allows anthropometrically valid poses and restricts invalid poses
2- this prior is combined with a sparse representation of poses selected from an over-complete dictionary
3- formulates a prior over the two endpoints of each 3D bone location
4- uses Orthogonal Matching Pursuit (OMP)

Pros:
1- good generalization while avoiding invalid 3D poses
2- pose parameterization is accurate and straightforward
3- improves pose estimation by a grouping of body parts (extended torso)
4- defines a kinematic skeleton tree structure to apply joint-angle limits
5- avoids non-rigid self-calibration by selecting linear coefficients from a cosine function

Cons:
1- depth ambiguities at several joints
2- incorrect estimation of the camera matrix
3- algorithm is sensitive to Gaussian noise
4- needs multi-camera setups

Xiaowei method

Specification of application:
1- combines a sparsity-driven 3D geometric prior and a 3D temporal smoothness prior
2- uses a deep convolutional neural network (CNN) architecture to detect body parts
3- uses an Expectation-Maximization (EM) framework to retrieve a sparse model of the 3D human pose sequence
4- casts the 2D joint locations as latent variables

Pros:
1- highly effective against detector error, occlusion, and ambiguity
2- no requirement for synchronized 2D-3D data
3- handles the 2D estimation uncertainty in a statistical framework
4- good accuracy on in-the-wild videos
5- improves the initialization results
6- improves 2D joint localization
7- uses a single camera

Cons:
1- cannot handle multiple subjects
2- assumes manually labeled 2D joint locations

Wandt method

Specification of application:
1- a periodic model of the mixing coefficients for periodic and quasi-periodic motions
2- a regularization term based on a temporal bone length constancy prior for non-periodic motion
3- based on a priori trained base poses
4- models the 3D pose as a linear combination of base poses

Pros:
1- estimates non-rigid human body pose captured by an uncalibrated camera
2- solves unstable 3D motion reconstruction
3- accurate algorithm for estimating periodic motion
4- handles arbitrary camera motion
5- stability of the method
6- handles noise and occlusions in real-world data
7- does not use a predefined skeleton or anthropometric constraints

Cons:
1- restrictive assumptions on the possible 3D configurations
2- ambiguous camera placement and 3D shape deformation

Du method

Specification of application:
1- RGB-based 2D joint detection algorithm
2- a dual-stream deep convolutional network (ConvNet) to detect 2D landmarks of human joints
3- the algorithm utilizes the height-map as a type of built-in prior knowledge to capture 3D articulated skeleton motion under a single-view calibrated camera

Pros:
1- improves skeleton-based human 3D pose estimation from the inaccurate localization of 2D joints by temporal constraints on the camera
2- lower reconstruction error

Cons:
1- discriminative parts are missed in pose estimation
2- fails in complex human motion

After the weaknesses and strengths of each algorithm were identified, the experimental results for different parameters are presented in Table 2. These results are based on the authors' own assessments on their selected databases. For example, [11], [15] focused their analysis on noisy backgrounds, whereas [3], [1], [6] focused on self-occlusion. Moreover, some of the methods are evaluated on a single image, whereas others are evaluated on image sequences. Three methods were found suitable for the application proposed in this study. Two of them (Xiaowei and Wandt) showed successful outcomes in noisy background, self-occlusion, and realistic reconstruction, making them ideal for further evaluation. From the remaining methods, the Du method was selected for its outstanding results on the parameters required for the analysis of the chosen dataset. The other methods showed no significant advantages and were discarded because they lacked noisy-background handling, realistic reconstruction, or both. These three selected methods were therefore the most suitable for analysis on a tennis player dataset.

Table 2. Comparison of Methods for 3D Reconstruction.

Parameters | Ramakrishna method | Atul method | Akhter method | Xiaowei method | Wandt method | Du method
Algorithm | Projected Matching Pursuit (MP) | Advanced particle filtering (PF) | Orthogonal Matching Pursuit (OMP) | Expectation-Maximization | Periodic / non-periodic | Height-map
Projection | Single image | Sequence images | Single image | Sequence images | Sequence images | Sequence images
Code of algorithm | Restricted | Open-source | Restricted | Open-source | Restricted | Open-source
Average 2D joint localization accuracy (pixel) | 94.5 | 27.65 | Not given | 10.85 | 8.43 | 95.8

Rows with fewer than six surviving values (listed in their original left-to-right order; the column assignment is not recoverable):

Camera: arbitrary, fixed, arbitrary, arbitrary, fixed
Noisy background: focus, no focus, focus, no focus
Self-occlusion: no focus, focus, no focus, focus
Realistic reconstruction (average 3D error, cm): not given, 121.56, 113.01, 187.1, 118.69

5. Evaluation of Methods on PennAction

This section demonstrates the application of the selected approaches to pose estimation on in-the-wild image sequences. Results are presented for an action from the PennAction dataset. The "tennis forehand" action was chosen for evaluation because it is not a simple pose and presents challenges such as large pose variability, self-occlusion, and image blur caused by fast motion. From the 31-image sequence of the dataset, we selected six frames (2, 8, 14, 20, 25, 30) on which the main factors could be evaluated. Tables 3 to 8 illustrate the 3D results of each method on these frames.

Table 3. 3D Results of Frame #2


The analysis of frame #2 shows that the Wandt method did not produce a correct result for the left arm (red color) because the racquet acts as noise.

Table 4. 3D Results of Frame #8

Panels: Input (frame #8), Xiaowei, Wandt.

The analysis of frame #8 shows that the Du method did not produce accurate results for the right leg (violet color) because the right leg is occluded by the left leg, which causes self-occlusion.

Table 5. 3D Results of Frame #14

Panels: Input (frame #14), Xiaowei, Wandt.

The analysis of frame #14 shows that neither the Wandt method nor the Du method produced good results for the arms because the player's arms are behind the racquet, which causes occlusion.

Table 6. 3D Results of Frame #20


The analysis of frame #20 shows that the Xiaowei method had the best result at this specific angle, with less missing data.

Table 7. 3D Results of Frame #25


The analysis of frame #25 shows that the Wandt method had a poor result for the arms and shoulder (yellow color) but a better result for the legs compared with the Du method.

Table 8. 3D Results of Frame #30

The analysis of frame #30 shows that the Wandt method is sensitive to the viewing angle of the image and could not reconstruct the pose accurately.

Table 9 summarizes the results on the tennis player dataset. These results show that the methods proposed by [15], [6] produced similar 3D results. However, the Xiaowei algorithm is more robust to noise, handles occlusions, and reconstructs the occluded body parts correctly, although the Wandt method performed better in reconstructing the legs.

Table 9. A comparison of PCP scores on PennAction dataset

Methods | Upper arms | Lower arms | Upper legs | Lower legs | Average
Xiaowei | 0.93 | 0.71 | 0.96 | 0.84 | 0.86
Du | 0.92 | 0.68 | 0.94 | 0.82 | 0.84
Wandt | 0.90 | 0.61 | 0.98 | 0.85 | 0.83
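Table 9 reports PCP (Percentage of Correct Parts) scores. The paper does not state which PCP variant or threshold was used, so the sketch below assumes the commonly used strict definition: a limb counts as correct when both of its predicted endpoints lie within half the ground-truth limb length of their true positions. The function name, joint indices, and toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pcp(pred: Sequence[Sequence[float]],
        gt: Sequence[Sequence[float]],
        limbs: Sequence[Tuple[int, int]],
        threshold: float = 0.5) -> float:
    """Fraction of limbs whose two endpoints are both within
    threshold * (ground-truth limb length) of their true positions."""
    correct = 0
    for a, b in limbs:
        limit = threshold * math.dist(gt[a], gt[b])
        if math.dist(pred[a], gt[a]) <= limit and math.dist(pred[b], gt[b]) <= limit:
            correct += 1
    return correct / len(limbs)


# Toy 2D arm (assumed data): shoulder (0), elbow (1), wrist (2).
gt = [[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]]
pred = [[1.0, 0.0], [11.0, 0.0], [20.0, 8.0]]
limbs = [(0, 1), (1, 2)]  # upper arm, lower arm
score = pcp(pred, gt, limbs)
print(score)
```

Here the upper arm is counted as correct and the lower arm is not, because the wrist error (8) exceeds half the limb length (5). Per-part scores such as those in Table 9 would be obtained by averaging this indicator over frames for each limb separately.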


5.1 Running Time

Table 10 shows the processing time of the three methods evaluated. The method proposed by Xiaowei is the fastest at 3D reconstruction under the specifications mentioned in the computing section. The algorithms usually converge within 20 iterations, with an average CPU time below 150 s for a sequence of 31 frames.

Table 10. The Processing Time of Each Algorithm

Processing time (seconds) per frame | Xiaowei method | Wandt method | Du method

5.2 Evaluation of Methods on CMU

These methods were evaluated on sequences of varied activities from the CMU motion capture database. Care was taken to ensure that the motion capture frames were not among those used to train the shape bases. First, the reconstruction results for the jumping sequences were inferior to those for the other sequences. This is because the jumping motions of different individuals vary much more than their running motions. A new, untrained jumping motion was therefore insufficiently explained by the base poses, whereas each new running pattern was similar to those in the training data. Second, 3D motion recovery was evaluated using the ground-truth 2D joint locations. The 3D reconstruction errors in millimeters are reported in Table 11. Following the standard evaluation, the mean per-joint error (mm) in 3D was computed between the reconstructed pose and the ground truth in the camera frame, with their root locations aligned.
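The error measure described above, a per-joint 3D distance computed after aligning root locations, can be sketched as follows. This is an illustrative Python version and not the evaluation code used in the study; the function name, the choice of joint 0 as the root, and the toy coordinates are assumptions.

```python
import math
from typing import Sequence


def root_aligned_mean_joint_error(pred: Sequence[Sequence[float]],
                                  gt: Sequence[Sequence[float]],
                                  root: int = 0) -> float:
    """Mean Euclidean distance per joint after translating both poses
    so that their root joints coincide at the origin."""
    pred_root, gt_root = pred[root], gt[root]
    total = 0.0
    for p, g in zip(pred, gt):
        p_rel = [pc - rc for pc, rc in zip(p, pred_root)]
        g_rel = [gc - rc for gc, rc in zip(g, gt_root)]
        total += math.dist(p_rel, g_rel)
    return total / len(gt)


# Toy poses in millimeters (assumed data): three joints along a vertical chain.
gt = [[0.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 200.0, 0.0]]
pred = [[50.0, 50.0, 50.0], [50.0, 150.0, 50.0], [53.0, 254.0, 50.0]]
error_mm = root_aligned_mean_joint_error(pred, gt)
print(error_mm)
```

Because of the root alignment, the constant offset of (50, 50, 50) mm between the two poses does not contribute to the error; only the displacement of the last joint does.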

Table 11. 3D reconstruction error in mm

Method | Run (mm) | Jump (mm)
Wandt | 28.05 | 31.13
Du | 58.3 | 64.4
Xiaowei | 20.99 | 47.57

This table shows that the 3D reconstructions of the Xiaowei method are highly realistic, as demonstrated by the 3D error.

6. Discussion

The results of the evaluation indicate that the method proposed by [15] is highly recommended for 3D reconstruction from tennis player images. In this method, 2D joint heat maps that capture positional uncertainty are generated with a deep, fully convolutional neural network (CNN). These heat maps are combined with a sparse model of 3D human pose within an Expectation-Maximization framework that estimates the 3D parameters over the entire sequence. This method provides a solution for most of the challenges in 3D reconstruction, such as large pose variability, self-occlusion, and image blur caused by fast motion. However, it needs manually labeled 2D joint locations, which reduces the accuracy. To address this issue, we propose using the 3D human pose estimation framework presented by Wandt et al. (2016). It consists of a synthesis between discriminative image-based and 3D reconstruction approaches. It treats 2D joint locations as latent variables whose uncertainty distributions are given by a deep, fully convolutional neural network. The 3D poses are modeled by a sparse representation. The 3D parameter estimates are obtained with an Expectation-Maximization algorithm, in which the 2D joint location uncertainties can be conveniently marginalized out during inference. Further, to improve the robustness of the method against occlusion and reconstruction ambiguity, a 3D temporal smoothness prior, as considered by [6], is imposed on the 3D pose and viewpoint parameters. Therefore, using the method proposed by [15] as the core and integrating the advantages described for the methods of [14], [6] might provide an effective method for 3D reconstruction from image sequences on a specific dataset. The proposed method might improve 2D joint localization and occlusion handling so that 2D images can be reconstructed in 3D with realistic results and minimal requirements.
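The idea of a temporal smoothness prior can be illustrated in its simplest form. The sketch below is a deliberate simplification and not the formulation of [15] or [6]: it smooths one coordinate of one joint over time by minimizing sum_t (x_t - y_t)^2 + lam * sum_t (x_{t+1} - x_t)^2, where y is the noisy per-frame estimate. Setting the gradient to zero gives a tridiagonal linear system, solved here with the Thomas algorithm. The function names and the weight `lam` are hypothetical.

```python
from typing import List, Sequence


def smooth_trajectory(y: Sequence[float], lam: float) -> List[float]:
    """Minimize sum (x_t - y_t)^2 + lam * sum (x_{t+1} - x_t)^2 over x."""
    n = len(y)
    if n < 2 or lam <= 0:
        return [float(v) for v in y]
    # Tridiagonal system: diagonal 1 + 2*lam (1 + lam at both ends), off-diagonal -lam.
    diag = [1.0 + 2.0 * lam] * n
    diag[0] = diag[-1] = 1.0 + lam
    off = -lam
    # Thomas algorithm: forward elimination.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def roughness(x: Sequence[float]) -> float:
    """Sum of squared frame-to-frame differences."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


# Toy jittery coordinate over five frames (assumed data).
noisy = [0.0, 10.0, 0.0, 10.0, 0.0]
smoothed = smooth_trajectory(noisy, lam=2.0)
print(smoothed, roughness(noisy), roughness(smoothed))
```

A larger `lam` trades fidelity to the per-frame estimates for smoother motion. In a complete pipeline, a prior of this kind would be applied jointly to all pose and viewpoint parameters rather than to a single coordinate.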

7. Conclusion

This paper is a review of methods for 3D reconstruction of the human body from 2D image sequences of tennis players. Among all sports, we chose tennis because it is played at very high speed and requires the development of technical skills. It also presents a challenge for 3D reconstruction because of the occlusion and self-occlusion that occur during play. Many technologies have tried to help raise the level of athletic technique and reduce refereeing errors and physical injury. However, they face problems such as high cost, long processing time, and heavy equipment [10]. We believe that simulating a tennis player's movement captured by an arbitrary camera, through 3D reconstruction from sequential images, might be more economical and less time-consuming than traditional technologies. Moreover, this approach can help players and coaches improve their skills significantly. The increasing demand for 3D reconstruction, especially of the human body, also opens additional applications in movies, gaming, and medicine, so the results of this research can help other industries as well. In particular, generating 3D poses from a sequence of images is much cheaper than marker-based technologies. Modeling the 3D human body from image sequences is a challenging problem and has been a research topic for many years [2]. Significant theoretical and algorithmic results have been achieved in extracting even complicated poses of the human body. Research on human pose has approached the problem from many different angles in an attempt to implement a robust, accurate, and fully automatic full-body system. In this paper, we focused on evaluating methods that studied sports poses by analyzing several factors that are still not fully resolved in this area. For instance, background clutter in realistic scenes, variation in human appearance, and self-occlusion are challenges that require in-depth investigation. We also identified the most suitable method, one that addresses the ill-posed problem and can handle outdoor conditions, for implementation in the tennis court environment with high processing speed [13]. To reach this goal, we used a two-step evaluation. First, we chose six current methods based on their focus on features such as image sequences, camera, and sports poses in the real world. These methods overcome several challenges of older methods and offer recommendations for future work. The advantages and disadvantages of the methods of [11], [3], [1], [15], [14], [6] were discussed and compared theoretically. Factors such as accuracy of human pose estimation, self-occlusion, and noisy background were analyzed in their experimental results.

In the next step of the evaluation, the three top methods were selected for further, more in-depth analysis. We ran the code of each algorithm in MATLAB on the Penn Action dataset to obtain 3D reconstruction results. The code for [15] and [6] was obtained from the Internet, and we implemented the method of [14] ourselves. To obtain final, definitive results, we also tested these methods on the CMU MoCap database. We then concluded that, among them, the method proposed by Xiaowei [15] may be the most suitable for 3D reconstruction applied to tennis. This method proved to be faster than the other methods evaluated and produced outstanding results in accuracy. The method proposed by Wandt [14] provided better accuracy when dealing with self-occlusions. Finally, the method proposed by Du [6] showed the lowest accuracy and performed poorly when occlusion was involved. We eventually

proposed a new technique that combines approaches from the three methods: the Xiaowei method, the Wandt method, and the Du method. It is a framework for 3D human pose estimation from monocular images, built on a novel synthesis of a deep learning-based 2D part regressor, the sparsity-driven 3D reconstruction approach of Wandt, and the 3D temporal smoothness prior of the Du method. This joint consideration combines the discriminative power of state-of-the-art 2D part detectors, the expressiveness of 3D pose models, and regularization by aggregating information over time. It can therefore go directly from 2D appearance to 3D geometry. The proposed method can improve 2D joint locations for tennis players in outdoor conditions from image sequences taken by an arbitrary camera.
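
As a rough illustration of the temporal smoothness idea mentioned above, the following Python sketch smooths a joint trajectory by penalizing frame-to-frame changes. It is not the formulation of Du et al. [6] or of the proposed technique; the quadratic objective, the function names `smooth_trajectory` and `smooth_pose_sequence`, and the weight `lam` are assumptions made only for this example.

```python
def smooth_trajectory(y, lam):
    """Minimise sum_t (x_t - y_t)^2 + lam * sum_t (x_{t+1} - x_t)^2 over x.

    y is one coordinate of one joint over time; lam >= 0 is a hypothetical
    smoothness weight. The minimiser solves a tridiagonal linear system,
    handled here with the Thomas algorithm.
    """
    n = len(y)
    if n < 2 or lam == 0:
        return [float(v) for v in y]
    # Main diagonal of (I + lam * D^T D); D is the first-difference operator.
    diag = [1.0 + 2.0 * lam] * n
    diag[0] = diag[-1] = 1.0 + lam
    off = -lam  # value of both off-diagonals
    # Forward sweep.
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for t in range(1, n):
        denom = diag[t] - off * c[t - 1]
        c[t] = off / denom
        d[t] = (y[t] - off * d[t - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for t in range(n - 2, -1, -1):
        x[t] = d[t] - c[t] * x[t + 1]
    return x


def smooth_pose_sequence(poses, lam):
    """Smooth every coordinate of every joint independently over time.

    poses[t][j] is the (x, y, z) position of joint j in frame t.
    """
    if not poses:
        return []
    n_frames = len(poses)
    n_joints = len(poses[0])
    out = [[[0.0, 0.0, 0.0] for _ in range(n_joints)] for _ in range(n_frames)]
    for j in range(n_joints):
        for k in range(3):
            track = [poses[t][j][k] for t in range(n_frames)]
            for t, value in enumerate(smooth_trajectory(track, lam)):
                out[t][j][k] = value
    return out


# Usage: a jittery one-dimensional track becomes less rough after smoothing.
noisy = [0.0, 1.2, 0.8, 2.1, 1.9, 3.0]
smoothed = smooth_trajectory(noisy, lam=2.0)
```

A larger `lam` trades fidelity to the per-frame estimates for smoother motion. For fast movements such as a tennis serve, too large a value would presumably blur the very motion of interest, so the weight would need tuning on real sequences.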

8. Acknowledgment

This research was funded by the University Grants PP-FTSM-2020.

9. References

[1] Akhter, I. & Black, M. J. 2015. Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1446-1455.

[2] Ashraf, Y. A., Venkat, I. & Belaton, B. 2014. Reconstruction of 3D Faces by Shape Estimation and Texture Interpolation. Asia-Pacific Journal of Information Technology and Multimedia, 3(1): pp. 15-21.

[3] Atul, K. 2014. Coupling Top-down and Bottom-up Methods for 3D Human Pose and Shape

Estimation from Monocular Image Sequences. Pattern Recognition :1410-0117.

[4] Tompson, J. J., Jain, A., LeCun, Y. & Bregler, C. 2014. Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in Neural Information Processing Systems. pp. 1799-1807. arXiv:1406.2984.

[5] Rasheed, N. & Nordin, M. J. 2018. Classification and Reconstruction Algorithms for the Archaeological Fragments. Journal of King Saud University - Computer and Information Sciences. doi:10.1016/j.jksuci.2018.09.019.

[6] Du, Y., Wong, Y., Liu, Y., Han, F., Gui, Y., Wang, Z., Kankanhalli, M. & Geng, W. 2016. Marker-

less 3D human motion capture with monocular image sequence and height-maps. In European Conference on

Computer Vision. pp. 20-36.

[7] Gotardo, P.F. & Martinez, A.M. 2011. Non-rigid structure from motion with complementary rank-3

spaces. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on: pp. 3065-3072.

[8] Li, F., Li, K. & Lv, J. 2013. Research on biomechanics technology based on the tennis sports. In: Du

W. (eds) Informatics and Management Science III. Lecture Notes in Electrical Engineering, vol 206. Springer,

London. pp. 415-420.

[9] Mousavi Kahaki, S. M., Nordin, M. J., Ashtari, A. & Zahra, S. 2016. Invariant Feature Matching for Image Registration Application Based on New Dissimilarity of Spatial Features. PLoS ONE, 11: e0149710. doi:10.1371/journal.pone.0149710.

[10] Norshaliza, K. 2016. Active Contour Model Using Fractional Sinc Wave Function for Medical Image Segmentation. Asia-Pacific Journal of Information Technology and Multimedia, 5(2): pp. 47-61.

[11] Ramakrishna, V., Kanade, T. & Sheikh, Y. 2012. Reconstructing 3d human pose from 2d image

landmarks. In European Conference on Computer Vision. pp. 573-586.

[12] Saima, A. L., Rosziati, I., Nik Shahidah, A. M. T., Norhalina, S. and Suhaila, S. 2018. Thresholding

and Quantization Algorithms for Image Compression Techniques: A Review. Asia-Pacific Journal of

Information Technology and Multimedia, 7(1): 83 – 89.

[13] Shingade, A. & Ghotkar, A. 2014. Animation of 3D human model using markerless motion capture

applied to sports. International Journal of Computer Graphics & Animation (IJCGA) 4(1): 27-39.

[14] Wandt, B., Ackermann, H. & Rosenhahn, B. 2016. 3D Reconstruction of Human Motion from

Monocular Image Sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(8): pp.

1505-1516.

[15] Xiaowei, Z., Zhu, M., Leonardos, S., Derpanis, K. G. & Daniilidis, K. 2016. Sparseness meets deepness: 3D human pose estimation from monocular video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4966-4975.

[16] Azrulhizam Shapii, Sanaz Pichak, Mohd Baharuddin & Iida Hiroyuki. (2020). Comparative Study of

3D Reconstruction Methods from 2D Sequential Images in Sports. Asia-Pacific Journal of Information

Technology and Multimedia. 09. 40-57. 10.17576/apjitm-2020-0901-04.

This work is licensed under a Creative Commons Attribution Non-Commercial 4.0

International License.
