(appeared in Proceedings ICCV ’98, pp. 113-119)
Deformable Model-Based Shape and Motion Analysis
from Images using Motion Residual Error
Douglas DeCarlo and Dimitris Metaxas
Department of Computer & Information Science, University of Pennsylvania, Philadelphia PA 19104-6389
dmd@gradient.cis.upenn.edu, dnm@central.cis.upenn.edu
Abstract
We present a novel method for the shape and motion estimation of a deformable model using error residuals from model-based motion analysis. The motion of the model is first estimated using a model-based least squares method. Using the residuals from the least squares solution, the non-rigid structure of the model can be better estimated by computing how changes in the shape of the model affect its motion parameterization. This method is implemented as a component in a deformable model-based framework that uses optical flow information and edges. This general model-based framework is applied to human face shape and motion estimation. We present experiments that demonstrate that this framework is a considerable improvement over a framework that uses only optical flow information and edges.
1 Introduction
In this paper we develop a new robust approach to the
problem of shape and motion from non-rigid motion using
deformable models. The starting point for our method is an
intuitive distinction between shape and motion. Our model
has motion parameters, which describe both rigid and non-
rigid motions. The model also has shape parameters, which
describe the basic underlying shape of the model. The pur-
pose of this distinction is to reduce the number of motion
parameters. The shape parameters represent static quanti-
ties whose true values are fixed. The motion parameters
are dynamic quantities, whose true values change with the
motion of the observed subject. After a short period of
time, the values of the shape parameters are established,
resulting in a smaller sized estimation problem in the long
term. This distinction leads us to develop a method where
changes in the image are initially attributed entirely to mo-
tion, but then the error in the reconstructed motion is used
to more accurately extract both shape and motion parame-
ters of the object being tracked.
There is a good deal of work addressing the problem
of shape estimation from motion, most of which has as-
sumed a rigid shape. In tracking deformable objects, struc-
ture has been estimated from non-rigid motion where the
input was either range data [1, 6, 10, 11, 14] or image data
[7, 8, 12, 17]. In particular, Koch [8] describes a model-
based framework which uses optical flow information to
estimate the rigid translation and rotation of a moving face,
and adapts the shape of the face to account for the motion
discrepancy. None of these approaches use tracking error
residuals to improve shape and motion estimates.
Our new formulation is used in concert with the method
described by DeCarlo and Metaxas [3] to simultaneously
update the shape and motion of a face. In [3], optical flow
provides a dynamic velocity constraint on the deformable
model. This results in a model-based least squares solution
to optical flow, which is related to methods described by
Black and Yacoob [2] and Li, et al. [9]. The optical flow
constraints are used to estimate both the rigid and non-rigid
motion of a human face—head motion and facial expres-
sions, but not the underlying face shape. Edges are also
used in motion estimation to combat tracking error accu-
mulation.
In this paper, we extend this framework so that the face
shape is also updated based on the optical flow informa-
tion. Derivatives of the model Jacobian (second deriva-
tives of the model) determine how changes in the parame-
ters of the model affect its motion parameterization. Us-
ing these derivatives in a truncated Taylor series expan-
sion, the model parameters (both shape and motion) are
refined by minimizing the residuals from the model-based
motion computation. This method simultaneously corrects
the shape and motion parameters for each image frame.
For every image in the sequence, we first solve a model-
based least squares optical flow solution [3], which de-
termines the motion parameters. Then, the residual from
this computation determines the error in the model param-
eters using another least squares process, which adjusts the
shape and motion parameters of the model.
This approach allows a more accurate extraction of the
shape and motion. The estimation framework presented in
[3] extracted the basic shape of the face using only edge
information. Edge information is not always adequate due
to poor illumination and self-occlusion. This may result
in inaccurate estimation of the basic shape, which can in
turn cause error in the motion estimation. This approach
also differs from other model-based shape and motion es-
timation methods [8] where optical flow information was
used to directly improve the shape, leading to potentially
large shape estimation errors. Our method does not require
the extraction of tracked features, but instead uses motion
information (in this case, optical flow). Shape
and motion are extracted simultaneously.
We demonstrate our methodology in facial shape and
motion estimation: an application area possessing a rea-
sonably clear separation between shape and motion param-
eters. In fact, this division is often built into face models
[2, 9, 12, 17] to simplify model construction or estima-
tion, while Reynard, et al. [16] use this separation to permit
learning the variability of motions for a class of objects.
In this paper, we first give a brief review of the model-
based optical flow framework and face model presented in
[3]. We then describe how this framework is augmented
with a shape and motion refinement computation. We
present experiments that extract the shape of the face, and
track its motion, in the presence of large head rotations and
expressions. We compare the results of these experiments
to those that use the framework demonstrated in [3]. The
shape from motion approach shows a significant improve-
ment in the estimation of face shape.
2 Model description and framework
The shape $\mathbf{s}$ of a deformable model is parameterized by a vector of values $\mathbf{q}$ and is defined over a domain $\Omega$ which can be used to identify specific points on the model. The shape $\mathbf{s}(\mathbf{u}; \mathbf{q}_s)$, with $\mathbf{u} \in \Omega$ and parameters $\mathbf{q}_s$, is allowed to translate and rotate so that a point on the model is given by:

$$\mathbf{x}(\mathbf{u}) = \mathbf{c} + \mathbf{R}\,\mathbf{s}(\mathbf{u}; \mathbf{q}_s) \qquad (1)$$

where the model parameters $\mathbf{q} = (\mathbf{q}_c^\top, \mathbf{q}_\theta^\top, \mathbf{q}_s^\top)^\top$. $\mathbf{q}_c = \mathbf{c}$ is the translation, and $\mathbf{q}_\theta$ is the quaternion that specifies the rotation matrix $\mathbf{R}$. For conciseness, the dependency of $\mathbf{x}$ on $\mathbf{q}$ is not always written.

To distinguish the processes of shape estimation and motion tracking, the parameters in $\mathbf{q}$ are rearranged and separated into $\mathbf{q}_b$, which describe the basic shape of the object, and into $\mathbf{q}_m$, which describe its motion (both rigid and non-rigid), so that $\mathbf{q} = (\mathbf{q}_b^\top, \mathbf{q}_m^\top)^\top$. This distinction between shape and motion parameters must be considered during the construction of a shape model, and allows for the more effective estimation of shape and motion.
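To make the parameterization concrete, the following sketch evaluates a model point as in (1) from a translation, a quaternion, and shape parameters. It is an illustration only, not the authors' implementation; the ellipsoidal shape function is a hypothetical stand-in for the face model's deformations.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a quaternion (w, x, y, z), normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def shape_point(u, q_s):
    """Hypothetical shape function s(u; q_s): an ellipsoid whose three
    radii are the shape parameters (a stand-in for the face deformations)."""
    theta, phi = u
    return q_s * np.array([np.cos(theta) * np.cos(phi),
                           np.cos(theta) * np.sin(phi),
                           np.sin(theta)])

def model_point(u, q_c, q_theta, q_s):
    """Equation (1): x(u) = c + R s(u; q_s)."""
    return q_c + quat_to_rot(q_theta) @ shape_point(u, q_s)

# Example: a point at material coordinates u = (0.3, 1.2).
x = model_point((0.3, 1.2), np.zeros(3),
                np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 1.5, 2.0]))
```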
2.1 Face model description
For the applications in this paper, the shape model used
is the deformable face model described in [3], and is shown
in Figure 1. It is a three-dimensional polygon mesh, shown
in Figure 1(b), whose geometry is controlled using a man-
ually constructed sequence of parameterized deformations,
which include localized scaling and bending operations.
The parameterization of the model is based on data gath-
ered in face anthropometry studies [4], which ensures that
the model is capable of representing a wide variety of
faces.
The shape of the face model (in rest position) is formed using a set of parameterized deformations specified by the shape parameters $\mathbf{q}_b$. Results of applying some of the deformations that affect the nose are displayed in Figure 2. Also included in $\mathbf{q}_b$ are parameters which specify the appearance of facial expressions, called expression-shape parameters.
Figure 1: The deformable face model
Figure 2: Example nose shape deformations
These parameters do not change the underlying face shape, but rather change the appearance of a particular facial expression. These parameters abstract information related to facial muscle placement. Figure 3 contains examples of varying expression-shape parameters that specify how a particular individual smiles. Figure 3(a) shows the model in its rest state (not smiling), while (b) and (c) contain differently shaped smiles. The smile in Figure 3(c) is made more curved (like the Cheshire cat's) by varying some of the expression-shape parameters. In total (both shape and expression-shape parameters), $\mathbf{q}_b$ contains approximately 100 parameters.
The motion of the face model, which includes rigid head motions (specified by $\mathbf{q}_c$ and $\mathbf{q}_\theta$) as well as non-rigid facial expressions, uses a separate set of deformations specified by the motion parameters $\mathbf{q}_m$. Figure 4 shows example deformations performed by these parameters. The model is shown with the mouth open in Figure 4(a), smiling in (b), raising eyebrows in (c) and frowning eyebrows in (d). In total, $\mathbf{q}_m$ contains 15 rigid and non-rigid motion parameters.
The partition of $\mathbf{q}$ into $\mathbf{q}_b$ and $\mathbf{q}_m$ can also be viewed another way: the parameters in $\mathbf{q}_b$ are a static quantity for a particular individual, and specify what a person looks like and how their facial expressions appear. The parameters in $\mathbf{q}_m$ are a dynamic quantity, which change when a subject moves their head, opens their mouth, or makes a facial expression. The goal of a shape and motion estimation process is to recover the value of $\mathbf{q}$ from a sequence of frames.
Figure 3: Example smile expression-shape deformations
Figure 4: Face motion and expression deformations; (a) open mouth, (b) smile, (c) raise brows, (d) frown brows
During estimation, the change in $\mathbf{q}_b$ should tend to zero as the shape of the face is established. Once this occurs, fitting need only continue for $\mathbf{q}_m$. For reasons of efficiency, it is in our best interest to include as many parameters as possible in $\mathbf{q}_b$.
2.2 Deformable model dynamics
Estimation of the model parameters $\mathbf{q}$ is based on first-order Lagrangian dynamics [10]. As the model changes, velocities of points on the three-dimensional model $\mathbf{x}$ are given by:

$$\dot{\mathbf{x}}(\mathbf{u}) = \mathbf{L}(\mathbf{u}; \mathbf{q})\,\dot{\mathbf{q}} \qquad (2)$$

where $\mathbf{L} = \partial\mathbf{x}/\partial\mathbf{q}$ is the model Jacobian [10]. Note that the dependency of $\mathbf{L}$ on $\mathbf{q}$ is not always written, for reasons of conciseness.

For computations using image information, $\mathbf{L}$ must take the camera projection into account [3]. In this case, the two-dimensional model $\mathbf{x}_p$ includes a perspective projection, and has a corresponding projected Jacobian $\mathbf{L}_p$ related by:

$$\dot{\mathbf{x}}_p(\mathbf{u}) = \mathbf{L}_p(\mathbf{u}; \mathbf{q})\,\dot{\mathbf{q}} \qquad (3)$$

As is often the case in a deformable model framework in a vision application, the dynamic equations of motion [10] of the model are simplified to obtain:

$$\dot{\mathbf{q}} = \mathbf{f}_q, \qquad \mathbf{f}_q = \int \mathbf{L}(\mathbf{u})^\top \mathbf{f}(\mathbf{u})\,d\mathbf{u} \qquad (4)$$

Using $\mathbf{L}$, the three-dimensional applied forces $\mathbf{f}$ are converted to generalized forces which act on $\mathbf{q}$ and are integrated over the model to find the total (generalized) parameter force $\mathbf{f}_q$. The distribution of forces on the model is based in part on forces computed from the edges of an input image. The equations of motion (4) are integrated over time to estimate $\mathbf{q}$ from a sequence of images.

In addition to this, a model-based motion computation can be used [3, 9]. We are assuming here that this motion is expressed by an over-determined linear system:

$$\mathbf{A}\,\dot{\mathbf{q}} + \mathbf{b} = \mathbf{0} \qquad (5)$$

where the matrix $\mathbf{A}$ and vector $\mathbf{b}$ can depend on both the model and the data. For example, optical flow information in the form of (5) constrained the velocity of the model (4) in [3] to produce a model-based least-squares motion computation; this framework is detailed in the next section.
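As a rough sketch of how the simplified dynamics (4) might be stepped in practice, assuming the model is sampled at discrete points so the integral becomes a sum (the Jacobians and forces here are placeholders supplied by the model and the image, respectively):

```python
import numpy as np

def generalized_force(L_list, f_list):
    """Equation (4): f_q = sum over u of L(u)^T f(u), the discrete form
    of the integral over the model. L_list[i] is the 3 x n Jacobian at
    sample point i; f_list[i] is the 3-vector applied force there."""
    n = L_list[0].shape[1]
    f_q = np.zeros(n)
    for L, f in zip(L_list, f_list):
        f_q += L.T @ f
    return f_q

def euler_step(q, L_list, f_list, dt):
    """First-order dynamics q_dot = f_q, integrated with forward Euler."""
    return q + dt * generalized_force(L_list, f_list)
```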
2.3 Motion estimation using optical flow
The integration of optical flow into a deformable model formulation was presented in [3], and is briefly reviewed here. The optical flow constraint equation, which expresses a constraint on the optical flow image velocities, is reformulated as a system of dynamic constraints on $\dot{\mathbf{q}}$, the deformable model velocity.

The optical flow constraint equation at a pixel $i$ in the image $I$ has the form [5]:

$$\nabla I_i \begin{pmatrix} u_i \\ v_i \end{pmatrix} + I_{t_i} = 0 \qquad (6)$$

where $\nabla I = (I_x, I_y)$ are the spatial derivatives and $I_t$ is the time derivative of the image intensity. $u_i$ and $v_i$ are the components of the optical flow velocities.

In a model-based approach, $u_i$ and $v_i$ are identified with the components of the projected model velocities $\dot{\mathbf{x}}_p(\mathbf{u}_i)$. It is important, however, that a distinction is made between the shape parameters $\mathbf{q}_b$ and the motion parameters $\mathbf{q}_m$. Any observed motion is caused by dynamic changes in the true value of $\mathbf{q}_m$. The true value of $\mathbf{q}_b$ is a static quantity; the meaning of $\dot{\mathbf{q}}_b$ comes from the analogy of physics, where the value of $\mathbf{q}_b$ improves over the course of fitting (over time). Hence, the optical flow velocities are identified with the portion of $\dot{\mathbf{x}}_p$ that corresponds to changes in $\mathbf{q}_m$:

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \dot{\mathbf{x}}_p(\mathbf{u}_i) = \mathbf{L}_{mp}(\mathbf{u}_i)\,\dot{\mathbf{q}}_m \qquad (7)$$

where $\mathbf{L}_p = \left[\mathbf{L}_{bp}\ \mathbf{L}_{mp}\right]$ is the projected model Jacobian that has been split into blocks corresponding to $\mathbf{q}_b$ and $\mathbf{q}_m$.

The constraint equation for the optical flow at a point $i$ in the image can be found by rewriting (6) using (7):

$$\nabla I_i\,\mathbf{L}_{mp}(\mathbf{u}_i)\,\dot{\mathbf{q}}_m + I_{t_i} = 0 \qquad (8)$$

Instead of using this constraint at every pixel in the image, $m$ pixels are carefully selected from the input image [3] (where $m \gg \dim \mathbf{q}_m$). For the $m$ chosen pixels in the image, the system of equations based on (8) becomes:

$$\begin{bmatrix} \nabla I_1\,\mathbf{L}_{mp}(\mathbf{u}_1) \\ \vdots \\ \nabla I_m\,\mathbf{L}_{mp}(\mathbf{u}_m) \end{bmatrix} \dot{\mathbf{q}}_m + \begin{bmatrix} I_{t_1} \\ \vdots \\ I_{t_m} \end{bmatrix} = \mathbf{0} \qquad (9)$$

which can be written compactly as

$$\mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{I}_t = \mathbf{0} \qquad (10)$$

which is simply an instance of (5). This produces a constraint on the dynamic equations of motion (4), which has the linear least-squares solution:

$$\dot{\mathbf{q}}_m = -\mathbf{B}_m^{+}\,\mathbf{I}_t \qquad (11)$$

where $\mathbf{B}_m^{+}$ is the pseudo-inverse of $\mathbf{B}_m$.
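A minimal sketch of this solve, assuming the image gradients, projected Jacobian blocks, and temporal derivatives are already available as arrays (the variable names are ours, not from the paper):

```python
import numpy as np

def solve_flow_motion(grad_I, L_mp, I_t):
    """Model-based least-squares optical flow, equations (9)-(11).
    grad_I: (m, 2) spatial image gradients at the m chosen pixels.
    L_mp:   (m, 2, n_m) projected Jacobian blocks for the motion params.
    I_t:    (m,) temporal image derivatives.
    Returns q_m_dot and the stacked matrix B_m."""
    m = grad_I.shape[0]
    # Row i of B_m is grad_I_i * L_mp(u_i), a 1 x n_m row vector.
    B_m = np.stack([grad_I[i] @ L_mp[i] for i in range(m)])
    # Equation (11): q_m_dot = -pinv(B_m) I_t.
    q_m_dot = -np.linalg.pinv(B_m) @ I_t
    return q_m_dot, B_m
```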
2.4 Shape and motion estimation
This section describes our new technique for non-rigid shape and motion estimation using the residuals from a least-squares motion estimation. When optical flow is used as the cue for motion estimation, as in Section 2.3, the residuals are caused in part by violations of the optical flow constraints (e.g., specular reflections), by linearization of the optical flow constraints, and by measurement noise. In a model-based framework, residuals are also produced by errors in the extracted shape and motion of the model. In order for the residuals to be useful, however, a significant error in the shape and motion during tracking must be responsible for the majority of the residual; this is our primary assumption. This assumption is supported by experimental evidence discussed in Section 3.

The use of a model allows for a model-based computation using these residuals. For the applications here, the deformable face model described in Section 2.1 is used. The optical flow least-squares residuals $\mathbf{R}$ are computed from (10) using (11):

$$\mathbf{R} = \mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{I}_t = -\mathbf{B}_m\mathbf{B}_m^{+}\,\mathbf{I}_t + \mathbf{I}_t = \left(\mathbf{1} - \mathbf{B}_m\mathbf{B}_m^{+}\right)\mathbf{I}_t \qquad (12)$$

The residual is a vector of dimension $m$ (the number of pixels used in the motion computation).
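Continuing the sketch above, the residual (12) follows directly from the flow solution:

```python
import numpy as np

def flow_residual(B_m, q_m_dot, I_t):
    """Equation (12): R = B_m q_m_dot + I_t. With q_m_dot from (11) this
    equals (1 - B_m B_m^+) I_t, an m-vector (one entry per pixel)."""
    return B_m @ q_m_dot + I_t
```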
There are a number of approaches to using this residual information; given the assumption above, the goal of these approaches is to reduce this residual. One possible approach is to extract shape information using the same formulation for determining motion as described in Section 2.3, as in:

$$\mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{B}_b\,\dot{\mathbf{q}}_b + \mathbf{I}_t = \mathbf{0} \qquad (13)$$

where the construction of $\mathbf{B}_b$ is similar to $\mathbf{B}_m$, but uses $\mathbf{L}_b$ instead of $\mathbf{L}_m$. The system in (13) is decoupled, and is solved for motion first, and then for shape in terms of the residual $\mathbf{R}$:

$$\mathbf{B}_b\,\dot{\mathbf{q}}_b = -\mathbf{R}, \qquad \dot{\mathbf{q}}_b = -\mathbf{B}_b^{+}\,\mathbf{R} \qquad (14)$$

This method is closely related to the method described by Koch [8]. It is a reasonable approach in the context of image coding, where image fidelity is of much greater importance than face shape estimation: the face shape is deformed to account for the tracking errors in motion. This produces a face shape that results in a higher-quality image, but does not necessarily estimate the actual 3-D face shape of the subject.
As stated earlier, in the framework presented here, a clear distinction is made between shape and motion parameters, since the true value of $\mathbf{q}_b$ is a static quantity. Hence, it does not make sense to adjust the shape parameters $\mathbf{q}_b$ directly from observed velocities, as in [8].
Instead of this, our approach is to find what small change in $\mathbf{q}$ would effect the largest reduction in the motion residual. This approach uses the fact that the model Jacobian $\mathbf{L}_{mp}(\mathbf{u}; \mathbf{q})$ depends on both $\mathbf{q}_b$ and $\mathbf{q}_m$ (based on how the model was constructed), so that second-derivative information is used. Let $\Delta\mathbf{q}$ be the current deviation of $\mathbf{q}$ from its true value (not including the motion in $\dot{\mathbf{q}}_m$); this includes both the shape error and the accumulated motion error. We assume $\Delta\mathbf{q}$ is of sufficiently small magnitude that the first-order approximation to $\mathbf{L}_{mp}$ using its Taylor-series expansion is sufficiently accurate:

$$\mathbf{L}_{mp}(\mathbf{u}_i; \mathbf{q} + \Delta\mathbf{q}) \approx \mathbf{L}_{mp}(\mathbf{u}_i; \mathbf{q}) + \frac{\partial\mathbf{L}_{mp}(\mathbf{u}_i; \mathbf{q})}{\partial\mathbf{q}}\,\Delta\mathbf{q} \qquad (15)$$

For the case of the face model described in Section 2.1, whose parameterization consists mostly of affine scaling deformations, sufficient accuracy is easily attained. Combining this approximation of $\mathbf{L}_{mp}$ with the model-based optical flow constraint equation (8) results in:

$$\nabla I_i\,\mathbf{L}_{mp}(\mathbf{u}_i)\,\dot{\mathbf{q}}_m + \nabla I_i \left(\frac{\partial\mathbf{L}_{mp}(\mathbf{u}_i)}{\partial\mathbf{q}}\,\Delta\mathbf{q}\right)\dot{\mathbf{q}}_m + I_{t_i} = 0 \qquad (16)$$

where $\partial\mathbf{L}_{mp}/\partial\mathbf{q}$ is part of the model Hessian matrix. It is used here as a block matrix, written "curried" with $\Delta\mathbf{q}$ to keep the notation under control.

When (16) is considered over the $m$ pixels from the input image, this results in the system:

$$\mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{H}(\dot{\mathbf{q}}_m)\,\Delta\mathbf{q} + \mathbf{I}_t = \mathbf{0} \qquad (17)$$

where

$$\mathbf{H} = \begin{bmatrix} \nabla I_1\,\dfrac{\partial\mathbf{L}_{mp}(\mathbf{u}_1)}{\partial\mathbf{q}} \\ \vdots \\ \nabla I_m\,\dfrac{\partial\mathbf{L}_{mp}(\mathbf{u}_m)}{\partial\mathbf{q}} \end{bmatrix} \qquad (18)$$

The transpositions performed in the construction of $\mathbf{H}$ allow it now to be curried with $\dot{\mathbf{q}}_m$. This manipulation allows for the solution of $\Delta\mathbf{q}$, which is found using another least-squares process:

$$\mathbf{H}(\dot{\mathbf{q}}_m)\,\Delta\mathbf{q} = -\mathbf{R}, \qquad \Delta\mathbf{q} = -\left(\mathbf{H}(\dot{\mathbf{q}}_m)\right)^{+}\mathbf{R} \qquad (19)$$

This least-squares solution determines the best set of small changes in $\mathbf{q}_b$ and $\mathbf{q}_m$ that minimize the optical flow residual (12), given the linearization of $\mathbf{L}_{mp}$ in (15). Therefore, we update both the shape $\mathbf{q}_b$ and the motion $\mathbf{q}_m$, unlike previous approaches, in order to capture the non-rigid shape and motion using motion analysis.
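The refinement step might be sketched as follows, under the assumption that the model supplies the Jacobian-derivative (Hessian block) tensor $\partial\mathbf{L}_{mp}/\partial\mathbf{q}$ at each selected pixel; the contraction order mirrors the "currying" of $\mathbf{H}$ with $\dot{\mathbf{q}}_m$ in (17)-(19):

```python
import numpy as np

def solve_refinement(grad_I, dL_mp_dq, q_m_dot, R):
    """Shape/motion refinement, equations (17)-(19) (a sketch; the
    dL_mp_dq tensor would come from the deformable model).
    grad_I:    (m, 2) image gradients at the selected pixels.
    dL_mp_dq:  (m, 2, n_m, n) second-derivative blocks dL_mp/dq.
    q_m_dot:   (n_m,) motion estimate from (11).
    R:         (m,) residual from (12).
    Returns delta_q, the (n,) vector of shape and motion corrections."""
    m = grad_I.shape[0]
    rows = []
    for i in range(m):
        # Contract grad_I_i with dL_mp(u_i)/dq and with q_m_dot, leaving
        # one 1 x n row of the "curried" matrix H(q_m_dot).
        rows.append(np.einsum('c,cjk,j->k', grad_I[i], dL_mp_dq[i], q_m_dot))
    H_qdot = np.stack(rows)                               # (m, n)
    # Equation (19): delta_q = -(H(q_m_dot))^+ R, via SVD-based lstsq.
    delta_q, *_ = np.linalg.lstsq(H_qdot, -R, rcond=None)
    return delta_q
```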
2.5 Implementation
Updating $\mathbf{q}$ using the solution for the shape and motion estimation in (19) can be accomplished by simply replacing $\mathbf{q}$ by $\mathbf{q} + \Delta\mathbf{q}$ in the next iteration (this update is in addition to the numerical integration of the dynamic motion equations). To improve robustness, however, such a solution requires further processing.

The processing of $\Delta\mathbf{q}$ verifies that there has been a significant decrease in the residual given the change in $\mathbf{q}$. This is necessary due to the linear approximation in (15), and because the visible portion of the projected model $\mathbf{x}_p$ can change with $\mathbf{q}$. Once $\Delta\mathbf{q}$ has been computed using (19), the model-based motion analysis in (10) is re-solved using $\mathbf{q}_{\text{new}} = \mathbf{q} + \Delta\mathbf{q}$, producing an updated residual $\mathbf{R}_{\text{new}}$. If the addition of $\Delta\mathbf{q}$ causes the residual magnitude of $\mathbf{R}_{\text{new}}$ to be larger than that of $\mathbf{R}$, the results of the shape and motion refinement are discarded. Otherwise, the changes specified by $\Delta\mathbf{q}$ are used directly.
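A sketch of this accept/reject test, where resolve_flow is a hypothetical helper that re-runs the motion solve (10)-(12) at the given parameters and returns the residual vector:

```python
import numpy as np

def checked_update(q, delta_q, resolve_flow):
    """Accept the refinement only if it lowers the residual magnitude."""
    R = resolve_flow(q)
    R_new = resolve_flow(q + delta_q)
    if np.linalg.norm(R_new) < np.linalg.norm(R):
        return q + delta_q    # keep the shape/motion refinement
    return q                  # discard it
```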
Efficiency gains are obtained by omitting parameters from the construction of $\mathbf{H}$ in (18) which cannot be affected given $\dot{\mathbf{q}}_m$. For example, if there is no motion extracted in the eyebrow region of the face, then there is no reason to include eyebrow shape parameters in $\mathbf{H}$. At any point in time, typically about half of the shape parameters of the face model can be omitted from the computations.

The process of determining $\Delta\mathbf{q}$ can also be iterated, solving (10) and (19) repeatedly to obtain a greater improvement. For the applications here, the linear approximation in (15) is relatively accurate for the face model described in Section 2.1, due to the fact that most of the model parameterization is linear scaling. As a result, only a single iteration is performed.

The least-squares solution to (19) is computed using a singular-value decomposition. This avoids problems associated with rank deficiency due to the aperture problem or a lack of motion, as well as problems associated with a non-orthogonal set of parameters.
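One way to realize such an SVD-based solve (again a sketch, not the authors' code) is a truncated pseudo-inverse that discards small singular values, so that rank deficiency from the aperture problem or an absence of motion does not destabilize the solution:

```python
import numpy as np

def svd_solve(A, b, tol=1e-6):
    """Minimum-norm least-squares solution of A x = b, discarding
    singular values below tol * s_max (a truncated pseudo-inverse)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]   # s is sorted in descending order
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])
```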
3 Experiments and discussion
We now present two experiments to demonstrate the im-
proved shape estimation ability of our new model-based
shape and motion estimation technique. The entire process
of shape and motion estimation is automatic, except for the
initialization, which involves the manual specification of a
few landmark features in the first frame of the sequence.
The model is then initially fit using edge information [3].
The problem of automatically locating the face and its var-
ious features has been addressed elsewhere [18, 19]. No
markers or make-up are used on the subject.
Both experiments use the same subject in the image
sequences. The extracted shape results can be compared
against a Cyberware range scan of the subject, shown in
Figure 5 (where the extracted face is manually scaled by a
small amount to eliminate the depth ambiguity). This anal-
ysis is used for the shape parameters in $\mathbf{q}_b$. Unfortunately, a similar analysis for the motion parameters in $\mathbf{q}_m$ cannot be performed, since ground truth is not available. However,
by visually inspecting the alignment of the model with the
image, a rough verification can be performed.
Figure 5: Range scan of subject (a) shaded and (b) textured
For each of the tracking examples, several frames from
the original image sequence are displayed (480×480 IndyCam grayscale). Below each, the same sequence is shown
with the estimated face superimposed. Additional close-
ups are provided to show the difference between using our
new shape and motion estimation technique, and the opti-
cal flow framework from [3]. Finally, a graph is displayed
that indicates the RMS deviation of the model from the
range scan over the course of the image sequence (for both
techniques).
Processing each frame takes approximately 10 seconds (on a 200 MHz SGI Indigo 2), where the least-squares shape and motion estimation takes about 6 seconds, and the least-squares optical flow solution takes about 1 second (but must be performed twice to check $\Delta\mathbf{q}$). The computation used 200 pixels from each image.
Figure 6 shows four frames from the first experiment. In this sequence, the subject makes a series of non-rigid face motions: opening the mouth in (b), smiling in (c), and raising the eyebrows in (d). In each case, the motion parameter values change appropriately, and at the correct times (and compare closely with those extracted without the non-rigid shape and motion computation, as there was very little motion error). A close-up of Figure 6(c) is shown in Figure 7(a), showing a fitted smile expression (including changes to smile expression-shape parameters, causing the smile to turn upward at its corners). Figure 7(b) shows the same frame fitted using the framework in [3], showing a smile expression that does not have well-fitting expression-shape parameters. The RMS error graph in Figure 8 clearly shows the advantages of using our new technique. Besides having a lower RMS error at the final frame, this lower level was reached relatively quickly.
Figure 9 shows five frames from the second experiment. The subject moves his head around in different directions, producing a smile in (b) and (c), before finally turning his head to the side in (e). The face model captures this motion well, even in the presence of the significant self-occlusion in (e). A close-up of Figure 9(b) is shown in Figure 10(a), showing another well-fitted smile expression. Again, the framework without the non-rigid shape and motion computation is not able to correct the error in the smile expression-shape parameters in (b). Figure 10(c) is a close-up of Figure 9(e). The shape of the face is improved over the estimated face shape in (d), which uses only edge information to extract the shape. The difference is most pronounced in the nose profile, the position of the right eyebrow, and the slope of the forehead. The RMS error graph in Figure 11 again shows the beneficial result of our technique. Most of the face shape that is extracted from the sequence is extracted before frame (c). The decrease in RMS error for the estimation process using the framework in [3] between frames (d) and (e) corresponds to when the subject turned his head to the side by a large amount, where the profile view contained good edge information to fit the face shape. The framework using the new technique, however, did not have to "wait" for the subject to turn his head substantially to get a good shape estimate.
Judging by the good performance of our method, it
seems that the model-based least squares non-rigid shape
and motion method is relatively insensitive to optical flow
constraint equation errors (such as violations of the bright-
ness constancy assumption [15], or the truncation of higher
order image-derivative terms [13]). This was also observed
for the model-based least-squares optical flow solution in
[3].
The derivation in Section 2.4 assumes that shape error
is the leading contributor to the residuals from the motion
computation. In order to estimate what portion of the residuals is caused by shape error, both experiments were run again; this time, the initial model shape was taken from the range scan of the subject (so that shape error is eliminated). The residuals that resulted from these experiments had a fairly small magnitude, which averaged around 0.035 (pixel intensity units, for pixels in the range [0, 1]). This value stayed fairly constant throughout both experiments.

In the actual experiments, the residual magnitudes started fairly high (around 0.18 for the first experiment, and 0.24 for the second), and ended up around 0.050 (for both experiments) by the end of the motion sequence. This supports the validity of our assumption that shape error is responsible for the bulk of the residual.
4 Conclusions
In this paper, we presented a novel deformable model
technique which uses residuals from a model-based optical
flow solution to refine the shape and motion of the model.
The experiments show how this technique is an improve-
ment over edge-based techniques for shape and motion es-
timation.
Besides having greater accuracy than a framework us-
ing only optical flow and edges, our framework extracts
the shape of the face without needing data from extreme
head poses (such as a profile view). Instead, much smaller
motions are needed to extract much of the shape information.
Figure 6: Tracking and shape estimation experiment 1
Figure 7: Experiment 1 results (a) with and (b) without
shape from motion (close-ups)
Figure 8: Experiment 1 shape estimation results (solid line: with new technique; dotted line: without)
The least squares non-rigid shape and motion computation seems to be robust to optical flow constraint equation violations or approximations (such as small lighting changes or higher order image derivative terms).
This method was presented in the context of face shape
and motion estimation, although it could be applied to
other model-based domains. This work should provide
some encouragement to researchers working on automatic
motion-based model construction, since the benefits of this
method are only possible within a model-based framework.
Figure 9: Tracking and shape estimation experiment 2
Figure 10: Experiment 2 results (a),(c) with and (b),(d)
without new technique (close-ups)
Figure 11: Experiment 2 shape estimation results (solid line: with new technique; dotted line: without)
Acknowledgments
This research is partially supported by NSF Career
Award grant 9624604; ARO grant DAAH-04-96-1-007;
and ONR-YIP grant K-5-55043/3916-1552793.
References
[1] A. Amini and J. Duncan. Pointwise tracking of left-
ventricular motion in 3D. In Proc. IEEE Workshop on Visual
Motion, pages 294–299, 1991.
[2] M. Black and Y. Yacoob. Tracking and recognizing rigid
and non-rigid facial motions using local parametric models
of image motion. In Proceedings ICCV ’95, pages 374–381,
1995.
[3] D. DeCarlo and D. Metaxas. The integration of optical flow
and deformable models with applications to human face
shape and motion estimation. In Proceedings CVPR ’96,
pages 231–238, 1996.
[4] L. Farkas. Anthropometry of the Head and Face. Raven
Press, 1994.
[5] B.K.P. Horn. Robot Vision. McGraw-Hill, 1986.
[6] W.C. Huang and D. B. Goldgof. Adaptive-size meshes
for rigid and nonrigid shape analysis and synthesis. IEEE
Pattern Analysis and Machine Intelligence, 15(6):611–616,
June 1993.
[7] I. Kakadiaris and D. Metaxas. Model-based estimation of
3d human motion with occlusion based on active multi-
viewpoint selection. In Proceedings CVPR ’96, pages 81–
87, 1996.
[8] R. Koch. Dynamic 3-D scene analysis through synthesis
feedback control. IEEE Pattern Analysis and Machine In-
telligence, 15(6):556–568, June 1993.
[9] H. Li, P. Roivainen, and R. Forchheimer. 3-D motion esti-
mation in model-based facial image coding. IEEE Pattern
Analysis and Machine Intelligence, 15(6):545–555, June
1993.
[10] D. Metaxas. Physics-Based Deformable Models : Applica-
tions to Computer Vision, Graphics, and Medical Imaging.
Kluwer Academic Publishers, 1996.
[11] S.K. Mishra, D. Goldgof, and T.S. Huang. Motion analy-
sis and epicardial deformation estimation from angiography
data. In Proceedings CVPR ’91, pages 331–336, 1991.
[12] Y. Moses, D. Reynard, and A. Blake. Robust real time track-
ing and classification of facial expressions. In Proceedings
ICCV ’95, pages 296–301, 1995.
[13] H.H. Nagel. Displacement vectors derived from second-
order intensity variations in image sequences. CVGIP,
21(1):85–117, January 1983.
[14] C. Nastar and N. Ayache. Spatio-temporal analysis of non-
rigid motion from 4D data. In Proc. IEEE Workshop on
Motion of Non-Rigid and Articulated Objects, pages 146–
151, 1994.
[15] S. Negahdaripour and C.H. Yu. A generalized brightness
change model for computing optical flow. In ICCV93, pages
2–11, 1993.
[16] D. Reynard, A. Wildenberg, A. Blake, and J. Marchant.
Learning dynamics of complex motions from image se-
quences. In Proceedings ECCV ’96, pages I:357–368, 1996.
[17] D. Terzopoulos and K. Waters. Analysis and synthesis
of facial image sequences using physical and anatomical
models. IEEE Pattern Analysis and Machine Intelligence,
15(6):569–579, 1993.
[18] Y. Yacoob and L.S. Davis. Computing spatio-temporal rep-
resentations of human faces. In Proceedings CVPR ’94,
pages 70–75, 1994.
[19] A.L. Yuille, D.S. Cohen, and P. Halliman. Feature extrac-
tion from faces using deformable templates. International
Journal of Computer Vision, 8:104–109, 1992.
A method for detecting and describing the features of faces using deformable templates is described. The feature of interest, an eye for example, is described by a parameterized template. An energy function is defined which links edges, peaks, and valleys in the image intensity to corresponding properties of the template. The template then interacts dynamically with the image, by altering its parameter values to minimize the energy function, thereby deforming itself to find the best fit. The final parameter values can be used as descriptors for the features. This method is demonstrated by showing deformable templates detecting eyes and mouths in real images