(appeared in Proceedings ICCV ’98, pp. 113-119)
Deformable Model-Based Shape and Motion Analysis
from Images using Motion Residual Error
Douglas DeCarlo and Dimitris Metaxas
Department of Computer & Information Science, University of Pennsylvania, Philadelphia PA 19104-6389
dmd@gradient.cis.upenn.edu, dnm@central.cis.upenn.edu
Abstract
We present a novel method for the shape and motion estimation of a deformable model using error residuals from model-based motion analysis. The motion of the model is first estimated using a model-based least squares method. Using the residuals from the least squares solution, the non-rigid structure of the model can be better estimated by computing how changes in the shape of the model affect its motion parameterization. This method is implemented as a component in a deformable model-based framework that uses optical flow information and edges. This general model-based framework is applied to human face shape and motion estimation. We present experiments that demonstrate that this framework is a considerable improvement over a framework that uses only optical flow information and edges.
1 Introduction
In this paper we develop a new robust approach to the
problem of shape and motion from non-rigid motion using
deformable models. The starting point for our method is an
intuitive distinction between shape and motion. Our model
has motion parameters, which describe both rigid and non-
rigid motions. The model also has shape parameters, which
describe the basic underlying shape of the model. The pur-
pose of this distinction is to reduce the number of motion
parameters. The shape parameters represent static quanti-
ties whose true values are fixed. The motion parameters
are dynamic quantities, whose true values change with the
motion of the observed subject. After a short period of
time, the values of the shape parameters are established,
resulting in a smaller sized estimation problem in the long
term. This distinction leads us to develop a method where
changes in the image are initially attributed entirely to mo-
tion, but then the error in the reconstructed motion is used
to more accurately extract both shape and motion parame-
ters of the object being tracked.
There is a good deal of work addressing the problem
of shape estimation from motion, most of which has as-
sumed a rigid shape. In tracking deformable objects, struc-
ture has been estimated from non-rigid motion where the
input was either range data [1, 6, 10, 11, 14] or image data
[7, 8, 12, 17]. In particular, Koch [8] describes a model-
based framework which uses optical flow information to
estimate the rigid translation and rotation of a moving face,
and adapts the shape of the face to account for the motion
discrepancy. None of these approaches use tracking error
residuals to improve shape and motion estimates.
Our new formulation is used in concert with the method
described by DeCarlo and Metaxas [3] to simultaneously
update the shape and motion of a face. In [3], optical flow
provides a dynamic velocity constraint on the deformable
model. This results in a model-based least squares solution
to optical flow, which is related to methods described by
Black and Yacoob [2] and Li, et al. [9]. The optical flow
constraints are used to estimate both the rigid and non-rigid
motion of a human face—head motion and facial expres-
sions, but not the underlying face shape. Edges are also
used in motion estimation to combat tracking error accu-
mulation.
In this paper, we extend this framework so that the face
shape is also updated based on the optical flow informa-
tion. Derivatives of the model Jacobian (second deriva-
tives of the model) determine how changes in the parame-
ters of the model affect its motion parameterization. Us-
ing these derivatives in a truncated Taylor series expan-
sion, the model parameters (both shape and motion) are
refined by minimizing the residuals from the model-based
motion computation. This method simultaneously corrects
the shape and motion parameters for each image frame.
For every image in the sequence, we first solve a model-
based least squares optical flow solution [3], which de-
termines the motion parameters. Then, the residual from
this computation determines the error in the model param-
eters using another least squares process, which adjusts the
shape and motion parameters of the model.
This approach allows a more accurate extraction of the
shape and motion. The estimation framework presented in
[3] extracted the basic shape of the face using only edge
information. Edge information is not always adequate due
to poor illumination and self-occlusion. This may result
in inaccurate estimation of the basic shape, which can in
turn cause error in the motion estimation. This approach
also differs from other model-based shape and motion es-
timation methods [8] where optical flow information was
used to directly improve the shape, leading to potentially
large shape estimation errors. Our method does not require
the extraction of tracked features, but instead uses motion
information (in this case, optical flow). Shape
and motion are extracted simultaneously.
We demonstrate our methodology in facial shape and
motion estimation: an application area possessing a rea-
sonably clear separation between shape and motion param-
eters. In fact, this division is often built into face models
[2, 9, 12, 17] to simplify model construction or estima-
tion, while Reynard, et al. [16] use this separation to permit
learning the variability of motions for a class of objects.
In this paper, we first give a brief review of the model-
based optical flow framework and face model presented in
[3]. We then describe how this framework is augmented
with a shape and motion refinement computation. We
present experiments that extract the shape of the face, and
track its motion, in the presence of large head rotations and
expressions. We compare the results of these experiments
to those that use the framework demonstrated in [3]. The
shape from motion approach shows a significant improve-
ment in the estimation of face shape.
2 Model description and framework
The shape $\mathbf{s}$ of a deformable model is parameterized by a vector of values $\mathbf{q}$ and is defined over a domain $\Omega$ which can be used to identify specific points on the model. The shape $\mathbf{s}(\mathbf{u}; \mathbf{q}_s)$, with $\mathbf{u} \in \Omega$ and parameters $\mathbf{q}_s$, is allowed to translate and rotate so that a point on the model is given by:

$$\mathbf{x}(\mathbf{u}) = \mathbf{c} + \mathbf{R}\,\mathbf{s}(\mathbf{u}; \mathbf{q}_s) \qquad (1)$$

where the model parameters $\mathbf{q} = (\mathbf{q}_c^\top, \mathbf{q}_\theta^\top, \mathbf{q}_s^\top)^\top$. $\mathbf{q}_c = \mathbf{c}$ is the translation, and $\mathbf{q}_\theta$ is the quaternion that specifies the rotation matrix $\mathbf{R}$. For conciseness, the dependency of $\mathbf{x}$ on $\mathbf{q}$ is not always written.

To distinguish the processes of shape estimation and motion tracking, the parameters in $\mathbf{q}$ are rearranged and separated into $\mathbf{q}_b$, which describe the basic shape of the object, and into $\mathbf{q}_m$, which describe its motion (both rigid and non-rigid), so that $\mathbf{q} = (\mathbf{q}_b^\top, \mathbf{q}_m^\top)^\top$. This distinction between shape and motion parameters must be considered during the construction of a shape model, and allows for the more effective estimation of shape and motion.
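To make the parameterization concrete, the following sketch evaluates a model point as in (1) from a translation, a quaternion, and shape parameters. It is an illustration only, not the authors' implementation; the ellipsoidal shape function is a hypothetical stand-in for the face model's deformations.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a quaternion (w, x, y, z), normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def shape_point(u, q_s):
    """Hypothetical shape function s(u; q_s): an ellipsoid whose three
    radii are the shape parameters (a stand-in for the face deformations)."""
    theta, phi = u
    return q_s * np.array([np.cos(theta) * np.cos(phi),
                           np.cos(theta) * np.sin(phi),
                           np.sin(theta)])

def model_point(u, q_c, q_theta, q_s):
    """Equation (1): x(u) = c + R s(u; q_s)."""
    return q_c + quat_to_rot(q_theta) @ shape_point(u, q_s)

# Example: a point at material coordinates u = (0.3, 1.2).
x = model_point((0.3, 1.2), np.zeros(3),
                np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 1.5, 2.0]))
```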
2.1 Face model description
For the applications in this paper, the shape model used
is the deformable face model described in [3], and is shown
in Figure 1. It is a three-dimensional polygon mesh, shown
in Figure 1(b), whose geometry is controlled using a man-
ually constructed sequence of parameterized deformations,
which include localized scaling and bending operations.
The parameterization of the model is based on data gath-
ered in face anthropometry studies [4], which ensures that
the model is capable of representing a wide variety of
faces.
The shape of the face model (in rest position) is formed using a set of parameterized deformations specified by the shape parameters $\mathbf{q}_b$. Results of applying some of the deformations that affect the nose are displayed in Figure 2. Also included in $\mathbf{q}_b$ are parameters which specify the appearance of facial expressions, called expression-shape parameters.
Figure 1: The deformable face model
Figure 2: Example nose shape deformations
These parameters do not change the underlying face shape, but rather change the appearance of a particular facial expression. These parameters abstract information related to facial muscle placement. Figure 3 contains examples of varying expression-shape parameters that specify how a particular individual smiles. Figure 3(a) shows the model in its rest state (not smiling), while (b) and (c) contain differently shaped smiles. The smile in Figure 3(c) is made more curved (like the Cheshire cat's) by varying some of the expression-shape parameters. In total (both shape and expression-shape parameters), $\mathbf{q}_b$ contains approximately 100 parameters.
The motion of the face model, which includes rigid head motions (specified by $\mathbf{q}_c$ and $\mathbf{q}_\theta$) as well as non-rigid facial expressions, uses a separate set of deformations specified by the motion parameters $\mathbf{q}_m$. Figure 4 shows example deformations performed by these parameters. The model is shown with the mouth open in Figure 4(a), smiling in (b), raising eyebrows in (c) and frowning eyebrows in (d). In total, $\mathbf{q}_m$ contains 15 rigid and non-rigid motion parameters.
The partition of $\mathbf{q}$ into $\mathbf{q}_b$ and $\mathbf{q}_m$ can also be viewed another way: the parameters in $\mathbf{q}_b$ are a static quantity for a particular individual, and specify what a person looks like and how their facial expressions appear. The parameters in $\mathbf{q}_m$ are a dynamic quantity, which change when a subject moves their head, opens their mouth, or makes a facial expression. The goal of a shape and motion estimation process is to recover the value of $\mathbf{q}$ from a sequence of frames.
Figure 3: Example smile expression-shape deformations
Figure 4: Face motion and expression deformations; (a) open mouth, (b) smile, (c) raise brows, (d) frown brows
During estimation, the change in $\mathbf{q}_b$ should tend to zero as the shape of the face is established. Once this occurs, fitting need only continue for $\mathbf{q}_m$. For reasons of efficiency, it is in our best interest to include as many parameters as possible in $\mathbf{q}_b$.
2.2 Deformable model dynamics
Estimation of the model parameters $\mathbf{q}$ is based on first-order Lagrangian dynamics [10]. As the model changes, velocities of points on the three-dimensional model $\mathbf{x}$ are given by:

$$\dot{\mathbf{x}}(\mathbf{u}) = \mathbf{L}(\mathbf{u}; \mathbf{q})\,\dot{\mathbf{q}} \qquad (2)$$

where $\mathbf{L} = \partial\mathbf{x}/\partial\mathbf{q}$ is the model Jacobian [10]. Note that the dependency of $\mathbf{L}$ on $\mathbf{q}$ is not always written, for reasons of conciseness.

For computations using image information, $\mathbf{L}$ must take the camera projection into account [3]. In this case, the two-dimensional model $\mathbf{x}_p$ includes a perspective projection, and has a corresponding projected Jacobian $\mathbf{L}_p$ related by:

$$\dot{\mathbf{x}}_p(\mathbf{u}) = \mathbf{L}_p(\mathbf{u}; \mathbf{q})\,\dot{\mathbf{q}} \qquad (3)$$

As is often the case in a deformable model framework in a vision application, the dynamic equations of motion [10] of the model are simplified to obtain:

$$\dot{\mathbf{q}} = \mathbf{f}_q, \qquad \mathbf{f}_q = \int \mathbf{L}(\mathbf{u})^\top \mathbf{f}(\mathbf{u})\,d\mathbf{u} \qquad (4)$$

Using $\mathbf{L}$, the three-dimensional applied forces $\mathbf{f}$ are converted to generalized forces which act on $\mathbf{q}$ and are integrated over the model to find the total (generalized) parameter force $\mathbf{f}_q$. The distribution of forces on the model is based in part on forces computed from the edges of an input image. The equations of motion (4) are integrated over time to estimate $\mathbf{q}$ from a sequence of images.

In addition to this, a model-based motion computation can be used [3, 9]. We are assuming here that this motion is expressed by an over-determined linear system:

$$\mathbf{A}\,\dot{\mathbf{q}} + \mathbf{b} = \mathbf{0} \qquad (5)$$

where the matrix $\mathbf{A}$ and vector $\mathbf{b}$ can depend on both the model and the data. For example, optical flow information in the form of (5) constrained the velocity of the model (4) in [3] to produce a model-based least-squares motion computation; this framework is detailed in the next section.
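As a rough sketch of how the simplified dynamics (4) might be stepped in practice, assuming the model is sampled at discrete points so the integral becomes a sum (the Jacobians and forces here are placeholders supplied by the model and the image, respectively):

```python
import numpy as np

def generalized_force(L_list, f_list):
    """Equation (4): f_q = sum over u of L(u)^T f(u), the discrete form
    of the integral over the model. L_list[i] is the 3 x n Jacobian at
    sample point i; f_list[i] is the 3-vector applied force there."""
    n = L_list[0].shape[1]
    f_q = np.zeros(n)
    for L, f in zip(L_list, f_list):
        f_q += L.T @ f
    return f_q

def euler_step(q, L_list, f_list, dt):
    """First-order dynamics q_dot = f_q, integrated with forward Euler."""
    return q + dt * generalized_force(L_list, f_list)
```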
2.3 Motion estimation using optical flow
The integration of optical flow into a deformable model formulation was presented in [3], and is briefly reviewed here. The optical flow constraint equation, which expresses a constraint on the optical flow image velocities, is reformulated as a system of dynamic constraints on $\dot{\mathbf{q}}$, the deformable model velocity.

The optical flow constraint equation at a pixel $i$ in the image $I$ has the form [5]:

$$\nabla I_i \begin{pmatrix} u_i \\ v_i \end{pmatrix} + I_{t_i} = 0 \qquad (6)$$

where $\nabla I = (I_x, I_y)$ are the spatial derivatives and $I_t$ is the time derivative of the image intensity. $u_i$ and $v_i$ are the components of the optical flow velocities.

In a model-based approach, $u_i$ and $v_i$ are identified with the components of the projected model velocities $\dot{\mathbf{x}}_p(\mathbf{u}_i)$. It is important, however, that a distinction is made between the shape parameters $\mathbf{q}_b$ and the motion parameters $\mathbf{q}_m$. Any observed motion is caused by dynamic changes in the true value of $\mathbf{q}_m$. The true value of $\mathbf{q}_b$ is a static quantity; the meaning of $\dot{\mathbf{q}}_b$ comes from the analogy of physics, where the value of $\mathbf{q}_b$ improves over the course of fitting (over time). Hence, the optical flow velocities are identified with the portion of $\dot{\mathbf{x}}_p$ that corresponds to changes in $\mathbf{q}_m$:

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \dot{\mathbf{x}}_p(\mathbf{u}_i) = \mathbf{L}_{mp}(\mathbf{u}_i)\,\dot{\mathbf{q}}_m \qquad (7)$$

where $\mathbf{L}_p = \left[\mathbf{L}_{bp}\ \mathbf{L}_{mp}\right]$ is the projected model Jacobian that has been split into blocks corresponding to $\mathbf{q}_b$ and $\mathbf{q}_m$.

The constraint equation for the optical flow at a point $i$ in the image can be found by rewriting (6) using (7):

$$\nabla I_i\,\mathbf{L}_{mp}(\mathbf{u}_i)\,\dot{\mathbf{q}}_m + I_{t_i} = 0 \qquad (8)$$

Instead of using this constraint at every pixel in the image, $m$ pixels are carefully selected from the input image [3] (where $m \gg \dim \mathbf{q}_m$). For the $m$ chosen pixels in the image, the system of equations based on (8) becomes:

$$\begin{bmatrix} \nabla I_1\,\mathbf{L}_{mp}(\mathbf{u}_1) \\ \vdots \\ \nabla I_m\,\mathbf{L}_{mp}(\mathbf{u}_m) \end{bmatrix} \dot{\mathbf{q}}_m + \begin{bmatrix} I_{t_1} \\ \vdots \\ I_{t_m} \end{bmatrix} = \mathbf{0} \qquad (9)$$

which can be written compactly as

$$\mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{I}_t = \mathbf{0} \qquad (10)$$

which is simply an instance of (5). This produces a constraint on the dynamic equations of motion (4), which has the linear least-squares solution:

$$\dot{\mathbf{q}}_m = -\mathbf{B}_m^{+}\,\mathbf{I}_t \qquad (11)$$

where $\mathbf{B}_m^{+}$ is the pseudo-inverse of $\mathbf{B}_m$.
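A minimal sketch of this solve, assuming the image gradients, projected Jacobian blocks, and temporal derivatives are already available as arrays (the variable names are ours, not from the paper):

```python
import numpy as np

def solve_flow_motion(grad_I, L_mp, I_t):
    """Model-based least-squares optical flow, equations (9)-(11).
    grad_I: (m, 2) spatial image gradients at the m chosen pixels.
    L_mp:   (m, 2, n_m) projected Jacobian blocks for the motion params.
    I_t:    (m,) temporal image derivatives.
    Returns q_m_dot and the stacked matrix B_m."""
    m = grad_I.shape[0]
    # Row i of B_m is grad_I_i * L_mp(u_i), a 1 x n_m row vector.
    B_m = np.stack([grad_I[i] @ L_mp[i] for i in range(m)])
    # Equation (11): q_m_dot = -pinv(B_m) I_t.
    q_m_dot = -np.linalg.pinv(B_m) @ I_t
    return q_m_dot, B_m
```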
2.4 Shape and motion estimation
This section describes our new technique for non-rigid shape and motion estimation using the residuals from a least-squares motion estimation. When optical flow is used as the cue for motion estimation, as in Section 2.3, the residuals are caused in part by violations of the optical flow constraints (e.g., specular reflections), by linearization of the optical flow constraints, and by measurement noise. In a model-based framework, residuals are also produced by errors in the extracted shape and motion of the model. In order for the residuals to be useful, however, a significant error in the shape and motion during tracking must be responsible for the majority of the residual; this is our primary assumption. This assumption is supported by experimental evidence discussed in Section 3.

The use of a model allows for a model-based computation using these residuals. For the applications here, the deformable face model described in Section 2.1 is used. The optical flow least-squares residuals $\mathbf{R}$ are computed from (10) using (11):

$$\mathbf{R} = \mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{I}_t = -\mathbf{B}_m\mathbf{B}_m^{+}\,\mathbf{I}_t + \mathbf{I}_t = \left(\mathbf{1} - \mathbf{B}_m\mathbf{B}_m^{+}\right)\mathbf{I}_t \qquad (12)$$

The residual is a vector of dimension $m$ (the number of pixels used in the motion computation).
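Continuing the sketch above, the residual (12) follows directly from the flow solution:

```python
import numpy as np

def flow_residual(B_m, q_m_dot, I_t):
    """Equation (12): R = B_m q_m_dot + I_t. With q_m_dot from (11) this
    equals (1 - B_m B_m^+) I_t, an m-vector (one entry per pixel)."""
    return B_m @ q_m_dot + I_t
```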
There are a number of approaches to using this residual information; given the assumption above, the goal of these approaches is to reduce this residual. One possible approach is to extract shape information using the same formulation for determining motion as described in Section 2.3, as in:

$$\mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{B}_b\,\dot{\mathbf{q}}_b + \mathbf{I}_t = \mathbf{0} \qquad (13)$$

where the construction of $\mathbf{B}_b$ is similar to $\mathbf{B}_m$, but uses $\mathbf{L}_b$ instead of $\mathbf{L}_m$. The system in (13) is decoupled, and is solved for motion first, and then for shape in terms of the residual $\mathbf{R}$:

$$\mathbf{B}_b\,\dot{\mathbf{q}}_b = -\mathbf{R}, \qquad \dot{\mathbf{q}}_b = -\mathbf{B}_b^{+}\,\mathbf{R} \qquad (14)$$

This method is closely related to the method described by Koch [8]. It is a reasonable approach in the context of image coding, where image fidelity is of much greater importance than face shape estimation: the face shape is deformed to account for the tracking errors in motion. This produces a face shape that results in a higher-quality image, but does not necessarily estimate the actual 3-D face shape of the subject.
As stated earlier, in the framework presented here, a clear distinction is made between shape and motion parameters, since the true value of $\mathbf{q}_b$ is a static quantity. Hence, it does not make sense to adjust the shape parameters $\mathbf{q}_b$ directly from observed velocities, as in [8].
Instead of this, our approach is to find what small change in $\mathbf{q}$ would effect the largest reduction in the motion residual. This approach uses the fact that the model Jacobian $\mathbf{L}_{mp}(\mathbf{u}; \mathbf{q})$ depends on both $\mathbf{q}_b$ and $\mathbf{q}_m$ (based on how the model was constructed), so that second-derivative information is used. Let $\Delta\mathbf{q}$ be the current deviation of $\mathbf{q}$ from its true value (not including the motion in $\dot{\mathbf{q}}_m$); this includes both the shape error and the accumulated motion error. We assume $\Delta\mathbf{q}$ is of sufficiently small magnitude that the first-order approximation to $\mathbf{L}_{mp}$ using its Taylor-series expansion is sufficiently accurate:

$$\mathbf{L}_{mp}(\mathbf{u}_i; \mathbf{q} + \Delta\mathbf{q}) \approx \mathbf{L}_{mp}(\mathbf{u}_i; \mathbf{q}) + \frac{\partial\mathbf{L}_{mp}(\mathbf{u}_i; \mathbf{q})}{\partial\mathbf{q}}\,\Delta\mathbf{q} \qquad (15)$$

For the case of the face model described in Section 2.1, whose parameterization consists mostly of affine scaling deformations, sufficient accuracy is easily attained. Combining this approximation of $\mathbf{L}_{mp}$ with the model-based optical flow constraint equation (8) results in:

$$\nabla I_i\,\mathbf{L}_{mp}(\mathbf{u}_i)\,\dot{\mathbf{q}}_m + \nabla I_i \left(\frac{\partial\mathbf{L}_{mp}(\mathbf{u}_i)}{\partial\mathbf{q}}\,\Delta\mathbf{q}\right)\dot{\mathbf{q}}_m + I_{t_i} = 0 \qquad (16)$$

where $\partial\mathbf{L}_{mp}/\partial\mathbf{q}$ is part of the model Hessian matrix. It is used here as a block matrix, written "curried" with $\Delta\mathbf{q}$ to keep the notation under control.

When (16) is considered over the $m$ pixels from the input image, this results in the system:

$$\mathbf{B}_m\,\dot{\mathbf{q}}_m + \mathbf{H}(\dot{\mathbf{q}}_m)\,\Delta\mathbf{q} + \mathbf{I}_t = \mathbf{0} \qquad (17)$$

where

$$\mathbf{H} = \begin{bmatrix} \nabla I_1\,\dfrac{\partial\mathbf{L}_{mp}(\mathbf{u}_1)}{\partial\mathbf{q}} \\ \vdots \\ \nabla I_m\,\dfrac{\partial\mathbf{L}_{mp}(\mathbf{u}_m)}{\partial\mathbf{q}} \end{bmatrix} \qquad (18)$$

The transpositions performed in the construction of $\mathbf{H}$ allow it now to be curried with $\dot{\mathbf{q}}_m$. This manipulation allows for the solution of $\Delta\mathbf{q}$, which is found using another least-squares process:

$$\mathbf{H}(\dot{\mathbf{q}}_m)\,\Delta\mathbf{q} = -\mathbf{R}, \qquad \Delta\mathbf{q} = -\left(\mathbf{H}(\dot{\mathbf{q}}_m)\right)^{+}\mathbf{R} \qquad (19)$$

This least-squares solution determines the best set of small changes in $\mathbf{q}_b$ and $\mathbf{q}_m$ that minimize the optical flow residual (12), given the linearization of $\mathbf{L}_{mp}$ in (15). Therefore, we update both the shape $\mathbf{q}_b$ and the motion $\mathbf{q}_m$, unlike previous approaches, in order to capture the non-rigid shape and motion using motion analysis.
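The refinement step might be sketched as follows, under the assumption that the model supplies the Jacobian-derivative (Hessian block) tensor $\partial\mathbf{L}_{mp}/\partial\mathbf{q}$ at each selected pixel; the contraction order mirrors the "currying" of $\mathbf{H}$ with $\dot{\mathbf{q}}_m$ in (17)-(19):

```python
import numpy as np

def solve_refinement(grad_I, dL_mp_dq, q_m_dot, R):
    """Shape/motion refinement, equations (17)-(19) (a sketch; the
    dL_mp_dq tensor would come from the deformable model).
    grad_I:    (m, 2) image gradients at the selected pixels.
    dL_mp_dq:  (m, 2, n_m, n) second-derivative blocks dL_mp/dq.
    q_m_dot:   (n_m,) motion estimate from (11).
    R:         (m,) residual from (12).
    Returns delta_q, the (n,) vector of shape and motion corrections."""
    m = grad_I.shape[0]
    rows = []
    for i in range(m):
        # Contract grad_I_i with dL_mp(u_i)/dq and with q_m_dot, leaving
        # one 1 x n row of the "curried" matrix H(q_m_dot).
        rows.append(np.einsum('c,cjk,j->k', grad_I[i], dL_mp_dq[i], q_m_dot))
    H_qdot = np.stack(rows)                               # (m, n)
    # Equation (19): delta_q = -(H(q_m_dot))^+ R, via SVD-based lstsq.
    delta_q, *_ = np.linalg.lstsq(H_qdot, -R, rcond=None)
    return delta_q
```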
2.5 Implementation
Updating $\mathbf{q}$ using the solution for the shape and motion estimation in (19) can be accomplished by simply replacing $\mathbf{q}$ by $\mathbf{q} + \Delta\mathbf{q}$ in the next iteration (this update is in addition to the numerical integration of the dynamic motion equations). To improve robustness, however, such a solution requires further processing.

The processing of $\Delta\mathbf{q}$ verifies that there has been a significant decrease in the residual given the change in $\mathbf{q}$. This is necessary due to the linear approximation in (15), and because the visible portion of the projected model $\mathbf{x}_p$ can change with $\mathbf{q}$. Once $\Delta\mathbf{q}$ has been computed using (19), the model-based motion analysis in (10) is re-solved using $\mathbf{q}_{\text{new}} = \mathbf{q} + \Delta\mathbf{q}$, producing an updated residual $\mathbf{R}_{\text{new}}$. If the addition of $\Delta\mathbf{q}$ causes the residual magnitude of $\mathbf{R}_{\text{new}}$ to be larger than that of $\mathbf{R}$, the results of the shape and motion refinement are discarded. Otherwise, the changes specified by $\Delta\mathbf{q}$ are used directly.
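A sketch of this accept/reject test, where resolve_flow is a hypothetical helper that re-runs the motion solve (10)-(12) at the given parameters and returns the residual vector:

```python
import numpy as np

def checked_update(q, delta_q, resolve_flow):
    """Accept the refinement only if it lowers the residual magnitude."""
    R = resolve_flow(q)
    R_new = resolve_flow(q + delta_q)
    if np.linalg.norm(R_new) < np.linalg.norm(R):
        return q + delta_q    # keep the shape/motion refinement
    return q                  # discard it
```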
Efficiency gains are obtained by omitting parameters from the construction of $\mathbf{H}$ in (18) which cannot be affected given $\dot{\mathbf{q}}_m$. For example, if there is no motion extracted in the eyebrow region of the face, then there is no reason to include eyebrow shape parameters in $\mathbf{H}$. At any point in time, typically about half of the shape parameters of the face model can be omitted from the computations.

The process of determining $\Delta\mathbf{q}$ can also be iterated, solving (10) and (19) repeatedly to obtain a greater improvement. For the applications here, the linear approximation in (15) is relatively accurate for the face model described in Section 2.1, due to the fact that most of the model parameterization is linear scaling. As a result, only a single iteration is performed.

The least-squares solution to (19) is computed using a singular-value decomposition. This avoids problems associated with rank deficiency due to the aperture problem or a lack of motion, as well as problems associated with a non-orthogonal set of parameters.
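One way to realize such an SVD-based solve (again a sketch, not the authors' code) is a truncated pseudo-inverse that discards small singular values, so that rank deficiency from the aperture problem or an absence of motion does not destabilize the solution:

```python
import numpy as np

def svd_solve(A, b, tol=1e-6):
    """Minimum-norm least-squares solution of A x = b, discarding
    singular values below tol * s_max (a truncated pseudo-inverse)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]   # s is sorted in descending order
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])
```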
3 Experiments and discussion
We now present two experiments to demonstrate the im-
proved shape estimation ability of our new model-based
shape and motion estimation technique. The entire process
of shape and motion estimation is automatic, except for the
initialization, which involves the manual specification of a
few landmark features in the first frame of the sequence.
The model is then initially fit using edge information [3].
The problem of automatically locating the face and its var-
ious features has been addressed elsewhere [18, 19]. No
markers or make-up are used on the subject.
Both experiments use the same subject in the image
sequences. The extracted shape results can be compared
against a Cyberware range scan of the subject, shown in
Figure 5 (where the extracted face is manually scaled by a
small amount to eliminate the depth ambiguity). This anal-
ysis is used for the shape parameters in $\mathbf{q}_b$. Unfortunately, a similar analysis for the motion parameters in $\mathbf{q}_m$ cannot be performed, since ground truth is not available. However,
by visually inspecting the alignment of the model with the
image, a rough verification can be performed.
Figure 5: Range scan of subject (a) shaded and (b) textured
For each of the tracking examples, several frames from
the original image sequence are displayed (480×480 IndyCam grayscale). Below each, the same sequence is shown
with the estimated face superimposed. Additional close-
ups are provided to show the difference between using our
new shape and motion estimation technique, and the opti-
cal flow framework from [3]. Finally, a graph is displayed
that indicates the RMS deviation of the model from the
range scan over the course of the image sequence (for both
techniques).
Processing each frame takes approximately 10 seconds (on a 200 MHz SGI Indigo 2), where the least-squares shape and motion estimation takes about 6 seconds, and the least-squares optical flow solution takes about 1 second (but must be performed twice to check $\Delta\mathbf{q}$). The computation used 200 pixels from each image.
Figure 6 shows four frames from the first experiment. In this sequence, the subject makes a series of non-rigid face motions: opening the mouth in (b), smiling in (c), and raising the eyebrows in (d). In each case, the motion parameter values change appropriately, and at the correct times (and compare closely with those extracted without the non-rigid shape and motion computation, as there was very little motion error). A close-up of Figure 6(c) is shown in Figure 7(a), showing a fitted smile expression (including changes to smile expression-shape parameters, causing the smile to turn upward at its corners). Figure 7(b) shows the same frame fitted using the framework in [3], showing a smile expression that does not have well-fitting expression-shape parameters. The RMS error graph in Figure 8 clearly shows the advantages of using our new technique. Besides having a lower RMS error at the final frame, this lower level was reached relatively quickly.
Figure 9 shows five frames from the second experiment. The subject moves his head around in different directions, producing a smile in (b) and (c), before finally turning his head to the side in (e). The face model captures this motion well, even in the presence of the significant self-occlusion in (e). A close-up of Figure 9(b) is shown in Figure 10(a), showing another well-fitted smile expression. Again, the framework without the non-rigid shape and motion computation is not able to correct the error in the smile expression-shape parameters in (b). Figure 10(c) is a close-up of Figure 9(e). The shape of the face is improved over the estimated face shape in (d), which uses only edge information to extract the shape. The difference is most pronounced in the nose profile, the position of the right eyebrow, and the slope of the forehead. The RMS error graph in Figure 11 again shows the beneficial result of our technique. Most of the face shape that is extracted from the sequence is extracted before frame (c). The decrease in RMS error for the estimation process using the framework in [3] between frames (d) and (e) corresponds to when the subject turned his head to the side by a large amount, where the profile view contained good edge information to fit the face shape. The framework using the new technique, however, did not have to "wait" for the subject to turn his head substantially to get a good shape estimate.
Judging by the good performance of our method, it
seems that the model-based least squares non-rigid shape
and motion method is relatively insensitive to optical flow
constraint equation errors (such as violations of the bright-
ness constancy assumption [15], or the truncation of higher
order image-derivative terms [13]). This was also observed
for the model-based least-squares optical flow solution in
[3].
The derivation in Section 2.4 assumes that shape error
is the leading contributor to the residuals from the motion
computation. In order to estimate what portion of the residuals is caused by shape error, both experiments were run again; this time, the initial model shape was taken from the range scan of the subject (so that shape error is eliminated). The residuals that resulted from these experiments had a fairly small magnitude, which averaged around 0.035 (pixel intensity units, for pixels in the range [0, 1]). This value stayed fairly constant throughout both experiments.

In the actual experiments, the residual magnitudes started fairly high (around 0.18 for the first experiment, and 0.24 for the second), and ended up around 0.050 (for both experiments) by the end of the motion sequence. This supports the validity of our assumption that shape error is responsible for the bulk of the residual.
4 Conclusions
In this paper, we presented a novel deformable model
technique which uses residuals from a model-based optical
flow solution to refine the shape and motion of the model.
The experiments show how this technique is an improve-
ment over edge-based techniques for shape and motion es-
timation.
Besides having greater accuracy than a framework us-
ing only optical flow and edges, our framework extracts
the shape of the face without needing data from extreme
head poses (such as a profile view). Instead, much smaller
motions are needed to extract much of the shape information.
Figure 6: Tracking and shape estimation experiment 1
Figure 7: Experiment 1 results (a) with and (b) without
shape from motion (close-ups)
Figure 8: Experiment 1 shape estimation results (solid line: with new technique; dotted line: without)
The least squares non-rigid shape and motion computation seems to be robust to optical flow constraint equation violations or approximations (such as small lighting changes or higher order image derivative terms).
This method was presented in the context of face shape
and motion estimation, although it could be applied to
other model-based domains. This work should provide
some encouragement to researchers working on automatic
motion-based model construction, since the benefits of this
method are only possible within a model-based framework.
Figure 9: Tracking and shape estimation experiment 2
Figure 10: Experiment 2 results (a),(c) with and (b),(d)
without new technique (close-ups)
Figure 11: Experiment 2 shape estimation results (solid line: with new technique; dotted line: without)
Acknowledgments
This research is partially supported by NSF Career
Award grant 9624604; ARO grant DAAH-04-96-1-007;
and ONR-YIP grant K-5-55043/3916-1552793.
References
[1] A. Amini and J. Duncan. Pointwise tracking of left-
ventricular motion in 3D. In Proc. IEEE Workshop on Visual
Motion, pages 294–299, 1991.
[2] M. Black and Y. Yacoob. Tracking and recognizing rigid
and non-rigid facial motions using local parametric models
of image motion. In Proceedings ICCV ’95, pages 374–381,
1995.
[3] D. DeCarlo and D. Metaxas. The integration of optical flow
and deformable models with applications to human face
shape and motion estimation. In Proceedings CVPR ’96,
pages 231–238, 1996.
[4] L. Farkas. Anthropometry of the Head and Face. Raven
Press, 1994.
[5] B.K.P. Horn. Robot Vision. McGraw-Hill, 1986.
[6] W.C. Huang and D. B. Goldgof. Adaptive-size meshes
for rigid and nonrigid shape analysis and synthesis. IEEE
Pattern Analysis and Machine Intelligence, 15(6):611–616,
June 1993.
[7] I. Kakadiaris and D. Metaxas. Model-based estimation of
3d human motion with occlusion based on active multi-
viewpoint selection. In Proceedings CVPR ’96, pages 81–
87, 1996.
[8] R. Koch. Dynamic 3-D scene analysis through synthesis
feedback control. IEEE Pattern Analysis and Machine In-
telligence, 15(6):556–568, June 1993.
[9] H. Li, P. Roivainen, and R. Forchheimer. 3-D motion esti-
mation in model-based facial image coding. IEEE Pattern
Analysis and Machine Intelligence, 15(6):545–555, June
1993.
[10] D. Metaxas. Physics-Based Deformable Models : Applica-
tions to Computer Vision, Graphics, and Medical Imaging.
Kluwer Academic Publishers, 1996.
[11] S.K. Mishra, D. Goldgof, and T.S. Huang. Motion analy-
sis and epicardial deformation estimation from angiography
data. In Proceedings CVPR ’91, pages 331–336, 1991.
[12] Y. Moses, D. Reynard, and A. Blake. Robust real time track-
ing and classification of facial expressions. In Proceedings
ICCV ’95, pages 296–301, 1995.
[13] H.H. Nagel. Displacement vectors derived from second-
order intensity variations in image sequences. CVGIP,
21(1):85–117, January 1983.
[14] C. Nastar and N. Ayache. Spatio-temporal analysis of non-
rigid motion from 4D data. In Proc. IEEE Workshop on
Motion of Non-Rigid and Articulated Objects, pages 146–
151, 1994.
[15] S. Negahdaripour and C.H. Yu. A generalized brightness
change model for computing optical flow. In ICCV93, pages
2–11, 1993.
[16] D. Reynard, A. Wildenberg, A. Blake, and J. Marchant.
Learning dynamics of complex motions from image se-
quences. In Proceedings ECCV ’96, pages I:357–368, 1996.
[17] D. Terzopoulos and K. Waters. Analysis and synthesis
of facial image sequences using physical and anatomical
models. IEEE Pattern Analysis and Machine Intelligence,
15(6):569–579, 1993.
[18] Y. Yacoob and L.S. Davis. Computing spatio-temporal rep-
resentations of human faces. In Proceedings CVPR ’94,
pages 70–75, 1994.
[19] A.L. Yuille, D.S. Cohen, and P. Halliman. Feature extrac-
tion from faces using deformable templates. International
Journal of Computer Vision, 8:104–109, 1992.
A method for detecting and describing the features of faces using deformable templates is described. The feature of interest, an eye for example, is described by a parameterized template. An energy function is defined which links edges, peaks, and valleys in the image intensity to corresponding properties of the template. The template then interacts dynamically with the image, by altering its parameter values to minimize the energy function, thereby deforming itself to find the best fit. The final parameter values can be used as descriptors for the features. This method is demonstrated by showing deformable templates detecting eyes and mouths in real images