Appearance-Based Face Recognition and Light-Fields
Ralph Gross, Iain Matthews, and Simon Baker
CMU-RI-TR-02-20
Abstract
Arguably the most important decision to be made when developing an object recognition
algorithm is selecting the scene measurements or features on which to base the algorithm. In
appearance-based object recognition the features are chosen to be the pixel intensity values in an
image of the object. These pixel intensities correspond directly to the radiance of light emitted
from the object along certain rays in space. The set of all such radiance values over all possible
rays is known as the plenoptic function or light-field. In this paper we develop the theory of
appearance-based object recognition from light-fields. This theory leads directly to a pose-invariant
face recognition algorithm that uses as many images of the face as are available, from one upwards.
All of the pixels, whichever image they come from, are treated equally and used to estimate the
(eigen) light-field of the object. The eigen light-field is then used as the set of features on which
to base recognition, analogously to how the pixel intensities are used in appearance-based object
recognition. We also show how our algorithm can be extended to recognize faces across pose and
illumination by using Fisher light-fields.
1 Introduction
Arguably the most important decision to be made when developing an object recognition algo-
rithm is selecting the scene measurements or features on which to base the algorithm. One of the
most successful and well-studied approaches to object recognition is the appearance-based ap-
proach. Although the expression “appearance-based” was introduced by Murase and Nayar [17],
the approach itself dates back to Turk and Pentland’s Eigenfaces [25] and perhaps before [24]. The
defining characteristic of appearance-based algorithms is that they directly use the pixel intensity
values in an image of the object as the features on which to base the recognition decision.
The pixel intensities that are used as features in appearance-based algorithms correspond
directly to the radiance of light emitted from the object along certain rays in space. Although
there may be various non-linearities caused by the optics (e.g. vignetting), the CCD sensor itself,
or by gamma correction in the camera, the pixel intensities can be thought of as approximately
equivalent to the radiance of light emitted from the object in the direction of the pixel.
The plenoptic function [1] or light-field [12, 16] specifies the radiance of light along all
rays in the scene. Hence, the light-field of an object is the set of all possible features that could be
used by an appearance-based object recognition algorithm. It is natural, therefore, to investigate
using light-fields (as an intermediate representation) for appearance-based object recognition. In
the first part of this paper we develop the theory of appearance-based object recognition from light-
fields. In the second part we propose an algorithm for pose-invariant face recognition based on an
algorithm to estimate the (eigen) light-field of a face from a set of images. Finally, we extend our
algorithm to perform face recognition across both pose and illumination using Fisher light-fields.
1.1 Theoretical Properties of Light-Fields for Recognition
There are a number of important theoretical questions pertaining to object recognition from light-
fields. Some examples are:
1. The fundamental question “what is the set of images of an object under all possible illu-
mination conditions?” was recently posed and answered in [5]. Because an image simply
consists of a subset of measurements from the light-field, it is natural to ask the same ques-
tion about the set of all light-fields of an object. Answering this second question may also
help understand the variation in appearance of objects across both pose and illumination.
2. “When can two objects be distinguished from their images?” is perhaps the most important
theoretical question in object recognition. Various attempts have been made to answer it in
one form or another. For example, it was shown in [4] that, given a pair of images, there is
always an object that could have generated those two images (under different illuminations.)
Similarly one might ask “when can two objects be distinguished from their light-fields?”
In the first part of this paper we derive a number of fundamental properties of object light-fields. In
particular, we first investigate the set of all possible light-fields of an object under varying illumi-
nation. Amongst other things we show that the set of all light-fields is a convex cone, analogously
to the results in [5] for single images. Afterwards we investigate the degree to which objects are
distinguishable from their light-fields. We show that, under arbitrary illumination conditions, if
two objects have the same shape they cannot be distinguished, even given their light-field. The sit-
uation for objects with different shapes is different however. We show that two objects can almost
always be distinguished from their light-fields if they have different shapes.
1.2 Face Recognition Using Light-Fields
One implication of this theory is that “appearance-based” object recognition from light-fields is
theoretically more powerful than object recognition from single images. Capturing an entire light-
field is normally not appropriate for object recognition however; it requires either a large number
of cameras, a great deal of time, or both. This does not mean that it is impossible to use light-fields
in practical object recognition algorithms. In the second part of this paper we develop a pose-
invariant face recognition algorithm that is based on an algorithm to estimate the (eigen) light-field
of an object from an arbitrary collection of images [13]. This algorithm is based on an algorithm
for dealing with occlusions in the eigen-space approach [6, 15]. The eigen light-field, once it has
been estimated, is then used as an enlarged set of features on which to base the face recognition
decision. Some of the advantageous properties of this algorithm are as follows:
1. Any number of images can be used, from one upwards, in both the training (gallery) and the
test (probe) sets. Moreover, none of the training images need to have been captured from the
same pose as any of the test images. For example, there might be two test images for each
person, a full frontal view and a full profile, and only one training image, a half profile. In
this way, our algorithm can perform “face recognition across pose.”
2. If only one test or training image is available, our algorithm behaves “reasonably” when es-
timating the light-field. In particular, we prove that the light-field estimated by our algorithm
correctly re-renders images across pose (under suitable assumptions about the objects.)
3. If more than one test or training image is available, the extra information (including the
implicit shape information) is incorporated into a better estimate of the light-field. The final
face recognition algorithm therefore performs demonstrably better with more input images.
4. It is straightforward to extend our algorithm to perform “face recognition across both pose
and illumination” [14]. We generalize eigen light-fields to Fisher Light-fields analogously to
how eigenfaces were generalized to Fisherfaces in [3].
1.3 Overview
The remainder of this paper is organized as follows. We begin in Section 2 by introducing object
light-fields and deriving some of their key properties. We continue in Section 3 by describing eigen
light-fields and their use in our algorithm for face recognition across pose. In Section 4 we extend
our algorithm to use Fisher light-fields and to recognize faces across both pose and illumination.
We conclude in Section 5 with a summary and suggestions for future work.
Figure 1: An illustration of the 2D light-field [16] of a 2D object. The object is conceptually placed within
a circle. The angle to the viewpoint around the circle is measured by the angle $\theta$ and the direction that
the viewing ray makes with the radius of the circle by $\phi$. For each pair of angles $\theta$ and $\phi$, the radiance of
light reaching the viewpoint is denoted $L(\theta, \phi)$, the light-field [16]. Although the light-field of a 3D object
is actually 4D, we will continue to use the 2D notation of this figure for ease of explanation.
2 Object Light-Fields and Their Properties for Recognition
2.1 Object Light-Fields
The plenoptic function [1] or light-field [16] is a function which specifies the radiance of light
in free space. It is usually assumed to be a 5D function of position (3D) and orientation (2D).
In addition, it is also sometimes modeled as a function of time, wavelength, and polarization,
depending on the application in mind. Assuming that there is no absorption or scattering of light
through the air [18], the light-field is actually only a 4D function, a 2D function of position defined
over a 2D surface, and a 2D function of direction [12, 16]. In 2D, the light-field of a 2D object is
only 2D. See Figure 1 for an illustration of the 2D light-field of a 2D object.
2.2 The Set of All Light-Fields of an Object Under Varying Illumination
The fundamental question “what is the set of images of an object under all possible illumination
conditions?” was recently posed and answered by Belhumeur and Kriegman [5]. We begin our
analysis by asking the analogous question for light-fields. Since an image just consists of a subset
of the rays in the light-field, it is not surprising that the same result also holds for light-fields:
Theorem 1 The set of $n$-pixel light-fields of any object, seen under all possible lighting conditions,
is a convex cone in $\mathbb{R}^n$.
This result holds for any object, even if the object is non-convex and non-Lambertian. As pointed
out in [5], the proof is essentially a trivial combination of the additive property of light and the
fact that the set of all illumination conditions is itself a convex cone. For this reason, the same
result holds for any subset of illumination conditions that is a convex cone. One example is an
arbitrary number of point light sources at infinity. It is straightforward to show that this subset of
illumination conditions is a convex cone and therefore that the following theorem also holds:
Theorem 2 The set of $n$-pixel light-fields of any object, illuminated by an arbitrary number of
point light sources at infinity, is a convex cone in $\mathbb{R}^n$.
These results are analogous to those in [5]. Moreover, since Theorems 1 and 2 clearly also hold for
any subset of rays in the light-field, the analogous results in [5] are special cases of these theorems.
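To spell the argument out (this is the additivity argument of [5] restated for light-fields, sketched here under the assumption that light is additive and that illuminations can be non-negatively scaled and superimposed): if $L_1(\theta, \phi)$ and $L_2(\theta, \phi)$ are light-fields of the object under illuminations $I_1$ and $I_2$, and $\alpha, \beta \geq 0$, then

    \alpha L_1(\theta, \phi) + \beta L_2(\theta, \phi)

is the light-field of the same object under the illumination $\alpha I_1 + \beta I_2$, which is again a valid illumination (and, in the setting of Theorem 2, again a collection of point light sources at infinity). The set of light-fields is therefore closed under non-negative linear combinations; i.e. it is a convex cone.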
When we investigate the nature of the illumination cones in more detail, however, we find
several differences between images and light-fields. Some of the differences are summarized in
Table 1. If we consider arbitrary illumination conditions and any convex object, the image illu-
mination cone always exactly equals the set of all images because every point on the object can
be illuminated independently and set to radiate any intensity. This result holds for any reflectance
function. The only minor requirement is that no point on the object has zero reflectance.
The situation is different for light-fields. It is possible to choose reflectance functions for
which the light-field illumination cone is equal to the set of all light-fields. One simple example
is to use a “mirrored” object.
Table 1: A comparison of image illumination cones and light-field illumination cones. The main point to
note is that in three of the four cases, the light-field illumination cone is a “smaller” subset of the set of all
light-fields than the corresponding image illumination cone is a subset of the set of all images.
- Arbitrary illumination conditions, any convex object: the image illumination cone always exactly equals the set of all images; the light-field illumination cone can sometimes be the set of all light-fields.
- Arbitrary illumination conditions, convex Lambertian object: the image illumination cone always exactly equals the set of all images; the light-field illumination cone never is the set of all light-fields.
- Point light sources at infinity, any convex object: the image illumination cone can sometimes be full-dimensional; the light-field illumination cone can sometimes be full-dimensional.
- Point light sources at infinity, convex Lambertian object: the image illumination cone can sometimes be full-dimensional; the light-field illumination cone never is full-dimensional.
However, for most reflectance functions the light-field illumination
cone is not equal to the set of all light-fields. One example is Lambertian reflectance. In this
case, the light-field cone never equals the set of all light-fields because any two pixels in the light-
field that image the same point on the object will always have the same intensity. For Lambertian
objects the image illumination cone across arbitrary illumination conditions still exactly equals the
set of all images because the pixels can still all be set independently by choosing the illumination
appropriately.
For point light sources at infinity (rather than for arbitrary illumination conditions), the
results are similar. The image illumination cone can sometimes be full-dimensional. For convex
Lambertian objects the dimensionality equals the number of distinct surface normals. (See [5]
Proposition 5.) If each surface normal is different, the image illumination cone is full-dimensional.
For light-fields, however, the light-field illumination cone of a convex Lambertian object with point
light sources at infinity is never full dimensional because any two pixels in the light-field that image
the same point on the surface will always have the same intensity.
The trend in Table 1 is clear. Object recognition in the presence of illumination changes is
“theoretically” easier using light-fields than with images. Using either model (arbitrary illumina-
tion or point light sources at infinity) the light-field illumination cone is a “smaller” subset of the
set of all light-fields than the image illumination cone is a subset of the set of all images.
2.3 Distinguishability of Objects from Their Images and Light-Fields
As mentioned in [5] the convex cone property is potentially very important for object recognition
because it implies that if the illumination cones of two objects are disjoint, they can be separated
by a linear discriminant function. This property makes classification much easier because apply-
ing a linear classifier is in general far easier than determining which illumination cone an image or
light-field lies closest to. However, to take advantage of this property, the two illumination cones
must be disjoint. If they are not, the two objects will not always be distinguishable anyway. These
arguments, of course, apply equally to both image and light-field illumination cones. In this sec-
tion we study the distinguishability (intersection) of illumination cones and show that the task is
theoretically easier for light-fields than for images. We begin with image illumination cones.
2.3.1 Distinguishability of Objects from Their Images
An immediate corollary of the fact that the image illumination cones of convex objects under arbi-
trary lighting are exactly equal to the set of all images (see Table 1) is that no two convex objects
(Lambertian or not) can ever be distinguished without some assumptions about the illumination:
Corollary 1 The image illumination cones of any two convex objects seen under all possible light-
ing conditions are exactly equal. It is therefore never possible to say which convex object an image
came from. It is not even possible to eliminate any convex objects as possibilities.
Perhaps one of the most important results of [5] is to show that, if the illumination consists of point
sources at infinity, the situation is more favorable; empirically the volume of the image illumination
cone is much less than the space of all images. It is also straight-forward to show that there are
pairs of objects that are distinguishable under this smaller set of lighting conditions:
Theorem 3 There exist pairs of objects for which the intersection of their illumination cones (over
the set of illumination conditions consisting of arbitrary numbers of point light sources at infinity)
only consists of the black (all zero) image; i.e. there are pairs of objects that are always distin-
guishable (over the set of illumination conditions which consist of point light sources at infinity.)
Proof: (Sketch) One example is to consider two Lambertian spheres, one with an albedo function
that has multiple step discontinuities (which appear in every image), one that varies smoothly
everywhere. All of the images of the object with the step discontinuity in the albedo map will also
have a step discontinuity in the image, whereas none of the images of the other object will.
Although we have shown that there are pairs of objects for which the image illumination
cones (for point light sources at infinity) only intersect at the all-black image, there are also pairs
of objects whose image illumination cones do intersect non-trivially.
Theorem 4 There exist pairs of objects for which the intersection of their illumination cones (over
the set of point light sources at infinity) consists of more than just the black (all zero) image; i.e.
there are pairs of objects that are sometimes indistinguishable (over point light sources at infinity.)
Proof: Consider two convex Lambertian objects in different illuminations. If each object has
albedo variation proportional to the foreshortened incoming illumination of the other object, the
two objects will generate the same image. (The constants of proportionality must be the same.)
2.3.2 Distinguishability of Same-Shape Objects from Their Light-Fields
In the previous section we showed that distinguishing objects from their images under varying
illumination is often very difficult, and in many cases “theoretically” impossible. If the objects
are the same shape, convex, and Lambertian, intuitively the light-field should not contain any
additional information. It is no surprise, then, that it is fairly straightforward to prove an analogue
of Corollary 1 for (convex Lambertian) objects of the same shape:
Theorem 5 The light-field illumination cones over all possible lighting conditions of any two con-
vex, Lambertian objects of the same shape are exactly equal.
Proof: Given arbitrary lighting, it is possible to generate any incoming radiance distribution over
the surface of the (convex) object using lasers. It is therefore possible to generate any light-field
for any convex object (subject to the necessary and sufficient constraint that rays imaging the same
point on the surface of the object have the same intensity.)
Distinguishing (convex Lambertian) objects of the same shape from their light-fields is
therefore impossible without any assumptions on the illumination. If assumptions are made about
the illumination, the situation is different. As in Theorems 3 and 4 above, if the illumination
consists of point light sources at infinity two objects of the same shape may or may not be distin-
guishable.
Theorem 6 There exist pairs of same-shape convex, Lambertian objects for which the intersection
of their light-field illumination cones (over the set of point light sources at infinity) only consists
of the black (all zero) light-field; i.e. there are pairs of same-shape objects that are always distin-
guishable (over the set of point light sources at infinity.)
Proof: Essentially the same as the proof of Theorem 3.
Theorem 7 There exist pairs of convex, Lambertian objects with the same shape for which the
intersection of their light-field illumination cones (over the set of point light sources at infinity)
consists of more than just the black (all zero) light-field; i.e. there are pairs of same-shape objects
that are sometimes indistinguishable even given their light-fields.
Proof: Essentially the same as the proof of Theorem 4.
2.3.3 Distinguishability of Differently-Shaped Objects from Their Light-Fields
Intuitively the situation for differently shaped objects is different. The light-field contains con-
siderable information about the shape of the objects. In fact, we recently showed in [2] that, so
long as the light-field does not contain any extended constant intensity regions, it uniquely defines
the shape of a Lambertian object. This means that the intersection of the light-field cones of two
differently shaped objects must only contain light-fields that have constant intensity regions.
Theorem 8 The intersection of the light-field illumination cones over all possible lighting con-
ditions of any two Lambertian objects that have different shapes only consists of light-fields that
have constant intensity regions.
This theorem implies that two differently shaped Lambertian objects can always be distinguished
from any light-field that does not contain constant intensity regions.
2.3.4 Summary
We have described various conditions under which pairs of objects are distinguishable from their
images or light-fields. See Table 2 for a summary. When nothing is assumed about the incoming
illumination, it is impossible to distinguish between any pair of objects from their images. If the
illumination consists of a collection of point light sources at infinity, the situation is a little better.
Some pairs of objects can always be distinguished, but other pairs are sometimes indistinguishable.
If the objects have the same shape, the situation is the same with light-fields: light-fields
do not add to the discriminatory power of a single image. If the objects have different shapes, the
light-field adds a lot of discriminatory power. So long as the light-field has no constant inten-
sity regions, any pair of differently shaped objects can be distinguished under any illumination
conditions.
Table 2: The distinguishability of objects from their images and light-fields. The main point to note is
that if two objects have the same shape, the light-field adds nothing to the ease with which they can be
distinguished, compared to just a single image. On the other hand, if the two objects have different shapes,
it is theoretically far easier to distinguish them from their light-fields than it is from single images.
- Images of two convex Lambertian objects: under arbitrary illumination conditions, never distinguishable (Corollary 1); under point light sources at infinity, sometimes distinguishable (Thm. 3) and sometimes indistinguishable (Thm. 4).
- Light-fields of two same-shape convex Lambertian objects: under arbitrary illumination conditions, never distinguishable (Theorem 5); under point light sources at infinity, sometimes distinguishable (Thm. 6) and sometimes indistinguishable (Thm. 7).
- Light-fields of two differently shaped Lambertian objects: under arbitrary illumination conditions, distinguishable if there are no constant intensity regions (Thm. 8); under point light sources at infinity, always distinguishable if there are no constant intensity regions (Thm. 8).
2.4 Implications
The implication of these theoretical results is as follows. The light-field provides considerable
information about the shape of objects that can help distinguish between them in unknown, ar-
bitrary illumination conditions under which they would be indistinguishable from single images.
Although it is practically impossible to capture the entire light-field for most object recognition
tasks, sometimes it may be possible to capture 2-3 images. Ideally we would like an object recog-
nition algorithm that can use any subset of the light-field; a single image, a pair of images, multiple
images, or even the entire light-field. Such an algorithm should be able to take advantage of the
implicit shape information in the light-field. In the remainder of this paper we describe exactly
such an algorithm, the first step of which is to estimate the light-field from the input image(s).
3 Eigen Light-Fields for Face Recognition Across Pose
In many face recognition application scenarios the pose of the probe and gallery images are differ-
ent. The gallery image might be a frontal “mug-shot” and the probe might be a 3/4 view captured
from a surveillance camera in the corner of the room. The number of gallery and probe images may
also vary. The gallery may consist of a pair of images of each subject, perhaps a frontal mug-shot
and full profile view, like the images typically captured by police departments. The probe may be
a similar pair of images, a single 3/4 view, or even a collection of views from random poses.
Until recently face recognition across pose (i.e. when the gallery and probe have different
poses) has received very little attention in the literature. Algorithms have been proposed which can
recognize faces [19] or more general objects [17] at a variety of poses. Most of these algorithms
require gallery images at every pose, however. Algorithms have been proposed which do gener-
alize across pose, for example [11], but this algorithm computes 3D head models using a gallery
containing a large number of images per subject captured with controlled illumination variation. It
cannot be used with arbitrary galleries and probes. Note, however, that concurrent with this work
there has been a growing interest in face recognition across pose. For example, Vetter et al have
developed an algorithm based on fitting a 3D morphable model [8, 22].
In this section we propose an algorithm for face recognition across pose using light-fields.
Our algorithm can use any number of gallery images captured at arbitrary poses, and any number
of probe images also captured with arbitrary poses. A minimum of 1 gallery and 1 probe image are
needed, but if more images are available the performance of our algorithm generally gets better.
Our algorithm operates by estimating (a representation of) the light-field of the subject’s
head. First, generic training data is used to compute an eigen-space of head light-fields, similar
to the construction of eigen-faces [25]. Light-fields are simply used rather than images. Given a
collection of gallery or probe images, the projection into the eigen-space is performed by setting
up a least-squares problem and solving for the projection coefficients similarly to approaches used
to deal with occlusions in the eigenspace approach [15, 6]. This simple linear algorithm can be
applied to any number of images, captured from any poses. Finally, matching is performed by
comparing the probe and gallery light-fields using a nearest neighbor algorithm.
The remainder of this section is organized as follows. We begin in Section 3.1 by intro-
ducing the concept of eigen light-fields before presenting the algorithm to estimate them from a
collection of images in Section 3.2. After describing some of the properties of this algorithm in
Section 3.3, we then describe how the algorithm can be used to perform face recognition across pose
in Section 3.4. Finally, we present experimental face recognition across pose results in Section 3.5.
3.1 Eigen Light-Fields
Suppose we are given a collection of light-fields $L_i(\theta, \phi)$, where $i = 1, \dots, N$. See Figure 1 for the
definition of this notation. If we perform an eigen-decomposition of these vectors using Principal
Components Analysis (PCA), we obtain $d \le N$ eigen light-fields $E_i(\theta, \phi)$, where $i = 1, \dots, d$.
Then, assuming that the eigen-space of light-fields is a good representation of the set of light-fields
under consideration, we can approximate any light-field $L(\theta, \phi)$ as:

    L(\theta, \phi) \approx \sum_{i=1}^{d} \lambda_i E_i(\theta, \phi)    (1)

where $\lambda_i = \langle L(\theta, \phi), E_i(\theta, \phi) \rangle$ is the inner (or dot) product between $L(\theta, \phi)$ and $E_i(\theta, \phi)$. This
decomposition is analogous to that used in face and object recognition [25, 17]; it is just performed
on the entire light-field rather than on single images. (The mean light-field can be included as a
constant additive term in Equation (1) and subtracted from the light-field in the definition of $\lambda_i$ if
so preferred. There is very little difference in doing this however.)
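For illustration only, the following Python/NumPy sketch (not the implementation used for the experiments in this paper; the function and variable names are placeholders, and the SVD-based PCA and explicit mean subtraction are implementation choices) computes eigen light-fields from a set of vectorized training light-fields and projects a new light-field onto them as in Equation (1):

import numpy as np

def eigen_light_fields(L, d):
    """Compute d eigen light-fields from an (N, P) array of N vectorized
    light-fields, each with P pixels (samples of L(theta, phi))."""
    mean = L.mean(axis=0)                     # mean light-field
    U, S, Vt = np.linalg.svd(L - mean, full_matrices=False)
    E = Vt[:d]                                # rows are the eigen light-fields E_i
    return E, mean

def project(L_new, E, mean):
    """Coefficients lambda_i = <L_new - mean, E_i>, as in Equation (1)."""
    return E @ (L_new - mean)

def reconstruct(lam, E, mean):
    """Approximate light-field: mean + sum_i lambda_i E_i."""
    return mean + lam @ E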
3.2 Estimating Light-Fields from Images
Capturing the complete light-field of an object is a difficult task, primarily because it requires a
huge number of images [12, 16]. In most object recognition scenarios it is unreasonable to expect
more than a few images of the object; often just one. As shown in Figure 2, however, any image of
the object corresponds to a curve (for 3D objects, a surface) in the light-field.
Figure 2: The 1D image of a 2D object corresponds to a curve (surface for a 2D image of a 3D object) in
the light-field. Each pixel corresponds to a ray in space through the camera pinhole and the location of the
pixel in the image. In general this ray intersects the light-field circle at a different point for each pixel. As
the pixel considered “moves” in the image, the point on the light-field circle traces out a curve in $\theta$–$\phi$ space.
This curve is a straight vertical line iff the “effective pinhole” lies on the circle used to define the light-field.
One way to look at
this curve is as a highly occluded light-field; only a very small part of the light-field is visible.
Can the coefficients $\lambda_i$ be estimated from this highly occluded view? Although this may
seem hopeless, note that light-fields are highly redundant, especially for objects with simple re-
flectance properties such as Lambertian. An algorithm is presented in [15] to solve for the unknown
$\lambda_i$ for eigen-images. A similar algorithm was used in [6]. Rather than using the inner product
$\lambda_i = \langle L(\theta, \phi), E_i(\theta, \phi) \rangle$, Leonardis and Bischof [15] solve for $\lambda_i$ as the least squares solution of:

    L(\theta, \phi) - \sum_{i=1}^{d} \lambda_i E_i(\theta, \phi) = 0    (2)

where there is one such equation for each pair of $\theta$ and $\phi$ that are un-occluded in $L(\theta, \phi)$. Assuming
that $L(\theta, \phi)$ lies completely within the eigen-space and that enough pixels are un-occluded, then
the solution of Equation (2) will be exactly the same as that obtained using the inner product:
Theorem 9 Assuming that $L(\theta, \phi)$ is in the linear span of $\{ E_i(\theta, \phi) : i = 1, \dots, d \}$, then
$\lambda_i = \langle L(\theta, \phi), E_i(\theta, \phi) \rangle$ is always an exact minimum solution of Equation (2).

Since there are $d$ unknowns ($\lambda_1, \dots, \lambda_d$) in Equation (2), at least $d$ un-occluded light-field pixels
are needed to over-constrain the problem, but more may be required due to linear dependencies
between the equations. In practice, 2–3 times as many equations as unknowns are typically
required to get a reasonable solution [15]. Given an image $I(m, n)$, the following is then an
algorithm for estimating the eigen light-field coefficients $\lambda_i$:
Eigen Light-Field Estimation Algorithm
1. For each pixel $(m, n)$ in $I(m, n)$ compute the corresponding light-field angles $\theta_{m,n}$ and $\phi_{m,n}$.
(This step assumes that the camera intrinsics are known, as well as the relative orientation
between the camera and object. In Section 3.4.1 we will describe how to avoid this step and
instead use a simple “normalization” to convert the input images into light-field vectors.)

2. Find the least-squares solution (for $\lambda_1, \dots, \lambda_d$) to the set of equations:

    I(m, n) = \sum_{i=1}^{d} \lambda_i E_i(\theta_{m,n}, \phi_{m,n})    (3)

where $m$ and $n$ range over their allowed values. (In general, the eigen light-fields need
to be interpolated to estimate $E_i(\theta_{m,n}, \phi_{m,n})$. Also, all of the equations for which the pixel $(m, n)$
does not image the object should be excluded from the computation.)
Although we have described this algorithm for a single image $I(m, n)$, any number of images
can obviously be used. The extra pixels from the other images are simply added in as additional
constraints on the unknown coefficients $\lambda_i$ in Equation (3).
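A minimal Python/NumPy sketch of this estimation step is given below. It is illustrative rather than the implementation used in this paper, and it assumes the input image(s) have already been converted into a single light-field vector with NaN marking the rays that no input image observes (the vectorization of Section 3.4.1):

import numpy as np

def estimate_coefficients(light_field, E, mean=None):
    """Least-squares solution of Equation (3).

    light_field: (P,) vectorized light-field with np.nan for pixels that are
                 not imaged by any of the input images.
    E:           (d, P) eigen light-fields (rows).
    mean:        optional (P,) mean light-field to subtract first.
    Returns the coefficients lambda_1, ..., lambda_d.
    """
    observed = ~np.isnan(light_field)        # un-occluded light-field pixels
    b = light_field[observed]
    if mean is not None:
        b = b - mean[observed]
    A = E[:, observed].T                      # one equation per observed pixel
    lam, *_ = np.linalg.lstsq(A, b, rcond=None)
    return lam

Additional input images simply fill in more entries of light_field, and therefore contribute additional rows (equations) to the least-squares system.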
3.3 Properties of the Eigen Light-Field Estimation Algorithm
The Eigen Light-Field Estimation Algorithm can be used to estimate a light-field from a collection
of images. Once the light-field has been estimated, it can then, theoretically at least, be used to
render new images of the same object under different poses. See [26] for a related algorithm. In
this section we show that if the objects used to create the eigen-space of light-fields all have the
same shape as the object imaged to create the input to the algorithm, then this re-rendering process
is in some sense “correct,” assuming that all the objects are Lambertian. As a first step, we show
that the eigen light-fields capture the shape of the objects in the following sense:
Lemma 1 If $L_i(\theta, \phi)$, $i = 1, \dots, N$, is a collection of light-fields of Lambertian objects with the
same shape, then all of the eigen light-fields $E_i(\theta, \phi)$ have the property that if $(\theta_1, \phi_1)$ and $(\theta_2, \phi_2)$
define two rays which image the same point on the surface of any of the objects then:

    E_i(\theta_1, \phi_1) = E_i(\theta_2, \phi_2)    (4)

Proof: The property in Equation (4) holds for all of the light-fields $L_i(\theta, \phi)$ used
in the PCA because they are Lambertian. Hence, it also holds for any linear combination of the $L_i$.
Therefore it holds for the eigen-vectors $E_i$ because they are linear combinations of the $L_i$.
The property in Equation (4) also holds for all linear combinations of the eigen light-fields.
It therefore holds for the light-field recovered in Equation (3) in the Light-Field Estimation Algo-
rithm, assuming that the light-field from which the input image is derived lies in the eigen-space so
that Theorem 9 applies. This means that the Light-Field Estimation Algorithm estimates the light-
field in a way that is consistent with the object being Lambertian and of the appropriate shape:
Theorem 10 Suppose $E_i(\theta, \phi)$, $i = 1, \dots, d$, are the eigen light-fields of a set of Lambertian
objects with the same shape and $I(m, n)$ is an image of another Lambertian object with the same
shape. If the light-field from which $I(m, n)$ is derived lies in the light-field eigen-space, then the
light-field recovered by the Light-Field Estimation Algorithm has the property that if $(\theta, \phi)$ is
any pair of angles which image the same point in the scene as the pixel $(m, n)$ then:

    L(\theta, \phi) = I(m, n)    (5)

where $L(\theta, \phi) = \sum_{i=1}^{d} \lambda_i E_i(\theta, \phi)$ is the light-field estimated by the Light-Field Estimation
Algorithm; i.e. the algorithm correctly re-renders the object under the Lambertian reflectance model.
Theorem 10 implies that the algorithm is acting reasonably in estimating the light-field, a task
which is impossible from a single image without a prior model on the shape of the object. Here,
the shape model is implicitly contained in the eigen light-fields. Theorem 10 assumes that all of the
objects are approximately the same shape, but that is a common assumption for faces [21]. Even if
there is some shape variation in faces, it is reasonable to assume that the eigen light-fields will cap-
ture this information. Theorem 10 also assumes that faces are Lambertian and that the light-field
eigenspace accurately approximates any face light-field. The extent to which these assumptions
are valid will be demonstrated by the empirical results obtained by our face recognition algorithm.
(Note: We are not proposing the Eigen Light-Field Estimation Algorithm as an algorithm
for rendering across pose. It is only correct in a very idealized scenario. However, the fact that it is
correct in this idealized scenario gives us confidence in its use for face recognition across pose.)
3.4 Application to Face Recognition Across Pose
The Eigen Light-Field Estimation Algorithm described above is somewhat abstract. In order to be
able to use it for face recognition across pose we need to be able to do two things:
Vectorization: The input to a face recognition algorithm consists of a collection of images (possi-
bly just one) captured from a variety of poses. The Eigen Light-Field Estimation Algorithm
operates on light-field vectors (light-fields represented as vectors). Vectorization consists of
converting the input images into a light-field vector (with missing elements, as appropriate.)
Classification: Given the eigen coefficients $(\lambda_1, \dots, \lambda_d)$ for a collection of gallery (training) faces
and for a probe (test) face, we need to classify which gallery face is the most likely match.
We now describe each of these tasks in turn.
3.4.1 Vectorization by Normalization
Vectorization is the process of converting a collection of images of a face into a light-field vector.
Before we can do this, we first have to decide how to discretize the light-field into pixels. Perhaps
the most natural way to do this is to uniformly sample the light-field angles $\theta$ and $\phi$ in the 2D
case of Figure 2. This is not the only way to discretize the light-field. Any sampling, uniform or
non-uniform, could be used. All that is needed is a way of specifying what is the allowed set of
light-field pixels. For each such pixel, there is a corresponding index in the light-field vector; i.e.
if the light-field is sampled at $N$ pixels in total, the light-field vectors are $N$-dimensional vectors.
We specify the set of light-field pixels in the following manner. We assume that there
is only a finite set of poses $1, 2, \dots, P$ in which the face can occur. Each face image is first
classified into the nearest pose. (Although this assumption is clearly an approximation, its validity
is demonstrated by the empirical results in Section 3.5.3. In both the FERET [20] and PIE [23]
databases, there is considerable variation in the pose of the faces. Although the subjects are asked
to place their face in a fixed pose, they rarely do this perfectly. Both databases therefore contain
considerable variation away from the finite set of poses. Since our algorithm performs well on both
databases, the approximation of classifying faces into a finite set of poses is validated.)
Each pose $i$ is then allocated a fixed number of pixels $N_i$. The total number of
pixels in a light-field vector is therefore $N = \sum_{i=1}^{P} N_i$. If we have images from two poses $i$ and $j$, for
example, we know $N_i + N_j$ of the pixels in the light-field vector. The remaining $N - N_i - N_j$
are unknown, missing data. This vectorization process is illustrated in Figure 3.
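The bookkeeping involved can be sketched as follows (illustrative Python only, not the paper's implementation; the pose-dependent normalization function and the per-pose pixel counts $N_i$ are supplied from outside, as described in the remainder of this subsection):

import numpy as np

def vectorize(images, poses, pixels_per_pose, normalize):
    """Convert a set of images of one face into a light-field vector.

    images:          list of input images of the same face.
    poses:           pose index (0 .. P-1) assigned to each image.
    pixels_per_pose: list of N_i, the number of pixels allocated to pose i.
    normalize:       normalize(image, pose) -> 1D array of length N_i
                     (the pose-dependent warping/cropping described below).
    """
    offsets = np.concatenate(([0], np.cumsum(pixels_per_pose)))
    light_field = np.full(offsets[-1], np.nan)    # missing data marked as NaN
    for image, pose in zip(images, poses):
        light_field[offsets[pose]:offsets[pose + 1]] = normalize(image, pose)
    return light_field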
We still need to specify how to sample the $N_i$ pixels of a face in pose $i$. This process is
analogous to that needed in appearance-based object recognition and is usually performed by
“normalization.” In eigenfaces [25], the standard approach is to find the positions of several canonical
points, typically the eyes and the nose, and to warp the input image onto a coordinate frame where
these points are in fixed locations. The resulting image is then masked. To generalize eigenface
normalization to eigen light-fields, we just need to define such a normalization for each pose.
Figure 3: Vectorization by normalization. Vectorization is the process of converting a set of images of a
face into a light-field vector. Vectorization is performed by first classifying each input image into one of a
finite number of poses. For each pose, a normalization is then applied to convert the image into a sub-vector
of the light-field vector. If poses are missing, the corresponding part of the light-field vector is missing.
In this paper we experimented with two different normalizations. The first one, illustrated
in Figure 4(a) for three poses, is a simple one based on the location of the eyes and the nose.
Just as in eigenfaces, we assume that the eye and nose locations are known, warp the face into a
coordinate frame in which these canonical points are in a fixed location and finally crop the image
with a (pose dependent) mask to yield the $N_i$ pixels. For this simple 3-point normalization, the
resulting masked images vary in size between 7200 and 12600 pixels.
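A sketch of the 3-point normalization for a single pose is given below, assuming OpenCV is available; the canonical point locations, output size, and mask passed in are placeholders rather than the values used in the paper:

import numpy as np
import cv2

def three_point_normalize(image, eyes_nose, canonical, mask, size):
    """Warp a (greyscale) face image so that the two eyes and the nose land
    on canonical locations, then apply a pose-dependent mask.

    eyes_nose: 3x2 array of (x, y) positions of left eye, right eye, nose.
    canonical: 3x2 array of the corresponding canonical positions.
    mask:      boolean array of shape `size` selecting the N_i pixels.
    size:      (height, width) of the normalized image.
    """
    A = cv2.getAffineTransform(np.float32(eyes_nose), np.float32(canonical))
    warped = cv2.warpAffine(image, A, (size[1], size[0]))
    return warped[mask]                       # the N_i pixels for this pose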
The second normalization is more complex and is motivated by the success of Active Ap-
pearance Models [9]. This normalization is based on the location of a large number (39–54 de-
pending on the pose) of points on the face. These canonical points are triangulated and the image
warped with a piecewise affine warp onto a coordinate frame in which the canonical points are in
fixed locations. See Figure 4(b) for an illustration of this multi-point normalization. The resulting
masked images for this multi-point normalization vary in size between 20800 and 36000 pixels.
Although currently the multi-point normalization is performed using hand-marked points, it could
be performed by fitting an Active Appearance Model [9] and then using the implied canonical point
locations. Further discussion of this way of automating our algorithm is contained in Section 5.2.
Figure 4: (a) The first, simpler normalization for three poses in the finite set in Figure 3, one frontal, one a
3/4 view, the final a full profile. Just as in eigenfaces, we assume that the eye and nose locations are known,
warp the face into a coordinate frame in which these canonical points are in a fixed location and finally crop
the image with a (pose dependent) mask. (b) The second, more complex normalization. In this case, a large
number (39–54 depending on the pose) of points on the face are used to perform the normalization.
3.4.2 Classification using Nearest Neighbor
The Eigen Light-Field Estimation Algorithm outputs a vector of eigen coefficients $(\lambda_1, \dots, \lambda_d)$.
Given a set of gallery (training) faces, we obtain a corresponding set of vectors $(\lambda_1^{\mathrm{id}}, \dots, \lambda_d^{\mathrm{id}})$,
where $\mathrm{id}$ is an index over the set of gallery faces. Similarly, given a probe (or test) face, we
obtain a vector of eigen coefficients $(\lambda_1^{\mathrm{probe}}, \dots, \lambda_d^{\mathrm{probe}})$ for that face. To complete the face
recognition algorithm we need an algorithm which classifies $(\lambda_1^{\mathrm{probe}}, \dots, \lambda_d^{\mathrm{probe}})$ with the index
$\mathrm{id}$ which is the most likely match. Many different classification algorithms could be used for this
task. For simplicity, we use the nearest neighbor algorithm which classifies the probe vector with the index:

    \arg\min_{\mathrm{id}} \sum_{i=1}^{d} \left( \lambda_i^{\mathrm{probe}} - \lambda_i^{\mathrm{id}} \right)^2    (6)

All of the results reported in this paper use the Euclidean distance in Equation (6). Alternative
distance functions, such as the Mahalanobis distance, could be used instead if so desired.
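A minimal sketch of this classification step (illustrative Python only; the names are placeholders):

import numpy as np

def classify(probe_lambda, gallery_lambdas):
    """Nearest-neighbor classification of Equation (6).

    probe_lambda:    (d,) eigen coefficients of the probe face.
    gallery_lambdas: dict mapping gallery identity -> (d,) coefficients.
    Returns the identity minimizing the Euclidean distance.
    """
    return min(gallery_lambdas,
               key=lambda ident: np.sum((gallery_lambdas[ident]
                                         - probe_lambda) ** 2))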
Figure 5: An illustration of the pose variation in the CMU PIE database [23]. The pose varies from full
right profile (c02) to full frontal (c27) and on to full left profile (c34). The 9 cameras in the horizontal sweep
are each separated by about 22.5°. The 4 other cameras include 1 above (c09) and 1 below (c07) the central
camera, and 2 in the corners of the room (c25 and c31), typical locations for surveillance cameras.
3.5 Experimental Results
3.5.1 Databases
We used two databases in our face recognition across pose experiments, the CMU Pose, Illumina-
tion, and Expression (PIE) database [23] and the FERET database [20]. Each of these databases
contains substantial pose variation. In the pose subset of the CMU PIE database (see Figure 5),
the 68 subjects are imaged simultaneously under 13 different poses totaling 884 images. In the
FERET database, the subjects are imaged non-simultaneously in 9 different poses. See Figure 6
for an example. We used 75 subjects from the FERET pose subset giving 675 images in total. (In
both cases, we used greyscale images even if the database actually contains color images.)
3.5.2 Selecting the Gallery, Probe, and Generic Training Data
In each of our experiments we divided the database(s) into three disjoint subsets:
Generic Training Data: Many face recognition algorithms such as eigenfaces, and including our
algorithm, require “generic training data” to build a generic face model. In eigenfaces, for
example, generic training data is needed to compute the eigenspace. Similarly, in our algorithm
generic data is needed to construct the eigen light-field.
Figure 6: An illustration of the pose variation in the FERET database [20]. The poses of the 9 images
(ba–bi) vary from a near-profile view on one side (bb) through full frontal (ba) to a near-profile view on the
other side (bi). Overall, the variation in pose is somewhat less than in the CMU PIE database. See Figure 5
for an illustration of the pose variation in the PIE database.
Gallery: The gallery is the set of “training” images of the people to be recognized; i.e. the images
given to the algorithm as examples of each person that might need to be recognized.
Probe: The probe set contains the “test” images; i.e. the examples of images to be presented to
the system that should be classified with the identity of the person in the image.
The division into these three subsets is performed as follows. First we randomly select half of the
subjects as generic training data. The images of the remaining subjects are used for the gallery
and probe. There is never any overlap between the generic training data and the gallery and probe.
For the PIE database we therefore randomly select 34 of the 68 subjects for the generic training data.
For the FERET database we similarly select roughly half of the 75 subjects for the generic training data.
After the generic training data has been removed, the remainder of the database(s) is divided
into probe and gallery sets based on the pose of the images. For example, we might set the gallery
to be the frontal images and the probe set to be the left profiles. In this case, we evaluate how well
our algorithm is able to recognize people from their profiles given that the algorithm has only seen
them from the front. In the experiments described below we choose the gallery and probe poses in
various different ways. The gallery and probe are always completely disjoint however.
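For concreteness, the subject-level split could be implemented as in the following sketch (illustrative Python only; the fixed random seed is an arbitrary choice, not part of the paper's protocol):

import numpy as np

def split_subjects(subject_ids, seed=0):
    """Randomly split the subjects of a database in half: one half is used as
    generic training data, the other half provides the gallery and probe sets
    (which are then distinguished by pose, not by subject)."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(list(subject_ids))
    half = len(ids) // 2
    return ids[:half], ids[half:]   # (generic training, gallery/probe subjects)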
3.5.3 Experiment 1: Comparison with Other Algorithms
We first conducted an experiment to compare our algorithm with two others. In particular we com-
pared our algorithm with eigenfaces [25] and FaceIt, the commercial face recognition system from
Identix (formerly Visionics). Eigenfaces is the de facto baseline standard by which face recognition
algorithms are compared. FaceIt finished top overall in the Face Recognition Vendor Test 2000 [7].
We first performed a comparison using the PIE database [23]. After randomly selecting the
generic training data, we selected the gallery pose as one of the 13 PIE poses and the probe pose
as any other of the remaining 12 PIE poses. For each disjoint pair of gallery and probe poses, we
compute the average recognition rate over all subjects in the probe and gallery sets. The details of
the results are included in Figures 7–8 and a summary is included in Table 3.
In Figure 7 we plot color-coded “confusion matrices” of the results. The row
denotes the pose of the gallery, the column the pose of the probe, and the displayed intensity the
average recognition rate. A lighter color denotes a higher recognition rate. (On the diagonals the
gallery and probe images are the same and so all three algorithms obtain a 100% recognition rate.)
Eigen light-fields performs far better than the other algorithms, as is witnessed by the lighter color
of Figures 7(a–b) compared to Figures 7(c–d). Note how eigen light-fields is far better able to
generalize across wide variations in pose, and in particular to and from near profile views.
Several “cross-sections” through the confusion matrices in Figure 7 are shown in Figure 8.
In each cross-section, we fix the pose of the gallery images and vary the pose of the probe image.
In each graph we plot four curves, one for eigenfaces, one for FaceIt, one for eigen light-fields
with the 3-point normalization, and one for eigen light-fields with the multi-point normalization.
As can be seen, eigen light-fields outperforms the other two algorithms. In particular, it is better
able to recognize the face when the gallery and probe poses are very different. This is witnessed
by the eigen light-field curves in Figure 8 being higher at the extremities of the probe pose range.
The results in Figures 7 and 8 are summarized in Table 3. In this table we include the
average recognition rate computed over all disjoint gallery-probe poses. As can be seen, eigen
light-fields outperforms both the standard eigenfaces algorithm and the commercial FaceIt system.
(a) Eigen Light-Fields, 3-Point Normalization; (b) Eigen Light-Fields, Multi-Point Normalization; (c) FaceIt; (d) Eigenfaces.
Figure 7: A comparison with FaceIt and eigenfaces for face recognition across pose on the PIE database.
For each pair of gallery and probe poses, we plot the color-coded average recognition rate. The fact that the
images in (a) and (b) are lighter in color than those in (c) and (d) implies that our algorithm performs better.
We next performed a similar comparison using the FERET database [20]. Just as with the
PIE database, we selected the gallery pose as one of the 9 FERET poses and the probe pose as
any other of the remaining 8 FERET poses. For each disjoint pair of gallery and probe poses,
we compute the average recognition rate over all subjects in the probe and gallery sets, and then
average the results. The results are very similar to those for the PIE database and are summarized
in Table 4. Again, eigen light-fields performs significantly better than both FaceIt and eigenfaces.
(a) Gallery Pose c27; (b) Gallery Pose c22; (c) Gallery Pose c37; (d) Gallery Pose c31.
Figure 8: Several “cross-sections” through the confusion matrices in Figure 7. In each figure we fix the
pose of the gallery and only vary the pose of the probe. We plot four curves, one each for eigen light-fields
with the 3-point normalization, eigen light-fields with the multi-point normalization, eigenfaces, and FaceIt.
The performance of eigen light-fields is superior to that of the other two algorithms, particularly when the
pose of the gallery and probe are radically different. Eigen light-fields recognizes faces better across pose.
Overall, the performance improvement of eigen light-fields over the other two algorithms
is more significant on the PIE database than on the FERET database. This is because the PIE
database contains more variation in pose than the FERET database. See Figures 5 and 6.
Table 3: A comparison of eigen light-fields with FaceIt and eigenfaces for face recognition across pose
on the PIE database. The table contains the average recognition rate computed across all disjoint pairs of
gallery and probe poses; i.e. this table summarizes the average performance in Figure 7.
Average Recognition Rate: Eigenfaces 16.6%; FaceIt 24.3%; Eigen Light-Fields (3-Point Normalization) 52.5%; Eigen Light-Fields (Multi-Point Normalization) 66.3%.
Table 4: A comparison of eigen light-fields with FaceIt and eigenfaces for face recognition across pose on
the FERET database. The table contains the average recognition rate computed across all disjoint pairs of
gallery and probe poses. Again, eigen light-fields outperforms both eigenfaces and FaceIt.
Average Recognition Rate: Eigenfaces 53.2%; FaceIt 70.8%; Eigen Light-Fields (3-Point Normalization) 80.3%.
3.5.4 Experiment 2: Improvement with the Number of Input Images
So far we have assumed that just a single gallery and probe image are available to the algorithm.
What happens if more gallery and/or probe images are available? In Experiment 2 we investigate
the performance of eigen light-fields with different numbers of images using the PIE database. To
compute the recognition rate with $K$ gallery images, we select every possible set of $K$ gallery poses
and 1 probe pose, which gives a large number of different combinations of poses.
We then compute the average recognition rate for each such combination and average the results.
We plot the overall average recognition rate against the number of gallery images in Figure 9(a).
As can be seen, eigen light-fields is able to estimate a more accurate light-field using more gallery
images and thereby obtain a higher recognition rate.
Eigen light-fields can also take advantage of more than one probe image. We therefore
repeated Experiment 2 but reversed the roles of the gallery and probe. The results are shown
in Figure 9(b). Again the performance increases with the number of probe images; however, the
benefit of using multiple probe images is not as much as the benefit of using multiple gallery
images. With multiple gallery images the accuracy of the light-field of every subject in the gallery
is improved. With more probe images, the accuracy of the light-field of just the single probe subject
is improved.
(a) Varying the Number of Gallery Images; (b) Varying the Number of Probe Images.
Figure 9: (a) The improvement in the performance of our algorithm with increasing numbers of gallery
images. Using the additional images, eigen light-fields is able to estimate the light-fields more accurately
and thereby obtains a higher recognition rate. (b) The performance of eigen light-fields also improves with
the number of probe images. The performance increase is greater with increased numbers of gallery images
because the accuracy of the light-field of every gallery subject is improved. On the other hand, with more
probe images, the accuracy of just the one probe subject is improved.
3.5.5 Experiment 3: Matching Sub-Images
We just illustrated how the performance of eigen light-fields improves if more gallery and/or probe
images are available. Eigen light-fields can use any subset of the light-field. In particular, it does
not even need a complete image. To validate this property, we ran the following experiment. We
repeated Experiment 1, but for each pair of gallery and probe poses, we randomly selected a certain
percentage of the pixels in the masked image. We then compute the average recognition rate just
using this subset of the pixels. This process is repeated for 100 random samples of pixels and the
results averaged. The results are plotted in Figure 10 for a variety of pixel percentages ranging from
10% to 100% (the complete image). These results were obtained using the 3-point normalization,
and so the performance with 100% of the pixels is 52.5%, as per Table 3. The figure clearly
demonstrates that a subset of the pixels in the images can be used without any significant reduction
in the recognition rate.
Figure 10: The performance of eigen light-fields with a subset of the image pixels, using the 3-point
normalization and the PIE database. The average recognition rate is plotted against the percentage of pixels
used in the probe and gallery images. A subset of the pixels can be used without any significant reduction
in the recognition rate.
3.5.6 Experiment 4: Division of the Input Images between Gallery and Probe
In Experiment 2 we examined the benefits of using more than one gallery or probe image. Suppose
that $n$ gallery and probe images are available in total. Is it better to use $n-1$ gallery and 1 probe
image, or $n/2$ gallery and $n/2$ probe images? In order to answer this question, we conducted
Experiment 4. Given $n$ images, we generated every possible combination of $n-1$ gallery images
and 1 probe image (as in Experiment 2) and every possible combination of $n/2$ gallery images
and $n/2$ probe images. We then computed the average recognition rate for each case. Similarly we
switched the roles of gallery and probe. The results are shown in Figure 11. The conclusion is clear.
It is better to divide the images equally between gallery and probe rather than asymmetrically.
One possible conclusion from this result is that adding more than one image to each of
the probe and gallery allows a better estimate of both light-fields. Having two more accurate esti-
mates results in better performance than having one very accurate estimate and one not so accurate estimate.
[Figure 11 plots: recognition accuracy (%) vs. combined number of gallery and probe images (4-12), comparing (a) the n-1:1 split with the n/2:n/2 split and (b) the 1:n-1 split with the n/2:n/2 split.]
(a) n-1 gallery, 1 probe vs. n/2 of each (b) 1 gallery, n-1 probe vs. n/2 of each
Figure 11: (a) The performance of using n-1 gallery images and 1 probe image versus using n/2 of each.
The empirical evidence suggests splitting the images evenly into gallery and probe. (b) The performance
of using 1 gallery image and n-1 probe images versus using n/2 of each. Again, splitting the images
evenly achieves higher recognition rates. Having two more accurate estimates of the light-fields results in
better performance than having one very accurate estimate and one not so accurate estimate.
4 Fisher Light-Fields for Face Recognition Across Pose and Illumination
After pose variation the next most significant factor affecting the appearance of faces is illumina-
tion. In many face recognition application scenarios both the pose and illumination of the probe
and gallery images may be different. The gallery images may be two frontal and profile “mug-
shots” captured in well controlled lighting. The probe may be a single 3/4 view captured from a
surveillance camera in the corner of a room with strong overhead lighting.
Whereas face recognition across pose has received very little attention in the literature, a
number of approaches have been proposed for face recognition across illumination. Examples include
discarding the first three eigenvectors in eigenfaces, using discriminant analysis [3], and using
illumination cones [5]. Some of these approaches, for example illumination cones
[5], require multiple gallery images captured with significant illumination variation. We would
like an algorithm that can operate with just a single gallery and probe image. We chose to combine
Fisherfaces [3] with eigen light-fields [13] to obtain Fisher light-fields [14]. After describing how
these two techniques can be combined to give an algorithm for face recognition across pose and
illumination, we complete this section by presenting experimental results in Section 4.2.
4.1 Fisher Light-Fields
Suppose now that we are given a set of light-fields $L_{i,j}(\theta, \phi)$, $i = 1, \ldots, N$, $j = 1, \ldots, M$, where
each of $N$ objects is imaged under $M$ different illumination conditions. We could proceed as
described above and perform Principal Component Analysis on the whole set of $N \times M$ light-fields.
This approach ignores object affiliations and effectively re-indexes the set of light-fields as
$L_k(\theta, \phi)$, $k = 1, \ldots, NM$. Define the total scatter matrix $S_T$ as:

$$ S_T = \sum_{k=1}^{NM} \left( L_k - \bar{L} \right) \left( L_k - \bar{L} \right)^{T} $$

where $\bar{L}$ is the mean of the complete light-field set. PCA determines the orthogonal projection $V$:

$$ w_k = V^{T} L_k, \qquad k = 1, \ldots, NM \qquad\qquad (7) $$

that maximizes the determinant of the total scatter matrix of the projected samples $w_k$:

$$ V_{\mathrm{pca}} = \arg\max_{V} \left| V^{T} S_T V \right| . $$
This scatter stems from both inter-class variations between the objects, as well as from intra-class
variation within the object classes. In practice, most of the scatter is due to illumination changes.
Consequently PCA encodes the illumination variations and fails to discriminate well between ob-
ject classes. An alternative approach is Fisher’s Linear Discriminant [10], also known as Linear
Discriminant Analysis [27]. Fisher’s Linear Discriminant uses the available class information to
compute a projection better suited for discrimination tasks.
Define the within-class scatter matrix $S_W$ as:

$$ S_W = \sum_{i=1}^{N} \sum_{j=1}^{M} \left( L_{i,j} - \bar{L}_i \right) \left( L_{i,j} - \bar{L}_i \right)^{T} $$

where $\bar{L}_i$ is the mean of class $i$. Furthermore, define the between-class scatter matrix as:

$$ S_B = \sum_{i=1}^{N} M_i \left( \bar{L}_i - \bar{L} \right) \left( \bar{L}_i - \bar{L} \right)^{T} $$

where $M_i$ refers to the number of samples in class $i$. Fisher's Linear Discriminant computes the
projection $V$ that maximizes the ratio:

$$ V_{\mathrm{fld}} = \arg\max_{V} \frac{\left| V^{T} S_B V \right|}{\left| V^{T} S_W V \right|} . $$

The optimal projection is found by solving the generalized eigenvalue problem:

$$ S_B \, v = \mu \, S_W \, v . $$

Due to the structure of the data, the within-class scatter matrix $S_W$ is always singular. We over-
come this problem by first using PCA to reduce the dimension and then applying Fisher's Linear
Discriminant [3] in the lower dimensional PCA subspace, where the within-class scatter matrix
is non-singular. The overall projection is given by $V = V_{\mathrm{pca}} V_{\mathrm{fld}}$. Analogously to the Eigen
Light-Field Estimation Algorithm, with Fisher light-fields we find the least squares solution to:

$$ L(\theta, \phi) = \sum_{i} \lambda_i \, V_i(\theta, \phi) \qquad\qquad (8) $$

where the $V_i$ are the generalized eigenvectors of $S_B$ and $S_W$. Note that there are at most
$N - 1$ generalized eigenvectors with nonzero eigenvalues. This extension of eigen light-fields to Fisher light-fields
mirrors the step from eigenfaces to Fisherfaces as proposed in [3].
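The construction above can be summarized in the following sketch (illustrative Python under the assumption that each light-field is stored as a vector of d pixels; fisher_light_fields, n_pca, and light_field_coefficients are hypothetical names, not the notation of our implementation). The sketch performs the PCA reduction, forms the scatter matrices in the PCA subspace, solves the generalized eigenproblem, and then solves the least squares problem of Equation (8) using only the observed rays.

import numpy as np
from scipy.linalg import eigh

def fisher_light_fields(L, n_pca):
    # L     : array of shape (N, M, d) -- N subjects, M illumination conditions,
    #         d light-field pixels per sample (an assumed vectorization).
    # n_pca : dimension of the intermediate PCA subspace; it should not exceed
    #         N*M - N so that the projected within-class scatter is non-singular.
    # Returns the overall projection V of shape (d, N-1).
    N, M, d = L.shape
    X = L.reshape(N * M, d)
    Xc = X - X.mean(axis=0)

    # PCA: the leading n_pca eigenvectors of the total scatter matrix S_T.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_pca = Vt[:n_pca].T                          # (d, n_pca)

    # Class statistics in the PCA subspace.
    Y = (Xc @ V_pca).reshape(N, M, n_pca)
    class_means = Y.mean(axis=1)                  # (N, n_pca)
    global_mean = Y.reshape(-1, n_pca).mean(axis=0)

    S_w = np.zeros((n_pca, n_pca))
    S_b = np.zeros((n_pca, n_pca))
    for i in range(N):
        D = Y[i] - class_means[i]                 # within-class deviations
        S_w += D.T @ D
        m = (class_means[i] - global_mean)[:, None]
        S_b += M * (m @ m.T)                      # M samples per class

    # Generalized eigenproblem S_b v = mu S_w v; keep the N-1 leading eigenvectors.
    mu, V_fld = eigh(S_b, S_w)
    V_fld = V_fld[:, np.argsort(mu)[::-1][:N - 1]]

    return V_pca @ V_fld                          # overall projection V = V_pca V_fld

def light_field_coefficients(V, observed_values, observed_idx):
    # Least-squares solution of Eq. (8) using only the observed rays (pixels) of
    # a new light-field; any mean-centering used by the estimation algorithm of
    # Section 3 is assumed to have been applied to observed_values already.
    A = V[observed_idx, :]
    lam, *_ = np.linalg.lstsq(A, observed_values, rcond=None)
    return lam

Note that scipy.linalg.eigh solves the symmetric generalized eigenproblem directly, which is why the within-class scatter must be made non-singular by the PCA reduction before Fisher's Linear Discriminant is applied.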
4.2 Experimental Results
4.2.1 Databases
For our face recognition across pose and illumination experiments, we used the pose and illumi-
nation subset of the PIE database [23]. In this subset, 68 subjects are imaged under 13 different
poses and 21 illumination conditions.
[Figure 12 image montage: the 13 PIE camera poses (c22, c02, c25, c37, c07, c27, c09, c29, c11, c14, c31, c34, c05) and the 12 flash conditions used (f16, f15, f13, f21, f12, f11, f08, f06, f10, f18, f04, f02), shown for one subject.]
Figure 12: An illustration of the pose and illumination variation in the CMU PIE database [23]. The
pose varies from full right profile (c22) to full frontal (c27) and on to full left profile (c34). Similarly, the
illumination (flash) locations span the full range from right profile (f16) to left profile (f02).
Many of the illumination directions introduce fairly subtle variations in appearance, and so we selected
12 of the 21 illumination conditions, chosen to span the set widely. The set of 13 pose variations and
12 illumination variations is illustrated for one subject in Figure 12. In total we used 68 x 13 x 12 = 10,608
images in the experiments. Although the PIE database contains color images, all of the experiments in
this paper use greyscale images.
4.2.2 Selecting the Gallery, Probe, and Generic Training Data
We select the generic training data just as in Section 3.5.2: we randomly select a subset of the
PIE subjects for the generic training data and then remove their images from the experiments. There
are then a variety of ways of selecting the gallery and probe images from the remaining data:
Same Pose, Different Illumination: The gallery and probe poses are the same. The gallery and
probe illuminations are different. This scenario is like traditional face recognition across
illumination, but is performed separately for each pose.
Different Pose, Same Illumination: The gallery and probe poses are different. The gallery and
probe illuminations are the same. This scenario is like traditional face recognition across
pose, but is performed separately for each possible illumination.
Different Pose, Different Illumination: Both the pose and illumination of the probe and gallery
are different. This is the hardest and most general scenario.
4.2.3 Experiment 5: Comparison with Other Algorithms
We compare our algorithms with FaceIt under these three scenarios. In all cases we generate every
possible test scenario and then average the results. For “same pose, different illumination”, for ex-
ample, we consider every possible pose. We then generate every pair of disjoint probe and gallery
illumination conditions. We then compute the average recognition rate for each such case. For ex-
ample, we might compare probe pose c27, illumination f11 against gallery pose c27, illumination
f21. We then average over every pose and every pair of distinct illumination conditions.
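The averaging protocol, and the illumination confusion matrices shown later in Figure 13, can be sketched as follows (illustrative Python; same_pose_confusion and recognition_rate are hypothetical names, the latter standing in for the Fisher light-field or FaceIt classifier evaluated at the given pose and gallery/probe flash conditions).

import numpy as np
from itertools import permutations

def same_pose_confusion(poses, flashes, recognition_rate):
    # For each pose, fill an illumination "confusion matrix" whose entry (i, j)
    # is the recognition rate with gallery flash i and probe flash j (i != j),
    # then average over all poses and flash pairs ("same pose, different
    # illumination"). `recognition_rate` is a placeholder for the classifier.
    F = len(flashes)
    per_pose = {}
    for pose in poses:
        C = np.full((F, F), np.nan)
        for i, j in permutations(range(F), 2):    # disjoint gallery/probe flashes
            C[i, j] = recognition_rate(pose, flashes[i], flashes[j])
        per_pose[pose] = C
    overall = float(np.nanmean(list(per_pose.values())))
    return per_pose, overall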
Table 5: A comparison of the performance of eigen light-fields and Fisher light-fields with FaceIt on three
different face recognition across pose and illumination scenarios. In all three cases, eigen light-fields and
Fisher light-fields outperform FaceIt by a large margin.
                                          Eigen Light-Fields   Fisher Light-Fields   FaceIt
Same pose, Different illumination                  -                  81.1            41.6
Different pose, Same illumination                 72.9                  -             25.8
Different pose, Different illumination             -                  36.0            18.1
The results are included in Table 5. For “same pose, different illumination,” the task is es-
sentially face recognition across illumination, performed separately for each pose. In this case, it makes little
sense to try eigen light-fields since we know how poorly eigenfaces performs with illumination
variation. Fisher light-fields reduces to Fisherfaces applied separately to each pose, which we empirically find outper-
forms FaceIt. Example illumination “confusion matrices” are included for two poses in Figure 13.
For “different pose, same illumination,” the task reduces to face recognition across pose but
for a variety of different illumination conditions. In this case there is no intra-class variation and so
it makes little sense to apply Fisher light-fields. This experiment is the same as Experiment 1 but
the results are averaged over every possible illumination condition. As we found for Experiment 1,
eigen light-fields outperforms FaceIt by a large amount.
Finally, in the “different pose, different illumination” task both algorithms perform fairly
poorly. The task is very difficult, however, as can be seen in Figure 12. If the pose and illumination
are both extreme, almost none of the face is visible. Since this case might occur in either the probe
or the gallery, the chance that such a difficult case occurs is quite large. Although more work is
needed on this task, note that Fisher light-fields still outperforms FaceIt by a large amount.
[Figure 13 image matrices of recognition accuracy (%) over all pairs of gallery and probe flash conditions (f16-f02): (a) Fisher light-fields, pose c27 vs. c27 (frontal); (b) Fisher light-fields, pose c37 vs. c37 (right 3/4 view); (c) FaceIt, pose c27 vs. c27; (d) FaceIt, pose c37 vs. c37.]
Figure 13: Example “confusion matrices” for the “same-pose, different illumination” task. For a given
pose, and a pair of distinct probe and gallery illumination conditions, we color-code the average recognition
rate. The superior performance of Fisher light-fields is witnessed by the lighter color of (a–b) over (c–d).
5 Conclusion
5.1 Summary
Appearance-based object recognition uses pixels or measurements of light in the scene as its fea-
tures. In the ultimate limit, the set of all such measurements is the plenoptic function or light-field.
In this paper we have explored appearance-based object recognition from light-fields. We first
analyzed the theoretical distinguishability of objects from their images and light-fields. We pre-
sented a number of results which show that, theoretically, objects can be distinguishable from their
light-fields in cases where they are ambiguous from just a single image. This theoretical analysis
motivates trying to build appearance-based object recognition algorithms that use as much of the
light-field as is available, be it a single image, a pair of images, or multiple images.
In the second half of this paper we proposed an appearance-based algorithm for face recog-
nition across pose based on an algorithm to estimate the (eigen) light-field from a collection of
images. This algorithm can use any number of gallery images captured from arbitrary poses and
any number of probe images, also captured from arbitrary poses. The gallery and probe poses do
not need to overlap. We showed that
our algorithm can reliably recognize faces across pose and also take advantage of the additional
information contained in widely separated views to improve recognition performance if more than
one gallery or probe image is available. We extended our algorithm to recognize faces across both
pose and illumination simultaneously by generalizing eigen light-fields [13] to Fisher light-fields
[14], analogously to how eigenfaces [25] can be generalized to Fisherfaces [3].
5.2 Limitations and Future Work: Normalization
Appearance-based object recognition algorithms require that the images be aligned. In eigenfaces
[25] the images are normally warped so that the eyes, and perhaps the nose, are in canonical
locations. Why is this alignment needed? It is needed to make sure that the features used in the
training phase match up with the features used in the testing phase. As we have pointed out, the
features used in appearance-based algorithms are the radiances of light along certain rays in space.
For such algorithms to be meaningful, the light radiated from the cheek of one person, say, must
correspond to the same features as the light radiated from the cheek of another person.
With 2D images of frontal faces, a simple translation (or perhaps an affine warp) is enough
to register faces. With light-fields of 3D objects, the registration is a 6 DOF rigid transformation
(rotation plus translation) in the 3D world, perhaps followed by a correction for the intrinsics
of the camera. Although performing such a registration is more difficult in 3D, in essence it is
performing the same function as the simple translation or affine warp in 2D, namely to ensure that
the same light rays correspond to the same pixels (features).
In this paper we used two different normalizations, based on manually marked locations of
the eyes and nose combined with the known pose of the face, to perform this registration.
See Section 3.4.1 for the details. The essence of this step is to convert the input image into the
light-field coordinate frame such that the same light rays for each subject (training and testing)
are mapped to the same pixels in the light-field. At present our algorithm is somewhat ad hoc and
requires user input in the form of feature point locations. We are currently working on using “active
appearance models” [9] to perform this registration in a more principled and automated way.
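For illustration only, a minimal 2D analogue of such a registration, which maps three manually marked points to canonical locations with an affine warp, might look as follows (three_point_normalize is a hypothetical name; the actual 3-point and multi-point normalizations used in this paper are defined in Section 3.4.1).

import numpy as np
from scipy import ndimage

def three_point_normalize(image, marked_pts, canonical_pts, out_shape):
    # Warp `image` so that three manually marked points (e.g. the two eyes and
    # the nose tip) land on canonical locations. Points are (row, col) pairs.
    src = np.asarray(marked_pts, dtype=float)     # marked points in the input image
    dst = np.asarray(canonical_pts, dtype=float)  # canonical target locations

    # Solve src = A @ dst + t from the three correspondences (affine warp).
    D = np.hstack([dst, np.ones((3, 1))])
    P, *_ = np.linalg.lstsq(D, src, rcond=None)
    A, t = P[:2, :].T, P[2, :]

    # ndimage.affine_transform samples output[o] = input[A @ o + t].
    return ndimage.affine_transform(image, A, offset=t,
                                    output_shape=out_shape, order=1)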
Acknowledgements
Much of Section 3 first appeared in [13] and much of Section 4 in [14]. We would like to thank
Terence Sim and Takeo Kanade for preliminary discussions on the light-field estimation algorithm
and the reviewers of [13] and [14] for their feedback. The research described in this paper was
supported by U.S. Office of Naval Research contract N00014-00-1-0915. Portions of the research
in this paper use the FERET database of facial images collected under the FERET program.
References
[1] E.H. Adelson and J. Bergen. The plenoptic function and elements of early vision. In Landy
and Movshon, editors, Computational Models of Visual Processing. MIT Press, 1991.
[2] S. Baker, T. Sim, and T. Kanade. When is the shape of a scene unique given its light-field:
A fundamental theorem of 3D vision? IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2002. (Accepted for publication).
[3] P.N. Belhumeur, J. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition
using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 19(7):711–720, 1997.
[4] P.N. Belhumeur and D.W. Jacobs. Comparing images under variable illumination. In Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1998.
[5] P.N. Belhumeur and D.J. Kriegman. What is the set of images of an object under all possible
lighting conditions? International Journal of Computer Vision, 28(3):1–16, 1998.
[6] M. Black and A. Jepson. Eigen-tracking: Robust matching and tracking of articulated objects
using a view-based representation. International Journal of Computer Vision, 36(2):101–130,
1998.
[7] D.M. Blackburn, M. Bone, and P.J. Phillips. Facial recognition vendor test 2000: Evaluation
report, 2000.
[8] V. Blanz, S. Romdhani, and T. Vetter. Face identification across different poses and illumi-
nation with a 3D morphable model. In Proceedings of the Fifth International Conference on
Face and Gesture Recognition, 2002.
[9] T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active appearance models. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.
[10] K. Fukunaga. Introduction to statistical pattern recognition. Academic Press, 1990.
[11] A. Georghiades, P.N. Belhumeur, and D. Kriegman. From few to many: Generative models
for recognition under variable pose and illumination. In Proceedings of the Fourth Interna-
tional Conference on Face and Gesture Recognition, 2000.
[12] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The lumigraph. In Computer
Graphics Proceedings, Annual Conference Series (SIGGRAPH), 1996.
[13] R. Gross, I. Matthews, and S. Baker. Eigen light-fields and face recognition across pose. In
Proceedings of the Fifth International Conference on Face and Gesture Recognition, 2002.
[14] R. Gross, I. Matthews, and S. Baker. Fisher light-fields for face recognition across pose and
illumination. In Proceedings of the German Symposium on Pattern Recognition (DAGM),
2002.
[15] A. Leonardis and H. Bischof. Dealing with occlusions in the eigenspace approach. In Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1996.
[16] M. Levoy and M. Hanrahan. Light field rendering. In Computer Graphics Proceedings,
Annual Conference Series (SIGGRAPH), 1996.
[17] H. Murase and S.K. Nayar. Visual learning and recognition of 3-D objects from appearance.
International Journal of Computer Vision, 14:5–24, 1995.
[18] S.G. Narasimhan and S.K. Nayar. Chromatic framework for vision in bad weather. In Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000.
[19] A.P. Pentland, B. Moghaddam, and T. Starner. View-based and modular eigenspaces for
face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 1994.
[20] P. J. Phillips, H. Wechsler, J. Huang, and P. Rauss. The FERET database and evaluation
procedure for face recognition algorithms. Image and Vision Computing, 16(5):295–306,
1998.
[21] T. Riklin-Raviv and A. Shashua. The Quotient image: Class based recognition and synthesis
under varying illumination. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 1999.
[22] S. Romdhani, V. Blanz, and T. Vetter. Face identification by matching a 3D morphable model
using linear shape and texture error functions. In Proceedings of the European Conference
on Computer Vision, 2002.
[23] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression (PIE) database.
In Proceedings of the Fifth International Conference on Face and Gesture Recognition, 2002.
[24] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human
faces. Journal of the Optical Society of America, 4(3):519–524, 1987.
[25] M. Turk and A. Pentland. Face recognition using eigenfaces. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 1991.
[26] T. Vetter and T. Poggio. Linear object classes and image synthesis from a single example
image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):733–741,
1997.
[27] W. Zhao, A. Krishnaswamy, R. Chellappa, D.L. Swets, and J. Weng. Discriminant analysis
of principal components for face recognition. In H. Wechsler, P.J. Phillips, V. Bruce, and
T. Huang, editors, Face Recognition: From Theory to Applications. Springer Verlag, 1998.