Technical Report
Tae-Kyun Kim
20 August 2006
Department of Engineering
University of Cambridge
Chapter 1
Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations
Tae-Kyun Kim^1, Josef Kittler^2, Roberto Cipolla^1
1 : Engineering Department, University of Cambridge, Cambridge, CB2 1PZ, UK
2 : CVSSP, University of Surrey, Guildford, GU2 7XH, UK
Abstract
We address the problem of comparing sets of images for object recognition, where the
sets may represent variations in an object’s appearance due to changing camera pose and
lighting conditions. Canonical Correlations (also known as principal or canonical angles),
which can be thought of as the angles between two d-dimensional subspaces, have recently
attracted attention for image set matching. Canonical correlations offer many benefits in
accuracy, efficiency, and robustness compared to the two main classical methods: para-
metric distribution-based and non-parametric sample-based matching of sets. Here, this
is first demonstrated experimentally for reasonably sized data sets using existing methods
exploiting canonical correlations. Motivated by their proven effectiveness, a novel discrim-
inative learning method over sets is proposed for set classification. Specifically, inspired by
classical Linear Discriminant Analysis (LDA), we develop a linear discriminant function
that maximizes the canonical correlations of within-class sets and minimizes the canonical
correlations of between-class sets. Image sets transformed by the discriminant function are
then compared by the canonical correlations. The proposed method is evaluated on various
object recognition problems using face image sets with arbitrary motion captured under dif-
ferent illuminations and image sets of five hundred general objects taken at different views.
The method is also applied to object category recognition using the ETH-80 database. The
proposed method is shown to outperform the state-of-the-art methods in terms of accuracy
and efficiency.
1.1 Introduction
Many computer vision tasks can be cast as learning problems over vector or image sets. In
object recognition, for example, a set of vectors may represent a variation in an object’s
appearance, be it due to camera pose changes, non-rigid deformations or variations in
illumination conditions. The objective of this work is to classify an unknown set of vectors
to one of the training classes, each also represented by vector sets. More robust object
recognition performance can be achieved by efficiently using set information rather than
a single vector or image as input. Examples of pattern sets of an object are shown in
Figure 1.1.
Whereas most of the previous work on matching image sets for object recognition ex-
ploits temporal coherence between consecutive images [1, 2, 3, 4, 5], this study does not
make any such assumption. Sets may be derived from sparse and unordered observations acquired by multiple still shots of a three-dimensional object, or by long-term monitoring of a scene, as exemplified e.g. by surveillance systems, where a subject would not face the camera all the time. This also makes it more convenient to augment training sets in the proposed framework. As this work does not exploit any data semantics explicitly, the proposed method is expected to be applicable to many other problems requiring a set comparison.
Relevant previous approaches to set matching for set classification can be broadly parti-
tioned into parametric model-based [6, 7] and non-parametric sample-based methods [8, 9].
In the model-based approaches, each set is represented by a parametric distribution func-
tion, typically Gaussian. The closeness of the two distributions is then measured by the
Kullback-Leibler Divergence (KLD) [30]. Due to the difficulty of parameter estimation
under limited training data, these methods easily fail when the training and novel test sets
do not have strong statistical relationships.
Rather more relevant methods for comparing sets are based on matching of pairwise
samples of sets, e.g. Nearest Neighbour (NN) and Hausdorff distance matching [8, 9]. The
methods are based on the premise that similarity of a pair of sets is reflected by the sim-
ilarity of the modes (or NN samples) of the two respective sets. This is certainly useful
in many computer vision applications where the data acquisition conditions may change
dramatically over time. For example, as shown in Figure 1.1 (a), when two sets contain im-
ages of an object taken from different views but with a certain overlap in views, global data
characteristics of the sets are significantly different, making the model-based approaches un-
successful. To recognise the two sets as the same class, the most effective solution would
be to find the common views and measure the similarity of those parts of data. In spite
of their rational basis, the non-parametric sample-based methods easily fail, as they do not
take into account the effect of outliers as well as the natural variability of the sensory data
due to the 3D nature of the observed objects. Note also that such methods are very time
consuming as they require a comparison of every pair of samples drawn from the two sets.
The above discussion is concerned purely with how to quantify the degree of match
between two sets, that is, how to define similarity of two sets. However, the other impor-
tant problem in set classification is how to learn a discriminative function from training data associated with a given similarity function. To our knowledge, the topic of discriminative learning over sets has not been given proper attention in the literature. In this study, we interpret the classical Linear Discriminant Analysis (LDA) [9, 10] and its non-parametric variants, Non-parametric Discriminant Analysis (NDA) [18], as techniques of discriminative learning over sets (see Section 1.2.1). LDA has been recognized as a powerful method for face recognition based on a single face image as input. The methods based on LDA have been widely advocated in the literature [10, 11, 12, 13, 14, 17]. However, note that these methods do not consider multiple input images. When they are directly applied to set classification based on sample matching, they inherit the drawbacks of the classical non-parametric sample-based methods, as discussed above.

Figure 1.1: Examples of image sets. (a) Two sets (top and bottom) contain images of a 3D object taken from different views but with a certain overlap in their views. (b) Two face image sets (top and bottom) collected from videos taken under different illumination settings. Face patterns of the two sets vary in both lighting and pose. The sets contain different pattern variations caused by different views and lighting.
Relatively recently the concept of canonical correlations has attracted increasing at-
tention for image set matching in [15, 19, 20, 21, 22], following the early works [23, 24,
25, 26]. Each set is represented by a linear subspace and the angles between two high-
dimensional subspaces are exploited as a similarity measure of two sets (See Section 1.2.2
for more details). As a method for comparing sets, the benefits of canonical correlations
over both parametric distribution-based and sample-based matching, have been noted in
our earlier work [15] as well as in [7]. They include efficiency, accuracy and robust-
ness. This will be discussed and demonstrated in a more detailed and rigorous manner
in Section 1.2.2 and Section 1.5. A nonlinear extension of canonical correlation has been
proposed in [15, 20, 38] and a feature selection scheme for the method in [15]. The Con-
strained Mutual Subspace Method (CMSM) [21, 22] is the most closely related to the approach of
this study. In CMSM, a constrained subspace is defined as the subspace in which the en-
tire class population exhibits small variance. The authors showed that the sets of different
classes in the constrained subspace had small canonical correlations. However, the prin-
ciple of CMSM is rather heuristic, especially the process of selecting the dimensionality
of the constrained subspace. If the dimensionality is too low, the subspace will be a null
space. In the opposite case, the subspace simply captures all the energy of the original data
and thus cannot play the role of a discriminant function.
This study presents a novel method of object recognition using image sets, which is
based on canonical correlations. The previous conference version [16] has been extended
by a more detailed discussion of the key ingredients of the method and the convergence
properties of the proposed learning, as well as by reporting the results of additional exper-
iments on face recognition and general object category recognition using the ETH80 [36] database. The main contributions of this study are as follows. First, as a method of
comparing sets of images, the benefits of canonical correlations of linear subspaces are ex-
plained and evaluated. Extensive experiments comparing canonical correlations with both
classical methods (parametric model-based and non-parametric sample-based matching)
are carried out to demonstrate these advantages empirically. A novel method of discrim-
inant analysis of canonical correlations is then proposed. A linear discriminant function
that maximizes the canonical correlations of within-class sets and minimizes the canoni-
cal correlations of between-class sets is defined, by analogy to the optimization concept
of LDA. The linear mapping is found by a novel iterative optimization algorithm. Image
sets transformed by the discriminant function are then compared by canonical correlations.
The discriminative capability of the proposed method is shown to be significantly bet-
ter than both the method [19] that simply aggregates canonical correlations and the kNN
method applied to image vectors transformed by LDA. Interestingly, the method exhibits
very good accuracy as well as other attractive properties: low computational matching cost
and simplicity of feature selection. The proposed iterative solution is further compared
with classical orthogonal subspace method (OSM) [31], also devised to improve the simple
canonical correlation method. As canonical correlations are only determined up to rota-
tions within subspaces, the canonical correlations of subspaces of between-class sets can
be minimized by orthogonalizing those subspaces. To our knowledge, the close relationship
of the orthogonal subspace method and canonical correlations has not been noted before.
It is also interesting to see that OSM has a close affinity to CMSM. The proposed method
and OSM are assessed experimentally on diverse object recognition problems: faces with
arbitrary motion under different lighting, general 3D objects observed from different view
points and the ETH80 general object category database. The new techniques are shown
to outperform the state-of-the-art methods, including OSM/CMSM and a commercial face
recognition software, in terms of accuracy and efficiency.
The chapter is organized as follows. The relevant background methods are briefly re-
viewed and discussed in Section 1.2. Section 1.3 highlights the problem of discriminant
analysis over sets and presents a novel iterative solution. In Section 1.4, the orthogonal
subspace method is explained and related to both the proposed method and the prior art.
The experimental results and their discussion are presented in Section 1.5. Conclusions are
drawn in Section 1.6.
1.2 Key Ingredients of the Proposed Learning
1.2.1 Parametric/Non-parametric Linear Discriminant Analysis
Assume that a data matrix $X = \{x_1, x_2, ..., x_M\} \in \mathbb{R}^{N \times M}$ is given, where $x_i \in \mathbb{R}^N$ is an $N$-dimensional column vector obtained by raster-scanning an image. Each vector belongs to one of the object classes denoted by $C_i$. Classical linear discriminant analysis (LDA) finds a transformation $T \in \mathbb{R}^{N \times n}$ ($n \leq N$) which maps a vector $x$ to $\tilde{x} = T^T x \in \mathbb{R}^n$ such that the transformed data have maximum separation between classes and minimum separation within classes. The between-class and within-class scatter matrices in LDA [10] are given by

$$B = \sum_c M_c (m_c - m)(m_c - m)^T, \qquad W = \sum_c \sum_{x \in C_c} (x - m_c)(x - m_c)^T,$$

where $m_c$ denotes the class mean, $m$ is the global mean of the entire sample set and $M_c$ denotes the number of samples in class $c$. With the assumption that all classes have Gaussian distributions with equal covariance matrix, $\mathrm{trace}(B)$ and $\mathrm{trace}(W)$ measure the scatter of vectors in the between-class and within-class populations respectively. A nonparametric form of these scatter matrices is also proposed in [18] with the definition of the between-class and within-class neighbours of a sample $x_i \in C_c$ given by
$$B = \frac{1}{M} \sum_{i=1}^{M} w_i\, \Delta_i^B (\Delta_i^B)^T, \qquad W = \frac{1}{M} \sum_{i=1}^{M} \Delta_i^W (\Delta_i^W)^T \tag{1.1}$$

where $\Delta_i^B = x_i - x_i^B$, $\Delta_i^W = x_i - x_i^W$, $x^B = \{x' \notin C_c \mid \|x' - x\| \leq \|z - x\|,\ \forall z \notin C_c\}$ and $x^W = \{x' \in C_c \mid \|x' - x\| \leq \|z - x\|,\ \forall z \in C_c\}$. $w_i$ is a sample weight used to deemphasize samples away from class boundaries. LDA or Nonparametric Discriminant Analysis (NDA) finds the optimal $T$ which maximizes $\mathrm{trace}(\tilde{B})$ and minimizes $\mathrm{trace}(\tilde{W})$, where $\tilde{B}, \tilde{W}$ are the scatter matrices of the transformed data. As these are explicitly represented in terms of $T$ by $\tilde{B} = T^T B T$, $\tilde{W} = T^T W T$, the solution $T$ can be easily obtained by solving the generalized eigen-problem $BT = WT\Lambda$, where $\Lambda$ is the eigenvalue matrix.
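As a minimal sketch (an illustration, not part of the original toolchain), the generalized eigen-problem can be solved directly with SciPy, assuming $B$ and $W$ are the symmetric scatter matrices defined above and $W$ is positive definite:

```python
import numpy as np
from scipy.linalg import eigh

def lda_nda_transform(B, W, n):
    # Solve B t = lambda W t for the symmetric-definite pair (B, W).
    # eigh returns eigenvalues in ascending order, so reverse and keep n.
    evals, evecs = eigh(B, W)
    return evecs[:, ::-1][:, :n]   # T in R^{N x n}: top-n discriminant directions
```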
When we regard the training data of each class as a set, LDA or NDA can be viewed
as the discriminant analysis of the vector sets based on similarity of parametric model-
based and non-parametric sample-based matching of sets respectively. In LDA, each set
(i.e. a class) is assumed to be normally distributed with equal covariance matrix and these
parametric distributions are optimally separated. On the other hand, in NDA, set similarity
is measured by the aggregated distance of a certain number of nearest neighbour samples
and the separation of the sets is optimized based on this set similarity.
It is also worth noting that the between-class and within-class scatter measures based on pairwise vector-distance in LDA/NDA can be related to pairwise vector-correlation in many pattern recognition problems. The magnitude of a data vector is often normalized so that $\|x\| = 1$. As $\mathrm{trace}(AB) = \mathrm{trace}(BA)$ for any matrices $A, B$, and $\|x\| = 1$, $\mathrm{trace}(W)$ in (1.1) equals $\frac{1}{M}\mathrm{trace}\big(\sum_i 2(1 - x_i^T x_i^W)\big)$. The problem of minimizing $\mathrm{trace}(W)$ can thus be changed into the maximization of $\mathrm{trace}(W')$ and, similarly, the maximization of $\mathrm{trace}(B)$ into the minimization of $\mathrm{trace}(B')$, where

$$B' = \sum_i x_i^T x_i^B, \qquad W' = \sum_i x_i^T x_i^W \tag{1.2}$$

and $x_i^B, x_i^W$ indicate the closest between-class and within-class vectors of a given vector $x_i$. Note the weight $w_i$ is omitted for simplicity, and the total number of training samples $M$ does not change the direction of the desired components. We now see the optimization problem of classical NDA defined by correlations of pairwise vectors. Rather than dealing with correlations of every pair of vectors, the proposed method exploits canonical correlations of pairwise linear subspaces of sets (see Section 1.3.1 for the proposed problem formulation). By resorting to canonical correlations the proposed method overcomes the shortcomings of both classical model-based and sample-based approaches in set comparison.
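To make the correlation-based reading of (1.2) concrete, here is a small hedged sketch (the names `X`, `y` are illustrative; columns of `X` are assumed unit-norm, and the weight $w_i$ is omitted as above):

```python
import numpy as np

def correlation_scatters(X, y):
    """B' and W' of Eq. (1.2): correlation of each sample with its closest
    between-class and within-class neighbour (X: N x M, unit-norm columns)."""
    y = np.asarray(y)
    C = X.T @ X                      # pairwise correlations x_i^T x_j
    dist = 2.0 - 2.0 * C             # squared distance for unit-norm vectors
    np.fill_diagonal(dist, np.inf)   # a sample is not its own neighbour
    same = y[:, None] == y[None, :]
    B_prime, W_prime = 0.0, 0.0
    for i in range(X.shape[1]):
        W_prime += C[i, np.where(same[i], dist[i], np.inf).argmin()]
        B_prime += C[i, np.where(same[i], np.inf, dist[i]).argmin()]
    return B_prime, W_prime
```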
1.2.2 Definition and Solution of Canonical Correlations
Canonical correlations, which are cosines of principal angles 0 θ
1
. . . θ
d
(π/2)
between any two d-dimensional linear subspaces L
1
and L
2
are uniquely defined as:
cos θ
i
= max
u
i
∈L
1
max
v
i
∈L
2
u
T
i
v
i
(1.3)
subject to u
T
i
u
i
= v
T
i
v
i
= 1, u
T
i
u
j
= v
T
i
v
j
= 0, i 6= j. There are various ways to solve
this problem. They are all equivalent but the Singular Value Decomposition (SVD) solu-
tion [26] is more numerically stable than the others, as the number of free parameters to esti-
mate is smaller. A comparison with the method called MSM [19] is given in Appendix .0.1.
The SVD solution is as follows. Assume that $P_1 \in \mathbb{R}^{N \times d}$ and $P_2 \in \mathbb{R}^{N \times d}$ form unitary orthogonal bases for the two linear subspaces $\mathcal{L}_1$ and $\mathcal{L}_2$. Let the SVD of $P_1^T P_2 \in \mathbb{R}^{d \times d}$ be

$$P_1^T P_2 = Q_{12} \Lambda Q_{21}^T \quad \text{s.t.} \quad \Lambda = \mathrm{diag}(\sigma_1, ..., \sigma_d) \tag{1.4}$$

where $Q_{12}^T Q_{12} = Q_{21}^T Q_{21} = Q_{12} Q_{12}^T = Q_{21} Q_{21}^T = I_d$. The canonical correlations are the singular values, and the associated canonical vectors (whose correlations are the canonical correlations) are given by

$$U = P_1 Q_{12} = [u_1, ..., u_d], \qquad V = P_2 Q_{21} = [v_1, ..., v_d] \tag{1.5}$$
Canonical vectors are orthonormal in each subspace, and $Q_{12}, Q_{21}$ can be seen as rotation matrices of $P_1, P_2$. The concept is illustrated in Figure 1.2.
Intuitively, the first canonical correlation tells us how close the closest vectors of the two subspaces are. Similarly, the higher canonical correlations tell us about the proximity of vectors of the two subspaces in the other dimensions (perpendicular to the previous ones) of the embedding space. Note that a set of high-dimensional pattern vectors can usually be well confined to a low-dimensional subspace which retains most of the energy of the set. See Figure 1.3 for the canonical vectors computed from the sample image sets given in Figure 1.1. The common modes (views and/or illuminations) of the two different sets are well captured by the first few canonical vectors found. Each canonical vector of one set is very similar to the corresponding canonical vector of the other set despite the data changes across the sets. The canonical vectors of different dimensions represent different variations of the patterns. Compared with the parametric distribution-based matching, this concept is more flexible, as it effectively places a uniform prior over the subspace of possible pattern variations. Compared with the NN matching of samples, this approach is much more stable, as the samples are confined to a certain subspace. The complexity of the SVD of a $d \times d$ matrix is also very low.

Figure 1.2: Conceptual illustration of canonical correlations. Two sets are represented as linear subspaces, which are planes here. Canonical vectors on the planes are found to yield maximum correlations. In a two-dimensional subspace case, the second canonical vectors $u_2, v_2$ are determined to be perpendicular to the first ones.

Figure 1.3: Principal components vs. canonical vectors. (a) The first 5 principal components computed from the four image sets shown in Figure 1.1. The principal components of the different image sets are significantly different. (b) The first 5 canonical vectors of the four image sets, computed for each pair of the two image sets of the same object. Every pair of canonical vectors (each column) of $U, V$ well captures the common modes (views and illuminations) of the two sets containing the same object. The pairwise canonical vectors are quite similar. The canonical vectors of different dimensions $u_1, ..., u_5$ and $v_1, ..., v_5$ represent different pattern variations, e.g. in pose or lighting.
1.3 Discriminant-analysis of Canonical Correlations (DCC)
As shown in Figure 1.3, canonical correlations of two different image sets of the same object acquired in different conditions proved to be a promising measure of the similarity of the two sets. This suggests that, by matching based on image sets, one could achieve a robust solution to the problem of object recognition even when the observation data is subject to extensive variations. However, it is further required to suppress the contribution to similarity of canonical vectors of two image sets that is due to common environmental conditions. The optimal discriminant function is proposed to transform image sets so that canonical correlations of within-class sets are maximized while canonical correlations of between-class sets are minimized in the transformed data space.
1.3.1 Problem Formulation
Assume $m$ sets of vectors are given as $\{X_1, ..., X_m\}$, where $X_i$ describes a data matrix of the $i$-th set containing observation vectors (or images) in its columns. Each set belongs to one of the object classes denoted by $C_i$. A $d$-dimensional linear subspace of the $i$-th set is represented by an orthonormal basis matrix $P_i \in \mathbb{R}^{N \times d}$ s.t. $X_i X_i^T \simeq P_i \Lambda_i P_i^T$, where $\Lambda_i, P_i$ are the eigenvalue and eigenvector matrices of the $d$ largest eigenvalues respectively and $N$ denotes the vector dimension. We define a transformation matrix $T = [t_1, ..., t_n] \in \mathbb{R}^{N \times n}$, where $n \leq N$, $\|t_i\| = 1$, s.t. $T: X_i \rightarrow Y_i = T^T X_i$. The matrix $T$ transforms images so that the transformed image sets are class-wise more discriminative using canonical correlations.
Representation. Orthonormal basis matrices of the subspaces of the transformed data are obtained from the previous matrix factorization of $X_i X_i^T$:

$$Y_i Y_i^T = (T^T X_i)(T^T X_i)^T \simeq (T^T P_i) \Lambda_i (T^T P_i)^T \tag{1.6}$$

Except when $T$ is an orthogonal matrix, $T^T P_i$ is not generally an orthonormal basis matrix. Note that canonical correlations are only defined for orthonormal basis matrices of subspaces. Any orthonormal components of $T^T P_i$, now defined by $T^T P'_i$, can represent an orthonormal basis matrix of the transformed data. See Section 1.3.2 for details.
Set Similarity. The similarity of any two transformed data sets, represented by $T^T P'_i$ and $T^T P'_j$, is defined as the sum of canonical correlations by

$$F_{ij} = \max_{Q_{ij}, Q_{ji}} \mathrm{tr}(M_{ij}), \tag{1.7}$$

$$M_{ij} = Q_{ij}^T {P'_i}^T T T^T P'_j Q_{ji} \quad \text{or} \quad T^T P'_j Q_{ji} Q_{ij}^T {P'_i}^T T, \tag{1.8}$$

as $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for any matrices $A, B$. $Q_{ij}, Q_{ji}$ are the rotation matrices defined, as in the SVD solution of canonical correlations (1.4), for the two transformed subspaces.
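As an illustration, once the bases have been orthonormalized (the QR-based normalization of Section 1.3.2), $F_{ij}$ of (1.7) reduces to one small SVD; the following sketch assumes exactly that setting:

```python
import numpy as np

def orthonormalize(T, Pi):
    # P'_i of Section 1.3.2: QR-decompose T^T P_i and absorb the triangular factor
    _, R = np.linalg.qr(T.T @ Pi)
    return Pi @ np.linalg.inv(R)

def set_similarity(T, Pi, Pj):
    # F_ij of Eq. (1.7): sum of canonical correlations of the transformed sets,
    # i.e. the singular values of (T^T P'_i)^T (T^T P'_j)
    Pi_n, Pj_n = orthonormalize(T, Pi), orthonormalize(T, Pj)
    sigma = np.linalg.svd(Pi_n.T @ T @ T.T @ Pj_n, compute_uv=False)
    return sigma.sum()
```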
Figure 1.4: Conceptual illustration of the proposed method. Three sets are drawn, represented by the basis vector matrices $P_i, i = 1, ..., 3$. We assume that the two sets $P_1, P_2$ are within-class sets and the third one comes from another class. The canonical vectors $P_i Q_{ij}, i = 1, ..., 3, j \neq i$ are equivalent to the basis vectors $P_i$ in this simple drawing, where each set occupies a one-dimensional space. Basis vectors are projected on the discriminative subspace by $T$ and normalized such that $\|T^T P'\| = 1$. Then, the principal angle of the within-class sets, $\theta$, becomes zero and the angles of the between-class sets, $\phi_1, \phi_2$, are maximized.
Discriminant Function. The discriminative function (or matrix) $T$ is found so as to maximize the similarities of any pair of within-class sets while minimizing the similarities of pairwise sets of different classes. The matrix $T$ is defined by the objective function $J$ as

$$T = \arg\max_T J = \arg\max_T \frac{\sum_{i=1}^{m} \sum_{k \in W_i} F_{ik}}{\sum_{i=1}^{m} \sum_{l \in B_i} F_{il}} \tag{1.9}$$

where the index sets are defined as $W_i = \{j \mid X_j \in C_i\}$ and $B_i = \{j \mid X_j \notin C_i\}$. That is, the two index sets $W_i, B_i$ denote, respectively, the within-class and between-class sets for a given set of class $i$, by analogy to [18]. See Figure 1.4 for the concept of the proposed problem. In the discriminative subspace represented by $T$, canonical correlations of within-class sets are to be maximized and canonical correlations of between-class sets to be minimized.
1.3.2 Iterative Learning
The optimization problem of $T$ involves the variables $Q$ and $P'$ as well as $T$. As the other variables are not explicitly represented in terms of $T$, a closed-form solution for $T$ is hard to find. We propose an iterative optimization algorithm. Specifically, we compute an optimal solution for one of the three variables at a time, fixing the other two, and repeat this for a certain number of iterations. The proposed iterative optimization thus comprises three main steps: normalization of $P$, optimization of the matrices $Q$, and optimization of $T$. Each step is explained below:
Normalization. The matrix $P_i$ is normalized to $P'_i$ for a fixed $T$ so that the columns of $T^T P'_i$ are orthonormal. QR-decomposition of $T^T P_i$ is performed s.t. $T^T P_i = \Phi_i \Delta_i$, where $\Phi_i \in \mathbb{R}^{N \times d}$ is the orthonormal matrix composed of the first $d$ columns and $\Delta_i \in \mathbb{R}^{d \times d}$ is the $d \times d$ invertible upper-triangular matrix. From (1.6), $Y_i = T^T P_i \Lambda_i = \Phi_i \Delta_i \Lambda_i$. As $\Delta_i \Lambda_i$ is still an upper-triangular matrix, $\Phi_i$ can represent an orthonormal basis matrix of the transformed data $Y_i$. As $\Delta_i$ is invertible,

$$\Phi_i = T^T (P_i \Delta_i^{-1}) \quad \Rightarrow \quad P'_i = P_i \Delta_i^{-1}. \tag{1.10}$$
Computation of rotation matrices Q. The rotation matrices $Q_{ij}$ for every pair $i, j$ are obtained for fixed $T$ and $P'_i$. The correlation matrix $M_{ij}$ defined in the left-hand form of (1.8) can be conveniently used for the optimization of $Q_{ij}$, as it has $Q_{ij}$ outside of the matrix product. Let the SVD of ${P'_i}^T T T^T P'_j$ be

$${P'_i}^T T T^T P'_j = Q_{ij} \Lambda Q_{ji}^T \tag{1.11}$$

where $\Lambda$ is the singular value matrix and $Q_{ij}, Q_{ji}$ are orthogonal rotation matrices. Note that the matrices which are singular-value decomposed have only $d^2$ elements.
Computation of T. The optimal discriminant transformation matrix $T$ is computed for given $P'_i$ and $Q_{ij}$ by using the definition of $M_{ij}$ in the right-hand form of (1.8) and (1.9). With $T$ on the outside of the matrix product $M_{ij}$, it is convenient to solve for. The discriminative function is found by

$$T = \arg\max_T\ \mathrm{tr}(T^T S_b T)\,/\,\mathrm{tr}(T^T S_w T) \tag{1.12}$$

$$S_b = \sum_{i=1}^{m} \sum_{l \in B_i} (P'_l Q_{li} - P'_i Q_{il})(P'_l Q_{li} - P'_i Q_{il})^T,$$

$$S_w = \sum_{i=1}^{m} \sum_{k \in W_i} (P'_k Q_{ki} - P'_i Q_{ik})(P'_k Q_{ki} - P'_i Q_{ik})^T,$$

where $B_i = \{j \mid X_j \notin C_i\}$ and $W_i = \{j \mid X_j \in C_i\}$. Note that no loss of generality is incurred from (1.9), as

$$A^T B = I - \tfrac{1}{2}(A - B)^T (A - B),$$

where $A = T^T P'_i Q_{ij}$ and $B = T^T P'_j Q_{ji}$. The solution $\{t_i\}_{i=1}^{n}$ is obtained by solving the generalized eigenvalue problem $S_b t = \lambda S_w t$. When $S_w$ is non-singular, the optimal $T$ is computed by eigen-decomposition of $S_w^{-1} S_b$. Note also that the proposed learning can avoid a singular case of $S_w$ by pre-applying PCA to the data, similarly to the Fisherface method [10], and can be sped up by using a small number of nearest neighbouring sets in $B_i, W_i$, similarly to [18]. Canonical correlation analysis for multiple sets [37] is also noteworthy here with regard to fast learning. It may help speed up the learning by reformulating the between-class and within-class scatter matrices in (1.12) via the canonical correlation analysis of multiple sets, thus avoiding the computation of the rotation matrices of every pair of image sets in the iterations.

Algorithm 1. Discriminant-analysis of Canonical Correlations (DCC)
Input: all $P_i \in \mathbb{R}^{N \times d}$
Output: $T \in \mathbb{R}^{N \times n}$
1. $T \leftarrow I_N$
2. Iterate steps 3-6:
3. For all $i$, do QR-decomposition: $T^T P_i = \Phi_i \Delta_i$, $P'_i = P_i \Delta_i^{-1}$
4. For every pair $i, j$, do SVD: ${P'_i}^T T T^T P'_j = Q_{ij} \Lambda Q_{ji}^T$
5. Compute $S_b = \sum_{i=1}^{m} \sum_{l \in B_i} (P'_l Q_{li} - P'_i Q_{il})(P'_l Q_{li} - P'_i Q_{il})^T$ and
   $S_w = \sum_{i=1}^{m} \sum_{k \in W_i} (P'_k Q_{ki} - P'_i Q_{ik})(P'_k Q_{ki} - P'_i Q_{ik})^T$
6. Compute the eigenvectors $\{t_i\}_{i=1}^{N}$ of $S_w^{-1} S_b$, $T \leftarrow [t_1, ..., t_N]$
7. End
8. $T \leftarrow [t_1, ..., t_n]$

Figure 1.5: Proposed iterative algorithm for finding T, which maximizes class separation in terms of canonical correlations.
With the identity matrix $I \in \mathbb{R}^{N \times N}$ as the initial value of $T$, the algorithm is iterated until it converges to a stable point. Pseudo-code for the learning is given in Algorithm 1. Once a $T$ that maximizes the canonical correlations of within-class sets and minimizes those of between-class sets in the training data is found, a comparison of any two novel sets is achieved by transforming them by $T$ and then computing canonical correlations (see (1.7)).
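A compact NumPy sketch of Algorithm 1 is given below. It is a hedged reading of the pseudo-code above, not the authors' released code: the small ridge added to $S_w$ stands in for the PCA pre-processing suggested for the singular case, and all names are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def dcc_learn(P, labels, n, n_iter=5, ridge=1e-6):
    """Sketch of Algorithm 1 (DCC). P: list of m orthonormal N x d basis
    matrices, labels: class of each set. Returns the N x n transform T."""
    N, m = P[0].shape[0], len(P)
    T = np.eye(N)                                   # step 1
    for _ in range(n_iter):                         # step 2
        # Step 3: P'_i = P_i Delta_i^{-1} via QR of T^T P_i
        Pn = [Pi @ np.linalg.inv(np.linalg.qr(T.T @ Pi)[1]) for Pi in P]
        Sb, Sw = np.zeros((N, N)), np.zeros((N, N))
        G = T @ T.T
        for i in range(m):
            for j in range(m):
                if i == j:
                    continue
                # Step 4: SVD of P'_i^T T T^T P'_j gives Q_ij and Q_ji
                Qij, _, QjiT = np.linalg.svd(Pn[i].T @ G @ Pn[j])
                D = Pn[j] @ QjiT.T - Pn[i] @ Qij
                # Step 5: accumulate within/between-class scatters
                if labels[i] == labels[j]:
                    Sw += D @ D.T
                else:
                    Sb += D @ D.T
        # Step 6: eigenvectors of Sw^{-1} Sb (ridge guards a singular Sw)
        _, evecs = eigh(Sb, Sw + ridge * np.eye(N))
        T = evecs[:, ::-1]                          # descending eigenvalues
    return T[:, :n]                                 # step 8
```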
1.3.3 Discussion about Convergence
Although we do not provide a proof of convergence or uniqueness of the proposed optimization process, its convergence to a global maximum was confirmed experimentally. See Figure 1.6 for examples of the iterative learning, each using a different training data set. The value of the objective function $J$ in all cases becomes stable after the first few iterations, starting with the initial value $T = I$. This fast and stable convergence is very favorable for keeping the learning cost low. Furthermore, as shown at the bottom right of Figure 1.6, it was observed that the proposed algorithm converged to the same point irrespective of the initial value of $T$. These results are indicative of the defined criterion being a quadratic convex function with respect to the joint set of variables as well as each individual variable, as argued in [32, 33].
For all of the experiments in Section 1.5, the number of iterations was fixed at 5. The proposed learning took about 50 seconds for the face experiments on a Pentium IV PC using non-optimized Matlab code, while the OSM/CMSM methods took around 5 seconds. Note the learning is performed once, in an off-line manner. On-line matching by the three recognition methods is highly time-efficient. See the experimental section for more information about the time complexity of the methods.

Figure 1.6: Convergence characteristics of the optimization. The cost $J$ for a given training set is shown as a function of the number of iterations. The bottom right shows the convergence to a unique maximum with different random initializations of $T$.
1.4 Alternative Methods of Discriminative Canonical Correlations for Set Classification
1.4.1 Orthogonal Subspace Method (OSM)
Orthogonality of two subspaces means that any vector of one subspace is orthogonal to
any vector of the other subspace [31]. This requirement is equivalent to that of each basis
vector of one subspace being orthogonal to each basis vector of the other. When recalling
that canonical correlations are defined as maximal correlations between any two vectors
of two subspaces, as given in (1.3), it is clear that the canonical correlations of any two orthogonal subspaces are zero. Thus, measuring canonical correlations of class-specific orthogonal subspaces might serve as a basis for classifying image sets.
Let us assume that the subspaces of the between-class sets $B_i = \{j \mid X_j \notin C_i\}$ of a given data set $X_i$ are orthogonal to the subspace of the set $X_i$. If the subspaces are orthogonal, all canonical correlations of those subspaces would also be zero, as

$$P_i^T P_{l \in B_i} = O \in \mathbb{R}^{d \times d} \ \Rightarrow\ \mathrm{trace}(Q_{il}^T P_i^T P_l Q_{li}) = 0 \tag{1.13}$$

where $O$ is a zero matrix and $P_i$ is a basis matrix of the set $X_i$. The classical orthogonal subspace method (OSM) [31] has been developed as a method designed to obtain class-specific orthogonal subspaces. The OSM finds the subspace, represented by the basis matrix denoted $P_0$, in which data sets of different classes are orthogonal. See Appendix .0.2 for the details of the OSM solution. By orthogonalizing the subspaces of between-class sets, the discrimination of image sets in terms of canonical correlations is achieved.
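For illustration only, one classical way to realize such orthogonalization is a Fukunaga-Koontz-style whitening of the total projection matrix. This sketch is an assumption on our part, since the exact OSM construction lives in Appendix .0.2, which is not reproduced in this section:

```python
import numpy as np

def osm_transform(P_list, eps=1e-10):
    # Whiten G = sum_i P_i P_i^T so that, after applying P0^T, the subspaces
    # of different classes become as close to orthogonal as the data allow.
    G = sum(Pi @ Pi.T for Pi in P_list)
    evals, V = np.linalg.eigh(G)
    keep = evals > eps                        # discard the null space of G
    P0 = V[:, keep] * (evals[keep] ** -0.5)   # columns scaled by lambda^{-1/2}
    return P0

# The transformed class bases P0^T P_i should be re-orthonormalized (e.g. by
# QR) before measuring canonical correlations between them.
```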
Comparison with the Proposed Solution, DCC. Note that the orthogonality of subspaces is a restrictive condition, at least when the number of classes is large. It is often the case that the OSM subspaces represented by $P_i$ and $P_{l \in B_i}$ are correlated. If $P_i^T P_l$ has non-zero values, the canonical correlations could be much greater than zero, as

$$q_{il}^T P_i^T P_l q_{li} \gg 0 \tag{1.14}$$

where $q$ is a column of the rotation matrix $Q$ in the definition of canonical correlations. Generally, the problem of minimizing the correlations of the basis matrices, $P_i^T P_l$, in OSM is not equivalent to the proposed problem formulation, where the canonical correlations $q_{il}^T P_i^T P_l q_{li}$ are minimized.
Note again that the principal components of $P$ are sensitive to data changes, whereas the canonical vectors $PQ$ are consistent, as shown in Figure 1.3. Thus, the proposed optimization by canonical correlations is expected to be more robust to possible data changes than the OSM solution based on $P$. Moreover, the orthogonal subspace method does not ex-
plicitly attempt to maximize canonical correlations of the within-class sets. It combines all
examples of a class together. See Appendix .0.2 for details. The better accuracy of DCC
over OSM was evident when the number of training classes was large or the conditions for
obtaining the training and test data were different (See the experimental section).
1.4.2 Constrained Mutual Subspace Method (CMSM)
It is worth noting that CMSM [21, 22] can be seen to be closely related to the orthog-
onal subspace method. For the details of CMSM, refer to Appendix .0.3. CMSM finds
the constrained subspace where the total projection operators have small variances. Each
class is represented by a subspace which maximally represents the class data variances,
then the class subspace is projected into the constrained subspace. The projected data
subspace compromises the maximum representation of each class and the minimum repre-
sentation of a mixture of all the other classes. This is similar in concept to the orthogonal
subspace method explained in Appendix .0.2. Both methods try to minimize the correla-
tion of between-class subspaces defined by $P_i^T P_{l \in B_i}$. However, the dimensionality of the
constrained subspace of CMSM should be optimised for each application. If the dimen-
sionality is too low, the constrained subspace will be a null space. In the opposite case,
the constrained subspace simply retains all the energy of the original data and thus can
not play a role as a discriminant function. This dependence of CMSM on the parame-
ter (dimensionality) selection makes it rather empirical. In contrast, there is no need to
choose any subspace from the discriminative space represented by the rotation matrix $P_0$ in the orthogonal subspace method. A full dimension of the matrix can simply be adopted.
Note the proposed method, DCC, also exhibited insensitivity to dimensionality, thus being
practically, as well as theoretically, very appealing (See the experimental section).
1.5 Experimental Results and Discussion
The proposed method (code available at http://mi.eng.cam.ac.uk/tkk22) is evaluated on various object and object category recognition problems: face image sets with arbitrary motion captured under different illuminations, image sets of five hundred general objects taken at different views, and 8 general object categories, each of which contains several different objects. The task in all of the experiments is to classify an unknown set of vectors to one of the training classes, each also represented by vector sets.
1.5.1 Database of Face Image Sets
We have collected a database called the Cambridge-Toshiba Face Video Database, with 100 individuals of varying age and ethnicity and equally represented genders; examples are shown in Figure 1.7. For each person, 14 (7 illuminations × two recordings) video sequences of
the person in arbitrary motion were collected. Each sequence was recorded in a different
illumination setting for 10s at 10fps and at 320×240 pixel resolution. See Figure 1.8 for
samples from an original image sequence and seven different lightings. Following auto-
matic localization using a cascaded face detector [27] and cropping to a uniform scale of
20×20 pixels, images of faces were histogram equalized. Note that the face localization
was performed automatically on the images of uncontrolled quality. Thus it was not as
accurate as any conventional face registration with either manual or automatic eye posi-
tions performed on high quality face images. Our experimental conditions are closer to the
conditions given for typical surveillance systems.
1.5.2 Comparative Methods and Parameter Setting
We compared the performance of:

- the KL-Divergence algorithm (KLD) [6] as a representative parametric model-based method,
- non-parametric sample-based methods such as k-Nearest Neighbour (kNN) and Hausdorff Distance ($d(S_1, S_2) = \min_{x_1 \in S_1} \max_{x_2 \in S_2} d(x_1, x_2)$) [9] of images transformed by (i) PCA and (ii) LDA [10] subspaces, which are estimated from training data similarly to [8],
- Nearest Neighbour (NN) by FaceIt (v.5.0), the commercial face recognition system from Identix, which ranked top overall in the Face Recognition Vendor Tests 2000 and 2002 [34, 35],
- the Mutual Subspace Method (MSM) [19], which is equivalent to a simple aggregation of canonical correlations,
- Constrained MSM (CMSM) [21, 22], used in a state-of-the-art commercial system called FacePass [29],
- the Orthogonal Subspace Method (OSM) [31],
- and the proposed iterative discriminative learning, DCC.

Figure 1.7: Examples of the Face Video Database. The data set contains 100 face classes of varying age, ethnicity, and gender. Each class has about 1400 images from the 14 image sequences captured under 7 different lighting conditions.

Figure 1.8: Example images of the face data sets. (a) Frames of a typical face video sequence with automatic face detection. (b) Face prototypes of the 7 different illuminations.
To compare different algorithms, important parameters of each method were adjusted
and the optimal ones in terms of test identification rates were selected. In KLD, 96% of data
energy was explained by the principal subspace of training data used [6]. In kNN methods,
the dimension of PCA subspace was chosen to be 150, which represents more than 98%
of training data energy (Note that removing the first 3 components improved the accuracy
in the face recognition experiment as similarly observed in [10]). The best dimension of
LDA subspace was also found at around 150. The number of nearest neighbors used was
chosen from one to ten. In MSM/CMSM/OSM/DCC, the dimension of the linear subspace
of each image set represented 98% of data energy of the set, which was around 10. PCA
was performed for each set in the MSM/CMSM/DCC methods.
Dimension Selection of the Discriminative Subspaces in CMSM/OSM/DCC. As shown
in Figure 1.9 (a), CMSM exhibited a sharp peak in the relationship between accuracy and the dimensionality of the constrained subspace, whereas the proposed method, DCC, provided constant identification rates regardless of the dimensionality of $T$ beyond a certain point.
The best dimension of the constrained subspace of CMSM was found to be at around 360
and was fixed. For DCC, we fixed the dimension at 150 for all experiments (the full dimen-
sion can also be conveniently exploited without any feature selection). The full dimension
was also used for the rotation matrix $P_0$ in OSM. Note that the proposed method DCC
and OSM do not require any elaborate feature selection and this behaviour of DCC/OSM
is highly attractive from the practical point of view, compared to CMSM. Without feature
selection the accuracy of CMSM in the full space drops dramatically to the level equiva-
lent to that of MSM, which is a simple aggregation of canonical correlations without any
discriminative transformation.
Number of Canonical Correlations. Figure 1.9 (b) shows the accuracy of MSM/DCC
according to the number of canonical correlations used. Basically, this parameter does not
affect the accuracy of the methods as much as the dimension of the discriminative subspace,
as shown in Figure 1.9 (a). The proposed method, DCC, was shown to be less sensitive to
this parameter than MSM. The number of canonical correlations was fixed to be the same
(i.e. this was set as the dimension of linear subspaces of image sets) for all the methods,
MSM/CMSM/OSM/DCC.
1.5.3 Face Recognition Experiments
Training of all the algorithms was performed with data sequences acquired in a single il-
lumination setting and testing with a single other setting. We used 18 randomly selected
training/test combinations of the sequences for reporting identification rates. The perfor-
mance of the evaluated recognition algorithms is shown in Figure 1.10 and Table 1.1. The 18 experiments were divided into two parts according to the degree of difference between the training and the test data, as measured by the KL-Divergence between the training and test data. Figure 1.10 shows the cumulative recognition rates for the averaged results of all 18 experiments and Table 1.1 shows the results separately for the first (easier) and the second (more difficult) parts of the experiments.

Figure 1.9: (a) The effect of the dimensionality of the discriminative subspace on the proposed iterative method (DCC) and CMSM. The accuracy of CMSM at 400 is equivalent to that of MSM, a simple aggregation of canonical correlations. (b) The effect of the number of canonical correlations on DCC and MSM.

Figure 1.10: Cumulative recognition plot for the MSM/kNN-LDA/CMSM/OSM/DCC methods.
In this experiment, all training samples of a class were drawn from a single video se-
quence of arbitrary head movement, so they were randomly divided into two sets for the
within-class sets in the proposed learning. Note that the proposed method with this random
partition still worked well. The test recognition rates changed by less than 1-2 % for the dif-
ferent trials of random partitioning. If samples of a class can be partitioned according to the
data semantics, the concept of the within-class sets would be more useful and reasonable,
as is the case in the experiments that follow.
Table 1.1: Evaluation results. The mean and standard deviation of recognition rates of different methods, shown separately for the first (easier) and the second (more difficult) parts of the experiments.

            KLD        HD-PCA     1NN-PCA    10NN-PCA   FaceIt S/W
1st half    0.49±0.14  0.60±0.07  0.95±0.03  0.96±0.03  0.90±0.09
2nd half    0.24±0.13  0.47±0.09  0.71±0.20  0.71±0.21  0.86±0.05

            10NN-LDA   MSM        CMSM       OSM        DCC
1st half    0.98±0.01  0.94±0.03  0.98±0.01  0.98±0.01  0.98±0.01
2nd half    0.87±0.07  0.91±0.02  0.93±0.06  0.94±0.06  0.95±0.04

In Table 1.1, most of the methods generally had lower recognition rates for the experiments with larger KL-Divergence between the training and test data. The KLD method achieved by far the worst recognition rate. Considering that the illumination conditions varied across the data and that the face motion was largely unconstrained, the distribution of
within-class face patterns was very broad, making this result unsurprising. In the methods
of non-parametric sample-based matching, the Hausdorff-Distance (HD) measure provided
far poorer results than the k-Nearest Neighbors (kNN) methods defined in the PCA sub-
space. 10NN-PCA yielded the best accuracy of the sample-based methods defined in the
PCA subspace, which is worse than MSM by 8.6% on average. Its performance greatly var-
ied across the experiments. Note that MSM showed robust performance with a large margin
over kNN-PCA method under the different experimental conditions. The improvement of
MSM over both KLD and HD/kNN-PCA methods was very impressive. The benefits of us-
ing canonical correlations over both classical approaches for set classification, which have
been explained throughout the previous sections, were confirmed.
The commercial face recognition software FaceIt (v.5.0) yielded performance which, on average, lies between those of the kNN-PCA and kNN-LDA methods. Although the
NN method using FaceIt is based on individual sample matching, it delivered more robust
performance for the data changes (the difference in accuracy between the first half and
the second half is not as large as those of kNN-PCA/LDA methods). This is reasonable,
considering that FaceIt was trained independently with the training images used for other
methods.
Table 1.1 also gives a comparison for the methods combined with discriminative learn-
ing. kNN-LDA yielded a big improvement over kNN-PCA but the accuracy of the method
again greatly varied across the experiments. Note that 10NN-LDA outperformed MSM for
similar conditions between the training and test sets, but it became noticeably inferior as
the conditions changed. It delivered similar accuracy to MSM on average, which is also
shown in Figure 1.10. The proposed method DCC, CMSM and OSM constantly provided
a significant improvement over both MSM and the kNN-LDA method, as shown in Table 1.1 as
well as in Figure 1.10.
Further Comparison of DCC, OSM and CMSM. Note that CMSM/OSM can be considered as measuring correlation between subspaces defined by the basis matrix $P$ in a simple way, which is different from the canonical correlations defined by $PQ$. In spite of this difference, the accuracy of CMSM/OSM was impressive in this experiment. As explained above, when an ideal solution of CMSM/OSM exists and $Q$ only provides a rotation within the subspace, the solution of CMSM/OSM can be close to that of the proposed method DCC. However, if class subspaces cannot be made orthogonal to each other, then the direct optimization of canonical correlations offered by DCC is preferred. The novel data space $PQ$ is robust to environmental changes, as shown in Figure 1.3, making the solution of DCC, which is obtained by directly optimizing the $PQ$ space, also robust. Note that the proposed method was better than CMSM/OSM for the second half of the experiments in Table 1.1 (although the margin there is small).

Figure 1.11: Confusion matrices for the MSM/CMSM/OSM/DCC methods: (a) MSM, (b) CMSM, (c) OSM, (d) DCC. The diagonal and off-diagonal values in the DCC confusion matrix can be distinguished much better.
The differences of the three methods are clearly apparent from the associated confusion
matrices of the training data. We trained the three methods using both training and test sets
of the worst experimental case for the methods (See the last two of Figure 1.8 (b)), and
compared their confusion matrices of the total class data with that of MSM, as shown in
Figure 1.11. Both OSM and CMSM considerably improved the ability of class discrimi-
nation over MSM, but they were still far from optimal compared with DCC for the given
data. As discussed above, both the proposed method, DCC, and OSM are preferable to
CMSM as they do not involve the selection of dimensionality of the discriminative sub-
spaces. While the best dimension for CMSM had to be identified with reference to the test
results, the full dimension of the discriminative space can simply be adopted for any new
test data in the DCC and OSM methods.
We designed another face experiment with more face image sets from the Cambridge-Toshiba face video database. The database involves two sets of videos acquired at different times, each of which consists of seven different illumination sequences for each person. We used one time set for training and the other set for testing, thus having more variation between the training and testing data (see Figure 1.12 for an example of the two different time sets acquired in the same illumination). Note that the training and testing sets in the previous experimental setting were drawn from the same time set. In this experiment, using a single illumination set for training, the full 49 combinations of the different lighting settings were exploited. We also increased the number of image sets per class for training.
Figure 1.12: Example of the two time sets (top and bottom) of a person acquired in a single
lighting setting. They contain significant variations in pose and expression.
Figure 1.13: Recognition rates of the CMSM/OSM/DCC methods when using single, double and triple image sets in training.
We randomly drew a combination of different illumination sequences for training and used
all 7 illumination sequences for testing. 10-fold cross validation was performed for these
experiments. Figure 1.13 shows the mean and standard deviations of recognition rates of
all experiments. The proposed method significantly outperformed OSM/CMSM methods
when the test sets were very different from the training sets. These results are consistent
with those of the methods in the 2nd part of the experiment in Table 1.1 (but the difference
is much clearer here). Overall, all three methods improved their accuracy by using more
image sets in training.
Matching complexity. The complexity of the methods based on canonical correlations (MSM/CMSM/OSM/DCC), $O(d^3)$, is much lower than that of the sample-based matching methods (kNN-PCA/LDA), $O(m^2 n)$, where $d$ is the subspace dimension of each set, $m$ is the number of samples in each set and $n$ is the dimensionality of the feature vectors, since $d \ll m, n$. In the face experiments, the unit matching time for comparing two image sets which contain about 100 images each is 0.004 seconds for the canonical-correlation-based methods and 1.1 seconds for the kNN method.
1.5.4 Experiment on Large Scale General Object Database
The ALOI database [28], with 500 general object categories taken at different viewing angles, provides another experimental data set for the proposed method. Object images were segmented from the simple background and scaled to 20×20 pixel size. A training set and five test sets were set up with different viewing angles of the objects, as shown in Figure 1.14 (a) and (b). Note that the pose of all the images in the test sets differed by at least 5 degrees from every sample of the training set. The methods of MSM, kNN-LDA, CMSM and OSM were compared with the proposed method in terms of identification rate. The parameters were selected in the same way as in the face recognition experiment. The dimension of the linear subspace of each image set was fixed to 5, representing more than 98% of the data energy in the MSM/CMSM/OSM/DCC methods. The best number of nearest neighbors in the kNN-LDA method was found to be five.

Figure 1.14: ALOI experiment. (a) The training set consists of 18 images taken at 10-degree intervals. (b) Two test sets are shown. Each test set contains 9 images at 10-degree intervals, different from the training set.

Figure 1.15: Identification rates for the 5 different test sets. The object viewing angles of the test sets differ from those of the training set to a varying extent.
Judging from Figure 1.15 and Figure 1.16, kNN-LDA yielded better accuracy than
MSM in all the cases. This contrasted with the findings in the face recognition experi-
ment. This may have been caused by the somewhat artificial experimental setting. The
nearest neighbours of the training and test set differed only slightly due to the five degree
pose difference. Please note that the two sets had no changes in lighting and had accu-
rate localization of the objects as well. Further note that the accuracy of MSM could be
improved by using only the first canonical correlation, similarly to the results shown in Fig-
ure 1.9 (b). Here again, CMSM, OSM and the proposed method DCC were substantially
superior to MSM. Overall, the accuracy of CMSM/OSM was similar to that of the kNN-LDA method, as shown in Figure 1.16. The proposed iterative method, DCC, constantly outperformed all the other methods, including OSM/CMSM as well as kNN-LDA. Please note this experiment involved a larger number of classes compared with the face experiments. Furthermore, the sets of images of the training classes had quite different pose distributions from those of the test sets. The accuracy of the CMSM/OSM methods might be degraded by all these factors, whereas the proposed method remains robust.

Figure 1.16: Cumulative recognition rates of the MSM/kNN-LDA/CMSM/OSM/DCC methods for the ALOI experiment.
1.5.5 Object Category Recognition using ETH80 database
An interesting problem of object category recognition was addressed using the public ETH80 database. As shown in Figure 1.17, there are 8 categories, each containing 10 objects, with 41 images of different views per object. More details about the database can be found in [36]. We randomly partitioned the 10 objects into two sets of five objects for training and testing. In Experiment 1, we used all 41 view images of the objects. In Experiment 2, we used all 41 views for training but a random subset of 15 view images for testing. 10-fold cross-validation was carried out for both experiments. Parameters such as the dimension of the linear subspaces, the number of principal angles and the number of nearest neighbors were selected as in the previous experiments. The dimension of the constrained subspace of CMSM was again optimised for its best accuracy.
From Table 1.2, it is worth noting that the accuracy of the kNN-PCA method is similar (but slightly inferior) to that of the PCA method reported in [36]. Note that we used only 5 objects per category, in contrast to [36], where 9 objects were used for training. The recognition rates for individual object categories also showed behaviour similar to that of [36].
As shown in Table 1.2, the kNN methods were much inferior to the methods based
on canonical correlations. The sample-based matching method was very sensitive to the
variations in different objects of the same categories, failing in object categorization. The
methods using canonical correlations provided much more accurate results. The proposed
method (DCC) delivered the best accuracy over all tested methods. The improvement of
Figure 1.17: The object category database (ETH80) contains (a) 8 different object categories and (b) 10 different objects for each category.
Table 1.2: Evaluation results of object categorization. The mean recognition rate and its standard deviation for all experiments.

        kNN-PCA      kNN-LDA      MSM          CMSM         OSM          DCC
exp.1   0.762±0.21   0.752±0.17   0.865±0.13   0.897±0.10   0.905±0.09   0.917±0.09
exp.2   -            -            -            0.852±0.21   0.865±0.18   0.912±0.13
The improvement of DCC over CMSM/OSM was greater in the second experiment, where only a subset of the images of each object was used in testing, making the test sets very different from the training sets. The major principal components of the image sets are highly sensitive to variations in pose: the accuracy of the CMSM/OSM methods decreased considerably in the presence of this variation, while the DCC method maintained almost the same accuracy.
1.6 Conclusions
A novel discriminative learning framework has been proposed for set classification based on canonical correlations. It is based on iterative learning, which is both theoretically and practically appealing. The proposed method has been evaluated on various object and object category recognition problems. The new technique facilitates effective discriminative learning over sets and exhibits an impressive set classification accuracy. It significantly outperformed the KLD method, representing parametric distribution-based matching, and the kNN methods in both PCA and LDA subspaces, as examples of non-parametric sample-based matching. It also largely outperformed the method based on a simple aggregation of canonical correlations.
The proposed DCC method not only achieved better accuracy but also possesses many good properties compared with the CMSM/OSM methods. CMSM had to be optimised a posteriori by feature selection, whereas DCC does not need any feature selection. DCC exhibited robust performance over a wide range of dimensions of the discriminative subspace, as well as of the number of canonical correlations used. Although CMSM/OSM delivered accuracy comparable to DCC in particular cases, in general they lagged behind the proposed method.
The canonical-correlation-based methods, including the proposed method, were also shown to be highly time-efficient in matching, thus offering an attractive tool for recognition involving a large-scale database.
.0.1 Equivalence of SVD solution to Mutual Subspace Method [19]
In the Mutual Subspace Method (MSM), canonical correlations are defined as the eigenvalues of the matrix $P_1 P_1^T P_2 P_2^T P_1 P_1^T \in \mathbb{R}^{N \times N}$, where $P_i \in \mathbb{R}^{N \times d}$ is a basis matrix of data set $i$. The SVD solution in (1.4) for computing canonical correlations is symmetric. That is,

$$Q_{12}^T P_1^T P_2 Q_{21} = \Lambda, \qquad Q_{21}^T P_2^T P_1 Q_{12} = \Lambda.$$

By multiplying the above two equations, we obtain

$$(Q_{12}^T P_1^T P_2 Q_{21})(Q_{21}^T P_2^T P_1 Q_{12}) = \Lambda^2,$$
$$Q_{12}^T P_1^T P_2 P_2^T P_1 Q_{12} = \Lambda^2,$$
$$P_1 P_1^T P_2 P_2^T P_1 P_1^T = P_1 Q_{12} \Lambda^2 Q_{12}^T P_1^T,$$

as $Q_{12} Q_{12}^T = Q_{21} Q_{21}^T = I$. Thus $P_1 Q_{12}$ and $\Lambda^2$ are, respectively, the eigenvector and eigenvalue matrices of the matrix $P_1 P_1^T P_2 P_2^T P_1 P_1^T$. That is, the canonical correlations of MSM are simply the squares of the canonical correlations of the SVD solution. Please note that the dimension of the matrix $P_1^T P_2 \in \mathbb{R}^{d \times d}$ is low compared with that of $P_1 P_1^T P_2 P_2^T P_1 P_1^T \in \mathbb{R}^{N \times N}$.
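This equivalence is straightforward to verify numerically. The following sketch (our illustration, not part of the original text) draws two random orthonormal bases and checks that the leading eigenvalues of $P_1 P_1^T P_2 P_2^T P_1 P_1^T$ coincide with the squared singular values of the much smaller matrix $P_1^T P_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 5

# Two random orthonormal basis matrices P1, P2 of size N x d.
P1, _ = np.linalg.qr(rng.standard_normal((N, d)))
P2, _ = np.linalg.qr(rng.standard_normal((N, d)))

# SVD solution: canonical correlations = singular values of P1^T P2 (d x d).
cc = np.linalg.svd(P1.T @ P2, compute_uv=False)

# MSM definition: eigenvalues of P1 P1^T P2 P2^T P1 P1^T (N x N, symmetric).
M = P1 @ P1.T @ P2 @ P2.T @ P1 @ P1.T
eig = np.sort(np.linalg.eigvalsh(M))[::-1][:d]

# The MSM eigenvalues are the squares of the SVD canonical correlations.
assert np.allclose(eig, np.sort(cc ** 2)[::-1], atol=1e-10)
```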
.0.2 OSM solution
Denote the correlation matrices of the $C$ classes by $C_1, \ldots, C_C$ and the respective a priori probabilities by $\pi_1, \ldots, \pi_C$ [31]. Then the matrix $C_0 = \sum_{i=1}^{C} \pi_i C_i$ is the correlation matrix of the mixture of all the classes. The matrix $C_0$ can be diagonalized by $B C_0 B^T = \Lambda$. Denoting $P_0 = \Lambda^{-1/2} B$, we have $P_0 C_0 P_0^T = I$. Then,

$$\pi_1 P_0 C_1 P_0^T + \cdots + \pi_C P_0 C_C P_0^T = I.$$

This means that the matrices $\pi_i P_0 C_i P_0^T$ and $\sum_{j \neq i} \pi_j P_0 C_j P_0^T$ have the same eigenvectors, but the eigenvalues $\lambda_k^i$ of $\pi_i P_0 C_i P_0^T$ and $\bar{\lambda}_k^i$ of $\sum_{j \neq i} \pi_j P_0 C_j P_0^T$ are related by $\bar{\lambda}_k^i = 1 - \lambda_k^i$. That is, in the space rotated by the matrix $P_0$, the most important basis vectors of class $i$, which are the eigenvectors of $\pi_i P_0 C_i P_0^T$ corresponding to the largest eigenvalues, are at the same time the least significant basis vectors for the ensemble of the remaining classes. Let $P_i$ be such an eigenvector matrix, so that

$$\pi_i P_i^T P_0 C_i P_0^T P_i = \Lambda_i.$$

Then,

$$\sum_{j \neq i} \pi_j P_i^T P_0 C_j P_0^T P_i = I - \Lambda_i.$$

Since every matrix $\pi_j P_0 C_j P_0^T$ for $j \neq i$ is positive semidefinite, $\pi_j P_i^T P_0 C_j P_0^T P_i$ should be a diagonal matrix having elements smaller than $1 - \lambda_i$. If we denote the eigendecomposition of the $j$-th class by $\pi_j P_0 C_j P_0^T \approx P_j \Lambda_j P_j^T$, the matrix $P_i^T P_j \Lambda_j P_j^T P_i$ then has small diagonal elements. Accordingly, $P_i^T P_j$ should have all its elements close to zero. In the ideal case, when $\pi_i P_0 C_i P_0^T$ has eigenvalues exactly equal to one, the matrix $P_i^T P_j$ would be a zero matrix for all $j \neq i$. The two subspaces defined by $P_i, P_j$ are then called orthogonal subspaces; that is, every column of $P_i$ is perpendicular to every column of $P_j$.

Note that the OSM method does not exploit the concept of multiple sets in a single class (or within-class sets). The method assumes that all data vectors of a single class $i$ are represented by a single set $P_i$. From the above, the matrix $P_0$ can be regarded as an alternative discriminative space in which the canonical correlations of between-class sets are minimized. Note also that the matrix $P_0$ is conceptually a rotation matrix and is therefore square.
.0.3 Constrained Mutual Subspace Method [21]
The constrained subspace $D$ is spanned by the $N_d$ eigenvectors $\mathbf{d}$ of the matrix $G = \sum_{i=1}^{C} P_i P_i^T$ s.t.

$$G \mathbf{d} = \lambda \mathbf{d},$$

where $C$ is the number of training classes, $P_i$ is a basis matrix of the original $i$-th class data, and the eigenvectors $\mathbf{d}$ correspond to the $N_d$ smallest eigenvalues. The optimal dimension $N_d$ of the constrained subspace is set experimentally. Each subspace $P_i$ is projected onto $D$, and the orthogonal components of the projected subspace, normalised to unit length, are used as inputs for computing canonical correlations by the method of MSM [19].
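A minimal sketch of this procedure follows (our illustration; the function names and the choice of QR for orthonormalization are assumptions):

```python
import numpy as np

def constrained_subspace(bases, n_d):
    """Basis of the constrained subspace D: the eigenvectors of
    G = sum_i P_i P_i^T with the n_d smallest eigenvalues."""
    G = sum(P @ P.T for P in bases)
    _, V = np.linalg.eigh(G)       # eigenvalues in ascending order
    return V[:, :n_d]              # N x n_d basis of D

def project_onto_constrained(P, D):
    """Project a class subspace P onto D and orthonormalize the result,
    giving the input basis for MSM in the constrained subspace."""
    proj = D.T @ P                 # coordinates of P's columns in D
    Q, _ = np.linalg.qr(proj)      # orthonormal, unit-length components
    return Q
```

Canonical correlations between two classes are then computed from the projected bases, e.g. as the singular values of $Q_i^T Q_j$, as in MSM.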
Bibliography
[1] K. Lee, M. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. Proc. Computer Vision and Pattern Recognition, pp. 313–320, 2003.
[2] S. Zhou, V. Krueger, and R. Chellappa. Probabilistic recognition of human faces from video. Computer Vision and Image Understanding, vol. 91, no. 1, pp. 214–245, 2003.
[3] Y. Li, S. Gong, and H. Liddell. Recognising the dynamics of faces across multiple
views. Proc. British Machine Vision Conference, pp. 242–251, 2000.
[4] X. Liu and T. Chen. Video-Based Face Recognition Using Adaptive Hidden Markov
Models. Proc. Computer Vision and Pattern Recognition, pp. 340–345, 2003.
[5] A. Hadid and M. Pietikainen. From Still Image to Video-Based Face Recognition:
An Experimental Analysis. Sixth IEEE International Conference on Automatic Face
and Gesture Recognition, pp. 813–818, 2004.
[6] G. Shakhnarovich, J. W. Fisher, and T. Darrel. Face recognition from long-term ob-
servations. Proc. European Conf. Computer Vision, pp. 851–868, 2002.
[7] O. Arandjelović, G. Shakhnarovich, J. Fisher, R. Cipolla, and T. Darrell. Face recognition with image sets using manifold density divergence. Proc. Computer Vision and Pattern Recognition, pp. 581–588, 2005.
[8] S. Satoh. Comparative Evaluation of Face Sequence Matching for Content-based Video Access. Proc. Int'l Conf. on Automatic Face and Gesture Recognition, pp. 163–168, 2000.
[9] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. John Wiley & Sons, Inc., New York, 2nd edition, 2000.
[10] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces:
Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
[11] W.Y. Zhao, R. Chellappa, and A. Krishnaswamy. Discriminant Analysis of Principal
Components for Face Recognition. Proc. Int’l Conf. on Automatic Face and Gesture
Recognition, pp. 336–341, 1998.
[12] M.T. Sadeghi and J.V. Kittler. Decision Making in the LDA Space: Generalised Gra-
dient Direction Metric. Proc. Int’l Conf. on Automatic Face and Gesture Recognition,
pp. 248–253, 2004.
[13] X. Wang and X. Tang. Random Sampling LDA for Face Recognition. Proc. Computer
Vision and Pattern Recognition, pp. 259–265, 2004.
[14] T-K. Kim and J. Kittler. Locally Linear Discriminant Analysis for Multimodally Dis-
tributed Classes for Face Recognition with a Single Model Image. IEEE Trans. Pat-
tern Analysis and Machine Intelligence, vol. 27, no.3, pp. 318–327, 2005.
[15] T-K. Kim, O. Arandjelović, and R. Cipolla. Learning over Sets using Boosted Manifold Principal Angles (BoMPA). Proc. British Machine Vision Conference, pp. 779–788, 2005.
[16] T-K. Kim, J. Kittler and R. Cipolla, Learning Discriminative Canonical Correlations
for Object Recognition with Image Sets. Proc. European Conf. Computer Vision, pp.
251–262, 2006.
[17] M.-H. Yang. Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Ker-
nel Methods. Proc. Int’l Conf. on Automatic Face and Gesture Recognition, pp. 215–
220, 2002.
[18] M. Bressan and J. Vitria. Nonparametric discriminant analysis and nearest neighbor classification. Pattern Recognition Letters, vol. 24, no. 15, pp. 2743–2749, 2003.
[19] O. Yamaguchi, K. Fukui, and K. Maeda. Face recognition using temporal image
sequence. Proc. Int’l Conf. on Automatic Face and Gesture Recognition, pp. 318–
323, 1998.
[20] L. Wolf and A. Shashua. Learning over sets using kernel principal angles. J. Machine
Learning Research, vol. 4, no. 10, pp. 913–931, 2003.
[21] K. Fukui and O. Yamaguchi. Face recognition using multi-viewpoint patterns for
robot vision. Int’l Symp. of Robotics Research, pp. 192–201, 2003.
[22] M. Nishiyama, O. Yamaguchi, and K. Fukui. Face Recognition with the Multiple Constrained Mutual Subspace Method. Proc. of Audio- and Video-based Biometric Person Authentication, pp. 71–80, 2005.
[23] H. Hotelling. Relations between two sets of variates. Biometrika, vol. 28, no. 3/4, pp. 321–372, 1936.
[24] T. Kailath. A view of three decades of linear filtering theory. IEEE Trans. Information
Theory, vol. 20, no. 2, pp. 146–181, 1974.
[25] R. Gittins. Canonical analysis: A review with applications in ecology. Springer-
Verlag, Berlin, Germany, 1985.
[26] Å. Björck and G.H. Golub. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, vol. 27, no. 123, pp. 579–594, 1973.
[27] P. Viola and M. Jones. Robust real-time face detection. Int’l J. Computer Vision, vol.
57, no. 2, pp. 137–154, 2004.
[28] J.M. Geusebroek, G.J. Burghouts, and A.W.M. Smeulders. The Amsterdam library of
object images. Int’l J. Computer Vision, vol. 61, no. 1, pp. 103–112, January, 2005.
[29] Toshiba Corporation, Facepass. http://www.toshiba.co.jp/mmlab/tech/w31e.htm.
[30] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[31] E. Oja. Subspace Methods of Pattern Recognition. Research Studies Press, 1983.
[32] D.D. Lee and H.S. Seung. Algorithms for Non-Negative Matrix Factorization. Advances in Neural Information Processing Systems, pp. 556–562, 2001.
[33] D.D. Lee and H.S. Seung, Learning the Parts of Objects by Non-Negative Matrix
Factorization. Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[34] P.J. Phillips, P. Grother, R.J. Micheals, D.M. Blackburn, E. Tabassi, and J.M. Bone. FRVT 2002: Evaluation Report, Mar. 2003. http://www.frvt.org/FRVT2002/.
[35] D.M. Blackburn, M. Bone, and P.J. Phillips, Facial Recognition Vendor Test 2000:
Evaluation Report, 2000.
[36] B. Leibe and B. Schiele, Analyzing appearance and contour based methods for object
categorization. Proc. Computer Vision and Pattern Recognition, pp. 409–415, 2003.
[37] J. Via, I. Santamaria, and J. Perez. Canonical Correlation Analysis (CCA) Algorithms for Multiple Data Sets: Application to Blind SIMO Equalization. 13th European Signal Processing Conference, Antalya, Turkey, 2005.
[38] D. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, vol. 16, no. 12, pp. 2639–2664, 2004.