Computational Chemistry Column
Column Editors:
Prof. Dr. H. Huber, University of Basel
Prof. Dr. K. Müller, F. Hoffmann-La Roche AG, Basel
Prof. Dr. H.P. Lüthi, Univ. of Geneva, ETH-Zürich
Chimia 55 (2001) 70-80
© Neue Schweizerische Chemische Gesellschaft
ISSN 0009-4293
Chemometrics and Modeling
Frederic Estienne, Yvan Vander Heyden, and D. Luc Massart*
Abstract: Chemometrics is a chemical discipline in which mathematical and statistical techniques are applied
to design experiments or to analyze chemical data. An important part of chemometrics is modeling, in which
one tries to relate two or more characteristics in such a way that the obtained model represents reality as closely
as possible. In this article some less known but useful regression methods such as orthogonal least squares,
inverse and robust regression are introduced and compared with the well-known classical least squares
regression method. Genetic algorithms are described as a means of carrying out feature selection for
multivariate regression. Regression methods such as principal component regression and partial least squares
are introduced as well as the use of N-way principal components.
Keywords: Analytical chemistry · Chemical data analysis · Chemometrics · Modeling · QSAR · Regression methods
Introduction

Chemometrics has been defined [1] as a chemical discipline that uses mathematics, statistics and formal logic (a) to design or select optimal experimental procedures, (b) to provide the maximum relevant chemical information by analyzing chemical data, and (c) to obtain knowledge about chemical systems.
In this article we will focus on how
chemometrics is used for modeling pur-
poses. However, first we should note
that, while modeling is probably the most
important area of chemometrics, there
are many other applications such as
method validation, optimization, statisti-
cal process control, signal processing,
etc.
*Correspondence: Prof. Dr. D.L. Massart
Farmaceutisch Instituut
Vrije Universiteit Brussel
Laarbeeklaan 103
B-1090 Brussels
Tel.: +32 2 477 47 34
Fax: +32 2 477 47 35
E-Mail: fabi@fabi.vub.ac.be
Modeling is applied when two or
more characteristics of the same objects
are measured or calculated and then relat-
ed to each other, for example the concen-
tration of a chemical compound to an in-
strumental signal, the chemical structure
of a drug to its activity or instrumental
responses to sensory characteristics. The
purpose of the modeling usually is to
make predictions (e.g. predict the con-
centration of a certain analyte in a sample
from a measured signal), but sometimes
simply to verify the nature of the relation-
ship.
The expertise of the authors is in the
use of chemometrics for analytical chem-
ical purposes and most examples will
therefore come from that area.
Classical Univariate Least Squares:
Straight Line Models
Before introducing some of the more sophisticated methods such as genetic algorithms, latent variable procedures or neural nets, we should look briefly at the classical univariate least squares methodology (often called ordinary least squares, OLS), which is what analytical chemists generally use to construct a (linear) calibration line. In most analytical techniques the concentration of a sample cannot be measured directly but is derived from a measured signal that is in direct relation to the concentration. Suppose x represents a concentration and y the corresponding measured instrumental signal. To be able to define a model $y = f(x)$, a relationship between x and y has to exist. The simplest and most convenient situation is when the relation is linear, which leads to a model of the type $y = b_0 + b_1 x$ and which represents a straight line. The coefficients $b_0$ and $b_1$ represent the intercept and the slope of the line. Relationships between y and x that follow a curved line can for instance be represented by a regression model of the type $y = b_0 + b_1 x + b_{11} x^2$.
The least squares regression analysis
is a methodology that allows the coeffi-
cients of a given model to be estimated.
For calibration purposes one usually fo-
cuses on straight line models which we
also will do in the rest of this section.
Conventionally the x values represent the so-called controlled or independent variable, i.e. the variable that is considered not to have a measurement error (or a negligible one), which is the concentration in our case. The y values represent the dependent variable, i.e. the measured response, which is considered to have a measurement error. The least squares approach allows $b_0$ and $b_1$ values to be obtained such that the model fits the measured points $(x_i, y_i)$ best (Fig. 1).

Fig. 1. Straight line fitting through a series of measured points.

The true relationship between x and y is considered to be $y = \beta_0 + \beta_1 x$, while the relationship between each $x_i$ and its measured $y_i$ can be represented as $y_i = b_0 + b_1 x_i + e_i$. The signal $y_i$ is composed of a component predicted by the model, $b_0 + b_1 x_i$, and a random component, $e_i$, the residual (Fig. 1). The least squares regression finds the estimates $b_0$ and $b_1$ for $\beta_0$ and $\beta_1$ by calculating the values for which $\sum e_i^2 = \sum (y_i - b_0 - b_1 x_i)^2$, the sum of the squared residuals, is minimal. This explains the name 'least squares'. Standard books about regression, including least squares approaches, are given in [2][3]. Analytical chemists can find information in [4][5].
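As a small illustration (with invented data, not taken from the article), the OLS estimates of $b_0$ and $b_1$ and the residual sum of squares can be computed directly from these definitions, for example in Python:

```python
import numpy as np

# Hypothetical calibration data: concentrations x and measured signals y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.9, 4.2, 5.9, 8.1, 9.8])

# OLS estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x).
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print(b0, b1, np.sum(residuals ** 2))  # intercept, slope, residual sum of squares
```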
Some Variants of the Univariate Least Squares Straight Line Models

A fundamental assumption of OLS is that there are only errors in the direction of y. In some instances, two measured quantities are related to each other and the assumption then does not hold, because there are also measurement errors in x. This is for instance the case when two methods are compared to each other. Often one of these methods is a reference method and the other a new method, which is faster or cheaper, and a demonstration is required that the results of both methods are sufficiently similar. A certain number of samples are analyzed with both methods and a straight line model relating both series of measurements is obtained. If $\beta_0$ as estimated from $b_0$ is not more different from 0 than an a priori accepted bias, and $\beta_1$ as estimated by $b_1$ is not more different from 1 than a given amount, then one can accept that for practical purposes y = x. In its simplest statistical expression, this means that it is tested whether $\beta_0 = 0$ and $\beta_1 = 1$, or, to put it another way, whether $b_0$ is statistically different from 0 and/or $b_1$ is statistically different from 1. If this is the case then it is concluded that the two methods do not yield the same result but that there is a constant (intercept) or proportional (slope) systematic error or bias.

This means that one should calculate $b_0$ and $b_1$, and at first sight this could be done by OLS. However, both regression variables (not only $y_i$ but now also $x_i$) are subject to error, as already mentioned. This violates one of the key assumptions of the OLS calculations. It has been shown [5-8] that the computation of $b_0$ and $b_1$ according to the OLS method leads to wrong estimates of $\beta_0$ and $\beta_1$. Significant errors in the least squares estimate of $b_1$ can be expected if the ratio between the measurement error on the x values and the range of the x values is large. In that case OLS should not be used. To obtain correct values for $b_0$ and $b_1$ the sum of least squares must now be obtained in the direction given in Fig. 2. Such methods are sometimes called errors-in-variables models or orthogonal least squares. Detailed studies of the application of models of this type can be found in [9][10].

Fig. 2. The errors-in-variables model.
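A minimal sketch of one such errors-in-variables fit is orthogonal (total least squares) regression, which minimizes perpendicular distances and assumes comparable error variances on x and y; the function name and data handling below are our own illustration, not taken from [9][10]:

```python
import numpy as np

def orthogonal_fit(x, y):
    """Orthogonal (total least squares) straight-line fit: the sum of squared
    perpendicular distances to the line is minimized, so measurement errors
    in both x and y are taken into account (equal error variances assumed)."""
    X = np.column_stack([x - x.mean(), y - y.mean()])
    # The best-fitting line direction is the first principal axis of the
    # centered data, i.e. the right singular vector with the largest
    # singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    dx, dy = vt[0]            # direction of largest variation
    b1 = dy / dx              # slope of the orthogonal fit
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```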
Another possibility is to apply inverse regression. The term inverse is applied in opposition to the usual calibration procedure. Calibration consists of measuring samples with a known characteristic and deriving a calibration line (or more generally a model). A measurement is then carried out for an unknown sample and its concentration is derived from the measurement result and the calibration line. In view of the assumptions of OLS, the measurement is the y-value and the concentration the x-value, i.e.

measurement = f(concentration)   (1)

This relationship can be inverted to become

concentration = f(measurement)   (2)

OLS is then applied in the usual way, meaning that the sum of the squared residuals is minimized in the direction of y, which is now the concentration. This may appear strange since, when the calibration line is computed, there are no errors in the concentrations. However, if it is taken into account that there will be an error in the predicted concentration of the unknown sample, then minimizing in this way means that one minimizes the prediction errors, which is what is important to the analytical chemist. It has indeed been shown that better results are obtained in this way [11-13]. The analytical chemist should therefore really apply Eqn. (2) instead of the usual Eqn. (1). In most cases the difference in prediction quality between both approaches is very small in practice, so that there is generally no harm in applying Eqn. (1). We will see however that when multivariate calibration is applied, inverse regression is the rule. It should be noted that, when the aim is not to predict y-values, but to obtain the best possible estimates of $\beta_0$ and $\beta_1$, inverse regression performs less well than the usual procedure.
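The difference between classical calibration, Eqn. (1), and inverse regression, Eqn. (2), can be made concrete with a short sketch (the calibration data are invented):

```python
import numpy as np

# Hypothetical calibration set: known concentrations c, measured signals s.
c = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
s = np.array([0.05, 1.98, 4.10, 5.95, 8.02])

# Classical calibration, Eqn. (1): signal = f(concentration);
# the fitted line is then inverted to predict a concentration.
b1, b0 = np.polyfit(c, s, 1)       # slope, intercept
s_new = 3.0                        # signal measured for an unknown sample
c_classical = (s_new - b0) / b1

# Inverse regression, Eqn. (2): concentration = f(measurement);
# the squared residuals are now minimized in the concentration direction.
a1, a0 = np.polyfit(s, c, 1)
c_inverse = a0 + a1 * s_new

print(c_classical, c_inverse)      # the two predictions usually differ little
```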
Robust Regression

One of the most frequently occurring difficulties for an experimentalist is the presence of outliers. The outliers may be due to experimental error or to the fact that the proposed model does not represent the data well enough. For example, if the postulated model is a straight line, and measurements are made in a concentration range where this is no longer true, the measurements obtained in that region will be model outliers. In Fig. 3 it is clear that the last point is not representative for the straight line fitted by the rest of the data. The outlier attracts the regression line computed by OLS. It is said to exert leverage on the regression line. One might think that outliers can be discovered by examining the residuals towards the line. As can be observed, this is not necessarily true: the outlier's residual is not much larger than that of some other data points.

Fig. 3. The leverage effect.

To avoid the leverage effect, the outlier(s) should be eliminated. One way to achieve this is to use more efficient outlier diagnostics than simply looking at residuals; Cook's squared distance or the Mahalanobis distance can for instance be used.

A more elegant way is to apply so-called robust regression methods. The easiest to explain is the single median method [14]. The slope between each pair of points is computed. For instance the slope between points 1 and 2 is 1.10, between 1 and 3 1.00, between 5 and 6 6.20. The complete list is 1.10, 1.00, 1.03, 0.95, 2.00, 0.90, 1.00, 0.90, 2.23, 1.10, 0.90, 2.67, 0.70, 3.45, 6.20. These are now ranked and the median slope (here the 8th value, 1.03) is chosen. All pairs of points of which the outlier is one point have high values and end up at the end of the ranking, so that they do not have an influence on the chosen median slope: even if the outlier were still more distant, the selected median would still be the same. A similar procedure for the intercept, which we will not explain in detail, leads to the straight line equation y = 0.00 + 1.03 x, which is close to the line obtained with OLS after eliminating the outlier. The single median method is not the best robust regression method. Better results are obtained with the least median of squares method (LMS) [15], iteratively reweighted regression [16] or biweight regression [17]. Comparing results of calibration lines obtained with OLS and with a robust method is one way of finding outliers towards a regression model [18].
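A sketch of the single median method follows. The intercept rule used here (the median of $y_i - b_1 x_i$) is one common choice; the article does not detail its own intercept procedure, and the data set below is invented (the six points behind the slope list in the text are not given):

```python
import numpy as np
from itertools import combinations

def single_median_fit(x, y):
    """Single median (Theil) regression as described in the text: the slope
    is the median of the slopes of all pairs of points, which makes the fit
    robust against a single outlier."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    b1 = np.median(slopes)
    # A simple, commonly used choice for the intercept (an assumption here):
    # the median of the pointwise intercepts y - b1 * x.
    b0 = np.median(y - b1 * x)
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 2.1, 3.0, 4.1, 4.9, 11.1])   # last point is an outlier
print(single_median_fit(x, y))                  # close to y = 0 + 1.0 x
```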
Multivariate (Multiple) Regression

Multivariate regression, also often called multiple regression or multiple linear regression (MLR) in the linear case, is used to obtain values for the b coefficients in an equation of the type

$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_m x_m$   (3)

where $x_1, x_2, \dots, x_m$ are different variables. In analytical spectroscopic applications, these variables could be the absorbances obtained at different wavelengths, y being a concentration or other characteristic of the samples to be predicted; in QSAR (the study of quantitative structure-activity relationships) they could be variables such as hydrophobicity (log P) or the Hammett electronic parameter $\sigma$, with y being some measure of biological activity. In experimental design, equations of the type

$y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2$   (4)
are used to describe a response y as a function of the experimental variables $x_1$ and $x_2$. Both Eqn. (3) and (4) are called linear, which may surprise the non-initiated, since the shape of the relationship between y and ($x_1$, $x_2$) is certainly not linear. The term linear should be understood as linear in the regression parameters. An equation such as $y = b_0 + \log(x - b_1)$ is non-linear [2].
It can be observed from the applica-
tions cited above that multiple regression
models occur quite often. We will first
consider the classical solution to estimate
the coefficients. Later we will describe
some more sophisticated methodologies
introduced by chemometricians, such as
those based on latent vectors.
As for the univariate case, the b-values are estimates of the true β-parameters and the estimation is done by minimizing a (sum of) squares. It can be shown that

$b = (X^T X)^{-1} X^T y$

where b is the vector containing the b-values from Eqn. (3), X is an n × m matrix containing the x-values for n samples (or objects, as they are often called) and m variables, and y is the vector containing the measurements for the n samples.
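A brief sketch of this estimate with simulated data; as the comment notes, a least squares solver is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 20 samples, m = 3 variables, plus an intercept column.
n, m = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])
b_true = np.array([1.0, 0.5, -2.0, 0.3])
y = X @ b_true + 0.05 * rng.normal(size=n)

# b = (X^T X)^{-1} X^T y; solving the normal equations (or using lstsq)
# avoids computing the matrix inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b, b_lstsq)
```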
One difficulty is that the inversion of the $X^T X$ matrix leads to unstable results when the x-variables are very correlated. As we will explain later, this happens for instance with spectroscopic data. There are two ways to avoid this problem. One is to select variables (variable selection or feature selection) such that correlation is reduced; the other is to combine the variables in such a way that the resulting summarizing variables are not correlated (feature reduction). Both feature selection and feature reduction lead to a smaller number of variables than the initial number, which by itself has important advantages.
The classical approach, which is found in many statistical packages, is the so-called stepwise regression, a feature selection method. The so-called forward selection procedure consists of first selecting the variable that is best correlated with y. Suppose this is found to be $x_i$. The model at this stage is restricted to $y = f(x_i)$. Then one tests all other variables by adding them to the model, which then becomes a model in two variables, $y = f(x_i, x_j)$. The variable $x_j$ which is retained together with $x_i$ is the one which, when added to the model, leads to the largest improvement compared to the original model $y = f(x_i)$. Then it is tested whether the observed improvement is significant. If not, the procedure stops and the model is restricted to $y = f(x_i)$. If the improvement is significant, $x_j$ is incorporated definitively in the model. It is then investigated which variable should be added as the third one and whether this yields a significant improvement. The procedure is repeated until finally no further improvement is obtained. The procedure is based on analysis of variance, and several variants such as backwards elimination (starting with all variables and successively eliminating the least important ones) or a combination of forward and backward methods have also been proposed. A sketch of the forward selection loop is given below. It should be noted that the criteria applied in the analysis of variance are such that variables are automatically selected that are less correlated. In certain contexts such as experimental design or QSAR, the reason for applying feature selection is not only to avoid the numerical difficulties described above, but also to explain relationships. The variables that are included in the regression equation have a chemical and physical meaning, and when a certain variable is retained it is considered that the variable influences the y-value, e.g. the biological activity, which then leads to proposals for causal relationships. Correct feature selection then becomes very important in those situations to avoid drawing wrong conclusions. A discussion comparing different strategies for feature selection in QSAR is given in [19]. One of the problems is that the procedures involve regressing many variables on y and chance correlations may then occur [20].
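The sketch below shows the skeleton of such a forward selection loop; the F-to-enter threshold of 4 is a common rule of thumb, not a value taken from the article:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit (intercept included)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ b
    return r @ r

def forward_selection(X, y, f_enter=4.0):
    """Forward stepwise selection: at each step add the variable that most
    reduces the residual sum of squares, and stop when the partial F-value
    of the best candidate falls below the threshold f_enter."""
    n, m = X.shape
    selected, remaining = [], list(range(m))
    current_rss = np.sum((y - y.mean()) ** 2)
    while remaining:
        trials = [(rss(X[:, selected + [j]], y), j) for j in remaining]
        best_rss, best_j = min(trials)
        dof = n - len(selected) - 2          # residual degrees of freedom
        f_value = (current_rss - best_rss) / (best_rss / dof)
        if f_value < f_enter:
            break                            # improvement not significant
        selected.append(best_j)
        remaining.remove(best_j)
        current_rss = best_rss
    return selected
```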
There are other difficulties, for instance, the choice of experimental conditions, the samples or the objects. These should cover the experimental domain as well as possible and, where possible, follow an experimental design. This is demonstrated, for instance, in [21]. Outliers can also cause problems. Detection of multivariate outliers is not evident. As for univariate regression, robust regression is possible [15][22]. An interesting example in which multivariate robust regression is applied concerns an experimental design [23] carried out to optimize the yield of an organic synthesis.
Wide Data Matrices
Chemists often produce wide data
matrices, characterized by a relatively
small number of objects (a few tens to a
few hundred) and a very large number of
variables (many hundreds, at least). For
instance, analytical chemists now often
apply very fast spectroscopic methods,
such as near infrared spectroscopy (NIR).
Because of the rapid character of the
analysis, there is no time to dissolve the
sample or separate certain constituents.
The chemist tries to extract the informa-
tion required from the spectrum as such
and to do so he has to relate a y-value
such as an octane number of gasoline
samples or a protein content of wheat
samples to the absorbance at 500 to, in
some cases, 10000 wavelengths. The
e.g. 1000 variables for 100 objects, constitute the X matrix. Such matrices contain many more columns than rows and are therefore often called wide.
Very wide matrices are also encountered in QSAR. For instance, in comparative molecular field analysis (CoMFA), developed by Cramer [24], three-dimensional grids are laid over a set of molecules whose properties in reacting with other molecules or receptors one wants to predict. At each of the resulting lattice points electrostatic, hydrophobic and steric fields are computed. A typical grid of 5 × 2 × 2 nm³ with a spacing of 0.02 nm yields 2 500 000 such lattice points for each molecule. This huge set of data constitutes the X matrix and must be related to, e.g., biological activity data.
Feature selection/reduction then takes on a completely different complexity compared to the situations described in the preceding sections. It should be noted that variables in such matrices are often very correlated. This can for instance be expected for two neighboring wavelengths in a spectrum or the fields measured at adjacent locations in the CoMFA lattice. In what follows, we will explain which methods chemometricians use to model very large, wide and highly correlated data matrices.
Genetic Algorithms for Feature
Selection
Genetic algorithms are general optimization tools aiming at selecting the fittest solution to a problem. Suppose that, to keep it simple, nine variables are measured. Possible solutions are represented in Fig. 4. Selected variables are indicated by a 1, non-selected variables by a 0. Such solutions are sometimes, in analogy with genetics, called chromosomes in the jargon of the specialists.
By random selection a set of such solutions is obtained (in real applications often several hundreds). For each solution an MLR model is built using an equation such as (3) and the sum of squares of the residuals of the objects towards that model is determined. In the jargon of the field one says that the fitness of each solution is determined: the smaller the sum of squares, the better the model describes the data and the fitter the corresponding solutions are. Then follows what is described as the selection of the fittest (leading to names such as genetic algorithms or evolutionary computation). For instance, out of the, say, 100 original solutions, the 50 fittest are retained. They are called the parent generation. From these a child generation is obtained by reproduction and mutation.
Reproduction is explained in Fig. 5. Two randomly chosen parent solutions produce two child solutions by cross-over. The cross-over point is also chosen randomly. The first part of solution 1 and the second part of solution 2 together yield child solution 1′. Solution 2′ results from the first part of solution 2 and the second part of solution 1.

The child solutions are added to the selected parent solutions to form a new generation. This is repeated for many generations and the best solution from the final generation is retained. Each generation is additionally submitted to mutation steps. Here and there randomly chosen bits of the solution string are changed (0 to 1 or 1 to 0). This is illustrated in Fig. 6.
The need for the mutation step can be understood from Fig. 5. Suppose that the best solution is close to one of the child solutions in that figure, but should not include variable 9. However, because the value for variable 9 is 1 in both parents, it is also unavoidably 1 in the children. Mutation can change this and move the solutions in a better direction.
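A minimal sketch of such a genetic algorithm for feature selection follows; the population size, number of generations and mutation rate are illustrative choices, not values from the cited references:

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(mask, X, y):
    """Sum of squared residuals of an MLR model built on the selected
    variables; smaller means fitter."""
    if not mask.any():
        return np.inf
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])
    b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ b
    return r @ r

def genetic_selection(X, y, pop=100, keep=50, generations=30, p_mut=0.01):
    """Binary 'chromosomes' mark selected variables; the fittest half is kept
    as parents, children are made by single-point cross-over, and random
    bits are mutated (0 to 1 or 1 to 0)."""
    m = X.shape[1]
    population = rng.random((pop, m)) < 0.5
    for _ in range(generations):
        order = np.argsort([fitness(c, X, y) for c in population])
        parents = population[order[:keep]]          # selection of the fittest
        children = []
        while len(children) < pop - keep:
            p1, p2 = parents[rng.integers(keep, size=2)]
            cut = rng.integers(1, m)                # random cross-over point
            children.append(np.concatenate([p1[:cut], p2[cut:]]))
        population = np.vstack([parents, np.array(children)])
        mutate = rng.random(population.shape) < p_mut
        population = population ^ mutate            # flip bits here and there
    return population[np.argmin([fitness(c, X, y) for c in population])]
```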
Genetic algorithms were first pro-
posed by Holland [25]. They were intro-
duced in chemometrics by Lucasius et al.
[26] and Leardi [27]. They were applied
for instance in QSAR and molecular
modeling [28], conformational analysis
[29], multivariate calibration for the de-
termination of certain characteristics of
polymers [30] or octane numbers [31].
Reviews about applications in chemistry
can be found in [32][33]. There are sever-
al competing algorithms such as simulat-
ed annealing [34] or the immune algo-
rithm [35].
Latent Variables for Feature Reduction: Principal Components

The alternative to feature selection is to combine the variables into what we earlier called summarizing variables. Chemometricians call these latent variables, and obtaining such variables is called feature reduction. It should be understood that in this case no variables are discarded. The type of latent variable most commonly used is the principal component (PC). To explain it we will first consider the simplest possible situation. Two variables ($x_1$ and $x_2$) were measured for a certain number of objects and the number of variables should be reduced to one. In principal component analysis (PCA) this is achieved by defining a new axis or variable on which the objects are projected. The projections are called the scores, $s_1$, along principal component 1, PC1 (Fig. 7).
Fig. 4. A set of solutions for feature selection from nine variables for MLR.

Fig. 5. Genetic algorithms: the reproduction step.

Fig. 6. Genetic algorithms: the mutation step.
The projections along PC1 preserve the information present in the $x_1$-$x_2$ plot, namely that there are two groups of data. By definition, PC1 is drawn in the direction of the largest variation through the data. A second PC, PC2, can also be obtained. By definition it is orthogonal to the first one (Fig. 8a). The scores along PC1 and along PC2 can be plotted against each other, yielding what is called a score plot (Fig. 8b).

The reader observes that PCA decorrelates: while the data points in the $x_1$-$x_2$ plot are correlated, they are no longer so in the $s_1$-$s_2$ plot. This also means that there was correlated and therefore redundant information present in $x_1$ and $x_2$. PCA picks up all the important information in PC1 and the rest, along PC2, is noise and can be eliminated. By keeping only PC1, feature reduction is applied: the number of variables, originally two, has been reduced to one. This is achieved by computing the score along PC1 as:

$s_1 = w_1 x_1 + w_2 x_2$   (5)

In other words the score is a weighted sum of the original variables. The weights are known as loadings, and plots of the loadings are called loading plots.
This can now be generalized to m dimensions. In the m-dimensional space, PC1 is obtained as the axis of largest variation in the data; PC2 is orthogonal to PC1 and is drawn in the direction of largest remaining variation around PC1. It therefore contains less variation (and information) than PC1. PC3 is orthogonal to the plane of PC1 and PC2. It is drawn in the direction of largest variation around that plane, but contains less variation than PC2. In the same way PC4 is orthogonal to the hyperplane PC1, PC2, PC3 and contains still less variation, etc. For a matrix with dimensions n × m, N = min(n, m) PCs can be extracted. However, since each of them contains less and less information, at a certain point they contain only noise and the process can be stopped before reaching N. If fewer than N PCs are retained, feature reduction is achieved.
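A compact sketch of PCA by singular value decomposition of the column-centered data matrix; it returns the scores, the loadings (the w-values of Eqn. (5)) and the fraction of variance carried by each PC:

```python
import numpy as np

def pca(X, n_components):
    """PCA via the SVD of the column-centered data: scores are the
    projections of the objects, loadings are the weights of the original
    variables on each PC."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)      # fraction of variance per PC
    return scores, loadings, explained[:n_components]
```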
A very important application of principal components is to visually display the information present in the data set, and most multivariate data applications therefore start with score and/or loading plots. The score plots give information about the objects and the loading plots about the variables. Both can be combined into a biplot, which is all the more effective after certain types of data transformation, e.g. spectral mapping [36]. In Fig. 9 a score plot is shown for an investigation into the Maillard reaction, a reaction between sugars and amino acids [37]. The samples consist of reaction mixtures of different combinations of sugars and amino acids. The variables are the areas under the peaks of the reaction mixtures. The reactions are very complex: 159 different peaks were observed. Each of the samples is therefore characterized by its value for 159 variables. The PC1-PC2 score plot of Fig. 9 can be seen as a projection of the samples from 159-dimensional space to the two-dimensional space that best preserves the variance in the data. In the score plot different symbols are given to the samples according to the sugar that was present, and it is observed, for instance, that samples with rhamnose occupy a specific location in the score plot. This is only possible if they also occupy a different place in the original 159-dimensional space, i.e. their GC chromatogram is different. By studying different parts of the data and by including the information from the loading plots, it is then possible to understand the effect of the starting materials on the reaction mixture obtained.
Fig. 7. Feature reduction of two variables, $x_1$ and $x_2$, by a principal component.

Fig. 8. a) Second PC and b) score plot of the data in Fig. 7.
Fig. 9. PCA score plot of samples from the Maillard reaction (PC1 accounts for 58.01% of the variance). The samples with rhamnose have symbol ○.
Principal components have been used in many different fields of application. Whenever a table of samples × variables is obtained and some correlation between the variables is expected, a principal components approach is useful. Let us consider an environmental example [38]. In Fig. 10 the score plot is shown. The data consist of air samples taken at different times at the same sampling location. For each of the samples a capillary GC chromatogram was obtained. The different symbols given to the samples indicate different wind directions prevailing at the time of sampling. Clearly the wind direction has an effect on the sample compositions. To understand this better, Fig. 11 gives a plot of the loadings of a few of the variables involved. It is observed that the loadings on PC1 are all positive and not very different. Referring to Eqn. (5), and remembering that the loadings are the weights (the w-values), this means that the score on PC1 is simply a weighted sum of the variables and therefore a global indicator of pollution. The samples with the highest scores on PC1 are those with the highest degree of pollution. Along PC2 some variables have positive loadings and others negative loadings. Those of the aliphatic variables are positive and those of the aromatic variables are negative. It follows that samples with positive scores contain more aliphatic than aromatic compounds.

Combining PC1 and PC2, one can then conclude that samples with symbol × have an aliphatic character and that the total content increases with higher values on PC1. The same reasoning can be held for the samples with symbol •: they have an aromatic character. In fact, one could define new aliphaticity and aromaticity factors as in Fig. 12. This can be done in a more formal way using what is called factor analysis.
Fig. 10. PCA score plot of air samples.
Other Latent Variables
There are other types of latent variables. In projection pursuit [37][39] a latent variable is chosen such that, instead of the largest variation in the data set, it describes the largest inhomogeneity. In this way clusters or outliers can be observed more easily. Fig. 13 shows the result applied to the Maillard data of Fig. 9, and it can be observed that the cluster of rhamnose samples now stands out more clearly.

If the y-values are not characteristics observed for a set of samples, but the class affiliation of the samples (e.g. samples 1-10 belong to class A, samples 11-25 to class B), then a latent variable can be defined that describes the largest discrimination between the classes. Such latent variables are called canonical variates or sometimes linear discriminant functions and are the basis for supervised pattern recognition methods such as linear discriminant analysis. In the partial least squares (PLS) section, a further type of latent factor will be introduced.
N-way Methods

Some data have a more complex structure than the classical 2-way matrix or table. Typical examples are met, for instance, in environmental chemistry [40]. A set of n variables can be measured
in m different locations at p different
times. This leads to a 3-way data set with
dimensions n x m x p. The three ways (or
modes) are the variable mode, the loca-
tion mode and the time mode. This can of
course be generalized to a higher number
of modes, but for the sake of simplicity
we will restrict here to 3-way. The classi-
cal approach to study such data is to per-
form what is called unfolding. Unfolding
consists of rearranging a 3-way matrix
into a 2-way matrix. The 3-way array can
be considered as several 2-way tables
(slices of the original matrix), and these
tables can be put next to each other, lead-
ing to a new 2-way array (Fig. 14). This
rearranged matrix can be treated with
PCA. Considering the example of Fig. 14,
the scores will carry information about
the locations, and the loadings mixed in-
formation about the two other modes.
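The unfolding operation itself is a simple rearrangement; a sketch with an invented 3-way array (the dimensions are chosen arbitrarily):

```python
import numpy as np

# Hypothetical 3-way array: n variables x m locations x p times.
n, m, p = 4, 5, 6
X3 = np.random.default_rng(2).normal(size=(n, m, p))

# Unfolding that preserves the 'location' mode (as in Fig. 14): the slices
# are put next to each other, giving an m x (n*p) matrix that can then be
# treated with ordinary PCA.
X_unfolded = np.moveaxis(X3, 1, 0).reshape(m, n * p)
print(X_unfolded.shape)   # (5, 24)
```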
Unfolding can be performed in differ-
ent directions so that each of the three
modes is successively preserved in the
unfolded matrix. In this way, three differ-
ent PCA models can be built, the scores
of each of these models giving informa-
tion about one of the modes. This ap-
proach is called the Tucker1 model. It is the first of a series of Tucker models [41]. The most important of these is the Tucker3 model. Tucker3 is a true N-way method as it takes into account the multi-way structure of the data. It consists in building, through an iterative process, a score matrix for each of the modes, and a core matrix defining the interactions between the modes. As in PCA, the components in each mode are constrained to be orthogonal. The number of components can be different in each mode. A graphical representation of the Tucker3 model for 3-way data is given in Fig. 15. It appears as a sum, weighted by the core matrix G, of outer products between the factors stored as columns in the A, B and C score matrices.

Another common N-way model is the Parafac-Candecomp model, proposed simultaneously by Harshman and by Carroll and Chang [42][43]. Information about N-way methods (and software) can be found in [44-46]. Applications in process control [47][48], environmental chemistry [40][49], food chemistry [50], curve resolution [51] and several other fields have been published.

Fig. 14. Unfolding of a 3-way matrix, performed preserving the 'Location' dimension.

Fig. 15. Graphical representation of the Tucker3 model. n, m, and p are the dimensions of the original matrix X; w1, w2, and w3 are the numbers of components extracted on modes 1, 2 and 3, respectively, corresponding to the numbers of columns of the loading matrices A, B and C.
Fig. 11. PCA loading plot of a few variables (e.g. n-dodecane, o-xylene) measured on the air samples.

Fig. 12. New fundamental factors (an aliphatic and an aromatic factor) discovered on a score plot.
Fig. 13. Projection pursuit plot (PP1 vs. PC2) of samples from the Maillard reaction. The samples with rhamnose have symbol ○.

Principal Component Regression (PCR)

Until now we have applied latent variables only for display purposes.
Principal components can, however, also be used as the basis of a regression method. It is applied among others when the x-values constitute a wide X-matrix, e.g. for NIR calibration (see earlier). Instead of the original x-values one applies the reduced ones, the scores. Suppose m variables (e.g. 1000) were measured for n samples (e.g. 100). As explained earlier this requires either feature selection or feature reduction. The latter can be achieved by replacing the m x-values by the scores on the k significant PCs (e.g. 5). The X matrix then no longer consists of 100 × 1000 absorbance values but of 100 × 5 scores, since each of the 100 samples is now characterized by five scores instead of 1000 variables. The regression model is:

$y = T b$   (6)

Since the score matrix T is obtained from the original variables as $T = X W$, with W the matrix of weights (loadings), Eqn (6) becomes:

$y = X W b$

By using the principal components as intermediates it is therefore possible to solve the wide X matrix regression problem. It should be noted also that the principal components are by definition not correlated, so that the correlation problem mentioned earlier is also solved.
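A sketch of PCR along these lines (SVD for the scores, then regression of y on the k retained components); the function name and data handling are our own illustration:

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: regress y on the scores of the first
    k PCs of the column-centered X, then express the model in the original
    variables, y = X W b (Eqn. (6) with T = X W)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :k] * s[:k]                  # n x k score matrix
    W = Vt[:k].T                          # m x k loading matrix
    b = np.linalg.lstsq(T, y - y_mean, rcond=None)[0]
    coef = W @ b                          # regression vector in original variables
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```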
Partial Least Squares (PLS)

The aim of PLS is the same as that of PCR, namely to model a set of y-values with the data contained in an (often) wide matrix of correlated variables. However, the approach is different. In PCR one works in two steps: in the first the scores are obtained and only the X matrix is involved; in the second, y is related to the scores. In PLS this is done in one step. The latent variables are obtained, not with the variation in X as criterion, as is the case for principal components, but such that the new latent variable shows maximal covariance between X and y. This means that the latent variable is built directly as a function of the relationship between y and X. In principle one therefore expects PLS to perform better than PCR, but in practice they often perform equally well. A tutorial can be found in [52]. Several algorithms are available; a very efficient one, requiring the least computer time according to our experience, is SIMPLS [53].
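A sketch of PLS1 with the classical NIPALS deflation scheme; the article recommends SIMPLS [53], which for a single y gives an equivalent model, but NIPALS is shorter to write down:

```python
import numpy as np

def pls1_nipals(X, y, k):
    """PLS1: each latent variable is chosen to have maximal covariance
    with y (cf. the text), extracted here by NIPALS with deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = E.T @ f
        w /= np.linalg.norm(w)            # weight: direction of max covariance
        t = E @ w                         # score
        p = E.T @ t / (t @ t)             # X loading
        c = f @ t / (t @ t)               # y loading
        E = E - np.outer(t, p)            # deflate X
        f = f - c * t                     # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector for raw x
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```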
Applications of PCR and PLS

PCR and PLS have been applied in many different fields. The following references constitute a somewhat haphazard selection from a very large literature. There are many analytical applications in the pharmaceutical industry [54], the petroleum industry [55], food science [56] and environmental chemistry [57]. The methods are used with near or mid infrared [58], chromatographic [59], Raman [60], UV [61] and potentiometric [62] data. A good overview of applications in QSAR is found in [63].

PLS2 and Other Methods that Describe the Relationship between Two Tables
Instead of relating one y-value to
many x-values, it is possible to model a
set of y-values with a set of x-values.
This means that one relates two matrices
Y and X, or in other words two tables.
For instance, one could measure for a
certain set of samples a number of senso-
ry characteristics on the one hand and ob-
tain analytical measures on the other.
This would yield two tables as depicted
in Fig. 16. One could then wonder if it is
possible to predict the sensory character-
istics from the (easier to measure) chemi-
cal measurements or at least to under-
stand which (combinations) of analytical
measurements are related to which senso-
ry characteristics. At the same time one
wants to obtain information about the
structure of each of the two tables (e.g. which analytical variables give similar information). PLS2 can be used for this purpose. Other methods that can be applied are, for instance, canonical correlation and reduced rank regression. An example relating 20 measurements of mechanical strength of meat patties to the sensory evaluation of textural attributes can be found in [64], and a comparison of methods in [65].

Fig. 16. Relating two 2-way tables (samples × analytical measurements and samples × sensory data).
Generalization

It is also possible to relate multi-way models to a vector of y-values or to 2-way tables. In the same way as with 2-way data, the latent variables obtained in multi-way models are then used to build the regression models. The multi-way analog of PCR would consist in modeling the original data with Tucker3 or Parafac, and then regressing the dependent y variable on the obtained scores. A more sophisticated n-way version of PLS (N-PLS) was also developed. The principle of N-PLS is to fit a model similar to Parafac, but aiming at maximizing the covariance between the dependent and independent variables instead of fitting a model in a least squares sense. The usefulness of such approaches will be apparent from Fig. 17. In process analysis, one is concerned with the quality of finished batches and this can be described by a number of quality parameters. At the same time, for each batch a number of variables can be measured on the process as a function of time [68]. This yields a two-way table on the one hand and a three-way one on the other. Relating these tables allows one to predict the quality of a batch from the measurements made during the process.

Fig. 17. Relating a two-way table (batches × quality measurements) and a three-way table (batches × on-line measurements × time).

Acknowledgements

Y. Vander Heyden is a postdoctoral fellow of the Fund for Scientific Research (FWO) - Vlaanderen.

Received: December 21, 2000
[1] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi, J. Smeyers-Verbeke, 'Handbook of Chemometrics', Elsevier, Amsterdam, 1997.
[2] N.R. Draper, H. Smith, 'Applied Regression Analysis', Wiley, New York, 1981.
[3] J. Mandel, 'The Statistical Analysis of Experimental Data', Wiley & Sons, New York, 1964; Dover reprint, 1984.
[4] D.L. MacTaggart, S.D. Farwell, J. Assoc. Off. Anal. Chem. 1992, 75, 594.
[5] J.C. Miller, J.N. Miller, 'Statistics for Analytical Chemistry', 3rd ed., Ellis Horwood, Chichester, 1993.
[6] W.E. Deming, 'Statistical Adjustment of Data', Wiley, New York, 1943.
[7] P.T. Boggs, C.H. Spiegelman, J.R. Donaldson, R.B. Schnabel, J. Econometrics 1988, 38, 169.
[8] P.J. Cornbleet, N. Gochman, Clin. Chem. 1979, 25, 432.
[9] C. Hartmann, J. Smeyers-Verbeke, D.L. Massart, Analusis 1993, 21, 125.
[10] J. Riu, F.X. Rius, J. Chemometr. 1995, 9, 343.
[11] R.G. Krutchkoff, Technometrics 1967, 9, 425.
[12] V. Centner, D.L. Massart, S. de Jong, Fresenius J. Anal. Chem. 1998, 361, 2.
[13] B. Grientschnig, Fresenius J. Anal. Chem. 2000, 367, 497.
[14] H. Theil, Nederlandse Akademie van Wetenschappen Proc., Ser. A 1950, 53, 386.
[15] P.J. Rousseeuw, A.M. Leroy, 'Robust Regression and Outlier Detection', Wiley, New York, 1987.
[16] G.R. Phillips, E.R. Eyring, Anal. Chem. 1983, 55, 1134.
[17] F. Mosteller, J.W. Tukey, 'Data Analysis and Regression', Addison-Wesley, Reading, 1977.
[18] P. Van Keerberghen, J. Smeyers-Verbeke, R. Leardi, C.L. Karr, D.L. Massart, Chemom. Intell. Lab. Syst. 1995, 28, 73.
[19] H. Kubinyi, Quant. Struct.-Act. Relat. 1994, 13, 285.
[20] J.G. Topliss, R.J. Costello, J. Med. Chem. 1972, 15, 1066.
[21] M. Sergent, D. Mathieu, R. Phan-Tan-Luu, G. Drava, Chemom. Intell. Lab. Syst. 1995, 27, 153.
[22] A.C. Atkinson, J. Am. Stat. Assoc. 1994, 89, 1329.
[23] S. Morgenthaler, M.M. Schumacher, Chemom. Intell. Lab. Syst. 1999, 47, 127.
[24] R.D. Cramer III, D.E. Patterson, J.D. Bunce, J. Am. Chem. Soc. 1988, 110, 5959.
[25] J.H. Holland, 'Adaptation in Natural and Artificial Systems', University of Michigan Press, Ann Arbor, MI, 1975; revised reprint, MIT Press, Cambridge, 1992.
[26] C.B. Lucasius, M.L.M. Beckers, G. Kateman, Anal. Chim. Acta 1994, 286, 135.
[27] R. Leardi, R. Boggia, M. Terrile, J. Chemom. 1992, 6, 267.
[28] J. Devillers (Ed.), 'Genetic Algorithms in Molecular Modeling', Academic Press, London, 1996.
[29] M.L.M. Beckers, E.P.P.A. Derks, W.J. Melssen, L.M.C. Buydens, Comput. Chem. 1996, 20, 449.
[30] D. Jouan-Rimbaud, D.L. Massart, R. Leardi, O.E. de Noord, Anal. Chem. 1995, 67, 4295.
[31] R. Meusinger, R. Moros, Chemom. Intell. Lab. Syst. 1999, 46, 67.
[32] P. Willett, Trends Biotechnol. 1995, 13, 516.
[33] D.B. Hibbert, Chemom. Intell. Lab. Syst. 1993, 19, 277.
[34] J.H. Kalivas, J. Chemom. 1991, 5, 37.
[35] X.G. Shao, Z.H. Chen, X.Q. Lin, Fresenius J. Anal. Chem. 2000, 366, 10.
[36] P.J. Lewi, Arzneim.-Forsch. 1976, 26, 1295.
[37] Q. Guo, W. Wu, F. Questier, D.L. Massart, C. Boucon, S. de Jong, Anal. Chem. 2000, 72, 2846.
[38] J. Smeyers-Verbeke, J.C. Den Hartog, W.H. Dekker, D. Coomans, L. Buydens, D.L. Massart, Atmos. Environ. 1984, 18, 2471.
[39] J.H. Friedman, J. Am. Stat. Assoc. 1987, 82, 249.
[40] P. Barbieri, C.A. Andersson, D.L. Massart, S. Predonzani, G. Adami, G.E. Reisenhofer, Anal. Chim. Acta 1999, 398, 227.
[41] L.R. Tucker, Psychometrika 1966, 31, 279.
[42] R. Harshman, UCLA Working Papers in Phonetics 1970, 16, 1.
[43] J.D. Carroll, J. Chang, Psychometrika 1970, 35, 283.
[44] C.A. Andersson, R. Bro, Chemom. Intell. Lab. Syst. 2000, 52, 1.
[45] M. Kroonenberg, 'Three-mode Principal Component Analysis. Theory and Applications', DSWO Press, Leiden, 1983; reprint 1989.
[46] R. Henrion, Chemom. Intell. Lab. Syst. 1994, 25, 1.
[47] P. Nomikos, J.F. MacGregor, AIChE J. 1994, 40, 1361.
[48] D.J. Louwerse, A.K. Smilde, Chem. Eng. Sci. 2000, 55, 1225.
[49] R. Henrion, Chemom. Intell. Lab. Syst. 1992, 16, 87.
[50] R. Bro, Chemom. Intell. Lab. Syst. 1998, 46, 133.
[51] A. de Juan, S.C. Rutan, R. Tauler, D.L. Massart, Chemom. Intell. Lab. Syst. 1998, 40, 19.
[52] P. Geladi, B.R. Kowalski, Anal. Chim. Acta 1986, 185, 1.
[53] S. de Jong, Chemom. Intell. Lab. Syst. 1993, 18, 251.
[54] K.D. Zissis, R.G. Brereton, S. Dunkerley, R.E.A. Escott, Anal. Chim. Acta 1999, 384, 71.
[55] C.J. de Bakker, P.M. Fredericks, Appl. Spectrosc. 1995, 49, 1766.
[56] S. Vaira, V.E. Mantovani, J.C. Robles, J.C. Sanchis, H.C. Goicoechea, Anal. Lett. 1999, 32, 3131.
[57] V. Simeonov, S. Tsakovski, D.L. Massart, Toxicol. Environ. Chem. 1999, 72, 81.
[58] J.B. Cooper, K.L. Wise, W.T. Welch, M.B. Summer, B.K. Wilt, R.R. Bledsoe, Appl. Spectrosc. 1997, 51, 1613.
[59] M.P. Montana, N.B. Pappano, N.B. Debattista, J. Raba, J.M. Luco, Chromatographia 2000, 51, 727.
[60] O. Svensson, M. Josefson, F.W. Langkilde, Chemom. Intell. Lab. Syst. 2000, 49, 49.
[61] F. Vogt, M. Tacke, M. Jakusch, B. Mizaikoff, Anal. Chim. Acta 2000, 422, 187.
[62] M. Baret, D.L. Massart, P. Fabry, C. Menardo, F. Conesa, Talanta 1999, 50, 541.
[63] S. Wold, in 'Chemometric Methods in Molecular Design', Ed. H. van de Waterbeemd, VCH, Weinheim, 1995.
[64] S. Beilken, L.M. Eadie, I. Griffiths, P.N. Jones, P.V. Harris, J. Food Sci. 1991, 56, 1465.
[65] B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. de Jong, P.J. Lewi, J. Smeyers-Verbeke, 'Handbook of Chemometrics and Qualimetrics', Part B, Chapter 35, Elsevier, Amsterdam, 1998.
[66] R. Bro, H. Heimdal, Chemom. Intell. Lab. Syst. 1996, 34, 85.
[67] R. Bro, J. Chemom. 1996, 10, 47.
[68] C. Duchesne, J.F. MacGregor, Chemom. Intell. Lab. Syst. 2000, 51, 125.