
Quantitative structure–property relationships prediction of some physico-chemical properties of glycerol based solvents

July 2013
Green Chemistry 15(8)


DOI:10.1039/C3GC40694F

Authors:

José I García

Spanish National Research Council

Alba Mayoral

University of Alcalá

Quantitative structure–properties relationships (QSPR) models have been developed for three characteristic properties of a series of 62 new glycerol derivatives, relevant to solvent classification and substitution uses. Using structural descriptor variables, three equations have been found using multiple linear regression analysis, which can be applied for in silico prediction of physico-chemical properties, allowing a faster selection of target solvents for a given application.

List of properties in the 62 glycerol solvents studied in the present work.


Example of boiling point prediction of 1,2,3-triethoxypropane (222) using the linear regression obtained with DARC/PELCO descriptors.


García et al., Green Chem. 2013, 15, 2283–2293


José I. García,* Héctor García-Marín, José A. Mayoral and Pascual Pérez


Introduction

Organic solvents are used in huge amounts in many industrial and everyday applications, but unfortunately most of them come from petroleum and they are often labelled as toxic or hazardous substances. For this reason, substantial efforts are being made to develop more benign solvents from renewable sources. Our group has recently described a family of solvents based on glycerol,1 a co-product of biodiesel production. To facilitate the search for possible substitution applications, we have also determined a number of physico-chemical properties of these glycerol-derived solvents and compared them with those of conventional organic solvents. Many of these properties are difficult to measure, so efficient quantitative structure–property relationship (QSPR) equations would be of great interest to accelerate the search for the best solvent for a given application. The concept is based on the fact that there is a close relationship between the bulk properties and the molecular structure within a series of similar chemical compounds. In this context, solvent classification is a very interesting issue, which has traditionally been addressed from both microscopic (intermolecular interactions) and macroscopic (continuum medium) approaches. However, solvation processes are hard to parameterize, given that the solvation energy (the only observable magnitude) is controlled by a large number of factors. For this reason, the classification of solvents, and especially that of neoteric solvents, is far from straightforward,2-7 and hence, during the last three decades of the 20th century, many efforts were devoted to classifying solvents using empirical parameters.8

Quantitative structure–property relationships are mathematical equations relating chemical structure to a wide variety of physical, chemical, and biological properties; in our case, solvent properties. Once established, QSPR models can be used to predict properties of compounds as yet unmeasured or even unknown.9-13 In this context, there are many reports on applications of QSPR related to solvents, such as physico-chemical properties of alkane series,14 optical properties of organic compounds,15 thermophysical properties of some fluids,16 solubility of hazardous compounds,17 acidity constants of some acid derivatives,13 permeability of organic compounds in membranes,18 or important properties of room temperature ionic liquids (RTILs), such as toxicity.19-21 A major step in the development of QSPR models is finding a set of molecular descriptors able to represent the variation of the structural features of the molecules, and therefore a wide variety of descriptors have been reported for use in QSPR analysis.22-25 The chosen molecular descriptors (X) are correlated with one or more response variables (Y) using different statistical approaches, such as Partial Least Squares Analysis (PLSA), Multiple Linear Regression (MLR), Artificial Neural Networks (ANN), or Principal Component Analysis (PCA); a good example of the use of QSPR with PCA to classify solvents can be found in the literature.26 MLR27 is probably the most widely used, because it is simple and intuitive.

Scheme 1. General structure and codification of the glycerol-derived

solvents used in this study.

From an industrial application point of view, there are three main solvent features that must be taken into account, namely: 1) behaviour in dissolution processes, which can be well defined through the solvatochromic parameter $E_T^N$ (see below);28-29 2) mechanical aspects, which can be quantified by the viscosity; and 3) volatility, closely related to safety, toxicity and air pollution, which can be assessed through the boiling point.


Table 1. Properties of the 62 glycerol-derived solvents studied in the present work.

| Solvent | Code | $E_T^N$ | η (cP) | b.p. (°C) |
|---|---|---|---|---|
| 1,2,3-propanetriol | 000 | 0.812 | 934 (ref. 30) | 290 (ref. 30) |
| 3-methoxy-1,2-propanediol | 100 | 0.710 | 37.72 | 222 |
| 3-ethoxy-1,2-propanediol | 200 | 0.690 | 35.14 | 221 |
| 3-n-butoxy-1,2-propanediol | 400 | 0.680 | 42.03 | 249 |
| 1,3-dimethoxy-2-propanol | 101 | 0.610 | 3.46 | 170 |
| 1-isopropoxy-3-methoxy-2-propanol | 103i | 0.490 | 3.38 | 188 |
| 1-n-butoxy-3-methoxy-2-propanol | 104 | 0.480 | — | 208 |
| 1-isobutoxy-3-methoxy-2-propanol | 104i | 0.490 | — | 200 |
| 1-tert-butoxy-3-methoxy-2-propanol | 104t | 0.440 | — | 195 |
| 1-n-butoxy-3-isopropoxy-2-propanol | 403i | 0.470 | 4.59 | 223 |
| 1,3-di-n-butoxy-2-propanol | 404 | 0.450 | 5.53 | 248 |
| 3-n-butoxy-1-tert-butoxy-2-propanol | 404t | 0.390 | — | 230 |
| 3-n-butoxy-1-isobutoxy-2-propanol | 404i | 0.460 | — | 229 |
| 1-ethoxy-3-isopropoxy-2-propanol | 203i | 0.450 | — | 187 |
| 1-n-butoxy-3-ethoxy-2-propanol | 204 | 0.450 | — | 220 |
| 1-tert-butoxy-3-ethoxy-2-propanol | 204t | 0.410 | — | 204 |
| 1-isobutoxy-3-ethoxy-2-propanol | 204i | 0.460 | — | 214 |
| 1,3-diisopropoxy-2-propanol | 3i03i | 0.460 | — | 202 |
| 1-tert-butoxy-3-isopropoxy-2-propanol | 3i04t | 0.370 | — | 202 |
| 1-isobutoxy-3-isopropoxy-2-propanol | 3i04i | 0.440 | — | 215 |
| 1-isopropoxy-3-(2,2,2-trifluoroethoxy)-2-propanol | 3i03F | 0.590 | 6.90 | 176 |
| 1-n-butoxy-3-(2,2,2-trifluoroethoxy)-2-propanol | 403F | 0.600 | — | 210 |
| 1-tert-butoxy-3-(2,2,2-trifluoroethoxy)-2-propanol | 4t03F | 0.570 | 8.61 | 199 |
| 1-isobutoxy-3-(2,2,2-trifluoroethoxy)-2-propanol | 4i03F | 0.600 | — | 205 |
| 1,3-bis(2,2,2-trifluoroethoxy)-2-propanol | 3F03F | 0.700 | 8.14 | 197 |
| 1,3-bis(2,2,3,3,3-pentafluoropropoxy)-2-propanol | 5F05F | 0.699 | — | 204 |
| 1,3-bis(2,2,3,3,4,4,4-heptafluorobutoxy)-2-propanol | 7F07F | 0.685 | 19.60 | 206 |
| 1,2,3-trimethoxypropane | 111 | — | — | 150 |
| 1-isopropoxy-2,3-dimethoxypropane | 113i | — | 1.03 | 170 |
| 2-n-butoxy-3-methoxy-1-isopropoxypropane | 143i | 0.145 | — | 215 |
| 1-tert-butoxy-2,3-dimethoxypropane | 114t | 0.214 | — | 180 |
| 2-n-butoxy-1-tert-butoxy-3-methoxypropane | 144t | — | — | 219 |
| 1-n-butoxy-2,3-dimethoxypropane | 114 | 0.178 | — | 199 |
| 1,2-di-n-butoxy-3-methoxypropane | 144 | — | — | 234 |
| 1-isobutoxy-2,3-dimethoxypropane | 114i | — | — | 193 |
| 2-n-butoxy-1-isobutoxy-3-methoxypropane | 144i | — | — | 227 |
| 2-ethoxy-3-methoxy-1-isopropoxypropane | 123i | 0.167 | — | 161 |
| 3-ethoxy-2-methoxy-1-isopropoxypropane | 213i | 0.171 | — | 183 |
| 1-tert-butoxy-2-ethoxy-3-methoxypropane | 124t | — | — | 190 |
| 1-tert-butoxy-3-ethoxy-2-methoxypropane | 214t | 0.150 | — | 193 |
| 1-n-butoxy-2-ethoxy-3-methoxypropane | 124 | 0.155 | — | 209 |
| 1-n-butoxy-3-ethoxy-2-methoxypropane | 214 | 0.164 | — | 209 |
| 1-isobutoxy-2-ethoxy-3-methoxypropane | 124i | — | — | 198 |
| 1-isobutoxy-3-ethoxy-2-methoxypropane | 214i | 0.161 | — | 201 |
| 2,3-diethoxy-1-isopropoxypropane | 223i | — | — | 192 |
| 1-tert-butoxy-2,3-diethoxypropane | 224t | 0.155 | — | 199 |
| 1-n-butoxy-2,3-diethoxypropane | 224 | 0.161 | — | 217 |
| 1-isobutoxy-2,3-diethoxypropane | 224i | — | — | 210 |
| 1-n-butoxy-2-methoxy-3-isopropoxypropane | 413i | 0.155 | 1.67 | 218 |
| 1-n-butoxy-2-ethoxy-3-isopropoxypropane | 423i | — | — | 222 |
| 3-n-butoxy-1-tert-butoxy-2-methoxypropane | 414t | 0.141 | — | 234 |
| 3-n-butoxy-1-tert-butoxy-2-ethoxypropane | 424t | — | — | 211 |
| 1,3-di-n-butoxy-2-methoxypropane | 414 | 0.145 | 3.78 | 244 |
| 3-n-butoxy-1-isobutoxy-2-methoxypropane | 414i | 0.150 | — | 226 |
| 3-n-butoxy-1-isobutoxy-2-ethoxypropane | 424i | — | — | 241 |
| 3-isopropoxy-2-methoxy-1-(2,2,2-trifluoroethoxy)propane | 3i13F | — | — | 180 |
| 3-tert-butoxy-2-methoxy-1-(2,2,2-trifluoroethoxy)propane | 4t13F | 0.373 | 2.14 | 185 |
| 1,2,3-tri-n-butoxypropane | 444 | — | 2.72 | 270 |
| 3-n-butoxy-2-methoxy-1-(2,2,2-trifluoroethoxy)propane | 413F | — | — | 207 |
| 2-methoxy-1,3-bis(2,2,2-trifluoroethoxy)propane | 3F13F | 0.553 | 2.33 | 178 |
| 2-ethoxy-1,3-bis(2,2,2-trifluoroethoxy)propane | 3F23F | 0.595 | — | 171 |
| 2-n-butoxy-1,3-bis(2,2,2-trifluoroethoxy)propane | 3F43F | 0.574 | — | 208 |

—: no value reported.

Therefore, for the present work we selected 62 glycerol-based solvents, all of them prepared in our laboratory (Scheme 1 and Table 1),1 and analyzed the three above-mentioned properties, also determined by us, for this solvent set using several QSPR models.

Results and discussion

Molecular structure definition

There are many ways of describing the structure of a chemical compound as a vector of numbers. In this work we have used two different approaches based on molecular connectivity descriptors: topological parameters and DARC/PELCO descriptors.

Figure 1. DARC/PELCO scheme used to describe glycerol based

solvent structures.

Topological parameters are based on the molecular graph of each compound.25,31 They are easily determined from the connectivity and adjacency matrices of each compound. The number of connected components of a graph is a topological invariant that measures the number of structurally independent or disjoint subnetworks. These parameters are excellent descriptors of molecular size, shape and flexibility. They are global parameters, in the sense that the whole molecular structure is condensed into a single number. The topological descriptors selected for the QSPR studies in this work are: i) hydrogen bond acceptor count (HBA), ii) hydrogen bond donor count (HBD), iii) rotatable bond count (RB), iv) flexibility index ($\Phi$),32 v) Balaban index (Bal),33 vi) Wiener index (W),34 vii) Zagreb index (Z),35 viii) Kier shape index ($\kappa_n$),32 ix) subcount index (SC),36 and x) connectivity index ($\chi$).25 Full definitions of the indices used in the statistical analyses are given in the Supporting Information.

DARC/PELCO (Description, Acquisition, Retrieval and Computer-aided design / Perturbation of an Environment which is Limited, Concentric and Ordered)37 is another excellent way to describe chemical structures, yet it is much less used in QSPR studies. This system is particularly suitable for studying families of compounds with a common chemical substructure. The DARC/PELCO method is based on the exhaustive generation of all topochromatic sites around the reference structure (F0), which corresponds to the glycerol skeleton common to all structures, and on the evaluation of their contribution to the property. The DARC/PELCO descriptors are local, since each one indicates the presence or absence of a group of atoms in a given molecular position. Their definition is shown in Figure 1. In this definition we have incorporated the symmetry of the glycerol derivatives used, by assuming that groups occupying equivalent positions (i.e. those linked to carbons 1 and 3 of the glycerol moiety) have the same influence on the property under study. Preliminary studies showed that this simplification does not alter the results of the regression analyses.

Solvent Properties Selection

Solvent polarity ($E_T^N$).

Solvent polarity parameters have demonstrated their usefulness not only to classify organic solvents but also to explain solvent effects on very different physical and chemical processes. An excellent overview of solvent polarity parameters and their applications can be found in Reichardt's outstanding book.8 Although there are several procedures to quantify solvent polarity, solvatochromic measurements with probe dyes are undoubtedly the most successful methodology for an accurate determination of this solvent feature, owing to their ease of determination and their high sensitivity to small polarity changes. From this point of view, the Dimroth and Reichardt ET(30) parameter28-29 is one of the most widely used. ET(30) values represent a blend of the dipolarity/polarizability and hydrogen bond donor solvation abilities of the solvent, the latter contributing to the total ET(30) value to a greater extent. $E_T^N$ is a normalized form of ET(30), taking the value 0 for tetramethylsilane (TMS) and 1 for water.
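The conversion from the measured absorption maximum of the betaine dye to $E_T^N$ can be sketched as follows. The constants are standard literature values assumed here rather than taken from this paper: ET(30) (kcal/mol) = 28591/λmax (nm), and ET(30) = 63.1 and 30.7 kcal/mol for water and TMS, respectively.

```python
# Assumed literature constants (Reichardt scale), not taken from this paper.
ET30_WATER = 63.1  # kcal/mol
ET30_TMS = 30.7    # kcal/mol, tetramethylsilane


def et30_from_lambda_max(lambda_max_nm: float) -> float:
    """ET(30) in kcal/mol from the long-wavelength absorption maximum (nm)
    of the betaine dye: ET(30) = h*c*N_A / lambda = 28591 / lambda."""
    if lambda_max_nm <= 0:
        raise ValueError("lambda_max must be positive")
    return 28591.0 / lambda_max_nm


def et_normalized(et30: float) -> float:
    """E_T^N: 0 for TMS, 1 for water."""
    return (et30 - ET30_TMS) / (ET30_WATER - ET30_TMS)


def et30_from_normalized(etn: float) -> float:
    """Inverse transformation, E_T^N -> ET(30) in kcal/mol."""
    return ET30_TMS + etn * (ET30_WATER - ET30_TMS)


if __name__ == "__main__":
    # Glycerol (code 000) has E_T^N = 0.812 in Table 1.
    print(f"{et30_from_normalized(0.812):.1f} kcal/mol")  # 57.0 kcal/mol
```

Because the normalization is linear, regressions on $E_T^N$ and on ET(30) differ only by a change of scale and offset in the fitted coefficients.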

Viscosity.

Viscosity describes a fluid's internal resistance to flow and may be thought of as a measure of fluid friction. This property is particularly relevant to possible large-scale industrial applications, where large solvent volumes have to be stirred and pumped from one place to another.

Boiling point

One major problem concerning the use of organic solvents is the presence of traces of these compounds in the air. Indeed, solvents are the most common volatile organic compounds (VOCs). A large effort is currently being made to solve this problem by replacing these volatile solvents with less volatile or non-volatile ones. For this reason, it is important not only to measure this property but also to predict it. The boiling point is a quick and easy way to estimate the volatility of a solvent, since in general a higher boiling point correlates with a lower volatility at ambient pressure and temperature.

Quantitative Structure–Property Relationships

Multiple Linear Regression (MLR) with topological indices.

It is often assumed that the relationship between structural

parameters and experimental properties is well represented by

a linear model:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \quad \text{or, in matrix form,} \quad \mathbf{Y} = \mathbf{X}\mathbf{B} \tag{1}$$

In Eq. (1) the $b_i$ are unknown coefficients, and the objective of regression analysis is to estimate their values. As QSPR data sets consist of variables that are diverse in range, variation and size, auto-scaling is usually applied prior to regression analysis, i.e., the $i$th column is mean-centred (by subtracting $\bar{x}_i$) and scaled by $1/\mathrm{SD}(x_i)$, where SD is the standard deviation. When $\mathbf{X}$ is of full rank, the least squares solution is $\mathbf{B} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{Y}$, where $\mathbf{B}$ is the estimator vector for the regression coefficients. However, very often not all these coefficients are statistically significant, so the final QSPR model should keep only those descriptors that really contribute to the variation in the observed property. To this end we used a stepwise method for variable selection, in which the independent variables $x_i$ enter and leave the regression equation, and only those with statistically significant coefficients are kept in the final model.
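A minimal NumPy sketch of the auto-scaling step and of the least-squares estimate of Eq. (1) is given below, on synthetic data. It uses `numpy.linalg.lstsq` instead of explicitly inverting $\mathbf{X}^{\mathsf{T}}\mathbf{X}$, which gives the same solution for a full-rank matrix but is numerically safer. The stepwise variable selection used by the authors is not reproduced.

```python
import numpy as np


def autoscale(X):
    """Mean-centre each column and divide it by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    return (X - mean) / sd, mean, sd


def ols_fit(X, y):
    """Least-squares estimate of [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # column of ones carries b0
    # lstsq returns the same solution as (A^T A)^-1 A^T y when A has full rank,
    # without forming the inverse explicitly.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def r_squared(X, y, coef):
    """Coefficient of determination of a fitted linear model."""
    y = np.asarray(y, dtype=float)
    y_hat = coef[0] + np.asarray(X, dtype=float) @ coef[1:]
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))  # synthetic descriptors, not the paper's data
    y = 2.0 + X @ np.array([0.5, -1.0, 0.0]) + rng.normal(scale=0.1, size=40)
    Z, _, _ = autoscale(X)
    coef = ols_fit(Z, y)          # standardized (auto-scaled) coefficients
    print(coef.round(3), round(r_squared(Z, y, coef), 3))
```

Fitting on auto-scaled columns gives coefficients that are directly comparable in magnitude; fitting on the raw columns gives coefficients in the original descriptor units, as in the tables that follow.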

The three regression equations obtained for the three

experimental properties fitted are the following:

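$$E_T^N = b_0 + b_1\,\mathrm{HBA} + b_2\,\mathrm{HBD} + b_3\,\mathrm{SC}_{?} \tag{2}$$

$$\eta = b_0 + b_1\,\mathrm{HBD} \tag{3}$$

$$\mathrm{bp} = b_0 + b_1\,\mathrm{RB} + b_2\,\mathrm{HBD} + b_3\,\mathrm{HBA} \tag{4}$$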

The corresponding coefficient values and MLR parameters

are shown in Table 2.

Table 2. Linear regression parameters from equations 2–4.ᵃ

| | $E_T^N$ (Eq. 2) | η (Eq. 3) | b.p. (Eq. 4) |
|---|---|---|---|
| b0 ± e0 | 0.206 ± 0.035 | —ᵇ | 111.1 ± 17.0 |
| b1 ± e1 | 0.073 ± 0.010 | 14.50 ± 3.56 | 11.8 ± 1.9 |
| b2 ± e2 | 0.194 ± 0.021 | — | 24.7 ± 5.0 |
| b3 ± e3 | −0.019 ± 0.004 | — | −3.2 ± 1.2 |
| R² | 0.957 | 0.823 | 0.769 |
| σ(y) | 0.0437 | 7.51 | 12.2 |
| Fᶜ | 72.39 | 74.57 | 64.52 |

ᵃ b_i are the coefficients of each regression and e_i is the half-width of the 95% confidence interval of b_i. N is the number of cases (solvent data) used in each regression; R² is the coefficient of determination.
ᵇ As the b0 coefficient turned out to be non-significant in the standard MLR analysis, the fit was forced through the origin of coordinates, which gave a slight improvement in R².
ᶜ Critical values: F(3,43; 0.05) = 2.84; F(1,18; 0.05) = 4.41; F(3,59; 0.05) = 2.84. All equations are statistically significant at the 95% confidence level.
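Once the coefficients are known, applying Eqs. (2)–(4) is a matter of arithmetic. The sketch below simply hard-codes the central values of Table 2 (ignoring the confidence intervals). The descriptor values in the usage lines are hypothetical, and real predictions require descriptors computed with the same definitions and software as the authors'.

```python
# Central values of the coefficients in Table 2 (full solvent set).
# "SC" stands for the subcount index that appears in Eq. (2).
TABLE2 = {
    "ETN": {"b0": 0.206, "HBA": 0.073, "HBD": 0.194, "SC": -0.019},  # Eq. (2)
    "eta": {"b0": 0.0, "HBD": 14.50},                               # Eq. (3), through the origin
    "bp": {"b0": 111.1, "RB": 11.8, "HBD": 24.7, "HBA": -3.2},      # Eq. (4)
}


def predict(prop: str, descriptors: dict) -> float:
    """Evaluate one of Eqs. (2)-(4) for a dict of descriptor values."""
    coefs = TABLE2[prop]
    missing = [k for k in coefs if k != "b0" and k not in descriptors]
    if missing:
        raise KeyError(f"missing descriptors for {prop}: {missing}")
    return coefs["b0"] + sum(b * descriptors[k] for k, b in coefs.items() if k != "b0")


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    print(round(predict("bp", {"RB": 6, "HBD": 1, "HBA": 3}), 1))  # 197.0
    print(predict("eta", {"HBD": 2}))                               # 29.0
```

The η model depends only on HBD, so it can only distinguish solvents by their number of free OH groups; this is consistent with the limited predictivity reported for viscosity in the text.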

As can be seen, the hydrogen-bonding ability of the solvent seems to be the most important feature in modeling the three properties under study. This result is consistent with the kind of intermolecular interactions involved. It is well known that $E_T^N$ values are dominated by the HBD ability of the solvent, due to the strong specific solvation established through hydrogen bonding with the phenolate oxygen of the betaine dye. Similarly, the strong solvent–solvent hydrogen-bond interactions of most of the glycerol-derived solvents included in the study are at the origin of the viscosity values obtained, and hence of the importance of this coefficient in the MLR model. Finally, the same strong intermolecular interactions can be invoked to explain the high boiling points displayed by most of the solvents considered.

Figure 2 plots the experimental values vs. those calculated with the three MLR models. The dotted line represents the least squares fit between both sets of data.

Figure 2. Plots of predicted vs. experimental values of $E_T^N$ (a), viscosity (b), and boiling point (c), as calculated through MLR analysis using topological indices.

As can be seen, the best results are obtained for the $E_T^N$ solvation parameter, which is consistent with the higher determination coefficient obtained in the MLR analysis. In the other two cases, although there is a clear correlation, as indicated by the grouping of points around the diagonal, the fit is not good enough to yield a fully predictive model.

The robustness and predictive ability of the method were tested by splitting the data into a training set and a test set, the latter created by extracting eight solvents from the complete set, so that the training set consists of 54 solvents. The solvents of the test set (Table 3) were selected bearing in mind the representativity of the whole set, and for all the properties the test set size is within the usually recommended 10–20% of total cases.
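The hold-out split can be reproduced from the codes in Tables 1 and 3. The sketch below also checks the held-out fraction against the 10–20% guideline quoted above; note that it is computed over all 62 solvents, not per property.

```python
# Codes of the 62 solvents (Table 1) and of the 8-solvent test set (Table 3).
ALL_CODES = [
    "000", "100", "200", "400", "101", "103i", "104", "104i", "104t", "403i",
    "404", "404t", "404i", "203i", "204", "204t", "204i", "3i03i", "3i04t", "3i04i",
    "3i03F", "403F", "4t03F", "4i03F", "3F03F", "5F05F", "7F07F", "111", "113i", "143i",
    "114t", "144t", "114", "144", "114i", "144i", "123i", "213i", "124t", "214t",
    "124", "214", "124i", "214i", "223i", "224t", "224", "224i", "413i", "423i",
    "414t", "424t", "414", "414i", "424i", "3i13F", "4t13F", "444", "413F", "3F13F",
    "3F23F", "3F43F",
]
TEST_CODES = {"200", "104", "3i03F", "5F05F", "113i", "414t", "4t13F", "3F23F"}


def split_by_codes(codes, test_codes):
    """Return (training, test) lists of codes, preserving the original order."""
    unknown = set(test_codes) - set(codes)
    if unknown:
        raise ValueError(f"test codes not in data set: {sorted(unknown)}")
    train = [c for c in codes if c not in test_codes]
    test = [c for c in codes if c in test_codes]
    return train, test


if __name__ == "__main__":
    train, test = split_by_codes(ALL_CODES, TEST_CODES)
    print(len(train), len(test), f"{100 * len(test) / len(ALL_CODES):.1f}% held out")
    # 54 8 12.9% held out
```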

Table 3. Test set of eight solvents removed from the full set to create the 54-solvent training set.

| Solvent (code) | $E_T^N$ | η (cP) | b.p. (°C) |
|---|---|---|---|
| 200 | 0.690 | 35.14 | 221 |
| 104 | 0.480 | — | 208 |
| 3i03F | 0.590 | 6.90 | 176 |
| 5F05F | 0.699 | — | 204 |
| 113i | — | 1.03 | 170 |
| 414t | 0.141 | — | 234 |
| 4t13F | 0.373 | 2.14 | 185 |
| 3F23F | 0.595 | — | 171 |

The three new regression equations obtained with the 54-solvent training set are summarized in Table 4.

Table 4. Linear regression parameters from equations 2–4 obtained with the training set of solvents.ᵃ

| | $E_T^N$ (Eq. 2) | η (Eq. 3) | b.p. (Eq. 4) |
|---|---|---|---|
| b0 ± e0 | 0.196 ± 0.039 | —ᵇ | 111.4 ± 17.6 |
| b1 ± e1 | 0.071 ± 0.012 | 14.19 ± 4.59 | 11.7 ± 2.0 |
| b2 ± e2 | 0.200 ± 0.024 | — | 25.9 ± 5.4 |
| b3 ± e3 | −0.018 ± 0.005 | — | −3.0 ± 1.4 |
| R² | 0.953 | 0.791 | 0.782 |
| σ(y) | 0.0456 | 8.16 | 11.9 |
| Fᶜ | 238.24 | 45.31 | 59.64 |

ᵃ b_i are the coefficients of each regression and e_i is the half-width of the 95% confidence interval of b_i. N is the number of cases (solvent data) used in each regression; R² is the coefficient of determination.
ᵇ As the b0 coefficient turned out to be non-significant in the standard MLR analysis, the fit was forced through the origin of coordinates, which gave a slight improvement in R².
ᶜ Critical values: F(3,36; 0.05) = 2.88; F(1,12; 0.05) = 4.75; F(3,51; 0.05) = 2.79. All equations are statistically significant at the 95% confidence level.

As can be seen, the regression coefficients are in all cases very close to those calculated with the full set of solvents, which illustrates the robustness of the equations obtained. These new equations were used to predict the polarity, viscosity and boiling point of the solvents in the test set. As a measure of the goodness of the prediction we used the mean unsigned error (MUE). In the case of $E_T^N$, the MUE of the fitting of the training set was 0.028, whereas that of the predictions of the test set was 0.030, which represents less than 5% of the whole range of values (0.671). This points to a reasonable predictivity for the model developed. In the case of viscosity, the corresponding MUEs for the training and test sets are 7.28 and 5.08 cP, respectively, i.e. 18% of the whole range of values (41.0 cP) in the worst case, which indicates the poorer predictivity of the corresponding equations, although they could still be used in a semi-quantitative way. Finally, in the case of the boiling points, the MUE values for the training and test sets are 8.6 and 10.9 °C, respectively. Again, the error is only slightly higher for the "pure predictions" (test set), representing less than 8% of the whole range of values (140.0 °C), which would allow a reasonable degree of predictivity. A plot comparing the predicted and experimental values of the test set is given in Figure S1 of the Electronic Supplementary Information (ESI).
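The error measure used above can be written in a few lines. The sketch below defines the MUE and its expression as a percentage of the property range, and reproduces the percentages quoted in the text from the worst-case MUEs and the extreme values in Table 1. For viscosity, the quoted range of 41.0 corresponds to 1.03–42.03 cP, so glycerol (934 cP) is assumed not to be included.

```python
import numpy as np


def mue(y_obs, y_pred):
    """Mean unsigned error: the average of |observed - predicted|."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_obs - y_pred)))


def percent_of_range(error, y_values):
    """Express an error as a percentage of the span (max - min) of the property."""
    y_values = np.asarray(y_values, dtype=float)
    return 100.0 * error / (y_values.max() - y_values.min())


if __name__ == "__main__":
    # Worst-case MUEs quoted in the text and extreme values from Table 1.
    for name, worst_mue, extremes in [
        ("E_T^N", 0.030, [0.141, 0.812]),
        ("viscosity (cP)", 7.28, [1.03, 42.03]),
        ("b.p. (°C)", 10.9, [150.0, 290.0]),
    ]:
        print(f"{name}: {percent_of_range(worst_mue, extremes):.1f}% of range")
    # E_T^N: 4.5%, viscosity: 17.8%, b.p.: 7.8%
```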

Partial Least Squares (PLS) Regression with topological indices.

One problem when using topological indices is the high pairwise correlation between many of them, since they often capture similar structural features of the target molecule. This can have undesirable consequences in MLR analyses, since the real significance of a variable cannot be ascertained if it is highly correlated with another one. For instance, when examining the variable coefficients in Eq. 2, one should be aware that HBA and the subcount index (SC) in that equation have a pair correlation coefficient as high as 0.828 (full pairwise correlation data are gathered in Table S3 in the ESI).
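Screening a descriptor matrix for such collinear pairs is straightforward. The sketch below computes the Pearson correlation matrix with `numpy.corrcoef` and lists the pairs above a cut-off; the 0.8 threshold and the toy descriptors are assumptions chosen for illustration.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.8):
    """List descriptor pairs whose |Pearson r| exceeds `threshold`.
    Columns of X are descriptors, rows are compounds."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return pairs


if __name__ == "__main__":
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    X = np.column_stack([x1, 2 * x1 + 1, [2.0, -1.0, 0.0, 1.0, -2.0]])
    print(correlated_pairs(X, ["a", "b", "c"]))  # [('a', 'b', 1.0)]
```

Pairs flagged in this way are exactly the situation that motivates the orthogonal latent variables of PLS.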

Table 5. PLS regression results obtained in the treatment of the experimental solvent properties studied in this work.

| Descriptor | $E_T^N$ ᵃ | η ᵇ | b.p. ᶜ |
|---|---|---|---|
| HBA | 0.0141 | 0.238 | 8.9 |
| HBD | 0.1370 | 9.387 | 46.7 |
| [illegible] | 0.0097 | 0.713 | 6.4 |
| [illegible] | −0.0104 | −0.001 | −2.4 |
| BalJX | −0.1500 | −3.210 | 31.1 |
| BalJY | −0.0806 | −2.638 | 35.4 |
| [illegible] | 0.0000 | 0.003 | 0.0 |
| [illegible] | 0.0007 | 0.005 | −0.2 |
| $\kappa_?^{\alpha}$ | 0.0026 | −0.024 | 2.1 |
| $\kappa_?^{\alpha}$ | −0.0093 | 0.004 | −2.0 |
| $\kappa_?^{\alpha}$ | 0.0228 | −1.118 | −5.4 |
| $\mathrm{SC}_?$ | 0.0030 | −0.013 | 2.3 |
| $\mathrm{SC}_?$ | 0.0030 | −0.013 | 2.3 |
| $\mathrm{SC}_?$ | 0.0022 | 0.025 | −1.7 |
| $\mathrm{SC}_?$ | −0.0006 | 0.136 | −6.4 |
| $\mathrm{SC}_?$ | 0.0040 | 0.105 | −7.7 |
| $\chi_?$ | 0.0041 | 0.000 | 1.1 |
| $\chi_?$ | 0.0044 | −0.034 | 10.8 |
| $\chi_?$ | 0.0100 | −0.106 | −5.0 |
| $\chi_?$ | −0.0011 | 0.819 | 0.9 |
| $\chi_?$ | 0.0211 | −0.188 | 2.6 |
| $\chi_?^{?}$ | −0.0185 | −0.560 | −11.2 |
| $\chi_?^{?}$ | −0.0201 | −0.344 | −12.8 |
| $\chi_?^{?}$ | −0.0242 | −1.185 | 17.0 |
| $\chi_?^{?}$ | −0.0796 | 0.445 | 109.8 |
| $\chi_?^{?}$ | −0.0632 | −3.142 | 12.0 |
| Intercept | 0.9997 | 30.973 | −111.3 |
| R² ᵈ | 0.969 (0.954) | 0.700 (0.535) | 0.891 (0.770) |
| σ(y) | 0.036 | 7.29 | 8.1 |

ᵃ PLS regression used 4 latent variables built from the 26 original ones.
ᵇ PLS regression used 3 latent variables built from the 26 original ones.
ᶜ PLS regression used 7 latent variables built from the 26 original ones.
ᵈ Values in parentheses correspond to full cross-validated analyses, i.e. each value is predicted by the equation obtained leaving that solvent out. The resulting fit is therefore more representative of the true predictive ability of the model.

A possible solution to this problem is to transform the original variables into a small set of new orthogonal (uncorrelated) variables that gather most of the total variance of the data. In PLS regression,38,39 both the dependent (y) and the independent (x) variables are projected into a new space, seeking latent variables, built from x, that explain as much of the variance of y as possible. Once this relationship is found, the PLS coefficients are projected back to the original x-space to obtain the corresponding regression coefficients.
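A compact sketch of single-response PLS (PLS1) in the NIPALS formulation is given below. It is a generic textbook version, not necessarily the implementation used by the authors, which is not specified here. The last step, $\mathbf{B} = \mathbf{W}(\mathbf{P}^{\mathsf{T}}\mathbf{W})^{-1}\mathbf{q}$, is the back-projection to the original x-space described above; X may be auto-scaled beforehand.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS regression for a single response (PLS1, NIPALS form).

    Returns (intercept, coefficients), with the coefficients expressed for the
    original x variables (centring is handled internally).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean        # residual X block and y vector
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                      # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm == 0.0:
            raise ValueError("no covariance left; use fewer components")
        w /= norm
        t = E @ w                        # scores of the new latent variable
        tt = t @ t
        p = E.T @ t / tt                 # x loadings
        q = (f @ t) / tt                 # y loading
        E = E - np.outer(t, p)           # deflate X ...
        f = f - q * t                    # ... and y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # back-projection to the original x-space
    return y_mean - x_mean @ B, B


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 4))         # synthetic descriptors
    y = 0.5 + X @ np.array([1.0, -2.0, 0.3, 0.0]) + rng.normal(scale=0.1, size=40)
    b0, B = pls1_fit(X, y, n_components=2)
    print(round(float(b0), 3), B.round(3))
```

A useful sanity check: with as many components as (linearly independent) descriptors, PLS1 reproduces the ordinary least-squares solution, so the benefit of PLS comes from keeping only a few latent variables, as in Table 5.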

Figure 3. Plots of predicted vs. experimental values of $E_T^N$ (a), viscosity (b), and boiling point (c), as calculated through PLS analysis using topological indices.

When the PLS regression technique was applied to our problem, slightly better models were obtained for two of the three properties considered. The corresponding coefficients and PLS parameters are shown in Table 5, the most important coefficients again corresponding to the hydrogen-bonding indices. Plots of predicted vs. experimental values of the properties are displayed in Figure 3. As can be seen in these plots, in the case of $E_T^N$ the PLS model fits the values of most of the 62 solvents used in the analysis very well. The MUE of the fitted values is 0.028, identical to that obtained in the previous MLR analysis. The full cross-validated predictions (i.e., those obtained by leaving the predicted point out of the PLS calculation of the coefficients) are close to the normal predictions in all but one case (7F07F), which points to the robustness of the model and the reliability of the predictions. The MUE in this case is only slightly higher, 0.034. On the other hand, viscosity behaves poorly in the PLS analysis, with a determination coefficient (R²) even lower than that found in the MLR analysis. Again, hydrogen bond donor ability and $\kappa_3^{\alpha}$ are the topological variables with the highest coefficients. However, the MUE of the fitted values is 5.86 cP, and that of the cross-validated values increases to 7.88 cP; these values are not far from those obtained in the MLR analyses, although they seem too high to allow reliable quantitative predictions.
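Full (leave-one-out) cross-validation, as used for the values in parentheses in Table 5, can be sketched as follows. For brevity the model refitted at each step is ordinary least squares; the `fit` argument is a hypothetical hook through which a PLS fitting function could be passed instead, provided it returns [intercept, coefficients...]. The cross-validated R² (often written Q²) is 1 − PRESS/SS_tot.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def loo_predictions(X, y, fit=ols_fit):
    """Predict each solvent with a model fitted on all the others."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i    # leave solvent i out
        coef = fit(X[keep], y[keep])
        preds[i] = coef[0] + X[i] @ coef[1:]
    return preds


def q_squared(y, preds):
    """Cross-validated R^2: 1 - PRESS / total sum of squares."""
    y = np.asarray(y, dtype=float)
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(25, 2))         # synthetic data
    y = 1.0 + X @ np.array([0.8, -0.4]) + rng.normal(scale=0.2, size=25)
    preds = loo_predictions(X, y)
    print(round(q_squared(y, preds), 3), round(float(np.mean(np.abs(y - preds))), 3))
```

The gap between the fitted R² and Q² (for example 0.700 vs. 0.535 for viscosity in Table 5) is a direct indication of how much the fitted model relies on the individual data points.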

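Table 6. PLS regression results obtained in the treatment of the training set of solvents.

| Descriptor | $E_T^N$ ᵃ | η ᵇ | b.p. ᶜ |
|---|---|---|---|
| HBA | 0.0145 | 0.142 | 4.181 |
| HBD | 0.1390 | 10.982 | 38.876 |
| [illegible] | 0.0097 | 0.717 | 10.645 |
| [illegible] | −0.0096 | 0.047 | 1.938 |
| BalJX | −0.1680 | −2.110 | 31.450 |
| BalJY | −0.0990 | −2.036 | 29.459 |
| [illegible] | 0.0000 | 0.002 | −0.006 |
| [illegible] | 0.0006 | 0.003 | −0.226 |
| $\kappa_?^{\alpha}$ | 0.0022 | −0.019 | −0.010 |
| $\kappa_?^{\alpha}$ | −0.0088 | 0.047 | 2.044 |
| $\kappa_?^{\alpha}$ | 0.0213 | −1.290 | −1.801 |
| $\mathrm{SC}_?$ | 0.0027 | −0.011 | 0.155 |
| $\mathrm{SC}_?$ | 0.0027 | −0.011 | 0.155 |
| $\mathrm{SC}_?$ | 0.0022 | 0.019 | −1.142 |
| $\mathrm{SC}_?$ | −0.0006 | 0.125 | −3.202 |
| $\mathrm{SC}_?$ | 0.0042 | 0.079 | −1.859 |
| $\chi_?$ | 0.0038 | −0.001 | −0.943 |
| $\chi_?$ | 0.0034 | −0.017 | 3.560 |
| $\chi_?$ | 0.0101 | −0.128 | −4.585 |
| $\chi_?$ | −0.0008 | 0.757 | −1.390 |
| $\chi_?$ | 0.0219 | −0.268 | 7.733 |
| $\chi_?^{?}$ | −0.0180 | −0.441 | −8.287 |
| $\chi_?^{?}$ | −0.0184 | −0.186 | −7.186 |
| $\chi_?^{?}$ | −0.0198 | −0.864 | −0.827 |
| $\chi_?^{?}$ | −0.0699 | 0.959 | 69.209 |
| $\chi_?^{?}$ | −0.0519 | −3.170 | 19.237 |
| Intercept | 1.0874 | 22.028 | −56.840 |
| R² | 0.967 | 0.668 | 0.874 |
| σ(y) | 0.036 | 7.55 | 8.7 |

ᵃ PLS regression used 4 latent variables built from the 26 original ones.
ᵇ PLS regression used 3 latent variables built from the 26 original ones.
ᶜ PLS regression used 8 latent variables built from the 26 original ones.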

Finally, the fitting of boiling points is slightly better with the PLS approach (higher R² and lower σ(y)), and the resulting model is quite robust, with only three outliers: glycerol itself (000), 444 and 7F07F. In this case, the MUEs are 6.2 °C (fitted values) and 8.4 °C (cross-validated values), slightly better than those found in the MLR analyses.

In order to obtain a more reliable test of the predictive ability of these equations, we again split the data into the same training and test sets used in the MLR analyses. The results of the corresponding regressions are shown in Table 6. Plots of experimental vs. predicted values (including the solvents in the test set) are shown in Figure S2 (ESI).

As can be seen from the values in Table 6, both the goodness of fit and the regression coefficients obtained with the training set of solvents are quite similar to those calculated with the full set.

Concerning the prediction errors, the MUEs for $E_T^N$ are 0.027 for the training set (almost identical to that calculated with the full set of solvents) and 0.030 for the test set, which points to a good predictivity of the equations developed. Concerning viscosity, the corresponding MUE values are 6.33 and 4.62 cP for the training and test sets, respectively, which are also quite close to that obtained with the full set of solvents (5.86 cP) and point to a worse predictivity of this property by the model developed. Finally, the MUEs for the prediction of boiling points are 6.6 °C (training set) and 10.3 °C (test set). Even if the latter is clearly higher, it still represents about 7% of the full range of b.p. values, which may be enough to obtain a reasonable predictivity of this solvent property.

Multiple Linear Regression (MLR) with DARC/PELCO

descriptors.

In this case we again used the stepwise method to include in the regression equation only those variables that are statistically significant. It should be noted that, for predictive purposes, given the local character of the DARC/PELCO descriptors, the coefficients of all the variables not included in the final equations must be taken as zero. The three MLR equations thus obtained are the following:

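$$E_T^N = b_0 + b_1 A_{?} + b_2 B_{?} + b_3 A_{?} + b_4 B_{?} + b_5 C_{?} + b_6 C_{?} \tag{5}$$

$$\eta = b_0 + b_1 A_{?} + b_2 C_{?} + b_3 A_{?} \tag{6}$$

$$\mathrm{bp} = b_0 + b_1 D_{?} + b_2 A_{?} + b_3 C_{?} + b_4 C_{?} + b_5 B_{2} + b_6 C_{?} + b_7 B_{?} + b_8 A_{?} \tag{7}$$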

Table 7. Linear regression parameters from equations 5–7.

| | $E_T^N$ (Eq. 5) | η (Eq. 6) | b.p. (Eq. 7) |
|---|---|---|---|
| b0 ± e0 | 0.851 ± 0.057 | 70.79 ± 5.45 | 278.2 ± 10.6 |
| b1 ± e1 | −0.278 ± 0.023 | −32.50 ± 3.10 | 19.1 ± 3.60 |
| b2 ± e2 | 0.140 ± 0.024 | 6.90 ± 2.40 | −55.6 ± 6.68 |
| b3 ± e3 | −0.160 ± 0.038 | −3.52 ± 2.50 | 33.6 ± 6.32 |
| b4 ± e4 | −0.026 ± 0.012 | — | 12.6 ± 2.49 |
| b5 ± e5 | −0.059 ± 0.032 | — | 7.9 ± 1.86 |
| b6 ± e6 | −0.016 ± 0.014 | — | 12.0 ± 5.84 |
| b7 ± e7 | — | — | 7.0 ± 3.98 |
| b8 ± e8 | — | — | −6.1 ± 3.97 |
| R² | 0.972 | 0.981 | 0.933 |
| σ(y) | 0.036 | 2.08 | 6.9 |
| Fᵃ | 229.23 | 228.29 | 92.18 |

ᵃ Critical values: F(6,40; 0.05) = 2.34; F(3,14; 0.05) = 3.34; F(8,54; 0.05) = 2.18. All equations are statistically significant at the 95% confidence level.
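The rule that descriptors absent from the final equation contribute zero can be made explicit with a dictionary lookup that defaults to 0. The descriptor names and numbers below are hypothetical toy values, not the DARC/PELCO labels or the coefficients of Table 7.

```python
def predict_local(intercept, coefficients, descriptors):
    """Linear prediction from local (presence/absence) descriptors.

    coefficients: {descriptor name: fitted coefficient} for the variables kept
    by the stepwise selection; descriptors: {descriptor name: value} for the new
    compound. Any descriptor without a fitted coefficient contributes zero.
    """
    return intercept + sum(coefficients.get(name, 0.0) * value
                           for name, value in descriptors.items())


if __name__ == "__main__":
    # Toy numbers and hypothetical descriptor names, NOT the values of Table 7.
    coefs = {"A_x": 10.0, "B_y": -5.0}
    new_compound = {"A_x": 1, "C_z": 1}  # C_z was not selected, so it adds nothing
    print(predict_local(200.0, coefs, new_compound))  # 210.0
```

Conversely, a selected descriptor that is absent from the new compound simply does not appear in its dictionary, which is equivalent to a value of zero.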

The corresponding coefficient values and MLR parameters are shown in Table 7, and the plots of predicted vs. experimental values of the properties are displayed in Figure 4.

Figure 4. Plots of predicted vs. experimental values of $E_T^N$ (a), viscosity (b), and boiling point (c), as calculated through MLR analysis using DARC/PELCO descriptors.

As can be seen, the fitting of the three properties is better than with the previous approaches. Even viscosity is well fitted. At first sight, this cannot be ascribed to overfitting, since the final equation has only three independent variables to fit seventeen data points, i.e. more than five times as many data points as variables. Similarly, the boiling point also displays a very good fit, with a low standard error (ca. 7 °C).

The robustness of the method was tested again by removing the same test set of solvents (Table 3) from the entire data set and, as can be seen from the values gathered in Table 8, the regression coefficients in Eqs. 5–7 do not change dramatically, all values lying within the calculated confidence margins.

Table 8. Linear regression parameters from equations 5–7 using a reduced training set of 54 solvents.

| | $E_T^N$ (Eq. 5) | η (Eq. 6) | b.p. (Eq. 7) |
|---|---|---|---|
| b0 ± e0 | 0.839 ± 0.059 | 74.13 ± 6.81 | 281.3 ± 11.0 |
| b1 ± e1 | −0.289 ± 0.025 | −34.26 ± 3.78 | 18.4 ± 3.5 |
| b2 ± e2 | 0.126 ± 0.026 | 6.99 ± 2.50 | −56.4 ± 6.8 |
| b3 ± e3 | −0.148 ± 0.039 | −2.99 ± 2.99 | 33.0 ± 6.1 |
| b4 ± e4 | −0.030 ± 0.013 | — | 12.2 ± 2.4 |
| b5 ± e5 | −0.055 ± 0.041 | — | 7.8 ± 1.9 |
| b6 ± e6 | −0.016 ± 0.014 | — | 10.6 ± 7.5 |
| b7 ± e7 | — | — | 8.1 ± 4.3 |
| b8 ± e8 | — | — | −6.5 ± 4.1 |
| R² | 0.975 | 0.983 | 0.940 |
| σ(y) | 0.035 | 2.05 | 6.6 |
| Fᵃ | 206.40 | 174.86 | 87.83 |

ᵃ Critical values: F(6,33; 0.05) = 2.42; F(3,10; 0.05) = 3.71; F(8,46; 0.05) = 2.18. All equations are statistically significant at the 95% confidence level.

Figure S3 (in ESI) shows the predicted data for the eight members of the test group. The best-predicted property is the boiling point, whose deviations from experiment stay below ten percent even in the worst case. E_T^N displays a more erratic behaviour, especially for the fluorinated compounds. For these compounds the deviations are large in relative terms, although the experimentally observed qualitative order is preserved. As expected, the largest deviations correspond to the structural features least represented in the training set (highly branched and highly fluorinated chains).

Concerning the MUE, the training-set values are in all cases lower than those obtained with the topological descriptors (0.024, 1.37 and 4.6 for E_T^N, viscosity and b.p., respectively). They are, however, significantly higher for the test set (0.051, 2.05 and 10.3, respectively). In any case, these errors represent between 5% and 8% of the full range of values, which points to reasonably good predictivity of these equations.
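The following is a minimal sketch of how a test-set MUE is obtained. It uses the experimental and calculated values that survive in the ESI plot data. We assume these values belong to Figure S3 (DARC/PELCO model). That assignment is inferred, not stated in the text, but it is consistent with the result: the values reproduce the test-set MUE of 0.051 for E_T^N and give 10.25 ºC for the boiling point. The plotted boiling points are rounded to whole degrees, which explains the small gap to the 10.3 ºC quoted above.

```python
# Test-set values read from the ESI plot data (assumed to be Figure S3).
# Boiling point order: 200, 104, 3i03F, 5F05F, 113i, 414t, 4t13F, 3F23F;
# the E_T^N panel uses the same list without 113i.
BP_EXP = [221, 208, 176, 204, 170, 234, 185, 171]
BP_CALC = [233, 207, 192, 185, 178, 224, 194, 178]
ETN_EXP = [0.690, 0.480, 0.590, 0.699, 0.141, 0.373, 0.595]
ETN_CALC = [0.661, 0.497, 0.609, 0.795, 0.118, 0.290, 0.506]


def mue(predicted, observed):
    """Mean unsigned error: the average of |y_hat_i - y_i|."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


print(round(mue(BP_CALC, BP_EXP), 2))    # 10.25
print(round(mue(ETN_CALC, ETN_EXP), 3))  # 0.051
```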

As already mentioned, DARC/PELCO descriptors are highly intuitive, given their straightforward correspondence with the molecular structure. As a consequence, predicting the property of a new compound is extremely simple. As an example, we present the calculation of the boiling point of a glycerol-derived solvent that does not belong to our set of 62 solvents, namely 1,2,3-triethoxypropane (222). This compound and its boiling point have been described in the literature. The example therefore represents a “real world” prediction, since the property was determined by other authors using a different experimental technique. Table 9 gathers the detailed prediction procedure from the calculated regression coefficients. As can be seen, the predicted value (177 ºC) is reasonably close to the experimental one (181 ºC)40 and lies within the standard regression error. About 95% of predicted values should lie within ±14 ºC of the experimental ones, i.e. about ±2σ(y) with σ(y) = 6.9 ºC.

Table 9. Example of boiling point prediction of 1,2,3-triethoxypropane

(222) using the linear regression obtained with DARC/PELCO

descriptors.a

No of fragments

Total contribution

278.2

−6.1

−55.6

−111.2

0.0

B2

7.9

15.8

176.7

a Experimental value: 181 ºC.40
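The arithmetic behind Table 9 is a sum of coefficient × fragment-count terms. The sketch below adds the "Total contribution" entries as they appear in the table. We read −111.2 and 15.8 as 2 × (−55.6) and 2 × 7.9, the latter being the B2 row. The labels of the other rows did not survive extraction, so they are not named here. The result is compared with the experimental value and with ±2σ(y), taking σ(y) = 6.9 ºC from Table 7.

```python
# "Total contribution" column of Table 9 for 1,2,3-triethoxypropane (222):
# intercept b0, then coefficient x number-of-fragments terms.
contributions = [278.2, -6.1, -111.2, 0.0, 15.8]

predicted_bp = sum(contributions)   # linear model: b0 + sum_i b_i * n_i
experimental_bp = 181.0             # literature value quoted in Table 9
sigma_y = 6.9                       # standard error of the b.p. equation (Table 7)

residual = experimental_bp - predicted_bp
print(f"predicted {predicted_bp:.1f} ºC, residual {residual:.1f} ºC")
print(abs(residual) <= 2 * sigma_y)  # inside the ca. ±14 ºC band?
```

The predicted value is 176.7 ºC, with a residual of 4.3 ºC, well inside the ±13.8 ºC band.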

Multiple Linear Regression (MLR) with mixed DARC/PELCO and topological descriptors.

More compact prediction equations (equations 8−10) were obtained by mixing DARC/PELCO and topological indices, thus considering local and global structure descriptors simultaneously. The coefficients and statistical parameters for these regressions are gathered in Table 10, and the plots of predicted vs. experimental values of the properties are displayed in Figure 5.

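$E_T^N = b_0 + b_1\,\mathrm{HBD} + b_2\,B_{\cdots} + b_3\,A_{\cdots} + b_4\,\chi_{\cdots}$   Eq. (8)

$\eta = b_0 + b_1\,A_{\cdots} + b_2\,A_{\cdots} + b_3\,\chi_{\cdots}$   Eq. (9)

$bp = b_0 + b_1\,A_{\cdots} + b_2\,\mathrm{RB} + b_3\,\mathrm{Bal}_{\cdots} + b_4\,\chi_{\cdots} + b_5\,\chi_{\cdots}$   Eq. (10)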

Table 10. Linear regression factors from equations 8–10.

Coefficient | E_T^N | η (cP) | b.p. (ºC)
b0 ± e0 | 0.523 ± 0.122 | 67.55 ± 3.94 | 292.6 ± 35.8
b1 ± e1 | 0.140 ± 0.042 | −35.86 ± 2.51 | −49.7 ± 9.0
b2 ± e2 | 0.177 ± 0.020 | −5.27 ± 1.75 | 12.9 ± 3.4
b3 ± e3 | −0.099 ± 0.043 | 0.99 ± 0.23 | −26.0 ± 13.2
b4 ± e4 | −0.026 ± 0.010 | — | 20.4 ± 6.9
b5 ± e5 | — | — | −8.3 ± 7.5
R² | 0.968 | 0.989 | 0.932
σ(y) | 0.036 | 1.46 | 6.8
F | 314.31 | 376.64 | 153.47

Critical values: F(6,40; 0.05) = 2.34; F(3,14; 0.05) = 3.34; F(8,54; 0.05) = 2.18. All equations are statistically significant at the 95% confidence level.

Although the statistical tests are very similar to those obtained with the DARC/PELCO descriptors alone, fewer independent variables are used in the final equations, leading to higher cases-to-variables ratios. In the case of viscosity, the number of independent variables does not change, but the standard error of the predictions improves (from 2.08 to 1.46 cP).

The robustness and predictivity of these equations were again tested by splitting the solvent set into training and test sets. The corresponding regression results are gathered in Table 11. As can be seen, there are no significant changes in the fitting parameters and regression coefficients. Figure S4 (in ESI) shows the predicted data for the eight members of the test group.

Figure 5. Plots of predicted vs. experimental values of E_T^N (a), viscosity (b), and boiling point (c), as calculated through MLR analysis using topological indices and DARC/PELCO descriptors.

Table 11. Linear regression factors from equations 8–10 using the reduced training set.

Coefficient | E_T^N | η (cP) | b.p. (ºC)
b0 ± e0 | 0.512 ± 0.129 | 70.41 ± 4.49 | 288.6 ± 38.2
b1 ± e1 | 0.139 ± 0.045 | −37.83 ± 2.71 | −49.3 ± 8.9
b2 ± e2 | 0.173 ± 0.021 | −5.46 ± 1.91 | 13.3 ± 3.5
b3 ± e3 | −0.112 ± 0.050 | 1.03 ± 0.23 | −22.5 ± 14.1
b4 ± e4 | −0.024 ± 0.011 | — | 19.5 ± 6.6
b5 ± e5 | — | — | −9.3 ± 7.6
R² | 0.969 | 0.993 | 0.944
σ(y) | 0.038 | 1.35 | 6.2
F | 268.19 | 405.64 | 161.36

Critical values: F(4,35; 0.05) = 2.64; F(3,10; 0.05) = 3.71; F(5,49; 0.05) = 2.42. All equations are statistically significant at the 95% confidence level.

Comparing the MUE calculated for the training set and for the test set shows that prediction errors are significantly higher in the latter case. They are nevertheless lower than those obtained with the preceding models, representing 5–6% of the full range of experimental values in all cases. A summary of the MUE calculated for all the equations developed in this work is gathered in Table 12.

Table 12. Mean unsigned errors (MUE) calculated for the different equations developed in this work.a

Model | E_T^N training | E_T^N test | η (cP) training | η (cP) test | b.p. (ºC) training | b.p. (ºC) test
MLR Topol. | 0.028 | 0.030 | 7.28 | 5.08 | 8.6 | 10.9
PLS Topol. | 0.027 | 0.030 | 6.34 | 4.62 | 6.6 | 10.3
MLR D.-P. | 0.024 | 0.051 | 1.37 | 2.05 | 4.6 | 10.3
MLR Mixed | 0.026 | 0.033 | 1.02 | 1.95 | 3.9 | 8.8

a Boldface values indicate errors within 5% of the full range of experimental values, and italicized values indicate errors within 8% of the full range.

If we take the MUE calculated for the test set as a measure of the actual predictivity of the equations, we can conclude that good predictive models have been developed for the three properties under study. Topological descriptors seem to be more adequate for the prediction of E_T^N, mostly because DARC/PELCO descriptors predict fluorinated solvents poorly. DARC/PELCO descriptors, on the other hand, perform much better in the prediction of viscosities. Overall, the mixed DARC/PELCO-topological model constitutes the best compromise for reasonably predicting the three solvent properties studied here.

A referee suggested that PLS analyses could also be applied to the DARC/PELCO and mixed parameter models. The corresponding results can be found in the Electronic Supplementary Information. Since no improvement over the MLR equations was obtained in any case, they will not be discussed here.

Experimental

Glycerol-based solvents were obtained by ring opening of either the appropriate glycidyl ether (non-symmetric glycerol-based solvents) or epichlorohydrin (symmetric glycerol-based solvents) with the corresponding alkoxide in alcoholic media, and were purified by vacuum distillation as described previously.1

The complete list of the 62 solvents used in the QSPR analyses and the values of the experimental properties studied are gathered in Table 1.

Different topological descriptors were calculated for the molecular structure of every solvent using Materials Studio Modeling 4.0 from Accelrys, which computes topological descriptors from the molecular structural information. All these descriptors are gathered in Table S1 of the Electronic Supplementary Information.

DARC/PELCO descriptors were generated from the scheme shown in Figure 1. The presence of a C unit (bearing the corresponding hydrogen atoms) was coded as 1 in the data matrix, or as 2 if the unit is present at both symmetric sides of the glycerol moiety. C units bearing fluorine atoms were coded as independent variables (those starting with “F” in the regression analyses). The final DARC/PELCO matrix is gathered in Table S2.

Multiple linear regression analyses were carried out using the SPSS software. All Tables provide the following information:

- Regression coefficients bi, as defined previously (B = (XᵀX)⁻¹XᵀY).

- Individual confidence intervals (at the 95% probability level) of each bi coefficient. These confidence intervals are calculated from the estimated standard error of bi and Student's t distribution with N−p degrees of freedom:
ei = s.e.(bi)·t(N−p, 0.975)

- Number of cases included in the regression, N.

- Multiple determination coefficient, R², which measures the proportion of the total variation about the mean of y explained by the regression.

- Standard error of the regression, σ(y), the square root of the residual mean square. It estimates the error with which any observed value of y could be predicted by the regression equation.

- F value, defined as the quotient of the regression and residual mean squares. It is compared with a Fisher-Snedecor F distribution with p−1 and N−p degrees of freedom at a 95% probability level (critical values are given in the table footnotes). This comparison establishes whether the variance explained by the regression equation differs significantly from that of the error. More strictly, it tests the null hypothesis H0 that all regression coefficients except the constant term are zero. If the calculated F value is larger than the tabulated one, the hypothesis is rejected and the equation is considered statistically significant.

- Stepwise linear regression is a procedure to select the “best” regression equation from a set of independent variables, x. Variables are included in the equation one at a time, according to their simple correlation with the response, y. For each new variable entering, a partial F-test checks whether the improvement in the equation is significant. If the variable is accepted, partial F-tests are also performed for the variables already in the equation, and those not passing the test are eliminated. The procedure is repeated until no more variables enter the equation. Partial F-tests are carried out at a 90% probability level.
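The following is a minimal NumPy sketch of the quantities listed above for an ordinary least-squares fit. It is not the SPSS implementation used in this work. The function name `mlr_summary` is hypothetical, the Student t quantile is passed in by the caller so that only NumPy is needed, and the usage example runs on synthetic data rather than on the solvent descriptors.

```python
import numpy as np


def mlr_summary(X, y, t_crit):
    """Least-squares MLR returning the statistics listed above.

    X      : (N, p-1) descriptor matrix, WITHOUT the intercept column.
    y      : (N,) response vector.
    t_crit : Student t quantile t(N-p, 0.975), e.g. taken from tables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])    # prepend a column of ones for b0
    p = Xd.shape[1]                           # number of fitted coefficients

    xtx_inv = np.linalg.inv(Xd.T @ Xd)
    b = xtx_inv @ Xd.T @ y                    # B = (X^T X)^-1 X^T Y

    resid = y - Xd @ b
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    ms_res = ss_res / (n - p)                 # residual mean square
    ms_reg = (ss_tot - ss_res) / (p - 1)      # regression mean square

    return {
        "b": b,
        "e": np.sqrt(np.diag(xtx_inv) * ms_res) * t_crit,  # 95% half-widths e_i
        "N": n,
        "R2": 1.0 - ss_res / ss_tot,
        "sigma_y": np.sqrt(ms_res),
        "F": ms_reg / ms_res,
        "df": (p - 1, n - p),
    }


# Synthetic example (NOT the solvent data): 18 cases and 3 descriptors,
# giving (3, 14) degrees of freedom as for the viscosity fit in Table 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 3))
true_b = np.array([70.0, -32.0, 7.0, -3.5])
y = true_b[0] + X @ true_b[1:] + rng.normal(scale=2.0, size=18)

res = mlr_summary(X, y, t_crit=2.145)         # t(14, 0.975) is about 2.145
print(res["df"], res["F"] > 3.34)             # compare with F(3,14; 0.05) = 3.34
```

An explicit inverse mirrors the matrix formula in the text. For ill-conditioned descriptor matrices, `np.linalg.lstsq` gives the same coefficients more stably.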

The Mean Unsigned (or Absolute) Error (MUE or MAE) is the average of the absolute errors ei = |ŷi − yi|, where ŷi is the value predicted by the model and yi the experimental value.

Conclusions

In this study, three characteristic properties relevant to classifying solvents and to finding substitution uses have been investigated in a series of 62 glycerol derivatives that can be used as solvents. Global topological descriptors, based on the molecular graphs, have been successfully applied to analyze and predict solvent polarities with both traditional MLR and PLS regression analyses. However, boiling points and viscosities are not as well modeled using this kind of structural variable.

On the other hand, DARC/PELCO local structural descriptors have proved clearly superior for describing the viscosity of this family of solvents. Boiling points are similarly well predicted with both kinds of approaches. Overall, the mixed model with DARC/PELCO and topological descriptors constitutes the best compromise for reasonably predicting the three solvent properties studied in this work.

Highly significant regression equations have been developed for the three properties under study. The robustness and predictive value of these equations have been demonstrated through the use of an independent test set of solvents. Therefore, the QSPR models developed provide significant additional insight into the relationship between molecular structure and some fundamental solvent properties.

Based on these results, quantitative structure–activity/property relationships (QSAR/QSPR) appear to be quite useful for the in silico prediction of physico-chemical properties, allowing a faster selection of target solvents for a given application.

Acknowledgements

Financial support from the Spanish MINECO (project CTQ2011-28124-C02-01), the European Social Fund (ESF) and the Gobierno de Aragón (Grupo Consolidado E11) is gratefully acknowledged.

Notes and references

a Instituto de Síntesis Química y Catálisis Homogénea, Facultad de

Ciencias, CSIC-Univ. de Zaragoza, Pedro Cerbuna, 12, E-50009

Zaragoza, Spain. Tel: +34 976762271; E-mail: jig@unizar.es

b Dept. Organic Chemistry, Facultad de Ciencias, Univ. de Zaragoza,

Pedro Cerbuna, 12, E-50009 Zaragoza, Spain.

c Dept. Physical Chemistry, Facultad de Ciencias, Univ. de Zaragoza,

Pedro Cerbuna, 12, E-50009 Zaragoza, Spain.

1 García, J. I., García-Marín, H., Mayoral, J. A., Pérez, P., Green Chem., 2010, 12, 426.
2 Cramer, R. D., J. Am. Chem. Soc., 1980, 102, 1837.
3 Carlson, R., Design and Optimization in Organic Synthesis. Ed. Elsevier, Amsterdam, 1992.
4 Chastrette, M., Rajzmann, M., Chanon, M., Purcell, K. F., J. Am. Chem. Soc., 1985, 107, 1.
5 Koppel, I. A., Palm, V. A., The Influence of the Solvent on Organic Reactivity, in Advances in Linear Free Energy Relationships. Ed. Plenum Press, London, 1972.
6 Kamlet, M. J., Abboud, J. L., Abraham, M. H., Taft, R. W., J. Org. Chem., 1983, 48 (17), 2877.
7 Catalán, J., Solvent Effects based on non-HBD Solvents in Handbook of Solvents. Ed. William Andrew Publishing, New York, 2001.
8 Reichardt, C., Solvents and Solvent Effects. 3rd ed.; Ed. Wiley-VCH, Weinheim, 2003.
9 Ravi, M., Hopfinger, A. J., Hormann, R. E., Dinan, L., J. Chem. Inf. Comput. Sci., 2001, 41, 1587.
10 Luke, B. T., J. Mol. Struct. (Theochem.), 1999, 13, 468.
11 Bruneau, P., J. Chem. Inf. Comput. Sci., 2001, 41, 1605.
12 Katritzky, A. R., Petrukhin, R., Tatham, D., J. Chem. Inf. Comput. Sci., 2001, 41, 679.
13 Ghasemi, J., Saaidpour, S., Brown, S. D., J. Mol. Struct. (Theochem.), 2007, 805, 27.
14 Brauner, N., Shacham, M., Cholakov, G. S., Stateva, R. P., Chem. Eng. Sci., 2005, 60, 5458.
15 Lind, P., Lopes, C., Oberg, K., Eliasson, B., Chem. Phys. Lett., 2004, 387, 238.
16 Ungerer, P., Nieto-Draghi, C., Rousseau, B., Ahunbay, G., Lachet, V., J. Mol. Liq., 2007, 134, 71.
17 Ghasemi, J. B., Abdolmaleki, A., Mandoumi, N., J. Hazardous Mat., 2009, 161, 74.
18 Fatemi, M. H., Haghdadi, M., J. Mol. Struct., 2008, 886, 43.
19 Torrecilla, J. S., Palomar, J., Lemus, J., Rodríguez, F., Green Chem., 2010, 12, 123.
20 Alvarez-Guerra, M., Irabien, A., Green Chem., 2011, 13, 1507.
21 Yan, F., Xia, S., Wang, Q., Ma, P., J. Chem. Eng. Data, 2012, 57, 2252.
22 Consonni, V., Todeschini, R., Pavan, M., Gramatica, P., J. Chem. Inf. Comput. Sci., 2002, 42, 693.
23 Krenkel, G., Castro, E. A., Toropov, A. A., J. Mol. Struct. (Theochem.), 2001, 542, 107.
24 Ghasemi, J., Shahmirani, S., Farahani, E. V., Ann. Chim., 2006, 96, 327.
25 Kier, L. B., Hall, L. H., Molecular Connectivity in Structure-Activity Analysis. Ed. Research Studies Press Ltd, New York, 1985.
26 Katritzky, A. R., Fara, D. C., Kuanar, M., Hur, E., Karelson, M., J. Phys. Chem. A, 2005, 109, 10323.
27 Draper, N. R., Smith, H., Applied Regression Analysis. Ed. Wiley-Interscience, 1998.
28 Dimroth, K., Reichardt, C., Siepmann, T., Bohlmann, F., Liebigs Ann. Chem., 1963, 1, 661.
29 Dimroth, K., Reichardt, C., Schweig, A., Liebigs Ann. Chem., 1963, 95, 669.
30 Lide, D. R., Handbook of Chemistry and Physics. 84th ed.; Ed. CRC, New York, 2004.
31 Katritzky, A. R., Gordeeva, E. V., J. Chem. Inf. Comput. Sci., 1993, 835.
32 Hall, L. H., Kier, L. B., Rev. Comput. Chem. II, 1991, 367.
33 Balaban, A. T., Chem. Phys. Lett., 1982, 309.
34 Wiener, H., J. Chem. Phys., 1947, 17.
35 Bonchev, D., Information Theoretic Indices for Characterization of Chemical Structures. Ed. Research Studies Press Ltd., New York, 1983.
36 Kier, L. B., Hall, L. H., Molecular Connectivity Indices in Chemistry and Drug Research. Ed. deStevens, New York, 1976.
37 Dubois, J. E., Computer Representation and Manipulation of Chemical Information. Ed. Wiley, New York, 1974.
38 Wold, S., Ruhe, A., Wold, H., Dunn, W., SIAM J. Sci. Stat. Comput., 1984, 5, 735.
39 Geladi, P., Kowalski, B. R., Anal. Chim. Acta, 1986, 185, 1.
40 Fairbourne, A., Gibson, G. P., Stephens, D. W., J. Chem. Soc., 1931, 445.


Electronic Supplementary Information for

Quantitative structure-property relationships prediction of

some physico–chemical properties of glycerol based solvents.

José I. García,*a Héctor García-Marín,a José A. Mayoral,a,b and Pascual Pérezc

a Instituto de Síntesis Química y Catálisis Homogénea, Facultad de Ciencias, CSIC-Univ. de Zaragoza, Pedro

Cerbuna, 12, E-50009 Zaragoza, Spain. Tel: +34 976762271; E-mail: jig@unizar.es.

b Dept. Organic Chemistry, Facultad de Ciencias, Univ. de Zaragoza, Pedro Cerbuna, 12, E-50009 Zaragoza, Spain.

E-mail: mayoral@unizar.es.

c Dept. Physical Chemistry, Facultad de Ciencias, Univ. de Zaragoza, Pedro Cerbuna, 12, E-50009 Zaragoza, Spain.

E-mail: pascual@unizar.es.

Definition of the topological parameters

Topological indices are usually obtained from two-dimensional molecular structures (molecular graphs, G),

mostly through the connectivity adjacency (A(G)) and topological distance matrices (D(G)), and the vertex

degree vector (δ(G)):

[Scheme: adjacency matrix A(G), vertex degree vector δ(G) and topological distance matrix D(G).]

Topological indices are calculated from different invariant features of the molecular graph and contain information about molecular size, molecular shape, branching, molecular flexibility, etc. The exact definitions of the indices used in this work are given below.

Balaban indices (JX, JY):15, 23

The Balaban index is defined as:

$J = \frac{M}{\mu + 1} \sum_{\text{bonds } ij} (s_i s_j)^{-1/2}, \qquad \mu = M - N + 1$

where M is the number of bonds, N is the number of atoms in the molecule, μ is the cyclomatic number, and si is the sum of the terms in row i of a modified topological distance matrix. In this modified distance matrix, each bond contributes 1/b to the distance, with b = 1 for single bonds, b = 2 for double bonds, b = 3 for triple bonds, and b = 1.5 for aromatic bonds:

$s_i = \sum_{j=1}^{N} d_{ij}$

Corrections for heteroatoms have been introduced through factors accounting for the electronegativity (X) and the atomic radius (Y):

$X_i = 0.4196 - 0.0078\,Z_i + 0.1567\,G_i, \qquad Y_i = 1.1191 + 0.0160\,Z_i - 0.0537\,G_i$

where Zi is the atomic number and Gi is the group number in the Periodic Table of the elements. From these corrections, the corrected distance sums are defined as:

$s_i^X = X_i \cdot s_i$ (for the JX index)

$s_i^Y = Y_i \cdot s_i$ (for the JY index)
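The following is a minimal sketch of the plain (unweighted) Balaban J index for a hydrogen-suppressed graph with single bonds only. In that case every bond has b = 1, so si is simply the row sum of the ordinary distance matrix. The heteroatom-weighted JX and JY variants would additionally multiply si by Xi or Yi and are not computed here. Glycerol and propane are used only as illustrative inputs; we do not claim that the glycerol value appears in Table S1.

```python
from collections import deque

# Hydrogen-suppressed glycerol, HOCH2-CH(OH)-CH2OH (illustrative labels):
# atoms 0-2 are C1-C3, atoms 3-5 are the oxygens on C1, C2 and C3.
GLYCEROL_BONDS = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)]


def distance_matrix(n_atoms, bonds):
    """Topological distance matrix D(G) by breadth-first search (all b = 1)."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    dist = []
    for src in range(n_atoms):
        d = [-1] * n_atoms
        d[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:               # not visited yet
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def balaban_j(n_atoms, bonds):
    """J = M/(mu+1) * sum over bonds (s_i s_j)^-1/2, with mu = M - N + 1."""
    s = [sum(row) for row in distance_matrix(n_atoms, bonds)]  # distance sums
    m = len(bonds)
    mu = m - n_atoms + 1                   # cyclomatic number (0 if acyclic)
    return m / (mu + 1) * sum((s[i] * s[j]) ** -0.5 for i, j in bonds)


print(round(balaban_j(6, GLYCEROL_BONDS), 3))        # 2.754
print(round(balaban_j(3, [(0, 1), (1, 2)]), 3))      # propane: 1.633
```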

Wiener index (W):16, 24

The Wiener index is defined as the sum of the lengths of the shortest paths between all pairs of vertices in the chemical graph representing the non-hydrogen atoms of the molecule. It is easily computed from the topological distance matrix:

$W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij}$

This index is a measure of the centrality of the graph, and hence it is related to molecular compactness.
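The Wiener definition translates directly into a breadth-first search from every atom. The following is a minimal pure-Python sketch for hydrogen-suppressed graphs given as lists of bonds; the atom numbering is an arbitrary choice made here. Glycerol and n-pentane serve as small checks.

```python
from collections import deque


def wiener_index(n_atoms, bonds):
    """W = 1/2 * sum_i sum_j d_ij over the hydrogen-suppressed graph."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    total = 0
    for src in range(n_atoms):             # one BFS gives row src of D(G)
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        total += sum(d.values())
    return total // 2                       # every pair was counted twice


glycerol = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)]  # C1-C3 = 0-2, O = 3-5
n_pentane = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(wiener_index(6, glycerol), wiener_index(5, n_pentane))  # 31 20
```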

Zagreb index:17

It is defined as the sum of the squares of the differences between the number of electrons participating in covalent bonds and the number of hydrogen atoms bonded to the same atom. This is equivalent to the sum of the squares of the vertex degrees, δi:

$\mathrm{Zagreb} = \sum_i \delta_i^2$
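For a hydrogen-suppressed graph with single bonds only, δi is simply the number of heavy-atom neighbours, so the Zagreb index reduces to a degree count. The following is a minimal sketch, with glycerol and n-pentane as illustrative inputs.

```python
def zagreb_index(n_atoms, bonds):
    """First Zagreb index: sum of the squared vertex degrees delta_i."""
    degree = [0] * n_atoms
    for i, j in bonds:                 # each bond raises two vertex degrees
        degree[i] += 1
        degree[j] += 1
    return sum(d * d for d in degree)


glycerol = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)]  # C1-C3 = 0-2, O = 3-5
print(zagreb_index(6, glycerol))  # 20
```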


Randić and Kier & Hall connectivity indices (χ):18

The χ indices were first proposed by Randić25 from the vertex degrees as

$^1\chi = \sum_{B} (\delta_i \cdot \delta_j)^{-1/2}$,

extended to all bonds in the molecule (B).

Kier and Hall extended the definition by including the number of edges of a given sub-graph (h) and different kinds of sub-graphs (r):

$^h\chi_r = \sum_{k=1}^{\sigma_n} \left( \prod_{i \in k} \delta_i \right)^{-1/2}$

where σn is the number of sub-graphs of length h and δ is the vertex degree.

There are four kinds of sub-graphs, known as path (linear chains), cluster (branched chains), path/cluster, and

chain (cycles), each one emphasizing a particular aspect of the molecular connectivity. The n superindex refers

to the number of bonds considered to calculate the topological index. Thus, n=0 refers to individual atoms, n=1

refers to directly connected atoms, n=2 refers to three atoms connected through two consecutive bonds, and so

on.
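The following sketch computes simple (non-valence) Kier–Hall χ indices by explicitly enumerating paths and third-order clusters, which is adequate for molecules of this size. For glycerol, the results coincide with values listed in Table S1 for the solvent coded 000, which we assume to be glycerol itself. Path/cluster and chain sub-graphs are not handled, and the function names are illustrative.

```python
from itertools import combinations
from math import prod


def adjacency(n_atoms, bonds):
    adj = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def simple_paths(adj, length):
    """All simple paths with `length` edges, each listed once as an atom tuple."""
    paths = []

    def extend(path):
        if len(path) == length + 1:
            if path[0] <= path[-1]:        # keep only one of the two directions
                paths.append(tuple(path))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(adj)):
        extend([start])
    return paths


def chi(n_atoms, bonds, length, kind="path"):
    """^h chi_r: sum over sub-graphs of (product of vertex degrees)^-1/2."""
    adj = adjacency(n_atoms, bonds)
    delta = [len(a) for a in adj]          # simple vertex degrees
    if kind == "path":
        subgraphs = simple_paths(adj, length)
    elif kind == "cluster" and length == 3:
        # third-order cluster: a central atom plus three of its neighbours
        subgraphs = [(c, *trio) for c in range(n_atoms)
                     for trio in combinations(sorted(adj[c]), 3)]
    else:
        raise ValueError("only paths and third-order clusters are handled")
    return sum(prod(delta[a] for a in sg) ** -0.5 for sg in subgraphs)


glycerol = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)]  # C1-C3 = 0-2, O = 3-5
values = [chi(6, glycerol, h) for h in range(4)] + [chi(6, glycerol, 3, "cluster")]
print([round(v, 2) for v in values])   # [4.99, 2.81, 1.92, 1.39, 0.29]
```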

A further refinement10d, 18 can be included in the χ indices by considering the atom valences, which makes it possible to distinguish the heteroatoms present in the structure. This is accomplished by calculating a “corrected” δ value from the atomic number and the number of valence electrons of the vertex atoms:

$\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$

where Zv is the number of valence electrons, Z is the atomic number and h is the number of hydrogen atoms bonded to the vertex atom. The resulting “valence-corrected” indices are named $^h\chi_r^v$.
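Applying the valence-corrected degree to glycerol gives δv = 2 for CH2, 3 for CH and 5 for OH. The following sketch computes the zero- and first-order valence indices from these values. The element data are limited to C and O for this example. The results appear to match the χv0 and χv1 entries of the 000 row of Table S1, again assuming that this row is glycerol.

```python
# (Z, Z_v) for the elements needed here; an illustrative subset only.
ELEMENTS = {"C": (6, 4), "O": (8, 6)}


def valence_delta(element, n_h):
    """Kier-Hall valence degree: delta_v = (Z_v - h) / (Z - Z_v - 1)."""
    z, zv = ELEMENTS[element]
    return (zv - n_h) / (z - zv - 1)


# Glycerol, hydrogen-suppressed: (element, attached H) for atoms C1, C2, C3, O1, O2, O3.
atoms = [("C", 2), ("C", 1), ("C", 2), ("O", 1), ("O", 1), ("O", 1)]
bonds = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)]

dv = [valence_delta(el, h) for el, h in atoms]
chi0_v = sum(d ** -0.5 for d in dv)                        # atoms
chi1_v = sum((dv[i] * dv[j]) ** -0.5 for i, j in bonds)    # bonds
print(dv, round(chi0_v, 2), round(chi1_v, 2))  # [2.0, 3.0, 2.0, 5.0, 5.0, 5.0] 3.33 1.71
```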

Kier & Hall count indices (SC):

SC is the count of sub-graphs of a given length present in the molecule. Thus, SC=0 is the number of atoms, SC=1 the number of chemical bonds, SC=2 the number of pairs of adjacent bonds, and so on. For longer sub-graphs, path, cluster, path/cluster and chain types can also be considered.

Kier shape indices (κn):

All the preceding topological indices are heavily influenced by the size of the molecular graph. Kier developed the κ indices to better discriminate between different molecular shapes. They are defined from sub-graphs of a given length, also taking into account the maximum and minimum connectivity possible for the same length. This “normalizes” the κ values, making them independent of the molecular size:

$^m\kappa = K \, \frac{^mP_{max} \cdot {}^mP_{min}}{(^mP_i)^2}$

where m is the chosen sub-graph length, mPi is the number of sub-graphs of length m contained in the whole graph, and mPmax and mPmin are the maximum and minimum numbers of sub-graphs of length m that a graph with the same number of atoms N can contain. Some examples are given below.

κ1, K = 2: $^1P_{min} = N - 1$, $^1P_{max} = N(N-1)/2$ (1P counts edges)

κ2, K = 2: $^2P_{min} = N - 2$, $^2P_{max} = (N-1)(N-2)/2$ (2P counts pairs of adjacent edges)

κ3, K = 4: $^3P_{min} = N - 3$, $^3P_{max} = (N-1)(N-3)/4$ for odd N and $(N-2)^2/4$ for even N (3P counts trios of adjacent edges)

Similarly to the χv indices, a modification has been suggested for the κ indices to account for the presence of heteroatoms in the molecular graph.14, 26 In this modification, both the covalent radii and the hybridizations are considered. The κα indices are defined as the κn ones, but substituting N by N + α (and, in Kier's formulation, mPi by mPi + α), where α is defined as:

$\alpha = \sum_i \left( \frac{r_i}{r_{Csp^3}} - 1 \right)$

where ri is the covalent radius of atom i and rCsp3 is taken as 0.77 Å (the covalent radius of a carbon atom with sp3 hybridization).

Molecular flexibility index (Φ):14

The starting hypothesis to define Φ is that an infinitely long linear saturated hydrocarbon molecule (i.e. all-sp3 C−C bonds) is infinitely flexible. Flexibility is reduced by a limited number of atoms, by rings and branched chains, and by atoms with covalent radii shorter than that of Csp3:

$\Phi = \frac{^1\kappa_\alpha \cdot {}^2\kappa_\alpha}{N}$
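The following is a minimal sketch of the α-modified Kier shape indices and the flexibility index, written in the closed forms that follow from the Pmin and Pmax expressions, with mPi also shifted by α as in Kier's definition. For glycerol (N = 6, five bonds, five two-bond paths, four three-bond paths), we assume α = −0.04 for each sp3 oxygen. That value is Kier's tabulated value, taken here as an assumption. With it, the sketch reproduces the κ1α, κ2α, κ3α and Φ entries (5.88, 3.08, 2.88 and 3.02) of the 000 row of Table S1, assuming that row is glycerol.

```python
def kier_shape_alpha(n_atoms, p1, p2, p3, alpha):
    """Alpha-modified Kier shape indices 1-3 for a hydrogen-suppressed graph.

    n_atoms : number of heavy atoms N
    p1..p3  : number of paths of length 1, 2 and 3 (mP_i)
    alpha   : sum over atoms of (r_i / r_Csp3 - 1); 0 gives the plain kappas
    """
    a = n_atoms + alpha
    k1 = a * (a - 1) ** 2 / (p1 + alpha) ** 2
    k2 = (a - 1) * (a - 2) ** 2 / (p2 + alpha) ** 2
    if n_atoms % 2:                        # odd N
        k3 = (a - 1) * (a - 3) ** 2 / (p3 + alpha) ** 2
    else:                                  # even N
        k3 = (a - 3) * (a - 2) ** 2 / (p3 + alpha) ** 2
    return k1, k2, k3


def flexibility(n_atoms, k1a, k2a):
    """Kier flexibility index Phi = kappa1_alpha * kappa2_alpha / N."""
    return k1a * k2a / n_atoms


# Glycerol: three sp3 oxygens with an ASSUMED alpha of -0.04 each (sp3 C adds 0).
k1a, k2a, k3a = kier_shape_alpha(6, 5, 5, 4, alpha=3 * -0.04)
phi = flexibility(6, k1a, k2a)
print([round(x, 2) for x in (k1a, k2a, k3a, phi)])   # [5.88, 3.08, 2.88, 3.02]
```

Setting alpha to 0 gives the unmodified indices, which for glycerol are κ1 = 6, κ2 = 3.2 and κ3 = 3.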


Table S1. Topological parameters of 62 glycerol based solvents.

Code

HBA

HBD

BalJX

BalJY

𝜿𝟏

𝒂𝒎

𝜿𝟐

𝒂𝒎

𝜿𝟑

𝒂𝒎

𝐒𝐂𝐩

𝟎

𝐒𝐂𝐩

𝟏

𝐒𝐂𝐩

𝟐

𝐒𝐂𝐩

𝟑

𝐒𝐂𝐜

𝟑

𝝌

𝟎

𝝌

𝟏

𝝌

𝟐

𝝌𝒑

𝟑

𝝌𝒄𝒍

𝟑

𝝌𝒗𝒎𝟎

𝝌𝒗𝒎𝟏

𝝌𝒗𝒎𝟐

𝝌𝒑

𝒗𝒎

𝟑

𝝌𝒄𝒍

𝒗𝒎

𝟑

000

3.02

2.572

2.814

5.88

3.08

2.88

4.99

2.81

1.92

1.39

0.29

3.33

1.71

1.02

0.42

0.13

100

3.98

2.620

2.901

6.88

4.05

3.72

5.70

3.31

2.30

1.48

0.29

4.29

2.09

1.29

0.57

0.13

200

4.95

2.665

2.926

7.88

5.03

4.88

6.41

3.81

2.66

1.75

0.29

5.00

2.68

1.50

0.73

0.13

400

6.91

2.723

2.939

153

9.88

6.99

6.88

7.82

4.81

3.36

2.25

0.29

6.42

3.68

2.26

1.16

0.13

101

4.95

2.686

2.996

7.88

5.03

4.88

6.41

3.81

2.68

1.56

0.29

5.26

2.47

1.56

0.72

0.13

103i

5.58

2.915

3.193

143

9.88

5.65

6.88

7.98

4.66

3.87

2.02

0.70

6.83

3.45

2.49

0.98

0.37

104

7.89

2.788

3.033

202

10.88

7.98

7.78

8.53

5.31

3.74

2.33

0.29

7.38

4.06

2.54

1.31

0.13

104i

6.51

2.909

3.162

194

10.88

6.58

7.78

8.69

5.16

4.22

2.26

0.70

7.54

3.91

3.04

1.12

0.54

104t

4.65

3.173

3.444

180

10.88

4.70

7.78

8.91

4.96

4.99

2.17

1.85

7.76

3.76

3.53

1.07

1.24

403i

8.40

2.973

3.205

324

12.88

8.48

9.80

10.10

6.16

4.93

2.79

0.70

8.95

5.04

3.46

1.57

0.37

404

10.86

2.907

3.121

419

13.88

10.96

10.88

10.65

6.81

4.80

3.10

0.29

9.50

5.64

3.51

1.91

0.13

404t

7.15

3.152

3.380

388

13.88

7.21

10.88

11.03

6.46

6.05

2.94

1.85

9.88

5.35

4.51

1.66

1.24

404i

9.36

2.984

3.203

408

13.88

9.44

10.88

10.81

6.66

5.28

3.03

0.70

9.66

5.50

4.01

1.71

0.54

203i

6.51

2.950

3.214

192

10.88

6.58

7.78

8.69

5.16

4.22

2.29

0.70

7.54

4.04

2.70

1.14

0.37

204

8.88

2.845

3.082

262

11.88

8.97

8.88

9.23

5.81

4.10

2.60

0.29

8.08

4.64

2.74

1.47

0.13

204t

5.46

3.176

3.434

237

11.88

5.51

8.88

9.61

5.46

5.35

2.44

1.85

8.46

4.35

3.74

1.22

1.24

204i

7.45

2.950

3.193

253

11.88

7.53

8.88

9.40

5.66

4.57

2.53

0.70

8.25

4.50

3.25

1.28

0.54

3i03i

6.34

3.079

3.334

243

11.88

6.40

8.88

9.56

5.52

5.05

2.47

1.11

8.41

4.43

3.42

1.24

0.60

3i04t

5.53

3.273

3.522

296

12.88

5.58

9.80

10.48

5.81

6.18

2.62

2.26

9.33

4.75

4.46

1.33

1.48

3i04i

7.23

3.068

3.305

314

12.88

7.30

9.80

10.27

6.02

5.40

2.72

1.11

9.12

4.89

3.97

1.38

0.77

3i03F

6.06

3.115

3.499

378

13.67

6.21

10.67

11.19

6.31

6.53

2.86

2.26

8.17

4.25

3.17

1.20

0.54

403F

7.72

3.032

3.384

484

14.67

7.90

11.60

11.73

6.96

6.41

3.17

1.85

8.72

4.86

3.21

1.53

0.31

4t03F

5.55

3.267

3.639

450

14.67

5.67

11.60

12.11

6.60

7.66

3.01

3.41

9.10

4.57

4.21

1.29

1.42

4i03F

6.87

3.106

3.465

472

14.67

7.03

11.60

11.90

6.81

6.88

3.10

2.26

8.88

4.71

3.72

1.34

0.72

3F03F

6.05

3.136

3.615

557

15.46

6.26

12.46

12.82

7.10

8.01

3.25

3.41

7.94

4.07

2.91

1.15

0.49

5F05F

6.90

3.636

4.205

1283

108

21.18

7.17

6.88

17.82

9.60

11.16

7.06

4.70

10.45

5.33

4.09

2.02

0.84

7F07F

8.01

4.071

4.718

2399

144

26.90

8.33

6.20

22.82

12.10

14.41

10.13

6.20

12.96

6.58

5.24

2.82

1.17

111

5.93

2.907

3.263

102

8.88

6.01

4.39

7.11

4.35

2.85

1.97

0.20

6.22

2.85

1.77

1.04

0.12

113i

6.51

3.111

3.431

182

10.88

6.58

6.28

8.69

5.20

4.03

2.43

0.61

7.79

3.84

2.70

1.30

0.35

143i

9.36

3.331

3.612

369

13.88

9.44

9.26

10.81

6.70

5.12

3.07

0.61

9.92

5.42

3.68

1.82

0.35

114t

5.46

3.341

3.652

225

11.88

5.51

7.32

9.61

5.49

5.16

2.58

1.77

8.72

4.15

3.74

1.39

1.23

144t

8.02

3.513

3.789

436

14.88

8.08

10.17

11.73

6.99

6.25

3.22

1.77

10.84

5.74

4.73

1.91

1.23

114

8.88

2.974

3.260

250

11.88

8.97

7.32

9.23

5.85

3.91

2.74

0.20

8.34

4.44

2.74

1.63

0.12

144

11.8

3.246

3.505

470

14.88

11.95

10.17

11.36

7.35

5.00

3.38

0.20

10.46

6.03

3.73

2.15

0.12

114i

7.46

3.089

3.383

241

11.88

7.53

7.32

9.40

5.70

4.39

2.67

0.61

8.50

4.30

3.24

1.44

0.53

144i

10.3

3.330

3.595

458

14.88

10.40

10.17

11.52

7.20

5.48

3.31

0.61

10.62

5.89

4.23

1.96

0.53

123i

7.45

3.240

3.552

232

11.88

7.53

7.32

9.40

5.70

4.42

2.55

0.61

8.50

4.42

2.92

1.37

0.35

213i

7.45

3.158

3.465

237

11.88

7.53

7.32

9.40

5.70

4.39

2.70

0.61

8.50

4.42

2.90

1.46

0.35

124t

6.29

3.450

3.753

282

12.88

6.35

8.22

10.32

5.99

5.54

2.70

1.77

9.42

4.74

3.96

1.46

1.23

214t

6.29

3.366

3.665

288

12.88

6.35

8.22

10.32

5.99

5.52

2.85

1.77

9.42

4.74

3.94

1.54

1.23

124

9.87

3.114

3.395

310

12.88

9.96

8.22

9.94

6.35

4.29

2.86

0.20

9.05

5.03

2.96

1.70

0.12

214

9.87

3.045

3.322

316

12.88

9.96

8.22

9.94

6.35

4.27

3.01

0.20

9.05

5.03

2.95

1.79

0.12

124i

8.40

3.219

3.507

300

12.88

8.48

8.22

10.10

6.20

4.77

2.79

0.61

9.21

4.89

3.46

1.51

0.53

214i

8.40

3.146

3.430

306

12.88

8.48

8.22

10.10

6.20

4.74

2.94

0.61

9.21

4.89

3.45

1.60

0.53

223i

8.40

3.304

3.604

294

12.88

8.48

8.22

10.10

6.20

4.77

2.82

0.61

9.21

5.01

3.12

1.53

0.35

224t

7.15

3.498

3.792

352

13.88

7.21

9.26

11.03

6.49

5.90

2.97

1.77

10.13

5.33

4.16

1.61

1.23

224

10.86

3.197

3.471

383

13.88

10.96

9.26

10.65

6.85

4.65

3.13

0.20

9.75

5.62

3.17

1.86

0.12

224i

9.36

3.292

3.572

372

13.88

9.44

9.26

10.81

6.70

5.12

3.06

0.61

9.92

5.47

3.67

1.67

0.53

413i

9.36

3.175

3.445

384

13.88

9.44

9.26

10.81

6.70

5.10

3.20

0.61

9.92

5.42

3.67

1.90

0.35

423i

10.32

3.329

3.598

458

14.88

10.40

10.17

11.52

7.20

5.48

3.32

0.61

10.62

6.01

3.89

1.96

0.35

414t

8.02

3.349

3.616

454

14.88

8.08

10.17

11.73

6.99

6.22

3.35

1.77

10.84

5.74

4.71

1.98

1.23

424t

8.90

3.500

3.765

535

15.88

8.97

11.21

12.44

7.49

6.60

3.47

1.77

11.55

6.33

4.93

2.05

1.23

414

11.86

3.105

3.357

488

14.88

11.95

10.17

11.36

7.35

4.97

3.51

0.20

10.46

6.03

3.72

2.23

0.12

414i

10.32

3.182

3.439

476

14.88

10.40

10.17

11.52

7.20

5.45

3.44

0.61

10.62

5.89

4.22

2.03

0.53

424i

11.28

3.339

3.595

559

15.88

11.37

11.21

12.23

7.70

5.83

3.56

0.61

11.33

6.47

4.44

2.10

0.53

3i13F

6.87

3.304

3.723

444

14.67

7.03

9.96

11.90

6.85

6.70

3.27

2.17

9.14

4.64

3.37

1.52

0.53

4t13F

6.28

3.457

3.864

522

15.67

6.42

11.00

12.82

7.14

7.83

3.42

3.33

10.06

4.95

4.41

1.61

1.41

444

14.84

3.443

3.683

789

17.88

14.94

13.17

13.48

8.85

6.06

4.15

0.20

12.58

7.62

4.70

2.74

0.12

413F

8.60

3.223

3.610

559

15.67

8.78

11.00

12.44

7.49

6.58

3.58

1.77

9.68

5.24

3.42

1.85

0.30

3F13F

6.80

3.325

3.835

638

16.46

7.02

11.72

13.53

7.64

8.18

3.66

3.33

8.90

4.46

3.12

1.47

0.48

3F23F

7.57

3.476

3.980

736

17.46

7.80

12.76

14.23

8.14

8.56

3.78

3.33

9.61

5.04

3.34

1.54

0.48

3F43F

9.15

3.642

4.119

987

19.46

9.41

14.72

15.65

9.14

9.27

4.29

3.33

11.02

6.04

4.11

1.99

0.48


Figure S1. Predicted vs. experimental values of E_T^N, viscosity, and boiling point for the selected solvent test set using MLR analysis with topological parameters (equations 2–4 in the main text).

Figure S2. Predicted vs. experimental values of E_T^N, viscosity, and boiling point for the selected solvent test set using PLS analysis with topological parameters.

[Plot panels: boiling point (ºC), dynamic viscosity (cP) and E_T^N, experimental (Exp.) vs. calculated (Calc.) values for the test-set solvents 200, 104, 3i03F, 5F05F, 113i, 414t, 4t13F and 3F23F.]

Figure S3. Predicted vs. experimental values of E_T^N, viscosity, and boiling point for the selected solvent test set using MLR analysis with DARC/PELCO descriptors (equations 5–7 in the text).

Figure S4. Predicted vs. experimental values of E_T^N, viscosity, and boiling point for the selected solvent test set using MLR analysis with mixed topological and DARC/PELCO descriptors (equations 8–10 in the text).

[Plot panels for the two figures above: E_T^N, boiling point (ºC) and dynamic viscosity (cP), experimental (Exp.) vs. calculated (Calc.) values for the test-set solvents 200, 104, 3i03F, 5F05F, 113i, 414t, 4t13F and 3F23F.]


Table S4. Comparison between MLR and PLS analyses with DARC/PELCO descriptors for the three solvent properties studied. Row labels shown as [?] were not recovered from the original layout.

Descriptor | E_T^N MLR | E_T^N PLS(a) | Dynamic viscosity (cP) MLR | Dynamic viscosity (cP) PLS(b) | Boiling point (ºC) MLR | Boiling point (ºC) PLS(c)
[?] | 0.851 | 0.796 | 70.79 | 71.800 | 278.2 | 279.3
[?] | −0.278 | −0.268 | −3.52 | −4.588 | −6.1 | −8.4
[?] | −0.160 | −0.116 | −32.50 | −34.448 | −55.6 | −55.7
[?] | n.s. | −0.045 | n.s. | 0.083 | n.s. | 5.0
[?] | −0.026 | −0.034 | n.s. | 0.546 | 7.9 | 7.7
BF2 | 0.140 | 0.134 | n.s. | 2.606 | 7.0 | 6.7
[?] | n.s. | 0.003 | n.s. | 0.083 | 33.6 | 15.3
[?] | −0.016 | −0.018 | n.s. | 0.799 | 12.6 | 12.5
CF2 | −0.059 | −0.055 | 6.90 | 2.816 | 12.0 | 9.5
[?] | n.s. | 0.003 | n.s. | 0.083 | n.s. | 15.3
[?] | n.s. | −0.015 | n.s. | 0.799 | 19.1 | 18.9
DF2 | n.s. | −0.033 | n.s. | 2.816 | n.s. | 3.0
[?] | 0.972 | 0.968 | 0.981 | 0.991 | 0.933 | 0.935
[?] | 0.036 | 2.08 | 1.28 | 6.9 | 6.3 [one value lost]

(a) 4 latent variables. (b) 5 latent variables. (c) 6 latent variables.

Given that C1 and D1 are linearly dependent (see Table S3), they behave differently in the stepwise MLR and PLS analyses of the boiling point response. In stepwise MLR, only one of the two variables enters the equation and takes the full coefficient (33.6), whereas in the “back-projection” of the PLS coefficients onto the original variables the effect is shared, each variable receiving a coefficient of roughly half that value (15.3). Because every structure with C1 = 1 also has D1 = 1, only the combined contribution of the two descriptors matters within the solvent set used, so sharing the coefficient between them does not, by itself, change the predictions for this set. Similar, though not identical, behaviour is observed for other highly correlated parameters, such as CF2 and DF2.
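The coefficient sharing described above can be reproduced on a toy example. The Python/NumPy sketch below is illustrative only: the data are hypothetical (D1 is an exact copy of C1, plus one unrelated descriptor x3), `pls1` is a minimal NIPALS PLS1 written for this example rather than the software used in the study, and an ordinary least-squares fit on C1 and x3 stands in for the stepwise MLR outcome in which only one of the duplicated variables enters. With exactly duplicated columns the PLS weights for C1 and D1 are identical, so the back-projected coefficients split the full effect evenly.

```python
import numpy as np


def pls1(X, y, n_components):
    """Minimal PLS1 regression (NIPALS); returns (intercept, coefficients)
    expressed in the original, uncentred variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # residuals, deflated after each component
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # weights: covariance of each column with y
        w /= np.linalg.norm(w)
        t = Xr @ w                    # latent-variable scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q = (yr @ t) / tt             # y loading
        Xr = Xr - np.outer(t, p)      # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    # Back-projection of the latent-variable model onto the original variables
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return y_mean - x_mean @ coef, coef


# Hypothetical toy data: D1 is an exact copy of C1; x3 is an unrelated descriptor
C1 = np.array([0, 1, 0, 1, 1, 0], dtype=float)
D1 = C1.copy()
x3 = np.arange(1.0, 7.0)
bp = 200.0 + 33.6 * C1 + 5.0 * x3   # noise-free response

# Stand-in for the stepwise-MLR outcome: only C1 (not D1) enters the equation
A = np.column_stack([np.ones_like(C1), C1, x3])
mlr_coef, *_ = np.linalg.lstsq(A, bp, rcond=None)

# PLS on all three descriptors; two latent variables span the rank-2 data
X_all = np.column_stack([C1, D1, x3])
pls_b0, pls_coef = pls1(X_all, bp, n_components=2)

print("MLR (b0, C1, x3):", np.round(mlr_coef, 3))
print("PLS b0:", round(pls_b0, 3), "(C1, D1, x3):", np.round(pls_coef, 3))

# Predictions agree because C1 and D1 never differ in this data set
mlr_pred = A @ mlr_coef
pls_pred = pls_b0 + X_all @ pls_coef
print("max |difference|:", np.abs(mlr_pred - pls_pred).max())
```

In this idealised case the two PLS coefficients add up exactly to the single MLR coefficient. In the real boiling point data they do not (2 × 15.3 versus 33.6), presumably because the other correlated descriptors and the truncated number of latent variables also absorb part of the effect.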

Table S5. Comparison between MLR and PLS analyses with mixed topological and DARC/PELCO descriptors for the three solvent properties studied. Row labels shown as [?] were not recovered from the original layout.

Descriptor | E_T^N MLR | E_T^N PLS(a) | Dynamic viscosity (cP) MLR | Dynamic viscosity (cP) PLS(b) | Boiling point (ºC) MLR | Boiling point (ºC) PLS(c)
[?] | 0.523 | 0.865 | 67.55 | 156.41 | 292.6 | 171.2
[?] | −0.099 | −0.054 | −5.27 | 6.333 | n.s. | 10.918
[?] | n.s. | −0.005 | −35.86 | −27.740 | −49.7 | −26.743
[?] | n.s. | −0.014 | n.s. | 5.713 | n.s. | −3.438
[?] | n.s. | −0.012 | n.s. | −1.155 | n.s. | 2.847
BF2 | 0.177 | 0.007 | n.s. | −0.224 | n.s. | 1.900
[?] | n.s. | −0.004 | n.s. | 5.713 | n.s. | 4.677
[?] | n.s. | 0.024 | n.s. | 0.904 | n.s. | 5.368
CF2 | n.s. | 0.013 | n.s. | −0.223 | n.s. | −0.045
[?] | n.s. | −0.004 | n.s. | 5.713 | n.s. | 4.677
[?] | n.s. | 0.014 | n.s. | 0.904 | n.s. | 4.454
DF2 | n.s. | −0.006 | n.s. | −0.223 | n.s. | −3.253
HBA | n.s. | 0.035 | n.s. | −1.564 | n.s. | −0.898
HBD | 0.140 | 0.060 | n.s. | 21.407 | n.s. | 15.825
[?] | n.s. | 0.032 | n.s. | −14.793 | 12.9 | 15.721
[?] | n.s. | −0.014 | n.s. | 7.672 | n.s. | −0.395
BalJX | n.s. | −0.013 | n.s. | −30.362 | n.s. | 0.272
BalJY | n.s. | −0.014 | n.s. | −18.803 | −26.0 | −0.564
[?] | n.s. | 0.000 | n.s. | −0.054 | n.s. | −0.032
[?] | n.s. | −0.002 | n.s. | 3.661 | n.s. | 1.387
κ1 | n.s. | −0.009 | n.s. | −5.738 | n.s. | 0.526
κ2 | n.s. | −0.013 | n.s. | 8.699 | n.s. | −0.765
κ3 | n.s. | 0.022 | n.s. | −5.679 | n.s. | −8.078
SC0 | n.s. | −0.007 | n.s. | −5.848 | n.s. | 0.463
SC1 | n.s. | −0.007 | n.s. | −5.848 | n.s. | 0.463
SC2 | n.s. | 0.006 | n.s. | 7.678 | n.s. | 0.231
SC3 | n.s. | −0.017 | n.s. | −2.194 | n.s. | −8.409
SC3 | n.s. | 0.012 | n.s. | −3.522 | n.s. | 0.869
0χ | n.s. | −0.003 | 0.99 | 1.811 | n.s. | 0.144
1χ | n.s. | −0.007 | n.s. | −3.813 | n.s. | 0.412
2χ | n.s. | 0.008 | n.s. | 8.389 | n.s. | −0.720
3χp | n.s. | 0.003 | n.s. | −1.152 | n.s. | 3.781
3χcl | n.s. | 0.002 | n.s. | −7.165 | n.s. | 2.006
0χvm | −0.026 | −0.039 | n.s. | 3.486 | −8.3 | −3.314
1χvm | n.s. | −0.009 | n.s. | −6.006 | 2.333 [one value lost]
2χvm | n.s. | −0.010 | n.s. | 12.129 | 20.4 | 3.026
3χp | n.s. | −0.009 | n.s. | 5.831 | 2.857 [one value lost]
3χcl | n.s. | −0.009 | n.s. | −10.869 | 1.690 [one value lost]
[?] | 0.968 | 0.989 | 0.999 | 0.932 | 0.935 [one value lost]
[?] | 0.036 | 1.46 | 0.20 | 6.8 | 6.3 [one value lost]

(a) 6 latent variables. (b) 13 latent variables. (c) 9 latent variables.
