Accurate calculations of the hydration free energies of druglike molecules using the reference interaction site model

We report on the results of testing the reference interaction site model (RISM) for the estimation of the hydration free energy of druglike molecules. The optimum model was selected after testing of different RISM free energy expressions combined with different quantum mechanics and empirical force-field methods of structure optimization and atomic partial charge calculation. The final model gave a systematic error with a standard deviation of 2.6 kcal/mol for a test set of 31 molecules selected from the SAMPL1 blind challenge set [J. P. Guthrie, J. Phys. Chem. B 113, 4501 (2009)]. After parametrization of this model to include terms for the excluded volume and the number of atoms of different types in the molecule, the root mean squared error for a test set of 19 molecules was less than 1.2 kcal/mol.
David S. Palmer, Volodymyr P. Sergiievskyi, Frank Jensen, and Maxim V. Fedorov a)
Max Planck Institute for the Mathematics in the Sciences, Inselstrasse 22, DE-04103 Leipzig, Germany and Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
(Received 30 April 2010; accepted 8 June 2010; published online 22 July 2010)
© 2010 American Institute of Physics. doi:10.1063/1.3458798
I. INTRODUCTION
Accurate calculation of the hydration free energies of organic molecules is a long-standing challenge in computational chemistry and is important in many aspects of research in the pharmaceutical and agrochemical industries. For example, many of the pharmacokinetic properties of potential drug molecules are defined by their in vivo solvation and acid-base behavior, which can be estimated from their hydration free energies [1–7].
Commonly used methods to calculate hydration free energy may be categorized as either explicit or implicit solvent models. In the first approach, each solvent molecule is included explicitly and molecular simulation methods are used to sample their conformational freedom [1,2,8–13]. In the second approach, the implicit effect of solvent on solute is included by solving either the Poisson–Boltzmann (PB) or the generalized-Born (GB) equation [14–18]. While explicit solvent methods are more scientifically rigorous, implicit models are often preferred because they are less computationally expensive. Both methods have recently been subjected to a blind test for the calculation of hydration free energies of druglike molecules (the SAMPL1 test) [19]. The best predictions were in the range $\mathrm{RMSE}_{\mathrm{pred}} = 2.5$–$3.5$ kcal mol$^{-1}$, which equates to an approximately 2 log unit error in the related pharmacokinetic property estimated from $\Delta G_{\mathrm{solv}} = -RT \ln K$ [20–22]. Clearly, the current methods to calculate hydration free energy are not accurate enough for modern pharmaceutical research. Furthermore, because these methods had been available for some time before the first benchmark on druglike molecules was published, they have often been used blindly beyond their domain of applicability.
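The link between a free energy error and a "log unit" error follows from $\Delta G = -RT \ln K$: an error $\delta(\Delta G)$ corresponds to an error of $\delta(\Delta G)/(RT \ln 10)$ in $\log_{10} K$. The short sketch below illustrates this arithmetic. The function name and the temperature of 298.15 K are assumptions made for illustration and are not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def free_energy_error_to_log_units(error_kcal_per_mol, temperature_k=298.15):
    """Convert an error in dG (kcal/mol) into an error in log10(K).

    Uses dG = -RT ln K, so d(log10 K) = d(dG) / (RT ln 10).
    The default temperature is an illustrative assumption.
    """
    return error_kcal_per_mol / (R_KCAL * temperature_k * math.log(10.0))


if __name__ == "__main__":
    for err in (2.5, 3.5):
        log_err = free_energy_error_to_log_units(err)
        print(f"{err} kcal/mol corresponds to about {log_err:.2f} log units")
```

Under these assumptions, the 2.5–3.5 kcal/mol range maps to roughly 1.8–2.6 log units, in line with the "approximately 2 log unit" figure quoted above.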
Integral equation theory is an alternative framework for the calculation of hydration free energies [23–25]. Unlike PB or GB methods, it retains information about the solvent structure in terms of density correlation functions, but estimates the solute chemical potential without long molecular dynamics (MD) or Monte Carlo (MC) simulations. At present, there are several approaches based on integral equations. The molecular Ornstein–Zernike theory is used to calculate the three-dimensional (3D) hydration structure in molecular liquids [26,27]. The site-site Ornstein–Zernike (SSOZ) integral equation is used to calculate the properties of complex solute-solvent systems in the reference interaction site model (RISM) formalism developed by Chandler and others [28–30]. The theory has been applied successfully to calculate the structural and thermodynamic properties of various chemical and biological systems [31–40].
Despite a recent resurgence of interest in biomedical applications of the integral equation theory [41–46], the methods were not represented in the recent SAMPL1 blind test and have not been adequately tested for druglike molecules. In this work, we test several previously used RISM methods combined with molecular geometries and partial charges calculated using different quantum mechanics (QM) and empirical force-field (FF) methods for a subset of the SAMPL1 dataset of druglike molecules.
II. THEORY
A. RISM
The RISM, initially introduced by Chandler and Andersen [28], permits the description of the thermodynamics of infinitely dilute solutions by a set of integral equations. A complete description of the RISM may be found in Ref. 25. Here we give only the basic definitions that are needed to calculate hydration free energies. In the RISM approach, both the solute and the solvent molecules are treated as sets of sites with spherically symmetric properties. In the simplest case, the sites are just the atoms of the molecules. In this paper, we use the so-called one-dimensional (1D) RISM approach [25], where solute and solvent interactions are described by spherically symmetric site-to-site functions. Using this approach, one operates only with the radial parts of these functions and, therefore, all the numerical tasks are one dimensional, which leads to a significant reduction in computational cost.

a) Author to whom correspondence should be addressed. Tel.: +49 (0) 341 9959 756. Electronic mail: fedorov@mis.mpg.de.
THE JOURNAL OF CHEMICAL PHYSICS 133, 044104 (2010)
Three types of site-site correlation functions are considered in the RISM: intramolecular correlation functions, total correlation functions, and direct correlation functions. Intramolecular correlation functions describe the structure of the molecule. For two sites $s$ and $s'$ of one molecule, the intramolecular correlation function is

$$\omega_{ss'}(r) = \frac{\delta(r - r_{ss'})}{4\pi r_{ss'}^{2}}, \tag{1}$$
where $r_{ss'}$ is the distance between the sites and $\delta(r - r_{ss'})$ is the Dirac delta function. Total correlation functions $h_{s\alpha}(r)$ and direct correlation functions $c_{s\alpha}(r)$ are defined for each pair of solute and solvent sites $s$ and $\alpha$, respectively. The total correlation functions can be expressed as $h_{s\alpha}(r) = g_{s\alpha}(r) - 1$, where $g_{s\alpha}(r)$ is the radial distribution function of solvent sites around the solute sites. Bulk solvent total correlation functions $h^{\mathrm{bulk}}_{\xi\alpha}(r)$ are also considered and represent the distribution of sites $\xi$ of solvent molecules around the site $\alpha$ of the selected solvent molecule. Direct correlation functions $c_{s\alpha}(r)$ are calculated using the set of RISM equations for the case of an infinitely dilute solution [25],

$$h_{s\alpha}(r) = \sum_{s'\xi} \left\langle \omega_{ss'} * c_{s'\xi} * \left[\omega^{\mathrm{bulk}}_{\xi\alpha} + \rho\, h^{\mathrm{bulk}}_{\xi\alpha}\right] \right\rangle (r), \tag{2}$$

$$s = 1, \ldots, N_{\mathrm{solute}}, \quad \alpha = 1, \ldots, N_{\mathrm{solvent}}, \quad r \in [0; \infty).$$
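Here $\langle x * y \rangle(r)$ is the radial part of the spherically symmetric three-dimensional convolution $(x * y)(\mathbf{r}) = \int_{\mathbb{R}^{3}} x(\mathbf{r} - \mathbf{r}')\, y(\mathbf{r}')\, d\mathbf{r}'$, and $\rho$ is the number density of the bulk solvent. To complete the set of RISM equations, one needs to use a closure relationship, which has the general form

$$c_{s\alpha}(r) = e^{\Xi_{s\alpha}(r) + B_{s\alpha}(r)} - h_{s\alpha}(r) + c_{s\alpha}(r) - 1, \tag{3}$$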
where $\Xi_{s\alpha}(r) = -\beta u_{s\alpha}(r) + h_{s\alpha}(r) - c_{s\alpha}(r)$, $u_{s\alpha}(r)$ is the atom-atom potential, $B_{s\alpha}(r)$ is a so-called bridge function [25,47], $\beta = 1/k_{B}T$, $k_{B}$ is the Boltzmann constant, and $T$ is the temperature (in our calculations we used $T = 300$ K). The case $B(r) \equiv 0$ corresponds to the frequently used hypernetted chain closure [25,47]. However, the RISM equations with hypernetted chain closure did not converge for many molecules in the investigated set (the poor convergence of RISM with $B(r) \equiv 0$ has been reported previously [25,31,48]). Therefore, to improve convergence of our algorithm, we used the partially linearized hypernetted chain closure (PLHNC) [49,50],

$$c_{s\alpha}(r) = \begin{cases} e^{\Xi_{s\alpha}(r)} - h_{s\alpha}(r) + c_{s\alpha}(r) - 1, & \Xi_{s\alpha}(r) \le 0, \\ -\beta u_{s\alpha}(r), & \Xi_{s\alpha}(r) > 0. \end{cases} \tag{4}$$
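The two branches of the PLHNC closure can be made concrete with a pointwise update rule. The sketch below is a minimal illustration and not the RISM-MOL implementation: it assumes that the quantity $\gamma = h - c$ is available from a previous solution step of Eq. (2), so that $\Xi = -\beta u + \gamma$. The function and argument names are hypothetical.

```python
import math


def plhnc_direct_correlation(beta_u, gamma):
    """Return c(r) at one grid point from the PLHNC closure, Eq. (4).

    beta_u : reduced site-site potential beta * u(r) at this point
    gamma  : current value of h(r) - c(r) at this point (assumed known)
    """
    xi = -beta_u + gamma
    if xi <= 0.0:
        # Exponential (HNC-like) branch; equivalent to h = exp(xi) - 1.
        return math.exp(xi) - gamma - 1.0
    # Linearized branch; equivalent to h = xi.
    return -beta_u


if __name__ == "__main__":
    # Strongly repulsive region: xi < 0, so h approaches -1.
    c_rep = plhnc_direct_correlation(beta_u=5.0, gamma=0.2)
    print("repulsive region: c =", c_rep, " h =", c_rep + 0.2)
    # Attractive region: xi > 0, so the closure is linear in the potential.
    c_att = plhnc_direct_correlation(beta_u=-2.0, gamma=0.5)
    print("attractive region: c =", c_att, " h =", c_att + 0.5)
```

The linear branch caps the exponential growth of $h$ where $\Xi > 0$, which is the usual motivation given for the improved convergence relative to the plain hypernetted chain closure.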
In our calculations, the intramolecular correlation functions $\omega_{ss'}(r)$ were found from Eq. (1). We used the modified SPC/E water model (MSPC/E) proposed by Lue and Blankshtein in Ref. 51. The MSPC/E model differs from the original SPC/E water model [52] by the additional Lennard-Jones (LJ) potential for the water hydrogen, which was modified to prevent possible divergence of the algorithm [38,53–55]. The total correlation functions of the bulk solvent $h^{\mathrm{bulk}}_{\xi\alpha}(r)$ were obtained from the previous work of some of us [56], where these functions were calculated by the RISM equations for solvent-solvent correlations [25] and the wavelet-based algorithm for integral equations [56–59].
For all solutes, we used the LJ parameters from the OPLS-2005 force-field [60,61]. We obtained the solute partial charges by different QM methods (see below).
The set of RISM equations (2), together with the closure relation (4), allows us to find the functions $h_{s\alpha}(r)$ and $c_{s\alpha}(r)$, which are used to calculate hydration free energies. There are no known methods to solve the set of RISM equations analytically in the general case. Thus, in most cases the RISM equations are solved numerically. In Ref. 62, it was shown that RISM-like equations for monatomic particles can be effectively solved using a multigrid technique [63]. In the current work, we use the RISM-MOL solver, which is the MATLAB realization of the multigrid algorithm for solving RISM equations [62]. The RISM-MOL solver is one of the recent developments of the Computational Physical Chemistry and Biophysics Group of the Max-Planck-Institute for Mathematics in the Sciences. Since this program has not previously been reported in the literature, we give a short description of it in the Appendix of this article.
Within the RISM theory, there are several expressions which allow one to obtain values of the hydration free energy from the total and direct correlation functions $h_{s\alpha}(r)$ and $c_{s\alpha}(r)$. In our work, we compare the accuracy of four of the most popular free energy expressions [55,64–66]. The first expression is the hypernetted-chain (HNC) approximation [64], in which the formula for the hydration free energy is

$$\Delta G^{\mathrm{HNC}} = 2\pi\rho kT \sum_{s\alpha} \int_{0}^{\infty} \left[-2c_{s\alpha}(r) - h_{s\alpha}(r)\left(c_{s\alpha}(r) - h_{s\alpha}(r)\right)\right] r^{2}\, dr. \tag{5}$$
The hypernetted-chain method with repulsive bridge correction (HNCB), proposed by Kovalenko and Hirata in Ref. 55, has a modified form of this hydration free energy expression. Here we adopted the HNCB expression from the previous work [55] for the case of PLHNC closure.
The hydration free energy expression for the HNCB is [55]

$$\Delta G^{\mathrm{HNCB}} = \Delta G^{\mathrm{HNC}} + 4\pi\rho kT \sum_{s\alpha} \int_{0}^{\infty} \left(h_{s\alpha}(r) + 1\right)\left(e^{B^{R}_{s\alpha}(r)} - 1\right) r^{2}\, dr. \tag{6}$$
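Here $\{B^{R}_{s\alpha}(r)\}$ are repulsive bridge correction functions, defined for each pair of solute $s$ and solvent $\alpha$ atoms by the expression

$$\exp\left(B^{R}_{s\alpha}(r)\right) = \prod_{\xi \neq \alpha} \left\langle \omega^{\mathrm{bulk}}_{\alpha\xi} * \exp\left(-\beta \epsilon_{s\xi} \left(\frac{\sigma_{s\xi}}{r}\right)^{12}\right) \right\rangle (r), \tag{7}$$

where $\omega^{\mathrm{bulk}}_{\alpha\xi}(r)$ are the solvent intramolecular correlation functions, and $\sigma_{s\xi}$ and $\epsilon_{s\xi}$ are the site-site parameters of the pairwise Lennard-Jones potential.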
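The third expression is the Gaussian fluctuation (GF) approximation [65–67], in which the free energy is given as

$$\Delta G^{\mathrm{GF}} = 2\pi\rho kT \sum_{s\alpha} \int_{0}^{\infty} \left[-2c_{s\alpha}(r) - c_{s\alpha}(r)\, h_{s\alpha}(r)\right] r^{2}\, dr. \tag{8}$$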
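The final hydration free energy equation we consider is the partial wave (PW) expression [65,68]. It has previously been demonstrated to be one of the most accurate methods for calculating hydration free energies of simple organic molecules within the RISM framework [38,65,66]. The PW expression for hydration free energy is

$$\Delta G^{\mathrm{PW}} = \Delta G^{\mathrm{GF}} + 2\pi\rho kT \sum_{s\alpha} \int_{0}^{\infty} \tilde{h}_{s\alpha}(r)\, h_{s\alpha}(r)\, r^{2}\, dr, \tag{9}$$

where $\tilde{h}_{s\alpha}(r) = \sum_{s'\xi} \langle \tilde{\omega}_{ss'} * h_{s'\xi} * \tilde{\omega}^{\mathrm{bulk}}_{\xi\alpha} \rangle (r)$, and $\tilde{\omega}_{ss'}$ and $\tilde{\omega}^{\mathrm{bulk}}_{\xi\alpha}$ are the elements of matrices that are inverse to the matrices $W = [\omega_{ss'}]$ and $W^{\mathrm{bulk}} = [\omega^{\mathrm{bulk}}_{\xi\alpha}]$, which are built from the solute and solvent intramolecular correlation functions $\omega_{ss'}$ and $\omega^{\mathrm{bulk}}_{\xi\alpha}$, respectively.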
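Expressions (5) and (8) differ only in their integrands, so once $h(r)$ and $c(r)$ are tabulated on a radial grid both reduce to one-dimensional quadratures. The sketch below evaluates them for a single solute-solvent site pair with a trapezoidal rule. It is an illustration under stated assumptions: the correlation functions are smooth toy profiles rather than RISM solutions, the density and $kT$ values are approximate, and all function names are hypothetical.

```python
import math


def radial_integral(values, r):
    """Trapezoidal estimate of the integral of values(r) * r^2 dr over grid r."""
    total = 0.0
    for i in range(len(r) - 1):
        f0 = values[i] * r[i] ** 2
        f1 = values[i + 1] * r[i + 1] ** 2
        total += 0.5 * (f0 + f1) * (r[i + 1] - r[i])
    return total


def hnc_and_gf(h, c, r, rho, kT):
    """HNC, Eq. (5), and GF, Eq. (8), free energies for one site pair."""
    pref = 2.0 * math.pi * rho * kT
    hnc_integrand = [-2.0 * ci - hi * (ci - hi) for hi, ci in zip(h, c)]
    gf_integrand = [-2.0 * ci - ci * hi for hi, ci in zip(h, c)]
    return pref * radial_integral(hnc_integrand, r), pref * radial_integral(gf_integrand, r)


# Toy inputs (assumptions for illustration, not RISM output).
RHO = 0.0333   # approximate water number density, 1/Angstrom^3
KT = 0.596     # approximate kT at 300 K, kcal/mol
r_grid = [0.01 * i for i in range(2001)]
h_toy = [-math.exp(-ri) for ri in r_grid]
c_toy = [-2.0 * math.exp(-ri ** 2) for ri in r_grid]

dg_hnc, dg_gf = hnc_and_gf(h_toy, c_toy, r_grid, RHO, KT)

if __name__ == "__main__":
    print("HNC:", dg_hnc, "GF:", dg_gf, "difference:", dg_hnc - dg_gf)
```

Comparing the two integrands term by term shows that the HNC and GF values differ by $2\pi\rho kT \int h^{2} r^{2}\, dr$ per site pair, which gives a convenient consistency check on any implementation. A full calculation would sum over all pairs $s\alpha$.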
We note that there exists a more sophisticated version of the RISM, 3D RISM, which operates with three-dimensional solute-solvent correlation functions [69–72]. This theory is based on three-dimensional analogs of Eqs. (2) and (4) (see, e.g., Refs. 25 and 72 for details). However, we do not use this approach in this work, for two reasons. First, the use of 3D convolution in the 3D RISM calculations makes them very computationally expensive, even when advanced numerical approaches and parallel programming are used to speed up the calculations [72]. The computational cost makes 3D RISM difficult to use for screening large series of compounds and for testing different theoretical models (e.g., in this work we performed about $10^{3}$ different RISM calculations to choose the best parameters for hydration free energy prediction). Second, the large computational cost associated with the 3D convolution operations in the 3D RISM approach limits the finite grid spacing to 0.5 Å and the potential cutoff to 8–10 Å [72,73], which complicates taking integrals in the 3D analogs of the hydration free energy expressions (5) and (8) (Ref. 73) and affects the numerical accuracy of these calculations. In general, we nevertheless consider the 3D RISM to be a promising approach and we plan to extend our model to 3D RISM in the near future. To make the 3D RISM more feasible for practical applications, significant developments of the numerical part of the model are required. A detailed investigation of the performance of hydration free energy calculations using 1D RISM and 3D RISM approaches will be the subject of future work in our group.
III. MATERIALS AND METHODS
A. Dataset
The methods discussed above were tested on hydration free energy data for 31 organic molecules taken from the SAMPL1 dataset (Table I) [19]. The molecules in this dataset present a stringent test of methods to calculate hydration free energy because they are significantly larger and more complex (i.e., have more functional groups) than those previously considered as benchmarks for hydration free energy calculations.
In Table I, the experimental hydration free energies in kcal/mol are given for all 31 molecules as reported in the original SAMPL1 publication. The data are tabulated as $\Delta G_{\mathrm{hydr}} = -RT \ln(c_{\mathrm{aq}} / c_{\mathrm{gas}})$, with concentrations in mol/l, which corresponds to the choice of standard states proposed by Ben-Naim. The experimental data are given as grand mean averages taken from multiple sources in the published literature. Details of the methods used to compile the dataset are given in Ref. 19 and will not be recapitulated here. For 21 molecules, the experimental hydration data were reported with estimated experimental uncertainties, which range from 0.1 to 0.44 kcal/mol with a median value of 0.1 kcal/mol. The hydration free energies of the remaining 10 molecules were calculated from solubility and vapor pressures; if these data were reported without experimental uncertainties, they were arbitrarily assigned values of 1 log unit. As such, the reported experimental uncertainties for these molecules are pessimistic estimates, as has previously been noted by other authors [19].
In the present study, we work with only 31 of the 63 molecules in the full SAMPL1 dataset in order to make it computationally feasible to test a large number of different free energy expressions and methods for calculating molecular geometries and partial charges. The 31 molecules were selected at random from the full dataset. The selected molecules contain between 8 and 27 heavy atoms each and have molecular weights ranging from 119 to 426 atomic mass units.

TABLE I. Hydration free energy data and SAMPL1 identification code for the 31 molecules used in the current work.

ID         Molecule                      ΔG_hydr (kcal/mol)   Error (kcal/mol)
Cup08002   1,2-dinitroxypropane          −4.95                0.1
Cup08004   2-butyl nitrate               −1.82                0.1
Cup08005   Isobutyl nitrate              −1.88                0.1
Cup08007   Alachlor                      −8.21                0.29
Cup08009   Ametryn                       −7.65                0.45
Cup08016   Carbofuran                    −9.61                0.3
Cup08020   Chlorimuron-ethyl             −14.01               1.93
Cup08021   Chloropicrin                  −1.45                0.1
Cup08024   Diazinon                      −6.48                0.13
Cup08025   Dicamba                       −9.86                1.93
Cup08026   Dichlobenil                   −4.71                1.93
Cup08028   Dinoseb                       −6.23                1.93
Cup08029   Endosulfan alpha              −4.23                0.26
Cup08030   Endrin                        −4.82                0.1
Cup08033   Heptachlor                    −2.55                0.1
Cup08034   Isophorone                    −5.18                1.37
Cup08035   Lindane                       −5.44                0.1
Cup08038   Methyl parathion              −7.19                0.1
Cup08041   Nitroxyacetone                −5.99                0.1
Cup08043   Parathion                     −6.74                0.1
Cup08044   Pebulate                      −3.64                1.93
Cup08045   Phorate                       −4.37                0.1
Cup08048   Propanil                      −7.78                1.93
Cup08050   Simazine                      −10.22               0.1
Cup08052   Terbacil                      −11.14               1.93
Cup08053   Terbutryn                     −6.68                0.42
Cup08057   Vernolate                     −4.13                1.36
Cup08058   4-amino-4'-nitroazobenzene    −11.24               0.44
Cup08018   Chlordane                     −3.44                0.1
Cup08047   Prometryn                     −8.43                0.1
Cup08032   Fenuron                       −9.13                1.93
B. Geometry optimization and atomic partial charge calculation
Estimates of hydration free energies obtained by the RISM are sensitive to the input molecular geometry and to the calculated atomic partial charges. In this work, we tested a variety of different classical force-field and quantum mechanical methods for their calculation.
Molecular structures were obtained for each molecule in the SAMPL1 test set from the supporting information of Ref. 19. As a preliminary preparation of these structures, a low-mode conformational search was carried out for each molecule in both gas and aqueous phases using the OPLS-2005 force-field [60,61] in MACROMODEL v.9.1 [74], where aqueous solvent was simulated using the generalized-Born surface area approximation [75]. The global minimum energy conformers were used as input to each of the following geometry optimization and partial charge calculations.
First, molecular geometries were optimized using the B3LYP hybrid density functional and 6-31G** basis set with diffuse orbitals for heavy atoms and hydrogen [76] in vacuum and in aqueous solvent simulated separately by two different hydration models: the polarizable continuum model (PCM) [77–79] and the conductorlike continuum model (CPCM) [79,80], implemented in GAUSSIAN 03 [81]. All electronic structure calculations were carried out in GAUSSIAN 03 RevE.01 [81], unless otherwise stated. For each of the optimized molecular geometries, atomic partial charges were estimated by seven different methods: CHELP (Charges from Electrostatic Potentials) [82], CHELPG (grid-based CHELP) [83], ESP (Merz–Kollman Electrostatic Potential charges) [84,85], CHELP-DIPOLE, CHELPG-DIPOLE, ESP-DIPOLE, and natural population analysis (NPA) [86]. The CHELP, CHELPG, and ESP methods calculate atomic partial charges that reproduce the electrostatic potential on grid points outside the van der Waals surface of the molecule. The suffix "-DIPOLE" indicates that the atomic partial charges are also constrained to reproduce the molecular dipole. In NPA, atomic partial charges are obtained by decomposing the molecular wave function into atomic contributions. Since each atomic partial charge calculation was repeated for three different solvation models (vacuum, PCM, CPCM), we have 3 × 7 = 21 geometry and atomic partial charge sets calculated using density functional theory (DFT).
Second, the molecular geometries were optimized using Hartree–Fock (HF) theory and the 6-31G** basis set in vacuum. As for the DFT calculations, seven different partial charge estimations were carried out.
Third, AM1-BCC and AM1-Mulliken atomic partial charges were calculated using MOPAC [87,88] and ANTECHAMBER [89,90]. AM1-BCC charges are evaluated by applying an empirical bond charge correction (BCC) scheme to AM1-Mulliken charges. Here we use the BCC parameters derived by Jakalian et al. [91], which were fitted by these authors to make the AM1-BCC charges match the electrostatic potential at the HF/6-31G* level.
Finally, the geometries and partial charges calculated by the OPLS2005 force-field in both vacuum and aqueous solvent during the low-mode conformational search were used as an additional set of parameters. In total, we have 21 + 7 + 2 + 2 = 32 different pairs of molecular geometries and atomic partial charges for each molecule. For each of these sets, the hydration free energy was calculated using four different RISM free energy expressions (HNC, HNCB, GF, and PW). In total, this gives 32 × 4 = 128 different combinations of free energy calculation methods. To identify the selected methods, we will list slash-separated names of the QM method, hydration model, partial charge method, and RISM expression, for example, B3LYP/PCM/CHELPG-DIPOLE/PW.
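The count of 128 method combinations can be reproduced by enumerating the slash-separated labels. The sketch below does this with `itertools.product`. The labels for the DFT and HF sets follow the naming convention of the text; the labels chosen for the AM1 and force-field sets (for example `AM1/gas/AM1-BCC` and `FF/aq/OPLS2005`) are hypothetical placeholders, since the text does not spell them out.

```python
from itertools import product

CHARGE_METHODS = [
    "CHELP", "CHELPG", "ESP",
    "CHELP-DIPOLE", "CHELPG-DIPOLE", "ESP-DIPOLE", "NPA",
]
RISM_EXPRESSIONS = ["HNC", "HNCB", "GF", "PW"]

# 3 solvation models x 7 charge methods = 21 DFT sets.
input_sets = [
    f"B3LYP/{solvation}/{charges}"
    for solvation, charges in product(["gas", "PCM", "CPCM"], CHARGE_METHODS)
]
# 7 Hartree-Fock sets, optimized in vacuum only.
input_sets += [f"HF/gas/{charges}" for charges in CHARGE_METHODS]
# 2 semiempirical charge sets (labels are hypothetical placeholders).
input_sets += ["AM1/gas/AM1-BCC", "AM1/gas/AM1-Mulliken"]
# 2 force-field sets, vacuum and aqueous (labels are hypothetical placeholders).
input_sets += ["FF/gas/OPLS2005", "FF/aq/OPLS2005"]

# Every geometry/charge set is combined with every free energy expression.
model_labels = [
    f"{inputs}/{expression}"
    for inputs, expression in product(input_sets, RISM_EXPRESSIONS)
]

if __name__ == "__main__":
    print(len(input_sets), "geometry/charge sets")
    print(len(model_labels), "method combinations")
```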
For each combination of methods, the values of the hydration free energies of the 31 molecules from Table I were calculated. The best of these models were then parametrized to improve predictions of the hydration free energy using separate training and independent test sets.
C. Statistical modeling
1. Error calculation
For all molecules of the dataset (see Table I), hydration free energy values were calculated using different structure optimization methods, partial charge models, and RISM free energy formulas (HNC, HNCB, GF, and PW). To compare calculated and experimental results, the root mean squared deviation (RMSD) was evaluated,

$$\mathrm{RMSD}(\Delta G, \Delta G_{\mathrm{expt}}) = \sqrt{\frac{1}{N} \sum_{i \in S} \left(\Delta G^{i} - \Delta G^{i}_{\mathrm{expt}}\right)^{2}}, \tag{10}$$
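where index $i$ runs through the set $S$ of $N$ selected molecules, and $\Delta G^{i}$ and $\Delta G^{i}_{\mathrm{expt}}$ are the calculated and the experimental hydration free energy values of molecule $i$, respectively. The total deviation can be split into two parts, the mean displacement ($M$) and the standard deviation (SD), which are calculated by the formulas

$$M(\Delta G - \Delta G_{\mathrm{expt}}) = \frac{1}{N} \sum_{i \in S} \left(\Delta G^{i} - \Delta G^{i}_{\mathrm{expt}}\right), \tag{11}$$

$$\mathrm{SD}(\Delta G - \Delta G_{\mathrm{expt}}) = \sqrt{\frac{1}{N} \sum_{i \in S} \left(\Delta G^{i} - \Delta G^{i}_{\mathrm{expt}} - M(\Delta G - \Delta G_{\mathrm{expt}})\right)^{2}}. \tag{12}$$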
The mean displacement gives the systematic error, which can be corrected by a simple constant term. The standard deviation gives the random error that is not explained by the model. One can see the connection between these three formulas,

$$\mathrm{RMSD}(\Delta G, \Delta G_{\mathrm{expt}})^{2} = M(\Delta G - \Delta G_{\mathrm{expt}})^{2} + \mathrm{SD}(\Delta G - \Delta G_{\mathrm{expt}})^{2}. \tag{13}$$
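The decomposition of the total error into a systematic and a random part is easy to verify numerically. The sketch below implements Eqs. (10)–(12) with the $1/N$ normalization used there and checks the identity of Eq. (13). The function names and the sample values are hypothetical and serve only as an illustration.

```python
import math


def mean_displacement(calc, expt):
    """Eq. (11): mean of the differences between calculated and experimental values."""
    return sum(c - e for c, e in zip(calc, expt)) / len(calc)


def standard_deviation(calc, expt):
    """Eq. (12): spread of the differences around their mean (1/N normalization)."""
    m = mean_displacement(calc, expt)
    return math.sqrt(sum((c - e - m) ** 2 for c, e in zip(calc, expt)) / len(calc))


def rmsd(calc, expt):
    """Eq. (10): root mean squared deviation."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


# Made-up values in kcal/mol, for illustration only.
calc_example = [2.0, 5.5, 4.1, 7.3]
expt_example = [-4.0, -2.0, -1.0, -3.5]

if __name__ == "__main__":
    m = mean_displacement(calc_example, expt_example)
    sd = standard_deviation(calc_example, expt_example)
    total = rmsd(calc_example, expt_example)
    print(f"M = {m:.3f}, SD = {sd:.3f}, RMSD = {total:.3f}")
    print("RMSD^2 - (M^2 + SD^2) =", total ** 2 - (m ** 2 + sd ** 2))
```

Note that the identity holds only when the standard deviation uses the $1/N$ normalization of Eq. (12); with the $1/(N-1)$ sample estimator it would hold only approximately.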
2. Fitting formula
In Ref. 38, it was shown that when excluded volume-based correction terms are included in the RISM/PW formula, the accuracy of the calculated hydration free energies for simple nonpolar organic solutes improves considerably. This result suggests that excluded volume corrections should also be useful for improving the prediction of the RISM for the SAMPL1 molecule set. In this case, we calculate the excluded volume of the solute in infinitely dilute aqueous solution as a limiting case of the partial molar volume formula [92] when the solute density tends to zero,

$$V_{\mathrm{ex}} = \frac{1}{\rho} + \frac{4\pi}{N_{\mathrm{solute}}} \sum_{s} \int_{0}^{\infty} \left(h^{\mathrm{bulk}}_{OO}(r) - h_{sO}(r)\right) r^{2}\, dr. \tag{14}$$
Here $h^{\mathrm{bulk}}_{OO}(r)$ is the total oxygen-to-oxygen correlation function of bulk water and $h_{sO}(r)$ is the total correlation function between the solute site $s$ and the water oxygen.
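A simple way to test an implementation of Eq. (14) is to feed it a case with a known answer. The sketch below is such a check and is built on assumptions chosen for convenience: a structureless solvent with $h^{\mathrm{bulk}}_{OO}(r) = 0$ and a single solute site with a Gaussian depletion profile $h_{sO}(r) = -\exp(-(r/R)^{2})$, for which the integral evaluates analytically to $\pi^{3/2} R^{3}$. The function name, the radius, and the density value are hypothetical.

```python
import math


def excluded_volume(h_oo_bulk, h_so_list, r, rho):
    """Eq. (14) on a radial grid r, using the trapezoidal rule.

    h_oo_bulk : bulk water oxygen-oxygen total correlation function on r
    h_so_list : one list per solute site with h_sO(r) on the same grid
    rho       : bulk solvent number density
    """
    site_sum = 0.0
    for h_so in h_so_list:
        f = [(hb - hs) * ri ** 2 for hb, hs, ri in zip(h_oo_bulk, h_so, r)]
        site_sum += sum(
            0.5 * (f[i] + f[i + 1]) * (r[i + 1] - r[i]) for i in range(len(r) - 1)
        )
    return 1.0 / rho + 4.0 * math.pi * site_sum / len(h_so_list)


# Toy check (not real RISM output): structureless solvent, one Gaussian "hole".
R_HOLE = 3.0   # hypothetical hole radius, Angstrom
RHO = 0.0333   # approximate number density of water, 1/Angstrom^3
r_grid = [0.01 * i for i in range(4001)]
h_oo = [0.0] * len(r_grid)
h_so = [-math.exp(-(ri / R_HOLE) ** 2) for ri in r_grid]

v_numeric = excluded_volume(h_oo, [h_so], r_grid, RHO)
v_analytic = 1.0 / RHO + math.pi ** 1.5 * R_HOLE ** 3

if __name__ == "__main__":
    print("numeric :", v_numeric)
    print("analytic:", v_analytic)
```

In a real calculation the correlation functions would come from the RISM solution and the upper integration limit would be the cutoff of the radial grid.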
In Ref. 38, it is discussed that the RISM formulas may systematically overestimate the hydration free energy of small organic compounds that contain certain types of functional groups, e.g., charged groups or hydroxyl groups. The authors introduced group contribution terms to correct for these systematic errors. In a similar manner, additional functional group corrections might be required for calculations of the larger molecules considered here. Due to the structures of the molecules from the SAMPL1 set, however, there is no single obvious way to separate them by functional groups. Therefore, to be consistent we used atom type rather than functional group corrections. The 31 molecules given in Table I contain hydrogen, carbon, nitrogen, oxygen, chlorine, phosphorus, and sulfur atoms. Thus, the fitting formula is

$$\Delta G_{\mathrm{corr}}(b) = \Delta G_{\mathrm{RISM}} + b_{V} V_{\mathrm{ex}} + \sum_{j} b_{j} n_{j}, \tag{15}$$
where $j$ runs over all the atom types, $j \in \{H, C, N, O, Cl, P, S\}$, $n_{j}$ is the number of atoms of type $j$ in the molecule, and $b = (b_{V}, b_{H}, b_{C}, b_{N}, b_{O}, b_{Cl}, b_{P}, b_{S})$ are the coefficients to be fitted on the training molecule set. To parametrize the empirical model, we partitioned the 31-molecule SAMPL1 subset into separate training and independent test sets. As a training set, we chose 12 molecules, which are listed in Table II. As one can see, the minimum fitting condition is satisfied: for each atom type there is at least one molecule in the training set that contains atoms of this type. The test set comprised the remaining 19 molecules given in Table I, which are not in the 12-molecule training set given in Table II. The coefficients $b = (b_{V}, b_{H}, b_{C}, b_{N}, b_{O}, b_{Cl}, b_{P}, b_{S})$ in formula (15) were fitted to minimize the root mean squared deviation $\mathrm{RMSD}(\Delta G_{\mathrm{expt}}, \Delta G_{\mathrm{corr}}(b))$ on the training set molecules.
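Once the coefficients are known, applying the correction of Eq. (15) is a single linear combination. The sketch below evaluates it for one molecule. The atom counts are those listed for Cup08002 in Table II; every numerical coefficient, the RISM free energy, and the excluded volume are placeholder values invented for illustration and are not fitted parameters from this work. The fitting step itself (a linear least-squares problem in the eight coefficients) is not shown.

```python
ATOM_TYPES = ("H", "C", "N", "O", "Cl", "P", "S")


def corrected_free_energy(dg_rism, v_ex, atom_counts, b_v, b_atoms):
    """Eq. (15): dG_corr = dG_RISM + b_V * V_ex + sum_j b_j * n_j.

    atom_counts : mapping atom type -> number of atoms n_j in the molecule
    b_atoms     : mapping atom type -> coefficient b_j
    Missing atom types are treated as zero.
    """
    atom_term = sum(
        b_atoms.get(j, 0.0) * atom_counts.get(j, 0) for j in ATOM_TYPES
    )
    return dg_rism + b_v * v_ex + atom_term


# Atom counts for Cup08002 as listed in Table II.
counts_cup08002 = {"H": 6, "C": 3, "N": 2, "O": 6, "Cl": 0, "P": 0, "S": 0}

# Placeholder numbers (NOT values from the paper).
example_b_v = -0.1
example_b_atoms = {"H": 0.5, "C": 0.0, "N": 0.0, "O": 0.0, "Cl": 0.0, "P": 0.0, "S": 0.0}
example_dg_rism = 10.0   # kcal/mol, hypothetical uncorrected RISM value
example_v_ex = 100.0     # cubic Angstrom, hypothetical excluded volume

if __name__ == "__main__":
    dg = corrected_free_energy(
        example_dg_rism, example_v_ex, counts_cup08002, example_b_v, example_b_atoms
    )
    print("corrected free energy (placeholder inputs):", dg)
```

Because the model is linear in $b$, minimizing the RMSD on the training set reduces to an ordinary least-squares fit with one row per training molecule and columns $(V_{\mathrm{ex}}, n_H, \ldots, n_S)$.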
3. Validation of the fitting results
Because we have relatively small test and training sets, a small error on the test set was not by itself enough to validate the formula. An additional validation procedure was needed. First, a standard analysis of variance (t-test and F-test) [93] was performed to make sure that the experimental and the corrected calculated results have the same mean values and standard deviations. Second, the coefficients of determination $R^{2}$ were calculated to check the strength of correlation between the corrected calculated and the experimental results. To check that the fitting coefficients are not dependent on the choice of the training set, three additional tests were performed: (i) leave-one-out cross-validation, (ii) leave-five-out cross-validation, and (iii) comparison of the coefficients obtained by fitting to the training set and to the full set. In the leave-one-out test, we perform a series of fittings using training sets that consist of the initial 31-molecule test set from Table I with 1 molecule extracted. For all possible choices of the extracted molecule, we have 31 different sets of fitting coefficients,

$$b^{k} = (b_{V}^{k}, b_{H}^{k}, b_{C}^{k}, b_{N}^{k}, b_{O}^{k}, b_{Cl}^{k}, b_{P}^{k}, b_{S}^{k}), \quad k = 1, \ldots, 31. \tag{16}$$
We calculate the relative standard deviation of each type of fitting coefficient,

$$\delta b_{j} = \frac{\mathrm{SD}(\{b_{j}^{k}\})}{\left|M(\{b_{j}^{k}\})\right|} \times 100\%, \tag{17}$$
where $j$ is the type of the coefficient, $j \in \{V, H, C, N, O, Cl, P, S\}$. The values $\delta b_{j}$ show the sensitivity of the coefficient $b_{j}$ to the choice of training set. Low $\delta b_{j}$ values indicate that the coefficient $b_{j}$ is not arbitrary and we can trust its value, while high $\delta b_{j}$ values indicate physically unreliable coefficients. The leave-five-out test is similar to leave-one-out, but the training sets are constructed by excluding 5 molecules from the initial 31-molecule test set given in Table I.
TABLE II. The number of times each atom type occurs in each molecule of the training set.

ID         n_H   n_C   n_N   n_O   n_Cl   n_P   n_S
Cup08002   6     3     2     6     0      0     0
Cup08009   17    9     5     0     0      0     1
Cup08021   0     1     1     2     3      0     0
Cup08024   21    12    2     3     0      1     1
Cup08026   3     7     1     0     2      0     0
Cup08029   6     9     0     3     6      0     1
Cup08032   12    9     2     1     0      0     0
Cup08034   14    9     0     1     0      0     0
Cup08035   6     6     0     0     6      0     0
Cup08038   10    8     1     5     0      1     1
Cup08044   21    10    1     1     0      0     1
Cup08057   21    10    1     1     0      0     1
Because the number of possible choices of 5 molecules among 31 is quite large ($C_{31}^{5} = 169\,911$), we randomly chose 1000 such extractions and calculated the values $\delta b_{j}$ for them.
In addition, the training set fitting coefficients were compared to the full set fitting coefficients. In this case, we perform two different fittings. Using the full 31-molecule test set from Table I for training, we obtain the fitting coefficients $b_{j}^{\mathrm{full}}$, $j \in \{V, H, C, N, O, Cl, P, S\}$. Using the training set from Table II, we obtain another set of fitting coefficients $b_{j}^{\mathrm{train}}$ and calculate the $\delta b_{j}$ values by the formula

$$\delta b_{j} = \frac{\left|b_{j}^{\mathrm{full}} - b_{j}^{\mathrm{train}}\right|}{\left|b_{j}^{\mathrm{full}}\right|} \times 100\%. \tag{18}$$
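The bookkeeping behind the cross-validation tests can be sketched in a few lines. The example below confirms the number of possible leave-five-out subsets, draws 1000 random reduced training sets, and implements the relative standard deviation of Eq. (17) with the $1/N$ normalization of Eq. (12). The function names, the random seed, and the use of molecule indices 0–30 are assumptions for illustration; the refitting of the coefficients for each subset is not shown.

```python
import math
import random


def relative_sd(values):
    """Eq. (17): 100% * SD / |mean| of one coefficient over all fits."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return 100.0 * sd / abs(mean)


def leave_k_out_training_sets(n_molecules, k, n_draws, seed=0):
    """Draw n_draws random training sets, each with k molecules left out.

    Molecules are identified by their index 0..n_molecules-1.
    """
    rng = random.Random(seed)
    indices = range(n_molecules)
    training_sets = []
    for _ in range(n_draws):
        left_out = set(rng.sample(indices, k))
        training_sets.append([i for i in indices if i not in left_out])
    return training_sets


N_SUBSETS_TOTAL = math.comb(31, 5)  # number of ways to leave 5 of 31 molecules out
reduced_sets = leave_k_out_training_sets(n_molecules=31, k=5, n_draws=1000)

if __name__ == "__main__":
    print("possible leave-five-out subsets:", N_SUBSETS_TOTAL)
    print("drawn training sets:", len(reduced_sets), "of size", len(reduced_sets[0]))
    # Hypothetical coefficient values from three fits, for illustration only.
    print("relative SD:", relative_sd([0.9, 1.0, 1.1]), "%")
```

Independent random draws may occasionally repeat a subset; with 1000 draws out of 169 911 possibilities this is rare, and the text does not state whether repeats were excluded.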
IV. RESULTS AND DISCUSSION
A. Analysis of calculated data
1. Models without empirical corrections
The hydration free energy values were calculated for the
31-molecule test set from Table I using 128 combinations of
RISM and structure calculation methods. One can find the
results of the calculations in Ref. 94. The comparison with
experiment shows quite high RMSD values for all methods.
The smallest error is about 5.6 kcal/mol.
95
However, if we
look at the differences between the calculated and experi-
mental results, we can see that they are not random. For
many combinations of QM/RISM methods, differences are
distributed around a mean value and the standard deviation is
reasonably small see Fig. 1. The smallest standard devia-
tion of the differences is achieved using the B3LYP/gas/
CHELPG-DIPOLE/PW methods and is about 2.6 kcal/mol,
which is comparable to the results of the SAMPL1 hydration
free energy predictions found in literature.
2022
We see that
although the RISM predictions contain large systematic er-
rors, the free energies calculated using the RISM are well
correlated with the experimental values. To support this
point, we calculated the correlation coefficients for the ex-
perimental and calculated values for each combination of
methods. In Table III, correlation coefficients are listed for
the methods that give the smallest standard deviation of the
differences between the calculated and the experimental hy-
dration free energy values. Results of these methods are well
correlated with the experimental data for most of them cor-
relation coefficients are larger than 0.7. RMSD values, stan-
dard deviations, and correlation coefficients for all methods
−15 −10 −5
0
0
5
10
15
Δ G
ex
p
(kcal/mol)
Δ G
calc
Δ G
exp
(kcal/mol)
RMSD=12.5
Mean =12.2
SD=2.6
B3LYP/gas/ChelpG−dipole/PW
FIG. 1. Systematic and random errors between the hydration free energies
calculated by the B3LYP/gas/CHELPG-DIPOLE/PW method and the ex-
perimental results.
TABLE III. a RISM results with the smallest standard deviations of differences between experimental and
calculated hydration free energies. b RISM results with the largest correlation coefficients.
QM level Solvation model Partial charges Formula
Standard deviation
kcal/mol Correlation coefficient
a Ten results with the smallest standard deviation
B3LYP Gas CHELPG-DIPOLE PW 2.599 0.749
B3LYP Gas CHELP-DIPOLE PW 2.642 0.685
B3LYP Gas CHELPG PW 2.647 0.744
B3LYP Gas CHELP PW 2.672 0.677
HF Gas CHELPG PW 3.132 0.769
HF Gas CHELPG-DIPOLE PW 3.187 0.766
FF Gas OPLS2005 PW 3.331 0.706
HF Gas CHELP-DIPOLE PW 3.391 0.688
HF Gas CHELP PW 3.459 0.679
B3LYP PCM CHELP PW 3.558 0.820
b Ten results with the highest correlation coefficients
B3LYP CPCM CHELPG-DIPOLE PW 3.595 0.869
B3LYP CPCM CHELPG PW 3.582 0.868
B3LYP PCM CHELPG-DIPOLE PW 3.581 0.868
B3LYP PCM CHELPG PW 3.568 0.868
B3LYP CPCM CHELPG-DIPOLE GF 7.370 0.830
B3LYP CPCM CHELPG GF 7.349 0.830
B3LYP PCM CHELPG-DIPOLE GF 7.347 0.829
B3LYP CPCM CHELP-DIPOLE PW 3.689 0.824
B3LYP PCM CHELP-DIPOLE PW 3.575 0.823
B3LYP CPCM CHELP PW 3.701 0.821
044104-6 Palmer et al. J. Chem. Phys. 133, 044104 2010
are given in Ref. 94. Using only these preliminary results, we can already select the most and least suitable methods. Good correlations with experiment are observed with both the HF and B3LYP methods combined with CHELP, CHELPG, CHELP-DIPOLE, or CHELPG-DIPOLE charges. The standard deviation for the OPLS2005 charges with the PW expression is about 3.3 kcal/mol, a promising result for this level of theory. The smallest standard deviations between the calculated and the experimental results are obtained with the PW RISM formula; the GF formula gives intermediate results (the lowest standard deviation of error is about 5.3 kcal/mol), while the HNC and HNCB free energy formulas give quite large deviations from experiment (standard deviations of errors are larger than 8.8 kcal/mol). The methods for which we have reported small standard deviations of the errors might be expected to be amenable to parametrization using, e.g., molecular volume and atom type variables.
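The distinction drawn here between a large systematic error and a small spread of errors can be illustrated with a minimal Python sketch. The four data points are invented for illustration and are not taken from the paper, and the population standard deviation is an assumption, since the text does not say which estimator was applied.

```python
import math
import statistics

def error_summary(calculated, experimental):
    """Return (mean error, standard deviation of errors, Pearson r)."""
    errors = [c - e for c, e in zip(calculated, experimental)]
    bias = statistics.fmean(errors)      # systematic part of the error
    spread = statistics.pstdev(errors)   # random part (population estimator, assumed)
    mean_c = statistics.fmean(calculated)
    mean_e = statistics.fmean(experimental)
    cov = sum((c - mean_c) * (e - mean_e) for c, e in zip(calculated, experimental))
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return bias, spread, cov / math.sqrt(var_c * var_e)

# Hypothetical values in kcal/mol: a shift of about +10 with small scatter.
experimental = [-2.0, -4.0, -6.0, -8.0]
calculated = [8.5, 5.5, 4.5, 1.5]
bias, spread, r = error_summary(calculated, experimental)
```

In this toy case the mean error is 10 kcal/mol while the spread is only 0.5 kcal/mol, which is the situation in which a simple additive correction is most likely to help.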
2. Models with empirical corrections
For each combination of methods, the coefficients $b = (b_V, b_H, b_C, b_N, b_O, b_{\mathrm{Cl}}, b_P, b_S)$ in formula (15) were fitted using the training set molecules from Table II. Each fitted formula was assessed using the test set comprising the remaining 19 molecules from the 31-molecule test set. The ten best results (those with the smallest RMSD on the test set) are listed in Table IV. Fitting results for other methods are given in Ref. 94. Comparing Table III (smallest standard deviations) with Table IV (best fitting results), we see the same set of structure optimization and partial charge methods. It is interesting to note that although the GF formula gives much larger standard deviations than the PW formula, after parametrization it produces results that are almost as good as those for the PW formula. We also note that OPLS2005 force-field calculations combined with the PW formula give good results after fitting (RMSD of about 2 kcal/mol; see Ref. 94).
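The fitting step described above can be sketched in Python as an ordinary least-squares problem. This is only an illustration under stated assumptions: the additive form dG_corr = dG_RISM + sum_j b_j x_j is assumed for formula (15), all molecules, descriptor values, and coefficients below are synthetic, and the small normal-equation solver is a stand-in for whatever fitting routine the authors used.

```python
import random

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_coefficients(descriptors, residuals):
    """Least-squares b for residual ~ descriptors . b (normal equations)."""
    k = len(descriptors[0])
    xtx = [[sum(d[i] * d[j] for d in descriptors) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * r for d, r in zip(descriptors, residuals)) for i in range(k)]
    return solve_linear(xtx, xty)

def corrected(dg_rism, descriptors, b):
    """Assumed form: dG_corr = dG_RISM + sum_j b_j * descriptor_j."""
    return [g + sum(bj * dj for bj, dj in zip(b, d)) for g, d in zip(dg_rism, descriptors)]

def rmsd(predicted, reference):
    return (sum((p - q) ** 2 for p, q in zip(predicted, reference)) / len(reference)) ** 0.5

# Synthetic data only: each row is (V, N_H, N_C, N_N, N_O, N_Cl, N_P, N_S).
rng = random.Random(0)
b_true = [-0.2, 0.6, 1.4, -2.2, -1.6, 2.7, -4.9, -4.5]  # invented, not Table V

def random_molecule():
    # Volume in arbitrary units, followed by seven atom counts.
    return [rng.uniform(1.0, 4.0)] + [float(rng.randint(0, 6)) for _ in range(7)]

def make_set(size):
    desc = [random_molecule() for _ in range(size)]
    dg_rism = [rng.uniform(5.0, 25.0) for _ in range(size)]
    dg_exp = corrected(dg_rism, desc, b_true)  # noise-free stand-in for experiment
    return desc, dg_rism, dg_exp

train_desc, train_rism, train_exp = make_set(12)  # 31 - 19 training molecules
test_desc, test_rism, test_exp = make_set(19)

b_fit = fit_coefficients(train_desc, [e - g for e, g in zip(train_exp, train_rism)])
test_rmsd = rmsd(corrected(test_rism, test_desc, b_fit), test_exp)
```

Because the synthetic data are noise-free, the fit recovers the invented coefficients and the test-set RMSD is essentially zero; with real data the same split gives the out-of-sample RMSD reported in Table IV. In practice a library routine such as `numpy.linalg.lstsq` would normally replace the hand-written solver.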
The best combination of methods is HF/gas/CHELPG/
PW. The calculated values of RISM/PW hydration free ener-
gies and the calculated excluded volumes are given in Ref.
94. After fitting, the RMSD value for the 19-molecule test set
is less than 1.2 kcal/mol. Differences between the calculated
and the experimental hydration free energies for this method
are presented in Fig. 2.
In Table V, values of the fitting coefficients $b = (b_V, b_H, b_C, b_N, b_O, b_{\mathrm{Cl}}, b_P, b_S)$ are presented for the HF/gas/CHELPG/PW method. To validate these coefficients, leave-one-out and leave-five-out cross-validations were carried out, along with a comparison between the coefficients obtained by fitting against either the training set or the full dataset. The deviations between the different training sets are quite small. The highest deviations are about 11% for sulfur and phosphorus (these are the rarest elements in the 31-molecule test set). The small relative deviations mean that the formula will not change much if a different training set is used, i.e., the fitting coefficients are stable.
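A leave-one-out stability check of this kind can be sketched in a few lines of Python. The example is deliberately reduced to a single descriptor with a closed-form least-squares slope, the data are invented, and the definition of the percentage deviation (largest relative change of the coefficient with respect to the full fit) is an assumption, because Table V does not spell it out.

```python
def fit_slope(x, y):
    """Least-squares slope of y = b * x (single descriptor, no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def leave_one_out_deviation(x, y):
    """Largest relative change (in %) of the slope when one point is left out."""
    b_full = fit_slope(x, y)
    deviations = []
    for i in range(len(x)):
        b_i = fit_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        deviations.append(abs(b_i - b_full) / abs(b_full) * 100.0)
    return b_full, max(deviations)

# Invented data: one descriptor and the residual it should explain.
volume = [1.0, 2.0, 3.0, 4.0]
residual = [1.1, 1.9, 3.2, 3.8]
b_full, max_dev = leave_one_out_deviation(volume, residual)
```

A coefficient whose value moves by only a few percent under such resampling is, in the sense used here, stable with respect to the choice of training set.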
B. Comparison with other methods
Hydration free energies predicted by other methods for the 63-molecule SAMPL1 set are given in Refs. 20–22. The trend in these results is that continuum models (which include some fitted parameters) give RMS errors around 2.5 kcal/mol on the SAMPL1 set, while slightly higher RMSD errors are reported for explicit solvent approaches. To provide a direct comparison with our results, we have used the data given in Refs. 20–22 to recalculate the RMSD obtained by these methods for the 19 molecules of our test set only (Table VI). For the HF/gas/CHELPG/PW method,
TABLE IV. The ten fitting results with smallest RMSDs for the 19-molecule test set.

QM level  Solvation model  Partial charges  Formula  RMSD (kcal/mol)  R^2    F-test  t-test
HF        Gas              CHELPG           PW       1.138            0.897  Passed  Passed
HF        Gas              CHELPG-DIPOLE    PW       1.161            0.894  Passed  Passed
B3LYP     Gas              CHELPG-DIPOLE    GF       1.250            0.877  Passed  Passed
B3LYP     Gas              CHELPG           GF       1.270            0.871  Passed  Passed
HF        Gas              CHELPG           GF       1.344            0.857  Passed  Passed
B3LYP     Gas              CHELPG-DIPOLE    PW       1.372            0.859  Passed  Passed
HF        Gas              CHELPG-DIPOLE    GF       1.375            0.850  Passed  Passed
B3LYP     Gas              CHELP-DIPOLE     GF       1.417            0.831  Passed  Passed
B3LYP     Gas              CHELPG           PW       1.434            0.846  Passed  Passed
AM1       Gas              BCC              PW       1.470            0.817  Passed  Passed
[Figure 2: scatter plot of the difference ΔG_corr − ΔG_exp (kcal/mol) against ΔG_exp (kcal/mol) for the training set and the test set; HF/gas/CHELPG/PW, RMSD = 1.138 kcal/mol.]
FIG. 2. The results for the best fitted model (HF/gas/CHELPG/PW). The RMSD on the test set is 1.14 kcal/mol.
we obtained a RMSD on the test set of 1.14 kcal/mol, which
is almost half of that reported for the continuum models for
the same molecules.
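As an arithmetic check on the RMSD quoted for the corrected RISM model, the following Python sketch recomputes it from the Expt. and RISM columns of Table VI, using the values as printed there. The `ddof` argument is added for illustration only: the source does not state which denominator was used.

```python
import math

# (Expt., RISM) pairs in kcal/mol, copied as printed in Table VI.
TABLE_VI = {
    "Cup08004": (1.82, 3.492),
    "Cup08005": (1.88, 3.391),
    "Cup08007": (8.21, 9.878),
    "Cup08016": (9.61, 9.859),
    "Cup08018": (3.44, 3.142),
    "Cup08020": (14.01, 14.178),
    "Cup08025": (9.86, 9.965),
    "Cup08028": (6.23, 7.212),
    "Cup08030": (4.82, 3.924),
    "Cup08033": (2.55, 2.667),
    "Cup08041": (5.99, 6.086),
    "Cup08043": (6.74, 6.181),
    "Cup08045": (4.37, 2.514),
    "Cup08047": (8.43, 7.292),
    "Cup08048": (7.78, 7.193),
    "Cup08050": (10.22, 9.074),
    "Cup08052": (11.14, 9.266),
    "Cup08053": (6.68, 7.337),
    "Cup08058": (11.24, 12.923),
}

def rmsd(pairs, ddof=0):
    """Root mean square deviation; ddof=1 divides by N - 1 instead of N."""
    squared = [(calc - expt) ** 2 for expt, calc in pairs]
    return math.sqrt(sum(squared) / (len(squared) - ddof))

rmsd_n = rmsd(TABLE_VI.values())                    # about 1.108
rmsd_n_minus_1 = rmsd(TABLE_VI.values(), ddof=1)    # about 1.138
```

With these inputs the N denominator reproduces the 1.108 kcal/mol in the RMSD row of Table VI, and the N − 1 denominator gives about 1.138 kcal/mol, the figure listed in Table IV. That the two quoted values differ only by this convention is an inference from the arithmetic, not a statement made in the text.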
Of course, such comparisons are not completely fair because the results in Refs. 20–22 were obtained without knowledge of the experimental hydration free energies, while the results in the current paper were fitted to give the best performance on the SAMPL1 subset. However, analysis of the performance of the different methods shows quite reasonable trends: the best performing methods are those which use better levels of QM theory and better RISM hydration free energy expressions (GF and PW). This indicates that good agreement with experiment is not just a random result of statistical fitting but has a physical background. The authors realize that the fitting procedure proposed in this paper needs to be improved and further validated before it can be used for the accurate blind prediction of hydration free energies. However, this paper illustrates a procedure by which an efficient RISM-based method for calculating hydration free energies can be developed.
V. CONCLUSIONS
We have compared the performance of different models
based on RISM theory for the calculation of the hydration
free energies of druglike molecules. The best models were
identified among 128 possible combinations of four different
RISM free energy expressions and 32 different sets of mo-
lecular geometries and atomic partial charges.
TABLE V. Fitting coefficients for the HF/gas/CHELPG/PW method and their deviations during the leave-one-out, leave-five-out, and training vs full fitting validations.

Coefficient  Value  One-left-test b_j (%)  Five-left-test b_j (%)  Train vs full fit b_j (%)
b_V          0.233  0.904                  2.108                   1.396
b_H          0.599  3.217                  7.666                   5.233
b_C          1.383  1.544                  3.822                   2.309
b_N          2.193  1.431                  3.461                   1.531
b_O          1.629  1.470                  3.606                   7.738
b_Cl         2.687  1.621                  3.869                   1.040
b_P          4.867  4.150                  9.686                   11.564
b_S          4.460  2.061                  5.157                   11.030
TABLE VI. Comparison of hydration free energies for the 19-molecule test set calculated by different methods (kcal/mol).

Mol. ID   Expt.(a)  RISM(b)  SM6(c)  SM8(c)  SMD(c)  Klamt1(d)  Klamt2(e)  Sulea1(f)  Sulea2(g)  Sulea3(h)
Cup08004  1.82      3.492    0.40    0.30    0.70    0.43       0.02       0.13       0.24       1.46
Cup08005  1.88      3.391    0.30    0.20    0.60    0.13       0.13       0.17       0.26       1.52
Cup08007  8.21      9.878    6.10    6.30    8.40    7.66       8.02       9.23       9.01       7.54
Cup08016  9.61      9.859    12.20   12.30   10.90   10.97      11.15      8.64       8.35       9.23
Cup08018  3.44      3.142    3.00    2.70    4.40    --         --         2.33       2.88       2.06
Cup08020  14.01     14.178   27.00   26.30   23.10   17.61      17.59      21.53      20.67      21.59
Cup08025  9.86      9.965    8.00    7.90    6.80    9.46       9.46       7.63       7.73       8.08
Cup08028  6.23      7.212    9.70    9.60    8.30    4.54       4.54       4.12       4.26       6.60
Cup08030  4.82      3.924    6.30    5.60    4.70    7.34       7.34       4.47       5.11       5.01
Cup08033  2.55      2.667    2.10    1.80    2.30    5.91       5.91       0.62       1.08       0.77
Cup08041  5.99      6.086    5.40    5.10    3.50    3.81       3.97       7.04       7.01       6.94
Cup08043  6.74      6.181    6.50    7.90    6.30    7.65       7.65       5.84       5.86       7.51
Cup08045  4.37      2.514    4.10    6.80    7.20    4.71       4.71       3.19       3.53       4.44
Cup08047  8.43      7.292    7.10    8.30    7.90    8.15       8.17       9.36       8.71       8.44
Cup08048  7.78      7.193    8.50    8.60    7.60    8.94       8.94       8.40       8.20       7.95
Cup08050  10.22     9.074    10.00   11.10   11.20   9.74       9.74       9.91       9.14       8.68
Cup08052  11.14     9.266    8.90    9.60    9.20    11.27      11.27      15.67      15.35      14.47
Cup08053  6.68      7.337    8.40    9.40    8.10    7.38       7.63       10.00      9.34       9.26
Cup08058  11.24     12.923   13.80   13.10   11.40   --         --         13.36      14.12      16.27
RMSD                1.108    3.40    3.30    2.65    1.76       1.73       2.53       2.33       2.45

(a) Experimental data (Ref. 19).
(b) HF/gas/CHELPG/PW RISM method with correction.
(c) SM6, SM8, and SMD models (Ref. 20).
(d) Original prediction (Ref. 21).
(e) Prediction after cross merging (Ref. 21).
(f) Model {1, 0.9, 16} (Ref. 22, supporting information).
(g) Model {1, 0.9, 25} (Ref. 22, supporting information).
(h) Model {2, 1.0, 25} (Ref. 22, supporting information).
The RISM calculations were validated against experi-
mental data taken from the SAMPL1 dataset. Since these
data were originally published as part of a blind challenge to
calculate hydration free energies, this has permitted us to
compare our results with those of the best implicit and ex-
plicit solvent approaches.
Although we observe that hydration free energies calculated with RISM theory contain significant absolute errors, for the best methods tested here these are found to be dominated by large systematic errors, while the random errors are considerably smaller. Using the best free energy expression (PW) combined with the best structure determination methods (HF or B3LYP with CHELPG/CHELPG-DIPOLE charges and AM1 with BCC charges), the random errors in the calculated hydration free energies were approximately 2.6 kcal/mol, which is comparable to results obtained by the best implicit and explicit solvent methods. After parametrization using an excluded volume term and simple atom counts, the RMSD calculated by the best model (HF/gas/CHELPG/PW) was less than 1.2 kcal/mol, which is about half the error reported by continuum models for the same molecules.
Hydration free energies calculated by RISM theory have
traditionally been considered to be too inaccurate to be use-
ful in practical applications such as pharmaceutical drug de-
sign. However, these assumptions have been based on pub-
lications that have tested the HNC, HNCB, or related free
energy expressions. The results presented here show that the
PW or GF expressions allow relatively accurate calculations
of hydration free energies, which may be systematically im-
proved by the addition of a small number of simple empirical
parameters.
The RISM calculations based on the HNC expression give inaccurate estimates of hydration free energies because they overestimate the energy required to form a cavity in the solvent and underestimate the electrostatic contribution to the hydration free energy of hydrogen bonding sites (Refs. 38, 65, and 66). In principle, it might be possible to eliminate some of these errors through the design of an appropriate bridge function, but this is presently an open problem in the integral equation theory of molecular liquids.
The results presented here indicate that qualitatively correct results obtained by the best RISM expressions can be improved by an empirical fitting procedure to yield very accurate quantitative predictions of hydration free energies. The optimum model (HF/gas/CHELPG/PW) is considerably less computationally expensive than explicit solvent approaches for estimating hydration free energy. The results suggest that, after further development, RISM theory has the potential to be widely beneficial in practical applications such as pharmaceutical drug discovery and drug development.
ACKNOWLEDGMENTS
This work was supported by a grant from the Villum
Kahn Rasmussen foundation through a postdoctoral grant to
D.S.P. Computations were made possible through grants
from the Lundbeck Foundation, the Novo Nordisk Founda-
tion, the Carlsberg Foundation, and from the Danish Center
for Scientific Computing. We thank Gennady N. Chuev and
Andrey I. Frolov for useful discussions and critical reading
of the manuscript. We would also like to acknowledge the
support staff of the Max-Planck-Institute for Mathematics in
the Sciences and particularly Ms. Valeria Huenniger, Ms.
Heike Rackwitz, and Ms. Theresa Petsch for the technical
and administrative support of the collaboration with Aarhus
University.
APPENDIX: THE RISM-MOL SOLVER
In the current work, the calculations of the RISM solute-
solvent correlation functions were performed with the RISM-
MOL program, which was developed, for fast solution of the
RISM integral equations, by Fedorov and Sergiievskyi in the
Computational Physical Chemistry and Biophysics group of
the Max-Planck-Institute for Mathematics in the Sciences.
To solve the RISM equations, the RISM-MOL program uses the Fourier iterative method (Ref. 23) accelerated by the multigrid technique (Ref. 63). It was shown recently that the multigrid method (Ref. 63) can speed up the Fourier iterations for the atomic Ornstein–Zernike equation by up to several dozen times (Ref. 62). The same multigrid method has been implemented in the RISM-MOL program for 1D RISM calculations. Using this algorithm, the hydration free energy calculation for the largest molecule in the set (42 atoms) took about 30 s on a single processor core. The average time required for the hydration free energy calculations was 17 s/molecule (Ref. 94).
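As the input data, the RISM-MOL solver takes the Cartesian coordinates, parameters of the Lennard-Jones potential, and partial charges $q_s$ of the atoms of the solute molecule. The parameters of the solvent molecules, as well as precalculated bulk-solvent correlation functions $h^{\mathrm{bulk}}_{s\alpha}(r)$, are embedded in the program. Using the atomic parameters, the site-site interaction potentials between the solute sites $s$ and the solvent sites $\alpha$ are calculated,

$$u_{s\alpha}(r) = u^{\mathrm{LJ}}_{s\alpha}(r) + u^{\mathrm{C}}_{s\alpha}(r), \tag{A1}$$

where $u^{\mathrm{C}}_{s\alpha}(r)$ is the Coulomb potential

$$u^{\mathrm{C}}_{s\alpha}(r) = \frac{q_s q_\alpha}{r} \tag{A2}$$

and $u^{\mathrm{LJ}}_{s\alpha}(r)$ is a Lennard-Jones potential

$$u^{\mathrm{LJ}}_{s\alpha}(r) = 4\varepsilon_{s\alpha}\left[\left(\frac{\sigma_{s\alpha}}{r}\right)^{12} - \left(\frac{\sigma_{s\alpha}}{r}\right)^{6}\right]. \tag{A3}$$

The pair Lennard-Jones parameters $\sigma_{s\alpha}$ and $\varepsilon_{s\alpha}$ are calculated via the combining rules. By default, the Lorentz–Berthelot rules are used

$$\sigma_{s\alpha} = \frac{\sigma_s + \sigma_\alpha}{2}, \qquad \varepsilon_{s\alpha} = \sqrt{\varepsilon_s \varepsilon_\alpha}. \tag{A4}$$

Other combining rules can be defined by the user.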
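The construction of the site-site potential in Eqs. (A1)–(A4) can be sketched in Python as follows. This is not the RISM-MOL code: the function names, the dictionary layout, the numerical parameters, and the choice of units (Å, kcal/mol, elementary charges, with the corresponding Coulomb prefactor) are all assumptions made for illustration.

```python
import math

# Coulomb prefactor for charges in e, distances in angstrom, energies in
# kcal/mol. The unit convention is an assumption; Eq. (A2) is written
# without an explicit prefactor.
COULOMB_KCAL = 332.0637

def lorentz_berthelot(sigma_s, eps_s, sigma_a, eps_a):
    """Eq. (A4): arithmetic mean of sigma, geometric mean of epsilon."""
    return 0.5 * (sigma_s + sigma_a), math.sqrt(eps_s * eps_a)

def lennard_jones(r, sigma, eps):
    """Eq. (A3): 12-6 Lennard-Jones potential."""
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (x6 * x6 - x6)

def coulomb(r, q_s, q_a):
    """Eq. (A2) with an explicit unit prefactor."""
    return COULOMB_KCAL * q_s * q_a / r

def site_site_potential(r, solute_site, solvent_site):
    """Eq. (A1): sum of Lennard-Jones and Coulomb contributions."""
    sigma, eps = lorentz_berthelot(
        solute_site["sigma"], solute_site["eps"],
        solvent_site["sigma"], solvent_site["eps"],
    )
    return lennard_jones(r, sigma, eps) + coulomb(r, solute_site["q"], solvent_site["q"])

# Hypothetical site parameters, chosen only to exercise the functions.
solute_carbon = {"sigma": 3.5, "eps": 0.066, "q": 0.10}
water_oxygen = {"sigma": 3.166, "eps": 0.1554, "q": -0.8476}
u_at_4A = site_site_potential(4.0, solute_carbon, water_oxygen)
```

A user-defined combining rule would simply replace `lorentz_berthelot` with another function returning the pair `(sigma, eps)`.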
In the RISM-MOL program, it is possible to vary the number of grids, the number of grid points, the number of iterations, and, hence, the accuracy of the calculation. In the current study, six-grid iterations were used. The final solution was obtained on a grid with 4096 grid points and a step size of 0.05 bohr, with an $L_2$-norm accuracy of $10^{-4}$.
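To give a feel for these settings, the short Python sketch below lists a possible six-level grid hierarchy and the radial extent of the finest grid. It assumes that each coarser grid halves the number of points and doubles the step size, which is the textbook multigrid choice but is not stated in the text; the function name and the conversion constant are likewise illustrative.

```python
BOHR_IN_ANGSTROM = 0.529177  # approximate conversion factor

def grid_hierarchy(n_finest=4096, step_finest=0.05, n_grids=6):
    """List (points, step in bohr) from the finest to the coarsest grid.

    Assumes factor-2 coarsening at fixed radial extent; the source only
    gives the finest grid and the number of grids.
    """
    return [(n_finest // 2 ** level, step_finest * 2 ** level)
            for level in range(n_grids)]

grids = grid_hierarchy()
extent_bohr = grids[0][0] * grids[0][1]           # 4096 * 0.05 bohr
extent_angstrom = extent_bohr * BOHR_IN_ANGSTROM
```

Under these assumptions the finest grid spans 204.8 bohr (roughly 108 Å), and the coarsest of the six grids has 128 points with a 1.6 bohr step.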
The fast implementation of the algorithm for the numerical solution of the RISM equations, together with the possibilities presented here for accurate hydration free energy calculations, makes the RISM-MOL solver a robust tool for investigating the thermodynamics of solution. The program is available to academic users free of charge on request from M. V. Fedorov.
1 C. A. Reynolds, P. M. King, and W. G. Richards, Mol. Phys. 76, 251 (1992).
2 P. Kollman, Chem. Rev. (Washington, D.C.) 93, 2395 (1993).
3 G. Perlovich and A. Bauer-Brandl, Curr. Drug Deliv. 1, 213 (2004).
4 G. L. Perlovich, T. V. Volkova, and A. Bauer-Brandl, J. Pharm. Sci. 95, 2158 (2006).
5 G. L. Perlovich, L. K. Hansen, T. V. Volkova, S. Mirza, A. N. Manin, and A. Bauer-Brandl, Cryst. Growth Des. 7, 2643 (2007).
6 L. D. Hughes, D. S. Palmer, F. Nigsch, and J. B. O. Mitchell, J. Chem. Inf. Model. 48, 220 (2008).
7 D. S. Palmer, A. Llinas, I. Morao, G. M. Day, J. M. Goodman, R. C. Glen, and J. B. O. Mitchell, Mol. Pharmacol. 5, 266 (2008).
8 W. L. Jorgensen and J. Tirado-Rives, Perspect. Drug Discovery Des. 3, 123 (1995).
9 N. Matubayasi and M. Nakahara, J. Chem. Phys. 113, 6070 (2000).
10 N. Matubayasi and M. Nakahara, J. Mol. Liq. 119, 23 (2005).
11 M. R. Shirts and V. S. Pande, J. Chem. Phys. 122, 134508 (2005).
12 N. Matubayasi, Front. Biosci. 14, 3536 (2009).
13 J. L. Knight and C. L. Brooks, J. Comput. Chem. 30, 1692 (2009).
14 J. Tomasi and M. Persico, Chem. Rev. (Washington, D.C.) 94, 2027 (1994).
15 B. Roux and T. Simonson, Biophys. Chem. 78, 1 (1999).
16 D. Bashford and D. A. Case, Annu. Rev. Phys. Chem. 51, 129 (2000).
17 J. Tomasi, B. Mennucci, and R. Cammi, Chem. Rev. (Washington, D.C.) 105, 2999 (2005).
18 M. B. Ulmschneider, J. P. Ulmschneider, M. S. P. Sansom, and A. Di Nola, Biophys. J. 92, 2338 (2007).
19 J. P. Guthrie, J. Phys. Chem. B 113, 4501 (2009).
20 A. V. Marenich, C. J. Cramer, and D. G. Truhlar, J. Phys. Chem. B 113, 4538 (2009).
21 A. Klamt, F. Eckert, and M. Diedenhofen, J. Phys. Chem. B 113, 4508 (2009).
22 T. Sulea, D. Wanapun, S. Dennis, and E. O. Purisima, J. Phys. Chem. B 113, 4511 (2009).
23 P. A. Monson and G. P. Morriss, Adv. Chem. Phys. 77, 451 (1990).
24 J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, 3rd ed. (Academic, London, 1991), http://www.sciencedirect.com/science/book/9780123705358.
25 Molecular Theory of Solvation, edited by F. Hirata (Kluwer Academic, Dordrecht, 2003).
26 L. Blum and A. J. Torruella, J. Chem. Phys. 56, 303 (1972).
27 K. Amano and M. Kinoshita, Chem. Phys. Lett. 488, 1 (2010).
28 D. Chandler and H. C. Andersen, J. Chem. Phys. 57, 1930 (1972).
29 F. Hirata, B. M. Pettitt, and P. J. Rossky, J. Chem. Phys. 77, 509 (1982).
30 B. M. Pettitt and P. J. Rossky, J. Chem. Phys. 77, 1451 (1982).
31 M. Kinoshita, Y. Okamoto, and F. Hirata, J. Comput. Chem. 19, 1724 (1998).
32 M. Kinoshita, Y. Okamoto, and F. Hirata, J. Am. Chem. Soc. 120, 1855 (1998).
33 M. Kinoshita, Y. Okamoto, and F. Hirata, J. Chem. Phys. 110, 4090 (1999).
34 T. Imai, M. Kinoshita, and F. Hirata, Bull. Chem. Soc. Jpn. 73, 1113 (2000).
35 T. Imai, R. Hiraoka, A. Kovalenko, and F. Hirata, J. Am. Chem. Soc. 127, 15334 (2005).
36 N. Yoshida, S. Phongphanphanee, Y. Maruyama, T. Imai, and F. Hirata, J. Am. Chem. Soc. 128, 12042 (2006).
37 N. Yoshida, S. Phongphanphanee, and F. Hirata, J. Phys. Chem. B 111, 4588 (2007).
38 G. Chuev, M. Fedorov, and J. Crain, Chem. Phys. Lett. 448, 198 (2007).
39 M. V. Fedorov and A. A. Kornyshev, Mol. Phys. 105, 1 (2007).
40 G. N. Chuev and M. V. Fedorov, J. Chem. Phys. 131, 074503 (2009).
41 T. Imai, Y. Harano, M. Kinoshita, A. Kovalenko, and F. Hirata, J. Chem. Phys. 126, 225102 (2007).
42 T. Imai, S. Ohyama, A. Kovalenko, and F. Hirata, Protein Sci. 16, 1927 (2007).
43 D. Yokogawa, H. Sato, T. Imai, and S. Sakaki, J. Chem. Phys. 130, 064111 (2009).
44 T. Imai, K. Oda, A. Kovalenko, F. Hirata, and A. Kidera, J. Am. Chem. Soc. 131, 12430 (2009).
45 Y. Kiyota, R. Hiraoka, N. Yoshida, Y. Maruyama, I. Imai, and F. Hirata, J. Am. Chem. Soc. 131, 3852 (2009).
46 K. Nishiyama, T. Yamaguchi, and F. Hirata, J. Phys. Chem. B 113, 2800 (2009).
47 J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, 4th ed. (Elsevier Academic Press, Amsterdam, The Netherlands, 2000).
48 M. Kinoshita, Y. Okamoto, and F. Hirata, J. Comput. Chem. 18, 1320 (1997).
49 A. Kovalenko and F. Hirata, J. Phys. Chem. B 103, 7942 (1999).
50 A. Kovalenko and F. Hirata, J. Chem. Phys. 110, 10095 (1999).
51 L. Lue and D. Blankschtein, J. Phys. Chem. 96, 8582 (1992).
52 H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma, J. Phys. Chem. 91, 6269 (1987).
53 F. Hirata and P. J. Rossky, Chem. Phys. Lett. 83, 329 (1981).
54 P. H. Lee and G. M. Maggiora, J. Phys. Chem. 97, 10175 (1993).
55 A. Kovalenko and F. Hirata, J. Chem. Phys. 113, 2793 (2000).
56 G. N. Chuev and M. V. Fedorov, J. Comput. Chem. 25, 1369 (2004).
57 G. N. Chuev and M. V. Fedorov, J. Chem. Phys. 120, 1191 (2004).
58 M. V. Fedorov and G. N. Chuev, J. Mol. Liq. 120, 159 (2005).
59 M. V. Fedorov, H. J. Flad, G. N. Chuev, L. Grasedyck, and B. N. Khoromskij, Computing 80, 47 (2007).
60 W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives, J. Am. Chem. Soc. 118, 11225 (1996).
61 G. A. Kaminski, R. A. Friesner, J. Tirado-Rives, and W. L. Jorgensen, J. Phys. Chem. B 105, 6474 (2001).
62 M. V. Fedorov and W. Hackbusch, “A multigrid solver for the integral equations of the theory of liquids,” Preprint No. 88 (Max-Planck-Institut fuer Mathematik in den Naturwissenschaften, 2008).
63 W. Hackbusch, Multi-Grid Methods and Applications (Springer-Verlag, Berlin, 1985).
64 S. J. Singer and D. Chandler, Mol. Phys. 55, 621 (1985).
65 S. Ten-no, J. Chem. Phys. 115, 3724 (2001).
66 K. Sato, H. Chuman, and S. Ten-no, J. Phys. Chem. B 109, 17290 (2005).
67 D. Chandler, Y. Singh, and D. M. Richardson, J. Chem. Phys. 81, 1975 (1984).
68 S. Ten-no and S. Iwata, J. Chem. Phys. 111, 4865 (1999).
69 C. M. Cortis, P. J. Rossky, and R. A. Friesner, J. Chem. Phys. 107, 6400 (1997).
70 Q. H. Du, D. Beglov, and B. Roux, J. Phys. Chem. B 104, 796 (2000).
71 A. Kovalenko, F. Hirata, and M. Kinoshita, J. Chem. Phys. 113, 9830 (2000).
72 T. Luchko, S. Gusarov, D. R. Roe, C. Simmerling, D. A. Case, J. Tuszynski, and A. Kovalenko, J. Chem. Theory Comput. 6, 607 (2010).
73 S. Genheden, T. Luchko, S. Gusarov, A. Kovalenko, and U. Ryde, J. Phys. Chem. B 114, 8505 (2010).
74 Schrödinger LLC 2008, SCHRODINGER SUITE 2008, MAESTRO Version 8.5, MACROMODEL Version 9.6.
75 W. C. Still, A. Tempczyk, R. C. Hawley, and T. Hendrickson, J. Am. Chem. Soc. 112, 6127 (1990).
76 R. Krishnan, J. S. Binkley, R. Seeger, and J. A. Pople, J. Chem. Phys. 72, 650 (1980).
77 E. Cancès, B. Mennucci, and J. Tomasi, J. Chem. Phys. 107, 3032 (1997).
78 B. Mennucci and J. Tomasi, J. Chem. Phys. 106, 5151 (1997).
79 M. Cossi, N. Rega, G. Scalmani, and V. Barone, J. Comput. Chem. 24, 669 (2003).
80 V. Barone and M. Cossi, J. Phys. Chem. A 102, 1995 (1998).
81 M. J. Frisch, G. W. Trucks, H. B. Schlegel et al., GAUSSIAN 03, Gaussian, Inc., Wallingford, CT, 2004.
82 L. E. Chirlian and M. M. Francl, J. Comput. Chem. 8, 894 (1987).
83 C. M. Breneman and K. B. Wiberg, J. Comput. Chem. 11, 361 (1990).
84 B. H. Besler, K. M. Merz, and P. A. Kollman, J. Comput. Chem. 11, 431 (1990).
85 U. C. Singh and P. A. Kollman, J. Comput. Chem. 5, 129 (1984).
86 A. E. Reed, R. B. Weinstock, and F. Weinhold, J. Chem. Phys. 83, 735 (1985).
87 J. J. P. Stewart, MOPAC 6.00, Fujitsu Limited, Tokyo, Japan.
88 J. J. P. Stewart, J. Comput.-Aided Mol. Des. 4, 1 (1990).
89 J. M. Wang, W. Wang, P. A. Kollman, and D. A. Case, J. Mol. Graphics Modell. 25, 247 (2006).
90 J. M. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman, and D. A. Case, J. Comput. Chem. 25, 1157 (2004).
91 A. Jakalian, D. B. Jack, and C. I. Bayly, J. Comput. Chem. 23, 1623 (2002).
92 J. G. Kirkwood and F. P. Buff, J. Chem. Phys. 19, 774 (1951).
93 K. Knight, Mathematical Statistics (CRC, Boca Raton, FL, 2000), p. 502.
94 See supplementary material at http://dx.doi.org/10.1063/1.3458798 for the results of the RISM calculations, results of the statistical analysis of the calculations, results of the fitting, and a brief analysis of the computational performance of the algorithm.
95 For B3LYP calculations in vacuum with CHELP-DIPOLE charges and the Gaussian fluctuation (GF) RISM free energy formula.
044104-11
RISM G of druglike molecules J. Chem. Phys. 133, 044104 2010
... 31,32,[48][49][50] The PC1 functional has been shown to give accurate predictions of hydration free energies for neutral and ionized solutes, in both pure water and salt solutions at a wide-range of temperatures. [29][30][31][32][33] It has also been successfully applied to the prediction of solvation free energies in organic solvents. 49 ...
... Extensive previous benchmarking on solvation free energy data of organic molecules indicates that the PC1 functional gives more accurate results than the GF, PSE-3 or PC functionals. [29][30][31][32][33] In this section, which discusses binding free energies, we therefore focus on the results obtained using the PC1 functional, while the results of the other functionals are provided in the Supporting Information. The calculated binding free energies must be interpreted with caution because they do not include some terms relating Table 1). ...
Article
Bovine and camel chymosins are aspartic proteases that are used in dairy food manufacturing. Both enzymes catalyse proteolysis of a milk protein, κ-casein, which helps to initiate milk coagulation. Surprisingly, camel chymosin shows a 70% higher clotting activity than bovine chymosin for bovine milk, while exhibiting only 20% of the unspecific proteolytic activity. By contrast, bovine chymosin is a poor coagulant for camel milk. Although both enzymes are marketed commercially, the disparity in their catalytic activity is not yet well understood at a molecular level, due in part to a lack of atomistic resolution data about the chymosin - κ-casein complexes. Here, we report computational alanine scanning calculations of all four chymosin - κ-casein complexes, allowing us to elucidate the influence that individual residues have on binding thermodynamics. Of the 12 sequence differences in the binding sites of bovine and camel chymosin, eight are shown to be particularly important for understanding differences in the binding thermodynamics (Asp112Glu, Lys221Val, Gln242Arg, Gln278Lys. Glu290Asp, His292Asn, Gln294Glu, and Lys295Leu. Residue in bovine chymosin written first). The relative binding free energies of single-point mutants of chymosin are calculated using the molecular mechanics three dimensional reference interaction site model (MM-3DRISM). Visualisation of the solvent density functions calculated by 3DRISM reveals the difference in solvation of the binding sites of chymosin mutants. This article is protected by copyright. All rights reserved.
... Apart from being reliable in terms of HFE estimations, RISM is computationally less expensive than MD simulations, which could be attributed to the use of correlation functions. RISM has been frequently used for computing solvation free energies 17,[22][23][24][25][26][27] predominantly in water. Palmer et al. ...
Preprint
The potential to predict Solvation Free Energies (SFEs) in any solvent using a machine learning (ML) model based on thermodynamic output, extracted from 3D-RISM simulations in water is investigated. The models on multiple solvents take into account both the solute and solvent description and offer the possibility to predict SFEs of any solute in any solvent with root mean squared errors less than 1 kcal/mol. Validations that involve exclusion of fractions or clusters of the solutes or solvents exemplify the model’s capability to predict SFEs of novel solutes and solvents with diverse chemical profiles. In addition to being predictive, our models can identify the solute and solvent features that influence SFE predictions. Furthermore, using 3D-RISM hydration thermodynamic output to predict SFEs in any organic solvent reduces the need to run 3D-RISM simulations in all these solvents. Altogether, our multi-solvent models for SFE predictions that take advantage of the solvation effects are expected to have an impact in the property prediction space.
... These results indicate that the inclusion for molecular orientation dependencies contributes to the improvement of the SFE. In addition, the following corrections based on the phenomenological partial molar volume (PMV) corrections have been proposed: the universal corrections [129][130][131][132][133], the structural descriptor correction [134], the bridge function correction [135], and the pressure correction [136][137][138][139][140]. These corrections include the PMV expressed as follows: [141,142] ...
Article
Full-text available
Molecular dynamics simulation is a fruitful tool for investigating the structural stability, dynamics, and functions of biopolymers at an atomic level. In recent years, simulations can be performed on time scales of the order of milliseconds using specialpurpose systems. Since the most stable structure, as well as meta-stable structures and intermediate structures, is included in trajectories in long simulations, it is necessary to develop analysis methods for extracting them from trajectories of simulations. For these structures, methods for evaluating the stabilities, including the solvent effect, are also needed. We have developed relaxation mode analysis to investigate dynamics and kinetics of simulations based on statistical mechanics. We have also applied the three-dimensional reference interaction site model theory to investigate stabilities with solvent effects. In this paper, we review the results for designing amino-acid substitution of the 10-residue peptide, chignolin, to stabilize the misfolded structure using these developed analysis methods. Fullsize Image
... The error associated with these thermodynamic quantities traces to any error in the force field used, and to the approximations used to compute the atomic distributions. In recent years, corrections to errors in thermodynamic quantities have been developed that are applied after the solvent density distributions have been calculated, leading to estimates of solvation free energies with accuracies as good as explicit solvent simulations [18][19][20][21][22][23][24][25][26][27][28][29]. ...
Article
Full-text available
Computed, high-resolution, spatial distributions of solvation energy and entropy can provide detailed information about the role of water in molecular recognition. While grid inhomogeneous solvation theory (GIST) provides rigorous, detailed thermodynamic information from explicit solvent molecular dynamics simulations, recent developments in the 3D reference interaction site model (3D-RISM) theory allow many of the same quantities to be calculated in a fraction of the time. However, 3D-RISM produces atomic-site, rather than molecular, density distributions, which are difficult to extract physical meaning from. To overcome this difficulty, we introduce a method to reconstruct molecular density distributions from atomic-site density distributions. Furthermore, we assess the quality of the resulting solvation thermodynamics density distributions by analyzing the binding site of coagulation Factor Xa with both GIST and 3D-RISM. We find good qualitative agreement between the methods for oxygen and hydrogen densities as well as direct solute-solvent energetic interactions. However, 3D-RISM predicts lower energetic and entropic penalties for moving water from the bulk to the binding site.
... We have proposed an extension of molecular DFT to arbitrary fluid/solvents (the so-called MDFT method) in the goal of describing the solvation of three-dimensional molecular object in those solvents. [36][37][38][48][49][50][51][52][53][54][55][56] Note that a 3D-version of the RISM equations [57][58][59][60][61][62], as well as a RISMbased DFT approach [63,64] have also been developed recently with the same goal. ...
Article
Full-text available
For the problem of molecular solvation, formulated as a liquid submitted to the external potential field created by a molecular solute of arbitrary shape dissolved in that solvent, we draw a connection between the Gaussian Field Theory derived by David Chandler [Phys. Rev. E, 48, 2898 (1993)] and classical Density Functional Theory. We show that Chandler's results concerning the solvation of a hard core of arbitrary shape can be recovered by either minimising a linearised HNC functional using an auxiliary Lagrange multiplier field to impose a vanishing density inside the core, or by minimising this functional directly outside the core --indeed a simpler procedure. Those equivalent approaches are compared to two other variants of DFT, either in the HNC, or partially linearised HNC approximation, for the solvation of a Lennard-Jones solute of increasing size in a Lennard-Jones solvent. Compared to Monte-Carlo simulations, all those theories give acceptable results for the inhomogeneous solvent structure, but are completely out-of-range for the solvation free-energies. This can be fixed in DFT by adding a hard-sphere bridge correction to the HNC functional.
Article
The potential to predict Solvation Free Energies (SFEs) in any solvent using a machine learning (ML) model based on thermodynamic output, extracted exclusively from 3D-RISM simulations in water is investigated. The models on multiple solvents take into account both the solute and solvent description and offer the possibility to predict SFEs of any solute in any solvent with root mean squared errors less than 1 kcal/mol. Validations that involve exclusion of fractions or clusters of the solutes or solvents exemplify the model’s capability to predict SFEs of novel solutes and solvents with diverse chemical profiles. In addition to being predictive, our models can identify the solute and solvent features that influence SFE predictions. Furthermore, using 3D-RISM hydration thermodynamic output to predict SFEs in any organic solvent reduces the need to run 3D-RISM simulations in all these solvents. Altogether, our multi-solvent models for SFE predictions that take advantage of the solvation effects are expected to have an impact in the property prediction space.
Article
Full-text available
The integration equation theory (IET) provides highly efficient tools for the calculation of structural and thermodynamic properties of molecular liquids. In recent years, the 3D reference interaction site model (3DRISM), the most developed IET for solvation, has been widely applied to study protein solvation, aggregation, and drug‐receptor binding. However, hydrophobic solutes with sufficient size (>nm) can induce water density depletion at the solute–solvent interface. This density depletion is not considered in the original 3DRISM theory. The authors here review the recent developments of 3DRISM at hydrophobic surfaces and related theories to address this challenge. At hydrophobic surfaces, an additional hydrophobicity‐induced density inhomogeneity equation is introduced to 3DRISM theory to consider this density depletion. Accordingly, several new closures equations including D2 closure and D2MSA closures are developed to enable stable numerical solutions of 3DRISM equations. These newly developed theories hold great promise for an accurate and rapid calculation of the solvation effect for complex molecular systems such as proteins. At the end of the report, the authors also provide a perspective on other challenges of the IETs as an efficient solvation model.
Article
The hydration free energy (HFE) is a critical property for predicting and understanding chemical and biological processes in aqueous solution. There are a number of computational methods to derive HFE, generally classified into equilibrium or non-equilibrium methods, based on the type of calculations used. In the present study, we compute the hydration free energies of 34 small, neutral, organic molecules with experimental HFE between +2 and -16 kcal/mol. The one-sided non-equilibrium methods Jarzynski Forward (JF) and Backward (JB), the two-sided non-equilibrium methods Jarzynski mean based on the average of JF and JB, Crooks Gaussian Intersection (CGI), and the Bennett Acceptance Ratio (BAR) are compared to the estimates from the two-sided equilibrium method Multistate Bennett Acceptance Ratio (MBAR), which is considered the reference method for HFE calculations, and to experimental data from the literature. Our results show that the estimated hydration free energies from all the methods are consistent with MBAR results, and all methods provide a mean absolute error of ∼0.8 kcal/mol and root mean square error of ∼1 kcal/mol for the 34 organic molecules studied. In addition, the results show that the one-sided methods JF and JB result in systematic deviations that cannot be corrected entirely. The statistical efficiency ε of the different methods can be expressed as one over the product of the simulation time and the average variance in the HFE. From such an analysis, we conclude that ε(MBAR) > ε(BAR) ≈ ε(CGI) > ε(JX), where JX is any of the Jarzynski methods. In other words, the non-equilibrium methods tested here for the prediction of HFE have lower computational efficiency than the MBAR method.
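The efficiency measure described in this abstract, one over the product of simulation time and average variance, can be illustrated with a minimal Python sketch. The function name, the units, and all numbers below are hypothetical and chosen only to show how such a ranking might be computed; they are not results from the study.

```python
def statistical_efficiency(sim_time_ns, mean_variance):
    """Return eps = 1 / (t * var).

    sim_time_ns   -- total simulation time (assumed here to be in ns)
    mean_variance -- average variance of the HFE estimate, (kcal/mol)^2
    """
    if sim_time_ns <= 0 or mean_variance <= 0:
        raise ValueError("time and variance must be positive")
    return 1.0 / (sim_time_ns * mean_variance)


# Hypothetical (time, variance) pairs, NOT data from the study.
methods = {
    "MBAR": (10.0, 0.01),
    "BAR": (10.0, 0.02),
    "JF": (10.0, 0.08),
}

# Rank methods from most to least efficient.
ranking = sorted(
    methods,
    key=lambda name: statistical_efficiency(*methods[name]),
    reverse=True,
)

for name in ranking:
    print(name, statistical_efficiency(*methods[name]))
```

At equal simulation time, the method with the smaller variance ranks higher, which is the sense in which the abstract orders the methods.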
Article
Recently, Güssregen et al. used solute-solvent distribution functions calculated by the three-dimensional Reference Interaction Site Model (3DRISM) in a 3D quantitative structure-activity relationship (QSAR) approach to model activity data for a set of serine protease inhibitors; this approach was referred to as Comparative Analysis of 3D RISM Maps (CARMa) [J. Chem. Inf. Model. 2017, 57, 1652-1666]. Here we extend this idea by introducing probe atoms into the 3DRISM solvent model in order to directly capture other molecular interactions in addition to those related to hydration/dehydration. Benchmark results for six different protein-ligand systems show that CARMa models trained on probe atom descriptors give consistently more accurate predictions than Comparative Molecular Field Analysis (CoMFA) and other common QSAR approaches.
Article
Recently, we proposed a reference-modified density functional theory (RMDFT) to calculate solvation free energy (SFE), in which a hard-sphere fluid was introduced as the reference system instead of an ideal molecular gas. Through the RMDFT, using an optimal diameter for the hard-sphere reference system, the values of the SFE calculated at room temperature and normal pressure were in good agreement with those for more than 500 small organic molecules in water as determined by experiments. In this study, we present an application of the RMDFT for calculating the temperature and pressure dependences of the SFE for solute molecules in water. We demonstrate that the RMDFT has high predictive ability for the temperature and pressure dependences of the SFE for small solute molecules in water when the optimal reference hard-sphere diameter determined for each thermodynamic condition is used. We also apply the RMDFT to investigate the temperature and pressure dependences of the thermodynamic stability of an artificial small protein, chignolin, and discuss the mechanism of high-temperature and high-pressure unfolding of the protein. © 2017 Wiley Periodicals, Inc.
Article
A method of 'natural population analysis' has been developed to calculate atomic charges and orbital populations of molecular wave functions in general atomic orbital basis sets. The natural analysis is an alternative to conventional Mulliken population analysis, and seems to exhibit improved numerical stability and to better describe the electron distribution in compounds of high ionic character, such as those containing metal atoms. An ab initio calculation is conducted of SCF-MO wave functions for compounds of type CH3X and LiX (X = F, OH, NH2, CH3, BH2, BeH, Li, H) in a variety of basis sets to illustrate the generality of the method, and to compare the natural populations with results of Mulliken analysis, density integration, and empirical measures of ionic character. Natural populations are found to give a satisfactory description of these molecules, providing a unified treatment of covalent and extreme ionic limits at modest computational cost.
Technical Report
In this article we present a new multigrid algorithm to solve the Ornstein-Zernike type integral equations of the theory of liquids. This approach is based on ideas coming from the multigrid methods for numerical solutions of integral equations (see §16 in [13]). We describe this method in a general manner as a 'template' for construction of efficient multilevel iterations for numerical solution of the integral equations in the theory of liquids. We report on several numerical experiments to illustrate the effectiveness of the method. The algorithm is tested on a model problem: a simple monoatomic fluid with a continuous short-ranged potential. The tests have indicated that the method sufficiently accelerates the convergence of the numerical solution in all considered cases. AMS Subject Classification: 65R99, 45G15. PACS numbers: 02.60.Nm, 61.20.Ne, 61.20.Gy. Key words: Ornstein-Zernike equation, integral equations theory of liquids, multigrid methods.
Article
A contracted Gaussian basis set (6‐311G∗∗) is developed by optimizing exponents and coefficients at the Møller–Plesset (MP) second‐order level for the ground states of first‐row atoms. This has a triple split in the valence s and p shells together with a single set of uncontracted polarization functions on each atom. The basis is tested by computing structures and energies for some simple molecules at various levels of MP theory and comparing with experiment.
Article
The optimized cluster expansion methods developed in the first article of this series (I) are generalized to apply to molecular fluids. These methods make use of summations of ring and chain cluster diagrams. The summations are performed explicitly for certain classes of molecular models. The molecules in these classes contain several "interaction sites," and the total interaction between two molecules is a sum of site-site potentials that depend on the scalar distances between sites on the two molecules. The principal results of this work are computationally simple techniques for calculating the thermodynamic properties and pair correlation functions of molecular fluids in which the intermolecular interactions are highly angular dependent. The techniques should be reliable since they arise from the same approximations that have been shown to be very accurate when applied to simple fluids.
Article
Salt effects on the stability and on the solvation structure of a peptide in a variety of aqueous solutions of the alkali halide ions are studied by means of the reference interaction site model (RISM) theory. The order of salt effect on the peptide stability is consistent with the experimental results; the order follows the Hofmeister series. The results are further analyzed in order to clarify the nature of the salt effect which determines the Hofmeister series and to find the reason why the Hofmeister series applies so generally to a variety of solutes in aqueous solutions. A heuristic model for explaining salt effects on the solvation structure of the peptide is proposed based on changes in the peptide-water pair correlation functions due to the ion perturbation.
Article
It is shown that the free energies associated with the solutions of extended RISM integral equations can be obtained in closed form thus avoiding the necessity of numerical coupling parameter integrations. In addition, variational principles are deduced which provide a basis for efficient algorithms to solve extended RISM integral equations.
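As an illustration of what a closed-form free energy expression looks like in practice, the sketch below evaluates the form commonly quoted for the hypernetted-chain (HNC) closure, Δμ = 4πρk_BT Σ ∫ r² [½h² − c − ½hc] dr, summed over solute-solvent site pairs. The formula is not given in the abstract itself, and the function name, the single shared solvent site density, and the trapezoidal integration are assumptions made for this example only.

```python
import math


def hnc_excess_chemical_potential(r, h, c, density, kT):
    """Closed-form HNC excess chemical potential (illustrative sketch).

    r       -- radial grid, list of floats (e.g. in Angstrom)
    h, c    -- lists of site-pair functions, each sampled on r:
               total and direct correlation functions
    density -- solvent site number density (assumed equal for all sites)
    kT      -- thermal energy, sets the unit of the result
    """
    total = 0.0
    for h_pair, c_pair in zip(h, c):
        # Integrand r^2 * [h^2/2 - c - h*c/2] for one site pair.
        f = [
            ri * ri * (0.5 * hi * hi - ci - 0.5 * hi * ci)
            for ri, hi, ci in zip(r, h_pair, c_pair)
        ]
        # Trapezoidal rule; no coupling-parameter integration is needed.
        total += sum(
            0.5 * (f[i] + f[i + 1]) * (r[i + 1] - r[i])
            for i in range(len(r) - 1)
        )
    return 4.0 * math.pi * density * kT * total


# Toy input, not a RISM solution: one site pair with c(r) = -exp(-r^2), h = 0.
r_grid = [0.01 * i for i in range(1001)]
c_toy = [[-math.exp(-x * x) for x in r_grid]]
h_toy = [[0.0 for _ in r_grid]]
mu_toy = hnc_excess_chemical_potential(r_grid, h_toy, c_toy, density=1.0, kT=1.0)
print(mu_toy)
```

In a real calculation, h and c would come from a converged solution of the RISM equations rather than from the toy functions used here.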
Article
The energy representation of the molecular configuration in a dilute solution is introduced to express the solvent distribution around the solute over a one-dimensional coordinate specifying the solute–solvent interaction energy. On the basis of the energy representation, an approximate functional for the solvation free energy of a solute in solution is constructed by adopting the Percus-Yevick-type approximation in the unfavorable region of the solute–solvent interaction and the hypernetted-chain-type approximation in the favorable region. The solvation free energy is then given exactly to second order with respect to the solvent density and to the solute–solvent interaction. It is demonstrated that the solvation free energies of nonpolar, polar, and ionic solutes in water are evaluated accurately and efficiently from the single functional over a wide range of thermodynamic conditions. The extension to a flexible solute molecule is straightforward. The applicability of the method is illustrated for solute molecules with a stretching or torsional degree of freedom.
Article
The RISM integral equation is extended to molecules with charged sites via a renormalization of the Coulomb potentials and the introduction of appropriate closure relations. For a fluid of diatomics with atomic charges of ±0.2 e the equation yields site-site correlation functions in qualitative agreement with those from computer simulation.