
Synergy and Complementarity between Focused Machine Learning and Physics-Based Simulation in Affinity Prediction

December 2021
Journal of Chemical Information and Modeling XXXX(XXX)

DOI:10.1021/acs.jcim.1c01382

License
CC BY-NC-ND 4.0

Authors:

Ann Cleves

BioPharmics Division, Optibrium Ltd.

Ajay N Jain

BioPharmics LLC

We present results on the extent to which physics-based simulation (exemplified by FEP⁺) and focused machine learning (exemplified by QuanSA) are complementary for ligand affinity prediction. For both methods, predictions of activity for LFA-1 inhibitors from a medicinal chemistry lead optimization project were accurate within the applicable domain of each approach. A hybrid model that combined predictions by both approaches by simple averaging performed better than either method, with respect to both ranking and absolute pKi values. Two publicly available FEP⁺ benchmarks, covering 16 diverse biological targets, were used to test the generality of the synergy. By identifying training data specifically focused on relevant ligands, accurate QuanSA models were derived using ligand activity data known at the time of the original series publications. Results across the 16 benchmark targets demonstrated significant improvements both for ranking and for absolute pKi values using hybrid predictions that combined the FEP⁺ and QuanSA predicted affinity values. The results argue for a combined approach for affinity prediction that makes use of physics-driven methods as well as those driven by machine learning, each applied carefully on appropriate compounds, with hybrid prediction strategies being employed where possible.


Ann E. Cleves, Stephen R. Johnson,* and Ajay N. Jain*


■ INTRODUCTION

Binding affinity prediction continues to be a challenge for computer-aided drug design, especially in the case where there is no high-resolution experimental structure of the target of interest. Even when structures of the biological target are available, affinity prediction is difficult. Simulation-oriented physics-based methods, such as MM/PBSA or MM/GBSA [1−3,6] or free energy perturbation (FEP) [7−9], share a key attraction: in principle, these approaches are congruent with what is known physically. The former methods nominally predict absolute binding free energy. In terms of predictive accuracy, even in the case where experimental structures are known for all ligands under consideration, performance has been observed to be quite variable on a per-target basis, though more consistent results have been obtained in some cases, with careful application. Additional context with respect to the state of physical simulation approaches is provided by recent reviews [4,5].

For the FEP approach, relative free energy predictions are made. This is done by estimating the difference in the free energies of protein−ligand complexes between related ligand pairs (typically differing relatively modestly in their substituents). Advances in force fields, sampling methods, and automated design of perturbation graphs can help to guide fine-grained molecular optimization. In cases where the FEP+ method is applicable, for single perturbations of a few ligand atoms from a known reference ligand, errors in predicting changes in free energy have been reported to be as low as 0.5 pKi units (0.9 kcal/mol). More recent benchmarking on a more challenging set of perturbations yielded errors roughly 50% higher. Affinity prediction remains a challenging problem, even in cases where targets have well-characterized structures and there is little uncertainty in ligand binding modes.

Machine-learning approaches have seen a recent resurgence in their applications within the CADD field, in part driven by advances in deep-learning methodologies. A recent review highlights a number of successful applications as well as limitations, with further context provided by a full book treatment. With respect to binding affinity prediction in the context of lead optimization, a critical factor is that the methods typically require thousands of data points in order to learn effectively, because of the need to develop encoded internal representations that meaningfully capture the important aspects required for prediction. Early-stage lead optimization may involve just dozens of assayed molecules within a newly discovered chemical series, and even mid-to-late-stage projects may be limited to hundreds or up to a few thousand data points. The recently introduced QuanSA machine-learning method (Quantitative Surface-field Analysis) differs from the deep-learning paradigm and from historically widely used methods.

Received: November 11, 2021

The central difference is that, rather than applying a generic machine-learning approach to an input molecular representation divorced from a binding event, QuanSA builds a physically interpretable model that is analogous to a protein binding site. By doing so, it addresses the problem of ligand conformation and alignment fully automatically, and it moves in the direction of causal modeling, where the requirement for data can be reduced. The method constructs a nonlinear “pocket field” that is still physical in nature, and which is directly related to the functional form of scoring functions for docking [15,16]. QuanSA pocket-field models mirror key physical phenomena that are observed in protein−ligand interactions: (1) choice of ligand poses is defined by the model; (2) non-additive (or even anti-additive) effects of substituent changes on a central scaffold can be modeled effectively; (3) changes in ligand structures induce changes in predicted ligand poses; (4) the model of molecular activity is dependent on the detailed shape of ligands. Nearly all QSAR and deep-learning methods ignore some or all of these aspects of protein−ligand interactions. Additional discussion of the theoretical contrasts between the QuanSA multiple-instance learning approach and other QSAR (3D and 2D) approaches can be found in the papers introducing the method along with the antecedent QMOD and Compass [19−21] approaches, the latter of which introduced the multiple-instance machine-learning paradigm.

Here, we explore the performance of both FEP+ and the QuanSA machine-learning method in a lead optimization project application scenario and using two publicly available FEP+ benchmarks [9,11], spanning 16 diverse targets and covering affinity predictions for nearly 400 molecules. Project data for LFA-1 [23,24] were used as a representative example of mid-to-late-stage lead optimization, where substantial structure−activity data exist, particularly within a chemical series of interest. The two FEP+ benchmarks were used to assess early-stage project application, where only sparse data may be available.

Accuracy of the QuanSA and FEP+ approaches, as well as a hybrid approach combining predictions from the two methods by simple averaging, will be detailed in what follows. In addition, because the QuanSA approach can be practically applied to screen large databases for new lead discovery and scaffold replacement, a screening utility was assessed using structure−activity data for diverse compounds that were disclosed after the data used for model induction.

Figure 1. Overview of the QuanSA method. Beginning from ligand structures and activities (here against LFA-1), a multiple-ligand alignment is produced (with variants for each molecule), after which a smooth, nonlinear function is induced (called a “pocket field”), into which new molecules can be flexibly fit as is commonly done with docking approaches. Here, the new test molecule, compound 4, was made 7 months after the last molecule within the training set (example molecules 1−3), and it was accurately predicted. Shown in the lower row is the predicted pose of compound 4, the surface surrounded by the pocket field (left), and the interactions with the pocket field with and without the surface (middle and right).

■ RESULTS AND DISCUSSION

We report results for two types of project application scenarios: mid-to-late-stage lead optimization and early stage. In both cases, training and testing data for QuanSA were temporally segregated: building models on older molecules and predicting the activities of future molecules. This parallels the application scenario for predictive modeling, and it avoids bias in assessing the performance of learned models [17,25,26]. For the mid-to-late-stage scenario, the data set included compound registration dates and associated activities. For the early-stage scenario, coarser temporal segregation was accomplished by making use of the years of disclosure of structure−activity and protein structural data. Assessment of a screening utility for the QuanSA models also employed temporal segregation.
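The temporal segregation described above can be sketched in a few lines. This is a minimal illustration, assuming each record carries a registration (or disclosure) date; the record layout, compound names, activity values, and cutoff date are all hypothetical and are not taken from the study.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Split (compound_id, registration_date, pKi) records at a cutoff date.

    Records dated on or before the cutoff form the training pool; later
    records are reserved as the blind "future" test set.
    """
    train = [r for r in records if r[1] <= cutoff]
    test = [r for r in records if r[1] > cutoff]
    return train, test

# Hypothetical records, for illustration only (not data from the study).
records = [
    ("cpd_a", date(2005, 1, 10), 7.2),
    ("cpd_b", date(2005, 6, 2), 6.5),
    ("cpd_c", date(2006, 3, 15), 8.1),
    ("cpd_d", date(2006, 9, 30), 7.7),
]
train, test = temporal_split(records, cutoff=date(2005, 12, 31))
```

Splitting on time rather than at random keeps every test molecule strictly later than every training molecule, which is the property the evaluation relies on.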

QuanSA Model Induction. The QuanSA method has been previously described in detail and will be summarized only briefly here, with additional details in the Supporting Information. Figure 1 illustrates the induction of a QuanSA pocket field. Beginning with pure SAR data (here SMILES strings and associated pKi measurements), low-energy conformational ensembles are produced, from which multiple mutual ligand alignments are automatically constructed. These alignments may be influenced (optionally) by provision of known bound ligand poses, and each ligand alignment contains a single optimal pose along with many related alternative poses. The derived pocket field acts as a virtual binding pocket, into which new molecules are flexibly fit, subject to the joint considerations of optimizing ligand interactions with the pocket and minimizing ligand strain.

For all models in this study, training ligands were focused around scaffolds of interest with respect to prediction and, in all cases, the poses of bound ligands were used to drive the initial alignment process. The more general case of diverse scaffolds without the benefit of known bound ligand poses is more challenging, and that has been discussed extensively in prior work [14,18].

Figure 1 shows three representative training molecules (1−3) and one future test molecule (4) from this work. Shown in 3D is the mutual overlay of the final optimal poses of the training molecules in the model. In this example, QuanSA accurately predicted the activity of the new molecule, which was synthesized months after the molecules used for model induction.

Mid-to-Late-Stage Project Application Scenario: LFA-1. LFA-1 is a heterodimeric protein of the integrin family with noncovalently linked α and β subunits and is expressed on the surface of leukocytes. LFA-1 mediates the interactions between leukocytes and other cells and has been pursued as a target for immunological disorder treatments, both by antibodies and with small molecules. The compounds in this work were generated in an effort to identify orally active small molecules that disrupted the LFA-1/ICAM-1 interaction [23,24]. The set is composed mostly of bicyclic hydantoins (e.g., compound 2), spirocyclic hydantoins, and spirocyclic pyrrolidines (e.g., compound 1), and all bind competitively to the I-domain allosteric site of LFA-1 and prevent the conformational changes required for ICAM-1 binding.

The LFA-1 structure−activity set contained homogeneous, high-quality assay data, with time stamps available to allow for segregation of data into a training set and a set of future compounds for prediction. Figure 2 (left) depicts the QuanSA model building, model selection, and testing procedure applied to the series of LFA-1 inhibitors. Model selection was done by testing alternative models on a later set of holdout molecules. Figure 2 (right) depicts the procedure for FEP+. QuanSA makes use of a typical machine-learning paradigm, employing training and (optional) holdout sets of molecules, within successive time windows of project activity. FEP+ makes predictions on a set of structural variations of a reference molecule, with the reference here being selected from among the LFA-1 holdout set and the 17 molecules for prediction being chosen from among the 67 molecules from the final project time window.

Figure 2. Preparation and scoring procedures using a temporally segregated set of LFA-1 inhibitors from a medicinal chemistry lead optimization project: QuanSA (left) and FEP+ (right). The QuanSA approach follows a machine-learning paradigm, employing a training set and a holdout set for model selection. The FEP+ approach combines careful force field parameter estimation, molecular docking, and extensive physical simulation.

The selected model had a mean unsigned error (MUE) of 0.56 log unit on the holdout set, with a corresponding Kendall’s τ of 0.48 (p < 0.0001). This model was refined using the holdout molecules, resulting in a final fit to the 135 training/holdout molecules of 0.25 log unit MUE and Kendall’s τ of 0.86 (p < 0.0001). The refined pocket field (shown in Figure 1) was then used to score the blind test set of 67 future molecules.
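The two statistics reported here, MUE for absolute accuracy and Kendall’s τ for ranking, can be computed as in the following sketch. The source does not state which τ variant was used; the tie-corrected τ-b below is an assumption, and the activity values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def mean_unsigned_error(experimental, predicted):
    """Average absolute difference between experimental and predicted pKi."""
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)

def kendall_tau_b(x, y):
    """Kendall rank correlation with the tau-b correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue          # tied in both variables: counted in neither term
        elif dx == 0:
            ties_x += 1       # tied only in x
        elif dy == 0:
            ties_y += 1       # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))

# Hypothetical pKi values, for illustration only (not data from the study).
experimental = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.2, 8.4, 8.9]
mue = mean_unsigned_error(experimental, predicted)
tau = kendall_tau_b(experimental, predicted)
```

MUE penalizes offsets in absolute pKi, whereas τ depends only on pairwise ordering, so a model can do well on one measure and poorly on the other.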

Figure 3. Plot of experimental activities versus predicted activities from QuanSA for the full set of 67 future test molecules. Test molecules 5−8 have structures significantly different from those of the training compounds, and the plot points for these compounds are highlighted in orange. Also shown are the top pose families and interactions with the pocket field for four example test molecules with the spirocyclic pyrrolidine scaffold (9−12) whose points on the graph are highlighted in blue and are indicated with red arrows.

The plot in Figure 3 shows the experimental activities compared to the QuanSA predicted activities for the full set of 67 future test molecules. QuanSA yielded statistically significant predictions for the full blind test with a τ of 0.57 (95% confidence interval (CI) 0.42−0.69, p < 0.0001) and an MUE of 0.52 log unit (95% CI 0.43−0.63). Lines indicate perfect prediction and deviations of ±0.7 and ±1.5 units of pKi (corresponding to ±1 and ±2 kcal/mol). Just under 80% of the molecules (53 of 67) were predicted within 0.7 unit of pKi, and just two molecules exceeded 1.5 units.
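The correspondence between pKi units and free energy quoted above follows from the standard relation between binding free energy and the inhibition constant. As a worked check, assuming T = 298.15 K (the temperature is not stated here, so this value is an assumption):

```latex
\begin{align*}
\Delta G &= RT \ln K_i = -RT \ln(10)\,\mathrm{p}K_i \\
|\Delta\Delta G| &= RT \ln(10)\,|\Delta \mathrm{p}K_i| \\
RT \ln(10) &\approx (1.987 \times 10^{-3}\ \text{kcal mol}^{-1}\,\text{K}^{-1})(298.15\ \text{K})(2.303)
            \approx 1.36\ \text{kcal/mol per p}K_i\text{ unit} \\
0.7 \times 1.36 &\approx 0.95\ \text{kcal/mol}, \qquad
1.5 \times 1.36 \approx 2.05\ \text{kcal/mol}
\end{align*}
```

Under this assumption, the ±0.7 and ±1.5 pKi bands round to the ±1 and ±2 kcal/mol values given in the text.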

Figure 3 shows the structures of eight example test molecules. Four test molecules are shown in 2D only (top right, 5−8) with structures significantly different from those of the majority of compounds synthesized late in the project. Compounds 5−7 contain centrally located amines, and compound 8 has a different scaffold. Despite these structural differences, QuanSA accurately predicted the activities of these structurally divergent molecules, whose activity spanned a range of 3.5 log units.

Four other test molecules, each of which has the heavily explored spirocyclic pyrrolidine scaffold (9−12), are also shown in Figure 3, along with their top-scoring pose families and interactions with the pocket field. Many of the molecules in the data set varied only in the substitutions on the spirocyclic pyrrolidine nitrogen, as shown for molecules 9−12. The interaction sticks for these molecules with the pocket field closely mimic the interactions observed in the X-ray cocrystal structure of compound 1 with LFA-1. Most of the interactions were hydrophobic (teal sticks) including those for the dichlorophenyl group itself, which occupies a hydrophobic pocket. The urea carbonyl, thought to be hydrogen bonded via a water molecule, is marked by a prominent red acceptor stick. Compounds 9 and 10 were among the most potent molecules in the test set, and QuanSA accurately predicted the activities despite the negative charge on the R group.

FEP+ Prediction Performance and Hybrid Modeling. The FEP+ approach employs a reference ligand with a known free energy of binding along with a structure of the ligand bound to the protein of interest. From this reference ligand, a set of molecular transformations can be made and arranged into a connected graph such that connected pairs of test molecules have relatively high similarities. For each such connected pair, a calculation of ΔΔG_ij is carried out, corresponding to a single edge in the graph. To obtain a prediction for a particular molecule, a single edge is the minimal calculation required, though calculation of the full set of ΔΔG_ij within a perturbation graph and application of cycle-closure corrections can improve the accuracy. In practice, due to the complexity and computational expense of applying the method, single-edged affinity predictions are often employed.
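The bookkeeping behind single-edge predictions and cycle closure can be illustrated with a short sketch. It is not the FEP+ implementation: the even redistribution of hysteresis below is a deliberately naive stand-in for a real cycle-closure correction, the 298.15 K temperature is an assumption, and all numerical values are hypothetical.

```python
import math

# kcal/mol per pKi unit; the temperature (298.15 K) is an assumption.
RT_LN10 = 1.987e-3 * 298.15 * math.log(10)

def single_edge_pki(ref_pki, ddg_kcal):
    """Predict a test ligand's pKi from one edge, where ddG = dG(test) - dG(ref)."""
    return ref_pki - ddg_kcal / RT_LN10

def cycle_hysteresis(cycle_ddgs):
    """Sum of ddG values around a closed cycle; zero if the edges are consistent."""
    return sum(cycle_ddgs)

def spread_hysteresis(cycle_ddgs):
    """Naive correction: subtract an equal share of the hysteresis from each edge."""
    share = cycle_hysteresis(cycle_ddgs) / len(cycle_ddgs)
    return [d - share for d in cycle_ddgs]

# Hypothetical values, for illustration only.
ref_pki = 8.0
ddg_ref_to_a = -0.68          # kcal/mol; negative means the test ligand binds more tightly
pred_a = single_edge_pki(ref_pki, ddg_ref_to_a)

cycle = [-0.68, 0.40, 0.40]   # edges ref -> A, A -> B, B -> ref
residual = cycle_hysteresis(cycle)
corrected = spread_hysteresis(cycle)
```

A single edge gives no internal consistency check, whereas a closed cycle exposes a residual that should be zero. That residual is the redundancy a cycle-closure correction exploits, at the cost of extra simulations.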

We limited our FEP+ predictions to a subset of 17 of the 67 future test molecules that were suitable for single-edged ΔΔG calculations from a single reference ligand. Figure 4 shows the FEP+ reference ligand (13) and four example test molecules (4 and 14−16) from the 17 molecule test subset. All 17 molecules in the subset used with FEP+ had the spirocyclic pyrrolidine core and differed only by the R group at the pyrrolidine nitrogen. Standard Glide MCSS docking was used to establish initial binding modes for the FEP+ calculations (see the Experimental Section for details).

In order to illustrate ligand movement within the LFA-1 allosteric binding site, the results of ensemble docking are also shown in Figure 4. The ensemble docking pose families shown here are consistent across the different compounds: while the spirocyclic core tends to bind in a relatively fixed orientation, there is the potential for conformational variation for the pyrrolidine nitrogen R groups. This conformational variation is consistent with crystal structures, which also suggested that substituents anchored from the five-membered ring system project into solvent.

Figure 4. FEP+ reference ligand (13) and four test molecules (4 and 14−16) are shown. FEP+ employs an initial docked pose of the reference molecule in the LFA-1 binding pocket. The top pose family of the reference ligand resulting from ensemble docking using Surflex-Dock is shown to illustrate the potential conformational variation of the ligand in the protein pocket.

Figure 5 shows four examples from comparisons of QuanSA and FEP+ activity predictions on the 17 molecule subset. The final optimal pose families from QuanSA for each of the molecules 4, 14, 15, and 16 follow the motif seen for test compounds 9−12 (Figure 3) with the spirocyclic pyrrolidine scaffold in a relatively fixed position and conformational variation for the R group on the pyrrolidine nitrogen. Also, the pocket-field interactions followed the same pattern, mostly hydrophobic with a prominent acceptor interaction near the urea carbonyl. The possible orientation changes in the R groups are reflected in the starting docked poses for FEP+.

QuanSA predicted the activities of compounds 4, 14, and 15 within 0.5 log unit of activity. FEP+ predictions for these active molecules were slightly less accurate, but still quite good. Note that the orientations of the nitrogen substituents of the pyrrolidine differ between QuanSA and FEP+. This was expected, reflecting the pose variation seen in Figure 4 from ensemble docking of the reference ligand. The QuanSA alignments were driven by mutual similarity, influenced by the crystallographic reference ligand pose toward the “bottom” of the ligands, which shared structural homogeneity. The diversity of substituent orientations seen in the FEP+ poses reflected solvent exposure with sparse protein interactions.

Combining the two methods by averaging their independent predictions (termed “hybrid” model predictions) often led to partial cancellation of errors. For example, for the relatively active molecule 14 and the significantly less active molecule 16, predictions from both methods were off, but the errors were opposite in sign. By combining the results from the two methods, the prediction errors for both molecules were reduced to negligible discrepancies from experimental activity. Note that typical standard deviations in repeated LFA-1 IC50 determinations were approximately 0.1 pKi unit [23,24].
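The error cancellation described above is easy to see in a small numerical sketch. The hybrid prediction is the simple average stated in the text; the pKi values below are hypothetical and chosen only so that the two methods err in opposite directions.

```python
def hybrid_prediction(quansa_pki, fep_pki):
    """Average the two independent predictions for each molecule."""
    return [(q + f) / 2 for q, f in zip(quansa_pki, fep_pki)]

def signed_errors(predicted, experimental):
    """Predicted minus experimental pKi, keeping the sign."""
    return [p - e for p, e in zip(predicted, experimental)]

def mean_unsigned_error(predicted, experimental):
    errs = signed_errors(predicted, experimental)
    return sum(abs(e) for e in errs) / len(errs)

# Hypothetical pKi values for two molecules (illustration only, not study data).
experimental = [8.0, 6.0]
quansa = [7.6, 6.5]   # signed errors -0.4 and +0.5
fep = [8.5, 5.4]      # signed errors +0.5 and -0.6
hybrid = hybrid_prediction(quansa, fep)

mue_quansa = mean_unsigned_error(quansa, experimental)
mue_fep = mean_unsigned_error(fep, experimental)
mue_hybrid = mean_unsigned_error(hybrid, experimental)
```

Because the hybrid error for a molecule is the mean of the two signed errors, its magnitude can never exceed the mean of the two unsigned errors, and it shrinks whenever the signs differ.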

Figure 6 shows a plot of individual test performance on a subset of 17 ligands for the QuanSA structure-guided model (purple times signs, MUE = 0.44), and for FEP+ (green plus signs, MUE = 0.56) as well as for the combination of the methods (red squares, MUE = 0.25). Hybrid predictions were defined as the average of the QuanSA and FEP+ predictions for each molecule. Using a paired t test, the relatively small difference in prediction errors between QuanSA and FEP+ was not statistically significant (p-value = 0.24). However, the hybrid model performed statistically better than FEP+ alone (p-value = 0.002) and better than structure-guided QuanSA alone (the paired t test p-value of 0.09 just misses weak significance). The signed prediction errors of QuanSA and FEP+ were only slightly correlated (p = 0.04 by Kendall’s τ), allowing the hybrid model to exhibit marked improvement.
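A paired t test on per-molecule errors, as used in this comparison, reduces to a one-sample test on the differences. The sketch below computes only the t statistic and degrees of freedom with the standard library; converting these to a p-value requires the t distribution (for example via `scipy.stats.ttest_rel`, if SciPy is available). The error values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for paired per-molecule errors.

    Assumes the paired differences are not all identical, since the
    sample standard deviation would then be zero.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical unsigned errors (pKi units) for five molecules, illustration only.
errors_fep = [0.6, 0.5, 0.7, 0.4, 0.8]
errors_hybrid = [0.2, 0.3, 0.3, 0.2, 0.4]
t_stat, dof = paired_t_statistic(errors_fep, errors_hybrid)
```

Pairing by molecule removes the between-molecule variation in difficulty, which is why a modest improvement in MUE can still reach significance on only 17 compounds.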

Early-Stage Project Application Scenario: Sixteen FEP+ Benchmark Targets. Early-stage project application may offer only a handful of data points within a relatively newly identified chemical series. The original FEP+ benchmark, here referred to as the Abel benchmark, consisted of eight targets, each with a prediction set ranging from 11 to 42 members (each including a reference compound within the prediction set). More recent benchmarking work, here referred to as the Schindler benchmark, consisted of eight targets, each with a prediction set ranging from 24 to 44 members (each including a reference compound within the prediction set).

Structure−activity data within some series were extremely limited, but contemporaneously available structure−activity data and protein structure data were plentiful in other cases. Figure 7 shows how a focused approach to model induction can be applied in cases where sparse data exist within a particular series, but where data from other series can be exploited. We made use of the reference ligand in each case to identify particularly relevant protein−ligand complexes and structure−activity data, where the information was available contemporaneously with the public disclosure of the molecules within the prediction set.

Figure 5. Four examples from comparisons of activity predictions on a 17 molecule subset of the blind test set. For QuanSA, the top pose family for each test molecule plus the interaction sticks of the top pose with the pocket field is shown. For FEP+, the initial docked poses are shown. Hybrid predicted pKi values are the simple average of the QuanSA and FEP+ values.

The eSim 3D molecular similarity method (described in detail previously) was employed to identify particularly relevant protein structures: those whose cognate ligands exhibited high similarities to the FEP+ reference ligand when the protein binding sites were aligned. Then, those bound ligand poses were used to screen for the most structurally similar ligands from the available bioactivity data. Both of the filtering steps were applied to data that were publicly available when each FEP+ benchmark series was made public, through either publication or a patent. Compounds from future years were reserved for testing screening-style extrapolation from the structurally focused data sets.

Focused QuanSA Model Building. Figure 8 illustrates the focused model building process, using SHP2 as an example. An allosteric mechanism for inhibiting SHP2 was published in 2016, with a chemical series related to the initial lead structures being disclosed in a subsequent patent that was granted in 2018 (U.S. Patent 10,093,646), which contained the structure−activity data used for the FEP+ prediction set. There were several cocrystallized allosteric inhibitors available by 2018 (top middle of Figure 8), with some extending quite far beyond the spatial extent of the series of interest. By employing a static eSim similarity measurement between each of the crystallographically aligned ligands and the reference ligand, a filtered subset of relevant bound variants was identified (top right).

Similarly, by 2018, a large number of alternative allosteric inhibitors had been discovered, again with many extending far beyond the reference ligand. In practice, with a physically grounded affinity prediction method such as QuanSA, such a large set of competitive inhibitors dilutes the predictive performance of models within the space that closely encompasses a particular series or set of related series that explore the same area. The full set of known ligands was screened against the multiple-ligand crystallographically derived alignment of relevant bound ligands using the eSim method, and those ligands whose scores exceeded a threshold were retained (bottom middle of Figure 8). Finally, the standard process for QuanSA model induction was employed, making use of the relevant bound ligand poses to help constrain generation of initial poses for all ligands. This step may also filter the training molecule pool further on the basis of multiple stages of accumulating ligands that are at first similar to the bound ligands, then those which are similar to the newly aligned ligands, and so forth (see the Experimental Section for additional details). In the case of SHP2, the full pool of known ligands from 2018 and earlier numbered 514, with the eSim-based filtering process against the crystallographic ligands resulting in 51 molecules. The QuanSA alignment initialization’s accumulative process retained 15 of 51 from the filtered training pool (bottom right of Figure 8).
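The two-stage narrowing of the training pool (threshold filtering against bound ligands, then staged accumulation) can be sketched as follows. This is only a schematic under stated assumptions: eSim is a 3D similarity method and is not reproduced here, so a toy Jaccard similarity on hypothetical feature sets stands in for it, and the ligand names, features, and thresholds are invented for illustration.

```python
def jaccard(a, b):
    """Toy similarity on feature sets; a stand-in for eSim, which is a 3D method."""
    return len(a & b) / len(a | b)

def filter_by_similarity(pool, references, threshold, sim=jaccard):
    """Keep pool ligands whose best similarity to any reference meets the threshold."""
    return {name: feats for name, feats in pool.items()
            if max(sim(feats, ref) for ref in references) >= threshold}

def accumulate(pool, seeds, threshold, sim=jaccard):
    """Grow the retained set in rounds: first ligands similar to the seeds,
    then ligands similar to those newly added, until nothing new qualifies."""
    kept = {}
    remaining = dict(pool)
    frontier = list(seeds)
    while frontier:
        added = {n: f for n, f in remaining.items()
                 if any(sim(f, ref) >= threshold for ref in frontier)}
        for n in added:
            del remaining[n]
        kept.update(added)
        frontier = list(added.values())
    return kept

# Hypothetical bound ligand and bioactivity pool, for illustration only.
bound = [{"a", "b", "c", "d"}]
pool = {
    "L1": {"a", "b", "c"},
    "L2": {"b", "c", "e"},
    "L3": {"x", "y", "z"},
    "L4": {"d", "f", "g", "h"},
}
focused = filter_by_similarity(pool, bound, threshold=0.1)
training = accumulate(focused, bound, threshold=0.5)
```

In this toy case the first stage drops the unrelated ligand L3, and the accumulation stage admits L2 only through its similarity to the newly added L1, while L4 is never reached. This mirrors the progressive narrowing reported for SHP2 (514 to 51 to 15), although the actual similarity scores and thresholds are not given here.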

Each of the 16 targets underwent the same procedure for focused model building, as just described. Figure 9 illustrates predictions for SHP2 on four representative ligands (bottom row), along with representative training ligands (top row). Prediction values are shown for FEP+ and QuanSA, and the “hybrid” prediction for each ligand is simply the average of those two values. The prediction set is characterized largely by the disposition of the amine (right side of the central core), whether it is primary or secondary, and the characteristics of its connection to the central scaffold, though some variation of the left-hand substituent was also explored. For SHP2, the mean unsigned error for both FEP+ and QuanSA was 0.6 log unit, and the hybrid approach yielded 0.4. The sparse data for model training were able to cover the variations present in the prediction set, and the errors of the two primary approaches partially canceled, allowing for the improvement seen in the hybrid approach.

Figure 6. Comparisons of activity predictions on a 17 molecule subset of the blind test set for QuanSA, FEP+, and hybrid methods.

Figure 7. Preparation and scoring procedures in the early lead optimization scenario, using a bound reference compound to identify relevant additional bound ligands, which are then used collectively to identify a pool of relevant bioactivity data for input to QuanSA model induction.

Figure 10 shows the analogous information for c-MET,

where 59 ligands of diverse structural character formed the

ﬁnal focused set for model parameterization. In contrast to

SHP2, the available training set consisted of molecules outside

the series of interest, and four diﬀerent heterocyclic cores are

present in the training examples shown in Figure 10. The

QuanSA approach was able to learn the eﬀects of the various

substitutions from alternative scaﬀolds and to transfer the

Figure 8. Process of constructing a focused QuanSA model from diverse data for SHP2.

Figure 9. Representative examples of predictions from the FEP+, QuanSA, and hybrid approaches for SHP2. Note that many of the SHP2 training

compounds came from the same patent and series as the prediction set.

information to the particular series represented in the

prediction set.

The crucial distinction between QuanSA and other machine-learning methods for affinity prediction is that QuanSA

constructs a model that is physically analogous to a protein

binding site. Therefore, in order to accurately predict, for example, the quantitative effect of the morpholine of the left-most prediction case, other examples of ligands that place

cationic species in the same vicinity in their bound states must

be properly modeled in the learning process. The blue

interaction stick (red arrow) shows the preference that the

pocket ﬁeld has for an amine that is geometrically disposed as

in the optimal pose of this prediction example. It is diﬃcult to

understand the eﬀect on binding from a protein structural

perspective. The amine appears to be within the solvent,

relatively far from an obvious interaction partner. This perhaps

explains why the structure-focused FEP+ approach made an underprediction. In this case, the hybrid prediction was quite accurate (just 0.2 pKi unit low). The right-most prediction example shows a case where the hybrid approach did not perform the best of all three, but it significantly improved upon the poorer of the two primary predictions.

It is conceivable, given a suﬃciently large quantity of data,

that a learning method which ignores the conformational strain

and pose of ligands in their bound state could make

meaningfully accurate predictions in cases like this. However,

for the type of ﬁne-grained guidance represented by these

examples, many early-stage lead optimization projects lack such

quantities of data. For the most challenging targets, where

relevant structure−activity data are the most scarce, methods

that can make eﬀective use of data sets measured in dozens of

compounds rather than thousands have a clear advantage.

Statistical Analysis for Focused Model Building.

Figure 11 shows plots for the full set of predictions for both

benchmark sets along with the cumulative histograms of

unsigned prediction errors, exhibiting the same type of error

Figure 10. Representative examples of predictions from the FEP+, QuanSA, and hybrid approaches for c-MET.

cancellation seen for LFA-1. The hybrid approach produced a

clear reduction in the fraction of predictions with large errors.

For the Schindler benchmark, the hybrid predictions had fewer than 20% with errors of 1.0 log unit or greater, compared with just under 30% for QuanSA and just under 40% for FEP+. For the Abel benchmark, which consisted of smaller “jumps” than seen for the Schindler benchmark targets, the hybrid approach produced roughly 10% of predictions with errors of 1.0 log unit or greater, with FEP+ yielding just over 30% and QuanSA just over 20%. For the Schindler benchmark, the unsigned error for the hybrid predictions was very significantly better than that of either of the other two methods (p values of 10⁻¹⁰ and 10⁻⁶ compared with FEP+ and QuanSA, respectively, using the paired t test). For the Abel benchmark, the unsigned error for the hybrid predictions was very significantly better than that for FEP+ (p value of 10⁻⁹). Between the hybrid and QuanSA approaches, the hybrid method’s reduction in large errors would make it the preferred of the two, despite the error distributions not being well-differentiated using the paired t test.
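
The two summary quantities used in this comparison, the fraction of large errors and the paired t statistic on per-ligand unsigned errors, can be sketched with the standard library alone. This is an illustrative sketch rather than the analysis code used for the paper; the function names are hypothetical, and converting the t statistic to a p value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def fraction_large_errors(errors, threshold=1.0):
    """Fraction of predictions whose absolute error is at least `threshold`
    (1.0 log unit by default, matching the cutoff quoted in the text)."""
    return sum(1 for e in errors if abs(e) >= threshold) / len(errors)


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic for two equal-length lists of per-ligand unsigned
    errors: mean of the differences divided by its standard error."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

A positive statistic from `paired_t_statistic(primary_errors, hybrid_errors)` indicates that the first method's unsigned errors are larger on average for the same ligands.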

Prediction errors for the FEP+ approach in this analysis were larger than those reported for the original analysis for the Abel benchmark. Here, the reference ligand was treated as a training exemplar, with known absolute ΔG, and the ligands in

Figure 11. Plots of all predictions for each of the three methods for both FEP+ benchmarks along with cumulative histograms of unsigned

prediction error. Lines indicate perfect performance (solid black), 1 kcal/mol error (dashed dark gray), and 2 kcal/mol (dashed light gray).

the prediction set were treated as having unknown activity. Given the reported ΔΔG values, final ΔG values for the prediction set were obtained using the reference ligand’s value as an offset. In the original work, all experimental ΔG values were used to “center” the predicted values. The more recent analysis for the Schindler benchmark noted this issue, and statistics were not calculated for deviation from ΔG. Rather, emphasis was placed on correlation statistics and upon pairwise ΔΔG error magnitudes (the accuracy of single-edged predictions). In practice, in a real prediction scenario, the average ΔG of a prediction set cannot be known. Therefore, our analysis treats the FEP+, QuanSA, and hybrid approaches in the same manner, with the reference ligand as part of the “knowns” and the prediction set as “unknowns”.
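
The offset calculation can be sketched as below. The sketch assumes the usual relation ΔG = RT ln(Ki) with a 1 M standard state and a temperature of 298.15 K; neither the temperature nor the function names come from the paper, so they should be read as illustrative assumptions. Under these assumptions, 1 pKi unit corresponds to roughly 1.36 kcal/mol.

```python
from math import log

R_KCAL = 0.0019872041   # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15  # assumed temperature; not stated in the text


def pki_to_dg(pki, temperature=TEMPERATURE_K):
    """Convert pKi to a binding free energy in kcal/mol (more negative = tighter)."""
    return -R_KCAL * temperature * log(10.0) * pki


def dg_to_pki(dg, temperature=TEMPERATURE_K):
    """Convert a binding free energy in kcal/mol back to pKi."""
    return -dg / (R_KCAL * temperature * log(10.0))


def absolute_from_relative(ddg_values, reference_dg):
    """Turn ΔΔG values (relative to the reference ligand) into absolute ΔG
    predictions by adding the reference ligand's known ΔG as the offset."""
    return [reference_dg + ddg for ddg in ddg_values]
```

Only the reference ligand's experimental value enters the offset, so no information about the prediction set is used to position the predictions on the absolute scale.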

Table 1 shows per-target data set sizes and statistical values for the three methods, using both rank correlation (Kendall’s τ, which is not affected by the offset calculation) and MUE. In all cases, the hybrid approach had the lowest MUE. In five of eight cases, it also had the highest rank correlation, with FEP+ and QuanSA showing very slightly higher values in the remaining three cases (two for FEP+ and one for QuanSA). In no case did the hybrid approach fail to produce a statistically significant ranking, compared with one failure each for FEP+ and QuanSA (italics). Table 2 shows the analogous data for the Abel benchmark. Note that, in three cases (MCL1, BACE, and P38), a random half of the original prediction set was used for training (marked with asterisks in Table 2; see the Experimental Section for details). The pattern was similar to that observed for the Schindler set, with the hybrid method

Table 1. Per-Target Performance of FEP+, QuanSA, and Hybrid Approaches and Data Set Sizes for the Eight Targets of the Schindler Benchmark

| target | N pred | N full pool | N filtered | N final | MUE FEP+ | MUE QuanSA | MUE hybrid | τ FEP+ | τ QuanSA | τ hybrid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SHP2 | 25 | 514 | 51 | 15 | 0.58 (0.41−0.76) | 0.61 (0.41−0.84) | 0.40 (0.27−0.53) | 0.69 (0.46−0.87) | 0.43 (0.01−0.75) | 0.72 (0.47−0.90) |
| PFKFB3 | 39 | 489 | 34 | 34 | 1.08 (0.86−1.30) | 0.72 (0.57−0.88) | 0.45 (0.33−0.58) | 0.70 (0.56−0.82) | 0.50 (0.31−0.66) | 0.73 (0.62−0.84) |
| SYK | 43 | 1827 | 18 | 18 | 0.63 (0.48−0.81) | 0.62 (0.48−0.78) | 0.49 (0.37−0.62) | 0.34 (0.07−0.59) | 0.13 (−0.12−0.37) | 0.35 (0.09−0.60) |
| HIF2a | 41 | 63 | 30 | 29 | 0.70 (0.53−0.88) | 0.82 (0.63−1.02) | 0.58 (0.44−0.74) | 0.54 (0.27−0.77) | 0.42 (0.18−0.64) | 0.51 (0.25−0.72) |
| TNKS2 | 27 | 541 | 150 | 143 | 0.86 (0.59−1.15) | 0.74 (0.60−0.89) | 0.64 (0.45−0.83) | 0.34 (−0.04−0.65) | 0.55 (0.26−0.76) | 0.49 (0.16−0.75) |
| c-MET | 23 | 176 | 62 | 59 | 1.07 (0.80−1.34) | 0.82 (0.66−0.99) | 0.71 (0.54−0.90) | 0.82 (0.66−0.93) | 0.68 (0.51−0.83) | 0.85 (0.73−0.95) |
| CDK8 | 32 | 130 | 60 | 60 | 0.96 (0.68−1.25) | 0.99 (0.71−1.30) | 0.90 (0.68−1.16) | 0.66 (0.42−0.87) | 0.45 (0.19−0.68) | 0.66 (0.44−0.85) |
| EG5 | 27 | 147 | 34 | 34 | 1.08 (0.90−1.26) | 1.09 (0.84−1.32) | 0.96 (0.78−1.15) | 0.73 (0.53−0.89) | 0.47 (0.09−0.77) | 0.67 (0.40−0.89) |
| mean ± SD | 32.1 | 485.9 | 54.9 | 49.0 | 0.87 ± 0.21 | 0.80 ± 0.17 | 0.64 ± 0.21 | 0.60 ± 0.18 | 0.45 ± 0.16 | 0.62 ± 0.16 |

MUE is the mean unsigned error in units of pKi, and Kendall’s τ values are unitless. Numbers in parentheses are 95% confidence intervals calculated by resampling with replacement, bolded values are the best from any method, and values shown in italics did not meet statistical significance at the p = 0.01 level. The values in the bottom row are the mean and standard deviation for the respective statistical measurement column.

Table 2. Per-Target Performance of FEP+, QuanSA, and Hybrid Approaches and Data Set Sizes for the Eight Targets of the Abel Benchmark

| target | N pred | N full pool | N filtered | N final | MUE FEP+ | MUE QuanSA | MUE hybrid | τ FEP+ | τ QuanSA | τ hybrid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| thrombin | 10 | 2401 | 74 | 74 | 0.55 (0.34−0.81) | 0.42 (0.23−0.65) | 0.28 (0.18−0.41) | 0.63 (−0.16−1.00) | 0.63 (−0.27−1.00) | 0.85 (0.45−1.00) |
| MCL1* | 20 | 170 | 35 | 34 | 0.78 (0.53−1.05) | 0.30 (0.15−0.49) | 0.41 (0.25−0.57) | 0.58 (0.13−0.89) | 0.73 (0.35−1.00) | 0.70 (0.36−0.94) |
| BACE* | 17 | 1705 | 93 | 93 | 0.98 (0.75−1.21) | 0.30 (0.20−0.41) | 0.46 (0.34−0.60) | 0.77 (0.54−0.94) | 0.62 (0.22−0.90) | 0.81 (0.54−0.98) |
| P38* | 16 | 1901 | 92 | 84 | 0.66 (0.38−0.95) | 0.62 (0.39−0.87) | 0.49 (0.34−0.65) | 0.58 (0.22−0.85) | 0.13 (−0.35−0.62) | 0.64 (0.31−0.89) |
| PTP1b | 22 | 528 | 53 | 41 | 0.78 (0.59−0.97) | 0.96 (0.62−1.31) | 0.54 (0.37−0.73) | 0.81 (0.53−0.99) | 0.27 (−0.13−0.60) | 0.65 (0.37−0.88) |
| CDK2 | 15 | 86 | 43 | 43 | 0.69 (0.49−0.89) | 0.84 (0.52−1.17) | 0.61 (0.39−0.83) | 0.29 (−0.23−0.76) | 0.67 (0.33−0.92) | 0.71 (0.32−0.97) |
| TYK2 | 15 | 124 | 48 | 48 | 0.62 (0.45−0.84) | 0.80 (0.49−1.14) | 0.69 (0.50−0.89) | 0.71 (0.34−0.98) | 0.53 (0.11−0.88) | 0.78 (0.45−1.00) |
| JNK1 | 20 | 155 | 68 | 55 | 1.38 (1.04−1.71) | 0.46 (0.27−0.70) | 0.74 (0.57−0.90) | 0.89 (0.69−1.00) | 0.64 (0.36−0.85) | 0.88 (0.65−1.00) |
| mean ± SD | 16.9 | 883.8 | 63.3 | 59.0 | 0.81 ± 0.21 | 0.59 ± 0.17 | 0.53 ± 0.21 | 0.66 ± 0.18 | 0.53 ± 0.16 | 0.75 ± 0.16 |

MUE is the mean unsigned error in units of pKi, and Kendall’s τ values are unitless. Numbers in parentheses are 95% confidence intervals calculated by resampling with replacement, bolded values are the best from any method, and values shown in italics did not meet statistical significance at the p = 0.01 level. The values in the bottom row are the mean and standard deviation for the respective statistical measurement column.

producing the best performance, either by MUE or by rank

correlation, though the advantage of the hybrid approach over

the QuanSA method was smaller.

The average Kendall’s τ values over all 16 targets for the three methods were as follows: 0.63 (FEP+), 0.49 (QuanSA), and 0.69 (hybrid). None of the per-target rank-correlation differences between methods were statistically significant at p = 0.01 by the paired t test due to the relatively small number of targets. The statistical power is also limited by the fact that each individual data set is relatively small, and several are dominated by a narrow experimental assay range, so the correlation statistics tend to have high variance. The values of the average per-target unsigned error for the three methods were 0.84 (FEP+), 0.69 (QuanSA), and 0.58 (hybrid). By the paired t test, the per-target hybrid MUE was consistently lower than those for FEP+ (p < 10⁻³) and QuanSA (p = 0.02). This agreed with the analysis of the unsigned prediction error across the ligands within the Schindler benchmark (N = 257) and the Abel benchmark (N = 135), which offer more statistical power to differentiate between the methods (see Figure 11).
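
For readers who wish to reproduce rank statistics of this kind, the following sketch computes Kendall’s τ and a resampling-based confidence interval. It is an assumption-laden illustration: the text states only that intervals were calculated by resampling with replacement, so the tie-corrected τ-b variant, the percentile interval, the number of resamples, and the random seed used here are choices made for the example and are not taken from the paper.

```python
import random


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected) by direct pair counting, O(n^2)."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied in both: contributes to neither term
            elif dx == 0:
                ties_x += 1       # tied only in x
            elif dy == 0:
                ties_y += 1       # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = ((untied + ties_x) * (untied + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else 0.0


def bootstrap_ci(x, y, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for tau from resampling ligand pairs
    (predicted, experimental) with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(kendall_tau_b([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high
```

With prediction sets of 10 to 43 ligands, intervals computed this way are wide, which is consistent with the high variance of the correlation statistics noted above.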

With respect to the sizes of the bioactivity data sets, we see

that the typical size of the nominally available bioactivity data

was in the hundreds of molecules. However, only roughly one-tenth of these survived the filter of relevance against the bound

Figure 12. Plots for all eight targets of the Schindler FEP+ benchmark. FEP+ shown in green plus signs, QuanSA shown in violet times signs, and hybrid predictions shown in red squares, with a single gray circle marking the activity of the FEP+ reference ligand (treated as part of the training set). In addition, a histogram of signed prediction errors is shown in the lower right.

known ligand poses, which themselves were ﬁltered for

relevance. The ﬁnal focused set of structure−activity data for

each target ranged from 15 (SHP2) to 143 (TNKS2),

averaging about 50 for the Schindler benchmark and about

60 for the Abel benchmark. Data requirements of this scale for

the ﬁnal models are generally within the scope of lead

optimization projects quite early on in the exploration of a new

chemical series.

Figures 12 and 13 show individual plots for all predictions

by each method for each target along with histograms of the

signed prediction error values. The histograms showed a

marked decrease in errors of large magnitude (either over- or

underpredictions) by the hybrid method (shown in red). None

of the methods exhibited a systematic bias, with all histograms

being centered very close to zero.

During ﬁne-grained lead optimization, while the rank order

of synthetic candidates is clearly important, the absolute

accuracy of aﬃnity predictions takes on additional importance.

For example, in the case of PFKFB3, the reference ligand had a pKi of roughly 6.5. Consider the predictions for the 39 molecules of the test set as nominal true positives (TPs, predicted and experimental ≥ reference), true negatives (TNs, predicted and experimental ≤ reference), false positives (FPs, predicted ≥ reference, experimental ≤ reference), and false

Figure 13. Plots for all eight targets of the Abel FEP+ benchmark. FEP+ shown in green plus signs, QuanSA shown in violet times signs, and hybrid predictions shown in red squares, with a single gray circle marking the activity of the FEP+ reference ligand (treated as part of the training set). In addition, a histogram of signed prediction errors is shown in the lower right.

negatives (FNs, predicted ≤ reference, experimental ≥ reference). Both FEP+ and QuanSA produced good rankings (0.70 and 0.50 by Kendall’s τ, respectively, both with p < 0.01). The experimental data contained 20 positives and 19 negatives. FEP+ correctly identified 20 of 20 TPs, but at the expense of 10 of 19 FPs. QuanSA correctly identified 7 of 20 TPs, but it did so with 0 of 19 FPs. The hybrid approach obtained 20 of 20 TPs with just 4 of 19 FPs. The hybrid method also produced the best Kendall’s τ (0.73, a marginal increase).
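
The classification relative to the reference ligand can be sketched as a simple counting routine. The definitions in the text overlap at exact equality with the reference value, so this sketch makes the assumption that a value equal to the reference counts as positive; the function name is hypothetical.

```python
def classify_against_reference(predicted, experimental, reference):
    """Count TP/FP/TN/FN, treating pKi >= reference as 'positive'.

    Assumption: ties with the reference value are counted as positive,
    since the text does not specify how equality is handled.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for pred, expt in zip(predicted, experimental):
        predicted_positive = pred >= reference
        actual_positive = expt >= reference
        if predicted_positive and actual_positive:
            counts["TP"] += 1
        elif predicted_positive:
            counts["FP"] += 1
        elif actual_positive:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts
```

Counts of this kind depend on absolute accuracy near the reference value, which is why two methods with similar rank correlation can differ sharply in their false positive counts.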

A different effect was seen with the case of SYK. None of the methods yielded a high-quality ranking, but this was an effect of the data distribution. Both primary methods produced relatively accurate results on an absolute scale (MUE of 0.6 each), with a relatively small fraction of predictions being off by more than 2 kcal/mol (9 for FEP+ and 4 for QuanSA of 43 total predictions). The hybrid approach made an improvement (MUE of 0.5), with 36 of 43 molecules predicted within 1 kcal/mol of experimental and just 1 of 43 with an error of greater than 2 kcal/mol. Nevertheless, the rank correlation was marginal, reflecting the limitations of rank-based statistics in such cases.

Predicted inhibitory activity relative to the current project

context may have a signiﬁcant inﬂuence on decisions about

which candidate molecules to synthesize. In particular, a

smaller proportion of large absolute errors will provide better

guidance if rank correlation is equivalent between two

prediction methods and, possibly, even if rank correlation is

slightly worse for the method with better absolute ﬁdelity.

Overall, the hybrid approach appears to be the best choice in

terms of achieving accurate absolute binding aﬃnities or

rankings thereof. Across all 16 targets, with respect to MUE, it

was the best approach in 12 of 16 cases and second best by a

small margin in 4 of 16 cases. With respect to ranking, it was

either the best approach (10 of 16) or second best by a small

margin (6 of 16). Beyond the per-target performance, as seen

from the cumulative histograms of unsigned errors in Figure 11

and the histograms of signed errors in Figures 12 and 13, the

hybrid approach made a marked improvement in terms of the

frequency of large errors, both for overpredictions and for

underpredictions. Nearly 70% of the time, hybrid predictions

were within 1 kcal/mol of experiment, and errors of 2 kcal/mol

or greater occurred 5% of the time or less.

Extrapolation with QuanSA: Identiﬁcation of Novel

Scaﬀolds and Linkers. Because the QuanSA method can be

applied automatically and rapidly, QuanSA pocket ﬁelds can be

used to screen large numbers of candidate molecules. We

explored the ability of the induced models to identify novel

active molecules from ChEMBL data, where the publication

dates of the reports of the new molecules were strictly later (by

year) than the data on which models were constructed. This

approach to data segregation makes it very unlikely that

information about the “new” molecules would have been

known and used in designing the molecules used to construct

the models. The converse, of course, is desirable: to see how

well a model can identify novel actives whose structures may

be, in part, reﬂected in the structures known at the time of

model construction.

We assessed the screening utility of the models for

identifying novel molecules by establishing thresholds on

minimum predicted activity (6.0 pKi units) and on the raw

nearest-neighbor similarity (0.60 eSim unit) of a screened

molecule to a training molecule, both in their predicted

optimal poses. In the project application scenarios previously

discussed, the eSim nearest-neighbor similarity was very high

(averages of 0.87, 0.89, and 0.79 for the LFA-1, Abel, and

Schindler sets, respectively), with only a single LFA-1

prediction molecule having an eSim score of less than 0.60,

none within the Abel set, and fewer than 5% within the

Schindler set. Note that, especially for structurally divergent

molecules, it is not expected that the activity predictions will be

as accurate as for ligands within the focus of the models.

Rather, a critical feature of the selection criteria is that they

identify a small fraction of false positives, as the space of

candidates to be explored may be large. In order to establish

speciﬁcity, we also screened a decoy set of 1000 drug/leadlike

ZINC molecules, with the entire set presumptively deﬁned as

false positives. For the 17 targets, in 12 cases, 1 or fewer of the 1000 decoys met the thresholds; in three cases, two to four false positives existed; and in two cases (TNKS2 and CDK8), the estimated FP rate was 1−3%.
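
The two screening thresholds and the decoy-based false positive estimate can be expressed compactly. The threshold values (6.0 pKi units and 0.60 eSim unit) come from the text; the function names and the representation of each molecule as a (predicted pKi, nearest-neighbor similarity) pair are assumptions made for this sketch.

```python
def passes_screen(predicted_pki, nearest_neighbor_esim,
                  min_pki=6.0, min_esim=0.60):
    """A molecule is selected only if it meets both thresholds."""
    return predicted_pki >= min_pki and nearest_neighbor_esim >= min_esim


def estimated_fp_rate(decoy_results):
    """Fraction of decoys that pass the screen.

    decoy_results: list of (predicted_pki, nearest_neighbor_esim) pairs,
    with every decoy presumed inactive.
    """
    hits = sum(1 for pki, esim in decoy_results if passes_screen(pki, esim))
    return hits / len(decoy_results)
```

For a 1000-molecule decoy set, one passing decoy corresponds to an estimated rate of 0.1%, and the 1−3% rates reported for TNKS2 and CDK8 correspond to roughly 10 to 30 passing decoys.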

For the LFA-1 case, just 44 ChEMBL molecules existed to

be screened as temporally prospective candidates, but none of

the molecules passed both criteria. Of the 16 FEP+ benchmark targets, “future” data existed in ChEMBL for all but PFKFB3

to assess extrapolation utility. For these 15 targets, in all cases

except PTP1b, new active ChEMBL molecules were identiﬁed,

ranging from a handful (e.g., CDK8, HIF2a, SHP2, JNK1, and

TYK2) to dozens or hundreds (c-MET, SYK, TNKS2, BACE,

MCL-1, and P38).

Figure 14 shows examples from the Schindler target set, one

each for SHP2, c-MET, and SYK. In each case, the automatic

prediction of bound pose is important in establishing the

relationship between the novel compound and those forming

the training set. A notable example was observed for c-MET,

where the new compound was predicted to have greater

activity than any of the training molecules, and it was highly active. This new molecule makes use of the triazolopyrazine at right, but it contains a novel linker.

Figure 15 shows examples from the Abel benchmark target

set, one each for BACE, MCL-1, and P38. These follow the

same pattern: predictions of target-speciﬁc activity that depend

upon identifying low-energy conformations of complex small

molecules that align with the predicted binding modes of

modeled ligands. Of particular note was MCL1. Here, a macrocyclic linkage for a highly active inhibitor was identified.

Computational Time Complexity. The QuanSA and FEP+ methods have quite different time requirements for calculations. FEP+ has been optimized for GPU-based

acceleration, and calculations for this study were performed

by using computing nodes running Intel Xeon E7-8867 v3

CPUs (2.5 GHz, 16 core), with an NVIDIA Tesla K40c GPU

dedicated to each node. The most time-consuming preparatory

calculation is the estimation of custom force ﬁeld parameters

for unparameterized torsions. In this case, the calculation

required approximately 1 h per molecule for each of the 17

molecules studied. Following force ﬁeld parameterization, each

single-edged ΔΔG calculation (i.e., a prediction for a single

molecule) required just over 2 h. Because each new molecule

may require force ﬁeld parameterization, the expectation is

roughly 3 h of wall-clock time per molecule. This allows for

synthetic prioritization during late-stage lead optimization

given an appropriate computing infrastructure, where

predictions on diﬀerent molecules can be processed in parallel.

For QuanSA, the process of model induction and selection is

the preparatory calculation, but it is done a single time prior to

applying a model to many new ligands. The process consists of

conformational search, mutual ligand alignment, model

parameter learning, and model selection. The QuanSA method

makes use of multicore laptops and workstations for all

calculations, with the reference system containing dual Intel

Xeon Gold 6154 CPUs (3.00 GHz, 36 cores total) with no

GPU-based acceleration. For LFA-1, a fairly typical case, the

entire model induction and selection process required

approximately 3 h of real wall-clock time, the majority of

which was spent in the model parameter learning step for each

of several alternative alignment hypotheses. Calculations of

predicted binding poses and scores for new molecules required

an average of less than 10 s per molecule (including both

conformational search and ﬁtting into the QuanSA pocket

ﬁeld). Screens of very large numbers of possible design ideas or

virtual synthetic libraries are possible with the QuanSA

method, either employed directly or after using a similarity-based method as a prescreen.
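
The practical consequence of these timings can be made concrete with a back-of-the-envelope estimate. The per-molecule figures (roughly 3 h for FEP+, about 3 h of one-time QuanSA model induction, and under 10 s per QuanSA prediction) are taken from the text; the assumptions that FEP+ jobs are spread evenly across identical nodes and that QuanSA predictions run serially are simplifications introduced for this sketch.

```python
from math import ceil


def fep_wall_clock_hours(n_molecules, n_nodes=1, hours_per_molecule=3.0):
    """Wall-clock estimate when molecules are processed in parallel batches,
    one molecule per node at a time (simplifying assumption)."""
    return ceil(n_molecules / n_nodes) * hours_per_molecule


def quansa_wall_clock_hours(n_molecules, induction_hours=3.0,
                            seconds_per_molecule=10.0):
    """One-time model induction plus serial scoring of new molecules."""
    return induction_hours + n_molecules * seconds_per_molecule / 3600.0
```

Under these assumptions, 17 molecules on a single FEP+ node take about 51 h, whereas scoring 10,000 molecules with an existing QuanSA model takes under 28 h of serial prediction time beyond the induction step.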

■CONCLUSIONS

Using temporally segregated series of molecules from a lead

optimization project and from two public aﬃnity-prediction

benchmarks, we have explored the performance of orthogonal

aﬃnity prediction approaches. The QuanSA approach

constructs binding-site models based on ligand structure and

aﬃnity data, using multiple-instance machine learning. The

models are physically sensible in that they model the protein−

ligand interaction in a manner analogous to the physical

process of binding. Given a QuanSA pocket ﬁeld, the process

of scoring a new molecule is analogous to docking to a protein

structure: conformational search, alignment to the pocket ﬁeld,

and optimization of ﬁnal poses with respect to conformational

strain and interaction score. A parallel set of experiments was carried out by using the physics-based simulation approach FEP+ on LFA-1 and by making use of previously published data for two benchmarks comprising 16 additional targets (refs 9 and 11).

For the LFA-1 lead optimization set, the QuanSA model’s predicted affinities for 67 future compounds had an average error of 0.4 pKi unit (with highly significant rank correlation statistics). For the 17 compounds on which FEP+ calculations were made, the average prediction error was marginally worse than that for QuanSA. More importantly, by combining the QuanSA and FEP+ predictions into a hybrid model, the average error dropped significantly, to less than 0.3 pKi unit, with a very high correlation between experimental and

Figure 14. Examples of novel ligands for each of three targets from the Schindler FEP+ benchmark identified through temporally prospective screening of focused QuanSA models.

Figure 15. Examples of novel ligands for each of three targets from the Abel benchmark identified through temporally prospective screening of focused QuanSA models.

predicted aﬃnities. The errors that the modeling approaches

made in predictions were only slightly correlated, so the hybrid

model exhibited error cancellation.

This pattern of synergy between the two approaches, due to

partial error cancellation, was mirrored across 16 additional

examples derived from public FEP+ benchmarking data. The

machine-learning strategy involved choosing the most relevant

data to inform model construction, rather than simply

aggregating a large amount of data. The QuanSA method is

physically driven, and unnecessarily large data sets tend to

dilute the information required for accurate aﬃnity prediction.

Both on a per-target basis and in terms of aggregate error

analysis on nearly 400 compound aﬃnity predictions, the

hybrid approach (averaging the predictions from FEP+ and

QuanSA) performed better than the two primary methods,

particularly with respect to a reduction in large-magnitude

errors. Given the consistency of the results presented here, we

expect this to be true quite often, perhaps even for the majority

of targets across the range of early-stage and mid-to-late-stage

lead optimization projects.

Two aspects of the learned QuanSA models are important

by way of contrast to FEP+. First, application of the models is

computationally inexpensive. Second, the models are relatively

insensitive to molecular scaﬀold changes that may be

inappropriate for application of a free energy perturbation

method. Taken together, this allows for use cases such as

screening large numbers of virtual compounds or searching for

scaﬀold alternatives.

Currently, in cases where biophysical data are available,

application of FEP+ and related methods is often done

exclusive of machine-learning approaches. The results

presented here argue that a simple and direct approach can

improve upon either single-mode method. Apart from the

synergy of the numerical predictions, the complementarity of

the methods is multifaceted. One can be applied in low

throughput on relatively close-in analogues at a high

computational cost. The other can be readily applied in

those cases to provide an orthogonal prediction, but it can also

be applied across a wider range of chemical space, and very

large sets of potential ligands can be processed. Application of

QuanSA models across the targets to identify future active

ChEMBL ligands was successful in many cases, with generally

very low estimated false positive rates. The QuanSA

predictions yield a prediction of binding pose, rather than a

black box just producing a number, which lends conﬁdence

where synthetic eﬀort requires justiﬁcation and also can

stimulate the design process.

In terms of assessment for either simulation-based or

machine-learning methods, studies have only rarely been

published utilizing time-stamped lead optimization data. This

type of application is the most realistic approach to validation

that is possible without fully prospective experiments, and it is

more likely to reflect future real-world prospective performance than other validation schemes. Here, accurate predictions

were obtained at both the short time scale (compound

registration dates) and a longer time scale (data disclosure

years).

As a general matter for application in lead optimization, we

believe that the dichotomy between physics-based simulation

methods and those from machine learning has been driven

largely by the nonphysical assumptions of traditional QSAR

approaches. The QuanSA approach moves much closer to

physics-based simulation in terms of its underlying mechanics.

The approaches appear to be complementary, both in the

sense of orthogonality of prediction errors and in terms of their

domains of applicability. Medicinal chemistry projects within

most stages of structure-enabled lead optimization could

beneﬁt from both types of approaches, in combination and to

serve complementary goals.

■EXPERIMENTAL SECTION

This is not primarily a report of new methods. As such, data

curation and computational protocols will be described, with

references to detailed reports of their algorithmic underpinnings, implementation, and validation. All molecular and

activity data along with computational procedures for the

public benchmarks comprising the bulk of this study are freely

available (for details, see Supporting Information and Data and

Software Availability).

Molecular and Activity Data. LFA-1. A total of 202

compounds from a lead optimization project formed the data

set. Molecules were provided as SMILES strings with

registration dates and associated activities. The molecules

were sorted by registration date and segregated temporally,

with the oldest third as train molecules (n = 67), the next third as holdout molecules (n = 68), and the most recent third as test molecules (n = 67). Standard procedures were used to

convert 2D to 3D structures, protonate the molecules as

expected at physiological pH, and perform a conformational

search. There were 57 molecules having exactly one

unspeciﬁed chiral center, and these were prepared as racemic

mixtures.
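
The temporal segregation can be sketched as follows. The text gives the resulting set sizes (67, 68, and 67) but not the exact rule for handling a total that is not divisible by three; this sketch assumes the outer thirds are rounded down and the remainder goes to the middle (holdout) set, which reproduces those sizes. The record layout and names are hypothetical.

```python
from datetime import date, timedelta


def temporal_thirds(records):
    """Split (identifier, registration_date) records into train, holdout,
    and test sets by date, oldest first.

    Assumption: train and test each get len(records) // 3 molecules and
    the holdout set receives whatever remains in the middle.
    """
    ordered = sorted(records, key=lambda rec: rec[1])
    third = len(ordered) // 3
    train = ordered[:third]
    holdout = ordered[third:len(ordered) - third]
    test = ordered[len(ordered) - third:]
    return train, holdout, test


# Hypothetical registration dates for 202 compounds, one per day.
start = date(2015, 1, 1)
records = [(f"mol{i:03d}", start + timedelta(days=i)) for i in range(202)]
train, holdout, test = temporal_thirds(records)
```

Because the split is by date alone, every test molecule was registered after every train and holdout molecule, which is the property that makes the evaluation temporally prospective.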

ChEMBL target 1803, human LFA-1, had 131 small

molecule ligands with associated IC50 activities. The set was

ﬁltered to exclude ligands that were present in the set of 202

project compounds and those that did not bind at the allosteric

binding site. The 44 remaining ligands formed the ChEMBL

extrapolation set, representing three LFA-1 pharmaceutical

industry efforts: 32 from ref 35, 10 from ref 36, and 2 from the same series of molecules in our 202 ligand set. The 1000 ZINC molecule set used to establish model specificity has been used previously in the same manner (refs 14 and 18).

FEP+ Benchmark Data Sets. Data comprising 16 prediction sets were derived from the Supporting Information of two FEP+ benchmarking studies (refs 9 and 11). Data for model induction were curated from ChEMBL and, where necessary, from the original publications cited within that Supporting Information. For each target, where N molecules were indicated as comprising

the prediction set, the reference molecule was removed from

the predictions to be made and added to the QuanSA model

induction sets. The reference molecule in its bound state was

also used to identify related contemporaneous PDB structures

from a mutually aligned set, as described previously.

For both benchmark data sets, the procedure outlined in

Figure 7 made use of default eSim thresholds for building focused models of 0.65. For the Schindler benchmark, in cases

where this threshold yielded too few training molecules or

poor prediction set coverage based on QuanSA scoring quality

measures, the thresholds were reduced. To test an alternative

strategy, for the Abel benchmark, in three cases (BACE,

MCL1, and P38) half of the N−1 molecules for each

prediction set were randomly selected and added to the

respective training sets. For the Schindler benchmark, there were, on average, 32.1 prediction examples per target (total of 257). For the Abel benchmark, there were 16.9 examples per target (total of 135). Tables 1 and 2 provide total numbers of

Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

https://doi.org/10.1021/acs.jcim.1c01382

J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

compounds for each target (prediction set, full possible

training pool, pretraining ﬁltered pool, and ﬁnal number of

training compounds).
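The threshold-based selection of focused training data can be illustrated with a short sketch. It assumes each candidate training molecule already has a precomputed similarity score to the known bound ligands. Only the default threshold of 6.5 comes from the text; the function name, the minimum training-set size, the step size, and the lower bound are hypothetical choices made for illustration. The sketch covers only the "too few training molecules" criterion and does not model the coverage-based quality measures.

```python
def select_focused_training_set(similarities, threshold=6.5, min_size=20,
                                step=0.5, floor=4.0):
    """Return (chosen_threshold, selected_names).

    similarities: dict mapping molecule name -> similarity score to the
    known bound ligands (higher means more similar).
    min_size, step, and floor are illustrative assumptions, not values
    taken from the paper or from the Surflex Platform.
    """
    current = threshold
    while True:
        selected = sorted(n for n, s in similarities.items() if s >= current)
        # Stop once enough molecules pass, or when the floor is reached.
        if len(selected) >= min_size or current - step < floor:
            return current, selected
        current -= step


# Toy pool of five candidates with made-up similarity scores.
pool = {"m1": 7.2, "m2": 6.6, "m3": 6.1, "m4": 5.8, "m5": 4.9}
chosen, names = select_focused_training_set(pool, min_size=3)
print(chosen, names)
```

In this toy case the default threshold admits only two molecules, so the threshold is relaxed by one step before three molecules qualify.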

ChEMBL data representing “future” molecules were

collected for each of the 16 targets using ChEMBL target

identiﬁers, and the year of literature or patent publication was

used to temporally segregate the extrapolation sets from the

model induction sets. These were further ﬁltered to remove

data used for model induction (in rare cases, a training

compound appeared in future-year publications). The decoy

set used to establish speciﬁcity was the same as that used for

LFA-1.
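The temporal segregation described above amounts to a split on publication year followed by removal of any compound that was used for training. A minimal sketch follows; the record fields, the function name, and the inclusive treatment of the cutoff year are assumptions for illustration and do not describe the actual curation scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str    # compound identifier (hypothetical field)
    year: int    # year of literature or patent publication
    pki: float   # measured activity


def temporal_split(records, cutoff_year, training_names):
    """Split records into a model-induction pool and a "future" set.

    Records published up to and including cutoff_year form the induction
    pool. Later records form the extrapolation set, except for compounds
    that were used for training and reappear in later publications.
    """
    induction = [r for r in records if r.year <= cutoff_year]
    future = [r for r in records
              if r.year > cutoff_year and r.name not in training_names]
    return induction, future


records = [
    Record("cpd_a", 2008, 6.5),
    Record("cpd_b", 2010, 7.1),
    Record("cpd_a", 2014, 6.7),   # training compound re-reported later
    Record("cpd_c", 2015, 8.0),
]
induction, future = temporal_split(records, 2010, {"cpd_a", "cpd_b"})
print([r.name for r in induction], [r.name for r in future])
```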

Computational Procedures for QuanSA Model Induction and Prediction. The QuanSA method is the successor to the QMOD approach. QuanSA builds a pocket field rather than constructing a physical set of probes. It combines ideas and algorithms from molecular similarity and multiple instance learning with many lessons learned and features developed with the QMOD approach. The methods papers are comprehensive, and the Supporting Information contains a more detailed algorithmic description than the foregoing without recapitulating the prior publications. Standard procedures were employed for QuanSA model induction (Surflex Platform releases 5.001 for LFA-1 and 5.122 for the Schindler and Abel benchmarks, BioPharmics LLC, Sonoma County, CA, 2021).

LFA-1. The results reported here were generated using

version 5 of the Surﬂex Platform. Surﬂex-Dock was used to

align X-ray crystallographic protein structures and to perform

ensemble docking as an independent means to analyze ligand

pose. Ligand preparation was carried out with standard

procedures using the -pquant level of conformer elaboration.

This produced up to 1000 conformers for each training ligand

(though typically far fewer for LFA-1 compounds). For the 57

molecules with a single unspeciﬁed chiral center, the -enum

chiral 1 parameter resulted in the molecules being racemized.

The ForceGen methodology has been detailed previously (refs 38 and 39).

The QuanSA initialization procedure (init) automatically

builds multiple initial alignments, but it can be inﬂuenced by

user knowledge and guidance. The -clknown parameter

speciﬁes a set of known poses for competitive ligands, in this

case an alignment of six crystallographic ligands. The searched

conformer database along with a ﬁle containing molecule

names and associated activity information formed the input to

the initialization procedure, which was done in the standard

manner. After initialization, model building was carried out for

the top ﬁve alignments produced by the initialization

procedure, followed by model selection, informed by the use

of a holdout set. After selection of the model for prospective

application, the holdout molecules were then added to reﬁne

the selected model, as outlined in Figure 2.
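The holdout-informed selection step can be sketched generically. The text does not state which statistic was used to compare the candidate models, so mean absolute error on the holdout set is assumed here purely for illustration. The candidate models are represented as plain prediction functions, and the later refinement with the holdout molecules is not shown.

```python
def select_model(candidates, holdout):
    """Pick the candidate with the lowest mean absolute error on the holdout.

    candidates: dict mapping an alignment label -> prediction function
    holdout: list of (molecule, experimental_pki) pairs
    The error metric is an assumption; the source does not specify it.
    """
    def mae(predict):
        return sum(abs(predict(m) - y) for m, y in holdout) / len(holdout)

    scores = {label: mae(predict) for label, predict in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Two toy "models" standing in for models built from different alignments.
holdout = [("a", 7.0), ("b", 8.0)]
candidates = {
    "alignment_1": lambda m: 7.5,
    "alignment_2": lambda m: {"a": 7.1, "b": 7.8}[m],
}
best, scores = select_model(candidates, holdout)
print(best)
```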

FEP+ Benchmark Data Sets. As in the application to LFA-1,

standard procedures were used for model induction and for

scoring the prediction sets. The primary variation in the

application to the individual targets was in ﬁltering the training

pool of data for each target (as described above). For the

targets SYK, EG5, and CDK8, the initial threshold for ligand

similarity to known bound ligands was reduced from the

standard value of 6.5 in order to yield suﬃcient training data

for adequate coverage of the prediction sets based on the

quality metrics produced in the scoring procedure.

Computational Procedures for FEP+. FEP+ calculations were performed using standard protocols, making use of the Glide, Prime, MacroModel, and FEP+ tools (all release 2019-4, Schrödinger, LLC, New York, NY, 2019).

Poses were generated by using Glide in SP mode using core

constraints relative to compound 1. The ligand and residues

within 5 Å of the ligand were left ﬂexible, while the remainder

of the atoms were constrained. For application of FEP+, the

Glide poses were minimized by using MacroModel with

OPLS3 with implicit solvation. The protein was held rigid.

Custom force ﬁeld parameters were calculated using the

Forcefield Builder module that is part of FEP+. Single-edge FEP+ calculations were performed relative to compound 13 for the 17 test molecules from the temporally segregated test set.
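A relative free energy computed against a single reference compound can be placed on the pKi scale using the standard thermodynamic relation ΔG = −RT ln(10) · pKi, which gives roughly 1.36 kcal/mol per log unit near room temperature. The sketch below illustrates that conversion only; the function name, the temperature, and the example reference pKi are assumptions, and it is not a description of how the FEP+ software reports its results.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pki_from_relative_dg(ddg_kcal, reference_pki, temperature=298.15):
    """Convert a relative binding free energy into an absolute pKi estimate.

    ddg_kcal: ddG = dG(ligand) - dG(reference) in kcal/mol; a negative
    value means the ligand binds more tightly than the reference.
    reference_pki: experimental pKi of the reference compound.
    temperature: assumed temperature in kelvin (illustrative default).
    """
    kcal_per_log_unit = R_KCAL * temperature * math.log(10)
    return reference_pki - ddg_kcal / kcal_per_log_unit


# A ligand predicted to bind about 1.36 kcal/mol more tightly than a
# reference with pKi 7.0 (made-up numbers) gains about one log unit.
print(round(pki_from_relative_dg(-1.3642, 7.0), 2))
```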

■ DATA AND SOFTWARE AVAILABILITY

An extensive data archive is freely available at www.jainlab.org

(see the Supporting Information). All software employed

herein is commercially available.

■ ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge at

https://pubs.acs.org/doi/10.1021/acs.jcim.1c01382.

Additional information about computational methods;

detailed description of the contents of the extensive data

archive (PDF)

All prediction data for the FEP+ targets for all three

methods (XLSX)

■ AUTHOR INFORMATION

Corresponding Authors

Stephen R. Johnson − Computer-Assisted Drug-Design, Bristol-Myers Squibb Company, Princeton, New Jersey 08648, United States; Email: stephen.johnson@bms.com

Ajay N. Jain − Research and Development, BioPharmics LLC, Santa Rosa, California 95404, United States; orcid.org/0000-0003-4641-8501; Email: ajain@jainlab.org

Author

Ann E. Cleves − Applied Science, BioPharmics LLC, Santa Rosa, California 95404, United States; orcid.org/0000-0002-1622-2770

Complete contact information is available at:

https://pubs.acs.org/10.1021/acs.jcim.1c01382

Notes

The authors declare no competing ﬁnancial interest.

■ ACKNOWLEDGMENTS

The authors thank Bristol-Myers Squibb for providing support

for this work.

■ REFERENCES

(1) Brown, S. P.; Muchmore, S. W. High-throughput calculation of

protein-ligand binding Affinities: Modification and adaptation of the

MM-PBSA protocol to enterprise grid computing. J. Chem. Inf. Model.

2006, 46, 999−1005.

(2) Brown, S. P.; Muchmore, S. W. Rapid estimation of relative

protein-ligand binding affinities using a high-throughput version of

MM-PBSA. J. Chem. Inf. Model. 2007, 47, 1493−1503.

(3) Brown, S. P.; Muchmore, S. W. Large-scale application of high-throughput molecular mechanics with Poisson-Boltzmann surface area for routine physics-based scoring of protein-ligand complexes. J. Med. Chem. 2009, 52, 3159−3165.

(4) Wang, E.; Sun, H.; Wang, J.; Wang, Z.; Liu, H.; Zhang, J. Z.;

Hou, T. End-point binding free energy calculation with MM/PBSA

and MM/GBSA: strategies and applications in drug design. Chem.

Rev. 2019, 119, 9478−9508.

(5) Chodera, J. D.; Mobley, D. L.; Shirts, M. R.; Dixon, R. W.;

Branson, K.; Pande, V. S. Alchemical free energy methods for drug

discovery: Progress and challenges. Curr. Opin. Struct. Biol. 2011, 21,

150−160.

(6) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA

methods to estimate ligand-binding affinities. Expert Opin. Drug

Discovery 2015, 10, 449−461.

(7) Jorgensen, W. L.; Ravimohan, C. Monte Carlo simulation of

differences in free energies of hydration. J. Chem. Phys. 1985, 83,

3050−3054.

(8) Kollman, P. Free energy calculations: Applications to chemical

and biochemical phenomena. Chem. Rev. 1993, 93, 2395−2417.

(9) Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.;

Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero,

D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm,

W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.;

Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B.

J.; Friesner, R. A.; Abel, R. Accurate and reliable prediction of relative

ligand binding potency in prospective drug discovery by way of a

modern free-energy calculation protocol and force field. J. Am. Chem.

Soc. 2015, 137, 2695−2703.

(10) Sun, H.; Li, Y.; Tian, S.; Xu, L.; Hou, T. Assessing the

performance of MM/PBSA and MM/GBSA methods. 4. Accuracies

of MM/PBSA and MM/GBSA methodologies evaluated by various

simulation protocols using PDBbind data set. Phys. Chem. Chem. Phys.

2014, 16, 16719−16729.

(11) Schindler, C. E. M.; Baumann, H.; Blum, A.; Bose, D.;

Buchstaller, H.-P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski,

P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuchss, T.; Grädler, U.;

Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.;

Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.;

Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann,

K.; Schulz, R.; Steinbrecher, T.; Tanzer, E.-M.; Unzue Lopez, A.;

Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-scale assessment of

binding free energy calculations in active drug discovery projects. J.

Chem. Inf. Model. 2020, 60, 5457−5474.

(12) Walters, W. P.; Barzilay, R. Applications of deep learning in

molecule generation and molecular property prediction. Acc. Chem.

Res. 2021, 54, 263−270.

(13) Ramsundar, B.; Eastman, P.; Walters, P.; Pande, V. Deep

Learning for the Life Sciences: Applying Deep Learning to Genomics,

Microscopy, Drug Discovery, and More; O’Reilly Media, Inc.: 2019.

(14) Cleves, A. E.; Jain, A. N. Quantitative Surface Field Analysis:

Learning Causal Models to Predict Ligand Binding Affinity and Pose.

J. Comput.-Aided Mol. Des. 2018, 32, 731−757.

(15) Jain, A. Scoring noncovalent protein-ligand interactions: A

continuous differentiable function tuned to compute binding

affinities. J. Comput.-Aided Mol. Des. 1996, 10, 427−440.

(16) Pham, T.; Jain, A. Parameter estimation for scoring protein-

ligand interactions using negative training data. J. Med. Chem. 2006,

49, 5856−5868.

(17) Jain, A.; Cleves, A. Does your model weigh the same as a Duck?

J. Comput.-Aided Mol. Des. 2012, 26, 57−67.

(18) Cleves, A. E.; Jain, A. N. Extrapolative prediction using

physically-based QSAR. J. Comput.-Aided Mol. Des. 2016, 30, 127−

152.

(19) Jain, A. N.; Dietterich, T. G.; Lathrop, R. H.; Chapman, D.;

Critchlow, R. E., Jr.; Bauer, B. E.; Webster, T. A.; Lozano-Perez, T.

Compass: A Shape-Based Machine Learning Tool for Drug Design. J.

Comput.-Aided Mol. Des. 1994, 8, 635−652.

(20) Jain, A.; Koile, K.; Chapman, D. Compass: Predicting biological

activities from molecular surface properties. Performance comparisons

on a steroid benchmark. J. Med. Chem. 1994, 37, 2315−2327.

(21) Jain, A.; Harris, N.; Park, J. Quantitative binding site model

generation: Compass applied to multiple chemotypes targeting the 5-HT1a receptor. J. Med. Chem. 1995, 38, 1295−1308.

(22) Dietterich, T. G.; Lathrop, R. H.; Lozano-Pérez, T. Solving the

multiple instance problem with axis-parallel rectangles. Artif. Intell.

1997, 89, 31−71.

(23) Potin, D.; Launay, M.; Monatlik, F.; Malabre, P.; Fabreguettes,

M.; Fouquet, A.; Maillet, M.; Nicolai, E.; Dorgeret, L.; Chevallier, F.;

Besse, D.; Dufort, M.; Caussade, F.; Ahmad, S. Z.; Stetsko, D. K.;

Skala, S.; Davis, P. M.; Balimane, P.; Patel, K.; Yang, Z.; Marathe, P.;

Postelneck, J.; Townsend, R. M.; Goldfarb, V.; Sheriff, S.; Einspahr,

H.; Kish, K.; Malley, M. F.; DiMarco, J. D.; Gougoutas, J. Z.; Kadiyala,

P.; Cheney, D. L.; Tejwani, R. W.; Murphy, D. K.; Mcintyre, K. W.;

Yang, X.; Chao, S.; Leith, L.; Xiao, Z.; Mathur, A.; Chen, B.-C.; Wu,

D.-R.; Traeger, S. C.; McKinnon, M.; Barrish, J. C.; Robl, J. A.;

Iwanowicz, E. J.; Suchard, S. J.; Dhar, T. G. M. Discovery and development of 5-[(5S,9R)-9-(4-cyanophenyl)-3-(3,5-dichlorophenyl)-1-methyl-2,4-dioxo-1,3,7-triazaspiro[4.4]non-7-yl-methyl]-3-thiophenecarboxylic acid (BMS-587101), a small molecule antagonist of leukocyte function associated antigen-1. J. Med. Chem. 2006, 49, 6946−6949.

(24) Watterson, S. H.; Xiao, Z.; Dodd, D. S.; Tortolani, D. R.;

Vaccaro, W.; Potin, D.; Launay, M.; Stetsko, D. K.; Skala, S.; Davis, P.

M.; Lee, D.; Yang, X.; McIntyre, K. W.; Balimane, P.; Patel, K.; Yang,

Z.; Marathe, P.; Kadiyala, P.; Tebben, A. J.; Sheriff, S.; Chang, C. Y.;

Ziemba, T.; Zhang, H.; Chen, B.-C.; DelMonte, A. J.; Aranibar, N.;

McKinnon, M.; Barrish, J. C.; Suchard, S. J.; Murali Dhar, T. G. Small

Molecule Antagonist of Leukocyte Function Associated Antigen-1

(LFA-1): Structure−Activity Relationships Leading to the Identification of 6-((5S,9R)-9-(4-Cyanophenyl)-3-(3,5-dichlorophenyl)-1-methyl-2,4-dioxo-1,3,7-triazaspiro[4.4]nonan-7-yl)nicotinic Acid (BMS-688521). J. Med. Chem. 2010, 53, 3814−3830.

(25) Cleves, A. E.; Jain, A. N. Effects of inductive bias on

computational evaluations of ligand-based modeling and on drug

discovery. J. Comput.-Aided Mol. Des. 2008, 22, 147−159.

(26) Varela, R.; Walters, W.; Goldman, B.; Jain, A. Iterative

refinement of a binding pocket model: Active computational steering

of lead optimization. J. Med. Chem. 2012, 55, 8926−8942.

(27) Hogg, N.; Henderson, R.; Leitinger, B.; McDowall, A.; Porter,

J.; Stanley, P. Mechanisms contributing to the activity of integrins on

leukocytes. Immunol. Rev. 2002, 186, 164−171.

(28) Lebwohl, M.; Tyring, S. K.; Hamilton, T. K.; Toth, D.; Glazer,

S.; Tawfik, N. H.; Walicke, P.; Dummer, W.; Wang, X.; Garovoy, M.

R.; Pariser, D. A novel targeted T-cell modulator, efalizumab, for

plaque psoriasis. N. Engl. J. Med. 2003, 349, 2004−2013.

(29) Welzenbach, K.; Hommel, U.; Weitz-Schmidt, G. Small

Molecule Inhibitors Induce Conformational Changes in the I Domain

and the I-like Domain of Lymphocyte Function-associated Antigen-1.

J. Biol. Chem. 2002, 277, 10590−10598.

(30) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.;

Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.;

Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: A new

approach for rapid, accurate docking and scoring. 1. Method and

assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739−1749.

(31) Cleves, A. E.; Johnson, S. R.; Jain, A. N. Electrostatic-field and

surface-shape similarity for virtual screening and pose prediction. J.

Comput.-Aided Mol. Des. 2019, 33, 865−886.

(32) Chen, Y.-N. P.; LaMarche, M. J.; Chan, H. M.; Fekkes, P.;

Garcia-Fortanet, J.; Acker, M. G.; Antonakos, B.; Chen, C. H.-T.;

Chen, Z.; Cooke, V. G.; Dobson, J. R.; Deng, Z.; Fei, F.; Firestone, B.;

Fodor, M.; Fridrich, C.; Gao, H.; Grunenfelder, D.; Hao, H.-X.; Jacob,

J.; Ho, S.; Hsiao, K.; Kang, Z. B.; Karki, R.; Kato, M.; Larrow, J.; La

Bonte, L. R.; Lenoir, F.; Liu, G.; Liu, S.; Majumdar, D.; Meyer, M. J.;

Palermo, M.; Perez, L.; Pu, M.; Price, E.; Quinn, C.; Shakya, S.;

Shultz, M. D.; Slisz, J.; Venkatesan, K.; Wang, P.; Warmuth, M.;

Williams, S.; Yang, G.; Yuan, J.; Zhang, J.-H.; Zhu, P.; Ramsey, T.;

Keen, N. J.; Sellers, W. R.; Stams, T.; Fortin, P. D. Allosteric

inhibition of SHP2 phosphatase inhibits cancers driven by receptor

tyrosine kinases. Nature 2016, 535, 148−152.


(33) Jia, H.; Dai, G.; Weng, J.; Zhang, Z.; Wang, Q.; Zhou, F.; Jiao,

L.; Cui, Y.; Ren, Y.; Fan, S.; Zhou, J.; Qing, W.; Gu, Y.; Wang, J.; Sai,

Y.; Su, W. Discovery of (S)-1-(1-(Imidazo[1,2-a]pyridin-6-yl)ethyl)-6-(1-methyl-1H-pyrazol-4-yl)-1H-[1,2,3]triazolo[4,5-b]pyrazine (Volitinib) as a Highly Potent and Selective Mesenchymal−Epithelial Transition Factor (c-Met) Inhibitor in Clinical Development for Treatment of Cancer. J. Med. Chem. 2014, 57, 7577−7589.

(34) Tron, A. E.; Belmonte, M. A.; Adam, A.; Aquila, B. M.; Boise, L.

H.; Chiarparin, E.; Cidado, J.; Embrey, K. J.; Gangl, E.; Gibbons, F.

D.; Gregory, G. P.; Hargreaves, D.; Hendricks, J. A.; Johannes, J. W.;

Johnstone, R. W.; Kazmirski, S. L.; Kettle, J. G.; Lamb, M. L.; Matulis,

S. M.; Nooka, A. K.; Packer, M. J.; Peng, B.; Rawlins, P. B.; Robbins,

D. W.; Schuller, A. G.; Su, N.; Yang, W.; Ye, Q.; Zheng, X.; Secrist, J.

P.; Clark, E. A.; Wilson, D. M.; Fawell, S. E.; Hird, A. W. Discovery of

Mcl-1-specific inhibitor AZD5991 and preclinical activity in multiple

myeloma and acute myeloid leukemia. Nat. Commun. 2018, 9, 5341.

(35) Winn, M.; Reilly, E. B.; Liu, G.; Huth, J. R.; Jae, H.-S.; Freeman,

J.; Pei, Z.; Xin, Z.; Lynch, J.; Kester, J.; von Geldern, T. W.; Leitza, S.;

DeVries, P.; Dickinson, R.; Mussatto, D.; Okasinski, G. F. Discovery

of novel p-arylthio cinnamides as antagonists of leukocyte function-

associated antigen-1/intercellular adhesion molecule-1 interaction. 4.

Structure−activity relationship of substituents on the benzene ring of the cinnamide. J. Med. Chem. 2001, 44, 4393−4403.

(36) Kollmann, C. S.; Bai, X.; Tsai, C.-H.; Yang, H.; Lind, K. E.;

Zhu, Z.; Israel, D. I.; Cuozzo, J. W.; Morgan, B. A.; Yuki, K.; Xie, C.;

Springer, T. A.; Shimaoka, M.; Evindar, G.; Skinner, S. R. Application

of encoded library technology (ELT) to a protein−protein interaction

target: Discovery of a potent class of integrin lymphocyte function-

associated antigen 1 (LFA-1) antagonists. Bioorg. Med. Chem. 2014,

22, 2353−2365.

(37) Spitzer, R.; Cleves, A.; Varela, R.; Jain, A. Protein function

annotation by local binding site surface similarity. Proteins: Struct.,

Funct., Genet. 2014, 82, 679−694.

(38) Cleves, A. E.; Jain, A. N. ForceGen 3D Structure and

Conformer Generation: From Small Lead-Like Molecules to Macrocyclic Drugs. J. Comput.-Aided Mol. Des. 2017, 31, 419−439.

(39) Jain, A. N.; Cleves, A. E.; Gao, Q.; Wang, X.; Liu, Y.; Sherer, E.

C.; Reibarkh, M. Y. Complex macrocycle exploration: Parallel,

heuristic, and constraint-based conformer generation using ForceGen.

J. Comput.-Aided Mol. Des. 2019, 33, 531−558.

Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

https://doi.org/10.1021/acs.jcim.1c01382

J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

From UK-2A to florylpicoxamid: Active learning to identify a mimic of a macrocyclic natural product

Article

Full-text available

Apr 2024
J COMPUT AID MOL DES

Scaffold replacement as part of an optimization process that requires maintenance of potency, desirable biodistribution, metabolic stability, and considerations of synthesis at very large scale is a complex challenge. Here, we consider a set of over 1000 time-stamped compounds, beginning with a macrocyclic natural-product lead and ending with a broad-spectrum crop anti-fungal. We demonstrate the application of the QuanSA 3D-QSAR method employing an active learning procedure that combines two types of molecular selection. The first identifies compounds predicted to be most active of those most likely to be well-covered by the model. The second identifies compounds predicted to be most informative based on exhibiting low predicted activity but showing high 3D similarity to a highly active nearest-neighbor training molecule. Beginning with just 100 compounds, using a deterministic and automatic procedure, five rounds of 20-compound selection and model refinement identifies the binding metabolic form of florylpicoxamid. We show how iterative refinement broadens the domain of applicability of the successive models while also enhancing predictive accuracy. We also demonstrate how a simple method requiring very sparse data can be used to generate relevant ideas for synthetic candidates.

Designing Collagen-Binding Peptide with Enhanced Properties Using Hydropathic Free Energy Predictions

Article

Full-text available

Mar 2023

Collagen is fundamental to a vast diversity of health functions and potential therapeutics. Short peptides targeting collagen are attractive for designing modular systems for site-specific delivery of bioactive agents. Characterization of peptide–protein binding involves a larger number of potential interactions that require screening methods to target physiological conditions. We build a hydropathy-based free energy estimation tool which allows quick evaluation of peptides binding to collagen. Previous studies showed that pH plays a significant role in collagen structure and stability. Our design tool enables probing peptides for their collagen-binding property across multiple pH conditions. We explored binding features of currently known collagen-binding peptides, collagen type I alpha chain 2 sense peptide (TKKTLRT) and decorin LRR-10 (LRELHLNNN). Based on these analyzes, we engineered a collagen-binding peptide with enhanced properties across a large pH range in contrast to LRR-10 pH dependence. To validate our predictions, we used a quantum-dots-based binding assay to compare the coverage of the peptides on type I collagen. The predicted peptide resulted in improved collagen binding. Hydropathy of the peptide–protein pair is a promising approach to finding compatible pairings with minimal use of computational resources, and our method allows for quick evaluation of peptides for binding to other proteins. Overall, the free-energy-based tool provides an alternative computational screening approach that impacts protein interaction search methods.

Pathfinder-Driven Chemical Space Exploration and Multiparameter Optimization in Tandem with Glide/IFD and QSAR-Based Active Learning Approach to Prioritize Design Ideas for FEP+ Calculations of SARS-CoV-2 PLpro Inhibitors

Article

Full-text available

Dec 2022
MOLECULES

Njabulo Joyfull Gumede

A global pandemic caused by the SARS-CoV-2 virus that started in 2020 and has wreaked havoc on humanity still ravages up until now. As a result, the negative impact of travel restrictions and lockdowns has underscored the importance of our preparedness for future pandemics. The main thrust of this work was based on addressing this need by traversing chemical space to design inhibitors that target the SARS-CoV-2 papain-like protease (PLpro). Pathfinder-based retrosynthesis analysis was used to generate analogs of GRL-0617 using commercially available building blocks by replacing the naphthalene moiety. A total of 10 models were built using active learning QSAR, which achieved good statistical results such as an R2 > 0.70, Q2 > 0.64, STD Dev < 0.30, and RMSE < 0.31, on average for all models. A total of 35 ideas were further prioritized for FEP+ calculations. The FEP+ results revealed that compound 45 was the most active compound in this series with a ΔG of −7.28 ± 0.96 kcal/mol. Compound 5 exhibited a ΔG of −6.78 ± 1.30 kcal/mol. The inactive compounds in this series were compound 91 and compound 23 with a ΔG of −5.74 ± 1.06 and −3.11 ± 1.45 kcal/mol. The combined strategy employed here is envisaged to be of great utility in multiparameter lead optimization efforts, to traverse chemical space, maintaining and/or improving the potency as well as the property space of synthetically aware design ideas.

A high quality, industrial data set for binding affinity prediction: performance comparison in different early drug discovery scenarios

Article

Full-text available

Sep 2022
J COMPUT AID MOL DES

We release a new, high quality data set of 1162 PDE10A inhibitors with experimentally determined binding affinities together with 77 PDE10A X-ray co-crystal structures from a Roche legacy project. This data set is used to compare the performance of different 2D- and 3D-machine learning (ML) as well as empirical scoring functions for predicting binding affinities with high throughput. We simulate use cases that are relevant in the lead optimization phase of early drug discovery. ML methods perform well at interpolation, but poorly in extrapolation scenarios—which are most relevant to a real-world application. Moreover, we find that investing into the docking workflow for binding pose generation using multi-template docking is rewarded with an improved scoring performance. A combination of 2D-ML and 3D scoring using a modified piecewise linear potential shows best overall performance, combining information on the protein environment with learning from existing SAR data. Graphical abstract

Ligand Binding Free Energy Evaluation by Monte Carlo Recursion

Preprint

Full-text available

Aug 2022

The correct evaluation of ligand binding free energies by computational methods is still a very challenging active area of research. The most employed methods for these calculations can be roughly classified into four groups: ( i ) the fastest and less accurate methods, such as molecular docking, designed to sample a large number of molecules and rapidly rank them according to the potential binding energy; ( ii ) the second class of methods use a thermodynamic ensemble, typically generated by molecular dynamics, to analyze the endpoints of the thermodynamic cycle for binding and extract differences, in the so-called ‘end-point’ methods; ( iii ) the third class of methods is based on the Zwanzig relationship and computes the free energy difference after a chemical change of the system (alchemical methods); and ( iv ) methods based on biased simulations, such as metadynamics, for example. These methods require increased computational power and as expected, result in increased accuracy for the determination of the strength of binding. Here, we describe an intermediate approach, based on the Monte Carlo Recursion (MCR) method first developed by Harold Scheraga. In this method, the system is sampled at increasing effective temperatures, and the free energy of the system is assessed from a series of terms W ( b , T ), computed from Monte Carlo (MC) averages at each iteration. We show the application of the MCR for ligand binding with datasets of guest-hosts systems (N=75) and we observed that a good correlation is obtained between experimental data and the binding energies computed with MCR. We also compared the experimental data with an end-point calculation from equilibrium Monte Carlo calculations that allowed us to conclude that the lower-energy (lower-temperature) terms in the calculation are the most relevant to the estimation of the binding energies, resulting in similar correlations between MCR and MC data and the experimental values. 
On the other hand, the MCR method provides a reasonable view of the binding energy funnel, with possible connections with the ligand binding kinetics, as well. The codes developed for this analysis are publicly available on GitHub as a part of the LiBELa/MCLiBELa project ( https://github.com/alessandronascimento/LiBELa ). Table of Contents/Abstract Graphics

Ligand Binding Free Energy Evaluation by Monte Carlo Recursion

Article

Feb 2023

The correct evaluation of ligand binding free energies by computational methods is still a very challenging active area of research. The most employed methods for these calculations can be roughly classified into four groups: (i) the fastest and less accurate methods, such as molecular docking, designed to sample a large number of molecules and rapidly rank them according to the potential binding energy; (ii) the second class of methods use a thermodynamic ensemble, typically generated by molecular dynamics, to analyze the endpoints of the thermodynamic cycle for binding and extract differences, in the so-called 'end-point' methods; (iii) the third class of methods is based on the Zwanzig relationship and computes the free energy difference after a chemical change of the system (alchemical methods); and (iv) methods based on biased simulations, such as metadynamics, for example. These methods require increased computational power and as expected, result in increased accuracy for the determination of the strength of binding. Here, we describe an intermediate approach, based on the Monte Carlo Recursion (MCR) method first developed by Harold Scheraga. In this method, the system is sampled at increasing effective temperatures, and the free energy of the system is assessed from a series of terms W(b,T), computed from Monte Carlo (MC) averages at each iteration. We show the application of the MCR for ligand binding with datasets of guest-hosts systems (N = 75) and we observed that a good correlation is obtained between experimental data and the binding energies computed with MCR. We also compared the experimental data with an end-point calculation from equilibrium Monte Carlo calculations that allowed us to conclude that the lower-energy (lower-temperature) terms in the calculation are the most relevant to the estimation of the binding energies, resulting in similar correlations between MCR and MC data and the experimental values. 
On the other hand, the MCR method provides a reasonable view of the binding energy funnel, with possible connections with the ligand binding kinetics, as well. The codes developed for this analysis are publicly available on GitHub as a part of the LiBELa/MCLiBELa project (https://github.com/alessandronascimento/LiBELa).

Advances in computational structure-based antibody design

Article

Apr 2022

Antibodies are currently the most important class of biotherapeutics and are used to treat numerous diseases. Recent advances in computational methods are ushering in a new era of antibody design, driven in part by accurate structure prediction. Previously, structure-based antibody design has been limited to a relatively small number of cases where accurate structures or models of both the target antigen and antibody were available. As we move towards a time where it is possible to accurately model most antibodies and antigens, and to reliably predict their binding site, there is vast potential for true computational antibody design. In this review, we describe the latest methods that promise to launch a paradigm shift towards entirely in silico structure-based antibody design.

Electrostatic-field and surface-shape similarity for virtual screening and pose prediction

Article

Full-text available

Oct 2019
J COMPUT AID MOL DES

We introduce a new method for rapid computation of 3D molecular similarity that combines electrostatic field comparison with comparison of molecular surface-shape and directional hydrogen-bonding preferences (called “eSim”). Rather than employing heuristic “colors” or user-defined molecular feature types to represent conformation-dependent molecular electrostatics, eSim calculates the similarity of the electrostatic fields of two molecules (in addition to shape and hydrogen-bonding). We present detailed virtual screening performance data on the standard 102 target DUD-E set. In its moderately fast screening mode, eSim running on a single computing core is capable of processing over 60 molecules per second. In this mode, eSim performed significantly better than all alternate methods for which full DUD-E data were available (mean ROC area of 0.74, \(p < 10^{-9}\) by paired t-test, compared with the best performing alternate method). In addition, for 92 targets of the DUD-E set where multiple ligand-bound crystal structures were available, screening performance was assessed using alternate ligands or sets thereof (in their bound poses) as similarity targets. Using the joint alignment of five ligands for each protein target, mean ROC area exceeded 0.82 for the 92 targets. Design-focused application of ligand similarity methods depends on accurate predictions of geometric molecular relationships. We comprehensively assessed pose prediction accuracy by curating nearly 400,000 bound ligand pose pairs across the DUD-E targets. Overall, beginning from agnostic initial poses, we observed an 80% success rate for RMSD \(\le 2.0\) Å among the top 20 predicted eSim poses. These examples were split roughly 50/50 into cases with high direct atomic overlap (where a shared scaffold exists between a pair) and low direct atomic overlap (where a ligand pair has dissimilar scaffolds but largely occupies the same space). Within the high direct atomic overlap subset, the pose prediction success rate was 93%. For the more challenging subset (where dissimilar scaffolds are to be aligned), the success rate was 70%. The eSim approach enables both large-scale screening and rational design of ligands and is rooted in physically meaningful, non-heuristic, molecular comparisons.

Complex macrocycle exploration: parallel, heuristic, and constraint-based conformer generation using ForceGen

Article

Full-text available

Jun 2019
J COMPUT AID MOL DES

ForceGen is a template-free, non-stochastic approach for 2D to 3D structure generation and conformational elaboration for small molecules, including both non-macrocycles and macrocycles. For conformational search of non-macrocycles, ForceGen is both faster and more accurate than the best of all tested methods on a very large, independently curated benchmark of 2859 PDB ligands. In this study, the primary results are on macrocycles, including results for 431 unique examples from four separate benchmarks. These include complex peptide and peptide-like cases that can form networks of internal hydrogen bonds. By making use of new physical movements (“flips” of near-linear sub-cycles and explicit formation of hydrogen bonds), ForceGen exhibited statistically significantly better performance for overall RMS deviation from experimental coordinates than all other approaches. The algorithmic approach offers natural parallelization across multiple computing-cores. On a modest multi-core workstation, for all but the most complex macrocycles, median wall-clock times were generally under a minute in fast search mode and under 2 min using thorough search. On the most complex cases (roughly cyclic decapeptides and larger) explicit exploration of likely hydrogen bonding networks yielded marked improvements, but with calculation times increasing to several minutes and in some cases to roughly an hour for fast search. In complex cases, utilization of NMR data to constrain conformational search produces accurate conformational ensembles representative of solution state macrocycle behavior. On macrocycles of typical complexity (up to 21 rotatable macrocyclic and exocyclic bonds), design-focused macrocycle optimization can be practically supported by computational chemistry at interactive time-scales, with conformational ensemble accuracy equaling what is seen with non-macrocyclic ligands. For more complex macrocycles, inclusion of sparse biophysical data is a helpful adjunct to computation.

Discovery of Mcl-1-specific inhibitor AZD5991 and preclinical activity in multiple myeloma and acute myeloid leukemia

Article

Full-text available

Dec 2018

Mcl-1 is a member of the Bcl-2 family of proteins that promotes cell survival by preventing induction of apoptosis in many cancers. High expression of Mcl-1 causes tumorigenesis and resistance to anticancer therapies, highlighting the potential of Mcl-1 inhibitors as anticancer drugs. Here, we describe AZD5991, a rationally designed macrocyclic molecule with high selectivity and affinity for Mcl-1 currently in clinical development. Our studies demonstrate that AZD5991 binds directly to Mcl-1 and induces rapid apoptosis in cancer cells, most notably myeloma and acute myeloid leukemia, by activating the Bak-dependent mitochondrial apoptotic pathway. AZD5991 shows potent antitumor activity in vivo with complete tumor regression in several models of multiple myeloma and acute myeloid leukemia after a single tolerated dose as monotherapy or in combination with bortezomib or venetoclax. Based on these promising data, a Phase I clinical trial has been launched for evaluation of AZD5991 in patients with hematological malignancies (NCT03218683).

Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose

Article

Full-text available

Jul 2018
J COMPUT AID MOL DES

We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification.

ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs

Article

Full-text available

May 2017
J COMPUT AID MOL DES

We introduce the ForceGen method for 3D structure generation and conformer elaboration of drug-like small molecules. ForceGen is novel, avoiding use of distance geometry, molecular templates, or simulation-oriented stochastic sampling. The method is primarily driven by the molecular force field, implemented using an extension of MMFF94s and a partial charge estimator based on electronegativity-equalization. The force field is coupled to algorithms for direct sampling of realistic physical movements made by small molecules. Results are presented on a standard benchmark from the Cambridge Crystallographic Database of 480 drug-like small molecules, including full structure generation from SMILES strings. Reproduction of protein-bound crystallographic ligand poses is demonstrated on four carefully curated data sets: the ConfGen Set (667 ligands), the PINC cross-docking benchmark (1062 ligands), a large set of macrocyclic ligands (182 total with typical ring sizes of 12–23 atoms), and a commonly used benchmark for evaluating macrocycle conformer generation (30 ligands total). Results compare favorably to alternative methods, and performance on macrocyclic compounds approaches that observed on non-macrocycles while yielding a roughly 100-fold speed improvement over alternative MD-based methods with comparable performance.

Extrapolative prediction using physically-based QSAR

Article

Full-text available

Feb 2016
J COMPUT AID MOL DES

Surflex-QMOD integrates chemical structure and activity data to produce physically-realistic models for binding affinity prediction. Here, we apply QMOD to a 3D-QSAR benchmark dataset and show broad applicability to a diverse set of targets. Testing new ligands within the QMOD model employs automated flexible molecular alignment, with the model itself defining the optimal pose for each ligand. QMOD performance was compared to that of four approaches that depended on manual alignments (CoMFA, two variations of CoMSIA, and CMF). QMOD showed comparable performance to the other methods on a challenging, but structurally limited, test set. The QMOD models were also applied to test a large and structurally diverse dataset of ligands from ChEMBL, nearly all of which were synthesized years after those used for model construction. Extrapolation across diverse chemical structures was possible because the method addresses the ligand pose problem and provides structural and geometric means to quantitatively identify ligands within a model’s applicability domain. Predictions for such ligands for the four tested targets were highly statistically significant based on rank correlation. Those molecules predicted to be highly active (\(\mathrm{pK}_i \ge 7.5\)) had a mean experimental \(\mathrm{pK}_i\) of 7.5, with potent and structurally novel ligands being identified by QMOD for each target.

Applications of Deep Learning in Molecule Generation and Molecular Property Prediction

Article

Dec 2020

Conspectus: Recent advances in computer hardware and software have led to a revolution in deep neural networks that has impacted fields ranging from language translation to computer vision. Deep learning has also impacted a number of areas in drug discovery, including the analysis of cellular images and the design of novel routes for the synthesis of organic molecules. While work in these areas has been impactful, a complete review of the applications of deep learning in drug discovery would be beyond the scope of a single Account. In this Account, we will focus on two key areas where deep learning has impacted molecular design: the prediction of molecular properties and the de novo generation of suggestions for new molecules. One of the most significant advances in the development of quantitative structure-activity relationships (QSARs) has come from the application of deep learning methods to the prediction of the biological activity and physical properties of molecules in drug discovery programs. Rather than employing the expert-derived chemical features typically used to build predictive models, researchers are now using deep learning to develop novel molecular representations. These representations, coupled with the ability of deep neural networks to uncover complex, nonlinear relationships, have led to state-of-the-art performance. While deep learning has changed the way that many researchers approach QSARs, it is not a panacea. As with any other machine learning task, the design of predictive models is dependent on the quality, quantity, and relevance of available data. Seemingly fundamental issues, such as optimal methods for creating a training set, are still open questions for the field. Another critical area that is still the subject of multiple research efforts is the development of methods for assessing the confidence in a model. Deep learning has also contributed to a renaissance in the application of de novo molecule generation. Rather than relying on manually defined heuristics, deep learning methods learn to generate new molecules based on sets of existing molecules. Techniques that were originally developed for areas such as image generation and language translation have been adapted to the generation of molecules. These deep learning methods have been coupled with the predictive models described above and are being used to generate new molecules with specific predicted biological activity profiles. While these generative algorithms appear promising, there have been only a few reports on the synthesis and testing of molecules based on designs proposed by generative models. The evaluation of the diversity, quality, and ultimate value of molecules produced by generative models is still an open question. While the field has produced a number of benchmarks, it has yet to agree on how one should ultimately assess molecules "invented" by an algorithm.

Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects

Article

Aug 2020

Accurate ranking of compounds with regard to their binding affinity to a protein using computational methods is of great interest to pharmaceutical research. Physics-based free energy calculations are regarded as the most rigorous way to estimate binding affinity. In recent years, many retrospective studies carried out both in academia and industry have demonstrated their potential. Here, we present the results of large-scale prospective application of the FEP+ method in active drug discovery projects in an industry setting at Merck KGaA, Darmstadt, Germany. We compare these prospective data to results obtained on a new diverse, public benchmark of eight pharmaceutically relevant targets. Our results offer insights into the challenges faced when using free energy calculations in real-life drug discovery projects and identify limitations that could be tackled by future method development. The new public data set we provide to the community can support further method development and comparative benchmarking of free energy calculations.

End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design

Article

Jun 2019

Molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) and molecular mechanics generalized Born surface area (MM/GBSA) are arguably very popular methods for binding free energy prediction since they are more accurate than most scoring functions of molecular docking and less computationally demanding than alchemical free energy methods. MM/PBSA and MM/GBSA have been widely used in biomolecular studies such as protein folding, protein-ligand binding, protein-protein interaction, etc. In this review, methods to adjust the polar solvation energy and to improve the performance of MM/PBSA and MM/GBSA calculations are reviewed and discussed. The latest applications of MM/GBSA and MM/PBSA in drug design are also presented. This review intends to provide readers with guidance for practically applying MM/PBSA and MM/GBSA in drug design and related research fields.

Allosteric inhibition of SHP2 phosphatase inhibits cancers driven by receptor tyrosine kinases

Article

Jun 2016

The non-receptor protein tyrosine phosphatase SHP2, encoded by PTPN11, has an important role in signal transduction downstream of growth factor receptor signalling and was the first reported oncogenic tyrosine phosphatase. Activating mutations of SHP2 have been associated with developmental pathologies such as Noonan syndrome and are found in multiple cancer types, including leukaemia, lung and breast cancer and neuroblastoma. SHP2 is ubiquitously expressed and regulates cell survival and proliferation primarily through activation of the RAS-ERK signalling pathway. It is also a key mediator of the programmed cell death 1 (PD-1) and B- and T-lymphocyte attenuator (BTLA) immune checkpoint pathways. Reduction of SHP2 activity suppresses tumour cell growth and is a potential target of cancer therapy. Here we report the discovery of a highly potent (IC50 = 0.071 μM), selective and orally bioavailable small-molecule SHP2 inhibitor, SHP099, that stabilizes SHP2 in an auto-inhibited conformation. SHP099 concurrently binds to the interface of the N-terminal SH2, C-terminal SH2, and protein tyrosine phosphatase domains, thus inhibiting SHP2 activity through an allosteric mechanism. SHP099 suppresses RAS-ERK signalling to inhibit the proliferation of receptor-tyrosine-kinase-driven human cancer cells in vitro and is efficacious in mouse tumour xenograft models. Together, these data demonstrate that pharmacological inhibition of SHP2 is a valid therapeutic approach for the treatment of cancers.
