
Synergy and Complementarity between Focused Machine Learning and Physics-Based Simulation in Affinity Prediction

Authors:
  • BioPharmics Division, Optibrium Ltd.
  • BioPharmics LLC

Abstract and Figures

We present results on the extent to which physics-based simulation (exemplified by FEP⁺) and focused machine learning (exemplified by QuanSA) are complementary for ligand affinity prediction. For both methods, predictions of activity for LFA-1 inhibitors from a medicinal chemistry lead optimization project were accurate within the applicable domain of each approach. A hybrid model that combined predictions by both approaches by simple averaging performed better than either method, with respect to both ranking and absolute pKi values. Two publicly available FEP⁺ benchmarks, covering 16 diverse biological targets, were used to test the generality of the synergy. By identifying training data specifically focused on relevant ligands, accurate QuanSA models were derived using ligand activity data known at the time of the original series publications. Results across the 16 benchmark targets demonstrated significant improvements both for ranking and for absolute pKi values using hybrid predictions that combined the FEP⁺ and QuanSA predicted affinity values. The results argue for a combined approach for affinity prediction that makes use of physics-driven methods as well as those driven by machine learning, each applied carefully on appropriate compounds, with hybrid prediction strategies being employed where possible.
Ann E. Cleves, Stephen R. Johnson,* and Ajay N. Jain*
Cite This: https://doi.org/10.1021/acs.jcim.1c01382
INTRODUCTION
Binding affinity prediction continues to be a challenge for computer-aided drug design, especially in the case where there is no high-resolution experimental structure of the target of interest. Even when structures of the biological target are available, affinity prediction is difficult. Simulation-oriented physics-based methods, such as MM/PBSA or MM/GBSA [1–3,6] or free energy perturbation (FEP) [7–9], share a key attraction: in principle, these approaches are congruent with what is known physically. The former methods nominally predict absolute binding free energy. In terms of predictive accuracy, even in the case where experimental structures are known for all ligands under consideration, performance has been observed to be quite variable on a per-target basis [10], though more consistent results have been obtained in some cases, with careful application [3]. Additional context with respect to the state of physical simulation approaches is provided by recent reviews [4,5].
For the FEP approach, relative free energy predictions are made. This is done by estimating the difference in the free energies of protein–ligand complexes between related ligand pairs (typically differing relatively modestly in their substituents). Advances in force fields, sampling methods, and automated design of perturbation graphs [9] can help to guide fine-grained molecular optimization. In cases where the FEP+ method is applicable, for single perturbations of a few ligand atoms from a known reference ligand, errors in predicting changes in free energy have been reported to be as low as 0.5 pKi units (0.9 kcal/mol) [9]. More recent benchmarking on a more challenging set of perturbations yielded errors roughly 50% higher [11]. Affinity prediction remains a challenging problem, even in cases where targets have well-characterized structures and there is little uncertainty in ligand binding modes.
Received: November 11, 2021. © The Authors. Published by American Chemical Society. J. Chem. Inf. Model., pubs.acs.org/jcim, https://doi.org/10.1021/acs.jcim.1c01382
Machine-learning approaches have seen a recent resurgence in their applications within the CADD field, in part driven by advances in deep-learning methodologies. A recent review highlights a number of successful applications as well as limitations [12], with further context provided by a full book treatment [13]. With respect to binding affinity prediction in the context of lead optimization, a critical factor is that the methods typically require thousands of data points in order to learn effectively, because of the need to develop encoded internal representations that meaningfully capture the important aspects required for prediction. Early-stage lead optimization may involve just dozens of assayed molecules within a newly discovered chemical series, and even mid-to-late-stage projects may be limited to hundreds or up to a few thousand data points. The recently introduced QuanSA machine-learning method (Quantitative Surface-field Analysis) differs from the deep-learning paradigm and from historically widely used methods [14].
The central difference is that, rather than applying a generic machine-learning approach to an input molecular representation divorced from a binding event, QuanSA builds a physically interpretable model that is analogous to a protein binding site. By doing so, it addresses the problem of ligand conformation and alignment fully automatically, and it moves in the direction of causal modeling, where the requirement for data can be reduced. The method constructs a nonlinear "pocket field" that is still physical in nature, and which is directly related to the functional form of scoring functions for docking [15,16]. QuanSA pocket-field models mirror key physical phenomena that are observed in protein–ligand interactions [17]: (1) choice of ligand poses is defined by the model; (2) non-additive (or even anti-additive) effects of substituent changes on a central scaffold can be modeled effectively; (3) changes in ligand structures induce changes in predicted ligand poses; (4) the model of molecular activity is dependent on the detailed shape of ligands. Nearly all QSAR and deep-learning methods ignore some or all of these aspects of protein–ligand interactions. Additional discussion of the theoretical contrasts between the QuanSA multiple-instance learning approach and other QSAR (3D and 2D) approaches can be found in the papers introducing the method [14] along with the antecedent QMOD [18] and Compass [19–21] approaches, the latter of which introduced the multiple-instance machine-learning paradigm [22].
Here, we explore the performance of both FEP+ and the QuanSA machine-learning method in a lead optimization project application scenario and using two publicly available FEP+ benchmarks [9,11], spanning 16 diverse targets and covering affinity predictions for nearly 400 molecules. Project data for LFA-1 [23,24] were used as a representative example of mid-to-late-stage lead optimization, where substantial structure–activity data exist, particularly within a chemical series of interest. The two FEP+ benchmarks were used to assess early-stage project application, where only sparse data may be available. Accuracy of the QuanSA and FEP+ approaches, as well as a hybrid approach combining predictions from the two methods by simple averaging, will be detailed in what follows. In addition, because the QuanSA approach can be practically applied to screen large databases for new lead discovery and scaffold replacement, a screening utility was assessed using structure–activity data for diverse compounds that were disclosed after the data used for model induction.
Figure 1. Overview of the QuanSA method. Beginning from ligand structures and activities (here against LFA-1), a multiple-ligand alignment is produced (with variants for each molecule), after which a smooth, nonlinear function is induced (called a pocket field), into which new molecules can be flexibly fit as is commonly done with docking approaches. Here, the new test molecule, compound 4, was made 7 months after the last molecule within the training set (example molecules 1–3), and it was accurately predicted. Shown in the lower row is the predicted pose of compound 4, the surface surrounded by the pocket field (left), and the interactions with the pocket field with and without the surface (middle and right).
RESULTS AND DISCUSSION
We report results for two types of project application scenarios: mid-to-late-stage lead optimization and early stage. In both cases, training and testing data for QuanSA were temporally segregated: building models on older molecules and predicting the activities of future molecules. This parallels the application scenario for predictive modeling, and it avoids bias in assessing the performance of learned models [17,25,26]. For the mid-to-late-stage scenario, the data set included compound registration dates and associated activities. For the early-stage scenario, coarser temporal segregation was accomplished by making use of the years of disclosure of structure–activity and protein structural data. Assessment of a screening utility for the QuanSA models also employed temporal segregation.
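The temporal segregation described above can be illustrated with a minimal sketch. It assumes each record carries a compound identifier, a registration date, and a measured pKi; the function name `temporal_split`, the cutoff dates, and the example records are hypothetical and are not taken from the LFA-1 project data.

```python
from datetime import date


def temporal_split(records, train_end, holdout_end):
    """Split (compound_id, registration_date, pKi) records into time windows.

    Compounds registered on or before train_end go to the training set, those
    registered up to holdout_end go to the holdout set, and everything later
    is treated as the blind "future" test set.
    """
    train, holdout, test = [], [], []
    for record in sorted(records, key=lambda r: r[1]):
        registered = record[1]
        if registered <= train_end:
            train.append(record)
        elif registered <= holdout_end:
            holdout.append(record)
        else:
            test.append(record)
    return train, holdout, test


# Hypothetical example records (identifiers, dates, and activities are invented).
records = [
    ("cmpd_d", date(2006, 8, 30), 8.8),
    ("cmpd_a", date(2005, 3, 1), 7.2),
    ("cmpd_c", date(2006, 2, 10), 6.5),
    ("cmpd_b", date(2005, 9, 15), 8.1),
]
train, holdout, test = temporal_split(
    records, train_end=date(2005, 12, 31), holdout_end=date(2006, 6, 30)
)
```

Splitting on dates rather than at random is what keeps the evaluation aligned with prospective use: no information from later compounds can leak into the model.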
QuanSA Model Induction. The QuanSA method has been previously described in detail [14] and will be summarized only briefly here, with additional details in the Supporting Information. Figure 1 illustrates the induction of a QuanSA pocket field. Beginning with pure SAR data (here SMILES strings and associated pKi measurements), low-energy conformational ensembles are produced, from which multiple mutual ligand alignments are automatically constructed. These alignments may be influenced (optionally) by provision of known bound ligand poses, and each ligand alignment contains a single optimal pose along with many related alternative poses. The derived pocket field acts as a virtual binding pocket, into which new molecules are flexibly fit, subject to the joint considerations of optimizing ligand interactions with the pocket and minimizing ligand strain.
For all models in this study, training ligands were focused around scaffolds of interest with respect to prediction and, in all cases, the poses of bound ligands were used to drive the initial alignment process. The more general case of diverse scaffolds without the benefit of known bound ligand poses is more challenging, and that has been discussed extensively in prior work [14,18].
Figure 1 shows three representative training molecules (1–3) and one future test molecule (4) from this work. Shown in 3D is the mutual overlay of the final optimal poses of the training molecules in the model. In this example, QuanSA accurately predicted the activity of the new molecule, which was synthesized months after the molecules used for model induction.
Mid-to-Late-Stage Project Application Scenario: LFA-1. LFA-1 is a heterodimeric protein of the integrin family with noncovalently linked α and β subunits and is expressed on the surface of leukocytes [27]. LFA-1 mediates the interactions between leukocytes and other cells and has been pursued as a target for immunological disorder treatments, both by antibodies [28] and with small molecules [29]. The compounds in this work were generated in an effort to identify orally active small molecules that disrupted the LFA-1/ICAM-1 interaction [23,24]. The set consists mostly of bicyclic hydantoins (e.g., compound 2), spirocyclic hydantoins, and spirocyclic pyrrolidines (e.g., compound 1), and all bind competitively to the I-domain allosteric site of LFA-1 and prevent the conformational changes required for ICAM-1 binding.
The LFA-1 structure–activity set contained homogeneous, high-quality assay data, with time stamps available to allow for segregation of data into a training set and a set of future compounds for prediction. Figure 2 (left) depicts the QuanSA model building, model selection, and testing procedure applied to the series of LFA-1 inhibitors. Model selection was done by testing alternative models on a later set of holdout molecules. Figure 2 (right) depicts the procedure for FEP+. QuanSA makes use of a typical machine-learning paradigm, employing training and (optional) holdout sets of molecules, within successive time windows of project activity. FEP+ makes predictions on a set of structural variations of a reference molecule, with the reference here being selected from among the LFA-1 holdout set and the 17 molecules for prediction being chosen from among the 67 molecules from the final project time window.
Figure 2. Preparation and scoring procedures using a temporally segregated set of LFA-1 inhibitors from a medicinal chemistry lead optimization project: QuanSA (left) and FEP+ (right). The QuanSA approach follows a machine-learning paradigm, employing a training set and a holdout set for model selection. The FEP+ approach combines careful force field parameter estimation, molecular docking, and extensive physical simulation.
The selected model had a mean unsigned error (MUE) of 0.56 log unit on the holdout set, corresponding to a Kendall's τ of 0.48 (p < 0.0001). This model was refined using the holdout molecules, resulting in a final fit to the 135 training/holdout molecules of 0.25 log unit MUE and Kendall's τ of 0.86 (p < 0.0001). The refined pocket field (shown in Figure 1) was then used to score the blind test set of 67 future molecules.
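For readers unfamiliar with the two accuracy measures quoted here, the following sketch computes a mean unsigned error and a Kendall rank correlation in plain Python. The tau-b variant (which corrects for ties) is an assumption, since the text does not say which variant was used, and the example activities are invented for illustration.

```python
import math


def mean_unsigned_error(experimental, predicted):
    """Average absolute difference between experimental and predicted pKi."""
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties.

    Assumes at least one untied pair exists in each of x and y; otherwise the
    denominator is zero.
    """
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt(
        (total_pairs - ties_x) * (total_pairs - ties_y)
    )


# Invented example values, not data from the paper.
experimental = [6.0, 7.0, 8.0, 9.0]
well_ranked = [6.5, 6.8, 8.4, 8.9]
one_swap = [6.5, 8.4, 6.8, 8.9]

mue = mean_unsigned_error(experimental, well_ranked)
tau_perfect = kendall_tau_b(experimental, well_ranked)
tau_swapped = kendall_tau_b(experimental, one_swap)
```

The two measures answer different questions: MUE captures accuracy in absolute pKi, while τ captures only whether compounds are ordered correctly, which is why both are reported.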
Figure 3. Plot of experimental activities versus predicted activities from QuanSA for the full set of 67 future test molecules. Test molecules 5–8 have structures significantly different from those of the training compounds, and the plot points for these compounds are highlighted in orange. Also shown are the top pose families and interactions with the pocket field for four example test molecules with the spirocyclic pyrrolidine scaffold (9–12) whose points on the graph are highlighted in blue and are indicated with red arrows.
The plot in Figure 3 shows the experimental activities compared to the QuanSA predicted activities for the full set of 67 future test molecules. QuanSA yielded statistically significant predictions for the full blind test with a τ of 0.57 (95% confidence interval (CI) 0.42–0.69, p < 0.0001) and an MUE of 0.52 log unit (95% CI 0.43–0.63). Lines in the plot indicate perfect prediction and deviations of ±0.7 and ±1.5 units of pKi (corresponding to ±1 and ±2 kcal/mol). Just under 80% of the molecules (53 of 67) were predicted within 0.7 unit of pKi, and just two molecules exceeded 1.5 units.
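The correspondence between pKi units and kcal/mol follows from ΔG = −RT ln(10) · pKi. The sketch below makes the conversion explicit; the temperature of 298.15 K is an assumption, since the text does not state the value used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pki_to_kcal(delta_pki, temperature=298.15):
    """Magnitude of the free energy change (kcal/mol) for a pKi difference."""
    return delta_pki * R_KCAL * temperature * math.log(10)


def kcal_to_pki(delta_g, temperature=298.15):
    """pKi difference corresponding to a free energy magnitude in kcal/mol."""
    return delta_g / (R_KCAL * temperature * math.log(10))


one_unit = pki_to_kcal(1.0)    # about 1.36 kcal/mol per pKi unit near room temperature
inner_band = pki_to_kcal(0.7)  # close to 1 kcal/mol
outer_band = pki_to_kcal(1.5)  # close to 2 kcal/mol
```

At this temperature the 0.7 and 1.5 pKi bands come out slightly below 1 and slightly above 2 kcal/mol, consistent with the rounded values quoted for the plot.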
Figure 3 shows the structures of eight example test molecules. Four test molecules are shown in 2D only (top right, 5–8) with structures significantly different from those of the majority of compounds synthesized late in the project. Compounds 5–7 contain centrally located amines, and compound 8 has a different scaffold. Despite these structural differences, QuanSA accurately predicted the activities of these structurally divergent molecules whose activity spanned a range of 3.5 log units.
Four other test molecules, each of which has the heavily explored spirocyclic pyrrolidine scaffold (9–12), are also shown in Figure 3, along with their top-scoring pose families and interactions with the pocket field. Many of the molecules in the data set varied only in the substitutions on the spirocyclic pyrrolidine nitrogen, as shown for molecules 9–12. The interaction sticks for these molecules with the pocket field closely mimic the interactions observed in the X-ray cocrystal structure of compound 1 with LFA-1 [24]. Most of the interactions were hydrophobic (teal sticks) including those for the dichlorophenyl group itself, which occupies a hydrophobic pocket. The urea carbonyl, thought to be hydrogen bonded via a water molecule, is marked by a prominent red acceptor stick. Compounds 9 and 10 were among the most potent molecules in the test set, and QuanSA accurately predicted the activities despite the negative charge on the R group.
FEP+ Prediction Performance and Hybrid Modeling. The FEP+ approach employs a reference ligand with a known free energy of binding along with a structure of the ligand bound to the protein of interest. From this reference ligand, a set of molecular transformations can be made and arranged into a connected graph such that connected pairs of test molecules have relatively high similarities. For each such connected pair, a calculation of ΔΔGij is carried out, corresponding to a single edge in the graph. To obtain a prediction for a particular molecule, a single edge is the minimal calculation required, though calculation of the full set of ΔΔGij within a perturbation graph and application of cycle-closure corrections can improve the accuracy [9]. In practice, due to the complexity and computational expense of applying the method, single-edged affinity predictions are often employed.
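As a rough illustration of how a single-edge result becomes an affinity prediction, the sketch below converts a relative free energy into a predicted pKi. It assumes the sign convention ΔΔG = ΔG(test) − ΔG(reference), the relation ΔG = −RT ln(10) · pKi, and a temperature of 298.15 K; the function names are hypothetical, and cycle-closure correction is deliberately not shown.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kcal_per_pki_unit(temperature=298.15):
    """Free energy (kcal/mol) equivalent to one pKi unit at this temperature."""
    return R_KCAL * temperature * math.log(10)


def single_edge_pki(reference_pki, ddg_kcal, temperature=298.15):
    """Predicted pKi of a test ligand from one perturbation edge.

    ddg_kcal is assumed to be dG(test) - dG(reference), so a negative value
    means tighter binding and therefore a higher predicted pKi.
    """
    return reference_pki - ddg_kcal / kcal_per_pki_unit(temperature)


def path_pki(reference_pki, edge_ddgs_kcal, temperature=298.15):
    """Prediction along a chain of edges: the relative free energies add up."""
    return single_edge_pki(reference_pki, sum(edge_ddgs_kcal), temperature)


# Invented example: a reference ligand with pKi 8.0 and a computed ddG of -1.0 kcal/mol.
predicted = single_edge_pki(8.0, -1.0)
```

Because every prediction is anchored to the reference ligand's measured activity, any error in that single edge passes directly into the predicted pKi, which is the motivation for cycle closure over a fuller graph when it is affordable.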
We limited our FEP+ predictions to a subset of 17 of the 67 future test molecules that were suitable for single-edged ΔΔG calculations from a single reference ligand. Figure 4 shows the FEP+ reference ligand (13) and four example test molecules (4 and 14–16) from the 17 molecule test subset. All 17 molecules in the subset used with FEP+ had the spirocyclic pyrrolidine core and differed only by the R group at the pyrrolidine nitrogen. Standard Glide MCSS docking [30] was used to establish initial binding modes for the FEP+ calculations (see the Experimental Section for details).
In order to illustrate ligand movement within the LFA-1 allosteric binding site, the results of ensemble docking are also shown in Figure 4. The ensemble docking pose families shown here are consistent across the different compounds: while the spirocyclic core tends to bind in a relatively fixed orientation, there is the potential for conformational variation for the pyrrolidine nitrogen R groups. This conformational variation is consistent with crystal structures, which also suggested that substituents anchored from the five-membered ring system project into solvent [23].
Figure 4. FEP+ reference ligand (13) and four test molecules (4 and 14–16) are shown. FEP+ employs an initial docked pose of the reference molecule in the LFA-1 binding pocket. The top pose family of the reference ligand resulting from ensemble docking using Surflex-Dock is shown to illustrate the potential conformational variation of the ligand in the protein pocket.
Figure 5 shows four examples from comparisons of QuanSA and FEP+ activity predictions on the 17 molecule subset. The final optimal pose families from QuanSA for each of the molecules 4, 14, 15, and 16 follow the motif seen for test compounds 9–12 (Figure 3) with the spirocyclic pyrrolidine scaffold in a relatively fixed position and conformational variation for the R group on the pyrrolidine nitrogen. Also, the pocket-field interactions followed the same pattern, mostly hydrophobic with a prominent acceptor interaction near the urea carbonyl. The possible orientation changes in the R groups are reflected in the starting docked poses for FEP+.
QuanSA predicted the activities of compounds 4, 14, and 15 within 0.5 log unit of activity. FEP+ predictions for these active molecules were slightly less accurate, but still quite good. Note that the orientations of the nitrogen substituents of the pyrrolidine differ between QuanSA and FEP+. This was expected, reflecting the pose variation seen in Figure 4 from ensemble docking of the reference ligand. The QuanSA alignments were driven by mutual similarity, influenced by the crystallographic reference ligand pose toward the "bottom" of the ligands, which shared structural homogeneity. The diversity of substituent orientations seen in the FEP+ poses reflected solvent exposure with sparse protein interactions.
Combining the two methods by averaging their independent predictions (termed "hybrid" model predictions) often led to partial cancellation of errors. For example, for the relatively active molecule 14 and the significantly less active molecule 16, predictions from both methods were off, but the errors were opposite in sign. By combining the results from the two methods, the prediction errors for both molecules were reduced to negligible discrepancies from experimental activity. Note that typical standard deviations in repeated LFA-1 IC50 determinations were approximately 0.1 pKi unit [23,24].
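The error cancellation behind the hybrid prediction can be seen in a small worked example. The averaging rule itself is taken from the text; the activity values below are invented and do not correspond to compounds 14 or 16.

```python
def hybrid_prediction(quansa_pki, fep_pki):
    """Hybrid prediction: the simple average of the two independent predictions."""
    return 0.5 * (quansa_pki + fep_pki)


# Invented values: one method overpredicts and the other underpredicts.
experimental = 8.0
quansa = 8.6  # signed error +0.6
fep = 7.5     # signed error -0.5

hybrid = hybrid_prediction(quansa, fep)

quansa_error = abs(quansa - experimental)
fep_error = abs(fep - experimental)
hybrid_error = abs(hybrid - experimental)  # about 0.05, far smaller than either
```

Since the hybrid's signed error is the mean of the two signed errors, it can never be worse than the poorer method, and it beats both whenever the errors have opposite signs and similar magnitude.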
Figure 6 shows a plot of individual test performance on a subset of 17 ligands for the QuanSA structure-guided model (purple times signs, MUE = 0.44), and for FEP+ (green plus signs, MUE = 0.56) as well as for the combination of the methods (red squares, MUE = 0.25). Hybrid predictions were defined as the average of the QuanSA and FEP+ predictions for each molecule. Using a paired t test, the relatively small difference in prediction errors between QuanSA and FEP+ was not statistically significant (p-value = 0.24). However, the hybrid model performed statistically better than FEP+ alone (p-value = 0.002) and better than structure-guided QuanSA alone (the paired t test p-value of 0.09 just misses weak significance). The signed prediction errors of QuanSA and FEP+ were only slightly correlated (p = 0.04 by Kendall's τ), allowing the hybrid model to exhibit marked improvement.
Early-Stage Project Application Scenario: Sixteen FEP+ Benchmark Targets. Early-stage project application may offer only a handful of data points within a relatively newly identified chemical series. The original FEP+ benchmark [9], here referred to as the Abel benchmark, consisted of eight targets, each with a prediction set ranging from 11 to 42 members (each including a reference compound within the prediction set). More recent benchmarking work, here referred to as the Schindler benchmark [11], consisted of eight targets, each with a prediction set ranging from 24 to 44 members (each including a reference compound within the prediction set).
Structure–activity data within some series were extremely limited, but contemporaneously available structure–activity data and protein structure data were plentiful in other cases. Figure 7 shows how a focused approach to model induction can be applied in cases where sparse data exist within a particular series, but where data from other series can be exploited. We made use of the reference ligand in each case to identify particularly relevant protein–ligand complexes and structure–activity data, where the information was available contemporaneously with the public disclosure of the molecules within the prediction set.
Figure 5. Four examples from comparisons of activity predictions on a 17 molecule subset of the blind test set. For QuanSA, the top pose family for each test molecule plus the interaction sticks of the top pose with the pocket field is shown. For FEP+, the initial docked poses are shown. Hybrid predicted pKi values are the simple average of the QuanSA and FEP+ values.
The eSim 3D molecular similarity method (described in detail previously [31]) was employed to identify particularly relevant protein structures: those whose cognate ligands exhibited high similarities to the FEP+ reference ligand when the protein binding sites were aligned. Then, those bound ligand poses were used to screen for the most structurally similar ligands from the available bioactivity data. Both of the filtering steps were applied to data that were publicly available when each FEP+ benchmark series was made public, through either publication or a patent. Compounds from future years were reserved for testing screening-style extrapolation from the structurally focused data sets.
Focused QuanSA Model Building. Figure 8 illustrates the focused model building process, using SHP2 as an example. An allosteric mechanism for inhibiting SHP2 was published in 2016 [32], with a chemical series related to the initial lead structures being disclosed in a subsequent patent that was granted in 2018 (U.S. Patent 10,093,646), which contained the structure–activity data used for the FEP+ prediction set. There were several cocrystallized allosteric inhibitors available by 2018 (top middle of Figure 8), with some extending quite far beyond the spatial extent of the series of interest. By employing a static eSim similarity measurement between each of the crystallographically aligned ligands and the reference ligand, a filtered subset of relevant bound variants was identified (top right).
Similarly, by 2018, a large number of alternative allosteric inhibitors had been discovered, again with many extending far beyond the reference ligand. In practice, with a physically grounded affinity prediction method such as QuanSA, such a large set of competitive inhibitors dilutes the predictive performance of models within the space that closely encompasses a particular series or set of related series that explore the same area. The full set of known ligands was screened against the multiple-ligand crystallographically derived alignment of relevant bound ligands using the eSim method [31], and those ligands whose scores exceeded a threshold were retained (bottom middle of Figure 8). Finally, the standard process for QuanSA model induction was employed, making use of the relevant bound ligand poses to help constrain generation of initial poses for all ligands. This step may also filter the training molecule pool further on the basis of multiple stages of accumulating ligands that are at first similar to the bound ligands, then those which are similar to the newly aligned ligands, and so forth (see the Experimental Section for additional details). In the case of SHP2, the full pool of known ligands from 2018 and earlier numbered 514, with the eSim-based filtering process against the crystallographic ligands resulting in 51 molecules. The QuanSA alignment initialization's accumulative process retained 15 of 51 from the filtered training pool (bottom right of Figure 8).
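The staged accumulation described above can be pictured with a toy sketch. This is not the eSim or QuanSA implementation: the similarity function is a stand-in (Jaccard overlap of invented feature sets), the threshold is arbitrary, and the function name `accumulate_training_set` is hypothetical. It only shows the general idea of growing a training set outward from the bound ligands in rounds.

```python
def jaccard(a, b):
    """Stand-in similarity: overlap of two feature sets (NOT the eSim method)."""
    return len(a & b) / len(a | b)


def accumulate_training_set(seeds, pool, similarity, threshold):
    """Grow a training set in stages, starting from seed (bound) ligands.

    In each round, pool members that are similar enough to any ligand selected
    in earlier rounds are added; the process stops when a round adds nothing.
    Returns only the newly accumulated ligands, in order of selection.
    """
    selected = list(seeds)
    remaining = list(pool)
    added = True
    while added:
        added = False
        current = list(selected)  # snapshot: compare only against earlier rounds
        still_remaining = []
        for candidate in remaining:
            if any(similarity(candidate, member) >= threshold for member in current):
                selected.append(candidate)
                added = True
            else:
                still_remaining.append(candidate)
        remaining = still_remaining
    return selected[len(seeds):]


# Invented feature sets standing in for ligands.
seeds = [frozenset("abcd")]
pool = [frozenset("abce"), frozenset("abef"), frozenset("wxyz")]
retained = accumulate_training_set(seeds, pool, jaccard, threshold=0.5)
```

In this toy run the second pool member is not similar enough to the seed itself but is picked up in the next round through the first member, while the unrelated third member is never added. That mirrors the way ligands far from the series of interest drop out of the focused training pool.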
Each of the 16 targets underwent the same procedure for focused model building, as just described. Figure 9 illustrates predictions for SHP2 on four representative ligands (bottom row), along with representative training ligands (top row). Prediction values are shown for FEP+ and QuanSA, and the "hybrid" prediction for each ligand is simply the average of those two values. The prediction set is characterized largely by the disposition of the amine (right side of the central core), whether it is primary or secondary, and the characteristics of its connection to the central scaffold, though some variation of the left-hand substituent was also explored. For SHP2, the mean unsigned error for both FEP+ and QuanSA was 0.6 log unit, and the hybrid approach yielded 0.4. The sparse data for model training were able to cover the variations present in the prediction set, and the errors of the two primary approaches partially canceled, allowing for the improvement seen in the hybrid approach.
Figure 6. Comparisons of activity predictions on a 17 molecule subset of the blind test set for QuanSA, FEP+, and hybrid methods.
Figure 7. Preparation and scoring procedures in the early lead optimization scenario, using a bound reference compound to identify relevant additional bound ligands, which are then used collectively to identify a pool of relevant bioactivity data for input to QuanSA model induction.
Figure 10 shows the analogous information for c-MET, where 59 ligands of diverse structural character formed the final focused set for model parameterization. In contrast to SHP2, the available training set consisted of molecules outside the series of interest, and four different heterocyclic cores are present in the training examples shown in Figure 10. The QuanSA approach was able to learn the effects of the various substitutions from alternative scaffolds and to transfer the information to the particular series represented in the prediction set.
Figure 8. Process of constructing a focused QuanSA model from diverse data for SHP2.
Figure 9. Representative examples of predictions from the FEP+, QuanSA, and hybrid approaches for SHP2. Note that many of the SHP2 training compounds came from the same patent and series as the prediction set.
The crucial distinction between QuanSA and other machine-learning methods for affinity prediction is that QuanSA constructs a model that is physically analogous to a protein binding site. Therefore, in order to accurately predict, for example, the quantitative effect of the morpholine of the left-most prediction case, other examples of ligands that place cationic species in the same vicinity in their bound states must be properly modeled in the learning process. The blue interaction stick (red arrow) shows the preference that the pocket field has for an amine that is geometrically disposed as in the optimal pose of this prediction example. It is difficult to understand the effect on binding from a protein structural perspective. The amine appears to be within the solvent, relatively far from an obvious interaction partner. This perhaps explains why the structure-focused FEP+ approach made an underprediction. In this case, the hybrid prediction was quite accurate (just 0.2 pKi unit low). The right-most prediction example shows a case where the hybrid approach did not perform the best of all three, but it significantly improved upon the poorer of the two primary predictions.
It is conceivable, given a sufficiently large quantity of data, that a learning method which ignores the conformational strain and pose of ligands in their bound state could make meaningfully accurate predictions in cases like this. However, for the type of fine-grained guidance represented by these examples, many early-stage lead optimization projects lack such quantities of data. For the most challenging targets, where relevant structure–activity data are the most scarce, methods that can make effective use of data sets measured in dozens of compounds rather than thousands have a clear advantage.
Statistical Analysis for Focused Model Building.
Figure 11 shows plots for the full set of predictions for both
benchmark sets along with the cumulative histograms of
unsigned prediction errors, exhibiting the same type of error
Figure 10. Representative examples of predictions from the FEP+, QuanSA, and hybrid approaches for c-MET.
cancellation seen for LFA-1. The hybrid approach produced a
clear reduction in the fraction of predictions with large errors.
For the Schindler benchmark, the hybrid predictions had fewer
than 20% with errors of 1.0 log unit or greater, compared with
just under 30% for QuanSA and just under 40% for FEP+. For
the Abel benchmark, which consisted of smaller "jumps" than
seen for the Schindler benchmark targets, the hybrid approach
produced roughly 10% of predictions with errors of 1.0 log unit
or greater, with FEP+ yielding just over 30% and QuanSA just
over 20%. For the Schindler benchmark, the unsigned error for
the hybrid predictions was very significantly better than that of
either of the other two methods (p values of 10⁻¹⁰ and 10⁻⁶
compared with FEP+ and QuanSA, respectively, using the
paired t test). For the Abel benchmark, the unsigned error for
the hybrid predictions was very significantly better than that
for FEP+ (p value of 10⁻⁹). Between the hybrid and QuanSA
approaches, the hybrid method's reduction in large errors
would make it preferred among the two, despite the error
distributions not being well-differentiated using the paired t
test.
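The hybrid prediction and the large-error fractions quoted above can be illustrated with a minimal Python sketch. It assumes the hybrid value is the plain average of the two predicted pKi values, as the abstract states; the function names and the example numbers are hypothetical and are not data from the study.

```python
def hybrid_prediction(fep_pki, quansa_pki):
    """Simple average of two predicted pKi values."""
    return 0.5 * (fep_pki + quansa_pki)


def fraction_large_errors(predicted, experimental, cutoff=1.0):
    """Fraction of predictions with unsigned error >= cutoff (log units)."""
    errors = [abs(p - e) for p, e in zip(predicted, experimental)]
    return sum(err >= cutoff for err in errors) / len(errors)


# Made-up illustrative values (pKi); not taken from the benchmarks.
experimental = [6.5, 7.2, 8.0, 5.9]
fep = [7.7, 7.0, 6.8, 6.1]
quansa = [5.9, 7.6, 8.6, 5.5]
hybrid = [hybrid_prediction(f, q) for f, q in zip(fep, quansa)]

for label, preds in (("FEP+", fep), ("QuanSA", quansa), ("hybrid", hybrid)):
    print(label, fraction_large_errors(preds, experimental))
```

In this toy case the two primary methods err in opposite directions on the same compounds, so the averaged values fall closer to experiment than either input.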
Prediction errors for the FEP+ approach in this analysis were
larger than those reported for the original analysis for the Abel
benchmark.⁹ Here, the reference ligand was treated as a
training exemplar, with known absolute ΔG, and the ligands in
Figure 11. Plots of all predictions for each of the three methods for both FEP+ benchmarks along with cumulative histograms of unsigned
prediction error. Lines indicate perfect performance (solid black), 1 kcal/mol error (dashed dark gray), and 2 kcal/mol (dashed light gray).
the prediction set were treated as having unknown activity.
Given the reported ΔΔG values, final ΔG values for the
prediction set were made using the reference ligand's value as
an offset. In the original work, all experimental ΔG values were
used to "center" the predicted values. The more recent analysis
for the Schindler benchmark¹¹ noted this issue, and statistics
were not calculated for deviation from ΔG. Rather, emphasis
was placed on correlation statistics and upon pairwise ΔΔG
error magnitudes (the accuracy of single-edged predictions). In
practice, in a real prediction scenario, the average ΔG of a
prediction set cannot be known. Therefore, our analysis treats
the FEP+, QuanSA, and hybrid approaches in the same
manner, with the reference ligand as part of the "knowns" and
the prediction set as unknowns.
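The difference between the two conventions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the ΔΔG values are taken relative to a single reference ligand, and the centering function is one plausible reading of using all experimental ΔG values to center predictions.

```python
from statistics import mean


def absolute_dg_from_reference(ref_dg, ddg_vs_ref):
    """Reference-offset convention: predicted dG = known reference dG + predicted ddG."""
    return {name: ref_dg + ddg for name, ddg in ddg_vs_ref.items()}


def centered_dg(ddg_vs_ref, experimental_dg):
    """Centering convention (assumed form): shift predictions so that their mean
    equals the mean experimental dG. This needs the experimental answers."""
    names = list(ddg_vs_ref)
    shift = mean(experimental_dg[n] for n in names) - mean(ddg_vs_ref[n] for n in names)
    return {n: ddg_vs_ref[n] + shift for n in names}


# Made-up values in kcal/mol, for illustration only.
ddg = {"lig_a": -1.0, "lig_b": 0.5}
print(absolute_dg_from_reference(-9.0, ddg))
print(centered_dg(ddg, {"lig_a": -9.5, "lig_b": -8.5}))
```

Only the first function is usable prospectively, because the second requires the experimental values of the very compounds being predicted.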
Table 1 shows per-target data set sizes and statistical values
for the three methods, both using rank-correlation (Kendall's τ,
which is not affected by the offset calculation) and MUE. In all
cases, the hybrid approach had the lowest MUE. In five of
eight cases, it also had the highest rank correlation, with FEP+
and QuanSA showing very slightly higher values in the
remaining three cases (two for FEP+ and one for QuanSA). In
no case did the hybrid approach fail to produce a statistically
significant ranking, compared with one failure each for FEP+
and QuanSA (italics). Table 2 shows the analogous data for
the Abel benchmark. Note that, in three cases (MCL1, BACE,
and P38), a random half of the original prediction set was used
for training (marked with asterisks in Table 2; see the
Experimental Section for details). The pattern was similar to
that observed for the Schindler set, with the hybrid method
Table 1. Per-Target Performance of FEP+, QuanSA, and Hybrid Approaches and Data Set Sizes for the Eight Targets of the
Schindler Benchmarkᵃ
          N                                 mean unsigned error                                      Kendall's τ
target    pred  full pool  filtered  final  FEP+               QuanSA             hybrid             FEP+               QuanSA             hybrid
SHP2      25    514        51        15     0.58 (0.41–0.76)   0.61 (0.41–0.84)   0.40 (0.27–0.53)   0.69 (0.46–0.87)   0.43 (0.01–0.75)   0.72 (0.47–0.90)
PFKFB3    39    489        34        34     1.08 (0.86–1.30)   0.72 (0.57–0.88)   0.45 (0.33–0.58)   0.70 (0.56–0.82)   0.50 (0.31–0.66)   0.73 (0.62–0.84)
SYK       43    1827       18        18     0.63 (0.48–0.81)   0.62 (0.48–0.78)   0.49 (0.37–0.62)   0.34 (0.07–0.59)   0.13 (−0.12–0.37)  0.35 (0.09–0.60)
HIF2a     41    63         30        29     0.70 (0.53–0.88)   0.82 (0.63–1.02)   0.58 (0.44–0.74)   0.54 (0.27–0.77)   0.42 (0.18–0.64)   0.51 (0.25–0.72)
TNKS2     27    541        150       143    0.86 (0.59–1.15)   0.74 (0.60–0.89)   0.64 (0.45–0.83)   0.34 (0.04–0.65)   0.55 (0.26–0.76)   0.49 (0.16–0.75)
c-MET     23    176        62        59     1.07 (0.80–1.34)   0.82 (0.66–0.99)   0.71 (0.54–0.90)   0.82 (0.66–0.93)   0.68 (0.51–0.83)   0.85 (0.73–0.95)
CDK8      32    130        60        60     0.96 (0.68–1.25)   0.99 (0.71–1.30)   0.90 (0.68–1.16)   0.66 (0.42–0.87)   0.45 (0.19–0.68)   0.66 (0.44–0.85)
EG5       27    147        34        34     1.08 (0.90–1.26)   1.09 (0.84–1.32)   0.96 (0.78–1.15)   0.73 (0.53–0.89)   0.47 (0.09–0.77)   0.67 (0.40–0.89)
mean      32.1  485.9      54.9      49.0   0.87 ± 0.21        0.80 ± 0.17        0.64 ± 0.21        0.60 ± 0.18        0.45 ± 0.16        0.62 ± 0.16
ᵃ Unsigned error is in units of pKi, and Kendall's τ values are unitless. Numbers in parentheses are 95% confidence intervals calculated by
resampling with replacement, bolded values are the best from any method, and values shown in italics did not meet statistical significance at the p =
0.01 level. The values in the bottom row are the mean and standard deviation for the respective statistical measurement column.
Table 2. Per-Target Performance of FEP+, QuanSA, and Hybrid Approaches and Data Set Sizes for the Eight Targets of the
Abel Benchmarkᵃ
          N                                 mean unsigned error                                      Kendall's τ
target    pred  full pool  filtered  final  FEP+               QuanSA             hybrid             FEP+               QuanSA             hybrid
thrombin  10    2401       74        74     0.55 (0.34–0.81)   0.42 (0.23–0.65)   0.28 (0.18–0.41)   0.63 (0.16–1.00)   0.63 (0.27–1.00)   0.85 (0.45–1.00)
MCL1*     20    170        35        34     0.78 (0.53–1.05)   0.30 (0.15–0.49)   0.41 (0.25–0.57)   0.58 (0.13–0.89)   0.73 (0.35–1.00)   0.70 (0.36–0.94)
BACE*     17    1705       93        93     0.98 (0.75–1.21)   0.30 (0.20–0.41)   0.46 (0.34–0.60)   0.77 (0.54–0.94)   0.62 (0.22–0.90)   0.81 (0.54–0.98)
P38*      16    1901       92        84     0.66 (0.38–0.95)   0.62 (0.39–0.87)   0.49 (0.34–0.65)   0.58 (0.22–0.85)   0.13 (−0.35–0.62)  0.64 (0.31–0.89)
PTP1b     22    528        53        41     0.78 (0.59–0.97)   0.96 (0.62–1.31)   0.54 (0.37–0.73)   0.81 (0.53–0.99)   0.27 (0.13–0.60)   0.65 (0.37–0.88)
CDK2      15    86         43        43     0.69 (0.49–0.89)   0.84 (0.52–1.17)   0.61 (0.39–0.83)   0.29 (0.23–0.76)   0.67 (0.33–0.92)   0.71 (0.32–0.97)
TYK2      15    124        48        48     0.62 (0.45–0.84)   0.80 (0.49–1.14)   0.69 (0.50–0.89)   0.71 (0.34–0.98)   0.53 (0.11–0.88)   0.78 (0.45–1.00)
JNK1      20    155        68        55     1.38 (1.04–1.71)   0.46 (0.27–0.70)   0.74 (0.57–0.90)   0.89 (0.69–1.00)   0.64 (0.36–0.85)   0.88 (0.65–1.00)
mean      16.9  883.8      63.3      59.0   0.81 ± 0.21        0.59 ± 0.17        0.53 ± 0.21        0.66 ± 0.18        0.53 ± 0.16        0.75 ± 0.16
ᵃ Unsigned error is in units of pKi, and Kendall's τ values are unitless. Numbers in parentheses are 95% confidence intervals calculated by
resampling with replacement, bolded values are the best from any method, and values shown in italics did not meet statistical significance at the p =
0.01 level. The values in the bottom row are the mean and standard deviation for the respective statistical measurement column.
producing the best performance, either by MUE or by rank
correlation, though the advantage of the hybrid approach over
the QuanSA method was smaller.
The average Kendall's τ values over all 16 targets for the
three methods were as follows: 0.63 (FEP+), 0.49 (QuanSA),
and 0.69 (hybrid). None of the per-target rank-correlation
differences between methods were statistically significant at p =
0.01 by the paired t test due to the relatively small number of
targets. The statistical power is also limited by the fact that
each individual data set is relatively small, and several are
dominated by a narrow experimental assay range, so the
correlation statistics tend to have high variance. The values of
the average per-target unsigned error for the three methods
were 0.84 (FEP+), 0.69 (QuanSA), and 0.58 (hybrid). By the
paired t test, the per-target hybrid MUE was consistently lower
than those for FEP+ (p < 10⁻³) and QuanSA (p = 0.02). This
agreed with the analysis of the unsigned prediction error across
the ligands within the Schindler benchmark (N = 257) and the
Abel benchmark (N = 135), which offer more statistical power
to differentiate between the methods (see Figure 11).
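The per-target statistics discussed here (mean unsigned error, Kendall's τ, and confidence intervals from resampling with replacement) can be sketched with the Python standard library alone. The details are assumptions made for illustration: the τ-b variant with direct pair counting and the percentile form of the bootstrap are choices of this sketch, since the text specifies only resampling with replacement, and all function names are hypothetical.

```python
import math
import random
from statistics import mean


def mean_unsigned_error(pred, expt):
    """Mean of |predicted - experimental| over all ligands."""
    return mean(abs(p - e) for p, e in zip(pred, expt))


def kendall_tau(x, y):
    """Kendall's tau-b by direct pair counting (O(n^2)); adequate for small sets."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: contributes to neither term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


def bootstrap_ci(stat, pred, expt, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(pred, expt), resampling ligands."""
    rng = random.Random(seed)
    n = len(pred)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = stat([pred[i] for i in idx], [expt[i] for i in idx])
        if not math.isnan(value):  # skip degenerate resamples
            values.append(value)
    values.sort()
    lower = values[int((alpha / 2) * (len(values) - 1))]
    upper = values[int((1 - alpha / 2) * (len(values) - 1))]
    return lower, upper
```

With prediction sets of only one to four dozen ligands, intervals computed this way are wide, which is consistent with the observation that per-target correlation statistics have high variance.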
With respect to the sizes of the bioactivity data sets, we see
that the typical size of the nominally available bioactivity data
was in the hundreds of molecules. However, only roughly
one-tenth of these survived the filter of relevance against the bound
Figure 12. Plots for all eight targets of the Schindler FEP+ benchmark. FEP+ shown in green plus signs, QuanSA shown in violet times signs, and
hybrid predictions shown in red squares, with a single gray circle marking the activity of the FEP+ reference ligand (treated as part of the training
set). In addition, a histogram of signed prediction errors is shown in the lower right.
known ligand poses, which themselves were filtered for
relevance. The final focused set of structure–activity data for
each target ranged from 15 (SHP2) to 143 (TNKS2),
averaging about 50 for the Schindler benchmark and about
60 for the Abel benchmark. Data requirements of this scale for
the final models are generally within the scope of lead
optimization projects quite early on in the exploration of a new
chemical series.
Figures 12 and 13 show individual plots for all predictions
by each method for each target along with histograms of the
signed prediction error values. The histograms showed a
marked decrease in errors of large magnitude (either over- or
underpredictions) by the hybrid method (shown in red). None
of the methods exhibited a systematic bias, with all histograms
being centered very close to zero.
During fine-grained lead optimization, while the rank order
of synthetic candidates is clearly important, the absolute
accuracy of affinity predictions takes on additional importance.
For example, in the case of PFKFB3, the reference ligand had a
pKi of roughly 6.5. Consider the predictions for the 39
molecules of the test set as nominal true positives (TPs,
predicted and experimental ≥ reference), true negatives (TNs,
predicted/experimental < reference), false positives (FPs,
predicted ≥ reference, experimental < reference), and false
Figure 13. Plots for all eight targets of the Abel FEP+ benchmark. FEP+ shown in green plus signs, QuanSA shown in violet times signs, and hybrid
predictions shown in red squares, with a single gray circle marking the activity of the FEP+ reference ligand (treated as part of the training set). In
addition, a histogram of signed prediction errors is shown in the lower right.
negatives (FNs, predicted < reference, experimental ≥
reference). Both FEP+ and QuanSA produced good rankings
(0.70 and 0.50 by Kendall's τ, respectively, both with p < 0.01).
The experimental data contained 20 positives and 19 negatives.
FEP+ correctly identified 20 of 20 TPs, but at the expense of 10
of 19 FPs. QuanSA correctly identified 7 of 20 TPs, but it did
so with 0 of 19 FPs. The hybrid approach obtained 20 of 20
TPs with just 4 of 19 FPs. The hybrid method also produced
the best Kendall's τ (0.73, a marginal increase).
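The classification relative to the reference ligand can be expressed as a short Python sketch. It treats a pKi at or above the reference value as a positive; that inclusive comparison is an assumed convention, and the function name and example values are hypothetical.

```python
def classify_against_reference(predicted, experimental, reference):
    """Count TP/TN/FP/FN, calling a compound positive when its pKi is
    at or above the reference ligand's pKi (assumed convention)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for pred, expt in zip(predicted, experimental):
        pred_pos = pred >= reference
        expt_pos = expt >= reference
        if pred_pos and expt_pos:
            counts["TP"] += 1
        elif pred_pos:
            counts["FP"] += 1
        elif expt_pos:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Made-up pKi values around a reference of 6.5, for illustration only.
print(classify_against_reference([7.0, 6.0, 7.0, 6.0], [7.0, 6.0, 6.0, 7.0], 6.5))
```

Counts of this kind depend on absolute accuracy near the reference value, which is why two methods with similar rank correlation can give very different numbers of false positives.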
A different effect was seen with the case of SYK. None of the
methods yielded a high-quality ranking, but this was an effect
of the data distribution. Both primary methods produced relatively
accurate results on an absolute scale (MUE of 0.6 each), with a
relatively small fraction of predictions being off by more than 2
kcal/mol (9 for FEP+ and 4 for QuanSA, of
43 total predictions). The hybrid approach made an improvement
(MUE of 0.5), with 36 of 43 molecules predicted within
1 kcal/mol of experiment and just 1 of 43 with an error of
greater than 2 kcal/mol. Nevertheless, the rank correlation was
marginal, reflecting the limitations of rank-based statistics in
such cases.
Predicted inhibitory activity relative to the current project
context may have a significant influence on decisions about
which candidate molecules to synthesize. In particular, a
smaller proportion of large absolute errors will provide better
guidance if rank correlation is equivalent between two
prediction methods and, possibly, even if rank correlation is
slightly worse for the method with better absolute fidelity.
Overall, the hybrid approach appears to be the best choice in
terms of achieving accurate absolute binding affinities or
rankings thereof. Across all 16 targets, with respect to MUE, it
was the best approach in 12 of 16 cases and second best by a
small margin in 4 of 16 cases. With respect to ranking, it was
either the best approach (10 of 16) or second best by a small
margin (6 of 16). Beyond the per-target performance, as seen
from the cumulative histograms of unsigned errors in Figure 11
and the histograms of signed errors in Figures 12 and 13, the
hybrid approach made a marked improvement in terms of the
frequency of large errors, both for overpredictions and for
underpredictions. Nearly 70% of the time, hybrid predictions
were within 1 kcal/mol of experiment, and errors of 2 kcal/mol
or greater occurred 5% of the time or less.
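Because errors are quoted both in pKi units and in kcal/mol, a small conversion sketch may help. It uses the standard relation ΔG = −RT ln(10) × pKi; the temperature of 298.15 K is an assumption of this sketch (the temperature is not stated in this section), and the function names are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def kcal_per_log_unit(temperature=298.15):
    """Free energy (kcal/mol) corresponding to one pKi unit: RT ln(10)."""
    return R_KCAL_PER_MOL_K * temperature * math.log(10)


def pki_to_dg(pki, temperature=298.15):
    """Binding free energy in kcal/mol for a given pKi (more negative = tighter)."""
    return -kcal_per_log_unit(temperature) * pki


def kcal_to_log_units(kcal, temperature=298.15):
    """Express an error in kcal/mol as an error in pKi units."""
    return kcal / kcal_per_log_unit(temperature)


print(round(kcal_per_log_unit(), 3))     # about 1.364 kcal/mol per log unit
print(round(kcal_to_log_units(1.0), 3))  # 1 kcal/mol in pKi units
print(round(kcal_to_log_units(2.0), 3))  # 2 kcal/mol in pKi units
```

At this temperature, an error of 1 kcal/mol corresponds to roughly 0.73 pKi unit and 2 kcal/mol to roughly 1.47 pKi units.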
Extrapolation with QuanSA: Identification of Novel
Scaffolds and Linkers. Because the QuanSA method can be
applied automatically and rapidly, QuanSA pocket fields can be
used to screen large numbers of candidate molecules. We
explored the ability of the induced models to identify novel
active molecules from ChEMBL data, where the publication
dates of the reports of the new molecules were strictly later (by
year) than the data on which models were constructed. This
approach to data segregation makes it very unlikely that
information about the "new" molecules would have been
known and used in designing the molecules used to construct
the models. The converse, of course, is desirable: to see how
well a model can identify novel actives whose structures may
be, in part, reflected in the structures known at the time of
model construction.
We assessed the screening utility of the models for
identifying novel molecules by establishing thresholds on
minimum predicted activity (6.0 pKi units) and on the raw
nearest-neighbor similarity (0.60 eSim unit) of a screened
molecule to a training molecule, both in their predicted
optimal poses. In the project application scenarios previously
discussed, the eSim nearest-neighbor similarity was very high
(averages of 0.87, 0.89, and 0.79 for the LFA-1, Abel, and
Schindler sets, respectively), with only a single LFA-1
prediction molecule having an eSim score of less than 0.60,
none within the Abel set, and fewer than 5% within the
Schindler set. Note that, especially for structurally divergent
molecules, it is not expected that the activity predictions will be
as accurate as for ligands within the focus of the models.
Rather, a critical feature of the selection criteria is that they
identify a small fraction of false positives, as the space of
candidates to be explored may be large. In order to establish
specificity, we also screened a decoy set of 1000 drug/lead-like
ZINC molecules, with the entire set presumptively defined as
false positives. For the 17 targets, in 12 cases, 1 or fewer of the
1000 decoys met the thresholds; in three cases, two to four
false positives existed; and in two cases (TNKS2 and CDK8),
the estimated FP rate was 1–3%.
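The two-threshold screen and the decoy-based estimate of the false positive rate can be sketched as follows. The threshold values come from the text; the inclusive comparisons, the function names, and the tuple layout of the inputs are assumptions made for illustration and do not describe the actual software interface.

```python
def passes_screen(pred_pki, nn_esim, min_pki=6.0, min_esim=0.60):
    """True when a screened molecule meets both the predicted-activity and the
    nearest-neighbor eSim thresholds (inclusive comparisons are assumed)."""
    return pred_pki >= min_pki and nn_esim >= min_esim


def decoy_false_positive_rate(decoy_scores):
    """Fraction of decoys passing the screen; every decoy is presumed inactive.

    decoy_scores: iterable of (predicted pKi, nearest-neighbor eSim) pairs.
    """
    scores = list(decoy_scores)
    hits = sum(passes_screen(pki, esim) for pki, esim in scores)
    return hits / len(scores)


# Made-up scores for four hypothetical decoys.
decoys = [(5.1, 0.42), (6.3, 0.55), (6.8, 0.71), (4.9, 0.66)]
print(decoy_false_positive_rate(decoys))  # one of four passes in this toy case
```

Requiring both criteria at once is what keeps the pass rate low: a decoy must be predicted active and also resemble a training molecule in its predicted pose.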
For the LFA-1 case, just 44 ChEMBL molecules existed to
be screened as temporally prospective candidates, but none of
the molecules passed both criteria. Of the 16 FEP+ benchmark
targets, "future" data existed in ChEMBL for all but PFKFB3
to assess extrapolation utility. For these 15 targets, in all cases
except PTP1b, new active ChEMBL molecules were identified,
ranging from a handful (e.g., CDK8, HIF2a, SHP2, JNK1, and
TYK2) to dozens or hundreds (c-MET, SYK, TNKS2, BACE,
MCL-1, and P38).
Figure 14 shows examples from the Schindler target set, one
each for SHP2, c-MET, and SYK. In each case, the automatic
prediction of bound pose is important in establishing the
relationship between the novel compound and those forming
the training set. A notable example was observed for c-MET,
where the new compound was predicted to have greater
activity than any of the training molecules, and it was highly
active. This new molecule makes use of the triazolopyrazine at
right,³³ but it contained a novel linker.
Figure 15 shows examples from the Abel benchmark target
set, one each for BACE, MCL-1, and P38. These follow the
same pattern: predictions of target-specific activity that depend
upon identifying low-energy conformations of complex small
molecules that align with the predicted binding modes of
modeled ligands. Of particular note was MCL1. Here, a
macrocyclic linkage for a highly active inhibitor³⁴ was
identified.
Computational Time Complexity. The QuanSA and
FEP+ methods have quite different time requirements for
calculations. FEP+ has been optimized for GPU-based
acceleration, and calculations for this study were performed
by using computing nodes running Intel Xeon E7-8867 v3
CPUs (2.5 GHz, 16 core), with an NVIDIA Tesla K40c GPU
dedicated to each node. The most time-consuming preparatory
calculation is the estimation of custom force field parameters
for unparameterized torsions. In this case, the calculation
required approximately 1 h per molecule for each of the 17
molecules studied. Following force field parameterization, each
single-edged ΔΔG calculation (i.e., a prediction for a single
molecule) required just over 2 h. Because each new molecule
may require force field parameterization, the expectation is
roughly 3 h of wall-clock time per molecule. This allows for
synthetic prioritization during late-stage lead optimization
given an appropriate computing infrastructure, where
predictions on different molecules can be processed in parallel.
For QuanSA, the process of model induction and selection is
the preparatory calculation, but it is done a single time prior to
applying a model to many new ligands. The process consists of
conformational search, mutual ligand alignment, model
parameter learning, and model selection. The QuanSA method
makes use of multicore laptops and workstations for all
calculations, with the reference system containing dual Intel
Xeon Gold 6154 CPUs (3.00 GHz, 36 cores total) with no
GPU-based acceleration. For LFA-1, a fairly typical case, the
entire model induction and selection process required
approximately 3 h of real wall-clock time, the majority of
which was spent in the model parameter learning step for each
of several alternative alignment hypotheses. Calculations of
predicted binding poses and scores for new molecules required
an average of less than 10 s per molecule (including both
conformational search and fitting into the QuanSA pocket
field). Screens of very large numbers of possible design ideas or
virtual synthetic libraries are possible with the QuanSA
method, either employed directly or after using a similarity-
based method as a prescreen.
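The practical consequence of these timings can be illustrated with a back-of-the-envelope calculation. This sketch uses the rounded figures quoted above (about 3 h per molecule for FEP+, about 3 h of one-time model induction plus at most 10 s per molecule for QuanSA); the function names and the simple batching model for parallel nodes are assumptions made for illustration.

```python
import math


def fep_wall_clock_hours(n_molecules, hours_per_molecule=3.0, parallel_nodes=1):
    """Rough wall-clock estimate when each node handles one molecule at a time."""
    batches = math.ceil(n_molecules / parallel_nodes)
    return batches * hours_per_molecule


def quansa_wall_clock_hours(n_molecules, induction_hours=3.0, seconds_per_molecule=10.0):
    """One-time model induction plus per-molecule scoring (upper-bound estimate)."""
    return induction_hours + n_molecules * seconds_per_molecule / 3600.0


print(fep_wall_clock_hours(17))                     # 17 molecules on a single node
print(fep_wall_clock_hours(17, parallel_nodes=17))  # one node per molecule
print(quansa_wall_clock_hours(360))                 # induction plus 360 molecules
```

The fixed induction cost dominates the QuanSA estimate for small sets, so its per-molecule advantage grows with the size of the library being scored.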
CONCLUSIONS
Using temporally segregated series of molecules from a lead
optimization project and from two public affinity-prediction
benchmarks, we have explored the performance of orthogonal
affinity prediction approaches. The QuanSA approach¹⁴
constructs binding-site models based on ligand structure and
affinity data, using multiple-instance machine learning. The
models are physically sensible in that they model the
protein–ligand interaction in a manner analogous to the physical
process of binding. Given a QuanSA pocket field, the process
of scoring a new molecule is analogous to docking to a protein
structure: conformational search, alignment to the pocket field,
and optimization of final poses with respect to conformational
strain and interaction score. A parallel set of experiments was
carried out by using the physics-based simulation approach
FEP+ on LFA-1 and by making use of previously published
data for two benchmarks comprising 16 additional targets.⁹,¹¹
For the LFA-1 lead optimization set, the QuanSA model's
predicted affinities for 67 future compounds had an average
error of 0.4 pKi unit (with highly significant rank correlation
statistics). For the 17 compounds on which FEP+ calculations
were made, the average prediction error was marginally worse
than that for QuanSA. More importantly, by combining the
QuanSA and FEP+ predictions into a hybrid model, the
average error dropped significantly, to less than 0.3 pKi unit,
with a very high correlation between experimental and
Figure 14. Examples of novel ligands for each of three targets from
the Schindler FEP+ benchmark identified through temporally
prospective screening of focused QuanSA models.
Figure 15. Examples of novel ligands for each of three targets from
the Abel benchmark identified through temporally prospective
screening of focused QuanSA models.
predicted affinities. The errors that the modeling approaches
made in predictions were only slightly correlated, so the hybrid
model exhibited error cancellation.
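The link between weakly correlated errors and error cancellation can be made concrete with a short variance calculation. This is a sketch under stated assumptions rather than an analysis from the study: e_F and e_Q denote the FEP+ and QuanSA prediction errors, with standard deviations σ_F and σ_Q and correlation ρ (all symbols are introduced here for illustration), and the hybrid error e_H is taken as their simple average.

```latex
\begin{align}
e_H &= \tfrac{1}{2}\,(e_F + e_Q), \\
\operatorname{Var}(e_H) &= \tfrac{1}{4}\left(\sigma_F^2 + \sigma_Q^2 + 2\rho\,\sigma_F\,\sigma_Q\right), \\
\sigma_F = \sigma_Q = \sigma \;&\Rightarrow\; \operatorname{Var}(e_H) = \tfrac{1}{2}\,(1+\rho)\,\sigma^2 .
\end{align}
```

With equal error spreads, any ρ below 1 gives the average a smaller error variance than either method alone, and ρ near 0 roughly halves it.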
This pattern of synergy between the two approaches, due to
partial error cancellation, was mirrored across 16 additional
examples derived from public FEP+benchmarking data. The
machine-learning strategy involved choosing the most relevant
data to inform model construction, rather than simply
aggregating a large amount of data. The QuanSA method is
physically driven, and unnecessarily large data sets tend to
dilute the information required for accurate affinity prediction.
Both on a per-target basis and in terms of aggregate error
analysis on nearly 400 compound affinity predictions, the
hybrid approach (averaging the predictions from FEP+and
QuanSA) performed better than the two primary methods,
particularly with respect to a reduction in large-magnitude
errors. Given the consistency of the results presented here, we
expect this to be true quite often, perhaps even for the majority
of targets across the range of early-stage and mid-to-late-stage
lead optimization projects.
Two aspects of the learned QuanSA models are important
by way of contrast to FEP+. First, application of the models is
computationally inexpensive. Second, the models are relatively
insensitive to molecular scaffold changes that may be
inappropriate for application of a free energy perturbation
method. Taken together, this allows for use cases such as
screening large numbers of virtual compounds or searching for
scaffold alternatives.
Currently, in cases where biophysical data are available,
application of FEP+and related methods is often done
exclusive of machine-learning approaches. The results
presented here argue that a simple and direct approach can
improve upon either single-mode method. Apart from the
synergy of the numerical predictions, the complementarity of
the methods is multifaceted. One can be applied in low
throughput on relatively close-in analogues at a high
computational cost. The other can be readily applied in
those cases to provide an orthogonal prediction, but it can also
be applied across a wider range of chemical space, and very
large sets of potential ligands can be processed. Application of
QuanSA models across the targets to identify future active
ChEMBL ligands was successful in many cases, with generally
very low estimated false positive rates. The QuanSA
predictions yield a prediction of binding pose, rather than a
black box just producing a number, which lends confidence
where synthetic effort requires justification and also can
stimulate the design process.
In terms of assessment for either simulation-based or
machine-learning methods, studies have only rarely been
published utilizing time-stamped lead optimization data. This
type of application is the most realistic approach to validation
that is possible without fully prospective experiments, and it is
more likely to reflect future real-world prospective performance
than other validation schemes. Here, accurate predictions
were obtained at both the short time scale (compound
registration dates) and a longer time scale (data disclosure
years).
As a general matter for application in lead optimization, we
believe that the dichotomy between physics-based simulation
methods and those from machine learning has been driven
largely by the nonphysical assumptions of traditional QSAR
approaches. The QuanSA approach moves much closer to
physics-based simulation in terms of its underlying mechanics.
The approaches appear to be complementary, both in the
sense of orthogonality of prediction errors and in terms of their
domains of applicability. Medicinal chemistry projects within
most stages of structure-enabled lead optimization could
benefit from both types of approaches, in combination and to
serve complementary goals.
EXPERIMENTAL SECTION
This is not primarily a report of new methods. As such, data
curation and computational protocols will be described, with
references to detailed reports of their algorithmic underpinnings,
implementation, and validation. All molecular and
activity data along with computational procedures for the
public benchmarks comprising the bulk of this study are freely
available (for details, see Supporting Information and Data and
Software Availability).
Molecular and Activity Data. LFA-1. A total of 202
compounds from a lead optimization project formed the data
set. Molecules were provided as SMILES strings with
registration dates and associated activities. The molecules
were sorted by registration date and segregated temporally,
with the oldest third as train molecules (n = 67), the next third
as holdout molecules (n = 68), and the most recent third as
test molecules (n = 67). Standard procedures were used to
convert 2D to 3D structures, protonate the molecules as
expected at physiological pH, and perform a conformational
search. There were 57 molecules having exactly one
unspecified chiral center, and these were prepared as racemic
mixtures.
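The temporal segregation described above can be sketched in a few lines of Python. The record layout, a list of (name, registration date) pairs, and the function name are hypothetical; the split sizes are passed in explicitly, so the 67/68/67 division of the 202 compounds would be requested as `(67, 68, 67)`.

```python
from datetime import date


def temporal_split(records, sizes):
    """Sort records by registration date and cut them into consecutive groups.

    records: list of (name, registration_date) pairs (hypothetical layout).
    sizes:   group sizes from oldest to newest, e.g. (67, 68, 67).
    """
    if sum(sizes) != len(records):
        raise ValueError("group sizes must add up to the number of records")
    ordered = sorted(records, key=lambda rec: rec[1])
    groups, start = [], 0
    for size in sizes:
        groups.append(ordered[start:start + size])
        start += size
    return groups


# Made-up compounds and dates, for illustration only.
compounds = [
    ("cmpd_c", date(2020, 3, 1)),
    ("cmpd_a", date(2020, 1, 1)),
    ("cmpd_b", date(2020, 2, 1)),
    ("cmpd_d", date(2020, 4, 1)),
]
train, holdout, test = temporal_split(compounds, (1, 2, 1))
print([name for name, _ in train], [name for name, _ in holdout], [name for name, _ in test])
```

Splitting by date rather than at random keeps every test compound strictly newer than anything used for training or model selection.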
ChEMBL target 1803, human LFA-1, had 131 small
molecule ligands with associated IC50 activities. The set was
filtered to exclude ligands that were present in the set of 202
project compounds and those that did not bind at the allosteric
binding site. The 44 remaining ligands formed the ChEMBL
extrapolation set, representing three LFA-1 pharmaceutical
industry efforts: 32 from ref 35, 10 from ref 36, and 2 from the
same series of molecules in our 202 ligand set.²⁴ The 1000
ZINC molecule set used to establish model specificity has been
used previously in the same manner.¹⁴,¹⁸
FEP+ Benchmark Data Sets. Data comprising 16 prediction
sets was derived from the Supporting Information of two FEP+
benchmarking studies.⁹,¹¹ Data for model induction was
curated from ChEMBL and, where necessary, from the original
publications cited within that Supporting Information. For
each target, where N molecules were indicated as comprising
the prediction set, the reference molecule was removed from
the predictions to be made and added to the QuanSA model
induction sets. The reference molecule in its bound state was
also used to identify related contemporaneous PDB structures
from a mutually aligned set, as described previously.³⁷
For both benchmark data sets, the procedure outlined in
Figure 7 made use of default eSim thresholds of 6.5 for building
focused models. For the Schindler benchmark, in cases
where this threshold yielded too few training molecules or
poor prediction set coverage based on QuanSA scoring quality
measures, the thresholds were reduced. To test an alternative
strategy, for the Abel benchmark, in three cases (BACE,
MCL1, and P38) half of the N − 1 molecules for each
prediction set were randomly selected and added to the
respective training sets. For the Schindler benchmark, there
were, on average, 32.1 prediction examples per target (total of
257). For the Abel benchmark, there were 16.9 examples per
target (total of 135). Tables 1 and 2 provide total numbers of
compounds for each target (prediction set, full possible
training pool, pretraining filtered pool, and final number of
training compounds).
ChEMBL data representing "future" molecules were
collected for each of the 16 targets using ChEMBL target
identifiers, and the year of literature or patent publication was
used to temporally segregate the extrapolation sets from the
model induction sets. These were further filtered to remove
data used for model induction (in rare cases, a training
compound appeared in future-year publications). The decoy
set used to establish specificity was the same as that used for
LFA-1.
Computational Procedures for QuanSA Model Induction
and Prediction. The QuanSA method is the
successor to the QMOD approach.¹⁴ QuanSA builds a pocket
field rather than constructing a physical set of probes. QuanSA
brings together ideas and algorithms from molecular
similarity³¹ and multiple instance learning together with
many lessons learned and features developed with the
QMOD approach.¹⁸ The methods papers are comprehensive,
and the Supporting Information contains a more detailed
algorithmic description than the foregoing without recapitulating
the prior publications. Standard procedures were employed
for QuanSA model induction (Surflex Platform releases 5.001
for LFA-1 and 5.122 for the Schindler and Abel benchmarks,
BioPharmics LLC, Sonoma County, CA, 2021).
LFA-1. The results reported here were generated using
version 5 of the Surflex Platform. Surflex-Dock was used to
align X-ray crystallographic protein structures and to perform
ensemble docking as an independent means to analyze ligand
pose. Ligand preparation was carried out with standard
procedures using the -pquant level of conformer elaboration.
This produced up to 1000 conformers for each training ligand
(though typically far fewer for LFA-1 compounds). For the 57
molecules with a single unspecified chiral center, the -enum
chiral 1 parameter resulted in the molecules being racemized.
The ForceGen methodology has been described in detail
previously.³⁸,³⁹
The QuanSA initialization procedure (init) automatically
builds multiple initial alignments, but it can be influenced by
user knowledge and guidance. The -clknown parameter
specifies a set of known poses for competitive ligands, in this
case an alignment of six crystallographic ligands. The searched
conformer database along with a file containing molecule
names and associated activity information formed the input to
the initialization procedure, which was done in the standard
manner. After initialization, model building was carried out for
the top five alignments produced by the initialization
procedure, followed by model selection, informed by the use
of a holdout set. After selection of the model for prospective
application, the holdout molecules were then added to refine
the selected model, as outlined in Figure 2.
FEP+ Benchmark Data Sets. As for application to LFA-1,
standard procedures were used for model induction and for
scoring the prediction sets. The primary variation in the
application to the individual targets was in filtering the training
pool of data for each target (as described above). For the
targets SYK, EG5, and CDK8, the initial threshold for ligand
similarity to known bound ligands was reduced from the
standard value of 6.5 in order to yield sufficient training data
for adequate coverage of the prediction sets based on the
quality metrics produced in the scoring procedure.
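The idea of relaxing the similarity threshold can be sketched as follows. This is a simplified illustration only: the actual decision was based on coverage quality metrics from the scoring procedure, whereas the sketch uses a plain count of retained training ligands. The step size, the minimum count, the ligand names, and the similarity values are all hypothetical; only the starting value of 6.5 comes from the text.

```python
def relax_threshold(similarities, min_count, start=6.5, step=0.5, floor=0.0):
    """Lower the similarity threshold until enough training ligands pass.

    similarities maps a ligand name to its similarity score against the
    known bound ligands. Returns (threshold_used, selected_names).
    The count-based stopping rule is an assumption for illustration.
    """
    threshold = start
    while threshold > floor:
        selected = [name for name, s in similarities.items() if s >= threshold]
        if len(selected) >= min_count:
            return threshold, selected
        threshold -= step
    # Nothing sufficed above the floor: return whatever passes the floor.
    selected = [name for name, s in similarities.items() if s >= floor]
    return floor, selected


# Hypothetical candidate training pool.
pool = {"lig_a": 7.2, "lig_b": 6.6, "lig_c": 6.1, "lig_d": 5.4, "lig_e": 4.0}
threshold, training = relax_threshold(pool, min_count=3)
```

With these made-up values, only two ligands pass at 6.5, so the threshold drops one step to 6.0, where three ligands are retained.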
Computational Procedures for FEP+. FEP+ calculations
were performed by using standard protocols, making use of the
Glide, Prime, MacroModel, and FEP+ tools (all release 2019-4,
Schrödinger, LLC, New York, NY, 2019).
Poses were generated by using Glide in SP mode using core
constraints relative to compound 1. The ligand and residues
within 5 Å of the ligand were left flexible, while the remainder
of the atoms were constrained. For application of FEP+, the
Glide poses were minimized by using MacroModel with
OPLS3 with implicit solvation. The protein was held rigid.
Custom force field parameters were calculated using the
Forcefield Builder module that is part of FEP+. Single-edge
FEP+ calculations were performed relative to compound 13 for
the 17 test molecules from the temporally segregated test set.
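For a single-edge calculation against one reference compound, a relative binding free energy can be converted to an absolute pKi estimate with the standard relation pKi = pKi(ref) - ΔΔG / (RT ln 10), where ΔΔG = ΔG(ligand) - ΔG(ref). The Python sketch below shows that arithmetic. The temperature of 298.15 K and the reference pKi and ΔΔG values are assumptions for illustration; they are not taken from the study, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pki_from_ddg(ref_pki, ddg_kcal, temperature=298.15):
    """Convert a relative binding free energy into a predicted pKi.

    ddg_kcal is dG_bind(ligand) - dG_bind(reference) in kcal/mol, so a
    negative value means the ligand binds more tightly than the reference.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10)  # about 1.364 kcal/mol
    return ref_pki - ddg_kcal / rt_ln10


# Hypothetical reference pKi of 7.0 and a ligand predicted 1.364 kcal/mol
# more favorable than the reference: roughly a tenfold gain in affinity.
predicted = pki_from_ddg(7.0, -1.364)
```

At room temperature, each 1.36 kcal/mol of relative free energy corresponds to about one log unit of affinity, which is why small errors in ΔΔG translate directly into pKi error.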
DATA AND SOFTWARE AVAILABILITY
An extensive data archive is freely available at www.jainlab.org
(see the Supporting Information). All software employed
herein is commercially available.
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at
https://pubs.acs.org/doi/10.1021/acs.jcim.1c01382.
Additional information about computational methods;
detailed description of the contents of the extensive data
archive (PDF)
All prediction data for the FEP+ targets for all three
methods (XLSX)
AUTHOR INFORMATION
Corresponding Authors
Stephen R. Johnson – Computer-Assisted Drug-Design,
Bristol-Myers Squibb Company, Princeton, New Jersey
08648, United States; Email: stephen.johnson@bms.com
Ajay N. Jain – Research and Development, BioPharmics LLC,
Santa Rosa, California 95404, United States;
orcid.org/0000-0003-4641-8501; Email: ajain@jainlab.org
Author
Ann E. Cleves – Applied Science, BioPharmics LLC, Santa
Rosa, California 95404, United States;
orcid.org/0000-0002-1622-2770
Complete contact information is available at:
https://pubs.acs.org/10.1021/acs.jcim.1c01382
Notes
The authors declare no competing financial interest.
ACKNOWLEDGMENTS
The authors thank Bristol-Myers Squibb for providing support
for this work.
REFERENCES
(1) Brown, S. P.; Muchmore, S. W. High-throughput calculation of
protein-ligand binding Affinities: Modification and adaptation of the
MM-PBSA protocol to enterprise grid computing. J. Chem. Inf. Model.
2006, 46, 999–1005.
(2) Brown, S. P.; Muchmore, S. W. Rapid estimation of relative
protein-ligand binding affinities using a high-throughput version of
MM-PBSA. J. Chem. Inf. Model. 2007, 47, 1493–1503.
(3) Brown, S. P.; Muchmore, S. W. Large-scale application of high-
throughput molecular mechanics with Poisson-Boltzmann surface area
for routine physics-based scoring of protein-ligand complexes. J. Med.
Chem. 2009, 52, 3159–3165.
(4) Wang, E.; Sun, H.; Wang, J.; Wang, Z.; Liu, H.; Zhang, J. Z.;
Hou, T. End-point binding free energy calculation with MM/PBSA
and MM/GBSA: strategies and applications in drug design. Chem.
Rev. 2019, 119, 9478–9508.
(5) Chodera, J. D.; Mobley, D. L.; Shirts, M. R.; Dixon, R. W.;
Branson, K.; Pande, V. S. Alchemical free energy methods for drug
discovery: Progress and challenges. Curr. Opin. Struct. Biol. 2011, 21,
150–160.
(6) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA
methods to estimate ligand-binding affinities. Expert Opin. Drug
Discovery 2015, 10, 449–461.
(7) Jorgensen, W. L.; Ravimohan, C. Monte Carlo simulation of
differences in free energies of hydration. J. Chem. Phys. 1985, 83,
3050–3054.
(8) Kollman, P. Free energy calculations: Applications to chemical
and biochemical phenomena. Chem. Rev. 1993, 93, 2395–2417.
(9) Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.;
Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero,
D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm,
W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.;
Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B.
J.; Friesner, R. A.; Abel, R. Accurate and reliable prediction of relative
ligand binding potency in prospective drug discovery by way of a
modern free-energy calculation protocol and force field. J. Am. Chem.
Soc. 2015, 137, 2695–2703.
(10) Sun, H.; Li, Y.; Tian, S.; Xu, L.; Hou, T. Assessing the
performance of MM/PBSA and MM/GBSA methods. 4. Accuracies
of MM/PBSA and MM/GBSA methodologies evaluated by various
simulation protocols using PDBbind data set. Phys. Chem. Chem. Phys.
2014, 16, 16719–16729.
(11) Schindler, C. E. M.; Baumann, H.; Blum, A.; Bose, D.;
Buchstaller, H.-P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski,
P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuchss, T.; Grädler, U.;
Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.;
Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.;
Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann,
K.; Schulz, R.; Steinbrecher, T.; Tanzer, E.-M.; Unzue Lopez, A.;
Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-scale assessment of
binding free energy calculations in active drug discovery projects. J.
Chem. Inf. Model. 2020, 60, 5457–5474.
(12) Walters, W. P.; Barzilay, R. Applications of deep learning in
molecule generation and molecular property prediction. Acc. Chem.
Res. 2021, 54, 263–270.
(13) Ramsundar, B.; Eastman, P.; Walters, P.; Pande, V. Deep
Learning for the Life Sciences: Applying Deep Learning to Genomics,
Microscopy, Drug Discovery, and More; O'Reilly Media, Inc.: 2019.
(14) Cleves, A. E.; Jain, A. N. Quantitative Surface Field Analysis:
Learning Causal Models to Predict Ligand Binding Affinity and Pose.
J. Comput.-Aided Mol. Des. 2018, 32, 731–757.
(15) Jain, A. Scoring noncovalent protein-ligand interactions: A
continuous differentiable function tuned to compute binding
affinities. J. Comput.-Aided Mol. Des. 1996, 10, 427–440.
(16) Pham, T.; Jain, A. Parameter estimation for scoring protein-
ligand interactions using negative training data. J. Med. Chem. 2006,
49, 5856–5868.
(17) Jain, A.; Cleves, A. Does your model weigh the same as a Duck?
J. Comput.-Aided Mol. Des. 2012, 26, 57–67.
(18) Cleves, A. E.; Jain, A. N. Extrapolative prediction using
physically-based QSAR. J. Comput.-Aided Mol. Des. 2016, 30,
127–152.
(19) Jain, A. N.; Dietterich, T. G.; Lathrop, R. H.; Chapman, D.;
Critchlow, R. E., Jr.; Bauer, B. E.; Webster, T. A.; Lozano-Perez, T.
Compass: A Shape-Based Machine Learning Tool for Drug Design. J.
Comput.-Aided Mol. Des. 1994, 8, 635–652.
(20) Jain, A.; Koile, K.; Chapman, D. Compass: Predicting biological
activities from molecular surface properties. Performance comparisons
on a steroid benchmark. J. Med. Chem. 1994, 37, 2315–2327.
(21) Jain, A.; Harris, N.; Park, J. Quantitative binding site model
generation: Compass applied to multiple chemotypes targeting the
5-HT1a receptor. J. Med. Chem. 1995, 38, 1295–1308.
(22) Dietterich, T. G.; Lathrop, R. H.; Lozano-Pérez, T. Solving the
multiple instance problem with axis-parallel rectangles. Artif. Intell.
1997, 89, 31–71.
(23) Potin, D.; Launay, M.; Monatlik, F.; Malabre, P.; Fabreguettes,
M.; Fouquet, A.; Maillet, M.; Nicolai, E.; Dorgeret, L.; Chevallier, F.;
Besse, D.; Dufort, M.; Caussade, F.; Ahmad, S. Z.; Stetsko, D. K.;
Skala, S.; Davis, P. M.; Balimane, P.; Patel, K.; Yang, Z.; Marathe, P.;
Postelneck, J.; Townsend, R. M.; Goldfarb, V.; Sheriff, S.; Einspahr,
H.; Kish, K.; Malley, M. F.; DiMarco, J. D.; Gougoutas, J. Z.; Kadiyala,
P.; Cheney, D. L.; Tejwani, R. W.; Murphy, D. K.; Mcintyre, K. W.;
Yang, X.; Chao, S.; Leith, L.; Xiao, Z.; Mathur, A.; Chen, B.-C.; Wu,
D.-R.; Traeger, S. C.; McKinnon, M.; Barrish, J. C.; Robl, J. A.;
Iwanowicz, E. J.; Suchard, S. J.; Dhar, T. G. M. Discovery and
development of 5-[(5 S, 9 R)-9-(4-cyanophenyl)-3-(3, 5-dichlorophenyl)-
1-methyl-2, 4-dioxo-1, 3, 7-triazaspiro [4.4] non-7-yl-methyl]-
3-thiophenecarboxylic acid (BMS-587101) – a small molecule
antagonist of leukocyte function associated antigen 1. J. Med.
Chem. 2006, 49, 6946–6949.
(24) Watterson, S. H.; Xiao, Z.; Dodd, D. S.; Tortolani, D. R.;
Vaccaro, W.; Potin, D.; Launay, M.; Stetsko, D. K.; Skala, S.; Davis, P.
M.; Lee, D.; Yang, X.; McIntyre, K. W.; Balimane, P.; Patel, K.; Yang,
Z.; Marathe, P.; Kadiyala, P.; Tebben, A. J.; Sheriff, S.; Chang, C. Y.;
Ziemba, T.; Zhang, H.; Chen, B.-C.; DelMonte, A. J.; Aranibar, N.;
McKinnon, M.; Barrish, J. C.; Suchard, S. J.; Murali Dhar, T. G. Small
Molecule Antagonist of Leukocyte Function Associated Antigen-1
(LFA-1): Structure-Activity Relationships Leading to the
Identification of 6-((5 S, 9 R)-9-(4-Cyanophenyl)-3-(3, 5-dichlorophenyl)-
1-methyl-2, 4-dioxo-1, 3, 7-triazaspiro [4.4] nonan-7-yl) nicotinic
Acid (BMS-688521). J. Med. Chem. 2010, 53, 3814–3830.
(25) Cleves, A. E.; Jain, A. N. Effects of inductive bias on
computational evaluations of ligand-based modeling and on drug
discovery. J. Comput.-Aided Mol. Des. 2008, 22, 147–159.
(26) Varela, R.; Walters, W.; Goldman, B.; Jain, A. Iterative
refinement of a binding pocket model: Active computational steering
of lead optimization. J. Med. Chem. 2012, 55, 8926–8942.
(27) Hogg, N.; Henderson, R.; Leitinger, B.; McDowall, A.; Porter,
J.; Stanley, P. Mechanisms contributing to the activity of integrins on
leukocytes. Immunol. Rev. 2002, 186, 164–171.
(28) Lebwohl, M.; Tyring, S. K.; Hamilton, T. K.; Toth, D.; Glazer,
S.; Tawfik, N. H.; Walicke, P.; Dummer, W.; Wang, X.; Garovoy, M.
R.; Pariser, D. A novel targeted T-cell modulator, efalizumab, for
plaque psoriasis. N. Engl. J. Med. 2003, 349, 2004–2013.
(29) Welzenbach, K.; Hommel, U.; Weitz-Schmidt, G. Small
Molecule Inhibitors Induce Conformational Changes in the I Domain
and the I-like Domain of Lymphocyte Function-associated Antigen-1.
J. Biol. Chem. 2002, 277, 10590–10598.
(30) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.;
Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.;
Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: A new
approach for rapid, accurate docking and scoring. 1. Method and
assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739–1749.
(31) Cleves, A. E.; Johnson, S. R.; Jain, A. N. Electrostatic-field and
surface-shape similarity for virtual screening and pose prediction. J.
Comput.-Aided Mol. Des. 2019, 33, 865–886.
(32) Chen, Y.-N. P.; LaMarche, M. J.; Chan, H. M.; Fekkes, P.;
Garcia-Fortanet, J.; Acker, M. G.; Antonakos, B.; Chen, C. H.-T.;
Chen, Z.; Cooke, V. G.; Dobson, J. R.; Deng, Z.; Fei, F.; Firestone, B.;
Fodor, M.; Fridrich, C.; Gao, H.; Grunenfelder, D.; Hao, H.-X.; Jacob,
J.; Ho, S.; Hsiao, K.; Kang, Z. B.; Karki, R.; Kato, M.; Larrow, J.; La
Bonte, L. R.; Lenoir, F.; Liu, G.; Liu, S.; Majumdar, D.; Meyer, M. J.;
Palermo, M.; Perez, L.; Pu, M.; Price, E.; Quinn, C.; Shakya, S.;
Shultz, M. D.; Slisz, J.; Venkatesan, K.; Wang, P.; Warmuth, M.;
Williams, S.; Yang, G.; Yuan, J.; Zhang, J.-H.; Zhu, P.; Ramsey, T.;
Keen, N. J.; Sellers, W. R.; Stams, T.; Fortin, P. D. Allosteric
inhibition of SHP2 phosphatase inhibits cancers driven by receptor
tyrosine kinases. Nature 2016, 535, 148–152.
(33) Jia, H.; Dai, G.; Weng, J.; Zhang, Z.; Wang, Q.; Zhou, F.; Jiao,
L.; Cui, Y.; Ren, Y.; Fan, S.; Zhou, J.; Qing, W.; Gu, Y.; Wang, J.; Sai,
Y.; Su, W. Discovery of (S)-1-(1-(Imidazo [1, 2-a] pyridin-6-yl)
ethyl)-6-(1-methyl-1 H-pyrazol-4-yl)-1 H-[1, 2, 3] triazolo [4, 5-b]
pyrazine (Volitinib) as a Highly Potent and Selective Mesenchymal
Epithelial Transition Factor (c-Met) Inhibitor in Clinical Development
for Treatment of Cancer. J. Med. Chem. 2014, 57, 7577–7589.
(34) Tron, A. E.; Belmonte, M. A.; Adam, A.; Aquila, B. M.; Boise, L.
H.; Chiarparin, E.; Cidado, J.; Embrey, K. J.; Gangl, E.; Gibbons, F.
D.; Gregory, G. P.; Hargreaves, D.; Hendricks, J. A.; Johannes, J. W.;
Johnstone, R. W.; Kazmirski, S. L.; Kettle, J. G.; Lamb, M. L.; Matulis,
S. M.; Nooka, A. K.; Packer, M. J.; Peng, B.; Rawlins, P. B.; Robbins,
D. W.; Schuller, A. G.; Su, N.; Yang, W.; Ye, Q.; Zheng, X.; Secrist, J.
P.; Clark, E. A.; Wilson, D. M.; Fawell, S. E.; Hird, A. W. Discovery of
Mcl-1-specific inhibitor AZD5991 and preclinical activity in multiple
myeloma and acute myeloid leukemia. Nat. Commun. 2018, 9, 5341.
(35) Winn, M.; Reilly, E. B.; Liu, G.; Huth, J. R.; Jae, H.-S.; Freeman,
J.; Pei, Z.; Xin, Z.; Lynch, J.; Kester, J.; von Geldern, T. W.; Leitza, S.;
DeVries, P.; Dickinson, R.; Mussatto, D.; Okasinski, G. F. Discovery
of novel p-arylthio cinnamides as antagonists of leukocyte function-
associated antigen-1/intercellular adhesion molecule-1 interaction. 4.
Structure-activity relationship of substituents on the benzene ring of
the cinnamide. J. Med. Chem. 2001, 44, 4393–4403.
(36) Kollmann, C. S.; Bai, X.; Tsai, C.-H.; Yang, H.; Lind, K. E.;
Zhu, Z.; Israel, D. I.; Cuozzo, J. W.; Morgan, B. A.; Yuki, K.; Xie, C.;
Springer, T. A.; Shimaoka, M.; Evindar, G.; Skinner, S. R. Application
of encoded library technology (ELT) to a protein–protein interaction
target: Discovery of a potent class of integrin lymphocyte function-
associated antigen 1 (LFA-1) antagonists. Bioorg. Med. Chem. 2014,
22, 2353–2365.
(37) Spitzer, R.; Cleves, A.; Varela, R.; Jain, A. Protein function
annotation by local binding site surface similarity. Proteins: Struct.,
Funct., Genet. 2014, 82, 679–694.
(38) Cleves, A. E.; Jain, A. N. ForceGen 3D Structure and
Conformer Generation: From Small Lead-Like Molecules to
Macrocyclic Drugs. J. Comput.-Aided Mol. Des. 2017, 31, 419–439.
(39) Jain, A. N.; Cleves, A. E.; Gao, Q.; Wang, X.; Liu, Y.; Sherer, E.
C.; Reibarkh, M. Y. Complex macrocycle exploration: Parallel,
heuristic, and constraint-based conformer generation using ForceGen.
J. Comput.-Aided Mol. Des. 2019, 33, 531–558.
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article
https://doi.org/10.1021/acs.jcim.1c01382
J. Chem. Inf. Model. XXXX, XXX, XXXXXX
S
... Earlystage lead optimization may involve just dozens of assayed molecules within a newly discovered chemical series, and even mid-to late-stage projects may be limited to hundreds or up to a few thousand data points. The recently introduced QuanSA machine-learning method (Quantitative Surfacefield Analysis) differs from the deep-learning paradigm and from historically widely used methods [11,12] in ways that make it applicable even in early-stage lead optimization. ...
... Nearly all QSAR and deep-learning methods ignore some or all of these aspects of protein-ligand interactions. Additional discussion of the theoretical contrasts between the QuanSA multiple-instance learning approach and other QSAR (3D and 2D) approaches can be found in the papers introducing the method [11,12] along with the antecedent QMOD [16] and Compass [17][18][19] approaches, the latter of which introduced the multiple-instance machine-learning paradigm [20]. Figure 2 depicts the overall scheme of the study. ...
... The QuanSA methodology derives a pocket-field beginning from an initial mutual alignment of a set of training ligands [11,12], where each ligand has multiple possible initial poses. When protein structure information is available, it is possible to make use of the experimentally determined relative poses of prior known bound ligands in order to guide the construction of the initial set of training poses. ...
Article
Full-text available
Scaffold replacement as part of an optimization process that requires maintenance of potency, desirable biodistribution, metabolic stability, and considerations of synthesis at very large scale is a complex challenge. Here, we consider a set of over 1000 time-stamped compounds, beginning with a macrocyclic natural-product lead and ending with a broad-spectrum crop anti-fungal. We demonstrate the application of the QuanSA 3D-QSAR method employing an active learning procedure that combines two types of molecular selection. The first identifies compounds predicted to be most active of those most likely to be well-covered by the model. The second identifies compounds predicted to be most informative based on exhibiting low predicted activity but showing high 3D similarity to a highly active nearest-neighbor training molecule. Beginning with just 100 compounds, using a deterministic and automatic procedure, five rounds of 20-compound selection and model refinement identifies the binding metabolic form of florylpicoxamid. We show how iterative refinement broadens the domain of applicability of the successive models while also enhancing predictive accuracy. We also demonstrate how a simple method requiring very sparse data can be used to generate relevant ideas for synthetic candidates.
... [28,29]. Physics-based approaches are recognized for further fostering regulation of ML techniques as a screen in classification or generation tasks [30,31]. ...
Article
Full-text available
Collagen is fundamental to a vast diversity of health functions and potential therapeutics. Short peptides targeting collagen are attractive for designing modular systems for site-specific delivery of bioactive agents. Characterization of peptide–protein binding involves a larger number of potential interactions that require screening methods to target physiological conditions. We build a hydropathy-based free energy estimation tool which allows quick evaluation of peptides binding to collagen. Previous studies showed that pH plays a significant role in collagen structure and stability. Our design tool enables probing peptides for their collagen-binding property across multiple pH conditions. We explored binding features of currently known collagen-binding peptides, collagen type I alpha chain 2 sense peptide (TKKTLRT) and decorin LRR-10 (LRELHLNNN). Based on these analyzes, we engineered a collagen-binding peptide with enhanced properties across a large pH range in contrast to LRR-10 pH dependence. To validate our predictions, we used a quantum-dots-based binding assay to compare the coverage of the peptides on type I collagen. The predicted peptide resulted in improved collagen binding. Hydropathy of the peptide–protein pair is a promising approach to finding compatible pairings with minimal use of computational resources, and our method allows for quick evaluation of peptides for binding to other proteins. Overall, the free-energy-based tool provides an alternative computational screening approach that impacts protein interaction search methods.
... In the literature, various scientists have prospectively and retrospectively demonstrated the great utility of FEP+ in hit-to-lead identification and lead optimization efforts [51][52][53][54][55][56]. Furthermore, a detailed theory behind the FEP+ method and the detailed protocols can be obtained from the original publications that were also adopted in this work [47][48][49][50]53,[55][56][57][58][59][60][61]. ...
Article
Full-text available
A global pandemic caused by the SARS-CoV-2 virus that started in 2020 and has wreaked havoc on humanity still ravages up until now. As a result, the negative impact of travel restrictions and lockdowns has underscored the importance of our preparedness for future pandemics. The main thrust of this work was based on addressing this need by traversing chemical space to design inhibitors that target the SARS-CoV-2 papain-like protease (PLpro). Pathfinder-based retrosynthesis analysis was used to generate analogs of GRL-0617 using commercially available building blocks by replacing the naphthalene moiety. A total of 10 models were built using active learning QSAR, which achieved good statistical results such as an R2 > 0.70, Q2 > 0.64, STD Dev < 0.30, and RMSE < 0.31, on average for all models. A total of 35 ideas were further prioritized for FEP+ calculations. The FEP+ results revealed that compound 45 was the most active compound in this series with a ΔG of −7.28 ± 0.96 kcal/mol. Compound 5 exhibited a ΔG of −6.78 ± 1.30 kcal/mol. The inactive compounds in this series were compound 91 and compound 23 with a ΔG of −5.74 ± 1.06 and −3.11 ± 1.45 kcal/mol. The combined strategy employed here is envisaged to be of great utility in multiparameter lead optimization efforts, to traverse chemical space, maintaining and/or improving the potency as well as the property space of synthetically aware design ideas.
... 10). Recently, a similar approach was introduced where FEP and ML predictions were averaged, leading to a higher prediction performance [35]. ...
Article
Full-text available
We release a new, high quality data set of 1162 PDE10A inhibitors with experimentally determined binding affinities together with 77 PDE10A X-ray co-crystal structures from a Roche legacy project. This data set is used to compare the performance of different 2D- and 3D-machine learning (ML) as well as empirical scoring functions for predicting binding affinities with high throughput. We simulate use cases that are relevant in the lead optimization phase of early drug discovery. ML methods perform well at interpolation, but poorly in extrapolation scenarios—which are most relevant to a real-world application. Moreover, we find that investing into the docking workflow for binding pose generation using multi-template docking is rewarded with an improved scoring performance. A combination of 2D-ML and 3D scoring using a modified piecewise linear potential shows best overall performance, combining information on the protein environment with learning from existing SAR data. Graphical abstract
... Guallar and coworkers showed promising results using MC simulations in PELE combined with MSM for the estimation of ligand binding free energies 9,10 . An emerging area of intense research combines the alchemical method FEP with machine learning, allowing better estimation of parameters and better precision in free energy estimation 11,12 , at the cost of losing throughput in calculations. ...
Preprint
Full-text available
The correct evaluation of ligand binding free energies by computational methods is still a very challenging active area of research. The most employed methods for these calculations can be roughly classified into four groups: ( i ) the fastest and less accurate methods, such as molecular docking, designed to sample a large number of molecules and rapidly rank them according to the potential binding energy; ( ii ) the second class of methods use a thermodynamic ensemble, typically generated by molecular dynamics, to analyze the endpoints of the thermodynamic cycle for binding and extract differences, in the so-called ‘end-point’ methods; ( iii ) the third class of methods is based on the Zwanzig relationship and computes the free energy difference after a chemical change of the system (alchemical methods); and ( iv ) methods based on biased simulations, such as metadynamics, for example. These methods require increased computational power and as expected, result in increased accuracy for the determination of the strength of binding. Here, we describe an intermediate approach, based on the Monte Carlo Recursion (MCR) method first developed by Harold Scheraga. In this method, the system is sampled at increasing effective temperatures, and the free energy of the system is assessed from a series of terms W ( b , T ), computed from Monte Carlo (MC) averages at each iteration. We show the application of the MCR for ligand binding with datasets of guest-hosts systems (N=75) and we observed that a good correlation is obtained between experimental data and the binding energies computed with MCR. We also compared the experimental data with an end-point calculation from equilibrium Monte Carlo calculations that allowed us to conclude that the lower-energy (lower-temperature) terms in the calculation are the most relevant to the estimation of the binding energies, resulting in similar correlations between MCR and MC data and the experimental values. 
On the other hand, the MCR method provides a reasonable view of the binding energy funnel, with possible connections with the ligand binding kinetics, as well. The codes developed for this analysis are publicly available on GitHub as a part of the LiBELa/MCLiBELa project ( https://github.com/alessandronascimento/LiBELa ). Table of Contents/Abstract Graphics
Article
The correct evaluation of ligand binding free energies by computational methods is still a very challenging active area of research. The most employed methods for these calculations can be roughly classified into four groups: (i) the fastest and less accurate methods, such as molecular docking, designed to sample a large number of molecules and rapidly rank them according to the potential binding energy; (ii) the second class of methods use a thermodynamic ensemble, typically generated by molecular dynamics, to analyze the endpoints of the thermodynamic cycle for binding and extract differences, in the so-called 'end-point' methods; (iii) the third class of methods is based on the Zwanzig relationship and computes the free energy difference after a chemical change of the system (alchemical methods); and (iv) methods based on biased simulations, such as metadynamics, for example. These methods require increased computational power and as expected, result in increased accuracy for the determination of the strength of binding. Here, we describe an intermediate approach, based on the Monte Carlo Recursion (MCR) method first developed by Harold Scheraga. In this method, the system is sampled at increasing effective temperatures, and the free energy of the system is assessed from a series of terms W(b,T), computed from Monte Carlo (MC) averages at each iteration. We show the application of the MCR for ligand binding with datasets of guest-hosts systems (N = 75) and we observed that a good correlation is obtained between experimental data and the binding energies computed with MCR. We also compared the experimental data with an end-point calculation from equilibrium Monte Carlo calculations that allowed us to conclude that the lower-energy (lower-temperature) terms in the calculation are the most relevant to the estimation of the binding energies, resulting in similar correlations between MCR and MC data and the experimental values. 
On the other hand, the MCR method provides a reasonable view of the binding energy funnel, with possible connections with the ligand binding kinetics, as well. The codes developed for this analysis are publicly available on GitHub as a part of the LiBELa/MCLiBELa project (https://github.com/alessandronascimento/LiBELa).
Article
Antibodies are currently the most important class of biotherapeutics and are used to treat numerous diseases. Recent advances in computational methods are ushering in a new era of antibody design, driven in part by accurate structure prediction. Previously, structure-based antibody design has been limited to a relatively small number of cases where accurate structures or models of both the target antigen and antibody were available. As we move towards a time where it is possible to accurately model most antibodies and antigens, and to reliably predict their binding site, there is vast potential for true computational antibody design. In this review, we describe the latest methods that promise to launch a paradigm shift towards entirely in silico structure-based antibody design.
Article
Full-text available
We introduce a new method for rapid computation of 3D molecular similarity that combines electrostatic field comparison with comparison of molecular surface-shape and directional hydrogen-bonding preferences (called “eSim”). Rather than employing heuristic “colors” or user-defined molecular feature types to represent conformation-dependent molecular electrostatics, eSim calculates the similarity of the electrostatic fields of two molecules (in addition to shape and hydrogen-bonding). We present detailed virtual screening performance data on the standard 102 target DUD-E set. In its moderately fast screening mode, eSim running on a single computing core is capable of processing over 60 molecules per second. In this mode, eSim performed significantly better than all alternate methods for which full DUD-E data were available (mean ROC area of 0.74, p \(< 10^{-9}\), by paired t-test, compared with the best performing alternate method). In addition, for 92 targets of the DUD-E set where multiple ligand-bound crystal structures were available, screening performance was assessed using alternate ligands or sets thereof (in their bound poses) as similarity targets. Using the joint alignment of five ligands for each protein target, mean ROC area exceeded 0.82 for the 92 targets. Design-focused application of ligand similarity methods depends on accurate predictions of geometric molecular relationships. We comprehensively assessed pose prediction accuracy by curating nearly 400,000 bound ligand pose pairs across the DUD-E targets. Overall, beginning from agnostic initial poses, we observed an 80% success rate for RMSD \(\le 2.0\) Å among the top 20 predicted eSim poses. These examples were split roughly 50/50 into cases with high direct atomic overlap (where a shared scaffold exists between a pair) and low direct atomic overlap (where where a ligand pair has dissimilar scaffolds but largely occupies the same space). 
Within the high direct atomic overlap subset, the pose prediction success rate was 93%. For the more challenging subset (where dissimilar scaffolds are to be aligned), the success rate was 70%. The eSim approach enables both large-scale screening and rational design of ligands and is rooted in physically meaningful, non-heuristic, molecular comparisons.
Article
Full-text available
ForceGen is a template-free, non-stochastic approach for 2D to 3D structure generation and conformational elaboration for small molecules, including both non-macrocycles and macrocycles. For conformational search of non-macrocycles, ForceGen is both faster and more accurate than the best of all tested methods on a very large, independently curated benchmark of 2859 PDB ligands. In this study, the primary results are on macrocycles, including results for 431 unique examples from four separate benchmarks. These include complex peptide and peptide-like cases that can form networks of internal hydrogen bonds. By making use of new physical movements (“flips” of near-linear sub-cycles and explicit formation of hydrogen bonds), ForceGen exhibited statistically significantly better performance for overall RMS deviation from experimental coordinates than all other approaches. The algorithmic approach offers natural parallelization across multiple computing cores. On a modest multi-core workstation, for all but the most complex macrocycles, median wall-clock times were generally under a minute in fast search mode and under 2 min using thorough search. On the most complex cases (roughly cyclic decapeptides and larger) explicit exploration of likely hydrogen bonding networks yielded marked improvements, but with calculation times increasing to several minutes and in some cases to roughly an hour for fast search. In complex cases, utilization of NMR data to constrain conformational search produces accurate conformational ensembles representative of solution state macrocycle behavior. On macrocycles of typical complexity (up to 21 rotatable macrocyclic and exocyclic bonds), design-focused macrocycle optimization can be practically supported by computational chemistry at interactive time-scales, with conformational ensemble accuracy equaling what is seen with non-macrocyclic ligands. For more complex macrocycles, inclusion of sparse biophysical data is a helpful adjunct to computation.
Article
Mcl-1 is a member of the Bcl-2 family of proteins that promotes cell survival by preventing induction of apoptosis in many cancers. High expression of Mcl-1 causes tumorigenesis and resistance to anticancer therapies, highlighting the potential of Mcl-1 inhibitors as anticancer drugs. Here, we describe AZD5991, a rationally designed macrocyclic molecule with high selectivity and affinity for Mcl-1 currently in clinical development. Our studies demonstrate that AZD5991 binds directly to Mcl-1 and induces rapid apoptosis in cancer cells, most notably myeloma and acute myeloid leukemia, by activating the Bak-dependent mitochondrial apoptotic pathway. AZD5991 shows potent antitumor activity in vivo with complete tumor regression in several models of multiple myeloma and acute myeloid leukemia after a single tolerated dose as monotherapy or in combination with bortezomib or venetoclax. Based on these promising data, a Phase I clinical trial has been launched for evaluation of AZD5991 in patients with hematological malignancies (NCT03218683).
Article
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification.
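The abstract above does not say how QuanSA and physics-based predictions are combined. The article this list accompanies describes its hybrid model as a simple average of the two predicted affinities, so the following is a minimal sketch of that idea only. The function name `hybrid_pki`, the ligand names, and all numeric values are hypothetical, and restricting the average to ligands predicted by both methods is an assumption rather than a documented rule.

```python
from statistics import mean


def hybrid_pki(physics_pki, ml_pki):
    """Return the unweighted mean of two predicted pKi values per ligand.

    Only ligands present in both inputs are averaged (an assumption made
    for this sketch); ligands predicted by one method alone are skipped.
    """
    shared = physics_pki.keys() & ml_pki.keys()
    return {name: mean((physics_pki[name], ml_pki[name])) for name in sorted(shared)}


# Hypothetical predicted pKi values, for illustration only.
physics = {"lig_a": 7.0, "lig_b": 6.0, "lig_c": 8.5}
ml = {"lig_a": 8.0, "lig_b": 6.5}

hybrid = hybrid_pki(physics, ml)
print(hybrid)
```

Here `lig_c` is left out of the result because only one of the two hypothetical methods produced a prediction for it.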
Article
We introduce the ForceGen method for 3D structure generation and conformer elaboration of drug-like small molecules. ForceGen is novel, avoiding use of distance geometry, molecular templates, or simulation-oriented stochastic sampling. The method is primarily driven by the molecular force field, implemented using an extension of MMFF94s and a partial charge estimator based on electronegativity-equalization. The force field is coupled to algorithms for direct sampling of realistic physical movements made by small molecules. Results are presented on a standard benchmark from the Cambridge Crystallographic Database of 480 drug-like small molecules, including full structure generation from SMILES strings. Reproduction of protein-bound crystallographic ligand poses is demonstrated on four carefully curated data sets: the ConfGen Set (667 ligands), the PINC cross-docking benchmark (1062 ligands), a large set of macrocyclic ligands (182 total with typical ring sizes of 12–23 atoms), and a commonly used benchmark for evaluating macrocycle conformer generation (30 ligands total). Results compare favorably to alternative methods, and performance on macrocyclic compounds approaches that observed on non-macrocycles while yielding a roughly 100-fold speed improvement over alternative MD-based methods with comparable performance.
Article
Surflex-QMOD integrates chemical structure and activity data to produce physically-realistic models for binding affinity prediction. Here, we apply QMOD to a 3D-QSAR benchmark dataset and show broad applicability to a diverse set of targets. Testing new ligands within the QMOD model employs automated flexible molecular alignment, with the model itself defining the optimal pose for each ligand. QMOD performance was compared to that of four approaches that depended on manual alignments (CoMFA, two variations of CoMSIA, and CMF). QMOD showed comparable performance to the other methods on a challenging, but structurally limited, test set. The QMOD models were also applied to test a large and structurally diverse dataset of ligands from ChEMBL, nearly all of which were synthesized years after those used for model construction. Extrapolation across diverse chemical structures was possible because the method addresses the ligand pose problem and provides structural and geometric means to quantitatively identify ligands within a model’s applicability domain. Predictions for such ligands for the four tested targets were highly statistically significant based on rank correlation. Those molecules predicted to be highly active (pKi ≥ 7.5) had a mean experimental pKi of 7.5, with potent and structurally novel ligands being identified by QMOD for each target.
Article
Conspectus: Recent advances in computer hardware and software have led to a revolution in deep neural networks that has impacted fields ranging from language translation to computer vision. Deep learning has also impacted a number of areas in drug discovery, including the analysis of cellular images and the design of novel routes for the synthesis of organic molecules. While work in these areas has been impactful, a complete review of the applications of deep learning in drug discovery would be beyond the scope of a single Account. In this Account, we will focus on two key areas where deep learning has impacted molecular design: the prediction of molecular properties and the de novo generation of suggestions for new molecules. One of the most significant advances in the development of quantitative structure-activity relationships (QSARs) has come from the application of deep learning methods to the prediction of the biological activity and physical properties of molecules in drug discovery programs. Rather than employing the expert-derived chemical features typically used to build predictive models, researchers are now using deep learning to develop novel molecular representations. These representations, coupled with the ability of deep neural networks to uncover complex, nonlinear relationships, have led to state-of-the-art performance. While deep learning has changed the way that many researchers approach QSARs, it is not a panacea. As with any other machine learning task, the design of predictive models is dependent on the quality, quantity, and relevance of available data. Seemingly fundamental issues, such as optimal methods for creating a training set, are still open questions for the field. Another critical area that is still the subject of multiple research efforts is the development of methods for assessing the confidence in a model. Deep learning has also contributed to a renaissance in the application of de novo molecule generation. Rather than relying on manually defined heuristics, deep learning methods learn to generate new molecules based on sets of existing molecules. Techniques that were originally developed for areas such as image generation and language translation have been adapted to the generation of molecules. These deep learning methods have been coupled with the predictive models described above and are being used to generate new molecules with specific predicted biological activity profiles. While these generative algorithms appear promising, there have been only a few reports on the synthesis and testing of molecules based on designs proposed by generative models. The evaluation of the diversity, quality, and ultimate value of molecules produced by generative models is still an open question. While the field has produced a number of benchmarks, it has yet to agree on how one should ultimately assess molecules "invented" by an algorithm.
Article
Accurate ranking of compounds with regard to their binding affinity to a protein using computational methods is of great interest to pharmaceutical research. Physics-based free energy calculations are regarded as the most rigorous way to estimate binding affinity. In recent years, many retrospective studies carried out both in academia and industry have demonstrated their potential. Here, we present the results of large-scale prospective application of the FEP+ method in active drug discovery projects in an industry setting at Merck KGaA, Darmstadt, Germany. We compare these prospective data to results obtained on a new diverse, public benchmark of eight pharmaceutically relevant targets. Our results offer insights into the challenges faced when using free energy calculations in real-life drug discovery projects and identify limitations that could be tackled by future method development. The new public data set we provide to the community can support further method development and comparative benchmarking of free energy calculations.
Article
Molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) and molecular mechanics generalized Born surface area (MM/GBSA) are arguably very popular methods for binding free energy prediction since they are more accurate than most scoring functions of molecular docking and less computationally demanding than alchemical free energy methods. MM/PBSA and MM/GBSA have been widely used in biomolecular studies such as protein folding, protein-ligand binding, protein-protein interaction, etc. In this review, methods to adjust the polar solvation energy and to improve the performance of MM/PBSA and MM/GBSA calculations are reviewed and discussed. The latest applications of MM/GBSA and MM/PBSA in drug design are also presented. This review intends to provide readers with guidance for practically applying MM/PBSA and MM/GBSA in drug design and related research fields.
Article
The non-receptor protein tyrosine phosphatase SHP2, encoded by PTPN11, has an important role in signal transduction downstream of growth factor receptor signalling and was the first reported oncogenic tyrosine phosphatase. Activating mutations of SHP2 have been associated with developmental pathologies such as Noonan syndrome and are found in multiple cancer types, including leukaemia, lung and breast cancer and neuroblastoma. SHP2 is ubiquitously expressed and regulates cell survival and proliferation primarily through activation of the RAS-ERK signalling pathway. It is also a key mediator of the programmed cell death 1 (PD-1) and B- and T-lymphocyte attenuator (BTLA) immune checkpoint pathways. Reduction of SHP2 activity suppresses tumour cell growth and is a potential target of cancer therapy. Here we report the discovery of a highly potent (IC50 = 0.071 μM), selective and orally bioavailable small-molecule SHP2 inhibitor, SHP099, that stabilizes SHP2 in an auto-inhibited conformation. SHP099 concurrently binds to the interface of the N-terminal SH2, C-terminal SH2, and protein tyrosine phosphatase domains, thus inhibiting SHP2 activity through an allosteric mechanism. SHP099 suppresses RAS-ERK signalling to inhibit the proliferation of receptor-tyrosine-kinase-driven human cancer cells in vitro and is efficacious in mouse tumour xenograft models. Together, these data demonstrate that pharmacological inhibition of SHP2 is a valid therapeutic approach for the treatment of cancers.