Available via license: CC BY-NC-ND 4.0
Content may be subject to copyright.
Synergy and Complementarity between Focused Machine Learning
and Physics-Based Simulation in Affinity Prediction
Ann E. Cleves, Stephen R. Johnson,* and Ajay N. Jain*
Cite This: https://doi.org/10.1021/acs.jcim.1c01382
ABSTRACT: We present results on the extent to which physics-based
simulation (exemplified by FEP+) and focused machine learning
(exemplified by QuanSA) are complementary for ligand affinity prediction.
For both methods, predictions of activity for LFA-1 inhibitors from a
medicinal chemistry lead optimization project were accurate within the
applicable domain of each approach. A hybrid model that combined
predictions by both approaches by simple averaging performed better than
either method, with respect to both ranking and absolute pKi values. Two
publicly available FEP+ benchmarks, covering 16 diverse biological targets,
were used to test the generality of the synergy. By identifying training data
specifically focused on relevant ligands, accurate QuanSA models were derived using ligand activity data known at the time of the
original series publications. Results across the 16 benchmark targets demonstrated significant improvements both for ranking and for
absolute pKi values using hybrid predictions that combined the FEP+ and QuanSA predicted affinity values. The results argue for a
combined approach for affinity prediction that makes use of physics-driven methods as well as those driven by machine learning,
each applied carefully on appropriate compounds, with hybrid prediction strategies being employed where possible.
■ INTRODUCTION
Binding affinity prediction continues to be a challenge for
computer-aided drug design, especially in the case where there
is no high-resolution experimental structure of the target of
interest. Even when structures of the biological target are
available, affinity prediction is difficult. Simulation-oriented
physics-based methods, such as MM/PBSA or MM/GBSA [1−3,6] or
free energy perturbation (FEP) [7−9], share a key
attraction: in principle, these approaches are congruent with
what is known physically. The former methods nominally
predict absolute binding free energy. In terms of predictive
accuracy, even in the case where experimental structures are
known for all ligands under consideration, performance has
been observed to be quite variable on a per-target basis [10],
though more consistent results have been obtained in some
cases, with careful application [3]. Additional context with respect
to the state of physical simulation approaches is provided by
recent reviews [4,5].
For the FEP approach, relative free energy predictions are
made. This is done by estimating the difference in the free
energies of protein−ligand complexes between related ligand
pairs (typically differing relatively modestly in their
substituents). Advances in force fields, sampling methods, and
automated design of perturbation graphs [9] can help to guide
fine-grained molecular optimization. In cases where the FEP+
method is applicable, for single perturbations of a few ligand
atoms from a known reference ligand, errors in predicting
changes in free energy have been reported to be as low as 0.5
pKi units (0.9 kcal/mol) [9]. More recent benchmarking on a
more challenging set of perturbations yielded errors roughly
50% higher [11]. Affinity prediction remains a challenging
problem, even in cases where targets have well-characterized
structures and there is little uncertainty in ligand binding
modes.
Machine-learning approaches have seen a recent resurgence
in their applications within the CADD field, in part driven by
advances in deep-learning methodologies. A recent review
highlights a number of successful applications as well as
limitations [12], with further context provided by a full book
treatment [13]. With respect to binding affinity prediction in the
context of lead optimization, a critical factor is that the
methods typically require thousands of data points in order to
learn effectively, because of the need to develop encoded
internal representations that meaningfully capture the
important aspects required for prediction. Early-stage lead
optimization may involve just dozens of assayed molecules
within a newly discovered chemical series, and even mid-to-
late-stage projects may be limited to hundreds or up to a few
thousand data points. The recently introduced QuanSA
machine-learning method (Quantitative Surface-field Analysis)
Received: November 11, 2021
Articlepubs.acs.org/jcim
© XXXX The Authors. Published by
American Chemical Society A
https://doi.org/10.1021/acs.jcim.1c01382
J. Chem. Inf. Model. XXXX, XXX, XXX−XXX
differs from the deep-learning paradigm and from historically
widely used methods [14].
The central difference is that, rather than applying a generic
machine-learning approach to an input molecular representa-
tion divorced from a binding event, QuanSA builds a physically
interpretable model that is analogous to a protein binding site.
By doing so, it addresses the problem of ligand conformation
and alignment fully automatically, and it moves in the direction
of causal modeling, where the requirement for data can be
reduced. The method constructs a nonlinear “pocket field” that
is still physical in nature, and which is directly related to the
functional form of scoring functions for docking [15,16]. QuanSA
pocket-field models mirror key physical phenomena that are
observed in protein−ligand interactions [17]: (1) choice of ligand
poses is defined by the model; (2) non-additive (or even anti-
additive) effects of substituent changes on a central scaffold
can be modeled effectively; (3) changes in ligand structures
induce changes in predicted ligand poses; (4) the model of
molecular activity is dependent on the detailed shape of
ligands. Nearly all QSAR and deep-learning methods ignore
some or all of these aspects of protein−ligand interactions.
Additional discussion of the theoretical contrasts between the
QuanSA multiple-instance learning approach and other QSAR
(3D and 2D) approaches can be found in the papers
introducing the method [14] along with the antecedent
QMOD [18] and Compass [19−21] approaches, the latter of which
introduced the multiple-instance machine-learning paradigm [22].
Here, we explore the performance of both FEP+ and the
QuanSA machine-learning method in a lead optimization
project application scenario and using two publicly available
FEP+ benchmarks [9,11], spanning 16 diverse targets and covering
affinity predictions for nearly 400 molecules. Project data for
LFA-1 [23,24] were used as a representative example of
mid-to-late-stage lead optimization, where substantial structure−activity
data exist, particularly within a chemical series of interest. The
two FEP+ benchmarks were used to assess early-stage project
application, where only sparse data may be available.
Accuracy of the QuanSA and FEP+ approaches, as well as a
hybrid approach combining predictions from the two methods
by simple averaging, will be detailed in what follows. In
Figure 1. Overview of the QuanSA method. Beginning from ligand structures and activities (here against LFA-1), a multiple-ligand alignment is
produced (with variants for each molecule), after which a smooth, nonlinear function is induced (called a “pocket field”), into which new molecules
can be flexibly fit as is commonly done with docking approaches. Here, the new test molecule, compound 4, was made 7 months after the last
molecule within the training set (example molecules 1−3), and it was accurately predicted. Shown in the lower row is the predicted pose of
compound 4, the surface surrounded by the pocket field (left), and the interactions with the pocket field with and without the surface (middle and
right).
addition, because the QuanSA approach can be practically
applied to screen large databases for new lead discovery and
scaffold replacement, a screening utility was assessed using
structure−activity data for diverse compounds that were
disclosed after the data used for model induction.
■ RESULTS AND DISCUSSION
We report results for two types of project application
scenarios: mid-to-late-stage lead optimization and early stage.
In both cases, training and testing data for QuanSA were
temporally segregated: building models on older molecules and
predicting the activities of future molecules. This parallels the
application scenario for predictive modeling, and it avoids bias
in assessing the performance of learned models [17,25,26]. For the
mid-to-late-stage scenario, the data set included compound
registration dates and associated activities. For the early-stage
scenario, coarser temporal segregation was accomplished by
making use of the years of disclosure of structure−activity and
protein structural data. Assessment of a screening utility for the
QuanSA models also employed temporal segregation.
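The temporal segregation described above can be sketched as a split into consecutive time windows. This is a minimal illustration, not the authors' procedure: the record layout, the cutoff dates, and all compound identifiers and activities below are hypothetical.

```python
from datetime import date

def temporal_split(records, train_end, holdout_end):
    """Split (compound_id, registration_date, pKi) records into three
    consecutive time windows: training, holdout, and future test."""
    train, holdout, future = [], [], []
    for rec in sorted(records, key=lambda r: r[1]):
        reg_date = rec[1]
        if reg_date <= train_end:
            train.append(rec)      # oldest molecules: model induction
        elif reg_date <= holdout_end:
            holdout.append(rec)    # later molecules: model selection
        else:
            future.append(rec)     # newest molecules: blind prediction
    return train, holdout, future

# Hypothetical records; identifiers, dates, and activities are invented.
records = [
    ("cpd-A", date(2005, 1, 10), 7.2),
    ("cpd-B", date(2005, 6, 3), 6.5),
    ("cpd-C", date(2006, 2, 14), 8.1),
    ("cpd-D", date(2006, 9, 30), 7.9),
    ("cpd-E", date(2007, 4, 22), 8.4),
]
train, holdout, future = temporal_split(
    records, train_end=date(2005, 12, 31), holdout_end=date(2006, 6, 30)
)
```

Because no molecule in the future window is seen during induction or selection, the resulting error estimate reflects prospective use rather than interpolation within known data.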
QuanSA Model Induction. The QuanSA method has
been previously described in detail [14] and will be summarized
only briefly here, with additional details in the Supporting
Information. Figure 1 illustrates the induction of a QuanSA
pocket field. Beginning with pure SAR data (here SMILES
strings and associated pKi measurements), low-energy
conformational ensembles are produced, from which multiple
mutual ligand alignments are automatically constructed. These
alignments may be influenced (optionally) by provision of
known bound ligand poses, and each ligand alignment contains
a single optimal pose along with many related alternative poses.
The derived pocket field acts as a virtual binding pocket, into
which new molecules are flexibly fit, subject to the joint
considerations of optimizing ligand interactions with the
pocket and minimizing ligand strain.
For all models in this study, training ligands were focused
around scaffolds of interest with respect to prediction and, in
all cases, the poses of bound ligands were used to drive the
initial alignment process. The more general case of diverse
scaffolds without the benefit of known bound ligand poses is
more challenging, and it has been discussed extensively in
prior work [14,18].
Figure 1 shows three representative training molecules (1−3)
and one future test molecule (4) from this work. Shown in
3D is the mutual overlay of the final optimal poses of the
training molecules in the model. In this example, QuanSA
accurately predicted the activity of the new molecule, which
was synthesized months after the molecules used for model
induction.
Mid-to-Late-Stage Project Application Scenario: LFA-1.
LFA-1 is a heterodimeric protein of the integrin family with
noncovalently linked α and β subunits and is expressed on the
surface of leukocytes [27]. LFA-1 mediates the interactions
between leukocytes and other cells and has been pursued as
a target for immunological disorder treatments, both by
antibodies [28] and with small molecules [29]. The compounds in
this work were generated in an effort to identify orally active
small molecules that disrupted the LFA-1/ICAM-1
interaction [23,24]. The set comprises mostly bicyclic hydantoins
(e.g., compound 2), spirocyclic hydantoins, and spirocyclic
pyrrolidines (e.g., compound 1), and all bind competitively to
the I-domain allosteric site of LFA-1 and prevent the
conformational changes required for ICAM-1 binding.
The LFA-1 structure−activity set contained homogeneous,
high-quality assay data, with time stamps available to allow for
segregation of data into a training set and a set of future
compounds for prediction. Figure 2 (left) depicts the QuanSA
Figure 2. Preparation and scoring procedures using a temporally segregated set of LFA-1 inhibitors from a medicinal chemistry lead optimization
project: QuanSA (left) and FEP+ (right). The QuanSA approach follows a machine-learning paradigm, employing a training set and a holdout set
for model selection. The FEP+ approach combines careful force field parameter estimation, molecular docking, and extensive physical simulation.
model building, model selection, and testing procedure applied
to the series of LFA-1 inhibitors. Model selection was done by
testing alternative models on a later set of holdout molecules.
Figure 2 (right) depicts the procedure for FEP+. QuanSA
makes use of a typical machine-learning paradigm, employing
training and (optional) holdout sets of molecules, within
successive time windows of project activity. FEP+ makes
predictions on a set of structural variations of a reference
molecule, with the reference here being selected from among
the LFA-1 holdout set and the 17 molecules for prediction
being chosen from among the 67 molecules from the final
project time window.
The selected model had a mean unsigned error (MUE) of
0.56 log unit on the holdout set, with a Kendall’s τ
of 0.48 (p < 0.0001). This model was refined using the holdout
molecules, resulting in a final fit to the 135 training/holdout
molecules of 0.25 log unit MUE and Kendall’s τ of 0.86 (p <
0.0001). The refined pocket field (shown in Figure 1) was then
used to score the blind test set of 67 future molecules.
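The two statistics quoted above, MUE and Kendall's τ, can be computed as in the following sketch. The text does not say which τ variant was used; the tau-a form (no tie correction) is assumed here for simplicity, and the activity values are invented for illustration.

```python
from itertools import combinations

def mean_unsigned_error(experimental, predicted):
    """Average absolute difference between experimental and predicted pKi."""
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical pKi values, not taken from the LFA-1 data set.
experimental = [6.0, 7.0, 8.0, 9.0]
predicted = [6.4, 7.6, 7.3, 8.7]
mue = mean_unsigned_error(experimental, predicted)
tau = kendall_tau_a(experimental, predicted)
```

MUE measures accuracy on the absolute pKi scale, while τ depends only on rank order, which is why the paper reports both.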
The plot in Figure 3 shows the experimental activities
compared to the QuanSA predicted activities for the full set of
67 future test molecules. QuanSA yielded statistically
significant predictions for the full blind test with a τ of 0.57
(95% confidence interval (CI) 0.42−0.69, p < 0.0001) and an
Figure 3. Plot of experimental activities versus predicted activities from QuanSA for the full set of 67 future test molecules. Test molecules 5−8
have structures significantly different from those of the training compounds, and the plot points for these compounds are highlighted in orange.
Also shown are the top pose families and interactions with the pocket field for four example test molecules with the spirocyclic pyrrolidine scaffold
(9−12) whose points on the graph are highlighted in blue and are indicated with red arrows.
MUE of 0.52 log unit (95% CI 0.43−0.63). Lines in the plot indicate
perfect prediction and deviations of ±0.7 and ±1.5 pKi units
(corresponding to approximately ±1 and ±2 kcal/mol). Just under 80% of
the molecules (53 of 67) were predicted within 0.7 pKi unit,
and just two molecules exceeded 1.5 units.
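The correspondence between pKi units and kcal/mol used for these error bands follows from ΔG = −RT ln(10) · pKi. The sketch below converts error magnitudes only; the temperature of 298.15 K is an assumption, since the assay temperature is not stated here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def pki_to_kcal(delta_pki, temperature_k=298.15):
    """Convert a pKi difference (magnitude) to a free energy difference
    in kcal/mol, using |dG| = R * T * ln(10) * |d pKi|.
    The default temperature is an assumed value."""
    return R_KCAL * temperature_k * math.log(10) * delta_pki

band_inner = pki_to_kcal(0.7)  # close to 1 kcal/mol
band_outer = pki_to_kcal(1.5)  # close to 2 kcal/mol
```

At this temperature one pKi unit is about 1.36 kcal/mol, so the ±0.7 and ±1.5 bands are rounded equivalents of ±1 and ±2 kcal/mol.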
Figure 3 shows the structures of eight example test
molecules. Four test molecules are shown in 2D only (top
right, 5−8) with structures significantly different from those of
the majority of compounds synthesized late in the project.
Compounds 5−7 contain centrally located amines, and
compound 8 has a different scaffold. Despite these structural
differences, QuanSA accurately predicted the activities of these
structurally divergent molecules whose activity spanned a range
of 3.5 log units.
Four other test molecules, each of which has the heavily
explored spirocyclic pyrrolidine scaffold (9−12), are also
shown in Figure 3, along with their top-scoring pose families
and interactions with the pocket field. Many of the molecules
in the data set varied only in the substitutions on the
spirocyclic pyrrolidine nitrogen, as shown for molecules 9−12.
The interaction sticks for these molecules with the pocket field
closely mimic the interactions observed in the X-ray cocrystal
structure of compound 1 with LFA-1 [24]. Most of the
interactions were hydrophobic (teal sticks) including those
for the dichlorophenyl group itself, which occupies a
hydrophobic pocket. The urea carbonyl, thought to be
hydrogen bonded via a water molecule, is marked by a
prominent red acceptor stick. Compounds 9 and 10 were
among the most potent molecules in the test set, and QuanSA
accurately predicted the activities despite the negative charge
on the R group.
FEP+ Prediction Performance and Hybrid Modeling.
The FEP+ approach employs a reference ligand with a known
free energy of binding along with a structure of the ligand
bound to the protein of interest. From this reference ligand, a
set of molecular transformations can be made and arranged
into a connected graph such that connected pairs of test
molecules have relatively high similarities. For each such
connected pair, a calculation of ΔΔG_ij is carried out,
corresponding to a single edge in the graph. To obtain a
prediction for a particular molecule, a single edge is the
minimal calculation required, though calculation of the full set
of ΔΔG_ij within a perturbation graph and application of
cycle-closure corrections can improve the accuracy [9]. In practice, due
to the complexity and computational expense of applying the
method, single-edged affinity predictions are often employed.
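The single-edged prediction and the cycle-closure idea described above can be written compactly. This is a sketch in the notation of the text, assuming the sign convention ΔΔG_ij = ΔG_j − ΔG_i, which is not stated explicitly here; the labels "pred" and "exp" are introduced only for illustration.

```latex
\begin{align}
\Delta G_{j}^{\mathrm{pred}} &= \Delta G_{\mathrm{ref}}^{\mathrm{exp}} + \Delta\Delta G_{\mathrm{ref},j} \\
\Delta\Delta G_{ij} + \Delta\Delta G_{jk} + \Delta\Delta G_{ki} &= 0
\end{align}
```

The first line is the single-edge case: one computed difference from the reference ligand, offset by the reference ligand's known free energy. The second line holds exactly for true free energies around any closed cycle in the graph, so the amount by which computed values fail to sum to zero gives a consistency check that cycle-closure corrections can exploit.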
We limited our FEP+ predictions to a subset of 17 of the 67
future test molecules that were suitable for single-edged ΔΔG
calculations from a single reference ligand. Figure 4 shows the
FEP+ reference ligand (13) and four example test molecules (4
and 14−16) from the 17 molecule test subset. All 17 molecules
in the subset used with FEP+ had the spirocyclic pyrrolidine
core and differed only by the R group at the pyrrolidine
nitrogen. Standard Glide MCSS docking [30] was used to
establish initial binding modes for the FEP+ calculations (see
the Experimental Section for details).
In order to illustrate ligand movement within the LFA-1
allosteric binding site, the results of ensemble docking are also
shown in Figure 4. The ensemble docking pose families shown
Figure 4. FEP+ reference ligand (13) and four test molecules (4 and 14−16) are shown. FEP+ employs an initial docked pose of the reference
molecule in the LFA-1 binding pocket. The top pose family of the reference ligand resulting from ensemble docking using Surflex-Dock is shown to
illustrate the potential conformational variation of the ligand in the protein pocket.
here are consistent across the different compounds: while the
spirocyclic core tends to bind in a relatively fixed orientation,
there is the potential for conformational variation for the
pyrrolidine nitrogen R groups. This conformational variation is
consistent with crystal structures, which also suggested that
substituents anchored from the five-membered ring system
project into solvent [23].
Figure 5 shows four examples from comparisons of QuanSA
and FEP+ activity predictions on the 17 molecule subset. The
final optimal pose families from QuanSA for each of the
molecules 4, 14, 15, and 16 follow the motif seen for test
compounds 9−12 (Figure 3) with the spirocyclic pyrrolidine
scaffold in a relatively fixed position and conformational
variation for the R group on the pyrrolidine nitrogen. Also, the
pocket-field interactions followed the same pattern, mostly
hydrophobic with a prominent acceptor interaction near the
urea carbonyl. The possible orientation changes in the R
groups are reflected in the starting docked poses for FEP+.
QuanSA predicted the activities of compounds 4, 14, and 15
within 0.5 log unit of activity. FEP+ predictions for these active
molecules were slightly less accurate, but still quite good. Note
that the orientations of the nitrogen substituents of the
pyrrolidine differ between QuanSA and FEP+. This was
expected, reflecting the pose variation seen in Figure 4 from
ensemble docking of the reference ligand. The QuanSA
alignments were driven by mutual similarity, influenced by the
crystallographic reference ligand pose toward the “bottom” of
the ligands, which shared structural homogeneity. The diversity
of substituent orientations seen in the FEP+ poses reflected
solvent exposure with sparse protein interactions.
Combining the two methods by averaging their independent
predictions (termed “hybrid” model predictions) often led to
partial cancellation of errors. For example, for the relatively
active molecule 14 and the significantly less active molecule
16, predictions from both methods were off, but the errors
were opposite in sign. By combining the results from the two
methods, the predictions for both molecules were reduced to
negligible discrepancies from experimental activity. Note that
typical standard deviations in repeated LFA-1 IC50
determinations were approximately 0.1 pKi unit [23,24].
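The error cancellation described in this paragraph follows directly from the averaging. As a short worked derivation, let y be the experimental pKi, and let e_Q and e_F be the signed errors of the QuanSA and FEP+ predictions (symbols introduced here for illustration):

```latex
\begin{align}
\hat{y}_H &= \tfrac{1}{2}\,(\hat{y}_Q + \hat{y}_F) \\
e_H &= \hat{y}_H - y = \tfrac{1}{2}\,(e_Q + e_F) \\
|e_H| &\le \tfrac{1}{2}\,\bigl(|e_Q| + |e_F|\bigr)
\end{align}
```

The hybrid unsigned error therefore never exceeds the mean of the two individual unsigned errors, and it is strictly smaller whenever the two errors have opposite signs. With purely illustrative values e_Q = +0.4 and e_F = −0.5, the hybrid error is −0.05.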
Figure 6 shows a plot of individual test performance on a
subset of 17 ligands for the QuanSA structure-guided model
(purple times signs, MUE = 0.44), and for FEP+ (green plus
signs, MUE = 0.56) as well as for the combination of the
methods (red squares, MUE = 0.25). Hybrid predictions were
defined as the average of the QuanSA and FEP+ predictions for
each molecule. Using a paired t test, the relatively small
difference in prediction errors between QuanSA and FEP+ was
not statistically significant (p-value = 0.24). However, the
hybrid model performed statistically better than FEP+ alone
(p-value = 0.002) and better than structure-guided QuanSA
alone (the paired t test p-value of 0.09 just misses weak
significance). The signed prediction errors of QuanSA and
FEP+ were only slightly correlated (p = 0.04 by Kendall’s τ),
allowing the hybrid model to exhibit marked improvement.
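The hybrid prediction and the paired comparison of unsigned errors can be sketched as follows. The numbers are hypothetical and are not the 17-molecule LFA-1 results; only the t statistic is computed here, and a p-value would come from a Student t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
from statistics import mean, stdev

def hybrid(pred_a, pred_b):
    """Simple per-molecule average of two sets of predictions."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]

def unsigned_errors(experimental, predicted):
    """Absolute prediction error for each molecule."""
    return [abs(e - p) for e, p in zip(experimental, predicted)]

def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean of paired differences (errors_a - errors_b)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical pKi values for five molecules.
exp = [7.0, 8.0, 6.5, 9.0, 7.5]
quansa = [7.4, 7.7, 6.9, 8.6, 7.8]
fep = [6.5, 8.5, 6.0, 9.3, 7.0]

hyb = hybrid(quansa, fep)
err_q = unsigned_errors(exp, quansa)
err_f = unsigned_errors(exp, fep)
err_h = unsigned_errors(exp, hyb)
t_fep_vs_hybrid = paired_t_statistic(err_f, err_h)
```

In this toy example the two methods err in opposite directions on every molecule, so the hybrid MUE falls well below both individual values; with positively correlated errors the gain would be smaller.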
Early-Stage Project Application Scenario: Sixteen
FEP+ Benchmark Targets. Early-stage project application
may offer only a handful of data points within a relatively
newly identified chemical series. The original FEP+
benchmark [9], here referred to as the Abel benchmark, consisted of
eight targets, each with a prediction set ranging from 11 to 42
members (each including a reference compound within the
prediction set). More recent benchmarking work, here referred
to as the Schindler benchmark [11], consisted of eight targets,
each with a prediction set ranging from 24 to 44 members
(each including a reference compound within the prediction
set).
Structure−activity data within some series were extremely
limited, but contemporaneously available structure−activity
data and protein structure data were plentiful in other cases.
Figure 7 shows how a focused approach to model induction
can be applied in cases where sparse data exist within a
Figure 5. Four examples from comparisons of activity predictions on
a 17 molecule subset of the blind test set. For QuanSA, the top pose
family for each test molecule plus the interaction sticks of the top pose
with the pocket field is shown. For FEP+, the initial docked poses are
shown. Hybrid predicted pKi values are the simple average of the
QuanSA and FEP+ values.
particular series, but where data from other series can be
exploited. We made use of the reference ligand in each case to
identify particularly relevant protein−ligand complexes and
structure−activity data, where the information was available
contemporaneously with the public disclosure of the molecules
within the prediction set.
The eSim 3D molecular similarity method (described in
detail previously [31]) was employed to identify particularly
relevant protein structures: those whose cognate ligands
exhibited high similarities to the FEP+ reference ligand when
the protein binding sites were aligned. Then, those bound
ligand poses were used to screen for the most structurally
similar ligands from the available bioactivity data. Both of the
filtering steps were applied to data that were publicly available
when each FEP+ benchmark series was made public, through
either publication or a patent. Compounds from future years
were reserved for testing screening-style extrapolation from the
structurally focused data sets.
Focused QuanSA Model Building. Figure 8 illustrates
the focused model building process, using SHP2 as an example.
An allosteric mechanism for inhibiting SHP2 was published in
2016 [32], with a chemical series related to the initial lead
structures being disclosed in a subsequent patent that was
granted in 2018 (U.S. Patent 10,093,646), which contained the
structure−activity data used for the FEP+ prediction set. There
were several cocrystallized allosteric inhibitors available by
2018 (top middle of Figure 8), with some extending quite far
beyond the spatial extent of the series of interest. By employing
a static eSim similarity measurement between each of the
crystallographically aligned ligands and the reference ligand, a
filtered subset of relevant bound variants was identified (top
right).
Similarly, by 2018, a large number of alternative allosteric
inhibitors had been discovered, again with many extending far
beyond the reference ligand. In practice, with a physically
grounded affinity prediction method such as QuanSA, such a
large set of competitive inhibitors dilutes the predictive
performance of models within the space that closely
encompasses a particular series or set of related series that
explore the same area. The full set of known ligands was
screened against the multiple-ligand crystallographically
derived alignment of relevant bound ligands using the eSim
method [31], and those ligands whose scores exceeded a
threshold were retained (bottom middle of Figure 8). Finally,
the standard process for QuanSA model induction was
employed, making use of the relevant bound ligand poses to
help constrain generation of initial poses for all ligands. This
step may also filter the training molecule pool further on the
basis of multiple stages of accumulating ligands that are at first
similar to the bound ligands, then those which are similar to
the newly aligned ligands, and so forth (see the Experimental
Section for additional details). In the case of SHP2, the full
pool of known ligands from 2018 and earlier numbered 514,
with the eSim-based filtering process against the crystallo-
graphic ligands resulting in 51 molecules. The QuanSA
alignment initialization’s accumulative process retained 15 of
51 from the filtered training pool (bottom right of Figure 8).
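The accumulative selection described above (ligands similar to the bound ligands first, then ligands similar to those just added, and so forth) can be sketched generically. This is not the QuanSA or eSim implementation: the `similarity` callable, the threshold, and the toy feature-set "molecules" are all stand-ins chosen for illustration.

```python
def accumulate_focused_set(seeds, pool, similarity, threshold):
    """Grow a focused training set in stages: add pool members similar to
    the seeds, then members similar to the newly added ones, until no
    further members qualify."""
    selected = []
    frontier = list(seeds)
    remaining = list(pool)
    while frontier:
        newly_added = [
            m for m in remaining
            if any(similarity(m, f) >= threshold for f in frontier)
        ]
        remaining = [m for m in remaining if m not in newly_added]
        selected.extend(newly_added)
        frontier = newly_added  # next stage compares against new members only
    return selected

def jaccard(a, b):
    """Toy stand-in similarity on feature sets (NOT eSim)."""
    return len(a & b) / len(a | b)

# Hypothetical "molecules" represented as sets of abstract features.
seeds = [frozenset({"a", "b", "c"})]
pool = [
    frozenset({"a", "b", "d"}),  # similar to the seed
    frozenset({"b", "d", "e"}),  # similar only to the first pool member
    frozenset({"x", "y", "z"}),  # unrelated, never accumulated
]
focused = accumulate_focused_set(seeds, pool, jaccard, threshold=0.5)
```

The staged growth keeps the training pool anchored to the region explored by the bound ligands while still admitting molecules that are related only indirectly.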
Each of the 16 targets underwent the same procedure for
focused model building, as just described. Figure 9 illustrates
predictions for SHP2 on four representative ligands (bottom
row), along with representative training ligands (top row).
Prediction values are shown for FEP+ and QuanSA, and the
“hybrid” prediction for each ligand is simply the average of
Figure 6. Comparisons of activity predictions on a 17 molecule subset
of the blind test set for QuanSA, FEP+, and hybrid methods.
Figure 7. Preparation and scoring procedures in the early lead
optimization scenario, using a bound reference compound to identify
relevant additional bound ligands, which are then used collectively to
identify a pool of relevant bioactivity data for input to QuanSA model
induction.
those two values. The prediction set is characterized largely by
the disposition of the amine (right side of the central core),
whether being primary or secondary and the characteristics of
its connection to the central scaffold, though some variation of
the left-hand substituent was also explored. For SHP2, the
mean unsigned error for both FEP+ and QuanSA was 0.6 log
unit, and the hybrid approach yielded 0.4. The sparse data for
model training were able to cover the variations present in the
prediction set, and the errors of the two primary approaches
partially canceled, allowing for the improvement seen in the
hybrid approach.
Figure 10 shows the analogous information for c-MET,
where 59 ligands of diverse structural character formed the
final focused set for model parameterization. In contrast to
SHP2, the available training set consisted of molecules outside
the series of interest, and four different heterocyclic cores are
present in the training examples shown in Figure 10. The
QuanSA approach was able to learn the effects of the various
substitutions from alternative scaffolds and to transfer the
Figure 8. Process of constructing a focused QuanSA model from diverse data for SHP2.
Figure 9. Representative examples of predictions from the FEP+, QuanSA, and hybrid approaches for SHP2. Note that many of the SHP2 training
compounds came from the same patent and series as the prediction set.
information to the particular series represented in the
prediction set.
The crucial distinction between QuanSA and other machine-
learning methods for affinity prediction is that QuanSA
constructs a model that is physically analogous to a protein
binding site. Therefore, in order to accurately predict, for
example, the quantitative effect of the morpholine of the left-
most prediction case, other examples of ligands that place
cationic species in the same vicinity in their bound states must
be properly modeled in the learning process. The blue
interaction stick (red arrow) shows the preference that the
pocket field has for an amine that is geometrically disposed as
in the optimal pose of this prediction example. It is difficult to
understand the effect on binding from a protein structural
perspective. The amine appears to be within the solvent,
relatively far from an obvious interaction partner. This perhaps
explains why the structure-focused FEP+ approach made an
underprediction. In this case, the hybrid prediction was quite
accurate (just 0.2 pKi unit low). The right-most prediction
example shows a case in which the hybrid approach did not
perform the best of all three, but it significantly improved upon
the poorer of the two primary predictions.
It is conceivable, given a sufficiently large quantity of data,
that a learning method which ignores the conformational strain
and pose of ligands in their bound state could make
meaningfully accurate predictions in cases like this. However,
for the type of fine-grained guidance represented by these
examples, many early-stage lead optimization projects lack such
quantities of data. For the most challenging targets, where
relevant structure−activity data are the most scarce, methods
that can make effective use of data sets measured in dozens of
compounds rather than thousands have a clear advantage.
Statistical Analysis for Focused Model Building.
Figure 11 shows plots for the full set of predictions for both
benchmark sets along with the cumulative histograms of
unsigned prediction errors, exhibiting the same type of error
Figure 10. Representative examples of predictions from the FEP+, QuanSA, and hybrid approaches for c-MET.
cancellation seen for LFA-1. The hybrid approach produced a
clear reduction in the fraction of predictions with large errors.
For the Schindler benchmark, the hybrid predictions had fewer
than 20% with errors of 1.0 log unit or greater, compared with
just under 30% for QuanSA and just under 40% for FEP+. For
the Abel benchmark, which consisted of smaller “jumps” than
seen for the Schindler benchmark targets, the hybrid approach
produced roughly 10% of predictions with errors of 1.0 log unit
or greater, with FEP+ yielding just over 30% and QuanSA just
over 20%. For the Schindler benchmark, the unsigned error for
the hybrid predictions was very significantly better than that of
either of the other two methods (p values of 10⁻¹⁰ and 10⁻⁶
compared with FEP+ and QuanSA, respectively, using the
paired t test). For the Abel benchmark, the unsigned error for
the hybrid predictions was very significantly better than that
for FEP+ (p value of 10⁻⁹). Between the hybrid and QuanSA
approaches, the hybrid method’s reduction in large errors
would make it the preferred of the two, even though the error
distributions were not well differentiated by the paired t
test.
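The comparison above can be sketched in a few lines. The hybrid prediction is taken to be the simple average of the two primary predictions (as stated in the abstract), and the fraction of predictions with an unsigned error of 1.0 log unit or more is computed for each method. The pKi values below are invented purely for illustration and are not benchmark data; the function names are hypothetical.

```python
from statistics import mean

def hybrid(fep, quansa):
    """Average the two primary predictions, ligand by ligand."""
    return [(f + q) / 2.0 for f, q in zip(fep, quansa)]

def fraction_large_errors(pred, expt, cutoff=1.0):
    """Fraction of predictions whose unsigned error is >= cutoff (log units)."""
    errors = [abs(p - e) for p, e in zip(pred, expt)]
    return sum(err >= cutoff for err in errors) / len(errors)

# Hypothetical pKi values, chosen so the two methods tend to err in opposite directions.
expt = [6.0, 7.0, 8.0, 6.5]
fep = [7.2, 6.8, 9.1, 6.4]      # mostly overpredicts in this toy example
quansa = [5.6, 7.1, 7.5, 5.3]   # mostly underpredicts in this toy example
hyb = hybrid(fep, quansa)

for name, pred in (("FEP+", fep), ("QuanSA", quansa), ("hybrid", hyb)):
    mue = mean(abs(p - e) for p, e in zip(pred, expt))
    print(name, round(mue, 2), fraction_large_errors(pred, expt))
```

When the errors of the two methods are only weakly correlated, averaging pulls the largest deviations toward the experimental value, which is the error cancellation described for LFA-1.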
Figure 11. Plots of all predictions for each of the three methods for both FEP+ benchmarks along with cumulative histograms of unsigned
prediction error. Lines indicate perfect performance (solid black), 1 kcal/mol error (dashed dark gray), and 2 kcal/mol error (dashed light gray).
Prediction errors for the FEP+ approach in this analysis were
larger than those reported for the original analysis for the Abel
benchmark.⁹ Here, the reference ligand was treated as a
training exemplar, with known absolute ΔG, and the ligands in
the prediction set were treated as having unknown activity.
Given the reported ΔΔG values, final ΔG values for the
prediction set were computed using the reference ligand’s value as
an offset. In the original work, all experimental ΔG values were
used to “center” the predicted values. The more recent analysis
for the Schindler benchmark¹¹ noted this issue, and statistics
were not calculated for deviation from ΔG. Rather, emphasis
was placed on correlation statistics and upon pairwise ΔΔG
error magnitudes (the accuracy of single-edged predictions). In
practice, in a real prediction scenario, the average ΔG of a
prediction set cannot be known. Therefore, our analysis treats
the FEP+, QuanSA, and hybrid approaches in the same
manner, with the reference ligand as part of the “knowns” and
the prediction set as “unknowns”.
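The difference between the two conventions can be made concrete with a minimal sketch. It assumes the sign convention ΔΔG = ΔG(ligand) − ΔG(reference); the numerical values (kcal/mol) and function names are hypothetical and serve only to show why centering on the experimental values of the prediction set gives an optimistic error estimate.

```python
from statistics import mean

def absolute_from_reference(ddg_pred, dg_ref):
    """Absolute dG from predicted ddG, using only the reference ligand's known dG."""
    return [dg_ref + ddg for ddg in ddg_pred]

def absolute_by_centering(ddg_pred, dg_expt):
    """Shift predictions so their mean equals the mean experimental dG.
    This uses experimental values of the prediction set, which are unknown in practice."""
    shift = mean(dg_expt) - mean(ddg_pred)
    return [ddg + shift for ddg in ddg_pred]

def mue(pred, expt):
    """Mean unsigned error."""
    return mean(abs(p - e) for p, e in zip(pred, expt))

# Hypothetical values in kcal/mol.
dg_ref = -9.0                    # known dG of the reference ligand
ddg_pred = [-1.0, 0.5, 1.5]      # predicted ddG relative to the reference
dg_expt = [-9.5, -8.0, -7.0]     # experimental dG of the prediction set

print(mue(absolute_from_reference(ddg_pred, dg_ref), dg_expt))
print(mue(absolute_by_centering(ddg_pred, dg_expt), dg_expt))
```

In this toy case every prediction carries the same systematic shift, so centering removes the error entirely, whereas the reference-offset convention retains it.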
Table 1 shows per-target data set sizes and statistical values
for the three methods, both using rank-correlation (Kendall’s τ,
which is not affected by the offset calculation) and MUE. In all
cases, the hybrid approach had the lowest MUE. In five of
eight cases, it also had the highest rank correlation, with FEP+
and QuanSA showing very slightly higher values in the
remaining three cases (two for FEP+ and one for QuanSA). In
no case did the hybrid approach fail to produce a statistically
significant ranking, compared with one failure each for FEP+
and QuanSA (italics). Table 2 shows the analogous data for
the Abel benchmark. Note that, in three cases (MCL1, BACE,
and P38), a random half of the original prediction set was used
for training (marked with asterisks in Table 2; see the
Experimental Section for details). The pattern was similar to
that observed for the Schindler set, with the hybrid method
producing the best performance, either by MUE or by rank
correlation, though the advantage of the hybrid approach over
the QuanSA method was smaller.

Table 1. Per-Target Performance of FEP+, QuanSA, and Hybrid Approaches and Data Set Sizes for the Eight Targets of the Schindler Benchmarkᵃ

| target | N pred | N full pool | N filtered | N final | MUE FEP+ | MUE QuanSA | MUE hybrid | Kendall’s τ FEP+ | Kendall’s τ QuanSA | Kendall’s τ hybrid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SHP2 | 25 | 514 | 51 | 15 | 0.58 (0.41−0.76) | 0.61 (0.41−0.84) | 0.40 (0.27−0.53) | 0.69 (0.46−0.87) | 0.43 (0.01−0.75) | 0.72 (0.47−0.90) |
| PFKFB3 | 39 | 489 | 34 | 34 | 1.08 (0.86−1.30) | 0.72 (0.57−0.88) | 0.45 (0.33−0.58) | 0.70 (0.56−0.82) | 0.50 (0.31−0.66) | 0.73 (0.62−0.84) |
| SYK | 43 | 1827 | 18 | 18 | 0.63 (0.48−0.81) | 0.62 (0.48−0.78) | 0.49 (0.37−0.62) | 0.34 (0.07−0.59) | 0.13 (−0.12−0.37) | 0.35 (0.09−0.60) |
| HIF2a | 41 | 63 | 30 | 29 | 0.70 (0.53−0.88) | 0.82 (0.63−1.02) | 0.58 (0.44−0.74) | 0.54 (0.27−0.77) | 0.42 (0.18−0.64) | 0.51 (0.25−0.72) |
| TNKS2 | 27 | 541 | 150 | 143 | 0.86 (0.59−1.15) | 0.74 (0.60−0.89) | 0.64 (0.45−0.83) | 0.34 (−0.04−0.65) | 0.55 (0.26−0.76) | 0.49 (0.16−0.75) |
| c-MET | 23 | 176 | 62 | 59 | 1.07 (0.80−1.34) | 0.82 (0.66−0.99) | 0.71 (0.54−0.90) | 0.82 (0.66−0.93) | 0.68 (0.51−0.83) | 0.85 (0.73−0.95) |
| CDK8 | 32 | 130 | 60 | 60 | 0.96 (0.68−1.25) | 0.99 (0.71−1.30) | 0.90 (0.68−1.16) | 0.66 (0.42−0.87) | 0.45 (0.19−0.68) | 0.66 (0.44−0.85) |
| EG5 | 27 | 147 | 34 | 34 | 1.08 (0.90−1.26) | 1.09 (0.84−1.32) | 0.96 (0.78−1.15) | 0.73 (0.53−0.89) | 0.47 (0.09−0.77) | 0.67 (0.40−0.89) |
| mean ± SD | 32.1 | 485.9 | 54.9 | 49.0 | 0.87 ± 0.21 | 0.80 ± 0.17 | 0.64 ± 0.21 | 0.60 ± 0.18 | 0.45 ± 0.16 | 0.62 ± 0.16 |

ᵃUnsigned error is in units of pKi, and Kendall’s τ values are unitless. Numbers in parentheses are 95% confidence intervals calculated by
resampling with replacement, bolded values are the best from any method, and values shown in italics did not meet statistical significance at the p =
0.01 level. The values in the bottom row are the mean and standard deviation for the respective statistical measurement column.

Table 2. Per-Target Performance of FEP+, QuanSA, and Hybrid Approaches and Data Set Sizes for the Eight Targets of the Abel Benchmarkᵃ

| target | N pred | N full pool | N filtered | N final | MUE FEP+ | MUE QuanSA | MUE hybrid | Kendall’s τ FEP+ | Kendall’s τ QuanSA | Kendall’s τ hybrid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| thrombin | 10 | 2401 | 74 | 74 | 0.55 (0.34−0.81) | 0.42 (0.23−0.65) | 0.28 (0.18−0.41) | 0.63 (−0.16−1.00) | 0.63 (−0.27−1.00) | 0.85 (0.45−1.00) |
| MCL1* | 20 | 170 | 35 | 34 | 0.78 (0.53−1.05) | 0.30 (0.15−0.49) | 0.41 (0.25−0.57) | 0.58 (0.13−0.89) | 0.73 (0.35−1.00) | 0.70 (0.36−0.94) |
| BACE* | 17 | 1705 | 93 | 93 | 0.98 (0.75−1.21) | 0.30 (0.20−0.41) | 0.46 (0.34−0.60) | 0.77 (0.54−0.94) | 0.62 (0.22−0.90) | 0.81 (0.54−0.98) |
| P38* | 16 | 1901 | 92 | 84 | 0.66 (0.38−0.95) | 0.62 (0.39−0.87) | 0.49 (0.34−0.65) | 0.58 (0.22−0.85) | 0.13 (−0.35−0.62) | 0.64 (0.31−0.89) |
| PTP1b | 22 | 528 | 53 | 41 | 0.78 (0.59−0.97) | 0.96 (0.62−1.31) | 0.54 (0.37−0.73) | 0.81 (0.53−0.99) | 0.27 (−0.13−0.60) | 0.65 (0.37−0.88) |
| CDK2 | 15 | 86 | 43 | 43 | 0.69 (0.49−0.89) | 0.84 (0.52−1.17) | 0.61 (0.39−0.83) | 0.29 (−0.23−0.76) | 0.67 (0.33−0.92) | 0.71 (0.32−0.97) |
| TYK2 | 15 | 124 | 48 | 48 | 0.62 (0.45−0.84) | 0.80 (0.49−1.14) | 0.69 (0.50−0.89) | 0.71 (0.34−0.98) | 0.53 (0.11−0.88) | 0.78 (0.45−1.00) |
| JNK1 | 20 | 155 | 68 | 55 | 1.38 (1.04−1.71) | 0.46 (0.27−0.70) | 0.74 (0.57−0.90) | 0.89 (0.69−1.00) | 0.64 (0.36−0.85) | 0.88 (0.65−1.00) |
| mean ± SD | 16.9 | 883.8 | 63.3 | 59.0 | 0.81 ± 0.21 | 0.59 ± 0.17 | 0.53 ± 0.21 | 0.66 ± 0.18 | 0.53 ± 0.16 | 0.75 ± 0.16 |

ᵃUnsigned error is in units of pKi, and Kendall’s τ values are unitless. Numbers in parentheses are 95% confidence intervals calculated by
resampling with replacement, bolded values are the best from any method, and values shown in italics did not meet statistical significance at the p =
0.01 level. The values in the bottom row are the mean and standard deviation for the respective statistical measurement column.
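The per-target statistics in Tables 1 and 2 (MUE, Kendall’s τ, and confidence intervals from resampling with replacement) can be illustrated with a short sketch. Several details here are assumptions rather than the protocol actually used: the tau-b variant (which handles the ties that resampling creates), the percentile form of the bootstrap interval, the number of resamples, and the toy pKi values.

```python
import math
import random

def kendall_tau_b(x, y):
    """Kendall's tau-b via an O(n^2) pair count (adequate for a few dozen ligands)."""
    conc = disc = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    n_pairs = n * (n - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return (conc - disc) / denom if denom > 0 else 0.0

def mean_unsigned_error(pred, expt):
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)

def bootstrap_ci(stat, pred, expt, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample ligands with replacement and recompute the statistic."""
    rng = random.Random(seed)
    n = len(pred)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(stat([pred[i] for i in idx], [expt[i] for i in idx]))
    values.sort()
    lo = values[int((alpha / 2) * n_boot)]
    hi = values[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical experimental and predicted pKi values for one small prediction set.
expt = [5.2, 5.9, 6.3, 6.8, 7.1, 7.6, 8.0, 8.4]
pred = [5.6, 5.5, 6.6, 6.4, 7.5, 7.3, 7.7, 8.9]

print("tau:", kendall_tau_b(pred, expt), bootstrap_ci(kendall_tau_b, pred, expt))
print("MUE:", mean_unsigned_error(pred, expt), bootstrap_ci(mean_unsigned_error, pred, expt))
```

With prediction sets of 10 to 43 ligands, intervals of this kind are wide, which is consistent with the broad τ ranges reported for several targets.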
The average Kendall’s τ values over all 16 targets for the
three methods were as follows: 0.63 (FEP+), 0.49 (QuanSA),
and 0.69 (hybrid). None of the per-target rank-correlation
differences between methods were statistically significant at p =
0.01 by the paired t test due to the relatively small number of
targets. The statistical power is also limited by the fact that
each individual data set is relatively small, and several are
dominated by a narrow experimental assay range, so the
correlation statistics tend to have high variance. The values of
the average per-target unsigned error for the three methods
were 0.84 (FEP+), 0.69 (QuanSA), and 0.58 (hybrid). By the
paired t test, the per-target hybrid MUE was consistently lower
than those for FEP+ (p < 10⁻³) and QuanSA (p = 0.02). This
agreed with the analysis of the unsigned prediction error across
the ligands within the Schindler benchmark (N = 257) and the
Abel benchmark (N = 135), which offer more statistical power
to differentiate between the methods (see Figure 11).
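As a worked check on the per-target comparison, the paired t statistic can be computed directly from the MUE columns of Tables 1 and 2. This is a sketch under the assumption that the test was run on exactly these rounded, tabulated values; the variable and function names are ours. Only the t statistic is computed here. Converting it to a p value requires the Student t distribution with 15 degrees of freedom (for example, `scipy.stats.ttest_rel` performs both steps).

```python
import math
from statistics import mean, stdev

# Per-target MUE values transcribed from Tables 1 and 2
# (eight Schindler targets first, then eight Abel targets).
FEP = [0.58, 1.08, 0.63, 0.70, 0.86, 1.07, 0.96, 1.08,
       0.55, 0.78, 0.98, 0.66, 0.78, 0.69, 0.62, 1.38]
QUANSA = [0.61, 0.72, 0.62, 0.82, 0.74, 0.82, 0.99, 1.09,
          0.42, 0.30, 0.30, 0.62, 0.96, 0.84, 0.80, 0.46]
HYBRID = [0.40, 0.45, 0.49, 0.58, 0.64, 0.71, 0.90, 0.96,
          0.28, 0.41, 0.46, 0.49, 0.54, 0.61, 0.69, 0.74]

def paired_t(a, b):
    """Paired t statistic for the hypothesis mean(a - b) = 0, plus degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

t_fep, dof = paired_t(FEP, HYBRID)
t_quansa, _ = paired_t(QUANSA, HYBRID)
print(f"FEP+ vs hybrid:   t = {t_fep:.2f}, dof = {dof}")
print(f"QuanSA vs hybrid: t = {t_quansa:.2f}, dof = {dof}")
```

The resulting statistics are roughly 5.0 and 2.6, which with 15 degrees of freedom are in line with the reported p values.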
With respect to the sizes of the bioactivity data sets, we see
that the typical size of the nominally available bioactivity data
was in the hundreds of molecules. However, only roughly one-tenth
of these survived the filter of relevance against the bound
known ligand poses, which themselves were filtered for
relevance. The final focused set of structure−activity data for
each target ranged from 15 (SHP2) to 143 (TNKS2),
averaging about 50 for the Schindler benchmark and about
60 for the Abel benchmark. Data requirements of this scale for
the final models are generally within the scope of lead
optimization projects quite early on in the exploration of a new
chemical series.
Figure 12. Plots for all eight targets of the Schindler FEP+ benchmark. FEP+ shown in green plus signs, QuanSA shown in violet times signs, and
hybrid predictions shown in red squares, with a single gray circle marking the activity of the FEP+ reference ligand (treated as part of the training
set). In addition, a histogram of signed prediction errors is shown in the lower right.
Figures 12 and 13 show individual plots for all predictions
by each method for each target along with histograms of the
signed prediction error values. The histograms showed a
marked decrease in errors of large magnitude (either over- or
underpredictions) by the hybrid method (shown in red). None
of the methods exhibited a systematic bias, with all histograms
being centered very close to zero.
Figure 13. Plots for all eight targets of the Abel FEP+ benchmark. FEP+ shown in green plus signs, QuanSA shown in violet times signs, and hybrid
predictions shown in red squares, with a single gray circle marking the activity of the FEP+ reference ligand (treated as part of the training set). In
addition, a histogram of signed prediction errors is shown in the lower right.
During fine-grained lead optimization, while the rank order
of synthetic candidates is clearly important, the absolute
accuracy of affinity predictions takes on additional importance.
For example, in the case of PFKFB3, the reference ligand had a
pKi of roughly 6.5. Consider the predictions for the 39
molecules of the test set as nominal true positives (TPs,
predicted and experimental ≥ reference), true negatives (TNs,
predicted and experimental ≤ reference), false positives (FPs,
predicted ≥ reference, experimental ≤ reference), and false
negatives (FNs, predicted ≤ reference, experimental ≥
reference). Both FEP+ and QuanSA produced good rankings
(0.70 and 0.50 by Kendall’s τ, respectively, both with p < 0.01).
The experimental data contained 20 positives and 19 negatives.
FEP+ correctly identified 20 of 20 TPs, but at the expense of 10
of 19 FPs. QuanSA correctly identified 7 of 20 TPs, but it did
so with 0 of 19 FPs. The hybrid approach obtained 20 of 20
TPs with just 4 of 19 FPs. The hybrid method also produced
the best Kendall’s τ (0.73, a marginal increase).
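The TP/TN/FP/FN bookkeeping relative to the reference ligand can be sketched as follows. The treatment of values exactly equal to the reference (counted as positive) is our assumption, and the pKi values are hypothetical rather than the PFKFB3 data.

```python
def confusion_counts(pred, expt, reference):
    """Count TP, FP, TN, FN, treating values >= reference as 'positive'."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, e in zip(pred, expt):
        pred_pos = p >= reference
        expt_pos = e >= reference
        if pred_pos and expt_pos:
            counts["TP"] += 1
        elif pred_pos and not expt_pos:
            counts["FP"] += 1
        elif not pred_pos and not expt_pos:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

# Hypothetical predicted and experimental pKi values around a reference of 6.5.
reference_pki = 6.5
expt = [7.0, 6.8, 6.0, 5.5, 7.4]
pred = [7.2, 6.1, 6.9, 5.0, 6.6]
print(confusion_counts(pred, expt, reference_pki))
```

A method with a good rank correlation can still produce many FPs or FNs if its predictions are systematically shifted relative to the reference, which is why absolute accuracy matters in this setting.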
A different effect was seen with the case of SYK. None of the
methods yielded a high-quality ranking, but this was an effect
of the data distribution. Both primary methods produced relatively
accurate results on an absolute scale (MUE of 0.6 each), with a
relatively small fraction of predictions being off by more than 2
kcal/mol (9 for FEP+ and 4 for QuanSA of
43 total predictions). The hybrid approach made an improvement
(MUE of 0.5), with 36 of 43 molecules predicted within
1 kcal/mol of experimental and just 1 of 43 with an error of
greater than 2 kcal/mol. Nevertheless, the rank correlation was
marginal, reflecting the limitations of rank-based statistics in
such cases.
Predicted inhibitory activity relative to the current project
context may have a significant influence on decisions about
which candidate molecules to synthesize. In particular, a
smaller proportion of large absolute errors will provide better
guidance if rank correlation is equivalent between two
prediction methods and, possibly, even if rank correlation is
slightly worse for the method with better absolute fidelity.
Overall, the hybrid approach appears to be the best choice in
terms of achieving accurate absolute binding affinities or
rankings thereof. Across all 16 targets, with respect to MUE, it
was the best approach in 12 of 16 cases and second best by a
small margin in 4 of 16 cases. With respect to ranking, it was
either the best approach (10 of 16) or second best by a small
margin (6 of 16). Beyond the per-target performance, as seen
from the cumulative histograms of unsigned errors in Figure 11
and the histograms of signed errors in Figures 12 and 13, the
hybrid approach made a marked improvement in terms of the
frequency of large errors, both for overpredictions and for
underpredictions. Nearly 70% of the time, hybrid predictions
were within 1 kcal/mol of experiment, and errors of 2 kcal/mol
or greater occurred 5% of the time or less.
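Because errors are quoted both in pKi units and in kcal/mol, it may help to make the conversion explicit. The sketch below uses the standard relation ΔG = −RT ln(10) · pKi; the temperature of 298.15 K is our assumption, as is treating Ki as a dissociation constant at standard state.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kcal_per_pki_unit(temp_k=298.15):
    """Free-energy change corresponding to one log unit of Ki: RT ln(10)."""
    return R_KCAL * temp_k * math.log(10.0)

def kcal_to_pki(error_kcal, temp_k=298.15):
    """Convert an error in kcal/mol to the equivalent error in pKi units."""
    return error_kcal / kcal_per_pki_unit(temp_k)

def pki_to_kcal(error_pki, temp_k=298.15):
    """Convert an error in pKi units to the equivalent error in kcal/mol."""
    return error_pki * kcal_per_pki_unit(temp_k)

for kcal in (1.0, 2.0):
    print(f"{kcal} kcal/mol is about {kcal_to_pki(kcal):.2f} pKi units")
```

Under these assumptions, the 1 and 2 kcal/mol thresholds correspond to roughly 0.7 and 1.5 pKi units, respectively.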
Extrapolation with QuanSA: Identification of Novel
Scaffolds and Linkers. Because the QuanSA method can be
applied automatically and rapidly, QuanSA pocket fields can be
used to screen large numbers of candidate molecules. We
explored the ability of the induced models to identify novel
active molecules from ChEMBL data, where the publication
dates of the reports of the new molecules were strictly later (by
year) than the data on which models were constructed. This
approach to data segregation makes it very unlikely that
information about the “new” molecules would have been
known and used in designing the molecules used to construct
the models. The converse, of course, is desirable: to see how
well a model can identify novel actives whose structures may
be, in part, reflected in the structures known at the time of
model construction.
We assessed the screening utility of the models for
identifying novel molecules by establishing thresholds on
minimum predicted activity (6.0 pKi units) and on the raw
nearest-neighbor similarity (0.60 eSim unit) of a screened
molecule to a training molecule, both in their predicted
optimal poses. In the project application scenarios previously
discussed, the eSim nearest-neighbor similarity was very high
(averages of 0.87, 0.89, and 0.79 for the LFA-1, Abel, and
Schindler sets, respectively), with only a single LFA-1
prediction molecule having an eSim score of less than 0.60,
none within the Abel set, and fewer than 5% within the
Schindler set. Note that, especially for structurally divergent
molecules, it is not expected that the activity predictions will be
as accurate as for ligands within the focus of the models.
Rather, a critical feature of the selection criteria is that they
identify a small fraction of false positives, as the space of
candidates to be explored may be large. In order to establish
specificity, we also screened a decoy set of 1000 drug/leadlike
ZINC molecules, with the entire set presumptively defined as
false positives. For the 17 targets, in 12 cases, 1 or fewer of the
1000 decoys met the thresholds; in three cases, two to four
false positives existed; and in two cases (TNKS2 and CDK8)
the estimated FP rate was 1−3%.
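The two-threshold selection rule and the decoy-based specificity estimate can be sketched as below. The thresholds (predicted pKi of at least 6.0 and nearest-neighbor eSim of at least 0.60) come from the text; the record type, field names, and scores are hypothetical, and real scores would come from QuanSA scoring runs.

```python
from dataclasses import dataclass

@dataclass
class Scored:
    name: str
    pred_pki: float   # predicted activity in pKi units
    esim_nn: float    # nearest-neighbor eSim similarity to a training molecule

def passes(mol, min_pki=6.0, min_esim=0.60):
    """A molecule is selected only if it meets both thresholds."""
    return mol.pred_pki >= min_pki and mol.esim_nn >= min_esim

def false_positive_rate(decoys):
    """All decoys are presumed inactive, so any decoy that passes counts as a false positive."""
    return sum(passes(d) for d in decoys) / len(decoys)

# Hypothetical screening candidates and decoys.
candidates = [
    Scored("cand_A", 7.4, 0.81),
    Scored("cand_B", 6.8, 0.55),   # active prediction, but too dissimilar to the training set
    Scored("cand_C", 5.2, 0.90),   # similar, but predicted activity too low
]
decoys = [Scored(f"decoy_{i}", 4.0 + 0.001 * i, 0.30) for i in range(999)]
decoys.append(Scored("decoy_999", 6.3, 0.62))   # the single decoy that passes

print([c.name for c in candidates if passes(c)])
print(false_positive_rate(decoys))
```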
For the LFA-1 case, just 44 ChEMBL molecules existed to
be screened as temporally prospective candidates, but none of
the molecules passed both criteria. Of the 16 FEP+ benchmark
targets, “future” data existed in ChEMBL for all but PFKFB3
to assess extrapolation utility. For these 15 targets, in all cases
except PTP1b, new active ChEMBL molecules were identified,
ranging from a handful (e.g., CDK8, HIF2a, SHP2, JNK1, and
TYK2) to dozens or hundreds (c-MET, SYK, TNKS2, BACE,
MCL-1, and P38).
Figure 14 shows examples from the Schindler target set, one
each for SHP2, c-MET, and SYK. In each case, the automatic
prediction of bound pose is important in establishing the
relationship between the novel compound and those forming
the training set. A notable example was observed for c-MET,
where the new compound was predicted to have greater
activity than any of the training molecules, and it was highly
active. This new molecule makes use of the triazolopyrazine at
right,³³ but it contains a novel linker.
Figure 15 shows examples from the Abel benchmark target
set, one each for BACE, MCL-1, and P38. These follow the
same pattern: predictions of target-specific activity that depend
upon identifying low-energy conformations of complex small
molecules that align with the predicted binding modes of
modeled ligands. Of particular note was MCL1. Here, a
macrocyclic linkage for a highly active inhibitor³⁴ was
identified.
Computational Time Complexity. The QuanSA and
FEP+ methods have quite different time requirements for
calculations. FEP+ has been optimized for GPU-based
acceleration, and calculations for this study were performed
by using computing nodes running Intel Xeon E7-8867 v3
CPUs (2.5 GHz, 16 core), with an NVIDIA Tesla K40c GPU
dedicated to each node. The most time-consuming preparatory
calculation is the estimation of custom force field parameters
for unparameterized torsions. In this case, the calculation
required approximately 1 h per molecule for each of the 17
molecules studied. Following force field parameterization, each
single-edged ΔΔG calculation (i.e., a prediction for a single
molecule) required just over 2 h. Because each new molecule
may require force field parameterization, the expectation is
roughly 3 h of wall-clock time per molecule. This allows for
synthetic prioritization during late-stage lead optimization
given an appropriate computing infrastructure, where
predictions on different molecules can be processed in parallel.
For QuanSA, the process of model induction and selection is
the preparatory calculation, but it is done a single time prior to
applying a model to many new ligands. The process consists of
conformational search, mutual ligand alignment, model
parameter learning, and model selection. The QuanSA method
makes use of multicore laptops and workstations for all
calculations, with the reference system containing dual Intel
Xeon Gold 6154 CPUs (3.00 GHz, 36 cores total) with no
GPU-based acceleration. For LFA-1, a fairly typical case, the
entire model induction and selection process required
approximately 3 h of real wall-clock time, the majority of
which was spent in the model parameter learning step for each
of several alternative alignment hypotheses. Calculations of
predicted binding poses and scores for new molecules required
an average of less than 10 s per molecule (including both
conformational search and fitting into the QuanSA pocket
field). Screens of very large numbers of possible design ideas or
virtual synthetic libraries are possible with the QuanSA
method, either employed directly or after using a similarity-
based method as a prescreen.
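The timing figures above imply very different scaling with the number of molecules. The arithmetic can be sketched as follows, using the approximate values quoted in the text (about 1 h of parameterization plus about 2 h per FEP+ edge, and about 3 h of one-time QuanSA model induction plus at most 10 s per scored molecule). The serial-execution assumption and the function names are ours, and because the two methods ran on different hardware the comparison is only indicative.

```python
def fep_hours(n_molecules, param_h=1.0, fep_h=2.0):
    """Serial wall-clock hours for FEP+: per-molecule force field
    parameterization plus one single-edge calculation."""
    return n_molecules * (param_h + fep_h)

def quansa_hours(n_molecules, induction_h=3.0, score_s=10.0):
    """One-time model induction plus per-molecule scoring
    (10 s is used as an upper bound on the scoring time)."""
    return induction_h + n_molecules * score_s / 3600.0

for n in (17, 1000, 100000):
    print(n, fep_hours(n), round(quansa_hours(n), 1))
```

FEP+ calculations on different molecules can run in parallel, so the serial total is an upper bound on elapsed time given sufficient computing nodes.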
■CONCLUSIONS
Using temporally segregated series of molecules from a lead
optimization project and from two public affinity-prediction
benchmarks, we have explored the performance of orthogonal
affinity prediction approaches. The QuanSA approach¹⁴
constructs binding-site models based on ligand structure and
affinity data, using multiple-instance machine learning. The
models are physically sensible in that they model the protein−ligand
interaction in a manner analogous to the physical
process of binding. Given a QuanSA pocket field, the process
of scoring a new molecule is analogous to docking to a protein
structure: conformational search, alignment to the pocket field,
and optimization of final poses with respect to conformational
strain and interaction score. A parallel set of experiments was
carried out by using the physics-based simulation approach
FEP+ on LFA-1 and by making use of previously published
data for two benchmarks comprising 16 additional targets.⁹,¹¹
For the LFA-1 lead optimization set, the QuanSA model’s
predicted affinities for 67 future compounds had an average
error of 0.4 pKi unit (with highly significant rank correlation
statistics). For the 17 compounds on which FEP+ calculations
were made, the average prediction error was marginally worse
than that for QuanSA. More importantly, by combining the
QuanSA and FEP+ predictions into a hybrid model, the
average error dropped significantly, to less than 0.3 pKi unit,
with a very high correlation between experimental and
predicted affinities. The errors that the modeling approaches
made in predictions were only slightly correlated, so the hybrid
model exhibited error cancellation.
Figure 14. Examples of novel ligands for each of three targets from
the Schindler FEP+ benchmark identified through temporally
prospective screening of focused QuanSA models.
Figure 15. Examples of novel ligands for each of three targets from
the Abel benchmark identified through temporally prospective
screening of focused QuanSA models.
This pattern of synergy between the two approaches, due to
partial error cancellation, was mirrored across 16 additional
examples derived from public FEP+ benchmarking data. The
machine-learning strategy involved choosing the most relevant
data to inform model construction, rather than simply
aggregating a large amount of data. The QuanSA method is
physically driven, and unnecessarily large data sets tend to
dilute the information required for accurate affinity prediction.
Both on a per-target basis and in terms of aggregate error
analysis on nearly 400 compound affinity predictions, the
hybrid approach (averaging the predictions from FEP+ and
QuanSA) performed better than the two primary methods,
particularly with respect to a reduction in large-magnitude
errors. Given the consistency of the results presented here, we
expect this to be true quite often, perhaps even for the majority
of targets across the range of early-stage and mid-to-late-stage
lead optimization projects.
Two aspects of the learned QuanSA models are important
by way of contrast to FEP+. First, application of the models is
computationally inexpensive. Second, the models are relatively
insensitive to molecular scaffold changes that may be
inappropriate for application of a free energy perturbation
method. Taken together, this allows for use cases such as
screening large numbers of virtual compounds or searching for
scaffold alternatives.
Currently, in cases where biophysical data are available,
application of FEP+ and related methods is often done
exclusive of machine-learning approaches. The results
presented here argue that a simple and direct approach can
improve upon either single-mode method. Apart from the
synergy of the numerical predictions, the complementarity of
the methods is multifaceted. One can be applied in low
throughput on relatively close-in analogues at a high
computational cost. The other can be readily applied in
those cases to provide an orthogonal prediction, but it can also
be applied across a wider range of chemical space, and very
large sets of potential ligands can be processed. Application of
QuanSA models across the targets to identify future active
ChEMBL ligands was successful in many cases, with generally
very low estimated false positive rates. QuanSA
yields a predicted binding pose along with each affinity estimate, rather than acting as a
black box that produces only a number, which lends confidence
where synthetic effort requires justification and can also
stimulate the design process.
In terms of assessment for either simulation-based or
machine-learning methods, studies have only rarely been
published utilizing time-stamped lead optimization data. This
type of application is the most realistic approach to validation
that is possible without fully prospective experiments, and it is
more likely to reflect future real-world prospective performance
than other validation schemes. Here, accurate predictions
were obtained at both the short time scale (compound
registration dates) and a longer time scale (data disclosure
years).
As a general matter for application in lead optimization, we
believe that the dichotomy between physics-based simulation
methods and those from machine learning has been driven
largely by the nonphysical assumptions of traditional QSAR
approaches. The QuanSA approach moves much closer to
physics-based simulation in terms of its underlying mechanics.
The approaches appear to be complementary, both in the
sense of orthogonality of prediction errors and in terms of their
domains of applicability. Medicinal chemistry projects within
most stages of structure-enabled lead optimization could
benefit from both types of approaches, in combination and to
serve complementary goals.
■EXPERIMENTAL SECTION
This is not primarily a report of new methods. As such, data
curation and computational protocols will be described, with
references to detailed reports of their algorithmic underpinnings,
implementation, and validation. All molecular and
activity data along with computational procedures for the
public benchmarks comprising the bulk of this study are freely
available (for details, see Supporting Information and Data and
Software Availability).
Molecular and Activity Data. LFA-1. A total of 202
compounds from a lead optimization project formed the data
set. Molecules were provided as SMILES strings with
registration dates and associated activities. The molecules
were sorted by registration date and segregated temporally,
with the oldest third as training molecules (n = 67), the next third
as holdout molecules (n = 68), and the most recent third as
test molecules (n = 67). Standard procedures were used to
convert 2D to 3D structures, protonate the molecules as
expected at physiological pH, and perform a conformational
search. There were 57 molecules having exactly one
unspecified chiral center, and these were prepared as racemic
mixtures.
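The temporal segregation described above can be sketched as a sort followed by a three-way split. How the remainder was allocated when the count is not divisible by three is not stated; the sketch assigns it to the middle (holdout) block, which reproduces the 67/68/67 split for 202 molecules. The record layout and the dates are hypothetical placeholders.

```python
from datetime import date, timedelta

def temporal_thirds(records):
    """Sort (registration_date, identifier) records by date and split into
    oldest third (train), middle third (holdout), and newest third (test)."""
    ordered = sorted(records, key=lambda rec: rec[0])
    third = len(ordered) // 3
    train = ordered[:third]
    holdout = ordered[third:len(ordered) - third]
    test = ordered[len(ordered) - third:]
    return train, holdout, test

# Placeholder records: 202 molecules registered one week apart.
start = date(2015, 1, 1)
records = [(start + timedelta(days=7 * i), f"mol_{i:03d}") for i in range(202)]

train, holdout, test = temporal_thirds(records)
print(len(train), len(holdout), len(test))
```

Splitting by registration date, rather than at random, ensures that every holdout and test molecule was registered no earlier than every training molecule.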
ChEMBL target 1803, human LFA-1, had 131 small
molecule ligands with associated IC50 activities. The set was
filtered to exclude ligands that were present in the set of 202
project compounds and those that did not bind at the allosteric
binding site. The 44 remaining ligands formed the ChEMBL
extrapolation set, representing three LFA-1 pharmaceutical
industry efforts: 32 from ref 35, 10 from ref 36, and 2 from the
same series of molecules in our 202 ligand set.²⁴ The 1000
ZINC molecule set used to establish model specificity has been
used previously in the same manner.¹⁴,¹⁸
FEP+ Benchmark Data Sets. Data comprising 16 prediction
sets were derived from the Supporting Information of two FEP+
benchmarking studies.⁹,¹¹ Data for model induction were
curated from ChEMBL and, where necessary, from the original
publications cited within that Supporting Information. For
each target, where N molecules were indicated as comprising
the prediction set, the reference molecule was removed from
the predictions to be made and added to the QuanSA model
induction sets. The reference molecule in its bound state was
also used to identify related contemporaneous PDB structures
from a mutually aligned set, as described previously.³⁷
For both benchmark data sets, the procedure outlined in
Figure 7 made use of default eSim thresholds of 6.5 for building
focused models. For the Schindler benchmark, in cases
where this threshold yielded too few training molecules or
poor prediction set coverage based on QuanSA scoring quality
measures, the thresholds were reduced. To test an alternative
strategy, for the Abel benchmark, in three cases (BACE,
MCL1, and P38) half of the N − 1 molecules for each
prediction set were randomly selected and added to the
respective training sets. For the Schindler benchmark, there
were, on average, 32.1 prediction examples per target (total of
257). For the Abel benchmark, there were 16.9 examples per
target (total of 135). Tables 1 and 2 provide total numbers of
compounds for each target (prediction set, full possible
training pool, pretraining filtered pool, and final number of
training compounds).
ChEMBL data representing “future” molecules were
collected for each of the 16 targets using ChEMBL target
identifiers, and the year of literature or patent publication was
used to temporally segregate the extrapolation sets from the
model induction sets. These were further filtered to remove
data used for model induction (in rare cases, a training
compound appeared in future-year publications). The decoy
set used to establish specificity was the same as that used for
LFA-1.
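The construction of the extrapolation sets can be sketched as a filter on publication year followed by removal of any compound already used for model induction. The record type, field names, identifiers, and years below are hypothetical; in practice the records would come from a ChEMBL export for the relevant target.

```python
from typing import NamedTuple

class Record(NamedTuple):
    mol_id: str
    year: int      # year of literature or patent publication
    pki: float

def future_set(records, model_year, training_ids):
    """Keep records published strictly after model_year that were not used for training."""
    return [r for r in records
            if r.year > model_year and r.mol_id not in training_ids]

# Hypothetical records for one target.
records = [
    Record("cmpd_1", 2012, 7.1),   # same year as the model data: excluded
    Record("cmpd_2", 2014, 8.0),   # later year: kept
    Record("cmpd_3", 2015, 6.2),   # later year, but also a training compound: excluded
    Record("cmpd_4", 2010, 5.9),   # earlier year: excluded
]
training_ids = {"cmpd_3"}

print([r.mol_id for r in future_set(records, model_year=2012, training_ids=training_ids)])
```

The strict inequality on year mirrors the requirement that publication dates be strictly later than the data on which the models were constructed.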
Computational Procedures for QuanSA Model Induction
and Prediction. The QuanSA method is the
successor to the QMOD approach.¹⁴ QuanSA builds a pocket
field rather than constructing a physical set of probes. QuanSA
combines ideas and algorithms from molecular
similarity³¹ and multiple instance learning with
many lessons learned and features developed with the
QMOD approach.¹⁸ The methods papers are comprehensive,
and the Supporting Information contains a more detailed
algorithmic description than the foregoing without recapitulating
the prior publications. Standard procedures were employed
for QuanSA model induction (Surflex Platform releases 5.001
for LFA-1 and 5.122 for the Schindler and Abel benchmarks,
BioPharmics LLC, Sonoma County, CA, 2021).
LFA-1. The results reported here were generated using
version 5 of the Surflex Platform. Surflex-Dock was used to
align X-ray crystallographic protein structures and to perform
ensemble docking as an independent means to analyze ligand
pose. Ligand preparation was carried out with standard
procedures using the -pquant level of conformer elaboration.
This produced up to 1000 conformers for each training ligand
(though typically far fewer for LFA-1 compounds). For the 57
molecules with a single unspecified chiral center, the -enum
chiral 1 parameter resulted in the molecules being racemized.
The ForceGen methodology has been described
previously.³⁸,³⁹
The QuanSA initialization procedure (init) automatically
builds multiple initial alignments, but it can be influenced by
user knowledge and guidance. The -clknown parameter
specifies a set of known poses for competitive ligands, in this
case an alignment of six crystallographic ligands. The searched
conformer database along with a file containing molecule
names and associated activity information formed the input to
the initialization procedure, which was done in the standard
manner. After initialization, model building was carried out for
the top five alignments produced by the initialization
procedure, followed by model selection, informed by the use
of a holdout set. After selection of the model for prospective
application, the holdout molecules were then added to refine
the selected model, as outlined in Figure 2.
FEP+ Benchmark Data Sets. As for application to LFA-1,
standard procedures were used for model induction and for
scoring the prediction sets. The primary variation in the
application to the individual targets was in filtering the training
pool of data for each target (as described above). For the
targets SYK, EG5, and CDK8, the initial threshold for ligand
similarity to known bound ligands was reduced from the
standard value of 6.5 in order to yield sufficient training data
for adequate coverage of the prediction sets based on the
quality metrics produced in the scoring procedure.
Computational Procedures for FEP+.FEP+calculations
were performed by using standard protocols, making use of the
Glide, Prime, MacroModel, and FEP+tools (all release 2019-4,
Schrödinger, LLC, New York, NY, 2019).
Poses were generated with Glide in SP mode, using core
constraints relative to compound 1. The ligand and residues
within 5 Å of the ligand were left flexible, while the remainder
of the atoms were constrained. For application of FEP+, the
Glide poses were minimized using MacroModel with the
OPLS3 force field and implicit solvation. The protein was held rigid.
Custom force field parameters were calculated using the
Forcefield Builder module that is part of FEP+. Single-edge
FEP+ calculations were performed relative to compound 13 for
the 17 test molecules from the temporally segregated test set.
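A single-edge calculation yields a relative binding free energy for each test molecule against the reference compound. One common way to place such values on the pKi scale, sketched here as an assumption rather than as the authors' stated procedure, uses ΔG = RT ln Ki, which gives pKi(ligand) = pKi(reference) − ΔΔG / (RT ln 10). The reference pKi and ΔΔG values below are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def pki_from_ddg(pki_ref: float, ddg_kcal: float, temperature: float = 298.15) -> float:
    """Convert a relative binding free energy to a predicted pKi.

    ddg_kcal is dG(ligand) - dG(reference) in kcal/mol; a negative value
    means the ligand is predicted to bind more tightly than the reference.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10.0)  # about 1.364 kcal/mol at 298.15 K
    return pki_ref - ddg_kcal / rt_ln10


# Hypothetical numbers: reference compound with pKi 7.0.
same_as_reference = pki_from_ddg(7.0, 0.0)
tenfold_tighter = pki_from_ddg(7.0, -1.364)
```

At room temperature, each 1.36 kcal/mol of relative free energy corresponds to roughly one log unit of affinity, which is a convenient check on the magnitude of predicted differences.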
■DATA AND SOFTWARE AVAILABILITY
An extensive data archive is freely available at www.jainlab.org
(see the Supporting Information). All software employed
herein is commercially available.
■ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at
https://pubs.acs.org/doi/10.1021/acs.jcim.1c01382.
Additional information about computational methods;
detailed description of the contents of the extensive data
archive (PDF)
All prediction data for the FEP+ targets for all three
methods (XLSX)
■AUTHOR INFORMATION
Corresponding Authors
Stephen R. Johnson − Computer-Assisted Drug-Design,
Bristol-Myers Squibb Company, Princeton, New Jersey
08648, United States; Email: stephen.johnson@bms.com
Ajay N. Jain − Research and Development, BioPharmics LLC,
Santa Rosa, California 95404, United States; orcid.org/0000-0003-4641-8501;
Email: ajain@jainlab.org
Author
Ann E. Cleves − Applied Science, BioPharmics LLC, Santa
Rosa, California 95404, United States; orcid.org/0000-0002-1622-2770
Complete contact information is available at:
https://pubs.acs.org/10.1021/acs.jcim.1c01382
Notes
The authors declare no competing financial interest.
■ACKNOWLEDGMENTS
The authors thank Bristol-Myers Squibb for providing support
for this work.
■REFERENCES
(1) Brown, S. P.; Muchmore, S. W. High-throughput calculation of
protein-ligand binding affinities: Modification and adaptation of the
MM-PBSA protocol to enterprise grid computing. J. Chem. Inf. Model.
2006, 46, 999−1005.
(2) Brown, S. P.; Muchmore, S. W. Rapid estimation of relative
protein-ligand binding affinities using a high-throughput version of
MM-PBSA. J. Chem. Inf. Model. 2007, 47, 1493−1503.
(3) Brown, S. P.; Muchmore, S. W. Large-scale application of high-
throughput molecular mechanics with Poisson-Boltzmann surface area
for routine physics-based scoring of protein-ligand complexes. J. Med.
Chem. 2009, 52, 3159−3165.
(4) Wang, E.; Sun, H.; Wang, J.; Wang, Z.; Liu, H.; Zhang, J. Z.;
Hou, T. End-point binding free energy calculation with MM/PBSA
and MM/GBSA: strategies and applications in drug design. Chem.
Rev. 2019, 119, 9478−9508.
(5) Chodera, J. D.; Mobley, D. L.; Shirts, M. R.; Dixon, R. W.;
Branson, K.; Pande, V. S. Alchemical free energy methods for drug
discovery: Progress and challenges. Curr. Opin. Struct. Biol. 2011, 21,
150−160.
(6) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA
methods to estimate ligand-binding affinities. Expert Opin. Drug
Discovery 2015, 10, 449−461.
(7) Jorgensen, W. L.; Ravimohan, C. Monte Carlo simulation of
differences in free energies of hydration. J. Chem. Phys. 1985, 83,
3050−3054.
(8) Kollman, P. Free energy calculations: Applications to chemical
and biochemical phenomena. Chem. Rev. 1993, 93, 2395−2417.
(9) Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.;
Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero,
D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm,
W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.;
Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B.
J.; Friesner, R. A.; Abel, R. Accurate and reliable prediction of relative
ligand binding potency in prospective drug discovery by way of a
modern free-energy calculation protocol and force field. J. Am. Chem.
Soc. 2015, 137, 2695−2703.
(10) Sun, H.; Li, Y.; Tian, S.; Xu, L.; Hou, T. Assessing the
performance of MM/PBSA and MM/GBSA methods. 4. Accuracies
of MM/PBSA and MM/GBSA methodologies evaluated by various
simulation protocols using PDBbind data set. Phys. Chem. Chem. Phys.
2014, 16, 16719−16729.
(11) Schindler, C. E. M.; Baumann, H.; Blum, A.; Bose, D.;
Buchstaller, H.-P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski,
P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuchss, T.; Grädler, U.;
Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.;
Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.;
Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann,
K.; Schulz, R.; Steinbrecher, T.; Tanzer, E.-M.; Unzue Lopez, A.;
Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-scale assessment of
binding free energy calculations in active drug discovery projects. J.
Chem. Inf. Model. 2020, 60, 5457−5474.
(12) Walters, W. P.; Barzilay, R. Applications of deep learning in
molecule generation and molecular property prediction. Acc. Chem.
Res. 2021, 54, 263−270.
(13) Ramsundar, B.; Eastman, P.; Walters, P.; Pande, V. Deep
Learning for the Life Sciences: Applying Deep Learning to Genomics,
Microscopy, Drug Discovery, and More; O’Reilly Media, Inc.: 2019.
(14) Cleves, A. E.; Jain, A. N. Quantitative Surface Field Analysis:
Learning Causal Models to Predict Ligand Binding Affinity and Pose.
J. Comput.-Aided Mol. Des. 2018, 32, 731−757.
(15) Jain, A. Scoring noncovalent protein-ligand interactions: A
continuous differentiable function tuned to compute binding
affinities. J. Comput.-Aided Mol. Des. 1996, 10, 427−440.
(16) Pham, T.; Jain, A. Parameter estimation for scoring protein-
ligand interactions using negative training data. J. Med. Chem. 2006,
49, 5856−5868.
(17) Jain, A.; Cleves, A. Does your model weigh the same as a Duck?
J. Comput.-Aided Mol. Des. 2012, 26, 57−67.
(18) Cleves, A. E.; Jain, A. N. Extrapolative prediction using
physically-based QSAR. J. Comput.-Aided Mol. Des. 2016, 30, 127−152.
(19) Jain, A. N.; Dietterich, T. G.; Lathrop, R. H.; Chapman, D.;
Critchlow, R. E., Jr.; Bauer, B. E.; Webster, T. A.; Lozano-Perez, T.
Compass: A Shape-Based Machine Learning Tool for Drug Design. J.
Comput.-Aided Mol. Des. 1994, 8, 635−652.
(20) Jain, A.; Koile, K.; Chapman, D. Compass: Predicting biological
activities from molecular surface properties. Performance comparisons
on a steroid benchmark. J. Med. Chem. 1994, 37, 2315−2327.
(21) Jain, A.; Harris, N.; Park, J. Quantitative binding site model
generation: Compass applied to multiple chemotypes targeting the
5-HT1a receptor. J. Med. Chem. 1995, 38, 1295−1308.
(22) Dietterich, T. G.; Lathrop, R. H.; Lozano-Pérez, T. Solving the
multiple instance problem with axis-parallel rectangles. Artif. Intell.
1997, 89, 31−71.
(23) Potin, D.; Launay, M.; Monatlik, F.; Malabre, P.; Fabreguettes,
M.; Fouquet, A.; Maillet, M.; Nicolai, E.; Dorgeret, L.; Chevallier, F.;
Besse, D.; Dufort, M.; Caussade, F.; Ahmad, S. Z.; Stetsko, D. K.;
Skala, S.; Davis, P. M.; Balimane, P.; Patel, K.; Yang, Z.; Marathe, P.;
Postelneck, J.; Townsend, R. M.; Goldfarb, V.; Sheriff, S.; Einspahr,
H.; Kish, K.; Malley, M. F.; DiMarco, J. D.; Gougoutas, J. Z.; Kadiyala,
P.; Cheney, D. L.; Tejwani, R. W.; Murphy, D. K.; Mcintyre, K. W.;
Yang, X.; Chao, S.; Leith, L.; Xiao, Z.; Mathur, A.; Chen, B.-C.; Wu,
D.-R.; Traeger, S. C.; McKinnon, M.; Barrish, J. C.; Robl, J. A.;
Iwanowicz, E. J.; Suchard, S. J.; Dhar, T. G. M. Discovery and
development of 5-[(5S,9R)-9-(4-cyanophenyl)-3-(3,5-dichlorophenyl)-1-methyl-2,4-dioxo-1,3,7-triazaspiro[4.4]non-7-yl-methyl]-3-thiophenecarboxylic
acid (BMS-587101), a small molecule
antagonist of leukocyte function associated antigen-1. J. Med.
Chem. 2006, 49, 6946−6949.
(24) Watterson, S. H.; Xiao, Z.; Dodd, D. S.; Tortolani, D. R.;
Vaccaro, W.; Potin, D.; Launay, M.; Stetsko, D. K.; Skala, S.; Davis, P.
M.; Lee, D.; Yang, X.; McIntyre, K. W.; Balimane, P.; Patel, K.; Yang,
Z.; Marathe, P.; Kadiyala, P.; Tebben, A. J.; Sheriff, S.; Chang, C. Y.;
Ziemba, T.; Zhang, H.; Chen, B.-C.; DelMonte, A. J.; Aranibar, N.;
McKinnon, M.; Barrish, J. C.; Suchard, S. J.; Murali Dhar, T. G. Small
Molecule Antagonist of Leukocyte Function Associated Antigen-1
(LFA-1): Structure-Activity Relationships Leading to the Identification
of 6-((5S,9R)-9-(4-Cyanophenyl)-3-(3,5-dichlorophenyl)-1-methyl-2,4-dioxo-1,3,7-triazaspiro[4.4]nonan-7-yl)nicotinic
Acid (BMS-688521). J. Med. Chem. 2010, 53, 3814−3830.
(25) Cleves, A. E.; Jain, A. N. Effects of inductive bias on
computational evaluations of ligand-based modeling and on drug
discovery. J. Comput.-Aided Mol. Des. 2008, 22, 147−159.
(26) Varela, R.; Walters, W.; Goldman, B.; Jain, A. Iterative
refinement of a binding pocket model: Active computational steering
of lead optimization. J. Med. Chem. 2012, 55, 8926−8942.
(27) Hogg, N.; Henderson, R.; Leitinger, B.; McDowall, A.; Porter,
J.; Stanley, P. Mechanisms contributing to the activity of integrins on
leukocytes. Immunol. Rev. 2002, 186, 164−171.
(28) Lebwohl, M.; Tyring, S. K.; Hamilton, T. K.; Toth, D.; Glazer,
S.; Tawfik, N. H.; Walicke, P.; Dummer, W.; Wang, X.; Garovoy, M.
R.; Pariser, D. A novel targeted T-cell modulator, efalizumab, for
plaque psoriasis. N. Engl. J. Med. 2003, 349, 2004−2013.
(29) Welzenbach, K.; Hommel, U.; Weitz-Schmidt, G. Small
Molecule Inhibitors Induce Conformational Changes in the I Domain
and the I-like Domain of Lymphocyte Function-associated Antigen-1.
J. Biol. Chem. 2002, 277, 10590−10598.
(30) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.;
Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.;
Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: A new
approach for rapid, accurate docking and scoring. 1. Method and
assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739−1749.
(31) Cleves, A. E.; Johnson, S. R.; Jain, A. N. Electrostatic-field and
surface-shape similarity for virtual screening and pose prediction. J.
Comput.-Aided Mol. Des. 2019, 33, 865−886.
(32) Chen, Y.-N. P.; LaMarche, M. J.; Chan, H. M.; Fekkes, P.;
Garcia-Fortanet, J.; Acker, M. G.; Antonakos, B.; Chen, C. H.-T.;
Chen, Z.; Cooke, V. G.; Dobson, J. R.; Deng, Z.; Fei, F.; Firestone, B.;
Fodor, M.; Fridrich, C.; Gao, H.; Grunenfelder, D.; Hao, H.-X.; Jacob,
J.; Ho, S.; Hsiao, K.; Kang, Z. B.; Karki, R.; Kato, M.; Larrow, J.; La
Bonte, L. R.; Lenoir, F.; Liu, G.; Liu, S.; Majumdar, D.; Meyer, M. J.;
Palermo, M.; Perez, L.; Pu, M.; Price, E.; Quinn, C.; Shakya, S.;
Shultz, M. D.; Slisz, J.; Venkatesan, K.; Wang, P.; Warmuth, M.;
Williams, S.; Yang, G.; Yuan, J.; Zhang, J.-H.; Zhu, P.; Ramsey, T.;
Keen, N. J.; Sellers, W. R.; Stams, T.; Fortin, P. D. Allosteric
inhibition of SHP2 phosphatase inhibits cancers driven by receptor
tyrosine kinases. Nature 2016, 535, 148−152.
(33) Jia, H.; Dai, G.; Weng, J.; Zhang, Z.; Wang, Q.; Zhou, F.; Jiao,
L.; Cui, Y.; Ren, Y.; Fan, S.; Zhou, J.; Qing, W.; Gu, Y.; Wang, J.; Sai,
Y.; Su, W. Discovery of (S)-1-(1-(Imidazo[1,2-a]pyridin-6-yl)ethyl)-6-(1-methyl-1H-pyrazol-4-yl)-1H-[1,2,3]triazolo[4,5-b]pyrazine
(Volitinib) as a Highly Potent and Selective Mesenchymal−Epithelial
Transition Factor (c-Met) Inhibitor in Clinical Development
for Treatment of Cancer. J. Med. Chem. 2014, 57, 7577−7589.
(34) Tron, A. E.; Belmonte, M. A.; Adam, A.; Aquila, B. M.; Boise, L.
H.; Chiarparin, E.; Cidado, J.; Embrey, K. J.; Gangl, E.; Gibbons, F.
D.; Gregory, G. P.; Hargreaves, D.; Hendricks, J. A.; Johannes, J. W.;
Johnstone, R. W.; Kazmirski, S. L.; Kettle, J. G.; Lamb, M. L.; Matulis,
S. M.; Nooka, A. K.; Packer, M. J.; Peng, B.; Rawlins, P. B.; Robbins,
D. W.; Schuller, A. G.; Su, N.; Yang, W.; Ye, Q.; Zheng, X.; Secrist, J.
P.; Clark, E. A.; Wilson, D. M.; Fawell, S. E.; Hird, A. W. Discovery of
Mcl-1-specific inhibitor AZD5991 and preclinical activity in multiple
myeloma and acute myeloid leukemia. Nat. Commun. 2018, 9, 5341.
(35) Winn, M.; Reilly, E. B.; Liu, G.; Huth, J. R.; Jae, H.-S.; Freeman,
J.; Pei, Z.; Xin, Z.; Lynch, J.; Kester, J.; von Geldern, T. W.; Leitza, S.;
DeVries, P.; Dickinson, R.; Mussatto, D.; Okasinski, G. F. Discovery
of novel p-arylthio cinnamides as antagonists of leukocyte function-associated
antigen-1/intercellular adhesion molecule-1 interaction. 4.
Structure-activity relationship of substituents on the benzene ring of
the cinnamide. J. Med. Chem. 2001, 44, 4393−4403.
(36) Kollmann, C. S.; Bai, X.; Tsai, C.-H.; Yang, H.; Lind, K. E.;
Zhu, Z.; Israel, D. I.; Cuozzo, J. W.; Morgan, B. A.; Yuki, K.; Xie, C.;
Springer, T. A.; Shimaoka, M.; Evindar, G.; Skinner, S. R. Application
of encoded library technology (ELT) to a protein−protein interaction
target: Discovery of a potent class of integrin lymphocyte function-
associated antigen 1 (LFA-1) antagonists. Bioorg. Med. Chem. 2014,
22, 2353−2365.
(37) Spitzer, R.; Cleves, A.; Varela, R.; Jain, A. Protein function
annotation by local binding site surface similarity. Proteins: Struct.,
Funct., Genet. 2014, 82, 679−694.
(38) Cleves, A. E.; Jain, A. N. ForceGen 3D Structure and
Conformer Generation: From Small Lead-Like Molecules to Macrocyclic
Drugs. J. Comput.-Aided Mol. Des. 2017, 31, 419−439.
(39) Jain, A. N.; Cleves, A. E.; Gao, Q.; Wang, X.; Liu, Y.; Sherer, E.
C.; Reibarkh, M. Y. Complex macrocycle exploration: Parallel,
heuristic, and constraint-based conformer generation using ForceGen.
J. Comput.-Aided Mol. Des. 2019, 33, 531−558.