Prediction of off-target drug effects through data fusion

Authors:
  • BioPharmics Division, Optibrium Ltd.
  • BioPharmics LLC

EMMANUEL R. YERA, ANN E. CLEVES, and AJAY N. JAIN
Bioengineering and Therapeutic Sciences, University of California, San Francisco,
San Francisco, CA 94143, USA
E-mail: ajain@jainlab.org
www.jainlab.org
We present a probabilistic data fusion framework that combines multiple computational approaches for drawing relationships between drugs and targets. The approach has special relevance to identifying surprising unintended biological targets of drugs. Comparisons between molecules are made based on 2D topological structural considerations, based on 3D surface characteristics, and based on English descriptions of clinical effects. Similarity computations within each modality were transformed into probability scores. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, clinical effects similarity or their combination. The methods were validated within a curated structural pharmacology database (SPDB) and further tested by blind application to data derived from the ChEMBL database. For prediction of off-target effects, 3D-similarity performed best as a single modality, but combining all methods produced performance gains. Striking examples of structurally surprising off-target predictions are presented.
Keywords: Molecular similarity; Surflex-Sim; Patient Package Inserts; Off-Target Predictions.
1. Introduction
In prior work, we introduced a methodological approach for data fusion which was used to predict the protein targets of small molecules based on molecular similarity [1]. Given a test molecule and a set of small molecules with a known shared biological effect, the method produces a score corresponding to the likelihood that the test molecule will share the same activity. We showed that for predicting primary targets (i.e. targets modulating intended therapeutic effects) the performance advantage of a 3D similarity method over a 2D method was relatively small, due to the dominating effects of human 2D bias in drug design (i.e. “me-too” drugs) [1,2]. However, for predicting secondary targets (i.e. sources of side-effects) 3D similarity was much more effective than 2D topological comparisons. We also showed that clinical effects of drugs could be used as a surrogate for biochemical characterization [1], making use of common side effects of muscarinic antagonism as markers for the biochemical protein-ligand effect. Using 3D chemical similarity, it was possible to achieve strong separation of likely muscarinic modulators from those with no evidence of such effects.
In the current work, we expand the analysis to a much larger set of small molecule drugs, again making use of 2D and 3D chemical similarity computations. Additionally, computations involving structural similarity are augmented with clinical effects similarity, made possible by automating the extraction and weighting of relevant textual terms from drug package inserts. The top row of Figure 1 shows two highly similar first generation sulfonylureas, tolbutamide and tolazamide, each having highly similar pharmacological effects [3], with their therapeutic benefits deriving from identical mechanisms [4]. Clinical effects similarity coincides here with high structural 2D and 3D similarity. Next, consider the two structurally dissimilar anticonvulsants on the bottom of Figure 1, carbamazepine and levetiracetam. Carbamazepine was one of the first anticonvulsants (approved in 1968), and its therapeutic benefit is attributed to stabilizing the inactivated state of voltage-gated sodium channels (Nav1.1) [5]. Levetiracetam is a newer anticonvulsant, believed to act through interaction with synaptic vesicle glycoprotein 2A (SV2A) [6]. As expected, the two package inserts have clinical effect terms in common due to shared indications. Given the high 3D structural similarity, our expectation is that these drugs do in fact share some molecular targets, as will be discussed later.

[Figure 1. Top row: tolbutamide and tolazamide (therapeutic indication: antidiabetic); intended target Kir6.2 and other target PPARγ-RXR for both; clinical terms in common: diabetes mellitus, blood glucose, and hypoglycemia; similarity: high 2D and high 3D. Bottom row: carbamazepine (intended target Nav1.1, off-target Nav1.5) and levetiracetam (intended target synaptic vesicle glycoprotein 2A); therapeutic indication: anticonvulsant; clinical terms in common: grand mal and status epilepticus; similarity: low 2D but high 3D. In the Surflex-Sim 3D overlay, carbamazepine is shown with green carbons and levetiracetam in atom color.]
Fig. 1. Relationship between small molecules based on molecular similarity, protein target modulation, and clinical effects. The optimal 3D superimposition (bottom) indicates high similarity, despite little topological commonality (green sticks correspond to regions of significant surface shape similarity and blue/red sticks correspond to regions of significant polar similarity).

Pacific Symposium on Biocomputing 2014
The present study establishes a computational method to draw relationships between drugs based on the clinical effects present in Patient Package Inserts (PPI), whose utility for predicting drug target interactions has been shown previously [7]. The study makes three primary contributions. First, we introduce a method to extract and weight medically relevant terms from English clinical effects information. Second, we show that drug similarity computed from package inserts is directly correlated with drug similarity computed by molecular structure comparison. Third, we establish that the combination of 2D, 3D, and PPI similarity yields better off-target predictive performance than any single similarity computation. Recovery of roughly 40–50% of off-target annotations was possible with false positive rates of about 1–3%. The approach is generalizable to other computational modalities (e.g. docking of ligands to protein structures), and it is our hope that broad application of the methods will aid in identifying unexpected interactions between drugs and biological targets.
2. Methods and Data
The following describes the molecular data sets, computational methods, and specific computational procedures (see http://www.jainlab.org for additional details on software, data, and protocols).
2.1. Molecular Data Sets
In the present study two molecular data sets are used. The Structural Pharmacology Database
(SPDB) is a deeply curated drug target database that is used as the basis to make predictions.
A set of drug target annotations from ChEMBL that were not annotated in our database were
used as a blind test set.
The details of the SPDB and its relationship to other databases have been extensively described elsewhere [1,2,8]. It has two features that are particularly important for the present study. First, “targets” are specific binding sites on proteins or protein complexes. This is a critical distinction in order to make inferences about small molecule activity based on structural similarity. Second, primary targets (those that are believed to be therapeutically beneficial) are distinguished from secondary targets (which mediate pharmacologically relevant off-target effects). By making this distinction, it is possible to explicitly quantify performance of methods for prediction of surprising effects. Of the roughly 1000 drugs within the SPDB, 602 met our criteria for inclusion based on PPI information (see below). Of the 257 primary and secondary targets of these 602 drugs, 91 had at least 5 annotated drugs and formed the basis of cross-validation experiments. These 91 targets comprised 83 human proteins, including 28 aminergic GPCRs, 19 ligand and voltage gated ion channels, 13 human enzymes, 7 nucleotide and short peptide GPCRs, 5 tyrosine kinases, 5 steroid receptors, 3 reuptake transporters, 2 ion transporters, and 1 transcription factor. The remaining 8 targets were bacterial, fungal, and viral proteins. To test the methodology, we employed ChEMBL version 14, which curates linkages between chemicals and biological targets [9]. For each of the 602 drugs, corresponding ChEMBL compounds were identified based on direct structural equivalence. Equating the 91 SPDB target binding sites to ChEMBL bioactivities was done manually, yielding 65 corresponding ChEMBL targets. Significant bioactivity was defined as Kd, Ki, or IC50 values less than or equal to 1 µM. There were 380 drug-target interactions present in ChEMBL that were missing from the SPDB matrix of 602 drugs and 91 targets. This set served as a blind test set and will be referred to as the ChEMBL set in what follows.
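The construction of the blind test set described above can be sketched as a simple set difference. This is a minimal illustration only: the `Activity` record, the function name `blind_test_set`, and the placeholder drug and target names are assumptions, and the sketch presumes that drugs and targets have already been mapped between the two databases.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Activity(NamedTuple):
    """Hypothetical bioactivity record (not an actual ChEMBL schema)."""
    drug: str
    target: str
    value_nm: float  # Kd, Ki, or IC50 expressed in nanomolar


def blind_test_set(
    activities: Iterable[Activity],
    known_pairs: Set[Tuple[str, str]],
    cutoff_nm: float = 1000.0,  # 1 µM, the significance cutoff given in the text
) -> Set[Tuple[str, str]]:
    """Pairs that are significantly active but absent from the reference annotations."""
    significant = {(a.drug, a.target) for a in activities if a.value_nm <= cutoff_nm}
    return significant - known_pairs


# Illustrative data only; these are not values from the study.
activities = [
    Activity("drug_a", "target_1", 50.0),    # already annotated, so excluded
    Activity("drug_a", "target_2", 5000.0),  # weaker than 1 µM, so excluded
    Activity("drug_b", "target_1", 1000.0),  # exactly 1 µM counts as significant
]
known = {("drug_a", "target_1")}
print(sorted(blind_test_set(activities, known)))
```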
2.2. Patient Package Insert Similarity
We employed the well established vector space information retrieval approach [10,11] to model patient package inserts (PPIs). Text documents are modeled as vectors in high dimensional space where each dimension corresponds to a term with an associated weight. Coincidence of terms with high weight leads to high computed similarity between documents. The process to transform PPIs into weighted term vectors requires four steps. First, relevant sections are extracted, including: Indication, Contraindications, Precautions, Adverse Reactions, Drug Interactions, and Clinical Pharmacology. Second, term lists (up to five words each) are generated, with punctuation and short words like prepositions and articles removed. Third, to eliminate artifactual terms and enhance relevance, terms are identified that are part of two controlled vocabularies: Medical Subject Headings (MeSH, http://www.ncbi.nlm.nih.gov/mesh) and the low-level Medical Dictionary for Regulatory Activities (MedDRA, http://www.meddra.org). Last, term weights are assigned based on information richness (e.g. “generalized seizures” > “seizures”). Word frequencies from the Google Web 1T 5-gram Corpus (http://www.ldc.upenn.edu/Catalog/index.jsp, catalog number LDC2006T13) were used to compute term weights, with rare terms producing higher scores than common ones. For example, “seizures” produced a log odds weighting of 4.74, but the more specific term “generalized seizures” yielded 6.89. The final output for each drug is a vector composed of 6,591 term weights (the weight of the term if present and zero otherwise). From the PPI for carbamazepine, the Indication Section includes: “patients with the following seizure types: partial seizures with complex symptomatology (psychomotor, temporal lobe).” The unfiltered bigrams include both sensible ones such as “partial seizures” and useless ones such as “patients with”, with the filtering process eliminating the latter. For carbamazepine, the two most heavily weighted terms were “failure liver” (8.83) and “syncope and collapse” (8.62). The term “partial seizures” scored 6.37, with many related terms (e.g. “grand mal”) scoring similarly.
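The term generation and filtering steps above can be sketched as follows. This is a rough illustration, not the authors' implementation: the stop-word list, the tokenization rule, and the function names are assumptions, and the tiny weight table stands in for the full controlled vocabulary (its three weights are the ones quoted in the text). The corpus-based weighting formula itself is not reproduced here.

```python
import re
from typing import Dict, Set

# Hypothetical short stop-word list; the actual list is not given in the text.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "for"}


def candidate_terms(text: str, max_words: int = 5) -> Set[str]:
    """All contiguous word n-grams (1 to max_words) after dropping punctuation and stop words."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]
    terms = set()
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            terms.add(" ".join(tokens[i:i + n]))
    return terms


def weighted_vector(text: str, term_weights: Dict[str, float]) -> Dict[str, float]:
    """Weight of each vocabulary term if present in the text, zero otherwise."""
    present = candidate_terms(text)
    return {term: (w if term in present else 0.0) for term, w in term_weights.items()}


indication = ("patients with the following seizure types: partial seizures "
              "with complex symptomatology (psychomotor, temporal lobe)")
# Stand-in vocabulary with the term weights quoted in the text.
vocabulary = {"seizures": 4.74, "partial seizures": 6.37, "generalized seizures": 6.89}
print(weighted_vector(indication, vocabulary))
```

Because only vocabulary terms become vector dimensions, an uninformative bigram such as "patients with" is generated as a candidate but never reaches the final vector.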
$$\mathrm{PPISimilarity}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \qquad (1)$$
Comparison of a pair of drug PPI vectors is quantified using the cosine similarity metric (Eq. 1). The metric has a range of 0–1, but its units are both arbitrary and counterintuitive. To employ such values in our data fusion framework, the raw similarity scores were normalized to p-values by generating a distribution of PPI similarity scores for unrelated molecule pairs. The unrelated pairs were identified based on having low 2D and low 3D similarity, quantified as described below, with pairwise p-values ≥ 0.5 (we have previously shown that structurally unrelated drug pairs very infrequently share targets [1]). So, given a PPI similarity score S between a drug pair, the p-value is simply the proportion of occurrences of S or greater in the background set. For example, the raw PPI similarity between carbamazepine and levetiracetam was 0.286 (see Figure 1), and this corresponded to a p-value of 0.044. The most heavily weighted terms in the comparison included the following: pancytopenia (6.6), cytochrome p450 (6.6), grand mal (6.5), antiepileptic drugs (6.5), and partial seizures (6.4).
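The cosine similarity of Eq. 1 and the empirical p-value conversion can be sketched in a few lines. The function names and the toy vectors are illustrative assumptions; in the study each vector has 6,591 dimensions and the background set consists of structurally unrelated drug pairs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two term-weight vectors of equal length (Eq. 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an empty vector is treated as sharing nothing
    return dot / (norm_a * norm_b)


def empirical_p_value(score: float, background: Sequence[float]) -> float:
    """Proportion of background scores greater than or equal to the observed score."""
    return sum(1 for s in background if s >= score) / len(background)


# Toy term-weight vectors and background scores, for illustration only.
drug_a = [6.37, 0.0, 4.74]
drug_b = [6.37, 6.50, 0.0]
background = [0.05, 0.10, 0.20, 0.30, 0.45]
s = cosine_similarity(drug_a, drug_b)
print(round(s, 3), empirical_p_value(s, background))
```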
2.3. Target Prediction using Patient Package Insert Similarity
We have previously reported a framework for data fusion which allows for the integration of similarity scores into a single value [1]. Briefly, given a molecule A and a set of molecules with a shared biological effect, B_n, the similarity between molecule A and each molecule B_i is computed. The similarity scores are normalized to p-values as detailed above by assessing score magnitude against scores from a random background set. The multinomial distribution is then used to compute the likelihood, M, of observing the set of p-values and the likelihood, M̄, of observing the converse probabilities. The log-odds score L is then computed by taking the log of the ratio of M and M̄ and inverting the sign. A detailed discussion of the computation and a corresponding 2D and 3D similarity example can be found in the original publication [1]. An attractive feature of our methodology is that it is able to integrate the results of different similarity computations into a single value. For example, the log-odds calculation for tolazamide interacting with PPARγ-RXR yields single-modality values of 11.35 for PPI, 7.57 for 3D, and 5.49 for 2D. Combining the similarity methods gives a stronger prediction than any single method alone, with a 3D+2D+PPI log-odds of 23.43.
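One possible reading of the single-modality log-odds computation is sketched below. It rests on several assumptions that the text does not confirm: that the multinomial coefficients of M and M̄ cancel, so the ratio reduces to the product of the p-values over the product of their complements; that the logarithm is base 10; and that p-values are clamped away from 0 and 1 to keep the result finite. The exact formulation is given in the original publication [1], and the combination across modalities is deliberately not reproduced here.

```python
import math
from typing import Iterable


def log_odds(p_values: Iterable[float], floor: float = 1e-6) -> float:
    """Sign-inverted log ratio of prod(p_i) to prod(1 - p_i); an assumed simplification."""
    total = 0.0
    for p in p_values:
        p = min(max(p, floor), 1.0 - floor)  # assumption: clamp to avoid log(0)
        total += math.log10(p / (1.0 - p))
    return -total


# Hypothetical p-values of a test molecule against three annotated ligands.
print(round(log_odds([0.01, 0.20, 0.60]), 2))
```

Under this reading, small p-values against several annotated ligands accumulate into a large positive score, while p-values above 0.5 pull the score down.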
2.4. Similarity and p-value Computation with Surflex-Sim
The Surflex-Sim 3D molecular similarity method and its use for virtual screening and off-target prediction have been extensively described in multiple publications [2,8,12,13]. Briefly, given two molecules in specific poses, a value from 0 to 1 is computed that reflects the degree to which their molecular surfaces are congruent with respect to both shape and polarity. The function is based on the differences in distances from observer points surrounding the molecules to the closest points on their surfaces, including both the closest hydrophobic surface points and the closest polar surface points. So, two molecules that may have very different underlying chemical scaffolds may exhibit nearly identical surfaces to the observer points. These points are analogous to a protein binding pocket, which also “observes” ligands from the outside. Additional details regarding the theory and underlying algorithmic details can be found in the previously published work. In order to produce a log-odds value for a molecule against a list of molecules with a shared annotation, 3D similarity values must be computed against each annotated molecule, and these values must then be transformed into probabilities. Given the particulars of the conformational sampling density, 3D similarity optimization thoroughness, and empirical conversion of raw scores to p-values, the overall process previously required many hours for each comparison of one molecule to a typical set of annotated molecules.
In the current work, two improvements were made to support large-scale application of the methods. First, a new mode of pose optimization was developed in which diverse conformations of molecules are pre-generated prior to molecular comparison. Using this new mode, optimizing the pose of one molecule onto a specific pose of another can be done quickly enough to process roughly 2 million drug-like molecules per day on a single computing core (compared with roughly 10,000 previously). Second, rather than using explicit computation of 1000 background similarity values for each molecule (as previously), we made use of the observation that these distributions were essentially always normal. Given a molecule pair, only the particular mean and standard deviation for each need be estimated in order to derive a p-value rather than making use of the full empirical computation. Estimation of the distributional parameters was accomplished using simple linear regression models that made use of “molecular imprints” for each molecule [8]. A molecular imprint is a vector of similarity values for a particular molecule against a fixed basis set of molecules (one pose each). Such vectors have precedent in predicting many molecular properties [14,15], and the conformational pre-search procedure was augmented to produce standard molecular imprints. So, given two pre-searched molecules, their mutual maximal 3D similarity can be rapidly calculated, and the p-value conversion is immediately derived from the estimated distributional parameters for each molecule. Taken together, the two improvements allow for typical 3D log-odds computations to be made in a few minutes for a given molecule against a target characterized by twenty known ligands. To test the accuracy of the faster method, we recomputed the p-values and log-odds values from our previous work. An all-by-all similarity comparison of the 358 drugs from the original study yielded a Pearson’s correlation of 0.947 and Kendall’s tau of 0.814 against the original values, both highly statistically significant. The full log-odds computation of 358 drugs against 44 targets yielded a Pearson’s correlation of 0.955 and Kendall’s tau of 0.761 (again highly statistically significant).
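The normal approximation described above replaces an empirical background with an upper-tail probability from an estimated mean and standard deviation. A minimal sketch follows; the function name and the numeric values are illustrative assumptions, and the way the per-molecule estimates for the two members of a pair are combined is not specified in the text, so only a single background distribution is shown.

```python
import math


def normal_p_value(score: float, mean: float, std: float) -> float:
    """Upper-tail probability of a similarity score under a normal background."""
    z = (score - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical background parameters, as might be predicted by a regression model.
mean, std = 0.50, 0.08
for score in (0.50, 0.66, 0.74):
    print(score, round(normal_p_value(score, mean, std), 4))
```

A score equal to the background mean yields a p-value of 0.5, and the p-value falls quickly as the score moves into the upper tail, which mirrors the behavior of the empirical conversion at far lower cost.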
For 2D molecular similarity computations, which make purely topological comparisons between molecules, we employed the previously described GSIM-2D method [1,2]. This method is sufficiently efficient that empirical conversion of raw scores into p-values is possible, as we have previously described [1]. For this method to yield high similarity, two molecules must be roughly the same size and contain similar subgraph compositions, especially for those subgraphs rooted at heteroatoms.
3. Results and Discussion
3.1. Relationship between Structural Novelty and Clinical Effects
Previously, we quantified the effect of me-too drugs by showing that drug pairs with high 2D and high 3D similarity were four times more likely to have identical primary and secondary targets than drug pairs where one was structurally novel [1]. Here, this analysis has been extended to clinical effects by making use of the lexical similarity of package inserts. Both to establish the relevance of the PPI similarity metric and to quantify the degree to which structural novelty is related to changes in clinical effects, we computed the pairwise 2D, 3D, and PPI similarity of all 602 drugs. The drug pairs were separated into four categories based on chemical structural similarity: high 2D and 3D similarity, low 2D but high 3D, high 2D but low 3D, and low 2D and 3D. High similarity included pairs with p-values ≤ 0.01 and low similarity included those with p-values ≥ 0.5.
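The four-way binning of drug pairs can be expressed compactly. The function name is an illustrative assumption; the cutoffs are the ones stated above, and pairs with an intermediate p-value in either modality fall into no category.

```python
from typing import Optional


def structural_category(p2d: float, p3d: float,
                        high_cut: float = 0.01, low_cut: float = 0.5) -> Optional[str]:
    """Assign a drug pair to one of the four structural bins, or None if intermediate."""
    def level(p: float) -> Optional[str]:
        if p <= high_cut:
            return "high"  # small p-value means high similarity
        if p >= low_cut:
            return "low"
        return None

    level_2d, level_3d = level(p2d), level(p3d)
    if level_2d is None or level_3d is None:
        return None
    return f"{level_2d} 2D/{level_3d} 3D"


# Hypothetical p-value pairs for illustration.
print(structural_category(0.004, 0.002))  # a "me-too" style pair
print(structural_category(0.70, 0.008))   # structurally novel by 2D, similar by 3D
print(structural_category(0.20, 0.001))   # intermediate 2D p-value: no bin
```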
Figure 2A shows the histogram of the PPI p-value distributions for each of the four structural categories. It is clear that the “me-too” drug distribution (red line, drug pairs with high 2D and high 3D similarity) is different from the others. Toward the left side of the plot, where clinical effects similarity was high (PPI p-values ≤ 0.05), a large fraction of the me-too drug pairs had highly similar clinical effects. Structurally novel drug pairs (high 3D but low 2D similarity, green line) exhibited a significantly smaller fraction with highly concordant clinical effects but still showed some relationship between structural similarity and therapeutic profile. The high 2D and low 3D pairs had little signal (blue), and only a very small portion of structurally dissimilar drug pairs (low 2D and low 3D, magenta) shared clinically similar effects. Clearly, drug pairs with very high structural similarity (both by 2D and 3D methods) were much more likely to have closely shared clinical effects than molecule pairs of any other category, even those sharing high 3D similarity but low 2D similarity. The converse analysis gave parallel results. Figure 2B shows the corresponding histograms of 3D and 2D p-value distributions where molecule pair segregation was made based on clinical effect similarity. The 2D and 3D similarity p-value distributions for drug pairs with high PPI similarity (red and green lines) showed stronger enrichment for low p-values associated with high 3D structural similarity. As expected, drug pairs that had low PPI similarity (blue and brown lines) also had low 3D and 2D structural similarity.

[Figure 2. Panel A: frequency versus PPI p-value (0 to 1) for drug pairs binned by structural similarity: high 2D/high 3D (1065 pairs; 2D and 3D p-values ≤ 0.01), low 2D/high 3D (728 pairs; 2D p-value ≥ 0.50 and 3D p-value ≤ 0.01), high 2D/low 3D (428 pairs; 2D p-value ≤ 0.01 and 3D p-value ≥ 0.50), and low 2D/low 3D (79654 pairs; 2D and 3D p-values ≥ 0.50). Panel B: frequency versus 3D or 2D p-value (0 to 1) for four curves: 2D high PPI, 2D low PPI, 3D high PPI, and 3D low PPI.]
Fig. 2. Relationship between structural similarity and clinical effects similarity.
3.2. Internal SPDB Validation: Off Target Effects
An attractive aspect of the log-odds framework is that it allows us to combine different types of similarity computations into a single value. For each of the 602 drugs in our dataset, we computed the 2D, 3D, PPI, and combination log-odds scores of interacting with each of the 91 targets that had at least 5 drugs as ligands in the SPDB. In each case, any self/self comparisons were omitted from the calculations, making this exercise a leave-one-out cross-validation of the log-odds predictive methodology. The three methods were used independently and in combination to predict the log-odds of known primary and secondary target interactions. As we observed in our previous study, primary target predictions were dominated by the presence of me-too drugs, limiting the differences between any methods (data not shown). However, for prediction of secondary targets, i.e. those that mediate side-effects, significant differences appeared. Table 1 summarizes the true-positive rates observed for different log-odds computations for secondary target prediction at different score thresholds.
Table 1. SPDB Secondary Target Performance (true-positive rate, %, at each log-odds threshold)
Log-Odds   3D   2D   PPI   3D+2D   3D+PPI   2D+PPI   3D+2D+PPI
0          97   90   96    95      98       97       97
10         43    7   14    55      61       33       64
20         16    0    0    23      26        1       38
For all single methods and combinations of methods, the annotated drugs carried useful signal, evidenced by high true-positive rates at a log-odds threshold of 0. However, substantial differences among the methods appeared as higher log-odds thresholds were considered. At a threshold of 10, the 3D similarity approach showed a much higher retrieval rate than either of the other two single-mode methods. All combinations of methods showed synergy, with the most effective retrieval occurring with a combination of all three similarity methods to produce a single log-odds score. Roughly 60% of the true secondary target annotations (64% at a threshold of 10) could be recovered using the log-odds score from 3D+2D+PPI similarity computations. Note, however, that true positive rates without the context of false positive rates can be very misleading. Estimating false positive rates is not straightforward, though. In our SPDB, a missing annotation between a drug and a target does not mean that the interaction does not occur. Authentic interactions within our 602 drug/91 target set may have been published after our curation or have yet to be biochemically characterized. Nonetheless, we expect that the large majority of unannotated interactions, in fact, represent true negative data. So, as a surrogate for a measurement of false positive rates for our similarity methods, we determined the proportion of unannotated drug/target pairs that were predicted as interactions. At log-odds thresholds of 5, 10, and 20, predictions for non-existent SPDB annotations for both 3D similarity alone and 3D+2D+PPI were 3%, 1%, and 0.2%, respectively. These are upper limits of false positive predictions. As will be described below, the false positive rate was actually lower since many of the new predictions were validated as true by incorporating annotations from the ChEMBL database.
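The rates discussed above are simple threshold counts, which the following sketch makes explicit. The function name and the score lists are hypothetical, and treating the threshold as inclusive is an assumption. The same function gives the true-positive rate when applied to scores of annotated pairs and the upper-bound false positive rate when applied to scores of unannotated pairs.

```python
from typing import Sequence


def rate_at_threshold(scores: Sequence[float], threshold: float) -> float:
    """Fraction of log-odds scores at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical log-odds scores, not data from the study.
annotated_scores = [12.0, 3.5, 25.1, 9.9, 14.2]           # known secondary target pairs
unannotated_scores = [0.4, -2.1, 11.3, 1.7, 0.0, -5.6,
                      2.2, 4.9, -0.8, 0.3]                # pairs with no annotation

for threshold in (0, 10, 20):
    tpr = rate_at_threshold(annotated_scores, threshold)
    fpr_upper = rate_at_threshold(unannotated_scores, threshold)
    print(threshold, tpr, fpr_upper)
```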
3.3. Prediction of New Drug-Target Pairs within ChEMBL
As discussed above, a missing annotation within the SPDB between a drug and a target does not necessarily mean that the interaction does not occur. For example, the drugs orphenadrine and mesoridazine showed high 3D log-odds against the muscarinic receptor but the interactions had been unannotated in the SPDB. Careful inspection of the literature revealed that the drugs were known to antagonize muscarinic receptors [1]. Therefore, drug target annotations that are known but missing from our SPDB can serve as a blind set to test our methodology. To supplement annotations within the SPDB with a blind set for methodological testing, we searched ChEMBL and found 380 biochemically characterized drug/target interactions not present in the SPDB. We then investigated how well the methodology could identify the new ChEMBL annotations using only information within the SPDB to compute the log-odds.
Table 2 shows the proportions correctly predicted at various log-odds thresholds using different methods and combinations. In general, the trends observed for the SPDB leave-one-out experiments were borne out. Among individual methods, 3D similarity strongly outperformed 2D- or PPI-based similarity, with the latter two having similar performance. However, the combination of the three methods, overall, yielded better performance than 3D alone. At log-odds thresholds of 10 and 20, using the full combination of methods, the percentage of recovered annotations within the ChEMBL test set was 22% and 4%, respectively. This compared with 16% and 2% using 3D similarity alone, and 3% and 0% using either 2D or PPI similarity alone. The enrichment ratios for the combination approach, using the upper-bound false positive rates discussed above, corresponded to 22-fold and 40-fold, respectively, at log-odds thresholds of 10 and 20.

Table 2. ChEMBL Prediction Performance (annotations recovered, %, at each log-odds threshold)
Log-Odds   3D   2D   PPI   3D+2D   3D+PPI   2D+PPI   3D+2D+PPI
5          43   14   13    42      41       19       41
10         16    3    3    20      18        8       22
20          2    0    0     3       1        0        4
Figure 3 shows a typical example of a drug/target interaction not annotated in the
SPDB where the combination similarity approach confidently identified a pharmacolog-
ically relevant target. Sibutramine is an anorexic annotated in the SPDB as a ligand
of the serotonin and norepinephrine reuptake transporters. However, it has been shown
that sibutramine also interacts with the dopamine reuptake transporter and that this in-
teraction contributes to the therapeutic benefit (indicated in the Meridia package insert,
http://www.rxabbott.com/pdf/meridia.pdf). Computing the similarity between sibutramine
and 11 other dopamine reuptake transporter inhibitors (two are shown in Figure Figure 3),
the log-odds were 2.3, 4.2, and 6.9 using 2D, 3D, and PPI, respectively. These predictions
were strengthened by combining all three methods, with corresponding log-odds of 9.4. The
pairwise PPI similarities between sibutramine and bupropion and nefazodone are highly signif-
icant as are the individual 3D similarities. Clinical effects can be sufficient to infer off-targets,
nefazodone
terms: 547
Sibutramine
terms: 464
Primary: 5-HT & Norepinephrine reuptake transporters
Figure 15.
ChEMBL Example: PPI alone can infer off-target effects and combining with 3D makes
stronger inference.
Yera/Cleves/Jain: Ligand Structure/Function 36
2D Sim: 0.085 (p = 0.061)
bupropion
terms: 413
3D Sim: 7.1 (p = 0.045)
Predicted Target 2D 3D PPI 3D+PPI 2D+3D+PPI
Dopamine reuptake
transporter 2.3 4.2 6.9 9.2 9.4
term (212 in common) weight
tooth disorder 8.04
flu syndrome 7.41
abnormal dreams 7.20
emotional lability 6.95
hypomania 6.36
term (250 in common) weight
thinking abnormal 8.07
tooth disorder 8.04
abnormal dreams 7.20
emotional lability 6.95
suicidal ideation 6.23
PPI Sim: 0.420 (p = 0.001) 3D Sim: 7.6 (p = 0.028)
2D Sim: 0.032 (p = 0.506) PPI Sim: 0.432 (p = 0.001)
Cl
N
Cl
N
N
N
N
N
O
O
Cl
HN
O
Fig. 3. ChEMBL example showing that combination similarity effectively predicts a drug target interaction
not covered within the SPDB. Shown are the 2D structures, 3D overlays, and common clinical terms between
sibutramine and two dopamine reuptake transporter inhibitors, bupropion and nefazodone.
Pacific Symposium on Biocomputing 2014
Biocomputing 2014 Downloaded from www.worldscientific.com
by 46.246.28.95 on 12/21/15. For personal use only.
[Figure 4 panel content. Histogram titled "Structural Novelty of ChEMBL and SPDB" (x-axis: 2D p-value of the nearest neighbor, 0 to 1; y-axis: frequency; series: ChEMBL and SPDB).
Fluoxetine, prediction: M3 ligand. Log-odds 2D: 1.2, 3D: 7.7, PPI: 3.7. Highest 2D similarity: methadone (2D p-value = 0.041, 3D p-value = 0.0074).
Apomorphine, prediction: D3 ligand. Log-odds 2D: 1.1, 3D: 7.7, PPI: 1.2. Highest 2D similarity: ropinirole (2D p-value = 0.210, 3D p-value = 0.00035).]
Fig. 4. A near-neighbor analysis for each of 602 drugs (SPDB in green, ChEMBL in red) based on target
annotations from the SPDB.
but combining similarity methods generally adds confidence to predictions.
Note, however, that the numerical performance on the ChEMBL set was lower than for
the SPDB set in terms of pure true positive recovery rates (see Tables 1 and 2). This stemmed
from an increase in structural diversity for molecules within ChEMBL compared to those
molecules within the SPDB for the target identified by the ChEMBL annotation. To quantify
structural novelty, we performed a nearest-neighbor analysis. For each drug within ChEMBL,
the most similar 2D representative from the SPDB was identified (based on p-value) from
within the collection of drugs having the same target annotation. An analogous leave-one-out
computation was performed for each drug target annotation within the SPDB. Figure 4 shows
a histogram of the distributions of p-values for the ChEMBL (red line) and SPDB (green line)
sets. Within the SPDB set, there were substantially more cases with extremely low p-values
than for the ChEMBL set. The nearest structural neighbors of the ChEMBL test molecules
were generally more divergent. Two examples are highlighted from the ChEMBL set where
the nearest neighbor had poor 2D p-values relative to the much more significant 3D p-values
which provided support for high log-odds scores.
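The nearest-neighbor analysis described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `nearest_neighbors`, the dictionary layout, and the `toy_pvalue` lookup are assumptions made for this example, and the two placeholder ligands (`m3_ligand_x`, `d3_ligand_y`) and their p-values are invented. Only the methadone and ropinirole p-values are taken from the text.

```python
from typing import Callable, Dict, Set, Tuple


def nearest_neighbors(
    queries: Dict[str, Set[str]],
    reference: Dict[str, Set[str]],
    pvalue: Callable[[str, str], float],
) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Map each (query drug, target) to (nearest reference drug, 2D p-value).

    Only reference drugs annotated with the same target are considered, and a
    drug is never compared with itself, so passing the same collection as both
    arguments gives the leave-one-out variant.
    """
    nearest = {}
    for drug, targets in queries.items():
        for target in targets:
            candidates = [
                (pvalue(drug, other), other)
                for other, other_targets in reference.items()
                if other != drug and target in other_targets
            ]
            if candidates:  # skip targets with no other annotated ligand
                p, best = min(candidates)  # lowest p-value = most similar
                nearest[(drug, target)] = (best, p)
    return nearest


# Toy data: two p-values quoted in the text plus hypothetical placeholders.
TOY_P = {
    ("fluoxetine", "methadone"): 0.041,     # from the text
    ("fluoxetine", "m3_ligand_x"): 0.350,   # hypothetical
    ("apomorphine", "ropinirole"): 0.210,   # from the text
    ("apomorphine", "d3_ligand_y"): 0.600,  # hypothetical
}


def toy_pvalue(a: str, b: str) -> float:
    """Stand-in for a real 2D similarity p-value computation."""
    return TOY_P[(a, b)]


chembl = {"fluoxetine": {"M3"}, "apomorphine": {"D3"}}
spdb = {
    "methadone": {"M3"},
    "m3_ligand_x": {"M3"},
    "ropinirole": {"D3"},
    "d3_ligand_y": {"D3"},
}

result = nearest_neighbors(chembl, spdb, toy_pvalue)
for (drug, target), (neighbor, p) in sorted(result.items()):
    print(f"{drug} ({target}): nearest = {neighbor}, 2D p = {p}")
```

Collecting the returned p-values for each set and binning them would give a histogram of the kind shown in Figure 4.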
Fluoxetine (blue box) is a selective serotonin reuptake inhibitor which mediates its therapeutic
benefit through inhibition of the 5-HT reuptake transporter. The ChEMBL data indicated
that fluoxetine also interacts with the muscarinic M3 receptor. The nearest-neighbor
molecule sharing this annotation was methadone (2D p-value = 0.041). Considering all of the
muscarinic M3 receptor ligands (38 total), the 2D, 3D, and PPI log-odds were 1.2, 7.7, and
3.7, respectively. Combining all of the methods gave a score of 8.2.
Apomorphine (red box) is indicated to treat Parkinson’s disease and its therapeutic benefit
is thought to be primarily due to activating dopamine D2 receptors. However, apomorphine
was indicated within ChEMBL to also interact with the dopamine D3 receptor (which is also
known to play a role in the beneficial effects for other anti-Parkinsonian drugs). The nearest-neighbor
drug among the D3 ligands was ropinirole (2D p-value = 0.210), which is structurally
distinct in a topological sense (Figure 4). As in the previous case, when considering all 11
dopamine D3 ligands, the 3D comparisons provide primary support for a positive log-odds
score. The 2D, 3D, and PPI log-odds were -1.1, 7.7, and 1.2 respectively. The combination of
all three comparison types yielded a score of 3.3. Here, the 3D molecular similarity information
was the most reliable predictor.
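As a quick arithmetic check on the three examples above, the sketch below tabulates the quoted log-odds, reports which single modality scored highest, and compares the reported combined score with the plain sum of the three single-modality values. It does not reproduce the fusion rule itself, which is defined elsewhere in the paper; the dictionary layout and the `summarize` helper are assumptions made for this illustration.

```python
# Single-modality and combined log-odds exactly as quoted in the text.
EXAMPLES = {
    "sibutramine / dopamine reuptake transporter":
        {"2D": 2.3, "3D": 4.2, "PPI": 6.9, "combined": 9.4},
    "fluoxetine / muscarinic M3":
        {"2D": 1.2, "3D": 7.7, "PPI": 3.7, "combined": 8.2},
    "apomorphine / dopamine D3":
        {"2D": -1.1, "3D": 7.7, "PPI": 1.2, "combined": 3.3},
}


def summarize(scores):
    """Return (best single modality, plain sum, plain sum minus combined)."""
    singles = {k: v for k, v in scores.items() if k != "combined"}
    best = max(singles, key=singles.get)
    plain_sum = round(sum(singles.values()), 1)
    gap = round(plain_sum - scores["combined"], 1)
    return best, plain_sum, gap


for name, scores in EXAMPLES.items():
    best, plain_sum, gap = summarize(scores)
    print(f"{name}: best single = {best} ({scores[best]}), "
          f"plain sum = {plain_sum}, reported combined = {scores['combined']}, "
          f"difference = {gap}")
```

In all three cases the reported combined score is lower than the plain sum, so the combination is evidently not simple addition of the individual log-odds. For apomorphine the combined score (3.3) is also lower than the 3D score alone (7.7).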
4. Conclusion
In the present study, we report a means to combine chemical similarity between molecules
with information derived from computing similarity based upon lexical analysis of patient
package inserts (PPI). As expected based on our prior work, drugs that were highly struc-
turally similar (both by 2D and 3D comparison) were much more likely to have significant
overlap of their clinical effects compared to drugs that were structurally different (low 2D
similarity but high 3D similarity). Our prior work illustrated a similar effect with respect to
specifically annotated molecular targets: me-too drugs tend to have nearly identical target
profiles.1 The correlation between lexical and chemical similarity also served to validate the
lexical comparison methodology.
We extended a probabilistic data fusion method to include observations from both molec-
ular and clinical effects similarity and reported performance on predicting protein targets of
small molecules. This was done both by leave-one-out cross-validation on our internal database
of drug-target interactions (the SPDB) as well as on a blind test on new interactions present
in ChEMBL. For off-target prediction within the SPDB, 3D similarity was the most effec-
tive single information source. However, combining the methods predicted a larger proportion
of secondary targets than any of the individual methods, while maintaining a similar nomi-
nal false positive rate. On the test against previously unseen ChEMBL drug-target linkages,
again 3D similarity was the single most effective predictor, but gains were derived from com-
bining the different data sources. We note that the method supports the integration of any
method that produces scores relating molecules to targets (e.g. docking), and that inclusion
of additional information sources is likely to produce further benefits. It is also important to
understand that this framework is similar in character to virtual screening methods, in that
while enrichment for compounds with the predicted effects occurs, the actual potencies of the
effects are not predicted. This point is discussed at length in a prior study.16
In contemplating the problem of off-target prediction for drugs, the problem of molecular
design ancestry can confuse the issue of methodological validation. For example, ligands of
aminergic GPCRs offer a troublesome test case, owing to the established promiscuity of such
drugs among numerous targets.17 Returning to Figure 1 (bottom), we see the example of levetiracetam,
an anticonvulsant believed to have a unique mechanism of action when compared
with most existing anticonvulsants. The established CNS targets of the major classes of anticonvulsant
drugs include the GABA-A receptor (for barbiturates such as pentobarbital) and
neuronal voltage-gated sodium channels (for drugs such as carbamazepine and phenytoin).
These drugs have been recently shown to modulate voltage-gated potassium channels as part
of their anti-epileptic effects.18–21 Levetiracetam, having a novel scaffold, has been proposed
to work through an entirely new mechanism of action due to high binding affinity to the
synaptic vesicle protein SV2A (which is not a known therapeutic target of any drug).6,22,23
Our methods strongly predict that levetiracetam is a voltage-gated sodium channel modulator
with 3D log-odds alone of 14.5 (the combination log-odds was 21.4). Levetiracetam has been
shown to inhibit voltage-gated potassium currents,22 leading to the suggestion that this drug,
like other anti-epileptics, acts at least in part through potassium channels. Considering that
many antiepileptics modulate both sodium and potassium channels,23 our prediction supports
the notion that levetiracetam shares a similar mechanism of action, perhaps in addition to
the interaction with SV2A.
Identification of off-target activities of drugs is a difficult problem, particularly in cases
where the drug in question has a non-obvious structural relationship with the known ligands
of a given target. Our hope is that methods that make use of multiple information sources
will help to identify clinically important and unexpected effects.
References
1. E. Yera, A. Cleves and A. Jain, Journal of Medicinal Chemistry 54, 6771 (2011).
2. A. E. Cleves and A. N. Jain, Journal of Computer-Aided Molecular Design 22, 147 (2008).
3. J. Wright and R. Willette, Journal of Medicinal Chemistry 5, 815 (1962).
4. K. Nagashima, A. Takahashi, H. Ikeda, A. Hamasaki, N. Kuwamura, Y. Yamada and Y. Seino,
Diabetes Research and Clinical Practice 66, S75 (2004).
5. D. S. Ragsdale and M. Avoli, Brain Research 26, 16 (1998).
6. B. Lynch, N. Lambeng, K. Nocka, P. Kensel-Hammes, S. Bajjalieh, A. Matagne and B. Fuks,
PNAS 101, 9861 (2004).
7. M. Campillos, M. Kuhn, A. Gavin, L. Jensen and P. Bork, Science 321, 263 (2008).
8. A. E. Cleves and A. N. Jain, Journal of Medicinal Chemistry 49, 2921 (2006).
9. A. Gaulton, L. Bellis, A. Bento, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey,
D. Michalovich and B. Al-Lazikani, Nucleic Acids Research 40, D1100 (2012).
10. G. Salton and M. J. McGill, Introduction to Modern Information Retrieval (McGraw-Hill, Inc.,
New York, NY, USA, 1986).
11. J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, Second Edition (Morgan
Kaufmann Series in Data Management Systems), 2 edn. (Morgan Kaufmann, 2006).
12. A. N. Jain, Journal of Computer-Aided Molecular Design 14, 199 (2000).
13. A. N. Jain, Journal of Medicinal Chemistry 47, 947 (2004).
14. A. Ghuloum, R. Carleton and A. Jain, Journal of Medicinal Chemistry 42, 1739 (1999).
15. J. Mount, J. Ruppert, W. Welch and A. Jain, Journal of Medicinal Chemistry 42, 60 (1999).
16. A. Jain and A. Cleves, Journal of Computer-Aided Molecular Design 26, 57 (2012).
17. M. J. Keiser, B. L. Roth, B. N. Armbruster, P. Ernsberger, J. J. Irwin and B. K. Shoichet,
Nature Biotechnology 25, 197 (2007).
18. C. Zona, V. Tancredi, E. Palma, G. Pirrone and M. Avoli, Canadian Journal of Physiology and
Pharmacology 68, 545 (1990).
19. F. Bloom, D. Kupfer and B. Bunney, Psychopharmacology: The Fourth Generation of Progress
(Raven Press, 1995).
20. M. Nobile and P. Vercellino, British Journal of Pharmacology 120, 647 (1997).
21. A. Ambrósio, P. Soares-da-Silva, C. Carvalho and A. Carvalho, Neurochemical Research 27, 121 (2002).
22. M. Madeja, D. Georg Margineanu, A. Gorji, E. Siep, P. Boerrigter, H. Klitgaard and E. Speckmann,
Neuropharmacology 45, 661 (2003).
23. R. Surges, K. Volynski and M. Walker, Therapeutic Advances in Neurological Disorders 1, 13
(2008).
Pacific Symposium on Biocomputing 2014
171
Biocomputing 2014 Downloaded from www.worldscientific.com
by 46.246.28.95 on 12/21/15. For personal use only.
... In this work, we show how protein structural information can be exploited to bolster predictions of polypharmacology from ligand-based computations. The approach combines data from molecular docking, protein binding pocket similarity, 3D structural ligand similarity, and ligand-similarity based on lexical analysis of drug package inserts [6][7][8][9]. The combined computational approach identified a clear chemical and structural linkage between perixosome proliferator-activator receptor alpha (PPARa) and the cyclooxygenase (COX) enzymes, which share no sequence homology and have disparate in vivo functions. ...
... We have shown that the combination of molecular structural similarity combined with similarity computed from drug package inserts provides improved detection of true ligand-target interactions over use of single-mode similarity computations when controlling for false detection rates [9]. The study focused on 602 drugs and 91 diverse biological targets, with the emphasis being on computational validation of combining multiple ligand similarity methods in a blind prediction test on the ChEMBL database. ...
... Using all available ligand-based information, the overall log-odds scores for gemfibrozil, clofibric acid, and fenofibric acid were: 11.6, 7.5, and 7.0, respectively. To put these numbers in perspective, a systematic blind prediction test [9] suggested that log-odds scores of greater than 5.0 yielded correct ligand to target linkages 40-50 % of the time, with an upper bound on the false positive prediction rate of roughly 1-3 %. ...
Article
Full-text available
We have previously validated a probabilistic framework that combined computational approaches for predicting the biological activities of small molecule drugs. Molecule comparison methods included molecular structural similarity metrics and similarity computed from lexical analysis of text in drug package inserts. Here we present an analysis of novel drug/target predictions, focusing on those that were not obvious based on known pharmacological crosstalk. Considering those cases where the predicted target was an enzyme with known 3D structure allowed incorporation of information from molecular docking and protein binding pocket similarity in addition to ligand-based comparisons. Taken together, the combination of orthogonal information sources led to investigation of a surprising predicted relationship between a transcription factor and an enzyme, specifically, PPARα and the cyclooxygenase enzymes. These predictions were confirmed by direct biochemical experiments which validate the approach and show for the first time that PPARα agonists are cyclooxygenase inhibitors.
... We can use a similar formulation to Eq. 1 by expressing the similarities of these poses to native poses in terms of probabilities. We have previously shown how to transform the results of molecular similarity computations into probability values by comparing the magnitude of a similarity score for molecule A versus B to the distribution of scores for A and B compared with a random background set of molecules [40,41]. In that work, given the maximal similarity of A to B (in any energetically reasonable conformation of either molecule) the problem was to assign a probability of observing a similarity value or that magnitude or higher. ...
... The work presented here represents the first generalization of our ongoing work using such techniques for predicting polypharmacology [40,41,49]. We believe that hybrid approaches that combine information from docking and scoring, ligand similarity, and protein pocket similarity will frequently show synergistic performance improvements for lead discovery and for predictions of binding mode, affinity, and off-target biological effects. ...
Article
Full-text available
Prediction of the bound configuration of small-molecule ligands that differ substantially from the cognate ligand of a protein co-crystal structure is much more challenging than re-docking the cognate ligand. Success rates for cross-docking in the range of 20-30 % are common. We present an approach that uses structural information known prior to a particular cutoff-date to make predictions on ligands whose bounds structures were determined later. The knowledge-guided docking protocol was tested on a set of ten protein targets using a total of 949 ligands. The benchmark data set, called PINC ("PINC Is Not Cognate"), is publicly available. Protein pocket similarity was used to choose representative structures for ensemble-docking. The docking protocol made use of known ligand poses prior to the cutoff-date, both to help guide the configurational search and to adjust the rank of predicted poses. Overall, the top-scoring pose family was correct over 60 % of the time, with the top-two pose families approaching a 75 % success rate. Correct poses among all those predicted were identified nearly 90 % of the time. The largest improvements came from the use of molecular similarity to improve ligand pose rankings and the strategy for identifying representative protein structures. With the exception of a single outlier target, the knowledge-guided docking protocol produced results matching the quality of cognate-ligand re-docking, but it did so on a very challenging temporally-segregated cross-docking benchmark.
... Similarly, off-target effects such as toxicity could be predicted by similarity search between the query molecule and a toxicity database [80,81]. Recently, similarity-based methods were also used in synthesis design assuming that similar compounds have similar reactivities. ...
Article
Full-text available
Rapid in silico selection of target focused libraries from commercial repositories is an attractive and cost-effective approach in early drug discovery. If structures of active compounds are available, rapid 2D similarity search can be performed on multimillion compounds’ databases. This approach can be combined with physico-chemical parameter and diversity filtering, bioisosteric replacements, and fragment-based approaches for performing a first round biological screening. Our objectives were to investigate the combination of 2D similarity search with various 3D ligand and structure-based methods for hit expansion and validation, in order to increase the hit rate and novelty. In the present account, six case studies are described and the efficiency of mixing is evaluated. While sequentially combined 2D/3D similarity approach increases the hit rate significantly, sequential combination of 2D similarity with pharmacophore model or 3D docking enriched the resulting focused library with novel chemotypes. Parallel integrated approaches allowed the comparison of the various 2D and 3D methods and revealed that 2D similarity-based and 3D ligand and structure-based techniques are often complementary, and their combinations represent a powerful synergy. Finally, the lessons we learnt including the advantages and pitfalls of the described approaches are discussed.
... The relationship between drug and target, through data fusion framework, is a best approach in which 3D-similarity well predicts off-target effects [92]. Discovered drug target interactions can lead to the drug being repositioned for therapeutic treatment of its offtarget's associated disease [93]. ...
Article
Background: One of the major goals of computational chemists is to determine and develop the pathways for anticancer drug discovery and development. In recent past, high performance computing systems elicited the desired results with little or no side effects. The aim of the current review is to evaluate the role of computational chemistry in ascertaining kinases as attractive targets for anticancer drug discovery and development. Methods: Research related to computational studies in the field of anticancer drug development is reviewed. Extensive literature on achievements of theorists in this regard has been compiled and presented with special emphasis on kinases being the attractive anticancer drug targets. Results: Different approaches to facilitate anticancer drug discovery include determination of actual targets, multi-targeted drug discovery, ligand-protein inverse docking, virtual screening of drug like compounds, formation of di-nuclear analogs of drugs, drug specific nano-carrier design, kinetic and trapping studies in drug design, multi-target QSAR (Quantitative Structure Activity Relationship) model, targeted co-delivery of anticancer drug and siRNA, formation of stable inclusion complex, determination of mechanism of drug resistance, and designing drug like libraries for the prediction of drug-like compounds. Protein kinases have gained enough popularity as attractive targets for anticancer drugs. These kinases are responsible for uncontrolled and deregulated differentiation, proliferation, and cell signaling of the malignant cells which result in cancer. Conclusion: Interest in developing drugs through computational methods is a growing trend, which saves equally the cost and time. Kinases are the most popular targets among the other for anticancer drugs which demand attention. 
3D-QSAR modelling, molecular docking, and other computational approaches have not only identified the target-inhibitor binding interactions for better anticancer drug discovery but are also designing and predicting new inhibitors, which serve as lead for the synthetic preparation of drugs. In light of computational studies made so far in this field, the current review highlights the importance of kinases as attractive targets for anticancer drug discovery and development.
... Recently, chemical similarity between molecules is being extended to evaluate clinical effects, if combined with information derived from computing similarity based upon lexical analysis of patient package inserts. It is expected, that drugs with highly structurally similarity (both by 2D and 3D comparison) are much more likely to have significant overlap of their clinical effects, compared to drugs that are structurally different (low 2D similarity but high 3D similarity Yera et al. 2014). However in the search of new candidates chemical similarity does not always lead to biological similarity. ...
Chapter
Drug discovery and development is a slow complicated multi-objective and expensive enterprise. Drug candidates are a compromise output of competing pharmacodynamics and pharmacokinetic processes. To facilitate this task and avoid failures in clinical phases, computational techniques and in silico modeling using the endpoints offered by high technology, are extremely valuable. In this chapter, some historical aspects and a background overview for constructing Quantitative Structure-Activity Relationships (QSAR) and Quantitative Structure-Property Relationships (QSPR) are provided. The different goals for the establishment of QSAR/QSPR models are defined. Representative examples and success stories of in silico modeling along the different drug discovery processes are presented. Examples include models for optimizing efficient binding to receptor, using both ligand- and structure-based approaches, for in vitro permeability predictions, predictions for human intestinal absorption and blood brain barrier penetration, as well as for plasma protein binding and drug metabolism. The value of global and local models as well as their interpretability and the criteria for their evaluation and proper use are discussed throughout this chapter.
... optimization of the objective function of a docking [4,5] or molecular similarity calculation [6,7]. Recently, we have explored hybrid approaches that blend agnostic conformational elaboration prior to docking or similarity optimization with some degree of local refinement during the pose optimization process [8,9]. Agnostic conformer generation (independent of any target) can offer advantages both in terms of speed and predictive accuracy, but this places a premium on the quality of the conformational ensembles. ...
Article
Full-text available
We introduce the ForceGen method for 3D structure generation and conformer elaboration of drug-like small molecules. ForceGen is novel, avoiding use of distance geometry, molecular templates, or simulation-oriented stochastic sampling. The method is primarily driven by the molecular force field, implemented using an extension of MMFF94s and a partial charge estimator based on electronegativity-equalization. The force field is coupled to algorithms for direct sampling of realistic physical movements made by small molecules. Results are presented on a standard benchmark from the Cambridge Crystallographic Database of 480 drug-like small molecules, including full structure generation from SMILES strings. Reproduction of protein-bound crystallographic ligand poses is demonstrated on four carefully curated data sets: the ConfGen Set (667 ligands), the PINC cross-docking benchmark (1062 ligands), a large set of macrocyclic ligands (182 total with typical ring sizes of 12–23 atoms), and a commonly used benchmark for evaluating macrocycle conformer generation (30 ligands total). Results compare favorably to alternative methods, and performance on macrocyclic compounds approaches that observed on non-macrocycles while yielding a roughly 100-fold speed improvement over alternative MD-based methods with comparable performance.
... Surflex computational methods have been described in detail in previous work: 3D similarity [12,13], 2D-similarity and computations involving comparisons of single molecules to sets of molecules [14,15], docking (including used of multiple protein structures) [10,[16][17][18], and both standard and structure-guided QMOD [1,4,6,7]. Details of algorithmic enhancements made to the QMOD procedure will be described in what follows, but prior descriptions will not be repeated except in abbreviated form where needed. ...
Article
Full-text available
Surflex-QMOD integrates chemical structure and activity data to produce physically-realistic models for binding affinity prediction . Here, we apply QMOD to a 3D-QSAR benchmark dataset and show broad applicability to a diverse set of targets. Testing new ligands within the QMOD model employs automated flexible molecular alignment, with the model itself defining the optimal pose for each ligand. QMOD performance was compared to that of four approaches that depended on manual alignments (CoMFA, two variations of CoMSIA, and CMF). QMOD showed comparable performance to the other methods on a challenging, but structurally limited, test set. The QMOD models were also applied to test a large and structurally diverse dataset of ligands from ChEMBL, nearly all of which were synthesized years after those used for model construction. Extrapolation across diverse chemical structures was possible because the method addresses the ligand pose problem and provides structural and geometric means to quantitatively identify ligands within a model’s applicability domain. Predictions for such ligands for the four tested targets were highly statistically significant based on rank correlation. Those molecules predicted to be highly active (\(\hbox {pK}_i \ge 7.5\)) had a mean experimental \(\hbox {pK}_i\) of 7.5, with potent and structurally novel ligands being identified by QMOD for each target.
... Thus, integrating systems biology with structural or chemoinformatics investigation has been used to predict drug adverse effects (Chen et al. 2013b). A quantitative method (Spiros and Geerts 2012), 2D topological structural features, 3D surface characteristics and clinical effects of drugs have been used to predict the off-target effects of drugs (Yera et al. 2014). ...
Article
Full-text available
A system can be defined as an organized, interconnected structure consisting of interrelated and interdependent elements (e.g., components, factors, members, parts). These parts and processes are connected by structural and/or behavioral relationships and continually influence one another directly or indirectly to maintain a balance essential for the existence of the system, and for achieving its goal. With increasing inflow of biological data, serious efforts to empathize biological systems as true systems are nowadays almost practicable. Handling high-throughput data places stress mainly on in silico approach comprising database handling, modeling, simulation and analysis, resulting in dramatic progress in system-level analysis. The databases and methods in bioinformatics are now moving in the direction of implementation of integrative dataset systems to represent genes, proteins and metabolic pathways in combination with simulated environment which is dynamic. For understanding the complex biological disorders and normal pathways of system it is significant to integrate the reductionist data which comes from transcriptomics, genomics, proteomics, lipidomics, glycomics, fluxomics and metabolomics. Numerous bioinformatics approaches are being exploited to integrate the molecular information from the biological databases and assist in simulation of metabolic networks. High-throughput experimental data set systems are, however, established on the static representation of the molecular data and existing knowledge. Various biological tools have been developed for understanding the mechanism of several diseases for drug discovery process. Study of dynamic nature of genetic, biochemical and signal transduction pathways can be done by simulating reactions with the help of integrative tools. Rising usage of rational drug designing approach is significant for identification of target in disease polluted network and evaluating ligand interaction for enhanced efficacy. 
How in-depth investigation of the whole system (a holistic approach) leads to emergence of systems biology is the crux of this review.
Article
Full-text available
Computer-aided drug design is a mature field by some measures, and it has produced notable successes that underpin the study of interactions between small molecules and living systems. However, unlike a truly mature field, fallacies of logic lie at the heart of the arguments in support of major lines of research on methodology and validation thereof. Two particularly pernicious ones are cum hoc ergo propter hoc (with this, therefore because of this) and confirmation bias (seeking evidence that is confirmatory of the hypothesis at hand). These fallacies will be discussed in the context of off-target predictive modeling, QSAR, molecular similarity computations, and docking. Examples will be shown that avoid these problems.
Article
Full-text available
ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chemical biology and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets. Access is available through a web-based interface, data downloads and web services at: https://www.ebi.ac.uk/chembldb.
Article
Full-text available
Levetiracetam (LEV) is a new antiepileptic drug that is clinically effective in generalized and partial epilepsy syndromes as sole or add-on medication. Nevertheless, its underlying mechanism of action is poorly understood. It has a unique preclinical profile; unlike other antiepileptic drugs (AEDs), it modulates seizure-activity in animal models of chronic epilepsy with no effect in most animal models of acute seizures. Yet it is effective in acute in-vitro 'seizure' models. A possible explanation for these dichotomous findings is that LEV has different mechanisms of actions, whether given acutely or chronically and in 'epileptic' and control tissue. Here we review the general mechanism of action of AEDs, give an updated and critical overview about the experimental findings of LEV's cellular targets (in particular the synaptic vesicular protein SV2A) and ask whether LEV represents a new class of AED.
Book
This is the third edition of the premier professional reference on the subject of data mining, expanding and updating the previous market leading edition. This was the first (and is still the best and most popular) of its kind. Combines sound theory with truly practical applications to prepare students for real-world challenges in data mining. Like the first and second editions, Data Mining: Concepts and Techniques, 3rd Edition equips professionals with a sound understanding of data mining principles and teaches proven methods for knowledge discovery in large corporate databases. The first and second editions also established itself as the market leader for courses in data mining, data analytics, and knowledge discovery. Revisions incorporate input from instructors, changes in the field, and new and important topics such as data warehouse and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. This book begins with a conceptual introduction followed by a comprehensive and state-of-the-art coverage of concepts and techniques. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. Wherever possible, the authors raise and answer questions of utility, feasibility, optimization, and scalability. relational data. -- A comprehensive, practical look at the concepts and techniques you need to get the most out of real business data. -- Updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning, -- Scores of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects. -- Complete classroom support for instructors as well as bonus content available at the companion website. 
A comprehensive and practical look at the concepts and techniques you need in the area of data mining and knowledge discovery.
Article
Voltage-gated sodium channels mediate regenerative inward currents that are responsible for the initial depolarization of action potentials in brain neurons. Many of the most widely used antiepileptic drugs, as well as a number of promising new compounds, suppress the abnormal neuronal excitability associated with seizures by means of complex voltage- and frequency-dependent inhibition of ionic currents through sodium channels. Over the past decade, advances in molecular biology have led to important new insights into the molecular structure of the sodium channel and have shed light on the relationship between channel structure and channel function. In this review, we examine how our current knowledge of sodium channel structure–function relationships contributes to our understanding of the action of anticonvulsant sodium channel blockers.
Article
Drug structures may be quantitatively compared based on 2D topological structural considerations and based on 3D characteristics directly related to binding. A framework for combining multiple similarity computations is presented along with its systematic application to 358 drugs with overlapping pharmacology. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, or their combination. For prediction of primary targets, the benefit of 3D over 2D was relatively small, but for prediction of off-targets, the added benefit was large. In addition to assessing prediction, the relationship between chemical similarity and pharmacological novelty was studied. Drug pairs that shared high 3D similarity but low 2D similarity (i.e., a novel scaffold) were shown to be much more likely to exhibit pharmacologically relevant differences in terms of specific protein target modulation.
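The idea of producing a single score from several similarity modalities can be illustrated with a minimal Python sketch. This is not the authors' method: it assumes that each modality has already been converted to a probability, that the modalities are independent, and that fusion uses a noisy-OR rule with the best match over the known set taken as the final score. The function names `fuse_probabilities` and `score_against_set` are hypothetical.

```python
from math import prod


def fuse_probabilities(probs):
    """Noisy-OR fusion (an assumption, not the published method).

    Returns the probability that at least one modality indicates a
    shared biological effect, treating modalities as independent.
    """
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - prod(1.0 - p for p in probs)


def score_against_set(per_molecule_probs):
    """Score a new molecule against a set of known molecules.

    per_molecule_probs holds one tuple per known molecule, e.g.
    (p_2d, p_3d). Each tuple is fused, and the best fused value
    over the known set is returned (again, an illustrative choice).
    """
    if not per_molecule_probs:
        raise ValueError("known set must not be empty")
    return max(fuse_probabilities(p) for p in per_molecule_probs)


if __name__ == "__main__":
    # Hypothetical probabilities for a query compared to two known drugs.
    known = [(0.2, 0.1), (0.5, 0.5)]
    print(score_against_set(known))
```

Under the noisy-OR assumption, a molecule that scores moderately in both modalities ends up with a higher fused score than either modality alone would give, which is one simple way that combining 2D and 3D evidence can add information.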
Article
We report that carbamazepine (Tegretol), a drug that is useful for the treatment of complex partial seizures, enhances outward, voltage-dependent K⁺ currents generated by rat neocortical cells in culture and recorded with patch-clamping techniques. This effect is seen in the presence of therapeutic concentrations of carbamazepine (10-20 μM). Furthermore, at these doses carbamazepine does not influence voltage-dependent inward Na⁺ and Ca²⁺ currents recorded in these cells. The action exerted by carbamazepine on K⁺ currents is a novel finding and might represent an important mechanism for controlling neocortical excitability and thus the generation of epileptiform activity.
Article
The action of the anticonvulsant drug phenytoin on K⁺ currents was investigated in neuroblastoma cells by whole-cell voltage-clamp recording. Neuroblastoma cells expressed an outward K⁺ current with a voltage- and time-dependence which resembled the delayed-rectifier K⁺ current found in other cells. When added to the standard external solution at concentrations ranging between 1 and 200 μM, phenytoin reduced the current (n = 65). Inhibition was concentration-dependent with a half-maximal inhibitory concentration of 30.9 ± 0.8 μM. The K⁺ current inhibition by phenytoin was voltage-dependent with block by phenytoin being relieved by depolarization. The times taken to reach steady-state inhibition and complete recovery from inhibition were about 20 s. Neither the activation and inactivation rates of the K⁺ current nor the K⁺ channel availability were significantly altered by the blocking drug. A use-dependent block was observed at phenytoin concentrations of 10, 25 and 50 μM. These results suggest that phenytoin affects K⁺ currents and that this effect might lead to a reduction in neuronal excitability.