
Prediction of off-target drug effects through data fusion

January 2014

19:160-71

DOI:10.1142/9789814583220_0016

Source
PubMed

Authors:

Ann Cleves

BioPharmics Division, Optibrium Ltd.

Ajay N Jain

BioPharmics LLC



PREDICTION OF OFF-TARGET DRUG EFFECTS THROUGH DATA FUSION

EMMANUEL R. YERA, ANN E. CLEVES, and AJAY N. JAIN†

Bioengineering and Therapeutic Sciences, University of California, San Francisco,

San Francisco, CA 94143, USA

†E-mail: ajain@jainlab.org

www.jainlab.org

We present a probabilistic data fusion framework that combines multiple computational approaches for drawing relationships between drugs and targets. The approach has special relevance to identifying surprising unintended biological targets of drugs. Comparisons between molecules are made based on 2D topological structural considerations, based on 3D surface characteristics, and based on English descriptions of clinical effects. Similarity computations within each modality were transformed into probability scores. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, clinical effects similarity or their combination. The methods were validated within a curated structural pharmacology database (SPDB) and further tested by blind application to data derived from the ChEMBL database. For prediction of off-target effects, 3D-similarity performed best as a single modality, but combining all methods produced performance gains. Striking examples of structurally surprising off-target predictions are presented.

Keywords: Molecular similarity; Surﬂex-Sim; Patient Package Inserts; Oﬀ-Target Predictions.

1. Introduction

In prior work, we introduced a methodological approach for data fusion which was used to

predict the protein targets of small molecules based on molecular similarity.1 Given a test

molecule and a set of small molecules with a known shared biological eﬀect, the method

produces a score corresponding to the likelihood that the test molecule will share the same

activity. We showed that for predicting primary targets (i.e. targets modulating intended

therapeutic eﬀects) the performance advantage of a 3D similarity method over a 2D method

was relatively small, due to the dominating eﬀects of human 2D bias in drug design (i.e.

“me-too” drugs).1,2 However, for predicting secondary targets (i.e. sources of side-eﬀects) 3D

similarity was much more eﬀective than 2D topological comparisons. We also showed that

clinical effects of drugs could be used as a surrogate for biochemical characterization,1 making

use of common side eﬀects of muscarinic antagonism as markers for the biochemical protein-

ligand eﬀect. It was possible using 3D chemical similarity to achieve strong separation of likely

muscarinic modulators from those with no evidence of such eﬀects.

In the current work, we expand the analysis to a much larger set of small molecule drugs,

again making use of 2D and 3D chemical similarity computations. Additionally, computations

involving structural similarity are augmented with clinical eﬀects similarity, made possible by

automating the extraction and weighting of relevant textual terms from drug package inserts.

The top row of Figure 1 shows two highly similar first generation sulfonylureas, tolbutamide and tolazamide, each having highly similar pharmacological effects,3 with their therapeutic benefits deriving from identical mechanisms.4 Clinical effects similarity coincides here with high structural 2D and 3D similarity. Next, consider the two structurally dissimilar anticonvulsants on the bottom of Figure 1, carbamazepine and levetiracetam. Carbamazepine was one of the first anticonvulsants (approved in 1968), and its therapeutic benefit is attributed to stabilizing the inactivated state of voltage-gated sodium channels (Nav1.1).5 Levetiracetam is a newer anticonvulsant, believed to act through interaction with synaptic vesicle glycoprotein 2A (SV2A).6 As expected, the two package inserts have clinical effect terms in common due to shared indications. Given the high 3D structural similarity, our expectation is that these drugs do in fact share some molecular targets, as will be discussed later.

[Figure 1: Top row, therapeutic indication antidiabetic: tolbutamide and tolazamide (for both, intended target Kir6.2 and other target PPARγ-RXR); clinical terms in common: diabetes mellitus, blood glucose, and hypoglycemia; similarity: high 2D and high 3D. Bottom row, therapeutic indication anticonvulsant: carbamazepine (intended target Nav1.1, off-target Nav1.5) and levetiracetam (intended target synaptic vesicle glycoprotein 2A); clinical terms in common: grand mal and status epilepticus; similarity: low 2D but high 3D. The Surflex-Sim 3D overlay at the bottom shows carbamazepine with green carbons and levetiracetam in atom color.]

Fig. 1. Relationship between small molecules based on molecular similarity, protein target modulation, and clinical effects. The optimal 3D superimposition (bottom) indicates high similarity, despite little topological commonality (green sticks correspond to regions of significant surface shape similarity and blue/red sticks correspond to regions of significant polar similarity).

Pacific Symposium on Biocomputing 2014

The present study establishes a computational method to draw relationships between drugs

based on the clinical effects present in Patient Package Inserts (PPI), whose utility for predicting drug target interactions has been shown previously.7 The present study makes three

primary contributions. First, we introduce a method to extract and weight medically relevant

terms from English clinical eﬀects information. Second, we show that drug similarity com-

puted from package inserts is directly correlated with drug similarity computed by molecular

structure comparison. Third, we establish that the combination of 2D, 3D, and PPI similarity yields better off-target predictive performance than any single similarity computation.

Recovery of roughly 40–50% of oﬀ-target annotations was possible with false positive rates of

about 1–3%. The approach is generalizable to other computational modalities (e.g. docking of

ligands to protein structures), and it is our hope that broad application of the methods will

aid in identifying unexpected interactions between drugs and biological targets.


2. Methods and Data

The following describes the molecular data sets, computational methods, and speciﬁc compu-

tational procedures (see http://www.jainlab.org for additional details on software, data, and

protocols).

2.1. Molecular Data Sets

In the present study two molecular data sets are used. The Structural Pharmacology Database

(SPDB) is a deeply curated drug target database that is used as the basis to make predictions.

A set of drug target annotations from ChEMBL that were not annotated in our database were

used as a blind test set.

The details of the SPDB and its relationship to other databases have been extensively described elsewhere.1,2,8 It has two features that are particularly important for the present study.

First, “targets” are speciﬁc binding sites on proteins or protein complexes. This is a critical

distinction in order to make inferences about small molecule activity based on structural sim-

ilarity. Second, primary targets (those that are believed to be therapeutically beneﬁcial) are

distinguished from secondary targets (which mediate pharmacologically relevant oﬀ-target ef-

fects). By making this distinction, it is possible to explicitly quantify performance of methods

for prediction of surprising eﬀects. Of the roughly 1000 drugs within the SPDB, 602 met our

criteria for inclusion based on PPI information (see below). Of the 257 primary and secondary

targets of these 602 drugs, 91 had at least 5 annotated drugs and formed the basis of cross-validation experiments. These 91 targets comprised 83 human proteins, including 28

aminergic GPCRs, 19 ligand and voltage gated ion channels, 13 human enzymes, 7 nucleotide

and short peptide GPCRs, 5 tyrosine kinases, 5 steroid receptors, 3 reuptake transporters, 2

ion transporters, and 1 transcription factor. The remaining 8 targets were bacterial, fungal,

and viral proteins. To test the methodology, we employed ChEMBL version 14, which curates

linkages between chemicals and biological targets.9 For each of the 602 drugs, corresponding

ChEMBL compounds were identiﬁed based on direct structural equivalence. Equating the 91

SPDB target binding sites to ChEMBL bioactivities was done manually, yielding 65 corre-

sponding ChEMBL targets. Signiﬁcant bioactivity was deﬁned as Kd, Ki, or IC50 values less

than or equal to 1 µM. There were 380 drug-target interactions present in ChEMBL that were

missing from the SPDB matrix of 602 drugs and 91 targets. This set served as a blind test set

and will be referred to as the ChEMBL set in what follows.
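The construction of the blind test set can be sketched as a filter followed by a set difference. This is only an illustrative sketch: the record layout `(drug, target, activity_type, value_in_nM)`, the function name `blind_test_pairs`, and the toy identifiers are assumptions and do not reflect the ChEMBL schema or the authors' code. It also assumes that drugs and targets have already been mapped to the 602 drugs and 91 targets discussed above.

```python
# Hypothetical record layout: (drug, target, activity_type, value_in_nM).
ACTIVITY_TYPES = {"Kd", "Ki", "IC50"}
CUTOFF_NM = 1000.0  # 1 µM expressed in nM


def blind_test_pairs(chembl_records, spdb_pairs):
    """Return significant ChEMBL drug-target pairs that SPDB does not annotate."""
    significant = {
        (drug, target)
        for drug, target, activity_type, value_nm in chembl_records
        if activity_type in ACTIVITY_TYPES and value_nm <= CUTOFF_NM
    }
    return significant - set(spdb_pairs)


# Toy data, invented for illustration only.
records = [
    ("drug_a", "target_1", "Ki", 250.0),    # significant, already in SPDB
    ("drug_a", "target_2", "IC50", 900.0),  # significant, not in SPDB
    ("drug_b", "target_1", "Kd", 5000.0),   # weaker than the 1 µM cutoff
    ("drug_b", "target_2", "EC50", 10.0),   # activity type not considered
]
spdb = {("drug_a", "target_1")}
print(blind_test_pairs(records, spdb))
```

Using a set of pairs keeps the result independent of how many separate measurements ChEMBL reports for the same drug and target.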

2.2. Patient Package Insert Similarity

We employed the well-established vector space information retrieval approach10,11 to model

patient package inserts (PPIs). Text documents are modeled as vectors in high dimen-

sional space where each dimension corresponds to a term with an associated weight. Co-

incidence of terms with high weight leads to high computed similarity between documents.

The process to transform PPIs into weighted term vectors requires four steps. First, rele-

vant sections are extracted, including: Indication, Contraindications, Precautions, Adverse

Reactions, Drug Interactions, and Clinical Pharmacology. Second, term lists (up to five words each) are generated, with punctuation and short words like prepositions and articles removed. Third, to eliminate artifactual terms and enhance relevance, terms are

identiﬁed that are part of two controlled vocabularies: Medical Subject Headings (MeSH,

http://www.ncbi.nlm.nih.gov/mesh) and the low-level Medical Dictionary for Regulatory

Activities (MedDRA, http://www.meddra.org). Last, term weights are assigned based on

information richness (e.g. “generalized seizures” > “seizures”). Word frequencies from the

Google Web 1T 5-gram Corpus (http://www.ldc.upenn.edu/Catalog/index.jsp, catalog num-

ber LDC2006T13) were used to compute term weights, with rare terms producing higher

scores than common ones. For example, “seizures” produced a log odds weighting of 4.74, but

the more speciﬁc term “generalized seizures” yielded 6.89. The ﬁnal output for each drug is a

vector composed of 6,591 term weights (the weight of the term if present and zero otherwise).

From the PPI for carbamazepine, the Indication Section includes: “patients with the following

seizure types: partial seizures with complex symptomatology (psychomotor, temporal lobe).”

The unﬁltered bigrams include both sensible ones such as “partial seizures” and useless ones

such as “patients with”, with the filtering process eliminating the latter. For carbamazepine,

the two most heavily weighted terms were “failure liver” (8.83) and “syncope and collapse”

(8.62). The term “partial seizures” scored 6.37, with many related terms (e.g. “grand mal”)

scoring similarly.
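The final step, turning a drug's extracted terms into a fixed-length weight vector, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `term_vector`, the three-term vocabulary, and the set of extracted terms are assumptions made for the example. Only the three weights are taken from the text above.

```python
def term_vector(present_terms, term_weights, vocabulary):
    """Weight of each vocabulary term if present in the insert, zero otherwise."""
    return [term_weights[t] if t in present_terms else 0.0 for t in vocabulary]


# Three weights quoted in the text; the real vocabulary has 6,591 terms.
weights = {
    "seizures": 4.74,
    "generalized seizures": 6.89,
    "partial seizures": 6.37,
}
vocabulary = sorted(weights)  # one fixed term order shared by all drugs

# Hypothetical set of terms extracted from one package insert.
insert_terms = {"seizures", "partial seizures"}
print(term_vector(insert_terms, weights, vocabulary))
```

Fixing the term order once for all drugs is what makes the vectors of two different inserts directly comparable.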

\[
\mathrm{PPISimilarity}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{1}
\]

Comparison of a pair of drug PPI vectors is quantiﬁed using the cosine similarity metric

(Eq. 1). The metric has a range of 0–1, but its units are both arbitrary and counterintuitive.

To employ such values in our data fusion framework, the raw similarity scores were normalized

to p-values by generating a distribution of PPI similarity scores for unrelated molecule pairs.

The unrelated pairs were identiﬁed based on having low 2D and low 3D similarity, quantiﬁed

as described below with pairwise p-value comparisons ≥0.5 (we have previously shown that

structurally unrelated drug pairs very infrequently share targets1). So, given a PPI similarity

score S between a drug pair, the p-value is simply the proportion of occurrences of S or

greater in the background set. For example, the raw PPI similarity between carbamazepine

and levetiracetam was 0.286 (see Figure 1), and this corresponded to a p-value of 0.044. The

most heavily weighted terms in the comparison included the following: pancytopenia (6.6),

cytochrome p450 (6.6), grand mal (6.5), antiepileptic drugs (6.5), and partial seizures (6.4).
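The two computations in this section, the cosine similarity of Eq. 1 and the conversion of a raw score into an empirical p-value, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the vectors and background scores are toy values invented for illustration, and returning 0 for an all-zero vector is a choice made here rather than something the text specifies.

```python
import math


def ppi_similarity(a, b):
    """Cosine similarity (Eq. 1) between two equal-length term-weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a vector with no terms matches nothing
    return dot / (norm_a * norm_b)


def empirical_p_value(score, background_scores):
    """Proportion of background scores that are greater than or equal to `score`."""
    if not background_scores:
        raise ValueError("background set must be non-empty")
    hits = sum(1 for s in background_scores if s >= score)
    return hits / len(background_scores)


# Toy term-weight vectors and background scores, invented for illustration.
drug_a = [6.5, 0.0, 6.4, 4.7]
drug_b = [6.5, 6.6, 0.0, 4.7]
background = [0.02, 0.05, 0.10, 0.12, 0.31, 0.08, 0.04, 0.55]

raw = ppi_similarity(drug_a, drug_b)
print(raw, empirical_p_value(raw, background))
```

Because term weights are non-negative, the cosine stays within the 0 to 1 range noted above. The p-value depends entirely on the background set, which in the text is built from pairs with low 2D and low 3D similarity.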

2.3. Target Prediction using Patient Package Insert Similarity

We have previously reported a framework for data fusion which allows for the integration

of similarity scores into a single value.1 Briefly, given a molecule A and a set of molecules with a shared biological effect, Bn, the similarity between molecule A and each molecule Bi is computed. The similarity scores are normalized to p-values as detailed above by assessing score magnitude against scores from a random background set. The multinomial distribution is then used to compute the likelihood, M, of observing the set of p-values and of the converse probabilities, M∗. The log-odds score L is then computed by taking the log of the ratio of M and M∗ and inverting the sign. A detailed discussion of the computation and corresponding 2D and 3D similarity example can be found in the original publication.1 An attractive feature of

our methodology is that it is able to integrate the results of diﬀerent similarity computations

into a single value. For example, the log-odds calculation for tolazamide interacting with

PPARγ-RXR yields single-modality values of 11.35 for PPI, 7.57 for 3D, and 5.49 for 2D.

Combining the similarity methods gives a stronger prediction than any single method alone, with a 3D+2D+PPI log-odds of 23.43.

2.4. Similarity and p-value Computation with Surﬂex-Sim

The Surﬂex-Sim 3D molecular similarity method and its use for virtual screening and oﬀ-target

prediction have been extensively described in multiple publications.2,8,12,13 Briefly, given two

molecules in speciﬁc poses, a value from 0 to 1 is computed that reﬂects the degree to which

their molecular surfaces are congruent with respect to both shape and polarity. The function

is based on the diﬀerences in distances from observer points surrounding the molecules to

the closest points on their surfaces, including both the closest hydrophobic surface points and

the closest polar surface points. So, two molecules that may have very diﬀerent underlying

chemical scaﬀolds may exhibit nearly identical surfaces to the observer points. These points

are analogous to a protein binding pocket, which also “observes” ligands from the outside.

Additional details regarding the theory and underlying algorithmic details can be found in the

previously published work. In order to produce a log-odds value for a molecule against a list

of molecules with a shared annotation, 3D similarity values must be computed against each

annotated molecule, and these values must then be transformed into probabilities. Given the

particulars of the conformational sampling density, 3D similarity optimization thoroughness,

and empirical conversion of raw scores to p-values, the overall process previously required many hours

for each comparison of one molecule to a typical set of annotated molecules.

In the current work, two improvements were made to support large-scale application of the

methods. First, a new mode of pose optimization was developed in which diverse conforma-

tions of molecules are pre-generated prior to molecular comparison. Using this new mode, the

optimal pose for one molecule onto a speciﬁc pose of another can be done quickly enough to

process roughly 2 million drug-like molecules per day on a single computing core (compared

with roughly 10,000 previously). Second, rather than using explicit computation of 1000 back-

ground similarity values for each molecule (as previously), we made use of the observation

that these distributions were essentially always normally distributed. Given a molecule pair,

only the particular mean and standard deviation for each need be estimated in order to de-

rive a p-value rather than making use of the full empirical computation. Estimation of the

distributional parameters was accomplished using simple linear regression models that made

use of “molecular imprints” for each molecule.8 A molecular imprint is a vector of similarity

values for a particular molecule against a ﬁxed basis set of molecules (one pose each). Such

vectors have precedent in predicting many molecular properties,14,15 and the conformational

pre-search procedure was augmented to produce standard molecular imprints. So, given two

pre-searched molecules, their mutual maximal 3D similarity can be rapidly calculated, and

the p-value conversion is immediately derived from the estimated distributional parameters for each molecule. Taken together, the two improvements allow for typical 3D log-odds computations to be made in a few minutes for a given molecule against a target characterized by

twenty known ligands. To test the accuracy of the faster method, we recomputed the p-values

and log-odds values from our previous work. An all-by-all similarity of the 358 drugs from

the original study yielded a Pearson’s correlation of 0.947 and Kendall’s tau of 0.814, both

highly statistically signiﬁcant. The full log-odds computation of 358 drugs against 44 targets

yielded a Pearson’s correlation of 0.955 and Kendall’s tau of 0.761 (again highly statistically

signiﬁcant).
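The parametric shortcut described above replaces an empirical background with a normal distribution characterized by a mean and a standard deviation. A minimal sketch of that conversion is shown here, assuming that higher similarity is better, so the p-value is the upper tail of the normal distribution. The function name and the toy numbers are hypothetical. The regression that estimates the parameters from molecular imprints, and the way the two per-molecule distributions of a pair are reconciled, are not described in this section and are not modeled.

```python
import math


def normal_upper_tail_p(score, mean, std):
    """P(X >= score) for X ~ Normal(mean, std), via the complementary error function."""
    if std <= 0.0:
        raise ValueError("standard deviation must be positive")
    z = (score - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Toy values: a raw similarity of 0.75 against a background estimated as
# mean 0.50 and standard deviation 0.10 (numbers invented for illustration).
print(normal_upper_tail_p(0.75, 0.50, 0.10))
```

Only two parameters per molecule need to be stored, which is what removes the need to compute 1000 background similarities for each molecule.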

For 2D molecular similarity computations, which make purely topological comparisons

between molecules, we employed the previously described GSIM-2D method.1,2 This method

is suﬃciently eﬃcient that empirical conversion of raw scores into p-values is possible, as

we have previously described.1 For this method to yield high similarity, two molecules must

be roughly the same size and contain similar subgraph compositions, especially for those

subgraphs rooted at heteroatoms.

3. Results and Discussion

3.1. Relationship between Structural Novelty and Clinical Eﬀects

Previously, we quantiﬁed the eﬀect of me-too drugs by showing that drug pairs with high

2D and high 3D similarity were four times more likely to have identical primary and secondary targets than drug pairs where one was structurally novel.1 Here, this analysis has

been extended to clinical eﬀects by making use of the lexical similarity of package inserts.

Both to establish the relevance of the PPI similarity metric and to quantify the degree to

which structural novelty is related to changes in clinical eﬀects, we computed the pairwise 2D,

3D, and PPI similarity of all 602 drugs. The drug pairs were separated into four categories

based on chemical structural similarity: high 2D and 3D similarity, low 2D but high 3D, high

2D but low 3D, and low 2D and 3D. High similarity included pairs with p-values ≤0.01 and

low similarity were those with p-values ≥0.5.
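The four-way split can be written directly from the two thresholds. The sketch below is illustrative only: the function names are hypothetical, and returning `None` for pairs whose p-values fall between the thresholds reflects the fact that the text defines only the high and low categories.

```python
HIGH_SIMILARITY_P = 0.01  # p-value at or below this counts as high similarity
LOW_SIMILARITY_P = 0.5    # p-value at or above this counts as low similarity


def similarity_level(p_value):
    """Classify one p-value as 'high', 'low', or None (intermediate)."""
    if p_value <= HIGH_SIMILARITY_P:
        return "high"
    if p_value >= LOW_SIMILARITY_P:
        return "low"
    return None


def structural_category(p_2d, p_3d):
    """Name one of the four structural categories, or None if either value is intermediate."""
    level_2d = similarity_level(p_2d)
    level_3d = similarity_level(p_3d)
    if level_2d is None or level_3d is None:
        return None
    return f"{level_2d} 2D/{level_3d} 3D"


# Toy p-values, invented for illustration.
for p_2d, p_3d in [(0.005, 0.001), (0.62, 0.004), (0.008, 0.71), (0.9, 0.8), (0.2, 0.001)]:
    print(p_2d, p_3d, structural_category(p_2d, p_3d))
```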

Figure 2A shows the histogram of the PPI p-value distributions for each of the four struc-

tural categories. It is clear that the “me-too” drug distribution (red line, drug pairs with high

2D and high 3D similarity) is diﬀerent than the others. Toward the left side of the plot, where

clinical eﬀects similarity was high (PPI p-values ≤0.05), a large fraction of the me-too drug

pairs had highly similar clinical eﬀects. Structurally novel drug pairs (high 3D but low 2D

similarity, green line) exhibited a signiﬁcantly smaller fraction with highly concordant clin-

ical eﬀects but still showed some relationship between structural similarity and therapeutic

proﬁle. The high 2D and low 3D pairs had little signal (blue), and only a very small portion

of structurally dissimilar drug pairs (low 2D and low 3D, magenta) shared clinically similar

eﬀects. Clearly, drug pairs with very high structural similarity (both by 2D and 3D meth-

ods) were much more likely to have closely shared clinical eﬀects than molecule pairs of any

other category, even those sharing high 3D similarity but low 2D similarity. The converse analysis paralleled these observations. Figure 2B shows the corresponding histograms of 3D and 2D p-value distributions where molecule pair segregation was made based on clinical effect similarity. The 2D and 3D similarity p-value distributions for drug pairs with high PPI similarity (red and green lines) showed stronger enrichment for low p-values associated with high 3D structural similarity. As expected, drug pairs that had low PPI similarity (blue and brown lines) also had low 3D and 2D structural similarity.

[Figure 2: two histogram panels, A and B; y-axis: Frequency. Panel A, x-axis PPI p-value (0 to 1), with drug pairs binned by structural similarity: high 2D/high 3D (2D & 3D p-val ≤ 0.01; 1065 pairs), low 2D/high 3D (2D p-val ≥ 0.50 & 3D p-val ≤ 0.01; 728 pairs), high 2D/low 3D (2D p-val ≤ 0.01 & 3D p-val ≥ 0.50; 428 pairs), and low 2D/low 3D (2D & 3D p-val ≥ 0.50; 79654 pairs). Panel B, x-axis 3D or 2D p-value (0 to 1), with curves for 2D high PPI, 2D low PPI, 3D high PPI, and 3D low PPI (high-PPI pairs: 3968; low-PPI pairs: 88539).]

Fig. 2. Relationship between structural similarity and clinical effects similarity.

3.2. Internal SPDB Validation: Oﬀ Target Eﬀects

An attractive aspect of the log-odds framework is that it allows us to combine diﬀerent types

of similarity computations into a single value. For each of the 602 drugs in our dataset, we

computed the 2D, 3D, PPI, and combination log-odds scores of interacting with each of the 91

targets that had at least 5 drugs as ligands in the SPDB. In each case, any self/self comparisons

were omitted from the calculations, making this exercise a leave-one-out cross-validation of

the log-odds predictive methodology. The three methods were used independently and in

combination to predict the log-odds of known primary and secondary target interactions.

As we observed in our previous study, primary target predictions were dominated by the

presence of me-too drugs, limiting the diﬀerences between any methods (data not shown).

However, for prediction of secondary targets, i.e. those that mediate side-eﬀects, signiﬁcant

differences appeared. Table 1 summarizes the true-positive rates observed for different log-odds computations for secondary target prediction at different score thresholds.

Table 1. SPDB Secondary Target Performance (true-positive rate, %, at each log-odds threshold)

Log-Odds   3D   2D   PPI   3D+2D   3D+PPI   2D+PPI   3D+2D+PPI
0          97   90   96    95      98       97       97
10         43    7   14    55      61       33       64
20         16    0    0    23      26        1       38


For all single methods and combinations of methods, the information present in the anno-

tated drugs yielded positive information, evidenced by high true-positive rates at a log-odds

threshold of 0. However, substantial diﬀerences among the methods appeared as higher log-

odds thresholds were considered. At a threshold of 10, the 3D similarity approach showed a

much higher retrieval rate than either of the other two single-mode methods. All combinations

of methods showed synergy, with the most eﬀective retrieval occurring with a combination

of all three similarity methods to produce a single log-odds score. Roughly 60% of the true

secondary target annotations could be recovered using the log-odds score from 3D+2D+PPI

similarity computations. Note, however, that true positive rates without the context of false

positive rates can be very misleading. The issue of estimating false positive rates is not straight-

forward though. In our SPDB, a missing annotation between a drug and a target does not

mean that the interaction does not occur. Authentic interactions within our 602 drug/91 target

set may have been published after our curation or have yet to be biochemically characterized.

Nonetheless, we expect that the large majority of unannotated interactions, in fact, represent

true negative data. So, as a surrogate for a measurement of false positive rates for our sim-

ilarity methods, we determined the number of drug/target predictions for interactions that

were unannotated. At log-odds thresholds of 5, 10, and 20, predictions for non-existent SPDB

annotations for both 3D similarity alone and 3D+2D+PPI were 3%, 1%, and 0.2%. These are

upper limits of false positive predictions. As will be described below, the false positive rate

was actually lower since many of the new predictions were validated as true by incorporating

annotations from the ChEMBL database.
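The two quantities discussed in this section, the true-positive rate among annotated drug/target pairs and the surrogate false-positive rate among unannotated pairs, are both fractions of scores that reach a threshold. A minimal sketch follows, assuming that a prediction counts when its log-odds is greater than or equal to the threshold (the text does not say whether the comparison is strict). The function names and all scores are hypothetical.

```python
def fraction_at_or_above(log_odds_scores, threshold):
    """Fraction of scores that reach the given log-odds threshold."""
    scores = list(log_odds_scores)
    if not scores:
        raise ValueError("at least one score is required")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def threshold_summary(annotated_scores, unannotated_scores, thresholds):
    """Map each threshold to (true-positive rate, surrogate false-positive rate)."""
    return {
        t: (
            fraction_at_or_above(annotated_scores, t),
            fraction_at_or_above(unannotated_scores, t),
        )
        for t in thresholds
    }


# Toy leave-one-out log-odds scores, invented for illustration.
annotated = [0.5, 12.0, 25.0, 3.0]      # pairs with a known annotation
unannotated = [-4.0, 1.0, 11.0, -2.0]   # pairs with no annotation
print(threshold_summary(annotated, unannotated, [0, 10, 20]))
```

As noted above, the second fraction is only an upper limit on the false-positive rate, since some unannotated pairs may be genuine interactions.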

3.3. Prediction of New Drug-Target Pairs within ChEMBL

As discussed above, a missing annotation within the SPDB between a drug and a target does

not necessarily mean that the interaction does not occur. For example, the drugs orphenadrine

and mesoridazine showed high 3D log-odds against the muscarinic receptor but the interactions

had been unannotated in the SPDB. Careful inspection of the literature revealed that the

drugs were known to antagonize muscarinic receptors.1 Therefore, drug target annotations

that are known but missing from our SPDB can serve as a blind set to test our methodology.

To supplement annotations within the SPDB with a blind set for methodological testing,

we searched ChEMBL and found 380 biochemically characterized drug/target interactions

not present in the SPDB. We then investigated how well the methodology could identify the

new ChEMBL annotations based only upon information within the SPDB as the basis to

compute the log-odds.

Table 2. ChEMBL Prediction Performance (annotations correctly predicted, %, at each log-odds threshold)

Log-Odds   3D   2D   PPI   3D+2D   3D+PPI   2D+PPI   3D+2D+PPI
5          43   14   13    42      41       19       41
10         16    3    3    20      18        8       22
20          2    0    0     3       1        0        4

Table 2 shows the proportions correctly predicted at various log-odds using different methods and combinations. In general, the trends observed for the SPDB leave-one-out experiments were borne out. Among individual methods, 3D similarity strongly outperformed 2D- or PPI-based similarity, with the latter two having similar performance. However, the combination of

the three methods, overall, yielded better performance than 3D alone. At log-odds thresholds

of 10 and 20, using the full combination of methods, the percentage of recovered annotations

within the ChEMBL test set was 22% and 4%, respectively. This compared with 16% and 2%

using 3D similarity alone, and 3% and 0% using either 2D or PPI similarity alone. The enrich-

ment ratios for the combination approach, using the upper-bound false positive rates discussed

above, corresponded to 22-fold and 40-fold, respectively, at log-odds thresholds of 10 and 20.

Figure 3 shows a typical example of a drug/target interaction not annotated in the SPDB where the combination similarity approach confidently identified a pharmacologically relevant target. Sibutramine is an anorectic annotated in the SPDB as a ligand of the serotonin and norepinephrine reuptake transporters. However, it has been shown that sibutramine also interacts with the dopamine reuptake transporter and that this interaction contributes to the therapeutic benefit (indicated in the Meridia package insert, http://www.rxabbott.com/pdf/meridia.pdf). Computing the similarity between sibutramine and 11 other dopamine reuptake transporter inhibitors (two are shown in Figure 3), the log-odds were 2.3, 4.2, and 6.9 using 2D, 3D, and PPI, respectively. These predictions were strengthened by combining all three methods, with a corresponding log-odds of 9.4. The pairwise PPI similarities between sibutramine and both bupropion and nefazodone are highly significant, as are the individual 3D similarities. Clinical effects can be sufficient to infer off-targets, but combining similarity methods generally adds confidence to predictions.

Fig. 3. ChEMBL example showing that combination similarity effectively predicts a drug target interaction not covered within the SPDB. Shown are the 2D structures, 3D overlays, and common clinical terms between sibutramine and two dopamine reuptake transporter inhibitors, bupropion and nefazodone. Sibutramine's primary targets are the 5-HT and norepinephrine reuptake transporters. For the predicted target (dopamine reuptake transporter), the log-odds were 2.3 (2D), 4.2 (3D), 6.9 (PPI), 9.2 (3D+PPI), and 9.4 (2D+3D+PPI). Package insert term counts: sibutramine 464, bupropion 413, nefazodone 547.

Fig. 4. A near-neighbor analysis for each of 602 drugs (SPDB in green, ChEMBL in red) based on target annotations from the SPDB. Panel title: Structural Novelty of ChEMBL and SPDB; x-axis: 2D p-value (0 to 1); y-axis: frequency. Highlighted predictions: fluoxetine as an M3 ligand (log-odds 2D 1.2, 3D 7.7, PPI 3.7; highest 2D similarity to methadone, 2D p-value 0.041, 3D p-value 0.0074) and apomorphine as a D3 ligand (log-odds 2D -1.1, 3D 7.7, PPI 1.2; highest 2D similarity to ropinirole, 2D p-value 0.210, 3D p-value 0.00035).

Note, however, that the numerical performance on the ChEMBL set was lower than for the SPDB set in terms of pure true positive recovery rates (see Tables 1 and 2). This stemmed from the greater structural divergence of the ChEMBL molecules from the SPDB molecules annotated with the same target. To quantify structural novelty, we performed a nearest-neighbor analysis. For each drug within ChEMBL, the most similar 2D representative from the SPDB was identified (based on p-value) from within the collection of drugs having the same target annotation. An analogous leave-one-out computation was performed for each drug target annotation within the SPDB. Figure 4 shows a histogram of the distributions of p-values for the ChEMBL (red line) and SPDB (green line) sets. Within the SPDB set, there were substantially more cases with extremely low p-values than for the ChEMBL set; the nearest structural neighbors of the ChEMBL test molecules were generally more divergent. Two examples are highlighted from the ChEMBL set where the nearest neighbor had a poor 2D p-value relative to the much more significant 3D p-value, which provided support for high log-odds scores.
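The nearest-neighbor analysis can be sketched as a minimum over pairwise p-values within a target's annotated ligand set. This is an illustration under stated assumptions rather than the authors' code: the `nearest_neighbor_pvalue` helper, the dictionary layout, and the `hypothetical_ligand` entry with its p-value of 0.30 are invented for the example, while the fluoxetine/methadone 2D p-value of 0.041 is taken from the text.

```python
from typing import Callable, Dict, Optional, Set


def nearest_neighbor_pvalue(query: str,
                            target: str,
                            annotations: Dict[str, Set[str]],
                            pvalue: Callable[[str, str], float]) -> Optional[float]:
    """Smallest pairwise p-value between the query and reference drugs annotated
    with the target. The query itself is excluded, which gives the leave-one-out
    behavior when the query is already part of the reference set."""
    candidates = [drug for drug, targets in annotations.items()
                  if target in targets and drug != query]
    if not candidates:
        return None
    return min(pvalue(query, drug) for drug in candidates)


# Toy reference annotations (target labels abbreviated for the example).
annotations = {
    "methadone": {"M3"},
    "hypothetical_ligand": {"M3"},  # invented entry, for illustration only
    "ropinirole": {"D3"},
}

# Toy pairwise 2D p-values; only the methadone value comes from the text.
toy_pvalues = {
    frozenset({"fluoxetine", "methadone"}): 0.041,
    frozenset({"fluoxetine", "hypothetical_ligand"}): 0.30,
}


def toy_pvalue(a: str, b: str) -> float:
    return toy_pvalues[frozenset({a, b})]


best = nearest_neighbor_pvalue("fluoxetine", "M3", annotations, toy_pvalue)
print(best)
```

Collecting `best` over every test annotation yields the distribution that Figure 4 plots as a histogram.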

Fluoxetine (blue box) is a selective serotonin reuptake inhibitor which mediates its therapeutic benefit through inhibition of the 5-HT reuptake transporter. The ChEMBL data indicated that fluoxetine also interacts with the muscarinic M3 receptor. The nearest-neighbor molecule sharing this annotation was methadone (2D p-value = 0.041). Considering all of the muscarinic M3 receptor ligands (38 total), the 2D, 3D, and PPI log-odds were 1.2, 7.7, and 3.7, respectively. Combining all of the methods gave a score of 8.2.

Apomorphine (red box) is indicated for the treatment of Parkinson's disease, and its therapeutic benefit is thought to be primarily due to activation of dopamine D2 receptors. However, apomorphine was indicated within ChEMBL to also interact with the dopamine D3 receptor (which is also known to play a role in the beneficial effects of other anti-Parkinsonian drugs). The nearest-neighbor drug among the D3 ligands was ropinirole (2D p-value = 0.210), which is structurally distinct in a topological sense (Figure 4). As in the previous case, when considering all 11 dopamine D3 ligands, the 3D comparisons provide primary support for a positive log-odds score. The 2D, 3D, and PPI log-odds were -1.1, 7.7, and 1.2, respectively. The combination of all three comparison types yielded a score of 3.3. Here, the 3D molecular similarity information was the most reliable predictor.
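A quick arithmetic check on the three worked examples (sibutramine, fluoxetine, apomorphine) shows that the reported combined score is in each case smaller than the plain sum of the single-modality log-odds. The sketch below only makes that comparison explicit; it does not reproduce the fusion rule, which is defined earlier in the paper, and the `naive_sum` baseline is a hypothetical reference point that would apply only if the modalities were treated as independent with no further adjustment.

```python
from typing import Dict, Tuple

# (2D, 3D, PPI) log-odds and the reported three-way combination, from the text.
reported: Dict[str, Tuple[Tuple[float, float, float], float]] = {
    "sibutramine / dopamine reuptake transporter": ((2.3, 4.2, 6.9), 9.4),
    "fluoxetine / muscarinic M3 receptor": ((1.2, 7.7, 3.7), 8.2),
    "apomorphine / dopamine D3 receptor": ((-1.1, 7.7, 1.2), 3.3),
}


def naive_sum(singles: Tuple[float, float, float]) -> float:
    """Hypothetical baseline: add the single-modality log-odds directly."""
    return round(sum(singles), 1)


gaps = {}
for label, (singles, combined) in reported.items():
    baseline = naive_sum(singles)
    gaps[label] = round(baseline - combined, 1)
    print(f"{label}: naive sum {baseline:.1f}, reported {combined:.1f}, "
          f"difference {gaps[label]:.1f}")
```

In the apomorphine case the reported combination (3.3) is also well below the 3D score alone (7.7), which is consistent with the remark that 3D similarity was the most reliable predictor there.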

4. Conclusion

In the present study, we report a means to combine chemical similarity between molecules with information derived from computing similarity based upon lexical analysis of patient package inserts (PPI). As expected based on our prior work, drugs that were highly structurally similar (both by 2D and 3D comparison) were much more likely to have significant overlap of their clinical effects compared to drugs that were structurally different (low 2D similarity but high 3D similarity). Our prior work illustrated a similar effect with respect to specifically annotated molecular targets: me-too drugs tend to have nearly identical target profiles.1 The correlation between lexical and chemical similarity also served to validate the lexical comparison methodology.

We extended a probabilistic data fusion method to include observations from both molecular and clinical effects similarity and reported performance on predicting protein targets of small molecules. This was done both by leave-one-out cross-validation on our internal database of drug-target interactions (the SPDB) and by a blind test on new interactions present in ChEMBL. For off-target prediction within the SPDB, 3D similarity was the most effective single information source. However, combining the methods predicted a larger proportion of secondary targets than any of the individual methods, while maintaining a similar nominal false positive rate. On the test against previously unseen ChEMBL drug-target linkages, 3D similarity was again the single most effective predictor, but gains were derived from combining the different data sources. We note that the method supports the integration of any method that produces scores relating molecules to targets (e.g. docking), and that inclusion of additional information sources is likely to produce further benefits. It is also important to understand that this framework is similar in character to virtual screening methods, in that while enrichment for compounds with the predicted effects occurs, the actual potencies of the effects are not predicted. This point is discussed at length in a prior study.16
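One way to see why any scoring method can be integrated is that raw scores from different sources can be placed on a common scale as p-values against a background distribution. The sketch below uses an empirical p-value with add-one smoothing; this is a generic technique chosen for illustration, not the transformation used in this work, which is not described in this section. The `empirical_pvalue` helper and the background scores are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_pvalue(score: float, background: Sequence[float]) -> float:
    """Fraction of background scores at least as large as the observed score.

    Add-one smoothing keeps the estimate above zero, so a later log
    transform stays finite. Higher scores are assumed to mean stronger
    evidence (an assumption; some docking scores run the other way).
    """
    ordered = sorted(background)
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return (n_at_least + 1) / (len(ordered) + 1)


# Hypothetical background: scores of a new method on unrelated molecule pairs.
background_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

print(empirical_pvalue(8.5, background_scores))
print(empirical_pvalue(0.0, background_scores))
```

A score source handled this way yields per-pair p-values of the same kind as those reported for the 2D, 3D, and PPI comparisons, which is the form of input the fusion framework operates on.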

In contemplating the problem of off-target prediction for drugs, the problem of molecular design ancestry can confuse the issue of methodological validation. For example, ligands of aminergic GPCRs offer a troublesome test case, owing to the established promiscuity of such drugs among numerous targets.17 Returning to Figure 1 (bottom), we see the example of levetiracetam, an anticonvulsant believed to have a unique mechanism of action when compared with most existing anticonvulsants. The established CNS targets of the major classes of anticonvulsant drugs include the GABA-A receptor (for barbiturates such as pentobarbital) and neuronal voltage-gated sodium channels (for drugs such as carbamazepine and phenytoin). These drugs have recently been shown to modulate voltage-gated potassium channels as part of their anti-epileptic effects.18–21 Levetiracetam, having a novel scaffold, has been proposed to work through an entirely new mechanism of action due to high binding affinity to the synaptic vesicle protein SV2A (which is not a known therapeutic target of any drug).6,22,23 Our methods strongly predict that levetiracetam is a voltage-gated sodium channel modulator, with a 3D log-odds alone of 14.5 (the combination log-odds was 21.4). Levetiracetam has been shown to inhibit voltage-gated potassium currents,22 leading to the suggestion that this drug, like other anti-epileptics, acts at least in part through potassium channels. Considering that many antiepileptics modulate both sodium and potassium channels,23 our prediction supports the notion that levetiracetam shares a similar mechanism of action, perhaps in addition to the interaction with SV2A.

Identification of off-target activities of drugs is a difficult problem, particularly in cases where the drug in question has a non-obvious structural relationship with the known ligands of a given target. Our hope is that methods that make use of multiple information sources will help to identify clinically important and unexpected effects.

References

1. E. Yera, A. Cleves and A. Jain, Journal of Medicinal Chemistry 54, 6771 (2011).
2. A. E. Cleves and A. N. Jain, Journal of Computer-Aided Molecular Design 22, 147 (2008).
3. J. Wright and R. Willette, Journal of Medicinal Chemistry 5, 815 (1962).
4. K. Nagashima, A. Takahashi, H. Ikeda, A. Hamasaki, N. Kuwamura, Y. Yamada and Y. Seino, Diabetes Research and Clinical Practice 66, S75 (2004).
5. D. S. Ragsdale and M. Avoli, Brain Research Reviews 26, 16 (1998).
6. B. Lynch, N. Lambeng, K. Nocka, P. Kensel-Hammes, S. Bajjalieh, A. Matagne and B. Fuks, PNAS 101, 9861 (2004).
7. M. Campillos, M. Kuhn, A. Gavin, L. Jensen and P. Bork, Science 321, 263 (2008).
8. A. E. Cleves and A. N. Jain, Journal of Medicinal Chemistry 49, 2921 (2006).
9. A. Gaulton, L. Bellis, A. Bento, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, D. Michalovich and B. Al-Lazikani, Nucleic Acids Research 40, D1100 (2012).
10. G. Salton and M. J. McGill, Introduction to Modern Information Retrieval (McGraw-Hill, Inc., New York, NY, USA, 1986).
11. J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Series in Data Management Systems, 2nd edn. (Morgan Kaufmann, 2006).
12. A. N. Jain, Journal of Computer-Aided Molecular Design 14, 199 (2000).
13. A. N. Jain, Journal of Medicinal Chemistry 47, 947 (2004).
14. A. Ghuloum, R. Carleton and A. Jain, Journal of Medicinal Chemistry 42, 1739 (1999).
15. J. Mount, J. Ruppert, W. Welch and A. Jain, Journal of Medicinal Chemistry 42, 60 (1999).
16. A. Jain and A. Cleves, Journal of Computer-Aided Molecular Design 26, 57 (2012).
17. M. J. Keiser, B. L. Roth, B. N. Armbruster, P. Ernsberger, J. J. Irwin and B. K. Shoichet, Nature Biotechnology 25, 197 (2007).
18. C. Zona, V. Tancredi, E. Palma, G. Pirrone and M. Avoli, Canadian Journal of Physiology and Pharmacology 68, 545 (1990).
19. F. Bloom, D. Kupfer and B. Bunney, Psychopharmacology: The Fourth Generation of Progress (Raven Press, 1995).
20. M. Nobile and P. Vercellino, British Journal of Pharmacology 120, 647 (1997).
21. A. Ambrósio, P. Soares-da-Silva, C. Carvalho and A. Carvalho, Neurochemical Research 27, 121 (2002).
22. M. Madeja, D. Georg Margineanu, A. Gorji, E. Siep, P. Boerrigter, H. Klitgaard and E. Speckmann, Neuropharmacology 45, 661 (2003).
23. R. Surges, K. Volynski and M. Walker, Therapeutic Advances in Neurological Disorders 1, 13 (2008).


Chemical and protein structural basis for biological crosstalk between PPARα and COX enzymes

Article

Full-text available

Nov 2014
J COMPUT AID MOL DES

We have previously validated a probabilistic framework that combined computational approaches for predicting the biological activities of small molecule drugs. Molecule comparison methods included molecular structural similarity metrics and similarity computed from lexical analysis of text in drug package inserts. Here we present an analysis of novel drug/target predictions, focusing on those that were not obvious based on known pharmacological crosstalk. Considering those cases where the predicted target was an enzyme with known 3D structure allowed incorporation of information from molecular docking and protein binding pocket similarity in addition to ligand-based comparisons. Taken together, the combination of orthogonal information sources led to investigation of a surprising predicted relationship between a transcription factor and an enzyme, specifically, PPARα and the cyclooxygenase enzymes. These predictions were confirmed by direct biochemical experiments which validate the approach and show for the first time that PPARα agonists are cyclooxygenase inhibitors.

Knowledge-guided docking: Accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock

Article

Full-text available

May 2015
J COMPUT AID MOL DES

Prediction of the bound configuration of small-molecule ligands that differ substantially from the cognate ligand of a protein co-crystal structure is much more challenging than re-docking the cognate ligand. Success rates for cross-docking in the range of 20-30 % are common. We present an approach that uses structural information known prior to a particular cutoff-date to make predictions on ligands whose bounds structures were determined later. The knowledge-guided docking protocol was tested on a set of ten protein targets using a total of 949 ligands. The benchmark data set, called PINC ("PINC Is Not Cognate"), is publicly available. Protein pocket similarity was used to choose representative structures for ensemble-docking. The docking protocol made use of known ligand poses prior to the cutoff-date, both to help guide the configurational search and to adjust the rank of predicted poses. Overall, the top-scoring pose family was correct over 60 % of the time, with the top-two pose families approaching a 75 % success rate. Correct poses among all those predicted were identified nearly 90 % of the time. The largest improvements came from the use of molecular similarity to improve ligand pose rankings and the strategy for identifying representative protein structures. With the exception of a single outlier target, the knowledge-guided docking protocol produced results matching the quality of cognate-ligand re-docking, but it did so on a very challenging temporally-segregated cross-docking benchmark.

Rapid Identification of Potential Drug Candidates from Multi-Million Compounds’ Repositories. Combination of 2D Similarity Search with 3D Ligand/Structure Based Methods and In Vitro Screening

Article

Full-text available

Sep 2021
MOLECULES

Rapid in silico selection of target focused libraries from commercial repositories is an attractive and cost-effective approach in early drug discovery. If structures of active compounds are available, rapid 2D similarity search can be performed on multimillion compounds’ databases. This approach can be combined with physico-chemical parameter and diversity filtering, bioisosteric replacements, and fragment-based approaches for performing a first round biological screening. Our objectives were to investigate the combination of 2D similarity search with various 3D ligand and structure-based methods for hit expansion and validation, in order to increase the hit rate and novelty. In the present account, six case studies are described and the efficiency of mixing is evaluated. While sequentially combined 2D/3D similarity approach increases the hit rate significantly, sequential combination of 2D similarity with pharmacophore model or 3D docking enriched the resulting focused library with novel chemotypes. Parallel integrated approaches allowed the comparison of the various 2D and 3D methods and revealed that 2D similarity-based and 3D ligand and structure-based techniques are often complementary, and their combinations represent a powerful synergy. Finally, the lessons we learnt including the advantages and pitfalls of the described approaches are discussed.

Computational Approaches Towards Kinases as Attractive Targets for Anticancer Drug Discovery and Development

Article

Jul 2019

Background: One of the major goals of computational chemists is to determine and develop the pathways for anticancer drug discovery and development. In recent past, high performance computing systems elicited the desired results with little or no side effects. The aim of the current review is to evaluate the role of computational chemistry in ascertaining kinases as attractive targets for anticancer drug discovery and development. Methods: Research related to computational studies in the field of anticancer drug development is reviewed. Extensive literature on achievements of theorists in this regard has been compiled and presented with special emphasis on kinases being the attractive anticancer drug targets. Results: Different approaches to facilitate anticancer drug discovery include determination of actual targets, multi-targeted drug discovery, ligand-protein inverse docking, virtual screening of drug like compounds, formation of di-nuclear analogs of drugs, drug specific nano-carrier design, kinetic and trapping studies in drug design, multi-target QSAR (Quantitative Structure Activity Relationship) model, targeted co-delivery of anticancer drug and siRNA, formation of stable inclusion complex, determination of mechanism of drug resistance, and designing drug like libraries for the prediction of drug-like compounds. Protein kinases have gained enough popularity as attractive targets for anticancer drugs. These kinases are responsible for uncontrolled and deregulated differentiation, proliferation, and cell signaling of the malignant cells which result in cancer. Conclusion: Interest in developing drugs through computational methods is a growing trend, which saves equally the cost and time. Kinases are the most popular targets among the other for anticancer drugs which demand attention. 
3D-QSAR modelling, molecular docking, and other computational approaches have not only identified the target-inhibitor binding interactions for better anticancer drug discovery but are also designing and predicting new inhibitors, which serve as lead for the synthetic preparation of drugs. In light of computational studies made so far in this field, the current review highlights the importance of kinases as attractive targets for anticancer drug discovery and development.

QSAR/QSPR Modeling in the Design of Drug Candidates with Balanced Pharmacodynamic and Pharmacokinetic Properties

Chapter

May 2017

Drug discovery and development is a slow complicated multi-objective and expensive enterprise. Drug candidates are a compromise output of competing pharmacodynamics and pharmacokinetic processes. To facilitate this task and avoid failures in clinical phases, computational techniques and in silico modeling using the endpoints offered by high technology, are extremely valuable. In this chapter, some historical aspects and a background overview for constructing Quantitative Structure-Activity Relationships (QSAR) and Quantitative Structure-Property Relationships (QSPR) are provided. The different goals for the establishment of QSAR/QSPR models are defined. Representative examples and success stories of in silico modeling along the different drug discovery processes are presented. Examples include models for optimizing efficient binding to receptor, using both ligand- and structure-based approaches, for in vitro permeability predictions, predictions for human intestinal absorption and blood brain barrier penetration, as well as for plasma protein binding and drug metabolism. The value of global and local models as well as their interpretability and the criteria for their evaluation and proper use are discussed throughout this chapter.

ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs

Article

Full-text available

May 2017
J COMPUT AID MOL DES

We introduce the ForceGen method for 3D structure generation and conformer elaboration of drug-like small molecules. ForceGen is novel, avoiding use of distance geometry, molecular templates, or simulation-oriented stochastic sampling. The method is primarily driven by the molecular force field, implemented using an extension of MMFF94s and a partial charge estimator based on electronegativity-equalization. The force field is coupled to algorithms for direct sampling of realistic physical movements made by small molecules. Results are presented on a standard benchmark from the Cambridge Crystallographic Database of 480 drug-like small molecules, including full structure generation from SMILES strings. Reproduction of protein-bound crystallographic ligand poses is demonstrated on four carefully curated data sets: the ConfGen Set (667 ligands), the PINC cross-docking benchmark (1062 ligands), a large set of macrocyclic ligands (182 total with typical ring sizes of 12–23 atoms), and a commonly used benchmark for evaluating macrocycle conformer generation (30 ligands total). Results compare favorably to alternative methods, and performance on macrocyclic compounds approaches that observed on non-macrocycles while yielding a roughly 100-fold speed improvement over alternative MD-based methods with comparable performance.

Extrapolative prediction using physically-based QSAR

Article

Full-text available

Feb 2016
J COMPUT AID MOL DES

Surflex-QMOD integrates chemical structure and activity data to produce physically-realistic models for binding affinity prediction . Here, we apply QMOD to a 3D-QSAR benchmark dataset and show broad applicability to a diverse set of targets. Testing new ligands within the QMOD model employs automated flexible molecular alignment, with the model itself defining the optimal pose for each ligand. QMOD performance was compared to that of four approaches that depended on manual alignments (CoMFA, two variations of CoMSIA, and CMF). QMOD showed comparable performance to the other methods on a challenging, but structurally limited, test set. The QMOD models were also applied to test a large and structurally diverse dataset of ligands from ChEMBL, nearly all of which were synthesized years after those used for model construction. Extrapolation across diverse chemical structures was possible because the method addresses the ligand pose problem and provides structural and geometric means to quantitatively identify ligands within a model’s applicability domain. Predictions for such ligands for the four tested targets were highly statistically significant based on rank correlation. Those molecules predicted to be highly active (\(\hbox {pK}_i \ge 7.5\)) had a mean experimental \(\hbox {pK}_i\) of 7.5, with potent and structurally novel ligands being identified by QMOD for each target.

A holistic approach for integration of biological systems and usage in drug discovery

Article

Full-text available

Jan 2016

A system can be defined as an organized, interconnected structure consisting of interrelated and interdependent elements (e.g., components, factors, members, parts). These parts and processes are connected by structural and/or behavioral relationships and continually influence one another directly or indirectly to maintain a balance essential for the existence of the system, and for achieving its goal. With increasing inflow of biological data, serious efforts to empathize biological systems as true systems are nowadays almost practicable. Handling high-throughput data places stress mainly on in silico approach comprising database handling, modeling, simulation and analysis, resulting in dramatic progress in system-level analysis. The databases and methods in bioinformatics are now moving in the direction of implementation of integrative dataset systems to represent genes, proteins and metabolic pathways in combination with simulated environment which is dynamic. For understanding the complex biological disorders and normal pathways of system it is significant to integrate the reductionist data which comes from transcriptomics, genomics, proteomics, lipidomics, glycomics, fluxomics and metabolomics. Numerous bioinformatics approaches are being exploited to integrate the molecular information from the biological databases and assist in simulation of metabolic networks. High-throughput experimental data set systems are, however, established on the static representation of the molecular data and existing knowledge. Various biological tools have been developed for understanding the mechanism of several diseases for drug discovery process. Study of dynamic nature of genetic, biochemical and signal transduction pathways can be done by simulating reactions with the help of integrative tools. Rising usage of rational drug designing approach is significant for identification of target in disease polluted network and evaluating ligand interaction for enhanced efficacy. 
How in-depth investigation of the whole system (a holistic approach) leads to emergence of systems biology is the crux of this review.

Does Your Model Weigh the Same as a Duck?

Article

Full-text available

Dec 2011
J COMPUT AID MOL DES

Computer-aided drug design is a mature field by some measures, and it has produced notable successes that underpin the study of interactions between small molecules and living systems. However, unlike a truly mature field, fallacies of logic lie at the heart of the arguments in support of major lines of research on methodology and validation thereof. Two particularly pernicious ones are cum hoc ergo propter hoc (with this, therefore because of this) and confirmation bias (seeking evidence that is confirmatory of the hypothesis at hand). These fallacies will be discussed in the context of off-target predictive modeling, QSAR, molecular similarity computations, and docking. Examples will be shown that avoid these problems.

ChEMBL: a Large-scale Bioactivity Database for Drug Discovery

Article

Full-text available

Sep 2011
NUCLEIC ACIDS RES

ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chemical biology and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets. Access is available through a web-based interface, data downloads and web services at: https://www.ebi.ac.uk/chembldb.

Review: Is levetiracetam different from other antiepileptic drugs? Levetiracetam and its cellular mechanism of action in epilepsy revisited

Article

Full-text available

Jul 2008

Levetiracetam (LEV) is a new antiepileptic drug that is clinically effective in generalized and partial epilepsy syndromes as sole or add-on medication. Nevertheless, its underlying mechanism of action is poorly understood. It has a unique preclinical profile; unlike other antiepileptic drugs (AEDs), it modulates seizure-activity in animal models of chronic epilepsy with no effect in most animal models of acute seizures. Yet it is effective in acute in-vitro 'seizure' models. A possible explanation for these dichotomous findings is that LEV has different mechanisms of actions, whether given acutely or chronically and in 'epileptic' and control tissue. Here we review the general mechanism of action of AEDs, give an updated and critical overview about the experimental findings of LEV's cellular targets (in particular the synaptic vesicular protein SV2A) and ask whether LEV represents a new class of AED.

Data Mining: Concepts and Techniques

Book

Jan 2012

This is the third edition of the premier professional reference on the subject of data mining, expanding and updating the previous market leading edition. This was the first (and is still the best and most popular) of its kind. Combines sound theory with truly practical applications to prepare students for real-world challenges in data mining. Like the first and second editions, Data Mining: Concepts and Techniques, 3rd Edition equips professionals with a sound understanding of data mining principles and teaches proven methods for knowledge discovery in large corporate databases. The first and second editions also established itself as the market leader for courses in data mining, data analytics, and knowledge discovery. Revisions incorporate input from instructors, changes in the field, and new and important topics such as data warehouse and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. This book begins with a conceptual introduction followed by a comprehensive and state-of-the-art coverage of concepts and techniques. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. Wherever possible, the authors raise and answer questions of utility, feasibility, optimization, and scalability. relational data. -- A comprehensive, practical look at the concepts and techniques you need to get the most out of real business data. -- Updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning, -- Scores of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects. -- Complete classroom support for instructors as well as bonus content available at the companion website. 
A comprehensive and practical look at the concepts and techniques you need in the area of data mining and knowledge discovery.

Psychopharmacology: The Fourth Generation of Progress

Article

Aug 1995

Robert B. Daroff

Sodium channels as molecular targets for antiepileptic drugs

Article

Apr 1998

Voltage-gated sodium channels mediate regenerative inward currents that are responsible for the initial depolarization of action potentials in brain neurons. Many of the most widely used antiepileptic drugs, as well as a number of promising new compounds suppress the abnormal neuronal excitability associated with seizures by means of complex voltage- and frequency-dependent inhibition of ionic currents through sodium channels. Over the past decade, advances in molecular biology have led to important new insights into the molecular structure of the sodium channel and have shed light on the relationship between channel structure and channel function. In this review, we examine how our current knowledge of sodium channel structure–function relationships contributes to our understanding of the action of anticonvulsant sodium channel blockers.

Introduction To Modern Information Retrieval

Book

Jan 1984

Chemical Structural Novelty: On-Targets and Off-Targets

Article

Sep 2011
J MED CHEM

Drug structures may be quantitatively compared based on 2D topological structural considerations and based on 3D characteristics directly related to binding. A framework for combining multiple similarity computations is presented along with its systematic application to 358 drugs with overlapping pharmacology. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, or their combination. For prediction of primary targets, the benefit of 3D over 2D was relatively small, but for prediction of off-targets, the added benefit was large. In addition to assessing prediction, the relationship between chemical similarity and pharmacological novelty was studied. Drug pairs that shared high 3D similarity but low 2D similarity (i.e., a novel scaffold) were shown to be much more likely to exhibit pharmacologically relevant differences in terms of specific protein target modulation.

Potassium currents in rat cortical neurons in culture are enhanced by the antiepileptic drug carbamazepine

Article

May 1990
CAN J PHYSIOL PHARM

We report that carbamazepine (Tegretol), a drug that is useful for the treatment of complex partial seizures, enhances outward, voltage-dependent K+ currents generated by rat neocortical cells in culture and recorded with patch-clamping techniques. This effect is seen in the presence of therapeutic concentrations of carbamazepine (10-20 μM). Furthermore, at these doses carbamazepine does not influence voltage-dependent inward Na+ and Ca2+ currents recorded in these cells. The action exerted by carbamazepine on K+ currents is a novel finding and might represent an important mechanism for controlling neocortical excitability and thus the generation of epileptiform activity.

Inhibition of delayed rectifier K+ channels by phenytoin in rat neuroblastoma cells

Article

Mar 1997

The action of the anticonvulsant drug phenytoin on K⁺ currents was investigated in neuroblastoma cells by whole‐cell voltage‐clamp recording. Neuroblastoma cells expressed an outward K⁺ current with a voltage‐ and time‐dependence which resembled the delayed‐rectifier K⁺ current found in other cells. When added to the standard external solution at concentrations ranging between 1 and 200 μM, phenytoin reduced the current (n = 65). Inhibition was concentration‐dependent with a half‐maximal inhibitory concentration of 30.9 ± 0.8 μM. The K⁺ current inhibition by phenytoin was voltage‐dependent, with block by phenytoin being relieved by depolarization. The times taken to reach steady‐state inhibition and complete recovery from inhibition were about 20 s. Neither the activation and inactivation rates of the K⁺ current nor the K⁺ channel availability were significantly altered by the blocking drug. A use‐dependent block was observed at phenytoin concentrations of 10, 25 and 50 μM. These results suggest that phenytoin affects K⁺ currents and that this effect might lead to a reduction in neuronal excitability.
