
DrugBank: A Comprehensive Resource for in Silico Drug Discovery and Exploration

January 2006
Nucleic Acids Research 34(Database issue):D668-72


DOI:10.1093/nar/gkj067


Authors:

David Scott Wishart

University of Alberta

Craig Knox

University of Alberta

An Chi Guo

University of Alberta


DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14 000 protein or drug target sequences are linked to these drug entries. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. Many data fields are hyperlinked to other databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and a variety of structure viewing applets. The database is fully searchable supporting extensive text, sequence, chemical structure and relational query searches. Potential applications of DrugBank include in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. DrugBank is available at http://redpoll.pharmacy.ualberta.ca/drugbank/.


David S. Wishart*, Craig Knox, An Chi Guo, Savita Shrivastava, Murtaza Hassanali,

Paul Stothard, Zhan Chang and Jennifer Woolsey

Department of Computing Science and Department of Biological Sciences, University of Alberta, Edmonton,

AB, Canada T6G 2E8

Received August 14, 2005; Revised and Accepted October 8, 2005


INTRODUCTION

Until the 1980s, most of our knowledge about drugs, drug mechanisms and drug receptors could fit in a few encyclopedic books and a couple dozen schematic figures. However, with the recent explosion in biological and chemical knowledge, this is no longer the case. There is simply too much data (images, models, structures and sequences) from too many sources. Unfortunately, most of this information still resides in textbooks or print journals. The limited drug or drug receptor data that is electronically available is either inaccessible (except through expensive subscriptions), inadequate or widely scattered among many different public databases. This state of affairs largely reflects the ‘two solitudes’ of cheminformatics and bioinformatics. Neither discipline has really tried to integrate with the other. As a consequence, the wealth of electronic sequence/structure data that exists today has never been well linked to the enormous body of drug or chemical knowledge that has accumulated over the past half century.

Recently, some notable efforts have been made to partially overcome this ‘informatics gap’. The Therapeutic Target Database or TTD is one such example (1). This very useful web-based resource contains linked lists of names for >1100 small molecule drugs and drug targets (i.e. proteins). In addition to the TTD, a number of more comprehensive small molecule databases have also emerged including KEGG (2), ChEBI (3) and PubChem (http://pubchem.ncbi.nlm.nih.gov/). Each contains tens of thousands of chemical entries, including hundreds of small molecule drugs. All three databases provide names, synonyms, images, structure files and hyperlinks to other databases. Furthermore, both KEGG and PubChem support structure similarity searches. Unfortunately, these databases were not specifically designed to be drug databases, and so they do not provide specific pharmaceutical information or links to specific drug targets (i.e. sequences). Furthermore, because these databases were designed to be synoptic (containing <15 fields per compound entry) they do not provide a comprehensive molecular summary of any given drug or its corresponding protein target. More specialized drug databases such as PharmGKB (4) or online pharmaceutical encyclopedias such as RxList (5) tend to offer much more detailed clinical information about many drugs (their pharmacology, metabolism and indications) but they were not designed to contain structural, chemical or physico-chemical information. Instead their data content is targeted more towards pharmacists, physicians or consumers.

Ideally, what is needed is something that combines the strengths of, say, PharmGKB, PubChem and Swiss-Prot to create a single, fully searchable in silico drug resource that links sequence, structure and mechanistic data about drug molecules (including biotech drugs) with sequence, structure and mechanistic data about their drug targets. Beyond its obvious educational value, this kind of database could potentially allow researchers to easily visualize and explore 3D drug interactions, compare drug similarities or perform in silico drug (or drug target) discovery. Here, we wish to describe just such a database, called DrugBank.

*To whom correspondence should be addressed. Tel: +1 780 492 0383; Fax: +1 780 492 1071; Email: david.wishart@ualberta.ca

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

DATABASE DESCRIPTION

Fundamentally, DrugBank is a dual purpose bioinformatics–cheminformatics database with a strong focus on quantitative, analytic or molecular-scale information about both drugs and drug targets. In many respects it combines the data-rich molecular biology content normally found in curated sequence databases such as Swiss-Prot and UniProt (6) with the equally rich data found in medicinal chemistry textbooks and chemical reference handbooks. By bringing these two disparate types of information together into one unified and freely available resource, we wanted to allow educators and researchers from diverse disciplines and backgrounds (academic, industrial, clinical, non-clinical) to conduct the type of in silico learning and discovery that is now routine in the world of genomics and proteomics.

The diversity of data types and the required breadth of domain knowledge, combined with the fact that the data were mostly ‘paper-bound’, made the assembly of DrugBank both difficult and time-consuming. To compile, confirm and validate this comprehensive collection of data, more than a dozen textbooks, several hundred journal articles, nearly 30 different electronic databases, and at least 20 in-house or web-based programs were individually searched, accessed, compared, written or run over the course of four years. The team of DrugBank archivists and annotators included two accredited pharmacists, a physician and three bioinformaticians with dual training in computing science and molecular biology/chemistry.

DrugBank currently contains >4100 drug entries, corresponding to >12 000 different trade names and synonyms. These drug entries were chosen according to the following rules: the molecule must contain more than one type of atom, be non-redundant, have a known chemical structure and be identified as a drug or drug-like molecule by at least one reputable data source. To facilitate more targeted research and exploration, DrugBank is divided into four major categories: (i) FDA-approved small molecule drugs (>700 entries), (ii) FDA-approved biotech (protein/peptide) drugs (>100 entries), (iii) nutraceuticals or micronutrients such as vitamins and metabolites (>60 entries) and (iv) experimental drugs, including unapproved drugs, de-listed drugs, illicit drugs, enzyme inhibitors and potential toxins (>3200 entries). These individual ‘Drug Types’ are also bundled into two larger categories: all FDA drugs (Approved Drugs) and All Compounds (Experimental + FDA + nutraceuticals). DrugBank’s coverage for non-trivial FDA-approved drugs is 80% complete. In addition, >14 000 protein (i.e. drug target) sequences are linked to these drug entries. More complete information about the numbers of drugs, drug targets and non-redundant drug targets (including their sequences) is available on the DrugBank ‘download’ page. The entire database, including text, sequence, structure and image data, occupies nearly 16 gigabytes, most of which can be freely downloaded.
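The four inclusion rules above can be read as a simple filter over candidate compounds. The following Python sketch is only an illustration of that reading, not DrugBank's actual curation code: the `Candidate` record, its field names, and the choice to treat two entries as redundant when their structure strings are identical are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Candidate:
    """Hypothetical record for a compound being considered for inclusion."""
    name: str
    formula: Optional[str]    # simple formula such as "C2H6O"; None if unknown
    structure: Optional[str]  # e.g. a SMILES string; None if no known structure
    sources: Set[str] = field(default_factory=set)  # sources that call it a drug


def element_types(formula: str) -> Set[str]:
    """Return the set of element symbols in a simple formula such as 'C2H6O'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def select_entries(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep candidates that satisfy all four inclusion rules."""
    seen_structures: Set[str] = set()
    kept: List[Candidate] = []
    for c in candidates:
        if c.structure is None or c.formula is None:
            continue  # rule: must have a known chemical structure
        if len(element_types(c.formula)) < 2:
            continue  # rule: must contain more than one type of atom
        if not c.sources:
            continue  # rule: identified as a drug by at least one source
        if c.structure in seen_structures:
            continue  # rule: non-redundant (assumed: identical structure string)
        seen_structures.add(c.structure)
        kept.append(c)
    return kept


candidates = [
    Candidate("ethanol", "C2H6O", "CCO", {"source_a"}),
    Candidate("ethyl alcohol", "C2H6O", "CCO", {"source_b"}),  # redundant
    Candidate("oxygen", "O2", "O=O", {"source_a"}),            # one atom type
    Candidate("mystery", None, None, {"source_a"}),            # no structure
    Candidate("unlisted", "CH4O", "CO", set()),                # no source
]
selected = [c.name for c in select_entries(candidates)]
```

A real pipeline would need a canonical structure representation to detect redundancy reliably, since the same molecule can be written as many different SMILES strings.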

DrugBank is a fully searchable web-enabled resource with many built-in tools and features for viewing, sorting and extracting drug or drug target data. Detailed instructions on where to locate and how to use these browsing/search tools are provided on the DrugBank homepage. As with any web-enabled database, DrugBank supports standard text queries (through the text search box located on the home page). It also offers general database browsing using the ‘Browse’ and ‘PharmaBrowse’ buttons located at the top of each DrugBank page. To facilitate general browsing, DrugBank is divided into synoptic summary tables which, in turn, are linked to more detailed ‘DrugCards’, in analogy to the very successful GeneCards concept (7). All of DrugBank’s summary tables can be rapidly browsed, sorted or reformatted (using up to six different criteria) in a manner similar to the way PubMed abstracts may be viewed. Clicking on the DrugCard button found in the leftmost column of any given DrugBank summary table opens a webpage describing the drug of interest in much greater detail. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data (see Table 1). In addition to providing comprehensive numeric, sequence and textual data, each DrugCard also contains hyperlinks to other databases, abstracts, digital images and interactive applets for viewing molecular structures (Figure 1). In addition to the general browsing features, DrugBank also provides a more specialized ‘PharmaBrowse’ feature. This is designed for pharmacists, physicians and medicinal chemists who tend to think of drugs in clusters of indications or drug classes. This particular browsing tool provides navigation hyperlinks to >70 drug classes, which in turn list the FDA-approved drugs associated with each class. Each drug name is then linked to its respective DrugCard.

Table 1. Summary of the data fields or data types found in each DrugCard

Drug or compound information    Drug target or receptor information
Generic name                    Target name
Brand name(s)/synonyms          Target synonyms
IUPAC name                      Target protein sequence
Chemical structure/sequence     Target no. of residues
Chemical formula                Target molecular weight
PubChem/KEGG/ChEBI Links        Target pI
Swiss-Prot/GenBank Links        Target gene ontology
FDA/MSDS/RxList Links           Target general function
Molecular weight                Target specific function
Melting point                   Target pathways
Water solubility                Target reactions
pKa or pI                       Target Pfam domains
LogP or hydrophobicity          Target signal sequences
NMR/MS spectra                  Target transmembrane regions
MOL/SDF/PDF text files          Target essentiality
MOL/PDB image files             Target GenBank protein ID
SMILES string                   Target Swiss-Prot ID
Indication                      Target PDB ID
Pharmacology                    Target cellular location
Mechanism of action             Target DNA sequence
Biotransformation/absorption    Target chromosome location
Patient/physician information   Target locus
Metabolizing enzymes            Target SNPs/mutations

A more complete listing is provided on the DrugBank home page.

Figure 1. A screenshot montage of the DrugBank Database showing several possible views of information describing the drug Ramipril. Not all fields are shown.

A key distinguishing feature of DrugBank from other online drug resources is its extensive support for higher level database searching and selecting functions. In addition to the data viewing and sorting features already described, DrugBank also offers a local BLAST (8) search that supports both single and multiple sequence queries, a Boolean text search [using GLIMPSE; (9)], a chemical structure search utility and a relational data extraction tool (10). These can all be accessed via the database navigation bar located at the top of every DrugBank page.

The BLAST search (SeqSearch) is particularly useful as it can potentially allow users to quickly and simply identify drug leads from newly sequenced pathogens. Specifically, a new sequence, a group of sequences or even an entire proteome can be searched against DrugBank’s database of known drug target sequences by pasting the FASTA formatted sequence (or sequences) into the SeqSearch query box and pressing the ‘submit’ button. A significant hit reveals, through the associated DrugCard hyperlink, the name(s) or chemical structure(s) of potential drug leads that may act on that query protein (or proteome).
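Because SeqSearch accepts one or many FASTA formatted sequences, it can help to check a multi-sequence query before pasting it into the query box. The sketch below is a minimal FASTA parser written for illustration only; the function name `parse_fasta` and its error-handling choices are assumptions of this example and are not part of DrugBank or BLAST.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}.

    The id is the first whitespace-delimited token after '>'.
    Sequence lines are concatenated and upper-cased.
    """
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a generated id when the header line is empty.
            current = parts[0] if parts else f"seq{len(chunks) + 1}"
            if current in chunks:
                raise ValueError(f"duplicate sequence id: {current}")
            chunks[current] = []
        else:
            if current is None:
                raise ValueError("sequence data found before any '>' header")
            chunks[current].append(line.upper())
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


query = """>p1 hypothetical protein
MKT
AAL

>p2
mvv
"""
records = parse_fasta(query)
```

Here `records` maps each identifier to its full sequence, so a group of sequences (or an entire proteome file) can be validated and counted before submission.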

DrugBank’s structure similarity search tool (ChemQuery) can be used in a similar manner to its sequence search tools. Users may sketch (through ACD’s freely available chemical sketching applet) or paste a SMILES string (11) of a possible lead compound into the ChemQuery window. Submitting the query launches a structure similarity search tool that looks for common substructures from the query compound that match DrugBank’s database of known drug or drug-like compounds. High scoring hits are presented in a tabular format with hyperlinks to the corresponding DrugCards (which in turn link to the protein target). The ChemQuery tool allows users to quickly determine whether their compound of interest acts on the desired protein target. This kind of chemical structure search may also reveal whether the compound of interest may unexpectedly interact with unintended protein targets. In addition to these structure similarity searches, the ChemQuery utility also supports compound searches on the basis of chemical formula and molecular weight ranges.
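The formula and molecular weight searches mentioned above can be illustrated with a small calculation. The following Python sketch computes an average molecular weight from a simple chemical formula and filters compounds by a weight range. It is a hedged illustration, not ChemQuery's implementation: the function names, the restricted element table (standard abridged atomic masses) and the lack of support for parentheses or charges are all assumptions of this example.

```python
import re
from typing import Dict, List, Tuple

# Abridged standard atomic masses for a few common elements (g/mol).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def molecular_weight(formula: str) -> float:
    """Average molecular weight of a simple formula such as 'C23H32N2O5'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"element not in table: {symbol}")
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def search_by_weight(
    compounds: Dict[str, str], low: float, high: float
) -> List[Tuple[str, float]]:
    """Return (name, weight) pairs with low <= weight <= high, lightest first."""
    hits = [(name, molecular_weight(f)) for name, f in compounds.items()]
    return sorted(
        [(n, w) for n, w in hits if low <= w <= high], key=lambda h: h[1]
    )


compounds = {"water": "H2O", "ramipril": "C23H32N2O5"}
hits = search_by_weight(compounds, 400.0, 450.0)
```

With this table, ramipril (the drug shown in Figure 1) comes out at roughly 416.5 g/mol, so it is the only hit in the 400 to 450 range.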

DrugBank’s data extraction utility (Data Extractor) employs a simple relational database system that allows users to select one or more data fields and to search for ranges, occurrences or partial occurrences of words, strings or numbers. The Data Extractor uses clickable web forms so that users may intuitively construct SQL-like queries. Using a few mouse clicks, it is relatively simple to construct very complex queries (‘find all drugs less than 600 daltons with LogPs less than 3.2 that are antihistamines’) or to build a series of highly customized tables. The output from these queries is provided in HTML format with hyperlinks to all associated DrugCards.
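The example query quoted above maps naturally onto a relational SELECT statement. The sketch below expresses it with Python's built-in `sqlite3` module against an in-memory table. The table name, column names and all four rows are hypothetical placeholders invented for this illustration; they do not reflect DrugBank's actual schema or any real drug's properties.

```python
import sqlite3
from typing import List

# In-memory database with a hypothetical, simplified drug table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drugs (name TEXT, mol_weight REAL, logp REAL, drug_class TEXT)"
)
conn.executemany(
    "INSERT INTO drugs VALUES (?, ?, ?, ?)",
    [
        # Placeholder rows, not real measurements.
        ("example_drug_1", 255.4, 3.0, "antihistamine"),
        ("example_drug_2", 388.9, 2.1, "antihistamine"),
        ("example_drug_3", 501.7, 5.6, "antihistamine"),
        ("example_drug_4", 310.2, 1.2, "antibiotic"),
    ],
)


def find_drugs(
    db: sqlite3.Connection, max_weight: float, max_logp: float, drug_class: str
) -> List[str]:
    """Names of drugs below both thresholds that belong to the given class."""
    cursor = db.execute(
        "SELECT name FROM drugs "
        "WHERE mol_weight < ? AND logp < ? AND drug_class = ? "
        "ORDER BY name",
        (max_weight, max_logp, drug_class),
    )
    return [row[0] for row in cursor]


# 'find all drugs less than 600 daltons with LogPs less than 3.2
#  that are antihistamines'
hits = find_drugs(conn, 600, 3.2, "antihistamine")
```

Parameter placeholders (`?`) keep the user-supplied thresholds separate from the SQL text, which is the usual way to build such queries safely from web form input.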

QUALITY ASSURANCE, COMPLETENESS AND CURATION

Every effort is made to ensure that DrugBank is as complete, correct and current as possible. Each DrugCard is entered or prepared by one member of the curation team and separately validated by a second member of the curation team. Additional spot checks are routinely performed on each entry by senior members of the curation group, including a physician, an accredited pharmacist and two PhD-level biochemists. Several software packages including text mining tools, chemical parameter calculators and protein annotation tools (10) have been modified or specifically developed to aid in DrugBank’s data entry and data validation. These tools collate and display text (and images) from multiple sources allowing the curators to compare, assess, enter and correct drug or drug target information. In addition to using CVS (Concurrent Versions System), all changes and edits to the central database are monitored, dated and displayed on the DrugBank ‘download’ page using a specially developed text tracking system. A second text tracking system has been implemented to monitor the completeness (0–100%) of each field (for all approved drugs) and to display up-to-date statistics on the number of drugs, drug targets and non-redundant sequences in various drug categories. This information is also displayed on the ‘download’ page. To ensure DrugBank is current, new drugs (approved and experimental) are identified using continuously running screen-scraping tools linked to the FDA, the PDB and RxList websites. Backfilling of older, more obscure and orphan drugs is ongoing and done manually. Drug targets are identified and confirmed using multiple sources (PubMed, TTD, FDA labels, RxList, PharmGKB, textbooks) as are all drug structures (KEGG, PubChem, images from FDA labels).
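The per-field completeness figure (0–100%) described above can be thought of as the share of entries in which a given field is filled in. The following Python sketch shows one way to compute such a statistic. It is an assumption-laden illustration rather than the authors' tracking system: the function name, the dictionary representation of a DrugCard and the rule that `None` or blank strings count as missing are all choices made for this example.

```python
from typing import Dict, List, Optional


def field_completeness(
    entries: List[Dict[str, Optional[str]]], fields: List[str]
) -> Dict[str, float]:
    """Percentage (0-100) of entries with a non-empty value for each field."""
    if not entries:
        return {name: 0.0 for name in fields}
    result: Dict[str, float] = {}
    for name in fields:
        filled = 0
        for entry in entries:
            value = entry.get(name)
            # Assumed rule: None or whitespace-only strings count as missing.
            if value is not None and str(value).strip():
                filled += 1
        result[name] = 100.0 * filled / len(entries)
    return result


# Hypothetical, heavily simplified DrugCard records.
cards = [
    {"generic_name": "drug_a", "melting_point": "109 C"},
    {"generic_name": "drug_b", "melting_point": ""},
    {"generic_name": "drug_c", "melting_point": None},
    {"generic_name": "drug_d", "melting_point": "201 C"},
]
stats = field_completeness(cards, ["generic_name", "melting_point", "logp"])
```

In this toy data every card has a generic name, half have a melting point and none has a LogP value, so the three fields score 100%, 50% and 0% respectively.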

CONCLUSION

In summary, DrugBank is a comprehensive, web-accessible database that brings together quantitative chemical, physical, pharmaceutical and biological data about thousands of well-studied drugs and drug targets. DrugBank is primarily focused on providing the kind of detailed molecular data needed to facilitate drug discovery and drug development. This includes physical property data, structure and image files, pharmacological and physiological data about thousands of drug products as well as extensive molecular biological information about their corresponding drug targets. DrugBank is unique, not only in the type of data it provides but also in the level of integration and depth of coverage it achieves. In addition to its extensive small molecule drug coverage, DrugBank is, to our knowledge, the only public database that provides any significant information about the 110+ approved biotech drugs. DrugBank also supports an extensive array of visualizing, querying and search options including a structure similarity search tool and an easy-to-use relational data extraction system. It is hoped that DrugBank will serve as a useful resource not only to members of the pharmaceutical research community but also to educators, students, clinicians and the general public.

ACKNOWLEDGEMENTS

The authors wish to thank Genome Prairie, a division of Genome Canada, for financial support. Funding to pay the Open Access publication charges for this article was provided by Genome Canada.

Conflict of interest statement. None declared.


REFERENCES

1. Chen,X., Ji,Z.L. and Chen,Y.Z. (2002) TTD: therapeutic target database. Nucleic Acids Res., 30, 412–415.
2. Kanehisa,M., Goto,S., Kawashima,S., Okuno,Y. and Hattori,M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277–D280.
3. Brooksbank,C., Cameron,G. and Thornton,J. (2005) The European Bioinformatics Institute’s data resources: towards systems biology. Nucleic Acids Res., 33, D46–D53.
4. Hewett,M., Oliver,D.E., Rubin,D.L., Easton,K.L., Stuart,J.M., Altman,R.B. and Klein,T.E. (2002) PharmGKB: the Pharmacogenetics Knowledge Base. Nucleic Acids Res., 30, 163–165.
5. Hatfield,C.L., May,S.K. and Markoff,J.S. (1999) Quality of consumer drug information provided by four web sites. Am. J. Health Syst. Pharm., 56, 2308–2311.
6. Bairoch,A., Apweiler,R., Wu,C.H., Barker,W.C., Boeckmann,B., Ferro,S., Gasteiger,E., Huang,H., Lopez,R., Magrane,M. et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.
7. Rebhan,M., Chalifa-Caspi,V., Prilusky,J. and Lancet,D. (1998) GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics, 14, 656–664.
8. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
9. Manber,U. and Bigot,P. (1997) USENIX Symposium on Internet Technologies and Systems (NSITS’97), Monterey, CA, pp. 231–239.
10. Sundararaj,S., Guo,A., Habibi-Nazhad,B., Rouani,M., Stothard,P., Ellison,M. and Wishart,D.S. (2004) The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli. Nucleic Acids Res., 32, D293–D295.
11. Weininger,D. (1988) SMILES 1. Introduction and encoding rules. J. Chem. Inf. Comput. Sci., 28, 31–38.

D672 Nucleic Acids Research, 2006, Vol. 34, Database issue

Deep Visual Proteomics Unveils Precision Medicine Insights in Composite Small Lymphocytic and Classical Hodgkin Lymphoma

Preprint

Full-text available

Jun 2024

Coexistence of two cancer types in the same organ presents challenges for clinical decision-making, calling for personalized treatment strategies. Deep Visual Proteomics (DVP) combines AI driven single cell type analysis with laser microdissection and ultrasensitive mass spectrometry. In a composite case of classical Hodgkin lymphoma (cHL) and small lymphocytic lymphoma (SLL) in a single patient, we investigated the potential of DVP to inform precision oncology. We quantified the proteomic landscapes in the cHL and SLL to a depth of thousands of proteins. Our analysis revealed distinct proteome profiles in cHL and SLL populations, highlighting their clonal unrelatedness. Our data suggested standardized chemotherapy and interleukin-4 inhibition as potential strategies to manage chemo-resistance - instead of bone marrow transplantation. DVP highlighted minichromosome maintenance protein and proteasome inhibitors for cHL and H3K27 methylation and receptor tyrosine kinase inhibitors for SLL as subtype-specific treatments. Thus cell-type specific insights of DVP can guide personalized oncological treatments.

Suspected Adverse Drug Reactions Associated With Leukotriene Receptor Antagonists Versus First Line Asthma Medications: A National Registry-Pharmacology Approach

Preprint

Full-text available

Jun 2024

Aims. To determine the suspected adverse drug reaction (ADR) profile of leukotriene receptor antagonists (LTRAs: montelukast and zafirlukast) relative to first-line asthma medications short-acting beta agonists (SABA: salbutamol) and inhaled corticosteroid (ICH: beclomethasone) in the United Kingdom. To determine chemical and pharmacological rationale for the suspected ADR signals. Methods. Properties of the asthma medications (pharmacokinetics and pharmacology) were datamined from the chemical database of bioactive molecules with drug-like properties, European molecular Biology laboratory (ChEMBL). Suspected ADR profiles of the asthma medications was curated from the Medicines and Healthcare products Regulatory Authority (MHRA) Yellow Card interactive drug analysis profiles (iDAP) and concatenated to the standardised prescribing levels (Open Prescribing) between 2018-2023. Results. Total ADRs per 100,000 Rx (P < .001) and psychiatric system organ class (SOC) ADRs (P < .001) reached statistical significance. Montelukast exhibited the greatest ADR rate at 15.64 per 100,000 Rx. The low lipophilic ligand efficiency (LLE = 0.15) of montelukast relative to the controls may explain the promiscuity of interactions with off-target G-coupled protein receptors (GPCRs). This included the dopamine signalling axis, which in combination with bioaccumulation in the cerebrospinal fluid (CSF) to achieve Cmax beyond a typical dose can be ascribed to the psychiatric side effects observed. Cardiac ADRs did not reach statistical significance but inhibitory interaction of montelukast with the MAP kinase p38 alpha (a cardiac protective pathway) was identified as a potential rationale for montelukast withdrawal cardiac effects. Conclusion. Relative to the controls, montelukast displays a range of suspected system organ class level ADRs. For psychiatric ADR, montelukast is statistically significant (P < .001). 
A mechanistic hypothesis is proposed based on polypharmacological interactions in combination with CSF levels attained. This work further supports the close monitoring of montelukast for neuropsychiatric side effects.

Integrated ML-Based Strategy Identifies Drug Repurposing for Idiopathic Pulmonary Fibrosis

Article

Full-text available

Jun 2024

Faheem Ahmed

Idiopathic pulmonary fibrosis (IPF) affects an estimated global population of around 3 million individuals. IPF is a medical condition with an unknown cause characterized by the formation of scar tissue in the lungs, leading to progressive respiratory disease. Currently, there are only two FDA-approved small molecule drugs specifically for the treatment of IPF and this has created a demand for the rapid development of drugs for IPF treatment. Moreover, denovo drug development is time and cost-intensive with less than a 10% success rate. Drug repurposing currently is the most feasible option for rapidly making the drugs to market for a rare and sporadic disease. Normally, the repurposing of drugs begins with a screening of FDA-approved drugs using computational tools, which results in a low hit rate. Here, an integrated machine learning-based drug repurposing strategy is developed to significantly reduce the false positive outcomes by introducing the predock machine-learning-based predictions followed by literature and GSEA-assisted validation and drug pathway prediction. The developed strategy is deployed to 1480 FDA-approved drugs and to drugs currently in a clinical trial for IPF to screen them against "TGFB1", "TGFB2", "PDGFR-a", "SMAD-2/3", "FGF-2", and more proteins resulting in 247 total and 27 potentially repurposable drugs. The literature and GSEA validation suggested that 72 of 247 (29.14%) drugs have been tried for IPF, 13 of 247 (5.2%) drugs have already been used for lung fibrosis, and 20 of 247 (8%) drugs have been tested for other fibrotic conditions such as cystic fibrosis and renal fibrosis. Pathway prediction of the remaining 142 drugs was carried out resulting in 118 distinct pathways. Furthermore, the analysis revealed that 29 of 118 pathways were directly or indirectly involved in IPF and 11 of 29 pathways were directly involved. Moreover, 15 potential drug combinations are suggested for showing a strong synergistic effect in IPF. 
The drug repurposing strategy reported here will be useful for rapidly developing drugs for treating IPF and other related conditions.

Assessing the Anticholinergic Cognitive Burden Classification of Putative Anticholinergic Drugs Using ADME Properties

Preprint

Jun 2024

Aim: This study evaluated the use of machine learning in leveraging drug ADME data to develop a novel anticholinergic burden (AB) scale and compared its performance to previously published scales. Methods: Experimental and in silico ADME data were collected for antimuscarinic activity, blood-brain barrier penetration, bioavailability, chemical structure and P-gp substrate profile. These five ADME properties were used to train an unsupervised model to assign anticholinergic burden scores to drugs. The performance of the model was evaluated through 10-fold cross-validation and compared with the clinical ACB scale and non-clinical ATS scale which is based primarily on muscarinic binding affinity. Results: In silico software (ADMET predictor ®) used for screening drugs for their blood-brain barrier (BBB) penetration correctly identified some drugs that do not cross the BBB. The mean AUC for the unsupervised and ACB scale based on five selected features was 0.76 and 0.64 respectively. The unsupervised model agreed with the ACB scale on the classification of more than half of the drugs (n=49 of m=88) and agreed on the classification of less than half the drugs in the ATS scale (n=12/25). Conclusion: Our findings suggest that the commonly used ACB scale may misclassify certain drugs due to their inability to cross the BBB. On the other hand, the ATS scale would misclassify drugs solely depending on muscarinic binding affinity without considering ADME properties. Machine learning models can be trained on these features to build classification models that are easy to update and have greater generalizability.

Multiple Kronecker RLS fusion-based link propagation for drug-side effect prediction

Article

Full-text available

Jun 2024

Drug-side effect prediction has become an essential area of research in the field of pharma-cology. As the use of medications continues to rise, so does the importance of understanding and mitigating the potential risks associated with them. At present, researchers have turned to data-driven methods to predict drug-side effects. Drug-side effect prediction is a link prediction problem, and the related data can be described from various perspectives. To process these kinds of data, a multi-view method, called Multiple Kronecker RLS fusion-based link propagation (MKronRLSF-LP), is proposed. MKronRLSF-LP extends the Kron-RLS by finding the consensus partitions and multiple graph Laplacian constraints in the multi-view setting. Both of these multi-view settings contribute to a higher quality result. Extensive experiments have been conducted on drug-side effect datasets, and our empirical results provide evidence that our approach is effective and robust.

Paeoniae Radix Alba and Network Pharmacology Approach for Osteoarthritis: A Review

Article

Full-text available

Jun 2024

Osteoarthritis (OA) is the most common type of arthritis and affects more than 240 million people worldwide; the most frequently affected areas are the hips, knees, feet, and hands. OA pathophysiology is multifactorial, involving genetic, developmental, metabolic, traumatic, and inflammation factors. Therefore, treatments able to address several path mechanisms can help control OA. Network pharmacology is developing as a next-generation research strategy to shift the paradigm of drug discovery from “one drug, one target” to “multi-component, multi-target”. In this paper, network pharmacology is employed to investigate the potential role of Paeoniae Radix Alba (PRA) in the treatment of OA. PRA is a natural product known for its protective effects against OA, which has recently drawn attention because of its ability to provide physiological benefits with fewer toxic effects. This review highlights the anti-inflammatory properties of PRA in treating OA. PRA can be used alone or in combination with conventional therapies to enhance their effectiveness and reduce side effects. The study also demonstrates the use of network pharmacology as a cost-effective and time-saving method for predicting therapeutic targets of PRA in OA treatment.

Molecular Modeling in Drug Delivery: Polymer Protective Coatings as Case Study

Chapter

Jun 2024

Modular Access to Functionalized Oxetanes as Benzoyl Bioisosteres

Article

Jun 2024
J AM CHEM SOC

Interactions of a Novel Anthracycline with Oligonucleotide DNA and Cyclodextrins in an Aqueous Environment

Article

Jun 2024
J PHYS CHEM B

Drug to genome to drug: a computational large-scale chemogenomics screening for novel drug candidates against sporotrichosis

Article

Jun 2024
BRAZ J MICROBIOL

Sporotrichosis is recognized as the predominant subcutaneous mycosis in South America, attributed to pathogenic species within the Sporothrix genus. Notably, in Brazil, Sporothrix brasiliensis emerges as the principal species, exhibiting significant sapronotic, zoonotic and enzootic epidemic potential. Consequently, the discovery of novel therapeutic agents for the treatment of sporotrichosis is imperative. The present study is dedicated to the repositioning of pharmaceuticals for sporotrichosis therapy. To achieve this goal, we designed a pipeline with the following steps: (a) compilation and preparation of Sporothrix genome data; (b) identification of orthologous proteins among the species; (c) identification of homologous proteins in publicly available drug-target databases; (d) selection of Sporothrix essential targets using validated genes from Saccharomyces cerevisiae; (e) molecular modeling studies; and (f) experimental validation of selected candidates. Based on this approach, we were able to prioritize eight drugs for in vitro experimental validation. Among the evaluated compounds, everolimus and bifonazole demonstrated minimum inhibitory concentration (MIC) values of 0.5 µg/mL and 4.0 µg/mL, respectively. Subsequently, molecular docking studies suggest that bifonazole and everolimus may target specific proteins within S. brasiliensis, namely sterol 14-α-demethylase and serine/threonine-protein kinase TOR, respectively. These findings shed light on the potential binding affinities and binding modes of bifonazole and everolimus with their probable targets, providing a preliminary understanding of the antifungal mechanism of action of these compounds. In conclusion, our research advances the understanding of the therapeutic potential of bifonazole and everolimus, supporting their further investigation as antifungal agents for sporotrichosis in prospective hit-to-lead and preclinical investigations.

Petromagnetic Properties In The Naica Mining District, Chihuahua, Mexico: Searching For Source of Mineralization

Article

Full-text available

Jan 2003
EARTH PLANETS SPACE

Ore mineral and host lithologies have been sampled with 89 oriented samples from 14 sites in the Naica District, northern Mexico. The samples are characterised by the following magnetic parameters: saturation magnetization, density, low- and high-temperature magnetic susceptibility, remanence intensity, Koenigsberger ratio, Curie temperature and hysteresis parameters. Rock magnetic properties are controlled by variations in titanomagnetite content and hydrothermal alteration. Post-mineralization hydrothermal alteration appears to be the major event that affected the minerals and magnetic properties. Curie temperatures are characteristic of titanomagnetites or titanomaghemites. Hysteresis parameters indicate that most samples have pseudo-single domain (PSD) magnetic grains. Alternating field (AF) demagnetization and isothermal remanence (IRM) acquisition both indicate that natural and laboratory remanences are carried by MD-PSD spinels in the host rocks. The trend of NRM intensity vs susceptibility suggests that the carrier of remanent and induced magnetization is the same in all cases (spinels). The Koenigsberger ratio ranges from 0.05 to 34.04, indicating the presence of MD and PSD magnetic grains. Constraints on the geometry of the intrusive source body developed in the model of the magnetic anomaly are obtained by quantifying the relative contributions of induced and remanent magnetization components.

Gapped BLAST and PSI-BLAST: A new generation of protein database search programs

Article

Full-text available

Sep 1997

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D. GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 14: 656-664

Article

Full-text available

Feb 1998

Modern biology is shifting from the 'one gene one postdoc' approach to genomic analyses that include the simultaneous monitoring of thousands of genes. The importance of efficient access to concise and integrated biomedical information to support data analysis and decision making is therefore increasing rapidly, in both academic and industrial research. However, knowledge discovery in the widely scattered resources relevant for biomedical research is often a cumbersome and non-trivial task, one that requires a significant amount of training and effort. To develop a model for a new type of topic-specific overview resource that provides efficient access to distributed information, we designed a database called 'GeneCards'. It is a freely accessible Web resource that offers one hypertext 'card' for each of the more than 7000 human genes that currently have an approved gene symbol published by the HUGO/GDB nomenclature committee. The presented information aims at giving immediate insight into current knowledge about the respective gene, including a focus on its functions in health and disease. It is compiled by Perl scripts that automatically extract relevant information from several databases, including SWISS-PROT, OMIM, Genatlas and GDB. Analyses of the interactions of users with the Web interface of GeneCards triggered development of easy-to-scan displays optimized for human browsing. Also, we developed algorithms that offer 'ready-to-click' query reformulation support, to facilitate information retrieval and exploration. Many of the long-term users turn to GeneCards to quickly access information about the function of very large sets of genes, for example in the realm of large-scale expression studies using 'DNA chip' technology or two-dimensional protein electrophoresis. Freely available at http://bioinformatics.weizmann.ac.il/cards/ Contact: cards@bioinformatics.weizmann.ac.il

Quality of consumer drug information provided by four Web sites

Article

Full-text available

Dec 1999

The quality of drug-specific information available to consumers on the Internet was studied. The 30 most commonly dispensed prescription drugs were selected to represent those medications for which consumers would be seeking information. A Web page evaluation form was developed to objectively evaluate each site in terms of sponsors, references, recency of updates, ease of use, overall organization, and other characteristics. A second form was developed to qualitatively and quantitatively assess the drug information provided by the sites. Four Internet sites, MedicineNet, RxList, Drug InfoNet, and thriveonline, were evaluated. Authors, contributors, and references were identified for three of the sites. All sites had disclaimers advising patients to seek the advice of a health care professional, all indexed drug information by both brand name and generic name, and all were well organized. Only RxList and MedicineNet contained information on all the drugs evaluated. For the drugs documented, RxList, MedicineNet, Drug InfoNet, and thriveonline contained 84%, 60%, 87%, and 72% of the 22 variables assessed, respectively. The accuracy of the information provided was greater than 98% for all the sites. Only two of four Internet sites containing consumer drug information included all the prescription drugs being evaluated. Most but not all of the information on the four sites was accurate.

TTD: Therapeutic Target Database

Article

Full-text available

Feb 2002
NUCLEIC ACIDS RES

A number of proteins and nucleic acids have been explored as therapeutic targets. These targets are subjects of interest in different areas of biomedical and pharmaceutical research and in the development and evaluation of bioinformatics, molecular modeling, computer-aided drug design and analytical tools. A publicly accessible database that provides comprehensive information about these targets is therefore helpful to the relevant communities. The Therapeutic Target Database (TTD) is designed to provide information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets. Cross-links to other databases are also introduced to facilitate the access of information about the sequence, 3D structure, function, nomenclature, drug/ligand binding properties, drug usage and effects, and related literature for each target. This database can be accessed at http://xin.cz3.nus.edu.sg/group/ttd/ttd.asp and it currently contains entries for 433 targets covering 125 disease conditions along with 809 drugs/ligands directed at each of these targets. Each entry can be retrieved through multiple methods including target name, disease name, drug/ligand name, drug/ligand function and drug therapeutic classification.

PharmGKB: the Pharmacogenetics Knowledge Base

Article

Full-text available

Feb 2002
NUCLEIC ACIDS RES

The Pharmacogenetics Knowledge Base (PharmGKB; http://www.pharmgkb.org/) contains genomic, phenotype and clinical information collected from ongoing pharmacogenetic studies. Tools to browse, query, download, submit, edit and process the information are available to registered research network members. A subset of the tools is publicly available. PharmGKB currently contains over 150 genes under study, 14 Coriell populations and a large ontology of pharmacogenetics concepts. The pharmacogenetic concepts and the experimental data are interconnected by a set of relations to form a knowledge base of information for pharmacogenetic researchers. The information in PharmGKB, and its associated tools for processing that information, are tailored for leading-edge pharmacogenetics research. The PharmGKB project was initiated in April 2000 and the first version of the knowledge base went online in February 2001.

SMILES, A Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules

Article

Feb 1988

David Weininger

SMILES (Simplified Molecular Input Line Entry System) is a chemical notation system designed for modern chemical information processing. Based on principles of molecular graph theory, SMILES allows rigorous structure specification by use of a very small and natural grammar. The SMILES notation system is also well suited for high-speed machine processing. The resulting ease of usage by the chemist and machine compatibility allow many highly efficient chemical computer applications to be designed including generation of a unique notation, constant-speed (zeroth order) database retrieval, flexible substructure searching, and property prediction models.

The CyberCell Database (CCDB): A comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli

Article

Feb 2004
NUCLEIC ACIDS RES

The CyberCell Database (CCDB: http://redpoll.pharmacy.ualberta.ca/CCDB) is a comprehensive, web‐accessible database designed to support and coordinate international efforts in modeling an Escherichia coli cell on a computer. The CCDB brings together both observed and derived quantitative data from numerous independent sources covering many aspects of the genomic, proteomic and metabolomic character of E.coli (strain K12). The database is self‐updating but also supports ‘community’ annotation, and provides an extensive array of viewing, querying and search options including a powerful, easy‐to‐use relational data extraction system.
