DrugBank: A Comprehensive Resource for in Silico Drug Discovery and Exploration

Abstract

DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14 000 protein or drug target sequences are linked to these drug entries. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. Many data fields are hyperlinked to other databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and a variety of structure viewing applets. The database is fully searchable supporting extensive text, sequence, chemical structure and relational query searches. Potential applications of DrugBank include in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. DrugBank is available at http://redpoll.pharmacy.ualberta.ca/drugbank/.
David S. Wishart*, Craig Knox, An Chi Guo, Savita Shrivastava, Murtaza Hassanali,
Paul Stothard, Zhan Chang and Jennifer Woolsey
Department of Computing Science and Department of Biological Sciences, University of Alberta, Edmonton,
AB, Canada T6G 2E8
Received August 14, 2005; Revised and Accepted October 8, 2005
INTRODUCTION
Until the 1980s, most of our knowledge about drugs, drug
mechanisms and drug receptors could fit in a few encyclopedic
books and a couple dozen schematic figures. However, with
the recent explosion in biological and chemical knowledge,
this is no longer the case. There is simply too much data
(images, models, structures and sequences) from too many
sources. Unfortunately, most of this information still resides
in textbooks or print journals. The limited drug or drug recep-
tor data that is electronically available is either inaccessible
(except through expensive subscriptions), inadequate or
widely scattered among many different public databases.
This state of affairs largely reflects the ‘two solitudes’ of
cheminformatics and bioinformatics. Neither discipline has
really tried to integrate with the other. As a consequence,
the wealth of electronic sequence/structure data that exists
today has never been well linked to the enormous body of
drug or chemical knowledge that has accumulated over the
past half century.
Recently, some notable efforts have been made to partially
overcome this ‘informatics gap’. The Therapeutic Target
Database or TTD is one such example (1). This very useful
web-based resource contains linked lists of names for >1100
small molecule drugs and drug targets (i.e. proteins). In addi-
tion to the TTD, a number of more comprehensive small
molecule databases have also emerged including KEGG
(2), ChEBI (3) and PubChem (http://pubchem.ncbi.nlm.nih.gov/).
Each contains tens of thousands of chemical
entries—including hundreds of small molecule drugs. All
three databases provide names, synonyms, images, structure
files and hyperlinks to other databases. Furthermore, both
KEGG and PubChem support structure similarity searches.
Unfortunately, these databases were not specifically designed
to be drug databases, and so they do not provide specific
pharmaceutical information or links to specific drug targets
(i.e. sequences). Furthermore, because these databases were
designed to be synoptic (containing <15 fields per compound
entry) they do not provide a comprehensive molecular sum-
mary of any given drug or its corresponding protein target.
More specialized drug databases such as PharmGKB (4) or on-
line pharmaceutical encyclopedias such as RxList (5) tend to
offer much more detailed clinical information about many
drugs (their pharmacology, metabolism and indications) but
they were not designed to contain structural, chemical or
physico-chemical information. Instead their data content is
targeted more towards pharmacists, physicians or consumers.
Ideally, what is needed is something that combines the
strengths of, say, PharmGKB, PubChem and Swiss-Prot to
create a single, fully searchable in silico drug resource that
links sequence, structure and mechanistic data about drug
molecules (including biotech drugs) with sequence, structure
and mechanistic data about their drug targets. Beyond its
obvious educational value, this kind of database could potentially
allow researchers to easily visualize and explore 3D drug
interactions, compare drug similarities or perform in silico
drug (or drug target) discovery. Here, we wish to describe
just such a database, called DrugBank.
*To whom correspondence should be addressed. Tel: +1 780 492 0383; Fax: +1 780 492 1071; Email: david.wishart@ualberta.ca
© The Author 2006. Published by Oxford University Press. All rights reserved.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access
version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press
are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but
only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org
D668–D672 Nucleic Acids Research, 2006, Vol. 34, Database issue
doi:10.1093/nar/gkj067
DATABASE DESCRIPTION
Fundamentally, DrugBank is a dual purpose bioinformatics–
cheminformatics database with a strong focus on quantitative,
analytic or molecular-scale information about both drugs and
drug targets. In many respects it combines the data-rich
molecular biology content normally found in curated sequence
databases such as Swiss-Prot and UniProt (6) with the equally
rich data found in medicinal chemistry textbooks and chemical
reference handbooks. By bringing these two disparate types of
information together into one unified and freely available
resource, we wanted to allow educators and researchers
from diverse disciplines and backgrounds (academic,
industrial, clinical, non-clinical) to conduct the type of in silico
learning and discovery that is now routine in the world of
genomics and proteomics.
The diversity of data types and the required breadth of
domain knowledge, combined with the fact that the data
were mostly ‘paper-bound’, made the assembly of DrugBank
both difficult and time-consuming. To compile, confirm and
validate this comprehensive collection of data, more than a
dozen textbooks, several hundred journal articles, nearly 30
different electronic databases, and at least 20 in-house or
web-based programs were individually searched, accessed,
compared, written or run over the course of four years. The
team of DrugBank archivists and annotators included two
accredited pharmacists, a physician and three bioinform-
aticians with dual training in computing science and molecular
biology/chemistry.
DrugBank currently contains >4100 drug entries, corres-
ponding to >12 000 different trade names and synonyms.
These drug entries were chosen according to the following
rules: the molecule must contain more than one type of atom,
be non-redundant, have a known chemical structure and be
identified as a drug or drug-like molecule by at least one
reputable data source. To facilitate more targeted research
and exploration, DrugBank is divided into four major categor-
ies: (i) FDA-approved small molecule drugs (>700 entries),
(ii) FDA-approved biotech (protein/peptide) drugs (>100 ent-
ries), (iii) nutraceuticals or micronutrients such as vitamins
and metabolites (>60 entries) and (iv) experimental drugs,
including unapproved drugs, de-listed drugs, illicit drugs,
enzyme inhibitors and potential toxins (>3200 entries). These
individual ‘Drug Types’ are also bundled into two larger
categories including all FDA drugs (Approved Drugs) and
All Compounds (Experimental + FDA + nutraceuticals).
DrugBank’s coverage for non-trivial FDA-approved drugs
is 80% complete. In addition, >14 000 protein (i.e. drug
target) sequences are linked to these drug entries. More com-
plete information about the numbers of drugs, drug targets and
non-redundant drug targets (including their sequences) is
available on the DrugBank ‘download’ page. The entire
database, including text, sequence, structure and image data,
occupies nearly 16 gigabytes, most of which can be freely
downloaded.
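The four inclusion rules stated above can be read as a simple filter. The following is a minimal sketch, not DrugBank's actual curation code: the `Candidate` record, its field names and the use of a structure key to detect redundancy are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for a molecule being considered for inclusion."""
    name: str
    formula: str        # flat formula such as "C23H32N2O5"; empty if unknown
    structure_key: str  # any canonical structure identifier; empty if unknown
    sources: tuple      # data sources that describe the molecule as a drug


def element_types(formula):
    """Return the set of element symbols that appear in a flat formula."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def select_entries(candidates):
    """Keep candidates that satisfy all four inclusion rules, in input order."""
    accepted, seen = [], set()
    for c in candidates:
        if len(element_types(c.formula)) < 2:  # more than one type of atom
            continue
        if not c.structure_key:                # known chemical structure
            continue
        if not c.sources:                      # at least one reputable source
            continue
        if c.structure_key in seen:            # non-redundant
            continue
        seen.add(c.structure_key)
        accepted.append(c)
    return accepted
```

In this sketch a brand-name duplicate of an accepted drug is dropped because it shares the same structure key, and a single-element substance such as molecular oxygen fails the first rule.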
DrugBank is a fully searchable web-enabled resource with
many built-in tools and features for viewing, sorting and
extracting drug or drug target data. Detailed instructions on
where to locate and how to use these browsing/search tools
are provided on the DrugBank homepage. As with any web-
enabled database, DrugBank supports standard text queries
(through the text search box located on the home page). It
also offers general database browsing using the ‘Browse’ and
‘PharmaBrowse’ buttons located at the top of each DrugBank
page. To facilitate general browsing, DrugBank is divided into
synoptic summary tables which, in turn, are linked to more
detailed ‘DrugCards’—in analogy to the very successful
GeneCards concept (7). All of DrugBank’s summary tables
can be rapidly browsed, sorted or reformatted (using up to six
different criteria) in a manner similar to the way PubMed
abstracts may be viewed. Clicking on the DrugCard button
found in the leftmost column of any given DrugBank summary
table opens a webpage describing the drug of interest in much
greater detail. Each DrugCard entry contains >80 data fields
with half of the information being devoted to drug/chemical
data and the other half devoted to drug target or protein data
(see Table 1). In addition to providing comprehensive
numeric, sequence and textual data, each DrugCard also
contains hyperlinks to other databases, abstracts, digital images
and interactive applets for viewing molecular structures
(Figure 1). In addition to the general browsing features,
DrugBank also provides a more specialized ‘PharmaBrowse’
feature. This is designed for pharmacists, physicians and
medicinal chemists who tend to think of drugs in clusters
of indications or drug classes. This particular browsing tool
provides navigation hyperlinks to >70 drug classes, which in
turn list the FDA-approved drugs associated with each class.
Each drug name is then linked to its respective DrugCard.
Table 1. Summary of the data fields or data types found in each DrugCard
Drug or compound information | Drug target or receptor information
Generic name | Target name
Brand name(s)/synonyms | Target synonyms
IUPAC name | Target protein sequence
Chemical structure/sequence | Target no. of residues
Chemical formula | Target molecular weight
PubChem/KEGG/ChEBI Links | Target pI
Swiss-Prot/GenBank Links | Target gene ontology
FDA/MSDS/RxList Links | Target general function
Molecular weight | Target specific function
Melting point | Target pathways
Water solubility | Target reactions
pKa or pI | Target Pfam domains
LogP or hydrophobicity | Target signal sequences
NMR/MS spectra | Target transmembrane regions
MOL/SDF/PDF text files | Target essentiality
MOL/PDB image files | Target GenBank protein ID
SMILES string | Target Swiss-Prot ID
Indication | Target PDB ID
Pharmacology | Target cellular location
Mechanism of action | Target DNA sequence
Biotransformation/absorption | Target chromosome location
Patient/physician information | Target locus
Metabolizing enzymes | Target SNPs/mutations
A more complete listing is provided on the DrugBank home page.
Figure 1. A screenshot montage of the DrugBank Database showing several possible views of information describing the drug Ramipril. Not all fields are shown.
A key distinguishing feature of DrugBank from other on-
line drug resources is its extensive support for higher level
database searching and selecting functions. In addition to the
data viewing and sorting features already described, Drug-
Bank also offers a local BLAST (8) search that supports
both single and multiple sequence queries, a Boolean text
search [using GLIMPSE; (9)], a chemical structure search
utility and a relational data extraction tool (10). These can
all be accessed via the database navigation bar located at
the top of every DrugBank page.
The BLAST search (SeqSearch) is particularly useful as it
can potentially allow users to quickly and simply identify drug
leads from newly sequenced pathogens. Specifically, a new
sequence, a group of sequences or even an entire proteome can
be searched against DrugBank’s database of known drug target
sequences by pasting the FASTA formatted sequence (or
sequences) into the SeqSearch query box and pressing the
‘submit’ button. A significant hit reveals, through the associ-
ated DrugCard hyperlink, the name(s) or chemical structure(s)
of potential drug leads that may act on that query protein (or
proteome).
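Because SeqSearch accepts one sequence, several sequences or a whole proteome as pasted FASTA text, it can help to split and sanity-check the input before submitting it. The sketch below is an illustrative pre-submission check only; the function names and the set of accepted residue codes are assumptions, and it does not reproduce how DrugBank or BLAST parses its input.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Amino acid one-letter codes plus common ambiguity, stop and gap symbols.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO*-")


def invalid_records(records):
    """Return headers of records that are empty or contain unexpected characters."""
    return [h for h, seq in records
            if not seq or set(seq.upper()) - VALID_RESIDUES]
```

Records flagged by `invalid_records` can be corrected or removed before the remaining sequences are pasted into the query box.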
DrugBank’s structure similarity search tool (ChemQuery)
can be used in a similar manner to its sequence search tools.
Users may sketch (through ACD’s freely available chemical
sketching applet) or paste a SMILES string (11) of a possible
lead compound into the ChemQuery window. Submitting the
query launches a structure similarity search tool that looks for
common substructures from the query compound that match
DrugBank’s database of known drug or drug-like compounds.
High scoring hits are presented in a tabular format with
hyperlinks to the corresponding DrugCards (which in turn link to
the protein target). The ChemQuery tool allows users to
quickly determine whether their compound of interest acts
on the desired protein target. This kind of chemical structure
search may also reveal whether the compound of interest may
unexpectedly interact with unintended protein targets. In
addition to these structure similarity searches, the ChemQuery
utility also supports compound searches on the basis of
chemical formula and molecular weight ranges.
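A search by chemical formula and molecular weight range can be approximated offline with a few lines of code. The sketch below is an assumption-laden illustration, not ChemQuery's implementation: it handles only flat formulas without parentheses or charges, and its small table of rounded standard atomic weights covers just the elements most common in small-molecule drugs.

```python
import re

# Rounded standard atomic weights for a handful of common elements.
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}


def parse_formula(formula):
    """Turn a flat formula such as 'C23H32N2O5' into {'C': 23, 'H': 32, ...}."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHT[s] * n for s, n in parse_formula(formula).items())


def in_weight_range(formulas, low, high):
    """Return the formulas whose molecular weight lies within [low, high]."""
    return [f for f in formulas if low <= molecular_weight(f) <= high]
```

For ramipril (C23H32N2O5) this gives a molecular weight of about 416.5, so the compound would be returned by a range query of, say, 400 to 450.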
DrugBank’s data extraction utility (Data Extractor) employs
a simple relational database system that allows users to select
one or more data fields and to search for ranges, occurrences or
partial occurrences of words, strings or numbers. The data
extractor uses clickable web forms so that users may intuit-
ively construct SQL-like queries. Using a few mouse clicks, it
is relatively simple to construct very complex queries (‘find all
drugs less than 600 daltons with LogPs less than 3.2 that are
antihistamines’) or to build a series of highly customized
tables. The output from these queries is provided in HTML
format with hyperlinks to all associated DrugCards.
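The example query quoted above maps naturally onto a relational filter. The sketch below uses an in-memory SQLite database to show the idea; the table name, column names and the four rows are invented placeholders, not DrugBank's schema or data.

```python
import sqlite3

# Hypothetical single-table schema; DrugBank's real schema is not described here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drugs (name TEXT, mol_weight REAL, logp REAL, drug_class TEXT)"
)
conn.executemany(
    "INSERT INTO drugs VALUES (?, ?, ?, ?)",
    [
        # Placeholder rows with made-up values, for illustration only.
        ("example_a", 255.4, 2.9, "antihistamine"),
        ("example_b", 388.9, 4.1, "antihistamine"),
        ("example_c", 650.0, 1.0, "antihistamine"),
        ("example_d", 300.0, 2.0, "ACE inhibitor"),
    ],
)


def find_drugs(conn, max_weight, max_logp, drug_class):
    """Names of drugs below both thresholds whose class contains drug_class."""
    rows = conn.execute(
        "SELECT name FROM drugs "
        "WHERE mol_weight < ? AND logp < ? AND drug_class LIKE ? "
        "ORDER BY name",
        (max_weight, max_logp, f"%{drug_class}%"),
    )
    return [row[0] for row in rows]
```

Calling `find_drugs(conn, 600, 3.2, "antihistamine")` corresponds to 'find all drugs less than 600 daltons with LogPs less than 3.2 that are antihistamines'; the `LIKE` pattern stands in for the partial-occurrence matching that the Data Extractor offers.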
QUALITY ASSURANCE, COMPLETENESS
AND CURATION
Every effort is made to ensure that DrugBank is as complete,
correct and current as possible. Each DrugCard is entered or
prepared by one member of the curation team and separately
validated by a second member of the curation team. Additional
spot checks are routinely performed on each entry by senior
members of the curation group, including a physician, an
accredited pharmacist and two PhD-level biochemists. Several
software packages including text mining tools, chemical para-
meter calculators and protein annotation tools (10) have been
modified or specifically developed to aid in DrugBank’s data
entry and data validation. These tools collate and display text
(and images) from multiple sources allowing the curators to
compare, assess, enter and correct drug or drug target
information. In addition to using a CVS (Concurrent Versions System),
all changes and edits to the central database are monitored,
dated and displayed on the DrugBank ‘download’ page using a
specially developed text tracking system. A second text track-
ing system has been implemented to monitor the completeness
(0–100%) of each field (for all approved drugs) and to display
up-to-date statistics on the number of drugs, drug targets and
non-redundant sequences in various drug categories. This
information is also displayed on the ‘download’ page. To
ensure DrugBank is current, new drugs (approved and experi-
mental) are identified using continuously running screen-
scraping tools linked to the FDA, the PDB and RxList web-
sites. Backfilling of older, more obscure and orphan drugs is
ongoing and done manually. Drug targets are identified and
confirmed using multiple sources (PubMed, TTD, FDA labels,
RxList, PharmGKB, textbooks) as are all drug structures
(KEGG, PubChem, images from FDA labels).
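The per-field completeness statistic (0–100%) mentioned above is straightforward to compute once entries are available as records. The following is a minimal sketch under the assumption that each entry is a dictionary and that a missing key, `None`, an empty string or an empty list all count as unfilled; the field names in the usage note are hypothetical, and this is not the tracking system the authors describe.

```python
def field_completeness(records, fields):
    """Percentage (0-100) of records with a non-empty value for each field."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, "", []))
        result[field] = 100.0 * filled / total if total else 0.0
    return result
```

Given four entries that all have a `generic_name` but only one with a `melting_point`, the function reports 100.0 and 25.0 for those two fields respectively.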
CONCLUSION
In summary, DrugBank is a comprehensive, web-accessible
database that brings together quantitative chemical, physical,
pharmaceutical and biological data about thousands of well-
studied drugs and drug targets. DrugBank is primarily focused
on providing the kind of detailed molecular data needed to
facilitate drug discovery and drug development. This includes
physical property data, structure and image files,
pharmacological and physiological data about thousands of drug
products as well as extensive molecular biological information
about their corresponding drug targets. DrugBank is unique,
not only in the type of data it provides but also in the level of
integration and depth of coverage it achieves. In addition to its
extensive small molecule drug coverage, DrugBank is, to our
knowledge, the only public database that provides
any significant information about the 110+ approved biotech
drugs. DrugBank also supports an extensive array of visual-
izing, querying and search options including a structure sim-
ilarity search tool and an easy-to-use relational data extraction
system. It is hoped that DrugBank will serve as a useful
resource not only to members of the pharmaceutical research
community but also to educators, students, clinicians and the
general public.
ACKNOWLEDGEMENTS
The authors wish to thank Genome Prairie, a division of
Genome Canada for financial support. Funding to pay the
Open Access publication charges for this article was provided
by Genome Canada.
Conflict of interest statement. None declared.
REFERENCES
1. Chen,X., Ji,Z.L. and Chen,Y.Z. (2002) TTD: therapeutic target database.
Nucleic Acids Res., 30, 412–415.
2. Kanehisa,M., Goto,S., Kawashima,S., Okuno,Y. and Hattori,M. (2004)
The KEGG resource for deciphering the genome. Nucleic Acids Res., 32,
D277–D280.
3. Brooksbank,C., Cameron,G. and Thornton,J. (2005) The European
Bioinformatics Institute’s data resources: towards systems biology.
Nucleic Acids Res., 33, D46–D53.
4. Hewett,M., Oliver,D.E., Rubin,D.L., Easton,K.L.,
Stuart,J.M., Altman,R.B. and Klein,T.E. (2002) PharmGKB:
the Pharmacogenetics Knowledge Base. Nucleic Acids Res., 30,
163–165.
5. Hatfield,C.L., May,S.K. and Markoff,J.S. (1999) Quality of consumer
drug information provided by four web sites. Am. J. Health Syst. Pharm.,
56, 2308–2311.
6. Bairoch,A., Apweiler,R., Wu,C.H., Barker,W.C., Boeckmann,B.,
Ferro,S., Gasteiger,E., Huang,H., Lopez,R., Magrane,M. et al. (2005) The
Universal Protein Resource (UniProt). Nucleic Acids Res., 33,
D154–D159.
7. Rebhan,M., Chalifa-Caspi,V., Prilusky,J. and Lancet,D. (1998)
GeneCards: a novel functional genomics compendium with automated
data mining and query reformulation support. Bioinformatics, 14,
656–664.
8. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z.,
Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a
new generation of protein database search programs. Nucleic Acids Res.,
25, 3389–3402.
9. Manber,U. and Bigot,P. (1997) USENIX Symposium on Internet
Technologies and Systems (NSITS’97), Monterey, CA, pp. 231–239.
10. Sundararaj,S., Guo,A., Habibi-Nazhad,B., Rouani,M., Stothard,P.,
Ellison,M. and Wishart,D.S. (2004) The CyberCell Database (CCDB): a
comprehensive, self-updating, relational database to coordinate and
facilitate in silico modeling of Escherichia coli. Nucleic Acids Res., 32,
D293–D295.
11. Weininger,D. (1988) SMILES 1. Introduction and encoding rules.
J. Chem. Inf. Comput. Sci., 28, 31–38.
... The conclusive evidence of clonal unrelatedness helped guide the clinician's decision to recommend chemotherapy. ABVD (Adriamycin, Bleomycin, Vinblastine, Dacarbazine) chemotherapy is widely recognized as a standard firstline treatment for cHL 33,34 . To date, the patient has undergone two series of ABVD treatment, resulting in complete remission of cHL as confirmed by PET-CT scans. ...
... Our analysis has revealed not only the upregulation of the proteasome subunit protein PMSB but also PMSA/C/D/E in HRS cells. Therefore, it may be beneficial to consider the use of additional proteasome inhibitors for therapeutic intervention 34 . In addition, proteasome inhibitors have demonstrated the ability to sensitize tumor cells to various anti-tumor therapies, indicating their potential role in managing chemo-resistance 35 . ...
Preprint
Full-text available
Coexistence of two cancer types in the same organ presents challenges for clinical decision-making, calling for personalized treatment strategies. Deep Visual Proteomics (DVP) combines AI driven single cell type analysis with laser microdissection and ultrasensitive mass spectrometry. In a composite case of classical Hodgkin lymphoma (cHL) and small lymphocytic lymphoma (SLL) in a single patient, we investigated the potential of DVP to inform precision oncology. We quantified the proteomic landscapes in the cHL and SLL to a depth of thousands of proteins. Our analysis revealed distinct proteome profiles in cHL and SLL populations, highlighting their clonal unrelatedness. Our data suggested standardized chemotherapy and interleukin-4 inhibition as potential strategies to manage chemo-resistance - instead of bone marrow transplantation. DVP highlighted minichromosome maintenance protein and proteasome inhibitors for cHL and H3K27 methylation and receptor tyrosine kinase inhibitors for SLL as subtype-specific treatments. Thus cell-type specific insights of DVP can guide personalized oncological treatments.
... The Electronic Medicines Compendium (EMC), and the Chemical Database of bioactive molecules with drug-like properties European molecular Biology laboratory (ChEMBL) databases were used to identify the physiochemical properties, pharmacokinetics, and pharmacology data of montelukast, zafirlukast, beclomethasone, salbutamol. [15][16][17][18] ...
... EMC, DrugBank, PubChem, ChEMBL and wider literature were used to find pharmacokinetic data, by searching for "drug name" and "parameter required" ( Table 2). [15][16][17][18][20][21][22][23][24][25] ...
Preprint
Full-text available
Aims. To determine the suspected adverse drug reaction (ADR) profile of leukotriene receptor antagonists (LTRAs: montelukast and zafirlukast) relative to first-line asthma medications short-acting beta agonists (SABA: salbutamol) and inhaled corticosteroid (ICH: beclomethasone) in the United Kingdom. To determine chemical and pharmacological rationale for the suspected ADR signals. Methods. Properties of the asthma medications (pharmacokinetics and pharmacology) were datamined from the chemical database of bioactive molecules with drug-like properties, European molecular Biology laboratory (ChEMBL). Suspected ADR profiles of the asthma medications was curated from the Medicines and Healthcare products Regulatory Authority (MHRA) Yellow Card interactive drug analysis profiles (iDAP) and concatenated to the standardised prescribing levels (Open Prescribing) between 2018-2023. Results. Total ADRs per 100,000 Rx (P < .001) and psychiatric system organ class (SOC) ADRs (P < .001) reached statistical significance. Montelukast exhibited the greatest ADR rate at 15.64 per 100,000 Rx. The low lipophilic ligand efficiency (LLE = 0.15) of montelukast relative to the controls may explain the promiscuity of interactions with off-target G-coupled protein receptors (GPCRs). This included the dopamine signalling axis, which in combination with bioaccumulation in the cerebrospinal fluid (CSF) to achieve Cmax beyond a typical dose can be ascribed to the psychiatric side effects observed. Cardiac ADRs did not reach statistical significance but inhibitory interaction of montelukast with the MAP kinase p38 alpha (a cardiac protective pathway) was identified as a potential rationale for montelukast withdrawal cardiac effects. Conclusion. Relative to the controls, montelukast displays a range of suspected system organ class level ADRs. For psychiatric ADR, montelukast is statistically significant (P < .001). 
A mechanistic hypothesis is proposed based on polypharmacological interactions in combination with CSF levels attained. This work further supports the close monitoring of montelukast for neuropsychiatric side effects.
... 34 SMILES is a simplified line notation utilized to portray the configuration of chemical entities. The SMILES were obtained from DrugBank, 35 PubChem, 36 and ChEMBL 37 between the years 2022 and 2023 ( Figure 2). All the drugrelated databases are shown in green boxes along with their mapping process in Figure 2. SMILES strings were transformed into a fixed-length vector representation using extended connectivity fingerprints (ECFP) generation methods. ...
Article
Full-text available
Idiopathic pulmonary fibrosis (IPF) affects an estimated global population of around 3 million individuals. IPF is a medical condition with an unknown cause characterized by the formation of scar tissue in the lungs, leading to progressive respiratory disease. Currently, there are only two FDA-approved small molecule drugs specifically for the treatment of IPF and this has created a demand for the rapid development of drugs for IPF treatment. Moreover, denovo drug development is time and cost-intensive with less than a 10% success rate. Drug repurposing currently is the most feasible option for rapidly making the drugs to market for a rare and sporadic disease. Normally, the repurposing of drugs begins with a screening of FDA-approved drugs using computational tools, which results in a low hit rate. Here, an integrated machine learning-based drug repurposing strategy is developed to significantly reduce the false positive outcomes by introducing the predock machine-learning-based predictions followed by literature and GSEA-assisted validation and drug pathway prediction. The developed strategy is deployed to 1480 FDA-approved drugs and to drugs currently in a clinical trial for IPF to screen them against "TGFB1", "TGFB2", "PDGFR-a", "SMAD-2/3", "FGF-2", and more proteins resulting in 247 total and 27 potentially repurposable drugs. The literature and GSEA validation suggested that 72 of 247 (29.14%) drugs have been tried for IPF, 13 of 247 (5.2%) drugs have already been used for lung fibrosis, and 20 of 247 (8%) drugs have been tested for other fibrotic conditions such as cystic fibrosis and renal fibrosis. Pathway prediction of the remaining 142 drugs was carried out resulting in 118 distinct pathways. Furthermore, the analysis revealed that 29 of 118 pathways were directly or indirectly involved in IPF and 11 of 29 pathways were directly involved. Moreover, 15 potential drug combinations are suggested for showing a strong synergistic effect in IPF. 
The drug repurposing strategy reported here will be useful for rapidly developing drugs for treating IPF and other related conditions.
... Chemical Structure. For the chemical structure data, molecular structure openly available from Drugbank [39] and Pubchem [22] were used to compute Extended Connectivity Fingerprints (Morgan Fingerprints, ECFP) [40]. The ECFP were converted from a binary vector to chemical clusters by calculating pairwise dice similarity coefficients and then using one iteration of hierarchical clustering so that each drug is assigned to its own chemical structure cluster. ...
Preprint
Aim: This study evaluated the use of machine learning in leveraging drug ADME data to develop a novel anticholinergic burden (AB) scale and compared its performance to previously published scales. Methods: Experimental and in silico ADME data were collected for antimuscarinic activity, blood-brain barrier penetration, bioavailability, chemical structure and P-gp substrate profile. These five ADME properties were used to train an unsupervised model to assign anticholinergic burden scores to drugs. The performance of the model was evaluated through 10-fold cross-validation and compared with the clinical ACB scale and non-clinical ATS scale which is based primarily on muscarinic binding affinity. Results: In silico software (ADMET predictor ®) used for screening drugs for their blood-brain barrier (BBB) penetration correctly identified some drugs that do not cross the BBB. The mean AUC for the unsupervised and ACB scale based on five selected features was 0.76 and 0.64 respectively. The unsupervised model agreed with the ACB scale on the classification of more than half of the drugs (n=49 of m=88) and agreed on the classification of less than half the drugs in the ATS scale (n=12/25). Conclusion: Our findings suggest that the commonly used ACB scale may misclassify certain drugs due to their inability to cross the BBB. On the other hand, the ATS scale would misclassify drugs solely depending on muscarinic binding affinity without considering ADME properties. Machine learning models can be trained on these features to build classification models that are easy to update and have greater generalizability.
... Pau dataset is derived from the SIDER database (Kuhn et al., 2010) which contains information about drugs and their recorded side effects. Miz dataset includes information about drug-protein interactions and drug-side effect interactions, obtained from the DrugBank (Wishart et al., 2006) and SIDER database, respectively. There were 658 drugs with both targeted protein and side effect information. ...
Article
Full-text available
Drug-side effect prediction has become an essential area of research in the field of pharma-cology. As the use of medications continues to rise, so does the importance of understanding and mitigating the potential risks associated with them. At present, researchers have turned to data-driven methods to predict drug-side effects. Drug-side effect prediction is a link prediction problem, and the related data can be described from various perspectives. To process these kinds of data, a multi-view method, called Multiple Kronecker RLS fusion-based link propagation (MKronRLSF-LP), is proposed. MKronRLSF-LP extends the Kron-RLS by finding the consensus partitions and multiple graph Laplacian constraints in the multi-view setting. Both of these multi-view settings contribute to a higher quality result. Extensive experiments have been conducted on drug-side effect datasets, and our empirical results provide evidence that our approach is effective and robust.
... At present, a large number of pertinent databases and tools offer vital support for the field of network pharmacology in TCM. Databases frequently employed in the network pharmacology of TCM include TCMSP [84], drug information databases [85], PubChem [86], target interaction databases [87], and gene-disease association databases [88], which provide an understanding of the effects of herbs on diseases. The network pharmacology algorithm is a frequently employed network clustering technique that commences with a random node (drug, target, or disease), computes the similarity between that node and its neighboring nodes, and constructs a "drug-target-disease" network. ...
Article
Osteoarthritis (OA) is the most common type of arthritis and affects more than 240 million people worldwide; the most frequently affected areas are the hips, knees, feet, and hands. OA pathophysiology is multifactorial, involving genetic, developmental, metabolic, traumatic, and inflammation factors. Therefore, treatments able to address several path mechanisms can help control OA. Network pharmacology is developing as a next-generation research strategy to shift the paradigm of drug discovery from “one drug, one target” to “multi-component, multi-target”. In this paper, network pharmacology is employed to investigate the potential role of Paeoniae Radix Alba (PRA) in the treatment of OA. PRA is a natural product known for its protective effects against OA, which has recently drawn attention because of its ability to provide physiological benefits with fewer toxic effects. This review highlights the anti-inflammatory properties of PRA in treating OA. PRA can be used alone or in combination with conventional therapies to enhance their effectiveness and reduce side effects. The study also demonstrates the use of network pharmacology as a cost-effective and time-saving method for predicting therapeutic targets of PRA in OA treatment.
Article
Sporotrichosis is recognized as the predominant subcutaneous mycosis in South America, attributed to pathogenic species within the Sporothrix genus. Notably, in Brazil, Sporothrix brasiliensis emerges as the principal species, exhibiting significant sapronotic, zoonotic and enzootic epidemic potential. Consequently, the discovery of novel therapeutic agents for the treatment of sporotrichosis is imperative. The present study is dedicated to the repositioning of pharmaceuticals for sporotrichosis therapy. To achieve this goal, we designed a pipeline with the following steps: (a) compilation and preparation of Sporothrix genome data; (b) identification of orthologous proteins among the species; (c) identification of homologous proteins in publicly available drug-target databases; (d) selection of Sporothrix essential targets using validated genes from Saccharomyces cerevisiae; (e) molecular modeling studies; and (f) experimental validation of selected candidates. Based on this approach, we were able to prioritize eight drugs for in vitro experimental validation. Among the evaluated compounds, everolimus and bifonazole demonstrated minimum inhibitory concentration (MIC) values of 0.5 µg/mL and 4.0 µg/mL, respectively. Subsequently, molecular docking studies suggest that bifonazole and everolimus may target specific proteins within S. brasiliensis, namely sterol 14-α-demethylase and serine/threonine-protein kinase TOR, respectively. These findings shed light on the potential binding affinities and binding modes of bifonazole and everolimus with their probable targets, providing a preliminary understanding of the antifungal mechanism of action of these compounds. In conclusion, our research advances the understanding of the therapeutic potential of bifonazole and everolimus, supporting their further investigation as antifungal agents for sporotrichosis in prospective hit-to-lead and preclinical investigations.
Article
Ore mineral and host lithologies have been sampled with 89 oriented samples from 14 sites in the Naica District, northern Mexico. Magnetic parameters are used to characterise the samples: saturation magnetization, density, low- and high-temperature magnetic susceptibility, remanence intensity, Koenigsberger ratio, Curie temperature and hysteresis parameters. Rock magnetic properties are controlled by variations in titanomagnetite content and hydrothermal alteration. Post-mineralization hydrothermal alteration seems to be the major event that affected the minerals and magnetic properties. Curie temperatures are characteristic of titanomagnetites or titanomaghemites. Hysteresis parameters indicate that most samples have pseudo-single domain (PSD) magnetic grains. Alternating field (AF) demagnetization and isothermal remanence (IRM) acquisition both indicate that natural and laboratory remanences are carried by MD-PSD spinels in the host rocks. The trend of NRM intensity vs susceptibility suggests that the carrier of remanent and induced magnetization is the same in all cases (spinels). The Koenigsberger ratio ranges from 0.05 to 34.04, indicating the presence of MD and PSD magnetic grains. Constraints on the geometry of the intrusive source body developed in the model of the magnetic anomaly are obtained by quantifying the relative contributions of induced and remanent magnetization components.
Article
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic, and statistical refinements permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is described for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities.
Article
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
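The idea of a position-specific score matrix mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the PSI-BLAST procedure itself (which, among other things, weights sequences and handles gaps); it assumes a small ungapped toy alignment, a uniform background distribution, and a fixed pseudocount, and the names `build_pssm`, `score`, and `toy_alignment` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (illustrative only): ungapped sequences of equal length,
    a uniform background frequency, and a constant pseudocount.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            # Log-odds of the observed frequency against the background.
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, segment):
    """Sum the per-position scores of an ungapped segment."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


toy_alignment = ["ACDE", "ACDF", "ACEE", "GCDE"]  # made-up sequences
toy_pssm = build_pssm(toy_alignment)
print(score(toy_pssm, "ACDE"), score(toy_pssm, "WWWW"))
```

A segment that resembles the alignment columns scores higher than an unrelated one, which is the property a profile search relies on.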
Article
Modern biology is shifting from the 'one gene one postdoc' approach to genomic analyses that include the simultaneous monitoring of thousands of genes. The importance of efficient access to concise and integrated biomedical information to support data analysis and decision making is therefore increasing rapidly, in both academic and industrial research. However, knowledge discovery in the widely scattered resources relevant for biomedical research is often a cumbersome and non-trivial task, one that requires a significant amount of training and effort. To develop a model for a new type of topic-specific overview resource that provides efficient access to distributed information, we designed a database called 'GeneCards'. It is a freely accessible Web resource that offers one hypertext 'card' for each of the more than 7000 human genes that currently have an approved gene symbol published by the HUGO/GDB nomenclature committee. The presented information aims at giving immediate insight into current knowledge about the respective gene, including a focus on its functions in health and disease. It is compiled by Perl scripts that automatically extract relevant information from several databases, including SWISS-PROT, OMIM, Genatlas and GDB. Analyses of the interactions of users with the Web interface of GeneCards triggered development of easy-to-scan displays optimized for human browsing. Also, we developed algorithms that offer 'ready-to-click' query reformulation support, to facilitate information retrieval and exploration. Many of the long-term users turn to GeneCards to quickly access information about the function of very large sets of genes, for example in the realm of large-scale expression studies using 'DNA chip' technology or two-dimensional protein electrophoresis. Freely available at http://bioinformatics.weizmann.ac.il/cards/ Contact: cards@bioinformatics.weizmann.ac.il
Article
The quality of drug-specific information available to consumers on the Internet was studied. The 30 most commonly dispensed prescription drugs were selected to represent those medications for which consumers would be seeking information. A Web page evaluation form was developed to objectively evaluate each site in terms of sponsors, references, recency of updates, ease of use, overall organization, and other characteristics. A second form was developed to qualitatively and quantitatively assess the drug information provided by the sites. Four Internet sites, MedicineNet, RxList, Drug InfoNet, and thriveonline, were evaluated. Authors, contributors, and references were identified for three of the sites. All sites had disclaimers advising patients to seek the advice of a health care professional, all indexed drug information by both brand name and generic name, and all were well organized. Only RxList and MedicineNet contained information on all the drugs evaluated. For the drugs documented, RxList, MedicineNet, Drug InfoNet, and thriveonline contained 84%, 60%, 87%, and 72% of the 22 variables assessed, respectively. The accuracy of the information provided was greater than 98% for all the sites. Only two of four Internet sites containing consumer drug information included all the prescription drugs being evaluated. Most but not all of the information on the four sites was accurate.
Article
A number of proteins and nucleic acids have been explored as therapeutic targets. These targets are subjects of interest in different areas of biomedical and pharmaceutical research and in the development and evaluation of bioinformatics, molecular modeling, computer-aided drug design and analytical tools. A publicly accessible database that provides comprehensive information about these targets is therefore helpful to the relevant communities. The Therapeutic Target Database (TTD) is designed to provide information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets. Cross-links to other databases are also introduced to facilitate the access of information about the sequence, 3D structure, function, nomenclature, drug/ligand binding properties, drug usage and effects, and related literature for each target. This database can be accessed at http://xin.cz3.nus.edu.sg/group/ttd/ttd.asp and it currently contains entries for 433 targets covering 125 disease conditions along with 809 drugs/ligands directed at each of these targets. Each entry can be retrieved through multiple methods including target name, disease name, drug/ligand name, drug/ligand function and drug therapeutic classification.
Article
The Pharmacogenetics Knowledge Base (PharmGKB; http://www.pharmgkb.org/) contains genomic, phenotype and clinical information collected from ongoing pharmacogenetic studies. Tools to browse, query, download, submit, edit and process the information are available to registered research network members. A subset of the tools is publicly available. PharmGKB currently contains over 150 genes under study, 14 Coriell populations and a large ontology of pharmacogenetics concepts. The pharmacogenetic concepts and the experimental data are interconnected by a set of relations to form a knowledge base of information for pharmacogenetic researchers. The information in PharmGKB, and its associated tools for processing that information, are tailored for leading-edge pharmacogenetics research. The PharmGKB project was initiated in April 2000 and the first version of the knowledge base went online in February 2001.
Article
SMILES (Simplified Molecular Input Line Entry System) is a chemical notation system designed for modern chemical information processing. Based on principles of molecular graph theory, SMILES allows rigorous structure specification by use of a very small and natural grammar. The SMILES notation system is also well suited for high-speed machine processing. The resulting ease of usage by the chemist and machine compatibility allow many highly efficient chemical computer applications to be designed including generation of a unique notation, constant-speed (zeroth order) database retrieval, flexible substructure searching, and property prediction models.
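To give a sense of how small the SMILES grammar is, the following Python sketch tokenizes a SMILES string and counts the atoms written explicitly in it. It is an illustration only, not a full SMILES parser: it assumes a common subset of the notation (organic-subset atoms, aromatic atoms, bracket atoms, bonds, branches, ring-closure digits), does not validate ring closures or valences, and does not count implicit hydrogens. The names `TOKEN` and `count_explicit_atoms` are hypothetical.

```python
import re
from collections import Counter

# One alternative per token class: bracket atom, aliphatic atom,
# aromatic atom, and everything else (bonds, branches, ring digits, dot).
TOKEN = re.compile(
    r"(\[[^\]]+\])|(Cl|Br|[BCNOPSFI])|([bcnops])|([-=#$:/\\().%0-9])"
)
# Inside brackets: optional isotope number, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def count_explicit_atoms(smiles):
    """Count atoms written explicitly in a SMILES string (subset only)."""
    counts = Counter()
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(
                f"unrecognised character {smiles[pos]!r} at position {pos}"
            )
        bracket, atom, aromatic, _other = match.groups()
        if bracket:
            symbol = BRACKET_SYMBOL.match(bracket)
            if symbol is None:
                raise ValueError(f"unsupported bracket atom {bracket!r}")
            counts[symbol.group(1).capitalize()] += 1
        elif atom:
            counts[atom] += 1
        elif aromatic:
            # Aromatic atoms are lowercase in SMILES; report the element.
            counts[aromatic.upper()] += 1
        pos = match.end()
    return counts


print(dict(count_explicit_atoms("CC(=O)Oc1ccccc1C(=O)O")))  # aspirin
```

For aspirin (C9H8O4) the sketch reports nine carbons and four oxygens; the eight hydrogens are implicit in the notation and therefore not counted.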
Article
The CyberCell Database (CCDB: http://redpoll.pharmacy.ualberta.ca/CCDB) is a comprehensive, web‐accessible database designed to support and coordinate international efforts in modeling an Escherichia coli cell on a computer. The CCDB brings together both observed and derived quantitative data from numerous independent sources covering many aspects of the genomic, proteomic and metabolomic character of E.coli (strain K12). The database is self‐updating but also supports ‘community’ annotation, and provides an extensive array of viewing, querying and search options including a powerful, easy‐to‐use relational data extraction system.