
In silico drug repositioning: from large-scale transcriptome data to therapeutics


Abstract

Drug repositioning is an attractive alternative to conventional drug development when new beneficial effects of old drugs are clinically validated, because pharmacokinetic and safety profiles are generally already available. Since ~30% of drugs newly approved by the US Food and Drug Administration (FDA) are developed through drug repositioning, identifying novel uses for existing drugs is an emerging strategy for developing disease treatments. With advances in next-generation sequencing technologies, the available transcriptome data related to diseases have expanded rapidly. Harnessing these resources enables a better understanding of disease mechanisms and drug mode of action (MOA), and moves toward personalized pharmacotherapy. In this review, we briefly outline publicly available large-scale transcriptome databases and tools for drug repositioning. We also highlight recent approaches leading to the discovery of novel drug targets, drug response biomarkers, drug indications, and drug MOA.
REVIEW
Ok-Seon Kwon (1, 3) · Wankyu Kim (2) · Hyuk-Jin Cha (1, 3) · Haeseung Lee (2)
Received: 30 January 2019 / Accepted: 26 July 2019
© The Pharmaceutical Society of Korea 2019
Keywords Drug repositioning · In silico drug repositioning · Transcriptome · Pharmacogenomics · Big data
Introduction
Despite major scientific and technological advances in basic research as well as in drug discovery and development, the number of new drugs approved by the US Food and Drug Administration (FDA) per billion US dollars spent on research and development has steadily declined. This trend is called Eroom's Law ("Moore" spelled backward). It is the reverse of the more familiar Moore's Law, which describes the exponential increase in the number of transistors in a dense integrated circuit (Scannell et al. 2012). To improve the productivity and success rate of drug discovery, many novel strategies have been developed, including target structure-based drug design (Chen and Butte 2016), disease modeling with stem cell technology (Xia and Wong 2012; Kondo et al. 2017), and drug repositioning (or repurposing) (Hernandez et al. 2017).
Among these strategies, drug repositioning could potentially overcome many of the challenges of drug discovery (Hernandez et al. 2017; Novac 2013). In this approach, the toxicity, pharmacokinetic, and pharmacodynamic profiles of a given drug have already been characterized in earlier clinical trials. Once the efficacy of an old drug or drug candidate for a new indication is established, successful therapeutic intervention can be anticipated with a lower risk of failure in clinical trials. There are many examples of serendipitous success in drug repositioning, such as thalidomide (originally developed for sedation and nausea, and now prescribed for multiple myeloma), sildenafil (originally developed for angina but now used to treat male erectile dysfunction), and raloxifene (originally developed for breast cancer and now an established treatment for osteoporosis) (Pushpakom et al. 2018). Currently, ~30% of drugs newly approved by the FDA are derived from drug repositioning. Encouraged by these successes, researchers are applying more systematic approaches to identify new indications for old drugs, drug candidates, and drugs withdrawn from the market (Kim et al. 2016).

Corresponding author: Haeseung Lee, haeseung@ewha.ac.kr
(1) Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Republic of Korea
(2) Ewha Research Center for Systems Biology, Department of Life Sciences, Ewha Womans University, Seoul 03760, Republic of Korea
(3) College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
Arch. Pharm. Res. Online ISSN 1976-3786, Print ISSN 0253-6269
https://doi.org/10.1007/s12272-019-01176-3
Recent advances in next-generation sequencing and other high-throughput technologies have rapidly expanded the available biological and chemical datasets, ushering in the era of big data (Costa 2014). Computational methods that take advantage of these datasets have also been actively developed to link diseases with novel therapeutics (Jin and Wong 2014). In particular, data-driven approaches based on large-scale transcriptome data have accelerated the discovery of candidate drugs across a wide range of diseases (Chen and Butte 2016), and some of these candidates have entered clinical trials (Jahchan et al. 2013). Transcriptional profiling, especially of mRNA, provides a comprehensive view of biological changes that reflect the overall consequences of multiple genetic variations. Thus, disease mechanisms and drug mode of action (MOA) can often be elucidated from altered transcriptome profiles. In this review, we describe publicly available resources widely used for transcriptome-based drug repositioning. We then focus on recently developed computational approaches that use these databases to identify new drug targets, drug response biomarkers, drug indications, and drug MOA.
Public resources for in silico drug repositioning
A number of notable reference datasets widely used in transcriptome-based drug repositioning have recently been published and updated (Table 1) (Chen and Butte 2016; Kannan et al. 2016). Depending on the biological perspective they describe, these can be divided into (i) disease-based, (ii) drug-based, and (iii) knowledge-based datasets. Disease-based datasets such as The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE) include gene expression profiles and detailed information on clinical or preclinical samples. Drug-based datasets such as CMap, LINCS, and CTRP include gene expression profiles of drug perturbations, drug efficacy, known targets, and other drug-related features. Knowledge-based datasets such as Gene Ontology, MSigDB, and KEGG include gene or protein functional annotations, which are used to understand or interpret the mechanisms of disease or drug action from a set of genes. The datasets listed here have been used extensively for hypothesis testing and discovery in pharmacogenomics. Here, we selectively describe their applications in identifying drug targets, drug response biomarkers, drug indications, and drug MOA (Fig. 1).
Clinical and preclinical transcriptome data
With rapid advances in systems approaches, integrative analysis of multi-layer omics data from various sources is widely used. It serves not only to identify drug–disease relationships but also to discover optimal drug targets and/or drugs for diseases (Kannan et al. 2016). It is therefore important to develop better integrative platforms and publicly accessible databases of disease-related information.
Patient-derived transcriptome data
The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov) project is a joint collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It is one of the largest public resources for multi-layer cancer genomics, with over 11,000 patient profiles representing 33 cancer types and 15 genomic assays per tumor type. TCGA data include information on tumors such as gene expression, copy number variation, somatic mutations, single-nucleotide polymorphisms (SNPs), and clinical outcomes with pathological annotation. Although the TCGA database contains comprehensive information on cancer, it lacks some information, such as normal-tissue transcriptomes for most tumors and drug treatment histories. To address such gaps and to capture emergent themes across cancer types and organs of origin, TCGA launched the Pan-Cancer analysis project. This project assembled clinical, genomic, epigenomic, transcriptional, and proteomic data, each assessed on at least one platform, for a total of 5074 tumor samples from 12 tumor types (Cancer Genome Atlas Research et al. 2013). Moreover, the Pan-Cancer Atlas reclassifies human tumors into three major categories based on molecular similarity: cell-of-origin patterns, oncogenic processes, and signaling pathways (Sanchez-Vega et al. 2018). With its integrative data, the TCGA Pan-Cancer Atlas is a powerful emerging resource as we enter a new era of cancer treatment.
Cell line-derived transcriptome data
The Cancer Cell Line Encyclopedia (CCLE; https://portals.broadinstitute.org/ccle) database is a large-scale genomic dataset. It contains gene expression, copy number, and DNA sequencing data from 1457 human cancer cell lines, encompassing 36 tumor types. Similarly, transcriptome data from in vitro cancer cell lines are provided in the Genomics of Drug Sensitivity in Cancer (GDSC) database, which includes over 1000 cancer cell lines, and in NCI-60, which covers 60 cancer cell lines. By compiling cell line and compound sensitivity data from CCLE, GDSC, and NCI-60, researchers can gain insight into the connections between pharmacological vulnerabilities and the molecular signatures of responsive cancer cell lines (Barretina et al. 2012; Cancer Cell Line Encyclopedia and Genomics of Drug Sensitivity in Cancer 2015).
However, combining cell line and drug treatment information across these resources requires standardization, additional curation, and processing, and this remains a challenge for integrative analysis.

Table 1 Publicly accessible databases widely used in transcriptome-based in silico drug repositioning
Category | Name | Description (as of December 2018) | URL
Omics data repositories | GEO | Raw and processed transcriptome data from multiple platforms | https://www.ncbi.nlm.nih.gov/geo/
Omics data repositories | SRA | Sequencing data from multiple platforms | https://www.ncbi.nlm.nih.gov/sra
Omics data repositories | ArrayExpress | Raw and processed transcriptome data from multiple platforms | https://www.ebi.ac.uk/arrayexpress/
Disease-based | ICGC | Genomic, transcriptomic, epigenomic, and clinical data from >24,000 tumors (22 tumor types) | https://icgc.org/
Disease-based | TCGA | Genomic, transcriptomic, epigenomic, and clinical data from >11,000 tumors (33 tumor types) | http://tcga-data.nci.nih.gov/
Disease-based | CCLE | Genomic, transcriptomic, and epigenomic data from >1000 cancer cell lines | http://www.broadinstitute.org/ccle
Disease-based | GDSC | Genomic, transcriptomic, and epigenomic data from >1000 cancer cell lines | https://www.cancerrxgene.org/
Drug-based | CMap | Gene expression profiles for 1309 chemical compounds in 5 cancer cell lines | https://portals.broadinstitute.org/cmap/
Drug-based | LINCS | Gene expression profiles for perturbagens (20,413 chemicals and 2119 genetic knockdowns/overexpressions) across 77 cell lines | https://clue.io/
Drug-based | NCI-60 | Drug response data (GI50, LC50 values) of 60 cancer cell lines for 45,449 compounds | https://dtp.cancer.gov/discovery_development/nci-60/
Drug-based | CTRP | Drug response data (AUC, EC50 values) of 860 cancer cell lines for 481 compounds | https://portals.broadinstitute.org/ctrp/
Drug-based | CCLE | Drug response data (AUC, IC50 values) of 504 cancer cell lines for 24 compounds | http://www.broadinstitute.org/ccle
Drug-based | GDSC | Drug response data (AUC, IC50 values) of 714 cancer cell lines for 142 compounds | https://www.cancerrxgene.org/
Drug-based | NCI-ALMANAC | Therapeutic activity of pairwise combinations (>5000 pairs) of 104 FDA-approved anticancer drugs against NCI-60 cell lines | https://dtp.cancer.gov/ncialmanac
Drug-based | PubChem BioAssay | Chemical compound screening data, including >3.4 M unique chemical compounds, >12 K protein targets, and >1 M assays | https://pubchem.ncbi.nlm.nih.gov/
Drug-based | ChEMBL | Chemical compound screening data, including >2.2 M unique chemical compounds, >12 K protein targets, and >1 M assays | https://www.ebi.ac.uk/chembl/
Knowledge-based | Gene Ontology | Over 15,000 genes with GO annotations, including 13,212 biological processes, 1547 cellular components, and 4162 molecular functions | http://www.geneontology.org/
Knowledge-based | MSigDB | Data repository containing 17,810 gene sets in 8 major collections | http://software.broadinstitute.org/gsea/msigdb
Knowledge-based | WikiPathways | Database collection of over 2300 biological pathways for 25 species | http://www.wikipathways.org
Knowledge-based | KEGG | Database collection of genome, pathway, disease, and compound information, including 3947 genes, 200 pathways, and 9324 gene–pathway associations | http://www.genome.jp/kegg
Knowledge-based | BioCarta | Database collection of 1396 genes with 254 pathways and 4417 gene–pathway associations | http://www.biocarta.com
Knowledge-based | Reactome | Database collection of 7535 genes with 1638 pathways and 83,680 gene–pathway associations | http://www.reactome.org
Transcriptome data following treatment
The Connectivity Map (build 02)
The Connectivity Map (CMap) is a collection of genome-wide gene expression data generated on the Affymetrix microarray platform. The data come from five human cancer cell lines treated with 1309 compounds (Lamb et al. 2006). The concept of CMap is to establish a comprehensive reference database of drug-induced gene expression profiles. These profiles can be compared with a set of genes representing a biological state of interest to discover functional connections between them. CMap provides a web-based tool that performs simple pattern-matching analysis of a user-submitted gene list against its reference data, but the resource is no longer updated or modified (https://portals.broadinstitute.org/cmap/).
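The pattern matching mentioned above scores how the up- and down-regulated genes of a query signature are distributed within each drug's ranked expression profile. Below is a minimal sketch of the raw connectivity score described by Lamb et al. (2006), which uses a Kolmogorov-Smirnov-style statistic. The function names are hypothetical. The sketch omits the per-instance score normalization and the permutation-based significance testing used by the CMap tool.

```python
def ks_enrichment(tag_ranks, n):
    """Signed Kolmogorov-Smirnov-style statistic (after Lamb et al. 2006).

    tag_ranks: 1-based positions of the query tags in a reference profile
    that ranks n genes from most up- to most down-regulated.
    Positive when the tags cluster near the top, negative near the bottom.
    """
    v = sorted(tag_ranks)
    t = len(v)
    if t == 0:
        return 0.0
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def _sign(x):
    return (x > 0) - (x < 0)


def connectivity_score(ranked_genes, up_tags, down_tags):
    """Raw (unnormalized) connectivity score of one reference profile.

    ranked_genes: gene IDs ordered by drug-induced change, most up-regulated
    first. up_tags / down_tags: the query signature (e.g. a disease).
    """
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    n = len(ranked_genes)
    ks_up = ks_enrichment([position[g] for g in up_tags if g in position], n)
    ks_down = ks_enrichment([position[g] for g in down_tags if g in position], n)
    # Both tag sets enriched at the same end: no consistent connection.
    if _sign(ks_up) == _sign(ks_down):
        return 0.0
    return ks_up - ks_down


# Toy reference profile: G1 most up-regulated, G10 most down-regulated.
reference = [f"G{i}" for i in range(1, 11)]
print(connectivity_score(reference, ["G1", "G2"], ["G9", "G10"]))  # about 1.7
print(connectivity_score(reference, ["G9", "G10"], ["G1", "G2"]))  # about -1.7
```

A positive score means the drug induces the query pattern. A negative score means the drug reverses it, which is the property exploited in the disease-reversal studies.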
Library of Integrated Network-based Cellular Signatures (LINCS) L1000
LINCS L1000, also referred to simply as LINCS, is an extended version of CMap. It contains 1.3 million gene expression profiles associated with 20,413 chemical perturbagens (e.g., small molecules or drugs) and ~5000 genetic perturbagens (e.g., single-gene knockdown or overexpression) (Subramanian et al. 2017). The data were acquired using the L1000 assay, which the Broad Institute CMap team developed to enable rapid, low-cost, high-throughput gene expression profiling. The L1000 assay directly measures the expression of 978 landmark genes. Expression values for the remaining genes are inferred by a linear model trained on a diverse collection of Affymetrix microarray profiles from the Gene Expression Omnibus (GEO). LINCS L1000 datasets are fully downloadable from GEO (accession: GSE92742) and are easily accessible via the cloud-based software platform CLUE (https://clue.io/).
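The landmark-based inference described above can be illustrated with a linear regression from measured genes to unmeasured genes. This is a toy sketch that assumes ordinary least squares with an intercept. It uses 20 synthetic "landmark" genes and 5 target genes rather than the real 978 landmarks, and it does not reproduce the actual L1000 inference model or its training data. All function and variable names are hypothetical.

```python
import numpy as np


def fit_linear_inference(landmarks, targets):
    """Fit targets ~ intercept + landmarks by ordinary least squares.

    landmarks: (n_samples, n_landmark) training expression matrix
    targets:   (n_samples, n_target) expression of genes to be inferred
    Returns a (n_landmark + 1, n_target) coefficient matrix.
    """
    design = np.hstack([np.ones((landmarks.shape[0], 1)), landmarks])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def infer_expression(coef, landmarks):
    """Predict unmeasured genes from newly measured landmark genes."""
    design = np.hstack([np.ones((landmarks.shape[0], 1)), landmarks])
    return design @ coef


# Synthetic training data in which targets are an exact linear function
# of the landmarks, standing in for a large reference compendium.
rng = np.random.default_rng(0)
true_coef = rng.normal(size=(21, 5))
train_landmarks = rng.normal(size=(200, 20))
train_targets = np.hstack([np.ones((200, 1)), train_landmarks]) @ true_coef

coef = fit_linear_inference(train_landmarks, train_targets)
new_landmarks = rng.normal(size=(3, 20))  # three new "L1000-like" profiles
predicted = infer_expression(coef, new_landmarks)
```

In real data, the relationship is noisy, so inferred genes are less reliable than directly measured landmarks.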
Fig. 1 Public databases utilized in drug repositioning pipelines
Knowledge-based gene annotations
In drug development pipelines, knowledge bases that include information on drugs, their biological effects, and clinical outcomes can reveal associations among these entities and thereby support integrative interpretation (Fotis et al. 2018). Using molecular interaction data gathered from various knowledge bases is a potentially powerful way to gain biological insight relevant to drug development. Below, we briefly describe gene annotation databases that illuminate biological background by capturing molecular mechanisms and molecular interactions.
Gene annotation databases include the Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO), and the Molecular Signatures Database (MSigDB). They provide diverse types of interaction models, including signaling pathways, metabolic networks, and regulatory interactions, that can be used to interpret transcriptome data. The KEGG database collection integrates genomic and chemical information. For systems-level information, it includes KEGG Pathway, which contains pathway maps; KEGG Disease, which comprises disease entries; and KEGG Drug, which includes comprehensive information on drugs approved in Japan, the USA, and Europe. In particular, KEGG Pathway consists of manually drawn pathway maps and provides intuitive information on interactions between genes and proteins (Kanehisa et al. 2018). By contrast, the GO project aims to describe genes through ontologies of their properties. GO provides ontologies and annotation information for three domains: cellular component (CC), biological process (BP), and molecular function (MF) (Zhang et al. 2014; Rhee et al. 2008). Meanwhile, MSigDB was developed for gene set enrichment analysis (GSEA). It covers a large number of annotated gene sets with links to external resources, including KEGG, GO, GEO, and ArrayExpress (Liberzon et al. 2011). Together, these knowledge-based databases provide a foundation for transcriptome-based computational drug repositioning and collate information valuable for target identification and for elucidating drug MOA.
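A common way to link a list of DEGs to annotations from such databases is to test whether a pathway or GO term contains more DEGs than expected by chance. Below is a minimal sketch of a one-sided hypergeometric over-representation test. It is a simpler alternative to GSEA, not GSEA itself: GSEA uses a rank-based running-sum statistic. The function name and the toy gene identifiers are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(degs, gene_set, universe):
    """One-sided hypergeometric P value that a gene set is over-represented
    among differentially expressed genes (DEGs).

    All arguments are collections of gene IDs; DEGs and gene-set members
    outside the universe (e.g. all measured genes) are ignored.
    """
    universe = set(universe)
    degs = set(degs) & universe
    gene_set = set(gene_set) & universe
    n_total, n_set, n_deg = len(universe), len(gene_set), len(degs)
    overlap = len(degs & gene_set)
    # P(X >= overlap) for X ~ Hypergeometric(n_total, n_set, n_deg),
    # computed exactly with integer binomial coefficients.
    tail = sum(
        comb(n_set, k) * comb(n_total - n_set, n_deg - k)
        for k in range(overlap, min(n_set, n_deg) + 1)
    )
    return tail / comb(n_total, n_deg)


# Toy example: 20,000 genes, a 100-gene pathway, 500 DEGs, 10 of them in
# the pathway (about 2.5 would be expected by chance).
universe = [f"g{i}" for i in range(20000)]
pathway = universe[:100]
degs = universe[:10] + universe[100:590]
print(overrepresentation_pvalue(degs, pathway, universe))
```

In practice, the P values for many gene sets would also be corrected for multiple testing.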
Web-based drug repositioning tools
Exploring the large, complex datasets described above often requires high-performance computing resources, and access is difficult without proficient programming skills. A number of web tools with user-friendly interfaces have lowered this barrier, so that scientists can pursue drug repositioning research regardless of their computational background (Sam and Athri 2019). Most transcriptome-based studies begin hypothesis testing with sets of differentially expressed genes (DEGs) that represent the biological state of interest. Accordingly, various web-based analytic tools have been developed to associate such DEGs with drugs.
CLUE (https://clue.io/l1000-query) provides a cloud-based query tool to find positive or negative connections between a user-submitted gene set and all signatures in LINCS L1000 (Subramanian et al. 2017). Here, a signature refers to a vector of differential gene expression values (Z scores) induced by an individual perturbagen in LINCS L1000. CLUE returns a list of approximately 50,000 unique perturbagens, including small molecules as well as single-gene knockdowns and overexpressions. Each perturbagen receives a score reflecting the extent to which it induces expression changes in the input genes.
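The Z-score signatures mentioned above express each gene's change relative to a reference population of samples. Below is a minimal sketch of one such transformation, a per-gene robust z-score computed against control samples using the median and the scaled median absolute deviation (MAD). This is an assumption about how such signatures can be built, not a reproduction of the LINCS processing pipeline. For example, the sketch omits LINCS's combination of replicate profiles. The function name and the `mad_floor` parameter are hypothetical.

```python
import numpy as np


def robust_zscores(treated, controls, mad_floor=1e-6):
    """Per-gene robust z-scores of one treated profile against controls.

    treated:  (n_genes,) expression of one perturbed sample
    controls: (n_controls, n_genes) reference samples (e.g. same plate)
    """
    median = np.median(controls, axis=0)
    # 1.4826 scales the MAD to the standard deviation for normal data.
    mad = 1.4826 * np.median(np.abs(controls - median), axis=0)
    # Floor the MAD so genes with no spread in controls do not divide by 0.
    return (treated - median) / np.maximum(mad, mad_floor)


controls = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], dtype=float)
treated = np.array([6.0, 4.0])
print(robust_zscores(treated, controls))  # first gene up, second unchanged
```

The median and MAD are used instead of the mean and standard deviation so that a few outlying control wells have little influence on the signature.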
L1000CDS² (http://amp.pharm.mssm.edu/L1000CDS2/) is another LINCS L1000 signature search engine (Duan et al. 2016). It defines signatures from LINCS L1000 data using the characteristic direction method (Clark et al. 2014). The predictive performance of L1000CDS² was tested on expression signatures from human cells infected with Ebola virus. Based on these signatures, kenpaullone, a GSK3B/CDK2 inhibitor, was predicted, and its dose-dependent efficacy in inhibiting Ebola infection was validated in vitro.
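The characteristic direction method treats a signature as the direction that best separates control and perturbed samples in gene expression space, rather than as a list of per-gene fold changes. Below is a simplified sketch of that idea. It computes the normal vector of a linear discriminant with a shrunken pooled covariance. The published method differs in detail: for example, it includes a dimensionality-reduction step and its own shrinkage scheme. The `shrinkage=0.5` default and the function name are hypothetical.

```python
import numpy as np


def characteristic_direction(control, treated, shrinkage=0.5):
    """Simplified characteristic-direction vector (after Clark et al. 2014).

    control, treated: (n_samples, n_genes) expression matrices.
    Returns a unit vector normal to a shrinkage-LDA separating hyperplane;
    positive entries point toward the treated state.
    """
    delta = treated.mean(axis=0) - control.mean(axis=0)
    centered = np.vstack([control - control.mean(axis=0),
                          treated - treated.mean(axis=0)])
    dof = control.shape[0] + treated.shape[0] - 2
    cov = centered.T @ centered / dof  # pooled within-group covariance
    # Shrink toward a scaled identity so the matrix is well conditioned
    # even when there are far fewer samples than genes.
    sigma2 = np.trace(cov) / cov.shape[0]
    cov_shrunk = (1 - shrinkage) * cov + shrinkage * sigma2 * np.eye(cov.shape[0])
    b = np.linalg.solve(cov_shrunk, delta)
    return b / np.linalg.norm(b)


control = np.array([[0, 0], [2, 0], [0, 2], [2, 2]], dtype=float)
treated = control + np.array([3.0, 0.0])  # only the first gene changes
print(characteristic_direction(control, treated))
```

Unlike a simple mean difference, this direction down-weights genes that vary strongly within groups or that are correlated with other genes.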
DeSigN (http://design-v2.cancerresearch.my/query) associates drug efficacy with a user-submitted gene set. It compares the gene set against drug response-related gene expression signatures for 140 drugs (Lee et al. 2017). The expression signature of each drug was defined as a differential gene expression profile derived from its drug sensitivity (IC50) and baseline gene expression across cancer cell lines in GDSC. DeSigN was validated using four different drug sensitivity studies deposited in GEO. In addition, bosutinib, a Src tyrosine kinase inhibitor, was predicted to be effective against oral squamous cell carcinoma (OSCC), and its efficacy was demonstrated by an in vitro viability assay.
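One way to turn drug sensitivity and baseline expression into a drug response signature is to compare sensitive and resistant cell lines gene by gene. Below is a minimal sketch under two stated assumptions that the text does not specify for DeSigN. First, cell lines below the median IC50 are called sensitive. Second, genes are scored with a Welch t-statistic. The function name is hypothetical.

```python
import numpy as np


def response_signature(expression, ic50):
    """Welch t-statistic per gene, sensitive vs resistant cell lines.

    expression: (n_cell_lines, n_genes) baseline expression
    ic50:       (n_cell_lines,) drug sensitivity; lower means more sensitive
    Lines below the median IC50 are treated as sensitive (an assumption).
    Positive t: higher baseline expression in sensitive lines.
    """
    ic50 = np.asarray(ic50, dtype=float)
    sensitive = ic50 < np.median(ic50)
    a, b = expression[sensitive], expression[~sensitive]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


expression = np.array([[5, 1], [7, 3], [1, 5], [3, 7]], dtype=float)
ic50 = [0.1, 0.2, 5.0, 6.0]  # first two lines are sensitive
print(response_signature(expression, ic50))  # gene 0 positive, gene 1 negative
```

A user's gene set can then be compared with such signatures, for example with an enrichment or rank-based score, to suggest drugs to which the query state may be sensitive.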
Prediction of novel drug–target interactions
Drug polypharmacology (Hopkins 2008), in which a single drug acts on multiple targets, suggests that a drug may have therapeutic potential for new indications. It thus facilitates innovative and successful drug repositioning (Reddy and Zhang 2013). Drug-induced transcriptome data reflect the combined effects of all targets of a drug, providing insight into its MOA or unintended off-targets. CMap and LINCS are the most comprehensive resources for exploring novel drug–target interactions (DTIs). Analyses of CMap data have systematically shown that drugs sharing the same target induce highly correlated gene expression changes (Wang et al. 2013). Several methods have been developed to expand known drug–target relationships based on drug similarity at the gene expression level (Hizukuri et al. 2015; Iwata et al. 2017). On the other hand, known target genes are rarely among the drug-induced differentially expressed genes (DEGs). However, the DEGs are distributed close to the targets in the functional protein–protein interaction (PPI) network (Isik et al. 2015). Based on these observations, a target prediction model was developed that integrates drug-induced DEGs with the network topology of PPIs.
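The observation that DEGs lie close to targets in the PPI network suggests a simple heuristic: rank candidate targets by their average network distance to the drug-induced DEGs. Below is a minimal sketch of such a proximity score using breadth-first search on an unweighted graph. It is an illustrative heuristic, not the model of Isik et al. (2015). The toy network and all names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidate_targets(graph, degs, candidates):
    """Rank candidate targets by mean PPI distance to drug-induced DEGs.

    Unreachable DEGs are ignored; candidates reaching no DEG are dropped.
    A smaller mean distance ranks higher.
    """
    scores = {}
    for candidate in candidates:
        dist = shortest_path_lengths(graph, candidate)
        reached = [dist[g] for g in degs if g in dist]
        if reached:
            scores[candidate] = sum(reached) / len(reached)
    return sorted(scores, key=scores.get)


# Hypothetical undirected PPI network as an adjacency dictionary.
ppi = {
    "T1": ["A", "B"], "A": ["T1", "C"], "B": ["T1", "C"],
    "C": ["A", "B", "T2"], "T2": ["C", "D"], "D": ["T2"],
}
print(rank_candidate_targets(ppi, degs=["A", "B"], candidates=["T1", "T2"]))
```

Here T1 is adjacent to both DEGs and therefore ranks above T2. A real model would also need to correct for node degree, because highly connected proteins are close to almost everything.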
Transcriptome data from genetic perturbations can also be combined with drug-induced transcriptome data to find novel DTIs. The underlying idea was demonstrated in yeast: a drug can be linked to its target gene when drug treatment and loss of that gene's function produce a common expression signature (Hughes et al. 2000). This idea was applied to human cancer cells using LINCS L1000 data. It led to the discovery that compound BRD-1868 targets casein kinase 1A1, which is related to drug resistance in lung cancer (Lantermann et al. 2015; Subramanian et al. 2017). Another similar approach comprehensively predicted novel DTIs between 1124 drugs and 829 target proteins by correlating gene expression patterns caused by chemical and genetic perturbations (Sawada et al. 2018). Notably, this approach classified predicted DTIs as inhibitory or activating, depending on whether the drug signature was matched with a knockdown or an overexpression signature.
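The inhibitory-versus-activating distinction above can be expressed as a comparison of correlations. A drug resembling the knockdown of a gene suggests inhibition; a drug resembling its overexpression suggests activation. Below is a minimal sketch using Pearson correlation. It is not the statistical model of Sawada et al. (2018). The `threshold` value and the function name are hypothetical.

```python
import numpy as np


def classify_drug_target(drug_sig, knockdown_sig, overexpression_sig, threshold=0.5):
    """Label a putative drug-target pair as inhibitory or activating.

    Each argument is a vector of differential expression values over the
    same genes. The threshold is a hypothetical cut-off on Pearson r.
    """
    r_kd = np.corrcoef(drug_sig, knockdown_sig)[0, 1]
    r_oe = np.corrcoef(drug_sig, overexpression_sig)[0, 1]
    if r_kd >= threshold and r_kd >= r_oe:
        return "inhibitory", r_kd  # drug mimics loss of the gene
    if r_oe >= threshold:
        return "activating", r_oe  # drug mimics gain of the gene
    return "no call", max(r_kd, r_oe)


knockdown = np.array([2.0, 1.0, 0.0, -1.0, -2.0])
overexpression = -knockdown  # idealized mirror-image signature
print(classify_drug_target(0.8 * knockdown, knockdown, overexpression)[0])  # inhibitory
print(classify_drug_target(-knockdown, knockdown, overexpression)[0])       # activating
```

Real knockdown and overexpression signatures are rarely exact mirror images. For this reason, the two correlations are computed separately rather than inferred from one another.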
Identification of drug response biomarkers
In drug development pipelines, most drugs are developed based on the molecular features of a given disease. In drug repositioning, identifying indicators or biomarkers for repurposed drugs is critical to match the appropriate drug with the right patient based on predicted drug responses (Kelloff and Sigman 2012). With great advances in sequencing technologies, large-scale transcriptome data and pharmacogenomics-based disease models have emerged that aid the identification of biomarkers and the prediction of drug responses. In the Cancer Therapeutics Response Portal (CTRP) database, transcriptome-based biomarkers of drug sensitivity have been identified by integrating drug response profiles for 481 anticancer drugs across 860 cancer cell lines (Cancer Cell Line Encyclopedia and Genomics of Drug Sensitivity in Cancer 2015). Drug response profiles from CTRP can be used to predict drug responses in cell lines with particular disease features or defined gene signatures, suggesting which disease features may confer sensitivity to a given drug. For example, the sensitivity patterns of 481 chemical compounds were correlated with the basal levels of ~19,000 transcripts across 823 human cancer cell lines. This analysis demonstrated that the basal gene expression profiles of cell lines can predict drug responses and illuminate the mechanisms of small molecules (Rees et al. 2016). The approach was validated with previously annotated gene–drug pairs, such as BCL2 and ABT-199 (Rees et al. 2016) and SLC35F2 and YM-155 (Winter et al. 2014; Rees et al. 2016). The authors then newly found that ML239 activates fatty acid desaturase 2 (FADS2) (Rees et al. 2016); ML239 had originally been identified by phenotypic screening as selectively eliminating epithelial breast cancer cells. In addition, a chemoresistance score strongly correlated with mesenchymal cancer traits was defined by leveraging integrative transcriptome data from both CTRP and CCLE (Hong et al. 2018). Furthermore, the association between drug response profiles and genome-wide RNAi screening data from the Achilles project was analyzed (Tsherniak et al. 2017). This analysis identified ITGB3, which is highly expressed in mesenchymal-type lung cancer cell lines (Bae et al. 2016; Hong et al. 2016), as an Achilles' heel of chemoresistant cancer cells with mesenchymal traits. The authors concluded that dependency on ITGB3 is one of the major factors determining responses to most chemotherapeutic drugs (Hong et al. 2018). Thus, leveraging publicly available pharmacogenomics data linked to diseases offers a promising approach for identifying statistically robust drug response biomarkers.
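The basal-expression analysis described above amounts to correlating each gene's expression across cell lines with each compound's sensitivity. Below is a minimal sketch for one compound using Pearson correlation. It assumes an AUC-type sensitivity metric, for which a lower value usually means greater sensitivity. The published analysis includes additional normalization and quality control that the sketch does not reproduce, and the function name is hypothetical.

```python
import numpy as np


def expression_sensitivity_correlation(expression, auc):
    """Pearson correlation of each gene's basal expression with drug AUC.

    expression: (n_cell_lines, n_genes); auc: (n_cell_lines,).
    With AUC-type metrics, lower AUC usually means more sensitive, so
    strongly negative r flags candidate sensitivity biomarkers.
    """
    x = expression - expression.mean(axis=0)
    y = np.asarray(auc, dtype=float)
    y = y - y.mean()
    numerator = (x * y[:, None]).sum(axis=0)
    denominator = np.sqrt((x ** 2).sum(axis=0)) * np.sqrt((y ** 2).sum())
    return numerator / denominator


expression = np.array([[1, 4], [2, 3], [3, 2], [4, 1]], dtype=float)
auc = [10, 20, 30, 40]  # toy AUCs for four cell lines
print(expression_sensitivity_correlation(expression, auc))
```

In the toy data, the second gene is highest in the most sensitive line (lowest AUC) and gets r = -1, which would make it the candidate sensitivity biomarker.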
Discovery of novel drug indications
Systems biology approaches have used large-scale pharmacogenomics data to identify previously unrecognized relationships between diseases and drugs. These approaches generally begin by defining a gene expression signature (e.g., a collection of genes representing a disease state). This signature is then compared directly against compound signatures in reference databases such as CMap or CTRP. The query signature can be derived from a disease, a drug perturbation, or a genetic perturbation, and used to perform (i) drug–disease, (ii) drug–drug, or (iii) drug–gene comparisons (Fig. 2). Below, we describe several studies that discovered new drug indications through such comparisons.
The first and most prevalent type of comparison typically defines a disease signature as a set of DEGs obtained by comparing disease and corresponding control (healthy) states. It then seeks drugs whose perturbations reverse the disease signature. For example, disease signatures were generated for 100 diseases using microarray data from GEO, and each disease signature was compared with 164 drug signatures in CMap (build 01) (Dudley et al. 2011; Sirota et al. 2011). Among the most strongly anti-correlated disease–drug pairs, many known disease–drug relationships were recovered. New associations were also found, including cimetidine (a histamine H2 receptor antagonist used as an antiulcer treatment) for lung adenocarcinoma and topiramate (an anticonvulsant that blocks voltage-gated sodium and calcium channels) for inflammatory bowel disease. A similar systematic approach used a small cell lung cancer (SCLC) expression signature. It found that tricyclic antidepressants and related drugs are potent inducers of apoptosis in SCLC (Jahchan et al. 2013). These drugs included imipramine, a tricyclic antidepressant; promethazine, a histamine H1 receptor antagonist for allergies; and bepridil, an amine calcium channel blocker. These findings led to the enrollment of a related molecule, the tricyclic antidepressant desipramine, in a clinical trial for SCLC (NCT01719861, phase IIa). In another example, a metastatic colon cancer signature was compared against compound signatures in CMap (build 02) (van Noort et al. 2014). This comparison identified three candidate treatments for colorectal cancer metastasis: citalopram (a selective serotonin reuptake inhibitor and antidepressant), troglitazone (a PPARγ ligand mimetic and antihyperglycemic agent), and enilconazole (a fungicide). A common assumption of these studies is that a strong anti-correlation between disease and drug signatures indicates that the drug may have a therapeutic effect on the disease.
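The reversal assumption above can be turned into a ranking of drugs by how anti-correlated their signatures are with a disease signature. The studies cited used CMap-style enrichment scores. The sketch below swaps in Spearman rank correlation over the genes shared by both signatures as a simpler, clearly different measure. For brevity, it assumes no tied values and at least two shared genes. The gene and drug names are hypothetical.

```python
def ranks(values):
    """1-based ranks of a list of numbers (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rank_reversing_drugs(disease_sig, drug_sigs):
    """Order drugs from most anti-correlated to most correlated with a disease.

    disease_sig: {gene: log fold change, disease vs control}
    drug_sigs:   {drug: {gene: log fold change, treated vs untreated}}
    """
    scores = {}
    for drug, sig in drug_sigs.items():
        genes = [g for g in disease_sig if g in sig]
        scores[drug] = spearman([disease_sig[g] for g in genes],
                                [sig[g] for g in genes])
    return sorted(scores.items(), key=lambda item: item[1])


disease = {"A": 2.0, "B": 1.0, "C": -0.5, "D": -1.5}
drugs = {
    "reverser": {"A": -1.8, "B": -0.9, "C": 0.4, "D": 1.2},
    "mimic": {"A": 1.5, "B": 0.7, "C": -0.2, "D": -1.0},
}
print(rank_reversing_drugs(disease, drugs))  # "reverser" first, score -1.0
```

Drugs at the top of this list are candidates for reversing the disease state. Whether they actually do so still has to be tested experimentally.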
Fig. 2 Three signature types (disease, drug perturbation, and genetic perturbation) used to compare compound signatures in pharmacogenomics databases for identifying novel relationships between diseases and drugs

Disease signatures can also be used to characterize disease states in other biological systems (e.g., cancer cells or organoids). They may also be associated with drug activity measures such as IC50, EC50, and AUC values. For example, a mesenchymal score was calculated from a mesenchymal signature for each cancer cell line in CTRP and correlated with cell line sensitivity to 481 compounds (Viswanathan et al. 2017). The authors found that ferroptosis inducers (e.g., RSL3, ML210, and ML162) were selectively potent against mesenchymal cancer cells through inhibition of a lipid peroxidase pathway. A similar approach used a YM155-resistance signature; YM155 is a compound that specifically eliminates undifferentiated human embryonic stem cells (hESCs). This approach led to the discovery that BCL2 homology 3 (BH3) mimetics (ABT-263, ABT-737, and WEHI-539) selectively ablate abnormal hESCs that are resistant to YM155 (Lee et al. 2013; Cho et al. 2018).
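A signature score such as the mesenchymal score above summarizes many genes into one number per cell line, which can then be correlated with drug sensitivity. Below is a minimal sketch that assumes the score is the mean per-gene z-score of the signature genes. This is a common choice but not necessarily the scoring used by Viswanathan et al. (2017). The two-gene "signature" (VIM, CDH2) and the AUC values are illustrative only.

```python
import numpy as np


def signature_score(expression, genes, signature):
    """Mean per-gene z-score of signature genes in each sample.

    expression: (n_samples, n_genes); genes: column labels;
    signature: gene IDs (here a hypothetical mesenchymal gene set).
    """
    idx = [genes.index(g) for g in signature if g in genes]
    sub = expression[:, idx]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)
    return z.mean(axis=1)


genes = ["VIM", "CDH2", "GAPDH"]
expression = np.array([[1, 2, 5.0], [2, 4, 5.1], [3, 6, 4.9], [4, 8, 5.0]])
score = signature_score(expression, genes, ["VIM", "CDH2"])
auc = np.array([40.0, 30.0, 20.0, 10.0])  # toy values; lower = more sensitive
print(np.corrcoef(score, auc)[0, 1])  # negative: mesenchymal lines more sensitive
```

Repeating the correlation for every compound in a panel would highlight the compounds, such as the ferroptosis inducers above, that preferentially target high-scoring cells.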
Drug–drug comparisons can be used to extrapolate knowledge of a given drug to other drugs based on similarity. They assume that drugs whose perturbations cause similar gene expression changes may have similar therapeutic effects. Indeed, drugs with similar MOAs were significantly enriched in the sub-modules of a large-scale drug association network constructed from drug-induced transcriptional similarity in CMap data (Iorio et al. 2010). In this network, fasudil, a Rho-kinase inhibitor and vasodilator, clustered with well-known autophagy inducers, and its ability to enhance autophagy was subsequently validated.
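A drug association network of this kind links drugs with similar transcriptional responses and then looks for densely connected groups. Below is a minimal sketch that uses a deliberately simpler construction than the original network. It links two drugs when the Pearson correlation of their signatures reaches a hypothetical threshold, and it reports the connected components of that graph as modules. Iorio et al. (2010) used their own similarity measure and clustering procedure, which this does not reproduce.

```python
import numpy as np


def drug_modules(names, signatures, threshold=0.6):
    """Group drugs whose signatures are linked by Pearson r >= threshold.

    names: drug names; signatures: (n_drugs, n_genes) array.
    Returns the connected components of the thresholded similarity graph.
    """
    corr = np.corrcoef(signatures)
    n = len(names)
    seen, modules = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            component.append(names[i])
            for j in range(n):
                if j not in seen and corr[i, j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        modules.append(sorted(component))
    return modules


signatures = np.array([
    [1, 2, 3, 4, 5],             # drug A
    [2.1, 4.1, 6.1, 8.1, 10.1],  # drug B, same pattern as A
    [5, 1, 4, 2, 3],             # drug C, different pattern
    [15, 3, 12, 6, 9],           # drug D, same pattern as C
], dtype=float)
print(drug_modules(["A", "B", "C", "D"], signatures))  # [['A', 'B'], ['C', 'D']]
```

A drug of unknown MOA that falls into a module dominated by drugs with a known MOA, as fasudil did with autophagy inducers, becomes a candidate for that activity.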
LINCS contains drug-induced transcriptome data for roughly 15 times as many perturbagens as CMap, providing an excellent opportunity for exploring candidate compounds. One approach used LINCS data to identify drugs whose signatures (i.e., DEGs from comparisons before and after drug treatment) are similar to those of known glioblastoma (GBM) drugs (Lee et al. 2016). This signature similarity was integrated with other features, such as drug targets and chemical structures. On this basis, 14 drugs were predicted for the treatment of GBM, and more than half of them displayed antiproliferative activity against patient-derived GBM cells.
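Integrating signature similarity with other drug features can be as simple as a weighted combination of similarity measures. Below is a minimal sketch that combines the Pearson correlation of two drugs' signatures with the Jaccard overlap of their target sets. The weighting scheme, the `w_expr` default, and the example target names are hypothetical. The sketch also omits chemical structure similarity, which Lee et al. (2016) included.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard index of two collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_similarity(sig_x, sig_y, targets_x, targets_y, w_expr=0.5):
    """Hypothetical weighted combination of transcriptional similarity
    (Pearson r of signatures) and target overlap (Jaccard index)."""
    r = np.corrcoef(sig_x, sig_y)[0, 1]
    return w_expr * r + (1 - w_expr) * jaccard(targets_x, targets_y)


# A candidate with an identical signature and half-overlapping targets.
score = combined_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
                            {"EGFR"}, {"EGFR", "ERBB2"})
print(score)  # 0.5 * 1.0 + 0.5 * 0.5
```

Candidates would then be ranked by their highest combined similarity to any known drug for the disease. In practice, the weights are a design choice that should be tuned or learned against known drug–disease pairs.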
In the final type of comparison, a disease is linked to a drug based on the similarity between two transcriptomic signatures: one generated from a genetic perturbation (e.g., knockdown or overexpression of a disease biomarker) and one generated from a drug. This concept was first applied as an alternative to targeting the poorly druggable gene encoding integrin beta 3 (ITGB3), which is responsible for chemoresistance in mesenchymal lung cancer (Hong et al. 2018). From LINCS data, atorvastatin (an HMG-CoA reductase inhibitor used to treat dyslipidemia) was found to mimic the expression changes caused by ITGB3 knockdown and was identified as a chemosensitizer. A similar approach was applied to N-acetylgalactosaminyltransferase 14 (GALNT14), whose expression is strongly correlated with lung cancer recurrence and metastasis (Lee et al. 2008; Kwon et al. 2015). Because no feasible drugs directly inhibit the GALNT14 protein, the authors generated the gene expression signature of shRNA-mediated GALNT14 depletion in metastatic lung cancer. They identified bortezomib, a first-in-class proteasome inhibitor for multiple myeloma (Argyriou et al. 2008), which likely reverses GALNT14-dependent gene expression (Kwon et al. 2018).
All of the above studies identified drugs by matching the transcriptomic signatures of drugs and therapeutic targets rather than by attempting to inhibit the protein targets directly. Many potential molecular targets identified from cancer genomic profiling are undruggable (Lazo and Sharlow 2016). This approach could therefore prove to be a viable strategy in cancer pharmacology.
Identification of drug mode of action
Identifying the molecular pathways and adverse effects of a compound is crucial for drug repositioning. Traditionally, the MOA of a drug has been predicted based on analysis of chemical structure, gene expression profiles following drug treatment (Lamb 2007), and side effect similarity (Campillos et al. 2008). However, most of these approaches apply only to well-characterized drugs with available structures and documented side effects (Iorio et al. 2010). Thus, when prior information on a drug is lacking, gene signature-based methods are the most cost-effective approach for elucidating its MOA (Lamb et al. 2006; Lamb 2007). In this regard, differential gene expression data following drug treatment, combined with pharmacogenomic (e.g., CMap and LINCS) and pathway (e.g., KEGG and GO) databases, can provide rich information on the connections between drugs, pathways, and diseases. As an example of the power of this approach, network analysis newly identified fasudil as an inducer of cellular autophagy, suggesting that it could be applicable to neurodegenerative disorders (Iorio et al. 2010). The authors also made their approach publicly accessible as a tool named Mode of Action by NeTwoRk Analysis (MANTRA, http://mantra.tigem.it). Similarly, another study analyzed 16,268 compounds across 68 human cell lines from LINCS and performed pathway enrichment analysis to reveal active pathways (Iwata et al. 2017). By mapping the genes up- or down-regulated by each compound onto KEGG pathways, the authors proposed a computational approach to identify not only active pathways but also target proteins and therapeutic indications. In another example, bortezomib was found to inhibit tumor metastasis in lung cancer (Kwon et al. 2018) through an unexpected off-target effect independent of proteasome inhibition. The authors subsequently elucidated the MOA of bortezomib on metastasis by conducting KEGG and GSEA analyses on combined transcriptome data from bortezomib-treated cells and GALNT14-expressing lung cancer patients from TCGA. They concluded that bortezomib suppresses the TGFβ-dependent gene signature and thereby inhibits tumor metastasis in lung cancer (Kwon et al. 2018).
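Mapping compound-regulated genes onto pathways is often scored with a one-sided hypergeometric (over-representation) test, which asks whether a pathway contains more of the compound's DEGs than expected by chance. The sketch below assumes pathway gene sets (for example, from KEGG) are already loaded as Python collections; it is not the specific statistic used by Iwata et al. (2017), it differs from rank-based GSEA, and it omits the multiple-testing correction a real analysis would need. Gene and pathway names are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n_draw, n_success, n_total):
    """P(X >= k) when drawing n_draw genes from n_total, n_success of which are in the pathway."""
    upper = min(n_draw, n_success)
    hits = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draw - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_draw)


def pathway_enrichment(deg_genes, pathways, universe):
    """Over-representation test of DEGs in each pathway; returns (name, overlap, p) sorted by p."""
    degs = set(deg_genes) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(degs & members)
        p = hypergeom_tail(overlap, len(degs), len(members), len(universe))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 100 measured genes, two gene sets standing in for KEGG pathways.
universe = {f"G{i}" for i in range(100)}
pathways = {
    "pathway_A": [f"G{i}" for i in range(10)],
    "pathway_B": [f"G{i}" for i in range(60, 70)],
}
degs = [f"G{i}" for i in range(8)] + ["G50", "G51"]  # genes up- or down-regulated by a compound
results = pathway_enrichment(degs, pathways, universe)
for name, overlap, p in results:
    print(name, overlap, f"{p:.2e}")
```

Restricting both the DEGs and the pathways to the measured gene universe matters: with L1000-style data only a subset of genes is measured, and testing against all genes would inflate significance.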
Conclusions and future directions
With rapid advances in technology, complex transcriptome datasets that reflect disease systems, ranging from single cells to patients, are emerging and expanding. Diverse transcriptome datasets enable researchers to achieve efficient drug repositioning by predicting drug side effects, drug indications, and drug MOA. However, transcriptome data have inherent limitations. First, integrating gene expression data from different platforms (e.g., microarray, RNA-seq, or the L1000 assay) is challenging because the range of genes measured varies by platform. Second, because mRNA levels respond to the environment far more sensitively than DNA does, the effects of the various environmental factors that induce gene expression changes are comprehensively reflected in transcriptome data. In particular, large-scale datasets such as CMap, generated over a long period, contain considerable experimental variation or noise due to batch effects, so additional pre-processing steps are required to yield reliable results. Third, unlike genetic variants, which can be detected by comparison with a reference genome, gene expression has no representative reference for defining whether expression levels are high or low; to determine whether gene expression has changed under a certain condition, comparable gene expression data from a control condition are needed. The first two limitations can be overcome by applying sophisticated normalization methods, but the last one remains challenging.
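The practical consequences of these limitations can be made concrete with a minimal sketch: restrict the analysis to genes measured on every platform, quantile-normalize samples within a batch, and express each treated sample relative to matched controls from the same batch. Quantile normalization is used here only as one common example of a normalization method; it is an assumption, not the procedure of any study cited above, and real CMap or LINCS pipelines apply more elaborate corrections. Sample and gene names are hypothetical.

```python
import math
from statistics import mean


def common_genes(*platform_genes):
    """Genes measured on every platform, e.g. microarray, RNA-seq and L1000."""
    shared = set(platform_genes[0])
    for genes in platform_genes[1:]:
        shared &= set(genes)
    return sorted(shared)


def quantile_normalize(samples, genes):
    """Give every sample the same value distribution (ties are not averaged here)."""
    sorted_values = [sorted(s[g] for g in genes) for s in samples.values()]
    rank_means = [mean(col[i] for col in sorted_values) for i in range(len(genes))]
    normalized = {}
    for name, values in samples.items():
        order = sorted(genes, key=lambda g: values[g])  # genes from lowest to highest
        normalized[name] = {g: rank_means[i] for i, g in enumerate(order)}
    return normalized


def log2_fold_change(treated, controls, genes, pseudocount=1.0):
    """log2 ratio of a treated sample to the mean of matched same-batch controls."""
    return {
        g: math.log2((treated[g] + pseudocount) / (mean(c[g] for c in controls) + pseudocount))
        for g in genes
    }


# Hypothetical toy data from one batch.
genes = common_genes(["A", "B", "C", "D"], ["B", "C", "D", "E"])  # B, C, D
batch = {
    "ctrl_1": {"B": 1, "C": 3, "D": 5},
    "ctrl_2": {"B": 2, "C": 4, "D": 6},
    "treated": {"B": 10, "C": 2, "D": 4},
}
norm = quantile_normalize(batch, genes)
fc = log2_fold_change(norm["treated"], [norm["ctrl_1"], norm["ctrl_2"]], genes)
print(fc)
```

The fold change is defined only relative to the matched controls, which reflects the third limitation: without a control condition from the same batch there is no reference against which "high" or "low" expression can be judged.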
Despite these limitations, systems biology approaches leveraging these datasets can reveal previously unrecognized relationships between drugs and diseases, and provide alternative strategies for associating drugs with newly identified targets, regardless of their druggability. Since drugging undruggable molecular targets represents a major hurdle for traditional drug discovery methods, new therapeutic indications for existing drugs can prove crucial for certain diseases. In particular, for personalized pharmacotherapy, an emerging approach for disease treatment that takes into account individual characteristics of each patient, discovering therapeutic indications from matched drugs may be of great benefit to drug development and drug repositioning. In this article, we reviewed publicly available large-scale transcriptome databases and multidisciplinary methods utilizing the data within them. These big data approaches hold great promise for overcoming the limitations of traditional drug discovery pipelines and supporting the field of precision pharmacology.
Acknowledgements This work was supported by the National Research Foundation of Korea (NRF) via grants funded by the Korea government (MSIT; NRF-2017R1A6A3A11030794, NRF-2017M3C9A5028690, and NRF-2019R1C1C1008710) to HSL, WKK, and OSK.
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
References
Argyriou AA, Iconomou G, Kalofonos HP (2008) Bortezomib-induced peripheral neuropathy in multiple myeloma: a comprehensive review of the literature. Blood 112:1593–1599
Bae GY, Hong SK, Park JR, Kwon OS, Kim KT, Koo J, Oh E, Cha HJ (2016) Chronic TGFbeta stimulation promotes the metastatic potential of lung cancer cells by Snail protein stabilization through integrin beta3-Akt-GSK3beta signaling. Oncotarget 7:25366–25376
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D, Reddy A, Liu M, Murray L, Berger MF, Monahan JE, Morais P, Meltzer J, Korejwa A, Jane-Valbuena J, Mapa FA, Thibault J, Bric-Furlong E, Raman P, Shipway A, Engels IH, Cheng J, Yu GK, Yu J, Aspesi P Jr, De Silva M, Jagtap K, Jones MD, Wang L, Hatton C, Palescandolo E, Gupta S, Mahan S, Sougnez C, Onofrio RC, Liefeld T, MacConaill L, Winckler W, Reich M, Li N, Mesirov JP, Gabriel SB, Getz G, Ardlie K, Chan V, Myer VE, Weber BL, Porter J, Warmuth M, Finan P, Harris JL, Meyerson M, Golub TR, Morrissey MP, Sellers WR, Schlegel R, Garraway LA (2012) The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483:603–607
Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P (2008) Drug target identification using side-effect similarity. Science 321:263–266
Cancer Cell Line Encyclopedia, Genomics of Drug Sensitivity in Cancer (2015) Pharmacogenomic agreement between two cancer cell line data sets. Nature 528:84–87
Cancer Genome Atlas Research, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM (2013) The cancer genome atlas pan-cancer analysis project. Nat Genet 45:1113–1120
Chen B, Butte AJ (2016) Leveraging big data to transform target selection and drug discovery. Clin Pharmacol Ther 99:285–297
Cho SJ, Kim KT, Jeong HC, Park JC, Kwon OS, Song YH, Shin JG, Kang S, Kim W, Shin HD, Lee MO, Moon SH, Cha HJ (2018) Selective elimination of culture-adapted human embryonic stem cells with BH3 mimetics. Stem Cell Rep 11:1244–1256
Clark NR, Hu KS, Feldmann AS, Kou Y, Chen EY, Duan Q, Ma’ayan A (2014) The characteristic direction: a geometrical approach to identify differentially expressed genes. BMC Bioinform 15:79
Costa FF (2014) Big data in biomedicine. Drug Discov Today 19:433–440
Duan Q, Reid SP, Clark NR, Wang Z, Fernandez NF, Rouillard AD, Readhead B, Tritsch SR, Hodos R, Hafner M, Niepel M, Sorger PK, Dudley JT, Bavari S, Panchal RG, Ma’ayan A (2016) L1000CDS(2): LINCS L1000 characteristic direction signatures search engine. NPJ Syst Biol Appl 2:16015
Dudley JT, Sirota M, Shenoy M, Pai RK, Roedder S, Chiang AP, Morgan AA, Sarwal MM, Pasricha PJ, Butte AJ (2011) Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci Transl Med 3:96ra76
Fotis C, Antoranz A, Hatziavramidis D, Sakellaropoulos T, Alexopoulos LG (2018) Network-based technologies for early drug discovery. Drug Discov Today 23:626–635
Hernandez JJ, Pryszlak M, Smith L, Yanchus C, Kurji N, Shahani VM, Molinski SV (2017) Giving drugs a second chance: overcoming regulatory and financial hurdles in repurposing approved drugs as cancer therapeutics. Front Oncol 7:273
Hizukuri Y, Sawada R, Yamanishi Y (2015) Predicting target proteins for drug candidate compounds based on drug-induced gene expression data in a chemical structure-independent manner. BMC Med Genom 8:82
Hong SK, Park JR, Kwon OS, Kim KT, Bae GY, Cha HJ (2016) Induction of integrin beta3 by sustained ERK activity promotes the invasiveness of TGFbeta-induced mesenchymal tumor cells. Cancer Lett 376:339–346
Hong SK, Lee H, Kwon OS, Song NY, Lee HJ, Kang S, Kim JH, Kim M, Kim W, Cha HJ (2018) Large-scale pharmacogenomics based drug discovery for ITGB3 dependent chemoresistance in mesenchymal lung cancer. Mol Cancer 17:175
Hopkins AL (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol 4:682–690
Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH (2000) Functional discovery via a compendium of expression profiles. Cell 102:109–126
Iorio F, Bosotti R, Scacheri E, Belcastro V, Mithbaokar P, Ferriero R, Murino L, Tagliaferri R, Brunetti-Pierri N, Isacchi A, Di Bernardo D (2010) Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Natl Acad Sci USA 107:14621–14626
Isik Z, Baldow C, Cannistraci CV, Schroeder M (2015) Drug target prioritization by perturbed gene expression and network information. Sci Rep 5:17417
Iwata M, Sawada R, Iwata H, Kotera M, Yamanishi Y (2017) Elucidating the modes of action for bioactive compounds in a cell-specific manner by large-scale chemically-induced transcriptomics. Sci Rep 7:40164
Jahchan NS, Dudley JT, Mazur PK, Flores N, Yang D, Palmerton A, Zmoos AF, Vaka D, Tran KQ, Zhou M, Krasinska K, Riess JW, Neal JW, Khatri P, Park KS, Butte AJ, Sage J (2013) A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors. Cancer Discov 3:1364–1377
Jin G, Wong ST (2014) Toward better drug repositioning: prioritizing and integrating existing methods into efficient pipelines. Drug Discov Today 19:637–644
Kanehisa M, Sato Y, Furumichi M, Morishima K, Tanabe M (2018) New approach for understanding genome variations in KEGG. Nucleic Acids Res 47:D590–D595
Kannan L, Ramos M, Re A, El-Hachem N, Safikhani Z, Gendoo DM, Davis S, Gomez-Cabrero D, Castelo R, Hansen KD, Carey VJ, Morgan M, Culhane AC, Haibe-Kains B, Waldron L (2016) Public data and open source tools for multi-assay genomic investigation of disease. Brief Bioinform 17:603–615
Kelloff GJ, Sigman CC (2012) Cancer biomarkers: selecting the right drug for the right patient. Nat Rev Drug Discov 11:201–214
Kim RS, Goossens N, Hoshida Y (2016) Use of big data in drug development for precision medicine. Expert Rev Precis Med Drug Dev 1:245–253
Kondo T, Imamura K, Funayama M, Tsukita K, Miyake M, Ohta A, Woltjen K, Nakagawa M, Asada T, Arai T, Kawakatsu S, Izumi Y, Kaji R, Iwata N, Inoue H (2017) iPSC-based compound screening and in vitro trials identify a synergistic anti-amyloid beta combination for Alzheimer’s disease. Cell Rep 21:2304–2312
Kwon OS, Oh E, Park JR, Lee JS, Bae GY, Koo JH, Kim H, Choi YL, Choi YS, Kim J, Cha HJ (2015) GalNAc-T14 promotes metastasis through Wnt dependent HOXB9 expression in lung adenocarcinoma. Oncotarget 6:41916–41928
Kwon OS, Lee H, Kong HJ, Park JE, Lee W, Kang S, Kim M, Kim W, Cha HJ (2018) In silico drug repositioning of bortezomib to reverse metastatic effect of GALNT14 in lung cancer. bioRxiv. https://doi.org/10.1101/394163
Lamb J (2007) The connectivity map: a new tool for biomedical research. Nat Rev Cancer 7:54–60
Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR (2006) The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313:1929–1935
Lantermann AB, Chen D, McCutcheon K, Hoffman G, Frias E, Ruddy D, Rakiec D, Korn J, McAllister G, Stegmeier F, Meyer MJ, Sharma SV (2015) Inhibition of casein kinase 1 alpha prevents acquired drug resistance to erlotinib in EGFR-mutant non-small cell lung cancer. Cancer Res 75:4937–4948
Lazo JS, Sharlow ER (2016) Drugging undruggable molecular cancer targets. Annu Rev Pharmacol Toxicol 56:23–40
Lee ES, Son DS, Kim SH, Lee J, Jo J, Han J, Kim H, Lee HJ, Choi HY, Jung Y, Park M, Lim YS, Kim K, Shim Y, Kim BC, Lee K, Huh N, Ko C, Park K, Lee JW, Choi YS, Kim J (2008) Prediction of recurrence-free survival in postoperative non-small cell lung cancer patients by using an integrated model of clinical information and gene expression. Clin Cancer Res 14:7397–7404
Lee MO, Moon SH, Jeong HC, Yi JY, Lee TH, Shim SH, Rhee YH, Lee SH, Oh SJ, Lee MY, Han MJ, Cho YS, Chung HM, Kim KS, Cha HJ (2013) Inhibition of pluripotent stem cell-derived teratoma formation by small molecules. Proc Natl Acad Sci USA 110:E3281–E3290
Lee H, Kang S, Kim W (2016) Drug repositioning for cancer therapy based on large-scale drug-induced transcriptional signatures. PLoS ONE 11:e0150460
Lee BK, Tiong KH, Chang JK, Liew CS, Abdul Rahman ZA, Tan AC, Khang TF, Cheong SC (2017) DeSigN: connecting gene expression with therapeutics for drug repurposing and development. BMC Genom 18:934
Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27:1739–1740
Novac N (2013) Challenges and opportunities of drug repositioning. Trends Pharmacol Sci 34:267–272
Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, Doig A, Guilliams T, Latimer J, McNamee C, Norris A, Sanseau P, Cavalla D, Pirmohamed M (2018) Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov 18:41
Reddy AS, Zhang S (2013) Polypharmacology: drug discovery for the future. Expert Rev Clin Pharmacol 6:41–47
Rees MG, Seashore-Ludlow B, Cheah JH, Adams DJ, Price EV, Gill S, Javaid S, Coletti ME, Jones VL, Bodycombe NE, Soule CK, Alexander B, Li A, Montgomery P, Kotz JD, Hon CS, Munoz B, Liefeld T, Dancik V, Haber DA, Clish CB, Bittker JA, Palmer M, Wagner BK, Clemons PA, Shamji AF, Schreiber SL (2016) Correlating chemical sensitivity and basal gene expression reveals mechanism of action. Nat Chem Biol 12:109–116
Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology annotations. Nat Rev Genet 9:509–515
Sam E, Athri P (2019) Web-based drug repurposing tools: a survey. Brief Bioinform 20:299–316
Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, Dimitriadoy S, Liu DL, Kantheti HS, Saghafinia S, Chakravarty D, Daian F, Gao Q, Bailey MH, Liang WW, Foltz SM, Shmulevich I, Ding L, Heins Z, Ochoa A, Gross B, Gao J, Zhang H, Kundra R, Kandoth C, Bahceci I, Dervishi L, Dogrusoz U, Zhou W, Shen H, Laird PW, Way GP, Greene CS, Liang H, Xiao Y, Wang C, Iavarone A, Berger AH, Bivona TG, Lazar AJ, Hammer GD, Giordano T, Kwong LN, McArthur G, Huang C, Tward AD, Frederick MJ, McCormick F, Meyerson M, Cancer Genome Atlas Research, Van Allen EM, Cherniack AD, Ciriello G, Sander C, Schultz N (2018) Oncogenic signaling pathways in the cancer genome atlas. Cell 173:321–337.e10
Sawada R, Iwata M, Tabei Y, Yamato H, Yamanishi Y (2018) Predicting inhibitory and activatory drug targets by chemically and genetically perturbed transcriptome signatures. Sci Rep 8:156
Scannell JW, Blanckley A, Boldon H, Warrington B (2012) Diagnosing the decline in pharmaceutical R&D efficiency. Nat Rev Drug Discov 11:191–200
Sirota M, Dudley JT, Kim J, Chiang AP, Morgan AA, Sweet-Cordero A, Sage J, Butte AJ (2011) Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci Transl Med 3:96ra77
Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, Gould J, Davis JF, Tubelli AA, Asiedu JK, Lahr DL, Hirschman JE, Liu Z, Donahue M, Julian B, Khan M, Wadden D, Smith IC, Lam D, Liberzon A, Toder C, Bagul M, Orzechowski M, Enache OM, Piccioni F, Johnson SA, Lyons NJ, Berger AH, Shamji AF, Brooks AN, Vrcic A, Flynn C, Rosains J, Takeda DY, Hu R, Davison D, Lamb J, Ardlie K, Hogstrom L, Greenside P, Gray NS, Clemons PA, Silver S, Wu X, Zhao WN, Read-Button W, Wu X, Haggarty SJ, Ronco LV, Boehm JS, Schreiber SL, Doench JG, Bittker JA, Root DE, Wong B, Golub TR (2017) A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171:1437–1452.e17
Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, Gill S, Harrington WF, Pantel S, Krill-Burger JM, Meyers RM, Ali L, Goodale A, Lee Y, Jiang G, Hsiao J, Gerath WFJ, Howell S, Merkel E, Ghandi M, Garraway LA, Root DE, Golub TR, Boehm JS, Hahn WC (2017) Defining a cancer dependency map. Cell 170:564–576.e16
Van Noort V, Scholch S, Iskar M, Zeller G, Ostertag K, Schweitzer C, Werner K, Weitz J, Koch M, Bork P (2014) Novel drug candidates for the treatment of metastatic colorectal cancer through global inverse gene-expression profiling. Cancer Res 74:5690–5699
Viswanathan VS, Ryan MJ, Dhruv HD, Gill S, Eichhoff OM, Seashore-Ludlow B, Kaffenberger SD, Eaton JK, Shimada K, Aguirre AJ, Viswanathan SR, Chattopadhyay S, Tamayo P, Yang WS, Rees MG, Chen S, Boskovic ZV, Javaid S, Huang C, Wu X, Tseng YY, Roider EM, Gao D, Cleary JM, Wolpin BM, Mesirov JP, Haber DA, Engelman JA, Boehm JS, Kotz JD, Hon CS, Chen Y, Hahn WC, Levesque MP, Doench JG, Berens ME, Shamji AF, Clemons PA, Stockwell BR, Schreiber SL (2017) Dependency of a therapy-resistant state of cancer cells on a lipid peroxidase pathway. Nature 547:453–457
Wang K, Sun J, Zhou S, Wan C, Qin S, Li C, He L, Yang L (2013) Prediction of drug-target interactions for drug repositioning only based on genomic expression similarity. PLoS Comput Biol 9:e1003315
Winter GE, Radic B, Mayor-Ruiz C, Blomen VA, Trefzer C, Kandasamy RK, Huber KVM, Gridling M, Chen D, Klampfl T, Kralovics R, Kubicek S, Fernandez-Capetillo O, Brummelkamp TR, Superti-Furga G (2014) The solute carrier SLC35F2 enables YM155-mediated DNA damage toxicity. Nat Chem Biol 10:768–773
Xia X, Wong ST (2012) Concise review: a high-content screening approach to stem cell research and drug discovery. Stem Cells 30:1800–1807
Zhang J, Xing Z, Ma M, Wang N, Cai YD, Chen L, Xu X (2014) Gene ontology and KEGG enrichment analyses of genes related to age-related macular degeneration. Biomed Res Int 2014:450386
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
In silico drug repositioning: from large-scale transcriptome data to therapeutics
123
... Transcriptome analysis using RNA-seq reveals the RNA expression level in cells using next-generation sequencing analysis and can identify intracellular transcripts that change in response to specific stimuli [20]. Because the expression of transcripts induced by drug treatment ref lects a wide range of target changes, it provides abundant information on drug action mechanisms [21]. Using the characteristics of these drug response transcripts, it is possible to identify the MOA of a combination of natural compounds with similar structures. ...
Article
Full-text available
Natural products have successfully treated several diseases using a multi-component, multi-target mechanism. However, a precise mechanism of action (MOA) has not been identified. Systems pharmacology methods have been used to overcome these challenges. However, there is a limitation as those similar mechanisms of similar components cannot be identified. In this study, comparisons of physicochemical descriptors, molecular docking analysis and RNA-seq analysis were performed to compare the MOA of similar compounds and to confirm the changes observed when similar compounds were mixed and used. Various analyses have confirmed that compounds with similar structures share similar MOA. We propose an advanced method for in silico experiments in herbal medicine research based on the results. Our study has three novel findings. First, an advanced network pharmacology research method was suggested by partially presenting a solution to the difficulty in identifying multi-component mechanisms. Second, a new natural product analysis method was proposed using large-scale molecular docking analysis. Finally, various biological data and analysis methods were used, such as in silico system pharmacology, docking analysis and drug response RNA-seq. The results of this study are meaningful in that they suggest an analysis strategy that can improve existing systems pharmacology research analysis methods by showing that natural product–derived compounds with the same scaffold have the same mechanism.
... As knowledge of biological mechanisms advances and biomedical knowledge is well collected, more accurate and precise computational drug repurposing based on wellcurated data has become possible [8]. One computational repurposing framework is a network-based approach that can recommend candidate drugs by observing the complex relationships among biological entities such as drugs, genes, and diseases. ...
Article
Full-text available
Background Computational drug repurposing is crucial for identifying candidate therapeutic medications to address the urgent need for developing treatments for newly emerging infectious diseases. The recent COVID-19 pandemic has taught us the importance of rapidly discovering candidate drugs and providing them to medical and pharmaceutical experts for further investigation. Network-based approaches can provide repurposable drugs quickly by leveraging comprehensive relationships among biological components. However, in a case of newly emerging disease, applying a repurposing methods with only pre-existing knowledge networks may prove inadequate due to the insufficiency of information flow caused by the novel nature of the disease. Methods We proposed a network-based complementary linkage method for drug repurposing to solve the lack of incoming new disease-specific information in knowledge networks. We simulate our method under the controlled repurposing scenario that we faced in the early stage of the COVID-19 pandemic. First, the disease-gene-drug multi-layered network was constructed as the backbone network by fusing comprehensive knowledge database. Then, complementary information for COVID-19, containing data on 18 comorbid diseases and 17 relevant proteins, was collected from publications or preprint servers as of May 2020. We estimated connections between the novel COVID-19 node and the backbone network to construct a complemented network. Network-based drug scoring for COVID-19 was performed by applying graph-based semi-supervised learning, and the resulting scores were used to validate prioritized drugs for population-scale electronic health records-based medication analyses. Results The backbone networks consisted of 591 diseases, 26,681 proteins, and 2,173 drug nodes based on pre-pandemic knowledge. 
After incorporating the 35 entities comprised of complemented information into the backbone network, drug scoring screened top 30 potential repurposable drugs for COVID-19. The prioritized drugs were subsequently analyzed in electronic health records obtained from patients in the Penn Medicine COVID-19 Registry as of October 2021 and 8 of these were found to be statistically associated with a COVID-19 phenotype. Conclusion We found that 8 of the 30 drugs identified by graph-based scoring on complemented networks as potential candidates for COVID-19 repurposing were additionally supported by real-world patient data in follow-up analyses. These results show that our network-based complementary linkage method and drug scoring algorithm are promising strategies for identifying candidate repurposable drugs when new emerging disease outbreaks.
... Transcriptome analysis of mRNA reflects the collection of all mRNA products of a cell under a given condition (drug treatment or disease status), which provides a comprehensive view of biological changes resulting from multiple genetic variations. Therefore, altered transcriptome profiles can be used to elucidate the disease mechanisms and drug mode of action (Kwon et al., 2019). The application of transcriptomic data analysis combined with bioinformatics analysis in TCM research can significantly reduce the large number of experiments needed for screening pathways and therapeutic targets in the early stage of TCM research. ...
Article
Full-text available
Introduction: Cinnamomi ramulus (CR) is one of the most widely used traditional Chinese medicine (TCM) with anti-cancer effects. Analyzing transcriptomic responses of different human cell lines to TCM treatment is a promising approach to understand the unbiased mechanism of TCM. Methods: This study treated ten cancer cell lines with different CR concentrations, followed by mRNA sequencing. Differential expression (DE) analysis and gene set enrichment analysis (GSEA) were utilized to analyze transcriptomic data. Finally, the in silico screening results were verified by in vitro experiments. Results: Both DE and GSEA analysis suggested the Cell cycle pathway was the most perturbated pathway by CR across these cell lines. By analyzing the clinical significance and prognosis of G2/M related genes (PLK1, CDK1, CCNB1, and CCNB2) in various cancer tissues, we found that they were up-regulated in most cancer types, and their down-regulation showed better overall survival rates in cancer patients. Finally, in vitro experiments validation on A549, Hep G2, and HeLa cells suggested that CR can inhibit cell growth by suppressing the PLK1/CDK1/ Cyclin B axis. Discussion: This is the first study to apply transcriptomic analysis to investigate the cancer cell growth inhibition of CR on various human cancer cell lines. The core effect of CR on ten cancer cell lines is to induce G2/M arrest by inhibiting the PLK1/CDK1/Cyclin B axis.
Chapter
This book series invites all the Specialists, Professors, Doctors, Scientists, Academicians, Healthcare professionals, Nurses, Students, Researchers, Business Delegates, and Industrialists across the globe to publish their insights and convey recent developments in the field of Nursing, Pharmaceutical Research and Innovations in Pharma Industry. Book series on Pharmacy and Nursing covers research work in a set of clinical sciences and medicine.
Article
Epidermal growth factor receptor inhibitors (EGFRi) have exhibited promising clinical outcomes in the treatment of various cancers. However, their widespread application has been limited by low patient eligibility and the emergence of resistance. Leveraging a multi-omics approach (>1000 cancer cell lines), we explored molecular signatures linked to EGFRi responsiveness and found that expression signatures involved in the estrogen response could recapitulate cancer cell dependency on EGFR, a phenomenon not solely attributable to EGFR-activating mutations. By correlating genome-wide function screening data with EGFRi responses, we identified chemokine receptor 6 (CCR6) as a potential druggable target to mitigate EGFRi resistance. In isogenic cell models, pharmacological inhibition of CCR6 effectively reversed acquired EGFRi resistance, disrupting mitochondrial oxidative phosphorylation, a cellular process commonly associated with therapy resistance. Our data-driven strategy unveils drug-response biomarkers and therapeutic targets for resistance, thus potentially expanding EGFRi applicability and efficacy.
Article
Full-text available
BACKGROUND Diminished ovarian reserve has a serious impact on female reproduction with an increasing incidence every year. An important cause of this is oxidative stress. Rubi fructus, a traditional medicinal and edible plant, has shown therapeutic effects against gynecological diseases. Vanillic acid, isoquercitrin, kaempferol‐3‐O‐rutinoside, kaempferol‐3‐O‐sophoroside, oleanolic acid, tormentic acid, tiliroside, and ellagic acid are the major bioactive components in R. fructus. However, studies involved in the effectiveness and mechanism of these components in oxidative stress‐induced ovarian dysfunction are scarce. RESULTS In this study, the protective mechanisms of the bioactive components were evaluated in human ovarian granulosa cells. Isoquercitrin was significantly superior to other bioactive components in relieving damage in human ovarian granulosa cells induced by 2,2‐azobis (2‐methylpropionamidine) dihydrochloride, considering enhanced cell viability, reduced reactive oxygen species accumulation, and improved mitochondrial membrane potential level. Isoquercitrin protected human ovarian granulosa cells from oxidative stress by regulating the enzyme activity of glutathione peroxidase, inhibiting cell apoptosis, improving the expression of genes related to oxidative stress, and ameliorating heme oxygenase 1 protein expression. CONCLUSION Isoquercitrin, a bioactive component in R. fructus, has a significant protective effect on oxidative damage induced by 2,2‐azobis (2‐methylpropionamidine) dihydrochloride in human ovarian granulosa cells, providing evidence for its potential application in protecting ovarian function. © 2024 Society of Chemical Industry.
Preprint
Full-text available
Transcriptomic and phenomic profiling assays analyze drug perturbations to provide unbiased information regarding the mechanisms of action (MOAs) of drugs. However, few studies have compared the bioinformatics contents derived from these assays. This study investigated the transcriptomic and phenomic features in terms of diversities and MOA prediction. From publicly available L1000 and Cell Painting datasets, transcriptomic and phenomic features for 274 compounds annotated with 30 MOAs were prepared for analyses. Feature-extraction analyses with tSNE and Isomap algorithms showed that the compound distribution based on transcriptomic features was more dispersed than that based on phenomic features. Pairwise comparison across compounds showed high correlative clusters in phenomic feature heatmap. To explore the predictive potential for the MOA of compounds, transcriptomic and/or phenomic features were used to train machine learning models. XGBoost and Extra Tree models resulted in overfitting, whereas the KNN and Adaboost models yielded a relatively lower performance. Notably, the glucocorticoid receptor agonist was the class of MOA with the highest predictability based on transcriptomic and/or phenomic features. In conclusion, L1000 features were more diverse than the Cell Painting features. Machine learning analysis suggested new similar pairs of compounds and predicted certain classes among MOAs more accurately than others.
Chapter
Huge fiscal investments in conjunction with high attrition rates encountered during de novo drug discovery delay the timelines required for development and entry of novel drug molecules into the pharmaceutical market. This situation mandates the initiation of contemporary drug repurposing research strategies to reconnoiter the hidden off-label indications of de-risked compounds or existing approved drugs with established safety profile. This chapter leverages case studies to provide a comprehensive insight into state-of-the-art drug repurposing strategies. At the outset, the application of multifaceted omics data for identifying repurposable drug candidates in various disorders is highlighted. These include genomics-, transcriptomics-, proteomics-, epigenetics-, metabolomics-, microbomics-, pregnomics-, and multiomics- based approaches. Further, the utilization of massive dataretrieved from side effect databases and electronic health records to generate evidences in Fframing drug repurposing hypotheses is elaborated. Finally, text- mining techniques, computational methods, and artificial intelligence subsets deployed in effectuating prolific drug repurposing are deliberated.
Chapter
Alzheimer’s disease (AD) has become a public health emergency due to its complexity and heterogeneity; therefore, therapeutic regimens must focus on cure rather than symptom management. Because traditional drug discovery is slow and its failure rate is rising, alternative strategies such as repositioning existing drugs to treat AD have been increasingly applied. Reevaluating existing drugs for a new indication is known as “drug repositioning,” which may save money, time, and effort throughout the drug development process. Computational strategies have proven highly effective for predicting drug repositioning candidates, especially when integrated with network pharmacology. Network pharmacology offers a novel approach to drug discovery by building models that account for the broad physiological or pathophysiological context of protein targets and the effects of modulating them, without compromising essential molecular detail. It guides and assists drug repositioning by identifying new drug targets, disease mechanisms, multi-target drugs, drug combinations, and adverse drug reactions through the analysis and molecular visualization of multilayer omics data on drug-target-disease associations. This chapter discusses the importance and success of drug repositioning in AD drug development, as well as the prospects and methodologies of network pharmacology for understanding various aspects of drug repositioning.
Key words: Drug repositioning; Alzheimer’s disease; Network pharmacology; Network analysis; Omics
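One widely used network pharmacology measure is drug-disease proximity: how close a drug's targets lie to disease-associated genes in a protein interaction network. The following minimal sketch implements a "closest distance" variant. It assumes an unweighted, undirected network stored as an adjacency dictionary with symmetric entries, and the gene names are hypothetical. Published implementations usually also compare the value against randomized target sets, a step omitted here. This is not presented as the chapter's own method.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_distance(graph, drug_targets, disease_genes):
    """Average, over drug targets, of the shortest distance to any disease gene.

    Smaller values mean the drug's targets sit closer to the disease genes.
    Targets that cannot reach any disease gene are skipped; if none can,
    the result is infinity.
    """
    distances = []
    for target in drug_targets:
        dist = bfs_distances(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if reachable:
            distances.append(min(reachable))
    return sum(distances) / len(distances) if distances else float("inf")


# Toy undirected network (hypothetical nodes): T* = drug targets, D* = disease genes.
network = {
    "T1": ["P1"], "P1": ["T1", "D1"], "D1": ["P1", "D2"],
    "D2": ["D1", "T2"], "T2": ["D2"],
}
print(closest_distance(network, ["T1", "T2"], ["D1", "D2"]))
```

In this toy network T1 is two steps from D1 and T2 is one step from D2, so the average closest distance is 1.5.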
Article
The rising global burden of cancer has driven considerable efforts into the research and development of effective anti-cancer agents. With impressive advances in transcriptome profiling technology, the Connectivity Map (CMap) database has emerged as a promising and powerful resource for drug repurposing. It provides an important platform for systematically discovering associations among genes, small-molecule compounds and diseases, and for elucidating the mechanisms of action of drugs, contributing to more efficient anti-cancer pharmacotherapy. Moreover, CMap-based computational drug repurposing is gaining attention because of its potential to overcome the bottlenecks of cost, time and risk faced by traditional drug discovery. Herein, we provide a comprehensive review of the applications of drug repurposing to anti-cancer drug discovery and summarize approaches for computational drug repurposing. We focus on the principle of the CMap database and on novel CMap-based software and algorithms, as well as the progress they have enabled in drug repurposing for oncotherapy. This article is expected to illuminate the emerging potential of CMap in discovering effective anti-cancer drugs, thereby promoting efficient healthcare for cancer patients.
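The CMap principle can be made concrete with the Kolmogorov-Smirnov-style score used by the original 2006 Connectivity Map. Each drug instance's genes are ranked from most up-regulated to most down-regulated. The query's up-regulated and down-regulated gene sets are each scored against that ranking, and the two scores are then combined. The sketch below simplifies that formulation and uses hypothetical gene names. It omits the rescaling across all instances used in the original resource and the weighted score adopted by the later L1000-based CMap. A positive score suggests the drug mimics the query signature. A negative score suggests it reverses the signature, which is the pattern sought when screening for drugs that revert a disease state.

```python
def ks_score(tag_genes, ranked_genes):
    """KS-style enrichment of a tag set in a ranked gene list.

    ranked_genes[0] is the gene most up-regulated by the drug instance.
    Returns a value in [-1, 1]; positive means the tags cluster near the top.
    """
    n = len(ranked_genes)
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}  # 1-based ranks
    v = sorted(position[g] for g in tag_genes if g in position)
    t = len(v)
    if t == 0:
        return 0.0
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(up_genes, down_genes, ranked_genes):
    """Raw (unscaled) connectivity of a query signature to one drug instance."""
    ks_up = ks_score(up_genes, ranked_genes)
    ks_down = ks_score(down_genes, ranked_genes)
    # Same sign means the up and down sets do not move coherently: score 0.
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down


# Toy usage with hypothetical genes G1..G10 ranked by a drug's effect.
ranked = [f"G{i}" for i in range(1, 11)]
print(connectivity_score({"G1", "G2"}, {"G9", "G10"}, ranked))        # mimics
print(connectivity_score({"G1", "G2"}, {"G9", "G10"}, ranked[::-1]))  # reverses
```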
Article
Even when targets responsible for chemoresistance are identified, drug development is often hampered by the poor druggability of these proteins. We systematically analyzed therapy resistance using large-scale cancer cell transcriptome and drug-response datasets and predicted candidate drugs based on gene expression profiles. Our results implicated the epithelial–mesenchymal transition as a common mechanism underlying resistance to chemotherapeutic drugs. Notably, we identified ITGB3, whose expression was abundant in both drug-resistant and mesenchymal states, as a promising target for overcoming chemoresistance. We also confirmed that depletion of ITGB3 sensitized cancer cells to conventional chemotherapeutic drugs by modulating the NF-κB signaling pathway. Considering the poor druggability of ITGB3 and the lack of feasible drugs that directly inhibit this protein, we performed an in silico screen for drugs mimicking the transcriptome-level changes caused by ITGB3 knockdown. This approach identified atorvastatin as a novel candidate for drug repurposing, paving an alternative path to drug screening that is applicable to undruggable targets. Electronic supplementary material: The online version of this article (10.1186/s12943-018-0924-8) contains supplementary material, which is available to authorized users.
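A screen for drugs that mimic a knockdown can be sketched as ranking drug signatures by their similarity to the knockdown signature. In this minimal sketch, each signature is assumed to be a dictionary of gene-level log fold changes, and Pearson correlation over shared genes serves as the similarity measure. Both are hypothetical choices, and the study's actual metric is not stated here. Drug and gene names are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_mimicking_drugs(knockdown_sig, drug_sigs, min_shared=3):
    """Rank drugs by how closely their expression changes mimic a knockdown.

    knockdown_sig: dict gene -> log fold change after target knockdown
    drug_sigs: dict drug -> (dict gene -> log fold change after treatment)
    Returns (drug, r) pairs, strongest mimic (highest r) first.
    """
    results = []
    for drug, sig in drug_sigs.items():
        genes = sorted(set(knockdown_sig) & set(sig))  # genes measured in both
        if len(genes) < min_shared:
            continue  # too few shared genes for a meaningful correlation
        r = pearson([knockdown_sig[g] for g in genes], [sig[g] for g in genes])
        results.append((drug, r))
    return sorted(results, key=lambda item: item[1], reverse=True)


knockdown = {"A": 2.0, "B": 1.0, "C": -1.0, "D": -2.0}
drugs = {
    "drug_x": {"A": 1.5, "B": 0.8, "C": -0.5, "D": -1.9},   # same direction
    "drug_y": {"A": -2.0, "B": -1.0, "C": 1.0, "D": 2.0},   # opposite direction
    "drug_z": {"A": 0.1, "B": -0.1, "C": 0.1, "D": -0.1},   # weak relation
}
print(rank_mimicking_drugs(knockdown, drugs))
```

Sorting in the opposite direction (most negative correlation first) would instead favour drugs that reverse the signature.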
Article
KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp/ or https://www.genome.jp/kegg/) is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is an integrated database consisting of three generic categories of systems information, genomic information and chemical information, and an additional human-specific category of health information. KEGG pathway maps, BRITE hierarchies and KEGG modules have been developed as generic molecular networks with KEGG Orthology nodes of functional orthologs so that KEGG pathway mapping and other procedures can be applied to any cellular organism. However, this generic approach was inadequate for knowledge representation in the health information category, where variations of human genomes, especially disease-related variations, had to be considered. Thus, we have introduced a new approach where human gene variants are explicitly incorporated into what we call 'network variants' in the recently released KEGG NETWORK database. This allows accumulation of knowledge about disease-related perturbed molecular networks caused not only by gene variants, but also by viruses and other pathogens, environmental factors and drugs. We expect that KEGG NETWORK will become another reference knowledge base for the basic understanding of disease mechanisms and practical use in clinical sequencing and drug development.
Article
The selective survival advantage of culture-adapted human embryonic stem cells (hESCs) is a serious safety concern for their clinical application. Using a set of hESCs at various passage numbers, we observed that a subpopulation of late-passage hESCs was highly resistant to various cell death stimuli, such as YM155, a survivin inhibitor. Transcriptome analysis of YM155-sensitive (YM155S) and YM155-resistant (YM155R) hESCs demonstrated that BCL2L1 was highly expressed in YM155R hESCs. By matching the gene signature of YM155R hESCs with the Cancer Therapeutics Response Portal dataset, BH3 mimetics were predicted to selectively ablate these cells. Indeed, short-course treatment with a sub-optimal dose of BH3 mimetics induced the spontaneous death of YM155R, but not YM155S, hESCs by disrupting the mitochondrial membrane potential. YM155S hESCs remained pluripotent following BH3 mimetic treatment. Therefore, the use of BH3 mimetics is a promising strategy to specifically eliminate hESCs with a selective survival advantage.
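Matching a cell-state signature against a drug-response resource such as the Cancer Therapeutics Response Portal can be sketched as a correlation across cell lines. For each drug, the per-cell-line signature score is correlated with that drug's sensitivity values. This sketch rests on three hypothetical choices, none taken from the study. The signature score is the mean expression of the signature genes. Sensitivity is an area-under-the-curve (AUC) value, where lower means more sensitive. Pearson correlation is used as the association measure. Cell-line and drug names are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def signature_score(expr, genes):
    """Mean expression of the signature genes in one cell line (assumed scoring)."""
    return sum(expr[g] for g in genes) / len(genes)


def drugs_favoring_signature(cell_expr, drug_auc, genes, min_cells=3):
    """Correlate a per-cell-line signature score with each drug's AUC.

    cell_expr: dict cell_line -> dict gene -> expression
    drug_auc: dict drug -> dict cell_line -> AUC (assumed: lower = more sensitive)
    Returns (drug, r) pairs, most negative r first, i.e. drugs to which
    signature-high cell lines appear most sensitive.
    """
    scores = {cell: signature_score(expr, genes) for cell, expr in cell_expr.items()}
    results = []
    for drug, auc_by_cell in drug_auc.items():
        cells = [c for c in scores if c in auc_by_cell]  # lines tested with this drug
        if len(cells) < min_cells:
            continue
        r = pearson([scores[c] for c in cells], [auc_by_cell[c] for c in cells])
        results.append((drug, r))
    return sorted(results, key=lambda item: item[1])


expr = {f"c{i}": {"BCL2L1": float(i)} for i in range(1, 5)}
auc = {
    "bh3_mimetic": {"c1": 10, "c2": 8, "c3": 6, "c4": 4},
    "other_drug": {"c1": 4, "c2": 6, "c3": 8, "c4": 10},
}
print(drugs_favoring_signature(expr, auc, ["BCL2L1"]))
```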
Preprint
Although many molecular targets for cancer therapy have been discovered, they often show poor druggability, which is a major obstacle to developing targeted drugs. As an alternative route to drug discovery, we adopted an in silico drug repositioning (in silico DR) approach based on large-scale gene expression signatures, with the goal of identifying inhibitors of lung cancer metastasis. Our analysis of clinico-genomic data identified GALNT14, an enzyme involved in O-linked N-acetylgalactosamine glycosylation, as a putative driver of lung cancer metastasis leading to poor survival. To overcome the poor druggability of GALNT14, we leveraged the Connectivity Map approach, an in silico screen for drugs that are likely to revert metastatic expression patterns. This led to the identification of bortezomib (BTZ) as a potent inhibitor of metastasis, bypassing direct inhibition of the poorly druggable target GALNT14. The anti-metastatic effect of BTZ was verified in vitro and in vivo. Notably, both BTZ treatment and GALNT14 knockdown attenuated TGFβ-mediated gene expression and suppressed TGFβ-dependent metastatic genes, suggesting that BTZ acts by modulating TGFβ signaling. Taken together, these results demonstrate that our in silico DR approach is a viable strategy for identifying candidate drugs for undruggable targets and for uncovering their underlying mechanisms.
Article
Genome-wide identification of all target proteins of drug candidate compounds is a challenging issue in drug discovery. Moreover, emerging phenotypic effects, including therapeutic and adverse effects, are heavily dependent on the inhibition or activation of target proteins. Here we propose a novel computational method for predicting inhibitory and activatory targets of drug candidate compounds. Specifically, we integrated chemically induced and genetically perturbed gene expression profiles in human cell lines, which avoids dependence on the chemical structures of compounds or proteins. Predictive models for individual target proteins were constructed simultaneously using a joint learning algorithm based on transcriptomic changes in global patterns of gene expression profiles following chemical treatments, and following knockdown and overexpression of proteins. This method discriminates between inhibitory and activatory targets and enables accurate identification of therapeutic effects. Herein, we comprehensively predicted drug-target-disease association networks for 1,124 drugs, 829 target proteins, and 365 human diseases, and validated some of these predictions in vitro. The proposed method is expected to facilitate identification of new drug indications and potential adverse effects.
Article
Although the traditional drug discovery approach has led to the development of many successful drugs, attrition rates remain high. Recent advances in systems-oriented approaches (systems biology and/or pharmacology) and 'omics technologies have led to a plethora of new computational tools that promise to enable a more informed and successful implementation of the reductionist 'one drug for one target for one disease' approach. These tools, based on biomolecular pathways and interaction networks, offer a systematic approach to unravel the mechanism(s) of a disease and link them to the chemical space and network footprint of a drug. Drug discovery can draw upon this holistic approach to identify the most promising targets and compounds during the early phases of development.
Article
In the process of drug development, in vitro studies do not always adequately predict human-specific drug responsiveness in clinical trials. Here, we took advantage of human iPSC-derived neurons, which offer human-specific drug responsiveness, to screen and evaluate therapeutic candidates for Alzheimer’s disease (AD). Using AD patient neurons of nearly 100% purity derived from iPSCs, we established a robust and reproducible assay for amyloid β peptide (Aβ), a pathogenic molecule in AD, and screened a pharmaceutical compound library. We obtained 27 Aβ-lowering screen hits, prioritized them by chemical structure-based clustering, and selected 6 lead compounds. Next, to maximize the anti-Aβ effect, we selected a synergistic combination of bromocriptine, cromolyn, and topiramate as an anti-Aβ cocktail. Finally, using neurons from familial and sporadic AD patients, we found that the cocktail showed a significant and potent anti-Aβ effect on patient cells. This human iPSC-based platform promises to be useful for AD drug development.
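Structure-based clustering of screening hits is commonly performed on molecular fingerprints compared with the Tanimoto coefficient. The following sketch implements Butina-style (sphere-exclusion) clustering. It assumes fingerprints are supplied as sets of on-bit indices, which in practice would be generated by a cheminformatics toolkit. The abstract does not name the fingerprint type or clustering algorithm, so both are assumptions here, and the compound names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def butina_cluster(fingerprints, threshold=0.6):
    """Butina-style clustering: compounds with Tanimoto >= threshold are neighbours.

    fingerprints: dict compound -> set of on-bit indices
    Returns a list of clusters (lists of compound names), centroid first.
    """
    names = list(fingerprints)
    neighbours = {
        a: [b for b in names
            if b != a and tanimoto(fingerprints[a], fingerprints[b]) >= threshold]
        for a in names
    }
    # Compounds with the most neighbours are processed first and become centroids.
    order = sorted(names, key=lambda n: len(neighbours[n]), reverse=True)
    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + [n for n in neighbours[centroid] if n not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


hits = {
    "hit1": {1, 2, 3, 4},
    "hit2": {1, 2, 3, 5},
    "hit3": {1, 2, 3, 4, 5},
    "hit4": {10, 11, 12},
}
print(butina_cluster(hits, threshold=0.5))
```

One representative per cluster (for example, the centroid) could then be carried forward, which is one way to reduce a hit list to a few structurally distinct leads.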
Article
Given the high attrition rates, substantial costs and slow pace of new drug discovery and development, repurposing of 'old' drugs to treat both common and rare diseases is increasingly becoming an attractive proposition because it involves the use of de-risked compounds, with potentially lower overall development costs and shorter development timelines. Various data-driven and experimental approaches have been suggested for the identification of repurposable drug candidates; however, there are also major technological and regulatory challenges that need to be addressed. In this Review, we present approaches used for drug repurposing (also known as drug repositioning), discuss the challenges faced by the repurposing community and recommend innovative ways by which these challenges could be addressed to help realize the full potential of drug repurposing.
Article
The systematic translation of cancer genomic data into knowledge of tumour biology and therapeutic possibilities remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacological annotation is available¹. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacological profiles for 24 anticancer drugs across 479 of the cell lines, this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Together, our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of ‘personalized’ therapeutic regimens².
Article
Genetic alterations in signaling pathways that control cell-cycle progression, apoptosis, and cell growth are common hallmarks of cancer, but the extent, mechanisms, and co-occurrence of alterations in these pathways differ between individual tumors and tumor types. Using mutations, copy-number changes, mRNA expression, gene fusions and DNA methylation in 9,125 tumors profiled by The Cancer Genome Atlas (TCGA), we analyzed the mechanisms and patterns of somatic alterations in ten canonical pathways: cell cycle, Hippo, Myc, Notch, Nrf2, PI-3-Kinase/Akt, RTK-RAS, TGFβ signaling, p53 and β-catenin/Wnt. We charted the detailed landscape of pathway alterations in 33 cancer types, stratified into 64 subtypes, and identified patterns of co-occurrence and mutual exclusivity. Eighty-nine percent of tumors had at least one driver alteration in these pathways, and 57% of tumors had at least one alteration potentially targetable by currently available drugs. Thirty percent of tumors had multiple targetable alterations, indicating opportunities for combination therapy.
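The summary statistics above depend on simple set operations and pairwise tests. This sketch shows one common way to compute them. First, it calculates the fraction of tumors with at least one alteration across a group of genes. Second, it applies a one-sided Fisher's exact (hypergeometric) test, with fixed marginals, for co-occurrence or mutual exclusivity of two alterations. It illustrates the general technique rather than the TCGA analysis itself, which may have used dedicated methods. Tumor IDs are hypothetical.

```python
from math import comb


def fraction_with_any(alteration_sets, total):
    """Fraction of tumors carrying at least one alteration across the given genes."""
    return len(set().union(*alteration_sets)) / total


def hypergeom_sf(k, total, with_a, with_b):
    """P(X >= k), where X = tumors altered in both A and B under independence
    with fixed marginals (the null of a one-sided Fisher's exact test)."""
    upper = min(with_a, with_b)
    denom = comb(total, with_b)
    return sum(
        comb(with_a, x) * comb(total - with_a, with_b - x) for x in range(k, upper + 1)
    ) / denom


def pair_tests(altered_a, altered_b, total):
    """Return (p_cooccurrence, p_mutual_exclusivity) for two altered-tumor sets."""
    both = len(altered_a & altered_b)
    a, b = len(altered_a), len(altered_b)
    p_cooccur = hypergeom_sf(both, total, a, b)           # P(X >= observed)
    p_exclusive = 1 - hypergeom_sf(both + 1, total, a, b)  # P(X <= observed)
    return p_cooccur, p_exclusive


# Toy cohort of 10 tumors in which the two alterations never overlap.
gene_a = {1, 2, 3, 4, 5}
gene_b = {6, 7, 8, 9, 10}
print(fraction_with_any([gene_a, gene_b], 10))
print(pair_tests(gene_a, gene_b, 10))
```

In this toy cohort every tumor carries one of the two alterations and none carries both. The exclusivity p-value equals 1/252, the probability of zero overlap under independence.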