Articles
https://doi.org/10.1038/s41587-021-00946-z
¹Peking University International Cancer Institute, Health Science Center, Peking University, Beijing, China.
²Department of Pharmacology, School of Basic Medical Sciences, Health Science Center, Peking University, Beijing, China.
³Beijing & Qingdao Langu Pharmaceutical R&D Platform, Beijing Gigaceuticals Tech. Co. Ltd., Beijing, China.
⁴Department of Anatomy, Histology and Embryology, Neuroscience Research Institute, Health Science Center, Peking University, Beijing, China.
⁵State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Health Science Center, Peking University, Beijing, China.
⁶These authors contributed equally: Jie Zhu, Jingxiang Wang, Xin Wang, Mingjing Gao, Bingbing Guo.
⁷These authors jointly supervised this work: Hong Zhou, Ning Zhang, Ruimao Zheng, Zhengwei Xie.
✉e-mail: rainbow_zhou@126.com; zhangning@bjmu.edu.cn; rmzheng@pku.edu.cn; xiezhengwei@hsc.pku.edu.cn
Recent developments in the application of deep learning to diverse
areas (for example, natural language processing and computer vision)
suggest the potential of advanced algorithms for the assessment of
chemicals in applications such as molecular encoding, chemical
synthesis route planning and inhibitor target prediction¹⁻⁵. Combined
with resources developed in computational chemistry, these deep
learning tools are changing the landscape of chemical and
pharmaceutical research and development (for example, by enabling
rapid sampling of a vast chemical space and allowing researchers to
make accurate predictions about structure–function relationships).
Drug development based on target proteins has been a successful
approach in the past decades, but this approach cannot address
diseases that lack well-defined protein targets. One strategy for
developing drugs to treat these diseases would be to generate a model
capable of predicting efficacy independent of specific targets. A
recent study showed how a new antibiotic candidate for treating
Escherichia coli infections was found using a customized deep learning
model⁶. However, this kind of model is built on a case-by-case basis
and relies on phenotypic data specific to a single disease state; that
is, it lacks the ability to generalize to other diseases.
Given that most diseases are associated with characteristic changes in
gene expression profiles, such changes can be used as indicators of
the underlying mechanisms of diseases, an assumption embodied in the
Connectivity Map (CMap) concept⁷⁻¹⁰. However, CMap is applicable only
to molecules whose transcriptional profiles have already been
experimentally assessed. We envisioned that a model capable of
predicting chemically inducible changes in transcriptional profiles
(CTPs) for an unlimited number of small molecules would make it much
easier to find potent agents to develop as treatments for most
diseases. First, we constructed a neural network using simplified
molecular-input line-entry system (SMILES) chemical encoding as input
to fit CTPs that were measured in the L1000 project¹¹ (Fig. 1a).
Second, using gene signatures specific to pathological contexts, we
employed gene set enrichment analysis (GSEA)¹² to evaluate the
potential efficacy of compounds against these diseases. We refer to
this approach and model as DLEPS.
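To make the scoring step concrete, the following is a minimal sketch of a GSEA-style enrichment score applied to a predicted CTP. It is an illustration under stated assumptions, not the DLEPS implementation: the function names `enrichment_score` and `reversal_score`, the toy gene labels, and the sign convention (a positive score meaning that the compound pushes disease-upregulated genes down and disease-downregulated genes up) are all hypothetical choices made here.

```python
def enrichment_score(ranked_genes, gene_set, weights=None):
    """Signed maximum deviation of a running-sum (Kolmogorov-Smirnov-like) statistic.

    ranked_genes: genes ordered from most upregulated to most downregulated.
    gene_set:     the signature genes to look for in that ranking.
    weights:      optional per-gene hit weights; unweighted if omitted.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = hits.count(False)
    if weights is None:
        weights = [1.0] * len(ranked_genes)
    hit_total = sum(w for w, h in zip(weights, hits) if h)
    if hit_total == 0 or n_miss == 0:
        return 0.0  # the set is absent from, or covers, the whole ranking
    running = best = 0.0
    for w, h in zip(weights, hits):
        # Step up at a signature gene, step down otherwise.
        running += w / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def reversal_score(predicted_ctp, disease_up, disease_down):
    """Assumed convention: higher means the compound better reverses the disease signature."""
    ranked = sorted(predicted_ctp, key=predicted_ctp.get, reverse=True)
    return enrichment_score(ranked, disease_down) - enrichment_score(ranked, disease_up)


# Toy predicted CTP (gene -> predicted expression change); values are made up.
ctp = {"A": 2.0, "B": 1.5, "C": 0.3, "D": -0.2, "E": -1.1, "F": -1.8}
score = reversal_score(ctp, disease_up={"E", "F"}, disease_down={"A", "B"})
```

Ranking compounds by such a score and keeping the top entries mirrors the candidate selection described in the text; published GSEA additionally weights hits by the ranking metric, which the optional `weights` argument only approximates.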
Results
The architecture and training of DLEPS. To build a general-purpose
model that is suitable for use with many diseases, especially for
disorders without well-defined targets, we developed DLEPS, which
comprises two stages. First, we trained a deep neural network to
predict CTPs based on data from cell culture screening with diverse
compounds (Fig. 1a). The SMILES encoding of each small molecule was
first parsed into a grammar tree¹³, which was then randomly encoded as
a point in a high-dimensional sphere (Fig. 1a, middle). This latent
vector was then passed to a deep dense network to predict the CTPs
(Fig. 1a, right).
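The data flow of this first stage can be sketched in a few lines. The code below is a toy illustration, not the trained DLEPS network: it assumes a variational-autoencoder-style sampling step for the random latent point, uses untrained random weights and made-up layer sizes, and skips the SMILES grammar parsing entirely. All function names are hypothetical. The `pearson` helper shows how a predicted CTP could be compared with a measured profile.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Sample z = mean + sigma * eps with eps ~ N(0, 1) (assumed VAE-style step)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def dense(x, weights, bias, relu=True):
    """One fully connected layer; weights holds one row per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def predict_ctp(z, layers):
    """Pass a latent vector through dense layers; the last layer stays linear."""
    for i, (weights, bias) in enumerate(layers):
        z = dense(z, weights, bias, relu=(i < len(layers) - 1))
    return z


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
latent_dim, hidden_dim, n_genes = 4, 8, 6  # toy sizes, not the paper's


def random_layer(n_out, n_in):
    """Untrained placeholder weights, for illustration only."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


layers = [random_layer(hidden_dim, latent_dim), random_layer(n_genes, hidden_dim)]
z = sample_latent([0.0] * latent_dim, [0.0] * latent_dim, rng)
predicted = predict_ctp(z, layers)  # one predicted change per gene
```

In a real setting the encoder outputs (`mean`, `log_var`) and the dense-layer weights would be learned from the L1000 profiles, and `pearson(predicted, measured)` would be evaluated on held-out molecules.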
Second, we selected upregulated and downregulated gene signatures that
should reflect pathological changes in gene expression levels; here,
we employed GSEA, which has been adopted in CMap, to compute an
enrichment score as the efficacy score⁷,⁹. According to this score, we
finally selected several top-ranked candidate small molecules to be
assayed with cell cultures or directly in animal models
Prediction of drug efficacy from transcriptional
profiles with deep learning
Jie Zhu¹,²,⁶, Jingxiang Wang³,⁶, Xin Wang²,⁶, Mingjing Gao³,⁶, Bingbing Guo⁴,⁶, Miaomiao Gao¹,
Jiarui Liu⁴, Yanqiu Yu¹, Liang Wang², Weikaixin Kong⁵, Yongpan An², Zurui Liu³, Xinpei Sun¹,
Zhuo Huang⁵, Hong Zhou²,⁷ ✉, Ning Zhang¹,⁷ ✉, Ruimao Zheng⁴,⁷ ✉ and Zhengwei Xie¹,³,⁷ ✉
Drug discovery focused on target proteins has been a successful strategy, but many diseases and biological processes lack obvious
targets to enable such approaches. Here, to overcome this challenge, we describe a deep learning–based efficacy prediction
system (DLEPS) that identifies drug candidates using a change in the gene expression profile in the diseased state as input.
DLEPS was trained using chemically induced changes in transcriptional profiles from the L1000 project. We found that the
changes in transcriptional profiles for previously unexamined molecules were predicted with a Pearson correlation coefficient
of 0.74. We examined three disorders and experimentally tested the top drug candidates in mouse disease models. Validation
showed that perillen, chikusetsusaponin IV and trametinib confer disease-relevant impacts against obesity, hyperuricemia and
nonalcoholic steatohepatitis, respectively. DLEPS can generate insights into pathogenic mechanisms, and we demonstrate that
the MEK–ERK signaling pathway is a target for developing agents against nonalcoholic steatohepatitis. Our findings suggest
that DLEPS is an effective tool for drug repurposing and discovery.
NATURE BIOTECHNOLOGY | VOL 39 | NOVEMBER 2021 | 1444–1452 | www.nature.com/naturebiotechnology
Content courtesy of Springer Nature, terms of use apply. Rights reserved