
Biclustering analysis of transcriptome big data identifies condition-specific microRNA targets

Abstract

We present a novel approach to identify human microRNA (miRNA) regulatory modules (mRNA targets and relevant cell conditions) by biclustering a large collection of mRNA fold-change data for sequence-specific targets. Bicluster targets were assessed using validated messenger RNA (mRNA) targets and exhibited on an average 17.0% (median 19.4%) improved gain in certainty (sensitivity + specificity). The net gain was further increased up to 32.0% (median 33.4%) by incorporating functional networks of targets. We analyzed cancer-specific biclusters and found that the PI3K/Akt signaling pathway is strongly enriched with targets of a few miRNAs in breast cancer and diffuse large B-cell lymphoma. Indeed, five independent prognostic miRNAs were identified, and repression of bicluster targets and pathway activity by miR-29 was experimentally validated. In total, 29 898 biclusters for 459 human miRNAs were collected in the BiMIR database where biclusters are searchable for miRNAs, tissues, diseases, keywords and target genes.
Published online 1 March 2019 Nucleic Acids Research, 2019, Vol. 47, No. 9 e53
doi: 10.1093/nar/gkz139
Sora Yoon1, Hai C. T. Nguyen1, Woobeen Jo1, Jinhwan Kim1, Sang-Mun Chi2,
Jiyoung Park1, Seon-Young Kim3,4 and Dougu Nam1,5,*
1School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea,
2School of Computer Science and Engineering, Kyungsung University, Busan 48434, Republic of Korea,
3Department of Functional Genomics, University of Science and Technology (UST), Daejeon 34141, Republic of
Korea, 4Genome Editing Research Center, Personalized Genomic Medicine Research Center, Korea Research
Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea and 5Department of
Mathematical Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
Received December 15, 2018; Editorial Decision February 13, 2019; Accepted February 19, 2019
INTRODUCTION
MicroRNAs (miRNAs) are small non-coding RNA
molecules (19–23 nt) that regulate gene expression by
binding to miRNA response elements in messenger RNA
(mRNA) at the post-transcription level (1,2). Since their
discovery, extensive studies have revealed their key roles in
regulating cell cycle and differentiation, chronic diseases,
cancer progression and other processes (3–6). As the func-
tion of an miRNA is characterized by its target genes, there
have been efforts to systematically identify these target
genes based on the binding sequences (7–12). Although
these methods have provided hundreds to thousands of
potential targets, they also yield a large number of false
positives and do not suggest specific targets related to the
cell condition being examined.
To select more reliable mRNA targets for each miRNA, paired expression profiles of miRNAs and mRNAs (denoted as miRNA–mRNA profiles) have been incorporated, considering the anticorrelation between an miRNA and its target mRNA. In addition to simple Pearson and Spearman correlation methods, a number of computational methods that integrate both the binding sequence and miRNA–mRNA profiles have been developed to detect miRNA–mRNA regulatory relationships, including penalized regression and Bayesian methods (13–15) (denoted as anticorrelation-based methods). Many of these methods used multivariate linear models in which multiple miRNAs regulate a common target gene. Although anticorrelation-based methods have improved target prediction, they require very costly miRNA–mRNA profiles, and only a limited number of such paired datasets are publicly available at present.
Another approach for improving miRNA target prediction is by inference of miRNA regulation modules. Based on binding sequence information, a bipartite graph between miRNAs and mRNAs was constructed and the maximum bicliques (or biclusters) were identified (16,17). These bicliques represent miRNA regulation modules in which multiple miRNAs may coregulate their common targets. By incorporating miRNA–mRNA profiles, these modules were further refined for specific cell conditions (18–21). Because of the modular nature of cellular processes, these modules were considered to represent more reliable miRNA regulation patterns (22). Recent methods incorporated additional information such as protein–protein (or gene–gene) interactions, copy number variation and methylation data to better understand miRNA regulation (23). The myriad of computational methods for miRNA target prediction have been reviewed and categorized previously (15,20,23), some of which are summarized in Supplementary Table S1.
*To whom correspondence should be addressed. Tel: +82 52 217 2525; Fax: +82 52 217 2639; Email: dougnam@unist.ac.kr
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from https://academic.oup.com/nar/article-abstract/47/9/e53/5366474 by guest on 19 March 2020
Figure 1. Two approaches for miRNA regulation module discovery. Red, yellow and blue nodes represent miRNA regulators, mRNA target genes and cell conditions, respectively. R, g and C stand for regulator, target gene and cell condition, respectively. (A) Existing approach. For a given cell condition (here, C1), down (or up)-regulated mRNAs are selected and biclusters between multiple miRNAs and these mRNA targets are identified. (B) Our approach. For a given miRNA (here, R1), mRNAs with corresponding binding sequences are selected and biclusters between these mRNAs and multiple cell conditions are searched.
In this study, we propose a novel approach to identifying miRNA targets for a variety of cell conditions by biclustering a large collection of mRNA profiles for sequence-specific targets. To this end, we collected 5158 human microarray expression datasets with diverse test and control conditions from the Gene Expression Omnibus (GEO) database (24) and compiled corresponding fold-change (FC) profiles representing 5158 cell conditions. Whereas existing methods for miRNA regulation modules biclustered miRNAs and mRNA targets under a given cell condition (Figure 1A), we considered a different dimension and biclustered mRNA targets and cell conditions (i.e. FC profiles) for an miRNA of interest (Figure 1B). Our approach provides more reliable miRNA target groups that are robustly regulated across different cell conditions without using miRNA–mRNA profiles. A related approach incorporated coexpression of sequence-specific targets using 250 microarray datasets to prioritize true targets (25), but it clustered only target genes and did not suggest relevant cell conditions.
Typically, biclustering algorithms seek to identify a com-
plete association (i.e. biclique) between two subsets of nodes
(e.g. a subset of target genes and a subset of cell conditions)
(26,27). Taking into account the noise in microarray data,
we developed a progressive bicluster extension (PBE) algo-
rithm that allows for a small portion of unconnected pairs
between two node subsets but yields biclusters of much
larger sizes. In the initial step, PBE identifies bicliques using
the bimax algorithm (27). These bicliques are used as seeds
that are extended by competitively adding ‘dense’ (low pro-
portion of zero values) rows and columns. Next, less dense
rows and columns are removed based on a threshold. By
increasing this threshold (tight to less tight) during the iteration
of bicluster extension, PBE identified the bicluster
structures from noisy data more accurately than state-of-the-art
algorithms (17,27–31). QUBIC (29) uses a similar
approach by searching for seed biclusters that are then ex-
tended. However, QUBIC applies a threshold for minimum
column density only, which does not change during exten-
sion and does not remove noisy rows (Supplementary Fig-
ure S4B).
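The extend-then-prune idea described above can be illustrated with a minimal sketch. This is not the authors' PBE implementation (which is specified in Supplementary Section S2): it omits the 'competitive' ordering of additions, and the function names, the threshold schedule and the toy matrix are assumptions made only for illustration.

```python
def zero_fraction(values):
    """Proportion of zeros in a sequence of 0/1 values (1.0 if empty)."""
    values = list(values)
    if not values:
        return 1.0
    return values.count(0) / len(values)


def extend_bicluster(matrix, seed_rows, seed_cols, thresholds=(0.0, 0.05, 0.10)):
    """Grow a seed bicluster while relaxing the allowed zero proportion.

    matrix     : list of lists holding 0/1 values
    thresholds : hypothetical schedule, from strict to less strict
    """
    rows, cols = set(seed_rows), set(seed_cols)
    n_rows, n_cols = len(matrix), len(matrix[0])
    for t in thresholds:
        # Extension: add every row, then every column, that is dense enough.
        rows |= {r for r in range(n_rows)
                 if zero_fraction(matrix[r][c] for c in cols) <= t}
        cols |= {c for c in range(n_cols)
                 if zero_fraction(matrix[r][c] for r in rows) <= t}
        # Pruning: drop rows, then columns, that are now too sparse.
        rows = {r for r in rows
                if zero_fraction(matrix[r][c] for c in cols) <= t}
        cols = {c for c in cols
                if zero_fraction(matrix[r][c] for r in rows) <= t}
        if not rows or not cols:
            break
    return sorted(rows), sorted(cols)


toy = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],  # one noisy zero inside the planted block
    [0, 0, 0, 1, 0],
]
result = extend_bicluster(toy, seed_rows=[0, 1], seed_cols=[0, 1],
                          thresholds=(0.0, 0.34))
```

In this toy case the strict pass adds the third row, and the relaxed pass admits the third column despite its single zero, which is the behavior the text attributes to a gradually loosened density threshold.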
The biclusters were assessed using experimentally vali-
dated targets and exhibited substantially improved accu-
racy compared to the purely sequence-based method. The
accuracy was even further improved by selecting the targets
having functional interactions with other target genes. Notably,
these gains were obtained using only publicly available
gene expression and protein functional interaction data,
but compared favorably with those obtained from
the anticorrelation-based methods. Moreover, our predictions
are available for 459 human miRNAs and a variety
of cell conditions from our bicluster database, called
BiMIR (http://btool.org/bimir_dir/). We further validated
our approach by analyzing the pathways of cancer-specific
biclusters and prognosis of associated miRNAs followed by
confirmatory experiments.
MATERIALS AND METHODS
Collection of expression fold-change data
We downloaded CEL files for 2019 GEO series produced using the Affymetrix U133 Plus 2.0 chip. Robust multi-array average (RMA) normalization was applied to each CEL file using the ‘justRMA’ function in the R ‘affy’ package (32). The intensities of probes for each gene were collapsed by their average value. Next, we curated two sample groups (test/control) for each experimental series and calculated the logarithmic FC (denoted as logFC) of the average expressions in each group. In total, logFC profiles for 5158 (test/control) cell conditions were collected for 20 639 human gene symbols. The logFC matrix and information of the cell conditions are available from our bimir R package (https://github.com/unistbig/bimir).
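As a hedged illustration of the logFC step, the sketch below assumes that expression values are already on the log2 scale (as RMA output is) and takes the difference of the group means. The text does not state whether the logarithm was applied before or after averaging, so this ordering is an assumption, and the gene names and values are hypothetical.

```python
from statistics import mean


def log_fc(test_values, control_values):
    """log2 fold change of one gene for one test/control comparison.

    Assumes both inputs are log2-scale intensities, so the log ratio is
    the difference of the two group means.
    """
    return mean(test_values) - mean(control_values)


# Hypothetical genes: (test samples, control samples)
expression = {
    "GENE_A": ([8.0, 8.4], [7.0, 7.2]),
    "GENE_B": ([5.1, 4.9], [5.0, 5.0]),
}
profile = {gene: log_fc(test, ctrl) for gene, (test, ctrl) in expression.items()}
```

Each of the 5158 conditions would contribute one such profile, giving one row of the conditions-by-genes logFC matrix.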
Sequence-specific miRNA targets
Sequence-specific miRNA targets were obtained from seven sequence-based target prediction databases (TargetScan (33), miRanda (34), mirSVR (35), PITA (36), DIANA-microT (37,38), miRDB (39) and TargetRank (40)). The number of candidate miRNA–mRNA interactions, parameters used and download sites for the sequence-specific targets are available in Supplementary Data (Section S1).
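The prediction procedure keeps, for each miRNA, the targets predicted by at least three of the seven databases (the background set). A minimal sketch of that voting rule follows; the database keys and gene symbols are placeholders, not data from the study.

```python
from collections import Counter


def background_set(predictions, min_support=3):
    """Targets predicted by at least `min_support` databases.

    predictions : dict mapping a database name to an iterable of gene symbols
    """
    votes = Counter(gene
                    for targets in predictions.values()
                    for gene in set(targets))  # count each database once per gene
    return {gene for gene, n in votes.items() if n >= min_support}


# Hypothetical predictions for one miRNA
predictions = {
    "db1": ["A", "B"],
    "db2": ["A", "C"],
    "db3": ["A", "B", "B"],
    "db4": ["B"],
}
selected = background_set(predictions)
```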
MiRNA target prediction using a progressive bicluster extension (PBE) algorithm
The overview of biclustering-based miRNA target predic-
tion is shown in Figure 2. First, 5158 mRNA microarray
datasets with two sample groups (test/control) were col-
lected from GEO database (24), and corresponding logFC
data were compiled for 20 639 human genes (columns) and
5158 fold-change cell conditions (rows). These logFC data are quantized into up-, neutral- and down-regulated genes (denoted as 1, 0 and −1, respectively) using ±log2(1.3) (hereafter, simply denoted as 1.3 FC) thresholds. We regarded 1.3 FC as an appropriate threshold for representing target expression changes caused by miRNA regulation, excluding noisy data and covering many ‘fine-tuned’ mRNA targets simultaneously. For each miRNA, sequence-specific targets predicted in at least three out of seven miRNA target databases were selected (denoted as background set). Then, logFC profiles for each condition were accumulated to the background set based on the enrichment of 1.3-fold up-regulated genes in the background set (hypergeometric test, FDR <5%). The resulting logFC submatrix was converted to a binary matrix by replacing −1 with 0, and was dubbed MIR profile for the given miRNA. We first applied the bimax biclustering algorithm (27) to the MIR profile to obtain a number of small biclusters completely filled with 1 (called seed biclusters). These seed biclusters were then ‘progressively’ extended using PBE algorithm (extended biclusters); rows and columns with many 1’s were competitively added to the seed bicluster and then relatively noisy rows and columns were removed, and this process was repeated by slightly increasing the threshold for zero proportion in each row and column (strict to less strict). The extended biclusters were then clustered using average-linkage hierarchical clustering (merged bicluster) to remove redundant results. The Meet/Min distance was used for hierarchical clustering as follows: For two different extended biclusters A and B,

\[ \mathrm{Distance}(A, B) = 1 - \frac{|A \cap B|}{\min(|A|, |B|)}, \]

where |A| is the multiplication of the row and column sizes of A. We tested three cutoff values (0.3, 0.5 and 0.7) for the cluster dendrogram. This cutoff had a limited effect on the result, and thus we used the cutoff = 0.5. After the merging, the rows or columns containing more than 10% of zeros were trimmed off individually, finally yielding the ‘merged biclusters’. See Supplementary Data for a detailed description of PBE algorithm (Section S2, Supplementary Figures S1 and S2). Only the merged bicluster was used for target prediction and is simply denoted as ‘bicluster’ hereafter unless noted otherwise.
Figure 2. Overview of the biclustering-based miRNA target prediction. (A) The gene expression fold-change compendium. (B) Sequence-specific targets for each miRNA were obtained from seven miRNA target databases. (C) The MIR profile is composed of binarized logarithmic fold-change values of sequence-specific targets for selected cell conditions. (D) From MIR profile, seed biclusters are extracted using BIMAX algorithm, and then are extended using PBE algorithm. (E) Finally, merged biclusters are generated by hierarchical clustering of extended biclusters and removing the noisy rows and columns.
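The Meet/Min distance used for merging can be sketched as follows. The sketch assumes that the overlap of two biclusters is counted as the number of shared matrix cells (shared rows times shared columns), which matches the definition of |A| as rows times columns; the example biclusters are hypothetical.

```python
def cell_count(rows, cols):
    """Size of a bicluster: number of rows times number of columns."""
    return len(rows) * len(cols)


def meet_min_distance(a, b):
    """Meet/Min distance between biclusters a and b.

    a, b : (rows, cols) pairs of sets. The overlap is the block formed by
    the shared rows and shared columns (an assumption of this sketch).
    """
    rows_a, cols_a = a
    rows_b, cols_b = b
    overlap = cell_count(rows_a & rows_b, cols_a & cols_b)
    return 1 - overlap / min(cell_count(*a), cell_count(*b))


A = ({0, 1, 2}, {0, 1})        # 6 cells
B = ({1, 2, 3}, {0, 1, 2})     # 9 cells, shares 4 cells with A
d_ab = meet_min_distance(A, B)  # 1 - 4/6
```

Because the denominator is the smaller bicluster, a bicluster fully contained in another has distance 0, so nested results fall into the same cluster under any of the tested cutoffs.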
The resulting biclusters represent predicted target genes
(bicluster columns) up-regulated for the clustered cell con-
ditions (bicluster rows). Down-regulated biclusters were
also generated in the symmetrical way. Up (down)-regulated
biclusters imply that the corresponding miRNA is down
(up)-regulated in the captured test conditions. Detailed fea-
tures of the biclusters are described in Supplementary Data
(Supplementary Figure S3 and Supplementary Table S2).
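A minimal sketch of the quantization and binarization steps, covering both the up- and down-regulated MIR profiles. Whether a value exactly at the threshold counts as regulated is not stated in the text, so the inclusive comparison here is an assumption.

```python
import math


def quantize(log_fc, fold=1.3):
    """Map a log2 fold change to 1 (up), -1 (down) or 0 (neutral)."""
    threshold = math.log2(fold)
    if log_fc >= threshold:
        return 1
    if log_fc <= -threshold:
        return -1
    return 0


def binarize(quantized, direction="up"):
    """Binary MIR-profile entry: 1 only for the direction of interest."""
    wanted = 1 if direction == "up" else -1
    return 1 if quantized == wanted else 0


log_fcs = [0.5, -0.5, 0.2]
up_profile = [binarize(quantize(x), "up") for x in log_fcs]
down_profile = [binarize(quantize(x), "down") for x in log_fcs]
```

The down-regulated profile is obtained by the same code with the direction flipped, which is the symmetry the text refers to.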
We mainly reported the analysis results for 1.3 FC threshold, but biclusters were also generated under ±log2(1.5) and ±log2(2.0) thresholds (denoted as 1.5 FC and 2.0 FC thresholds, respectively) to capture more specific and stronger miRNA regulation. Overall, for the list of sequence-specific targets of a given miRNA, two MIR profiles (up and down) are generated for each threshold (1.3, 1.5 and 2.0). The three up-regulated (and down-regulated) MIR profiles have different condition counts, while the gene counts are the same. Therefore, the resulting seed bicluster (and the final merged bicluster) counts differ for different thresholds. An example of a let-7c bicluster for stem cell conditions is described in Supplementary Data (Section S5).
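The condition counts differ between thresholds because each condition is admitted to a MIR profile only if regulated genes are enriched in the background set (hypergeometric test). A sketch of the upper-tail p-value is given below; the parameterization (all measured genes as the population, regulated genes as successes, the background set as the draw) is an assumption about how the test was set up, and the FDR correction across conditions is not shown.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population : number of measured genes
    successes  : genes regulated in the condition at the chosen FC threshold
    draws      : size of the miRNA's background set
    k          : regulated genes observed inside the background set
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(comb(successes, i) * comb(population - successes, draws - i)
                     for i in range(k, upper + 1))
    return favourable / total


# Small hypothetical example: 10 genes, 4 regulated, background set of 3,
# of which 2 are regulated.
p_value = hypergeom_upper_tail(k=2, population=10, successes=4, draws=3)
```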
Experimental validation of miR-29b/c regulation in breast
cancer
miRNA transfection. miR-29b-3p and miR-29c-3p mimic and miRNA scramble control were purchased from Genolution. Each miRNA (100 nM) was transiently transfected into MDA-MB-231 cells using G-fectin Reagent (Genolution). All experiments were performed 48 h after transfection.
Real-time quantitative PCR. One microgram of total
RNA from MDA-MB-231 cell was reverse transcribed with
oligo dT and M-MLV RT reverse transcriptase (Invitro-
gen). Real-time quantitative PCR was performed using a
GENETBIO SYBR Green Prime Q-master Mix and the
QuantStudio 5 PCR system (ThermoFisher). All runs were
accompanied by the internal control B2M or HPRT gene.
Because both the reference genes yielded very similar re-
sults, only B2M results are shown in Figure 6. The samples
were run in duplicate and normalized to B2M or GAPDH
using a ΔΔ cycle threshold-based algorithm, to provide arbitrary
units representing relative expression.
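For readers unfamiliar with the calculation, a common form of the ΔΔ cycle threshold method is sketched below. It assumes the standard 2^(−ΔΔCt) formula with perfect amplification efficiency; the text does not specify the exact variant used, and the Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method (assumes 100% efficiency).

    Each delta Ct normalizes the target gene to the reference gene
    (e.g. B2M); the difference of the two deltas compares the treated
    sample with the scrambled control.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold 2 cycles later
# after mimic transfection, i.e. a four-fold reduction.
fold = relative_expression(26.0, 18.0, 24.0, 18.0)
```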
Figure 3. Simulation test for biclustering algorithms. (A) Example of simulation profile. Orange and gray elements indicate 1 and 0, respectively. (B) Precision and (C) sensitivity of tested biclustering algorithms.
Immunoblotting. Harvested cells were lysed in RIPA buffer and subjected to centrifugation, and the supernatants were collected. Protein concentration was measured using the BCA protein assay kit (Pierce), and equal amounts of protein were resolved using 10% or 12% sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis (PAGE) and transferred to Nylon membranes (GE Healthcare, Amersham). Target proteins were observed by incubation with primary antibodies and infrared fluorescence dye-conjugated secondary antibodies as follows: rabbit anti-human FAK (1:1000, Cell Signaling), phospho-FAK (1:1000, Cell Signaling), Akt (1:1000, Cell Signaling), phospho-Akt (1:1000, Cell Signaling) and mouse anti-human GAPDH (1:1000, Cell Signaling). The HRP-conjugated secondary antibodies were purchased from Cell Signaling Technology. Immunodetection was performed using an Odyssey CLx scanner (Li-COR Biosciences).
RESULTS
Comparison with other biclustering algorithms
Compared to seed biclusters, PBE algorithm yielded much
larger biclusters by allowing for a small portion of noise
(Supplementary Figure S3). Its performance was compared
with those of five existing biclustering algorithms, namely
ISA (28), QUBIC (29), FABIA (30), BiBit (31) and HOC-
CLUS2 (17). A summary of each method is described in
Supplementary Data (Section S4). First, the size and sig-
nal density of biclusters generated from a real up-regulated
MIR prole of let-7c-5p were compared (Supplementary
Table S3). PBE yielded large biclusters with high densities,
whereas existing algorithms yielded biclusters with either
smaller sizes or poorer densities. PBE also captured the stem-cell-specific
bicluster better than the other algorithms (Supplementary
Figure S4). Detailed results for real data analysis
are described in Supplementary Data (Section S4).
Next, we tested the sensitivity and precision of biclustering
algorithms using simulated binary profiles that reflect
the average size and density of real MIR profiles (700 rows,
300 columns and 20% density) (Figure 3A). The simulated
profiles contained seven biclusters in which row and column
sizes were randomly chosen between 20 and 80, and
each bicluster included 1–3% of zeros (noise). Some of the
biclusters overlapped with each other by <20% of the bicluster
sizes. The simulation was repeated 50 times. Here, ‘true
elements’ indicate those included within the seven biclus-
ters, and ‘false elements’ indicate those outside the biclus-
ters. Thus, after running each biclustering algorithm, the
sensitivity was measured as the number of true elements
within the predicted biclusters divided by the number of all
true elements. The precision was measured as the propor-
tion of true elements within the predicted biclusters. PBE
showed perfect precision (median =100%) with high sensi-
tivity (median =95.6%). The performance of ISA depended
on the row (TG) and column (TC) thresholds. When TG =
TC =1, high sensitivity was observed (median =97.2%)
while precision was relatively low (median =87.7%). When
both TG and TC were increased to 2, the precision was in-
creased (median =96.8%) but the sensitivity was decreased
(median =86.1%). The QUBIC results were affected by the
consistency parameter c. As this value was increased, pre-
cision was increased while sensitivity was decreased. The
best performance was observed when using the default pa-
rameter (c=0.95, median precision =80.8%, median sen-
sitivity =100%). BIMAX and BiBit do not allow zeros
in the biclusters and exhibited quite low sensitivities (me-
dian BIMAX sensitivity =10.2%, median BiBit sensitivity
=14.5%). However, when 30 iterations were applied for BIMAX, its sensitivity was much increased to 86.7%. FABIA yielded highly noisy biclusters for all tested sparseness parameters (α) resulting in low precision (median 46.6%) and sensitivity (66.0%). Results for α = 0.01 and 0.05 are shown in Figure 3. For α ≥ 0.1, FABIA did not create a bicluster. HOCCLUS2 was also tested but excluded from Figure 3, because it did not generate any bicluster under this simulation setting. HOCCLUS2 detected biclusters from sparser data (12% or lower density). These results indicate that PBE is an efficient algorithm to identify biclusters from noisy binary data.
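The element-wise sensitivity and precision defined above can be computed as in the following sketch, which treats every matrix cell covered by at least one bicluster as an element. The function names and the toy biclusters are illustrative assumptions, not the evaluation code used in the study.

```python
def covered_cells(biclusters):
    """Union of (row, column) cells covered by a list of biclusters."""
    return {(r, c) for rows, cols in biclusters for r in rows for c in cols}


def sensitivity_precision(true_biclusters, predicted_biclusters):
    """Element-wise sensitivity and precision of predicted biclusters."""
    truth = covered_cells(true_biclusters)
    predicted = covered_cells(predicted_biclusters)
    hits = len(truth & predicted)
    sensitivity = hits / len(truth)
    precision = hits / len(predicted) if predicted else 0.0
    return sensitivity, precision


planted = [(range(0, 4), range(0, 4))]        # 16 true elements
too_wide = [(range(0, 4), range(0, 5))]       # covers all of them plus 4 extra
too_small = [(range(0, 2), range(0, 4))]      # covers half, nothing extra

wide_scores = sensitivity_precision(planted, too_wide)
small_scores = sensitivity_precision(planted, too_small)
```

The two toy predictions show the trade-off reported for ISA and QUBIC: an over-extended bicluster keeps full sensitivity but loses precision, and a conservative one does the reverse.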
Figure 4. Performance of miRNA target prediction using binding sequence, biclustering and functional networks. (A) Sensitivity and specificity of pooled bicluster targets of 11 miRNAs. Targets with binding sequences were used as background (diagonal black dash). Blue nodes represent biclustering results. Red/yellow/green/purple nodes represent the results obtained using both the biclustering and network-based target selection with node degrees 2, 3, 4 and 5, respectively. (B) Average sensitivity and specificity for different node degrees of target networks. (C) Average gains in certainty of methods using binding sequence, biclustering and network information.
Accuracy of the biclustering target prediction
The bicluster targets were assessed using validated miRNA
targets. miRTarBase (41) provides hundreds of thousands
of experimentally validated miRNA-target relations with
‘strong’ evidence (reporter assays or western blot) and ‘less
strong’ (or weak) evidence (pSILAC or microarray experiment).
Among the sequence-specific targets (background
set) of a given miRNA, those validated with ‘strong’ evi-
dence were regarded as gold positive (GP) targets, whereas
those having neither strong nor weak evidence were set
as gold negative (GN) targets. For evaluation, we selected
miRNAs having more than 30 GPs whose fraction within
the background set was at least 5%. Eleven miRNAs that satisfied
these criteria were analyzed (Figure 4A).
For each miRNA, all the resulting bicluster targets,
whether up- or downregulated, were pooled as predicted
targets, and corresponding sensitivity, specificity, as well as
GP enrichment and GN depletion were calculated (Supple-
mentary Tables S5 and S6). When the 1.3 FC threshold was
used to quantize the logFC data, the average sensitivity and
specicity of the 11 miRNAs were 0.704 and 0.466, respec-
tively (summation =1.170), representing a 17.0% (median
19.4%) improved gain compared with the sequence-based
target prediction. Although positive gains were obtained
for all 11 miRNAs for the 1.3 FC cutoff (Figure 4A), the
relative performances for each miRNA were quite differ-
ent for different FC cutoffs (Supplementary Table S5). For
example, the gain of miR-34a-5p decreased as the FC cut-
off was increased because of the rapid decline in sensitiv-
ity (gains for 1.3 FC: 20.8%, 1.5 FC: 13.3%, 2.0 FC: 7.2%).
In contrast, the gain of miR-21-5p increased as the cutoff
was increased because the specificity was relatively more increased
(gains for 1.3 FC: 16.4%, 1.5 FC: 26.5% and 2.0
FC: 31.3%). Such a difference presumably represents different
miRNA regulation patterns. The former case corresponds
to the ‘fine tuner’ miRNAs that moderately regulate
many genes. Therefore, using a lower cutoff helps detect
subtle changes in target expressions. However, miRNAs for
the latter case seem to more strongly regulate a relatively
small number of targets. Among the three thresholds tested,
1.3 FC exhibited the best overall gain with the largest sen-
sitivity.
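The gain in certainty reported above follows from sensitivity plus specificity relative to the sequence-only baseline, for which the sum is 1 (all background targets are predicted, so sensitivity is 1 and specificity is 0). The sketch below reproduces the arithmetic; the gene sets are hypothetical.

```python
def sensitivity_specificity(predicted, gold_pos, gold_neg):
    """Sensitivity and specificity of predicted targets against GP/GN sets."""
    sensitivity = len(predicted & gold_pos) / len(gold_pos)
    specificity = len(gold_neg - predicted) / len(gold_neg)
    return sensitivity, specificity


def gain_in_certainty(sensitivity, specificity):
    """Gain over the sequence-only baseline, where the sum equals 1."""
    return sensitivity + specificity - 1.0


# Reported averages for the 1.3 FC threshold
reported_gain = gain_in_certainty(0.704, 0.466)

# Hypothetical gene sets
sens, spec = sensitivity_specificity(
    predicted={"a", "b", "c"},
    gold_pos={"a", "b", "d"},
    gold_neg={"c", "e", "f", "g"},
)
```

With the reported averages, 0.704 + 0.466 − 1 = 0.170, i.e. the 17.0% gain stated in the text.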
miRNA targets tend to be functionally related with each
other (42,43). Therefore, we incorporated the protein func-
tional interaction networks from the STRING database
(44) (edge threshold >150) between the bicluster target
genes to improve the prediction. Among the bicluster targets,
we further selected those with k or more functional interactions
with other targets and measured the corresponding
gains. Intriguingly, the specificity rapidly increased as k
was increased (Figure 4B), and the maximum gain reached
up to 32.0% when k = 3 (specificity = 77.8%, Figure 4C).
The maximum median gain was even higher (33.4% when
k=4). These results show that target interaction networks
can improve the miRNA target prediction considerably.
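The network-based selection can be sketched as a degree filter on the interaction graph restricted to the bicluster targets. The sketch assumes the edges have already been thresholded (e.g. on the STRING score) and that the filter is applied once rather than iteratively; gene names are placeholders.

```python
def filter_by_degree(targets, edges, k):
    """Keep targets with at least k interactions with other targets.

    targets : iterable of gene symbols in the bicluster
    edges   : iterable of (gene, gene) pairs, assumed undirected and
              already filtered by interaction score
    """
    targets = set(targets)
    degree = {gene: 0 for gene in targets}
    # Deduplicate undirected edges and ignore self-loops.
    for pair in {frozenset(edge) for edge in edges}:
        if len(pair) == 2 and pair <= targets:
            for gene in pair:
                degree[gene] += 1
    return {gene for gene, d in degree.items() if d >= k}


targets = {"A", "B", "C", "D"}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "X"),   # X is not a bicluster target, so this edge is ignored
         ("B", "A")]   # duplicate of A-B in the other orientation
kept_k2 = filter_by_degree(targets, edges, k=2)
kept_k3 = filter_by_degree(targets, edges, k=3)
```

Raising k shrinks the selected set toward the most connected targets, consistent with the reported rise in specificity at the cost of sensitivity.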
Comparison with anticorrelation-based methods in cancer
miRNA–mRNA paired profiling has been commonly used to predict condition-specific miRNA targets based on the anticorrelation between miRNA and its mRNA targets. Therefore, we compared our biclustering method with seven anticorrelation-based methods (GenMiR++ (13), Pearson correlation, Spearman correlation, Lasso (45,46), Elastic Net (47), IDA (48) and Tiresias (49)) in predicting cancer-specific miRNA targets. Pearson/Spearman correlation, Lasso, Elastic Net and IDA were implemented using miRLAB R package (50), and GenMiR++ and Tiresias were run using original MATLAB and Perl codes, respectively. For the 11 miRNAs evaluated in the previous section, the accuracy of the predicted targets was compared between anticorrelation-based methods and our biclustering method. For the anticorrelation-based methods, the sequence-specific targets of each miRNA were
sorted in the order of anticorrelation scores that were calculated from TCGA (The Cancer Genome Atlas) miRNA–mRNA profiles by Pearson/Spearman correlation, Bayesian method, penalized regression or neural network model. These sorted scores were compared to the gold standard positive/negative sets to yield ROC curves. For the biclustering method, we selected biclusters where at least 30% of the rows pertained to ‘tumor versus normal’ or ‘aggressive versus non-aggressive tumor’ conditions. These biclusters represented 33 miRNA-cancer pairs for five cancer types (breast, brain, lung, colon or blood cancer). In each miRNA–cancer pair, corresponding bicluster targets were pooled in the order of proportion of the specific cancer condition in each bicluster. Thus, the true and false-positive rates of bicluster targets in each pooling step were depicted instead of ROC curve (asterisks, Figure 5). After removing six cases where none of the areas under ROC curves (AUCs) exceeded 0.6 and the maximum biclustering gain was <1.1, we selected biclusters from 20 cases that were coherent with known miRNA expression (quantitative PCR results) for comparison. In other words, upregulated biclusters were chosen when corresponding miRNA was known to be downregulated and vice versa, in cancer. Supplementary Table S7 lists the literature reporting the expression levels of miRNAs in cancers.
Figure 5. Performance comparison between biclustering and anticorrelation-based methods. Black asterisks represent biclustering predictions. Green and red asterisks represent bicluster targets with at least one and three network degrees, respectively. Solid lines represent ROCs of the seven anticorrelation-based methods. The title of each panel represents the cancer type, miRNA and target regulation direction (parenthesized). Blue, green and red titles represent the 11, 6 and 3 cases where the biclustering method performed better than, similar to and worse than anticorrelation-based methods, respectively. Dashed black lines represent the background results when only sequence-specific targets were used. BRCA, DLBC, GBMLGG and LAML represent breast invasive carcinoma, diffuse large B-cell lymphoma, glioma and acute myeloid leukemia, respectively.
Overall, the biclustering method compared favorably with the miRNA–mRNA profile-based methods (Figure 5). For 11 out of the 20 cases, the biclustering method exhibited better gains than the anticorrelation-based methods; in 6 other cases, both approaches exhibited similar performances; in the remaining 3 cases, the biclustering method was inferior to the best anticorrelation-based method, mostly because of its low sensitivity. As seen in the previous section, incorporating the network information tended to increase the specificity and gain of the biclustering method. Among the seven anticorrelation-based methods, GenMiR++ performed best for most cases.
These results showed that if miRNA expression information was provided, our biclustering approach overall performed better than anticorrelation-based methods in prioritizing condition-specific miRNA targets. Notably, miRNA expression is relatively easily obtained from the literature or quantitative PCR experiments.
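The pooling procedure that produces the asterisks in Figure 5 can be sketched as follows. The sketch assumes that biclusters are pooled from the highest to the lowest proportion of the cancer condition (the text states only that they are pooled "in the order of" this proportion), and the gene sets are hypothetical.

```python
def pooled_rate_points(biclusters, gold_pos, gold_neg):
    """(false-positive rate, true-positive rate) after each pooling step.

    biclusters : list of (cancer_fraction, targets) pairs, where
                 cancer_fraction is the proportion of bicluster rows that
                 belong to the cancer condition of interest
    """
    pooled = set()
    points = []
    ordered = sorted(biclusters, key=lambda b: b[0], reverse=True)
    for _, targets in ordered:
        pooled |= set(targets)
        tpr = len(pooled & gold_pos) / len(gold_pos)
        fpr = len(pooled & gold_neg) / len(gold_neg)
        points.append((fpr, tpr))
    return points


biclusters = [(0.4, {"C", "D"}), (0.9, {"A", "B"})]
points = pooled_rate_points(biclusters,
                            gold_pos={"A", "C"},
                            gold_neg={"B", "D", "E", "F"})
```

Each pooling step yields one point in ROC space, which can then be compared against the ROC curves of the score-ranking methods.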
miRNAs targeting PI3K/Akt signaling in cancer
We further analyzed the bicluster targets corresponding to
the 20 cancer-miRNA pairs (Figure 5). Among them, breast
Downloaded from https://academic.oup.com/nar/article-abstract/47/9/e53/5366474 by guest on 19 March 2020
PAGE 7OF 10 Nucleic Acids Research, 2019, Vol. 47, No. 9 e53
Figure 6. miRNA targets in PI3K/Akt signaling pathway (breast cancer). (A) miRNA targets predicted from breast cancer biclusters are highlighted by
red borders. For each target molecule, corresponding miRNA names and target gene symbols are represented. (B) Distant relapse-free survival analysis
for 210 patients with breast cancer exhibiting high and low miR-29a, miR-29b and miR-29c levels. The patients were divided into two groups based on
their best splits at top 33.8%, 40% and 66% values, respectively. (C) Transcript levels of miR-29 target gene candidates were analyzed by qRT-PCR. MDA-
MB-231 breast cancer cells were transiently transfected with either scrambled miRNA (control) or miR-29 (29b-3p or 29c-3p). All the nine genes tested
were considerably downregulated by miR-29b and/or -29c. In particular, ITGB1, GNG12 and VEGFA were downregulated by both miR-29b and -29c.
Statistical signicance was tested by one-tailed t-test. *P<0.05; **P<0.01; ***P<0.001 versus scrambled miRNA. (Dand E) Activation of downstream
pathway candidates such as AKT and FAK were analyzed by immunoblotting. Total cell lysates extracted from either scrambled miRNA or (D) miR-29b-
3p as well as (E) miR-29c-3p transfected cells were analyzed for the levels of pAKT, AKT, pFAK and FAK.
cancer and diffuse large B-cell lymphoma (DLBCL) yielded
the largest numbers of biclusters. In breast cancer,
bicluster targets of miR-1, miR-29a/b/c, miR-34a and miR-145
were upregulated in aggressive cancer; in DLBCL, the targets
of miR-29a/b/c, miR-34a and miR-145 were also upregulated.
We pooled those bicluster targets in each cancer type and
performed pathway enrichment analysis (KEGG annotation)
using the DAVID tool (51), which identified seven and four
significant pathways (FDR < 0.05) in breast cancer and
DLBCL, respectively (Supplementary Tables S8 and S9).
Interestingly, the bicluster targets in both cancer types
were strongly enriched in the ‘PI3K/Akt signaling pathway’
(FDR = 2.6E-7 for breast cancer; FDR = 5.3E-7 for DLBCL).
This pathway is known to be frequently hyperactivated in
many cancers, promoting cell cycle progression, survival,
proliferation and epithelial–mesenchymal transition of tumor
cells (52,53). In addition, the extracellular matrix
(ECM)–receptor interaction and focal adhesion pathways were
detected in both cancer types, but all the corresponding
bicluster targets except two (CAV2, BIRC2) were also
included in the PI3K/Akt signaling pathway.
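The enrichment step described above can be illustrated with a minimal sketch. It assumes a plain hypergeometric test followed by Benjamini-Hochberg correction; DAVID applies its own scoring, which is not reproduced here. The gene identifiers, pathway names and set sizes are hypothetical toy values, not data from this study.

```python
from math import comb


def hypergeom_upper_tail(overlap, n_targets, n_pathway, n_background):
    """P(X >= overlap) when n_targets genes are drawn from n_background genes,
    of which n_pathway belong to the pathway."""
    total = comb(n_background, n_targets)
    upper = min(n_targets, n_pathway)
    hits = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_targets - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy data: 100 background genes, 10 pooled bicluster targets.
background = [f"g{i}" for i in range(100)]
targets = set(background[:10])
pathways = {
    "pathway_a": set(background[:8]) | {"g50", "g51"},  # shares 8 targets
    "pathway_b": {"g9"} | set(background[60:69]),       # shares 1 target
}

names = list(pathways)
pvals = [
    hypergeom_upper_tail(
        len(targets & pathways[name]), len(targets),
        len(pathways[name]), len(background),
    )
    for name in names
]
fdr = dict(zip(names, benjamini_hochberg(pvals)))

for name in names:
    print(name, "FDR =", fdr[name])
```

In this toy setting only the first pathway would pass an FDR cutoff of 0.05, which mirrors how the significant pathways were selected in the text.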
Figure 6A and Supplementary Figure S5A show the PI3K/Akt
pathway, in which the bicluster targets are highlighted for
breast cancer and DLBCL, respectively. In both cancer types,
the miRNAs targeted multiple ligands, including genes
encoding growth factors (e.g. VEGFA and PDGFC targeted by
miR-29) and ECM components (e.g. COL1A1, LAMC1 and THBS2 by
miR-29); signal transducers such as receptor tyrosine kinase
(e.g. MEK and/or PDGFRA by miR-34a), G-proteins (GNB4 and
GNG12 by miR-29), toll-like receptor (TLR4 by miR-34a and
miR-145) and integrin (e.g. ITGB1 by miR-29); as well as
downstream effectors such as NRAS (by miR-29 and miR-145)
and CDK6 (by miR-29). In addition, AKT3 was targeted by
miR-29 in breast cancer, and cytokine receptors (IL2RB and
IL6R) and one component of the PI3K complex (PIK3R3) were
targeted by miR-34a and miR-29, respectively, in DLBCL.
Indeed, it was previously shown that miR-29b upregulation in
breast cancer considerably inhibited metastasis by
repressing targets related to the tumor microenvironment
(54) (including some genes listed above).
In the present study, we experimentally validated the
bicluster targets of miR-29 using the human breast cancer
cell line MDA-MB-231, a well-established metastatic and
invasive cancer cell line. Transcript levels of nine
bicluster targets related to ECM or PI3K were analyzed 2
days after transient transfection with either miR-29 or
control miRNA. All nine targets were significantly
downregulated by miR-29b or -29c transfection compared to
the controls (Figure 6C). Furthermore, the activation of
ECM-related downstream pathways such as focal adhesion
kinase (FAK) and AKT was also attenuated by miR-29 (Figure
6D and E), demonstrating the capability of biclustering
analysis to capture relevant pathways for disease.
Finally, we analyzed the prognostic values of these miRNAs
using a multivariate Cox proportional hazards (mCPH) model
on public miRNA expression datasets. Distant relapse-free
survival was tested for 210 patients with breast cancer
(GEO database, GSE22216). Among the six miRNAs analyzed,
the three miR-29 family miRNAs had significant prognostic
values (mCPH P-values of miR-29a = 0.0042, miR-29b = 0.0064,
miR-29c = 0.0038; adjusted for age, tumor size, lymph nodes
involved, ER and grade). Then, the overall survival of 116
patients with DLBCL (GSE40239) was also analyzed for five
miRNAs. Among them, two exhibited significant prognostic
values (mCPH P-values for miR-34a = 0.0185 and miR-145 =
0.0041; adjusted for International Prognostic Index (IPI)
and gender). See Supplementary Tables S10 and S11 for
detailed results. Kaplan–Meier plots contrasting the effects
of miRNA expression on survival are also shown in Figure 6B
and Supplementary Figures S5B and S5C.
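As a rough illustration of the Kaplan–Meier curves mentioned above, the following sketch computes the product-limit survival estimate for two patient groups. It covers only the Kaplan–Meier estimate, not the multivariate Cox model or the covariate adjustment. The follow-up times, event indicators and group labels are hypothetical toy values and are not taken from GSE22216 or GSE40239.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time of each patient
    events: 1 if the event (e.g. relapse) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


# Hypothetical toy cohorts split by miRNA expression level.
high_expr = kaplan_meier([5, 8, 12, 12, 15], [0, 1, 0, 1, 0])
low_expr = kaplan_meier([2, 3, 4, 6, 9], [1, 1, 0, 1, 1])

print("high expression:", high_expr)
print("low expression: ", low_expr)
```

In this toy example the high-expression group retains a higher survival estimate throughout, which is the pattern a protective miRNA (hazard ratio below 1) would be expected to show.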
Overall, by analyzing cancer biclusters, we were able to
identify the key pathways (PI3K/Akt signaling, ECM and
focal adhesion) and five associated prognostic miRNAs
(miR-29a, miR-29b and miR-29c in breast cancer; miR-34a
and miR-145 in DLBCL) that repress tumor progression
(hazard ratios of 0.593–0.745). In particular, the effects
of miR-29b/c on these pathways were validated
experimentally (Figure 6C–E).
BiMIR: a bicluster database for condition-specific miRNA
targets
In total, 29 898 biclusters were generated for 459 human
miRNAs using the PBE algorithm (13 949 for 1.3 FC; 10 999
for 1.5 FC; 4950 for 2.0 FC thresholds) and compiled in the
BiMIR database (http://www.btool.org/bimir_dir/), where
biclusters are searchable by miRNAs, tissues, diseases,
keywords, target genes of interest and their combinations.
BiMIR can be used for investigating novel miRNA functions,
targets and related cell conditions.
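To make the role of the three fold-change (FC) thresholds concrete, the sketch below binarizes a small fold-change matrix at 1.3, 1.5 and 2.0. It assumes that fold changes are stored as log2 values and that "up" and "down" directions are binarized separately; both are assumptions made for illustration, and the matrix is a hypothetical toy example rather than data from the study.

```python
from math import log2


def binarize(log2_fc, fc_threshold, direction="up"):
    """Turn a genes x conditions log2 fold-change matrix into a 0/1 matrix.

    A cell becomes 1 when the fold change reaches the threshold in the
    requested direction (assumed convention, not confirmed by the source).
    """
    cutoff = log2(fc_threshold)
    if direction == "up":
        return [[1 if v >= cutoff else 0 for v in row] for row in log2_fc]
    if direction == "down":
        return [[1 if v <= -cutoff else 0 for v in row] for row in log2_fc]
    raise ValueError("direction must be 'up' or 'down'")


# Hypothetical toy matrix: 3 genes (rows) x 3 cell conditions (columns).
toy_log2_fc = [
    [0.5, 1.2, -0.1],
    [0.3, 0.7, 1.0],
    [-1.5, 0.0, 0.4],
]

ones_per_threshold = {
    fc: sum(map(sum, binarize(toy_log2_fc, fc)))
    for fc in (1.3, 1.5, 2.0)
}
print(ones_per_threshold)
```

A stricter threshold leaves fewer nonzero cells in the binary matrix, which is in line with the smaller number of biclusters reported for the higher thresholds.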
Along with the list of searched biclusters, the function
enrichment results for bicluster targets are provided based
on the MSigDB (55) pathway (C2) and gene ontology (C5)
categories. If biclusters are searched for a specific
organ/tissue or disease, the proportion of corresponding
conditions in each bicluster is also indicated. These help
the user find relevant biclusters. The heat maps for each
bicluster are visualized (Supplementary Figure S6), and the
corresponding target genes and cell conditions are
hyperlinked to the GeneCards (56) and GEO (24) databases,
respectively, for detailed information. For bicluster
target genes, the experimental evidence from miRTarBase
(41), network node degrees and protein network
visualization based on the STRING database (44) are
provided. All the biclusters are downloadable from the
BiMIR database.
DISCUSSION
Here, we presented a novel framework to prioritize miRNA
targets by biclustering sequence-specific targets and cell
conditions, a dimension that has rarely been investigated.
This is based on the idea that miRNA targets, like other
cellular molecules, have modular activity and can be
repeatedly captured across different cell conditions.
Indeed, the bicluster targets exhibited substantially
improved accuracy compared to purely sequence-based targets
and were often enriched in well-known pathways
characterizing the modules identified. Moreover,
functionally connected targets exhibited even higher
accuracy, further confirming the modular activity of miRNA
targets. The functional interaction of miRNA targets and
their contribution to target prediction have been studied
previously (57,58).
We analyzed cancer biclusters and found that the PI3K/Akt
signaling pathway was intensively targeted by a few miRNAs
in two cancer types. Further, prognostic values of those
miRNAs and the regulatory effects of miR-29 were also
validated. These results demonstrate that biclustering
analysis is able to reveal key pathways controlled by
miRNAs in disease. The BiMIR database provides miRNAs and
targeted pathways for dozens of diseases.
Based on the knowledge of miRNA expression, our
predictions compared favorably with seven
anticorrelation-based methods under cancer conditions.
These results demonstrate the practical value of our
approach in that it can provide fairly good target
predictions for a variety of cell conditions without
generating costly miRNA–mRNA profiles. The BiMIR database
was designed to explore the modular regulatory networks of
miRNAs by connecting miRNAs, cell conditions (or disease),
mRNA targets and associated pathways. The user may obtain
candidate miRNAs and target genes for a cell condition of
interest. Knowledge of the miRNA expression level will help
select the proper direction of biclusters (up or down).
Despite the improvements and usefulness shown in this
study, there remain difficulties in our approach regarding
free parameters that need to be optimized. First, the
minimum seed size of 10 × 10 was determined in an ad hoc
manner, and its optimal size may be affected by the size of
the fold-change data. Second, the iteration number of 20 in
the BIMAX algorithm was chosen as a compromise with
computation time; a higher iteration number yielded more
biclusters. However, other parameters seemed to be less
sensitive. For example, we gradually increased the
threshold of zero proportion from 0.01 to 0.1 (step size
0.01) during 10 iterations of bicluster extension. This may
seem to allow 10% of zeros in the end, but the final zero
proportion was only 1.5% because of the trimming process.
The cutoff of hierarchical clustering of the extended
clusters was also a less sensitive parameter. In addition,
the biclusters were generated under a rather strict
criterion (for targets in three or more databases);
therefore, BiMIR can be used for selecting a small number
of highly likely targets for the cell condition of
interest.
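The zero-proportion schedule described above (0.01 to 0.1 in steps of 0.01 over 10 iterations) can be sketched as follows. This is not the PBE algorithm: the greedy acceptance rule, the function names and the toy matrix are assumptions made for illustration, and the trimming step mentioned in the text is omitted.

```python
def zero_proportion(matrix, rows, cols):
    """Fraction of zero cells in the submatrix given by rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return cells.count(0) / len(cells)


def extend_bicluster(matrix, seed_rows, seed_cols, n_iter=10, step=0.01):
    """Greedily add rows and columns while the zero proportion stays under a
    threshold that is relaxed by `step` at each iteration (hypothetical rule)."""
    rows, cols = list(seed_rows), list(seed_cols)
    for iteration in range(1, n_iter + 1):
        max_zero = round(step * iteration, 2)  # 0.01, 0.02, ..., 0.10
        for r in range(len(matrix)):
            if r not in rows and zero_proportion(matrix, rows + [r], cols) <= max_zero:
                rows.append(r)
        for c in range(len(matrix[0])):
            if c not in cols and zero_proportion(matrix, rows, cols + [c]) <= max_zero:
                cols.append(c)
    return sorted(rows), sorted(cols)


# Hypothetical binary matrix: rows 0-3 x columns 0-3 form an all-ones seed.
toy = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],  # one zero: admitted once the threshold is relaxed
    [0, 0, 1, 0, 0],  # mostly zeros: never admitted
]

ext_rows, ext_cols = extend_bicluster(toy, seed_rows=range(4), seed_cols=range(4))
print(ext_rows, ext_cols, zero_proportion(toy, ext_rows, ext_cols))
```

Even in this toy case the final zero proportion (0.04) stays below the last threshold of 0.1, since a row or column is only admitted when the whole extended submatrix remains under the current limit.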
The biclustering approach presented here can also be
applied for predicting the condition-specific targets of
other sequence-specific regulators such as transcription
factors or RNA-binding proteins. In this regard, the entire
5158 mRNA fold-change profiles for 20 639 genes are
provided for general systems biology research. These mRNA
fold-change data are different from the GTEx transcriptome
data (59) in that GTEx data represent transcription levels
in normal tissues, whereas our fold-change data represent
gene expression ‘changes’ for a variety of cell conditions
such as disease, chemical treatment, tissues and
differentiations. Thus, these fold-change data can also be
used for clustering or regulatory network analysis for a
specific group of genes or cell conditions.
Whereas existing methods to identify miRNA regulation
modules bicluster multiple miRNAs and multiple target
genes representing coregulatory networks, the work
presented here is focused on prioritizing highly likely
target genes of a single miRNA that are commonly detected
across multiple cell conditions. Our approach can also be
extended to evaluate the miRNA coregulatory networks by
overlapping biclusters for different miRNAs. A significant
overlap indicates mRNA targets coregulated under multiple
cell conditions. Our approach and data would contribute to
uncovering the modular structure of complex regulatory
networks.
DATA AVAILABILITY
The BiMIR database is available at
http://www.btool.org/bimir_dir/. The BiMIR R package, which
includes the biclustering code, and the large expression
fold-change data are available at
https://github.com/unistbig/bimir.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
National Research Foundation (NRF) of Korea, Genomics
Program [2016M3C9A3945893]; Basic Science Research Program
(NRF) [2017R1E1A1A03070107, NRF-2018R1A5A1024340];
Bio-Synergy Research Project [NRF-2017M3A9C4065956].
Funding for open access charge: NRF
[NRF-2016M3C9A3945893].
Conflict of interest statement. None declared.
REFERENCES
1. Chen,K. and Rajewsky,N. (2007) The evolution of gene regulation by
transcription factors and microRNAs. Nat. Rev. Genet.,8, 93–103.
2. Salmena,L., Poliseno,L., Tay,Y., Kats,L. and Pandolfi,P.P. (2011) A
ceRNA hypothesis: the Rosetta Stone of a hidden RNA language?
Cell,146, 353–358.
3. Bueno,M.J. and Malumbres,M. (2011) MicroRNAs and the cell
cycle. Biochim. Biophys. Acta,1812, 592–601.
4. Shivdasani,R.A. (2006) MicroRNAs: regulators of gene expression
and cell differentiation. Blood,108, 3646–3653.
5. Neal,C.S., Michael,M.Z., Pimlott,L.K., Yong,T.Y., Li,J.Y.Z. and
Gleadle,J.M. (2011) Circulating microRNA expression is reduced in
chronic kidney disease. Nephrol. Dialysis Transplant.,26, 3794–3802.
6. Zhang,B.H., Pan,X.P., Cobb,G.P. and Anderson,T.A. (2007)
microRNAs as oncogenes and tumor suppressors. Dev. Biol.,302,
1–12.
7. John,B., Enright,A.J., Aravin,A., Tuschl,T., Sander,C. and
Marks,D.S. (2004) Human MicroRNA targets. PLoS Biol.,2, e363.
8. Lewis,B.P., Burge,C.B. and Bartel,D.P. (2005) Conserved seed
pairing, often flanked by adenosines, indicates that thousands of
human genes are microRNA targets. Cell,120, 15–20.
9. Krek,A., Grun,D., Poy,M.N., Wolf,R., Rosenberg,L., Epstein,E.J.,
MacMenamin,P., da Piedade,I., Gunsalus,K.C., Stoffel,M. et al.
(2005) Combinatorial microRNA target predictions. Nat. Genet.,37,
495–500.
10. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007)
The role of site accessibility in microRNA target recognition. Nat.
Genet.,39, 1278–1284.
11. Kiriakidou,M., Nelson,P.T., Kouranov,A., Fitziev,P., Bouyioukos,C.,
Mourelatos,Z. and Hatzigeorgiou,A. (2004) A combined
computational-experimental approach predicts human microRNA
targets. Genes Dev.,18, 1165–1178.
12. Kim,D., Sung,Y.M., Park,J., Kim,S., Kim,J., Park,J., Ha,H., Bae,J.Y.,
Kim,S. and Baek,D. (2016) General rules for functional microRNA
targeting. Nat. Genet.,48, 1517–1526.
13. Huang,J.C., Babak,T., Corson,T.W., Chua,G., Khan,S., Gallie,B.L.,
Hughes,T.R., Blencowe,B.J., Frey,B.J. and Morris,Q.D. (2007) Using
expression profiling data to identify human microRNA targets. Nat.
Methods,4, 1045–1049.
14. Lu,Y., Zhou,Y., Qu,W., Deng,M. and Zhang,C. (2011) A Lasso
regression model for the construction of microRNA-target regulatory
networks. Bioinformatics,27, 2406–2413.
15. Muniategui,A., Pey,J., Planes,F.J. and Rubio,A. (2013) Joint analysis
of miRNA and mRNA expression data. Brief. Bioinform.,14,
263–278.
16. Yoon,S. and De Micheli,G. (2005) Prediction of regulatory modules
comprising microRNAs and target genes. Bioinformatics,21,
ii93–ii100.
17. Pio,G., Ceci,M., D’Elia,D., Loglisci,C. and Malerba,D. (2013) A
novel biclustering algorithm for the discovery of meaningful
biological correlations between microRNAs and their target genes.
BMC Bioinformatics,14, S8.
18. Joung,J.G., Hwang,K.B., Nam,J.W., Kim,S.J. and Zhang,B.T. (2007)
Discovery of microRNA-mRNA modules via population-based
probabilistic learning. Bioinformatics,23, 1141–1147.
19. Peng,X., Li,Y., Walters,K.A., Rosenzweig,E.R., Lederer,S.L.,
Aicher,L.D., Proll,S. and Katze,M.G. (2009) Computational
identification of hepatitis C virus associated microRNA-mRNA
regulatory modules in human livers. BMC Genomics,10, 373.
20. Liu,B., Li,J. and Cairns,M.J. (2014) Identifying miRNAs, targets and
functions. Brief. Bioinform.,15, 1–19.
21. Liu,B., Liu,L., Tsykin,A., Goodall,G.J., Green,J.E., Zhu,M.,
Kim,C.H. and Li,J. (2010) Identifying functional miRNA-mRNA
regulatory modules with correspondence latent dirichlet allocation.
Bioinformatics,26, 3105–3111.
22. Mitra,K., Carvunis,A.R., Ramesh,S.K. and Ideker,T. (2013)
Integrative approaches for finding modular structure in biological
networks. Nat. Rev. Genet.,14, 719–732.
23. Le,T.D., Liu,L., Zhang,J., Liu,B. and Li,J. (2015) From miRNA
regulation to miRNA-TF co-regulation: computational approaches
and challenges. Brief. Bioinform.,16, 475–496.
24. Clough,E. and Barrett,T. (2016) The Gene Expression Omnibus
Database. Methods Mol. Biol.,1418, 93–110.
25. Gennarino,V.A., D’Angelo,G., Dharmalingam,G., Fernandez,S.,
Russolillo,G., Sanges,R., Mutarelli,M., Belcastro,V., Ballabio,A.,
Verde,P. et al. (2012) Identification of microRNA-regulated gene
networks by expression analysis of target genes. Genome Res.,22,
1163–1172.
26. Bondy,J.A. and Murty,U.S.R. (1976) Graph Theory with Applications.
Macmillan, London.
27. Prelic,A., Bleuler,S., Zimmermann,P., Wille,A., Buhlmann,P.,
Gruissem,W., Hennig,L., Thiele,L. and Zitzler,E. (2006) A systematic
comparison and evaluation of biclustering methods for gene
expression data. Bioinformatics,22, 1122–1129.
28. Bergmann,S., Ihmels,J. and Barkai,N. (2003) Iterative signature
algorithm for the analysis of large-scale gene expression data. Phys.
Rev. E,67, 031902.
29. Li,G., Ma,Q., Tang,H., Paterson,A.H. and Xu,Y. (2009) QUBIC: a
qualitative biclustering algorithm for analyses of gene expression
data. Nucleic Acids Res.,37, e101.
30. Hochreiter,S., Bodenhofer,U., Heusel,M., Mayr,A., Mitterecker,A.,
Kasim,A., Khamiakova,T., Van Sanden,S., Lin,D., Talloen,W. et al.
(2010) FABIA: factor analysis for bicluster acquisition.
Bioinformatics,26, 1520–1527.
31. Rodriguez-Baena,D.S., Perez-Pulido,A.J. and Aguilar-Ruiz,J.S.
(2011) A biclustering algorithm for extracting bit-patterns from
binary datasets. Bioinformatics,27, 2738–2745.
32. Gautier,L., Cope,L., Bolstad,B.M. and Irizarry,R.A. (2004) affy -
analysis of Affymetrix GeneChip data at the probe level.
Bioinformatics,20, 307–315.
33. Garcia,D.M., Baek,D., Shin,C., Bell,G.W., Grimson,A. and
Bartel,D.P. (2011) Weak seed-pairing stability and high target-site
abundance decrease the proficiency of lsy-6 and other microRNAs.
Nat. Struct. Mol. Biol.,18, 1139–1146.
34. Kozomara,A. and Griffiths-Jones,S. (2014) miRBase: annotating
high confidence microRNAs using deep sequencing data. Nucleic
Acids Res.,42, D68–D73.
35. Betel,D., Koppal,A., Agius,P., Sander,C. and Leslie,C. (2010)
Comprehensive modeling of microRNA targets predicts functional
non-conserved and non-canonical sites. Genome Biol.,11, R90.
36. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007)
The role of site accessibility in microRNA target recognition. Nat.
Genet.,39, 1278–1284.
37. Maragkakis,M., Reczko,M., Simossis,V.A., Alexiou,P.,
Papadopoulos,G.L., Dalamagas,T., Giannopoulos,G., Goumas,G.,
Koukis,E., Kourtis,K. et al. (2009) DIANA-microT web server:
elucidating microRNA functions through target prediction. Nucleic
Acids Res.,37, W273–W276.
38. Paraskevopoulou,M.D., Georgakilas,G., Kostoulas,N., Vlachos,I.S.,
Vergoulis,T., Reczko,M., Filippidis,C., Dalamagas,T. and
Hatzigeorgiou,A.G. (2013) DIANA-microT web server v5.0: service
integration into miRNA functional analysis workflows. Nucleic Acids
Res.,41, W169–W173.
39. Wang,X.W. (2016) Improving microRNA target prediction by
modeling with unambiguously identified microRNA-target pairs
from CLIP-ligation studies. Bioinformatics,32, 1316–1322.
40. Nielsen,C.B., Shomron,N., Sandberg,R., Hornstein,E., Kitzman,J.
and Burge,C.B. (2007) Determinants of targeting by endogenous and
exogenous microRNAs and siRNAs. RNA,13, 1894–1910.
41. Chou,C.H., Shrestha,S., Yang,C.D., Chang,N.W., Lin,Y.L.,
Liao,K.W., Huang,W.C., Sun,T.H., Tu,S.J., Lee,W.H. et al. (2018)
miRTarBase update 2018: a resource for experimentally validated
microRNA-target interactions. Nucleic Acids Res.,46, D296–D302.
42. Sass,S., Dietmann,S., Burk,U.C., Brabletz,S., Lutter,D.,
Kowarsch,A., Mayer,K.F., Brabletz,T., Ruepp,A., Theis,F.J. et al.
(2011) MicroRNAs coordinately regulate protein complexes. BMC
Syst. Biol.,5, 136.
43. Sakai,A., Saitow,F., Maruyama,M., Miyake,N., Miyake,K.,
Shimada,T., Okada,T. and Suzuki,H. (2017) MicroRNA cluster
miR-17-92 regulates multiple functionally related voltage-gated
potassium channels in chronic neuropathic pain. Nat. Commun.,8,
16079.
44. Szklarczyk,D., Morris,J.H., Cook,H., Kuhn,M., Wyder,S.,
Simonovic,M., Santos,A., Doncheva,N.T., Roth,A., Bork,P. et al.
(2017) The STRING database in 2017: quality-controlled
protein-protein association networks, made broadly accessible.
Nucleic Acids Res.,45, D362–D368.
45. Santosa,F. and Symes,W.W. (1986) Linear inversion of band-limited
reflection seismograms. SIAM J. Sci. Stat. Comput.,7, 1307–1330.
46. Tibshirani,R. (1996) Regression shrinkage and selection via the
Lasso. J. R. Stat. Soc. Series B Methodol.,58, 267–288.
47. Sass,S., Pitea,A., Unger,K., Hess,J., Mueller,N.S. and Theis,F.J.
(2015) MicroRNA-Target network inference and local network
enrichment analysis identify two microRNA clusters with distinct
functions in head and neck squamous cell carcinoma. Int. J. Mol.
Sci.,16, 30204–30222.
48. Le,T.D., Liu,L., Tsykin,A., Goodall,G.J., Liu,B., Sun,B.Y. and Li,J.
(2013) Inferring microRNA-mRNA causal regulatory relationships
from expression data. Bioinformatics,29, 765–771.
49. Koo,J., Zhang,J.Y. and Chaterji,S. (2018) Tiresias: Context-sensitive
approach to decipher the presence and strength of MicroRNA
regulatory interactions. Theranostics,8, 277–291.
50. Le,T.D., Zhang,J., Liu,L., Liu,H. and Li,J. (2015) miRLAB: An R
based dry lab for exploring miRNA-mRNA regulatory relationships.
PLoS One,10, e0145386.
51. Huang,D.W., Sherman,B.T., Tan,Q., Collins,J.R., Alvord,W.G.,
Roayaei,J., Stephens,R., Baseler,M.W., Lane,H.C. and
Lempicki,R.A. (2007) The DAVID Gene Functional Classification
Tool: a novel biological module-centric algorithm to functionally
analyze large gene lists. Genome Biol.,8, R183.
52. Chang,F., Lee,J.T., Navolanic,P.M., Steelman,L.S., Shelton,J.G.,
Blalock,W.L., Franklin,R.A. and McCubrey,J.A. (2003) Involvement
of PI3K/Akt pathway in cell cycle progression, apoptosis, and
neoplastic transformation: a target for cancer chemotherapy.
Leukemia,17, 590–603.
53. Luo,J., Manning,B.D. and Cantley,L.C. (2003) Targeting the
PI3K-Akt pathway in human cancer: rationale and promise. Cancer
Cell,4, 257–262.
54. Chou,J., Lin,J.H., Brenot,A., Kim,J.W., Provot,S. and Werb,Z. (2013)
GATA3 suppresses metastasis and modulates the tumour
microenvironment by regulating microRNA-29b expression. Nat. Cell
Biol.,15, 201–213.
55. Liberzon,A., Subramanian,A., Pinchback,R., Thorvaldsdottir,H.,
Tamayo,P. and Mesirov,J.P. (2011) Molecular signatures database
(MSigDB) 3.0. Bioinformatics,27, 1739–1740.
56. Safran,M., Dalah,I., Alexander,J., Rosen,N., Iny Stein,T.,
Shmoish,M., Nativ,N., Bahir,I., Doniger,T., Krug,H. et al. (2010)
GeneCards Version 3: the human gene integrator. Database
(Oxford),2010, baq020.
57. Liang,H. and Li,W.H. (2007) MicroRNA regulation of human
protein protein interaction network. RNA,13, 1402–1408.
58. Wang,P., Ning,S., Wang,Q., Li,R., Ye,J., Zhao,Z., Li,Y., Huang,T.
and Li,X. (2013) mirTarPri: improved prioritization of microRNA
targets through incorporation of functional genomics data. PLoS
One,8, e53685.
59. GTEx Consortium (2015) Human genomics. The Genotype-Tissue
Expression (GTEx) pilot analysis: multitissue gene regulation in
humans. Science,348, 648–660.
Downloaded from https://academic.oup.com/nar/article-abstract/47/9/e53/5366474 by guest on 19 March 2020

Supplementary resource (1)

... The current results may provide a theoretical basis for future studies on the occurrence and development of DLBCL. miRNAs regulate target genes or themselves to activate or inhibit signaling pathways, and have become a research hotspot for tumor development and therapeutic targets in DLBCL (23). In the present study, KEGG analysis was implemented to determine the roles of these target genes. ...
... Zhao et al (25) has indicated that SMAD5 antisense RNA 1 inhibits DLBCL cell proliferation by sponging miR-135b-5p to upregulate adenomatous polyposis coli expression and inactivate the classic Wnt/β-catenin signaling pathway. Yoon et al (23) found that the PI3K/AKT signaling pathway is strongly enriched with targets of miR-29 in DLBCL. In the present study, certain signaling pathways, such as pathways in cancer, and the MAPK, Wnt and PI3K/AKT signaling pathways, were identified, which is consistent with the results of previous studies on DLBCL (5,23). ...
... Yoon et al (23) found that the PI3K/AKT signaling pathway is strongly enriched with targets of miR-29 in DLBCL. In the present study, certain signaling pathways, such as pathways in cancer, and the MAPK, Wnt and PI3K/AKT signaling pathways, were identified, which is consistent with the results of previous studies on DLBCL (5,23). In addition, a number of novel signaling pathways were identified in the present study, such as regulation of actin cytoskeleton, focal adhesion, endocytosis, axon guidance and the calcium signaling pathway, which may therefore be associated with the occurrence and development of DLBCL. ...
Article
Full-text available
miRNAs. Next, three databases (TargetScan, microRNA. org and PITA) were used to predict by intersection the potential target genes of the 204 differential miRNAs identified, and a Venn diagram of the results was performed. Subsequently, the target genes of differential miRNAs were analyzed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. Finally, to validate the miRNA microarray data, reverse transcription-quantitative PCR (RT-qPCR) was performed for 8 differentially expressed miRNAs (miR-193a-3p, miR-19a-3p, miR-19b-3p, miR-370-3p, miR-1275, miR-490-5p, miR-630 and miR-665) using DLBCL and LRH fresh samples. In total, 204 miRNAs exhibited differential expression, including 105 downregulated and 54 upregulated miRNAs. The cut-off criteria were set as P≤0.05 and fold-change ≥2. A total of 7,522 potential target genes for the 204 miRNAs were predicted. Potential target genes were enriched in the following pathways: ‘Cancer’, ‘MAPK signaling pathway’, ‘regulation of actin cytoskeleton’, ‘focal adhesion’, ‘endocytosis’, ‘Wnt signaling pathway’, ‘axon guidance’, ‘calcium signaling pathway’ and ‘PI3K/AKT signaling pathway’. A total of 8 miRNAs were validated by RT-qPCR, and 4 miRNAs (miR-19b-3p, miR-193a-3p, miR-370-3p and miR-490-5p) exhibited low expression levels in DLBCL (P<0.05), while miR-630 was highly expressed in DLBCL (P<0.05). Overall, the present study screened 204 differentially expressed miRNAs and analyzed the expression levels of 8 differentially expressed miRNAs in DLBCL. These differentially expressed miRNAs may serve as therapeutic targets for improvement of therapeutic efficacy in DLBCL in the future.
... These invaluable database repositories provide new paradigms to explore context-specific miRNA-gene regulatory relationship. Several computational methods have been proposed on the basis of modular structure identification [15][16][17][18][19][20][21]. Zhang et al. developed a joint non-negative matrix factorization method to discover miRNAgene co-modules in ovarian cancer [15]. ...
... Then, a bi-clustering method based on a sparse matrix factorization is used to cluster the regulation matrix for discovering miRNA-gene modules. Yoon et al. (2019) also developed a bi-clustering method to identify condition-specific modules by integrating the gene expression and miRNA sequencespecific targets information [21]. Although these methods can discover miRNA-gene modules for one cancer or tissue to some extent, they fail to identify cancer-specific and shared miRNA-gene modules when integrating multiple cancer data. ...
... Then, a bi-clustering method based on a sparse matrix factorization is used to cluster the regulation matrix for discovering miRNA-gene modules. Yoon et al. (2019) also developed a bi-clustering method to identify condition-specific modules by integrating the gene expression and miRNA sequencespecific targets information [21]. Although these methods can discover miRNA-gene modules for one cancer or tissue to some extent, they fail to identify cancer-specific and shared miRNA-gene modules when integrating multiple cancer data. ...
Article
Full-text available
Existing studies have demonstrated that dysregulation of microRNAs (miRNAs or miRs) is involved in the initiation and progression of cancer. Many efforts have been devoted to identify microRNAs as potential biomarkers for cancer diagnosis, prognosis and therapeutic targets. With the rapid development of miRNA sequencing technology, a vast amount of miRNA expression data for multiple cancers has been collected. These invaluable data repositories provide new paradigms to explore the relationship between miRNAs and cancer. Thus, there is an urgent need to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data in a pan-cancer paradigm. In this study, we present a tensor sparse canonical correlation analysis (TSCCA) method for identifying cancer-related miRNA-gene modules across multiple cancers. TSCCA is able to overcome the drawbacks of existing solutions and capture both the cancer-shared and specific miRNA-gene co-expressed modules with better biological interpretations. We comprehensively evaluate the performance of TSCCA using a set of simulated data and matched miRNA/gene expression data across 33 cancer types from the TCGA database. We uncover several dysfunctional miRNA-gene modules with important biological functions and statistical significance. These modules can advance our understanding of miRNA regulatory mechanisms of cancer and provide insights into miRNA-based treatments for cancer.
... It is well known that miRNA regulation is essential to a wide range of important biological processes, including RNA silencing, transcriptional regulation of gene expression, cellular functions, signaling pathways and human cancers. Previous studies [25][26][27] have shown that miRNA regulation is condition-specific, implying that the miRNA regulation is cell-specific even these single-cells are phenotypically identical. Fortunately, single-cell RNA sequencing technology provides us an opportunity to gain insights into miRNA regulation at single-cell level. ...
Article
Full-text available
Background Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allows the analysis of miRNA regulation regarding a group of cells, rather than the miRNA regulation unique to individual cells. Recent advance in single-cell miRNA-mRNA co-sequencing technology has opened a way for investigating miRNA regulation at single-cell level. However, as currently single-cell miRNA-mRNA co-sequencing data is just emerging and only available at small-scale, there is a strong need of novel methods to exploit existing single-cell data for the study of cell-specific miRNA regulation. Results In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation) to combine single-cell miRNA-mRNA co-sequencing data and putative miRNA-mRNA binding information to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data in 19 K562 single-cells to identify cell-specific miRNA-mRNA regulatory networks for understanding miRNA regulation in each K562 single-cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 single-cell is unique. Moreover, we conduct detailed analysis on the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. The comparison results indicate that CSmiR is effective in predicting cell-specific miRNA targets. Finally, through exploring cell–cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single-cells and helps to understand cell–cell crosstalk. Conclusions To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at a single-cell resolution level, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.
... Recent networkbased methods have proposed combining the individual co-expression network of each dataset, calculated separately; however, these methods are computationally expensive for a large collection of datasets (Ter Veer et al. 2019). A simpler and widely used methodology is to use or compare the logFC values (as the mean expression value normalized to internal control) obtained for each dataset that may come from numerous conditions, which has been used in different contexts, including miRNA target prediction, toxicogenomic patterns, and compound similarity matrices (Kramer et al. 2020;Yoon et al. 2019;Zhou et al. 2018;Cheng and Yang 2013). Here, we combined these two methods, that is, logFC comparison and co-expression network, and performed a pairwise correlation analysis between ace2 and other genes using logFC values obtained from re-analysis (limma) of public zebrafish Affymetrix datasets (GEO), followed by network enrichment. ...
Article
Human Angiotensin I Converting Enzyme 2 (ACE2) plays an essential role in blood pressure regulation and SARS-CoV-2 entry. ACE2 has a highly conserved, one-to-one ortholog (ace2) in zebrafish, which is an important model for human diseases. However, the zebrafish ace2 expression profile has not yet been studied during early development, between genders, across different genotypes, or in disease. Moreover, a network-based meta-analysis for the extraction of functionally enriched pathways associated with differential ace2 expression is lacking in the literature. Herein, we first identified significant development-, tissue-, genotype-, and gender-specific modulations in ace2 expression via meta-analysis of zebrafish Affymetrix transcriptomics datasets (n = 107 datasets). Correlation analysis of the ace2 meta-differential expression profile then revealed distinct positively and negatively correlated local functionally enriched gene networks. Moreover, we demonstrated that ace2 expression was significantly modulated under different physiological and pathological conditions related to development, tissue, gender, diet, infection, and inflammation using additional RNA-seq datasets. Our findings implicate a novel translational role for zebrafish ace2 in organ differentiation and pathologies observed in the intestines and liver.
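The citing excerpt above describes a pairwise correlation between ace2 and other genes over per-dataset logFC values. A minimal sketch of that step follows, assuming a plain Pearson correlation and a small made-up matrix; the function name `correlate_with_gene`, the gene labels `geneA`/`geneB`, and all numbers are hypothetical and are not taken from the cited study.

```python
import numpy as np

def correlate_with_gene(logfc, genes, target):
    """Pearson correlation between the target gene's logFC profile and every other gene.

    logfc: array of shape (n_genes, n_datasets), one logFC value per gene per dataset.
    Assumes no row is constant (a constant row would give a zero norm).
    """
    idx = genes.index(target)
    # Center each gene's profile across datasets.
    centered = logfc - logfc.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    # Dot product of centered profiles divided by the product of their norms.
    r = centered @ centered[idx] / (norms * norms[idx])
    return {g: float(r[i]) for i, g in enumerate(genes) if i != idx}

# Hypothetical logFC values for three genes across four datasets.
genes = ["ace2", "geneA", "geneB"]
logfc = np.array([
    [1.0, -0.5, 2.0, 0.0],
    [2.0, -1.0, 4.0, 0.0],    # exactly 2x the ace2 profile
    [-1.0, 0.5, -2.0, 0.0],   # exactly -1x the ace2 profile
])
corr = correlate_with_gene(logfc, genes, "ace2")
```

In this toy input `geneA` is perfectly positively correlated with `ace2` and `geneB` perfectly negatively correlated; with real data the resulting coefficients would be thresholded before any network enrichment step.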
... Moreover, many scientific areas only need to work with binary data, where the possible values are 0 or 1 and only need one bit to be represented. Some examples of research fields that can work with binary data are gene expression analyses [13,30], where a value equal to one indicates that a gene is differentially expressed in an individual; marketing [9], to represent whether people have access to a certain product or shop; or social networks [10], where those values equal to one indicate relationships among users. ...
Article
Biclustering is a data mining technique that allows us to find groups of rows and columns that are highly correlated in a 2D dataset. Although several software applications exist to perform biclustering, most of them suffer from a high computational complexity, which prevents their use on large datasets. In this work we present ScalaParBiBit, a parallel tool to find biclusters in binary data, which are quite common in many research fields such as text mining, marketing or bioinformatics. ScalaParBiBit takes advantage of the special characteristics of these binary datasets, as well as of an efficient parallel implementation and algorithm, to accelerate the biclustering procedure in distributed-memory systems. The experimental evaluation shows that our tool is significantly faster and more scalable than the state-of-the-art tool ParBiBit in a cluster with 32 nodes and 768 cores. Our tool, together with its reference manual, is freely available at https://github.com/fraguela/ScalaParBiBit.
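To make the idea of biclustering binary data with one bit per cell concrete, here is a small serial sketch in the spirit of bit-pattern biclustering: each row is packed into an integer, the bitwise AND of two rows gives a candidate column pattern, and every row containing that pattern joins the bicluster. This is an illustrative assumption about the general technique, not ScalaParBiBit's actual implementation; the function names and the example matrix are hypothetical.

```python
from itertools import combinations

def encode_rows(matrix):
    """Pack each binary row into one Python int (bit j corresponds to column j)."""
    return [sum(bit << j for j, bit in enumerate(row)) for row in matrix]

def binary_biclusters(matrix, min_rows=2, min_cols=2):
    """Return (row_indices, column_indices) pairs whose cells are all ones."""
    rows = encode_rows(matrix)
    seen = set()
    result = []
    for a, b in combinations(range(len(rows)), 2):
        # Columns where both seed rows have a one.
        pattern = rows[a] & rows[b]
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # Every row that has ones in all pattern columns belongs to the bicluster.
        members = [i for i, r in enumerate(rows) if r & pattern == pattern]
        if len(members) >= min_rows:
            cols = [j for j in range(len(matrix[0])) if pattern >> j & 1]
            result.append((members, cols))
    return result

# Hypothetical 4 x 4 binary matrix (e.g. rows = genes, columns = samples).
matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
biclusters = binary_biclusters(matrix)
```

The pairwise loop is quadratic in the number of rows, which is the kind of cost that parallel tools aim to distribute; this sketch makes no attempt at parallelism.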
... It is known that weak repression by miRNAs can nevertheless have a substantial effect on cell phenotype (Flynt & Lai, 2008). Recently, data have been obtained supporting an FC threshold as low as 1.3 for functional miRNA targets (Yoon et al, 2019). This threshold can apply to the miRNA-induced mRNA up-regulation. ...
Article
The migrational propensity of neuroblastoma is affected by cell identity, but the mechanisms behind the divergence remain unknown. Using RNAi and time-lapse imaging, we show that ADRN-type NB cells exhibit RAC1- and kalirin-dependent nucleokinetic (NUC) migration that relies on several integral components of neuronal migration. Inhibition of NUC migration by RAC1 and kalirin-GEF1 inhibitors occurs without hampering cell proliferation and ADRN identity. Using three clinically relevant expression dichotomies, we reveal that most of the up-regulated mRNAs in RAC1- and kalirin–GEF1–suppressed ADRN-type NB cells are associated with low-risk characteristics. The computational analysis shows that, in a context of overall gene set poverty, the upregulomes in RAC1- and kalirin–GEF1–suppressed ADRN-type cells are a batch of AU-rich element–containing mRNAs, which suggests a link between NUC migration and mRNA stability. A gene set enrichment analysis–based search for vulnerabilities reveals prospective weak points in RAC1- and kalirin–GEF1–suppressed ADRN-type NB cells, including the activities of H3K27- and DNA methyltransferases. Altogether, these data support the introduction of NUC inhibitors into cancer treatment research.
... MicroRNA (miRNA) which presents this sequence is one of the big genomic cancer data sets (Peralta et al. 2015;Sabzehzari and Naghavi 2018). They are small non-coding RNA molecules (19-23 nt) that regulate gene expression by binding to miRNA response elements in messenger RNA (mRNA) at the post-transcription level (Yoon et al. 2019). ...
Article
The diagnosis of cancer is presently undergoing a change of paradigm for the diagnostic panel using molecular biomarkers. MicroRNA (miRNA) is one of the most important genomic datasets presenting the genome sequences. Since several studies have shown the relationship between miRNAs and cancers, data mining and machine learning methods can be incorporated to extract a large amount of knowledge from cancer genomic datasets. However, although previous research on the identification of cancers from miRNAs has made it possible to diagnose cancer, the accuracy for some classes is not quite satisfactory. Therefore, this research aims to promote a super-class (meta-label) approach and deep learning in a three-phase method to diagnose cancers from miRNAs. The steps in the first phase of the proposed method, named Representation learning, are partitioning data into super-classes, meta-data creation and super-class classification. This phase splits the data into subsets to improve classification accuracy. In other words, the first phase groups labels into meta-labels based on the separability of classes, and then a multi-label learner is built to predict these meta-labels. In the second phase, feature selection is applied to each super-class to reduce the dimensions of the problem and to focus the attention of an induction algorithm on those features that are more important for predicting the target concept. In the third phase of the proposed method, an evolutionary deep neural network is used to classify the labels in each super-class. The last two phases are done separately for each subset, so that five super-classes and subsequently five deep neural networks are trained. The experimental results reveal that the proposed method achieved more efficient results than 19 recent machine learning methods. 
Although evaluating a dataset that consists of 29 types of cancer makes learning more complicated for the convolutional neural network, the performance of the method is noticeably better than that of other existing methods. A further success is a significant reduction in running time compared with other methods.
... Different from other biological networks (i.e., protein-protein interaction network), the miRNA-target regulatory network is a bipartite network. Consequently, the generated miRNA-target modules are actually bicliques where every miRNA of the miRNA set is connected to each target gene of the target gene set (Yoon et al., 2019). In this work, we utilize the R package biclique to enumerate all bicliques from the identified miRNA-target bipartite network. ...
Article
Autism spectrum disorder (ASD) is a class of neurodevelopmental disorders characterized by genetic and environmental risk factors. The pathogenesis of ASD has a strong genetic basis, consisting of rare de novo or inherited variants among a variety of multiple molecules. Previous studies have shown that microRNAs (miRNAs) are involved in neurogenesis and brain development and are closely associated with the pathogenesis of ASD. However, the regulatory mechanisms of miRNAs in ASD are largely unclear. In this work, we present a stepwise method, ASDmiR, for the identification of underlying pathogenic genes, networks, and modules associated with ASD. First, we conduct a comparison study on 12 miRNA target prediction methods by using the matched miRNA, lncRNA, and mRNA expression data in ASD. In terms of the number of experimentally confirmed miRNA–target interactions predicted by each method, we choose the best method for identifying miRNA–target regulatory network. Based on the miRNA–target interaction network identified by the best method, we further infer miRNA–target regulatory bicliques or modules. In addition, by integrating high-confidence miRNA–target interactions and gene expression data, we identify three types of networks, including lncRNA–lncRNA, lncRNA–mRNA, and mRNA–mRNA related miRNA sponge interaction networks. To reveal the community of miRNA sponges, we further infer miRNA sponge modules from the identified miRNA sponge interaction network. Functional analysis results show that the identified hub genes, as well as miRNA-associated networks and modules, are closely linked with ASD. ASDmiR is freely available at https://github.com/chenchenxiong/ASDmiR.
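The citing excerpt above defines miRNA-target modules as bicliques, in which every miRNA in the set is connected to every target gene in the set. The toy sketch below enumerates maximal bicliques by brute force over miRNA subsets. It is written in Python for illustration only and is not the R package biclique mentioned in the excerpt; the function name, the miRNA labels and the gene labels are all hypothetical, and the approach is exponential in the number of miRNAs, so it suits only tiny examples.

```python
from itertools import combinations

def maximal_bicliques(targets_of, min_mirnas=2, min_targets=2):
    """Enumerate maximal bicliques of a bipartite miRNA-target network.

    targets_of: dict mapping each miRNA to the set of its target genes.
    Returns a set of (frozenset_of_miRNAs, frozenset_of_targets) pairs.
    """
    mirnas = sorted(targets_of)
    found = set()
    for k in range(min_mirnas, len(mirnas) + 1):
        for subset in combinations(mirnas, k):
            # Targets shared by every miRNA in the subset.
            shared = set.intersection(*(targets_of[m] for m in subset))
            if len(shared) < min_targets:
                continue
            # Close the miRNA side: include every miRNA that hits all shared targets,
            # which makes the pair maximal on both sides.
            closed = frozenset(m for m in mirnas if shared <= targets_of[m])
            found.add((closed, frozenset(shared)))
    return found

# Hypothetical network with three miRNAs and three target genes.
targets_of = {
    "miR-a": {"G1", "G2", "G3"},
    "miR-b": {"G1", "G2"},
    "miR-c": {"G2", "G3"},
}
modules = maximal_bicliques(targets_of)
```

Here the two modules are {miR-a, miR-b} with {G1, G2} and {miR-a, miR-c} with {G2, G3}; the pair miR-b and miR-c shares only G2 and is dropped by the `min_targets` filter.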
Preprint
RNA-sequencing technology provides an effective tool for understanding miRNA regulation in complex human diseases, including cancers. A large number of computational methods have been developed that use bulk and single-cell RNA-sequencing data to identify miRNA regulation at the resolution of multiple samples (i.e. a group of cells or tissues). However, due to the heterogeneity of individual samples, there is a strong need to infer miRNA regulation specific to individual samples, that is, at single-sample resolution. Here, we develop a framework, Scan, for scanning sample-specific miRNA regulation. Since a single network inference method or strategy cannot perform well for all types of new data, Scan incorporates 27 network inference methods and two strategies to infer tissue-specific or cell-specific miRNA regulation from bulk or single-cell RNA-sequencing data. Results on bulk and single-cell RNA-sequencing data demonstrate the effectiveness of Scan in inferring sample-specific miRNA regulation. Moreover, we have found that incorporating prior information on miRNA targets can improve the accuracy of miRNA target prediction. In addition, Scan can contribute to the clustering of cells/tissues and the construction of cell/tissue correlation networks. Finally, the comparison results show that the performance of network inference methods is likely to be data-specific, and that selecting optimal network inference methods is required for more accurate prediction of miRNA targets. We have made Scan freely available to the public to help infer sample-specific miRNA regulation for new data, benchmark new network inference methods and deepen the understanding of miRNA regulation at the resolution of individual samples.
Article
MicroRNAs (miRNAs) are short non-coding RNAs that regulate expression of target messenger RNAs (mRNAs) post-transcriptionally. Understanding the precise regulatory role of miRNAs is of great interest since miRNAs have been shown to play an important role in development, diseases, and other biological processes. Early work on miRNA target prediction focused on static sequence-driven miRNA-mRNA complementarity. However, recent research also utilizes expression-level data to study context-dependent regulation effects in a more dynamic, physiologically relevant setting. Methods: We propose a novel artificial neural network (ANN) based method, named Tiresias, to predict such targets in a context-dependent manner by combining sequence and expression data. In order to predict the interacting pairs among miRNAs and mRNAs and their regulatory weights, we develop a two-stage ANN and present how to train it appropriately. Tiresias is designed to study various regulation models, ranging from a simple linear model to a complex non-linear model. Tiresias has a single hyper-parameter to control the sparsity of miRNA-mRNA interactions, which we optimize using Bayesian optimization. Results: Tiresias performs better than existing computational methods such as GenMiR++, Elastic Net, and PIMiM, achieving an F1 score of >0.8 for a certain level of regulation strength. For the TCGA breast invasive carcinoma dataset, Tiresias achieves a rate of up to 82% in detecting the experimentally validated interactions between miRNAs and mRNAs, even if we assume that true regulations may result in a low level of regulation strength. Conclusion: Tiresias is a two-stage ANN-based computational method that deciphers context-dependent microRNA regulatory interactions. Experimental results demonstrate that Tiresias outperforms existing solutions and can achieve a high F1 score. Source code of Tiresias is available at https://bitbucket.org/cellsandmachines/.
Article
MicroRNAs (miRNAs) are small non-coding RNAs of ∼22 nucleotides that are involved in negative regulation of mRNA at the post-transcriptional level. Previously, we developed miRTarBase, which provides information about experimentally validated miRNA-target interactions (MTIs). Here, we describe an updated database containing 422 517 curated MTIs from 4076 miRNAs and 23 054 target genes collected from over 8500 articles. The number of MTIs curated by strong evidence has increased ∼1.4-fold since the last update in 2016. In this updated version, target sites validated by reporter assay that are available in the literature can be downloaded. The target site sequences can be used to extract new features for analysis via a machine learning approach, which can help to evaluate the performance of miRNA-target prediction tools. Furthermore, additional browsing options make it easier for users to find specific MTIs. With these improvements, miRTarBase serves as a more comprehensively annotated database of experimentally validated miRNA-target interactions for miRNA-related research. miRTarBase is available at http://miRTarBase.mbc.nctu.edu.tw/.
Article
miR-17-92 is a microRNA cluster with six distinct members. Here, we show that the miR-17-92 cluster and its individual members modulate chronic neuropathic pain. All cluster members are persistently upregulated in primary sensory neurons after nerve injury. Overexpression of the miR-18a, miR-19a, miR-19b and miR-92a cluster members elicits mechanical allodynia in rats, while their blockade alleviates mechanical allodynia in a rat model of neuropathic pain. Plausible targets for the miR-17-92 cluster include genes encoding numerous voltage-gated potassium channels and their modulatory subunits. Single-cell analysis reveals extensive co-expression of the miR-17-92 cluster and its predicted targets in primary sensory neurons. miR-17-92 downregulates the expression of potassium channels and reduces outward potassium currents, in particular A-type currents. Combined application of potassium channel modulators synergistically alleviates mechanical allodynia induced by nerve injury or miR-17-92 overexpression. The miR-17-92 cluster appears to cooperatively regulate the function of multiple voltage-gated potassium channel subunits, perpetuating mechanical allodynia.
Article
A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.
Article
The functional rules for microRNA (miRNA) targeting remain controversial despite their biological importance because only a small fraction of distinct interactions, called site types, have been examined among an astronomical number of site types that can occur between miRNAs and their target mRNAs. To systematically discover functional site types and to evaluate the contradicting rules reported previously, we used large-scale transcriptome data and statistically examined whether each of approximately 2 billion site types is enriched in differentially downregulated mRNAs responding to overexpressed miRNAs. Accordingly, we identified seven non-canonical functional site types, most of which are novel, in addition to four canonical site types, while also removing numerous false positives reported by previous studies. Extensive experimental validation and significantly elevated 3' UTR sequence conservation indicate that these non-canonical site types may have biologically relevant roles. Our expanded catalog of functional site types suggests that the gene regulatory network controlled by miRNAs may be far more complex than currently understood.
Chapter
The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome–protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.
Article
Motivation: MicroRNAs (miRNAs) are small noncoding RNAs that are extensively involved in many physiological and disease processes. One major challenge in miRNA studies is the identification of genes targeted by miRNAs. Currently, most researchers rely on computational programs to initially identify target candidates for subsequent validation. Although considerable progress has been made in recent years for computational target prediction, there is still significant room for algorithmic improvement. Results: Here, we present an improved target prediction algorithm, which was developed by modeling high-throughput profiling data from recent CLIPL (crosslinking and immunoprecipitation followed by RNA ligation) sequencing studies. In these CLIPL-seq studies, the RNA sequences in each miRNA-target pair were covalently linked and unambiguously determined experimentally. By analyzing the CLIPL data, many known and novel features relevant to target recognition were identified and then used to build a computational model for target prediction. Comparative analysis showed that the new algorithm had improved performance over existing algorithms when applied to independent experimental data. Availability: All the target prediction data as well as the prediction tool can be accessed at miRDB (http://mirdb.org). Contact: xwang@radonc.wustl.edu.