
Biclustering analysis of transcriptome big data identifies condition-specific microRNA targets

March 2019
Nucleic Acids Research 47(9)

DOI:10.1093/nar/gkz139

License
CC BY 4.0

Authors:

Sora Yoon

University of Pennsylvania

Nguyen Cao Truong Hai

Ulsan National Institute of Science and Technology

Kim Jinhwan

Sejong University



Published online 1 March 2019 Nucleic Acids Research, 2019, Vol. 47, No. 9 e53

doi: 10.1093/nar/gkz139

Biclustering analysis of transcriptome big data identifies condition-specific microRNA targets

Sora Yoon1, Hai C. T. Nguyen1, Woobeen Jo1, Jinhwan Kim1, Sang-Mun Chi2, Jiyoung Park1, Seon-Young Kim3,4 and Dougu Nam1,5,*

1School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea, 2School of Computer Science and Engineering, Kyungsung University, Busan 48434, Republic of Korea, 3Department of Functional Genomics, University of Science and Technology (UST), Daejeon 34141, Republic of Korea, 4Genome Editing Research Center, Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea and 5Department of Mathematical Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea

Received December 15, 2018; Editorial Decision February 13, 2019; Accepted February 19, 2019

ABSTRACT

We present a novel approach to identify human microRNA (miRNA) regulatory modules (mRNA targets and relevant cell conditions) by biclustering a large collection of mRNA fold-change data for sequence-specific targets. Bicluster targets were assessed using validated messenger RNA (mRNA) targets and exhibited on an average 17.0% (median 19.4%) improved gain in certainty (sensitivity + specificity). The net gain was further increased up to 32.0% (median 33.4%) by incorporating functional networks of targets. We analyzed cancer-specific biclusters and found that the PI3K/Akt signaling pathway is strongly enriched with targets of a few miRNAs in breast cancer and diffuse large B-cell lymphoma. Indeed, five independent prognostic miRNAs were identified, and repression of bicluster targets and pathway activity by miR-29 was experimentally validated. In total, 29 898 biclusters for 459 human miRNAs were collected in the BiMIR database where biclusters are searchable for miRNAs, tissues, diseases, keywords and target genes.

INTRODUCTION

MicroRNAs (miRNAs) are small non-coding RNA molecules (19–23 nt) that regulate gene expression by binding to miRNA response elements in messenger RNA (mRNA) at the post-transcriptional level (1,2). Since their discovery, extensive studies have revealed their key roles in regulating cell cycle and differentiation, chronic diseases, cancer progression and other processes (3–6). As the function of an miRNA is characterized by its target genes, there have been efforts to systematically identify these target genes based on the binding sequences (7–12). Although these methods have provided hundreds to thousands of potential targets, they also yield a large number of false-positives and do not suggest specific targets related to the cell condition being examined.

To select more reliable mRNA targets for each miRNA, paired expression profiles of miRNAs and mRNAs (denoted as miRNA–mRNA profiles) have been incorporated, considering the anticorrelation between an miRNA and its target mRNA. In addition to simple Pearson and Spearman correlation methods, a number of computational methods that integrate both the binding sequence and miRNA–mRNA profiles have been developed to detect miRNA–mRNA regulatory relationships, including penalized regression and Bayesian methods (13–15) (denoted as anticorrelation-based methods). Many of these methods used multivariate linear models in which multiple miRNAs regulate a common target gene. Although anticorrelation-based methods have improved target prediction, they require very costly miRNA–mRNA profiles, and only a limited number of such paired datasets are publicly available at present.

Another approach for improving miRNA target prediction is by inference of miRNA regulation modules. Based on binding sequence information, a bipartite graph between miRNAs and mRNAs was constructed and the maximum bicliques (or biclusters) were identified (16,17). These bicliques represent miRNA regulation modules in which multiple miRNAs may coregulate their common targets. By incorporating miRNA–mRNA profiles, these modules were further refined for specific cell conditions (18–21). Because of the modular nature of cellular processes, these modules were considered to represent more reliable miRNA regulation patterns (22). Recent methods incorporated additional information such as protein–protein (or gene–gene) interactions, copy number variation and methylation data to better understand miRNA regulation (23). The myriad of computational methods for miRNA target prediction have been reviewed and categorized previously (15,20,23), some of which are summarized in Supplementary Table S1.

Figure 1. Two approaches for miRNA regulation module discovery. Red, yellow and blue nodes represent miRNA regulators, mRNA target genes and cell conditions, respectively. R, g and C stand for regulator, target gene and cell condition, respectively. (A) Existing approach. For a given cell condition (here, C1), down (or up)-regulated mRNAs are selected and biclusters between multiple miRNAs and these mRNA targets are identified. (B) Our approach. For a given miRNA (here, R1), mRNAs with corresponding binding sequences are selected and biclusters between these mRNAs and multiple cell conditions are searched.

*To whom correspondence should be addressed. Tel: +82 52 217 2525; Fax: +82 52 217 2639; Email: dougnam@unist.ac.kr

The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/nar/article-abstract/47/9/e53/5366474 by guest on 19 March 2020

In this study, we propose a novel approach to identifying miRNA targets for a variety of cell conditions by biclustering a large collection of mRNA profiles for sequence-specific targets. To this end, we collected 5158 human microarray expression datasets with diverse test and control conditions from the Gene Expression Omnibus (GEO) database (24) and compiled corresponding fold-change (FC) profiles representing 5158 cell conditions. Whereas existing methods for miRNA regulation modules biclustered miRNAs and mRNA targets under a given cell condition (Figure 1A), we considered a different dimension and biclustered mRNA targets and cell conditions (i.e. FC profiles) for an miRNA of interest (Figure 1B). Our approach provides more reliable miRNA target groups that are robustly regulated across different cell conditions without using miRNA–mRNA profiles. A related approach incorporated coexpression of sequence-specific targets using 250 microarray datasets to prioritize true targets (25), but it clustered only target genes and did not suggest relevant cell conditions.

Typically, biclustering algorithms seek to identify a complete association (i.e. biclique) between two subsets of nodes (e.g. a subset of target genes and a subset of cell conditions) (26,27). Taking into account the noise in microarray data, we developed a progressive bicluster extension (PBE) algorithm that allows for a small portion of unconnected pairs between two node subsets but yields biclusters of much larger sizes. In the initial step, PBE identifies bicliques using the bimax algorithm (27). These bicliques are used as seeds that are extended by competitively adding ‘dense’ (low proportion of zero values) rows and columns. Next, less dense rows and columns are removed based on a threshold. By increasing this threshold (tight to less tight) during the iteration of bicluster extension, PBE identified the bicluster structures from noisy data more accurately than state-of-the-art algorithms (17,27–31). QUBIC (29) uses a similar approach by searching for seed biclusters that are then extended. However, QUBIC applies a threshold for minimum column density only, which does not change during extension, and does not remove noisy rows (Supplementary Figure S4B).
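The seed-and-extend idea described above can be illustrated with a small sketch. This is not the authors' PBE implementation (their algorithm is detailed in the Supplementary Data and adds rows and columns competitively); it is a simplified illustration under stated assumptions: the function names `zero_fraction` and `extend_bicluster` are hypothetical, the threshold schedule is hand-picked, the seed is assumed to be an all-ones block, and every sufficiently dense row or column is added in each pass.

```python
def zero_fraction(matrix, rows, cols, index, axis):
    """Fraction of zeros of one row (axis=0) or one column (axis=1) inside the bicluster."""
    if axis == 0:
        values = [matrix[index][c] for c in cols]
    else:
        values = [matrix[r][index] for r in rows]
    return 1.0 - sum(values) / len(values)


def extend_bicluster(matrix, seed_rows, seed_cols, thresholds=(0.0, 0.05, 0.1)):
    """Grow a seed bicluster while tolerating a growing proportion of zeros.

    thresholds: hypothetical schedule of allowed zero fractions (strict to less strict).
    """
    rows, cols = set(seed_rows), set(seed_cols)
    n_rows, n_cols = len(matrix), len(matrix[0])
    for t in thresholds:
        # Extension: add rows, then columns, that are dense enough for the current bicluster.
        rows |= {r for r in range(n_rows) if zero_fraction(matrix, rows, cols, r, 0) <= t}
        cols |= {c for c in range(n_cols) if zero_fraction(matrix, rows, cols, c, 1) <= t}
        # Pruning: drop rows, then columns, that became too noisy after the extension.
        rows = {r for r in rows if zero_fraction(matrix, rows, cols, r, 0) <= t}
        cols = {c for c in cols if zero_fraction(matrix, rows, cols, c, 1) <= t}
    return sorted(rows), sorted(cols)


# Toy binary profile: rows 0-3 and columns 0-3 form a block with a single zero.
toy = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]
result = extend_bicluster(toy, seed_rows=[0, 1], seed_cols=[0, 1], thresholds=(0.0, 0.25))
```

With the strict threshold only the all-ones part of the block is recovered; relaxing the threshold lets the column containing the single zero join the bicluster, which is the behaviour the progressive schedule is meant to capture.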

The biclusters were assessed using experimentally validated targets and exhibited substantially improved accuracy compared to the purely sequence-based method. The accuracy was further improved by selecting the targets having functional interactions with other target genes. Notably, these gains were obtained using only publicly available gene expression and protein functional interaction data, yet compared favorably with those obtained from the anticorrelation-based methods. Moreover, our predictions are available for 459 human miRNAs and a variety of cell conditions from our bicluster database, called BiMIR (http://btool.org/bimir_dir/). We further validated our approach by analyzing the pathways of cancer-specific biclusters and the prognosis of associated miRNAs, followed by confirmatory experiments.

MATERIALS AND METHODS

Collection of expression fold-change data

We downloaded CEL files for 2019 GEO series produced using the Affymetrix U133 Plus 2.0 chip. Robust multi-array average (RMA) normalization was applied to each CEL file using the ‘justRMA’ function in the R ‘affy’ package (32). The intensities of probes for each gene were collapsed by their average value. Next, we curated two sample groups (test/control) for each experimental series and calculated the logarithmic FC (denoted as logFC) of the average expressions in each group. In total, logFC profiles for 5158 (test/control) cell conditions were collected for 20 639 human gene symbols. The logFC matrix and information of the cell conditions are available from our bimir R package (https://github.com/unistbig/bimir).
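The probe collapsing and logFC steps can be sketched as follows. This is an illustration only, not the code in the bimir package: the function names and toy values are hypothetical, and it assumes the inputs are already on the log2 scale (as RMA output is), so that logFC is taken as the difference between the test and control group means.

```python
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average the log2 intensities of all probes that map to the same gene."""
    per_gene = {}
    for probe, value in probe_values.items():
        per_gene.setdefault(probe_to_gene[probe], []).append(value)
    return {gene: mean(values) for gene, values in per_gene.items()}


def log_fold_change(test_samples, control_samples):
    """logFC per gene: mean log2 expression in the test group minus that in the control group."""
    genes = test_samples[0].keys()
    return {
        gene: mean(s[gene] for s in test_samples) - mean(s[gene] for s in control_samples)
        for gene in genes
    }


# Hypothetical probe identifiers and gene symbols.
collapsed = collapse_probes(
    {"p1": 8.0, "p2": 10.0, "p3": 5.0},
    {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"},
)
logfc = log_fold_change(
    test_samples=[{"GENE_A": 9.0}, {"GENE_A": 11.0}],
    control_samples=[{"GENE_A": 8.0}, {"GENE_A": 8.0}],
)
```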

Sequence-specific miRNA targets

Sequence-specific miRNA targets were obtained from the seven sequence-based target prediction databases (TargetScan (33), miRanda (34), mirSVR (35), PITA (36), DIANA-microT (37,38), miRDB (39) and TargetRank (40)). The number of candidate miRNA–mRNA interactions, parameters used and download sites for the sequence-specific targets are available in Supplementary Data (Section S1).
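The prediction step described in the next subsection keeps, for each miRNA, only the targets predicted by at least three of these seven databases. A minimal sketch of such a voting filter is given here; the function name `background_set`, the toy gene symbols and the input layout (one collection of predicted genes per database) are assumptions made for illustration.

```python
from collections import Counter


def background_set(predictions_by_db, min_support=3):
    """Return genes predicted as targets of one miRNA by at least `min_support` databases."""
    votes = Counter()
    for targets in predictions_by_db.values():
        votes.update(set(targets))  # each database votes at most once per gene
    return {gene for gene, n in votes.items() if n >= min_support}


# Toy predictions for a single miRNA (hypothetical gene symbols).
toy_predictions = {
    "TargetScan": ["G1", "G2", "G3"],
    "miRanda": ["G1", "G2"],
    "PITA": ["G1", "G4"],
    "miRDB": ["G2", "G2"],
}
selected = background_set(toy_predictions)
```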

MiRNA target prediction using a progressive bicluster extension (PBE) algorithm

The overview of biclustering-based miRNA target prediction is shown in Figure 2. First, 5158 mRNA microarray datasets with two sample groups (test/control) were collected from the GEO database (24), and corresponding logFC data were compiled for 20 639 human genes (columns) and 5158 fold-change cell conditions (rows). These logFC data were quantized into up-, neutral- and down-regulated genes (denoted as 1, 0 and −1, respectively) using ±log2(1.3) (hereafter, simply denoted as 1.3 FC) thresholds. We regarded 1.3 FC as an appropriate threshold for representing target expression changes caused by miRNA regulation, as it excludes noisy data while still covering many ‘fine-tuned’ mRNA targets. For each miRNA, sequence-specific targets predicted in at least three out of seven miRNA target databases were selected (denoted as background set). Then, the logFC profiles of the cell conditions in which the background set was enriched with 1.3-fold up-regulated genes were collected (hypergeometric test, FDR <5%). The resulting logFC submatrix was converted to a binary matrix by replacing −1 with 0, and was dubbed the MIR profile for the given miRNA. We first applied the bimax biclustering algorithm (27) to the MIR profile to obtain a number of small biclusters completely filled with 1 (called seed biclusters). These seed biclusters were then ‘progressively’ extended using the PBE algorithm (extended biclusters); rows and columns with many 1’s were competitively added to the seed bicluster and then relatively noisy rows and columns were removed, and this process was repeated by slightly increasing the threshold for zero proportion in each row and column (strict to less strict). The extended biclusters were then clustered using average-linkage hierarchical clustering (merged bicluster) to remove redundant results. The Meet/Min distance was used for hierarchical clustering as follows: For two different extended biclusters A and B,

$$\mathrm{Distance}(A,B) = 1 - \frac{|A \cap B|}{\min(|A|,|B|)},$$

where |A| is the product of the row and column sizes of A. We tested three cutoff values (0.3, 0.5 and 0.7) for the cluster dendrogram. This cutoff had a limited effect on the result, and thus we used the cutoff = 0.5. After the merging, the rows or columns containing more than 10% of zeros were trimmed off individually, finally yielding the ‘merged biclusters’. See Supplementary Data for a detailed description of the PBE algorithm (Section S2, Supplementary Figures S1 and S2). Only the merged bicluster was used for target prediction and is simply denoted as ‘bicluster’ hereafter unless noted otherwise.

Figure 2. Overview of the biclustering-based miRNA target prediction. (A) The gene expression fold-change compendium. (B) Sequence-specific targets for each miRNA were obtained from seven miRNA target databases. (C) The MIR profile is composed of binarized logarithmic fold-change values of sequence-specific targets for selected cell conditions. (D) From the MIR profile, seed biclusters are extracted using the BIMAX algorithm, and then are extended using the PBE algorithm. (E) Finally, merged biclusters are generated by hierarchical clustering of extended biclusters and removing the noisy rows and columns.
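The Meet/Min distance used for merging extended biclusters can be written down directly. The sketch below is an illustration, not the authors' code: the function name and the representation of a bicluster as a pair of sets are hypothetical, and it assumes that |A∩B| counts shared cells, that is, shared rows multiplied by shared columns, in line with |A| being the product of row and column sizes.

```python
def meet_min_distance(a, b):
    """Meet/Min distance between two biclusters given as (row_set, column_set) pairs."""
    rows_a, cols_a = a
    rows_b, cols_b = b
    size_a = len(rows_a) * len(cols_a)
    size_b = len(rows_b) * len(cols_b)
    # Assumed reading of |A ∩ B|: number of matrix cells shared by both biclusters.
    overlap = len(rows_a & rows_b) * len(cols_a & cols_b)
    return 1.0 - overlap / min(size_a, size_b)


bic_a = ({0, 1, 2}, {0, 1})        # 3 x 2 = 6 cells
bic_b = ({1, 2, 3, 4}, {0, 1, 2})  # 4 x 3 = 12 cells, 2 x 2 = 4 cells shared with bic_a
distance = meet_min_distance(bic_a, bic_b)
```

Because the overlap is divided by the smaller bicluster, a bicluster fully contained in a larger one has distance 0 to it, so nested results fall together when the dendrogram is cut.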

The resulting biclusters represent predicted target genes (bicluster columns) up-regulated for the clustered cell conditions (bicluster rows). Down-regulated biclusters were also generated in a symmetrical way. Up (down)-regulated biclusters imply that the corresponding miRNA is down (up)-regulated in the captured test conditions. Detailed features of the biclusters are described in Supplementary Data (Supplementary Figure S3 and Supplementary Table S2).

We mainly reported the analysis results for the 1.3 FC threshold, but biclusters were also generated under ±log2(1.5) and ±log2(2.0) thresholds (denoted as 1.5 FC and 2.0 FC thresholds, respectively) to capture more specific and stronger miRNA regulation. Overall, for the list of sequence-specific targets of a given miRNA, two MIR profiles (up and down) are generated for each threshold (1.3, 1.5 and 2.0). The three up-regulated (and down-regulated) MIR profiles have different condition counts, while the gene counts are the same. Therefore, the resulting seed bicluster (and the final merged bicluster) counts differ for different thresholds. An example of a let-7c bicluster for stem cell conditions is described in Supplementary Data (Section S5).
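The construction of an up-regulated MIR profile (quantization, condition selection by enrichment, binarization) can be sketched as follows. The function names are hypothetical, the treatment of values exactly at the threshold as regulated is an assumption, and the multiple-testing (FDR) correction applied in the paper is not shown; only the raw hypergeometric upper-tail probability is computed.

```python
from math import comb, log2


def quantize(logfc, fold=1.3):
    """Map a logFC value to 1 (up), -1 (down) or 0 (neutral) using +/-log2(fold)."""
    cut = log2(fold)
    if logfc >= cut:
        return 1
    if logfc <= -cut:
        return -1
    return 0


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are drawn from `population`, `successes` of which are marked.

    Here: population = all genes, successes = background-set genes,
    draws = up-regulated genes in one condition, k = up-regulated background genes.
    """
    total = comb(population, draws)
    top = min(successes, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    ) / total


def up_profile_row(condition_logfc, background, fold=1.3):
    """Binary MIR-profile row for one condition: 1 if a target is up-regulated, else 0."""
    return [1 if quantize(condition_logfc[g], fold) == 1 else 0 for g in background]


row = up_profile_row({"G1": 0.5, "G2": -0.5, "G3": 0.1}, ["G1", "G2", "G3"])
p_value = hypergeom_upper_tail(k=2, population=4, successes=2, draws=2)
```

Raising `fold` to 1.5 or 2.0 makes fewer genes count as regulated, so fewer conditions pass the enrichment test, which matches the remark that the three profiles differ in condition counts but not in gene counts.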

Experimental validation of miR-29b/c regulation in breast cancer

miRNA transfection. miR-29b-3p and miR-29c-3p mimics and a scrambled miRNA control were purchased from Genolution. Each miRNA (100 nM) was transiently transfected into MDA-MB-231 cells using G-fectin Reagent (Genolution). All experiments were performed 48 h after transfection.

Real-time quantitative PCR. One microgram of total RNA from MDA-MB-231 cells was reverse transcribed with oligo dT and M-MLV RT reverse transcriptase (Invitrogen). Real-time quantitative PCR was performed using a GENETBIO SYBR Green Prime Q-master Mix and the QuantStudio 5 PCR system (ThermoFisher). All runs were accompanied by the internal control B2M or HPRT gene. Because both reference genes yielded very similar results, only B2M results are shown in Figure 6. The samples were run in duplicate and normalized to B2M or HPRT using a ΔΔ cycle threshold (ΔΔCt)-based algorithm to provide arbitrary units representing relative expression.
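For readers unfamiliar with cycle-threshold normalization, the commonly used 2^(−ΔΔCt) calculation is sketched below. The text does not spell out the exact formula used, so this is an assumption: it presumes perfect amplification efficiency (a doubling per cycle), and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression 2^(-ddCt) of a target gene in treated versus control samples."""
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to the reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target appears two cycles later after miRNA transfection.
fold = relative_expression(
    ct_target_treated=26.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
```

A value below 1 indicates lower target transcript levels in the transfected cells than in the scrambled control.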


Figure 3. Simulation test for biclustering algorithms. (A) Example of simulation profile. Orange and gray elements indicate 1 and 0, respectively. (B) Precision and (C) sensitivity of tested biclustering algorithms.

Immunoblotting. Harvested cells were lysed in RIPA buffer and subjected to centrifugation, and the supernatants were collected. Protein concentration was measured using the BCA protein assay kit (Pierce), and equal amounts of protein were resolved using 10% or 12% sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis (PAGE) and transferred to Nylon membranes (GE Healthcare, Amersham). Target proteins were observed by incubation with primary antibodies and infrared fluorescence dye-conjugated secondary antibodies as follows: rabbit anti-human FAK (1:1000, Cell Signaling), phospho-FAK (1:1000, Cell Signaling), Akt (1:1000, Cell Signaling), phospho-Akt (1:1000, Cell Signaling) and mouse anti-human GAPDH (1:1000, Cell Signaling). The HRP-conjugated secondary antibodies were purchased from Cell Signaling Technology. Immunodetection was performed using an Odyssey CLx scanner (Li-COR Biosciences).

RESULTS

Comparison with other biclustering algorithms

Compared to seed biclusters, the PBE algorithm yielded much larger biclusters by allowing for a small portion of noise (Supplementary Figure S3). Its performance was compared with those of five existing biclustering algorithms: ISA (28), QUBIC (29), FABIA (30), BiBit (31) and HOCCLUS2 (17). A summary of each method is described in Supplementary Data (Section S4). First, the size and signal density of biclusters generated from a real up-regulated MIR profile of let-7c-5p were compared (Supplementary Table S3). PBE yielded large biclusters with high densities, whereas existing algorithms yielded biclusters with either smaller sizes or poorer densities. PBE also captured the stem-cell-specific bicluster better than the other algorithms (Supplementary Figure S4). Detailed results for real data analysis are described in Supplementary Data (Section S4).

Next, we tested the sensitivity and precision of biclustering algorithms using simulated binary profiles that reflect the average size and density of real MIR profiles (700 rows, 300 columns and 20% density) (Figure 3A). The simulated profiles contained seven biclusters in which row and column sizes were randomly chosen between 20 and 80, and each bicluster included 1–3% of zeros (noise). Some of the biclusters overlapped with each other by <20% of the bicluster sizes. The simulation was repeated 50 times. Here, ‘true elements’ indicate those included within the seven biclusters, and ‘false elements’ indicate those outside the biclusters. Thus, after running each biclustering algorithm, the sensitivity was measured as the number of true elements within the predicted biclusters divided by the number of all true elements. The precision was measured as the proportion of true elements within the predicted biclusters. PBE showed perfect precision (median = 100%) with high sensitivity (median = 95.6%). The performance of ISA depended on the row (TG) and column (TC) thresholds. When TG = TC = 1, high sensitivity was observed (median = 97.2%) while precision was relatively low (median = 87.7%). When both TG and TC were increased to 2, the precision increased (median = 96.8%) but the sensitivity decreased (median = 86.1%). The QUBIC results were affected by the consistency parameter c. As this value was increased, precision increased while sensitivity decreased. The best performance was observed when using the default parameter (c = 0.95, median precision = 80.8%, median sensitivity = 100%). BIMAX and BiBit do not allow zeros in the biclusters and exhibited quite low sensitivities (median BIMAX sensitivity = 10.2%, median BiBit sensitivity = 14.5%). However, when 30 iterations were applied for BIMAX, its sensitivity increased substantially to 86.7%. FABIA yielded highly noisy biclusters for all tested sparseness parameters (α), resulting in low precision (median ≤46.6%) and sensitivity (≤66.0%). Results for α = 0.01 and 0.05 are shown in Figure 3. For α ≥ 0.1, FABIA did not create a bicluster. HOCCLUS2 was also tested but excluded from Figure 3, because it did not generate any bicluster under this simulation setting. HOCCLUS2 detected biclusters from sparser data (12% or lower density). These results indicate that PBE is an efficient algorithm to identify biclusters from noisy binary data.
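The two evaluation measures defined above can be computed at the level of matrix cells. The sketch below is an illustration with hypothetical function names; it assumes that an "element" is a (row, column) cell and that cells covered by several overlapping biclusters are counted once.

```python
def cells(biclusters):
    """Union of (row, column) cells covered by a list of (rows, cols) biclusters."""
    return {(r, c) for rows, cols in biclusters for r in rows for c in cols}


def precision_sensitivity(true_biclusters, predicted_biclusters):
    """Precision and sensitivity of predicted biclusters against the implanted ones."""
    true_cells = cells(true_biclusters)
    predicted_cells = cells(predicted_biclusters)
    hits = len(true_cells & predicted_cells)
    # Precision: share of predicted cells that are true elements.
    precision = hits / len(predicted_cells) if predicted_cells else 0.0
    # Sensitivity: share of all true elements that were recovered.
    sensitivity = hits / len(true_cells)
    return precision, sensitivity


implanted = [({0, 1}, {0, 1})]     # 4 true cells
predicted = [({0, 1, 2}, {0, 1})]  # 6 predicted cells, 4 of them true
precision, sensitivity = precision_sensitivity(implanted, predicted)
```

An algorithm that forbids zeros inside a bicluster can only recover fragments of a noisy implanted block, which lowers sensitivity while leaving precision high.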


Figure 4. Performance of miRNA target prediction using binding sequence, biclustering and functional networks. (A) Sensitivity and specificity of pooled bicluster targets of 11 miRNAs. Targets with binding sequences were used as background (diagonal black dash). Blue nodes represent biclustering results. Red/yellow/green/purple nodes represent the results obtained using both the biclustering and network-based target selection with node degrees 2, 3, 4 and 5, respectively. (B) Average sensitivity and specificity for different node degrees of target networks. (C) Average gains in certainty of methods using binding sequence, biclustering and network information.

Accuracy of the biclustering target prediction

The bicluster targets were assessed using validated miRNA targets. miRTarBase (41) provides hundreds of thousands of experimentally validated miRNA-target relations with ‘strong’ evidence (reporter assays or western blot) and ‘less strong’ (or weak) evidence (pSILAC or microarray experiment). Among the sequence-specific targets (background set) of a given miRNA, those validated with ‘strong’ evidence were regarded as gold positive (GP) targets, whereas those having neither strong nor weak evidence were set as gold negative (GN) targets. For evaluation, we selected miRNAs having more than 30 GPs whose fraction within the background set was at least 5%. Eleven miRNAs that satisfied these criteria were analyzed (Figure 4A).

For each miRNA, all the resulting bicluster targets, whether up- or downregulated, were pooled as predicted targets, and corresponding sensitivity, specificity, as well as GP enrichment and GN depletion were calculated (Supplementary Tables S5 and S6). When the 1.3 FC threshold was used to quantize the logFC data, the average sensitivity and specificity of the 11 miRNAs were 0.704 and 0.466, respectively (summation = 1.170), representing a 17.0% (median 19.4%) improved gain compared with the sequence-based target prediction. Although positive gains were obtained for all 11 miRNAs for the 1.3 FC cutoff (Figure 4A), the relative performances for each miRNA were quite different for different FC cutoffs (Supplementary Table S5). For example, the gain of miR-34a-5p decreased as the FC cutoff was increased because of the rapid decline in sensitivity (gains for 1.3 FC: 20.8%, 1.5 FC: 13.3%, 2.0 FC: 7.2%). In contrast, the gain of miR-21-5p increased as the cutoff was increased because the specificity increased relatively more (gains for 1.3 FC: 16.4%, 1.5 FC: 26.5% and 2.0 FC: 31.3%). Such a difference presumably represents different miRNA regulation patterns. The former case corresponds to the ‘fine tuner’ miRNAs that moderately regulate many genes. Therefore, using a lower cutoff helps detect subtle changes in target expressions. However, miRNAs for the latter case seem to more strongly regulate a relatively small number of targets. Among the three thresholds tested, 1.3 FC exhibited the best overall gain with the largest sensitivity.
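The gain in certainty reported above can be reproduced from the definitions in the text. The sketch below assumes, as the numbers suggest (0.704 + 0.466 = 1.170, reported as a 17.0% gain), that the gain is sensitivity + specificity − 1, where 1 is the value obtained when every sequence-specific target is predicted (sensitivity 1, specificity 0). The function name and toy gene sets are hypothetical.

```python
def gain_in_certainty(predicted, gold_positive, gold_negative):
    """Return (sensitivity, specificity, gain) of predicted targets against GP and GN sets."""
    predicted = set(predicted)
    sensitivity = len(predicted & gold_positive) / len(gold_positive)
    specificity = len(gold_negative - predicted) / len(gold_negative)
    # The sequence-only baseline predicts everything: sensitivity + specificity = 1.
    return sensitivity, specificity, sensitivity + specificity - 1.0


gp = {"G1", "G2", "G3", "G4"}  # validated with strong evidence
gn = {"G5", "G6", "G7", "G8"}  # no strong or weak evidence
sens, spec, gain = gain_in_certainty({"G1", "G2", "G3", "G5"}, gp, gn)
baseline = gain_in_certainty(gp | gn, gp, gn)
```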

miRNA targets tend to be functionally related with each other (42,43). Therefore, we incorporated the protein functional interaction networks from the STRING database (44) (edge threshold >150) between the bicluster target genes to improve the prediction. Among the bicluster targets, we further selected those with k or more functional interactions with other targets and measured the corresponding gains. Intriguingly, the specificity rapidly increased as k was increased (Figure 4B), and the maximum gain reached up to 32.0% when k = 3 (specificity = 77.8%, Figure 4C). The maximum median gain was even higher (33.4% when k = 4). These results show that target interaction networks can improve the miRNA target prediction considerably.
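The network-based selection amounts to a degree filter on the interaction graph restricted to the bicluster targets. The sketch below is an illustration under assumptions: the function name is hypothetical, edges are taken to be unique undirected pairs that have already passed the score threshold, and the filter is applied in a single pass rather than iteratively.

```python
def filter_by_degree(targets, edges, k):
    """Keep targets that have at least k interaction partners among the other targets."""
    targets = set(targets)
    degree = {t: 0 for t in targets}
    for a, b in edges:
        # Only interactions between two bicluster targets count.
        if a in targets and b in targets and a != b:
            degree[a] += 1
            degree[b] += 1
    return {t for t, d in degree.items() if d >= k}


toy_targets = {"G1", "G2", "G3", "G4"}
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "GX")]  # GX is not a target
kept = filter_by_degree(toy_targets, toy_edges, k=2)
```

Raising k discards weakly connected targets first, which is consistent with the reported rise in specificity at the cost of sensitivity.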

Comparison with anticorrelation-based methods in cancer

miRNA–mRNA paired profiling has been commonly used to predict condition-specific miRNA targets based on the anticorrelation between an miRNA and its mRNA targets. Therefore, we compared our biclustering method with seven anticorrelation-based methods (GenMiR++ (13), Pearson correlation, Spearman correlation, Lasso (45,46), Elastic Net (47), IDA (48) and Tiresias (49)) in predicting cancer-specific miRNA targets. Pearson/Spearman correlation, Lasso, Elastic Net and IDA were implemented using the miRLAB R package (50), and GenMiR++ and Tiresias were run using the original MATLAB and Perl codes, respectively. For the 11 miRNAs evaluated in the previous section, the accuracy of the predicted targets was compared between anticorrelation-based methods and our biclustering method. For the anticorrelation-based methods, the sequence-specific targets of each miRNA were sorted in the order of anticorrelation scores that were calculated from TCGA (The Cancer Genome Atlas) miRNA–mRNA profiles by Pearson/Spearman correlation, Bayesian method, penalized regression or neural network model. These sorted scores were compared with the gold standard positive/negative sets to yield ROC curves. For the biclustering method, we selected biclusters where at least 30% of the rows pertained to ‘tumor versus normal’ or ‘aggressive versus non-aggressive tumor’ conditions. These biclusters represented 33 miRNA-cancer pairs for five cancer types (breast, brain, lung, colon or blood cancer). In each miRNA–cancer pair, corresponding bicluster targets were pooled in the order of proportion of the specific cancer condition in each bicluster. Thus, the true and false-positive rates of bicluster targets in each pooling step were depicted instead of a ROC curve (asterisks, Figure 5). After removing six cases where none of the areas under ROC curves (AUCs) exceeded 0.6 and the maximum biclustering gain was <1.1, we selected biclusters from 20 cases that were coherent with known miRNA expression (quantitative PCR results) for comparison. In other words, upregulated biclusters were chosen when the corresponding miRNA was known to be downregulated in cancer, and vice versa. Supplementary Table S7 lists the literature reporting the expression levels of miRNAs in cancers.

Figure 5. Performance comparison between biclustering and anticorrelation-based methods. Black asterisks represent biclustering predictions. Green and red asterisks represent bicluster targets with at least one and three network degrees, respectively. Solid lines represent ROCs of the seven anticorrelation-based methods. The title of each panel represents the cancer type, miRNA and target regulation direction (parenthesized). Blue, green and red titles represent the 11, 6 and 3 cases where the biclustering method performed better than, similar to and worse than anticorrelation-based methods, respectively. Dashed black lines represent the background results when only sequence-specific targets were used. BRCA, DLBC, GBMLGG and LAML represent breast invasive carcinoma, diffuse large B-cell lymphoma, glioma and acute myeloid leukemia, respectively.

Overall, the biclustering method compared favorably with the miRNA–mRNA profile based methods (Figure 5). For 11 out of the 20 cases, the biclustering method exhibited better gains than the anticorrelation-based methods; in 6 other cases, both approaches exhibited similar performances; in the remaining 3 cases, the biclustering method was inferior to the best anticorrelation-based method, mostly because of its low sensitivity. As seen in the previous section, incorporating the network information tended to increase the specificity and gain of the biclustering method. Among the seven anticorrelation-based methods, GenMiR++ performed best for most cases.

These results showed that if miRNA expression information was provided, our biclustering approach overall performed better than anticorrelation-based methods in prioritizing condition-specific miRNA targets. Notably, miRNA expression is relatively easily obtained from the literature or quantitative PCR experiments.

miRNAs targeting PI3K/Akt signaling in cancer

Figure 6. miRNA targets in PI3K/Akt signaling pathway (breast cancer). (A) miRNA targets predicted from breast cancer biclusters are highlighted by red borders. For each target molecule, corresponding miRNA names and target gene symbols are represented. (B) Distant relapse-free survival analysis for 210 patients with breast cancer exhibiting high and low miR-29a, miR-29b and miR-29c levels. The patients were divided into two groups based on their best splits at top 33.8%, 40% and 66% values, respectively. (C) Transcript levels of miR-29 target gene candidates were analyzed by qRT-PCR. MDA-MB-231 breast cancer cells were transiently transfected with either scrambled miRNA (control) or miR-29 (29b-3p or 29c-3p). All the nine genes tested were considerably downregulated by miR-29b and/or -29c. In particular, ITGB1, GNG12 and VEGFA were downregulated by both miR-29b and -29c. Statistical significance was tested by one-tailed t-test. *P < 0.05; **P < 0.01; ***P < 0.001 versus scrambled miRNA. (D and E) Activation of downstream pathway candidates such as AKT and FAK was analyzed by immunoblotting. Total cell lysates extracted from cells transfected with either scrambled miRNA or (D) miR-29b-3p or (E) miR-29c-3p were analyzed for the levels of pAKT, AKT, pFAK and FAK.

We further analyzed the bicluster targets corresponding to the 20 cancer-miRNA pairs (Figure 5). Among them, breast cancer and diffuse large B-cell lymphoma (DLBCL) yielded the largest numbers of biclusters. In breast cancer, bicluster targets of miR-1, miR-29a/b/c, miR-34a and miR-145 were upregulated in aggressive cancer; in DLBCL, the targets of miR-29a/b/c, miR-34a and miR-145 were also upregulated. We pooled those bicluster targets in each cancer type and performed pathway enrichment analysis (KEGG annotation) using the DAVID tool (51), which identified seven and four significant pathways (FDR < 0.05) in breast cancer and DLBCL, respectively (Supplementary Tables S8 and S9). Interestingly, the bicluster targets in both cancer types were strongly enriched in the ‘PI3K/Akt signaling pathway’ (FDR = 2.6E-7 for breast cancer; FDR = 5.3E-7 for DLBCL). This pathway is known to be frequently hyperactivated in many cancers to promote cell cycle progression, survival, proliferation and epithelial–mesenchymal transition of tumor cells (52,53). In addition, the extracellular matrix (ECM)–receptor interaction and focal adhesion pathways were enriched in both cancer types, but all the corresponding bicluster targets except two (CAV2, BIRC2) were also included in the PI3K/Akt signaling pathway.
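The enrichment statistics reported above can be illustrated with a minimal sketch. It assumes a plain hypergeometric tail test followed by Benjamini–Hochberg adjustment; DAVID itself reports a modified Fisher exact statistic (the EASE score), so this is a stand-in rather than the tool's exact procedure. The function names, pathway labels and all gene counts below are hypothetical.

```python
from math import comb

def hypergeom_tail(k, population, pathway_size, n_targets):
    """P(X >= k): chance of seeing at least k pathway genes among n_targets
    genes drawn without replacement from a background of `population` genes."""
    upper = min(pathway_size, n_targets)
    total = comb(population, n_targets)
    hits = sum(
        comb(pathway_size, i) * comb(population - pathway_size, n_targets - i)
        for i in range(k, upper + 1)
    )
    return hits / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical input: pathway -> (pooled targets in pathway, pathway size).
pathways = {"pathway_A": (12, 300), "pathway_B": (3, 250), "pathway_C": (1, 100)}
population, n_targets = 20000, 60  # assumed background and pooled target counts

pvals = [hypergeom_tail(k, population, size, n_targets)
         for k, size in pathways.values()]
for name, fdr in zip(pathways, benjamini_hochberg(pvals)):
    print(name, f"FDR={fdr:.3g}")
```

Pathways whose adjusted value falls below 0.05 would be called significant under the same cutoff used in the text.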

Figure 6A and Supplementary Figure S5A show the PI3K/Akt pathway with the bicluster targets highlighted for breast cancer and DLBCL, respectively. In both cancer types, the miRNAs targeted multiple ligands including genes encoding growth factors (e.g. VEGFA and PDGFC targeted by miR-29) and ECM (e.g. COL1A1, LAMC1 and THBS2 by miR-29); signal transducers such as receptor tyrosine kinase (e.g. MEK and/or PDGFRA by miR-34a), G-proteins (GNB4 and GNG12 by miR-29), toll-like receptor (TLR4 by miR-34a and miR-145) and integrin (e.g. ITGB1 by miR-29); as well as downstream effectors such as NRAS (by miR-29 and miR-145) and CDK6 (by miR-29). In addition, AKT3 was targeted by miR-29 in breast cancer, and cytokine receptors (IL2RB and IL6R) and one component of the PI3K complex (PIK3R3) were also targeted by miR-34a and miR-29, respectively, in DLBCL. Indeed, it was previously shown that miR-29b upregulation in breast cancer considerably inhibited metastasis by repressing targets related to the tumor microenvironment (54) (including some genes listed above).

In the present study, we experimentally validated the bicluster targets of miR-29 using the human breast cancer cell line MDA-MB-231, which is a well-established metastatic and invasive cancer cell line. Transcript levels of nine bicluster targets related to ECM or PI3K were analyzed 2 days after transient transfection with either miR-29 or control miRNA. All nine targets were significantly downregulated by miR-29b or -29c transfection compared to the controls (Figure 6C). Furthermore, the activation of ECM-related downstream pathways such as focal adhesion kinase (FAK) and AKT was also attenuated by miR-29 (Figure 6D and E), demonstrating the capability of biclustering analysis to capture relevant pathways for disease.

Finally, we analyzed the prognostic values of these miRNAs using a multivariate Cox proportional hazards (mCPH) model for public miRNA expression datasets. Distant relapse-free survival was tested for 210 patients with breast cancer (GEO database, GSE22216). Among the six miRNAs analyzed, the three miR-29 family miRNAs had significant prognostic values (mCPH P-values of miR-29a = 0.0042, miR-29b = 0.0064, miR-29c = 0.0038; adjusted for age, tumor size, lymph nodes involved, ER and grade). Then, the overall survival of 116 patients with DLBCL (GSE40239) was also analyzed for five miRNAs. Among them, two exhibited significant prognostic values (mCPH P-values for miR-34a = 0.0185 and miR-145 = 0.0041; adjusted for International Prognostic Index (IPI) and gender). See Supplementary Tables S10 and S11 for detailed results. Kaplan–Meier plots contrasting the effects of miRNA expression on survival are also shown in Figure 6B and Supplementary Figures S5B and S5C.
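The Kaplan–Meier comparison described above can be sketched as follows. This is a minimal product-limit estimator combined with a split of patients into the top fraction by miRNA expression versus the rest; it does not reproduce the mCPH model or the search for the best split, and the function names and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time per patient
    events : 1 if the event (e.g. relapse) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients that share the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

def split_by_top_fraction(expression, top_fraction):
    """Label samples 'high' if they fall in the top fraction by expression."""
    n_high = round(len(expression) * top_fraction)
    ranked = sorted(range(len(expression)),
                    key=lambda i: expression[i], reverse=True)
    high = set(ranked[:n_high])
    return ["high" if i in high else "low" for i in range(len(expression))]

# Toy data (hypothetical): expression, follow-up in months, relapse indicator.
expression = [5.1, 2.0, 3.3, 4.8, 1.2, 2.9]
months = [60, 14, 35, 52, 9, 27]
relapsed = [0, 1, 1, 0, 1, 1]

groups = split_by_top_fraction(expression, 0.4)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([months[i] for i in idx], [relapsed[i] for i in idx])
    print(label, curve)
```

A group with no observed events yields an empty curve, meaning the estimated survival stays at 1 over the follow-up period.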

Overall, by analyzing cancer biclusters, we were able to identify the key pathways (PI3K/Akt signaling, ECM and focal adhesion), and five associated prognostic miRNAs (miR-29a, miR-29b and miR-29c in breast cancer; miR-34a and miR-145 in DLBCL) that are repressive of tumor progression (hazard ratios of 0.593–0.745). In particular, the effects of miR-29b/c on these pathways were validated experimentally (Figure 6C–E).

BiMIR: a bicluster database for condition-specific miRNA targets

In total, 29 898 biclusters were generated for 459 human miRNAs using the PBE algorithm (13 949 for 1.3 FC; 10 999 for 1.5 FC; 4950 for 2.0 FC thresholds) and compiled in the BiMIR database (http://www.btool.org/bimir_dir/), where biclusters are searchable by miRNA, tissue, disease, keyword, target gene of interest and their combinations. BiMIR can be used for investigating novel miRNA functions, targets and related cell conditions.
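The three fold-change thresholds can be read as three binarizations of the same fold-change matrix, since BIMAX operates on binary input. The sketch below assumes that the data are stored as log2 fold changes and that a cell is set to 1 when it passes the threshold in the chosen direction; both are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def binarize_fold_change(log2_fc, fc_threshold, direction="up"):
    """Turn a log2 fold-change matrix (list of rows) into a 0/1 matrix.

    fc_threshold is given on the linear scale (e.g. 1.3, 1.5 or 2.0).
    """
    cutoff = math.log2(fc_threshold)
    if direction == "up":
        return [[1 if v >= cutoff else 0 for v in row] for row in log2_fc]
    if direction == "down":
        return [[1 if v <= -cutoff else 0 for v in row] for row in log2_fc]
    raise ValueError("direction must be 'up' or 'down'")

# Toy matrix (hypothetical): rows are genes, columns are cell conditions.
toy = [[0.5, -0.5], [1.2, -1.2]]
for fc in (1.3, 1.5, 2.0):
    print(fc, binarize_fold_change(toy, fc, "up"),
          binarize_fold_change(toy, fc, "down"))

# Bicluster counts per threshold as reported in the text.
reported = {1.3: 13949, 1.5: 10999, 2.0: 4950}
assert sum(reported.values()) == 29898
```

Lower thresholds keep more cells at 1, which is consistent with the larger number of biclusters reported for the 1.3 FC threshold.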

Along with the list of searched biclusters, the function enrichment results for bicluster targets are provided based on the MSigDB (55) pathway (C2) and gene ontology (C5) categories. If biclusters are searched for a specific organ/tissue or disease, the proportion of corresponding conditions in each bicluster is also indicated. These help the user find relevant biclusters. The heat maps for each bicluster are visualized (Supplementary Figure S6), and the corresponding target genes and cell conditions are hyperlinked to the GeneCards (56) and GEO (24) databases, respectively, for detailed information. For bicluster target genes, the experimental evidence from miRTarBase (41), network node degrees and protein network visualization based on the STRING database (44) are provided. All the biclusters are downloadable from the BiMIR database.

DISCUSSION

Here, we presented a novel framework to prioritize miRNA targets by biclustering sequence-specific targets and cell conditions, a dimension that has rarely been investigated. This is based on the idea that miRNA targets, like other cellular molecules, have modular activity and can be repeatedly captured across different cell conditions. Indeed, the bicluster targets exhibited substantially improved accuracy compared to purely sequence-based targets and were often enriched in well-known pathways characterizing the modules identified. Moreover, functionally connected targets exhibited even higher accuracy, further confirming the modular activity of miRNA targets. The functional interaction of miRNA targets and their contribution to target prediction have been studied previously (57,58).

We analyzed cancer biclusters and found that the PI3K/Akt signaling pathway was intensively targeted by a few miRNAs in two cancer types. Further, the prognostic values of those miRNAs and the regulatory effects of miR-29 were also validated. These results demonstrate that biclustering analysis is able to reveal key pathways controlled by miRNAs in disease. The BiMIR database provides miRNAs and targeted pathways for dozens of diseases.

Based on the knowledge of miRNA expression, our prediction compared favorably with seven anticorrelation-based methods under cancer conditions. These results demonstrate the practical value of our approach in that it can provide fairly good target predictions for a variety of cell conditions without generating costly miRNA–mRNA profiles. The BiMIR database was designed to explore the modular regulatory networks of miRNAs by connecting miRNAs, cell conditions (or disease), mRNA targets and associated pathways. The user may obtain candidate miRNAs and target genes for a cell condition of interest. Knowledge of the miRNA expression level will help select the proper direction of biclusters (up or down).

Despite the improvements and usefulness shown in this study, there remain difficulties in our approach regarding free parameters that need to be optimized. First, the minimum seed size of 10 × 10 was determined in an ad hoc manner, and its optimal size may be affected by the size of the fold-change data. Second, the iteration number of 20 in the BIMAX algorithm was chosen as a compromise with computation time; a higher iteration number yielded more biclusters. However, other parameters seemed to be less sensitive. For example, we gradually increased the threshold of zero proportion from 0.01 to 0.1 (step size 0.01) during 10 iterations of bicluster extension. This may seem to allow 10% of zeros in the end, but the final zero proportion was only ∼1.5% because of the trimming process. The cutoff of hierarchical clustering of the extended clusters was also a less sensitive parameter. In addition, the biclusters were generated under a rather strict criterion (for targets in three or more databases); therefore, BiMIR can be used for selecting a small number of highly likely targets for the cell condition of interest.
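The zero-proportion schedule mentioned above can be made concrete with a small sketch. It only illustrates the idea of relaxing the allowed fraction of zeros from 0.01 to 0.1 over 10 iterations while greedily adding rows to a bicluster; it is not the PBE implementation, the trimming step is omitted, and all function names and the toy matrix are hypothetical.

```python
def zero_thresholds(start=0.01, stop=0.10, step=0.01):
    """Thresholds 0.01, 0.02, ..., 0.10 (one per extension iteration)."""
    n = round((stop - start) / step) + 1
    return [round(start + k * step, 2) for k in range(n)]

def zero_proportion(matrix, rows, cols):
    """Fraction of zero cells in the sub-matrix defined by rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return cells.count(0) / len(cells)

def extend_rows(matrix, rows, cols, candidates, max_zero):
    """Greedily add candidate rows while the zero proportion stays allowed."""
    rows = list(rows)
    for r in candidates:
        if r in rows:
            continue
        if zero_proportion(matrix, rows + [r], cols) <= max_zero:
            rows.append(r)
    return rows

# Toy binary matrix (hypothetical): rows are genes, columns are conditions.
matrix = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
rows, cols = [0, 1], [0, 1, 2, 3]
for max_zero in zero_thresholds():
    rows = extend_rows(matrix, rows, cols, range(len(matrix)), max_zero)
print(rows, zero_proportion(matrix, rows, cols))
```

In this toy case the nearly complete third row is admitted only once the threshold has been relaxed enough, while the all-zero row is never admitted.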

The biclustering approach presented here can also be applied for predicting the condition-specific targets of other sequence-specific regulators such as transcription factors or RNA-binding proteins. In this regard, the entire 5158 mRNA fold-change profiles for 20 639 genes are provided for general systems biology research. These mRNA fold-change data are different from the GTEx transcriptome data (59) in that GTEx data represent transcription levels in normal tissues, whereas our fold-change data represent gene expression ‘changes’ for a variety of cell conditions such as disease, chemical treatment, tissues and differentiations. Thus, these fold-change data can also be used for clustering or regulatory network analysis for a specific group of genes or cell conditions.

Whereas existing methods to identify miRNA regulation modules bicluster multiple miRNAs and multiple target genes representing coregulatory networks, the work presented here is focused on prioritizing highly likely target genes of a single miRNA that are commonly detected across multiple cell conditions. Our approach can also be extended to evaluate the miRNA coregulatory networks by overlapping biclusters for different miRNAs. A significant overlap indicates mRNA targets coregulated under multiple cell conditions. Our approach and data would contribute to uncovering the modular structure of complex regulatory networks.

DATA AVAILABILITY

The BiMIR database is available at http://www.btool.org/bimir_dir/. The BiMIR R package, which includes the biclustering code and the large expression fold-change data, is available at https://github.com/unistbig/bimir.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Research Foundation (NRF) of Korea, Genomics Program [2016M3C9A3945893]; Basic Science Research Program (NRF) [2017R1E1A1A03070107, NRF-2018R1A5A1024340]; Bio-Synergy Research Project [NRF-2017M3A9C4065956]. Funding for open access charge: NRF [NRF-2016M3C9A3945893].

Conflict of interest statement. None declared.

REFERENCES

1. Chen,K. and Rajewsky,N. (2007) The evolution of gene regulation by

transcription factors and microRNAs. Nat. Rev. Genet.,8, 93–103.

2. Salmena,L., Poliseno,L., Tay,Y., Kats,L. and Pandolfi,P.P. (2011) A

ceRNA hypothesis: the Rosetta Stone of a hidden RNA language?

Cell,146, 353–358.

3. Bueno,M.J. and Malumbres,M. (2011) MicroRNAs and the cell

cycle. Biochim. Biophys. Acta,1812, 592–601.

4. Shivdasani,R.A. (2006) MicroRNAs: regulators of gene expression

and cell differentiation. Blood,108, 3646–3653.

5. Neal,C.S., Michael,M.Z., Pimlott,L.K., Yong,T.Y., Li,J.Y.Z. and

Gleadle,J.M. (2011) Circulating microRNA expression is reduced in

chronic kidney disease. Nephrol. Dialysis Transplant.,26, 3794–3802.

6. Zhang,B.H., Pan,X.P., Cobb,G.P. and Anderson,T.A. (2007)

microRNAs as oncogenes and tumor suppressors. Dev. Biol.,302,

1–12.

7. John,B., Enright,A.J., Aravin,A., Tuschl,T., Sander,C. and

Marks,D.S. (2004) Human MicroRNA targets. PLoS Biol.,2, e363.

8. Lewis,B.P., Burge,C.B. and Bartel,D.P. (2005) Conserved seed

pairing, often flanked by adenosines, indicates that thousands of

human genes are microRNA targets. Cell,120, 15–20.

9. Krek,A., Grun,D., Poy,M.N., Wolf,R., Rosenberg,L., Epstein,E.J.,

MacMenamin,P., da Piedade,I., Gunsalus,K.C., Stoffel,M. et al.

(2005) Combinatorial microRNA target predictions. Nat. Genet.,37,

495–500.

10. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007)

The role of site accessibility in microRNA target recognition. Nat.

Genet.,39, 1278–1284.

11. Kiriakidou,M., Nelson,P.T., Kouranov,A., Fitziev,P., Bouyioukos,C.,

Mourelatos,Z. and Hatzigeorgiou,A. (2004) A combined

computational-experimental approach predicts human microRNA

targets. Genes Dev.,18, 1165–1178.

12. Kim,D., Sung,Y.M., Park,J., Kim,S., Kim,J., Park,J., Ha,H., Bae,J.Y.,

Kim,S. and Baek,D. (2016) General rules for functional microRNA

targeting. Nat. Genet.,48, 1517–1526.

13. Huang,J.C., Babak,T., Corson,T.W., Chua,G., Khan,S., Gallie,B.L.,

Hughes,T.R., Blencowe,B.J., Frey,B.J. and Morris,Q.D. (2007) Using

expression profiling data to identify human microRNA targets. Nat.

Methods,4, 1045–1049.

14. Lu,Y., Zhou,Y., Qu,W., Deng,M. and Zhang,C. (2011) A Lasso

regression model for the construction of microRNA-target regulatory

networks. Bioinformatics,27, 2406–2413.

15. Muniategui,A., Pey,J., Planes,F.J. and Rubio,A. (2013) Joint analysis

of miRNA and mRNA expression data. Brief. Bioinform.,14,

263–278.

16. Yoon,S. and De Micheli,G. (2005) Prediction of regulatory modules

comprising microRNAs and target genes. Bioinformatics,21,

ii93–ii100.

17. Pio,G., Ceci,M., D’Elia,D., Loglisci,C. and Malerba,D. (2013) A

novel biclustering algorithm for the discovery of meaningful

biological correlations between microRNAs and their target genes.

BMC Bioinformatics,14, S8.

18. Joung,J.G., Hwang,K.B., Nam,J.W., Kim,S.J. and Zhang,B.T. (2007)

Discovery of microRNA-mRNA modules via population-based

probabilistic learning. Bioinformatics,23, 1141–1147.

19. Peng,X., Li,Y., Walters,K.A., Rosenzweig,E.R., Lederer,S.L.,

Aicher,L.D., Proll,S. and Katze,M.G. (2009) Computational


identification of hepatitis C virus associated microRNA-mRNA

regulatory modules in human livers. BMC Genomics,10, 373.

20. Liu,B., Li,J. and Cairns,M.J. (2014) Identifying miRNAs, targets and

functions. Brief. Bioinform.,15, 1–19.

21. Liu,B., Liu,L., Tsykin,A., Goodall,G.J., Green,J.E., Zhu,M.,

Kim,C.H. and Li,J. (2010) Identifying functional miRNA-mRNA

regulatory modules with correspondence latent dirichlet allocation.

Bioinformatics,26, 3105–3111.

22. Mitra,K., Carvunis,A.R., Ramesh,S.K. and Ideker,T. (2013)

Integrative approaches for finding modular structure in biological

networks. Nat. Rev. Genet.,14, 719–732.

23. Le,T.D., Liu,L., Zhang,J., Liu,B. and Li,J. (2015) From miRNA

regulation to miRNA-TF co-regulation: computational approaches

and challenges. Brief. Bioinform.,16, 475–496.

24. Clough,E. and Barrett,T. (2016) The Gene Expression Omnibus

Database. Methods Mol. Biol.,1418, 93–110.

25. Gennarino,V.A., D’Angelo,G., Dharmalingam,G., Fernandez,S.,

Russolillo,G., Sanges,R., Mutarelli,M., Belcastro,V., Ballabio,A.,

Verde,P. et al. (2012) Identification of microRNA-regulated gene

networks by expression analysis of target genes. Genome Res.,22,

1163–1172.

26. Bondy,J.A. and Murty,U.S.R. (1976) Graph Theory with Applications.

Macmillan, London.

27. Prelic,A., Bleuler,S., Zimmermann,P., Wille,A., Buhlmann,P.,

Gruissem,W., Hennig,L., Thiele,L. and Zitzler,E. (2006) A systematic

comparison and evaluation of biclustering methods for gene

expression data. Bioinformatics,22, 1122–1129.

28. Bergmann,S., Ihmels,J. and Barkai,N. (2003) Iterative signature

algorithm for the analysis of large-scale gene expression data. Phys.

Rev. E,67, 031902.

29. Li,G., Ma,Q., Tang,H., Paterson,A.H. and Xu,Y. (2009) QUBIC: a

qualitative biclustering algorithm for analyses of gene expression

data. Nucleic Acids Res.,37, e101.

30. Hochreiter,S., Bodenhofer,U., Heusel,M., Mayr,A., Mitterecker,A.,

Kasim,A., Khamiakova,T., Van Sanden,S., Lin,D., Talloen,W. et al.

(2010) FABIA: factor analysis for bicluster acquisition.

Bioinformatics,26, 1520–1527.

31. Rodriguez-Baena,D.S., Perez-Pulido,A.J. and Aguilar-Ruiz,J.S.

(2011) A biclustering algorithm for extracting bit-patterns from

binary datasets. Bioinformatics,27, 2738–2745.

32. Gautier,L., Cope,L., Bolstad,B.M. and Irizarry,R.A. (2004) affy -

analysis of Affymetrix GeneChip data at the probe level.

Bioinformatics,20, 307–315.

33. Garcia,D.M., Baek,D., Shin,C., Bell,G.W., Grimson,A. and

Bartel,D.P. (2011) Weak seed-pairing stability and high target-site

abundance decrease the proficiency of lsy-6 and other microRNAs.

Nat. Struct. Mol. Biol.,18, 1139–1146.

34. Kozomara,A. and Griffiths-Jones,S. (2014) miRBase: annotating

high confidence microRNAs using deep sequencing data. Nucleic

Acids Res.,42, D68–D73.

35. Betel,D., Koppal,A., Agius,P., Sander,C. and Leslie,C. (2010)

Comprehensive modeling of microRNA targets predicts functional

non-conserved and non-canonical sites. Genome Biol.,11, R90.

36. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007)

The role of site accessibility in microRNA target recognition. Nat.

Genet.,39, 1278–1284.

37. Maragkakis,M., Reczko,M., Simossis,V.A., Alexiou,P.,

Papadopoulos,G.L., Dalamagas,T., Giannopoulos,G., Goumas,G.,

Koukis,E., Kourtis,K. et al. (2009) DIANA-microT web server:

elucidating microRNA functions through target prediction. Nucleic

Acids Res.,37, W273–W276.

38. Paraskevopoulou,M.D., Georgakilas,G., Kostoulas,N., Vlachos,I.S.,

Vergoulis,T., Reczko,M., Filippidis,C., Dalamagas,T. and

Hatzigeorgiou,A.G. (2013) DIANA-microT web server v5.0: service

integration into miRNA functional analysis workflows. Nucleic Acids

Res.,41, W169–W173.

39. Wang,X.W. (2016) Improving microRNA target prediction by

modeling with unambiguously identified microRNA-target pairs

from CLIP-ligation studies. Bioinformatics,32, 1316–1322.

40. Nielsen,C.B., Shomron,N., Sandberg,R., Hornstein,E., Kitzman,J.

and Burge,C.B. (2007) Determinants of targeting by endogenous and

exogenous microRNAs and siRNAs. RNA,13, 1894–1910.

41. Chou,C.H., Shrestha,S., Yang,C.D., Chang,N.W., Lin,Y.L.,

Liao,K.W., Huang,W.C., Sun,T.H., Tu,S.J., Lee,W.H. et al. (2018)

miRTarBase update 2018: a resource for experimentally validated

microRNA-target interactions. Nucleic Acids Res.,46, D296–D302.

42. Sass,S., Dietmann,S., Burk,U.C., Brabletz,S., Lutter,D.,

Kowarsch,A., Mayer,K.F., Brabletz,T., Ruepp,A., Theis,F.J. et al.

(2011) MicroRNAs coordinately regulate protein complexes. BMC

Syst. Biol.,5, 136.

43. Sakai,A., Saitow,F., Maruyama,M., Miyake,N., Miyake,K.,

Shimada,T., Okada,T. and Suzuki,H. (2017) MicroRNA cluster

miR-17-92 regulates multiple functionally related voltage-gated

potassium channels in chronic neuropathic pain. Nat. Commun.,8,

16079.

44. Szklarczyk,D., Morris,J.H., Cook,H., Kuhn,M., Wyder,S.,

Simonovic,M., Santos,A., Doncheva,N.T., Roth,A., Bork,P. et al.

(2017) The STRING database in 2017: quality-controlled

protein-protein association networks, made broadly accessible.

Nucleic Acids Res.,45, D362–D368.

45. Santosa,F. and Symes,W.W. (1986) Linear inversion of Band-Limited

reflection seismograms. Siam J. Sci. Stat. Comput.,7, 1307–1330.

46. Tibshirani,R. (1996) Regression shrinkage and selection via the

Lasso. J. R. Stat. Soc. Series B Methodol.,58, 267–288.

47. Sass,S., Pitea,A., Unger,K., Hess,J., Mueller,N.S. and Theis,F.J.

(2015) MicroRNA-Target network inference and local network

enrichment analysis identify two microRNA clusters with distinct

functions in head and neck squamous cell carcinoma. Int. J. Mol.

Sci.,16, 30204–30222.

48. Le,T.D., Liu,L., Tsykin,A., Goodall,G.J., Liu,B., Sun,B.Y. and Li,J.

(2013) Inferring microRNA-mRNA causal regulatory relationships

from expression data. Bioinformatics,29, 765–771.

49. Koo,J., Zhang,J.Y. and Chaterji,S. (2018) Tiresias: Context-sensitive

approach to decipher the presence and strength of MicroRNA

regulatory interactions. Theranostics,8, 277–291.

50. Le,T.D., Zhang,J., Liu,L., Liu,H. and Li,J. (2015) miRLAB: An R

based dry lab for exploring miRNA-mRNA regulatory relationships.

PLoS One,10, e0145386.

51. Huang,D.W., Sherman,B.T., Tan,Q., Collins,J.R., Alvord,W.G.,

Roayaei,J., Stephens,R., Baseler,M.W., Lane,H.C. and

Lempicki,R.A. (2007) The DAVID Gene Functional Classification

Tool: a novel biological module-centric algorithm to functionally

analyze large gene lists. Genome Biol.,8, R183.

52. Chang,F., Lee,J.T., Navolanic,P.M., Steelman,L.S., Shelton,J.G.,

Blalock,W.L., Franklin,R.A. and McCubrey,J.A. (2003) Involvement

of PI3K/Akt pathway in cell cycle progression, apoptosis, and

neoplastic transformation: a target for cancer chemotherapy.

Leukemia,17, 590–603.

53. Luo,J., Manning,B.D. and Cantley,L.C. (2003) Targeting the

PI3K-Akt pathway in human cancer: rationale and promise. Cancer

Cell,4, 257–262.

54. Chou,J., Lin,J.H., Brenot,A., Kim,J.W., Provot,S. and Werb,Z. (2013)

GATA3 suppresses metastasis and modulates the tumour

microenvironment by regulating microRNA-29b expression. Nat. Cell

Biol.,15, 201–213.

55. Liberzon,A., Subramanian,A., Pinchback,R., Thorvaldsdottir,H.,

Tamayo,P. and Mesirov,J.P. (2011) Molecular signatures database

(MSigDB) 3.0. Bioinformatics,27, 1739–1740.

56. Safran,M., Dalah,I., Alexander,J., Rosen,N., Iny Stein,T.,

Shmoish,M., Nativ,N., Bahir,I., Doniger,T., Krug,H. et al. (2010)

GeneCards Version 3: the human gene integrator. Database

(Oxford),2010, baq020.

57. Liang,H. and Li,W.H. (2007) MicroRNA regulation of human

protein protein interaction network. RNA,13, 1402–1408.

58. Wang,P., Ning,S., Wang,Q., Li,R., Ye,J., Zhao,Z., Li,Y., Huang,T.

and Li,X. (2013) mirTarPri: improved prioritization of microRNA

targets through incorporation of functional genomics data. PLoS

One,8, e53685.

59. GTEx Consortium (2015) Human genomics. The Genotype-Tissue

Expression (GTEx) pilot analysis: multitissue gene regulation in

humans. Science,348, 648–660.

Downloaded from https://academic.oup.com/nar/article-abstract/47/9/e53/5366474 by guest on 19 March 2020

Supplementary Data

Data

May 2019

Sora Yoon · Nguyen Cao Truong Hai · Woobeen Jo · Kim Jinhwan · Dougu Nam

Screening and identification of differentially expressed microRNAs in diffuse large B‑cell lymphoma based on microRNA microarray

Article

Full-text available

Aug 2021

miRNAs. Next, three databases (TargetScan, microRNA. org and PITA) were used to predict by intersection the potential target genes of the 204 differential miRNAs identified, and a Venn diagram of the results was performed. Subsequently, the target genes of differential miRNAs were analyzed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. Finally, to validate the miRNA microarray data, reverse transcription-quantitative PCR (RT-qPCR) was performed for 8 differentially expressed miRNAs (miR-193a-3p, miR-19a-3p, miR-19b-3p, miR-370-3p, miR-1275, miR-490-5p, miR-630 and miR-665) using DLBCL and LRH fresh samples. In total, 204 miRNAs exhibited differential expression, including 105 downregulated and 54 upregulated miRNAs. The cut-off criteria were set as P≤0.05 and fold-change ≥2. A total of 7,522 potential target genes for the 204 miRNAs were predicted. Potential target genes were enriched in the following pathways: ‘Cancer’, ‘MAPK signaling pathway’, ‘regulation of actin cytoskeleton’, ‘focal adhesion’, ‘endocytosis’, ‘Wnt signaling pathway’, ‘axon guidance’, ‘calcium signaling pathway’ and ‘PI3K/AKT signaling pathway’. A total of 8 miRNAs were validated by RT-qPCR, and 4 miRNAs (miR-19b-3p, miR-193a-3p, miR-370-3p and miR-490-5p) exhibited low expression levels in DLBCL (P<0.05), while miR-630 was highly expressed in DLBCL (P<0.05). Overall, the present study screened 204 differentially expressed miRNAs and analyzed the expression levels of 8 differentially expressed miRNAs in DLBCL. These differentially expressed miRNAs may serve as therapeutic targets for improvement of therapeutic efficacy in DLBCL in the future.

TSCCA: A tensor sparse CCA method for detecting microRNA-gene patterns from multiple cancers

Article

Full-text available

Jun 2021
PLOS COMPUT BIOL

Existing studies have demonstrated that dysregulation of microRNAs (miRNAs or miRs) is involved in the initiation and progression of cancer. Many efforts have been devoted to identify microRNAs as potential biomarkers for cancer diagnosis, prognosis and therapeutic targets. With the rapid development of miRNA sequencing technology, a vast amount of miRNA expression data for multiple cancers has been collected. These invaluable data repositories provide new paradigms to explore the relationship between miRNAs and cancer. Thus, there is an urgent need to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data in a pan-cancer paradigm. In this study, we present a tensor sparse canonical correlation analysis (TSCCA) method for identifying cancer-related miRNA-gene modules across multiple cancers. TSCCA is able to overcome the drawbacks of existing solutions and capture both the cancer-shared and specific miRNA-gene co-expressed modules with better biological interpretations. We comprehensively evaluate the performance of TSCCA using a set of simulated data and matched miRNA/gene expression data across 33 cancer types from the TCGA database. We uncover several dysfunctional miRNA-gene modules with important biological functions and statistical significance. These modules can advance our understanding of miRNA regulatory mechanisms of cancer and provide insights into miRNA-based treatments for cancer.

Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

Article

Full-text available

Dec 2021
BMC BIOINFORMATICS

Background Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allows the analysis of miRNA regulation regarding a group of cells, rather than the miRNA regulation unique to individual cells. Recent advance in single-cell miRNA-mRNA co-sequencing technology has opened a way for investigating miRNA regulation at single-cell level. However, as currently single-cell miRNA-mRNA co-sequencing data is just emerging and only available at small-scale, there is a strong need of novel methods to exploit existing single-cell data for the study of cell-specific miRNA regulation. Results In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation) to combine single-cell miRNA-mRNA co-sequencing data and putative miRNA-mRNA binding information to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data in 19 K562 single-cells to identify cell-specific miRNA-mRNA regulatory networks for understanding miRNA regulation in each K562 single-cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 single-cell is unique. Moreover, we conduct detailed analysis on the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. The comparison results indicate that CSmiR is effective in predicting cell-specific miRNA targets. Finally, through exploring cell–cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single-cells and helps to understand cell–cell crosstalk. Conclusions To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at a single-cell resolution level, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.

Functional analysis of co-expression networks of zebrafish ace2 reveals enrichment of pathways associated with development and disease

Article

Full-text available

Oct 2021

Human Angiotensin I Converting Enzyme 2 (ACE2) plays an essential role in blood pressure regulation and SARS-CoV-2 entry. ACE2 has a highly conserved, one-to-one ortholog (ace2) in zebrafish, which is an important model for human diseases. However, the zebrafish ace2 expression profile has not yet been studied during early development, between genders, across different genotypes, or in disease. Moreover, a network-based meta-analysis for the extraction of functionally enriched pathways associated with differential ace2 expression is lacking in the literature. Herein, we first identified significant development-, tissue-, genotype-, and gender-specific modulations in ace2 expression via meta-analysis of zebrafish Affymetrix transcriptomics datasets (ndatasets = 107); and the correlation analysis of ace2 meta-differential expression profile revealed distinct positively and negatively correlated local functionally enriched gene networks. Moreover, we demonstrated that ace2 expression was significantly modulated under different physiological and pathological conditions related to development, tissue, gender, diet, infection, and inflammation using additional RNA-seq datasets. Our findings implicate a novel translational role for zebrafish ace2 in organ differentiation and pathologies observed in the intestines and liver.

ScalaParBiBit: scaling the binary biclustering in distributed-memory systems

Article

Full-text available

Sep 2021
CLUSTER COMPUT

Biclustering is a data mining technique that allows us to find groups of rows and columns that are highly correlated in a 2D dataset. Although there exist several software applications to perform biclustering, most of them suffer from a high computational complexity which prevents their use in large datasets. In this work we present ScalaParBiBit, a parallel tool to find biclusters on binary data, quite common in many research fields such as text mining, marketing or bioinformatics. ScalaParBiBit takes advantage of the special characteristics of these binary datasets, as well as of an efficient parallel implementation and algorithm, to accelerate the biclustering procedure in distributed-memory systems. The experimental evaluation proves that our tool is significantly faster and more scalable that the state-of-the-art tool ParBiBit in a cluster with 32 nodes and 768 cores. Our tool together with its reference manual are freely available at https://github.com/fraguela/ScalaParBiBit.

Kalirin-RAC controls nucleokinetic migration in ADRN-type neuroblastoma

Article

Full-text available

Mar 2021

The migrational propensity of neuroblastoma is affected by cell identity, but the mechanisms behind the divergence remain unknown. Using RNAi and time-lapse imaging, we show that ADRN-type NB cells exhibit RAC1- and kalirin-dependent nucleokinetic (NUC) migration that relies on several integral components of neuronal migration. Inhibition of NUC migration by RAC1 and kalirin-GEF1 inhibitors occurs without hampering cell proliferation and ADRN identity. Using three clinically relevant expression dichotomies, we reveal that most of up-regulated mRNAs in RAC1- and kalirin–GEF1–suppressed ADRN-type NB cells are associated with low-risk characteristics. The computational analysis shows that, in a context of overall gene set poverty, the upregulomes in RAC1- and kalirin–GEF1–suppressed ADRN-type cells are a batch of AU-rich element–containing mRNAs, which suggests a link between NUC migration and mRNA stability. Gene set enrichment analysis–based search for vulnerabilities reveals prospective weak points in RAC1- and kalirin–GEF1–suppressed ADRN-type NB cells, including activities of H3K27- and DNA methyltransferases. Altogether, these data support the introduction of NUC inhibitors into cancer treatment research.

Cancer miRNA biomarkers classification using a new representation algorithm and evolutionary deep learning

Article

Oct 2020
SOFT COMPUT

The diagnosis of cancer is presently undergoing a change of paradigm for the diagnostic panel using molecular biomarkers. MicroRNA (miRNA) is one of the most important genomic datasets presenting the genome sequences. Since several studies have shown the relationship between miRNAs and cancers, data mining and machine learning methods can be incorporated to extract a large amount of knowledge from cancer genomic datasets. However, although previous research on the identification of cancers from miRNAs has made it possible to diagnose cancer, the accuracy for some classes is not quite satisfactory. Therefore, this research is aimed at promoting a super-class (meta-label) approach and deep learning in a three-phase method to diagnose cancers from miRNAs. The steps in the first phase of the proposed method, named Representation learning, are partitioning data into super-classes, meta-data creation and super-classes classification. This phase helps data to be split into some subsets to improve classification accuracy. In other words, the first phase groups labels based on the separability of classes into a meta-label, and then a multi-label learner is built to predict these meta-labels. In the second phase, a feature selection to reduce the dimensions of the problem is applied to each super-class to help focus the attention of an induction algorithm on those features that are more important to predict the target concept. In the third phase of the proposed method, an evolutionary deep neural network for the classification of labels in each super-class is performed. The last two phases are done separately for each subset, so that five super-classes and subsequently five deep neural networks are trained. The experimental results reveal that the proposed method achieved more efficient results than 19 recent machine learning methods. Although evaluating a dataset that consists of 29 types of cancers makes the convolutional neural network harder to train, the performance of the method is noticeably better than that of other existing methods. A further success is a significant reduction in running time compared with other methods.

ASDmiR: A Stepwise Method to Uncover miRNA Regulation Related to Autism Spectrum Disorder

Article

Oct 2020

Autism spectrum disorder (ASD) is a class of neurodevelopmental disorders characterized by genetic and environmental risk factors. The pathogenesis of ASD has a strong genetic basis, consisting of rare de novo or inherited variants among a variety of multiple molecules. Previous studies have shown that microRNAs (miRNAs) are involved in neurogenesis and brain development and are closely associated with the pathogenesis of ASD. However, the regulatory mechanisms of miRNAs in ASD are largely unclear. In this work, we present a stepwise method, ASDmiR, for the identification of underlying pathogenic genes, networks, and modules associated with ASD. First, we conduct a comparison study on 12 miRNA target prediction methods by using the matched miRNA, lncRNA, and mRNA expression data in ASD. In terms of the number of experimentally confirmed miRNA–target interactions predicted by each method, we choose the best method for identifying miRNA–target regulatory network. Based on the miRNA–target interaction network identified by the best method, we further infer miRNA–target regulatory bicliques or modules. In addition, by integrating high-confidence miRNA–target interactions and gene expression data, we identify three types of networks, including lncRNA–lncRNA, lncRNA–mRNA, and mRNA–mRNA related miRNA sponge interaction networks. To reveal the community of miRNA sponges, we further infer miRNA sponge modules from the identified miRNA sponge interaction network. Functional analysis results show that the identified hub genes, as well as miRNA-associated networks and modules, are closely linked with ASD. ASDmiR is freely available at https://github.com/chenchenxiong/ASDmiR.

Scanning sample-specific miRNA regulation from bulk and single-cell RNA-sequencing data

Preprint

Aug 2023

RNA-sequencing technology provides an effective tool for understanding miRNA regulation in complex human diseases, including cancers. A large number of computational methods have been developed to make use of bulk and single-cell RNA-sequencing data to identify miRNA regulations at the resolution of multiple samples (i.e. group of cells or tissues). However, due to the heterogeneity of individual samples, there is a strong need to infer miRNA regulation specific to individual samples to uncover miRNA regulation at single-sample resolution level. Here, we develop a framework, Scan, for scanning sample-specific miRNA regulation. Since a single network inference method or strategy cannot perform well for all types of new data, Scan incorporates 27 network inference methods and two strategies to infer tissue-specific or cell-specific miRNA regulation from bulk or single-cell RNA-sequencing data. Results on bulk and single-cell RNA-sequencing data demonstrate the effectiveness of Scan in inferring sample-specific miRNA regulation. Moreover, we have found that incorporating priori information of miRNA targets can improve the accuracy of miRNA target prediction. In addition, Scan can contribute to the clustering cells/tissues and construction of cell/tissue correlation networks. Finally, the comparison results have shown that the performance of network inference methods is likely to be data-specific, and selecting optimal network inference methods is required for more accurate prediction of miRNA targets. We have made Scan freely available to the public to help infer sample-specific miRNA regulation for new data, benchmark new network inference methods and deepen the understanding of miRNA regulation at the resolution of individual samples.

Tiresias: Context-sensitive Approach to Decipher the Presence and Strength of MicroRNA Regulatory Interactions

Article

Jan 2018
Theranostics

MicroRNAs (miRNAs) are short non-coding RNAs that regulate expression of target messenger RNAs (mRNAs) post-transcriptionally. Understanding the precise regulatory role of miRNAs is of great interest since miRNAs have been shown to play an important role in development, diseases, and other biological processes. Early work on miRNA target prediction has focused on static sequence-driven miRNA-mRNA complementarity. However, recent research also utilizes expression-level data to study context-dependent regulation effects in a more dynamic, physiologically-relevant setting. Methods: We propose a novel artificial neural network (ANN) based method, named Tiresias, to predict such targets in a context-dependent manner by combining sequence and expression data. In order to predict the interacting pairs among miRNAs and mRNAs and their regulatory weights, we develop a two-stage ANN and present how to train it appropriately. Tiresias is designed to study various regulation models, ranging from a simple linear model to a complex non-linear model. Tiresias has a single hyper-parameter to control the sparsity of miRNA-mRNA interactions, which we optimize using Bayesian optimization. Results: Tiresias performs better than existing computational methods such as GenMiR++, Elastic Net, and PIMiM, achieving an F1 score of >0.8 for a certain level of regulation strength. For the TCGA breast invasive carcinoma dataset, Tiresias results in the rate of up to 82% in detecting the experimentally-validated interactions between miRNAs and mRNAs, even if we assume that true regulations may result in a low level of regulation strength. Conclusion: Tiresias is a two-stage ANN, computational method that deciphers context-dependent microRNA regulatory interactions. Experiment results demonstrate that Tiresias outperforms existing solutions and can achieve a high F1 score. Source code of Tiresias is available at https://bitbucket.org/cellsandmachines/.

MiRTarBase update 2018: A resource for experimentally validated microRNA-target interactions

Article

Nov 2017
NUCLEIC ACIDS RES

MicroRNAs (miRNAs) are small non-coding RNAs of ∼ 22 nucleotides that are involved in negative regulation of mRNA at the post-transcriptional level. Previously, we developed miRTarBase which provides information about experimentally validated miRNA-target interactions (MTIs). Here, we describe an updated database containing 422 517 curated MTIs from 4076 miRNAs and 23 054 target genes collected from over 8500 articles. The number of MTIs curated by strong evidence has increased ∼1.4-fold since the last update in 2016. In this updated version, target sites validated by reporter assay that are available in the literature can be downloaded. The target site sequence can extract new features for analysis via a machine learning approach which can help to evaluate the performance of miRNA-target prediction tools. Furthermore, different ways of browsing enhance user browsing specific MTIs. With these improvements, miRTarBase serves as more comprehensively annotated, experimentally validated miRNA-target interactions databases in the field of miRNA related research. miRTarBase is available at http://miRTarBase.mbc.nctu.edu.tw/.

MicroRNA cluster miR-17-92 regulates multiple functionally related voltage-gated potassium channels in chronic neuropathic pain

Article

Jul 2017

miR-17-92 is a microRNA cluster with six distinct members. Here, we show that the miR-17-92 cluster and its individual members modulate chronic neuropathic pain. All cluster members are persistently upregulated in primary sensory neurons after nerve injury. Overexpression of miR-18a, miR-19a, miR-19b and miR-92a cluster members elicits mechanical allodynia in rats, while their blockade alleviates mechanical allodynia in a rat model of neuropathic pain. Plausible targets for the miR-17-92 cluster include genes encoding numerous voltage-gated potassium channels and their modulatory subunits. Single-cell analysis reveals extensive co-expression of the miR-17-92 cluster and its predicted targets in primary sensory neurons. miR-17-92 downregulates the expression of potassium channels and reduces outward potassium currents, in particular A-type currents. Combined application of potassium channel modulators synergistically alleviates mechanical allodynia induced by nerve injury or miR-17-92 overexpression. The miR-17-92 cluster appears to cooperatively regulate the function of multiple voltage-gated potassium channel subunits, perpetuating mechanical allodynia.

The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible

Article

Oct 2016
NUCLEIC ACIDS RES

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.

Identifying functional miRNA-mRNA regulatory modules with correspondence latent Dirichlet allocation

Article

General rules for functional microRNA targeting

Article

Oct 2016
Nat Genet

The functional rules for microRNA (miRNA) targeting remain controversial despite their biological importance because only a small fraction of distinct interactions, called site types, have been examined among an astronomical number of site types that can occur between miRNAs and their target mRNAs. To systematically discover functional site types and to evaluate the contradicting rules reported previously, we used large-scale transcriptome data and statistically examined whether each of approximately 2 billion site types is enriched in differentially downregulated mRNAs responding to overexpressed miRNAs. Accordingly, we identified seven non-canonical functional site types, most of which are novel, in addition to four canonical site types, while also removing numerous false positives reported by previous studies. Extensive experimental validation and significantly elevated 3' UTR sequence conservation indicate that these non-canonical site types may have biologically relevant roles. Our expanded catalog of functional site types suggests that the gene regulatory network controlled by miRNAs may be far more complex than currently understood.

The Gene Expression Omnibus Database

Chapter

Mar 2016
Meth Mol Biol

The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome–protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.

Regression Shrinkage and Selection via the LASSO

Article

Jan 1996

R. J. Tibshirani

Targeting the PI3K-Akt pathway in human cancer: Rationale and promise

Article

Jan 2006
CELL

Improving microRNA target prediction by modeling with unambiguously identified microRNA-target pairs from CLIP-Ligation studies

Article

Jan 2016
BIOINFORMATICS

Xiaowei Wang

Motivation: MicroRNAs (miRNAs) are small noncoding RNAs that are extensively involved in many physiological and disease processes. One major challenge in miRNA studies is the identification of genes targeted by miRNAs. Currently, most researchers rely on computational programs to initially identify target candidates for subsequent validation. Although considerable progress has been made in recent years for computational target prediction, there is still significant room for algorithmic improvement. Results: Here, we present an improved target prediction algorithm, which was developed by modeling high-throughput profiling data from recent CLIPL (crosslinking and immunoprecipitation followed by RNA ligation) sequencing studies. In these CLIPL-seq studies, the RNA sequences in each miRNA-target pair were covalently linked and unambiguously determined experimentally. By analyzing the CLIPL data, many known and novel features relevant to target recognition were identified and then used to build a computational model for target prediction. Comparative analysis showed that the new algorithm had improved performance over existing algorithms when applied to independent experimental data. Availability: All the target prediction data as well as the prediction tool can be accessed at miRDB (http://mirdb.org). Contact: xwang@radonc.wustl.edu.
