
A Comparison of Normalization Techniques for MicroRNA Microarray Data

February 2008
Statistical Applications in Genetics and Molecular Biology 7(1):22-22


DOI:10.2202/1544-6115.1287

Source
RePEc

Authors:

Yoonkyung Lee

The Ohio State University

David Jarjoura

The Northern Negros State College of Science and Technology (NONESCOST)




Statistical Applications in Genetics and Molecular Biology

Volume 7, Issue 1, 2008, Article 22

A Comparison of Normalization Techniques for MicroRNA Microarray Data

Youlan Rao∗, Yoonkyung Lee†, David Jarjoura‡, Amy S. Ruppert∗∗, Chang-gong Liu††, Jason C. Hsu‡‡, John P. Hagan§

∗The Ohio State University, rao@stat.ohio-state.edu

†The Ohio State University, yklee@stat.ohio-state.edu

‡The Ohio State University, david.jarjoura@osumc.edu

∗∗The Ohio State University, amy.ruppert@osumc.edu

††The Ohio State University, chang-gong.liu@osumc.edu

‡‡The Ohio State University, jch@stat.ohio-state.edu

§The Ohio State University, microrna@gmail.com

A Comparison of Normalization Techniques

for MicroRNA Microarray Data∗

Youlan Rao, Yoonkyung Lee, David Jarjoura, Amy S. Ruppert, Chang-gong Liu,

Jason C. Hsu, and John P. Hagan

Abstract

Normalization of expression levels applied to microarray data can help in reducing measurement error. Different methods, including cyclic loess, quantile normalization and median or mean normalization, have been utilized to normalize microarray data. Although there is considerable literature regarding normalization techniques for mRNA microarray data, there are no publications comparing normalization techniques for microRNA (miRNA) microarray data, which are subject to similar sources of measurement error. In this paper, we compare the performance of cyclic loess, quantile normalization, median normalization and no normalization for a single-color microRNA microarray dataset. We show that the quantile normalization method works best in reducing differences in miRNA expression values for replicate tissue samples. By showing that the total mean squared error is lowest across almost all 36 investigated tissue samples, we are assured that the bias correction provided by quantile normalization is not outweighed by additional error variance that can arise from a more complex normalization method. Furthermore, we show that quantile normalization does not achieve these results by compression of scale.

KEYWORDS: microRNA, median normalization, cyclic loess normalization, quantile normalization, robust estimates, smoothing spline, mean squared error

∗This material is based in part upon work supported by the National Science Foundation under Agreement No. 0635561. Jason C. Hsu's research is supported by NSF Grant Number DMS-0505519.

1 Introduction

In microarray experiments, variation of expression measurements among arrays can be attributed to many sources, such as differences in sample RNA preparation, cDNA labeling, image intensity and microarray hybridization/wash efficiency. Normalization of expression levels applied to microarray data can help in removing this error. Different methods, including cyclic loess, quantile normalization (Bolstad et al. 2003) and median or mean normalization (Churchill 2002, Churchill 2003, Churchill and Oliver 2001, Kerr and Churchill 2001, and Wolfinger et al. 2001), have been utilized to normalize microarray data. Briefly, cyclic loess makes the MA plot of probe intensities from every pair of arrays scatter about the $M = 0$ axis, quantile normalization makes the distributions of expression levels the same across arrays, and median or mean normalization shifts the individual log-intensities on each array so that the median or mean log-intensities, respectively, are the same across arrays. These normalization algorithms can be applied either globally to an entire data set or locally to some physical subset of the data (Quackenbush 2002). Irizarry et al. (2003) applied the quantile normalization procedure to normalize dilution data and spike-in data from Affymetrix arrays, and showed how quantile normalization removed bias as compared to no normalization. Their analysis was unique in that they knew the true expression levels and could therefore determine the degree of bias reduction from quantile normalization.

MicroRNAs (miRNAs) are noncoding RNAs of 19-24 nucleotides that are negative regulators of gene expression. Recently implicated as important in development and normal physiology, microRNAs are abnormally expressed in many human cancers (Volinia et al. 2006, Lu et al. 2005). Moreover, aberrant microRNA expression has been shown to initiate and promote carcinogenesis (reviewed in Hagan and Croce 2007). These microRNA expression signatures may reveal new oncogenetic pathways in human cancers. For systematic investigation of microRNA expression, oligonucleotide-based microarrays for microRNAs in human and mouse tissues have been developed recently (Liu et al. 2004), and several commercial platforms are now available. To date, more than a hundred published reports have used microRNA microarrays to investigate expression profiles, and more than two-thirds of them used single-color rather than two-color hybridization systems. Although there is substantial literature regarding normalization techniques for mRNA microarray data, there are no published reports comparing normalization techniques for microRNA (miRNA) microarray data, which are subject to similar sources of error variation.

Many statistical reports on mRNA microarrays have focused on Affymetrix

Rao et al.: Comparing Normalization Techniques for MicroRNA Microarray Data

Published by The Berkeley Electronic Press, 2008

mRNA arrays, which have an exceedingly high density of probes that are synthesized in situ on the array. For example, in one Human Genome U133 Plus 2.0 GeneChip, probe sets for each mRNA, including numerous housekeeping genes, consist of eleven oligonucleotide probes selected to maximize specificity and to have similar melting temperatures across the entire array. In contrast, microRNA microarrays are often lower-density spotted arrays. Our focus is on single-color microRNA microarrays, which are used more often than dual-color arrays. Results from the Version 3.0 microRNA microarray used in this study and its earlier versions have appeared in more than 40 publications. The Version 3.0 microarray contains 3790 probes spotted in duplicate. The probes are 40 nucleotides in length, consisting of the genomic sequence that has the mature microRNA sequence and additional flanking bases. With the exception of six probes designed against Arabidopsis thaliana microRNAs, the probes are derived from known and predicted human and mouse microRNAs. This design allows for the detection of mature as well as precursor miRNAs and is particularly helpful in determining if computationally predicted miRNAs are real. Although U6 snRNA is frequently used as a control for microRNA experiments, this noncoding RNA has been shown to vary as much as fivefold for equivalent amounts of total RNA by both microarray and Northern analysis (Hagan and Liu, unpublished observations). Hence, probes for U6 snRNA were not included in the Version 3.0 microarray. Most, if not all, commercially available microRNA microarrays do not have controls for endogenous RNAs that have been shown to be largely invariant between tissue samples.

Given the short length of miRNAs and the fact that far more mRNAs are known than miRNAs, it is important to compare normalization methods specifically for miRNA microarray data. Although microRNA microarrays are lower-density spotted arrays than mRNA microarrays, they are not “boutique” arrays. For example, microRNA arrays do not meet the following criteria: “more than half the probes might be differentially expressed between any two samples and that the differential expression might be predominately in one direction” (Oshlack et al. 2007). We also do not expect global differences across miRNA arrays. As an example, the biggest difference in miRNA expression was expected between brain and heart tissues, yet we found that only 15% of miRNAs were differentially expressed with a greater than twofold difference when comparing these distinct tissue types. Other examples include the miRNA studies in cancer (Calin et al. 2005, Volinia et al. 2006, Yanaihara et al. 2006) and tissue differentiation (Babak et al. 2004, Barad et al. 2004, Garzon et al. 2004) referenced in Davison et al. (2006). For the three referenced cancer studies that used microRNA microarrays, the numbers of differentially expressed microRNAs are

Statistical Applications in Genetics and Molecular Biology, Vol. 7 [2008], Iss. 1, Art. 22

http://www.bepress.com/sagmb/vol7/iss1/art22

13/245 (5.3%), 22/228 to 57/228 (9.6% to 25.0%; the range depends on which of six tumor/normal comparisons was performed) and 43/352 (12.2%). For the three referenced differentiation studies, the numbers of differentially expressed microRNAs are 19/399 (4.8%), 25/154 to 35/154 (15.2% to 22.7%; the range depends on the specific pairwise tissue comparison) and 35/150 to 57/150 (23.3% to 38.0%; the range depends on the specific pairwise tissue comparison). Based on our experience and assessment of the literature, we can conclude with confidence that much less than 50% of miRNAs are differentially expressed. In addition to our custom microRNA microarrays, there are numerous commercially available miRNA microarrays. For example, LC Sciences, Exiqon, Agilent, Invitrogen, and Ambion sell miRNA microarrays with 1564, 4000, 15000, 3000, and 1224 miRNA probes, respectively. Hence, the probe density of our array is similar to that of many currently available commercial platforms. Importantly, high-throughput sequencing of microRNAs is rapidly expanding the number of known microRNAs. Hence, our custom arrays will soon need to be updated with even more probes to reflect the recently identified microRNAs. The microRNA registry (Version 10.1) currently has sequences for 5395 miRNAs. Even though microRNA microarrays are not “boutique” arrays in general, a few cases exist where large numbers of microRNAs will be differentially expressed in only one direction. Knockouts of essential microRNA biogenesis proteins such as Drosha, DGCR8, or Dicer1 lead to a dramatic reduction in steady-state microRNA levels by blocking production of mature microRNAs (Kumar et al. 2007). These global downregulation cases are exceptionally easy to detect by microarray, as the percentage of microRNAs expressed above background is considerably different in comparison to controls. Other confirmed examples that show unidirectional microRNA regulation are quite rare. Using a novel bead-based microRNA profiling system, Lu et al. (2005) reported that microRNAs were primarily downregulated in cancers (129 of 217 investigated). Almost all other studies of microRNAs in cancer, including all the research referenced in this manuscript, have found roughly balanced numbers or a slight enrichment for upregulated microRNAs in cancer, casting doubt on the conclusions of Lu et al. (2005). Even research that at first glance might seem to support the conclusions of Lu and colleagues unequivocally demonstrates the opposite. For example, Chang et al. (2008) reported that Myc expression leads to widespread repression of microRNAs. As their Supplemental Table 1 shows for 313 human microRNAs investigated, 11 and 17 microRNAs are upregulated and downregulated, respectively, by at least twofold upon induced Myc expression. Although vigilance must be exercised to make sure that the underlying assumptions are valid, the normalization methods that we present are suitable for the vast majority of studies using microRNA microarrays.


In this paper, we compare the performance of median, cyclic loess, quantile, and no normalization for miRNA microarray data. The data included 72 microarrays obtained from RNA from 26 human and 10 mouse tissues that were hybridized as technical replicates; that is, each RNA sample was hybridized to two independent microarrays. Since replicate samples should, in theory, have almost identical expression values, one can compare different normalization techniques in terms of the closeness of normalized measurements in the replicated samples. Moreover, there are no confounding biological effects of the kind that arise when tissues come from different individuals. The differences between these paired expression levels, with and without normalization, can be divided into bias and variance components by expression level. Both of these components of the miRNA-by-miRNA differences should be reduced after applying normalization methods. We used these differences to provide direct evidence of the capability of each method to reduce these two components. It was critical to examine the effects on both quantities because the complexity of a transformation may increase the error variance over and above its bias reduction. To resemble how normalization is typically applied to samples, normalization was done globally across all 72 samples. This is an important distinction from normalizing each of the 36 replicate pairs separately, which could produce artificially low variance and bias.

Section 2 describes the normalization methods in detail. Section 3 describes

the miRNA data used in this paper. Section 4 compares normalization methods.

2 Normalization Methods

Three commonly used normalization techniques are reviewed. Suppose that we have the (log base 2 transformed) probe-level expression values from $p$ miRNAs and $n$ arrays in a $p \times n$ matrix $X$.

Median normalization shifts the miRNA expression values on each array by an additive constant so that the medians of the miRNA expression values are the same across arrays, by the following steps:

• Take the median of each column of $X$ to generate an $n$-dimensional median vector $M$;

• Calculate the overall median of the vector $M$;

• Shift the miRNA expression values of each array by subtracting from them the difference between the median of that array and the overall median.
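The three median-normalization steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `median_normalize`, the list-of-rows layout and the toy matrix are assumptions made for the example.

```python
from statistics import median

def median_normalize(X):
    """Median-normalize a p x n matrix X (list of p rows, one column per array)."""
    n_arrays = len(X[0])
    # Step 1: median of each column (array) gives the n-dimensional vector M.
    M = [median(row[j] for row in X) for j in range(n_arrays)]
    # Step 2: overall median of the vector M.
    overall = median(M)
    # Step 3: subtract (array median - overall median) from every value on that array.
    return [[row[j] - (M[j] - overall) for j in range(n_arrays)] for row in X]

# Toy log2 expression matrix (hypothetical values): 3 miRNAs (rows) x 3 arrays (columns).
X = [[1.0, 3.0, 2.0],
     [2.0, 4.0, 3.0],
     [6.0, 5.0, 7.0]]
normalized = median_normalize(X)
# After normalization every array has the same median.
column_medians = [median(row[j] for row in normalized) for j in range(3)]
```

Because each array is only shifted by a constant, differences between miRNAs within an array are unchanged; only the array-level location is aligned.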


Instead of matching only the median across the arrays, quantile normalization makes the distributions of expression levels the same across arrays by the following steps:

• Sort each column of $X$ separately to generate a sorted $p \times n$ matrix $Y$;

• Take the mean of each row of $Y$ to generate a $p$-dimensional vector $A_b$, called the baseline array;

• Get the normalized miRNA expression values for each array by rearranging the baseline array $A_b$ to have the same ordering as the corresponding column of the matrix $X$, so that the empirical distribution of miRNA expression values on every array is the same as that of the baseline array.
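The quantile-normalization steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the function name `quantile_normalize` and the toy matrix are hypothetical, and ties within a column are broken by original row order, a detail the text does not specify.

```python
def quantile_normalize(X):
    """Quantile-normalize a p x n matrix X (list of p rows, one column per array)."""
    p, n = len(X), len(X[0])
    columns = [[X[i][j] for i in range(p)] for j in range(n)]
    # Step 1: sort each column separately (these are the columns of Y).
    Y_columns = [sorted(col) for col in columns]
    # Step 2: baseline array = mean of each row of Y.
    baseline = [sum(col[r] for col in Y_columns) / n for r in range(p)]
    # Step 3: each value receives the baseline entry matching its rank in its own column.
    normalized = [[0.0] * n for _ in range(p)]
    for j, col in enumerate(columns):
        order = sorted(range(p), key=lambda i: col[i])  # row indices, smallest value first
        for rank, i in enumerate(order):
            normalized[i][j] = baseline[rank]
    return normalized

# Toy log2 expression matrix (hypothetical values): 4 miRNAs x 3 arrays, no ties.
X = [[2.0, 4.0, 4.0],
     [5.0, 14.0, 8.0],
     [4.0, 8.0, 6.0],
     [3.0, 9.0, 5.0]]
Q = quantile_normalize(X)
```

After this step every column of `Q` contains exactly the same set of values (the baseline array), arranged according to the ranks of the original column.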

Cyclic loess considers the MA plot of probe intensities from every pair of arrays $(X_{ij}, X_{ij'})$, with fixed $j \neq j'$ and $i = 1, 2, \ldots, p$, and makes the (A, M) pairs scatter around the $M = 0$ axis by the following steps:

• Compute $M_i = X_{ij} - X_{ij'}$ and $A_i = \frac{1}{2}(X_{ij} + X_{ij'})$;

• Fit a loess curve by regressing $M$ on $A$, and denote the fitted vector by $\hat{M}$;

• Setting the vector $D = (M - \hat{M})/2$, get the normalized miRNA expression values for $(X_{ij}, X_{ij'})$ by modifying $X_{ij}$ to $A_i + D_i$ and $X_{ij'}$ to $A_i - D_i$, $i = 1, 2, \ldots, p$, so that the normalized difference is $M_i - \hat{M}_i$ and the average $A_i$ is unchanged.
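The pairwise steps above can be sketched in plain Python as follows. Several parts are assumptions made for illustration: a simple tricube-weighted local linear fit stands in for a full loess smoother (no robustness iterations), the span `frac` and the number of passes `n_cycles` over all array pairs are hypothetical defaults, and each pair is updated so that its average $A_i$ is preserved and its difference becomes $M_i - \hat{M}_i$.

```python
from itertools import combinations

def local_linear_fit(a, m, frac=0.7):
    """Tricube-weighted local linear regression of m on a, evaluated at each a[i]."""
    p = len(a)
    k = min(p, max(2, round(frac * p)))      # neighbours used in each local fit
    fitted = []
    for a0 in a:
        dist = [abs(ai - a0) for ai in a]
        h = sorted(dist)[k - 1] or 1e-12     # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1.0 - (d / h) ** 3) ** 3 for d in dist]
        sw = sum(w)
        a_bar = sum(wi * ai for wi, ai in zip(w, a)) / sw
        m_bar = sum(wi * mi for wi, mi in zip(w, m)) / sw
        sxx = sum(wi * (ai - a_bar) ** 2 for wi, ai in zip(w, a))
        sxy = sum(wi * (ai - a_bar) * (mi - m_bar) for wi, ai, mi in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))
    return fitted

def cyclic_loess(X, n_cycles=2, frac=0.7):
    """Apply the pairwise MA correction to every pair of arrays, n_cycles times."""
    X = [row[:] for row in X]                # work on a copy
    p, n = len(X), len(X[0])
    for _ in range(n_cycles):
        for j, k in combinations(range(n), 2):
            M = [X[i][j] - X[i][k] for i in range(p)]
            A = [(X[i][j] + X[i][k]) / 2 for i in range(p)]
            M_hat = local_linear_fit(A, M, frac)
            for i in range(p):
                D = (M[i] - M_hat[i]) / 2
                X[i][j] = A[i] + D           # pair average stays A[i]
                X[i][k] = A[i] - D           # pair difference becomes M[i] - M_hat[i]
    return X

# Toy data (hypothetical): three arrays that differ only by constant offsets.
X = [[float(i), float(i) + 1.0, float(i) - 0.5] for i in range(1, 11)]
out = cyclic_loess(X)
spread_before = max(max(row) - min(row) for row in X)
spread_after = max(max(row) - min(row) for row in out)
```

In this toy case the between-array spread shrinks with each pass while the row totals are preserved; with more than two arrays, several passes are generally needed because each pairwise correction slightly disturbs the pairs already treated.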

3 Description of Data

Total RNA was purchased from Ambion Inc. Microarray labeling and hybridization were performed as previously described in Liu et al. (2004), except as noted below. The Ohio State University Comprehensive Cancer Center Version 3.0 microRNA microarray was used. This array contains 3790 oligo probes derived from 578 mature miRNAs spotted in duplicate (329 Homo sapiens and 249 Mus musculus) that are annotated in the miRNA registry http://microrna.sanger.ac.uk/sequences/ (accessed Nov. 2005). Of the 396 evolutionarily conserved mature microRNAs between mice and human in Version 10.1 of the microRNA registry, 68% are identical in length and sequence. Hence, many of the mouse probes serve as additional controls for their human counterparts and vice versa. In addition, 1493 human and 1137 mouse oligo probes for miRNAs computationally predicted in human and mouse, respectively, are also spotted in duplicate. Often, more than one probe set exists for a given mature miRNA. Additionally, there are duplicate probe spots corresponding to most precursor miRNAs. Hybridization signals were ultimately detected with Streptavidin-Alexa 647 conjugate, and scanned images (Axon 4000B) were quantified using the Genepix 6.0 software through a local background correction (Axon Instruments, Sunnyvale, CA).

4 Analysis

Background-corrected median signals for duplicate probes on an array were averaged. After normalization across all 72 arrays, let $X_i$ be the log base 2 transformed expression value of the $i$th miRNA for a certain tissue, and let $Y_i$ be the log base 2 transformed expression value of the $i$th miRNA for the replicate of the tissue.

Bias. The average $A_i = (X_i + Y_i)/2$ and the difference $M_i = X_i - Y_i$ of expression values for each miRNA can then be computed. The MA plot of the two vectors $X$ and $Y$ is a 45-degree rotation and axis scaling of their scatter plot. This plot is particularly useful for array data because $M_i$ represents the log fold change and $A_i$ represents the average log intensity for the $i$th miRNA. When the loess curve of the MA plot deviates from the horizontal line at $M = 0$, this demonstrates differences in the intensity levels between two arrays from the same tissue (Gentleman et al. 2005). In contrast, if the loess curve aligns with $M = 0$, the normalization method is considered to exhibit little bias at all levels of expression. When MA plots and loess curves were made for the replicate array data from human brain tissue using no normalization, median normalization, quantile normalization and cyclic loess, we observed that the quantile normalization method removed bias the best: its loess curve closely followed the horizontal line at $M = 0$ (Figure 1C). No normalization, median normalization and cyclic loess behaved similarly in that their loess curves did not align closely with $M = 0$ (Figure 1A, 1B and 1D).

Binning. To compare the normalization methods in how much they reduced error variance in addition to reducing bias, we formally modeled the mean and variance of differences in replicate arrays as a function of their expression levels. In order to obtain reliable estimates of the expression levels, we binned duplicates according to their average expression level first and then proceeded by modeling the mean and variance based on the binned data.

Figure 1: MA and loess plots of expression values for the human brain tissue data (horizontal axis A: average of expressions; vertical axis M: difference of expressions). A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess.

We created equal-sized bins, each containing 34 miRNA probes. For each bin, we summarized the differences in the replicate arrays by the median absolute deviation (MAD) of the differences and the median of the differences to obtain robust estimates of variance and bias, respectively (Lin et al. 2002). The smoothed MADs and medians of the differences were used to detect systematic effects due to the different normalization methods as a function of expression levels. Lower values of smoothed MADs and smoothed medians closer to zero across average expressions correspond to a superior normalization method.

As stated above, each bin consisted of 34 miRNA probes. For fixed $k$ ($1 \le k \le K$), let $X_{(i)k}$ ($i = 1, 2, \ldots, 34$) be the expression value of the $i$th miRNA in the $k$th bin for a specific tissue, and let $Y_{(i)k}$ ($i = 1, 2, \ldots, 34$) be the expression value of the $i$th miRNA in the $k$th bin for the replicate of the tissue. The difference between the replicate arrays' expression values for each miRNA in the $k$th bin can be denoted by $D_{(i)k} = X_{(i)k} - Y_{(i)k}$ ($i = 1, 2, \ldots, 34$), and the corresponding observations by $d_{(i)k}$. We assume that for fixed $k$,

$$D_{(i)k} \overset{\text{i.i.d.}}{\sim} N(\mu_k, \sigma_k^2), \quad i = 1, 2, \ldots, 34,$$

and use

$$\mathrm{md}_k = \underset{1 \le i \le 34}{\operatorname{median}} \, (d_{(i)k})$$

as a robust location (center) estimate of $\mu_k = E[D_{(1)k}]$, and

$$\mathrm{MADd}_k = \underset{1 \le i \le 34}{\operatorname{median}} \, \Bigl| d_{(i)k} - \underset{1 \le i \le 34}{\operatorname{median}} \, (d_{(i)k}) \Bigr|$$

as a robust estimate of scale (spread), which is proportional to $\sigma_k = \sqrt{\operatorname{var}[D_{(1)k}]}$ under normality.

For the average expression values of miRNAs in the $k$th bin across certain tissue replicates, let $A_{(i)k} = (X_{(i)k} + Y_{(i)k})/2$ ($i = 1, 2, \ldots, 34$) and let $a_{(i)k}$ be the $i$th observation. Similarly, for estimation of the center of the average expression values in each bin, we consider

$$\mathrm{ma}_k = \underset{1 \le i \le 34}{\operatorname{median}} \, (a_{(i)k}).$$

As Figure 1A suggests, it is sensible to model $\mu_k$ and $\sigma_k$ as functions of the center of the average expression values of miRNA replicates in the $k$th bin.
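The binned summaries defined above (the median average, the median difference and the MAD of the differences in each bin) can be computed with a short Python sketch. The function name `binned_robust_summaries` and the toy data are hypothetical, and dropping an incomplete final bin is an assumption, since the text does not say how leftover probes were handled.

```python
from statistics import median

def binned_robust_summaries(x, y, bin_size=34):
    """Return one (ma_k, md_k, MADd_k) tuple per bin for replicate arrays x and y."""
    # Pair each average A_i with its difference D_i and sort by average expression.
    pairs = sorted(((xi + yi) / 2, xi - yi) for xi, yi in zip(x, y))
    summaries = []
    # Consecutive equal-sized bins; an incomplete final bin is dropped (assumption).
    for start in range(0, len(pairs) - bin_size + 1, bin_size):
        chunk = pairs[start:start + bin_size]
        a = [avg for avg, _ in chunk]
        d = [diff for _, diff in chunk]
        md = median(d)                              # robust estimate of the bias mu_k
        mad = median(abs(di - md) for di in d)      # robust estimate of scale
        summaries.append((median(a), md, mad))
    return summaries

# Toy replicate arrays (hypothetical values); bins of 3 probes keep the example readable.
x = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
y = [1.0, 4.0, 8.0, 7.0, 8.0, 15.0]
summaries = binned_robust_summaries(x, y, bin_size=3)
```

With the paper's setting, `bin_size=34` would be used, and the resulting triples are the points that the smoothing steps operate on.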

For the paired observations $(\mathrm{ma}_1, \mathrm{md}_1), (\mathrm{ma}_2, \mathrm{md}_2), \ldots, (\mathrm{ma}_K, \mathrm{md}_K)$, we modeled the median difference as a smooth function of the median average,

$$\mathrm{md}_k = \eta(\mathrm{ma}_k) + \epsilon_k, \quad k = 1, 2, \ldots, K,$$

with $\epsilon_k \sim N(0, \sigma_{m,k}^2)$ and with a different variance for each bin. The smoothed relationship $\eta$ was obtained by a weighted smoothing spline with weights equal to the reciprocal of the squared MAD of the differences. Quantile normalization gave the best results when comparing the weighted smoothed curves for the median difference in expression values using the human brain tissue data (Figure 2).

Figure 2: Weighted smoothed medians of the difference of expression values for the human brain tissue data (horizontal axis: median of average of expressions; vertical axis: median of difference of expressions). A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess.

Similarly, for the paired observations $(\mathrm{ma}_1, \mathrm{MADd}_1), (\mathrm{ma}_2, \mathrm{MADd}_2), \ldots, (\mathrm{ma}_K, \mathrm{MADd}_K)$, we considered the following model with unequal variance,

$$\mathrm{MADd}_k = \xi(\mathrm{ma}_k) + \epsilon_k, \quad k = 1, 2, \ldots, K,$$

and $\epsilon_k \sim N(0, \sigma_{\mathrm{MAD}}^2)$. The smoothed MAD of differences $\xi$ can again be obtained by smoothing splines with the smoothing parameter selected by generalized maximum likelihood (GML) (Gu 2002). It was difficult to see differences in the relationship between MADd and ma among the normalization methods (Figure 3), but they became more apparent when the bias and variance were combined into a mean squared error statistic.

Figure 3: Smoothed MADs versus median averages for the human brain tissue data (horizontal axis: median of average of expressions; vertical axis: MAD of difference of expressions). A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess.

Confidence intervals. The fitted median of differences $\hat{\eta}$ is the smoothed estimate of the bias parameter $\mu_k$, and the fitted MAD of differences $\hat{\xi}$ is the smoothed estimate of the scale parameter. We used the fitted MAD to estimate confidence intervals around the bias and obtained a pointwise confidence interval for the bias by binned expression values as

$$\hat{\eta}(\mathrm{ma}_k) \pm \frac{3.98}{\sqrt{34}} \, \hat{\xi}(\mathrm{ma}_k)$$

(see Hoaglin et al. 2000). The confidence band after quantile normalization encompasses the horizontal line at $M = 0$, while those using no normalization, median normalization or cyclic loess do not include zero for larger expression values (Figure 4).
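The pointwise interval above is simple arithmetic once the smoothed bias and MAD are available. The sketch below assumes they have already been obtained; the function name `bias_confidence_interval` and the input values are hypothetical and are not taken from the paper.

```python
import math

def bias_confidence_interval(eta_hat, xi_hat, bin_size=34, multiplier=3.98):
    """Pointwise interval eta_hat +/- (multiplier / sqrt(bin_size)) * xi_hat."""
    half_width = multiplier / math.sqrt(bin_size) * xi_hat
    return eta_hat - half_width, eta_hat + half_width

# Hypothetical smoothed bias and smoothed MAD at one expression level.
lower, upper = bias_confidence_interval(eta_hat=0.10, xi_hat=0.50)
# A method shows no detectable bias at this level when the interval covers zero.
contains_zero = lower <= 0.0 <= upper
```

Evaluating the interval at every bin center and joining the endpoints gives a band of the kind shown in Figure 4.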

Figure 4: Confidence band of the bias for the human brain tissue data (horizontal axis A: average of expressions; vertical axis M: difference of expressions). A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess.

Mean Squared Error. We obtained the mean squared error (MSE) of the difference in expression values (including variance and squared bias),

$$\mathrm{MSE}_k = E[D_{(1)k}^2] = \operatorname{var}[D_{(1)k}] + E[D_{(1)k}]^2 = \sigma_k^2 + \mu_k^2,$$

which can be estimated by the smoothed estimates

$$\left[ \frac{\hat{\xi}(\mathrm{ma}_k)}{0.6745} \right]^2 + \hat{\eta}(\mathrm{ma}_k)^2$$

(see Huber 2003), where dividing the MAD by 0.6745 rescales it to an estimate of $\sigma_k$ under normality. The estimated MSE for quantile normalization is smallest when average expression values are greater than the noise levels of the measurements, and the estimated MSE for cyclic loess is slightly larger than that of quantile normalization across all average expression values. Median normalization performed similarly to no normalization (Figure 5).
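The estimated MSE above combines the two smoothed quantities in one line of arithmetic. The following sketch assumes the smoothed bias and smoothed MAD are already available at a given expression level; the function name `estimated_mse` and the input values are hypothetical.

```python
def estimated_mse(eta_hat, xi_hat):
    """Estimated MSE = (MAD / 0.6745)^2 + bias^2 at one expression level."""
    sigma_hat = xi_hat / 0.6745      # MAD rescaled to a standard-deviation estimate under normality
    return sigma_hat ** 2 + eta_hat ** 2

# Hypothetical smoothed values: the variance term is 1.0 and the squared bias is 0.09.
mse = estimated_mse(eta_hat=0.3, xi_hat=0.6745)
# With zero bias the MSE reduces to the variance term alone.
mse_unbiased = estimated_mse(eta_hat=0.0, xi_hat=0.6745)
```

Splitting the result into its two terms makes it easy to see whether a method's advantage comes from lower bias, lower variance, or both.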

Figure 5: MSE curves for the brain tissue data (horizontal axis: median of average of expressions; vertical axis: MSE of difference of expressions) without normalization (black, solid line), after median normalization (green, dashed line), after quantile normalization (red, dot-dashed line) and after cyclic loess (blue, dotted line).

To evaluate the global bias and variance for each method, we averaged MSEs across expression levels greater than 4.5; the value 4.5 (log base 2 transformed) was selected because 95% of the blanks (spots lacking oligonucleotide probes) gave intensities less than this value. The average MSEs for no normalization, median normalization, quantile normalization and cyclic loess using the brain tissue data were 0.278, 0.274, 0.225 and 0.270, respectively. These results were found consistently across the other 35 tissue types (Figure 6), where the MSEs were lower for quantile normalization (coded 2) than for no normalization (coded 0), median normalization (coded 1) and cyclic loess (coded 3) in almost all tissue samples, the exceptions being human lung, human liver, human thymus, mouse liver and mouse lung. When the normalization methods were applied to each tissue type separately, instead of to all 72 arrays together, the results were similar.

Figure 6: Mean of MSEs (horizontal axis) for the difference in expression values, by tissue, without normalization (0 and black), after median normalization (1 and green), after quantile normalization (2 and red) and after cyclic loess (3 and blue). Tissues shown: Esophagus, Colon, Cervix, Lung, Brain, Bladder, Liver, Kidney, Adipose, Heart, Thymus, Ovary, Placenta, Testes, Thyroid, Skeletal Muscle, Small Intestine, Spleen, Prostate, Trachea, Pancreas, Breast, Stomach, Uterus, Adrenal, Lymph Node, Mouse Spleen, Mouse Liver, Mouse Brain, Mouse Heart, Mouse Ovary, Mouse Embryo, Mouse Lung, Mouse Thymus, Mouse Kidney and Mouse Testicle.

Checking for Scale Compression. It is possible that the superior results for quantile normalization are the result of compression of the scale downward after transformation. To check this, we first calculated coefficients of variation (CV) as the ratio of an estimate of the standard deviation of measurement ($\sqrt{\mathrm{MSE}}$) for each bin to the mean expression for that bin and then averaged the ratios across bins. We found that the CVs followed the same pattern as the MSEs, that is, typically lower values for quantile normalization across tissues (Figure 7). It is also possible that the superior results for quantile normalization are the result of compressing the scale from both ends after transformation, thereby reducing the spread and sensitivity of the transformed measurements. To check this, we calculated the average variance of expression levels across the 36 tissues for each miRNA. This variance consists of true variance across tissues and measurement error as obtained with the MSE. Averaging the variance across miRNAs and the MSEs across tissues, we found that the ratios of signal (true) variance to noise (measurement error) variance were 12.0, 14.0, 16.3 and 16.3 for no, median, quantile and cyclic loess normalization, respectively.
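The CV check described above can be illustrated with a short Python sketch. The function name `mean_cv` and all numbers are hypothetical; the second call illustrates, under the simplifying assumption that compression acts as a uniform multiplicative shrinkage by a factor $c$, why the CV is a useful check: the MSE scales by $c^2$ and the mean expression by $c$, so the CV does not change.

```python
import math

def mean_cv(mse_by_bin, mean_expression_by_bin):
    """Average over bins of sqrt(MSE) divided by the bin's mean expression."""
    ratios = [math.sqrt(mse) / mean
              for mse, mean in zip(mse_by_bin, mean_expression_by_bin)]
    return sum(ratios) / len(ratios)

# Hypothetical per-bin MSEs and mean expression levels.
mse_by_bin = [0.25, 0.16, 0.09]
mean_by_bin = [5.0, 8.0, 10.0]
cv = mean_cv(mse_by_bin, mean_by_bin)

# Shrinking every measurement by c = 0.5 lowers the MSE but leaves the CV unchanged.
c = 0.5
cv_compressed = mean_cv([m * c ** 2 for m in mse_by_bin],
                        [a * c for a in mean_by_bin])
```

A method whose lower MSE came only from such shrinkage would therefore show no improvement in CV, which is why a lower CV is evidence against pure downward compression.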

Figure 7: Mean of CVs (horizontal axis) for the difference in expression values, by tissue (the same 36 human and mouse tissues as in Figure 6), without normalization (0 and black), after median normalization (1 and green), after quantile normalization (2 and red) and after cyclic loess (3 and blue).

Comparative Study. We compared real-time RT-PCR miRNA data (Lee et al. 2008) with our microarray miRNA data, since twenty-one tissues were common to both datasets. Specifically, we focused on brain and heart, since these tissues are biologically quite distinct and have substantial differences in their miRNA expression profiles. If a normalization technique were overly aggressive, there would be an "averaging-out" effect, leading to a significant decrease in the number of differentially expressed miRNAs. A well-known difference between microarray and RT-PCR data is that the fold changes observed by microarray tend to be compressed in comparison with fold changes observed by RT-PCR. We found that 51 miRNAs were characterized by a four-fold difference in expression by RT-PCR. For the microarray data on the identical miRNAs, we found that 36, 35, 35 and 35 miRNAs were two-fold differentially expressed for no, median, cyclic loess and quantile normalization, respectively. This set of miRNAs was found to have roughly a 70% overlap with the RT-PCR data. The observed values for fold changes varied little with respect to the normalization method used. In this respect, we could not conclude that any normalization method was superior based strictly on this analysis, but we could at least conclude that quantile normalization is not worse than the other methods in terms of its sensitivity.
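The counting step in this comparison can be sketched in Python as follows. This is a minimal sketch under stated assumptions: expression values are taken to be on a log2 scale, the function names and toy numbers are hypothetical, and "overlap" is read as the fraction of RT-PCR calls that are also called on the microarray. Under that reading, 36 of 51 gives about 0.71, which is consistent with the roughly 70% quoted above.

```python
import numpy as np

def differentially_expressed(log2_a, log2_b, fold=2.0):
    """Boolean mask of miRNAs differing by at least `fold` between two tissues.

    Inputs are assumed to be log2 expression values, so a fold change of
    `fold` in either direction is an absolute difference of log2(fold).
    """
    diff = np.abs(np.asarray(log2_a, dtype=float) - np.asarray(log2_b, dtype=float))
    return diff >= np.log2(fold)

def recovered_fraction(pcr_mask, array_mask):
    """Fraction of RT-PCR calls that are also called on the microarray."""
    pcr_mask = np.asarray(pcr_mask, dtype=bool)
    array_mask = np.asarray(array_mask, dtype=bool)
    return float(np.logical_and(pcr_mask, array_mask).sum() / pcr_mask.sum())

# Toy log2 values for four hypothetical miRNAs (not data from the paper).
brain_pcr, heart_pcr = [10.0, 8.0, 5.0, 7.0], [8.0, 7.5, 7.5, 7.0]
brain_arr, heart_arr = [9.0, 8.0, 6.0, 7.0], [8.0, 7.8, 7.5, 7.1]

pcr_calls = differentially_expressed(brain_pcr, heart_pcr, fold=4.0)
array_calls = differentially_expressed(brain_arr, heart_arr, fold=2.0)
fraction = recovered_fraction(pcr_calls, array_calls)
```

The lower threshold on the microarray side mirrors the text's use of two-fold for microarray against four-fold for RT-PCR, which allows for the compressed fold changes on arrays.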

5 Conclusion

We showed that the quantile normalization method works best in reducing differences in miRNA expression values for duplicate tissue samples. Cyclic loess works slightly worse than quantile normalization, whereas no normalization and median normalization behave similarly and seem to be inferior to quantile normalization and cyclic loess with regard to bias. This is not surprising, because quantile normalization adjusted better for differential bias across the scale of expression values. By showing that the total MSE was lower across almost all 36 tissue samples, we were assured that the bias correction provided by quantile normalization was not outweighed by the additional error variance that can arise from a more complex normalization method. Furthermore, we showed that quantile normalization does not achieve smaller replication error by compressing the scale downward or by compressing the scale from both ends.
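For readers unfamiliar with the method, quantile normalization in the sense of Bolstad et al. (2003) can be sketched in a few lines of Python. This is a simplified illustration, not the code used in this study: the function name and the probes-by-arrays layout are assumptions, and tied values are ranked in their order of appearance rather than being averaged.

```python
import numpy as np

def quantile_normalize(x):
    """Force every array (column) to share the same empirical distribution.

    x: 2-D array, rows are probes and columns are arrays (assumed layout).
    """
    x = np.asarray(x, dtype=float)
    # Indices that would sort each column; a stable sort keeps tie order fixed.
    order = np.argsort(x, axis=0, kind="stable")
    # Rank (0-based) of each probe within its own column.
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the k-th smallest values across columns.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return reference[ranks]

# Toy intensities for four probes on three arrays (not data from the paper).
raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 5.0, 6.0],
                [4.0, 2.0, 8.0]])
normalized = quantile_normalize(raw)
```

After normalization every column contains the same set of values, and only the order of probes within each array is preserved.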

References

Babak, T., Zhang, W., Morris, Q., Blencowe, B. and Hughes, T. (2004). Probing microRNAs with microarrays: Tissue specificity and functional inference, RNA 10: 1813–1819.

Barad, O., Meiri, E., Avniel, A., Aharonov, R., Barzilai, A., Bentwich, I., Einav, U., Gilad, S., Hurban, P., Karov, Y., Lobenhofer, E. K., Sharon, E., Shiboleth, Y. M., Shtutman, M., Bentwich, Z. and Einat, P. (2004). MicroRNA expression detected by oligonucleotide microarrays: System establishment and expression profiling in human tissues, Genome Research 14: 2486–2494.


Bolstad, B. M., Irizarry, R. A., Astrand, M. and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics 19: 185–193.

Calin, G., Ferracin, M., Cimmino, A., DiLeva, G., Shimizu, M., Wojcik, S., Iorio, M., Visone, R., Sever, N., Fabbri, M., Iuliano, R., Palumbo, T., Pichiorri, F., Roldo, C., Garzon, R., Sevignani, C., Rassenti, L., Alder, H., Volinia, S., Liu, C. G., Kipps, T. J., Negrini, M. and Croce, C. M. (2005). A microRNA signature associated with prognosis and progression in chronic lymphocytic leukemia, The New England Journal of Medicine 353: 1793–1801.

Chang, T., Yu, D., Lee, Y., Wentzel, E., Arking, D., West, K., Dang, C. V., Thomas-Tikhonenko, A. and Mendell, J. T. (2008). Widespread microRNA repression by Myc contributes to tumorigenesis, Nature Genetics 40(1): 43–50.

Churchill, G. A. (2002). Fundamentals of experimental design for cDNA microarrays, Nature Genetics 32: 490–495.

Churchill, G. A. (2003). Discussion to statistical challenges in functional genomics-comment, Statistical Science 18: 64–69.

Churchill, G. A. and Oliver, B. (2001). Sex, flies and microarrays, Nature Genetics 29: 355–356.

Davison, T., Johnson, C. and Andruss, B. (2006). Analyzing micro-RNA expression using microarrays, Methods in Enzymology 411: 14–34.

Garzon, R., Pichiorri, F., Palumbo, T., Iuliano, R., Cimmino, A., Aqeilan, R., Volinia, S., Bhatt, D., Alder, H., Marcucci, G., Carlin, G., Liu, C. G., Bloomfield, C., Andreeff, M. and Croce, C. (2006). MiRNA fingerprints during human megakaryocytopoiesis, Proceedings of the National Academy of Sciences of the United States of America 101: 5078–5083.

Gentleman, R., Carey, V. J., Huber, W., Irizarry, R. and Dudoit, S. (2005). Bioinformatics and computational biology solutions using R and Bioconductor, Springer: New York.

Gu, C. (2002). Smoothing Spline ANOVA Models, Springer: New York.

Hagan, J. and Croce, C. (2007). MicroRNAs in carcinogenesis, Cytogenetic and Genome Research 118: 252–259.


Hoaglin, D. C., Mosteller, F. and Tukey, J. W. (2000). Understanding Robust and Exploratory Data Analysis, John Wiley & Sons.

Huber, P. (2003). Robust Statistics, John Wiley & Sons.

Irizarry, R. A., Hobbs, B., Collin, F. and Speed, T. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics 4: 249–264.

Kerr, M. K. and Churchill, G. (2001). Experimental design for gene expression microarrays, Biostatistics 2: 183–201.

Kumar, M., Lu, J., Mercer, K., Golub, T. and Jacks, T. (2007). Impaired microRNA processing enhances cellular transformation and tumorigenesis, Nature Genetics 39(5): 673–677.

Lee, E., Baek, M., Gusev, Y., Brackett, D. J., Nuovo, G. and Schmittgen, T. (2008). Systematic evaluation of microRNA processing patterns in tissues, cell lines, and tumors, RNA 14: 35–42.

Lin, Y., Nadler, S. T., Lan, H., Attie, A. D. and Yandell, B. S. (2003). Adaptive gene picking with microarray data: detecting important low abundance signals, in G. Parmigiani, E. S. Garrett, R. A. Irizarry and S. L. Zeger (eds), The Analysis of Gene Expression Data: Methods and Software, Springer-Verlag.

Liu, C., Calin, G., Meloon, B., Gamliel, N., Sevignani, C., Ferracin, M., Dumitru, C., Shimizu, M., Zupo, S., Dono, M., Alder, H., Bullrich, F., Negrini, M. and Croce, C. (2004). An oligonucleotide microchip for genome-wide microRNA profiling in human and mouse tissues, Proceedings of the National Academy of Sciences of the United States of America 101(26): 9740–9744.

Lu, J., Getz, G., Miska, E., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B. L., Mark, R., Ferrando, A., Downing, J. R., Jacks, T., Horvitz, H. R. and Golub, T. R. (2005). MicroRNA expression profiles classify human cancers, Nature 435(7043): 843–848.

Oshlack, A., Emslie, D., Corcoran, L. and Smyth, G. (2007). Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes, Genome Biology 8(1): R2.

Quackenbush, J. (2002). Microarray data normalization and transformation, Nature Genetics 32: 496–501.


Volinia, S., Calin, G., Liu, C., Ambs, S., Cimmino, A., Petrocca, F., Visone, R., Iorio, M., Roldo, C., Ferracin, M., Prueitt, R., Yanaihara, N., Lanza, G., Scarpa, A., Vecchione, A., Negrini, M., Harris, C. and Croce, C. (2006). A microRNA expression signature of human solid tumors defines cancer gene targets, Proceedings of the National Academy of Sciences of the United States of America 103(7): 2257–2261.

Wolfinger, R., Gibson, G., Wolfinger, E. D., Bennett, L., Hamadeh, H., Bushe, P., Afsha, C. and Paules, R. (2001). Assessing gene significance from cDNA microarray expression data via mixed models, Journal of Computational Biology 8: 625–637.

Yanaihara, N., Caplen, N., Bowman, E., Seike, M., Kumamoto, K., Yi, M., Stephens, R., Okamoto, A., Yokota, J., Tanaka, T., Carlin, G., Liu, C. G., Croce, C. and Harris, C. (2006). Unique miRNA molecular profiles in lung cancer diagnosis and prognosis, Cancer Cell 9(3): 189–198.
