A Comparison of Normalization Techniques for MicroRNA Microarray Data

Statistical Applications in Genetics
and Molecular Biology
Volume 7, Issue 1 2008 Article 22
Youlan Rao, The Ohio State University, rao@stat.ohio-state.edu
Yoonkyung Lee, The Ohio State University, yklee@stat.ohio-state.edu
David Jarjoura, The Ohio State University, david.jarjoura@osumc.edu
Amy S. Ruppert, The Ohio State University, amy.ruppert@osumc.edu
Chang-gong Liu, The Ohio State University, chang-gong.liu@osumc.edu
Jason C. Hsu, The Ohio State University, jch@stat.ohio-state.edu
John P. Hagan, The Ohio State University, microrna@gmail.com
Copyright © 2008 The Berkeley Electronic Press. All rights reserved.
Abstract
Normalization of expression levels applied to microarray data can help in reducing measurement error. Different methods, including cyclic loess, quantile normalization and median or mean normalization, have been utilized to normalize microarray data. Although there is considerable literature regarding normalization techniques for mRNA microarray data, there are no publications comparing normalization techniques for microRNA (miRNA) microarray data, which are subject to similar sources of measurement error. In this paper, we compare the performance of cyclic loess, quantile normalization, median normalization and no normalization for a single-color microRNA microarray dataset. We show that the quantile normalization method works best in reducing differences in miRNA expression values for replicate tissue samples. By showing that the total mean squared error is lowest across almost all 36 investigated tissue samples, we are assured that the bias correction provided by quantile normalization is not outweighed by additional error variance that can arise from a more complex normalization method. Furthermore, we show that quantile normalization does not achieve these results by compression of scale.
KEYWORDS: microRNA, median normalization, cyclic loess normalization, quantile normalization, robust estimates, smoothing spline, mean squared error
This material is based in part upon work supported by the National Science Foundation under Agreement No. 0635561. Jason C. Hsu's research is supported by NSF Grant Number DMS-0505519.
1 Introduction
In microarray experiments, variation of expression measurements among arrays
can be attributed to many sources, such as differences in sample RNA prepa-
ration, cDNA labeling, image intensity and microarray hybridization/wash effi-
ciency. Normalization of expression levels applied to microarray data can help
in removing this error. Different methods, including cyclic loess, quantile nor-
malization (Bolstad et al. 2003) and median or mean normalization (Churchill
2002, Churchill 2003, Churchill and Oliver 2001, Kerr and Churchill 2001, and
Wolfinger et al. 2001), have been utilized to normalize microarray data. Briefly,
cyclic loess makes the MA plot of probe intensities from every pair of arrays
scatter about the M= 0 axis, quantile normalization makes the distributions
of expression levels the same across arrays, and median or mean normalization
shifts the individual log-intensities on each array so that the median or mean
log-intensities, respectively, are the same across arrays. These normalization al-
gorithms can be applied either globally to an entire data set or locally to some
physical subset of the data (Quackenbush 2002). Irizarry et al. (2003) applied
the quantile normalization procedure to normalize dilution data and spike-in data
from Affymetrix arrays, and showed how quantile normalization removed bias
as compared to no normalization. Their analysis was unique in that they knew
the true expression levels and could therefore determine the degree of bias re-
duction from quantile normalization.
MicroRNAs (miRNAs) are noncoding RNAs of 19-24 nucleotides that are
negative regulators of gene expression. Recently implicated as important in
development and normal physiology, microRNAs are abnormally expressed in
many human cancers (Volinia et al. 2006, Lu et al. 2005). Moreover, aberrant
microRNA expression has been shown to initiate and promote carcinogenesis
(reviewed in Hagan and Croce 2007). These microRNA expression signatures
may reveal new oncogenetic pathways in human cancers. For systematic in-
vestigation of microRNA expression, oligonucleotide-based microarrays for mi-
croRNAs in human and mouse tissues have been developed recently (Liu et al.
2004) and several commercial platforms are now available. To date, more than a
hundred published reports have used microRNA microarrays to investigate microRNA
expression profiles, of which more than two-thirds used single color rather than
two color hybridization systems. Although there is substantial literature regarding
normalization techniques for mRNA microarray data, there are no published
reports comparing normalization techniques for microRNA (miRNA) microarray
data, which are subject to similar sources of error variation.
Many statistical reports on mRNA microarrays have focused on Affymetrix
Rao et al.: Comparing Normalization Techniques for MicroRNA Microarray Data
Published by The Berkeley Electronic Press, 2008
mRNA arrays, which have an exceedingly high density of probes that are in situ
synthesized on the array. For example, in one Human Genome U133 Plus2.0
GeneChip, probe sets for each mRNA, including numerous housekeeping genes,
consist of eleven oligonucleotide probes selected to maximize specificity and to
have similar melting temperatures across the entire array. In contrast, microRNA
microarrays are often lower density spotted arrays. Our focus is on single color
microRNA microarrays, which are used more often than dual color arrays.
Results from the Version 3.0 microRNA microarray
used in this study and its earlier versions have appeared in more than 40 publi-
cations. The Version 3.0 microarray contains 3790 probes spotted in duplicate.
The probes are 40 nucleotides in length, consisting of the genomic sequence
that has the mature microRNA sequence and additional flanking bases. With
the exception of six probes designed against Arabidopsis thaliana microRNAs,
the rest of the probes are derived from known and predicted human and mouse
microRNAs. This design allows for the detection of mature as well as precursor
miRNAs and is particularly helpful in determining if computationally predicted
miRNAs are real. Although U6 snRNA is frequently used as a control for mi-
croRNA experiments, this noncoding RNA has been shown to vary as much as
five fold for equivalent amounts of total RNA by both microarray and North-
ern analysis (Hagan and Liu, unpublished observations). Hence, probes for U6
snRNA were not included in the Version 3.0 microarray. Most, if not all, com-
mercially available microRNA microarrays do not have controls for endogenous
RNAs that have been shown to be largely invariant between tissue samples.
Given the short length of miRNAs and the fact that far more mRNAs are
known than miRNAs, it is important to compare normalization methods specif-
ically for the miRNA microarray data. Although microRNA microarrays are
lower density spotted arrays than mRNA microarrays, they are not “boutique”
arrays. For example, microRNA arrays do not meet the following criteria: “more
than half the probes might be differentially expressed between any two samples
and that the differential expression might be predominately in one direction”
(Oshlack et al. 2007). We also do not expect global differences across miRNA
arrays. As an example, although the biggest difference in miRNA expression was
expected between brain and heart tissues, we found that only 15% of miRNAs were
differentially expressed with a greater than 2 fold difference when comparing
these distinct tissue types. Other examples include the miRNA studies
in cancer (Calin et al. 2005, Volinia et al. 2006, Yanaihara et al. 2006)
and tissue differentiation (Babak et al. 2004, Barad et al. 2004, Garzon et al.
2004) referenced in Davison et al. (2006). For the three referenced cancer studies that used
microRNA microarrays, the numbers of differentially expressed microRNAs are
Statistical Applications in Genetics and Molecular Biology, Vol. 7 [2008], Iss. 1, Art. 22
http://www.bepress.com/sagmb/vol7/iss1/art22
13/245 (5.3%), 22/228 to 57/228 (9.6% to 25.0%; the range depends on which of six
tumor/normal comparisons was performed) and 43/352 (12.2%). For the three
referenced differentiation studies, the numbers of differentially expressed
microRNAs are 19/399 (4.8%), 25/154 to 35/154 (15.2% to 22.7%; the range depends on
the specific pairwise tissue comparison) and 35/150 to 57/150 (23.3% to 38.0%;
the range depends on the specific pairwise tissue comparison). We can conclude
with confidence that much less than 50% of miRNAs are differentially expressed
based on our experience and assessment of the literature. In addition to our cus-
tom microRNA microarrays, there are numerous commercially available miRNA
microarrays. For example, LC Sciences, Exiqon, Agilent, Invitrogen, and Ambion
sell miRNA microarrays, with 1564, 4000, 15000, 3000, and 1224 miRNA
probes, respectively. Hence, the probe density of our array is similar to many cur-
rently available commercial platforms. Importantly, high throughput sequencing
of microRNAs is rapidly expanding the number of known microRNAs. Hence,
our custom arrays will soon need to be updated with even more probes to reflect
the recently identified microRNAs. The microRNA registry (Version 10.1) currently
has sequences for 5395 miRNAs. Even though microRNA microarrays
are not “boutique” arrays in general, a few cases exist where large numbers of
microRNAs will be differentially expressed in only one direction. Knockouts
of essential microRNA biogenesis proteins such as Drosha, DGCR8, or Dicer1
lead to a dramatic reduction in steady state microRNA levels by blocking pro-
duction of mature microRNAs (Kumar et al. 2007). These global downregula-
tion cases are exceptionally easy to detect by microarray as the percentage of
microRNAs expressed above background is considerably different in compari-
son to controls. Other confirmed examples that show unidirectional microRNA
regulation are quite rare. Using a novel bead-based microRNA profiling system,
Lu et al. (2005) reported that microRNAs were primarily downregulated in cancers
(129 of 217 investigated). However, almost all other studies of microRNAs in cancer,
including all the research referenced in this manuscript, have found roughly balanced numbers or
a slight enrichment for upregulated microRNAs in cancer, casting doubt on the
conclusions of Lu et al. (2005). Even research that at first glance might seem
to support the conclusions of Lu and colleagues demonstrates unequivocally the
opposite. For example, Chang et al. (2008) reported that Myc expression leads
to widespread repression of microRNAs. As their Supplemental Table 1 shows
for 313 human microRNAs investigated, 11 and 17 microRNAs are upregulated
and downregulated, respectively, at least two fold upon induced Myc expression.
Although vigilance must be exercised to make sure that the underlying assumptions
are valid, the normalization methods that we present are appropriate for the
vast majority of studies using microRNA microarrays.
In this paper, we compare the performance of median, cyclic loess, quantile,
and no normalization for miRNA microarray data. The data included 72 microarrays
obtained from RNA from 26 human and 10 mouse tissues that were
hybridized as technical replicates. Hence, each RNA sample was hybridized to
two independent microarrays. Since replicate samples should, in theory, have al-
most identical values for expressions, one can compare different normalization
techniques in terms of the closeness of normalized measurements in the repli-
cated samples. Moreover, there are no confounding biological effects that come
from tissues from different individuals. The differences between these paired
expression levels with and without normalization can be divided into bias and
variance components by expression level. Both components of these miRNA-by-miRNA
differences should be reduced after applying normalization methods.
We used these differences to provide direct evidence of the capability of each
method of reducing these two components. It was critical to examine the effects
on both quantities because the complexity of a transformation may increase the
error variance over and above its bias reduction. To resemble how normalization
is typically applied to samples, normalization was done globally across all 72
samples. This is an important distinction from normalizing each of 36 replicate
pairs separately, where this level of normalization could produce artificially low
variance and bias.
Section 2 describes the normalization methods in detail. Section 3 describes
the miRNA data used in this paper. Section 4 compares normalization methods.
2 Normalization Methods
Three commonly used normalization techniques are reviewed. Suppose that we
have the (log base 2 transformed) probe level expression values from $p$ miRNAs
and $n$ arrays in a $p \times n$ matrix $X$.
Median normalization shifts miRNA expressions on each array by additive
constants so that the medians of miRNA expressions are the same across arrays
by the following steps:
• Take the median of each column of $X$ and generate an $n$-dimensional median vector $M$;
• Calculate the overall median of the vector $M$;
• Shift the miRNA expression values of each array by subtracting from them the difference between the median of that array and the overall median.
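The three median normalization steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `median_normalize` and the toy matrix are hypothetical and are not taken from the paper's own software.

```python
import numpy as np

def median_normalize(X):
    """Shift each array (column) of a p x n matrix so all column medians agree."""
    X = np.asarray(X, dtype=float)
    M = np.median(X, axis=0)      # step 1: median of each array (column)
    overall = np.median(M)        # step 2: overall median of the median vector
    return X - (M - overall)      # step 3: subtract each array's offset

# Hypothetical log2 expression values: 3 miRNAs (rows) on 3 arrays (columns).
X = np.array([[1.0, 3.0, 6.0],
              [2.0, 4.0, 7.0],
              [3.0, 5.0, 8.0]])
X_norm = median_normalize(X)
```

In this toy example the column medians are 2, 4 and 7, the overall median is 4, and after the shift every column has median 4. Only the location of each array changes; the spread within each array is untouched.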
Instead of matching only the median across the arrays, quantile normalization
makes the distributions of expression levels the same across arrays by the
following steps:
• Sort each column of $X$ separately to generate a sorted $p \times n$ matrix $Y$;
• Take the mean of each row of $Y$ and generate a $p$-dimensional vector $A_b$, called the baseline array;
• Get the normalized miRNA expressions for each array by rearranging the baseline array $A_b$ to have the same ordering as the corresponding column of the matrix $X$, so that the empirical distribution of miRNA expressions on every array is the same as that of the baseline array.
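A minimal NumPy sketch of these quantile normalization steps follows. The function name `quantile_normalize` and the example matrix are hypothetical, and the handling of tied values (broken by position rather than averaged) is an assumption that the text does not specify.

```python
import numpy as np

def quantile_normalize(X):
    """Give every array (column) of a p x n matrix the same empirical distribution."""
    X = np.asarray(X, dtype=float)
    # Sorting permutation of each column; ties are broken by position (assumption).
    order = np.argsort(X, axis=0, kind="stable")
    Y = np.take_along_axis(X, order, axis=0)   # step 1: sorted matrix Y
    baseline = Y.mean(axis=1)                  # step 2: baseline array (row means)
    X_norm = np.empty_like(X)
    # Step 3: the k-th smallest entry of each column receives the k-th baseline value.
    np.put_along_axis(X_norm, order, baseline[:, None], axis=0)
    return X_norm

# Hypothetical log2 expression values: 4 miRNAs (rows) on 3 arrays (columns).
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
X_norm = quantile_normalize(X)
```

After the transformation, sorting any column of `X_norm` returns the baseline array, while the rank order of the miRNAs within each array is preserved.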
Cyclic loess considers the MA plot of probe intensities from every pair of
arrays $(X_{ij}, X_{ij'})$, with fixed $j \neq j'$ and $i = 1, 2, \ldots, p$, and makes the M and A
pairs scatter around the $M = 0$ axis by the following steps:
• Compute $M_i = X_{ij} - X_{ij'}$ and $A_i = \frac{1}{2}(X_{ij} + X_{ij'})$;
• Fit a loess curve by regressing $M$ on $A$, and denote the fitted vector by $\hat{M}$;
• Setting the vector $D = (M - \hat{M})/2$, get the normalized miRNA expressions for $(X_{ij}, X_{ij'})$ by modifying $X_{ij}$ to $A_i + D_i$ and $X_{ij'}$ to $A_i - D_i$, $i = 1, 2, \ldots, p$; equivalently, $X_{ij}$ becomes $X_{ij} - \hat{M}_i/2$ and $X_{ij'}$ becomes $X_{ij'} + \hat{M}_i/2$.
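The pairwise step of cyclic loess can be sketched as follows. This is a rough illustration, not the implementation used in the paper: `loess_fit` is a hypothetical hand-written local linear smoother with tricube weights standing in for a full loess routine, and the span `frac=0.5` is an arbitrary assumption. The normalized pair is written as $A + D$ and $A - D$, which equals the first array minus $\hat{M}/2$ and the second array plus $\hat{M}/2$.

```python
import numpy as np

def loess_fit(x, y, frac=0.5):
    """Simplified loess: local linear fit with tricube weights at each x value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))   # points in each neighbourhood
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.sort(dist)[k - 1], 1e-12)     # bandwidth: k-th nearest distance
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[i]])
        # Weighted least squares; the intercept is the fitted value at x[i].
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted

def loess_normalize_pair(x1, x2, frac=0.5):
    """One pairwise cyclic-loess step for two arrays of log2 expression values."""
    M = x1 - x2
    A = 0.5 * (x1 + x2)
    M_hat = loess_fit(A, M, frac)
    D = 0.5 * (M - M_hat)
    return A + D, A - D        # same as x1 - M_hat / 2 and x2 + M_hat / 2

# Hypothetical pair of arrays with an intensity-dependent bias of 10 percent.
rng = np.random.default_rng(0)
base = rng.uniform(4.0, 14.0, size=60)
x1, x2 = 1.1 * base, base
n1, n2 = loess_normalize_pair(x1, x2)
```

In this artificial example the bias is exactly linear in $A$, so the smoother removes it completely and the two normalized arrays coincide. A full cyclic loess would apply this step to every pair of arrays and typically repeat the cycle a few times.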
3 Description of Data
Total RNA was purchased from Ambion Inc. Microarray labeling and hybridization
were performed as previously described in Liu et al. (2004), with
the exceptions noted below. The Ohio State University Comprehensive Cancer
Center Version 3.0 microRNA microarray was used. This array contains
3790 oligo probes derived from 578 mature miRNAs spotted in duplicate (329
Homo sapiens, and 249 Mus musculus) that are annotated in the miRNA registry
http://microrna.sanger.ac.uk/sequences/ (Accessed Nov. 2005). Of the 396
evolutionarily conserved mature microRNAs between mice and human in Ver-
sion 10.1 of the microRNA registry, 68% are identical in length and sequence.
Hence, many of the mouse probes serve as additional controls for their human
counterparts and vice versa. In addition, 1493 human and 1137 mouse oligo
probes for miRNAs computationally predicted in human and mouse, respec-
tively, are also spotted in duplicate. Often, more than one probe set exists for a
given mature miRNA. Additionally, there are duplicate probe spots corresponding
to most precursor miRNAs. Hybridization signals were ultimately detected
with Streptavidin-Alexa 647 conjugate, and scanned images (Axon 4000B) were
quantified using the Genepix 6.0 software through a local background correction
(Axon Instruments, Sunnyvale, CA).
4 Analysis
Background-corrected median signals for duplicate probes on an array were averaged.
After normalization across all 72 arrays, let $X_i$ be the log base 2 transformed
expression value of the $i$th miRNA for a certain tissue, and let $Y_i$ be the
log base 2 transformed expression value of the $i$th miRNA for the replicate of
the tissue.
Bias. The average $A_i = (X_i + Y_i)/2$ and the difference $M_i = X_i - Y_i$ of
expression values for each miRNA can then be computed. The MA plot of the
two vectors $X$ and $Y$ is a 45-degree rotation and axis scaling of their scatter
plot. This plot is particularly useful for array data because $M_i$ represents the
log fold change and $A_i$ represents the average log intensity for the $i$th miRNA.
When the loess curve of the MA plot deviates from the horizontal line at $M = 0$,
this demonstrates differences in the intensity levels between two arrays from
the same tissue (Gentleman et al. 2005). In contrast, if the loess curve aligns
with $M = 0$, the normalization method is considered to exhibit little bias at all
levels of expression. When MA plots and loess curves were made for the replicate
array data from human brain tissue using no normalization, median normalization,
quantile normalization and cyclic loess, we observed that the quantile
normalization method removed bias the best (Figure 1C): its loess curve closely
followed the horizontal line at $M = 0$. No normalization, median normalization
and cyclic loess behaved similarly in that their loess curves did not align with
$M = 0$ closely enough (Figure 1A, 1B and 1D).
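The statement that the MA plot is a 45-degree rotation and axis scaling of the scatter plot can be checked numerically. The values below are hypothetical replicate measurements chosen only for illustration; the rotation direction (clockwise) is one of two equivalent conventions.

```python
import numpy as np

# Hypothetical log2 expression values for one tissue (x) and its replicate (y).
x = np.array([6.0, 8.5, 10.0, 12.5])
y = np.array([6.4, 8.1, 10.3, 12.0])

A = (x + y) / 2.0   # average log intensity
M = x - y           # log fold change between the replicates

# Rotate the scatter-plot points (x, y) clockwise by 45 degrees.
c = s = np.sqrt(0.5)
rotation = np.array([[c, s],
                     [-s, c]])
u, v = rotation @ np.vstack([x, y])

# After rotation u = (x + y) / sqrt(2) and v = (y - x) / sqrt(2), so rescaling
# the two axes recovers A and M (M with its sign flipped).
assert np.allclose(A, u / np.sqrt(2.0))
assert np.allclose(M, -np.sqrt(2.0) * v)
```

Points on the diagonal of the scatter plot (perfect replicate agreement) therefore fall on the horizontal line $M = 0$ of the MA plot.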
Binning. To compare the normalization methods in how much they reduced
error variance in addition to reducing bias, we formally modeled the mean and
variance of differences in replicate arrays as a function of their expression lev-
els. In order to obtain reliable estimates of the expression levels, we binned
duplicates according to their average expression level first and then proceeded
by modeling the mean and variance based on the binned data.
We created equally-sized bins containing 34 miRNA probes. For each bin,
Figure 1: MA and loess plots of expression values for the human brain tissue data. A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess. Axes: A, average of expressions, and M, difference of expressions.
we summarized the differences in the replicate arrays by median absolute devi-
ation (MAD) of the differences and median of the differences to obtain robust
estimates of variance and bias, respectively (Lin et al. 2002). The smoothed
MADs and medians of the differences were used to detect systematic effects due
to the different normalization methods as a function of expression levels. Lower
values of smoothed MADs and smoothed medians closer to zero across average
expressions correspond to a superior normalization method.
As stated above, each bin consisted of 34 miRNA probes. For fixed $k$
($1 \le k \le K$), let $X_{(i)k}$ ($i = 1, 2, \ldots, 34$) be the expression value of the $i$th
miRNA in the $k$th bin for a specific tissue, and let $Y_{(i)k}$ ($i = 1, 2, \ldots, 34$) be the
expression value of the $i$th miRNA in the $k$th bin for the replicate of the tissue.
The difference between the replicate arrays' expression values for each miRNA
in the $k$th bin can be denoted by $D_{(i)k} = X_{(i)k} - Y_{(i)k}$ ($i = 1, 2, \ldots, 34$), and the
corresponding observations by $d_{(i)k}$. We assume that for fixed $k$,
$$D_{(i)k} \overset{\text{i.i.d.}}{\sim} N(\mu_k, \sigma_k^2), \quad i = 1, 2, \ldots, 34,$$
and use
$$md_k = \operatorname*{median}_{1 \le i \le 34} (d_{(i)k})$$
as a robust location (center) estimate of $\mu_k = E[D_{(1)k}]$, and
$$MADd_k = \operatorname*{median}_{1 \le i \le 34} \left| d_{(i)k} - \operatorname*{median}_{1 \le i \le 34} (d_{(i)k}) \right|$$
as a robust estimate of scale (spread), which is proportional to $\sigma_k = \sqrt{\operatorname{var}[D_{(1)k}]}$
under normality.
For the average expression values of miRNAs in the $k$th bin across certain
tissue replicates, let $A_{(i)k} = (X_{(i)k} + Y_{(i)k})/2$ ($i = 1, 2, \ldots, 34$) and $a_{(i)k}$ be the
$i$th observation. Similarly, for estimation of the center of the average expression
values in each bin, we consider
$$ma_k = \operatorname*{median}_{1 \le i \le 34} (a_{(i)k}).$$
As Figure 1A suggests, it is sensible to model $\mu_k$ and $\sigma_k$ as a function of the
center of the average expression values of miRNA replicates in the $k$th bin.
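The binned robust summaries $ma_k$, $md_k$ and $MADd_k$ can be sketched as below. The function name `bin_summaries` and the simulated replicate data are hypothetical, and dropping probes that do not fill a complete final bin is an assumption; the text does not say how leftover probes were handled.

```python
import numpy as np

def bin_summaries(x, y, bin_size=34):
    """Median average, median difference and MAD of differences for each bin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = (x + y) / 2.0                      # average expression per miRNA
    d = x - y                              # replicate difference per miRNA
    order = np.argsort(a, kind="stable")   # bin by average expression level
    n_bins = a.size // bin_size            # incomplete last bin dropped (assumption)
    ma, md, madd = [], [], []
    for k in range(n_bins):
        idx = order[k * bin_size:(k + 1) * bin_size]
        center = np.median(d[idx])
        ma.append(np.median(a[idx]))
        md.append(center)
        madd.append(np.median(np.abs(d[idx] - center)))
    return np.array(ma), np.array(md), np.array(madd)

# Simulated replicate pair: shared signal plus independent measurement noise.
rng = np.random.default_rng(1)
signal = rng.uniform(4.0, 14.0, size=340)
x = signal + rng.normal(0.0, 0.3, size=340)
y = signal + rng.normal(0.0, 0.3, size=340)
ma, md, madd = bin_summaries(x, y)
```

With 340 simulated probes and bins of 34, this yields $K = 10$ bins whose median averages increase from bin to bin.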
For the paired observations $(ma_1, md_1), (ma_2, md_2), \ldots, (ma_K, md_K)$, we
modeled the median difference as a smooth function of the median average,
$$md_k = \eta(ma_k) + \epsilon_k, \quad k = 1, 2, \ldots, K,$$
with $\epsilon_k \sim N(0, \sigma^2_{m,k})$ and with a different variance for each bin. The smoothed
relationship $\eta$ was obtained by the weighted smoothing spline with weights equal
to the reciprocal of the squared MAD of the differences. Quantile normalization gave
the best results when comparing the weighted smoothed curves for the median
difference in expression values using the human brain tissue data (Figure 2).
Similarly, for the paired observations $(ma_1, MADd_1), (ma_2, MADd_2), \ldots,
(ma_K, MADd_K)$, we considered the following model with unequal variance
$$MADd_k = \xi(ma_k) + \epsilon_k, \quad k = 1, 2, \ldots, K,$$
Figure 2: weighted smoothed medians of difference of expression values for the human brain tissue data. A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess. Axes: median of average of expressions and median of difference of expressions.
and $\epsilon_k \sim N(0, \sigma^2_{MAD})$. The smoothed MAD of differences $\xi$ can again be obtained
by smoothing splines with the smoothing parameter selected by generalized
maximum likelihood (GML) (Gu 2002). It was difficult to see differences
in the relationship between $MADd$ and $ma$ among the normalization methods
(Figure 3), but they became more apparent when the bias and variance were combined
into a mean squared error statistic.
Confidence intervals. The fitted median of differences $\hat{\eta}$ is the smoothed
estimate of the bias parameter $\mu_k$, and the fitted MAD of differences $\hat{\xi}$ is the smoothed
estimate of the scale parameter. We used the fitted MAD to construct confidence
intervals around the bias and obtained a pointwise confidence interval for the bias by
Figure 3: smoothed MADs versus median averages for the human brain tissue data. A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess. Axes: median of average of expressions and MAD of difference of expressions.
binned expression values as
$$\hat{\eta}(ma_k) \pm \frac{3.98}{\sqrt{34}} \, \hat{\xi}(ma_k)$$
(see Hoaglin et al. 2000). The confidence band after quantile normalization
encompasses the horizontal line at $M = 0$, while those using no normalization,
median normalization or cyclic loess do not include zero for larger expression
values (Figure 4).
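As a small numerical illustration of the pointwise band, the sketch below assumes that the half-width is the fitted MAD multiplied by 3.98 and divided by the square root of the bin size 34. The function name `bias_confidence_band` and the fitted values are hypothetical.

```python
import numpy as np

def bias_confidence_band(eta_hat, xi_hat, bin_size=34, multiplier=3.98):
    """Pointwise band: eta_hat +/- multiplier / sqrt(bin_size) * xi_hat."""
    eta_hat = np.asarray(eta_hat, dtype=float)
    xi_hat = np.asarray(xi_hat, dtype=float)
    half_width = multiplier / np.sqrt(bin_size) * xi_hat
    return eta_hat - half_width, eta_hat + half_width

# Hypothetical smoothed medians (bias) and smoothed MADs at three bins.
eta_hat = np.array([0.02, -0.05, 0.30])
xi_hat = np.array([0.25, 0.20, 0.15])
lower, upper = bias_confidence_band(eta_hat, xi_hat)

# A bin shows no detectable bias when its band contains zero.
covers_zero = (lower <= 0.0) & (upper >= 0.0)
```

Here the first two bins have bands that include zero, while the third bin, with a fitted bias of 0.30, does not.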
Mean Squared Error. We obtained the mean squared error (MSE) of the
difference in expression values (including variance and squared bias)
$$MSE_k = E[D_{(1)k}^2] = \operatorname{var}[D_{(1)k}] + E[D_{(1)k}]^2 = \sigma_k^2 + \mu_k^2,$$
Figure 4: confidence band of the bias for the human brain tissue data. A) without normalization, B) after median normalization, C) after quantile normalization and D) after cyclic loess. Axes: A, average of expressions, and M, difference of expressions.
which can be estimated by the smoothed estimates
$$\left[ \frac{\hat{\xi}(ma_k)}{0.6745} \right]^2 + \hat{\eta}(ma_k)^2$$
(see Huber 2003). The estimated MSE for quantile normalization is smallest
when average expression values are greater than noise levels of measurements,
and the estimated MSE for cyclic loess is slightly larger than that of quantile
normalization across all average expression values. Median normalization performed
similarly to no normalization (Figure 5).
To evaluate the global bias and variance for each method, we averaged MSEs
across expression levels greater than 4.5; the value 4.5 (log base 2 transformed)
Figure 5: MSE curves for the brain tissue data without normalization (black, solid line), after median normalization (green, dashed line), after quantile normalization (red, dot-dashed line) and after cyclic loess (blue, dotted line). Axes: median of average of expressions and MSE of difference of expressions.
was selected because 95% of the blanks (spots lacking oligonucleotide probes)
gave intensities less than this value. The average MSEs for no normalization,
median normalization, quantile normalization and cyclic loess using the brain
tissue data were 0.278, 0.274, 0.225, 0.270 respectively. These results were
found consistently across the other 35 tissue types (Figure 6), where the MSEs
were lower for quantile normalization (coded 2) in almost all tissue samples
compared to no normalization (coded 0), median normalization (coded 1) and
cyclic loess (coded 3), except for human lung, human liver, human thymus,
mouse liver and mouse lung. When the normalization methods were applied
to each tissue type separately, instead of to all 72 arrays together, the results
were similar.
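The MSE estimate and its average above the noise threshold can be sketched as follows. Dividing the MAD by 0.6745 converts it to a standard deviation estimate under normality. The function names and the fitted values are hypothetical; only the constant 0.6745 and the threshold 4.5 come from the text.

```python
import numpy as np

def estimated_mse(eta_hat, xi_hat):
    """Estimated MSE per bin: (MAD / 0.6745)^2 + bias^2."""
    sigma_hat = np.asarray(xi_hat, dtype=float) / 0.6745   # MAD to standard deviation
    return sigma_hat ** 2 + np.asarray(eta_hat, dtype=float) ** 2

def average_mse(ma, eta_hat, xi_hat, noise_level=4.5):
    """Average the estimated MSE over bins whose median average exceeds the threshold."""
    mse = estimated_mse(eta_hat, xi_hat)
    keep = np.asarray(ma, dtype=float) > noise_level
    return mse[keep].mean()

# Hypothetical smoothed estimates at four bins; the first bin is below 4.5.
ma = np.array([3.0, 5.0, 8.0, 12.0])
eta_hat = np.array([0.0, 0.1, 0.2, 0.3])
xi_hat = np.full(4, 0.6745 * 0.5)   # corresponds to a standard deviation of 0.5
avg = average_mse(ma, eta_hat, xi_hat)
```

In this toy case the per-bin MSEs are 0.25, 0.26, 0.29 and 0.34, and the average over the three bins above the threshold is 0.89/3.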
Checking for Scale Compression. It is possible that the superior results for
Figure 6: mean of MSEs for the difference in expression values without normalization (0 and black), after median normalization (1 and green), after quantile normalization (2 and red) and after cyclic loess (3 and blue). Axes: mean of MSEs and tissue (Esophagus, Colon, Cervix, Lung, Brain, Bladder, Liver, Kidney, Adipose, Heart, Thymus, Ovary, Placenta, Testes, Thyroid, Skeletal Muscle, Small Intestine, Spleen, Prostate, Trachea, Pancreas, Breast, Stomach, Uterus, Adrenal, Lymph Node, Mouse Spleen, Mouse Liver, Mouse Brain, Mouse Heart, Mouse Ovary, Mouse Embryo, Mouse Lung, Mouse Thymus, Mouse Kidney, Mouse Testicle).
quantile normalization are the result of compressing the scale downward
after transformation. To check this, we first calculated coefficients of variation
(CV) as the ratio of an estimate of the standard deviation of measurement
($\sqrt{MSE}$) for each bin to the mean expression for that bin, and then averaged the
ratios across bins. We found that the CVs followed the same pattern as the MSEs,
that is, typically lower values for quantile normalization across tissues (Figure
7). It is also possible that the superior results for quantile normalization are the
result of compressing the scale from both ends after transformation, thereby reducing
the spread and sensitivity of transformed measurements. To check this, we
calculated the average variance of expression levels across the 36 tissues for each
miRNA. This variance consists of true variance across tissues and measurement
error as obtained with the MSE. Averaging the variance across miRNAs and the
MSEs across tissues, we found that the ratios of signal (true) variance to noise
(measurement error) variance were 12.0, 14.0, 16.3 and 16.3 for no, median, quantile
and cyclic loess normalization, respectively.
Figure 7: mean of CVs for the difference in expression values without normalization (0 and black), after median normalization (1 and green), after quantile normalization (2 and red) and after cyclic loess (3 and blue). Axes: mean of CVs and tissue (the same 36 tissues as in Figure 6).
Comparative Study. We compared real-time RT-PCR miRNA data (Lee et al.
2008) with our microarray miRNA data, since twenty-one tissues were common
to both datasets. Specifically, we focused on brain and heart, since these tissues
are quite biologically distinct and have substantial differences in their miRNA
expression profiles. If a normalization technique were overly aggressive, then
there would be an “averaging-out” effect, leading to a significant decrease in the
number of differentially expressed miRNAs. A well known difference between
microarray and RT-PCR data is that the fold changes observed by microarray
tend to be compressed in comparison with fold changes observed by RT-PCR.
We found that 51 miRNAs were characterized by a four-fold difference in
expression by RT-PCR. For the microarray data on the same miRNAs, we found that
36, 35, 35 and 35 miRNAs were two-fold differentially expressed for no, median,
cyclic loess and quantile normalization, respectively. This set of miRNAs was
found to have roughly a 70% overlap with the RT-PCR data. The observed values
for fold changes varied little with respect to the normalization method used.
We therefore could not conclude that any normalization method is superior based
strictly on this analysis, but we could at least conclude that quantile
normalization is not worse than the other methods in terms of its sensitivity.
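The counting behind this comparison can be sketched as follows. This is only an illustration under stated assumptions: expression values are taken to be on a log2 scale, the miRNA names and values are made up, and the helper names `differentially_expressed` and `overlap_fraction` are hypothetical. If the overlap is read as the fraction of the 51 RT-PCR miRNAs that are also two-fold different on the array, then 36/51 is about 0.71 and 35/51 is about 0.69, consistent with the "roughly 70%" reported above.

```python
import math

def differentially_expressed(log2_a, log2_b, fold):
    """Names whose absolute log2 difference between two tissues is at least
    log2(fold). Inputs are dicts mapping miRNA name -> log2 expression."""
    threshold = math.log2(fold)
    shared = log2_a.keys() & log2_b.keys()
    return {m for m in shared if abs(log2_a[m] - log2_b[m]) >= threshold}

def overlap_fraction(pcr_set, array_set):
    """Fraction of the RT-PCR set that is also flagged on the microarray."""
    if not pcr_set:
        return 0.0
    return len(pcr_set & array_set) / len(pcr_set)

# Made-up log2 values for brain and heart on each platform.
pcr_brain = {"miR-x": 10.0, "miR-y": 5.0, "miR-z": 7.0}
pcr_heart = {"miR-x": 7.0, "miR-y": 5.5, "miR-z": 4.0}
array_brain = {"miR-x": 9.0, "miR-y": 6.0, "miR-z": 6.5}
array_heart = {"miR-x": 7.5, "miR-y": 6.0, "miR-z": 6.0}

pcr_hits = differentially_expressed(pcr_brain, pcr_heart, fold=4)
array_hits = differentially_expressed(array_brain, array_heart, fold=2)
print(sorted(pcr_hits), sorted(array_hits), overlap_fraction(pcr_hits, array_hits))
```

Using a lower fold threshold on the array than on RT-PCR mirrors the text's point that microarray fold changes tend to be compressed.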
5 Conclusion
We showed that the quantile normalization method works best in reducing dif-
ferences in miRNA expression values for duplicate tissue samples. Cyclic loess
works slightly worse than quantile normalization, whereas no normalization and
median normalization behave similarly and seem to be inferior to quantile nor-
malization and cyclic loess with regard to bias. This is not surprising, because
quantile normalization adjusts better for differential bias across the scale of
expression values. By showing that the total MSE was lower across almost all
36 tissue samples, we were assured that the bias correction provided by quan-
tile normalization was not outweighed by additional error variance that can arise
from a more complex normalization method. Furthermore, we showed that quan-
tile normalization does not achieve smaller replication error by compressing the
scale downward or by compressing the scale from both ends.
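For readers unfamiliar with the method, the core idea of quantile normalization can be sketched in a few lines. This is a generic illustration, not the implementation used in this study: it assumes a complete matrix of log-expression values with probes in rows and arrays in columns, breaks ties by order of appearance rather than averaging, and the function name `quantile_normalize` is hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (array) the same empirical distribution.

    x: array of shape (n_probes, n_arrays). The reference distribution is the
    row-wise mean of the column-sorted matrix; each value is then replaced by
    the reference value at its within-column rank.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each entry in its column
    reference = np.sort(x, axis=0).mean(axis=1)       # mean of the k-th smallest values
    return reference[ranks]

toy = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 4.0, 6.0],
                [4.0, 2.0, 8.0]])
print(quantile_normalize(toy))
```

Because only within-array ranks are used, the ordering of miRNAs within each array is preserved while differences in scale and shape between arrays are removed.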
References
Babak, T., Zhang, W., Morris, Q., Blencowe, B. and Hughes, T. (2004). Prob-
ing microRNAs with microarrays: Tissue specificity and functional inference,
RNA 10: 1813–1819.
Barad, O., Meiri, E., Avniel, A., Aharonov, R., Barzilai, A., Bentwich, I., Einav,
U., Gilad, S., Hurban, P., Karov, Y., Lobenhofer, E. K., Sharon, E., Shibo-
leth, Y. M., Shtutman, M., Bentwich, Z. and Einat, P. (2004). MicroRNA ex-
pression detected by oligonucleotide microarrays: System establishment and
expression profiling in human tissues, Genome Research 14: 2486–2494.
Bolstad, B. M., Irizarry, R. A., Astrand, M. and Speed, T. P. (2003). A com-
parison of normalization methods for high density oligonucleotide array data
based on variance and bias, Bioinformatics 19: 185–193.
Calin, G., Ferracin, M., Cimmino, A., DiLeva, G., Shimizu, M., Wojcik, S.,
Iorio, M., Visone, R., Sever, N., Fabbri, M., Iuliano, R., Palumbo, T., Pichiorri,
F., Roldo, C., Garzon, R., Sevignani, C., Rassenti, L., Alder, H., Volinia, S.,
Liu, C. G., Kipps, T. J., Negrini, M. and Croce, C. M. (2005). A microRNA
signature associated with prognosis and progression in chronic lymphocytic
leukemia, The New England Journal of Medicine 353: 1793–1801.
Chang, T., Yu, D., Lee, Y., Wentzel, E., Arking, D., West, K., Dang, C. V.,
Thomas-Tikhonenko, A. and Mendell, J. T. (2008). Widespread microRNA
repression by Myc contributes to tumorigenesis, Nature Genetics 40(1): 43–50.
Churchill, G. A. (2002). Fundamentals of experimental design for cDNA microar-
rays, Nature Genetics 32: 490–495.
Churchill, G. A. (2003). Discussion to statistical challenges in functional
genomics-comment, Statistical Science 18: 64–69.
Churchill, G. A. and Oliver, B. (2001). Sex, flies and microarrays, Nature Ge-
netics 29: 355–356.
Davison, T., Johnson, C. and Andruss, B. (2006). Analyzing micro-RNA ex-
pression using microarrays, Methods in Enzymology 411: 14–34.
Garzon, R., Pichiorri, F., Palumbo, T., Iuliano, R., Cimmino, A., Aqeilan, R.,
Volinia, S., Bhatt, D., Alder, H., Marcucci, G., Carlin, G., Liu, C. G., Bloom-
field, C., Andreeff, M. and Croce, C. (2006). MiRNA fingerprints during hu-
man megakaryocytopoiesis, Proceedings of the National Academy of Sciences
of the United States of America 101: 5078–5083.
Gentleman, R., Carey, V. J., Huber, W., Irizarry, R. and Dudoit, S. (2005). Bioin-
formatics and Computational Biology Solutions Using R and Bioconductor,
Springer: New York.
Gu, C. (2002). Smoothing Spline ANOVA Models, Springer: New York.
Hagan, J. and Croce, C. (2007). MicroRNAs in carcinogenesis, Cytogenetic and
Genome Research 118: 252–259.
Hoaglin, D. C., Mosteller, F. and Tukey, J. W. (2000). Understanding Robust
and Exploratory Data Analysis, John Wiley & Sons.
Huber, P. (2003). Robust Statistics, John Wiley & Sons.
Irizarry, R. A., Hobbs, B., Collin, F. and Speed, T. (2003). Exploration, nor-
malization, and summaries of high density oligonucleotide array probe level
data, Biostatistics 4: 249–264.
Kerr, M. K. and Churchill, G. (2001). Experimental design for gene expression
microarrays, Biostatistics 2: 183–201.
Kumar, M., Lu, J., Mercer, K., Golub, T. and Jacks, T. (2007). Impaired mi-
croRNA processing enhances cellular transformation and tumorigenesis, Na-
ture Genetics 39(5): 673–677.
Lee, E., Baek, M., Gusev, Y., Brackett, D. J., Nuovo, G. and Schmittgen, T.
(2008). Systematic evaluation of microRNA processing patterns in tissues,
cell lines, and tumors, RNA 14: 35–42.
Lin, Y., Nadler, S. T., Lan, H., Attie, A. D. and Yandell, B. S. (2003). Adaptive
gene picking with microarray data: detecting important low abundance sig-
nals, in G. Parmigiani, E. S. Garrett, R. A. Irizarry and S. L. Zeger (eds), The
Analysis of Gene Expression Data: Methods and Software, Springer-Verlag.
Liu, C., Calin, G., Meloon, B., Gamliel, N., Sevignani, C., Ferracin, M., Du-
mitru, C., Shimizu, M., Zupo, S., Dono, M., Alder, H., Bullrich, F., Negrini,
M. and Croce, C. (2004). An oligonucleotide microchip for genome-wide mi-
croRNA profiling in human and mouse tissues, Proceedings of the National
Academy of Sciences of the United States of America 101(26): 9740–9744.
Lu, J., Getz, G., Miska, E., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-
Cordero, A., Ebert, B. L., Mark, R., Ferrando, A., R., D. J., Jacks, T., Horvitz,
H. R. and Golub, T. R. (2005). MicroRNA expression profiles classify human
cancers, Nature 435(7043): 843–848.
Oshlack, A., Emslie, D., Corcoran, L. and Smyth, G. (2007). Normalization
of boutique two-color microarrays with a high proportion of differentially ex-
pressed probes, Genome Biology 8(1):R2.
Quackenbush, J. (2002). Microarray data normalization and transformation, Na-
ture Genetics 32: 496–501.
Volinia, S., Calin, G., Liu, C., Ambs, S., Cimmino, A., Petrocca, F., Visone,
R., Iorio, M., Roldo, C., Ferracin, M., Prueitt, R., Yanaihara, N., Lanza, G.,
Scarpa, A., Vecchione, A., Negrini, M., Harris, C. and Croce, C. (2006). A
microRNA expression signature of human solid tumors defines cancer gene
targets, Proceedings of the National Academy of Sciences of the United States
of America 103(7): 2257–2261.
Wolfinger, R., Gibson, G., Wolfinger, E. D., Bennett, L., Hamadeh, H., Bushe,
P., Afsha, C. and Paules, R. (2001). Assessing gene significance from cDNA
microarray expression data via mixed models, Journal of Computational Bi-
ology 8: 625–637.
Yanaihara, N., Caplen, N., Bowman, E., Seike, M., Kumamoto, K., Yi, M.,
Stephens, R., Okamoto, A., Yokota, J., Tanaka, T., Carlin, G., Liu, C. G.,
Croce, C. and Harris, C. (2006). Unique miRNA molecular profiles in lung
cancer diagnosis and prognosis, Cancer Cell 9(3): 189–198.