Optimization of oligonucleotide-based DNA microarrays

© 2002 Oxford University Press Nucleic Acids Research, 2002, Vol. 30, No. 11 e51
Angela Relógio, Christian Schwager¹, Alexandra Richter¹, Wilhelm Ansorge¹ and Juan Valcárcel*
Gene Expression Programme and ¹Functional Genomics Technology and Instrumentation Programme, European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
Received February 25, 2002; Revised and Accepted April 12, 2002
ABSTRACT
Oligonucleotide-based DNA microarrays are becoming increasingly useful for the analysis of gene expression and single nucleotide polymorphisms. Here we report a systematic study of the sensitivity, specificity and dynamic range of microarray signals and their dependence on the labeling and hybridization conditions as well as on the length, concentration, attachment moiety and purity of the oligonucleotides. Both a controlled set of in vitro synthesized transcripts and RNAs from biological samples were used in these experiments. An algorithm is presented that allows the efficient selection of oligonucleotides able to discriminate a single nucleotide mismatch. Critical parameters for various applications are discussed based on statistical analysis of the results. These data will facilitate the design and standardization of custom-made microarrays applicable to gene expression profiling and sequencing analyses.
INTRODUCTION
DNA microarrays hold the promise of becoming a revolutionary tool for large-scale parallel analyses of genome sequence and gene expression (1–4). Current applications range from global analyses of transcriptional programmes in yeast or mammals (5,6) to establishment of novel criteria for the classification and evaluation of clinical course of tumors (7–10) and to accelerated discovery of drug targets (11,12). Methods for microarray fabrication include spotting of DNA onto nylon membranes or glass slides by robots with pins or ink jet printers (13,14). The DNA spotted corresponds to fragments of genomic DNA, cDNAs, PCR products or chemically synthesized oligonucleotides (15). cDNA arrays are often used in RNA expression analysis, while oligonucleotide arrays are additionally used for sequence analyses. Oligonucleotides can also be synthesized in situ on the surface of the array by means of light-directed combinatorial synthesis (photolithography) or ink jet technologies, which allow microarrays of higher density. Current state-of-the-art technology allows the inclusion of more than 400 000 sequences representing up to 13 000 genes and expressed sequence tags (see below) on a surface of 1.6 cm² (16).
Oligonucleotide-based microarrays offer a number of advantages
over cDNA microarrays, including (i) more controlled specificity
of hybridization, which makes them particularly useful for the
analysis of single nucleotide polymorphisms (17) or mutational
analysis (18,19); (ii) versatility to address subtle questions about
transcriptome composition such as the presence and prevalence of
alternatively spliced or alternatively polyadenylated transcripts
(20,21); (iii) capacity to systematically screen whole genomic
regions for gene discovery (22,23); and (iv) the fact that
only sequence information (not biological samples or cDNA
collections) is required to generate custom-made microarrays.
Despite the predictable impact and widespread use of oligonucleotide-based microarrays, there is a paucity of publicly available information regarding the design and use of this technology. In addition there is a need for standardization that will facilitate comparison of microarray data (24). Basic questions such as the number of oligonucleotides required for reliable detection of an RNA can have profound practical consequences for the quality and financial feasibility of specific experiments or projects.
Early reports using in situ photolithographic synthesis employed 300 probe pairs (match and single mismatch control) of 15 nt in length per gene studied. Statistical analysis of the hybridization signals allowed the detection of 2-fold changes in the levels of cytokine mRNAs in T lymphocytes under a variety of physiological stimuli (25). Improved photolithographic in situ synthesis resulting in reliable longer oligonucleotides, together with statistical analyses of the data, allowed these investigators to reduce the number of probes required per gene to 20 oligonucleotide pairs of 25 nt in length (8,26), and more recently to 16 oligonucleotide pairs per gene (16).
Ink jet procedures have allowed in situ synthesis of longer oligos. A single 60 nt-long oligo per gene has rendered results comparable with those obtained using cDNA microarrays, and has allowed functional annotation of complete chromosomes under dozens of experimental conditions (22). One disadvantage of long oligonucleotide probes, shared with cDNA microarrays, is the difficulty of generating reliable mismatch controls that will assess the specificity of hybridization.
Here we report results of experiments designed to optimize the selection of oligonucleotides and the performance of oligonucleotide-based microarrays. Specificity, sensitivity and dynamic range of the signals were analyzed with regard to characteristics of the oligonucleotide (e.g. length and purity), hybridization conditions, labeling method and other parameters, both in a controlled system composed of in vitro transcribed RNAs and using mRNA from mammalian cells.
*To whom correspondence should be addressed. Tel: +49 6221 387 156; Fax: +49 6221 387 518; Email: juan.valcarcel@embl-heidelberg.de
MATERIALS AND METHODS
Oligonucleotide selection
Oligonucleotides were selected using modified Gene Skipper software and selection rules modified from published criteria (25). The algorithm used applies the following set of hierarchical conditions. (i) Exclusion of oligonucleotides with ‘adverse’ base composition: total number of As or Ts less than 10; total number of Cs or Gs less than six; no more than six As or Ts in a row; a palindrome score (a measure of probe self-complementarity) of <7 nt. (ii) Selection of sets of oligonucleotides with homogeneously high melting temperature. (iii) Exclusion of oligonucleotides with perfect complementarity to other sequences present in the set of genes to be analyzed. (iv) Exclusion of oligonucleotides with ability to form hairpin loops. The program also selects the corresponding mismatch control oligonucleotides, containing a single transversion in the central position. This software is freely available by email request to schwager@embl-heidelberg.de.
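As a rough illustration of condition (i) and of the mismatch control, the following minimal sketch filters a candidate oligo by base composition and builds its central-position mismatch. It is not the Gene Skipper implementation. Several readings are assumptions made here: the listed limits are treated as acceptance thresholds applied to each base separately, a "row" of As or Ts is read as any uninterrupted stretch of A/T positions, and the transversion is taken to be the complementary base. The thresholds are exposed as parameters because the stated limits may not suit every oligo length. The palindrome score, melting temperature, cross-complementarity and hairpin checks are not covered.

```python
import re

# Assumption: the mismatch base is the complement of the original base
# (A<->T, G<->C); both swaps are transversions (purine <-> pyrimidine).
TRANSVERSION = {"A": "T", "T": "A", "G": "C", "C": "G"}


def passes_base_composition(oligo, max_a_or_t=9, max_c_or_g=5, max_at_run=6):
    """Return True if the oligo passes the composition limits as read here."""
    seq = oligo.upper()
    # "Total number of As or Ts less than 10", read per base (assumption).
    if seq.count("A") > max_a_or_t or seq.count("T") > max_a_or_t:
        return False
    # "Total number of Cs or Gs less than six", read per base (assumption).
    if seq.count("C") > max_c_or_g or seq.count("G") > max_c_or_g:
        return False
    # "No more than six As or Ts in a row", read as a run of A/T positions.
    if re.search("[AT]{%d,}" % (max_at_run + 1), seq):
        return False
    return True


def mismatch_control(oligo):
    """Return the oligo with a single transversion at its central position."""
    seq = oligo.upper()
    mid = len(seq) // 2
    return seq[:mid] + TRANSVERSION[seq[mid]] + seq[mid + 1:]


probe = "ATGCATGCATGCATGCATGCATATA"  # made-up 25-mer, not from the paper
print(passes_base_composition(probe), mismatch_control(probe))
```

For a 25-mer the central position is index 12 (zero-based), so the perfect match and mismatch probes differ at exactly one base.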
Spotting and attachment to glass slides
Unless indicated, HPLC-purified oligonucleotides containing an amino group and six carbon spacer at the 5′ end were spotted onto aminosilane-coated glass slides using either a GMS 417 spotter (Affymetrix) or a SDDC Microarray spotter (Engineering Systems Inc.), with equivalent results. Fifty picoliters of oligonucleotide solutions were spotted at concentrations of 30–100 µM. Attachment was achieved by incubating the coated glass slides with the spotted oligos for 4 h at 60°C and 10 min at 120°C. Alternative attachment protocols, e.g. overnight incubation at 37°C, resulted in decreased sensitivity. Attachment onto epoxy surfaces or after acid treatment of glass slides was as described previously (27).
Preparation and labeling of in vitro transcripts
Templates for transcription of selected genes were generated by PCR from the corresponding cDNAs using oligonucleotides containing a T7 promoter in the (–) strand. Standard 25 µl in vitro transcription reactions were set up containing 100 µM fluorescently labeled nucleotides [cyanine 5-CTP (Cy5) or cyanine 3-CTP (Cy3); NEN], 200 µM CTP, 500 µM ATP, UTP, GTP (Amersham Pharmacia), 200 ng of template DNA and 1.6 U/µl T7 RNA polymerase (Promega). After incubation for 2 h at 37°C, the DNA template was digested with 10 U of DNase I (Promega) at 37°C for 30 min. The samples were then purified using Chroma-Spin columns (Clontech) and stored at –20°C. To precisely measure the amount of RNA synthesized, an aliquot of the reaction was spiked with a trace of [α-³²P]GTP. The transcripts were quantified after fractionation by denaturing polyacrylamide gel electrophoresis, excising the corresponding band and measuring radioactivity with a liquid scintillation counter.
Preparation and labeling of RNA from HeLa cells
Total RNA was extracted from HeLa cells using the RNeasy kit (Qiagen) and the concentration estimated by measuring optical density at 260 nm. Poly(A)+ purification was carried out using oligo-dT cellulose columns (28).
For direct cDNA labeling, 2 µg of poly(A)+ RNA was incubated in a 25 µl reaction containing 200 pmol of 14-nt random primers and 200 pmol oligo-dT (12–18 nt in length), heated at 70°C for 10 min and then left on ice for 5 min. cDNA synthesis was carried out in 55 µl reactions containing 400 U Superscript II Reverse Transcriptase (Invitrogen), 100 µM Cy5-dUTP (New England Nuclear), 200 µM dTTP, 500 µM dATP, dCTP, dGTP (Amersham Pharmacia), and the buffer conditions recommended by the manufacturer. After 2 h incubation at 42°C the reaction was stopped by incubating at 65°C for 10 min in the presence of 50 mM NaOH and 1 mM EDTA in a final volume of 58 µl. Labeled cDNA was purified using Chroma-Spin columns +STE-10 (Clontech) and stored at –20°C.
For cRNA labeling, either 5–20 µg of total HeLa RNA or 2 µg of poly(A)+ RNA was used. Signals obtained using 20 µg of total HeLa RNA and 2 µg of poly(A)+ RNA were comparable. RNA was incubated with 8 µM T7-dT(24) primer in a 25 µl volume at 70°C for 10 min and then incubated at 4°C for 5 min. First strand synthesis was carried out in a 41.7 µl reaction containing 420 U Superscript II Reverse Transcriptase (Invitrogen), 500 µM dNTP mix (Amersham Pharmacia) and 10 mM DTT, under the buffer conditions recommended by the manufacturer. After 1 h incubation at 37°C the reaction mixture was chilled on ice for 5 min and then the second strand was synthesized in a 75 µl reaction mix containing 20 U DNA polymerase I, 5 U DNA ligase, 5 U RNase H (all three enzymes from Invitrogen), 200 µM dNTP mix, under the buffer conditions recommended by the manufacturer, incubated for 2 h at 16°C in a Thermocycler, and the reaction stopped with 60 mM EDTA in a total final volume of 85 µl. After phenol/chloroform extraction and ethanol precipitation in the presence of 20 µg glycogen (Roche), the pellet was washed twice with 70% ethanol, dried and resuspended in 6 µl of distilled water. T7 transcription was carried out overnight at 37°C in 25 µl using one-fourth of cDNA, 160 U T7 Polymerase (Promega), 1 mM DTT, 500 µM ATP, UTP, GTP, 250 µM CTP and 100 µM Cy5-CTP. After digestion with 1 U of RNase-free DNase for 30 min, labeled cRNA was purified twice using Chroma-Spin columns (Clontech).
Fragmentation of labeled cRNA was achieved by incubation of 15 µg of cRNA in 20 mM Tris acetate pH 8.1, 50 mM potassium acetate, 15 mM magnesium acetate for 15 min at 94°C.
Hybridization and washing
Slides were incubated in a glass chamber for 45 min at 42°C with pre-warmed pre-hybridization buffer (6× SSC, 0.5% SDS, 1% BSA) and subsequently quickly washed with distilled water pre-warmed at the same temperature and dried by short centrifugation.
Poly(A)+ (5 µg) and 1 µg of human Cot DNA were added to the sample, dried in a speed vac at 45°C and redissolved in 12 µl (for a 24 × 24 mm cover slip) of hybridization buffer (50% formamide, 6× SSC, 0.5% SDS, 5× Denhardt’s solution; 58% formamide for RNA from HeLa cells). Hybridization was carried out in a humid chamber for 16 h. Washings were performed twice at 42°C inside a glass chamber containing 0.1× SSC, 0.1% SDS for 5 min and twice more in 0.1× SSC for 5 min. Washes at higher temperatures (47, 55 and 62°C) or lower concentrations of SSC (0.03× SSC, 0.01× SSC or water) resulted in very significant losses in fluorescent signals. Slides were subsequently dried by brief centrifugation.
Data analyses
Microarrays were scanned using either a GMS 418 array scanner
(Affymetrix) or a GenePix 4000B simultaneous dual wavelength
scanner (Axon Instruments Inc.). The data obtained were analyzed
using Chip Skipper software. This software is freely available by
email request to schwager@embl-heidelberg.de.
The intensity values per spot were determined by creating a
circle adjusted to the size of the spot (diameter 15 µm
centered on the spot) and integrating the intensity value per
pixel in that area. Background was extracted by determining
the median values of pixels located on the perimeter of a
square surrounding the circle centered on the same position.
Intensity values were normalized using as spotting controls a
mix of oligonucleotides of known concentration labeled with
Cy5 and Cy3.
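The spot quantification described above can be sketched as follows. This is a minimal illustration, not the Chip Skipper code: the image is a plain list of pixel rows, the circle is given in pixel units rather than µm, the surrounding square is taken to be one pixel wider than the circle, and the local background is removed by subtracting the perimeter median from every pixel inside the circle. All of these choices are assumptions made for the example.

```python
from statistics import median


def spot_signal(image, cx, cy, radius):
    """Integrate pixels inside a circle and subtract the local background.

    image is a list of rows of pixel intensities; cx, cy and radius are in pixels.
    """
    n_rows, n_cols = len(image), len(image[0])

    # Sum the intensity of every pixel whose centre lies inside the circle.
    total, n_pixels = 0.0, 0
    for y in range(max(0, cy - radius), min(n_rows, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(n_cols, cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total += image[y][x]
                n_pixels += 1

    # Background: median of the pixels on the perimeter of a square that
    # surrounds the circle (one pixel wider than the radius, an assumption).
    half = radius + 1
    perimeter = [
        image[y][x]
        for y in range(cy - half, cy + half + 1)
        for x in range(cx - half, cx + half + 1)
        if max(abs(x - cx), abs(y - cy)) == half
        and 0 <= y < n_rows and 0 <= x < n_cols
    ]
    background = median(perimeter)

    # Assumption: the per-pixel background is removed from every spot pixel.
    return total - background * n_pixels


# Synthetic 9 x 9 image: flat background of 10 with a bright spot of 110.
image = [[10] * 9 for _ in range(9)]
for y in range(9):
    for x in range(9):
        if (x - 4) ** 2 + (y - 4) ** 2 <= 4:
            image[y][x] = 110
print(spot_signal(image, 4, 4, 2))
```

Using the median rather than the mean of the perimeter pixels keeps the background estimate robust to a few bright pixels from dust or a neighbouring spot.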
RESULTS
RNAs corresponding to the antisense sequence of five
eukaryotic RNA binding proteins were transcribed in vitro
in the presence of fluorescent Cy5- or Cy3-labeled nucleotides.
The rationale for the use of antisense transcripts was to allow
direct comparisons with hybridization of labeled cDNAs or
cRNAs generated in subsequent experiments aimed to analyze
mRNAs from biological samples (see below). A trace of
radioactive nucleotides was used in the transcription reactions
to allow the quantification of the yield of purified RNAs. The
RNAs were hybridized to an oligonucleotide microarray where
HPLC-purified 5′-amino-modified oligos of 25, 30 or 35 nt in
length were printed on the activated surface of a glass slide
(see Materials and Methods). Figure 1A indicates the layout of
oligos corresponding to one of the genes. Fifteen non-overlapping
oligos corresponding to each length, complementary in
sequence to the corresponding transcript, were selected
according to the algorithm described in Materials and
Methods, and printed in triplicate. Mismatch controls
containing a single transversion change in the middle position of
each oligo were printed in triplicate next to the corresponding
perfect match oligo. The sequences of oligonucleotides used
for the analysis of two of the genes are provided as Supplementary
Material. After hybridization to in vitro transcribed RNA and washing, fluorescent signals were detected using a confocal microarray scanner.
Figure 1. Hybridization of in vitro transcribed RNAs to oligonucleotide microarrays. (A) Layout of oligos corresponding to one gene, and fluorescent scan of the hybridization data corresponding to oligos for one of the genes under study (U2AF35). (B) Microarray layout for the five genes under study and fluorescent scan of hybridization data to the complete set of in vitro transcribed RNAs. The layout of oligo lengths and M/MM controls is indicated. The white box surrounds an area where the oligos spotted correspond to regions of the mRNA not present in our in vitro transcripts. (C) As in (B), without RNA corresponding to the gene SXL. (D) As in (B), with only SXL RNA.
Specificity and sensitivity optimization
The results shown in Figure 1A indicate that the signals associated
with perfect match oligos were stronger than those associated
with their mismatch controls.
Figure 1B shows similar results for the five genes studied.
Triplets of fluorescent signals present at the bottom-right position
of each quarter correspond to fluorescent markers used as spotting
controls. As a first test of specificity, one of the RNAs (SXL)
was omitted from the hybridization mix. A reduction in the
fluorescent signals corresponding to SXL oligonucleotides
was observed (compare Fig. 1B and C). As a second test, only
SXL transcripts were hybridized to the microarray. Figure 1D
shows that little fluorescence was detected associated with
oligos corresponding to genes different from SXL. A third
specificity test was built in the design of the microarray.
Oligos corresponding to sequences in the 3′-untranslated region (3′-UTR) of some of the genes were present in the microarray, whereas labeled in vitro synthesized RNAs were limited to the open reading frames. Hybridization to oligos complementary to the UTR regions was undetectable for most probes (positions inside the white rectangles in Fig. 1B).
Taken together, the results of Figure 1 indicate that hybridization to a significant fraction of the oligonucleotides selected is specific, as shown by the discrimination of single nucleotide mismatches.
Table 1 summarizes quantitative information obtained from
at least three independent experiments carried out as in Figure 1B.
RNAs labeled with Cy5 and RNAs labeled with Cy3 were
used in each experiment, thus providing a duplicate read out of
each result. As signals for each oligo were obtained in triplicate,
the figures in Tables 1 and 2 correspond to the average and
standard deviation of at least 270 independent measurements
for each gene and oligo length. The data were further validated
by results from more than 40 hybridization experiments.
Oligo length
Two main conclusions can be drawn from these data. First, a
decrease in match/mismatch (M/MM) ratio was observed with
the increase in oligo length. This is particularly clear when the
average median values are considered (from 4.4 for 25mers to
1.8 for 35mers). This trend is expected, as longer oligos are
more likely to energetically accommodate a single nucleotide
mismatch at a central position. Statistically, 75% of the oligonucleotides
selected showed a M/MM discrimination >2-fold.
The second conclusion is that up to 4-fold differences in M/MM
ratios could be observed for the different genes studied. These
could not be attributed to overall differences in G+C content,
or other obvious sequence features.
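The average and median M/MM values quoted for each oligo length can be recomputed from the per-gene ratios in Table 1. The short sketch below does this with the standard library; the variable and function names are illustrative only.

```python
from statistics import mean, median

# Per-gene M/MM ratios from Table 1, in the order
# U2AF65, Srp20, U2AF35, TIA1, SXL.
M_MM = {
    "25mer": [3.3, 11.1, 3.6, 4.6, 4.4],
    "30mer": [2.7, 9.7, 4.3, 2.6, 2.2],
    "35mer": [1.8, 8.2, 2.8, 1.8, 1.7],
}


def summarize(ratios):
    """Return (average rounded to one decimal, median) for a list of ratios."""
    return round(mean(ratios), 1), median(ratios)


summary = {length: summarize(values) for length, values in M_MM.items()}
for length, (avg, med) in summary.items():
    print(length, avg, med)
```

The averages sit above the medians at every length because one gene (Srp20) has a much higher ratio than the other four, which is why the median is the clearer indicator of the length trend.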
The effect of oligonucleotide length on the intensity of the
fluorescent signals was also analyzed. Table 2 shows the ratios
between the intensity of the signals for each gene and oligo
length. While the signals for 30 and 35mer oligonucleotides
were 2–5 times higher than 25mers, no significant difference
was observed between oligos of 30 and 35 nt. Once again,
differences in microarray performance were observed for
different genes. Sensitivity measurements indicated that 0.1 ng (0.3 fmol) could be routinely detected. This level of sensitivity would enable the detection of one specific mRNA present at 0.01% in 1 µg of poly(A)+ RNA. This would be sufficient to detect low abundance mRNA species (e.g. PPAR-α, HMGcoA), but very low abundance transcripts (e.g. Fas or Insulin receptor) may be difficult to detect (29).
Although the threshold of detection depended on the sensitivity of
the scanner used, similar M/MM ratios were obtained with
different scanners.
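The sensitivity figures above can be cross-checked with simple unit arithmetic. The sketch below is illustrative: the average mass of about 330 g/mol per RNA nucleotide is a rough approximation assumed here, not a value from the text, and the function names are hypothetical.

```python
# Rough average mass of one RNA nucleotide in g/mol (assumption).
AVG_NT_MASS = 330.0


def implied_length_nt(mass_ng, amount_fmol, nt_mass=AVG_NT_MASS):
    """Transcript length implied by a mass and a molar amount of RNA."""
    grams = mass_ng * 1e-9
    moles = amount_fmol * 1e-15
    return grams / moles / nt_mass


def mass_at_abundance_ng(total_ug, fraction):
    """Mass in ng of one RNA species present at a given mass fraction."""
    return total_ug * 1e3 * fraction


# 0.1 ng corresponding to 0.3 fmol implies a transcript of roughly 1 kb.
print(round(implied_length_nt(0.1, 0.3)))
# 0.01% of 1 µg of poly(A)+ RNA is about 0.1 ng, the stated detection limit.
print(mass_at_abundance_ng(1.0, 0.0001))
```

Under this approximation, the two statements in the text (0.1 ng equals 0.3 fmol, and 0.01% of 1 µg) are mutually consistent for transcripts of about 1000 nt.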
Table 1. Variation of microarray sensitivity and specificity with oligo length: M/MM ratios for oligos corresponding to the different genes and oligonucleotide lengths
Average and median M/MM values, as well as percentage of oligos with a M/MM discrimination >2, are indicated.
Length   M/MM                                                                         %M/MM>2
         U2AF65      Srp20        U2AF35      TIA1        SXL         Average  Median   Average  Median
25mer    3.3 ± 0.6   11.1 ± 1.3   3.6 ± 0.1   4.6 ± 0.1   4.4 ± 0.4   5.4      4.4      77%      75%
30mer    2.7 ± 0.6   9.7 ± 0.4    4.3 ± 0.2   2.6 ± 0.4   2.2 ± 0.2   4.3      2.7      79%      85%
35mer    1.8 ± 0.2   8.2 ± 0.4    2.8 ± 0.2   1.8 ± 0.4   1.7 ± 0.1   3.3      1.8      74%      60%
Table 2. Variation of microarray sensitivity and specificity with oligo length: ratios between fluorescent intensities of different lengths of oligonucleotides corresponding to the indicated genes
Average, standard deviation and median M/MM values are also shown.
Length   Signal intensity ratios
         U2AF65      Srp20       U2AF35      TIA1        SXL         Average     Median
35/25    2.2 ± 0.6   3.6 ± 0.5   2.8 ± 0.5   2 ± 0.1     2.5 ± 0.1   2.6 ± 0.6   2.5
35/30    0.6 ± 0.1   0.6 ± 0.1   1.3 ± 0.4   1.1 ± 0.1   1.2 ± 0.1   1 ± 0.3     1.1
30/25    2.9 ± 0.4   5.8 ± 0.6   1.6 ± 0.1   1.7 ± 0.2   2.1 ± 0     2.8 ± 1.7   2.1
Oligo concentration
Tables 3 and 4 show the results obtained for one of the genes
(SXL) with different amounts of oligonucleotides spotted.
Equivalent results were obtained for the other four genes
studied (data not shown). Only marginal increases in specificity
and sensitivity were observed by increasing the concentrations
of the oligo in the spotting solution from 20 to 50 pmol of
oligo/µl. Although additional tests in a range of concentrations
from 10 to 100 pmol/µl registered up to 10-fold differences in
signal intensity, no significant differences were normally
observed for concentrations between 30 and 100 pmol/µl.
These observations indicate that the amount of oligonucleotide
attached at spotting concentrations between 30 and 100 µM
was not rate limiting for detection of fluorescent RNAs.
Consistent with this conclusion, hybridization of higher
amounts of target RNAs resulted in increased fluorescent
signals (data not shown).
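Since 1 pmol/µl equals 1 µM, the spotting concentrations can be converted into the amount of oligonucleotide deposited per spot. The sketch below uses the 50 pl spotting volume given in Materials and Methods; it assumes the whole volume is deposited and says nothing about the fraction that actually attaches to the slide. The function name is hypothetical.

```python
def oligo_per_spot_fmol(volume_pl, concentration_um):
    """Amount of oligo deposited per spot, in fmol.

    1 pl x 1 µM = 1e-12 l x 1e-6 mol/l = 1e-18 mol = 1e-3 fmol.
    """
    return volume_pl * concentration_um * 1e-3


# Spotting concentrations in µM (equivalently pmol/µl) at 50 pl per spot.
for conc in (20, 30, 50, 100):
    print(conc, oligo_per_spot_fmol(50, conc))
```

At 50 pl per spot, the 30–100 µM range corresponds to roughly 1.5–5 fmol of oligonucleotide deposited per spot.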
The performance of different attachment chemistries was also tested. Higher sensitivity (between 10- and 100-fold) was observed with silanized coating compared with pan-epoxy coating or acid treatment of the glass surface (27). Although 5′ amino modification was not strictly required, 2–4-fold increases in detection levels were observed when amino-modified oligos were used.
Hybridization temperature and formamide concentration
Next, the effect of different hybridization temperatures and percentage of formamide on sensitivity and specificity of the signals was analyzed. Figure 2 shows the results obtained for one gene (SXL) and one oligo length (30mer), which were representative of the performance of other genes and oligonucleotide lengths. Hybridization at temperatures between 4 and 25°C resulted in poor microarray performance due to low signal intensities (4°C) or high background (25°C). Therefore, a range of temperatures between 30 and 42°C was tested. Figure 2A shows that while increasing the temperature from 30 to 35°C resulted in a 25% increase in M/MM ratio (up to 40% for other genes, and not higher for shorter oligos), a further increase to 42°C did not improve (in fact, decreased) discrimination.
The reverse tendency was observed regarding hybridization
intensities. Figure 2B shows a slight decrease in hybridization
signals between 30 and 35°C, followed by an increase when
the hybridization was carried out at 42°C.
Standard hybridization solutions include 50% formamide. Absence or lower concentrations of formamide (e.g. 40%) resulted in poor fluorescent signals. The effects of increasing formamide concentration to 58% were assessed, and are represented as triangles in Figure 2. While the increase in formamide concentration caused a slight increase in M/MM discrimination, it was accompanied by a more substantial decrease in hybridization signals. Use of higher (70–80%) formamide concentrations resulted in very poor sensitivity. These effects can be explained, at least qualitatively, by the more stringent hybridization conditions imposed by the presence of formamide.
Table 3. Variation of microarray sensitivity and specificity with the concentration of oligos spotted: M/MM ratios
Average and standard deviations are shown for oligos of the indicated lengths, spotted at the concentrations indicated.
Oligo concentration (pmol/µl)   M/MM
                                25mer       30mer       35mer
50                              4.3 ± 0.6   2 ± 0.1     2 ± 0.2
30                              3.6 ± 0.6   2 ± 0.1     1.5 ± 0.1
20                              3.8 ± 0.1   2.2 ± 0.2   1.8 ± 0.1
Table 4. Variation of microarray sensitivity and specificity with the concentration of oligos spotted: fluorescent intensities for oligos of the indicated lengths, spotted at the concentrations indicated
Average and standard deviation values for all lengths are shown.
Oligo concentration (pmol/µl)   Signal intensities (×10⁶)
                                25mer         30mer         35mer         Average
50                              1.75 ± 0.07   2.49 ± 0.07   2.98 ± 0.08   2.41 ± 0.62
30                              1.75 ± 0.05   2.38 ± 0.01   2.51 ± 0.05   2.21 ± 0.41
20                              1.06 ± 0.09   2.09 ± 0.02   2.36 ± 0.07   1.84 ± 0.69
Figure 2. Variation of specificity and sensitivity of the microarray with hybridization temperature and formamide concentration. (A) Variation of M/MM ratios with temperature. Average M/MM ratios for 30-nt oligos corresponding to SXL at the indicated temperatures. Standard deviations are represented by vertical bars. Filled squares represent values at 50% formamide; triangles represent the value at 58% formamide. (B) Variation of signal intensities with temperature. Average fluorescent intensities for 30-nt oligos corresponding to SXL at the indicated temperatures. Standard deviations are represented by vertical bars. Filled squares represent values at 50% formamide; triangles represent the value at 58% formamide.
Washing temperatures were also systematically tested.
Temperatures of 47, 55 and 65°C resulted in progressive and
significant loss of signals compared with 42°C. Temperatures
of 37 or 25°C resulted in progressive loss of M/MM discrimination
compared with 42°C.
Purity and source of oligonucleotides
A potentially important issue for large-scale microarray
performance is the quality and source of oligonucleotides. To
address this, seven selected oligonucleotides corresponding to
one of the genes under study, chosen strategically to represent
oligos with different levels of performance, were obtained
from four different commercial providers. Both non-purified
and HPLC-purified oligonucleotides were obtained from three
of these providers. Tables 5 and 6 summarize the results of the
comparison. Two conclusions can be drawn from these results.
First, differences in performance of up to 70% between
providers were observed, both regarding sensitivity and M/MM
ratios. Secondly, while purified oligos could provide up to
5-fold better sensitivity, non-purified oligos showed higher
M/MM discrimination. This could be due to an increased
proportion of oligos shorter than full length in the non-purified
preparations, resulting in lower sensitivity (Table 2) while
showing higher M/MM discrimination (Table 1). Consistent
with this possibility, the degree of full length oligo in non-purified
preparations, assessed by electrophoresis on denaturing gels,
correlated with performances more comparable with those of
purified oligonucleotides (data not shown).
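The trade-off between sensitivity and discrimination can be read directly from Tables 5 and 6. The sketch below computes, for the three providers that supplied both preparations, the fold gain in signal from HPLC purification and the fold gain in M/MM ratio from using non-purified oligos. It works on the tabulated averages only and ignores the reported standard deviations; the names are illustrative.

```python
# Averages from Tables 5 and 6 for providers 1-3
# (provider 4 supplied no non-purified oligos).
M_MM = {"non_purified": [6.1, 4.6, 4.4], "hplc": [3.9, 3.2, 4.1]}
SIGNAL = {"non_purified": [1.3, 1.0, 0.8], "hplc": [4.3, 3.5, 4.1]}


def fold_change(numerators, denominators):
    """Element-wise ratio of two equally long lists, rounded to 2 decimals."""
    return [round(n / d, 2) for n, d in zip(numerators, denominators)]


# Signal gained by purification (HPLC / non-purified), per provider.
sensitivity_gain = fold_change(SIGNAL["hplc"], SIGNAL["non_purified"])
# Discrimination gained by skipping purification (non-purified / HPLC).
discrimination_gain = fold_change(M_MM["non_purified"], M_MM["hplc"])

print("sensitivity gain:", sensitivity_gain)
print("discrimination gain:", discrimination_gain)
```

The largest sensitivity gain (provider 3) is slightly above 5-fold, in line with the "up to 5-fold" figure in the text, while the discrimination gain from non-purified oligos stays below 2-fold for all three providers.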
Dynamic range
Microarray analyses often serve to compare the relative abundance of a set of RNA species between two samples. To determine the dynamic range of our microarrays, experiments were carried out in which different amounts of Cy5- and Cy3-labeled RNA samples were hybridized together to the same microarray. The ratio between the signals obtained by scanning the slides at the wavelength characteristic of each fluorochrome was compared with the input ratio between the two RNAs. Table 7 shows statistical analyses of such comparisons for the five genes under study. Whereas approximately linear responses were observed for input ratios between 1 and 10, higher input ratios were underestimated by up to 3-fold. This indicates that changes in concentration >10-fold may not be accurately quantified. Interestingly, a 1:1 input ratio was measured in the microarray as a 1.6 Cy5:Cy3 ratio. This effect could be due to lower levels of Cy3 incorporation during transcription of the target RNAs or to less efficient detection of RNAs labeled with this fluorochrome. From a practical point of view, this emphasizes the need for reciprocal labeling in order to establish reliable comparisons between two samples.
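One common way to use reciprocal labeling is a dye-swap calculation, sketched below. It is a generic approach, not a procedure described in this paper, and it assumes that the dye bias is a constant multiplicative factor that is the same in both hybridizations. Under that assumption, the bias cancels in the geometric combination of the two measurements. Function names are hypothetical.

```python
from math import sqrt


def dye_swap_ratio(forward_cy5_cy3, reverse_cy5_cy3):
    """Estimate the sample A / sample B ratio from a reciprocal-labeling pair.

    forward: A labeled with Cy5 and B with Cy3, observed Cy5:Cy3 = bias * r.
    reverse: A labeled with Cy3 and B with Cy5, observed Cy5:Cy3 = bias / r.
    Dividing the two observations cancels the bias and leaves r squared.
    """
    return sqrt(forward_cy5_cy3 / reverse_cy5_cy3)


def dye_bias(forward_cy5_cy3, reverse_cy5_cy3):
    """Estimate the multiplicative Cy5:Cy3 bias from the same pair."""
    return sqrt(forward_cy5_cy3 * reverse_cy5_cy3)


# Illustration with a true 2:1 ratio and the 1.6 bias seen at 1:1 input.
forward = 1.6 * 2.0   # observed Cy5:Cy3 with A in Cy5
reverse = 1.6 / 2.0   # observed Cy5:Cy3 after swapping the dyes
print(dye_swap_ratio(forward, reverse), dye_bias(forward, reverse))
```

This correction only addresses the dye bias. It does not undo the compression of ratios above 10-fold reported in Table 7.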
Analysis of HeLa mRNAs
The experiments described above were carried out using precise amounts of specific RNAs transcribed in vitro. To verify the performance of the microarrays with RNAs obtained from biological samples, poly(A)+ RNA was isolated from HeLa cells in culture and fluorescently labeled by a variety of procedures (see below).
Figure 3 shows results obtained under optimized hybridization conditions and indicates that, although the discrimination was reduced compared with the values obtained for the simplified system, most of the oligonucleotides still distinguished between the match and the single mismatch control sequence. Statistical analyses of the results indicated that: (i) 30% better M/MM discrimination was observed for 25-nt oligos compared with 35-nt oligos (average discrimination for 25mers was 1.8); (ii) 30- or 35-nt oligos had 40% better sensitivity than 25mers; and (iii) the intensity of signals associated with genes not present in the sample (SXL) was on average 100-fold lower than for genes expected to be expressed. If SXL RNA was spiked in the sample, however, signals associated with the corresponding oligos were of comparable intensity as when present in a simpler mix of RNAs (data not shown).
Table 5. Variation of microarray sensitivity and specificity with degree of purification among different commercial providers: average M/MM ratios for seven pairs of HPLC-purified versus non-purified oligos for four different commercial providers
Provider        1           2           3           4(a)
Non-purified    6.1 ± 0.4   4.6 ± 0.1   4.4 ± 0.5
HPLC-purified   3.9 ± 0.1   3.2 ± 0.1   4.1 ± 0.3   4.9 ± 0.3
(a) Provider 4 could not produce non-purified oligos.
Table 6. Variation of microarray sensitivity and specificity with degree of purification among different commercial providers: average relative signal intensities for HPLC-purified versus non-purified oligos for four different commercial providers
Provider        1           2(a)        3           4
Non-purified    1.3 ± 0.2   1 ± 0       0.8 ± 0.2
HPLC-purified   4.3 ± 0.2   3.5 ± 0.1   4.1 ± 0.2   3 ± 0.2
(a) The value of non-purified oligos from provider 2 was set arbitrarily to 1.
Table 7. Comparison between input ratios of in vitro transcripts labeled with Cy5/Cy3 and the observed fluorescence values after hybridization
Average, standard deviation and median values correspond to oligos of 30 nt.
Input ratios   Observed ratios
               Median   Average
1              1.6      1.6 ± 0.1
2              2.3      2.5 ± 0.2
10             10.4     12.4 ± 2.7
30             20.2     24.1 ± 6
50             26.7     30.4 ± 6.4
100            30.1     37.8 ± 13.5
Table 8 shows the percentage of oligonucleotides showing
>2-fold M/MM discrimination for RNAs analyzed using a
variety of sample labeling protocols (see Materials and
Methods). The data indicate that a level of discrimination
similar to that obtained for in vitro transcribed RNAs can also
be achieved for the complex mixture of HeLa mRNAs using
25-nt oligos and oligo-dT-primed cDNA linearly amplified by
transcription with T7 RNA polymerase. As expected, the
fraction of oligos showing discrimination was reduced when
oligos corresponding to SXL, a gene not expressed in HeLa
cells, were considered (Table 8). The difference in discrimination
between expressed genes and SXL was reduced for longer
oligos, particularly when amplification was carried out using
random primers.
As additional controls of specificity, 25-nt oligos corresponding
to human β-actin, β-tubulin and the Arabidopsis genes mgd
and fad were included in the microarray. While the proportion
of oligos showing an M/MM ratio >1 was between 87 and 100%
for genes expressed in HeLa cells (U2AF65, U2AF35, TIA-1,
SRp20, β-actin and β-tubulin), this proportion was 50% or less
for oligos corresponding to control genes (SXL, mgd and fad),
as expected from random distribution of spurious hybridization.
Accordingly, the median M/MM ratio for all oligo
lengths was 1.6–1.7 for oligos corresponding to genes
expressed in HeLa cells, whereas it was 1.0 for control genes
(Table 9). These data suggest that M/MM discrimination does
occur for the majority of the oligos that are able to hybridize to
RNAs present in the sample, although often this ratio is <2-fold.
Discrimination could not be improved further by using more
stringent washing conditions. Amplification improved the
sensitivity of detection by a factor of 10 compared with direct
labeling by reverse transcription. RNA fragmentation of
T7-derived transcripts, achieved by partial degradation at
pH 8.1 in the presence of 15 mM magnesium, also increased
sensitivity by 1.5–2.0-fold, although this was accompanied by
moderate (1.5-fold) decreases in M/MM discrimination.
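The two summary statistics used above, the percentage of oligos exceeding an M/MM threshold and the median M/MM ratio, can be computed as in the following minimal sketch. The signal values are hypothetical and chosen only for illustration, and the function name `mm_discrimination` is an assumption of this sketch rather than part of the analysis software used in this study.

```python
from statistics import median


def mm_discrimination(match, mismatch, threshold=2.0):
    """Return (percent of oligos with M/MM > threshold, median M/MM ratio)."""
    if len(match) != len(mismatch) or not match:
        raise ValueError("need equal-length, non-empty signal lists")
    # One ratio per perfect-match / single-mismatch oligo pair.
    ratios = [m / mm for m, mm in zip(match, mismatch)]
    above = sum(1 for r in ratios if r > threshold)
    return 100.0 * above / len(ratios), median(ratios)


# Hypothetical background-corrected signals for four oligo pairs.
match_signal = [1200.0, 800.0, 450.0, 300.0]
mismatch_signal = [400.0, 500.0, 150.0, 300.0]

percent, med = mm_discrimination(match_signal, mismatch_signal)
print(f"{percent:.0f}% of oligos with M/MM > 2; median M/MM = {med:.2f}")
```

Using the median rather than the average limits the influence of a few oligos with very low mismatch signals, which would otherwise inflate the ratio.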
Figure 3. Hybridization of fluorescently labeled HeLa mRNAs to oligonucleotide microarrays. Microarray layout for the five genes under study, and fluorescent
scan of hybridization data to labeled HeLa mRNAs, obtained under optimized conditions.
Table 8. Oligonucleotide discrimination for HeLa mRNAs labeled using
different protocols: percentage of oligos with M/MM ratios >2 indicated for
oligos of different lengths and different labeling methods
The lower part of the table indicates the same values for oligos corresponding
to SXL, a Drosophila gene whose transcripts are not present in HeLa cells.
Labeling method (%M/MM>2)
                        Amplification   Amplification    Amplification
           Direct       using odT       using random     using random
           labeling                     primers          primers + odT
25mer      58           73              54               73
30mer      50           63              50               50
35mer      46           47              44               33
SXL
25mer      17           10              27               13
30mer      25           20              35               27
35mer      20           18              33               40
Table 9. Oligonucleotide discrimination for HeLa mRNAs labeled using
different protocols: M/MM ratios for all genes and oligo lengths and different
labeling methods, for oligos corresponding to genes expressed in HeLa cells
versus control genes
Amplification      HeLa genes              Control genes
method             Average      Median     Average      Median
odT                1.7 ± 0.8    1.6        1.0 ± 0.5    1.0
Random primer      1.4 ± 0.5    1.5        0.9 ± 0.6    1.0
As an additional test for the specificity of the signals
detected, HeLa cells were transfected with an expression
vector encoding TIA-1, or the gene was knocked down in
tissue culture by transfecting short double-stranded RNA
oligos corresponding to TIA-1 sequences (30). RNA isolated
from these cells was labeled with either Cy5 or Cy3 and
compared with RNA from untransfected cells labeled with the
other dye. As predicted, increases or decreases in hybridization
signals specific for TIA-1 were detected depending on whether
TIA-1 was overexpressed or its expression inhibited (data not
shown).
Performance of longer oligonucleotides
Oligos significantly longer than 35 nt have been used in the
literature (23). The rationale for the use of longer oligo microarrays
is that their sensitivity could approach that of cDNA
microarrays. To compare the performance of long versus short
(25–35 nt) oligos, two 60-nt oligos were selected for each of
the genes under study, covering sequences that included
a subset of the 25–35 nt-long oligos described above. The 25–35-
and 60-nt oligos were printed on the same slides. The results of
hybridization experiments using in vitro transcribed RNAs
indicated that hybridization signals associated with 60-nt oligos were
10-fold higher than the signals detected for the corresponding
25mers (Table 10). This ratio was reduced to 2-fold when
signals obtained for 60mers and 30mers were compared.
The ratios obtained upon hybridization of HeLa RNAs were within
a similar range of values (Table 10).
One difficulty associated with the use of long oligos is that
the difference in hybridization stability between perfect match
and single mismatch controls is predicted to be too low to
permit discrimination, so that hybridization specificity
is more difficult to assess for each oligo. To address this question,
in vitro transcribed fluorescent RNAs corresponding to the five
genes analyzed in Figure 1 were hybridized to the microarray
containing short and long oligos. Hybridization signals
corresponding to these genes were on average 15.4 times
higher than those associated with the four control genes (β-actin,
β-tubulin, mgd and fad) (Table 11). This ratio was 20-fold
when the performance of 25-nt oligos was compared between
the same sets of genes. We conclude that 60-nt oligos can
provide adequate specificity and better sensitivity than shorter
oligos in this experimental setup.
The behavior of 60-nt oligos was also analyzed using HeLa
cell RNAs as targets. Signals associated with 60mers were on
average 7-fold higher than for 25mers and 2.7-fold higher than
for 30mers (Table 10). Specificity was measured as the
average ratio between signals associated with human genes
versus SXL and Arabidopsis controls. This ratio was 12 for
25 nt-long oligos, while it was reduced to 3.3 for 60mers.
We conclude that while 60 nt-long oligos can provide significantly
better sensitivity than 25 or 30mers, their specificity in
complex mixtures of RNA is significantly lower than that
obtained for 25mers.
DISCUSSION
The data presented in this manuscript will assist in the design
of oligonucleotide-based DNA microarrays. The algorithm
provided allows the selection of oligos of variable length with
optimized uniform hybridization properties and with statistically
significant discrimination between perfect match and a single
nucleotide mismatch at a central position. Although at least
part of the signal associated with mismatch control oligonucleotides
is likely to be due to hybridization to the genuine target (31),
we adopted the criterion of considering only those oligos
showing at least 2-fold differences in hybridization ratios
between match and mismatch. However, statistical analyses
indicated that lower ratios could also be considered significant,
as frequently assumed in the literature (21).
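As an illustration of what a single-nucleotide mismatch control at a central position looks like, the sketch below derives one from a perfect-match oligo. Which base is substituted is not specified in this section; replacing the central base with its complement is an assumption of this sketch, and the sequence and the function name `central_mismatch` are hypothetical.

```python
# Assumption of this sketch: the central base is swapped for its complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def central_mismatch(oligo):
    """Return the oligo with its central base replaced by the complementary base."""
    seq = oligo.upper()
    if not seq or any(base not in COMPLEMENT for base in seq):
        raise ValueError("oligo must be a non-empty string over A, C, G, T")
    mid = len(seq) // 2  # index of the central base (0-based)
    return seq[:mid] + COMPLEMENT[seq[mid]] + seq[mid + 1:]


perfect = "ACGTTGCAAGCTTGACCATGGTACA"  # hypothetical 25-nt perfect-match oligo
control = central_mismatch(perfect)
print(perfect)
print(control)
```

For a 25-nt oligo this changes position 13 of the sequence and leaves the other 24 bases untouched, so the match and mismatch oligos differ by exactly one nucleotide.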
Conditions were found in which 75% of the oligos selected
by our algorithm cleared the more stringent discrimination
criteria. This corresponds to a 98.4% probability of obtaining
at least one reliable measurement in a set of three selected
oligonucleotides, a 99.0% probability of obtaining at least two
reliable measurements in a set of five oligos, or a 93.76%
probability of obtaining at least three reliable measurements in
a set of five. These set sizes are considerably smaller than the
numbers of oligonucleotides used to assess gene expression
in the literature (from 25 to 300 per RNA) (25), and could
therefore significantly reduce the cost and simplify the data
processing of custom-made microarrays.
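How such probabilities depend on the fraction of reliable oligos can be explored with a simple binomial model. The sketch below assumes that each selected oligo clears the criterion independently with probability 0.75; the independence assumption and the function name `prob_at_least` belong to this sketch and are not taken from the text. Under this model the three-oligo case evaluates to 98.4%, as quoted above, whereas the two five-oligo cases evaluate to 98.4% and 89.6%, so the figures quoted for those cases presumably rest on a calculation that is not described here.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n oligos are reliable), each independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.75  # fraction of selected oligos clearing the 2-fold M/MM criterion
for k, n in [(1, 3), (2, 5), (3, 5)]:
    print(f"at least {k} of {n}: {100 * prob_at_least(k, n, p):.1f}%")
```

The same function can be used to ask how many oligos per gene are needed to reach a target confidence for a different success fraction, for example one of the percentages in Table 8.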
The next complementary step in oligonucleotide selection
should involve extensive sequence comparisons (e.g. BLAST
analyses) to minimize the chances that an oligo will hybridize
to identical sequences present in two or more genes. This
represents an intensive bioinformatic effort and selection of
oligonucleotides with increased discrimination can only help
to complement these efforts to improve microarray specificity.
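A full BLAST analysis is beyond the scope of a short example, but the idea of screening oligos against other genes can be sketched with a naive exact-match pre-screen. The code below flags transcripts that share at least one exact stretch of k nucleotides with an oligo. It is not a substitute for BLAST, the window length of 15 is an arbitrary assumption, all sequences are hypothetical, and all sequences are assumed to be written in the same orientation.

```python
def shared_kmers(oligo, other, k=15):
    """Return the k-mers of `oligo` that also occur exactly in `other`."""
    oligo, other = oligo.upper(), other.upper()
    kmers = {oligo[i:i + k] for i in range(len(oligo) - k + 1)}
    return {kmer for kmer in kmers if kmer in other}


def flag_cross_hybridizing(oligo, transcripts, k=15):
    """Names of non-target transcripts sharing at least one exact k-mer with the oligo."""
    return sorted(
        name for name, seq in transcripts.items() if shared_kmers(oligo, seq, k)
    )


oligo = "ACGTTGCAAGCTTGACCATGGTACA"  # hypothetical 25-nt oligo
transcripts = {
    # geneB is built to contain 15 consecutive nucleotides of the oligo.
    "geneB": "GGGG" + oligo[5:20] + "CCCC",
    "geneC": "T" * 30,
}
print(flag_cross_hybridizing(oligo, transcripts))
```

An exact-match screen of this kind misses near-identical sequences with a few mismatches, which is why alignment-based tools remain necessary for the final selection.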
An important conclusion of our results is that multiple
aspects of microarray design contribute to microarray performance, from
the choice of oligonucleotide provider or level of purification to
hybridization temperature within narrow margins. Variations
often work in opposite directions regarding sensitivity and
specificity, and therefore an appropriate compromise may need
to be reached for each experimental setup and application.
Table 10. Performance of 60-nt oligos compared with 25 and 30mers:
average and median values of the ratios between fluorescent signals
associated with oligos of different lengths, after hybridization of labeled
RNAs either generated by in vitro transcription (IVT) or isolated from HeLa
cells
          60/25                   60/30
          Average     Median      Average     Median
IVT       10 ± 6.7    10.5        1.8 ± 1.2   1.3
HeLa      7.1 ± 3.3   7.1         2.7 ± 1.4   2.2
Table 11. Performance of 60-nt oligos compared with 25 and
30mers: average ratios between fluorescent signals associated
with oligos complementary to sequences present in the labeled
RNAs and signals from oligos complementary to control
Drosophila and Arabidopsis genes
          Sample/control
          25mers        60mers
IVT       20.7 ± 7      15.4 ± 6
HeLa      11.9 ± 3      3.3 ± 1
Let us consider the choice of oligonucleotide length. Oligos
of 25 nt in length can provide optimal discrimination at the cost
of significant losses in sensitivity, which may be very detrimental
for detection of genes expressed at low levels. The data
of Table 11 indicate that discrimination between genes
expressed and not expressed in a biological RNA sample is
12-fold for oligos of 25 nt in length, but only 3.3-fold for 60 nt-
long oligos covering similar sequences. In contrast, 60 nt-long
oligos have higher sensitivity under the same conditions, but at
the cost of more limited specificity, which in addition is difficult
to quantify. The data of Table 10 indicate that 60-nt oligos are
on average 7-fold more sensitive than 25mers. Therefore the
7-fold increase in sensitivity is accompanied by a 4-fold loss
in specificity. Problems of specificity can be particularly
serious considering that fewer oligos per gene are usually
selected when longer oligos are used in the microarrays, thus
providing a reduced number of independent measurements
per gene.
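The trade-off stated above follows directly from the HeLa values in Tables 10 and 11, as the short calculation below shows. The variable names are illustrative only.

```python
# HeLa RNA values taken from Tables 10 and 11.
signal_60_over_25 = 7.1   # average 60mer/25mer signal ratio (Table 10)
specificity_25mer = 11.9  # sample/control signal ratio for 25mers (Table 11)
specificity_60mer = 3.3   # sample/control signal ratio for 60mers (Table 11)

sensitivity_gain = signal_60_over_25
specificity_loss = specificity_25mer / specificity_60mer

print(f"sensitivity gain of 60mers over 25mers: {sensitivity_gain:.1f}-fold")
print(f"specificity loss of 60mers versus 25mers: {specificity_loss:.1f}-fold")
```

The specificity ratio comes to about 3.6, which is rounded to 4-fold in the text.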
The results of Tables 1 and 2 suggest that 30-nt oligos can
represent an adequate compromise between sensitivity and
specificity for optimal microarray performance in a system of
limited RNA complexity. Similarly, the ∼3-fold lower sensitivity
of 30mers compared with 60-nt oligos for RNAs from HeLa cells
(Table 10) may be compensated by an increase in
specificity, and by the possibility of assessing the
specificity of each oligo by the use of mismatch controls.
Considered together, the data suggest that both the abundance
in expression of the mRNAs to be studied and the degree
of similarity to other RNAs present in the sample need to be
taken into consideration for the choice of oligo length.
Rather subtle changes in hybridization conditions also affect
microarray performance. Figure 2 shows an inverse correlation
between signal intensity and M/MM discrimination in a range
of temperatures between 30 and 42°C. While an increased M/MM
ratio is expected from the lower thermal stability of imperfect
duplexes under more stringent temperatures, the decrease
observed at 42°C cannot be explained easily. Equally puzzling
is the increase in hybridization signals between 35 and 42°C
that follows the more expected decrease between 30 and 35°C.
Increased signals at 42°C could be explained by the opening of
secondary structures in the target cRNA, thereby facilitating
hybridization to the microarray. If this is the case, it is conceivable
that the reduction in discrimination observed at 42°C in Figure 2A
could also be attributed to a general increase in the availability
of target sequences for hybridization.
Finally, cost considerations can also play a relevant part in
microarray design. Relatively small increases in sensitivity afforded by
expensive 5′-amino modifications or HPLC
purification may be critical for some applications but not for
others.
Concluding remarks
In this manuscript we have provided a quantitative and statistical
analysis for the use of oligonucleotide-based microarrays that
will aid in the selection of appropriate reagents and conditions
for gene profiling and genotyping. Although these will vary
depending on the specific application, we suggest that 30 nt-long
oligos offer an adequate balance between sensitivity and
specificity. Longer oligos can provide a slight increase in
sensitivity at the cost of a significant decrease in specificity,
which is also difficult to assess due to the absence of reliable
mismatch controls. The higher cost of HPLC purification can
be compensated by significant increases in sensitivity. Using
the algorithm presented here, five oligonucleotides and their
mismatch controls should be sufficient to provide statistically
reliable quantification of signals corresponding to a gene or
sequence feature.
SUPPLEMENTARY MATERIAL
Supplementary Material is available at NAR Online.
ACKNOWLEDGEMENTS
We thank MWG Biotech AG for providing free oligonucleotides,
George Dimopoulos, George K. Christophides, Thomas Preis,
Vladimir Benes and members of the Ansorge and Valcárcel
laboratories for technical help, reagents, discussions and critical
reading of the manuscript. A.R. was the recipient of a Praxis
XXI PhD fellowship from the Portuguese Ministry of Science
and Technology. This work was supported in part by a grant
from the Human Frontier Science Program Organization.
REFERENCES
1. Noordewier,M.O. and Warren,P.V. (2001) Gene expression microarrays
and the integration of biological knowledge. Trends Biotechnol., 19,
412–415.
2. Young,R.A. (2000) Biomedical discovery with DNA arrays. Cell, 102,
9–15.
3. Mills,J.C., Roth,K.A., Cagan,R.L. and Gordon,J.I. (2001) DNA
microarrays and beyond: completing the journey from tissue to cell.
Nature Cell Biol., 3, E175–178.
4. Bassett,D.E.,Jr, Eisen,M.B. and Boguski,M.S. (1999) Gene expression
informatics–it’s all in your mine. Nature Genet., 21, 51–55.
5. Holstege,F.C., Jennings,E.G., Wyrick,J.J., Lee,T.I., Hengartner,C.J.,
Green,M.R., Golub,T.R., Lander,E.S. and Young,R.A. (1998) Dissecting
the regulatory circuitry of a eukaryotic genome. Cell, 95, 717–728.
6. Lee,C.K., Klopp,R.G., Weindruch,R. and Prolla,T.A. (1999) Gene
expression profile of aging and its retardation by caloric restriction.
Science, 285, 1390–1393.
7. Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M.,
Mesirov,J.P., Coller,H., Loh,M.L., Downing,J.R., Caligiuri,M.A.
et al. (1999) Molecular classification of cancer: class discovery and class
prediction by gene expression monitoring. Science, 286, 531–537.
8. Alon,U., Barkai,N., Notterman,D.A., Gish,K., Ybarra,S., Mack,D. and
Levine,A.J. (1999) Broad patterns of gene expression revealed by
clustering analysis of tumor and normal colon tissues probed by
oligonucleotide arrays. Proc. Natl Acad. Sci. USA, 96, 6745–6750.
9. Hippo,Y., Taniguchi,H., Tsutsumi,S., Machida,N., Chong,J.M.,
Fukayama,M., Kodama,T. and Aburatani,H. (2002) Global gene
expression analysis of gastric cancer by oligonucleotide microarrays.
Cancer Res., 62, 233–240.
10. Liotta,L. and Petricoin,E. (2000) Molecular profiling of human cancer.
Nature Rev. Genet., 1, 48–56.
11. Clarke,P.A., te Poele,R., Wooster,R. and Workman,P. (2001) Gene
expression microarray analysis in cancer biology, pharmacology, and
drug development: progress and potential. Biochem. Pharmacol., 62,
1311–1336.
12. Debouck,C. and Goodfellow,P.N. (1999) DNA microarrays in drug
discovery and development. Nature Genet., 21, 48–50.
13. Granjeaud,S., Bertucci,F. and Jordan,B.R. (1999) Expression profiling:
DNA arrays in many guises. Bioessays, 21, 781–790.
14. Hughes,T.R., Mao,M., Jones,A.R., Burchard,J., Marton,M.J.,
Shannon,K.W., Lefkowitz,S.M., Ziman,M., Schelter,J.M., Meyer,M.R.
et al. (2001) Expression profiling using microarrays fabricated by an
ink-jet oligonucleotide synthesizer. Nat. Biotechnol., 19, 342–347.
15. Southern,E., Mir,K. and Shchepinov,M. (1999) Molecular interactions on
microarrays. Nature Genet., 21, 5–9.
16. Lockhart,D.J. and Barlow,C. (2001) Expressing what’s on your mind:
DNA arrays and the brain. Nature Rev. Neurosci., 2, 63–68.
17. LaForge,K.S., Shick,V., Spangler,R., Proudnikov,D., Yuferov,V.,
Lysov,Y., Mirzabekov,A. and Kreek,M.J. (2000) Detection of single
nucleotide polymorphisms of the human mu opioid receptor gene by
hybridization or single nucleotide extension on custom oligonucleotide
gelpad microchips: potential in studies of addiction. Am. J. Med. Genet.,
96, 604–615.
18. Hacia,J.G. (1999) Resequencing and mutational analysis using
oligonucleotide microarrays. Nature Genet., 21, 42–47.
19. Drobyshev,A., Mologina,N., Shik,V., Pobedimskaya,D., Yershov,G. and
Mirzabekov,A. (1997) Sequence analysis by hybridization with
oligonucleotide microchip: identification of beta-thalassemia mutations.
Gene, 188, 45–52.
20. Modrek,B. and Lee,C. (2001) A genomic view of alternative splicing.
Nature Genet., 30, 13–19.
21. Hu,G.K., Madore,S.J., Moldover,B., Jatkoe,T., Balaban,D., Thomas,J.
and Wang,Y. (2001) Predicting splice variant from DNA chip expression
data. Genome Res., 11, 1237–1245.
22. Shoemaker,D.D., Schadt,E.E., Armour,C.D., He,Y.D., Garrett-Engele,P.,
McDonagh,P.D., Loerch,P.M., Leonardson,A., Lum,P.Y., Cavet,G. et al.
(2001) Experimental annotation of the human genome using microarray
technology. Nature, 409, 922–927.
23. Hughes,T.R. and Shoemaker,D.D. (2001) DNA microarrays for
expression profiling. Curr. Opin. Chem. Biol., 5, 21–25.
24. Brazma,A., Hingamp,P., Quackenbush,J., Sherlock,G., Spellman,P.,
Stoeckert,C., Aach,J., Ansorge,W., Ball,C.A., Causton,H.C. et al. (2001)
Minimum information about a microarray experiment (MIAME)-toward
standards for microarray data. Nature Genet., 29, 365–371.
25. Lockhart,D.J., Dong,H., Byrne,M.C., Follettie,M.T., Gallo,M.V.,
Chee,M.S., Mittmann,M., Wang,C., Kobayashi,M., Horton,H. et al.
(1996) Expression monitoring by hybridization to high-density
oligonucleotide arrays. Nat. Biotechnol., 14, 1675–1680.
26. Wodicka,L., Dong,H., Mittmann,M., Ho,M.H. and Lockhart,D.J. (1997)
Genome-wide expression monitoring in Saccharomyces cerevisiae.
Nat. Biotechnol., 15, 1359–1367.
27. Call,D.R., Chandler,D.P. and Brockman,F. (2001) Fabrication of DNA
microarrays using unmodified oligonucleotide probes. Biotechniques, 30,
368–372, 374, 376, passim.
28. Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning, 2nd
Edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
29. Zhang,J., Day,I. and Byrne,C. (2002) A novel medium throughput
quantitative competitive PCR technology to simultaneously measure
mRNA levels from multiple genes. Nucleic Acids Res., 30, e20.
30. Elbashir,S.M., Harborth,J., Lendeckel,W., Yalcin,A., Weber,K. and
Tuschl,T. (2001) Duplexes of 21-nucleotide RNAs mediate RNA
interference in cultured mammalian cells. Nature, 411, 494–498.
31. Chudin,E., Walker,R., Kosaka,A., Wu,S.X., Rabert,D., Chang,T.K. and
Kreder,D.E. (2002) Assessment of the relationship between signal
intensities and transcript concentration for Affymetrix GeneChip(R)
arrays. Genome Biol., 3, RESEARCH0005.
... As expected, the DNA fragment can be efficiently pulled down by the capture chip (Fig. 1c). Previous work has shown that 20 bp to 150 bp complementary sequences can efficiently support hybridization [21][22][23]. Next, we checked every base of all IMGT documented HLA alleles of the 14 genes. We found that every base of the exons (>23 bp) of all documented HLA alleles of 14 HLA genes can be covered by at least one DNA fragment that had enough complementary sequence with an on-chip bait to enable hybridization capture (Fig. 1d, Additional file 3: Figure S3). ...
... To allow highresolution HLA-typing of large populations at 5-10% cost of current common hybridization-based and amplicon-based NGS approaches (for measurement details, see the Methods), we developed STC-Seq to capture and sequence the coding region of 14 HLA genes using large CDS fragment as baits. Previous work has shown that 20 to 150 bp complementary sequences can efficiently generate hybridization signals [21][22][23], which explains why the long CDS fragments of one HLA allele could serve as a universal bait to efficiently pull down the corresponding homologous alleles. Moreover, long double-stranded baits can acquire a high diversity of captured reads, which contributes to the improvement of the HLA typing accuracy comparing to a hybridization-based NGS HLA typing approach [20]. ...
Article
Full-text available
Background Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low-efficiency and high-cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. ResultsHere, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known two sets of digitals (field1 and field2) genotypes. Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with the third-generation sequencing. ConclusionsSTC-Seq is a highly accurate and cost-efficient method for HLA typing which can be used to facilitate the establishment of population-based HLA databases for the precision and transplantation medicine.
... Furthermore, the gene-specific microarrays require different sets of capture probes for each set of human genetic mutations. Thus, immobilization and hybridization conditions must be optimized for each gene-specific microarray, which is very tedious and time consuming [47]. The recent use of zip-code arrays can overcome these limitations. ...
... By changing the gene-specific sequences linked to the complementary zip-code sequence, zip-code microarray can be used as a universal platform for the SNP genotyping [13]. Moreover, by designing zip-code sequences to have similar T m values and avoid cross-hybridization, hybridization can be performed at a single temperature that provides rapid and robust hybridization without false signals under more stringent conditions [47]. ...
Chapter
Full-text available
Single nucleotide polymorphism (SNP) is the most abundant form of genetic variations in the human genome and has been extensively studied as genetic marker for efficient diagnosis and prognosis of various human diseases. The fundamental and standard SNP genotyping method is direct sequencing, which specifies whole base sequence in a sample genome with high accuracy. By satisfying the challenge to develop a high-capacity SNP genotyping platform that is readily scalable yet highly accurate, DNA microarray technology emerged as the most powerful SNP genotyping strategy to replace the conventional methods. This chapter presents a comprehensive overview for DNA microarray-based technologies to genotype SNP, which are classified into ASOCH, zip-code microarray, universal amplification-based technology, and bead array-based technology. ASOCH is the earliest version of SNP genotyping on DNA microarray and relies on very straightforward competitive hybridization of SNP-containing target strands for the allele-specific oligonucleotides (ASO).
... However, to discriminate two closely related species like L hilgardii and L. buchneri it will be important to test additional probes that could target other regions of rDNA, such as that between 23S and 5S pre-rRNA. Other possible strategies to obtain increased specificity and sensitivity could consider the use of PNA (peptide nucleic acids) as an alternative to DNA as probes (Weiler, Gausepohl, Hauser, Jensen, & Hoheisel, 1997) or the preparation of longer probes (Rel ogio, Schwager, Richter, Ansorge, & Valc arcel, 2002). ...
... There is evidence that the length of the probes on the microarray platform may contribute to the differences. Oligonucleotide 30mer probes (found on the Codelink platform) provide twice the intensity of 25mer probes (found on the Affymetrix platform) (Relogio et al., 2002). A comparison between the CodeLink and Affymetrix platforms suggested a 10fold greater sensitivity of the former platform (Shippy et al., 2004). ...
Article
Full-text available
Traditional methods for cancer risk assessment are resource-intensive, retrospective, and not feasible for the vast majority of environmental chemicals. In this study, we investigated whether quantitative genomic data from short-term studies may be used to set protective thresholds for potential tumorigenic effects. We hypothesized that gene expression biomarkers measuring activation of the key early events in established pathways for rodent liver cancer exhibit cross-chemical thresholds for tumorigenesis predictive for liver cancer risk. We defined biomarker thresholds for 6 major liver cancer pathways using training sets of chemicals with short-term genomic data (3-29 days of exposure) from the TG-GATES (n = 77 chemicals) and DrugMatrix (n = 86 chemicals) databases and then tested these thresholds within and between datasets. The 6 pathway biomarkers represented genotoxicity, cytotoxicity, and activation of xenobiotic, steroid, and lipid receptors (AhR, CAR, ER, and PPARα). Thresholds were calculated as the maximum values derived from exposures without detectable liver tumor outcomes. We identified clear response values that were consistent across training and test sets. Thresholds derived from the TG-GATES training set were highly predictive (97%) in a test set of independent chemicals, while thresholds derived from the DrugMatrix study were 96-97% predictive for the TG-GATES study. Threshold values derived from an abridged gene list (2/biomarker) also exhibited high predictive accuracy (91-94%). These findings support the idea that early genomic changes can be used to establish threshold estimates or "molecular tipping points" that are predictive of later-life health outcomes.
... Short oligonucleotides (25-mer) provide a lower signal-to-noise ratio of hybridization than 60-mer probes and the analyses need to average several consecutive markers thus diminishing the overall resolution. In contrast, 25-mer probes are more specific allowing the discrimination of SNP under optimal conditions but with reduced sensitivity [11,12]. For SNP-arrays, a single sample is labelled and hybridized to the array and changes in CN are determined in silico, comparing the signal intensity of the sample with a set of analog experiments performed on hundreds of reference DNAs. ...
Article
Full-text available
Submicroscopic chromosomal copy number variations (CNVs), such as deletions and duplications, account for about 15–20% of patients affected with developmental delay, intellectual disability, multiple congenital anomalies, and autism spectrum disorder. Most of CNVs are de novo or inherited rearrangements with clinical relevance, but there are also rare inherited imbalances with unknown significance that make difficult the clinical management and genetic counselling. Chromosomal microarrays analysis (CMA) are recognized as the first-line test for CNV detection and are now routinely used in the clinical diagnostic laboratory. The recent use of CMA platforms that combine classic copy number analysis with single-nucleotide polymorphism (SNP) genotyping has increased the diagnostic yields. Here we discuss the application of the Cytoscan high-density (HD) SNP-array for the detection of CNVs. We provide an overview of molecular analyses involved in identifying pathogenic CNVs and highlight important guidelines to establish pathogenicity of CNV.
... The array was not as sensitive as PCR or qPCR in the detection of low levels of EEHV, this was determined due to EEHV not being detected in trunk washes samples that had previously tested positive by qPCR. Previous studies have also reported poor microarray sensitivity, this may be due to poor hybridisation process or issues with reagents, such as Relógio et al. (2002) found higher concentrations (70-80%) of formamide resulted in very poor sensitivity. Probe design is also a key factor in regards to microarray sensitivity. ...
Article
Herpesviruses are ubiquitous and are found worldwide, most animal species can be infected with multiple herpesviruses. Some cause clinical disease and others remain symptomatic throughout life. Herpesviruses are found in both captive and wild animals including Asian elephants (Elephas maximus). Elephant Endothelioltropic Herpesvirus (EEHV) has been reported in both captive and wild Asian elephants, with a number of cases being reported in North America, Europe and Asia. It has been suggested that EEHV is associated with haemorrhagic disease, which has been attributed to a number of Asian elephant deaths, affecting mostly juveniles and calves. Clinical signs can vary from weight loss, lethargy, depression, cyanosis of the tongue and sudden death. Molecular testing using qPCR has enabled the detection of individual variants of EEHV, this thesis investigates the EEHV1 variant. EEHV1 has been highlighted as the variant that is more frequently associated with deaths. This thesis includes five studies investigating different aspects of EEHV. Including, the relationship between pregnancy and EEHV viral shedding, the use of an amended human protocol for culturing endothelial cells, EEHV tissue tropism, a potential genetic or familial link between EEHV associated deaths and the detection of potential co-pathogens. The main findings from this thesis include: 1) the use of a longitudinal study investigating a potential link between the physiological stress of pregnancy and EEHV viral shedding. This study suggested there was no link between pregnancy and EEHV viral shedding however other stressors may be involved. 2) Using an amended human umbilical vein endothelial cell protocol, the culture of Asian elephant endothelial cells was successful. The cells from this study may be used in subsequent drug testing and vaccine development. 3) Quantitative PCR was used to determine EEHV1 tropism in tissues from two deaths associated with the virus. 
Tropism appeared to be for the heart and liver. 4) This thesis provides results from a preliminary study into a potential link between EEHV associated deaths. The data from an Asian elephant genogram shows there is the possibility of a genetic or familial link, which requires further investigation. 5) A number of tissues from deaths associated with EEHV and or death from other causes were investigated for the presence of potential co-pathogens, including the presence of encephalomyocarditis virus (EMCV), using microarray technology. The results indicated there were no co-pathogens present in the tissues. This thesis adds to the current published data, and includes the first known preliminary study investigating a potential genetic link between elephant deaths due to EEHV.
... With oligonucleotides, probes can be designed to identify a unique part of a given transcript, making the detection of closely related genes or splice variants possible. The arraying of pre-synthesized longer oligonucleotides (50 to 100 polymers) has recently been developed to counteract the disadvantages of short oligonucleotides which may sometimes result in less specific hybridization and reduced sensitivity (Schulze and Downward, 2001;TJ, 2003;Relogio et al., 2002). The advantage of synthetic oligonucleotides is that the sequence information alone is sufficient to generate the DNA to be arrayed therefore no time consuming handling of cDNA resources is required (Kane, 2000). ...
... Although the capture probe length has been reported as an important factor in the sensitivity of DNA microarrays, [63,64] (Table S2). It was expected that Probe 3 and Probe 5, which had higher negative free energy (ΔG) values, would exhibit poor performance since their secondary structures were more stable, but only Probe 5 showed negligible signals compared with Probe 3 (Fig. S4). ...
Article
Paper-based biosensors offer a promising technology for use at the point of care, enabled by good performance, convenience and low cost. In this article, we describe a colorimetric vertical-flow DNA microarray (DNA-VFM) that takes advantage of the screening capability of DNA microarrays in a paper format together with isothermal amplification by means of Recombinase Polymerase Amplification (RPA). Different assay parameters such as hybridization buffer, flow rate, printing buffer and capture probe concentration were optimized. A limit of detection (LOD) of 4.4 nM was achieved as determined by tabletop scanning. The DNA-VFM was applied as a proof of concept for detection of Neisseria meningitidis, a primary cause of bacterial meningitis. The LOD was determined to be between 38 and 2.1 × 10⁶ copies/VFM assay, depending on the choice of DNA capture probes. The presented approach provides the multiplex capabilities of DNA microarrays in a paper-based format for future point-of-care applications.
Article
Hybridization of complementary single strands of DNA represents a very effective natural molecular recognition process widely exploited for diagnostic, biotechnology and nanotechnology applications. A common approach relies on the immobilization on a surface of single-stranded DNA probes that bind complementary targets in solution. However, despite the deep knowledge of DNA interactions in bulk solution, the modelling of the same interactions on a surface is still challenging and perceived as strongly system-dependent. Here we show that a two-dimensional analysis of the kinetics of hybridization, performed at different target concentrations and probe surface densities by a label-free optical biosensor, reveals peculiar features inconsistent with an ideal Langmuir-like behaviour. We propose a simple non-Langmuir kinetic model accounting for an enhanced electrostatic repulsion originating from the surface immobilization of nucleic acids and for steric hindrance close to full hybridization of the surface probes. The analysis of the kinetic data by the model makes it possible to quantify the repulsive potential at the surface, as well as to retrieve the kinetic parameters of isolated probes. We show that the strength and the kinetics of hybridization at large probe density can be improved by a 3D immobilization strategy of probe strands with a double-stranded linker.
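As a point of reference for the abstract above, the ideal Langmuir behaviour from which the authors report deviations can be sketched as follows. This is a minimal illustration assuming the textbook single-site model, dθ/dt = k_on·c·(1 − θ) − k_off·θ with θ(0) = 0. The function name and the rate constants are hypothetical and are not taken from the cited work. The non-Langmuir corrections described in the abstract (electrostatic repulsion, steric hindrance) are not reproduced, because their functional form is not given here.

```python
import math

def langmuir_coverage(t, conc, k_on, k_off):
    """Fraction of surface probes hybridized at time t (ideal Langmuir kinetics).

    Closed-form solution of
        d(theta)/dt = k_on * conc * (1 - theta) - k_off * theta,  theta(0) = 0.
    """
    k_obs = k_on * conc + k_off      # observed relaxation rate (1/s)
    theta_eq = k_on * conc / k_obs   # equilibrium coverage
    return theta_eq * (1.0 - math.exp(-k_obs * t))

# Hypothetical rate constants, chosen only for illustration.
K_ON = 1.0e5    # association rate constant, 1/(M s)
K_OFF = 1.0e-3  # dissociation rate constant, 1/s

if __name__ == "__main__":
    for conc in (1e-9, 1e-8, 1e-7):  # target concentrations in M
        theta = langmuir_coverage(600.0, conc, K_ON, K_OFF)
        print(f"c = {conc:.0e} M, coverage after 600 s = {theta:.3f}")
```

In this ideal picture the equilibrium coverage depends only on the ratio k_on·c / (k_on·c + k_off) and not on the probe surface density; a dependence of the fitted rates on probe density is the kind of feature such a baseline cannot capture.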
Article
We describe a flexible system for gene expression profiling using arrays of tens of thousands of oligonucleotides synthesized in situ by an ink-jet printing method employing standard phosphoramidite chemistry. We have characterized the dependence of hybridization specificity and sensitivity on parameters including oligonucleotide length, hybridization stringency, sequence identity, sample abundance, and sample preparation method. We find that 60-mer oligonucleotides reliably detect transcript ratios at one copy per cell in complex biological samples, and that ink-jet arrays are compatible with several different sample amplification and labeling techniques. Furthermore, results using only a single carefully selected oligonucleotide per gene correlate closely with those obtained using complementary DNA (cDNA) arrays. Most of the genes for which measurements differ are members of gene families that can only be distinguished by oligonucleotides. Because different oligonucleotide sequences can be specified for each array, we anticipate that ink-jet oligonucleotide array technology will be useful in a wide variety of DNA microarray applications.
Article
Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.
Article
Technologies for whole-genome RNA expression studies are becoming increasingly reliable and accessible. However, universal standards to make the data more suitable for comparative analysis and for inter-operability with other information resources have yet to emerge. Improved access to large electronic data sets, reliable and consistent annotation and effective tools for 'data mining' are critical. Analysis methods that exploit large data warehouses of gene expression experiments will be necessary to realize the full potential of this technology.
Article
Diagnostics for genetic diseases were run and sequence analysis of DNA was carried out by hybridization of RNA transcripts with oligonucleotide array microchips. Polyacrylamide gel pads (100 × 100 × 20 µm) were fixed on a glass slide of the microchip and contained allele-specific immobilized oligonucleotides (10-mers). The RNA transcripts of PCR-amplified genomic DNA were fluorescently labeled by enzymatic or chemical methods and hybridized with the microchips. The simultaneous real-time measurement of hybridization and melting on the entire oligonucleotide array was carried out with a fluorescence microscope equipped with a CCD camera. The monitoring of the hybridization specificity for duplexes with different stabilities and AT content was enhanced by measuring it at optimal discrimination temperatures on melting curves. Microchip diagnostics were optimized by choosing the proper allele-specific oligonucleotides from among the set of overlapping oligomers. The accuracy of mutation detection can be increased by simultaneous hybridization of the microchip with two differently labeled samples and by parallel monitoring of their hybridization with a multi-wavelength fluorescence microscope. The efficiency and reliability of the sequence analysis were demonstrated with diagnostics for beta-thalassemia mutations.
Article
The genomic sequence of the budding yeast Saccharomyces cerevisiae has been used to design and synthesize high-density oligonucleotide arrays for monitoring the expression levels of nearly all yeast genes. This direct and highly parallel approach involves the hybridization of total mRNA populations to a set of four arrays that contain a total of more than 260,000 specifically chosen oligonucleotides synthesized in situ using light-directed combinatorial chemistry. The measurements are quantitative, sensitive, specific, and reproducible. Expression levels ranging from less than 0.1 copies to several hundred copies per cell have been measured for cells grown in rich and minimal media. Nearly 90% of all yeast mRNAs are observed to be present under both conditions, with approximately 50% present at levels between 0.1 and 1 copy per cell. Many of the genes observed to be differentially expressed under these conditions are expected, but large differences are also observed for many previously uncharacterized genes.
Article
The human genome encodes approximately 100,000 different genes, and at least partial sequence information for nearly all will be available soon. Sequence information alone, however, is insufficient for a full understanding of gene function, expression, regulation, and splice-site variation. Because cellular processes are governed by the repertoire of expressed genes, and the levels and timing of expression, it is important to have experimental tools for the direct monitoring of large numbers of mRNAs in parallel. We have developed an approach that is based on hybridization to small, high-density arrays containing tens of thousands of synthetic oligonucleotides. The arrays are designed based on sequence information alone and are synthesized in situ using a combination of photolithography and oligonucleotide chemistry. RNAs present at a frequency of 1:300,000 are unambiguously detected, and detection is quantitative over more than three orders of magnitude. This approach provides a way to use directly the growing body of sequence information for highly parallel experimental investigations. Because of the combinatorial nature of the chemistry and the ability to synthesize small arrays containing hundreds of thousands of specifically chosen oligonucleotides, the method is readily scalable to the simultaneous monitoring of tens of thousands of genes.
Article
Genome-wide expression analysis was used to identify genes whose expression depends on the functions of key components of the transcription initiation machinery in yeast. Components of the RNA polymerase II holoenzyme, the general transcription factor TFIID, and the SAGA chromatin modification complex were found to have roles in expression of distinct sets of genes. The results reveal an unanticipated level of regulation which is superimposed on that due to gene-specific transcription factors, a novel mechanism for coordinate regulation of specific sets of genes when cells encounter limiting nutrients, and evidence that the ultimate targets of signal transduction pathways can be identified within the initiation apparatus.
Article
DNA microarrays can be used to measure the expression patterns of thousands of genes in parallel, generating clues to gene function that can help to identify appropriate targets for therapeutic intervention. They can also be used to monitor changes in gene expression in response to drug treatments. Here, we discuss the different ways in which microarray analysis is likely to affect drug discovery.
Article
Oligonucleotide microarray (DNA chip)-based hybridization analysis is a promising new technology which potentially allows rapid and cost-effective screens for all possible mutations and sequence variations in genomic DNA. Here, I review current strategies and uses for DNA chip-based resequencing and mutational analysis, the underlying principles of experimental designs, and future efforts to improve the sensitivity and specificity of chip-based assays.