
Minimum information about a microarray experiment (MIAME) - Toward standards for microarray data

Abstract

Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.
commentary
nature genetics •
volume 29 • december 2001
365
Alvis Brazma1, Pascal Hingamp2, John Quackenbush3, Gavin Sherlock4, Paul Spellman5,
Chris Stoeckert6, John Aach7, Wilhelm Ansorge8, Catherine A. Ball4, Helen C. Causton9,
Terry Gaasterland10, Patrick Glenisson11, Frank C.P. Holstege12, Irene F. Kim4, Victor
Markowitz13, John C. Matese4, Helen Parkinson1, Alan Robinson1, Ugis Sarkans1, Steffen
Schulze-Kremer14, Jason Stewart15, Ronald Taylor16, Jaak Vilo1 & Martin Vingron17
1European Bioinformatics Institute, EMBL outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. 2Centre d’Immunologie de
Marseille Luminy (CIML TAGC) & Université de la Méditerranée, Marseille, France. 3The Institute for Genomic Research (TIGR), Rockville, Maryland,
USA. 4Stanford University, Palo Alto, California, USA. 5University of California Berkeley, Berkeley, California, USA. 6University of Pennsylvania,
Philadelphia, Pennsylvania, USA. 7Department of Genetics, Harvard Medical School, Cambridge, Massachusetts, USA. 8European Molecular Biology
Laboratory (EMBL), Heidelberg, Germany. 9CSC/Imperial College School of Medicine Microarray Centre, London, UK. 10Rockefeller University, New York,
New York, USA. 11Katholieke Universiteit Leuven, Leuven, Belgium. 12University Medical Center, Utrecht, Netherlands. 13GeneLogic Inc, Gaithersburg,
Maryland, USA. 14RZDP German Genome Resource Center, Berlin, Germany. 15Open Informatics, Albuquerque, New Mexico, USA. 16Center for
Computational Pharmacology, University of Colorado School of Medicine, Denver, Colorado, USA. 17Max Planck Institute for Molecular Genetics, Berlin,
Germany. Correspondence should be addressed to A.B. (e-mail: brazma@ebi.ac.uk) or J.Q. (e-mail: johnq@tigr.org).
Introduction
After genome sequencing, DNA microarray analysis1 has become
the most widely used source of genome-scale data in the life sci-
ences. Microarray expression studies are producing massive
quantities of gene expression and other functional genomics
data, which promise to provide key insights into gene function
and interactions within and across metabolic pathways2–4.
Unlike genome sequence data, however, which have standard
formats for presentation and widely used tools and databases,
much of the microarray data generated so far remain inaccessible
to the broader research community.
Several factors contribute to the barrier to widespread access to
microarray data. The field is young and has only recently
approached the maturity needed to identify important aspects of
the data. In addition, gene expression data are more complex than
sequence data in that they are meaningful only in the context of a
detailed description of the conditions under which they were gen-
erated, including the particular state of the living system under
study and the perturbations to which it has been subjected. In con-
trast to an organism’s genome, there are as many transcriptomes as
there are cell types multiplied by environmental conditions. More-
over, comparing gene expression data is considerably more diffi-
cult, because at present, microarrays do not measure gene expres-
sion levels in any objective units. In fact, most measurements report
only relative changes in gene expression, using a reference which is
rarely standardized. Finally, different microarray platforms and
experimental designs produce data in various formats and units
and are normalized in different ways, all of which makes compari-
son and integration of these data an error-prone exercise5,6.
Although the largest microarray laboratories have established
their own databases7, microarray data accompanying publica-
tions are typically reported on authors’ web sites using a variety
of formats, if they are accessible at all. Exactly what annotation
should be provided for microarray data is open to debate, but it is
clear that most of the publicly available data are currently not
annotated in sufficient detail for use by independent parties (in
fact, they are often not annotated at all). The reported data are
often completely ‘stripped’ of all the evidence about the quality,
reliability and possible error levels of particular data points. For
instance, for two-channel microarray data, it is common to
report only the background subtracted signal ratios without
indicating anything about the absolute signal and background
levels. Yet these are important for assessing the reliability of the
measured expression for each arrayed gene.
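The point can be illustrated with a short sketch. The intensities below are hypothetical numbers chosen for illustration, and the function name is ours rather than part of any standard; the sketch shows only that two spots with very different absolute signals can report the same background-subtracted ratio.

```python
# Hypothetical two-channel readings per spot, in arbitrary scanner units:
# (red signal, red background, green signal, green background).
spots = {
    "bright_spot": (20100.0, 100.0, 10100.0, 100.0),
    "faint_spot": (120.0, 100.0, 110.0, 100.0),
}


def background_subtracted_ratio(red, red_bg, green, green_bg):
    """Return the red/green ratio after subtracting each channel's background."""
    return (red - red_bg) / (green - green_bg)


ratios = {name: background_subtracted_ratio(*reading) for name, reading in spots.items()}
for name, ratio in ratios.items():
    print(name, ratio)
```

Both spots report a ratio of 2, but for the faint spot that ratio rests on signals only 20 and 10 units above a background of 100. Without the absolute signal and background levels, a reader cannot tell the two cases apart.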
© 2001 Nature Publishing Group http://genetics.nature.com
It is widely acknowledged that there is a need for public repos-
itories for microarray data8,9, whose functions would include
providing access to supporting data for publications based on
microarray experiments. Such repositories are under develop-
ment by the National Center for Biotechnology Information
(which has developed the Gene Expression Omnibus), the DNA
Database of Japan, and the European Bioinformatics Institute
(which has developed ArrayExpress); however, it is less clear
exactly what information should be stored in such databases.
Should databases store raw microarray scans (images), or is one
final summary scalar per array element (such as one green/red
ratio per spot for two channel platforms) sufficient? Or should
some intermediate data, such as the complete output from a par-
ticular image analysis software package, be used instead? And
should this be reported as ‘raw’ data or should it be normalized?
What information about the experimental set-up should be
required? And how should the array elements (printed spots or
features) be annotated to facilitate an understanding of the
experimental results?
The precise nature of the information to be stored will be dic-
tated by the function of the particular database or repository. If
the sole goal of the database is to archive supporting data for
published experiments, it can be assumed that the publications
themselves will provide information explaining the database
entries. It may be argued that it can be left to peer review to
ensure that a particular publication together with the respective
database entry contains the information that is necessary to ver-
ify and reproduce the experimental results. It is unlikely, how-
ever, that such a system could be effective or scalable. Moreover,
the value and usefulness of such a nonstandardized database
would be considerably limited. For instance, it would be difficult
to use the database for high-throughput automated data analysis
or mining. The experience of the sequence databases over the
past decade unequivocally demonstrates the strategic impor-
tance of structured, consistent annotation applied early in the
process of data generation.
We believe that it is necessary to define the minimum informa-
tion that must be reported, in order to ensure the interpretability
of the experimental results generated using microarrays as well as
their potential independent verification. Here we propose a doc-
ument called MIAME, the Minimum Information About a
Microarray Experiment, as a starting point for a broader com-
munity discussion. To make the task more manageable, we focus
on microarray-based gene expression data, which arguably cov-
ers the most popular applications of microarray technology. We
believe that the adoption of such a standard will facilitate the
establishment and usefulness of microarray databases. MIAME
should also prompt microarray manufacturers and software pro-
ducers to develop adequate microarray laboratory information
management systems (LIMS), enabling the production and cap-
ture of MIAME-compatible primary data at the bench.
The idea of having a defined minimum standard for information
associated with experiments is not new to life sciences. A similar
mode of operation has been adopted by the macromolecular struc-
ture community (see, for example, http://msd.ebi.ac.uk/), where
most journals require submission of a well-defined minimum of
raw data associated with publications. Not unlike crystallography
data, those generated by microarray experiments are usually of a
size and complexity that are meaningless to the general research
community unless a minimum defined standard states what data
are sufficient to support and verify conclusions.
Over and above representation of expression measurements,
MIAME addresses the need for the comprehensive annotation
necessary to interpret the results of microarray data. It is plat-
form-independent but includes essential evidence about how the
gene expression level measurements have been obtained. The
first version of a MIAME document (MIAME 1.0) was recently
completed and is used here as the basis for this discussion (Web
Note A). A glossary of terms (Web Note B) and an example of a
MIAME-compliant description of an experiment (Web Note C)
were prepared to facilitate discussion of the proposed standard.
MIAME is being developed by the Microarray Gene Expression
Database group (MGED; http://www.mged.org/), a grass-roots
movement to develop standards for microarray data5. MIAME
1.0 was approved at the MGED 3 meeting at Stanford University
on 28–31 May 2001. It should be noted that MIAME does not
specify the format in which the information should be provided,
but only its content. A technical data format to capture microar-
ray information in a form that includes MIAME requirements is
being developed in a collaborative initiative coordinated by the
Life Sciences Research Task Force of the Object Management
Group (OMG) and, in fact, is well under way.
Gene expression—a conceptual view
A collection of gene expression data can be viewed abstractly as a
table with rows representing genes, columns representing vari-
ous samples and each position in the table describing the mea-
surement for a particular gene in a particular sample (Fig. 1). We
call this table a gene expression matrix. In addition to the matrix,
a description of a microarray experiment should also contain
information about the genes whose expression has been mea-
sured and the experimental conditions under which the samples
were taken. The information required to describe a microarray
experiment can be divided conceptually into three logical parts:
gene annotation, sample annotation and a gene expression
matrix (Fig. 1).
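As a minimal sketch of this three-part view, the structure below keeps the gene annotation, the sample annotation and the matrix together. The class name, field names, identifiers and values are all hypothetical; MIAME specifies content, not a data format.

```python
from dataclasses import dataclass


@dataclass
class ExpressionData:
    """Three-part conceptual model: gene annotation, sample annotation, matrix."""
    gene_annotation: dict    # gene id -> annotation, e.g. a link to a sequence database
    sample_annotation: dict  # sample id -> description of the sample and its conditions
    matrix: list             # rows follow gene order, columns follow sample order

    def level(self, gene_id, sample_id):
        """Look up the measurement for one gene in one sample."""
        row = list(self.gene_annotation).index(gene_id)
        column = list(self.sample_annotation).index(sample_id)
        return self.matrix[row][column]


data = ExpressionData(
    gene_annotation={"geneA": "sequence database link A", "geneB": "sequence database link B"},
    sample_annotation={"control": "untreated cells", "treated": "cells after treatment"},
    matrix=[[1.0, 2.0], [1.0, 0.5]],
)
print(data.level("geneB", "treated"))
```

Keeping the annotations next to the matrix reflects the argument above: a bare table of numbers cannot be interpreted without knowing what each row and column stands for.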
Ideally, we would like to measure amounts of gene expression in natural units, such as mRNA copies per cell10, and to have an error estimate or reliability indicator such as the standard deviation (s.d.) associated with each value. There are a number of experimental challenges, however, that make direct measurement of gene expression difficult. Raw data from microarray experiments are images from hybridized microarray scans that have to be analyzed to identify and quantify each feature (spot) in the image. A DNA sequence may be spotted on a microarray several times; in addition, several distinct DNA sequences may be spotted that map to the same gene. To yield a single value for these, the corresponding measurements have to be combined. Moreover, the same biological condition can be measured in several (replicate) hybridizations, and the information from all replicates has to be summarized to derive a single gene expression data matrix. Finally, to compare microarray data from several samples, the data must be appropriately normalized.
Fig. 1 Conceptual view of gene expression data. The model has three parts: (i) gene annotation, which may be given as links to gene sequence databases, (ii) sample annotation, for which there currently are no public external databases (except the species taxonomy) and (iii) the gene expression matrix, in which each position contains information characterizing the expression of a particular gene in a particular sample. [Figure labels: genes (rows), samples (columns), gene annotation, sample annotation, gene expression matrix, gene expression levels.]
There are at least three levels of data relevant to a microarray
experiment: (i) the scanned images (raw data); (ii) the quantita-
tive outputs from the image analysis procedure (microarray
quantitation matrices); and (iii) the derived measurements (gene
expression data matrices; Fig. 2). There is an important series of
transformations leading from raw data to the gene expression
matrix, and the steps involved are far from being standardized.
As there are no widely used standard controls for microarray
assays, microarray data from different sources use different mea-
surement units whose conversion factors are typically unknown
and may even vary depending on expression level. This indicates
the necessity to record not only the final gene expression matrix,
but also a detailed description of how the expression values were
obtained, if verification of the data is to be ensured. Conse-
quently, the nature of the data that must be recorded necessarily
becomes more complex.
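The chain of transformations can be sketched as follows. This is one possible pipeline under stated assumptions, not a method prescribed by MIAME: the intensities are hypothetical and already background-subtracted, spots mapping to the same gene are combined by averaging their log2 ratios, and normalization is a simple median centering. Each of these choices is exactly the kind of detail that would need to be recorded alongside the final values.

```python
import math
from statistics import mean, median

# Hypothetical quantification matrix for one array: one row per spot, with
# background-subtracted red and green intensities.
quantification = [
    {"spot": 1, "gene": "geneA", "red": 800.0, "green": 200.0},
    {"spot": 2, "gene": "geneA", "red": 400.0, "green": 100.0},
    {"spot": 3, "gene": "geneB", "red": 200.0, "green": 100.0},
    {"spot": 4, "gene": "geneC", "red": 50.0, "green": 200.0},
]


def gene_log_ratios(rows):
    """Combine all spots for a gene into one value (mean of log2 red/green)."""
    per_gene = {}
    for row in rows:
        per_gene.setdefault(row["gene"], []).append(math.log2(row["red"] / row["green"]))
    return {gene: mean(values) for gene, values in per_gene.items()}


def median_center(values):
    """Normalize one array by subtracting its median log ratio (an assumed method)."""
    shift = median(values.values())
    return {gene: value - shift for gene, value in values.items()}


combined = gene_log_ratios(quantification)
column = median_center(combined)
print(column)
```

The resulting dictionary corresponds to one column of the gene expression matrix. A different combination rule or normalization would give different numbers from the same scans.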
Because microarray data have meaning only in the context of
the particular biological sample and the exact conditions under
which the samples were taken, a major element of the standard
that we propose addresses sample annotation. For instance, if we
are interested in finding out how different cell types react to
treatments with various chemical compounds, we must record
unambiguous information about the cell types and compounds
used in the experiments. This information should be contained
in sample annotation.
Although gene annotation can to a certain extent be expressed
by links to sequence databases, the possibly complicated many-
to-many relationships between genes in the gene expression
matrix and elements on the array make it necessary to provide a
full and detailed description of each element on the array.
General principles of MIAME design
As a starting point, we propose that for the data and annotations
from microarray experiments to have the most value, they should
satisfy the following requirements: (i) the recorded information
about each experiment should be sufficient to interpret the experi-
ment and should be detailed enough to enable comparisons to sim-
ilar experiments and permit replication of experiments and (ii) the
information should be structured in a way that enables useful
querying as well as automated data analysis and mining.
The first requirement implies that a detailed annotation of the
sample and other experimental conditions should be recorded and
that some reliability estimates of particular data points should be
given. For example, red/green ratios alone for two-channel platforms
cannot normally be regarded as sufficient; currently there is
no widely accepted method for indicating the confidence in a mea-
surement, and much intensity-specific and expression-level infor-
mation is lost. The necessary level of detail and whether the raw
image data should be included are less obvious and are still widely
discussed in the microarray community.
The second requirement implies the need for controlled vocabu-
laries and ontologies to represent data as well as the need to limit
free-format text only to cases where more structured representa-
tions are not feasible. This includes the use of a standardized
nomenclature for the description of biological samples and conditions.
The nomenclature may be as simple as a controlled vocabulary
or as fully developed as an ontology. Shared use of a single
taxonomic classification (http://www.ncbi.nlm.nih.gov/Taxonomy/)
is an example of an ontology that allows researchers to
unambiguously determine the phylogenetic relationship between
organisms. Unfortunately, such resources are not available for
many other types of sample annotation, particularly those specific
for individual species, such as ‘developmental stage’ (see the MGED
Ontology Working Group home page for reference). A practical
approach may be to initially use free-text descriptions for some
sample annotations, despite the difficulty of incorporating them in
automated queries. The use of free text may be sufficient, for exam-
ple, in describing details of a laboratory protocol. A reference to a
publication describing the experiment is an alternative; however,
there are obvious drawbacks to not having information in hand for
either queries or browsing.
Fig. 2 Three levels of microarray gene expression data processing. The raw data from microarray experiments are images. These images have to be quantified by image analysis software, which identifies spots related to each element on the array and measures the fluorescence intensity of each spot in each channel, together with the background intensity and a number of other quantifications, depending on the particular software (microarray quantification matrices). To obtain the final gene expression matrix, all the quantities related to each gene (either on the same array or on replicate arrays) have to be combined and the entire matrix has to be normalized to make different arrays comparable. [Figure labels: raw data (array scans); quantification matrices (spots by quantifications, each entry a quantification datum); gene expression data matrix (genes by samples, each entry a gene expression level).]
Although the goal of MIAME is to specify only the content of the information and not the technical format, MIAME includes recommendations for which parts of the information should be provided as controlled vocabularies. The distinction between free-text format and controlled vocabularies influences the information content: a defined term taken from a given controlled vocabulary is more precise in its meaning than the same term used in a free-text field and can provide more advanced data query and analysis options. As the majority of the necessary controlled vocabularies do not exist, the MIAME definition includes lists of ‘qualifier, value, source’ triplets, which authors can use to define their own qualifiers and provide the appropriate values. For instance:
qualifier: cell type
value: epithelial
source: Gray’s anatomy (38th ed.)
or
qualifier: treatment
value: 15heat shock
source: Smith and Jones, Nature Genet. (1992)
Given sufficient detail by the author, these triplets can fully
describe a particular aspect of an experiment. The idea stems
from the information sciences, where a ‘qualifier’ defines a con-
cept and a ‘value’ contains the appropriate instance of the con-
cept. ‘Source’ is either user-defined or a reference to an externally
defined ontology or controlled vocabulary, such as the species
taxonomy database. The judgment regarding the necessary level
of detail is left to the data providers. In the future, these qualifier
lists may be gradually supplemented with predefined fields as the
respective ontologies are developed.
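A minimal sketch of how such triplets might be held and queried is given below. The class and function names are our own assumptions, as is the second (organism) triplet; only the cell type example is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One 'qualifier, value, source' annotation."""
    qualifier: str  # the concept being described
    value: str      # the instance of that concept
    source: str     # user-defined, or a reference to an external vocabulary


sample_description = [
    Triplet("cell type", "epithelial", "Gray's anatomy (38th ed.)"),
    # Hypothetical triplet pointing at an externally defined taxonomy.
    Triplet("organism", "Mus musculus", "species taxonomy database"),
]


def values_for(triplets, qualifier):
    """Return every value recorded under the given qualifier."""
    return [t.value for t in triplets if t.qualifier == qualifier]


print(values_for(sample_description, "cell type"))
```

Because every value travels with its source, a query can distinguish a term drawn from a shared vocabulary from one defined by the submitter, which free text does not allow.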
The aim of establishing microarray databases should be kept in
mind, although the MIAME document is conceptually indepen-
dent from it. An important principle in MIAME is that its parts
can be provided as references or links to pre-existing and identifi-
able descriptions. For instance, for commercial or other standard
arrays, required information needs to be provided only once by
the array supplier and referenced thereafter by the users. Stan-
dard protocols also need to be provided only once after they are
established, whereas specific deviations and parameters may be
provided with each experiment (note that this would allow users
to create a library of standard protocols). It is necessary that
either a valid reference or the information itself be provided for
every experimental data set.
There is one important additional principle underpinning
MIAME: as microarray technology is developing rapidly, it
would be counterproductive and unrealistic to impose on users
any particular platform, software or methods of data analysis.
Instead the standards should simply require the description of data in sufficient detail and with sufficient annotation, so that interested parties will have all the necessary information to understand how conclusions were reached. Note that we assume that data will be produced by different experimental platforms and laboratories, which means that few default assumptions can be made and most of the information should be reported explicitly by each laboratory.
In developing MIAME, we have sought to find a compromise between placing a burden on data producers to annotate experiments in elaborate detail and ensuring that data are
annotated in enough detail to be useful to the general research
community. Too much detail may be too taxing for data produc-
ers and may complicate data recording and database submission,
whereas too little detail may limit the usefulness of the data.
MIAME is an informal specification, the goal of which is to guide
cooperative data providers. It is not designed to close all possible
loopholes in data submission requirements. MIAME is not
designed as a ‘questionnaire’ that can be filled in, but only as an
informal specification on which microarray experiment–annota-
tion tools may be based.
The six parts of MIAME
We define a microarray experiment as a set of one or more
hybridizations, each of which relates one or more samples to one
or more arrays. The hybridized array is then scanned and the
resulting image analyzed, relating each element on the array with
a set of measurements (Fig. 3). The data are normalized and
combined with data from replicate hybridizations.
The minimum information about a published microarray-
based gene expression experiment includes a description of the
following six sections:
1. Experimental design: the set of hybridization experiments as a whole
2. Array design: each array used and each element (spot, feature) on the array
3. Samples: samples used, extract preparation and labeling
4. Hybridizations: procedures and parameters
5. Measurements: images, quantification and specifications
6. Normalization controls: types, values and specifications
Each of these sections contains information that can be provided
using controlled vocabularies, as well as fields that use free-text for-
mat. Here we discuss only the general information required in each
of these sections; for a full description, see the MIAME document
(Web Note A), which includes a sample experiment described
according to MIAME requirements. We do not discuss why we
regard each of the MIAME elements as necessary, but we hope that
this follows from the principles discussed in the previous sections.
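An annotation tool based on the six sections listed above might begin with a completeness check along the following lines. This is a sketch only: the section keys, the function name and the draft contents are hypothetical, and MIAME itself defines neither field names nor a file format.

```python
# Keys standing in for the six MIAME sections (names are our own assumption).
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization_controls",
)


def missing_sections(description):
    """List the sections that are absent or empty in a draft description."""
    return [name for name in MIAME_SECTIONS if not description.get(name)]


# Hypothetical draft with two sections still to be supplied.
draft = {
    "experimental_design": {"type": "time course"},
    "array_design": {"reference": "array design D1"},
    "samples": [{"id": "S1"}],
    "measurements": {"images": ["scan1.tiff"]},
}
missing = missing_sections(draft)
print(missing)
```

Such a check tests only that each section is present, not that its content is adequate; the judgment about the necessary level of detail remains with the data provider.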
In constructing the MIAME standard, we were careful to include as much relevant information as possible to aid in the interpretation of the results of each microarray experiment. Some have suggested that this is excessive as, for example, sample preparation and labeling protocols typically appear in the Methods sections of publications associated with microarray experiments. Although these assertions are merited, we believe that the comprehensive nature of MIAME confers distinct advantages without imposing an excessive burden on submitters of microarray data. There are a number of reasons for this. First, stand-alone entries make the use of a database inherently more efficient. Second, a collection of protocols in a relatively standard format (including controlled vocabularies) would facilitate comparison and usage of the protocols by third parties. Third, not all journals publish experimental protocols in sufficient detail (often due to length considerations) for others to reproduce them. Fourth, once this information has been prepared for a journal publication, providing it for a database submission should not constitute significant additional effort. Moreover, journals are increasingly relying on electronic release of both protocols and the data supporting published reports, and the MIAME standard would allow for a uniform presentation of such information.
Fig. 3 A schematic representation of six components of a microarray experiment. [Figure labels: experiment, sample, array, hybridization, data, normalization; external links to publications (e.g. PubMed), source (e.g. taxonomy) and gene (e.g. GENBANK).]
Part 1: Experimental design. This section describes the exper-
iment as a whole, which may consist of one or more hybridiza-
tions. Normally an ‘experiment’ should include a set of
hybridizations that are inter-related and address a common bio-
logical question, such as all hybridizations relating to research
published in a single paper. Each experiment should have an
author (submitter) as well as contact information, links (URL),
citations and a single-sentence experiment title. The section also
includes a free-text format description of the experiment or a
link to an electronically available publication.
The minimal information required in this section includes the
type of the experiment (such as normal-versus-diseased compar-
ison, time course, dose response, and so on) and the experimen-
tal variables, including parameters or conditions tested (such as
time, dose, genetic variation or response to a treatment or com-
pound). This section also provides general quality-related indi-
cators such as usage and types of replicates and quality-control
steps (such as dealing with low-complexity sequence-induced
nonspecific hybridization). Provided in a format of controlled
vocabularies, these will enable accurate queries and more formal
data analysis than free-text descriptions.
Finally, this section specifies the experimental relationships
between the array and sample entities, that is, which samples
and which arrays were used in each hybridization assay. Each of
these will be assigned unique identifiers that are cross-referenced
with the information provided in the following sections. This
information will allow the user to reconstruct unambiguously
the experimental design and to relate together information from
further MIAME sections.
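The cross-referencing of identifiers can be sketched as a simple consistency check. All identifiers, descriptions and the function name below are hypothetical; the sketch assumes only that each hybridization lists the samples and arrays it used.

```python
# Hypothetical identifier tables for one experiment.
samples = {"S1": "untreated culture", "S2": "treated culture"}
arrays = {"A1": "array design D1", "A2": "array design D1"}
hybridizations = [
    {"id": "H1", "samples": ["S1", "S2"], "arrays": ["A1"]},
    {"id": "H2", "samples": ["S2", "S3"], "arrays": ["A2"]},
]


def dangling_references(hybridizations, samples, arrays):
    """Report sample or array identifiers used in a hybridization but never described."""
    problems = []
    for hyb in hybridizations:
        for sample_id in hyb["samples"]:
            if sample_id not in samples:
                problems.append((hyb["id"], "sample", sample_id))
        for array_id in hyb["arrays"]:
            if array_id not in arrays:
                problems.append((hyb["id"], "array", array_id))
    return problems


problems = dangling_references(hybridizations, samples, arrays)
print(problems)
```

Here hybridization H2 refers to a sample S3 that is not described anywhere, so the experimental design could not be reconstructed unambiguously from this record.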
Part 2: Array design. The aim of this section is to provide a
systematic definition of all arrays used in the experiment, includ-
ing the genes represented and their physical layout on the array.
There are two parts to this section. The first is a list of the physical
arrays; each member of the list is a simple description that gives a
unique ID to each array used in the experiment and a reference to
a particular array design. These designs are described in the sec-
ond part of the section.
In the context of a database, array types should be defined and
submitted only once by the array provider and referred to thereafter
by users of the arrays. The array-type definition includes informa-
tion common to all arrays of a particular type (such as glass-slide
spotted with PCR-amplified cDNA clones) as well as precise
descriptions of the physical content of each element (spot or fea-
ture). This section consists of three parts: (i) a description of the
array as a whole (such as platform type, provider and surface type);
(ii) a description of each type of element or spot used (properties
that are typically common to many elements, such as ‘synthesized
oligo-nucleotides’ or ‘PCR products from cDNA clones’); and (iii)
a description of the specific properties of each element, such as the
DNA sequence and, possibly, quality-control indicators.
The challenge for element definition is to achieve a unique and
unambiguous description of the element. Because references to
an external gene index may not be stable, it is essential to physi-
cally identify each element’s composition. Disclosing the nature
of the relationship between an array element and its cognate
gene’s transcript allows informed assessment of an element’s
potential for nonspecific cross-hybridization or its capacity to
distinguish alternative splice variants. Thus, where elements are
based on cDNA clones, PCR amplicons or composite oligonu-
cleotides, it is necessary that clone IDs, primer pair sequences or
oligonucleotide sequence sets, respectively, are specified. In the
case of commercial arrays where such details may be proprietary,
MIAME allows for a compromise: instead of the actual probe
sequence, the reference sequence from which the probe was
derived may be specified. Although we do not consider this to be
ideal, as the probe sequence may affect the results and interpreta-
tion of hybridization, it does allow commercial organizations to
protect their intellectual investment in developing their array
reagents while providing the minimum information necessary to
uniquely identify the array elements.
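The identification rule described above can be sketched as a check on each array element. The element type names, field names and example identifiers are hypothetical, and treating a reference sequence as an acceptable fallback for any element type is our simplification of the compromise for proprietary arrays.

```python
# Field that physically identifies an element, by element type. Type and
# field names are hypothetical, not defined by MIAME.
REQUIRED_FIELD = {
    "cdna_clone": "clone_id",
    "pcr_amplicon": "primer_pair",
    "composite_oligo": "oligo_sequences",
}


def is_physically_identified(element):
    """True if the element carries its identifying field or, as a fallback,
    the reference sequence from which the probe was derived."""
    required = REQUIRED_FIELD.get(element["type"])
    has_required = required is not None and bool(element.get(required))
    return has_required or bool(element.get("reference_sequence"))


elements = [
    {"id": "E1", "type": "cdna_clone", "clone_id": "CLONE-0001"},
    # Only an external gene index link, which may not be stable.
    {"id": "E2", "type": "pcr_amplicon", "gene_index_link": "GI-42"},
    {"id": "E3", "type": "composite_oligo", "reference_sequence": "REF-0007"},
]
unidentified = [e["id"] for e in elements if not is_physically_identified(e)]
print(unidentified)
```

Element E2 is flagged because a link to an external gene index alone does not identify the element's physical composition.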
Part 3: Samples. This section describes the second partner in
the hybridization reaction: the labeled nucleic acids that repre-
sent the transcripts in the sample. The MIAME ‘sample’ concept
represents the biological material (or biomaterial) for which the
gene expression profile is being established. This section is
divided into three parts which describe the source of the original
sample (such as organism taxonomy and cell type) and any bio-
logical in vivo or in vitro treatments applied, the technical extrac-
tion of the nucleic acids, and their subsequent labeling.
As the characteristics necessary to accurately define a biologi-
cal sample vary greatly from organism to organism, most of the
biological sample definition is provided as an adaptable list of
‘qualifier, value, source’ triplets (such as ‘strain’, ‘129P1-Lama2dy’
or ‘ICSGNM’). Currently, the single common feature of all samples
is the organism’s taxonomic definition. A list of qualifiers
initially left at a submitter’s discretion may progressively be made
standard when applicable ontologies are made public.
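A minimal sketch of how such a sample description might be held in code follows. It reads the example in the text as qualifier ‘strain’, value ‘129P1-Lama2dy’ and source ‘ICSGNM’, which is our interpretation. The names `Qualifier` and `describe_sample` are hypothetical, and the organism is supplied only for illustration.

```python
from typing import List, NamedTuple


class Qualifier(NamedTuple):
    """One 'qualifier, value, source' triplet describing a sample."""
    qualifier: str
    value: str
    source: str  # where the value's definition comes from (our reading)


def describe_sample(organism: str, qualifiers: List[Qualifier]) -> dict:
    """Bundle the one feature common to all samples (the organism) with an
    open-ended list of triplets; reject a sample without an organism."""
    if not organism.strip():
        raise ValueError("organism taxonomy is required for every sample")
    return {"organism": organism, "qualifiers": list(qualifiers)}


# Illustrative values only.
sample = describe_sample(
    "Mus musculus",
    [Qualifier("strain", "129P1-Lama2dy", "ICSGNM")],
)
```

Keeping the qualifiers as an open list rather than as fixed fields mirrors the adaptable design: new qualifiers can be added per organism, and a repository could later restrict the permitted names once ontologies exist.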
As for laboratory protocols for sample treatments, sample
extraction and labeling, these will need to be specified initially as
free-format text. Again, it is anticipated that popular protocols
will be provided once and referred to thereafter, with submitters
noting the exact parameters and any deviations from the standard
protocol. Knowledge of these protocols may be important
for interpreting the data.
Part 4: Hybridizations. This section defines the laboratory
conditions under which the hybridizations were carried out.
Other than a free-text description of the hybridization protocol,
MIAME requires that a number of critical hybridization parame-
ters are explicitly specified: choice of hybridization solution
(such as salt and detergent concentrations), nature of the block-
ing agent, wash procedure, quantity of labeled target used,
hybridization time, volume, temperature and descriptions of the
hybridization instruments.
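The explicit parameters listed above lend themselves to a completeness check. The sketch below assumes a hybridization record stored as a plain dictionary. The key names are hypothetical labels chosen for this example and are not identifiers defined by MIAME.

```python
# Parameters the text lists as needing explicit values; the key names are
# hypothetical labels, not identifiers defined by MIAME.
REQUIRED_HYBRIDIZATION_FIELDS = (
    "hybridization_solution",
    "blocking_agent",
    "wash_procedure",
    "labeled_target_quantity",
    "time",
    "volume",
    "temperature",
    "instruments",
)


def missing_hybridization_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a record."""
    return [
        name for name in REQUIRED_HYBRIDIZATION_FIELDS
        if record.get(name) in (None, "")
    ]
```

A submission tool could call `missing_hybridization_fields` and prompt for whatever it returns, while leaving the free-text protocol description unconstrained.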
Part 5: Measurements. The actual experimental results are
defined in this section. It consists of the three parts discussed
in Section 2, progressing from raw to processed data: (a) the
original scans of the array (images), (b) the microarray quan-
tification matrices based on image analysis, and (c) the final
gene expression matrix after normalization and consolidation
from possible replicates.
Image data should be provided as raw scanner image files (such
as TIFF), accompanied by scanning information that includes rel-
evant scan parameters and laboratory protocols. MIAME does
not require a particular image format, only that submitters pro-
vide the original scans upon which data quantification was based
in a format readable by generally available software. Storing the
primary image files would require a significant quantity of disk
space, and there is no community consensus as to whether this
would be cost-effective or whether this should be the task of pub-
lic repositories or the primary authors. Nevertheless, as images
represent the primary data from a microarray assay and the algo-
rithms used for analysis can affect the conclusions that are
reached, the current MIAME standard includes a specification for
image deposition. As scanning protocols and image analysis
methods mature, this mandatory requirement on image files may
be revisited.
For each experimental image, a microarray quantification
matrix contains the complete image analysis output as directly
generated by the image analysis software (normally provided as
separate spreadsheet-type files). Note that for a given image this
is a 2D matrix, where array elements (spots or features) consti-
tute one dimension and quantification types (such as mean and
median intensity, mean or median background intensity) are the
second dimension. We also provide in this section the collateral
information needed to understand how image analysis was car-
ried out, in particular the software used, the underlying method-
ology (such as algorithms and statistics), all relevant parameters
and the definitions of the quantifications used (such as mean or
median intensity). Note that if authors use their own custom-
made (or customized) image-analysis software, the specification
of its output is not formally dictated by MIAME. Nevertheless, in
the spirit of MIAME, the output should include the information
that permits the nature and quality of individual spot measure-
ments to be assessed.
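The two-dimensional layout of a quantification matrix can be illustrated as follows. This is a sketch with invented numbers. The column names and the background-subtraction step are assumptions chosen to show how a derived value is read off the matrix, not a calculation that MIAME prescribes.

```python
from typing import Dict, List

# Columns: quantification types reported by the image analysis software.
QUANTIFICATION_TYPES = ["mean_intensity", "median_intensity",
                        "median_background"]

# Rows: array elements (spots). The numbers are made up for illustration.
quantification_matrix: Dict[str, List[float]] = {
    "spot_0001": [1520.0, 1490.0, 210.0],
    "spot_0002": [310.0, 305.0, 220.0],
}


def column(matrix: Dict[str, List[float]], quantification: str) -> Dict[str, float]:
    """Extract one quantification type for every array element."""
    index = QUANTIFICATION_TYPES.index(quantification)
    return {spot: row[index] for spot, row in matrix.items()}


def background_subtracted(matrix: Dict[str, List[float]]) -> Dict[str, float]:
    """Median intensity minus median background, one value per spot."""
    signal = column(matrix, "median_intensity")
    background = column(matrix, "median_background")
    return {spot: signal[spot] - background[spot] for spot in matrix}
```

Because the definitions of the columns travel with the matrix, a reader can tell which statistic each number represents, which is the purpose of the accompanying software and parameter description.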
Finally, the gene expression matrix (summarized information)
consists of sets of gene expression levels for each sample. If
microarray quantification matrices can be considered
spot/image centric, then the gene expression matrix is gene/sam-
ple centric. At this point, the expression values may have been
normalized, consolidated and transformed in any number of
ways by the submitter in order to present the data in a form
amenable to scientific analysis. Rather than attempting to impose
a standard for gene expression values, MIAME states a preference for
detailed specification of all numerical calculations
applied to the unprocessed quantifications in (b) that have led to the
data in (c). Experimenters are encouraged, though not required,
to provide reliability indicators (such as s.d.) for each data point.
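As one example of a consolidation step, replicate values can be collapsed to a mean with a standard deviation as the reliability indicator. The sketch below assumes that replicate values per gene and sample have already been derived from the quantification matrices. The function name `consolidate` and the key layout are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (gene, sample); both names are placeholders


def consolidate(
    replicates: Dict[Key, List[float]],
) -> Dict[Key, Tuple[float, Optional[float]]]:
    """Collapse replicates to (mean, s.d.) per gene and sample.

    The s.d. is None when only one replicate exists, because a sample
    standard deviation is undefined for a single value.
    """
    matrix = {}
    for key, values in replicates.items():
        spread = stdev(values) if len(values) > 1 else None
        matrix[key] = (mean(values), spread)
    return matrix
```

Whatever calculation is actually used, the point of the specification is that it is written down precisely enough for a reader to reproduce the step from (b) to (c).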
Part 6: Normalization controls. A typical microarray experi-
ment involves a number of hybridization assays in which the data
from multiple samples are analyzed to identify relative changes in
expression levels, identify differentially expressed genes and, in
many cases, discover classes of genes or samples having similar
patterns of expression. A typical experiment follows a ‘reference
design’ (more sophisticated loop designs have been proposed (ref. 11),
although these have not yet been widely adopted) in which many
samples are compared to a common reference sample so as to
facilitate inferences about relative expression changes between
samples. For these comparisons, the reported hybridization
intensities derived from image processing must first be normal-
ized. Normalization adjusts for a number of technical variations
between and within single hybridizations, namely quantity of
starting RNA and labeling and detection efficiencies for each
sample. There are a variety of normalization schemes in use,
including total-intensity, ratio-based and both linear and nonlin-
ear regression techniques. In addition, these analyses may be
based on either the complete data set, a user-defined subset of
genes (often a set of ‘housekeeping genes’ thought not to change
their level of expression under the conditions used) or exogenous
genes for which RNA is ‘spiked’ into the initial samples of inter-
est. Whether used for normalization or not, the use of exogenous
controls is becoming increasingly common both for quality con-
trol within single arrays and for array-hybridization compar-
isons within and between platforms.
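Of the schemes mentioned above, total-intensity normalization is the simplest to sketch. The example below assumes two channels of background-corrected intensities keyed by array element. The function names are hypothetical, and the optional `subset` argument stands in for a housekeeping-gene or spiked-control set.

```python
import math
from typing import Dict, List, Optional


def total_intensity_factor(sample: Dict[str, float],
                           reference: Dict[str, float],
                           subset: Optional[List[str]] = None) -> float:
    """Scaling factor that equalizes summed intensities of the two channels.

    `subset` restricts the sums to chosen elements, e.g. housekeeping genes
    or spiked controls; by default every element is used.
    """
    elements = subset if subset is not None else list(sample)
    return sum(reference[e] for e in elements) / sum(sample[e] for e in elements)


def normalized_log_ratios(sample: Dict[str, float],
                          reference: Dict[str, float],
                          subset: Optional[List[str]] = None) -> Dict[str, float]:
    """log2(sample / reference) after rescaling the sample channel."""
    factor = total_intensity_factor(sample, reference, subset)
    return {e: math.log2(factor * sample[e] / reference[e]) for e in sample}
```

The same data can yield different ratios depending on whether the factor is computed from all elements or from a subset, which is why the strategy and the control elements used need to be reported alongside the results.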
Section 6 of the MIAME standard provides an opportunity for
the specification of parameters relevant to normalization and
control elements. Our proposed standard includes (i) the nor-
malization strategy (spiking, housekeeping genes, total array,
other approach), (ii) the normalization and quality control algorithms
used, (iii) the identities and location of the array elements
serving as controls, as well as their type (spiking, normalization,
negative or positive hybridization controls, ‘landing lights’ to
assist spotfinding), and (iv) hybridization extract preparation,
detailing how the control samples are included in sample targets
prior to hybridization.
Discussion
Our goal is to develop a standard that can serve both research sci-
entists and software developers. To that end, we hope that this
description will stimulate discussion of the proposed MIAME
standards and we encourage the microarray community, as well
as the general research community, to provide us with their views
on how this standard can be improved. For this purpose an e-mail
discussion group has been set up by the MGED consortium (to
join, see http://www.mged.org).
At first glance, the extent of the information requested in the
MIAME specification may seem daunting. It should be noted, how-
ever, that for most laboratories the majority of the information will
be similar for many experiments, and once that information is
specified, it should not have to be specified again. For example,
most laboratories will use a single design for tens or hundreds of
microarray experiments. The same is true of labeling protocols and
normalization strategies. Describing these and other specifications
has several goals: to help scientists conducting and designing exper-
iments to record appropriate data, to assist those scientists inter-
preting or analyzing those data and to facilitate the design of
databases and software that enable the data to be archived, queried
and retrieved in an intuitive and biologically relevant manner.
We pose several questions to the research community. Is the
current MIAME draft sufficiently detailed to capture the infor-
mation needed to analyze and evaluate microarray data? If not,
what is missing? Or is MIAME already too extensive, requiring
specific details that are unlikely to ever be exploited; if so, what is
superfluous? Are there objections to having a defined minimum
information standard in principle and, if so, what are the alterna-
tives? The goal of our proposal is not to impose specific solutions
upon the community but instead to establish a community-wide
understanding of the optimal infrastructure for the sharing of
microarray data. It is possible that parts of MIAME may be bur-
densome whereas other sections do not offer sufficient detail.
One important consideration is the queries that one would like
to make of a MIAME-supportive gene expression database. Our
development of MIAME was guided primarily by a desire to pro-
vide the information necessary to make such queries.
The MIAME document represents an overall consensus of the
MGED working group on microarray data annotations1 in all
parts except section 5(a) concerning ‘hybridization-scan raw
data’. A majority of the working group supports the view that
providing raw image data is an essential part of MIAME. There is
also a considerable minority, however, who do not adhere to this
view. If the consensus emerges that the primary image data are
important, what is the preferred mechanism for ensuring access
to images? Should they be stored in public repositories, or should
the availability of the images be the responsibility of the experi-
menter? We anticipate that the answer to this (and other ques-
tions posed here) may evolve over time.
A more fundamental question for discussion is whether nat-
ural units for gene expression measurements exist. If so, what
might they be, and can they be calculated from microarray mea-
surements? In the absence of natural units of gene expression,
how should gene expression data be organized, in particular to
facilitate cross-experiment and cross-platform transcriptome
comparisons? Is it possible and helpful to introduce standard
controls and protocols for microarray experiments themselves,
to facilitate comparison of the data?
Once MIAME has stabilized and a general consensus is
reached, we can turn to practical applications. An initial techni-
cal application is to develop a data model that is able to record
MIAME. Such a data model is already being developed within
the OMG with the participation of the MGED consortium
(http://www.geml.org/omg.htm). Founded on this data model is
a standard data-exchange format (an XML description called
MAGE-ML: Microarray Gene Expression Markup Language),
which will allow communication of MIAME-supportive data
between local laboratory databases, central archives and stand-
alone analysis packages. The final version of the MAGE-ML stan-
dard has been submitted to OMG, and participating
organizations are already concentrating on the development of
the supporting software.
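To give a rough sense of what an XML exchange format enables, the sketch below round-trips a sample description through XML using Python's standard library. The element and attribute names are invented for illustration and are not the MAGE-ML schema, which is defined by the OMG submission itself.

```python
import xml.etree.ElementTree as ET


def sample_to_xml(organism: str, qualifiers: list) -> str:
    """Serialize a sample description to XML.

    Element and attribute names here are invented for illustration; they are
    NOT the MAGE-ML schema.
    """
    root = ET.Element("Sample", {"organism": organism})
    for qualifier, value, source in qualifiers:
        ET.SubElement(root, "Qualifier",
                      {"name": qualifier, "value": value, "source": source})
    return ET.tostring(root, encoding="unicode")


def sample_from_xml(text: str) -> tuple:
    """Parse the XML produced by sample_to_xml back into Python values."""
    root = ET.fromstring(text)
    qualifiers = [(q.get("name"), q.get("value"), q.get("source"))
                  for q in root.findall("Qualifier")]
    return root.get("organism"), qualifiers
```

The round trip is the property that matters: a local laboratory database and a central archive can exchange the same description without either depending on the other's internal storage.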
The next important step is the development of gene expression
databases able to record MIAME information and, equally impor-
tant, of data submission tools. Such tools may include web-based
questionnaires allowing users to enter MIAME information
directly into a database or to export captured data in the standard
format discussed above. In many cases, it is envisaged that most of
the MIAME information will be recorded through local LIMS soft-
ware before being uploaded into central archiving databases using a
standard data-exchange format. As such, the development of such
MIAME-friendly LIMS software will be an important task. We
hope that by adopting and publicizing MIAME, we will encourage
software developers to adapt their tools to this standard.
It is important that the effort to define minimum information
requirements and data-exchange formats is endorsed by many
major commercial genomics and bioinformatics companies. The
availability of minimum data requirements will help in develop-
ing databases that can exchange information with public or other
private databases. MIAME addresses the problem faced by most
commercial gene expression companies of integrating gene
expression data from multiple sources and multiple platforms.
Eventually, when MIAME-supportive public repositories are
established, the general research community must consider
whether full data disclosure should be required for publication.
Journals and funding agencies will also have to consider whether,
in the tradition of DNA sequence and macromolecular structure
data, release of microarray data at a MIAME-compliant level
should be required.
In its present form, MIAME 1.0 is the first version of a docu-
ment describing the minimal information required to report an
array-based gene expression experiment. Some of the
current specifications may become redundant or irrelevant as the
technology evolves, and extra information may need to be added at a
later date. We therefore plan to couple future versions with
progress in technology and analysis as well as experience gained
within the microarray community. In addition, microarrays can
be used for many types of experiment other than monitoring
gene expression (comparative genome hybridization, genome
mismatch scanning, chromatin IP experiments, and so on), and
future versions of MIAME will attempt to accommodate these
other types of data.
During this era of genomic-scale experiments, establishing
expectations for format and content, sharing data and analysis
tools and establishing databases and other resources has
become a widespread problem throughout the life sciences.
For instance, the neuroimaging community seems to be con-
fronted with very similar problems (how to compare data
across different laboratories, a lack of standards for data nor-
malization, a need for standard annotations) and is following a
similar strategy for developing a solution (ref. 12). Our hope is that
such an approach becomes the norm by which data presenta-
tion and publication standards are developed in the future. As
such, we look forward to hearing comments and suggestions
from the general research community.
Note: Supplementary information is available on the Nature Genetics
web site (http://genetics.nature.com/supplementary_info/).
Acknowledgments
The MIAME document is a result of the work of many people. The idea was
conceived during an international meeting organized by the EBI in
November 1999 to discuss gene expression databases (ref. 8), during which a
preliminary version of the MIAME document was produced. A microarray
annotations mailing list was created and many of the members of this
mailing list contributed to subsequent drafts. We would particularly like to
acknowledge G. Barton, K. Henrick and J-J. Riethoven, M. Bittner, R.
Bumgarner, M. Cherry, T. Freeman, J. Hoheisel and his team, A. Lash, H.
Mangalam, T. Preiss, A. Richter, C. Schwager, M. Ringwald, Y. Tateno and R.
Young. The document was extensively discussed at a meeting of the MGED
steering committee at the US National Institutes of Health in
November 2000, where the current version of MIAME was effectively
prepared. The final additions to the document were made during the MGED
3 conference1 (http://www.mged.org/). Although the MGED is a grass-roots
movement and does not presently have dedicated funding, the authors of
this paper have been funded by contributions from various sources,
including the Industry Support Programme at the EBI, Lipper Foundation,
Medical Research Council, Incyte Genomics and the National Heart, Lung,
and Blood Institute of the US NIH.
Received 13 July; accepted 22 October 2001.
1. The Chipping Forecast. Nature Genet. 21, 1–60 (1999).
2. Brown, P.O. & Botstein, D. Exploring the new world of the genome with DNA
microarrays. Nature Genet. 21, 33–37 (1999).
3. Young, R. Biomedical discovery with DNA arrays. Cell 102, 9–16 (2000).
4. Lockhart, D. & Winzeler, E. Genomics, gene expression and DNA arrays. Nature
405, 827–836 (2000).
5. Aach, J., Rindone, W. & Church, G.M. Systematic management and analysis of
yeast gene expression data. Genome Res. 10, 431–445 (2000).
6. Quackenbush, J. Computational analysis of microarray data. Nature Rev. Genet.
2, 418–427 (2001).
7. Sherlock, G. et al. The Stanford Microarray Database. Nucleic Acids Res. 29,
152–155 (2001).
8. Brazma, A., Robinson, A., Cameron, G. & Ashburner, M. One-stop shop for
microarray data. Nature 403, 699–700 (2000).
9. Editorial. Free and public expression. Nature 410, 851 (2001).
10. Bassett, D.E., Eisen, M.B. & Boguski, M.S. Gene expression informatics: it’s all in
your mine. Nature Genet. 21, 51–55 (1999).
11. Kerr, M.K. & Churchill, G.A. Experimental design for gene expression microarrays.
Biostatistics 2, 183–201 (2001).
12. The Governing Council of the Organization for Human Brain Mapping (OHBM).
Neuroimaging databases. Science 292, 1673–1676 (2001).
© 2001 Nature Publishing Group http://genetics.nature.com
© 2001 Nature Publishing Group http://genetics.nature.com
... It is the likelihood of discovering a distinction; when no such contrast really exists, which brings about the acknowledgment of a latent compound as a functioning compound. Such a blunder, which isn't uncommon, might be endured on the grounds that, in resulting preliminaries, the compound will uncover itself as dormant and hence at last rejected [15]Type I mistake is really fixed ahead of time by decision of the degree of importance utilized in the test [7]. It might be noticed that type I mistake can be made little by changing the degree of importance and by expanding the size of the example. ...
... The base data about a microarray explore (MIAME) standard characterizes diaries require creators of papers that utilization microarray information to store them in a public storehouse (like the Gene Expression or Array Express) in an organization that sticks to the MIAME standard. The MIAME information structures apply straightforwardly to a wide assortment of technologies' [15]. One shortcoming of MIAME and related norms, for example, the base data about a proteomics explore (MIAPE) standard [19], is that, despite the fact that they portray the assortment of metadata about the innovation utilized in an investigation, they do exclude an organized method to store clinical or segment information about the patients whose examples were utilized in the analyses. ...
Article
Statistical methods are imperative to reach substantial determinations from the acquired information Principle centre is given to kinds of information, estimation of focal varieties and fundamental tests, which are valuable for the investigation of various sorts of perceptions. Scarcely any boundaries like typical dissemination, computation of test size, level of importance, invalid speculation, records of changeability, and distinctive test are clarified in detail by giving reasonable models. Utilizing these rules, we are sure sufficient that postgraduate understudies will actually want to characterize the dissemination of information alongside the utilization of legitimate test. Data is likewise given in regards to different free programming projects and sites helpful for computations of measurements. In this manner, postgraduate understudies will have profited in two different ways whether they settle on scholastics or for the industry.
... International initiatives, such as the Minimum Information About a Microarray Experiment (MIAME) and the Proteomics Standards Initiative (PSI), should be expanded to cover the full spectrum of multiomics approaches. Developing and disseminating standardized guidelines and best practices will support the harmonization of research efforts, fostering greater collaboration and data sharing within the scientific community (Brazma et al., 2001). ...
... Microarray's experimental studies produce a large number of gene expressions and other information about genome expression, which promises to provide key information on genetic functioning and interaction in different pathways such as metabolic pathways and signalling. [6] Several factors contribute to the restriction of broad access to microarray data as this technology requires chip development, hybridisation, laser scanners and large amount of computational work. In addition, genetic data are much more complex than sequential data because it only makes sense in the context of a detailed description of the conditions under which it is built, including the specific condition of the system being studied and disturbed. ...
... One addressed factor which is rarely otherwise reported in experimental metadata but has been shown to have significant influence on the response to many abiotic stresses, including heat stress, is the time of day at sampling (Xu, Yuan, and Xie 2022;Blair et al. 2019). The process of measuring gene expression can be documented following the Minimum information about a microarray experiment (MIAME, (Brazma et al. 2001)) or Minimum Information About a Next-generation Sequencing Experiment (MINSEQE, https://www.fged.org/projects/minseqe/ ) standards. If transcriptomic data are documented in this way and published in an open access database such as the NCBI SRA, they can be made findable, accessible, interoperable and reusable (FAIR, (Wilkinson et al. 2016)) and thereby have a bigger impact for scientific research. ...
Preprint
Full-text available
Heat stress significantly affects global agricultural yield and food security and as climate change is expected to increase the frequency and severity of heatwaves, this is a growing challenge. Tomato plants are prone to heat stress exposure both in the field and in greenhouses, making heat stress resilience a key trait for breeding. While the identification of heat-associated genes has been addressed in multiple individual studies, the quantitative integration of data from these studies holds potential for low-cost, high-value knowledge gain about the complex network of actors involved in heat stress response mechanisms. To address this challenge, we have compiled a comprehensive data resource containing both novel and publicly available RNA-seq data on tomato in heat stress spanning multiple tissues, genotypes, and levels and durations of stress exposure. We show that in each individual dataset the large majority of responses originates from an interaction between the stimulus and the specific experimental setup. Conversely, by intersecting differentially expressed genes across experiments, we identify a tomato-specific core response of only 57 genes encoding heat shock proteins, transcriptional regulators, enzymes, transporters and several uncharacterized proteins. 17 of these genes lie within previously identified genetic loci associated with heat tolerance traits. Applying the same approach to all publicly available RNA-seq data on drought and salt stress in tomato, we find large overlaps in the conditional parts of the stress responses but the robust and sustained core responses are mostly stress-specific. Finally, we show that the core responses to these stresses are enriched with evolutionarily ancient genes with orthologs across all domains of life and that the heat core response genes form identifiable co-evolving clusters within the Streptophyta. 
Our study exemplifies the importance and advantage of using FAIR public data to interpret results of new stress experiments, and provides tools to perform such analyses in a relatively short time.
... In addition to the matrix, a description of a microarray experiment should also contain information about the genes whose expression has been measured and the experimental conditions in which the samples were taken. The information required to describe a microarray experiment can be conceptually divided into three logical parts: genetic annotation, sample annotation and a gene expression matrix (7). ...
Article
Full-text available
Esta investigación evalúa el rendimiento de los algoritmos de agrupación más conocidos utilizando el índice de estabilidad biológica (BSI). Se realizó una comparación entre los algoritmos de agrupación, para determinar de estos cuál es el óptimo según el puntaje obtenido en cada algoritmo, la agrupación de génica en Ciencia Intensiva, el mismo que utiliza bases de datos extensas para cubrir casi todos los resultados que pudiesen ocurrir realmente. Se aplica este método a una base de datos de expresión de genes (Microarray). El análisis se lo realizó a la base de datos “mouse” incluida en el paquete clValid en el software R, para el estudio de las células mesenquimales de ratones (cresta neural y el mesodermo derivado), también se utiliza métodos gráficos como los dendogramas para un primer enfoque. Para la selección del algoritmo óptimo, se calculó el índice biológico de estabilidad para cada algoritmo de agrupación siendo el mejor, el que más cerca de la unidad se encuentre. En consecuencia, el algoritmo más estable para dicha base de datos es “Diana”. Para llegar a este resultado se visualizó gráficamente el número de clústeres con la respuesta obtenida en cada caso; se tomó como el algoritmo óptimo el que más se apegue a la realidad del problema teniendo en cuenta su puntaje en los índices y además con la ayuda de un gráfico de filogenética para un ultimo enfoque.
Article
Full-text available
The collection, cryopreservation, thawing, and culture of peripheral blood mononuclear cells (PBMCs) can profoundly influence T cell viability and immunogenicity. Gold-standard PBMC processing protocols have been developed by the Office of HIV/AIDS Network Coordination (HANC); however, these protocols are not universally observed. Herein, we have explored the current literature assessing how technical variation during PBMC processing can influence cellular viability and T cell immunogenicity, noting inconsistent findings between many of these studies. Amid the mounting concerns over scientific replicability, there is growing acknowledgement that improved methodological rigour and transparent reporting is required to facilitate independent reproducibility. This review highlights that in human T cell studies, this entails adopting stringent standardised operating procedures (SOPs) for PBMC processing. We specifically propose the use of HANC’s Cross-Network PBMC Processing SOP, when collecting and cryopreserving PBMCs, and the HANC member network International Maternal Pediatric Adolescent AIDS Clinical Trials (IMPAACT) PBMC Thawing SOP when thawing PBMCs. These stringent and detailed protocols include comprehensive reporting procedures to document unavoidable technical variations, such as delayed processing times. Additionally, we make further standardisation and reporting recommendations to minimise and document variability during this critical experimental period. This review provides a detailed overview of the challenges inherent to a procedure often considered routine, highlighting the importance of carefully considering each aspect of SOPs for PBMC collection, cryopreservation, thawing, and culture to ensure accurate interpretation and comparison between studies.
Article
Full-text available
This study assessed the use of data management plans among researchers at a selected higher learning institution (HLI) in Tanzania. A pretested structured questionnaire was administered to registered postgraduate students. Many of the respondents reported that a data management plan (DMP) was required before writing a research project and when a research project was submitted. The results also demonstrated that many respondents did not use any online DMP template tools to formulate their DMP although most of them were aware of available DMP template tools such as OpenDMP. Many respondents stated that the requirement of using a DMP were selection of a DMP format, updating the DMP regularly, having a short and to-the-point DMP and a well-structured DMP specifying the kinds and formats of the data to be acquired, generated, produced, and preserved. Meeting funders’ institutions, and publishers’ requirements, and ensuring that data are accurate, complete, and reliable were among the DMP benefits in HLIs identified by the respondents. Several challenges were revealed including a lack of awareness, competence, and guidelines to assist researchers using a DMP for their research projects. The conclusion is that researchers need to develop and use DMP template tools to plan, organize, and work on their research projects in addition to ensuring that they meet funders' requirements. It is recommended that HLIs should provide extensive training programs for raising awareness about DMPs among the researchers and to make DMPs a mandatory requirement for finalizing research projects among researchers, and not only for funding purposes.
Chapter
Disorders of behavior represent some of the most common and disabling diseases affecting humankind; however, despite their worldwide distribution, genetic influences on these illnesses are often overlooked by families and mental health professionals. Psychiatric genetics is a rapidly advancing field, elucidating the varied roles of specific genes and their interactions in brain development and dysregulation. Principles of Psychiatric Genetics includes 22 disorder-based chapters covering, amongst other conditions, schizophrenia, mood disorders, anxiety disorders, Alzheimer's disease, learning and developmental disorders, eating disorders and personality disorders. Supporting chapters focus on issues of genetic epidemiology, molecular and statistical methods, pharmacogenetics, epigenetics, gene expression studies, online genetic databases and ethical issues. Written by an international team of contributors, and fully updated with the latest results from genome-wide association studies, this comprehensive text is an indispensable reference for psychiatrists, neurologists, psychologists and anyone involved in psychiatric genetic studies.
Article
The majority of cervical cancers have been linked to the infection by human papillomavirus (HPV). There is a need to identify genes which play a role in the final manifestation of cervical cancers following HPV infection. To identify a number of genetic markers associated with cervical cancer that may aid in the disease's diagnosis or prognosis using machine learning methods. To do this, we will assess numerous gene expression profiles with integrative machine learning approaches such as random forest (RF) and support vector machine-based recursive feature elimination (SVMRFE). The conceptual analysis consists of following steps: (i) gene expression analysis and (ii) machine learning analysis for predicting genes. The selected datasets were GSE75132 and GSE39001 for this study. Accuracy and cross validation were carried for both SVM-RFE and RF model for the gene identification purpose. R Bioconductor packages “GEOquery,” “limma,” and “umap” were utilized. The selected genes of machine learning methods were combined. The SVM model was the best for predicting the gene expression microarray profile based on the accuracy this study was able to get. The SVM model indicated that genes might be used as biomarkers to identify biological processes. The identified genes were considered as potential gene signatures in cervical cancer detection, and their interactions were studied.
Preprint
Myocardial infarction and reperfusion is a complex injury consisting of many distinct molecular stress patterns that influence cardiomyocyte survival and adaptation. Cell signalling that is essential to cardiac development also presents potential disease-modifying opportunities to recover and limit myocardial injury or maladaptive remodelling. Here we hypothesized that Yap signalling could be sensitive to one or more molecular stress patterns associated with early acute ischemia. Yap, not Taz, patterns of expression differ in post-myocardial infarct compared to peri-infarct tissue suggesting cell-specificity that would be challenging to resolve for causation in vivo. Using H9c2 ventricular myotubes in vitro as a model, Yap levels were most sensitive to nutrient deprivation compared to other stress patterns typified by ischemia within the first hour of stress. Moreover, this is mediated by amino acid availability, dominantly L-isoleucine, and influences the expression of Ctgf, a major determinant of myocardial adaptation after injury. These findings present novel opportunities for future therapeutic development and risk assessment for myocardial injury and adaptation.
Article
Technologies for whole-genome RNA expression studies are becoming increasingly reliable and accessible. However, universal standards to make the data more suitable for comparative analysis and for inter-operability with other information resources have yet to emerge. Improved access to large electronic data sets, reliable and consistent annotation and effective tools for 'data mining' are critical. Analysis methods that exploit large data warehouses of gene expression experiments will be necessary to realize the full potential of this technology.
Article
Thousands of genes are being discovered for the first time by sequencing the genomes of model organisms, an exhilarating reminder that much of the natural world remains to be explored at the molecular level. DNA microarrays provide a natural vehicle for this exploration. The model organisms are the first for which comprehensive genome-wide surveys of gene expression patterns or function are possible. The results can be viewed as maps that reflect the order and logic of the genetic program, rather than the physical order of genes on chromosomes. Exploration of the genome using DNA microarrays and other genome-scale technologies should narrow the gap in our knowledge of gene function and molecular biology between the currently-favoured model organisms and other species.
Article
Is a universal, public DNA-microarray database a realistic goal?
Article
Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been enthusiastically developed and applied in many biological contexts, the management and analysis of the millions of data points that result from these experiments has received less attention. Sophisticated computational tools are available, but the methods that are used to analyse the data can have a profound influence on the interpretation of the results. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis.
Article
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77–80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73–76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10–14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332–333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48] and can be accessed at http://genome-www.stanford.edu/microarray.
Article
We report steps toward the systematic management, standardization, and analysis of functional genomics data. We developed the ExpressDB database for yeast RNA expression data and loaded it with ∼17.5 million pieces of data reported by 11 studies with three different kinds of high-throughput RNA assays. A web-based tool supports queries across the data from these studies. We examined comparability of data by converting data from 9 studies (217 conditions) into mRNA relative abundance estimates (ERAs) and by clustering of conditions by ERAs. We report on generation of ERAs and condition clustering for non-microarray data (5 studies, 63 conditions) and describe initial attempts to generate microarray-based ERAs (4 studies, 154 conditions), which exhibit increased error, on our web site http://arep.med.harvard.edu/ExpressDB. We recommend standards for data reporting, suggest research into improving comparability of microarray data through quantifying and standardizing control condition RNA populations, and also suggest research into the calibration of different RNA assays. We introduce a model for a database that integrates different kinds of functional genomics data, Biomolecule Interaction, Growth and Expression Database (BIGED).
Article
Experimental genomics in combination with the growing body of sequence information promises to revolutionize the way cells and cellular processes are studied. Information on genomic sequence can be used experimentally with high-density DNA arrays that allow complex mixtures of RNA and DNA to be interrogated in a parallel and quantitative fashion. DNA arrays can be used for many different purposes, most prominently to measure levels of gene expression (messenger RNA abundance) for tens of thousands of genes simultaneously. Measurements of gene expression and other applications of arrays embody much of what is implied by the term 'genomics'; they are broad in scope, large in scale, and take advantage of all available sequence information for experimental design and data interpretation in pursuit of biological understanding.