
Minimum information about a microarray experiment (MIAME) - Toward standards for microarray data

December 2001
Nature Genetics 29(4)


DOI:10.1038/ng1201-365

Source
PubMed

Authors:

Alvis Brazma

European Molecular Biology Laboratory

Pascal Hingamp

Aix-Marseille Université

John Quackenbush

Harvard T.H. Chan School of Public Health

Gavin Sherlock

Stanford University


Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.


commentary

nature genetics •

volume 29 • december 2001

365

Minimum information about a microarray experiment (MIAME): toward standards for microarray data

Alvis Brazma¹, Pascal Hingamp², John Quackenbush³, Gavin Sherlock⁴, Paul Spellman⁵, Chris Stoeckert⁶, John Aach⁷, Wilhelm Ansorge⁸, Catherine A. Ball⁴, Helen C. Causton⁹, Terry Gaasterland¹⁰, Patrick Glenisson¹¹, Frank C.P. Holstege¹², Irene F. Kim⁴, Victor Markowitz¹³, John C. Matese⁴, Helen Parkinson¹, Alan Robinson¹, Ugis Sarkans¹, Steffen Schulze-Kremer¹⁴, Jason Stewart¹⁵, Ronald Taylor¹⁶, Jaak Vilo¹ & Martin Vingron¹⁷


¹European Bioinformatics Institute, EMBL outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. ²Centre d’Immunologie de Marseille Luminy (CIML TAGC) & Université de la Méditerranée, Marseille, France. ³The Institute for Genomic Research (TIGR), Rockville, Maryland, USA. ⁴Stanford University, Palo Alto, California, USA. ⁵University of California Berkeley, Berkeley, California, USA. ⁶University of Pennsylvania, Philadelphia, Pennsylvania, USA. ⁷Department of Genetics, Harvard Medical School, Cambridge, Massachusetts, USA. ⁸European Molecular Biology Laboratory (EMBL), Heidelberg, Germany. ⁹CSC/Imperial College School of Medicine Microarray Centre, London, UK. ¹⁰Rockefeller University, New York, New York, USA. ¹¹Katholieke Universiteit Leuven, Leuven, Belgium. ¹²University Medical Center, Utrecht, Netherlands. ¹³GeneLogic Inc, Gaithersburg, Maryland, USA. ¹⁴RZDP German Genome Resource Center, Berlin, Germany. ¹⁵Open Informatics, Albuquerque, New Mexico, USA. ¹⁶Center for Computational Pharmacology, University of Colorado School of Medicine, Denver, Colorado, USA. ¹⁷Max Planck Institute for Molecular Genetics, Berlin, Germany. Correspondence should be addressed to A.B. (e-mail: brazma@ebi.ac.uk) or J.Q. (e-mail: johnq@tigr.org).

Introduction

After genome sequencing, DNA microarray analysis¹ has become the most widely used source of genome-scale data in the life sciences. Microarray expression studies are producing massive quantities of gene expression and other functional genomics data, which promise to provide key insights into gene function and interactions within and across metabolic pathways²⁻⁴. Unlike genome sequence data, however, which have standard formats for presentation and widely used tools and databases, much of the microarray data generated so far remain inaccessible to the broader research community.

Several factors contribute to the barrier to widespread access to microarray data. The field is young and has only recently approached the maturity needed to identify important aspects of the data. In addition, gene expression data are more complex than sequence data in that they are meaningful only in the context of a detailed description of the conditions under which they were generated, including the particular state of the living system under study and the perturbations to which it has been subjected. In contrast to an organism’s genome, there are as many transcriptomes as there are cell types multiplied by environmental conditions. Moreover, comparing gene expression data is considerably more difficult, because at present, microarrays do not measure gene expression levels in any objective units. In fact, most measurements report only relative changes in gene expression, using a reference which is rarely standardized. Finally, different microarray platforms and experimental designs produce data in various formats and units that are normalized in different ways, all of which makes comparison and integration of these data an error-prone exercise⁵,⁶.

Although the largest microarray laboratories have established their own databases⁷, microarray data accompanying publications are typically reported on authors’ web sites using a variety of formats, if they are accessible at all. Exactly what annotation should be provided for microarray data is open to debate, but it is clear that most of the publicly available data are currently not annotated in sufficient detail for use by independent parties (in fact, they are often not annotated at all). The reported data are often completely ‘stripped’ of all the evidence about the quality, reliability and possible error levels of particular data points. For instance, for two-channel microarray data, it is common to report only the background-subtracted signal ratios without indicating anything about the absolute signal and background levels. Yet these are important for assessing the reliability of the measured expression for each arrayed gene.
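A small numerical sketch shows why ratios alone are not enough. The intensities below are invented for illustration (they are not from any experiment), and the function name is hypothetical: two spots share the same background-subtracted ratio, yet one rests on a net signal a thousand times weaker than the other.

```python
def background_subtracted_ratio(red, red_bg, green, green_bg):
    """Return (ratio, weakest net signal) for one two-channel spot."""
    red_net = red - red_bg
    green_net = green - green_bg
    return red_net / green_net, min(red_net, green_net)


# Hypothetical spots: (red, red background, green, green background).
bright_spot = (20100, 100, 10100, 100)
faint_spot = (120, 100, 110, 100)

bright_ratio, bright_net = background_subtracted_ratio(*bright_spot)
faint_ratio, faint_net = background_subtracted_ratio(*faint_spot)

# Both ratios are identical, but the faint spot is barely above background.
print(bright_ratio, bright_net)
print(faint_ratio, faint_net)
```

A reader given only the two ratios could not tell these spots apart; the absolute signal and background levels are what reveal that the second measurement is far less reliable.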

It is widely acknowledged that there is a need for public repositories for microarray data⁸,⁹, whose functions would include providing access to supporting data for publications based on microarray experiments. Such repositories are under development by the National Center for Biotechnology Information (which has developed the Gene Expression Omnibus), the DNA Database of Japan, and the European Bioinformatics Institute (which has developed ArrayExpress); however, it is less clear exactly what information should be stored in such databases. Should databases store raw microarray scans (images), or is one final summary scalar per array element (such as one green/red ratio per spot for two-channel platforms) sufficient? Or should some intermediate data, such as the complete output from a particular image analysis software package, be used instead? And should this be reported as ‘raw’ data or should it be normalized? What information about the experimental set-up should be required? And how should the array elements (printed spots or features) be annotated to facilitate an understanding of the experimental results?

The precise nature of the information to be stored will be dictated by the function of the particular database or repository. If the sole goal of the database is to archive supporting data for published experiments, it can be assumed that the publications themselves will provide information explaining the database entries. It may be argued that it can be left to peer review to ensure that a particular publication together with the respective database entry contains the information that is necessary to verify and reproduce the experimental results. It is unlikely, however, that such a system could be effective or scalable. Moreover, the value and usefulness of such a nonstandardized database would be considerably limited. For instance, it would be difficult to use the database for high-throughput automated data analysis or mining. The experience of the sequence databases over the past decade unequivocally demonstrates the strategic importance of structured, consistent annotation applied early in the process of data generation.

We believe that it is necessary to define the minimum information that must be reported, in order to ensure the interpretability of the experimental results generated using microarrays as well as their potential independent verification. Here we propose a document called MIAME, the Minimum Information About a Microarray Experiment, as a starting point for a broader community discussion. To make the task more manageable, we focus on microarray-based gene expression data, which arguably covers the most popular applications of microarray technology. We believe that the adoption of such a standard will facilitate the establishment and usefulness of microarray databases. MIAME should also prompt microarray manufacturers and software producers to develop adequate microarray laboratory information management systems (LIMS), enabling the production and capture of MIAME-compatible primary data at the bench.

The idea of having a defined minimum standard for information associated with experiments is not new to the life sciences. A similar mode of operation has been adopted by the macromolecular structure community (see, for example, http://msd.ebi.ac.uk/), where most journals require submission of a well-defined minimum of raw data associated with publications. Like crystallography data, the data generated by microarray experiments are usually of a size and complexity that make them meaningless to the general research community unless a defined minimum standard states what data are sufficient to support and verify conclusions.

Over and above representation of expression measurements, MIAME addresses the need for the comprehensive annotation necessary to interpret the results of microarray data. It is platform-independent but includes essential evidence about how the gene expression level measurements have been obtained. The first version of a MIAME document (MIAME 1.0) was recently completed and is used here as the basis for this discussion (Web Note A). A glossary of terms (Web Note B) and an example of a MIAME-compliant description of an experiment (Web Note C) were prepared to facilitate discussion of the proposed standard.

MIAME is being developed by the Microarray Gene Expression Database group (MGED; http://www.mged.org/), a grass-roots movement to develop standards for microarray data⁵. MIAME 1.0 was approved at the MGED 3 meeting at Stanford University on 28–31 May 2001. It should be noted that MIAME does not specify the format in which the information should be provided, but only its content. A technical data format to capture microarray information in a form that includes MIAME requirements is being developed in a collaborative initiative coordinated by the Life Sciences Research Task Force of the Object Management Group (OMG) and, in fact, is well under way.

Gene expression—a conceptual view

A collection of gene expression data can be viewed abstractly as a table with rows representing genes, columns representing various samples and each position in the table describing the measurement for a particular gene in a particular sample (Fig. 1). We call this table a gene expression matrix. In addition to the matrix, a description of a microarray experiment should also contain information about the genes whose expression has been measured and the experimental conditions under which the samples were taken. The information required to describe a microarray experiment can be divided conceptually into three logical parts: gene annotation, sample annotation and a gene expression matrix (Fig. 1).
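The three-part model can be sketched as a small data structure. This is only an illustration of the conceptual view: the class name, field names and all values are hypothetical, and MIAME itself does not prescribe any such format.

```python
from dataclasses import dataclass


@dataclass
class ExpressionData:
    genes: list              # row labels of the matrix
    samples: list            # column labels of the matrix
    matrix: list             # matrix[i][j]: measurement for genes[i] in samples[j]
    gene_annotation: dict    # gene -> annotation, e.g. a link to a sequence database
    sample_annotation: dict  # sample -> description of the experimental conditions

    def is_consistent(self):
        """Check that the matrix shape and both annotations match the labels."""
        rows_ok = len(self.matrix) == len(self.genes)
        cols_ok = all(len(row) == len(self.samples) for row in self.matrix)
        annotated = (all(g in self.gene_annotation for g in self.genes)
                     and all(s in self.sample_annotation for s in self.samples))
        return rows_ok and cols_ok and annotated

    def level(self, gene, sample):
        """Look up the measurement for one gene in one sample."""
        return self.matrix[self.genes.index(gene)][self.samples.index(sample)]


# Invented example values.
data = ExpressionData(
    genes=["geneA", "geneB"],
    samples=["control", "treated"],
    matrix=[[1.0, 2.5], [0.8, 0.4]],
    gene_annotation={"geneA": "sequence database link A",
                     "geneB": "sequence database link B"},
    sample_annotation={"control": "untreated cells",
                       "treated": "cells after treatment"},
)
print(data.is_consistent(), data.level("geneB", "treated"))
```

Keeping the annotations next to the matrix makes it impossible to read a value without also being able to find out which gene and which sample it belongs to.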

Ideally, we would like to measure amounts of gene expression in natural units, such as mRNA copies per cell¹⁰, and to have an error estimate or reliability indicator such as the standard deviation (s.d.) associated with each value. There are a number of experimental challenges, however, that make direct measurement of gene expression difficult. Raw data from microarray experiments are images from hybridized microarray scans that have to be analyzed to identify and quantify each feature (spot) in the image. A DNA sequence may be spotted on a microarray several times; in addition, several distinct DNA sequences may be spotted that map to the same gene. To yield a single value for these, the corresponding measurements have to be combined. Moreover, the same biological condition can be measured in several (replicate) hybridizations, and the information from all replicates has to be summarized to derive a single gene expression data matrix. Finally, to compare microarray data from several samples, the data must be appropriately normalized.

Fig. 1 Conceptual view of gene expression data. The model has three parts: (i) gene annotation, which may be given as links to gene sequence databases, (ii) sample annotation, for which there currently are no public external databases (except the species taxonomy) and (iii) the gene expression matrix, in which each position contains information characterizing the expression of a particular gene in a particular sample.

There are at least three levels of data relevant to a microarray experiment: (i) the scanned images (raw data); (ii) the quantitative outputs from the image analysis procedure (microarray quantitation matrices); and (iii) the derived measurements (gene expression data matrices; Fig. 2). There is an important series of transformations leading from raw data to the gene expression matrix, and the steps involved are far from being standardized. As there are no widely used standard controls for microarray assays, microarray data from different sources use different measurement units whose conversion factors are typically unknown and may even vary depending on expression level. This indicates the necessity to record not only the final gene expression matrix, but also a detailed description of how the expression values were obtained, if verification of the data is to be ensured. Consequently, the nature of the data that must be recorded necessarily becomes more complex.
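To make the step from quantitation matrices to a gene expression matrix concrete, here is a minimal sketch of one possible chain of transformations. The choices made here (log2 ratios, median centering per array, averaging spots that map to the same gene) are assumptions for illustration only; the text stresses that these steps are not standardized, which is exactly why they need to be recorded. All names and numbers are hypothetical, and spots with zero or negative net signal are not handled.

```python
import math
from collections import defaultdict
from statistics import mean, median


def log_ratios(spots):
    """spots: list of (gene, red, red_bg, green, green_bg) for one array."""
    return [(gene, math.log2((r - rb) / (g - gb)))
            for gene, r, rb, g, gb in spots]


def median_center(ratios):
    """Normalize one array by subtracting its median log ratio (an assumption)."""
    centre = median(value for _, value in ratios)
    return [(gene, value - centre) for gene, value in ratios]


def combine_by_gene(ratios):
    """Average all spots that map to the same gene."""
    grouped = defaultdict(list)
    for gene, value in ratios:
        grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


def expression_matrix(arrays):
    """arrays: dict sample -> spots. Returns dict gene -> dict sample -> value."""
    matrix = defaultdict(dict)
    for sample, spots in arrays.items():
        combined = combine_by_gene(median_center(log_ratios(spots)))
        for gene, value in combined.items():
            matrix[gene][sample] = value
    return dict(matrix)


# Invented quantitation data for a single array; geneA is spotted twice.
arrays = {
    "s1": [
        ("geneA", 500, 100, 200, 100),
        ("geneA", 900, 100, 300, 100),
        ("geneB", 300, 100, 300, 100),
        ("geneC", 150, 100, 300, 100),
    ]
}
matrix = expression_matrix(arrays)
print(matrix)
```

Changing any one of the three helper functions changes the final matrix, so a matrix published without a description of these steps cannot be verified or compared with data processed differently.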

Because microarray data have meaning only in the context of the particular biological sample and the exact conditions under which the samples were taken, a major element of the standard that we propose addresses sample annotation. For instance, if we are interested in finding out how different cell types react to treatments with various chemical compounds, we must record unambiguous information about the cell types and compounds used in the experiments. This information should be contained in sample annotation.

Although gene annotation can to a certain extent be expressed by links to sequence databases, the possibly complicated many-to-many relationships between genes in the gene expression matrix and elements on the array make it necessary to provide a full and detailed description of each element on the array.

General principles of MIAME design

As a starting point, we propose that for the data and annotations from microarray experiments to have the most value, they should satisfy the following requirements: (i) the recorded information about each experiment should be sufficient to interpret the experiment and should be detailed enough to enable comparisons to similar experiments and permit replication of experiments and (ii) the information should be structured in a way that enables useful querying as well as automated data analysis and mining.

The first requirement implies that a detailed annotation of the sample and other experimental conditions should be recorded and that some reliability estimates of particular data points should be given. For example, red/green ratios alone for two-channel platforms cannot normally be regarded as sufficient: currently there is no widely accepted method for indicating the confidence in a measurement, and much intensity-specific and expression-level information is lost. The necessary level of detail and whether the raw image data should be included are less obvious and are still widely discussed in the microarray community.

The second requirement implies the need for controlled vocabularies and ontologies to represent data as well as the need to limit free-format text to cases where more structured representations are not feasible. This includes the use of a standardized nomenclature for the description of biological samples and conditions. The nomenclature may be as simple as a controlled vocabulary or as fully developed as an ontology. The species taxonomy (http://www.ncbi.nlm.nih.gov/Taxonomy/) is an example of an ontology whose shared use allows researchers to unambiguously determine the phylogenetic relationship between organisms. Unfortunately, such resources are not available for many other types of sample annotation, particularly those specific for individual species, such as ‘developmental stage’ (see the MGED Ontology Working Group home page for reference). A practical approach may be to initially use free-text descriptions for some sample annotations, despite the difficulty of incorporating them in automated queries. The use of free text may be sufficient, for example, in describing details of a laboratory protocol. A reference to a publication describing the experiment is an alternative; however, there are obvious drawbacks to not having information in hand for either queries or browsing.

Fig. 2 Three levels of microarray gene expression data processing. The raw data from microarray experiments are images. These images have to be quantified by image analysis software, which identifies spots related to each element on the array and measures the fluorescence intensity of each spot in each channel, together with the background intensity and a number of other quantifications, depending on the particular software (microarray quantification matrices). To obtain the final gene expression matrix, all the quantities related to each gene (either on the same array or on replicate arrays) have to be combined and the entire matrix has to be normalized to make different arrays comparable.

Although the goal of MIAME is to specify only the content of the information and not the technical format, MIAME includes recommendations for which parts of the information should be provided as controlled vocabularies. The distinction between free-text format and controlled vocabularies influences the information content: a defined term taken from a given controlled vocabulary is more precise in its meaning than the same term used in a free-text field and can provide more advanced data query and analysis options. As the majority of the necessary controlled vocabularies do not exist, the MIAME definition includes lists of ‘qualifier, value, source’ triplets, which authors can use to define their own qualifiers and provide the appropriate values. For instance:

qualifier: cell type
value: epithelial
source: Gray’s anatomy (38th ed.)

qualifier: treatment
value: 15′ heat shock
source: Smith and Jones, Nature Genet. (1992)

Given sufficient detail by the author, these triplets can fully describe a particular aspect of an experiment. The idea stems from the information sciences, where a ‘qualifier’ defines a concept and a ‘value’ contains the appropriate instance of the concept. ‘Source’ is either user-defined or a reference to an externally defined ontology or controlled vocabulary, such as the species taxonomy database. The judgment regarding the necessary level of detail is left to the data providers. In the future, these qualifier lists may be gradually supplemented with predefined fields as the respective ontologies are developed.
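The triplets map naturally onto a simple record type. The following is a minimal sketch, not a MIAME-defined format: the class and function names are hypothetical, and the two records reuse the example triplets given in the text.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    qualifier: str  # the concept being described
    value: str      # the instance of that concept
    source: str     # user-defined, or a reference to an ontology or vocabulary


sample_annotations = [
    Annotation("cell type", "epithelial", "Gray's anatomy (38th ed.)"),
    Annotation("treatment", "15' heat shock",
               "Smith and Jones, Nature Genet. (1992)"),
]


def values_for(annotations, qualifier):
    """Return every value recorded under the given qualifier."""
    return [a.value for a in annotations if a.qualifier == qualifier]


print(values_for(sample_annotations, "cell type"))
```

Because each value carries its qualifier and source, a query such as "all samples whose cell type is epithelial" becomes a structured lookup rather than a free-text search.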

The aim of establishing microarray databases should be kept in mind, although the MIAME document is conceptually independent from it. An important principle in MIAME is that its parts can be provided as references or links to pre-existing and identifiable descriptions. For instance, for commercial or other standard arrays, required information needs to be provided only once by the array supplier and referenced thereafter by the users. Standard protocols also need to be provided only once after they are established, whereas specific deviations and parameters may be provided with each experiment (note that this would allow users to create a library of standard protocols). It is necessary that either a valid reference or the information itself be provided for every experimental data set.
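The "reference or information itself" rule, together with per-experiment deviations, could be handled along the following lines. This is a sketch under stated assumptions: the identifier `PROT-1`, the dictionary keys and the parameter names are all invented for illustration and do not come from the MIAME document.

```python
# Hypothetical library of standard protocols, each submitted once.
protocol_library = {
    "PROT-1": {"hybridization_time_h": 16, "temperature_c": 42},
}


def resolve_protocol(entry, library):
    """Return the full protocol for one experiment entry.

    The entry must either embed the protocol under 'protocol' or give a
    valid 'protocol_ref', optionally with experiment-specific 'deviations'.
    """
    if "protocol" in entry:
        return dict(entry["protocol"])
    ref = entry.get("protocol_ref")
    if ref not in library:
        raise ValueError(f"no protocol and no valid reference given: {ref!r}")
    resolved = dict(library[ref])                     # copy the standard protocol
    resolved.update(entry.get("deviations", {}))      # apply recorded deviations
    return resolved


experiment_entry = {"protocol_ref": "PROT-1",
                    "deviations": {"temperature_c": 45}}
resolved = resolve_protocol(experiment_entry, protocol_library)
print(resolved)
```

An entry that supplies neither the protocol nor a resolvable reference is rejected, which mirrors the requirement that one of the two be present for every data set.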

There is one important additional principle underpinning MIAME: as microarray technology is developing rapidly, it would be counterproductive and unrealistic to impose on users any particular platform, software or methods of data analysis. Instead the standards should simply require the description of data in sufficient detail and with sufficient annotation, so that interested parties will have all the necessary information to understand how conclusions were reached. Note that we assume that data will be produced by different experimental platforms and laboratories, which means that few default assumptions can be made and most of the information should be reported explicitly by each laboratory.

In developing MIAME, we have sought to find a compromise between placing a burden on data producers to annotate experiments in elaborate detail and ensuring that data are annotated in enough detail to be useful to the general research community. Too much detail may be too taxing for data producers and may complicate data recording and database submission, whereas too little detail may limit the usefulness of the data. MIAME is an informal specification, the goal of which is to guide cooperative data providers. It is not designed to close all possible loopholes in data submission requirements. MIAME is not designed as a ‘questionnaire’ that can be filled in, but only as an informal specification on which microarray experiment-annotation tools may be based.

The six parts of MIAME

We define a microarray experiment as a set of one or more hybridizations, each of which relates one or more samples to one or more arrays. The hybridized array is then scanned and the resulting image analyzed, relating each element on the array with a set of measurements (Fig. 3). The data are normalized and combined with data from replicate hybridizations.

The minimum information about a published microarray-based gene expression experiment includes a description of the following six sections:

1. Experimental design: the set of hybridization experiments as a whole
2. Array design: each array used and each element (spot, feature) on the array
3. Samples: samples used, extract preparation and labeling
4. Hybridizations: procedures and parameters
5. Measurements: images, quantification and specifications
6. Normalization controls: types, values and specifications
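An annotation tool built on this list might start with a simple completeness check. The sketch below assumes a description stored as a dictionary keyed by section; the key names and the example content are hypothetical, and the check says nothing about whether each section is detailed enough.

```python
# The six MIAME sections, as hypothetical dictionary keys.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization_controls",
)


def missing_sections(description):
    """List the sections that are absent or empty in a description."""
    return [name for name in MIAME_SECTIONS if not description.get(name)]


# Invented draft with three sections still to be completed or filled in.
draft = {
    "experimental_design": {"type": "time course"},
    "array_design": {"array_ids": ["A1"]},
    "samples": [{"organism": "example organism"}],
    "hybridizations": [],
}
print(missing_sections(draft))
```

A check like this only guards against omitted sections; consistent with the informal nature of MIAME, judging the adequacy of the content remains with the data provider and reviewers.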

Each of these sections contains information that can be provided using controlled vocabularies, as well as fields that use free-text format. Here we discuss only the general information required in each of these sections; for a full description, see the MIAME document (Web Note A), which includes a sample experiment described according to MIAME requirements. We do not discuss why we regard each of the MIAME elements as necessary, but we hope that this follows from the principles discussed in the previous sections.

In constructing the MIAME standard, we were careful to include as much relevant information as possible to aid in the interpretation of the results of each microarray experiment. Some have suggested that this is excessive as, for example, sample preparation and labeling protocols typically appear in the Methods sections of publications associated with microarray experiments. Although these assertions have merit, we believe that the comprehensive nature of MIAME confers distinct advantages without imposing an excessive burden on submitters of microarray data. There are a number of reasons for this. First, stand-alone entries make the use of a database inherently more efficient. Second, a collection of protocols in a relatively standard format (including controlled vocabularies) would facilitate comparison and usage of the protocols by third parties. Third, not all journals publish experimental protocols in sufficient detail (often due to length considerations) for others to reproduce them. Fourth, once this information has been prepared for a journal publication, providing it for a database submission should not constitute significant additional effort. Moreover, journals are increasingly relying on electronic release of both protocols and the data supporting published reports, and the MIAME standard would allow for a uniform presentation of such information.

Fig. 3 A schematic representation of six components of a microarray experiment. Diagram labels: experiment, hybridization, data, normalization, sample, array; external links: publications (e.g. PubMed), source (e.g. taxonomy), gene (e.g. GENBANK).

Part 1: Experimental design. This section describes the experiment as a whole, which may consist of one or more hybridizations. Normally an ‘experiment’ should include a set of hybridizations that are inter-related and address a common biological question, such as all hybridizations relating to research published in a single paper. Each experiment should have an author (submitter) as well as contact information, links (URL), citations and a single-sentence experiment title. The section also includes a free-text format description of the experiment or a link to an electronically available publication.

The minimal information required in this section includes the type of the experiment (such as normal-versus-diseased comparison, time course, dose response, and so on) and the experimental variables, including parameters or conditions tested (such as time, dose, genetic variation or response to a treatment or compound). This section also provides general quality-related indicators such as usage and types of replicates and quality-control steps (such as dealing with low-complexity sequence-induced nonspecific hybridization). Provided in a format of controlled vocabularies, these will enable accurate queries and more formal data analysis than free-text descriptions.

Finally, this section specifies the experimental relationships between the array and sample entities; that is, which samples and which arrays were used in each hybridization assay. Each of these will be assigned unique identifiers that are cross-referenced with the information provided in the following sections. This information will allow the user to reconstruct unambiguously the experimental design and to relate together information from further MIAME sections.
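Cross-referencing by unique identifier can be checked mechanically. The sketch below assumes each hybridization lists the sample and array identifiers it used; the identifiers (`H1`, `S1`, `A1` and so on) and the dictionary layout are invented for illustration.

```python
def unresolved_references(hybridizations, sample_ids, array_ids):
    """Return (hybridization, kind, identifier) for every dangling reference."""
    problems = []
    for hyb_id, hyb in hybridizations.items():
        for sample in hyb["samples"]:
            if sample not in sample_ids:
                problems.append((hyb_id, "sample", sample))
        for array in hyb["arrays"]:
            if array not in array_ids:
                problems.append((hyb_id, "array", array))
    return problems


# Identifiers defined in the (hypothetical) samples and array design sections.
sample_ids = {"S1", "S2"}
array_ids = {"A1"}

# Hybridization H2 refers to a sample that was never described.
hybridizations = {
    "H1": {"samples": ["S1", "S2"], "arrays": ["A1"]},
    "H2": {"samples": ["S3"], "arrays": ["A1"]},
}
problems = unresolved_references(hybridizations, sample_ids, array_ids)
print(problems)
```

An empty result means every hybridization can be traced back to described samples and arrays, which is what allows the experimental design to be reconstructed unambiguously.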

Part 2: Array design. The aim of this section is to provide a systematic definition of all arrays used in the experiment, including the genes represented and their physical layout on the array. There are two parts to this section. The first is a list of the physical arrays; each member of the list is a simple description that gives a unique ID to each array used in the experiment and a reference to a particular array design. These designs are described in the second part of the section.

In the context of a database, array types should be defined and submitted only once by the array provider and referred to thereafter by users of the arrays. The array-type definition includes information common to all arrays of a particular type (such as glass-slide spotted with PCR-amplified cDNA clones) as well as precise descriptions of the physical content of each element (spot or feature). This section consists of three parts: (i) a description of the array as a whole (such as platform type, provider and surface type); (ii) a description of each type of element or spot used (properties that are typically common to many elements, such as ‘synthesized oligonucleotides’ or ‘PCR products from cDNA clones’); and (iii) a description of the specific properties of each element, such as the DNA sequence and, possibly, quality-control indicators.

The challenge for element definition is to achieve a unique and unambiguous description of the element. Because references to an external gene index may not be stable, it is essential to physically identify each element’s composition. Disclosing the nature of the relationship between an array element and its cognate gene’s transcript allows informed assessment of an element’s potential for nonspecific cross-hybridization or its capacity to distinguish alternative splice variants. Thus, where elements are based on cDNA clones, PCR amplicons or composite oligonucleotides, it is necessary that clone IDs, primer pair sequences or oligonucleotide sequence sets, respectively, are specified. In the case of commercial arrays where such details may be proprietary, MIAME allows for a compromise: instead of the actual probe sequence, the reference sequence from which the probe was derived may be specified. Although we do not consider this to be ideal, as the probe sequence may affect the results and interpretation of hybridization, it does allow commercial organizations to protect their intellectual investment in developing their array reagents while providing the minimum information necessary to uniquely identify the array elements.
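The identification rule for array elements, including the compromise for commercial arrays, can be expressed as a small check. This is a sketch only: the field names, element type labels and placeholder identifiers are hypothetical and are not taken from the MIAME document.

```python
# Which identifying field each element type needs (hypothetical field names).
REQUIRED_FIELD = {
    "cDNA clone": "clone_id",
    "PCR amplicon": "primer_pair",
    "composite oligonucleotide": "oligo_sequences",
}


def is_identified(element):
    """True if the element carries the identifying information described above."""
    required = REQUIRED_FIELD.get(element.get("type"))
    if required and element.get(required):
        return True
    # Compromise for commercial arrays: the reference sequence from which the
    # probe was derived may stand in for the proprietary probe sequence.
    return bool(element.get("commercial") and element.get("reference_sequence"))


# Invented elements with placeholder identifiers.
clone_element = {"type": "cDNA clone", "clone_id": "CLONE-PLACEHOLDER"}
bare_amplicon = {"type": "PCR amplicon"}
commercial_oligo = {"type": "composite oligonucleotide", "commercial": True,
                    "reference_sequence": "REFERENCE-PLACEHOLDER"}

print(is_identified(clone_element),
      is_identified(bare_amplicon),
      is_identified(commercial_oligo))
```

Note that a link to an external gene index alone does not satisfy the check, reflecting the point that such references may not be stable.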

Part 3: Samples. This section describes the second partner in the hybridization reaction: the labeled nucleic acids that represent the transcripts in the sample. The MIAME ‘sample’ concept represents the biological material (or biomaterial) for which the gene expression profile is being established. This section is divided into three parts which describe the source of the original sample (such as organism taxonomy and cell type) and any biological in vivo or in vitro treatments applied, the technical extraction of the nucleic acids, and their subsequent labeling.

As the characteristics necessary to accurately define a biological sample vary greatly from organism to organism, most of the biological sample definition is provided as an adaptable list of ‘qualifier, value, source’ triplets (such as ‘strain’, ‘129P1-Lama2dy’ and ‘ICSGNM’). Currently, the single common feature of all samples is the organism’s taxonomic definition. A list of qualifiers initially left at a submitter’s discretion may progressively be made standard when applicable ontologies are made public.
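The triplet scheme described above can be sketched as a small data structure. This is only an illustration, not a format defined by MIAME: the names `Qualifier` and `describe_sample` are hypothetical, and the organism name is supplied here as an assumption to accompany the strain example from the text.

```python
from typing import NamedTuple


class Qualifier(NamedTuple):
    """One 'qualifier, value, source' triplet describing a sample."""
    qualifier: str
    value: str
    source: str


def describe_sample(organism: str, triplets: list[Qualifier]) -> dict:
    """Bundle the mandatory organism with an open-ended list of triplets."""
    if not organism:
        # The organism's taxonomic definition is the one feature all samples share.
        raise ValueError("organism taxonomy must be given for every sample")
    return {"organism": organism, "qualifiers": list(triplets)}


# Illustrative sample: the triplet is the example from the text,
# the organism name is an assumption added for this sketch.
sample = describe_sample(
    "Mus musculus",
    [Qualifier("strain", "129P1-Lama2dy", "ICSGNM")],
)
```

Keeping the qualifier list open-ended mirrors the text: only the organism is enforced, while other qualifiers can be standardized later as ontologies become available.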

As for laboratory protocols for sample treatments, sample extraction and labeling, these will need to be specified initially as free-format text. Again, it is anticipated that popular protocols will be provided once and referred to thereafter by submitters pointing out the exact parameters and deviations from the standard protocol. Knowledge of these protocols may be important for interpreting the data.

Part 4: Hybridizations. This section defines the laboratory conditions under which the hybridizations were carried out. Other than a free-text description of the hybridization protocol, MIAME requires that a number of critical hybridization parameters are explicitly specified: choice of hybridization solution (such as salt and detergent concentrations), nature of the blocking agent, wash procedure, quantity of labeled target used, hybridization time, volume, temperature and descriptions of the hybridization instruments.
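A submission tool might check that each of the parameters listed above has been filled in. The following is a minimal sketch under that assumption; the field names and the `missing_parameters` helper are hypothetical and are not prescribed by MIAME.

```python
# Hypothetical field names, one per parameter named in the text.
REQUIRED_PARAMETERS = (
    "hybridization_solution",
    "blocking_agent",
    "wash_procedure",
    "labeled_target_quantity",
    "time",
    "volume",
    "temperature",
    "instruments",
)


def missing_parameters(record: dict) -> list[str]:
    """Return the required parameters that are absent or empty in the record."""
    return [name for name in REQUIRED_PARAMETERS if not record.get(name)]


# Placeholder values only; a real record would hold the actual conditions.
record = {
    "hybridization_solution": "placeholder",
    "blocking_agent": "placeholder",
    "wash_procedure": "placeholder",
    "labeled_target_quantity": "placeholder",
    "time": "placeholder",
    "volume": "placeholder",
    "temperature": "",  # left empty on purpose
}
missing = missing_parameters(record)
```

Here `missing` lists `temperature` (empty) and `instruments` (absent), which is the kind of feedback a questionnaire-style submission form could give.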

Part 5: Measurements. The actual experimental results are defined in this section. It consists of the three parts discussed in Section 2, progressing from raw to processed data: (a) the original scans of the array (images), (b) the microarray quantification matrices based on image analysis, and (c) the final gene expression matrix after normalization and consolidation from possible replicates.

Image data should be provided as raw scanner image files (such as TIFF), accompanied by scanning information that includes relevant scan parameters and laboratory protocols. MIAME does not require a particular image format, only that submitters provide the original scans upon which data quantification was based in a format readable by generally available software. Storing the primary image files would require a significant quantity of disk space, and there is no community consensus as to whether this would be cost-effective or whether this should be the task of public repositories or the primary authors. Nevertheless, as images represent the primary data from a microarray assay and the algorithms used for analysis can affect the conclusions that are reached, the current MIAME standard includes a specification for image deposition. As scanning protocols and image analysis methods mature, this mandatory requirement on image files may be revisited.

For each experimental image, a microarray quantification matrix contains the complete image analysis output as directly generated by the image analysis software (normally provided as separate spreadsheet-type files). Note that for a given image this is a 2D matrix, where array elements (spots or features) constitute one dimension and quantification types (such as mean and median intensity, mean or median background intensity) are the second dimension. We also provide in this section the collateral information needed to understand how image analysis was carried out, in particular the software used, the underlying methodology (such as algorithms and statistics), all relevant parameters and the definitions of the quantifications used (such as mean or median intensity). Note that if authors use their own custom-made (or customized) image-analysis software, the specification of its output is not formally dictated by MIAME. Nevertheless, in the spirit of MIAME, the output should include the information that permits the nature and quality of individual spot measurements to be assessed.
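The two-dimensional layout of a quantification matrix can be pictured as follows. This is a sketch only: the spot identifiers, column names and intensity values are invented for illustration, and real image-analysis software will produce its own columns.

```python
# Columns of the matrix: the quantification types (names are illustrative).
QUANT_TYPES = ["mean_intensity", "median_intensity", "median_background"]

# Rows of the matrix: one entry per array element (spot); values are made up.
quant_matrix = {
    "spot_001": [1520.0, 1490.0, 210.0],
    "spot_002": [310.0, 295.0, 205.0],
}


def lookup(matrix: dict, spot: str, quant_type: str) -> float:
    """Return one cell of the spot x quantification-type matrix."""
    return matrix[spot][QUANT_TYPES.index(quant_type)]


def column(matrix: dict, quant_type: str) -> dict:
    """Return one quantification type across all spots."""
    return {spot: lookup(matrix, spot, quant_type) for spot in matrix}


median_signal = column(quant_matrix, "median_intensity")
```

One such matrix exists per image, so the data for a whole experiment are a collection of these spot-by-quantification tables.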

Finally, the gene expression matrix (summarized information) consists of sets of gene expression levels for each sample. If microarray quantification matrices can be considered spot/image centric, then the gene expression matrix is gene/sample centric. At this point, the expression values may have been normalized, consolidated and transformed in any number of ways by the submitter in order to present the data in a form amenable to scientific analysis. Rather than attempting to impose a standard for gene expression values, MIAME indicates preferred detailed specifications of all numerical calculations applied to unprocessed quantifications in (b) that have led to the data in (c). Experimenters are encouraged, though not required, to provide reliability indicators (such as s.d.) for each data point.
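As one example of the consolidation step, replicate measurements for a gene can be collapsed into a single value with a reliability indicator. The sketch below assumes a simple mean with sample standard deviation; MIAME does not mandate this or any other calculation, and the gene names and values are invented.

```python
from statistics import mean, stdev


def consolidate(replicates: dict) -> dict:
    """Collapse replicate values per gene into (mean, s.d.).

    The s.d. is None when only one replicate is available.
    """
    summary = {}
    for gene, values in replicates.items():
        sd = stdev(values) if len(values) > 1 else None
        summary[gene] = (mean(values), sd)
    return summary


# Invented replicate values for two hypothetical genes.
replicates = {"geneA": [2.0, 4.0, 6.0], "geneB": [1.5]}
expression = consolidate(replicates)
```

Whatever calculation is actually used, the point made in the text is that it should be documented in enough detail for a reader to trace each value in (c) back to the quantifications in (b).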

Part 6: Normalization controls. A typical microarray experiment involves a number of hybridization assays in which the data from multiple samples are analyzed to identify relative changes in expression levels, identify differentially expressed genes and, in many cases, discover classes of genes or samples having similar patterns of expression. A typical experiment follows a ‘reference design’ (more sophisticated loop designs have been proposed [11], although these have not yet been widely adopted) in which many samples are compared to a common reference sample so as to facilitate inferences about relative expression changes between samples. For these comparisons, the reported hybridization intensities derived from image processing must first be normalized. Normalization adjusts for a number of technical variations between and within single hybridizations, namely quantity of starting RNA and labeling and detection efficiencies for each sample. There are a variety of normalization schemes in use, including total-intensity, ratio-based and both linear and nonlinear regression techniques. In addition, these analyses may be based on either the complete data set, a user-defined subset of genes (often a set of ‘housekeeping genes’ thought not to change their level of expression under the conditions used) or exogenous genes for which RNA is ‘spiked’ into the initial samples of interest. Whether used for normalization or not, the use of exogenous controls is becoming increasingly common both for quality control within single arrays and for array-hybridization comparisons within and between platforms.
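To make the idea concrete, total-intensity normalization can be sketched as a single scale factor that equalizes the summed intensities of a sample and the reference, computed either over all elements or over a chosen subset such as housekeeping genes. This is a simplified illustration with invented values and hypothetical function names, not the procedure of any particular software package.

```python
def total_intensity_factor(sample: dict, reference: dict, subset=None) -> float:
    """Scale factor that equalizes the summed intensity of sample and reference.

    If subset is given, only those element IDs (for example housekeeping
    genes) contribute to the sums.
    """
    ids = list(subset) if subset is not None else list(sample)
    sample_total = sum(sample[i] for i in ids)
    reference_total = sum(reference[i] for i in ids)
    if sample_total == 0:
        raise ValueError("sample intensities sum to zero")
    return reference_total / sample_total


def normalize(sample: dict, factor: float) -> dict:
    """Apply the scale factor to every element of the sample."""
    return {i: value * factor for i, value in sample.items()}


# Invented intensities for three hypothetical elements.
sample = {"g1": 200.0, "g2": 400.0, "g3": 1000.0}
reference = {"g1": 100.0, "g2": 200.0, "g3": 900.0}

factor_all = total_intensity_factor(sample, reference)
factor_subset = total_intensity_factor(sample, reference, subset=["g1", "g2"])
normalized = normalize(sample, factor_subset)
```

The two factors differ (0.75 over all elements, 0.5 over the subset), which illustrates why the choice of normalization strategy and of the elements used needs to be reported alongside the data.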

Section 6 of the MIAME standard provides an opportunity for the specification of parameters relevant to normalization and control elements. Our proposed standard includes (i) the normalization strategy (spiking, housekeeping genes, total array, other approach), (ii) the normalization and quality control algorithms used, (iii) the identities and location of the array elements serving as controls, as well as their type (spiking, normalization, negative or positive hybridization controls, ‘landing lights’ to assist spotfinding), and (iv) hybridization extract preparation, detailing how the control samples are included in sample targets prior to hybridization.

Discussion

Our goal is to develop a standard that can serve both research scientists and software developers. To that end, we hope that this description will stimulate discussion of the proposed MIAME standards and we encourage the microarray community, as well as the general research community, to provide us with their views on how this standard can be improved. For this purpose an e-mail discussion group has been set up by the MGED consortium (to join, see http://www.mged.org).

At first glance, the extent of the information requested in the MIAME specification may seem daunting. It should be noted, however, that for most laboratories the majority of the information will be similar for many experiments, and once that information is specified, it should not have to be specified again. For example, most laboratories will use a single design for tens or hundreds of microarray experiments. The same is true of labeling protocols and normalization strategies. Describing these and other specifications has several goals: to help scientists conducting and designing experiments to record appropriate data, to assist those scientists interpreting or analyzing those data and to facilitate the design of databases and software that enable the data to be archived, queried and retrieved in an intuitive and biologically relevant manner.

We pose several questions to the research community. Is the current MIAME draft sufficiently detailed to capture the information needed to analyze and evaluate microarray data? If not, what is missing? Or is MIAME already too extensive, requiring specific details that are unlikely to ever be exploited; if so, what is superfluous? Are there objections to having a defined minimum information standard in principle and, if so, what are the alternatives? The goal of our proposal is not to impose specific solutions upon the community but instead to establish a community-wide understanding of the optimal infrastructure for the sharing of microarray data. It is possible that parts of MIAME may be burdensome whereas other sections do not offer sufficient detail. One important consideration is the queries that one would like to make of a MIAME-supportive gene expression database. Our development of MIAME was guided primarily by a desire to provide the information necessary to make such queries.

The MIAME document represents an overall consensus of the MGED working group on microarray data annotations [1] in all parts except section 5(a) concerning ‘hybridization-scan raw data’. A majority of the working group supports the view that providing raw image data is an essential part of MIAME. There is also a considerable minority, however, who do not adhere to this view. If the consensus emerges that the primary image data are important, what is the preferred mechanism for ensuring access to images? Should they be stored in public repositories, or should the availability of the images be the responsibility of the experimenter? We anticipate that the answer to this (and other questions posed here) may evolve over time.

A more fundamental question for discussion is whether natural units for gene expression measurements exist. If so, what might they be, and can they be calculated from microarray measurements? In the absence of natural units of gene expression, how should gene expression data be organized, in particular to facilitate cross-experiment and cross-platform transcriptome comparisons? Is it possible and helpful to introduce standard controls and protocols for microarray experiments themselves, to facilitate comparison of the data?

Once MIAME has stabilized and a general consensus is reached, we can turn to practical applications. An initial technical application is to develop a data model that is able to record MIAME. Such a data model is already being developed within the OMG with the participation of the MGED consortium (http://www.geml.org/omg.htm). Founded on this data model is a standard data-exchange format (an XML description called MAGE-ML: Microarray Gene Expression Markup Language), which will allow communication of MIAME-supportive data between local laboratory databases, central archives and stand-alone analysis packages. The final version of the MAGE-ML standard has been submitted to OMG, and participating organizations are already concentrating on the development of the supporting software.

The next important step is the development of gene expression databases able to record MIAME information and, equally important, of data submission tools. Such tools may include web-based questionnaires allowing users to enter MIAME information directly into a database or to export captured data in the standard format discussed above. In many cases, it is envisaged that most of the MIAME information will be recorded through local LIMS software before being uploaded into central archiving databases using a standard data-exchange format. The development of MIAME-friendly LIMS software will therefore be an important task. We hope that by adopting and publicizing MIAME, we will encourage software developers to adapt their tools to this standard.

It is important that the effort to define minimum information requirements and data-exchange formats is endorsed by many major commercial genomics and bioinformatics companies. The availability of minimum data requirements will help in developing databases that can exchange information with public or other private databases. MIAME addresses the problem faced by most commercial gene expression companies of integrating gene expression data from multiple sources and multiple platforms.

Eventually, when MIAME-supportive public repositories are established, the general research community must consider whether full data disclosure should be required for publication. Journals and funding agencies will also have to consider whether, in the tradition of DNA sequence and macromolecular structure data, release of microarray data at a MIAME-compliant level should be required.

In its present form, MIAME 1.0 is the first version of a document describing the minimal information required to report an array-based gene expression experiment. Some of the current specifications may become redundant or irrelevant as the technology evolves, and extra information may need to be added at a later date. We therefore plan to couple future versions with progress in technology and analysis as well as experience gained within the microarray community. In addition, microarrays can be used for many types of experiment other than monitoring gene expression (comparative genome hybridization, genome mismatch scanning, chromatin IP experiments, and so on), and future versions of MIAME will attempt to accommodate these other types of data.

During this era of genomic-scale experiments, establishing expectations for format and content, sharing data and analysis tools and establishing databases and other resources has become a widespread problem throughout the life sciences. For instance, the neuroimaging community seems to be confronted with very similar problems (how to compare data across different laboratories, a lack of standards for data normalization, a need for standard annotations) and is following a similar strategy for developing a solution [12]. Our hope is that such an approach becomes the norm by which data presentation and publication standards are developed in the future. As such, we look forward to hearing comments and suggestions from the general research community.

Note: Supplementary information is available on the Nature Genetics web site (http://genetics.nature.com/supplementary_info/).

Acknowledgments

The MIAME document is a result of the work of many people. The idea was conceived during an international meeting organized by the EBI in November 1999 to discuss gene expression databases [8], during which a preliminary version of the MIAME document was produced. A microarray annotations mailing list was created and many of the members of this mailing list contributed to subsequent drafts. We would particularly like to acknowledge G. Barton, K. Henrick and J-J. Riethoven, M. Bittner, R. Bumgarner, M. Cherry, T. Freeman, J. Hoheisel and his team, A. Lash, H. Mangalam, T. Preiss, A. Richter, C. Schwager, M. Ringwald, Y. Tateno and R. Young. The document was extensively discussed in a meeting of the MGED steering committee at the US National Institutes of Health in November 2000, where the current version of MIAME was effectively prepared. The final additions to the document were made during the MGED 3 conference [1] (http://www.mged.org/). Although the MGED is a grass-roots movement and does not presently have dedicated funding, the authors of this paper have been funded by contributions from various sources, including the Industry Support Programme at the EBI, Lipper Foundation, Medical Research Council, Incyte Genomics and the National Heart, Lung, and Blood Institute of the US NIH.

Received 13 July; accepted 22 October 2001.

1. The Chipping Forecast. Nature Genet. 21, 1–60 (1999).
2. Brown, P.O. & Botstein, D. Exploring the new world of the genome with DNA microarrays. Nature Genet. 21, 33–37 (1999).
3. Young, R. Biomedical discovery with DNA arrays. Cell 102, 9–16 (2000).
4. Lockhart, D. & Winzeler, E. Genomics, gene expression and DNA arrays. Nature 405, 827–836 (2000).
5. Aach, J., Rindone, W. & Church, G.M. Systematic management and analysis of yeast gene expression data. Genome Res. 10, 431–445 (2000).
6. Quackenbush, J. Computational analysis of microarray data. Nature Rev. Genet. 2, 418–427 (2001).
7. Sherlock, G. et al. The Stanford Microarray Database. Nucleic Acids Res. 29, 152–155 (2001).
8. Brazma, A., Robinson, A., Cameron, G. & Ashburner, M. One-stop shop for microarray data. Nature 403, 699–700 (2000).
9. Editorial. Free and public expression. Nature 410, 851 (2001).
10. Bassett, D.E., Eisen, M.B. & Boguski, M.S. Gene expression informatics--it’s all in your mine. Nature Genet. 21, 51–55 (1999).
11. Kerr, M.K. & Churchill, G.A. Experimental design for gene expression microarrays. Biostatistics 2, 183–201 (2001).
12. The Governing Council of the Organization for Human Brain Mapping (OHBM). Neuroimaging databases. Science 292, 1673–1676 (2001).

REVIEW ABOUT BASICS OF BIOSTATISTICS

Article

Jun 2024

Statistical methods are imperative to reach substantial determinations from the acquired information Principle centre is given to kinds of information, estimation of focal varieties and fundamental tests, which are valuable for the investigation of various sorts of perceptions. Scarcely any boundaries like typical dissemination, computation of test size, level of importance, invalid speculation, records of changeability, and distinctive test are clarified in detail by giving reasonable models. Utilizing these rules, we are sure sufficient that postgraduate understudies will actually want to characterize the dissemination of information alongside the utilization of legitimate test. Data is likewise given in regards to different free programming projects and sites helpful for computations of measurements. In this manner, postgraduate understudies will have profited in two different ways whether they settle on scholastics or for the industry.

Integrative Multi-Omics Approaches in Insect Pest Ecology: Current Insights and Future Directions

Technical Report

Jun 2024

Zahid Ali

Integrative computational approach for gene expression profiling of metastatic breast cancer

Article

Jan 2023

Identification of core, conditional and crosstalk components of tomato heat stress response using integrative transcriptomics and orthology

Preprint

Full-text available

Apr 2024

Heat stress significantly affects global agricultural yield and food security and as climate change is expected to increase the frequency and severity of heatwaves, this is a growing challenge. Tomato plants are prone to heat stress exposure both in the field and in greenhouses, making heat stress resilience a key trait for breeding. While the identification of heat-associated genes has been addressed in multiple individual studies, the quantitative integration of data from these studies holds potential for low-cost, high-value knowledge gain about the complex network of actors involved in heat stress response mechanisms. To address this challenge, we have compiled a comprehensive data resource containing both novel and publicly available RNA-seq data on tomato in heat stress spanning multiple tissues, genotypes, and levels and durations of stress exposure. We show that in each individual dataset the large majority of responses originates from an interaction between the stimulus and the specific experimental setup. Conversely, by intersecting differentially expressed genes across experiments, we identify a tomato-specific core response of only 57 genes encoding heat shock proteins, transcriptional regulators, enzymes, transporters and several uncharacterized proteins. 17 of these genes lie within previously identified genetic loci associated with heat tolerance traits. Applying the same approach to all publicly available RNA-seq data on drought and salt stress in tomato, we find large overlaps in the conditional parts of the stress responses but the robust and sustained core responses are mostly stress-specific. Finally, we show that the core responses to these stresses are enriched with evolutionarily ancient genes with orthologs across all domains of life and that the heat core response genes form identifiable co-evolving clusters within the Streptophyta. 
Our study exemplifies the importance and advantage of using FAIR public data to interpret results of new stress experiments, and provides tools to perform such analyses in a relatively short time.

AGRUPACIÓN DE GENES EN CIENCIA INTENSIVA: COMPARACIÓN Y ANÁLISIS DE TENDENCIA MEDIANTE EL ÍNDICE DE ESTABILIDAD BIOLÓGICA

Article

Full-text available

Jan 2020

Esta investigación evalúa el rendimiento de los algoritmos de agrupación más conocidos utilizando el índice de estabilidad biológica (BSI). Se realizó una comparación entre los algoritmos de agrupación, para determinar de estos cuál es el óptimo según el puntaje obtenido en cada algoritmo, la agrupación de génica en Ciencia Intensiva, el mismo que utiliza bases de datos extensas para cubrir casi todos los resultados que pudiesen ocurrir realmente. Se aplica este método a una base de datos de expresión de genes (Microarray). El análisis se lo realizó a la base de datos “mouse” incluida en el paquete clValid en el software R, para el estudio de las células mesenquimales de ratones (cresta neural y el mesodermo derivado), también se utiliza métodos gráficos como los dendogramas para un primer enfoque. Para la selección del algoritmo óptimo, se calculó el índice biológico de estabilidad para cada algoritmo de agrupación siendo el mejor, el que más cerca de la unidad se encuentre. En consecuencia, el algoritmo más estable para dicha base de datos es “Diana”. Para llegar a este resultado se visualizó gráficamente el número de clústeres con la respuesta obtenida en cada caso; se tomó como el algoritmo óptimo el que más se apegue a la realidad del problema teniendo en cuenta su puntaje en los índices y además con la ayuda de un gráfico de filogenética para un ultimo enfoque.

Technical pitfalls when collecting, cryopreserving, thawing, and stimulating human T-cells

Article

Full-text available

May 2024

The collection, cryopreservation, thawing, and culture of peripheral blood mononuclear cells (PBMCs) can profoundly influence T cell viability and immunogenicity. Gold-standard PBMC processing protocols have been developed by the Office of HIV/AIDS Network Coordination (HANC); however, these protocols are not universally observed. Herein, we have explored the current literature assessing how technical variation during PBMC processing can influence cellular viability and T cell immunogenicity, noting inconsistent findings between many of these studies. Amid the mounting concerns over scientific replicability, there is growing acknowledgement that improved methodological rigour and transparent reporting is required to facilitate independent reproducibility. This review highlights that in human T cell studies, this entails adopting stringent standardised operating procedures (SOPs) for PBMC processing. We specifically propose the use of HANC’s Cross-Network PBMC Processing SOP, when collecting and cryopreserving PBMCs, and the HANC member network International Maternal Pediatric Adolescent AIDS Clinical Trials (IMPAACT) PBMC Thawing SOP when thawing PBMCs. These stringent and detailed protocols include comprehensive reporting procedures to document unavoidable technical variations, such as delayed processing times. Additionally, we make further standardisation and reporting recommendations to minimise and document variability during this critical experimental period. This review provides a detailed overview of the challenges inherent to a procedure often considered routine, highlighting the importance of carefully considering each aspect of SOPs for PBMC collection, cryopreservation, thawing, and culture to ensure accurate interpretation and comparison between studies.

The use of data management planning among researchers in higher learning institutions: The case of the Nelson Mandela African Institution of Science and Technology in Tanzania

Article

Full-text available

May 2024

This study assessed the use of data management plans among researchers at a selected higher learning institution (HLI) in Tanzania. A pretested structured questionnaire was administered to registered postgraduate students. Many of the respondents reported that a data management plan (DMP) was required before writing a research project and when a research project was submitted. The results also demonstrated that many respondents did not use any online DMP template tools to formulate their DMP although most of them were aware of available DMP template tools such as OpenDMP. Many respondents stated that the requirement of using a DMP were selection of a DMP format, updating the DMP regularly, having a short and to-the-point DMP and a well-structured DMP specifying the kinds and formats of the data to be acquired, generated, produced, and preserved. Meeting funders’ institutions, and publishers’ requirements, and ensuring that data are accurate, complete, and reliable were among the DMP benefits in HLIs identified by the respondents. Several challenges were revealed including a lack of awareness, competence, and guidelines to assist researchers using a DMP for their research projects. The conclusion is that researchers need to develop and use DMP template tools to plan, organize, and work on their research projects in addition to ensuring that they meet funders' requirements. It is recommended that HLIs should provide extensive training programs for raising awareness about DMPs among the researchers and to make DMPs a mandatory requirement for finalizing research projects among researchers, and not only for funding purposes.

Human molecular genetics of opioid addiction

Chapter

Sep 2012

Disorders of behavior represent some of the most common and disabling diseases affecting humankind; however, despite their worldwide distribution, genetic influences on these illnesses are often overlooked by families and mental health professionals. Psychiatric genetics is a rapidly advancing field, elucidating the varied roles of specific genes and their interactions in brain development and dysregulation. Principles of Psychiatric Genetics includes 22 disorder-based chapters covering, amongst other conditions, schizophrenia, mood disorders, anxiety disorders, Alzheimer's disease, learning and developmental disorders, eating disorders and personality disorders. Supporting chapters focus on issues of genetic epidemiology, molecular and statistical methods, pharmacogenetics, epigenetics, gene expression studies, online genetic databases and ethical issues. Written by an international team of contributors, and fully updated with the latest results from genome-wide association studies, this comprehensive text is an indispensable reference for psychiatrists, neurologists, psychologists and anyone involved in psychiatric genetic studies.

Translation of Gene Expression Data Into Personalized Treatment in Cervical Cancer: Machine Learning Approach

Article

Apr 2024

The majority of cervical cancers have been linked to the infection by human papillomavirus (HPV). There is a need to identify genes which play a role in the final manifestation of cervical cancers following HPV infection. To identify a number of genetic markers associated with cervical cancer that may aid in the disease's diagnosis or prognosis using machine learning methods. To do this, we will assess numerous gene expression profiles with integrative machine learning approaches such as random forest (RF) and support vector machine-based recursive feature elimination (SVMRFE). The conceptual analysis consists of following steps: (i) gene expression analysis and (ii) machine learning analysis for predicting genes. The selected datasets were GSE75132 and GSE39001 for this study. Accuracy and cross validation were carried for both SVM-RFE and RF model for the gene identification purpose. R Bioconductor packages “GEOquery,” “limma,” and “umap” were utilized. The selected genes of machine learning methods were combined. The SVM model was the best for predicting the gene expression microarray profile based on the accuracy this study was able to get. The SVM model indicated that genes might be used as biomarkers to identify biological processes. The identified genes were considered as potential gene signatures in cervical cancer detection, and their interactions were studied.

Yap is a Nutrient Sensor Sensitive to the Amino Acid L-Isoleucine and Regulates Expression of Ctgf in Cardiomyocytes

Preprint

Apr 2024

Myocardial infarction and reperfusion is a complex injury consisting of many distinct molecular stress patterns that influence cardiomyocyte survival and adaptation. Cell signalling that is essential to cardiac development also presents potential disease-modifying opportunities to recover and limit myocardial injury or maladaptive remodelling. Here we hypothesized that Yap signalling could be sensitive to one or more molecular stress patterns associated with early acute ischemia. Yap, not Taz, patterns of expression differ in post-myocardial infarct compared to peri-infarct tissue suggesting cell-specificity that would be challenging to resolve for causation in vivo. Using H9c2 ventricular myotubes in vitro as a model, Yap levels were most sensitive to nutrient deprivation compared to other stress patterns typified by ischemia within the first hour of stress. Moreover, this is mediated by amino acid availability, dominantly L-isoleucine, and influences the expression of Ctgf, a major determinant of myocardial adaptation after injury. These findings present novel opportunities for future therapeutic development and risk assessment for myocardial injury and adaptation.

Gene expression informatics--it's all in your mine

Article

Full-text available

Feb 1999

Technologies for whole-genome RNA expression studies are becoming increasingly reliable and accessible. However, universal standards to make the data more suitable for comparative analysis and for inter-operability with other information resources have yet to emerge. Improved access to large electronic data sets, reliable and consistent annotation and effective tools for 'data mining' are critical. Analysis methods that exploit large data warehouses of gene expression experiments will be necessary to realize the full potential of this technology.

Exploring the New World of the Genome With DNA Microarrays

Article

Full-text available

Feb 1999

Thousands of genes are being discovered for the first time by sequencing the genomes of model organisms, an exhilarating reminder that much of the natural world remains to be explored at the molecular level. DNA microarrays provide a natural vehicle for this exploration. The model organisms are the first for which comprehensive genome-wide surveys of gene expression patterns or function are possible. The results can be viewed as maps that reflect the order and logic of the genetic program, rather than the physical order of genes on chromosomes. Exploration of the genome using DNA microarrays and other genome-scale technologies should narrow the gap in our knowledge of gene function and molecular biology between the currently-favoured model organisms and other species.

One-stop shop for microarray data

Mar 2000

Is a universal, public DNA-microarray database a realistic goal?

Quackenbush, J. Computational analysis of microarray data. Nat. Rev. Genet. 2, 418-427

Jul 2001

John Quackenbush

Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been enthusiastically developed and applied in many biological contexts, the management and analysis of the millions of data points that result from these experiments has received less attention. Sophisticated computational tools are available, but the methods that are used to analyse the data can have a profound influence on the interpretation of the results. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis.

The Stanford Microarray Database

Jan 2001
NUCLEIC ACIDS RES

Gavin Sherlock

The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77–80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73–76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10–14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332–333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48] and can be accessed at http://genome-www.stanford.edu/microarray.

Systematic Management and Analysis of Yeast Gene Expression Data

Apr 2000
GENOME RES

John Aach

We report steps toward the systematic management, standardization, and analysis of functional genomics data. We developed the ExpressDB database for yeast RNA expression data and loaded it with ∼17.5 million pieces of data reported by 11 studies with three different kinds of high-throughput RNA assays. A web-based tool supports queries across the data from these studies. We examined comparability of data by converting data from 9 studies (217 conditions) into mRNA relative abundance estimates (ERAs) and by clustering of conditions by ERAs. We report on generation of ERAs and condition clustering for non-microarray data (5 studies, 63 conditions) and describe initial attempts to generate microarray-based ERAs (4 studies, 154 conditions), which exhibit increased error, on our web site http://arep.med.harvard.edu/ExpressDB. We recommend standards for data reporting, suggest research into improving comparability of microarray data through quantifying and standardizing control condition RNA populations, and also suggest research into the calibration of different RNA assays. We introduce a model for a database that integrates different kinds of functional genomics data, Biomolecule Interaction, Growth and Expression Database (BIGED).

Computational Analysis of Microarray data

Nov 2000
NAT REV GENET

John Quackenbush

Lockhart, D. J. & Winzeler, E. A. Genomics, gene expression and DNA arrays. Nature 405, 827-836

Jul 2000

Experimental genomics in combination with the growing body of sequence information promises to revolutionize the way cells and cellular processes are studied. Information on genomic sequence can be used experimentally with high-density DNA arrays that allow complex mixtures of RNA and DNA to be interrogated in a parallel and quantitative fashion. DNA arrays can be used for many different purposes, most prominently to measure levels of gene expression (messenger RNA abundance) for tens of thousands of genes simultaneously. Measurements of gene expression and other applications of arrays embody much of what is implied by the term 'genomics'; they are broad in scope, large in scale, and take advantage of all available sequence information for experimental design and data interpretation in pursuit of biological understanding.

Biomedical Discovery with DNA Arrays

Aug 2000
CELL

Richard A. Young
