A Module Map Showing Conditional Activity of Expression Modules in Cancer

DNA microarrays are widely used to study changes in gene expression in tumors, but such studies are typically system-specific and do not address the commonalities and variations between different types of tumor. Here we present an integrated analysis of 1,975 published microarrays spanning 22 tumor types. We describe expression profiles in different tumors in terms of the behavior of modules, sets of genes that act in concert to carry out a specific function. Using a simple unified analysis, we extract modules and characterize gene-expression profiles in tumors as a combination of activated and deactivated modules. Activation of some modules is specific to particular types of tumor; for example, a growth-inhibitory module is specifically repressed in acute lymphoblastic leukemias and may underlie the deregulated proliferation in these cancers. Other modules are shared across a diverse set of clinical conditions, suggestive of common tumor progression mechanisms. For example, the bone osteoblastic module spans a variety of tumor types and includes both secreted growth factors and their receptors. Our findings suggest that there is a single mechanism for both primary tumor proliferation and metastasis to bone. Our analysis presents multiple research directions for diagnostic, prognostic and therapeutic studies.
Bone osteoblastic module (#234), a module that responds significantly to multiple conditions, including breast cancer, lung cancer, HCC and ALL. (a) Expression profile of genes in the bone osteoblastic module. Details of data presentation are as described for Figure 4. LiC, liver cancer; BC, breast cancer; LC, lung cancer; L, leukemia. Asterisks denote general annotations. (b) Module genes in the context of the molecular pathways underlying bone remodeling. The pathways are shown for the differentiation and matrix remodeling events (light blue arrows) of the three main cell types in bone and cartilage: chondrocytes (top), osteoblasts (middle) and osteoclasts (bottom). The coordination and balance among the three processes results in either bone building or resorption. The module genes (purple) are primarily associated with proliferation and differentiation of chondrocytes and osteoblasts. Even those module genes that are related to osteoclast induction encode proteins that are typically secreted by osteoblasts. The genes include both intracellular or membrane proteins (thin black border) and extracellular secreted ones (bold blue border), thus forming a coherent and self-sufficient autocrine module. (c) The expression and function of 32 module genes in normal tissues based on previous immunohistochemical and in situ hybridization experiments. Almost all (31 of 32) of the genes function in bone or cartilage (blue), and 14 are expressed primarily (pink) or uniquely (purple) in bone or cartilage. In contrast, only 8 of the genes are angiogenic (green), and another 5 genes are partly associated with blood vessels or antiangiogenic function (yellow). (d) The expression of 23 of the 32 module genes in epithelial tumors and their surrounding stroma based on previous immunohistochemical and in situ hybridization experiments. Whereas 19 of the genes are associated with breast cancer (green) or other epithelial tumors (orange), only 4 are expressed solely in stroma (blue).
A module map showing conditional activity of expression modules in cancer
Eran Segal¹,⁴, Nir Friedman², Daphne Koller¹ & Aviv Regev³
Cancer is a multifaceted phenomenon, originating in different tissues and involving disruptions of various cellular processes. Aberrations in regulation of key proliferation and survival pathways are common to all tumors, whereas alterations in other pathways may be specific to certain tumors. Understanding which mechanisms are general and which are specific has important therapeutic implications, but few studies (refs. 1–4) address this issue from a genome-wide perspective. Here, we used DNA microarray data in a comprehensive analysis aimed at identifying the shared and unique molecular ‘modules’ underlying human malignancies. Two recent studies (refs. 3,5) demonstrate the utility of similar approaches in the context of a single module. The result of our analysis is a global map showing the modules that are induced or repressed in a wide variety of clinical conditions.
We analyzed a ‘cancer compendium’ of expression profiles compiled
from 26 studies (Supplementary Table 1 online), measuring the
expression of 14,145 genes in 1,975 arrays spanning 17 categories
(Fig. 1a). First, we organized genes into higher-level modules, and
then we identified clinical conditions in which different modules are
induced or repressed.
We started by collecting 2,849 biologically meaningful gene sets, including clusters of coexpressed genes, genes expressed in specific tissue types (ref. 6) and genes belonging to the same functional category or pathway (refs. 7–9; Fig. 1b). We identified the arrays in which each gene set has a prominent expression signature by testing whether the expression of a statistically significant fraction of the genes in the set changed coordinately in the array (Fig. 1c,d). In our compendium, the change in expression of each gene in a given array is relative to the average expression of the gene across all arrays in the relevant data set.
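The per-array test described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that a gene counts as induced when its mean-centered value exceeds a fixed threshold and that significance is scored with a hypergeometric tail probability. The function names, the threshold of 1.0 and the toy matrix are hypothetical.

```python
from math import comb


def center_by_gene(data):
    """Express each value relative to the gene's mean across arrays.

    data: list of rows, one row per gene, one column per array.
    """
    centered = []
    for row in data:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def hypergeom_tail(k, n_total, n_success, n_draw):
    """P(X >= k) when n_draw items are drawn from n_total items,
    n_success of which are successes."""
    upper = min(n_success, n_draw)
    hits = sum(comb(n_success, i) * comb(n_total - n_success, n_draw - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_draw)


def gene_set_pvalue(centered, gene_set, array, threshold=1.0):
    """P-value for over-representation of induced genes within
    gene_set in a single array (assumed threshold, hypothetical)."""
    members = set(gene_set)
    column = [row[array] for row in centered]
    induced = {g for g, x in enumerate(column) if x >= threshold}
    overlap = len(induced & members)
    return hypergeom_tail(overlap, len(column), len(induced), len(members))


# Hypothetical toy data: six genes (rows) measured on three arrays (columns).
toy = [
    [2.0, 0.0, -2.0],
    [3.0, 0.0, -3.0],
    [2.0, -1.0, -1.0],
    [0.0, 0.0, 0.0],
    [-1.0, 0.5, 0.5],
    [0.0, 1.5, -1.5],
]
p_first_array = gene_set_pvalue(center_by_gene(toy), [0, 1, 2], array=0)
```

Repression could be scored the same way by negating the centered values before calling `gene_set_pvalue`.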
Gene sets reflect biological modules only approximately. Only a subset of genes in a set may contribute to its expression signature, and different gene sets may have similar signatures across the arrays, owing to either an overlap between the gene sets or coregulation of nonoverlapping gene sets. When several gene sets (a cluster) have similar signatures, we extracted from this cluster a core module, which both refines the gene composition of each gene set and combines several related gene sets. This module more closely reflects the genes that participate in a specific biological process, as it consists of the genes whose expression profile corresponds to the signature of the cluster. Overall, we identified 456 statistically significant modules (Supplementary Note and Supplementary Fig. 1 online) that span various processes and functions, including metabolism, transcription, translation, degradation, cellular and neural signaling, growth, cell cycle, apoptosis and extracellular matrix and cytoskeleton components.
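A rough sketch of core-module extraction is given below. It is an assumption-laden illustration rather than the published algorithm (which is described in the Supplementary Note): a gene from the union of the clustered gene sets is kept if it is induced in at least a chosen fraction of the arrays in which the cluster is significant. The function name, the threshold and the `min_fraction` parameter are hypothetical.

```python
def core_module(centered, gene_sets, arrays, threshold=1.0, min_fraction=0.5):
    """Return the genes, drawn from the union of gene_sets, that are
    induced in at least min_fraction of the given arrays.

    centered: mapping (or list) from gene to its mean-centered values.
    gene_sets: iterable of gene collections forming one cluster.
    arrays: indices of arrays in which the cluster is significant.
    """
    candidates = sorted(set().union(*gene_sets))
    module = []
    for gene in candidates:
        hits = sum(1 for a in arrays if centered[gene][a] >= threshold)
        if hits >= min_fraction * len(arrays):
            module.append(gene)
    return module


# Hypothetical values loosely modeled on the Figure 1d example:
# two gene sets induced in arrays 2-5 (indices 1-4), gene 4 repressed there.
expr = {
    2: [0, 2, 2, 2, 2, 0, 0],
    3: [0, 2, 2, 0, 2, 0, 0],
    4: [0, -2, -2, -2, -2, 0, 0],
    5: [0, 2, 2, 2, 2, 0, 0],
    6: [0, 0, 2, 2, 2, 0, 0],
    7: [0, 2, 2, 2, 0, 0, 0],
}
module = core_module(expr, [{2, 3, 4, 5}, {4, 5, 6, 7}], arrays=[1, 2, 3, 4])
```

With these made-up values, gene 4 belongs to both gene sets but is excluded because it moves in the opposite direction, which mirrors the behavior described for the example in Figure 1d.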
In the second step of our analysis, we used these modules to characterize clinical conditions according to the combination of modules that are activated and deactivated in them. Using information provided in the original studies, we annotated all the arrays with 263 biological and clinical conditions, including tissue and tumor type, diagnostic and prognostic information, and molecular markers.
Published online 26 September 2004; doi:10.1038/ng1434
¹ Computer Science Department, Stanford University, Stanford, California 94305, USA. ² School of Computer Science and Engineering, Hebrew University, Jerusalem 91904, Israel. ³ Bauer Center for Genomics Research, Harvard University, Cambridge, Massachusetts 02138, USA. ⁴ Present address: Center for Studies in Physics and Biology, The Rockefeller University, New York, New York 10021, USA. Correspondence should be addressed to D.K. (koller@cs.stanford.edu) or A.R. (aregev@cgr.harvard.edu).
NATURE GENETICS, LETTERS, VOLUME 36, NUMBER 10, OCTOBER 2004, p. 1090
© 2004 Nature Publishing Group http://www.nature.com/naturegenetics
For each module and each condition, we tested whether the module was induced (or repressed) in a significant fraction of the arrays labeled with the condition. We distinguished between ‘specific’ and ‘general’ annotations: specific annotations are evaluated within each category, whereas general annotations are evaluated only relative to their lack of association with arrays from the other categories. We compiled the module-condition pairs into a global module map for cancer (Fig. 2).
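The pairing of a module with a condition can be illustrated with a small enrichment test. This is a sketch under stated assumptions, not the authors' exact statistic: it assumes a hypergeometric tail probability computed within a background set of arrays (for a specific annotation, the arrays of one category). All function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n_total, n_success, n_draw):
    """P(X >= k) when n_draw items are drawn from n_total items,
    n_success of which are successes."""
    upper = min(n_success, n_draw)
    hits = sum(comb(n_success, i) * comb(n_total - n_success, n_draw - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_draw)


def condition_enrichment(induced, annotated, background):
    """Score one module-condition pair within a background set of arrays.

    induced: arrays in which the module is significantly induced.
    annotated: arrays labeled with the condition.
    background: arrays against which the test is evaluated.
    Returns (fraction of annotated arrays that are induced, p-value).
    """
    background = set(background)
    induced = set(induced) & background
    annotated = set(annotated) & background
    overlap = len(induced & annotated)
    p_value = hypergeom_tail(overlap, len(background),
                             len(annotated), len(induced))
    fraction = overlap / len(annotated) if annotated else 0.0
    return fraction, p_value


# Hypothetical example: ten arrays in a category, three carry the
# annotation, and the module is induced in four arrays.
fraction, p_value = condition_enrichment({0, 1, 2, 3}, {0, 1, 2}, range(10))
```

Restricting both array sets to the background is one way to keep a specific annotation local to its category; a general annotation would instead use arrays from all categories as the background.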
The results must be interpreted with caution, because the biological
interpretation of induction (or repression) of a module in a given
condition depends on our choice of normalization (Supplementary
Note online). In addition, interpretation may be confounded by
combining diverse data sets, each normalized separately. To address
this problem, we used annotations in a way that is strictly local to each
category (Supplementary Note online) in the final analysis step, in
which we paired modules with clinical annotations.
The module map shows that some modules (e.g., cell cycle; Fig. 3a) are shared across multiple tumor types and may be related to general tumorigenic processes, whereas others are more specific to the tissue origin or progression of particular tumors. For example, modules related to neural processes (e.g., #274 and #137) are repressed in a subset of brain tumors (relative to other central nervous system tumors), and an intermediate filament module (#357) is induced in squamous cell lung carcinomas and reduced in lung adenocarcinomas (both relative to other lung tumors), consistent with the idea that de-differentiation processes accompany tumorigenesis. Related modules, such as cell cycle modules (Fig. 3a), seem to form building blocks that are used together in different conditions. More specialized modules, such as signaling and growth regulatory modules (Fig. 3b,c), are used in distinct combinations by various tumors.
Conversely, the module map characterizes each condition by a particular combination of modules. For example, invasive hepatocellular carcinoma (HCC) is characterized by induction of cell cycle modules and repression of modules related to metabolism, detoxification, the extracellular matrix and signaling (relative to hepatitis-infected liver tissue and noninvasive HCC). Estrogen receptor–positive breast cancer is characterized by repression of modules containing keratins and other intermediate filaments (relative to other breast adenocarcinomas and human mammary epithelial cells). The map indicates that related conditions involve related modules, albeit in distinct ways (Fig. 3d,e). For example, various tumors of hematologic origin (Fig. 3d) involve similar immune, inflammation, growth regulation and signaling modules. The pattern of involvement separates different tumor types and subtypes.
Characterizing conditions in terms of modules provides important insights into the mechanisms underlying specific malignancies. For example, the growth inhibitory module (Fig. 4) consists primarily of growth suppressors (11 of 16) whose expression is coordinately repressed in a subset of acute leukemia arrays (relative to the leukemia category; 40 arrays; P < 4 × 10^-29). Some of these genes are direct (DUSP2 (ref. 10), DUSP4 (ref. 11), DUSP6 (ref. 12)) or indirect (RGS3 (ref. 13), RGS4 (ref. 14)) repressors of ERK1, an activator of cell proliferation (Fig. 4b) known to be constitutively active in acute leukemia (ref. 10). Others (MAP3K7IP1 (also called TAB1; ref. 15) and GADD45G (ref. 16)) are activators of the apoptosis repressor p38 (Fig. 4b). Thus, the concerted downregulation of these growth suppressors may allow ERK1 and p38 to escape
regulation, leading to uncontrolled prolifera-
[Figure 1 image, panels a–d. (a) Composition of the arrays by category: B lymphoma, 313 (15%); lung cancer, 276 (13%); liver cancer, 207 (10%); breast cancer, 195 (9%); stimulated PBMCs, 183 (9%); various tumors, 155 (7%); NCI60, 152 (7%); leukemia, 142 (7%); HeLa cell cycle, 114 (5%); prostate cancer, 102 (5%); neuro tumors, 90 (4%); stimulated immune, 53 (3%); gliomas, 47 (2%); fibroblast serum, 18 (1%); fibroblast infection, 18 (1%); viral infection, 16 (1%); fibroblast EWS/FLI, 10 (<1%). (b) Sources of the gene sets: gene expression clusters, 1,300 (45%); Gene Ontology, 1,281 (45%); KEGG pathways, 114 (4%); tissue-specific gene sets, 101 (4%); GenMAPP pathways, 53 (2%). (c) Flow chart: expression data (14,145 genes, 1,975 arrays) and gene sets → find arrays where gene sets change significantly → gene sets vs. arrays → find underlying genes → modules → find arrays where modules change significantly → modules vs. arrays → find significantly enriched array conditions → modules vs. conditions. (d) Worked example with eight genes, seven arrays, three gene sets, two modules and three conditions; the color scale runs from repressed to induced.]
Figure 1 Overview of the analysis procedure. (a) Composition of the 1,975 arrays in our compiled
cancer compendium according to the conditions they represent. PBMCs, peripheral blood mononuclear
cells. (b) Composition of the 2,849 gene sets in our analysis according to the source from which they
were compiled. (c) Flow chart of the different steps in our analysis. (d) Example of the analysis on an
input expression data of seven arrays, eight genes and three gene sets. Circled numbers correspond to
steps in the flow chart. In this example, gene sets 1 and 2 are significantly induced in arrays 2–5 and
thus constitute a gene set cluster, whereas gene set 3 is significantly repressed in arrays 3 and 6 and
thus constitutes its own gene set cluster. The module resulting from the first gene set cluster includes
genes 2, 3, 5, 6 and 7, as these genes contribute to the significant expression of this gene set cluster.
Although gene 4 is a member of both gene sets 1 and 2, it is not part of the module, as it did not
contribute to their significance (gene 4 is repressed in the arrays where these gene sets are significantly
induced). In the final step of the analysis, arrays are annotated with clinical conditions 1–3; for
example, array 1 is annotated with conditions 1 and 2. The set of arrays where module 1 is
significantly induced (arrays 2–5) is enriched for condition 1, and the set where module 2 is
significantly repressed is enriched for condition 3.
[Figure 2 image: matrix of modules (rows) versus clinical annotations (columns). Gray bars give genes per module and arrays per annotation on a log scale (0, 10, 100, 1,000); entry intensity runs from 0 to >0.4. Data set codes: BC, breast cancer; BL, B lymphoma; FE, fibroblast EWS-FLI; FI, fibroblast serum; FS, fibroblast infection; G, gliomas; HC, HeLa cell cycle; L, leukemia; LiC, liver cancer; LuC, lung cancer; NC, NCI60; NT, neuro tumors; PC, prostate cancer; SI, stimulated immune; SP, stimulated PBMCs; VI, viral infection; VT, various tumors.]
Figure 2 The cancer module map: a matrix of modules (rows) versus array clinical conditions (columns), where a red (or green) entry indicates that the arrays in which the corresponding module was significantly induced (or repressed) contained more arrays with the given annotation than would be expected by chance. The intensity of the entries corresponds to the fraction of arrays in the module with the given annotation that were significantly induced (or repressed). White entries indicate that both the induced and repressed arrays were significant for the given annotation. Only significant modules are shown. A subset of significant conditions is shown; redundant conditions were removed for clarity. Only columns (rows) with two or more significant entries are shown. The number of genes in each module and the number of arrays annotated with each condition are shown using gray bars (in log-scale). Each condition annotation is followed by an abbreviated code of the data set in which it was analyzed and by the number of arrays with that annotation. The box (top right) contains details for these abbreviations. Asterisks indicate general annotations. The rows and columns of the matrix were each clustered into distinct clusters (ref. 30), and the resulting clusters are indicated by vertical and horizontal lines. We manually assigned, whenever possible, a concise label to module clusters (right; colored bars) or condition clusters (bottom; colored bars). Related conditions (or modules) are often clustered together in the module map, but many modules are shared across conditions, indicating that tumors are characterized by combinations of a small number of shared and unique modules. CNS, central nervous system; ECM, extracellular matrix; MMPs, matrix metalloproteinases.
[Figure 3 image, panels a–e; color scale 0 to >0.4. Module rows, with module numbers in parentheses: (a) M phase (320), DNA replication (158), nucleotide metabolism (337), cell cycle (57), spindle & kinetochore (315), cell cycle (54); (b) growth suppressors (173), apoptosis (340), cytokines & growth factors (433), growth inhibitors (488), antiapoptosis (537), antiapoptotic evasion (312), programmed cell death (300); (c) cAMP signaling (575), cGMP signaling (65), secreted signaling (92), GPCRs (375), signaling (117), signaling (176), signaling (129), GPCRs (146), RTK signaling (259), signaling (94). Panels d and e show module clusters (rows) against hematologic and immune conditions (d) and CNS tumor conditions (e).]
Figure 3 Combinatorial signatures in the cancer module map. Five submatrices of the full map (Fig. 2) showing rows of numbered modules organized by
conditions that show similarities (a–c) and module clusters arranged by related conditions (d,e). Each column heading is followed by the code (Fig. 2) of the
data set on which the condition was analyzed. The box at the top right of Figure 2 contains details for these abbreviations. (a) Cell cycle modules induced in
HCC, small cell lung cancer and grade-three breast cancer, repressed in several normal tissues, in chronic lymphocytic leukemia (CLL) and acute myeloid
leukemia (AML). (b) Growth regulatory modules are mostly used by hematologic malignancies. In most cases, a particular condition shows either uniform
induction or repression of most growth-modulating modules, both apoptotic and antiapoptotic. (c) Signal transduction modules representing a variety of
pathways are coregulated in various tumors. Most modules are repressed in HCC and ALL. A subset is induced in activated B-like diffuse large B-cell
lymphoma (DLBCL), and another subset is reduced in stage T1 lung adenocarcinoma. White elements indicate modules that are both induced and repressed
in the same condition, either because some module genes were induced and others repressed or because the modules were induced in certain arrays and
repressed in others from the same condition. GPCR: G protein–coupled receptors; RTK: receptor tyrosine kinase. (d) Immune system conditions use similar
modules in distinct ways. Many modules are shared across tumor types, cell types and data sets, including DLBCL, ALL, AML, CLL and follicular lymphoma.
But each condition has a unique module signature. CNS: central nervous system; ECM: extracellular matrix. (e) CNS tumors are characterized by a
combination of CNS–specific genes, immune response modules, ECM and cytoskeletal proteins, and neural signaling modules. Lung carcinoid tumors, of
neurological origin, use similar modules.
tion and reduced cell death. DUSP2 has been implicated in acute
leukemia (ref. 10); the other genes may offer new therapeutic targets.
The steroid catabolism module (Fig. 5) primarily contains steroid hormone enzymes (8 of 13) whose expression is repressed in a subset of HCC and hepatic cell lines (relative to hepatitis-infected liver tissue and HCC; 31 arrays; P < 4 × 10^-8). This may indicate more than a general reduction in metabolic processes. Expression of an additional module (#404), consisting of steroid hormone receptors (6 of 25 module genes) and binding proteins (15 of 25), is repressed in a subset of HCC and hepatic cell lines (relative to hepatitis-infected liver tissue and HCC; 24 arrays; P < 2.5 × 10^-6). This reduction of steroid hormone catabolism in HCC is consistent with the fact that HCC is significantly more prevalent in men and postmenopausal women (ref. 17) and that elevated levels of serum testosterone predict an increased HCC risk. Overall, these results suggest that an imbalance in the generation of steroid hormones and in receiving steroid hormone signals may have a role in hepatitis and HCC.
Other modules provide insight into a variety of tumors. For example, the bone osteoblastic module (Fig. 6) consists of genes associated with proliferation and differentiation of bone-building cells. These genes are induced in 172 arrays, including a subset of breast cancer samples (relative to other breast cancer and human mammary epithelial cells; 37 arrays; P < 5.6 × 10^-14) and a subset of nontumor hepatitis-infected liver (relative to other hepatitis-infected liver tissue
[Figure 4 image. (a) Expression profile of the 16 module genes: MAP3K7IP1, ADORA2B, KCNH2, DUSP8, RGS4, FPR1, EGF, FGF2, DUSP4, DUSP6, GADD45G, RGS3, ADRA2A, C5R1, FGFR1 and DUSP2; gene annotations: growth suppressor, two-component signal transduction system (phosphorelay) and MAPKKK cascade; condition: acute leukemia (Leukemia*). (b) Pathway diagram relating module genes to ERK1 (proliferation) and to p38 and JNK (apoptosis) through direct activation, direct inhibition and indirect activation.]
[Figure 5 image. (a) Expression profile of the 13 module genes: DDT, SULT1E1, UGT2B7, UGT2B4, UGT2B15, SRD5A2, HSD17B4, HSD11B1, UGT2B10, DCT, C21orf127, TYRP1 and TYR; gene annotations: pigmentation, pigment biosynthesis, and androgen and estrogen metabolism; conditions: nontumor liver (LiC), liver tissue (LiC*), cancer & cell line (LiC) and cell line (LiC). (b) Androgen and estrogen metabolism pathway diagram showing steroid hormones, catabolites and module genes.]
Figure 5 Steroid catabolism module (#505), a module that responds
significantly to one specific condition: liver tissue and tumor samples.
(a) Expression profile of genes in the steroid catabolism module. Details of
data presentation are as described for Figure 4. LiC, liver cancer. Asterisks
denote general annotations. (b) Module genes (purple) in the context of the
androgen and estrogen metabolism pathway. The pathway was adapted from
the Kyoto Encyclopedia of Genes and Genomes pathway database (ref. 7), showing
only metabolic steps associated with human enzymes. Enzymes are shown
as rectangles; metabolites as circles. Steroid hormones and their catabolic
end products are highlighted in light green and light blue, respectively. Most
of the module genes are associated with catabolism of androgens and
estrogens (which occurs in the liver).
Figure 4 Growth inhibitory module (#173), a
module that responds significantly to one specific
condition: acute leukemia. (a) Expression profile
of genes in the growth inhibitory module. Shown
are all arrays in which expression of the module
genes changed significantly, and the direction of
change (induction or repression) in each such
array (red or green, respectively). Gray pixels
represent missing values. The arrays
corresponding to acute leukemia are indicated by
brown pixels in the top row, followed by an
abbreviated code of the data set in which they
were analyzed. Asterisks denote general
annotations. The membership of the module
genes in the two gene sets from which the
module was generated is shown (left, purple
pixels). (b) Module genes (purple) in the context
of the MAPK pathways of proliferation and
apoptosis. The pathway was compiled from
known interactions in the literature. All of the
module genes were significantly repressed in
acute leukemia, and most are known to inhibit
cell growth (bold blue border). Only DUSP2 was
previously implicated in acute leukemia; other
module genes are new potential targets.
1094 VOLUME 36 | NUMBER 10 | OCTOBER 2004 NATURE GENETICS
LETTERS
© 2004 Nature Publishing Group http://www.nature.com/naturegenetics
[Figure 6 graphic. Labels recoverable from the extracted figure: genes BMP4, SOX9, DLX5, BMP7, IL6, ETS2, TIE, BMP1, FGFR1, HOXD13, FRZB, DLX3, MGP, PRELP, PTHR1, TNA, IGF2, LUM, AEBP1, FBN1, COL12A1, SPARC, INHBA, TNFRSF11B, CART1, SPP2, GHR, IGF1, COMP, COL11A1, POSTN, COL1A1; annotation rows: Breast cancer (BC*); Invasive liver tumor (LiC); Hepatocellular carcinoma (LiC); Nontumor liver tissue (LiC); Lung cancer (LC*); Non small cell lung cancer (LC*); Acute lymphocytic leukemia (L); gene sets: 1, Bone remodeling; 2, Skeletal development; panel c labels: Module (32); Bone/cartilage function (31); Specific tissue, including bone/cartilage (14); Unique to bone/cartilage (6); Angio associated (13); Angiogenic (8); panel d labels: Module (32); Epithelial cancer (19); Tumor stroma (12); remaining panel d labels are truncated in the extraction.]
Figure 6 Bone osteoblastic module (#234), a module that responds significantly to multiple conditions, including breast cancer, lung cancer, HCC and ALL.
(a) Expression profile of genes in the bone osteoblastic module. Details of data presentation are as described for Figure 4. LiC, liver cancer; BC, breast
cancer; LC, lung cancer; L, leukemia. Asterisks denote general annotations. (b) Module genes in the context of the molecular pathways underlying bone
remodeling. The pathways are shown for the differentiation and matrix remodeling events (light blue arrows) of the three main cell types in bone and
cartilage: chondrocytes (top), osteoblasts (middle) and osteoclasts (bottom). The coordination and balance among the three processes results in either bone
building or resorption. The module genes (purple) are primarily associated with proliferation and differentiation of chondrocytes and osteoblasts. Even those
module genes that are related to osteoclast induction encode proteins that are typically secreted by osteoblasts. The genes include both intracellular or
membrane proteins (thin black border) and extracellular secreted ones (bold blue border), thus forming a coherent and self-sufficient autocrine module.
(c) The expression and function of 32 module genes in normal tissues based on previous immunohistochemical and in situ hybridization experiments. Almost
all (31 of 32) of the genes function in bone or cartilage (blue), and 14 are expressed primarily (pink) or uniquely (purple) in bone or cartilage. In contrast, only
8 of the genes are angiogenic (green), and another 5 genes are partly associated with blood vessels or antiangiogenic function (yellow). (d) The expression of
23 of the 32 module genes in epithelial tumors and their surrounding stroma based on previous immunohistochemical and in situ hybridization experiments.
Whereas 19 of the genes are associated with breast cancer (green) or other epithelial tumors (orange), only 4 are expressed solely in stroma (blue).
and HCC; 47 arrays; P < 10^-10). Expression of these genes is repressed in 361 arrays, including subsets of HCC (relative to other hepatitis-infected liver tissue and HCC; 48 arrays; P < 2 × 10^-9), a subset of ALL1 acute lymphoblastic leukemia (relative to other acute lymphoblastic leukemia and acute myeloid leukemia; 10 arrays; P < 9 × 10^-6) and a subset of lung cancer samples (relative to other lung cancers; 120 arrays; P < 10^-33).
Bone-related clinical conditions have been associated with all of these malignancies. In particular, bone metastasis is a key phenomenon in breast cancer, and some breast metastases are known to be osteoblastic (ref. 18). Not all primary breast tumors activate the osteoblastic module, consistent with the fact that many breast metastases to bone are not osteoblastic (ref. 18) and probably use different mechanisms (ref. 19). Bone metastasis is also common in lung cancer (ref. 18) and was recently implicated in HCC (ref. 20). Finally, ALL has been associated with reduced bone-mass density in a subpopulation of individuals (ref. 21). The bone osteoblastic module reflects these diverse phenomena and may partially explain them. Although osteoblastic metastasis is also common in prostate cancer (ref. 18), the module was not substantially expressed in the prostate cancer samples in our compendium. As several genes in the module that are known to be transcriptionally induced in prostate cancer (MGP, IGF2, IL6 and GHR) are not induced in this data set, we suspect that these arrays are uninformative about osteoblastic metastasis.
The induction of the bone osteoblastic module in breast cancer is particularly interesting. Previous studies suggested that breast tumors preferentially metastasize to bone owing to a cycle of positive feedback through reciprocal secretion of growth factors between the tumor and bone cells (ref. 18). It was previously unclear, however, whether the molecular mechanisms necessary to initiate this cycle are present in the primary tumor (ref. 19). We found that both the secreted growth factors and the intracellular proteins required to receive their signal were induced in primary breast cancer tumors, suggesting that the primary tumor uses the osteoblastic mechanism for its own autocrine proliferation. One might suspect that the module is induced in the surrounding stroma rather than in the tumor itself. Previous immunohistochemical and in situ hybridization experiments (Fig. 6d) indicate that 19 of the 32 module genes are expressed in epithelial cells in tumors and some also in metastasis of breast cancer to bone (e.g., IGF2 (ref. 18), BMP4 (ref. 18), IL6 (ref. 18), FRZB (ref. 22) and activin A (ref. 23)). Only 4 of 32 genes, all of which encode secreted proteins, are expressed solely in the stroma, indicative of possible paracrine signaling between tumor and breast stroma. This process may be subsequently substituted by signaling between the metastasized tumor and bone stroma. Thus, this borrowed module may both be innately useful to the primary tumor and provide a mechanism for effective osteoblastic bone metastasis. This hypothesis is consistent with recent findings on the metastatic potential of primary tumors (refs. 24,25) and identifies several new targets for further research.
The downregulation of the bone osteoblastic module in HCC, ALL and lung cancer is also notable. There is no clear explanation for this downregulation in lung and HCC tumors, but repression of this growth-inducing module in the ALL bone marrow samples provides a potential explanation for the reduced bone mass density in ALL. Dlx3 and Dlx5, two ALL-1 targets that are crucial to osteoblast proliferation and differentiation (ref. 26), are part of the module.
In conclusion, our method provides a global view of cancer and shows that tumors can be characterized by combinations of a relatively small number of modules. Several other methods have been proposed for global analysis of microarray data (refs. 27–29). Notably, our work, which is the first to apply such global analysis to human data, uses existing biological knowledge directly, in the form of gene sets and clinical annotations. Furthermore, unlike a recent meta-analysis (ref. 4) of a large compendium of cancer expression profiles, our approach focuses on identifying modules of genes and is independent of predefined queries (Supplementary Note online).
The results of our analysis are publicly available on a data-mining website; the automated tool that we used to generate the analysis is also available. This tool allows researchers to construct a module map from any collection of gene sets and expression data in any organism and to study new data in the context of a large compendium. Although the quality of current annotations and normalization procedures may limit the map's accuracy, our examples indicate that many phenomena are sufficiently robust to be detected using our approach. Thus, our approach provides a valuable tool for understanding the molecular basis of cancer, both for specific tumors and for tumorigenic processes in general.
METHODS
DNA microarray data set. We downloaded data available for 1,975 human
DNA microarrays from the Stanford Microarray Database and the Center for
Genomic Research at the Whitehead Institute (Supplementary Table 1 online).
We normalized the expression of each gene g in every data set separately. For
data sets generated using Affymetrix chips, we first determined the log (base 2)
of the expression value of gene g in each array (truncating to 10 expression
values that are below 10). For data sets generated using spotted cDNA chips, we
used the log-ratio (base 2) between the measured sample and the control
sample. In both types of data sets, we then normalized the (log-space)
expression value of gene g in each array relative to its average expression in
all the arrays in the same data set, by subtracting its average in that data set
from each of its expression measurements. After this normalization, the mean
value of a gene, in each data set, is zero.
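The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `normalize_dataset`, the dictionary layout, the `platform` strings and the treatment of missing values (skipped when averaging) are all assumptions made for the example.

```python
import math

def normalize_dataset(values, platform):
    """Log-transform and mean-center one data set.

    values:   dict mapping gene -> list of raw measurements, one per array
              (None marks a missing value; this handling is an assumption).
    platform: "affymetrix" for absolute intensities, or
              "cdna" for sample/control ratios from spotted arrays.
    """
    normalized = {}
    for gene, row in values.items():
        logs = []
        for v in row:
            if v is None:
                logs.append(None)
            elif platform == "affymetrix":
                # Values below 10 are truncated to 10 before taking log2.
                logs.append(math.log2(max(v, 10.0)))
            else:
                # Spotted cDNA: log2 of the sample/control ratio.
                logs.append(math.log2(v))
        present = [x for x in logs if x is not None]
        mean = sum(present) / len(present) if present else 0.0
        # Subtract the gene's average within this data set, so its mean is zero.
        normalized[gene] = [None if x is None else x - mean for x in logs]
    return normalized
```

Because each data set is centered separately, a value of +1 after normalization means a twofold increase relative to that gene's average in the same data set, which is the scale on which the twofold induction and repression thresholds are applied.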
Gene sets. We compiled 2,849 gene sets, obtained as follows: 1,281 from the Gene Ontology (ref. 8) hierarchy (downloaded in July 2003, version 1.320); 114 from the Kyoto Encyclopedia of Genes and Genomes (ref. 7; downloaded in May 2003); 53 from the Gene MicroArray Pathway Profiler (ref. 9; downloaded in July 2003); 101 tissue-specific expressed gene sets (ref. 6; one gene set was defined for each array by taking all genes above absolute expression of 400; we removed genes whose absolute expression was >400 in >50 of the 101 arrays); and 1,300 gene sets obtained by clustering each of the data sets of Supplementary Table 1 online using a published clustering method (the P-cluster algorithm; ref. 27) and taking clusters of coexpressed genes.
Identifying arrays in which the expression of gene sets changes significantly.
To identify the arrays in which each gene set was significantly induced (or
repressed), we defined the induced (or repressed) genes in each array to be
those genes whose change in expression was greater (or less) than twofold. For
each gene set and each array, we calculated the fraction of genes from that gene
set that were induced (or repressed) in that array and used the hypergeometric
distribution to calculate a P value for this fraction (compared with the null
hypothesis of choosing the same number of genes at random). We corrected for
multiple tests using a false discovery rate correction at the 5% level.
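As a rough illustration of this test, the sketch below computes a hypergeometric tail probability for one gene set in one array and then applies a false discovery rate filter. Several details are assumptions made for the example: the function names, the use of a one-sided upper tail, and the choice of the Benjamini-Hochberg procedure (the text specifies only a false discovery rate correction at 5%). Expression values are assumed to be mean-centered log2 values, so a twofold induction corresponds to a value above 1.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def induction_pvalue(log2_change, gene_set, threshold=1.0):
    """P value that a gene set holds as many induced genes as observed by chance.

    log2_change: dict gene -> normalized log2 value in one array (None = missing).
    gene_set:    collection of gene names.
    """
    genes = [g for g, x in log2_change.items() if x is not None]
    induced = {g for g in genes if log2_change[g] > threshold}
    members = [g for g in genes if g in gene_set]
    hits = sum(1 for g in members if g in induced)
    return hypergeom_tail(len(genes), len(induced), len(members), hits)

def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans marking tests that pass Benjamini-Hochberg control at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff_rank = rank
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[i] = True
    return passed
```

The repressed case follows by symmetry, counting genes whose value falls below -1 instead of above +1.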
Statistical significance of array–gene set pairs. We evaluated the number of
array–gene set pairs in which the gene set was significantly induced (or
repressed) in the array (as described above). Overall, we found 299,233 such
pairs; only 14,962 would be expected by chance (P < 0.05), suggesting that the
selected gene sets are informative for the cancer compendium (Supplementary
Fig. 2 online).
Automatic identification of gene set clusters. We carried out (bottom-up) hierarchical clustering of the gene sets in the matrix of all significant array–gene set pairs (ref. 30). This resulted in a tree in which each leaf node, corresponding to some gene set G, is associated with a vector (indexed by arrays) that is zero everywhere except for entries that correspond to arrays in which set G was significantly induced (or repressed), in which case the entry contains the fraction (or negative fraction) of genes from set G that are induced (or repressed) in an array a. Each internal node is associated with a vector representing the average of all of the gene set vectors at its descendant leaves. We annotated each interior node with the Pearson correlation between the vectors associated with its two children in the hierarchy. We defined as a cluster each interior node whose Pearson correlation differed by more than 0.05 from the Pearson correlation of its parent node in the hierarchy, resulting in 577 clusters of gene sets. Such interior nodes represent points in the tree with a large gap between the similarities in expression of the node's children and the similarity in expression of the node and its sibling.
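The cluster-selection rule can be illustrated on a tree that has already been built. The sketch below is not the authors' code: the `Node` class and `select_clusters` function are hypothetical names, the difference between a node's correlation and its parent's is taken as an absolute difference, and the root (which has no parent) is never selected. Both of the latter are assumptions, since the text does not spell them out.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

class Node:
    """A leaf holds one gene set's vector; an interior node joins two children."""
    def __init__(self, name, vector=None, left=None, right=None):
        self.name, self.left, self.right = name, left, right
        if left is None:
            self.vector, self.n_leaves, self.corr = vector, 1, None
        else:
            self.n_leaves = left.n_leaves + right.n_leaves
            # Average over all descendant leaves, via a leaf-count-weighted mean.
            self.vector = [
                (a * left.n_leaves + b * right.n_leaves) / self.n_leaves
                for a, b in zip(left.vector, right.vector)
            ]
            # Annotation: correlation between the two children's vectors.
            self.corr = pearson(left.vector, right.vector)

def select_clusters(root, gap=0.05):
    """Names of interior nodes whose correlation differs from the parent's by > gap."""
    clusters, stack = [], [(root, None)]
    while stack:
        node, parent_corr = stack.pop()
        if node.left is None:
            continue
        if parent_corr is not None and abs(node.corr - parent_corr) > gap:
            clusters.append(node.name)
        stack.append((node.left, node.corr))
        stack.append((node.right, node.corr))
    return clusters
```

For instance, joining two nearly identical leaves and then merging that pair with a dissimilar third leaf yields one selected node, the tight pair, because its internal correlation is far higher than the correlation recorded at its parent.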
Testing consistency of a gene with expression of a gene set. Given a gene set G and a gene g, we tested whether the expression of g was consistent with the significant changes in the expression of G. We first identified the subsets of arrays I and R in which G was significantly induced and repressed, respectively. We then measured the extent to which the expression of g changed by more (or less) than twofold in arrays in I (or R) with the score

\[
\mathrm{Score}(g) = \sum_{\{a \in I \,\mid\, g \text{ is induced in } a\}} -\log(p_a) \;+\; \sum_{\{a \in R \,\mid\, g \text{ is repressed in } a\}} -\log(p_a),
\]

where $p_a$ is the fraction of genes in array a that are induced (or repressed) by more than twofold for arrays in I (or in R). This score assigns more weight to induction in arrays where there are fewer induced genes (and respectively for repression).

We evaluated the significance of the score for gene g with respect to the null hypothesis where the genes in each array are randomly permuted. Under this null hypothesis, the score for gene g is the sum of independent binary random variables, one for each array in I and R. The random variable corresponding to array a attains the value $-\log(p_a)$ with probability $p_a$ and the value 0 with probability $1 - p_a$. Because the score for gene g in this model is a sum of independent random variables, its mean $\mu$ and variance $\sigma^2$ are the sums of the means and variances, respectively, of these variables and can be computed analytically:

\[
\mu = \sum_{a \in I \cup R} -p_a \log p_a, \qquad
\sigma^2 = \sum_{a \in I \cup R} p_a (1 - p_a) \log^2 p_a .
\]

Moreover, by the central limit theorem, the distribution of the score for gene g under the null hypothesis can be closely approximated by a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We used standard methods for computing the tail probability of a Gaussian distribution to compute the probability of attaining a score as large as the observed score under the null hypothesis.
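A short numerical sketch may help make the score and its Gaussian approximation concrete. It assumes that each array contributes $-\log(p_a)$ when the gene changes in the expected direction, uses natural logarithms (the base cancels in the standardized score), and requires $0 < p_a < 1$ for every array; the function name `consistency_pvalue` and its argument layout are hypothetical.

```python
from math import log, sqrt, erfc

def consistency_pvalue(p, hit):
    """Score a gene against a gene set's significant arrays.

    p:   list of p_a, one per array in I or R (the fraction of genes induced
         in that array if it is in I, or repressed if it is in R).
    hit: list of booleans, True where the gene is induced (array in I)
         or repressed (array in R) by more than twofold.
    Returns (score, mean, variance, upper-tail P value).
    """
    # Observed score: rare events (small p_a) carry more weight.
    score = sum(-log(pa) for pa, h in zip(p, hit) if h)
    # Null model: each array contributes -log(p_a) with probability p_a, else 0.
    mu = sum(-pa * log(pa) for pa in p)
    var = sum(pa * (1.0 - pa) * log(pa) ** 2 for pa in p)
    # Gaussian approximation: P(Z >= z) for a standard normal Z.
    z = (score - mu) / sqrt(var)
    pval = 0.5 * erfc(z / sqrt(2.0))
    return score, mu, var, pval
```

With 20 arrays in which 10% of genes change, a gene that follows the gene set in half of them scores far above the null mean and receives a very small P value, whereas a gene that follows it in none of them scores zero and is not significant.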
Deriving modules from clusters of gene sets. For each cluster of gene sets, we
defined G to be the union of the gene sets in the cluster. We then tested each
gene in G for consistency (as described above). The resulting module consists of
genes whose expression is significantly consistent with the expression of the
gene set (after false discovery rate correction for multiple hypotheses using 5%
false rate). Leave-one-out cross-validation analysis (Supplementary Note and
Supplementary Fig. 1 online) showed that 456 of the 577 gene-set clusters were
significant at P < 0.01. All further analysis was carried out only for the 456
modules derived from these 456 gene set clusters.
Enrichment of clinical annotations. To characterize conditions as a combina-
tion of activated and deactivated modules, we associated each array with the
annotations it represents, from a total of 263 clinical annotations that we
compiled based on published studies (see our project website for the complete
set of clinical annotations). We distinguished between 185 specific annotations
(present in <70% of the arrays in a given category; Fig. 1a and project
website) and 78 general annotations (present in 70% or more of the arrays in a
category). For example, ‘Stage T2’ is a specific annotation in the ‘lung cancer’
category (12.6% of samples in this category), whereas ‘lung cancer’ is a general
annotation (86% of the samples in the ‘lung cancer’ category). For each
module and each annotation, we calculated the fraction of arrays associated
with that annotation of the total number of arrays in which the module is
significantly induced (or repressed) and used the hypergeometric distribution
to calculate a P value for this fraction. For specific annotations, we only
considered arrays in the same category when computing the P value. For
general annotations, we considered all other arrays in the compendium as
background (i.e., the other arrays were marked as not having the general
annotation). In both cases, all annotations were strictly local (e.g., the lung
cancer annotation in the lung cancer category is distinct from the lung cancer
annotation in the ‘various tumors’ category and is reported separately). We
carried out a false discovery rate correction for multiple hypotheses and took
P < 0.05 to be significant in Figure 2.
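The distinction between specific and general annotations mainly changes which arrays serve as the background for the hypergeometric test. The sketch below illustrates that choice under stated assumptions: the data layout (`arrays` as a dictionary of category and annotation set), the function names and the one-sided upper tail are all hypothetical, and the multiple-testing correction is omitted.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def annotation_kind(n_with_annotation, n_in_category, cutoff=0.70):
    """'general' if the annotation covers at least 70% of the category, else 'specific'."""
    return "general" if n_with_annotation / n_in_category >= cutoff else "specific"

def annotation_pvalue(arrays, module_arrays, annotation, category):
    """Enrichment of one (category-local) annotation among a module's significant arrays.

    arrays:        dict array_id -> (category, set of annotations)
    module_arrays: set of array_ids in which the module is significantly induced
                   (or repressed)
    """
    in_cat = {a for a, (c, _) in arrays.items() if c == category}
    # Annotations are local: only arrays inside the category count as annotated.
    tagged = {a for a in in_cat if annotation in arrays[a][1]}
    kind = annotation_kind(len(tagged), len(in_cat))
    # Specific: compare within the category. General: whole compendium as background.
    background = in_cat if kind == "specific" else set(arrays)
    active = module_arrays & background
    hits = len(active & tagged)
    return kind, hypergeom_tail(len(background), len(tagged), len(active), hits)
```

Under this rule the example in the text behaves as expected: an annotation covering 12.6% of a category is classified as specific, and one covering 86% as general.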
GeneXPress. We carried out all analysis and visualizations in GeneXPress. This
tool can identify the arrays in which gene sets are significantly expressed, and
the clinical annotations enriched in these significant arrays, and can be used for
any input expression data and gene sets in any organism. GeneXPress is freely
available for academic use.
URLs. More detailed results, including the expression compendium, clinical annotations that we compiled and all the significant gene set–array pairs, viewable in GeneXPress, can be found on our project website (http://dags.stanford.edu/cancer). The website also contains detailed views of all 456 modules in the format of Figures 4–6, which can be searched and browsed in various ways. GeneXPress is freely available for academic use at http://GeneXPress.stanford.edu/. All expression data used are available from the Stanford Microarray Database (http://genome-www5.stanford.edu/Microarray/SMD/) and the Center for Genomic Research at the Whitehead Institute (http://www-genome.wi.mit.edu/cgi-bin/cancer/datasets.cgi).
ACKNOWLEDGMENTS
We thank J. Effrat, T. Fojo, Y. Friedman, A. Kaushal, W. Lu, T. Pham, M. Tong,
and R. Yelensky for technical help with software and visualization and I. Ben-Porath,
Y. Dor, L. Garwin, N. Kaminski, D. Peer, O. Rando and T. Raveh for
comments on previous versions of this manuscript. E.S., N.F. and D.K. were
supported by a National Science Foundation grant under the Information
Technology Research program. E.S. was also supported by a Stanford Graduate
Fellowship. N.F. was also supported by an Alon Fellowship, by the Harry & Abe
Sherman Senior Lectureship in Computer Science and by the United States-Israel
Bi-National Science Foundation grant. N.F. and A.R. were supported by a Center
of Excellence Grant from the National Institute of General Medical Sciences.
A.R. was also supported by the Bauer Center for Genomics Research.
COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.
Received 16 March; accepted 25 August 2004
Published online at http://www.nature.com/naturegenetics/
1. Ramaswamy, S., Ross, K.N., Lander, E.S. & Golub, T.R. A molecular signature of
metastasis in primary solid tumors. Nat. Genet. 33, 49–54 (2003).
2. Ramaswamy, S. et al. Multiclass cancer diagnosis using tumor gene expression
signatures. Proc. Natl. Acad. Sci. USA 98, 15149–15154 (2001).
3. Lamb, J. et al. A mechanism of cyclin D1 action encoded in the patterns of gene
expression in human cancer. Cell 114, 323–334 (2003).
4. Rhodes, D.R. et al. Large-scale meta-analysis of cancer microarray data identifies
common transcriptional profiles of neoplastic transformation and progression.
Proc. Natl. Acad. Sci. USA 101, 9309–9314 (2004).
5. Mootha, V.K. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation
are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
6. Su, A.I. et al. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl.
Acad. Sci. USA 99, 4465–4470 (2002).
7. Kanehisa, M., Goto, S., Kawashima, S. & Nakaya, A. The KEGG databases at
GenomeNet. Nucleic Acids Res. 30, 42–46 (2002).
8. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene
Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
9. Dahlquist, K.D., Salomonis, N., Vranizan, K., Lawlor, S.C. & Conklin, B.R. GenMAPP,
a new tool for viewing and analyzing microarray data on biological pathways. Nat.
Genet. 31, 19–20 (2002).
10. Kim, S.C. et al. Constitutive activation of extracellular signal-regulated kinase in
human acute leukemias: combined role of activation of MEK, hyperexpression of
extracellular signal-regulated kinase, and downregulation of a phosphatase, PAC1.
Blood 93, 3893–3899 (1999).
11. Chu, Y., Solski, P.A., Khosravi-Far, R., Der, C.J. & Kelly, K. The mitogen-activated
protein kinase phosphatases PAC1, MKP-1, and MKP-2 have unique substrate
specificities and reduced activity in vivo toward the ERK2 sevenmaker mutation.
J. Biol. Chem. 271, 6497–6501 (1996).
12. Furukawa, T., Sunamura, M., Motoi, F., Matsuno, S. & Horii, A. Potential tumor
suppressive pathway involving DUSP6/MKP-3 in pancreatic cancer. Am. J. Pathol.
162, 1807–1815 (2003).
13. Leone, A.M., Errico, M., Lin, S.L. & Cowen, D.S. Activation of extracellular signal-
regulated kinase (ERK) and Akt by human serotonin 5-HT(1B) receptors in transfected
BE(2)-C neuroblastoma cells is inhibited by RGS4. J. Neurochem. 75, 934–938
(2000).
14. Shi, C.S. et al. Regulator of G-protein signaling 3 (RGS3) inhibits Gbeta1gamma 2-
induced inositol phosphate production, mitogen-activated protein kinase activation,
and Akt activation. J. Biol. Chem. 276, 24293–24300 (2001).
15. Ge, B. et al. TAB1beta (transforming growth factor-beta-activated protein kinase 1-
binding protein 1beta), a novel splicing variant of TAB1 that interacts with p38alpha
but not TAK1. J. Biol. Chem. 278, 2286–2293 (2003).
16. Mita, H., Tsutsui, J., Takekawa, M., Witten, E.A. & Saito, H. Regulation of MTK1/
MEKK4 kinase activity by its N-terminal autoinhibitory domain and GADD45 binding.
Mol. Cell. Biol. 22, 4544–4555 (2002).
17. Granata, O.M. et al. Altered androgen metabolism eventually leads hepatocellular
carcinoma to an impaired hormone responsiveness. Mol. Cell. Endocrinol. 193, 51–58
(2002).
18. Mundy, G.R. Metastasis to bone: causes, consequences and therapeutic opportunities.
Nat. Rev. Cancer 2, 584–593 (2002).
19. Kang, Y. et al. A multigenic program mediating breast cancer metastasis to bone.
Cancer Cell 3, 537–549 (2003).
20. Iguchi, H. et al. A possible role of VEGF in osteolytic bone metastasis of hepatocellular
carcinoma. J. Exp. Clin. Cancer Res. 21, 309–313 (2002).
21. Boot, A.M., van den Heuvel-Eibrink, M.M., Hahlen, K., Krenning, E.P. & de Muinck
Keizer-Schrama, S.M. Bone mineral density in children with acute lymphoblastic
leukaemia. Eur. J. Cancer 35, 1693–1697 (1999).
22. Ugolini, F. et al. Differential expression assay of chromosome arm 8p genes
identifies Frizzled-related (FRP1/FRZB) and Fibroblast Growth Factor Receptor
1 (FGFR1) as candidate breast cancer genes. Oncogene 18, 1903–1910
(1999).
23. Reinholz, M.M., Iturria, S.J., Ingle, J.N. & Roche, P.C. Differential gene expression of
TGF-beta family members and osteopontin in breast tumor tissue: analysis by real-time
quantitative PCR. Breast Cancer Res. Treat. 74, 255–269 (2002).
24. Bernards, R. & Weinberg, R.A. A progression puzzle. Nature 418, 823 (2002).
25. Hynes, R.O. Metastatic potential: generic predisposition of the primary tumor or rare,
metastatic variants-or both? Cell 113, 821–823 (2003).
26. Ferrari, N. et al. DLX genes as targets of ALL-1: DLX 2,3,4 down-regulation in t(4;11)
acute lymphoblastic leukemias. J. Leukoc. Biol. 74, 302–305 (2003).
27. Segal, E. et al. Module networks: identifying regulatory modules and their condition-
specific regulators from gene expression data. Nat. Genet. 34, 166–176 (2003).
28. Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network.
Nat. Genet. 31, 370–377 (2002).
29. Tanay, A., Sharan, R., Kupiec, M. & Shamir, R. Revealing modularity and organization
in the yeast molecular network by integrated analysis of highly heterogeneous genome-
wide data. Proc. Natl. Acad. Sci. USA 101, 2981–2986 (2004).
30. Eisen, M.B., Spellman, P.T., Brown, P.O. & Botstein, D. Cluster analysis and display of
genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868
(1998).
... Despite the availability of plethora of molecular profiles of tumor samples, there is still a lack of suitable methodologies to extract important information from the diverse tumor datasets for a mechanistic understanding of tumorigenesis. Several top-down bioinformatics methods utilized high-throughput gene expression data to study dysregulation of gene expression in cancer 55,56 and link the upstream signaling pathway to downstream transcription program 57 . Some other methods infer network of transcription factor and target genes by using multi-omics data [58][59][60][61] . ...
Article
Full-text available
Acute myeloid leukemia (AML) is characterized by uncontrolled proliferation of poorly differentiated myeloid cells, with a heterogenous mutational landscape. Mutations in IDH1 and IDH2 are found in 20% of the AML cases. Although much effort has been made to identify genes associated with leukemogenesis, the regulatory mechanism of AML state transition is still not fully understood. To alleviate this issue, here we develop a new computational approach that integrates genomic data from diverse sources, including gene expression and ATAC-seq datasets, curated gene regulatory interaction databases, and mathematical modeling to establish models of context-specific core gene regulatory networks (GRNs) for a mechanistic understanding of tumorigenesis of AML with IDH mutations. The approach adopts a new optimization procedure to identify the top network according to its accuracy in capturing gene expression states and its flexibility to allow sufficient control of state transitions. From GRN modeling, we identify key regulators associated with the function of IDH mutations, such as DNA methyltransferase DNMT1, and network destabilizers, such as E2F1. The constructed core regulatory network and outcomes of in-silico network perturbations are supported by survival data from AML patients. We expect that the combined bioinformatics and systems-biology modeling approach will be generally applicable to elucidate the gene regulation of disease progression.
... For example, higher expression of AEBP1 was interrelated to larger tumor size, lower levels of histological differentiation, more metastasis in the lymph node, and higher cancer stage in patients with colon adenocarcinoma and glioblastoma [19,24,25]. Similarly, overexpression of AEBP1 may be a distinguished factor in the progression of malignant BCa via bone differentiation and matrix remodeling [26][27][28]. In the present study, we discussed EMT, a hallmark of cancer, is related to the aggressive characteristics of BCa [29]. ...
Article
Background: The aberrant expression of adipocyte enhancer binding protein 1 (AEBP1) has been observed in many cancers and it seems to be involved in the tumorigenesis, progression, and metastasis in numerous tumor types. However, the contribution of AEBP1 in breast cancer (BCa) remains inexplicable. Methods: Information related to the diagnostic significance and expression of AEBP1 in BCa was obtained from the public dataset Kaplan-Meier Plotter (http://kmplot.com/analysis/) and the dataset UALCAN (https://ualcan.path.uab.edu/index.html). The MTT (methyl thiazolyl tetrazolium) assay, colony formation assay, Transwell® assay, and FACS (fluorescence-activated cell sorting) assay were used to detect the proliferation, invasive and apoptotic ability of cells before and after treatment. In addition, we constructed an AEBP1 overexpression vector and silenced AEBP1, combined with Real-Time Quantitative Reverse Transcription PCR (qRT-PCR), western blot, immunohistochemistry and TUNEL (terminal deoxynucleotidyl transferase-mediated dUTP-biotin nick end labeling) assay to investigate the prognostic significance, biological functions and potential mechanisms of AEBP1 in BCa. Results: Higher expression of AEBP1 mRNA (message RNA) was observed in BCa patients with later-stage, who obtained poorer overall survival. Meanwhile, compared with adjacent noncancerous tissues, AEBP1 protein expression was dramatically upregulated in the BCa ones. Furthermore, overexpressed AEBP1 enhanced cell proliferation, migration, invasion, and blocked cell apoptosis in BCa cells. 
Moreover, the study confirmed that AEBP1 upregulated the expression of MMP (matrix metalloproteinase)-2 and -9, vimentin, and N-cadherin (neural cadherin), and the phosphorylation of ERK (extracellular signal-regulated kinase), Smad2/3 (abbreviated from Sma in nematode and Mad in Drosophila) and AKT (V-akt murine thymoma viral oncogene homolog), while downregulating the expression of E-cadherin (epithelial cadherin) and PTEN (phosphatase and tensin homolog deleted on chromosome 10). Enforced expression of AEBP1 inhibited cell apoptosis by effectively blocking the cleavage of caspase 9 and p53 (protein 53) and promoting the expression of the anti-apoptotic protein Bcl-2 (B-cell lymphoma-2). Finally, AEBP1 accelerated subcutaneously transplanted tumor growth in nude mice by increasing the expression of the cell proliferation biomarker Ki67 and the phosphorylation of AKT, and by blocking apoptosis in vivo. Conclusions: In summary, these data suggest an important role for AEBP1 in BCa progression; AEBP1 could serve as a potential prognostic biomarker and inform a novel therapeutic strategy.
... WormClust web application enables gene-by-gene query to identify coexpression with metabolic (sub)-pathways A major premise of this study is the assumption that variance in mRNA levels results, at least in part, from transcriptional regulation, which in turn suggests that genes are coexpressed because they are coregulated. In reverse engineering of gene regulatory networks, coexpression of TFs with their target genes has been used to define causal relationships (Segal et al, 2004;MacNeil & Walhout, 2011). To make our data available to the community as well as to enable the easy identification of TFs and other C. elegans genes that are coexpressed with metabolic (sub)-pathways, we developed a web application named WormClust, which is available on the WormFlux website (Yilmaz & Walhout, 2016). ...
Article
Metabolism is controlled to ensure organismal development and homeostasis. Several mechanisms regulate metabolism, including allosteric control and transcriptional regulation of metabolic enzymes and transporters. So far, metabolism regulation has mostly been described for individual genes and pathways, and the extent of transcriptional regulation of the entire metabolic network remains largely unknown. Here, we find that three-quarters of all metabolic genes are transcriptionally regulated in the nematode Caenorhabditis elegans. We find that many annotated metabolic pathways are coexpressed, and we use gene expression data and the iCEL1314 metabolic network model to define coregulated subpathways in an unbiased manner. Using a large gene expression compendium, we determine the conditions where subpathways exhibit strong coexpression. Finally, we develop "WormClust," a web application that enables a gene-by-gene query of genes to view their association with metabolic (sub)-pathways. Overall, this study sheds light on the ubiquity of transcriptional regulation of metabolism and provides a blueprint for similar studies in other organisms, including humans.
... Multiple clusters of various expressed genes have been identified as genetic signatures in HCC studies that hold promising results as diagnostic/prognostic markers and as therapeutic targets for HCC. The frequently known proliferation clusters include [56,57] cyclins of A/B type and cell cycle division proteins that regulate the cell cycle, Hetero-hexamer DNA helicase MCM3-7, PCNA, and DNA TOP2A. In clinical practice, the genetic signature of regulated c-MYC has been associated with poorer OS and poor cell differentiation. ...
Article
Hepatocellular carcinoma is a common primary liver malignancy and a leading cause of cancer-related mortality globally. Recent advances in the genetics of hepatocellular carcinoma have drawn researchers' attention to identifying biomarkers that can serve as diagnostic and prognostic tools for early detection, and to developing targeted therapeutic agents for advanced stages of the disease; however, the antitumor efficacy of these agents remains limited and under investigation. Our review therefore summarizes genetic studies of liver cancer, focusing on the somatic mutations, copy number variations, and epigenetic modifications that represent early alterations and oncodrivers in hepatocarcinogenesis. It also covers the identification of genetic signatures and proteomic targets through hepatocellular carcinoma-related genome-wide screening, to show the ongoing clinical application of such analysis in facilitating diagnosis, prognosis and management of patients with hepatocellular carcinoma for a better outcome.
... There is a rich literature on estimating model (1) based on gene expression data (Segal et al., 2004). Due to the high-dimensional (p_n >> n) nature of the data, they introduced a regularized linear regression procedure based on the Lasso (Tibshirani, 1996). ...
Article
This study aims to introduce four modified linear estimators for right-censored high-dimensional data. The data of interest pose two important problems: censorship and high dimensionality. This paper is distinguished from other studies in the literature in that it handles these two problems simultaneously. The main contribution of the paper is merging the weighted ridge method with imputation techniques to obtain more efficient estimators than the alternatives. To solve the censorship problem, four imputation techniques are considered, based on the machine learning algorithms kNN, sliding windows, regression and support vector machines. The high-dimensionality problem is handled by the weighted ridge approach, which provides an estimator with less risk than its alternatives because it detects the covariates with a weak contribution via the post-selection procedure. To show the empirical performance of the introduced estimators, a simulation study is carried out and comparative results are presented. Results show that the WR estimators based on kNN and regression imputation perform satisfactorily in estimating the high-dimensional right-censored model.
Preprint
Metastasis is the principal cause of cancer death, yet we lack an understanding of metastatic cell states, their relationship to primary tumor states, and the mechanisms by which they transition. In a cohort of biospecimen trios from same-patient normal colon, primary and metastatic colorectal cancer, we show that while primary tumors largely adopt Lgr5+ intestinal stem-like states, metastases display progressive plasticity. Loss of intestinal cell states is accompanied by reprogramming into a highly conserved fetal progenitor state, followed by non-canonical differentiation into divergent squamous and neuroendocrine-like states, which is exacerbated by chemotherapy and associated with poor patient survival. Using matched patient-derived organoids, we demonstrate that metastatic cancer cells exhibit greater cell-autonomous multilineage differentiation potential in response to microenvironment cues than their intestinal lineage-restricted primary tumor counterparts. We identify PROX1 as a stabilizer of intestinal lineage in the fetal progenitor state, whose downregulation licenses non-canonical reprogramming.
Preprint
Cancer is rarely the straightforward consequence of an abnormality in a single gene, but rather reflects a complex interplay of many genes, represented as gene modules. Here, we leveraged the recent advances of model-agnostic interpretation approach and developed CGMega, an explainable and graph attention-based deep learning framework to perform cancer gene module dissection. CGMega outperforms current approaches in cancer gene prediction, and it provides a promising approach to integrate multi-omics information. We applied CGMega to breast cancer cell line and acute myeloid leukemia (AML) patients, and we uncovered the high-order gene module formed by ErbB family and tumor factors NRG1, PPM1A and DLG2. We identified 396 candidate AML genes, and observed the enrichment of either known AML genes or candidate AML genes in a single gene module. We also identified patient-specific AML genes and associated gene modules. Together, these results indicate that CGMega can be used to dissect cancer gene modules, and provide high-order mechanistic insights into cancer development and heterogeneity.
Chapter
The introduction of high-throughput gene expression profiling technologies (DNA microarrays) in molecular biology and their expected applications to the clinic have allowed the design of predictive signatures linked to a particular clinical condition or patient outcome in a given clinical setting. However, it has been shown that such signatures are prone to several problems: (i) they are heavily unstable and linked to the set of patients chosen for training; (ii) data topology is problematic with regard to the data dimensionality (too many variables for too few samples); (iii) diseases such as cancer are provoked by subtle misregulations which cannot be readily detected by current analysis methods. To find a predictive signature generalizable for multiple datasets, a strategy of superimposition of a large scale of protein-protein interaction data (human interactome) was devised over several gene expression datasets (a total of 2,464 breast cancer tumors were integrated), to find discriminative regions in the interactome (subnetworks) predicting metastatic relapse in breast cancer. This method, Interactome-Transcriptome Integration (ITI), was applied to several breast cancer DNA microarray datasets and allowed the extraction of a signature constituted by 119 subnetworks. All subnetworks have been stored in a relational database and linked to Gene Ontology and NCBI EntrezGene annotation databases for analysis. Exploration of annotations has shown that this set of subnetworks reflects several biological processes linked to cancer and is a good candidate for establishing a network-based signature for prediction of metastatic relapse in breast cancer.
Article
Introduction: Adamantinoma-like Ewing sarcoma (ALES) is a rare aggressive malignancy occasionally diagnosed in the thyroid gland. ALES shows basaloid cytomorphology, expresses keratins, p63, p40, frequently CD99, and harbours the t(11;22) EWSR1::FLI1 translocation. There is debate on whether ALES resembles more sarcoma or carcinoma. Methods: We performed RNA sequencing from two ALES cases and compared findings with skeletal Ewing's sarcomas and nonneoplastic thyroid tissue. ALES was investigated by in situ hybridization (ISH) for high-risk human papillomavirus (HPV) DNA and immunohistochemistry for the following antigens: keratin 7, keratin 20, keratin 5, keratins (AE1/AE3 and CAM5.2), CD45, CD20, CD5, CD99, chromogranin, synaptophysin, calcitonin, thyroglobulin, PAX8, TTF1, S100, p40, p63, p16, NUT, desmin, ER, FLI1, INI1, and myogenin. Results: An uncommon EWSR1::FLI transcript with retained EWSR1 exon 8 was detected in both ALES cases. Regulators of EWSR1::FLI1 splicing (HNRNPH1, SUPT6H, SF3B1) necessary for production of a functional fusion oncoprotein, as well as 53 genes (including TNNT1, NKX2.2) activated downstream to the EWSR1::FLI1 cascade, were overexpressed. Eighty-six genes were uniquely overexpressed in ALES, most of which were related to squamous differentiation. Immunohistochemically, ALES strongly expressed keratins 5, AE1/AE3 and CAM5.2, p63, p40, p16, and focally CD99. INI1 was retained. The remaining immunostains and HPV DNA ISH were negative. Conclusion: Comparative transcriptomic profiling reveals overlapping features of ALES with skeletal Ewing's sarcoma and an epithelial carcinoma, as evidenced by immunohistochemical expression of keratin 5, p63, p40, CD99, the transcriptome profile, and detection of EWSR1::FLI1 fusion transcript by RNA sequencing.
Thesis
Recent advances in science and technology have enabled genetic testing to be conducted inexpensively, expeditiously, and directly by consumers, therefore allowing individuals access to their genetic information without the intervention of healthcare practitioners. This technology can assist individuals to better manage their wellbeing and conserve healthcare funds. Yet, direct-to-consumer genetic testing is not free from controversy primarily due to potential human rights infringements and a perceived lack of regulation. While direct-to-consumer genetic testing may provide consumers with autonomy, involvement in healthcare decisions, convenience, and enhanced genetic literacy, the field remains contentious. The questionable validity, accuracy, and utility of tests, the absence of professional oversight and lack of suitable genetic counselling, potential result misinterpretation, consent processes, follow-up costs which burden healthcare systems, and privacy concerns surrounding the usage and confidentiality of genetic data for research, have brought direct-to-consumer genetic testing to the fore. Despite its growing prevalence, direct-to-consumer genetic testing remains greatly under-investigated in South Africa and, while the need for regulation has been highlighted, it is yet to be fully examined. Therefore, in this dissertation, I map the current legal landscape relating to direct-to-consumer genetic testing in South Africa. This is done through a comprehensive legal analysis of South Africa’s extant law relevant to the industry, and the issues associated therewith – with the intention of determining if, and how, direct-to-consumer genetic testing is legally governed in South Africa and how its various aspects and processes function within the current legislative framework. 
Through this analysis, I find that the legal landscape in South Africa relating to direct-to-consumer genetic testing is multi-layered and the industry is, in fact, governed by a variety of, sometimes overlapping, statutes and regulations. Clarifying South Africa’s current legal landscape regarding direct-to-consumer genetic testing enables local, as well as foreign, direct-to-consumer genetic testing companies operating in South Africa to better understand the parameters within which they may legally function, in terms of offering genetic tests directly to the public and subsequent genetic research conducted using the genetic data obtained from the samples of consumers.
Article
The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.
Article
Extracellular signal-regulated kinase (ERK) is an important intermediate in signal transduction pathways that are initiated by many types of cell surface receptors. It is thought to play a pivotal role in integrating and transmitting transmembrane signals required for growth and differentiation. Constitutive activation of ERK in fibroblasts elicits oncogenic transformation, and recently, constitutive activation of ERK has been observed in some human malignancies, including acute leukemia. However, mechanisms underlying constitutive activation of ERK have not been well characterized. In this study, we examined the activation of ERK in 79 human acute leukemia samples and attempted to find factors contributing to constitutive ERK activation. First, we showed that ERK and MEK were constitutively activated in acute leukemias by in vitro kinase assay and immunoblot analysis. However, in only one half of the studied samples, the pattern of ERK activation was similar to that of MEK activation. Next, by semiquantitative reverse transcriptase-polymerase chain reaction (RT-PCR) and immunoblot analysis, we showed hyperexpression of ERK in a majority of acute leukemias. In 17 of 26 cases (65.4%) analyzed by immunoblot, the pattern of ERK expression was similar to that of ERK activation. The fact of constitutive activation of ERK in acute leukemias suggested to us the possibility of an abnormal downregulation mechanism of ERK. Therefore, we examined PAC1, a specific ERK phosphatase predominantly expressed in hematopoietic tissue and known to be upregulated at the transcription level in response to ERK activation. Interestingly, in our study, PAC1 gene expression in acute leukemias showing constitutive ERK activation was significantly lower than that in unstimulated, normal bone marrow (BM) samples showing minimal or no ERK activation (P = .002). Also, a significant correlation was observed between PAC1 downregulation and phosphorylation of ERK in acute leukemias (P= .002). 
Finally, by further analysis of 26 cases, we showed that a complementary role of MEK activation, ERK hyperexpression, and PAC1 downregulation could contribute to determining the constitutive activation of ERK in acute leukemia. Our results suggest that ERK is constitutively activated in a majority of acute leukemias, and in addition to the activation of MEK, the hyperexpression of ERK and downregulation of PAC1 also contribute to constitutive ERK activation in acute leukemias.
Article
Dlx genes constitute a gene family thought to be essential in morphogenesis and development. We show here that in vertebrate cells, Dlx genes appear to be part of a regulatory cascade initiated by acute lymphoblastic leukemia (ALL)-1, a master regulator gene whose disruption is implicated in several human acute leukemias. The expression of Dlx2, Dlx3, Dlx5, Dlx6, and Dlx7 was absent in All-1 −/− mouse embryonic stem cells and reduced in All-1 +/− cells. In leukemic patients affected by the t(4;11)(q21;q23) chromosomal abnormality, the expression of DLX2, DLX3, and DLX4 was virtually abrogated. Our data indicate that Dlx genes are downstream targets of ALL-1 and could be considered as important tools for the study of the early leukemic cell phenotype. J. Leukoc. Biol. 74: 000-000; 2003.
Article
Alu sequences represent the largest family of short interspersed repetitive elements (SINEs) in humans with 500 000 copies per genome. Recently, one Alu subfamily was found to be human specific (HS). We originally described the use of polymorphic HS Alu insertions as a tool in population studies and recently as tools in DNA fingerprinting and forensic analysis. In this report, we will use this simple polymerase chain reaction (PCR)-based technique for the detection of HS Alu insertion polymorphisms. We will test the resolving power of this DNA profiling approach in both population genetics and paternity assessment. At the population level, we will describe the genotypic distribution of five polymorphic Alu insertions among three populations from the American continent, one of African origin and the other two Amerindian. Insight into their relationships will be provided. At the family level, we will examine one European American family of seven individuals, and the same pedigree will also be characterized by way of the two systems currently and widely used to ascertain paternity: PCR-sequence specific oligonucleotide probe hybridization (PCR-SSO) and PCR-restriction fragment length polymorphism (PCR-RFLP) of human leucocyte antigen (HLA) class II molecules, and a standard RFLP protocol used in forensic casework and paternity studies. The importance and strengths of the method as well as its perspectives for future use in filiation studies will be evaluated.
Article
Metastasis is a rare event. Does it arise from rare, variant, highly metastatic cells or does a primary tumor progress to a premalignant state from which metastases arise stochastically without further changes in gene expression? Arguments and evidence have been adduced to support either position. A paper in this month's Cancer Cell (Kang et al., 2003) and other arguments instead suggest models combining features of both.