Mining for regulatory programs in the cancer transcriptome

DNA microarrays have been widely applied to cancer transcriptome analysis. The Oncomine database contains a large collection of such data, as well as hundreds of derived gene-expression signatures. We studied the regulatory mechanisms responsible for gene deregulation in these cancer signatures by searching for the coordinate regulation of genes with common transcription factor binding sites. We found that genes with binding sites for the archetypal cancer transcription factor, E2F, were disproportionately overexpressed in a wide variety of cancers, whereas genes with binding sites for other transcription factors, such as Myc-Max, c-Rel and ATF, were disproportionately overexpressed in specific cancer types. These results suggest that alterations in pathways activating these transcription factors may be responsible for the observed gene deregulation and cancer pathogenesis.
NATURE GENETICS | VOLUME 37 | NUMBER 6 | JUNE 2005 | 579
ANALYSIS
Global gene expression profiling with DNA microarrays has been widely applied to human cancer, leading to the elucidation of complex gene-expression programs activated and repressed in various types and subtypes of cancer, a ‘molecular taxonomy’ of cancer. We and others have attempted to characterize large collections of cancer gene-expression data in terms of common ‘signatures’ of activation (refs. 1,2) or in terms of coordinately regulated processes or ‘modules’ (ref. 3). But such efforts have not focused on the regulatory mechanisms responsible for observed gene-expression alterations in cancer. Some gene-expression patterns observed from microarray data probably represent a downstream read-out of a few genetic aberrations (mutations, amplifications, deletions, translocations, etc.) that led to the activation or inactivation of a few transcription factors. In some cases, cancer-causing genetic aberrations may not be directly apparent from these downstream gene-expression read-outs. For example, a mutation in the Rb tumor suppressor that leads to dissociation and activation of E2F1 would manifest not in the differential gene expression of Rb or E2F1, but in the coordinate activation of E2F target genes. Global methods for inferring transcriptional regulatory mechanisms from gene-expression data have been widely applied to yeast gene expression and also to the human cell cycle (ref. 4) but not yet to human cancer. We searched for cancer regulatory programs that link transcription factors to target genes that are conditionally activated in specific cancer types and subtypes (Fig. 1).
We began by defining gene-expression signatures characteristic of a wide variety of cancer types and subtypes represented in the Oncomine database (ref. 5). We used data from 65 independent studies including 6,732 microarray experiments and 70.8 million gene-expression measurements to derive 265 gene-expression signatures (Supplementary Table 1 online). Signatures were defined as sets of genes with statistically significant (Q < 0.10) differential expression in cancer, either relative to normal tissue or relative to other types or subtypes of cancer. We also derived normal tissue signatures as sets of genes differentially expressed in a single normal tissue type relative to other normal tissue types. Gene-expression signatures ranged in size from 20 genes to 2,200 genes and represented nearly every major type of cancer and normal tissue.
Next, we constructed a database of transcriptional regulatory signatures, relating transcription factors to candidate target genes by identifying putative transcription factor binding sites in the promoter sequences of human genes. We submitted all 1-kb human promoter sequences to the MATCH software program, which identifies and scores sequence matches to transcription factor binding site position weight matrices from the TRANSFAC database (ref. 6). Although a high-scoring match does not constitute a definitive transcription factor binding site and regulatory interaction, we reasoned that the sets of genes with the highest scoring matches are likely to be enriched for true target genes. After we applied a match threshold and rank filter, our database contained 361 regulatory signatures, comprising 466,491 potential regulatory interactions that represent putative transcription factor binding sites in the promoters of candidate target genes (Supplementary Table 2 online). There are several limitations to this approach, and we concede that our database is incomplete and probably contains false interactions. But the database is sufficient for large-scale enrichment analysis as well as initial hypothesis generation.
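As a rough illustration of how a position weight matrix can be scored against a promoter, the sketch below slides a small count matrix along a sequence and reports a similarity score rescaled to the range 0 to 1. It is a simplified stand-in, not the MATCH algorithm itself (which, among other things, computes a separate core score); the toy matrix, the function names and the 0.8 cut-off used in the example are illustrative assumptions.

```python
# Simplified PWM scan; NOT the MATCH algorithm, only an illustration.
# The matrix, names and cut-off below are hypothetical.

TOY_MATRIX = [  # one dict of base counts per motif position
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 5, "G": 5, "T": 0},
    {"A": 0, "C": 2, "G": 8, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]


def similarity(matrix, site):
    """Score a site between 0 (worst possible) and 1 (best possible)."""
    score = sum(col[base] for col, base in zip(matrix, site))
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    return (score - lo) / (hi - lo)


def scan(matrix, sequence, cutoff=0.8):
    """Return (position, site, score) for every window scoring >= cutoff."""
    width = len(matrix)
    hits = []
    for pos in range(len(sequence) - width + 1):
        site = sequence[pos:pos + width]
        if set(site) <= set("ACGT"):  # skip windows with ambiguous bases
            s = similarity(matrix, site)
            if s >= cutoff:
                hits.append((pos, site, s))
    return hits


# Toy promoter fragment containing one perfect match to the toy matrix.
hits = scan(TOY_MATRIX, "AATTTGGCGCCAAA")
```

A gene would then enter a regulatory signature if its promoter carries one of the highest-scoring hits for a given matrix.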
Daniel R Rhodes (1–3), Shanker Kalyana-Sundaram (1), Vasudeva Mahavisno (1), Terrence R Barrette (1), Debashis Ghosh (2,4) & Arul M Chinnaiyan (1–3,5)
(1) Department of Pathology, (2) Bioinformatics Program, (3) Comprehensive Cancer Center, and Departments of (4) Biostatistics and (5) Urology, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA. Correspondence should be addressed to A.M.C. (arul@umich.edu).
Published online 26 May 2005; doi:10.1038/ng1578
© 2005 Nature Publishing Group http://www.nature.com/naturegenetics
With a database of gene-expression signatures and transcriptional regulatory signatures in place, we sought to identify conditional regulatory programs (CRPs), consisting of a transcription factor that coordinately regulates a set of target genes in a particular tissue type. We identified candidate CRPs by searching for disproportionate overlap of regulatory signatures with gene-expression signatures. We reasoned that if a transcription factor is responsible for the coordinate regulation of a set of genes in a given tissue type, then the candidate target genes of the transcription factor should be disproportionately over-represented in the gene-expression signature. We compared 265 gene-expression signatures with 361 regulatory signatures. We counted the degree of overlap between all signature pairs and computed the significance of the overlaps by the binomial distribution. From this analysis, we defined 311 regulatory programs that showed highly significant overlap (P < 0.00033) between a gene-expression signature and a regulatory signature (Supplementary Tables 3–6 online). Given the total number of hypotheses tested, we would expect only 31 signature pairs to have such significant overlap by chance (Q < 0.10; Fig. 2a).
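The expected number of chance overlaps follows from simple arithmetic: the number of signature pairs tested multiplied by the P-value threshold. The sketch below works this through, taking as an assumption the 265 gene-expression signatures and the 361 regulatory signatures reported for the filtered database; the function name is illustrative only.

```python
def expected_by_chance(n_expression, n_regulatory, p_threshold):
    """Expected count of signature pairs passing p_threshold under the null."""
    n_hypotheses = n_expression * n_regulatory
    return n_hypotheses * p_threshold


# Assumed inputs: 265 expression signatures, 361 regulatory signatures.
expected = expected_by_chance(265, 361, 0.00033)
observed = 311  # regulatory programs reported at P < 0.00033

print(f"hypotheses tested: {265 * 361}")
print(f"expected by chance: {expected:.1f}")
print(f"expected / observed: {expected / observed:.2f}")
```

Under these assumptions roughly 31 to 32 of the 311 observed programs would be expected by chance, which corresponds to the quoted Q of about 0.10.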
We first examined the 81 CRPs specific to normal human tissues. Several of the programs validated our method by specifically linking a transcription factor to the tissue type in which it is known to act. For example, CRPs 11 and 58 are composed of genes activated in normal liver tissue that have promoter binding sites for HNF4α and HNF1, respectively, two hepatocyte nuclear factors known to control liver-specific processes (refs. 7,8; Fig. 2b). CRP 222 is composed of genes activated in muscle tissue that have promoter binding sites for MEF2A (also called RSRFC4), a transcription factor with a known role in myocyte differentiation (ref. 9). CRP 24 is composed of genes activated in normal brain tissue that have promoter binding sites for the EGR (also called KROX) family of transcription factors (ref. 10). CRPs 32, 136 and 307 are composed of genes activated in normal blood cells that have binding sites for the IRF family of transcription factors, which are activated in white blood cells and have a role in host defense (ref. 11). Similarly, binding sites for NF-κB, which has a central role in immune function, are also enriched in genes expressed in blood cells (CRPs 178 and 264; ref. 12). The gene-expression signature for early progenitor cells showed an enrichment of binding sites for Oct/POU family transcription factors, which function in pluripotent stem cells (refs. 13,14). Taken together, these results show that our approach can identify regulatory mechanisms responsible for gene regulation in human tissue. They also suggest sets of target genes for each of these tissue-specific regulatory programs. Other normal tissue CRPs may link transcription factors to tissue types in which they were not previously known to act. The full list of normal tissue CRPs is provided in Supplementary Table 4 online and can be explored in detail at Oncomine.
Next, we examined the 232 CRPs involving human cancer. More than half (126) of these relate one of several variant E2F binding sites to target genes in one of many cancer types, including follicular lymphoma, Burkitt lymphoma, diffuse large B-cell lymphoma (DLBCL), acute lymphoblastic leukemia, glioblastoma, medulloblastoma, leiomyosarcoma, small cell lung cancer (SCLC), squamous cell lung cancer, hepatocellular carcinoma, salivary adenoid cystic carcinoma, adrenocortical carcinoma, high-grade astrocytoma and high-grade breast carcinoma. These results reaffirm that activation of the E2F pathway is a prevalent event in human cancer (refs. 15,16) and provide hundreds of putative E2F targets activated in specific human tumors. As others have found, we show through a second layer of enrichment analysis that E2F CRPs include genes involved in several cellular proliferation-related processes such as the cell cycle, DNA replication and mRNA splicing. We also show that several E2F cancer regulatory programs, including those activated in high-grade breast cancer (CRP 99) and small cell lung cancer (CRP 96), are enriched for proteins involved in chromatin modification, including EZH2, JJAZ1 (SUZ12), CBX3, HMGA1 and BAF53A. The E2F pathway regulates EZH2 (ref. 17); perhaps its role in regulating chromatin-modifying genes is more widespread. Regulatory programs linking NF-Y binding sites to cancer signatures were also common and usually coincided with E2F cancer regulatory programs. This is not surprising, as the binding of the NF-Y transcription factor to certain promoters is necessary for E2F activity (ref. 16). NF-Y transactivation is dependent on phosphorylation by CDK2, suggesting a potential therapeutic approach for repressing NF-Y and thus E2F cancer regulatory programs (ref. 18).
To confirm that the E2F cancer regulatory programs represent gene sets truly activated by E2F, we collected data from an independent study that identified transcriptional targets of the E2F family in an inducible cell line system (ref. 19). In total, 558 genes were significantly overexpressed upon E2F activation. We reasoned that if at least a fraction of these results represented a physiologically relevant E2F signature, and if our E2F cancer regulatory programs represented valid programs activated by E2F in human cancer in vivo, then we should find substantial overlap between the two. To test this, we selected a representative E2F binding site and its ten respective cancer regulatory programs. In nine of the ten CRPs, we found a significant enrichment of genes from the in vitro E2F signature (P < 0.005), suggesting that our approach identified valid E2F targets activated in human cancers (Supplementary Table 7 online).
Figure 1 Overview of the method used to elucidate CRPs. Data were integrated from three sources: TRANSFAC (ref. 6), the University of California Santa Cruz (UCSC) genome browser and Oncomine (ref. 5). Putative transcription factor regulatory signatures were compared with gene-expression signatures, and their overlap was assessed using the binomial distribution to derive CRPs.
[Figure 1 schematic. Inputs: TRANSFAC (transcription factor position weight matrices), UCSC genome browser (RefSeq 1-kb promoter sequences) and Oncomine (microarray measurements). Putative binding site identification: (1) Match algorithm, (2) rank threshold, giving putative regulatory signatures (466,491 candidate binding sites). Differential expression analysis: (1) t-test, (2) false discovery rate correction, giving gene-expression signatures (205 cancer signatures, 29 normal tissue signatures). Enrichment analysis (observed versus expected overlap) gives CRPs of the form ‘transcription factor X regulates target gene Y in tissue Z’.]
To select the most promising candidate E2F targets among our CRPs, we identified target genes that are activated by E2F in the inducible cell culture system and are most common in E2F CRPs. Among the nine CRPs that showed significant overlap with the in vitro signature, eight contained three known E2F targets, including CCNE2, RRM2 (ref. 20) and EZH2 (ref. 17). Other known E2F targets activated in many cancer CRPs include TFDP1, CDC25, RPA1 and USP13. The near-universal activation of these genes by E2F in cancer suggests that they are crucial mediators of carcinogenesis. For example, the observed activation of cyclin E2 by E2F is part of an autoregulatory loop, as cyclin E2 activates CDKs, which further activate E2F (ref. 21). Most of these E2F target genes have a role in cellular proliferation; therefore, their importance in E2F-mediated tumorigenesis is not surprising. Notably, however, EZH2 functions as a chromatin-modifying transcriptional repressor and is important in embryogenesis (ref. 22). Functional studies showed that EZH2 promotes invasion in breast cancer cells (ref. 23) and is associated with a lethal phenotype in prostate cancer (ref. 24). EZH2 is also associated with poorly differentiated cancers relative to their well-differentiated counterparts (ref. 2). Perhaps the hyperactivated E2F pathway leads to EZH2 overexpression and concomitant repression of prodifferentiation genes, thus locking cells into an undifferentiated invasive phenotype. This raises the possibility that the E2F pathway is responsible for both cancer growth and dedifferentiation.
Our analysis uncovered several other cancer regulatory programs involving transcription factors other than E2F. For example, CRP 120 suggests that c-Rel activates hundreds of target genes in DLBCL, and CRP 45 suggests that c-Rel activity is most apparent in de novo DLBCL relative to transformed DLBCL (Fig. 2c). The enrichment of c-Rel binding sites in the promoters of genes activated in DLBCL is consistent with the observation of c-Rel amplification in DLBCL (ref. 25). Although one report failed to find a link between c-Rel amplification status and downstream gene-expression changes (ref. 26), our results suggest that c-Rel activity is evident from DLBCL gene-expression patterns. Transformed and de novo DLBCL are markedly different at the gene-expression level (ref. 27), though morphologically indistinguishable. Our work suggests that a key difference may be the specific activation of the c-Rel regulatory program in de novo DLBCL. Upon examination of the target genes in the c-Rel–DLBCL program, we found an enrichment of genes involved in both cell proliferation and apoptosis. We found that E2F1 was among the target gene set; our analysis also identified an E2F1-DLBCL regulatory program. These results suggested that there may be a two-tiered regulatory mechanism beginning with c-Rel activation of target genes, which include E2F1, and then E2F1 activation of its target genes, many of which have a role in cellular proliferation (Fig. 2c).
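The two-tier pattern described above can be searched for mechanically: look for pairs of programs found in the same expression signature where the transcription factor of one program appears among the target genes of the other. The sketch below assumes a hypothetical in-memory representation of CRPs; the `Program` record, its field names and the toy gene lists are illustrative and are neither the Oncomine schema nor the published target sets.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Program:
    factor: str          # gene symbol of the transcription factor
    signature: str       # expression signature the program was found in
    targets: frozenset   # candidate target genes in the program


def two_tier_pairs(programs):
    """Yield (upstream, downstream) programs sharing a signature where the
    downstream factor is itself a target of the upstream factor."""
    for up, down in permutations(programs, 2):
        if up.signature == down.signature and down.factor in up.targets:
            yield up, down


# Toy data only; GENE_* entries are placeholders, not published targets.
programs = [
    Program("REL", "de novo DLBCL", frozenset({"E2F1", "GENE_A", "GENE_B"})),
    Program("E2F1", "de novo DLBCL", frozenset({"GENE_C", "GENE_D"})),
    Program("MYC", "SCLC", frozenset({"E2F1", "GENE_E"})),
]
pairs = [(up.factor, down.factor) for up, down in two_tier_pairs(programs)]
```

With the toy data only the c-Rel (REL) and E2F1 programs pair up, because the MYC program belongs to a different signature.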
Another regulatory program (CRP 240) details an abundance of c-Myc–Max binding sites among genes overexpressed in SCLC. This is consistent with the known amplification and overexpression of the Myc family of transcription factors in SCLC (refs. 28,29). Enrichment analysis of this program identified a preponderance of genes involved in DNA metabolism and the cell cycle. Furthermore, we found that this program significantly overlapped with the E2F1-SCLC regulatory program (CRP 96; P = 0.01), suggesting that several genes are dually activated by E2F1 and Myc in SCLC. Myc binding sites were also common among genes activated in normal umbilical endothelial cells (CRP 186) as well as in adrenocortical carcinoma (CRP 320). The final regulatory program that we explored suggested that ATF activates target genes in salivary carcinoma, of which a disproportionate number are involved in cell migration (CRP 177). ATF1 is activated as a fusion protein in metastatic melanoma (ref. 30), but no link between ATF and salivary carcinoma has been reported. If the ATF program is indeed overactivated in salivary carcinoma, then therapies targeting ATF in melanoma (ref. 31) may also be useful in salivary carcinoma.
We observed that many of the transcription factors involved in cancer regulatory programs have oncogenic activity, suggesting that their predicted regulatory function in CRPs may be important in carcinogenesis. Transcription factors identified by our analysis with a causative role in cancer include E2F1 (ref. 32), Myc (ref. 33), c-Rel (ref. 34), ATF1 (ref. 30) and C-ETS-1 (ref. 35). All regulatory programs and their target genes can be explored through our web-based data-mining platform, Oncomine.
Figure 2 Regulatory programs encoded in gene-expression signatures. (a) Regulatory programs were inferred if a gene-expression signature significantly overlapped with a signature of candidate transcription factor targets. The number of significant overlaps observed (red) is compared with the number expected by chance (black). (b) A representative normal tissue regulatory program linking the HNF4 transcription factor to 78 target genes (20 shown) exclusively activated in normal liver tissue (Li) relative to several other normal tissues. (c) Two regulatory programs activated in de novo DLBCL relative to post-transformation DLBCL and follicular lymphoma. The transcription factor of the second program (E2F1) is a target gene in the first program (c-Rel), suggesting a two-tier regulatory mechanism. Red indicates relative overexpression of genes (rows) in the profiled samples (columns); blue indicates relative underexpression.
[Figure 2 panels. (a) Axes: overlap significance (negative log P value) and count of signature pairs (log scale). (b) HNF4 program: liver (Li) versus 19 other tissue types. (c) c-Rel and E2F1 programs: heat maps of target genes across transformed, follicular and de novo samples.]
The identification of a CRP implies that a specific transcription factor is active in a specific tissue type and is responsible for the observed gene regulation or deregulation. Activation of a transcription factor can occur either as a downstream effect of a signaling cascade (e.g., phosphorylation, nuclear translocation) or by overexpression of the transcription factor itself. To identify those CRPs that may be regulated by the latter mechanism, we searched for concomitant overexpression of the transcription factors that bind to the enriched binding sites. We found overexpression of a respective transcription factor for 79 of the 311 identified CRPs (25%). This analysis may provide a level of validity to some CRPs and, in some cases, may identify the specific transcription factor among a family that is responsible for the target gene regulation. For example, the CREB-ATF binding site is enriched among genes overexpressed in salivary carcinoma (CRP 250). We found that both CREB1 and ATF5 are significantly overexpressed in salivary carcinoma, suggesting that they may be responsible for the CREB-ATF CRP (Supplementary Fig. 1 online). We also observed concomitant overexpression of the expected transcription factor with several of the normal tissue CRPs, including HNF4A in liver, MEF2A in muscle, EGR3 and EGR4 in brain and multiple IRFs in blood (Supplementary Table 8 online).
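The check for concomitant transcription factor overexpression amounts to intersecting the genes that encode factors for an enriched binding site with the overexpressed genes of the matching signature. The sketch below shows that step under stated assumptions: the mapping from the CREB-ATF site to four family members and the toy signature are hypothetical, and only CREB1 and ATF5 are taken from the text.

```python
def overexpressed_factors(site_factors, overexpressed_genes):
    """Return the factors for a binding site that are themselves overexpressed."""
    return sorted(set(site_factors) & set(overexpressed_genes))


# Hypothetical mapping from a binding site to genes encoding factors that bind it.
creb_atf_factors = ["CREB1", "ATF1", "ATF2", "ATF5"]
# Toy overexpression signature; GENE_* entries are placeholders.
salivary_signature = {"CREB1", "ATF5", "GENE_X", "GENE_Y"}

supporting = overexpressed_factors(creb_atf_factors, salivary_signature)

# Fraction of CRPs with an overexpressed factor, from the counts given above.
fraction = 79 / 311
print(supporting, f"{fraction:.0%}")
```

An empty result for a CRP would leave activation through signaling (for example phosphorylation or nuclear translocation) as the remaining explanation.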
In summary, integrative bioinformatics analyses similar to those carried out by Segal et al. (ref. 3), our group and others (refs. 1,36) will generate new hypotheses about cancer progression. Previously, by carefully maintaining clinical annotations of the specific tissue specimens analyzed, we were able to identify gene alterations that were common to cancer regardless of tissue of origin as well as gene signatures characteristic of more aggressive dedifferentiated cancers (ref. 2). In this report, our integrative approach of analyzing gene-expression signatures in the context of candidate regulatory signatures identified hundreds of normal tissue and cancer CRPs, of which we have highlighted only a few. Several of these regulatory programs link a transcription factor to a tissue type in which the transcription factor is thought to act, whereas several others suggest new regulatory mechanisms in cancer and normal tissue, such as ATF pathway activation in salivary carcinoma. Furthermore, the regulatory programs uncovered by our analysis suggest candidate target genes, such as E2F1 as a target of c-Rel in de novo DLBCL and chromatin-modifying genes as targets of E2F in high-grade breast cancer. Though powerful, our approach has several limitations: (i) the number of characterized transcription factor binding sites, (ii) the accuracy of the binding sites, (iii) the fact that we scanned only 1-kb promoters, although binding sites are likely to occur outside this region, and (iv) the number of genes profiled in the microarray studies and the sensitivity of the various microarray platforms. Despite these limitations, we were able to discern several regulatory mechanisms encoded in gene-expression signatures. We anticipate that our approach will become more valuable as the accuracy and coverage of transcription factor target databases improve.
METHODS
Cancer signatures. We derived cancer signatures from the Oncomine cancer microarray database. We used 65 independent data sets comprising 6,348 samples (arrays) and 70.9 million gene-expression measurements. The samples spanned 26 normal and cancer tissue types. The 65 data sets measured an average of 6,376.5 (range 507–15,294) unique genes as determined by Entrez Gene. We analyzed differential expression with Student’s t-test and false discovery rates to identify genes with significant differential expression between two classes of samples. We defined gene-expression signatures from analyses that resulted in 20 or more significant genes (Q < 0.10, mean difference > 0.5 Z-score units), for a total of 234 gene-expression signatures with an average size of 398 genes (range 20–2,997). Twenty-nine were normal human tissue signatures, and 205 were cancer signatures, of which 68 were derived from comparisons of a cancer type and other cancer types, 50 from comparisons of a cancer type and the respective normal tissue, 22 from comparisons of various molecular subtypes of cancer and 12 from comparisons of histologic subtypes of cancer.
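The signature definition above can be read as a simple filter applied to per-gene test results. The following sketch assumes the t-test and false discovery rate correction have already been run and that each gene arrives as a (gene, Q value, mean difference) tuple; that tuple layout, the function name and the one-sided treatment of the mean difference are assumptions made for illustration.

```python
def define_signature(results, q_max=0.10, min_diff=0.5, min_genes=20):
    """results: iterable of (gene, q_value, mean_difference) tuples, with the
    mean difference expressed in Z-score units.

    Returns the list of genes passing both thresholds, or None when fewer
    than min_genes pass (the analysis then yields no signature).
    """
    genes = [gene for gene, q, diff in results if q < q_max and diff > min_diff]
    return genes if len(genes) >= min_genes else None


# Toy results: 25 genes pass, one fails on effect size, one on significance.
toy = [(f"GENE_{i}", 0.01, 1.2) for i in range(25)]
toy += [("LOW_EFFECT", 0.01, 0.3), ("NOT_SIGNIFICANT", 0.40, 2.0)]

signature = define_signature(toy)
too_small = define_signature(toy[:10])  # only 10 genes pass, below the minimum
```

A signature of underexpressed genes could be obtained the same way by negating the mean difference before filtering.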
Regulatory signatures. We defined regulatory signatures by scanning human gene promoter sequences for the presence of experimentally defined transcription factor binding sites. We downloaded 1-kb promoter sequences from 20,647 RefSeq reference sequences from the University of California Santa Cruz genome browser (August 2004). These reference sequences mapped to 15,665 unique genes (Entrez Gene). In cases with multiple reference sequences per gene, we analyzed each promoter sequence independently. We submitted sequences sequentially to MATCH, a component of the TRANSFAC Professional Suite, which scans a sequence for the presence of transcription factor binding sites as determined by a database of position weight matrices. We applied the following settings: ‘group of matrices’ was set to ‘vertebrates’; ‘use high quality matrices’ was selected; and under ‘cut-off selection’, the matrix similarity and core similarity cut-offs were set to 0.8 and 0.85, respectively. For each promoter sequence, the program output ‘hits’ designated by matrix identifier and factor name. For each hit, the position, strand, core match and matrix match were provided. In total, 366 distinct matrices were identified in the promoters of human genes, although many of the matrices represent variants of the same transcription factor binding site. With the aforementioned settings, 16,159,457 hits were identified. Because our lenient match threshold in some cases identified hits in nearly every promoter sequence, we filtered the hit list to contain only the top 2,000 hits per matrix sorted by the matrix similarity score. Five matrices with more than 2,000 perfect matches (score = 1.0) were removed from the analysis. To ensure that our results would be robust to the selected hit threshold, the analysis was rerun with 1,500 and 2,500 hit thresholds. As expected, we obtained largely overlapping results (data not shown). After mapping reference sequences to Entrez Gene, we defined 466,491 potential regulatory interactions. Transcription factor matrices had an average of 1,292.2 potential gene targets (range 4–1,554).
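The rank filter described above can be sketched as follows. The input layout (one tuple of matrix identifier, promoter identifier and matrix similarity score per hit), the function name and the handling of tied scores are assumptions; the mapping of reference sequences to Entrez Gene is not shown.

```python
from collections import defaultdict


def filter_hits(hits, top_n=2000):
    """hits: iterable of (matrix_id, promoter_id, matrix_score) tuples.

    Drops matrices with more than top_n perfect matches (score == 1.0),
    then keeps the top_n highest-scoring hits for each remaining matrix.
    """
    by_matrix = defaultdict(list)
    for matrix_id, promoter_id, score in hits:
        by_matrix[matrix_id].append((score, promoter_id))

    kept = {}
    for matrix_id, scored in by_matrix.items():
        perfect = sum(1 for score, _ in scored if score == 1.0)
        if perfect > top_n:
            continue  # uninformative matrix: too many perfect matches
        scored.sort(key=lambda item: item[0], reverse=True)
        kept[matrix_id] = [promoter for _, promoter in scored[:top_n]]
    return kept


# Toy example with a threshold of 2 instead of 2,000.
toy_hits = [
    ("M1", "p1", 0.95), ("M1", "p2", 0.85), ("M1", "p3", 0.99),
    ("M2", "p1", 1.0), ("M2", "p2", 1.0), ("M2", "p3", 1.0),
]
filtered = filter_hits(toy_hits, top_n=2)
```

Rerunning the same function with `top_n=1500` and `top_n=2500` would correspond to the robustness check mentioned in the text.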
Enrichment analysis. We assessed each gene-expression signature ($S_G$) for the significant enrichment of each regulatory signature ($S_R$). The possible set for each gene-expression signature ($P_G$) was defined as the set of measured genes in each respective data set. The possible set for regulatory signatures ($P_R$) was defined as the set of genes with available promoter sequences. We counted the number of genes intersecting a gene-expression signature and a regulatory signature: $n = c(S_G \cap S_R)$, where $c(A)$ denotes the number of elements in set $A$. We counted the number of genes in both the regulatory signature and the possible set for the gene-expression signature: $N = c(S_R \cap P_G)$. Next, we computed the background probability of observing a gene in a gene-expression signature by dividing the number of genes in both the gene-expression signature and the possible set for the regulatory signature by the number of genes in both possible sets:
$$p = \frac{c(S_G \cap P_R)}{c(P_G \cap P_R)}$$
Finally, we calculated the probability of observing an equal or larger intersection between the gene-expression signature and regulatory signature by chance by summing the binomial distribution probabilities for all intersections of equal or larger size:
$$P = \sum_{i=n}^{N} \binom{N}{i} p^{i} (1-p)^{N-i}$$
We applied the method of false discovery rates to adjust P values for multiple hypothesis testing. We calculated Q values as:
$$Q = \frac{P \times N}{R}$$
where, in this formula, N is the number of regulatory signatures tested against each gene-expression signature (not the binomial N defined above) and R is the ascending order rank of the respective P value. We also calculated global Q values where N is the total number of hypotheses tested (all gene-expression signatures by all regulatory signatures). We used the global Q value to assess the significance of our entire study and used the signature-specific Q values to interpret the significance of the observed enrichments per gene-expression signature.
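The enrichment test and the Q-value calculation described in this section can be sketched in a few lines. This is a minimal illustration that follows the verbal definitions literally, with n the signature overlap, N the regulatory signature genes that were measured, p the background probability, and Q taken as P times the number of tests divided by the ascending rank. The function names and the toy gene sets are assumptions, and no monotonic adjustment or capping of Q at 1 is applied, which many false discovery rate implementations add.

```python
from math import comb


def enrichment_p(n, N, p):
    """Upper-tail binomial probability of n or more hits among N genes."""
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(n, N + 1))


def signature_enrichment(S_G, S_R, P_G, P_R):
    """P value for the overlap of expression signature S_G with regulatory
    signature S_R, given the possible sets P_G and P_R (all Python sets)."""
    n = len(S_G & S_R)                   # observed overlap
    N = len(S_R & P_G)                   # regulatory targets that were measured
    p = len(S_G & P_R) / len(P_G & P_R)  # background probability
    return enrichment_p(n, N, p)


def q_values(p_values):
    """Q = P * (number of tests) / (ascending rank of P)."""
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda k: p_values[k])
    q = [0.0] * n_tests
    for rank, k in enumerate(order, start=1):
        q[k] = p_values[k] * n_tests / rank
    return q


# Toy example: 100 measured genes, a 10-gene expression signature and a
# 10-gene regulatory signature sharing 5 genes.
universe = {f"g{i}" for i in range(100)}
S_G = {f"g{i}" for i in range(10)}
S_R = {f"g{i}" for i in range(5)} | {f"g{i}" for i in range(50, 55)}
P = signature_enrichment(S_G, S_R, universe, universe)
Q = q_values([0.01, 0.04, 0.03])
```

Calling `q_values` on the P values of one gene-expression signature gives the signature-specific Q values; calling it on all P values at once gives the global ones.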
In vitro E2F analysis. We collected target genes for E2F1, E2F2 and E2F3 from an in vitro E2F profiling study (ref. 19). We created a composite signature of 588 target genes by combining all genes that were induced by any one of the E2F family members. We selected ten CRPs that corresponded to a single representative E2F binding site (V$E2F_Q4_01) for enrichment analysis. We carried out enrichment analysis by the binomial distribution exactly as described in the preceding section.
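Building the composite signature is a set union over the per-factor target lists. The sketch below uses placeholder gene lists (the GENE_* names and the assignment of genes to individual E2F family members are illustrative, not the published data), and the function name is an assumption.

```python
def composite_signature(*target_sets):
    """Union of the genes induced by any one family member."""
    combined = set()
    for targets in target_sets:
        combined |= set(targets)
    return combined


# Toy target lists; placeholders rather than the published target sets.
e2f1_targets = {"CCNE2", "RRM2", "GENE_A"}
e2f2_targets = {"RRM2", "EZH2"}
e2f3_targets = {"CCNE2", "GENE_B"}

composite = composite_signature(e2f1_targets, e2f2_targets, e2f3_targets)
```

The resulting set then takes the place of the regulatory signature when the binomial enrichment test is applied to each selected CRP.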
URLs. The Oncomine database is available at http://www.oncomine.org/. Promoter sequences from the University of California Santa Cruz genome browser are available at http://hgdownload.cse.ucsc.edu/goldenPath/hg17/bigZips/.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
We thank D. Gibbs for hardware support and R. Varambally for database support. This research is supported in part by the National Institutes of Health through the University of Michigan’s Cancer Center Support Grant, pilot funds from the Dean’s Office and the Department of Pathology. D.R.R. was supported by the Medical Scientist Training Program and the Cancer Biology Training Program, and A.M.C. is a Pew Scholar.
COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.
Published online at http://www.nature.com/naturegenetics/
1. Ramaswamy, S., Ross, K.N., Lander, E.S. & Golub, T.R. A molecular signature of
metastasis in primary solid tumors. Nat. Genet. 33, 49–54 (2003).
2. Rhodes, D.R. et al. Large-scale meta-analysis of cancer microarray data identifies
common transcriptional profiles of neoplastic transformation and progression. Proc.
Natl. Acad. Sci. USA 101, 9309–9314 (2004).
3. Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional
activity of expression modules in cancer. Nat. Genet. 36, 1090–1098 (2004).
4. Elkon, R., Linhart, C., Sharan, R., Shamir, R. & Shiloh, Y. Genome-wide in silico
identification of transcriptional regulators controlling the cell cycle in human cells.
Genome Res. 13, 773–780 (2003).
5. Rhodes, D.R. et al. ONCOMINE: a cancer microarray database and integrated data-
mining platform. Neoplasia 6, 1–6 (2004).
6. Matys, V. et al. TRANSFAC: transcriptional regulation, from patterns to profiles.
Nucleic Acids Res. 31, 374–378 (2003).
7. Sladek, F.M., Zhong, W.M., Lai, E. & Darnell, J.E. Jr. Liver-enriched transcription
factor HNF-4 is a novel member of the steroid hormone receptor superfamily. Genes
Dev. 4, 2353–2365 (1990).
8. Xanthopoulos, K.G. et al. The different tissue transcription patterns of gene for
HNF-1, C/EBP, HNF-3, and HNF-4, protein factors that govern liver-specific transcrip-
tion. Proc. Natl. Acad. Sci. USA 88, 3807–3811 (1991).
9. Black, B.L. & Olson, E.N. Transcriptional control of muscle development by myocyte
enhancer factor-2 (MEF2) proteins. Annu. Rev. Cell Dev. Biol. 14, 167–196 (1998).
10. O’Donovan, K.J., Tourtellotte, W.G., Millbrandt, J. & Baraban, J.M. The EGR family
of transcription-regulatory factors: progress at the interface of molecular and systems
neuroscience. Trends Neurosci. 22, 167–173 (1999).
11. Taniguchi, T., Ogasawara, K., Takaoka, A. & Tanaka, N. IRF family of transcription
factors as regulators of host defense. Annu. Rev. Immunol. 19, 623–655 (2001).
12. Caamano, J. & Hunter, C.A. NF-kappaB family of transcription factors: central regula-
tors of innate and adaptive immune functions. Clin. Microbiol. Rev. 15, 414–429
(2002).
13. Rosner, M.H. et al. A POU-domain transcription factor in early stem cells and germ
cells of the mammalian embryo. Nature 345, 686–692 (1990).
14. Nichols, J. et al. Formation of pluripotent stem cells in the mammalian embryo depends
on the POU transcription factor Oct4. Cell 95, 379–391 (1998).
15. La Thangue, N.B. The yin and yang of E2F-1: balancing life and death. Nat. Cell Biol.
5, 587–589 (2003).
16. Zhu, W., Giangrande, P.H. & Nevins, J.R. E2Fs link the control of G1/S and G2/M
transcription. EMBO J. 23, 4615–4626 (2004).
17. Bracken, A.P. et al. EZH2 is downstream of the pRB-E2F pathway, essential for
proliferation and amplified in cancer. EMBO J. 22, 5323–5335 (2003).
18. Chae, H.D., Yun, J., Bang, Y.J. & Shin, D.Y. Cdk2-dependent phosphorylation of the
NF-Y transcription factor is essential for the expression of the cell cycle-regulatory
genes and cell cycle G1/S and G2/M transitions. Oncogene 23, 4084–4088 (2004).
19. Muller, H. et al. E2Fs regulate the expression of genes involved in differentiation,
development, proliferation, and apoptosis. Genes Dev. 15, 267–285 (2001).
20. DeGregori, J., Kowalik, T. & Nevins, J.R. Cellular targets for activation by the E2F1
transcription factor include DNA synthesis- and G1/S-regulatory genes. Mol. Cell Biol.
15, 4215–4224 (1995).
21. Keenan, S.M., Lents, N.H. & Baldassare, J.J. Expression of cyclin E renders cyclin D-
CDK4 dispensable for inactivation of the retinoblastoma tumor suppressor protein,
activation of E2F, and G1-S phase progression. J. Biol. Chem. 279, 5387–5396 (2004).
22. Cao, R. et al. Role of histone H3 lysine 27 methylation in Polycomb-group silencing.
Science 298, 1039–1043 (2002).
23. Kleer, C.G. et al. EZH2 is a marker of aggressive breast cancer and promotes neoplastic
transformation of breast epithelial cells. Proc. Natl. Acad. Sci. USA 100,
11606–11611 (2003).
24. Varambally, S. et al. The polycomb group protein EZH2 is involved in progression of
prostate cancer. Nature 419, 624–629 (2002).
25. Gilmore, T.D., Kalaitzidis, D., Liang, M.C. & Starczynowski, D.T. The c-Rel
transcription factor and B-cell proliferation: a deal with the devil. Oncogene 23,
2275–2286 (2004).
26. Houldsworth, J. et al. Relationship between REL amplification, REL function, and
clinical and biologic features in diffuse large B-cell lymphomas. Blood 103, 1862–1868
(2004).
27. Lossos, I.S. et al. Transformation of follicular lymphoma to diffuse large-cell lymphoma:
alternative patterns with increased or decreased expression of c-myc and its regulated
genes. Proc. Natl. Acad. Sci. USA 99, 8886–8891 (2002).
28. Nau, M.M. et al. L-myc, a new myc-related gene amplified and expressed in human
small cell lung cancer. Nature 318, 69–73 (1985).
29. Wong, A.J. et al. Gene amplification of c-myc and N-myc in small cell carcinoma of
the lung. Science 233, 461–464 (1986).
30. Zucman, J. et al. EWS and ATF-1 gene fusion induced by t(12;22) translocation in
malignant melanoma of soft parts. Nat. Genet. 4, 341–345 (1993).
31. Jean, D. & Bar-Eli, M. Targeting the ATF-1/CREB transcription factors by single chain
Fv fragment in human melanoma: potential modality for cancer therapy. Crit. Rev.
Immunol. 21, 275–286 (2001).
32. Johnson, D.G., Cress, W.D., Jakoi, L. & Nevins, J.R. Oncogenic capacity of the E2F1
gene. Proc. Natl. Acad. Sci. USA 91, 12823–12827 (1994).
33. Schwab, M., Varmus, H.E. & Bishop, J.M. Human N-myc gene contributes to neoplastic
transformation of mammalian cells in culture. Nature 316, 160–162 (1985).
34. Sylla, B.S. & Temin, H.M. Activation of oncogenicity of the c-rel proto-oncogene. Mol.
Cell Biol. 6, 4709–4716 (1986).
35. Seth, A. & Papas, T.S. The c-ets-1 proto-oncogene has oncogenic activity and is
positively autoregulated. Oncogene 5, 1761–1767 (1990).
36. Lamb, J. et al. A mechanism of cyclin D1 action encoded in the patterns of gene
expression in human cancer. Cell 114, 323–334 (2003).
© 2005 Nature Publishing Group http://www.nature.com/naturegenetics