
Increased methylation variation in epigenetic domains across cancer types


Tumor heterogeneity is a major barrier to effective cancer diagnosis and treatment. We recently identified cancer-specific differentially DNA-methylated regions (cDMRs) in colon cancer, which also distinguish normal tissue types from each other, suggesting that these cDMRs might be generalized across cancer types. Here we show stochastic methylation variation of the same cDMRs, distinguishing cancer from normal tissue, in colon, lung, breast, thyroid and Wilms' tumors, with intermediate variation in adenomas. Whole-genome bisulfite sequencing shows these variable cDMRs are related to loss of sharply delimited methylation boundaries at CpG islands. Furthermore, we find hypomethylation of discrete blocks encompassing half the genome, with extreme gene expression variability. Genes associated with the cDMRs and large blocks are involved in mitosis and matrix remodeling, respectively. We suggest a model for cancer involving loss of epigenetic stability of well-defined genomic domains that underlies increased methylation variability in cancer that may contribute to tumor heterogeneity.
Large hypomethylated genomic blocks in human colon cancer. (a,b) Smoothed methylation values from bisulfite sequencing data for cancer samples (red) and normal samples (blue) in two genomic regions. The hypomethylated blocks are shown with pink shading. Gray bars indicate the locations of PMDs, LOCKs, LADs, CpG islands and gene exons. Note that the blocks coincide with the PMDs, LOCKs and LADs in a but not in b. Small hypermethylated blocks, which account for 3% of the blocks, are also visible at the right edge. (c) The distribution of high-frequency smoothed methylation values for the normal samples (blue) versus the cancer samples (red) shows global hypomethylation of cancer compared to normal. (d) The distribution of methylation values in the blocks (solid lines) and outside the blocks (dashed lines) for normal samples (blue) and cancer samples (red). Although the normal and cancer distributions are similar outside the blocks, methylation values for cancer show a general shift within the blocks. (e) Distribution of methylation differences between cancer and normal samples stratified by inclusion in repetitive DNA and blocks. Inside the blocks, the average difference was ~20% in both repeat and non-repeat areas. Outside the blocks, the average difference was ~0% in both repeat and non-repeat areas, indicating that blocks rather than repeats account for the observed differences in DNA methylation. The boxes show the 25% quantile, the median and the 75% quantile, and each whisker has a length of 1.5 times the interquartile range.
Increased methylation variation in epigenetic domains across cancer types
Kasper Daniel Hansen¹,²,*, Winston Timp²,³,⁴,*, Héctor Corrada Bravo²,⁵,*, Sarven Sabunciyan²,⁶,*, Benjamin Langmead¹,²,*, Oliver G. McDonald²,⁷, Bo Wen²,³, Hao Wu⁸, Yun Liu²,³, Dinh Diep⁹, Eirikur Briem²,³, Kun Zhang⁹, Rafael A. Irizarry¹,²,†, and Andrew P. Feinberg²,³,†
1 Dept. of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
2 Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
3 Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
4 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
5 Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, USA
6 Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
7 Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
8 Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
9 Department of Bioengineering, Institute for Genomic Medicine and Institute of Engineering in Medicine, University of California at San Diego, San Diego, CA, USA
Summary
Tumor heterogeneity is a major barrier to effective cancer diagnosis and treatment. We recently
identified cancer-specific differentially DNA-methylated regions (cDMRs) in colon cancer, which
also distinguish normal tissue types from each other, suggesting that these cDMRs might be
generalized across cancer types. Here we show stochastic methylation variation of the same
cDMRs, distinguishing cancer from normal, in colon, lung, breast, thyroid, and Wilms tumors,
with intermediate variation in adenomas. Whole genome bisulfite sequencing shows these variable
cDMRs are related to loss of sharply delimited methylation boundaries at CpG islands.
Furthermore, we find hypomethylation of discrete blocks encompassing half the genome, with
extreme gene expression variability. Genes associated with the cDMRs and large blocks are
involved in mitosis and matrix remodeling, respectively. These data suggest a model for cancer
involving loss of epigenetic stability of well-defined genomic domains that underlies increased
methylation variability in cancer and could contribute to tumor heterogeneity.
Correspondence to Rafael A. Irizarry and Andrew P. Feinberg: rafa@jhu.edu, afeinberg@jhu.edu.
*Equal contributions from these authors
DATA ACCESSION
Whole-genome bisulfite sequencing data, capture bisulfite sequencing data, custom GoldenGate microarray data, ChIP-chip LOCK data, and Wilms’ tumor copy number microarray data have been submitted, pending assignment of accession numbers.
Author contributions: K.D.H. and R.A.I. wrote the DMR finder and smoothing algorithms; W.T. performed and analyzed the arrays with H.C.B., who wrote new software for this purpose; S.S. made the libraries and performed validation; B.L. wrote new methylation sequence alignment software; O.G.M. performed histopathologic analysis; B.W. and H.W. performed LOCK experiments; Y.L. performed copy number experiments; D.D. and K.Z. performed bisulfite capture; E.B. performed the sequencing; R.A.I. and A.P.F. conceived and led the experiments and wrote the paper with the predominant assistance of K.D.H., W.T., H.C.B. and B.L.
NIH Public Access
Author Manuscript
Nat Genet. Author manuscript; available in PMC 2012 February 1.
Published in final edited form as:
Nat Genet. 43(8): 768–775. doi:10.1038/ng.865.
Introduction
Cancer is generally viewed as more than 200 separate diseases of abnormal cell growth, driven by a series of mutations but also involving epigenetic (non-sequence) changes in the same genes¹. DNA methylation at CpG dinucleotides has been studied extensively in cancer, with hypomethylation or hypermethylation reported at some genes, and global hypomethylation ascribed to normally methylated repetitive DNA elements. Until now, cancer epigenetics has focused on high-density CpG islands, gene promoters or dispersed repetitive elements²,³.
Here we have taken a different and more general approach to cancer epigenetics. It is based on our recent observation that colon cancer frequently shows methylation alterations in regions of lower CpG density near islands, termed shores, and that these cancer-specific differentially methylated regions (cDMRs) correspond largely to the same regions that show DNA methylation variation among normal spleen, liver and brain, i.e. tissue-specific DMRs (tDMRs)⁴. Furthermore, cDMRs are highly enriched among regions differentially methylated during reprogramming of induced pluripotent stem (iPS) cells⁵. We therefore reasoned that these very same sites might be generalized cDMRs, since they are involved in normal tissue differentiation yet show aberrant methylation in at least one cancer type (colon).
We tested this hypothesis by designing a semi-quantitative custom Illumina array for
methylation analysis of 151 cDMRs consistently altered across colon cancer, and analyzed
these sites in 290 samples, including matched normal and cancer from colon, breast, lung,
thyroid, and Wilms’ tumor. We were surprised to discover that almost all of these cDMRs
were altered across all cancers tested. Specifically, the cDMRs showed increased stochastic
variation in methylation level within each tumor type, suggesting a generalized disruption of
the integrity of the cancer epigenome. To investigate this idea further, we performed
genome-scale bisulfite sequencing of 3 colorectal cancers, the matched normal colonic
mucosa, and two adenomatous polyps. These experiments revealed a surprising loss of
methylation stability in colon cancer, involving CpG islands and shores, and large (up to
several megabases) blocks of hypomethylation affecting more than half of the genome, with
associated stochastic variability in gene expression, which could provide an epigenetic
mechanism for tumor heterogeneity.
RESULTS
Stochastic variation in DNA methylation across cancer types
We sought to measure DNA methylation at the 151 colon cDMRs⁴ more precisely than our previous tiling-array-based approach, CHARM⁶, allowed. We designed a custom nucleotide-specific Illumina bead array with 384 probes covering 139 regions⁷. We studied 290 samples: 122 cancers from colon, lung, breast, thyroid and Wilms’ tumor, with matched normal tissue for 111 of them, along with 30 premalignant colon adenomas and 27 additional normal samples (see Methods). To minimize the risk of genetic heterogeneity arising from sampling multiple clones, we purified DNA from small (0.5 cm × 0.2 cm) sections verified by histopathologic examination.
Cluster analysis of the DNA methylation values revealed that the colon cancer cDMRs
largely distinguished cancer from normal for each tumor type (Supplementary Fig. 1). The
increased across-sample variability in methylation within the cancer samples of each tumor
type compared to normal was even more striking than differences in mean methylation. We
therefore computed across-sample variance within normal and cancer samples in all five
tumor/normal tissue types at each CpG site. Although these CpG sites were selected for differences in mean values in colon cancer, the great majority exhibited greater variance in cancer than in normal tissue for each tissue type (Fig. 1a–e), even after accounting for the differences in variability expected from mean shifts under a binomial model of methylation measurements (Supplementary Fig. 2). This increase was statistically significant (p < 0.01, F-test) for 81%, 92%, 81%, 70% and 80% of the CpG sites in colon, lung, breast, thyroid and Wilms tumor, respectively. Furthermore, 157 CpG sites showed significantly increased variability in all cancer types tested. This increased stochastic variation was found in CpG islands, CpG island shores and regions distant from islands (Fig. 1a–e).
These data suggest a potential mechanism of tumor heterogeneity, namely increased
stochastic variation of DNA methylation in cancers compared to normal, within each tumor
type tested (see Discussion). We ruled out increased cellular heterogeneity and patient age
as artifactual causes for methylation heterogeneity in cancer samples (Supplementary Figs. 3
and 4). Furthermore, methylation hypervariability did not differ between five colon cancers with high copy-number variation and five Wilms tumors with low copy-number variation (Supplementary Fig. 5a–b), arguing against genetic heterogeneity as a cause of methylation hypervariability. Similarly, 7 Wilms tumors without aberrant p53 expression by immunohistochemistry showed methylation hypervariability similar to that of 7 colon tumors with positive staining, a marker of chromosomal instability (Supplementary Fig. 6).
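The per-CpG comparison of across-sample variance described above can be sketched as a one-sided F-test of whether methylation variance in cancer exceeds that in normal tissue. This is a minimal illustration, not the authors' code. It uses only the Python standard library, computing the F-distribution tail from the regularized incomplete beta function via the standard continued-fraction expansion. The function names are hypothetical. The sketch also omits the binomial mean-shift adjustment: under a binomial model the expected variance of a methylation proportion scales with p(1 − p), so a shift of the mean toward 50% alone inflates variance.

```python
import math
import statistics

# Hypothetical helpers illustrating a one-sided variance F-test per CpG;
# not the authors' implementation.


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):   # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_survival(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2), using I_{df2/(df2 + df1 f)}(df2/2, df1/2)."""
    if f <= 0.0:
        return 1.0
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def variance_increase_test(cancer, normal):
    """One-sided F-test of var(cancer) > var(normal) at one CpG site.

    Inputs: methylation levels (0-1) across cancer and normal samples.
    Returns (variance ratio, p-value). Assumes the normal variance is non-zero.
    """
    ratio = statistics.variance(cancer) / statistics.variance(normal)
    return ratio, f_survival(ratio, len(cancer) - 1, len(normal) - 1)


def fraction_hypervariable(sites, alpha=0.01):
    """Fraction of (cancer, normal) site pairs with significantly higher cancer variance."""
    hits = sum(1 for cancer, normal in sites
               if variance_increase_test(cancer, normal)[1] < alpha)
    return hits / len(sites)
```

With α = 0.01, `fraction_hypervariable` yields the kind of per-tissue percentage quoted above. Real data would also need handling for sites whose normal variance is zero, which this sketch does not attempt.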
The loci where increased variability in cancer was observed also distinguish the five normal tissues from one another, but through a shift in mean rather than in variance, as is apparent from cluster analysis (Supplementary Fig. 7). Interestingly, this holds even when only the 25 most variable sites in cancer are used (Fig. 1f). This result reinforces the concept of a biological relationship between normal tissue differentiation and stochastic variation in cancer DNA methylation.
To determine whether the increased variability is a general property of cytosine methylation in cancer or a specific property of the CpGs selected for our custom array, we used as a control a publicly available methylation dataset comparing colorectal cancer to matched normal mucosa on the Illumina HumanMethylation27 BeadChip. In this dataset, only 42% of the sites showed a statistically significant increase in methylation variability, compared to 81% on the custom array (p < 0.01), confirming the specificity of the cancer DMRs included in our custom array. Increased stochastic variation was more common in CpGs far from islands (57%) than in shores (44%) or islands (31%), in contrast to the relative representation of these locations on the 27k array: distal to islands (26.4%), shores (31.6%) and islands (42%) (see Methods). This result suggested that something other than proximity to CpG islands might define the largest fraction of sites of altered DNA methylation in cancer.
Hypomethylation of large DNA methylation blocks in colon cancer
The methylation stochasticity described above appears to be a general property of cancer, affecting cDMRs in both island and non-island regions in all five cancer types tested. To investigate this apparent universal loss of DNA methylation pattern integrity in cancer, and to analyze regions of lower CpG abundance not examined by array-based methods, we performed shotgun bisulfite genome sequencing on 3 colorectal cancers and the matched normal colonic mucosa using the ABI SOLiD platform. We aimed to obtain methylation estimates precise enough to detect differences of 10% methylation. Because we used a local likelihood approach, which aggregates information from neighboring CpGs and combines data from 3 biological replicates, we determined that 4X coverage would suffice to estimate methylation values at this precision, with a standard error of at most 3% (see Methods). We therefore obtained between 12.5 and 13.5 gigabases for each sample, providing ~5X coverage for each CpG after quality control filtering (see Methods) and
alignment (Supplementary Table 1). To verify the accuracy of methylation values obtained by our approach, we performed capture bisulfite sequencing on the same 6 samples for 39,262 regions, yielding 39.3k–125.6k CpGs with >30× coverage (Supplementary Table 2), with correlations of 0.82–0.91 between our local likelihood estimates and capture sequencing. This is remarkable agreement, given that the experiments were performed in different laboratories using different sequencing platforms and protocols. Examination of individual loci demonstrated that our methylation estimates closely track the high-coverage capture data (Supplementary Fig. 8). Traditional bisulfite pyrosequencing further confirmed the accuracy of our approach (Supplementary Fig. 9).
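The coverage argument above can be checked with a back-of-envelope calculation. The sketch below assumes, for simplicity, that every read is an independent Bernoulli draw with the same methylation probability p. For a weighted estimate p̂ = Σ wᵢMᵢ / Σ wᵢNᵢ (Mᵢ methylated reads out of Nᵢ), the variance is then p(1 − p) Σ wᵢ²Nᵢ / (Σ wᵢNᵢ)². The second helper further assumes that neighboring CpGs and replicates are pooled with equal weight. The authors' local likelihood smoother weights CpGs by distance, so this is an approximation rather than their calculation (see Methods). Both function names are hypothetical.

```python
import math


def weighted_binomial_se(p, coverages, weights):
    """SE of p_hat = sum(w*M) / sum(w*N) when reads are independent Bernoulli(p).

    coverages: reads N_i per CpG (and per replicate); weights: w_i.
    Var(p_hat) = p(1 - p) * sum(w_i^2 N_i) / (sum(w_i N_i))^2.
    """
    num = sum(w * w * n for w, n in zip(weights, coverages))
    den = sum(w * n for w, n in zip(weights, coverages)) ** 2
    return math.sqrt(p * (1.0 - p) * num / den)


def min_cpgs_for_se(target_se, coverage, n_replicates, p=0.5):
    """Smallest number of equally weighted neighbouring CpGs giving SE <= target_se.

    p = 0.5 is the worst case because p(1 - p) is largest there.
    """
    reads_needed = p * (1.0 - p) / target_se ** 2
    return math.ceil(reads_needed / (coverage * n_replicates))
```

Under these assumptions, at the worst case p = 0.5, `min_cpgs_for_se(0.03, 4, 3)` gives 24 equally weighted neighboring CpGs at 4X in each of 3 replicates, for a standard error of roughly 2.9%. Unequal kernel weights lower the effective sample size and would require a wider window.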
Sequencing analysis revealed the surprising presence of large blocks of contiguous hypomethylation in cancer compared to normal (Fig. 2a–b). We identified 13,540 such regions, ranging from 5 kb to 10 Mb (Table 1, Supplementary Table 3). The across-cancer average hypomethylation throughout the blocks was 12%–23%. Remarkably, these hypomethylated blocks in cancer corresponded to more than half of the genome, even accounting for the number of CpG sites within the blocks (Table 1), and may include small hypermethylated regions. We also noted a small fraction (3%) of hypermethylated blocks in cancer (Table 1, Fig. 2a,b). A histogram of smoothed methylation values shows the shift in the distribution of global DNA methylation (Fig. 2c). The predominant change in block methylation in cancer was a loss from the abundant compartment of intermediate methylation levels (mean 73% for all samples) to significantly lower levels (50–61%) (Fig. 2d).
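Block detection can be pictured as a search for long runs of CpGs whose smoothed cancer-minus-normal methylation difference stays below a cutoff. The sketch below illustrates that idea only and is not the block finder described in Methods. The cutoff (−0.1), the minimum length (5 kb) and the maximum gap between consecutive CpGs (10 kb) are assumed values, and the input differences are assumed to be already smoothed. The function name is hypothetical.

```python
def call_blocks(positions, diffs, cutoff=-0.1, min_length=5_000, max_gap=10_000):
    """Hypothetical block caller over one chromosome.

    positions: sorted CpG coordinates (bp); diffs: smoothed cancer-minus-normal
    methylation differences at those CpGs. Returns (start, end, n_cpgs,
    mean_diff) for runs of CpGs with diff <= cutoff that span >= min_length bp,
    splitting a run wherever consecutive CpGs are more than max_gap bp apart.
    """
    blocks = []

    def flush(run):
        # Keep the run only if it spans at least min_length base pairs.
        if run and positions[run[-1]] - positions[run[0]] >= min_length:
            mean_diff = sum(diffs[i] for i in run) / len(run)
            blocks.append((positions[run[0]], positions[run[-1]], len(run),
                           round(mean_diff, 4)))

    run = []
    for i, d in enumerate(diffs):
        below = d <= cutoff
        if below and (not run or positions[i] - positions[run[-1]] <= max_gap):
            run.append(i)
            continue
        flush(run)
        run = [i] if below else []
    flush(run)
    return blocks
```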
These blocks are common across all three cancers. An analysis of the tumors individually
versus a normal profile shows consistent block boundary locations (see Fig. 2,
Supplementary Fig. 10, and Methods). These blocks were not driven by copy number
variation since the location of the latter was not consistent across subjects, in contrast to the
consistent block boundaries (Supplementary Fig. 11a, b), and the methylation difference
estimates provided by our statistical approach did not correlate with copy number values
(Supplementary Fig. 11c).
Global hypomethylation in cancer⁸ is attributed to the presence of normally methylated repetitive elements⁹ and may be relevant to colon cancer, as LINE-1 element hypomethylation is associated with worse prognosis in colon cancer¹⁰. We observed that in normal tissues, repetitive elements were more methylated than non-repetitive regions (76% vs. 66%). To determine whether such repetitive elements were responsible for the block hypomethylation, we compared differences in methylation levels inside and outside repeat elements (see Methods), both inside and outside blocks. Most of the global hypomethylation was due to hypomethylated blocks (Fig. 2e), not to the presence of repetitive elements. Because repetitive elements are slightly enriched in blocks (odds ratio 1.4), much of the apparent repeat-associated hypomethylation may in fact be due to blocks. This result does not exclude repeat-associated hypomethylation, since not all repeats were mappable. However, 57% of L1 elements, 94% of L2 elements, 95% of MIR sequences and 18% of Alu elements were covered by our data (Supplementary Table 4) and did not show repeat-specific hypomethylation (Supplementary Fig. 12). It remains possible that Alu sequences not covered by our data are more hypomethylated than covered Alu sequences and thus contribute to global hypomethylation.
Lister et al. performed bisulfite sequencing of the H1 human embryonic stem (ES) cell line and the IMR90 fibroblast line, identifying large regions of the genome that are less methylated in fibroblasts than in ES cells, referred to as partially methylated domains (PMDs)¹¹. The intermediate-methylation regions we identified above largely coincided with the PMDs, containing 85% of CpGs inside PMDs (odds ratio 6.5, P < 2 × 10⁻¹⁶; Supplementary Table 5). We previously described large organized chromatin lysine (K)
modifications, or LOCKs, genome-wide in normal mouse cells that are associated with both constitutive and tissue-specific gene silencing¹². We mapped LOCKs in primary human cells (see Methods). Remarkably, 89% of the LOCKs were contained within the blocks (odds ratio 6.8, P < 2 × 10⁻¹⁶). LOCKs are also known to overlap with nuclear lamina-associated domains, or LADs¹². Approximately 83% of the LADs were also contained within the blocks (odds ratio 4.9, P < 2 × 10⁻¹⁶). In addition, DNase I hypersensitive sites, a structural signature of regulatory regions¹³, were enriched within 1 kb of block boundaries and of small DMRs (p < 2 × 10⁻¹⁶ for both). Thus the large hypomethylated blocks we identified in cancer correspond to a genomic organization identified in normal cells by several complementary methods. Although the PMDs and our hypomethylated blocks largely overlap, we show below that gene expression in cancer differs significantly between their non-overlapping regions.
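The overlap statistics above (odds ratios for blocks versus PMDs, LOCKs and LADs) summarize a 2 × 2 table. The sketch below assumes that each unit (for example a CpG) is classified by membership in two features. The helper names are hypothetical. The confidence interval uses Woolf's log-odds approximation, which may differ from the test behind the reported P values.

```python
import math


def two_by_two(in_a, in_b):
    """Cross-tabulate units (e.g. CpGs) by membership in features A and B.

    Returns [[A and B, A only], [B only, neither]].
    """
    table = [[0, 0], [0, 0]]
    for a, b in zip(in_a, in_b):
        table[0 if a else 1][0 if b else 1] += 1
    return table


def odds_ratio_with_ci(table, z=1.96):
    """Odds ratio with an approximate 95% Woolf (log-scale) confidence interval.

    Assumes all four cells are non-zero.
    """
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            math.exp(math.log(odds_ratio) - z * se_log),
            math.exp(math.log(odds_ratio) + z * se_log))
```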
We observed a relationship between the 157 CpGs identified by our custom array as hypervariable across all cancer types and the hypomethylated blocks identified by whole-genome bisulfite sequencing. We found that 63% of the hypomethylated hypervariable CpGs were within hypomethylated blocks, and 37% of the hypermethylated hypervariable CpGs were within the rare hypermethylated blocks. In contrast, of the hypomethylated and hypermethylated CpGs on the control HumanMethylation27 array that were not hypervariable in cancer, only 13% and 1.5% fell within the hypomethylated and hypermethylated blocks, respectively, demonstrating highly significant enrichment of hypervariably methylated CpGs in blocks (p < 2 × 10⁻¹⁶; Supplementary Table 6).
Small DMRs in cancer involve loss of stability of DNA methylation boundaries
We developed a statistical algorithm (see Methods) for detecting DNA methylation changes in regions smaller than the blocks (<5 kb). Analysis of biological replicates was critical, as we found that regions showing across-subject variability in normal samples would easily be confused with DMRs if only one cancer-normal pair were available (Supplementary Fig. 13). Methylation measurements in these smaller regions agreed well with measurements from our previous CHARM-based microarray analysis⁴ (Supplementary Fig. 14). We refer to these as small DMRs to distinguish them from the large (>5 kb) differentially methylated blocks described above. The greater comprehensiveness of sequencing over CHARM and other published array-based analyses allowed us to detect more small DMRs than previously reported: 5,810 hypermethylated and 4,315 hypomethylated small DMRs (Supplementary Table 7). We also confirmed our earlier finding⁴ that hypermethylated cDMRs are enriched in CpG islands, whereas hypomethylated cDMRs are enriched in CpG island shores (Table 1). Sequencing also showed that unmethylated islands normally outnumber methylated islands approximately 2:1, and that for both types approximately 20% change methylation state in cancer (Table 2, Supplementary Table 8).
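The role of biological replicates can be illustrated with a per-CpG Welch-type t-statistic on smoothed methylation values. This is a hypothetical sketch, not the statistic defined in Methods, and the flagging cutoff is an assumed value. The same 30-point mean difference yields a large statistic when normal samples agree and a small one when they vary. With a single cancer-normal pair, the variance cannot be estimated at all.

```python
import statistics


def methylation_t_stat(cancer, normal):
    """Welch-type t-statistic for mean(cancer) - mean(normal) at one CpG.

    Inputs are smoothed methylation levels (0-1) from biological replicates.
    statistics.variance raises StatisticsError with fewer than two replicates,
    i.e. a single cancer-normal pair gives no estimate of normal variability.
    """
    diff = statistics.mean(cancer) - statistics.mean(normal)
    se = (statistics.variance(cancer) / len(cancer)
          + statistics.variance(normal) / len(normal)) ** 0.5
    return diff / se


def flag_candidate_cpgs(sites, cutoff=4.0):
    """Indices of CpGs whose |t| meets a (hypothetical) cutoff; runs of
    adjacent flagged CpGs would then form candidate small DMRs."""
    return [i for i, (cancer, normal) in enumerate(sites)
            if abs(methylation_t_stat(cancer, normal)) >= cutoff]
```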
The most striking and consistent characteristic of small DMR architecture was a shift in one or both of the DNA methylation boundaries of a CpG island, either out of the island into the adjacent region (Fig. 3a) or into the interior of the island (Fig. 3b). Boundary shifts into islands would appear as hypermethylated islands in array-based data, whereas boundary shifts out of islands would appear as hypomethylated shores.
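A boundary shift of this kind can be quantified by locating where smoothed methylation crosses a threshold on either side of an island, then comparing those positions between normal and cancer. The sketch below assumes a 50% threshold and linear interpolation between adjacent CpGs. Both choices, and the function names, are hypothetical.

```python
def methylation_boundaries(positions, meth, threshold=0.5):
    """Hypothetical helper: positions (bp) where smoothed methylation crosses
    `threshold`, linearly interpolated between adjacent CpGs.

    Values exactly equal to the threshold are not counted as crossings.
    """
    crossings = []
    for i in range(len(positions) - 1):
        x0, x1 = positions[i], positions[i + 1]
        y0, y1 = meth[i], meth[i + 1]
        if (y0 - threshold) * (y1 - threshold) < 0:
            crossings.append(x0 + (threshold - y0) * (x1 - x0) / (y1 - y0))
    return crossings


def boundary_shifts(positions, normal, cancer, threshold=0.5):
    """Pair normal and cancer boundaries in order and return cancer - normal
    offsets in bp. Assumes both profiles have the same number of crossings."""
    b_normal = methylation_boundaries(positions, normal, threshold)
    b_cancer = methylation_boundaries(positions, cancer, threshold)
    if len(b_normal) != len(b_cancer):
        raise ValueError("different numbers of boundaries; pair them manually")
    return [c - n for n, c in zip(b_normal, b_cancer)]
```

For an island that is unmethylated between two flanking methylated regions, a positive offset at the left boundary means methylation has moved into the island. A negative offset there means the unmethylated region has spread out into the shore.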
The second most frequent category of small DMRs involved loss of methylation boundaries
at CpG islands. For example, many hypermethylated cDMRs were defined in normal
samples by unmethylated regions surrounded by highly methylated regions. In cancer, these
regions exhibited stable methylation levels of approximately 40–60% throughout (Table 1,
Fig. 3c). These regions with loss of methylation boundaries largely correspond to what are
classified as hypermethylated islands in cancer.
We also found hypomethylated cDMRs that arose de novo in highly methylated regions
outside of blocks, which we call novel hypomethylated DMRs, usually corresponding to
CpG-rich regions that were not conventional islands (Table 1). Here, regions in which
normal colon tissue was 75–95% methylated dropped to lower levels (20–40%) in cancer
(Fig. 3d). In summary, in addition to the hypomethylated blocks, we found 10,125 small DMRs, 5,494 of which clearly fell into three categories: shifts of methylation boundaries, loss of methylation boundaries and novel hypomethylation. Small DMRs that did not follow a consistent pattern across all three sample pairs were not classified (Table 1).
Methylation-based Euclidean distances show colon adenomas intermediate between normals and cancers
Using multidimensional scaling of the methylation values measured via the custom array in
colon samples we noticed that normal samples clustered tightly together in contrast to
dispersed cancer samples (Fig. 4a). This is consistent with the observed increase in
methylation variability in cancer described earlier. We analyzed 30 colon adenomas on the
custom array, and found that they were intermediate in both variability within samples and
distance to the cluster of normal samples (Fig. 4a).
We subsequently performed whole genome bisulfite sequencing on two of these adenomas,
a premalignant colon adenoma with relatively small methylation-based distance to the
normal colons and an adenoma with a large methylation-based distance to the normal
colons, similar to the cancer samples. We computed average methylation levels over each
block from each sequenced sample and computed pairwise Euclidean distances between
samples using these values. These measurements from hypomethylated blocks confirm the pattern observed in the array data: genome-wide increased variability in cancers compared to normals, with adenomas exhibiting intermediate values (Fig. 4).
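The distance computation described above (block-averaged methylation followed by pairwise Euclidean distances) can be sketched as below. Several parts are assumptions made for illustration: the data layout (a dictionary from CpG position to methylation level for one chromosome), the helper names, and the summary `mean_distance_to_group`, which is one hypothetical way to express "distance to the cluster of normal samples".

```python
import math
import statistics


def block_means(sample_meth, blocks):
    """Mean methylation of one sample inside each block.

    sample_meth: dict mapping CpG position -> methylation (0-1), one chromosome.
    blocks: list of (start, end) coordinates, inclusive.
    """
    means = []
    for start, end in blocks:
        values = [m for pos, m in sample_meth.items() if start <= pos <= end]
        means.append(statistics.mean(values))  # raises if a block has no CpGs
    return means


def pairwise_distances(profiles):
    """Euclidean distance between every pair of samples' block-mean vectors."""
    names = sorted(profiles)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def mean_distance_to_group(profile, group):
    """Average distance from one profile to each profile in a reference group."""
    return statistics.mean(math.dist(profile, g) for g in group)
```

An adenoma whose `mean_distance_to_group` to the normal profiles falls between those of the normals themselves and of the cancers would be "intermediate" in the sense used above.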
Expression of cell cycle genes associated with hypomethylated shores in cancer
Whole-genome analysis has demonstrated an inverse relationship between gene expression and methylation, especially at transcriptional start sites¹⁴. To study this relationship in small DMRs, we obtained public microarray gene expression data from cancer and normal colon samples (see Methods) and compared them with our sequencing data. We mapped 6,869 genes to DMRs within 2 kb of the gene’s transcription start site and observed the expected inverse relationship between DNA methylation and gene expression (r = −0.27, p < 2 × 10⁻¹⁶; Supplementary Fig. 15).
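The gene-to-DMR mapping and correlation described above can be sketched as follows. The sketch assumes DMRs are given as (chromosome, start, end, methylation difference) tuples and transcription start sites as (chromosome, position) pairs. The helper names are hypothetical. Taking the first matching DMR for each gene is a simplifying choice that may differ from the authors' procedure.

```python
import math


def map_genes_to_dmrs(dmrs, tss_by_gene, window=2_000):
    """Hypothetical helper: assign each gene the methylation difference of the
    first DMR lying within `window` bp of its transcription start site.

    dmrs: list of (chrom, start, end, meth_diff); tss_by_gene: gene -> (chrom, tss).
    """
    mapped = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        for d_chrom, start, end, diff in dmrs:
            if d_chrom == chrom and start - window <= tss <= end + window:
                mapped[gene] = diff
                break
    return mapped


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Pairing each mapped methylation difference with the same gene's cancer-versus-normal expression change and passing the two lists to `pearson_r` gives a correlation. A negative value corresponds to the inverse relationship reported above.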
We examined the inverse relationship between methylation and gene expression for each
category of small DMRs separately and noticed that the strongest relationship for
hypomethylated shores is due to methylation boundary shifts (Supplementary Table 9). We
performed gene ontology enrichment analysis¹⁵ for differentially expressed genes (FDR < 0.05), comparing those associated with hypomethylated boundary shifts to the other categories. The enriched categories (Supplementary Table 10) were dominated by mitosis- and cell-cycle-related genes, including CEP55, CCNB1, CDCA2, PRC1, CDC2, FBXO5, AURKA, CDK1, CDKN3, CDK7 and CDC20B (Supplementary Table 11).
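Gene ontology over-representation is commonly assessed with a one-sided hypergeometric test. The sketch below shows that test in standard-library Python, under a hypothetical function name. The cited analysis¹⁵ may apply additional conditioning or multiple-testing correction that is not described here, so treat this as an illustration of the general idea.

```python
from math import comb


def hypergeom_enrichment_p(k, n_selected, n_annotated, n_universe):
    """One-sided P(X >= k), X ~ Hypergeometric(n_universe, n_annotated, n_selected).

    k: selected genes carrying the GO term; n_selected: size of the gene list;
    n_annotated: genes in the universe carrying the term; n_universe: all genes tested.
    """
    upper = min(n_selected, n_annotated)
    hits = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_universe, n_selected)
```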
Increased variation in gene expression in hypomethylated blocks and DMRs
We compared across-subject methylation variability levels between cancer and normal,
within the blocks, and found a striking similarity to the cancer methylation hypervariability
found with the custom array (Fig. 1a–e compared to Supplementary Fig. 16). To study the
relationship to gene expression in colon cancer, we obtained public gene expression data
from cancer and normal samples (see Methods). Genes in the blocks were generally silenced
(80% of genes silenced in all samples) in both normal and cancer samples. Of the genes consistently transcribed in normal tissue, albeit at low levels, 36% are silenced in blocks in cancers, compared to 15% expected by chance. This is consistent with other reports in the literature, e.g. Frigola et al.¹⁶
More striking than these subtle differences in gene silencing, we found substantial enrichment in the hypomethylated blocks of genes exhibiting increased expression variability in cancer compared to normal samples. First, we ruled out the possibility that this increased variability was due to the potentially high cellular heterogeneity of cancer (Supplementary Fig. 17a). We then found a clear and statistically significant association between increased variability in expression of a gene and its location within a hypomethylated block (Supplementary Fig. 17b). For example, 26 of the 50 genes exhibiting the largest increase in expression variability were inside the blocks: 52%, compared to the 17% expected by chance (p = 3 × 10⁻⁹). Expression levels for 25 of these genes followed an interesting pattern: never expressed in normal samples, they were stochastically expressed in cancer (Fig. 5 and Supplementary Fig. 18). For example, the genes MMP3, MMP7, MMP10, SIM2, CHI3L1, STC1 and WISP (described in the Discussion) were expressed in 96%, 100%, 67%, 8%, 79%, 50% and 17% of the cancer samples, respectively, but never in normal samples (Supplementary Table 12).
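The ranking behind the "26 of the 50 genes" comparison can be sketched in two steps. First, order genes by the ratio of cancer to normal expression variance. Second, measure what fraction of the top genes lie in blocks. The same fraction computed over all genes gives the chance expectation. The helper names and the variance floor used for genes with constant normal expression are assumptions. This sketch does not reproduce the reported p value, whose test is not specified here.

```python
import statistics


def top_variability_genes(expr_cancer, expr_normal, k=50, floor=1e-12):
    """Genes ranked by cancer/normal expression variance ratio (largest first).

    expr_*: dict gene -> list of expression values across samples.
    `floor` (an assumed value) avoids division by zero when a gene's normal
    expression is constant, e.g. never expressed.
    """
    def variance_ratio(gene):
        v_normal = max(statistics.variance(expr_normal[gene]), floor)
        return statistics.variance(expr_cancer[gene]) / v_normal

    return sorted(expr_cancer, key=variance_ratio, reverse=True)[:k]


def fraction_in_blocks(genes, in_block):
    """Fraction of `genes` whose location is inside a hypomethylated block."""
    return sum(1 for g in genes if in_block[g]) / len(genes)
```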
Functional differences between hypomethylated blocks and PMDs
As noted above, the hypomethylated blocks we observed substantially overlapped the PMDs reported in a fibroblast cell line by Lister et al.¹¹ We examined the genomic regions where blocks and PMDs do not overlap to identify potential functional differences between them. We grouped these into two sets: 1) regions within the hypomethylated blocks but not in the PMDs (B+P−) and 2) regions within the PMDs but not in the hypomethylated blocks (B−P+). We obtained microarray gene expression data from fibroblast samples (see Methods) and, as expected, genes in the fibroblast PMDs were relatively silenced in the fibroblast samples (p < 2 × 10⁻¹⁶). Furthermore, genes that were silenced in fibroblast samples and consistently expressed in normal colon were enriched in the B−P+ regions (odds ratio 3.2, p < 2 × 10⁻¹⁶), while genes consistently silenced in colon and consistently expressed in fibroblast samples were enriched in the B+P− regions (odds ratio 2.8, p = 0.0004). Finally, the 50 hypervariable genes described above were markedly enriched in the B+P− regions (p = 0.00013) but showed no enrichment in the B−P+ regions. These results suggest that hypervariable gene expression in colon cancer may be related to the genes’ location in hypomethylated blocks.
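The B+P− and B−P+ sets are interval differences: blocks minus PMDs, and PMDs minus blocks. Below is a minimal sketch for one chromosome. It assumes half-open [start, end) coordinates and that each input list contains non-overlapping intervals; overlapping intervals should be merged first. The function name is hypothetical.

```python
def subtract_intervals(a, b):
    """Parts of the intervals in `a` not covered by any interval in `b`.

    Half-open [start, end) intervals on a single chromosome; intervals within
    `a` are assumed not to overlap each other.
    """
    b = sorted(b)
    result = []
    for start, end in sorted(a):
        cur = start  # left edge of the still-uncovered part of this interval
        for b_start, b_end in b:
            if b_end <= cur or b_start >= end:
                continue  # no overlap with the remaining part
            if b_start > cur:
                result.append((cur, b_start))
            cur = max(cur, b_end)
            if cur >= end:
                break
        if cur < end:
            result.append((cur, end))
    return result
```

With this helper, B+P− is `subtract_intervals(blocks, pmds)` and B−P+ is `subtract_intervals(pmds, blocks)`.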
DISCUSSION
In summary, we show that colon cancer cDMRs are generally involved in the common solid tumors of adulthood (lung, breast, thyroid and colon cancer) and in the most common solid tumor of childhood, Wilms tumor, with tight clustering of methylation levels in normal tissues and marked stochastic variation in cancers. Efforts to exploit DNA methylation for cancer screening focus on identifying narrowly defined cancer-specific profiles¹⁷. Our data suggest that future efforts might instead be directed at defining the cancer epigenome as the departure from a narrowly defined normal profile.
Surprisingly, two-thirds of all methylation changes in colon cancer involve hypomethylation
of large blocks, with consistent locations across samples, comprising more than half of the
genome. The functional relevance of these blocks is supported by the finding that genes in colon blocks but not in fibroblast PMDs tend to be silenced in colon but not in fibroblasts, and vice versa.
The most variably expressed genes in cancer are enriched in the blocks and include genes
associated with tumor heterogeneity and progression: three matrix metalloproteinase genes,
MMP3, MMP7 and MMP10 (ref. 18), and a fourth gene, SIM2, which acts through
metalloproteinases to promote tumor invasion19. Another, STC1, helps mediate the Warburg
effect, the reprogramming of tumor metabolism20. CHI3L1 encodes a secreted glycoprotein
associated with inflammatory responses and poor prognosis in multiple tumor types,
including colon cancer21. WISP genes are Wnt-1 targets thought to contribute to tissue
invasion in breast and colon cancer22. Our gene ontology enrichment analysis15 of genes with
hypervariable expression in blocks (FDR < 0.05) showed enrichment for categories including
extracellular matrix remodeling (Supplementary Table 13). One cautionary note raised by
these findings is that treating cancer patients with nonspecific DNA methylation inhibitors
could have the unintended consequence of activating tumor-promoting genes in
hypomethylated blocks. It is also important to note that, whereas previous studies23,24 have
reported large-region hypermethylation or no regional methylation change, this study is
based on whole-genome bisulfite sequencing. Nevertheless, future studies are needed to
determine whether block hypomethylation is a general feature of cancer epigenomes.
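The gene ontology analysis cited in this paragraph used GOstats (ref. 15), whose standard test for over-representation of a GO term is the hypergeometric test. The following is a minimal, standard-library Python sketch of that one-sided test, not the GOstats implementation; the gene counts in the usage line are hypothetical.

```python
from math import comb


def hypergeometric_upper_tail(k: int, n_selected: int, n_category: int, n_universe: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_universe, n_category, n_selected).

    X counts how many of the n_selected genes of interest (e.g. genes with
    hypervariable expression in blocks) are annotated to one GO category,
    when n_category of the n_universe background genes carry that annotation.
    A small value indicates over-representation of the category.
    """
    if not 0 <= n_category <= n_universe or not 0 <= n_selected <= n_universe:
        raise ValueError("category and selection sizes must lie in [0, n_universe]")
    upper = min(n_selected, n_category)
    tail = sum(
        comb(n_category, i) * comb(n_universe - n_category, n_selected - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / comb(n_universe, n_selected)


# Hypothetical numbers: 40 of 200 selected genes fall in a category that
# covers 500 of 10,000 background genes (expected count 10).
print(hypergeometric_upper_tail(40, 200, 500, 10_000))
```

Exact integer arithmetic via `math.comb` avoids overflow in the binomial coefficients; a real analysis would also correct the resulting p-values for the many GO terms tested.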
Small DMRs, while representing a relatively small fraction of the genome (0.3%), are
numerous (10,125) and frequently involve loss of DNA methylation boundaries at the
edges of CpG islands, shifting of DNA methylation boundaries, or the creation of novel
hypomethylated regions in CG-dense regions that are not canonical islands. These data
underscore the importance of hypomethylated CpG island shores in cancer, because shores
associated with hypomethylation and gene overexpression in cancer are enriched for
cell-cycle-related genes, suggesting a role in the unregulated growth that characterizes cancer.
We propose a model relating tissue-specific DMRs to the sites of methylation
hypervariability in cancer. Normal pluripotency might require stochastic gene expression at
some loci, allowing differentiation along alternative pathways in response to external
stimuli or even intrinsically. The epigenome could collaborate in creating such a permissive
state by changing its physical configuration to relax the stringency of epigenetic marks:
methylation levels pinned near 0% or 100% leave little room for variation, so variance
increases as levels move away from these extremes. A similar process may occur in cancer.
One way this could happen is by altering LOCKs, LADs or blocks, which could involve a change
in chromatin packing density or in proximity to the nuclear lamina. Similarly, subtle shifts in DNA
methylation boundaries near CpG islands may drive normal chromatin organization and
tissue-specific gene expression. Given the importance of boundary regions for both small
DMRs and large blocks identified in this study, it will be important to focus future
epigenetic investigations on the boundaries of blocks and CpG islands (shores), and on
genetic or epigenetic changes in genes encoding factors that interact with them.
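The statement that variance increases away from the extremes can be made concrete with a simple model that is our assumption, not the authors': treat each sequenced molecule at a CpG as methylated independently with probability p. The observed methylated fraction over n molecules then has variance p(1 - p)/n, which is zero at p = 0 and p = 1 and largest at p = 0.5. A short Python sketch:

```python
def methylation_fraction_variance(p: float, n_molecules: int = 1) -> float:
    """Variance of the observed methylated fraction at one CpG under a
    binomial model: n_molecules independent molecules, each methylated
    with probability p (a simplifying assumption, not the paper's model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_molecules < 1:
        raise ValueError("n_molecules must be positive")
    return p * (1.0 - p) / n_molecules


# Variance vanishes at fully unmethylated/methylated states and peaks at p = 0.5.
for p in (0.0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.0):
    print(f"p = {p:.2f}  variance (n = 1) = {methylation_fraction_variance(p):.4f}")
```

Under this toy model, a CpG that is tightly held at 0% or 100% in normal tissue cannot vary much, whereas relaxing it toward intermediate levels permits far more sample-to-sample variation.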
The increased methylation and expression variability in each cancer type is consistent with
the potential selective value of increased epigenetic plasticity in a varying environment, an
idea first suggested for evolution but applicable to the strong but variable selective forces under which
a cancer grows, such as varying oxygen tension or metastasis to a distant site25. Thus,
increased epigenetic heterogeneity in cancer at cDMRs (which we show are also tDMRs)
could underlie the ability of cancer cells to adapt rapidly to changing environments, such as
increased oxygen with neovascularization followed by decreased oxygen with necrosis, or
metastasis to a new intercellular milieu.
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Acknowledgments
We thank Applied Biosystems, Inc. for supplying reagents for the sequencing experiments, Bert Vogelstein and
Martha Zeiger for tumor samples, and Marvin Newhouse for computer assistance. This work was supported by NIH
Grants R37CA054358, R01HG005220, 5P50HG003233, F32CA138111, 5R01GM083084, and R01DA025779
(KZ).
References
1. Jones PA, Baylin SB. The fundamental role of epigenetic events in cancer. Nat Rev Genet. 2002;
3:415–28. [PubMed: 12042769]
2. Feinberg AP, Tycko B. The history of cancer epigenetics. Nat Rev Cancer. 2004; 4:143–53.
[PubMed: 14732866]
3. Esteller M. Epigenetics in cancer. N Engl J Med. 2008; 358:1148–59. [PubMed: 18337604]
4. Irizarry RA, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at
conserved tissue-specific CpG island shores. Nat Genet. 2009; 41:178–86. [PubMed: 19151715]
5. Doi A, et al. Differential methylation of tissue- and cancer-specific CpG island shores distinguishes
human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet. 2009;
41:1350–3. [PubMed: 19881528]
6. Irizarry RA, et al. Comprehensive high-throughput arrays for relative methylation (CHARM).
Genome Res. 2008; 18:780–90. [PubMed: 18316654]
7. Bibikova M, et al. High-throughput DNA methylation profiling using universal bead arrays.
Genome Res. 2006; 16:383–93. [PubMed: 16449502]
8. Feinberg AP, Gehrke CW, Kuo KC, Ehrlich M. Reduced genomic 5-methylcytosine content in
human colonic neoplasia. Cancer Res. 1988; 48:1159–61. [PubMed: 3342396]
9. Ehrlich M. DNA methylation in cancer: too much, but also too little. Oncogene. 2002; 21:5400–13.
[PubMed: 12154403]
10. Ogino S, et al. A cohort study of tumoral LINE-1 hypomethylation and prognosis in colon cancer.
J Natl Cancer Inst. 2008; 100:1734–8. [PubMed: 19033568]
11. Lister R, et al. Human DNA methylomes at base resolution show widespread epigenomic
differences. Nature. 2009; 462:315–22. [PubMed: 19829295]
12. Wen B, Wu H, Shinkai Y, Irizarry RA, Feinberg AP. Large histone H3 lysine 9 dimethylated
chromatin blocks distinguish differentiated from embryonic stem cells. Nat Genet. 2009; 41:246–50.
[PubMed: 19151716]
13. Hesselberth JR, et al. Global mapping of protein-DNA interactions in vivo by digital genomic
footprinting. Nat Methods. 2009; 6:283–9. [PubMed: 19305407]
14. Li Y, et al. The DNA Methylome of Human Peripheral Blood Mononuclear Cells. PLoS Biol.
2010; 8:e1000533. [PubMed: 21085693]
15. Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics.
2007; 23:257–8. [PubMed: 17098774]
16. Frigola J, et al. Epigenetic remodeling in colorectal cancer results in coordinate gene suppression
across an entire chromosome band. Nat Genet. 2006; 38:540–9. [PubMed: 16642018]
17. Gal-Yam EN, Saito Y, Egger G, Jones PA. Cancer epigenetics: modifications, screening, and
therapy. Annu Rev Med. 2008; 59:267–80. [PubMed: 17937590]
18. Yu AE, Hewitt RE, Connor EW, Stetler-Stevenson WG. Matrix metalloproteinases. Novel targets
for directed cancer therapy. Drugs Aging. 1997; 11:229–44. [PubMed: 9303281]
19. Aleman MJ, et al. Inhibition of Single Minded 2 gene expression mediates tumor-selective
apoptosis and differentiation in human colon cancer cells. Proc Natl Acad Sci U S A. 2005;
102:12765–70. [PubMed: 16129820]
20. Yeung HY, et al. Hypoxia-inducible factor-1-mediated activation of stanniocalcin-1 in human
cancer cells. Endocrinology. 2005; 146:4951–60. [PubMed: 16109785]
21. Eurich K, Segawa M, Toei-Shimizu S, Mizoguchi E. Potential role of chitinase 3-like-1 in
inflammation-associated carcinogenic changes of epithelial cells. World J Gastroenterol. 2009;
15:5249–59. [PubMed: 19908331]
22. Fischer H, et al. COL11A1 in FAP polyps and in sporadic colorectal tumors. BMC Cancer. 2001;
1:17. [PubMed: 11707154]
23. Clark SJ. Action at a distance: epigenetic silencing of large chromosomal regions in
carcinogenesis. Hum Mol Genet. 2007; 16:R88–R95. [PubMed: 17613553]
24. Feber A, et al. Comparative methylome analysis of benign and malignant peripheral nerve sheath
tumours. Genome Res. 2011.
25. Feinberg AP, Irizarry RA. Stochastic epigenetic variation as a driving force of development,
evolutionary adaptation, and disease. Proc Natl Acad Sci U S A. 2010;
107:1757.
26. Zilliox MJ, Irizarry RA. A gene expression bar code for microarray data. Nat Methods. 2007;
4:911–3. [PubMed: 17906632]
27. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high
density oligonucleotide array data based on variance and bias. Bioinformatics. 2003; 19:185–93.
[PubMed: 12538238]
28. Leek JT, et al. Tackling the widespread and critical impact of batch effects in high-throughput data.
Nat Rev Genet. 2010; 11:733–9. [PubMed: 20838408]
29. Aryee MJ, et al. Accurate genome-scale percentage DNA methylation estimates from microarray
data. Biostatistics. 2011; 12:197–210. [PubMed: 20858772]
30. Bormann Chung CA, et al. Whole methylome analysis by ultra-deep sequencing using two-base
encoding. PLoS One. 2010; 5:e9320. [PubMed: 20179767]
31. Deng J, et al. Targeted bisulfite sequencing reveals changes in DNA methylation associated with
nuclear reprogramming. Nat Biotechnol. 2009; 27:353–60. [PubMed: 19330000]
32. Xi Y, Li W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics.
2009; 10:232. [PubMed: 19635165]
33. Eckhardt F, et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet.
2006; 38:1378–85. [PubMed: 17072317]
34. Loader C. Local regression and likelihood. Springer-Verlag; 1999.
35. Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet.
2000; 16:418–20. [PubMed: 10973072]
36. Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of
array-based DNA copy number data. Biostatistics. 2004; 5:557–72. [PubMed: 15475419]
37. Sabates-Bellver J, et al. Transcriptome profile of human colorectal adenomas. Mol Cancer Res.
2007; 5:1263–75. [PubMed: 18171984]
38. Gyorffy B, Molnar B, Lage H, Szallasi Z, Eklund AC. Evaluation of microarray preprocessing
algorithms based on concordance with RT-PCR in clinical samples. PLoS One. 2009; 4:e5645.
[PubMed: 19461970]
39. Galamb O, et al. Reversal of gene expression changes in the colorectal normal-adenoma pathway
by NS398 selective COX2 inhibitor. Br J Cancer. 2010; 102:765–73. [PubMed: 20087348]
40. Smith JC, Boone BE, Opalenik SR, Williams SM, Russell SB. Gene profiling of keloid fibroblasts
shows altered expression in multiple fibrosis-associated pathways. J Invest Dermatol. 2008;
128:1298–310. [PubMed: 17989729]
41. Chen Y, et al. Developing and applying a gene functional association network for anti-angiogenic
kinase inhibitor activity assessment in an angiogenesis co-culture model. BMC Genomics. 2008;
9:264. [PubMed: 18518970]
42. Duarte TL, Cooke MS, Jones GD. Gene expression profiling reveals new protective roles for
vitamin C in human skin cells. Free Radic Biol Med. 2009; 46:78–87. [PubMed: 18973801]
Figure 1. Increased methylation variance of common CpG sites across human cancer types
Methylation levels measured at 384 CpG sites using a custom Illumina array exhibit an
increase in across-sample variability in (a) colon, (b) lung, (c) breast, (d) thyroid, and (e)
kidney (Wilms tumor) cancers. Each panel shows the across-sample standard deviation of
methylation level for each CpG in normal and matched cancer samples. The solid line is the
identity line; CpGs above this line have greater variability in cancer. The dashed line
indicates the threshold at which differences in methylation variance become significant (F-
test at 99% level). In all five tissue types, the vast majority of CpGs are above the solid line,
indicating that variability is larger in cancer samples than in normal. Colors indicate the
location of each CpG with respect to canonical annotated CpG islands. (f) Using the CpGs
that showed the largest increase in variability, we performed hierarchical clustering on the
normal samples. The heatmap of the methylation values for these CpGs clearly distinguishes
the tissue types, indicating that these sites of increased methylation heterogeneity in cancer
are tissue-specific DMRs.
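The per-CpG comparison in Figure 1 can be sketched as follows, assuming methylation values (between 0 and 1) for one CpG across the normal and cancer samples. The one-sided F statistic is the ratio of sample variances, with (n_cancer - 1, n_normal - 1) degrees of freedom; the 99% critical value is taken as an input here (for example from `scipy.stats.f.ppf(0.99, dfn, dfd)`) to keep the sketch dependency-free. Because F = (SD_cancer / SD_normal)^2, the rejection cutoff on an SD-versus-SD plot is a line through the origin with slope sqrt(F_crit), provided every CpG has the same sample sizes. The ranking helper `most_increased` for panel (f) is hypothetical; the caption does not state how many CpGs were used.

```python
from statistics import variance


def variance_ratio(cancer: list[float], normal: list[float]) -> tuple[float, int, int]:
    """One-sided F statistic for 'more variable in cancer' at one CpG.

    Returns (F, df_numerator, df_denominator) with F = s_cancer^2 / s_normal^2.
    """
    return variance(cancer) / variance(normal), len(cancer) - 1, len(normal) - 1


def cancer_sd_cutoff(sd_normal: float, f_critical: float) -> float:
    """Cancer SD above which the F-test rejects for a given normal SD.

    Since F = (sd_cancer / sd_normal) ** 2, the cutoff is sd_normal * sqrt(F_crit).
    """
    return sd_normal * f_critical ** 0.5


def most_increased(cpg_ids: list[str], f_stats: list[float], k: int) -> list[str]:
    """Hypothetical helper: the k CpGs with the largest variance ratios."""
    ranked = sorted(zip(cpg_ids, f_stats), key=lambda pair: pair[1], reverse=True)
    return [cpg_id for cpg_id, _ in ranked[:k]]


f_stat, dfn, dfd = variance_ratio([0.1, 0.5, 0.9], [0.4, 0.5, 0.6])
print(f"F = {f_stat:.1f} on ({dfn}, {dfd}) df")
```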
Figure 2. Large hypomethylated genomic blocks in human colon cancer
Shown in (a) and (b) are smoothed methylation values from bisulfite sequencing data for
cancer samples (red) and normal samples (blue) in two genomic regions. The
hypomethylated blocks are shown with pink shading. Grey bars indicate the location of
PMDs, LOCKs, LADs, CpG islands and gene exons. Note that the blocks coincide with the
PMDs, LOCKs and LADs in panel (a) but not in panel (b). Small hypermethylated blocks,
which account for 3% of the blocks, are also visible at the right edge. (c) The
distribution of high-frequency smoothed methylation values for the normal samples (blue)
versus the cancer samples (red) demonstrates global hypomethylation of cancer compared to
normal. (d) The distribution of methylation values in the blocks (solid lines) and outside the
blocks (dashed lines) for normal samples (blue) and cancer samples (red). Note that while the
normal and cancer distributions are similar outside the blocks, within the blocks methylation
values for cancer exhibit a general shift toward lower methylation. (e) Distribution of
methylation differences between cancer and normal samples stratified by inclusion in
repetitive DNA and blocks. Inside the blocks, the average difference was ~20% in both
repeat and non-repeat areas. Outside the blocks, the average difference was ~0% in repeat
and non-repeat areas, indicating that blocks rather than repeats account for the observed
differences in DNA methylation.
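Panel (e) compares cancer-minus-normal differences within four strata defined by block and repeat membership. A minimal Python sketch of that stratification, assuming each CpG has already been annotated with two booleans (inside a block, inside a repeat) and with smoothed methylation values for cancer and normal; the record layout and example values are hypothetical, and the sketch reports only stratum means, whereas the figure shows full distributions.

```python
from collections import defaultdict
from statistics import fmean

# One record per CpG: (in_block, in_repeat, cancer_methylation, normal_methylation)
CpGRecord = tuple[bool, bool, float, float]


def stratified_mean_difference(records: list[CpGRecord]) -> dict[tuple[bool, bool], float]:
    """Mean (cancer - normal) methylation difference per (in_block, in_repeat) stratum."""
    differences: dict[tuple[bool, bool], list[float]] = defaultdict(list)
    for in_block, in_repeat, cancer, normal in records:
        differences[(in_block, in_repeat)].append(cancer - normal)
    return {stratum: fmean(values) for stratum, values in differences.items()}


example = [
    (True, True, 0.50, 0.70),    # repeat CpG inside a block
    (True, False, 0.45, 0.65),   # non-repeat CpG inside a block
    (False, True, 0.80, 0.80),   # repeat CpG outside blocks
    (False, False, 0.90, 0.91),  # non-repeat CpG outside blocks
]
for (in_block, in_repeat), mean_diff in sorted(stratified_mean_difference(example).items()):
    print(f"block={in_block!s:5} repeat={in_repeat!s:5} mean difference={mean_diff:+.2f}")
```

Stratifying by both annotations at once is what lets the comparison separate a block effect from a repeat effect: if repeats drove the loss of methylation, the repeat strata would differ regardless of block membership.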
Figure 3. Loss of methylation stability at small DMRs
Methylation estimates plotted against genomic location for normal samples (blue) and
cancer samples (red). The small DMR locations are shaded pink. Grey bars indicate the
location of blocks, CpG islands, and gene exons. Tick marks along the bottom axis indicate
the location of CpGs. Pictured are examples of (a) a methylation boundary shift outward, (b)
a methylation boundary shift inward, (c) a loss of methylation boundary, and (d) a novel
hypomethylated DMR.
Figure 4. Adenomas show intermediate methylation variability
(a) Multidimensional scaling of pairwise distances derived from methylation levels assayed
on a custom Illumina array. Note that cancer samples (red) mostly lie far from the tight
cluster of normal samples (blue), whereas adenoma samples (black) span a range of
distances: some are as close to the cluster as the normal samples themselves, others are as
far away as cancer samples, and many lie at intermediate distances. (b) Multidimensional scaling of pairwise distances
derived from average methylation values in blocks identified via bisulfite sequencing.
Matching sequenced adenoma samples (labeled 1 and 2) appear in the same locations
relative to the cluster of normal samples in both (a) and (b). (c) Methylation values for
normal (blue), cancer (red) and two adenoma samples (black). Adenoma 1, which appeared
closer to normal samples in the multidimensional scaling analysis (a), follows a similar
methylation pattern to the normal samples. However, in some regions (shaded with pink)
differences between Adenoma 1 and the normal samples are observed. Adenoma 2 shows a
similar pattern to cancers.
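Multidimensional scaling places samples in two dimensions so that their pairwise distances are preserved as well as possible. A simpler quantity that captures the near, intermediate and far pattern described for Figure 4 is each sample's distance to the centroid of the normal samples. The Python sketch below computes that stand-in (it is not MDS and not the authors' distance definition); the methylation profiles are hypothetical.

```python
from math import dist
from statistics import fmean


def centroid(profiles: list[list[float]]) -> list[float]:
    """Coordinate-wise mean of equal-length methylation profiles."""
    return [fmean(values) for values in zip(*profiles)]


def distance_to_normal(sample: list[float], normal_profiles: list[list[float]]) -> float:
    """Euclidean distance from one sample's profile to the normal centroid."""
    return dist(sample, centroid(normal_profiles))


normals = [[0.10, 0.80, 0.50], [0.12, 0.78, 0.52], [0.08, 0.82, 0.48]]
samples = {
    "adenoma (near normal)": [0.11, 0.79, 0.50],
    "adenoma (intermediate)": [0.25, 0.60, 0.40],
    "cancer": [0.45, 0.35, 0.20],
}
for name, profile in samples.items():
    print(f"{name}: {distance_to_normal(profile, normals):.3f}")
```

A one-number summary like this loses the directional information MDS retains (two samples equally far from normal may differ from it in different regions), which is one reason the figure uses a two-dimensional embedding.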
Figure 5. High variability of gene expression associated with blocks
(a) An example of hypervariably expressed genes contained within a block; note genes
MMP7, MMP10, and MMP3 highlighted in red. Methylation values for cancer samples (red)
and normal samples (blue) with hypomethylated block locations highlighted (pink shading)
are plotted against genomic location. Grey bars are as in Fig. 2. (b) Standardized log
expression values for 26 hypervariable genes in cancer located within hypomethylated block
regions (normal samples in blue, cancer samples in red). Standardization was performed
using the gene expression barcode. Genes with standardized expression values below 2.54
(horizontal dashed line), approximately the 99.5th percentile of a standard normal distribution,
are called silenced by the barcode method26. Vertical dashed lines separate the values for the different
genes. Note there is consistent expression silencing in normal samples compared to
hypervariable expression in cancer samples. A similar plot drawn from an alternative GEO
dataset is shown in Supplementary Figure 18.
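Given expression values already standardized by the barcode method, the silencing call in Figure 5(b) reduces to a threshold comparison. The Python sketch below assumes such standardized values as input (it does not reproduce the barcode standardization itself) and uses the 2.54 cutoff quoted in the caption; the check `silenced_in_normal_variable_in_cancer` is a hypothetical illustration of the contrast shown in the figure, not the statistical criterion used to select hypervariable genes.

```python
SILENCED_BELOW = 2.54  # standardized-expression cutoff quoted in the Figure 5 caption


def expressed_fraction(standardized: list[float], threshold: float = SILENCED_BELOW) -> float:
    """Fraction of samples in which a gene is not called silenced."""
    calls = [value >= threshold for value in standardized]
    return sum(calls) / len(calls)


def silenced_in_normal_variable_in_cancer(normal: list[float], cancer: list[float]) -> bool:
    """Hypothetical pattern check: silenced in every normal sample, but
    expressed in some (not all) cancer samples."""
    return expressed_fraction(normal) == 0.0 and 0.0 < expressed_fraction(cancer) < 1.0


normal_z = [0.4, 1.1, -0.3, 0.9]
cancer_z = [0.2, 4.8, 1.5, 7.3]
print(expressed_fraction(normal_z), expressed_fraction(cancer_z))
print(silenced_in_normal_variable_in_cancer(normal_z, cancer_z))
```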
Table 1
Genomic features of Differentially Methylated Regions (DMRs) in colon cancer
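| Region type | N | No. of CpGs | Genomic size | Median size (bp) | Overlap with CpG islands | Overlap with shores | Overlap with RefSeq mRNA TSSs |
|---|---|---|---|---|---|---|---|
| Normal genome (reference) | N/A | 28.2M | 3.10 Gb | N/A | 27.7K | 55.4K | 36,983 |
| Hypomethylated blocks | 13,540 | 16.2M | 1.95 Gb | 39,412 | 17.6% | 26.8% | 10,453 |
| Hypermethylated blocks | 2,871 | 485K | 35.8 Mb | 9,213 | 13.4% | 36.4% | 976 |
| Hypomethylated small DMRs | 4,315 | 59.5K | 2.91 Mb | 401 | 2.2% | 51.0% | 1,708 |
| Hypomethylated small DMRs: novel hypomethylated | 448 | 8.35K | 367 kb | 658 | 2.9% | 19.9% | 30 |
| Hypomethylated small DMRs: shift of methylation boundary | 1,516 | 17.5K | 741 kb | 261 | 2.1% | 92.8% | 1,313 |
| Hypomethylated small DMRs: other | 2,351 | 33.7K | 1.80 Mb | 479 | 2.1% | 29.9% | 368 |
| Hypermethylated small DMRs | 5,810 | 403K | 6.14 Mb | 820 | 67.2% | 17.0% | 3,068 |
| Hypermethylated small DMRs: loss of boundary* | 1,756 | 165K | 2.36 Mb | 1,159 | 80.9% | 3.4% | 1,091 |
| Hypermethylated small DMRs: shift of methylation boundary | 1,774 | 96.3K | 1.40 Mb | 502 | 60.3% | 33.0% | 1,027 |
| Hypermethylated small DMRs: other | 2,280 | 142K | 2.38 Mb | 769 | 62.2% | 15.1% | 983 |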
*As described in the text, loss-of-boundary DMRs were associated with an increase in methylation in the CpG island and a decrease in methylation in the adjacent shore. We score each of these as a single event and
classify it here as hypermethylated because the islands contain more CpGs than the shores.
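Several figures quoted in the text can be checked against Table 1. A short Python sketch using only values from the table: the hypomethylated blocks cover 1.95 of 3.10 Gb (about 63%, i.e. more than half of the genome), and the hypo- and hypermethylated small DMRs together number 10,125 and cover about 0.3% of the genome. The subtype counts also add up to their class totals.

```python
GENOME_BP = 3.10e9  # "Normal genome (reference)" row

hypo_block_bp = 1.95e9
small_dmr_bp = {"hypomethylated": 2.91e6, "hypermethylated": 6.14e6}
small_dmr_subtypes = {
    "hypomethylated": {"novel hypomethylated": 448, "shift of boundary": 1_516, "other": 2_351},
    "hypermethylated": {"loss of boundary": 1_756, "shift of boundary": 1_774, "other": 2_280},
}
small_dmr_totals = {"hypomethylated": 4_315, "hypermethylated": 5_810}

# Each class total should equal the sum of its subtype counts.
for direction, subtypes in small_dmr_subtypes.items():
    assert sum(subtypes.values()) == small_dmr_totals[direction], direction

hypo_block_fraction = hypo_block_bp / GENOME_BP
small_dmr_fraction = sum(small_dmr_bp.values()) / GENOME_BP
n_small_dmrs = sum(small_dmr_totals.values())

print(f"hypomethylated blocks: {hypo_block_fraction:.1%} of the genome")
print(f"small DMRs: {n_small_dmrs:,} regions, {small_dmr_fraction:.2%} of the genome")
```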
Table 2
Methylation values* observed in CpG islands in cancer compared to normal samples
| Methylation status in normals | Total islands | Hypo | No change | Hyper |
|---|---|---|---|---|
| Unmethylated (≤ 0.2) | 16,184 | 0.1% | 83.2% | 16.7% |
| Partially methylated (≥ 0.2 and ≤ 0.8) | 4,796 | 17.0% | 46.7% | 36.3% |
| Methylated (≥ 0.8) | 5,527 | 24.0% | 75.9% | 0.1% |
*Methylation values were first averaged within each island and then averaged across subjects, separately for cancer and normal samples.
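Table 2 bins each CpG island by its average methylation in normal samples and then tabulates the direction of change in cancer. A minimal Python sketch of that tabulation, assuming per-island averages as input. Two choices here are assumptions: the table's bins share the 0.2 and 0.8 endpoints, so this sketch assigns those endpoints to the outer bins, and the 0.1 change cutoff (`min_delta`) is hypothetical because the criterion behind the hypo/hyper calls is not given with the table.

```python
from collections import Counter, defaultdict


def normal_status(mean_normal: float) -> str:
    """Table 2 bin for an island's average methylation in normal samples."""
    if mean_normal <= 0.2:
        return "unmethylated"
    if mean_normal >= 0.8:
        return "methylated"
    return "partially methylated"


def change_call(mean_cancer: float, mean_normal: float, min_delta: float = 0.1) -> str:
    """Direction of change in cancer; min_delta is a hypothetical cutoff."""
    delta = mean_cancer - mean_normal
    if delta <= -min_delta:
        return "hypo"
    if delta >= min_delta:
        return "hyper"
    return "no change"


def tabulate(islands: list[tuple[float, float]]) -> dict[str, tuple[int, dict[str, float]]]:
    """islands: (mean_normal, mean_cancer) per island.
    Returns {status: (total islands, {call: percent of that row})}."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for mean_normal, mean_cancer in islands:
        counts[normal_status(mean_normal)][change_call(mean_cancer, mean_normal)] += 1
    table = {}
    for status, calls in counts.items():
        total = sum(calls.values())
        table[status] = (total, {call: 100.0 * n / total for call, n in calls.items()})
    return table


print(tabulate([(0.05, 0.06), (0.10, 0.45), (0.50, 0.20), (0.90, 0.88)]))
```

Because percentages are computed within each normal-status row, each row sums to 100%, matching the layout of Table 2; note that islands already near 0 can essentially only gain methylation and islands near 1 can only lose it, which is reflected in the 0.1% entries.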
... 89 The carcinogenic process features a marked reduction in these heterochromatin regions and DNA methylation, giving rise to erratic gene activity (Table 1). 90 Hypermethylation of promoter-associated CGIs, being the most extensively studied epigenetic change in tumorigenesis, is chiefly linked to the transcriptional silencing of TSGs and mismatch repair genes pivotal in numerous cancerrelated pathways. 91 The discovery of DNA methylation in the promoter region of retinoblastoma TSG (RB1) marked a significant milestone. ...
... These alterations can initiate stochastic epigenetic variations within these susceptible domains early in cancer development. 90 Furthermore, cancer frequently involves mutations in epigenetic modifiers and modulators, or these components may relay signals from oncogenic pathways, indirectly altering chromatin modifications locally or globally to further tumor progression. Chromatin states at epigenetic mediator genes are especially prone to disruption by cancer-predisposing environmental factors. ...
Article
Full-text available
Epigenetic modifications are defined as heritable changes in gene activity that do not involve changes in the underlying DNA sequence. The oncogenic process is driven by the accumulation of alterations that impact genome's structure and function. Genetic mutations, which directly disrupt the DNA sequence, are complemented by epigenetic modifications that modulate gene expression, thereby facilitating the acquisition of malignant characteristics. Principals among these epigenetic changes are shifts in DNA methylation and histone mark patterns, which promote tumor development and metastasis. Notably, the reversible nature of epigenetic alterations, as opposed to the permanence of genetic changes, positions the epigenetic machinery as a prime target in the discovery of novel therapeutics. Our review delves into the complexities of epigenetic regulation, exploring its profound effects on tumor initiation, metastatic behavior, metabolic pathways, and the tumor microenvironment. We place a particular emphasis on the dysregulation at each level of epigenetic modulation, including but not limited to, the aberrations in enzymes responsible for DNA methylation and histone modification, subunit loss or fusions in chromatin remodeling complexes, and the disturbances in higher‐order chromatin structure. Finally, we also evaluate therapeutic approaches that leverage the growing understanding of chromatin dysregulation, offering new avenues for cancer treatment.
... Differences in methylation levels between A/T-rich and G/C-rich regions have been observed to vary by cell type. Somatic cells and embryonic stem cells show higher methylation in A/T-rich regions, in contrast to placenta, IMR-90 fibroblasts, and cancer cell lines, which show higher methylation in G/C-rich regions (Lister et al, 2009;Hansen et al, 2011;Hon et al, 2012Hon et al, , 2013Schultz et al, 2015). It has been suggested that genomic base composition may influence the formation of distinct methylome profiles during cellular differentiation (Quante & Bird, 2016;Liu et al, 2018). ...
... Class III profiles were found in trophoblasts, early epiblasts, prospermatogonia, and oocytes, all undergoing global DNA methylation ( Fig 1C). This class also included placenta, fibroblasts, and cultured cancer cell lines (Fig S1B), which are known to contain PMDs (Lister et al, 2009;Hansen et al, 2011;Hon et al, 2012;. These results illustrate a clear relationship between megabase-scale methylome patterns (A) Representative methylome patterns for each of the three classes. ...
Article
Full-text available
DNA methylation is an essential epigenetic mechanism that regulates cellular reprogramming and development. Studies using whole-genome bisulfite sequencing have revealed distinct DNA methylome landscapes in human and mouse cells and tissues. However, the factors responsible for the differences in megabase-scale methylome patterns between cell types remain poorly understood. By analyzing publicly available 258 human and 301 mouse whole-genome bisulfite sequencing datasets, we reveal that genomic regions rich in guanine and cytosine, when located near the nuclear center, are highly susceptible to both global DNA demethylation and methylation events during embryonic and germline reprogramming. Furthermore, we found that regions that generate partially methylated domains during global DNA methylation are more likely to resist global DNA demethylation, contain high levels of adenine and thymine, and are adjacent to the nuclear lamina. The spatial properties of genomic regions, influenced by their guanine–cytosine content, are likely to affect the accessibility of molecules involved in DNA (de)methylation. These properties shape megabase-scale DNA methylation patterns and change as cells differentiate, leading to the emergence of different megabase-scale methylome patterns across cell types.
... Several items have recently observed widespread distinguishable methylated region across the genome in aging skin [71][72][73]. And these data were consistent with previous report showing substantial hypomethylation in common skin cancers such as squamous cell carcinoma and basal cell carcinoma [74,75]. Remarkably, the overall level of DNA methylation and the expression level of DNA methyl-transferase1(DNMT1) decreased in the process of cellular aging. ...
... The hypomethylation of promoter regions and non-coding repeats of proto-oncogenes causes gene activation and destabilizes the genome structure, resulting in abnormal cell proliferation. 33 In the past, DNA methylation was assumed to be irreversible. Nevertheless, studies conducted over the past decade have revealed that gene expression and chromosome stability depend on a dynamic balance between methylation initiation, methylation maintenance, and demethylation. ...
Article
Full-text available
Background: Ovarian cancer stands as the deadliest malignant tumor within the female reproductive tract. As a result of the absence of effective diagnostic and monitoring markers, 75% of ovarian cancer cases are diagnosed at a late stage, leading to a mere 50% survival rate within five years. The advancement of molecular biology is essential for accurate diagnosis and treatment of ovarian cancer. Methods: A review of several randomized clinical trials, focusing on the ovarian cancer, was undertaken. The advancement of molecular biology and diagnostic methods related to accurate diagnosis and treatment of ovarian cancer were examined. Results: Liquid biopsy is an innovative method of detecting malignant tumors that has gained increasing attention over the past few years. Cell-free DNA assay-based liquid biopsies show potential in delineating tumor status heterogeneity and tracking tumor recurrence. DNA methylation influences a multitude of biological functions and diseases, especially during the initial phases of cancer. The cell-free DNA methylation profiling system has emerged as a sensitive and non-invasive technique for identifying and detecting the biological origins of cancer. It holds promise as a biomarker, enabling early screening, recurrence monitoring, and prognostic evaluation of cancer. Conclusions: This review evaluates recent advancements and challenges associated with cell-free DNA methylation analysis for the diagnosis, prognosis monitoring, and assessment of therapeutic responses in the management of ovarian cancers, aiming to offer guidance for precise diagnosis and treatment of this disease.
... In addition, DNA methylation marks function as genome stabilizers by silencing transposable elements 34,38 . The main ways DNA methylation is altered in cancer include genome-wide hypomethylation in repetitive elements like retrotransposable elements 39,40 , hypermethylation of promoters [40][41][42][43] , and propensity for cytosines in CpG contexts to be mutated [44][45][46][47] . ...
Article
Full-text available
Although intratumoral heterogeneity has been established in pediatric central nervous system tumors, epigenomic alterations at the cell type level have largely remained unresolved. To identify cell type-specific alterations to cytosine modifications in pediatric central nervous system tumors, we utilize a multi-omic approach that integrated bulk DNA cytosine modification data (methylation and hydroxymethylation) with both bulk and single-cell RNA-sequencing data. We demonstrate a large reduction in the scope of significantly differentially modified cytosines in tumors when accounting for tumor cell type composition. In the progenitor-like cell types of tumors, we identify a preponderance differential Cytosine-phosphate-Guanine site hydroxymethylation rather than methylation. Genes with differential hydroxymethylation, like histone deacetylase 4 and insulin-like growth factor 1 receptor, are associated with cell type-specific changes in gene expression in tumors. Our results highlight the importance of epigenomic alterations in the progenitor-like cell types and its role in cell type-specific transcriptional regulation in pediatric central nervous system tumors.
... Thus, in CIN2+/CC these sites are identifiable via the usual feature selection paradigm based on testing for a difference in average DNAm. In this regard it is worth noting that although a number of studies have suggested inherently stochastic and increased DNAm variation in invasive cancer [59,60], the reality is that many CpGs do display DNAm changes across a larger proportion of tumours, suggesting a less stochastic pattern compared to precursor lesions. In line with this, the dynamic patterns of DNAm change during carcinogenesis were subsequently studied in terms of epigenetic diversity, loosely defined by the magnitude of inter-CpG covariances, indicating that epigenetic clonal diversity may be maximal in the stage immediately prior to the onset of invasive cancer (figure 2c) [61,62]. ...
Article
Full-text available
Epigenetic changes are known to accrue in normal cells as a result of ageing and cumulative exposure to cancer risk factors. Increasing evidence points towards age-related epigenetic changes being acquired in a quasi-stochastic manner, and that they may play a causal role in cancer development. Here, I describe the quasi-stochastic nature of DNA methylation (DNAm) changes in ageing cells as well as in normal cells at risk of neoplastic transformation, discussing the implications of this stochasticity for developing cancer risk prediction strategies, and in particular, how it may require a conceptual paradigm shift in how we select cancer risk markers. I also describe the mounting evidence that a significant proportion of DNAm changes in ageing and cancer development are related to cell proliferation, reflecting tissue-turnover and the opportunity this offers for predicting cancer risk via the development of epigenetic mitotic-like clocks. Finally, I describe how age-associated DNAm changes may be causally implicated in cancer development via an irreversible suppression of tissue-specific transcription factors that increases epigenetic and transcriptomic entropy, promoting a more plastic yet aberrant cancer stem-cell state. This article is part of a discussion meeting issue ‘Causes and consequences of stochastic processes in development and disease’.
... Further analyses in a separate cohort of recent-onset TMD revealed patterns of differential methylation that may determine persistence or resolution of orofacial pain. As regions of consistently differential methylation are more likely to be functional than single CpG sites, 25,30 we focused our followup analyses on the 6 DMRs associated with chronic painful TMD and their neighboring genes, integrating the methylation data sets with other -omic data exploring genetic variation and transcriptomic features. These analyses provided strong convergent evidence to support the physiological significance and regulatory function of these DMRs. ...
Article
Full-text available
Temporomandibular disorders (TMDs), collectively representing one of the most common chronic pain conditions, have a substantial genetic component, but genetic variation alone has not fully explained the heritability of TMD risk. Reasoning that the unexplained heritability may be because of DNA methylation, an epigenetic phenomenon, we measured genome-wide DNA methylation using the Illumina MethylationEPIC platform with blood samples from participants in the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study. Associations with chronic TMD used methylation data from 496 chronic painful TMD cases and 452 TMD-free controls. Changes in methylation between enrollment and a 6-month follow-up visit were determined for a separate sample of 62 people with recent-onset painful TMD. More than 750,000 individual CpG sites were examined for association with chronic painful TMD. Six differentially methylated regions were significantly ( P < 5 × 10 ⁻⁸ ) associated with chronic painful TMD, including loci near genes involved in the regulation of inflammatory and neuronal response. A majority of loci were similarly differentially methylated in acute TMD consistent with observed transience or persistence of symptoms at follow-up. Functional characterization of the identified regions found relationships between methylation at these loci and nearby genetic variation contributing to chronic painful TMD and with gene expression of proximal genes. These findings reveal epigenetic contributions to chronic painful TMD through methylation of the genes FMOD , PM20D1 , ZNF718 , ZFP57 , and RNF39 , following the development of acute painful TMD. Epigenetic regulation of these genes likely contributes to the trajectory of transcriptional events in affected tissues leading to resolution or chronicity of pain.
Article
Aim: To predict base-resolution DNA methylation in cancerous and paracancerous tissues. Material & methods: We collected six cancer DNA methylation datasets from The Cancer Genome Atlas and five cancer datasets from Gene Expression Omnibus and established machine learning models using paired cancerous and paracancerous tissues. Tenfold cross-validation and independent validation were performed to demonstrate the effectiveness of the proposed method. Results: The developed cross-tissue prediction models can substantially increase the accuracy at more than 68% of CpG sites and contribute to enhancing the statistical power of differential methylation analyses. An XGBoost model leveraging multiple correlating CpGs may elevate the prediction accuracy. Conclusion: This study provides a powerful tool for DNA methylation analysis and has the potential to gain new insights into cancer research from epigenetics.
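To make the cross-validation step more concrete, here is a minimal sketch of tenfold cross-validation for predicting a CpG site's methylation level (beta value) in cancerous tissue from the same site in the paired paracancerous tissue. It assumes a single-predictor ordinary least-squares model as a stand-in for the study's machine learning models (the abstract also mentions XGBoost), contiguous folds, and mean absolute error as the score; `fit_line` and `kfold_mae` are hypothetical helper names, not part of the authors' tool.

```python
"""Hedged sketch: k-fold cross-validation of a one-predictor linear model.

Assumption: simple least squares stands in for the study's own models;
x = paracancerous beta value, y = cancerous beta value at one CpG site.
"""
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def kfold_mae(xs, ys, k=10):
    """Mean absolute error of held-out predictions over k contiguous folds."""
    n = len(xs)
    errors = []
    for fold in range(k):
        # Indices held out in this fold; the rest are used for fitting.
        test = set(range(fold * n // k, (fold + 1) * n // k))
        train = [i for i in range(n) if i not in test]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(abs(ys[i] - (a + b * xs[i])) for i in test)
    return mean(errors)
```

Swapping `fit_line` for a stronger learner, or adding correlated neighboring CpGs as extra predictors, changes only the fitting step; the fold loop stays the same.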
Article
The correct establishment of DNA methylation patterns is vital for mammalian development and is achieved by the de novo DNA methyltransferases DNMT3A and DNMT3B. DNMT3B localises to H3K36me3 at actively transcribing gene bodies via its PWWP domain. It also functions at heterochromatin through an unknown recruitment mechanism. Here, we find that knockout of DNMT3B causes loss of methylation predominantly at H3K9me3-marked heterochromatin and that DNMT3B PWWP domain mutations or deletion result in striking increases of methylation in H3K9me3-marked heterochromatin. Removal of the N-terminal region of DNMT3B affects its ability to methylate H3K9me3-marked regions. This region of DNMT3B directly interacts with HP1α and facilitates the bridging of DNMT3B with H3K9me3-marked nucleosomes in vitro. Our results suggest that DNMT3B is recruited to H3K9me3-marked heterochromatin in a PWWP-independent manner that is facilitated by the protein’s N-terminal region through an interaction with a key heterochromatin protein. More generally, we suggest that DNMT3B plays a role in DNA methylation homeostasis at heterochromatin, a process which is disrupted in cancer, aging and Immunodeficiency, Centromeric Instability and Facial Anomalies (ICF) syndrome.
Chapter
Tumor cells evolve through space and time, generating genetically and phenotypically diverse cancer cell populations that are continually subjected to the selection pressures of their microenvironment and cancer treatment.
Article
Tumor angiogenesis is a highly regulated process involving intercellular communication as well as the interactions of multiple downstream signal transduction pathways. Disrupting one or even a few angiogenesis pathways is often insufficient to achieve sustained therapeutic benefits due to the complexity of angiogenesis. Targeting multiple angiogenic pathways has been increasingly recognized as a viable strategy. However, translating the polypharmacology of a given compound into its antiangiogenic efficacy remains a major technical challenge. A global functional association network among angiogenesis-related genes is much needed to facilitate a holistic understanding of angiogenesis and to aid the development of more effective anti-angiogenesis therapeutics. We constructed a comprehensive gene functional association network, or interactome, by transcript profiling an in vitro angiogenesis model in which human umbilical vein endothelial cells (HUVECs) formed capillary structures when co-cultured with normal human dermal fibroblasts (NHDFs). HUVEC competence and NHDF supportiveness of cord formation were found to be highly cell-passage dependent. An enrichment test of Biological Processes (BP) of differentially expressed genes (DEG) revealed that angiogenesis-related BP categories changed significantly with cell passages. Built upon 2,012 DEGs identified from two microarray studies, the resulting interactome captured 17,226 functional gene associations and displayed characteristics of a scale-free network. The interactome includes the involvement of oncogenes and tumor suppressor genes in angiogenesis. We developed a network walking algorithm to extract connectivity information from the interactome and applied it to simulate the level of network perturbation by three multi-targeted anti-angiogenic kinase inhibitors. Simulated network perturbation correlated with observed anti-angiogenesis activity in a cord formation bioassay.
We established a comprehensive gene functional association network to model in vitro angiogenesis regulation. The present study provided a proof-of-concept pilot of applying network perturbation analysis to drug phenotypic activity assessment.
Article
Aberrant DNA methylation (DNAm) was first linked to cancer over 25 yr ago. Since then, many studies have associated hypermethylation of tumor suppressor genes and hypomethylation of oncogenes with the tumorigenic process. However, most of these studies have been limited to the analysis of promoters and CpG islands (CGIs). Recently, new technologies for whole-genome DNAm (methylome) analysis have been developed, enabling unbiased analysis of cancer methylomes. By using MeDIP-seq, we report a sequencing-based comparative methylome analysis of malignant peripheral nerve sheath tumors (MPNSTs), benign neurofibromas, and normal Schwann cells. Analysis of these methylomes revealed a complex landscape of DNAm alterations. In contrast to what has been reported for other tumor types, no significant global hypomethylation was observed in MPNSTs using methylome analysis by MeDIP-seq. However, a highly significant (P < 10⁻¹⁰⁰) directional difference in DNAm was found in satellite repeats, suggesting that these repeats are the main target for hypomethylation in MPNSTs. Comparative analysis of the MPNST and Schwann cell methylomes identified 101,466 cancer-associated differentially methylated regions (cDMRs). These cDMRs were significantly enriched for two satellite repeat types (SATR1 and ARLα), suggesting an association between aberrant DNAm of these sequences and the transition from healthy cells to malignant disease. Significant enrichment of hypermethylated cDMRs in CGI shores (P < 10⁻⁶⁰) and non-CGI-associated promoters (P < 10⁻⁴), and of hypomethylated cDMRs in SINE repeats (P < 10⁻¹⁰⁰), was also identified. Integration of DNAm and gene expression data showed that the expression pattern of genes associated with CGI shore cDMRs was able to discriminate between disease phenotypes. This study establishes MeDIP-seq as an effective method to analyze cancer methylomes.
Article
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests worldwide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites, of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
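Figures such as "68.4% of CpG sites were methylated" rest on a per-site calculation from bisulfite reads. Because bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines still read as C, the methylation level at a site is C / (C + T). The sketch below shows that arithmetic; the 0.5 cutoff for calling a site "methylated", the minimum depth of 4 reads, and the helper names are illustrative assumptions, not the study's criteria.

```python
"""Hedged sketch: per-site methylation levels from bisulfite read counts.

Assumptions (not from the study): a site counts as "methylated" when its
level is >= 0.5, and sites with fewer than 4 reads are ignored.
"""


def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine: C / (C + T)."""
    depth = c_reads + t_reads
    if depth == 0:
        raise ValueError("no coverage at this site")
    return c_reads / depth


def fraction_methylated(sites, cutoff=0.5, min_depth=4):
    """Share of sufficiently covered sites whose level reaches the cutoff.

    sites: iterable of (c_reads, t_reads) tuples, one per cytosine.
    """
    covered = [(c, t) for c, t in sites if c + t >= min_depth]
    if not covered:
        return 0.0
    called = sum(1 for c, t in covered if methylation_level(c, t) >= cutoff)
    return called / len(covered)
```

For example, `fraction_methylated([(9, 1), (1, 9), (5, 5), (1, 0)])` drops the last, single-read site and returns 2/3, since two of the three remaining sites have levels of at least 0.5.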
Article
DNA methylation is a key regulator of gene function in a multitude of both normal and abnormal biological processes, but tools to elucidate its roles on a genome-wide scale are still in their infancy. Methylation sensitive restriction enzymes and microarrays provide a potential high-throughput, low-cost platform to allow methylation profiling. However, accurate absolute methylation estimates have been elusive due to systematic errors and unwanted variability. Previous microarray preprocessing procedures, mostly developed for expression arrays, fail to adequately normalize methylation-related data since they rely on key assumptions that are violated in the case of DNA methylation. We develop a normalization strategy tailored to DNA methylation data and an empirical Bayes percentage methylation estimator that together yield accurate absolute methylation estimates that can be compared across samples. We illustrate the method on data generated to detect methylation differences between tissues and between normal and tumor colon samples.
Article
High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.
Article
Matrix metalloproteinases (MMPs), or matrixins, are a family of zinc endopeptidases that play a key role in both physiological and pathological tissue degradation. Normally, there is a careful balance between cell division, matrix synthesis and matrix degradation, which is under the control of cytokines, growth factors and cell matrix interactions. The MMPs are involved in remodelling during tissue morphogenesis and wound healing. Under pathological conditions, this balance is altered: in arthritis, there is uncontrolled destruction of cartilage; in cancer, increased matrix turnover is thought to promote tumour cell invasion. The demonstration of a functional role of MMPs in arthritis and tumour metastasis raises the possibility of therapeutic intervention using synthetic MMP inhibitors with appropriate selectivity and pharmacokinetics. As the process of drug discovery focuses on structure-based design, efforts to resolve the 3-dimensional structures of the MMP family have intensified. Several novel MMP inhibitors have been identified and are currently being investigated in clinical trials. The structural information that is rapidly accumulating will be useful in refining the available inhibitors to selectively target specific MMP family members. In this review, we focus on the role of MMPs and their inhibitors in tumour invasion, metastasis and angiogenesis, and examine how MMPs may be targeted to prevent cancer progression.
Article
Neo-Darwinian evolutionary theory is based on exquisite selection of phenotypes caused by small genetic variations, which is the basis of quantitative trait contribution to phenotype and disease. Epigenetics is the study of nonsequence-based changes, such as DNA methylation, heritable during cell division. Previous attempts to incorporate epigenetics into evolutionary thinking have focused on Lamarckian inheritance, that is, environmentally directed epigenetic changes. Here, we propose a new non-Lamarckian theory for a role of epigenetics in evolution. We suggest that genetic variants that do not change the mean phenotype could change the variability of phenotype; and this could be mediated epigenetically. This inherited stochastic variation model would provide a mechanism to explain an epigenetic role of developmental biology in selectable phenotypic variation, as well as the largely unexplained heritable genetic variation underlying common complex disease. We provide two experimental results as proof of principle. The first result is direct evidence for stochastic epigenetic variation, identifying highly variably DNA-methylated regions in mouse and human liver and mouse brain, associated with development and morphogenesis. The second is a heritable genetic mechanism for variable methylation, namely the loss or gain of CpG dinucleotides over evolutionary time. Finally, we model genetically inherited stochastic variation in evolution, showing that it provides a powerful mechanism for evolutionary adaptation in changing environments that can be mediated epigenetically. These data suggest that genetically inherited propensity to phenotypic variability, even with no change in the mean phenotype, substantially increases fitness while increasing the disease susceptibility of a population with a changing environment.
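The central claim, that a genetic variant can leave the mean phenotype unchanged while altering its variability, can be illustrated with a short calculation. Assuming, purely for illustration, a normally distributed phenotype with mean 0, the sketch below compares the share of individuals beyond a fixed threshold when the standard deviation is 1.0 and when it is 1.5. The threshold might stand for a new fitness optimum after an environmental change or for a disease cutoff. `share_beyond` is a hypothetical helper built on the standard library's `statistics.NormalDist`; it is not the authors' evolutionary model.

```python
"""Hedged sketch: same mean phenotype, different variability.

Assumption: the phenotype is normally distributed; the threshold of 2.0
is arbitrary and only illustrates the tail effect.
"""
from statistics import NormalDist


def share_beyond(threshold, mean=0.0, sd=1.0):
    """Fraction of a normally distributed phenotype lying above threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


# Two genotypes with identical mean phenotype but different spread.
low_var = share_beyond(2.0, sd=1.0)   # about 0.023
high_var = share_beyond(2.0, sd=1.5)  # about 0.091
print(f"sd=1.0: {low_var:.3f}  sd=1.5: {high_var:.3f}")
```

With the mean held fixed, the more variable genotype places roughly four times as many individuals beyond the threshold. Under this toy assumption, the same tail effect serves both as raw material for adaptation and as a source of increased disease susceptibility.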
Book
This book, and the associated software, have grown out of the author’s work in the field of local regression over the past several years. The book is designed to be useful for both theoretical work and applications. Most chapters contain distinct sections introducing methodology, computing and practice, and theoretical results. The methodological and practice sections should be accessible to readers with a sound background in statistical methods and in particular regression, for example at the level of Draper and Smith (1981). The theoretical sections require a greater understanding of calculus, matrix algebra and real analysis, generally at the level found in advanced undergraduate courses. Applications are given from a wide variety of fields, ranging from actuarial science to sports. The extent, and relevance, of early work in smoothing is not widely appreciated, even within the research community. Chapter 1 attempts to redress the problem. Many ideas that are central to modern work on smoothing (local polynomials, the bias-variance trade-off, equivalent kernels, likelihood models and optimality results) can be found in literature dating to the late nineteenth and early twentieth centuries. The core methodology of this book appears in Chapters 2 through 5. These chapters introduce the local regression method in univariate and multivariate settings, and extensions to local likelihood and density estimation. Basic theoretical results and diagnostic tools such as cross validation are introduced along the way. Examples illustrate the implementation of the methods using the locfit software. The remaining chapters discuss a variety of applications and advanced topics: classification, survival data, bandwidth selection issues, computation and asymptotic theory. Largely, these chapters are independent of each other, so the reader can pick those of most interest. Most chapters include a short set of exercises.
These include theoretical results, details of proofs, extensions of the methodology, some data analysis examples and a few research problems. But the real test for the methods is whether they provide useful answers in applications. The best exercise for every chapter is to find datasets of interest and try the methods out! The literature on mathematical aspects of smoothing is extensive, and coverage is necessarily selective. I attempt to present results that are of most direct practical relevance. For example, theoretical motivation for standard error approximations and confidence bands is important; the reader should eventually want to know precisely what the error estimates represent, rather than simply assuming software reports the right answers (this applies to any model and software, not just local regression and locfit!). On the other hand, asymptotic methods for boundary correction receive no coverage, since local regression provides a simpler, more intuitive and more general approach to achieve the same result. Along with the theory, we also attempt to convey an understanding of the results and their relevance. Examples of this include the discussion of non-identifiability of derivatives (Section 6.1) and the problem of bias estimation for confidence bands and bandwidth selectors (Chapters 9 and 10).
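The local regression idea at the heart of this book can be sketched in a few lines. To estimate the regression function at a point x0, fit a straight line by weighted least squares, giving nearby observations more weight, and read off the fitted value at x0. This sketch assumes a tricube weight function, a fixed bandwidth and a local linear (degree 1) fit. These are common choices but not necessarily the defaults of the locfit software, and `tricube` and `local_linear` are hypothetical names.

```python
"""Hedged sketch: local linear regression at a single point.

Assumptions: tricube weights, fixed bandwidth, degree-1 local polynomial.
This is not locfit's implementation.
"""


def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear(x0, xs, ys, bandwidth):
    """Estimate E[y | x = x0] from a weighted least-squares line near x0."""
    w = [tricube((x - x0) / bandwidth) for x in xs]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no points within bandwidth of x0")
    # Weighted means, then the weighted slope of y on x.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Fitted line a + slope * x with a = my - slope * mx, evaluated at x0.
    return my + slope * (x0 - mx)
```

Because the fit is a local line rather than a local average, it reproduces linear trends exactly, even at x0 = 0 on the edge of the data. This is the kind of automatic boundary behavior the preface contrasts with separate asymptotic boundary corrections.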
Article
Motivation: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. Results: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. Availability: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. Supplementary information: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
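The simplest of the complete data methods compared here is quantile normalization, which forces every array in the experiment to share one distribution of probe intensities. That distribution is built from all arrays at once rather than from a single baseline array. A minimal sketch in plain Python follows. It assumes equal-length arrays and breaks ties by position, and it is not the Affy package's implementation; `quantile_normalize` is a hypothetical name.

```python
"""Hedged sketch: quantile normalization of probe intensities.

Assumptions: every array has the same probes in the same order; ties are
broken by position rather than averaged.
"""


def quantile_normalize(arrays):
    """arrays: list of equal-length lists of intensities, one list per array."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        # Give each probe the reference value that matches its rank in its array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result
```

For instance, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` returns `[[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]`. After normalization both arrays hold the same set of values, while each probe keeps its original within-array rank.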