METHOD Open Access
Measuring cell-type specific differential
methylation in human brain tissue
Carolina M Montaño1,2, Rafael A Irizarry3, Walter E Kaufmann4, Konrad Talbot5, Raquel E Gur5, Andrew P Feinberg6 and Margaret A Taub7*
Abstract
The behavior of epigenetic mechanisms in the brain is obscured by tissue heterogeneity and disease-related
histological changes. Not accounting for these confounders leads to biased results. We develop a statistical
methodology that estimates and adjusts for celltype composition by decomposing neuronal and non-neuronal
differential signal. This method provides a conceptual framework for deconvolving heterogeneous epigenetic data
from postmortem brain studies. We apply it to find cell-specific differentially methylated regions between
prefrontal cortex and hippocampus. We demonstrate the utility of the method on both Infinium 450k and CHARM
data.
Keywords: DNA methylation, epigenetics, differentially methylated region, brain region, cell-type heterogeneity,
deconvolution, NeuN, neuron, glia, postmortem brain, fluorescence activated cell sorting
Background
The brain is a particularly good example of highly specialized and diverse functions arising from the same genetic program. Epigenetic mechanisms, such as DNA methylation and chromatin arrangements, copy information other than the sequence itself during cell division [1]. Therefore, epigenetics is an attractive substrate for understanding specialized brain function and its disruption in disease. An example of an epigenetic mechanism is DNA methylation, which at CpG dinucleotides is heritable during cell division, because that sequence is recognized by a DNA methyltransferase on newly replicated strands. In post-mitotic cells such as neurons, DNA methylation has been shown to contribute to memory formation [2], other types of synaptic plasticity [3], drug addiction [4], and reversible behavior in the honeybee Apis mellifera [5]. Neurological diseases have also been linked to mutations in DNA methyltransferases [6] and methyl-CpG-binding proteins [7].
Despite its importance, the epigenetic profile of the brain has not yet been explored in depth due to, among other factors, brain region and cell-type heterogeneity. The cerebral cortex has distinct functional regions, each organized into cell layers of neurons and glia that vary throughout the cortex [8]. While neurons are the main signaling unit, glia play an important role in scaffolding and maintaining synapses [9]. Epigenetic profiling of neurons and non-neurons using the Illumina GoldenGate assay has shown that neurons and glia have a unique DNA methylation signature that cannot be assessed using samples from bulk cortex [10]. This is important because shifts in glial cell populations such as oligodendrocytes contribute to defects in cortical myelination, and microglia activation has been linked to neurodegenerative disorders [11].
Traditional epidemiological studies using brain tissue
done so far do not account for differences in cell-type
composition [12-14]. Statistical methods for estimating
cell-type composition from genomic profiles have been
developed for gene expression [15-18], and DNA methyla-
tion in blood tissue [19] and in brain [20]. DNA methyla-
tion can then be used to calculate and potentially adjust
for differing cell proportions, a crucial step when studying
diseases where cell population shifts occur [21].
While DNA methylation data can now be used to calculate differing cell proportions, individual cell-type profiling has not been done yet due to the extensive mixture combinations required for validation in blood (at least five different cell types) [19]. In contrast, cell profiling in the brain can be achieved by separating the cell types into two main compartments: neurons and glia. In a recent publication [20], a method is proposed for estimating neuron and glia proportions similar to the approach proposed for whole blood [19]. While this is a useful step toward correcting for cell distribution, this approach does not permit the unbiased estimation of glia- and neuron-specific differences between two sets of samples [20]. Such calculated cell-type specific analysis offers a crucial advantage in studies of the brain, where neurons and glia cannot generally be dissociated. For example, many brain bank specimens contain pulverized material or even paraffin-fixed specimens, for which methods exist to isolate DNA for genome-scale methylation analysis [22]. Flow sorting, as done here to develop this method, generally does not yield sufficient quantities of material for genome-scale analysis, and is also extremely labor intensive and costly.
* Correspondence: mtaub@jhsph.edu
7 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe Street, Baltimore, MD 21205, USA
Full list of author information is available at the end of the article
Montaño et al. Genome Biology 2013, 14:R94
http://genomebiology.com/content/14/8/R94
© 2013 Montaño et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Here we have developed a novel statistical epigenetics approach that takes advantage of the stability and cell-type specificity of DNA methylation, as well as the fact that the brain is made up of two major cell types, neurons and glia, in order to deconvolve the two main cell components in the brain. Thus, the method allows one to measure DNA methylation, for example, across brain regions, and from those data calculate to a first approximation the difference in DNA methylation that is neuron- or glia-specific. Moreover, once sorted data is available for a given brain region, investigators can use such data to calculate cell proportions on any unsorted sample measured on the same methylation platform without the need to sort themselves. This approach should have broad application to a range of problems in neurodevelopment and disease research.
Results and discussion
Estimation of mixture proportions
We measured DNA methylation profiles for dorsolateral
prefrontal cortex (DLPFC), hippocampal formation (HF),
and superior temporal gyrus (STG) samples dissected
from frozen brains of normal individuals using the com-
prehensive high-throughput arrays for relative methylation
(CHARM) technique [23]. We also labeled and separated
neuronal nuclei in a subset of samples using a neuron-
specific antibody (NeuN) and fluorescence-activated
cell sorting (FACS) [24,25]. Neuronal (NeuN+) and non-
neuronal (NeuN-) fractions from DLPFC, HF, and STG
were collected for downstream processing and methyla-
tion analysis with CHARM (Additional File 1, Figure S1).
To illustrate the downstream effects of the cell population confounding problem, and focusing on two brain regions for clarity, we examined a genomic region for which: (1) no difference was observed between DLPFC and HF in either neuronal or glial fractions; and (2) a difference was observed between neuronal and glial nuclei within each brain region (Figure 1a). Note that a strong methylation difference between brain regions is observed between the non-cell-sorted brain samples. This must be a false-positive and, as we demonstrate below, must be due to differences in cell-type composition between the brain regions.
We modified a statistical method originally developed
to estimate cell populations in blood [19] to calculate
neuronal and glial proportions for each of our unsorted
samples, adapting it to use a constrained linear optimiza-
tion model (Figure 1b, see overview in Additional File 1,
Figure S2a). We confirmed that our approach effectively
estimated these cell proportions using a mixture experi-
ment with an independent set of samples (Additional File
1, Figure S2b). To demonstrate that the false-positive
results of Figure 1 are due to difference in cell-type distri-
bution, we mathematically reconstructed the unsorted
sample methylation profile using the pure neuronal and
glial profiles and their estimated frequencies and pre-
dicted this result (Additional File 1, Figure S2c).
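The reconstruction step can be illustrated with a minimal sketch. It assumes that the bulk signal at each locus is a weighted average of the pure neuronal and glial profiles. The function name and the two-locus profiles are hypothetical; only the mean neuronal fractions (0.53 for DLPFC, 0.30 for HF) are taken from the Figure 1b caption.

```python
def reconstruct_profile(mu_neuron, mu_glia, glial_fraction):
    """Predict bulk methylation per locus as a weighted average of pure profiles."""
    if not 0.0 <= glial_fraction <= 1.0:
        raise ValueError("glial_fraction must lie in [0, 1]")
    return [glial_fraction * g + (1.0 - glial_fraction) * n
            for n, g in zip(mu_neuron, mu_glia)]


# Hypothetical pure profiles at two loci, identical in both brain regions.
mu_neuron = [0.20, 0.50]
mu_glia = [0.80, 0.50]

# Glial fractions implied by the mean neuronal fractions reported in Figure 1b.
dlpfc = reconstruct_profile(mu_neuron, mu_glia, 1.0 - 0.53)
hf = reconstruct_profile(mu_neuron, mu_glia, 1.0 - 0.30)

# Locus 1 differs between regions only because the cell mix differs;
# locus 2 does not, because neurons and glia share the same level there.
bulk_difference = [h - d for h, d in zip(hf, dlpfc)]
```

In this toy setting the pure profiles are the same in both regions, so any nonzero entry of `bulk_difference` is produced by composition alone.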
While the above results rely on having neuronal and
glial methylation signals for each brain region, we per-
formed additional analyses to determine whether accurate
estimates of neuronal and glial proportions in unsorted
samples from a brain region could be obtained using
selected data from another brain region. Figure 1c shows
the accuracy of estimates obtained from such universal
data, compared to estimates based on sorted data from
each individual brain region. We also accurately reproduce
the cell proportion estimates from our mixture experiment
(Additional File 1, Figure S2d, see Materials and Methods
for additional details of how this analysis was performed).
Our results indicate that accurate estimates could be
obtained for a new brain region without the need to sort
samples from that region.
Generative model of methylation signal
Currently, obtaining cell-type specific DMRs from unsorted samples is a mathematically intractable problem. However, because in human postmortem brain samples we are interested in just two cell fractions (neurons and glia), we were able to develop a novel statistical procedure to perform this deconvolution. The methylation signal for any sample $i$ at a given genomic location, $Y_i$, can be modeled as a linear combination of the methylation levels of neuronal and glial fractions in the brain region where sample $i$ was obtained. Specifically, for any given CpG, the DNAm profile of a mixed sample can then be written as (see Materials and Methods):

$$Y_i = \mu_{D,+} + (\mu_{D,-} - \mu_{D,+})\,\pi_i + (\mu_{H,+} - \mu_{D,+})\,X_i(1 - \pi_i) + (\mu_{H,-} - \mu_{D,-})\,X_i\pi_i + \varepsilon_i$$

Here, we define $\mu_{D,+}$ and $\mu_{D,-}$ to be the methylation level of neuronal and glial fractions, respectively, in DLPFC, with $\mu_{H,+}$ and $\mu_{H,-}$ defined similarly for HF. For each sample $i$, $X_i$ is 1 if sample $i$ was obtained from HF and 0 for DLPFC samples. We let $\pi_i$ be the fraction of glia in sample $i$, so that $1 - \pi_i$ is the fraction of neurons. Finally, $\varepsilon_i$ represents biological variability and measurement error. The statistical insight is that because the term $\pi_i$ can be estimated with high precision (Additional File 1, Figure S2c), it can be treated as fixed. With this assumption in place, the equation above is actually a linear model of the form

$$Y_i = \beta_0 + \beta_1\pi_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i + \varepsilon_i,$$

in which the parameters $\beta_2$ and $\beta_3$ represent the quantities we are interested in measuring, that is, the differences in neurons and glia, respectively, between brain regions. We refer to this model as M2. Fitting this linear model by least squares and obtaining estimates for millions of genomic locations is computationally feasible. (Fitting the model for 4 million probes took about 5 seconds on our laptop.)
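As an illustration of model M2, the sketch below fits $Y_i = \beta_0 + \beta_1\pi_i + \beta_2 X_i(1-\pi_i) + \beta_3 X_i\pi_i$ by ordinary least squares at a single locus using only the Python standard library. The methylation levels, sample size, and glial fractions are hypothetical, the data are simulated without noise, and the helper names are illustrative. This is not the authors' R implementation.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, y):
    """Ordinary least squares through the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def fit_m2(y, x, pi):
    """Return [b0, b1, b2, b3] for Y = b0 + b1*pi + b2*X*(1-pi) + b3*X*pi."""
    design = [[1.0, p, xi * (1.0 - p), xi * p] for xi, p in zip(x, pi)]
    return least_squares(design, y)


# Hypothetical methylation levels: neurons differ between regions, glia do not.
mu = {("D", "+"): 0.20, ("D", "-"): 0.70, ("H", "+"): 0.35, ("H", "-"): 0.70}

rng = random.Random(0)
x, pi, y = [], [], []
for i in range(40):
    xi = i % 2                               # 1 = HF, 0 = DLPFC
    region = "H" if xi else "D"
    p = rng.uniform(0.3, 0.6) + 0.2 * xi     # HF has more glia in this toy setup
    x.append(xi)
    pi.append(p)
    y.append(p * mu[(region, "-")] + (1.0 - p) * mu[(region, "+")])

b0, b1, b2, b3 = fit_m2(y, x, pi)
# b2 estimates the neuronal difference (mu_H+ - mu_D+),
# b3 the glial difference (mu_H- - mu_D-).
```

With noise-free data the fitted `b2` and `b3` recover the neuronal and glial region differences built into `mu`, even though the two regions have different glial fractions.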
This statistical framework also exposes the problem with existing naïve approaches to assess DNA methylation signatures in mixed samples. To date, most published analyses ignore cell composition [26-30] and look for associations in a way equivalent to fitting a simple linear regression model $Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i$ (where the t-test is derived from $X_i = 0$ or $1$). We refer to this model as M1. In M1, the parameter $\alpha_1$ represents a combination of the methylation differences in neurons and glia, from which it is impossible to deconvolve cell-type-specific contributions. Furthermore, we can mathematically demonstrate that the least squares estimate of $\alpha_1$ will be biased by differences in cell-type frequency under the null hypothesis of no difference in methylation between brain regions (Figure 2a, see Materials and Methods). Similarly, a naïve model suggested by Guintivano et al. [20] that incorporates cell-type proportions, $Y_i = \gamma_0 + \gamma_1 X_i + \gamma_2\pi_i + \varepsilon_i$ (we refer to this as model M3), will lead to biased results as well, and to decreased power to detect methylation differences (Additional File 1, Figure S3). We also note that even the superior methods show a small amount of bias (boxplot not centered at 0), which can be explained by slightly inaccurate mixture estimates (see Materials and Methods).
To test the utility of our model, we confirmed our theoretical results with experimental data. First, we obtained estimates of significant neuron-specific methylation differences between DLPFC and HF using sorted brain samples (gold standard, FDR <0.05, Additional File 1, Table S1). We then used the unsorted brain data to calculate the parameters representing the differences in brain-region methylation using models M1 (total methylation difference, $\alpha_1$) and M2 (neuron-specific methylation difference, $\beta_2$). Figure 2b shows that we can estimate neuron-specific methylation differences more accurately with model M2. Therefore, we can assess neuron-specific methylation differences between DLPFC and HF using whole tissue after estimating cell proportions.
Using the sorted samples, we did not find statistically significant DMRs in the non-neuronal fraction, which highlights the importance of isolating a neuronal signal from total methylation values. The result is in agreement with recently published literature suggesting that glial cells, contained in the NeuN- fraction, have less diverse transcription patterns across brain regions than neurons [31], the latter having a distinct DNA-methylation signature [10]. Interestingly, proteins involved in modifying chromatin were found among the brain-region neuronal DMRs, supporting the role of epigenetic mechanisms in neuronal function and synaptic plasticity [32]. For example, neuron-specific methylation of the histone methyltransferase SETD3, which methylates histone H3 at lysine 36, was lower in HF than in DLPFC, and histone deacetylase HDAC4 shows hypomethylation in DLPFC. Other genes involved in neural differentiation include JAG1, TTL1, NPAS4, CUX-2, DOCK2, NGEF, OLFM1, SATB2, and GIT2.

Figure 1 The proportion of neuronal cells in a given brain region influences the identification of differentially methylated regions. (a) Whole-tissue methylation signals show false-positive brain-region differences. Panel shows a plot of smoothed methylation signals from sorted neuronal and glial cells (teal and purple lines) from DLPFC and HF (solid and dashed lines) as well as from whole-tissue DLPFC (gold line) and HF (grey line). (b) Estimated neuronal fraction of cells for whole-tissue samples differs between DLPFC and HF (mean DLPFC = 0.53 (n = 19), mean HF = 0.30 (n = 13), two-sample t-test P value 6.3 × 10^-6). (c) Estimated neuronal fraction of cells for whole-tissue samples using universal DMRs vs. estimated neuronal fraction using brain region-specific DMRs from DLPFC (gold), HF (grey), and STG (blue).
Application to Illumina Infinium HumanMethylation450
Dataset
While the CHARM platform has many advantages for studying methylation patterns due to the high density and location of probes, the assay requires restriction-enzyme digestion and lacks single-base resolution. The Illumina Infinium HumanMethylation450 (450K) array has emerged as an affordable alternative to obtain reliable quantitative measurements of methylation. To demonstrate the performance of our method on data from the 450K array, we used data accessible at NCBI GEO database (Guintivano et al. [20], accession GSE41826), consisting of 77 normal samples from prefrontal cortex, of which 29 were sorted into neuronal and glial fractions, nine were mixtures of neurons and glia of known proportions, and 10 were unsorted, whole-tissue samples. We first applied our method to obtain accurate cell-fraction estimates on the known mixture samples (Additional File 1, Figure S4a). Using these cell-fraction estimates and the pure neuronal and glial profiles, we mathematically reconstructed the methylation profile for the mixture samples in a set of genomic regions and compared these results to the actual observed methylation for these samples (Additional File 1, Figure S4b). The cell proportion calculations agreed with Guintivano et al.'s estimates for prefrontal cortex. Our CHARM cell proportion estimates are on average higher than those obtained using 450K arrays, as the CHARM data were sampled using 2 mm dermal biopsy punches to minimize white matter contamination. The mathematical reconstruction of the methylation signal was also done for the unsorted samples (Additional File 1, Figure S4c).
Given that sorted data on the 450K array are only available for one brain region, we cannot demonstrate our improved ability to detect true brain-region differences in cell-type specific methylation on this platform. However, to show our ability to reduce false-positive signal, we constructed an artificial comparison by grouping the mixture samples with the highest and lowest neuronal fractions and applied models M1 and M2 to look for differences between these two groups. Any such differences are clearly due only to cell-fraction variation, and model M2 reduces the number of false-positive signals (Additional File 1, Figure S4d), as we saw for our CHARM data (Figure 2a). These results indicate that our methods apply well to data from the 450K array.
Conclusions
We describe an algorithm to address a gap in the analysis of methylation data from complex tissues with varying degrees of cell-type heterogeneity such as the brain. To appropriately measure the methylation differences between two brain cortical regions, we separated a small number of samples of the brain nuclei into neuronal and non-neuronal fractions by cell sorting, and developed a statistical method to account for cell heterogeneity in a set of unsorted samples by decomposing the signal into its two components. Our proposed method takes advantage of the separation of the brain cells into two nuclei fractions. The neuronal fraction encompasses a diverse population of neuronal cells, and the non-neuronal nuclei contain astrocytes, oligodendrocytes, a minority of NeuN-negative neurons, and endothelial cells. To separate the methylation signal into more than two fractions is mathematically plausible, as one can simply define $\pi_i$ as the fraction of cells of the cell-type of interest, fit model M2, and consider $\beta_3$. However, investigating how robust our results are to the noise in cell fraction estimates when there are more than two cell types will require further study.

Figure 2 Effects of direct modeling on false-positives and accuracy. (a) Explicit modeling for differences in cell type reduces false-positive rate. Boxplots of test statistics for the difference in means based on linear regression estimation from models M1 and M2. Eighty percent of regions from M1 show a statistically significant difference in overall mean (at level 0.05); 16% and 12% of regions from M2 show a statistically significant difference in neurons or glia, respectively (at level 0.05). (b) Explicit modeling of neuronal methylation differences improves estimation accuracy. Comparison of gold-standard mean difference in methylation in neuron-specific DMRs to the estimated mean difference from models M1 (left) and M2 (right), along with the linear regression fit to the data (95% CI for the slope of the regression line of M1 = (0.29, 0.44), for M2 = (0.68, 0.95)).
The experimental design presented here provides for efficient use of scarce tissue bank resources and limited funds for methylation profiling. Once purified methylation profiles are obtained from the brain regions of interest using a small number of samples, the gold-standard methylation data can be used for any further analysis, and by any laboratory, without the need to sort nuclei again. We have demonstrated our method on data from both CHARM and the Illumina 450K array. To apply our method to a new measurement platform or new brain regions, we recommend performing cell sorting on a subset of the samples to first obtain the cell-type specific signals needed for the cell-fraction estimation. If brain-region specific data are not available, we have also shown that for samples measured with CHARM, accurate estimates of cell proportions in samples from one brain region could be obtained using sorted data from another brain region. We provide a framework that can be applied, even retrospectively, to psychiatric case-control studies using frozen postmortem brain samples, and can be easily adapted to other microarray or sequencing platforms, and to other target tissues.
Materials and methods
Generative model of methylation signal
To illustrate our model, we consider the case of estimating differences in methylation between DLPFC (D) and HF (H). We assume these brain tissues are composed of two cell types, NeuN+ (+) and NeuN- (-). For a fixed genomic position, we let $\mu_{j,k}$ be the methylation level in region $j$, $j \in \{H, D\}$, and cell type $k$, $k \in \{+, -\}$. Scientifically, we are interested in identifying genomic locations where $\mu_{H,k} - \mu_{D,k} \neq 0$, that is, where NeuN+ or NeuN- have different methylation levels in the two brain regions.
Given a sample $i$ and considering a fixed genomic position, we define $X_i$ as the indicator that sample $i$ is from the hippocampus, that is, $X_i = 1$ if sample $i$ is from the hippocampus and 0 otherwise. We also define $\pi_i$ to be the fraction of sample $i$ that consists of NeuN- cells ($1 - \pi_i$ is the fraction of NeuN+ cells). We can then derive the expected value of the methylation signal of sample $i$ at that genomic position as

$$E(Y_i) = \{\pi_i\mu_{D,-} + (1 - \pi_i)\mu_{D,+}\}(1 - X_i) + \{\pi_i\mu_{H,-} + (1 - \pi_i)\mu_{H,+}\}X_i.$$

Rearranging terms gives:

$$E(Y_i) = \mu_{D,+} + (\mu_{D,-} - \mu_{D,+})\,\pi_i + (\mu_{H,+} - \mu_{D,+})\,X_i(1 - \pi_i) + (\mu_{H,-} - \mu_{D,-})\,X_i\pi_i \qquad (1)$$
Suppose we wanted to estimate whether there is a difference in methylation between the two brain regions being considered, H and D. If we fit a model with terms matching those above, that is,

$$E(Y_i) = \beta_0 + \beta_1\pi_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i \qquad \text{(M2)}$$

then our estimated coefficients have interpretations equivalent to the generative model in Equation 1. Specifically, we can test the hypothesis of no difference in NeuN+ methylation between D and H ($\mu_{H,+} - \mu_{D,+} = 0$) by testing the hypothesis that $\beta_2 = 0$, and the hypothesis of no difference in NeuN- methylation between D and H ($\mu_{H,-} - \mu_{D,-} = 0$) by testing the hypothesis that $\beta_3 = 0$.
From the equations above, we can see that estimating the fraction of cells of each type, $\pi_i$, allows us to explicitly find locations with brain-region differences specific to NeuN+ or NeuN- cells.
Naïve models are biased
In general, $\pi_i$ is unknown and therefore not included in the linear model, that is, the model

$$E(Y_i) = \alpha_0 + \alpha_1 X_i \qquad \text{(M1)}$$

is fitted. However, this model does not account for all the sources of variation in $Y_i$, and the least squares estimate $\hat{\alpha}_1$ is a biased estimate of the difference in methylation between H and D under the null hypothesis. To see this, we can write $E(\hat{\alpha}) = (X^t X)^{-1} X^t E(Y)$, where $X$ is the design matrix of the above model, $\hat{\alpha}$ is the vector $(\hat{\alpha}_0, \hat{\alpha}_1)$, and the hats represent least squares estimates. For simplicity, we assume equal numbers of samples from H and D. We then have

$$E(\hat{\alpha}_1) = \mu_{H,+} - \mu_{D,+} + (\mu_{H,-} - \mu_{H,+})\,\bar{\pi}_H - (\mu_{D,-} - \mu_{D,+})\,\bar{\pi}_D$$

where $\bar{\pi}_j$ is the mean fraction of NeuN- cells in region $j$. Under the null hypothesis of no difference between D and H in either + or -, we have $\mu_{H,+} - \mu_{D,+} = 0$ and also $(\mu_{H,-} - \mu_{H,+}) = (\mu_{D,-} - \mu_{D,+}) = \delta$, which gives

$$E(\hat{\alpha}_1) = \delta\,(\bar{\pi}_H - \bar{\pi}_D).$$

This means that where + and - have different methylation levels ($\delta \neq 0$), a difference in the fractions of + and - cells in the different brain regions can lead to false-positive signals of brain region differences in methylation.
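The bias of model M1 under the null can be checked numerically. The sketch below assumes noise-free data in which neurons and glia have the same methylation levels in both regions, so the only thing that differs between regions is the glial fraction. The methylation levels and per-sample fractions are hypothetical, with group means chosen to match the glial fractions implied by Figure 1b (0.47 and 0.70).

```python
def mean(values):
    return sum(values) / len(values)


def m1_slope(y, x):
    """Least squares slope of Y on a 0/1 indicator, i.e. the difference in group means."""
    hf = [yi for yi, xi in zip(y, x) if xi == 1]
    dlpfc = [yi for yi, xi in zip(y, x) if xi == 0]
    return mean(hf) - mean(dlpfc)


# Hypothetical cell-type levels, identical in DLPFC and HF (the null hypothesis).
mu_neuron, mu_glia = 0.20, 0.70
delta = mu_glia - mu_neuron

pi_d = [0.42, 0.47, 0.52]    # glial fractions in DLPFC samples, mean 0.47
pi_h = [0.65, 0.70, 0.75]    # glial fractions in HF samples, mean 0.70
x = [0] * len(pi_d) + [1] * len(pi_h)

# Noise-free bulk signal: mu_neuron + delta * pi for every sample.
y = [mu_neuron + delta * p for p in pi_d + pi_h]

alpha1_hat = m1_slope(y, x)
expected_bias = delta * (mean(pi_h) - mean(pi_d))
```

Here `alpha1_hat` equals `expected_bias` although no cell type differs between regions, which is the false-positive mechanism described above.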
Guintivano et al. [20] estimate $\pi_i$ and propose an ad hoc approach to adjust for this that is approximated by fitting the following model

$$E(Y_i) = \gamma_0 + \gamma_1 X_i + \gamma_2\pi_i \qquad \text{(M3)}$$

However, this model does not account for all the sources of variation in $Y_i$ either, and the least squares estimate $\hat{\gamma}_1$ is a biased estimate of the difference in methylation between H and D. To see this, we can write $E(\hat{\gamma}) = (X^t X)^{-1} X^t E(Y)$, where $X$ is the design matrix of the above model, $\hat{\gamma}$ is the vector $(\hat{\gamma}_0, \hat{\gamma}_1, \hat{\gamma}_2)$, and the hats represent least squares estimates. For simplicity, we assume equal numbers of samples from H and D. We then have

$$E(\hat{\gamma}_1) = \mu_{H,+} - \mu_{D,+} + K\big((\mu_{H,-} - \mu_{H,+}) - (\mu_{D,-} - \mu_{D,+})\big)$$

where $K$ is a function of the $\pi_i$'s that does not depend on the sample size:

$$K = \frac{\bar{\pi}_H\left[\tfrac{1}{2}\bar{\pi}_H\bar{\pi} + \tfrac{1}{2}\overline{\pi^2} - (\bar{\pi})^2\right] + \tfrac{1}{2}\overline{\pi^2}_H\,(\bar{\pi} - \bar{\pi}_H)}{\tfrac{1}{2}\left[\overline{\pi^2} - (\bar{\pi}_H)^2\right] + \bar{\pi}\,(\bar{\pi}_H - \bar{\pi})}$$

with $\bar{\pi}$ and $\bar{\pi}_H$ the average of the $\pi_i$'s in all samples and H samples, respectively, and $\overline{\pi^2}$ and $\overline{\pi^2}_H$ the average of the $\pi_i^2$'s in all samples and H samples, respectively. Note that the bias is directly proportional to the difference between NeuN+ and NeuN- fractions, demonstrating that this approach is incapable of deconvolving these quantities of interest.
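The closed form for $K$ can be checked against a direct least squares fit of model M3. The sketch below generates noise-free data from the four-term generative model with hypothetical coefficients and equal numbers of H and D samples, fits $E(Y_i) = \gamma_0 + \gamma_1 X_i + \gamma_2\pi_i$, and compares $\hat{\gamma}_1$ with $\beta_2 + K(\beta_3 - \beta_2)$, where $\beta_2 = \mu_{H,+} - \mu_{D,+}$ and $\beta_3 = \mu_{H,-} - \mu_{D,-}$. All helper names and numeric values are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, y):
    """Ordinary least squares through the normal equations."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def k_factor(pi, x):
    """K from the closed form; assumes equal numbers of H (x=1) and D (x=0) samples."""
    pi_h = [p for p, xi in zip(pi, x) if xi == 1]
    pbar, pbar_h = mean(pi), mean(pi_h)
    p2bar = mean([p * p for p in pi])
    p2bar_h = mean([p * p for p in pi_h])
    num = pbar_h * (0.5 * pbar_h * pbar + 0.5 * p2bar - pbar ** 2) \
        + 0.5 * p2bar_h * (pbar - pbar_h)
    den = 0.5 * (p2bar - pbar_h ** 2) + pbar * (pbar_h - pbar)
    return num / den


# Hypothetical glial fractions and coefficients (beta2 != beta3).
pi = [0.35, 0.45, 0.50, 0.58] + [0.60, 0.68, 0.72, 0.80]
x = [0] * 4 + [1] * 4
b0, b1, b2, b3 = 0.20, 0.50, 0.15, 0.05
y = [b0 + b1 * p + b2 * xi * (1 - p) + b3 * xi * p for xi, p in zip(x, pi)]

gamma_hat = least_squares([[1.0, xi, p] for xi, p in zip(x, pi)], y)
predicted_gamma1 = b2 + k_factor(pi, x) * (b3 - b2)
```

In this setting `gamma_hat[1]` agrees with `predicted_gamma1` and differs from the neuronal difference `b2` whenever `b3` differs from `b2`.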
Estimation of mixture proportions
Although we have shown that fitting the mis-specified model, which does not include the cell-fraction terms, can lead to bias under the null hypothesis, the cell fractions for a given sample are unknown a priori. At any given methylation site, we are assuming that there is some underlying mean methylation value within each combination of cell type (+, -) and brain region (D, H). If we know these underlying means, we can derive an estimate of the unknown cell fraction at a particular site, given an observed methylation signal and assuming the generative model above. For example, suppose sample $i$ is from D and we observe methylation signal $Y_i$ at a given locus. From Equation 1, we have

$$E(Y_i) = \mu_{D,+} + \pi_i(\mu_{D,-} - \mu_{D,+}) = \pi_i\mu_{D,-} + (1 - \pi_i)\mu_{D,+} \qquad (2)$$

If we assume $\mu_{D,+}$ and $\mu_{D,-}$ are known, $\pi_i$ is the only unknown in this equation, so it can be estimated. Note that we do need to constrain our estimate of $\pi_i$ to be between 0 and 1. Also, the means $\mu$ are not known, so we collected data to allow us to estimate these means, by measuring methylation in pure cell sorted + or - fractions from each brain region of interest. Given that these methylation measurements have uncertainty, we want to reduce the uncertainty in our estimate of $\pi_i$ by using many informative genomic regions. We first select a set of genomic regions where + and - methylation differs. We then find the optimal value of $\pi_i$ to explain the observed methylation for sample $i$ in these locations, as a function of our estimated means and $\pi_i$, subject to the constraint that $\pi_i$ is between 0 and 1. This procedure closely follows that presented by Houseman et al. [19].
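With only two cell types, the constrained estimation has a single bounded unknown per sample. The sketch below assumes a squared-error criterion across the selected regions, in which case the constrained optimum is the unconstrained least squares solution clipped to [0, 1]. The function name and the four-region profiles are hypothetical, and the authors' implementation may differ in its loss and region weighting.

```python
def estimate_glial_fraction(y, mu_neuron, mu_glia):
    """Least squares estimate of pi in y_j = mu_neuron_j + pi * (mu_glia_j - mu_neuron_j),
    restricted to the interval [0, 1]."""
    diffs = [g - n for g, n in zip(mu_glia, mu_neuron)]
    denom = sum(d * d for d in diffs)
    if denom == 0.0:
        raise ValueError("regions must differ between cell types to be informative")
    numer = sum((yi - n) * d for yi, n, d in zip(y, mu_neuron, diffs))
    # A one-dimensional convex quadratic on an interval is minimized by
    # clipping its unconstrained minimizer to that interval.
    return min(1.0, max(0.0, numer / denom))


# Hypothetical sorted-cell means at four informative regions.
mu_neuron = [0.10, 0.80, 0.30, 0.60]
mu_glia = [0.70, 0.20, 0.90, 0.10]

# A bulk sample simulated with a glial fraction of 0.4 and no noise.
true_pi = 0.4
bulk = [true_pi * g + (1.0 - true_pi) * n for n, g in zip(mu_neuron, mu_glia)]
pi_hat = estimate_glial_fraction(bulk, mu_neuron, mu_glia)

# A signal lying beyond the pure glial profile is clipped to 1.
beyond = [n + 1.5 * (g - n) for n, g in zip(mu_neuron, mu_glia)]
pi_clipped = estimate_glial_fraction(beyond, mu_neuron, mu_glia)
```

Using many regions averages out measurement noise in each individual region, which is the motivation given above for selecting a set of informative regions.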
Selection of the genomic locations can be based on a variety of factors, such as the range of observed methylation at these locations, the variance of the methylation estimates, and the length of the region of differential methylation. For our estimation, we chose the 500 genomic regions which were the strongest + vs. - DMR candidates in the brain region of interest in relation to the amount of methylation difference and the length of the region showing the methylation difference. We found that our results were quite robust to the number of regions selected, with 500 performing well.
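One way to rank candidate regions by both the size of the methylation difference and the region length is sketched below. The scoring rule (absolute mean difference multiplied by length) is an assumption made for illustration, since the text does not state the exact criterion, and the candidate tuples are hypothetical.

```python
def top_dmr_candidates(candidates, n=500):
    """Rank (region_id, mean_difference, length_bp) tuples by |difference| * length
    and return the ids of the n highest-scoring regions."""
    ranked = sorted(candidates, key=lambda c: abs(c[1]) * c[2], reverse=True)
    return [region_id for region_id, _, _ in ranked[:n]]


# Hypothetical NeuN+ vs. NeuN- candidates: (id, mean methylation difference, length in bp).
candidates = [
    ("a", 0.40, 300),
    ("b", -0.60, 500),
    ("c", 0.10, 1000),
    ("d", 0.30, 100),
]
selected = top_dmr_candidates(candidates, n=2)
```

The sign of the difference is ignored so that regions more methylated in either cell type are treated alike.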
To investigate whether it is absolutely necessary to have sorted data from a given brain region to estimate cell proportions in unsorted data from that region, we identified a set of universal genomic regions. These universal regions had different NeuN+ and NeuN- methylation signals within a brain region, but showed consistent NeuN+ and NeuN- methylation levels across the three brain regions for which we had data (DLPFC, HF, and STG). Many of these + vs. - DMR candidates had consistent NeuN+ and NeuN- levels across brain regions, with 14% to 17% of the probes in the + vs. - DMRs belonging to genomic regions of consistent signal. We estimated the means $\mu$ in these regions of consistent signal using sorted data from DLPFC alone, and then performed cell fraction estimation in the unsorted samples from DLPFC, HF, and STG using these mean values. Since we do not know the true cell fractions in these unsorted samples, we used the estimates we had obtained for each brain region using the region-specific DMRs and mean values, as described above, as our gold standard.
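The selection of universal regions can be sketched as a filter over sorted-cell means. The thresholds below (a minimum NeuN+ vs. NeuN- difference and a maximum spread across brain regions) are hypothetical, since the text does not give the cutoffs used, and the input layout is an assumption made for this example.

```python
def universal_regions(levels, min_cell_diff=0.2, max_region_spread=0.05):
    """Select regions whose NeuN+ and NeuN- levels differ within every brain region
    but stay nearly constant across brain regions.

    levels maps region_id -> {brain_region: (neun_pos_mean, neun_neg_mean)}.
    """
    selected = []
    for region_id, by_brain in levels.items():
        pos = [pair[0] for pair in by_brain.values()]
        neg = [pair[1] for pair in by_brain.values()]
        consistent = (max(pos) - min(pos) <= max_region_spread
                      and max(neg) - min(neg) <= max_region_spread)
        separated = all(abs(p - n) >= min_cell_diff for p, n in zip(pos, neg))
        if consistent and separated:
            selected.append(region_id)
    return selected


# Hypothetical sorted-cell means for three candidate regions.
levels = {
    "r1": {"DLPFC": (0.20, 0.80), "HF": (0.22, 0.78), "STG": (0.21, 0.79)},
    "r2": {"DLPFC": (0.20, 0.80), "HF": (0.50, 0.80), "STG": (0.30, 0.80)},
    "r3": {"DLPFC": (0.50, 0.55), "HF": (0.50, 0.55), "STG": (0.50, 0.55)},
}
universal = universal_regions(levels)
```

In this toy input only `r1` passes: `r2` varies across brain regions in the NeuN+ fraction and `r3` does not separate the two cell types.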
All analysis was implemented in R (R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2012; [33]). The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO series accession number GSE48610.
Effect of inaccurate mixture estimates
As previously described, failur e to account for differences
in c ell-mixtures in our samples can lead to biased esti-
mates of brain-reg ion differences under the null hypoth-
esis of no brain region difference. However, inaccurate
mixture estimates can also lead to bias. For example,
consider the methylation signal in sample i
E
(
Y
i
)
= β
0
+ β
1
π
i
+ β
2
X
i
(
1 π
i
)
+ β
3
X
i
π
i
Montaño et al. Genome Biology 2013, 14:R94
http://genomebiology.com/content/14/8/R94
Page 6 of 9
Now suppose we have an inaccurate estimate of $\pi_i$,
called $\hat{\pi}_i$, where $\hat{\pi}_i = \pi_i + \gamma_i$. Using this inaccurate
estimate gives us the following contribution to our regression
formulation from sample i:

$$
\begin{aligned}
&\beta_0 + \beta_1 \hat{\pi}_i + \beta_2 X_i (1 - \hat{\pi}_i) + \beta_3 X_i \hat{\pi}_i \\
&= \beta_0 + \beta_1 (\pi_i + \gamma_i) + \beta_2 X_i (1 - \pi_i - \gamma_i) + \beta_3 X_i (\pi_i + \gamma_i) \\
&= \beta_0 + \beta_1 \gamma_i + \beta_1 \pi_i + \beta_2 X_i (1 - \pi_i) + \beta_3 X_i \pi_i + (\beta_3 - \beta_2) X_i \gamma_i \\
&= \beta_0 + \beta_1 \gamma_i + \beta_1 \pi_i + \beta_2 X_i (1 - \pi_i) + \beta_3 X_i \pi_i + (\beta_3 - \beta_2) X_i \bigl(\eta_i (1 - \pi_i) - (1 - \eta_i) \pi_i\bigr) \\
&= \beta_0 + \beta_1 \gamma_i + \beta_1 \pi_i + \beta_2 X_i (1 - \pi_i) + \beta_3 X_i \pi_i + (\beta_3 - \beta_2) \eta_i X_i (1 - \pi_i) - (\beta_3 - \beta_2)(1 - \eta_i) X_i \pi_i \\
&= \beta_0 + \beta_1 \gamma_i + \beta_1 \pi_i + \bigl(\beta_2 + (\beta_3 - \beta_2) \eta_i\bigr) X_i (1 - \pi_i) + \bigl(\beta_3 - (\beta_3 - \beta_2)(1 - \eta_i)\bigr) X_i \pi_i
\end{aligned}
$$

where $\eta_i$ is between 0 and 1, and the third equality follows
from the fact that $\gamma_i$ must be between $-\pi_i$ and $1 - \pi_i$ to
ensure that $\hat{\pi}_i$ is between 0 and 1, so that $\gamma_i$ can be written
as a weighted combination of these two bounds. We can see that the
coefficient of $X_i (1 - \pi_i)$ is no longer measuring just the
quantity we are interested in (the difference between
NeuN+ methylation in regions H and D), but it also has
an additional term related to the size of the estimation
error, and similarly for the coefficient of $X_i \pi_i$.
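As a numerical sanity check of this expansion, the following Python sketch evaluates the regression contribution directly with the inaccurate estimate, and again in expanded form with the coefficients β2 + (β3 - β2)η_i for X_i(1 - π_i) and β3 - (β3 - β2)(1 - η_i) for X_i π_i, taking γ_i = η_i(1 - π_i) - (1 - η_i)π_i. The coefficient values are arbitrary and the function names are hypothetical; neither comes from our data or analysis code.

```python
def contribution_direct(beta, x, pi_hat):
    """Regression contribution evaluated with the inaccurate estimate pi_hat."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * pi_hat + b2 * x * (1 - pi_hat) + b3 * x * pi_hat


def contribution_expanded(beta, x, pi, gamma):
    """Same contribution, written with the true fraction pi and the error gamma."""
    b0, b1, b2, b3 = beta
    # gamma = eta * (1 - pi) - (1 - eta) * pi, which rearranges to eta = gamma + pi.
    eta = gamma + pi
    coef_one_minus_pi = b2 + (b3 - b2) * eta       # coefficient of X_i (1 - pi_i)
    coef_pi = b3 - (b3 - b2) * (1 - eta)           # coefficient of X_i pi_i
    return (b0 + b1 * gamma + b1 * pi
            + coef_one_minus_pi * x * (1 - pi)
            + coef_pi * x * pi)


beta = (0.10, 0.30, -0.20, 0.50)   # arbitrary illustrative coefficients
pi, gamma = 0.40, 0.15             # true fraction and estimation error
for x in (0, 1):                   # brain-region indicator
    direct = contribution_direct(beta, x, pi + gamma)
    expanded = contribution_expanded(beta, x, pi, gamma)
    print(x, direct, expanded, abs(direct - expanded) < 1e-12)
```

In this parametrization η_i equals the estimate itself (η_i = π_i + γ_i), so both distorted coefficients reduce to the same weighted average (1 - η_i)β2 + η_i β3, and each recovers its intended value only when the estimation error vanishes or β2 = β3.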
CHARM DNA methylation analysis
Genomic DNA was isolated from brains using the Masterpure
kit from Epicentre, according to the manufacturer's
protocol. For genome-wide DNA methylation assessment,
1 μg of genomic DNA from each sample was digested,
fractionated, labeled, and hybridized to a CHARM array as
described [34,35] using a custom Nimblegen 2.1 million
feature array assaying 5,114,655 CpG sites. We used the
Bioconductor package charm for sample preprocessing
along with the package bumphunter for DMR identification
and permutation computation.
Human postmortem brain samples
Fluorescence-activated cell sorting was performed on frozen
postmortem dorsolateral prefrontal cortex (n = 4),
hippocampal formation (n = 4), and superior temporal
gyrus (n = 3) from individuals not affected with neurological
or psychiatric disease. To validate the statistical model,
we used nine additional healthy samples from the dorso-
lateral prefrontal cortex. These samples underwent nuclei
extraction and sorting as described below. The model was
applied to additional unsorted control samples (19 samples
from DLPFC, 13 samples from HF, 31 samples from STG)
to deconvolve NeuN+ and NeuN- methylation signatures.
All samples were obtained from the bank of the Center for
Neurodegenerative Disease Research (CNDR) in the
Department of Pathology and Laboratory Medicine at the
University of Pennsylvania (directed by Dr John Q Troja-
nowski, see Additional File 1, Tables S2-4 for demographic
information).
Nuclei extraction, NeuN labeling, and sorting
Total nuclei were extracted via sucrose gradient centrifugation
as previously described [25]. A total of 250 mg of
frozen tissue per sample was homogenized in 5 mL of
lysis buffer (0.32 M sucrose, 10 mM Tris pH 8.0, 5 mM
CaCl2, 3 mM Mg acetate, 1 mM DTT, 0.1 mM EDTA,
0.1% Triton X-100) by douncing 50 times in a 40-mL
dounce homogenizer. Lysate was transferred to a 15 mL
ultracentrifugation tube and 9 mL of sucrose solution
(1.8 M sucrose, 10 mM Tris pH 8.0, 3 mM Mg acetate,
1 mM DTT) was pipetted to the bottom of the tube. The
solution was then centrifuged at 27,000 rpm for 2.5 h at
4 °C (Beckman, L8-80M; SW28.1 rotor). After centrifugation,
the supernatant was removed by aspiration and the
nuclei pellet was resuspended in 500 μL of PBS.
The nuclei were incubated in a staining mix (0.71%
normal goat serum, 0.036% BSA, 1:1200 anti-NeuN
(Millipore, MAB377), 1:1400 Alexa647 goat anti-mouse
secondary antibody (Invitrogen, 21236)) for 45 min
by rotating in the dark at 4 °C. Unstained nuclei and
nuclei stained with only secondary antibody served as
negative controls. The fluorescent nuclei were run
through a FACS machine with proper gate settings. A
small portion of the NeuN+ and NeuN- nuclei were re-run
on the FACS machine to validate the purity. Immunonegative
(NeuN-) nuclei were collected in parallel. To
pellet the sorted nuclei, 2 mL of sucrose solution, 50 μL
of 1 M CaCl2, and 30 μL of Mg acetate were added to 10
mL of nuclei in PBS, incubated on ice for 15 min, then
centrifuged at 3,000 rpm for 20 min. The nuclei pellet
was resuspended in 10 mM Tris (pH 7.5), 4 mM MgCl2,
and 1 mM CaCl2. Fluorescent images were taken on a
Zeiss Axio Observer.Z1 microscope with a Plan-Apochromat
100x/1.40 oil-immersion objective lens. Images
were generated using an Axiocam MR3 microscope camera
and Axiovision software (AxioVs40, version 4.8.2.0,
Carl Zeiss, Inc). Images were processed using ImageJ.
Additional material
Additional file 1: Supplementary Information. A PDF file containing
Figures S1-4 and Tables S1-4.
Abbreviations
CHARM: comprehensive high-throughput arrays for relative methylation;
DLPFC: dorsolateral prefrontal cortex; DMR: differentially methylated region;
FACS: fluorescence-activated cell sorting; FDR: false discovery rate; HF:
hippocampal formation; NeuN+: NeuN-positive fraction; NeuN-: NeuN-
negative fraction; STG: superior temporal gyrus.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
CMM conceived of the study, designed and performed experiments,
analyzed the data and developed the statistical method, and wrote the
paper. MAT conceived of the study, analyzed the data and developed the
statistical method, and wrote the paper. APF conceived of the study. RAI
analyzed data and developed the statistical method. WEK designed
experiments. KT selected and acquired samples. REG selected and acquired
samples. All authors read and approved the final manuscript.
Acknowledgements
All samples were obtained from the bank of the Center for
Neurodegenerative Disease Research (CNDR) in the Department of
Pathology and Laboratory Medicine at the University of Pennsylvania
(directed by Dr John Q Trojanowski). This work was funded by NIH Grant
U01 MH085270 to APF and CMM, Department of Defense (CDMRP)
AR080125 to APF and WEK, and NIH Grant R01 GM083084 to RAI and MAT.
The research reported in this publication was also supported by NIAMS
Award Number P30AR053503. We would like to thank Joe Chrest for his
expertise in cell sorting, Rakel Tryggvadottir for her assistance with sample
hybridizations, and Romeo Papazyan for his help with fluorescence
microscopy imaging.
Authors' details
1 Medical Scientist Training Program, Johns Hopkins University School of Medicine, 1830 E Monument Street, Baltimore, MD 21205, USA.
2 Predoctoral Training Program in Human Genetics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N Broadway, Baltimore, MD 21205, USA.
3 Dana Farber Cancer Institute, Department of Biostatistics and Computational Biology, 450 Brookline Avenue, Boston, MA 02215, USA.
4 Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
5 Department of Psychiatry, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA.
6 Center for Epigenetics, Johns Hopkins University School of Medicine, 855 N Wolfe Street, Baltimore, MD 21205, USA.
7 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe Street, Baltimore, MD 21205, USA.
Received: 7 March 2013 Revised: 11 June 2013
Accepted: 30 August 2013 Published: 30 August 2013
References
1. Feinberg AP: Phenotypic plasticity and the epigenetics of human
disease. Nature 2007, 447:433-440.
2. Miller CA, Sweatt JD: Covalent modification of DNA regulates memory
formation. Neuron 2007, 53:857-869.
3. Feng J, Zhou Y, Campbell SL, Le T, Li E, Sweatt JD, Silva AJ, Fan G: Dnmt1
and Dnmt3a maintain DNA methylation and regulate synaptic function
in adult forebrain neurons. Nat Neurosci 2010, 13:423-430.
4. LaPlant Q, Vialou V, Covington HE, Dumitriu D, Feng J, Warren BL, Maze I,
Dietz DM, Watts EL, Iniguez SD, Koo JW, Mouzon E, Renthal W, Hollis F,
Wang H, Noonan MA, Ren Y, Eisch AJ, Bolanos CA, Kabbaj M, Xiao G,
Neve RL, Hurd YL, Oosting RS, Fan G, Morrison JH, Nestler EJ: Dnmt3a
regulates emotional behavior and spine plasticity in the nucleus
accumbens. Nat Neurosci 2010, 13:1137-1143.
5. Herb BR, Wolschin F, Hansen KD, Aryee MJ, Langmead B, Irizarry R,
Amdam GV, Feinberg AP: Reversible switching between epigenetic states
in honeybee behavioral subcastes. Nat Neurosci 2012, 15:1371-1373.
6. Hansen RS, Wijmenga C, Luo P, Stanek AM, Canfield TK, Weemaes CM,
Gartler SM: The DNMT3B DNA methyltransferase gene is mutated in the
ICF immunodeficiency syndrome. Proc Natl Acad Sci USA 1999,
96:14412-14417.
7. Amir RE, Van den Veyver IB, Wan M, Tran CQ, Francke U, Zoghbi HY: Rett
syndrome is caused by mutations in X-linked MECP2, encoding methyl-
CpG-binding protein 2. Nat Genet 1999, 23:185-188.
8. Kandel ER, Schwartz JH, Jessell TM: Principles of neural science. 4th edition.
New York: McGraw-Hill, Health Professions Division; 2000.
9. Hughes V: Microglia: The constant gardeners. Nature 2012, 485:570-572.
10. Iwamoto K, Bundo M, Ueda J, Oldham MC, Ukai W, Hashimoto E, Saito T,
Geschwind DH, Kato T: Neurons show distinctive DNA methylation profile
and higher interindividual variations compared with non-neurons.
Genome Res 2011, 21:688-696.
11. Prinz M, Priller J, Sisodia SS, Ransohoff RM: Heterogeneity of CNS myeloid
cells and their roles in neurodegeneration. Nat Neurosci 2011,
14:1227-1235.
12. Grayson DR, Jia X, Chen Y, Sharma RP, Mitchell CP, Guidotti A, Costa E:
Reelin promoter hypermethylation in schizophrenia. Proc Natl Acad Sci
USA 2005, 102:9341-9346.
13. Mill J, Tang T, Kaminsky Z, Khare T, Yazdanpanah S, Bouchard L, Jia P,
Assadzadeh A, Flanagan J, Schumacher A, Wang SC, Petronis A:
Epigenomic profiling reveals DNA-methylation changes associated with
major psychosis. Am J Hum Genet 2008, 82:696-711.
14. Sabunciyan S, Aryee MJ, Irizarry RA, Rongione M, Webster MJ, Kaufman WE,
Murakami P, Lessard A, Yolken RH, Feinberg AP, Potash JB: Genome-wide
DNA methylation scan in major depressive disorder. PLoS One 2012,
7:e34451.
15. Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM,
Hastie T, Sarwal MM, Davis MM, Butte AJ: Cell type-specific gene
expression differences in complex tissues. Nat Methods 2010, 7:287-289.
16. Gaujoux R, Seoighe C: Semi-supervised Nonnegative Matrix Factorization
for gene expression deconvolution: a case study. Infect Genet Evol 2012,
12:913-921.
17. Gong T, Hartmann N, Kohane IS, Brinkmann V, Staedtler F, Letzkus M,
Bongiovanni S, Szustakowski JD: Optimal deconvolution of transcriptional
profiling data using quadratic programming with application to complex
clinical blood samples. PLoS One 2011, 6:e27156.
18. Kuhn A, Thu D, Waldvogel HJ, Faull RL, Luthi-Carter R: Population-specific
expression analysis (PSEA) reveals molecular changes in diseased brain.
Nat Methods 2011, 8:945-947.
19. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ,
Nelson HH, Wiencke JK, Kelsey KT: DNA methylation arrays as surrogate
measures of cell mixture distribution. BMC Bioinformatics 2012, 13:86.
20. Guintivano J, Aryee MJ, Kaminsky ZA: A cell epigenotype specific model
for the correction of brain cellular heterogeneity bias and its application
to age, brain region and major depression. Epigenetics 2013, 8:290-302.
21. Liu Y, Aryee MJ, Padyukov L, Fallin MD, Hesselberg E, Runarsson A,
Reinius L, Acevedo N, Taub M, Ronninger M, Shchetynsky K, Scheynius A,
Kere J, Alfredsson L, Klareskog L, Ekström TJ, Feinberg AP: Epigenome-wide
association data implicate DNA methylation as an intermediary of
genetic risk in rheumatoid arthritis. Nat Biotechnol 2013, 31:142-147.
22. Gu H, Bock C, Mikkelsen TS, Jager N, Smith ZD, Tomazou E, Gnirke A,
Lander ES, Meissner A: Genome-scale DNA methylation mapping of
clinical samples at single-nucleotide resolution. Nat Methods 2010,
7:133-136.
23. Irizarry RA, Ladd-Acosta C, Carvalho B, Wu H, Brandenburg SA, Jeddeloh JA,
Wen B, Feinberg AP: Comprehensive high-throughput arrays for relative
methylation (CHARM). Genome Res 2008, 18:780-790.
24. Mullen RJ, Buck CR, Smith AM: NeuN, a neuronal specific nuclear protein
in vertebrates. Development 1992, 116:201-211.
25. Jiang Y, Matevossian A, Huang HS, Straubhaar J, Akbarian S: Isolation of
neuronal chromatin from brain tissue. BMC Neurosci 2008, 9:42.
26. Ladd-Acosta C, Pevsner J, Sabunciyan S, Yolken RH, Webster MJ, Dinkins T,
Callinan PA, Fan JB, Potash JB, Feinberg AP: DNA methylation signatures
within the human brain. Am J Hum Genet 2007, 81:1304-1315.
27. Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai SL,
Arepalli S, Dillman A, Rafferty IP, Troncoso J, Johnson R, Zielke HR,
Ferrucci L, Longo DL, Cookson MR, Singleton AB: Abundant quantitative
trait loci exist for DNA methylation and gene expression in human
brain. PLoS Genetics 2010, 6:e1000952.
28. Davies MN, Volta M, Pidsley R, Lunnon K, Dixit A, Lovestone S, Coarfa C,
Harris RA, Milosavljevic A, Troakes C, Al-Sarraj S, Dobson R, Schalkwyk LC,
Mill J: Functional annotation of the human brain methylome identifies
tissue-specific epigenetic variation across brain and blood. Genome Biol
2012, 13:R43.
29. Pardo LM, Rizzu P, Francescatto M, Vitezic M, Leday GG, Sanchez JS,
Khamis A, Takahashi H, van de Berg WD, Medvedeva YA, van de Wiel MA,
Daub CO, Carninci P, Heutink P: Regional differences in gene expression
and promoter usage in aged human brains. Neurobiol Aging 2013,
34:1825-1836.
30. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, Klotzle B,
Bibikova M, Fan JB, Gao Y, Deconde R, Chen M, Rajapakse I, Friend S,
Ideker T, Zhang K: Genome-wide methylation profiles reveal quantitative
views of human aging rates. Mol Cell 2013, 49:359-367.
31. Ko Y, Ament SA, Eddy JA, Caballero J, Earls JC, Hood L, Price ND: Cell type-
specific genes show striking and distinct patterns of spatial expression
in the mouse brain. Proc Natl Acad Sci USA 2013, 110:3095-3100.
32. Day JJ, Sweatt JD: Epigenetic mechanisms in cognition. Neuron 2011,
70:813-829.
33. R Core Team: R: A Language and Environment for Statistical Computing. http://www.r-project.org/.
34. Jaffe AE, Murakami P, Lee H, Leek JT, Fallin MD, Feinberg AP, Irizarry RA:
Bump hunting to identify differentially methylated regions in epigenetic
epidemiology studies. Int J Epidemiol 2012, 41:200-209.
35. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H,
Gabo K, Rongione M, Webster M, Ji H, Potash JB, Sabunciyan S,
Feinberg AP: The human colon cancer methylome shows similar hypo-
and hypermethylation at conserved tissue-specific CpG island shores.
Nat Genet 2009, 41:178-186.
doi:10.1186/gb-2013-14-8-r94
Cite this article as: Montaño et al.: Measuring cell-type specific
differential methylation in human brain tissue. Genome Biology 2013
14:R94.
... 42 Since both tumors and normal lung tissues are composed of multiple cell types, including epithelial cells, fibroblasts, hematopoietic cells, and endothelial cells, DNAme and gene expression profiles collected at the tissue level (''bulk'') may not accurately reflect alterations present in specific cell types. 43,44 Since the majority of DNAme alterations in lung cancers have been found to occur in epithelial cells, 45,46 we aimed at identifying DNAme alterations specific to the epithelial cell population. To resolve the confounding effects of other cell types, we leveraged previously validated computational methods to estimate the proportions of epithelial cells and infer epithelial-specific methylomes and transcriptomes ( Figure S3; STAR Methods). ...
... When the DNAme were measured at the tissue (''bulk'') level, the differential DNAme profiles between patient subjects may result from the differences in tissue compositions. 43 Article ll OPEN ACCESS perspective, tissue composition is meaningful in classifications of tumor subtypes and predictions of treatment response. However, from a biological perspective, users may be interested in identifying the differential DNAme present in only specific cell types. ...
... The output from TCA was also a three-dimensional tensor with shape methylome x cell type x sample, indicating the DNAme levels in each cell type and in each sample. It is important to note that users can also leverage other existing tools 43,44,[79][80][81][82][83][84] to adjust the effects from tissue compositions and then input the deconvoluted data to EpiMix. ...
Article
Full-text available
DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer and immunological and cardiovascular diseases. Recent technological advances have enabled genome-wide profiling of DNAme in large human cohorts. There is a need for analytical methods that can more sensitively detect differential methylation profiles present in subsets of individuals from these heterogeneous, population-level datasets. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared with existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and long non-coding RNAs (lncRNAs). Using cell-type-specific data from two separate studies, we discover epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven ncRNAs in non-small cell lung cancer.
... M ost epigenome data are generated at the bulk-tissue level, which can confound molecular classifications of disease 1,2 or prevent the identification of cell-type-specific epigenetic alterations 3,4 . To address these challenges, a number of reference-based and reference-free cell-type deconvolution algorithms have been proposed [5][6][7][8][9][10][11][12][13][14] , with reference-based methods offering the greatest potential to identify cell-type-specific DNA methylation (DNAm) changes 2,15,16 . However, a major limitation remains in that these algorithms require DNAm reference profiles representing the main cell types in a given tissue 15,16 . ...
... In all, we identified 13 tissue types that met all criteria at the time of writing (Fig. 1a). In all cases, the tissue-specific mRNA expression references matrices were validated in independent scRNA-seq datasets (Supplementary Table 1), with reasonably high accuracy and across all underlying cell types (Fig. 1b [1][2][3][4][5][6][7][8][9][10][11][12][13]. We then imputed corresponding tissue-specific DNAm reference matrices, with DNAm defined at the promoters of marker genes and for the same cell types as given in the mRNA expression references (Fig. 1c Systematic validation of the DNAm-atlas. ...
... , Supplementary Table 2 and Supplementary Figs.[1][2][3][4][5][6][7][8][9][10][11][12][13]. For instance, for 8 out of 13 tissue types, validation accuracy was over 90%(Fig. ...
Article
Full-text available
Bulk-tissue DNA methylomes represent an average over many different cell types, hampering our understanding of cell-type-specific contributions to disease development. As single-cell methylomics is not scalable to large cohorts of individuals, cost-effective computational solutions are needed, yet current methods are limited to tissues such as blood. Here we leverage the high-resolution nature of tissue-specific single-cell RNA-sequencing datasets to construct a DNA methylation atlas defined for 13 solid tissue types and 40 cell types. We comprehensively validate this atlas in independent bulk and single-nucleus DNA methylation datasets. We demonstrate that it correctly predicts the cell of origin of diverse cancer types and discovers new prognostic associations in olfactory neuroblastoma and stage 2 melanoma. In brain, the atlas predicts a neuronal origin for schizophrenia, with neuron-specific differential DNA methylation enriched for corresponding genome-wide association study risk loci. In summary, the DNA methylation atlas enables the decomposition of 13 different human tissue types at a high cellular resolution, paving the way for an improved interpretation of epigenetic data. This resource presents an in silico generated DNA methylation atlas that can be used for cell-type deconvolution of human tissues.
... However, these methods are labor-intensive and/or cost-prohibitive for most large-scale transcriptomic interrogation. As an alternative, statistical methods have been developed to deconvolute effects of individual cell types using data generated from bulk tissue [12][13][14][15][16][17]. ...
... Note that the model has no constant due to P nc c¼1 P c ffi 1. Alternatively, the model is sometimes written with a constant whereby one of the cell type proportions is omitted [17] but this produces identical results [16,28,29]. ...
Article
Full-text available
Postpartum depression (PPD) affects 1 in 7 women and has negative mental health consequences for both mother and child. However, the precise biological mechanisms behind the disorder are unknown. Therefore, we performed the largest transcriptome-wide association study (TWAS) for PPD (482 cases, 859 controls) to date using RNA-sequencing in whole blood and deconvoluted cell types. No transcriptional changes were observed in whole blood. B-cells showed a majority of transcriptome-wide significant results (891 transcripts representing 789 genes) with pathway analyses implicating altered B-cell activation and insulin resistance. Integration of other data types revealed cell type-specific DNA methylation loci and disease-associated eQTLs (deQTLs), but not hormones/neuropeptides (estradiol, progesterone, oxytocin, BDNF), serve as regulators for part of the transcriptional differences between cases and controls. Further, deQTLs were enriched for several brain region-specific eQTLs, but no overlap with MDD risk loci was observed. Altogether, our results constitute a convergence of evidence for pathways most affected in PPD with data across different biological mechanisms.
... For example, miR205 was identified as a marker for saliva [60,115]; however, Ludwig and coworkers reported a highly specific expression of miR205-5p in the skin [112], and it was recently reported for the identification of vaginal secretion [118]. One notable finding from studies that have examined DNA methylation is instead that this mark is involved in the regulation of several molecular mechanisms; it is cellspecific, as showed by cell-specific differentially methylated regions identified in post-mortem brain areas [69,126], and it is widely affected by environmental conditions throughout life. In particular, confounding factors, such as early life events [127], smoking, ethnicity, and gender [128,129] and diseases, can modify DNA methylation levels at specific sites in the genome. ...
Article
Full-text available
The possibility of using epigenetics in forensic investigation has gradually risen over the last few years. Epigenetic changes with their dynamic nature can either be inherited or accumulated throughout a lifetime and be reversible, prompting investigation of their use across various fields. In forensic sciences, multiple applications have been proposed, such as the discrimination of monozygotic twins, identifying the source of a biological trace left at a crime scene, age prediction, determination of body fluids and tissues, human behavior association, wound healing progression, and determination of the post-mortem interval (PMI). Despite all these applications, not all the studies considered the impact of PMI and post-sampling effects on the epigenetic modifications and the tissue-specificity of the epigenetic marks. This review aims to highlight the substantial forensic significance that epigenetics could support in various forensic investigations. First, basic concepts in epigenetics, describing the main epigenetic modifications and their functions, in particular, DNA methylation, histone modifications, and non-coding RNA, with a particular focus on forensic applications, were covered. For each epigenetic marker, post-mortem stability and tissue-specificity, factors that should be carefully considered in the study of epigenetic biomarkers in the forensic context, have been discussed. The advantages and limitations of using post-mortem tissues have been also addressed, proposing directions for these innovative strategies to analyze forensic specimens.
... Both CellDMC [4] and TOAST [5] use interaction terms between covariates and cell type proportions in a linear model to test csDE/csDM. This statistical framework has been shown as a generalization of several previous works [6][7][8]. TCA [9] models the cell type-specific methylation levels of each individual and derives a procedure for cell type-specific inference. While Cell-DMC, TOAST, and TCA mainly focus on continuous methylation or gene expression data measured in microarray, CARseq [10] is designed for cell type-specific inference for count data from RNA-sequencing by using a negative binomial (NB) distribution. ...
Article
Full-text available
Bulk high-throughput omics data contain signals from a mixture of cell types. Recent developments of deconvolution methods facilitate cell type-specific inferences from bulk data. Our real data exploration suggests that differential expression or methylation status is often correlated among cell types. Based on this observation, we develop a novel statistical method named CeDAR to incorporate the cell type hierarchy in cell type-specific differential analyses of bulk data. Extensive simulation and real data analyses demonstrate that this approach significantly improves the accuracy and power in detecting cell type-specific differential signals compared with existing methods, especially in low-abundance cell types.
... Methylation signatures vary not only between individuals, but also in a tissue-and cell-type dependent manner [45]. We, therefore, subcloned our target amplicon in order to identify GAD1 DNA methylation patterns with single-cell resolution, thereby exposing the uniformity of each group. ...
Article
Full-text available
DNA methylation profiling has become a promising approach towards identifying biomarkers of neuropsychiatric disorders including autism spectrum disorder (ASD). Epigenetic markers capture genetic risk factors and diverse exogenous and endogenous factors, including environmental risk factors and complex disease pathologies. We analysed the differential methylation profile of a regulatory region of the GAD1 gene using cerebral organoids generated from induced pluripotent stem cells (iPSCs) from adults with a diagnosis of ASD and from age- and gender-matched healthy individuals. Both groups showed high levels of methylation across the majority of CpG sites within the profiled GAD1 region of interest. The ASD group exhibited a higher number of unique DNA methylation patterns compared to controls and an increased CpG-wise variance. We detected six differentially methylated CpG sites in ASD, three of which reside within a methylation-dependent transcription factor binding site. In ASD, GAD1 is subject to differential methylation patterns that may not only influence its expression, but may also indicate variable epigenetic regulation among cells.
Article
Full-text available
Epigenetic mechanisms such as DNA methylation have been implicated in a number of diseases including cancer, heart disease, autoimmune disorders, and neurodegenerative diseases. While it is recognized that DNA methylation is tissue-specific, a limitation for many studies is the ability to sample the tissue of interest, which is why there is a need for a proxy tissue such as blood, that is reflective of the methylation state of the target tissue. In the last decade, DNA methylation has been utilized in the design of epigenetic clocks, which aim to predict an individual’s biological age based on an algorithmically defined set of CpGs. A number of studies have found associations between disease and/or disease risk with increased biological age, adding weight to the theory of increased biological age being linked with disease processes. Hence, this review takes a closer look at the utility of DNA methylation as a biomarker in aging and disease, with a particular focus on Alzheimer’s disease.
Chapter
Studies in epigenetic epidemiology have reported increasing numbers of epigenetic biomarkers associated with a wide range of exposures and outcomes. Due to cost and technical difficulties, these markers are usually derived from complex tissues that are composed of many different cell-types. This cell-type heterogeneity prevents the identification of cell-type specific epigenetic alterations, posing significant challenges to the interpretation and understanding of these markers. Consequently, there is a strong need to develop cost-effective computational solutions to tackle the cell-type heterogeneity problem. Here, I discuss some recently proposed cell-type deconvolution algorithms aimed at estimating cell-type fractions and identifying cell-type specific differential DNA methylation changes. I describe their successful application to epigenome studies. We also discuss their main limitations, providing general guidelines for their successful implementation and for correctly interpretating results derived from them.KeywordsEWASDNA methylationCell-type heterogeneityCell-type deconvolutionCancer
Article
Aim: To detect expression quantitative trait methylation (eQTM) loci within the cerebrum of prenatal Down syndrome (DS) and controls. Material & methods: DNA methylation gene expression profiles were acquired from NeuN+ nuclei, obtained from cerebrum sections of DS and controls. Linear regression models were applied to both datasets and were subsequently applied in an integrative analysis model to detect DS-associated eQTM loci. Results & conclusion: Widespread aberrant DNA methylation and gene expression were observed in DS. A substantial number of differentially methylated loci were replicated according to a previously reported study. Subsequent integrative analyses (eQTM) yielded numerous associated DS loci. the authors associated DNA methylation, gene expression and eQTM loci with DS that may underlie particular DS phenotypical characteristics.
Article
Epigenetics is the study of gene regulation. It refers to the structural and dynamic factors that govern how, when, and where functional units of the genome called genes are transcribed and regulated. Although DNA isoften referred to as the “blueprint” of life, the nucleotide sequence that makes up the genome does not provide sufficient information on the spatiotemporal regulation and the mechanisms for understanding essential physiological processes such as differentiation, development, and disease. In this article, we will provide an overview of epigenetics, their importance in understanding behavior, some of the challenges of conducting experiments in epigenetics, and how to analyze data in the context of behavior. Given the large number of studies in epigenetics, even with a focus on behavior, only general concepts will be provided and discussed. The readers are encouraged to refer to some of the specific examples and tools that are included.
Article
Full-text available
Brain cellular heterogeneity may bias cell type specific DNA methylation patterns, influencing findings in psychiatric epigenetic studies. We performed fluorescence activated cell sorting (FACS) of neuronal nuclei and Illumina HM450 DNA methylation profiling in post mortem frontal cortex of 29 major depression and 29 matched controls. We identify genomic features and ontologies enriched for cell type specific epigenetic variation. Using the top cell epigenotype specific (CETS) marks, we generated a publically available R package, "CETS," capable of quantifying neuronal proportions and generating in silico neuronal profiles from DNA methylation data. We demonstrate a significant overlap in major depression DNA methylation associations between FACS separated and CETS model generated neuronal profiles relative to bulk profiles. CETS derived neuronal proportions correlated significantly with age in the frontal cortex and cerebellum and accounted for epigenetic variation between brain regions. CETS based control of cellular heterogeneity will enable more robust hypothesis testing in the brain.
To characterize gene expression patterns in the regional subdivisions of the mammalian brain, we integrated spatial gene expression patterns from the Allen Brain Atlas for the adult mouse with panels of cell type-specific genes for neurons, astrocytes, and oligodendrocytes from previously published transcriptome profiling experiments. We found that the combined spatial expression patterns of 170 neuron-specific transcripts revealed strikingly clear and symmetrical signatures for most of the brain's major subdivisions. Moreover, the brain expression spatial signatures correspond to anatomical structures and may even reflect developmental ontogeny. Spatial expression profiles of astrocyte- and oligodendrocyte-specific genes also revealed regional differences; these defined fewer regions and were less distinct but still symmetrical in the coronal plane. Follow-up analysis suggested that region-based clustering of neuron-specific genes was related to (i) a combination of individual genes with restricted expression patterns, (ii) region-specific differences in the relative expression of functional groups of genes, and (iii) regional differences in neuronal density. Products from some of these neuron-specific genes are present in peripheral blood, raising the possibility that they could reflect the activities of disease- or injury-perturbed networks and collectively function as biomarkers for clinical disease diagnostics.
Epigenetic mechanisms integrate genetic and environmental causes of disease, but comprehensive genome-wide analyses of epigenetic modifications have not yet demonstrated robust association with common diseases. Using Illumina HumanMethylation450 arrays on 354 anti-citrullinated protein antibody-associated rheumatoid arthritis cases and 337 controls, we identified two clusters within the major histocompatibility complex (MHC) region whose differential methylation potentially mediates genetic risk for rheumatoid arthritis. To reduce confounding factors that have hampered previous epigenome-wide studies, we corrected for cellular heterogeneity by estimating and adjusting for cell-type proportions in our blood-derived DNA samples and used mediation analysis to filter out associations likely to be a consequence of disease. Four CpGs also showed an association between genotype and variance of methylation. The associations for both clusters replicated at least one CpG (P < 0.01), with the rest showing suggestive association, in monocyte cell fractions in an independent cohort of 12 cases and 12 controls. Thus, DNA methylation is a potential mediator of genetic risk.
In honeybee societies, distinct caste phenotypes are created from the same genotype, suggesting a role for epigenetics in deriving these behaviorally different phenotypes. We found no differences in DNA methylation between irreversible worker and queen castes, but substantial differences between nurses and forager subcastes. Reverting foragers back to nurses reestablished methylation levels for a majority of genes and provides, to the best of our knowledge, the first evidence in any organism of reversible epigenetic changes associated with behavior.
Dynamic changes to the epigenome play a critical role in establishing and maintaining cellular phenotype during differentiation, but little is known about the normal methylomic differences that occur between functionally distinct areas of the brain. We characterized intra- and inter-individual methylomic variation across whole blood and multiple regions of the brain from multiple donors. Distinct tissue-specific patterns of DNA methylation were identified, with a highly significant over-representation of tissue-specific differentially methylated regions (TS-DMRs) observed at intragenic CpG islands and low CG density promoters. A large proportion of TS-DMRs were located near genes that are differentially expressed across brain regions. TS-DMRs were significantly enriched near genes involved in functional pathways related to neurodevelopment and neuronal differentiation, including BDNF, BMP4, CACNA1A, CACA1AF, EOMES, NGFR, NUMBL, PCDH9, SLIT1, SLITRK1 and SHANK3. Although between-tissue variation in DNA methylation was found to greatly exceed between-individual differences within any one tissue, we found that some inter-individual variation was reflected across brain and blood, indicating that peripheral tissues may have some utility in epidemiological studies of complex neurobiological phenotypes. This study reinforces the importance of DNA methylation in regulating cellular phenotype across tissues, and highlights genomic patterns of epigenetic variation across functionally distinct regions of the brain, providing a resource for the epigenetics and neuroscience research communities.
Background: There has been a long-standing need in biomedical research for a method that quantifies the normally mixed composition of leukocytes beyond what is possible by simple histological or flow cytometric assessments. The latter is restricted by the labile nature of protein epitopes, requirements for cell processing, and timely cell analysis. In a diverse array of diseases and following numerous immune-toxic exposures, leukocyte composition will critically inform the underlying immuno-biology to most chronic medical conditions. Emerging research demonstrates that DNA methylation is responsible for cellular differentiation, and when measured in whole peripheral blood, serves to distinguish cancer cases from controls. Results: Here we present a method, similar to regression calibration, for inferring changes in the distribution of white blood cells between different subpopulations (e.g. cases and controls) using DNA methylation signatures, in combination with a previously obtained external validation set consisting of signatures from purified leukocyte samples. We validate the fundamental idea in a cell mixture reconstruction experiment, then demonstrate our method on DNA methylation data sets from several studies, including data from a Head and Neck Squamous Cell Carcinoma (HNSCC) study and an ovarian cancer study. Our method produces results consistent with prior biological findings, thereby validating the approach. Conclusions: Our method, in combination with an appropriate external validation set, promises new opportunities for large-scale immunological studies of both disease states and noxious exposures.
A battery of monoclonal antibodies (mAbs) against brain cell nuclei has been generated by repeated immunizations. One of these, mAb A60, recognizes a vertebrate nervous system- and neuron-specific nuclear protein that we have named NeuN (Neuronal Nuclei). The expression of NeuN is observed in most neuronal cell types throughout the nervous system of adult mice. However, some major cell types appear devoid of immunoreactivity including cerebellar Purkinje cells, olfactory bulb mitral cells, and retinal photoreceptor cells. NeuN can also be detected in neurons in primary cerebellar cultures and in retinoic acid-stimulated P19 embryonal carcinoma cells. Immunohistochemically detectable NeuN protein first appears at developmental timepoints which correspond with the withdrawal of the neuron from the cell cycle and/or with the initiation of terminal differentiation of the neuron. NeuN is a soluble nuclear protein, appears as 3 bands (46-48 × 10^3 M_r) on immunoblots, and binds to DNA in vitro. The mAb crossreacts immunohistochemically with nervous tissue from rats, chicks, humans, and salamanders. This mAb and the protein recognized by it serve as an excellent marker for neurons in the central and peripheral nervous systems in both the embryo and adult, and the protein may be important in the determination of neuronal phenotype.
To characterize the promoterome of caudate and putamen regions (striatum), frontal and temporal cortices, and hippocampi from aged human brains, we used high-throughput cap analysis of gene expression to profile the transcription start sites and to quantify the differences in gene expression across the 5 brain regions. We also analyzed the extent to which methylation influenced the observed expression profiles. We sequenced more than 71 million cap analysis of gene expression tags corresponding to 70,202 promoter regions and 16,888 genes. More than 7000 transcripts were differentially expressed, mainly because of differential alternative promoter usage. Unexpectedly, 7% of differentially expressed genes were neurodevelopmental transcription factors. Functional pathway analysis on the differentially expressed genes revealed an overrepresentation of several signaling pathways (e.g., fibroblast growth factor and wnt signaling) in hippocampus and striatum. We also found that although 73% of methylation signals mapped within genes, the influence of methylation on the expression profile was small. Our study underscores alternative promoter usage as an important mechanism for determining the regional differences in gene expression at old age.
The ability to measure human aging from molecular profiles has practical implications in many fields, including disease prevention and treatment, forensics, and extension of life. Although chronological age has been linked to changes in DNA methylation, the methylome has not yet been used to measure and compare human aging rates. Here, we build a quantitative model of aging using measurements at more than 450,000 CpG markers from the whole blood of 656 human individuals, aged 19 to 101. This model measures the rate at which an individual's methylome ages, which we show is impacted by gender and genetic variants. We also show that differences in aging rates help explain epigenetic drift and are reflected in the transcriptome. Moreover, we show how our aging model is upheld in other human tissues and reveals an advanced aging rate in tumor tissue. Our model highlights specific components of the aging process and provides a quantitative readout for studying the role of methylation in age-related disease.
Once thought to be passive sentinels, microglia now seem to be crucial for pruning back neurons during development.