Literature Review

DNA methylome analysis using short bisulfite sequencing data

Authors:
  • Altos Labs

Abstract and Figures

Bisulfite conversion of genomic DNA combined with next-generation sequencing (BS-seq) is widely used to measure the methylation state of a whole genome, the methylome, at single-base resolution. However, analysis of BS-seq data still poses a considerable challenge. Here we summarize the challenges of BS-seq mapping as they apply to both base and color-space data. We also explore the effect of sequencing errors and contaminants on inferred methylation levels and recommend the most appropriate way to analyze this type of data.
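As a rough illustration of what "inferred methylation level" means in the abstract above, the sketch below computes the fraction of reads that report a cytosine at one position. The function name and the counting convention (C calls counted as methylated, T calls as unmethylated) are assumptions made for this example; they are not taken from the review.

```python
def methylation_level(methylated_calls: int, unmethylated_calls: int) -> float:
    """Fraction of reads that report a methylated cytosine at one position.

    Assumption for this sketch: a read showing C counts as methylated and a
    read showing T counts as unmethylated (converted).
    """
    total = methylated_calls + unmethylated_calls
    if total == 0:
        raise ValueError("no coverage at this position")
    return methylated_calls / total


if __name__ == "__main__":
    # A truly unmethylated position covered by 20 reads, plus one read in
    # which a sequencing error or an unconverted contaminant shows a C.
    print(methylation_level(0, 20))
    print(methylation_level(1, 20))
```

With low coverage, a single erroneous or contaminating read shifts the estimate noticeably, which is one reason error and contaminant filtering matters for this kind of data.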
... After filtering the raw reads of MethylC-seq libraries with fastp [71], the clean reads of each accession were mapped to the corresponding pseudo reference genome [30,34], in which the lettuce reference genome (Salinas_v8) sequences were replaced by the corresponding nucleotides of each accession at the SNP loci [75,76], using Bismark (v0.15.0) with options (--score_min L,0,-0.2 -X 1000). ...
Article
Background: Lettuce (Lactuca sativa L.) is an economically important vegetable crop worldwide. Lettuce is believed to be domesticated from a single wild ancestor Lactuca serriola and subsequently diverged into two major morphologically distinct vegetable types: leafy lettuce and stem lettuce. However, the role of epigenetic variation in lettuce domestication and divergence remains largely unknown. Results: To understand the genetic and epigenetic basis underlying lettuce domestication and divergence, we generate single-base resolution DNA methylomes from 52 Lactuca accessions, including major lettuce cultivars and wild relatives. We find a significant increase of DNA methylation during lettuce domestication and uncover abundant epigenetic variations associated with lettuce domestication and divergence. Interestingly, DNA methylation variations specifically associated with leafy and stem lettuce are related to regulation and metabolic processes, respectively, while those associated with both types are enriched in stress responses. Moreover, we reveal that domestication-induced DNA methylation changes could influence expression levels of nearby and distal genes, possibly through affecting chromatin accessibility and chromatin looping. Conclusion: Our study provides population epigenomic insights into crop domestication and divergence and valuable resources for further domestication for diversity and epigenetic breeding to boost crop improvement.
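The option `--score_min L,0,-0.2` quoted in the excerpt above uses the linear function notation of Bowtie 2, in which `L,a,b` stands for f(x) = a + b * x with x the read length. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the code does not call Bismark or Bowtie 2.

```python
def min_alignment_score(read_length: int,
                        intercept: float = 0.0,
                        slope: float = -0.2) -> float:
    """Evaluate the linear threshold L,intercept,slope for one read length.

    Hypothetical helper for illustration only: it reproduces the arithmetic
    f(x) = intercept + slope * x, nothing more.
    """
    return intercept + slope * read_length


if __name__ == "__main__":
    for length in (50, 100, 150):
        print(length, min_alignment_score(length))
```

Under this function a 100 bp read needs an alignment score of about -20 or better to be reported, so longer reads tolerate proportionally more mismatch and gap penalties.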
... It converts unmethylated cytosine nucleotides to uracil, while methylated cytosines remain unchanged (Krueger et al., 2012). ...
Article
Ongoing climatic shifts and increasing anthropogenic pressures demand an efficient delineation of conservation units and accurate predictions of populations' resilience and adaptive potential. Molecular tools involving DNA sequencing are nowadays routinely used for these purposes. Yet, most of the existing tools focusing on sequence‐level information have shortcomings in detecting signals of short‐term ecological relevance. Epigenetic modifications carry valuable information to better link individuals, populations, and species to their environment. Here, we discuss a series of epigenetic monitoring tools that can be directly applied to various conservation contexts, complementing already existing molecular monitoring frameworks. Focusing on DNA sequence‐based methods (e.g. DNA methylation, for which the applications are readily available), we demonstrate how (a) the identification of epi‐biomarkers associated with age or infection can facilitate the determination of an individual's health status in wild populations; (b) whole epigenome analyses can identify signatures of selection linked to environmental conditions and facilitate estimating the adaptive potential of populations; and (c) epi‐eDNA (epigenetic environmental DNA), an epigenetic‐based conservation tool, presents a non‐invasive sampling method to monitor biological information beyond the mere presence of individuals. Overall, our framework refines conservation strategies, ensuring a comprehensive understanding of species' adaptive potential and persistence on ecologically relevant timescales.
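The conversion described in the excerpt above (unmethylated cytosine becomes uracil, methylated cytosine is unchanged) can be mimicked in silico. The sketch below is a simplified illustration under stated assumptions: it handles one strand only, assumes complete conversion, and writes uracil as T because that is how it is read after amplification. The function name and the set of methylated positions are made up for the example.

```python
from typing import Iterable


def bisulfite_convert(sequence: str, methylated_positions: Iterable[int]) -> str:
    """Return the sequence as it would be read after bisulfite treatment.

    Assumptions for this sketch: single strand, 100% conversion efficiency,
    0-based positions, and uracil reported as T.
    """
    protected = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in protected:
            converted.append("T")  # unmethylated C -> U, read as T
        else:
            converted.append(base)  # methylated C and all other bases unchanged
    return "".join(converted)


if __name__ == "__main__":
    # Only the cytosine at position 1 is methylated in this toy example.
    print(bisulfite_convert("ACGTCC", {1}))
```

Because most cytosines end up as T, converted reads no longer match the reference directly, which is the mapping problem that bisulfite-aware aligners address.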
... FASTQ reads were processed using the Bismark pipeline [20,21]. First, we used TrimGalore-0.6.6 with the '-rrbs' option to eliminate adapter contamination and perform tail trimming, which involved clipping 10 base pairs from the forward and reverse reads, to remove low complexity artifacts of the adaptase library construction method. ...
Article
Hepatic xenobiotic metabolism and transport decline with age, while intact xenobiotic metabolism is associated with longevity. However, few studies have examined the genome-wide impact of epigenetic aging on these processes. We used reduced representation bisulfite sequencing (RRBS) to map DNA methylation changes in liver DNA from mice ages 4 and 24 months. We identified several thousand age-associated differentially methylated sites (a-DMS), many of which overlapped genes encoding Phase I and Phase II drug metabolizing enzymes, in addition to ABC and SLC classes of transporters. Notable genes harboring a-DMS were Cyp1a2, Cyp2d9, and Abcc2 that encode orthologs of the human drug metabolizing enzymes CYP1A2 and CYP2D6, and the multidrug resistance protein 2 (MRP2) transporter. Cyp2d9 hypermethylation with age was significantly associated with reduced gene expression, while Abcc2 expression was unchanged with age. Cyp1a2 lost methylation with age while, counterintuitively, its expression also reduced with age. We hypothesized that age-related dysregulation of the hepatic transcriptional machinery caused down-regulation of genes despite age-related hypomethylation. Bioinformatic analysis of hypomethylated a-DMS in our sample found them to be highly enriched for hepatic nuclear factor 4 alpha (HNF4α) binding sites. HNF4α promotes Cyp1a2 expression and is downregulated with age, which could explain the reduction in Cyp1a2 expression. Overall, our study supports the broad impact of epigenetic aging on xenobiotic metabolism and transport. Future work should evaluate the interplay between hepatic nuclear receptor function and epigenetic aging. These results may have implications for studies of longevity and healthy aging.
... The omics era has seen a rapid growth in studies measuring these dynamic changes in DNA methylation during ageing. They are usually based on whole genome bisulphite sequencing (WGBS) [23]. Perhaps the most obvious way to search for patterns in these WGBS data is to look for differentially methylated positions (DMPs). ...
Article
Background: The ageing process is a multifaceted phenomenon marked by the gradual deterioration of cellular and organismal functions, accompanied by an elevated susceptibility to diseases. The intricate interplay between genetic and environmental factors complicates research, particularly in complex mammalian models. In this context, simple invertebrate organisms have been pivotal, but the current models lack detectable DNA methylation limiting the exploration of this critical epigenetic ageing mechanism. This study introduces Nasonia vitripennis, the jewel wasp, as an innovative invertebrate model for investigating the epigenetics of ageing. Leveraging its advantages as a model organism and possessing a functional DNA methylation system, Nasonia emerges as a valuable addition to ageing research. Results: Whole-genome bisulfite sequencing unveiled dynamic alterations in DNA methylation, with differentially methylated CpGs between distinct time points in both male and female wasps. These changes were associated with numerous genes, enriching for functions related to telomere maintenance, histone methylation, and mRNA catabolic processes. Additionally, other CpGs were found to be variably methylated at each timepoint. Sex-specific effects on epigenetic entropy were observed, indicating differential patterns in the loss of epigenetic stability over time. Constructing an epigenetic clock containing 19 CpGs revealed a robust correlation between epigenetic age and chronological age. Conclusions: Nasonia vitripennis emerges as a promising model for investigating the epigenetics of ageing, shedding light on the intricate dynamics of DNA methylation and their implications for age-related processes. This research not only expands the repertoire of ageing models but also opens avenues for deeper exploration of epigenetic mechanisms in the context of ageing.
... Bisulfite sequencing is a highly robust technology that enables detection and quantification of DNA methylation patterns [15][16][17]. In recent years, it has made immense contributions to our understanding of gene expression regulation, genome stability maintenance, and the heritability of epigenetic marks [2,3,7,18]. ...
Article
Background: Bisulfite sequencing detects and quantifies DNA methylation patterns, contributing to our understanding of gene expression regulation, genome stability maintenance, conservation of epigenetic mechanisms across divergent taxa, epigenetic inheritance and, eventually, phenotypic variation. Graphical representation of methylation data is crucial in exploring epigenetic regulation on a genome-wide scale in both plants and animals. This is especially relevant for non-model organisms with poorly annotated genomes and/or organisms where genome sequences are not yet assembled on chromosome level. Although bisulfite sequencing has been a technology of choice for profiling DNA methylation for many years, there are surprisingly few lightweight and robust standalone tools available for efficient graphical analysis of data in non-model systems. This significantly limits evolutionary studies and agrigenomics research. BSXplorer is a tool specifically developed to fill this gap and assist researchers in explorative data analysis and in visualising and interpreting bisulfite sequencing data more easily. Results: BSXplorer provides in-depth graphical analysis of sequencing data encompassing (a) profiling of methylation levels in metagenes or in user-defined regions using line plots and heatmaps, and generation of summary statistics charts; (b) comparative analyses of methylation patterns across experimental samples, methylation contexts and species; and (c) identification of modules sharing similar methylation signatures at functional genomic elements. The tool processes methylation data quickly and offers API and CLI capabilities, along with the ability to create high-quality figures suitable for publication. Conclusions: BSXplorer facilitates efficient methylation data mining, contrasting and visualization, making it an easy-to-use package that is highly useful for epigenetic research.
... However, reads generated from whole-genome sequencing techniques such as ATAC-seq (Buenrostro et al., 2013), BS-seq (Krueger et al., 2012), and RAD-seq (Davey and Blaxter, 2011) are not recommended to be used with a masked genome in analysis. These techniques are commonly employed for sequencing across the entire genome, aiming to comprehend the structure, functionality, and variations within the genome. ...
Article
Nucleic acid modifications play essential roles in diverse biological processes, ranging from gene expression regulation to stress response. While traditional research focused on common modifications like methylation, recent discoveries are unveiling a wide range of rare modifications with potentially crucial functions. However, accurately detecting and mapping these modifications pose significant challenges due to their low abundance and diverse chemical properties. This article summarizes the recent discoveries of rare DNA and RNA modifications across various organisms, highlighting their potential biological significance. Furthermore, it critically evaluates the limitations of current mapping techniques, including potential sources of false positives and negatives. Finally, the article discusses emerging strategies for overcoming these challenges and future opportunities in the field of rare nucleic acid modification detection.
Article
Bisulfite sequencing, a combination of bisulfite treatment and high-throughput sequencing, has proved to be a valuable method for measuring DNA methylation at single base resolution. Here, we present B-SOLANA, an approach for the analysis of two-base encoding (colorspace) bisulfite sequencing data on the SOLiD platform of Life Technologies. It includes the alignment of bisulfite sequences and the determination of methylation levels in CpG as well as non-CpG sequence contexts. B-SOLANA enables a fast and accurate analysis of large raw sequence datasets. Availability: The source code, released under the GNU GPLv3 licence, is freely available at http://code.google.com/p/bsolana/. Contact: b.kreck@ikmb.uni-kiel.de Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Motivation: MethylCoder is a software program that generates per-base methylation data given a set of bisulfite-treated reads. It provides the option to use either of two existing short-read aligners, each with different strengths. It accounts for soft-masked alignments and overlapping paired-end reads. MethylCoder outputs data in text and binary formats in addition to the final alignment in SAM format, so that common high-throughput sequencing tools can be used on the resulting output. It is more flexible than existing software and competitive in terms of speed and memory use. Availability: MethylCoder requires only a python interpreter and a C compiler to run. Extensive documentation and the full source code are available under the MIT license at: https://github.com/brentp/methylcode. Contact: bpederse@gmail.com.
Article
Elucidating how and to what extent CpG islands (CGIs) are methylated in germ cells is essential to understand genomic imprinting and epigenetic reprogramming. Here we present, to our knowledge, the first integrated epigenomic analysis of mammalian oocytes, identifying over a thousand CGIs methylated in mature oocytes. We show that these CGIs depend on DNMT3A and DNMT3L but are not distinct at the sequence level, including in CpG periodicity. They are preferentially located within active transcription units and are relatively depleted in H3K4me3, supporting a general transcription-dependent mechanism of methylation. Very few methylated CGIs are fully protected from post-fertilization reprogramming but, notably, the majority show incomplete demethylation in embryonic day (E) 3.5 blastocysts. Our study shows that CGI methylation in gametes is not entirely related to genomic imprinting but is a strong factor in determining methylation status in preimplantation embryos, suggesting a need to reassess mechanisms of post-fertilization demethylation.
Article
5-hydroxymethylcytosine (5hmC) is a modified base present at low levels in diverse cell types in mammals. 5hmC is generated by the TET family of Fe(II) and 2-oxoglutarate-dependent enzymes through oxidation of 5-methylcytosine (5mC). 5hmC and TET proteins have been implicated in stem cell biology and cancer, but information on the genome-wide distribution of 5hmC is limited. Here we describe two novel and specific approaches to profile the genomic localization of 5hmC. The first approach, termed GLIB (glucosylation, periodate oxidation, biotinylation) uses a combination of enzymatic and chemical steps to isolate DNA fragments containing as few as a single 5hmC. The second approach involves conversion of 5hmC to cytosine 5-methylenesulphonate (CMS) by treatment of genomic DNA with sodium bisulphite, followed by immunoprecipitation of CMS-containing DNA with a specific antiserum to CMS. High-throughput sequencing of 5hmC-containing DNA from mouse embryonic stem (ES) cells showed strong enrichment within exons and near transcriptional start sites. 5hmC was especially enriched at the start sites of genes whose promoters bear dual histone 3 lysine 27 trimethylation (H3K27me3) and histone 3 lysine 4 trimethylation (H3K4me3) marks. Our results indicate that 5hmC has a probable role in transcriptional regulation, and suggest a model in which 5hmC contributes to the 'poised' chromatin signature found at developmentally-regulated genes in ES cells.
Article
A combination of bisulfite treatment of DNA and high-throughput sequencing (BS-Seq) can capture a snapshot of a cell's epigenomic state by revealing its genome-wide cytosine methylation at single base resolution. Bismark is a flexible tool for the time-efficient analysis of BS-Seq data which performs both read mapping and methylation calling in a single convenient step. Its output discriminates between cytosines in CpG, CHG and CHH context and enables bench scientists to visualize and interpret their methylation data soon after the sequencing run is completed. Availability and implementation: Bismark is released under the GNU GPLv3+ licence. The source code is freely available from www.bioinformatics.bbsrc.ac.uk/projects/bismark/. Contact: felix.krueger@bbsrc.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
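The three sequence contexts named in the abstract above (CpG, CHG and CHH, where H is A, C or T) can be told apart from the two bases downstream of a cytosine. The sketch below is an illustrative classifier for forward-strand cytosines only; it is not Bismark's implementation, and the function name and the "unknown" label for incomplete or ambiguous contexts are assumptions made for the example.

```python
def cytosine_context(sequence: str, position: int) -> str:
    """Classify the cytosine at a 0-based position as CpG, CHG or CHH.

    Forward strand only. Returns "unknown" when the downstream bases are
    missing or ambiguous (for example N), which is a choice of this sketch.
    """
    seq = sequence.upper()
    if seq[position] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[position + 1:position + 3]
    if downstream[:1] == "G":
        return "CpG"  # the second downstream base is irrelevant here
    if len(downstream) < 2 or any(base not in "ACGT" for base in downstream):
        return "unknown"
    # The first downstream base is now known to be A, C or T (that is, H).
    return "CHG" if downstream[1] == "G" else "CHH"


if __name__ == "__main__":
    for triplet in ("CGA", "CAG", "CTT"):
        print(triplet, cytosine_context(triplet, 0))
```

A cytosine on the reverse strand would be classified the same way after reverse-complementing the surrounding sequence.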
Article
Methylation at the 5' position of cytosine in DNA has important roles in genome function and is dynamically reprogrammed during early embryonic and germ cell development. The mammalian genome also contains 5-hydroxymethylcytosine (5hmC), which seems to be generated by oxidation of 5-methylcytosine (5mC) by the TET family of enzymes that are highly expressed in embryonic stem (ES) cells. Here we use antibodies against 5hmC and 5mC together with high throughput sequencing to determine genome-wide patterns of methylation and hydroxymethylation in mouse wild-type and mutant ES cells and differentiating embryoid bodies. We find that 5hmC is mostly associated with euchromatin and that whereas 5mC is under-represented at gene promoters and CpG islands, 5hmC is enriched and is associated with increased transcriptional levels. Most, if not all, 5hmC in the genome depends on pre-existing 5mC and the balance between these two modifications is different between genomic regions. Knockdown of Tet1 and Tet2 causes downregulation of a group of genes that includes pluripotency-related genes (including Esrrb, Prdm14, Dppa3, Klf2, Tcl1 and Zfp42) and a concomitant increase in methylation of their promoters, together with an increased propensity of ES cells for extraembryonic lineage differentiation. Declining levels of TETs during differentiation are associated with decreased hydroxymethylation levels at the promoters of ES cell-specific genes together with increased methylation and gene silencing. We propose that the balance between hydroxymethylation and methylation in the genome is inextricably linked with the balance between pluripotency and lineage commitment.
Article
Genome-wide mapping of 5-methylcytosine is of broad interest to many fields of biology and medicine. A variety of methods have been developed, and several have recently been advanced to genome-wide scale using arrays and next-generation sequencing approaches. We have previously reported reduced representation bisulfite sequencing (RRBS), a bisulfite-based protocol that enriches CG-rich parts of the genome, thereby reducing the amount of sequencing required while capturing the majority of promoters and other relevant genomic regions. The approach provides single-nucleotide resolution, is highly sensitive and provides quantitative DNA methylation measurements. This protocol should enable any standard molecular biology laboratory to generate RRBS libraries of high quality. Briefly, purified genomic DNA is digested by the methylation-insensitive restriction enzyme MspI to generate short fragments that contain CpG dinucleotides at the ends. After end-repair, A-tailing and ligation to methylated Illumina adapters, the CpG-rich DNA fragments (40-220 bp) are size selected, subjected to bisulfite conversion, PCR amplified and end sequenced on an Illumina Genome Analyzer. Note that alignment and analysis of RRBS sequencing reads are not covered in this protocol. The extremely low input requirements (10-300 ng), the applicability of the protocol to formalin-fixed and paraffin-embedded samples, and the technique's single-nucleotide resolution extends RRBS to a wide range of biological and clinical samples and research applications. The entire process of RRBS library construction takes ∼9 d.
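The MspI digestion and size selection described in the protocol abstract above can be approximated in silico. The sketch below is a simplification under stated assumptions: it cuts a single strand at the MspI recognition site (C^CGG), ignores end-repair, adapters and bisulfite conversion, and applies the 40-220 bp window from the abstract directly to fragment length. The function name is hypothetical.

```python
from typing import List


def mspi_fragments(sequence: str, min_len: int = 40, max_len: int = 220) -> List[str]:
    """Cut a sequence at every C^CGG site and keep size-selected fragments.

    Assumptions for this sketch: single strand, complete digestion, and the
    two terminal pieces are treated like any other fragment even though only
    one of their ends comes from an MspI cut.
    """
    seq = sequence.upper()
    cut_points = []
    site = seq.find("CCGG")
    while site != -1:
        cut_points.append(site + 1)  # MspI cuts between the first C and CGG
        site = seq.find("CCGG", site + 1)
    boundaries = [0] + cut_points + [len(seq)]
    fragments = [seq[start:end] for start, end in zip(boundaries, boundaries[1:])]
    return [frag for frag in fragments if min_len <= len(frag) <= max_len]


if __name__ == "__main__":
    toy = "A" * 10 + "CCGG" + "T" * 50 + "CCGG" + "A" * 10
    for fragment in mspi_fragments(toy):
        print(len(fragment), fragment[:3], fragment[-1])
```

Each internal fragment starts with CGG and ends with C, which is why the abstract notes that the resulting fragments carry CpG dinucleotides at their ends.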
Article
The developmental potential of human pluripotent stem cells suggests that they can produce disease-relevant cell types for biomedical research. However, substantial variation has been reported among pluripotent cell lines, which could affect their utility and clinical safety. Such cell-line-specific differences must be better understood before one can confidently use embryonic stem (ES) or induced pluripotent stem (iPS) cells in translational research. Toward this goal we have established genome-wide reference maps of DNA methylation and gene expression for 20 previously derived human ES lines and 12 human iPS cell lines, and we have measured the in vitro differentiation propensity of these cell lines. This resource enabled us to assess the epigenetic and transcriptional similarity of ES and iPS cells and to predict the differentiation efficiency of individual cell lines. The combination of assays yields a scorecard for quick and comprehensive characterization of pluripotent cell lines.