Article · Literature Review

Interspersed repeats and other mementos of transposable elements in mammalian genomes

Abstract

The bulk of the human genome is ultimately derived from transposable elements. Observations in the past year lead to some new and surprising ideas on functions and consequences of these elements and their remnants in our genome. The many new examples of human genes derived from single transposon insertions highlight the large contribution of selfish DNA to genomic evolution.

... Both of these trends in HERV distribution are likely to be the result of purifying selection: a harmful HERV provirus within a transcription unit is subject to negative selection and is lost over the course of evolution [12,15,21-24]. Because HERV splicing and poly(A) addition signals are present in the antisense direction, HERV transcription in the direction opposite to that of the gene may be the least disruptive to mRNA synthesis [15,21,22,25]. A recent study proposed a correlation between silencing mechanisms and the evolutionary age of HERVs. ...
... The results showed that these elements are mainly distributed in intergenic regions and introns. This may be because integration of a HERV provirus within a transcription unit is harmful and is therefore subject to negative selection and elimination during evolution [12,15,21-24]. In particular, the number of proviruses on the Y chromosome differed significantly from the expected number (chi-square test, p = 0.01), indicating that the male-specific region of the Y chromosome (MSY) accumulates higher densities of HERVs and associated sequences, consistent with previous studies [65]. ...
Article
Background: Human endogenous retroviruses (HERVs) result from ancestral infections caused by exogenous retroviruses that became incorporated into the germline DNA and evolutionarily fixed in the human genome. HERVs can be transmitted vertically in a Mendelian fashion and be stably maintained in the human genome, of which they are estimated to comprise approximately 8%. HERV-K (HML1-10) transcription has been confirmed to be associated with a variety of diseases, such as breast cancer, lung cancer, prostate cancer, melanoma, rheumatoid arthritis, and amyotrophic lateral sclerosis. However, the poor characterization of HML-9 prevents a detailed understanding of the regulation of the expression of this family in humans and its impact on the host genome. In light of this, a precise and updated HERV-K HML-9 genomic map is urgently needed to better evaluate the role of these elements in human health. Results: We report a comprehensive analysis of the presence and distribution of HERV-K HML-9 elements within the human genome, with a detailed characterization of the structural and phylogenetic properties of the group. A total of 23 proviruses and 47 solo LTR elements were characterized, with a detailed description of the provirus structure, integration time, potential regulated genes, transcription factor binding sites (TFBS), and primer binding site (PBS) features. The integration time results showed that the HML-9 elements found in the human genome integrated into the primate lineage between 17.5 and 48.5 million years ago (mya). Conclusion: The results provide a clear characterization of HML-9 and a comprehensive background for subsequent functional studies.
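The snippet quoted from this article compares the number of proviruses on the Y chromosome with the number expected. A minimal sketch of such a goodness-of-fit test follows, assuming the null expectation is proportional to chromosome length and collapsing the comparison to "Y versus the rest of the genome" (1 degree of freedom). The function name and the counts are hypothetical and are not taken from the HML-9 study.

```python
import math


def chi_square_one_vs_rest(observed_in_region, total, expected_fraction):
    """Pearson chi-square goodness-of-fit test for one region (e.g. chrY)
    versus the rest of the genome, with 1 degree of freedom.

    expected_fraction is the share of insertions expected in the region
    under a uniform-density null (e.g. region length / genome length).
    """
    observed = [observed_in_region, total - observed_in_region]
    expected = [total * expected_fraction, total * (1 - expected_fraction)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 6 of 100 insertions fall in a region expected to hold 2%.
stat, p = chi_square_one_vs_rest(observed_in_region=6, total=100, expected_fraction=0.02)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

With an expected count as small as 2, the chi-square approximation is rough; an exact binomial test is the usual safer choice in that situation.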
... PCR amplification of the breakpoints was done using primer pairs with one primer in the RP1 gene and one in the L1 insert. The L1-annealing primers are somewhat nonspecific, as L1-derived sequences are found across the human genome in both orientations [29,30]. The PCR reactions were performed using Biotools DNA polymerase according to the manufacturer's instructions (Biotools B&M Labs, Madrid, Spain), run on a 1% agarose gel, and inspected for L1 insertion-specific bands. ...
... The coordinates for the targets were extended 15 kb up- and downstream. A .bed file was created by merging any overlapping regions, and Bedtools (v2.30) ...
Article
Retinitis pigmentosa (RP) is a group of inherited degenerative retinal disorders affecting more than 1.5 million people worldwide. For 30-50% of individuals with RP, the genetic cause remains unresolved by current clinical diagnostic gene panels. It is likely explained by variants in novel RP-associated genes or noncoding regulatory regions, or by complex genetic alterations such as large structural variants. Recent developments in long-read sequencing techniques have opened an opportunity for efficient analysis of complex genetic variants. We analysed a Finnish family with dominantly inherited RP affecting six individuals in three generations. Two affected individuals underwent a comprehensive clinical examination in combination with a clinical diagnostic gene panel, followed by whole exome sequencing in our laboratory. They exhibited typical signs of RP, yet initial sequence analysis found no causative variants. Reanalysis of the sequencing data detected a LINE-1 (L1) retrotransposon insertion of unknown size in exon 4 of the RP1 axonemal microtubule-associated (RP1) gene. The large chimeric L1 insertion that segregated with the disease was further characterised using targeted adaptive nanopore sequencing of RP1, allowing us to identify a 5.6 kb L1 transposable element insertion in RP1 as the cause of RP in this family with dominantly inherited RP.
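The target-preparation step quoted from this article (extend each target by 15 kb on both sides, then merge overlapping regions into a single .bed file) can be sketched in plain Python. This is a minimal sketch assuming BED-style 0-based, half-open coordinates; it mirrors the default behavior of `bedtools merge`, which also joins book-ended intervals, but it is not the authors' actual command line, which the snippet truncates. The function name and coordinates are hypothetical.

```python
def extend_and_merge(regions, flank=15_000, chrom_sizes=None):
    """Extend (chrom, start, end) intervals by `flank` bp on both sides, then
    merge overlapping or book-ended intervals per chromosome.

    Coordinates are BED-style (0-based, half-open). If chrom_sizes is given,
    extended ends are clipped to the chromosome length.
    """
    extended = []
    for chrom, start, end in regions:
        new_start = max(0, start - flank)
        new_end = end + flank
        if chrom_sizes is not None:
            new_end = min(new_end, chrom_sizes[chrom])
        extended.append((chrom, new_start, new_end))

    merged = []
    for chrom, start, end in sorted(extended):  # sort by chrom, then start
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping or touching the previous interval: grow it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical target coordinates (chrom, start, end), not the study's targets.
targets = [
    ("chr2", 1_000_000, 1_020_000),
    ("chr2", 1_040_000, 1_050_000),
    ("chr1", 100_000, 101_000),
]
merged = extend_and_merge(targets)
for chrom, start, end in merged:
    print(f"{chrom}\t{start}\t{end}")  # one BED line per merged region
```

Here the two chr2 targets are 20 kb apart, so after 15 kb extension their windows overlap and collapse into one region.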
... Briefly, canonical FlyBase TE consensus sequences were curated for each strain to include only the TEs that best represent the TE insertion landscape of that strain, using RepeatMasker-4.1.0 results (Smit 1999; Larkin et al. ...) ...
... RepeatMasker outputs for the comprehensive D. melanogaster TE library on the respective genome assemblies of the 8 strains were generated (Smit 1999). The RM.out files were parsed, and insertions were defragmented using scripts from https://github.com/4ureliek/Parsing-RepeatMasker-Outputs ...
Preprint
Animal genomes are parasitized by a horde of transposable elements (TEs) whose mutagenic activity can have catastrophic consequences. The piRNA pathway is a conserved mechanism that represses TE activity in the germline via a specialized class of small RNAs, called Piwi-interacting RNAs (piRNAs), that associate with effector Piwi proteins. piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). While piCs are generally enriched for TE sequences, and the molecular processes by which they are transcribed and regulated are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC evolution, we use a population genomics approach to compare piC activity and sequence composition across 8 geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We perform extensive annotations of ovary piCs and TE content in each strain and test the predictions of two proposed models of piC evolution. The 'de novo' model posits that individual TE insertions can spontaneously attain the status of a small piC and generate piRNAs silencing the entire TE family. The 'trap' model envisions large, evolutionarily stable genomic clusters in which TEs tend to accumulate; these clusters serve as a long-term "memory" of ancient TE invasions and produce a great variety of piRNAs protecting against related TEs entering the genome. It remains unclear which model best describes the evolution of piCs. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs in natural populations. Most TE families inferred to be recently or currently active show an enrichment of strain-specific insertions into large piCs, consistent with the trap model.
By contrast, only a small subset of active LTR retrotransposon families is enriched for the formation of strain-specific piCs, suggesting that these families have an inherent proclivity to form de novo piCs. Thus, our findings support aspects of both 'de novo' and 'trap' models of piC evolution. We propose that these two models represent two extreme stages along an evolutionary continuum, which begins with the emergence of piCs de novo from a few specific LTR retrotransposon insertions that subsequently expand by accretion of other TE insertions during evolution to form larger 'trap' clusters. Our study shows that piCs are evolutionarily labile and that TEs themselves are the major force driving the formation and evolution of piCs.
... In addition, mobile element insertions (MEIs) have gained interest in the field of cancer and genetic predisposition in recent years. While 42% of the human genome consists of mobile elements (MEs) [15], only a small fraction (<0.05%) of them remain active. All active elements are retrotransposons and belong to subfamilies of Long Interspersed Nuclear Element 1 (LINE-1), Alu, and SINE-VNTR-Alu (SVA) elements. ...
... The detection of MEIs remains challenging due to the ME abundance across the human genome and their repeating nature, which causes sequencing and mapping errors [15]. Nonetheless, based on current estimations, MEIs occur de novo in 1 of 12-14 live births [23], might be responsible for 0.04% of all genetic diseases [24], and represent up to 0.3% of all disease-causing variants [19]. ...
Article
The vast majority of patients at risk of hereditary breast and/or ovarian cancer (HBOC) syndrome remain without a molecular diagnosis after routine genetic testing. One type of genomic alteration that is commonly missed by diagnostic pipelines is mobile element insertions (MEIs). Here, we reanalyzed multigene panel data from suspected HBOC patients using the MEI detection tool Mobster. A novel Alu element insertion in ATM intron 54 (ATM:c.8010+30_8010+31insAluYa5) was identified as a potential contributing factor in seven patients. Transcript analysis of patient-derived RNA from three heterozygous carriers revealed exon 54 skipping in 38% of total ATM transcripts. To demonstrate a direct association between the Alu element insertion and the aberrant splice pattern, HEK293T and MCF7 cells were transfected with wild-type or Alu element-carrying minigene constructs. On average, 77% of plasmid-derived transcripts lacked exon 54 in the presence of the Alu element insertion, compared with only 4.7% of transcripts expressed from the wild-type minigene. These results strongly suggest that ATM:c.8010+30_8010+31insAluYa5 is the main driver of ATM exon 54 skipping. Since this exon loss is predicted to cause a frameshift and a premature stop codon, mutant transcripts are unlikely to be translated into functional proteins. Based on its estimated frequency of up to 0.05% in control populations, we propose considering ATM:c.8010+30_8010+31insAluYa5 in suspected HBOC patients and clarifying its role in carcinogenesis through future epidemiological and functional analyses. Generally, implementing MEI detection tools in diagnostic sequencing pipelines could increase the diagnostic yield, as MEIs are likely underestimated contributors to genetic diseases.
... The ongoing explosion in the number of sequenced organisms highlights the need for reliable and thorough automated genome annotation pipelines. Most of the vertebrate genome finds its ultimate origin in transposable elements (TEs) (1-5), which have an enormous impact on genome activity and evolution (6-9). Due to the volume and diversity of TEs, complete annotation of genomes depends on accurate identification and modeling of TE families (10). ...
... To simulate master-gene model phylogenies such as those seen in LINE families (Figure 2B), the tree is expanded by adding a randomly determined number of children (2-5) with randomly chosen branch lengths (0-5) to the current parent node (initially the root of the tree). One of the new children is randomly picked to be the new parent node (or 'master gene'), and the process is iterated until the target number of extant nodes (100) is reached. ...
Article
The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low- to medium-divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning highly divergent and fragmented instances of a family.
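The master-gene tree-growing procedure quoted from this article can be sketched as follows. This is a minimal sketch, not the authors' simulator: it assumes branch lengths are drawn uniformly from the continuous interval [0, 5], reads "extant nodes" as leaves, and caps the final child count so the leaf total lands exactly on the target. The function name and parameters are hypothetical.

```python
import random


def simulate_master_gene_tree(target_leaves=100, min_children=2, max_children=5,
                              max_branch=5.0, seed=None):
    """Grow a tree by repeatedly attaching 2-5 children (branch lengths 0-5)
    to the current parent, then promoting one new child to be the next
    parent ("master gene"), until the tree has `target_leaves` leaves.

    Returns (tree, leaves): tree maps node id -> (parent id, branch length).
    """
    rng = random.Random(seed)
    tree = {0: (None, 0.0)}  # node 0 is the root
    leaves = {0}
    parent = 0
    next_id = 1
    while len(leaves) < target_leaves:
        leaves.discard(parent)  # the current parent becomes an internal node
        # Assumption: cap the child count so the leaf total never overshoots.
        n_children = min(rng.randint(min_children, max_children),
                         target_leaves - len(leaves))
        children = []
        for _ in range(n_children):
            tree[next_id] = (parent, rng.uniform(0.0, max_branch))
            leaves.add(next_id)
            children.append(next_id)
            next_id += 1
        parent = rng.choice(children)  # promote one child to master gene
    return tree, leaves


tree, leaves = simulate_master_gene_tree(seed=42)
print(len(tree), "nodes,", len(leaves), "leaves")
```

Because only the promoted child ever receives descendants, the result is a ladder-like (caterpillar) topology, which is the shape that distinguishes master-gene phylogenies from more balanced trees.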
... Transposable elements (TEs), also known as transposons or mobile elements, comprise a significant portion of mammalian genomes (1-3), including approximately half of the human genome (4). Most TEs are transposition-incompetent owing to accumulated internal mutations and truncations, or are silenced by various host repression mechanisms (5). ...
... Although researchers have long noted that most reference LTR elements and L1s in gene introns are in the antisense orientation with respect to the host genes (1,53), possibly because sense-oriented elements interfere with transcript processing (67,68), no conclusions have been established about the orientation bias of other types of nonreference MEIs, and the number of sites in previous studies was limited (17,44,46). Our large collection of MEIs found in genes allowed us to closely examine the strand bias of different MEIs. ...
Article
Mobile element insertions (MEIs) are a major class of structural variants (SVs) and have been linked to many human genetic disorders, including hemophilia, neurofibromatosis, and various cancers. However, human MEI resources from large-scale genome sequencing are still lacking compared to those for SNPs and SVs. Here, we report a comprehensive map of 36 699 non-reference MEIs constructed from 5675 genomes, comprising 2998 Chinese samples (∼26.2×, NyuWa) and 2677 samples from the 1000 Genomes Project (∼7.4×, 1KGP). We discovered that LINE-1 insertions were highly enriched in centromere regions, implying the role of chromosome context in retroelement insertion. After functional annotation, we estimated that MEIs are responsible for about 9.3% of all protein-truncating events per genome. Finally, we built a companion database named HMEID for public use. This resource represents the latest and largest genomewide study on MEIs and will have broad utility for exploration of human MEI findings.
... The RepeatMasker data for humans (hg38) and zebrafish (zv9) were downloaded from UCSC (Smit, 1996, 1999) and overlapped with the syntenic lncRNAs of humans and zebrafish, respectively. The enrichment was calculated as previously described (Karakülah & Suner, 2017), and Fisher's test was computed for the p-value and odds ratio using R. ...
Preprint
Syntenic long non-coding RNAs (lncRNAs) often show limited sequence conservation across species, prompting concern in the field. This study delves into the functional signatures of syntenic lncRNAs between humans and zebrafish. Syntenic lncRNAs are highly expressed in zebrafish, and ∼90% lie near protein-coding genes in sense or antisense orientation. During early zebrafish development and in human embryonic stem cells (H1-hESC), they are enriched for cis-regulatory repressor signatures, influencing development-associated genes. In later zebrafish developmental stages and in specific human cell lines, these lncRNAs serve as enhancers or transcription start sites (TSSs) for protein-coding genes. Analysis of transposable elements (TEs) in syntenic lncRNA sequence divergence reveals intriguing patterns: human lncRNAs show enrichment of simple repeat elements, while their zebrafish counterparts exhibit LTR element enrichment. This sequence evolution, possibly stemming from post-rearrangement mutations, enhances DNA elements or cis-regulatory functions. It may also contribute to vertebrate innovation by creating novel TF binding sites within the locus. This study sheds light on the conserved functionality of syntenic lncRNAs through DNA elements, emphasizing their role across species despite sequence divergence.
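The snippet quoted from this preprint reports a Fisher's test p-value and odds ratio for repeat overlap with syntenic lncRNAs, computed in R. A minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table follows. It assumes a conventional layout (rows: syntenic lncRNAs vs. background; columns: overlapping vs. not overlapping a repeat class), does not reproduce the enrichment formula of Karakülah & Suner (2017), and uses hypothetical counts.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (table_prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    if b * c == 0:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p_value)


# Hypothetical counts, not data from the study:
# 30 of 100 lncRNAs overlap the repeat class vs. 100 of 1000 background regions.
odds_ratio, p = fisher_exact_2x2(30, 70, 100, 900)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.3g}")
```

One design note: R's `fisher.test` reports a conditional maximum-likelihood estimate of the odds ratio, so its value will differ slightly from the sample odds ratio (ad/bc) computed here, while the two-sided p-value follows the same "sum of tables no more probable than observed" rule.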
... Transposons or transposable elements (TEs) are present in almost all organisms; these mobile genetic elements account for ~50% of the human genome (12,13). Transposons are usually classified as RNA-based retrotransposons and 'cut-paste' or 'cut-copy' types of transposons (14). ...
Article
Glioma is the most common type of primary intracranial malignant tumor, and because of its high invasiveness and recurrence, its prognosis remains poor. The present study investigated the biological function of piggyBac transposable element derived 5 (PGBD5) in glioma. Glioma and para-cancerous tissues were obtained from five patients. Reverse transcription-quantitative PCR and western blotting were used to detect the expression levels of PGBD5. Transwell assay and flow cytometry were used to evaluate cell migration, invasion, apoptosis and cell cycle distribution. In addition, a nude mouse tumor transplantation model was established to study the downstream pathways of PGBD5, and the molecular mechanism was analyzed using transcriptome sequencing. The mRNA and protein expression levels of PGBD5 were increased in glioma tissues and cells. Notably, knockdown of PGBD5 in vitro could inhibit the migration and invasion of glioma cells. In addition, the knockdown of PGBD5 expression promoted apoptosis and caused cell cycle arrest in the G2/M phase, thus inhibiting cell proliferation. Furthermore, in vivo experiments revealed that knockdown of PGBD5 expression could inhibit Ki67 expression and slow tumor growth. Changes in PGBD5 expression were also shown to be closely related to the peroxisome proliferator-activated receptor (PPAR) signaling pathway. In conclusion, interference with PGBD5 could inhibit the malignant progression of glioma through the PPAR pathway, suggesting that PGBD5 may be a potential molecular target in glioma.
... Transposable elements (TEs), also known as mobile elements or transposons, comprise approximately half of the mammalian genome [1,2]. The transposition of TEs can result in diverse genetic effects, such as genomic instability, which has been extensively reported in humans and is associated with the development of diseases [3]. In addition, TEs mediate structural variations (SVs), including insertions or deletions, in the genome. ...
Article
Transposable element (TE) mobility can generate a large number of structural variants (SVs), which have considerable potential as molecular markers for genetic analysis and molecular breeding in livestock. Our results showed that the pig genome mainly contains TE-SVs generated by short interspersed nuclear elements (51,873; 76.49%), followed by long interspersed nuclear elements (11,131; 16.41%), and more than 84% of the common TE-SVs (minor allele frequency, MAF > 0.10) were validated as polymorphic. Subsequently, we utilized the identified TE-SVs to gain insights into the population structure, resulting in clear differentiation among the three pig groups and facilitating the identification of relationships within Chinese local pig breeds. In addition, we investigated the frequencies of TEs in the gene coding regions of different pig groups and annotated the respective TE types, related genes, and functional pathways. Through genome-wide comparisons of Large White pigs and Chinese local pigs utilizing the Beijing Black pigs, we identified TE-mediated SVs associated with quantitative trait loci and observed that they were mainly involved in carcass traits and meat quality traits. Lastly, we present the first documented evidence of TE transduction in the pig genome.
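The abstract defines common TE-SVs by minor allele frequency (MAF > 0.10). A minimal sketch of that filter follows, assuming biallelic presence/absence genotypes coded as insertion-allele counts per diploid sample (0, 1 or 2, with None for a missing call). The data layout, function names and variant IDs are hypothetical, not the study's format.

```python
def minor_allele_frequency(genotypes):
    """MAF for a biallelic presence/absence TE-SV.

    `genotypes` holds per-sample insertion-allele counts: 0 (no insertion),
    1 (heterozygous) or 2 (homozygous insertion); None marks a missing call.
    Returns None if no sample was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    insertion_freq = sum(called) / (2 * len(called))
    return min(insertion_freq, 1 - insertion_freq)


def common_te_svs(genotype_table, maf_threshold=0.10):
    """Return IDs of TE-SVs whose MAF strictly exceeds the threshold."""
    common = []
    for sv_id, genotypes in genotype_table.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and maf > maf_threshold:
            common.append(sv_id)
    return common


# Hypothetical genotypes for five diploid pigs.
table = {
    "te_sv_1": [0, 1, 2, 1, 0],     # 4/10 insertion alleles -> MAF 0.4
    "te_sv_2": [0, 0, 0, 0, 1],     # 1/10 -> MAF 0.1, not > 0.10
    "te_sv_3": [2, 2, 2, 2, None],  # fixed insertion -> MAF 0.0
}
print(common_te_svs(table))  # prints ['te_sv_1']
```

Taking the minimum of the insertion and non-insertion frequencies makes the filter independent of which allele is labelled as the insertion.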
... A role for DNA repetitive elements in personalized medicine. One of the current challenges in molecular medicine is to understand how DNA variations in non-coding sequences translate into phenotypic variability among individuals. Repetitive DNA elements represent 56-69% of the human genome [1,2,14,110,111]. Although macrosatellite repeats have been less well studied than many other repeat classes, there is increasing evidence for a strong correlation between macrosatellite copy number, epigenetic modifications and local gene expression (references). ...
Preprint
Background Reduced copy number of the D4Z4 macrosatellite at human chromosome 4q35 is associated with facioscapulohumeral muscular dystrophy (FSHD). A pervasive idea is that chromatin alterations at the 4q35 locus following D4Z4 repeat unit deletion lead to disease via inappropriate expression of nearby genes. Here, we sought to analyze transcription and chromatin characteristics across 4q35 and how these are affected by D4Z4 deletions and exogenous stresses. Results We found that the 4q subtelomere is subdivided into discrete domains, each with characteristic chromatin features associated with distinct gene expression profiles. Centromere-proximal genes within 4q35 (ANT1, FAT1 and FRG1) display active histone marks at their promoters. In contrast, poised or repressed markings are present at telomere-proximal loci including FRG2, DBE-T and D4Z4. We discovered that these discrete domains undergo region-specific chromatin changes upon treatment with chromatin enzyme inhibitors or genotoxic drugs. We demonstrated that the 4q35 telomere-proximal FRG2, DBE-T and D4Z4-derived transcripts are induced upon DNA damage to levels inversely correlated with the D4Z4 repeat number, are stabilized through post-transcriptional mechanisms upon DNA damage, and are bound to chromatin. Conclusion Our study reveals unforeseen biochemical features of RNAs from clustered transcription units within the 4q35 subtelomere. Specifically, the FRG2, DBE-T and D4Z4-derived transcripts are chromatin-associated and are stabilized post-transcriptionally after induction by genotoxic stress. Remarkably, the extent of this response is modulated by the copy number of the D4Z4 repeats, raising new hypotheses about their regulation and function in human biology and disease.
... To understand the genome architecture at SV breakpoints and the role of unusual DNA sequences such as low-copy repeats or tandem repeats [52,53] in chromoanagenesis, we checked for all repeat elements at the SV breakpoints using RepeatMasker [http://www.repeatmasker.org, 15 August 2023] and Repbase update programs [54]. Of the 55 SV breakpoints that were detected by MPseq and had sequencing reads by nanopore sequencing, 19 were intergenic, 35 were at intronic regions, and 1 was at an exon (Supplemental Table S5). ...
Article
Complex structural chromosome abnormalities such as chromoanagenesis have been reported in acute myeloid leukemia (AML). They are usually not well characterized by conventional genetic methods, and the characterization of chromoanagenesis structural abnormalities from short-read sequencing still presents challenges. Here, we characterized complex structural abnormalities involving chromosomes 2, 3, and 7 in an AML patient using an integrated approach including CRISPR/Cas9-mediated nanopore sequencing, mate pair sequencing (MPseq), and SNP microarray analysis along with cytogenetic methods. SNP microarray analysis revealed chromoanagenesis involving chromosomes 3 and 7, and a pseudotricentric chromosome 7 was revealed by cytogenetic methods. MPseq revealed 138 structural variants (SVs) as putative junctions of complex rearrangements involving chromosomes 2, 3, and 7, which led to 16 novel gene fusions and 33 truncated genes. Thirty CRISPR RNA (crRNA) sequences were designed to map 29 SVs, of which 27 (93.1%) were on-target based on CRISPR/Cas9 crRNA nanopore sequencing. In addition to simple SVs, complex SVs involving over two breakpoints were also revealed. Twenty-one SVs (77.8% of the on-target SVs) were also revealed by MPseq with shared SV breakpoints. Approximately three-quarters of breakpoints were located within genes, especially intronic regions, and one-quarter of breakpoints were intergenic. Alu and LINE repeat elements were frequent among breakpoints. Amplification of the chromosome 7 centromere was also detected by nanopore sequencing. Given the high amplification of the chromosome 7 centromere, extra chromosome 7 centromere sequences (tricentric), and more gains than losses of genomic material, chromoanasynthesis and chromothripsis may be responsible for forming this highly complex structural abnormality. We showed the value of this combined approach in characterizing complex structural abnormalities for clinical and research applications.
Characterization of these complex structural chromosome abnormalities not only will help understand the molecular mechanisms responsible for the process of chromoanagenesis, but also may identify specific molecular targets and their impact on therapy and overall survival.
... Key words: retrotransposons, innate immunity, retrotransposon-derived gene, dsRNA, chimeric transcripts. ... previously regarded as "genomic parasites" (Smit, 1999; International Human Genome Sequencing Consortium, 2001). However, they have now emerged as essential components that are capable of influencing the functions of our immune system. ...
Article
Retrotransposons, which account for approximately 42% of the human genome, have been increasingly recognized as "non-self" pathogen-associated molecular patterns (PAMPs) due to their virus-like sequences. In abnormal conditions such as cancer and viral infections, retrotransposons that are aberrantly expressed due to impaired epigenetic suppression display PAMPs, leading to their recognition by pattern recognition receptors (PRRs) of the innate immune system and triggering inflammation. This viral mimicry mechanism has been observed in various human diseases, including aging and autoimmune disorders. However, recent evidence suggests that retrotransposons possess highly regulated immune reactivity and play important roles in the development and function of the immune system. In this review, I discuss a wide range of retrotransposon-derived transcripts, their role as targets in immune recognition, and the diseases associated with retrotransposon activity. Furthermore, I explore the implications of chimeric transcripts formed between retrotransposons and known gene mRNAs, which have been previously underestimated, for the increase of immune-related gene isoforms and their influence on immune function. Retrotransposon-derived transcripts have profound and multifaceted effects on immune system function. The aim of this comprehensive review is to provide a better understanding of the complex relationship between retrotransposon transcripts and immune defense.
... MusHAL1 is a Mus-specific HAL1 (half-L1) consisting of ORF1-like and polyA sequences. This constitutes an independent group although it likely arose from the division of an ancestral L1 copy (Smit, 1999). MusHAL1 in the mouse genome became extinct millions of years ago (with an average divergence of 13.4%) ( Fig. 2A and Supplementary Table S1), whereas rat-specific HAL1, named RNHAL1, still retains transposition activity. ...
Article
Retrotransposons are transposable elements that are transposed via transcription and reverse transcription. Their copies have accumulated in the genome of mammals, occupying approximately 40% of mammalian genomic mass. These copies are often involved in numerous phenomena, such as chromatin spatial organization, gene expression, development and disease, and have been recognized as a driving force in evolution. Different organisms have gained specific retrotransposon subfamilies and retrotransposed copies, such as hundreds of Mus-specific subfamilies with diverse sequences and genomic locations. Despite this complexity, basic information is still necessary for present-day genomic and epigenomic studies. Herein, we describe the characteristics of each subfamily of Mus-specific retrotransposons in terms of sequence structure, phylogenetic relationships, evolutionary age, and preference for A or B compartments of chromatin.
... This led to the finding that MOG autoantibodies could bind HERV-W protein (107), and one study used a nanotechnology approach to show, as a proof of principle, that antibodies raised against MOG could cross-react with HERV-W (107). HERVs are remnants of genetic material left in the human genome following infection with retroviruses and constitute around 7% of the human genome (228). Initial investigation of HERV-W in MS brains was driven by isolation of HERV-W protein from sera, CSF and brain samples of affected individuals (229). ...
Article
T cells have an essential role in adaptive immunity against pathogens and cancer, but failure of thymic tolerance mechanisms can instead lead to escape of T cells with the ability to attack host tissues. Multiple sclerosis (MS) occurs when structures such as myelin and neurons in the central nervous system (CNS) are the target of autoreactive immune responses, resulting in lesions in the brain and spinal cord which cause varied and episodic neurological deficits. A role for autoreactive T cell and antibody responses in MS is likely, and mounting evidence implicates Epstein-Barr virus (EBV) in disease mechanisms. In this review we discuss antigen specificity of T cells involved in development and progression of MS. We examine the current evidence that these T cells can target multiple antigens such as those from pathogens including EBV and briefly describe other mechanisms through which viruses could affect disease. Unravelling the complexity of the autoantigen T cell repertoire is essential for understanding key events in the development and progression of MS, with wider implications for development of future therapies.
... The majority of both uniform and non-uniform variants occurred within introns of genes, followed by intergenic regions. The fact that more variants were identified in introns than in intergenic regions may reflect the likely greater mappability of introns: a study on the mappability of short reads to the mammalian genome observed that most regions that cannot be uniquely mapped lie within repetitive regions [8], which are enriched in both introns and intergenic space [9]; thus, exons are more mappable and can increase mappability in nearby introns. A substantial number of uniform and non-uniform variants were identified at regions that could lead to specific functional consequences (Figs. ...
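A minimal sketch can illustrate the mappability argument above. It uses a deliberately simplified model: a read of length k maps uniquely if its exact k-mer occurs only once in the reference. The model ignores mismatches, the reverse strand, and the scoring that real aligners use. The function names `kmer_mappability` and `mappable_fraction` and the toy sequence are hypothetical. None of them come from the cited study.

```python
from collections import Counter


def kmer_mappability(seq: str, k: int) -> list[bool]:
    """Flag each k-mer start position as uniquely mappable (True) or not.

    Simplified model (assumption): exact matches on the forward strand only;
    real aligners also consider mismatches and the reverse complement.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def mappable_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer is unique in `seq`."""
    flags = kmer_mappability(seq, k)
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical toy reference: unique flank, a repeat copied twice, unique flanks.
repeat_unit = "GGCCGGGCGCGGTGGCTCA"
toy_reference = "ACGTTGCA" + repeat_unit + "TTAGCCAT" + repeat_unit + "CATGAACT"
print(mappable_fraction(toy_reference, k=8))
```

In the toy reference, every 8-mer lying entirely inside one repeat copy also occurs in the other copy, so those positions are flagged as non-unique. Only k-mers that overlap a unique flank can map uniquely. Longer reads (larger k) span more unique sequence and so tend to raise mappability.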
Article
Background: CD-1 is an outbred mouse stock that is frequently used in toxicology, pharmacology, and fundamental biomedical research. Although inbred strains are typically better suited for such studies due to minimal genetic variability, outbred stocks confer practical advantages over inbred strains, such as improved breeding performance and low cost. Knowledge of the full genetic variability of CD-1 would make it more useful in toxicology, pharmacology, and fundamental biomedical research. Results: We performed deep genomic DNA sequencing of CD-1 mice and used the data to identify genome-wide SNPs, indels, and germline transposable elements relative to the mm10 reference genome. We used multiple genome-wide sequencing data types and previously published CD-1 SNPs to validate our called variants. We used the called variants to construct a strain-specific CD-1 reference genome, which we show can improve mappability and reduce experimental biases in genome-wide sequencing data derived from CD-1 mice. Based on previously published ChIP-seq and ATAC-seq data, we find evidence that genetic variation between CD-1 mice can lead to alterations in transcription factor binding. We also identified a number of variants in the coding regions of genes that could affect their translation. Conclusions: We have identified millions of previously unidentified CD-1 variants with the potential to confound studies involving CD-1. We used the identified variants to construct a CD-1-specific reference genome, which can improve accuracy and reduce bias when aligning genomics data derived from CD-1 mice.
... Although it is unclear how this happened, our results revealed that it is at least in part attributable to expansions of retrotransposable elements, rather than large-scale segmental duplication events. Given that fitness-based selection and phenotypic plasticity are connected to genome size [18][19][20], this expanded genome may be related to its strikingly wide host range. Notably, the wide endogenization of various viral genomes into the tapeworm genome suggests previous viral infections [21], consistent with a recent study that identified six novel RNA viruses that infect the tapeworm Schistocephalus solidus [22]. ...
Article
Taenia hydatigena is a widespread gastrointestinal helminth that causes significant health problems in the livestock industry. This parasite can survive in a remarkably wide range of intermediate hosts and affects the transmission dynamics of zoonotic parasites. T. hydatigena is therefore of particular interest to researchers studying zoonotic diseases and the evolutionary strategies of parasites. Herein we report a high-quality draft genome for this tapeworm, characterized by several hallmarks (e.g., expanded genome size, wide integrations of viral-like sequences and extensive alternative splicing during development) and specialized adaptations related to its parasitic fitness (e.g., adaptive evolution of teguments and lipid metabolism). Importantly, in contrast with the evolutionarily close trematodes, which achieve gene diversification associated with immunosuppression by gene family expansions, in T. hydatigena and other cestodes this is accomplished by alternative splicing and gene loss. This indicates that these two classes have evolved different mechanisms for survival. In addition, molecular targets for diagnosis and intervention were identified to facilitate the development of control measures. Overall, this work uncovers new strategies by which helminths evolved to interact with their hosts.
... This domestication of proteins from transposable elements is a well-known phenomenon, described as lateral gene transfer, that can enrich the recipient genome and provide a mechanism for evolutionary flexibility (57, 58). About one-half of the mammalian genome consists of DNA of viral or transposon origin, and about 8% derives from ancient retroviruses (2,59). According to the Gene Expression Omnibus (GEO) database for gene expression profiling, the Arc gene was identified as a candidate gene involved in the pathogenesis of various neurological diseases, including epilepsy and others such as depression. ...
Article
The immediate early gene Arc (Activity-regulated cytoskeleton-associated protein), of retroviral ancestry, has resided in the genome of all tetrapods for millions of years, and its product, Arc protein, is expressed endogenously in neurons. It is a well-known protein, very important for synaptic plasticity and memory consolidation. Activity-dependent Arc expression concentrated in glutamatergic synapses affects the long-term strength of those excitatory synapses. Because it modulates the excitatory-inhibitory balance in a neuronal network, the Arc gene itself has been linked to the pathogenesis of epilepsy. General Arc knockout rodent models develop a susceptibility to epileptic seizures. Because synaptic Arc protein synthesis is activity-dependent, it is also affected by seizures. Interestingly, Arc protein in synapses of active neurons was found to self-assemble into capsids of retrovirus-like particles, which can transfer genetic information between neurons, at least across neuronal synaptic boutons. Released Arc particles can accumulate in astrocytes after seizures. It is still not known how seizures affect the timescale of capsid assembly and transmission. This scientific field is relatively novel and is changing rapidly as it grapples with difficult concepts in light of evolving experimental findings. We summarize the emerging literature on the subject and also discuss specific rodent models for studying Arc effects in epilepsy. We summarize both to clarify the possible role of Arc-related pseudo-viral particles in epileptic disorders, which may be helpful to researchers interested in this growing area of investigation.
... The TEs are further divided into two groups [21][22][23][24]: Class I transposons (retrotransposons), which make new copies using an RNA-mediated copy-and-paste mechanism [25], and Class II transposable elements (DNA transposons), which move using a DNA-mediated cut-and-paste mechanism [26]. Studies have shown that retrotransposons make up a significant fraction of TEs in eukaryotic genomes [11,27]. Based on structural features, retrotransposons are further subdivided into Long Terminal Repeat retrotransposons (LTRs), Long Interspersed Nuclear Elements (LINEs) and Short Interspersed Nuclear Elements (SINEs). ...
Article
Amphibians, particularly anurans, display an enormous variation in genome size. Because whole-genome datasets were unavailable in the past, the genomic elements and evolutionary causes of anuran genome size variation are poorly understood. To address this, we analyzed whole-genome sequences of 14 anuran species ranging in size from 1.1 to 6.8 Gb. By annotating multiple genomic elements, we investigated the genomic correlates of anuran genome size variation and further examined whether genome size relates to habitat type.
... Previous work has suggested that TEs are the largest single classifiable component of a typical mammalian genome (27), and our data (Fig. 1) corroborate this. As noted previously by Elliott and Gregory in 2015 (2), genome size correlates linearly with the percentage of TE content within a genome, and this is again supported by our data (Fig. 1 and table S1). ...
Article
We examined transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TEs at any given time, with one TE type dominating. We also found an association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals.
... Retroelements are a group of repetitive DNA sequences that make up 42-43% of the human genome [8], and they are classified into two main groups based on the presence or absence of long terminal repeats (LTRs). LINEs and short interspersed nuclear elements (SINEs) are non-LTR retroelements that together make up 33% of the human genome [9]. ...
Article
The footprint of human endogenous retroviruses (HERV), specifically HERV-K, has been found in malignancies, such as melanoma, teratocarcinoma, osteosarcoma, breast cancer, lymphoma, and ovary and prostate cancers. HERV-K is considered the most biologically active HERV because it retains open reading frames (ORFs) for all of its Gag, Pol, and Env genes, which makes it more infective towards specific cell lines and more obstructive towards other exogenous viruses. Several factors might contribute to carcinogenicity, and at least one of them has been recognized in various tumors, including overexpression/methylation of long interspersed nuclear element 1 (LINE-1), the HERV-K Gag and Env genes themselves plus their transcripts and protein products, and HERV-K reverse transcriptase (RT). Therapies effective for HERV-K-associated tumors mostly target invasive autoimmune responses or tumor growth through suppression of HERV-K Gag or Env protein and RT. To design new therapeutic options, more studies are needed to determine whether HERV-K and its products (Gag/Env transcripts and HERV-K proteins/RT) initiate tumor formation or merely promote its development. Accordingly, this review presents evidence that highlights the association between HERV-K and tumorigenicity and introduces some of the available or potential therapies against HERV-K-induced tumors.
... Only approximately 100 L1 sites are still retrotransposition-competent in the germline [129] and in disease [130]. The L1 distribution in the human genome shows a preference for the leading strand orientation relative to the replication direction [124] and for the template strand orientation in transcribed regions [123,131] (Fig. 3a). Even though L1 elements are denser in late-replicating regions, integration is more likely to occur at early-replicating sites, suggesting that evolutionary selection contributes to the observed patterns in the genome. ...
Article
Across biological systems, a number of genomic processes, including transcription, replication, DNA repair, and transcription factor binding, display intrinsic directionalities. These directionalities are reflected in the asymmetric distribution of nucleotides, motifs, genes, transposon integration sites, and other functional elements across the two complementary strands. Strand asymmetries, including GC skews and mutational biases, have shaped the nucleotide composition of diverse organisms. The investigation of strand asymmetries often serves as a method to understand underlying biological mechanisms, including protein binding preferences, transcription factor interactions, retrotransposition, DNA damage and repair preferences, transcription-replication collisions, and mutagenesis mechanisms. Research into this subject also enables the identification of functional genomic sites, such as replication origins and transcription start sites. Improvements in our ability to detect and quantify DNA strand asymmetries will provide insights into diverse functionalities of the genome, the contribution of different mutational mechanisms in germline and somatic mutagenesis, and our knowledge of genome instability and evolution, which all have significant clinical implications in human disease, including cancer. In this review, we describe key developments that have been made across the field of genomic strand asymmetries, as well as the discovery of associated mechanisms.
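GC skew is one of the strand asymmetries mentioned above. It is conventionally computed as (G - C)/(G + C) over windows of sequence. The sketch below makes a few assumptions of its own: it uses non-overlapping windows and gives a skew of 0 to windows that contain no G or C (an arbitrary choice). The function names are hypothetical. The code is illustrative only and does not come from the review.

```python
from itertools import accumulate


def gc_skew(seq: str, window: int) -> list[float]:
    """Per-window GC skew, (G - C) / (G + C), over non-overlapping windows.

    Windows with no G or C are given a skew of 0.0 (a convention chosen here);
    the last window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews: list[float]) -> list[float]:
    """Running sum of per-window skews."""
    return list(accumulate(skews))


# Hypothetical toy sequence: C-rich first half, G-rich second half.
toy = "CCCATC" * 5 + "GGGATG" * 5
skews = gc_skew(toy, window=6)
cum = cumulative_skew(skews)
print(skews)
print("window with minimum cumulative skew:", cum.index(min(cum)))
```

In bacterial genomes the leading strand tends to be G-rich. There, the minimum and maximum of the cumulative skew are commonly used as rough indicators of the replication origin and terminus. In the toy sequence, the minimum falls at the last C-rich window, just before the switch to the G-rich half.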
... Located in an unstable region of chromosome 5q (11.1-13.3) that contains a large 500-kb inverted repeat element, the SMN genes comprise 62% interspersed repetitive DNA, and the density of Alu elements is 4-fold higher than the genome average (Rochette et al., 2001; Smit, 1999). DNA rearrangements in this region might explain why genome editing approaches to correct the exon 7 SMN2 SMA-causing mutation have proven challenging. ...
Preprint
Whether neurodevelopmental defects underlie the selective neuronal death that characterizes neurodegenerative diseases is becoming an intriguing question. To address it, we focused on the motor neuron (MN) disease Spinal Muscular Atrophy (SMA), caused by reduced levels of the ubiquitous protein SMN. Taking advantage of the first isogenic human induced pluripotent stem cell-derived SMA model that we have generated and a spinal cord organoid system, here we report that the relative and temporal expression of early neural progenitor and MN markers is altered in SMA. Furthermore, the corrected isogenic controls only partially reverse these abnormalities. These findings raise the relevant clinical implication that SMN-increasing treatments might not fully amend SMA pathological phenotypes. The approach we have taken demonstrates that the discovery of new disease mechanisms is greatly improved by using human isogenic models. Moreover, our study implies that SMA has a developmental component that might trigger the MN degeneration.
... Previous work has suggested that TEs are the largest single classifiable component of a typical mammalian genome (27), and our data (Fig. 1) corroborate this. As noted previously by Elliott and Gregory in 2015 (2), genome size correlates linearly with the percentage of TE content within a genome, and this is again supported by our data (Fig. 1 and table S1). ...
Preprint
We examined transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We find that while mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly LINEs, drive increases in genome size while DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TE at any given time, with one TE type dominating. We also found an association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals. One-Sentence Summary: A de novo assessment of TE content in 248 mammals finds informative trends in mammalian genome evolution.
... One of the most direct contributions of TEs to their host genomes occurs with the process of 'molecular domestication' whereby the gene (or genes) encoded by and serving the replication of a TE is (are) co-opted by the host genome to create a new gene (new genes) with cellular function(s) [6,[17][18][19][20][21][22]. Recent studies suggest that TE domestication is a common pathway for the emergence of new genes and functions [6,[19][20][21][22][23][24]. ...
Article
Background: Transposable elements (TEs) are selfish DNA sequences capable of moving and amplifying at the expense of host cells. Despite this, an increasing number of studies have revealed that TE proteins are important contributors to the emergence of novel host proteins through molecular domestication. We previously described seven transposase-derived domesticated genes from the PIF/Harbinger DNA family of TEs in Drosophila and a co-domestication. All PIF TEs known in plants and animals distinguish themselves from other DNA transposons by the presence of two genes. We hypothesize that there should often be co-domestications of the two genes from the same TE because the transposase (gene 1) has been reported to be translocated to the nucleus by the MADF protein (gene 2). To provide support for this model of new gene origination, we investigated available insect genomes for additional evidence of PIF TE domestication events and explored the co-domestication of the MADF protein from the same TE insertion. Results: After extensively exploring insect genomes for hits to PIF transposases and analyzing their context and evolution, we present evidence of at least six independent domestication events of PIF transposable element proteins in insects: two co-domestications of both transposase and MADF proteins in Anopheles (Diptera), one transposase-only domestication event and one co-domestication in butterflies and moths (Lepidoptera), and two transposase-only domestication events in cockroaches (Blattodea). The predicted nuclear localization signals for many of those proteins and dicistronic transcription in some instances support the functional associations of co-domesticated transposase and MADF proteins.
Conclusions: Our results add to a co-domestication that we previously described in fruit fly genomes. They support the view that new gene origination through domestication of a PIF transposase is frequently accompanied by the co-domestication of a cognate MADF protein in insects, potentially for regulatory functions. We propose a detailed model that predicts that PIF TE protein co-domestication should often occur from the same PIF TE insertion.
... The strand bias of TEs in middle exons can be caused by the asymmetric distribution of TEs in introns. Sense-strand TEs introduce harmful poly(A) signals into introns, which depletes TEs on the sense strand of introns [54,55]. Alu elements carry polyadenylation signals on the sense strand that can generate last exons, which can explain the strand bias of Alu in last exons (Fig 2C) [56]. ...
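Intronic strand bias of this kind is usually assessed by classifying each insertion as sense or antisense relative to the gene that contains it. The minimal sketch below makes several assumptions. Genes are simple half-open intervals, and each element carries one strand label. When genes overlap, the first overlapping gene is used. The class and function names and the toy coordinates are hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Insertion:
    chrom: str
    pos: int
    strand: str  # strand of the TE sequence, "+" or "-"


def orientation(ins: Insertion, gene: Gene) -> Optional[str]:
    """Return 'sense' or 'antisense' relative to `gene`, or None if outside it."""
    if ins.chrom != gene.chrom or not (gene.start <= ins.pos < gene.end):
        return None
    return "sense" if ins.strand == gene.strand else "antisense"


def sense_fraction(insertions: Iterable[Insertion], genes: list[Gene]) -> float:
    """Fraction of genic insertions on the same strand as their host gene."""
    sense = antisense = 0
    for ins in insertions:
        for gene in genes:  # first overlapping gene wins (simplifying assumption)
            label = orientation(ins, gene)
            if label is not None:
                if label == "sense":
                    sense += 1
                else:
                    antisense += 1
                break
    total = sense + antisense
    if total == 0:
        raise ValueError("no insertions fall inside the given genes")
    return sense / total


# Hypothetical toy data: three genic insertions (two sense) and one intergenic.
genes = [Gene("geneA", "chr1", 100, 200, "+"), Gene("geneB", "chr2", 0, 50, "-")]
insertions = [
    Insertion("chr1", 150, "+"),
    Insertion("chr1", 160, "-"),
    Insertion("chr2", 10, "-"),
    Insertion("chr1", 500, "+"),
]
print(sense_fraction(insertions, genes))
```

Depletion of sense-oriented insertions would then typically be judged by comparing this fraction against a null expectation, for example with a binomial test. That step is omitted here.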
Article
Genes generate transcripts of various functions by alternative splicing. However, most transcriptome studies have used short-read sequencing technologies (next-generation sequencers), so full-length transcripts have not been observed directly. Although long-read sequencing technologies enable the sequencing of full-length transcripts, the data analysis is difficult. In this study, we developed an analysis pipeline named SPLICE and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 46,663 transcripts from the protein-coding genes in the HCCs and the matched non-cancerous livers, of which 5,366 (11.5%) were novel. A comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Interestingly, 746 genes with DETs, including the LINE1-MET transcript, were not found by a gene-level analysis. We also found that fusion transcripts of transposable elements and hepatitis B virus (HBV) were overexpressed in HCCs. In vitro experiments on DETs showed that LINE1-MET and HBV-human transposable elements promoted cell growth. Furthermore, fusion gene detection revealed novel recurrent fusion events that were not detected in the short-read data. These results suggest the effectiveness of full-length transcriptome studies and the importance of splicing variants in carcinogenesis.
... The genome is organized as a succession of so-called "intergenic" and "genic" regions, although this definition is debatable in some cases. "Genic" regions consist of genes that can code for proteins (via messenger RNA (mRNA)) or for other RNAs (Djebali et al., 2012) (Smit, 1999). For example, a repeated sequence named Alu is highly abundant in the human genome and accounts for 11% of it (Lander et al., 2001 ...
Thesis
High-throughput sequencing projects produce an enormous amount of raw biological data. However, these data are difficult to exploit unless they are annotated. Genome annotation programs have been developed to process them, but they are still too prone to prediction errors, which makes genome annotation one of the major challenges in bioinformatics. In this context, my thesis work is organized around three components: 1) improving the prediction of eukaryotic protein-coding genes, focusing specifically on splice sites, 2) by exploiting artificial intelligence algorithms (CNNs and evolutionary algorithms), 3) trained on high-quality data covering a wide diversity of eukaryotic species. Our strategy is to combine all the validated data with the programs we developed in order to improve gene prediction, reducing the error rate and preventing errors from propagating into databases. In addition, this work will allow a better understanding of organisms and their biological mechanisms.
... than random expectation (43.6%) (Fig. 1B) and disproportionately few (23/61, 37.7%) of these were sense oriented to their host gene. Given modest genome-wide L1 integration site preferences, which mainly reflect the underlying distribution of AT-rich sequences, these patterns were likely dominated by post-integration selection, and are concordant with prior results obtained by human analyses (Sultana et al. 2019;Smits et al. 2021;Flasch et al. 2019;Ewing and Kazazian 2010;Attig et al. 2018;Smit 1999). Consistent with L1-mediated retrotransposition in humans and other mammals (Jurka 1997;Tang and Liang 2019;Moran et al. 1996;Richardson et al. 2017;Ewing et al. 2020;Smits et al. 2021), the L1 and Alu insertions generated TSDs with a median length of 15bp (Fig. 1C) and integrated at a motif strongly resembling the preferred L1 endonuclease motif (Fig. 1C). ...
Article
The retrotransposon LINE-1 (L1) is central to the recent evolutionary history of the human genome, and continues to drive genetic diversity and germline pathogenesis. However, the spatiotemporal extent and biological significance of somatic L1 activity is poorly defined, and is virtually unexplored in other primates. From a single L1 lineage active at the divergence of apes and Old World monkeys, successive L1 subfamilies have emerged in each descendant primate germline. As revealed by case studies, the presently-active human L1 subfamily can also mobilize during embryonic and brain development in vivo. It is unknown whether non-human primate L1s can similarly generate somatic insertions in the brain. Here we applied ~40× single-cell whole genome sequencing (scWGS), and retrotransposon capture sequencing (RC-seq), to 20 hippocampal neurons from two rhesus macaques (Macaca mulatta). In one animal, we detected and robustly PCR validated a somatic L1 insertion that generated target site duplications, carried a short 5′ transduction, and was present in ~7% of hippocampal neurons but absent from cerebellum and nonbrain tissues. The corresponding donor L1 allele was exceptionally mobile in vitro, and was embedded in PRDM4, a gene expressed throughout development and in neural stem cells. Nanopore long-read methylome and RNA-seq transcriptome analyses indicated young retrotransposon subfamily activation in the early embryo, followed by repression in adult tissues. These data highlight endogenous macaque L1 retrotransposition potential, provide prototypical evidence of L1-mediated somatic mosaicism in a non-human primate, and allude to L1 mobility in the brain over the last 30 million years of human evolution.
... Long interspersed elements type 1 (LINE1s, or L1s) are ubiquitous non-long terminal repeat (LTR) retrotransposons in mammals [1,2], comprising 17% and 19% of the human and mouse genomes, respectively [3,4]. Only a very small fraction of genomic L1 copies are full-length, as the vast majority of L1s suffer "structural defects", such as 5'-truncation [5,6], 5'-inversion [6][7][8], or internal rearrangement [9]. ...
Article
Background: The internal promoter in the L1 5’UTR is critical for autonomous L1 transcription and initiating retrotransposition. Unlike the human genome, which features one contemporarily active subfamily, the mouse genome has four subfamilies (A_I, Gf_I and Tf_I/II) that have been amplifying in the last one million years. Moreover, mouse L1 5’UTRs are organized into tandem repeats called monomers, which are separated from ORF1 by a tether domain. In this study, we aim to compare promoter activities across young mouse L1 subfamilies and investigate the contribution of individual monomers and the tether sequence. Results: We observed an inverse relationship between subfamily age and the average number of monomers among evolutionarily young mouse L1 subfamilies. Members of the youngest subgroup (A_I and Tf_I/II) carry on average 3–4 monomers in the 5’UTR. Using a single-vector dual-luciferase reporter assay, we compared promoter activities across six L1 subfamilies (A_I/II, Gf_I and Tf_I/II/III) and established their antisense promoter activities in a mouse embryonic fibroblast cell line and a mouse embryonal carcinoma cell line. Using consensus promoter sequences for three subfamilies (A_I, Gf_I and Tf_I), we dissected the differential roles of individual monomers and the tether domain in L1 promoter activity. We validated that, across multiple subfamilies, the second monomer consistently enhances the overall promoter activity. For individual promoter components, monomer 2 is consistently more active than the corresponding monomer 1 and/or the tether for each subfamily. Importantly, we revealed intricate interactions between monomer 2, monomer 1 and tether domains in a subfamily-specific manner. Furthermore, using three-monomer 5’UTRs, we established a complex nonlinear relationship between the length of the outermost monomer and the overall promoter activity. Conclusions: The laboratory mouse is an important mammalian model system for human diseases as well as L1 biology.
Our study extends previous findings and represents an important step toward a better understanding of the molecular mechanism controlling mouse L1 transcription as well as L1’s impact on development and disease.
... Often, the activity of a DNA transposon is only evidenced by the presence of tiny elements with terminal inverted repeats (TIRs) [54,55]. Autonomous elements with long terminal repeats (LTRs) may be outnumbered by elements with a reduced internal sequence [56,57] and LINE elements sometimes give free rides to internal deletion products [58]. Long insertions are also more likely to be selectively disadvantageous to the genome. ...
Article
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary, as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
... One source of structural variation that has not been sufficiently considered comes from transposable elements (TEs), which together constitute >42% of the human genome (Smit 1999;International Human Genome Sequencing Consortium 2001;Audano et al. 2019;Linthorst et al. 2020). Although the vast majority of TEs do not alter coding regions of our genome, some TE classes harbor strong gene regulatory potential that can directly affect gene expression levels (Jacobs et al. 2014;Wang et al. 2014;Chuong et al. 2016;Fuentes et al. 2018;Pontis et al. 2019). ...
Article
Genome-wide association studies (GWAS) have been highly informative in discovering disease-associated loci but are not designed to capture all structural variations in the human genome. Using long-read sequencing data, we discovered widespread structural variation within SINE-VNTR-Alu (SVA) elements, a class of great ape-specific transposable elements with gene-regulatory roles, which represents a major source of structural variability in the human population. We highlight the presence of structurally variable SVAs (SV-SVAs) in neurological disease-associated loci, and we further associate SV-SVAs to disease-associated SNPs and differential gene expression using luciferase assays and expression quantitative trait loci data. Finally, we genetically deleted SV-SVAs in the BIN1 and CD2AP Alzheimer's disease-associated risk loci and in the BCKDK Parkinson's disease-associated risk locus and assessed multiple aspects of their gene-regulatory influence in a human neuronal context. Together, this study reveals a novel layer of genetic variation in transposable elements that may contribute to identification of the structural variants that are the actual drivers of disease associations of GWAS loci.
Article
The conserved piRNA pathway represses transposable element (TE) activity in the animal germline via a specialized class of small RNAs called piwi-interacting RNAs (piRNAs). piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). While the molecular processes by which piCs function are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC origin and evolution, we use a population genomics approach to compare piC activity and sequence composition across 8 geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We annotate ovary piCs and genome-wide TE content in each strain. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs. Most TEs inferred to be recently active show an enrichment of insertions into old and large piCs, consistent with the previously proposed 'trap' model of piC evolution. By contrast, a small subset of active LTR families is enriched for the formation of new piCs, suggesting that these TEs have a higher proclivity to form piCs. Thus, our findings uncover processes leading to the origin of piCs. We propose that piC evolution begins with the emergence of piRNAs from a few specific LTR retrotransposon insertions that subsequently expand by accretion of other TE insertions during evolution to form larger ‘trap’ clusters. Our study shows that TEs themselves are the major force driving the rapid evolution of piCs.
Article
Selfish genetic elements comprise significant fractions of mammalian genomes. In rare instances, host genomes domesticate segments of these elements for function. Using a complete human genome assembly and 25 additional vertebrate genomes, we re-analyzed the evolutionary trajectories and functional potential of capsid (CA) genes domesticated from Metaviridae, a lineage of retrovirus-like retrotransposons. Our study expands on previous analyses to unearth several new insights about the evolutionary histories of these ancient genes. We find that at least five independent domestication events occurred from diverse Metaviridae, giving rise to three universally retained single-copy genes evolving under purifying selection and two gene families unique to placental mammals, with multiple members showing evidence of rapid evolution. In the SIRH/RTL family, we find diverse amino-terminal domains, widespread loss of protein-coding capacity in RTL10 despite its retention in several mammalian lineages, and differential utilization of an ancient programmed ribosomal frameshift in RTL3 between the domesticated CA and protease domains. Our analyses also reveal that most members of the PNMA family in mammalian genomes encode a conserved putative amino-terminal RNA-binding domain (RBD) both adjoining and independent from domesticated CA domains. Our analyses lead to a significant correction of previous annotations of the essential CCDC8 gene. We show that this putative RBD is also present in several extant Metaviridae, revealing a novel protein domain configuration in retrotransposons. Collectively, our study reveals the divergent outcomes of multiple domestication events from diverse Metaviridae in the common ancestor of placental mammals.
Preprint
Background: Reduced copy number of the D4Z4 macrosatellite at human chromosome 4q35 is associated with facioscapulohumeral muscular dystrophy (FSHD). A prevailing idea is that chromatin alterations at the 4q35 locus following D4Z4 repeat unit deletion lead to disease via inappropriate expression of nearby genes. Here, we sought to analyze transcription and chromatin characteristics across 4q35 and to determine how these are affected by D4Z4 deletions and exogenous stresses. Results: We found that the 4q subtelomere is subdivided into discrete domains, each with characteristic chromatin features associated with distinct gene expression profiles. Centromere-proximal genes within 4q35 (ANT1, FAT1 and FRG1) display active histone marks at their promoters. In contrast, poised or repressed marks are present at telomere-proximal loci including FRG2, DBE-T and D4Z4. We discovered that these discrete domains undergo region-specific chromatin changes upon treatment with chromatin enzyme inhibitors or genotoxic drugs. We demonstrated that the 4q35 telomere-proximal FRG2, DBE-T and D4Z4-derived transcripts are induced upon DNA damage to levels inversely correlated with the D4Z4 repeat number, are stabilized through post-transcriptional mechanisms upon DNA damage, and are bound to chromatin. Conclusion: Our study reveals unforeseen biochemical features of RNAs from clustered transcription units within the 4q35 subtelomere. Specifically, the FRG2, DBE-T and D4Z4-derived transcripts are chromatin-associated and are stabilized post-transcriptionally after induction by genotoxic stress. Remarkably, the extent of this response is modulated by the copy number of the D4Z4 repeats, raising new hypotheses about their regulation and function in human biology and disease.
Article
In recent years, the investigation of transposable elements (TEs) has given researchers a deeper understanding of their characteristics and functions, particularly their role in the mechanisms of cancer development. This manuscript focuses on prostate carcinoma cell lines and reviews the associations and interactions between TEs and genes, as well as TE responses to treatment with various chemical drugs, emphasizing their involvement in cancer progression. We assembled a compendium of articles retrieved from the PubMed database to construct networks of correlations between TEs, genes, and pharmaceuticals. In doing so, we linked the transposition of certain TE types to the expression of specific transcripts directly implicated in carcinogenesis. Additionally, we highlight that treatment with different drugs revealed distinct patterns of TE reactivation. Our hypothesis synthesizes current understanding and points research toward evidence-based investigations, emphasizing the association between antiviral drugs, chemotherapy, and reduced TE expression in patients with prostate cancer.
Article
Recombination breaks up haplotypes and thereby influences genetic variability and the efficacy of selection. Bird genomes lack PR domain-containing protein 9 (PRDM9), a key determinant of recombination dynamics in most metazoans. Historical recombination maps in birds show apparent stasis in the positioning of recombination events. This highly conserved recombination pattern over long timescales may constrain the evolution of recombination in birds. At the same time, extensive variation in recombination rate is observed across the genome and between different species of birds. Here, we characterize the fine-scale historical recombination map of an iconic migratory songbird, the Eurasian blackcap (Sylvia atricapilla), using a linkage disequilibrium–based approach that accounts for population demography. Our results reveal variable recombination rates among and within chromosomes, which associate positively with nucleotide diversity and GC content and negatively with chromosome size. Recombination rates increased significantly at regulatory regions but not necessarily at gene bodies. CpG islands are strongly associated with recombination rates, though their specific position and local DNA methylation patterns likely influence this relationship. The association with retrotransposons varied according to specific family and location. Our results also provide evidence of heterogeneous intrachromosomal conservation of recombination maps between the blackcap and its closest sister taxon, the garden warbler. These findings highlight the considerable variability of recombination rates at different scales and the role of specific genomic features in shaping this variation. This study opens the possibility of further investigating the impact of recombination on specific population-genomic features.
Preprint
Understanding variation in chromatin contact patterns across human populations is critical for interpreting non-coding variants and their ultimate effects on gene expression and phenotypes. However, experimental determination of chromatin contacts at a population scale is prohibitively expensive. To overcome this challenge, we develop and validate a machine learning method to quantify the diversity of 3D chromatin contacts at 2-kilobase resolution from genome sequence alone. We then apply this approach to thousands of diverse modern humans and the inferred human-archaic hominin ancestral genome. While genome-wide patterns of 3D contact divergence are qualitatively similar to patterns of sequence divergence, we find that 3D divergence in local 1-megabase genomic windows does not follow sequence divergence. In particular, we identify 392 windows with significantly greater 3D divergence than expected from sequence. Moreover, 26% of genomic windows have rare 3D contact variation observed in a small number of individuals. Using in silico mutagenesis, we find that most sequence changes do not result in changes to 3D chromatin contacts. However, in windows with substantial 3D divergence, just one or a few variants can lead to divergent 3D chromatin contacts even though the individuals carrying those variants do not have high sequence divergence. In summary, inferring 3D chromatin contact maps across human populations reveals diverse contact patterns. We anticipate that these genetically diverse maps of 3D chromatin contact will provide a reference for future work on the function and evolution of 3D chromatin contact variation across human populations.
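In silico mutagenesis, as used above, means systematically introducing sequence changes and comparing a model's predictions before and after each change. The sketch below shows the simplest form, saturation mutagenesis over every single-nucleotide substitution. It assumes a generic `predict` callable; the study's chromatin-contact model is not reproduced here, and `gc_fraction` is a hypothetical toy scorer included only so the example runs.

```python
from typing import Callable, List, Tuple

BASES = "ACGT"


def saturation_mutagenesis(
    seq: str, predict: Callable[[str], float]
) -> List[Tuple[int, str, str, float]]:
    """Score every single-nucleotide substitution in `seq`.

    Returns (position, ref_base, alt_base, predicted_change) tuples, where
    predicted_change = predict(mutant) - predict(reference).
    """
    seq = seq.upper()
    baseline = predict(seq)
    effects = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # not a mutation
            mutant = seq[:i] + alt + seq[i + 1:]
            effects.append((i, ref, alt, predict(mutant) - baseline))
    return effects


def gc_fraction(seq: str) -> float:
    """Hypothetical toy predictor (GC content); NOT a 3D contact model."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    effects = saturation_mutagenesis("ACGT", gc_fraction)
    neutral = sum(1 for *_, delta in effects if delta == 0)
    print(f"{len(effects)} variants scored, {neutral} with no predicted change")
```

A real analysis would replace `gc_fraction` with a trained sequence-to-contact model and then summarize what fraction of variants shift predictions beyond some threshold, which is the kind of result the abstract reports.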
Preprint
Selfish genetic elements and their remnants comprise at least half of the human genome. Active transposons duplicate by inserting copies at new sites in a host genome. Following insertion, transposons can acquire mutations that render them inactive; the accrual of additional mutations can render them unrecognizable over time. However, in rare instances, segments of transposons become useful for the host, in a process called gene domestication. Using the first complete human genome assembly and 25 additional vertebrate genomes, we analyzed the evolutionary trajectories and functional potential of genes domesticated from the capsid genes of Metaviridae, a family of retrovirus-like retrotransposons. Our analysis reveals four families of domesticated capsid genes in placental mammals with varied evolutionary outcomes, ranging from universal retention to lineage-specific duplications or losses and from purifying selection to lineage-specific rapid evolution. The four families of domesticated capsid genes have divergent amino-terminal domains, inherited from four distinct ancestral metaviruses. Structural predictions reveal that many domesticated genes encode a previously unrecognized RNA-binding domain retained in multiple paralogs in mammalian genomes, both adjacent to and independent from the capsid domain. Collectively, our study reveals diverse outcomes of domestication of diverse metaviruses, which led to structurally and evolutionarily diverse genes that encode important, but still largely unknown, functions in placental mammals.
Preprint
The human silencing hub (HUSH) complex binds to transcripts of LINE-1 retrotransposons (L1s) and other genomic repeats, recruiting MORC2 and other effectors to remodel chromatin. However, how HUSH and MORC2 operate alongside DNA methylation, a central epigenetic regulator of repeat transcription, remains poorly understood. Here we interrogate this relationship in human neural progenitor cells (hNPCs), a somatic model of brain development that tolerates removal of the DNA methyltransferase DNMT1. Upon loss of MORC2 or the HUSH subunit TASOR in hNPCs, L1s remain silenced by robust promoter methylation. However, genome demethylation and activation of evolutionarily young L1s attract MORC2 binding. Simultaneous depletion of DNMT1 and MORC2 causes massive accumulation of L1 transcripts. We identify the same mechanistic hierarchy at pericentromeric α-satellites and clustered protocadherin genes, repetitive elements important for chromosome structure and neurodevelopment, respectively. Our data delineate the independent epigenetic control of repeats in somatic cells, with implications for understanding the vital functions of HUSH-MORC2 in hypomethylated contexts throughout human development.
Article
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100–300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 24 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Article
Human DUX4 and its mouse ortholog Dux are normally expressed in the early embryo (the 4-cell or 2-cell cleavage-stage embryo, respectively), where they activate a portion of the first wave of zygotic gene expression. DUX4 is epigenetically suppressed in nearly all somatic tissues, whereas FSHD-causing mutations result in its aberrant expression in skeletal muscle, transcriptional activation of the early embryonic program, and subsequent muscle pathology. Although DUX4 and Dux both activate an early totipotent transcriptional program, divergence of their DNA-binding domains limits the use of DUX4 expressed in mice as a preclinical model for FSHD. In this study, we identify the porcine DUXC mRNA expressed in early development and show that both pig DUXC and human DUX4 robustly activate a highly similar early embryonic program in pig muscle cells. These results support further investigation of pig preclinical models for FSHD.
Article
Human endogenous retroviruses (HERVs) can be vertically transmitted in a Mendelian fashion, are stably maintained in the human genome, and are estimated to constitute approximately 8% of it. HERVs affect human physiology and pathology through their provirus-encoded proteins or the effects of their long terminal repeat (LTR) elements. Characterizing their genomic distribution is an essential step toward understanding the relationships between endogenous retrovirus expression and disease. However, the poor characterization of HML-8 prevents a detailed understanding of how the expression of this family is regulated in humans and of its impact on the host genome. An accurate and updated HERV-K HML-8 genomic map is therefore urgently needed. Here, we report a comprehensive analysis of the presence and distribution of HERV-K HML-8 sequences within the human genome and other hominoid genomes, with a detailed description of the structural and phylogenetic features characterizing the group. A total of 40 proviruses and 5 solo LTR elements were characterized in the human genome, including a detailed description of provirus structure, integration time, potentially regulated genes, transcription factor-binding sites, and primer-binding site (PBS) features. In addition, 9 chimpanzee, 8 gorilla, and 10 orangutan sequences belonging to the HML-8 subgroup were identified. The integration time results showed that the HML-8 elements were integrated into the primate lineage around 35 and 42 million years ago (mya), during primate evolution and speciation. Overall, the results clarify the composition of the HML-8 group, providing an exhaustive background for subsequent functional studies.
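The abstract reports integration times without describing how they were estimated. A widely used approach for proviruses relies on the 5' and 3' LTRs being identical at the moment of integration and then accumulating mutations independently, so the age is approximately T = D / (2r), where D is the proportion of differing sites between the two LTRs and r is the neutral substitution rate per site per year. The sketch below assumes this LTR-LTR method with an uncorrected p-distance and an assumed rate of 2.2 × 10^-9 substitutions per site per year; both choices are illustrative assumptions, not necessarily those used in the study.

```python
ASSUMED_RATE = 2.2e-9  # assumed neutral substitutions per site per year (illustrative)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def ltr_age_mya(ltr5: str, ltr3: str, rate: float = ASSUMED_RATE) -> float:
    """Estimate provirus age in millions of years as T = D / (2r)."""
    divergence = p_distance(ltr5, ltr3)
    return divergence / (2 * rate) / 1e6


if __name__ == "__main__":
    ltr5 = "ACGT" * 25                          # 100-bp toy LTR
    ltr3 = "T" + ltr5[1:50] + "A" + ltr5[51:]   # 2 substitutions -> D = 0.02
    print(f"{ltr_age_mya(ltr5, ltr3):.2f} Mya")
```

For older elements a multiple-hit correction (for example Jukes-Cantor) is usually applied to D before dividing; the uncorrected p-distance used here increasingly underestimates ages as divergence grows.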
Chapter
Among the skin disorders reflecting mosaicism, two major morphological categories are nonsegmental and segmental mosaicism. On the other hand, two major genetic categories are genomic mosaics and epigenetic mosaics. In genomic mosaicism we can discriminate lethal from nonlethal mutations. Lethal mutations can only survive in a mosaic state. By contrast, nonlethal mutations, when transmitted to the next generation, cause a diffuse, nonmosaic involvement, or they give rise to disseminated mosaicism as noted, for example, in hereditary traits characterized by multiple benign skin tumors. A simple segmental manifestation reflects a postzygotic mutation occurring in an otherwise healthy embryo, whereas a superimposed mosaic manifestation is overlaid on a diffuse, nonsegmental involvement and reflects loss of the corresponding wild-type allele occurring in a heterozygous embryo. In common polygenic skin disorders such as psoriasis, a pronounced, superimposed segmental manifestation may originate from early loss of heterozygosity or from a postzygotic new mutation occurring at an additional predisposing gene locus. In autosomal recessive traits, heterozygous individuals are usually healthy. Rarely, however, the disorder becomes manifest in mosaic form when the corresponding wild-type allele is lost at an early developmental stage, giving rise to a homozygous or compound heterozygous patch. Twin spots are paired patches that differ genetically from each other and from the surrounding background tissue. In human skin, possible examples are cutis tricolor and paired nevus flammeus and nevus anemicus. In epidermolysis bullosa and other genodermatoses, revertant mosaicism may result from a postzygotic back mutation, giving rise to patches of healthy skin. Epigenetic mosaicism of autosomes has been studied in mice and dogs and may also occur in humans. Epigenetic mosaicism of X chromosomes results in a linear or otherwise segmental pattern in various X-linked skin disorders. 
In some of these traits, such as incontinentia pigmenti or focal dermal hypoplasia, X inactivation accounts for the survival of female embryos, whereas male embryos carrying the mutation usually die in utero. By way of exception, however, male embryos with a 46,XY karyotype may survive because they carry a postzygotic new mutation giving rise to genomic mosaicism, or because they have a 47,XXY karyotype resulting in functional X-chromosome mosaicism. It should be borne in mind that not all X-linked human genes are inactivated. For example, the gene for X-linked recessive ichthyosis escapes inactivation, which is why female gene carriers display a completely normal phenotype.
Article
Human endogenous retroviruses (HERVs) were integrated into the human genome in ancient times and have been coevolving with the host. Since the Human Genome Project, HERVs have attracted increasing attention.
Thesis
Using DNA extracted from blood samples from 270 informative females, I determined that severely skewed X-inactivation in normal women is relatively common and increases with age (P<0.05). Samples of both buccal and urinary epithelia were also obtained from 88 of the females studied. Although there was a significant association of X-inactivation ratios between different tissues in most individuals, wide variation was apparent in some cases, making accurate extrapolation between tissues impossible. The degree of correlation between tissues fell markedly with age. Overall, these data suggest that the major factors in the aetiology of skewed X-inactivation are secondary selection processes. Previous studies of trisomy rescue for a number of autosomes show a strong association with skewed X-inactivation. Data from the control group were used to test the hypothesis that trisomy 7 mosaicism causes Silver-Russell syndrome (SRS), a syndrome previously attributed to imprinted genes. Consistent with this hypothesis, the frequency of completely skewed X-inactivation was significantly higher in SRS patients (3 of 29) than in controls (3 of 270), suggesting the presence of undetected trisomy 7 in SRS patients. Detailed studies of the spreading of X-inactivation into autosomal DNA were also performed in five unbalanced human X:autosome translocations. Using allele-specific RT-PCR, long-range silencing of autosomal genes located up to 45 Mb from the translocation breakpoint was observed, directly demonstrating the ability of X-inactivation to spread in cis through autosomal DNA. Spreading of gene silencing occurred in either a continuous or discontinuous fashion in different cases, suggesting that some autosomal DNA is resistant to the X-inactivation signal. 
Observations of late replication, histone acetylation, histone methylation and XIST RNA show that X-inactivation can spread in the absence of cytogenetic features of the inactive X, although these histone modifications were found to be better cytogenetic correlates of the spread of X-inactivation than late replication.
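The Silver-Russell comparison above rests on small counts (3 of 29 patients versus 3 of 270 controls). The abstract does not name the statistical test used; as a hedged check, the sketch below applies a two-sided Fisher's exact test, computed directly from the hypergeometric distribution, to those reported counts. The choice of test is our assumption, not the thesis's stated method.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) when drawing `draws` items without replacement from `total`
    items, `successes` of which count as successes."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    total = a + b + c + d
    successes = a + c  # column total (e.g. completely skewed X-inactivation)
    draws = a + b      # row total (e.g. SRS patients)
    observed = hypergeom_pmf(a, total, successes, draws)
    lo = max(0, draws - (total - successes))
    hi = min(successes, draws)
    probs = [hypergeom_pmf(k, total, successes, draws) for k in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(p, 1.0)


if __name__ == "__main__":
    # Rows: SRS patients, controls; columns: completely skewed, not skewed.
    p = fisher_exact_two_sided(3, 29 - 3, 3, 270 - 3)
    print(f"two-sided p = {p:.4f}")
```

Under this assumed test the reported counts give a two-sided p-value of roughly 0.013 to 0.014, below 0.05 and consistent with the abstract's claim of a significant increase. If SciPy is available, `scipy.stats.fisher_exact` should agree.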
Article
Since the initial discovery of DNA phosphorothioate (PT) modification systems in Streptomyces lividans in the 1980s, explorations of the biological functions of DNA PT systems have advanced and yielded a number of important findings. However, the functions of PT systems, especially in genetic regulation, remain largely unknown.
Preprint
Background: Human endogenous retroviruses (HERVs) result from ancestral infections by exogenous retroviruses that became incorporated into germ-line DNA and evolutionarily fixed in the human genome. HERVs are vertically transmitted in a Mendelian fashion, are stably maintained in the human genome, and are estimated to make up about 8% of it. HERV-K (HML1-10) transcription has been confirmed to be associated with a variety of diseases, such as breast cancer, lung cancer, prostate cancer, melanoma, rheumatoid arthritis, and amyotrophic lateral sclerosis. However, the poor characterization of HML-9 hinders a detailed understanding of how the expression of this family is regulated in human health and of its actual impact on host genomes. In light of this, a precise and updated HERV-K HML-9 genomic map is urgently needed. Results: We report a comprehensive analysis of the presence and distribution of HERV-K HML-9 sequences within the human genome, with a detailed description of the structural and phylogenetic features characterizing the group. A total of 23 proviruses and 47 solo LTR elements were characterized, with a detailed description of provirus structure, integration time, potentially regulated genes, transcription factor binding sites, and primer binding site features. The integration time results showed that the HML-9 elements found in the human genome were integrated into the primate lineage between 37.5 and 151.5 million years ago (mya). Conclusion: These results clarify the composition of HML-9, providing an exhaustive background for subsequent functional studies.
Chapter
Long noncoding RNAs (lncRNAs) are promising candidates as biomarkers of inflammation and cancer. LncRNAs have several properties that make them well-suited as molecular markers of disease: (1) many lncRNAs are expressed in a tissue-specific manner, (2) distinct lncRNAs are upregulated based on different inflammatory or oncogenic stimuli, (3) lncRNAs released from cells are packaged and protected in extracellular vesicles, and (4) circulating lncRNAs in the blood are detectable using various RNA sequencing approaches. Here we focus on the potential for lncRNA biomarkers to detect inflammation and cancer, highlighting key biological, technological, and analytical considerations that will help advance the development of lncRNA-based liquid biopsies.
Chapter
The genomes of all eukaryotes contain multiple copies of DNA sequences that are related to sequences found in infectious retroviruses (for review, see Coffin, 1984; Garfinkel, 1992). These elements are transmitted through the germ line as stable Mendelian genes, yet they exhibit structural and sequence similarities to infectious exogenous retroviruses. It is these similarities that have led investigators to speculate that endogenous retroviruses are remnants of prior infections with exogenous retroviral agents and, with evolutionary time, changes have occurred to make them no longer infectious or pathogenic. These speculations have been supported with experimental studies that show that the genomes of infectious, exogenous retroviruses can integrate into the host chromosome and be inherited through the germ line. Since retroviruses are thought to have evolved from retrotransposons (Temin, 1980, 1992), it is also possible that some endogenous retrovirus-related sequences are actually precursors of infectious forms. In either case, once they are part of the host genome, these proviruses can serve as a pool of genetic material that exogenous viruses can use to produce variants with altered host specificities and phenotypes; they can encode gene products that compete for or complement in trans retrovirus function(s); and they can, themselves, act as insertional mutagens to change the regulation of host genes.
Article
We report several classes of human interspersed repeats that resemble fossils of DNA transposons, elements that move by excision and reintegration in the genome, whereas previously characterized mammalian repeats all appear to have accumulated by retrotransposition, which involves an RNA intermediate. The human genome contains at least 14 families and > 100,000 degenerate copies of short (180-1200 bp) elements that have 14- to 25-bp terminal inverted repeats and are flanked by either 8 bp or TA target site duplications. We describe two ancient 2.5-kb elements with coding capacity, Tigger1 and -2, that closely resemble pogo, a DNA transposon in Drosophila, and probably were responsible for the distribution of some of the short elements. The deduced pogo and Tigger proteins are related to products of five DNA transposons found in fungi and nematodes, and more distantly, to the Tc1 and mariner transposases. They also are very similar to the major mammalian centromere protein CENP-B, suggesting that this may have a transposase origin. We further identified relatively low-copy-number mariner elements in both human and sheep DNA. These belong to two subfamilies previously identified in insect genomes, suggesting lateral transfer between diverse species.
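The two structural hallmarks named above, terminal inverted repeats (TIRs) and target site duplications (TSDs), can be checked with simple string operations. The sketch below assumes the element boundaries are already known and treats an inverted repeat as the left end matching the reverse complement of the right end; it allows a configurable number of mismatches because the copies described are degenerate. Function names, the toy sequences, and the mismatch threshold are illustrative assumptions, not taken from the study.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element: str, tir_len: int, max_mismatches: int = 0) -> bool:
    """True if the first `tir_len` bases match the reverse complement of the
    last `tir_len` bases, allowing up to `max_mismatches` differences."""
    if tir_len <= 0 or 2 * tir_len > len(element):
        raise ValueError("tir_len must be positive and fit twice within the element")
    left = element[:tir_len].upper()
    right_rc = reverse_complement(element[-tir_len:]).upper()
    mismatches = sum(1 for x, y in zip(left, right_rc) if x != y)
    return mismatches <= max_mismatches


def target_site_duplication(left_flank: str, right_flank: str, tsd_len: int) -> Optional[str]:
    """Return the duplicated target site if the `tsd_len` bases immediately
    before and after the element are identical, otherwise None."""
    if tsd_len <= 0 or tsd_len > min(len(left_flank), len(right_flank)):
        raise ValueError("tsd_len must be positive and fit within both flanks")
    left = left_flank[-tsd_len:].upper()
    right = right_flank[:tsd_len].upper()
    return left if left == right else None


if __name__ == "__main__":
    tir = "CAGGGTGTAAAAAGTAC"  # toy 17-bp TIR
    element = tir + "GATTACA" * 10 + reverse_complement(tir)
    print(has_terminal_inverted_repeats(element, len(tir)))  # True
    print(target_site_duplication("GGGGTA", "TACCCC", 2))    # TA
```

In practice the TIR length (14 to 25 bp for the families described) is not known in advance, so one would scan a range of `tir_len` values; the 8-bp and TA duplications correspond to `tsd_len=8` and `tsd_len=2`.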
Article
Cytosine methylation is widely distributed in multicellular organisms. We present a comprehensive survey of the existing data on the phylogenetic distribution of DNA methylation in invertebrates, together with new data for the crustacean Penaeus semisulcatus, the annelid Aporrectodea caliginosa trapezoides, and the parasitic platyhelminth Schistosoma mansoni. Two alternative hypotheses addressing the function of cytosine methylation in invertebrates are evaluated: (1) cytosine methylation is an ancient regulatory mechanism which was lost in species with low rates of cell turnover, and (2) cytosine methylation is primarily a defense mechanism against genomic parasites and is expected to be present in all species with large genomes. We discuss the role of DNA methylation in the evolution of development in light of these hypotheses and conclude that gene control and cell memory are important and primitive functions of DNA methylation.
Article
A functional approach to gene cloning was applied to HeLa cells in an attempt to isolate cDNA fragments which convey resistance to gamma interferon (IFN-gamma)-induced programmed cell death. One of the rescued cDNAs, described in this work, was a fragment of a novel gene, named DAP-5. Analysis of a DAP-5 full-length cDNA clone revealed that it codes for a 97-kDa protein that is highly homologous to eukaryotic translation initiation factor 4G (eIF4G, also known as p220). According to its deduced amino acid sequence, this novel protein lacks the N-terminal region of eIF4G responsible for association with the cap binding protein eIF4E. The N-terminal part of DAP-5 has 39% identity and 63% similarity to the central region of mammalian p220. Its C-terminal part is less homologous to the corresponding region of p220, suggesting that it may possess unique functional properties. The rescued DAP-5 cDNA fragment which conveyed resistance to IFN-gamma-induced cell death was expressed from the vector in the sense orientation. Intriguingly, it comprised part of the coding region which corresponds to the less conserved C-terminal part of DAP-5 and directed the synthesis of a 28-kDa miniprotein. The miniprotein exerted a dual effect on HeLa cells. Low levels of expression protected the cells from IFN-gamma-induced programmed cell death, while high levels of expression were not compatible with continuous cell growth. The relevance of DAP-5 protein to possible changes in a cell's translational machinery during programmed cell death and growth arrest is discussed.
Article
Translation initiation in eukaryotes is facilitated by the cap structure, m7GpppN (where N is any nucleotide). Eukaryotic translation initiation factor 4F (eIF4F) is a cap binding protein complex that consists of three subunits: eIF4A, eIF4E and eIF4G. eIF4G interacts directly with eIF4E and eIF4A. The binding site of eIF4E resides in the N-terminal third of eIF4G, while eIF4A and eIF3 binding sites are present in the C-terminal two-thirds. Here, we describe a new eukaryotic translational regulator (hereafter called p97) which exhibits 28% identity to the C-terminal two-thirds of eIF4G. p97 mRNA has no initiator AUG and translation starts exclusively at a GUG codon. The GUG-initiated open reading frame (907 amino acids) has no canonical eIF4E binding site. p97 binds to eIF4A and eIF3, but not to eIF4E. Transient transfection experiments show that p97 suppresses both cap-dependent and independent translation, while eIF4G supports both translation pathways. Furthermore, inducible expression of p97 reduces overall protein synthesis. These results suggest that p97 functions as a general repressor of translation by forming translationally inactive complexes that include eIF4A and eIF3, but exclude eIF4E.
Article
Previous experiments using human teratocarcinoma cells indicated that p40, the protein encoded by the first open reading frame (ORF) of the human LINE-1 (L1Hs) retrotransposon, occurs in a large cytoplasmic ribonucleoprotein complex in direct association with L1Hs RNA(s), the p40 RNP complex. We have now investigated the interaction between partially purified p40 and L1Hs RNA in vitro using an RNA binding assay dependent on co-immunoprecipitation of p40 and bound RNA. These experiments identified two p40 binding sites on the full-length sense strand L1Hs RNA. Both sites are in the second ORF of the 6000 nt RNA: site A between residues 1999 and 2039 and site B between residues 4839 and 4875. The two RNA segments share homologous regions. Experiments involving UV cross-linking followed by immunoprecipitation indicate that p40 in the in vitro complex is directly associated with L1Hs RNA, as it is in the p40 RNP complex found in teratocarcinoma cells. Binding and competition experiments demonstrate that p40 binds to single-stranded RNA containing a p40 binding site, but not to single-stranded or double-stranded DNA, double-stranded RNA or a DNA-RNA hybrid containing a binding site sequence. Thus, p40 appears to be a sequence-specific, single-strand RNA binding protein.
Article
Cell stress, viral infection, and translational inhibition increase the abundance of human Alu RNA, suggesting that the level of these transcripts is sensitive to the translational state of the cell. To determine whether Alu RNA functions in translational homeostasis, we investigated its role in the regulation of double-stranded RNA-activated kinase PKR. We found that overexpression of Alu RNA by cotransient transfection increased the expression of a reporter construct, which is consistent with an inhibitory effect on PKR. Alu RNA formed stable, discrete complexes with PKR in vitro, bound PKR in vivo, and antagonized PKR activation both in vitro and in vivo. Alu RNAs produced by either overexpression or exposure of cells to heat shock bound PKR, and transiently overexpressed Alu RNA antagonized virus-induced activation of PKR in vivo. Cycloheximide treatment of cells decreased PKR activity, coincident with an increase in Alu RNA. These observations suggest that the increased levels of Alu RNAs caused by cellular exposure to different stresses regulate protein synthesis by antagonizing PKR activation. This provides a functional role for mammalian short interspersed elements, prototypical junk DNA.
Article
In both plants and Drosophila melanogaster, expression from a transgenic locus may be silenced when repeated transgene copies are arranged as a concatemeric array. This repeat-induced gene silencing is frequently manifested as a decrease in the proportion of cells that express the transgene, resulting in a variegated pattern of expression. There is also some indication that, in transgenic mammals, the number of transgene copies within an array can exert a repressive influence on expression, with several mouse studies reporting a decrease in the level of expression per copy as copy number increases. However, because these studies compare different sites of transgene integration as well as arrays with different numbers of copies, the expression levels observed may be subject to varying position effects as well as the influence of the multicopy array. Here we describe use of the lox/Cre system of site-specific recombination to generate transgenic mouse lines in which different numbers of a transgene are present at the same chromosomal location, thereby eliminating the contribution of position effects and allowing analysis of the effect of copy number alone on transgene silencing. Reduction in copy number results in a marked increase in expression of the transgene and is accompanied by decreased chromatin compaction and decreased methylation at the transgene locus. These findings establish that the presence of multiple homologous copies of a transgene within a concatemeric array can have a repressive effect upon gene expression in mammalian systems.
Article
Genetic models predict that genomic rearrangement in hybrids can facilitate reproductive isolation and the formation of new species by preventing gene flow between the parent species and hybrid (sunflowers are an example). The mechanism underlying hybridization-induced chromosome remodelling is as yet unknown, although mobile element activity has been shown to be involved in DNA rearrangement in some dysgenic Drosophila hybrids. It has been proposed that DNA methylation evolved as a means of repressing the movement of mobile elements (the host defence model). If such a protective mechanism were to fail, mobile elements could be activated, and could cause major and rapid genome alterations. Here we demonstrate the occurrence of genome-wide undermethylation, retroviral element amplification and chromosome remodelling in an interspecific mammalian hybrid (Macropus eugenii x Wallabia bicolor). Atypically extended centromeres of Macropus eugenii-derived autosomes in the hybrid were composed primarily of an unmethylated, amplified retroviral element not detectable in either parent species. These results, taken with the observation of deficient methylation and de novo chromosome change in other mammalian hybrids, indicate that the failure of DNA methylation and subsequent mobile-element activity in hybrids could facilitate rapid karyotypic evolution.
Article
The transposon Tc1 of the nematode Caenorhabditis elegans is a member of the widespread family of Tc1/mariner transposons. The distribution pattern of virtually identical transposons among insect species that diverged 200 million years ago suggested horizontal transfer of the elements between species. This hypothesis gained experimental support when it was shown that Tc1 and later also mariner transposons could be made to jump in vitro, with their transposase as the only protein required. Later it was shown that mariner transposons from one fruit fly species can jump in other fruit fly species and in a protozoan and, recently, that a Tc1-like transposon from the nematode jumps in fish cells and that a fish Tc1-like transposon jumps in human cells. Here we show that the Tc1 element from the nematode jumps in human cells. This provides further support for the horizontal spread hypothesis. Furthermore, it suggests that Tc1 can be used as a vehicle for DNA integration in human gene therapy.
Article
Immunoglobulin and T-cell-receptor genes are assembled from component gene segments in developing lymphocytes by a site-specific recombination reaction, V(D)J recombination. The proteins encoded by the recombination-activating genes, RAG1 and RAG2, are essential in this reaction, mediating sequence-specific DNA recognition of well-defined recombination signals and DNA cleavage next to these signals. Here we show that RAG1 and RAG2 together form a transposase capable of excising a piece of DNA containing recombination signals from a donor site and inserting it into a target DNA molecule. The products formed contain a short duplication of target DNA immediately flanking the transposed fragment, a structure like that created by retroviral integration and all known transposition reactions. The results support the theory that RAG1 and RAG2 were once components of a transposable element, and that the split nature of immunoglobulin and T-cell-receptor genes derives from germline insertion of this element into an ancestral receptor gene soon after the evolutionary divergence of jawed and jawless vertebrates.
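The target-site duplication mentioned above can be pictured with a toy string model. The sketch below is a hypothetical illustration, not a model of the RAG1/RAG2 reaction itself: the function `insert_with_tsd` and the 4-bp duplication length in the usage example are our assumptions. A staggered cut leaves short single-stranded stretches of target DNA, and filling them in copies the same bases to both sides of the insert.

```python
def insert_with_tsd(target: str, element: str, site: int, tsd_len: int) -> str:
    """Toy model of insertion at a staggered cut (hypothetical helper).

    The two strands are cut `tsd_len` bases apart starting at `site`.
    After gap repair, the `tsd_len` bases between the nicks appear on
    both sides of the inserted element: a target-site duplication.
    """
    if tsd_len < 0 or not 0 <= site <= len(target) - tsd_len:
        raise ValueError("site must leave room for the duplicated bases")
    duplicated = target[site:site + tsd_len]
    left_flank = target[:site + tsd_len]   # ends with the duplicated bases
    right_flank = target[site + tsd_len:]  # original sequence after the cut
    return left_flank + element + duplicated + right_flank
```

For example, `insert_with_tsd("AAAACGTGTTTT", "XX", 4, 4)` returns `"AAAACGTGXXCGTGTTTT"`, with `CGTG` flanking the inserted `XX` on both sides.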
Article
The RAG1 and RAG2 proteins are known to initiate V(D)J recombination by making a double-strand break between the recombination signal sequence (RSS) and the neighboring coding DNA. We show that these proteins can also drive the coupled insertion of cleaved recombination signals into new DNA sites in a transpositional reaction. This RAG-mediated DNA transfer provides strong evidence for the evolution of the V(D)J recombination system from an ancient mobile DNA element and suggests that repeated transposition may have promoted the expansion of the antigen receptor loci. The inappropriate diversion of V(D)J rearrangement to a transpositional pathway may also help to explain certain types of DNA translocation associated with lymphatic tumors.
Article
Retrotransposons, transposable elements related to animal retroviruses, are found in all eukaryotes investigated and make up the majority of many plant genomes. Their ubiquity points to their importance, especially in their contribution to the large-scale structure of complex genomes. The nature and frequency of retro-element appearance, activation and amplification are poorly understood in all higher eukaryotes. Here we employ a novel approach to determine the insertion dates for 17 of 23 retrotransposons found near the maize adh1 gene, and two others from unlinked sites in the maize genome, by comparison of long terminal repeat (LTR) divergences with the sequence divergence between adh1 in maize and sorghum. All retrotransposons examined have inserted within the last six million years, most in the last three million years. The structure of the adh1 region appears to be standard relative to the other gene-containing regions of the maize genome, thus suggesting that retrotransposon insertions have increased the size of the maize genome from approximately 1200 Mb to 2400 Mb in the last three million years. Furthermore, the results indicate an increased mutation rate in retrotransposons compared with genes.
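The dating logic can be sketched as follows, under stated assumptions. The two LTRs of a retrotransposon are identical when it inserts and then diverge independently, much as two orthologous genes diverge after speciation, so in both cases divergence K is roughly 2 × rate × time. A rate calibrated on the maize/sorghum adh1 pair can therefore convert LTR divergence into an insertion age. The Jukes–Cantor correction, the function names, and any divergence or split-time values you supply are assumptions for illustration; the abstract does not specify them.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites p for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age(ltr_p: float, gene_p: float, split_myr: float) -> float:
    """Estimated insertion age (Myr) of an LTR retrotransposon (hypothetical helper).

    ltr_p:     observed divergence between the element's 5' and 3' LTRs
    gene_p:    observed divergence between orthologous genes (e.g. adh1)
    split_myr: assumed time since the two species split, in Myr
    """
    # Orthologs diverge along two lineages: K_gene = 2 * rate * split_myr.
    rate = jukes_cantor(gene_p) / (2 * split_myr)  # substitutions/site/Myr
    # The two LTRs also diverge independently: K_ltr = 2 * rate * age.
    return jukes_cantor(ltr_p) / (2 * rate)
```

Using one rate for LTRs and genes is a simplifying assumption. The abstract itself reports a higher mutation rate in retrotransposons, and if LTRs do evolve faster, a gene-calibrated rate would make insertions look older than they are.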
Article
The evolution, mobility and deleterious genetic effects of human Alus are fairly well understood. The complexity of regulated transcriptional expression of Alus is becoming apparent and insight into the mechanism of retrotransposition is emerging. Unresolved questions concern why mobile, highly repetitive short interspersed elements (SINEs) have been tolerated throughout evolution and why and how families of such sequences are periodically replaced. Either certain SINEs are more successful genomic parasites or positive selection drives their relative success and genomic maintenance. A complete understanding of the evolutionary dynamics and significance of SINEs requires determining whether or not they have a function(s). Recent evidence suggests two possibilities, one concerning DNA and the other RNA. Dispersed Alus exhibit remarkable tissue-specific differences in the level of their 5-methylcytosine content. Differences in Alu methylation in the male and female germlines suggest that Alu DNA may be involved in either the unique chromatin organization of sperm or signaling events in the early embryo. Alu RNA is increased by cellular insults and stimulates protein synthesis by inhibiting PKR, the eIF2 kinase that is regulated by double-stranded RNA. PKR serves other roles potentially linking Alu RNA to a variety of vital cell functions. Since Alus have appeared only recently within the primate lineage, this proposal provokes the challenging question of how Alu RNA could have possibly assumed a significant role in cell physiology.
Article
Retrotransposition affects genome structure by increasing repetition and producing insertional mutations. Dispersion of the retrotransposon L1 throughout mammalian genomes suggests that L1 activity might be an important evolutionary force. Here we report that L1 retrotransposition contributes to rapid genome evolution in the mouse, because a number of L1 sequences from the T(F) subfamily are retrotransposition competent. We show that the T(F) subfamily is large, young and expanding, containing approximately 4,800 full-length members in strain 129. Eleven randomly isolated, full-length T(F) elements averaged 99.8% sequence identity to each other, and seven of these retrotransposed in cultured cells. Thus, we estimate that the mouse genome contains approximately 3,000 active T(F) elements, 75 times the estimated number of active human L1s. Moreover, as T(F) elements are polymorphic among closely related mice, they have retrotransposed recently, implying rapid amplification of the subfamily to yield genomes with different patterns of interspersed repetition. Our data show that mice and humans differ considerably in the number of active L1s, and probably differ in the contribution of retrotransposition to ongoing sequence evolution.
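The two headline estimates appear to follow from simple arithmetic. The abstract does not spell out the calculation, so the sketch below assumes that the active fraction among the 11 sampled elements was extrapolated to all full-length T(F) copies, and that the human figure is implied by the "75 times" ratio.

```python
full_length_tf = 4_800  # full-length T(F) members in strain 129 (from the abstract)
sampled, active = 11, 7  # isolated elements / those that retrotransposed in culture

# Assumption: the active fraction of the sample applies to all full-length copies.
active_tf = full_length_tf * active / sampled
print(round(active_tf))  # 3055, reported as "approximately 3,000"

# Implied number of active human L1s if ~3,000 is 75 times that number.
implied_active_human_l1 = 3_000 / 75
print(implied_active_human_l1)  # 40.0
```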
Article
Several distinct families of endogenous retrovirus-like sequences (HERVs) exist in the genomes of humans and other primates. One of these families, the HERV-K group, contains members that encode functional proteins and that have been implicated in the etiology of insulin-dependent diabetes mellitus (IDDM). Because of potential functional and disease relevance, it is important to determine if there are HERV-K-associated genetic differences between individuals. In this study, we have investigated the divergence and evolutionary age of HERV-K long terminal repeats (LTRs). Thirty-seven LTRs, taken primarily from random human clones in GenBank, were aligned and grouped into nine clusters with decreasing sequence divergence. Cluster 1 sequences are 8.6% divergent, on average, whereas cluster 9 LTRs, represented by the LTRs of the fully sequenced HERV-K10 clone, show an average of only 1.1% divergence from each other. The evolutionary age of 18 LTRs from different clusters was then investigated by genomic PCR to determine presence or absence of the retroviral element in different primate species. LTRs from clusters of higher divergence were detected in monkeys and apes, whereas LTRs in clusters with lower divergence were acquired later in evolution. Notably, LTRs of cluster 9 were found only in humans at all nine loci examined. Genomic Southern analysis with an oligonucleotide probe specific for cluster 9 LTRs suggests that HERV-K elements with this type of LTR expanded independently in the genomes of humans and the great apes. This is the first report of endogenous retroviral integrations that are specific to humans and indicates that some HERVs have amplified much later than previously thought. These elements may still be actively transposing and may therefore represent a source of genetic variation linked to disease development.
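Cluster divergence values such as "8.6%" and "1.1%" are averages of pairwise differences among aligned LTRs. The abstract does not say which distance measure was used, so the sketch below assumes the simplest one: uncorrected p-distance, with gap columns skipped. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def mean_pairwise_divergence(aligned: list[str]) -> float:
    """Average p-distance over all pairs in a cluster of aligned LTRs."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

For the toy cluster `["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"]` the three pairwise distances are 0.1, 0.1 and 0.2, giving a mean of about 0.133 (13.3%).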
Article
The gene encoding BC200 RNA arose from a monomeric Alu element. The RNA was subsequently recruited, or exapted, into a function in the nervous system. Here we confirm the presence of the BC200 gene in several primate species among the Anthropoidea. The period following the divergence of New World monkeys and Old World monkeys from their common ancestor is characterized by a significantly higher substitution rate in the examined 5' flanking region than in the BC200 RNA coding region itself. Furthermore, the conservation of CpG dimers in the RNA coding region (200 bp) is drastically increased compared to the 5' flanking region (approximately 400 bp) over all 12 species examined. Finally, the brain-specific expression pattern of BC200 RNA and its presence as a ribonucleoprotein particle (RNP) are conserved in Old World and New World monkeys. Our studies indicate that the gene encoding BC200 RNA was created at least 35-55 million years ago and its presence, mode of expression, and association with protein(s) as an RNP are under selective pressure.
Article
The distribution of MIRs (mammalian-wide interspersed repeats) was investigated in 164 human sequences (≥100 kb), which were assigned, according to their GC level, to isochore families L, H1, H2 and H3. MIR elements, whose total number in the genome was estimated to be about 3.3 × 10⁵, were found to be unevenly distributed in human isochores. The majority of MIRs (55%) were found in the L isochore family. In contrast, MIR density was highest in H2, closely followed by H1, whereas densities in L and H3 were 2- and 3-fold lower than in H2, respectively. For this reason, the assessment of MIR distribution by inter-repeat PCR led to an overestimation of MIR numbers in H2 isochores and an underestimation in L isochores.
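Holding most MIRs (L) and having the highest MIR density (H2) are not contradictory: a family that covers much more sequence can hold the majority of elements at a lower density. The sketch below uses invented counts and lengths, not data from the study, chosen only so that the shares and density ratios match the qualitative pattern in the abstract.

```python
# Invented illustrative values, NOT data from the study: MIR counts and
# sequence lengths (Mb) per isochore family.
mir_counts = {"L": 165_000, "H1": 75_000, "H2": 45_000, "H3": 15_000}
family_mb = {"L": 2_200, "H1": 550, "H2": 300, "H3": 300}

total = sum(mir_counts.values())
share = {f: n / total for f, n in mir_counts.items()}            # fraction of all MIRs
density = {f: mir_counts[f] / family_mb[f] for f in mir_counts}  # MIRs per Mb

for f in mir_counts:
    print(f"{f}: {share[f]:.0%} of MIRs, {density[f]:.0f} per Mb")
```

With these numbers L holds 55% of all MIRs at 75 per Mb, while H2 holds only 15% at 150 per Mb, twice the L density and three times the H3 density (50 per Mb).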
Article
The multiple sclerosis-associated retrovirus (MSRV) isolated from plasma of MS patients was found to be phylogenetically and experimentally related to human endogenous retroviruses (HERVs). To characterize the MSRV-related HERV family and to test the hypothesis of a replication-competent HERV, we have investigated the expression of MSRV-related sequences in healthy tissues. The expression of MSRV-related transcripts restricted to the placenta led to the isolation of overlapping cDNA clones from a cDNA library. These cDNAs spanned a 7.6-kb region containing gag, pol, and env genes; RU5 and U3R flanking sequences; a polypurine tract; and a primer binding site (PBS). As this PBS showed similarity to avian retrovirus PBSs used by tRNATrp, this new HERV family was named HERV-W. Several genomic elements were identified, one of them containing a complete HERV-W unit, spanning all cDNA clones. Elements of this multicopy family were not replication competent, as gag and pol open reading frames (ORFs) were interrupted by frameshifts and stop codons. A complete ORF putatively coding for an envelope protein was found both on the HERV-W DNA prototype and within an RU5-env-U3R polyadenylated cDNA clone. Placental expression of 8-, 3.1-, and 1.3-kb transcripts was observed, and a putative splicing strategy was described. The apparently tissue-restricted HERV-W long terminal repeat expression is discussed with respect to physiological and pathological contexts.
Article
We report the cloning of a novel gene, called Tramp, in the Xp/Yp PAR region that has a functional homologue on the Y chromosome and escapes X-inactivation. This gene encodes, within a single exon, a putative protein that has amino acid similarity with transposases of the Ac family. Flanking this gene we have identified putative terminal inverted repeats (TIRs) and a duplicate target site, suggesting that it may be an ancient transposable element. The nucleotide differences in these sites and the TIR-binding inactivity of the putative Tramp protein suggest that this element is not an autonomous transposon. In the human genome, the Tramp protein may be involved in the transposition of other transposable elements, like medium reiterated frequency repeats, or it could be specialized in the acquisition of a new cellular function.
Article
The propagation of X chromosome inactivation is thought to be mediated by the cis-limited spreading of the non-protein-coding Xist transcript. In this report we have investigated the localization of Xist RNA on rodent metaphase chromosomes. We show that Xist RNA exhibits a banded pattern on the inactive X and is excluded from regions of constitutive heterochromatin. The banding pattern suggests a preferential association with gene-rich, G-light regions. Analysis of X:autosome rearrangements revealed that restricted propagation of X inactivation into cis-linked autosomal material is reflected by a corresponding limited spread of Xist RNA. We discuss these results in the context of models for the function of Xist RNA in the propagation of X inactivation.
Article
Mutations in Bruton's tyrosine kinase (Btk) result in the immunodeficiency X-linked agammaglobulinemia (XLA). In a previous study of 101 patients with presumed XLA, we identified seven patients with large genomic alterations in Btk. The recent completion of 100 kb of contiguous DNA sequence at the Btk locus has allowed us to characterize these mutations in detail and to identify four different types of alterations. These alterations included a 253-bp retroposon insertion at position +5 within intron 9, an inversion of greater than 48 kb that disrupted Btk between exons 4 and 5, a 12.9-kb duplication including Btk exons 2 to 5, and four deletions ranging from 2.8 to 38 kb in size. The duplication and three of the deletions resulted from unequal crossovers of Alu repeats. Further, three of the deletions terminated within a repeat-rich cluster spanning 30 kb of sequence 3′ of Btk exon 19, suggesting that this region was more susceptible to unequal crossovers than the rest of the Btk gene. These studies describe the first reports of an insertion, an inversion, and a duplication in Btk and demonstrate the utility of large-scale sequencing in the elucidation of disease-causing mutations.
Article
Although a number of alternative solutions to the chromosome end-replication problem are used in nature, the telomerase solution is the most widespread and perhaps the oldest among eukaryotes. The finding of clear RT motifs in the catalytic subunit of telomerase means we no longer need to qualify it as a “specialized” RT. Indeed, expression of the telomerase RNA and the catalytic subunit (along with whatever components might be provided by a reticulocyte lysate) reconstitutes human telomerase activity in vitro (Weinrich et al. 1997). This suggests that underneath the massive telomerase RNP complex (based on glycerol gradient and sizing column estimates), telomerase may have a simple two-component RNP enzyme at its core, much like simpler RTs encoded by group II introns and non-LTR retrotransposons. Since RTs are thought to have been with us since the transition from RNA- and protein-based systems to the present-day DNA-, RNA-, and protein-based systems (see Figure 2), we now have a satisfying explanation for the near universality of telomerase among eukaryotes. While it is still far from clear exactly how telomerase evolved to its present-day form, it is likely to be with us for a long time.
Article
A 190 bp insertion is associated with the white-eosin mutation in Drosophila melanogaster. This insertion is a member of a family of transposable elements, pogo elements, which belongs to the same class as the P and hobo elements of D. melanogaster. Strains typically have many copies of a 190 bp element, 10-15 elements 1.1-1.5 kb in size and several copies of a 2.1 kb element. The smaller elements all appear to be derived from the largest by single internal deletions so that all elements share terminal sequences. They either always insert at the dinucleotide TA and have perfect 21 bp terminal inverted repeats, or have 22 bp inverted repeats and produce no duplication upon insertion. Analysis by DNA blotting of their distribution and occupancy of insertion sites in different strains suggests that they may be less mobile than P or hobo. The DNA sequence of the largest element has two long open reading frames on one strand which are joined by splicing as indicated by cDNA analysis. RNAs of this strand are made, whose sizes are similar to the major size classes of elements. A protein predicted by the DNA sequence has significant homology with a human centrosomal-associated protein, CENP-B. Homologous sequences were not detected in other Drosophila species, suggesting that this transposable element family may be restricted to D. melanogaster.
Article
Epidemiological data and genetic studies indicate that certain forms of human epilepsy are inherited. Based on the similarity between the human and mouse genomes, mouse models of epilepsy could facilitate the discovery of genes associated with epilepsy syndromes. Here, we report an insertional murine mutation that inactivates a novel gene and results in whole body jerks, generalized clonic seizures, and epileptic brain activity in transgenic mice. The gene, named jerky, encodes a putative 41.7 kD protein displaying homology to a number of nuclear regulatory proteins, suggesting that perhaps the jerky protein is able to bind DNA.
Article
The single copy endogenous retrovirus locus ERV-3 is known to be primarily expressed in the placenta. The absence of expression of this gene in choriocarcinoma cell lines has led to speculation that this may be a defect associated with this abnormality. We show here that ERV-3 is not normally expressed in the cytotrophoblast from which these tumour cells are derived but is expressed in normal syncytiotrophoblast. The conservation of the ERV-3 open reading frame for env in ape and old world monkey species and its tight regulation and site of expression suggest a functional role for this gene in this tissue.
Article
Over a third of the human genome consists of interspersed repetitive sequences which are primarily degenerate copies of transposable elements. In the past year, the identities of many of these transposable elements were revealed. The emerging concept is that only three mechanisms of amplification are responsible for the vast majority of interspersed repeats and that with each autonomous element a number of dependent non-autonomous sequences have co-amplified.
Article
It is commonly accepted that the reverse-transcribed cellular RNA molecules, called retroposons, integrate at staggered breaks in mammalian chromosomes. However, unlike what was previously thought, most of the staggered breaks are not generated by random nicking. One of the two nicks involved is primarily associated with the 5'-TTAAAA hexanucleotide and its variants derived by a single base substitution, particularly A --> G and T --> C. It is probably generated in the antisense strand between the consensus bases 3'-AA and TTTT complementary to 5'-TTAAAA. The sense strand is nicked at variable distances from the TTAAAA consensus site toward the 3' end, preferably within 15-16 base pairs. The base composition near the second nicking site is also nonrandom at positions preceding the nick. On the basis of the observed sequence patterns it is proposed that integration of mammalian retroposons is mediated by an enzyme with endonucleolytic activity. The best candidate for such an enzyme may be the reverse transcriptase encoded by the L1 non-long-terminal-repeat retrotransposon, which contains a recently reported domain homologous to the apurinic/apyrimidinic (AP) endonuclease family [Martin, F., Olivares, M., Lopez, M. C. & Alonso, C. (1996) Trends Biochem. Sci. 21, 283-285; Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. (1996) Cell 87, 905-916] and shows nicking in vitro with preference for targets similar to 5'-TTAAAA/3'-AATTTT consensus sequence [Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. (1996) Cell 87, 905-916]. A model for integration of mammalian retroposons based on the presented data is discussed.
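Scanning a sequence for the 5'-TTAAAA consensus and its single-substitution variants is straightforward. The sketch below is a hypothetical illustration: the helper names are ours, it reads only the given strand, and it does not model where either nick falls or how strongly the endonuclease prefers each variant.

```python
CONSENSUS = "TTAAAA"


def one_substitution_variants(motif: str) -> set[str]:
    """The motif itself plus every sequence differing from it at exactly one base."""
    variants = {motif}
    for i, base in enumerate(motif):
        for alt in "ACGT":
            if alt != base:
                variants.add(motif[:i] + alt + motif[i + 1:])
    return variants


def find_candidate_sites(seq: str, motif: str = CONSENSUS) -> list[tuple[int, str]]:
    """0-based start positions (given strand only) of the motif or a one-base variant."""
    variants = one_substitution_variants(motif)
    k = len(motif)
    seq = seq.upper()
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)
            if seq[i:i + k] in variants]
```

A six-base motif has 6 × 3 = 18 single-substitution variants, so each window is checked against 19 hexamers. For example, `find_candidate_sites("CCTCAAAACC")` returns `[(2, "TCAAAA")]`, a T --> C variant of the consensus.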
Article
A helix-turn-helix (HTH) DNA-binding motif is identified in transposase sequences in Tc1, mariner and pogo DNA transposons. The findings are supported by results of various sequence analysis methods. Tc1 transposases are also predicted to contain another DNA-binding region. These findings are in accord with experimental evidence obtained from Tc1A, Tc3A and pogo transposases. The pogo family transposases, but not the pogo-type transcription factors, contain the HTH motif, suggesting that HTH structures are essential for Tc1/mariner/pogo transposition. Analysis of multiple sequence alignments enabled the identification of the HTH motif in distantly related protein sequences.
Article
Most of the 5-methylcytosine in mammalian DNA resides in transposons, which are specialized intragenomic parasites that represent at least 35% of the genome. Transposon promoters are inactive when methylated and, over time, C→T transition mutations at methylated sites destroy many transposons. Apart from that subset of genes subject to X inactivation and genomic imprinting, no cellular gene in a non-expressing tissue has been proven to be methylated in a pattern that prevents transcription. It has become increasingly difficult to hold that reversible promoter methylation is commonly involved in developmental gene control; instead, suppression of parasitic sequence elements appears to be the primary function of cytosine methylation, with crucial secondary roles in allele-specific gene expression as seen in X inactivation and genomic imprinting.
Article
The cellular response to environmental signals is largely dependent upon the induction of responsive protein kinase signaling pathways. Within these pathways, distinct protein-protein interactions play a role in determining the specificity of the response through regulation of kinase function. The interferon-induced serine/threonine protein kinase, PKR, is activated in response to various environmental stimuli. Like many protein kinases, PKR is regulated through direct interactions with activator and inhibitory molecules, including P58IPK, a cellular PKR inhibitor. P58IPK functions to repress PKR-mediated phosphorylation of the eukaryotic initiation factor 2alpha subunit (eIF-2alpha) through a direct interaction, thereby relieving the PKR-imposed block on mRNA translation and cell growth. To further define the molecular mechanism underlying regulation of PKR, we have utilized an interaction cloning strategy to identify a novel cDNA encoding a P58IPK-interacting protein. This protein, designated P52rIPK, possesses limited homology to the charged domain of Hsp90 and is expressed in a wide range of cell lines. P52rIPK and P58IPK interacted in a yeast two-hybrid assay and were recovered as a complex from mammalian cell extracts. When coexpressed with PKR in yeast, P58IPK repressed PKR-mediated eIF-2alpha phosphorylation, inhibiting the normally toxic and growth-suppressive effects associated with PKR function. Conversely, introduction of P52rIPK into these strains resulted in restoration of both PKR activity and eIF-2alpha phosphorylation, concomitant with growth suppression due to inhibition of P58IPK function. Furthermore, P52rIPK inhibited P58IPK function in a reconstituted in vitro PKR-regulatory assay. Our results demonstrate that P58IPK is inhibited through a direct interaction with P52rIPK which, in turn, results in upregulation of PKR activity.
Taken together, our data describe a novel protein kinase-regulatory system which encompasses an intersection of interferon-, stress-, and growth-regulatory pathways.
Article
A confident consensus sequence for Hsmar1, the first mariner transposon recognized in the human genome, was generated using three genomic and 15 cDNA sequences. It is thought to represent the ancestrally active copy that invaded an early primate genome. The consensus is 1287 base pairs (bp) long, has 30 bp perfect inverted terminal repeats (ITRs), and encodes a 343 amino acid (aa) mariner transposase. Each copy has diverged from the consensus largely independently of the others and mostly neutrally, and most are now defective. They differ from the consensus by an average of 7.8% in DNA sequence and 7.5 indels per kilobase, both of which values indicate that the copies were formed about 50 Myr ago. On average, only 20% of the 73 surmised CpG hypermutable sites in the consensus remain. A remarkable exception to this loss of functionality is revealed by a set of ten cDNA clones derived from a particular genomic copy that has diverged only 2.4% from the consensus, retained 54% of its hypermutable CpG pairs, and which has a full-length transposase open reading frame. The complete sequence of one of these cDNAs (NIB1543) indicates that the transposase gene of this copy may have been conserved because it is spliced to a human cellular gene encoding a SET domain protein. A specific PCR assay was used to reveal the presence of Hsmar1 copies in all primates examined representing all major lineages, but not in close relatives of primates. PCR fragments cloned and sequenced from a representative sample of primates confirmed that Hsmar1 copies are present in all major lineages, and also revealed another cecropia subfamily mariner in prosimians only, and a third highly divergent mariner present in the greater slow loris Nycticebus coucang. There are about 200 copies of Hsmar1 in the human genome, as well as approximately 2400 copies of a derived 80 bp paired ITR structure and approximately 4600 copies of solo ITRs.
Thus, this transposon had a considerable insertional mutagenic effect on past primate genomes.
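The "percentage of CpG sites remaining" figures can be computed by comparing each copy with the consensus. The abstract does not describe the exact scoring, so the sketch below assumes a gapped pairwise alignment of copy and consensus and counts a consensus CpG as retained only if the copy still reads CG at that position; `cpg_retention` is a hypothetical name.

```python
def cpg_retention(consensus: str, copy: str) -> float:
    """Fraction of consensus CpG sites still present as CG in an aligned copy.

    Both strings must come from the same gapped alignment ('-' marks a gap).
    A site mutated to TG or CA (the usual outcomes of methylated-CpG
    deamination on either strand) counts as lost, as does any other change.
    """
    if len(consensus) != len(copy):
        raise ValueError("sequences must be aligned to equal length")
    sites = [i for i in range(len(consensus) - 1) if consensus[i:i + 2] == "CG"]
    if not sites:
        raise ValueError("consensus has no CpG sites")
    kept = sum(copy[i:i + 2] == "CG" for i in sites)
    return kept / len(sites)
```

For a toy consensus `"ACGTTCGAACGT"` (three CpGs) and copy `"ACGTTTGAACAT"`, one CpG survives, one became TG and one became CA, so the retention is 1/3. On the real consensus, 20% of 73 sites corresponds to roughly 15 surviving CpGs and 54% to roughly 39.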
Article
Large blocks of tandemly repeated sequences, or satellites, surround the centromeres of complex eukaryotes. During mitosis in Drosophila, satellite DNA binds proteins that, during interphase, bind other sites. The requirement for a repeat to borrow a partner protein from those available at mitosis might limit the spectrum of repeat units that can be expanded into large blocks. To account for the ubiquity and pericentric localization of satellites, we propose that they are utilized to maintain regions of late replication, thus ensuring that the centromere is the last region to replicate on a chromosome.
Article
That endogenous retrovirus (ERV) is present within the placenta of humans and other mammals has been known for the past 25 years, but the significance of this observation is still not fully understood. Considerable molecular biological data have emerged in recent years to support the earlier electron microscopic data on the presence of placental ERV. The evidence for ERV in animal and human placental tissue is presented, then integrated with data on the presence of ERV in a range of other tissues, in particular teratocarcinoma cells. Placental invasiveness and maternal immunosuppression are then discussed in relation to metalloproteinase secretion, the immunosuppressive potential of retroviruses, and placental growth factors, while the evidence for a functional link between placental protooncogenes and trophoblast malignancy is reviewed. Finally, placental development, structure, and life span are discussed within an evolutionary context. The hypothesis that one or more ancient trophoblastic ERVs could have played a role in the evolution and divergence of all placental mammals is evaluated.
Article
A number of genes which affect the susceptibility of mice to infection by retroviruses have been described. One of the most interesting of these genes is Fv1 (Friend virus susceptibility 1), which acts at a stage in the retroviral life-cycle following virus entry into the cell but prior to integration and formation of proviral structures. A detailed understanding of the mode of action of Fv1 might be expected to shed fresh light on early steps of the retroviral replication, although progress has been slow in this area due to uncertainty about the nature of the Fv1 gene. The recent cloning of Fv1 by a positional approach fills this gap in current knowledge. Fv1 appears to be derived from a fragment of a retroviral genome, an observation that may suggest novel approaches to the control of retroviral replication.
Article
Recent work has shown that X-chromosome inactivation is brought about by Xist mRNA, which coats the inactive X-chromosome. This paper presents a hypothesis on the function of this RNA. It is suggested that interspersed repetitive elements of the LINE type, in which the X-chromosome is particularly rich, act as booster elements to promote the spread of Xist mRNA. Contact with this RNA causes the LINE elements to be sensed as repeated elements by the cell's system for repeat-induced gene silencing. This leads to the silencing of these elements and the intervening unique sequences by their conversion to heterochromatin.
Article
Nature is the international weekly journal of science: a magazine style journal that publishes full-length research papers in all disciplines of science, as well as News and Views, reviews, news, features, commentaries, web focuses and more, covering all branches of science and how science impacts upon all aspects of society and life.
Article
A substantial fraction of mammalian genomes is composed of mobile elements and their remnants. Recent insertions of LTR-retrotransposons, non-LTR retrotransposons, and non-autonomous retrotransposons have caused disease frequently in mice, but infrequently in humans. Although many of these elements are defective, a number of mammalian non-LTR retrotransposons of the L1 type are capable of autonomous retrotransposition. The mechanism by which they retrotranspose and in turn aid the retrotransposition of non-autonomous elements is being elucidated.
Fukuyama-type congenital muscular dystrophy (FCMD), one of the most common autosomal recessive disorders in Japan (incidence is 0.7-1.2 per 10,000 births), is characterized by congenital muscular dystrophy associated with brain malformation (micropolygyria) due to a defect in the migration of neurons. We previously mapped the FCMD gene to a region of less than 100 kilobases which included the marker locus D9S2107 on chromosome 9q31. We have also described a haplotype that is shared by more than 80% of FCMD chromosomes, indicating that most chromosomes bearing the FCMD mutation could be derived from a single ancestor. Here we report that there is a retrotransposal insertion of tandemly repeated sequences within this candidate-gene interval in all FCMD chromosomes carrying the founder haplotype (87%). The inserted sequence is about 3 kilobases long and is located in the 3' untranslated region of a gene encoding a new 461-amino-acid protein. This gene is expressed in various tissues in normal individuals, but not in FCMD patients who carry the insertion. Two independent point mutations confirm that mutation of this gene is responsible for FCMD. The predicted protein, which we term fukutin, contains an amino-terminal signal sequence, which together with results from transfection experiments suggests that fukutin is a secreted protein. To our knowledge, FCMD is the first human disease to be caused by an ancient retrotransposal integration.
In eukaryotic genomes, methylation of cytosine residues commonly occurs in repetitive sequences. This methylation correlates with reduced gene expression and suppression of recombination, and is thus thought to serve as a genome-defense mechanism that guards against the deleterious effects of multicopy transposable elements and aberrant gene duplications. Analysis of methylation in fungi and plants suggests that the ability of DNA repeats to pair with one another is a key to their selection for methylation. Recent data have outlined the substrate requirements for the establishment and maintenance of methylation in eukaryotic repeated sequences. Substrate-methylation patterns could help us to understand the way in which methyltransferase enzymes recognize their substrates.
The recently inserted subfamilies of Alu retroposons (Ya5/8 and Yb8) are composed of approximately 2000 elements. We have screened a human chromosome 19-specific cosmid library for the presence of Ya5/8 and Yb8 Alu family members. This analysis resulted in the identification of 12 Ya5/8 Alu family members and 15 Yb8 Alu family members from human chromosome 19. The total number of Ya5/8 and Yb8 Alu family members located on human chromosome 19 does not differ from that expected based upon random integration of Alu repeats within the human genome. The distribution of both subfamilies of Alu elements along human chromosome 19 also appears to be random. DNA sequence analysis of the individual Alu elements revealed a low level of random mutations within both subfamilies of Alu elements consistent with their recent evolutionary origin. Oligonucleotide primers complementary to the flanking unique sequences adjacent to each Alu element were used in polymerase chain reaction assays to determine the phylogenetic distribution and human genomic variation associated with each Alu family member. All of the chromosome 19-specific Ya5/8 and Yb8 Alu family members were restricted to the human genome and absent from orthologous positions within the genomes of several non-human primates. Three of the Yb8 Alu family members were polymorphic for insertion presence/absence within the genomes of a diverse array of human populations. The polymorphic Alu elements will be useful tools for the study of human population genetics.
Genome-wide demethylation has been suggested to be a step in carcinogenesis. Evidence for this notion comes from the frequently observed global DNA hypomethylation in tumour cells, and from a recent study suggesting that defects in DNA methylation might contribute to the genomic instability of some colorectal tumour cell lines. DNA hypomethylation has also been associated with abnormal chromosomal structures, as observed in cells from patients with ICF (Immunodeficiency, Centromeric instability and Facial abnormalities) syndrome and in cells treated with the demethylating agent 5-azadeoxycytidine. Here we report that murine embryonic stem cells nullizygous for the major DNA methyltransferase (Dnmt1) gene exhibited significantly elevated mutation rates at both the endogenous hypoxanthine phosphoribosyltransferase (Hprt) gene and an integrated viral thymidine kinase (tk) transgene. Gene deletions were the predominant mutations at both loci. The major cause of the observed tk deletions was either mitotic recombination or chromosomal loss accompanied by duplication of the remaining chromosome. Our results imply an important role for mammalian DNA methylation in maintaining genome stability.
The search for new endogenous retroviral sequences, on the basis of sequence homologies with the pol gene of the recently reported multiple sclerosis associated retrovirus (MSRV), allowed us to identify a full-length endogenous retrovirus sequence located on the long arm of human chromosome 7. This retrovirus, HERV-7q, includes in its env region, within a single 1,620 bp open reading frame, a 664 bp domain almost identical to a 3' non-coding region of the rab7 gene. Transcripts encompassing both the env and the 3' LTR regions of HERV-7q have already been identified as expressed sequence tags, suggesting that this env-like gene might code for a deduced protein 538 amino acids long.
L1 elements are poly(A) retrotransposons that inhabit the human genome. Recent work has defined an endonuclease (L1 EN) encoded by the L1 element that is required for retrotransposition. We report the sequence specificity of this nicking endonuclease and the physical basis of its DNA recognition. L1 endonuclease is specific for the unusual DNA structural features found at the TpA junction of 5'(dTn-dAn)·5'(dTn-dAn) tracts. Within the context of this sequence, substitutions that generate a pyrimidine-purine junction are tolerated, whereas purine-pyrimidine junctions greatly reduce or eliminate nicking activity. The A-tract conformation of the DNA substrate 5' of the nicked site is required for L1 EN nicking. Chemical or physical unwinding of the DNA helix enhances L1 endonuclease activity, while disruption of the adenine mobility associated with TpA junctions reduces it. Akin to the protein-DNA interactions of DNase I, L1 endonuclease DNA recognition is likely mediated by minor groove interactions. Unlike several of its homologues, however, L1 EN exhibits no AP endonuclease activity. Finally, we speculate on the implications of the specificity of the L1 endonuclease for the parasitic relationship between retroelements and the human genome.
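The junction rules in the L1 endonuclease abstract can be illustrated with a small sequence scanner. This is a minimal sketch under loudly stated assumptions: the function name `candidate_nick_sites`, the flank lengths `flank_5` and `flank_3`, and the reduction of DNA structure (A-tract conformation, helix unwinding, adenine mobility) to a plain string match are hypothetical simplifications, not the authors' method. It reports positions where a pyrimidine-purine junction sits between a run of T's on its 5' side and A's on its 3' side on the strand given, roughly modelling a 5'-TTTT/AA-3'-like site, and rejects purine-pyrimidine junctions.

```python
# Hypothetical, simplified model of L1 EN junction preferences (sequence only).
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def candidate_nick_sites(seq: str, flank_5: int = 3, flank_3: int = 1) -> list[int]:
    """Return indices i where a nick between seq[i] and seq[i + 1] fits the rule.

    Assumed rule (illustrative thresholds, not measured values):
      * seq[i] is a pyrimidine and seq[i + 1] is a purine (YpR junction);
      * the flank_5 bases immediately 5' of seq[i] are all T;
      * the flank_3 bases immediately 3' of seq[i + 1] are all A.
    Only the strand passed in is scanned.
    """
    seq = seq.upper()
    sites = []
    # Restrict i so that both flanks fit inside the sequence.
    for i in range(flank_5, len(seq) - 1 - flank_3):
        left, right = seq[i], seq[i + 1]
        # Purine-pyrimidine (and other non-YpR) junctions are rejected.
        if left not in PYRIMIDINES or right not in PURINES:
            continue
        # T-run on the 5' side of the junction.
        if seq[i - flank_5:i] != "T" * flank_5:
            continue
        # A-run on the 3' side of the junction.
        if seq[i + 2:i + 2 + flank_3] != "A" * flank_3:
            continue
        sites.append(i)
    return sites


print(candidate_nick_sites("GCTTTTAAGC"))  # the TpA junction in TTTT/AA
```

Because the scanner only matches letters, it cannot represent the conformational effects the abstract emphasizes; a substituted junction such as `TTTCAA` is accepted (pyrimidine-purine), while `TTTTATAA`, whose only T-flanked TpA lacks the A on its 3' side, returns no site.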
Differentiation-related expression of endogenous retrovirus ERV-3 env in the normal human placental syncytiotrophoblast suggests a role in placental development. The choriocarcinoma cell line BeWo, a model of trophoblast differentiation, is maintained in an undifferentiated state and undergoes differentiation upon the addition of forskolin. The expression of ERV-3 env mRNA increased after 48 h of forskolin treatment, concurrently with increased intercellular fusion and production of human chorionic gonadotropin (beta-hCG) mRNA, a hormonal differentiation marker for trophoblast. Overexpression of ERV-3 env induced differentiation of BeWo characterized by decreased cell growth, differentiation-related morphologic changes, and induction of beta-hCG mRNA. These results support the first known role for the expression of an endogenous retrovirus in trophoblast differentiation.