Article (Literature Review)

Do transposable elements really contribute to proteomes?

Abstract

Recent studies indicate that the initial classification of transposable elements (TEs) as 'useless', 'selfish' or 'junk' pieces of DNA is not an accurate one. TEs seem to have complex regulatory functions and contribute to the coding regions of many genes. Because this contribution had been documented only at transcript level, we searched for evidence that would also support the translation of TE cassettes. Our findings suggest that the proportion of proteins with TE-encoded fragments (approximately 0.1%), although probably underestimated, is much less than what the data at transcript level suggest (approximately 4%). In all cases, the TE cassettes are derived from old TEs, consistent with the idea that incorporation (exaptation) of TE fragments into functional proteins requires long evolutionary periods. We therefore argue that functional proteins are unlikely to contain TE cassettes derived from young TEs, the role of which is probably limited to regulatory functions.

... Despite the prevalence of Alu exons in the human transcriptome, the contribution of Alu exons to the human proteome remains unclear and controversial [11-15]. In the 1990s and early 2000s, Makałowski and others carried out large-scale discoveries of Alu exons in human genes using cDNA sequences and expressed sequence tags (ESTs) [11,12,16]. ...
... In fact, in these studies there was little experimental evidence to support the existence of Alu-derived peptides in vitro or in vivo. In 2006, Makalowski and colleagues revisited this question by searching for transposable element-derived peptides in non-redundant protein entries in the Protein Data Bank (PDB) [13]. Since all proteins in PDB have solved 3D structures, Makalowski and colleagues reasoned that PDB provides a high-confidence collection of stable, functional proteins. ...
... They did not identify any Alu derived peptides in PDB protein entries. On the basis of this result, Makalowski and colleagues concluded that exons derived from Alu or other young repetitive elements do not have adequate evolutionary time to be incorporated into stable protein products, and the role of Alu exons should be almost entirely regulatory [13]. Since then, this has become the prevailing view on the contribution of Alu exons to the human proteome [3,14,15,17]. ...
Article
Background: Alu elements are major contributors to lineage-specific new exons in primate and human genomes. Recent studies indicate that some Alu exons have high transcript inclusion levels or tissue-specific splicing profiles, and may play important regulatory roles in modulating mRNA degradation or translational efficiency. However, the contribution of Alu exons to the human proteome remains unclear and controversial. The prevailing view is that exons derived from young repetitive elements, such as Alu elements, are restricted to regulatory functions and have not had adequate evolutionary time to be incorporated into stable, functional proteins. Results: We adopt a proteotranscriptomics approach to systematically assess the contribution of Alu exons to the human proteome. Using RNA sequencing, ribosome profiling, and proteomics data from human tissues and cell lines, we provide evidence for the translational activities of Alu exons and the presence of Alu exon-derived peptides in human proteins. These Alu exon peptides represent species-specific protein differences between primates and other mammals, and in certain instances between humans and closely related primates. In the case of the RNA editing enzyme ADARB1, which contains an Alu exon peptide in its catalytic domain, RNA sequencing analyses of A-to-I editing demonstrate that both the Alu exon skipping and inclusion isoforms encode active enzymes. The Alu exon-derived peptide may fine-tune the overall editing activity and, in limited cases, the site selectivity of ADARB1 protein products. Conclusions: Our data indicate that Alu elements have contributed to the acquisition of novel protein sequences during primate and human evolution. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-0876-5) contains supplementary material, which is available to authorized users.
... In most cases, TEs are deleterious to the host as a consequence of their insertion into coding regions that lead to alterations of gene expression (Makalowski, 2000). But some TEs have more favorable effects on the expression of host genes (Gotea et al., 2006). For example, PTPN1 is a 435 amino acid protein that belongs to the large family of protein tyrosine phosphatases (PTPs), which catalyze protein dephosphorylation. ...
... For example, PTPN1 is a 435 amino acid protein that belongs to the large family of protein tyrosine phosphatases (PTPs), which catalyze protein dephosphorylation. The L3 fragment found in the PTPN1 mRNA corresponds to part of this RT domain (Andersen et al., 2004; Gotea et al., 2006). The TEs-fusion event is one of the most important evolutionary mechanisms for the creation of new functions (Yi et al., 2006). ...
... However, our results showed that more than 100 dual coding genes were affected in protein motif regions by TEs (Fig. 2). Only 4.3 % of the exonized TE elements potentially contributed a new function to the proteome, consistent with previous reports (Gotea et al., 2006). A well known example is the new Alu exon of human gene RNA specific adenosine deaminase 2. This exon was derived from a primate specific Alu transposable element. ...
Article
A dual coding event, which is the translation of different isoforms from a single gene, is one of the special patterns among the alternative splicing events. This is an important mechanism for the regulation of protein diversity in human and mouse genomes. Although the regulation for dual coding events has been characterized in a few genes, the individual mechanism remains unclear. Numerous studies have described the exonization of transposable elements, which is the splicing mediated insertion of transposable element sequence fragments into mature mRNAs. Therefore, in this study, we investigated the number of transposable element (TE)-derived dual coding genes in human, chimpanzee and mouse genomes. TE fusion exons appeared in the dual coding regions of 309 human genes. Functional protein domain alterations by TE-derived dual coding events were observed in 129 human genes. Comparative TE-derived dual coding events were also analyzed in chimpanzee and mouse orthologs. Seventy chimpanzee orthologs had TE-derived dual coding events, but mouse orthologs did not have any TE-derived dual coding events. Taken together, our analyses listed the number of TE-derived dual coding genes which could be investigated by experimental analysis and suggested that TE-derived dual coding events were major sources for the functional diversity of human genes, but not mouse genes.
... Studies of this type include searches of expression databases (Smit 1999), studies of particular TE superfamilies (Robertson and Zumpano 1997; Sarkar et al. 2003; Gao and Voytas 2005; Muehlbauer et al. 2006; de Jesus et al. 2012), and studies of TE exonization (Nekrutenko and Li 2001; Britten 2006; Wu et al. 2007; Lockton and Gaut 2009; Donoghue et al. 2011). However, showing that a TE-like gene is expressed is insufficient to show that it is exapted because maladaptive or nonadaptive sequences, such as active TEs or TEs included in aberrant (nonfunctional) mRNA splice forms, may also be expressed (Gotea and Makalowski 2006; Keren et al. 2010). Furthermore, like certain conventional genes such as stress-activated genes (de Nadal et al. 2011), the expression of certain ETEs may be induced only rarely. ...
... TE insertions may be included as cassette exons in conventional gene transcripts (Britten 2006), fusions between TE and conventional genes may sometimes be functional (Cordaux et al. 2006), and it is possible that even recent insertions at least transiently provide weak beneficial functions. Nevertheless, without additional evidence, we must assume that most TE insertions, even if expressed, are likely nonfunctional (Gotea and Makalowski 2006); thus, unless they have strong evidence of exaptation (e.g., microsynteny), in this study we do not regard such sequences as ETEs. To differentiate between such TE insertions and bona fide ETEs, we measured various genetic attributes such as microsynteny, siRNA coverage, and repetitiveness at not just the whole-gene level, but also at the TE-specific domains (supplementary table S5, Supplementary Material online) and visually curated all predicted ETEs. ...
Article
Complex eukaryotes contain millions of transposable elements (TEs), comprising large fractions of their nuclear genomes. TEs consist of structural, regulatory, and coding sequences that are ordinarily associated with transposition, but that occasionally confer on the organism a selective advantage and may thereby become exapted. Exapted transposable element genes (ETEs) are known to play critical roles in diverse systems, from vertebrate adaptive immunity to plant development. Yet despite their evident importance, most ETEs have been identified fortuitously and few systematic searches have been conducted, suggesting that additional ETEs may await discovery. To explore this possibility, we develop a comprehensive systematic approach to searching for ETEs. We use TE-specific conserved domains to identify with high precision genes derived from TEs and screen them for signatures of exaptation based on their similarities to reference sets of known ETEs, conventional (non-TE) genes, and TE genes across diverse genetic attributes including repetitiveness, conservation of genomic location and sequence, and levels of expression and repressive small RNAs. Applying this approach in the model plant Arabidopsis thaliana, we discover a surprisingly large number of novel high confidence ETEs. Intriguingly, unlike known plant ETEs, several of the novel ETE families form tandemly arrayed gene clusters, while others are relatively young. Our results not only identify novel TE-derived genes that may have practical applications, but also challenge the notion that TE exaptation is merely a relic of ancient life, instead suggesting that it may continue to fundamentally drive evolution. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
... TEs can both remove DNA from the genome by generating target site deletions and add DNA through 3′ and, less frequently, through 5′ transduction (14,15). TEs contribute to protein-coding regions both at the transcript and at the protein level (16-18) and TE-encoded proteins have been domesticated and are part of host genes (8). Additionally, ectopic recombination between TEs causes deletions, duplications, and sequence rearrangements (Fig. 2). ...
... TEs play a role in the generation of new coding sequences either by being domesticated as components of host transcripts (8,16-18,137-139) or by inducing duplication of host genes (14,140). TEs can affect gene expression in several ways and some of these changes might be adaptive. ...
Article
Recent research is starting to shed light on the factors that influence the population and evolutionary dynamics of transposable elements (TEs) and TE life cycles. Genomes differ sharply in the number of TE copies, in the level of TE activity, in the diversity of TE families and types, and in the proportion of old and young TEs. In this chapter, we focus on two well-studied genomes with strikingly different architectures, humans and Drosophila, which represent two extremes in terms of TE diversity and population dynamics. We argue that some of the answers might lie in (1) the larger population size and consequently more effective selection against new TE insertions due to ectopic recombination in flies compared to humans; and (2) in the faster rate of DNA loss in flies compared to humans leading to much faster removal of fixed TE copies from the fly genome.
... Alu elements have several sequence motifs resembling consensus splice sites in both sense and antisense orientations (Gotea and Makalowski, 2006), and the insertion of Alu elements into intronic regions may introduce new exons into existing functional genes. It has now become the current opinion that the exonization of Alu elements plays a crucial role in the birth of new exons in primate genomes (Corvelo and Eyras, 2008; Lin et al., 2008). ...
... In all cases, the RE cassettes are most frequently derived from older REs, in line with the hypothesis that incorporation of TE fragments into functional proteins requires long evolutionary periods. The role of evolutionary recent REs is probably limited to regulatory functions (Gotea and Makalowski, 2006). ...
Article
Repetitive sequences occupy a huge fraction of essentially every eukaryotic genome. Repetitive sequences cover more than 50% of mammalian genomic DNAs, whereas gene exons and protein-coding sequences occupy only ~3% and 1%, respectively. Numerous genomic repeats include genes themselves. They generally encode "selfish" proteins necessary for the proliferation of transposable elements (TEs) in the host genome. The major part of evolutionary "older" TEs accumulated mutations over time and fails to encode functional proteins. However, repeats have important functions also on the RNA level. Repetitive transcripts may serve as multifunctional RNAs by participating in the antisense regulation of gene activity and by competing with the host-encoded transcripts for cellular factors. In addition, genomic repeats include regulatory sequences like promoters, enhancers, splice sites, polyadenylation signals, and insulators, which actively reshape cellular transcriptomes. TE expression is tightly controlled by the host cells, and some mechanisms of this regulation were recently decoded. Finally, capacity of TEs to proliferate in the host genome led to the development of multiple biotechnological applications.
... Thus, analyzing other body parts and increasing the number of genomes analyzed will likely identify more chimeric gene-TE transcripts. Second, although our estimate is based on the highly accurate annotations of TE insertions performed using the REPET pipeline (Rech et al. 2022), highly diverged and fragmented TE insertions are difficult to be accurately annotated by any pipeline and, as such, might go undetected (Gotea and Makałowski 2006;Rodriguez and Makałowski 2022). Still, the combination of an accurate annotation of chimeric gene-TE transcripts, with expression data across body parts, and of the investigation of the protein domain acquisition performed in this work not only significantly advances our knowledge on the role of TEs in gene expression and protein novelty but also provides a rich resource for the follow-up analysis of gene-TE chimeras. ...
Article
Transcriptomes are dynamic, with cells, tissues, and body parts expressing particular sets of transcripts. Transposable elements (TEs) are a known source of transcriptome diversity; however, studies often focus on a particular type of chimeric transcript, analyze single body parts or cell types, or are based on incomplete TE annotations from a single reference genome. In this work, we have implemented a method based on de novo transcriptome assembly that minimizes the potential sources of errors while identifying a comprehensive set of gene-TE chimeras. We applied this method to the head, gut, and ovary dissected from five Drosophila melanogaster natural strains, with individual reference genomes available. We found that ∼19% of body part–specific transcripts are gene–TE chimeras. Overall, chimeric transcripts contribute a mean of 43% to the total gene expression, and they provide protein domains for DNA binding, catalytic activity, and DNA polymerase activity. Our comprehensive data set is a rich resource for follow-up analysis. Moreover, because TEs are present in virtually all species sequenced to date, their role in spatially restricted transcript expression is likely not exclusive to the species analyzed in this work.
... In a process called exonization, TEs can integrate into genomic regions and offer recognition by the splicing machinery as a newly recruited exon [100]. Approximately 4% of human genes contain TE motifs in their coding regions, indicating that exons may have been derived from the exonization of TEs [101-106]. Some studies have identified that exonized LINEs in the human genome provide an additional domain and produce abnormal transcripts through diverse alternative splicing mechanisms in cancers. ...
Article
Alternative splicing of messenger RNA (mRNA) precursors contributes to genetic diversity by generating structurally and functionally distinct transcripts. In a disease state, alternative splicing promotes incidence and development of several cancer types through regulation of cancer-related biological processes. Transposable elements (TEs), having the genetic ability to jump to other regions of the genome, can bring about alternative splicing events in cancer. TEs can integrate into the genome, mostly in the intronic regions, and induce cancer-specific alternative splicing by adjusting various mechanisms, such as exonization, providing splicing donor/acceptor sites, alternative regulatory sequences or stop codons, and driving exon disruption or epigenetic regulation. Moreover, TEs can produce microRNAs (miRNAs) that control the proportion of transcripts by repressing translation or stimulating the degradation of transcripts at the post-transcriptional level. Notably, TE insertion creates a cancer-friendly environment by controlling the overall process of gene expression before and after transcription in cancer cells. This review emphasizes the correlative interaction between alternative splicing by TE integration and cancer-associated biological processes, suggesting a macroscopic mechanism controlling alternative splicing by TE insertion in cancer.
... In the overwhelming majority of cases, TE insertions affecting protein-coding regions are deleterious, since the most common results of this are frameshift or introduction of a premature stop codon. TE-encoded sequences were found only in 0.1% of functional proteins in the Protein Data Bank [249]. ...
Article
Transposable elements (TEs) have been extensively studied for decades. In recent years, the introduction of whole-genome and whole-transcriptome approaches, as well as single-cell resolution techniques, provided a breakthrough that uncovered TE involvement in host gene expression regulation underlying multiple normal and pathological processes. Of particular interest is increased TE activity in neuronal tissue, and specifically in the hippocampus, that was repeatedly demonstrated in multiple experiments. On the other hand, numerous neuropathologies are associated with TE dysregulation. Here, we provide a comprehensive review of literature about the role of TEs in neurons published over the last three decades. The first chapter of the present review describes known mechanisms of TE interaction with host genomes in general, with the focus on mammalian and human TEs; the second chapter provides examples of TE exaptation in normal neuronal tissue, including TE involvement in neuronal differentiation and plasticity; and the last chapter lists TE-related neuropathologies. We sought to provide specific molecular mechanisms of TE involvement in neuron-specific processes whenever possible; however, in many cases, only phenomenological reports were available. This underscores the importance of further studies in this area.
... These investigations have elucidated how the completeness and quality of assembled genomes and the annotation methods have a significant effect on the estimation of TE diversity and abundance. When heterochromatic regions were added, and de novo TE annotation procedures were implemented, the TE content of the D. melanogaster genome increased from 2% in early studies of euchromatic domains to 15% [24,25]. The genetic distance computation revealed that most of the TE copies originated from the recent transposition events [26,27]. ...
Article
Citation: Haq, I.U.; Muhammad, M.; Yuan, H.; Ali, S.; Abbasi, A.; Asad, M.; Ashraf, H.J.; Khurshid, A.; Zhang, K.; Zhang, Q.; et al. Satellitome Analysis and Transposable Elements Comparison in Geographically Distant Populations of Spodoptera frugiperda. Life 2022, 12, 521. https://doi.org/10.3390/life12040521
... Transposable elements (TEs), including retrotransposons and DNA transposons, occupy a significant portion of eukaryotic genomes [1]. Although long considered "junk DNA" [2], TEs are now widely accepted as catalysts of genetic innovations by directly contributing to regulatory or coding sequences [3,4] and mediating sequence changes such as duplications or deletions [5,6]. The mechanism responsible for the generation of duplicates affects their evolutionary trajectories [7,8]. ...
Article
Despite long being considered as “junk”, transposable elements (TEs) are now accepted as catalysts of evolution. One example is Mutator -like elements (MULEs, one type of terminal inverted repeat DNA TEs, or TIR TEs) capturing sequences as Pack-MULEs in plants. However, their origination mechanism remains perplexing, and whether TIR TEs mediate duplication in animals is almost unexplored. Here we identify 370 Pack-TIRs in 100 animal reference genomes and one Pack-TIR ( Ssk-FB4 ) family in fly populations. We find that single-copy Pack-TIRs are mostly generated via transposition-independent gap filling, and multicopy Pack-TIRs are likely generated by transposition after replication fork switching. We show that a proportion of Pack-TIRs are transcribed and often form chimeras with hosts. We also find that Ssk-FB4s represent a young protein family, as supported by proteomics and signatures of positive selection. Thus, TIR TEs catalyze new gene structures and new genes in animals via both transposition-independent and -dependent mechanisms.
... Since CpG islands correspond to open and actively transcribed chromatin regions, these promoters could be targeted by TE insertions and would provide them with a permissive transcriptional context for their expression, favoring the TE recruitment by the host as new transcribed sequences. TE domestication might also be facilitated by an insertion close to a promoter, or when the insertion results in a fusion with a host gene, with the TE possibly benefiting from the regulatory elements of the linked host gene if this gene is expressed in the germ line [64,283,284]. Fifth, if a novel TE is acquired by horizontal transfer, it will transiently escape the repression mechanisms of the host, bringing new evolutionary potentialities and recruitment opportunities. ...
Article
Transposable elements (TEs) are major components of all vertebrate genomes that can cause deleterious insertions and genomic instability. However, depending on the specific genomic context of their insertion site, TE sequences can sometimes get positively selected, leading to what are called “exaptation” events. TE sequence exaptation constitutes an important source of novelties for gene, genome and organism evolution, giving rise to new regulatory sequences, protein-coding exons/genes and non-coding RNAs, which can play various roles beneficial to the host. In this review, we focus on the development of vertebrates, which present many derived traits such as bones, adaptive immunity and a complex brain. We illustrate how TE-derived sequences have given rise to developmental innovations in vertebrates and how they thereby contributed to the evolutionary success of this lineage.
... Although the exact contribution of TEs to the proteome has been discussed, some authors have suggested that thousands of proteins contain sequences resulting from TE exonization in vertebrate genomes including humans (204) (205,206 (54,207). Most retrocopies are only processed pseudogenes and lack their parental gene features, such as introns or promoter (50,51). ...
Thesis
Retrotransposons are mobile genetics elements, which form almost half of our genome. Only the L1HS subfamily of the Long Interspersed Element-1 class (LINE-1 or L1) has retained the ability to jump autonomously in humans. Their mobilization in the germline – but also in some somatic tissues – contributes to human genetic diversity and to diseases, such as cancer. L1 reactivation can be directly mutagenic by disrupting genes or regulatory sequences. In addition, L1 sequences themselves contain many regulatory cis-elements. Thus, L1 insertions near a gene or within intronic sequences can also produce more subtle genic alterations. To explore L1-mediated genic alterations in a genome-wide manner, we have developed a dedicated RNA-seq analysis software able to identify L1 chimeric or antisense transcripts and to annotate these novel isoforms with their associated alternative splicing events. During the course of this work, it appeared that understanding the link between L1HS insertion polymorphisms and phenotype or disease requires a comprehensive view of the different L1HS copies present in a given individual or sample. To provide a comprehensive summary of L1HS insertion polymorphisms identified in healthy or pathological human samples and published in peer-reviewed journals, we developed euL1db, the European database of L1HS retrotransposon insertions in humans. This work will help understanding the overall impact of L1 insertions on gene expression, at a genome-wide scale.
... (Ghosh et al., 2001; Vinckenbosch et al., 2006). (Gotea and Makalowski, 2006). By contrast, the exonization of recent elements such as Alu or L1 appears to have a negative impact on the function of the proteins into which they are inserted when they are constitutively expressed. ...
Thesis
The advent of high-throughput exome sequencing (HT-ES) in diagnostics and research in recent years has led to the identification of the genetic bases of many Mendelian disorders, resolving many cases of diagnostic odyssey. Nevertheless, analysis of HT-ES data identifies pathogenic or likely pathogenic variants in only 30 to 45% of undiagnosed cases. Indeed, limitations exist at the clinical, molecular, and bioinformatic levels. The constant evolution of clinical knowledge, of the number of new genes implicated in human disease, and of clinical-biological correlations has a major impact on data analysis, leading to a gradual improvement in diagnostic investigation. Technical limitations inherent to the technology exist, in particular uncovered regions, but these have also been significantly reduced in recent years. Finally, beyond the analysis of SNVs and CNVs, other genetic anomalies can be responsible for rare diseases, requiring bioinformatic development to optimize results. Although high-throughput genome sequencing can resolve some cases, in particular those involving variants in non-coding regions or structural variants, much information remains to be extracted and exploited from HT-ES data. The objective of this thesis was therefore to contribute to improving bioinformatic approaches for analyzing HT-ES data in order to identify new genes or molecular mechanisms involved in rare genetic diseases and thereby reduce the diagnostic odyssey of patients. Several strategies were implemented. The first strategy consisted of a research reanalysis of data from 80 patients who had undergone HT-ES at the CERBA laboratory (CIFRE thesis) and whose diagnostic reading was negative. It led to the identification of two new candidate genes in syndromic intellectual disability, including the OTUD7A gene (article 1). The second strategy consisted of developing a bioinformatic pipeline to extract mitochondrial genome data from HT-ES data. Mitochondrial DNA is not targeted by exome capture kits but can be extracted from indirectly captured data, making its analysis possible from pre-existing HT-ES data. From the GAD collection of exomes from undiagnosed patients, two causal variants were identified in two individuals with neurodevelopmental disorders out of 928 people studied, thereby resolving the diagnostic odyssey in 0.2% of undiagnosed patients (article 2). The third strategy consisted of setting up a bioinformatic pipeline to identify mobile elements in exome data, given that about 0.3% of pathogenic variants in the human genome are expected to originate from the de novo insertion of a mobile element. From the GAD collection of exomes from 3322 undiagnosed patients, this step identified two cases linked to the insertion of an Alu element within an exon of the FERMT1 gene and of the GRIN2B gene (article 3, in preparation). This thesis pushed back some of the limits of exome technology. Other avenues exist and are being explored by the team in connection with the European Solve-RD project.
... According to Schaack (2010 [63]), this mechanism is a common vehicle for gene transfer between prokaryotic species, and although TEs have never been shown to transfer genes between eukaryotic species, they are able to capture and transport sequences at high frequency within a single species. ... to an invasion and colonization in eukaryotes play a relevant role in the origin of new species, so that they have been considered key agents of evolution [44,64-69]. Figure 1.5, taken from [63], shows a diagram of some evolutionary effects currently attributed to TE activity. ...
Thesis
Mosquito-borne diseases carry a heavy cost in lives and in socioeconomic terms, especially in developing tropical countries, and are therefore a priority for the World Health Organization. New proposals for controlling these diseases include genetic modification of the vectors; for this, besides identifying genes for resistance to the pathogen and inserting them into the mosquito, efficient methods are needed to spread and fix such transgenes in natural populations. The use of transposable elements (TEs) has been proposed as a propagation mechanism because of their selfish characteristics and their capacity to invade entire populations. In this thesis we examine mathematical models through simulations of the ecological and evolutionary phenomena involved in the invasion of a wild mosquito population by a TE family carrying a resistance gene that confers refractoriness to the pathogen of a transmissible disease. We elaborate on the premises of a recent study that adapted phylodynamic techniques used with virus sequences to TE sequences, and on how this analogy can be used to estimate the invasion time of TEs in a mosquito population. An individual-based simulator was developed from scratch that can represent three levels of biological organization: the mosquito population, the numbers and loci of the elements, and the individual sequences that mutate over the generations. We explore both the influence of the fitness cost of TEs and the influence of different population dynamics on the total numbers of elements per individual and in the population, taking as a basis a TE family that expands according to a master gene model. We observe that topologies reconstructed from a family with this characteristic exhibit the pectinate structures predicted in the theoretical literature and that, in simple cases, the time between transposition events can be observed graphically in the tree. We also show the conflict between the transposition rate, the loss of active TEs, and the individual impact on host fitness, and how these quantities must be considered jointly in future studies. We show that, by performing phylodynamics with TE sequences, it is possible to observe the influence of host demography on the estimate of the TE population.
... However, with time transposable element sequences can also add to the functionality of genomic features through a process of co-option in which the transposable element sequence, or part of it, is recruited to perform some function. The incorporation of transposable elements (exaptation) has been shown to contribute to the evolution of regulatory motifs (16), promoters (17) and lncRNA (18) among others, and transposable elements have been co-opted into ancient protein-coding genes, either in their main isoform (19)(20)(21) or as alternative splice variants (22). ...
Article
Transposable elements colonize genomes and with time may end up being incorporated into functional regions. SINE Alu elements, which appeared in the primate lineage, are ubiquitous in the human genome and more than a thousand overlap annotated coding exons. Although almost all Alu-derived coding exons appear to be in alternative transcripts, they have been incorporated into the main coding transcript in at least 11 genes. The extent to which Alu regions are incorporated into functional proteins is unclear, but we detected reliable peptide evidence to support the translation to protein of 33 Alu-derived exons. All but one of the Alu elements for which we detected peptides were frame-preserving and there was proportionally seven times more peptide evidence for Alu elements than for other primate exons. Despite this strong evidence for translation to protein, we found no evidence of selection, either from cross-species alignments or human population variation data, among these Alu-derived exons. Overall, our results confirm that SINE Alu elements have contributed to the expansion of the human proteome, and this contribution appears to be stronger than might be expected over such a relatively short evolutionary timeframe. Despite this, the biological relevance of these modifications remains open to question.
... This could happen due to (i) the possible deleterious effect of TE insertion on encoded proteins [29] and (ii) disruption of cellular processes through chromosome nicking by proteins containing TE fragments [29]. To control the deleterious effects of TE insertions, the host possesses several mechanisms by which TEs can be eliminated before translation [30]. Thus, from our analysis, it is clear that the exaptation phenomenon probably occurred in the analysed PDC gene. ...
Article
Pongamia pinnata (also called Millettia pinnata), a non-edible oil-yielding tree, is well known for its multipurpose benefits and is a potential source for medicine and biodiesel preparation. With the increasing demand for its cultivation, an understanding of genetic diversity is important for further breeding and cultivation programmes. Transposable elements (TEs) are a major component of plant genomes, but their evolutionary significance in Pongamia remains unexplored. To understand the role of TEs in genome diversity, Pongamia unigenes were screened for the presence of TE cassettes. Our analysis showed the presence of all categories of TE cassettes in unigenes, with a major contribution of long terminal repeat retrotransposons towards unigene diversity. Interestingly, the insertion of some TEs was also observed in both organellar genomes. The study of TE insertions in coding sequences is of great interest, as they may be responsible for protein diversity, thereby influencing the phenotype. The present investigation confirms the exaptation phenomenon in the pyruvate decarboxylase (PDC) gene, where an entire exon sequence was derived from a Ty3-gypsy-like retrotransposon. The study of the PDC protein revealed the translation of the gypsy element into protein. Furthermore, the phylogenetic study confirmed the diversity in the PDC gene due to insertion of the gypsy element: the PDC genes with and without the gypsy insertion clustered separately.
... Selective forces can explain why some elements are more likely to be retained in certain genomic locations than others [22,23]. For instance, de novo insertions of the human LINE 1 (L1) retrotransposon readily occur within (and disrupt) gene exons [24], but very few if any L1 elements have been fixed within the coding region of human genes [25]. Similarly, no LTR retrotransposon is known to exhibit insertion preference with regard to which DNA strand is transcribed, and yet these elements are strongly depleted in the sense orientation within human introns, most likely due to their propensity to interfere with gene splicing and polyadenylation when inserted in sense orientation [11,26]. ...
Article
Transposable elements (TEs) are major components of eukaryotic genomes. However, the extent of their impact on genome evolution, function, and disease remains a matter of intense interrogation. The rise of genomics and large-scale functional assays has shed new light on the multi-faceted activities of TEs and implies that they should no longer be marginalized. Here, we introduce the fundamental properties of TEs and their complex interactions with their cellular environment, which are crucial to understanding their impact and manifold consequences for organismal biology. While we draw examples primarily from mammalian systems, the core concepts outlined here are relevant to a broad range of organisms.
... The authors hypothesized that ME could be exploited as ready-to-use motifs in evolutionary experiments. In contrast to previously reported results, Gotea and Makalowski [33] analyzed proteins of the Protein Databank (PDB, http://www.rcsb.org/) and showed that of 3764 proteins annotated at that moment, only 3 (0.1%) had protein fragments originating from ME. ...
Article
Recent studies revealed that about 80% of the eukaryotic genome is biochemically active; it produces not solely mRNA but also a large number of noncoding RNAs (ncRNAs). Thus, a large fraction of the “ribonome” (the total cellular complement of RNAs and their regulatory factors) of the cell consists of a variety of ncRNAs, while mRNAs occupy only a small part of it. It is well known that long noncoding RNAs (lncRNAs) are involved in the regulation of protein-coding gene expression by altering the chromatin structure, transcription regulation, and pre-mRNA splicing. MicroRNAs and small interfering RNAs trigger the RNA interference mechanism involved in the transcriptional and post-transcriptional regulation of gene expression. However, our knowledge of the role of the noncoding part of the genome in proteome diversification and plasticity is scarce. In this mini-review, we discuss new data obtained over the past few years, which change our view of the role of the noncoding part of the genome in the formation of the cell proteome.
... We call exapted those TE fragments that became a part of a coding sequence (CDS) but do not code for a protein domain attributed to their original function. In humans Alu sequences are major donators of exons, but exons acquired from other elements, such as LINEs, endogenous retroviruses, and DNA transposons, have also been reported [5]. Examples of the exaptation of an endogenous retrovirus envelope (env) gene are the primate genes Syncytin-1 and Syncytin-2, which might be involved in the formation of the placenta [6]. ...
Article
Transposable elements, often considered to be not important for survival, significantly contribute to the evolution of transcriptomes, promoters, and proteomes. Reverse transcriptase, encoded by some transposable elements, can be used in trans to produce a DNA copy of any RNA molecule in the cell. The retrotransposition of protein-coding genes requires the presence of reverse transcriptase, which could be delivered by either non-long terminal repeat (non-LTR) or LTR transposons. The majority of these copies are in a state of “relaxed” selection and remain “dormant” because they are lacking regulatory regions; however, many become functional. In the course of evolution, they may undergo subfunctionalization, neofunctionalization, or replace their progenitors. Functional retrocopies (retrogenes) can encode proteins, novel or similar to those encoded by their progenitors, can be used as alternative exons or create chimeric transcripts, and can also be involved in transcriptional interference and participate in the epigenetic regulation of parental gene expression. They can also act in trans as natural antisense transcripts, microRNA (miRNA) sponges, or a source of various small RNAs. Moreover, many retrocopies of protein-coding genes are linked to human diseases, especially various types of cancer.
... In humans, Alu sequences are major donors of exons, but exons derived from other SINEs, LINEs, endogenous retroviruses and DNA transposons have also been described. It is generally believed that TE cassettes do not significantly contribute to the human proteome (Gotea and Makalowski, 2006). However, a recent proteomic screen demonstrated that many Alu cassettes are indeed part of active human proteins (Lin et al., 2016). ...
Chapter
More than half of the human genome originated in transposable elements (TEs). Although these segments are mostly located in intronic and intergenic regions, some of them can be found in protein‐coding exons. Moreover, some functionally important genes evolved from TEs. These genes are involved in major biological processes such as immunity, replication, reproduction, cell proliferation and apoptosis. In addition, TEs contribute to the human proteome indirectly by retrocopying messenger ribonucleic acid (mRNA) molecules back to the genome and creating new variants of the existing genes, which in turn can evolve a new function or new expression pattern. This demonstrates the importance of TEs as a genomic pool of coding sequences for the creation and evolution of gene functions. Key Concepts: More than half of the human genome originated in transposable elements, and they have profound consequences for genome evolution. Transposable elements contribute to the human proteome either directly, by co‐option of TE‐originated sequences, or indirectly, by using the TEs' molecular machinery to duplicate existing genetic material. Retrogenes are byproducts of L1 element activity. Human genes can be shuffled by a process called genome transduction, which involves "leaking" transcription of a transposon. Transposons moving around the genome can alter the expression profile of host genes.
... Compared with mature IGF-1 relatively little is known about the mechanism of action of the different E peptides [1,[10][11][12]. From an evolutionary point of view the unchanged persistence over long evolutionary periods of MIRb-derived IGF-1 exon 5 implies its functional relevance [52,62]. Moreover, E-peptides are protein-coding regions in which synonymous mutation rates are extremely low compared to IGF-1 core (Table 1), indicating additional sequence constraints beyond those dictated by the structure and function of the proteins. ...
Article
Insulin-like growth factor-1 (IGF-1) is a pleiotropic hormone exerting mitogenic and anti-apoptotic effects. Inclusion or exclusion of exon 5 in the IGF-1 mRNA gives rise to three transcripts, IGF-1Ea, IGF-1Eb and IGF-1Ec, which yield three different C-terminal extensions called the Ea, Eb and Ec peptides. The biological significance of the IGF-1 splice variants and how the E-peptides affect the actions of mature IGF-1 are largely unknown. In this study we investigated the origin and conservation of the IGF-1 E-peptides and compared the pattern of expression of the IGF-1 isoforms in vivo, in nine mammalian species, and in vitro, using human and mouse IGF-1 minigenes. Our analysis showed that only IGF-1Ea is conserved among all vertebrates, whereas IGF-1Eb and IGF-1Ec are an evolutionary novelty that originated from the exonization of a mammalian interspersed repetitive-b (MIR-b) element. Both IGF-1Eb and IGF-1Ec mRNAs were constitutively expressed in all mammalian species analyzed, but their expression ratio varies greatly among species. Using IGF-1 minigenes we demonstrated that divergence in cis-acting regulatory elements between human and mouse conferred species-specific features on the exon 5 region. Finally, the protein-coding sequences of exon 5 showed a low rate of synonymous mutations and contain disorder-promoting amino acids, suggesting a regulatory role for these domains. In conclusion, exonization of a MIR-b element in the IGF-1 gene determined the gain of exon 5 during mammalian evolution. Alternative splicing of this novel exon added new regulatory elements at the mRNA and protein level potentially able to regulate mature IGF-1 across tissues and species.
... This can be explained by specific sequences within the Alu fragments that promote exonization [43][44][45]. Approximately 5% of all alternatively spliced internal exons (exons flanked by introns at both sides) are derived from Alu elements [36, 46]. Alus have a great potential to become exonized because they can harbor up to ten potential 5′ splice sites and 13 potential 3′ splice sites. ...
Article
The human genome is under constant invasion by retrotransposable elements. The most successful of these are the Alu elements; with a copy number of over a million, they occupy about 10 % of the entire genome. Interestingly, the vast majority of these Alu insertions are located in gene-rich regions, and one-third of all human genes contains an Alu insertion. Alu sequences are often embedded in gene sequence encoding pre-mRNAs and mature mRNAs, usually as part of their intron or UTRs. Once transcribed, they can regulate gene expression as well as increase the number of RNA isoforms expressed in a tissue or a species. They also regulate the function of other RNAs, like microRNAs, circular RNAs, and potentially long non-coding RNAs. Mechanistically, Alu elements exert their effects by influencing diverse processes, such as RNA editing, exonization, and RNA processing. In so doing, they have undoubtedly had a profound effect on human evolution.
... Transposable elements thus contribute to the transcriptome, but can their trace be found in the proteome? A first study carried out on the Protein Data Bank 16 shows that only 3 of the 3,000 entries that the database contained at the time include transposable element sequences (Gotea & Makałowski, 2006). Another study confirms the alternative splicing of 88% of exons carrying transposable elements (M. ...
Article
Genomes are dynamic structures, varying in size and composition, in which retrotransposons play a major role. In this context, our work aims at: 1) clarifying the phylogenetic relationships within genus Lupinus (Fabaceae) using additional nuclear markers (rRNA-ETS and SymRK), 2) assessing the diversity, abundance and role of the retrotransposons Ty1/copia and Ty3/gypsy in Lupinus genome size variation by amplification and in situ hybridization, and 3) sequencing, annotating and comparing a first genomic region available for lupine with homologous regions in other Fabaceae. The obtained phylogenetic framework improves our understanding of the evolutionary history of lupines, and when combined with the exploration of retrotransposons, highlights lineage-specific patterns of genome size variation. Copia and gypsy elements appear to contribute more significantly to genome size differences in Mediterranean lupines than in African lupines, suggesting different mechanisms involved in the genus. This was confirmed at the local scale (SymRK gene region) where these retroelements represent 25% of the analyzed region in Lupinus angustifolius.
... Authors differ widely in their assessment of the number of functional retrogenes (Gotea and Makalowski 2006, Suyama et al. 2006, Gray et al. 2006). Following the first reports of functional retrogenes in the late eighties (McCarrey and Thomas 1987, Dahl et al. 1990, Boer et al. 1987), it was thought that these were isolated cases, as mRNA retrocopies lack their own promoter elements. ...
... Most new TE exons appear in 5- or 3-prime UTRs and may play regulatory roles (Sela et al., 2007; Shen et al., 2011). In contrast, TE exonization in protein-coding regions generally produces short, lineage-specific exons that appear as rare splice variants, questioning their biological relevance and suggesting that most probably do not translate into proteins (Gotea and Makałowski, 2006; Lin et al., 2008; Pavlíček et al., 2002; Piriyapongsa et al., 2007). ...
Article
Transposable elements constitute a large fraction of vertebrate genomes and, during evolution, may be co-opted for new functions. Exonization of transposable elements inserted within or close to host genes is one possible way to generate new genes, and alternative splicing of the new exons may represent an intermediate step in this process. The genes TMPO and ZNF451 are present in all vertebrate lineages. Although they are not evolutionarily related, mammalian TMPO and ZNF451 do have something in common: they both code for splice isoforms that contain LAP2alpha domains. We found that these LAP2alpha domains have sequence similarity to repetitive sequences in non-mammalian genomes, which are in turn related to the first ORF from a DIRS1-like retrotransposon. This retrotransposon domestication happened separately and resulted in proteins that combine retrotransposon and host protein domains. The alternative splicing of the retrotransposed sequence allowed the production of both the new and the untouched original isoforms, which may have contributed to the success of the colonization process. The LAP2alpha-specific isoform of TMPO (LAP2α) has been co-opted for important roles in the cell, while the ZNF451 LAP2alpha isoform is evolving under strong purifying selection but remains uncharacterized. © The Author(s) 2015. Published by Oxford University Press.
... Long terminal repeat (LTR) retrotransposons are most common in plant genomes while animal genomes, including the human genome, are often flooded by non-LTR retrotransposons. Most of the human genome is transcribed and TEs therefore greatly contribute to cellular transcriptome and proteome [1,2]. Recent insertions of TEs underlie the variability of human populations and can cause several human diseases [3,4]. ...
Article
Transposable elements form a significant proportion of eukaryotic genomes. Recently, Lexa et al. (Nucleic Acids Res 42:968-978, 2014) reported that plant long terminal repeat (LTR) retrotransposons often contain potential quadruplex sequences (PQSs) in their LTRs and experimentally confirmed their ability to adopt four-stranded DNA conformations. Here, we searched for PQSs in human retrotransposons and found that PQSs are specifically localized in the 3'-UTR of LINE-1 elements and in LTRs of HERV elements, and are strongly accumulated in specific regions of SVA elements. Circular dichroism spectroscopy confirmed that most PQSs had adopted monomolecular or bimolecular guanine quadruplex structures. Evolutionarily young SVA elements contained more PQSs than older elements and their propensity to form quadruplex DNA was higher. Full-length L1 elements contained more PQSs than truncated elements; the highest proportion of PQSs was found inside transpositionally active L1 elements (PA2 and HS families). Conservation of quadruplexes at specific positions of transposable elements implies their importance in their life cycle. The increasing quadruplex presence in evolutionarily young LINE-1 and SVA families makes these elements important contributors toward present genome-wide quadruplex distribution. http://www.biomedcentral.com/1471-2164/15/1032/abstract
... First, Alu provides a site of alternative splicing when inserted in introns (Sorek et al., 2002;Lev-Maor et al., 2003) (Fig. 5A), changing the splicing pattern of host genes. Thus, the presence of > 500,000 intronic Alu copies in the human genome implies that Alu has a significant impact on the evolution of the human proteome (Gotea and Makalowski, 2006). Second, some Alu and B1 copies contain binding sites for transcription factors such as SP1, p53, NFκB, retinoic acids receptors (RARs), and aryl hydrocarbon receptor (AhR) (Vansant and Reynolds, 1995;Piedrafita et al., 1996;Oei et al., 2004;Polak and Domany, 2006;Apostolou and Thanos, 2008;Roman et al., 2008;Zemojtel et al., 2009) (Fig. 5B), suggesting that Alu and B1 retrotransposition has shaped the regulatory networks in their hosts (Feschotte, 2008). ...
Article
Short interspersed elements (SINEs) are a class of retrotransposons, which amplify their copy numbers in their host genomes by retrotransposition. More than a million copies of SINEs are present in a mammalian genome, constituting over 10% of the total genomic sequence. In contrast to the other two classes of retrotransposons, long interspersed elements (LINEs) and long terminal repeat (LTR) elements, SINEs are transcribed by RNA polymerase III. However, like that of LINEs and LTR elements, SINE transcription is likely regulated by epigenetic mechanisms such as DNA methylation, at least for human Alu and mouse B1. Whereas SINEs and other transposable elements have long been thought of as selfish or junk DNA, recent studies have revealed that they play functional roles at their genomic locations, for example, as distal enhancers, chromatin boundaries and binding sites of many transcription factors. These activities imply that SINE retrotransposition has shaped the regulatory network and chromatin landscape of their hosts. Whereas it is thought that the epigenetic mechanisms originated as a host defense system against the proliferation of parasitic elements, this review discusses the possibility that the same mechanisms are also used to regulate the SINE-derived functions.
... For instance, thousands of ordinary human gene transcripts may contain TE-derived sequences (Nekrutenko and Li 2001; Britten 2006; Sela et al. 2007). However, most exonized TEs are expressed only as rare splice variants and are probably not translated into functional peptides (Gotea and Makalowski 2006; Lin et al. 2008a). Indeed, the vast majority is derived not from TE coding sequences, but from nonautonomous elements. ...
Chapter
While evolution is often understood exclusively in terms of adaptation, innovation often begins when a feature adapted for one function is co-opted for a different purpose, such as when feathers originally adapted for insulation became used for flight. Co-opted features are called exaptations. Transposable elements are often viewed as molecular parasites, yet they are frequently the source of evolutionary innovation. One way in which transposable elements contribute to evolution is that their sequences can be co-opted to perform phenotypically beneficial functions. Transposable element gene exaptations have contributed to major innovations such as the vertebrate adaptive immune system and the mammalian placenta. They also often become transcription factors, and transposable element-derived transcription factor binding sites can form new regulatory networks. In this chapter, we review transposable element coding sequence exaptations in plants.
... Molecular domestication of transposases, integrases, reverse transcriptases, and envelope proteins has occurred repeatedly during the evolution of diverse major eukaryote lineages and, during neofunctionalization, some of the newly obtained functions have become essential for survival of the organism (Miller et al. 1999; Volff 2006). In the past decade, substantial evidence has accumulated for TEs being a dynamic reservoir for new cellular functions (Nekrutenko and Li 2001; Mariño-Ramírez et al. 2005; Medstrand et al. 2005; Britten 2006; Gotea and Makalowski 2006; Thornburg et al. 2006). Although the functions of the majority of DGs are still unknown (Campillos et al. 2006; Volff 2006; Feschotte and Pritham 2007; Sinzelle et al. 2009), some may protect against infections, some are necessary for reproduction, whereas others enable the replication of chromosomes and the control of cell proliferation and apoptosis (Volff 2006). ...
Article
Molecular domestications of transposable elements have occurred repeatedly during the evolution of eukaryotes. Vertebrates, especially mammals, possess numerous single copy domesticated genes (DGs) that have originated from the intronless multicopy transposable elements. However, the origin and evolution of the retroelement-derived DGs (RDDGs) that originated from Metaviridae has been only partially elucidated, due to absence of genome data or to limited analysis of a single family of DGs. We traced the genesis and regulatory wiring of the Metaviridae-derived DGs through phylogenomic analysis, using whole-genome information from more than 90 chordate genomes. Phylogenomic analysis of these DGs in chordate genomes provided direct evidence that major diversification has occurred in the ancestor of placental mammals. Mammalian RDDGs have been shown to originate in several steps by independent domestication events and to diversify later by gene duplications. Analysis of syntenic loci has shown that diverse RDDGs and their chromosomal positions were fully established in the ancestor of placental mammals. By analysis of active Metaviridae lineages in amniotes we have demonstrated that RDDGs originated from retroelement remains. The chromosomal gene movements of RDDGs were highly dynamic only in the ancestor of placental mammals. During the domestication process, de novo acquisition of regulatory regions is shown to be a prerequisite for the survival of the DGs. The origin and evolution of de novo acquired promoters and UTRs in diverse mammalian RDDGs have been explained by comparative analysis of orthologous gene loci. The origin of placental mammals specific innovations and adaptations, such as placenta and newly evolved brain functions, was most probably connected to the regulatory wiring of domesticated genes and their rapid fixation in the ancestor of placental mammals.
... In the case of Drosophila genes, we used only the core region of each gene, i.e. the regions that are present in all known alternative splicing products. This was necessary because TEs can occasionally be incorporated into transcripts by alternative splicing, and the functionality of such splice products is uncertain (33). The sequences of eukaryotic TEs were downloaded from RepBase [http://www.girinst.org, v. 15.12, (34)]. ...
Article
The numerous discovered cases of domesticated transposable element (TE) proteins led to the recognition that TEs are a significant source of evolutionary innovation. However, much less is known about the reverse process, whether and to what degree the evolution of TEs is influenced by the genome of their hosts. We addressed this issue by searching for cases of incorporation of host genes into the sequence of TEs and examined the systems-level properties of these genes using the Saccharomyces cerevisiae and Drosophila melanogaster genomes. We identified 51 cases where the evolutionary scenario was the incorporation of a host gene fragment into a TE consensus sequence, and we show that both the yeast and fly homologues of the incorporated protein sequences have central positions in the cellular networks. An analysis of selective pressure (Ka/Ks ratio) detected significant selection in 37% of the cases. Recent research on retrovirus-host interactions shows that virus proteins preferentially target hubs of the host interaction networks enabling them to take over the host cell using only a few proteins. We propose that TEs face a similar evolutionary pressure to evolve proteins with high interacting capacities and take some of the necessary protein domains directly from their hosts.
... Below we provide a review of the evidence for the different eukaryotic genes that are derived from TEs. Although there have been several reviews of this topic in the past decade (Feschotte et al., 2002; Robertson, 2002; Kazazian, 2004; Gotea and Makalowski, 2006; Volff, 2006; Jurka et al., 2007), our contribution is different in five ways: ... al., 1997) derived from the nucleotide sequences of the ZmHack-2 (SanMiguel et al., 1996), a gypsy-type plant LTR retrotransposon in the Zea mays genome with extremely high copy number (up to 200,000) and sequence diversity (Feschotte et al., 2002). The sequence of the 5′ LTR (1624 bp) of Hack-2 (U68404) was downloaded, blasted (NCBI; Altschul et al., 1997) and analyzed by Mega4 (Tamura et al., 2007). ...
... A rich source of cryptic splice sites are repeats, in particular, Alu repeats. Indeed, depending on the stringency of search criteria, 0.1-4% of human mature messenger ribonucleic acids (mRNAs) were shown to contain Alu repeats in the protein-coding region (Gotea and Makalowski, 2006). The Alu consensus contains several cryptic sites, both donor and acceptor. ...
Chapter
Alternative splicing is an important mechanism of generating protein diversity and accelerated genome evolution. The mode of the selection acting in constitutive, major alternative and minor alternative regions of human genes is different. Whereas constitutive and major alternative regions tend to evolve under negative (stabilizing) selection, alternatively spliced exons from minor isoforms experience lower selective pressure at the amino acid level accompanied by weak selection against synonymous sequence variation. The McDonald–Kreitman test uses the nucleotide variation for a gene or a set of genes between and within species to detect positive Darwinian selection in the presence of negative selection. The results of the test suggest that alternatively spliced exons are also subject to positive selection, with up to 27% of amino acids fixed by positive selection. Key concepts: Alternative splicing is an important mechanism of generating protein diversity and accelerated genome evolution. Alternatively spliced regions are often evolutionarily young. There is a difference in the selection mode in constitutive, major alternative, and minor alternative regions of human genes. Constitutive and major alternative regions evolve under negative (stabilizing) selection. Up to 27% of positions in minor alternative regions may be experiencing positive selection.
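As a rough illustration of how a McDonald–Kreitman comparison yields a figure such as "27% of amino acids fixed by positive selection", the sketch below applies the commonly used estimator alpha = 1 - (Ds * Pn) / (Dn * Ps), where Dn and Ds are nonsynonymous and synonymous differences fixed between species and Pn and Ps are the corresponding polymorphisms within a species. The function name and the counts are hypothetical, chosen only for illustration; they are not data from the chapter, and the chapter's own estimation procedure may differ.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimate the fraction of amino-acid substitutions fixed by positive
    selection from McDonald-Kreitman counts (hypothetical helper).

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero for the estimate to be defined")
    # Under neutrality Dn/Ds is expected to equal Pn/Ps; an excess of
    # nonsynonymous divergence is attributed to adaptive fixations.
    return 1.0 - (ds * pn) / (dn * ps)


# Made-up counts, for illustration only
alpha = mk_alpha(dn=100, ds=200, pn=73, ps=200)
print(f"alpha = {alpha:.2f}")
```

A negative value from this estimator is usually read as an excess of slightly deleterious polymorphism rather than as evidence of adaptation.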
... They may function as part of genome-wide regulatory networks (36). LINE-1 (L1) elements, the major group of nLTR retrotransposons, are known to play important roles in mammalian genome evolution (18,37,38). Early studies of L1 expression put a strong emphasis on primarily germline expression of these elements (39,40). ...
Article
Fragile X-associated tremor/ataxia syndrome (FXTAS) is a neurodegenerative disorder associated with fragile X premutation carriers. Previous studies have shown that fragile X rCGG repeats are sufficient to cause neurodegeneration and that the rCGG-repeat-binding proteins Pur α and heterogeneous nuclear ribonucleoprotein (hnRNP) A2/B1 could modulate rCGG-mediated neuronal toxicity. Mobile genetic elements or their remnants populate the genomes, and the activities of these elements are tightly controlled for the fitness of host genomes in different organisms. Here we provide both biochemical and genetic evidence to show that the activation of a specific retrotransposon, gypsy, can modulate rCGG-mediated neurodegeneration in an FXTAS Drosophila model. We find that one of the rCGG-repeat-binding proteins, hnRNP A2/B1, is involved in this process via interaction with heterochromatin protein 1. Knockdown of gypsy RNA by RNAi could suppress the neuronal toxicity caused by rCGG repeats. These data together point to a surprisingly active role for retrotransposition in neurodegeneration.
... Their contribution to the evolution of protein-coding sequences will be better understood with the increasing progression of comparative genome sequencing projects. Except for the constitutively expressed exonizations, the functions of alternative splice variants containing exonized moieties as protein cassettes remain hypothetical, and only their unchanged persistence over long evolutionary periods implies possible function [46]. Future intensive protein sequencing and structural as well as functional studies will provide more significant insights into the contributions of such alternative variants. ...
Article
Protein-coding genes are composed of exons and introns flanked by untranslated regions. Before the mRNA of a gene can be translated into protein, the splicing machinery removes all the intronic regions and joins the protein-coding exons together. Exonization is a process whereby genes acquire new exons from non-protein-coding, primarily intronic, DNA sequences. Genomic insertions or point mutations within DNA sequences often generate alternative splice sites, causing the splicing system to include new sequences as exons or to elongate existing exons. Because the alternative splice sites are not as efficient as the originals, the new variants usually constitute a minor fraction of mature mRNAs. While the prevailing original splice variant maintains functionality, the additional sequence, free from selection pressure, evolves a new function or eventually vanishes. If the new splice variant is advantageous, selection might operate to optimize the new splice sites and consequently increase the proportion of the alternative splice variant. In some instances, the original splice variant is completely replaced by constitutive splicing of the new form. Because of the fortuitous presence of internal splice site-like structures within their sequences, portions of transposed elements frequently serve as modules of exonization. Their recruitment requires a long and versatile optimization process involving multiple changes over a time span of millions, even hundreds of millions, of years. Comparisons of corresponding genes and mRNAs in phylogenetically related species enable one to chronologically reconstruct such changes, from ancient ancestors to living species, in a stepwise manner. We will review this process using three different exemplary cases: (1) the evolution of a constitutively spliced mammalian-wide repeat (MIR), (2) the evolution of an alternative exon 1 from an alternative 5'-extended primary transcript containing an Alu element, and (3) a rare case of the stepwise exonization of an Alu element-derived sequence mediated by A-to-I RNA editing.
... In recent years, the view that not all transpositional events are detrimental has gained acceptance. Host genomes have evolved mechanisms to harness the unique properties of transposable elements to their own benefit. Transposed elements (elements that have been rendered incapable of transposition through mutation), and retroelements (elements that transpose via an RNA intermediate) in particular, have played important roles in mammalian genome evolution and in the generation of new human-specific genes [3-5]: transposed elements support genome integrity as part of centromeres and telomeres, impact the transcriptome and contribute to tissue-specific gene expression [6-8]. Recent evidence from our laboratory and others suggests that, within an individual, neuronal genomes are genetically diverse and that brains are somatic mosaics [9,10]. ...
Article
LINE-1 (L1) elements are retrotransposons that insert extra copies of themselves throughout the genome using a 'copy and paste' mechanism. L1s comprise approximately 20% of the human genome and are able to influence chromosome integrity and gene expression upon reinsertion. Recent studies show that L1 elements are active and 'jumping' during neuronal differentiation. New somatic L1 insertions could generate 'genomic plasticity' in neurons by causing variation in genomic DNA sequences and by altering the transcriptome of individual cells. Thus, L1-induced variation could affect neuronal plasticity and behavior. We discuss potential consequences of L1-induced neuronal diversity and propose that a mechanism for generating diversity in the brain could broaden the spectrum of behavioral phenotypes that can originate from any single genome.
... TEs constitute a substantial proportion of eukaryotic genomes (>40% of human genome) and sculpt the genome by innovating genes and creating new regulatory networks during evolution (Muotri et al., 2007). Although most of these sequences have lost their ability to transpose, some TE fragments can still be transcribed when integrated into mRNA coding regions (Caras et al., 1987;Gotea and Makalowski, 2006). These insertions contribute to alternative splicing and protein coding (Tamura et al., 2007;Wu et al., 2007). ...
Article
It has been reported that many genes and small RNAs are associated with density-dependent polyphenism in locusts. However, the regulatory mechanism underlying gene transcription is still unknown. Here, by analysis of transcriptome database of the migratory locust, we identified abundant transcripts of transposable elements, which are mediators of genetic variation and gene transcriptional regulation, mainly including CR1, I, L2 and RTE-BovB. We cloned one I element, which represents the most abundant transcripts in all transposable elements, and investigated its developmental and tissue-specific expression in gregarious and solitary locusts. Although there are no significant differences of I element expression in whole bodies between gregarious and solitary locusts at various developmental stages, this I element exhibits high expression level and differential expression pattern between gregarious and solitary locusts in central and peripheral nervous tissues, such as brain, antenna and labial palps. These results suggest that I element is potentially involved in the response of neural systems to social environmental changes in locusts.
Preprint
Full-text available
Transcriptomes are dynamic, with cells, tissues, and body parts expressing particular sets of transcripts. Transposons are a known source of transcriptome diversity, however studies often focus on a particular type of chimeric transcript, analyze single body parts or cell types, or are based on incomplete transposon annotations from a single reference genome. In this work, we have implemented a method based on de novo transcriptome assembly that minimizes the potential sources of errors while identifying a comprehensive set of gene-TE chimeras. We applied this method to head, gut and ovary dissected from five Drosophila melanogaster natural populations, with individual reference genomes available. We found that 18.6% of body part specific transcripts are gene-TE chimeras. Overall, chimeric transcripts contribute a median of 38% to the total gene expression, and they provide both DNA binding and catalytic protein domains. Our comprehensive dataset is a rich resource for follow-up analysis. Moreover, because transposable elements are present in virtually all species sequenced to date, their relevant role in spatially restricted transcript expression is likely not exclusive to the species analyzed in this work.
Chapter
Full-text available
Understanding the abundance, diversity, and distribution of TEs in genomes is crucial to understand genome structure, function, and evolution. Advances in whole-genome sequencing techniques, as well as in bioinformatics tools, have increased our ability to detect and analyze the transposable element content in genomes. In addition to reference genomes, we now have access to population datasets in which multiple individuals within a species are sequenced. In this chapter, we highlight the recent advances in the study of TE population dynamics focusing on fruit flies and humans, which represent two extremes in terms of TE abundance, diversity, and activity. We review the most recent methodological approaches applied to the study of TE dynamics as well as the new knowledge on host factors involved in the regulation of TE activity. In addition to transposition rates, we also focus on TE deletion rates and on the selective forces that affect the dynamics of TEs in genomes.
Article
The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.
Article
Full-text available
Cysteine/tyrosine-rich 1 (CYYR1) is a gene we previously identified on human chromosome 21 starting from an in-depth bioinformatics analysis of chromosome 21 segment 40/105 (21q21.3), where no coding region had previously been predicted. CYYR1 was initially characterized as a four-exon gene, whose brain-derived cDNA sequencing predicts a 154-amino acid product. In this study we provide, with in silico and in vitro analyses, the first detailed description of the human CYYR1 locus. The analysis of this locus revealed that it is composed of a multi-transcript system, which includes at least seven CYYR1 alternative spliced isoforms and a new CYYR1 antisense gene (named CYYR1-AS1). In particular, we cloned, for the first time, the following isoforms: CYYR1-1,2,3,4b and CYYR1-1,2,3b, which present a different 3' transcribed region, with a consequent different carboxy-terminus of the predicted proteins; CYYR1-1,2,4 lacks exon 3; CYYR1-1,2,2bis,3,4 presents an additional exon between exon 2 and exon 3; CYYR1-1b,2,3,4 presents a different 5' untranslated region when compared to CYYR1. The complexity of the locus is enriched by the presence of an antisense transcript. We have cloned a long transcript overlapping with CYYR1 as an antisense RNA, probably a non-coding RNA. Expression analysis performed in different normal tissues, tumour cell lines as well as in trisomy 21 and euploid fibroblasts has confirmed a quantitative and qualitative variability in the expression pattern of the multi-transcript locus, suggesting a possible role in complex diseases that should be further investigated.
Article
In the present study, an in silico analysis was performed to identify transposable element (TE) fragments inserted in Cyps with functions associated with resistance to insecticides and developmental regulation as well as in neighboring genes in two sibling species, Drosophila melanogaster and D. simulans. The Cyps associated with insecticide resistance and their neighboring non-Cyp genes have accumulated a greater number of TE fragments than the other Cyps or a random sample of genes, predominantly in the 5´-flanking regions. Most of the insertions were due to DNA transposons, with DNAREP1 fragments being the most common. These fragments carry putative binding sites for transcription factors, which reinforces the hypothesis that DNAREP1 may influence gene regulation and play a role in the adaptation of Drosophila species.
Book
Full-text available
See below for how to obtain the e-book.
Chapter 1. Origin of life: a short time for a successful experiment. Carlos Frederico Martins Menck, Eduardo Gorab and Mariana Cabral de Oliveira
Chapter 2. The RNA world and the origin of the complexity of life. Mariana Cabral de Oliveira and Carlos Frederico Martins Menck
Chapter 3. The non-coding genome: a brief introduction. Alysson Renato Muotri and Cassiano Carromeu
Chapter 4. The role of RNA interference in the eukaryotic cell. Stephano Spanó Mello, Luciana Nogueira de Sousa Andrade and Carlos Frederico Martins Menck
Chapter 5. Stability of the genetic material: mutagenesis and repair. Luis Eduardo Soares Netto and Carlos Frederico Martins Menck
Chapter 6. Sex, why? Sergio Russo Matioli and Anita Wajntal
Chapter 7. Rates of evolution and the molecular clock. Daniela Calcagnotto
Chapter 8. Evolution of nuclear ribosomal RNA genes. Eduardo Gorab
Chapter 9. The unstable genome: mobile genetic sequences. Marie-Anne Van Sluys, Nathalia de Setta, Katia C. Scortecci and Ana Paula Pimentel Costa
Chapter 10. Genome evolution. Laila Alves Nahum
Chapter 11. Evolutionary developmental biology. Luis Paulo de Moura Andrioli
Chapter 12. Phylogenetic reconstruction: introduction and the maximum parsimony method. Cristina Yumi Miyaki, Cláudia A. de Moraes Russo and Sergio Luiz Pereira
Chapter 13. Phylogenetic reconstruction: geometric methods. Cláudia A. de Moraes Russo, Cristina Yumi Miyaki and Sergio Luiz Pereira
Chapter 14. Phylogenetic reconstruction: probabilistic methods. Sergio Luiz Pereira, Cristina Yumi Miyaki and Cláudia A. de Moraes Russo
Chapter 15. Phylogenetic reconstruction: Bayesian inference. Sergio Luiz Pereira
Chapter 16. How to choose genes for specific phylogenetic problems. Claudia A. M. Russo
Chapter 17. Isozyme polymorphisms. Vera Nisaka Solferini and Denise Selivon Scheepmaker
Chapter 18. RFLP: the use of restriction enzymes to detect DNA polymorphisms. Maria Cristina Arias and Maria Elena Infante-Malachias
Chapter 19. PCR-based methods for the analysis of nucleic acid polymorphisms. Sergio Russo Matioli and Maria Rita dos Santos e Passos-Bueno
Chapter 20. Genealogies and the coalescent process. Flora Maria de Campos Fernandes
Chapter 21. Phylogeographic analysis. Haydée A. Cunha and Antonio M. Solé-Cava
Chapter 22. Molecular biodiversity and conservation genetics. Antonio M. Solé-Cava and Haydée A. Cunha
Book not available as ebook. It can be purchased at: https://holoseditora.websiteseguro.com/index.php?area=produto&prodid=321&cat=36 or http://www.sbg.org.br/CatalogoLivro/livro_9788586699757.html
Article
Full-text available
Mobile elements are DNA fragments that are able to self-replicate within the genome of a host organism. Mobile elements usually comprise about 40-50% of a mammalian genome. The present review considers evolutionarily recent insertions of mobile elements that occurred after the divergence of the human and chimpanzee ancestral lineages, i.e. within about the last 6 million years. Human-specific transposable elements are represented by a relatively small number of copies that can be subdivided into four groups: HERV-K (HML-2), L1, Alu, and SVA. The numbers of human-specific copies of HERV-K (HML-2), L1, Alu, and SVA amount to roughly 150, 1200, 5500, and 860 copies per genome, respectively. Furthermore, we describe a new family of human-specific mobile elements that is present only in the human genome and absent from other primates. Insertions of human-specific mobile elements can be regarded as important candidates for molecular-genetic agents of anthropogenesis: each new insertion supplies the acceptor gene locus with a set of new functional transcription factor binding sites that can significantly alter the functioning of adjacent genes. On the basis of known evidence for the influence of human-specific mobile elements on the expression of adjacent genes, the total number of human genes regulated by them can be estimated in the hundreds.
Chapter
The human genome contains hundreds of genes with protein-coding exons and even complete open reading frames derived from transposable elements. These genes are involved in major biological processes such as immunity, replication, reproduction, cell proliferation and apoptosis. This demonstrates the importance of transposable elements as a genomic pool of coding sequences for the creation and evolution of gene functions. Keywords: transposable elements; human genome; molecular domestication; exonization; evolution
Chapter
Full-text available
Transposable elements (TEs) are selfish fragments of DNA able to reproduce themselves within host genomes. TEs typically occupy ∼40–50% of mammalian genomes. In our studies, we focus on evolutionarily recent TE inserts that appeared in the DNA of the human ancestral lineage after its divergence from the chimpanzee ancestry, i.e. less than ∼6 million years ago. These human-specific elements (hsTEs) represent only a minor fraction of the whole TE cargo of the human genome. hsTEs are represented by four families: HERV-K(HML-2), L1, Alu and SVA. The number of human-specific copies for the HERV-K(HML-2), L1, Alu and SVA families is approx. 150, 1,200, 5,500 and 860 copies per genome, respectively. Taken together, hsTEs shape ∼6.4 megabases of human DNA, which is about 6 times less than what is occupied by the human-specific simple nucleotide polymorphisms, and 23 times less than the overall length of human-specific deletions and duplications. However, although modest in terms of genomic proportion, hsTEs should be regarded as prospective candidates for molecular genetic agents of human speciation. Unlike most random mutations and duplications, each novel hsTE insert has provided the recipient genomic locus with a set of functional transcription factor binding sites positively selected during TE evolution. For example, clusters of novel inserts of Alu elements may serve as CpG islets, SVA elements provide functional splice sites and polyadenylation signals, whereas L1 and HERV-K(HML-2) elements donate enhancers, promoters, splice sites and polyadenylation signals. A significant proportion of the human-specific genomic deletions, duplications and translocations has also been generated by ectopic recombination between different individual TE inserts. Among other results, we report for the first time a detailed genome-wide functional characterization of the HERV-K(HML-2) hsTEs. We have identified 65 human-specific promoters contributed by these elements that are active in vivo. We also identified three cases of hsTE-mediated human-specific transcriptional regulation of functional protein-coding genes taking part in brain development during embryogenesis. We found ∼180 human-specific polyadenylation signals transferred by the SVA elements into the introns of known functional genes. Scaling of these data to the total number of hsTEs predicts that hundreds of human genes are regulated by these elements. Finally, we discovered the first exclusively human-specific TE family, represented by ∼80 members formed by a combination of a part of a CpG islet of the human gene MAST2 and of the 3′-terminal part of an SVA retrotransposon. According to our estimates, this family, termed CpG-SVA, was far more active than the ancestral SVA family. Our data indicate that the MAST2 regulatory sequence was recruited during evolution to provide effective CpG-SVA transcription in human testicular germ-line cells. Keywords: Human evolution; Genetic instability; Transposable elements; Human specific promoters; Antisense transcripts; Regulation of gene expression; Brain development; Hybrid family of retrotransposons
Chapter
The L1 retrotransposon is a major component of mammalian genomes and has molded them throughout evolution in many ways, thereby expanding the possibilities for human diversity. In this paper, we discuss one further mechanism by which L1 can alter the genome, namely the retrotransposition of a transcript involving sequences from two adjacent genes to form a new gene. In addition, a small fraction of L1 elements in the human genome is still actively retrotransposing, but there are little data on the extent of variation in retrotransposition potential among different individual human beings. Here we present evidence for considerable individual variation in L1 retrotransposition capability, a finding that has significant implications for the role of retrotransposition in present-day human neurological diversity.
Article
Full-text available
Palaeontologists Stephen J. Gould and Elisabeth Vrba introduced the term "exaptation" with the aim of improving and enlarging the scientific language available to researchers studying the evolution of any useful character, instead of calling it an "adaptation" by default, coming up with what Gould named an "extended taxonomy of fitness". With the extension to functional co-optations from non-adaptive structures ("spandrels"), the notion of exaptation expanded and revised the neo-Darwinian concept of "pre-adaptation" (which was misleading, for Gould and Vrba, suggesting foreordination). Exaptation is neither a "saltationist" nor an "anti-Darwinian" concept and, since 1982, has been adopted by many researchers in evolutionary and molecular biology, and particularly in human evolution. Exaptation has also been contested. Objections include the "non-operationality objection". We analyze the possible operationalization of this concept in two recent studies, and identify six directions of empirical research, which are necessary to test "adaptive vs. exaptive" evolutionary hypotheses. We then comment on a comprehensive survey of literature (available online), and on the basis of this we make a quantitative and qualitative evaluation of the adoption of the term among scientists who study human evolution. We discuss the epistemic conditions that may have influenced the adoption and appropriate use of exaptation, and comment on the benefits of an "extended taxonomy of fitness" in present and future studies concerning human evolution.
Article
In an article published in these pages, Elhaik et al. (Elhaik E, Landan G, Graur D. 2009. Can GC content at third-codon positions be used as a proxy for isochore composition? Mol Biol Evol. 26:1829-1833) asked if GC3, the GC level of the third-codon positions in protein-coding genes, can be used as a "proxy" to estimate the GC level of the surrounding isochore. We use available data to directly answer this simple question in the affirmative and show how the use of indirect methods can lead to apparently conflicting conclusions. The answer reasserts that in human and other vertebrates, genes have a strong tendency to reside in compositionally corresponding isochores, which has far-reaching implications for genome structure and evolution.
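GC3, as defined in the abstract above, is the GC level of the third codon positions of a protein-coding sequence. A minimal sketch of that calculation follows, assuming the input is an in-frame coding sequence given as a plain string of A, C, G and T; the function name `gc3` and the example sequence are hypothetical and not taken from the cited work.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C bases at third codon positions.

    Assumes `cds` starts in frame; a trailing incomplete codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # keep complete codons only
    third_positions = seq[2:usable:3]  # every third base: codon position 3
    if not third_positions:
        raise ValueError("sequence contains no complete codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


if __name__ == "__main__":
    # Codons ATG GCC AAG TAA have third positions G, C, G, A.
    print(gc3("ATGGCCAAGTAA"))
```

Comparing this per-gene value with the GC level of the surrounding genomic region is the kind of "proxy" relationship the abstract discusses; the sketch covers only the per-gene half of that comparison.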
Article
... in which retrotransposons play a driving role. In this context, we set ourselves three working objectives: 1) to improve our knowledge of phylogenetic relationships within the genus Lupinus (Fabaceae) by using new nuclear markers (rRNA-ETS and SymRK), 2) to assess, by amplification and in situ hybridization, the diversity, abundance and role of Ty1/copia and Ty3/gypsy retrotransposons in genome size variation in lupins, and 3) to sequence, annotate and compare a first genomic region available for a lupin with the homologous regions of other Fabaceae. The resulting phylogeny improves our understanding of the evolutionary history of lupins and reveals patterns of genome size variation that differ from one lineage to another. The retrotransposon analyses show that copia and gypsy elements contribute more significantly to genome size differences in Mediterranean lupins than in African lupins, and suggest different modes and mechanisms of genome size evolution within the genus. At the local scale (the SymRK gene region), we confirm the strong involvement of these elements, which represent 25% of the analyzed region in Lupinus angustifolius.
Article
Full-text available
With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods. These methods require easy-to-use computer programs. One such effort has been to produce Molecular Evolutionary Genetics Analysis (MEGA) software, with its focus on facilitating the exploration and analysis of the DNA and protein sequence variation from an evolutionary perspective. Currently in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of the phylogenetic trees, estimation of evolutionary distances and testing evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA.
Article
Full-text available
A search for new members of the mammalian interspersed repeat (MIR) family was carried out over the coding regions of the human genome from GenBank-116. Only 254 nucleotide sequences contained MIRs in coding regions, of which 45 MIR copies were previously unknown, including 17 located in translated gene regions. The program developed by the authors was shown to surpass the CENSOR program in search power. The evolution of the MIR copies located in translated regions of the human genome is discussed.
Article
Full-text available
Interspersed repetitive sequences are major components of eukaryotic genomes. Repetitive elements comprise over 50% of the mammalian genome. Because the specific function of these elements remains to be defined and because of their unusual ‘behaviour’ in the genome, they are often quoted as a selfish or junk DNA. Our view of the entire phenomenon of repetitive elements has to now be revised in the light of data on their biology and evolution, especially in the light of what we know about the retroposons. I would like to argue that even if we cannot define the specific function of these elements, we still can show that they are not useless pieces of the genomes. The repetitive elements interact with the surrounding sequences and nearby genes. They may serve as recombination hot spots or acquire specific cellular functions such as RNA transcription control or even become part of protein coding regions. Finally, they provide very efficient mechanism for genomic shuffling. As such, repetitive elements should be called genomic scrap yard rather than junk DNA. Tables listing examples of recruited (exapted) transposable elements are available at http://www.ncbi.nlm.gov/Makalowski/ScrapYard/
Article
Full-text available
A 65-bp “core” sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3′ ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.
Article
Full-text available
Genomic nomenclature has not kept pace with the levels and depth of analyzing and understanding genomic structure, function, and evolution. We wish to propose a general terminology that might aid the integrated study of evolution and molecular biology. Here we designate as a "nuon" any stretch of nucleic acid sequence that may be identifiable by any criterion. We show how such a general term will facilitate contemplation of the structural and functional contributions of such elements to the genome in its past, current, or future state. We focus in this paper on pseudogenes and dispersed repetitive elements, since their current names reflect the prevalent view that they constitute dispensable genomic noise (trash), rather than a vast repertoire of sequences with the capacity to shape an organism during evolution. This potential to contribute sequences for future use is reflected in the suggested terms "potonuons" or "potogenes." If such a potonuon has been coopted into a variant or novel function, an evolutionary process termed "exaptation," we employ the term "xaptonuon." If a potonuon remains without function (nonaptive nuon), it is a "nonaptation" and we term it "naptonuon." A number of examples for potonuons and xaptonuons are given.
Article
Full-text available
In studies of mutations causing deficiency of ornithine delta-aminotransferase (EC 2.6.1.13), we found an allele whose mature mRNA has a 142-nucleotide insertion at the junction of sequences from exons 3 and 4. The insert derives from an Alu element in ornithine delta-aminotransferase intron 3 oriented in the direction opposite to transcription (an "antisense Alu"). A guanine-to-cytosine transversion creates a donor splice site in this Alu, activating a cryptic acceptor splice site at its 5' end and causing splice-mediated insertion of an Alu fragment into the mature ornithine delta-aminotransferase mRNA. We note that the complement of the Alu consensus sequence has at least two cryptic acceptor sites and several potential donor sequences and predict that similar mutations will be found in other genes.
Article
Full-text available
The amino acid sequence of the cytosolic human placenta protein-tyrosine-phosphatase 1B (PTPase 1B; protein-tyrosine-phosphate phosphohydrolase, EC 3.1.3.48) has been determined. It consists of a single chain of 321 residues with an N-acetylated N-terminal methionine and an unusually proline-rich C-terminal region. The enzyme is structurally related to the two cytoplasmic domains of both the leukocyte common antigen CD45 and LAR, a CD45-like molecule with an external segment that resembles a neural cell adhesion molecule. A low molecular weight protein encoded by a cDNA clone from T cells also shows extensive sequence similarities. The present study defines homologous domains common to this diverse family of PTPases that includes both soluble and receptor-like transmembrane forms. The cysteinyl residues 121 and 215 of PTPase 1B are conserved among all members of the family and are candidates for involvement in catalysis since PTPase 1B is inactivated by thiol modifying reagents. Two segments rich in positively charged residues (residues 33-47 and 227-238) may provide sites of interaction with inhibitory anionic polymers such as heparin or poly(Glu/Tyr).
Article
Full-text available
We have used available protein sequence data for the anaphylatoxin (C5a) portion of the fifth component of human complement (residues 19-25) to synthesize a mixed-sequence oligonucleotide probe. The labeled oligonucleotide was then used to screen a human liver cDNA library, and a single candidate cDNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence obtained further predicted an arginine-rich sequence (RPRR) immediately upstream of the N-terminal threonine of C5a, indicating that the promolecule form of C5 is synthesized with a beta alpha-chain orientation as previously shown for pro-C3 and pro-C4. The C5 cDNA clone was sheared randomly by sonication, subcloned into M13 mp8, and sequenced at random by the dideoxy technique, thereby generating a contiguous sequence of 1703 base pairs. This clone contained coding sequence for the C-terminal 262 amino acid residues of the beta-chain, the entire C5a fragment, and the N-terminal 98 residues of the alpha'-chain. The 3' end of the clone had a polyadenylated tail preceded by a polyadenylation recognition site, a 3'-untranslated region, and base pairs homologous to the human Alu consensus sequence. Comparison of the derived partial human C5 protein sequence with that previously determined for murine C3 and human alpha 2-macroglobulin has indicated regions of pronounced sequence similarity. Examination of cytoplasmic RNA prepared from human liver and the human hepatoma cell line Hep G2 by Northern transfer has indicated a C5 mRNA species of about 5.2 kilobase pairs.
Article
Dispersion of repetitive sequence elements is a source of genetic variability that contributes to genome evolution. Alu elements, the most common dispersed repeats in the human genome, can cause genetic diseases by several mechanisms, including de novo Alu insertions and splicing of intragenic Alu elements into mRNA. Such mutations might contribute positively to protein evolution if they are advantageous or neutral. To test this hypothesis, we searched the literature and sequence databases for examples of protein-coding regions that contain Alu sequences: 17 Alu 'cassettes' inserted within 15 different coding sequences were found. In three instances, these events caused genetic diseases; the possible functional significance of the other Alu-containing mRNAs is discussed. Our analysis suggests that splice-mediated insertion of intronic elements is the major mechanism by which Alu segments are introduced into mRNAs.
Article
The double-stranded RNA-specific editase 1 (RED1/ADAR2) is implicated in the editing of precursor-mRNAs (pre-mRNA) encoding subunits of glutamate receptors (GluRs) in brain. Site-specific deamination of adenosine to inosine alters the codon at the Q/R site in GluR-B rendering the heteromeric receptor impermeable to Ca2+ ions. We cloned human RED1 (hRED1/hADAR2) cDNAs from a brain cDNA library. The human enzyme is 95% identical to the rat homologue. We characterized two alternatively spliced forms that differed by the presence of an Alu-J cassette in the deaminase domain. For the long form containing the Alu cassette, we isolated cDNA clones with an alternative C-terminus and 3'-UTR. An 8.8-kb transcript of hRED1 is most abundant in brain and heart, and lower levels are detected in other tissues. In vitro editing assays with purified recombinant hRED1 containing or lacking the Alu-J cassette revealed that both forms of the protein have the same substrate specificity, but differ in their catalytic activity.
Article
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc), a minimal level of redundancy and a high level of integration with other databases. Recent developments of the database include: an increase in the number and scope of model organisms; cross-references to seven additional databases; a variety of new documentation files; the creation of TREMBL, an unannotated supplement to SWISS-PROT. This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except CDS already included in SWISS-PROT.
Article
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Article
The completion of the human genome will greatly accelerate the development of a new branch of science: evolutionary genomics. We can now directly address important questions about the evolutionary history of human genes and their regulatory sequences. Computational analyses of the human genome will reveal the number of genes and repetitive elements, the extent of gene duplication and compositional heterogeneity in the human genome, and the extent of domain shuffling and domain sharing among proteins. Here we present some first glimpses of these features.
Article
mRNA profiling enables the expression levels of thousands of transcripts in a cell to be monitored simultaneously. Nevertheless, analyses in yeast and mammalian cells have demonstrated that mRNA levels alone are unreliable indicators of the corresponding protein abundances. This discrepancy between mRNA and protein levels argues for the relevance of additional control mechanisms besides transcription. As translational control is a major mechanism regulating gene expression, the use of translated mRNA in profiling experiments might depict the proteome more closely than does the use of total mRNA. This would combine the technical potential of genomics with the physiological relevance of proteomics.
Article
Human genetic information is encoded in two genomes: nuclear and mitochondrial. Both reflect the molecular evolution of humans, from the beginning of life (about 4.5 billion years ago) to the origin of Homo sapiens about 100,000 years ago. For this reason the human genome contains some features that are common to different groups of organisms and some that are unique to Homo sapiens. The 3.2 × 10^9 base pairs of the human nuclear genome are packed into 23 chromosomes of different sizes. The smallest chromosome, chromosome 21, contains 5 × 10^7 base pairs, while the largest, chromosome 1, contains 2.63 × 10^8 base pairs. Although the nucleotide sequence of all chromosomes has been established, the organisation of the nuclear genome still raises questions; for example, the exact number of genes encoded by the human genome is still unknown, with estimates ranging from 30 to 150 thousand genes. Coding sequences represent a few percent of the human nuclear genome. The majority of the genome consists of repetitive sequences (about 50%) and noncoding unique sequences. This part of the genome is frequently, and wrongly, called "junk DNA". The distribution of genes on chromosomes is irregular: DNA fragments with a low percentage of GC pairs encode fewer genes than fragments with a high percentage of GC pairs.
Article
Most active non-LTR (long terminal repeat) retrotransposons carry two open reading frames (ORFs) encoding ORF1p and ORF2p proteins. The ORF2p proteins are relatively well studied and are known to contain endonuclease/reverse transcriptase domains. At the same time, the biological function of ORF1p proteins remains poorly understood, except in that they nonspecifically bind single-stranded mRNA/DNA molecules. CR1-like elements form the most widely distributed clade/superfamily of non-LTR retrotransposons. We found that ORF1p proteins encoded by diverse CR1-like elements contain a conserved esterase domain (ES) or a plant homeodomain (PHD). This indicates that CR1-like ORF1p proteins are either lipolytic enzymes or are involved in protein-protein interactions related to chromatin remodeling. Sequence conservation of ES suggests that interaction with cellular membranes is an important phase in the life cycles of CR1-like elements. Presumably such interaction helps in penetrating host cells. As a consequence, the presence of multiple young CR1 families characterized by approximately 10% intrafamily and 40% interfamily identities may be explained by a relatively frequent horizontal transfer of these CR1-like elements. Unexpectedly, ES links together non-LTR retrotransposons and single-stranded RNA viruses like influenza C and coronaviruses, which are known to depend on their own ES.
Article
To explore the possibility that an arbitrary sequence can evolve towards acquiring a functional role when fused with other pre-existing protein modules, we replaced the D2 domain of the fd-tet phage genome with the soluble random polypeptide RP3-42. The replacement yielded a defective phage, fd-RP, whose infectivity is six orders of magnitude lower than that of the wild-type fd-tet phage. The evolvability of RP3-42 was investigated through iterative mutation and selection. Each generation consisted of a maximum of ten arbitrarily chosen clones, and the clone with the highest infectivity was selected as the parent clone of the following generation. The experimental evolution attested that, from an initial single random sequence, there will be selectable variation in a property of interest and that the property in question was able to improve over several generations. fd-7, the clone with the highest infectivity at the end of the experimental evolution, showed a 240-fold increase in infectivity compared with its origin, fd-RP. Analysis by phage ELISA using anti-M13 antibody and anti-T7 antibody revealed that an approximately 37-fold increase in the infectivity of fd-7 was attributable to changes in the molecular property of the single polypeptide that replaced the D2 domain of the g3p protein. This study therefore exemplifies the process of a random polypeptide generating a functional role in rejuvenating the infectivity of a defective bacteriophage when fused to some preexisting protein modules, indicating that an arbitrary sequence can evolve toward acquiring a functional role. Overall, this study could herald a new perspective on primordial polypeptides in the field of molecular evolution.
Article
Here, we report the presence of two splice variants of the human epithelial sodium channel α subunit (hαENaC) containing an Alu cassette, namely hαENaC+22 and hαENaC+Alu, in various tissues. Functional expression of these splice variants with hENaC β and γ subunits produced loss of channel activity in the Xenopus oocyte expression system. Interestingly, coexpression of hαENaC+22 or hαENaC+Alu, respectively, with wild type hENaC α, β, and γ subunits enhanced the expression of amiloride-sensitive current in oocytes. The presence of Alu sequences in the 3′-untranslated region of hγENaC was also identified.
Article
To study the genome-wide impact of transposable elements (TEs) on the evolution of protein-coding regions, we examined 13 799 human genes and found 533 (∼4%) cases of TEs within protein-coding regions. The majority of these TEs (∼89.5%) reside within ‘introns’ and were recruited into coding regions as novel exons. We found that TE integration often has an effect on gene function. In particular, there were two mouse genes whose coding regions consist largely of TEs, suggesting that TE insertion might create new genes. Thus, there is increasing evidence for an important role of TEs in gene evolution. Because many TEs are taxon-specific, their integration into coding regions could accelerate species divergence.
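As a quick check of the proportions quoted in this abstract, the short Python sketch below recomputes them from the stated counts. The variable names are illustrative, and the intronic-origin figure is only a rounded estimate derived from the quoted ∼89.5%, not a count given in the source.

```python
# Counts quoted in the abstract above.
genes_examined = 13_799
genes_with_te_in_cds = 533

# Share of examined genes with a TE inside the protein-coding region.
te_share_percent = 100 * genes_with_te_in_cds / genes_examined

# Rough number of those cases arising from intronic TEs (about 89.5%);
# this is a derived estimate, not a count reported in the abstract.
intronic_estimate = round(0.895 * genes_with_te_in_cds)

print(f"{te_share_percent:.2f}% of genes; about {intronic_estimate} intronic cases")
```

The computed share is a little under 4%, consistent with the rounded "∼4%" in the abstract.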
Article
Adaptation has been defined and recognized by two different criteria: historical genesis (features built by natural selection for their present role) and current utility (features now enhancing fitness no matter how they arose). Biologists have often failed to recognize the potential confusion between these different definitions because we have tended to view natural selection as so dominant among evolutionary mechanisms that historical process and current product become one. Yet if many features of organisms are non-adapted, but available for useful cooptation in descendants, then an important concept has no name in our lexicon (and unnamed ideas generally remain unconsidered): features that now enhance fitness but were not built by natural selection for their current role. We propose that such features be called exaptations and that adaptation be restricted, as Darwin suggested, to features built by selection for their current role. We present several examples of exaptation, indicating where a failure to conceptualize such an idea limited the range of hypotheses previously available. We explore several consequences of exaptation and propose a terminological solution to the problem of preadaptation.
Article
In the course of the genomic cloning of nCL-2, a stomach-specific calpain, we identified a genomic clone encoding a novel member of the calpain large subunit family and designated it 'nCL-4'. First, using exon sequences, we cloned the cDNA for mouse nCL-4. Based on this sequence, we also cloned the cDNAs for rat and human nCL-4. In the case of human nCL-4, the longest open reading frame encodes 690 amino acid residues (Mr 79095) with equal sequence similarities (50-55%) to both ubiquitous and organ-specific calpain large subunits from mammals. The deduced amino acid sequence revealed that nCL-4 is highly conserved among mammals. nCL-4 can be aligned without significant deletions or insertions, and, thus, like other calpains, can be divided into four domains (I-IV). The significant similarity of domains II and IV to those in conventional calpain large subunits suggests the potential protease activity and Ca2+-binding ability of nCL-4. Northern blot analysis revealed that the mRNA for nCL-4 is expressed predominantly in stomach and small intestine but not in uterus, suggesting specialized functions of nCL-4 in the digestive tract. When overexpressed in COS-7 cells, a specific band for nCL-4 was detected. In addition, the gene coding for nCL-4 was localized on human chromosome 1.
Article
Decay-accelerating factor (DAF), a glycoprotein that is anchored to the cell membrane by phosphatidylinositol, binds activated complement fragments C3b and C4b, thereby inhibiting amplification of the complement cascade on host cell membranes. Here, we report the molecular cloning of human DAF from HeLa cells. Analysis of DAF complementary DNAs revealed two classes of DAF messenger RNA, one apparently derived from the other by a splicing event that causes a coding frameshift near the C terminus. The apparent 'intron' sequence contains an Alu family member and encodes contiguous protein sequence. Two DAF proteins are therefore possible, having divergent C-terminal domains which differ in their hydrophobicity. Both mRNAs are found on polysomes, suggesting that both are translated. We propose that the major (90%) spliced DAF mRNA encodes membrane-bound DAF whereas the minor (10%) unspliced DAF mRNA may encode secreted DAF and we present expression data supporting this. The deduced DAF sequence contains four repeating units homologous to a consensus repeat found in a recently described family of complement proteins.
Article
The changes in DNA sequence that have taken place during the evolution of eukaryotic genomes cannot be accounted for simply by base substitutions; some more complex mutations must have taken place as well. Transposable elements can affect gene structure and expression in several ways that suggest that they may have contributed to these evolutionary events.
Article
Two types of calcium-dependent protease with distinct calcium requirements (termed muCANP and mCANP) are known in mammalian tissues. These two isozymes consist of different large (80-kDa) subunits (mu- or m-types) and identical small (30-kDa) subunits. By screening human and rat muscle cDNA libraries with a cDNA probe for the chicken CANP large subunit, which has a structure similar to both the mammalian mu- and m-types, a cDNA clone encoding a novel member of the CANP large subunit family was obtained. The encoded protein (designated "p94") consists of 821 amino acid residues (Mr 94,084) and shows significant sequence homology with both human mu-type (54%) and m-type (51%) large subunits. p94 can be divided into four domains (I-IV) as reported for the CANP large subunit family. Domains II and IV are potential cysteine protease and calcium-binding domains, respectively, and have sequences homologous to the corresponding domains of other CANP large subunits. However, domain I of p94 is significantly different from others. Moreover, p94 contains two unique sequences of 62 and 77 residues in domains II and III, respectively. In contrast to the ubiquitous expression of mu- and m-types, Northern blot analysis revealed that the mRNA for p94 exists only in skeletal muscle with none detected in other tissues including heart muscle and smooth muscles such as intestine.
Article
Two rel-containing cDNA clones were isolated from a library derived from the Daudi human cell line, which is known to express c-rel mRNA. Clone #1 appeared to contain the entire c-rel coding sequence, which differs from v-rel in having three additional N-terminal residues and 111 additional C-terminal residues. In addition, Clone #1 had an internal 32 amino acid exon not found in v-rel or in turkey c-rel. Clone #2 was truncated at its 5' end and did not contain this new exon. Analysis of a genomic clone of human c-rel revealed that the new exon was a portion of an inverted Alu repeat. The occurrence of potential splice sites and of open reading frames in the inverted consensus Alu sequence suggests that the incorporation of Alu fragments as potential coding exons could be a relatively common event in human mRNAs. Whether such messages can be translated is unknown: antiserum raised against a peptide at the predicted C-terminus of the c-rel protein precipitated p82hc-rel, but antiserum raised against a peptide located in the Alu exon did not.
Article
Two simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions are presented. Although they give no weights to different types of codon substitutions, these methods give essentially the same results as those obtained by Miyata and Yasunaga's and by Li et al.'s methods. Computer simulation indicates that estimates of synonymous substitutions obtained by the two methods are quite accurate unless the number of nucleotide substitutions per site is very large. It is shown that all available methods tend to give an underestimate of the number of nonsynonymous substitutions when the number is large.
Article
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
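The clustering idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the widely used Q-criterion formulation, which selects the same pair of neighbors as minimizing the total branch length, and the function name, the `node0`-style internal labels and the five-taxon distance matrix are all hypothetical choices made for the example.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return (child, parent, branch_length) edges of an unrooted tree.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> evolutionary distance.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: sum of distances from each node to all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Choose the pair of neighbors that minimizes the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        # Branch lengths from the two neighbors to their new internal node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"  # hypothetical label for the internal node
        next_id += 1
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))  # join the last two nodes
    return edges

# Illustrative additive distances for five OTUs (not data from the paper).
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(p): v for p, v in pairs.items()}
edges = neighbor_joining(["a", "b", "c", "d", "e"], dist)
for child, parent, length in edges:
    print(f"{child} - {parent}: {length}")
```

Because the example distances are additive, the recovered branch lengths reproduce every pairwise distance exactly; with real, noisy distance estimates the tree is only an approximation.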
Article
Natural selection operating within genomes will inevitably result in the appearance of DNAs with no phenotypic expression whose only 'function' is survival within genomes. Prokaryotic transposable elements and eukaryotic middle-repetitive sequences can be seen as such DNAs and thus no phenotypic or evolutionary function need be assigned to them.
Article
A quantitative population genetics model for the evolution of transposable genetic elements is developed. This model shows that "selfish" DNA sequences do not have to be selectively neutral at the organismic level; indeed, such DNA can produce major deleterious effects in the host organism and still spread through the population. The model can be used to explain the evolution of introns within eukaryotic genes; this explanation does not invoke a long-term evolutionary advantage for introns, nor does it depend on the hypothesis that eukaryotic gene structure may be an evolutionary relic. Transposable genes that carried information specifying sexual reproduction in the host organism would favor their own spread. Consequently, it is tempting to speculate that some of the genes controlling sex were originally selected as transposable elements.
Article
The DNA of higher organisms usually falls into two classes, one specific and the other comparatively nonspecific. It seems plausible that most of the latter originates by the spreading of sequences which had little or no effect on the phenotype. We examine this idea from the point of view of the natural selection of preferred replicators within the genome.
Article
Charcot-Marie-Tooth disease type 1A (CMT1A) is a common autosomal dominant demyelinating neuropathy that is associated with a 1.5 megabase (Mb) tandem DNA duplication in chromosome 17p11.2-p12. Hereditary neuropathy with liability to pressure palsies (HNPP, tomaculous neuropathy) is another less frequently diagnosed autosomal dominant neuropathy and is associated with a 1.5 Mb deletion in chromosome 17p11.2-12. Meiotic unequal crossover is a proposed mechanism for the generation of both the duplication in CMT1A and the deletion in HNPP. CMT1A-REP is a repeat that flanks the region which is duplicated/deleted in CMT1A/HNPP. The CMT1A-REP repeat sequence may mediate unequal crossover through misalignment of the homologous, repeated sequences. Three copies of the CMT1A-REP repeat are present on stably inherited CMT1A duplication chromosomes. In this report, molecular analysis in multiple patients detected three copies of the CMT1A-REP sequence on both inherited and de novo CMT1A duplication chromosomes, and one copy of the CMT1A-REP repeat on the deleted chromosome in both inherited and de novo HNPP. These observations support the hypothesis that a reciprocal recombination mechanism involving the CMT1A-REP is responsible for the generation of both the duplicated and deleted chromosomes, and document the first examples in humans of Mendelian syndromes resulting from the reciprocal products of unequal exchange involving large intra-chromosomal segments.
Article
Protein tyrosine phosphatases (PTPs) constitute a family of receptor-like and cytoplasmic signal transducing enzymes that catalyze the dephosphorylation of phosphotyrosine residues and are characterized by homologous catalytic domains. The crystal structure of a representative member of this family, the 37-kilodalton form (residues 1 to 321) of PTP1B, has been determined at 2.8 Å resolution. The enzyme consists of a single domain with the catalytic site located at the base of a shallow cleft. The phosphate recognition site is created from a loop that is located at the amino-terminus of an alpha helix. This site is formed from an 11-residue sequence motif that is diagnostic of PTPs and the dual specificity phosphatases, and that contains the catalytically essential cysteine and arginine residues. The position of the invariant cysteine residue within the phosphate binding site is consistent with its role as a nucleophile in the catalytic reaction. The structure of PTP1B should serve as a model for other members of the PTP family and as a framework for understanding the mechanism of tyrosine dephosphorylation.
Article
While the potential importance of mRNA stability to the regulation of gene expression has been recognized, the structures and mechanisms involved in the determination of individual mRNA decay rates have just begun to be elucidated, particularly in mammalian systems and yeast. It is now well established that mRNA decay is not a default process, in which an array of nonspecific nucleases degrades indiscriminately based on target size or ribosome protection of the substrate. Rather, like transcription, RNA processing, and translation, mRNA decay is a precise process dependent on a variety of specific cis-acting sequences and trans-acting factors. Entry into the pathways of mRNA decay is triggered by at least three types of initiating event: poly(A) shortening, arrest of translation at a premature nonsense codon, and endonucleolytic cleavage. Steps subsequent to poly(A) shortening or premature translational termination converge in a pathway that progresses from removal of the 5' cap to exonucleolytic digestion of the body of the mRNA. mRNA fragments generated by endonucleolytic cleavage are most likely removed by exonucleolytic decay as well, but these events have not been characterized in detail. Nucleases and other factors (including mRNA sequence elements and autoregulatory proteins) required for the promotion or inhibition of these pathways have been identified by both biochemical and genetic methods and systematic attempts to understand their respective roles have begun. mRNA sequences whose presence or absence has marked effects on mRNA decay rates include the ubiquitous cap and poly(A) tail, sequences that comprise endonuclease cleavage sites, and sequences that promote poly(A) shortening. The latter are found in the 3'-UTR (untranslated region) and in coding regions. 
Evidence that poly(A) stimulates translation initiation, that some destabilization sequences must be translated in order to function, and that premature translation termination promotes rapid mRNA decay indicates a close linkage between the elements regulating mRNA decay and components of the protein synthesis apparatus. This linkage, and other data, leads us to propose a model for a functional mRNP. In this model, interactions between factors associated with opposite ends of an mRNA stimulate translation initiation and minimize the rate of entry into the pathways of mRNA decay. Events that initiate mRNA decay are postulated to be those that can disrupt this functional complex and create substrates for exonucleolytic digestion.
Article
Protein tyrosine phosphatases (PTPs) regulate various physiological events in animal cells. They comprise a diverse family that is classified into two categories, receptor type and nonreceptor type. From the domain organization and phylogenetic tree, we have classified known PTPs into 17 subtypes (9 receptor-type and 8 nonreceptor-type PTPs), each characterized by a distinct organization of functional domains and by an independent cluster in the tree. The receptor-type PTPs are thought to be implicated in cell-cell adhesion through association with cell adhesion molecules. Since sponges are the most primitive multicellular animals and are thought to lack the cell cohesiveness and coordination typical of eumetazoans, cloning and sequencing of PTP cDNAs of Ephydatia fluviatilis (freshwater sponge) have been conducted by RT-PCR to determine whether or not sponges have PTP genes in their genomes. We have isolated nine PTPs, of which five are possibly receptor type. A phylogenetic tree including the sponge PTPs revealed that most of the gene duplications that gave rise to the 17 subtypes had been completed in the very early evolution of animals before the parazoan-eumetazoan split, the earliest branching among extant animal phyla. The family tree also revealed the rapid evolutionary rate of PTP subtypes in the early stage of animal evolution.
Article
Calpain, a calcium (Ca2+)-activated cysteine protease present in several somatic mammalian cells, has been demonstrated to mediate specific Ca2+-dependent reactions including cell fusion. Because spermatozoa have an absolute Ca2+ requirement for penetration of oocytes, we have postulated that calpain would also be found in mammalian spermatozoa. Here we show that whole sperm homogenate and cell fractions prepared from ejaculated human spermatozoa contain calpain activity. Specific calpain inhibitors impaired this proteolytic activity. Unlike the enzyme described in somatic cells, sperm calpain was mostly particulate in nature and its activity was maximal at pH 9.0. Presence of sperm calpain was confirmed by immunoblot analysis using specific anti-calpain I and anti-calpain II antibodies. A 67 kDa calpain II protein and a 75 kDa calpain I protein were detected. Spermatozoa also contain the endogenous calpain inhibitor, calpastatin. We detected 158.8 +/- 24.5 (mean +/- SD) fmol calpastatin/mg sperm protein. Immunoblot analysis using specific antibodies showed a 68 kDa calpastatin protein located in the cytosolic fraction. This is the first demonstration that a complete calpain-calpastatin system exists in mammalian spermatozoa. Because calpain is a unique effector system for calcium-dependent processes, our data reveal a novel mechanism by which calcium exerts its regulatory functions in spermatozoa.
Article
Alu elements have amplified in primate genomes through an RNA-dependent mechanism, termed retroposition, and have reached a copy number in excess of 500,000 copies per human genome. These elements have been proposed to have a number of functions in the human genome, and have certainly had a major impact on genomic architecture. Alu elements continue to amplify at a rate of about one insertion every 200 new births. We have found 16 examples of diseases caused by the insertion of Alu elements, suggesting that they may contribute to about 0.1% of human genetic disorders by this mechanism. The large number of Alu elements within primate genomes also provides abundant opportunities for unequal homologous recombination events. These events often occur intrachromosomally, resulting in deletion or duplication of exons in a gene, but they also can occur interchromosomally, causing more complex chromosomal abnormalities. We have found 33 cases of germ-line genetic diseases and 16 cases of cancer caused by unequal homologous recombination between Alu repeats. We estimate that this mode of mutagenesis accounts for another 0.3% of human genetic diseases. Between these different mechanisms, Alu elements have not only contributed a great deal to the evolution of the genome but also continue to contribute to a significant portion of human genetic diseases.
Article
Calpains are a superfamily of related proteins, some of which have been shown to function as calcium-dependent cysteine proteases. In mammals, eight different calpains have been identified. We report the identification of a new mammalian calpain gene, CAPN11. The predicted protein possesses the features typical of calpains including potential protease and calcium-binding domains. The CAPN11 mRNA exhibits a highly restricted tissue distribution with highest levels present in testis. Radiation hybrid mapping localized the gene to human chromosome 6, within a region mapped to p12. Phylogenetic analysis suggests that, in mammals, the predicted CAPN11 protein is most closely related to CAPN1 and CAPN2. However, of the calpain sequences available, the predicted CAPN11 sequence exhibits greatest homology to the chicken μ/m calpain. Thus CAPN11 may be the human orthologue of μ/m calpain. The discovery of this new calpain emphasizes the complexity of the calpain family, with members being distinguished on the basis of protease activity, calcium dependence, and tissue expression.
Article
A conserved mRNA degradation system, referred to as mRNA surveillance, exists in eukaryotic cells to degrade aberrant mRNAs. A defining aspect of aberrant transcripts is that the spatial relationship between the termination codon and specific downstream sequence information has been altered. A key, yet unknown, feature of the mRNA surveillance system is how this spatial relationship is assessed in individual transcripts. Two views have emerged to describe how discrimination between proper and improper termination might occur. In the first view, a surveillance complex assembles onto the mRNA after translation termination, and scans the mRNA in a 3' to 5' direction for a limited distance. If specific downstream sequence information is encountered during this scanning, then the surveillance complex targets the transcript for rapid decay. An alternate view suggests that the downstream sequence information influences how translation termination occurs. This view encompasses several ideas including: (a) The architecture of the mRNP can alter the rate of key steps in translation termination; (b) the discrimination between a proper and improper termination occurs via an internal, Upf1-dependent, timing mechanism; and (c) proper termination results in the restructuring of the mRNP to a form that promotes mRNA stability. This proposed model for mRNA surveillance is similar to other systems of kinetic proofreading that monitor the accuracy of other biogenic processes such as translation and spliceosome assembly.
Article
Since separation from fungi and plants, multicellular animals evolved a variety of gene families involved in cell-cell communication from a limited number of ancestral precursors by gene duplications in two separate periods of animal evolution. In the very early evolution of animals before the separation of parazoans and eumetazoans, animals underwent extensive gene duplications by which different subtypes (subfamilies) with distinct functions diverged. The multiplicity of members (isoforms) in the same subtype increased by further gene duplications (isoform duplications) in the first half of chordate evolution before the fish-tetrapod split; different isoforms are virtually identical in structure and function but differ in tissue distribution. From cloning and phylogenetic analyses of four subfamilies of the protein tyrosine kinase (PTK) family, we recently showed extensive isoform duplications in a limited period around or just before the cyclostome-gnathostome split. To obtain a reliable estimate for the divergence time of vertebrate isoforms, we isolated cDNAs encoding the protein tyrosine phosphatases (PTPs) from Branchiostoma belcheri, an amphioxus, Eptatretus burgeri, a hagfish, and Potamotrygon motoro, a ray. We obtained 33 different cDNAs in total, most of which belong to known PTP subfamilies. The phylogenetic analyses of five subfamilies based on the maximum likelihood method revealed frequent isoform duplications in a period around or just before the gnathostome-cyclostome split. An evolutionary implication is discussed in relation to the Cambrian explosion.
Article
Presence of transposable elements (TEs) in the human genome has profound effects on genome function, structure and evolution. TE mobility and inter-TE recombination are the origin of a large spectrum of mutations and genome reorganization leading to diseases. From the data provided by the Human Genome Project and from information on the detection and dynamics of TEs within and between species acquired during the last two decades, we now know that these elements are not only involved in mutagenesis but can also participate in many cellular functions including recombination, gene regulation, protein-coding RNA messages and, possibly, cellular stress response and centromere function. TEs also promote a general genome shuffling process that has been important for the evolution of several gene families and for the development of new regulatory pathways.
Article
Structural genomics aims to use high-throughput structure determination and computational analysis to provide three-dimensional models of every tractable protein. The process of choosing proteins for experimental structure characterization is known as target selection. In this nomenclature, the targets are regions of proteins to be studied by crystallography or NMR. Selection of the targets is principally a computational process of restricting candidate proteins to those that are tractable and of unknown structure, and prioritizing according to expected interest and accessibility.
Article
The microtubule-associated protein tau has been implicated in several neurodegenerative diseases, grouped as tauopathies. To search for tau-associated proteins, the two-hybrid system was used with tau as a bait and an adult human brain cDNA library as a source of putative interacting proteins. We have identified two positive clones consisting of an Alu-derived amino acid sequence that binds to tau and shows moderate homology with a motif found in several neuronal proteins related to neurodegenerative disorders. We have also demonstrated that the Alu-derived motif interacts in vitro with tau and may be involved in modulation of its phosphorylation. These findings suggest the existence of tau-binding proteins that are able to bind to tau through their Alu-derived sequence in a direct way. The possible interaction of these proteins with tau could play a role in its cellular localization, regulate the amount of phosphorylated tau and also be involved in the pathological processes of tauopathies.
Article
Alu repetitive elements are found in approximately 1.4 million copies in the human genome, comprising more than one-tenth of it. Numerous studies describe exonizations of Alu elements, that is, splicing-mediated insertions of parts of Alu sequences into mature mRNAs. To study the connection between the exonization of Alu elements and alternative splicing, we used a database of ESTs and cDNAs aligned to the human genome. We compiled two exon sets, one of 1176 alternatively spliced internal exons, and another of 4151 constitutively spliced internal exons. Sixty-one alternatively spliced internal exons (5.2%) had a significant BLAST hit to an Alu sequence, but none of the constitutively spliced internal exons had such a hit. The vast majority (84%) of the Alu-containing exons that appeared within the coding region of mRNAs caused a frame-shift or a premature termination codon. Alu-containing exons were included in transcripts at lower frequencies than alternatively spliced exons that do not contain an Alu sequence. These results indicate that internal exons that contain an Alu sequence are predominantly, if not exclusively, alternatively spliced. Presumably, evolutionary events that cause a constitutive insertion of an Alu sequence into an mRNA are deleterious and selected against.
Article
Casein kinase 2 (CK2) is a tetrameric enzyme constitutively expressed in all eukaryotic tissues. The two known isoforms of the catalytic subunit, CK2alpha and CK2alpha', have been reported to have distinct tissue-dependent subcellular distributions. We recently described a third isoform of the catalytic subunit, designated CK2alpha", which is highly expressed in liver. Immunoblot analysis of HuH-7 human hepatoma cell fractions as well as immunofluorescent microscopy revealed that CK2alpha" was exclusively localized to the nucleus and preferentially associated with the nuclear matrix. CK2alpha and CK2alpha' were found in nuclear, membrane, and cytosolic compartments. Deletion of the carboxy-terminal 32 amino acids from the CK2alpha" sequence resulted in release of the truncated green fluorescent protein fusion protein from the nuclear matrix and redistribution to both the nucleus and the cytoplasm. Demonstration that the carboxy terminus is necessary but not sufficient for nuclear retention indicates that the underlying mechanism of CK2alpha" nuclear localization is dependent on the secondary structure of the holoenzyme directed by the carboxy-terminal sequence.
Article
In eukaryotes, an elaborate set of mechanisms has evolved to ensure that the multistep process of gene expression is accurately executed and adapted to cellular needs. The mRNA surveillance pathway works in this context by assessing the quality of mRNAs to ensure that they are suitable for translation. mRNA surveillance facilitates the detection and destruction of mRNAs that contain premature termination codons by a process called nonsense-mediated decay. Moreover, recent studies have shown that a distinct mRNA surveillance process, called nonstop decay, is responsible for depleting mRNAs that lack in-frame termination codons. mRNA surveillance thereby prevents the synthesis of truncated and otherwise aberrant proteins, which can have dominant-negative and other deleterious effects.
Article
A pseudogene is a gene copy that does not produce a functional, full-length protein. The human genome is estimated to contain up to 20,000 pseudogenes. Although much effort has been devoted to understanding the function of pseudogenes, their biological roles remain largely unknown. Here we report the role of an expressed pseudogene (regulation of messenger-RNA stability) in a transgene-insertion mouse mutant exhibiting polycystic kidneys and bone deformity. The transgene was integrated into the vicinity of the expressing pseudogene of Makorin1, called Makorin1-p1. This insertion reduced transcription of Makorin1-p1, resulting in destabilization of Makorin1 mRNA in trans by way of a cis-acting RNA decay element within the 5' region of Makorin1 that is homologous between Makorin1 and Makorin1-p1. Either Makorin1 or Makorin1-p1 transgenes could rescue these phenotypes. Our findings demonstrate a specific regulatory role of an expressed pseudogene, and point to the functional significance of non-coding RNAs.