Figure 1 - uploaded by Gwênlyn Glusman
The three types of genes, and the sources of information used in gene prediction. Stars indicate the functional products. Adapted from [1].  

Source publication
Conference Paper
The genome is rich in information, posing analysis and visualization challenges, while offering enticing opportunities to discover functional structures. Interestingly, half the human genome is routinely discarded from most analyses as "junk" DNA, when interspersed repeats are masked. The GESTALT Workbench is an online tool for large-scale...

Contexts in source publication

Context 1
... most common type of gene prediction tools ("ab initio") relies on the identification of sequence signals and compositional constraints that reflect the various processes leading from the genomic sequence to the functional protein product, e.g. promoter identification (transcriptional control), splicing signals and constraints, and coding sequence constraints (Figure 1, vertically striped arrows). Orthogonally, the similarity of the sequence to those in the ever-expanding sequence databases, at the genomic, transcript or protein levels, can be used to identify expressed elements in the sequence (Figure 1, white arrow) and both functional and dysfunctional elements in the sequence (Figure 1, gray arrows). ...
Context 2
... identification (transcriptional control), splicing signals and constraints, and coding sequence constraints (Figure 1, vertically striped arrows). Orthogonally, the similarity of the sequence to those in the ever-expanding sequence databases, at the genomic, transcript or protein levels, can be used to identify expressed elements in the sequence (Figure 1, white arrow) and both functional and dysfunctional elements in the sequence (Figure 1, gray arrows). Examples of these include exons and interspersed repeats, respectively. ...
Context 4
... a third orthogonal concept was proposed [2,3], based on the identification of "transcriptional footprints" (Figure 1, black arrow). These methods detect statistically significant skews in transcribed sequences. ...
Context 5
... classical "protein-coding" gene functions by producing an RNA intermediary that is processed into a "messenger RNA" (mRNA), which is then translated into a functional protein (Figure 1, left column). A second type of transcripts has been described, lacking protein-coding potential: these "RNA genes" are functional in their processed, potentially spliced form (Figure 1, center column). ...
Context 6
... classical "protein-coding" gene functions by producing an RNA intermediary that is processed into a "messenger RNA" (mRNA), which is then translated into a functional protein (Figure 1, left column). A second type of transcripts has been described, lacking protein-coding potential: these "RNA genes" are functional in their processed, potentially spliced form (Figure 1, center column). Following splicing, the usual fate of the spliced intron lariats is degradation, and yet some protein-coding genes act as "hosts" for additional functional elements embedded within non-coding exons and even in introns, e.g. ...
Context 7
... third type of gene was identified [8], which produces non-functional, spliced RNA forms that are quickly degraded. In these "stencil" genes [1], the functional elements are produced by controlled cleavage of the introns: the "main" transcript is therefore just a framework for producing intron lariats (Figure 1, right column). An important implication of this is that, in contrast with protein coding and RNA genes, the exons of stencil genes are not expected to be conserved. ...
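
The "transcriptional footprints" idea in Context 4 rests on compositional skews along a sequence. The excerpts do not say which statistics or window sizes the cited methods use, so the following is only a minimal sketch under stated assumptions: it uses the common strand-skew definitions (G-C)/(G+C) and (T-A)/(T+A), a fixed sliding window, and hypothetical function names (`strand_skews`, `windowed_skews`). It does not perform any significance test.

```python
from typing import List, Tuple


def strand_skews(seq: str) -> Tuple[float, float]:
    """Return (GC skew, TA skew) for one sequence window.

    GC skew = (G - C) / (G + C); TA skew = (T - A) / (T + A).
    A skew is reported as 0.0 when its denominator is zero.
    These definitions are an assumption, not taken from the source.
    """
    s = seq.upper()
    g, c, t, a = s.count("G"), s.count("C"), s.count("T"), s.count("A")
    gc = (g - c) / (g + c) if g + c else 0.0
    ta = (t - a) / (t + a) if t + a else 0.0
    return gc, ta


def windowed_skews(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Slide a fixed-size window along seq; report (start, GC skew, TA skew)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    out = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ta = strand_skews(seq[start:start + window])
        out.append((start, gc, ta))
    return out
```

A run of windows whose skews share the same sign would be a candidate region for closer inspection; deciding whether such a run is statistically significant would need a background model, which this sketch leaves out.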

Citations

... This method has successfully confirmed the coding status of overlapping genes hidden in the supposedly non-coding frames of regular mitochondrial protein coding genes (Seligmann, 2012e), overlapping genes encrypted in the 3'-to-5' direction of these genes (Seligmann, 2012a) and overlapping genes coded by tetracodons (Seligmann, 2012d), quadruplet codons consisting of four, rather than three nucleotides (Baranov et al., 2009; Gonzalez et al., 2012). A similar approach, based on nucleotide contents as related to DNA single-strandedness, has also been used in other contexts to detect protein coding genes (Glusman et al., 2006; Glusman, 2009). ...
Article
GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST), no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, predicted secondary structures resemble their putative GenBank homologue's. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, and circular code analyses of codon contents based on frame redundancy), confirm nucleotide-exchange-encrypted overlapping genes. Methods converge on which genes are most probably active, and which not, and this for the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges.
... references Farabaugh (1996); Glusman (2006, 2009); Hwang et al. (1995); Seligmann and Pollock (2003) ...
Article
Suppressor tRNAs induce expression of additional (off-frame) genes coded by stopless genetic codes without lengthening genomes, decreasing DNA replication costs. RNA 3'-to-5' polymerization by tRNAHis guanylyltransferase suggests further cryptic code: hypothetical 'invertases' polymerizing in the 3'-to-5' direction, advancing in the 5'-to-3' direction would produce non-complementary RNA templated by regular genes, with different coding properties. Assuming 'invertase' activity, BLAST analyses detect GenBank-stored RNA ESTs and proteins (some potentially coding for the hypothesized invertase) for human mitochondrial genes. These peptides' predicted secondary structures resemble their GenBank homologues'. 3'-to-5' EST lengths increase with their self-hybridization potential: Single-stranded RNA degradation perhaps limits 3'-to-5' elongation. Independent methods confirm predicted 3'-to-5' overlapping genes: (a) Presumed 3'-to-5' overlapping genes avoid codons belonging to circular codes; (b) Spontaneous replicational deamination (mutation) gradients occur at 3rd codon positions, unless these are involved in overlap coding, because mutations are counter selected in overlapping genes. Tests a and b converge on predicted 3'-to-5' gene expression levels. Highly expressed ones include also fewer stops, and mitochondrial genomes (in Primates and Drosophila) adapt to avoid dependence of 3'-to-5' coding upon antitermination tRNA activity. Secondary structure, circular code, gradient and coevolution analyses yield each clear positive results independently confirming each other. These positive results (including physical evidence for 3'-to-5' ESTs) indicate that 3'-to-5' coding and invertase activity is an a priori improbable working hypothesis that cannot be dismissed. Note that RNAs produced by invertases potentially produce triple-stranded DNA:RNA helices by antiparallel Hoogsteen pairings at physiological pH, as previously observed for mitochondrial genomes.
Article
Weak triplet codon-anticodon interactions render ribosome-free translation unlikely. Some modern tRNAs read quadruplet codons (tetracodons), suggesting vestigial ribosome-free translation. Here, mitochondrial genomes are explored for tetracoded overlapping protein coding (tetra)genes. Occasional single tetracodons within regular mitochondrial genes coevolve positively/negatively with antisense tRNAs with predicted reduced/expanded anticodons (depending on taxon), suggesting complex tetra-decoding mechanisms. Transcripts of antisense tRNAs with unusual anticodons are more abundant than those of homologues with regular anticodons. Assuming overlapping tetracoding with silent 4th tetracodon position, BLAST aligns 10 putative tetragenes spanning 17% of regular human mitochondrial protein coding tricodons with 14 GenBank proteins. Various tests including predicted peptide secondary structures, 3rd codon position (of the regular main frame of the protein coding gene) conservation against replicational deamination mutation gradients, and circular code usage (overlapping genes avoid using circular code codons) confirm tetracoding in these overlapping tetragenes with silent 4th position, but not for BLAST-predicted tetragenes assuming silent 2nd or 3rd positions. This converges with tetradecoding mechanisms that are more compatible with silent 4th, than at other, tetracodon positions. Tetracoding increases with (a) GC-contents, perhaps conserved or switched on in high temperature conditions; (b) usage of theoretically predicted 'tessera' tetracodons; (c) 12S rRNA stability; and (d) antisense tRNA numbers with predicted expanded anticodons. Most detected tetragenes are not evolutionarily conserved and apparently reflect specific, transient adaptations. Tetracoding increases with mammal longevity.