Retrotransposon-Gene Associations Are Widespread Among D. melanogaster Populations

Lucia F. Franchini, Eric W. Ganko, and John F. McDonald
Department of Genetics, University of Georgia, Athens
We have surveyed 18 natural populations of Drosophila melanogaster for the presence of 23 retrotransposon-gene association alleles (i.e., the presence of an LTR retrotransposon sequence in or within 1,000 bp of a gene) recently identified in the sequenced D. melanogaster genome. The identified associations were detected only in the D. melanogaster populations. The majority (61%) of the identified retrotransposon-gene associations were present only in the sequenced strain in which they were first identified. Thirty percent of the associations were detected in at least one of the natural populations, and 9% of the associations were detected in all of the D. melanogaster populations surveyed. Sequence analysis of an association allele present in all populations indicates that selection is a significant factor in the spread and/or maintenance of at least some retroelement-gene associations in D. melanogaster.
Introduction
Retrotransposons are eukaryotic transposable elements having a life cycle that includes reverse transcription of an RNA intermediate (Boeke and Stoye 1997). Retrotransposons constitute a significant fraction of most eukaryotic genomes. For example, in species with relatively large genomes, such as humans, it is estimated that nearly half of the genome is composed of retrotransposon sequences (Venter et al. 2001; Lander et al. 2001). The abundance of retrotransposon sequences in the genomes of some plant species is even higher. For example, at least 50% of the maize genome (SanMiguel et al. 1996) and approximately 90% of the genome of some species of lilies (Flavell 1986) are composed of retrotransposons. In species with smaller genomes, such as yeast, nematodes, and fruit flies, the percentage of retrotransposon sequences in the genome is much lower, typically ranging from 1% to 10% (Cherry et al. 1997; Celniker et al. 2002; Kaminker et al. 2002; Kidwell 2002).
While the adaptive significance of transposable elements (TEs) was assumed by the researchers who first discovered them (e.g., McClintock [1951] and Shapiro [1977]), subsequent theoretical and population genetic studies (e.g., Hickey [1982] and Charlesworth and Langley [1989]) called this assumption into question and proposed that retrotransposons and other transposable elements are more appropriately viewed as parasite-like sequences that provide little or no adaptive benefit to their hosts (e.g., Charlesworth [1988]). More recent findings in molecular biology and genomics indicate that this negative indictment may have been premature. Presently, there is a large and growing number of examples of retrotransposon sequences located in or near genes that have been shown to have a significant effect on gene expression (Britten 1996; Brosius 1999; Landry, Medstrand, and Mager 2001; Medstrand, Landry, and Mager 2001; Sorek, Ast, and Graur 2002; Lerman et al. 2003).
The growing availability of the complete sequence of a variety of genomes is providing an unprecedented opportunity to more objectively assess the contribution of transposable element sequences to gene structure and function. The genomic approach typically begins with the identification of a TE-gene association (i.e., the occurrence of a TE sequence in or near a gene) in a sequenced genome (Maside et al. 2002; Petrov et al. 2003). For example, it has recently been shown that the majority of long terminal repeat (LTR) retrotransposon sequences in C. elegans are located in or near genes (Ganko, Fielman, and McDonald 2001; Ganko et al. 2003). Likewise, recent computational analyses of the sequenced human genome indicate that retrotransposon sequences are located in the coding regions of at least 4% (Nekrutenko and Li 2001) and in the promoter regions of at least 25% (Jordan et al. 2003) of human genes. However, the mere identification of a TE-gene association in a sequenced genome is not, in itself, indicative of adaptive significance because it may only represent an insertional mutant unique to the sequenced strain. If, on the other hand, a given TE-gene association is shown to be in high frequency or fixed within a species or among closely related species, this may be taken as putative evidence that the association is of functional significance. The selective hypothesis can be subsequently tested for each candidate adaptive association by sequence and/or molecular analyses.
Our laboratory has long been interested in the evolution and significance of transposable elements in Drosophila (e.g., McDonald [1990] and McDonald [1993]). We have recently initiated a study to identify LTR retrotransposon-gene associations (i.e., the presence of an LTR retrotransposon sequence in or within 1,000 bp of a gene) in the sequenced Drosophila melanogaster genome (Ganko et al., unpublished data). We have subsequently initiated a survey of the presence of these LTR retrotransposon-gene associations in natural populations. In this paper, we report the presence of 23 previously unidentified LTR retrotransposon-gene associations in the sequenced Drosophila genome. Using a polymerase chain reaction (PCR)-based approach, we have searched for the presence of each of these associations in 18 natural populations of D. melanogaster. The results indicate that the majority of the associations identified in the sequenced strain involve recently inserted elements and that these associations are endemic to the sequenced strain. In contrast, those associations that were
Key words: Long terminal repeats, retrotransposons, transposable
elements, Drosophila melanogaster, genome evolution.
E-mail address: mcgene@uga.edu.
Mol. Biol. Evol. 21(7):1323–1331. 2004
doi:10.1093/molbev/msh116
Advance Access publication March 10, 2004
Molecular Biology and Evolution vol. 21 no. 7 © Society for Molecular Biology and Evolution 2004; all rights reserved.
Downloaded from http://mbe.oxfordjournals.org/ by guest on June 4, 2013
Table 1
LTR-Gene Associations Identified in the Drosophila melanogaster Genome
Element Family  TE Size  Proximity  Location     Gene      Putative Gene Functions
mdg3            5520 bp  0 bp       intron       para      Voltage-sensitive sodium channel; DDT susceptibility/resistance
412             7520 bp  0 bp       intron       Deaf 1    Deformed epidermal autoregulatory factor-1
Blastopia       4977 bp  0 bp       intron       Bazooka   Protein kinase C binding
Blastopia       3247 bp  0 bp       intron       CG6352    odsH-site homeobox transcription factor
297             5899 bp  0 bp       exon-intron  Cyp309a2  Cytochrome P450
mdg3            267 bp   0 bp       3' UTR       mRpL48    Mitochondrial ribosomal protein L48
Roo             9049 bp  0 bp       3' UTR       CG9527    Pristanoyl-CoA oxidase
Roo             9150 bp  0 bp       5' UTR       CG12885   Unknown
Transpac        329 bp   0 bp       5' UTR       CG7900    Fatty acid amide hydrolase
Quasimodo       659 bp   8 bp       3'           CycE      Cyclin-dependent protein kinase, regulator
297             414 bp   68 bp      5'           Cyp309a1  Cytochrome P450
Quasimodo       658 bp   171 bp     5'           HSP       Heat shock protein
412             3731 bp  491 bp     3'           DHPR      Dihydropteridine reductase
297             5622 bp  939 bp     5'           Ab1       Protein tyrosine kinase
Roo             9212 bp  0 bp       intron       DopR      Dopamine receptor
297             5990 bp  0 bp       intron       Syn       Neurotransmitter release
Roo             427 bp   0 bp       5' UTR       CG18446   Unknown zinc finger domain
Quasimodo       659 bp   177 bp     3'           SPN3      Serpins
Quasimodo       659 bp   256 bp     5'           CG9333    Unknown
297             413 bp   283 bp     3'           Ken       Transcription factor
297             413 bp   293 bp     5'           TM4SF     Integral membrane protein; plasma membrane
Quasimodo       281 bp   207 bp     5'           CTCF      Transcription factor
Beagle          593 bp   458 bp     3'           CG17514   Translation activator
NOTE.—Boldface type highlights retroelement and gene names.
Table 2
Presence (+) or Absence (-) of Retroelement Sequences Associated with 23 Genes in 18 Drosophila melanogaster Strains Representing Natural Populations and in the Laboratory Stock y1; cn1 bw1 sp1
Columns: (1) Quasimodo/CTCF; (2) Beagle/CG17514; (3) roo/DopR; (4) 297/Syn; (5) roo/CG18446; (6) 297/ken (a); (7) 297/TM4SF (a); (8) Quasimodo/spn3 (b); (9) Quasimodo/CG9333 (b); (10) Quasimodo/CycE
Geographic Area   No.  Strain                      1   2   3   4   5   6   7   8   9   10
Laboratory stock  1    y1; cn1 bw1 sp1             +   +   +   +   +   +   +   +   +   +
Americas          2    US (Athens)                 +   +   +   -   +   +   +   -   -   -
Americas          3    US (California)             +   +   -   +   +   +   +   -   -   -
Americas          17   Chile (Santiago)            +   +   -   -   -   +   +   -   -   -
Americas          11   Antilles (Rouge)            +   +   -   -   +   +   +   +   +   -
Europe            5    Germany (Freiburg)          +   +   -   +   -   -   -   -   -   -
Europe            19   France (Bordeaux)           +   +   -   -   -   +   +   +   +   -
Europe            12   Italy (Frascati)            +   +   -   -   -   -   -   -   -   -
Europe            13   Russia (Dilizhan)           +   +   -   +   +   -   -   -   -   -
Europe            18   Russia (Dushanbe)           +   +   -   -   +   -   -   +   +   -
Africa            4    South Africa (Cape Town)    +   +   -   -   -   +   +   -   -   -
Africa            8    Kenya (Kenia)               +   +   -   -   -   -   -   +   +   -
Africa            6    Niger (Niamey)              +   +   -   -   -   -   -   +   +   -
Africa            9    Swaziland (Mbabane)         +   +   -   -   -   -   -   -   -   -
Africa            7    Congo (Dimonika)            +   +   -   -   -   -   -   +   +   -
Africa            14   Congo (Brazzaville)         +   +   -   -   -   +   +   -   -   -
Africa            15   Ivory Coast (Tai Forest)    +   +   -   -   -   +   +   +   +   -
Oceania           16   Australia (Melbourne)       +   +   -   -   +   +   +   -   -   -
Asia              10   India (Rohtak)              +   +   -   -   +   -   -   +   +   -
(a) ken and TM4SF flank the 297 LTR.
(b) spn3 and CG9333 flank the Quasimodo LTR.
found to be widespread among populations are composed
of relatively small retrotransposon fragments that are
presumably of much older origin. We present evidence of
a selective sweep of a retrotransposon-gene association
present in all D. melanogaster populations surveyed.
Methods
Identification of LTR-Gene Associations in the D. melanogaster Genome
Using previously identified Drosophila LTR retrotransposon sequences as queries (Bowen and McDonald 2001), sequence retrieval was initiated via BlastN searches (default parameters [Altschul et al. 1997]) against the BDGP (http://www.fruitfly.org) and GenBank (http://www.ncbi.nlm.nih.gov) databases. Results with e-values less than e-10 were annotated on the corresponding genomic clone with MacVector version 7.0 (http://www.gcg.com), and nearby genes were noted. Selected genes within 1 kb of a TE were searched with BLAST against NCBI's EST database and mapped along with predicted transcript structures from FlyBase (http://www.flybase.org).
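The criterion of "in or within 1,000 bp of a gene" amounts to a simple interval test. The Python sketch below is illustrative only and is not the procedure used in this study, in which associations were annotated manually in MacVector. The `Interval` type, the function names, and the half-open coordinate convention are assumptions introduced for the example, and the coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) on one sequence (hypothetical type)."""
    start: int
    end: int


def proximity_bp(te: Interval, gene: Interval) -> int:
    """Return the gap in bp between a TE and a gene, or 0 if they overlap."""
    if te.end <= gene.start:      # TE lies entirely before the gene
        return gene.start - te.end
    if gene.end <= te.start:      # TE lies entirely after the gene
        return te.start - gene.end
    return 0                      # TE is inside, or overlaps, the gene


def is_association(te: Interval, gene: Interval, max_gap: int = 1000) -> bool:
    """True when the TE is in the gene or within max_gap bp of it."""
    return proximity_bp(te, gene) <= max_gap


# Invented coordinates: a 281-bp fragment ending 207 bp before a gene.
te = Interval(1000, 1281)
gene = Interval(1488, 5000)
print(proximity_bp(te, gene), is_association(te, gene))
```

A proximity of 0 bp in this sketch corresponds to the "0 bp" entries of table 1, where the element lies within the predicted transcriptional boundary of the gene.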
Drosophila Strains
D. melanogaster strains established from 20 to 30 wild-collected females from Congo (Dimonika), Niger (Niamey), Swaziland (Mbabane), Kenya (Kenia), South Africa (Cape Town), India (Rohtak), Russia (Dushanbe and Dilizhan), Congo (Brazzaville), Ivory Coast (Tai Forest), Australia (Melbourne), Chile (Santiago de Chile), and France (Bordeaux) populations were obtained from Jean David, CNRS Gif-sur-Yvette, France. Strains from Germany (Freiburg), Italy (Frascati), and The Antilles (Rouge) were obtained from Nikolaj Junakovic, Università la Sapienza, Rome, Italy. The California and Athens (Georgia, USA) strains were provided by Daniel Promislow, University of Georgia. The D. melanogaster sequenced strain y1; cn1 bw1 sp1 was obtained from the Bloomington, Indiana, stock center.
Polymerase Chain Reaction
PCR primers were designed with MacVector version 7.0 and synthesized by Integrated DNA Technologies (Coralville, Iowa). The primer sequences used in each reaction are displayed in table 3 of Supplementary Material online. Three replicate PCR reactions were carried out per strain, per gene-retrotransposon association. The DNA used in each reaction (100 ng) was separately isolated from 10 flies (five males and five females per isolation) according to previously described methods (Gloor et al. 1993). PCR products for each primer pair were amplified in a 25-µl reaction containing 3 mM MgCl2, 10×
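Table 2
Extended
Columns: (11) 412/Deaf1; (12) 297/Cyp309a1; (13) 297/Ab1; (14) Quasimodo/CG16954; (15) mdg3/para; (16) blastopia/Baz; (17) 412/DHPR; (18) mdg3/mRpL48; (19) Transpac/CG7900; (20) blastopia/CG6352; (21) roo/CG9527; (22) roo/CG12885; (23) 297/Cyp309a2
Rows follow the strain order of the first part of table 2.
No.  11  12  13  14  15  16  17  18  19  20  21  22  23
1    +   +   +   +   +   +   +   +   +   +   +   +   +
2    -   -   -   -   -   -   -   -   -   -   -   -   -
3    -   -   -   -   -   -   -   -   -   -   -   -   -
17   -   -   -   -   -   -   -   -   -   -   -   -   -
11   -   -   -   -   -   -   -   -   -   -   -   -   -
5    -   -   -   -   -   -   -   -   -   -   -   -   -
19   -   -   -   -   -   -   -   -   -   -   -   -   -
12   -   -   -   -   -   -   -   -   -   -   -   -   -
13   -   -   -   -   -   -   -   -   -   -   -   -   -
18   -   -   -   -   -   -   -   -   -   -   -   -   -
4    -   -   -   -   -   -   -   -   -   -   -   -   -
8    -   -   -   -   -   -   -   -   -   -   -   -   -
6    -   -   -   -   -   -   -   -   -   -   -   -   -
9    -   -   -   -   -   -   -   -   -   -   -   -   -
7    -   -   -   -   -   -   -   -   -   -   -   -   -
14   -   -   -   -   -   -   -   -   -   -   -   -   -
15   -   -   -   -   -   -   -   -   -   -   -   -   -
16   -   -   -   -   -   -   -   -   -   -   -   -   -
10   -   -   -   -   -   -   -   -   -   -   -   -   -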
PCR buffer supplied by Pierce (Rockford, Ill.), 2% DMSO, 0.2 mM dNTPs, 0.5 µM of each primer, and 0.5 U of Taq DNA polymerase (also supplied by Pierce). The program consisted of an initial incubation at 94°C for 5 min, followed by 35 cycles each consisting of 30 s at 94°C, 1 min at the annealing temperature specific for each primer pair (see table 3 in Supplementary Material online), and 1 min (per kb of PCR product) at 72°C, and then a final extension of 10 min at 72°C. All reactions were carried out in a Hot Top-equipped Robocycler Gradient 96 (Stratagene, La Jolla, Calif.). Twenty-five microliters of each PCR product was separated on a 1% agarose gel in 0.5× TBE running buffer containing 0.25 µg/mL ethidium bromide. Gels were imaged by UV transillumination.
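Because the extension step is scaled to product length, the nominal run time of the cycling program depends on the expected amplicon. The following Python sketch works through that arithmetic using only the step durations stated above. The function names are hypothetical, and ramp times between temperatures are ignored, so the result is a lower bound rather than an actual instrument run time.

```python
def extension_seconds(product_bp: int, seconds_per_kb: float = 60.0) -> float:
    """Extension time at 72 C: 1 min per kb of expected PCR product."""
    return seconds_per_kb * product_bp / 1000.0


def program_minutes(product_bp: int, cycles: int = 35) -> float:
    """Nominal program length in minutes, ignoring temperature ramping."""
    initial_denaturation = 5 * 60   # 94 C for 5 min
    denaturation = 30               # 94 C for 30 s, every cycle
    annealing = 60                  # primer-specific temperature for 1 min
    final_extension = 10 * 60       # 72 C for 10 min
    per_cycle = denaturation + annealing + extension_seconds(product_bp)
    total = initial_denaturation + cycles * per_cycle + final_extension
    return total / 60.0


# A 1-kb product gives 150 s per cycle, so 35 cycles plus the two
# fixed steps come to 102.5 min.
print(program_minutes(1000))
```

For the 1,805-bp Quasimodo F + CTCF R product described in the legend of figure 1, the same calculation gives an extension step of about 108 s per cycle.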
Sequencing
PCR products were agarose gel purified (Qiaquick, Qiagen, Valencia, Calif.) and cloned with TOPO TA (Invitrogen, Carlsbad, Calif.). DNA sequencing was performed in the Molecular Genetics Instrumentation Facility at the University of Georgia. Sequencing primers and primers used for amplifying sequenced PCR products, when different from the association primers, are shown in table 3 of Supplementary Material online. Sequence readouts were checked manually for accurate base calls and were assembled with Sequencher (Gene Codes, Ann Arbor, Mich.). The length of the region analyzed is given according to the expected length in the sequenced strain, and the polymorphic site positions are located relative to this reference sequence. Nucleotide sequences were aligned using ClustalW (MacVector 7.0). As a control for PCR errors, we also sequenced the published y1; cn1 bw1 sp1 strain. Population genetic parameters were obtained using DnaSP version 3.95.7 (Rozas and Rozas 1999).
Results
Twenty-three new LTR retrotransposon-gene associations were selected for analysis. Genome sequence analysis resulted in the identification of over 300 LTR retrotransposon-gene associations (Ganko et al., unpublished data). These associations consisted of full-length and smaller, fragmented LTR-retrotransposon
FIG. 1.—Examples of PCR analyses used to detect the presence of LTR retrotransposon-gene association across 18 representative natural
populations of D. melanogaster and the sequenced strain. Three PCR reactions were performed per strain, per gene; six representative strains are shown.
First lane: DNA ladder. (A) PCR products showing that the Quasimodo LTR fragment (281 bp) located 207 bp upstream of the CTCF gene is present in all D. melanogaster populations analyzed. Q = product from Quasimodo-specific primers (expected size = 186 bp); C = product from CTCF-specific primers (expected size = 395 bp); QC = product from Quasimodo F + CTCF R primers (expected size = 1,805 bp). (B) PCR products showing that the Beagle fragment (593 bp) located 458 bp 3' to the heterochromatic CG17514 gene is present in all D. melanogaster populations analyzed. B = Beagle primers PCR product (expected size = 562 bp); G = CG17514 primers PCR product (expected size = 250 bp); BG = CG17514 F + Beagle R primers PCR product (expected size = 3,293 bp).
sequences located both within genes and adjacent to genes. The 23 associations selected to be representative of the variety of sizes and locations of gene-associated element sequences are shown in table 1. The sequences analyzed include six full-length LTR retrotransposons, nine LTRs, and eight fragments, ranging in size from 281 bp to 9212 bp. There were two instances in which a single retrotransposon sequence was associated with two genes (a 297 LTR was flanked by the Ken and TM4SF genes; a 659-bp Quasimodo LTR was flanked by the spn3 and CG9333 genes).
LTR retrotransposon sequences were located within the FlyBase-predicted transcriptional boundary (including untranslated regions [UTRs], introns, and exons) of 12 genes. Six of these 12 associations were located within introns. Of the remaining six associations, two were in the 3' UTR, three were in the 5' UTR, and one spanned part of an exon and intron. Of the 11 LTR retrotransposon sequences located outside of gene boundaries, six were located 5' of the gene (ranging from 68 to 939 bp upstream of the transcriptional start site) and five were located 3' of the gene (ranging from 8 to 491 bp downstream of the polyA addition site).
The majority of the LTR retrotransposon-gene associations identified in the sequenced genome were not detected in natural populations. Two sets of PCR primers were designed for each retrotransposon-gene association, one to amplify a portion of the associated gene and the other to amplify a portion of the associated retrotransposon sequence. Appropriate pairs of these gene and retrotransposon primers were combined to detect the presence or absence of each retrotransposon-gene association in strains representing 18 geographically dispersed populations of D. melanogaster.
More than half (14 of 23, or 61%) of the associations were detected only in the sequenced strain (tables 1 and 2). Of these, the majority of the associated elements (80%) were full-length or nearly full-length in size and had identical or nearly identical LTRs (>99% sequence identity). This is consistent with the possibility that these elements inserted in the recent evolutionary past and, thus, represent mutational polymorphism within the sequenced strain. Seven of the 23 associations were detected in some but not all of the D. melanogaster populations (tables 1 and 2). Some of these alleles were found to display slight indel variation in the size of the associated retrotransposon sequence (fig. 4).
Two associations were detected in all 18 D. melanogaster populations. Two of the 23 associations (9%) were detected in all 18 of the D. melanogaster populations surveyed (tables 1 and 2 and fig. 1). One of these associations is a promoter-containing 281-bp Quasimodo LTR fragment located 207 bp 5' to the CTCF (CG8591) gene (fig. 2A). The second is a Beagle fragment (593 bp) located 458 bp 3' to the CG17514 gene (fig. 3). The CTCF (CG8591) gene maps to a euchromatic region
FIG. 2.—(A) Structure of the Quasimodo-CTCF allele in the sequenced D. melanogaster genome. The 281-bp fragment of Quasimodo LTR is
associated with the CTCF gene in the D. melanogaster sequenced genome. Arrows represent the position of the primers used to detect the associations
in the populations and species studied. The area sequenced is boxed. Two alternative transcripts have been detected for this gene, Ra and Rb, one
composed of exons 1 (Ra), 2(Ra), 3 and 4 and the other composed of exon 1 (Rb) and exon 2 (Rb). (B) The sequence of the 281-bp Quasimodo LTR
fragment is conserved across all D. melanogaster populations examined, whereas adjacent intronic and exonic sequences are significantly diverged.
Vertical numbers represent the position of the polymorphic site; numbering is according to the sequenced strain; zero indicates no polymorphism in
a defined area of the sequence.
(65F6) on chromosome 3L (www.flybase.org), whereas
the CG17514 gene maps to constitutive heterochromatin
(Hoskins et al. 2002).
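The three distribution classes reported above (sequenced strain only, some populations, all populations) can be recomputed from table 2. The Python sketch below is a bookkeeping aid and not part of the original analysis. The counts are the number of natural populations, out of 18, scored as positive for each association in table 2; the dictionary and function names are hypothetical.

```python
from collections import Counter

N_POPULATIONS = 18

# Number of natural populations (laboratory stock excluded) scored as
# positive for each association, tallied by hand from table 2.
natural_population_counts = {
    "Quasimodo/CTCF": 18, "Beagle/CG17514": 18,
    "roo/DopR": 1, "297/Syn": 3, "roo/CG18446": 7,
    "297/ken": 9, "297/TM4SF": 9,
    "Quasimodo/spn3": 8, "Quasimodo/CG9333": 8,
    "Quasimodo/CycE": 0, "412/Deaf1": 0, "297/Cyp309a1": 0,
    "297/Ab1": 0, "Quasimodo/CG16954": 0, "mdg3/para": 0,
    "blastopia/Baz": 0, "412/DHPR": 0, "mdg3/mRpL48": 0,
    "Transpac/CG7900": 0, "blastopia/CG6352": 0, "roo/CG9527": 0,
    "roo/CG12885": 0, "297/Cyp309a2": 0,
}


def classify(count: int, n_populations: int = N_POPULATIONS) -> str:
    """Assign an association to one of the three distribution classes."""
    if count == 0:
        return "sequenced strain only"
    if count == n_populations:
        return "all populations"
    return "some populations"


tally = Counter(classify(c) for c in natural_population_counts.values())
total = len(natural_population_counts)
for category, n in sorted(tally.items()):
    print(f"{category}: {n} of {total} ({round(100 * n / total)}%)")
```

The tally reproduces the 14, 7, and 2 associations (61%, 30%, and 9% of 23) described in the text.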
Sequence Analysis
To gain insight into the factors that may have contributed to the widespread distribution of the Beagle-CG17514 and Quasimodo-CTCF alleles in D. melanogaster populations, we sequenced various regions in and around the LTR retrotransposon sequence associated with each of these two widely distributed alleles in six geographically diverse populations (Athens, California, Germany, Kenya, India, and Antilles). The resulting sequences were aligned with one another and with that of the sequenced y1; cn1 bw1 sp1 strain.
We sequenced the 1,144-bp region containing the Beagle fragment and the adjoining 5' flanking region (including the 3' UTR) of CG17514 (fig. 3). We also sequenced an additional 882-bp coding region (exon 3) within the CG17514 gene. This coding region was found to contain the highest number of polymorphic sites among the six natural populations and the sequenced y1; cn1 bw1 sp1 strain (6.1% divergence). The sequences of the region containing the Beagle fragment (6.1% divergence) and the adjacent intergenic region (3.3% divergence) were also found to be highly polymorphic among the six populations and the sequenced y1; cn1 bw1 sp1 strain.
In a 1,863-bp region containing the Quasimodo LTR fragment and a portion of the CTCF gene, there were a total of 38 polymorphic sites, of which 14 were small indels. Remarkably, the entire 281-bp Quasimodo LTR fragment was found to be identical in sequence among all six natural population samples (0% divergence), as well as in the sequenced y1; cn1 bw1 sp1 strain. The sequence of the immediately adjacent CTCF exon 1 (5' UTR) was nearly
FIG. 3.—(A) Structure of the Beagle-CG17514 allele in the sequenced D. melanogaster genome. A 593-bp Beagle fragment is located
458 bp downstream of the CG17514 gene in the sequenced D. melanogaster genome. Arrows represent the position of the primers used to
detect the associations in the populations and species studied. The area sequenced is boxed. (B) Sequence analysis showing that the Beagle-derived
sequence and the gene region (exon and intron) contain a high number of polymorphic sites in the seven strains analyzed. Vertical numbers represent
the position of the polymorphic site; numbering is according to the sequenced strain, and zero indicates no polymorphism in a defined area of
the sequence.
invariant (0.3% divergence) among all strains. Higher
levels of intraspecific variation were, however, detected in
the more distal intron 1 (2.5% divergence) and exons 2 and
3 (3.0% divergence) of the CTCF gene (fig. 2).
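The per-region percentages above summarize how many aligned positions vary among the strains. The Python sketch below shows one simple way to compute such a figure from an alignment. It is an assumption that "divergence" is the proportion of polymorphic alignment columns, with indels counted as variable sites; the values reported here were obtained with DnaSP, whose exact treatment may differ. The function names and the toy sequences are hypothetical.

```python
from typing import List, Sequence


def polymorphic_sites(aligned: Sequence[str]) -> List[int]:
    """Return 0-based alignment columns at which the sequences differ."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # A column is polymorphic when it holds more than one distinct state;
    # a gap character ('-') counts as a state, so indels are included.
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


def percent_polymorphic(aligned: Sequence[str]) -> float:
    """Percentage of alignment columns that are polymorphic."""
    return 100.0 * len(polymorphic_sites(aligned)) / len(aligned[0])


# Toy alignment (not data from this study): one substitution, one indel.
toy = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTAC-TAC",
]
print(polymorphic_sites(toy), percent_polymorphic(toy))
```

Applied region by region (LTR fragment, flanking sequence, introns, exons), a calculation of this kind yields the type of profile shown in figures 2B and 3B.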
Discussion
Intraspecific patterns of nucleotide and retrotransposon-gene allelic variation appear to be distinct. Several techniques have been used to estimate levels of nucleotide genetic variation within and between species. In the early to middle 1970s, extensive studies of allozyme variation were carried out within and between species of Drosophila (Ayala 1975). The general conclusion was that relatively little nucleotide genetic variation exists between populations of Drosophila species. Local populations of Drosophila were estimated to be greater than 95% identical based on the results of allozyme studies, and this value has been generally supported by subsequent restriction fragment length polymorphism (RFLP) and direct sequencing-based studies (Aquadro et al. 1992).
The results presented in this paper suggest that the story is quite different with regard to retrotransposon insertional variation in or near genes. We estimate that in the sequenced Drosophila melanogaster genome, approximately 2% of the genes (approximately 300 genes) are associated with an LTR retrotransposon sequence (i.e., an LTR retrotransposon sequence in or within 1,000 bp of the gene) (Ganko et al., unpublished data). The results presented in this paper suggest that the majority of these associations (~61%) are endemic to the sequenced strain. Previous studies indicate that the genome of the sequenced strain is a typical D. melanogaster genome with respect to the number and distribution of transposable elements (Kaminker et al. 2002). Insofar as this is correct, our results indicate that although there appears to be a relatively large number of retrotransposon-gene associations present in D. melanogaster genomes, the majority of these variants are likely to be population/strain specific.
We did find, however, that a significant proportion of the retrotransposon-gene associations present in the sequenced genome are widely distributed among natural populations. Indeed, 39% (nine of 23) of the retrotransposon-gene associations identified in the sequenced strain were detected in at least two populations, and more than 30% (seven of 23) were detected in at least seven out of the 18 populations. Nine percent (two of 23) of the retrotransposon-gene associations were detected in all of the 18 populations surveyed.
Previous surveys of transposable-element insertion variants using in situ hybridization and RFLP methodologies (Charlesworth and Langley 1989) failed to detect insertion variants that were widespread among D. melanogaster populations. However, the ability of these techniques to detect relatively small insertions is limited, and our results indicate that most of the retrotransposon-gene associations that are widespread among populations are composed of relatively small retrotransposon fragments (tables 1 and 2).
The majority of the retrotransposon-gene association variants that are unique to one or a few populations are likely of recent evolutionary origin. When LTR retrotransposons initially integrate into genomes, they are generally full-length in size; that is, they are composed of gag, pol, and sometimes env genes flanked by identical LTRs (Boeke and Stoye 1997). Full-length Drosophila LTR retrotransposons are typically 5 to 7 kb in length (Archipoda, Lynbaniskaya, and Ilin 1995). Over time, these full-length elements generally decrease in size because of the gradual accumulation of small deletions or by other mechanisms believed to actively remove transposable element sequences from the genome (Petrov 2002). In our study, more than 60% of the LTR retrotransposon sequences that are unique to the sequenced strain are more than 3,000 bp in length and most are full-length or nearly full-length elements. In addition, retrotransposon-gene associations identified in the sequenced strain that are also present in only a few (1 to 3) of the 18 natural populations surveyed are also composed of full-length or nearly full-length retrotransposons (e.g., roo-DopR and 297-Syn). The degree of sequence identity between the 5' and 3' LTRs of a full-length LTR retrotransposon can be used to estimate the time elapsed since the element transposed (SanMiguel et al. 1998; Jordan and McDonald 1999a, 1999b). All full-length elements found to be associated with genes in our survey displayed greater than 99% LTR sequence identity, indicating that they have been recently inserted. These observations stand in contrast to the fact that those associations more widespread among populations (eight or nine out of 18) are composed of retrotransposon sequences no larger than 659 bp in length. The two associations that were found to be present in all 18 populations surveyed were composed of retrotransposon fragments of only 281 and 593 bp in length, respectively.
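The dating logic referred to above rests on the fact that the two LTRs of an element are identical at insertion and then diverge independently, so the elapsed time is roughly T = K / (2r), where K is the proportion of sites that differ between the LTRs and r is the substitution rate per site per unit time. The Python sketch below applies that relationship. It is a minimal illustration, not the calculation performed in the cited studies: it makes no correction for multiple substitutions at a site, and the rate passed in the example (0.016 substitutions per site per million years) is an assumed placeholder that should be replaced with a rate appropriate to the lineage.

```python
def ltr_insertion_age_myr(ltr_identity: float, rate_per_site_per_myr: float) -> float:
    """Estimate the time since insertion (Myr) from 5'-3' LTR identity.

    ltr_identity is a proportion between 0 and 1 (0.99 for 99% identity).
    The factor of 2 reflects that both LTRs accumulate changes independently.
    """
    if not 0.0 <= ltr_identity <= 1.0:
        raise ValueError("ltr_identity must be a proportion between 0 and 1")
    if rate_per_site_per_myr <= 0:
        raise ValueError("rate must be positive")
    divergence = 1.0 - ltr_identity
    return divergence / (2.0 * rate_per_site_per_myr)


# Assumed rate, for illustration only.
ASSUMED_RATE = 0.016
print(ltr_insertion_age_myr(0.99, ASSUMED_RATE))
```

Under this assumed rate, LTRs that are more than 99% identical would imply an insertion less than about 0.3 million years old, which is the sense in which such elements are described as recent.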
We conclude from these results that most of the retrotransposon-gene associations that are strain/population specific or present in only a few populations are the products of relatively recent insertional events. This is consistent with previous results indicating that essentially
FIG. 4.—PCR products showing that seven LTR retrotransposon-
gene associations are variably present in the 18 strains analyzed. The
numbers above each gel correspond to the 18 D. melanogaster natural
populations described in table 2.
all full-length elements present in the D. melanogaster genome are much younger than the age of the species (Bowen and McDonald 2001; Kaminker et al. 2002). As elements age and/or are spread among populations, they appear to become significantly reduced in size. The question remains as to whether or not those retrotransposon fragments that remain associated with genes over relatively long spans of evolutionary time are of adaptive significance.
Sequence analysis indicates that the widespread Quasimodo-CTCF gene association has undergone a recent selective sweep. There are at least three plausible explanations for the widespread distribution of retrotransposon-gene association alleles among Drosophila melanogaster populations. It is possible that the insertion alleles were present in the common ancestor of present-day populations and have been maintained by chance or selection in some or all populations over evolutionary time. A second possibility is that the insertion allele arose more recently in some population and has been spread to other populations by migration coupled with the action of drift and/or selection. A third, less likely, possibility is that the insertion event occurred independently in many or all of the populations in which the retrotransposon-gene association is currently present. We consider this latter possibility extremely unlikely for at least two reasons. First, new LTR retrotransposon insertions typically involve full-length elements and, as discussed above, all of the associations that are widespread among populations surveyed are composed of relatively small fragments of retrotransposons. Second, the precise insertion site of any given associated retrotransposon sequence is the same among all associated alleles, indicating that each is likely the product of the same insertional event. Under any scenario, if the retrotransposon-gene associations are being maintained or spread by random processes, neutral substitutions would be expected to accumulate among the homologous variants over evolutionary time.
In an initial effort to assess the relative roles of drift and selection in the maintenance of widespread retrotransposon-gene associations, we have examined the patterns of sequence variation in and around the retroelement sequences in the two associations that were detected in all of the 18 populations surveyed in this study. Figure 3 displays the levels of variation in and around the Beagle retrotransposon sequence associated with the CG17514 gene among six of the 18 geographically diverse populations in which it is found. The level of polymorphism within the Beagle element and adjacent intergenic region is twice as high among populations (6.6%) as in the gene-encoding region (3.3%). This pattern of variation provides no evidence of selection operating on the retrotransposon sequence. The fact that this association is located within a constitutively heterochromatic (and thus low-recombination) region of the genome may help explain why it has been widely maintained in the species, despite the apparent absence of positive selection.
In contrast, figure 2 displays the patterns of nucleotide
variation in and around the LTR fragment located just
upstream of the CTCF gene among these same six
populations. The level of sequence variation is significantly
reduced in the upstream region immediately adjacent to the
fragmented retrotransposon. Indeed, we found that the 281-bp
Quasimodo LTR fragment itself is
identical in sequence among all six populations (0%
divergence). Nucleotide variability remains remarkably
low in the intergenic region immediately adjacent to the
Quasimodo sequence (0.3%) but gradually increases as
a function of distance from the insertion site, reaching
a maximum of 3% in the regions of exons 2 and 3 that were
sequenced. These results are consistent with a selective
sweep centered on the Quasimodo LTR fragment (e.g.,
Hudson, Saez, and Ayala [1997] and Saez et al. [2003]).
Future molecular studies will be required to delineate the
likely functional significance of the Quasimodo sequence
for CTCF gene expression.
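The region-by-region comparison described above can be illustrated with a minimal sketch. It assumes that the "level of polymorphism" is approximated by the percentage of aligned sites that vary among the sampled sequences; the text does not state the exact statistic used, so this is only one plausible proxy. The function name `percent_polymorphic`, the toy alignment, and the region boundaries are hypothetical and are not data from this study.

```python
from typing import Sequence


def percent_polymorphic(alignment: Sequence[str], start: int, end: int) -> float:
    """Percent of alignment columns in [start, end) showing more than one base.

    Assumption: gaps ('-') and ambiguous bases ('N') are ignored when
    deciding whether a column is variable.
    """
    if end <= start:
        raise ValueError("region must contain at least one column")
    variable = 0
    for col in range(start, end):
        states = {seq[col].upper() for seq in alignment} - {"-", "N"}
        if len(states) > 1:
            variable += 1
    return 100.0 * variable / (end - start)


# Hypothetical toy alignment: three sequences, 20 aligned sites.
# Columns 0-9 stand in for an inserted fragment, columns 10-19 for a flank.
alignment = [
    "ACGTACGTACGGATCCGTTA",
    "ACGTACGTACGGTTCCGTTA",
    "ACGTACGTACGGATCCGATA",
]

regions = {"fragment": (0, 10), "flank": (10, 20)}
for name, (start, end) in regions.items():
    print(f"{name}: {percent_polymorphic(alignment, start, end):.1f}% variable sites")
```

Applied to real data, the same function could be run over consecutive windows at increasing distance from the insertion site; a trough of variation around the insertion that recovers with distance is the pattern expected under a sweep.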
Acknowledgments
This research was supported by a National Institutes of Health
(NIH) grant to J.F.M. E.W.G. is supported through an
NIH Genetics Training Grant.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z.
Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST
and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25:3389–3402.
Aquadro, C. F., R. M. Jennings Jr., M. M. Bland, C. C. Laurie,
and C. H. Langley. 1992. Patterns of naturally occurring
restriction map variation, dopa decarboxylase activity variation
and linkage disequilibrium in the Ddc gene region of
Drosophila melanogaster. Genetics 132:443–452.
Arkhipova, I. R., N. V. Lyubomirskaya, and Y. V. Ilyin. 1995.
Drosophila retrotransposons. R.G. Landes Co., Austin, Tex.
Ayala, F. J. 1975. Genetic differentiation during the speciation
process. Evol. Biol. 8:1–78.
Boeke, J. D., and J. P. Stoye. 1997. Retrotransposons, endogenous
retroviruses and the evolution of retroelements. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, NY.
Bowen, N. J., and J. F. McDonald. 2001. Drosophila euchromatic
LTR retrotransposons are much younger than the host species
in which they reside. Genome Res. 11:1527–1540.
Britten, R. J. 1996. DNA sequence insertion and evolutionary
variation in gene regulation. Proc. Natl. Acad. Sci. USA 93:
9374–9377.
Brosius, J. 1999. RNAs from all categories generate retrosequences
that may be exapted as novel genes or regulatory
elements. Gene 238:115–134.
Celniker, S. E., D. A. Wheeler, B. Kronmiller et al. (29 co-authors).
2002. Finishing a whole-genome shotgun: release 3
of the Drosophila melanogaster euchromatic genome sequence.
Genome Biol. 3:RESEARCH0079.
Charlesworth, B. 1988. The maintenance of transposable elements
in natural populations. Basic Life Sci. 47:189–212.
Charlesworth, B., and C. H. Langley. 1989. The population genetics
of Drosophila transposable elements. Annu. Rev. Genet.
23:251–287.
Cherry, J. M., C. Ball, S. Weng et al. (8 co-authors). 1997.
Genetic and physical maps of Saccharomyces cerevisiae.
Nature 387:67–73.
Flavell, R. B. 1986. Repetitive DNA and chromosome evolution
in plants. Philos. Trans. R. Soc. Lond. Biol. Sci. 312:227–242.
Ganko, E. W., V. Bhattacharjee, P. Schliekelman, and J. F.
McDonald. 2003. Evidence for the contribution of LTR
retrotransposons to C. elegans gene evolution. Mol. Biol.
Evol. 20:1925–1931.
Ganko, E. W., K. T. Fielman, and J. F. McDonald. 2001.
Evolutionary history of Cer elements and their impact on the
C. elegans genome. Genome Res. 11:2066–2074.
Gloor, G. B., C. R. Preston, D. M. Johnson-Schlitz, N. A. Nassif,
R. W. Phillis, W. K. Benz, H. M. Robertson, and W. R.
Engels. 1993. Type I repressors of P element mobility.
Genetics 135:81–95.
Hickey, D. A. 1982. Selfish DNA: a sexually-transmitted nuclear
parasite. Genetics 101:519–531.
Hoskins, R. A., C. D. Smith, J. W. Carlson et al. (13 co-authors).
2002. Heterochromatic sequences in a Drosophila
whole-genome shotgun assembly. Genome Biol. 3:
RESEARCH0085.
Hudson, R. R., A. G. Saez, and F. J. Ayala. 1997. DNA variation
at the Sod locus of Drosophila melanogaster: an unfolding
story of natural selection. Proc. Natl. Acad. Sci. USA
94:7725–7729.
Jordan, I. K., and J. F. McDonald. 1999a. Comparative genomics
and evolutionary dynamics of Saccharomyces cerevisiae Ty
elements. Genetica 107:3–13.
———. 1999b. Tempo and mode of Ty element evolution in
Saccharomyces cerevisiae. Genetics 151:1341–1351.
Jordan, I. K., I. B. Rogozin, G. V. Glazko, and E. V. Koonin.
2003. Origin of a substantial fraction of human regulatory
sequences from transposable elements. Trends Genet. 19:
68–72.
Kaminker, J. S., C. M. Bergman, B. Kronmiller et al. (9 co-authors).
2002. The transposable elements of the Drosophila
melanogaster euchromatin: a genomics perspective. Genome
Biol. 3:RESEARCH0084.
Kidwell, M. G. 2002. Transposable elements and the evolution of
genome size in eukaryotes. Genetica 115:49–63.
Lander, E. S., L. M. Linton, B. Birren et al. (more than 100 co-authors).
2001. Initial sequencing and analysis of the human
genome. Nature 409:860–921.
Landry, J. R., P. Medstrand, and D. L. Mager. 2001. Repetitive
elements in the 5′ untranslated region of a human zinc-finger
gene modulate transcription and translation efficiency.
Genomics 76:110–116.
Lerman, D. N., P. Michalak, A. B. Helin, B. R. Bettencourt, and M.
E. Feder. 2003. Modification of heat-shock gene expression in
Drosophila melanogaster populations via transposable elements.
Mol. Biol. Evol. 20:135–144.
Maside, X., A. W. Lee, and B. Charlesworth. 2003. Inferences on
the evolutionary history of the S-element family of Drosophila
melanogaster. Mol. Biol. Evol. 20:1183–1187.
McClintock, B. 1951. Chromosome organization and genetic
expression. Cold Spr. Harb. Symp. Quant. Biol. 16:13–47.
McDonald, J. F. 1990. Macroevolution and retroviral elements.
Bioscience 40:183–191.
———. 1993. Evolution and consequences of transposable
elements. Curr. Opin. Genet. Dev. 3:855–864.
Medstrand, P., J. R. Landry, and D. L. Mager. 2001. Long
terminal repeats are used as alternative promoters for the
endothelin B receptor and apolipoprotein C-I genes in
humans. J. Biol. Chem. 276:1896–1903.
Nekrutenko, A., and W. H. Li. 2001. Transposable elements are
found in a large number of human protein-coding genes.
Trends Genet. 17:619–621.
Petrov, D. A. 2002. DNA loss and evolution of genome size in
Drosophila. Genetica 115:81–91.
Petrov, D. A., Y. T. Aminetzach, J. C. Davis, D. Bensasson, and
A. E. Hirsh. 2003. Size matters: non-LTR retrotransposable
elements and ectopic recombination in Drosophila. Mol. Biol.
Evol. 20:880–892.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated
program for molecular population genetics and molecular
evolution analysis. Bioinformatics 15:174–175.
Saez, A. G., A. Tatarenkov, E. Barrio, N. H. Becerra, and F. J.
Ayala. 2003. Patterns of DNA sequence polymorphism at Sod
vicinities in Drosophila melanogaster: unraveling the footprint
of a recent selective sweep. Proc. Natl. Acad. Sci. USA
100:1793–1798.
SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L.
Bennetzen. 1998. The paleontology of intergene retrotransposons
of maize. Nat. Genet. 20:43–45.
SanMiguel, P., A. Tikhonov, Y. K. Jin et al. 1996. Nested
retrotransposons in the intergenic regions of the maize
genome. Science 274:765–768.
Shapiro, J. A. 1977. DNA insertion elements and the evolution of
chromosome primary structure. Trends Biochem. Sci. 2:622–627.
Sorek, R., G. Ast, and D. Graur. 2002. Alu-containing exons are
alternatively spliced. Genome Res. 12:1060–1067.
Venter, J. C., M. D. Adams, E. W. Myers et al. 2001. The
sequence of the human genome. Science 291:1304–1351.
Thomas Eickbush, Associate Editor
Accepted February 26, 2004

Supplementary resources (18)

... Because Hsp genes share some of these features and represent extreme manifestations of others, Lerman et al. [15] suggested that proximal promoters of Hsp genes in general were natural ''hotspots'' for TE integration. Although this suggestion was consistent with the discovery of naturally occurring TEs in Hsp70 genes' promoters [15], (1) these TEs were few, (2) with few exceptions [49], naturally occurring TEs had not been discovered in other Hsp genes, and (3) TEs were not known to be comparatively rare in the proximal promoter regions of non-heat-shock genes. Three independent lines of evidence now establish that insertions of one TE, P elements, are common, not only in Hsp70 promoters, but also in other (single copy) heat-shock promoters: ...
... A testable prediction is that experimental evolution in contrasting thermal regimes should be capable of altering P element frequencies in Hsp genes according to the phenotypes of P element insertions; i.e., increasing P element allelic frequency when beneficial and decreasing frequency when deleterious. The same prediction ought to be applicable to other instances in which TE-derived sequences modify a host function and have been assimilated by the host genome [1,20,49,72,73]. ...
Article
Full-text available
Transposable elements are potent agents of genomic change during evolution, but require access to chromatin for insertion—and not all genes provide equivalent access. To test whether the regulatory features of heat-shock genes render their proximal promoters especially susceptible to the insertion of transposable elements in nature, we conducted an unbiased screen of the proximal promoters of 18 heat-shock genes in 48 natural populations of Drosophila. More than 200 distinctive transposable elements had inserted into these promoters; greater than 96% are P elements. By contrast, few or no P element insertions segregate in natural populations in a “negative control” set of proximal promoters lacking the distinctive regulatory features of heat-shock genes. P element transpositions into these same genes during laboratory mutagenesis recapitulate these findings. The natural P element insertions cluster in specific sites in the promoters, with up to eight populations exhibiting P element insertions at the same position; laboratory insertions are into similar sites. By contrast, a “positive control” set of promoters resembling heat-shock promoters in regulatory features harbors few P element insertions in nature, but many insertions after experimental transposition in the laboratory. We conclude that the distinctive regulatory features that typify heat-shock genes (in Drosophila) are especially prone to mutagenesis via P elements in nature. Thus in nature, P elements create significant and distinctive variation in heat-shock genes, upon which evolutionary processes may act. Synopsis Transposable elements can be a major source of evolutionary change. Their insertion can directly affect the genes into, or next to, which they insert. To insert, however, they must first gain access to the host gene. 
The authors reasoned that, because the DNA in the promoters (i.e., regulatory regions) of heat-shock genes is unusually accessible, these genes might harbor many transposable elements. With a technique that can detect any insertion into a gene, they discovered more than 200 distinctive transposable elements in the promoter regions of heat-shock genes in fruit flies from the wild—but few or none in the promoter regions of more typical genes. Surprisingly, out of the one hundred kinds of transposable elements in fruit flies, almost all were P elements. P elements are remarkable because they invaded the fruit fly genome only during the last century. These findings imply that the combination of accessible DNA and the recent invasion of P elements have left a distinctive imprint on the promoters of heat-shock genes.
... Although some authors have described TEs without relevant function in the genomes (Strobel et al., 1979) or with deleterious effects on genome organization (Hickey, 1982), there is evidence of the crucial roles of TEs in the evolutionary dynamics of eukaryotic genomes (Gonzaĺez and Petrov, 2009). There were diverse retrotransposons that were components of coding regions of functional genes or part of regulatory regions in the arthropod model species Drosophila melanogaster (Franchini et al., 2004). In the present study, most collinear duplicated genes that had a positive blast (discarding the uncharacterized proteins) were coding genes for different transposons, retrotransposons, or transposases. ...
Article
Full-text available
The sea louse Caligus rogercresseyi is a marine ectoparasite that constitutes one of the major threats to the salmon farming industry, where the primary control strategy is the use of delousing drugs through immersion treatments. The emergence of pharmacological resistance in this copepodid species has previously been described using transcriptome data. However, the molecular mechanisms underlying chromosome rearrangements have not yet been explored. This study aimed to identify structural genomic variations and gene expression in C. rogercresseyi associated with pesticide sensitivity. In this study, genome resequencing was conducted using Oxford Nanopore Technology on lice strains with contrasting sensitivity to azamethiphos to detect genome duplications. Transcriptome profiling of putative gene duplications was performed by Illumina sequencing. Copy Number Variants (CNVs) were identified through comparative coverage, and collinear/tandem gene duplications over all the chromosomal regions by sequence homology. Duplications or CNVs in functional genes were primarily identified in transposable elements and genes related to the drug response, with differential expression values calculated by RNA-seq analyses of the same strains. Notably, differentially duplicated genes were found in coding regions related to cuticle proteins, suggesting that a putative resistance mechanism may be associated with cuticular structure formation and the proteins involved. Collectively, the results revealed that the intensive use of pesticides on sea lice populations increases the frequency of gene duplication, expanding the molecular elements involved in drug response. This study is the first to report an association between genome rearrangements and pharmacological resistance in sea lice populations.
... A single gene can, therefore, encode a rich repertoire of transcripts that can be involved in diverse biological functions, and contribute to adaptive evolution and disease (e.g., Marasca et al. 2020;Kiyose et al. 2022;Singh and Ahi 2022;Verta and Jacobs 2022). The potential contribution of transposable element (TE) insertions to the diversification of the transcriptome was analyzed soon after the first whole-genome sequences were available (Ganko et al. 2003;Jordan et al. 2003;van de Lagemaat et al. 2003;Franchini et al. 2004;Lipatov et al. 2005). TEs are present in virtually all genomes studied to date and are able to insert copies of themselves in the genome, and although their mutation capacity is often harmful, they also represent an important source of adaptive genetic variation (Volff 2006;Casacuberta and González 2013;Cowley and Oakey 2013;Schrader and Schmitz 2019). ...
Article
Full-text available
Transcriptomes are dynamic, with cells, tissues, and body parts expressing particular sets of transcripts. Transposable elements (TEs) are a known source of transcriptome diversity; however, studies often focus on a particular type of chimeric transcript, analyze single body parts or cell types, or are based on incomplete TE annotations from a single reference genome. In this work, we have implemented a method based on de novo transcriptome assembly that minimizes the potential sources of errors while identifying a comprehensive set of gene-TE chimeras. We applied this method to the head, gut, and ovary dissected from five Drosophila melanogaster natural strains, with individual reference genomes available. We found that ∼19% of body part–specific transcripts are gene–TE chimeras. Overall, chimeric transcripts contribute a mean of 43% to the total gene expression, and they provide protein domains for DNA binding, catalytic activity, and DNA polymerase activity. Our comprehensive data set is a rich resource for follow-up analysis. Moreover, because TEs are present in virtually all species sequenced to date, their role in spatially restricted transcript expression is likely not exclusive to the species analyzed in this work.
... The low numbers of DNA transposons could also be connected to the generally assumed low frequencies of horizontal transfer in the Antarctic, owing to low species diversity present (Kelley et al., 2014). In Drosophila, some ТЕs are associated with genes that have adaptive effects (Franchini et al., 2004). Therefore, the presence of such adaptive insertions might still be predicted in B. antarctica even given the low number of ТЕs in its genome. ...
Article
Full-text available
Belgica antarctica (Diptera: Chironomidae), a brachypterous midge endemic to the maritime Antarctic, was first described in 1900. Over more than a century of study, a vast amount of information has been compiled on the species (3 750 000 Google search results as of January 10, 2021), encompassing its ecology and biology, life cycle and reproduction, polytene chromosomes, physiology, biochemistry and, increasingly, omics. In 2014, B. antarctica’s genome was sequenced, further boosting research. Certain developmental stages can be cultured successfully in the laboratory. Taken together, this wealth of information allows the species to be viewed as a natural model organism for studies of adaptation and function in extreme environments.
... Only TE insertions with sufficiently mild deleterious effects on the host genome can avoid being eliminated, and thus have opportunities to spread in a population. It was reported that most TEs in Drosophila are present at low frequencies, supporting the hypothesis that most TE insertions are selected against (Franchini et al. 2004;Cridland et al. 2013;Barr on et al. 2014). The accumulation of mildly deleterious mutations in TE sequences leads to eventual transposition inactivation of the inserted element (Maside et al. 2005). ...
Article
Full-text available
Transposable elements (TEs) contribute to a large fraction of the expansion of many eukaryotic genomes due to the capability of TEs duplicating themselves through transposition. A first step to understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population, based on the genotypes at TE insertion sites detected in multiple individuals sampled from the population using paired-end (PE) sequencing reads. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (=5X). The method can also detect TE insertions under strong selection, and the detection ability increases with sample size in a population, although a substantial fraction of actual TE insertions under selection may be undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (>90%) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies >0.95); on the other hand, among the non-reference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (>70%) are still at low frequencies (<0.1). Finally, we detected a substantial fraction (∼9%) of non-reference TE insertions under selection.
... TEs have traditionally been considered as either selfish or junk DNA conferring no benefit to their host. This view has been challenged with clear examples of both beneficial individual insertions (Franchini et al. 2004;Schlenke and Begun 2004) and TE families (Biessmann et al. 1992). The idea that TEs are solely genomic parasites now appears overly simplistic; however, the majority of insertions appears to be deleterious or neutral in a broad range of eukaryotes (Charlesworth et al. 1992;Jordan and McDonald 1999;Pereira 2004). ...
Article
Full-text available
Capsaspora owczarzaki, a protistan symbiont of the pulmonate snail Biomphalaria glabrata, is the centre of much interest in evolutionary biology due to its close relationship to Metazoa. The whole genome sequence of this protist has revealed new insights into the ancestral genome composition of Metazoa, in particular with regard to gene families involved in the evolution of multicellularity. The draft genome revealed the presence of 23 families of transposable element, made up from DNA transposon as well as LTR and non-LTR retrotransposon families.The phylogenetic analyses presented here show that all of the transposable elements identified in the C. owczarzaki genome have orthologous families in Metazoa, indicating that the ancestral metazoan also had a rich diversity of elements. Molecular evolutionary analyses also show that the majority of families have recently been active within the Capsaspora genome. One family now appears to be inactive and a further five families show no evidence of current transposition. Most individual element copies are evolutionarily young, however a small proportion of inserts appear to have persisted for longer in the genome. The families present in the genome show contrasting population histories and appear to be in different stages of their life cycles. Transcriptome data have been analysed from multiple stages in the C. owczarzaki life cycle. Expression levels vary greatly both between families and between different stages of the life cycle, suggesting an unexpectedly complex level of transposable element regulation in a single celled organism.
... Here we use results from coalescent theory to determine the probability distribution of allele frequency for a neutral TE insertion identified in a reference genome, conditional on its estimated time since insertion. This method is particularly suitable for genotyping or resequencing studies in which TEs identified in a well-assembled genome are subsequently assayed for their allele frequency in populations (Blumenstiel et al. 2002; Petrov et al. 2003, 2011; Franchini et al. 2004; Neafsey et al. 2004; Lipatov et al. 2005; Gonzalez et al. 2008). Since the age of an insertion allele cannot be exactly determined, we incorporate uncertainty in age estimates into our approach by integrating over the Bayesian posterior distribution of time since insertion. ...
Article
Full-text available
How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, previous studies have used models of transposition-selection equilibrium that assume a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence. By conditioning on the age of an individual TE insertion allele (inferred by the number of unique substitutions that have occurred within the particular TE sequence since insertion), we determine the probability distribution of the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this non-equilibrium neutral model, we are able to explain about 80% of the variance in TE insertion allele frequencies based on age alone. Controlling for both non-equilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications or other copy number variants.
... Although it is clear that theoretically, like point mutations, retronuons are probably more often a disadvantage than an advantage to the affected individual and in most cases they have no effect at all, it is remarkable that 25% of analyzed promoter regions in the human genome contain retronuon-derived sequences (Jordan et al. 2003) and that the 5Ј ends of a large proportion of mRNAs contain parts of retronuons, thus indicating a role of the respective retronuons in gene regulation (van de Lagemaat et al. 2003;Oei et al. 2004; see also Franchini et al. 2004). A further striking discovery is that up to 5% of human genes harbor sequences from Alu retronuons in their protein-coding regions that arose mainly via alternative splicing (Nekrutenko and Li 2001;Sorek et al. 2002;Lev-Maor et al. 2003;Kreahling and Graveley 2004;Singer et al. 2004), although it needs to be established what percentage of the alternatively spliced mRNAs encode functional protein variants. ...
Article
Full-text available
The application of molecular genetics, in particular comparative genomics, to the field of evolutionary biology is paving the way to an enhanced “New Synthesis.” Apart from their power to establish and refine phylogenies, understanding such genomic processes as the dynamics of change in genomes, even in hypothetical RNA-based genomes and the in vitro evolution of RNA molecules, helps to clarify evolutionary principles that are otherwise hidden among the nested hierarchies of evolutionary units. To this end, I outline the course of hereditary material and examine several issues including disparity, causation, or bookkeeping of genes, adaptation, and exaptation, as well as evolutionary contingency at the genomic level—issues at the heart of some of Stephen Jay Gould's intellectual battlegrounds. Interestingly, where relevant, the genomic perspective is consistent with Gould's agenda. Extensive documentation makes it particularly clear that exaptation plays a role in evolutionary processes that is at least as significant as—and perhaps more significant than—that played by adaptation.
... In addition, 17% of gene-associated insertions occur in the 1 kb region upstream or downstream of a gene, with potential regulatory implications [37]. For all gene-associated insertions located in the NPE, whose sequence divergence from the consensus could be estimated (114 of 141), 61% are putatively very recent (identity $99%), suggesting that they may be polymorphic in An. gambiae and vary geographically among populations [38]. Most of these putatively recent insertions (83%) are proviral, consistent with the overall pattern observed in the genome (Table S1). ...
Preprint
Full-text available
Transcriptomes are dynamic, with cells, tissues, and body parts expressing particular sets of transcripts. Transposons are a known source of transcriptome diversity, however studies often focus on a particular type of chimeric transcript, analyze single body parts or cell types, or are based on incomplete transposon annotations from a single reference genome. In this work, we have implemented a method based on de novo transcriptome assembly that minimizes the potential sources of errors while identifying a comprehensive set of gene-TE chimeras. We applied this method to head, gut and ovary dissected from five Drosophila melanogaster natural populations, with individual reference genomes available. We found that 18.6% of body part specific transcripts are gene-TE chimeras. Overall, chimeric transcripts contribute a median of 38% to the total gene expression, and they provide both DNA binding and catalytic protein domains. Our comprehensive dataset is a rich resource for follow-up analysis. Moreover, because transposable elements are present in virtually all species sequenced to date, their relevant role in spatially restricted transcript expression is likely not exclusive to the species analyzed in this work.
Article
Full-text available
The Saccharomyces cerevisiae genome contains five families of long terminal repeat (LTR) retrotransposons, Ty1-Ty5. The sequencing of the S. cerevisiae genome provides an unprecedented opportunity to examine the patterns of molecular variation existing among the entire gf nomic complement of Ty retrotransposons. We report the results of an analysis of the nucleotide and amino acid sequence variation within and between the five Ty element families of the S. cerevisiae genome. Our results indicate that individual Ty element families tend to be highly homogenous in both sequence and size variation. Comparisons of within-element 5' and 3' LTR sequences indicate that the vast majority of Ty elements have recently transposed. Furthermore, intrafamily Ty sequence comparisons reveal the action of negative selection on Ty element coding sequences. These results taken together suggest that there is a high level of genomic turnover of S. cerevisiae Ty elements, which is presumably in response to selective pressure to escape host-mediated repression and elimination mechanisms.
Article
Full-text available
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Article
Full-text available
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
Article
Forty-six second-chromosome lines of Drosophila melanogaster isolated from five natural populations were surveyed for restriction map variation in a 65-kb region surrounding the gene (Ddc) encoding dopa decarboxylase (DDC). Sixty-nine restriction sites were scored, 13 of which were polymorphic. Average heterozygosity per nucleotide was estimated to be 0.005. Eight large (0.7-5.0 kb) inserts, two small inserts (100 and 200 bp) and three small deletions (100-300 bp) were also observed across the 65-kb region. We see no evidence for a reduction in either nucleotide heterozygosity or insertion/deletion variation in the central 26-kb segment containing Ddc and a dense cluster of lethal complementation groups and transcripts (≥9 genes) compared to that seen in the adjacent regions (totaling 39 kb) in which only a single gene and transcript has been detected, or to that observed for other gene regions in D. melanogaster. The distribution of restriction site variation shows no significant departure from that expected under an equilibrium neutral model. However, insertions and deletions show a significant departure from neutrality in that they are too rare in frequency, consistent with them being deleterious on average. Significant linkage disequilibrium among variants exists across much of the 65-kb region. Lower regional rates of recombination combined with the influence of polymorphic chromosomal inversions, rather than epistatic selection among genes in the dense cluster, probably are sufficient explanations for the creation and/or maintenance of the linkage disequilibrium observed in the Ddc region. We have also assayed adult DDC enzyme activity in these same lines. Twofold variation in activity among lines is observed within our sample. Significant associations are observed between level of DDC enzyme activity and restriction map variants.
Surprisingly, one line with a 5.0-kb insert within an intron and one line with a 1.5-kb insert near the 5' end of Ddc each show normal adult DDC activities.
Article
Integrative recombination mechanisms associated with temperate bacteriophages and other DNA insertion elements join DNA molecules in the absence of extensive genetic homology. These mechanisms provide a new pathway for the evolution of prokaryotic and eukaryotic chromosome structure.