Retrotransposon-Gene Associations Are Widespread Among
D. melanogaster Populations
Lucia F. Franchini, Eric W. Ganko, and John F. McDonald
Department of Genetics, University of Georgia, Athens
We have surveyed 18 natural populations of Drosophila melanogaster for the presence of 23 retrotransposon-gene
association alleles (i.e., the presence of an LTR retrotransposon sequence in or within 1,000 bp of a gene) recently
identified in the sequenced D. melanogaster genome. The identified associations were detected only in the D.
melanogaster populations. The majority (61%) of the identified retrotransposon-gene associations were present only in
the sequenced strain in which they were first identified. Thirty percent of the associations were detected in at least one of
the natural populations, and 9% of the associations were detected in all of the D. melanogaster populations surveyed.
Sequence analysis of an association allele present in all populations indicates that selection is a significant factor in the
spread and/or maintenance of at least some retroelement-gene associations in D. melanogaster.
Introduction
Retrotransposons are eukaryotic transposable ele-
ments having a life cycle that includes reverse transcrip-
tion of an RNA intermediate (Boeke and Stoye 1997).
Retrotransposons constitute a significant fraction of most
eukaryotic genomes. For example, in species with rel-
atively large genomes, such as humans, it is estimated that
nearly half of the genome is composed of retrotransposon
sequences (Venter et al. 2001; Lander et al. 2001). The
abundance of retrotransposon sequences in the genomes of
some plant species is even higher. For example, at least
50% of the maize genome (SanMiguel et al. 1996) and
approximately 90% of the genome of some species of lilies
(Flavell 1986) are composed of retrotransposons. In
species with smaller genomes, such as yeast, nematodes,
and fruit flies, the percentage of retrotransposon sequences
in the genome is much less, typically ranging from 1% to
10% (Cherry et al. 1997; Celniker et al. 2002; Kaminker
et al. 2002; Kidwell 2002).
While the adaptive significance of transposable
elements (TEs) was assumed by those researchers who
first discovered them (e.g., McClintock [1951] and Shapiro
[1977]), subsequent theoretical and population genetic
studies (e.g., Hickey [1982] and Charlesworth and Langley
[1989]) called this assumption into question and proposed
that retrotransposons and other transposable elements are
more appropriately viewed as parasitic-like sequences that
provide little or no adaptive benefit to their hosts (e.g.,
Charlesworth [1988]). More recent findings in molecular
biology and genomics indicate that this negative in-
dictment may have been premature. Presently, there are
a large and growing number of examples of retrotranspo-
son sequences located in or near genes that have been
shown to have a significant effect on gene expression
(Britten 1996; Brosius 1999; Landry, Medstrand, and
Mager 2001; Medstrand, Landry, and Mager 2001; Sorek,
Ast, and Graur 2002; Lerman et al. 2003).
The growing availability of the complete sequence of
a variety of genomes is providing an unprecedented
opportunity to more objectively assess the contribution of
transposable element sequences to gene structure and
function. The genomic approach typically begins with the
identification of a TE-gene association (i.e., the occurrence
of a TE sequence in or near a gene) in a sequenced genome
(Maside et al. 2002; Petrov et al. 2003). For example, it
has recently been shown that the majority of long terminal
repeat (LTR) retrotransposon sequences in C. elegans are
located in or near genes (Ganko, Fielman, and McDonald
2001; Ganko et al. 2003). Likewise, recent computational
analyses of the sequenced human genome indicate that
retrotransposon sequences are located in the coding
regions of at least 4% (Nekrutenko and Li 2001) and in
the promoter regions of at least 25% (Jordan et al. 2003) of
human genes. However, the mere identification of a TE-
gene association in a sequenced genome is not, in itself,
indicative of adaptive significance because it may only
represent an insertional mutant unique to the sequenced
strain. If, on the other hand, a given TE-gene association is
shown to be in high frequency or fixed within a species
or among closely related species, this may be taken as
putative evidence that the association is of functional
significance. The selective hypothesis can be subsequently
tested for each candidate adaptive association by sequence
and/or molecular analyses.
Our laboratory has long been interested in the
evolution and significance of transposable elements in
Drosophila (e.g., McDonald [1990] and McDonald
[1993]). We have recently initiated a study to identify
LTR retrotransposon-gene associations (i.e., the presence
of an LTR retrotransposon sequence in or within 1,000 bp
of a gene) in the sequenced Drosophila melanogaster
genome (Ganko et al., unpublished data). We have
subsequently initiated a survey of the presence of these
LTR retrotransposon-gene associations in natural popula-
tions. In this paper, we report the presence of 23
previously unidentified LTR retrotransposon-gene associ-
ations in the sequenced Drosophila genome. Using
a polymerase chain reaction (PCR)–based approach, we
have searched for the presence of each of these
associations in 18 natural populations of D. melanogaster.
The results indicate that the majority of the associations
identified in the sequenced strain involve recently inserted
elements and that these associations are endemic to the
sequenced strain. In contrast, those associations that were
Key words: Long terminal repeats, retrotransposons, transposable
elements, Drosophila melanogaster, genome evolution.
E-mail address: mcgene@uga.edu.
Mol. Biol. Evol. 21(7):1323–1331. 2004
doi:10.1093/molbev/msh116
Advance Access publication March 10, 2004
Molecular Biology and Evolution vol. 21 no. 7 © Society for Molecular Biology and Evolution 2004; all rights reserved.
Table 1
LTR-Gene Associations Identified in the Drosophila melanogaster Genome
Element Family  TE Size  Proximity  Intron/exon  Gene      Putative Gene Functions
mdg3            5520 bp  0 bp       intron       para      Voltage-sensitive sodium channel; DDT susceptibility/resistance
412             7520 bp  0 bp       intron       Deaf 1    Deformed epidermal autoregulatory factor-1
Blastopia       4977 bp  0 bp       intron       Bazooka   Protein kinase C binding
Blastopia       3247 bp  0 bp       intron       CG6352    odsH-site homeobox transcription factor
297             5899 bp  0 bp       exon-intron  Cyp309a2  Cytochrome P450
mdg3            267 bp   0 bp       3′ UTR       mRpL48    Mitochondrial ribosomal protein L48
Roo             9049 bp  0 bp       3′ UTR       CG9527    Pristanoyl-CoA oxidase
Roo             9150 bp  0 bp       5′ UTR       CG12885   Unknown
Transpac        329 bp   0 bp       5′ UTR       CG7900    Fatty acid amide hydrolase
Quasimodo       659 bp   8 bp       3′           CycE      Cyclin-dependent protein kinase, regulator
297             414 bp   68 bp      5′           Cyp309a1  Cytochrome P450
Quasimodo       658 bp   171 bp     5′           HSP       Heat shock protein
412             3731 bp  491 bp     3′           DHPR      Dihydropteridine reductase
297             5622 bp  939 bp     5′           Ab1       Protein tyrosine kinase
Roo             9212 bp  0 bp       intron       DopR      Dopamine receptor
297             5990 bp  0 bp       intron       Syn       Neurotransmitter release
Roo             427 bp   0 bp       5′ UTR       CG18446   Unknown zinc finger domain
Quasimodo       659 bp   177 bp     3′           SPN3      Serpins
Quasimodo       659 bp   256 bp     5′           CG9333    Unknown
297             413 bp   283 bp     3′           Ken       Transcription factor
297             413 bp   293 bp     5′           TM4SF     Integral membrane protein; plasma membrane
Quasimodo       281 bp   207 bp     5′           CTCF      Transcription factor
Beagle          593 bp   458 bp     3′           CG17514   Translation activator
NOTE.—Boldface type highlights retroelement and gene names.
Table 2
Presence (+) or Absence (-) of Retroelement Sequences Associated with 23 Genes in 18 Drosophila melanogaster Strains, Each Representing a Natural Population, and in the Laboratory Stock y¹; cn¹ bw¹ sp¹
Geographic Area   No.  Strain                    Quasimodo/ Beagle/    roo/       297/       roo/       297/       297/       Quasimodo/ Quasimodo/ Quasimodo/
                                                 CTCF       CG17514    DopR       Syn        CG18446    ken(a)     TM4SF(a)   spn3(b)    CG9333(b)  CycE
Laboratory stock  1    y¹; cn¹ bw¹ sp¹           +          +          +          +          +          +          +          +          +          +
Americas          2    US (Athens)               +          +          +          -          +          +          +          -          -          -
                  3    US (California)           +          +          -          +          +          +          +          -          -          -
                  17   Chile (Santiago)          +          +          -          -          -          +          +          -          -          -
                  11   Antilles (Rouge)          +          +          -          -          +          +          +          +          +          -
Europe            5    Germany (Freiburg)        +          +          -          +          -          -          -          -          -          -
                  19   France (Bordeaux)         +          +          -          -          -          +          +          +          +          -
                  12   Italy (Frascati)          +          +          -          -          -          -          -          -          -          -
                  13   Russia (Dilizhan)         +          +          -          +          +          -          -          -          -          -
                  18   Russia (Dushanbe)         +          +          -          -          +          -          -          +          +          -
Africa            4    South Africa (Cape Town)  +          +          -          -          -          +          +          -          -          -
                  8    Kenya (Kenia)             +          +          -          -          -          -          -          +          +          -
                  6    Niger (Niamey)            +          +          -          -          -          -          -          +          +          -
                  9    Swaziland (Mbabane)       +          +          -          -          -          -          -          -          -          -
                  7    Congo (Dimonika)          +          +          -          -          -          -          -          +          +          -
                  14   Congo (Brazzaville)       +          +          -          -          -          +          +          -          -          -
                  15   Ivory Coast (Tai Forest)  +          +          -          -          -          +          +          +          +          -
Oceania           16   Australia (Melbourne)     +          +          -          -          +          +          +          -          -          -
Asia              10   India (Rohtak)            +          +          -          -          +          -          -          +          +          -
(a) ken and TM4SF flank the 297 LTR.
(b) spn3 and CG9333 flank the Quasimodo LTR.
found to be widespread among populations are composed
of relatively small retrotransposon fragments that are
presumably of much older origin. We present evidence of
a selective sweep of a retrotransposon-gene association
present in all D. melanogaster populations surveyed.
Methods
Identification of LTR-Gene Association in the D.
melanogaster Genome
Using previously identified Drosophila LTR retro-
transposon sequences as queries (Bowen and McDonald
2001), sequence retrieval was initiated via BlastN searches
(default parameters [Altschul et al. 1997]) against the BDGP
(http://www.fruitfly.org) and GenBank (http://www.ncbi.nlm.nih.gov)
databases. Results with e-values less than e⁻¹⁰
were annotated on the corresponding genomic clone with
MacVector version 7.0 (http://www.gcg.com), and nearby
genes were noted. Selected genes within 1 kb of a TE were
Blasted against NCBI’s EST database and mapped
along with predicted transcript structures from FlyBase
(http://www.flybase.org).
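The association criterion used above (a TE sequence in or within 1,000 bp of a gene, retained only for BLAST hits below the e-value cutoff) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and coordinate convention are hypothetical, and the cutoff is read here as 1e-10, which is an assumption about the notation in the text.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, start <= end

MAX_DISTANCE_BP = 1000  # association window stated in the text
MAX_EVALUE = 1e-10      # assumed reading of the BLAST e-value cutoff


def proximity_bp(te: Interval, gene: Interval) -> int:
    """Bases separating a TE from a gene; 0 if they overlap or abut."""
    te_start, te_end = te
    gene_start, gene_end = gene
    if te_end < gene_start:          # TE lies entirely before the gene
        return gene_start - te_end - 1
    if gene_end < te_start:          # TE lies entirely after the gene
        return te_start - gene_end - 1
    return 0                         # intervals overlap


def is_association(te: Interval, gene: Interval, evalue: float) -> bool:
    """True when the hit passes the e-value cutoff and lies in or near the gene."""
    return evalue < MAX_EVALUE and proximity_bp(te, gene) <= MAX_DISTANCE_BP
```

With made-up coordinates, a 281-bp hit at positions 100 to 380 and a gene starting at position 588 are separated by 207 bp, so the pair would be scored as an association provided the hit also passes the e-value filter.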
Drosophila Strains
D. melanogaster strains established from 20 to 30
wild-collected females from Congo (Dimonika), Niger
(Niamey), Swaziland (Mbabane), Kenya (Kenia), South
Africa (Cape Town), India (Rohtak), Russia (Dushanbe
and Dilizhan), Congo (Brazzaville), Ivory Coast (Tai
Forest), Australia (Melbourne), Chile (Santiago de Chile),
and France (Bordeaux) populations were obtained from
Jean David, CNRS Gif-sur-Yvette, France. Strains from
Germany (Freiburg), Italy (Frascati), and The Antilles
(Rouge) were obtained from Nikolaj Junakovic, Università
la Sapienza, Rome, Italy. The California and Athens
(Georgia, USA) strains were provided by Daniel Promis-
low, University of Georgia. The D. melanogaster
sequenced strain y¹; cn¹ bw¹ sp¹ was obtained from the
Bloomington, Indiana, stock center.
Polymerase Chain Reaction
PCR primers were designed with MacVector version
7.0 and synthesized by Integrated DNA Technologies
(Coralville, Iowa). The primer sequences used in each
reaction are displayed in table 3 of Supplementary
Material online. Three replicate PCR reactions were
carried out per strain, per gene-retrotransposon association.
The DNA used in each reaction (100 ng) was separately
isolated from 10 flies (five males and five females per
isolation) according to previously described methods
(Gloor et al. 1993). PCR products for each primer were
amplified in a 25-µl reaction containing 3 mM MgCl₂, 10X
Table 2
Extended (rows in the same order as in Table 2; + present, - absent)
No.  412/       297/       297/       Quasimodo/ mdg3/      blastopia/ 412/       mdg3/      Transpac/  blastopia/ roo/       roo/       297/
     Deaf1      Cyp309a1   Ab1        CG16954    para       Baz        DHPR       mRpL48     CG7900     CG6352     CG9527     CG12885    Cyp309a2
1    +          +          +          +          +          +          +          +          +          +          +          +          +
2    -          -          -          -          -          -          -          -          -          -          -          -          -
3    -          -          -          -          -          -          -          -          -          -          -          -          -
17   -          -          -          -          -          -          -          -          -          -          -          -          -
11   -          -          -          -          -          -          -          -          -          -          -          -          -
5    -          -          -          -          -          -          -          -          -          -          -          -          -
19   -          -          -          -          -          -          -          -          -          -          -          -          -
12   -          -          -          -          -          -          -          -          -          -          -          -          -
13   -          -          -          -          -          -          -          -          -          -          -          -          -
18   -          -          -          -          -          -          -          -          -          -          -          -          -
4    -          -          -          -          -          -          -          -          -          -          -          -          -
8    -          -          -          -          -          -          -          -          -          -          -          -          -
6    -          -          -          -          -          -          -          -          -          -          -          -          -
9    -          -          -          -          -          -          -          -          -          -          -          -          -
7    -          -          -          -          -          -          -          -          -          -          -          -          -
14   -          -          -          -          -          -          -          -          -          -          -          -          -
15   -          -          -          -          -          -          -          -          -          -          -          -          -
16   -          -          -          -          -          -          -          -          -          -          -          -          -
10   -          -          -          -          -          -          -          -          -          -          -          -          -
PCR buffer supplied by Pierce (Rockford, Ill.), 2%
DMSO, 0.2 mM dNTPs, 0.5 µM of each primer, and 0.5 U
of Taq DNA polymerase supplied by Pierce (Rockford,
Ill.). The program consisted of an initial incubation at 94°C
for 5 min, followed by 35 cycles, each consisting of 30 s at
94°C, 1 min at the annealing temperature specific for each
primer pair (see table 3 in Supplementary Material online),
and 1 min per kb of PCR product at 72°C, and then a final
extension of 10 min at 72°C. All reactions
were carried out in a Hot Top–equipped Robocycler Gradient
96 (Stratagene, La Jolla, Calif.). Each PCR product (25 µl)
was separated on a 1% agarose gel in 0.5× TBE
running buffer containing 0.25 µg mL⁻¹ ethidium bromide.
Gel images were analyzed by UV transillumination.
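As a rough illustration of the cycling program described above, the following sketch builds the list of steps and sums the programmed hold times. The function name is hypothetical, the 55°C annealing temperature in the usage note is a placeholder (the actual temperatures are in the supplementary table), and ramp times between steps are ignored.

```python
def cycling_program(annealing_temp_c: float, product_bp: int, cycles: int = 35):
    """Return (steps, total_minutes) for the program described in the text.

    Each step is a (label, temperature in deg C, minutes) tuple.
    """
    extension_min = product_bp / 1000.0  # 1 min per kb of expected product
    per_cycle = [
        ("denature", 94.0, 0.5),
        ("anneal", annealing_temp_c, 1.0),
        ("extend", 72.0, extension_min),
    ]
    steps = [("initial denaturation", 94.0, 5.0)]
    for _ in range(cycles):
        steps.extend(per_cycle)
    steps.append(("final extension", 72.0, 10.0))
    total_minutes = sum(minutes for _, _, minutes in steps)
    return steps, total_minutes
```

For example, `cycling_program(55.0, 1805)` would describe the program for the 1,805-bp product listed in the legend of figure 1, with an extension step of about 1.8 min per cycle.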
Sequencing
PCR products were agarose gel purified (Qiaquick,
Qiagen, Valencia Calif.) and cloned with TOPO TA
(Invitrogen, Carlsbad, Calif.). DNA sequencing was
performed in the Molecular Genetics Instrumentation
Facility at the University of Georgia. Sequencing primers
and primers used for amplifying sequenced PCR product,
when different from the association primers, are shown
in table 3 in Supplementary Material online. Sequence
readouts were checked manually for accurate base calling
and were assembled with Sequencher (Gene Codes, Ann
Arbor, Mich.). The length of the region analyzed is given
according to the expected length in the sequenced strain and
the polymorphic site positions are located relative to this
reference sequence. Nucleotide sequences were aligned
using ClustalW (MacVector 7.0). As a control for PCR
errors, we also sequenced the published y¹; cn¹ bw¹ sp¹
strain. Population genetic parameters were obtained using
DnaSP version 3.95.7 (Rozas and Rozas 1999).
Results
Twenty-three new LTR retrotransposon-gene associ-
ations were selected for analysis. Genome sequence
analysis resulted in the identification of over 300
LTR retrotransposon-gene associations (Ganko et al.,
unpublished data). These associations consisted of full-
length and smaller, fragmented LTR-retrotransposon
FIG. 1.—Examples of PCR analyses used to detect the presence of LTR retrotransposon-gene association across 18 representative natural
populations of D. melanogaster and the sequenced strain. Three PCR reactions were performed per strain, per gene; six representative strains are shown.
First lane, DNA ladder. (A) PCR products showing that the Quasimodo LTR fragment (281 bp) located 207 bp upstream of the CTCF gene is present in
all D. melanogaster populations analyzed. Q = product from Quasimodo-specific primers (expected size = 186 bp); C = product from CTCF-specific
primers (expected size = 395 bp); QC = product from Quasimodo F + CTCF R primers (expected size = 1,805 bp). (B) PCR products showing
that the Beagle fragment (593 bp) located 458 bp 3′ to the heterochromatic CG17514 gene is present in all D. melanogaster populations analyzed. B =
product from Beagle primers (expected size = 562 bp); G = product from CG17514 primers (expected size = 250 bp); BG = product from CG17514 F + Beagle R primers
(expected size = 3,293 bp).
sequences located both within genes and adjacent to genes.
The 23 associations selected to be representative of the
variety of sizes and location of gene-associated element
sequences are shown in table 1. The sequences analyzed
include six full-length LTR retrotransposons, nine LTRs,
and eight fragments, ranging in size from 281 bp to 9212
bp. There were two instances in which a single retro-
transposon sequence was associated with two genes (a 297
LTR was flanked by the Ken and TM4SF genes; a 659-bp
Quasimodo LTR was flanked by the spn3 and CG9333
genes).
LTR retrotransposon sequences were located within
the FlyBase-predicted transcriptional boundary (including
untranslated regions [UTRs], introns, and exons) of 12
genes. Six of these 12 associations were located within
introns. Of the remaining six associations, two were in the
3′ UTR, three were in the 5′ UTR, and one spanned part of an
exon and intron. Of the 11 LTR retrotransposon sequences
located outside of gene boundaries, six were located 5′
of the gene (ranging from 68 to 939 bp upstream of the
transcriptional start site) and five were located 3′ of the
gene (ranging from 8 to 491 bp downstream of the polyA
addition site).
The majority of the LTR retrotransposon-gene
associations identified in the sequenced genome were not
detected in natural populations. Two sets of PCR primers
were designed for each retrotransposon-gene association,
one to amplify a portion of the associated gene and the
other to amplify a portion of the associated retrotransposon
sequence. Appropriate pairs of these gene and retrotrans-
poson primers were combined to detect the presence or
absence of each retrotransposon-gene association in strains
representing 18 geographically dispersed populations of
D. melanogaster.
More than half (14 of 23, or 61%) of the associations
were detected only in the sequenced strain (tables 1 and 2).
Of these, the majority of the associated elements (80%)
were full-length or nearly full-length in size and had
identical or nearly identical LTRs (>99% sequence
identity). This is consistent with the possibility that these
elements have inserted in the recent evolutionary past and,
thus, represent mutational polymorphism within the
sequenced strain. Seven of the 23 associations were
detected in some but not all of the D. melanogaster
populations (tables 1 and 2). Some of these alleles were
found to display slight indel variation in the size of the
associated retrotransposon sequence (fig. 4).
Two associations were detected in all 18 D.
melanogaster populations. Two of the 23 associations
(9%) were detected in all 18 of the D. melanogaster
populations surveyed (tables 1 and 2 and fig. 1). One of
these associations is a promoter-containing 281-bp Quasimodo
LTR fragment located 207 bp 5′ to the CTCF
(CG8591) gene (fig. 2A). The second is a Beagle fragment
(593 bp) located 458 bp 3′ to the CG17514 gene (fig. 3). The
CTCF (CG8591) gene maps to a euchromatic region
FIG. 2.—(A) Structure of the Quasimodo-CTCF allele in the sequenced D. melanogaster genome. The 281-bp fragment of Quasimodo LTR is
associated with the CTCF gene in the D. melanogaster sequenced genome. Arrows represent the position of the primers used to detect the associations
in the populations and species studied. The area sequenced is boxed. Two alternative transcripts have been detected for this gene, Ra and Rb, one
composed of exons 1 (Ra), 2(Ra), 3 and 4 and the other composed of exon 1 (Rb) and exon 2 (Rb). (B) The sequence of the 281-bp Quasimodo LTR
fragment is conserved across all D. melanogaster populations examined, whereas adjacent intronic and exonic sequences are significantly diverged.
Vertical numbers represent the position of the polymorphic site; numbering is according to the sequenced strain; zero indicates no polymorphism in
a defined area of the sequence.
(65F6) on chromosome 3L (www.flybase.org), whereas
the CG17514 gene maps to constitutive heterochromatin
(Hoskins et al. 2002).
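The proportions reported in this section can be re-derived from the presence/absence calls in table 2. The sketch below is only a bookkeeping aid: the calls are transcribed from table 2 for the 18 natural populations (the laboratory stock is excluded), the variable and function names are hypothetical, and the 13 associations in the extended part of the table, which were absent from every natural population, are entered as a simple count.

```python
# One character per column of Table 2: "+" present, "-" absent.
COLUMNS = [
    "Quasimodo/CTCF", "Beagle/CG17514", "roo/DopR", "297/Syn", "roo/CG18446",
    "297/ken", "297/TM4SF", "Quasimodo/spn3", "Quasimodo/CG9333",
    "Quasimodo/CycE",
]
CALLS = {
    "US (Athens)": "+++-+++---",
    "US (California)": "++-++++---",
    "Chile (Santiago)": "++---++---",
    "Antilles (Rouge)": "++--+++++-",
    "Germany (Freiburg)": "++-+------",
    "France (Bordeaux)": "++---++++-",
    "Italy (Frascati)": "++--------",
    "Russia (Dilizhan)": "++-++-----",
    "Russia (Dushanbe)": "++--+--++-",
    "South Africa (Cape Town)": "++---++---",
    "Kenya (Kenia)": "++-----++-",
    "Niger (Niamey)": "++-----++-",
    "Swaziland (Mbabane)": "++--------",
    "Congo (Dimonika)": "++-----++-",
    "Congo (Brazzaville)": "++---++---",
    "Ivory Coast (Tai Forest)": "++---++++-",
    "Australia (Melbourne)": "++--+++---",
    "India (Rohtak)": "++--+--++-",
}
EXTENDED_SEQUENCED_ONLY = 13  # extended columns, absent everywhere in nature


def tally(calls=CALLS, columns=COLUMNS, extra=EXTENDED_SEQUENCED_ONLY):
    """Count populations per association and sort associations into classes."""
    n_pop = len(calls)
    per_column = {
        name: sum(row[i] == "+" for row in calls.values())
        for i, name in enumerate(columns)
    }
    classes = {"sequenced strain only": extra,
               "some populations": 0,
               "all populations": 0}
    for n in per_column.values():
        if n == 0:
            classes["sequenced strain only"] += 1
        elif n == n_pop:
            classes["all populations"] += 1
        else:
            classes["some populations"] += 1
    return per_column, classes


def percent(part, whole=23):
    """Percentage of the 23 surveyed associations, rounded to an integer."""
    return round(100 * part / whole)
```

Calling `tally()` returns the number of natural populations carrying each association together with the three class counts, which `percent` converts to the rounded percentages quoted in the text.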
Sequence Analysis
To gain insight into the factors that may have
contributed to the widespread distribution of the Beagle-
CG17514 and Quasimodo-CTCF alleles in D. mela-
nogaster populations, we sequenced various regions in
and around the LTR retrotransposon sequence associated
with each of these two widely distributed alleles in six
geographically diverse populations (Athens, California,
Germany, Kenya, India, and Antilles). The resulting
sequences were aligned with one another and with that
of the sequenced y¹; cn¹ bw¹ sp¹ strain.
We sequenced the 1,144-bp region containing the
Beagle fragment and the adjoining 5′ flanking region
(including the 3′ UTR) of CG17514 (fig. 3). We also
sequenced an additional 882-bp coding region (exon 3)
within the CG17514 gene. This coding region was found
to contain the highest number of polymorphic sites
among the six natural populations and the sequenced
y¹; cn¹ bw¹ sp¹ strain (6.1% divergence). The sequences of the
region containing the Beagle fragment (6.1% divergence)
and the adjacent intergenic region (3.3% divergence)
were also found to be highly polymorphic among the six
populations and the sequenced y¹; cn¹ bw¹ sp¹ strain.
In an 1,863-bp region containing the Quasimodo LTR
fragment and a portion of the CTCF gene, there were a total
of 38 polymorphic sites, of which 14 were small indels.
Remarkably, the entire 281-bp Quasimodo LTR fragment
was found to be identical in sequence among all six natural
population samples (0% divergence), as well as in the
sequenced y¹; cn¹ bw¹ sp¹ strain. The sequence of the
immediately adjacent CTCF exon 1 (5′ UTR) was nearly
FIG. 3.—(A) Structure of the Beagle-CG17514 allele in the sequenced D. melanogaster genome. A 593-bp Beagle fragment is located
458 bp downstream of the CG17514 gene in the sequenced D. melanogaster genome. Arrows represent the position of the primers used to
detect the associations in the populations and species studied. The area sequenced is boxed. (B) Sequence analysis showing that the Beagle-derived
sequence and the gene region (exon and intron) contain a high number of polymorphic sites in the seven strains analyzed. Vertical numbers represent
the position of the polymorphic site; numbering is according to the sequenced strain, and zero indicates no polymorphism in a defined area of
the sequence.
invariant (0.3% divergence) among all strains. Higher
levels of intraspecific variation were, however, detected in
the more distal intron 1 (2.5% divergence) and exons 2 and
3 (3.0% divergence) of the CTCF gene (fig. 2).
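The text does not state how the percent divergence values were computed. One simple reading, taken here purely as an assumption, is the fraction of alignment columns that are polymorphic (with indel columns counted as polymorphic, as in the count of 38 sites above). The sketch below implements that reading; the function names and the toy alignment are hypothetical, and DnaSP may use different estimators.

```python
from typing import List, Sequence


def polymorphic_columns(alignment: Sequence[str]) -> List[int]:
    """0-based indices of alignment columns in which the sequences differ.

    Gap characters are treated as a state, so indel columns count as
    polymorphic.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i, column in enumerate(zip(*alignment))
        if len(set(column)) > 1
    ]


def percent_divergence(n_polymorphic: int, region_length_bp: int) -> float:
    """Polymorphic sites as a percentage of the region length."""
    return 100.0 * n_polymorphic / region_length_bp
```

Under this reading, the 38 polymorphic sites in the 1,863-bp Quasimodo-CTCF region correspond to about 2% overall, while an invariant stretch such as the 281-bp LTR fragment gives 0%.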
Discussion
Intraspecific patterns of nucleotide and retrotranspo-
son-gene allelic variation appear to be distinct. Several
techniques have been used to estimate levels of nucleotide
genetic variation within and between species. In the early to
middle 1970s, extensive studies of allozyme variation were
carried out within and between species of Drosophila
(Ayala 1975). The general conclusion was that relatively
little nucleotide genetic variation exists between popula-
tions of Drosophila species. Local populations of Drosoph-
ila were estimated to be greater than 95% identical based on
the results of allozyme studies, and this value has been
generally supported by subsequent restriction fragment
length polymorphisms (RFLP) and direct sequencing based
studies (Aquadro et al. 1992).
The results presented in this paper suggest that the
story is quite different with regard to retrotransposon
insertional variation in or near genes. We estimate that in
the sequenced Drosophila melanogaster genome, approx-
imately 2% of the genes (approximately 300 genes) are
associated with an LTR retrotransposon sequence (i.e., an
LTR retrotransposon sequence in or within 1,000 bp of the
gene) (Ganko et al., unpublished data). The results
presented in this paper suggest that the majority of
these associations (~61%) are endemic to the sequenced
strain. Previous studies indicate that the genome of the
sequenced strain is a typical D. melanogaster genome with
respect to the number and distribution of transposable
elements (Kaminker et al. 2002). In so far as this is correct,
our results indicate that although there appears to be
a relatively large number of retrotransposon-gene associ-
ations present in D. melanogaster genomes, the majority
of these variants are likely to be population/strain specific.
We did find, however, that a significant proportion of
the retrotransposon-gene associations present in the
sequenced genome are widely distributed among natural
populations. Indeed, 39% (nine of 23) of the
retrotransposon-gene associations identified in the sequenced strain
were detected in at least two populations, and more than
30% (seven of 23) were detected in at least seven out of
the 18 populations. Nine percent (two of 23) of the
retrotransposon-gene associations were detected in all of
the 18 populations surveyed.
Previous surveys of transposable-element insertion
variants using in situ hybridization and RFLP methodolo-
gies (Charlesworth and Langley 1989) failed to detect
insertion variants that were widespread among D.
melanogaster populations. However, the ability of these
techniques to detect relatively small insertions is limited,
and our results indicate that most of the retrotransposon-
gene associations that are widespread among populations
are composed of relatively small retrotransposon fragments
(tables 1 and 2).
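The relation between element size and population distribution can be checked directly against tables 1 and 2. In the sketch below, sizes come from table 1 and population counts are tallied from table 2 (natural populations only, out of 18). The helper names are hypothetical, and the Quasimodo/HSP row of table 1 is assumed to correspond to the Quasimodo/CG16954 column of table 2.

```python
# (association, TE size in bp, number of natural populations carrying it)
ASSOCIATIONS = [
    ("mdg3/para", 5520, 0), ("412/Deaf1", 7520, 0),
    ("Blastopia/Bazooka", 4977, 0), ("Blastopia/CG6352", 3247, 0),
    ("297/Cyp309a2", 5899, 0), ("mdg3/mRpL48", 267, 0),
    ("Roo/CG9527", 9049, 0), ("Roo/CG12885", 9150, 0),
    ("Transpac/CG7900", 329, 0), ("Quasimodo/CycE", 659, 0),
    ("297/Cyp309a1", 414, 0), ("Quasimodo/HSP", 658, 0),
    ("412/DHPR", 3731, 0), ("297/Ab1", 5622, 0),
    ("Roo/DopR", 9212, 1), ("297/Syn", 5990, 3),
    ("Roo/CG18446", 427, 7), ("Quasimodo/SPN3", 659, 8),
    ("Quasimodo/CG9333", 659, 8), ("297/Ken", 413, 9),
    ("297/TM4SF", 413, 9), ("Quasimodo/CTCF", 281, 18),
    ("Beagle/CG17514", 593, 18),
]


def largest_te(min_populations: int) -> int:
    """Largest TE size among associations found in >= min_populations."""
    return max(size for _, size, n in ASSOCIATIONS if n >= min_populations)


def count_longer_than(threshold_bp: int, min_pops: int, max_pops: int):
    """(number longer than threshold, number considered) for associations
    present in between min_pops and max_pops natural populations."""
    sizes = [s for _, s, n in ASSOCIATIONS if min_pops <= n <= max_pops]
    return sum(s > threshold_bp for s in sizes), len(sizes)
```

For instance, `largest_te(7)` gives the largest element among associations found in at least seven natural populations, and `count_longer_than(3000, 0, 0)` gives the share of sequenced-strain-specific elements longer than 3,000 bp.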
The majority of the retrotransposon-gene association
variants that are unique to one or a few populations are
likely of recent evolutionary origin. When LTR retro-
transposons initially integrate into genomes, they are
generally full-length in size; that is, they are composed of
gag, pol, and sometimes env genes flanked by identical
LTRs (Boeke and Stoye 1997). Full-length Drosophila
LTR retrotransposons are typically 5 to 7 kb in length
(Archipoda, Lynbaniskaya, and Ilin 1995). Over time, these
full-length elements generally decrease in size because of
the gradual accumulation of small deletions or by other
mechanisms believed to actively remove transposable
element sequences from the genome (Petrov 2002). In our
study, more than 60% of the LTR retrotransposon
sequences that are unique to the sequenced strain are
more than 3,000 bp in length and most are full-length or
nearly full-length elements. In addition, retrotransposon-
gene associations identified in the sequenced strain that are
also present in only a few (1 to 3) of the 18 natural
populations surveyed are also composed of full-length or
nearly full-length retrotransposons (e.g., roo-DopR and
297-Syn). The degree of sequence identity among the 5′
and 3′ LTRs of a full-length LTR retrotransposon can be
used to estimate the time elapsed since the element
transposed (SanMiguel et al. 1998; Jordan and McDonald
1999a, 1999b). All full-length elements found to be
associated with genes in our survey displayed greater than
99% sequence identity, indicating that they have been
recently inserted. These observations stand in contrast to
the fact that those associations more widespread among
populations (eight or nine out of 18) are composed of
retrotransposon sequences no larger than 659 bp in
length. The two associations that were found to be present
in all 18 populations surveyed were composed of
retrotransposon fragments of only 281 and 593 bp in length,
respectively.
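The LTR-identity argument above rests on the fact that the two LTRs of an element are identical at insertion and then diverge independently, so the time since insertion is roughly the LTR-LTR divergence divided by twice the substitution rate. The sketch below illustrates this relation only; the function name is hypothetical, the substitution rate in the usage note is an illustrative placeholder rather than a value from this paper, and no correction for multiple hits is applied.

```python
def insertion_age_years(ltr_identity: float,
                        subs_per_site_per_year: float) -> float:
    """Approximate time since insertion from 5'/3' LTR sequence identity.

    ltr_identity is a fraction between 0 and 1. Both LTRs accumulate
    substitutions independently, hence the factor of 2.
    """
    if not 0.0 <= ltr_identity <= 1.0:
        raise ValueError("ltr_identity must be between 0 and 1")
    if subs_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    divergence = 1.0 - ltr_identity
    return divergence / (2.0 * subs_per_site_per_year)
```

With an assumed rate of 1.6e-8 substitutions per site per year, `insertion_age_years(0.99, 1.6e-8)` is roughly 3 x 10^5 years, so LTR identities above 99% would imply insertions younger than that under this assumption.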
We conclude from these results that most of the
retrotransposon-gene associations that are strain/popula-
tion specific or present in only a few populations are the
products of relatively recent insertional events. This is
consistent with previous results indicating that essentially
FIG. 4.—PCR products showing that seven LTR retrotransposon-
gene associations are variably present in the 18 strains analyzed. The
numbers above each gel correspond to the 18 D. melanogaster natural
populations described in table 2.
all full-length elements present in the D. melanogaster
genome are much younger than the age of the species
(Bowen and McDonald 2001; Kaminker et al. 2002). As
elements age and/or are spread among populations, they
appear to become significantly reduced in size. The
question remains as to whether or not those retrotranspo-
son fragments that remain associated with genes over
relatively long spans of evolutionary time are of adaptive
significance.
Sequence analysis indicates that the widespread
Quasimodo-CTCF gene association has undergone a re-
cent selective sweep. There are at least three plausible
explanations for the widespread distribution of retro-
transposon-gene association alleles among Drosophila
melanogaster populations. It is possible that the insertion
alleles were present in the common ancestor of present-
day populations and have been maintained by chance or
selection in some or all populations over evolutionary
time. A second possibility is that the insertion allele arose
more recently in some population and has been spread to
other populations by migration coupled with the action of
drift and/or selection. A third, less likely, possibility is
that the insertion event occurred independently in many
or all of the populations in which the retrotransposon-
gene association is currently present. We consider this
latter possibility extremely unlikely for at least two
reasons. First, new LTR retrotransposon insertions
typically involve full-length elements and, as discussed
above, all of the associations that are widespread among
populations surveyed are composed of relatively small
fragments of retrotransposons. Second, the precise in-
sertion site of any given associated retrotransposon
sequence is the same among all associated alleles,
indicating that each is likely the product of the same
insertional event. Under any scenario, if the retrotrans-
poson-gene associations are being maintained or spread
by random processes, neutral substitutions would be
expected to accumulate among the homologous variants
over evolutionary time.
In an initial effort to assess the relative roles of drift
and selection in the maintenance of widespread retro-
transposon-gene associations, we have examined the
patterns of sequence variation in and around the
retroelement sequences in the two associations that were
detected in all of the 18 populations surveyed in this
study. Figure 3 displays the levels of variation in and
around the Beagle retrotransposon sequence associated
with the CG17514 gene among six of the 18 geo-
graphically diverse populations in which it is found. The
level of polymorphism within the Beagle element and
adjacent intergenic region is twice as high among
populations (6.6%) as in the gene-encoding region
(3.3%). This pattern of variation provides no evidence
of selection operating on the retrotransposon sequence.
The fact that this association is located within a consti-
tutively heterochromatic (and, thus, low recombinogenic)
region of the genome may help explain why it has been
widely maintained in the species, despite the apparent
absence of positive selection.
In contrast, figure 2 displays the patterns of nucleotide
variation in and around the LTR fragment located just
upstream of the CTCF gene among these same six
populations. The level of sequence variation is significantly
reduced in the upstream region immediately adjacent to the
fragmented retrotransposon. Indeed, we found that the 281-bp
sequence of the Quasimodo LTR fragment itself is
identical among all six populations (0%
divergence). Nucleotide variability remains remarkably
low in the intergenic region immediately adjacent to the
Quasimodo sequence (0.3%) but gradually increases as
a function of distance from the insertion site, reaching
a maximum of 3% in the regions of exons 2 and 3 that were
sequenced. These results are consistent with a selective
sweep centered in the Quasimodo LTR fragment (e.g.,
Hudson, Saez, and Ayala [1997] and Saez et al. [2003]).
Future molecular studies will be required to delineate the
likely functional significance of the Quasimodo sequence
on CTCF gene expression.
Acknowledgments
Research supported by National Institutes of Health
(NIH) Grant to J.F.M. E.W.G. is supported through an
NIH Genetics Training Grant.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z.
Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST
and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25:3389–3402.
Aquadro, C. F., R. M. Jennings Jr., M. M. Bland, C. C. Laurie,
and C. H. Langley. 1992. Patterns of naturally occurring
restriction map variation, dopa decarboxylase activity varia-
tion and linkage disequilibrium in the Ddc gene region of
Drosophila melanogaster. Genetics 132:443–452.
Arkhipova, I. R., N. V. Lyubomirskaya, and Y. V. Ilyin. 1995.
Drosophila retrotransposons. R. G. Landes Co., Austin, Tex.
Ayala, F. J. 1975. Genetic differentiation during the speciation
process. Evol. Biol. 8:1–78.
Boeke, J. D., and J. P. Stoye. 1997. Retrotransposons, endogenous
retroviruses and the evolution of retroelements. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, NY.
Bowen, N. J., and J. F. McDonald. 2001. Drosophila euchromatic
LTR retrotransposons are much younger than the host species
in which they reside. Genome Res. 11:1527–1540.
Britten, R. J. 1996. DNA sequence insertion and evolutionary
variation in gene regulation. Proc. Natl. Acad. Sci. USA 93:
9374–9377.
Brosius, J. 1999. RNAs from all categories generate retro-
sequences that may be exapted as novel genes or regulatory
elements. Gene 238:115–134.
Celniker, S. E., D. A. Wheeler, B. Kronmiller et al. (29 co-
authors). 2002. Finishing a whole-genome shotgun: release 3
of the Drosophila melanogaster euchromatic genome sequence.
Genome Biol. 3:RESEARCH0079.
Charlesworth, B. 1988. The maintenance of transposable ele-
ments in natural populations. Basic Life Sci. 47:189–212.
Charlesworth, B., and C. H. Langley. 1989. The population ge-
netics of Drosophila transposable elements. Annu. Rev. Genet.
23:251–287.
Cherry, J. M., C. Ball, S. Weng et al. (8 co-authors). 1997.
Genetic and physical maps of Saccharomyces cerevisiae.
Nature 387:67–73.
Flavell, R. B. 1986. Repetitive DNA and chromosome evolution
in plants. Philos. Trans. R. Soc. Lond. Biol. Sci. 312:227–242.
Ganko, E. W., V. Bhattacharjee, P. Schliekelman, and J. F.
McDonald. 2003. Evidence for the contribution of LTR
retrotransposons to C. elegans gene evolution. Mol. Biol.
Evol. 20:1925–1931.
Ganko, E. W., K. T. Fielman, and J. F. McDonald. 2001.
Evolutionary history of Cer elements and their impact on the
C. elegans genome. Genome Res. 11:2066–2074.
Gloor, G. B., C. R. Preston, D. M. Johnson-Schlitz, N. A. Nassif,
R. W. Phillis, W. K. Benz, H. M. Robertson, and W. R.
Engels. 1993. Type I repressors of P element mobility.
Genetics 135:81–95.
Hickey, D. A. 1982. Selfish DNA: a sexually-transmitted nuclear
parasite. Genetics 101:519–531.
Hoskins, R. A., C. D. Smith, J. W. Carlson et al. (13 co-
authors). 2002. Heterochromatic sequences in a Drosophila
whole-genome shotgun assembly. Genome Biol. 3:
RESEARCH0085.
Hudson, R. R., A. G. Saez, and F. J. Ayala. 1997. DNA variation
at the Sod locus of Drosophila melanogaster: an unfolding
story of natural selection. Proc. Natl. Acad. Sci. USA
94:7725–7729.
Jordan, I. K., and J. F. McDonald. 1999a. Comparative genomics
and evolutionary dynamics of Saccharomyces cerevisiae Ty
elements. Genetica 107:3–13.
———. 1999b. Tempo and mode of Ty element evolution in
Saccharomyces cerevisiae. Genetics 151:1341–1351.
Jordan, I. K., I. B. Rogozin, G. V. Glazko, and E. V. Koonin.
2003. Origin of a substantial fraction of human regulatory
sequences from transposable elements. Trends Genet. 19:
68–72.
Kaminker, J. S., C. M. Bergman, B. Kronmiller et al. (9 co-
authors). 2002. The transposable elements of the Drosophila
melanogaster euchromatin: a genomics perspective. Genome
Biol. 3:RESEARCH0084.
Kidwell, M. G. 2002. Transposable elements and the evolution of
genome size in eukaryotes. Genetica 115:49–63.
Lander, E. S., L. M. Linton, B. Birren et al. (more than 100 co-
authors). 2001. Initial sequencing and analysis of the human
genome. Nature 409:860–921.
Landry, J. R., P. Medstrand, and D. L. Mager. 2001. Repetitive
elements in the 5′ untranslated region of a human zinc-finger
gene modulate transcription and translation efficiency.
Genomics 76:110–116.
Lerman, D. N., P. Michalak, A. B. Helin, B. R. Bettencourt, and M.
E. Feder. 2003. Modification of heat-shock gene expression in
Drosophila melanogaster populations via transposable ele-
ments. Mol. Biol. Evol. 20:135–144.
Maside, X., A. W. Lee, and B. Charlesworth. 2003. Inferences on
the evolutionary history of the S-element family of Drosophila
melanogaster. Mol. Biol. Evol. 20:1183–1187.
McClintock, B. 1951. Chromosome organization and genetic
expression. Cold Spr. Harb. Symp. Quant. Biol. 16:13–47.
McDonald, J. F. 1990. Macroevolution and retroviral elements.
Bioscience 40:183–191.
———. 1993. Evolution and consequences of transposable
elements. Curr. Opin. Genet. Dev. 3:855–864.
Medstrand, P., J. R. Landry, and D. L. Mager. 2001. Long
terminal repeats are used as alternative promoters for the
endothelin B receptor and apolipoprotein C-I genes in
humans. J. Biol. Chem. 276:1896–1903.
Nekrutenko, A., and W. H. Li. 2001. Transposable elements are
found in a large number of human protein-coding genes.
Trends Genet. 17:619–621.
Petrov, D. A. 2002. DNA loss and evolution of genome size in
Drosophila. Genetica 115:81–91.
Petrov, D. A., Y. T. Aminetzach, J. C. Davis, D. Bensasson, and
A. E. Hirsh. 2003. Size matters: non-LTR retrotransposable
elements and ectopic recombination in Drosophila. Mol. Biol.
Evol. 20:880–892.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated
program for molecular population genetics and molecular
evolution analysis. Bioinformatics 15:174–175.
Saez, A. G., A. Tatarenkov, E. Barrio, N. H. Becerra, and F. J.
Ayala. 2003. Patterns of DNA sequence polymorphism at Sod
vicinities in Drosophila melanogaster: unraveling the foot-
print of a recent selective sweep. Proc. Natl. Acad. Sci. USA
100:1793–1798.
SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L.
Bennetzen. 1998. The paleontology of intergene retrotrans-
posons of maize. Nat. Genet. 20:43–45.
SanMiguel, P., A. Tikhonov, Y. K. Jin et al. 1996. Nested
retrotransposons in the intergenic regions of the maize
genome. Science 274:765–768.
Shapiro, J. A. 1977. DNA insertion elements and the evolution of
chromosome primary structure. Trends Biochem. Sci. 2:622–
627.
Sorek, R., G. Ast, and D. Graur. 2002. Alu-containing exons are
alternatively spliced. Genome Res. 12:1060–1067.
Venter, J. C., M. D. Adams, E. W. Myers et al. 2001. The
sequence of the human genome. Science 291:1304–1351.
Thomas Eickbush, Associate Editor
Accepted February 26, 2004